Jenkins build is back to normal : beam_PostCommit_Python_Verify #1618

2017-03-24 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-773) Implement Metrics support for Flink runner

2017-03-24 Thread Jingsong Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941542#comment-15941542
 ] 

Jingsong Lee commented on BEAM-773:
---

Best wishes for your vacation!  Do you think Metrics need to be 
fault-tolerant?

> Implement Metrics support for Flink runner
> --
>
> Key: BEAM-773
> URL: https://issues.apache.org/jira/browse/BEAM-773
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: Ben Chambers
>Assignee: Jingsong Lee
> Fix For: First stable release
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Jenkins build is still unstable: beam_PostCommit_Java_RunnableOnService_Dataflow #2637

2017-03-24 Thread Apache Jenkins Server
See 




[jira] [Resolved] (BEAM-1182) Direct runner should enforce encodability of unbounded source checkpoints

2017-03-24 Thread Thomas Groh (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Groh resolved BEAM-1182.
---
   Resolution: Fixed
Fix Version/s: First stable release

> Direct runner should enforce encodability of unbounded source checkpoints
> -
>
> Key: BEAM-1182
> URL: https://issues.apache.org/jira/browse/BEAM-1182
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Eugene Kirpichov
>Assignee: Thomas Groh
> Fix For: First stable release
>
>
> As far as I can tell, 
> https://github.com/apache/incubator-beam/blob/master/runners/direct-java/src/main/java/org/apache/beam/runners/direct/UnboundedReadEvaluatorFactory.java
>  currently uses the reader's getCheckpoint() only as an in-memory object, 
> i.e. it's not exercising that 1) the checkpoint can be encoded at all, and 2) 
> that the reader can be resumed from an encoded/decoded checkpoint.
> I've seen cases in code reviews where people implemented a non-serializable 
> checkpoint, and direct runner tests didn't catch that because of this issue.
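The enforcement described above amounts to a round trip: encode the checkpoint, decode it, and resume only from the decoded copy, so a non-encodable checkpoint fails fast in tests. A minimal sketch of that round-trip check, using plain Java serialization as a stand-in for Beam's checkpoint coder (`FakeCheckpoint` and `CheckpointRoundTrip` are hypothetical illustration names, not Beam APIs):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical checkpoint type; real Beam checkpoints implement
// UnboundedSource.CheckpointMark and are encoded by the source's coder.
class FakeCheckpoint implements Serializable {
  private static final long serialVersionUID = 1L;
  final long lastReadOffset;
  FakeCheckpoint(long lastReadOffset) { this.lastReadOffset = lastReadOffset; }
}

public class CheckpointRoundTrip {
  // Encode then decode, as a runner would before resuming a reader.
  static FakeCheckpoint roundTrip(FakeCheckpoint cp) throws Exception {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(cp);
    }
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      return (FakeCheckpoint) in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    // Resuming from the decoded copy, not the in-memory object, is what
    // catches non-serializable checkpoints in tests.
    FakeCheckpoint decoded = roundTrip(new FakeCheckpoint(42L));
    if (decoded.lastReadOffset != 42L) throw new AssertionError();
    System.out.println("round trip ok, offset=" + decoded.lastReadOffset);
  }
}
```

A checkpoint holding a non-serializable field would throw here at encode time, which is exactly the failure the direct runner was previously not surfacing.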





[jira] [Created] (BEAM-1807) IO ITs: shared language neutral directory for kubernetes resources

2017-03-24 Thread Stephen Sisk (JIRA)
Stephen Sisk created BEAM-1807:
--

 Summary: IO ITs: shared language neutral directory for kubernetes 
resources
 Key: BEAM-1807
 URL: https://issues.apache.org/jira/browse/BEAM-1807
 Project: Beam
  Issue Type: Improvement
  Components: testing
Reporter: Stephen Sisk
Assignee: Davor Bonaci


This is a follow-up to BEAM-1644. As was discussed there: 

"
It is the case that different IOs will be created that connect to the same data 
stores - HadoopInputFormat in particular uses ES and cassandra, which are also 
used in their respective IOs as well. Jdbc is likely to have the same type of 
overlap.

It would be nice to share [...] kubernetes/docker scripts so that we don't need 
to repeat them in each module. 
"

For BEAM-1644, we created a directory for java io-common resources - that's 
perfect for the java pipeline options we needed. However, we shouldn't put 
kubernetes resources in the newly created sdks/java/io/common because that'd 
indicate that the scripts are java specific. 

It's also worth noting that we have this problem already for jenkins and 
travis, and solved it by creating .jenkins and .travis directories at the 
top-level.

Proposal
===
move .jenkins and .travis into a new top level ".test-infra" folder, and put a 
kubernetes directory there.

So the new structure would look like:

.test-infra
  jenkins
  travis
  kubernetes
sdks
runners
examples
...

I don't know if travis/jenkins must look in .travis/.jenkins directories or if 
those are things that we can change. If both are fixed in place, that would 
lessen my excitement, but if at least one other thing can share that directory, 
that would make it worthwhile in my mind.

Alternate proposal
===
add a top-level .kubernetes directory alongside .jenkins/.travis. 

I'm not a huge fan of this since I'd love to not add more top level clutter.


Alternate proposal
===
We could create:
sdks/common/test-infra/kubernetes
and put the scripts there. 

I don't like this option as much because it's kind of just a random directory 
and is disconnected from the rest of the test infrastructure scripts that we 
use. I also prefer the other option since it reduces the amount of top-level 
clutter.

Thoughts?

cc [~jasonkuster] [~davor] [~iemejia]





Build failed in Jenkins: beam_PostCommit_Python_Verify #1617

2017-03-24 Thread Apache Jenkins Server
See 


Changes:

[tgroh] Fix EmptyFlattenAsCreateFactory

[tgroh] Explicitly duplicate input PCollections in Dataflow

[tgroh] Reenable Flatten Tests in the Dataflow Runner

--
[...truncated 459.37 KB...]
test_getitem_duplicates_ignored 
(apache_beam.typehints.typehints_test.UnionHintTestCase) ... ok
test_getitem_must_be_valid_type_param 
(apache_beam.typehints.typehints_test.UnionHintTestCase) ... ok
test_getitem_must_be_valid_type_param_cant_be_object_instance 
(apache_beam.typehints.typehints_test.UnionHintTestCase) ... ok
test_getitem_nested_unions_flattened 
(apache_beam.typehints.typehints_test.UnionHintTestCase) ... ok
test_nested_compatibility 
(apache_beam.typehints.typehints_test.UnionHintTestCase) ... ok
test_union_hint_compatibility 
(apache_beam.typehints.typehints_test.UnionHintTestCase) ... ok
test_union_hint_enforcement_composite_type_in_union 
(apache_beam.typehints.typehints_test.UnionHintTestCase) ... ok
test_union_hint_enforcement_not_part_of_union 
(apache_beam.typehints.typehints_test.UnionHintTestCase) ... ok
test_union_hint_enforcement_part_of_union 
(apache_beam.typehints.typehints_test.UnionHintTestCase) ... ok
test_union_hint_repr (apache_beam.typehints.typehints_test.UnionHintTestCase) 
... ok
test_deprecated_with_since_current 
(apache_beam.utils.annotations_test.AnnotationTests) ... ok
test_deprecated_without_current 
(apache_beam.utils.annotations_test.AnnotationTests) ... ok
test_deprecated_without_since_should_fail 
(apache_beam.utils.annotations_test.AnnotationTests) ... ok
test_experimental_with_current 
(apache_beam.utils.annotations_test.AnnotationTests) ... ok
test_experimental_without_current 
(apache_beam.utils.annotations_test.AnnotationTests) ... ok
test_frequency (apache_beam.utils.annotations_test.AnnotationTests)
Tests that the filter 'once' is sufficient to print once per ... ok
test_gcs_path (apache_beam.utils.path_test.Path) ... ok
test_unix_path (apache_beam.utils.path_test.Path) ... ok
test_windows_path (apache_beam.utils.path_test.Path) ... ok
test_dataflow_job_file 
(apache_beam.utils.pipeline_options_test.PipelineOptionsTest) ... ok
test_display_data (apache_beam.utils.pipeline_options_test.PipelineOptionsTest) 
... ok
test_experiments (apache_beam.utils.pipeline_options_test.PipelineOptionsTest) 
... ok
test_extra_package 
(apache_beam.utils.pipeline_options_test.PipelineOptionsTest) ... ok
test_from_dictionary 
(apache_beam.utils.pipeline_options_test.PipelineOptionsTest) ... ok
test_get_all_options 
(apache_beam.utils.pipeline_options_test.PipelineOptionsTest) ... ok
test_option_with_space 
(apache_beam.utils.pipeline_options_test.PipelineOptionsTest) ... ok
test_override_options 
(apache_beam.utils.pipeline_options_test.PipelineOptionsTest) ... ok
test_redefine_options 
(apache_beam.utils.pipeline_options_test.PipelineOptionsTest) ... ok
test_template_location 
(apache_beam.utils.pipeline_options_test.PipelineOptionsTest) ... ok
test_dataflow_job_file_and_template_location_mutually_exclusive 
(apache_beam.utils.pipeline_options_validator_test.SetupTest) ... ok
test_gcs_path (apache_beam.utils.pipeline_options_validator_test.SetupTest) ... 
ok
test_is_service_runner 
(apache_beam.utils.pipeline_options_validator_test.SetupTest) ... ok
test_job_name (apache_beam.utils.pipeline_options_validator_test.SetupTest) ... 
ok
test_local_runner (apache_beam.utils.pipeline_options_validator_test.SetupTest) 
... ok
test_missing_required_options 
(apache_beam.utils.pipeline_options_validator_test.SetupTest) ... ok
test_num_workers (apache_beam.utils.pipeline_options_validator_test.SetupTest) 
... ok
test_project (apache_beam.utils.pipeline_options_validator_test.SetupTest) ... 
ok
test_streaming (apache_beam.utils.pipeline_options_validator_test.SetupTest) 
... ok
test_test_matcher (apache_beam.utils.pipeline_options_validator_test.SetupTest) 
... ok
test_validate_dataflow_job_file 
(apache_beam.utils.pipeline_options_validator_test.SetupTest) ... ok
test_validate_template_location 
(apache_beam.utils.pipeline_options_validator_test.SetupTest) ... ok
test_method_forwarding_not_windows (apache_beam.utils.processes_test.Exec) ... 
ok
test_method_forwarding_windows (apache_beam.utils.processes_test.Exec) ... ok
test_call_two_objects (apache_beam.utils.retry_test.RetryStateTest) ... ok
test_single_failure (apache_beam.utils.retry_test.RetryStateTest) ... ok
test_two_failures (apache_beam.utils.retry_test.RetryStateTest) ... ok
test_log_calls_for_permanent_failure (apache_beam.utils.retry_test.RetryTest) 
... ok
test_log_calls_for_transient_failure (apache_beam.utils.retry_test.RetryTest) 
... ok
test_with_default_number_of_retries (apache_beam.utils.retry_test.RetryTest) 
... ok
test_with_explicit_decorator (apache_beam.utils.retry_test.RetryTest) ... ok
test_with_explicit_initial_delay (apache_beam.utils.retry_test.RetryTest) ... ok

[jira] [Commented] (BEAM-1655) Evaluate the PubsubUnboundedSource

2017-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941408#comment-15941408
 ] 

ASF GitHub Bot commented on BEAM-1655:
--

Github user tgroh closed the pull request at:

https://github.com/apache/beam/pull/2321


> Evaluate the PubsubUnboundedSource
> --
>
> Key: BEAM-1655
> URL: https://issues.apache.org/jira/browse/BEAM-1655
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Reporter: Ben Chambers
>Assignee: Thomas Groh
>
> This source includes a lot of assumptions & assertions that may cause 
> problems on runners that implement UnboundedSources differently. For example:
> 1. It requires that finalizeCheckpoint is called at most once.
> 2. It requires that the checkpoint be finalized within the pubsub timeout, or 
> the messages will be redelivered.
> ... (possibly other assumptions) ... 





[jira] [Commented] (BEAM-632) Dataflow runner does not correctly flatten duplicate inputs

2017-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941407#comment-15941407
 ] 

ASF GitHub Bot commented on BEAM-632:
-

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2325


> Dataflow runner does not correctly flatten duplicate inputs
> ---
>
> Key: BEAM-632
> URL: https://issues.apache.org/jira/browse/BEAM-632
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Daniel Halperin
>Assignee: Thomas Groh
>Priority: Critical
> Fix For: First stable release
>
>
> https://github.com/apache/incubator-beam/pull/960
> Builds #1148+ are failing the new test that [~tgroh] added in that PR.
> https://builds.apache.org/job/beam_PostCommit_RunnableOnService_GoogleCloudDataflow/changes





[GitHub] beam pull request #2321: [BEAM-1655] Enable nackAll on all PubsubCheckpoints

2017-03-24 Thread tgroh
Github user tgroh closed the pull request at:

https://github.com/apache/beam/pull/2321


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] beam pull request #2325: [BEAM-632] Fix Flatten PCollections in Dataflow

2017-03-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2325




[2/4] beam git commit: Explicitly duplicate input PCollections in Dataflow

2017-03-24 Thread tgroh
Explicitly duplicate input PCollections in Dataflow


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/54a7de13
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/54a7de13
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/54a7de13

Branch: refs/heads/master
Commit: 54a7de13dc36cf3894cb1d34f4e3869a550e801a
Parents: 0f91068
Author: Thomas Groh 
Authored: Fri Mar 24 15:14:27 2017 -0700
Committer: Thomas Groh 
Committed: Fri Mar 24 17:46:53 2017 -0700

--
 .../DeduplicatedFlattenFactory.java | 109 ++
 .../core/construction/PTransformMatchers.java   |  26 +
 .../DeduplicatedFlattenFactoryTest.java | 112 +++
 .../construction/PTransformMatchersTest.java|  67 +++
 .../beam/runners/dataflow/DataflowRunner.java   |   5 +-
 5 files changed, 318 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/54a7de13/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/DeduplicatedFlattenFactory.java
--
diff --git 
a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/DeduplicatedFlattenFactory.java
 
b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/DeduplicatedFlattenFactory.java
new file mode 100644
index 000..093385e
--- /dev/null
+++ 
b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/DeduplicatedFlattenFactory.java
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.core.construction;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.runners.PTransformOverrideFactory;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.Flatten;
+import org.apache.beam.sdk.transforms.Flatten.PCollections;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionList;
+import org.apache.beam.sdk.values.PValue;
+import org.apache.beam.sdk.values.TaggedPValue;
+
+/**
+ * A {@link PTransformOverrideFactory} that will apply a flatten where no 
element appears in the
+ * input {@link PCollectionList} more than once.
+ */
+public class DeduplicatedFlattenFactory<T>
+    implements PTransformOverrideFactory<
+        PCollectionList<T>, PCollection<T>, Flatten.PCollections<T>> {
+
+  public static <T> DeduplicatedFlattenFactory<T> create() {
+    return new DeduplicatedFlattenFactory<>();
+  }
+
+  private DeduplicatedFlattenFactory() {}
+
+  @Override
+  public PTransform<PCollectionList<T>, PCollection<T>> getReplacementTransform(
+      PCollections<T> transform) {
+    return new PTransform<PCollectionList<T>, PCollection<T>>() {
+      @Override
+      public PCollection<T> expand(PCollectionList<T> input) {
+        Map<PCollection<T>, Integer> instances = new HashMap<>();
+        for (PCollection<T> pCollection : input.getAll()) {
+          int existing = instances.get(pCollection) == null ? 0 : instances.get(pCollection);
+          instances.put(pCollection, existing + 1);
+        }
+        PCollectionList<T> output = PCollectionList.empty(input.getPipeline());
+        for (Map.Entry<PCollection<T>, Integer> instanceEntry : instances.entrySet()) {
+          if (instanceEntry.getValue().equals(1)) {
+            output = output.and(instanceEntry.getKey());
+          } else {
+            String duplicationName =
+                String.format("Multiply %s", instanceEntry.getKey().getName());
+            PCollection<T> duplicated =
+                instanceEntry
+                    .getKey()
+                    .apply(duplicationName, ParDo.of(new DuplicateFn<T>(instanceEntry.getValue())));
+            output = output.and(duplicated);
+          }
+        }
+        return 
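The factory in the (truncated) diff above first counts how many times each input PCollection appears, then re-emits only the duplicated ones through an extra step before flattening. The counting step can be shown on plain strings instead of PCollections (`DedupCountDemo` and `countOccurrences` are illustrative names, not part of Beam):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DedupCountDemo {
  // Same occurrence-counting idea as DeduplicatedFlattenFactory, on strings.
  static Map<String, Integer> countOccurrences(List<String> inputs) {
    Map<String, Integer> counts = new HashMap<>();
    for (String input : inputs) {
      // merge() is a more idiomatic form of the null-check-then-put pattern.
      counts.merge(input, 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    Map<String, Integer> counts = countOccurrences(List.of("a", "b", "a"));
    // "a" appears twice, so the factory would route it through a duplicating
    // step ("Multiply a") before the flatten; "b" passes through unchanged.
    if (counts.get("a") != 2 || counts.get("b") != 1) throw new AssertionError();
    System.out.println(counts);
  }
}
```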

[4/4] beam git commit: Fix EmptyFlattenAsCreateFactory

2017-03-24 Thread tgroh
Fix EmptyFlattenAsCreateFactory

The input types of Flatten and Create don't match up, so a composite
must be provided instead of Create directly.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/0f910688
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/0f910688
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/0f910688

Branch: refs/heads/master
Commit: 0f910688cb1d6dcf1bc701fbc4e4124cef190f10
Parents: 2723d38
Author: Thomas Groh 
Authored: Fri Mar 24 15:38:20 2017 -0700
Committer: Thomas Groh 
Committed: Fri Mar 24 17:46:53 2017 -0700

--
 .../EmptyFlattenAsCreateFactory.java| 15 ++-
 .../EmptyFlattenAsCreateFactoryTest.java| 96 
 2 files changed, 109 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/0f910688/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/EmptyFlattenAsCreateFactory.java
--
diff --git 
a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/EmptyFlattenAsCreateFactory.java
 
b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/EmptyFlattenAsCreateFactory.java
index 0168039..4328cf3 100644
--- 
a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/EmptyFlattenAsCreateFactory.java
+++ 
b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/EmptyFlattenAsCreateFactory.java
@@ -52,14 +52,17 @@ public class EmptyFlattenAsCreateFactory<T>
   @Override
   public PTransform<PCollectionList<T>, PCollection<T>> getReplacementTransform(
       Flatten.PCollections<T> transform) {
-    return (PTransform) Create.empty(VoidCoder.of());
+    return new CreateEmptyFromList<>();
   }
 
   @Override
   public PCollectionList<T> getInput(
       List<TaggedPValue> inputs, Pipeline p) {
     checkArgument(
-        inputs.isEmpty(), "Must have an empty input to use %s", getClass().getSimpleName());
+        inputs.isEmpty(),
+        "Unexpected nonempty input %s for %s",
+        inputs,
+        getClass().getSimpleName());
     return PCollectionList.empty(p);
   }
 
@@ -68,4 +71,12 @@ public class EmptyFlattenAsCreateFactory<T>
       List<TaggedPValue> outputs, PCollection<T> newOutput) {
     return ReplacementOutputs.singleton(outputs, newOutput);
   }
+
+  private static class CreateEmptyFromList<T>
+      extends PTransform<PCollectionList<T>, PCollection<T>> {
+    @Override
+    public PCollection<T> expand(PCollectionList<T> input) {
+      return (PCollection<T>) input.getPipeline().apply(Create.empty(VoidCoder.of()));
+    }
+  }
 }

http://git-wip-us.apache.org/repos/asf/beam/blob/0f910688/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/EmptyFlattenAsCreateFactoryTest.java
--
diff --git 
a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/EmptyFlattenAsCreateFactoryTest.java
 
b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/EmptyFlattenAsCreateFactoryTest.java
new file mode 100644
index 000..ad9d908
--- /dev/null
+++ 
b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/EmptyFlattenAsCreateFactoryTest.java
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.core.construction;
+
+import static org.hamcrest.Matchers.equalTo;
+import static org.junit.Assert.assertThat;
+
+import com.google.common.collect.Iterables;
+import java.util.Collections;
+import java.util.Map;
+import org.apache.beam.sdk.io.CountingInput;
+import org.apache.beam.sdk.runners.PTransformOverrideFactory.ReplacementOutput;
+import org.apache.beam.sdk.testing.NeedsRunner;
+import org.apache.beam.sdk.testing.PAssert;
+import org.apache.beam.sdk.testing.TestPipeline;
+import 

[1/4] beam git commit: This closes #2325

2017-03-24 Thread tgroh
Repository: beam
Updated Branches:
  refs/heads/master 2723d38a6 -> e5f1a6479


This closes #2325


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/e5f1a647
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/e5f1a647
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/e5f1a647

Branch: refs/heads/master
Commit: e5f1a6479c0a7c6f531c340fe371e4b66e7ea2a5
Parents: 2723d38 6bfde5d
Author: Thomas Groh 
Authored: Fri Mar 24 17:46:53 2017 -0700
Committer: Thomas Groh 
Committed: Fri Mar 24 17:46:53 2017 -0700

--
 .../DeduplicatedFlattenFactory.java | 109 ++
 .../EmptyFlattenAsCreateFactory.java|  15 ++-
 .../core/construction/PTransformMatchers.java   |  26 +
 .../DeduplicatedFlattenFactoryTest.java | 112 +++
 .../EmptyFlattenAsCreateFactoryTest.java|  96 
 .../construction/PTransformMatchersTest.java|  67 +++
 runners/google-cloud-dataflow-java/pom.xml  |   3 -
 .../beam/runners/dataflow/DataflowRunner.java   |   5 +-
 8 files changed, 427 insertions(+), 6 deletions(-)
--




[3/4] beam git commit: Reenable Flatten Tests in the Dataflow Runner

2017-03-24 Thread tgroh
Reenable Flatten Tests in the Dataflow Runner


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/6bfde5dc
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/6bfde5dc
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/6bfde5dc

Branch: refs/heads/master
Commit: 6bfde5dce82ef6ae7bda6c33832bb5e4a55dfafb
Parents: 54a7de1
Author: Thomas Groh 
Authored: Fri Mar 24 14:45:08 2017 -0700
Committer: Thomas Groh 
Committed: Fri Mar 24 17:46:53 2017 -0700

--
 runners/google-cloud-dataflow-java/pom.xml | 3 ---
 1 file changed, 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/6bfde5dc/runners/google-cloud-dataflow-java/pom.xml
--
diff --git a/runners/google-cloud-dataflow-java/pom.xml 
b/runners/google-cloud-dataflow-java/pom.xml
index f72cc05..caa000f 100644
--- a/runners/google-cloud-dataflow-java/pom.xml
+++ b/runners/google-cloud-dataflow-java/pom.xml
@@ -87,9 +87,6 @@
 org.apache.beam.sdk.testing.UsesSplittableParDo,
 org.apache.beam.sdk.testing.UsesUnboundedPCollections,
   
-  
-org.apache.beam.sdk.transforms.FlattenTest
-  
 
   
 



Jenkins build is still unstable: beam_PostCommit_Java_RunnableOnService_Spark #1367

2017-03-24 Thread Apache Jenkins Server
See 




[jira] [Closed] (BEAM-1665) Update Beam site to reflect new Pipeline I/O docs structure

2017-03-24 Thread Stephen Sisk (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Sisk closed BEAM-1665.
--
   Resolution: Fixed
Fix Version/s: Not applicable

verified - you can check this out at 
https://beam.apache.org/documentation/io/io-toc/

> Update Beam site to reflect new Pipeline I/O docs structure
> ---
>
> Key: BEAM-1665
> URL: https://issues.apache.org/jira/browse/BEAM-1665
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Stephen Sisk
>Assignee: Stephen Sisk
> Fix For: Not applicable
>
>
> I plan to do the following: 
> * create an entry link from Pipeline fundamentals section of "Documentation"
> * Have one page there which has TOC to all other Pipeline I/O pages
> * create individual pages for authoring 
> overview/java/python/testing/contributing
> * all contain link back to TOC page
> * update existing Python SDK & Java pages to point at the new IO TOC page
> * update programming guide to point at the IO TOC page
> Initially, this will mostly be empty, but I have the existing draft docs for 
> authoring and testing, so I will be able to create PRs to fill in content in 
> short order





Jenkins build is still unstable: beam_PostCommit_Java_RunnableOnService_Dataflow #2636

2017-03-24 Thread Apache Jenkins Server
See 




[3/3] beam git commit: Always require an UnboundedSource to provide a Checkpoint Coder

2017-03-24 Thread tgroh
Always require an UnboundedSource to provide a Checkpoint Coder

The coder can do no work, but should always be specified.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/4c1f2e4f
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/4c1f2e4f
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/4c1f2e4f

Branch: refs/heads/master
Commit: 4c1f2e4f496a9695517f6a3fbc953e910fb991ac
Parents: d63235c
Author: Thomas Groh 
Authored: Fri Mar 24 14:36:12 2017 -0700
Committer: Thomas Groh 
Committed: Fri Mar 24 16:45:55 2017 -0700

--
 .../src/main/java/org/apache/beam/sdk/io/UnboundedSource.java| 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/4c1f2e4f/sdks/java/core/src/main/java/org/apache/beam/sdk/io/UnboundedSource.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/UnboundedSource.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/UnboundedSource.java
index 043f2fc..3f1ba0e 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/UnboundedSource.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/UnboundedSource.java
@@ -76,10 +76,8 @@ public abstract class UnboundedSource<
       PipelineOptions options, @Nullable CheckpointMarkT checkpointMark) throws IOException;
 
   /**
-   * Returns a {@link Coder} for encoding and decoding the checkpoints for this source, or
-   * null if the checkpoints do not need to be durably committed.
+   * Returns a {@link Coder} for encoding and decoding the checkpoints for this source.
    */
-  @Nullable
   public abstract Coder<CheckpointMarkT> getCheckpointMarkCoder();
 
   /**


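The commit above notes that "the coder can do no work, but should always be specified." A sketch of that idea with hypothetical stand-in types, not Beam's real Coder/CheckpointMark interfaces: a stateless checkpoint encodes to zero bytes and decodes to a shared singleton, yet the coder still exists and can be called.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

public class NoopCheckpointCoderDemo {
  // A checkpoint with no state worth persisting.
  enum NoopCheckpoint { INSTANCE }

  // Encoding writes nothing; decoding ignores the stream and returns the
  // singleton. The coder is still *specified*, which is what the change enforces.
  static void encode(NoopCheckpoint value, OutputStream out) { /* no bytes */ }

  static NoopCheckpoint decode(InputStream in) {
    return NoopCheckpoint.INSTANCE;
  }

  public static void main(String[] args) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    encode(NoopCheckpoint.INSTANCE, out);
    NoopCheckpoint decoded = decode(new ByteArrayInputStream(out.toByteArray()));
    if (out.size() != 0 || decoded != NoopCheckpoint.INSTANCE) {
      throw new AssertionError();
    }
    System.out.println("no-op checkpoint coder round trip ok");
  }
}
```

With the `@Nullable` removed from `getCheckpointMarkCoder()`, a source in this situation returns a trivial coder like this instead of `null`, so runners can unconditionally encode checkpoints.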

[GitHub] beam pull request #2323: [BEAM-1182] Clone Checkpoints before resuming in th...

2017-03-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2323




[1/3] beam git commit: This closes #2323

2017-03-24 Thread tgroh
Repository: beam
Updated Branches:
  refs/heads/master d63235cb5 -> 2723d38a6


This closes #2323


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/2723d38a
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/2723d38a
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/2723d38a

Branch: refs/heads/master
Commit: 2723d38a67cee15b974a61da8f9c0c5d5262a02f
Parents: d63235c a5630f9
Author: Thomas Groh 
Authored: Fri Mar 24 16:45:55 2017 -0700
Committer: Thomas Groh 
Committed: Fri Mar 24 16:45:55 2017 -0700

--
 .../runners/direct/UnboundedReadEvaluatorFactory.java   |  4 
 .../direct/UnboundedReadEvaluatorFactoryTest.java   | 12 +++-
 .../java/org/apache/beam/sdk/io/UnboundedSource.java|  4 +---
 3 files changed, 16 insertions(+), 4 deletions(-)
--




Jenkins build is still unstable: beam_PostCommit_Java_RunnableOnService_Spark #1365

2017-03-24 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_Java_RunnableOnService_Dataflow #2635

2017-03-24 Thread Apache Jenkins Server
See 




[jira] [Closed] (BEAM-1751) Singleton ByteKeyRange with BigtableIO and Dataflow runner

2017-03-24 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin closed BEAM-1751.
-
   Resolution: Fixed
Fix Version/s: First stable release

> Singleton ByteKeyRange with BigtableIO and Dataflow runner
> --
>
> Key: BEAM-1751
> URL: https://issues.apache.org/jira/browse/BEAM-1751
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-java-gcp
>Affects Versions: 0.5.0
>Reporter: peay
>Assignee: Daniel Halperin
> Fix For: First stable release
>
>
> I am getting this exception on a smallish table of a couple hundreds of rows 
> from Bigtable, when running on Dataflow with a single worker.
> This doesn't occur with the direct runner on my laptop, only when running on 
> Dataflow. Backtrace is from Beam 0.5.
> {code}java.lang.IllegalArgumentException: Start [xx] must be less 
> than end [xx]
>   at 
> org.apache.beam.sdk.repackaged.com.google.common.base.Preconditions.checkArgument(Preconditions.java:146)
>   at 
> org.apache.beam.sdk.io.range.ByteKeyRange.<init>(ByteKeyRange.java:288)
>   at 
> org.apache.beam.sdk.io.range.ByteKeyRange.withEndKey(ByteKeyRange.java:278)
>   at 
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$BigtableSource.withEndKey(BigtableIO.java:728)
>   at 
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$BigtableReader.splitAtFraction(BigtableIO.java:1034)
>   at 
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$BigtableReader.splitAtFraction(BigtableIO.java:953)
>   at 
> org.apache.beam.runners.dataflow.DataflowUnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$ResidualSource.getCheckpointMark(DataflowUnboundedReadFromBoundedSource.java:530)
>   at 
> org.apache.beam.runners.dataflow.DataflowUnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$Reader.getCheckpointMark(DataflowUnboundedReadFromBoundedSource.java:386)
>   at 
> org.apache.beam.runners.dataflow.DataflowUnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$Reader.getCheckpointMark(DataflowUnboundedReadFromBoundedSource.java:283)
>   at 
> com.google.cloud.dataflow.worker.runners.worker.StreamingModeExecutionContext.flushState(StreamingModeExecutionContext.java:278)
>   at 
> com.google.cloud.dataflow.worker.runners.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:778)
>   at 
> com.google.cloud.dataflow.worker.runners.worker.StreamingDataflowWorker.access$700(StreamingDataflowWorker.java:105)
>   at 
> com.google.cloud.dataflow.worker.runners.worker.StreamingDataflowWorker$9.run(StreamingDataflowWorker.java:858)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> This is in the log right before:
> {code}
> "Proposing to split 
> ByteKeyRangeTracker{range=ByteKeyRange{startKey=[xx], endKey=[]}, 
> position=null} at fraction 0.0 (key [xx])"   
> {code}
> I have replaced the actual key with {{xx}}, but it is always the same 
> everywhere. In 
> https://github.com/apache/beam/blob/e68a70e08c9fe00df9ec163d1532da130f69588a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/range/ByteKeyRange.java#L260,
>  the end position is obtained by truncating the fractional part of {{size * 
> fraction}}, such that the resulting offset can just be zero if {{fraction}} 
> is too small. {{ByteKeyRange}} does not allow a singleton range, however. Since 
> {{fraction}} is zero here, the call to {{splitAtFraction}} fails. 
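The truncation described above can be reproduced in isolation. Below is a simplified, hypothetical sketch (not Beam's actual implementation) of interpolating a split offset as {{size * fraction}} with the fractional part truncated: at fraction 0.0 the offset is 0, so the proposed end key equals the start key and the "start must be less than end" precondition fails.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class SplitFractionDemo {
    // Hypothetical simplification: the split offset is floor(rangeSize * fraction).
    // For fraction = 0.0 the offset is 0, so the candidate end key coincides with
    // the start key and a singleton range would be constructed.
    static BigInteger interpolateOffset(BigInteger rangeSize, double fraction) {
        return new BigDecimal(rangeSize)
            .multiply(BigDecimal.valueOf(fraction))
            .toBigInteger(); // truncates the fractional part
    }

    public static void main(String[] args) {
        BigInteger size = BigInteger.valueOf(1000);
        System.out.println(interpolateOffset(size, 0.0)); // offset 0: start == end
        System.out.println(interpolateOffset(size, 0.5));
    }
}
```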



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-632) Dataflow runner does not correctly flatten duplicate inputs

2017-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941303#comment-15941303
 ] 

ASF GitHub Bot commented on BEAM-632:
-

GitHub user tgroh opened a pull request:

https://github.com/apache/beam/pull/2325

[BEAM-632] Fix Flatten PCollections in Dataflow

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/beam dataflow_flatten

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2325.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2325


commit 086d95f78e3188ea0036f3c4db7979e204775e4c
Author: Thomas Groh 
Date:   2017-03-24T22:38:20Z

Fix EmptyFlattenAsCreateFactory

The input types of Flatten and Create don't match up, so a composite
must be provided instead of Create directly.

commit 3982110f20e17f6022ec30d1ecb44e47653f12bb
Author: Thomas Groh 
Date:   2017-03-24T21:45:08Z

Reenable Flatten Tests in the Dataflow Runner

commit 125c4b09639e60ad3be5b34131185806b07b6b5f
Author: Thomas Groh 
Date:   2017-03-24T22:14:27Z

Explicitly duplicate input PCollections in Dataflow
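As a sanity check on the semantics these commits restore: flattening the same input collection twice must emit every element twice, so a runner that deduplicates identical inputs is incorrect. A minimal, self-contained sketch (illustrative only, not Beam code):

```java
import java.util.ArrayList;
import java.util.List;

public class FlattenDuplicates {
    // Concatenate all inputs in order; duplicate input references contribute
    // their elements once per occurrence, never deduplicated.
    static <T> List<T> flatten(List<List<T>> inputs) {
        List<T> out = new ArrayList<>();
        for (List<T> in : inputs) {
            out.addAll(in);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> a = List.of(1, 2);
        // The same collection appears twice among the inputs.
        System.out.println(flatten(List.of(a, a))); // each element emitted twice
    }
}
```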




> Dataflow runner does not correctly flatten duplicate inputs
> ---
>
> Key: BEAM-632
> URL: https://issues.apache.org/jira/browse/BEAM-632
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Daniel Halperin
>Assignee: Thomas Groh
>Priority: Critical
> Fix For: First stable release
>
>
> https://github.com/apache/incubator-beam/pull/960
> Builds #1148+ are failing the new test that [~tgroh] added in that PR.
> https://builds.apache.org/job/beam_PostCommit_RunnableOnService_GoogleCloudDataflow/changes



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2325: [BEAM-632] Fix Flatten PCollections in Dataflow

2017-03-24 Thread tgroh
GitHub user tgroh opened a pull request:

https://github.com/apache/beam/pull/2325

[BEAM-632] Fix Flatten PCollections in Dataflow





---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-1751) Singleton ByteKeyRange with BigtableIO and Dataflow runner

2017-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941302#comment-15941302
 ] 

ASF GitHub Bot commented on BEAM-1751:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2324


> Singleton ByteKeyRange with BigtableIO and Dataflow runner
> --
>
> Key: BEAM-1751
> URL: https://issues.apache.org/jira/browse/BEAM-1751
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-java-gcp
>Affects Versions: 0.5.0
>Reporter: peay
>Assignee: Daniel Halperin
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2324: [BEAM-1751] BigtableIO: trivial fixes to log messag...

2017-03-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2324


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/2] beam git commit: BigtableIO: trivial fixes to log messages

2017-03-24 Thread dhalperi
Repository: beam
Updated Branches:
  refs/heads/master c80ef1837 -> d63235cb5


BigtableIO: trivial fixes to log messages


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/fe352a30
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/fe352a30
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/fe352a30

Branch: refs/heads/master
Commit: fe352a30e08c748ba191b767b28de506913b02c1
Parents: c80ef18
Author: Dan Halperin 
Authored: Fri Mar 24 14:54:39 2017 -0700
Committer: Dan Halperin 
Committed: Fri Mar 24 14:54:39 2017 -0700

--
 .../java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/fe352a30/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java
--
diff --git 
a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java
 
b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java
index a052e09..2cdd11d 100644
--- 
a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java
+++ 
b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java
@@ -1007,7 +1007,7 @@ public class BigtableIO {
 splitKey = rangeTracker.getRange().interpolateKey(fraction);
   } catch (RuntimeException e) {
 LOG.info(
-"%s: Failed to interpolate key for fraction %s.", 
rangeTracker.getRange(), fraction, e);
+"{}: Failed to interpolate key for fraction {}.", 
rangeTracker.getRange(), fraction, e);
 return null;
   }
   LOG.debug(
@@ -1019,7 +1019,7 @@ public class BigtableIO {
  residual =  source.withStartKey(splitKey);
   } catch (RuntimeException e) {
 LOG.info(
-"%s: Interpolating for fraction %s yielded invalid split key %s.",
+"{}: Interpolating for fraction {} yielded invalid split key {}.",
 rangeTracker.getRange(),
 fraction,
 splitKey,
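The fix above swaps String.format-style "%s" tokens for SLF4J's "{}" placeholders. A tiny re-implementation of SLF4J-style substitution (hypothetical, for illustration only) shows why "%s" tokens are left verbatim by such loggers:

```java
public class PlaceholderDemo {
    // Minimal stand-in for SLF4J message formatting: each "{}" is replaced by
    // the next argument; any other text, including "%s", is copied verbatim.
    static String slf4jFormat(String pattern, Object... args) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        int arg = 0;
        while (i < pattern.length()) {
            if (arg < args.length && pattern.startsWith("{}", i)) {
                out.append(args[arg++]);
                i += 2;
            } else {
                out.append(pattern.charAt(i++));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "%s" is not a placeholder for SLF4J-style loggers; the args are ignored:
        System.out.println(slf4jFormat("%s: failed at fraction %s", "range", 0.5));
        // "{}" is substituted as intended:
        System.out.println(slf4jFormat("{}: failed at fraction {}", "range", 0.5));
    }
}
```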



[jira] [Created] (BEAM-1806) a new option `asLeftOuterJoin` for CoGroupByKey

2017-03-24 Thread Xu Mingmin (JIRA)
Xu Mingmin created BEAM-1806:


 Summary: a new option `asLeftOuterJoin` for CoGroupByKey
 Key: BEAM-1806
 URL: https://issues.apache.org/jira/browse/BEAM-1806
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-core
Reporter: Xu Mingmin
Assignee: Xu Mingmin


Similar to BEAM-1805, but restricted to a left outer join.

The first {{PCollection}} is used as the key; if it is empty, the output is ignored.
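A minimal sketch of the intended left-outer-join semantics over already-grouped inputs (hypothetical helper, not the proposed Beam API): every key of the first (left) input produces output, pairing with right-side values when present and an empty list otherwise, while keys absent from the left side are dropped.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LeftOuterJoinDemo {
    // Pair each left key's values with the right side's values for that key,
    // defaulting to an empty list; right-only keys never appear in the output.
    static <K, V> Map<K, List<List<V>>> leftOuterJoin(
            Map<K, List<V>> left, Map<K, List<V>> right) {
        Map<K, List<List<V>>> out = new LinkedHashMap<>();
        for (Map.Entry<K, List<V>> e : left.entrySet()) {
            out.put(e.getKey(),
                List.of(e.getValue(), right.getOrDefault(e.getKey(), List.of())));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> left = Map.of("a", List.of(1), "b", List.of(2));
        Map<String, List<Integer>> right = Map.of("a", List.of(10), "c", List.of(30));
        System.out.println(leftOuterJoin(left, right)); // keys "a" and "b" only
    }
}
```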



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (BEAM-1805) a new option `asInnerJoin` for CoGroupByKey

2017-03-24 Thread Xu Mingmin (JIRA)
Xu Mingmin created BEAM-1805:


 Summary: a new option `asInnerJoin` for CoGroupByKey
 Key: BEAM-1805
 URL: https://issues.apache.org/jira/browse/BEAM-1805
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-core
Reporter: Xu Mingmin
Assignee: Xu Mingmin


{{CoGroupByKey}} joins multiple {{PCollection<KV<K, V>>}}s, acting as a full outer join.

An {{asInnerJoin()}} option would restrict the output to inner-join behavior.
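A minimal sketch of the inner-join restriction over already-grouped inputs (hypothetical helper, not the proposed Beam API): only keys present in both inputs are emitted, unlike the default full-outer behavior that emits every key seen in any input.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class InnerJoinDemo {
    // Emit a key only when both grouped inputs contain it.
    static <K, V> Map<K, List<List<V>>> innerJoin(
            Map<K, List<V>> a, Map<K, List<V>> b) {
        Map<K, List<List<V>>> out = new LinkedHashMap<>();
        for (Map.Entry<K, List<V>> e : a.entrySet()) {
            List<V> match = b.get(e.getKey());
            if (match != null) {
                out.put(e.getKey(), List.of(e.getValue(), match));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(innerJoin(
            Map.of("a", List.of(1), "b", List.of(2)),
            Map.of("a", List.of(10)))); // only key "a" survives
    }
}
```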



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Jenkins build is back to stable : beam_PostCommit_Java_MavenInstall #3024

2017-03-24 Thread Apache Jenkins Server
See 




[jira] [Created] (BEAM-1804) expose values in CoGbkResult as Iterable

2017-03-24 Thread Xu Mingmin (JIRA)
Xu Mingmin created BEAM-1804:


 Summary: expose values in CoGbkResult as Iterable
 Key: BEAM-1804
 URL: https://issues.apache.org/jira/browse/BEAM-1804
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-core
Reporter: Xu Mingmin
Assignee: Xu Mingmin


The joined result of {{CoGroupByKey}} is stored in the {{List<Iterable<?>> valueMap}} of {{CoGbkResult}}.

Currently it is exposed as a singleton value via {{<V> V getOnly(TupleTag<V> tag)}}, which throws an exception if there is more than one record for the given tag.

I would like to add a new method, {{<V> Iterator<V> getIterator(TupleTag<V> tag)}}, to cover the multi-row case.
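A simplified model (hypothetical, not the real {{CoGbkResult}}; tags and values are plain strings here) illustrating why a singleton accessor fails on multi-row tags while an iterator-style accessor handles them:

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class CoGbkResultSketch {
    // Hypothetical stand-in for per-tag grouped values.
    private final Map<String, List<String>> valuesByTag;

    CoGbkResultSketch(Map<String, List<String>> valuesByTag) {
        this.valuesByTag = valuesByTag;
    }

    // Analogue of the existing singleton accessor: fails on multi-row tags.
    String getOnly(String tag) {
        List<String> values = valuesByTag.getOrDefault(tag, List.of());
        if (values.size() != 1) {
            throw new IllegalStateException("expected exactly one value for tag " + tag);
        }
        return values.get(0);
    }

    // Analogue of the proposed iterator accessor: covers the multi-row case.
    Iterator<String> getIterator(String tag) {
        return valuesByTag.getOrDefault(tag, List.of()).iterator();
    }

    public static void main(String[] args) {
        CoGbkResultSketch result =
            new CoGbkResultSketch(Map.of("orders", List.of("o1", "o2")));
        Iterator<String> it = result.getIterator("orders");
        while (it.hasNext()) {
            System.out.println(it.next()); // both rows are reachable
        }
    }
}
```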



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Jenkins build is still unstable: beam_PostCommit_Java_RunnableOnService_Spark #1364

2017-03-24 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-1751) Singleton ByteKeyRange with BigtableIO and Dataflow runner

2017-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941217#comment-15941217
 ] 

ASF GitHub Bot commented on BEAM-1751:
--

GitHub user dhalperi opened a pull request:

https://github.com/apache/beam/pull/2324

[BEAM-1751] BigtableIO: trivial fixes to log messages



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dhalperi/beam bigtable-string-fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2324.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2324






> Singleton ByteKeyRange with BigtableIO and Dataflow runner
> --
>
> Key: BEAM-1751
> URL: https://issues.apache.org/jira/browse/BEAM-1751
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-java-gcp
>Affects Versions: 0.5.0
>Reporter: peay
>Assignee: Daniel Halperin
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2324: [BEAM-1751] BigtableIO: trivial fixes to log messag...

2017-03-24 Thread dhalperi
GitHub user dhalperi opened a pull request:

https://github.com/apache/beam/pull/2324

[BEAM-1751] BigtableIO: trivial fixes to log messages









---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Jenkins build is still unstable: beam_PostCommit_Java_RunnableOnService_Spark #1363

2017-03-24 Thread Apache Jenkins Server
See 




[GitHub] beam pull request #2318: Add link to contributions guides to github README

2017-03-24 Thread robertwb
Github user robertwb closed the pull request at:

https://github.com/apache/beam/pull/2318


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] beam pull request #2319: Allow customizing the GCE metadata service URL.

2017-03-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2319


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/2] beam git commit: Allow customizing the GCE metadata service URL.

2017-03-24 Thread robertwb
Repository: beam
Updated Branches:
  refs/heads/master 9b8e16fca -> c80ef1837


Allow customizing the GCE metadata service URL.

The goal here is to allow a user to customize where a job finds the metadata
service; it would also be possible to do this programmatically (e.g., expose a
variable), but making it an environment variable allows a caller to do this
without needing to do so in-process.
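The pattern this commit introduces, reading an endpoint override from the environment with a fallback to the default host, can be sketched as follows. The env map is a parameter here purely so the logic is testable; the {{GCE_METADATA_ROOT}} variable name and default host come from the diff below.

```java
import java.util.Map;

public class MetadataRootDemo {
    // Resolve the metadata host from an env override, falling back to the
    // well-known default, then build the token URL from it.
    static String tokenUrl(Map<String, String> env) {
        String root = env.getOrDefault("GCE_METADATA_ROOT", "metadata.google.internal");
        return "http://" + root
            + "/computeMetadata/v1/instance/service-accounts/default/token";
    }

    public static void main(String[] args) {
        System.out.println(tokenUrl(Map.of()));                     // default host
        System.out.println(tokenUrl(Map.of("GCE_METADATA_ROOT",    // overridden
                                           "localhost:8080")));
    }
}
```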


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/7453766b
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/7453766b
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/7453766b

Branch: refs/heads/master
Commit: 7453766b0045fabcb8e6e683b30f7fa1a20b334c
Parents: 9b8e16f
Author: Craig Citro 
Authored: Fri Mar 24 13:38:50 2017 -0700
Committer: Robert Bradshaw 
Committed: Fri Mar 24 14:47:19 2017 -0700

--
 sdks/python/apache_beam/internal/gcp/auth.py | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/7453766b/sdks/python/apache_beam/internal/gcp/auth.py
--
diff --git a/sdks/python/apache_beam/internal/gcp/auth.py 
b/sdks/python/apache_beam/internal/gcp/auth.py
index ccc67c6..8304658 100644
--- a/sdks/python/apache_beam/internal/gcp/auth.py
+++ b/sdks/python/apache_beam/internal/gcp/auth.py
@@ -86,9 +86,11 @@ class GCEMetadataCredentials(OAuth2Credentials):
   retry_filter=retry.retry_on_server_errors_and_timeout_filter)
   def _refresh(self, http_request):
 refresh_time = datetime.datetime.now()
-req = urllib2.Request('http://metadata.google.internal/computeMetadata/v1/'
-  'instance/service-accounts/default/token',
-  headers={'Metadata-Flavor': 'Google'})
+metadata_root = os.environ.get(
+'GCE_METADATA_ROOT', 'metadata.google.internal')
+token_url = ('http://{}/computeMetadata/v1/instance/service-accounts/'
+ 'default/token').format(metadata_root)
+req = urllib2.Request(token_url, headers={'Metadata-Flavor': 'Google'})
 token_data = json.loads(urllib2.urlopen(req).read())
 self.access_token = token_data['access_token']
 self.token_expiry = (refresh_time +



[2/2] beam git commit: Closes #2319

2017-03-24 Thread robertwb
Closes #2319


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/c80ef183
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/c80ef183
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/c80ef183

Branch: refs/heads/master
Commit: c80ef18376cb8c5b2cf8c1d9ac6000386d1eceb2
Parents: 9b8e16f 7453766
Author: Robert Bradshaw 
Authored: Fri Mar 24 14:47:20 2017 -0700
Committer: Robert Bradshaw 
Committed: Fri Mar 24 14:47:20 2017 -0700

--
 sdks/python/apache_beam/internal/gcp/auth.py | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)
--




[jira] [Commented] (BEAM-1182) Direct runner should enforce encodability of unbounded source checkpoints

2017-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941180#comment-15941180
 ] 

ASF GitHub Bot commented on BEAM-1182:
--

GitHub user tgroh opened a pull request:

https://github.com/apache/beam/pull/2323

[BEAM-1182] Clone Checkpoints before resuming in the DirectRunner

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/beam clone_before_resume

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2323.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2323


commit 04fe4b63e84b56c455ccc966b591ae5e24d67149
Author: Thomas Groh 
Date:   2017-03-24T21:36:12Z

Always require an UnboundedSource to provide a Checkpoint Coder

The coder can do no work, but should always be specified.

commit 0de77171d5cd273e7915cf875d796b9550daa71e
Author: Thomas Groh 
Date:   2017-03-24T21:36:46Z

Clone before Resume in DirectRunner Unbounded Reads

This exercises the CheckpointMarkCoder of all Unbounded Sources in the
DirectRunner.




> Direct runner should enforce encodability of unbounded source checkpoints
> -
>
> Key: BEAM-1182
> URL: https://issues.apache.org/jira/browse/BEAM-1182
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Eugene Kirpichov
>Assignee: Thomas Groh
>
> As far as I can tell, 
> https://github.com/apache/incubator-beam/blob/master/runners/direct-java/src/main/java/org/apache/beam/runners/direct/UnboundedReadEvaluatorFactory.java
>  currently uses the reader's getCheckpoint() only as an in-memory object, 
> i.e. it's not exercising that 1) the checkpoint can be encoded at all, and 2) 
> that the reader can be resumed from an encoded/decoded checkpoint.
> I've seen cases in code reviews where people implemented a non-serializable 
> checkpoint, and direct runner tests didn't catch that because of this issue.
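The enforcement described here amounts to round-tripping the checkpoint through its coder instead of reusing the in-memory object, which fails fast when the checkpoint is not actually encodable. A sketch using plain Java serialization as a stand-in for Beam's {{CheckpointMarkCoder}}:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class CheckpointRoundTrip {
    // Encode the checkpoint and decode a fresh copy; a non-serializable
    // checkpoint fails here instead of silently working in-memory.
    static <T extends Serializable> T roundTrip(T checkpoint) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(checkpoint);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("checkpoint is not encodable", e);
        }
    }

    public static void main(String[] args) {
        // A serializable checkpoint (a plain String here) survives the round trip.
        System.out.println(roundTrip("offset:42"));
    }
}
```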



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2323: [BEAM-1182] Clone Checkpoints before resuming in th...

2017-03-24 Thread tgroh
GitHub user tgroh opened a pull request:

https://github.com/apache/beam/pull/2323

[BEAM-1182] Clone Checkpoints before resuming in the DirectRunner





---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] beam pull request #2322: [BEAM-1801] Fix Python Dataflow default job name

2017-03-24 Thread pabloem
GitHub user pabloem opened a pull request:

https://github.com/apache/beam/pull/2322

[BEAM-1801] Fix Python Dataflow default job name

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---
r: @aaltay 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pabloem/incubator-beam fix-py-def-jobname

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2322.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2322


commit 8a37324bc240336f43135523107dc47a86ddbb66
Author: Pablo 
Date:   2017-03-24T21:31:33Z

Fix Python Dataflow default job name




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-1751) Singleton ByteKeyRange with BigtableIO and Dataflow runner

2017-03-24 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941167#comment-15941167
 ] 

Daniel Halperin commented on BEAM-1751:
---

Oof. Good catch, thanks!

> Singleton ByteKeyRange with BigtableIO and Dataflow runner
> --
>
> Key: BEAM-1751
> URL: https://issues.apache.org/jira/browse/BEAM-1751
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-java-gcp
>Affects Versions: 0.5.0
>Reporter: peay
>Assignee: Daniel Halperin
>
> I am getting this exception on a smallish table of a couple hundreds of rows 
> from Bigtable, when running on Dataflow with a single worker.
> This doesn't occur with the direct runner on my laptop, only when running on 
> Dataflow. Backtrace is from Beam 0.5.
> {code}java.lang.IllegalArgumentException: Start [xx] must be less 
> than end [xx]
>   at 
> org.apache.beam.sdk.repackaged.com.google.common.base.Preconditions.checkArgument(Preconditions.java:146)
>   at 
> org.apache.beam.sdk.io.range.ByteKeyRange.(ByteKeyRange.java:288)
>   at 
> org.apache.beam.sdk.io.range.ByteKeyRange.withEndKey(ByteKeyRange.java:278)
>   at 
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$BigtableSource.withEndKey(BigtableIO.java:728)
>   at 
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$BigtableReader.splitAtFraction(BigtableIO.java:1034)
>   at 
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$BigtableReader.splitAtFraction(BigtableIO.java:953)
>   at 
> org.apache.beam.runners.dataflow.DataflowUnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$ResidualSource.getCheckpointMark(DataflowUnboundedReadFromBoundedSource.java:530)
>   at 
> org.apache.beam.runners.dataflow.DataflowUnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$Reader.getCheckpointMark(DataflowUnboundedReadFromBoundedSource.java:386)
>   at 
> org.apache.beam.runners.dataflow.DataflowUnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$Reader.getCheckpointMark(DataflowUnboundedReadFromBoundedSource.java:283)
>   at 
> com.google.cloud.dataflow.worker.runners.worker.StreamingModeExecutionContext.flushState(StreamingModeExecutionContext.java:278)
>   at 
> com.google.cloud.dataflow.worker.runners.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:778)
>   at 
> com.google.cloud.dataflow.worker.runners.worker.StreamingDataflowWorker.access$700(StreamingDataflowWorker.java:105)
>   at 
> com.google.cloud.dataflow.worker.runners.worker.StreamingDataflowWorker$9.run(StreamingDataflowWorker.java:858)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> This is in the log right before:
> {code}
> "Proposing to split 
> ByteKeyRangeTracker{range=ByteKeyRange{startKey=[xx], endKey=[]}, 
> position=null} at fraction 0.0 (key [xx])"   
> {code}
> I have replaced the actual key with {{xx}}, but it is always the same 
> everywhere. In 
> https://github.com/apache/beam/blob/e68a70e08c9fe00df9ec163d1532da130f69588a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/range/ByteKeyRange.java#L260,
>  the end position is obtained by truncating the fractional part of {{size * 
> fraction}}, such that the resulting offset can just be zero if {{fraction}} 
> is too small. `ByteKeyRange` does not allow a singleton range, however. Since 
> {{fraction}} is zero here, the call to {{splitAtFraction}} fails. 
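
The truncation described above can be illustrated with a small hedged sketch (the helper name and shape are illustrative assumptions, not the actual Beam code): flooring {{size * fraction}} collapses to offset 0 for a small enough fraction, so the proposed end key equals the start key and the range degenerates into a singleton.

```java
public class SplitOffsetSketch {
    // Truncation toward zero, mirroring the interpolation described
    // above; this helper is an illustrative assumption.
    static long splitOffset(long estimatedSizeBytes, double fraction) {
        return (long) (estimatedSizeBytes * fraction);
    }

    public static void main(String[] args) {
        long size = 805306368L; // estimated size reported in the logs
        System.out.println(splitOffset(size, 0.5)); // 402653184: a valid interior split
        System.out.println(splitOffset(size, 0.0)); // 0: end key == start key, singleton range
    }
}
```

With fraction 0.0 the computed end key equals the start key, which {{ByteKeyRange}} rejects, matching the {{Start [xx] must be less than end [xx]}} precondition failure above.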



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2320: Add a link to the contributor's guide to README.md.

2017-03-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2320




[2/2] beam git commit: This closes #2320

2017-03-24 Thread dhalperi
This closes #2320


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/9b8e16fc
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/9b8e16fc
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/9b8e16fc

Branch: refs/heads/master
Commit: 9b8e16fcab0c09ea06f299efb21c611cf3129bd7
Parents: 8044408 b98c797
Author: Dan Halperin 
Authored: Fri Mar 24 14:25:31 2017 -0700
Committer: Dan Halperin 
Committed: Fri Mar 24 14:25:31 2017 -0700

--
 README.md | 2 ++
 1 file changed, 2 insertions(+)
--




[1/2] beam git commit: Add a link to the contributor's guide to README.md.

2017-03-24 Thread dhalperi
Repository: beam
Updated Branches:
  refs/heads/master 804440826 -> 9b8e16fca


Add a link to the contributor's guide to README.md.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/b98c7973
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/b98c7973
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/b98c7973

Branch: refs/heads/master
Commit: b98c797335e3c6ab547e872c9c216576e24d4bb5
Parents: 8044408
Author: Craig Citro 
Authored: Fri Mar 24 13:47:08 2017 -0700
Committer: Craig Citro 
Committed: Fri Mar 24 13:47:08 2017 -0700

--
 README.md | 2 ++
 1 file changed, 2 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/b98c7973/README.md
--
diff --git a/README.md b/README.md
index 1577ac9..0878668 100644
--- a/README.md
+++ b/README.md
@@ -96,6 +96,8 @@ To get involved in Apache Beam:
 * [Subscribe](mailto:dev-subscr...@beam.apache.org) or 
[mail](mailto:d...@beam.apache.org) the 
[d...@beam.apache.org](http://mail-archives.apache.org/mod_mbox/beam-dev/) list.
 * Report issues on [JIRA](https://issues.apache.org/jira/browse/BEAM).
 
+We also have a [contributor's 
guide](https://beam.apache.org/contribute/contribution-guide/).
+
 ## More Information
 
 * [Apache Beam](http://beam.apache.org)



[jira] [Comment Edited] (BEAM-1751) Singleton ByteKeyRange with BigtableIO and Dataflow runner

2017-03-24 Thread peay (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941125#comment-15941125
 ] 

peay edited comment on BEAM-1751 at 3/24/17 9:13 PM:
-

Hum, good point -- I messed up my staging. With 
{{beam-sdks-java-io-google-cloud-platform-0.7.0-20170324.071656-17.jar}}, the 
error is gone, and I have confirmed that the collection contains all the rows 
from the table. Thanks a lot for looking into that! 

Logs have
{code}
  message: "%s: Interpolating for fraction %s yielded invalid split key %s."
{code}
which does not seem to get interpolated with actual values.

edit: I guess you want either `{}` or a `String.format` in that diff?
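
The un-substituted {{%s}} placeholders above are what you get when a {{String.format}}-style template is handed to a logger that never formats it; a minimal sketch of the difference (plain {{String.format}} here, not the actual Dataflow worker logging API):

```java
public class PlaceholderSketch {
    public static void main(String[] args) {
        String template = "%s: Interpolating for fraction %s yielded invalid split key %s.";
        // Substituted only when a formatter is actually applied:
        String formatted = String.format(template, "tracker", 0.0, "[]");
        System.out.println(formatted);
        // An API that logs the template verbatim (or one expecting
        // SLF4J-style {} markers) prints the raw "%s" text instead,
        // as in the log message quoted above.
        System.out.println(template);
    }
}
```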



was (Author: peay):
Hum, good point -- I messed up my staging. With 
{{beam-sdks-java-io-google-cloud-platform-0.7.0-20170324.071656-17.jar}}, the 
error is gone, and I have confirmed that the collection contains all the rows 
from the table. Thanks a lot for looking into that! 

Logs have
{code}
  message: "%s: Interpolating for fraction %s yielded invalid split key %s."
{code}
which does not seem to get interpolated with actual values.


> Singleton ByteKeyRange with BigtableIO and Dataflow runner
> --
>
> Key: BEAM-1751
> URL: https://issues.apache.org/jira/browse/BEAM-1751
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-java-gcp
>Affects Versions: 0.5.0
>Reporter: peay
>Assignee: Daniel Halperin
>
> I am getting this exception on a smallish table of a couple hundred rows 
> from Bigtable, when running on Dataflow with a single worker.
> This doesn't occur with the direct runner on my laptop, only when running on 
> Dataflow. Backtrace is from Beam 0.5.
> {code}java.lang.IllegalArgumentException: Start [xx] must be less 
> than end [xx]
>   at 
> org.apache.beam.sdk.repackaged.com.google.common.base.Preconditions.checkArgument(Preconditions.java:146)
>   at 
> org.apache.beam.sdk.io.range.ByteKeyRange.&lt;init&gt;(ByteKeyRange.java:288)
>   at 
> org.apache.beam.sdk.io.range.ByteKeyRange.withEndKey(ByteKeyRange.java:278)
>   at 
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$BigtableSource.withEndKey(BigtableIO.java:728)
>   at 
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$BigtableReader.splitAtFraction(BigtableIO.java:1034)
>   at 
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$BigtableReader.splitAtFraction(BigtableIO.java:953)
>   at 
> org.apache.beam.runners.dataflow.DataflowUnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$ResidualSource.getCheckpointMark(DataflowUnboundedReadFromBoundedSource.java:530)
>   at 
> org.apache.beam.runners.dataflow.DataflowUnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$Reader.getCheckpointMark(DataflowUnboundedReadFromBoundedSource.java:386)
>   at 
> org.apache.beam.runners.dataflow.DataflowUnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$Reader.getCheckpointMark(DataflowUnboundedReadFromBoundedSource.java:283)
>   at 
> com.google.cloud.dataflow.worker.runners.worker.StreamingModeExecutionContext.flushState(StreamingModeExecutionContext.java:278)
>   at 
> com.google.cloud.dataflow.worker.runners.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:778)
>   at 
> com.google.cloud.dataflow.worker.runners.worker.StreamingDataflowWorker.access$700(StreamingDataflowWorker.java:105)
>   at 
> com.google.cloud.dataflow.worker.runners.worker.StreamingDataflowWorker$9.run(StreamingDataflowWorker.java:858)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> This is in the log right before:
> {code}
> "Proposing to split 
> ByteKeyRangeTracker{range=ByteKeyRange{startKey=[xx], endKey=[]}, 
> position=null} at fraction 0.0 (key [xx])"   
> {code}
> I have replaced the actual key with {{xx}}, but it is always the same 
> everywhere. In 
> https://github.com/apache/beam/blob/e68a70e08c9fe00df9ec163d1532da130f69588a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/range/ByteKeyRange.java#L260,
>  the end position is obtained by truncating the fractional part of {{size * 
> fraction}}, such that the resulting offset can just be zero if {{fraction}} 
> is too small. `ByteKeyRange` does not allow a singleton range, however. Since 
> {{fraction}} is zero here, the call to {{splitAtFraction}} fails. 





[jira] [Commented] (BEAM-1655) Evaluate the PubsubUnboundedSource

2017-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941130#comment-15941130
 ] 

ASF GitHub Bot commented on BEAM-1655:
--

GitHub user tgroh opened a pull request:

https://github.com/apache/beam/pull/2321

[BEAM-1655] Enable nackAll on all PubsubCheckpoints

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---
This method should not fail if the checkpoint has not been decoded.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/beam unbounded_read_encode_decode

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2321.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2321


commit 09610cbbbcff2795edd8f5f175a2ba73e0496b17
Author: Thomas Groh 
Date:   2017-03-24T21:01:51Z

Enable nackAll on all PubsubCheckpoints

This method should not fail if the checkpoint has not been decoded.




> Evaluate the PubsubUnboundedSource
> --
>
> Key: BEAM-1655
> URL: https://issues.apache.org/jira/browse/BEAM-1655
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Reporter: Ben Chambers
>Assignee: Thomas Groh
>
> This source includes a lot of assumptions & assertions that may cause 
> problems on runners that implement UnboundedSources differently. For example:
> 1. It requires that finalizeCheckpoint is called at most once.
> 2. It requires that the checkpoint be finalized within the pubsub timeout, or 
> the messages will be redelivered.
> ... (possibly other assumptions) ... 
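
A hedged sketch of how a checkpoint could tolerate these runner differences (hypothetical class, not the actual {{PubsubCheckpoint}}): repeated finalization becomes a no-op after the first call, and nacking is always safe, even when there is nothing to nack.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

public class CheckpointSketch {
    private final AtomicBoolean finalized = new AtomicBoolean(false);
    private final List<String> ackIds;

    CheckpointSketch(List<String> ackIds) {
        this.ackIds = ackIds;
    }

    // A runner may call this more than once; only the first call "acks".
    boolean finalizeCheckpoint() {
        return finalized.compareAndSet(false, true);
    }

    // Nacking must never fail: it just reports the ack IDs (possibly
    // none) whose messages should be redelivered.
    List<String> nackAll() {
        return ackIds;
    }

    public static void main(String[] args) {
        CheckpointSketch cp = new CheckpointSketch(Arrays.asList("ack-1", "ack-2"));
        System.out.println(cp.finalizeCheckpoint()); // true: first call wins
        System.out.println(cp.finalizeCheckpoint()); // false: safe no-op
        System.out.println(cp.nackAll());
    }
}
```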





[GitHub] beam pull request #2321: [BEAM-1655] Enable nackAll on all PubsubCheckpoints

2017-03-24 Thread tgroh
GitHub user tgroh opened a pull request:

https://github.com/apache/beam/pull/2321

[BEAM-1655] Enable nackAll on all PubsubCheckpoints

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---
This method should not fail if the checkpoint has not been decoded.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/beam unbounded_read_encode_decode

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2321.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2321


commit 09610cbbbcff2795edd8f5f175a2ba73e0496b17
Author: Thomas Groh 
Date:   2017-03-24T21:01:51Z

Enable nackAll on all PubsubCheckpoints

This method should not fail if the checkpoint has not been decoded.






[jira] [Commented] (BEAM-1751) Singleton ByteKeyRange with BigtableIO and Dataflow runner

2017-03-24 Thread peay (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941125#comment-15941125
 ] 

peay commented on BEAM-1751:


Hum, good point -- I messed up my staging. With 
{{beam-sdks-java-io-google-cloud-platform-0.7.0-20170324.071656-17.jar}}, the 
error is gone, and I have confirmed that the collection contains all the rows 
from the table. Thanks a lot for looking into that! 

Logs have
{code}
  message: "%s: Interpolating for fraction %s yielded invalid split key %s."
{code}
which does not seem to get interpolated with actual values.


> Singleton ByteKeyRange with BigtableIO and Dataflow runner
> --
>
> Key: BEAM-1751
> URL: https://issues.apache.org/jira/browse/BEAM-1751
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-java-gcp
>Affects Versions: 0.5.0
>Reporter: peay
>Assignee: Daniel Halperin
>
> I am getting this exception on a smallish table of a couple hundred rows 
> from Bigtable, when running on Dataflow with a single worker.
> This doesn't occur with the direct runner on my laptop, only when running on 
> Dataflow. Backtrace is from Beam 0.5.
> {code}java.lang.IllegalArgumentException: Start [xx] must be less 
> than end [xx]
>   at 
> org.apache.beam.sdk.repackaged.com.google.common.base.Preconditions.checkArgument(Preconditions.java:146)
>   at 
> org.apache.beam.sdk.io.range.ByteKeyRange.&lt;init&gt;(ByteKeyRange.java:288)
>   at 
> org.apache.beam.sdk.io.range.ByteKeyRange.withEndKey(ByteKeyRange.java:278)
>   at 
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$BigtableSource.withEndKey(BigtableIO.java:728)
>   at 
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$BigtableReader.splitAtFraction(BigtableIO.java:1034)
>   at 
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$BigtableReader.splitAtFraction(BigtableIO.java:953)
>   at 
> org.apache.beam.runners.dataflow.DataflowUnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$ResidualSource.getCheckpointMark(DataflowUnboundedReadFromBoundedSource.java:530)
>   at 
> org.apache.beam.runners.dataflow.DataflowUnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$Reader.getCheckpointMark(DataflowUnboundedReadFromBoundedSource.java:386)
>   at 
> org.apache.beam.runners.dataflow.DataflowUnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$Reader.getCheckpointMark(DataflowUnboundedReadFromBoundedSource.java:283)
>   at 
> com.google.cloud.dataflow.worker.runners.worker.StreamingModeExecutionContext.flushState(StreamingModeExecutionContext.java:278)
>   at 
> com.google.cloud.dataflow.worker.runners.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:778)
>   at 
> com.google.cloud.dataflow.worker.runners.worker.StreamingDataflowWorker.access$700(StreamingDataflowWorker.java:105)
>   at 
> com.google.cloud.dataflow.worker.runners.worker.StreamingDataflowWorker$9.run(StreamingDataflowWorker.java:858)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> This is in the log right before:
> {code}
> "Proposing to split 
> ByteKeyRangeTracker{range=ByteKeyRange{startKey=[xx], endKey=[]}, 
> position=null} at fraction 0.0 (key [xx])"   
> {code}
> I have replaced the actual key with {{xx}}, but it is always the same 
> everywhere. In 
> https://github.com/apache/beam/blob/e68a70e08c9fe00df9ec163d1532da130f69588a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/range/ByteKeyRange.java#L260,
>  the end position is obtained by truncating the fractional part of {{size * 
> fraction}}, such that the resulting offset can just be zero if {{fraction}} 
> is too small. `ByteKeyRange` does not allow a singleton range, however. Since 
> {{fraction}} is zero here, the call to {{splitAtFraction}} fails. 





[GitHub] beam pull request #2320: Add a link to the contributor's guide to README.md.

2017-03-24 Thread craigcitro
GitHub user craigcitro opened a pull request:

https://github.com/apache/beam/pull/2320

Add a link to the contributor's guide to README.md.

PTAL @robertwb 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/craigcitro/beam readme

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2320.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2320


commit b98c797335e3c6ab547e872c9c216576e24d4bb5
Author: Craig Citro 
Date:   2017-03-24T20:47:08Z

Add a link to the contributor's guide to README.md.






[GitHub] beam pull request #2319: Allow customizing the GCE metadata service URL.

2017-03-24 Thread craigcitro
GitHub user craigcitro opened a pull request:

https://github.com/apache/beam/pull/2319

Allow customizing the GCE metadata service URL.

The goal here is to allow a user to customize where a job finds the metadata
service; it would also be possible to do this programmatically (e.g. expose a
variable), but making it an environment variable lets a caller customize it
without needing to do so in-process.

PTAL @robertwb 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/craigcitro/beam gce

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2319.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2319


commit ca4a2953b6739e874c2191ce1a0c74eb931acc27
Author: Craig Citro 
Date:   2017-03-24T20:38:50Z

Allow customizing the GCE metadata service URL.

The goal here is to allow a user to customize where a job finds the metadata
service; it would also be possible to do this programmatically (e.g. expose a
variable), but making it an environment variable lets a caller customize it
without needing to do so in-process.






[jira] [Closed] (BEAM-1047) DataflowRunner: support regionalization.

2017-03-24 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin closed BEAM-1047.
-
   Resolution: Fixed
Fix Version/s: First stable release

> DataflowRunner: support regionalization.
> 
>
> Key: BEAM-1047
> URL: https://issues.apache.org/jira/browse/BEAM-1047
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-dataflow
>Reporter: Pei He
>Assignee: Daniel Halperin
> Fix For: First stable release
>
>
> Tracking bug.





[GitHub] beam pull request #2318: Add link to contributions guides to github README

2017-03-24 Thread robertwb
GitHub user robertwb opened a pull request:

https://github.com/apache/beam/pull/2318

Add link to contributions guides to github README

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/robertwb/incubator-beam patch-5

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2318.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2318


commit 950c0a36bc7e8046e67ee22089af902e40f24f4a
Author: Robert Bradshaw 
Date:   2017-03-24T20:36:36Z

Add link to contributions guides to github README






[jira] [Commented] (BEAM-1803) Metrics filters have a mismatch in class-based namespace

2017-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941038#comment-15941038
 ] 

ASF GitHub Bot commented on BEAM-1803:
--

GitHub user pabloem opened a pull request:

https://github.com/apache/beam/pull/2317

[BEAM-1803] Fixing bug in Metrics filters. Adding unittests.

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---
r: @aviemzur ¿Can you take a look? It should be fairly simple.
Thanks.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pabloem/incubator-beam fix-filtering

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2317.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2317


commit a53a0f38206f6a93f5e9620773d98c25b9a630c0
Author: Pablo 
Date:   2017-03-24T19:53:10Z

Unittests for metrics filtering.




> Metrics filters have a mismatch in class-based namespace
> -
>
> Key: BEAM-1803
> URL: https://issues.apache.org/jira/browse/BEAM-1803
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>






[GitHub] beam pull request #2317: [BEAM-1803] Fixing bug in Metrics filters. Adding u...

2017-03-24 Thread pabloem
GitHub user pabloem opened a pull request:

https://github.com/apache/beam/pull/2317

[BEAM-1803] Fixing bug in Metrics filters. Adding unittests.

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---
r: @aviemzur ¿Can you take a look? It should be fairly simple.
Thanks.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pabloem/incubator-beam fix-filtering

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2317.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2317


commit a53a0f38206f6a93f5e9620773d98c25b9a630c0
Author: Pablo 
Date:   2017-03-24T19:53:10Z

Unittests for metrics filtering.






[jira] [Created] (BEAM-1803) Metrics filters have a mismatch in class-based namespace

2017-03-24 Thread Pablo Estrada (JIRA)
Pablo Estrada created BEAM-1803:
---

 Summary: Metrics filters have a mismatch in class-based namespace
 Key: BEAM-1803
 URL: https://issues.apache.org/jira/browse/BEAM-1803
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-core
Reporter: Pablo Estrada
Assignee: Pablo Estrada








Jenkins build is still unstable: beam_PostCommit_Java_RunnableOnService_Dataflow #2634

2017-03-24 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-1751) Singleton ByteKeyRange with BigtableIO and Dataflow runner

2017-03-24 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941026#comment-15941026
 ] 

Daniel Halperin commented on BEAM-1751:
---

The line numbers in your stack trace don't line up with the state of the repo 
near HEAD. Can you send a Dataflow Job ID and double-check the exact Beam 
snapshot version you were using?

> Singleton ByteKeyRange with BigtableIO and Dataflow runner
> --
>
> Key: BEAM-1751
> URL: https://issues.apache.org/jira/browse/BEAM-1751
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-java-gcp
>Affects Versions: 0.5.0
>Reporter: peay
>Assignee: Daniel Halperin
>
> I am getting this exception on a smallish table of a couple hundred rows 
> from Bigtable, when running on Dataflow with a single worker.
> This doesn't occur with the direct runner on my laptop, only when running on 
> Dataflow. Backtrace is from Beam 0.5.
> {code}java.lang.IllegalArgumentException: Start [xx] must be less 
> than end [xx]
>   at 
> org.apache.beam.sdk.repackaged.com.google.common.base.Preconditions.checkArgument(Preconditions.java:146)
>   at 
> org.apache.beam.sdk.io.range.ByteKeyRange.&lt;init&gt;(ByteKeyRange.java:288)
>   at 
> org.apache.beam.sdk.io.range.ByteKeyRange.withEndKey(ByteKeyRange.java:278)
>   at 
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$BigtableSource.withEndKey(BigtableIO.java:728)
>   at 
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$BigtableReader.splitAtFraction(BigtableIO.java:1034)
>   at 
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$BigtableReader.splitAtFraction(BigtableIO.java:953)
>   at 
> org.apache.beam.runners.dataflow.DataflowUnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$ResidualSource.getCheckpointMark(DataflowUnboundedReadFromBoundedSource.java:530)
>   at 
> org.apache.beam.runners.dataflow.DataflowUnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$Reader.getCheckpointMark(DataflowUnboundedReadFromBoundedSource.java:386)
>   at 
> org.apache.beam.runners.dataflow.DataflowUnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$Reader.getCheckpointMark(DataflowUnboundedReadFromBoundedSource.java:283)
>   at 
> com.google.cloud.dataflow.worker.runners.worker.StreamingModeExecutionContext.flushState(StreamingModeExecutionContext.java:278)
>   at 
> com.google.cloud.dataflow.worker.runners.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:778)
>   at 
> com.google.cloud.dataflow.worker.runners.worker.StreamingDataflowWorker.access$700(StreamingDataflowWorker.java:105)
>   at 
> com.google.cloud.dataflow.worker.runners.worker.StreamingDataflowWorker$9.run(StreamingDataflowWorker.java:858)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> This is in the log right before:
> {code}
> "Proposing to split 
> ByteKeyRangeTracker{range=ByteKeyRange{startKey=[xx], endKey=[]}, 
> position=null} at fraction 0.0 (key [xx])"   
> {code}
> I have replaced the actual key with {{xx}}, but it is always the same 
> everywhere. In 
> https://github.com/apache/beam/blob/e68a70e08c9fe00df9ec163d1532da130f69588a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/range/ByteKeyRange.java#L260,
>  the end position is obtained by truncating the fractional part of {{size * 
> fraction}}, such that the resulting offset can just be zero if {{fraction}} 
> is too small. `ByteKeyRange` does not allow a singleton range, however. Since 
> {{fraction}} is zero here, the call to {{splitAtFraction}} fails. 





[jira] [Commented] (BEAM-1751) Singleton ByteKeyRange with BigtableIO and Dataflow runner

2017-03-24 Thread peay (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941015#comment-15941015
 ] 

peay commented on BEAM-1751:


I just ran it with {{0.7.0-SNAPSHOT}} from this morning -- same issue, nothing 
changed.

{code}
Exception: java.lang.IllegalArgumentException: Start [] must be less than 
end []
org.apache.beam.sdk.repackaged.com.google.common.base.Preconditions.checkArgument(Preconditions.java:146)
org.apache.beam.sdk.io.range.ByteKeyRange.&lt;init&gt;(ByteKeyRange.java:288)
org.apache.beam.sdk.io.range.ByteKeyRange.withEndKey(ByteKeyRange.java:278)
org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$BigtableSource.withEndKey(BigtableIO.java:728)
org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$BigtableReader.splitAtFraction(BigtableIO.java:1034)
org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$BigtableReader.splitAtFraction(BigtableIO.java:953)
org.apache.beam.runners.dataflow.DataflowUnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$ResidualSource.getCheckpointMark(DataflowUnboundedReadFromBoundedSource.java:530)
org.apache.beam.runners.dataflow.DataflowUnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$Reader.getCheckpointMark(DataflowUnboundedReadFromBoundedSource.java:386)
org.apache.beam.runners.dataflow.DataflowUnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$Reader.getCheckpointMark(DataflowUnboundedReadFromBoundedSource.java:283)
com.google.cloud.dataflow.worker.runners.worker.StreamingModeExecutionContext.flushState(StreamingModeExecutionContext.java:278)
{code}

When submitting the job,
{code}
15:34:11.767 [main] INFO  o.a.b.sdk.io.gcp.bigtable.BigtableIO - About to split 
into bundles of size 805306368 with sampleRowKeys length 1 first element 
offset_bytes: 805306368
15:34:11.767 [main] INFO  o.a.b.sdk.io.gcp.bigtable.BigtableIO - Generated 1 
splits. First split: BigtableSource{tableId=apRadioProperties-default, 
filter=null, range=ByteKeyRange{startKey=[], endKey=[]}, 
estimatedSizeBytes=805306368}
{code}

From the worker logs:
{code}
"Proposing to split ByteKeyRangeTracker{range=ByteKeyRange{startKey=[], 
endKey=[]}, position=null} at fraction 0.0 (key [])"
{code}

> Singleton ByteKeyRange with BigtableIO and Dataflow runner
> --
>
> Key: BEAM-1751
> URL: https://issues.apache.org/jira/browse/BEAM-1751
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-java-gcp
>Affects Versions: 0.5.0
>Reporter: peay
>Assignee: Daniel Halperin
>
> I am getting this exception on a smallish table of a couple hundreds of rows 
> from Bigtable, when running on Dataflow with a single worker.
> This doesn't occur with the direct runner on my laptop, only when running on 
> Dataflow. Backtrace is from Beam 0.5.
> {code}java.lang.IllegalArgumentException: Start [xx] must be less 
> than end [xx]
>   at 
> org.apache.beam.sdk.repackaged.com.google.common.base.Preconditions.checkArgument(Preconditions.java:146)
>   at 
> org.apache.beam.sdk.io.range.ByteKeyRange.<init>(ByteKeyRange.java:288)
>   at 
> org.apache.beam.sdk.io.range.ByteKeyRange.withEndKey(ByteKeyRange.java:278)
>   at 
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$BigtableSource.withEndKey(BigtableIO.java:728)
>   at 
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$BigtableReader.splitAtFraction(BigtableIO.java:1034)
>   at 
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$BigtableReader.splitAtFraction(BigtableIO.java:953)
>   at 
> org.apache.beam.runners.dataflow.DataflowUnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$ResidualSource.getCheckpointMark(DataflowUnboundedReadFromBoundedSource.java:530)
>   at 
> org.apache.beam.runners.dataflow.DataflowUnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$Reader.getCheckpointMark(DataflowUnboundedReadFromBoundedSource.java:386)
>   at 
> org.apache.beam.runners.dataflow.DataflowUnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$Reader.getCheckpointMark(DataflowUnboundedReadFromBoundedSource.java:283)
>   at 
> com.google.cloud.dataflow.worker.runners.worker.StreamingModeExecutionContext.flushState(StreamingModeExecutionContext.java:278)
>   at 
> com.google.cloud.dataflow.worker.runners.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:778)
>   at 
> com.google.cloud.dataflow.worker.runners.worker.StreamingDataflowWorker.access$700(StreamingDataflowWorker.java:105)
>   at 
> com.google.cloud.dataflow.worker.runners.worker.StreamingDataflowWorker$9.run(StreamingDataflowWorker.java:858)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at 

Jenkins build became unstable: beam_PostCommit_Java_MavenInstall #3023

2017-03-24 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_Java_RunnableOnService_Spark #1362

2017-03-24 Thread Apache Jenkins Server
See 




[GitHub] beam pull request #2316: [BEAM-1778] Clean up unused dataflow javadoc direct...

2017-03-24 Thread melap
GitHub user melap opened a pull request:

https://github.com/apache/beam/pull/2316

[BEAM-1778] Clean up unused dataflow javadoc directory

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/melap/beam removedir

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2316.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2316






---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-778) Make fileio._CompressedFile seekable.

2017-03-24 Thread Konstantinos Katsiapis (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940890#comment-15940890
 ] 

Konstantinos Katsiapis commented on BEAM-778:
-

Yes, when I looked at that in the past, it was very easy to use 
gzip.GzipFile(..., fileobj=self._file, ...) [1], but unfortunately not so for 
Bzip2 [2] or Snappy [3]. And we wanted to share as much implementation as 
possible (as opposed to having completely different codepaths for each 
compression type).

Provided that we can have a single interface that allows us to handle 
Gzip/Bzip2 (and ideally, in the future, Snappy and other whole-file compression 
techniques) with minimal diffs, changing the underlying implementation is, I 
think, fair game.

[1] https://docs.python.org/2/library/gzip.html#gzip.GzipFile and 
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/filesystem.py#L103
[2] https://docs.python.org/2/library/bz2.html#bz2.BZ2File
[3] https://pypi.python.org/pypi/python-snappy
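
For reference, a small standalone sketch of the GzipFile-wrapping approach mentioned in [1], showing that the wrapped object already supports seek() and tell() on the decompressed stream — which is the behavior BEAM-778 wants from fileio._CompressedFile. This is an illustration only, not Beam's implementation:

```python
import gzip
import io

# Compress into an in-memory buffer, mimicking wrapping an arbitrary fileobj.
raw = io.BytesIO()
with gzip.GzipFile(fileobj=raw, mode='wb') as f:
    f.write(b'hello beam')

raw.seek(0)
reader = gzip.GzipFile(fileobj=raw, mode='rb')
first = reader.read(5)       # first 5 decompressed bytes
pos = reader.tell()          # position in the *decompressed* stream
reader.seek(0)               # seeking rewinds and re-decompresses
everything = reader.read()
```

A library like tarfile can then call seek()/tell() on the reader transparently; the open question in this issue is providing the same interface uniformly for Bzip2 and Snappy.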

> Make fileio._CompressedFile seekable.
> -
>
> Key: BEAM-778
> URL: https://issues.apache.org/jira/browse/BEAM-778
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Chamikara Jayalath
>Assignee: Tibor Kiss
> Fix For: Not applicable
>
>
> We have a TODO to make fileio._CompressedFile seekable.
> https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/fileio.py#L692
> Without this, compressed file objects produce for FileBasedSource 
> implementations may not be able to use libraries that utilize methods seek() 
> and tell().
> For example tarfile.open().





Jenkins build is still unstable: beam_PostCommit_Java_RunnableOnService_Dataflow #2633

2017-03-24 Thread Apache Jenkins Server
See 




[jira] [Closed] (BEAM-1794) Bigtable: improve user agent

2017-03-24 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin closed BEAM-1794.
-
   Resolution: Fixed
Fix Version/s: First stable release

> Bigtable: improve user agent
> 
>
> Key: BEAM-1794
> URL: https://issues.apache.org/jira/browse/BEAM-1794
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
> Fix For: First stable release
>
>
> The bigtable-client-core has changed the way it generates user agent strings 
> to automatically provide the information we were providing manually before. 
> Update the BigtableIO client code to fit the new improved scheme.





[jira] [Commented] (BEAM-1794) Bigtable: improve user agent

2017-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940844#comment-15940844
 ] 

ASF GitHub Bot commented on BEAM-1794:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2306


> Bigtable: improve user agent
> 
>
> Key: BEAM-1794
> URL: https://issues.apache.org/jira/browse/BEAM-1794
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
>
> The bigtable-client-core has changed the way it generates user agent strings 
> to automatically provide the information we were providing manually before. 
> Update the BigtableIO client code to fit the new improved scheme.





[1/2] beam git commit: [BEAM-1794] BigtableIO: update user agent computation for new bigtable-client-core

2017-03-24 Thread dhalperi
Repository: beam
Updated Branches:
  refs/heads/master 5903e6994 -> 804440826


[BEAM-1794] BigtableIO: update user agent computation for new 
bigtable-client-core


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/282c3ff2
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/282c3ff2
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/282c3ff2

Branch: refs/heads/master
Commit: 282c3ff2d599d23eeb7ad8aaa5af26f9020fc3f4
Parents: 5903e69
Author: Dan Halperin 
Authored: Wed Mar 22 10:21:33 2017 -0700
Committer: Dan Halperin 
Committed: Fri Mar 24 11:07:15 2017 -0700

--
 .../beam/sdk/io/gcp/bigtable/BigtableIO.java| 24 +++-
 1 file changed, 13 insertions(+), 11 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/282c3ff2/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java
--
diff --git 
a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java
 
b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java
index 9d02f65..a052e09 100644
--- 
a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java
+++ 
b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java
@@ -215,7 +215,8 @@ public class BigtableIO {
   // Set data channel count to one because there is only 1 scanner in this 
session
   BigtableOptions.Builder clonedBuilder = options.toBuilder()
   .setDataChannelCount(1);
-  BigtableOptions optionsWithAgent = 
clonedBuilder.setUserAgent(getUserAgent()).build();
+  BigtableOptions optionsWithAgent =
+  clonedBuilder.setUserAgent(getBeamSdkPartOfUserAgent()).build();
 
   return new Read(optionsWithAgent, tableId, keyRange, filter, 
bigtableService);
 }
@@ -449,7 +450,8 @@ public class BigtableIO {
   options.getBulkOptions().toBuilder()
   .setUseBulkApi(true)
   .build());
-  BigtableOptions optionsWithAgent = 
clonedBuilder.setUserAgent(getUserAgent()).build();
+  BigtableOptions optionsWithAgent =
+  clonedBuilder.setUserAgent(getBeamSdkPartOfUserAgent()).build();
   return new Write(optionsWithAgent, tableId, bigtableService);
 }
 
@@ -1047,16 +1049,16 @@ public class BigtableIO {
   }
 
   /**
-   * A helper function to produce a Cloud Bigtable user agent string.
+   * A helper function to produce a Cloud Bigtable user agent string. This 
need only include
+   * information about the Apache Beam SDK itself, because Bigtable will 
automatically append
+   * other relevant system and Bigtable client-specific version information.
+   *
+   * @see com.google.cloud.bigtable.config.BigtableVersionInfo
*/
-  private static String getUserAgent() {
-String javaVersion = System.getProperty("java.specification.version");
+  private static String getBeamSdkPartOfUserAgent() {
 ReleaseInfo info = ReleaseInfo.getReleaseInfo();
-return String.format(
-"%s/%s (%s); %s",
-info.getName(),
-info.getVersion(),
-javaVersion,
-"0.3.0" /* TODO get Bigtable client version directly from jar. */);
+return
+String.format("%s/%s", info.getName(), info.getVersion())
+.replace(" ", "_");
   }
 }
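
The format of the string the new getBeamSdkPartOfUserAgent() helper produces can be sketched as follows. The SDK name and version below are illustrative values, not necessarily what ReleaseInfo returns:

```python
# Sketch of the user-agent fragment built by the commit above:
# "name/version" with spaces replaced by underscores; system and
# Bigtable-client version details are appended by bigtable-client-core.

def beam_sdk_user_agent(name: str, version: str) -> str:
    return ('%s/%s' % (name, version)).replace(' ', '_')

agent = beam_sdk_user_agent('Apache Beam SDK for Java', '0.7.0-SNAPSHOT')
```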



[1/2] beam git commit: This closes #2315

2017-03-24 Thread tgroh
Repository: beam
Updated Branches:
  refs/heads/master 5c2da7dc2 -> 5903e6994


This closes #2315


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/5903e699
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/5903e699
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/5903e699

Branch: refs/heads/master
Commit: 5903e6994875b671e31a2a53c1a9102859eaf022
Parents: 5c2da7d ae31f55
Author: Thomas Groh 
Authored: Fri Mar 24 10:43:07 2017 -0700
Committer: Thomas Groh 
Committed: Fri Mar 24 10:43:07 2017 -0700

--
 .../java/org/apache/beam/sdk/options/StreamingOptions.java| 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)
--




[jira] [Commented] (BEAM-849) Redesign PipelineResult API

2017-03-24 Thread Davor Bonaci (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940793#comment-15940793
 ] 

Davor Bonaci commented on BEAM-849:
---

This would be very nice to have, but I think it's a lot of work -- and I don't 
know that anybody is actively working on it... so, I'd say it is unlikely to 
happen. I wouldn't track it, then.

> Redesign PipelineResult API
> ---
>
> Key: BEAM-849
> URL: https://issues.apache.org/jira/browse/BEAM-849
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Pei He
>
> Current state: 
> Jira https://issues.apache.org/jira/browse/BEAM-443 addresses 
> waitUntilFinish() and cancel(). 
> However, there is additional work around PipelineResult: 
> need clearly defined contract and verification across all runners 
> need to revisit how to handle metrics/aggregators 
> need to be able to get logs





[GitHub] beam pull request #2315: Hide the isStreaming PipelineOption

2017-03-24 Thread tgroh
GitHub user tgroh opened a pull request:

https://github.com/apache/beam/pull/2315

Hide the isStreaming PipelineOption

This is automatically inferred when necessary in all existing runners.
Users should not need to set it unless they wish to force an alternative
execution environment in a runner that respects this flag while running
a Bounded Pipeline.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/beam hide_isStreaming

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2315.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2315


commit 50f0c12fc6facaba7f31440193b5caad789fd009
Author: Thomas Groh 
Date:   2017-03-24T17:28:23Z

Hide the isStreaming PipelineOption

This is automatically inferred when necessary in all existing runners.
Users should not need to set it unless they wish to force an alternative
execution environment in a runner that respects this flag while running
a Bounded Pipeline.






[jira] [Commented] (BEAM-849) Redesign PipelineResult API

2017-03-24 Thread JIRA

[ 
https://issues.apache.org/jira/browse/BEAM-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940768#comment-15940768
 ] 

Ismaël Mejía commented on BEAM-849:
---

I have missed this JIRA somehow; does it make sense to make this one part of 
the "First Stable Release" group?

> Redesign PipelineResult API
> ---
>
> Key: BEAM-849
> URL: https://issues.apache.org/jira/browse/BEAM-849
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Pei He
>
> Current state: 
> Jira https://issues.apache.org/jira/browse/BEAM-443 addresses 
> waitUntilFinish() and cancel(). 
> However, there is additional work around PipelineResult: 
> need clearly defined contract and verification across all runners 
> need to revisit how to handle metrics/aggregators 
> need to be able to get logs





[jira] [Commented] (BEAM-1514) change default timestamp in KafkaIO

2017-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940760#comment-15940760
 ] 

ASF GitHub Bot commented on BEAM-1514:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2267


> change default timestamp in KafkaIO
> ---
>
> Key: BEAM-1514
> URL: https://issues.apache.org/jira/browse/BEAM-1514
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Reporter: Xu Mingmin
>Assignee: Xu Mingmin
>
> When user use Kafka 0.10, the field 'timestamp' from Kafka should be used as 
> the default event timestamp.





[GitHub] beam pull request #2267: [BEAM-1514] change default timestamp in KafkaIO

2017-03-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2267




[1/2] beam git commit: append to #2135, add 1). fix issue of NO_TIMESTAMP type in 10; 2). rename field to 'timestamp';

2017-03-24 Thread davor
Repository: beam
Updated Branches:
  refs/heads/master 741242732 -> 5c2da7dc2


append to #2135, add
1). fix issue of NO_TIMESTAMP type in 10;
2). rename field to 'timestamp';


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/f10509e7
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/f10509e7
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/f10509e7

Branch: refs/heads/master
Commit: f10509e745ff234110bc50d16aba1cb6813036b6
Parents: 7412427
Author: mingmxu 
Authored: Fri Mar 17 13:18:17 2017 -0700
Committer: Davor Bonaci 
Committed: Fri Mar 24 10:14:33 2017 -0700

--
 .../apache/beam/sdk/io/kafka/ConsumerSpEL.java  | 43 +---
 .../org/apache/beam/sdk/io/kafka/KafkaIO.java   | 21 +-
 .../apache/beam/sdk/io/kafka/KafkaRecord.java   | 15 +--
 .../beam/sdk/io/kafka/KafkaRecordCoder.java |  5 +++
 4 files changed, 73 insertions(+), 11 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/f10509e7/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ConsumerSpEL.java
--
diff --git 
a/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ConsumerSpEL.java
 
b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ConsumerSpEL.java
index b92b6fa..8fe17c1 100644
--- 
a/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ConsumerSpEL.java
+++ 
b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ConsumerSpEL.java
@@ -17,10 +17,14 @@
  */
 package org.apache.beam.sdk.io.kafka;
 
+import java.lang.reflect.InvocationTargetException;
+import java.lang.reflect.Method;
 import java.util.Collection;
-
 import org.apache.kafka.clients.consumer.Consumer;
+import org.apache.kafka.clients.consumer.ConsumerRecord;
 import org.apache.kafka.common.TopicPartition;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 import org.springframework.expression.Expression;
 import org.springframework.expression.ExpressionParser;
 import org.springframework.expression.spel.SpelParserConfiguration;
@@ -33,16 +37,28 @@ import 
org.springframework.expression.spel.support.StandardEvaluationContext;
  * to eliminate the method definition differences.
  */
 class ConsumerSpEL {
-  SpelParserConfiguration config = new SpelParserConfiguration(true, true);
-  ExpressionParser parser = new SpelExpressionParser(config);
+  private static final Logger LOG = 
LoggerFactory.getLogger(ConsumerSpEL.class);
+
+  private SpelParserConfiguration config = new SpelParserConfiguration(true, 
true);
+  private ExpressionParser parser = new SpelExpressionParser(config);
 
-  Expression seek2endExpression =
+  private Expression seek2endExpression =
   parser.parseExpression("#consumer.seekToEnd(#tp)");
 
-  Expression assignExpression =
+  private Expression assignExpression =
   parser.parseExpression("#consumer.assign(#tp)");
 
-  public ConsumerSpEL() {}
+  private Method timestampMethod;
+  private boolean hasRecordTimestamp = false;
+
+  public ConsumerSpEL() {
+try {
+  timestampMethod = ConsumerRecord.class.getMethod("timestamp", 
(Class[]) null);
+  hasRecordTimestamp = timestampMethod.getReturnType().equals(Long.TYPE);
+} catch (NoSuchMethodException | SecurityException e) {
+  LOG.debug("Timestamp for Kafka message is not available.");
+}
+  }
 
   public void evaluateSeek2End(Consumer consumer, TopicPartition 
topicPartitions) {
 StandardEvaluationContext mapContext = new StandardEvaluationContext();
@@ -57,4 +73,19 @@ class ConsumerSpEL {
 mapContext.setVariable("tp", topicPartitions);
 assignExpression.getValue(mapContext);
   }
+
+  public long getRecordTimestamp(ConsumerRecord rawRecord) {
+long timestamp;
+try {
+  //for Kafka 0.9, set to System.currentTimeMillis();
+  //for kafka 0.10, when NO_TIMESTAMP also set to 
System.currentTimeMillis();
+  if (!hasRecordTimestamp || (timestamp = (long) 
timestampMethod.invoke(rawRecord)) <= 0L) {
+timestamp = System.currentTimeMillis();
+  }
+} catch (IllegalAccessException | IllegalArgumentException | 
InvocationTargetException e) {
+  // Not expected. Method timestamp() is already checked.
+  throw new RuntimeException(e);
+}
+return timestamp;
+  }
 }
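
The reflective timestamp lookup added above follows a simple probe-once-and-fall-back pattern: check for the optional method at construction time, then use the current time whenever the method is absent or returns a non-positive sentinel. A language-agnostic sketch in Python, with hypothetical record classes standing in for Kafka 0.9- and 0.10-style ConsumerRecords:

```python
import time

class Record09:            # hypothetical Kafka 0.9-style record: no timestamp()
    pass

class Record010:           # hypothetical Kafka 0.10-style record
    def timestamp(self):
        return 1490000000000

def record_timestamp(record):
    # Probe for the optional method, mirroring the reflection in ConsumerSpEL.
    ts_method = getattr(record, 'timestamp', None)
    if ts_method is not None:
        ts = ts_method()
        if ts > 0:          # NO_TIMESTAMP and other sentinels fall through
            return ts
    return int(time.time() * 1000)   # fallback: current wall-clock millis
```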

http://git-wip-us.apache.org/repos/asf/beam/blob/f10509e7/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java
--
diff --git 
a/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java 
b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java
index 890fb2b..310392c 100644
--- 

[2/2] beam git commit: This closes #2267

2017-03-24 Thread davor
This closes #2267


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/5c2da7dc
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/5c2da7dc
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/5c2da7dc

Branch: refs/heads/master
Commit: 5c2da7dc2eb2a6a6ab7138ea4b37884d9327de7e
Parents: 7412427 f10509e
Author: Davor Bonaci 
Authored: Fri Mar 24 10:15:07 2017 -0700
Committer: Davor Bonaci 
Committed: Fri Mar 24 10:15:07 2017 -0700

--
 .../apache/beam/sdk/io/kafka/ConsumerSpEL.java  | 43 +---
 .../org/apache/beam/sdk/io/kafka/KafkaIO.java   | 21 +-
 .../apache/beam/sdk/io/kafka/KafkaRecord.java   | 15 +--
 .../beam/sdk/io/kafka/KafkaRecordCoder.java |  5 +++
 4 files changed, 73 insertions(+), 11 deletions(-)
--




[jira] [Commented] (BEAM-1795) Upgrade google-cloud-bigquery to 0.23.0

2017-03-24 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940757#comment-15940757
 ] 

Daniel Halperin commented on BEAM-1795:
---

I'd definitely encourage making inspection of the transitive dependencies part of the 
upgrade process, however. I review every Java {{pom.xml}} change by diffing the 
dependency tree. That's part of how you learn over time which libraries are 
well-behaved and which aren't.

> Upgrade google-cloud-bigquery to 0.23.0
> ---
>
> Key: BEAM-1795
> URL: https://issues.apache.org/jira/browse/BEAM-1795
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: Mark Liu
>
> Should we upgrade this?
> https://pypi.python.org/pypi/google-cloud-bigquery/0.23.0





[jira] [Commented] (BEAM-1800) Can't save datastore objects

2017-03-24 Thread Vikas Kedigehalli (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940746#comment-15940746
 ] 

Vikas Kedigehalli commented on BEAM-1800:
-

That is weird, because the protobuf message is serialized to a string before 
being sent to httplib 
(https://github.com/GoogleCloudPlatform/google-cloud-datastore/blob/master/python/googledatastore/connection.py#L191)

[~mlambert] could you bypass Beam and try writing directly using this 
library, to see if it fails? That would tell us whether it's a Beam issue or not. 
https://github.com/GoogleCloudPlatform/google-cloud-datastore/blob/master/python/googledatastore/connection.py#L127
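
The failure mode in the stacktrace — Python 2's httplib appending a unicode body onto a str message buffer — has a direct Python 3 analogue. This standalone sketch (illustrative values only, no Beam or Datastore client involved) shows the same type mismatch:

```python
# Python 3 analogue of the Python 2 error in the stacktrace
# ("TypeError: must be str, not unicode"): the HTTP client builds the
# request in one string type and fails when the body is the other.
headers = b"POST /commit HTTP/1.1\r\n\r\n"   # message buffer (bytes here)
body = "serialized proto as text"            # wrong type: should be bytes

try:
    message = headers + body                 # mixing types raises TypeError
    failed = False
except TypeError:
    failed = True
```

If the request body were bytes (as a protobuf's serialized form should be), the concatenation would succeed, which is why the question of where a unicode string sneaks in is the crux of this issue.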

> Can't save datastore objects
> 
>
> Key: BEAM-1800
> URL: https://issues.apache.org/jira/browse/BEAM-1800
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Mike Lambert
>Assignee: Vikas Kedigehalli
>
> I can't seem to save my database objects using {{WriteToDatastore}}, as it 
> errors out on a strange unicode issue when trying to write a batch. 
> Stacktrace follows:
> {noformat}
> File "apache_beam/runners/common.py", line 195, in 
> apache_beam.runners.common.DoFnRunner.receive 
> (apache_beam/runners/common.c:5142)
>   self.process(windowed_value) 
> File "apache_beam/runners/common.py", line 267, in 
> apache_beam.runners.common.DoFnRunner.process 
> (apache_beam/runners/common.c:7201)
>   self.reraise_augmented(exn) 
> File "apache_beam/runners/common.py", line 279, in 
> apache_beam.runners.common.DoFnRunner.reraise_augmented 
> (apache_beam/runners/common.c:7590)
>   raise type(exn), args, sys.exc_info()[2] 
> File "apache_beam/runners/common.py", line 263, in 
> apache_beam.runners.common.DoFnRunner.process 
> (apache_beam/runners/common.c:7090)
>   self._dofn_simple_invoker(element) 
> File "apache_beam/runners/common.py", line 198, in 
> apache_beam.runners.common.DoFnRunner._dofn_simple_invoker 
> (apache_beam/runners/common.c:5262)
>   self._process_outputs(element, self.dofn_process(element.value)) 
> File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/datastore/v1/datastoreio.py",
>  line 354, in process
>   self._flush_batch() 
> File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/datastore/v1/datastoreio.py",
>  line 363, in _flush_batch
>   helper.write_mutations(self._datastore, self._project, self._mutations) 
> File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/datastore/v1/helper.py",
>  line 187, in write_mutations
>   commit(commit_request) 
> File "/usr/local/lib/python2.7/dist-packages/apache_beam/utils/retry.py", 
> line 174, in wrapper
>   return fun(*args, **kwargs) 
> File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/datastore/v1/helper.py",
>  line 185, in commit
>   datastore.commit(req) 
> File "/usr/local/lib/python2.7/dist-packages/googledatastore/connection.py", 
> line 140, in commit
>   datastore_pb2.CommitResponse) 
> File "/usr/local/lib/python2.7/dist-packages/googledatastore/connection.py", 
> line 199, in _call_method
>   method='POST', body=payload, headers=headers) 
> File "/usr/local/lib/python2.7/dist-packages/oauth2client/client.py", line 
> 631, in new_request
>   redirections, connection_type) 
> File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 
> 1609, in request (response, content)
>   = self._request(conn, authority, uri, request_uri, method, body, headers, 
> redirections, cachekey) 
> File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 
> 1351, in _request (response, content)
>   = self._conn_request(conn, request_uri, method, body, headers) 
> File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 
> 1273, in _conn_request
>   conn.request(method, request_uri, body, headers) 
> File "/usr/lib/python2.7/httplib.py", line 1039, in request
>   self._send_request(method, url, body, headers)
> File "/usr/lib/python2.7/httplib.py", line 1073, in _send_request
>self.endheaders(body) 
> File "/usr/lib/python2.7/httplib.py", line 1035, in endheaders
>   self._send_output(message_body) 
> File "/usr/lib/python2.7/httplib.py", line 877, in _send_output
>   msg += message_body TypeError: must be str, not unicode
> [while running 'write to datastore/Convert to Mutation']
> {noformat}
> My code is basically:
> {noformat}
> | 'convert from entity' >> beam.Map(ConvertFromEntity)
> | 'write to datastore' >> WriteToDatastore(client.project)
> {noformat}
> Where {{ConvertFromEntity}} converts from a google.cloud.datastore object 
> (which has a nice API/interface) into the underlying protobuf (which is what 
> the beam gcp/datastore library expects):
> {noformat}
> from google.cloud.datastore import helpers
> def ConvertFromEntity(entity):
> return helpers.entity_to_protobuf(entity)

Jenkins build became unstable: beam_PostCommit_Java_RunnableOnService_Spark #1359

2017-03-24 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-1795) Upgrade google-cloud-bigquery to 0.23.0

2017-03-24 Thread Ahmet Altay (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940715#comment-15940715
 ] 

Ahmet Altay commented on BEAM-1795:
---

> Also wondering what's the general rules to upgrade a library version?
This is a good question and I think should be documented on the web page. (Also 
[~dhalp...@google.com] for the policy in Java)

Our current policy is to upgrade to the latest version of the dependencies as 
soon as possible, unless they introduce a breaking change, require a lot of code 
change, or a shared dependency makes it impossible for us to upgrade. 
In those cases we decide on a case-by-case basis.

In this case I would argue for upgrading now. It is easier to keep libraries 
up-to-date by upgrading in small incremental steps.



> Upgrade google-cloud-bigquery to 0.23.0
> ---
>
> Key: BEAM-1795
> URL: https://issues.apache.org/jira/browse/BEAM-1795
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: Mark Liu
>
> Should we upgrade this?
> https://pypi.python.org/pypi/google-cloud-bigquery/0.23.0





[jira] [Resolved] (BEAM-1447) Autodetect streaming/not streaming in DataflowRunner

2017-03-24 Thread Thomas Groh (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Groh resolved BEAM-1447.
---
   Resolution: Fixed
Fix Version/s: First stable release

> Autodetect streaming/not streaming in DataflowRunner
> 
>
> Key: BEAM-1447
> URL: https://issues.apache.org/jira/browse/BEAM-1447
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Kenneth Knowles
>Assignee: Thomas Groh
> Fix For: First stable release
>
>
> Once pipeline surgery happens after construction, the Dataflow runner should 
> be able to automatically decide how to execute a pipeline based on 
> PCollection boundedness.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1795) Upgrade google-cloud-bigquery to 0.23.0

2017-03-24 Thread Mark Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940693#comment-15940693
 ] 

Mark Liu commented on BEAM-1795:


I'd love to keep the version at the latest since we're currently using 0.22.1, 
but it seems there are no big changes/fixes related to the APIs we use in Beam 
(those are basic data queries from a table).

Also wondering: what are the general rules to upgrade a library version?

> Upgrade google-cloud-bigquery to 0.23.0
> ---
>
> Key: BEAM-1795
> URL: https://issues.apache.org/jira/browse/BEAM-1795
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: Mark Liu
>
> Should we upgrade this?
> https://pypi.python.org/pypi/google-cloud-bigquery/0.23.0



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2310: Remove unused field in FlinkRunner

2017-03-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2310


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/2] beam git commit: This closes #2310

2017-03-24 Thread tgroh
Repository: beam
Updated Branches:
  refs/heads/master 14aba8125 -> 741242732


This closes #2310


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/74124273
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/74124273
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/74124273

Branch: refs/heads/master
Commit: 7412427327c8c66e91e8d4c141854d1df343c1e0
Parents: 14aba81 92cdc08
Author: Thomas Groh 
Authored: Fri Mar 24 09:41:42 2017 -0700
Committer: Thomas Groh 
Committed: Fri Mar 24 09:41:42 2017 -0700

--
 .../apache/beam/runners/flink/FlinkRunner.java| 18 --
 1 file changed, 18 deletions(-)
--




[2/2] beam git commit: Remove unused field in FlinkRunner

2017-03-24 Thread tgroh
Remove unused field in FlinkRunner

These overrides are performed in FlinkStreamingPipelineTranslator


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/92cdc089
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/92cdc089
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/92cdc089

Branch: refs/heads/master
Commit: 92cdc0891f33afdc0ef7545fbd812532555631ff
Parents: 14aba81
Author: Thomas Groh 
Authored: Thu Mar 23 15:52:22 2017 -0700
Committer: Thomas Groh 
Committed: Fri Mar 24 09:41:42 2017 -0700

--
 .../apache/beam/runners/flink/FlinkRunner.java| 18 --
 1 file changed, 18 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/92cdc089/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/FlinkRunner.java
--
diff --git 
a/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/FlinkRunner.java
 
b/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/FlinkRunner.java
index 5610dd4..096f030 100644
--- 
a/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/FlinkRunner.java
+++ 
b/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/FlinkRunner.java
@@ -18,7 +18,6 @@
 package org.apache.beam.runners.flink;
 
 import com.google.common.base.Joiner;
-import com.google.common.collect.ImmutableMap;
 import java.io.File;
 import java.net.URISyntaxException;
 import java.net.URL;
@@ -36,7 +35,6 @@ import org.apache.beam.sdk.options.PipelineOptions;
 import org.apache.beam.sdk.options.PipelineOptionsValidator;
 import org.apache.beam.sdk.runners.PipelineRunner;
 import org.apache.beam.sdk.runners.TransformHierarchy;
-import org.apache.beam.sdk.transforms.Combine;
 import org.apache.beam.sdk.transforms.PTransform;
 import org.apache.beam.sdk.transforms.View;
 import org.apache.beam.sdk.values.PValue;
@@ -59,9 +57,6 @@ public class FlinkRunner extends 
PipelineRunner {
*/
   private final FlinkPipelineOptions options;
 
-  /** Custom transforms implementations. */
-  private final Map overrides;
-
   /**
* Construct a runner from the provided options.
*
@@ -102,19 +97,6 @@ public class FlinkRunner extends 
PipelineRunner {
   private FlinkRunner(FlinkPipelineOptions options) {
 this.options = options;
 this.ptransformViewsWithNonDeterministicKeyCoders = new HashSet<>();
-
-ImmutableMap.Builder builder = ImmutableMap.builder();
-if (options.isStreaming()) {
-  builder.put(Combine.GloballyAsSingletonView.class,
-  
FlinkStreamingViewOverrides.StreamingCombineGloballyAsSingletonView.class);
-  builder.put(View.AsMap.class, 
FlinkStreamingViewOverrides.StreamingViewAsMap.class);
-  builder.put(View.AsMultimap.class, 
FlinkStreamingViewOverrides.StreamingViewAsMultimap.class);
-  builder.put(View.AsSingleton.class,
-  FlinkStreamingViewOverrides.StreamingViewAsSingleton.class);
-  builder.put(View.AsList.class, 
FlinkStreamingViewOverrides.StreamingViewAsList.class);
-  builder.put(View.AsIterable.class, 
FlinkStreamingViewOverrides.StreamingViewAsIterable.class);
-}
-overrides = builder.build();
   }
 
   @Override



[jira] [Assigned] (BEAM-1800) Can't save datastore objects

2017-03-24 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay reassigned BEAM-1800:
-

Assignee: Vikas Kedigehalli  (was: Ahmet Altay)

> Can't save datastore objects
> 
>
> Key: BEAM-1800
> URL: https://issues.apache.org/jira/browse/BEAM-1800
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Mike Lambert
>Assignee: Vikas Kedigehalli
>
> I can't seem to save my database objects using {{WriteToDatastore}}, as it 
> errors out on a strange unicode issue when trying to write a batch. 
> Stacktrace follows:
> {noformat}
> File "apache_beam/runners/common.py", line 195, in 
> apache_beam.runners.common.DoFnRunner.receive 
> (apache_beam/runners/common.c:5142)
>   self.process(windowed_value) 
> File "apache_beam/runners/common.py", line 267, in 
> apache_beam.runners.common.DoFnRunner.process 
> (apache_beam/runners/common.c:7201)
>   self.reraise_augmented(exn) 
> File "apache_beam/runners/common.py", line 279, in 
> apache_beam.runners.common.DoFnRunner.reraise_augmented 
> (apache_beam/runners/common.c:7590)
>   raise type(exn), args, sys.exc_info()[2] 
> File "apache_beam/runners/common.py", line 263, in 
> apache_beam.runners.common.DoFnRunner.process 
> (apache_beam/runners/common.c:7090)
>   self._dofn_simple_invoker(element) 
> File "apache_beam/runners/common.py", line 198, in 
> apache_beam.runners.common.DoFnRunner._dofn_simple_invoker 
> (apache_beam/runners/common.c:5262)
>   self._process_outputs(element, self.dofn_process(element.value)) 
> File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/datastore/v1/datastoreio.py",
>  line 354, in process
>   self._flush_batch() 
> File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/datastore/v1/datastoreio.py",
>  line 363, in _flush_batch
>   helper.write_mutations(self._datastore, self._project, self._mutations) 
> File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/datastore/v1/helper.py",
>  line 187, in write_mutations
>   commit(commit_request) 
> File "/usr/local/lib/python2.7/dist-packages/apache_beam/utils/retry.py", 
> line 174, in wrapper
>   return fun(*args, **kwargs) 
> File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/datastore/v1/helper.py",
>  line 185, in commit
>   datastore.commit(req) 
> File "/usr/local/lib/python2.7/dist-packages/googledatastore/connection.py", 
> line 140, in commit
>   datastore_pb2.CommitResponse) 
> File "/usr/local/lib/python2.7/dist-packages/googledatastore/connection.py", 
> line 199, in _call_method
>   method='POST', body=payload, headers=headers) 
> File "/usr/local/lib/python2.7/dist-packages/oauth2client/client.py", line 
> 631, in new_request
>   redirections, connection_type) 
> File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 
> 1609, in request (response, content)
>   = self._request(conn, authority, uri, request_uri, method, body, headers, 
> redirections, cachekey) 
> File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 
> 1351, in _request (response, content)
>   = self._conn_request(conn, request_uri, method, body, headers) 
> File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 
> 1273, in _conn_request
>   conn.request(method, request_uri, body, headers) 
> File "/usr/lib/python2.7/httplib.py", line 1039, in request
>   self._send_request(method, url, body, headers)
> File "/usr/lib/python2.7/httplib.py", line 1073, in _send_request
>self.endheaders(body) 
> File "/usr/lib/python2.7/httplib.py", line 1035, in endheaders
>   self._send_output(message_body) 
> File "/usr/lib/python2.7/httplib.py", line 877, in _send_output
>   msg += message_body
> TypeError: must be str, not unicode
> [while running 'write to datastore/Convert to Mutation']
> {noformat}
> My code is basically:
> {noformat}
> | 'convert from entity' >> beam.Map(ConvertFromEntity)
> | 'write to datastore' >> WriteToDatastore(client.project)
> {noformat}
> Where {{ConvertFromEntity}} converts from a google.cloud.datastore object 
> (which has a nice API/interface) into the underlying protobuf (which is what 
> the beam gcp/datastore library expects):
> {noformat}
> from google.cloud.datastore import helpers
> def ConvertFromEntity(entity):
> return helpers.entity_to_protobuf(entity)
> {noformat}
> I assume entity_to_protobuf works fine/normally, since it's also what is used 
> by {{google/cloud/datastore/batch.py}} to write a bunch of 
> {{entity_pb2.Entity}} objects into the 
> {{datastore_pb2.CommitRequest.mutations[n].upsert}}:
> In batch.py: {{put() -> _assign_entity_to_pb() -> entity_to_protobuf()}}.
> In datastoreio.py: 
> {{WriteToDatastore->DatastoreWriteFn.to_upsert_mutation->_Mutate.DatastoreWriteFn->helper.write_mutations}}
> Any idea 
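[Editor's note] The {{TypeError}} in the stack trace above is the classic Python 2 str/unicode mix: httplib concatenates the request body onto a byte-string header with {{msg += message_body}}. Below is a minimal sketch of the failure mode and the usual fix (encoding text to bytes before mixing), written for Python 3, where the equivalent bytes/str mix raises the same kind of TypeError; the variable names and payloads are illustrative, not Beam's or httplib's:

```python
# Illustrative reproduction of the str/bytes mixing error (names are not Beam's).
header = b"POST /v1/projects/x:commit HTTP/1.1\r\n\r\n"  # httplib builds bytes
body = "serialized-entity-payload"  # text, analogous to Python 2 unicode

try:
    msg = header + body  # mixing bytes and text fails, like msg += message_body
except TypeError:
    # The usual fix: encode the text payload to bytes before concatenating.
    msg = header + body.encode("utf-8")

print(msg.decode("utf-8"))
```

In Python 2 the same mix often "worked" by implicit ASCII coercion and only failed on non-ASCII or strict call sites like httplib's, which is why the error surfaces deep inside the HTTP stack rather than in user code.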

[jira] [Assigned] (BEAM-1801) default_job_name can generate names not accepted by DataFlow

2017-03-24 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay reassigned BEAM-1801:
-

Assignee: Pablo Estrada  (was: Ahmet Altay)

> default_job_name can generate names not accepted by DataFlow
> 
>
> Key: BEAM-1801
> URL: https://issues.apache.org/jira/browse/BEAM-1801
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Dennis Docter
>Assignee: Pablo Estrada
>Priority: Trivial
>
> The default job name generated by: 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py#L288
> is partially derived from the os username of the executing user. These may 
> contain characters not accepted by Dataflow, resulting in errors like:
> apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: 
> Dataflow pipeline failed. State: FAILED, Error:
> (a1b878f3562c0e6d): Error processing pipeline. Causes: (a1b878f3562c04ae): 
> Prefix for cluster 'beamapp-dennis.docter-032-03231324-1edc-harness' should 
> match '[a-z]([-a-z0-9]{0,61}[a-z0-9])?'. This probably means the joblabel is 
> invalid.
> To solve this issue, sanitise the username to only contain alphanumeric 
> characters and dashes.
> Also, there seems to be no length restriction, while Dataflow imposes a 
> 63-character limit in the above case. Limiting the length to substantially 
> shorter than that, to allow for postfixes (like -harness in this case), may 
> be wise.
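[Editor's note] A minimal sketch of the sanitization described above. The regex in the assertion mirrors Dataflow's stated prefix constraint from the error message; the helper name and the 40-character cap are hypothetical, not part of the SDK:

```python
import re


def sanitize_job_name_part(raw, max_len=40):
    """Lowercase, map disallowed characters to dashes, ensure the part starts
    with a letter and ends with an alphanumeric, and cap the length.
    Hypothetical helper, not the apiclient.py implementation."""
    part = re.sub(r"[^a-z0-9-]", "-", raw.lower())
    part = part.strip("-") or "job"      # fall back for empty/all-invalid input
    if not part[0].isalpha():
        part = "x" + part                # name must start with a letter
    return part[:max_len].rstrip("-")    # cap length, keep alphanumeric ending


print(sanitize_job_name_part("dennis.docter"))  # -> dennis-docter
```

The cap is deliberately shorter than Dataflow's 63-character limit so that generated postfixes such as {{-harness}} still fit.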



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1069) Add CountingInput Transform to python sdk

2017-03-24 Thread Eugene Kirpichov (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940535#comment-15940535
 ] 

Eugene Kirpichov commented on BEAM-1069:


Please take note of https://issues.apache.org/jira/browse/BEAM-1414 - it 
would be nice if the Python version were style-compliant from the start.

> Add CountingInput Transform to python sdk
> -
>
> Key: BEAM-1069
> URL: https://issues.apache.org/jira/browse/BEAM-1069
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Vikas Kedigehalli
>Assignee: Frances Perry
>Priority: Minor
>  Labels: starter
>
> Similar to the Java SDK, 
> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/CountingInput.java



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Jenkins build is back to stable : beam_PostCommit_Java_RunnableOnService_Spark #1357

2017-03-24 Thread Apache Jenkins Server
See 




[jira] [Comment Edited] (BEAM-1800) Can't save datastore objects

2017-03-24 Thread Mike Lambert (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940462#comment-15940462
 ] 

Mike Lambert edited comment on BEAM-1800 at 3/24/17 2:37 PM:
-

Okay, got server-side errors (instead of client-side errors) when attempting to 
run client.put(obj) directly...so it actually was getting further along. The 
server-side errors were as follows, and due to my own code:
{noformat}
BadRequest: 400 list_value cannot contain a Value containing another list_value.
# needed to serialize my json into a string
BadRequest: 400 The value of property "top_people_json" is longer than 1500 
bytes.
# needed to exclude the json-string field from indexing
{noformat}

Fixing them causes the client.put(obj) approach to run successfully. So there 
was something wrong with my WriteToDatastore batching (or my use of it).


was (Author: mlambert):
Okay, got server-side errors (instead of client-side errors) when attempting to 
run client.put(obj) directly...so it actually was getting further along. The 
server-side errors were as follows, and due to my own code:
{noinput}
BadRequest: 400 list_value cannot contain a Value containing another list_value.
# needed to serialize my json into a string
BadRequest: 400 The value of property "top_people_json" is longer than 1500 
bytes.
# needed to exclude the json-string field from indexing
{noinput}

Fixing them causes the client.put(obj) approach to run successfully. So there 
was something wrong with my WriteToDatastore batching (or my use of it).

> Can't save datastore objects
> 
>
> Key: BEAM-1800
> URL: https://issues.apache.org/jira/browse/BEAM-1800
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Mike Lambert
>Assignee: Ahmet Altay
>
> I can't seem to save my database objects using {{WriteToDatastore}}, as it 
> errors out on a strange unicode issue when trying to write a batch. 
> Stacktrace follows:
> {noformat}
> File "apache_beam/runners/common.py", line 195, in 
> apache_beam.runners.common.DoFnRunner.receive 
> (apache_beam/runners/common.c:5142)
>   self.process(windowed_value) 
> File "apache_beam/runners/common.py", line 267, in 
> apache_beam.runners.common.DoFnRunner.process 
> (apache_beam/runners/common.c:7201)
>   self.reraise_augmented(exn) 
> File "apache_beam/runners/common.py", line 279, in 
> apache_beam.runners.common.DoFnRunner.reraise_augmented 
> (apache_beam/runners/common.c:7590)
>   raise type(exn), args, sys.exc_info()[2] 
> File "apache_beam/runners/common.py", line 263, in 
> apache_beam.runners.common.DoFnRunner.process 
> (apache_beam/runners/common.c:7090)
>   self._dofn_simple_invoker(element) 
> File "apache_beam/runners/common.py", line 198, in 
> apache_beam.runners.common.DoFnRunner._dofn_simple_invoker 
> (apache_beam/runners/common.c:5262)
>   self._process_outputs(element, self.dofn_process(element.value)) 
> File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/datastore/v1/datastoreio.py",
>  line 354, in process
>   self._flush_batch() 
> File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/datastore/v1/datastoreio.py",
>  line 363, in _flush_batch
>   helper.write_mutations(self._datastore, self._project, self._mutations) 
> File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/datastore/v1/helper.py",
>  line 187, in write_mutations
>   commit(commit_request) 
> File "/usr/local/lib/python2.7/dist-packages/apache_beam/utils/retry.py", 
> line 174, in wrapper
>   return fun(*args, **kwargs) 
> File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/datastore/v1/helper.py",
>  line 185, in commit
>   datastore.commit(req) 
> File "/usr/local/lib/python2.7/dist-packages/googledatastore/connection.py", 
> line 140, in commit
>   datastore_pb2.CommitResponse) 
> File "/usr/local/lib/python2.7/dist-packages/googledatastore/connection.py", 
> line 199, in _call_method
>   method='POST', body=payload, headers=headers) 
> File "/usr/local/lib/python2.7/dist-packages/oauth2client/client.py", line 
> 631, in new_request
>   redirections, connection_type) 
> File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 
> 1609, in request (response, content)
>   = self._request(conn, authority, uri, request_uri, method, body, headers, 
> redirections, cachekey) 
> File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 
> 1351, in _request (response, content)
>   = self._conn_request(conn, request_uri, method, body, headers) 
> File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 
> 1273, in _conn_request
>   conn.request(method, request_uri, body, headers) 
> File "/usr/lib/python2.7/httplib.py", line 1039, in request
>   self._send_request(method, url, body, headers)
> File 

[GitHub] beam pull request #2314: Fix caching in the Spark streaming, doing the cache...

2017-03-24 Thread jbonofre
GitHub user jbonofre opened a pull request:

https://github.com/apache/beam/pull/2314

Fix caching in the Spark streaming, doing the cache update in the streaming 
context

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jbonofre/beam FIX_CACHING_SPARK_STREAMING

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2314.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2314


commit d6411c9ca29aedee87dffa2f598c772e135c507c
Author: Jean-Baptiste Onofré 
Date:   2017-03-24T14:27:04Z

Fix caching in the Spark streaming, doing the cache update in the streaming 
context




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-1802) Spark Runner does not shutdown correctly when executing multiple pipelines in sequence

2017-03-24 Thread Amit Sela (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940422#comment-15940422
 ] 

Amit Sela commented on BEAM-1802:
-

{{PipelineResult#cancel()}} should do the trick for now
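[Editor's note] The underlying constraint is a single-active-instance guard: the context factory refuses to create a second context while one is still registered. A language-neutral sketch of that pattern in Python (the class and names are hypothetical, not Spark's API), showing why an explicit {{stop()}} / {{cancel()}} between runs is needed:

```python
class ContextError(RuntimeError):
    """Raised when a second context is created while one is still active."""


class Context:
    _active = None  # module-level guard, analogous to Spark's active-context check

    def __init__(self):
        if Context._active is not None:
            raise ContextError("Only one Context may be running")
        Context._active = self

    def stop(self):
        # Releasing the guard is what allows the next pipeline run to start.
        if Context._active is self:
            Context._active = None


first = Context()
# Context()  # would raise ContextError while `first` is live, as in SPARK-2243
first.stop()          # explicit stop releases the guard...
second = Context()    # ...so a second run can create its own context
second.stop()
```

This is why running Nexmark queries back to back fails without a stop/cancel between them: the first run's context is still registered when the second run asks the factory for a new one.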

> Spark Runner does not shutdown correctly when executing multiple pipelines in 
> sequence
> --
>
> Key: BEAM-1802
> URL: https://issues.apache.org/jira/browse/BEAM-1802
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Ismaël Mejía
>Assignee: Aviem Zur
>
> I found this while running the Nexmark queries in sequence in local mode. I 
> had the correct configuration but it didn't seem to work.
> 17/03/24 12:07:49 WARN org.apache.spark.SparkContext: Multiple running 
> SparkContexts detected in the same JVM!
> org.apache.spark.SparkException: Only one SparkContext may be running in this 
> JVM (see SPARK-2243). To ignore this error, set 
> spark.driver.allowMultipleContexts = true. The currently running SparkContext 
> was created at:
> org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
> org.apache.beam.runners.spark.translation.SparkContextFactory.createSparkContext(SparkContextFactory.java:100)
> org.apache.beam.runners.spark.translation.SparkContextFactory.getSparkContext(SparkContextFactory.java:69)
> org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:206)
> org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:91)
> org.apache.beam.sdk.Pipeline.run(Pipeline.java:266)
> org.apache.beam.integration.nexmark.NexmarkRunner.run(NexmarkRunner.java:1233)
> org.apache.beam.integration.nexmark.NexmarkDriver.runAll(NexmarkDriver.java:69)
> org.apache.beam.integration.nexmark.drivers.NexmarkSparkDriver.main(NexmarkSparkDriver.java:46)
>   at 
> org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$1.apply(SparkContext.scala:2257)
>   at 
> org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$1.apply(SparkContext.scala:2239)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (BEAM-1802) Spark Runner does not shutdown correctly when executing multiple pipelines in sequence

2017-03-24 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940346#comment-15940346
 ] 

Aviem Zur edited comment on BEAM-1802 at 3/24/17 1:37 PM:
--

[~iemejia] Please call {{stop()}} after calling {{waitUntilFinish()}}; it 
should stop the first context so you can run a second one after it.


was (Author: aviemzur):
[~iemejia] Please call `stop()` after calling `waitUntilFinish()` it should 
stop the first context so you can run a second one after it.

> Spark Runner does not shutdown correctly when executing multiple pipelines in 
> sequence
> --
>
> Key: BEAM-1802
> URL: https://issues.apache.org/jira/browse/BEAM-1802
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Ismaël Mejía
>Assignee: Aviem Zur
>
> I found this while running the Nexmark queries in sequence in local mode. I 
> had the correct configuration but it didn't seem to work.
> 17/03/24 12:07:49 WARN org.apache.spark.SparkContext: Multiple running 
> SparkContexts detected in the same JVM!
> org.apache.spark.SparkException: Only one SparkContext may be running in this 
> JVM (see SPARK-2243). To ignore this error, set 
> spark.driver.allowMultipleContexts = true. The currently running SparkContext 
> was created at:
> org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
> org.apache.beam.runners.spark.translation.SparkContextFactory.createSparkContext(SparkContextFactory.java:100)
> org.apache.beam.runners.spark.translation.SparkContextFactory.getSparkContext(SparkContextFactory.java:69)
> org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:206)
> org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:91)
> org.apache.beam.sdk.Pipeline.run(Pipeline.java:266)
> org.apache.beam.integration.nexmark.NexmarkRunner.run(NexmarkRunner.java:1233)
> org.apache.beam.integration.nexmark.NexmarkDriver.runAll(NexmarkDriver.java:69)
> org.apache.beam.integration.nexmark.drivers.NexmarkSparkDriver.main(NexmarkSparkDriver.java:46)
>   at 
> org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$1.apply(SparkContext.scala:2257)
>   at 
> org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$1.apply(SparkContext.scala:2239)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1802) Spark Runner does not shutdown correctly when executing multiple pipelines in sequence

2017-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940344#comment-15940344
 ] 

ASF GitHub Bot commented on BEAM-1802:
--

Github user aviemzur closed the pull request at:

https://github.com/apache/beam/pull/2313


> Spark Runner does not shutdown correctly when executing multiple pipelines in 
> sequence
> --
>
> Key: BEAM-1802
> URL: https://issues.apache.org/jira/browse/BEAM-1802
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Ismaël Mejía
>Assignee: Aviem Zur
>
> I found this while running the Nexmark queries in sequence in local mode. I 
> had the correct configuration but it didn't seem to work.
> 17/03/24 12:07:49 WARN org.apache.spark.SparkContext: Multiple running 
> SparkContexts detected in the same JVM!
> org.apache.spark.SparkException: Only one SparkContext may be running in this 
> JVM (see SPARK-2243). To ignore this error, set 
> spark.driver.allowMultipleContexts = true. The currently running SparkContext 
> was created at:
> org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
> org.apache.beam.runners.spark.translation.SparkContextFactory.createSparkContext(SparkContextFactory.java:100)
> org.apache.beam.runners.spark.translation.SparkContextFactory.getSparkContext(SparkContextFactory.java:69)
> org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:206)
> org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:91)
> org.apache.beam.sdk.Pipeline.run(Pipeline.java:266)
> org.apache.beam.integration.nexmark.NexmarkRunner.run(NexmarkRunner.java:1233)
> org.apache.beam.integration.nexmark.NexmarkDriver.runAll(NexmarkDriver.java:69)
> org.apache.beam.integration.nexmark.drivers.NexmarkSparkDriver.main(NexmarkSparkDriver.java:46)
>   at 
> org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$1.apply(SparkContext.scala:2257)
>   at 
> org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$1.apply(SparkContext.scala:2239)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1802) Spark Runner does not shutdown correctly when executing multiple pipelines in sequence

2017-03-24 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940346#comment-15940346
 ] 

Aviem Zur commented on BEAM-1802:
-

[~iemejia] Please call `stop()` after calling `waitUntilFinish()`; it should 
stop the first context so you can run a second one after it.

> Spark Runner does not shutdown correctly when executing multiple pipelines in 
> sequence
> --
>
> Key: BEAM-1802
> URL: https://issues.apache.org/jira/browse/BEAM-1802
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Ismaël Mejía
>Assignee: Aviem Zur
>
> I found this while running the Nexmark queries in sequence in local mode. I 
> had the correct configuration but it didn't seem to work.
> 17/03/24 12:07:49 WARN org.apache.spark.SparkContext: Multiple running 
> SparkContexts detected in the same JVM!
> org.apache.spark.SparkException: Only one SparkContext may be running in this 
> JVM (see SPARK-2243). To ignore this error, set 
> spark.driver.allowMultipleContexts = true. The currently running SparkContext 
> was created at:
> org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
> org.apache.beam.runners.spark.translation.SparkContextFactory.createSparkContext(SparkContextFactory.java:100)
> org.apache.beam.runners.spark.translation.SparkContextFactory.getSparkContext(SparkContextFactory.java:69)
> org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:206)
> org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:91)
> org.apache.beam.sdk.Pipeline.run(Pipeline.java:266)
> org.apache.beam.integration.nexmark.NexmarkRunner.run(NexmarkRunner.java:1233)
> org.apache.beam.integration.nexmark.NexmarkDriver.runAll(NexmarkDriver.java:69)
> org.apache.beam.integration.nexmark.drivers.NexmarkSparkDriver.main(NexmarkSparkDriver.java:46)
>   at 
> org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$1.apply(SparkContext.scala:2257)
>   at 
> org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$1.apply(SparkContext.scala:2239)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2313: [BEAM-1802] Call stop in SparkPipelineResult#waitUn...

2017-03-24 Thread aviemzur
Github user aviemzur closed the pull request at:

https://github.com/apache/beam/pull/2313


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Jenkins build is still unstable: beam_PostCommit_Java_RunnableOnService_Spark #1356

2017-03-24 Thread Apache Jenkins Server
See 




[jira] [Closed] (BEAM-385) Include Javadoc on Continuous Integration

2017-03-24 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/BEAM-385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía closed BEAM-385.
-

I am closing this following the recent discussion.

> Include Javadoc on Continuous Integration
> 
>
> Key: BEAM-385
> URL: https://issues.apache.org/jira/browse/BEAM-385
> Project: Beam
>  Issue Type: Wish
>  Components: project-management
>Reporter: Ismaël Mejía
>Priority: Minor
> Fix For: Not applicable
>
>
> I noticed that many Apache projects also publish snapshots of their javadoc; 
> this is quite useful for programmers following master. We can add the Beam 
> docs as well at
> https://ci.apache.org/projects/



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-385) Include Javadoc on Continuous Integration

2017-03-24 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/BEAM-385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-385:
--
Issue Type: Wish  (was: Improvement)

> Include Javadoc on Continuous Integration
> 
>
> Key: BEAM-385
> URL: https://issues.apache.org/jira/browse/BEAM-385
> Project: Beam
>  Issue Type: Wish
>  Components: project-management
>Reporter: Ismaël Mejía
>Priority: Minor
> Fix For: Not applicable
>
>
> I noticed that many Apache projects also publish snapshots of their javadoc; 
> this is quite useful for programmers following master. We can add the Beam 
> docs as well at
> https://ci.apache.org/projects/



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

