[GitHub] incubator-beam pull request #980: TextIO/CompressedSource: support splitting...

2016-09-19 Thread dhalperi
GitHub user dhalperi opened a pull request:

https://github.com/apache/incubator-beam/pull/980

TextIO/CompressedSource: support splitting AUTO mode files into bundles

R: @jkff please take a look. This code has been the cause of major bugs in 
the past.

I am still planning a pass tomorrow to clean up the tests -- so much 
redundant code, but it's so entangled it's a pain.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dhalperi/incubator-beam filebasedsource-bundles

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/980.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #980


commit 37e7e8d919daf52a75f26e705b2a98358d669012
Author: Dan Halperin 
Date:   2016-09-20T05:46:26Z

TextIO/CompressedSource: split AUTO mode files into bundles




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-643) Allow users to specify a custom service account

2016-09-19 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/BEAM-643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505641#comment-15505641 ]

ASF GitHub Bot commented on BEAM-643:
-

GitHub user chamikaramj opened a pull request:

https://github.com/apache/incubator-beam/pull/979

[BEAM-643] Updates lint configurations to ignore generated files.

Adds ability to ignore certain generated files when running pylint and pep8.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/incubator-beam dataflow_pylint_exclude_files

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/979.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #979


commit c9a7434804cb84ced7de84794dc9070de4069db2
Author: Chamikara Jayalath 
Date:   2016-09-20T05:27:47Z

Updates lint configurations to ignore generated files.




> Allow users to specify a custom service account
> ---
>
> Key: BEAM-643
> URL: https://issues.apache.org/jira/browse/BEAM-643
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py
>Reporter: Chamikara Jayalath
>Assignee: Chamikara Jayalath
>
> Users should be able to specify a custom service account to be used
> when creating VMs. This feature is specific to DataflowRunner, and a
> corresponding user option will be added to GoogleCloudOptions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-beam pull request #979: [BEAM-643] Updates lint configurations to ...

2016-09-19 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/incubator-beam/pull/979

[BEAM-643] Updates lint configurations to ignore generated files.

Adds ability to ignore certain generated files when running pylint and pep8.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/incubator-beam dataflow_pylint_exclude_files

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/979.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #979


commit c9a7434804cb84ced7de84794dc9070de4069db2
Author: Chamikara Jayalath 
Date:   2016-09-20T05:27:47Z

Updates lint configurations to ignore generated files.






[GitHub] incubator-beam pull request #951: Check Dataflow Job Status Before Terminate

2016-09-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/951




[jira] [Commented] (BEAM-643) Allow users to specify a custom service account

2016-09-19 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/BEAM-643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505616#comment-15505616 ]

ASF GitHub Bot commented on BEAM-643:
-

GitHub user chamikaramj opened a pull request:

https://github.com/apache/incubator-beam/pull/978

[BEAM-643] Updates Dataflow API client.

Updates Cloud Dataflow API client files to the latest version.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/incubator-beam dataflow_client_api_update

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/978.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #978


commit a1cc0ba4e8eed75252587fc7cbd9aaec8c396f01
Author: Chamikara Jayalath 
Date:   2016-09-20T05:16:04Z

Updates Dataflow API client.




> Allow users to specify a custom service account
> ---
>
> Key: BEAM-643
> URL: https://issues.apache.org/jira/browse/BEAM-643
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py
>Reporter: Chamikara Jayalath
>Assignee: Chamikara Jayalath
>
> Users should be able to specify a custom service account to be used
> when creating VMs. This feature is specific to DataflowRunner, and a
> corresponding user option will be added to GoogleCloudOptions.





[2/2] incubator-beam git commit: Check Dataflow Job Status Before Terminate

2016-09-19 Thread lcwik
Check Dataflow Job Status Before Terminate

This closes #951


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/9e7ed292
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/9e7ed292
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/9e7ed292

Branch: refs/heads/master
Commit: 9e7ed29290911bb341261240d8d799bd7f0a4e9b
Parents: 9c8e19c e776ae7
Author: Luke Cwik 
Authored: Mon Sep 19 21:30:04 2016 -0700
Committer: Luke Cwik 
Committed: Mon Sep 19 21:30:04 2016 -0700

--
 .../runners/dataflow/DataflowPipelineJob.java   | 19 +++--
 .../dataflow/DataflowPipelineJobTest.java   | 74 
 2 files changed, 87 insertions(+), 6 deletions(-)
--




[1/2] incubator-beam git commit: Check Dataflow Job Status Before Terminate

2016-09-19 Thread lcwik
Repository: incubator-beam
Updated Branches:
  refs/heads/master 9c8e19c1c -> 9e7ed2929


Check Dataflow Job Status Before Terminate


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/e776ae73
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/e776ae73
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/e776ae73

Branch: refs/heads/master
Commit: e776ae7334aed7e42d152dbbcfa7b5a1fb998e27
Parents: 9c8e19c
Author: Mark Liu 
Authored: Mon Sep 19 15:24:17 2016 -0700
Committer: Luke Cwik 
Committed: Mon Sep 19 21:29:21 2016 -0700

--
 .../runners/dataflow/DataflowPipelineJob.java   | 19 +++--
 .../dataflow/DataflowPipelineJobTest.java   | 74 
 2 files changed, 87 insertions(+), 6 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/e776ae73/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineJob.java
--
diff --git a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineJob.java b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineJob.java
index 1af8c98..269b824 100644
--- a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineJob.java
+++ b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineJob.java
@@ -301,14 +301,21 @@ public class DataflowPipelineJob implements PipelineResult {
       dataflowOptions.getDataflowClient().projects().jobs()
           .update(projectId, jobId, content)
           .execute();
+      return State.CANCELLED;
     } catch (IOException e) {
-      String errorMsg = String.format(
-          "Failed to cancel the job, please go to the Developers Console to cancel it manually: %s",
-          MonitoringUtil.getJobMonitoringPageURL(getProjectId(), getJobId()));
-      LOG.warn(errorMsg);
-      throw new IOException(errorMsg, e);
+      State state = getState();
+      if (state.isTerminal()) {
+        LOG.warn("Job is already terminated. State is {}", state);
+        return state;
+      } else {
+        String errorMsg = String.format(
+            "Failed to cancel the job, "
+                + "please go to the Developers Console to cancel it manually: %s",
+            MonitoringUtil.getJobMonitoringPageURL(getProjectId(), getJobId()));
+        LOG.warn(errorMsg);
+        throw new IOException(errorMsg, e);
+      }
     }
-    return State.CANCELLED;
   }
 
   @Override

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/e776ae73/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPipelineJobTest.java
--
diff --git a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPipelineJobTest.java b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPipelineJobTest.java
index 4c70d12..2af95e2 100644
--- a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPipelineJobTest.java
+++ b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPipelineJobTest.java
@@ -29,7 +29,11 @@ import static org.hamcrest.Matchers.is;
 import static org.hamcrest.Matchers.lessThanOrEqualTo;
 import static org.junit.Assert.assertEquals;
 import static org.mockito.Matchers.eq;
+import static org.mockito.Mockito.any;
+import static org.mockito.Mockito.anyString;
 import static org.mockito.Mockito.mock;
+import static org.mockito.Mockito.verify;
+import static org.mockito.Mockito.verifyNoMoreInteractions;
 import static org.mockito.Mockito.when;
 
 import com.google.api.client.util.NanoClock;
@@ -655,4 +659,74 @@ public class DataflowPipelineJobTest {
       fastNanoTime += millis * 100L + ThreadLocalRandom.current().nextInt(50);
     }
   }
+
+  @Test
+  public void testCancelUnterminatedJobThatSucceeds() throws IOException {
+    Dataflow.Projects.Jobs.Update update = mock(Dataflow.Projects.Jobs.Update.class);
+    when(mockJobs.update(anyString(), anyString(), any(Job.class))).thenReturn(update);
+    when(update.execute()).thenReturn(new Job());
+
+    DataflowPipelineJob job = new DataflowPipelineJob(PROJECT_ID, JOB_ID, options, null);
+
+    assertEquals(State.CANCELLED, job.cancel());
+    Job content = new Job();
+    content.setProjectId(PROJECT_ID);
+    content.setId(JOB_ID);
+    content.setRequestedState("JOB_STATE_CANCELLED");
+
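The behavior this change introduces — return CANCELLED when the update call succeeds, and on failure return the job's current state only if it is already terminal, otherwise surface the error — can be sketched outside of Beam. The `update_job` and `get_state` callables below are hypothetical stand-ins for the Dataflow client calls, not Beam APIs:

```python
def cancel_job(update_job, get_state):
    """Attempt to cancel a job, tolerating jobs that already finished.

    update_job: callable issuing the cancel request (may raise IOError).
    get_state: callable returning (state_name, is_terminal) for the job.
    Both are illustrative stand-ins for the Dataflow client methods.
    """
    try:
        update_job()
        return "CANCELLED"
    except IOError:
        state, is_terminal = get_state()
        if is_terminal:
            # The job already reached a terminal state; cancelling is a
            # no-op, so report that state instead of failing.
            return state
        # Genuine cancellation failure: re-raise for the caller.
        raise
```

This mirrors why the test above expects `State.CANCELLED` on the happy path: the successful `update` now returns immediately instead of falling through.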

[GitHub] incubator-beam pull request #978: [BEAM-643] Updates Dataflow API client.

2016-09-19 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/incubator-beam/pull/978

[BEAM-643] Updates Dataflow API client.

Updates Cloud Dataflow API client files to the latest version.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/incubator-beam dataflow_client_api_update

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/978.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #978


commit a1cc0ba4e8eed75252587fc7cbd9aaec8c396f01
Author: Chamikara Jayalath 
Date:   2016-09-20T05:16:04Z

Updates Dataflow API client.






Jenkins build is still unstable: beam_PostCommit_RunnableOnService_GoogleCloudDataflow #1173

2016-09-19 Thread Apache Jenkins Server
See 




[GitHub] incubator-beam pull request #977: Support BigQuery BYTES type

2016-09-19 Thread peihe
GitHub user peihe opened a pull request:

https://github.com/apache/incubator-beam/pull/977

Support BigQuery BYTES type



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/peihe/incubator-beam bq-new-types

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/977.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #977


commit 274d228488a51ccf698aa7a1cb587f1fba43a8b6
Author: Pei He 
Date:   2016-09-13T18:08:34Z

Support BigQuery BYTES type






Jenkins build is still unstable: beam_PostCommit_RunnableOnService_GoogleCloudDataflow #1172

2016-09-19 Thread Apache Jenkins Server
See 




[GitHub] incubator-beam pull request #976: [BEAM-625] Making Dataflow Python Material...

2016-09-19 Thread katsiapis
GitHub user katsiapis opened a pull request:

https://github.com/apache/incubator-beam/pull/976

[BEAM-625] Making Dataflow Python Materialized PCollection representation 
more efficient (5 of several).

Changing ToStringCoder to BytesCoder since the former can't be used with 
sources (it can only be used with Sinks). Since we are opening files as 'rb', 
using BytesCoder makes sense.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/katsiapis/incubator-beam python-sdk

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/976.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #976


commit 1b97de04987231227f6612d3f8c57b1eaf76408b
Author: Gus Katsiapis 
Date:   2016-09-20T00:21:45Z

Changing ToStringCoder to BytesCoder since the former can't be used with
sources.






[1/2] incubator-beam git commit: Changed ToStringCoder to BytesCoder in test

2016-09-19 Thread robertwb
Repository: incubator-beam
Updated Branches:
  refs/heads/python-sdk 2f09003e3 -> adda16320


Changed ToStringCoder to BytesCoder in test

The former can't be used with sources, as it can only encode.


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/19a8407f
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/19a8407f
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/19a8407f

Branch: refs/heads/python-sdk
Commit: 19a8407f741848db0dc86e587ea9739b17888768
Parents: 2f09003
Author: Gus Katsiapis 
Authored: Mon Sep 19 17:21:45 2016 -0700
Committer: Robert Bradshaw 
Committed: Mon Sep 19 17:53:44 2016 -0700

--
 sdks/python/apache_beam/io/sources_test.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/19a8407f/sdks/python/apache_beam/io/sources_test.py
--
diff --git a/sdks/python/apache_beam/io/sources_test.py b/sdks/python/apache_beam/io/sources_test.py
index 45c5ab8..293f437 100644
--- a/sdks/python/apache_beam/io/sources_test.py
+++ b/sdks/python/apache_beam/io/sources_test.py
@@ -24,6 +24,7 @@ import unittest
 
 import apache_beam as beam
 
+from apache_beam import coders
 from apache_beam.io import iobase
 from apache_beam.io import range_trackers
 from apache_beam.transforms.util import assert_that
@@ -76,7 +77,7 @@ class LineSource(iobase.BoundedSource):
     return range_trackers.OffsetRangeTracker(start_position, stop_position)
 
   def default_output_coder(self):
-    return beam.coders.ToStringCoder()
+    return coders.BytesCoder()
 
 
 class SourcesTest(unittest.TestCase):
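The motivation for the swap is that a coder used by a source must round-trip values, while ToStringCoder can only encode. A toy illustration of that distinction (these classes are illustrative stand-ins, not the actual Beam coder implementations):

```python
class EncodeOnlyCoder:
    """Sink-style coder: serializes values but cannot read them back."""

    def encode(self, value):
        return str(value).encode('utf-8')
    # No decode(): a source handed this coder has no way to reconstruct
    # the records it reads from a file.


class RoundTripBytesCoder:
    """Source-friendly coder: encode and decode are exact inverses."""

    def encode(self, value):
        # Values are already bytes, as when files are opened with 'rb'.
        return value

    def decode(self, encoded):
        return encoded
```

A source reading files opened as 'rb' produces bytes, so a bytes coder round-trips them losslessly, which is exactly what the test's `default_output_coder` needs.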



[2/2] incubator-beam git commit: Closes #976

2016-09-19 Thread robertwb
Closes #976


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/adda1632
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/adda1632
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/adda1632

Branch: refs/heads/python-sdk
Commit: adda1632015043e352c75e2f9966cc04a458b30c
Parents: 2f09003 19a8407
Author: Robert Bradshaw 
Authored: Mon Sep 19 17:54:27 2016 -0700
Committer: Robert Bradshaw 
Committed: Mon Sep 19 17:54:27 2016 -0700

--
 sdks/python/apache_beam/io/sources_test.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
--




[1/2] incubator-beam git commit: Use sys.executable and "-m pip" to ensure we use the same Python and pip as the currently running one.

2016-09-19 Thread robertwb
Repository: incubator-beam
Updated Branches:
  refs/heads/python-sdk 29b55e956 -> 2f09003e3


Use sys.executable and "-m pip" to ensure we use the same Python and pip as the 
currently running one.


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/7de9830d
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/7de9830d
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/7de9830d

Branch: refs/heads/python-sdk
Commit: 7de9830d96c7928444a5a4849698e70ec423ef62
Parents: 29b55e9
Author: Christian Hudon 
Authored: Thu Sep 15 15:10:57 2016 -0400
Committer: Robert Bradshaw 
Committed: Mon Sep 19 17:50:57 2016 -0700

--
 sdks/python/apache_beam/utils/dependency.py | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/7de9830d/sdks/python/apache_beam/utils/dependency.py
--
diff --git a/sdks/python/apache_beam/utils/dependency.py b/sdks/python/apache_beam/utils/dependency.py
index 56700f5..e9d73ae 100644
--- a/sdks/python/apache_beam/utils/dependency.py
+++ b/sdks/python/apache_beam/utils/dependency.py
@@ -57,6 +57,7 @@ import logging
 import os
 import re
 import shutil
+import sys
 import tempfile
 
 
@@ -194,7 +195,7 @@ def _populate_requirements_cache(requirements_file, cache_dir):
   # It will get the packages downloaded in the order they are presented in
   # the requirements file and will not download package dependencies.
   cmd_args = [
-      'pip', 'install', '--download', cache_dir,
+      sys.executable, '-m', 'pip', 'install', '--download', cache_dir,
       '-r', requirements_file,
       # Download from PyPI source distributions.
       '--no-binary', ':all:']
@@ -374,7 +375,7 @@ def _build_setup_package(setup_file, temp_dir, build_setup_args=None):
     os.chdir(os.path.dirname(setup_file))
     if build_setup_args is None:
       build_setup_args = [
-          'python', os.path.basename(setup_file),
+          sys.executable, os.path.basename(setup_file),
           'sdist', '--dist-dir', temp_dir]
     logging.info('Executing command: %s', build_setup_args)
     processes.check_call(build_setup_args)
@@ -460,7 +461,7 @@ def _download_pypi_sdk_package(temp_dir):
   version = pkg.get_distribution(GOOGLE_PACKAGE_NAME).version
   # Get a source distribution for the SDK package from PyPI.
   cmd_args = [
-      'pip', 'install', '--download', temp_dir,
+      sys.executable, '-m', 'pip', 'install', '--download', temp_dir,
       '%s==%s' % (GOOGLE_PACKAGE_NAME, version),
       '--no-binary', ':all:', '--no-deps']
   logging.info('Executing command: %s', cmd_args)
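The point of the change is that a bare `pip` or `python` found on `PATH` may belong to a different interpreter than the one running the pipeline (a system install rather than the active virtualenv, say); building the command from `sys.executable` pins the subprocess to the interpreter that is actually executing. A minimal sketch of the idea (the package name is a hypothetical example):

```python
import sys

def pip_command(*pip_args):
    """Build a pip invocation bound to the current interpreter.

    Running pip via `sys.executable -m pip` guarantees the subprocess
    acts on the same Python environment as the running process.
    """
    return [sys.executable, '-m', 'pip'] + list(pip_args)

# cmd[0] is an absolute path to the running interpreter, not a bare 'pip'.
cmd = pip_command('install', '--no-deps', 'example-package')
```

The same reasoning applies to the `setup.py sdist` invocation in the diff: `sys.executable` replaces the bare `'python'` so the build runs under the same interpreter.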



[2/2] incubator-beam git commit: Closes #962

2016-09-19 Thread robertwb
Closes #962


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/2f09003e
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/2f09003e
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/2f09003e

Branch: refs/heads/python-sdk
Commit: 2f09003e33b63594fd87a6cd8bf2803005174fd1
Parents: 29b55e9 7de9830
Author: Robert Bradshaw 
Authored: Mon Sep 19 17:50:58 2016 -0700
Committer: Robert Bradshaw 
Committed: Mon Sep 19 17:50:58 2016 -0700

--
 sdks/python/apache_beam/utils/dependency.py | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)
--




[2/2] incubator-beam git commit: Removed unnecessary throttling of rename parallelism.

2016-09-19 Thread robertwb
Removed unnecessary throttling of rename parallelism.


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/24bb8f19
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/24bb8f19
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/24bb8f19

Branch: refs/heads/python-sdk
Commit: 24bb8f19329b3d0c1d0330e0c16c41ab1554684d
Parents: 4b7fe2d
Author: Marian Dvorsky 
Authored: Fri Sep 16 10:46:32 2016 -0700
Committer: Robert Bradshaw 
Committed: Mon Sep 19 17:39:47 2016 -0700

--
 sdks/python/apache_beam/io/fileio.py | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/24bb8f19/sdks/python/apache_beam/io/fileio.py
--
diff --git a/sdks/python/apache_beam/io/fileio.py b/sdks/python/apache_beam/io/fileio.py
index e3d4dae..d640d50 100644
--- a/sdks/python/apache_beam/io/fileio.py
+++ b/sdks/python/apache_beam/io/fileio.py
@@ -693,11 +693,7 @@ class FileSink(iobase.Sink):
   The output of this write is a PCollection of all written shards.
   """
 
-  # Approximate number of write results be assigned for each rename thread.
-  _WRITE_RESULTS_PER_RENAME_THREAD = 100
-
-  # Max number of threads to be used for renaming even if it means each thread
-  # will process more write results.
+  # Max number of threads to be used for renaming.
   _MAX_RENAME_THREADS = 64
 
   def __init__(self,
@@ -785,8 +781,7 @@ class FileSink(iobase.Sink):
     writer_results = sorted(writer_results)
     num_shards = len(writer_results)
     channel_factory = ChannelFactory()
-    min_threads = min(num_shards / FileSink._WRITE_RESULTS_PER_RENAME_THREAD,
-                      FileSink._MAX_RENAME_THREADS)
+    min_threads = min(num_shards, FileSink._MAX_RENAME_THREADS)
     num_threads = max(1, min_threads)
 
     rename_ops = []
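After this change the thread count is simply the shard count capped at the maximum and floored at one; the two-step `min`/`max` in the diff is a clamp. A standalone sketch of the arithmetic:

```python
_MAX_RENAME_THREADS = 64  # cap taken from FileSink in the diff above

def rename_thread_count(num_shards):
    # One rename thread per shard, never more than the cap, and never
    # fewer than one thread (even when there are zero shards).
    return max(1, min(num_shards, _MAX_RENAME_THREADS))
```

Removing the `_WRITE_RESULTS_PER_RENAME_THREAD` divisor means small jobs no longer collapse to a single rename thread, which is the throttling the commit title calls unnecessary.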



[1/2] incubator-beam git commit: Closes #965

2016-09-19 Thread robertwb
Repository: incubator-beam
Updated Branches:
  refs/heads/python-sdk 4b7fe2dc5 -> 29b55e956


Closes #965


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/29b55e95
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/29b55e95
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/29b55e95

Branch: refs/heads/python-sdk
Commit: 29b55e95600fbf299ac7eb9527ff33bd8030275e
Parents: 4b7fe2d 24bb8f1
Author: Robert Bradshaw 
Authored: Mon Sep 19 17:39:47 2016 -0700
Committer: Robert Bradshaw 
Committed: Mon Sep 19 17:39:47 2016 -0700

--
 sdks/python/apache_beam/io/fileio.py | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)
--




[jira] [Commented] (BEAM-625) Make Dataflow Python Materialized PCollection representation more efficient

2016-09-19 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/BEAM-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505137#comment-15505137 ]

ASF GitHub Bot commented on BEAM-625:
-

GitHub user katsiapis opened a pull request:

https://github.com/apache/incubator-beam/pull/976

[BEAM-625] Making Dataflow Python Materialized PCollection representation 
more efficient (5 of several).

Changing ToStringCoder to BytesCoder since the former can't be used with 
sources (it can only be used with Sinks). Since we are opening files as 'rb', 
using BytesCoder makes sense.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/katsiapis/incubator-beam python-sdk

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/976.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #976


commit 1b97de04987231227f6612d3f8c57b1eaf76408b
Author: Gus Katsiapis 
Date:   2016-09-20T00:21:45Z

Changing ToStringCoder to BytesCoder since the former can't be used with
sources.




> Make Dataflow Python Materialized PCollection representation more efficient
> ---
>
> Key: BEAM-625
> URL: https://issues.apache.org/jira/browse/BEAM-625
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Konstantinos Katsiapis
>Assignee: Frances Perry
>
> This will be a several step process which will involve adding better support 
> for compression as well as Avro.





[GitHub] incubator-beam pull request #973: Deprecate TeardownPolicy for Dataflow serv...

2016-09-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/973




[2/2] incubator-beam git commit: Closes #973

2016-09-19 Thread dhalperi
Closes #973


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/9c8e19c1
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/9c8e19c1
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/9c8e19c1

Branch: refs/heads/master
Commit: 9c8e19c1c69084fdfa3e24886185279b42f258d3
Parents: 62c56c9 7509308
Author: Dan Halperin 
Authored: Mon Sep 19 17:22:31 2016 -0700
Committer: Dan Halperin 
Committed: Mon Sep 19 17:22:31 2016 -0700

--
 .../dataflow/options/DataflowPipelineWorkerPoolOptions.java   | 3 +++
 1 file changed, 3 insertions(+)
--




[1/2] incubator-beam git commit: Deprecate TeardownPolicy for Dataflow service

2016-09-19 Thread dhalperi
Repository: incubator-beam
Updated Branches:
  refs/heads/master 62c56c99b -> 9c8e19c1c


Deprecate TeardownPolicy for Dataflow service

We are moving towards supporting only TEARDOWN_ALWAYS.


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/7509308c
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/7509308c
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/7509308c

Branch: refs/heads/master
Commit: 7509308c7eca146f94463fd3c6adc9e9b84c4ac2
Parents: 62c56c9
Author: Eugene Kirpichov 
Authored: Mon Sep 19 15:12:51 2016 -0700
Committer: Eugene Kirpichov 
Committed: Mon Sep 19 15:12:51 2016 -0700

--
 .../dataflow/options/DataflowPipelineWorkerPoolOptions.java   | 3 +++
 1 file changed, 3 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/7509308c/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineWorkerPoolOptions.java
--
diff --git a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineWorkerPoolOptions.java b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineWorkerPoolOptions.java
index 6c59f38..7d043c3 100644
--- a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineWorkerPoolOptions.java
+++ b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineWorkerPoolOptions.java
@@ -192,7 +192,10 @@ public interface DataflowPipelineWorkerPoolOptions extends PipelineOptions {
 
   /**
    * The policy for tearing down the workers spun up by the service.
+   *
+   * @deprecated Dataflow Service will only support TEARDOWN_ALWAYS policy in the future.
    */
+  @Deprecated
   public enum TeardownPolicy {
     /**
      * All VMs created for a Dataflow job are deleted when the job finishes, regardless of whether



Jenkins build is still unstable: beam_PostCommit_RunnableOnService_GoogleCloudDataflow #1171

2016-09-19 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-643) Allow users to specify a custom service account

2016-09-19 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/BEAM-643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504940#comment-15504940 ]

ASF GitHub Bot commented on BEAM-643:
-

GitHub user chamikaramj opened a pull request:

https://github.com/apache/incubator-beam/pull/975

[BEAM-643] Adds support for specifying a custom service account.

Adds support for specifying a custom service account when using 
DataflowPipelineRunner.

Updates Dataflow API client to latest version.

Adds ability to skip generated files during lint checks.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/incubator-beam custom_service_account

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/975.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #975


commit 11ace177a238cbd8439b1ee8d13e83cf6285c304
Author: Chamikara Jayalath 
Date:   2016-09-19T22:52:53Z

Adds support for specifying a custom service account.

Updates Dataflow API client to latest version.

Adds ability to skip generated files during lint checks.




> Allow users to specify a custom service account
> ---
>
> Key: BEAM-643
> URL: https://issues.apache.org/jira/browse/BEAM-643
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py
>Reporter: Chamikara Jayalath
>Assignee: Chamikara Jayalath
>
> Users should be able to specify a custom service account to be used
> when creating VMs. This feature is specific to DataflowRunner, and a
> corresponding user option will be added to GoogleCloudOptions.





[GitHub] incubator-beam pull request #975: [BEAM-643] Adds support for specifying a c...

2016-09-19 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/incubator-beam/pull/975

[BEAM-643] Adds support for specifying a custom service account.

Adds support for specifying a custom service account when using 
DataflowPipelineRunner.

Updates Dataflow API client to latest version.

Adds ability to skip generated files during lint checks.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/incubator-beam 
custom_service_account

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/975.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #975


commit 11ace177a238cbd8439b1ee8d13e83cf6285c304
Author: Chamikara Jayalath 
Date:   2016-09-19T22:52:53Z

Adds support for specifying a custom service account.

Updates Dataflow API client to latest version.

Adds ability to skip generated files during lint checks.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (BEAM-643) Allow users to specify a custom service account

2016-09-19 Thread Chamikara Jayalath (JIRA)
Chamikara Jayalath created BEAM-643:
---

 Summary: Allow users to specify a custom service account
 Key: BEAM-643
 URL: https://issues.apache.org/jira/browse/BEAM-643
 Project: Beam
  Issue Type: New Feature
  Components: sdk-py
Reporter: Chamikara Jayalath
Assignee: Chamikara Jayalath


Users should be able to specify a custom service account which can be used when 
creating VMs. This feature is specific to DataflowRunner and a corresponding user 
option will be added to GoogleCloudOptions.





[jira] [Commented] (BEAM-642) FlinkRunner does not support Detached Mode

2016-09-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504855#comment-15504855
 ] 

ASF GitHub Bot commented on BEAM-642:
-

GitHub user sumitchawla opened a pull request:

https://github.com/apache/incubator-beam/pull/974

BEAM-642 - Support Flink Detached Mode for JOB execution

FlinkRunner gives exception when job is submitted in detached mode.
Following code will throw exception:
{code}
 LOG.info("Execution finished in {} msecs", result.getNetRuntime()); 
{code}

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sumitchawla/incubator-beam master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/974.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #974


commit 25482cc32fb04965b055d9ab376095d75e10080b
Author: Sumit Chawla 
Date:   2016-09-19T22:10:53Z

BEAM-642 - Support Flink Detached Mode for JOB execution




> FlinkRunner does not support Detached Mode
> --
>
> Key: BEAM-642
> URL: https://issues.apache.org/jira/browse/BEAM-642
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Sumit Chawla
>Assignee: Amit Sela
>
> FlinkRunner gives exception when job is submitted in detached mode.
> Following code will throw exception: 
> {code}
>  LOG.info("Execution finished in {} msecs", result.getNetRuntime()); 
> {code}
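The call above only succeeds on an attached job result; a detached submission returns before the job finishes, so there are no runtime statistics to report. A minimal, self-contained sketch of the guard such a fix needs — the class names here are hypothetical stand-ins for Flink's attached and detached `JobExecutionResult` variants, not the actual Flink API:

```java
// Stand-ins for Flink's attached/detached job result types (hypothetical names).
interface JobResult {
    boolean isDetached();
    long getNetRuntime(); // throws when detached -- stats are unavailable
}

class AttachedResult implements JobResult {
    private final long runtimeMillis;
    AttachedResult(long runtimeMillis) { this.runtimeMillis = runtimeMillis; }
    public boolean isDetached() { return false; }
    public long getNetRuntime() { return runtimeMillis; }
}

class DetachedResult implements JobResult {
    public boolean isDetached() { return true; }
    public long getNetRuntime() {
        throw new UnsupportedOperationException("no stats in detached mode");
    }
}

public class DetachedModeGuard {
    // The essence of the fix: only read runtime stats when the job ran attached.
    static String describe(JobResult result) {
        if (result.isDetached()) {
            return "Job submitted in detached mode";
        }
        return "Execution finished in " + result.getNetRuntime() + " msecs";
    }

    public static void main(String[] args) {
        System.out.println(describe(new AttachedResult(1234)));
        System.out.println(describe(new DetachedResult())); // no longer throws
    }
}
```

The same branch could instead catch the exception, but checking the result type up front keeps the logging path exception-free.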





[GitHub] incubator-beam pull request #974: BEAM-642 - Support Flink Detached Mode for...

2016-09-19 Thread sumitchawla
GitHub user sumitchawla opened a pull request:

https://github.com/apache/incubator-beam/pull/974

BEAM-642 - Support Flink Detached Mode for JOB execution

FlinkRunner gives exception when job is submitted in detached mode.
Following code will throw exception:
{code}
 LOG.info("Execution finished in {} msecs", result.getNetRuntime()); 
{code}

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sumitchawla/incubator-beam master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/974.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #974


commit 25482cc32fb04965b055d9ab376095d75e10080b
Author: Sumit Chawla 
Date:   2016-09-19T22:10:53Z

BEAM-642 - Support Flink Detached Mode for JOB execution






[jira] [Updated] (BEAM-639) BigtableIO.Read: support for user specified row range

2016-09-19 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-639:
-
Summary: BigtableIO.Read: support for user specified row range  (was: 
BigtableIO.Read: expose withStartKey/withEndKey)

> BigtableIO.Read: support for user specified row range
> -
>
> Key: BEAM-639
> URL: https://issues.apache.org/jira/browse/BEAM-639
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
>Priority: Minor
> Fix For: 0.3.0-incubating
>
>
> BigtableIO.Read takes a table and a filter, but does not let the user 
> customize the row range. If a user wants to scan a relatively narrow range, 
> they can do it by specifying a filter ("starts with abcd") or ("less than 
> abcd"), but under the hood Cloud Bigtable will implement this as a full table 
> scan -> filter instead of a narrower scan.
> We should expose these setters publicly so that users can implement row scans 
> directly.





[jira] [Updated] (BEAM-642) FlinkRunner does not support Detached Mode

2016-09-19 Thread Sumit Chawla (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumit Chawla updated BEAM-642:
--
Description: 
FlinkRunner gives exception when job is submitted in detached mode.

Following code will throw exception: 
{code}
 LOG.info("Execution finished in {} msecs", result.getNetRuntime()); 

{code}

  was:
FlinkRunner gives exception when job is submitted in detached mode.

Following code will throw exception: 
{{   LOG.info("Execution finished in {} msecs", result.getNetRuntime()); }}


> FlinkRunner does not support Detached Mode
> --
>
> Key: BEAM-642
> URL: https://issues.apache.org/jira/browse/BEAM-642
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Sumit Chawla
>Assignee: Amit Sela
>
> FlinkRunner gives exception when job is submitted in detached mode.
> Following code will throw exception: 
> {code}
>  LOG.info("Execution finished in {} msecs", result.getNetRuntime()); 
> {code}





[jira] [Closed] (BEAM-639) BigtableIO.Read: support for user specified row range

2016-09-19 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin closed BEAM-639.

   Resolution: Fixed
Fix Version/s: 0.3.0-incubating

> BigtableIO.Read: support for user specified row range
> -
>
> Key: BEAM-639
> URL: https://issues.apache.org/jira/browse/BEAM-639
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
>Priority: Minor
> Fix For: 0.3.0-incubating
>
>
> BigtableIO.Read takes a table and a filter, but does not let the user 
> customize the row range. If a user wants to scan a relatively narrow range, 
> they can do it by specifying a filter ("starts with abcd") or ("less than 
> abcd"), but under the hood Cloud Bigtable will implement this as a full table 
> scan -> filter instead of a narrower scan.
> We should expose these setters publicly so that users can implement row scans 
> directly.





[GitHub] incubator-beam pull request #973: Deprecate TeardownPolicy for Dataflow serv...

2016-09-19 Thread jkff
GitHub user jkff opened a pull request:

https://github.com/apache/incubator-beam/pull/973

Deprecate TeardownPolicy for Dataflow service

We are moving towards supporting only TEARDOWN_ALWAYS.

(This is a clone of 
https://github.com/pjesa/DataflowJavaSDK/commit/657ce1ff85caf44f7cf16d67b973055657980c2e
 on behalf of @pjesa)

R: @dhalperi 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jkff/incubator-beam deprecate-teardown

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/973.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #973


commit 7509308c7eca146f94463fd3c6adc9e9b84c4ac2
Author: Eugene Kirpichov 
Date:   2016-09-19T22:12:51Z

Deprecate TeardownPolicy for Dataflow service

We are moving towards supporting only TEARDOWN_ALWAYS.






[jira] [Commented] (BEAM-642) FlinkRunner does not support Detached Mode

2016-09-19 Thread Sumit Chawla (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504842#comment-15504842
 ] 

Sumit Chawla commented on BEAM-642:
---

Can somebody please assign this issue to me?

> FlinkRunner does not support Detached Mode
> --
>
> Key: BEAM-642
> URL: https://issues.apache.org/jira/browse/BEAM-642
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Sumit Chawla
>Assignee: Amit Sela
>
> FlinkRunner gives exception when job is submitted in detached mode.
> Following code will throw exception: 
> {{   LOG.info("Execution finished in {} msecs", result.getNetRuntime()); 
> }}





[jira] [Updated] (BEAM-642) FlinkRunner does not support Detached Mode

2016-09-19 Thread Sumit Chawla (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumit Chawla updated BEAM-642:
--
Description: 
FlinkRunner gives exception when job is submitted in detached mode.

Following code will throw exception: 
{{   LOG.info("Execution finished in {} msecs", result.getNetRuntime()); }}

  was:
When using {{JavaSerializer}} instead of {{KryoSerializer}} by configuring 
{{JavaSparkContext}} with  {{"spark.serializer"}} set to 
{{JavaSerializer.class.getCanonicalName()}}, an exception is thrown:

{noformat}
object not serializable (class: 
org.apache.beam.sdk.util.WindowedValue$ValueInGlobalWindow, value: 
ValueInGlobalWindow{value=hi there, pane=PaneInfo.NO_FIRING})
{noformat}

{noformat}
Serialization stack:
- object not serializable (class: 
org.apache.beam.sdk.util.WindowedValue$ValueInGlobalWindow, value: 
ValueInGlobalWindow{value=hi there, pane=PaneInfo.NO_FIRING})
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at 
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
at org.apache.spark.rdd.RDD.count(RDD.scala:1157)
at 
org.apache.spark.api.java.JavaRDDLike$class.count(JavaRDDLike.scala:440)
at 
org.apache.spark.api.java.AbstractJavaRDDLike.count(JavaRDDLike.scala:46)
at 
org.apache.beam.runners.spark.translation.streaming.StreamingEvaluationContext$1.call(StreamingEvaluationContext.java:175)
at 
org.apache.beam.runners.spark.translation.streaming.StreamingEvaluationContext$1.call(StreamingEvaluationContext.java:172)
at 
org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$3.apply(JavaDStreamLike.scala:335)
at 
org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$3.apply(JavaDStreamLike.scala:335)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
at 
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:50)
at 
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
at 
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
at 
org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:426)
at 
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:49)
at 
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
at 
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
at 
org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:224)
at 
org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:224)
at 

[jira] [Created] (BEAM-642) FlinkRunner does not support Detached Mode

2016-09-19 Thread Sumit Chawla (JIRA)
Sumit Chawla created BEAM-642:
-

 Summary: FlinkRunner does not support Detached Mode
 Key: BEAM-642
 URL: https://issues.apache.org/jira/browse/BEAM-642
 Project: Beam
  Issue Type: Bug
  Components: runner-spark
Reporter: Sumit Chawla
Assignee: Amit Sela


When using {{JavaSerializer}} instead of {{KryoSerializer}} by configuring 
{{JavaSparkContext}} with  {{"spark.serializer"}} set to 
{{JavaSerializer.class.getCanonicalName()}}, an exception is thrown:

{noformat}
object not serializable (class: 
org.apache.beam.sdk.util.WindowedValue$ValueInGlobalWindow, value: 
ValueInGlobalWindow{value=hi there, pane=PaneInfo.NO_FIRING})
{noformat}

{noformat}
Serialization stack:
- object not serializable (class: 
org.apache.beam.sdk.util.WindowedValue$ValueInGlobalWindow, value: 
ValueInGlobalWindow{value=hi there, pane=PaneInfo.NO_FIRING})
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at 
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
at org.apache.spark.rdd.RDD.count(RDD.scala:1157)
at 
org.apache.spark.api.java.JavaRDDLike$class.count(JavaRDDLike.scala:440)
at 
org.apache.spark.api.java.AbstractJavaRDDLike.count(JavaRDDLike.scala:46)
at 
org.apache.beam.runners.spark.translation.streaming.StreamingEvaluationContext$1.call(StreamingEvaluationContext.java:175)
at 
org.apache.beam.runners.spark.translation.streaming.StreamingEvaluationContext$1.call(StreamingEvaluationContext.java:172)
at 
org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$3.apply(JavaDStreamLike.scala:335)
at 
org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$3.apply(JavaDStreamLike.scala:335)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
at 
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:50)
at 
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
at 
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
at 
org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:426)
at 
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:49)
at 
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
at 
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
at 
org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:224)
at 
org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:224)
at 

[jira] [Updated] (BEAM-642) FlinkRunner does not support Detached Mode

2016-09-19 Thread Sumit Chawla (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumit Chawla updated BEAM-642:
--
Component/s: (was: runner-spark)
 runner-flink

> FlinkRunner does not support Detached Mode
> --
>
> Key: BEAM-642
> URL: https://issues.apache.org/jira/browse/BEAM-642
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Sumit Chawla
>Assignee: Amit Sela
>
> When using {{JavaSerializer}} instead of {{KryoSerializer}} by configuring 
> {{JavaSparkContext}} with  {{"spark.serializer"}} set to 
> {{JavaSerializer.class.getCanonicalName()}}, an exception is thrown:
> {noformat}
> object not serializable (class: 
> org.apache.beam.sdk.util.WindowedValue$ValueInGlobalWindow, value: 
> ValueInGlobalWindow{value=hi there, pane=PaneInfo.NO_FIRING})
> {noformat}
> {noformat}
> Serialization stack:
>   - object not serializable (class: 
> org.apache.beam.sdk.util.WindowedValue$ValueInGlobalWindow, value: 
> ValueInGlobalWindow{value=hi there, pane=PaneInfo.NO_FIRING})
>   at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>   at 
> org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
>   at org.apache.spark.rdd.RDD.count(RDD.scala:1157)
>   at 
> org.apache.spark.api.java.JavaRDDLike$class.count(JavaRDDLike.scala:440)
>   at 
> org.apache.spark.api.java.AbstractJavaRDDLike.count(JavaRDDLike.scala:46)
>   at 
> org.apache.beam.runners.spark.translation.streaming.StreamingEvaluationContext$1.call(StreamingEvaluationContext.java:175)
>   at 
> org.apache.beam.runners.spark.translation.streaming.StreamingEvaluationContext$1.call(StreamingEvaluationContext.java:172)
>   at 
> org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$3.apply(JavaDStreamLike.scala:335)
>   at 
> org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$3.apply(JavaDStreamLike.scala:335)
>   at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
>   at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
>   at 
> org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:50)
>   at 
> org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
>   at 
> org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
>   at 
> org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:426)
>   at 
> org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:49)
>   at 
> org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
>   at 
> org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
>   at scala.util.Try$.apply(Try.scala:161)
>   at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
>   at 
> 

[jira] [Commented] (BEAM-639) BigtableIO.Read: expose withStartKey/withEndKey

2016-09-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504834#comment-15504834
 ] 

ASF GitHub Bot commented on BEAM-639:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/969


> BigtableIO.Read: expose withStartKey/withEndKey
> ---
>
> Key: BEAM-639
> URL: https://issues.apache.org/jira/browse/BEAM-639
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
>Priority: Minor
>
> BigtableIO.Read takes a table and a filter, but does not let the user 
> customize the row range. If a user wants to scan a relatively narrow range, 
> they can do it by specifying a filter ("starts with abcd") or ("less than 
> abcd"), but under the hood Cloud Bigtable will implement this as a full table 
> scan -> filter instead of a narrower scan.
> We should expose these setters publicly so that users can implement row scans 
> directly.





[GitHub] incubator-beam pull request #972: checkstyle.xml: blacklist imports of two s...

2016-09-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/972




[GitHub] incubator-beam pull request #969: [BEAM-639] BigtableIO: add support for use...

2016-09-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/969




[2/2] incubator-beam git commit: Closes #969

2016-09-19 Thread dhalperi
Closes #969


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/62c56c99
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/62c56c99
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/62c56c99

Branch: refs/heads/master
Commit: 62c56c99b109e060aad6d210d81163a1fda36825
Parents: 984b32f dace48c
Author: Dan Halperin 
Authored: Mon Sep 19 15:05:18 2016 -0700
Committer: Dan Halperin 
Committed: Mon Sep 19 15:05:18 2016 -0700

--
 .../beam/sdk/io/gcp/bigtable/BigtableIO.java| 54 ---
 .../sdk/io/gcp/bigtable/BigtableIOTest.java | 95 
 2 files changed, 122 insertions(+), 27 deletions(-)
--




[1/2] incubator-beam git commit: [BEAM-639] BigtableIO: add support for users to scan table subranges

2016-09-19 Thread dhalperi
Repository: incubator-beam
Updated Branches:
  refs/heads/master 984b32ff2 -> 62c56c99b


[BEAM-639] BigtableIO: add support for users to scan table subranges


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/dace48c7
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/dace48c7
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/dace48c7

Branch: refs/heads/master
Commit: dace48c70d2d00500514c53734e9dd45dcb1465f
Parents: 984b32f
Author: Dan Halperin 
Authored: Mon Sep 19 11:54:07 2016 -0700
Committer: Dan Halperin 
Committed: Mon Sep 19 15:05:17 2016 -0700

--
 .../beam/sdk/io/gcp/bigtable/BigtableIO.java| 54 ---
 .../sdk/io/gcp/bigtable/BigtableIOTest.java | 95 
 2 files changed, 122 insertions(+), 27 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/dace48c7/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java
--
diff --git 
a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java
 
b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java
index 67dde50..c1b882a 100644
--- 
a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java
+++ 
b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java
@@ -76,8 +76,9 @@ import org.slf4j.LoggerFactory;
  *
  * To configure a Cloud Bigtable source, you must supply a table id and a 
{@link BigtableOptions}
  * or builder configured with the project and other information necessary to 
identify the
- * Bigtable instance. A {@link RowFilter} may also optionally be specified 
using
- * {@link BigtableIO.Read#withRowFilter}. For example:
+ * Bigtable instance. By default, {@link BigtableIO.Read} will read all rows 
in the table. The row
+ * range to be read can optionally be restricted using {@link 
BigtableIO.Read#withKeyRange}, and
+ * a {@link RowFilter} can be specified using {@link 
BigtableIO.Read#withRowFilter}. For example:
  *
  * {@code
  * BigtableOptions.Builder optionsBuilder =
@@ -93,6 +94,14 @@ import org.slf4j.LoggerFactory;
  * .withBigtableOptions(optionsBuilder)
  * .withTableId("table"));
  *
+ * // Scan a prefix of the table.
+ * ByteKeyRange keyRange = ...;
+ * p.apply("read",
+ * BigtableIO.read()
+ * .withBigtableOptions(optionsBuilder)
+ * .withTableId("table")
+ * .withKeyRange(keyRange));
+ *
  * // Scan a subset of rows that match the specified row filter.
  * p.apply("filtered read",
  * BigtableIO.read()
@@ -152,7 +161,7 @@ public class BigtableIO {
*/
   @Experimental
   public static Read read() {
-return new Read(null, "", null, null);
+return new Read(null, "", ByteKeyRange.ALL_KEYS, null, null);
   }
 
   /**
@@ -215,7 +224,7 @@ public class BigtableIO {
   .build());
   BigtableOptions optionsWithAgent = 
clonedBuilder.setUserAgent(getUserAgent()).build();
 
-  return new Read(optionsWithAgent, tableId, filter, bigtableService);
+  return new Read(optionsWithAgent, tableId, keyRange, filter, 
bigtableService);
 }
 
 /**
@@ -226,7 +235,17 @@ public class BigtableIO {
  */
 public Read withRowFilter(RowFilter filter) {
   checkNotNull(filter, "filter");
-  return new Read(options, tableId, filter, bigtableService);
+  return new Read(options, tableId, keyRange, filter, bigtableService);
+}
+
+/**
+ * Returns a new {@link BigtableIO.Read} that will read only rows in the 
specified range.
+ *
+ * Does not modify this object.
+ */
+public Read withKeyRange(ByteKeyRange keyRange) {
+  checkNotNull(keyRange, "keyRange");
+  return new Read(options, tableId, keyRange, filter, bigtableService);
 }
 
 /**
@@ -236,7 +255,7 @@ public class BigtableIO {
  */
 public Read withTableId(String tableId) {
   checkNotNull(tableId, "tableId");
-  return new Read(options, tableId, filter, bigtableService);
+  return new Read(options, tableId, keyRange, filter, bigtableService);
 }
 
 /**
@@ -247,6 +266,14 @@ public class BigtableIO {
 }
 
 /**
+ * Returns the range of keys that will be read from the table. By default, returns
+ * {@link ByteKeyRange#ALL_KEYS} to scan the entire table.
+ */
+public ByteKeyRange getKeyRange() {
+  return keyRange;
+}
+
+/**
  * Returns the table being read from.
  */
 public String getTableId() {
@@ -256,7 +283,7 @@ public class BigtableIO {
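The Javadoc example in this diff elides how the prefix `keyRange` is built (`ByteKeyRange keyRange = ...;`). As a standalone, JDK-only sketch of the underlying idea — the helper name `endKeyForPrefix` is hypothetical and not part of the Beam or Bigtable API — a prefix scan can be expressed as the half-open range [prefix, incrementedPrefix), where the exclusive end key is the prefix with trailing 0xFF bytes stripped and the last remaining byte incremented:

```java
import java.util.Arrays;

public class PrefixRange {
    // Given a key prefix, compute the smallest key strictly greater than every
    // key with that prefix: strip trailing 0xFF bytes, then increment the last
    // remaining byte. Returns null when no such key exists (prefix is all 0xFF),
    // in which case the scan should run to the end of the table.
    static byte[] endKeyForPrefix(byte[] prefix) {
        int i = prefix.length - 1;
        while (i >= 0 && prefix[i] == (byte) 0xFF) {
            i--;
        }
        if (i < 0) {
            return null; // a prefix of all 0xFF bytes has no finite upper bound
        }
        byte[] end = Arrays.copyOf(prefix, i + 1);
        end[i]++;
        return end;
    }

    public static void main(String[] args) {
        byte[] end = endKeyForPrefix(new byte[] {'u', 's', 'e', 'r'});
        // "user" -> "uses": every key starting with "user" sorts below "uses".
        System.out.println(new String(end)); // prints "uses"
    }
}
```

In Beam itself one would then wrap the start and end bytes with the SDK's ByteKey/ByteKeyRange factories; the exact factory method names should be checked against the SDK Javadoc.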
 

[2/2] incubator-beam git commit: Closes #972

2016-09-19 Thread dhalperi
Closes #972


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/984b32ff
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/984b32ff
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/984b32ff

Branch: refs/heads/master
Commit: 984b32ff23232d2c42a668d408bd19fe07f46040
Parents: c403675 a9b68aa
Author: Dan Halperin 
Authored: Mon Sep 19 15:04:44 2016 -0700
Committer: Dan Halperin 
Committed: Mon Sep 19 15:04:44 2016 -0700

--
 sdks/java/build-tools/src/main/resources/beam/checkstyle.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--




[1/2] incubator-beam git commit: checkstyle.xml: blacklist imports of two shaded dependencies

2016-09-19 Thread dhalperi
Repository: incubator-beam
Updated Branches:
  refs/heads/master c4036753f -> 984b32ff2


checkstyle.xml: blacklist imports of two shaded dependencies


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/a9b68aa2
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/a9b68aa2
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/a9b68aa2

Branch: refs/heads/master
Commit: a9b68aa210519828e34545905cee256aa8a783c4
Parents: c403675
Author: Dan Halperin 
Authored: Mon Sep 19 14:27:16 2016 -0700
Committer: Dan Halperin 
Committed: Mon Sep 19 14:27:34 2016 -0700

--
 sdks/java/build-tools/src/main/resources/beam/checkstyle.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/a9b68aa2/sdks/java/build-tools/src/main/resources/beam/checkstyle.xml
--
diff --git a/sdks/java/build-tools/src/main/resources/beam/checkstyle.xml b/sdks/java/build-tools/src/main/resources/beam/checkstyle.xml
index a3313ca..48418e9 100644
--- a/sdks/java/build-tools/src/main/resources/beam/checkstyle.xml
+++ b/sdks/java/build-tools/src/main/resources/beam/checkstyle.xml
@@ -103,7 +103,7 @@ page at http://checkstyle.sourceforge.net/config.html -->
 
 
 
-  
+  
 
 
 

[GitHub] incubator-beam pull request #972: checkstyle.xml: blacklist imports of two s...

2016-09-19 Thread dhalperi
GitHub user dhalperi opened a pull request:

https://github.com/apache/incubator-beam/pull/972

checkstyle.xml: blacklist imports of two shaded dependencies

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dhalperi/incubator-beam checkstyle-blacklist

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/972.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #972


commit a9b68aa210519828e34545905cee256aa8a783c4
Author: Dan Halperin 
Date:   2016-09-19T21:27:16Z

checkstyle.xml: blacklist imports of two shaded dependencies




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-606) Create MqttIO

2016-09-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504693#comment-15504693
 ] 

ASF GitHub Bot commented on BEAM-606:
-

GitHub user jbonofre opened a pull request:

https://github.com/apache/incubator-beam/pull/971

[BEAM-606] Create MqttIO

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [X] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [X] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [X] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [X] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jbonofre/incubator-beam BEAM-606-MQTTIO

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/971.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #971


commit f826b4a8a671dc6a02067d9e5feda6a4c1ab3405
Author: Jean-Baptiste Onofré 
Date:   2016-09-12T16:49:36Z

[BEAM-606] Create MqttIO




> Create MqttIO
> -
>
> Key: BEAM-606
> URL: https://issues.apache.org/jira/browse/BEAM-606
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
>
> As we now have JmsIO, it would make sense to provide a MQTT IO (unbounded).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-beam pull request #971: [BEAM-606] Create MqttIO

2016-09-19 Thread jbonofre
GitHub user jbonofre opened a pull request:

https://github.com/apache/incubator-beam/pull/971

[BEAM-606] Create MqttIO

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [X] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [X] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [X] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [X] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jbonofre/incubator-beam BEAM-606-MQTTIO

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/971.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #971


commit f826b4a8a671dc6a02067d9e5feda6a4c1ab3405
Author: Jean-Baptiste Onofré 
Date:   2016-09-12T16:49:36Z

[BEAM-606] Create MqttIO






[jira] [Commented] (BEAM-640) Update Python SDK README text

2016-09-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504496#comment-15504496
 ] 

ASF GitHub Bot commented on BEAM-640:
-

GitHub user hadarhg opened a pull request:

https://github.com/apache/incubator-beam/pull/970

[BEAM-640] Update Python SDK README text

Update the README for Beam Python SDK:
- [x] Remove references to Alpha release
- [x] Change "Dataflow" references to "Beam"
- [x] Update Getting Started instructions 
- [x] Update the rest of the sections and the TOC

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hadarhg/incubator-beam python-sdk

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/970.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #970


commit 2cf8e4f0910909df1698294b2f27abdbcdfd8cc1
Author: Hadar Hod 
Date:   2016-09-19T19:08:00Z

Update README to reflect Dataflow to Apache Beam migration. Remove 
references to Alpha for Python SDK.

commit 80de63dd045ec912de24054ea30448e7700f2745
Author: Hadar Hod 
Date:   2016-09-19T19:23:27Z

Merge branch 'python-sdk' of https://github.com/apache/incubator-beam into 
python-sdk




> Update Python SDK README text
> -
>
> Key: BEAM-640
> URL: https://issues.apache.org/jira/browse/BEAM-640
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py
>Reporter: Hadar Hod
>Assignee: Hadar Hod
>Priority: Minor
>
> Update text for README - 
> https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/README.md



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-beam pull request #970: [BEAM-640] Update Python SDK README text

2016-09-19 Thread hadarhg
GitHub user hadarhg opened a pull request:

https://github.com/apache/incubator-beam/pull/970

[BEAM-640] Update Python SDK README text

Update the README for Beam Python SDK:
- [x] Remove references to Alpha release
- [x] Change "Dataflow" references to "Beam"
- [x] Update Getting Started instructions 
- [x] Update the rest of the sections and the TOC

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hadarhg/incubator-beam python-sdk

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/970.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #970


commit 2cf8e4f0910909df1698294b2f27abdbcdfd8cc1
Author: Hadar Hod 
Date:   2016-09-19T19:08:00Z

Update README to reflect Dataflow to Apache Beam migration. Remove 
references to Alpha for Python SDK.

commit 80de63dd045ec912de24054ea30448e7700f2745
Author: Hadar Hod 
Date:   2016-09-19T19:23:27Z

Merge branch 'python-sdk' of https://github.com/apache/incubator-beam into 
python-sdk






[jira] [Created] (BEAM-641) Need to test the generated archetypes projects

2016-09-19 Thread Pei He (JIRA)
Pei He created BEAM-641:
---

 Summary: Need to test the generated archetypes projects
 Key: BEAM-641
 URL: https://issues.apache.org/jira/browse/BEAM-641
 Project: Beam
  Issue Type: Test
Reporter: Pei He
Assignee: Pei He
Priority: Minor


Travis and Jenkins pre-submits don't test building the generated archetypes 
projects.
Currently, changes to archetypes have to be manually verified by:

mvn archetype:generate \
-DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
-DarchetypeGroupId=org.apache.beam \
-DarchetypeVersion=0.3.0-incubating-SNAPSHOT \
-DgroupId=com.example \
-DartifactId=first-beam \
-Dversion="0.3.0-incubating-SNAPSHOT" \
-DinteractiveMode=false \
-Dpackage=org.apache.beam.examples

and then running "mvn clean install" in the generated first-beam project.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-641) Need to test the generated archetypes projects

2016-09-19 Thread Pei He (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pei He updated BEAM-641:

Component/s: testing

> Need to test the generated archetypes projects
> --
>
> Key: BEAM-641
> URL: https://issues.apache.org/jira/browse/BEAM-641
> Project: Beam
>  Issue Type: Test
>  Components: testing
>Reporter: Pei He
>Assignee: Pei He
>Priority: Minor
>
> Travis and Jenkins pre-submits don't test building the generated archetypes 
> projects.
> Currently, changes to archetypes have to be manually verified by:
> mvn archetype:generate \
> -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
> -DarchetypeGroupId=org.apache.beam \
> -DarchetypeVersion=0.3.0-incubating-SNAPSHOT \
> -DgroupId=com.example \
> -DartifactId=first-beam \
> -Dversion="0.3.0-incubating-SNAPSHOT" \
> -DinteractiveMode=false \
> -Dpackage=org.apache.beam.examples
> and did "mvn clean install" in first-beam project.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-640) Update Python SDK README text

2016-09-19 Thread Hadar Hod (JIRA)
Hadar Hod created BEAM-640:
--

 Summary: Update Python SDK README text
 Key: BEAM-640
 URL: https://issues.apache.org/jira/browse/BEAM-640
 Project: Beam
  Issue Type: Task
  Components: sdk-py
Reporter: Hadar Hod
Assignee: Hadar Hod
Priority: Minor


Update text for README - 
https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/README.md



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-639) BigtableIO.Read: expose withStartKey/withEndKey

2016-09-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504349#comment-15504349
 ] 

ASF GitHub Bot commented on BEAM-639:
-

GitHub user dhalperi opened a pull request:

https://github.com/apache/incubator-beam/pull/969

[BEAM-639] BigtableIO: add support for users to scan table subranges

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dhalperi/incubator-beam bigtable-key-setters

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/969.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #969


commit 9796ae1ec09cd299cacd71ff80232bcf58f45cbd
Author: Dan Halperin 
Date:   2016-09-19T18:54:07Z

[BEAM-639] BigtableIO: add support for users to scan table subranges




> BigtableIO.Read: expose withStartKey/withEndKey
> ---
>
> Key: BEAM-639
> URL: https://issues.apache.org/jira/browse/BEAM-639
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
>Priority: Minor
>
> BigtableIO.Read takes a table and a filter, but does not let the user 
> customize the row range. If a user wants to scan a relatively narrow range, 
> they can do it by specifying a filter ("starts with abcd") or ("less than 
> abcd"), but under the hood Cloud Bigtable will implement this as a full table 
> scan -> filter instead of a narrower scan.
> We should expose these setters publicly so that users can implement row scans 
> directly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-beam pull request #969: [BEAM-639] BigtableIO: add support for use...

2016-09-19 Thread dhalperi
GitHub user dhalperi opened a pull request:

https://github.com/apache/incubator-beam/pull/969

[BEAM-639] BigtableIO: add support for users to scan table subranges

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dhalperi/incubator-beam bigtable-key-setters

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/969.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #969


commit 9796ae1ec09cd299cacd71ff80232bcf58f45cbd
Author: Dan Halperin 
Date:   2016-09-19T18:54:07Z

[BEAM-639] BigtableIO: add support for users to scan table subranges






[jira] [Created] (BEAM-639) BigtableIO.Read: expose withStartKey/withEndKey

2016-09-19 Thread Daniel Halperin (JIRA)
Daniel Halperin created BEAM-639:


 Summary: BigtableIO.Read: expose withStartKey/withEndKey
 Key: BEAM-639
 URL: https://issues.apache.org/jira/browse/BEAM-639
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-gcp
Reporter: Daniel Halperin
Assignee: Daniel Halperin
Priority: Minor


BigtableIO.Read takes a table and a filter, but does not let the user customize 
the row range. If a user wants to scan a relatively narrow range, they can do 
it by specifying a filter ("starts with abcd") or ("less than abcd"), but under 
the hood Cloud Bigtable will implement this as a full table scan -> filter 
instead of a narrower scan.

We should expose these setters publicly so that users can implement row scans 
directly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-633) Be able to import Beam codebase in Eclipse and support m2e

2016-09-19 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/BEAM-633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Baptiste Onofré resolved BEAM-633.
---
   Resolution: Duplicate
Fix Version/s: 0.3.0-incubating

> Be able to import Beam codebase in Eclipse and support m2e
> --
>
> Key: BEAM-633
> URL: https://issues.apache.org/jira/browse/BEAM-633
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
> Fix For: 0.3.0-incubating
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-635) Release 0.2.0-incubating - Support Flink Release Version 1.1.2

2016-09-19 Thread JIRA

[ 
https://issues.apache.org/jira/browse/BEAM-635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15503779#comment-15503779
 ] 

Jean-Baptiste Onofré commented on BEAM-635:
---

I would recommend waiting for the 0.3.0 release, which contains the Flink upgrade.

> Release 0.2.0-incubating - Support Flink Release Version 1.1.2
> --
>
> Key: BEAM-635
> URL: https://issues.apache.org/jira/browse/BEAM-635
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Sumit Chawla
>
> Support the latest Flink release, version 1.1.0, in the BEAM 0.2.0-incubating 
> release.
> 0.2.0-incubating has just been released. It would be great if we could add 
> support for the latest Flink 1.1.0 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-638) Add a Window function to create a bounded PCollection from an unbounded one

2016-09-19 Thread JIRA
Jean-Baptiste Onofré created BEAM-638:
-

 Summary: Add a Window function to create a bounded PCollection 
from an unbounded one
 Key: BEAM-638
 URL: https://issues.apache.org/jira/browse/BEAM-638
 Project: Beam
  Issue Type: New Feature
  Components: sdk-java-core
Reporter: Jean-Baptiste Onofré
Assignee: Davor Bonaci


Today, if the pipeline source is unbounded and the sink expects a bounded 
collection, there's no way to use a single pipeline. Even though a window 
creates a chunk of the unbounded PCollection, the "sub" PCollection is still 
unbounded.
It would be helpful for users to have a Window function that creates a bounded 
PCollection (per window) from an unbounded PCollection coming from the source.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Jenkins build is still unstable: beam_PostCommit_RunnableOnService_GoogleCloudDataflow #1169

2016-09-19 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-637) WindowedValue$ValueInGlobalWindow is not serializable when using JavaSerializer instead of Kryo

2016-09-19 Thread Amit Sela (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502734#comment-15502734
 ] 

Amit Sela commented on BEAM-637:


By default, the SparkRunner is hard-coded to use Kryo for serialisation.
In many places though, especially where shuffling data is concerned, we use Beam 
coders to encode-shuffle-decode.
It looks like this happens in the "triggering-count", which is used to trigger 
an action on a branch of the DAG if none was applied (Spark is lazily evaluated 
and requires one). Since this is a trigger-only op, you could try 
"WindowingHelpers.unwindowValueFunction()" before the count; that will just 
leave T (the elements) obligated to be Serializable. The most robust solution 
would be to encode them with "CoderHelpers.toByteArrays(windowedValues, 
windowCoder)", so that you count byte[] elements. 
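
The failure mode and the byte[] workaround can be reproduced with plain JDK serialization. This is a conceptual sketch only: `NotSerializable` stands in for `WindowedValue$ValueInGlobalWindow`, and `getBytes()` stands in for a real Beam coder such as the one `CoderHelpers.toByteArrays` would use.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;

public class SerializationWorkaround {
    // Stand-in for a value type that is not java.io.Serializable,
    // like ValueInGlobalWindow under Spark's JavaSerializer.
    static class NotSerializable {
        final String value;
        NotSerializable(String value) { this.value = value; }
    }

    // Plain Java serialization, as performed by spark.serializer=JavaSerializer.
    static byte[] javaSerialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        NotSerializable v = new NotSerializable("hi there");
        try {
            javaSerialize(v); // fails: the class does not implement Serializable
            throw new IllegalStateException("expected NotSerializableException");
        } catch (NotSerializableException expected) {
            // this is the failure mode reported in BEAM-637
        }
        // Workaround: encode the element to byte[] first (conceptually what
        // encoding with a Beam coder does); byte[] serializes fine under any
        // Spark serializer, so the triggering count operates on byte[] elements.
        byte[] encoded = v.value.getBytes(); // stand-in for a real coder
        byte[] roundTrippable = javaSerialize(encoded);
        System.out.println(roundTrippable.length > 0); // prints true
    }
}
```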

> WindowedValue$ValueInGlobalWindow is not serializable when using 
> JavaSerializer instead of Kryo 
> 
>
> Key: BEAM-637
> URL: https://issues.apache.org/jira/browse/BEAM-637
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Stas Levin
>Assignee: Amit Sela
>
> When using {{JavaSerializer}} instead of {{KryoSerializer}} by configuring 
> {{JavaSparkContext}} with  {{"spark.serializer"}} set to 
> {{JavaSerializer.class.getCanonicalName()}}, an exception is thrown:
> {noformat}
> object not serializable (class: 
> org.apache.beam.sdk.util.WindowedValue$ValueInGlobalWindow, value: 
> ValueInGlobalWindow{value=hi there, pane=PaneInfo.NO_FIRING})
> {noformat}
> {noformat}
> Serialization stack:
>   - object not serializable (class: 
> org.apache.beam.sdk.util.WindowedValue$ValueInGlobalWindow, value: 
> ValueInGlobalWindow{value=hi there, pane=PaneInfo.NO_FIRING})
>   at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>   at 
> org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
>   at org.apache.spark.rdd.RDD.count(RDD.scala:1157)
>   at 
> org.apache.spark.api.java.JavaRDDLike$class.count(JavaRDDLike.scala:440)
>   at 
> org.apache.spark.api.java.AbstractJavaRDDLike.count(JavaRDDLike.scala:46)
>   at 
> org.apache.beam.runners.spark.translation.streaming.StreamingEvaluationContext$1.call(StreamingEvaluationContext.java:175)
>   at 
> org.apache.beam.runners.spark.translation.streaming.StreamingEvaluationContext$1.call(StreamingEvaluationContext.java:172)
>   at 
> org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$3.apply(JavaDStreamLike.scala:335)
>   at 
> org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$3.apply(JavaDStreamLike.scala:335)
>   at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
>   at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
>   at 
> org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:50)
>   at 
> 

[jira] [Commented] (BEAM-621) Add MapValues and MapKeys functions

2016-09-19 Thread Marco Buccini (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502707#comment-15502707
 ] 

Marco Buccini commented on BEAM-621:


Another user opened a similar issue: 
https://issues.apache.org/jira/browse/BEAM-611 
Maybe these two need to be merged. 

> Add MapValues and MapKeys functions
> ---
>
> Key: BEAM-621
> URL: https://issues.apache.org/jira/browse/BEAM-621
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
>
> Currently, we have the {{MapElements}} {{PTransform}} that "convert" a 
> {{PCollection}} of {{KV}} to another {{PCollection}} (for instance 
> {{String}}).
> A very classic mapping function is to just have the keys or values of {{KV}}.
> To do it currently, we can use {{MapElements}} or a generic {{ParDo}} (with 
> {{DoFn}}).
> It would be helpful and reduce the user code to have {{MapValues}} and 
> {{MapKeys}}. It would take a {{PCollection}} of {{KV}}: {{MapKeys}} will map 
> the input {{PCollection}} to a {{PCollection}} of {{K}} and {{MapValues}} to 
> a {{PCollection}} of {{V}}.
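
Since MapKeys/MapValues don't exist yet, the projection they would provide can be sketched in plain JDK code over key/value pairs. The class and method names below are hypothetical, and a real Beam version would instead be a MapElements or ParDo over KV<K, V> calling getKey()/getValue():

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MapKeysSketch {
    // Plain-JDK analogue of the proposed MapKeys: project a collection of
    // key/value pairs onto just its keys.
    static <K, V> List<K> mapKeys(List<Map.Entry<K, V>> pairs) {
        return pairs.stream().map(Map.Entry::getKey).collect(Collectors.toList());
    }

    // Plain-JDK analogue of the proposed MapValues: project onto just the values.
    static <K, V> List<V> mapValues(List<Map.Entry<K, V>> pairs) {
        return pairs.stream().map(Map.Entry::getValue).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> kvs = List.of(
                new SimpleEntry<>("a", 1), new SimpleEntry<>("b", 2));
        System.out.println(mapKeys(kvs));   // prints [a, b]
        System.out.println(mapValues(kvs)); // prints [1, 2]
    }
}
```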



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-637) WindowedValue$ValueInGlobalWindow is not serializable when using JavaSerializer instead of Kryo

2016-09-19 Thread Stas Levin (JIRA)
Stas Levin created BEAM-637:
---

 Summary: WindowedValue$ValueInGlobalWindow is not serializable 
when using JavaSerializer instead of Kryo 
 Key: BEAM-637
 URL: https://issues.apache.org/jira/browse/BEAM-637
 Project: Beam
  Issue Type: Bug
  Components: runner-spark
Reporter: Stas Levin
Assignee: Amit Sela


When using {{JavaSerializer}} instead of {{KryoSerializer}} by configuring 
{{JavaSparkContext}} with  {{"spark.serializer"}} set to 
{{JavaSerializer.class.getCanonicalName()}}, an exception is thrown:

{noformat}
object not serializable (class: 
org.apache.beam.sdk.util.WindowedValue$ValueInGlobalWindow, value: 
ValueInGlobalWindow{value=hi there, pane=PaneInfo.NO_FIRING})
{noformat}

{noformat}
Serialization stack:
- object not serializable (class: 
org.apache.beam.sdk.util.WindowedValue$ValueInGlobalWindow, value: 
ValueInGlobalWindow{value=hi there, pane=PaneInfo.NO_FIRING})
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at 
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
at org.apache.spark.rdd.RDD.count(RDD.scala:1157)
at 
org.apache.spark.api.java.JavaRDDLike$class.count(JavaRDDLike.scala:440)
at 
org.apache.spark.api.java.AbstractJavaRDDLike.count(JavaRDDLike.scala:46)
at 
org.apache.beam.runners.spark.translation.streaming.StreamingEvaluationContext$1.call(StreamingEvaluationContext.java:175)
at 
org.apache.beam.runners.spark.translation.streaming.StreamingEvaluationContext$1.call(StreamingEvaluationContext.java:172)
at 
org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$3.apply(JavaDStreamLike.scala:335)
at 
org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$3.apply(JavaDStreamLike.scala:335)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
at 
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:50)
at 
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
at 
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
at 
org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:426)
at 
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:49)
at 
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
at 
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
at 
org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:224)
at 
org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:224)
at 

Jenkins build is still unstable: beam_PostCommit_RunnableOnService_GoogleCloudDataflow #1168

2016-09-19 Thread Apache Jenkins Server
See