[jira] [Commented] (BEAM-377) BigQueryIO should validate a table or query to read from

2016-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15349839#comment-15349839
 ] 

ASF GitHub Bot commented on BEAM-377:
-

GitHub user bjchambers opened a pull request:

https://github.com/apache/incubator-beam/pull/535

[BEAM-377] Validate BigQueryIO.Read is properly configured

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [*] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [*] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [*] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [*] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

Previously, using withoutValidation would disable all validation,
leading to a NullPointerException if there wasn't a table or query
provided.

The intention of the withoutValidation parameter is to bypass more
expensive (and possibly incorrect) checks, such as verifying that the
table exists prior to pipeline execution, in cases where earlier
stages create the table.

This change makes the basic usage validation always happen, while the
extended validation can still be disabled by withoutValidation.

This closes BEAM-377.
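The split described above can be sketched in plain Java. This is an illustration only, with hypothetical names; it is not the actual BigQueryIO code.

```java
// Hypothetical sketch: basic usage checks always run, while expensive
// checks (e.g. "does the table already exist?") are the only part that
// withoutValidation() bypasses. Not the actual BigQueryIO implementation.
public class ReadValidation {
    public static void validateBasic(String table, String query) {
        // Always runs: catches the misconfiguration that previously
        // surfaced as a NullPointerException.
        if (table == null && query == null) {
            throw new IllegalStateException(
                "Must specify a table or a query to read from");
        }
        if (table != null && query != null) {
            throw new IllegalStateException(
                "Specify either a table or a query, not both");
        }
    }

    public static void validateExtended(String table, boolean validate) {
        if (!validate) {
            return; // withoutValidation() disables only this part
        }
        // An expensive service call would go here, e.g. checking that the
        // table exists before the pipeline runs (omitted in this sketch).
    }

    public static void main(String[] args) {
        validateBasic("project:dataset.table", null); // well-configured: no exception
        try {
            validateBasic(null, null);
            throw new AssertionError("expected IllegalStateException");
        } catch (IllegalStateException expected) {
            // basic validation fired even without extended validation
        }
    }
}
```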

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bjchambers/incubator-beam beam-377-bigquery-io

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/535.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #535


commit bf4f8b693aab539282c72dbc77195cd07d3f24f8
Author: Ben Chambers 
Date:   2016-06-25T21:11:17Z

[BEAM-377] Validate BigQueryIO.Read is properly configured

Previously, using withoutValidation would disable all validation,
leading to a NullPointerException if there wasn't a table or query
provided.

The intention of the withoutValidation parameter is to bypass more
expensive (and possibly incorrect) checks, such as verifying that the
table exists prior to pipeline execution, in cases where earlier
stages create the table.

This change makes the basic usage validation always happen, while the
extended validation can still be disabled by withoutValidation.




> BigQueryIO should validate a table or query to read from
> 
>
> Key: BEAM-377
> URL: https://issues.apache.org/jira/browse/BEAM-377
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Reporter: Ben Chambers
>Assignee: Ben Chambers
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-beam pull request #535: [BEAM-377] Validate BigQueryIO.Read is pro...

2016-06-25 Thread bjchambers
GitHub user bjchambers opened a pull request:

https://github.com/apache/incubator-beam/pull/535

[BEAM-377] Validate BigQueryIO.Read is properly configured

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [*] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [*] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [*] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [*] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

Previously, using withoutValidation would disable all validation,
leading to a NullPointerException if there wasn't a table or query
provided.

The intention of the withoutValidation parameter is to bypass more
expensive (and possibly incorrect) checks, such as verifying that the
table exists prior to pipeline execution, in cases where earlier
stages create the table.

This change makes the basic usage validation always happen, while the
extended validation can still be disabled by withoutValidation.

This closes BEAM-377.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bjchambers/incubator-beam beam-377-bigquery-io

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/535.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #535


commit bf4f8b693aab539282c72dbc77195cd07d3f24f8
Author: Ben Chambers 
Date:   2016-06-25T21:11:17Z

[BEAM-377] Validate BigQueryIO.Read is properly configured

Previously, using withoutValidation would disable all validation,
leading to a NullPointerException if there wasn't a table or query
provided.

The intention of the withoutValidation parameter is to bypass more
expensive (and possibly incorrect) checks, such as verifying that the
table exists prior to pipeline execution, in cases where earlier
stages create the table.

This change makes the basic usage validation always happen, while the
extended validation can still be disabled by withoutValidation.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (BEAM-377) BigQueryIO should validate a table or query to read from

2016-06-25 Thread Ben Chambers (JIRA)
Ben Chambers created BEAM-377:
-

 Summary: BigQueryIO should validate a table or query to read from
 Key: BEAM-377
 URL: https://issues.apache.org/jira/browse/BEAM-377
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-extensions
Reporter: Ben Chambers
Assignee: Ben Chambers
Priority: Minor








[jira] [Commented] (BEAM-159) Support fixed number of shards in sinks

2016-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15349764#comment-15349764
 ] 

ASF GitHub Bot commented on BEAM-159:
-

GitHub user dhalperi opened a pull request:

https://github.com/apache/incubator-beam/pull/534

[BEAM-159][BEAM-92][BEAM-365] Write: add support for setting a fixed number 
of shards

And remove special support in Dataflow and Direct runners for it.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dhalperi/incubator-beam write-numshards

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/534.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #534


commit ccdcd40c249373a15f10dc70555276e7f3218f12
Author: Dan Halperin 
Date:   2016-06-14T21:03:41Z

Write: add support for setting a fixed number of shards

And remove special support in Dataflow and Direct runners for it.




> Support fixed number of shards in sinks
> ---
>
> Key: BEAM-159
> URL: https://issues.apache.org/jira/browse/BEAM-159
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core
>Reporter: Eugene Kirpichov
>Assignee: Daniel Halperin
>
> TextIO supports .withNumShards, however custom sinks, in particular 
> FileBasedSinks, provide no support for controlling sharding. Some users want 
> this, e.g. 
> http://stackoverflow.com/questions/36316304/set-num-of-output-shard-in-write-tosink-in-dataflow





[GitHub] incubator-beam pull request #534: [BEAM-159][BEAM-92][BEAM-365] Write: add s...

2016-06-25 Thread dhalperi
GitHub user dhalperi opened a pull request:

https://github.com/apache/incubator-beam/pull/534

[BEAM-159][BEAM-92][BEAM-365] Write: add support for setting a fixed number 
of shards

And remove special support in Dataflow and Direct runners for it.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dhalperi/incubator-beam write-numshards

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/534.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #534


commit ccdcd40c249373a15f10dc70555276e7f3218f12
Author: Dan Halperin 
Date:   2016-06-14T21:03:41Z

Write: add support for setting a fixed number of shards

And remove special support in Dataflow and Direct runners for it.






[jira] [Assigned] (BEAM-365) TextIO withoutSharding causes Flink to throw IllegalStateException

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin reassigned BEAM-365:


Assignee: Daniel Halperin

> TextIO withoutSharding causes Flink to throw IllegalStateException
> --
>
> Key: BEAM-365
> URL: https://issues.apache.org/jira/browse/BEAM-365
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 0.2.0-incubating
>Reporter: Pawel Szczur
>Assignee: Daniel Halperin
>
> The exception: 
> {code}java.lang.IllegalStateException: Shard name template '' only generated 
> 1 distinct file names for 3 files{code}
> The initial discussion took place some time ago, the {{withoutSharding}} was 
> then silently ignored by the runner.
> Explanation from Aljoscha Krettek:
> {quote}
> Hi,
> the issue is a bit more complicated and involves the Beam sink API and the
> Flink runner.
> I'll have to get a bit into how Beam sinks work. The base class for sinks
> is Sink (TextIO.write gets translated to Write.to(new TextSink())).
> Write.to normally gets translated to three ParDo operations that cooperate
> to do the writing:
>  - "Initialize": this does initial initialization of the Sink, this is run
> only once, per sink, non-parallel.
>  - "WriteBundles": this gets an initialized sink on a side-input and the
> values to write on the main input. This runs in parallel, so for Flink, if
> you set parallelism=6 you'll get 6 parallel instances of this operation at
> runtime. This operation forwards information about where it writes to
> downstream. This does not write to the final file location but an
> intermediate staging location.
>  - "Finalize": This gets the initialized sink on the main-input and the
> information about written files from "WriteBundles" as a side-input. This
> also only runs once, non-parallel. Here we're writing the intermediate
> files to a final location based on the sharding template.
> The problem is that Write.to() and TextSink, as well as all other sinks,
> are not aware of the number of shards. If you set "withoutSharding()" this
> will set the shard template to "" (empty string) and the number of shards
> to 1. "WriteBundles", however is not aware of this and will write 6
> intermediate files if you set parallelism=6. In "Finalize" we will copy an
> intermediate file to the same final location 6 times based on the sharding
> template. The end result is that you only get one of the six result shards.
> The reason why this does only occur in the Flink runner is that all other
> runners have special overrides for TextIO.Write and AvroIO.Write that kick
> in if sharding control is required. So, for the time being this is a Flink
> runner bug and we might have to introduce special overrides as well until
> this is solved in the general case.
> Cheers,
> Aljoscha
> {quote}
> Original discussion:
> http://mail-archives.apache.org/mod_mbox/incubator-beam-user/201606.mbox/%3CCAMdX74-VPUsNOc9NKue2A2tYXZisnHNZ7UkPWk82_TFexpnySg%40mail.gmail.com%3E
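The shard-template collision described in the quote above can be illustrated with a toy substitution function. This is a deliberately simplified stand-in for Beam's real shard-name templating (which handles padded runs of S and N), used only to show why an empty template yields one distinct name for many parallel writers.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of shard naming: an empty template produces the same file name
// for every shard, so parallel writers overwrite each other's output.
public class ShardNames {
    public static String shardName(String prefix, String template,
                                   int shard, int numShards) {
        // Simplified substitution: S -> shard index, N -> shard count.
        String name = template
            .replace("S", Integer.toString(shard))
            .replace("N", Integer.toString(numShards));
        return prefix + name;
    }

    public static void main(String[] args) {
        Set<String> names = new HashSet<>();
        for (int shard = 0; shard < 3; shard++) {
            // withoutSharding() sets the template to "" (empty string)
            names.add(shardName("output", "", shard, 3));
        }
        // 3 parallel writers, but only 1 distinct file name: the
        // "only generated 1 distinct file names for 3 files" failure.
        assert names.size() == 1;
    }
}
```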





[jira] [Updated] (BEAM-350) beam_PostCommit_RunnableOnService_GoogleCloudDataflow flaky

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-350:
-
Fix Version/s: Not applicable

> beam_PostCommit_RunnableOnService_GoogleCloudDataflow flaky
> ---
>
> Key: BEAM-350
> URL: https://issues.apache.org/jira/browse/BEAM-350
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
> Fix For: Not applicable
>
>
> These tests have been flaking for a little while with 500 server errors 
> calling Google Cloud Dataflow APIs. There may be additional retry logic 
> needed in the SDK around the calls to check the job status.
> {code}
> {
>   "code" : 500,
>   "errors" : [ {
> "domain" : "global",
> "message" : "Internal error encountered.",
> "reason" : "backendError"
>   } ],
>   "message" : "Internal error encountered.",
>   "status" : "INTERNAL"
> }
>   at 
> com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
>   at 
> com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
>   at 
> com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
>   at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:321)
>   at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1065)
>   at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
>   at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
>   at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
>   at 
> org.apache.beam.runners.dataflow.testing.TestDataflowPipelineRunner.checkForSuccess(TestDataflowPipelineRunner.java:185)
>   at 
> org.apache.beam.runners.dataflow.testing.TestDataflowPipelineRunner.run(TestDataflowPipelineRunner.java:141)
>   ... 25 more
> {code}
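The additional retry logic suggested above might look like the following sketch. The helper and its parameters are hypothetical; this is not the actual TestDataflowPipelineRunner code, which would also want to retry only on transient 5xx errors rather than on every exception.

```java
import java.util.concurrent.Callable;

// Hypothetical retry wrapper with exponential backoff for transient
// server errors around a job-status call. Illustrative only.
public class StatusRetry {
    public static <T> T withRetries(Callable<T> call, int maxAttempts,
                                    long baseBackoffMillis) throws Exception {
        long backoff = baseBackoffMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) { // real code should retry only 5xx errors
                last = e;
                if (attempt == maxAttempts) {
                    break;
                }
                Thread.sleep(backoff);
                backoff *= 2; // exponential backoff between attempts
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] failures = {2}; // simulate two 500s, then success
        String status = withRetries(() -> {
            if (failures[0]-- > 0) {
                throw new RuntimeException("backendError");
            }
            return "JOB_STATE_DONE";
        }, 5, 10);
        assert status.equals("JOB_STATE_DONE");
    }
}
```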





[jira] [Updated] (BEAM-364) Integration Tests for Bigtable Read and Write

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-364:
-
Assignee: Ian Zhou  (was: Daniel Halperin)

> Integration Tests for Bigtable Read and Write
> -
>
> Key: BEAM-364
> URL: https://issues.apache.org/jira/browse/BEAM-364
> Project: Beam
>  Issue Type: Test
>  Components: sdk-java-gcp
>Reporter: Ian Zhou
>Assignee: Ian Zhou
>Priority: Minor
>
> Integration tests should be added for BigtableIO.read and BigtableIO.write.





[jira] [Updated] (BEAM-355) Update to Bigtable version 0.3.0

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-355:
-
Assignee: Ian Zhou  (was: Daniel Halperin)

> Update to Bigtable version 0.3.0
> 
>
> Key: BEAM-355
> URL: https://issues.apache.org/jira/browse/BEAM-355
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Reporter: Ian Zhou
>Assignee: Ian Zhou
>Priority: Minor
>
> The Bigtable version should be updated from 0.2.3 to 0.3.0. This allows for 
> large performance improvements through bulk writes and reads.





[jira] [Reopened] (BEAM-159) Support fixed number of shards in sinks

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin reopened BEAM-159:
--
  Assignee: Daniel Halperin

> Support fixed number of shards in sinks
> ---
>
> Key: BEAM-159
> URL: https://issues.apache.org/jira/browse/BEAM-159
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core
>Reporter: Eugene Kirpichov
>Assignee: Daniel Halperin
>
> TextIO supports .withNumShards, however custom sinks, in particular 
> FileBasedSinks, provide no support for controlling sharding. Some users want 
> this, e.g. 
> http://stackoverflow.com/questions/36316304/set-num-of-output-shard-in-write-tosink-in-dataflow





[jira] [Updated] (BEAM-240) Add display data link URLs for sources / sinks

2016-06-25 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci updated BEAM-240:
--
Fix Version/s: (was: 0.1.0-incubating)
   Not applicable

> Add display data link URLs for sources / sinks
> --
>
> Key: BEAM-240
> URL: https://issues.apache.org/jira/browse/BEAM-240
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Assignee: Scott Wegner
> Fix For: Not applicable
>
>






[jira] [Updated] (BEAM-322) Compare encoded keys in streaming mode

2016-06-25 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci updated BEAM-322:
--
Fix Version/s: (was: 0.1.0-incubating)
   0.2.0-incubating

> Compare encoded keys in streaming mode
> --
>
> Key: BEAM-322
> URL: https://issues.apache.org/jira/browse/BEAM-322
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Affects Versions: 0.1.0-incubating
>Reporter: Maximilian Michels
> Fix For: 0.2.0-incubating
>
>
> Right now, hashing of keys happens on the value itself not on the encoded 
> representation. This is at odds with the Beam specification and can lead to 
> incorrect results.
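The fix direction described above can be sketched by grouping keys on their encoded bytes rather than on the value's own equals/hashCode. A UTF-8 string encoding stands in for a real Beam Coder here; this is an illustration, not the Flink runner's code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative sketch: hash and compare keys by their encoded form, as the
// Beam model specifies, instead of by the value object itself.
public class EncodedKeys {
    // Stand-in for Coder#encode: a real runner would use the key's Coder.
    public static byte[] encode(String key) {
        return key.getBytes(StandardCharsets.UTF_8);
    }

    // Hash of the encoded representation, used for keyed grouping.
    public static int encodedHash(String key) {
        return Arrays.hashCode(encode(key));
    }

    // Equality of the encoded representation.
    public static boolean sameKey(String a, String b) {
        return Arrays.equals(encode(a), encode(b));
    }

    public static void main(String[] args) {
        assert sameKey("user-1", "user-1");
        assert !sameKey("user-1", "user-2");
    }
}
```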





[jira] [Updated] (BEAM-366) Support Display Data on Composite Transforms

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-366:
-
Issue Type: Improvement  (was: Bug)

> Support Display Data on Composite Transforms
> 
>
> Key: BEAM-366
> URL: https://issues.apache.org/jira/browse/BEAM-366
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Ben Chambers
>
> Today, Dataflow doesn't represent composites all the way to the UI (it 
> reconstructs them from the name). This means it doesn't support attaching 
> Display Data to composites.
> With the runner API refactoring, Dataflow should start supporting composites, 
> at which point we should make sure that Display Data is plumbed through 
> properly.





[jira] [Updated] (BEAM-206) FileBasedSink does serial renames

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-206:
-
Priority: Major  (was: Critical)

> FileBasedSink does serial renames
> -
>
> Key: BEAM-206
> URL: https://issues.apache.org/jira/browse/BEAM-206
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-gcp
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
>
> This code can be a bottleneck when there are many small files; it should be 
> parallelized where possible.
> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java#L633
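One way to parallelize that final rename step is to fan the renames out over a thread pool and then wait on all of them. The rename itself is simulated below (the real FileBasedSink goes through a file system API); names and thread counts are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: rename many temp files concurrently instead of serially.
public class ParallelRename {
    public static List<String> renameAll(List<String> tempFiles, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String temp : tempFiles) {
                // Simulated rename: strip the temp suffix. A real
                // implementation would move the file on the file system.
                futures.add(pool.submit(() -> temp.replace(".tmp", "")));
            }
            List<String> finalNames = new ArrayList<>();
            for (Future<String> f : futures) {
                finalNames.add(f.get()); // propagates any rename failure
            }
            return finalNames;
        } finally {
            pool.shutdown();
        }
    }
}
```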





[jira] [Updated] (BEAM-374) Enable dependency-plugin and failOnWarning at global level

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-374:
-
Priority: Minor  (was: Major)

> Enable dependency-plugin and failOnWarning at global level
> --
>
> Key: BEAM-374
> URL: https://issues.apache.org/jira/browse/BEAM-374
> Project: Beam
>  Issue Type: Improvement
>  Components: project-management
>Reporter: Pei He
>Assignee: Pei He
>Priority: Minor
>
> Several modules don't run dependency-plugin, and are pulling dependencies 
> from other modules.
> Those modules can be broken by dependency changes in other places (removing, 
> changing version, and even adding).





[jira] [Updated] (BEAM-78) Rename Dataflow to Beam

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-78?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-78:

Fix Version/s: 0.1.0-incubating

> Rename Dataflow to Beam 
> 
>
> Key: BEAM-78
> URL: https://issues.apache.org/jira/browse/BEAM-78
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core
>Reporter: Frances Perry
>Assignee: Jean-Baptiste Onofré
>Priority: Blocker
> Fix For: 0.1.0-incubating
>
>
> The initial code drop contains code that uses "Dataflow" to refer to the 
> SDK/model and Cloud Dataflow service. The first usage needs to be swapped to 
> Beam.
> This includes:
> - mentions throughout the javadoc
> - packages of classes that belong to the java sdk core
> And does not include:
> - the DataflowPipelineRunner
> We plan to postpone this rename until other code drops have been integrated 
> into the repository, and we have completed the refactoring that will separate 
> these two uses.





[jira] [Updated] (BEAM-203) Travis Build is Broken

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-203:
-
Fix Version/s: Not applicable

> Travis Build is Broken
> --
>
> Key: BEAM-203
> URL: https://issues.apache.org/jira/browse/BEAM-203
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
> Fix For: Not applicable
>
>
> Ever since the default runner was changed to the InProcessRunner, Travis 
> fails on CountingInputTest:
> {code}
> Running org.apache.beam.sdk.io.CountingInputTest
> No output has been received in the last 10 minutes, this potentially 
> indicates a stalled build or something wrong with the build itself.
> The build has been terminated
> {code}
> https://travis-ci.org/apache/incubator-beam/jobs/123484813





[jira] [Updated] (BEAM-149) IntelliJ: Java 8 tests configuration

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-149:
-
Fix Version/s: Not applicable

> IntelliJ: Java 8 tests configuration
> 
>
> Key: BEAM-149
> URL: https://issues.apache.org/jira/browse/BEAM-149
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Maximilian Michels
>Priority: Minor
> Fix For: Not applicable
>
>
> Is it only me or does IntelliJ not correctly handle the Java 8 tests module? 
> It always complains about the language level. When I set the language level 
> to Java 8 for this module, it states: "Error:java: javacTask: source release 
> 1.8 requires target release 1.8".
> My solution for now is to unmark the sources directory for the test. However, 
> I'd like to run the Java 8 tests from IntelliJ :) Anybody an idea how to fix 
> this?





[jira] [Updated] (BEAM-213) Fix README to use refactored package names and use AutoService for Registrar

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-213:
-
Fix Version/s: Not applicable

> Fix README to use refactored package names and use AutoService for Registrar
> 
>
> Key: BEAM-213
> URL: https://issues.apache.org/jira/browse/BEAM-213
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Amit Sela
> Fix For: Not applicable
>
>
> Change mainClass to org.apache.beam.examples.WordCount
> and runner to the fully qualified runner class





[jira] [Updated] (BEAM-208) Flaky test: org.apache.beam.runners.flink.streaming.GroupByNullKeyTest.testJob

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-208:
-
Fix Version/s: Not applicable

> Flaky test: org.apache.beam.runners.flink.streaming.GroupByNullKeyTest.testJob
> --
>
> Key: BEAM-208
> URL: https://issues.apache.org/jira/browse/BEAM-208
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Davor Bonaci
>Assignee: Thomas Groh
>Priority: Minor
> Fix For: Not applicable
>
>
> org.apache.beam.runners.flink.streaming.GroupByNullKeyTest.testJob sometimes 
> flakes out.
> Error:
> Different number of lines in expected and obtained result. expected:<1> but 
> was:<2>
> Here's an example on Jenkins:
> https://builds.apache.org/job/beam_PostCommit_MavenVerify/org.apache.beam$flink-runner_2.10/199/testReport/junit/org.apache.beam.runners.flink.streaming/GroupByNullKeyTest/testJob/





[jira] [Updated] (BEAM-133) Test flakiness in the Spark runner

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-133:
-
Component/s: (was: project-management)
 testing

> Test flakiness in the Spark runner
> --
>
> Key: BEAM-133
> URL: https://issues.apache.org/jira/browse/BEAM-133
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Davor Bonaci
>Assignee: Davor Bonaci
> Fix For: Not applicable
>
>
> Jenkins shows some flakiness in the Spark runner in the context of an 
> unrelated pre-commit test.
> {code}
> Results :
> Tests in error: 
>   AvroPipelineTest.testGeneric:75 » Runtime java.io.IOException: Could 
> not creat...
>   NumShardsTest.testText:77 » Runtime java.io.IOException: Could not 
> create File...
>   HadoopFileFormatPipelineTest.testSequenceFile:83 » Runtime 
> java.io.IOException...
>   
> TransformTranslatorTest.testTextIOReadAndWriteTransforms:76->runPipeline:96 » 
> Runtime
>   KafkaStreamingTest.testRun:121 » Runtime java.io.IOException: failure 
> to login...
> Tests run: 27, Failures: 0, Errors: 5, Skipped: 0
> [ERROR] There are test failures.
> {code}
> https://builds.apache.org/job/beam_PreCommit/98/console
> Amit, does this sound like a test code issue or an infrastructure issue?





[jira] [Updated] (BEAM-143) Add test for UnboundedSourceWrapper

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-143:
-
Fix Version/s: Not applicable

> Add test for UnboundedSourceWrapper
> ---
>
> Key: BEAM-143
> URL: https://issues.apache.org/jira/browse/BEAM-143
> Project: Beam
>  Issue Type: Test
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
> Fix For: Not applicable
>
>
> A test case should ensure the wrapper works as expected. In particular, a 
> full integration test should be run to check whether serialization and 
> instantiation of the wrapper works in a cluster setup.





[jira] [Updated] (BEAM-133) Test flakiness in the Spark runner

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-133:
-
Fix Version/s: Not applicable

> Test flakiness in the Spark runner
> --
>
> Key: BEAM-133
> URL: https://issues.apache.org/jira/browse/BEAM-133
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Davor Bonaci
>Assignee: Davor Bonaci
> Fix For: Not applicable
>
>
> Jenkins shows some flakiness in the Spark runner in the context of an 
> unrelated pre-commit test.
> {code}
> Results :
> Tests in error: 
>   AvroPipelineTest.testGeneric:75 » Runtime java.io.IOException: Could 
> not creat...
>   NumShardsTest.testText:77 » Runtime java.io.IOException: Could not 
> create File...
>   HadoopFileFormatPipelineTest.testSequenceFile:83 » Runtime 
> java.io.IOException...
>   
> TransformTranslatorTest.testTextIOReadAndWriteTransforms:76->runPipeline:96 » 
> Runtime
>   KafkaStreamingTest.testRun:121 » Runtime java.io.IOException: failure 
> to login...
> Tests run: 27, Failures: 0, Errors: 5, Skipped: 0
> [ERROR] There are test failures.
> {code}
> https://builds.apache.org/job/beam_PreCommit/98/console
> Amit, does this sound like a test code issue or an infrastructure issue?





[jira] [Updated] (BEAM-118) Add DisplayData to SDK

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-118:
-
Fix Version/s: 0.1.0-incubating

> Add DisplayData to SDK
> --
>
> Key: BEAM-118
> URL: https://issues.apache.org/jira/browse/BEAM-118
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Assignee: Scott Wegner
> Fix For: 0.1.0-incubating
>
>






[jira] [Updated] (BEAM-189) The Spark runner uses valueInEmptyWindow which causes values to be dropped

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-189:
-
Fix Version/s: 0.1.0-incubating

> The Spark runner uses valueInEmptyWindow which causes values to be dropped
> --
>
> Key: BEAM-189
> URL: https://issues.apache.org/jira/browse/BEAM-189
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Amit Sela
> Fix For: 0.1.0-incubating
>
>
> Values in empty windows may be dropped at any time, and so the default 
> windowing should use the GlobalWindow.
>  





[jira] [Updated] (BEAM-173) Publish DisplayData for PipelineOptions

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-173:
-
Fix Version/s: 0.1.0-incubating

> Publish DisplayData for PipelineOptions
> ---
>
> Key: BEAM-173
> URL: https://issues.apache.org/jira/browse/BEAM-173
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Scott Wegner
> Fix For: 0.1.0-incubating
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-247) CombineFn's only definable/usable inside sdk.transforms package

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-247:
-
Fix Version/s: Not applicable

> CombineFn's only definable/usable inside sdk.transforms package
> ---
>
> Key: BEAM-247
> URL: https://issues.apache.org/jira/browse/BEAM-247
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Ben Chambers
>Assignee: Pei He
>Priority: Critical
> Fix For: Not applicable
>
>
> {code:java}
> public abstract static class CombineFn<InputT, AccumT, OutputT>
>   extends AbstractGlobalCombineFn<InputT, AccumT, OutputT> { /* ... */ }
> abstract static class AbstractGlobalCombineFn<InputT, AccumT, OutputT>
>   implements GlobalCombineFn<InputT, AccumT, OutputT>, Serializable { /* ... */ }
> {code}
> Since {{AbstractGlobalCombineFn}} is package protected (and therefore not 
> visible outside of the {{transform}} package), it is not possible to cast any 
> class that extends {{CombineFn}} to a {{GlobalCombineFn}} outside of this 
> package.
> This prevents applying existing {{CombineFn}}s directly (such as 
> {{Combine.perKey(new Sum.SumIntegersFn())}}, as used in our documentation), 
> and also means that a user cannot define their own {{CombineFn}} unless they 
> put it in the {{transform}} package.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-204) truncateStackTrace fails with empty stack trace

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-204:
-
Fix Version/s: 0.1.0-incubating

> truncateStackTrace fails with empty stack trace
> ---
>
> Key: BEAM-204
> URL: https://issues.apache.org/jira/browse/BEAM-204
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Malo Denielou
>Assignee: Mark Shields
>Priority: Minor
> Fix For: 0.1.0-incubating
>
>
> From a user job: 
> exception:
> "java.lang.ArrayIndexOutOfBoundsException: -1
>   at 
> com.google.cloud.dataflow.sdk.util.UserCodeException.truncateStackTrace(UserCodeException.java:72)
>   at 
> com.google.cloud.dataflow.sdk.util.UserCodeException.<init>(UserCodeException.java:52)
>   at 
> com.google.cloud.dataflow.sdk.util.UserCodeException.wrap(UserCodeException.java:35)
>   at 
> com.google.cloud.dataflow.sdk.util.UserCodeException.wrapIf(UserCodeException.java:40)
>   at 
> com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.wrapUserCodeException(DoFnRunnerBase.java:369)
>   at 
> com.google.cloud.dataflow.sdk.util.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:51)
>   at 
> com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.processElement(DoFnRunnerBase.java:138)
>   at 
> com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:191)
>   at 
> com.google.cloud.dataflow.sdk.runners.worker.ForwardingParDoFn.processElement(ForwardingParDoFn.java:42)
>   at 
> com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerLoggingParDoFn.processElement(DataflowWorkerLoggingParDoFn.java:47)
>   at 
> com.google.cloud.dataflow.sdk.util.common.worker.ParDoOperation.process(ParDoOperation.java:53)
>   at 
> com.google.cloud.dataflow.sdk.util.common.worker.OutputReceiver.process(OutputReceiver.java:52)
>   at 
> com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn$1.output(SimpleParDoFn.java:161)
>   at 
> com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnContext.outputWindowedValue(DoFnRunnerBase.java:288)
>   at 
> com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnContext.outputWindowedValue(DoFnRunnerBase.java:284)
>   at 
> com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnProcessContext$1.outputWindowedValue(DoFnRunnerBase.java:508)
>   at 
> com.google.cloud.dataflow.sdk.util.AssignWindowsDoFn.processElement(AssignWindowsDoFn.java:65)
>   at 
> com.google.cloud.dataflow.sdk.util.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:49)
>   at 
> com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.processElement(DoFnRunnerBase.java:138)
>   at 
> com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:191)
>   at 
> com.google.cloud.dataflow.sdk.runners.worker.ForwardingParDoFn.processElement(ForwardingParDoFn.java:42)
>   at 
> com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerLoggingParDoFn.processElement(DataflowWorkerLoggingParDoFn.java:47)
>   at 
> com.google.cloud.dataflow.sdk.util.common.worker.ParDoOperation.process(ParDoOperation.java:53)
>   at 
> com.google.cloud.dataflow.sdk.util.common.worker.OutputReceiver.process(OutputReceiver.java:52)
>   at 
> com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn$1.output(SimpleParDoFn.java:161)
>   at 
> com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnContext.sideOutputWindowedValue(DoFnRunnerBase.java:315)
>   at 
> com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnProcessContext.sideOutput(DoFnRunnerBase.java:471)
>   at 
> com.google.cloud.dataflow.sdk.transforms.Partition$PartitionDoFn.processElement(Partition.java:165)
> Looking at the code, it seems that if the user code throwable has an empty 
> stacktrace we would fail.
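A defensive fix can be sketched in isolation (a hypothetical reconstruction, not the actual `UserCodeException` code): guard the truncation logic so that an empty stack trace is returned as-is instead of indexing past the end of the array.

```java
import java.util.Arrays;

public class TruncateSketch {
    // Hypothetical truncation helper: drop all frames at and below the first
    // frame belonging to markerClass, but tolerate an empty stack trace
    // (empty traces can occur, e.g. under -XX:-StackTraceInThrowable).
    static StackTraceElement[] truncate(Throwable t, String markerClass) {
        StackTraceElement[] frames = t.getStackTrace();
        int cut = frames.length; // default: keep everything (also covers length == 0)
        for (int i = 0; i < frames.length; i++) {
            if (frames[i].getClassName().equals(markerClass)) {
                cut = i; // truncate at the first marker frame
                break;
            }
        }
        return Arrays.copyOfRange(frames, 0, cut);
    }

    public static void main(String[] args) {
        Throwable empty = new RuntimeException("user error");
        empty.setStackTrace(new StackTraceElement[0]);
        // Previously this path threw ArrayIndexOutOfBoundsException: -1.
        System.out.println(truncate(empty, "com.example.Marker").length); // prints 0
    }
}
```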



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-199) Improve fluent interface for manipulating DisplayData.Items

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-199:
-
Fix Version/s: 0.1.0-incubating

> Improve fluent interface for manipulating DisplayData.Items
> ---
>
> Key: BEAM-199
> URL: https://issues.apache.org/jira/browse/BEAM-199
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Minor
> Fix For: 0.1.0-incubating
>
>
> See discussion 
> [here|https://github.com/apache/incubator-beam/pull/126#discussion_r59785549].
>  The current fluent API may be difficult to use and could cause some 
> ambiguity. We have some ideas in the linked thread on how to improve it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-231) Remove ClassForDisplay infrastructure class.

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-231:
-
Fix Version/s: 0.1.0-incubating

> Remove ClassForDisplay infrastructure class.
> 
>
> Key: BEAM-231
> URL: https://issues.apache.org/jira/browse/BEAM-231
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Scott Wegner
> Fix For: 0.1.0-incubating
>
>
> See discussion here: 
> https://github.com/apache/incubator-beam/pull/247#discussion-diff-61184975
> This class should no longer be needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-168) IntervalBoundedExponentialBackOff change broke Beam-on-Dataflow

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-168:
-
Fix Version/s: 0.1.0-incubating

> IntervalBoundedExponentialBackOff change broke Beam-on-Dataflow
> ---
>
> Key: BEAM-168
> URL: https://issues.apache.org/jira/browse/BEAM-168
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
> Fix For: 0.1.0-incubating
>
>
> Changing the `int` to a `long` breaks ABI compatibility, on which the 
> Dataflow service relies.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-154) Provide Maven BOM

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-154:
-
Fix Version/s: 0.1.0-incubating

> Provide Maven BOM
> -
>
> Key: BEAM-154
> URL: https://issues.apache.org/jira/browse/BEAM-154
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
> Fix For: 0.1.0-incubating
>
>
> When using the Java SDK (for instance to develop IO), the developer has to 
> add dependencies in their pom.xml (like junit, hamcrest, slf4j, ...).
> To simplify the way to define the dependencies, each Beam SDK could provide a 
> Maven BOM (Bill of Materials) describing these dependencies. Then the 
> developer could simply define this BOM as a pom.xml dependency.
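A sketch of how such a BOM would be consumed in a user's pom.xml; the artifact id `beam-sdks-java-bom` and the version are illustrative assumptions, not a published artifact of this release:

```xml
<!-- Hypothetical BOM import; artifact id and version are illustrative. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.beam</groupId>
      <artifactId>beam-sdks-java-bom</artifactId>
      <version>0.1.0-incubating</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
```

With the BOM imported, individual dependencies can then be declared without explicit versions.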



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-272) Flink Runner depends on Dataflow Runner

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-272:
-
Fix Version/s: 0.1.0-incubating

> Flink Runner depends on Dataflow Runner
> ---
>
> Key: BEAM-272
> URL: https://issues.apache.org/jira/browse/BEAM-272
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
> Fix For: 0.1.0-incubating
>
>
> During restructuring of the modules, we have introduced a dependency of the 
> Flink Runner on the Dataflow Runner. The {{PipelineOptionsFactory}} used to 
> be part of the SDK core but moved to the Dataflow Runner. We should get rid 
> of this dependency to avoid classpath-related problems.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-48) BigQueryIO.Read reimplemented as BoundedSource

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-48?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-48:

Fix Version/s: 0.1.0-incubating

> BigQueryIO.Read reimplemented as BoundedSource
> --
>
> Key: BEAM-48
> URL: https://issues.apache.org/jira/browse/BEAM-48
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-gcp
>Reporter: Daniel Halperin
>Assignee: Pei He
> Fix For: 0.1.0-incubating
>
>
> BigQueryIO.Read is currently implemented in a hacky way: the 
> DirectPipelineRunner streams all rows in the table or query result directly 
> using the JSON API, in a single-threaded manner.
> In contrast, the DataflowPipelineRunner uses an entirely different code path 
> implemented in the Google Cloud Dataflow service. (A BigQuery export job to 
> GCS, followed by a parallel read from GCS).
> We need to reimplement BigQueryIO as a BoundedSource in order to support 
> other runners in a scalable way.
> I additionally suggest that we revisit the design of the BigQueryIO source in 
> the process. A short list:
> * Do not use TableRow as the default value for rows. It could be Map<String, Object> with well-defined types, for example, or an Avro GenericRecord. 
> Dropping TableRow will get around a variety of issues with types, fields 
> named 'f', etc., and it will also reduce confusion as we use TableRow objects 
> differently than usual (for good reason).
> * We could also directly add support for a RowParser to a user's POJO.
> * We should expose TableSchema as a side output from the BigQueryIO.Read.
> * Our builders for BigQueryIO.Read are useful and we should keep them. Where 
> possible we should also allow users to provide the JSON objects that 
> configure the underlying intermediate tables, query export, etc. This would 
> let users directly control result flattening, location of intermediate 
> tables, table decorators, etc., and also optimistically let users take 
> advantage of some new BigQuery features without code changes.
> * We could switch between a BigQuery export + parallel scan vs. an API read 
> based on factors such as the size of the table at pipeline construction 
> time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-128) The transform BigQueryIO.Read is currently not supported.

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-128:
-
Fix Version/s: Not applicable

> The transform BigQueryIO.Read is currently not supported.
> -
>
> Key: BEAM-128
> URL: https://issues.apache.org/jira/browse/BEAM-128
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Kostas Kloudas
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-128) The transform BigQueryIO.Read is currently not supported.

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-128.
--
Resolution: Fixed

Now that BEAM-48 is fixed and the Flink runner supports the Read transform, this should be good!

> The transform BigQueryIO.Read is currently not supported.
> -
>
> Key: BEAM-128
> URL: https://issues.apache.org/jira/browse/BEAM-128
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Kostas Kloudas
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-36) TimestampedValueInMultipleWindows should use a more compact set representation

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-36?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-36:

Fix Version/s: Not applicable

> TimestampedValueInMultipleWindows should use a more compact set representation
> --
>
> Key: BEAM-36
> URL: https://issues.apache.org/jira/browse/BEAM-36
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Priority: Trivial
>  Labels: Windowing
> Fix For: Not applicable
>
>
> Today TimestampedValueInMultipleWindows converts its collection of windows to 
> a LinkedHashSet for comparisons and hashing. Since it is an immutable set, 
> more compact representations are available.
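One compact alternative can be sketched as follows (a hypothetical illustration, not Beam code): since the window collection is immutable, it can be de-duplicated once into a plain array, with Set-contract (order-insensitive) equality and hashing computed over that array.

```java
import java.util.*;

public class CompactWindowSet {
    private final Object[] elements; // immutable after construction

    CompactWindowSet(Collection<?> windows) {
        // De-duplicate once up front, so hashing never needs a temporary set.
        this.elements = new LinkedHashSet<>(windows).toArray();
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof CompactWindowSet)) {
            return false;
        }
        CompactWindowSet other = (CompactWindowSet) o;
        return elements.length == other.elements.length
            && new HashSet<>(Arrays.asList(elements))
                   .containsAll(Arrays.asList(other.elements));
    }

    @Override
    public int hashCode() {
        int h = 0;
        for (Object e : elements) {
            h += e.hashCode(); // java.util.Set contract: sum of element hashes
        }
        return h;
    }
}
```

Equality here is still order-insensitive, matching Set semantics, while the steady-state storage is a single array.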



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-99) Travis: Execute Maven "verify" phase

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-99?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-99:

Fix Version/s: Not applicable

> Travis: Execute Maven "verify" phase
> 
>
> Key: BEAM-99
> URL: https://issues.apache.org/jira/browse/BEAM-99
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
> Fix For: Not applicable
>
>
> For proper execution of all tests, we should execute {{mvn verify}} instead 
> of {{mvn install}}. The verify phase is already part of the install phase.
> Integration tests are typically executed in the verify phase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-12) Apply GroupByKey transforms on PCollection of normal type other than KV

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-12?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-12:

Fix Version/s: Not applicable

> Apply GroupByKey transforms on PCollection of normal type other than KV
> ---
>
> Key: BEAM-12
> URL: https://issues.apache.org/jira/browse/BEAM-12
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: bakeypan
>Assignee: Frances Perry
>Priority: Trivial
> Fix For: Not applicable
>
>
> Now the GroupByKey transform can only be applied to a 
> PCollection<KV<K, V>>. So I have to transform a PCollection<V> into a 
> PCollection<KV<K, V>> before I can apply GroupByKey.
> I think we can do better by applying GroupByKey to a PCollection of a normal 
> type other than KV. The user could supply a custom key-extraction function, 
> or we could offer a default one. Just like this:
> PCollection<V> input = ...
> PCollection<KV<K, V>> result = input.apply(GroupByKey.<K, V>create(new ExtractFn()));
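The proposed semantics resemble grouping by an extracted key; a minimal analogue in plain Java collections (not the Beam API — the ExtractFn and GroupByKey overload in the report are the reporter's proposal) might look like:

```java
import java.util.*;
import java.util.function.Function;
import java.util.stream.Collectors;

public class GroupByExtractedKey {
    // Group values by a user-supplied key-extraction function, so callers
    // need not wrap each element into a KV pair themselves.
    static <K, V> Map<K, List<V>> groupBy(List<V> input, Function<V, K> keyFn) {
        return input.stream().collect(Collectors.groupingBy(keyFn));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "avocado", "banana");
        Map<Character, List<String>> byInitial = groupBy(words, w -> w.charAt(0));
        System.out.println(byInitial.get('a')); // prints [apple, avocado]
    }
}
```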



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-164) Dynamic splitting for unbounded sources

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-164:
-
Fix Version/s: Not applicable

> Dynamic splitting for unbounded sources
> ---
>
> Key: BEAM-164
> URL: https://issues.apache.org/jira/browse/BEAM-164
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Reporter: Daniel Halperin
> Fix For: Not applicable
>
>
> Something like BoundedReader#splitAtFraction, but for Unbounded 
> sources/readers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-180) Add English labels for registered display data

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-180:
-
Fix Version/s: Not applicable

> Add English labels for registered display data
> --
>
> Key: BEAM-180
> URL: https://issues.apache.org/jira/browse/BEAM-180
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Scott Wegner
> Fix For: Not applicable
>
>
> Currently display data is registered with camelCased keys: "maxNumRecords": 
> 1234. We have support to add English labels for UIs to display, but the 
> feature isn't being used yet. We should add labels for common transforms 
> which would benefit from them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-224) Two assert fails with merging windows and closable triggers

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-224:
-
Fix Version/s: Not applicable

> Two assert fails with merging windows and closable triggers
> ---
>
> Key: BEAM-224
> URL: https://issues.apache.org/jira/browse/BEAM-224
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Mark Shields
>Assignee: Mark Shields
> Fix For: Not applicable
>
>
> Sessions.withGapDuration(Duration.standardMinutes(30)).triggering(
> AfterFirst.of(AfterPane.elementCountAtLeast(400),
> AfterProcessingTime.pastFirstElementInPane()
> .plusDelayOf(Duration.standardMinutes(120)),
> AfterWatermark.pastEndOfWindow()))
> .withOutputTimeFn(OutputTimeFns.outputAtEndOfWindow())
> java.lang.NullPointerException
>   at 
> com.google.cloud.dataflow.sdk.util.MergingActiveWindowSet.remove(MergingActiveWindowSet.java:198)
>   at 
> com.google.cloud.dataflow.sdk.util.ReduceFnRunner.clearAllState(ReduceFnRunner.java:625)
>   at 
> com.google.cloud.dataflow.sdk.util.ReduceFnRunner.onTimer(ReduceFnRunner.java:556)
>   at 
> com.google.cloud.dataflow.sdk.util.GroupAlsoByWindowViaWindowSetDoFn.processElement(GroupAlsoByWindowViaWindowSetDoFn.java:85)
>   at 
> com.google.cloud.dataflow.sdk.util.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:49)
>   at 
> com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.processElement(DoFnRunnerBase.java:138)
>   at 
> com.google.cloud.dataflow.sdk.util.LateDataDroppingDoFnRunner.processElement(LateDataDroppingDoFnRunner.java:67)
>   at 
> com.google.cloud.dataflow.sdk.runners.worker.ParDoFnBase.processElement(ParDoFnBase.java:207)
>   at 
> com.google.cloud.dataflow.sdk.runners.worker.ForwardingParDoFn.processElement(ForwardingParDoFn.java:42)
>   at 
> com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerLoggingParDoFn.processElement(DataflowWorkerLoggingParDoFn.java:47)
>   at 
> com.google.cloud.dataflow.sdk.util.common.worker.ParDoOperation.process(ParDoOperation.java:53)
>   at 
> com.google.cloud.dataflow.sdk.util.common.worker.OutputReceiver.process(OutputReceiver.java:52)
>   at 
> com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:223)
>   at 
> com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation.start(ReadOperation.java:169)
>   at 
> com.google.cloud.dataflow.sdk.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:69)
>   at 
> com.google.cloud.dataflow.sdk.runners.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:621)
>   at 
> com.google.cloud.dataflow.sdk.runners.worker.StreamingDataflowWorker.access$500(StreamingDataflowWorker.java:84)
>   at 
> com.google.cloud.dataflow.sdk.runners.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:464)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> 
> java.lang.IllegalStateException: Window 
> [2016-04-25T13:00:04.000Z..2016-04-25T13:30:04.000Z) should have been added
>   at 
> com.google.cloud.dataflow.sdk.repackaged.com.google.common.base.Preconditions.checkState(Preconditions.java:199)
>   at 
> com.google.cloud.dataflow.sdk.util.ReduceFnRunner.processElement(ReduceFnRunner.java:440)
>   at 
> com.google.cloud.dataflow.sdk.util.ReduceFnRunner.processElements(ReduceFnRunner.java:282)
>   at 
> com.google.cloud.dataflow.sdk.util.GroupAlsoByWindowViaWindowSetDoFn.processElement(GroupAlsoByWindowViaWindowSetDoFn.java:83)
>   at 
> com.google.cloud.dataflow.sdk.util.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:49)
>   at 
> com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.processElement(DoFnRunnerBase.java:138)
>   at 
> com.google.cloud.dataflow.sdk.util.LateDataDroppingDoFnRunner.processElement(LateDataDroppingDoFnRunner.java:67)
>   at 
> com.google.cloud.dataflow.sdk.runners.worker.ParDoFnBase.processElement(ParDoFnBase.java:207)
>   at 
> com.google.cloud.dataflow.sdk.runners.worker.ForwardingParDoFn.processElement(ForwardingParDoFn.java:42)
>   at 
> com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerLoggingParDoFn.processElement(DataflowWorkerLoggingParDoFn.java:47)
>   at 
> com.google.cloud.dataflow.sdk.util.common.worker.ParDoOperation.process(ParDoOperation.java:53)
>   at 
> com.google.cloud.dataflow.sdk.util.common.worker.OutputReceiver.process(OutputReceiver.java:52)
>   at 
> 

[jira] [Updated] (BEAM-205) An 'in process' runner optimized for efficiency rather than debugging

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-205:
-
Fix Version/s: Not applicable

> An 'in process' runner optimized for efficiency rather than debugging
> -
>
> Key: BEAM-205
> URL: https://issues.apache.org/jira/browse/BEAM-205
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-direct
>Reporter: Mark Shields
>Assignee: Davor Bonaci
> Fix For: Not applicable
>
>
> IoT applications often want to distribute processing between the device and 
> the cloud. Devices will often massage data, filter, window, or batch for 
> communication efficiency, etc.
> During development one could imagine wanting to refactor processing between 
> the device and the cloud, so having both sides in Beam would be compelling.
> This bug is to explore an 'in process' runner lite, i.e. retain all the 
> streaming support, but disable anything that burns CPU/memory only for the 
> purposes of acid-testing a pipeline w.r.t. serializability, concurrency 
> safety, order assumptions, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-139) Print mode (batch/streaming) during translation

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-139:
-
Fix Version/s: Not applicable

> Print mode (batch/streaming) during translation 
> 
>
> Key: BEAM-139
> URL: https://issues.apache.org/jira/browse/BEAM-139
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core, runner-flink
>Reporter: Maximilian Michels
>Assignee: Davor Bonaci
> Fix For: Not applicable
>
>
> Runners have different feature sets for batch and streaming. It may be a good 
> idea to print/log the translation mode during parsing of the options in the 
> {{PipelineOptionsFactory}}. That would help users understand that they are 
> missing the streaming flag in the options and that the default mode is batch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-230) Remove WindowedValue#valueInEmptyWindows

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-230:
-
Fix Version/s: Not applicable

> Remove WindowedValue#valueInEmptyWindows
> 
>
> Key: BEAM-230
> URL: https://issues.apache.org/jira/browse/BEAM-230
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
> Fix For: Not applicable
>
>
> A WindowedValue in no windows does not exist, and can be dropped by a runner 
> at any time.
> We should also assert that any collection of windows is nonempty when 
> creating a new WindowedValue. If a user wants to drop an element, they should 
> explicitly filter it out rather than expecting it to be dropped by the runner.
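The proposed invariant can be sketched as a fail-fast precondition (hypothetical code, not the actual WindowedValue API):

```java
import java.util.Collection;

public class WindowedValueCheck {
    // Reject construction with an empty window collection: an element in no
    // windows would otherwise be silently droppable by a runner. Users who
    // want to drop an element should filter it out explicitly.
    static <T, W> T requireWindowed(T value, Collection<W> windows) {
        if (windows == null || windows.isEmpty()) {
            throw new IllegalArgumentException(
                "a windowed value must belong to at least one window; "
                + "filter the element out explicitly instead");
        }
        return value;
    }
}
```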



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-227) Log sdk version in worker-startup logs

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-227:
-
Fix Version/s: Not applicable

> Log sdk version in worker-startup logs
> --
>
> Key: BEAM-227
> URL: https://issues.apache.org/jira/browse/BEAM-227
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Mark Shields
>Assignee: Davor Bonaci
> Fix For: Not applicable
>
>
> As a failsafe against bizarre jar versioning problems it would be great if 
> the worker could simply print its build version (as a hard-coded, impossible 
> to mess up string).
> I just saw a customer whose jars were all tagged 1.5.1 but their contents 
> were 1.5.0. I am not able rightly to apprehend the kind of confusion of 
> implementation that could provoke such a scenario.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-185) XmlSink output file pattern missing "." in extension

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-185:
-
Fix Version/s: Not applicable

> XmlSink output file pattern missing "." in extension
> 
>
> Key: BEAM-185
> URL: https://issues.apache.org/jira/browse/BEAM-185
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Priority: Minor
> Fix For: Not applicable
>
>
> The XmlSink takes as input a filename prefix and adds the shard name and 
> extension automatically. However, it is missing the "." when adding the 
> extension.
> For an XmlSink configured as:
> {{XmlSink.write().toFilenamePrefix("foobar");}}
> the fileNamePattern is {{foobar-S-of-Nxml}}. Instead, it should be 
> {{foobar-S-of-N.xml}}
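The fix amounts to inserting the separator when the pattern is assembled; as a sketch (a hypothetical helper, not the actual XmlSink code):

```java
public class ShardPattern {
    // Build the output file pattern from a prefix and extension; "S" and "N"
    // stand for the shard-index and shard-count placeholders. The bug was the
    // missing "." before the extension.
    static String filePattern(String prefix, String extension) {
        return prefix + "-S-of-N" + "." + extension;
    }

    public static void main(String[] args) {
        System.out.println(filePattern("foobar", "xml")); // prints foobar-S-of-N.xml
    }
}
```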



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-236) Implement Windowing in batch execution

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-236:
-
Fix Version/s: Not applicable

> Implement Windowing in batch execution
> --
>
> Key: BEAM-236
> URL: https://issues.apache.org/jira/browse/BEAM-236
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
> Fix For: Not applicable
>
>
> Windows need to be handled correctly in the batched execution of the Flink 
> Runner.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-183) Update package name for DisplayData types

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-183:
-
Fix Version/s: Not applicable

> Update package name for DisplayData types
> -
>
> Key: BEAM-183
> URL: https://issues.apache.org/jira/browse/BEAM-183
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Scott Wegner
> Fix For: Not applicable
>
>
> {{DisplayData}} and its related components are currently in package 
> {{com.google.cloud.dataflow.sdk.transforms.display}}. We may want to move 
> them down to a lower level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-216) Create Storm Runner

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-216:
-
Fix Version/s: Not applicable

> Create Storm Runner 
> 
>
> Key: BEAM-216
> URL: https://issues.apache.org/jira/browse/BEAM-216
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-ideas
>Reporter: Sriharsha Chintalapani
>Assignee: Jean-Baptiste Onofré
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-291) PDone type translation fails

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-291:
-
Fix Version/s: Not applicable

> PDone type translation fails
> 
>
> Key: BEAM-291
> URL: https://issues.apache.org/jira/browse/BEAM-291
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
> Fix For: Not applicable
>
>
> The {{PDone}} output type is currently not supported by the Flink Runner 
> because it doesn't have a Coder associated. This could also get in the way 
> when translating native Beam sinks which would likely return PDone.
> The simplest solution is to create a dummy PDone coder. Alternatively, we 
> could check for the PDone return type during translation and not retrieve the 
> coder at all.
> {noformat}
> Exception in thread "main" java.lang.IllegalStateException: Unable to return 
> a default Coder for AnonymousParDo.out [PCollection]. Correct one of the 
> following root causes:
>   No Coder has been manually specified;  you may do so using .setCoder().
>   Inferring a Coder from the CoderRegistry failed: Unable to provide a 
> default Coder for org.apache.beam.sdk.values.PDone. Correct one of the 
> following root causes:
>   Building a Coder using a registered CoderFactory failed: Cannot provide 
> coder based on value with class org.apache.beam.sdk.values.PDone: No 
> CoderFactory has been registered for the class.
>   Building a Coder from the @DefaultCoder annotation failed: Class 
> org.apache.beam.sdk.values.PDone does not have a @DefaultCoder annotation.
>   Building a Coder from the fallback CoderProvider failed: Cannot provide 
> coder for type org.apache.beam.sdk.values.PDone: 
> org.apache.beam.sdk.coders.protobuf.ProtoCoder$1@72ef8d15 could not provide a 
> Coder for type org.apache.beam.sdk.values.PDone: Cannot provide ProtoCoder 
> because org.apache.beam.sdk.values.PDone is not a subclass of 
> com.google.protobuf.Message; 
> org.apache.beam.sdk.coders.SerializableCoder$1@6aa8e115 could not provide a 
> Coder for type org.apache.beam.sdk.values.PDone: Cannot provide 
> SerializableCoder because org.apache.beam.sdk.values.PDone does not implement 
> Serializable.
>   Using the default output Coder from the producing PTransform failed: Unable 
> to provide a default Coder for org.apache.beam.sdk.values.PDone. Correct one 
> of the following root causes:
>   Building a Coder using a registered CoderFactory failed: Cannot provide 
> coder based on value with class org.apache.beam.sdk.values.PDone: No 
> CoderFactory has been registered for the class.
>   Building a Coder from the @DefaultCoder annotation failed: Class 
> org.apache.beam.sdk.values.PDone does not have a @DefaultCoder annotation.
>   Building a Coder from the fallback CoderProvider failed: Cannot provide 
> coder for type org.apache.beam.sdk.values.PDone: 
> org.apache.beam.sdk.coders.protobuf.ProtoCoder$1@72ef8d15 could not provide a 
> Coder for type org.apache.beam.sdk.values.PDone: Cannot provide ProtoCoder 
> because org.apache.beam.sdk.values.PDone is not a subclass of 
> com.google.protobuf.Message; 
> org.apache.beam.sdk.coders.SerializableCoder$1@6aa8e115 could not provide a 
> Coder for type org.apache.beam.sdk.values.PDone: Cannot provide 
> SerializableCoder because org.apache.beam.sdk.values.PDone does not implement 
> Serializable.
>   at 
> org.apache.beam.sdk.values.TypedPValue.inferCoderOrFail(TypedPValue.java:196)
>   at org.apache.beam.sdk.values.TypedPValue.getCoder(TypedPValue.java:49)
>   at org.apache.beam.sdk.values.PCollection.getCoder(PCollection.java:138)
>   at 
> org.apache.beam.runners.flink.translation.FlinkStreamingTransformTranslators$ParDoBoundStreamingTranslator.translateNode(FlinkStreamingTransformTranslators.java:315)
>   at 
> org.apache.beam.runners.flink.translation.FlinkStreamingTransformTranslators$ParDoBoundStreamingTranslator.translateNode(FlinkStreamingTransformTranslators.java:305)
>   at 
> org.apache.beam.runners.flink.translation.FlinkStreamingPipelineTranslator.applyStreamingTransform(FlinkStreamingPipelineTranslator.java:108)
>   at 
> org.apache.beam.runners.flink.translation.FlinkStreamingPipelineTranslator.visitPrimitiveTransform(FlinkStreamingPipelineTranslator.java:89)
>   at 
> org.apache.beam.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:225)
>   at 
> org.apache.beam.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:220)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:104)
>   at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:292)
>   at 
> 

[jira] [Updated] (BEAM-211) When using Create.of use its #withCoder method instead of the created PCollection's #setCoder

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-211:
-
Fix Version/s: 0.1.0-incubating

> When using Create.of use its #withCoder method instead of the created 
> PCollection's #setCoder
> --
>
> Key: BEAM-211
> URL: https://issues.apache.org/jira/browse/BEAM-211
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Amit Sela
>Priority: Minor
> Fix For: 0.1.0-incubating
>
>
> See [~kenn] comment here:
> https://github.com/apache/incubator-beam/pull/179#discussion_r60171526



--


[jira] [Updated] (BEAM-77) Reorganize Directory structure

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-77?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-77:

Fix Version/s: 0.1.0-incubating

> Reorganize Directory structure
> --
>
> Key: BEAM-77
> URL: https://issues.apache.org/jira/browse/BEAM-77
> Project: Beam
>  Issue Type: Task
>  Components: project-management
>Reporter: Frances Perry
>Assignee: Davor Bonaci
> Fix For: 0.1.0-incubating
>
>
> Now that we've done the initial Dataflow code drop, we will restructure 
> directories to provide space for additional SDKs and Runners.



--


[jira] [Updated] (BEAM-250) Exclude TextSource.minBundleSize default value from display data.

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-250:
-
Fix Version/s: 0.1.0-incubating

> Exclude TextSource.minBundleSize default value from display data.
> -
>
> Key: BEAM-250
> URL: https://issues.apache.org/jira/browse/BEAM-250
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Minor
> Fix For: 0.1.0-incubating
>
>
> TextSource currently registers minBundleSize as display data. We should 
> exclude it when the value is default of 1.



--


[jira] [Updated] (BEAM-249) Combine.GroupedValues re-wraps combineFn from Combine.PerKey and loses identity for DisplayData

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-249:
-
Fix Version/s: 0.1.0-incubating

> Combine.GroupedValues re-wraps combineFn from Combine.PerKey and loses 
> identity for DisplayData
> ---
>
> Key: BEAM-249
> URL: https://issues.apache.org/jira/browse/BEAM-249
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Minor
> Fix For: 0.1.0-incubating
>
>
> Combine.PerKey is implemented in terms of Combine.GroupedValues, but it 
> passes a wrapped combineFn, which gets re-wrapped in Combine.GroupedValues. 
> As a result, we lose the identity of the original combineFn for display data.
> We should update the API to also pass the original identity downstream.



--


[jira] [Updated] (BEAM-251) Exclude PipelineOptions with @JsonIgnore from display data

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-251:
-
Fix Version/s: 0.1.0-incubating

> Exclude PipelineOptions with @JsonIgnore from display data
> --
>
> Key: BEAM-251
> URL: https://issues.apache.org/jira/browse/BEAM-251
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Minor
> Fix For: 0.1.0-incubating
>
>
> Properties marked with @JsonIgnore are generally not useful and often very 
> long. We should exclude them from display data.



--


[jira] [Updated] (BEAM-292) TextIO.Write.to Empty Files

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-292:
-
Issue Type: Improvement  (was: Bug)

> TextIO.Write.to Empty Files
> ---
>
> Key: BEAM-292
> URL: https://issues.apache.org/jira/browse/BEAM-292
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Jesse Anderson
>Assignee: Daniel Halperin
> Fix For: 0.2.0-incubating
>
>
> When a PCollection is empty and is written out with TextIO.Write.to, the 
> output file is unchanged. This makes it seem like the PCollection was not 
> empty. The output file's contents should be changed to be empty.



--


[jira] [Updated] (BEAM-248) Register DisplayData from anonymous implementation PTransforms

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-248:
-
Fix Version/s: 0.1.0-incubating

> Register DisplayData from anonymous implementation PTransforms
> --
>
> Key: BEAM-248
> URL: https://issues.apache.org/jira/browse/BEAM-248
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Assignee: Scott Wegner
> Fix For: 0.1.0-incubating
>
>
> Most SDK PTransforms are implemented in terms of lower-level PTransforms, 
> often with anonymous user-fn implementations at the leaf-level. Currently 
> display data is only being registered on the composite node and not within 
> the anonymous implementation. As a result, the details are lost.
> We should register display data both in the composite and internal leaf 
> nodes, particularly when the implementation is anonymous.



--


[jira] [Updated] (BEAM-52) KafkaIO - bounded/unbounded, source/sink

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-52?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-52:

Fix Version/s: 0.1.0-incubating

> KafkaIO - bounded/unbounded, source/sink
> 
>
> Key: BEAM-52
> URL: https://issues.apache.org/jira/browse/BEAM-52
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Daniel Halperin
>Assignee: Raghu Angadi
> Fix For: 0.1.0-incubating
>
>
> We should support Apache Kafka. The priority list is probably:
> * UnboundedSource
> * unbounded Sink
> * BoundedSource
> * bounded Sink
> The connector should be well-tested, especially around UnboundedSource 
> checkpointing and resuming, and data duplication.



--


[jira] [Updated] (BEAM-288) Improve javadoc for UnboundedSource

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-288:
-
Issue Type: Improvement  (was: Bug)

> Improve javadoc for UnboundedSource
> ---
>
> Key: BEAM-288
> URL: https://issues.apache.org/jira/browse/BEAM-288
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Mark Shields
>Assignee: Mark Shields
> Fix For: 0.2.0-incubating
>
>
> While working on the pubsub source, I noticed that the UnboundedSource and 
> associated Reader / Checkpoint API needed some important clarification.



--


[jira] [Updated] (BEAM-288) Improve javadoc for UnboundedSource

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-288:
-
Fix Version/s: 0.2.0-incubating

> Improve javadoc for UnboundedSource
> ---
>
> Key: BEAM-288
> URL: https://issues.apache.org/jira/browse/BEAM-288
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Mark Shields
>Assignee: Mark Shields
> Fix For: 0.2.0-incubating
>
>
> While working on the pubsub source, I noticed that the UnboundedSource and 
> associated Reader / Checkpoint API needed some important clarification.



--


[jira] [Updated] (BEAM-240) Add display data link URLs for sources / sinks

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-240:
-
Fix Version/s: 0.1.0-incubating

> Add display data link URLs for sources / sinks
> --
>
> Key: BEAM-240
> URL: https://issues.apache.org/jira/browse/BEAM-240
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Assignee: Scott Wegner
> Fix For: 0.1.0-incubating
>
>




--


[jira] [Updated] (BEAM-159) Support fixed number of shards in sinks

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-159:
-
Fix Version/s: Not applicable

> Support fixed number of shards in sinks
> ---
>
> Key: BEAM-159
> URL: https://issues.apache.org/jira/browse/BEAM-159
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core
>Reporter: Eugene Kirpichov
> Fix For: Not applicable
>
>
> TextIO supports .withNumShards, however custom sinks, in particular 
> FileBasedSinks, provide no support for controlling sharding. Some users want 
> this, e.g. 
> http://stackoverflow.com/questions/36316304/set-num-of-output-shard-in-write-tosink-in-dataflow



--


[jira] [Updated] (BEAM-243) Remove DirectPipelineRunner and keep only the InProcessPipelineRunner

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-243:
-
Fix Version/s: 0.2.0-incubating

> Remove DirectPipelineRunner and keep only the InProcessPipelineRunner
> -
>
> Key: BEAM-243
> URL: https://issues.apache.org/jira/browse/BEAM-243
> Project: Beam
>  Issue Type: Task
>  Components: runner-direct
>Reporter: Jean-Baptiste Onofré
>Assignee: Thomas Groh
> Fix For: 0.2.0-incubating
>
>
> We have two runners for local JVM/process: the "old" DirectPipelineRunner and 
> the "new" InProcessPipelineRunner.
> They have different features (for instance, the DirectPipelineRunner doesn't 
> support unbounded PCollections, whereas the InProcessPipelineRunner does).
> To avoid confusion, we could remove the "old" DirectPipelineRunner.



--


[jira] [Updated] (BEAM-22) DirectPipelineRunner: support for unbounded collections

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-22:

Fix Version/s: 0.2.0-incubating

> DirectPipelineRunner: support for unbounded collections
> ---
>
> Key: BEAM-22
> URL: https://issues.apache.org/jira/browse/BEAM-22
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Davor Bonaci
>Assignee: Thomas Groh
> Fix For: 0.2.0-incubating
>
>
> DirectPipelineRunner currently runs over bounded PCollections only, and 
> implements only a portion of the Beam Model.
> We should improve it to faithfully implement the full Beam Model, such as add 
> ability to run over unbounded PCollections, and better resemble execution 
> model in a distributed system.
> This further enables features such as a testing source which may simulate 
> late data and test triggers in the pipeline. Finally, we may want to expose 
> an option to select between "debug" (single threaded), "chaos monkey" (test 
> as many model requirements as possible), and "performance" (multi-threaded).
> Once this is done, we should update this StackOverflow question:
> http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426



--


[jira] [Updated] (BEAM-33) Make DataflowAssert more window-aware

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-33?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-33:

Fix Version/s: Not applicable

> Make DataflowAssert more window-aware
> -
>
> Key: BEAM-33
> URL: https://issues.apache.org/jira/browse/BEAM-33
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Thomas Groh
>  Labels: Triggers, Windowing
> Fix For: Not applicable
>
>
> Today DataflowAssert rewindows into the global window before performing a 
> side input (as an implementation detail). This precludes support for other 
> windowing schemes and triggers.



--


[jira] [Updated] (BEAM-234) Remove the word Pipeline from the name of all PipelineRunner implementations

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-234:
-
Issue Type: Improvement  (was: Bug)

> Remove the word Pipeline from the name of all PipelineRunner implementations
> 
>
> Key: BEAM-234
> URL: https://issues.apache.org/jira/browse/BEAM-234
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, runner-direct, runner-flink, 
> runner-spark
>Reporter: Thomas Groh
>Assignee: Thomas Groh
> Fix For: 0.2.0-incubating
>
>
> The fact that a PipelineRunner runs a Pipeline is already conveyed by its 
> implementation of the PipelineRunner abstract class, so including "Pipeline" 
> in the name is redundant and makes it inconvenient to type.



--


[jira] [Updated] (BEAM-280) TestPipeline should be constructible without a runner

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-280:
-
Issue Type: Improvement  (was: Bug)

> TestPipeline should be constructible without a runner
> -
>
> Key: BEAM-280
> URL: https://issues.apache.org/jira/browse/BEAM-280
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Thomas Groh
> Fix For: 0.2.0-incubating
>
>
> Today, one cannot create a {{Pipeline}} without a runner, as the runner is 
> wired in to do transform expansions. However, we want to remove the 
> {{DirectPipelineRunner}} from the SDK, so a {{TestPipeline}} should default 
> to a no-op runner that performs no expansion, but crashes upon {{run()}}, or 
> some such, in order to execute tests that do not really require a runner.
> (As soon as possible, this expansion wiring will be removed, but if we keep 
> the {{Pipeline.run()}} convenience method, we may still need some optional 
> runner set up)
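A minimal plain-Java sketch of the proposed behavior (the {{Runner}} interface and {{crashingRunner}} factory are illustrative stand-ins, not the actual SDK API): construction succeeds without a real runner, while {{run()}} fails fast.

```java
public class CrashingRunnerDemo {
    interface Runner { void run(); }

    // Hypothetical no-op runner: safe to construct, crashes only on run().
    static Runner crashingRunner() {
        return () -> {
            throw new IllegalStateException(
                "TestPipeline was created without a runner; run() is unsupported.");
        };
    }

    public static void main(String[] args) {
        Runner runner = crashingRunner(); // construction is fine
        try {
            runner.run();                 // only execution fails
        } catch (IllegalStateException expected) {
            System.out.println("run() failed as intended");
        }
    }
}
```

This keeps tests that only build and inspect a pipeline runnable with no runner configured.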



--


[jira] [Updated] (BEAM-234) Remove the word Pipeline from the name of all PipelineRunner implementations

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-234:
-
Fix Version/s: 0.2.0-incubating

> Remove the word Pipeline from the name of all PipelineRunner implementations
> 
>
> Key: BEAM-234
> URL: https://issues.apache.org/jira/browse/BEAM-234
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, runner-direct, runner-flink, 
> runner-spark
>Reporter: Thomas Groh
>Assignee: Thomas Groh
> Fix For: 0.2.0-incubating
>
>
> The fact that a PipelineRunner runs a Pipeline is already conveyed by its 
> implementation of the PipelineRunner abstract class, so including "Pipeline" 
> in the name is redundant and makes it inconvenient to type.



--


[jira] [Updated] (BEAM-280) TestPipeline should be constructible without a runner

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-280:
-
Fix Version/s: 0.2.0-incubating

> TestPipeline should be constructible without a runner
> -
>
> Key: BEAM-280
> URL: https://issues.apache.org/jira/browse/BEAM-280
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Thomas Groh
> Fix For: 0.2.0-incubating
>
>
> Today, one cannot create a {{Pipeline}} without a runner, as the runner is 
> wired in to do transform expansions. However, we want to remove the 
> {{DirectPipelineRunner}} from the SDK, so a {{TestPipeline}} should default 
> to a no-op runner that performs no expansion, but crashes upon {{run()}}, or 
> some such, in order to execute tests that do not really require a runner.
> (As soon as possible, this expansion wiring will be removed, but if we keep 
> the {{Pipeline.run()}} convenience method, we may still need some optional 
> runner set up)



--


[jira] [Updated] (BEAM-146) WindowFn.AssignContext leaks implementation details about compressed WindowedValue representation

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-146:
-
Fix Version/s: 0.2.0-incubating

> WindowFn.AssignContext leaks implementation details about compressed 
> WindowedValue representation
> -
>
> Key: BEAM-146
> URL: https://issues.apache.org/jira/browse/BEAM-146
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Minor
> Fix For: 0.2.0-incubating
>
>
> Today, {{WindowFn.AssignContext}} provides simultaneous access to all of the 
> windows that a value has been placed in.
> Providing access to the current window for a value is convenient for, e.g., 
> converting day windows to hour windows for each hour of the assigned day. But 
> providing access to all the assigned windows allows spooky action across 
> windows, and is generally not intended to be observable - elements are 
> semantically considered to be "duplicated" into each of the assigned windows.
> This ticket proposes that the {{AssignContext}} should provide only a single 
> window, and that windows should be "exploded" prior to window re-assignment 
> so that elements are only observed within one window at a time. This can be 
> accomplished trivially today via surgical insertion of 
> {{RequiresWindowAccess}} but the {{AssignContext}} should have its API 
> adjusted to be explicit about it, too.
> This will affect only pipelines for which _all_ of the following hold:
>  - assigns to sliding windows (or custom {{WindowFn}} that places each 
> element in multiple windows)
>  - re-assigns to different windows without a {{GroupByKey}} between.
>  - the new window assignment actually does depend on the full set of windows 
> assigned
> I hypothesize the number of such pipelines is zero.
> I expect to address this during the Beam Runner API design.
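A plain-Java sketch of the proposed "explode" step (String stand-ins for elements and windows; not the SDK API): a value assigned to several windows becomes one independent (value, window) pair per window, so downstream window assignment only ever observes a single window at a time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExplodeWindowsDemo {
    // "Explodes" a value assigned to several windows into one
    // {value, window} pair per window. Each pair can then be re-windowed
    // independently, with no "spooky action" across sibling windows.
    static List<String[]> explode(String value, List<String> windows) {
        List<String[]> out = new ArrayList<>();
        for (String window : windows) {
            out.add(new String[] { value, window });
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> exploded = explode("e1", Arrays.asList("w1", "w2", "w3"));
        System.out.println(exploded.size()); // 3 independent single-window copies
    }
}
```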



--


[jira] [Updated] (BEAM-45) Unbounded source for Google Cloud Bigtable

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-45:

Fix Version/s: Not applicable

> Unbounded source for Google Cloud Bigtable
> --
>
> Key: BEAM-45
> URL: https://issues.apache.org/jira/browse/BEAM-45
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-gcp
>Affects Versions: Not applicable
>Reporter: Daniel Halperin
> Fix For: Not applicable
>
>
> Google Cloud Bigtable (https://cloud.google.com/bigtable/) is currently in 
> Beta. A bounded source is included in the initial code for Beam; it does a 
> table scan (with an optional row filter) and supports dynamic work rebalancing.
> The unbounded source should support initial splitting based on key ranges but 
> then streaming along the timestamp dimension.



--


[jira] [Updated] (BEAM-146) WindowFn.AssignContext leaks implementation details about compressed WindowedValue representation

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-146:
-
Summary: WindowFn.AssignContext leaks implementation details about 
compressed WindowedValue representation  (was: WindowFn.AssingContext leaks 
implementation details about compressed WindowedValue representation)

> WindowFn.AssignContext leaks implementation details about compressed 
> WindowedValue representation
> -
>
> Key: BEAM-146
> URL: https://issues.apache.org/jira/browse/BEAM-146
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Minor
> Fix For: 0.2.0-incubating
>
>
> Today, {{WindowFn.AssignContext}} provides simultaneous access to all of the 
> windows that a value has been placed in.
> Providing access to the current window for a value is convenient for, e.g., 
> converting day windows to hour windows for each hour of the assigned day. But 
> providing access to all the assigned windows allows spooky action across 
> windows, and is generally not intended to be observable - elements are 
> semantically considered to be "duplicated" into each of the assigned windows.
> This ticket proposes that the {{AssignContext}} should provide only a single 
> window, and that windows should be "exploded" prior to window re-assignment 
> so that elements are only observed within one window at a time. This can be 
> accomplished trivially today via surgical insertion of 
> {{RequiresWindowAccess}} but the {{AssignContext}} should have its API 
> adjusted to be explicit about it, too.
> This will affect only pipelines for which _all_ of the following hold:
>  - assigns to sliding windows (or custom {{WindowFn}} that places each 
> element in multiple windows)
>  - re-assigns to different windows without a {{GroupByKey}} between.
>  - the new window assignment actually does depend on the full set of windows 
> assigned
> I hypothesize the number of such pipelines is zero.
> I expect to address this during the Beam Runner API design.



--


[jira] [Updated] (BEAM-304) Build break: maven is picking up Kafka snapshots

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-304:
-
Fix Version/s: 0.1.0-incubating

> Build break: maven is picking up Kafka snapshots
> 
>
> Key: BEAM-304
> URL: https://issues.apache.org/jira/browse/BEAM-304
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
>Priority: Critical
> Fix For: 0.1.0-incubating
>
>
> A few concurrent issues:
> 1. We did not pin a specific version of Kafka in our pom; instead we just 
> depended on 0.9+.
> 2. It looks like Kafka has deployed a 0.10 snapshot.
> 3. For some reason, our build is picking up snapshot versions instead of 
> using only things available on Maven Central.
> 4. The 0.10 version of Kafka has a breaking change that breaks our KafkaIO code.



--


[jira] [Updated] (BEAM-305) In Spark runner tests - When using Create.of use its #withCoder method instead of the created PCollection's #setCoder

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-305:
-
Fix Version/s: 0.1.0-incubating

> In Spark runner tests - When using Create.of use its #withCoder method 
> instead of the created PCollection's #setCoder
> --
>
> Key: BEAM-305
> URL: https://issues.apache.org/jira/browse/BEAM-305
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Amit Sela
>Priority: Minor
> Fix For: 0.1.0-incubating
>
>
> See [~kenn] comment here:
> https://github.com/apache/incubator-beam/pull/179#discussion_r60171526



--


[jira] [Updated] (BEAM-323) The InProcessPipelineRunner uses user objects for State and Timer lookups

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-323:
-
Fix Version/s: 0.2.0-incubating

> The InProcessPipelineRunner uses user objects for State and Timer lookups
> -
>
> Key: BEAM-323
> URL: https://issues.apache.org/jira/browse/BEAM-323
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>Priority: Critical
> Fix For: 0.2.0-incubating
>
>
> The runner keys state based on the user key that it was grouped by, instead 
> of the structural value of that key. This causes callbacks to be looked up 
> based on Java equality, so an object with a poorly implemented equals can 
> cause the Pipeline to stop making progress.
> Initially uncovered in https://github.com/apache/incubator-beam/pull/409
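A minimal plain-Java illustration of the failure mode (the {{UserKey}} class and {{structuralValue}} helper are hypothetical stand-ins, not runner internals): keying a map by the user object misses lookups when equals is identity-based, while keying by an encoded/structural value works.

```java
import java.util.HashMap;
import java.util.Map;

public class StructuralKeyDemo {
    // A key type with identity equality only (no equals/hashCode override):
    // the "poor equals" failure mode described in the ticket.
    static final class UserKey {
        final String id;
        UserKey(String id) { this.id = id; }
    }

    // Hypothetical stand-in for encoding the key with its Coder; real
    // runners compare the encoded bytes rather than the user object.
    static String structuralValue(UserKey key) {
        return "UserKey:" + key.id;
    }

    static String lookupByUserObject() {
        Map<UserKey, String> state = new HashMap<>();
        state.put(new UserKey("k1"), "timer-callback");
        // Misses: a logically equal key is a different object.
        return state.get(new UserKey("k1"));
    }

    static String lookupByStructuralValue() {
        Map<String, String> state = new HashMap<>();
        state.put(structuralValue(new UserKey("k1")), "timer-callback");
        return state.get(structuralValue(new UserKey("k1")));
    }

    public static void main(String[] args) {
        System.out.println(lookupByUserObject());       // null: state is lost
        System.out.println(lookupByStructuralValue());  // timer-callback
    }
}
```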



--


[jira] [Updated] (BEAM-327) Dataflow runner should have configuration for System.out/err handling

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-327:
-
Fix Version/s: 0.2.0-incubating

> Dataflow runner should have configuration for System.out/err handling
> -
>
> Key: BEAM-327
> URL: https://issues.apache.org/jira/browse/BEAM-327
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Scott Wegner
>Assignee: Scott Wegner
> Fix For: 0.2.0-incubating
>
>
> We would like to support the following scenarios:
> # Respect global logging filter configuration for System.out/System.err log 
> messages.
> # Suppress all log message for a given source, including System.out/err 
> (Level.OFF)
> # Set the log level for messages emitted from System.out/err.



--


[jira] [Updated] (BEAM-330) Maven exec warning message on Dataflow WordCount Example

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-330:
-
Fix Version/s: 0.2.0-incubating

> Maven exec warning message on Dataflow WordCount Example
> 
>
> Key: BEAM-330
> URL: https://issues.apache.org/jira/browse/BEAM-330
> Project: Beam
>  Issue Type: Bug
>  Components: examples-java
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Minor
> Fix For: 0.2.0-incubating
>
>
> When running the WordCount example with DataflowPipelineRunner, sometimes the 
> Maven process will emit a warning when trying to tear down the process:
> {quote}{noformat}
> [DEBUG] interrupting thread 
> Thread[pool-1-thread-1,5,org.apache.beam.examples.WordCount]
> [DEBUG] joining on thread 
> Thread[pool-1-thread-1,5,org.apache.beam.examples.WordCount]
> [WARNING] thread Thread[pool-1-thread-1,5,org.apache.beam.examples.WordCount] 
> was interrupted but is still alive after waiting at least 14999msecs
> [WARNING] thread Thread[pool-1-thread-1,5,org.apache.beam.examples.WordCount] 
> will linger despite being asked to die via interruption
> [WARNING] NOTE: 1 thread(s) did not finish despite being asked to  via 
> interruption. This is not a problem with exec:java, it is a problem with the 
> running code. Although not serious, it should be remedied.
> [WARNING] Couldn't destroy threadgroup 
> org.codehaus.mojo.exec.ExecJavaMojo$IsolatedThreadGroup[name=org.apache.beam.examples.WordCount,maxpri=10]
> {noformat}{quote}
> This appears to be some bad interaction between exec-maven-plugin and 
> DataflowPipelineRunner, possibly due to exec-maven-plugin's use of Guice and 
> our shading of it.
> The problem doesn't always reproduce, except when executing multiple Maven 
> targets, such as "mvn install exec:java \[...\]"
> Disabling exec:java's cleanupDaemonThreads indeed suppresses the issue.  I 
> recommend we add this configuration to the root pom.xml.
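
For reference, a minimal sketch of the suggested configuration (exact placement in the root pom.xml is up to the project; `cleanupDaemonThreads` is a documented exec-maven-plugin parameter):

```xml
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>exec-maven-plugin</artifactId>
  <configuration>
    <!-- Don't try to interrupt/join lingering threads at JVM exit; avoids the
         warnings above when the running code leaves threads alive. -->
    <cleanupDaemonThreads>false</cleanupDaemonThreads>
  </configuration>
</plugin>
```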





[jira] [Updated] (BEAM-331) WindowedWordCount.AddTimestampFn has nondeterministic timestamp bounds

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-331:
-
Fix Version/s: 0.2.0-incubating

> WindowedWordCount.AddTimestampFn has nondeterministic timestamp bounds
> --
>
> Key: BEAM-331
> URL: https://issues.apache.org/jira/browse/BEAM-331
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Minor
> Fix For: 0.2.0-incubating
>
>
> The timestamps added in the WindowedWordCount example are based on when the 
> bundles are executed, which makes the min/max bounds non-deterministic. 
> It would be more desirable to capture the min/max at construction time. We 
> would like to be able to use this as an example of adding display data.





[jira] [Updated] (BEAM-343) DisplayData error in AfterWatermark

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-343:
-
Fix Version/s: 0.2.0-incubating

> DisplayData error in AfterWatermark
> ---
>
> Key: BEAM-343
> URL: https://issues.apache.org/jira/browse/BEAM-343
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 0.1.0-incubating, 0.2.0-incubating
>Reporter: Daniel Mills
>Assignee: Scott Wegner
> Fix For: 0.2.0-incubating
>
>
> If AfterWatermark is used with early firings but no late firings, toString 
> (used by DisplayData) crashes because there is no trigger at LATE_INDEX.
> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/AfterWatermark.java#L239
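
A hedged sketch of the kind of guard that would avoid the crash — not Beam's actual trigger classes, just an illustration of bounds-safe formatting when the late sub-trigger is absent:

```java
import java.util.List;

public class TriggerDescription {
  // Illustrative only: format an AfterWatermark-style trigger description
  // without indexing past the end of the sub-trigger list.
  static String describe(List<String> subTriggers, int lateIndex) {
    StringBuilder sb = new StringBuilder("AfterWatermark.pastEndOfWindow()");
    if (!subTriggers.isEmpty()) {
      sb.append(".withEarlyFirings(").append(subTriggers.get(0)).append(")");
    }
    // Guard: only describe late firings if a trigger exists at lateIndex.
    if (lateIndex < subTriggers.size()) {
      sb.append(".withLateFirings(").append(subTriggers.get(lateIndex)).append(")");
    }
    return sb.toString();
  }
}
```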





[jira] [Updated] (BEAM-341) ReduceFnRunner allows GC time overflow

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-341:
-
Fix Version/s: 0.2.0-incubating

> ReduceFnRunner allows GC time overflow
> --
>
> Key: BEAM-341
> URL: https://issues.apache.org/jira/browse/BEAM-341
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
> Fix For: 0.2.0-incubating
>
>
> In {{ReduceFnRunner}}, any window ending after the global window has its GC 
> time capped to the end of the global window. But for windows ending before 
> the global window the allowed lateness can still be arbitrary, causing 
> overflow.
> http://stackoverflow.com/questions/37808159/why-am-i-getting-java-lang-illegalstateexception-on-google-dataflow
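
The overflow can be sketched with plain millisecond arithmetic (hypothetical names; Beam's real code uses Joda-Time instants): adding an arbitrary allowed lateness to a window's max timestamp needs to be clamped, not performed blindly:

```java
public class GcTimeSketch {
  // Stand-in for the end of the global window (illustrative value only).
  static final long END_OF_GLOBAL_WINDOW = Long.MAX_VALUE - 1_000_000L;

  // Compute a garbage-collection time that saturates instead of wrapping
  // around when windowMaxTimestamp + allowedLateness overflows a long.
  static long gcTime(long windowMaxTimestamp, long allowedLateness) {
    long gc = windowMaxTimestamp + allowedLateness;
    if (allowedLateness > 0 && gc < windowMaxTimestamp) {
      gc = Long.MAX_VALUE; // overflow detected: saturate
    }
    return Math.min(gc, END_OF_GLOBAL_WINDOW);
  }
}
```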





[jira] [Updated] (BEAM-342) Change Filter#greaterThan, etc. to actually use Filter

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-342:
-
Fix Version/s: 0.2.0-incubating

> Change Filter#greaterThan, etc. to actually use Filter
> --
>
> Key: BEAM-342
> URL: https://issues.apache.org/jira/browse/BEAM-342
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Daniel Halperin
>Assignee: Manu Zhang
>Priority: Minor
>  Labels: starter
> Fix For: 0.2.0-incubating
>
>
> This is a good starter task.
> Right now, 
> [{{Filter#greaterThan}}|https://github.com/apache/incubator-beam/blob/315b3c8e333e5f42730c19e89f856d778ce93cab/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Filter.java#L134]
>  constructs a new DoFn rather than using {{Filter#byPredicate}}. We should 
> fix this to make it consistent and simpler.
> We can also remove deprecated functions in that file, and if possible 
> redundant display data.
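
The shape of the fix can be sketched outside Beam with a predicate-based helper (hypothetical names, plain Java collections rather than PCollections):

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class FilterSketch {
  // The one predicate-based implementation everything else delegates to,
  // mirroring Filter#byPredicate.
  static <T> List<T> byPredicate(List<T> input, Predicate<T> predicate) {
    return input.stream().filter(predicate).collect(Collectors.toList());
  }

  // greaterThan expressed via byPredicate instead of a hand-rolled DoFn.
  static <T extends Comparable<T>> List<T> greaterThan(List<T> input, T value) {
    return byPredicate(input, x -> x.compareTo(value) > 0);
  }
}
```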





[jira] [Updated] (BEAM-346) When WindowFn not specified in Window.into(...), NullPointerException

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-346:
-
Fix Version/s: Not applicable

> When WindowFn not specified in Window.into(...), NullPointerException
> -
>
> Key: BEAM-346
> URL: https://issues.apache.org/jira/browse/BEAM-346
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
> Fix For: Not applicable
>
>






[jira] [Updated] (BEAM-358) JAXB Coder is not thread safe

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-358:
-
Fix Version/s: 0.2.0-incubating

> JAXB Coder is not thread safe
> -
>
> Key: BEAM-358
> URL: https://issues.apache.org/jira/browse/BEAM-358
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
> Fix For: 0.2.0-incubating
>
>
> Marshaller and Unmarshaller are cached for reuse in an instance variable. 
> These objects are not thread safe 
> (http://stackoverflow.com/questions/7400422/jaxb-creating-context-and-marshallers-cost),
>  so they should be accessed in a single-threaded manner, either via a 
> ThreadLocal or by creating a new instance in each call to encode/decode.
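
The ThreadLocal option looks roughly like the sketch below, illustrated with java.text.SimpleDateFormat — another class that is cheap to cache but not thread safe — rather than JAXB itself:

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class PerThreadCache {
  // One instance per thread: each thread lazily gets its own copy, so no
  // non-thread-safe object is ever shared across threads.
  private static final ThreadLocal<SimpleDateFormat> FORMAT =
      ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd"));

  static String format(Date date) {
    return FORMAT.get().format(date);
  }
}
```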





[jira] [Updated] (BEAM-353) Correct the licenses

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-353:
-
Fix Version/s: Not applicable

> Correct the licenses
> 
>
> Key: BEAM-353
> URL: https://issues.apache.org/jira/browse/BEAM-353
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
> Fix For: Not applicable
>
>
> Fix the licenses to the correct one and add license to files with a missing 
> license.





[jira] [Updated] (BEAM-367) GetFractionConsumed() inaccurate for non-uniform records

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-367:
-
Fix Version/s: 0.2.0-incubating

> GetFractionConsumed() inaccurate for non-uniform records
> 
>
> Key: BEAM-367
> URL: https://issues.apache.org/jira/browse/BEAM-367
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Reporter: Ian Zhou
>Assignee: Daniel Halperin
>Priority: Minor
> Fix For: 0.2.0-incubating
>
>
> GetFractionConsumed() provides inaccurate progress updates for clustered 
> records. For example, for a range spanning [1, 10], a cluster of records 
> around 5 (e.g. 5.01 ..., 5.09) will be recorded as ~50% complete upon 
> reading the first record, and will remain at this percentage until the final 
> record has been read. Instead, the start of the range should be changed to 
> the first record seen (e.g. new range [5.01, 10]). The end of the range 
> can be changed over time through dynamic work rebalancing.
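
A hedged sketch of the proposed behavior (hypothetical class, positions as doubles): the range start snaps to the first record seen, so clustered records report increasing progress:

```java
public class RangeProgress {
  private double start;
  private final double end;
  private double last;
  private boolean seenFirst = false;

  RangeProgress(double start, double end) {
    this.start = start;
    this.end = end;
  }

  // On the first record, shrink the range to begin at that record's position.
  void recordSeen(double position) {
    if (!seenFirst) {
      start = position;
      seenFirst = true;
    }
    last = position;
  }

  double fractionConsumed() {
    if (!seenFirst || end == start) {
      return 0.0;
    }
    return (last - start) / (end - start);
  }
}
```

With the original range [1, 10], the first record at 5.01 resets the range to [5.01, 10], so subsequent records near 5 yield small but growing fractions instead of a flat ~50%.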





[jira] [Updated] (BEAM-348) Clean temp_dir usage in _stage_extra_packages

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-348:
-
Fix Version/s: Not applicable

> Clean temp_dir usage in _stage_extra_packages
> -
>
> Key: BEAM-348
> URL: https://issues.apache.org/jira/browse/BEAM-348
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
>Priority: Minor
> Fix For: Not applicable
>
>






[jira] [Updated] (BEAM-373) CoGbkResult will crash after 10,000 elements on any runner that does not provide a Reiterator

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-373:
-
Fix Version/s: Not applicable

> CoGbkResult will crash after 10,000 elements on any runner that does not 
> provide a Reiterator
> -
>
> Key: BEAM-373
> URL: https://issues.apache.org/jira/browse/BEAM-373
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Thomas Groh
>Priority: Critical
> Fix For: Not applicable
>
>
> Unless I am reading this wrong, there is a hardcoded downcast to 
> {{Reiterator}} after 10,000 elements. This appears to be an assumption built 
> in from the early days. (discovered while trying to decide whether 
> {{Reiterator}} should live in the SDK or runner-core)
> I will throw together a test shortly to confirm.
> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/join/CoGbkResult.java#L122





[jira] [Updated] (BEAM-159) Support fixed number of shards in sinks

2016-06-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-159:
-
Summary: Support fixed number of shards in sinks  (was: Support fixed 
number of shards in custom sinks)

> Support fixed number of shards in sinks
> ---
>
> Key: BEAM-159
> URL: https://issues.apache.org/jira/browse/BEAM-159
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core
>Reporter: Eugene Kirpichov
>
> TextIO supports .withNumShards, however custom sinks, in particular 
> FileBasedSinks, provide no support for controlling sharding. Some users want 
> this, e.g. 
> http://stackoverflow.com/questions/36316304/set-num-of-output-shard-in-write-tosink-in-dataflow





[jira] [Updated] (BEAM-374) Enable dependency-plugin and failOnWarning at global level

2016-06-25 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci updated BEAM-374:
--
Component/s: project-management

> Enable dependency-plugin and failOnWarning at global level
> --
>
> Key: BEAM-374
> URL: https://issues.apache.org/jira/browse/BEAM-374
> Project: Beam
>  Issue Type: Task
>  Components: project-management
>Reporter: Pei He
>Assignee: Pei He
>
> Several modules don't run dependency-plugin, and are pulling dependencies 
> from other modules.
> Those modules can be broken by dependency changes elsewhere (removing a 
> dependency, changing its version, or even adding one).





[jira] [Updated] (BEAM-331) WindowedWordCount.AddTimestampFn has nondeterministic timestamp bounds

2016-06-25 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci updated BEAM-331:
--
Component/s: sdk-java-core

> WindowedWordCount.AddTimestampFn has nondeterministic timestamp bounds
> --
>
> Key: BEAM-331
> URL: https://issues.apache.org/jira/browse/BEAM-331
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Minor
>
> The timestamps added in the WindowedWordCount example are based on when the 
> bundles are executed, which makes the min/max bounds non-deterministic. 
> It would be more desirable to capture the min/max at construction time. We 
> would like to be able to use this as an example of adding display data.





[jira] [Updated] (BEAM-212) SparkPipelineRunner should support the Read Primitive

2016-06-25 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci updated BEAM-212:
--
Component/s: runner-spark

> SparkPipelineRunner should support the Read Primitive
> -
>
> Key: BEAM-212
> URL: https://issues.apache.org/jira/browse/BEAM-212
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Thomas Groh
>
> The TransformTranslator doesn't currently translate Read.Bounded or 
> Read.Unbounded, which are the primitive PTransforms for creating a 
> PCollection from no inputs.





[jira] [Updated] (BEAM-159) Support fixed number of shards in custom sinks

2016-06-25 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci updated BEAM-159:
--
Component/s: sdk-java-core
 beam-model

> Support fixed number of shards in custom sinks
> --
>
> Key: BEAM-159
> URL: https://issues.apache.org/jira/browse/BEAM-159
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core
>Reporter: Eugene Kirpichov
>
> TextIO supports .withNumShards, however custom sinks, in particular 
> FileBasedSinks, provide no support for controlling sharding. Some users want 
> this, e.g. 
> http://stackoverflow.com/questions/36316304/set-num-of-output-shard-in-write-tosink-in-dataflow





[jira] [Updated] (BEAM-340) Use Avro coder for KafkaIO

2016-06-25 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci updated BEAM-340:
--
Component/s: sdk-java-extensions

> Use Avro coder for KafkaIO 
> ---
>
> Key: BEAM-340
> URL: https://issues.apache.org/jira/browse/BEAM-340
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Reporter: Raghu Angadi
>






[jira] [Resolved] (BEAM-376) Correct Python pip install target

2016-06-25 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci resolved BEAM-376.
---
   Resolution: Fixed
Fix Version/s: 0.2.0-incubating

> Correct Python pip install target
> -
>
> Key: BEAM-376
> URL: https://issues.apache.org/jira/browse/BEAM-376
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Eric Anderson
>Assignee: Eric Anderson
>Priority: Trivial
> Fix For: 0.2.0-incubating
>
>






[jira] [Commented] (BEAM-376) Correct Python pip install target

2016-06-25 Thread Davor Bonaci (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15349705#comment-15349705
 ] 

Davor Bonaci commented on BEAM-376:
---

Fixed?

> Correct Python pip install target
> -
>
> Key: BEAM-376
> URL: https://issues.apache.org/jira/browse/BEAM-376
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Eric Anderson
>Assignee: Eric Anderson
>Priority: Trivial
>






[jira] [Commented] (BEAM-375) HadoopIO and runners-spark conflict with hadoop.version

2016-06-25 Thread Amit Sela (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15349483#comment-15349483
 ] 

Amit Sela commented on BEAM-375:


But do we want Beam Hadoop components to support multiple Hadoop versions, 
with a default build like Spark does?

Correct me if I'm wrong here, but going with the latest is not the best choice 
anyway (unless there is a constraint), because AFAIK Hadoop is not something 
users upgrade as quickly as, say, Spark/Flink versions.

WDYT?

> HadoopIO and runners-spark conflict with hadoop.version
> ---
>
> Key: BEAM-375
> URL: https://issues.apache.org/jira/browse/BEAM-375
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Reporter: Pei He
>Assignee: Pei He
>
> HadoopIO currently uses 2.7.0 and runners-spark uses 2.2.0 for hadoop-client, 
> hadoop-common.
> From [~amitsela]
> "Spark can be built against different hadoop versions, but the release in 
> maven central is a 2.2.0 build (latest)."
> For HadoopIO, I don't know why 2.7.0 was picked at the beginning. I can check 
> whether it will work with 2.2.0.
> I am creating this issue, since I think there is a general question.
> In principle, HadoopIO and other SDK Sources should work with any runner. 
> But when one set of runners requires version A and another set requires 
> version B, we will need a general solution.


