[jira] [Updated] (BEAM-1201) Remove producesSortedKeys from Source

2016-12-21 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-1201:
--
Labels: backward-incompatible  (was: )

> Remove producesSortedKeys from Source
> -
>
> Key: BEAM-1201
> URL: https://issues.apache.org/jira/browse/BEAM-1201
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
>Priority: Minor
>  Labels: backward-incompatible
>
> This is a holdover from a precursor of the old Dataflow SDK that we just 
> failed to delete before releasing Dataflow 1.0, but we can delete before the 
> first stable release of Beam.
> This function has never been used by any runner. It does not mean anything 
> obvious to implementors, as many sources produce {{T}}, not {{KV}} -- 
> what does it mean in the former case? (And how do you get the latter case 
> correct?)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-1201) Remove producesSortedKeys from BoundedSource

2016-12-21 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-1201:
--
Summary: Remove producesSortedKeys from BoundedSource  (was: Remove 
producesSortedKeys from Source)

> Remove producesSortedKeys from BoundedSource
> 
>
> Key: BEAM-1201
> URL: https://issues.apache.org/jira/browse/BEAM-1201
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
>Priority: Minor
>  Labels: backward-incompatible
>
> This is a holdover from a precursor of the old Dataflow SDK that we just 
> failed to delete before releasing Dataflow 1.0, but we can delete before the 
> first stable release of Beam.
> This function has never been used by any runner. It does not mean anything 
> obvious to implementors, as many sources produce {{T}}, not {{KV}} -- 
> what does it mean in the former case? (And how do you get the latter case 
> correct?)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-469) NullableCoder optimized encoding via passthrough context

2016-12-21 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768476#comment-15768476
 ] 

Daniel Halperin commented on BEAM-469:
--

Sorry I missed this JIRA comment, [~mariusz89016]! A bit late, but...

Say a coder C does not have the nested context. Then we actually have the 
guarantee that no one will put later elements.

So if {{NullableCoder}} does not have the nested context, then no one will put 
more elements after whatever the {{NullableCoder}} puts. If the NC puts {{0}} 
then that's it -- the element is null. But if the NC puts {{1}}, then we know 
that all remaining bytes in the encoded string belong to the inner coder. That 
is effectively saying that the inner coder also does not need to have the 
nested context, so it does not need to write its own length.

In your example, the {{NullableCoder}} is used in an inner context. So the 
inner coder needs to also use the inner context, because there may be more 
encoded elements later.

In either case: the context of the nullable coder can be the same as the 
context of the inner coder. This is why in the patch here, we simply pass the 
NC's context down into the inner coder. All we have removed is the _additional_ 
nesting that was used.

https://patch-diff.githubusercontent.com/raw/apache/incubator-beam/pull/992.patch
 

> NullableCoder optimized encoding via passthrough context
> 
>
> Key: BEAM-469
> URL: https://issues.apache.org/jira/browse/BEAM-469
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Thomas Groh
>Priority: Trivial
>  Labels: backward-incompatible
> Fix For: 0.3.0-incubating
>
>
> NullableCoder should encode using the context given and not always use the 
> nested context. For coders which can efficiently encode in the outer context 
> such as StringUtf8Coder or ByteArrayCoder, we are forcing them to prefix 
> themselves with their length.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-1201) Remove producesSortedKeys from Source

2016-12-21 Thread Daniel Halperin (JIRA)
Daniel Halperin created BEAM-1201:
-

 Summary: Remove producesSortedKeys from Source
 Key: BEAM-1201
 URL: https://issues.apache.org/jira/browse/BEAM-1201
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-core
Reporter: Daniel Halperin
Assignee: Daniel Halperin
Priority: Minor


This is a holdover from a precursor of the old Dataflow SDK that we just failed 
to delete before releasing Dataflow 1.0, but we can delete before the first 
stable release of Beam.

This function has never been used by any runner. It does not mean anything 
obvious to implementors, as many sources produce {{T}}, not {{KV}} -- what 
does it mean in the former case? (And how do you get the latter case correct?)





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1190) FileBasedSource should ignore files that matched the glob but don't exist

2016-12-21 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767937#comment-15767937
 ] 

Daniel Halperin commented on BEAM-1190:
---

I do not think this is generally safe -- it may mask underlying bugs. For 
example, we should never invoke this code unless the filesystem is known be 
eventually list-consistent but consistent with stat.

This change does not obviate the need for [BEAM-60] -- because users may want 
to go the other way, and expand the inconsistent list they get. I propose you 
package this logic up in whatever the new name for IOChannelUtils is as one of 
the things users can do in the code they run at expand-time.

Bringing the user into the loop is also nice because it makes them deal with 
eventual consistency up front. We are burned a lot by users who don't realize 
what their globs really mean.

> FileBasedSource should ignore files that matched the glob but don't exist
> -
>
> Key: BEAM-1190
> URL: https://issues.apache.org/jira/browse/BEAM-1190
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Eugene Kirpichov
>Assignee: Eugene Kirpichov
>
> See user issue:
> http://stackoverflow.com/questions/41251741/coping-with-eventual-consistency-of-gcs-bucket-listing
> We should, after globbing the files in FileBasedSource, individually stat 
> every file and remove those that don't exist, to account for the possibility 
> that glob yielded non-existing files due to eventual consistency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-1194) DataflowRunner should test a variety of valid tempLocation/stagingLocation/etc formats.

2016-12-21 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-1194:
--
Description: 
Cloud Dataflow has a minor history of small bugs related to various code paths 
expecting there to be or not be a trailing forward-slash in these location 
fields. The way that Beam's integration tests are set up, we are likely to only 
have one of these two cases tested (there is a single set of integration test 
pipeline options).

We should add a dedicated DataflowRunner integration test to handle this case.

  was:
Cloud Dataflow has a minor history of small bugs related to various code paths 
expecting there to be or not be a trailing forward-slash in these location 
fields. The way that Beam's integration tests are set up, we are likely to only 
have one of these two cases tested (there is a single set of integration test 
pipeline options).

We should add a dedicated integration test to handle this case.


> DataflowRunner should test a variety of valid 
> tempLocation/stagingLocation/etc formats.
> ---
>
> Key: BEAM-1194
> URL: https://issues.apache.org/jira/browse/BEAM-1194
> Project: Beam
>  Issue Type: Test
>  Components: runner-dataflow
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
>Priority: Minor
>
> Cloud Dataflow has a minor history of small bugs related to various code 
> paths expecting there to be or not be a trailing forward-slash in these 
> location fields. The way that Beam's integration tests are set up, we are 
> likely to only have one of these two cases tested (there is a single set of 
> integration test pipeline options).
> We should add a dedicated DataflowRunner integration test to handle this case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-1194) DataflowRunner should test a variety of valid tempLocation/stagingLocation/etc formats.

2016-12-21 Thread Daniel Halperin (JIRA)
Daniel Halperin created BEAM-1194:
-

 Summary: DataflowRunner should test a variety of valid 
tempLocation/stagingLocation/etc formats.
 Key: BEAM-1194
 URL: https://issues.apache.org/jira/browse/BEAM-1194
 Project: Beam
  Issue Type: Test
  Components: runner-dataflow
Reporter: Daniel Halperin
Assignee: Daniel Halperin
Priority: Minor


Cloud Dataflow has a minor history of small bugs related to various code paths 
expecting there to be or not be a trailing forward-slash in these location 
fields. The way that Beam's integration tests are set up, we are likely to only 
have one of these two cases tested (there is a single set of integration test 
pipeline options).

We should add a dedicated integration test to handle this case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (BEAM-1190) FileBasedSource should ignore files that matched the glob but don't exist

2016-12-21 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767818#comment-15767818
 ] 

Daniel Halperin edited comment on BEAM-1190 at 12/21/16 6:46 PM:
-

Not for very long -- the stat at open-time is getting removed. We get the size 
information we need from the list call, but currently throw it away for silly 
reasons.

How would you feel about the ability to execute code in the worker when the 
glob is expanded. I think checking which files actually exist then and deciding 
in one centralized place in time which files you want to read (and committing 
to that decision for later) is probably a simpler and safer solution.


was (Author: dhalp...@google.com):
Not for very long -- the stat at open-time is getting removed as we get the 
information we need from the list call, but throw it away like we shouldn't be.

How would you feel about the ability to execute code in the worker when the 
glob is expanded. I think checking which files actually exist then and deciding 
in one centralized place in time which files you want to read (and committing 
to that decision for later) is probably a simpler and safer solution.

> FileBasedSource should ignore files that matched the glob but don't exist
> -
>
> Key: BEAM-1190
> URL: https://issues.apache.org/jira/browse/BEAM-1190
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Eugene Kirpichov
>Assignee: Eugene Kirpichov
>
> See user issue:
> http://stackoverflow.com/questions/41251741/coping-with-eventual-consistency-of-gcs-bucket-listing
> We should, after globbing the files in FileBasedSource, individually stat 
> every file and remove those that don't exist, to account for the possibility 
> that glob yielded non-existing files due to eventual consistency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1190) FileBasedSource should ignore files that matched the glob but don't exist

2016-12-21 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767818#comment-15767818
 ] 

Daniel Halperin commented on BEAM-1190:
---

Not for very long -- the stat at open-time is getting removed as we get the 
information we need from the list call, but throw it away like we shouldn't be.

How would you feel about the ability to execute code in the worker when the 
glob is expanded. I think checking which files actually exist then and deciding 
in one centralized place in time which files you want to read (and committing 
to that decision for later) is probably a simpler and safer solution.

> FileBasedSource should ignore files that matched the glob but don't exist
> -
>
> Key: BEAM-1190
> URL: https://issues.apache.org/jira/browse/BEAM-1190
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Eugene Kirpichov
>Assignee: Eugene Kirpichov
>
> See user issue:
> http://stackoverflow.com/questions/41251741/coping-with-eventual-consistency-of-gcs-bucket-listing
> We should, after globbing the files in FileBasedSource, individually stat 
> every file and remove those that don't exist, to account for the possibility 
> that glob yielded non-existing files due to eventual consistency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1190) FileBasedSource should ignore files that matched the glob but don't exist

2016-12-21 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767812#comment-15767812
 ] 

Daniel Halperin commented on BEAM-1190:
---

Here are my general concerns:

* How do we distinguish this event from files getting deleted between expand 
and read time, which is obviously an error.
* How do we ensure consistency? Aka, suppose I produce 3 shards of a file 
returned by a list on an eventually-list-consistent filesystem, and when we try 
to open them at runtime the file does not exist. How can we distinguish whether 
the file was deleted before expand-glob, after expand-glob but before any 
reading, after reading some shards but not others, etc?
* If the filesystem (or the part of the filesystem used by the user) is not 
eventually consistent, this is clearly a bad thing to do. So this would somehow 
have to be filesystem+glob aware.

So I would prefer a solution like giving [BEAM-60] which will give the user 
code they can execute at expand-the-glob time. That's where the user can put in 
the "stat the file and make sure it's there, or remove it" code, if they want 
such.

For your proposal:

* Will it inspect the properties of the Filesystem to ensure it is eventually 
consistent for the expanded glob?
* Do we plan to add a mandatory stat at glob-expand time? That seems like the 
only correct place to add this logic -- otherwise, we can't tell from files 
that legitimately did exist at expand time, but were deleted in error at 
runtime.
* How do we plan to allow the user to opt into this behavior?

I think putting the user explicitly in the loop is the right thing to do, 
either through one of the two JIRA issues I linked earlier. I am not yet 
persuaded that changing default behavior is the right answer.

> FileBasedSource should ignore files that matched the glob but don't exist
> -
>
> Key: BEAM-1190
> URL: https://issues.apache.org/jira/browse/BEAM-1190
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Eugene Kirpichov
>Assignee: Eugene Kirpichov
>
> See user issue:
> http://stackoverflow.com/questions/41251741/coping-with-eventual-consistency-of-gcs-bucket-listing
> We should, after globbing the files in FileBasedSource, individually stat 
> every file and remove those that don't exist, to account for the possibility 
> that glob yielded non-existing files due to eventual consistency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1190) FileBasedSource should ignore files that matched the glob but don't exist

2016-12-20 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15765683#comment-15765683
 ] 

Daniel Halperin commented on BEAM-1190:
---

The two relevant JIRA issues: [BEAM-76] and [BEAM-60]

> FileBasedSource should ignore files that matched the glob but don't exist
> -
>
> Key: BEAM-1190
> URL: https://issues.apache.org/jira/browse/BEAM-1190
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Eugene Kirpichov
>Assignee: Eugene Kirpichov
>
> See user issue:
> http://stackoverflow.com/questions/41251741/coping-with-eventual-consistency-of-gcs-bucket-listing
> We should, after globbing the files in FileBasedSource, individually stat 
> every file and remove those that don't exist, to account for the possibility 
> that glob yielded non-existing files due to eventual consistency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1190) FileBasedSource should ignore files that matched the glob but don't exist

2016-12-20 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15765675#comment-15765675
 ] 

Daniel Halperin commented on BEAM-1190:
---

I think this is a very scary default behavior, and something the user should 
implement on their own in pipeline construction.

Alternately, there's already a JIRA issue for giving the user a hook to run 
code at expansion time in order to, e.g., autocomplete sharding templates that 
eventual consistency chose not to show.

> FileBasedSource should ignore files that matched the glob but don't exist
> -
>
> Key: BEAM-1190
> URL: https://issues.apache.org/jira/browse/BEAM-1190
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Eugene Kirpichov
>Assignee: Eugene Kirpichov
>
> See user issue:
> http://stackoverflow.com/questions/41251741/coping-with-eventual-consistency-of-gcs-bucket-listing
> We should, after globbing the files in FileBasedSource, individually stat 
> every file and remove those that don't exist, to account for the possibility 
> that glob yielded non-existing files due to eventual consistency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (BEAM-1190) FileBasedSource should ignore files that matched the glob but don't exist

2016-12-20 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15765675#comment-15765675
 ] 

Daniel Halperin edited comment on BEAM-1190 at 12/21/16 12:33 AM:
--

I think this is a very scary proposal for a new default behavior, and something 
the user should implement on their own in pipeline construction.

Alternately, there's already a JIRA issue for giving the user a hook to run 
code at expansion time in order to, e.g., autocomplete sharding templates that 
eventual consistency chose not to show.


was (Author: dhalp...@google.com):
I think this is a very scary default behavior, and something the user should 
implement on their own in pipeline construction.

Alternately, there's already a JIRA issue for giving the user a hook to run 
code at expansion time in order to, e.g., autocomplete sharding templates that 
eventual consistency chose not to show.

> FileBasedSource should ignore files that matched the glob but don't exist
> -
>
> Key: BEAM-1190
> URL: https://issues.apache.org/jira/browse/BEAM-1190
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Eugene Kirpichov
>Assignee: Eugene Kirpichov
>
> See user issue:
> http://stackoverflow.com/questions/41251741/coping-with-eventual-consistency-of-gcs-bucket-listing
> We should, after globbing the files in FileBasedSource, individually stat 
> every file and remove those that don't exist, to account for the possibility 
> that glob yielded non-existing files due to eventual consistency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1181) Investigate: does BigQueryIO honor write disposition in all cases?

2016-12-19 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15762278#comment-15762278
 ] 

Daniel Halperin commented on BEAM-1181:
---

[~pei...@gmail.com] can you please take a look, when you get a chance?

> Investigate: does BigQueryIO honor write disposition in all cases?
> --
>
> Key: BEAM-1181
> URL: https://issues.apache.org/jira/browse/BEAM-1181
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-gcp
>Reporter: Daniel Halperin
>Assignee: Pei He
> Fix For: 0.5.0-incubating
>
>
> I am not certain that we have proper testing of the spread of write 
> dispositions, and I do not see enough (any?) testing of the non-default 
> values. We should verify that these are properly used, even for large 
> multi-table copies, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-1181) Investigate: does BigQueryIO honor write disposition in all cases?

2016-12-19 Thread Daniel Halperin (JIRA)
Daniel Halperin created BEAM-1181:
-

 Summary: Investigate: does BigQueryIO honor write disposition in 
all cases?
 Key: BEAM-1181
 URL: https://issues.apache.org/jira/browse/BEAM-1181
 Project: Beam
  Issue Type: Task
  Components: sdk-java-gcp
Reporter: Daniel Halperin
Assignee: Pei He
 Fix For: 0.5.0-incubating


I am not certain that we have proper testing of the spread of write 
dispositions, and I do not see enough (any?) testing of the non-default values. 
We should verify that these are properly used, even for large multi-table 
copies, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-318) AvroCoder may be affected by AVRO-607

2016-12-16 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-318:
-
Assignee: (was: Daniel Halperin)

> AvroCoder may be affected by AVRO-607
> -
>
> Key: BEAM-318
> URL: https://issues.apache.org/jira/browse/BEAM-318
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Daniel Halperin
>
> See AVRO-607
> The hypothesis is that even though AvroCoder is threadsafe according to 
> Avro's javadoc, an underlying bug in Avro library makes thread-unsafe access 
> to a Java WeakHashMap. However I have not yet succeeded at reproducing the 
> bug; with some failed attempts in the linked GitHub Pull Request history.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-1120) Prepare Dataflow runner for 0.4.0 release

2016-12-15 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-1120.
---
   Resolution: Fixed
Fix Version/s: Not applicable

> Prepare Dataflow runner for 0.4.0 release
> -
>
> Key: BEAM-1120
> URL: https://issues.apache.org/jira/browse/BEAM-1120
> Project: Beam
>  Issue Type: Improvement
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-1166) Source.getDefaultOutputCoder() documentation should mention CannotProvideCoderException

2016-12-15 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-1166:
--
Component/s: sdk-java-core
 beam-model

> Source.getDefaultOutputCoder() documentation should mention 
> CannotProvideCoderException
> ---
>
> Key: BEAM-1166
> URL: https://issues.apache.org/jira/browse/BEAM-1166
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model, sdk-java-core
>Reporter: Stephen Sisk
>Assignee: Stephen Sisk
>
> Knowing that you can throw CannotProviderCoderException is an important part 
> of implementing getDefaultOutputCoder
> The documentation for PTransform's getDefaultOutputCoder mentions this class, 
> and we should do this for the Source class as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1023) Add test coverage for BigQueryIO.Write in streaming mode

2016-12-15 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15752428#comment-15752428
 ] 

Daniel Halperin commented on BEAM-1023:
---

https://github.com/apache/incubator-beam/pull/1400

> Add test coverage for BigQueryIO.Write in streaming mode
> 
>
> Key: BEAM-1023
> URL: https://issues.apache.org/jira/browse/BEAM-1023
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-gcp
>Reporter: Reuven Lax
>Assignee: Reuven Lax
> Fix For: 0.5.0-incubating
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1023) Add test coverage for BigQueryIO.Write in streaming mode

2016-12-15 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15752424#comment-15752424
 ] 

Daniel Halperin commented on BEAM-1023:
---

This was implemented by PR #1400, which was accidentally tagged with BEAM-1022 
instead of this issue.

> Add test coverage for BigQueryIO.Write in streaming mode
> 
>
> Key: BEAM-1023
> URL: https://issues.apache.org/jira/browse/BEAM-1023
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-gcp
>Reporter: Reuven Lax
>Assignee: Reuven Lax
> Fix For: 0.5.0-incubating
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-1023) Add test coverage for BigQueryIO.Write in streaming mode

2016-12-15 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-1023.
---
   Resolution: Fixed
Fix Version/s: 0.5.0-incubating

> Add test coverage for BigQueryIO.Write in streaming mode
> 
>
> Key: BEAM-1023
> URL: https://issues.apache.org/jira/browse/BEAM-1023
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-gcp
>Reporter: Reuven Lax
>Assignee: Reuven Lax
> Fix For: 0.5.0-incubating
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1126) Expose UnboundedSource split backlog in number of events

2016-12-14 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749908#comment-15749908
 ] 

Daniel Halperin commented on BEAM-1126:
---

So https://issues.apache.org/jira/browse/BEAM-774 is a relevant subtask?

> Expose UnboundedSource split backlog in number of events
> 
>
> Key: BEAM-1126
> URL: https://issues.apache.org/jira/browse/BEAM-1126
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Daniel Halperin
>Priority: Minor
>
> Today {{UnboundedSource}} exposes split backlog in bytes via 
> {{getSplitBacklogBytes()}}
> There is value in exposing backlog in number of events as well, since this 
> number can be more human comprehensible than bytes. something like 
> {{getSplitBacklogEvents()}} or {{getSplitBacklogCount()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1153) GcsUtil needs to set timeout and retry explicitly in BatchRequest.

2016-12-14 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749758#comment-15749758
 ] 

Daniel Halperin commented on BEAM-1153:
---

So there appear to be two related bugs in staging jars. This issue and fix will 
address:

{code}
java.lang.RuntimeException: Could not stage classpath element: 
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Java_RunnableOnService_Dataflow/.repository/com/google/http-client/google-http-client-jackson2/1.22.0/google-http-client-jackson2-1.22.0.jar
at 
org.apache.beam.runners.dataflow.util.PackageUtil.stageClasspathElements(PackageUtil.java:240)
at 
org.apache.beam.runners.dataflow.util.PackageUtil.stageClasspathElements(PackageUtil.java:142)
at 
org.apache.beam.runners.dataflow.util.GcsStager.stageFiles(GcsStager.java:51)
at 
org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:487)
at 
org.apache.beam.runners.dataflow.testing.TestDataflowRunner.run(TestDataflowRunner.java:103)
at 
org.apache.beam.runners.dataflow.testing.TestDataflowRunner.run(TestDataflowRunner.java:96)
at 
org.apache.beam.runners.dataflow.testing.TestDataflowRunner.run(TestDataflowRunner.java:62)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:176)
at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:112)
at 
org.apache.beam.sdk.transforms.CountTest.testCountGloballyBasic(CountTest.java:97)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at 
org.apache.maven.surefire.junitcore.pc.Scheduler$1.run(Scheduler.java:393)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Error executing batch GCS request
at org.apache.beam.sdk.util.GcsUtil.executeBatches(GcsUtil.java:481)
at org.apache.beam.sdk.util.GcsUtil.fileSizes(GcsUtil.java:280)
at org.apache.beam.sdk.util.GcsUtil.fileSize(GcsUtil.java:270)
at 
org.apache.beam.sdk.util.GcsIOChannelFactory.getSizeBytes(GcsIOChannelFactory.java:82)
at 
org.apache.beam.sdk.util.IOChannelUtils.getSizeBytes(IOChannelUtils.java:240)
at 
org.apache.beam.runners.dataflow.util.PackageUtil.stageClasspathElements(PackageUtil.java:195)
... 27 more
Caused by: java.util.concurrent.ExecutionException: 
java.net.SocketTimeoutException: Read timed out
at 
org.apache.beam.sdk.repackaged.com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:476)
at 
org.apache.beam.sdk.repackaged.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:455)
at 
org.apache.beam.sdk.repackaged.com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:79)
at org.apache.beam.sdk.util.GcsUtil.executeBatches(GcsUtil.java:473)
... 32 more
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
at sun.security.ssl.InputRecord.read(InputRecord.java:503)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
at java.io.BufferedInputStream.fi

[jira] [Comment Edited] (BEAM-1149) Side input access fails in direct runner (possibly others too) when input element in multiple windows

2016-12-14 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749237#comment-15749237
 ] 

Daniel Halperin edited comment on BEAM-1149 at 12/14/16 7:30 PM:
-

Yes. I think the "right" fix is more to point out that "has side inputs" 
implies "observesWindow = true", because the window of the main input is used 
when identifying the side input. Which is effectively what you have done in 
your change.


was (Author: dhalp...@google.com):
Yes. I think the "right" fix is more to point out that "has side inputs" 
implies "observesWindow = true". Which is effectively what you have done in 
your change.

> Side input access fails in direct runner (possibly others too) when input 
> element in multiple windows
> -
>
> Key: BEAM-1149
> URL: https://issues.apache.org/jira/browse/BEAM-1149
> Project: Beam
>  Issue Type: Bug
>Reporter: Eugene Kirpichov
>Assignee: Kenneth Knowles
>Priority: Blocker
> Fix For: 0.4.0-incubating
>
>
> {code:java}
>   private static class FnWithSideInputs extends DoFn {
> private final PCollectionView view;
> private FnWithSideInputs(PCollectionView view) {
>   this.view = view;
> }
> @ProcessElement
> public void processElement(ProcessContext c) {
>   c.output(c.element() + ":" + c.sideInput(view));
> }
>   }
>   @Test
>   public void testSideInputsWithMultipleWindows() {
> Pipeline p = TestPipeline.create();
> MutableDateTime mutableNow = Instant.now().toMutableDateTime();
> mutableNow.setMillisOfSecond(0);
> Instant now = mutableNow.toInstant();
> SlidingWindows windowFn =
> 
> SlidingWindows.of(Duration.standardSeconds(5)).every(Duration.standardSeconds(1));
> PCollectionView view = 
> p.apply(Create.of(1)).apply(View.asSingleton());
> PCollection res =
> p.apply(Create.timestamped(TimestampedValue.of("a", now)))
> .apply(Window.into(windowFn))
> .apply(ParDo.of(new FnWithSideInputs(view)).withSideInputs(view));
> PAssert.that(res).containsInAnyOrder("a:1");
> p.run();
>   }
> {code}
> This fails with the following exception:
> {code}
> org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
> java.lang.IllegalStateException: sideInput called when main input element is 
> in multiple windows
>   at 
> org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:343)
>   at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:1)
>   at org.apache.beam.sdk.Pipeline.run(Pipeline.java:176)
>   at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:112)
>   at 
> Caused by: java.lang.IllegalStateException: sideInput called when main input 
> element is in multiple windows
>   at 
> org.apache.beam.runners.core.SimpleDoFnRunner$DoFnProcessContext.sideInput(SimpleDoFnRunner.java:514)
>   at 
> org.apache.beam.sdk.transforms.ParDoTest$FnWithSideInputs.processElement(ParDoTest.java:738)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1149) Side input access fails in direct runner (possibly others too) when input element in multiple windows

2016-12-14 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749237#comment-15749237
 ] 

Daniel Halperin commented on BEAM-1149:
---

Yes. I think the "right" fix is more to point out that "has side inputs" 
implies "observesWindow = true". Which is effectively what you have done in 
your change.

> Side input access fails in direct runner (possibly others too) when input 
> element in multiple windows
> -
>
> Key: BEAM-1149
> URL: https://issues.apache.org/jira/browse/BEAM-1149
> Project: Beam
>  Issue Type: Bug
>Reporter: Eugene Kirpichov
>Assignee: Kenneth Knowles
>Priority: Blocker
> Fix For: 0.4.0-incubating
>
>
> {code:java}
>   private static class FnWithSideInputs extends DoFn {
> private final PCollectionView view;
> private FnWithSideInputs(PCollectionView view) {
>   this.view = view;
> }
> @ProcessElement
> public void processElement(ProcessContext c) {
>   c.output(c.element() + ":" + c.sideInput(view));
> }
>   }
>   @Test
>   public void testSideInputsWithMultipleWindows() {
> Pipeline p = TestPipeline.create();
> MutableDateTime mutableNow = Instant.now().toMutableDateTime();
> mutableNow.setMillisOfSecond(0);
> Instant now = mutableNow.toInstant();
> SlidingWindows windowFn =
> 
> SlidingWindows.of(Duration.standardSeconds(5)).every(Duration.standardSeconds(1));
> PCollectionView view = 
> p.apply(Create.of(1)).apply(View.asSingleton());
> PCollection res =
> p.apply(Create.timestamped(TimestampedValue.of("a", now)))
> .apply(Window.into(windowFn))
> .apply(ParDo.of(new FnWithSideInputs(view)).withSideInputs(view));
> PAssert.that(res).containsInAnyOrder("a:1");
> p.run();
>   }
> {code}
> This fails with the following exception:
> {code}
> org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
> java.lang.IllegalStateException: sideInput called when main input element is 
> in multiple windows
>   at 
> org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:343)
>   at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:1)
>   at org.apache.beam.sdk.Pipeline.run(Pipeline.java:176)
>   at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:112)
>   at 
> Caused by: java.lang.IllegalStateException: sideInput called when main input 
> element is in multiple windows
>   at 
> org.apache.beam.runners.core.SimpleDoFnRunner$DoFnProcessContext.sideInput(SimpleDoFnRunner.java:514)
>   at 
> org.apache.beam.sdk.transforms.ParDoTest$FnWithSideInputs.processElement(ParDoTest.java:738)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1153) GcsUtil needs to set timeout and retry explicitly in BatchRequest.

2016-12-14 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749181#comment-15749181
 ] 

Daniel Halperin commented on BEAM-1153:
---

Thanks for the info!

Why do we think the read timeout has anything to do with pushing files to GCS? 
I would imagine we're not issuing read commands.

> GcsUtil needs to set timeout and retry explicitly in BatchRequest.
> --
>
> Key: BEAM-1153
> URL: https://issues.apache.org/jira/browse/BEAM-1153
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Pei He
>Assignee: Pei He
>Priority: Blocker
>
> Non-batch requests uses RetryHttpRequestInitializer, which set read timeout 
> as 80 seconds, and does more retries.
> Google Cloud auto generated Json library doesn't set HttpRequestInitializer 
> for batch requests.
> GcsUtil uses storageClient.batch(), and it is defined in here:
> https://github.com/vparfonov/google-api-java-client/blob/master/google-api-client/src/main/java/com/google/api/client/googleapis/services/AbstractGoogleClient.java#L256
> Without the HttpRequestInitializer, the default read timeout is 20 seconds.
> Possible fix is: https://github.com/apache/incubator-beam/pull/1608
> In additional, we can partially rollback 
> https://github.com/apache/incubator-beam/pull/1359 to keep using non-batch 
> API for fileSize() for single files. This will make sure existing code will 
> keep work as the same way.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-1152) Java-Maven Archetypes-Starter Build Failed

2016-12-13 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-1152.
---
   Resolution: Fixed
Fix Version/s: Not applicable

> Java-Maven Archetypes-Starter Build Failed
> --
>
> Key: BEAM-1152
> URL: https://issues.apache.org/jira/browse/BEAM-1152
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Mark Liu
>Assignee: Davor Bonaci
> Fix For: Not applicable
>
>
> Apache Beam :: SDKs :: Java :: Maven Archetypes :: Starter keep failed breaks 
> Jenkins precommit.
> https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/5867/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1152) Java-Maven Archetypes-Starter Build Failed

2016-12-13 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15747040#comment-15747040
 ] 

Daniel Halperin commented on BEAM-1152:
---

Thanks Mark! This is fixed by the recent 
https://github.com/apache/incubator-beam/pull/1610

> Java-Maven Archetypes-Starter Build Failed
> --
>
> Key: BEAM-1152
> URL: https://issues.apache.org/jira/browse/BEAM-1152
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Mark Liu
>Assignee: Davor Bonaci
> Fix For: Not applicable
>
>
> Apache Beam :: SDKs :: Java :: Maven Archetypes :: Starter keep failed breaks 
> Jenkins precommit.
> https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/5867/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-1152) Java-Maven Archetypes-Starter Build Failed

2016-12-13 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-1152:
--
Component/s: (was: build-system)

> Java-Maven Archetypes-Starter Build Failed
> --
>
> Key: BEAM-1152
> URL: https://issues.apache.org/jira/browse/BEAM-1152
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Mark Liu
>Assignee: Davor Bonaci
> Fix For: Not applicable
>
>
> Apache Beam :: SDKs :: Java :: Maven Archetypes :: Starter keep failed breaks 
> Jenkins precommit.
> https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/5867/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-840) Add Java SDK extension to support non-distributed sorting

2016-12-13 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-840.
--
   Resolution: Fixed
Fix Version/s: 0.4.0-incubating

> Add Java SDK extension to support non-distributed sorting
> -
>
> Key: BEAM-840
> URL: https://issues.apache.org/jira/browse/BEAM-840
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Mitch Shanklin
>Assignee: Mitch Shanklin
>Priority: Minor
> Fix For: 0.4.0-incubating
>
>
> Add an extension that provides a PTransform which performs 
> local(non-distributed) sorting. It will sort in memory until the buffer is 
> full, then flush to disk and use external sorting.
> 
> Consumes a PCollection of KVs from primary key to iterable of secondary key 
> and value KVs and sorts the iterables. Would probably be called after a 
> GroupByKey. Uses coders to convert secondary keys and values into byte arrays 
> and does a lexicographical comparison on the secondary keys.
> Uses Hadoop as an external sorting library.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-909) Starter archetype's pom doesn't include the right dependencies

2016-12-13 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-909.
--
   Resolution: Fixed
Fix Version/s: (was: 0.4.0-incubating)
   Not applicable

> Starter archetype's pom doesn't include the right dependencies
> --
>
> Key: BEAM-909
> URL: https://issues.apache.org/jira/browse/BEAM-909
> Project: Beam
>  Issue Type: Bug
>  Components: examples-java
>Reporter: Frances Perry
>Assignee: Daniel Halperin
> Fix For: Not applicable
>
>
> Repro:
> $ mvn archetype:generate -DarchetypeGroupId=org.apache.beam 
> -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-starter 
> -DarchetypeVersion=LATEST -DgroupId=org.example 
> -DartifactId=beam-starter -Dversion="0.1" -DinteractiveMode=false
> The resulting pom doesn't seem to have dependencies on any runners or a 
> profile for enabling them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (BEAM-1126) Expose UnboundedSource split backlog in number of events

2016-12-13 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15745258#comment-15745258
 ] 

Daniel Halperin edited comment on BEAM-1126 at 12/13/16 2:15 PM:
-

If UnboundedSource supported aggregators/metrics today, would you still want 
this change?

(Assuming the answer is no: )

I don't like the idea of complicating core APIs solely as a short term 
workaround for the issue that metrics aren't supported everywhere. We are 
actively trying to fix metrics to be usable in more places; it seems a better 
long-term solution to wait for [or help with!] that effort.

(Assuming the answer is yes: )

Say more?


was (Author: dhalp...@google.com):
If UnboundedSource supported aggregators/metrics today, would you still want 
this change?

(Assuming the answer is no:)

I don't like the idea of complicating core APIs solely as a short term 
workaround for the issue that metrics aren't supported everywhere. We are 
actively trying to fix metrics to be usable in more places; it seems a better 
long-term solution to wait for [or help with!] that effort.

(Assuming the answer is yes:)

Say more?

> Expose UnboundedSource split backlog in number of events
> 
>
> Key: BEAM-1126
> URL: https://issues.apache.org/jira/browse/BEAM-1126
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Daniel Halperin
>Priority: Minor
>
> Today {{UnboundedSource}} exposes split backlog in bytes via 
> {{getSplitBacklogBytes()}}
> There is value in exposing backlog in number of events as well, since this 
> number can be more human comprehensible than bytes. something like 
> {{getSplitBacklogEvents()}} or {{getSplitBacklogCount()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1126) Expose UnboundedSource split backlog in number of events

2016-12-13 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15745258#comment-15745258
 ] 

Daniel Halperin commented on BEAM-1126:
---

If UnboundedSource supported aggregators/metrics today, would you still want 
this change?

(Assuming the answer is no:)

I don't like the idea of complicating core APIs solely as a short term 
workaround for the issue that metrics aren't supported everywhere. We are 
actively trying to fix metrics to be usable in more places; it seems a better 
long-term solution to wait for [or help with!] that effort.

(Assuming the answer is yes:)

Say more?

> Expose UnboundedSource split backlog in number of events
> 
>
> Key: BEAM-1126
> URL: https://issues.apache.org/jira/browse/BEAM-1126
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Daniel Halperin
>Priority: Minor
>
> Today {{UnboundedSource}} exposes split backlog in bytes via 
> {{getSplitBacklogBytes()}}
> There is value in exposing backlog in number of events as well, since this 
> number can be more human comprehensible than bytes. something like 
> {{getSplitBacklogEvents()}} or {{getSplitBacklogCount()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (BEAM-1139) Failures in precommit - Apex & Kryo

2016-12-12 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743947#comment-15743947
 ] 

Daniel Halperin edited comment on BEAM-1139 at 12/13/16 2:51 AM:
-

Do we know the PR or other root cause of this precommit break?


was (Author: dhalp...@google.com):
Do we know the PR root cause of this precommit break?

> Failures in precommit - Apex & Kryo
> ---
>
> Key: BEAM-1139
> URL: https://issues.apache.org/jira/browse/BEAM-1139
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-apex
>Reporter: Kenneth Knowles
>Assignee: Thomas Weise
>Priority: Blocker
>
> https://builds.apache.org/view/Beam/job/beam_PreCommit_Java_MavenInstall/org.apache.beam$beam-examples-java/5775/testReport/junit/org.apache.beam.examples/WordCountIT/testE2EWordCount/
> This is not necessarily a bug in the Apex runner, but it looks like this 
> class cannot be serialized via Kryo while the Apex runner needs it to be. 
> Probably the fix is to roll-forwards a simple change to make it Kryo 
> serializable.
> It is not clear to me the difference between this test run and others. 
> Clearly there is a coverage gap.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1139) Failures in precommit - Apex & Kryo

2016-12-12 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743947#comment-15743947
 ] 

Daniel Halperin commented on BEAM-1139:
---

Do we know the PR root cause of this precommit break?

> Failures in precommit - Apex & Kryo
> ---
>
> Key: BEAM-1139
> URL: https://issues.apache.org/jira/browse/BEAM-1139
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-apex
>Reporter: Kenneth Knowles
>Assignee: Thomas Weise
>Priority: Blocker
>
> https://builds.apache.org/view/Beam/job/beam_PreCommit_Java_MavenInstall/org.apache.beam$beam-examples-java/5775/testReport/junit/org.apache.beam.examples/WordCountIT/testE2EWordCount/
> This is not necessarily a bug in the Apex runner, but it looks like this 
> class cannot be serialized via Kryo while the Apex runner needs it to be. 
> Probably the fix is to roll-forwards a simple change to make it Kryo 
> serializable.
> It is not clear to me the difference between this test run and others. 
> Clearly there is a coverage gap.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1071) Support pre-existing tables with streaming BigQueryIO

2016-12-12 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742781#comment-15742781
 ] 

Daniel Halperin commented on BEAM-1071:
---

Looks like this is the PR we discussed this in: 
https://github.com/apache/incubator-beam/pull/1459#pullrequestreview-10862503

Requiring {{CREATE_NEVER}} makes sense to me.

> Support pre-existing tables with streaming BigQueryIO
> -
>
> Key: BEAM-1071
> URL: https://issues.apache.org/jira/browse/BEAM-1071
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Reporter: Sam McVeety
>Priority: Minor
>
> Specifically, with a tableRef function, CREATE_NEVER should be allowed for 
> pre-existing tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-599) Return KafkaIO getWatermark log in debug mode

2016-12-12 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-599:
-
Assignee: Aviem Zur

> Return KafkaIO getWatermark log in debug mode
> -
>
> Key: BEAM-599
> URL: https://issues.apache.org/jira/browse/BEAM-599
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>Priority: Minor
> Fix For: Not applicable
>
>
> https://issues.apache.org/jira/browse/BEAM-574 removes the getWatermark log 
> line from KafkaIO
> PR: https://github.com/apache/incubator-beam/pull/859
> I actually found this log line useful, instead of removing it completely can 
> we return this log line but change the log level to 'debug'?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-599) Return KafkaIO getWatermark log in debug mode

2016-12-12 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-599.
--
   Resolution: Fixed
Fix Version/s: Not applicable

Thanks [~aviemzur]

> Return KafkaIO getWatermark log in debug mode
> -
>
> Key: BEAM-599
> URL: https://issues.apache.org/jira/browse/BEAM-599
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>Priority: Minor
> Fix For: Not applicable
>
>
> https://issues.apache.org/jira/browse/BEAM-574 removes the getWatermark log 
> line from KafkaIO
> PR: https://github.com/apache/incubator-beam/pull/859
> I actually found this log line useful, instead of removing it completely can 
> we return this log line but change the log level to 'debug'?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1127) JmsIO should create an unique source in case of topic

2016-12-12 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742684#comment-15742684
 ] 

Daniel Halperin commented on BEAM-1127:
---

Another alternative here might be to automatically create a queue when reading 
from a topic, and then read from the queue in parallel. That way we still get 
scalability without duplication. Thoughts?

> JmsIO should create an unique source in case of topic
> -
>
> Key: BEAM-1127
> URL: https://issues.apache.org/jira/browse/BEAM-1127
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
>
> The JmsIO deals with both queue and topic.
> Currently, we create multiple sources depending of the desired number of 
> splits.
> If this behavior is correct when using a queue (we have concurrent consumers 
> of the queue), it's basically wrong when using a topic (we have multiple 
> subscribers, so potentially duplicated message).
> When using a topic, the JmsIO should use an unique source (to avoid messages 
> duplication).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-1120) Prepare Dataflow runner for 0.4.0 release

2016-12-12 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-1120:
--
Fix Version/s: (was: 0.4.0-incubating)

> Prepare Dataflow runner for 0.4.0 release
> -
>
> Key: BEAM-1120
> URL: https://issues.apache.org/jira/browse/BEAM-1120
> Project: Beam
>  Issue Type: Improvement
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1120) Prepare Dataflow runner for 0.4.0 release

2016-12-12 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742631#comment-15742631
 ] 

Daniel Halperin commented on BEAM-1120:
---

I think we've done enough work on the {{DataflowRunner}} to unblock the next 
release.

> Prepare Dataflow runner for 0.4.0 release
> -
>
> Key: BEAM-1120
> URL: https://issues.apache.org/jira/browse/BEAM-1120
> Project: Beam
>  Issue Type: Improvement
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1126) Expose UnboundedSource split backlog in number of events

2016-12-12 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742473#comment-15742473
 ] 

Daniel Halperin commented on BEAM-1126:
---

This [thread on the dev 
list|https://lists.apache.org/thread.html/03792d43e94b7d1c342617e64511a62a681b7c2c6797055394ff22a8@%3Cdev.beam.apache.org%3E]
 has the additional context Davor is presumably asking for.

I think the confusion is between human-comprehensible and 
machine-comprehensible. Using {{bytes}} as the measure of backlog was not 
written with PubSub in mind, it was written because bytes is more directly 
related to overhead than events. Using bytes also allows for comparison between 
sources of different types... so {{bytes}} is generally a pretty good signal 
for runners, and better than {{events}}.

If the purpose of exposing {{events}} is purely for human visibility, this is 
probably indeed better done using metric or aggregator reporting. [~bchambers] 
has been thinking most about metrics recently, maybe he has additional thoughts?

> Expose UnboundedSource split backlog in number of events
> 
>
> Key: BEAM-1126
> URL: https://issues.apache.org/jira/browse/BEAM-1126
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Daniel Halperin
>Priority: Minor
>
> Today {{UnboundedSource}} exposes split backlog in bytes via 
> {{getSplitBacklogBytes()}}
> There is value in exposing backlog in number of events as well, since this 
> number can be more human comprehensible than bytes. something like 
> {{getSplitBacklogEvents()}} or {{getSplitBacklogCount()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-758) Per-step, per-execution nonce

2016-12-09 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15736625#comment-15736625
 ] 

Daniel Halperin commented on BEAM-758:
--

If plausible, I'd also like to not need to expose this in the PipelineOptions 
-- it really complicates the user API. Is there a mechanism we can use to get 
"runner-inserted" values other than user-facing pipeline options?

> Per-step, per-execution nonce
> -
>
> Key: BEAM-758
> URL: https://issues.apache.org/jira/browse/BEAM-758
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Affects Versions: Not applicable
>Reporter: Daniel Halperin
>Assignee: Sam McVeety
>
> In the forthcoming runner API, a user will be able to save a pipeline to JSON 
> and then run it repeatedly.
> Many pieces of code (e.g., BigQueryIO.Read or Write) rely on a single random 
> value (nonce). These values are typically generated at apply time, so that 
> they are deterministic (don't change across retries of DoFns) and global (are 
> the same across all workers).
> However, once the runner API lands the existing code would result in the same 
> nonce being reused across jobs. Other possible solutions:
> * Generate nonce in {{Create(1) | ParDo}} then use this as a side input. 
> Should work, as along as side inputs are actually checkpointed. But does not 
> work for {{BoundedSource}}.
> * If a nonce is only needed for the lifetime of one bundle, can be generated 
> in {{startBundle}} and used in {{finishBundle}} [or {{tearDown}}].
> * Add some context somewhere that lets user code access unique step name, and 
> somehow generate a nonce consistently e.g. by hashing. Will usually work, but 
> this is similarly not available to sources.
> Another Q: I'm not sure we have a good way to generate nonces in unbounded 
> pipelines -- we probably need one. This would enable us to, e.g., use 
> {{BigQueryIO.Write}} in an unbounded pipeline [if we had, e.g., exactly-once 
> triggering per window]. Or generalizing to multiple firings...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-758) Per-step, per-execution nonce

2016-12-09 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15736622#comment-15736622
 ] 

Daniel Halperin commented on BEAM-758:
--

This indeed seems like a step along the right track.

We also need this to be per-step. (Aka, so two different instances (A & B) of a 
transform in the pipeline consistently use the same ('A nonce' and 'B nonce') 
across machines, and 'A nonce' != 'B nonce'.

> Per-step, per-execution nonce
> -
>
> Key: BEAM-758
> URL: https://issues.apache.org/jira/browse/BEAM-758
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Affects Versions: Not applicable
>Reporter: Daniel Halperin
>Assignee: Sam McVeety
>
> In the forthcoming runner API, a user will be able to save a pipeline to JSON 
> and then run it repeatedly.
> Many pieces of code (e.g., BigQueryIO.Read or Write) rely on a single random 
> value (nonce). These values are typically generated at apply time, so that 
> they are deterministic (don't change across retries of DoFns) and global (are 
> the same across all workers).
> However, once the runner API lands the existing code would result in the same 
> nonce being reused across jobs. Other possible solutions:
> * Generate nonce in {{Create(1) | ParDo}} then use this as a side input. 
> Should work, as along as side inputs are actually checkpointed. But does not 
> work for {{BoundedSource}}.
> * If a nonce is only needed for the lifetime of one bundle, can be generated 
> in {{startBundle}} and used in {{finishBundle}} [or {{tearDown}}].
> * Add some context somewhere that lets user code access unique step name, and 
> somehow generate a nonce consistently e.g. by hashing. Will usually work, but 
> this is similarly not available to sources.
> Another Q: I'm not sure we have a good way to generate nonces in unbounded 
> pipelines -- we probably need one. This would enable us to, e.g., use 
> {{BigQueryIO.Write}} in an unbounded pipeline [if we had, e.g., exactly-once 
> triggering per window]. Or generalizing to multiple firings...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-1121) Update documentation following rename of PTransform.apply

2016-12-08 Thread Daniel Halperin (JIRA)
Daniel Halperin created BEAM-1121:
-

 Summary: Update documentation following rename of PTransform.apply
 Key: BEAM-1121
 URL: https://issues.apache.org/jira/browse/BEAM-1121
 Project: Beam
  Issue Type: Bug
  Components: website
Reporter: Daniel Halperin
Assignee: Kenneth Knowles
 Fix For: 0.4.0-incubating


Since PTransform#apply does not exist any more, significant website 
documentation may be wrong.

Fix version: 0.4.0-incubating really just means this needs to be done as part 
of the 0.4.0-incubating release, since this change will make it into said 
release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-1120) Prepare Dataflow runner for 0.4.0 release

2016-12-08 Thread Daniel Halperin (JIRA)
Daniel Halperin created BEAM-1120:
-

 Summary: Prepare Dataflow runner for 0.4.0 release
 Key: BEAM-1120
 URL: https://issues.apache.org/jira/browse/BEAM-1120
 Project: Beam
  Issue Type: Improvement
Reporter: Daniel Halperin
Assignee: Daniel Halperin
 Fix For: 0.4.0-incubating






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-1110) Propagate operator- and pipeline-level metadata where possible

2016-12-07 Thread Daniel Halperin (JIRA)
Daniel Halperin created BEAM-1110:
-

 Summary: Propagate operator- and pipeline-level metadata where 
possible
 Key: BEAM-1110
 URL: https://issues.apache.org/jira/browse/BEAM-1110
 Project: Beam
  Issue Type: Improvement
  Components: runner-apex
Reporter: Daniel Halperin
Priority: Minor


User metadata from Beam pipelines:

* Aggregators (/ Metrics)
* Step names
* Pipeline options

It would be nice to propagate these metadata to the REST APIs served by the 
stram. From talking to [~sandeepdeshmukh], this seems doable but probably not 
done yet.

Supporting these features will give Beam pipelines a consistent experience 
across runners. See also this [blog 
post|https://cloud.google.com/blog/big-data/2016/06/dataflow-updates-see-more-details-about-your-pipelines]
 for how this is realized in the Cloud Dataflow runner, and a related JIRA 
(BEAM-1107) for the Flink Runner.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-1108) Remove deprecated Dataflow Runner options and update documentation

2016-12-07 Thread Daniel Halperin (JIRA)
Daniel Halperin created BEAM-1108:
-

 Summary: Remove deprecated Dataflow Runner options and update 
documentation
 Key: BEAM-1108
 URL: https://issues.apache.org/jira/browse/BEAM-1108
 Project: Beam
  Issue Type: Improvement
  Components: runner-dataflow
Affects Versions: Not applicable
Reporter: Daniel Halperin
Assignee: Daniel Halperin
Priority: Minor
 Fix For: Not applicable


Umbrella bug for removing deprecated {{DataflowPipelineXOptions}} 
configurations, plus improving documentation. Will update bug description as 
more tasks arise.

1. Remove the {{TEARDOWN_POLICY}} option.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1107) Display user names for steps in the Flink Web UI

2016-12-07 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15730313#comment-15730313
 ] 

Daniel Halperin commented on BEAM-1107:
---

Ack -- I guess I have this intuition there's opportunity for more cleanup, but 
I may be wrong (or it may be a Flink-general, not Beam-on-Flink issue).

E.g., look at the attached screenshot:

* The name (grey) at the top is MapPartition -> Map -> GroupCombine -> Map
* The name of the steps (black) includes the identical as the grey, with 
additionally (step name)
* The Operation: text (small, grey) at the bottom includes the same (almost - 
logical vs physical?) information, although there appears to be some HTML error 
with inserting a break tag inside another break tag.


> Display user names for steps in the Flink Web UI
> 
>
> Key: BEAM-1107
> URL: https://issues.apache.org/jira/browse/BEAM-1107
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Daniel Halperin
>Assignee: Aljoscha Krettek
> Attachments: screenshot-1.png
>
>
> [copying in-person / email discussion at Strata Singapore to JIRA]
> The FlinkBatchTransformTranslators use transform.getName() [1] -- this is the 
> "SDK name" for the transform.
> The "user name" for the transform is not available here, it is in fact on the 
> TransformHierarchy.Node as node.getFullName() [2].
> getFullName() is used some in Flink, but not when setting step names.
> I drafted a quick commit that sort of propagates the user names to the web UI 
> (but only for DataSource, and still too verbose: 
> https://github.com/dhalperi/incubator-beam/commit/a2f1fb06b22a85ec738e4f2a604c9a129891916c)
> Before this change, the "ReadLines" step showed up as: "DataSource (at 
> Read(CompressedSource) 
> (org.apache.beam.runners.flink.translation.wrappers.SourceInputFormat))"
> With this change, it shows up as "DataSource (at ReadLines/Read 
> (org.apache.beam.runners.flink.translation.wrappers.SourceInputFormat))"
> which I think is closer. [I'd still like it to JUST be "ReadLines/Read" e.g.].
> Thoughts?
> [1] 
> https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/FlinkBatchTransformTranslators.java#L129
> [2] 
> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/TransformHierarchy.java#L252



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-1107) Display user names for steps in the Flink Web UI

2016-12-07 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-1107:
--
Attachment: screenshot-1.png

> Display user names for steps in the Flink Web UI
> 
>
> Key: BEAM-1107
> URL: https://issues.apache.org/jira/browse/BEAM-1107
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Daniel Halperin
>Assignee: Aljoscha Krettek
> Attachments: screenshot-1.png
>
>
> [copying in-person / email discussion at Strata Singapore to JIRA]
> The FlinkBatchTransformTranslators use transform.getName() [1] -- this is the 
> "SDK name" for the transform.
> The "user name" for the transform is not available here, it is in fact on the 
> TransformHierarchy.Node as node.getFullName() [2].
> getFullName() is used some in Flink, but not when setting step names.
> I drafted a quick commit that sort of propagates the user names to the web UI 
> (but only for DataSource, and still too verbose: 
> https://github.com/dhalperi/incubator-beam/commit/a2f1fb06b22a85ec738e4f2a604c9a129891916c)
> Before this change, the "ReadLines" step showed up as: "DataSource (at 
> Read(CompressedSource) 
> (org.apache.beam.runners.flink.translation.wrappers.SourceInputFormat))"
> With this change, it shows up as "DataSource (at ReadLines/Read 
> (org.apache.beam.runners.flink.translation.wrappers.SourceInputFormat))"
> which I think is closer. [I'd still like it to JUST be "ReadLines/Read" e.g.].
> Thoughts?
> [1] 
> https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/FlinkBatchTransformTranslators.java#L129
> [2] 
> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/TransformHierarchy.java#L252



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1107) Display user names for steps in the Flink Web UI

2016-12-07 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15730212#comment-15730212
 ] 

Daniel Halperin commented on BEAM-1107:
---

Also copying [~aljoscha]'s response :)

{quote}
I think we can get it down to "Data Source (ReadLines/Read)" (and similarly for 
other operators). The problem is that the String parameter is not the correct 
way to set the name of the operator but some other (admittedly weird) thing 
called "location name". To set the name we have to call .name(String) on the 
created operator after creating it.
{quote}

> Display user names for steps in the Flink Web UI
> 
>
> Key: BEAM-1107
> URL: https://issues.apache.org/jira/browse/BEAM-1107
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Daniel Halperin
>Assignee: Aljoscha Krettek
>
> [copying in-person / email discussion at Strata Singapore to JIRA]
> The FlinkBatchTransformTranslators use transform.getName() [1] -- this is the 
> "SDK name" for the transform.
> The "user name" for the transform is not available here, it is in fact on the 
> TransformHierarchy.Node as node.getFullName() [2].
> getFullName() is used some in Flink, but not when setting step names.
> I drafted a quick commit that sort of propagates the user names to the web UI 
> (but only for DataSource, and still too verbose: 
> https://github.com/dhalperi/incubator-beam/commit/a2f1fb06b22a85ec738e4f2a604c9a129891916c)
> Before this change, the "ReadLines" step showed up as: "DataSource (at 
> Read(CompressedSource) 
> (org.apache.beam.runners.flink.translation.wrappers.SourceInputFormat))"
> With this change, it shows up as "DataSource (at ReadLines/Read 
> (org.apache.beam.runners.flink.translation.wrappers.SourceInputFormat))"
> which I think is closer. [I'd still like it to JUST be "ReadLines/Read" e.g.].
> Thoughts?
> [1] 
> https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/FlinkBatchTransformTranslators.java#L129
> [2] 
> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/TransformHierarchy.java#L252



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-1107) Display user names for steps in the Flink Web UI

2016-12-07 Thread Daniel Halperin (JIRA)
Daniel Halperin created BEAM-1107:
-

 Summary: Display user names for steps in the Flink Web UI
 Key: BEAM-1107
 URL: https://issues.apache.org/jira/browse/BEAM-1107
 Project: Beam
  Issue Type: Improvement
  Components: runner-flink
Reporter: Daniel Halperin
Assignee: Aljoscha Krettek


[copying in-person / email discussion at Strata Singapore to JIRA]


The FlinkBatchTransformTranslators use transform.getName() [1] -- this is the 
"SDK name" for the transform.

The "user name" for the transform is not available here, it is in fact on the 
TransformHierarchy.Node as node.getFullName() [2].

getFullName() is used some in Flink, but not when setting step names.

I drafted a quick commit that sort of propagates the user names to the web UI 
(but only for DataSource, and still too verbose: 
https://github.com/dhalperi/incubator-beam/commit/a2f1fb06b22a85ec738e4f2a604c9a129891916c)

Before this change, the "ReadLines" step showed up as: "DataSource (at 
Read(CompressedSource) 
(org.apache.beam.runners.flink.translation.wrappers.SourceInputFormat))"

With this change, it shows up as "DataSource (at ReadLines/Read 
(org.apache.beam.runners.flink.translation.wrappers.SourceInputFormat))"

which I think is closer. [I'd still like it to JUST be "ReadLines/Read" e.g.].

Thoughts?


[1] 
https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/FlinkBatchTransformTranslators.java#L129
[2] 
https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/TransformHierarchy.java#L252



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-1104) WordCount: Metrics error in the DirectRunner

2016-12-07 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-1104:
--
Assignee: Thomas Groh  (was: Davor Bonaci)

> WordCount: Metrics error in the DirectRunner
> 
>
> Key: BEAM-1104
> URL: https://issues.apache.org/jira/browse/BEAM-1104
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Reporter: Daniel Halperin
>Assignee: Thomas Groh
>
> I'm following the Beam quickstart to analyze the pom.xml for the examples 
> archetype in the DirectRunner:
> Generate the project:
> {code}
> mvn archetype:generate \
>   
> -DarchetypeRepository=https://repository.apache.org/content/groups/snapshots 
> \  
>   -DarchetypeGroupId=org.apache.beam \
>   -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
>   -DarchetypeVersion=LATEST \
>   -DgroupId=org.example \
>   -DartifactId=word-count-beam \
>   -Dversion="0.1" \
>   -Dpackage=org.apache.beam.examples \
>   -DinteractiveMode=false
> {code}
> Count words in the pom.xml:
> {code}
> mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
>  -Dexec.args="--inputFile=pom.xml --output=direct/counts" -Pdirect-runner
> {code}
> The logs:
> {code}
> INFO] --- exec-maven-plugin:1.4.0:java (default-cli) @ word-count-beam ---
> Dec 07, 2016 9:42:03 PM org.apache.beam.sdk.io.FileBasedSource 
> expandFilePattern
> INFO: Matched 1 files for pattern pom.xml
> Dec 07, 2016 9:42:03 PM org.apache.beam.sdk.metrics.MetricsEnvironment 
> getCurrentContainer
> SEVERE: Unable to update metrics on the current thread. Most likely caused by 
> using metrics outside the managed work-execution thread.
> Dec 07, 2016 9:42:03 PM org.apache.beam.sdk.io.Write$Bound$1 processElement
> INFO: Initializing write operation 
> org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@26bbd1cf
> Dec 07, 2016 9:42:04 PM org.apache.beam.sdk.io.Write$Bound$WriteBundles 
> processElement
> INFO: Opening writer for write operation 
> org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@19371061
> Dec 07, 2016 9:42:04 PM org.apache.beam.sdk.io.Write$Bound$WriteBundles 
> processElement
> INFO: Opening writer for write operation 
> org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@19371061
> Dec 07, 2016 9:42:04 PM org.apache.beam.sdk.io.Write$Bound$WriteBundles 
> processElement
> INFO: Opening writer for write operation 
> org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@19371061
> Dec 07, 2016 9:42:04 PM org.apache.beam.sdk.io.Write$Bound$WriteBundles 
> processElement
> INFO: Opening writer for write operation 
> org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@19371061
> Dec 07, 2016 9:42:04 PM org.apache.beam.sdk.io.Write$Bound$2 processElement
> INFO: Finalizing write operation 
> org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@3701012a.
> {code}
> Presumably, this {{SEVERE}} warning is indicative of a bug (or should be 
> masked).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-1104) WordCount: Metrics error in the DirectRunner

2016-12-07 Thread Daniel Halperin (JIRA)
Daniel Halperin created BEAM-1104:
-

 Summary: WordCount: Metrics error in the DirectRunner
 Key: BEAM-1104
 URL: https://issues.apache.org/jira/browse/BEAM-1104
 Project: Beam
  Issue Type: Bug
  Components: runner-direct
Reporter: Daniel Halperin
Assignee: Davor Bonaci


I'm following the Beam quickstart to analyze the pom.xml for the examples 
archetype in the DirectRunner:

Generate the project:

{code}
mvn archetype:generate \
  
-DarchetypeRepository=https://repository.apache.org/content/groups/snapshots \  

  -DarchetypeGroupId=org.apache.beam \
  -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
  -DarchetypeVersion=LATEST \
  -DgroupId=org.example \
  -DartifactId=word-count-beam \
  -Dversion="0.1" \
  -Dpackage=org.apache.beam.examples \
  -DinteractiveMode=false
{code}

Count words in the pom.xml:

{code}
mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
 -Dexec.args="--inputFile=pom.xml --output=direct/counts" -Pdirect-runner
{code}

The logs:

{code}
INFO] --- exec-maven-plugin:1.4.0:java (default-cli) @ word-count-beam ---
Dec 07, 2016 9:42:03 PM org.apache.beam.sdk.io.FileBasedSource expandFilePattern
INFO: Matched 1 files for pattern pom.xml
Dec 07, 2016 9:42:03 PM org.apache.beam.sdk.metrics.MetricsEnvironment 
getCurrentContainer
SEVERE: Unable to update metrics on the current thread. Most likely caused by 
using metrics outside the managed work-execution thread.
Dec 07, 2016 9:42:03 PM org.apache.beam.sdk.io.Write$Bound$1 processElement
INFO: Initializing write operation 
org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@26bbd1cf
Dec 07, 2016 9:42:04 PM org.apache.beam.sdk.io.Write$Bound$WriteBundles 
processElement
INFO: Opening writer for write operation 
org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@19371061
Dec 07, 2016 9:42:04 PM org.apache.beam.sdk.io.Write$Bound$WriteBundles 
processElement
INFO: Opening writer for write operation 
org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@19371061
Dec 07, 2016 9:42:04 PM org.apache.beam.sdk.io.Write$Bound$WriteBundles 
processElement
INFO: Opening writer for write operation 
org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@19371061
Dec 07, 2016 9:42:04 PM org.apache.beam.sdk.io.Write$Bound$WriteBundles 
processElement
INFO: Opening writer for write operation 
org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@19371061
Dec 07, 2016 9:42:04 PM org.apache.beam.sdk.io.Write$Bound$2 processElement
INFO: Finalizing write operation 
org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@3701012a.
{code}

Presumably, this {{SEVERE}} warning is indicative of a bug (or should be 
masked).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-812) Shade guava in beam-sdks-java-io-google-cloud-platform

2016-12-06 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15727720#comment-15727720
 ] 

Daniel Halperin commented on BEAM-812:
--

This is not likely to be fixed in the 0.4.0 release because Bigtable has not 
yet pushed a release of their jar with the API surface cleanup integrated.

> Shade guava in beam-sdks-java-io-google-cloud-platform
> --
>
> Key: BEAM-812
> URL: https://issues.apache.org/jira/browse/BEAM-812
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
>
> Looking at 0.3.0-incubating RC1, we are not properly shading Guava.
> https://repository.apache.org/content/repositories/staging/org/apache/beam/beam-sdks-java-io-google-cloud-platform/0.3.0-incubating/beam-sdks-java-io-google-cloud-platform-0.3.0-incubating.pom
> has 
> {code}
> 
>   com.google.guava
>   guava
>  {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-812) Shade guava in beam-sdks-java-io-google-cloud-platform

2016-12-06 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-812:
-
Fix Version/s: (was: 0.4.0-incubating)

> Shade guava in beam-sdks-java-io-google-cloud-platform
> --
>
> Key: BEAM-812
> URL: https://issues.apache.org/jira/browse/BEAM-812
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
>
> Looking at 0.3.0-incubating RC1, we are not properly shading Guava.
> https://repository.apache.org/content/repositories/staging/org/apache/beam/beam-sdks-java-io-google-cloud-platform/0.3.0-incubating/beam-sdks-java-io-google-cloud-platform-0.3.0-incubating.pom
> has 
> {code}
> 
>   com.google.guava
>   guava
>  {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-1102) Flink Batch Runner does not populate aggregator values

2016-12-06 Thread Daniel Halperin (JIRA)
Daniel Halperin created BEAM-1102:
-

 Summary: Flink Batch Runner does not populate aggregator values
 Key: BEAM-1102
 URL: https://issues.apache.org/jira/browse/BEAM-1102
 Project: Beam
  Issue Type: Bug
  Components: runner-flink
Affects Versions: 0.3.0-incubating
Reporter: Daniel Halperin
Assignee: Aljoscha Krettek
Priority: Minor


Running the quickstart gives 0 for emptyLines.

Running with {{--streaming=true}} gives the correct value (for my input file, 
the default examples archetype {{pom.xml}}, the true value is 27 at the time of 
writing).

Streaming output:

{code}
Dec 07, 2016 1:00:53 PM org.apache.beam.runners.flink.FlinkRunner run
INFO: Final aggregator values:
Dec 07, 2016 1:00:53 PM org.apache.beam.runners.flink.FlinkRunner run
INFO: DroppedDueToLateness : 0
Dec 07, 2016 1:00:53 PM org.apache.beam.runners.flink.FlinkRunner run
INFO: emptyLines : 27
Dec 07, 2016 1:00:53 PM org.apache.beam.runners.flink.FlinkRunner run
INFO: DroppedDueToClosedWindow : 0
{code}

Non-streaming output:

{code}
Dec 07, 2016 12:58:07 PM org.apache.beam.runners.flink.FlinkRunner run
INFO: Final aggregator values:
Dec 07, 2016 12:58:07 PM org.apache.beam.runners.flink.FlinkRunner run
INFO: emptyLines : 0
{code}

(Note also that the lateness etc. aggregators are missing entirely, may be 
expected).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-498) Make DoFnWithContext the new DoFn

2016-12-02 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15717483#comment-15717483
 ] 

Daniel Halperin commented on BEAM-498:
--

I think we have simply deleted the DatastoreWordCount example, so should just 
delete that link from the README.md.

> Make DoFnWithContext the new DoFn
> -
>
> Key: BEAM-498
> URL: https://issues.apache.org/jira/browse/BEAM-498
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>  Labels: backward-incompatible
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-1078) Modifying the links in JavaDocs to point to the Beam github repo rather than GCP

2016-12-02 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-1078:
--
Issue Type: Improvement  (was: Bug)

> Modifying the links in JavaDocs to point to the Beam github repo rather than 
> GCP 
> -
>
> Key: BEAM-1078
> URL: https://issues.apache.org/jira/browse/BEAM-1078
> Project: Beam
>  Issue Type: Improvement
>Reporter: Neelesh Srinivas Salian
>Assignee: Neelesh Srinivas Salian
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-1078) Modifying the links in JavaDocs to point to the Beam github repo rather than GCP

2016-12-02 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-1078:
--
Fix Version/s: Not applicable

> Modifying the links in JavaDocs to point to the Beam github repo rather than 
> GCP 
> -
>
> Key: BEAM-1078
> URL: https://issues.apache.org/jira/browse/BEAM-1078
> Project: Beam
>  Issue Type: Improvement
>Reporter: Neelesh Srinivas Salian
>Assignee: Neelesh Srinivas Salian
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1065) FileBasedSource: replace SeekableByteChannel with open(spec, startingPosition)

2016-12-02 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15717184#comment-15717184
 ] 

Daniel Halperin commented on BEAM-1065:
---

More impressions based on your list, and in reverse order :).

3. Yes, giving the source implementation the ability to control the starting 
office is a clear win, and can save a seek -- love it! However, this can (and 
should) be done independent of any changes to seekability.

2. Two concerns:
A) I am not certain that a file system that cannot provide a seek can 
provide an open-at-a-nonzero-offset. So I'm not so convinced this is a trivial 
change.
B) Just because the stream is opened at a specific place does not mean the 
user would not want to seek. For example, consider a very efficient reader for 
PDF files. They have an index at the beginning, so you know exactly where every 
page starts. Maybe the "open offset" would be the start of the file, and then 
we would immediate seek to the first page in range. So I think seekability is 
useful.
  Considering the combination of A/B, I would actually be supportive of the 
other direction -- just change the return value of {{open}} to 
{{SeekableByteChannel}} -- requiring that seek be supported. I'm not sure we 
have any examples of filesystems that don't support seeking in practice.

1. This is true, but (see below) I think that {{SeekableByteChannel}} is still 
important.

> FileBasedSource: replace SeekableByteChannel with open(spec, startingPosition)
> --
>
> Key: BEAM-1065
> URL: https://issues.apache.org/jira/browse/BEAM-1065
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Pei He
>Assignee: Pei He
>
> FileBasedReader should be able to open the file with the 
> Source.getStartOffset(), and then read forward to find the first input 
> element.
> The benefits are:
> 1. It is easier to implement a ReadableByteChannel.
> 2. Dynamically splitting won't require file systems to support seeking.
> 3. Doesn't need to seek to position twice, which is what current API does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-511) Fill in the contribute/technical-vision section of the website

2016-12-02 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-511.
--
   Resolution: Fixed
Fix Version/s: Not applicable

Page has been deleted, does not need populating.

> Fill in the contribute/technical-vision section of the website
> --
>
> Key: BEAM-511
> URL: https://issues.apache.org/jira/browse/BEAM-511
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Frances Perry
>Assignee: Hadar Hod
> Fix For: Not applicable
>
>
> As per 
> https://docs.google.com/document/d/1-0jMv7NnYp0Ttt4voulUMwVe_qjBYeNMLm2LusYF3gQ/edit



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (BEAM-511) Fill in the contribute/technical-vision section of the website

2016-12-02 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin closed BEAM-511.


> Fill in the contribute/technical-vision section of the website
> --
>
> Key: BEAM-511
> URL: https://issues.apache.org/jira/browse/BEAM-511
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Frances Perry
>Assignee: Hadar Hod
> Fix For: Not applicable
>
>
> As per 
> https://docs.google.com/document/d/1-0jMv7NnYp0Ttt4voulUMwVe_qjBYeNMLm2LusYF3gQ/edit



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-1073) Staged websites have extra whitespace around links

2016-12-02 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-1073.
---
Resolution: Won't Fix

Basically, Jason has looked into this and it's probably not worth fixing.

> Staged websites have extra whitespace around links
> --
>
> Key: BEAM-1073
> URL: https://issues.apache.org/jira/browse/BEAM-1073
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Affects Versions: Not applicable
>Reporter: Daniel Halperin
>Assignee: Jason Kuster
>Priority: Minor
> Fix For: Not applicable
>
>
> cc [~davor] [~frances]
> e.g., 
> http://apache-beam-website-pull-requests.storage.googleapis.com/97/documentation/runners/flink/index.html
> has this source when staged:
> {code}
> 
>  
>   Programming Guide
>  
> 
> {code}
> but this source:
> {code}
> Programming Guide
> {code}
> when live. I assume this comes from the rewriting tool we use to make 
> directories work.
> The former (space between end of {{Guide}} and {{}}) is what I assume 
> causes the visual effects.
> NBD, but I spent a while figuring out why someone's PR caused this to happen 
> and then saw it disappear after merging and going live.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-194) Create a walkthrough of Beam examples in mobile gaming domain

2016-12-02 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-194.
--
   Resolution: Fixed
Fix Version/s: Not applicable

Closing per request from Hadar. Looks like open PR is an SDK change, not a 
website change, so this seems to make sense.

> Create a walkthrough of Beam examples in mobile gaming domain
> -
>
> Key: BEAM-194
> URL: https://issues.apache.org/jira/browse/BEAM-194
> Project: Beam
>  Issue Type: Task
>  Components: website
>Reporter: Devin Donnelly
>Assignee: Hadar Hod
> Fix For: Not applicable
>
>
> The Beam SDKs provide a series of example pipelines in the mobile gaming 
> domain. The Dataflow documentation contains an detailed walkthrough of these 
> examples, explaining the use case, pipeline design, and some of the code.
> Port these examples to the Beam website for Beam users.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-906) Fill in the get-started/downloads portion of the website

2016-12-02 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-906.
--
   Resolution: Fixed
Fix Version/s: Not applicable

Done as part of https://github.com/apache/incubator-beam-site/pull/85

> Fill in the get-started/downloads portion of the website
> 
>
> Key: BEAM-906
> URL: https://issues.apache.org/jira/browse/BEAM-906
> Project: Beam
>  Issue Type: Task
>  Components: website
>Reporter: Hadar Hod
>Assignee: Hadar Hod
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-1079) Validation speed for really large number of files

2016-12-02 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-1079.
---
   Resolution: Fixed
Fix Version/s: Not applicable

> Validation speed for really large number of files
> -
>
> Key: BEAM-1079
> URL: https://issues.apache.org/jira/browse/BEAM-1079
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Sourabh Bajaj
>Assignee: Sourabh Bajaj
>Priority: Trivial
> Fix For: Not applicable
>
>
> Filebased source during validation at pipeline creation does a full glob when 
> it only needs to validate atleast one file. So create a limit parameter to 
> make this faster. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-914) Update Beam Overview doc (landing page)

2016-12-02 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-914.
--
   Resolution: Fixed
Fix Version/s: Not applicable

https://github.com/apache/incubator-beam-site/pull/93

> Update Beam Overview doc (landing page)
> ---
>
> Key: BEAM-914
> URL: https://issues.apache.org/jira/browse/BEAM-914
> Project: Beam
>  Issue Type: Task
>  Components: website
>Reporter: Hadar Hod
>Assignee: Hadar Hod
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-1060) Make DoFnTester use new DoFn

2016-12-02 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-1060.
---
   Resolution: Fixed
Fix Version/s: 0.4.0-incubating

> Make DoFnTester use new DoFn
> 
>
> Key: BEAM-1060
> URL: https://issues.apache.org/jira/browse/BEAM-1060
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Eugene Kirpichov
>Assignee: Eugene Kirpichov
> Fix For: 0.4.0-incubating
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-505) Fill in the documentation/runners/direct portion of the website

2016-12-02 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-505.
--
   Resolution: Fixed
Fix Version/s: Not applicable

> Fill in the documentation/runners/direct portion of the website
> ---
>
> Key: BEAM-505
> URL: https://issues.apache.org/jira/browse/BEAM-505
> Project: Beam
>  Issue Type: Task
>  Components: website
>Reporter: Frances Perry
>Assignee: Melissa Pashniak
> Fix For: Not applicable
>
>
> As per 
> https://docs.google.com/document/d/1-0jMv7NnYp0Ttt4voulUMwVe_qjBYeNMLm2LusYF3gQ/edit.
> Should be a landing page for the Direct runner



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-904) Dataflow setup instructions

2016-12-02 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-904.
--
   Resolution: Fixed
Fix Version/s: Not applicable

> Dataflow setup instructions
> ---
>
> Key: BEAM-904
> URL: https://issues.apache.org/jira/browse/BEAM-904
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Frances Perry
>Assignee: Melissa Pashniak
> Fix For: Not applicable
>
>
> As you are working on the Dataflow Runner page, please include the getting 
> started instructions, as I'm linking there from the quickstart. Thanks!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-508) Fill in the documentation/runners/dataflow portion of the website

2016-12-02 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-508.
--
   Resolution: Fixed
Fix Version/s: Not applicable

> Fill in the documentation/runners/dataflow portion of the website
> -
>
> Key: BEAM-508
> URL: https://issues.apache.org/jira/browse/BEAM-508
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Frances Perry
>Assignee: Melissa Pashniak
> Fix For: Not applicable
>
>
> As per 
> https://docs.google.com/document/d/1-0jMv7NnYp0Ttt4voulUMwVe_qjBYeNMLm2LusYF3gQ/edit.
> Should be a landing page for Dataflow-runner-specific content



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-505) Fill in the documentation/runners/direct portion of the website

2016-12-02 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-505:
-
Issue Type: Task  (was: Bug)

> Fill in the documentation/runners/direct portion of the website
> ---
>
> Key: BEAM-505
> URL: https://issues.apache.org/jira/browse/BEAM-505
> Project: Beam
>  Issue Type: Task
>  Components: website
>Reporter: Frances Perry
>Assignee: Melissa Pashniak
> Fix For: Not applicable
>
>
> As per 
> https://docs.google.com/document/d/1-0jMv7NnYp0Ttt4voulUMwVe_qjBYeNMLm2LusYF3gQ/edit.
> Should be a landing page for the Direct runner



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-277) Add Transforms Section

2016-12-02 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-277.
--
   Resolution: Fixed
Fix Version/s: Not applicable

> Add Transforms Section
> --
>
> Key: BEAM-277
> URL: https://issues.apache.org/jira/browse/BEAM-277
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Devin Donnelly
>Assignee: Melissa Pashniak
> Fix For: Not applicable
>
>
> Document general transforms usage and ParDo usage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-239) Rename RemoveDuplicates to Distinct

2016-12-02 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-239.
--
   Resolution: Fixed
Fix Version/s: 0.4.0-incubating

> Rename RemoveDuplicates to Distinct
> ---
>
> Key: BEAM-239
> URL: https://issues.apache.org/jira/browse/BEAM-239
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Jesse Anderson
>Assignee: Neelesh Srinivas Salian
>Priority: Minor
>  Labels: backward-incompatible, newbie, starter
> Fix For: 0.4.0-incubating
>
>
> I had a really tough time finding this transform in the docs. I suggest 
> changing this class' name to Distinct instead of RemoveDuplicates. At the 
> very least, the JavaDoc for RemoveDuplicates should have the word distinct in 
> it to make this more findable/searchable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-879) Renaming DeDupExample to DistinctExample

2016-12-02 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-879.
--
   Resolution: Fixed
Fix Version/s: 0.4.0-incubating

> Renaming DeDupExample to DistinctExample
> 
>
> Key: BEAM-879
> URL: https://issues.apache.org/jira/browse/BEAM-879
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Neelesh Srinivas Salian
>Assignee: Neelesh Srinivas Salian
>Priority: Trivial
> Fix For: 0.4.0-incubating
>
>
> In BEAM-239, we renamed DeDupExampleTest to DistinctExampleTest.
> Need to modify DeDupExample to DistinctExample as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-641) Need to test the generated archetypes projects

2016-12-02 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-641:
-
Priority: Major  (was: Critical)

> Need to test the generated archetypes projects
> --
>
> Key: BEAM-641
> URL: https://issues.apache.org/jira/browse/BEAM-641
> Project: Beam
>  Issue Type: Test
>  Components: testing
>Reporter: Pei He
>Assignee: Pei He
> Fix For: 0.4.0-incubating
>
>
> Travis and Jenkins pre-submits don't test building the generated archetypes 
> projects.
> Currently, changes to archetypes have to be manually verified by:
> mvn archetype:generate \
> -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
> -DarchetypeGroupId=org.apache.beam \
> -DarchetypeVersion=0.3.0-incubating-SNAPSHOT \
> -DgroupId=com.example \
> -DartifactId=first-beam \
> -Dversion="0.3.0-incubating-SNAPSHOT" \
> -DinteractiveMode=false \
> -Dpackage=org.apache.beam.examples
> and did "mvn clean install" in first-beam project.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-682) Invoker Class should be created in Thread Context Classloader

2016-12-02 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-682:
-
Priority: Major  (was: Critical)

> Invoker Class should be created in Thread Context Classloader
> -
>
> Key: BEAM-682
> URL: https://issues.apache.org/jira/browse/BEAM-682
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 0.3.0-incubating
>Reporter: Sumit Chawla
>Assignee: Sumit Chawla
>
> As of now the InvokerClass is being loaded in wrong classloader. It should be 
> loaded into Thread.currentThread.getContextClassLoader()
> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnInvokers.java#L167
> {code}
>  Class> res =
> (Class>)
> unloaded
> .load(DoFnInvokers.class.getClassLoader(), 
> ClassLoadingStrategy.Default.INJECTION)
> .getLoaded();
> {code}
> Fix 
> {code}
>  Class> res =
> (Class>)
> unloaded
> .load(Thread.currentThread().getContextClassLoader(),
> ClassLoadingStrategy.Default.INJECTION)
> .getLoaded();
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-555) Documentation in BiqQueryIO.java has awkward cut-and-paste error.

2016-12-02 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-555.
--
   Resolution: Fixed
 Assignee: Frank Yellin  (was: Neelesh Srinivas Salian)
Fix Version/s: Not applicable

Was implemented in https://github.com/apache/incubator-beam/pull/840

This PR was tagged [Beam-555] instead of [BEAM-555] which apparently does not 
trigger the JIRA tagging.

> Documentation in BiqQueryIO.java has awkward cut-and-paste error.
> -
>
> Key: BEAM-555
> URL: https://issues.apache.org/jira/browse/BEAM-555
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Frank Yellin
>Assignee: Frank Yellin
>Priority: Trivial
> Fix For: Not applicable
>
>   Original Estimate: 5m
>  Remaining Estimate: 5m
>
> Twice in the documentation, the sample code reads from 
> samples.weather_stations and called the resulting TableRow "shakespeare".
> I suspect that these lines of code were copied from a different example, and 
> then only partially modified.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-555) Documentation in BiqQueryIO.java has awkward cut-and-paste error.

2016-12-02 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-555:
-
Assignee: Neelesh Srinivas Salian  (was: Frank Yellin)

> Documentation in BiqQueryIO.java has awkward cut-and-paste error.
> -
>
> Key: BEAM-555
> URL: https://issues.apache.org/jira/browse/BEAM-555
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Frank Yellin
>Assignee: Neelesh Srinivas Salian
>Priority: Trivial
>   Original Estimate: 5m
>  Remaining Estimate: 5m
>
> Twice in the documentation, the sample code reads from 
> samples.weather_stations and called the resulting TableRow "shakespeare".
> I suspect that these lines of code were copied from a different example, and 
> then only partially modified.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-551) Support Dynamic PipelineOptions

2016-12-02 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-551:
-
Assignee: Sam McVeety  (was: Frances Perry)

> Support Dynamic PipelineOptions
> ---
>
> Key: BEAM-551
> URL: https://issues.apache.org/jira/browse/BEAM-551
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Sam McVeety
>Assignee: Sam McVeety
>Priority: Minor
>
> During the graph construction phase, the given SDK generates an initial
> execution graph for the program.  At execution time, this graph is
> executed, either locally or by a service.  Currently, Beam only supports
> parameterization at graph construction time.  Both Flink and Spark supply
> functionality that allows a pre-compiled job to be run without SDK
> interaction with updated runtime parameters.
> In its current incarnation, Dataflow can read values of PipelineOptions at
> job submission time, but this requires the presence of an SDK to properly
> encode these values into the job.  We would like to build a common layer
> into the Beam model so that these dynamic options can be properly provided
> to jobs.
> Please see
> https://docs.google.com/document/d/1I-iIgWDYasb7ZmXbGBHdok_IK1r1YAJ90JG5Fz0_28o/edit
> for the high-level model, and
> https://docs.google.com/document/d/17I7HeNQmiIfOJi0aI70tgGMMkOSgGi8ZUH-MOnFatZ8/edit
> for
> the specific API proposal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-1031) Starter archetype uses OldDoFn

2016-12-02 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-1031.
---
   Resolution: Fixed
Fix Version/s: Not applicable

> Starter archetype uses OldDoFn
> --
>
> Key: BEAM-1031
> URL: https://issues.apache.org/jira/browse/BEAM-1031
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Minor
> Fix For: Not applicable
>
>
> The starter archetype should give a positive first impression of Beam. The 
> starter pipeline uses OldDoFn instead of the new DoFn. We should convert it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-939) New credentials code broke Dataflow runner

2016-12-02 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-939.
--
Resolution: Fixed

> New credentials code broke Dataflow runner
> --
>
> Key: BEAM-939
> URL: https://issues.apache.org/jira/browse/BEAM-939
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-gcp
>Affects Versions: Not applicable
>Reporter: Daniel Halperin
>Assignee: Luke Cwik
>Priority: Minor
> Fix For: 0.4.0-incubating
>
>
> https://builds.apache.org/view/Beam/job/beam_PostCommit_MavenVerify/1753/
> {code}
> java.lang.NoSuchMethodError: 
> com.google.auth.oauth2.GoogleCredentials.getApplicationDefault(Lcom/google/api/client/http/HttpTransport;)Lcom/google/auth/oauth2/GoogleCredentials;
>   at 
> com.google.cloud.bigtable.config.CredentialFactory.getApplicationDefaultCredential(CredentialFactory.java:207)
>   at 
> com.google.cloud.bigtable.config.CredentialFactory.getCredentials(CredentialFactory.java:112)
>   at 
> com.google.cloud.bigtable.grpc.io.CredentialInterceptorCache.getCredentialsInterceptor(CredentialInterceptorCache.java:94)
>   at 
> com.google.cloud.bigtable.grpc.BigtableSession.(BigtableSession.java:272)
>   at 
> org.apache.beam.sdk.io.gcp.bigtable.BigtableServiceImpl.tableExists(BigtableServiceImpl.java:81)
>   at 
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$Read.validate(BigtableIO.java:296)
>   at 
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$Read.validate(BigtableIO.java:185)
>   at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:399)
>   at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:307)
>   at org.apache.beam.sdk.values.PBegin.apply(PBegin.java:47)
>   at org.apache.beam.sdk.Pipeline.apply(Pipeline.java:158)
>   at 
> org.apache.beam.sdk.io.gcp.bigtable.BigtableReadIT.testE2EBigtableRead(BigtableReadIT.java:53)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at 
> org.apache.maven.surefire.junitcore.pc.Scheduler$1.run(Scheduler.java:393)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-857) Add documentation for configuring Checkstyle-IDEA

2016-12-02 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-857.
--
   Resolution: Fixed
Fix Version/s: Not applicable

> Add documentation for configuring Checkstyle-IDEA
> -
>
> Key: BEAM-857
> URL: https://issues.apache.org/jira/browse/BEAM-857
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Scott Wegner
>Priority: Minor
> Fix For: Not applicable
>
>
> Because the suppressions file is specified by property, it is not a valid 
> {{checkstyle.xml}} for tools that do not set the property, which includes 
> IntelliJ's plugin.
> The UI does prompt for a value for the property, but ideally the checkstyle 
> file would work standalone.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-857) Add documentation for configuring Checkstyle-IDEA

2016-12-02 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-857:
-
Summary: Add documentation for configuring Checkstyle-IDEA  (was: Beam's 
checkstyle.xml broken for Checkstyle-IDEA)

> Add documentation for configuring Checkstyle-IDEA
> -
>
> Key: BEAM-857
> URL: https://issues.apache.org/jira/browse/BEAM-857
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Scott Wegner
>Priority: Minor
>
> Because the suppressions file is specified by property, it is not a valid 
> {{checkstyle.xml}} for tools that do not set the property, which includes 
> IntelliJ's plugin.
> The UI does prompt for a value for the property, but ideally the checkstyle 
> file would work standalone.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-857) Beam's checkstyle.xml broken for Checkstyle-IDEA

2016-12-02 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-857:
-
Issue Type: Improvement  (was: Bug)

> Beam's checkstyle.xml broken for Checkstyle-IDEA
> 
>
> Key: BEAM-857
> URL: https://issues.apache.org/jira/browse/BEAM-857
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Scott Wegner
>Priority: Minor
>
> Because the suppressions file is specified by property, it is not a valid 
> {{checkstyle.xml}} for tools that do not set the property, which includes 
> IntelliJ's plugin.
> The UI does prompt for a value for the property, but ideally the checkstyle 
> file would work standalone.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-351) Add DisplayData to KafkaIO

2016-12-02 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-351:
-
Issue Type: Improvement  (was: Bug)

> Add DisplayData to KafkaIO
> --
>
> Key: BEAM-351
> URL: https://issues.apache.org/jira/browse/BEAM-351
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Reporter: Ben Chambers
>Assignee: James Malone
>Priority: Minor
>  Labels: starter
>
> Any interesting parameters of the sources/sinks should be exposed as display 
> data. See any of the sources/sinks that already export this (BigQuery, 
> PubSub, etc.) for examples. Also look at the DisplayData builder and 
> HasDisplayData interface for how to wire these up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-351) Add DisplayData to KafkaIO

2016-12-02 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-351:
-
Assignee: (was: James Malone)

> Add DisplayData to KafkaIO
> --
>
> Key: BEAM-351
> URL: https://issues.apache.org/jira/browse/BEAM-351
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Reporter: Ben Chambers
>Priority: Minor
>  Labels: starter
>
> Any interesting parameters of the sources/sinks should be exposed as display 
> data. See any of the sources/sinks that already export this (BigQuery, 
> PubSub, etc.) for examples. Also look at the DisplayData builder and 
> HasDisplayData interface for how to wire these up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-246) Streamline default Maven profile for user efficiency

2016-12-02 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-246:
-
Summary: Streamline default Maven profile for user efficiency  (was: Do not 
build source jar, javadoc, in default Maven profile.)

> Streamline default Maven profile for user efficiency
> 
>
> Key: BEAM-246
> URL: https://issues.apache.org/jira/browse/BEAM-246
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Daniel Halperin
>Priority: Minor
> Fix For: 0.4.0-incubating
>
>
> We should optimize the default maven profile for interactive human use. Right 
> now it does a lot of things that waste time & CPU and do not speed 
> development, like building the source jar and javadoc.
> We certainly still want to build all of these things in release tests, 
> probably in post-commit tests, and case-by-case in pre-commit tests. But for 
> these we can set up a profile to activate them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-246) Do not build source jar, javadoc, in default Maven profile.

2016-12-02 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15716160#comment-15716160
 ] 

Daniel Halperin commented on BEAM-246:
--

Implemented over a few weeks in 
https://github.com/apache/incubator-beam/pull/1295 
https://github.com/apache/incubator-beam/pull/1292 
https://github.com/apache/incubator-beam/pull/1290 
https://github.com/apache/incubator-beam/pull/1285 
https://github.com/apache/incubator-beam/pull/1239

> Do not build source jar, javadoc, in default Maven profile.
> ---
>
> Key: BEAM-246
> URL: https://issues.apache.org/jira/browse/BEAM-246
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Priority: Minor
> Fix For: 0.4.0-incubating
>
>
> We should optimize the default maven profile for interactive human use. Right 
> now it does a lot of things that waste time & CPU and do not speed 
> development, like building the source jar and javadoc.
> We certainly still want to build all of these things in release tests, 
> probably in post-commit tests, and case-by-case in pre-commit tests. But for 
> these we can set up a profile to activate them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-246) Do not build source jar, javadoc, in default Maven profile.

2016-12-02 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-246.
--
   Resolution: Fixed
 Assignee: Daniel Halperin
Fix Version/s: 0.4.0-incubating

> Do not build source jar, javadoc, in default Maven profile.
> ---
>
> Key: BEAM-246
> URL: https://issues.apache.org/jira/browse/BEAM-246
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Daniel Halperin
>Priority: Minor
> Fix For: 0.4.0-incubating
>
>
> We should optimize the default maven profile for interactive human use. Right 
> now it does a lot of things that waste time & CPU and do not speed 
> development, like building the source jar and javadoc.
> We certainly still want to build all of these things in release tests, 
> probably in post-commit tests, and case-by-case in pre-commit tests. But for 
> these we can set up a profile to activate them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-480) Use BigQueryServices abstraction in BigQueryIO

2016-12-02 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-480.
--
   Resolution: Fixed
Fix Version/s: Not applicable

> Use BigQueryServices abstraction in BigQueryIO
> --
>
> Key: BEAM-480
> URL: https://issues.apache.org/jira/browse/BEAM-480
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Reporter: Pei He
>Assignee: Pei He
>Priority: Minor
> Fix For: Not applicable
>
>
> There are legacy code that sent request to BigQuery directly.
> They should be moved to use BigQueryServices.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-580) Add a Datastore delete example

2016-12-02 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-580.
--
   Resolution: Duplicate
Fix Version/s: Not applicable

> Add a Datastore delete example
> --
>
> Key: BEAM-580
> URL: https://issues.apache.org/jira/browse/BEAM-580
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Reporter: Vikas Kedigehalli
>Assignee: Vikas Kedigehalli
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-989) GcsUtil batch remove() and copy() need tests.

2016-12-02 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15716143#comment-15716143
 ] 

Daniel Halperin commented on BEAM-989:
--

There was a pretty long discussion on 
https://github.com/apache/incubator-beam/pull/826 (and Luke and I did some pair 
code review to try to figure this out) and we could not come up with any decent 
ways to test anything involving {{BatchRequest}} because of the way the test 
classes are structured.

Do you have specific improvements to make here?

Otherwise, suggest closing as infeasible.

> GcsUtil batch remove() and copy() need tests.
> -
>
> Key: BEAM-989
> URL: https://issues.apache.org/jira/browse/BEAM-989
> Project: Beam
>  Issue Type: Test
>  Components: sdk-java-gcp
>Reporter: Pei He
>Assignee: Pei He
>
> remove(), copy(), and their callbacks are not tested with executeBatches().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (BEAM-989) GcsUtil batch remove() and copy() need tests.

2016-12-02 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15716143#comment-15716143
 ] 

Daniel Halperin edited comment on BEAM-989 at 12/2/16 8:08 PM:
---

There was a pretty long discussion on 
https://github.com/apache/incubator-beam/pull/826 (and Luke and I did some pair 
code review to try to figure this out) and we could not come up with any decent 
ways to test anything involving {{BatchRequest}} because of the way the client 
classes are structured.

Do you have specific improvements to make here?

Otherwise, suggest closing as infeasible.


was (Author: dhalp...@google.com):
There was a pretty long discussion on 
https://github.com/apache/incubator-beam/pull/826 (and Luke and I did some pair 
code review to try to figure this out) and we could not come up with any decent 
ways to test anything involving {{BatchRequest}} because of the way the test 
classes are structured.

Do you have specific improvements to make here?

Otherwise, suggest closing as infeasible.

> GcsUtil batch remove() and copy() need tests.
> -
>
> Key: BEAM-989
> URL: https://issues.apache.org/jira/browse/BEAM-989
> Project: Beam
>  Issue Type: Test
>  Components: sdk-java-gcp
>Reporter: Pei He
>Assignee: Pei He
>
> remove(), copy(), and their callbacks are not tested with executeBatches().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1067) apex.examples.WordCountTest.testWordCountExample may be flaky

2016-12-02 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15716134#comment-15716134
 ] 

Daniel Halperin commented on BEAM-1067:
---

[~thw] can you take a look?

> apex.examples.WordCountTest.testWordCountExample may be flaky
> -
>
> Key: BEAM-1067
> URL: https://issues.apache.org/jira/browse/BEAM-1067
> Project: Beam
>  Issue Type: Bug
>  Components: runner-apex
>Reporter: Stas Levin
>Assignee: Thomas Weise
>
> Seems that 
> {{org.apache.beam.runners.apex.examples.WordCountTest.testWordCountExample}} 
> is flaky.
> For example, 
> [this|https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/5408/org.apache.beam$beam-runners-apex/testReport/org.apache.beam.runners.apex.examples/WordCountTest/testWordCountExample/
>  ] run failed although no changes were made in {{runner-apex}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-1067) apex.examples.WordCountTest.testWordCountExample may be flaky

2016-12-02 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-1067:
--
Assignee: Thomas Weise

> apex.examples.WordCountTest.testWordCountExample may be flaky
> -
>
> Key: BEAM-1067
> URL: https://issues.apache.org/jira/browse/BEAM-1067
> Project: Beam
>  Issue Type: Bug
>  Components: runner-apex
>Reporter: Stas Levin
>Assignee: Thomas Weise
>
> Seems that 
> {{org.apache.beam.runners.apex.examples.WordCountTest.testWordCountExample}} 
> is flaky.
> For example, 
> [this|https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/5408/org.apache.beam$beam-runners-apex/testReport/org.apache.beam.runners.apex.examples/WordCountTest/testWordCountExample/
>  ] run failed although no changes were made in {{runner-apex}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-223) KafkaIO: don't use SerializableCoder

2016-12-02 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-223.
--
   Resolution: Fixed
Fix Version/s: 0.4.0-incubating

> KafkaIO: don't use SerializableCoder
> 
>
> Key: BEAM-223
> URL: https://issues.apache.org/jira/browse/BEAM-223
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Reporter: Daniel Halperin
>Assignee: Raghu Angadi
> Fix For: 0.4.0-incubating
>
>
> Reuven says:
> {quote}
> I noticed that we're using SerializableCoder for the checkpoint mark in 
> KafkaIO. This is generally highly discouraged in streaming pipelines. 
> Partially because it's inefficient, but more importantly because Java 
> serialization is not guaranteed to be stable. If a user updates their 
> pipeline, the new pipeline may not be able to decode the existing checkpoint 
> marks; this will either cause exceptions to be thrown, or data loss.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-747) Text checksum verifier is not resilient to eventually consistent filesystems

2016-12-02 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-747.
--
   Resolution: Fixed
Fix Version/s: 0.4.0-incubating

> Text checksum verifier is not resilient to eventually consistent filesystems
> 
>
> Key: BEAM-747
> URL: https://issues.apache.org/jira/browse/BEAM-747
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Affects Versions: Not applicable
>Reporter: Daniel Halperin
>Assignee: Mark Liu
> Fix For: 0.4.0-incubating
>
>
> Example 1: 
> https://builds.apache.org/job/beam_PreCommit_MavenVerify/3934/org.apache.beam$beam-examples-java/console
> Here it looks like we need to retry listing files, at least a little bit, if 
> none are found. They did show up:
> {code}
> gsutil ls 
> gs://temp-storage-for-end-to-end-tests/WordCountIT-2016-10-13-12-37-02-467/output/results\*
> gs://temp-storage-for-end-to-end-tests/WordCountIT-2016-10-13-12-37-02-467/output/results-0-of-3
> gs://temp-storage-for-end-to-end-tests/WordCountIT-2016-10-13-12-37-02-467/output/results-1-of-3
> gs://temp-storage-for-end-to-end-tests/WordCountIT-2016-10-13-12-37-02-467/output/results-2-of-3
> {code}
> Example 2: 
> https://builds.apache.org/job/beam_PostCommit_MavenVerify/org.apache.beam$beam-examples-java/1525/testReport/junit/org.apache.beam.examples/WordCountIT/testE2EWordCount/
> Here it looks like we need to fill in the shard template if the filesystem 
> does not give us a consistent result:
> {code}
> Oct 14, 2016 12:31:16 AM org.apache.beam.sdk.testing.FileChecksumMatcher 
> readLines
> INFO: [0 of 1] Read 162 lines from file: 
> gs://temp-storage-for-end-to-end-tests/WordCountIT-2016-10-14-00-25-55-609/output/results-0-of-3
> Oct 14, 2016 12:31:16 AM org.apache.beam.sdk.testing.FileChecksumMatcher 
> readLines
> INFO: [1 of 1] Read 144 lines from file: 
> gs://temp-storage-for-end-to-end-tests/WordCountIT-2016-10-14-00-25-55-609/output/results-2-of-3
> Oct 14, 2016 12:31:16 AM org.apache.beam.sdk.testing.FileChecksumMatcher 
> matchesSafely
> INFO: Generated checksum for output data: 
> aec68948b2515e6ea35fd1ed7649c267a10a01e5
> {code}
> We missed shard 1-of-3 and hence got the wrong checksum.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-252) Make Regex Transform

2016-12-02 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-252.
--
   Resolution: Fixed
 Assignee: Jesse Anderson  (was: Kenneth Knowles)
Fix Version/s: 0.4.0-incubating

Implemented!

> Make Regex Transform
> 
>
> Key: BEAM-252
> URL: https://issues.apache.org/jira/browse/BEAM-252
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Jesse Anderson
>Assignee: Jesse Anderson
> Fix For: 0.4.0-incubating
>
>
> There needs to be an easier way to run Regular Expressions as part of a 
> transform. This will make string-based ETL much easier.
> The transform should support using the matches and find methods. The 
> transform should allow you to choose a group in the regex to output. The 
> transform should allow single strings to be output or KV's of strings.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


  1   2   3   4   5   6   7   8   9   10   >