[jira] [Closed] (BEAM-2321) gRPC configuration failure using DataflowRunner and Bigtable

2017-05-24 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin closed BEAM-2321.
-
   Resolution: Not A Problem
Fix Version/s: Not applicable

> gRPC configuration failure using DataflowRunner and Bigtable
> 
>
> Key: BEAM-2321
> URL: https://issues.apache.org/jira/browse/BEAM-2321
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Affects Versions: 2.0.0
>Reporter: Nigel Kilmer
>Assignee: Daniel Halperin
> Fix For: Not applicable
>
>
> I'm attempting to run a pipeline that uses the DataflowRunner and writes to 
> Bigtable (v0.9.6.2). This exception is thrown (looks like it's when the 
> BigtableSession is being created):
> java.lang.IllegalArgumentException: Jetty ALPN/NPN has not been properly 
> configured.
>   at 
> io.grpc.netty.GrpcSslContexts.selectApplicationProtocolConfig(GrpcSslContexts.java:174)
>   at io.grpc.netty.GrpcSslContexts.configure(GrpcSslContexts.java:151)
>   at io.grpc.netty.GrpcSslContexts.configure(GrpcSslContexts.java:139)
>   at io.grpc.netty.GrpcSslContexts.forClient(GrpcSslContexts.java:109)
>   at 
> com.google.cloud.bigtable.grpc.BigtableSession.createSslContext(BigtableSession.java:124)
>   at 
> com.google.cloud.bigtable.grpc.BigtableSession.access$000(BigtableSession.java:81)
>   at 
> com.google.cloud.bigtable.grpc.BigtableSession$2.run(BigtableSession.java:151)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> I can run the same pipeline without issue using the DirectRunner instead, and 
> it was also working for me using the 0.7.0 snapshot of Beam last week. I've 
> already checked with the cloud-bigtable-client project; they said that it 
> should be working since I have a dependency on netty_tcnative configured. The 
> fact that the same pipeline works with the DirectRunner and not with the 
> DataflowRunner makes me think it's a DataflowRunner bug.
> My pipeline is pretty simple; it looks like this:
> Pipeline p = Pipeline.create(gcpOptions);
> p.apply(PubsubIO.readProtos(TestProto.class)
>     .fromSubscription(pubsubSubscription))
>  .apply(ParDo.of(new BigtableMutationTransform()))
>  .apply(BigtableIO.write()
>     .withBigtableOptions(bigtableOptionsBuilder)
>     .withTableId("table_id"));
> p.run();
> Let me know if you need more context.
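For reference, the "Jetty ALPN/NPN has not been properly configured" error above is commonly avoided by putting the statically linked BoringSSL flavor of netty-tcnative on the runtime classpath. A hedged sketch of the Maven dependency follows; the version shown is illustrative only and must match the grpc-netty version actually in use, which was not confirmed for this Beam/Bigtable combination:

```xml
<!-- Illustrative only: the version must line up with the grpc-netty in use. -->
<dependency>
  <groupId>io.netty</groupId>
  <artifactId>netty-tcnative-boringssl-static</artifactId>
  <version>1.1.33.Fork26</version>
  <scope>runtime</scope>
</dependency>
```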



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1773) Consider allowing Source#validate() to throw exception

2017-05-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-1773:
--
Labels: backward-incompatible  (was: )

> Consider allowing Source#validate() to throw exception
> --
>
> Key: BEAM-1773
> URL: https://issues.apache.org/jira/browse/BEAM-1773
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ted Yu
>Assignee: Ted Yu
>  Labels: backward-incompatible
> Attachments: 1773.v4.patch, beam-1773.v1.patch, beam-1773.v2.patch
>
>
> In HDFSFileSource.java :
> {code}
>   @Override
>   public void validate() {
> ...
>   } catch (IOException | InterruptedException e) {
> throw new RuntimeException(e);
>   }
> {code}
> Source#validate() should be allowed to throw an exception so that we don't 
> have to resort to wrapping it in a RuntimeException.
> Here is the related thread on the mailing list:
> http://search-hadoop.com/m/Beam/gfKHFOwE0uETxae?subj=Re+why+Source+validate+is+not+declared+to+throw+any+exception
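The proposal above can be sketched as a signature change. This is a minimal illustration with hypothetical interface names (not Beam's actual `Source` class): today's shape forces implementations to wrap checked exceptions, while the proposed shape lets them propagate to the caller.

```java
import java.io.IOException;

public class ValidateSketch {
  // Current shape: validate() declares no checked exceptions, so
  // implementations must wrap IOException in RuntimeException.
  interface SourceToday {
    void validate();
  }

  // Proposed shape (hypothetical): the checked exception propagates directly.
  interface SourceProposed {
    void validate() throws IOException;
  }

  // Callers can then handle the checked exception explicitly.
  static String tryValidate(SourceProposed source) {
    try {
      source.validate();
      return "ok";
    } catch (IOException e) {
      return "caught: " + e.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(tryValidate(() -> { throw new IOException("missing file"); }));
  }
}
```

The trade-off noted in the issue's `backward-incompatible` label: adding `throws` to an existing interface method breaks callers that do not already handle the exception.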



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-2364) PCollectionView should not be a PValue, it should expand to its underlying PCollection

2017-05-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-2364:
--
Labels: backward-incompatible  (was: )

> PCollectionView should not be a PValue, it should expand to its underlying 
> PCollection
> --
>
> Key: BEAM-2364
> URL: https://issues.apache.org/jira/browse/BEAM-2364
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Thomas Groh
>  Labels: backward-incompatible
>
> This bug has been present for a very long time.
> It is a change to {{@Internal}} details, but it is going to be 
> backward-incompatible in a couple of ways, because the backward-compatible 
> behavior is incorrect. We need to be very careful with the surrounding 
> logic.
> The particular motivating need is that as long as a PCollectionView expands 
> to only itself, the outputs for surgery and serialization are not correct. 
> There may be a solution involving hardcoded logic, TBD.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-2362) Inline code font formatting is applied to the following space.

2017-05-25 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025406#comment-16025406
 ] 

Daniel Halperin commented on BEAM-2362:
---

Is this true on the actual Beam site, or only in staged web pages?

I believe the staging issue is a known bug with an existing JIRA somewhere.

Looking at the website, it appears that this is not a problem. Please reopen 
(with a link to and a screenshot of the live beam.apache.org) if this is wrong.

> Inline code font formatting is applied to the following space.
> --
>
> Key: BEAM-2362
> URL: https://issues.apache.org/jira/browse/BEAM-2362
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Kenneth Knowles
>Assignee: Davor Bonaci
> Fix For: Not applicable
>
>
> If you do this in markdown for the website:
> {code}
> This is some `inline code` to demonstrate the bug.
> {code}
> The result will be formatted as though you typed this
> This is some {{inline code }}to demonstrate the bug.
> (JIRA doesn't actually code format when spacing is like this)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (BEAM-2362) Inline code font formatting is applied to the following space.

2017-05-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin closed BEAM-2362.
-
   Resolution: Won't Fix
 Assignee: Kenneth Knowles  (was: Davor Bonaci)
Fix Version/s: Not applicable

> Inline code font formatting is applied to the following space.
> --
>
> Key: BEAM-2362
> URL: https://issues.apache.org/jira/browse/BEAM-2362
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
> Fix For: Not applicable
>
>
> If you do this in markdown for the website:
> {code}
> This is some `inline code` to demonstrate the bug.
> {code}
> The result will be formatted as though you typed this
> This is some {{inline code }}to demonstrate the bug.
> (JIRA doesn't actually code format when spacing is like this)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (BEAM-2311) Update compatibility matrix Aggregators section for Metrics

2017-05-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin closed BEAM-2311.
-
Resolution: Done

> Update compatibility matrix Aggregators section for Metrics
> ---
>
> Key: BEAM-2311
> URL: https://issues.apache.org/jira/browse/BEAM-2311
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Daniel Halperin
>Assignee: Pablo Estrada
> Fix For: Not applicable
>
>
> CC: [~bchambers] [~aviemzur] [~pabloem]
> It looks like the definition of Aggregators needs to be updated post 2.0.0 
> and the switch to Metrics.
> https://github.com/apache/beam-site/blob/asf-site/src/_data/capability-matrix.yml#L205



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-2336) Consider removing the default coder from TfRecordIO

2017-05-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-2336:
--
Labels: backward-incompatible  (was: )

> Consider removing the default coder from TfRecordIO
> ---
>
> Key: BEAM-2336
> URL: https://issues.apache.org/jira/browse/BEAM-2336
> Project: Beam
>  Issue Type: Wish
>  Components: sdk-py
>Reporter: Ahmet Altay
>Priority: Minor
>  Labels: backward-incompatible
> Fix For: Not applicable
>
>
> This might make more sense from a user perspective. 
> Note that this will be a breaking change, and should not be addressed until 
> the next breaking release (3.0)
> The IO in Java needs to be similarly updated.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-2362) Inline code font formatting is applied to the following space when site is staged by Jenkins

2017-05-26 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16026432#comment-16026432
 ] 

Daniel Halperin commented on BEAM-2362:
---

Found it! [BEAM-1073]

(Sorry, should have linked yesterday)

> Inline code font formatting is applied to the following space when site is 
> staged by Jenkins
> 
>
> Key: BEAM-2362
> URL: https://issues.apache.org/jira/browse/BEAM-2362
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Minor
> Fix For: Not applicable
>
>
> If you do this in markdown for the website:
> {code}
> This is some `inline code` to demonstrate the bug.
> {code}
> The result will be formatted as though you typed this
> This is some {{inline code }}to demonstrate the bug.
> (JIRA doesn't actually code format when spacing is like this)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (BEAM-2362) Inline code font formatting is applied to the following space when site is staged by Jenkins

2017-05-26 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-2362.
---
Resolution: Duplicate

> Inline code font formatting is applied to the following space when site is 
> staged by Jenkins
> 
>
> Key: BEAM-2362
> URL: https://issues.apache.org/jira/browse/BEAM-2362
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Minor
> Fix For: Not applicable
>
>
> If you do this in markdown for the website:
> {code}
> This is some `inline code` to demonstrate the bug.
> {code}
> The result will be formatted as though you typed this
> This is some {{inline code }}to demonstrate the bug.
> (JIRA doesn't actually code format when spacing is like this)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1073) Staged websites have extra whitespace around links

2017-05-26 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16026479#comment-16026479
 ] 

Daniel Halperin commented on BEAM-1073:
---

The code for this is in the {{beam-site}} repo under the jenkins folder: 
https://github.com/apache/beam-site/blob/asf-site/.jenkins/append_index_html_to_internal_links.py



> Staged websites have extra whitespace around links
> --
>
> Key: BEAM-1073
> URL: https://issues.apache.org/jira/browse/BEAM-1073
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Affects Versions: Not applicable
>Reporter: Daniel Halperin
>Assignee: Jason Kuster
>Priority: Minor
> Fix For: Not applicable
>
>
> cc [~davor] [~frances]
> e.g., 
> http://apache-beam-website-pull-requests.storage.googleapis.com/97/documentation/runners/flink/index.html
> has this source when staged:
> {code}
> <a href="...">
>  Programming Guide
> </a>
> {code}
> but this source:
> {code}
> <a href="...">Programming Guide</a>
> {code}
> when live. I assume this comes from the rewriting tool we use to make 
> directories work.
> The former (space between the end of {{Guide}} and {{</a>}}) is what I assume 
> causes the visual effects.
> NBD, but I spent a while figuring out why someone's PR caused this to happen 
> and then saw it disappear after merging and going live.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-2370) BigQuery Insert with Partition Decorator throwing error

2017-05-26 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16026517#comment-16026517
 ] 

Daniel Halperin commented on BEAM-2370:
---

[~reuvenlax], can you take a look at this issue?

> BigQuery Insert with Partition Decorator throwing error
> ---
>
> Key: BEAM-2370
> URL: https://issues.apache.org/jira/browse/BEAM-2370
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-gcp
>Affects Versions: 2.0.0
> Environment: DirectRunner
>Reporter: Andre
>Assignee: Reuven Lax
>
> Running a Dataflow job with the DirectRunner that inserts data into a 
> partitioned table using partition decorators throws the following error 
> multiple times, BUT it still inserts records into the right partition.
> {code:java|title=Error}
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl executeWithRetries
> INFO: Ignore the error and retry the request.
> com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad 
> Request
> {
>   "code" : 400,
>   "errors" : [ {
> "domain" : "global",
> "message" : "Invalid table ID \"mytable_orders$20170516\". Table IDs must 
> be alphanumeric (plus underscores) and must be at most 1024 characters long. 
> Also, Table decorators cannot be used.",
> "reason" : "invalid"
>   } ],
>   "message" : "Invalid table ID \"mytable_orders$20170516\". Table IDs must 
> be alphanumeric (plus underscores) and must be at most 1024 characters long. 
> Also, Table decorators cannot be used."
> }
>   at 
> com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
> {code}
> {code:java|title=Code}
> // Write TableRows to BQ
> rows.apply("TransformationStep", ParDo.of(new Outputter()))
>   .apply("WindowDaily", Window.<TableRow>into(CalendarWindows.days(1)))
>   .apply("WriteToBQ", BigQueryIO.writeTableRows()
>     .to(new SerializableFunction<ValueInSingleWindow<TableRow>, TableDestination>() {
>       private static final long serialVersionUID = 8196602721734820219L;
>       @Override
>       public TableDestination apply(ValueInSingleWindow<TableRow> value) {
>         String dayString = DateTimeFormat.forPattern("yyyyMMdd")
>             .withZone(DateTimeZone.UTC)
>             .print(((IntervalWindow) value.getWindow()).start());
>         return new TableDestination(
>             "my-project:dataset.mytable_orders$" + dayString, "");
>       }
>     }).withSchema(mySchema)
>     .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
>     .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE));
> {code}
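As a side note, the decorator suffix the pipeline builds is just the window's start date formatted as {{yyyyMMdd}} and appended after {{$}}. A minimal stdlib sketch of that string construction (using java.time instead of the Joda-Time API in the snippet; method names are hypothetical):

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class DecoratorSketch {
  // Builds a per-day BigQuery partition decorator from a window-start instant,
  // mirroring what the SerializableFunction in the report computes.
  static String dailyTableSpec(String base, Instant windowStart) {
    String day = DateTimeFormatter.ofPattern("yyyyMMdd")
        .withZone(ZoneOffset.UTC)
        .format(windowStart);
    return base + "$" + day;
  }

  public static void main(String[] args) {
    Instant start = Instant.parse("2017-05-16T00:00:00Z");
    // Produces the table ID with the "$20170516" decorator that the
    // streaming-insert error message rejects.
    System.out.println(dailyTableSpec("my-project:dataset.mytable_orders", start));
  }
}
```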



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (BEAM-2370) BigQuery Insert with Partition Decorator throwing error

2017-05-26 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin reassigned BEAM-2370:
-

Assignee: Reuven Lax  (was: Daniel Halperin)

> BigQuery Insert with Partition Decorator throwing error
> ---
>
> Key: BEAM-2370
> URL: https://issues.apache.org/jira/browse/BEAM-2370
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-gcp
>Affects Versions: 2.0.0
> Environment: DirectRunner
>Reporter: Andre
>Assignee: Reuven Lax
>
> Running a Dataflow job with the DirectRunner that inserts data into a 
> partitioned table using partition decorators throws the following error 
> multiple times, BUT it still inserts records into the right partition.
> {code:java|title=Error}
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl executeWithRetries
> INFO: Ignore the error and retry the request.
> com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad 
> Request
> {
>   "code" : 400,
>   "errors" : [ {
> "domain" : "global",
> "message" : "Invalid table ID \"mytable_orders$20170516\". Table IDs must 
> be alphanumeric (plus underscores) and must be at most 1024 characters long. 
> Also, Table decorators cannot be used.",
> "reason" : "invalid"
>   } ],
>   "message" : "Invalid table ID \"mytable_orders$20170516\". Table IDs must 
> be alphanumeric (plus underscores) and must be at most 1024 characters long. 
> Also, Table decorators cannot be used."
> }
>   at 
> com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
> {code}
> {code:java|title=Code}
> // Write TableRows to BQ
> rows.apply("TransformationStep", ParDo.of(new Outputter()))
>   .apply("WindowDaily", Window.<TableRow>into(CalendarWindows.days(1)))
>   .apply("WriteToBQ", BigQueryIO.writeTableRows()
>     .to(new SerializableFunction<ValueInSingleWindow<TableRow>, TableDestination>() {
>       private static final long serialVersionUID = 8196602721734820219L;
>       @Override
>       public TableDestination apply(ValueInSingleWindow<TableRow> value) {
>         String dayString = DateTimeFormat.forPattern("yyyyMMdd")
>             .withZone(DateTimeZone.UTC)
>             .print(((IntervalWindow) value.getWindow()).start());
>         return new TableDestination(
>             "my-project:dataset.mytable_orders$" + dayString, "");
>       }
>     }).withSchema(mySchema)
>     .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
>     .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE));
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (BEAM-1073) Staged websites have extra whitespace around links

2017-05-26 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin closed BEAM-1073.
-
Resolution: Fixed

Thanks Kenn! This is a nice simple fix to an issue that was, for some reason, 
believed to be hard ;)

> Staged websites have extra whitespace around links
> --
>
> Key: BEAM-1073
> URL: https://issues.apache.org/jira/browse/BEAM-1073
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Affects Versions: Not applicable
>Reporter: Daniel Halperin
>Assignee: Kenneth Knowles
>Priority: Minor
> Fix For: Not applicable
>
>
> cc [~davor] [~frances]
> e.g., 
> http://apache-beam-website-pull-requests.storage.googleapis.com/97/documentation/runners/flink/index.html
> has this source when staged:
> {code}
> <a href="...">
>  Programming Guide
> </a>
> {code}
> but this source:
> {code}
> <a href="...">Programming Guide</a>
> {code}
> when live. I assume this comes from the rewriting tool we use to make 
> directories work.
> The former (space between the end of {{Guide}} and {{</a>}}) is what I assume 
> causes the visual effects.
> NBD, but I spent a while figuring out why someone's PR caused this to happen 
> and then saw it disappear after merging and going live.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (BEAM-2372) Only run Apache RAT at root module

2017-05-26 Thread Daniel Halperin (JIRA)
Daniel Halperin created BEAM-2372:
-

 Summary: Only run Apache RAT at root module
 Key: BEAM-2372
 URL: https://issues.apache.org/jira/browse/BEAM-2372
 Project: Beam
  Issue Type: Improvement
  Components: build-system
Reporter: Daniel Halperin
Assignee: Daniel Halperin
 Fix For: Not applicable


Apache RAT checks all files in the project, yet we run it at the root and then 
again in every submodule. This seems overkill.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (BEAM-2372) Only run Apache RAT at root module

2017-05-26 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin closed BEAM-2372.
-
Resolution: Done

> Only run Apache RAT at root module
> --
>
> Key: BEAM-2372
> URL: https://issues.apache.org/jira/browse/BEAM-2372
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
> Fix For: Not applicable
>
>
> Apache RAT checks all files in the project, yet we run it at the root and 
> then again in every submodule. This seems overkill.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-2379) SpannerIO tests are failing

2017-05-30 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16029965#comment-16029965
 ] 

Daniel Halperin commented on BEAM-2379:
---

Feel free to roll back if a fix is not coming quickly. [~jbonofre]

> SpannerIO tests are failing
> ---
>
> Key: BEAM-2379
> URL: https://issues.apache.org/jira/browse/BEAM-2379
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Reporter: Jean-Baptiste Onofré
>Assignee: Mairbek Khadikov
> Fix For: Not applicable
>
>
> When performing a complete {{mvn clean install}}, the build is failing on 
> {{SpannerIO}} tests:
> {code}
> [ERROR] Errors: 
> [ERROR]   SpannerIOTest.batching:129 » UserCode 
> java.lang.IllegalArgumentException: A pr...
> [ERROR]   SpannerIOTest.batchingGroups:155 » UserCode 
> java.lang.IllegalArgumentException...
> [ERROR]   SpannerIOTest.noBatching:178 » UserCode 
> java.lang.IllegalArgumentException: A ...
> [ERROR]   SpannerIOTest.singleMutationPipeline » UncheckedExecution 
> org.apache.beam.sdk
> {code}
> These tests fail with the same reason, here's the complete stack trace:
> {code}
> [ERROR] batchingGroups(org.apache.beam.sdk.io.gcp.spanner.SpannerIOTest)  
> Time elapsed: 0.004 s  <<< ERROR!
> org.apache.beam.sdk.util.UserCodeException: 
> java.lang.IllegalArgumentException: A project ID is required for this service 
> but could not be determined from the builder or the environment.  Please set 
> a project ID using the builder.
> at 
> org.apache.beam.sdk.util.UserCodeException.wrap(UserCodeException.java:36)
> at 
> org.apache.beam.sdk.io.gcp.spanner.SpannerIO$SpannerWriteFn$DoFnInvoker.invokeSetup(Unknown
>  Source)
> at 
> org.apache.beam.sdk.transforms.DoFnTester.initializeState(DoFnTester.java:745)
> at 
> org.apache.beam.sdk.transforms.DoFnTester.startBundle(DoFnTester.java:219)
> at 
> org.apache.beam.sdk.transforms.DoFnTester.processBundle(DoFnTester.java:183)
> at 
> org.apache.beam.sdk.io.gcp.spanner.SpannerIOTest.batchingGroups(SpannerIOTest.java:155)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at 
> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:321)
> at 
> org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:239)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> at org.junit.runners.Suite.runChild(Suite.java:128)
> at org.junit.runners.Suite.runChild(Suite.java:27)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> at 
> org.apache.maven.surefire.junitcore.JUnitCore.run(JUnitCore.java:55)
> at 
> org.apache.maven.surefire.junitcore.JUnitCoreWrapper.createRequestAndRun(JUnitCoreWrapper.java:137)
> at 
> org.apache.maven.surefire.junitcore.JUnitCoreWrapper.executeEager(JUnitCoreWrapper.java:107)
> at 
> org.apache

[jira] [Created] (BEAM-2389) GcpCoreApiSurfaceTest isn't testing right module

2017-05-30 Thread Daniel Halperin (JIRA)
Daniel Halperin created BEAM-2389:
-

 Summary: GcpCoreApiSurfaceTest isn't testing right module
 Key: BEAM-2389
 URL: https://issues.apache.org/jira/browse/BEAM-2389
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-gcp
Affects Versions: 2.0.0
Reporter: Daniel Halperin
 Fix For: 2.1.0


It looks like a clone of {{SdkApiSurfaceTest}} that, apart from being renamed, 
was not updated for its new module. Even the Java package of the test is wrong.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-2373) AvroSource: Premature End of stream Exception on SnappyCompressorInputStream

2017-05-30 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-2373:
--
Fix Version/s: 2.1.0

> AvroSource: Premature End of stream Exception on SnappyCompressorInputStream
> 
>
> Key: BEAM-2373
> URL: https://issues.apache.org/jira/browse/BEAM-2373
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.0.0
>Reporter: Michael Luckey
>Assignee: Michael Luckey
>Priority: Critical
> Fix For: 2.1.0
>
>
> During processing we encountered on some of our snappy encoded avro input 
> files
> {noformat}
> Exception in thread "main" java.lang.RuntimeException: java.io.IOException: 
> Premature end of stream
>  at 
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:330)
>  at 
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:292)
>  at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:200)
>  at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:63)
>  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:295)
>  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:281)
> Caused by: java.io.IOException: Premature end of stream
>  at 
> org.apache.beam.sdk.repackaged.org.apache.commons.compress.compressors.snappy.SnappyCompressorInputStream.expandLiteral(SnappyCompressorInputStream.java:310)
>  at 
> org.apache.beam.sdk.repackaged.org.apache.commons.compress.compressors.snappy.SnappyCompressorInputStream.fill(SnappyCompressorInputStream.java:169)
>  at 
> org.apache.beam.sdk.repackaged.org.apache.commons.compress.compressors.snappy.SnappyCompressorInputStream.read(SnappyCompressorInputStream.java:134)
>  at 
> org.apache.avro.io.BinaryDecoder$InputStreamByteSource.tryReadRaw(BinaryDecoder.java:839)
>  at 
> org.apache.avro.io.BinaryDecoder$ByteSource.compactAndFill(BinaryDecoder.java:692)
>  at org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:471)
>  at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128)
>  at org.apache.avro.io.BinaryDecoder.readIndex(BinaryDecoder.java:423)
>  at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:290)
>  at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>  at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:267)
>  at 
> org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:178)
>  at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
>  at 
> org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:240)
>  at 
> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:230)
>  at 
> org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:174)
>  at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
>  at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:144)
>  at 
> org.apache.beam.sdk.io.AvroSource$AvroBlock.readNextRecord(AvroSource.java:579)
>  at 
> org.apache.beam.sdk.io.BlockBasedSource$BlockBasedReader.readNextRecord(BlockBasedSource.java:198)
>  at 
> org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.advanceImpl(FileBasedSource.java:479)
>  at 
> org.apache.beam.sdk.io.OffsetBasedSource$OffsetBasedReader.advance(OffsetBasedSource.java:277)
>  at 
> org.apache.beam.runners.direct.BoundedReadEvaluatorFactory$BoundedReadEvaluator.processElement(BoundedReadEvaluatorFactory.java:148)
>  at 
> org.apache.beam.runners.direct.TransformExecutor.processElements(TransformExecutor.java:146)
>  at 
> org.apache.beam.runners.direct.TransformExecutor.run(TransformExecutor.java:110)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This seems to be caused by a bug in apache.commons.compress:1.9, which was 
> addressed here:
> https://github.com/apache/commons-compress/commit/9ae37525134089dd0c9ee1cf8738192b70e0fc07
> This was observed with a pipeline using AvroIO, on the Spark and direct 
> runners, on both HDFS and the local file system.
> In our short tests we got it running without exceptions by either:
> * upgrading to commons.compress:1.14
> * applying the patch to the 1.9er code of SnappyCompressorInputStream
> Impacts on other components were not tested, of course :(
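The first workaround above amounts to forcing the fixed library version. A hedged sketch of what that Maven override could look like (the coordinates are the standard commons-compress ones, but where exactly the override belongs in Beam's build, given the repackaging shown in the stack trace, was not verified here):

```xml
<!-- Illustrative: pin commons-compress past the SnappyCompressorInputStream fix. -->
<dependency>
  <groupId>org.apache.commons</groupId>
  <artifactId>commons-compress</artifactId>
  <version>1.14</version>
</dependency>
```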



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (BEAM-1907) Delete PubsubBoundedReader

2017-04-11 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-1907.
---
Resolution: Fixed

> Delete PubsubBoundedReader
> --
>
> Key: BEAM-1907
> URL: https://issues.apache.org/jira/browse/BEAM-1907
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
> Fix For: First stable release
>
>
> PubsubIO in bounded mode doesn't really make sense outside of hacky testing 
> modes: it might lose data, among other sketchy behavior. We had it before 
> because the old {{DirectRunner}} did not support unbounded PCollections. Now 
> that it does, we should probably get rid of this buggy code.
> Specifically:
> * Delete the specialized PubsubBoundedReader implementation
> * Either: delete the maxNumRecords and maxReadTime methods, or rename them to 
> something like maxNumRecordsForTesting with clear javadoc as to their 
> drawbacks.





[jira] [Resolved] (BEAM-1092) Shade commonly used libraries (e.g. Guava) to avoid class conflicts

2017-04-11 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-1092.
---
   Resolution: Fixed
 Assignee: Aviem Zur
Fix Version/s: Not applicable

IMO we now shade Guava, the biggest offender, everywhere by default.

This bug is a little too abstract to be left unresolved (when would we close 
it?), so I think I'll call it resolved.

Thanks [~aviemzur]!

> Shade commonly used libraries (e.g. Guava) to avoid class conflicts
> ---
>
> Key: BEAM-1092
> URL: https://issues.apache.org/jira/browse/BEAM-1092
> Project: Beam
>  Issue Type: Bug
>  Components: examples-java, sdk-java-extensions
>Affects Versions: 0.3.0-incubating
>Reporter: Maximilian Michels
>Assignee: Aviem Zur
> Fix For: Not applicable
>
>
> Beam shades away some of its dependencies like Guava to prevent user classes 
> from clashing with these dependencies. Some of the artifacts, e.g. KafkaIO, 
> do not shade any classes and directly depend on potentially conflicting 
> libraries (e.g. Guava). Also, users might manually add such libraries as 
> dependencies.
> Runners that add classes to the classpath (e.g. Hadoop) can run into conflict 
> with multiple versions of the same class. To prevent that, we should adjust 
> the Maven archetype POM files used for the Quickstart to perform shading of 
> commonly used libraries (again, Guava is often the culprit).
> To prevent the problem in the first place, we should expand the shading of 
> Guava and other libraries to all modules which make use of these. 
> To solve both dimensions of the issue, we need to address:
> 1. Adding shading of commonly used libraries to the archetypes poms
> 2. Properly shade all commonly used libraries in the SDK modules
> 2) seems to be of highest priority since it affects users who simply use the 
> provided IO modules.
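For (1), the archetype POMs would need a relocation along these lines. This is a sketch using the maven-shade-plugin; the {{repackaged}} prefix is illustrative, not necessarily the prefix Beam uses:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <!-- Rewrite Guava packages so the user's Guava cannot clash
               with the version on the runner's classpath. -->
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>repackaged.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```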





[jira] [Created] (BEAM-1937) PipelineSurgery renumbers already-unique transforms

2017-04-11 Thread Daniel Halperin (JIRA)
Daniel Halperin created BEAM-1937:
-

 Summary: PipelineSurgery renumbers already-unique transforms
 Key: BEAM-1937
 URL: https://issues.apache.org/jira/browse/BEAM-1937
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-core
Reporter: Daniel Halperin
Assignee: Thomas Groh


In the attached WordCount graph, it appears that some transforms have a 2 at 
the end after submission. However, I'm pretty confident that there is only 1 
finalize and only 1 WriteBundles in this graph.

[~tgroh] believes this is a bug in pipeline surgery.





[jira] [Updated] (BEAM-1937) PipelineSurgery renumbers already-unique transforms

2017-04-11 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-1937:
--
Attachment: wordcount renumbered.png

> PipelineSurgery renumbers already-unique transforms
> ---
>
> Key: BEAM-1937
> URL: https://issues.apache.org/jira/browse/BEAM-1937
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Daniel Halperin
>Assignee: Thomas Groh
> Attachments: wordcount renumbered.png
>
>
> In the attached WordCount graph, it appears that some transforms have a 2 at 
> the end after submission. However, I'm pretty confident that there is only 1 
> finalize and only 1 WriteBundles in this graph.
> [~tgroh] believes this is a bug in pipeline surgery.





[jira] [Created] (BEAM-1966) ApexRunner in cluster mode does not register standard FileSystems/IOChannelFactories

2017-04-13 Thread Daniel Halperin (JIRA)
Daniel Halperin created BEAM-1966:
-

 Summary: ApexRunner in cluster mode does not register standard 
FileSystems/IOChannelFactories
 Key: BEAM-1966
 URL: https://issues.apache.org/jira/browse/BEAM-1966
 Project: Beam
  Issue Type: Bug
  Components: runner-apex
Reporter: Daniel Halperin
 Fix For: First stable release








[jira] [Assigned] (BEAM-1966) ApexRunner in cluster mode does not register standard FileSystems/IOChannelFactories

2017-04-13 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin reassigned BEAM-1966:
-

Assignee: Thomas Weise

> ApexRunner in cluster mode does not register standard 
> FileSystems/IOChannelFactories
> 
>
> Key: BEAM-1966
> URL: https://issues.apache.org/jira/browse/BEAM-1966
> Project: Beam
>  Issue Type: Bug
>  Components: runner-apex
>Reporter: Daniel Halperin
>Assignee: Thomas Weise
> Fix For: First stable release
>
>






[jira] [Created] (BEAM-1980) Seeming deadlock using Apex with relatively small data

2017-04-14 Thread Daniel Halperin (JIRA)
Daniel Halperin created BEAM-1980:
-

 Summary: Seeming deadlock using Apex with relatively small data
 Key: BEAM-1980
 URL: https://issues.apache.org/jira/browse/BEAM-1980
 Project: Beam
  Issue Type: Bug
  Components: runner-apex
Reporter: Daniel Halperin
Assignee: Thomas Weise
 Fix For: First stable release


I'm running the "beam portability demo" at 
https://github.com/dhalperi/beam-portability-demo/tree/apex

Made a very small input file:

{code}
gsutil cat gs://apache-beam-demo/data2/small-game.csv | head -n 10 > 
tiny.csv
{code}

Ran the job in embedded mode using an Apex fat-jar from the pom in that branch 
(and adding in {{slf4j-jdk14.jar}} for debugging info):

{code}
java -cp 
~/.m2/repository/org/slf4j/slf4j-jdk14/1.7.14/slf4j-jdk14-1.7.14.jar:target/portability-demo-bundled-apex.jar
 demo.HourlyTeamScore --runner=ApexRunner 
--outputPrefix=gs://clouddfe-dhalperi/output/apex --input=tiny.csv
{code}

A good run takes O(25 seconds):

{code}
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/Users/dhalperi/.m2/repository/org/slf4j/slf4j-jdk14/1.7.14/slf4j-jdk14-1.7.14.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/Users/dhalperi/beam-portability-demo/target/portability-demo-bundled-apex.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.JDK14LoggerFactory]
log4j:WARN No appenders could be found for logger 
(org.apache.commons.beanutils.converters.BooleanConverter).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
info.
Apr 14, 2017 1:20:55 PM com.datatorrent.common.util.AsyncFSStorageAgent save
INFO: using 
/var/folders/7r/s0gg2qb11jz4g4pcb8n15gkc009j1y/T/chkp8074838277485202831 as the 
basepath for checkpointing.
Apr 14, 2017 1:20:56 PM com.datatorrent.bufferserver.storage.DiskStorage 
INFO: using /var/folders/7r/s0gg2qb11jz4g4pcb8n15gkc009j1y/T as the basepath 
for spooling.
Apr 14, 2017 1:20:56 PM com.datatorrent.bufferserver.server.Server registered
INFO: Server started listening at /0:0:0:0:0:0:0:0:61087
Apr 14, 2017 1:20:56 PM com.datatorrent.stram.StramLocalCluster 
INFO: Buffer server started: localhost:61087
Apr 14, 2017 1:20:56 PM com.datatorrent.stram.Journal write
WARNING: Journal output stream is null. Skipping write to the WAL.
Apr 14, 2017 1:20:56 PM 
com.datatorrent.stram.StramLocalCluster$LocalStreamingContainerLauncher run
INFO: Started container container-0
Apr 14, 2017 1:20:56 PM com.datatorrent.stram.Journal write
WARNING: Journal output stream is null. Skipping write to the WAL.
Apr 14, 2017 1:20:56 PM 
com.datatorrent.stram.StramLocalCluster$LocalStreamingContainerLauncher run
INFO: Started container container-1
Apr 14, 2017 1:20:56 PM com.datatorrent.stram.Journal write
WARNING: Journal output stream is null. Skipping write to the WAL.
Apr 14, 2017 1:20:56 PM 
com.datatorrent.stram.StramLocalCluster$LocalStreamingContainerLauncher run
INFO: Started container container-2
Apr 14, 2017 1:20:56 PM com.datatorrent.stram.Journal write
WARNING: Journal output stream is null. Skipping write to the WAL.
Apr 14, 2017 1:20:56 PM 
com.datatorrent.stram.StramLocalCluster$LocalStreamingContainerLauncher run
INFO: Started container container-3
Apr 14, 2017 1:20:56 PM com.datatorrent.stram.Journal write
WARNING: Journal output stream is null. Skipping write to the WAL.
Apr 14, 2017 1:20:56 PM 
com.datatorrent.stram.StramLocalCluster$LocalStreamingContainerLauncher run
INFO: Started container container-4
Apr 14, 2017 1:20:56 PM com.datatorrent.stram.Journal write
WARNING: Journal output stream is null. Skipping write to the WAL.
Apr 14, 2017 1:20:56 PM 
com.datatorrent.stram.StramLocalCluster$LocalStreamingContainerLauncher run
INFO: Started container container-5
Apr 14, 2017 1:20:56 PM com.datatorrent.stram.Journal write
WARNING: Journal output stream is null. Skipping write to the WAL.
Apr 14, 2017 1:20:56 PM 
com.datatorrent.stram.StramLocalCluster$UmbilicalProtocolLocalImpl log
INFO: container-2 msg: [container-2] Entering heartbeat loop..
Apr 14, 2017 1:20:56 PM 
com.datatorrent.stram.StramLocalCluster$UmbilicalProtocolLocalImpl log
INFO: container-1 msg: [container-1] Entering heartbeat loop..
Apr 14, 2017 1:20:56 PM 
com.datatorrent.stram.StramLocalCluster$UmbilicalProtocolLocalImpl log
INFO: container-5 msg: [container-5] Entering heartbeat loop..
Apr 14, 2017 1:20:56 PM 
com.datatorrent.stram.StramLocalCluster$UmbilicalProtocolLocalImpl log
INFO: container-4 msg: [container-4] Entering heartbeat loop..
Apr 14, 2017 1:20:56 PM com.datatorrent.stram.Journal write
WARNING: Journal output stream is null. Skipping write to the WAL.
Apr 14, 2017 1:20:56 PM 
com.da

[jira] [Created] (BEAM-1981) Serialization error with TimerInternals in ApexGroupByKeyOperator

2017-04-14 Thread Daniel Halperin (JIRA)
Daniel Halperin created BEAM-1981:
-

 Summary: Serialization error with TimerInternals in 
ApexGroupByKeyOperator
 Key: BEAM-1981
 URL: https://issues.apache.org/jira/browse/BEAM-1981
 Project: Beam
  Issue Type: Bug
  Components: runner-apex
Reporter: Daniel Halperin
Assignee: Thomas Weise
 Fix For: First stable release


Logs below. We tried switching to Java serialization, but that didn't work. We 
made the field transient (which is broken but let us make progress), and that 
did work.
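The trade-off behind the transient workaround can be modeled in a few lines of Python (the real code is Java/Kryo; {{TimerInternals}} and the operator class here are toy stand-ins, not the Beam classes): dropping the field from the serialized form lets the operator checkpoint, but its timer state does not survive a restore.

```python
import pickle

class TimerInternals:
    # Stand-in for a non-serializable resource.
    def __reduce__(self):
        raise TypeError("TimerInternals is not serializable")

class GroupByKeyOperator:
    def __init__(self):
        self.timer_internals = TimerInternals()  # the field that breaks serialization
        self.watermark = 42                      # ordinary serializable state

    def __getstate__(self):
        # "Transient" workaround: drop the offending field from the checkpoint.
        state = self.__dict__.copy()
        del state["timer_internals"]
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Restored without timer state -- this is why the workaround is broken:
        # pending timers are silently lost across a checkpoint/restore.
        self.timer_internals = None

op = pickle.loads(pickle.dumps(GroupByKeyOperator()))
```

Without the `__getstate__` override, `pickle.dumps` raises; with it, the round trip succeeds but `op.timer_internals` comes back empty, which is the likely connection to the stalled timers in BEAM-1980.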

{code}  
2017-04-14 18:56:49,961 INFO com.datatorrent.stram.StreamingAppMaster: Master 
starting with classpath: 
./portability-demo-bundled-apex.jar:/etc/hadoop/conf:/usr/lib/hadoop/hadoop-auth.jar:/usr/lib/hadoop/hadoop-annotations.jar:/usr/lib/hadoop/hadoop-auth-2.7.3.jar:/usr/lib/hadoop/hadoop-nfs.jar:/usr/lib/hadoop/hadoop-common-2.7.3-tests.jar:/usr/lib/hadoop/hadoop-annotations-2.7.3.jar:/usr/lib/hadoop/hadoop-nfs-2.7.3.jar:/usr/lib/hadoop/hadoop-common.jar:/usr/lib/hadoop/hadoop-common-2.7.3.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/snappy-java-1.0.5.jar:/usr/lib/hadoop/lib/curator-recipes-2.7.1.jar:/usr/lib/hadoop/lib/commons-lang-2.6.jar:/usr/lib/hadoop/lib/hamcrest-core-1.3.jar:/usr/lib/hadoop/lib/hadoop-lzo-0.4.19.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.9.13.jar:/usr/lib/hadoop/lib/commons-logging-1.1.3.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.9.13.jar:/usr/lib/hadoop/lib/jersey-core-1.9.jar:/usr/lib/hadoop/lib/netty-3.6.2.Final.jar:/usr/lib/hadoop/lib/bigquery-connector-0.10.1-hadoop2.jar:/usr/lib/hadoop/lib/slf4j-api-1.7.10.jar:/usr/lib/hadoop/lib/avro-1.7.7.jar:/usr/lib/hadoop/lib/stax-api-1.0-2.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/xz-1.0.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/lib/curator-framework-2.7.1.jar:/usr/lib/hadoop/lib/api-util-1.0.0-M20.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.10.jar:/usr/lib/hadoop/lib/commons-io-2.4.jar:/usr/lib/hadoop/lib/gcs-connector-1.6.0-hadoop2.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/zookeeper-3.4.6.jar:/usr/lib/hadoop/lib/jets3t-0.9.0.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.jar:/usr/lib/hadoop/lib/curator-client-2.7.1.jar:/usr/lib/hadoop/lib/htrace-core-3.1.0-incubating.jar:/usr/lib/hadoop/lib/protobuf-java-2.5.0.jar:/usr/lib/hadoop/lib/jetty-6.1.26.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jsr
305-3.0.0.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/httpclient-4.2.5.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.9.13.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jersey-server-1.9.jar:/usr/lib/hadoop/lib/commons-collections-3.2.2.jar:/usr/lib/hadoop/lib/apacheds-kerberos-codec-2.0.0-M15.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/java-xmlbuilder-0.4.jar:/usr/lib/hadoop/lib/gson-2.2.4.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/jersey-json-1.9.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/api-asn1-api-1.0.0-M20.jar:/usr/lib/hadoop/lib/httpcore-4.2.5.jar:/usr/lib/hadoop/lib/junit-4.11.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/apacheds-i18n-2.0.0-M15.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/jackson-xc-1.9.13.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/commons-compress-1.4.1.jar:/usr/lib/hadoop/lib/commons-math3-3.1.1.jar:/usr/lib/hadoop-hdfs/hadoop-hdfs-2.7.3.jar:/usr/lib/hadoop-hdfs/hadoop-hdfs-nfs.jar:/usr/lib/hadoop-hdfs/hadoop-hdfs.jar:/usr/lib/hadoop-hdfs/hadoop-hdfs-2.7.3-tests.jar:/usr/lib/hadoop-hdfs/hadoop-hdfs-nfs-2.7.3.jar:/usr/lib/hadoop-hdfs/lib/xml-apis-1.3.04.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.6.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.9.13.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.3.jar:/usr/lib/hadoop-hdfs/lib/jersey-core-1.9.jar:/usr/lib/hadoop-hdfs/lib/netty-3.6.2.Final.jar:/usr/lib/hadoop-hdfs/lib/leveldbjni-all-1.8.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.4.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.jar:/usr/lib/hadoop-hdfs/lib/htrace-core-3.1.0-incubating.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.5.0.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.jar:/
usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/xercesImpl-2.9.1.jar:/usr/lib/hadoop-hdfs/lib/jsr305-3.0.0.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.9.13.jar:/usr/lib/hadoop-hdfs/lib/com

[jira] [Commented] (BEAM-1980) Seeming deadlock using Apex with relatively small data

2017-04-14 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15969598#comment-15969598
 ] 

Daniel Halperin commented on BEAM-1980:
---

Hmm, that is a good question. I'm honestly not sure I know, but I will repro 
and tell you.

Certainly, it did not undeploy more containers than the 2 seen in the log.

> Seeming deadlock using Apex with relatively small data
> --
>
> Key: BEAM-1980
> URL: https://issues.apache.org/jira/browse/BEAM-1980
> Project: Beam
>  Issue Type: Bug
>  Components: runner-apex
>Reporter: Daniel Halperin
>Assignee: Thomas Weise
> Fix For: First stable release
>
>
> I'm running the "beam portability demo" at 
> https://github.com/dhalperi/beam-portability-demo/tree/apex
> Made a very small input file:
> {code}
> gsutil cat gs://apache-beam-demo/data2/small-game.csv | head -n 10 > 
> tiny.csv
> {code}
> Ran the job in embedded mode using an Apex fat-jar from the pom in that 
> branch (and adding in {{slf4j-jdk14.jar}} for debugging info):
> {code}
> java -cp 
> ~/.m2/repository/org/slf4j/slf4j-jdk14/1.7.14/slf4j-jdk14-1.7.14.jar:target/portability-demo-bundled-apex.jar
>  demo.HourlyTeamScore --runner=ApexRunner 
> --outputPrefix=gs://clouddfe-dhalperi/output/apex --input=tiny.csv
> {code}
> A good run takes O(25 seconds):
> {code}
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/Users/dhalperi/.m2/repository/org/slf4j/slf4j-jdk14/1.7.14/slf4j-jdk14-1.7.14.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/Users/dhalperi/beam-portability-demo/target/portability-demo-bundled-apex.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.JDK14LoggerFactory]
> log4j:WARN No appenders could be found for logger 
> (org.apache.commons.beanutils.converters.BooleanConverter).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
> info.
> Apr 14, 2017 1:20:55 PM com.datatorrent.common.util.AsyncFSStorageAgent save
> INFO: using 
> /var/folders/7r/s0gg2qb11jz4g4pcb8n15gkc009j1y/T/chkp8074838277485202831 as 
> the basepath for checkpointing.
> Apr 14, 2017 1:20:56 PM com.datatorrent.bufferserver.storage.DiskStorage 
> 
> INFO: using /var/folders/7r/s0gg2qb11jz4g4pcb8n15gkc009j1y/T as the basepath 
> for spooling.
> Apr 14, 2017 1:20:56 PM com.datatorrent.bufferserver.server.Server registered
> INFO: Server started listening at /0:0:0:0:0:0:0:0:61087
> Apr 14, 2017 1:20:56 PM com.datatorrent.stram.StramLocalCluster 
> INFO: Buffer server started: localhost:61087
> Apr 14, 2017 1:20:56 PM com.datatorrent.stram.Journal write
> WARNING: Journal output stream is null. Skipping write to the WAL.
> Apr 14, 2017 1:20:56 PM 
> com.datatorrent.stram.StramLocalCluster$LocalStreamingContainerLauncher run
> INFO: Started container container-0
> Apr 14, 2017 1:20:56 PM com.datatorrent.stram.Journal write
> WARNING: Journal output stream is null. Skipping write to the WAL.
> Apr 14, 2017 1:20:56 PM 
> com.datatorrent.stram.StramLocalCluster$LocalStreamingContainerLauncher run
> INFO: Started container container-1
> Apr 14, 2017 1:20:56 PM com.datatorrent.stram.Journal write
> WARNING: Journal output stream is null. Skipping write to the WAL.
> Apr 14, 2017 1:20:56 PM 
> com.datatorrent.stram.StramLocalCluster$LocalStreamingContainerLauncher run
> INFO: Started container container-2
> Apr 14, 2017 1:20:56 PM com.datatorrent.stram.Journal write
> WARNING: Journal output stream is null. Skipping write to the WAL.
> Apr 14, 2017 1:20:56 PM 
> com.datatorrent.stram.StramLocalCluster$LocalStreamingContainerLauncher run
> INFO: Started container container-3
> Apr 14, 2017 1:20:56 PM com.datatorrent.stram.Journal write
> WARNING: Journal output stream is null. Skipping write to the WAL.
> Apr 14, 2017 1:20:56 PM 
> com.datatorrent.stram.StramLocalCluster$LocalStreamingContainerLauncher run
> INFO: Started container container-4
> Apr 14, 2017 1:20:56 PM com.datatorrent.stram.Journal write
> WARNING: Journal output stream is null. Skipping write to the WAL.
> Apr 14, 2017 1:20:56 PM 
> com.datatorrent.stram.StramLocalCluster$LocalStreamingContainerLauncher run
> INFO: Started container container-5
> Apr 14, 2017 1:20:56 PM com.datatorrent.stram.Journal write
> WARNING: Journal output stream is null. Skipping write to the WAL.
> Apr 14, 2017 1:20:56 PM 
> com.datatorrent.stram.StramLocalCluster$UmbilicalProtocolLocalImpl log
> INFO: container-2 msg: [container-2] Entering heartbeat loop..
> Apr 14, 2017 1:20:56 PM 
> com.datatorrent.stram.StramLocalCluster$UmbilicalProtocolL

[jira] [Commented] (BEAM-1980) Seeming deadlock using Apex with relatively small data

2017-04-14 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15969608#comment-15969608
 ] 

Daniel Halperin commented on BEAM-1980:
---

Just repro'ed; after 5 minutes the output is at the same place (undeployed 
container [2]) and the output file has not been created. So it seems stuck 
while processing.

jstack:

{code}
2017-04-14 15:08:34
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.112-b16 mixed mode):

"Attach Listener" #121 daemon prio=9 os_prio=31 tid=0x7f91a9140800 
nid=0x9007 waiting on condition [0x]
   java.lang.Thread.State: RUNNABLE

"StorageHelper-2-1" #120 prio=5 os_prio=31 tid=0x7f91a7193000 nid=0x9a03 
waiting on condition [0x7dac]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x0006c0cd1cf0> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at 
java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:403)
at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

"ServerHelper-1-1" #119 prio=5 os_prio=31 tid=0x7f91a5a85800 nid=0x9803 
waiting on condition [0x7d9bd000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x0006c0cde730> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

"9/SumTeamScores/GroupByKey:ApexGroupByKeyOperator" #118 prio=5 os_prio=31 
tid=0x7f91a70d9800 nid=0x9603 waiting on condition [0x7d8ba000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at com.datatorrent.stram.engine.GenericNode.run(GenericNode.java:601)
at 
com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1407)

"10/SumTeamScores/ParDo(WriteWindowedFiles)/ParMultiDo(WriteWindowedFiles):ApexParDoOperator"
 #117 prio=5 os_prio=31 tid=0x7f91a8811800 nid=0x9403 waiting on condition 
[0x7d7b7000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at com.datatorrent.stram.engine.GenericNode.run(GenericNode.java:601)
at 
com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1407)

"3/SetTimestamps/ParMultiDo(SetTimestamps):ApexParDoOperator" #115 prio=5 
os_prio=31 tid=0x7f91a8860800 nid=0x8e03 waiting on condition 
[0x7d4ae000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at com.datatorrent.stram.engine.GenericNode.run(GenericNode.java:601)
at 
com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1407)

"5/SumTeamScores/ParDo(KeyScoreByTeam)/ParMultiDo(KeyScoreByTeam):ApexParDoOperator"
 #114 prio=5 os_prio=31 tid=0x7f91a8809000 nid=0x8c03 waiting on condition 
[0x7d3ab000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at com.datatorrent.stram.engine.GenericNode.run(GenericNode.java:601)
at 
com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1407)

"4/FixedWindows/Window.Assign:ApexProcessFnOperator" #112 prio=5 os_prio=31 
tid=0x7f91a6a66800 nid=0x8a03 waiting on condition [0x7d2a8000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at com.datatorrent.stram.engine.GenericNode.run(GenericNode.java:601)
at 
com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1407)

"6/SumTeamScores/Combine.perKey(SumInteger)/GroupByKey:ApexGroupByKeyOperator" 
#111 prio=5 os_prio=31 tid=0x7f91a6a89800 nid=0x8803 waiting on condition 
[0x00

[jira] [Commented] (BEAM-1980) Seeming deadlock using Apex with relatively small data

2017-04-14 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15969612#comment-15969612
 ] 

Daniel Halperin commented on BEAM-1980:
---

Dumb note: this may be a side effect of the workaround we deployed for 
[BEAM-1981], in which we made the timer data transient, so it is lost on 
checkpoint.

> Seeming deadlock using Apex with relatively small data
> --
>
> Key: BEAM-1980
> URL: https://issues.apache.org/jira/browse/BEAM-1980
> Project: Beam
>  Issue Type: Bug
>  Components: runner-apex
>Reporter: Daniel Halperin
>Assignee: Thomas Weise
> Fix For: First stable release
>
>
> I'm running the "beam portability demo" at 
> https://github.com/dhalperi/beam-portability-demo/tree/apex
> Made a very small input file:
> {code}
> gsutil cat gs://apache-beam-demo/data2/small-game.csv | head -n 10 > 
> tiny.csv
> {code}
> Ran the job in embedded mode using an Apex fat-jar from the pom in that 
> branch (and adding in {{slf4j-jdk14.jar}} for debugging info):
> {code}
> java -cp 
> ~/.m2/repository/org/slf4j/slf4j-jdk14/1.7.14/slf4j-jdk14-1.7.14.jar:target/portability-demo-bundled-apex.jar
>  demo.HourlyTeamScore --runner=ApexRunner 
> --outputPrefix=gs://clouddfe-dhalperi/output/apex --input=tiny.csv
> {code}
> A good run takes O(25 seconds):
> {code}
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/Users/dhalperi/.m2/repository/org/slf4j/slf4j-jdk14/1.7.14/slf4j-jdk14-1.7.14.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/Users/dhalperi/beam-portability-demo/target/portability-demo-bundled-apex.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.JDK14LoggerFactory]
> log4j:WARN No appenders could be found for logger 
> (org.apache.commons.beanutils.converters.BooleanConverter).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
> info.
> Apr 14, 2017 1:20:55 PM com.datatorrent.common.util.AsyncFSStorageAgent save
> INFO: using 
> /var/folders/7r/s0gg2qb11jz4g4pcb8n15gkc009j1y/T/chkp8074838277485202831 as 
> the basepath for checkpointing.
> Apr 14, 2017 1:20:56 PM com.datatorrent.bufferserver.storage.DiskStorage 
> 
> INFO: using /var/folders/7r/s0gg2qb11jz4g4pcb8n15gkc009j1y/T as the basepath 
> for spooling.
> Apr 14, 2017 1:20:56 PM com.datatorrent.bufferserver.server.Server registered
> INFO: Server started listening at /0:0:0:0:0:0:0:0:61087
> Apr 14, 2017 1:20:56 PM com.datatorrent.stram.StramLocalCluster 
> INFO: Buffer server started: localhost:61087
> Apr 14, 2017 1:20:56 PM com.datatorrent.stram.Journal write
> WARNING: Journal output stream is null. Skipping write to the WAL.
> Apr 14, 2017 1:20:56 PM 
> com.datatorrent.stram.StramLocalCluster$LocalStreamingContainerLauncher run
> INFO: Started container container-0
> Apr 14, 2017 1:20:56 PM com.datatorrent.stram.Journal write
> WARNING: Journal output stream is null. Skipping write to the WAL.
> Apr 14, 2017 1:20:56 PM 
> com.datatorrent.stram.StramLocalCluster$LocalStreamingContainerLauncher run
> INFO: Started container container-1
> Apr 14, 2017 1:20:56 PM com.datatorrent.stram.Journal write
> WARNING: Journal output stream is null. Skipping write to the WAL.
> Apr 14, 2017 1:20:56 PM 
> com.datatorrent.stram.StramLocalCluster$LocalStreamingContainerLauncher run
> INFO: Started container container-2
> Apr 14, 2017 1:20:56 PM com.datatorrent.stram.Journal write
> WARNING: Journal output stream is null. Skipping write to the WAL.
> Apr 14, 2017 1:20:56 PM 
> com.datatorrent.stram.StramLocalCluster$LocalStreamingContainerLauncher run
> INFO: Started container container-3
> Apr 14, 2017 1:20:56 PM com.datatorrent.stram.Journal write
> WARNING: Journal output stream is null. Skipping write to the WAL.
> Apr 14, 2017 1:20:56 PM 
> com.datatorrent.stram.StramLocalCluster$LocalStreamingContainerLauncher run
> INFO: Started container container-4
> Apr 14, 2017 1:20:56 PM com.datatorrent.stram.Journal write
> WARNING: Journal output stream is null. Skipping write to the WAL.
> Apr 14, 2017 1:20:56 PM 
> com.datatorrent.stram.StramLocalCluster$LocalStreamingContainerLauncher run
> INFO: Started container container-5
> Apr 14, 2017 1:20:56 PM com.datatorrent.stram.Journal write
> WARNING: Journal output stream is null. Skipping write to the WAL.
> Apr 14, 2017 1:20:56 PM 
> com.datatorrent.stram.StramLocalCluster$UmbilicalProtocolLocalImpl log
> INFO: container-2 msg: [container-2] Entering heartbeat loop..
> Apr 14, 2017 1:20:56 PM 
> com.datatorrent.stram.StramLocalCluster$UmbilicalProtocolLocalImpl log
> INF

[jira] [Updated] (BEAM-221) ProtoIO

2017-04-14 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-221:
-
Labels: newbie starter  (was: )

> ProtoIO
> ---
>
> Key: BEAM-221
> URL: https://issues.apache.org/jira/browse/BEAM-221
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-gcp
>Reporter: Daniel Halperin
>Priority: Minor
>  Labels: newbie, starter
>
> Make it easy to read and write binary files of Protobuf objects. If there is 
> a standard open source format for this, use it.
> If not, roll our own and implement it?





[jira] [Updated] (BEAM-221) ProtoIO

2017-04-14 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-221:
-
Issue Type: New Feature  (was: Bug)

> ProtoIO
> ---
>
> Key: BEAM-221
> URL: https://issues.apache.org/jira/browse/BEAM-221
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-gcp
>Reporter: Daniel Halperin
>Priority: Minor
>  Labels: newbie, starter
>
> Make it easy to read and write binary files of Protobuf objects. If there is 
> a standard open source format for this, use it.
> If not, roll our own and implement it?





[jira] [Resolved] (BEAM-1966) ApexRunner in cluster mode does not register standard FileSystems/IOChannelFactories

2017-04-14 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-1966.
---
Resolution: Fixed
  Assignee: Daniel Halperin  (was: Thomas Weise)

> ApexRunner in cluster mode does not register standard 
> FileSystems/IOChannelFactories
> 
>
> Key: BEAM-1966
> URL: https://issues.apache.org/jira/browse/BEAM-1966
> Project: Beam
>  Issue Type: Bug
>  Components: runner-apex
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
> Fix For: First stable release
>
>






[jira] [Resolved] (BEAM-1991) Update references to SumDoubleFn => Sum.ofDoubles

2017-04-17 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-1991.
---
   Resolution: Fixed
Fix Version/s: First stable release

> Update references to SumDoubleFn => Sum.ofDoubles
> -
>
> Key: BEAM-1991
> URL: https://issues.apache.org/jira/browse/BEAM-1991
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Wesley Tanaka
>Assignee: Wesley Tanaka
>Priority: Minor
> Fix For: First stable release
>
>
> https://github.com/apache/beam/commit/78a360eac35507d9a558fc6117bb56b67b8a884e
>  missed at least one





[jira] [Commented] (BEAM-1997) Scaling Problem of Beam (size of the serialized JSON representation of the pipeline exceeds the allowable limit)

2017-04-18 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15972897#comment-15972897
 ] 

Daniel Halperin commented on BEAM-1997:
---

These pipelines don't quite look like they were generated from the code you've 
posted:

* I see a repartition in the Beam graph, but it's commented out in the Beam code.
* Also, in both programs you iterate over a list of files, but it looks like 
the right one iterates over more files. That would explain the graph 
difference.

Can you confirm that when you read from the same number of input files you get 
one that will submit to Dataflow and one that won't?

Finally, to save graph size (in both programs) you can move the 
{{ParseIntoJson}} outside the {{for}} loop. That is, apply it *after* the 
{{Flatten.PCollections}}. Runners should automatically be able to choose to 
parallelize the parsing per-file.
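The effect of moving the parse step can be sketched in plain Python (the file names and contents below are made up, and {{json.loads}} stands in for {{ParseIntoJson}}): the elements come out the same either way, but the pipeline graph carries a single parse step instead of one per file.

```python
import json

# Hypothetical per-file JSON lines, standing in for the files read in the loop.
files = {
    "day1.jsonl": ['{"user": 1}', '{"user": 2}'],
    "day2.jsonl": ['{"user": 3}'],
}

# Variant A: parse inside the loop -- one parse step per file in the graph.
per_file = []
for lines in files.values():
    per_file.extend(json.loads(line) for line in lines)

# Variant B: flatten first, then parse once -- a single parse step.
flattened = [line for lines in files.values() for line in lines]
after_flatten = [json.loads(line) for line in flattened]

# Same elements either way; only the number of graph steps differs.
assert per_file == after_flatten
```

Since the parse runs after the flatten, its serialized form appears once in the job description no matter how many files feed it.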

> Scaling Problem of Beam (size of the serialized JSON representation of the 
> pipeline exceeds the allowable limit)
> 
>
> Key: BEAM-1997
> URL: https://issues.apache.org/jira/browse/BEAM-1997
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Affects Versions: 0.6.0
>Reporter: Tobias Feldhaus
>Assignee: Daniel Halperin
>
> After switching from Dataflow SDK 1.9 to Apache Beam SDK 0.6, my pipeline can 
> no longer run with 180 output days (BigQuery partitions as sinks), only with 
> 60 output days. When using a larger number with Beam, the response from the 
> Cloud Dataflow service reads as follows:
> {code}
> Failed to create a workflow job: The size of the serialized JSON 
> representation of the pipeline exceeds the allowable limit. For more 
> information, please check the FAQ link below:
> {code}
> This is the pipeline in dataflow: 
> https://gist.github.com/james-woods/f84b6784ee6d1b87b617f80f8c7dd59f
> The resulting graph in Dataflow looks like this: 
> https://puu.sh/vhWAW/a12f3246a1.png
> This is the same pipeline in beam: 
> https://gist.github.com/james-woods/c4565db769b0494e0bef5e9c334c
> The constructed graph looks somewhat different:
> https://puu.sh/vhWvm/78a40d422d.png
> Methods used are taken from this example 
> https://gist.github.com/dhalperi/4bbd13021dd5f9998250cff99b155db6





[jira] [Assigned] (BEAM-1996) Error about mixing pipelines in nosetests

2017-04-18 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin reassigned BEAM-1996:
-

Assignee: Charles Chen  (was: Thomas Groh)

> Error about mixing pipelines in nosetests
> -
>
> Key: BEAM-1996
> URL: https://issues.apache.org/jira/browse/BEAM-1996
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Vilhelm von Ehrenheim
>Assignee: Charles Chen
>Priority: Minor
>
> When testing a PTransform (defined using @ptransform_fn) that merges several 
> PCollections from different sources, the following error is raised:
> {noformat}ValueError: Mixing value from different pipelines not 
> allowed.{noformat}
> Actually running the same pipeline in GCP using the `DataflowRunner` does not 
> give any error. Neither does running the test file manually instead of 
> through nose. 
> Here is an example:
> {code:none|title=utils.py}
> # Defined in module `utils`
> @ptransform_fn
> def Join(pcolls, by):
>     return pcolls | beam.CoGroupByKey()
> {code}
> {code:none|title=test_utils.py}
> class UtilsTest(unittest.TestCase):
>     def test_join(self):
>         p = TestPipeline(runner="DirectRunner")
>         p1 = (p
>               | "Create p1" >> beam.Create([
>                   {'a': 1, 'b': 11},
>                   {'a': 2, 'b': 22},
>                   {'a': 3, 'b': 33}]))
>         p2 = (p
>               | "Create p2" >> beam.Create([
>                   {'a': 1, 'c': 111},
>                   {'a': 1, 'c': 112},
>                   {'a': 3, 'c': 333}]))
>         res = ((p1, p2) | "LeftJoin" >> utils.Join(by='a'))
>         beam.assert_that(res, beam.equal_to([
>             {'a': 1, 'b': 11, 'c': 111},
>             {'a': 1, 'b': 11, 'c': 112},
>             {'a': 2, 'b': 22},
>             {'a': 3, 'b': 33, 'c': 333}]))
>         # Run test pipeline
>         p.run()
> {code}
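The error itself comes from a same-pipeline check on the inputs of multi-input transforms. A toy model in plain Python (the names below are illustrative, not Beam's actual internals) shows the shape of that check; under nose, shared module-level state can leave a value bound to an earlier {{TestPipeline}}, which may be what trips it here.

```python
# Toy model: every value remembers its pipeline, and transforms that combine
# several values verify they all came from the same one.
class Pipeline:
    pass

class PValue:
    def __init__(self, pipeline, data):
        self.pipeline = pipeline
        self.data = data

def co_group(*pvalues):
    # Reject inputs that originate from more than one pipeline object.
    if len({id(pv.pipeline) for pv in pvalues}) > 1:
        raise ValueError("Mixing value from different pipelines not allowed.")
    return [pv.data for pv in pvalues]

p, q = Pipeline(), Pipeline()
a, b = PValue(p, [1]), PValue(p, [2])
assert co_group(a, b) == [[1], [2]]   # same pipeline: accepted

try:
    co_group(a, PValue(q, [3]))       # two pipelines: rejected
    raised = False
except ValueError:
    raised = True
assert raised
```

If two test methods (or a reused module) each construct a pipeline but a value from the first survives into the second, the combining transform sees two distinct pipeline objects and raises exactly this error.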





[jira] [Updated] (BEAM-1996) Error about mixing pipelines in nosetests

2017-04-18 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-1996:
--
Component/s: (was: runner-direct)
 sdk-py

> Error about mixing pipelines in nosetests
> -
>
> Key: BEAM-1996
> URL: https://issues.apache.org/jira/browse/BEAM-1996
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Vilhelm von Ehrenheim
>Assignee: Thomas Groh
>Priority: Minor
>
> When testing a PTransform (defined using @ptransform_fn) that merges several 
> PCollections from different sources, the following error is raised:
> {noformat}ValueError: Mixing value from different pipelines not 
> allowed.{noformat}
> Actually running the same pipeline in GCP using the `DataflowRunner` does not 
> give any error. Neither does running the test file manually instead of 
> through nose. 
> Here is an example:
> {code:none|title=utils.py}
> # Defined in module `utils`
> @ptransform_fn
> def Join(pcolls, by):
>     return pcolls | beam.CoGroupByKey()
> {code}
> {code:none|title=test_utils.py}
> class UtilsTest(unittest.TestCase):
>     def test_join(self):
>         p = TestPipeline(runner="DirectRunner")
>         p1 = (p
>               | "Create p1" >> beam.Create([
>                   {'a': 1, 'b': 11},
>                   {'a': 2, 'b': 22},
>                   {'a': 3, 'b': 33}]))
>         p2 = (p
>               | "Create p2" >> beam.Create([
>                   {'a': 1, 'c': 111},
>                   {'a': 1, 'c': 112},
>                   {'a': 3, 'c': 333}]))
>         res = ((p1, p2) | "LeftJoin" >> utils.Join(by='a'))
>         beam.assert_that(res, beam.equal_to([
>             {'a': 1, 'b': 11, 'c': 111},
>             {'a': 1, 'b': 11, 'c': 112},
>             {'a': 2, 'b': 22},
>             {'a': 3, 'b': 33, 'c': 333}]))
>         # Run test pipeline
>         p.run()
> {code}





[jira] [Commented] (BEAM-1996) Error about mixing pipelines in nosetests

2017-04-18 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15973001#comment-15973001
 ] 

Daniel Halperin commented on BEAM-1996:
---

Maybe [~charleschen] has feedback here?

> Error about mixing pipelines in nosetests
> -
>
> Key: BEAM-1996
> URL: https://issues.apache.org/jira/browse/BEAM-1996
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Vilhelm von Ehrenheim
>Assignee: Charles Chen
>Priority: Minor
>
> When testing a PTransform (defined using @ptransform_fn) that merges several 
> PCollections from different sources, the following error is raised:
> {noformat}ValueError: Mixing value from different pipelines not 
> allowed.{noformat}
> Actually running the same pipeline in GCP using the `DataflowRunner` does not 
> give any error. Neither does running the test file manually instead of 
> through nose. 
> Here is an example:
> {code:none|title=utils.py}
> # Defined in module `utils`
> @ptransform_fn
> def Join(pcolls, by):
>     return pcolls | beam.CoGroupByKey()
> {code}
> {code:none|title=test_utils.py}
> class UtilsTest(unittest.TestCase):
>     def test_join(self):
>         p = TestPipeline(runner="DirectRunner")
>         p1 = (p
>               | "Create p1" >> beam.Create([
>                   {'a': 1, 'b': 11},
>                   {'a': 2, 'b': 22},
>                   {'a': 3, 'b': 33}]))
>         p2 = (p
>               | "Create p2" >> beam.Create([
>                   {'a': 1, 'c': 111},
>                   {'a': 1, 'c': 112},
>                   {'a': 3, 'c': 333}]))
>         res = ((p1, p2) | "LeftJoin" >> utils.Join(by='a'))
>         beam.assert_that(res, beam.equal_to([
>             {'a': 1, 'b': 11, 'c': 111},
>             {'a': 1, 'b': 11, 'c': 112},
>             {'a': 2, 'b': 22},
>             {'a': 3, 'b': 33, 'c': 333}]))
>         # Run test pipeline
>         p.run()
> {code}





[jira] [Resolved] (BEAM-1272) Align the naming of "generateInitialSplits" and "splitIntoBundles" to better reflect their intention

2017-04-18 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-1272.
---
Resolution: Fixed

> Align the naming of "generateInitialSplits" and "splitIntoBundles" to better 
> reflect their intention
> 
>
> Key: BEAM-1272
> URL: https://issues.apache.org/jira/browse/BEAM-1272
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Stas Levin
>Assignee: Etienne Chauchot
>Priority: Minor
> Fix For: First stable release
>
>
> See [dev list 
> thread|https://lists.apache.org/thread.html/ac5717566707153e85da880cc75c8d047e1c6606861777670bb9107c@%3Cdev.beam.apache.org%3E].





[jira] [Commented] (BEAM-1272) Align the naming of "generateInitialSplits" and "splitIntoBundles" to better reflect their intention

2017-04-18 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15973302#comment-15973302
 ] 

Daniel Halperin commented on BEAM-1272:
---

[~echauchot] Is there work to be done in Python SDK to match?

CC: [~altay]

> Align the naming of "generateInitialSplits" and "splitIntoBundles" to better 
> reflect their intention
> 
>
> Key: BEAM-1272
> URL: https://issues.apache.org/jira/browse/BEAM-1272
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Stas Levin
>Assignee: Etienne Chauchot
>Priority: Minor
> Fix For: First stable release
>
>
> See [dev list 
> thread|https://lists.apache.org/thread.html/ac5717566707153e85da880cc75c8d047e1c6606861777670bb9107c@%3Cdev.beam.apache.org%3E].





[jira] [Commented] (BEAM-1272) Align the naming of "generateInitialSplits" and "splitIntoBundles" to better reflect their intention

2017-04-18 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15973592#comment-15973592
 ] 

Daniel Halperin commented on BEAM-1272:
---

Thanks Cham, Ahmet. And of course Stas and Etienne!

> Align the naming of "generateInitialSplits" and "splitIntoBundles" to better 
> reflect their intention
> 
>
> Key: BEAM-1272
> URL: https://issues.apache.org/jira/browse/BEAM-1272
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Stas Levin
>Assignee: Etienne Chauchot
>Priority: Minor
> Fix For: First stable release
>
>
> See [dev list 
> thread|https://lists.apache.org/thread.html/ac5717566707153e85da880cc75c8d047e1c6606861777670bb9107c@%3Cdev.beam.apache.org%3E].





[jira] [Created] (BEAM-2007) DataflowRunner drops Reads with no consumers

2017-04-18 Thread Daniel Halperin (JIRA)
Daniel Halperin created BEAM-2007:
-

 Summary: DataflowRunner drops Reads with no consumers
 Key: BEAM-2007
 URL: https://issues.apache.org/jira/browse/BEAM-2007
 Project: Beam
  Issue Type: Bug
  Components: runner-dataflow
Reporter: Daniel Halperin
Assignee: Thomas Groh
 Fix For: First stable release


Basically, if a pipeline has "just" a Read with no consumers, the optimizer in 
Dataflow will drop it. To preserve Beam semantics, we do want to run the Read 
and drop its output, e.g., because the Read may have side effects that we're 
testing for.

Is it possible with pipeline surgery to find such Reads and add an Identity 
ParDo to them?





[jira] [Commented] (BEAM-2007) DataflowRunner drops Reads with no consumers

2017-04-18 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15973737#comment-15973737
 ] 

Daniel Halperin commented on BEAM-2007:
---

Actually, not an Identity ParDo -- a dropping ParDo that outputs nothing.
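A dropping ParDo of this kind can be sketched in plain Python (the function and callback names are illustrative, not Beam API): it runs every element for its side effect and emits nothing downstream, so the optimizer still sees a consumer for the Read.

```python
def drop_all(elements, on_element):
    """Consume every element for its side effect; emit nothing downstream."""
    for element in elements:
        on_element(element)  # the side effect the Read is being run for
    return []                # downstream sees an empty collection

seen = []
out = drop_all([1, 2, 3], seen.append)

assert out == []          # nothing flows downstream
assert seen == [1, 2, 3]  # but every element was still processed
```

Grafted after a consumer-less Read, such a transform preserves the Read's execution (and side effects) without changing what the rest of the pipeline observes.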

> DataflowRunner drops Reads with no consumers
> 
>
> Key: BEAM-2007
> URL: https://issues.apache.org/jira/browse/BEAM-2007
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Daniel Halperin
>Assignee: Thomas Groh
> Fix For: First stable release
>
>
> Basically, if a pipeline has "just" a Read with no consumers, the optimizer 
> in Dataflow will drop it. To preserve Beam semantics, we do want to run the 
> Read and drop its output, e.g., because the Read may have side effects that 
> we're testing for.
> Is it possible with pipeline surgery to find such Reads and add an Identity 
> ParDo to them?





[jira] [Commented] (BEAM-2013) Upgrade to Jackson 2.8.8

2017-04-19 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15974831#comment-15974831
 ] 

Daniel Halperin commented on BEAM-2013:
---

What motivates this change?

> Upgrade to Jackson 2.8.8
> 
>
> Key: BEAM-2013
> URL: https://issues.apache.org/jira/browse/BEAM-2013
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
>






[jira] [Updated] (BEAM-2005) Add a Hadoop FileSystem implementation of Beam's FileSystem

2017-04-19 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-2005:
--
Fix Version/s: First stable release

> Add a Hadoop FileSystem implementation of Beam's FileSystem
> ---
>
> Key: BEAM-2005
> URL: https://issues.apache.org/jira/browse/BEAM-2005
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Stephen Sisk
>Assignee: Stephen Sisk
> Fix For: First stable release
>
>
> Beam's FileSystem creates an abstraction for reading from files in many 
> different places. 
> We should add a Hadoop FileSystem implementation 
> (https://hadoop.apache.org/docs/r2.8.0/api/org/apache/hadoop/fs/FileSystem.html)
>  - that would enable us to read from any file system that implements 
> FileSystem (including HDFS, azure, s3, etc..)
> I'm investigating this now.





[jira] [Updated] (BEAM-2005) Add a Hadoop FileSystem implementation of Beam's FileSystem

2017-04-19 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-2005:
--
Affects Version/s: (was: First stable release)

> Add a Hadoop FileSystem implementation of Beam's FileSystem
> ---
>
> Key: BEAM-2005
> URL: https://issues.apache.org/jira/browse/BEAM-2005
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Stephen Sisk
>Assignee: Stephen Sisk
> Fix For: First stable release
>
>
> Beam's FileSystem creates an abstraction for reading from files in many 
> different places. 
> We should add a Hadoop FileSystem implementation 
> (https://hadoop.apache.org/docs/r2.8.0/api/org/apache/hadoop/fs/FileSystem.html)
>  - that would enable us to read from any file system that implements 
> FileSystem (including HDFS, azure, s3, etc..)
> I'm investigating this now.





[jira] [Commented] (BEAM-2005) Add a Hadoop FileSystem implementation of Beam's FileSystem

2017-04-19 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15975001#comment-15975001
 ] 

Daniel Halperin commented on BEAM-2005:
---

`core` vs `extensions` -- this won't be in `sdk-java-core` itself, it will 
probably be in `sdk-java-extensions-hadoop` or whatever (just like 
`GcsFileSystem` either is moving or has moved to the new 
`sdks-java-extensions-gcp-core`).

I could see also `sdk-java-io-hadoop`, but I think this is a reasonable-ish use 
of `core`. Our JIRA tags are not perfect.

> Add a Hadoop FileSystem implementation of Beam's FileSystem
> ---
>
> Key: BEAM-2005
> URL: https://issues.apache.org/jira/browse/BEAM-2005
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Stephen Sisk
>Assignee: Stephen Sisk
> Fix For: First stable release
>
>
> Beam's FileSystem creates an abstraction for reading from files in many 
> different places. 
> We should add a Hadoop FileSystem implementation 
> (https://hadoop.apache.org/docs/r2.8.0/api/org/apache/hadoop/fs/FileSystem.html)
>  - that would enable us to read from any file system that implements 
> FileSystem (including HDFS, azure, s3, etc..)
> I'm investigating this now.





[jira] [Updated] (BEAM-1441) Add FileSystem support to Python SDK

2017-04-19 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-1441:
--
Summary: Add FileSystem support to Python SDK  (was: Add an 
IOChannelFactory interface to Python SDK)

> Add FileSystem support to Python SDK
> 
>
> Key: BEAM-1441
> URL: https://issues.apache.org/jira/browse/BEAM-1441
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py
>Reporter: Chamikara Jayalath
>Assignee: Sourabh Bajaj
> Fix For: First stable release
>
>
> Based on proposal [1], an IOChannelFactory interface was added to Java SDK  
> [2].
> We should add a similar interface to Python SDK and provide proper 
> implementations for native files, GCS, and other useful formats.
> Python SDK currently has a basic ChannelFactory interface [3] which is used 
> by FileBasedSource [4].
> [1] 
> https://docs.google.com/document/d/11TdPyZ9_zmjokhNWM3Id-XJsVG3qel2lhdKTknmZ_7M/edit#heading=h.kpqagzh8i11w
> [2] 
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/IOChannelFactory.java
> [3] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/fileio.py#L107
> [4] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/filebasedsource.py
>  





[jira] [Updated] (BEAM-2007) DataflowRunner drops Reads with no consumers

2017-04-19 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-2007:
--
Component/s: sdk-py

> DataflowRunner drops Reads with no consumers
> 
>
> Key: BEAM-2007
> URL: https://issues.apache.org/jira/browse/BEAM-2007
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py
>Reporter: Daniel Halperin
> Fix For: First stable release
>
>
> Basically, if a pipeline has "just" a Read with no consumers, the optimizer 
> in Dataflow will drop it. To preserve Beam semantics, we do want to run the 
> Read and drop its output, e.g., because the Read may have side effects that 
> we're testing for.
> Is it possible with pipeline surgery to find such Reads and add an Identity 
> ParDo to them?





[jira] [Assigned] (BEAM-2007) DataflowRunner drops Reads with no consumers

2017-04-19 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin reassigned BEAM-2007:
-

Assignee: Ahmet Altay

> DataflowRunner drops Reads with no consumers
> 
>
> Key: BEAM-2007
> URL: https://issues.apache.org/jira/browse/BEAM-2007
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py
>Reporter: Daniel Halperin
>Assignee: Ahmet Altay
> Fix For: First stable release
>
>
> Basically, if a pipeline has "just" a Read with no consumers, the optimizer 
> in Dataflow will drop it. To preserve Beam semantics, we do want to run the 
> Read and drop its output, e.g., because the Read may have side effects that 
> we're testing for.
> Is it possible with pipeline surgery to find such Reads and add an Identity 
> ParDo to them?





[jira] [Commented] (BEAM-2007) DataflowRunner drops Reads with no consumers

2017-04-19 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15975169#comment-15975169
 ] 

Daniel Halperin commented on BEAM-2007:
---

[~altay] -- assigning to you solely as FYI

> DataflowRunner drops Reads with no consumers
> 
>
> Key: BEAM-2007
> URL: https://issues.apache.org/jira/browse/BEAM-2007
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py
>Reporter: Daniel Halperin
>Assignee: Ahmet Altay
> Fix For: First stable release
>
>
> Basically, if a pipeline has "just" a Read with no consumers, the optimizer 
> in Dataflow will drop it. To preserve Beam semantics, we do want to run the 
> Read and drop its output, e.g., because the Read may have side effects that 
> we're testing for.
> Is it possible with pipeline surgery to find such Reads and add an Identity 
> ParDo to them?





[jira] [Created] (BEAM-2017) DataflowRunner: fix NullPointerException that can occur when no metrics are present

2017-04-19 Thread Daniel Halperin (JIRA)
Daniel Halperin created BEAM-2017:
-

 Summary: DataflowRunner: fix NullPointerException that can occur 
when no metrics are present
 Key: BEAM-2017
 URL: https://issues.apache.org/jira/browse/BEAM-2017
 Project: Beam
  Issue Type: Bug
  Components: runner-dataflow
Reporter: Daniel Halperin
Assignee: Daniel Halperin
 Fix For: First stable release


{code}
Exception in thread "main" java.lang.NullPointerException
at 
org.apache.beam.runners.dataflow.DataflowMetrics.populateMetricQueryResults(DataflowMetrics.java:118)
at 
org.apache.beam.runners.dataflow.DataflowMetrics.queryServiceForMetrics(DataflowMetrics.java:173)
at 
org.apache.beam.runners.dataflow.DataflowMetrics.queryMetrics(DataflowMetrics.java:186)
at 
com.google.cloud.dataflow.integration.autotuning.separation.SeparationHarness.verifySeparationInPipelineWithHangingFn(SeparationHarness.java:146)
at 
com.google.cloud.dataflow.integration.autotuning.separation.SeparationHarness.verifySeparationInPipeline(SeparationHarness.java:129)
at 
com.google.cloud.dataflow.integration.autotuning.separation.SeparationHarness.verifySeparation(SeparationHarness.java:120)
at 
com.google.cloud.dataflow.integration.autotuning.separation.InMemorySeparation.main(InMemorySeparation.java:19)
{code}





[jira] [Closed] (BEAM-2017) DataflowRunner: fix NullPointerException that can occur when no metrics are present

2017-04-19 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin closed BEAM-2017.
-
Resolution: Fixed

> DataflowRunner: fix NullPointerException that can occur when no metrics are 
> present
> ---
>
> Key: BEAM-2017
> URL: https://issues.apache.org/jira/browse/BEAM-2017
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
> Fix For: First stable release
>
>
> {code}
> Exception in thread "main" java.lang.NullPointerException
> at 
> org.apache.beam.runners.dataflow.DataflowMetrics.populateMetricQueryResults(DataflowMetrics.java:118)
> at 
> org.apache.beam.runners.dataflow.DataflowMetrics.queryServiceForMetrics(DataflowMetrics.java:173)
> at 
> org.apache.beam.runners.dataflow.DataflowMetrics.queryMetrics(DataflowMetrics.java:186)
> at 
> com.google.cloud.dataflow.integration.autotuning.separation.SeparationHarness.verifySeparationInPipelineWithHangingFn(SeparationHarness.java:146)
> at 
> com.google.cloud.dataflow.integration.autotuning.separation.SeparationHarness.verifySeparationInPipeline(SeparationHarness.java:129)
> at 
> com.google.cloud.dataflow.integration.autotuning.separation.SeparationHarness.verifySeparation(SeparationHarness.java:120)
> at 
> com.google.cloud.dataflow.integration.autotuning.separation.InMemorySeparation.main(InMemorySeparation.java:19)
> {code}





[jira] [Closed] (BEAM-919) Remove remaining old use/learn links from website src

2017-04-19 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin closed BEAM-919.

   Resolution: Fixed
 Assignee: Melissa Pashniak  (was: Frances Perry)
Fix Version/s: Not applicable

> Remove remaining old use/learn links from website src
> -
>
> Key: BEAM-919
> URL: https://issues.apache.org/jira/browse/BEAM-919
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Frances Perry
>Assignee: Melissa Pashniak
>Priority: Minor
> Fix For: Not applicable
>
>
> We still have old links lingering after the website refactoring.
> For example, the release guide 
> (https://github.com/apache/incubator-beam-site/blob/asf-site/src/contribute/release-guide.md)
>  still links to "/use/..." in a bunch of places. 
> impact: links still work because of redirects, but it's tech debt we should 
> fix.





[jira] [Closed] (BEAM-2003) Verify PAssert execution in TestDataflowRunner

2017-04-20 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin closed BEAM-2003.
-
Resolution: Fixed

This indeed works in the DataflowRunner.

> Verify PAssert execution in TestDataflowRunner
> --
>
> Key: BEAM-2003
> URL: https://issues.apache.org/jira/browse/BEAM-2003
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Aviem Zur
>Assignee: Daniel Halperin
>Priority: Blocker
> Fix For: First stable release
>
>






[jira] [Assigned] (BEAM-2012) Rebuild the Dataflow worker and remove the BoundedSource.splitIntoBundles deprecated method

2017-04-20 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin reassigned BEAM-2012:
-

Assignee: Daniel Halperin  (was: Eugene Kirpichov)

> Rebuild the Dataflow worker and remove the BoundedSource.splitIntoBundles 
> deprecated method
> ---
>
> Key: BEAM-2012
> URL: https://issues.apache.org/jira/browse/BEAM-2012
> Project: Beam
>  Issue Type: Task
>  Components: runner-dataflow
>Reporter: Etienne Chauchot
>Assignee: Daniel Halperin
>
> Linked to https://issues.apache.org/jira/browse/BEAM-1272
> Just a reminder for building/cleaning as suggested by [~dhalp...@google.com]





[jira] [Commented] (BEAM-1936) Allow user provided function to extract custom timestamp from payload in pubsubIO

2017-04-20 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15977149#comment-15977149
 ] 

Daniel Halperin commented on BEAM-1936:
---

cc [~reuvenlax]

> Allow user provided function to extract custom timestamp from payload in 
> pubsubIO
> -
>
> Key: BEAM-1936
> URL: https://issues.apache.org/jira/browse/BEAM-1936
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-gcp
>Reporter: Keith Berkoben
>
> Currently the PubsubIO runner only allows the caller to set a custom 
> timestamp if the timestamp is defined in the attributes of the message.  This 
> can be problematic when the user does not control the publisher.  In such a 
> case, proper windowing of data requires the timestamp to be pulled out of the 
> message payload.  
> Since a payload could have arbitrary data, the user would have to provide a 
> Function() that would extract the timestamp from the payload:
> PubsubIo.Read.timestampLabel(Function extractor);
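A hedged sketch of such an extractor in plain Python (the payload schema and the {{event_time_ms}} field name are invented for illustration; the real function signature would be whatever the Java API settles on):

```python
import json
from datetime import datetime, timezone

def extract_timestamp(payload: bytes) -> datetime:
    """Pull an event timestamp out of a JSON message payload.

    Assumes a hypothetical `event_time_ms` field holding epoch milliseconds.
    """
    record = json.loads(payload)
    return datetime.fromtimestamp(record["event_time_ms"] / 1000,
                                  tz=timezone.utc)

ts = extract_timestamp(b'{"event_time_ms": 1492646400000, "user": "a"}')
assert ts == datetime(2017, 4, 20, tzinfo=timezone.utc)
```

The point is that the extractor, not an attribute name, becomes the configuration: the source would invoke it on each payload and use the returned instant as the element timestamp for windowing.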





[jira] [Assigned] (BEAM-1936) Allow user provided function to extract custom timestamp from payload in pubsubIO

2017-04-20 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin reassigned BEAM-1936:
-

Assignee: (was: Daniel Halperin)

> Allow user provided function to extract custom timestamp from payload in 
> pubsubIO
> -
>
> Key: BEAM-1936
> URL: https://issues.apache.org/jira/browse/BEAM-1936
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-gcp
>Reporter: Keith Berkoben
>
> Currently the PubsubIO runner only allows the caller to set a custom 
> timestamp if the timestamp is defined in the attributes of the message.  This 
> can be problematic when the user does not control the publisher.  In such a 
> case, proper windowing of data requires the timestamp to be pulled out of the 
> message payload.  
> Since a payload could have arbitrary data, the user would have to provide a 
> Function() that would extract the timestamp from the payload:
> PubsubIo.Read.timestampLabel(Function extractor);





[jira] [Assigned] (BEAM-2026) High performance direct runner

2017-04-20 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin reassigned BEAM-2026:
-

Assignee: Mitar  (was: Thomas Groh)

> High performance direct runner
> --
>
> Key: BEAM-2026
> URL: https://issues.apache.org/jira/browse/BEAM-2026
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-direct
>Reporter: Mitar
>Assignee: Mitar
>
> In documentation (https://beam.apache.org/documentation/runners/direct/) it 
> is written that direct runner does not try to run efficiently, but it serves 
> mostly for development and debugging.
> I would suggest that there should also be an efficient direct runner. If Beam 
> tries to be a unified programming model, for some smaller tasks I would love 
> to implement them in Beam, just to keep the code in the same model, but it 
> would be OK to run it as a normal smaller program (maybe inside one Docker 
> container), without any distribution across multiple machines. In the future, 
> if usage grows, I could then replace underlying runner with something 
> distributed.





[jira] [Commented] (BEAM-2026) High performance direct runner

2017-04-20 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15977156#comment-15977156
 ] 

Daniel Halperin commented on BEAM-2026:
---

[~mitar] I think this is a completely reasonable use case. I also think that 
the Python {{DirectRunner}} has a variety of command-line flags via pipeline 
options that make it faster by turning off various enforcements.

Try it out and let us know how it goes? Which flags are most useful?

> High performance direct runner
> --
>
> Key: BEAM-2026
> URL: https://issues.apache.org/jira/browse/BEAM-2026
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-direct
>Reporter: Mitar
>Assignee: Thomas Groh
>
> In the documentation (https://beam.apache.org/documentation/runners/direct/) it 
> is written that the direct runner does not try to run efficiently, but serves 
> mostly for development and debugging.
> I would suggest that there should also be an efficient direct runner. If Beam 
> tries to be a unified programming model, for some smaller tasks I would love 
> to implement them in Beam, just to keep the code in the same model, but it 
> would be OK to run it as a normal smaller program (maybe inside one Docker 
> container), without any distribution across multiple machines. In the future, 
> if usage grows, I could then replace underlying runner with something 
> distributed.





[jira] [Resolved] (BEAM-897) Datastore ITs have invalid PipelineOptions

2017-04-20 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-897.
--
   Resolution: Fixed
Fix Version/s: Not applicable

> Datastore ITs have invalid PipelineOptions
> --
>
> Key: BEAM-897
> URL: https://issues.apache.org/jira/browse/BEAM-897
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-gcp, testing
>Reporter: Daniel Halperin
>Assignee: Vikas Kedigehalli
> Fix For: Not applicable
>
>
> https://builds.apache.org/job/beam_PostCommit_MavenVerify/1718/
> This PR: https://github.com/apache/incubator-beam/pull/1159
> checks that pipeline options cannot have multiple incompatible defaults.
> Datastore ITs currently do have multiple incompatible defaults, and this 
> should be rectified.
> cc [~pei...@gmail.com] [~lcwik]





[jira] [Closed] (BEAM-2012) Rebuild the Dataflow worker and remove the BoundedSource.splitIntoBundles deprecated method

2017-04-20 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin closed BEAM-2012.
-
   Resolution: Fixed
Fix Version/s: Not applicable

> Rebuild the Dataflow worker and remove the BoundedSource.splitIntoBundles 
> deprecated method
> ---
>
> Key: BEAM-2012
> URL: https://issues.apache.org/jira/browse/BEAM-2012
> Project: Beam
>  Issue Type: Task
>  Components: runner-dataflow
>Reporter: Etienne Chauchot
>Assignee: Daniel Halperin
> Fix For: Not applicable
>
>
> Linked to https://issues.apache.org/jira/browse/BEAM-1272
> Just a reminder for building/cleaning as suggested by [~dhalp...@google.com]





[jira] [Created] (BEAM-2040) Occasional build failures caused by AutoValue

2017-04-20 Thread Daniel Halperin (JIRA)
Daniel Halperin created BEAM-2040:
-

 Summary: Occasional build failures caused by AutoValue
 Key: BEAM-2040
 URL: https://issues.apache.org/jira/browse/BEAM-2040
 Project: Beam
  Issue Type: Bug
  Components: build-system
Reporter: Daniel Halperin
Assignee: Daniel Halperin


The following flaky compile failures appear to be fixed in AutoValue 1.4

{code}
[INFO] 
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.6.1:compile (default-compile) 
on project beam-sdks-java-extensions-gcp-core: Fatal error compiling: 
java.lang.AssertionError: java.io.IOException: Stream closed -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.6.1:compile (default-compile) 
on project beam-sdks-java-extensions-gcp-core: Fatal error compiling
at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:212)
at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
at 
org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call(MultiThreadedBuilder.java:185)
at 
org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call(MultiThreadedBuilder.java:181)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.maven.plugin.MojoExecutionException: Fatal error compiling
at 
org.apache.maven.plugin.compiler.AbstractCompilerMojo.execute(AbstractCompilerMojo.java:954)
at 
org.apache.maven.plugin.compiler.CompilerMojo.execute(CompilerMojo.java:137)
at 
org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:207)
... 11 more
Caused by: org.codehaus.plexus.compiler.CompilerException: 
java.lang.AssertionError: java.io.IOException: Stream closed
at 
org.codehaus.plexus.compiler.javac.JavaxToolsCompiler.compileInProcess(JavaxToolsCompiler.java:173)
at 
org.codehaus.plexus.compiler.javac.JavacCompiler.performCompile(JavacCompiler.java:174)
at 
org.apache.maven.plugin.compiler.AbstractCompilerMojo.execute(AbstractCompilerMojo.java:943)
... 14 more
Caused by: java.lang.RuntimeException: java.lang.AssertionError: 
java.io.IOException: Stream closed
at com.sun.tools.javac.main.Main.compile(Main.java:553)
at com.sun.tools.javac.api.JavacTaskImpl.doCall(JavacTaskImpl.java:129)
at com.sun.tools.javac.api.JavacTaskImpl.call(JavacTaskImpl.java:138)
at 
org.codehaus.plexus.compiler.javac.JavaxToolsCompiler.compileInProcess(JavaxToolsCompiler.java:126)
... 16 more
Caused by: java.lang.AssertionError: java.io.IOException: Stream closed
at 
com.google.auto.value.processor.TemplateVars.parsedTemplateForResource(TemplateVars.java:111)
at 
com.google.auto.value.processor.AutoValueTemplateVars.(AutoValueTemplateVars.java:184)
at 
com.google.auto.value.processor.AutoValueProcessor.processType(AutoValueProcessor.java:441)
at 
com.google.auto.value.processor.AutoValueProcessor.process(AutoValueProcessor.java:150)
at 
com.sun.tools.javac.processing.JavacProcessingEnvironment.callProcessor(JavacProcessingEnvironment.java:794)
at 
com.sun.tools.javac.processing.JavacProcessingEnvironment.discoverAndRunProcs(JavacProcessingEnvironment.java:705)
at 
com.sun.tools.javac.processing.JavacProcessingEnvironment.access$1800(JavacProcessingEnvironment.java:91)
at 
com.sun.tools.javac.processing.JavacProcessingEnvironment$Round.run(JavacProcessingEnvironment.java:1037)
at 
com.sun.tools.javac.processing.JavacProcessingEnvironment.doProcessing(JavacProcessingEnvironment.java:1178)
at 
com.sun.tools.javac.main.JavaCompiler.processAnnotations(JavaCompiler.java:1171)
at com.sun.tools.javac.main.JavaCompiler.compile(JavaCompiler.java:857)
at com.sun.tools.javac.main.Main.compile(Main.java:523)
... 19 more
Caused by: java.io.IOException: Stream closed
at 
java.util.zip.InflaterInputS

[jira] [Assigned] (BEAM-2035) PTransform application identifier(name) documentation.

2017-04-21 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin reassigned BEAM-2035:
-

Assignee: (was: Davor Bonaci)

> PTransform application identifier(name) documentation.
> --
>
> Key: BEAM-2035
> URL: https://issues.apache.org/jira/browse/BEAM-2035
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, sdk-py, website
>Reporter: Aviem Zur
>
> Add documentation around the user supplied/automatically created identifier 
> (name) of the application of a {{PTransform}} within the pipeline.
> Make sure to relate to how it is constructed when the application is within a 
> composite transform.
> Relate to how this name affects metrics aggregation and metrics querying.





[jira] [Commented] (BEAM-2061) Hint to template job creation in DataflowRunner / DataflowPipelineOptions

2017-04-24 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981391#comment-15981391
 ] 

Daniel Halperin commented on BEAM-2061:
---

Additionally, please refer to the Dataflow 2.0 release notes for information on 
porting between Dataflow SDK for Java 1.x and 2.x:

https://cloud.google.com/dataflow/release-notes/release-notes-java-2#replaced_templatingdataflowpipelinerunner_with_--templatelocation

> Hint to template job creation in DataflowRunner / DataflowPipelineOptions 
> --
>
> Key: BEAM-2061
> URL: https://issues.apache.org/jira/browse/BEAM-2061
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Affects Versions: 0.6.0
>Reporter: Jonas Grabber
>Assignee: Daniel Halperin
>Priority: Trivial
>  Labels: documentation
>
> Hello,
> the SDK documentation for both DataflowPipelineOptions and DataflowRunner 
> "fails" to mention the ability to create template jobs in Google Dataflow.
> Especially with the loss of the separate DataflowTemplateRunner, this caused 
> confusion amongst my colleagues and other Beam users I talked to.
> A short hint to the setTemplateLocation method would save a decent amount of 
> time across a lot of developers, I reckon.
> What's the procedure for this kind of (cosmetic) issue?





[jira] [Commented] (BEAM-1981) Serialization error with TimerInternals in ApexGroupByKeyOperator

2017-04-24 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981788#comment-15981788
 ] 

Daniel Halperin commented on BEAM-1981:
---

[~thw] do you think you'll be able to take a look at this? I'd really like to 
have this fixed for the First Stable Release -- it's a blocker for using the 
Apex Runner.

> Serialization error with TimerInternals in ApexGroupByKeyOperator
> -
>
> Key: BEAM-1981
> URL: https://issues.apache.org/jira/browse/BEAM-1981
> Project: Beam
>  Issue Type: Bug
>  Components: runner-apex
>Reporter: Daniel Halperin
>Assignee: Thomas Weise
> Fix For: First stable release
>
>
> Logs below. We tried switching to Java serialization, but that didn't work. 
> We made the field transient (which is broken but let us make progress) and 
> that did.
> Stack trace
> {code}
> com.esotericsoftware.kryo.KryoException: Class cannot be created (missing 
> no-arg constructor): 
> org.apache.beam.runners.core.AutoValue_TimerInternals_TimerData
> Serialization trace:
> activeTimers 
> (org.apache.beam.runners.apex.translation.operators.ApexGroupByKeyOperator)
>   at 
> com.esotericsoftware.kryo.Kryo$DefaultInstantiatorStrategy.newInstantiatorOf(Kryo.java:1228)
>   at com.esotericsoftware.kryo.Kryo.newInstantiator(Kryo.java:1049)
>   at com.esotericsoftware.kryo.Kryo.newInstance(Kryo.java:1058)
>   at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.create(FieldSerializer.java:547)
>   at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:523)
>   at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
>   at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:116)
>   at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:22)
>   at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
>   at 
> com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:143)
>   at 
> com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:21)
>   at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:679)
>   at 
> com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:528)
>   at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
>   at 
> com.datatorrent.common.util.FSStorageAgent.retrieve(FSStorageAgent.java:192)
>   at 
> com.datatorrent.common.util.FSStorageAgent.load(FSStorageAgent.java:137)
>   at 
> com.datatorrent.stram.engine.StreamingContainer.deployNodes(StreamingContainer.java:914)
>   at 
> com.datatorrent.stram.engine.StreamingContainer.deploy(StreamingContainer.java:862)
>   at 
> com.datatorrent.stram.engine.StreamingContainer.processHeartbeatResponse(StreamingContainer.java:820)
>   at 
> com.datatorrent.stram.engine.StreamingContainer.heartbeatLoop(StreamingContainer.java:705)
>   at 
> com.datatorrent.stram.engine.StreamingContainer.main(StreamingContainer.java:310)
>  context: 
> PTContainer[id=6(container_1492195730173_0001_01_12),state=ACTIVE,operators=[
> {code}
> Larger logs with more scope:
> {code}
> 2017-04-14 18:56:49,961 INFO com.datatorrent.stram.StreamingAppMaster: Master 
> starting with classpath: 
> ./portability-demo-bundled-apex.jar:/etc/hadoop/conf:/usr/lib/hadoop/hadoop-auth.jar:/usr/lib/hadoop/hadoop-annotations.jar:/usr/lib/hadoop/hadoop-auth-2.7.3.jar:/usr/lib/hadoop/hadoop-nfs.jar:/usr/lib/hadoop/hadoop-common-2.7.3-tests.jar:/usr/lib/hadoop/hadoop-annotations-2.7.3.jar:/usr/lib/hadoop/hadoop-nfs-2.7.3.jar:/usr/lib/hadoop/hadoop-common.jar:/usr/lib/hadoop/hadoop-common-2.7.3.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/snappy-java-1.0.5.jar:/usr/lib/hadoop/lib/curator-recipes-2.7.1.jar:/usr/lib/hadoop/lib/commons-lang-2.6.jar:/usr/lib/hadoop/lib/hamcrest-core-1.3.jar:/usr/lib/hadoop/lib/hadoop-lzo-0.4.19.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.9.13.jar:/usr/lib/hadoop/lib/commons-logging-1.1.3.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.9.13.jar:/usr/lib/hadoop/lib/jersey-core-1.9.jar:/usr/lib/hadoop/lib/netty-3.6.2.Final.jar:/usr/lib/hadoop/lib/bigquery-connector-0.10.1-hadoop2.jar:/usr/lib/hadoop/lib/slf4j-api-1.7.10.jar:/usr/lib/hadoop/lib/avro-1.7.7.jar:/usr/lib/hadoop/lib/stax-api-1.0-2.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/xz-1.0.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/lib/curator-framework-2.7.1.jar:/usr/lib/hadoop/lib/api-util-1.0.0-M20.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.10.jar:/usr/li
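[Editorial sketch] The "made the field transient" workaround mentioned in BEAM-1981 above can be illustrated with a plain `java.io` serialization sketch. The `Operator` class and `activeTimers`/`watermark` fields here are hypothetical stand-ins, not the actual `ApexGroupByKeyOperator`; the point is only that a transient field is skipped by serialization, so its state is lost on deserialization -- the "broken but let us make progress" trade-off.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for an operator with a field that breaks
// serialization; marking it transient lets the enclosing object
// serialize, at the cost of dropping that field's state.
class Operator implements Serializable {
    transient Object activeTimers = new Object(); // skipped by serialization
    int watermark = 42;                           // serialized normally
}

public class TransientDemo {
    // Serializes the operator to bytes and reads it back.
    static Operator roundTrip(Operator in) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            new ObjectOutputStream(bytes).writeObject(in);
            return (Operator) new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray())).readObject();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Operator copy = roundTrip(new Operator());
        System.out.println(copy.activeTimers); // null: transient state was dropped
        System.out.println(copy.watermark);    // 42
    }
}
```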

[jira] [Updated] (BEAM-2066) Strip IOChannel validation calls from file based IOs

2017-04-24 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-2066:
--
Labels: backward-incompatible  (was: )

> Strip IOChannel validation calls from file based IOs
> 
>
> Key: BEAM-2066
> URL: https://issues.apache.org/jira/browse/BEAM-2066
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Reporter: Stephen Sisk
>Assignee: Stephen Sisk
>Priority: Minor
>  Labels: backward-incompatible
>
> IOs like TextIO are still calling IOChannelUtils to validate the particular 
> spec they're given. This results in errors since in the new Beam FileSystem 
> world, IOChannelFactory does not know about all the registered IO schemas.
> per [~dhalp...@google.com], we can just strip out this validation code - if 
> there is an issue, we will catch it later in the read process.





[jira] [Commented] (BEAM-1716) Using Instant type in TableRow should map to the right output format for TIMESTAMP in BigQuery

2017-04-25 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15983049#comment-15983049
 ] 

Daniel Halperin commented on BEAM-1716:
---

I do not believe directly using a Joda {{Instant}} in {{TableRow}} will work 
either in Beam or in the older Dataflow SDK.

Beam uses BigQuery's streaming insert / load job API, and {{BigQueryIO}} just 
takes whatever {{TableRow}} object the user provides and serializes it to JSON. 
My understanding is that BigQuery does not support importing a Joda 
{{Instant}} directly -- it expects some form of seconds since the epoch.

Generally, your code looks good. A few tweaks might be relevant: do you 
want {{/ 1000.0}}? Do you need to care about time zones?
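[Editorial sketch] A minimal illustration of the conversion being discussed. The original code uses Joda-Time's {{Instant}}; {{java.time}} is used here only to keep the sketch dependency-free, and the ISO-8601 alternative assumes (as noted elsewhere in this thread) that BigQuery accepts that form.

```java
import java.time.Instant;

// Sketch only (not Beam/BigQueryIO code): two plausible encodings of an
// event time for a BigQuery TIMESTAMP column.
public class BigQueryTimestamp {
    // Fractional seconds since the epoch; the / 1000.0 keeps sub-second
    // precision that integer division by 1000 would drop.
    static double toEpochSeconds(Instant instant) {
        return instant.toEpochMilli() / 1000.0;
    }

    // ISO-8601 in UTC; java.time's Instant#toString, like Joda's, emits
    // this form directly.
    static String toIso8601(Instant instant) {
        return instant.toString();
    }

    public static void main(String[] args) {
        Instant t = Instant.ofEpochMilli(1500000000500L);
        System.out.println(toEpochSeconds(t)); // 1.5000000005E9
        System.out.println(toIso8601(t));      // 2017-07-14T02:40:00.500Z
    }
}
```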

> Using Instant type in TableRow should map to the right output format for 
> TIMESTAMP in BigQuery
> --
>
> Key: BEAM-1716
> URL: https://issues.apache.org/jira/browse/BEAM-1716
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-gcp
>Affects Versions: 0.5.0
>Reporter: Tobias Feldhaus
>  Labels: newbie, starter
> Fix For: First stable release
>
>
> When using an Instant type in a TableRow, and a TIMESTAMP field in the 
> TableFieldSchema, one has to convert the Instant via 
> instant.getMillis() / 1000
> to match the BigQuery TIMESTAMP that is "A positive or negative decimal 
> number. A positive number specifies the number of seconds since the epoch".
> In the Dataflow 1.9 SDK this conversion is done automatically.





[jira] [Commented] (BEAM-1716) Using Instant type in TableRow should map to the right output format for TIMESTAMP in BigQuery

2017-04-25 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15983053#comment-15983053
 ] 

Daniel Halperin commented on BEAM-1716:
---

Copying from chat: yes, I think that BigQuery accepts ISO-8601 timestamps, so 
`Instant#toString` may be enough: 
http://joda-time.sourceforge.net/apidocs/org/joda/time/base/AbstractInstant.html#toString()

> Using Instant type in TableRow should map to the right output format for 
> TIMESTAMP in BigQuery
> --
>
> Key: BEAM-1716
> URL: https://issues.apache.org/jira/browse/BEAM-1716
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-gcp
>Affects Versions: 0.5.0
>Reporter: Tobias Feldhaus
>  Labels: newbie, starter
> Fix For: First stable release
>
>
> When using an Instant type in a TableRow, and a TIMESTAMP field in the 
> TableFieldSchema, one has to convert the Instant via 
> instant.getMillis() / 1000
> to match the BigQuery TIMESTAMP that is "A positive or negative decimal 
> number. A positive number specifies the number of seconds since the epoch".
> In the Dataflow 1.9 SDK this conversion is done automatically.





[jira] [Updated] (BEAM-1716) Using Instant type in TableRow should map to the right output format for TIMESTAMP in BigQuery

2017-04-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-1716:
--
Fix Version/s: (was: First stable release)

> Using Instant type in TableRow should map to the right output format for 
> TIMESTAMP in BigQuery
> --
>
> Key: BEAM-1716
> URL: https://issues.apache.org/jira/browse/BEAM-1716
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-gcp
>Affects Versions: 0.5.0
>Reporter: Tobias Feldhaus
>  Labels: newbie, starter
> Fix For: Not applicable
>
>
> When using an Instant type in a TableRow, and a TIMESTAMP field in the 
> TableFieldSchema, one has to convert the Instant via 
> instant.getMillis() / 1000
> to match the BigQuery TIMESTAMP that is "A positive or negative decimal 
> number. A positive number specifies the number of seconds since the epoch".
> In the Dataflow 1.9 SDK this conversion is done automatically.





[jira] [Closed] (BEAM-1716) Using Instant type in TableRow should map to the right output format for TIMESTAMP in BigQuery

2017-04-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin closed BEAM-1716.
-

> Using Instant type in TableRow should map to the right output format for 
> TIMESTAMP in BigQuery
> --
>
> Key: BEAM-1716
> URL: https://issues.apache.org/jira/browse/BEAM-1716
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-gcp
>Affects Versions: 0.5.0
>Reporter: Tobias Feldhaus
>  Labels: newbie, starter
> Fix For: Not applicable
>
>
> When using an Instant type in a TableRow, and a TIMESTAMP field in the 
> TableFieldSchema, one has to convert the Instant via 
> instant.getMillis() / 1000
> to match the BigQuery TIMESTAMP that is "A positive or negative decimal 
> number. A positive number specifies the number of seconds since the epoch".
> In the Dataflow 1.9 SDK this conversion is done automatically.





[jira] [Updated] (BEAM-1716) Using Instant type in TableRow should map to the right output format for TIMESTAMP in BigQuery

2017-04-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-1716:
--
Fix Version/s: Not applicable

> Using Instant type in TableRow should map to the right output format for 
> TIMESTAMP in BigQuery
> --
>
> Key: BEAM-1716
> URL: https://issues.apache.org/jira/browse/BEAM-1716
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-gcp
>Affects Versions: 0.5.0
>Reporter: Tobias Feldhaus
>  Labels: newbie, starter
> Fix For: Not applicable
>
>
> When using an Instant type in a TableRow, and a TIMESTAMP field in the 
> TableFieldSchema, one has to convert the Instant via 
> instant.getMillis() / 1000
> to match the BigQuery TIMESTAMP that is "A positive or negative decimal 
> number. A positive number specifies the number of seconds since the epoch".
> In the Dataflow 1.9 SDK this conversion is done automatically.





[jira] [Assigned] (BEAM-2061) Hint to template job creation in DataflowRunner / DataflowPipelineOptions

2017-04-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin reassigned BEAM-2061:
-

Assignee: (was: Daniel Halperin)

> Hint to template job creation in DataflowRunner / DataflowPipelineOptions 
> --
>
> Key: BEAM-2061
> URL: https://issues.apache.org/jira/browse/BEAM-2061
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Affects Versions: 0.6.0
>Reporter: Jonas Grabber
>Priority: Trivial
>  Labels: documentation
>
> Hello,
> the SDK documentation for both DataflowPipelineOptions and DataflowRunner 
> "fails" to mention the ability to create template jobs in Google Dataflow.
> Especially with the loss of the separate DataflowTemplateRunner, this caused 
> confusion amongst my colleagues and other Beam users I talked to.
> A short hint to the setTemplateLocation method would save a decent amount of 
> time across a lot of developers, I reckon.
> What's the procedure for this kind of (cosmetic) issue?





[jira] [Assigned] (BEAM-2058) BigQuery load job id should be generated at run time, not submission time

2017-04-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin reassigned BEAM-2058:
-

Assignee: Reuven Lax  (was: Daniel Halperin)

> BigQuery load job id should be generated at run time, not submission time
> -
>
> Key: BEAM-2058
> URL: https://issues.apache.org/jira/browse/BEAM-2058
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-gcp
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>
> Currently the job id is generated at submission time, which means that 
> rerunning template jobs will produce the same job id. Generate at run time 
> instead, so a different job id is generated on each execution.
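[Editorial sketch] The distinction the issue draws -- an id captured once at submission time versus one generated on each execution -- can be sketched as follows. The names here are hypothetical, not the actual BigQueryIO code.

```java
import java.util.UUID;
import java.util.function.Supplier;

public class JobIdDemo {
    // Submission-time id: fixed when the (template) pipeline is
    // constructed, so every rerun of the template reuses this value.
    static final String SUBMISSION_TIME_ID = UUID.randomUUID().toString();

    // Run-time id: a supplier evaluated on each execution, so each rerun
    // of the same template gets a fresh id.
    static final Supplier<String> RUN_TIME_ID = () -> UUID.randomUUID().toString();

    public static void main(String[] args) {
        // Reusing the captured constant yields identical ids...
        System.out.println(SUBMISSION_TIME_ID.equals(SUBMISSION_TIME_ID)); // true
        // ...while evaluating the supplier per "execution" yields distinct ones.
        System.out.println(RUN_TIME_ID.get().equals(RUN_TIME_ID.get()));   // false
    }
}
```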





[jira] [Resolved] (BEAM-1870) ByteKey / ByteKeyRangeTracker should not use ByteString on public API surface

2017-04-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-1870.
---
Resolution: Fixed

> ByteKey / ByteKeyRangeTracker should not use ByteString on public API surface
> -
>
> Key: BEAM-1870
> URL: https://issues.apache.org/jira/browse/BEAM-1870
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
> Fix For: First stable release
>
>
> We don't want these Google Protocol Buffer dependencies on the public API. We 
> should replace the use of {{ByteString}} with something in the core Java 
> libraries.
> What's the open source standard here? I guess Avro uses {{ByteBuffer}} for 
> wrapping {{byte[]}} ?
> [~iemejia] -- tentatively assigned to you as you brought this up.





[jira] [Assigned] (BEAM-2059) Implement Metrics support for streaming Dataflow runner

2017-04-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin reassigned BEAM-2059:
-

Assignee: (was: Daniel Halperin)

> Implement Metrics support for streaming Dataflow runner
> ---
>
> Key: BEAM-2059
> URL: https://issues.apache.org/jira/browse/BEAM-2059
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-dataflow
>Reporter: Devon Meunier
>Priority: Minor
>
> Metrics are currently only available in batch mode.





[jira] [Created] (BEAM-2076) DirectRunner: minimal transitive API surface

2017-04-25 Thread Daniel Halperin (JIRA)
Daniel Halperin created BEAM-2076:
-

 Summary: DirectRunner: minimal transitive API surface
 Key: BEAM-2076
 URL: https://issues.apache.org/jira/browse/BEAM-2076
 Project: Beam
  Issue Type: Improvement
  Components: runner-direct
Reporter: Daniel Halperin
Assignee: Thomas Groh
 Fix For: First stable release


The {{DirectRunner}} is likely to accidentally be on many users' classpath. It 
should have a minimal transitive API surface, shading things it needs directly 
and need not expose.

My base assumption is that {{runners-core}} should be shaded. There may be 
others, though -- this merits a bit of a look.





[jira] [Updated] (BEAM-2076) DirectRunner: minimal transitive API surface

2017-04-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-2076:
--
Description: 
The {{DirectRunner}} is likely to accidentally be on many users' classpath when 
they are running on other runners. As such, it should have a minimal transitive 
API surface, shading things it needs directly and need not expose.

My base assumption is that {{runners-core}} should be shaded. There may be 
others, though -- this merits a bit of a look.

  was:
The {{DirectRunner}} is likely to accidentally be on many users' classpath. It 
should have a minimal transitive API surface, shading things it needs directly 
and need not expose.

My base assumption is that {{runners-core}} should be shaded. There may be 
others, though -- this merits a bit of a look.


> DirectRunner: minimal transitive API surface
> 
>
> Key: BEAM-2076
> URL: https://issues.apache.org/jira/browse/BEAM-2076
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Daniel Halperin
>Assignee: Thomas Groh
> Fix For: First stable release
>
>
> The {{DirectRunner}} is likely to accidentally be on many users' classpath 
> when they are running on other runners. As such, it should have a minimal 
> transitive API surface, shading things it needs directly and need not expose.
> My base assumption is that {{runners-core}} should be shaded. There may be 
> others, though -- this merits a bit of a look.





[jira] [Assigned] (BEAM-59) IOChannelFactory rethinking/redesign

2017-04-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-59?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin reassigned BEAM-59:
---

Assignee: Daniel Halperin  (was: Pei He)

> IOChannelFactory rethinking/redesign
> 
>
> Key: BEAM-59
> URL: https://issues.apache.org/jira/browse/BEAM-59
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core, sdk-java-gcp
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
> Fix For: First stable release
>
>
> Right now, FileBasedSource and FileBasedSink communication is mediated by 
> IOChannelFactory. There are a number of issues:
> * Global configuration -- e.g., all 'gs://' URIs use the same credentials. 
> This should be per-source/per-sink/etc.
> * Supported APIs -- currently IOChannelFactory is in the "non-public API" 
> util package and subject to change. We need users to be able to add new 
> backends ('s3://', 'hdfs://', etc.) directly, without fear that they will be 
> broken.
> * Per-backend features: e.g., creating buckets in GCS/s3, setting expiration 
> time, etc.
> Updates:
> Design docs posted on dev@ list:
> Part 1: IOChannelFactory Redesign: 
> https://docs.google.com/document/d/11TdPyZ9_zmjokhNWM3Id-XJsVG3qel2lhdKTknmZ_7M/edit#
> Part 2: Configurable BeamFileSystem:
> https://docs.google.com/document/d/1-7vo9nLRsEEzDGnb562PuL4q9mUiq_ZVpCAiyyJw8p8/edit#heading=h.p3gc3colc2cs





[jira] [Commented] (BEAM-59) IOChannelFactory rethinking/redesign

2017-04-25 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-59?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15983117#comment-15983117
 ] 

Daniel Halperin commented on BEAM-59:
-

Current status:

FileBasedSource has been converted to the FileSystems API.

FileBasedSink conversion is ongoing.

After that, I will find and destroy all remaining uses of IOChannel* and then 
delete it.

> IOChannelFactory rethinking/redesign
> 
>
> Key: BEAM-59
> URL: https://issues.apache.org/jira/browse/BEAM-59
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core, sdk-java-gcp
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
> Fix For: First stable release
>
>
> Right now, FileBasedSource and FileBasedSink communication is mediated by 
> IOChannelFactory. There are a number of issues:
> * Global configuration -- e.g., all 'gs://' URIs use the same credentials. 
> This should be per-source/per-sink/etc.
> * Supported APIs -- currently IOChannelFactory is in the "non-public API" 
> util package and subject to change. We need users to be able to add new 
> backends ('s3://', 'hdfs://', etc.) directly, without fear that they will be 
> broken.
> * Per-backend features: e.g., creating buckets in GCS/s3, setting expiration 
> time, etc.
> Updates:
> Design docs posted on dev@ list:
> Part 1: IOChannelFactory Redesign: 
> https://docs.google.com/document/d/11TdPyZ9_zmjokhNWM3Id-XJsVG3qel2lhdKTknmZ_7M/edit#
> Part 2: Configurable BeamFileSystem:
> https://docs.google.com/document/d/1-7vo9nLRsEEzDGnb562PuL4q9mUiq_ZVpCAiyyJw8p8/edit#heading=h.p3gc3colc2cs





[jira] [Updated] (BEAM-59) Switch from IOChannelFactory to FileSystems

2017-04-25 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-59?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-59:

Summary: Switch from IOChannelFactory to FileSystems  (was: 
IOChannelFactory rethinking/redesign)

> Switch from IOChannelFactory to FileSystems
> ---
>
> Key: BEAM-59
> URL: https://issues.apache.org/jira/browse/BEAM-59
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core, sdk-java-gcp
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
> Fix For: First stable release
>
>
> Right now, FileBasedSource and FileBasedSink communication is mediated by 
> IOChannelFactory. There are a number of issues:
> * Global configuration -- e.g., all 'gs://' URIs use the same credentials. 
> This should be per-source/per-sink/etc.
> * Supported APIs -- currently IOChannelFactory is in the "non-public API" 
> util package and subject to change. We need users to be able to add new 
> backends ('s3://', 'hdfs://', etc.) directly, without fear that they will be 
> broken.
> * Per-backend features: e.g., creating buckets in GCS/s3, setting expiration 
> time, etc.
> Updates:
> Design docs posted on dev@ list:
> Part 1: IOChannelFactory Redesign: 
> https://docs.google.com/document/d/11TdPyZ9_zmjokhNWM3Id-XJsVG3qel2lhdKTknmZ_7M/edit#
> Part 2: Configurable BeamFileSystem:
> https://docs.google.com/document/d/1-7vo9nLRsEEzDGnb562PuL4q9mUiq_ZVpCAiyyJw8p8/edit#heading=h.p3gc3colc2cs





[jira] [Assigned] (BEAM-2400) Null pointer exception when creating a template

2017-08-02 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin reassigned BEAM-2400:
-

Assignee: (was: Daniel Halperin)

> Null pointer exception when creating a template
> ---
>
> Key: BEAM-2400
> URL: https://issues.apache.org/jira/browse/BEAM-2400
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Melissa Pashniak
>Priority: Minor
>
> The template is successfully created, but the command then fails with a null 
> pointer exception.
> Command:
> mvn compile exec:java  -Dexec.mainClass=com.example.WordCount  
> -Dexec.args="--runner=DataflowRunner \
>   --project=my-project \
>   --stagingLocation=gs://my-bucket/staging \
>   --output=gs://my-bucket/output/outputfile \
>   --templateLocation=gs://my-bucket/templates/mytemplate"
> INFO: Template successfully created.
> [WARNING] 
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:293)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineJob.getJobWithRetries(DataflowPipelineJob.java:489)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineJob.getStateWithRetries(DataflowPipelineJob.java:465)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineJob.waitUntilFinish(DataflowPipelineJob.java:304)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineJob.waitUntilFinish(DataflowPipelineJob.java:240)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineJob.waitUntilFinish(DataflowPipelineJob.java:193)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineJob.waitUntilFinish(DataflowPipelineJob.java:186)
>   at com.example.WordCount.main(WordCount.java:184)
>   ... 6 more
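The stack trace above shows {{waitUntilFinish}} dereferencing a null inside {{getJobWithRetries}}. A plausible mitigation, sketched here with hypothetical names (this is not the real DataflowPipelineJob code): when {{--templateLocation}} produced only a template, there is no service job to poll, so the wait should guard against the missing job id instead of dereferencing it.

```java
/** Editor's sketch (hypothetical names, not the real DataflowPipelineJob):
    a template-only run has no service job, so waitUntilFinish guards against
    the missing job id instead of throwing NullPointerException. */
public class TemplateJobGuard {

  public static class PipelineJob {
    private final String jobId; // null when --templateLocation produced only a template

    public PipelineJob(String jobId) {
      this.jobId = jobId;
    }

    /** Returns a terminal-state label; a template-only run has nothing to wait on. */
    public String waitUntilFinish() {
      if (jobId == null) {
        return "TEMPLATE_CREATED"; // guard: avoid the NPE seen in the trace above
      }
      return "DONE"; // placeholder for real job polling against the service
    }
  }

  public static void main(String[] args) {
    System.out.println(new PipelineJob(null).waitUntilFinish());    // TEMPLATE_CREATED
    System.out.println(new PipelineJob("job-1").waitUntilFinish()); // DONE
  }
}
```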





[jira] [Assigned] (BEAM-1494) GcsFileSystem should check content encoding when setting IsReadSeekEfficient

2017-08-02 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin reassigned BEAM-1494:
-

Assignee: (was: Daniel Halperin)

> GcsFileSystem should check content encoding when setting IsReadSeekEfficient
> 
>
> Key: BEAM-1494
> URL: https://issues.apache.org/jira/browse/BEAM-1494
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Reporter: Pei He
>
> It is incorrect to set IsReadSeekEfficient true for files with content 
> encoding set to gzip. This is an inherited issue from GcsIOChannelFactory.
> https://cloud.google.com/storage/docs/transcoding#content-type_vs_content-encoding
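As the linked transcoding doc explains, an object stored with {{Content-Encoding: gzip}} is decompressed on the fly when served, so byte-range seeks into the logical stream are not efficient. The proposed check might look like the following editor's sketch (method name is hypothetical, not the actual GcsFileSystem code):

```java
/** Editor's sketch of the proposed check (hypothetical method name): an object
    served with Content-Encoding: gzip is transcoded on read, so byte-range seeks
    are not efficient and IsReadSeekEfficient should be false. */
public class SeekEfficiency {

  public static boolean isReadSeekEfficient(String contentEncoding) {
    // No encoding (identity) means raw bytes are served and ranged reads work.
    return contentEncoding == null || !contentEncoding.toLowerCase().contains("gzip");
  }

  public static void main(String[] args) {
    System.out.println(isReadSeekEfficient(null));   // true
    System.out.println(isReadSeekEfficient("gzip")); // false
  }
}
```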





[jira] [Assigned] (BEAM-2391) Migrate to gcloud java core for default GCP project id detection

2017-08-02 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin reassigned BEAM-2391:
-

Assignee: (was: Daniel Halperin)

> Migrate to gcloud java core for default GCP project id detection
> 
>
> Key: BEAM-2391
> URL: https://issues.apache.org/jira/browse/BEAM-2391
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Reporter: Luke Cwik
>Priority: Minor
>
> This was exposed within the gcloud java core library with the following issue:
> https://github.com/GoogleCloudPlatform/google-cloud-java/issues/1207





[jira] [Commented] (BEAM-2096) NullPointerException in DataflowMetrics

2017-04-27 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15986795#comment-15986795
 ] 

Daniel Halperin commented on BEAM-2096:
---

This looks like an actual bug in the DataflowRunner exposed by actually running 
tests. Woo!

Pablo, if this is something you know about please take a look -- but if not 
something we can triage fairly quickly then let's move on.

[~bchambers] – this type of error worries me. Before first stable release, can 
we make the DataflowMetrics code more resilient to evolution of metrics 
representations? It seems very hard to not break the current code.
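The resilience being asked for here amounts to defensive parsing of metric updates. A toy illustration (field names are invented for illustration, not the actual Dataflow metrics schema): an unknown or evolved representation should degrade to a default instead of throwing NullPointerException.

```java
import java.util.Map;

/** Editor's sketch of defensive metric parsing (field names are illustrative,
    not the actual Dataflow metrics schema): a missing or re-shaped field
    degrades to a default instead of throwing NullPointerException. */
public class DefensiveMetrics {

  public static long committedOrZero(Map<String, ?> metricUpdate) {
    Object scalar = metricUpdate.get("scalar"); // "scalar" is a hypothetical field name
    if (!(scalar instanceof Number)) {
      return 0L; // field absent or representation changed: degrade gracefully
    }
    return ((Number) scalar).longValue();
  }

  public static void main(String[] args) {
    System.out.println(committedOrZero(Map.of("scalar", 42)));   // 42
    System.out.println(committedOrZero(Map.of("other", "str"))); // 0
  }
}
```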

> NullPointerException in DataflowMetrics
> ---
>
> Key: BEAM-2096
> URL: https://issues.apache.org/jira/browse/BEAM-2096
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Kenneth Knowles
>Assignee: Pablo Estrada
>Priority: Blocker
>
> This started failing some time yesterday it looks like: 
> https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/2949/#showFailuresLink
> Possible culprit at 
> https://github.com/apache/beam/commit/4faa8feba822db000b4b42636408245422ed324d#diff-fff6f69683ac0e26db7e3ad095d7cdf0
>  since that is when it started failing.





[jira] [Comment Edited] (BEAM-2096) NullPointerException in DataflowMetrics

2017-04-27 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15986795#comment-15986795
 ] 

Daniel Halperin edited comment on BEAM-2096 at 4/27/17 3:20 PM:


This looks like an actual bug in the DataflowRunner exposed by actually running 
tests. Woo!

Pablo, if this is something you know about please take a look -- but if not 
something you can handle fairly quickly then please triage forward. Needs to 
get fixed, but your work on Aggregators is a better use of your time.

[~bchambers] – this type of error worries me. Before first stable release, can 
we make the DataflowMetrics code more resilient to evolution of metrics 
representations? It seems very hard to not break the current code.


was (Author: dhalp...@google.com):
This looks like an actual bug in the DataflowRunner exposed by actually running 
tests. Woo!

Pablo, if this is something you know about please take a look -- but if not 
something we can triage fairly quickly then let's move on.

[~bchambers] – this type of error worries me. Before first stable release, can 
we make the DataflowMetrics code more resilient to evolution of metrics 
representations? It seems very hard to not break the current code.

> NullPointerException in DataflowMetrics
> ---
>
> Key: BEAM-2096
> URL: https://issues.apache.org/jira/browse/BEAM-2096
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Kenneth Knowles
>Assignee: Pablo Estrada
>Priority: Blocker
>
> This started failing some time yesterday it looks like: 
> https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/2949/#showFailuresLink
> Possible culprit at 
> https://github.com/apache/beam/commit/4faa8feba822db000b4b42636408245422ed324d#diff-fff6f69683ac0e26db7e3ad095d7cdf0
>  since that is when it started failing.





[jira] [Updated] (BEAM-2096) NullPointerException in DataflowMetrics

2017-04-27 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-2096:
--
Fix Version/s: First stable release

> NullPointerException in DataflowMetrics
> ---
>
> Key: BEAM-2096
> URL: https://issues.apache.org/jira/browse/BEAM-2096
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Kenneth Knowles
>Assignee: Pablo Estrada
>Priority: Blocker
> Fix For: First stable release
>
>
> This started failing some time yesterday it looks like: 
> https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/2949/#showFailuresLink
> Possible culprit at 
> https://github.com/apache/beam/commit/4faa8feba822db000b4b42636408245422ed324d#diff-fff6f69683ac0e26db7e3ad095d7cdf0
>  since that is when it started failing.





[jira] [Closed] (BEAM-2096) NullPointerException in DataflowMetrics

2017-04-27 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin closed BEAM-2096.
-
Resolution: Fixed
  Assignee: Daniel Halperin  (was: Pablo Estrada)

> NullPointerException in DataflowMetrics
> ---
>
> Key: BEAM-2096
> URL: https://issues.apache.org/jira/browse/BEAM-2096
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Kenneth Knowles
>Assignee: Daniel Halperin
>Priority: Blocker
> Fix For: First stable release
>
>
> This started failing some time yesterday it looks like: 
> https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/2949/#showFailuresLink
> Possible culprit at 
> https://github.com/apache/beam/commit/4faa8feba822db000b4b42636408245422ed324d#diff-fff6f69683ac0e26db7e3ad095d7cdf0
>  since that is when it started failing.





[jira] [Closed] (BEAM-2029) NullPointerException when using multi output ParDo in Spark runner in streaming mode.

2017-04-27 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin closed BEAM-2029.
-
Resolution: Fixed

> NullPointerException when using multi output ParDo in Spark runner in 
> streaming mode.
> -
>
> Key: BEAM-2029
> URL: https://issues.apache.org/jira/browse/BEAM-2029
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Aviem Zur
> Fix For: First stable release
>
>
> Behavior:
> {{context.borrowDataset(transform)}} returns null.
> stackTrace
> {code}
> 17/04/20 15:00:58 INFO org.apache.beam.runners.spark.SparkRunner$Evaluator: 
> Evaluating GroupByKey
> Exception in thread "main" java.lang.NullPointerException
>   at 
> org.apache.beam.runners.spark.translation.streaming.StreamingTransformTranslator$6.evaluate(StreamingTransformTranslator.java:272)
>   at 
> org.apache.beam.runners.spark.translation.streaming.StreamingTransformTranslator$6.evaluate(StreamingTransformTranslator.java:267)
>   at 
> org.apache.beam.runners.spark.SparkRunner$Evaluator.doVisitTransform(SparkRunner.java:409)
>   at 
> org.apache.beam.runners.spark.SparkRunner$Evaluator.visitPrimitiveTransform(SparkRunner.java:395)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:488)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:483)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:483)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:483)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:483)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:483)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:483)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.access$400(TransformHierarchy.java:232)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:207)
>   at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:384)
>   at 
> org.apache.beam.runners.spark.translation.streaming.SparkRunnerStreamingContextFactory.call(SparkRunnerStreamingContextFactory.java:88)
>   at 
> org.apache.beam.runners.spark.translation.streaming.SparkRunnerStreamingContextFactory.call(SparkRunnerStreamingContextFactory.java:47)
>   at 
> org.apache.spark.streaming.api.java.JavaStreamingContext$$anonfun$10.apply(JavaStreamingContext.scala:776)
>   at 
> org.apache.spark.streaming.api.java.JavaStreamingContext$$anonfun$10.apply(JavaStreamingContext.scala:775)
>   at scala.Option.getOrElse(Option.scala:120)
>   at 
> org.apache.spark.streaming.StreamingContext$.getOrCreate(StreamingContext.scala:864)
>   at 
> org.apache.spark.streaming.api.java.JavaStreamingContext$.getOrCreate(JavaStreamingContext.scala:775)
>   at 
> org.apache.spark.streaming.api.java.JavaStreamingContext.getOrCreate(JavaStreamingContext.scala)
>   at org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:155)
>   at org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:85)
>   at org.apache.beam.sdk.Pipeline.run(Pipeline.java:276)
>   at 
> org.apache.beam.integration.nexmark.NexmarkRunner.run(NexmarkRunner.java:1232)
>   at 
> org.apache.beam.integration.nexmark.NexmarkDriver.runAll(NexmarkDriver.java:69)
>   at 
> org.apache.beam.integration.nexmark.drivers.NexmarkSparkDriver.main(NexmarkSparkDriver.java:46)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
> {code}





[jira] [Updated] (BEAM-2108) Integration tests for PubsubIO

2017-04-27 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-2108:
--
Issue Type: Test  (was: Bug)

> Integration tests for PubsubIO
> --
>
> Key: BEAM-2108
> URL: https://issues.apache.org/jira/browse/BEAM-2108
> Project: Beam
>  Issue Type: Test
>  Components: sdk-java-gcp, testing
>Reporter: Eugene Kirpichov
>
> PubsubIOTest doesn't contain a single test that reads or writes a Pubsub 
> topic. As far as I can tell, the public Beam repo doesn't have any tests that 
> do that, either. So, it's really easy to break Pubsub. It's particularly easy 
> to break because Dataflow runner has a special implementation of Pubsub.
> We should have tests for PubsubIO (read and write; with and without 
> attributes); running against direct runner and against Dataflow runner (since 
> Dataflow runner has a custom implementation of PubsubIO)





[jira] [Assigned] (BEAM-2108) Integration tests for PubsubIO

2017-04-27 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin reassigned BEAM-2108:
-

Assignee: (was: Daniel Halperin)

> Integration tests for PubsubIO
> --
>
> Key: BEAM-2108
> URL: https://issues.apache.org/jira/browse/BEAM-2108
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-gcp, testing
>Reporter: Eugene Kirpichov
>
> PubsubIOTest doesn't contain a single test that reads or writes a Pubsub 
> topic. As far as I can tell, the public Beam repo doesn't have any tests that 
> do that, either. So, it's really easy to break Pubsub. It's particularly easy 
> to break because Dataflow runner has a special implementation of Pubsub.
> We should have tests for PubsubIO (read and write; with and without 
> attributes); running against direct runner and against Dataflow runner (since 
> Dataflow runner has a custom implementation of PubsubIO)





[jira] [Updated] (BEAM-2108) Integration tests for PubsubIO

2017-04-27 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-2108:
--
Component/s: testing

> Integration tests for PubsubIO
> --
>
> Key: BEAM-2108
> URL: https://issues.apache.org/jira/browse/BEAM-2108
> Project: Beam
>  Issue Type: Test
>  Components: sdk-java-gcp, testing
>Reporter: Eugene Kirpichov
>
> PubsubIOTest doesn't contain a single test that reads or writes a Pubsub 
> topic. As far as I can tell, the public Beam repo doesn't have any tests that 
> do that, either. So, it's really easy to break Pubsub. It's particularly easy 
> to break because Dataflow runner has a special implementation of Pubsub.
> We should have tests for PubsubIO (read and write; with and without 
> attributes); running against direct runner and against Dataflow runner (since 
> Dataflow runner has a custom implementation of PubsubIO)





[jira] [Commented] (BEAM-2108) Integration tests for PubsubIO

2017-04-27 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15988257#comment-15988257
 ] 

Daniel Halperin commented on BEAM-2108:
---

I know that [~reuvenlax] was looking into this -- any progress?

> Integration tests for PubsubIO
> --
>
> Key: BEAM-2108
> URL: https://issues.apache.org/jira/browse/BEAM-2108
> Project: Beam
>  Issue Type: Test
>  Components: sdk-java-gcp, testing
>Reporter: Eugene Kirpichov
>
> PubsubIOTest doesn't contain a single test that reads or writes a Pubsub 
> topic. As far as I can tell, the public Beam repo doesn't have any tests that 
> do that, either. So, it's really easy to break Pubsub. It's particularly easy 
> to break because Dataflow runner has a special implementation of Pubsub.
> We should have tests for PubsubIO (read and write; with and without 
> attributes); running against direct runner and against Dataflow runner (since 
> Dataflow runner has a custom implementation of PubsubIO)





[jira] [Commented] (BEAM-2108) Integration tests for PubsubIO

2017-04-27 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15988272#comment-15988272
 ] 

Daniel Halperin commented on BEAM-2108:
---

That would allow some deeper unit tests; however, we still need integration 
tests.

> Integration tests for PubsubIO
> --
>
> Key: BEAM-2108
> URL: https://issues.apache.org/jira/browse/BEAM-2108
> Project: Beam
>  Issue Type: Test
>  Components: sdk-java-gcp, testing
>Reporter: Eugene Kirpichov
>
> PubsubIOTest doesn't contain a single test that reads or writes a Pubsub 
> topic. As far as I can tell, the public Beam repo doesn't have any tests that 
> do that, either. So, it's really easy to break Pubsub. It's particularly easy 
> to break because Dataflow runner has a special implementation of Pubsub.
> We should have tests for PubsubIO (read and write; with and without 
> attributes); running against direct runner and against Dataflow runner (since 
> Dataflow runner has a custom implementation of PubsubIO)





[jira] [Updated] (BEAM-2114) KafkaIO broken with CoderException

2017-04-28 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-2114:
--
Fix Version/s: First stable release

> KafkaIO broken with CoderException
> --
>
> Key: BEAM-2114
> URL: https://issues.apache.org/jira/browse/BEAM-2114
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Reporter: Devon Meunier
>Assignee: Daniel Halperin
> Fix For: First stable release
>
>
> For a KafkaIO.Read I simply replaced {{withKeyCoder}} and 
> {{withValueCoder}} with {{withKeyDeserializer}} and {{withValueDeserializer}} 
> using `StringDeserializer.class` on Dataflow, and I receive the following 
> stack trace:
> {code}
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:293)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: 
> org.apache.beam.sdk.coders.CoderException: cannot encode a null String
>   at 
> org.apache.beam.runners.direct.ImmutabilityCheckingBundleFactory$ImmutabilityEnforcingBundle.add(ImmutabilityCheckingBundleFactory.java:115)
>   at 
> org.apache.beam.runners.direct.UnboundedReadEvaluatorFactory$UnboundedReadEvaluator.processElement(UnboundedReadEvaluatorFactory.java:136)
>   at 
> org.apache.beam.runners.direct.TransformExecutor.processElements(TransformExecutor.java:139)
>   at 
> org.apache.beam.runners.direct.TransformExecutor.run(TransformExecutor.java:107)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   ... 1 more
> Caused by: org.apache.beam.sdk.coders.CoderException: cannot encode a null 
> String
>   at 
> org.apache.beam.sdk.coders.StringUtf8Coder.encode(StringUtf8Coder.java:75)
>   at 
> org.apache.beam.sdk.coders.StringUtf8Coder.encode(StringUtf8Coder.java:41)
>   at org.apache.beam.sdk.coders.KvCoder.encode(KvCoder.java:88)
>   at 
> org.apache.beam.sdk.io.kafka.KafkaRecordCoder.encode(KafkaRecordCoder.java:60)
>   at 
> org.apache.beam.sdk.io.kafka.KafkaRecordCoder.encode(KafkaRecordCoder.java:36)
>   at 
> org.apache.beam.sdk.util.CoderUtils.encodeToSafeStream(CoderUtils.java:122)
>   at 
> org.apache.beam.sdk.util.CoderUtils.encodeToByteArray(CoderUtils.java:106)
>   at 
> org.apache.beam.sdk.util.CoderUtils.encodeToByteArray(CoderUtils.java:91)
>   at 
> org.apache.beam.sdk.util.MutationDetectors$CodedValueMutationDetector.(MutationDetectors.java:106)
>   at 
> org.apache.beam.sdk.util.MutationDetectors.forValueWithCoder(MutationDetectors.java:44)
>   at 
> org.apache.beam.runners.direct.ImmutabilityCheckingBundleFactory$ImmutabilityEnforcingBundle.add(ImmutabilityCheckingBundleFactory.java:113)
>   ... 8 more
> {code}
> attempting to use {{readWithCoders(StringUtf8Coder.of(), 
> StringUtf8Coder.of())}} instead yields:
> {code}
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:293)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: 
> org.apache.beam.sdk.transforms.display.DisplayData$InternalBuilder$PopulateDisplayDataException:
>  Error while populating display data for component: 
> org.apache.beam.sdk.io.kafka.KafkaIO$TypedWithoutMetadata
>   at 
> org.apache.beam.sdk.transforms.display.DisplayData$InternalBuilder.include(DisplayData.java:801)
>   at 
> org.apache.beam.sdk.transforms.display.DisplayData$InternalBuilder.access$100(DisplayData.java:733)
>   at 
> org.apache.beam.sdk.transforms.display.DisplayData.from(DisplayData.java:81)
>   at 
> org.apache.beam.runners.direct.DisplayDataValidator.evaluateDisplayData(DisplayDataValidator.java:47)
>   at 
> org.apache.beam.runners.direct.DisplayDataValidator.access$100(DisplayDataValidator.java:29)
>   at 
> org.apache.beam.runners.direct.DisplayDataValidator$Visitor.enterCo

[jira] [Updated] (BEAM-2114) KafkaIO broken with CoderException

2017-04-28 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-2114:
--
Component/s: (was: runner-dataflow)
 (was: sdk-java-core)
 sdk-java-extensions

> KafkaIO broken with CoderException
> --
>
> Key: BEAM-2114
> URL: https://issues.apache.org/jira/browse/BEAM-2114
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Reporter: Devon Meunier
>Assignee: Daniel Halperin
> Fix For: First stable release
>
>
> For a KafkaIO.Read I simply replaced {{withKeyCoder}} and 
> {{withValueCoder}} with {{withKeyDeserializer}} and {{withValueDeserializer}} 
> using `StringDeserializer.class` on Dataflow, and I receive the following 
> stack trace:
> {code}
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:293)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: 
> org.apache.beam.sdk.coders.CoderException: cannot encode a null String
>   at 
> org.apache.beam.runners.direct.ImmutabilityCheckingBundleFactory$ImmutabilityEnforcingBundle.add(ImmutabilityCheckingBundleFactory.java:115)
>   at 
> org.apache.beam.runners.direct.UnboundedReadEvaluatorFactory$UnboundedReadEvaluator.processElement(UnboundedReadEvaluatorFactory.java:136)
>   at 
> org.apache.beam.runners.direct.TransformExecutor.processElements(TransformExecutor.java:139)
>   at 
> org.apache.beam.runners.direct.TransformExecutor.run(TransformExecutor.java:107)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   ... 1 more
> Caused by: org.apache.beam.sdk.coders.CoderException: cannot encode a null 
> String
>   at 
> org.apache.beam.sdk.coders.StringUtf8Coder.encode(StringUtf8Coder.java:75)
>   at 
> org.apache.beam.sdk.coders.StringUtf8Coder.encode(StringUtf8Coder.java:41)
>   at org.apache.beam.sdk.coders.KvCoder.encode(KvCoder.java:88)
>   at 
> org.apache.beam.sdk.io.kafka.KafkaRecordCoder.encode(KafkaRecordCoder.java:60)
>   at 
> org.apache.beam.sdk.io.kafka.KafkaRecordCoder.encode(KafkaRecordCoder.java:36)
>   at 
> org.apache.beam.sdk.util.CoderUtils.encodeToSafeStream(CoderUtils.java:122)
>   at 
> org.apache.beam.sdk.util.CoderUtils.encodeToByteArray(CoderUtils.java:106)
>   at 
> org.apache.beam.sdk.util.CoderUtils.encodeToByteArray(CoderUtils.java:91)
>   at 
> org.apache.beam.sdk.util.MutationDetectors$CodedValueMutationDetector.(MutationDetectors.java:106)
>   at 
> org.apache.beam.sdk.util.MutationDetectors.forValueWithCoder(MutationDetectors.java:44)
>   at 
> org.apache.beam.runners.direct.ImmutabilityCheckingBundleFactory$ImmutabilityEnforcingBundle.add(ImmutabilityCheckingBundleFactory.java:113)
>   ... 8 more
> {code}
> attempting to use {{readWithCoders(StringUtf8Coder.of(), 
> StringUtf8Coder.of())}} instead yields:
> {code}
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:293)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: 
> org.apache.beam.sdk.transforms.display.DisplayData$InternalBuilder$PopulateDisplayDataException:
>  Error while populating display data for component: 
> org.apache.beam.sdk.io.kafka.KafkaIO$TypedWithoutMetadata
>   at 
> org.apache.beam.sdk.transforms.display.DisplayData$InternalBuilder.include(DisplayData.java:801)
>   at 
> org.apache.beam.sdk.transforms.display.DisplayData$InternalBuilder.access$100(DisplayData.java:733)
>   at 
> org.apache.beam.sdk.transforms.display.DisplayData.from(DisplayData.java:81)
>   at 
> org.apache.beam.runners.direct.DisplayDataValidator.evaluateDisplayData(DisplayDataValidator.java:47)
>   at 
> org.apache.beam.runners.direct.DisplayDataValidator.access$100(DisplayDataValidator.java:29

[jira] [Commented] (BEAM-2114) KafkaIO broken with CoderException

2017-04-28 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15988914#comment-15988914
 ] 

Daniel Halperin commented on BEAM-2114:
---


When using `withXDeserializer` to parse data from the Kafka source, we still 
need a Coder for `X` (the key or value type) once the data is in the pipeline. 
How this works today is here: 
https://github.com/apache/beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L620

* We get the type from the KeyDeserializer and look for a registered coder that 
works with it.
* If no coder is registered for that type, we fall back to the coders you specify.

It looks like there are two issues:

1. Documentation and messaging: the error messages could be greatly improved.

a. Review the class-level javadoc.
b. Fix the error messaging. When no coder can be inferred and none is 
specified, the error should say something like "Unable to automatically infer 
a Coder for the Kafka Deserializer %s: no coder registered for type %s".


2. During coder inference, wrap the inferred coder in `NullableCoder` by 
default. I think this is mandatory, unless Kafka Deserializers have a specific 
way to indicate that values cannot be null.
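The two points above can be sketched self-contained. Note the {{Coder}} interface, registry, and method names below are minimal stand-ins for illustration, not the actual Beam or KafkaIO internals: inference looks up a coder for the deserializer's output type, wraps it so null values survive encoding, and only fails (with a message naming the type) when neither inference nor an explicit coder is available.

```java
import java.util.HashMap;
import java.util.Map;

public class CoderInferenceSketch {
  interface Coder<T> {
    byte[] encode(T value);
  }

  // Stand-in for Beam's NullableCoder: a presence byte makes null encodable.
  static <T> Coder<T> nullable(Coder<T> inner) {
    return value -> {
      if (value == null) {
        return new byte[] {0};
      }
      byte[] payload = inner.encode(value);
      byte[] out = new byte[payload.length + 1];
      out[0] = 1;
      System.arraycopy(payload, 0, out, 1, payload.length);
      return out;
    };
  }

  static final Map<Class<?>, Coder<?>> REGISTRY = new HashMap<>();

  static {
    Coder<String> utf8 = s -> s.getBytes();
    REGISTRY.put(String.class, utf8);
  }

  // Point 2 above: wrap whatever inference finds so nulls survive encoding.
  // Point 1b: fail with a message naming the type when nothing is found.
  @SuppressWarnings("unchecked")
  static <T> Coder<T> inferCoder(Class<T> type, Coder<T> explicitCoder) {
    Coder<T> inferred = (Coder<T>) REGISTRY.get(type);
    if (inferred == null) {
      if (explicitCoder != null) {
        return explicitCoder;
      }
      throw new IllegalArgumentException(
          "Unable to automatically infer a Coder for type " + type
              + ": no coder registered for it; please specify one explicitly");
    }
    return nullable(inferred);
  }

  public static void main(String[] args) {
    Coder<String> coder = inferCoder(String.class, null);
    System.out.println(coder.encode(null).length);  // 1: just the presence byte
    System.out.println(coder.encode("ab").length);  // 3: presence byte + 2 payload bytes
  }
}
```

With this wrapping, the "cannot encode a null String" failure in the report below would not occur, because the null key/value is encoded as a presence byte instead of reaching the inner coder.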

> KafkaIO broken with CoderException
> --
>
> Key: BEAM-2114
> URL: https://issues.apache.org/jira/browse/BEAM-2114
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Reporter: Devon Meunier
>Assignee: Daniel Halperin
> Fix For: First stable release
>
>
> For a KafkaIO.Read, I simply replaced {{withKeyCoder}} and 
> {{withValueCoder}} with {{withKeyDeserializer}} and {{withValueDeserializer}} 
> using {{StringDeserializer.class}} on Dataflow, and I receive the following 
> stack trace:
> {code}
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:293)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: 
> org.apache.beam.sdk.coders.CoderException: cannot encode a null String
>   at 
> org.apache.beam.runners.direct.ImmutabilityCheckingBundleFactory$ImmutabilityEnforcingBundle.add(ImmutabilityCheckingBundleFactory.java:115)
>   at 
> org.apache.beam.runners.direct.UnboundedReadEvaluatorFactory$UnboundedReadEvaluator.processElement(UnboundedReadEvaluatorFactory.java:136)
>   at 
> org.apache.beam.runners.direct.TransformExecutor.processElements(TransformExecutor.java:139)
>   at 
> org.apache.beam.runners.direct.TransformExecutor.run(TransformExecutor.java:107)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   ... 1 more
> Caused by: org.apache.beam.sdk.coders.CoderException: cannot encode a null 
> String
>   at 
> org.apache.beam.sdk.coders.StringUtf8Coder.encode(StringUtf8Coder.java:75)
>   at 
> org.apache.beam.sdk.coders.StringUtf8Coder.encode(StringUtf8Coder.java:41)
>   at org.apache.beam.sdk.coders.KvCoder.encode(KvCoder.java:88)
>   at 
> org.apache.beam.sdk.io.kafka.KafkaRecordCoder.encode(KafkaRecordCoder.java:60)
>   at 
> org.apache.beam.sdk.io.kafka.KafkaRecordCoder.encode(KafkaRecordCoder.java:36)
>   at 
> org.apache.beam.sdk.util.CoderUtils.encodeToSafeStream(CoderUtils.java:122)
>   at 
> org.apache.beam.sdk.util.CoderUtils.encodeToByteArray(CoderUtils.java:106)
>   at 
> org.apache.beam.sdk.util.CoderUtils.encodeToByteArray(CoderUtils.java:91)
>   at 
> org.apache.beam.sdk.util.MutationDetectors$CodedValueMutationDetector.(MutationDetectors.java:106)
>   at 
> org.apache.beam.sdk.util.MutationDetectors.forValueWithCoder(MutationDetectors.java:44)
>   at 
> org.apache.beam.runners.direct.ImmutabilityCheckingBundleFactory$ImmutabilityEnforcingBundle.add(ImmutabilityCheckingBundleFactory.java:113)
>   ... 8 more
> {code}
> attempting to use {{readWithCoders(StringUtf8Coder.of(), 
> StringUtf8Coder.of())}} instead yields:
> {code}
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>

[jira] [Assigned] (BEAM-2114) KafkaIO broken with CoderException

2017-04-28 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin reassigned BEAM-2114:
-

Assignee: Devon Meunier  (was: Daniel Halperin)

> KafkaIO broken with CoderException
> --
>
> Key: BEAM-2114
> URL: https://issues.apache.org/jira/browse/BEAM-2114
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Reporter: Devon Meunier
>Assignee: Devon Meunier
> Fix For: First stable release
>
>
> For a KafkaIO.Read, I simply replaced {{withKeyCoder}} and 
> {{withValueCoder}} with {{withKeyDeserializer}} and {{withValueDeserializer}} 
> using {{StringDeserializer.class}} on Dataflow, and I receive the following 
> stack trace:
> {code}
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:293)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: 
> org.apache.beam.sdk.coders.CoderException: cannot encode a null String
>   at 
> org.apache.beam.runners.direct.ImmutabilityCheckingBundleFactory$ImmutabilityEnforcingBundle.add(ImmutabilityCheckingBundleFactory.java:115)
>   at 
> org.apache.beam.runners.direct.UnboundedReadEvaluatorFactory$UnboundedReadEvaluator.processElement(UnboundedReadEvaluatorFactory.java:136)
>   at 
> org.apache.beam.runners.direct.TransformExecutor.processElements(TransformExecutor.java:139)
>   at 
> org.apache.beam.runners.direct.TransformExecutor.run(TransformExecutor.java:107)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   ... 1 more
> Caused by: org.apache.beam.sdk.coders.CoderException: cannot encode a null 
> String
>   at 
> org.apache.beam.sdk.coders.StringUtf8Coder.encode(StringUtf8Coder.java:75)
>   at 
> org.apache.beam.sdk.coders.StringUtf8Coder.encode(StringUtf8Coder.java:41)
>   at org.apache.beam.sdk.coders.KvCoder.encode(KvCoder.java:88)
>   at 
> org.apache.beam.sdk.io.kafka.KafkaRecordCoder.encode(KafkaRecordCoder.java:60)
>   at 
> org.apache.beam.sdk.io.kafka.KafkaRecordCoder.encode(KafkaRecordCoder.java:36)
>   at 
> org.apache.beam.sdk.util.CoderUtils.encodeToSafeStream(CoderUtils.java:122)
>   at 
> org.apache.beam.sdk.util.CoderUtils.encodeToByteArray(CoderUtils.java:106)
>   at 
> org.apache.beam.sdk.util.CoderUtils.encodeToByteArray(CoderUtils.java:91)
>   at 
> org.apache.beam.sdk.util.MutationDetectors$CodedValueMutationDetector.(MutationDetectors.java:106)
>   at 
> org.apache.beam.sdk.util.MutationDetectors.forValueWithCoder(MutationDetectors.java:44)
>   at 
> org.apache.beam.runners.direct.ImmutabilityCheckingBundleFactory$ImmutabilityEnforcingBundle.add(ImmutabilityCheckingBundleFactory.java:113)
>   ... 8 more
> {code}
> attempting to use {{readWithCoders(StringUtf8Coder.of(), 
> StringUtf8Coder.of())}} instead yields:
> {code}
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:293)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: 
> org.apache.beam.sdk.transforms.display.DisplayData$InternalBuilder$PopulateDisplayDataException:
>  Error while populating display data for component: 
> org.apache.beam.sdk.io.kafka.KafkaIO$TypedWithoutMetadata
>   at 
> org.apache.beam.sdk.transforms.display.DisplayData$InternalBuilder.include(DisplayData.java:801)
>   at 
> org.apache.beam.sdk.transforms.display.DisplayData$InternalBuilder.access$100(DisplayData.java:733)
>   at 
> org.apache.beam.sdk.transforms.display.DisplayData.from(DisplayData.java:81)
>   at 
> org.apache.beam.runners.direct.DisplayDataValidator.evaluateDisplayData(DisplayDataValidator.java:47)
>   at 
> org.apache.beam.runners.direct.DisplayDataValidator.access$100(DisplayDataValidator.java:29)
>   at 
> org.apache.beam.runners.direct.DisplayDataValidato

[jira] [Commented] (BEAM-2116) PubsubJsonClient doesn't write user created attributeMap

2017-04-28 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15989005#comment-15989005
 ] 

Daniel Halperin commented on BEAM-2116:
---

I believe this issue is fixed at HEAD. You can try it by building from HEAD or 
by using the nightly 0.7.0-SNAPSHOT from Maven.

https://github.com/apache/beam/commit/ad9df5b5591ce9d153039ac91e8862af6ea42b45

Can you try it out?

> PubsubJsonClient doesn't write user created attributeMap
> 
>
> Key: BEAM-2116
> URL: https://issues.apache.org/jira/browse/BEAM-2116
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-gcp
>Affects Versions: 0.6.0
> Environment: Java Google Dataflow
>Reporter: Christopher Reilly
>Assignee: Daniel Halperin
>Priority: Minor
>
> PubsubJsonClient, which seems to be the hard-coded client for 
> PubsubIO.write(), doesn't seem to be respecting the attributes set by the 
> user for the PubsubMessage. 
> In the PubsubJsonClient.publish() method, the passed-in OutgoingMessage that 
> contains the user-set attribute map never actually has its attributes map 
> read. Instead, a new PubsubMessage is instantiated and its empty 
> attributesMap is used. This is fixed in the PubsubGrpcClient, but that 
> client type is never used by default. 
> public int publish(TopicPath topic, List<OutgoingMessage> outgoingMessages)
>     throws IOException {
>   List<PubsubMessage> pubsubMessages = new ArrayList<>(outgoingMessages.size());
>   for (OutgoingMessage outgoingMessage : outgoingMessages) {
>     PubsubMessage pubsubMessage =
>         new PubsubMessage().encodeData(outgoingMessage.elementBytes);
>     Map<String, String> attributes = pubsubMessage.getAttributes();
>     if ((timestampLabel != null || idLabel != null) && attributes == null) {
>       attributes = new TreeMap<>();
>     }
>     if (attributes != null) {
>       pubsubMessage.setAttributes(attributes);
>     }
> Please let me know if I am going down the wrong path here.
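The bug quoted above reduces to a small self-contained sketch. The classes below are minimal stand-ins for illustration, not the real Pub/Sub client types: the broken path builds a fresh wire message and never consults the outgoing message's user-set attributes, while the fixed path copies them over.

```java
import java.util.HashMap;
import java.util.Map;

public class AttributeDropSketch {
  // Stand-in for the SDK's OutgoingMessage: payload plus user-set attributes.
  static class OutgoingMessage {
    final byte[] elementBytes;
    final Map<String, String> attributes;

    OutgoingMessage(byte[] elementBytes, Map<String, String> attributes) {
      this.elementBytes = elementBytes;
      this.attributes = attributes;
    }
  }

  // Stand-in for the JSON client's wire-level PubsubMessage.
  static class PubsubMessage {
    byte[] data;
    Map<String, String> attributes;
  }

  // Buggy behavior: a fresh PubsubMessage is built, and the user's
  // attribute map on OutgoingMessage is never read.
  static PubsubMessage publishBuggy(OutgoingMessage out) {
    PubsubMessage msg = new PubsubMessage();
    msg.data = out.elementBytes;
    return msg;  // attributes remain null: user attributes are dropped
  }

  // Fixed behavior: copy the user-supplied attributes onto the wire message.
  static PubsubMessage publishFixed(OutgoingMessage out) {
    PubsubMessage msg = new PubsubMessage();
    msg.data = out.elementBytes;
    msg.attributes = new HashMap<>(out.attributes);
    return msg;
  }

  public static void main(String[] args) {
    Map<String, String> attrs = new HashMap<>();
    attrs.put("source", "user");
    OutgoingMessage out = new OutgoingMessage(new byte[] {1}, attrs);
    System.out.println(publishBuggy(out).attributes);  // null: attributes dropped
    System.out.println(publishFixed(out).attributes);  // {source=user}
  }
}
```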



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-2116) PubsubJsonClient doesn't write user created attributeMap

2017-04-28 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15989065#comment-15989065
 ] 

Daniel Halperin commented on BEAM-2116:
---

I just mean you should change your version to {{0.7.0-SNAPSHOT}} in your 
{{pom.xml}}. Make sure you have the Apache Snapshots repository listed in the 
{{<repositories>}} section of your {{pom.xml}}. For an example, see:

https://github.com/apache/beam/blob/master/sdks/java/maven-archetypes/starter/src/test/resources/projects/basic/reference/pom.xml#L31
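As a concrete illustration, a {{<repositories>}} section along these lines should work (the repository {{id}} and {{name}} below are conventional choices, not mandated values):

```xml
<!-- In pom.xml: let Maven resolve Apache snapshot artifacts such as
     0.7.0-SNAPSHOT. Snapshots are enabled here; release artifacts still
     come from the default repositories. -->
<repositories>
  <repository>
    <id>apache.snapshots</id>
    <name>Apache Development Snapshot Repository</name>
    <url>https://repository.apache.org/content/repositories/snapshots/</url>
    <releases>
      <enabled>false</enabled>
    </releases>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
  </repository>
</repositories>
```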

> PubsubJsonClient doesn't write user created attributeMap
> 
>
> Key: BEAM-2116
> URL: https://issues.apache.org/jira/browse/BEAM-2116
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-gcp
>Affects Versions: 0.6.0
> Environment: Java Google Dataflow
>Reporter: Christopher Reilly
>Assignee: Daniel Halperin
>Priority: Minor
> Fix For: First stable release
>
>
> PubsubJsonClient, which seems to be the hard-coded client for 
> PubsubIO.write(), doesn't seem to be respecting the attributes set by the 
> user for the PubsubMessage. 
> In the PubsubJsonClient.publish() method, the passed-in OutgoingMessage that 
> contains the user-set attribute map never actually has its attributes map 
> read. Instead, a new PubsubMessage is instantiated and its empty 
> attributesMap is used. This is fixed in the PubsubGrpcClient, but that 
> client type is never used by default. 
> public int publish(TopicPath topic, List<OutgoingMessage> outgoingMessages)
>     throws IOException {
>   List<PubsubMessage> pubsubMessages = new ArrayList<>(outgoingMessages.size());
>   for (OutgoingMessage outgoingMessage : outgoingMessages) {
>     PubsubMessage pubsubMessage =
>         new PubsubMessage().encodeData(outgoingMessage.elementBytes);
>     Map<String, String> attributes = pubsubMessage.getAttributes();
>     if ((timestampLabel != null || idLabel != null) && attributes == null) {
>       attributes = new TreeMap<>();
>     }
>     if (attributes != null) {
>       pubsubMessage.setAttributes(attributes);
>     }
> Please let me know if I am going down the wrong path here.





[jira] [Closed] (BEAM-2116) PubsubJsonClient doesn't write user created attributeMap

2017-04-28 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin closed BEAM-2116.
-
   Resolution: Fixed
Fix Version/s: First stable release

> PubsubJsonClient doesn't write user created attributeMap
> 
>
> Key: BEAM-2116
> URL: https://issues.apache.org/jira/browse/BEAM-2116
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-gcp
>Affects Versions: 0.6.0
> Environment: Java Google Dataflow
>Reporter: Christopher Reilly
>Assignee: Daniel Halperin
>Priority: Minor
> Fix For: First stable release
>
>
> PubsubJsonClient, which seems to be the hard-coded client for 
> PubsubIO.write(), doesn't seem to be respecting the attributes set by the 
> user for the PubsubMessage. 
> In the PubsubJsonClient.publish() method, the passed-in OutgoingMessage that 
> contains the user-set attribute map never actually has its attributes map 
> read. Instead, a new PubsubMessage is instantiated and its empty 
> attributesMap is used. This is fixed in the PubsubGrpcClient, but that 
> client type is never used by default. 
> public int publish(TopicPath topic, List<OutgoingMessage> outgoingMessages)
>     throws IOException {
>   List<PubsubMessage> pubsubMessages = new ArrayList<>(outgoingMessages.size());
>   for (OutgoingMessage outgoingMessage : outgoingMessages) {
>     PubsubMessage pubsubMessage =
>         new PubsubMessage().encodeData(outgoingMessage.elementBytes);
>     Map<String, String> attributes = pubsubMessage.getAttributes();
>     if ((timestampLabel != null || idLabel != null) && attributes == null) {
>       attributes = new TreeMap<>();
>     }
>     if (attributes != null) {
>       pubsubMessage.setAttributes(attributes);
>     }
> Please let me know if I am going down the wrong path here.





  1   2   3   4   5   6   7   >