[jira] [Commented] (BEAM-6794) [beam_PostCommit_Java_PortabilityApi][testBigQueryStorageRead1G] coder failure

2019-03-11 Thread Kenneth Jung (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790210#comment-16790210
 ] 

Kenneth Jung commented on BEAM-6794:


It is ready to close from my perspective. [~Ardagan] can you verify?

> [beam_PostCommit_Java_PortabilityApi][testBigQueryStorageRead1G] coder failure
> --
>
> Key: BEAM-6794
> URL: https://issues.apache.org/jira/browse/BEAM-6794
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Mikhail Gryzykhin
>Assignee: Kenneth Jung
>Priority: Critical
>  Labels: currently-failing, triaged
> Fix For: Not applicable
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> First failure
> [https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/1178/]
>  
> Culprit PR:
> https://github.com/apache/beam/pull/7967



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6798) Reconsider usage of gradle release plugin

2019-03-11 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790167#comment-16790167
 ] 

Kenneth Knowles commented on BEAM-6798:
---

Are you doing modifications? Should this be assigned to you?

> Reconsider usage of gradle release plugin
> -
>
> Key: BEAM-6798
> URL: https://issues.apache.org/jira/browse/BEAM-6798
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Michael Luckey
>Priority: Major
>
> Currently, we use the gradle release plugin in a way that probably does not 
> match the plugin's own expectations. Some of this was discussed in [1].
> After the release branch is cut, we call [2]
> {noformat}
> ./gradlew release
> {noformat}
> Apart from doing some validations, this creates two commits changing the 
> version property:
>  # sets version in gradle.properties to '${RELEASE}-RC${RC_NUM}' (Commit_1)
>  # sets version in gradle.properties back to '${RELEASE}-SNAPSHOT' 
> (Commit_2)
> Commit_1 will also be tagged (tag: v${RELEASE}-RC${RC_NUM}).
> Afterwards, we continue with 'Commit_2' in testing, bundling and publishing. 
> I.e. the published source distribution is not built from the tagged commit, 
> but from its successor. This is probably suboptimal.
> The release plugin's expectations are probably more along the lines of 
> actually incrementing to the next version (either patch, minor or even major) 
> and releasing on that Commit_1.
> Based on my current understanding, it seems easier to either
>  * drop usage of gradle release plugin and just fall back to a plain 'exec 
> git tag'
>  * use a beam-release task which depends on gradle release checks, but does 
> no version changes nor commits
> The former has the drawback of also dropping the checks done by the release 
> plugin, e.g.
>  * checkCommitNeeded
>  * checkUpdateNeeded
>  * checkSnapshotDependencies
>  * runBuildTasks
>  * createReleaseTag
> which might still be valuable.
> [1] 
> [https://lists.apache.org/thread.html/205472bdaf3c2c5876533750d417c19b0d1078131a3dc04916082ce8@%3Cdev.beam.apache.org%3E]
>  [2] 
> [https://github.com/apache/beam/blob/master/release/src/main/scripts/build_release_candidate.sh#L92-L94]





[jira] [Updated] (BEAM-3204) Coders only should have a FunctionSpec, not an SdkFunctionSpec

2019-03-11 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-3204:
--
Labels: portability triaged  (was: portability)

> Coders only should have a FunctionSpec, not an SdkFunctionSpec
> --
>
> Key: BEAM-3204
> URL: https://issues.apache.org/jira/browse/BEAM-3204
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Kenneth Knowles
>Priority: Major
>  Labels: portability, triaged
>
> We added environments to coders to account for "custom" coders where it is 
> only really possible for one SDK to understand them, like this:
> {code}
> Coder {
>   spec: SdkFunctionSpec {
>     environment: "java_sdk_docker_container",
>     spec: FunctionSpec {
>       urn: "beam:coder:java_custom_coder",
>       payload: 
>     }
>   }
> }
> {code}
> But a coder must be understood by both the producer of a PCollection and its 
> consumers. A coder is not the same as other UDFs, even though custom coders 
> are user-defined.
> A pipeline where either the producer or consumer cannot handle the coder is 
> invalid, and we will have to build our cross-language APIs to prevent 
> construction of such a pipeline. So we can drop the environment.
> I think there are some folks who want to reserve the ability to add an 
> environment later, perhaps, so as not to paint ourselves into a corner. In 
> this case, we can just add a field to Coder.
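
The validity argument above can be sketched in plain Java. This is only an 
illustration under stated assumptions: FunctionSpec, Coder and isValidEdge 
below are hypothetical model types written for this note, not the Beam proto 
classes, and the "set of URNs an SDK understands" is an assumed way of 
modelling what each side can handle.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model types for illustration; these are not the Beam proto classes. */
final class CoderSpecSketch {

  /** A URN plus an opaque payload, with no environment attached. */
  static final class FunctionSpec {
    final String urn;
    final byte[] payload;

    FunctionSpec(String urn, byte[] payload) {
      this.urn = urn;
      this.payload = payload.clone();
    }
  }

  /** The proposed shape: a coder carries a FunctionSpec directly. */
  static final class Coder {
    final FunctionSpec spec;

    Coder(FunctionSpec spec) {
      this.spec = spec;
    }
  }

  /** An edge is valid only if producer and consumer both understand the coder URN. */
  static boolean isValidEdge(Coder coder, Set<String> producerUrns, Set<String> consumerUrns) {
    return producerUrns.contains(coder.spec.urn) && consumerUrns.contains(coder.spec.urn);
  }

  public static void main(String[] args) {
    Coder custom = new Coder(new FunctionSpec("beam:coder:java_custom_coder", new byte[0]));
    Set<String> javaSdk = new HashSet<>(Arrays.asList("beam:coder:java_custom_coder"));
    Set<String> otherSdk = new HashSet<>(); // an SDK that does not know the custom coder

    System.out.println(isValidEdge(custom, javaSdk, javaSdk));
    System.out.println(isValidEdge(custom, javaSdk, otherSdk));
  }
}
```

In this model the environment never enters the check: whether the pipeline 
can be constructed depends only on both sides knowing the URN, which is the 
reason given above for dropping the environment from the coder.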





[jira] [Updated] (BEAM-4497) Add pages for master Javadocs / Pydocs and incorporate into post-commit job

2019-03-11 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-4497:
--
Issue Type: New Feature  (was: Sub-task)
Parent: (was: BEAM-5671)

> Add pages for master Javadocs / Pydocs and incorporate into post-commit job
> ---
>
> Key: BEAM-4497
> URL: https://issues.apache.org/jira/browse/BEAM-4497
> Project: Beam
>  Issue Type: New Feature
>  Components: website
>Reporter: Scott Wegner
>Priority: Major
>  Labels: beam-site-automation-reliability, triage
>






[jira] [Commented] (BEAM-3743) Support for SDF splitting protocol in ULR

2019-03-11 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790138#comment-16790138
 ] 

Kenneth Knowles commented on BEAM-3743:
---

[~robertwb] is this supported in the Python ULR?

> Support for SDF splitting protocol in ULR
> -
>
> Key: BEAM-3743
> URL: https://issues.apache.org/jira/browse/BEAM-3743
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-core, runner-direct
>Reporter: Eugene Kirpichov
>Priority: Major
>  Labels: portability, triaged
> Fix For: 2.6.0
>
>
> If I understand correctly what ULR does and where it currently stands - this 
> is the task for a reference implementation of the runner side of things from 
> https://s.apache.org/beam-breaking-fusion





[jira] [Updated] (BEAM-3223) PTransform spec should not reuse FunctionSpec

2019-03-11 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-3223:
--
Labels: portability triaged  (was: portability)

> PTransform spec should not reuse FunctionSpec
> -
>
> Key: BEAM-3223
> URL: https://issues.apache.org/jira/browse/BEAM-3223
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Henning Rohde
>Priority: Major
>  Labels: portability, triaged
>
> We should add a new type instead, TransformSpec, say, or just inline a URN 
> and payload. It's confusing otherwise.





[jira] [Commented] (BEAM-3279) Deprecate and remove Coder consistentWithEquals in favor of overriding structuredValue

2019-03-11 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790142#comment-16790142
 ] 

Kenneth Knowles commented on BEAM-3279:
---

[~AlexKbit] great! I've added you to the Contributors permission so you can be 
assigned issues.

> Deprecate and remove Coder consistentWithEquals in favor of overriding 
> structuredValue
> --
>
> Key: BEAM-3279
> URL: https://issues.apache.org/jira/browse/BEAM-3279
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Priority: Minor
>  Labels: starter
>
> Summary of discussion linked below:
> consistentWithEquals() is redundant w.r.t. structuralValue(), and should be 
> deprecated. I think our mutation detectors are already using 
> structuralValue(), so the work here would be to simply mark the method 
> deprecated, remove all remaining overrides in the SDK, and document that 
> overriding the method is a no-op.
> https://lists.apache.org/thread.html/8b2dcf09ba8e46b3c008293d99e4028d10463148b68326687dc29a4d@%3Cdev.beam.apache.org%3E
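
One way to read the redundancy argument, sketched in self-contained Java. 
MiniCoder, EncodedBytes, BytesCoder and Utf8Coder are hypothetical, heavily 
simplified stand-ins written for this note and are not the Beam Coder API; 
the sketch only assumes that a coder can return an object whose equals() 
agrees with equality of the encoded form.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class StructuralValueSketch {

  /** Hypothetical, heavily simplified coder interface; not the Beam Coder class. */
  interface MiniCoder<T> {
    byte[] encode(T value);

    /** Default: compare by encoded bytes, which works whenever encoding is deterministic. */
    default Object structuralValue(T value) {
      return new EncodedBytes(encode(value));
    }
  }

  /** Wraps a byte[] so equals/hashCode compare contents rather than identity. */
  static final class EncodedBytes {
    private final byte[] bytes;

    EncodedBytes(byte[] bytes) {
      this.bytes = bytes.clone();
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof EncodedBytes && Arrays.equals(bytes, ((EncodedBytes) o).bytes);
    }

    @Override
    public int hashCode() {
      return Arrays.hashCode(bytes);
    }
  }

  /** byte[] has identity-based equals, so this coder relies on the default wrapper. */
  static final class BytesCoder implements MiniCoder<byte[]> {
    @Override
    public byte[] encode(byte[] value) {
      return value.clone();
    }
  }

  /** String.equals already agrees with encoded equality, so the value itself is returned. */
  static final class Utf8Coder implements MiniCoder<String> {
    @Override
    public byte[] encode(String value) {
      return value.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public Object structuralValue(String value) {
      return value;
    }
  }

  /** Callers compare structural values and never need a separate boolean flag. */
  static <T> boolean sameValue(MiniCoder<T> coder, T a, T b) {
    return coder.structuralValue(a).equals(coder.structuralValue(b));
  }
}
```

In this sketch a "consistent with equals" coder is simply one that overrides 
structuralValue to return the value itself, so a caller that always goes 
through sameValue has no use for a second method answering the same question.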





[jira] [Updated] (BEAM-6794) [beam_PostCommit_Java_PortabilityApi][testBigQueryStorageRead1G] coder failure

2019-03-11 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-6794:
--
Issue Type: Bug  (was: New Feature)

> [beam_PostCommit_Java_PortabilityApi][testBigQueryStorageRead1G] coder failure
> --
>
> Key: BEAM-6794
> URL: https://issues.apache.org/jira/browse/BEAM-6794
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Mikhail Gryzykhin
>Assignee: Kenneth Jung
>Priority: Major
>  Labels: currently-failing
> Fix For: Not applicable
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> First failure
> [https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/1178/]
>  
> Culprit PR:
> https://github.com/apache/beam/pull/7967





[jira] [Updated] (BEAM-4497) Add pages for master Javadocs / Pydocs and incorporate into post-commit job

2019-03-11 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-4497:
--
Labels: beam-site-automation-reliability triaged  (was: 
beam-site-automation-reliability triage)

> Add pages for master Javadocs / Pydocs and incorporate into post-commit job
> ---
>
> Key: BEAM-4497
> URL: https://issues.apache.org/jira/browse/BEAM-4497
> Project: Beam
>  Issue Type: New Feature
>  Components: website
>Reporter: Scott Wegner
>Priority: Major
>  Labels: beam-site-automation-reliability, triaged
>






[jira] [Updated] (BEAM-5525) Intuitive default behavior for sdk_location pipeline option

2019-03-11 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-5525:
--
Labels: portability triaged  (was: portability)

> Intuitive default behavior for sdk_location pipeline option
> ---
>
> Key: BEAM-5525
> URL: https://issues.apache.org/jira/browse/BEAM-5525
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Affects Versions: 2.7.0
>Reporter: Thomas Weise
>Priority: Major
>  Labels: portability, triaged
>
> The current default value of "default" implies a Dataflow specific behavior 
> of the artifact stager. The same stager is also used by the portable runner, 
> which has to specify a value "container", which actually means to not stage 
> the SDK. That should be the default behavior and the default value for the 
> sdk_location should be None. The Dataflow runner can then specify a value 
> such as "pypi" which conveys more closely the expected behavior.





[jira] [Updated] (BEAM-5397) Flink portable runner GRPC cleanup failure after user class loader was removed

2019-03-11 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-5397:
--
Labels: portability triaged  (was: portability)

> Flink portable runner GRPC cleanup failure after user class loader was removed
> --
>
> Key: BEAM-5397
> URL: https://issues.apache.org/jira/browse/BEAM-5397
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Affects Versions: 2.8.0
>Reporter: Thomas Weise
>Priority: Major
>  Labels: portability, triaged
>
> Looks like another attempt to perform cleanup after close.





[jira] [Updated] (BEAM-4497) Add pages for master Javadocs / Pydocs and incorporate into post-commit job

2019-03-11 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-4497:
--
Labels: beam-site-automation-reliability triage  (was: 
beam-site-automation-reliability)

> Add pages for master Javadocs / Pydocs and incorporate into post-commit job
> ---
>
> Key: BEAM-4497
> URL: https://issues.apache.org/jira/browse/BEAM-4497
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Scott Wegner
>Priority: Major
>  Labels: beam-site-automation-reliability, triage
>






[jira] [Commented] (BEAM-5397) Flink portable runner GRPC cleanup failure after user class loader was removed

2019-03-11 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790140#comment-16790140
 ] 

Kenneth Knowles commented on BEAM-5397:
---

How about in 2.10.0 or 2.11.0?

> Flink portable runner GRPC cleanup failure after user class loader was removed
> --
>
> Key: BEAM-5397
> URL: https://issues.apache.org/jira/browse/BEAM-5397
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Affects Versions: 2.8.0
>Reporter: Thomas Weise
>Priority: Major
>  Labels: portability, triaged
>
> Looks like another attempt to perform cleanup after close.





[jira] [Updated] (BEAM-3743) Support for SDF splitting protocol in ULR

2019-03-11 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-3743:
--
Labels: portability triaged  (was: portability)

> Support for SDF splitting protocol in ULR
> -
>
> Key: BEAM-3743
> URL: https://issues.apache.org/jira/browse/BEAM-3743
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-core, runner-direct
>Reporter: Eugene Kirpichov
>Priority: Major
>  Labels: portability, triaged
> Fix For: 2.6.0
>
>
> If I understand correctly what ULR does and where it currently stands - this 
> is the task for a reference implementation of the runner side of things from 
> https://s.apache.org/beam-breaking-fusion





[jira] [Updated] (BEAM-6794) [beam_PostCommit_Java_PortabilityApi][testBigQueryStorageRead1G] coder failure

2019-03-11 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-6794:
--
Priority: Critical  (was: Major)

> [beam_PostCommit_Java_PortabilityApi][testBigQueryStorageRead1G] coder failure
> --
>
> Key: BEAM-6794
> URL: https://issues.apache.org/jira/browse/BEAM-6794
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Mikhail Gryzykhin
>Assignee: Kenneth Jung
>Priority: Critical
>  Labels: currently-failing
> Fix For: Not applicable
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> First failure
> [https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/1178/]
>  
> Culprit PR:
> https://github.com/apache/beam/pull/7967





[jira] [Updated] (BEAM-6794) [beam_PostCommit_Java_PortabilityApi][testBigQueryStorageRead1G] coder failure

2019-03-11 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-6794:
--
Labels: currently-failing triaged  (was: currently-failing)

> [beam_PostCommit_Java_PortabilityApi][testBigQueryStorageRead1G] coder failure
> --
>
> Key: BEAM-6794
> URL: https://issues.apache.org/jira/browse/BEAM-6794
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Mikhail Gryzykhin
>Assignee: Kenneth Jung
>Priority: Critical
>  Labels: currently-failing, triaged
> Fix For: Not applicable
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> First failure
> [https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/1178/]
>  
> Culprit PR:
> https://github.com/apache/beam/pull/7967





[jira] [Commented] (BEAM-6794) [beam_PostCommit_Java_PortabilityApi][testBigQueryStorageRead1G] coder failure

2019-03-11 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790137#comment-16790137
 ] 

Kenneth Knowles commented on BEAM-6794:
---

Can this be closed?

> [beam_PostCommit_Java_PortabilityApi][testBigQueryStorageRead1G] coder failure
> --
>
> Key: BEAM-6794
> URL: https://issues.apache.org/jira/browse/BEAM-6794
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Mikhail Gryzykhin
>Assignee: Kenneth Jung
>Priority: Critical
>  Labels: currently-failing, triaged
> Fix For: Not applicable
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> First failure
> [https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/1178/]
>  
> Culprit PR:
> https://github.com/apache/beam/pull/7967





[jira] [Updated] (BEAM-6147) Python process environment factory

2019-03-11 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-6147:
--
Labels: portability portability-flink triaged  (was: portability 
portability-flink)

> Python process environment factory
> --
>
> Key: BEAM-6147
> URL: https://issues.apache.org/jira/browse/BEAM-6147
> Project: Beam
>  Issue Type: Task
>  Components: runner-flink, sdk-py-harness
>Affects Versions: 2.9.0
>Reporter: Thomas Weise
>Priority: Major
>  Labels: portability, portability-flink, triaged
>
> Provide an easy-to-use process environment factory that allows for Python 
> worker execution as an alternative to Docker. Note that we have a base that 
> the user can configure and an attempt to utilize it for the Python Flink post 
> commit test. However, that setup is specific to the Jenkins environment. 





[jira] [Work logged] (BEAM-6327) Don't attempt to fuse subtransforms of primitive/known transforms.

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6327?focusedWorklogId=211423=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211423
 ]

ASF GitHub Bot logged work on BEAM-6327:


Author: ASF GitHub Bot
Created on: 12/Mar/19 00:48
Start Date: 12/Mar/19 00:48
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #8011: [BEAM-6327] move 
pipeline trimming logic from Flink runner to core co…
URL: https://github.com/apache/beam/pull/8011#issuecomment-471800050
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 211423)
Time Spent: 1h 50m  (was: 1h 40m)

> Don't attempt to fuse subtransforms of primitive/known transforms.
> --
>
> Key: BEAM-6327
> URL: https://issues.apache.org/jira/browse/BEAM-6327
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-direct
>Reporter: Robert Bradshaw
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: triaged
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Currently we must remove all sub-components of any known transform that may 
> have an optional substructure, e.g. 
> [https://github.com/apache/beam/blob/release-2.9.0/sdks/python/apache_beam/runners/portability/portable_runner.py#L126]
>  (for GBK) and [https://github.com/apache/beam/pull/7360] (Reshuffle).





[jira] [Work logged] (BEAM-6527) Parallel tox (unit) tests run on Jenkins

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6527?focusedWorklogId=211415=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211415
 ]

ASF GitHub Bot logged work on BEAM-6527:


Author: ASF GitHub Bot
Created on: 12/Mar/19 00:35
Start Date: 12/Mar/19 00:35
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on issue #7675: [BEAM-6527] Use 
Gradle to parallel Python tox tests
URL: https://github.com/apache/beam/pull/7675#issuecomment-471797048
 
 
   PTAL @tvalentyn 
 



Issue Time Tracking
---

Worklog Id: (was: 211415)
Time Spent: 3h 20m  (was: 3h 10m)

> Parallel tox (unit) tests run on Jenkins
> 
>
> Key: BEAM-6527
> URL: https://issues.apache.org/jira/browse/BEAM-6527
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Labels: triaged
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> The existing tox unit test suites (basic, gcp and cython) will be enabled in 
> multiple versions of Python 3, which will significantly increase the runtime 
> of the Pre/PostCommit builds. Parallelism is wanted in the tox or Gradle 
> invocation to keep the time in a reasonable range (<30mins for PreCommit is 
> desired).





[jira] [Work logged] (BEAM-6527) Parallel tox (unit) tests run on Jenkins

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6527?focusedWorklogId=211386=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211386
 ]

ASF GitHub Bot logged work on BEAM-6527:


Author: ASF GitHub Bot
Created on: 11/Mar/19 23:51
Start Date: 11/Mar/19 23:51
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on pull request #7675: 
[BEAM-6527] Use Gradle to parallel Python tox tests
URL: https://github.com/apache/beam/pull/7675#discussion_r264474846
 
 

 ##
 File path: sdks/python/test-suites/tox/py3/build.gradle
 ##
 @@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * License); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * Unit tests for Python 3
+ */
+
+apply plugin: org.apache.beam.gradle.BeamModulePlugin
 
 Review comment:
   thanks! done
 



Issue Time Tracking
---

Worklog Id: (was: 211386)
Time Spent: 3h 10m  (was: 3h)

> Parallel tox (unit) tests run on Jenkins
> 
>
> Key: BEAM-6527
> URL: https://issues.apache.org/jira/browse/BEAM-6527
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Labels: triaged
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> The existing tox unit test suites (basic, gcp and cython) will be enabled in 
> multiple versions of Python 3, which will significantly increase the runtime 
> of the Pre/PostCommit builds. Parallelism is wanted in the tox or Gradle 
> invocation to keep the time in a reasonable range (<30mins for PreCommit is 
> desired).





[jira] [Work logged] (BEAM-6771) Spark Runner Fails on Certain Versions of Spark 2.X

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6771?focusedWorklogId=211384=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211384
 ]

ASF GitHub Bot logged work on BEAM-6771:


Author: ASF GitHub Bot
Created on: 11/Mar/19 23:45
Start Date: 11/Mar/19 23:45
Worklog Time Spent: 10m 
  Work Description: kyle-winkelman commented on issue #8032: [BEAM-6771] 
MetricsContainerStepMap#equals required for Spark.
URL: https://github.com/apache/beam/pull/8032#issuecomment-471785643
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 211384)
Time Spent: 40m  (was: 0.5h)

> Spark Runner Fails on Certain Versions of Spark 2.X
> ---
>
> Key: BEAM-6771
> URL: https://issues.apache.org/jira/browse/BEAM-6771
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Affects Versions: 2.11.0
>Reporter: Kyle Winkelman
>Priority: Blocker
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When updating to Beam 2.11.0, I ran into the exception at the bottom of this 
> issue while running a pipeline on the Spark Runner (which worked in 2.9.0). 
> My cluster uses Spark 2.2.1.
> Related Issues:
> SPARK-23697 (Proof that equals must be implemented for items being 
> accumulated.)
> BEAM-1920 (In PR#3808, equals was implemented on MetricsContainerStepMap to 
> get Spark to run on 2.X.)
> My analysis has led me to believe that BEAM-6138 is the reason for this 
> issue.
> Before this change, versions of Spark that are affected by SPARK-23697 would 
> create a new MetricsContainerStepMap and make sure that the copied and reset 
> instance (the one serialized for distribution) is equal to the initial empty 
> MetricsContainerStepMap that is passed in. This would effectively check if 
> two empty ConcurrentHashMaps were equal. This results in true.
> After this change, versions of Spark that are affected by SPARK-23697 would 
> effectively be checking if two empty ConcurrentHashMaps were equal _*AND*_ if 
> two different instances of the MetricsContainerImpl are equal. Because 
> MetricsContainerImpl doesn't implement equals, this results in false.
> I believe BEAM-6546 will fix this issue, but I wanted to raise a red flag. I 
> am also hoping someone can verify my analysis.
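
The before/after behaviour described above can be reproduced without Spark or 
Beam. The classes below are hypothetical stand-ins written for this note 
(they are not MetricsContainerStepMap or MetricsContainerImpl, and the field 
name "unbound" is an assumption): a step map whose equals() compares its 
per-step map and one extra container instance, once with a container that 
inherits Object#equals and once with a value-based equals.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-ins for illustration; these are not Beam or Spark classes. */
final class ZeroValueCheckSketch {

  /** Inherits Object#equals, so two distinct instances never compare equal. */
  static final class IdentityContainer {
    final Map<String, Long> counters = new ConcurrentHashMap<>();
  }

  /** The same container with value-based equals/hashCode. */
  static final class ValueContainer {
    final Map<String, Long> counters = new ConcurrentHashMap<>();

    @Override
    public boolean equals(Object other) {
      return other instanceof ValueContainer
          && counters.equals(((ValueContainer) other).counters);
    }

    @Override
    public int hashCode() {
      return counters.hashCode();
    }
  }

  /** Mimics a step map that holds one extra container next to its per-step map. */
  static final class StepMap<C> {
    final Map<String, C> perStep = new ConcurrentHashMap<>();
    final C unbound;

    StepMap(C unbound) {
      this.unbound = unbound;
    }

    @Override
    public boolean equals(Object other) {
      if (!(other instanceof StepMap)) {
        return false;
      }
      StepMap<?> that = (StepMap<?>) other;
      // Two empty maps are equal; the extra container decides the overall result.
      return perStep.equals(that.perStep) && Objects.equals(unbound, that.unbound);
    }

    @Override
    public int hashCode() {
      return Objects.hash(perStep, unbound);
    }
  }

  /** Compares a fresh "zero value" with a fresh "copy and reset" instance. */
  static boolean emptyCopiesEqualWithIdentityContainer() {
    return new StepMap<>(new IdentityContainer()).equals(new StepMap<>(new IdentityContainer()));
  }

  static boolean emptyCopiesEqualWithValueContainer() {
    return new StepMap<>(new ValueContainer()).equals(new StepMap<>(new ValueContainer()));
  }

  public static void main(String[] args) {
    System.out.println(emptyCopiesEqualWithIdentityContainer());
    System.out.println(emptyCopiesEqualWithValueContainer());
  }
}
```

The first comparison is false and the second is true, which matches the 
analysis: an equality-based zero-value check can only pass if every object 
reachable from the compared instance implements equals by value.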
> {noformat}
> ERROR ApplicationMaster: User class threw exception: 
> java.lang.RuntimeException: java.lang.AssertionError: assertion failed: 
> copyAndReset must return a zero value copy
> java.lang.RuntimeException: java.lang.AssertionError: assertion failed: 
> copyAndReset must return a zero value copy
>   at 
> org.apache.beam.runners.spark.SparkPipelineResult.runtimeExceptionFrom(SparkPipelineResult.java:54)
>   at 
> org.apache.beam.runners.spark.SparkPipelineResult.beamExceptionFrom(SparkPipelineResult.java:71)
>   at 
> org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish(SparkPipelineResult.java:98)
>   at com.optum.analyticstore.execution.Exec.run(Exec.java:276)
>   at com.optum.analyticstore.execution.Exec.main(Exec.java:364)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)
> Caused by: java.lang.AssertionError: assertion failed: copyAndReset must 
> return a zero value copy
>   at scala.Predef$.assert(Predef.scala:170)
>   at 
> org.apache.spark.util.AccumulatorV2.writeReplace(AccumulatorV2.scala:163)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> java.io.ObjectStreamClass.invokeWriteReplace(ObjectStreamClass.java:1218)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1136)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at 
> 

[jira] [Work logged] (BEAM-6703) Support Java 11 in Jenkins

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6703?focusedWorklogId=211375=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211375
 ]

ASF GitHub Bot logged work on BEAM-6703:


Author: ASF GitHub Bot
Created on: 11/Mar/19 23:11
Start Date: 11/Mar/19 23:11
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #8010: [BEAM-6703] Added a 
phrase-triggered Jenkins job to test a Direct runner with Java 11 runtime
URL: https://github.com/apache/beam/pull/8010#issuecomment-471776988
 
 
   exciting : D
 



Issue Time Tracking
---

Worklog Id: (was: 211375)
Time Spent: 6.5h  (was: 6h 20m)

> Support Java 11 in Jenkins
> --
>
> Key: BEAM-6703
> URL: https://issues.apache.org/jira/browse/BEAM-6703
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow, runner-direct
>Reporter: Michal Walenia
>Assignee: Michal Walenia
>Priority: Minor
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> In this issue I'll create a Jenkins job that compiles Dataflow and Direct 
> runners with tests using Java 8 and runs Validates Runner suites with Java 11 
> Runtime.





[jira] [Work logged] (BEAM-5985) Create jenkins jobs to run the load tests for Java SDK

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5985?focusedWorklogId=211374=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211374
 ]

ASF GitHub Bot logged work on BEAM-5985:


Author: ASF GitHub Bot
Created on: 11/Mar/19 23:09
Start Date: 11/Mar/19 23:09
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #7903: [BEAM-5985] Dataflow 
batch load test jobs
URL: https://github.com/apache/beam/pull/7903#issuecomment-471776612
 
 
   Ok this LGTM. Feel free to self-merge: )
 



Issue Time Tracking
---

Worklog Id: (was: 211374)
Time Spent: 19h 50m  (was: 19h 40m)

> Create jenkins jobs to run the load tests for Java SDK
> --
>
> Key: BEAM-5985
> URL: https://issues.apache.org/jira/browse/BEAM-5985
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Lukasz Gajowy
>Assignee: Kasia Kucharczyk
>Priority: Major
>  Time Spent: 19h 50m
>  Remaining Estimate: 0h
>
> How/how often/in what cases we run those tests is yet to be decided (this is 
> part of the task)





[jira] [Work logged] (BEAM-5985) Create jenkins jobs to run the load tests for Java SDK

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5985?focusedWorklogId=211373=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211373
 ]

ASF GitHub Bot logged work on BEAM-5985:


Author: ASF GitHub Bot
Created on: 11/Mar/19 23:09
Start Date: 11/Mar/19 23:09
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #7903: [BEAM-5985] 
Dataflow batch load test jobs
URL: https://github.com/apache/beam/pull/7903#discussion_r264466167
 
 

 ##
 File path: .test-infra/jenkins/job_LoadTests_Java.groovy
 ##
 @@ -17,123 +17,215 @@
  */
 
 import CommonJobProperties as commonJobProperties
+import CommonTestProperties
 import LoadTestsBuilder as loadTestsBuilder
 import PhraseTriggeringPostCommitBuilder
+import CronJobBuilder
 
 def loadTestConfigurations = [
 [
-jobName   : 
'beam_Java_LoadTests_GroupByKey_Dataflow_Small',
-jobDescription: 'Runs GroupByKey load tests on Dataflow 
runner small records 10b',
-itClass   : 
'org.apache.beam.sdk.loadtests.GroupByKeyLoadTest',
-prCommitStatusName: 'Java GroupByKey Small Load Test Dataflow',
-prTriggerPhrase   : 'Run GroupByKey Small Java Load Test 
Dataflow',
-runner: CommonTestProperties.Runner.DATAFLOW,
-sdk   : CommonTestProperties.SDK.JAVA,
-jobProperties : [
+title: 'Load test: 2GB of 10B records',
+itClass  : 
'org.apache.beam.sdk.loadtests.GroupByKeyLoadTest',
+runner   : CommonTestProperties.Runner.DATAFLOW,
+jobProperties: [
 project : 'apache-beam-testing',
+appName : 
'load_tests_Java_Dataflow_Batch_GBK_1',
 tempLocation: 
'gs://temp-storage-for-perf-tests/loadtests',
 publishToBigQuery   : true,
-bigQueryDataset : 'load_test_PRs',
-bigQueryTable   : 'dataflow_gbk_small',
-sourceOptions   : 
'{"numRecords":10,"splitPointFrequencyRecords":1,"keySizeBytes":1,"valueSizeBytes":9,"numHotKeys":0,"hotKeyFraction":0,"seed":123456,"bundleSizeDistribution":{"type":"const","const":42},"forceNumInitialBundles":100,"progressShape":"LINEAR","initializeDelayDistribution":{"type":"const","const":42}}',
-stepOptions : 
'{"outputRecordsPerInputRecord":1,"preservesInputKeyDistribution":true,"perBundleDelay":1,"perBundleDelayType":"MIXED","cpuUtilizationInMixedDelay":0.5}',
-fanout  : 10,
+bigQueryDataset : 'load_test',
+bigQueryTable   : 'java_dataflow_batch_GBK_1',
+sourceOptions   : """
+{
+  "numRecords": 2,
+  "keySizeBytes": 1,
+  "valueSizeBytes": 9
+}
+   """.trim().replaceAll("\\s", ""),
+fanout  : 1,
 iterations  : 1,
-maxNumWorkers   : 32,
+maxNumWorkers   : 5,
+numWorkers  : 5,
+autoscalingAlgorithm: "NONE"
 ]
-
 ],
-]
+[
+title: 'Load test: 2GB of 100B records',
+itClass  : 
'org.apache.beam.sdk.loadtests.GroupByKeyLoadTest',
+runner   : CommonTestProperties.Runner.DATAFLOW,
+jobProperties: [
+project : 'apache-beam-testing',
+appName : 
'load_tests_Java_Dataflow_Batch_GBK_2',
+tempLocation: 
'gs://temp-storage-for-perf-tests/loadtests',
+publishToBigQuery   : true,
+bigQueryDataset : 'load_test',
+bigQueryTable   : 'java_dataflow_batch_GBK_2',
+sourceOptions   : """
+{
+  "numRecords": 2000,
+  "keySizeBytes": 10,
+  "valueSizeBytes": 90
+}
+   """.trim().replaceAll("\\s", ""),
+fanout  : 1,
+iterations  : 1,
+  

[jira] [Work logged] (BEAM-6703) Support Java 11 in Jenkins

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6703?focusedWorklogId=211372=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211372
 ]

ASF GitHub Bot logged work on BEAM-6703:


Author: ASF GitHub Bot
Created on: 11/Mar/19 23:07
Start Date: 11/Mar/19 23:07
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #8010: [BEAM-6703] Added a 
phrase-triggered Jenkins job to test a Direct runner with Java 11 runtime
URL: https://github.com/apache/beam/pull/8010#issuecomment-471776042
 
 
   Thanks!
 



Issue Time Tracking
---

Worklog Id: (was: 211372)
Time Spent: 6h 20m  (was: 6h 10m)

> Support Java 11 in Jenkins
> --
>
> Key: BEAM-6703
> URL: https://issues.apache.org/jira/browse/BEAM-6703
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow, runner-direct
>Reporter: Michal Walenia
>Assignee: Michal Walenia
>Priority: Minor
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> In this issue I'll create a Jenkins job that compiles Dataflow and Direct 
> runners with tests using Java 8 and runs Validates Runner suites with Java 11 
> Runtime.





[jira] [Work logged] (BEAM-6703) Support Java 11 in Jenkins

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6703?focusedWorklogId=211371=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211371
 ]

ASF GitHub Bot logged work on BEAM-6703:


Author: ASF GitHub Bot
Created on: 11/Mar/19 23:07
Start Date: 11/Mar/19 23:07
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #8010: [BEAM-6703] 
Added a phrase-triggered Jenkins job to test a Direct runner with Java 11 
runtime
URL: https://github.com/apache/beam/pull/8010
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 211371)
Time Spent: 6h 10m  (was: 6h)

> Support Java 11 in Jenkins
> --
>
> Key: BEAM-6703
> URL: https://issues.apache.org/jira/browse/BEAM-6703
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow, runner-direct
>Reporter: Michal Walenia
>Assignee: Michal Walenia
>Priority: Minor
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> In this issue I'll create a Jenkins job that compiles Dataflow and Direct 
> runners with tests using Java 8 and runs Validates Runner suites with Java 11 
> Runtime.





[jira] [Work logged] (BEAM-6771) Spark Runner Fails on Certain Versions of Spark 2.X

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6771?focusedWorklogId=211370=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211370
 ]

ASF GitHub Bot logged work on BEAM-6771:


Author: ASF GitHub Bot
Created on: 11/Mar/19 23:04
Start Date: 11/Mar/19 23:04
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #8032: [BEAM-6771] 
MetricsContainerStepMap#equals required for Spark.
URL: https://github.com/apache/beam/pull/8032#issuecomment-471775206
 
 
   R: @ajamato 
 



Issue Time Tracking
---

Worklog Id: (was: 211370)
Time Spent: 0.5h  (was: 20m)

> Spark Runner Fails on Certain Versions of Spark 2.X
> ---
>
> Key: BEAM-6771
> URL: https://issues.apache.org/jira/browse/BEAM-6771
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Affects Versions: 2.11.0
>Reporter: Kyle Winkelman
>Priority: Blocker
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When updating to Beam 2.11.0, I ran into the exception at the bottom of this 
> issue while running a pipeline on the Spark Runner (which worked in 2.9.0). 
> My cluster uses Spark 2.2.1.
> Related Issues:
> SPARK-23697 (Proof that equals must be implemented for items being 
> accumulated.)
> BEAM-1920 (In PR#3808, equals was implemented on MetricsContainerStepMap to 
> get Spark to run on 2.X.)
> My analysis has led me to believe that BEAM-6138 is the reason for this 
> issue.
> Before this change, versions of Spark that are affected by SPARK-23697 would 
> create a new MetricsContainerStepMap and make sure that the copied and reset 
> instance (the one serialized for distribution) is equal to the initial empty 
> MetricsContainerStepMap that is passed in. This would effectively check if 
> two empty ConcurrentHashMaps were equal. This results in true.
> After this change, versions of Spark that are affected by SPARK-23697 would 
> effectively be checking if two empty ConcurrentHashMaps were equal _*AND*_ if 
> two different instances of the MetricsContainerImpl are equal. Because 
> MetricsContainerImpl doesn't implement equals, this results in false.
> I believe BEAM-6546 will fix this issue, but I wanted to raise a red flag. I 
> am also hoping someone can verify my analysis.
> {noformat}
> ERROR ApplicationMaster: User class threw exception: 
> java.lang.RuntimeException: java.lang.AssertionError: assertion failed: 
> copyAndReset must return a zero value copy
> java.lang.RuntimeException: java.lang.AssertionError: assertion failed: 
> copyAndReset must return a zero value copy
>   at 
> org.apache.beam.runners.spark.SparkPipelineResult.runtimeExceptionFrom(SparkPipelineResult.java:54)
>   at 
> org.apache.beam.runners.spark.SparkPipelineResult.beamExceptionFrom(SparkPipelineResult.java:71)
>   at 
> org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish(SparkPipelineResult.java:98)
>   at com.optum.analyticstore.execution.Exec.run(Exec.java:276)
>   at com.optum.analyticstore.execution.Exec.main(Exec.java:364)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)
> Caused by: java.lang.AssertionError: assertion failed: copyAndReset must 
> return a zero value copy
>   at scala.Predef$.assert(Predef.scala:170)
>   at 
> org.apache.spark.util.AccumulatorV2.writeReplace(AccumulatorV2.scala:163)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> java.io.ObjectStreamClass.invokeWriteReplace(ObjectStreamClass.java:1218)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1136)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at 
> 
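
The equality analysis in the BEAM-6771 description above can be illustrated with a small standalone sketch. It does not use Beam or Spark; `ContainerWithoutEquals` and `ContainerWithEquals` are hypothetical stand-ins (not the real MetricsContainerImpl), assumed only to show why comparing two empty maps succeeds while comparing two fresh instances of a class without equals() fails.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ZeroValueEqualitySketch {

    // Hypothetical container that inherits Object.equals (identity comparison).
    static final class ContainerWithoutEquals {
        final Map<String, Long> counters = new ConcurrentHashMap<>();
    }

    // Hypothetical container that defines value-based equality.
    static final class ContainerWithEquals {
        final Map<String, Long> counters = new ConcurrentHashMap<>();

        @Override
        public boolean equals(Object other) {
            return other instanceof ContainerWithEquals
                && counters.equals(((ContainerWithEquals) other).counters);
        }

        @Override
        public int hashCode() {
            return counters.hashCode();
        }
    }

    // Two empty ConcurrentHashMaps compare equal by content.
    static boolean emptyMapsEqual() {
        Map<String, Object> a = new ConcurrentHashMap<>();
        Map<String, Object> b = new ConcurrentHashMap<>();
        return a.equals(b);
    }

    // Two distinct instances without equals() are never equal to each other.
    static boolean identityContainersEqual() {
        return new ContainerWithoutEquals().equals(new ContainerWithoutEquals());
    }

    // Two distinct but empty instances with value-based equals() are equal.
    static boolean valueContainersEqual() {
        return new ContainerWithEquals().equals(new ContainerWithEquals());
    }

    public static void main(String[] args) {
        System.out.println("empty maps equal: " + emptyMapsEqual());
        System.out.println("containers without equals(): " + identityContainersEqual());
        System.out.println("containers with equals(): " + valueContainersEqual());
    }
}
```

If a zero-value check ends up comparing fresh container instances, only the variant with value-based equals() can pass it, which is consistent with the reporter's reasoning as far as this sketch goes.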

[jira] [Work logged] (BEAM-6754) Support multi core machines for python pipeline on flink for loopback environment

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6754?focusedWorklogId=211363=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211363
 ]

ASF GitHub Bot logged work on BEAM-6754:


Author: ASF GitHub Bot
Created on: 11/Mar/19 22:55
Start Date: 11/Mar/19 22:55
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #7984: [BEAM-6754] Use 
subprocess instead of threads in loopback environment
URL: https://github.com/apache/beam/pull/7984#issuecomment-471772836
 
 
   Sounds good, updated the default to use thread.
 



Issue Time Tracking
---

Worklog Id: (was: 211363)
Time Spent: 3h  (was: 2h 50m)

> Support multi core machines for python pipeline on flink for loopback 
> environment
> -
>
> Key: BEAM-6754
> URL: https://issues.apache.org/jira/browse/BEAM-6754
> Project: Beam
>  Issue Type: Task
>  Components: runner-core, runner-flink
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> The loopback worker is shared across multiple taskmanagers on a single machine. We 
> should support starting multiple processes in the loopback worker based on the number 
> of cores.





[jira] [Work logged] (BEAM-6777) SDK Harness Resilience

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6777?focusedWorklogId=211355=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211355
 ]

ASF GitHub Bot logged work on BEAM-6777:


Author: ASF GitHub Bot
Created on: 11/Mar/19 22:29
Start Date: 11/Mar/19 22:29
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #8012: [BEAM-6777] Add 
HealthDaemon and tests
URL: https://github.com/apache/beam/pull/8012#issuecomment-471765270
 
 
   Is the expectation that this will ping an endpoint hosted by the Dataflow 
service or by the runner harness?
 



Issue Time Tracking
---

Worklog Id: (was: 211355)
Time Spent: 1h 40m  (was: 1.5h)

> SDK Harness Resilience
> --
>
> Key: BEAM-6777
> URL: https://issues.apache.org/jira/browse/BEAM-6777
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> If the Python SDK Harness crashes in any way (user code exception, OOM, etc) 
> the job will hang and waste resources. The fix is to add a daemon in the SDK 
> Harness and Runner Harness to communicate with Dataflow to restart the VM 
> when stuckness is detected.





[jira] [Commented] (BEAM-4265) Add a dead letter queue to Python streaming BigQuery sink

2019-03-11 Thread Pablo Estrada (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790025#comment-16790025
 ] 

Pablo Estrada commented on BEAM-4265:
-

I added this in PR https://github.com/apache/beam/pull/7677

> Add a dead letter queue to Python streaming BigQuery sink
> -
>
> Key: BEAM-4265
> URL: https://issues.apache.org/jira/browse/BEAM-4265
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Jayalath
>Priority: Major
>
> When writing to BigQuery using streaming writes, Java SDK supports writing 
> failed records to a dead letter queue: 
> [https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1375]
>  
> This is a very useful feature for long running pipelines so we should add 
> this to Python BQ sink: 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L1279





[jira] [Resolved] (BEAM-4265) Add a dead letter queue to Python streaming BigQuery sink

2019-03-11 Thread Pablo Estrada (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada resolved BEAM-4265.
-
   Resolution: Fixed
Fix Version/s: 2.12.0

> Add a dead letter queue to Python streaming BigQuery sink
> -
>
> Key: BEAM-4265
> URL: https://issues.apache.org/jira/browse/BEAM-4265
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Jayalath
>Priority: Major
> Fix For: 2.12.0
>
>
> When writing to BigQuery using streaming writes, Java SDK supports writing 
> failed records to a dead letter queue: 
> [https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1375]
>  
> This is a very useful feature for long running pipelines so we should add 
> this to Python BQ sink: 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L1279





[jira] [Created] (BEAM-6808) Use gav or something equivalent in announcement for dependency upgrades

2019-03-11 Thread Romain Manni-Bucau (JIRA)
Romain Manni-Bucau created BEAM-6808:


 Summary: Use gav or something equivalent in announcement for 
dependency upgrades
 Key: BEAM-6808
 URL: https://issues.apache.org/jira/browse/BEAM-6808
 Project: Beam
  Issue Type: Improvement
  Components: build-system
Affects Versions: 2.11.0
Reporter: Romain Manni-Bucau


The announcement/changelog uses Gradle variables, which is not very user friendly 
since they are Beam internals. It would be great to move to the actual GAV 
(groupId:artifactId:version).





[jira] [Work logged] (BEAM-6443) decrease the number of threads for BigQuery streaming insertAll

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6443?focusedWorklogId=211352=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211352
 ]

ASF GitHub Bot logged work on BEAM-6443:


Author: ASF GitHub Bot
Created on: 11/Mar/19 22:24
Start Date: 11/Mar/19 22:24
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #7547: [BEAM-6443] decrease the 
number of thread for BigQuery streaming inse…
URL: https://github.com/apache/beam/pull/7547#issuecomment-471763651
 
 
   > Can you describe how this PR has been tested at scale?
   
   I created an UnboundedSource that generates very small (9 bytes) and maximum-sized 
(1MB, the streaming insert row size limit) elements and ran a BigQuery 
insert pipeline on DataflowRunner with multiple thread pool configurations 
(unlimited, single, 1 semaphored, 3 semaphored). Each run took about 20 
minutes. You can find the exact numbers in a benchmark note: 
https://docs.google.com/document/d/1EhRNWLevm86GD_QtvlrTauHITVMwQBzuemyp-w4Z_ck/edit#heading=h.c0angyd9tn21
 



Issue Time Tracking
---

Worklog Id: (was: 211352)
Time Spent: 4h 50m  (was: 4h 40m)

> decrease the number of threads for BigQuery streaming insertAll
> ---
>
> Key: BEAM-6443
> URL: https://issues.apache.org/jira/browse/BEAM-6443
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Labels: triaged
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> When inserting (a large number of) very small elements into BigQuery via 
> streaming insertAll, BigQueryIO causes lots of quota exceeded errors. This 
> implies that 1) BigQueryIO puts unnecessary overhead on the BigQuery API layer 
> by sending requests too fast and 2) the log file becomes very big because the 
> same error message is repeated. Currently we use 50 shards for writing data into 
> BigQuery, and in each bundle 20-30 futures are executed simultaneously with an 
> unlimited thread pool. It would be worth investigating whether a single-thread 
> pool is sufficient for running concurrent insertAll.
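
One way to bound the number of concurrently running insert tasks, as the description above suggests investigating, is to guard submissions with a semaphore. The following is a minimal sketch under assumptions: the class and method names are hypothetical, the short sleep stands in for an insertAll call, and it is not the code from the pull request.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedSubmitSketch {

    // Runs `tasks` dummy tasks on an unbounded pool, but lets at most
    // `parallelism` of them run at once. Returns the highest concurrency seen.
    static int maxObservedConcurrency(int tasks, int parallelism) {
        ExecutorService pool = Executors.newCachedThreadPool();
        Semaphore permits = new Semaphore(parallelism);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger maxRunning = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(tasks);

        for (int i = 0; i < tasks; i++) {
            // The submitting thread blocks here once all permits are taken.
            permits.acquireUninterruptibly();
            pool.execute(() -> {
                try {
                    int now = running.incrementAndGet();
                    maxRunning.accumulateAndGet(now, Math::max);
                    Thread.sleep(5); // placeholder for the real work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    // Decrement before releasing so the counter never exceeds the permits.
                    running.decrementAndGet();
                    permits.release();
                    done.countDown();
                }
            });
        }

        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } finally {
            pool.shutdown();
        }
        return maxRunning.get();
    }

    public static void main(String[] args) {
        System.out.println("max concurrency: " + maxObservedConcurrency(30, 3));
    }
}
```

The underlying pool stays unbounded here; only the semaphore limits how many tasks are in flight, which keeps the limit adjustable without changing the executor.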





[jira] [Work logged] (BEAM-6771) Spark Runner Fails on Certain Versions of Spark 2.X

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6771?focusedWorklogId=211347=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211347
 ]

ASF GitHub Bot logged work on BEAM-6771:


Author: ASF GitHub Bot
Created on: 11/Mar/19 22:17
Start Date: 11/Mar/19 22:17
Worklog Time Spent: 10m 
  Work Description: kyle-winkelman commented on pull request #8032: 
[BEAM-6771] MetricsContainerStepMap#equals required for Spark.
URL: https://github.com/apache/beam/pull/8032
 
 
   Please see the [jira](https://issues.apache.org/jira/browse/BEAM-6771) for 
information. I have tested this with a local build of release-2.11.0 branch and 
my pipeline now succeeds on Spark 2.2.1.
   
   
   
[jira] [Work logged] (BEAM-6443) decrease the number of threads for BigQuery streaming insertAll

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6443?focusedWorklogId=211339=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211339
 ]

ASF GitHub Bot logged work on BEAM-6443:


Author: ASF GitHub Bot
Created on: 11/Mar/19 22:02
Start Date: 11/Mar/19 22:02
Worklog Time Spent: 10m 
  Work Description: ihji commented on pull request #7547: [BEAM-6443] 
decrease the number of thread for BigQuery streaming inse…
URL: https://github.com/apache/beam/pull/7547#discussion_r264449198
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java
 ##
 @@ -1001,4 +1007,141 @@ public void close() {
   client.close();
 }
   }
+
+  private static class BoundedExecutorService implements ExecutorService {
+private final ExecutorService executor;
+private final Semaphore semaphore;
+private final int parallelism;
+
+BoundedExecutorService(ExecutorService executor, int parallelism) {
+  this.executor = executor;
+  this.parallelism = parallelism;
+  this.semaphore = new Semaphore(parallelism);
+}
+
+@Override
+public void shutdown() {
+  executor.shutdown();
+}
+
+@Override
+public List<Runnable> shutdownNow() {
+  List<Runnable> runnables = executor.shutdownNow();
+  // try to release as many permits as possible before returning semaphored runnables.
+  synchronized (this) {
+if (semaphore.availablePermits() <= parallelism) {
+  semaphore.release(Integer.MAX_VALUE - parallelism);
 
 Review comment:
   I think we don't have to pair acquire() and release(). Excerpt from the 
release() API doc:
   
   > There is no requirement that a thread that releases a permit must have 
acquired that permit by calling acquire().
   > 
https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/Semaphore.html#release--
   
   The possible edge case is that if calling release() pushes the total number of 
permits above Integer.MAX_VALUE, it throws an exception. 
By checking availablePermits() before release() in a synchronized section, we can 
avoid that case.
   
   Another option is to just return the semaphored callables as is and 
document it clearly in a comment. I believe that this `BoundedExecutorService` 
class will hardly be reused anyway.
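
The two points in the review comment above, that release() need not be paired with acquire() and that the permit count can overflow, can be checked with a short standalone sketch. The class and method names are hypothetical. The overflow behaviour shown (a java.lang.Error rather than a checked exception) is what the OpenJDK implementation does to my knowledge; the Javadoc does not spell it out, so treat it as an assumption.

```java
import java.util.concurrent.Semaphore;

public class SemaphoreReleaseSketch {

    // release() without a matching acquire() simply adds one permit.
    static int releaseWithoutAcquire() {
        Semaphore semaphore = new Semaphore(2);
        semaphore.release();
        return semaphore.availablePermits();
    }

    // Releasing beyond Integer.MAX_VALUE permits is rejected by the semaphore.
    static boolean overflowIsRejected() {
        Semaphore semaphore = new Semaphore(Integer.MAX_VALUE);
        try {
            semaphore.release();
            return false;
        } catch (Error | RuntimeException e) {
            return true;
        }
    }

    // Mirrors the guard discussed in the review: top up only while the current
    // count is at most `parallelism`, so the sum stays within Integer.MAX_VALUE.
    static int guardedTopUp(Semaphore semaphore, int parallelism) {
        synchronized (semaphore) {
            if (semaphore.availablePermits() <= parallelism) {
                semaphore.release(Integer.MAX_VALUE - parallelism);
            }
        }
        return semaphore.availablePermits();
    }

    public static void main(String[] args) {
        System.out.println("after unpaired release: " + releaseWithoutAcquire());
        System.out.println("overflow rejected: " + overflowIsRejected());

        Semaphore semaphore = new Semaphore(3);
        semaphore.acquireUninterruptibly(); // one permit is held by a running task
        System.out.println("after guarded top-up: " + guardedTopUp(semaphore, 3));
    }
}
```

A second call to `guardedTopUp` on the same semaphore is a no-op, because the available count is then far above `parallelism`; that is the property the availablePermits() check relies on.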
 



Issue Time Tracking
---

Worklog Id: (was: 211339)
Time Spent: 4h 40m  (was: 4.5h)

> decrease the number of threads for BigQuery streaming insertAll
> ---
>
> Key: BEAM-6443
> URL: https://issues.apache.org/jira/browse/BEAM-6443
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Labels: triaged
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> When inserting (a large number of) very small elements into BigQuery via 
> streaming insertAll, BigQueryIO causes lots of quota exceeded errors. This 
> implies that 1) BigQueryIO puts unnecessary overhead on the BigQuery API layer 
> by sending requests too fast and 2) the log file becomes very big because the 
> same error message is repeated. Currently we use 50 shards for writing data into 
> BigQuery, and in each bundle 20-30 futures are executed simultaneously with an 
> unlimited thread pool. It would be worth investigating whether a single-thread 
> pool is sufficient for running concurrent insertAll.





[jira] [Work logged] (BEAM-6777) SDK Harness Resilience

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6777?focusedWorklogId=211335=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211335
 ]

ASF GitHub Bot logged work on BEAM-6777:


Author: ASF GitHub Bot
Created on: 11/Mar/19 21:59
Start Date: 11/Mar/19 21:59
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #8012: [BEAM-6777] Add 
HealthDaemon and tests
URL: https://github.com/apache/beam/pull/8012#issuecomment-471753341
 
 
   This looks good. Can you please squash the commits into one?
 



Issue Time Tracking
---

Worklog Id: (was: 211335)
Time Spent: 1.5h  (was: 1h 20m)

> SDK Harness Resilience
> --
>
> Key: BEAM-6777
> URL: https://issues.apache.org/jira/browse/BEAM-6777
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> If the Python SDK Harness crashes in any way (user code exception, OOM, etc) 
> the job will hang and waste resources. The fix is to add a daemon in the SDK 
> Harness and Runner Harness to communicate with Dataflow to restart the VM 
> when stuckness is detected.





[jira] [Assigned] (BEAM-6298) Can not insert into BigQuery table that is not empty

2019-03-11 Thread Xu Mingmin (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Mingmin reassigned BEAM-6298:


Assignee: (was: Xu Mingmin)

> Can not insert into BigQuery table that is not empty
> 
>
> Key: BEAM-6298
> URL: https://issues.apache.org/jira/browse/BEAM-6298
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.10.0
>Reporter: Luat Nguyen
>Priority: Major
>  Labels: triaged
>
> There is an exception when I try to insert into a BigQuery table that is not 
> empty.
> Example Beam SQL code:
> {code:java}
> BeamSqlRelUtils.toPCollection(pipeline, sqlEnv.parseQuery("INSERT INTO 
> D_CARD_LITE(DIM_ID) VALUES('')")){code}
> The exception message is as follows:
> {code:java}
> java.lang.IllegalStateException: BigQuery table is not empty: 
> mydataset:samples.D_CARD_LITE.
>  at com.google.common.base.Preconditions.checkState(Preconditions.java:518)
>  at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers.verifyTableNotExistOrEmpty(BigQueryHelpers.java:470)
>  at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO$Write.validate(BigQueryIO.java:1564)
>  at 
> org.apache.beam.sdk.Pipeline$ValidateVisitor.enterCompositeTransform(Pipeline.java:641)
>  at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:645)
>  at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:649)
>  at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:649)
>  at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.access$600(TransformHierarchy.java:311)
>  at 
> org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:245)
>  at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:458)
>  at org.apache.beam.sdk.Pipeline.validate(Pipeline.java:577)
>  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:312)
>  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299){code}





[jira] [Closed] (BEAM-2478) Distinct Aggregates

2019-03-11 Thread Xu Mingmin (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Mingmin closed BEAM-2478.

   Resolution: Won't Do
Fix Version/s: Not applicable

It's supported by Calcite rules, as noted in Julian's comment.

> Distinct Aggregates
> ---
>
> Key: BEAM-2478
> URL: https://issues.apache.org/jira/browse/BEAM-2478
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Jingsong Lee
>Assignee: Xu Mingmin
>Priority: Major
>  Labels: triaged
> Fix For: Not applicable
>
>
> e.g. COUNT(DISTINCT empno)





[jira] [Commented] (BEAM-6185) Upgrade to Spark 2.4.0

2019-03-11 Thread JIRA


[ 
https://issues.apache.org/jira/browse/BEAM-6185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789997#comment-16789997
 ] 

Ismaël Mejía commented on BEAM-6185:


Dataproc is still on 2.3.x, but I think the timing is better now that at least 
the majority is on 2.4.x. Can we just wait for the release of Spark 2.4.1 (vote 
ongoing) before making the move? At that point we will reopen JB's 
PR. WDYT [~aromanenko]?

> Upgrade to Spark 2.4.0
> --
>
> Key: BEAM-6185
> URL: https://issues.apache.org/jira/browse/BEAM-6185
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Labels: triaged
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Created] (BEAM-6807) Implement an Azure blobstore filesystem for Python SDK

2019-03-11 Thread Pablo Estrada (JIRA)
Pablo Estrada created BEAM-6807:
---

 Summary: Implement an Azure blobstore filesystem for Python SDK
 Key: BEAM-6807
 URL: https://issues.apache.org/jira/browse/BEAM-6807
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Reporter: Pablo Estrada
Assignee: Pablo Estrada


This is similar to BEAM-2572, but for Azure's blobstore.





[jira] [Closed] (BEAM-5203) expose PaneInfo and BoundedWindow as UDF

2019-03-11 Thread Xu Mingmin (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Mingmin closed BEAM-5203.

   Resolution: Won't Do
Fix Version/s: Not applicable

> expose PaneInfo and BoundedWindow as UDF
> 
>
> Key: BEAM-5203
> URL: https://issues.apache.org/jira/browse/BEAM-5203
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Xu Mingmin
>Assignee: Xu Mingmin
>Priority: Major
>  Labels: triaged
> Fix For: Not applicable
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Besides adding new keywords in Calcite, there's an alternative way to expose 
> the PaneInfo and BoundedWindow of a Row: via a UDF.





[jira] [Work logged] (BEAM-6527) Parallel tox (unit) tests run on Jenkins

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6527?focusedWorklogId=211322=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211322
 ]

ASF GitHub Bot logged work on BEAM-6527:


Author: ASF GitHub Bot
Created on: 11/Mar/19 21:44
Start Date: 11/Mar/19 21:44
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on pull request #7675: 
[BEAM-6527] Use Gradle to parallel Python tox tests
URL: https://github.com/apache/beam/pull/7675#discussion_r264443764
 
 

 ##
 File path: sdks/python/scripts/run_tox.sh
 ##
 @@ -24,9 +24,10 @@
 
 ###
 # Usage check.
-if [[ $# != 1 ]]; then
-  printf "Usage: \n$> ./scripts/run_tox.sh "
+if [[ $# < 1 || $# > 2 ]]; then
+  printf "Usage: \n$> ./scripts/run_tox.sh  "
 
 Review comment:
   sg.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 211322)
Time Spent: 2.5h  (was: 2h 20m)

> Parallel tox (unit) tests run on Jenkins
> 
>
> Key: BEAM-6527
> URL: https://issues.apache.org/jira/browse/BEAM-6527
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Labels: triaged
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> The existing tox unit test suites (basic, gcp and cython) will be enabled in 
> multiple versions of Python 3, which will significantly increase the runtime of 
> Pre/PostCommit builds. Parallelism is wanted in the tox or Gradle invocation to 
> keep the time in a reasonable range (<30 mins for PreCommit is desired).





[jira] [Closed] (BEAM-5976) use AbstractInstant as DATEITME type in functions

2019-03-11 Thread Xu Mingmin (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Mingmin closed BEAM-5976.

   Resolution: Fixed
Fix Version/s: Not applicable

> use AbstractInstant as DATEITME type in functions
> -
>
> Key: BEAM-5976
> URL: https://issues.apache.org/jira/browse/BEAM-5976
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Xu Mingmin
>Assignee: Xu Mingmin
>Priority: Minor
>  Labels: triaged
> Fix For: Not applicable
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Refer to the discussion in 
> [https://github.com/apache/beam/pull/6913#discussion_r230148526]





[jira] [Assigned] (BEAM-6105) Support "partition by XXX order by XXX" SQL

2019-03-11 Thread Xu Mingmin (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Mingmin reassigned BEAM-6105:


Assignee: (was: Xu Mingmin)

> Support "partition by XXX order by XXX" SQL
> ---
>
> Key: BEAM-6105
> URL: https://issues.apache.org/jira/browse/BEAM-6105
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Brandon Jiang
>Priority: Minor
>  Labels: triaged
>
> Based on our experience, it looks like Beam SQL is not able to support 
> statements like "partition by XXX order by XXX" for bounded streams. It is not 
> able to partition data across different nodes and sort the data in each 
> partition/node in parallel.
> We have to use the Java SDK and an extension to convert such SQL statements to 
> GroupByKey + SortValues to achieve this. 
>  
> Are we missing anything? If not, is this something that we can improve? We 
> took a quick look at Calcite, and it seems that it can explain the query plan for 
> "partition by... order by..." fine.
>  
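
For readers unfamiliar with the workaround described above, the semantics of
"partition by ... order by ..." can be sketched in plain Python as a group step
followed by a per-group sort. This is only an illustration of the
GroupByKey + SortValues idea, not Beam code; the function name
partition_and_sort and its parameters are hypothetical.

```python
from collections import defaultdict


def partition_and_sort(rows, partition_key, order_key):
    """Groups rows by partition_key, then sorts each group by order_key.

    Hypothetical helper: mirrors GroupByKey (the grouping loop) followed by
    SortValues (the per-group sort), but runs in a single process.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[partition_key(row)].append(row)
    # Each group is sorted independently, which is what would allow a
    # distributed runner to sort the partitions in parallel.
    return {key: sorted(values, key=order_key)
            for key, values in groups.items()}
```

In a real pipeline each group could live on a different worker; the sketch
only shows that the ordering is local to a partition.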





[jira] [Assigned] (BEAM-6297) There is a NullPointerException when read null-value field in BigQuery table

2019-03-11 Thread Xu Mingmin (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Mingmin reassigned BEAM-6297:


Assignee: (was: Xu Mingmin)

> There is a NullPointerException when read null-value field in BigQuery table
> 
>
> Key: BEAM-6297
> URL: https://issues.apache.org/jira/browse/BEAM-6297
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.10.0
>Reporter: Luat Nguyen
>Priority: Major
>  Labels: triaged
>
> I run a query on a BigQuery table using Beam SQL.
> Example: 
> {code:java}
> BeamSqlRelUtils.toPCollection(pipeline, sqlEnv.parseQuery("SELECT * FROM 
> X_bigquery_table"));
> {code}
> There is a NullPointerException when it reads a null-value field in the 
> BigQuery table, as shown below:
> {code:java}
> Dec 22, 2018 11:05:21 AM org.apache.beam.sdk.io.FileBasedSource createReader
> INFO: Matched 1 files for pattern 
> gs://xxx/tmp/BigQueryExtractTemp/a84545971aa94cf6b6717984e9d71642/.avro
> java.lang.NullPointerException
>  at 
> org.apache.beam.sdk.io.gcp.bigquery.AvroUtils.convertAvroString(AvroUtils.java:81)
>  at 
> org.apache.beam.sdk.io.gcp.bigquery.AvroUtils.convertAvroPrimitiveTypes(AvroUtils.java:104)
>  at 
> org.apache.beam.sdk.io.gcp.bigquery.AvroUtils.convertAvroFormat(AvroUtils.java:46)
>  at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils.toBeamRow(BigQueryUtils.java:206)
>  at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils$ToBeamRow.apply(BigQueryUtils.java:198)
>  at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils$ToBeamRow.apply(BigQueryUtils.java:185)
>  at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase$1.apply(BigQuerySourceBase.java:221)
>  at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase$1.apply(BigQuerySourceBase.java:214)
>  at 
> org.apache.beam.sdk.io.AvroSource$AvroBlock.readNextRecord(AvroSource.java:567)
>  at 
> org.apache.beam.sdk.io.BlockBasedSource$BlockBasedReader.readNextRecord(BlockBasedSource.java:209)
>  at 
> org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.advanceImpl(FileBasedSource.java:484)
>  at 
> org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.startImpl(FileBasedSource.java:479)
>  at 
> org.apache.beam.sdk.io.OffsetBasedSource$OffsetBasedReader.start(OffsetBasedSource.java:249)
>  at 
> org.apache.beam.runners.direct.BoundedReadEvaluatorFactory$BoundedReadEvaluator.processElement(BoundedReadEvaluatorFactory.java:147)
>  at 
> org.apache.beam.runners.direct.DirectTransformExecutor.processElements(DirectTransformExecutor.java:160)
>  at 
> org.apache.beam.runners.direct.DirectTransformExecutor.run(DirectTransformExecutor.java:124)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> {code}
>  





[jira] [Work logged] (BEAM-6527) Parallel tox (unit) tests run on Jenkins

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6527?focusedWorklogId=211328=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211328
 ]

ASF GitHub Bot logged work on BEAM-6527:


Author: ASF GitHub Bot
Created on: 11/Mar/19 21:51
Start Date: 11/Mar/19 21:51
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on pull request #7675: 
[BEAM-6527] Use Gradle to parallel Python tox tests
URL: https://github.com/apache/beam/pull/7675#discussion_r264445998
 
 

 ##
 File path: sdks/python/scripts/run_tox.sh
 ##
 @@ -24,9 +24,10 @@
 
 ###
 # Usage check.
-if [[ $# != 1 ]]; then
-  printf "Usage: \n$> ./scripts/run_tox.sh "
+if [[ $# < 1 || $# > 2 ]]; then
+  printf "Usage: \n$> ./scripts/run_tox.sh  "
   printf "\n\ttox_environment: [required] Tox environment to run the test 
in.\n"
+  printf "\n\tsdk_location: [optional] SDK artifact location to install 
dependencies.\n"
 
 Review comment:
   `--install` behaves the same as `--installpkg`, but we should use the 
documented one. Thanks for pointing that out.
 



Issue Time Tracking
---

Worklog Id: (was: 211328)
Time Spent: 3h  (was: 2h 50m)

> Parallel tox (unit) tests run on Jenkins
> 
>
> Key: BEAM-6527
> URL: https://issues.apache.org/jira/browse/BEAM-6527
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Labels: triaged
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> The existing tox unit test suites (basic, gcp and cython) will be enabled in 
> multiple versions of Python 3, which will significantly increase the runtime of 
> Pre/PostCommit builds. Parallelism is wanted in the tox or Gradle invocation to 
> keep the time in a reasonable range (<30 mins for PreCommit is desired).





[jira] [Work logged] (BEAM-6527) Parallel tox (unit) tests run on Jenkins

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6527?focusedWorklogId=211327=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211327
 ]

ASF GitHub Bot logged work on BEAM-6527:


Author: ASF GitHub Bot
Created on: 11/Mar/19 21:51
Start Date: 11/Mar/19 21:51
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on pull request #7675: 
[BEAM-6527] Use Gradle to parallel Python tox tests
URL: https://github.com/apache/beam/pull/7675#discussion_r264445998
 
 

 ##
 File path: sdks/python/scripts/run_tox.sh
 ##
 @@ -24,9 +24,10 @@
 
 ###
 # Usage check.
-if [[ $# != 1 ]]; then
-  printf "Usage: \n$> ./scripts/run_tox.sh "
+if [[ $# < 1 || $# > 2 ]]; then
+  printf "Usage: \n$> ./scripts/run_tox.sh  "
   printf "\n\ttox_environment: [required] Tox environment to run the test 
in.\n"
+  printf "\n\tsdk_location: [optional] SDK artifact location to install 
dependencies.\n"
 
 Review comment:
   `--install` behaves the same as `--installpkg`, but we should use the 
documented one. Thanks for pointing that out.
 



Issue Time Tracking
---

Worklog Id: (was: 211327)
Time Spent: 2h 50m  (was: 2h 40m)

> Parallel tox (unit) tests run on Jenkins
> 
>
> Key: BEAM-6527
> URL: https://issues.apache.org/jira/browse/BEAM-6527
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Labels: triaged
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> The existing tox unit test suites (basic, gcp and cython) will be enabled in 
> multiple versions of Python 3, which will significantly increase the runtime of 
> Pre/PostCommit builds. Parallelism is wanted in the tox or Gradle invocation to 
> keep the time in a reasonable range (<30 mins for PreCommit is desired).





[jira] [Work logged] (BEAM-6527) Parallel tox (unit) tests run on Jenkins

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6527?focusedWorklogId=211326=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211326
 ]

ASF GitHub Bot logged work on BEAM-6527:


Author: ASF GitHub Bot
Created on: 11/Mar/19 21:48
Start Date: 11/Mar/19 21:48
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on pull request #7675: 
[BEAM-6527] Use Gradle to parallel Python tox tests
URL: https://github.com/apache/beam/pull/7675#discussion_r26893
 
 

 ##
 File path: sdks/python/test-suites/tox/py3/build.gradle
 ##
 @@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * License); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * Unit tests for Python 3
+ */
+
+apply plugin: org.apache.beam.gradle.BeamModulePlugin
+applyPythonNature()
+
+// Required to setup a Python 3 virtualenv.
+project.ext.python3 = true
+
+task lint {}
+check.dependsOn lint
+
+toxTask "lintPy3", "py3-lint"
+lint.dependsOn lintPy3
+
+toxTask "testPython3", "py3"
+test.dependsOn testPython3
+
+toxTask "testPy3Gcp", "py3-gcp"
+test.dependsOn testPy3Gcp
+
+toxTask "testPy3Cython", "py3-cython"
+test.dependsOn testPy3Cython
+// Ensure that testPy3Cython runs exclusively to other tests.
+testPy3Cython.mustRunAfter testPython3, testPy3Gcp
+testPy3Cython.mustRunAfter ':beam-sdks-python:testPy2Cython'
 
 Review comment:
   testPy3Cython and testPy2Cython run in parallel even though I use `finalizedBy` 
[here](https://github.com/apache/beam/pull/7675/files#diff-c197962302397baf3a4cc36463dce5eaR197).
 If they do not affect each other, I can remove this line.
 



Issue Time Tracking
---

Worklog Id: (was: 211326)
Time Spent: 2h 40m  (was: 2.5h)

> Parallel tox (unit) tests run on Jenkins
> 
>
> Key: BEAM-6527
> URL: https://issues.apache.org/jira/browse/BEAM-6527
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Labels: triaged
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> The existing tox unit test suites (basic, gcp and cython) will be enabled in 
> multiple versions of Python 3, which will significantly increase the runtime of 
> Pre/PostCommit builds. Parallelism is wanted in the tox or Gradle invocation to 
> keep the time in a reasonable range (<30 mins for PreCommit is desired).





[jira] [Work logged] (BEAM-6527) Parallel tox (unit) tests run on Jenkins

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6527?focusedWorklogId=211320=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211320
 ]

ASF GitHub Bot logged work on BEAM-6527:


Author: ASF GitHub Bot
Created on: 11/Mar/19 21:39
Start Date: 11/Mar/19 21:39
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on pull request #7675: 
[BEAM-6527] Use Gradle to parallel Python tox tests
URL: https://github.com/apache/beam/pull/7675#discussion_r264442150
 
 

 ##
 File path: 
buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
 ##
 @@ -1611,18 +1611,48 @@ class BeamModulePlugin implements Plugin {
 outputs.dirs(project.ext.envdir)
   }
 
+  def pythonSdkDeps = project.files(
+  project.fileTree(dir: "${project.rootDir}", include: [
+'model/**',
+'sdks/python/apache_beam/**/*.py',
+'sdks/python/apache_beam/**/*.pyx',
+'sdks/python/apache_beam/**/*.pxd',
+'sdks/python/apache_beam/testing/data/**',
+'sdks/python/apache_beam/scripts/**',
+'sdks/python/.pylintrc',
+'sdks/python/MANIFEST.in',
+'sdks/python/gen_protos.py',
+'sdks/python/setup.cfg',
+'sdks/python/setup.py',
+'sdks/python/test_config.py',
+'sdks/python/tox.ini',
+  ])
+  )
+
   project.configurations { distConfig }
 
   project.task('sdist', dependsOn: 'setupVirtualenv') {
 doLast {
+  // Copy sdk sources to isolate directory
+  def copiedSrcDir = "${project.buildDir}/srcs"
+  project.copy {
+from pythonSdkDeps
+into copiedSrcDir
+  }
+
+  // Build artifact
   project.exec {
 executable 'sh'
-args '-c', ". ${project.ext.envdir}/bin/activate && cd 
${pythonRootDir} && python setup.py sdist --keep-temp --formats zip,gztar 
--dist-dir ${project.buildDir}"
+args '-c', ". ${project.ext.envdir}/bin/activate && cd 
${copiedSrcDir}/sdks/python && python setup.py sdist --formats zip,gztar 
--dist-dir ${project.buildDir}"
 
 Review comment:
   I added this flag to fix the parallel failure in integration tests, since the 
temp directory is shared between different build processes and by default it is 
deleted after a build finishes. However, it does not work for tox tests.
 



Issue Time Tracking
---

Worklog Id: (was: 211320)
Time Spent: 2h 10m  (was: 2h)

> Parallel tox (unit) tests run on Jenkins
> 
>
> Key: BEAM-6527
> URL: https://issues.apache.org/jira/browse/BEAM-6527
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Labels: triaged
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> The existing tox unit test suites (basic, gcp and cython) will be enabled in 
> multiple versions of Python 3, which will significantly increase the runtime of 
> Pre/PostCommit builds. Parallelism is wanted in the tox or Gradle invocation to 
> keep the time in a reasonable range (<30 mins for PreCommit is desired).





[jira] [Work logged] (BEAM-6527) Parallel tox (unit) tests run on Jenkins

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6527?focusedWorklogId=211321=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211321
 ]

ASF GitHub Bot logged work on BEAM-6527:


Author: ASF GitHub Bot
Created on: 11/Mar/19 21:44
Start Date: 11/Mar/19 21:44
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on pull request #7675: 
[BEAM-6527] Use Gradle to parallel Python tox tests
URL: https://github.com/apache/beam/pull/7675#discussion_r264443718
 
 

 ##
 File path: 
buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
 ##
 @@ -1673,6 +1703,20 @@ class BeamModulePlugin implements Plugin {
 }
 return argList.join(' ')
   }
+
+  project.ext.toxTask = { name, tox_env ->
+project.tasks.create(name) {
+  dependsOn = ['sdist']
+  doLast {
+project.exec {
+  executable 'sh'
+  args '-c', ". ${project.ext.envdir}/bin/activate && 
${pythonRootDir}/scripts/run_tox.sh $tox_env 
${project.buildDir}/apache-beam.tar.gz"
 
 Review comment:
   I only pass the tarball to the script. `$tox_env` is the name of the environment 
that we want to run (like `py27-lint` or `py3-gcp`). 
 



Issue Time Tracking
---

Worklog Id: (was: 211321)
Time Spent: 2h 20m  (was: 2h 10m)

> Parallel tox (unit) tests run on Jenkins
> 
>
> Key: BEAM-6527
> URL: https://issues.apache.org/jira/browse/BEAM-6527
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Labels: triaged
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The existing tox unit test suites (basic, gcp and cython) will be enabled in 
> multiple versions of Python 3, which will significantly increase the runtime of 
> Pre/PostCommit builds. Parallelism is wanted in the tox or Gradle invocation to 
> keep the time in a reasonable range (<30 mins for PreCommit is desired).





[jira] [Commented] (BEAM-6711) Bigquery Tornadoes IT is broken in Python3 PostCommit test suite.

2019-03-11 Thread Tanay Tummalapalli (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789988#comment-16789988
 ] 

Tanay Tummalapalli commented on BEAM-6711:
--

[~tvalentyn] I'll find the answer to those questions.

 

> Bigquery Tornadoes IT is broken in Python3 PostCommit test suite. 
> --
>
> Key: BEAM-6711
> URL: https://issues.apache.org/jira/browse/BEAM-6711
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Pablo Estrada
>Priority: Major
> Fix For: 2.12.0
>
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> First failure was observed in 
> https://builds.apache.org/job/beam_PostCommit_Python3_Verify/54 , after 
> https://github.com/apache/beam/commit/cdea885872b3be7de9ba22f22700be89f7d53766
>  was merged. 
> [~pabloem], could you please take a look? I suggest we do a rollback + 
> rollforward with a fix.
> {noformat}
> root: ERROR: Exception at bundle 
> , 
> due to an exception.
>  Traceback (most recent call last):
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/runners/common.py",
>  line 727, in process
> return self.do_fn_invoker.invoke_process(windowed_value)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/runners/common.py",
>  line 556, in invoke_process
> windowed_value, additional_args, additional_kwargs, output_processor)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/runners/common.py",
>  line 622, in _invoke_per_window
> self.process_method(*args_for_process, **kwargs_for_process))
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/runners/common.py",
>  line 823, in process_outputs
> for result in results:
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py",
>  line 191, in process
> if destination in self._destination_to_file_writer:
> TypeError: unhashable type: 'TableReference'
> {noformat}
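
As background on the TypeError in the traceback above: in Python 3, a class
that defines __eq__ without also defining __hash__ becomes unhashable, so its
instances cannot be used as dict keys or tested with "in" against a dict. The
sketch below reproduces that behaviour with a hypothetical stand-in class
(TableRef is not the real TableReference type, and whether the real type
follows this pattern is an assumption) and shows one possible workaround,
keying the dict on a string instead. It is not necessarily the fix that was
applied in Beam.

```python
class TableRef(object):
    """Hypothetical stand-in for a message type that defines __eq__ only."""

    def __init__(self, project, dataset, table):
        self.project = project
        self.dataset = dataset
        self.table = table

    def __eq__(self, other):
        # Defining __eq__ without __hash__ sets __hash__ to None in Python 3.
        return (isinstance(other, TableRef) and
                (self.project, self.dataset, self.table) ==
                (other.project, other.dataset, other.table))


def to_key(ref):
    """Builds a hashable string key from a table reference."""
    return '%s:%s.%s' % (ref.project, ref.dataset, ref.table)


def is_hashable(obj):
    """Returns True if obj can be used as a dict key."""
    try:
        hash(obj)
        return True
    except TypeError:
        return False
```

With a string key, a lookup such as "to_key(destination) in writers" works on
both Python 2 and Python 3.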





[jira] [Work logged] (BEAM-6777) SDK Harness Resilience

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6777?focusedWorklogId=211302=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211302
 ]

ASF GitHub Bot logged work on BEAM-6777:


Author: ASF GitHub Bot
Created on: 11/Mar/19 21:17
Start Date: 11/Mar/19 21:17
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #8012: 
[BEAM-6777] Add HealthDaemon and tests
URL: https://github.com/apache/beam/pull/8012#discussion_r264434326
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/health_daemon.py
 ##
 @@ -0,0 +1,121 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import errno
+import http.client
+import logging
+import socket
+import time
+from builtins import object
+
+
+class HealthDaemon(object):
+  """Sends periodic HTTP PUT /sdk requests to the health server.
+
+  The purpose of this class is to communicate to the health server that this
+  SDK Harness is alive. If this SDK Harness does not communicate to the health
+  server after a configured amount of time, the health server will restart the
+  container.
+
+  Expected Usage:
+# The HealthDaemon is expected to spin forever, start it on a separate
+# thread.
+health_thread = threading.Thread(target=HealthDaemon(8080).start)
+
+# Automatically kill the thread when the program exists.
+health_thread.daemon = True
+health_thread.setName('health-client-demon')
+
+# Start the HealthDaemon.
+health_thread.start()
+
+  """
+
+  def __init__(self, health_http_port):
+self._health_http_port = health_http_port
+
+  @staticmethod
+  def connect_to_server(health_http_port, timeout=5):
+"""Connects to the health server on the given port.
+
+Args:
+  health_http_port(int): Binding port for the debug server.
+Default is 0 which means any free unsecured port
+  timeout(int): Timeout in seconds for all operations.
+
+Returns:
+  The connection to the health server.
+"""
+
+logging.info('Connecting to localhost:%s', health_http_port)
+return http.client.HTTPConnection('localhost', health_http_port,
+  timeout=timeout)
+
+  @staticmethod
+  def try_health_ping(health_server):
+"""Attempts to ping the given health server.
+
+Args:
+  health_server(http.client.HTTPConnection): Connection to the health
+server.
+
+Returns:
+  True if the health ping succeeded, false otherwise.
+"""
+
+success = False
+try:
+  health_server.request('PUT', '/sdk')
 
 Review comment:
   Done
 



Issue Time Tracking
---

Worklog Id: (was: 211302)
Time Spent: 1h 20m  (was: 1h 10m)

> SDK Harness Resilience
> --
>
> Key: BEAM-6777
> URL: https://issues.apache.org/jira/browse/BEAM-6777
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> If the Python SDK Harness crashes in any way (user code exception, OOM, etc) 
> the job will hang and waste resources. The fix is to add a daemon in the SDK 
> Harness and Runner Harness to communicate with Dataflow to restart the VM 
> when stuckness is detected.
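
The daemon idea described above can be sketched roughly as follows: a
background thread periodically sends an HTTP PUT /sdk request to a local
health server, and the server is assumed to restart the container when the
pings stop. This is a minimal sketch under those assumptions, not the code
from the pull request; start_health_thread, interval_secs and stop_event are
hypothetical names, and the 15 second interval is arbitrary.

```python
import http.client
import socket
import threading


def try_health_ping(port, timeout=5):
    """Sends one PUT /sdk request to localhost:port; True on a 2xx reply."""
    conn = http.client.HTTPConnection('localhost', port, timeout=timeout)
    try:
        conn.request('PUT', '/sdk')
        response = conn.getresponse()
        response.read()
        return 200 <= response.status < 300
    except (socket.error, http.client.HTTPException):
        # Connection refused, timeout, or malformed reply: report failure
        # and let the caller decide whether to retry.
        return False
    finally:
        conn.close()


def start_health_thread(port, interval_secs=15, stop_event=None):
    """Starts a daemon thread that pings the health server until stopped."""
    stop_event = stop_event or threading.Event()

    def loop():
        while not stop_event.is_set():
            try_health_ping(port)
            # wait() returns early if the event is set, so shutdown is prompt.
            stop_event.wait(interval_secs)

    thread = threading.Thread(target=loop, name='health-client-daemon')
    # A daemon thread is killed automatically when the program exits.
    thread.daemon = True
    thread.start()
    return thread, stop_event
```

Using an Event rather than a bare sleep keeps the loop easy to stop in tests.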





[jira] [Work logged] (BEAM-6777) SDK Harness Resilience

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6777?focusedWorklogId=211301=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211301
 ]

ASF GitHub Bot logged work on BEAM-6777:


Author: ASF GitHub Bot
Created on: 11/Mar/19 21:17
Start Date: 11/Mar/19 21:17
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #8012: 
[BEAM-6777] Add HealthDaemon and tests
URL: https://github.com/apache/beam/pull/8012#discussion_r264434288
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/health_daemon.py
 ##
 @@ -0,0 +1,121 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import errno
+import http.client
+import logging
+import socket
+import time
+from builtins import object
+
+
+class HealthDaemon(object):
+  """Sends periodic HTTP PUT /sdk requests to the health server.
+
+  The purpose of this class is to communicate to the health server that this
+  SDK Harness is alive. If this SDK Harness does not communicate to the health
+  server after a configured amount of time, the health server will restart the
+  container.
+
+  Expected Usage:
+# The HealthDaemon is expected to spin forever, start it on a separate
+# thread.
+health_thread = threading.Thread(target=HealthDaemon(8080).start)
+
+# Automatically kill the thread when the program exits.
 
 Review comment:
   Done
 



Issue Time Tracking
---

Worklog Id: (was: 211301)
Time Spent: 1h 10m  (was: 1h)

> SDK Harness Resilience
> --
>
> Key: BEAM-6777
> URL: https://issues.apache.org/jira/browse/BEAM-6777
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> If the Python SDK Harness crashes in any way (user code exception, OOM, etc) 
> the job will hang and waste resources. The fix is to add a daemon in the SDK 
> Harness and Runner Harness to communicate with Dataflow to restart the VM 
> when stuckness is detected.
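
A minimal sketch of the heartbeat idea described in this issue, not the code from PR #8012. It assumes a health server on localhost that accepts HTTP PUT /sdk, as the HealthDaemon docstring quoted in this thread describes; the function names, the port and the 5-second interval are hypothetical.

```python
import http.client
import threading


def send_heartbeat(port, host="localhost", timeout=1.0):
    """Send one HTTP PUT /sdk request; return True on a 2xx response."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("PUT", "/sdk")
        response = conn.getresponse()
        response.read()
        return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, malformed reply: treat as unreachable.
        return False
    finally:
        conn.close()


def heartbeat_loop(port, interval_sec=5.0, stop_event=None):
    """Send heartbeats until stop_event is set (forever if it is None)."""
    if stop_event is None:
        stop_event = threading.Event()
    while not stop_event.is_set():
        send_heartbeat(port)
        # wait() doubles as an interruptible sleep.
        stop_event.wait(interval_sec)


def start_heartbeat_thread(port, interval_sec=5.0):
    """Run the loop on a daemon thread so it ends when the program exits."""
    stop_event = threading.Event()
    thread = threading.Thread(
        target=heartbeat_loop, args=(port, interval_sec, stop_event))
    thread.daemon = True
    thread.start()
    return thread, stop_event
```

Marking the thread as a daemon means a crashed or exiting harness stops sending heartbeats, which is the signal the health server is described as acting on.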





[jira] [Resolved] (BEAM-6726) Gradle Publish fails with Gradle 5

2019-03-11 Thread Michael Luckey (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Luckey resolved BEAM-6726.
--
Resolution: Fixed

> Gradle Publish fails with Gradle 5
> --
>
> Key: BEAM-6726
> URL: https://issues.apache.org/jira/browse/BEAM-6726
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Affects Versions: 2.11.0
>Reporter: Ahmet Altay
>Assignee: Michael Luckey
>Priority: Blocker
> Fix For: 2.12.0
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> cc: [~alanmyrvold] [~kenn]
> :beam-sdks-java-bom:signMavenJavaPublication task fails with an obscure 
> error: 
> (https://scans.gradle.com/s/mcbb4axlx6agy/failure?openFailures=WzBd=WzFd#top=0):
> Duplicate key pom-default.xml.asc:xml.asc:asc:null (attempted merging values 
> Signature pom-default.xml.asc:xml.asc:asc:null and Signature 
> pom-default.xml.asc:xml.asc:asc:null)
> Downgrading to Gradle 4 by reverting 
> https://github.com/apache/beam/commit/cadb6f7fabc6faedc6037104338306688f17652f
>  works.





[jira] [Work logged] (BEAM-6726) Gradle Publish fails with Gradle 5

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6726?focusedWorklogId=211288=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211288
 ]

ASF GitHub Bot logged work on BEAM-6726:


Author: ASF GitHub Bot
Created on: 11/Mar/19 20:56
Start Date: 11/Mar/19 20:56
Worklog Time Spent: 10m 
  Work Description: adude3141 commented on issue #8026: [BEAM-6726] 
explicitly specify signing key
URL: https://github.com/apache/beam/pull/8026#issuecomment-471726275
 
 
   Nice. Thx for merging.
 



Issue Time Tracking
---

Worklog Id: (was: 211288)
Time Spent: 4h  (was: 3h 50m)

> Gradle Publish fails with Gradle 5
> --
>
> Key: BEAM-6726
> URL: https://issues.apache.org/jira/browse/BEAM-6726
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Affects Versions: 2.11.0
>Reporter: Ahmet Altay
>Assignee: Michael Luckey
>Priority: Blocker
> Fix For: 2.12.0
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> cc: [~alanmyrvold] [~kenn]
> :beam-sdks-java-bom:signMavenJavaPublication task fails with an obscure 
> error: 
> (https://scans.gradle.com/s/mcbb4axlx6agy/failure?openFailures=WzBd=WzFd#top=0):
> Duplicate key pom-default.xml.asc:xml.asc:asc:null (attempted merging values 
> Signature pom-default.xml.asc:xml.asc:asc:null and Signature 
> pom-default.xml.asc:xml.asc:asc:null)
> Downgrading to Gradle 4 by reverting 
> https://github.com/apache/beam/commit/cadb6f7fabc6faedc6037104338306688f17652f
>  works.





[jira] [Work logged] (BEAM-6726) Gradle Publish fails with Gradle 5

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6726?focusedWorklogId=211284=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211284
 ]

ASF GitHub Bot logged work on BEAM-6726:


Author: ASF GitHub Bot
Created on: 11/Mar/19 20:53
Start Date: 11/Mar/19 20:53
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #8026: [BEAM-6726] 
explicitly specify signing key
URL: https://github.com/apache/beam/pull/8026
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 211284)
Time Spent: 3h 50m  (was: 3h 40m)

> Gradle Publish fails with Gradle 5
> --
>
> Key: BEAM-6726
> URL: https://issues.apache.org/jira/browse/BEAM-6726
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Affects Versions: 2.11.0
>Reporter: Ahmet Altay
>Assignee: Michael Luckey
>Priority: Blocker
> Fix For: 2.12.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> cc: [~alanmyrvold] [~kenn]
> :beam-sdks-java-bom:signMavenJavaPublication task fails with an obscure 
> error: 
> (https://scans.gradle.com/s/mcbb4axlx6agy/failure?openFailures=WzBd=WzFd#top=0):
> Duplicate key pom-default.xml.asc:xml.asc:asc:null (attempted merging values 
> Signature pom-default.xml.asc:xml.asc:asc:null and Signature 
> pom-default.xml.asc:xml.asc:asc:null)
> Downgrading to Gradle 4 by reverting 
> https://github.com/apache/beam/commit/cadb6f7fabc6faedc6037104338306688f17652f
>  works.





[jira] [Commented] (BEAM-6795) Improve Release Scripts

2019-03-11 Thread Michael Luckey (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789950#comment-16789950
 ] 

Michael Luckey commented on BEAM-6795:
--

In discussion of PR [#8026|https://github.com/apache/beam/pull/8026] it was 
suggested to add some consistency validations
- check that user input matches across scripts, especially the signing key

The current script implementations do not support this.

> Improve Release Scripts
> ---
>
> Key: BEAM-6795
> URL: https://issues.apache.org/jira/browse/BEAM-6795
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Ahmet Altay
>Priority: Major
>
> - Scripts use sudo to install binaries. This could be improved by local 
> installations, or perhaps by using a container to build the release.
> - Scripts make changes to the bashrc file (e.g. alias hub to git); these could 
> be avoided. Even though the scripts attempt to make a backup file, it is easy 
> to overwrite it if the script is cancelled.
> - There are too many yes/no questions and configuration questions for 
> validations. They are not set-and-forget and require attention. (Possible 
> solution: use command line arguments.)
> - Once a script fails at any step (e.g. invalid password at a step), it fails 
> without giving a second chance and requires re-running from the top. 
> (Possible idea: use breadcrumbs to continue the script from its last known 
> location.)
> - Signing with GPG is not friendly when used from a remote terminal. It has 
> modal dialogs and does not interact well with gradle.
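
The breadcrumb idea from the list above could look roughly like the following. This is a sketch in Python, not part of the existing release scripts; the function name, the step names and the breadcrumb file are hypothetical.

```python
import os


def run_steps(steps, breadcrumb_path):
    """Run (name, func) steps in order, skipping steps already recorded.

    Each completed step name is appended to breadcrumb_path, so a rerun
    after a failure resumes at the first step that did not finish.
    """
    done = set()
    if os.path.exists(breadcrumb_path):
        with open(breadcrumb_path) as f:
            done = {line.strip() for line in f if line.strip()}

    executed = []
    for name, func in steps:
        if name in done:
            continue
        func()  # an exception here leaves the earlier breadcrumbs in place
        with open(breadcrumb_path, "a") as f:
            f.write(name + "\n")
        executed.append(name)
    return executed
```

If a step such as signing fails on an invalid password, the next invocation skips the steps that already completed and retries from the failed one.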





[jira] [Work logged] (BEAM-6619) Add PostCommit suite for integration tests on DataflowRunner

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6619?focusedWorklogId=211278=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211278
 ]

ASF GitHub Bot logged work on BEAM-6619:


Author: ASF GitHub Bot
Created on: 11/Mar/19 20:31
Start Date: 11/Mar/19 20:31
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #8016: [BEAM-6619] 
[BEAM-6593] Add pubsub integration tests to postcommit
URL: https://github.com/apache/beam/pull/8016#issuecomment-471717285
 
 
   r: @tvalentyn PTAL?
 



Issue Time Tracking
---

Worklog Id: (was: 211278)
Time Spent: 13h 40m  (was: 13.5h)

> Add PostCommit suite for integration tests on DataflowRunner
> 
>
> Key: BEAM-6619
> URL: https://issues.apache.org/jira/browse/BEAM-6619
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Labels: triaged
> Fix For: Not applicable
>
>  Time Spent: 13h 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-6619) Add PostCommit suite for integration tests on DataflowRunner

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6619?focusedWorklogId=211276=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211276
 ]

ASF GitHub Bot logged work on BEAM-6619:


Author: ASF GitHub Bot
Created on: 11/Mar/19 20:30
Start Date: 11/Mar/19 20:30
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #8016: [BEAM-6619] 
[BEAM-6593] Add pubsub integration tests to postcommit
URL: https://github.com/apache/beam/pull/8016#issuecomment-471717226
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 211276)
Time Spent: 13.5h  (was: 13h 20m)

> Add PostCommit suite for integration tests on DataflowRunner
> 
>
> Key: BEAM-6619
> URL: https://issues.apache.org/jira/browse/BEAM-6619
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Labels: triaged
> Fix For: Not applicable
>
>  Time Spent: 13.5h
>  Remaining Estimate: 0h
>






[jira] [Updated] (BEAM-6771) Spark Runner Fails on Certain Versions of Spark 2.X

2019-03-11 Thread Kyle Winkelman (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Winkelman updated BEAM-6771:
-
Priority: Blocker  (was: Critical)

> Spark Runner Fails on Certain Versions of Spark 2.X
> ---
>
> Key: BEAM-6771
> URL: https://issues.apache.org/jira/browse/BEAM-6771
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Affects Versions: 2.11.0
>Reporter: Kyle Winkelman
>Priority: Blocker
>
> When updating to Beam 2.11.0, I ran into the exception at the bottom of this 
> issue while running a pipeline on the Spark Runner (which worked in 2.9.0). 
> My cluster uses Spark 2.2.1.
> Related Issues:
> SPARK-23697 (Proof that equals must be implemented for items being 
> accumulated.)
> BEAM-1920 (In PR#3808, equals was implemented on MetricsContainerStepMap to 
> get Spark to run on 2.X.)
> My analysis has led me to believe that BEAM-6138 is the reason for this 
> issue.
> Before this change, versions of Spark that are affected by SPARK-23697 would 
> create a new MetricsContainerStepMap and make sure that the copied and reset 
> instance (the one serialized for distribution) is equal to the initial empty 
> MetricsContainerStepMap that is passed in. This would effectively check if 
> two empty ConcurrentHashMaps were equal. This results in true.
> After this change, versions of Spark that are affected by SPARK-23697 would 
> effectively be checking if two empty ConcurrentHashMaps were equal _*AND*_ if 
> two different instances of the MetricsContainerImpl are equal. Because 
> MetricsContainerImpl doesn't implement equals, this results in false.
> I believe BEAM-6546 will fix this issue, but I wanted to raise a red flag. I 
> am also hoping someone can verify my analysis.
> {noformat}
> ERROR ApplicationMaster: User class threw exception: 
> java.lang.RuntimeException: java.lang.AssertionError: assertion failed: 
> copyAndReset must return a zero value copy
> java.lang.RuntimeException: java.lang.AssertionError: assertion failed: 
> copyAndReset must return a zero value copy
>   at 
> org.apache.beam.runners.spark.SparkPipelineResult.runtimeExceptionFrom(SparkPipelineResult.java:54)
>   at 
> org.apache.beam.runners.spark.SparkPipelineResult.beamExceptionFrom(SparkPipelineResult.java:71)
>   at 
> org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish(SparkPipelineResult.java:98)
>   at com.optum.analyticstore.execution.Exec.run(Exec.java:276)
>   at com.optum.analyticstore.execution.Exec.main(Exec.java:364)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)
> Caused by: java.lang.AssertionError: assertion failed: copyAndReset must 
> return a zero value copy
>   at scala.Predef$.assert(Predef.scala:170)
>   at 
> org.apache.spark.util.AccumulatorV2.writeReplace(AccumulatorV2.scala:163)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> java.io.ObjectStreamClass.invokeWriteReplace(ObjectStreamClass.java:1218)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1136)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at 
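
The equality analysis in the issue description above can be illustrated with a small, self-contained analogy. It is written in Python rather than Java, and the class names are hypothetical stand-ins, not Beam's MetricsContainerStepMap or MetricsContainerImpl. It only relies on the fact that, as in Java, a class without its own equality method is compared by identity.

```python
class ContainerWithoutEq(object):
    """Hypothetical container that does not define equality."""

    def __init__(self):
        self.counters = {}


class ContainerWithEq(ContainerWithoutEq):
    """Same data, but equality compares contents instead of identity."""

    def __eq__(self, other):
        return (isinstance(other, ContainerWithEq)
                and self.counters == other.counters)


class StepMap(object):
    """Hypothetical accumulator value: a map plus an optional container."""

    def __init__(self, extra=None):
        self.containers = {}
        self.extra = extra

    def __eq__(self, other):
        return (isinstance(other, StepMap)
                and self.containers == other.containers
                and self.extra == other.extra)


def is_zero_copy(initial, copied_and_reset):
    """Analogue of the check: the reset copy must equal the empty value."""
    return initial == copied_and_reset


# Only empty maps are compared.
print(is_zero_copy(StepMap(), StepMap()))  # True
# Two distinct containers without equality are compared by identity.
print(is_zero_copy(StepMap(ContainerWithoutEq()),
                   StepMap(ContainerWithoutEq())))  # False
# Once the container defines equality, the check passes again.
print(is_zero_copy(StepMap(ContainerWithEq()),
                   StepMap(ContainerWithEq())))  # True
```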

[jira] [Updated] (BEAM-6806) org.apache.beam.runners not importing in 2.10 & 2.11

2019-03-11 Thread Steven Jon Anderson (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Jon Anderson updated BEAM-6806:
--
Description: 
When trying to upgrade our 2.9.0 pipeline to 2.10 or 2.11, all the packages 
under org.apache.beam.runners disappear (do not load, do not exist), 
breaking our scripts. This is preventing us from upgrading from 2.9.

The error:
{code:java}
The import org.apache.beam.runners cannot be resolved.{code}
Classes we need:
{code:java}
org.apache.beam.runners.dataflow.options.DataflowPipelineOptions
org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType{code}
Relevant POM
{code:java}

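<dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-sdks-java-core</artifactId>
    <version>2.11.0</version>
</dependency>
<dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
    <version>2.11.0</version>
</dependency>
<dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
    <version>2.11.0</version>
    <scope>runtime</scope>
</dependency>
<dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-runners-direct-java</artifactId>
    <version>2.11.0</version>
    <scope>runtime</scope>
</dependency>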

{code}

  was:
When trying to upgrade our 2.9.0 pipeline to 2.10 or 2.11, all the packages 
under org.apache.beam.runners disappear (do not load, do not exist), 
breaking our scripts. This is preventing us from upgrading from 2.9.

 

The error:
{code:java}
The import org.apache.beam.runners cannot be resolved.{code}
Classes we need:
{code:java}
org.apache.beam.runners.dataflow.options.DataflowPipelineOptions
org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType{code}
Relevant POM
{code:java}

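<dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-sdks-java-core</artifactId>
    <version>2.11.0</version>
</dependency>
<dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
    <version>2.11.0</version>
</dependency>
<dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
    <version>2.11.0</version>
    <scope>runtime</scope>
</dependency>
<dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-runners-direct-java</artifactId>
    <version>2.11.0</version>
    <scope>runtime</scope>
</dependency>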

{code}


> org.apache.beam.runners not importing in 2.10 & 2.11
> 
>
> Key: BEAM-6806
> URL: https://issues.apache.org/jira/browse/BEAM-6806
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Affects Versions: 2.10.0, 2.11.0
>Reporter: Steven Jon Anderson
>Priority: Blocker
>
> When trying to upgrade our 2.9.0 pipeline to 2.10 or 2.11, all the packages 
> under org.apache.beam.runners disappear (do not load, do not exist), 
> breaking our scripts. This is preventing us from upgrading from 2.9.
> The error:
> {code:java}
> The import org.apache.beam.runners cannot be resolved.{code}
> Classes we need:
> {code:java}
> org.apache.beam.runners.dataflow.options.DataflowPipelineOptions
> org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType{code}
> Relevant POM
> {code:java}
> 
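> <dependency>
>     <groupId>org.apache.beam</groupId>
>     <artifactId>beam-sdks-java-core</artifactId>
>     <version>2.11.0</version>
> </dependency>
> <dependency>
>     <groupId>org.apache.beam</groupId>
>     <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
>     <version>2.11.0</version>
> </dependency>
> <dependency>
>     <groupId>org.apache.beam</groupId>
>     <artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
>     <version>2.11.0</version>
>     <scope>runtime</scope>
> </dependency>
> <dependency>
>     <groupId>org.apache.beam</groupId>
>     <artifactId>beam-runners-direct-java</artifactId>
>     <version>2.11.0</version>
>     <scope>runtime</scope>
> </dependency>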
> 
> {code}





[jira] [Created] (BEAM-6806) org.apache.beam.runners not importing in 2.10 & 2.11

2019-03-11 Thread Steven Jon Anderson (JIRA)
Steven Jon Anderson created BEAM-6806:
-

 Summary: org.apache.beam.runners not importing in 2.10 & 2.11
 Key: BEAM-6806
 URL: https://issues.apache.org/jira/browse/BEAM-6806
 Project: Beam
  Issue Type: Bug
  Components: runner-dataflow
Affects Versions: 2.11.0, 2.10.0
Reporter: Steven Jon Anderson


When trying to upgrade our 2.9.0 pipeline to 2.10 or 2.11, all the packages 
under org.apache.beam.runners disappear (do not load, do not exist), 
breaking our scripts. This is preventing us from upgrading from 2.9.

 

The error:
{code:java}
The import org.apache.beam.runners cannot be resolved.{code}
Classes we need:
{code:java}
org.apache.beam.runners.dataflow.options.DataflowPipelineOptions
org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType{code}
Relevant POM
{code:java}

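<dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-sdks-java-core</artifactId>
    <version>2.11.0</version>
</dependency>
<dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
    <version>2.11.0</version>
</dependency>
<dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
    <version>2.11.0</version>
    <scope>runtime</scope>
</dependency>
<dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-runners-direct-java</artifactId>
    <version>2.11.0</version>
    <scope>runtime</scope>
</dependency>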

{code}





[jira] [Work logged] (BEAM-6735) WriteFiles with runner-determined sharding is forced to handle spilling

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6735?focusedWorklogId=211261=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211261
 ]

ASF GitHub Bot logged work on BEAM-6735:


Author: ASF GitHub Bot
Created on: 11/Mar/19 20:09
Start Date: 11/Mar/19 20:09
Worklog Time Spent: 10m 
  Work Description: kyle-winkelman commented on issue #7929: [BEAM-6735] 
Add noSpilling option to WriteFiles.
URL: https://github.com/apache/beam/pull/7929#issuecomment-471705486
 
 
   Done!
 



Issue Time Tracking
---

Worklog Id: (was: 211261)
Time Spent: 1h 20m  (was: 1h 10m)

> WriteFiles with runner-determined sharding is forced to handle spilling
> ---
>
> Key: BEAM-6735
> URL: https://issues.apache.org/jira/browse/BEAM-6735
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kyle Winkelman
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> As a result of BEAM-2302, files in excess of WriteFiles 
> maxNumWritersPerBundle are shuffled to be written later. The downside to this 
> is that even if you can guarantee that maxNumWritersPerBundle is high enough 
> to handle all writes you still have to pay the overhead of this write now 
> being a MultiOutput ParDo.
> e.g. In the Spark Runner when a ParDo has multiple outputs the returned data 
> is cached and if using the disableCache pipeline option it would cause 
> recalculation and all the temp files would be written again.
> I'm sure that the Spark Runner is not the only runner that would benefit from 
> an optional setting for WriteFiles that would skip this spilling and simplify 
> the pipeline.
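
A rough sketch of the spilling behaviour described above, in plain Python rather than the Beam Java SDK. The function name and the list-based "writers" are hypothetical; the point is only that the step always produces two outputs (written and spilled), even when the limit is high enough that nothing spills.

```python
def process_bundle(records, max_writers):
    """Group (destination, value) records under a cap on open writers.

    Returns (written, spilled): 'written' maps each destination that got a
    writer to its values; 'spilled' holds the records for destinations
    beyond the cap, which would be shuffled and written in a later step.
    """
    written = {}
    spilled = []
    for destination, value in records:
        if destination in written:
            written[destination].append(value)
        elif len(written) < max_writers:
            written[destination] = [value]
        else:
            spilled.append((destination, value))
    return written, spilled


records = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
print(process_bundle(records, max_writers=2))
print(process_bundle(records, max_writers=10))
```

With a cap of 10 the spilled output is empty, but the step still has two outputs; an option that skips spilling would let it be expressed as a single-output step.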





[jira] [Work logged] (BEAM-6713) FileIO and TextIO unable to alter WriteFiles maxNumWritersPerBundle

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6713?focusedWorklogId=211243=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211243
 ]

ASF GitHub Bot logged work on BEAM-6713:


Author: ASF GitHub Bot
Created on: 11/Mar/19 19:46
Start Date: 11/Mar/19 19:46
Worklog Time Spent: 10m 
  Work Description: kyle-winkelman commented on pull request #7893: 
[BEAM-6713] Add withMaxNumWritersPerBundle from WriteFiles to FileIO …
URL: https://github.com/apache/beam/pull/7893#discussion_r264399407
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
 ##
 @@ -431,7 +431,8 @@
 .setNumShards(0)
 .setCodec(TypedWrite.DEFAULT_SERIALIZABLE_CODEC)
 .setMetadata(ImmutableMap.of())
-.setWindowedWrites(false);
+.setWindowedWrites(false)
+.setMaxNumWritersPerBundle(WriteFiles.DEFAULT_MAX_NUM_WRITERS_PER_BUNDLE);
 
 Review comment:
   I have also come up with another approach to my issue: #7929. So it may be 
unnecessary to expose this, if that is the consensus. I just want to highlight 
that it was a huge pain to work around this limitation so that I could set a 
higher max. I had to copy most of the FileIO class, because it's all private 
internal stuff, so that I could call WriteFiles on my own with a higher max.
 



Issue Time Tracking
---

Worklog Id: (was: 211243)
Time Spent: 1h 20m  (was: 1h 10m)

> FileIO and TextIO unable to alter WriteFiles maxNumWritersPerBundle
> ---
>
> Key: BEAM-6713
> URL: https://issues.apache.org/jira/browse/BEAM-6713
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kyle Winkelman
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> When attempting to run a batch workflow with a FileIO.write() I was getting 
> job failures due to WriteFiles.DEFAULT_MAX_NUM_WRITERS_PER_BUNDLE causing a 
> significant amount of data to be shuffled. My issues would be solved by 
> increasing this and luckily WriteFiles already has withMaxNumWritersPerBundle 
> but unfortunately FileIO and TextIO do not.





[jira] [Closed] (BEAM-6748) Block size difference in avro library on Python3 causes some AvroIO tests to fail.

2019-03-11 Thread Valentyn Tymofieiev (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev closed BEAM-6748.
-
   Resolution: Fixed
Fix Version/s: Not applicable

> Block size difference in avro library on Python3 causes some AvroIO tests to 
> fail.
> --
>
> Key: BEAM-6748
> URL: https://issues.apache.org/jira/browse/BEAM-6748
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> *apache_beam.io.avroio_test.TestAvro.test_split_points*
> *apache_beam.io.avroio_test.TestFastAvro.test_split_points*
> fail with:
>  
> {code:java}
> Traceback (most recent call last):
>  File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio_test.py", 
> line 308, in test_split_points
>  self.assertEquals(split_points_report[-10:], [(2, 1)] * 10)
> AssertionError: Lists differ: [(10, 1), (10, 1), (10, 1), (10, 1), (10, 1[42 
> chars], 1)] != [(2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2[32 chars], 1)]
> First differing element 0:
> (10, 1)
> (2, 1)
> + [(2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2, 1), 
> (2, 1)]
> - [(10, 1),
> - (10, 1),
> - (10, 1),
> - (10, 1),
> - (10, 1),
> - (10, 1),
> - (10, 1),
> - (10, 1),
> - (10, 1),
> - (10, 1)] {code}





[jira] [Work logged] (BEAM-6735) WriteFiles with runner-determined sharding is forced to handle spilling

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6735?focusedWorklogId=211210=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211210
 ]

ASF GitHub Bot logged work on BEAM-6735:


Author: ASF GitHub Bot
Created on: 11/Mar/19 18:37
Start Date: 11/Mar/19 18:37
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #7929: [BEAM-6735] Add 
noSpilling option to WriteFiles.
URL: https://github.com/apache/beam/pull/7929#issuecomment-471667430
 
 
   Ismael may also be able to review if Luke can't
 



Issue Time Tracking
---

Worklog Id: (was: 211210)
Time Spent: 1h 10m  (was: 1h)

> WriteFiles with runner-determined sharding is forced to handle spilling
> ---
>
> Key: BEAM-6735
> URL: https://issues.apache.org/jira/browse/BEAM-6735
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kyle Winkelman
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> As a result of BEAM-2302, files in excess of WriteFiles 
> maxNumWritersPerBundle are shuffled to be written later. The downside to this 
> is that even if you can guarantee that maxNumWritersPerBundle is high enough 
> to handle all writes you still have to pay the overhead of this write now 
> being a MultiOutput ParDo.
> e.g. In the Spark Runner when a ParDo has multiple outputs the returned data 
> is cached and if using the disableCache pipeline option it would cause 
> recalculation and all the temp files would be written again.
> I'm sure that the Spark Runner is not the only runner that would benefit from 
> an optional setting for WriteFiles that would skip this spilling and simplify 
> the pipeline.





[jira] [Work logged] (BEAM-6493) examples in Kotlin

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6493?focusedWorklogId=211208=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211208
 ]

ASF GitHub Bot logged work on BEAM-6493:


Author: ASF GitHub Bot
Created on: 11/Mar/19 18:36
Start Date: 11/Mar/19 18:36
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #7807: [BEAM-6493] Add 
wordcount example in kotlin
URL: https://github.com/apache/beam/pull/7807#issuecomment-471667011
 
 
   @the-dagger ping : )
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 211208)
Time Spent: 2h 10m  (was: 2h)
Remaining Estimate: 502h 50m  (was: 503h)

> examples in Kotlin
> --
>
> Key: BEAM-6493
> URL: https://issues.apache.org/jira/browse/BEAM-6493
> Project: Beam
>  Issue Type: Task
>  Components: examples-java
>Affects Versions: Not applicable
>Reporter: Harshit Dwivedi
>Assignee: Harshit Dwivedi
>Priority: Minor
>  Labels: documentation
> Fix For: Not applicable
>
>   Original Estimate: 504h
>  Time Spent: 2h 10m
>  Remaining Estimate: 502h 50m
>
> I have been using Apache Beam for a few of my projects in production for the 
> past 6 months and, apart from Java, [Kotlin|https://kotlinlang.org/] also 
> seems to work just as well, with no issues whatsoever.
> But currently, the GitHub repository of Apache Beam contains examples only in 
> Java, which might be an issue for other developers who want to use the Apache 
> Beam SDK with Kotlin, as there are no sample resources available.
> That said, I would love to go ahead and add Kotlin examples alongside the 
> current Java examples in the [Beam 
> repository|https://github.com/apache/beam/tree/master/examples/java].





[jira] [Work logged] (BEAM-6735) WriteFiles with runner-determined sharding is forced to handle spilling

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6735?focusedWorklogId=211209=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211209
 ]

ASF GitHub Bot logged work on BEAM-6735:


Author: ASF GitHub Bot
Created on: 11/Mar/19 18:36
Start Date: 11/Mar/19 18:36
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #7929: [BEAM-6735] Add 
noSpilling option to WriteFiles.
URL: https://github.com/apache/beam/pull/7929#issuecomment-471667330
 
 
   Kyle could you rebase this? And would you mind adding the Javadoc? <3 Luke 
is out on leave, but he'll be back soon and he can review...
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 211209)
Time Spent: 1h  (was: 50m)

> WriteFiles with runner-determined sharding is forced to handle spilling
> ---
>
> Key: BEAM-6735
> URL: https://issues.apache.org/jira/browse/BEAM-6735
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kyle Winkelman
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> As a result of BEAM-2302, files in excess of WriteFiles 
> maxNumWritersPerBundle are shuffled to be written later. The downside to this 
> is that even if you can guarantee that maxNumWritersPerBundle is high enough 
> to handle all writes you still have to pay the overhead of this write now 
> being a MultiOutput ParDo.
> e.g. In the Spark Runner when a ParDo has multiple outputs the returned data 
> is cached and if using the disableCache pipeline option it would cause 
> recalculation and all the temp files would be written again.
> I'm sure that the Spark Runner is not the only runner that would benefit from 
> an optional setting for WriteFiles that would skip this spilling and simplify 
> the pipeline.
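
The spilling behaviour described above can be sketched with a toy model. This is plain Python, not Beam code: `write_bundle` and `max_writers` are hypothetical names standing in for the per-bundle writer limit (maxNumWritersPerBundle), and the real WriteFiles implementation may differ in detail.

```python
def write_bundle(records, max_writers):
    """Toy model: route records to per-destination writers, spilling the rest.

    `records` is a list of (destination, value) pairs. Once `max_writers`
    destinations have an open writer, records for any further destination
    are returned in `spilled` to be written later.
    """
    writers = {}
    spilled = []
    for destination, value in records:
        if destination in writers:
            writers[destination].append(value)
        elif len(writers) < max_writers:
            writers[destination] = [value]
        else:
            spilled.append((destination, value))
    return writers, spilled


records = [('a', 1), ('b', 2), ('c', 3), ('a', 4)]
written, spilled = write_bundle(records, max_writers=2)
unlimited, nothing_spilled = write_bundle(records, max_writers=10)
```

With a limit that is high enough, `spilled` is always empty, which is the case the issue argues should not have to pay for a second output.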





[jira] [Commented] (BEAM-1251) Python 3 Support

2019-03-11 Thread Valentyn Tymofieiev (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789854#comment-16789854
 ] 

Valentyn Tymofieiev commented on BEAM-1251:
---

The recently released Apache Beam 2.11.0 is the first release to offer partial 
support for Python 3.5+. Python 3 support remains an active work in progress, 
and the support offered in 2.11.0 has limitations and known issues.
 * Beam 2.11.0 has been tested only with Python 3.5 on the Direct and 
Dataflow runners.
 * IO availability is limited on Python 3 as of Beam 2.11.0:
 ** BEAM-4543: Datastore IO connector is not available in Python 3.
 ** BEAM-6522: Avro IO connector has issues in Python 3.
 ** BEAM-5844: VCF IO connector is not available in Python 3.
 ** BEAM-6769: BigQuery IO does not support raw bytes in Python 3.
 * Dataflow Runner supports Python 2.7 and 3.5 only and will not send 
jobs to the Dataflow service if the SDK is running under a different version of 
the interpreter.
 * Other known issues:
 ** Main sessions that contain invocations of superclass constructors fail to 
save: [https://github.com/uqfoundation/dill/issues/300]
 ** New syntactic constructs introduced in Python 3 may not be supported in 
Beam 2.11:
 *** BEAM-5878 - Support functions with keyword-only arguments.
 * Breaking changes in Beam 2.11.0:
 ** BEAM-5731 - Top.Of and Top.PerKey no longer accept a compare parameter, in 
line with Python's change to its sorting operations.
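
The BEAM-5731 item mirrors Python 3 itself, where `sorted` and `list.sort` dropped the `cmp` argument in favour of `key`. A minimal sketch in plain Python (not Beam code; the comparator `by_length_then_alpha` is a made-up example) of porting an old-style comparator:

```python
import functools


def by_length_then_alpha(a, b):
    """Old-style comparator: returns a negative, zero, or positive number."""
    if len(a) != len(b):
        return len(a) - len(b)
    return (a > b) - (a < b)


words = ['pear', 'fig', 'apple', 'kiwi']

# Python 3: wrap the comparator so that it can be passed as `key`.
ported = sorted(words, key=functools.cmp_to_key(by_length_then_alpha))

# A plain key function often expresses the same ordering more directly.
direct = sorted(words, key=lambda w: (len(w), w))
```

Both calls produce the same ordering; the key-function form is usually the simpler migration when the comparator only compares derived values.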

We will likely uncover more Python 3-related issues in the future and we 
appreciate your feedback. Feel free to report reproducible Python 3-related 
issues as sub-tasks in BEAM-1251.

Contributions are welcome, see: 
[https://beam.apache.org/roadmap/python-sdk/#python-3-support] for details.

> Python 3 Support
> 
>
> Key: BEAM-1251
> URL: https://issues.apache.org/jira/browse/BEAM-1251
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Eyad Sibai
>Assignee: Robbe
>Priority: Major
>  Labels: triaged
>  Time Spent: 28h 50m
>  Remaining Estimate: 0h
>
> I have been trying to use Google Datalab with Python 3. As far as I can see, 
> several packages that Google Datalab depends on do not support Python 3 yet. 
> This is one of them.
> https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/6





[jira] [Work logged] (BEAM-6777) SDK Harness Resilience

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6777?focusedWorklogId=211204=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211204
 ]

ASF GitHub Bot logged work on BEAM-6777:


Author: ASF GitHub Bot
Created on: 11/Mar/19 18:25
Start Date: 11/Mar/19 18:25
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #8012: [BEAM-6777] 
Add HealthDaemon and tests
URL: https://github.com/apache/beam/pull/8012#discussion_r264054020
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/health_daemon.py
 ##
 @@ -0,0 +1,121 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import errno
+import http.client
+import logging
+import socket
+import time
+from builtins import object
+
+
+class HealthDaemon(object):
+  """Sends periodic HTTP PUT /sdk requests to the health server.
+
+  The purpose of this class is to communicate to the health server that this
+  SDK Harness is alive. If this SDK Harness does not communicate to the health
+  server after a configured amount of time, the health server will restart the
+  container.
+
+  Expected Usage:
+    # The HealthDaemon is expected to spin forever, start it on a separate
+    # thread.
+    health_thread = threading.Thread(target=HealthDaemon(8080).start)
+
+    # Automatically kill the thread when the program exits.
+    health_thread.daemon = True
+    health_thread.setName('health-client-demon')
+
+    # Start the HealthDaemon.
+    health_thread.start()
+
+  """
+
+  def __init__(self, health_http_port):
+    self._health_http_port = health_http_port
+
+  @staticmethod
+  def connect_to_server(health_http_port, timeout=5):
+    """Connects to the health server on the given port.
+
+    Args:
+      health_http_port(int): Binding port for the debug server.
+        Default is 0 which means any free unsecured port
+      timeout(int): Timeout in seconds for all operations.
+
+    Returns:
+      The connection to the health server.
+    """
+
+    logging.info('Connecting to localhost:%s', health_http_port)
+    return http.client.HTTPConnection('localhost', health_http_port,
+                                      timeout=timeout)
+
+  @staticmethod
+  def try_health_ping(health_server):
+    """Attempts to ping the given health server.
+
+    Args:
+      health_server(http.client.HTTPConnection): Connection to the health
+        server.
+
+    Returns:
+      True if the health ping succeeded, false otherwise.
+    """
+
+    success = False
+    try:
+      health_server.request('PUT', '/sdk')
 
 Review comment:
   nit: Maybe add `'/sdk'` to a class variable? `HEALTH_CHECK_ENDPOINT` or 
something like that?
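
A minimal sketch of what the suggestion might look like, assuming the constant name `HEALTH_CHECK_ENDPOINT` from the comment. The exception handling and the `RecordingConnection` stand-in are assumptions made for illustration; they are not taken from the pull request.

```python
import http.client


class HealthDaemon(object):
    # Constant name as proposed in the review comment.
    HEALTH_CHECK_ENDPOINT = '/sdk'

    @staticmethod
    def try_health_ping(health_server):
        """Returns True if the PUT request could be sent, False otherwise."""
        try:
            health_server.request('PUT', HealthDaemon.HEALTH_CHECK_ENDPOINT)
            return True
        except (OSError, http.client.HTTPException):
            # Assumed failure modes: socket errors and HTTP protocol errors.
            return False


class RecordingConnection(object):
    """Hypothetical stand-in for http.client.HTTPConnection, for testing."""

    def __init__(self):
        self.requests = []

    def request(self, method, url):
        self.requests.append((method, url))


conn = RecordingConnection()
ok = HealthDaemon.try_health_ping(conn)
```

Keeping the path in one class attribute means that tests and the request code refer to the same value.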
 



Issue Time Tracking
---

Worklog Id: (was: 211204)
Time Spent: 1h  (was: 50m)

> SDK Harness Resilience
> --
>
> Key: BEAM-6777
> URL: https://issues.apache.org/jira/browse/BEAM-6777
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> If the Python SDK Harness crashes in any way (user code exception, OOM, etc) 
> the job will hang and waste resources. The fix is to add a daemon in the SDK 
> Harness and Runner Harness to communicate with Dataflow to restart the VM 
> when stuckness is detected.





[jira] [Work logged] (BEAM-5638) Add exception handling to single message transforms in Java SDK

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5638?focusedWorklogId=211194=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211194
 ]

ASF GitHub Bot logged work on BEAM-5638:


Author: ASF GitHub Bot
Created on: 11/Mar/19 18:00
Start Date: 11/Mar/19 18:00
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #7736: [BEAM-5638] Exception 
handling for Java MapElements and FlatMapElements
URL: https://github.com/apache/beam/pull/7736#issuecomment-471653308
 
 
   @reuvenlax 
 



Issue Time Tracking
---

Worklog Id: (was: 211194)
Time Spent: 9h 10m  (was: 9h)
Remaining Estimate: 158h 50m  (was: 159h)

> Add exception handling to single message transforms in Java SDK
> ---
>
> Key: BEAM-5638
> URL: https://issues.apache.org/jira/browse/BEAM-5638
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Jeff Klukas
>Assignee: Jeff Klukas
>Priority: Minor
>  Labels: triaged
>   Original Estimate: 168h
>  Time Spent: 9h 10m
>  Remaining Estimate: 158h 50m
>
> Add methods to MapElements, FlatMapElements, and Filter that allow users to 
> specify expected exceptions and tuple tags to associate with the 
> collections of the successfully and unsuccessfully processed elements.
> See discussion on dev list:
> https://lists.apache.org/thread.html/936ed2a5f2c01be066fd903abf70130625e0b8cf4028c11b89b8b23f@%3Cdev.beam.apache.org%3E
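
The idea in this description, routing elements that raise an expected exception to a separate collection instead of failing the whole transform, can be sketched in plain Python. `map_with_failures` is a hypothetical helper written for illustration, not the API proposed in the pull request.

```python
def map_with_failures(fn, elements, expected_exceptions):
    """Applies fn to each element, splitting results by outcome.

    Returns (successes, failures). Each failure pairs the offending input
    with the exception it raised. Exceptions outside `expected_exceptions`
    still propagate to the caller.
    """
    successes, failures = [], []
    for element in elements:
        try:
            successes.append(fn(element))
        except expected_exceptions as exc:
            failures.append((element, exc))
    return successes, failures


parsed, rejected = map_with_failures(int, ['1', 'x', '3'], (ValueError,))
```

Listing the expected exception types explicitly keeps unexpected errors visible, since anything not listed still fails loudly.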





[jira] [Work logged] (BEAM-6719) Allow multiple Joins in the same pipeline

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6719?focusedWorklogId=211191=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211191
 ]

ASF GitHub Bot logged work on BEAM-6719:


Author: ASF GitHub Bot
Created on: 11/Mar/19 17:56
Start Date: 11/Mar/19 17:56
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #7813: [BEAM-6719] Allow 
multiple Joins in the same pipeline
URL: https://github.com/apache/beam/pull/7813#issuecomment-471651587
 
 
   I've requested myself and ismael as reviewers. I'll take a look soon.
 



Issue Time Tracking
---

Worklog Id: (was: 211191)
Time Spent: 0.5h  (was: 20m)

> Allow multiple Joins in the same pipeline
> -
>
> Key: BEAM-6719
> URL: https://issues.apache.org/jira/browse/BEAM-6719
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-join-library
>Reporter: Daniel Mescheder
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently it is not possible to have multiple joins in the same pipeline 
> without wrapping them in individual PTransforms as this would generate name 
> clashes.
> Consider the following test case:
> {code:java}
> @Test
> public void testMultipleJoinsInSamePipeline() {
>   leftListOfKv.add(KV.of("Key2", 4L));
>   PCollection<KV<String, Long>> leftCollection =
>       p.apply("CreateLeft", Create.of(leftListOfKv));
>   rightListOfKv.add(KV.of("Key2", "bar"));
>   PCollection<KV<String, String>> rightCollection =
>       p.apply("CreateRight", Create.of(rightListOfKv));
>   expectedResult.add(KV.of("Key2", KV.of(4L, "bar")));
>   PCollection<KV<String, KV<Long, String>>> output1 =
>       Join.innerJoin(leftCollection, rightCollection);
>   PCollection<KV<String, KV<Long, String>>> output2 =
>       Join.innerJoin(leftCollection, rightCollection);
>   PAssert.that(output1).containsInAnyOrder(expectedResult);
>   PAssert.that(output2).containsInAnyOrder(expectedResult);
>   p.run();
> }
> {code}
> This fails because of clashing names in the pipeline and there is currently 
> no way to use the join library to give the joins different names.
> Therefore I find myself routinely wrapping joins in new PTransforms which 
> leads me to believe that this should be part of the library itself.
>  
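
The name clash described above can be illustrated with a toy model. This is plain Python and not the Beam API: `ToyPipeline`, `inner_join`, and the `name` parameter are all hypothetical, and only show why a caller-supplied name would let two joins coexist.

```python
class ToyPipeline(object):
    """Toy stand-in for a pipeline that requires unique transform names."""

    def __init__(self):
        self._names = set()

    def apply(self, name, fn, value):
        if name in self._names:
            raise ValueError('Transform name already used: %s' % name)
        self._names.add(name)
        return fn(value)


def inner_join(pipeline, left, right, name='Join'):
    """Joins two dicts on their keys; `name` prefixes the internal step."""
    def join(pair):
        lhs, rhs = pair
        return {k: (lhs[k], rhs[k]) for k in lhs if k in rhs}
    return pipeline.apply(name + '/CoGroupByKey', join, (left, right))


p = ToyPipeline()
left, right = {'Key2': 4}, {'Key2': 'bar'}
first = inner_join(p, left, right, name='Join1')
second = inner_join(p, left, right, name='Join2')

try:
    inner_join(p, left, right, name='Join1')
    clashed = False
except ValueError:
    clashed = True
```

Without a distinct `name`, the second join reuses the same internal step name and is rejected, which is the failure mode the test case in the issue demonstrates.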





[jira] [Work logged] (BEAM-6719) Allow multiple Joins in the same pipeline

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6719?focusedWorklogId=211190=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211190
 ]

ASF GitHub Bot logged work on BEAM-6719:


Author: ASF GitHub Bot
Created on: 11/Mar/19 17:55
Start Date: 11/Mar/19 17:55
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #7813: [BEAM-6719] Allow 
multiple Joins in the same pipeline
URL: https://github.com/apache/beam/pull/7813#issuecomment-471651186
 
 
   Hello Daniel! I'm so sorry that we did not pick this up. Luke is away on 
leave, so we'd need to get you a new reviewer.
 



Issue Time Tracking
---

Worklog Id: (was: 211190)
Time Spent: 20m  (was: 10m)

> Allow multiple Joins in the same pipeline
> -
>
> Key: BEAM-6719
> URL: https://issues.apache.org/jira/browse/BEAM-6719
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-join-library
>Reporter: Daniel Mescheder
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently it is not possible to have multiple joins in the same pipeline 
> without wrapping them in individual PTransforms as this would generate name 
> clashes.
> Consider the following test case:
> {code:java}
> @Test
> public void testMultipleJoinsInSamePipeline() {
>   leftListOfKv.add(KV.of("Key2", 4L));
>   PCollection<KV<String, Long>> leftCollection =
>       p.apply("CreateLeft", Create.of(leftListOfKv));
>   rightListOfKv.add(KV.of("Key2", "bar"));
>   PCollection<KV<String, String>> rightCollection =
>       p.apply("CreateRight", Create.of(rightListOfKv));
>   expectedResult.add(KV.of("Key2", KV.of(4L, "bar")));
>   PCollection<KV<String, KV<Long, String>>> output1 =
>       Join.innerJoin(leftCollection, rightCollection);
>   PCollection<KV<String, KV<Long, String>>> output2 =
>       Join.innerJoin(leftCollection, rightCollection);
>   PAssert.that(output1).containsInAnyOrder(expectedResult);
>   PAssert.that(output2).containsInAnyOrder(expectedResult);
>   p.run();
> }
> {code}
> This fails because of clashing names in the pipeline and there is currently 
> no way to use the join library to give the joins different names.
> Therefore I find myself routinely wrapping joins in new PTransforms which 
> leads me to believe that this should be part of the library itself.
>  





[jira] [Work logged] (BEAM-3660) Port ReadSpannerSchemaTest off DoFnTester

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3660?focusedWorklogId=211189=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211189
 ]

ASF GitHub Bot logged work on BEAM-3660:


Author: ASF GitHub Bot
Created on: 11/Mar/19 17:54
Start Date: 11/Mar/19 17:54
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #7231: [BEAM-3660] Port 
ReadSpannerSchemaTest off DoFnTester
URL: https://github.com/apache/beam/pull/7231#issuecomment-471650610
 
 
   @Nisuuum : ( happy to merge, just looking for answers on the previous 
question
 



Issue Time Tracking
---

Worklog Id: (was: 211189)
Time Spent: 1h 10m  (was: 1h)

> Port ReadSpannerSchemaTest off DoFnTester
> -
>
> Key: BEAM-3660
> URL: https://issues.apache.org/jira/browse/BEAM-3660
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-gcp
>Reporter: Kenneth Knowles
>Assignee: Evgeniy Musin
>Priority: Major
>  Labels: beginner, newbie, starter, triaged
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (BEAM-4164) Make unit tests of CassandraIO use embeded server

2019-03-11 Thread Ahmet Altay (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay updated BEAM-4164:
--
Fix Version/s: (was: 2.11.0)
   2.12.0

> Make unit tests of CassandraIO use embeded server
> -
>
> Key: BEAM-4164
> URL: https://issues.apache.org/jira/browse/BEAM-4164
> Project: Beam
>  Issue Type: Test
>  Components: io-java-cassandra
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: triaged
> Fix For: 2.12.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> The unit tests currently use a mock of the Cassandra server. It would be good 
> to run the tests against an embedded Cassandra instance, to be as close as 
> possible to a real Cassandra server. Why not the one from cassandra-unit 
> ([https://mvnrepository.com/artifact/org.cassandraunit/cassandra-unit/3.3.0.2])?





[jira] [Work logged] (BEAM-6805) assertTrue used where assertEquals should be

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6805?focusedWorklogId=211175=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211175
 ]

ASF GitHub Bot logged work on BEAM-6805:


Author: ASF GitHub Bot
Created on: 11/Mar/19 17:45
Start Date: 11/Mar/19 17:45
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #7806: [BEAM-6805] 
Use assertEquals(x, y) instead of assertTrue(x.equals(y))
URL: https://github.com/apache/beam/pull/7806
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 211175)
Time Spent: 10m
Remaining Estimate: 0h

> assertTrue used where assertEquals should be
> 
>
> Key: BEAM-6805
> URL: https://issues.apache.org/jira/browse/BEAM-6805
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Pablo Estrada
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
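
The issue concerns Java tests, but the motivation carries over to any xUnit-style framework: an equality assertion reports both values on failure, while a boolean assertion only reports that the condition was false. A small Python `unittest` analogue (an illustration only, not code from the pull request; `message_of` is a hypothetical helper):

```python
import unittest


def message_of(assertion, *args):
    """Runs an assertion and returns its failure message, or None."""
    try:
        assertion(*args)
    except AssertionError as exc:
        return str(exc)
    return None


tc = unittest.TestCase()

# Boolean style: the compared values are lost before the assertion runs.
boolean_style = message_of(tc.assertTrue, [1, 2, 3] == [1, 2, 4])

# Equality style: the assertion sees both values and can describe the diff.
equality_style = message_of(tc.assertEqual, [1, 2, 3], [1, 2, 4])
```

The first message says only that the condition was false; the second names both lists and the first differing element.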






[jira] [Updated] (BEAM-6292) PasswordDecrypter: Delay decryption / Avoid serialization

2019-03-11 Thread Ahmet Altay (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay updated BEAM-6292:
--
Fix Version/s: (was: 2.11.0)
   2.12.0

> PasswordDecrypter: Delay decryption / Avoid serialization
> -
>
> Key: BEAM-6292
> URL: https://issues.apache.org/jira/browse/BEAM-6292
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-cassandra
>Reporter: Mathieu Blanchard
>Assignee: Mathieu Blanchard
>Priority: Minor
>  Labels: triaged
> Fix For: 2.12.0
>
>  Time Spent: 10h 40m
>  Remaining Estimate: 0h
>
> Currently, the password is decrypted before the serialization of the pipeline 
> and this causes the raw version to be visible to everyone on the staging 
> location.
> To avoid this, we delay decryption of the password until connecting to 
> the cluster, which ensures that the raw password is never serialized in the 
> pipeline.
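
The approach can be sketched as follows. This is a toy illustration in plain Python, not the CassandraIO code: `build_spec`, `connect`, and `toy_decrypt` are hypothetical names, and base64 stands in for a real decrypter purely so the example runs (it is not encryption).

```python
import base64
import pickle


def toy_decrypt(token):
    """Placeholder for a real PasswordDecrypter; base64 is NOT encryption."""
    return base64.b64decode(token).decode('utf-8')


def build_spec(username, encrypted_password):
    """What gets serialized with the pipeline: only the encrypted form."""
    return {'username': username, 'encrypted_password': encrypted_password}


def connect(spec, decrypter):
    """Decrypts at connection time, after the spec has been deserialized."""
    password = decrypter(spec['encrypted_password'])
    return (spec['username'], password)


spec = build_spec('beam', base64.b64encode(b's3cret'))
serialized = pickle.dumps(spec)          # what would reach the staging location
credentials = connect(pickle.loads(serialized), toy_decrypt)
```

Because decryption happens inside `connect`, the serialized bytes never contain the raw password.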





[jira] [Work logged] (BEAM-6805) assertTrue used where assertEquals should be

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6805?focusedWorklogId=211177=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211177
 ]

ASF GitHub Bot logged work on BEAM-6805:


Author: ASF GitHub Bot
Created on: 11/Mar/19 17:45
Start Date: 11/Mar/19 17:45
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #7806: [BEAM-6805] Use 
assertEquals(x, y) instead of assertTrue(x.equals(y))
URL: https://github.com/apache/beam/pull/7806#issuecomment-471647338
 
 
   Sorry about the delay.
 



Issue Time Tracking
---

Worklog Id: (was: 211177)
Time Spent: 20m  (was: 10m)

> assertTrue used where assertEquals should be
> 
>
> Key: BEAM-6805
> URL: https://issues.apache.org/jira/browse/BEAM-6805
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Pablo Estrada
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (BEAM-6591) CassandraIO split does not work in some corner cases.

2019-03-11 Thread Ahmet Altay (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay updated BEAM-6591:
--
Fix Version/s: (was: 2.11.0)
   2.12.0

> CassandraIO split does not work in some corner cases.
> -
>
> Key: BEAM-6591
> URL: https://issues.apache.org/jira/browse/BEAM-6591
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-cassandra
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: triaged
> Fix For: 2.12.0
>
>
> CassandraIO split uses token ranges to split data in the Read part of the IO. 
> When one split ends up using the minimum token in the token ring, then the IO 
> reads all the data in one split, leading to duplication. This is due to 
> behavior of Cassandra: see 
> https://issues.apache.org/jira/browse/CASSANDRA-14684





[jira] [Created] (BEAM-6805) assertTrue used where assertEquals should be

2019-03-11 Thread Pablo Estrada (JIRA)
Pablo Estrada created BEAM-6805:
---

 Summary: assertTrue used where assertEquals should be
 Key: BEAM-6805
 URL: https://issues.apache.org/jira/browse/BEAM-6805
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-core
Reporter: Pablo Estrada








[jira] [Work logged] (BEAM-6748) Block size difference in avro library on Python3 causes some AvroIO tests to fail.

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6748?focusedWorklogId=211172=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211172
 ]

ASF GitHub Bot logged work on BEAM-6748:


Author: ASF GitHub Bot
Created on: 11/Mar/19 17:42
Start Date: 11/Mar/19 17:42
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #8015: [BEAM-6748] Account 
for synchronization interval when estimating amount of blocks in generated Avro 
test file.
URL: https://github.com/apache/beam/pull/8015#issuecomment-471645741
 
 
   Thanks for review & merge, @chamikaramj .
 



Issue Time Tracking
---

Worklog Id: (was: 211172)
Time Spent: 1h 20m  (was: 1h 10m)

> Block size difference in avro library on Python3 causes some AvroIO tests to 
> fail.
> --
>
> Key: BEAM-6748
> URL: https://issues.apache.org/jira/browse/BEAM-6748
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> *apache_beam.io.avroio_test.TestAvro.test_split_points*
> *apache_beam.io.avroio_test.TestFastAvro.test_split_points*
> fail with:
>  
> {code:java}
> Traceback (most recent call last):
>  File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio_test.py", 
> line 308, in test_split_points
>  self.assertEquals(split_points_report[-10:], [(2, 1)] * 10)
> AssertionError: Lists differ: [(10, 1), (10, 1), (10, 1), (10, 1), (10, 1[42 
> chars], 1)] != [(2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2[32 chars], 1)]
> First differing element 0:
> (10, 1)
> (2, 1)
> + [(2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2, 1), 
> (2, 1)]
> - [(10, 1),
> - (10, 1),
> - (10, 1),
> - (10, 1),
> - (10, 1),
> - (10, 1),
> - (10, 1),
> - (10, 1),
> - (10, 1),
> - (10, 1)] {code}





[jira] [Commented] (BEAM-6185) Upgrade to Spark 2.4.0

2019-03-11 Thread Alexey Romanenko (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789795#comment-16789795
 ] 

Alexey Romanenko commented on BEAM-6185:


Cloudera CDH 6.1.0 is already based on the upstream Apache Spark 2.4 release. 
[https://www.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_610_new_features.html#spark_new_features]
Does it make sense to move forward and upgrade the Spark version in Beam? 

> Upgrade to Spark 2.4.0
> --
>
> Key: BEAM-6185
> URL: https://issues.apache.org/jira/browse/BEAM-6185
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Labels: triaged
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-6804) [beam_PostCommit_Java] [PubsubReadIT.testReadPublicData] Timeout waiting on Sub

2019-03-11 Thread Mikhail Gryzykhin (JIRA)
Mikhail Gryzykhin created BEAM-6804:
---

 Summary: [beam_PostCommit_Java] [PubsubReadIT.testReadPublicData] 
Timeout waiting on Sub
 Key: BEAM-6804
 URL: https://issues.apache.org/jira/browse/BEAM-6804
 Project: Beam
  Issue Type: Bug
  Components: test-failures
Reporter: Mikhail Gryzykhin
Assignee: Kenneth Knowles


_Use this form to file an issue for test failure:_
 * [Jenkins 
Job|https://builds.apache.org/job/beam_PostCommit_Java/2796/testReport/junit/org.apache.beam.sdk.io.gcp.pubsub/PubsubReadIT/testReadPublicData/]
 * [Gradle Build Scan|https://scans.gradle.com/s/3s4lnjovurqdi]
 * [Test source 
code|https://github.com/apache/beam/blame/b953645ed6db837d24284d7fe1fe091e7309f821/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubReadIT.java]

Initial investigation:

java.lang.AssertionError: Did not receive signal on projects/apache-beam-testing/subscriptions/start-subscription-313044384168895769 in 300s
 at org.apache.beam.sdk.io.gcp.pubsub.TestPubsubSignal.pollForResultForDuration(TestPubsubSignal.java:269)
 at org.apache.beam.sdk.io.gcp.pubsub.TestPubsubSignal.lambda$waitForStart$0(TestPubsubSignal.java:218)
 at org.apache.beam.vendor.guava.v20_0.com.google.common.base.Suppliers$MemoizingSupplier.get(Suppliers.java:120)
 at org.apache.beam.sdk.io.gcp.pubsub.PubsubReadIT.testReadPublicData(PubsubReadIT.java:54)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)

_After you've filled out the above details, please [assign the issue to an 
individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
 Assignee should [treat test failures as 
high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test],
 helping to fix the issue or find a more appropriate owner. See [Apache Beam 
Post-Commit Policies|https://beam.apache.org/contribute/postcommits-policies]._





[jira] [Work logged] (BEAM-6748) Block size difference in avro library on Python3 causes some AvroIO tests to fail.

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6748?focusedWorklogId=211144=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211144
 ]

ASF GitHub Bot logged work on BEAM-6748:


Author: ASF GitHub Bot
Created on: 11/Mar/19 17:15
Start Date: 11/Mar/19 17:15
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #8015: 
[BEAM-6748] Account for synchronization interval when estimating amount of 
blocks in generated Avro test file.
URL: https://github.com/apache/beam/pull/8015
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 211144)
Time Spent: 1h 10m  (was: 1h)

> Block size difference in avro library on Python3 causes some AvroIO tests to 
> fail.
> --
>
> Key: BEAM-6748
> URL: https://issues.apache.org/jira/browse/BEAM-6748
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> *apache_beam.io.avroio_test.TestAvro.test_split_points*
> *apache_beam.io.avroio_test.TestFastAvro.test_split_points*
> fail with:
>  
> {code:java}
> Traceback (most recent call last):
>  File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio_test.py", 
> line 308, in test_split_points
>  self.assertEquals(split_points_report[-10:], [(2, 1)] * 10)
> AssertionError: Lists differ: [(10, 1), (10, 1), (10, 1), (10, 1), (10, 1[42 
> chars], 1)] != [(2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2[32 chars], 1)]
> First differing element 0:
> (10, 1)
> (2, 1)
> + [(2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2, 1), 
> (2, 1)]
> - [(10, 1),
> - (10, 1),
> - (10, 1),
> - (10, 1),
> - (10, 1),
> - (10, 1),
> - (10, 1),
> - (10, 1),
> - (10, 1),
> - (10, 1)] {code}





[jira] [Work logged] (BEAM-6726) Gradle Publish fails with Gradle 5

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6726?focusedWorklogId=211156=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211156
 ]

ASF GitHub Bot logged work on BEAM-6726:


Author: ASF GitHub Bot
Created on: 11/Mar/19 17:18
Start Date: 11/Mar/19 17:18
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #8026: [BEAM-6726] 
explicitly specify signing key
URL: https://github.com/apache/beam/pull/8026#discussion_r264339885
 
 

 ##
 File path: release/src/main/scripts/build_release_candidate.sh
 ##
 @@ -56,12 +56,19 @@ read USER_GITHUB_ID
 
 USER_REMOTE_URL=g...@github.com:${USER_GITHUB_ID}/beam-site
 
+echo "Listing all GPG keys="
+gpg --list-keys --keyid-format LONG --fingerprint --fingerprint
+echo "Please copy the public key which is associated with your Apache account:"
+
+read SIGNING_KEY
 
 Review comment:
   Sounds good. Would you mind adding a JIRA TODO comment here to clean this up 
in all scripts?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 211156)
Time Spent: 3h 40m  (was: 3.5h)

> Gradle Publish fails with Gradle 5
> --
>
> Key: BEAM-6726
> URL: https://issues.apache.org/jira/browse/BEAM-6726
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Affects Versions: 2.11.0
>Reporter: Ahmet Altay
>Assignee: Michael Luckey
>Priority: Blocker
> Fix For: 2.12.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> cc: [~alanmyrvold] [~kenn]
> :beam-sdks-java-bom:signMavenJavaPublication task fails with an obscure 
> error: 
> (https://scans.gradle.com/s/mcbb4axlx6agy/failure?openFailures=WzBd=WzFd#top=0):
> Duplicate key pom-default.xml.asc:xml.asc:asc:null (attempted merging values 
> Signature pom-default.xml.asc:xml.asc:asc:null and Signature 
> pom-default.xml.asc:xml.asc:asc:null)
> Downgrading to Gradle 4 by reverting 
> https://github.com/apache/beam/commit/cadb6f7fabc6faedc6037104338306688f17652f
>  works.





[jira] [Work logged] (BEAM-6726) Gradle Publish fails with Gradle 5

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6726?focusedWorklogId=211155=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211155
 ]

ASF GitHub Bot logged work on BEAM-6726:


Author: ASF GitHub Bot
Created on: 11/Mar/19 17:18
Start Date: 11/Mar/19 17:18
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #8026: [BEAM-6726] 
explicitly specify signing key
URL: https://github.com/apache/beam/pull/8026#discussion_r264339709
 
 

 ##
 File path: release/src/main/scripts/build_release_candidate.sh
 ##
 @@ -98,7 +105,8 @@ if [[ $confirmation = "y" ]]; then
   echo "2. new rc tag has created in github."
 
   echo "-Staging Java Artifacts into Maven---"
-  ./gradlew publish -PisRelease --no-daemon
+  gpg --local-user ${SIGNING_KEY} --output /dev/null --sign ~/.bashrc
 
 Review comment:
   Fair enough.
 



Issue Time Tracking
---

Worklog Id: (was: 211155)
Time Spent: 3.5h  (was: 3h 20m)

> Gradle Publish fails with Gradle 5
> --
>
> Key: BEAM-6726
> URL: https://issues.apache.org/jira/browse/BEAM-6726
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Affects Versions: 2.11.0
>Reporter: Ahmet Altay
>Assignee: Michael Luckey
>Priority: Blocker
> Fix For: 2.12.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> cc: [~alanmyrvold] [~kenn]
> :beam-sdks-java-bom:signMavenJavaPublication task fails with an obscure 
> error: 
> (https://scans.gradle.com/s/mcbb4axlx6agy/failure?openFailures=WzBd=WzFd#top=0):
> Duplicate key pom-default.xml.asc:xml.asc:asc:null (attempted merging values 
> Signature pom-default.xml.asc:xml.asc:asc:null and Signature 
> pom-default.xml.asc:xml.asc:asc:null)
> Downgrading to Gradle 4 by reverting 
> https://github.com/apache/beam/commit/cadb6f7fabc6faedc6037104338306688f17652f
>  works.





[jira] [Work logged] (BEAM-6726) Gradle Publish fails with Gradle 5

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6726?focusedWorklogId=211146=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211146
 ]

ASF GitHub Bot logged work on BEAM-6726:


Author: ASF GitHub Bot
Created on: 11/Mar/19 17:16
Start Date: 11/Mar/19 17:16
Worklog Time Spent: 10m 
  Work Description: adude3141 commented on issue #8026: [BEAM-6726] 
explicitly specify signing key
URL: https://github.com/apache/beam/pull/8026#issuecomment-471635008
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 211146)
Time Spent: 3h 10m  (was: 3h)

> Gradle Publish fails with Gradle 5
> --
>
> Key: BEAM-6726
> URL: https://issues.apache.org/jira/browse/BEAM-6726
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Affects Versions: 2.11.0
>Reporter: Ahmet Altay
>Assignee: Michael Luckey
>Priority: Blocker
> Fix For: 2.12.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> cc: [~alanmyrvold] [~kenn]
> :beam-sdks-java-bom:signMavenJavaPublication task fails with an obscure 
> error: 
> (https://scans.gradle.com/s/mcbb4axlx6agy/failure?openFailures=WzBd=WzFd#top=0):
> Duplicate key pom-default.xml.asc:xml.asc:asc:null (attempted merging values 
> Signature pom-default.xml.asc:xml.asc:asc:null and Signature 
> pom-default.xml.asc:xml.asc:asc:null)
> Downgrading to Gradle 4 by reverting 
> https://github.com/apache/beam/commit/cadb6f7fabc6faedc6037104338306688f17652f
>  works.





[jira] [Work logged] (BEAM-6726) Gradle Publish fails with Gradle 5

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6726?focusedWorklogId=211147=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211147
 ]

ASF GitHub Bot logged work on BEAM-6726:


Author: ASF GitHub Bot
Created on: 11/Mar/19 17:16
Start Date: 11/Mar/19 17:16
Worklog Time Spent: 10m 
  Work Description: adude3141 commented on issue #8026: [BEAM-6726] 
explicitly specify signing key
URL: https://github.com/apache/beam/pull/8026#issuecomment-471635196
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 211147)
Time Spent: 3h 20m  (was: 3h 10m)

> Gradle Publish fails with Gradle 5
> --
>
> Key: BEAM-6726
> URL: https://issues.apache.org/jira/browse/BEAM-6726
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Affects Versions: 2.11.0
>Reporter: Ahmet Altay
>Assignee: Michael Luckey
>Priority: Blocker
> Fix For: 2.12.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> cc: [~alanmyrvold] [~kenn]
> :beam-sdks-java-bom:signMavenJavaPublication task fails with an obscure 
> error: 
> (https://scans.gradle.com/s/mcbb4axlx6agy/failure?openFailures=WzBd=WzFd#top=0):
> Duplicate key pom-default.xml.asc:xml.asc:asc:null (attempted merging values 
> Signature pom-default.xml.asc:xml.asc:asc:null and Signature 
> pom-default.xml.asc:xml.asc:asc:null)
> Downgrading to Gradle 4 by reverting 
> https://github.com/apache/beam/commit/cadb6f7fabc6faedc6037104338306688f17652f
>  works.





[jira] [Work logged] (BEAM-6726) Gradle Publish fails with Gradle 5

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6726?focusedWorklogId=211143=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211143
 ]

ASF GitHub Bot logged work on BEAM-6726:


Author: ASF GitHub Bot
Created on: 11/Mar/19 17:14
Start Date: 11/Mar/19 17:14
Worklog Time Spent: 10m 
  Work Description: adude3141 commented on pull request #8026: [BEAM-6726] 
explicitly specify signing key
URL: https://github.com/apache/beam/pull/8026#discussion_r264338204
 
 

 ##
 File path: release/src/main/scripts/build_release_candidate.sh
 ##
 @@ -56,12 +56,19 @@ read USER_GITHUB_ID
 
 USER_REMOTE_URL=g...@github.com:${USER_GITHUB_ID}/beam-site
 
+echo "Listing all GPG keys="
+gpg --list-keys --keyid-format LONG --fingerprint --fingerprint
+echo "Please copy the public key which is associated with your Apache account:"
+
+read SIGNING_KEY
 
 Review comment:
   Probably yes.
   
   But we did not check this before [1], so I did not bother to implement it, as 
it would probably require keeping some state across scripts. Currently this is 
left to manual release verification.
   
   As I assume these scripts need some rework anyway, I restricted 
the scope of this PR to a minimal viable solution to get the release working on 
Gradle 5.
   
   [1] There is no check of the signing key set in git config against the key put into 
the KEYS file, nor against the default key used for signing artifacts.
 



Issue Time Tracking
---

Worklog Id: (was: 211143)
Time Spent: 3h  (was: 2h 50m)

> Gradle Publish fails with Gradle 5
> --
>
> Key: BEAM-6726
> URL: https://issues.apache.org/jira/browse/BEAM-6726
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Affects Versions: 2.11.0
>Reporter: Ahmet Altay
>Assignee: Michael Luckey
>Priority: Blocker
> Fix For: 2.12.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> cc: [~alanmyrvold] [~kenn]
> :beam-sdks-java-bom:signMavenJavaPublication task fails with an obscure 
> error: 
> (https://scans.gradle.com/s/mcbb4axlx6agy/failure?openFailures=WzBd=WzFd#top=0):
> Duplicate key pom-default.xml.asc:xml.asc:asc:null (attempted merging values 
> Signature pom-default.xml.asc:xml.asc:asc:null and Signature 
> pom-default.xml.asc:xml.asc:asc:null)
> Downgrading to Gradle 4 by reverting 
> https://github.com/apache/beam/commit/cadb6f7fabc6faedc6037104338306688f17652f
>  works.





[jira] [Updated] (BEAM-6803) Do not use conscrypt SSL by default

2019-03-11 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-6803:
--
Fix Version/s: (was: 2.9.0)
   2.7.1

> Do not use conscrypt SSL by default
> ---
>
> Key: BEAM-6803
> URL: https://issues.apache.org/jira/browse/BEAM-6803
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
>Priority: Blocker
>  Labels: triaged
> Fix For: 2.7.1
>
>
> An experimental flag is being added to disable it for now with an option to 
> enable it per-workflow.
> Also related:
> https://issues.apache.org/jira/browse/BEAM-5747 - Upgrade conscrypt to its 
> latest version.





[jira] [Updated] (BEAM-6803) LTS backport: Do not use conscrypt SSL by default

2019-03-11 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-6803:
--
Priority: Major  (was: Blocker)

> LTS backport: Do not use conscrypt SSL by default
> -
>
> Key: BEAM-6803
> URL: https://issues.apache.org/jira/browse/BEAM-6803
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
>Priority: Major
>  Labels: triaged
> Fix For: 2.7.1
>
>
> An experimental flag is being added to disable it for now with an option to 
> enable it per-workflow.
> Also related:
> https://issues.apache.org/jira/browse/BEAM-5747 - Upgrade conscrypt to its 
> latest version.





[jira] [Resolved] (BEAM-6182) Use of conscrypt SSL results in stuck workflows in Dataflow

2019-03-11 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles resolved BEAM-6182.
---
Resolution: Fixed
  Assignee: Ahmet Altay  (was: Tyler Akidau)

> Use of conscrypt SSL results in stuck workflows in Dataflow
> ---
>
> Key: BEAM-6182
> URL: https://issues.apache.org/jira/browse/BEAM-6182
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
>Priority: Blocker
>  Labels: triaged
> Fix For: 2.9.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> An experimental flag is being added to disable it for now with an option to 
> enable it per-workflow.
> Also related:
> https://issues.apache.org/jira/browse/BEAM-5747 - Upgrade conscrypt to its 
> latest version.





[jira] [Work logged] (BEAM-6726) Gradle Publish fails with Gradle 5

2019-03-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6726?focusedWorklogId=211132=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-211132
 ]

ASF GitHub Bot logged work on BEAM-6726:


Author: ASF GitHub Bot
Created on: 11/Mar/19 17:05
Start Date: 11/Mar/19 17:05
Worklog Time Spent: 10m 
  Work Description: adude3141 commented on pull request #8026: [BEAM-6726] 
explicitly specify signing key
URL: https://github.com/apache/beam/pull/8026#discussion_r264334366
 
 

 ##
 File path: release/src/main/scripts/build_release_candidate.sh
 ##
 @@ -98,7 +105,8 @@ if [[ $confirmation = "y" ]]; then
   echo "2. new rc tag has created in github."
 
   echo "-Staging Java Artifacts into Maven---"
-  ./gradlew publish -PisRelease --no-daemon
+  gpg --local-user ${SIGNING_KEY} --output /dev/null --sign ~/.bashrc
 
 Review comment:
   No. It ensures the key is unlocked, so that gpg-agent provides 
access to the key without requesting user input during the Gradle call. As 
Gradle is configured to shell out to the gpg CLI, the streams get broken and no input 
is possible.
   
   And yes, this will break if .bashrc does not exist. But the same pattern 
was used before [1], so I just reused it.
   
   Of course, we might reconsider that.
   
   [1] 
https://github.com/apache/beam/blob/master/release/src/main/scripts/verify_release_build.sh#L140
 
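
The unlock step discussed in this review comment could be sketched as a small helper. This is only an illustration and not part of the Beam release scripts: the function name `unlock_signing_key` and the `GPG_BIN` override are hypothetical. It uses the same gpg flags as the diff above, but signs a throwaway temp file rather than ~/.bashrc, so it does not depend on that file existing.

```shell
# Hypothetical helper (an assumption, not taken from the Beam scripts).
# Signs a throwaway file so that gpg-agent caches the passphrase before
# a non-interactive caller, such as a Gradle build, needs the key.
unlock_signing_key() {
  local key="$1"
  # GPG_BIN is an illustrative override hook; it defaults to plain gpg.
  local gpg_bin="${GPG_BIN:-gpg}"
  local probe
  probe="$(mktemp)" || return 1
  # Same flags as in the reviewed script; the signature itself is discarded.
  "${gpg_bin}" --local-user "${key}" --output /dev/null --sign "${probe}"
  local rc=$?
  rm -f "${probe}"
  return "${rc}"
}
```

A script would call it as `unlock_signing_key "${SIGNING_KEY}"` before `./gradlew publish`. How long the key then stays unlocked depends on the local gpg-agent cache settings, so this remains a best-effort step.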



Issue Time Tracking
---

Worklog Id: (was: 211132)
Time Spent: 2h 50m  (was: 2h 40m)

> Gradle Publish fails with Gradle 5
> --
>
> Key: BEAM-6726
> URL: https://issues.apache.org/jira/browse/BEAM-6726
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Affects Versions: 2.11.0
>Reporter: Ahmet Altay
>Assignee: Michael Luckey
>Priority: Blocker
> Fix For: 2.12.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> cc: [~alanmyrvold] [~kenn]
> :beam-sdks-java-bom:signMavenJavaPublication task fails with an obscure 
> error: 
> (https://scans.gradle.com/s/mcbb4axlx6agy/failure?openFailures=WzBd=WzFd#top=0):
> Duplicate key pom-default.xml.asc:xml.asc:asc:null (attempted merging values 
> Signature pom-default.xml.asc:xml.asc:asc:null and Signature 
> pom-default.xml.asc:xml.asc:asc:null)
> Downgrading to Gradle 4 by reverting 
> https://github.com/apache/beam/commit/cadb6f7fabc6faedc6037104338306688f17652f
>  works.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-6802) Re-enable conscrypt SSL as default when possible

2019-03-11 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-6802:
--
Priority: Major  (was: Blocker)

> Re-enable conscrypt SSL as default when possible
> 
>
> Key: BEAM-6802
> URL: https://issues.apache.org/jira/browse/BEAM-6802
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
>Priority: Major
>  Labels: triaged
>
> An experimental flag is being added to disable it for now with an option to 
> enable it per-workflow.
> Also related:
> https://issues.apache.org/jira/browse/BEAM-5747 - Upgrade conscrypt to its 
> latest version.





[jira] [Updated] (BEAM-6802) Re-enable conscrypt SSL as default when possible

2019-03-11 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-6802:
--
Fix Version/s: (was: 2.9.0)

> Re-enable conscrypt SSL as default when possible
> 
>
> Key: BEAM-6802
> URL: https://issues.apache.org/jira/browse/BEAM-6802
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
>Priority: Blocker
>  Labels: triaged
>
> An experimental flag is being added to disable it for now with an option to 
> enable it per-workflow.
> Also related:
> https://issues.apache.org/jira/browse/BEAM-5747 - Upgrade conscrypt to its 
> latest version.





[jira] [Updated] (BEAM-6803) LTS backport: Do not use conscrypt SSL by default

2019-03-11 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-6803:
--
Summary: LTS backport: Do not use conscrypt SSL by default  (was: Do not 
use conscrypt SSL by default)

> LTS backport: Do not use conscrypt SSL by default
> -
>
> Key: BEAM-6803
> URL: https://issues.apache.org/jira/browse/BEAM-6803
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
>Priority: Blocker
>  Labels: triaged
> Fix For: 2.7.1
>
>
> An experimental flag is being added to disable it for now with an option to 
> enable it per-workflow.
> Also related:
> https://issues.apache.org/jira/browse/BEAM-5747 - Upgrade conscrypt to its 
> latest version.





[jira] [Created] (BEAM-6803) Do not use conscrypt SSL by default

2019-03-11 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-6803:
-

 Summary: Do not use conscrypt SSL by default
 Key: BEAM-6803
 URL: https://issues.apache.org/jira/browse/BEAM-6803
 Project: Beam
  Issue Type: Bug
  Components: runner-dataflow
Reporter: Ahmet Altay
Assignee: Ahmet Altay
 Fix For: 2.9.0


An experimental flag is being added to disable it for now with an option to 
enable it per-workflow.

Also related:
https://issues.apache.org/jira/browse/BEAM-5747 - Upgrade conscrypt to its 
latest version.





[jira] [Created] (BEAM-6802) Re-enable conscrypt SSL as default when possible

2019-03-11 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-6802:
-

 Summary: Re-enable conscrypt SSL as default when possible
 Key: BEAM-6802
 URL: https://issues.apache.org/jira/browse/BEAM-6802
 Project: Beam
  Issue Type: Bug
  Components: runner-dataflow
Reporter: Ahmet Altay
Assignee: Ahmet Altay
 Fix For: 2.9.0


An experimental flag is being added to disable it for now with an option to 
enable it per-workflow.

Also related:
https://issues.apache.org/jira/browse/BEAM-5747 - Upgrade conscrypt to its 
latest version.





[jira] [Commented] (BEAM-6182) Use of conscrypt SSL results in stuck workflows in Dataflow

2019-03-11 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789760#comment-16789760
 ] 

Kenneth Knowles commented on BEAM-6182:
---

The blog post mentions it but the auto-generated release notes are missing this 
one. Probably fine. I think I will resolve this to 2.9.0 anyhow and create 
clones for other actions.

> Use of conscrypt SSL results in stuck workflows in Dataflow
> ---
>
> Key: BEAM-6182
> URL: https://issues.apache.org/jira/browse/BEAM-6182
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Ahmet Altay
>Assignee: Tyler Akidau
>Priority: Blocker
>  Labels: triaged
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> An experimental flag is being added to disable it for now with an option to 
> enable it per-workflow.
> Also related:
> https://issues.apache.org/jira/browse/BEAM-5747 - Upgrade conscrypt to its 
> latest version.




