[jira] [Work logged] (BEAM-8895) BigQueryIO streaming test on Java is flaky

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8895?focusedWorklogId=354985=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354985
 ]

ASF GitHub Bot logged work on BEAM-8895:


Author: ASF GitHub Bot
Created on: 06/Dec/19 07:22
Start Date: 06/Dec/19 07:22
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #10294: [BEAM-8895] Add 
BigQuery table name sanitization to BigQueryIOIT
URL: https://github.com/apache/beam/pull/10294#issuecomment-562463303
 
 
   Run BigQueryIO Streaming Performance Test Java
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354985)
Time Spent: 1h 40m  (was: 1.5h)

> BigQueryIO streaming test on Java is flaky
> --
>
> Key: BEAM-8895
> URL: https://issues.apache.org/jira/browse/BEAM-8895
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Michal Walenia
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> {code:java}
> SEVERE: 2019-12-05T06:57:31.089Z: java.lang.RuntimeException: 
> com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad 
> Request
> 07:57:32 {
> 07:57:32   "code" : 400,
> 07:57:32   "errors" : [ {
> 07:57:32 "domain" : "global",
> 07:57:32 "message" : "Invalid table ID 
> \"bqio_write_10GB_java_e27dc010-6896-41ac-90f3-25b5adc58617\". Table IDs must 
> be alphanumeric (plus underscores) and must be at most 1024 characters long. 
> Also, Table decorators cannot be used.",
> 07:57:32 "reason" : "invalid"
> 07:57:32   } ],
> 07:57:32   "message" : "Invalid table ID 
> \"bqio_write_10GB_java_e27dc010-6896-41ac-90f3-25b5adc58617\". Table IDs must 
> be alphanumeric (plus underscores) and must be at most 1024 characters long. 
> Also, Table decorators cannot be used.",
> 07:57:32   "status" : "INVALID_ARGUMENT"
> 07:57:32 }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8882) Allow Dataflow to automatically choose portability or not.

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8882?focusedWorklogId=354958=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354958
 ]

ASF GitHub Bot logged work on BEAM-8882:


Author: ASF GitHub Bot
Created on: 06/Dec/19 06:31
Start Date: 06/Dec/19 06:31
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #10292: [BEAM-8882] Fully 
populate log messages.
URL: https://github.com/apache/beam/pull/10292#issuecomment-562451084
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354958)
Time Spent: 4h 20m  (was: 4h 10m)

> Allow Dataflow to automatically choose portability or not.
> --
>
> Key: BEAM-8882
> URL: https://issues.apache.org/jira/browse/BEAM-8882
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Critical
> Fix For: 2.18.0
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> We would like the Dataflow service to be able to automatically choose whether 
> to run pipelines in a portable way. In order to do this, we need to provide 
> more information even if portability is not explicitly requested. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8906) Long BigQuery dry runs cause avalanche delay

2019-12-05 Thread June Oh (Jira)
June Oh created BEAM-8906:
-

 Summary: Long BigQuery dry runs cause avalanche delay
 Key: BEAM-8906
 URL: https://issues.apache.org/jira/browse/BEAM-8906
 Project: Beam
  Issue Type: Bug
  Components: io-java-gcp
Affects Versions: 2.16.0
 Environment: Google Cloud Platform
Reporter: June Oh


Reproduction Steps:

1. Compose a BigQuery SELECT query that will take over 80 seconds for a dry run.
2. Run the query with Beam SDK's BigQueryIO.
3. Observe the 10+ minute delay before the actual query job is created.

When running readTableRows(), BigQueryIO attempts to estimate the query size by 
performing a dry run, even if withoutValidation() is set. If the request takes 
over 80 seconds (RetryHttpRequestInitializer.HANGING_GET_TIMEOUT_SEC), 
RetryHttpRequestInitializer will time out and retry, up to 9 times 
(BigQueryServicesImpl.MAX_RPC_RETRIES). Hence, once a dry run duration crosses 
the 80 second tipping point, it causes an inevitable avalanche of a 720-second 
delay. Considering the fact that size estimation is not a requirement in 
running the query [1], BigQueryIO should provide a way to circumvent the 
redundant delay, especially in consideration of time-critical enterprise 
workloads.

There can be several ways to address this:
- increasing the timeout threshold (which will still create a tipping point);
- preventing the dry run requests from retrying; or
- adding an option to skip the size estimation within serializeToCloudSource().

[1] 
https://github.com/apache/beam/blob/2ec3b0495c191597c9a88830d25a2c360b3277e0/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/internal/CustomSources.java#L75



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8382) Add polling interval to KinesisIO.Read

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8382?focusedWorklogId=354941=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354941
 ]

ASF GitHub Bot logged work on BEAM-8382:


Author: ASF GitHub Bot
Created on: 06/Dec/19 05:50
Start Date: 06/Dec/19 05:50
Worklog Time Spent: 10m 
  Work Description: cmachgodaddy commented on issue #9765: [WIP][BEAM-8382] 
Add rate limit policy to KinesisIO.Read
URL: https://github.com/apache/beam/pull/9765#issuecomment-562442234
 
 
   All of our AWS APIs should have this. But out API just need to call 
`getKinesisClient()`, which means we delegate it to users, and it's up to them 
to how build the client.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354941)
Time Spent: 11h 10m  (was: 11h)

> Add polling interval to KinesisIO.Read
> --
>
> Key: BEAM-8382
> URL: https://issues.apache.org/jira/browse/BEAM-8382
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kinesis
>Affects Versions: 2.13.0, 2.14.0, 2.15.0
>Reporter: Jonothan Farr
>Assignee: Jonothan Farr
>Priority: Major
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>
> With the current implementation we are observing Kinesis throttling due to 
> ReadProvisionedThroughputExceeded on the order of hundreds of times per 
> second, regardless of the actual Kinesis throughput. This is because the 
> ShardReadersPool readLoop() method is polling getRecords() as fast as 
> possible.
> From the KDS documentation:
> {quote}Each shard can support up to five read transactions per second.
> {quote}
> and
> {quote}For best results, sleep for at least 1 second (1,000 milliseconds) 
> between calls to getRecords to avoid exceeding the limit on getRecords 
> frequency.
> {quote}
> [https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html]
> [https://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-sdk.html]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-7881) Get rid of jackson to avoid the continuous flow of CVEs in Jackson

2019-12-05 Thread Romain Manni-Bucau (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16989414#comment-16989414
 ] 

Romain Manni-Bucau commented on BEAM-7881:
--

I will just highlight that the 0day issue was due to the presence of jars, not 
their feature activation and that beam does not own jackson version but the 
runner does. So best beam can do is to decoralate itself from such libs IMHO.

Now if the community does not care, please just close the ticket, this is no 
more a blocker for me.

> Get rid of jackson to avoid the continuous flow of CVEs in Jackson
> --
>
> Key: BEAM-7881
> URL: https://issues.apache.org/jira/browse/BEAM-7881
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core
>Affects Versions: 2.14.0
>Reporter: Romain Manni-Bucau
>Priority: Blocker
>
> Jackson keeps having CVE on all releases of databind and transitively beam 
> sdk java core has CVE on all its releases (for the record, when writing this 
> issue you must use at least jackson-databind 2.9.9.2 but last week it was 
> 2.9.9.1 and 2.14 didn't get the fix).
> Can be neat to get rid of jackson which does not fix this issue for a very 
> long time now and just use JSON-B or another JSON impl to ensure the CVE is 
> not usable because beam is there.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8815) Portable pipeline execution without artifact staging

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8815?focusedWorklogId=354934=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354934
 ]

ASF GitHub Bot logged work on BEAM-8815:


Author: ASF GitHub Bot
Created on: 06/Dec/19 05:07
Start Date: 06/Dec/19 05:07
Worklog Time Spent: 10m 
  Work Description: tweise commented on issue #10285: [BEAM-8815] Define 
the no artifacts retrieval token in proto
URL: https://github.com/apache/beam/pull/10285#issuecomment-562434068
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354934)
Time Spent: 5h 50m  (was: 5h 40m)

> Portable pipeline execution without artifact staging
> 
>
> Key: BEAM-8815
> URL: https://issues.apache.org/jira/browse/BEAM-8815
> Project: Beam
>  Issue Type: Task
>  Components: runner-core, runner-flink
>Affects Versions: 2.17.0
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Major
>  Labels: portability
> Fix For: 2.17.0
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> The default artifact staging implementation relies on a distributed 
> filesystem. A directory and manifest will be created even when artifact 
> staging isn't used, and the container boot code will fail retrieving 
> artifacts, even though there are non. In a containerized environment it is 
> common to package artifacts into containers. It should be possible to run the 
> pipeline w/o a distributed filesystem. 
> [https://lists.apache.org/thread.html/1b0d545955a80688ea19f227ad943683747b17beb45368ad0908fd21@%3Cdev.beam.apache.org%3E]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8905) matching Java PCollectionTuple translation naming convention in expansion service with index only

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8905?focusedWorklogId=354932=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354932
 ]

ASF GitHub Bot logged work on BEAM-8905:


Author: ASF GitHub Bot
Created on: 06/Dec/19 04:49
Start Date: 06/Dec/19 04:49
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #10310: [BEAM-8905] matching 
Java PCollectionTuple translation naming convention in expansion service
URL: https://github.com/apache/beam/pull/10310#issuecomment-562431019
 
 
   Run java precommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354932)
Time Spent: 0.5h  (was: 20m)

> matching Java PCollectionTuple translation naming convention in expansion 
> service with index only
> -
>
> Key: BEAM-8905
> URL: https://issues.apache.org/jira/browse/BEAM-8905
> Project: Beam
>  Issue Type: Improvement
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Python PCollectionTuple is translated to an index-keyed map e.g. 
> \{0->pcollection1, 1->pcollection2}, however Java PCollectionTuple is 
> translated to slightly different formats such as \{output_0->pcollection1, 
> output_1->pcollection2}. We need to match these naming conventions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-106) Native support for conditional iteration

2019-12-05 Thread Nishant Trivedi (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16989373#comment-16989373
 ] 

Nishant Trivedi commented on BEAM-106:
--

Hello,

I'm new to dataflow and the job that I'm trying to write operates on objects 
stored in GCS. The problem is that the objects can be stored at an arbitrary 
depth like {{gs://bucket/a/b/c/1/, gs://bucket/a/b/c/1/, 
gs://bucket/a/b/c/2/, gs://bucket/a/b/c/2/ ...}}. I have to 
collect all objects in a list before I can have the job operate on them. If I 
want to list the objects as a pipeline step I think it would require 
conditional steps and I don't know how to do it. Is there a way to do 
conditional steps? Alternatively is there a native way of doing a directory 
walk in beam. Also attaching a naive version of the {{walk}} below to make my 
use case clear:
{code}
from os import path
from tensorflow.io import gfile

def walker(base_path):
dir_q = gfile.glob(path.join(base_path, "*"))
object_q = []
while len(dir_q) != 0:
current = dir_q.pop(0)
if gfile.isdir(current):
dir_q.extend(gfile.glob(path.join(current, "*")))
else:
object_q.append(current)
return object_q
{code}
I should also mentions that there are a couple of million objects so doing the 
listing in a serial fashion like above is not very efficient.

> Native support for conditional iteration
> 
>
> Key: BEAM-106
> URL: https://issues.apache.org/jira/browse/BEAM-106
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-ideas
>Reporter: Luke Cwik
>Priority: Major
>
> Ported from: https://github.com/GoogleCloudPlatform/DataflowJavaSDK/issues/50
> There are a variety of use cases which would benefit from native support for 
> conditional iteration.
> For instance, 
> http://stackoverflow.com/questions/31654421/conditional-iterations-in-google-cloud-dataflow/31659923?noredirect=1#comment51264604_31659923
>  asks about being able to write a loop like the following:
> {code}
> PCollection data  = ...
> while(needsMoreWork(data)) {
>   data = doAStep(data)
> }
> {code}
> If there are specific use cases please let us know the details. In the future 
> we will use this issue to post progress updates.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8816) Load balance bundle processing w/ multiple SDK workers

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8816?focusedWorklogId=354909=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354909
 ]

ASF GitHub Bot logged work on BEAM-8816:


Author: ASF GitHub Bot
Created on: 06/Dec/19 03:23
Start Date: 06/Dec/19 03:23
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #10313: [BEAM-8816] 
Option to load balance bundle processing w/ multiple SDK workers
URL: https://github.com/apache/beam/pull/10313
 
 
   A new pipeline option that allows for bundles of a given executable stage to 
be processed with any available SDK worker (the default is to process all 
bundles with the same SDK worker). 
   
   
https://lists.apache.org/thread.html/59c02d8b8ea849c158deb39ad9d83af4d8fcb56570501c7fe8f79bb2%40%3Cdev.beam.apache.org%3E
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-7390) Colab examples for aggregation transforms (Python)

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7390?focusedWorklogId=354902=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354902
 ]

ASF GitHub Bot logged work on BEAM-7390:


Author: ASF GitHub Bot
Created on: 06/Dec/19 03:02
Start Date: 06/Dec/19 03:02
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #10166: [BEAM-7390] 
Add code snippet for Latest
URL: https://github.com/apache/beam/pull/10166#discussion_r354647568
 
 

 ##
 File path: 
sdks/python/apache_beam/examples/snippets/transforms/aggregation/latest.py
 ##
 @@ -0,0 +1,69 @@
+# coding=utf-8
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+from __future__ import print_function
+
+
+def latest_globally(test=None):
+  # [START latest_globally]
+  import apache_beam as beam
+
+  with beam.Pipeline() as pipeline:
+latest_element = (
+pipeline
+| 'Create produce' >> beam.Create([
 
 Review comment:
   Latest works by picking the Latest based on the timestamp. I do not recall 
with Create does but it may use the same timestamp for its elements, or 
potentially an implementation could do that. Perhaps it would be more explicit 
to use elements with Timestamps, otherwise the output could be arbitrary.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354902)
Time Spent: 6h 50m  (was: 6h 40m)

> Colab examples for aggregation transforms (Python)
> --
>
> Key: BEAM-7390
> URL: https://issues.apache.org/jira/browse/BEAM-7390
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> Merge aggregation Colabs into the transform catalog



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7390) Colab examples for aggregation transforms (Python)

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7390?focusedWorklogId=354891=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354891
 ]

ASF GitHub Bot logged work on BEAM-7390:


Author: ASF GitHub Bot
Created on: 06/Dec/19 02:58
Start Date: 06/Dec/19 02:58
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #10174: [BEAM-7390] 
Add code snippet for Sample
URL: https://github.com/apache/beam/pull/10174
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354891)
Time Spent: 6h 40m  (was: 6.5h)

> Colab examples for aggregation transforms (Python)
> --
>
> Key: BEAM-7390
> URL: https://issues.apache.org/jira/browse/BEAM-7390
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> Merge aggregation Colabs into the transform catalog



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7390) Colab examples for aggregation transforms (Python)

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7390?focusedWorklogId=354890=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354890
 ]

ASF GitHub Bot logged work on BEAM-7390:


Author: ASF GitHub Bot
Created on: 06/Dec/19 02:57
Start Date: 06/Dec/19 02:57
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #10165: [BEAM-7390] Add code 
snippet for GroupIntoBatches
URL: https://github.com/apache/beam/pull/10165#issuecomment-562411199
 
 
   > Hi @chamikaramj and @raheelkhan, this code snippet will be displayed in 
the docs. I have some questions:
   > 
   > * What is the difference between `GroupIntoBatches` and `BatchElements`?
   > * Which is the recommended approach?
   > * I found that `BatchElements` was both more intuitive and flexible since 
it doesn't require a key, should we get rid of the `GroupIntoBatches` example 
in favor of `BatchElements` or is there a scenario where `GroupIntoBatches` 
make more sense?
   
   See the discussion on https://github.com/apache/beam/pull/8914 for 
differences between the two transforms.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354890)
Time Spent: 6.5h  (was: 6h 20m)

> Colab examples for aggregation transforms (Python)
> --
>
> Key: BEAM-7390
> URL: https://issues.apache.org/jira/browse/BEAM-7390
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> Merge aggregation Colabs into the transform catalog



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=354888=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354888
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 06/Dec/19 02:52
Start Date: 06/Dec/19 02:52
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #10276: [BEAM-7926] 
Data-centric Interactive Part1
URL: https://github.com/apache/beam/pull/10276#discussion_r354645430
 
 

 ##
 File path: sdks/python/apache_beam/utils/interactive_utils.py
 ##
 @@ -0,0 +1,98 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Common interactive utility module.
+
+For experimental usage only; no backwards-compatibility guarantees.
+"""
+from __future__ import absolute_import
+
+import logging
+
+_LOGGER = logging.getLogger(__name__)
+
+
+def is_in_ipython():
+  """Determines if current code is executed within an ipython session."""
+  is_in_ipython = False
+  # Check if the runtime is within an interactive environment, i.e., ipython.
+  try:
+from IPython import get_ipython  # pylint: disable=import-error
+if get_ipython():
+  is_in_ipython = True
+  except ImportError:
+pass  # If dependencies are not available, then not interactive for sure.
+  return is_in_ipython
+
+
+def is_in_notebook():
+  """Determines if current code is executed from an ipython notebook.
+
+  If is_in_notebook() is True, then is_in_ipython() must also be True.
+  """
+  is_in_notebook = False
+  if is_in_ipython():
+# The import and usage must be valid under the execution path.
+from IPython import get_ipython
+if 'IPKernelApp' in get_ipython().config:
+  is_in_notebook = True
+  return is_in_notebook
+
+
+def alter_label_if_interactive(transform, pvalueish):
+  """Alters the label to an interactive label with ipython prompt metadata
+  prefixed for the given transform if the given pvalueish belongs to a
+  user-defined pipeline. Otherwise, noop.
+
+  A label is either a user-defined or auto-generated str name of a PTransform
+  that is unique within a pipeline. If current environment is_in_ipython(), 
Beam
+  can implicitly create interactive labels to replace labels of root 
PTransforms
+  to be applied. The label is formatted as `Cell {prompt}: {original_label}`.
+  """
+  if is_in_ipython():
+from apache_beam.runners.interactive import interactive_environment as ie
+# Tracks user defined pipeline instances in watched scopes so that we only
+# alter labels for any transform to pvalueish belonging to those pipeline
+# instances, excluding any transform to be applied in other pipeline
+# instances the Beam SDK creates implicitly.
+ie.current_env().track_user_pipelines()
+from IPython import get_ipython
+prompt = get_ipython().execution_count
+pipeline = _extract_pipeline_of_pvalueish(pvalueish)
+if not pipeline:
+  _LOGGER.warning('Failed to alter the label of a transform with the '
+  'ipython prompt metadata. Cannot figure out the pipeline 
'
+  'that the given pvalueish %s belongs to. Thus noop.'
+  % pvalueish)
+if (pipeline
+# We only alter for transforms to be applied to user-defined pipelines
+# at pipeline construction time.
+and pipeline in ie.current_env().tracked_user_pipelines):
+  transform.label = 'Cell {}: {}'.format(prompt, transform.label)
+
+
+def _extract_pipeline_of_pvalueish(pvalueish):
+  """Extracts the pipeline that the given pvalueish belongs to."""
 
 Review comment:
   Even if it is unlikely, this might return a pvalue. If that is not expected, 
maybe we can make those cases errors.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354888)
Time Spent: 25h 20m  

[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=354889=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354889
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 06/Dec/19 02:52
Start Date: 06/Dec/19 02:52
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #10276: [BEAM-7926] 
Data-centric Interactive Part1
URL: https://github.com/apache/beam/pull/10276#discussion_r354645656
 
 

 ##
 File path: sdks/python/apache_beam/utils/interactive_utils.py
 ##
 @@ -0,0 +1,98 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Common interactive utility module.
+
+For experimental usage only; no backwards-compatibility guarantees.
+"""
+from __future__ import absolute_import
+
+import logging
+
+_LOGGER = logging.getLogger(__name__)
+
+
+def is_in_ipython():
+  """Determines if current code is executed within an ipython session."""
+  is_in_ipython = False
+  # Check if the runtime is within an interactive environment, i.e., ipython.
+  try:
+from IPython import get_ipython  # pylint: disable=import-error
+if get_ipython():
+  is_in_ipython = True
+  except ImportError:
+pass  # If dependencies are not available, then not interactive for sure.
+  return is_in_ipython
+
+
+def is_in_notebook():
+  """Determines if current code is executed from an ipython notebook.
+
+  If is_in_notebook() is True, then is_in_ipython() must also be True.
+  """
+  is_in_notebook = False
+  if is_in_ipython():
+# The import and usage must be valid under the execution path.
+from IPython import get_ipython
+if 'IPKernelApp' in get_ipython().config:
+  is_in_notebook = True
+  return is_in_notebook
+
+
+def alter_label_if_interactive(transform, pvalueish):
 
 Review comment:
   Should we make the code more generic? Unless this runs in ipython 
environment it will not do anything even if the environment is interactive. We 
can either change the code or the name to make them match each other.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354889)
Time Spent: 25.5h  (was: 25h 20m)

> Show PCollection with Interactive Beam in a data-centric user flow
> --
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 25.5h
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
>  
> {code:java}
> p = beam.Pipeline(InteractiveRunner())
> pcoll = p | 'Transform' >> transform()
> pcoll2 = ...
> pcoll3 = ...{code}
> The use can call a single function and get auto-magical charting of the data.
> e.g.,
> {code:java}
> show(pcoll, pcoll2)
> {code}
> Throughout the process, a pipeline fragment is built to include only 
> transforms necessary to produce the desired pcolls (pcoll and pcoll2) and 
> execute that fragment.
> This makes the Interactive Beam user flow data-centric.
>  
> Detailed 
> [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=354885=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354885
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 06/Dec/19 02:40
Start Date: 06/Dec/19 02:40
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on issue #10190: [BEAM-8575] 
Added two unit tests to CombineTest class to test that Co…
URL: https://github.com/apache/beam/pull/10190#issuecomment-562407954
 
 
   R: @robertwb
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354885)
Time Spent: 27.5h  (was: 27h 20m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 27.5h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8896) WITH query AS + SELECT query JOIN other throws invalid type

2019-12-05 Thread Kirill Kozlov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16989329#comment-16989329
 ] 

Kirill Kozlov commented on BEAM-8896:
-

[~amaliujia], yes, join condition is pushed down to a Project which serve as an 
input to a Join.

> WITH query AS + SELECT query JOIN other throws invalid type
> ---
>
> Key: BEAM-8896
> URL: https://issues.apache.org/jira/browse/BEAM-8896
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.16.0
>Reporter: fdiazgon
>Assignee: Andrew Pilloud
>Priority: Major
>
> The first one of the three following queries fails, despite queries being 
> equivalent:
> {code:java}
> Pipeline p = Pipeline.create();
> Schema schemaA =
> Schema.of(
> Schema.Field.of("id", Schema.FieldType.BYTES),
> Schema.Field.of("fA1", Schema.FieldType.STRING));
> Schema schemaB =
> Schema.of(
> Schema.Field.of("id", Schema.FieldType.STRING),
> Schema.Field.of("fB1", Schema.FieldType.STRING));
> PCollection inputA =
> 
> p.apply(Create.of(ImmutableList.of()).withCoder(SchemaCoder.of(schemaA)));
> PCollection inputB =
> 
> p.apply(Create.of(ImmutableList.of()).withCoder(SchemaCoder.of(schemaB)));
> // Fails
> String query1 =
> "WITH query AS "
> + "( "
> + " SELECT id, fA1, fA1 AS fA1_2 "
> + " FROM tblA"
> + ") "
> + "SELECT fA1, fB1, fA1_2 "
> + "FROM query "
> + "JOIN tblB ON (TO_HEX(query.id) = tblB.id)";
> // Ok
> String query2 =
> "WITH query AS "
> + "( "
> + " SELECT fA1, fB1, fA1 AS fA1_2 "
> + " FROM tblA "
> + " JOIN tblB "
> + " ON (TO_HEX(tblA.id) = tblB.id) "
> + ")"
> + "SELECT fA1, fB1, fA1_2 "
> + "FROM query ";
> // Ok
> String query3 =
> "WITH query AS "
> + "( "
> + " SELECT TO_HEX(id) AS id, fA1, fA1 AS fA1_2 "
> + " FROM tblA"
> + ") "
> + "SELECT fA1, fB1, fA1_2 "
> + "FROM query "
> + "JOIN tblB ON (query.id = tblB.id)";
> Schema transform3 =
> PCollectionTuple.of("tblA", inputA)
> .and("tblB", inputB)
> .apply(SqlTransform.query(query3))
> .getSchema();
> System.out.println(transform3);
> Schema transform2 =
> PCollectionTuple.of("tblA", inputA)
> .and("tblB", inputB)
> .apply(SqlTransform.query(query2))
> .getSchema();
> System.out.println(transform2);
> Schema transform1 =
> PCollectionTuple.of("tblA", inputA)
> .and("tblB", inputB)
> .apply(SqlTransform.query(query1))
> .getSchema();
> System.out.println(transform1);
> {code}
>  
> The error is:
> {noformat}
> Exception in thread "main" java.lang.AssertionError: Field ordinal 2 is 
> invalid for  type 'RecordType(VARBINARY id, VARCHAR fA1)'Exception in thread 
> "main" java.lang.AssertionError: Field ordinal 2 is invalid for  type 
> 'RecordType(VARBINARY id, VARCHAR fA1)' at 
> org.apache.beam.repackaged.sql.org.apache.calcite.rex.RexBuilder.makeFieldAccess(RexBuilder.java:197){noformat}
>  
> If I change `schemaB.id` to `BYTES` (while also avoid using `TO_HEX`), all 
> queries work fine. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=354880=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354880
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 06/Dec/19 02:27
Start Date: 06/Dec/19 02:27
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on issue #10190: [BEAM-8575] 
Added two unit tests to CombineTest class to test that Co…
URL: https://github.com/apache/beam/pull/10190#issuecomment-562405200
 
 
   As discussed in the email, I re-implemented the tests to set three different 
kinds of metrics (counter, distribution and gauge) in a CombineFn and verified 
their values which is available in the result returned by the pipeline.run().
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354880)
Time Spent: 27h 20m  (was: 27h 10m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 27h 20m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7825) Python's DirectRunner emits multiple panes per window and does not discard late data

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7825?focusedWorklogId=354874=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354874
 ]

ASF GitHub Bot logged work on BEAM-7825:


Author: ASF GitHub Bot
Created on: 06/Dec/19 02:18
Start Date: 06/Dec/19 02:18
Worklog Time Spent: 10m 
  Work Description: stale[bot] commented on issue #9164: [BEAM-7825] Add 
test showing inconsistent stream processing with DirectRunner
URL: https://github.com/apache/beam/pull/9164#issuecomment-562403410
 
 
   This pull request has been marked as stale due to 60 days of inactivity. It 
will be closed in 1 week if no further activity occurs. If you think that’s 
incorrect or this pull request requires a review, please simply write any 
comment. If closed, you can revive the PR at any time and @mention a reviewer 
or discuss it on the d...@beam.apache.org list. Thank you for your 
contributions.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354874)
Time Spent: 4h 50m  (was: 4h 40m)

> Python's DirectRunner emits multiple panes per window and does not discard 
> late data
> 
>
> Key: BEAM-7825
> URL: https://issues.apache.org/jira/browse/BEAM-7825
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.13.0
> Environment: OS: Debian rodete.
> Beam versions: 2.15.0.dev.
> Python versions: Python 2.7, Python 3.7
>Reporter: Alexey Strokach
>Priority: Major
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> The documentation for Beam's Windowing and Triggers functionality [states 
> that|https://beam.apache.org/documentation/programming-guide/#triggers] _"if 
> you use Beam’s default windowing configuration and default trigger, Beam 
> outputs the aggregated result when it estimates all data has arrived, and 
> discards all subsequent data for that window"_. However, it seems that the 
> current behavior of Python's DirectRunner is inconsistent with both of those 
> points. As the {{StreamingWordGroupIT.test_discard_late_data}} test shows, 
> DirectRunner appears to process every data point that it reads from the input 
> stream, irrespective of whether or not the timestamp of that data point is 
> older than the timestamps of the windows that have already been processed. 
> Furthermore, as the {{StreamingWordGroupIT.test_single_output_per_window}} 
> test shows, DirectRunner generates multiple "panes" for the same window, 
> apparently disregarding the notion of a watermark?
> The Dataflow runner passes both of those end-to-end tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8861) Disallow self-signed certificates by default in ElasticsearchIO

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8861?focusedWorklogId=354871=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354871
 ]

ASF GitHub Bot logged work on BEAM-8861:


Author: ASF GitHub Bot
Created on: 06/Dec/19 02:03
Start Date: 06/Dec/19 02:03
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #10296: 
[release-2.18.0][BEAM-8861] Disallow self-signed certificates by default in 
ElasticsearchIO
URL: https://github.com/apache/beam/pull/10296#issuecomment-562400099
 
 
   Run Java_Examples_Dataflow PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354871)
Time Spent: 50m  (was: 40m)

> Disallow self-signed certificates by default in ElasticsearchIO
> ---
>
> Key: BEAM-8861
> URL: https://issues.apache.org/jira/browse/BEAM-8861
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Colm O hEigeartaigh
>Assignee: Colm O hEigeartaigh
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The elasticsearch component allows self-signed certs by default, which is not 
> secure. It should reject them by default - I'll add a PR for this with a 
> configuration option to enable the old behaviour.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8861) Disallow self-signed certificates by default in ElasticsearchIO

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8861?focusedWorklogId=354870=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354870
 ]

ASF GitHub Bot logged work on BEAM-8861:


Author: ASF GitHub Bot
Created on: 06/Dec/19 02:03
Start Date: 06/Dec/19 02:03
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #10296: 
[release-2.18.0][BEAM-8861] Disallow self-signed certificates by default in 
ElasticsearchIO
URL: https://github.com/apache/beam/pull/10296#issuecomment-562400043
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354870)
Time Spent: 40m  (was: 0.5h)

> Disallow self-signed certificates by default in ElasticsearchIO
> ---
>
> Key: BEAM-8861
> URL: https://issues.apache.org/jira/browse/BEAM-8861
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Colm O hEigeartaigh
>Assignee: Colm O hEigeartaigh
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The elasticsearch component allows self-signed certs by default, which is not 
> secure. It should reject them by default - I'll add a PR for this with a 
> configuration option to enable the old behaviour.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-3865) Incorrect timestamp on merging window outputs.

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3865?focusedWorklogId=354868=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354868
 ]

ASF GitHub Bot logged work on BEAM-3865:


Author: ASF GitHub Bot
Created on: 06/Dec/19 01:59
Start Date: 06/Dec/19 01:59
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #10192: [BEAM-3865] 
Stronger trigger tests.
URL: https://github.com/apache/beam/pull/10192#issuecomment-562399025
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354868)
Time Spent: 2h 40m  (was: 2.5h)

> Incorrect timestamp on merging window outputs.
> --
>
> Key: BEAM-3865
> URL: https://issues.apache.org/jira/browse/BEAM-3865
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Robert Bradshaw
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Looks like we're setting multiple watermark holds with one arbitrarily being 
> held. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-3865) Incorrect timestamp on merging window outputs.

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3865?focusedWorklogId=354869=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354869
 ]

ASF GitHub Bot logged work on BEAM-3865:


Author: ASF GitHub Bot
Created on: 06/Dec/19 01:59
Start Date: 06/Dec/19 01:59
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #10192: [BEAM-3865] 
Stronger trigger tests.
URL: https://github.com/apache/beam/pull/10192#issuecomment-562399156
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354869)
Time Spent: 2h 50m  (was: 2h 40m)

> Incorrect timestamp on merging window outputs.
> --
>
> Key: BEAM-3865
> URL: https://issues.apache.org/jira/browse/BEAM-3865
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Robert Bradshaw
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Looks like we're setting multiple watermark holds with one arbitrarily being 
> held. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8835) Artifact retrieval fails with FlinkUberJarJobServer

2019-12-05 Thread Kyle Weaver (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16989314#comment-16989314
 ] 

Kyle Weaver commented on BEAM-8835:
---

No, just covered over when we hid FlinkUberJarJobServer behind a flag.

> Artifact retrieval fails with FlinkUberJarJobServer
> ---
>
> Key: BEAM-8835
> URL: https://issues.apache.org/jira/browse/BEAM-8835
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> We seem to be able to stage artifacts and retrieve the manifest fine, but 
> retrieving the artifacts doesn't work. This happens on both my k8s Flink 
> cluster and on my local Flink cluster. At a quick glance the artifact is in 
> the jar where it should be. cc [~robertwb]
> 2019-11-21 18:43:39,336 INFO  
> org.apache.beam.runners.fnexecution.artifact.AbstractArtifactRetrievalService 
>  - GetArtifact name: "pickled_main_session"
> retrieval_token: "BEAM-PIPELINE/pipeline/artifact-manifest.json"
>  failed
> java.io.IOException: Unable to load 
> e1d24d848414cecf805a7b5c2b950c6430c20eb32875dac00b40f80f3c73a141/ea0d10d07f4601782ed647e8f6ba4a055be13674ab79fa0c6e2fa44917c5264c
>  with 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$ChildFirstClassLoader@785297ac



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8835) Artifact retrieval fails with FlinkUberJarJobServer

2019-12-05 Thread Robert Bradshaw (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Bradshaw updated BEAM-8835:
--
Fix Version/s: (was: 2.17.0)
   2.18.0

> Artifact retrieval fails with FlinkUberJarJobServer
> ---
>
> Key: BEAM-8835
> URL: https://issues.apache.org/jira/browse/BEAM-8835
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kyle Weaver
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> We seem to be able to stage artifacts and retrieve the manifest fine, but 
> retrieving the artifacts doesn't work. This happens on both my k8s Flink 
> cluster and on my local Flink cluster. At a quick glance the artifact is in 
> the jar where it should be. cc [~robertwb]
> 2019-11-21 18:43:39,336 INFO  
> org.apache.beam.runners.fnexecution.artifact.AbstractArtifactRetrievalService 
>  - GetArtifact name: "pickled_main_session"
> retrieval_token: "BEAM-PIPELINE/pipeline/artifact-manifest.json"
>  failed
> java.io.IOException: Unable to load 
> e1d24d848414cecf805a7b5c2b950c6430c20eb32875dac00b40f80f3c73a141/ea0d10d07f4601782ed647e8f6ba4a055be13674ab79fa0c6e2fa44917c5264c
>  with 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$ChildFirstClassLoader@785297ac



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8901) add experimental flag for reusing flink local environment

2019-12-05 Thread Robert Bradshaw (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16989315#comment-16989315
 ] 

Robert Bradshaw commented on BEAM-8901:
---

Ack. I wonder if [~mxm] has any ideas. 

> add experimental flag for reusing flink local environment
> -
>
> Key: BEAM-8901
> URL: https://issues.apache.org/jira/browse/BEAM-8901
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Flink job server launches a new mini cluster every time we run the pipeline 
> on Flink local environment. To prevent OOM, we need to reuse existing Flink 
> local environment if possible.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-8835) Artifact retrieval fails with FlinkUberJarJobServer

2019-12-05 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver reassigned BEAM-8835:
-

Assignee: Kyle Weaver

> Artifact retrieval fails with FlinkUberJarJobServer
> ---
>
> Key: BEAM-8835
> URL: https://issues.apache.org/jira/browse/BEAM-8835
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> We seem to be able to stage artifacts and retrieve the manifest fine, but 
> retrieving the artifacts doesn't work. This happens on both my k8s Flink 
> cluster and on my local Flink cluster. At a quick glance the artifact is in 
> the jar where it should be. cc [~robertwb]
> 2019-11-21 18:43:39,336 INFO  
> org.apache.beam.runners.fnexecution.artifact.AbstractArtifactRetrievalService 
>  - GetArtifact name: "pickled_main_session"
> retrieval_token: "BEAM-PIPELINE/pipeline/artifact-manifest.json"
>  failed
> java.io.IOException: Unable to load 
> e1d24d848414cecf805a7b5c2b950c6430c20eb32875dac00b40f80f3c73a141/ea0d10d07f4601782ed647e8f6ba4a055be13674ab79fa0c6e2fa44917c5264c
>  with 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$ChildFirstClassLoader@785297ac



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8835) Artifact retrieval fails with FlinkUberJarJobServer

2019-12-05 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver updated BEAM-8835:
--
Status: Open  (was: Triage Needed)

> Artifact retrieval fails with FlinkUberJarJobServer
> ---
>
> Key: BEAM-8835
> URL: https://issues.apache.org/jira/browse/BEAM-8835
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> We seem to be able to stage artifacts and retrieve the manifest fine, but 
> retrieving the artifacts doesn't work. This happens on both my k8s Flink 
> cluster and on my local Flink cluster. At a quick glance the artifact is in 
> the jar where it should be. cc [~robertwb]
> 2019-11-21 18:43:39,336 INFO  
> org.apache.beam.runners.fnexecution.artifact.AbstractArtifactRetrievalService 
>  - GetArtifact name: "pickled_main_session"
> retrieval_token: "BEAM-PIPELINE/pipeline/artifact-manifest.json"
>  failed
> java.io.IOException: Unable to load 
> e1d24d848414cecf805a7b5c2b950c6430c20eb32875dac00b40f80f3c73a141/ea0d10d07f4601782ed647e8f6ba4a055be13674ab79fa0c6e2fa44917c5264c
>  with 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$ChildFirstClassLoader@785297ac



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (BEAM-8835) Artifact retrieval fails with FlinkUberJarJobServer

2019-12-05 Thread Robert Bradshaw (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Bradshaw reopened BEAM-8835:
---

> Artifact retrieval fails with FlinkUberJarJobServer
> ---
>
> Key: BEAM-8835
> URL: https://issues.apache.org/jira/browse/BEAM-8835
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kyle Weaver
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> We seem to be able to stage artifacts and retrieve the manifest fine, but 
> retrieving the artifacts doesn't work. This happens on both my k8s Flink 
> cluster and on my local Flink cluster. At a quick glance the artifact is in 
> the jar where it should be. cc [~robertwb]
> 2019-11-21 18:43:39,336 INFO  
> org.apache.beam.runners.fnexecution.artifact.AbstractArtifactRetrievalService 
>  - GetArtifact name: "pickled_main_session"
> retrieval_token: "BEAM-PIPELINE/pipeline/artifact-manifest.json"
>  failed
> java.io.IOException: Unable to load 
> e1d24d848414cecf805a7b5c2b950c6430c20eb32875dac00b40f80f3c73a141/ea0d10d07f4601782ed647e8f6ba4a055be13674ab79fa0c6e2fa44917c5264c
>  with 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$ChildFirstClassLoader@785297ac



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8835) Artifact retrieval fails with FlinkUberJarJobServer

2019-12-05 Thread Robert Bradshaw (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16989310#comment-16989310
 ] 

Robert Bradshaw commented on BEAM-8835:
---

This isn't resolved, is it?

> Artifact retrieval fails with FlinkUberJarJobServer
> ---
>
> Key: BEAM-8835
> URL: https://issues.apache.org/jira/browse/BEAM-8835
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kyle Weaver
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> We seem to be able to stage artifacts and retrieve the manifest fine, but 
> retrieving the artifacts doesn't work. This happens on both my k8s Flink 
> cluster and on my local Flink cluster. At a quick glance the artifact is in 
> the jar where it should be. cc [~robertwb]
> 2019-11-21 18:43:39,336 INFO  
> org.apache.beam.runners.fnexecution.artifact.AbstractArtifactRetrievalService 
>  - GetArtifact name: "pickled_main_session"
> retrieval_token: "BEAM-PIPELINE/pipeline/artifact-manifest.json"
>  failed
> java.io.IOException: Unable to load 
> e1d24d848414cecf805a7b5c2b950c6430c20eb32875dac00b40f80f3c73a141/ea0d10d07f4601782ed647e8f6ba4a055be13674ab79fa0c6e2fa44917c5264c
>  with 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$ChildFirstClassLoader@785297ac



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8896) WITH query AS + SELECT query JOIN other throws invalid type

2019-12-05 Thread Rui Wang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16989309#comment-16989309
 ] 

Rui Wang commented on BEAM-8896:


BTW, BeamSQL does not support executing function call in Join condition. Is the 
join condition pushed down such that it gets evaluated somehow? Otherwise it is 
expected behaviour.

> WITH query AS + SELECT query JOIN other throws invalid type
> ---
>
> Key: BEAM-8896
> URL: https://issues.apache.org/jira/browse/BEAM-8896
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.16.0
>Reporter: fdiazgon
>Assignee: Andrew Pilloud
>Priority: Major
>
> The first one of the three following queries fails, despite queries being 
> equivalent:
> {code:java}
> Pipeline p = Pipeline.create();
> Schema schemaA =
> Schema.of(
> Schema.Field.of("id", Schema.FieldType.BYTES),
> Schema.Field.of("fA1", Schema.FieldType.STRING));
> Schema schemaB =
> Schema.of(
> Schema.Field.of("id", Schema.FieldType.STRING),
> Schema.Field.of("fB1", Schema.FieldType.STRING));
> PCollection inputA =
> 
> p.apply(Create.of(ImmutableList.of()).withCoder(SchemaCoder.of(schemaA)));
> PCollection inputB =
> 
> p.apply(Create.of(ImmutableList.of()).withCoder(SchemaCoder.of(schemaB)));
> // Fails
> String query1 =
> "WITH query AS "
> + "( "
> + " SELECT id, fA1, fA1 AS fA1_2 "
> + " FROM tblA"
> + ") "
> + "SELECT fA1, fB1, fA1_2 "
> + "FROM query "
> + "JOIN tblB ON (TO_HEX(query.id) = tblB.id)";
> // Ok
> String query2 =
> "WITH query AS "
> + "( "
> + " SELECT fA1, fB1, fA1 AS fA1_2 "
> + " FROM tblA "
> + " JOIN tblB "
> + " ON (TO_HEX(tblA.id) = tblB.id) "
> + ")"
> + "SELECT fA1, fB1, fA1_2 "
> + "FROM query ";
> // Ok
> String query3 =
> "WITH query AS "
> + "( "
> + " SELECT TO_HEX(id) AS id, fA1, fA1 AS fA1_2 "
> + " FROM tblA"
> + ") "
> + "SELECT fA1, fB1, fA1_2 "
> + "FROM query "
> + "JOIN tblB ON (query.id = tblB.id)";
> Schema transform3 =
> PCollectionTuple.of("tblA", inputA)
> .and("tblB", inputB)
> .apply(SqlTransform.query(query3))
> .getSchema();
> System.out.println(transform3);
> Schema transform2 =
> PCollectionTuple.of("tblA", inputA)
> .and("tblB", inputB)
> .apply(SqlTransform.query(query2))
> .getSchema();
> System.out.println(transform2);
> Schema transform1 =
> PCollectionTuple.of("tblA", inputA)
> .and("tblB", inputB)
> .apply(SqlTransform.query(query1))
> .getSchema();
> System.out.println(transform1);
> {code}
>  
> The error is:
> {noformat}
> Exception in thread "main" java.lang.AssertionError: Field ordinal 2 is 
> invalid for  type 'RecordType(VARBINARY id, VARCHAR fA1)'Exception in thread 
> "main" java.lang.AssertionError: Field ordinal 2 is invalid for  type 
> 'RecordType(VARBINARY id, VARCHAR fA1)' at 
> org.apache.beam.repackaged.sql.org.apache.calcite.rex.RexBuilder.makeFieldAccess(RexBuilder.java:197){noformat}
>  
> If I change `schemaB.id` to `BYTES` (while also avoid using `TO_HEX`), all 
> queries work fine. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8846) Force synchronization of the stream observer in BeamFnControlClient

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8846?focusedWorklogId=354864=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354864
 ]

ASF GitHub Bot logged work on BEAM-8846:


Author: ASF GitHub Bot
Created on: 06/Dec/19 01:46
Start Date: 06/Dec/19 01:46
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on issue #10242: [BEAM-8846] 
Force synchronization of the stream observer in BeamFnControlClient
URL: https://github.com/apache/beam/pull/10242#issuecomment-562396144
 
 
   Thanks for the quick review, I have update the PR accordingly.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354864)
Time Spent: 2h 20m  (was: 2h 10m)

> Force synchronization of the stream observer in BeamFnControlClient
> ---
>
> Key: BEAM-8846
> URL: https://issues.apache.org/jira/browse/BEAM-8846
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Affects Versions: 2.18.0
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Currently there is no synchronization to access the stream observer in 
> BeamFnControlClient which is not thread safe. We should fix it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8901) add experimental flag for reusing flink local environment

2019-12-05 Thread Heejong Lee (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16989302#comment-16989302
 ] 

Heejong Lee commented on BEAM-8901:
---

I couldn't find any way to reclaim the resources except restarting the running 
JVM. Starting and shutting down the fresh external Flink job server process per 
test is doable but the cost of booting up the JVM wouldn't be that small (and 
the code is uglier too).

> add experimental flag for reusing flink local environment
> -
>
> Key: BEAM-8901
> URL: https://issues.apache.org/jira/browse/BEAM-8901
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Flink job server launches a new mini cluster every time we run the pipeline 
> on Flink local environment. To prevent OOM, we need to reuse existing Flink 
> local environment if possible.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8815) Portable pipeline execution without artifact staging

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8815?focusedWorklogId=354858=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354858
 ]

ASF GitHub Bot logged work on BEAM-8815:


Author: ASF GitHub Bot
Created on: 06/Dec/19 01:32
Start Date: 06/Dec/19 01:32
Worklog Time Spent: 10m 
  Work Description: tweise commented on issue #10285: [BEAM-8815] Define 
the no artifacts retrieval token in proto
URL: https://github.com/apache/beam/pull/10285#issuecomment-562393222
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354858)
Time Spent: 5h 40m  (was: 5.5h)

> Portable pipeline execution without artifact staging
> 
>
> Key: BEAM-8815
> URL: https://issues.apache.org/jira/browse/BEAM-8815
> Project: Beam
>  Issue Type: Task
>  Components: runner-core, runner-flink
>Affects Versions: 2.17.0
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Major
>  Labels: portability
> Fix For: 2.17.0
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> The default artifact staging implementation relies on a distributed 
> filesystem. A directory and manifest will be created even when artifact 
> staging isn't used, and the container boot code will fail retrieving 
> artifacts, even though there are non. In a containerized environment it is 
> common to package artifacts into containers. It should be possible to run the 
> pipeline w/o a distributed filesystem. 
> [https://lists.apache.org/thread.html/1b0d545955a80688ea19f227ad943683747b17beb45368ad0908fd21@%3Cdev.beam.apache.org%3E]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8886) Add a python mongodbio integration test that triggers load split

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8886?focusedWorklogId=354856=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354856
 ]

ASF GitHub Bot logged work on BEAM-8886:


Author: ASF GitHub Bot
Created on: 06/Dec/19 01:25
Start Date: 06/Dec/19 01:25
Worklog Time Spent: 10m 
  Work Description: y1chi commented on issue #10289: [BEAM-8886] Add a 
mongodb io dataflow integration test
URL: https://github.com/apache/beam/pull/10289#issuecomment-562391687
 
 
   Run Python MongoDBIO Load Test
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354856)
Time Spent: 2h  (was: 1h 50m)

> Add a python mongodbio integration test that triggers load split
> 
>
> Key: BEAM-8886
> URL: https://issues.apache.org/jira/browse/BEAM-8886
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Minor
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Current integration test doesn't seem to trigger liquid sharding at all, we 
> should change integration test that has more load and potentially use the 
> mongodb k8s cluster.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=354850=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354850
 ]

ASF GitHub Bot logged work on BEAM-8427:


Author: ASF GitHub Bot
Created on: 06/Dec/19 01:18
Start Date: 06/Dec/19 01:18
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #10273: 
[BEAM-8427] Add MongoDB to SQL documentation
URL: https://github.com/apache/beam/pull/10273#discussion_r354627008
 
 

 ##
 File path: 
website/src/documentation/dsls/sql/extensions/create-external-table.md
 ##
 @@ -308,6 +308,43 @@ Write Mode supports writing to a topic.
 
 Only simple types are supported.
 
+## MongoDB
+
+### Syntax
+
+```
+CREATE EXTERNAL TABLE [ IF NOT EXISTS ] tableName (tableElement [, 
tableElement ]*)
+TYPE mongodb
+LOCATION 'mongodb://[HOST]:[PORT]/[DATABASE]/[COLLECTION]'
+```
+*   `LOCATION`: Location of the collection.
+*   `HOST`: Location of the MongoDB server. Can be localhost or an ip 
address.
+ When authentication is required username and password can be specified
+ as follows: `username:password@localhost`.
+*   `PORT`: Port on which MongoDB server is listening.
+*   `DATABASE`: Database to connect to.
+*   `COLLECTION`: Collection within the database.
+
+### Read Mode
+
+Read Mode supports reading from a collection.
+
+### Write Mode
+
+Write Mode supports writing to a collection.
+
+### Schema
+
+Only simple types are supported. MongoDB documents are mapped to Beam SQL 
types via `JsonToRow` transform.
 
 Review comment:
   Actually it looks like there's quite a lot of precedent within `website/src` 
of using `{{site.release_latest}}` for this purpose (some other places use 
`current` instead of a version number which I didn't realize was an option. 
@11moon11 could you link to JsonToRow using that?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354850)
Time Spent: 9h  (was: 8h 50m)

> [SQL] Add support for MongoDB source
> 
>
> Key: BEAM-8427
> URL: https://issues.apache.org/jira/browse/BEAM-8427
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> * Create a MongoDB table and table provider.
>  * Implement buildIOReader
>  * Support primitive types
>  * Implement buildIOWrite
>  * improve getTableStatistics



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=354851=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354851
 ]

ASF GitHub Bot logged work on BEAM-8427:


Author: ASF GitHub Bot
Created on: 06/Dec/19 01:18
Start Date: 06/Dec/19 01:18
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #10273: 
[BEAM-8427] Add MongoDB to SQL documentation
URL: https://github.com/apache/beam/pull/10273#discussion_r354627008
 
 

 ##
 File path: 
website/src/documentation/dsls/sql/extensions/create-external-table.md
 ##
 @@ -308,6 +308,43 @@ Write Mode supports writing to a topic.
 
 Only simple types are supported.
 
+## MongoDB
+
+### Syntax
+
+```
+CREATE EXTERNAL TABLE [ IF NOT EXISTS ] tableName (tableElement [, 
tableElement ]*)
+TYPE mongodb
+LOCATION 'mongodb://[HOST]:[PORT]/[DATABASE]/[COLLECTION]'
+```
+*   `LOCATION`: Location of the collection.
+*   `HOST`: Location of the MongoDB server. Can be localhost or an ip 
address.
+ When authentication is required username and password can be specified
+ as follows: `username:password@localhost`.
+*   `PORT`: Port on which MongoDB server is listening.
+*   `DATABASE`: Database to connect to.
+*   `COLLECTION`: Collection within the database.
+
+### Read Mode
+
+Read Mode supports reading from a collection.
+
+### Write Mode
+
+Write Mode supports writing to a collection.
+
+### Schema
+
+Only simple types are supported. MongoDB documents are mapped to Beam SQL 
types via `JsonToRow` transform.
 
 Review comment:
   Actually it looks like there's quite a lot of precedent within `website/src` 
of using `{{site.release_latest}}` for this purpose (some other places use 
`current` instead of a version number which I didn't realize was an option). 
@11moon11 could you link to JsonToRow using that?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354851)
Time Spent: 9h 10m  (was: 9h)

> [SQL] Add support for MongoDB source
> 
>
> Key: BEAM-8427
> URL: https://issues.apache.org/jira/browse/BEAM-8427
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
> * Create a MongoDB table and table provider.
>  * Implement buildIOReader
>  * Support primitive types
>  * Implement buildIOWrite
>  * improve getTableStatistics



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8904) properly update output pcollections from expanded transforms

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8904?focusedWorklogId=354849=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354849
 ]

ASF GitHub Bot logged work on BEAM-8904:


Author: ASF GitHub Bot
Created on: 06/Dec/19 01:12
Start Date: 06/Dec/19 01:12
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #10309: [BEAM-8904] properly 
update output pcollections from expanded transforms
URL: https://github.com/apache/beam/pull/10309#issuecomment-56235
 
 
   R: @robertwb 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354849)
Time Spent: 20m  (was: 10m)

> properly update output pcollections from expanded transforms
> 
>
> Key: BEAM-8904
> URL: https://issues.apache.org/jira/browse/BEAM-8904
> Project: Beam
>  Issue Type: Improvement
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> currently output pcollections from expanded transforms are ignored. we need 
> to properly update output pcollections when it's returned to the caller of 
> expansion service.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7961) Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7961?focusedWorklogId=354848=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354848
 ]

ASF GitHub Bot logged work on BEAM-7961:


Author: ASF GitHub Bot
Created on: 06/Dec/19 01:11
Start Date: 06/Dec/19 01:11
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #10051: [BEAM-7961] Add tests 
for all runner native transforms for XLang
URL: https://github.com/apache/beam/pull/10051#issuecomment-562388684
 
 
   This PR has dependencies on #10303, #10305, #10307, #10309, #10310
   
   Test should be failed without merging those commits first.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354848)
Time Spent: 8h  (was: 7h 50m)

> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite
> --
>
> Key: BEAM-7961
> URL: https://issues.apache.org/jira/browse/BEAM-7961
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8810) Dataflow runner - Work stuck in state COMMITTING with streaming commit rpcs

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8810?focusedWorklogId=354847=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354847
 ]

ASF GitHub Bot logged work on BEAM-8810:


Author: ASF GitHub Bot
Created on: 06/Dec/19 01:06
Start Date: 06/Dec/19 01:06
Worklog Time Spent: 10m 
  Work Description: scwhittle commented on issue #10311: [BEAM-8810] Detect 
stuck commits in StreamingDataflowWorker
URL: https://github.com/apache/beam/pull/10311#issuecomment-562387486
 
 
   R: @reuvenlax 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354847)
Time Spent: 20m  (was: 10m)

> Dataflow runner - Work stuck in state COMMITTING with streaming commit rpcs
> ---
>
> Key: BEAM-8810
> URL: https://issues.apache.org/jira/browse/BEAM-8810
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Sam Whittle
>Assignee: Sam Whittle
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In several pipelines using streaming engine and thus the streaming commit 
> rpcs, work became stuck in state COMMITTING indefinitely.  Such stuckness 
> coincided with repeated streaming rpc failures.
> The status page shows that the key has work in state COMMITTING, and has 1 
> queued work item.
> There is a single active commit stream, with 0 pending requests.
> The stream could exist past the stream deadline because the StreamCache only 
> closes stream due to the deadline when a stream is retrieved, which only 
> occurs if there are other commits.  Since the pipeline is stuck due to this 
> event, there are no other commits.
> It seems therefore there is some race on the commitStream between onNewStream 
> and commitWork that either prevents work from being retried, an exception 
> that triggers between when the pending request is removed and the callback is 
> called, or some potential corruption of the activeWork data structure. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8810) Dataflow runner - Work stuck in state COMMITTING with streaming commit rpcs

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8810?focusedWorklogId=354846=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354846
 ]

ASF GitHub Bot logged work on BEAM-8810:


Author: ASF GitHub Bot
Created on: 06/Dec/19 01:05
Start Date: 06/Dec/19 01:05
Worklog Time Spent: 10m 
  Work Description: scwhittle commented on pull request #10311: [BEAM-8810] 
Detect stuck commits in StreamingDataflowWorker
URL: https://github.com/apache/beam/pull/10311
 
 
   Detect and abort stuck commits when using streaming rpcs.  
   Additionally clean up some error handling which may have caused stuck 
commits and add logging to aid in debugging if stuck commits reoccur.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 

[jira] [Work logged] (BEAM-8870) beam_PostCommit_Python_VR_Spark is permanently failing

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8870?focusedWorklogId=354842=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354842
 ]

ASF GitHub Bot logged work on BEAM-8870:


Author: ASF GitHub Bot
Created on: 06/Dec/19 00:58
Start Date: 06/Dec/19 00:58
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #10279: [BEAM-8870] Fix Spark 
Python VR failures.
URL: https://github.com/apache/beam/pull/10279#issuecomment-562385738
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354842)
Time Spent: 0.5h  (was: 20m)

> beam_PostCommit_Python_VR_Spark is permanently failing
> --
>
> Key: BEAM-8870
> URL: https://issues.apache.org/jira/browse/BEAM-8870
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark, test-failures
>Reporter: Kenneth Knowles
>Assignee: Kyle Weaver
>Priority: Critical
>  Labels: currently-failing
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> See https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/
> Is this a known issue? Should this suite be disabled until it is expected to 
> pass?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8870) beam_PostCommit_Python_VR_Spark is permanently failing

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8870?focusedWorklogId=354843=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354843
 ]

ASF GitHub Bot logged work on BEAM-8870:


Author: ASF GitHub Bot
Created on: 06/Dec/19 00:58
Start Date: 06/Dec/19 00:58
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #10279: [BEAM-8870] Fix Spark 
Python VR failures.
URL: https://github.com/apache/beam/pull/10279#issuecomment-562385761
 
 
   Run PythonLint PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354843)
Time Spent: 40m  (was: 0.5h)

> beam_PostCommit_Python_VR_Spark is permanently failing
> --
>
> Key: BEAM-8870
> URL: https://issues.apache.org/jira/browse/BEAM-8870
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark, test-failures
>Reporter: Kenneth Knowles
>Assignee: Kyle Weaver
>Priority: Critical
>  Labels: currently-failing
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> See https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/
> Is this a known issue? Should this suite be disabled until it is expected to 
> pass?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8905) matching Java PCollectionTuple translation naming convention in expansion service with index only

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8905?focusedWorklogId=354839=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354839
 ]

ASF GitHub Bot logged work on BEAM-8905:


Author: ASF GitHub Bot
Created on: 06/Dec/19 00:55
Start Date: 06/Dec/19 00:55
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #10310: [BEAM-8905] matching 
Java PCollectionTuple translation naming convention in expansion service
URL: https://github.com/apache/beam/pull/10310#issuecomment-562385060
 
 
   R: @chamikaramj 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354839)
Time Spent: 20m  (was: 10m)

> matching Java PCollectionTuple translation naming convention in expansion 
> service with index only
> -
>
> Key: BEAM-8905
> URL: https://issues.apache.org/jira/browse/BEAM-8905
> Project: Beam
>  Issue Type: Improvement
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Python PCollectionTuple is translated to an index-keyed map e.g. 
> \{0->pcollection1, 1->pcollection2}, however Java PCollectionTuple is 
> translated to slightly different formats such as \{output_0->pcollection1, 
> output_1->pcollection2}. We need to match these naming conventions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8905) matching Java PCollectionTuple translation naming convention in expansion service with index only

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8905?focusedWorklogId=354838=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354838
 ]

ASF GitHub Bot logged work on BEAM-8905:


Author: ASF GitHub Bot
Created on: 06/Dec/19 00:55
Start Date: 06/Dec/19 00:55
Worklog Time Spent: 10m 
  Work Description: ihji commented on pull request #10310: [BEAM-8905] 
matching Java PCollectionTuple translation naming convention in expansion 
service
URL: https://github.com/apache/beam/pull/10310
 
 

   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 

[jira] [Updated] (BEAM-8905) matching Java PCollectionTuple translation naming convention in expansion service with index only

2019-12-05 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-8905:
--
Summary: matching Java PCollectionTuple translation naming convention in 
expansion service with index only  (was: matching Java PCollectionTuple 
translation naming convention with index only)

> matching Java PCollectionTuple translation naming convention in expansion 
> service with index only
> -
>
> Key: BEAM-8905
> URL: https://issues.apache.org/jira/browse/BEAM-8905
> Project: Beam
>  Issue Type: Improvement
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>
> Python PCollectionTuple is translated to an index-keyed map e.g. 
> \{0->pcollection1, 1->pcollection2}, however Java PCollectionTuple is 
> translated to slightly different formats such as \{output_0->pcollection1, 
> output_1->pcollection2}. We need to match these naming conventions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8905) matching Java PCollectionTuple translation naming convention with index only

2019-12-05 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-8905:
--
Status: Open  (was: Triage Needed)

> matching Java PCollectionTuple translation naming convention with index only
> 
>
> Key: BEAM-8905
> URL: https://issues.apache.org/jira/browse/BEAM-8905
> Project: Beam
>  Issue Type: Improvement
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>
> Python PCollectionTuple is translated to an index-keyed map e.g. 
> \{0->pcollection1, 1->pcollection2}, however Java PCollectionTuple is 
> translated to slightly different formats such as \{output_0->pcollection1, 
> output_1->pcollection2}. We need to match these naming conventions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8905) matching Java PCollectionTuple translation naming convention with index only

2019-12-05 Thread Heejong Lee (Jira)
Heejong Lee created BEAM-8905:
-

 Summary: matching Java PCollectionTuple translation naming 
convention with index only
 Key: BEAM-8905
 URL: https://issues.apache.org/jira/browse/BEAM-8905
 Project: Beam
  Issue Type: Improvement
  Components: java-fn-execution
Reporter: Heejong Lee
Assignee: Heejong Lee


Python PCollectionTuple is translated to an index-keyed map e.g. 
\{0->pcollection1, 1->pcollection2}, however Java PCollectionTuple is 
translated to slightly different formats such as \{output_0->pcollection1, 
output_1->pcollection2}. We need to match these naming conventions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8904) properly update output pcollections from expanded transforms

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8904?focusedWorklogId=354836=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354836
 ]

ASF GitHub Bot logged work on BEAM-8904:


Author: ASF GitHub Bot
Created on: 06/Dec/19 00:46
Start Date: 06/Dec/19 00:46
Worklog Time Spent: 10m 
  Work Description: ihji commented on pull request #10309: [BEAM-8904] 
properly update output pcollections from expanded transforms
URL: https://github.com/apache/beam/pull/10309
 
 
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 

[jira] [Updated] (BEAM-8898) Enable WriteToBigQuery to perform range partitioning

2019-12-05 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada updated BEAM-8898:

Status: Open  (was: Triage Needed)

> Enable WriteToBigQuery to perform range partitioning
> 
>
> Key: BEAM-8898
> URL: https://issues.apache.org/jira/browse/BEAM-8898
> Project: Beam
>  Issue Type: New Feature
>  Components: io-py-gcp
>Reporter: Saman Vaisipour
>Priority: Minor
>
> BigQuery team recently [released 
> 1.22.0|https://github.com/googleapis/google-cloud-python/releases/tag/bigquery-1.22.0]
>  which includes the range partitioning feature (here is [an 
> example|https://github.com/googleapis/google-cloud-python/blob/c4a69d44ccea9635b3d9d316b3f545f16538dafe/bigquery/samples/create_table_range_partitioned.py]).
>  
> WriteToBigQuery uses 
> [`additional_bq_parameters`|https://github.com/apache/beam/blob/c1719476b74ec6f68fabea392087607adafc70ef/sdks/python/apache_beam/io/gcp/bigquery.py#L177]
>  to create tables with date partitioning and clustering. It would be great if 
> the same would be possible to create tables with range partitioning.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8842) Consistently timing out: BigQueryStreamingInsertTransformIntegrationTests.test_multiple_destinations_transform

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8842?focusedWorklogId=354835=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354835
 ]

ASF GitHub Bot logged work on BEAM-8842:


Author: ASF GitHub Bot
Created on: 06/Dec/19 00:44
Start Date: 06/Dec/19 00:44
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #10299: [BEAM-8842] 
Reactivating BQIO Py test while preventing timeouts.
URL: https://github.com/apache/beam/pull/10299
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354835)
Time Spent: 2h 20m  (was: 2h 10m)

> Consistently timing out: 
> BigQueryStreamingInsertTransformIntegrationTests.test_multiple_destinations_transform
> --
>
> Key: BEAM-8842
> URL: https://issues.apache.org/jira/browse/BEAM-8842
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core, test-failures
>Reporter: Udi Meiri
>Assignee: Pablo Estrada
>Priority: Critical
> Fix For: 2.19.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> {code}
> 07:43:27 test_multiple_destinations_transform 
> (apache_beam.io.gcp.bigquery_test.BigQueryStreamingInsertTransformIntegrationTests)
>  ... ERROR
> 07:43:27 
> 07:43:27 
> ==
> 07:43:27 ERROR: test suite for  'apache_beam.io.gcp.bigquery_test.BigQueryStreamingInsertTransformIntegrationTests'>
> 07:43:27 
> --
> 07:43:27 Traceback (most recent call last):
> 07:43:27   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/build/gradleenv/-1734967052/lib/python3.7/site-packages/nose/plugins/multiprocess.py",
>  line 812, in run
> 07:43:27 test(orig)
> 07:43:27   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/build/gradleenv/-1734967052/lib/python3.7/site-packages/nose/suite.py",
>  line 178, in __call__
> 07:43:27 return self.run(*arg, **kw)
> 07:43:27   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/build/gradleenv/-1734967052/lib/python3.7/site-packages/nose/plugins/multiprocess.py",
>  line 822, in run
> 07:43:27 test.config.plugins.addError(test,err)
> 07:43:27   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/build/gradleenv/-1734967052/lib/python3.7/site-packages/nose/plugins/manager.py",
>  line 99, in __call__
> 07:43:27 return self.call(*arg, **kw)
> 07:43:27   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/build/gradleenv/-1734967052/lib/python3.7/site-packages/nose/plugins/manager.py",
>  line 167, in simple
> 07:43:27 result = meth(*arg, **kw)
> 07:43:27   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/build/gradleenv/-1734967052/lib/python3.7/site-packages/nose/plugins/xunit.py",
>  line 287, in addError
> 07:43:27 tb = format_exception(err, self.encoding)
> 07:43:27   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/build/gradleenv/-1734967052/lib/python3.7/site-packages/nose/pyversion.py",
>  line 214, in format_exception
> 07:43:27 ''.join(traceback.format_exception(*exc_info)),
> 07:43:27   File "/usr/lib/python3.7/traceback.py", line 121, in 
> format_exception
> 07:43:27 type(value), value, tb, limit=limit).format(chain=chain))
> 07:43:27   File "/usr/lib/python3.7/traceback.py", line 508, in __init__
> 07:43:27 capture_locals=capture_locals)
> 07:43:27   File "/usr/lib/python3.7/traceback.py", line 363, in extract
> 07:43:27 f.line
> 07:43:27   File "/usr/lib/python3.7/traceback.py", line 285, in line
> 07:43:27 self._line = linecache.getline(self.filename, 
> self.lineno).strip()
> 07:43:27   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/build/gradleenv/-1734967052/lib/python3.7/linecache.py",
>  line 16, in getline
> 07:43:27 lines = getlines(filename, module_globals)
> 07:43:27   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/build/gradleenv/-1734967052/lib/python3.7/linecache.py",
>  line 47, in getlines
> 07:43:27 return updatecache(filename, module_globals)
> 07:43:27   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/build/gradleenv/-1734967052/lib/python3.7/linecache.py",
>  line 136, in 

[jira] [Resolved] (BEAM-8842) Consistently timing out: BigQueryStreamingInsertTransformIntegrationTests.test_multiple_destinations_transform

2019-12-05 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada resolved BEAM-8842.
-
Fix Version/s: 2.19.0
   Resolution: Fixed

> Consistently timing out: 
> BigQueryStreamingInsertTransformIntegrationTests.test_multiple_destinations_transform
> --
>
> Key: BEAM-8842
> URL: https://issues.apache.org/jira/browse/BEAM-8842
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core, test-failures
>Reporter: Udi Meiri
>Assignee: Pablo Estrada
>Priority: Critical
> Fix For: 2.19.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> {code}
> 07:43:27 test_multiple_destinations_transform 
> (apache_beam.io.gcp.bigquery_test.BigQueryStreamingInsertTransformIntegrationTests)
>  ... ERROR
> 07:43:27 
> 07:43:27 
> ==
> 07:43:27 ERROR: test suite for  'apache_beam.io.gcp.bigquery_test.BigQueryStreamingInsertTransformIntegrationTests'>
> 07:43:27 
> --
> 07:43:27 Traceback (most recent call last):
> 07:43:27   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/build/gradleenv/-1734967052/lib/python3.7/site-packages/nose/plugins/multiprocess.py",
>  line 812, in run
> 07:43:27 test(orig)
> 07:43:27   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/build/gradleenv/-1734967052/lib/python3.7/site-packages/nose/suite.py",
>  line 178, in __call__
> 07:43:27 return self.run(*arg, **kw)
> 07:43:27   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/build/gradleenv/-1734967052/lib/python3.7/site-packages/nose/plugins/multiprocess.py",
>  line 822, in run
> 07:43:27 test.config.plugins.addError(test,err)
> 07:43:27   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/build/gradleenv/-1734967052/lib/python3.7/site-packages/nose/plugins/manager.py",
>  line 99, in __call__
> 07:43:27 return self.call(*arg, **kw)
> 07:43:27   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/build/gradleenv/-1734967052/lib/python3.7/site-packages/nose/plugins/manager.py",
>  line 167, in simple
> 07:43:27 result = meth(*arg, **kw)
> 07:43:27   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/build/gradleenv/-1734967052/lib/python3.7/site-packages/nose/plugins/xunit.py",
>  line 287, in addError
> 07:43:27 tb = format_exception(err, self.encoding)
> 07:43:27   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/build/gradleenv/-1734967052/lib/python3.7/site-packages/nose/pyversion.py",
>  line 214, in format_exception
> 07:43:27 ''.join(traceback.format_exception(*exc_info)),
> 07:43:27   File "/usr/lib/python3.7/traceback.py", line 121, in 
> format_exception
> 07:43:27 type(value), value, tb, limit=limit).format(chain=chain))
> 07:43:27   File "/usr/lib/python3.7/traceback.py", line 508, in __init__
> 07:43:27 capture_locals=capture_locals)
> 07:43:27   File "/usr/lib/python3.7/traceback.py", line 363, in extract
> 07:43:27 f.line
> 07:43:27   File "/usr/lib/python3.7/traceback.py", line 285, in line
> 07:43:27 self._line = linecache.getline(self.filename, 
> self.lineno).strip()
> 07:43:27   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/build/gradleenv/-1734967052/lib/python3.7/linecache.py",
>  line 16, in getline
> 07:43:27 lines = getlines(filename, module_globals)
> 07:43:27   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/build/gradleenv/-1734967052/lib/python3.7/linecache.py",
>  line 47, in getlines
> 07:43:27 return updatecache(filename, module_globals)
> 07:43:27   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/build/gradleenv/-1734967052/lib/python3.7/linecache.py",
>  line 136, in updatecache
> 07:43:27 with tokenize.open(fullname) as fp:
> 07:43:27   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/build/gradleenv/-1734967052/lib/python3.7/tokenize.py",
>  line 449, in open
> 07:43:27 encoding, lines = detect_encoding(buffer.readline)
> 07:43:27   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/build/gradleenv/-1734967052/lib/python3.7/tokenize.py",
>  line 418, in detect_encoding
> 07:43:27 first = read_or_stop()
> 07:43:27   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/build/gradleenv/-1734967052/lib/python3.7/tokenize.py",
>  line 376, in read_or_stop
> 07:43:27 return readline()
> 07:43:27   File 
> 

[jira] [Work logged] (BEAM-8835) Artifact retrieval fails with FlinkUberJarJobServer

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8835?focusedWorklogId=354834=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354834
 ]

ASF GitHub Bot logged work on BEAM-8835:


Author: ASF GitHub Bot
Created on: 06/Dec/19 00:44
Start Date: 06/Dec/19 00:44
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #10308: [BEAM-8835] Stage 
artifacts to BEAM-PIPELINE dir in zip
URL: https://github.com/apache/beam/pull/10308#issuecomment-562382685
 
 
   cc @angoenka @tweise 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354834)
Time Spent: 3.5h  (was: 3h 20m)

> Artifact retrieval fails with FlinkUberJarJobServer
> ---
>
> Key: BEAM-8835
> URL: https://issues.apache.org/jira/browse/BEAM-8835
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kyle Weaver
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> We seem to be able to stage artifacts and retrieve the manifest fine, but 
> retrieving the artifacts doesn't work. This happens on both my k8s Flink 
> cluster and on my local Flink cluster. At a quick glance the artifact is in 
> the jar where it should be. cc [~robertwb]
> 2019-11-21 18:43:39,336 INFO  
> org.apache.beam.runners.fnexecution.artifact.AbstractArtifactRetrievalService 
>  - GetArtifact name: "pickled_main_session"
> retrieval_token: "BEAM-PIPELINE/pipeline/artifact-manifest.json"
>  failed
> java.io.IOException: Unable to load 
> e1d24d848414cecf805a7b5c2b950c6430c20eb32875dac00b40f80f3c73a141/ea0d10d07f4601782ed647e8f6ba4a055be13674ab79fa0c6e2fa44917c5264c
>  with 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$ChildFirstClassLoader@785297ac



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8835) Artifact retrieval fails with FlinkUberJarJobServer

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8835?focusedWorklogId=354833=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354833
 ]

ASF GitHub Bot logged work on BEAM-8835:


Author: ASF GitHub Bot
Created on: 06/Dec/19 00:43
Start Date: 06/Dec/19 00:43
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #10308: [BEAM-8835] 
Stage artifacts to BEAM-PIPELINE dir in zip
URL: https://github.com/apache/beam/pull/10308
 
 
   The problem was the leading slash, which got inserted by the join with the 
empty string (`_root`). Even though we remove it on the Java end:
   
   
https://github.com/apache/beam/blob/6ada564ec2b5519c6b0bd96865c52b02855eeb43/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/ClassLoaderArtifactRetrievalService.java#L49-L51
   
   The [ZIP file 
specification](https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT) 
states: "The path stored MUST not contain a drive or device letter, or a 
leading slash." (FWIW it'd be nice if the zipfile library checked this.) Even 
though the resulting jar looks fine in the GNOME Archive Manager, the Java 
classloader will fail to find resources that were originally written with a 
leading slash in their path.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 

[jira] [Updated] (BEAM-8904) properly update output pcollections from expanded transforms

2019-12-05 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-8904:
--
Status: Open  (was: Triage Needed)

> properly update output pcollections from expanded transforms
> 
>
> Key: BEAM-8904
> URL: https://issues.apache.org/jira/browse/BEAM-8904
> Project: Beam
>  Issue Type: Improvement
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>
> currently output pcollections from expanded transforms are ignored. we need 
> to properly update output pcollections when it's returned to the caller of 
> expansion service.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8903) handling --jar_packages experimental flag in PortableRunner for staging external dependencies used in expanded transforms

2019-12-05 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-8903:
--
Status: Open  (was: Triage Needed)

> handling --jar_packages experimental flag in PortableRunner for staging 
> external dependencies used in expanded transforms
> -
>
> Key: BEAM-8903
> URL: https://issues.apache.org/jira/browse/BEAM-8903
> Project: Beam
>  Issue Type: Improvement
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> handling --jar_packages experimental flag in PortableRunner for staging 
> external dependencies used in expanded transforms. could be removed after 
> proper dependency management for cross-language pipelines is implemented.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8904) properly update output pcollections from expanded transforms

2019-12-05 Thread Heejong Lee (Jira)
Heejong Lee created BEAM-8904:
-

 Summary: properly update output pcollections from expanded 
transforms
 Key: BEAM-8904
 URL: https://issues.apache.org/jira/browse/BEAM-8904
 Project: Beam
  Issue Type: Improvement
  Components: java-fn-execution
Reporter: Heejong Lee
Assignee: Heejong Lee


currently output pcollections from expanded transforms are ignored. we need to 
properly update output pcollections when it's returned to the caller of 
expansion service.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8645) TimestampCombiner incorrect in beam python

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=354829=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354829
 ]

ASF GitHub Bot logged work on BEAM-8645:


Author: ASF GitHub Bot
Created on: 06/Dec/19 00:39
Start Date: 06/Dec/19 00:39
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on pull request #10143: [BEAM-8645] 
To test state backed iterable coder in py sdk.
URL: https://github.com/apache/beam/pull/10143#discussion_r354615011
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
 ##
 @@ -1579,6 +1580,49 @@ def test_lull_logging(self):
 '.*There has been a processing lull of over.*',
 'Unable to find a lull logged for this job.')
 
+class StateBackedTestElementType(object):
+  element_count = 0
+
+  def __init__(self, num_elems):
+self.num_elems = num_elems
+self.value = ['a' for _ in range(num_elems)]
+StateBackedTestElementType.element_count += 1
+# Due to using state backed iterable, we expect there is a few instances
+# alive at any given time.
+if StateBackedTestElementType.element_count > 5:
+  raise RuntimeError('Too many live instances.')
+
+  def __del__(self):
+StateBackedTestElementType.element_count -= 1
+
+  def __reduce__(self):
+return (self.__class__, (self.num_elems, ))
 
 Review comment:
   updated.  
   
   With this change, I believe the motivation is to just make the element large 
when serialization happens, thus the actual element no longer need to hold a 
blob of data.  Though I don't full get why the latter is necessarily better for 
the sake of having the test (?). 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354829)
Time Spent: 10h 20m  (was: 10h 10m)

> TimestampCombiner incorrect in beam python
> --
>
> Key: BEAM-8645
> URL: https://issues.apache.org/jira/browse/BEAM-8645
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ruoyun Huang
>Priority: Major
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> When we have a TimestampValue on combine: 
> {code:java}
> main_stream = (p   
> | 'main TestStream' >> TestStream()   
> .add_elements([window.TimestampedValue(('k', 100), 0)])   
> .add_elements([window.TimestampedValue(('k', 400), 9)])   
> .advance_watermark_to_infinity()   
> | 'main windowInto' >> beam.WindowInto( 
> window.FixedWindows(10),  
> timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST)   | 
> 'Combine' >> beam.CombinePerKey(sum))
> The expect timestamp should be:
> LATEST:    (('k', 500), Timestamp(9)),
> EARLIEST:    (('k', 500), Timestamp(0)),
> END_OF_WINDOW: (('k', 500), Timestamp(10)),
> But current py streaming gives following results: 
> LATEST:    (('k', 500), Timestamp(10)),
> EARLIEST:    (('k', 500), Timestamp(10)),
> END_OF_WINDOW: (('k', 500), Timestamp(9.)),
> More details and discussions:
> https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=354830=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354830
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 06/Dec/19 00:39
Start Date: 06/Dec/19 00:39
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on issue #9953: [BEAM-8335] Adds 
support for multi-output TestStream
URL: https://github.com/apache/beam/pull/9953#issuecomment-555647530
 
 
   ~~This review is blocked on https://github.com/apache/beam/pull/10035, which 
fixes the PAssert equal_to_per_windows which this PR needs to be able to test 
its implementation. Without the fix, the precommit will fail.~~
   Unblocked now that https://github.com/apache/beam/pull/10035 is merged
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354830)
Time Spent: 42.5h  (was: 42h 20m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 42.5h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8903) handling --jar_packages experimental flag in PortableRunner for staging external dependencies used in expanded transforms

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8903?focusedWorklogId=354828=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354828
 ]

ASF GitHub Bot logged work on BEAM-8903:


Author: ASF GitHub Bot
Created on: 06/Dec/19 00:35
Start Date: 06/Dec/19 00:35
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #10307: [BEAM-8903] handling 
--jar_packages experimental flag in PortableRunner
URL: https://github.com/apache/beam/pull/10307#issuecomment-562380680
 
 
   R: @chamikaramj 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354828)
Time Spent: 20m  (was: 10m)

> handling --jar_packages experimental flag in PortableRunner for staging 
> external dependencies used in expanded transforms
> -
>
> Key: BEAM-8903
> URL: https://issues.apache.org/jira/browse/BEAM-8903
> Project: Beam
>  Issue Type: Improvement
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> handling --jar_packages experimental flag in PortableRunner for staging 
> external dependencies used in expanded transforms. could be removed after 
> proper dependency management for cross-language pipelines is implemented.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8645) TimestampCombiner incorrect in beam python

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=354825=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354825
 ]

ASF GitHub Bot logged work on BEAM-8645:


Author: ASF GitHub Bot
Created on: 06/Dec/19 00:32
Start Date: 06/Dec/19 00:32
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on pull request #10143: [BEAM-8645] 
To test state backed iterable coder in py sdk.
URL: https://github.com/apache/beam/pull/10143#discussion_r354616584
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
 ##
 @@ -1579,6 +1580,49 @@ def test_lull_logging(self):
 '.*There has been a processing lull of over.*',
 'Unable to find a lull logged for this job.')
 
+class StateBackedTestElementType(object):
+  element_count = 0
+
+  def __init__(self, num_elems):
+self.num_elems = num_elems
+self.value = ['a' for _ in range(num_elems)]
+StateBackedTestElementType.element_count += 1
+# Due to using state backed iterable, we expect there is a few instances
+# alive at any given time.
+if StateBackedTestElementType.element_count > 5:
+  raise RuntimeError('Too many live instances.')
+
+  def __del__(self):
+StateBackedTestElementType.element_count -= 1
+
+  def __reduce__(self):
+return (self.__class__, (self.num_elems, ))
+
+@attr('ValidatesRunner')
+class FnApiBasedStateBackedCoderTest(unittest.TestCase):
+
+  class ElementDoFn(beam.DoFn):
+def process(self, elements):
+  unused_key, ts = elements
+
+  yield sum([item.num_elems for item in ts])
+
+  def create_pipeline(self):
+return beam.Pipeline(
 
 Review comment:
   acked.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354825)
Time Spent: 10h 10m  (was: 10h)

> TimestampCombiner incorrect in beam python
> --
>
> Key: BEAM-8645
> URL: https://issues.apache.org/jira/browse/BEAM-8645
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ruoyun Huang
>Priority: Major
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> When we have a TimestampValue on combine: 
> {code:java}
> main_stream = (p   
> | 'main TestStream' >> TestStream()   
> .add_elements([window.TimestampedValue(('k', 100), 0)])   
> .add_elements([window.TimestampedValue(('k', 400), 9)])   
> .advance_watermark_to_infinity()   
> | 'main windowInto' >> beam.WindowInto( 
> window.FixedWindows(10),  
> timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST)   | 
> 'Combine' >> beam.CombinePerKey(sum))
> The expect timestamp should be:
> LATEST:    (('k', 500), Timestamp(9)),
> EARLIEST:    (('k', 500), Timestamp(0)),
> END_OF_WINDOW: (('k', 500), Timestamp(10)),
> But current py streaming gives following results: 
> LATEST:    (('k', 500), Timestamp(10)),
> EARLIEST:    (('k', 500), Timestamp(10)),
> END_OF_WINDOW: (('k', 500), Timestamp(9.)),
> More details and discussions:
> https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8903) handling --jar_packages experimental flag in PortableRunner for staging external dependencies used in expanded transforms

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8903?focusedWorklogId=354824=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354824
 ]

ASF GitHub Bot logged work on BEAM-8903:


Author: ASF GitHub Bot
Created on: 06/Dec/19 00:32
Start Date: 06/Dec/19 00:32
Worklog Time Spent: 10m 
  Work Description: ihji commented on pull request #10307: [BEAM-8903] 
handling --jar_packages experimental flag in PortableRunner
URL: https://github.com/apache/beam/pull/10307
 
 
   for staging external dependencies used in expanded transforms. 
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-8645) TimestampCombiner incorrect in beam python

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=354823=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354823
 ]

ASF GitHub Bot logged work on BEAM-8645:


Author: ASF GitHub Bot
Created on: 06/Dec/19 00:32
Start Date: 06/Dec/19 00:32
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on pull request #10143: [BEAM-8645] 
To test state backed iterable coder in py sdk.
URL: https://github.com/apache/beam/pull/10143#discussion_r354615011
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
 ##
 @@ -1579,6 +1580,49 @@ def test_lull_logging(self):
 '.*There has been a processing lull of over.*',
 'Unable to find a lull logged for this job.')
 
+class StateBackedTestElementType(object):
+  element_count = 0
+
+  def __init__(self, num_elems):
+self.num_elems = num_elems
+self.value = ['a' for _ in range(num_elems)]
+StateBackedTestElementType.element_count += 1
+# Due to using state backed iterable, we expect there is a few instances
+# alive at any given time.
+if StateBackedTestElementType.element_count > 5:
+  raise RuntimeError('Too many live instances.')
+
+  def __del__(self):
+StateBackedTestElementType.element_count -= 1
+
+  def __reduce__(self):
+return (self.__class__, (self.num_elems, ))
 
 Review comment:
   updated.  
   
   With this change, I believe  the motivation is to just make the element 
large when serialization happens, thus the actual element now does not hold a 
blob of data.  Thus I don't full get why it is necessarily better for the sake 
of having the test. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354823)
Time Spent: 10h  (was: 9h 50m)

> TimestampCombiner incorrect in beam python
> --
>
> Key: BEAM-8645
> URL: https://issues.apache.org/jira/browse/BEAM-8645
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ruoyun Huang
>Priority: Major
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
> When we have a TimestampValue on combine: 
> {code:java}
> main_stream = (p   
> | 'main TestStream' >> TestStream()   
> .add_elements([window.TimestampedValue(('k', 100), 0)])   
> .add_elements([window.TimestampedValue(('k', 400), 9)])   
> .advance_watermark_to_infinity()   
> | 'main windowInto' >> beam.WindowInto( 
> window.FixedWindows(10),  
> timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST)   | 
> 'Combine' >> beam.CombinePerKey(sum))
> The expect timestamp should be:
> LATEST:    (('k', 500), Timestamp(9)),
> EARLIEST:    (('k', 500), Timestamp(0)),
> END_OF_WINDOW: (('k', 500), Timestamp(10)),
> But current py streaming gives following results: 
> LATEST:    (('k', 500), Timestamp(10)),
> EARLIEST:    (('k', 500), Timestamp(10)),
> END_OF_WINDOW: (('k', 500), Timestamp(9.)),
> More details and discussions:
> https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=354821=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354821
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 06/Dec/19 00:32
Start Date: 06/Dec/19 00:32
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #10236: [BEAM-8335] 
Add method to PipelineInstrument to create background caching pipline
URL: https://github.com/apache/beam/pull/10236#discussion_r354608206
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/pipeline_instrument.py
 ##
 @@ -73,13 +74,20 @@ def __init__(self, pipeline, options=None):
 pipeline.to_runner_api(use_fake_coders=True),
 pipeline.runner,
 options)
+
+self._background_caching_pipeline = beam.pipeline.Pipeline.from_runner_api(
+pipeline.to_runner_api(use_fake_coders=True),
+pipeline.runner,
 
 Review comment:
   Runner here may be the interactive runner. is that what's intended?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354821)
Time Spent: 42h 20m  (was: 42h 10m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 42h 20m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8903) handling --jar_packages experimental flag in PortableRunner for staging external dependencies used in expanded transforms

2019-12-05 Thread Heejong Lee (Jira)
Heejong Lee created BEAM-8903:
-

 Summary: handling --jar_packages experimental flag in 
PortableRunner for staging external dependencies used in expanded transforms
 Key: BEAM-8903
 URL: https://issues.apache.org/jira/browse/BEAM-8903
 Project: Beam
  Issue Type: Improvement
  Components: java-fn-execution
Reporter: Heejong Lee
Assignee: Heejong Lee


handling --jar_packages experimental flag in PortableRunner for staging 
external dependencies used in expanded transforms. could be removed after 
proper dependency management for cross-language pipelines is implemented.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8882) Allow Dataflow to automatically choose portability or not.

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8882?focusedWorklogId=354820=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354820
 ]

ASF GitHub Bot logged work on BEAM-8882:


Author: ASF GitHub Bot
Created on: 06/Dec/19 00:27
Start Date: 06/Dec/19 00:27
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #10283: [BEAM-8882] Allow 
Dataflow to automatically choose portability.
URL: https://github.com/apache/beam/pull/10283#issuecomment-562378775
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354820)
Time Spent: 4h 10m  (was: 4h)

> Allow Dataflow to automatically choose portability or not.
> --
>
> Key: BEAM-8882
> URL: https://issues.apache.org/jira/browse/BEAM-8882
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Critical
> Fix For: 2.18.0
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> We would like the Dataflow service to be able to automatically choose whether 
> to run pipelines in a portable way. In order to do this, we need to provide 
> more information even if portability is not explicitly requested. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8882) Allow Dataflow to automatically choose portability or not.

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8882?focusedWorklogId=354819=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354819
 ]

ASF GitHub Bot logged work on BEAM-8882:


Author: ASF GitHub Bot
Created on: 06/Dec/19 00:26
Start Date: 06/Dec/19 00:26
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #10292: [BEAM-8882] Fully 
populate log messages.
URL: https://github.com/apache/beam/pull/10292#issuecomment-562378648
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354819)
Time Spent: 4h  (was: 3h 50m)

> Allow Dataflow to automatically choose portability or not.
> --
>
> Key: BEAM-8882
> URL: https://issues.apache.org/jira/browse/BEAM-8882
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Critical
> Fix For: 2.18.0
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> We would like the Dataflow service to be able to automatically choose whether 
> to run pipelines in a portable way. In order to do this, we need to provide 
> more information even if portability is not explicitly requested. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8882) Allow Dataflow to automatically choose portability or not.

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8882?focusedWorklogId=354818=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354818
 ]

ASF GitHub Bot logged work on BEAM-8882:


Author: ASF GitHub Bot
Created on: 06/Dec/19 00:26
Start Date: 06/Dec/19 00:26
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #10292: [BEAM-8882] Fully 
populate log messages.
URL: https://github.com/apache/beam/pull/10292#issuecomment-562378622
 
 
   Gradle build daemon disappeared unexpectedly (it may have been killed or may 
have crashed)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354818)
Time Spent: 3h 50m  (was: 3h 40m)

> Allow Dataflow to automatically choose portability or not.
> --
>
> Key: BEAM-8882
> URL: https://issues.apache.org/jira/browse/BEAM-8882
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Critical
> Fix For: 2.18.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> We would like the Dataflow service to be able to automatically choose whether 
> to run pipelines in a portable way. In order to do this, we need to provide 
> more information even if portability is not explicitly requested. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8645) TimestampCombiner incorrect in beam python

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=354816=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354816
 ]

ASF GitHub Bot logged work on BEAM-8645:


Author: ASF GitHub Bot
Created on: 06/Dec/19 00:25
Start Date: 06/Dec/19 00:25
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on pull request #10143: [BEAM-8645] 
To test state backed iterable coder in py sdk.
URL: https://github.com/apache/beam/pull/10143#discussion_r354615011
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
 ##
 @@ -1579,6 +1580,49 @@ def test_lull_logging(self):
 '.*There has been a processing lull of over.*',
 'Unable to find a lull logged for this job.')
 
+class StateBackedTestElementType(object):
+  element_count = 0
+
+  def __init__(self, num_elems):
+self.num_elems = num_elems
+self.value = ['a' for _ in range(num_elems)]
+StateBackedTestElementType.element_count += 1
+# Due to using state backed iterable, we expect there is a few instances
+# alive at any given time.
+if StateBackedTestElementType.element_count > 5:
+  raise RuntimeError('Too many live instances.')
+
+  def __del__(self):
+StateBackedTestElementType.element_count -= 1
+
+  def __reduce__(self):
+return (self.__class__, (self.num_elems, ))
 
 Review comment:
   updated.  I believe the motivation is to just make the element large when 
serialization happens, thus the actual element now does not hold a blob of 
data. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354816)
Time Spent: 9h 50m  (was: 9h 40m)

> TimestampCombiner incorrect in beam python
> --
>
> Key: BEAM-8645
> URL: https://issues.apache.org/jira/browse/BEAM-8645
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ruoyun Huang
>Priority: Major
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
> When we have a TimestampValue on combine: 
> {code:java}
> main_stream = (p   
> | 'main TestStream' >> TestStream()   
> .add_elements([window.TimestampedValue(('k', 100), 0)])   
> .add_elements([window.TimestampedValue(('k', 400), 9)])   
> .advance_watermark_to_infinity()   
> | 'main windowInto' >> beam.WindowInto( 
> window.FixedWindows(10),  
> timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST)   | 
> 'Combine' >> beam.CombinePerKey(sum))
> The expect timestamp should be:
> LATEST:    (('k', 500), Timestamp(9)),
> EARLIEST:    (('k', 500), Timestamp(0)),
> END_OF_WINDOW: (('k', 500), Timestamp(10)),
> But current py streaming gives following results: 
> LATEST:    (('k', 500), Timestamp(10)),
> EARLIEST:    (('k', 500), Timestamp(10)),
> END_OF_WINDOW: (('k', 500), Timestamp(9.)),
> More details and discussions:
> https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8645) TimestampCombiner incorrect in beam python

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=354815=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354815
 ]

ASF GitHub Bot logged work on BEAM-8645:


Author: ASF GitHub Bot
Created on: 06/Dec/19 00:24
Start Date: 06/Dec/19 00:24
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on pull request #10143: [BEAM-8645] 
To test state backed iterable coder in py sdk.
URL: https://github.com/apache/beam/pull/10143#discussion_r354614686
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
 ##
 @@ -1579,6 +1580,49 @@ def test_lull_logging(self):
 '.*There has been a processing lull of over.*',
 'Unable to find a lull logged for this job.')
 
+class StateBackedTestElementType(object):
+  element_count = 0
+
+  def __init__(self, num_elems):
+self.num_elems = num_elems
+self.value = ['a' for _ in range(num_elems)]
+StateBackedTestElementType.element_count += 1
+# Due to using state backed iterable, we expect there is a few instances
+# alive at any given time.
+if StateBackedTestElementType.element_count > 5:
+  raise RuntimeError('Too many live instances.')
+
+  def __del__(self):
+StateBackedTestElementType.element_count -= 1
+
+  def __reduce__(self):
+return (self.__class__, (self.num_elems, ))
+
+@attr('ValidatesRunner')
+class FnApiBasedStateBackedCoderTest(unittest.TestCase):
+
+  class ElementDoFn(beam.DoFn):
+def process(self, elements):
+  unused_key, ts = elements
 
 Review comment:
   Ah. This is much better.  Thanks! 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354815)
Time Spent: 9h 40m  (was: 9.5h)

> TimestampCombiner incorrect in beam python
> --
>
> Key: BEAM-8645
> URL: https://issues.apache.org/jira/browse/BEAM-8645
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ruoyun Huang
>Priority: Major
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> When we have a TimestampValue on combine: 
> {code:java}
> main_stream = (p   
> | 'main TestStream' >> TestStream()   
> .add_elements([window.TimestampedValue(('k', 100), 0)])   
> .add_elements([window.TimestampedValue(('k', 400), 9)])   
> .advance_watermark_to_infinity()   
> | 'main windowInto' >> beam.WindowInto( 
> window.FixedWindows(10),  
> timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST)   | 
> 'Combine' >> beam.CombinePerKey(sum))
> The expect timestamp should be:
> LATEST:    (('k', 500), Timestamp(9)),
> EARLIEST:    (('k', 500), Timestamp(0)),
> END_OF_WINDOW: (('k', 500), Timestamp(10)),
> But current py streaming gives following results: 
> LATEST:    (('k', 500), Timestamp(10)),
> EARLIEST:    (('k', 500), Timestamp(10)),
> END_OF_WINDOW: (('k', 500), Timestamp(9.)),
> More details and discussions:
> https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8886) Add a python mongodbio integration test that triggers load split

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8886?focusedWorklogId=354814=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354814
 ]

ASF GitHub Bot logged work on BEAM-8886:


Author: ASF GitHub Bot
Created on: 06/Dec/19 00:22
Start Date: 06/Dec/19 00:22
Worklog Time Spent: 10m 
  Work Description: y1chi commented on issue #10289: [BEAM-8886] Add a 
mongodb io dataflow integration test
URL: https://github.com/apache/beam/pull/10289#issuecomment-562377617
 
 
   run seed job
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354814)
Time Spent: 1h 50m  (was: 1h 40m)

> Add a python mongodbio integration test that triggers load split
> 
>
> Key: BEAM-8886
> URL: https://issues.apache.org/jira/browse/BEAM-8886
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Minor
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Current integration test doesn't seem to trigger liquid sharding at all, we 
> should change integration test that has more load and potentially use the 
> mongodb k8s cluster.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8902) parameterize input type of Java external transform

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8902?focusedWorklogId=354813=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354813
 ]

ASF GitHub Bot logged work on BEAM-8902:


Author: ASF GitHub Bot
Created on: 06/Dec/19 00:19
Start Date: 06/Dec/19 00:19
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #10305: [BEAM-8902] 
parameterize input type of Java external transform
URL: https://github.com/apache/beam/pull/10305#issuecomment-562376890
 
 
   R: @robertwb 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354813)
Time Spent: 20m  (was: 10m)

> parameterize input type of Java external transform
> --
>
> Key: BEAM-8902
> URL: https://issues.apache.org/jira/browse/BEAM-8902
> Project: Beam
>  Issue Type: Improvement
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> parameterize input type of Java external transform to allow multiple 
> pcollections like PCollectionTuple or PCollectionList.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8902) parameterize input type of Java external transform

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8902?focusedWorklogId=354812=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354812
 ]

ASF GitHub Bot logged work on BEAM-8902:


Author: ASF GitHub Bot
Created on: 06/Dec/19 00:19
Start Date: 06/Dec/19 00:19
Worklog Time Spent: 10m 
  Work Description: ihji commented on pull request #10305: [BEAM-8902] 
parameterize input type of Java external transform
URL: https://github.com/apache/beam/pull/10305
 
 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 

[jira] [Commented] (BEAM-7881) Get rid of jackson to avoid the continuous flow of CVEs in Jackson

2019-12-05 Thread Tatu Saloranta (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16989267#comment-16989267
 ] 

Tatu Saloranta commented on BEAM-7881:
--

[~romain.manni-bucau] I am sorry but I am not sure I understand the points. But 
the fact is that the stream of CVEs will stop with 2.10, and with default 
settings Jackson does not have vulnerabilities regarding polymorphic typing.  
If user code explicitly enables use of unsafe features that is no different 
from custom code opening  security holes by any other means – if code execution 
is allowed, framework can not do much to try to prevent self-inflicted problems.

> Get rid of jackson to avoid the continuous flow of CVEs in Jackson
> --
>
> Key: BEAM-7881
> URL: https://issues.apache.org/jira/browse/BEAM-7881
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core
>Affects Versions: 2.14.0
>Reporter: Romain Manni-Bucau
>Priority: Blocker
>
> Jackson keeps having CVE on all releases of databind and transitively beam 
> sdk java core has CVE on all its releases (for the record, when writing this 
> issue you must use at least jackson-databind 2.9.9.2 but last week it was 
> 2.9.9.1 and 2.14 didn't get the fix).
> Can be neat to get rid of jackson which does not fix this issue for a very 
> long time now and just use JSON-B or another JSON impl to ensure the CVE is 
> not usable because beam is there.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8901) add experimental flag for reusing flink local environment

2019-12-05 Thread Robert Bradshaw (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16989263#comment-16989263
 ] 

Robert Bradshaw commented on BEAM-8901:
---

Is it not possible to reclaim the resources by shutting it down cleanly? 
(Re-use may still be desirable if it's expensive to bring up and down of 
course, but if it's cheap, making tests hermetic is another benefit.)

> add experimental flag for reusing flink local environment
> -
>
> Key: BEAM-8901
> URL: https://issues.apache.org/jira/browse/BEAM-8901
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Flink job server launches a new mini cluster every time we run the pipeline 
> on Flink local environment. To prevent OOM, we need to reuse existing Flink 
> local environment if possible.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8581) Python SDK labels ontime empty panes as late

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8581?focusedWorklogId=354806=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354806
 ]

ASF GitHub Bot logged work on BEAM-8581:


Author: ASF GitHub Bot
Created on: 06/Dec/19 00:04
Start Date: 06/Dec/19 00:04
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10035: [BEAM-8581] 
and [BEAM-8582] watermark and trigger fixes
URL: https://github.com/apache/beam/pull/10035
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354806)
Time Spent: 9h 40m  (was: 9.5h)

> Python SDK labels ontime empty panes as late
> 
>
> Key: BEAM-8581
> URL: https://issues.apache.org/jira/browse/BEAM-8581
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> The GeneralTriggerDriver does not put watermark holds on timers, leading to 
> the ontime empty pane being considered late data.
> Fix: Add a new notion of whether a trigger has an ontime pane. If it does, 
> then set a watermark hold to end of window - 1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8581) Python SDK labels ontime empty panes as late

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8581?focusedWorklogId=354805=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354805
 ]

ASF GitHub Bot logged work on BEAM-8581:


Author: ASF GitHub Bot
Created on: 06/Dec/19 00:04
Start Date: 06/Dec/19 00:04
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #10035: [BEAM-8581] and 
[BEAM-8582] watermark and trigger fixes
URL: https://github.com/apache/beam/pull/10035#issuecomment-562373229
 
 
   I fixed the merge conflicts and history and reverted un-intentional changes 
on the other PR, which has been merged. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354805)
Time Spent: 9.5h  (was: 9h 20m)

> Python SDK labels ontime empty panes as late
> 
>
> Key: BEAM-8581
> URL: https://issues.apache.org/jira/browse/BEAM-8581
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 9.5h
>  Remaining Estimate: 0h
>
> The GeneralTriggerDriver does not put watermark holds on timers, leading to 
> the ontime empty pane being considered late data.
> Fix: Add a new notion of whether a trigger has an ontime pane. If it does, 
> then set a watermark hold to end of window - 1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8581) Python SDK labels ontime empty panes as late

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8581?focusedWorklogId=354803=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354803
 ]

ASF GitHub Bot logged work on BEAM-8581:


Author: ASF GitHub Bot
Created on: 06/Dec/19 00:02
Start Date: 06/Dec/19 00:02
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10304: [BEAM-8581] 
and [BEAM-8582] watermark and trigger fixes
URL: https://github.com/apache/beam/pull/10304
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354803)
Time Spent: 9h 20m  (was: 9h 10m)

> Python SDK labels ontime empty panes as late
> 
>
> Key: BEAM-8581
> URL: https://issues.apache.org/jira/browse/BEAM-8581
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 9h 20m
>  Remaining Estimate: 0h
>
> The GeneralTriggerDriver does not put watermark holds on timers, leading to 
> the ontime empty pane being considered late data.
> Fix: Add a new notion of whether a trigger has an ontime pane. If it does, 
> then set a watermark hold to end of window - 1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8581) Python SDK labels ontime empty panes as late

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8581?focusedWorklogId=354801=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354801
 ]

ASF GitHub Bot logged work on BEAM-8581:


Author: ASF GitHub Bot
Created on: 06/Dec/19 00:00
Start Date: 06/Dec/19 00:00
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10304: [BEAM-8581] 
and [BEAM-8582] watermark and trigger fixes
URL: https://github.com/apache/beam/pull/10304
 
 
   This fixes up some of the history and reverts unintentional changes from 
#10035. 
   
   https://github.com/rohdesamuel/beam/compare/triggerfix..robertwb:triggerfix
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 

[jira] [Created] (BEAM-8902) parameterize input type of Java external transform

2019-12-05 Thread Heejong Lee (Jira)
Heejong Lee created BEAM-8902:
-

 Summary: parameterize input type of Java external transform
 Key: BEAM-8902
 URL: https://issues.apache.org/jira/browse/BEAM-8902
 Project: Beam
  Issue Type: Improvement
  Components: java-fn-execution
Reporter: Heejong Lee
Assignee: Heejong Lee


parameterize input type of Java external transform to allow multiple 
pcollections like PCollectionTuple or PCollectionList.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8902) parameterize input type of Java external transform

2019-12-05 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-8902:
--
Status: Open  (was: Triage Needed)

> parameterize input type of Java external transform
> --
>
> Key: BEAM-8902
> URL: https://issues.apache.org/jira/browse/BEAM-8902
> Project: Beam
>  Issue Type: Improvement
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>
> parameterize input type of Java external transform to allow multiple 
> pcollections like PCollectionTuple or PCollectionList.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8581) Python SDK labels ontime empty panes as late

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8581?focusedWorklogId=354799=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354799
 ]

ASF GitHub Bot logged work on BEAM-8581:


Author: ASF GitHub Bot
Created on: 05/Dec/19 23:56
Start Date: 05/Dec/19 23:56
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10035: [BEAM-8581] 
and [BEAM-8582] watermark and trigger fixes
URL: https://github.com/apache/beam/pull/10035#discussion_r354602340
 
 

 ##
 File path: sdks/python/apache_beam/runners/direct/util.py
 ##
 @@ -64,9 +64,9 @@ def __init__(self, encoded_key, window, name, time_domain, 
timestamp):
 self.timestamp = timestamp
 
   def __repr__(self):
-return 'TimerFiring(%r, %r, %s, %s)' % (self.encoded_key,
-self.name, self.time_domain,
-self.timestamp)
+return 'TimerFiring({}, {}, {}, {})'.format(self.encoded_key,
 
 Review comment:
   The use of `repr` for encoded keys was intentional here. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354799)
Time Spent: 9h  (was: 8h 50m)

> Python SDK labels ontime empty panes as late
> 
>
> Key: BEAM-8581
> URL: https://issues.apache.org/jira/browse/BEAM-8581
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> The GeneralTriggerDriver does not put watermark holds on timers, leading to 
> the ontime empty pane being considered late data.
> Fix: Add a new notion of whether a trigger has an ontime pane. If it does, 
> then set a watermark hold to end of window - 1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8846) Force synchronization of the stream observer in BeamFnControlClient

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8846?focusedWorklogId=354796=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354796
 ]

ASF GitHub Bot logged work on BEAM-8846:


Author: ASF GitHub Bot
Created on: 05/Dec/19 23:47
Start Date: 05/Dec/19 23:47
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10242: [BEAM-8846] 
Force synchronization of the stream observer in BeamFnControlClient
URL: https://github.com/apache/beam/pull/10242#discussion_r354605585
 
 

 ##
 File path: 
sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/OutboundObserverFactory.java
 ##
 @@ -28,15 +28,17 @@
 public abstract class OutboundObserverFactory {
   /**
* Create a buffering {@link OutboundObserverFactory} for client-side RPCs 
with the specified
-   * {@link ExecutorService} and the default buffer size.
+   * {@link ExecutorService} and the default buffer size. It adds 
synchronization to provide thread
+   * safety of access to the returned observer.
*/
   public static OutboundObserverFactory clientBuffered(ExecutorService 
executorService) {
 return new Buffered(executorService, Buffered.DEFAULT_BUFFER_SIZE);
   }
 
   /**
* Create a buffering {@link OutboundObserverFactory} for client-side RPCs 
with the specified
-   * {@link ExecutorService} and buffer size.
+   * {@link ExecutorService} and buffer size. It adds synchronization to 
provide thread safety of
+   * access to the returned observer.
*/
   public static OutboundObserverFactory clientBuffered(
   ExecutorService executorService, int bufferSize) {
 
 Review comment:
   Please also update the `clientDirect()` comment to the simpler version as 
well.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354796)
Time Spent: 2h 10m  (was: 2h)

> Force synchronization of the stream observer in BeamFnControlClient
> ---
>
> Key: BEAM-8846
> URL: https://issues.apache.org/jira/browse/BEAM-8846
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Affects Versions: 2.18.0
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Currently there is no synchronization to access the stream observer in 
> BeamFnControlClient which is not thread safe. We should fix it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8846) Force synchronization of the stream observer in BeamFnControlClient

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8846?focusedWorklogId=354794=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354794
 ]

ASF GitHub Bot logged work on BEAM-8846:


Author: ASF GitHub Bot
Created on: 05/Dec/19 23:47
Start Date: 05/Dec/19 23:47
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10242: [BEAM-8846] 
Force synchronization of the stream observer in BeamFnControlClient
URL: https://github.com/apache/beam/pull/10242#discussion_r354605420
 
 

 ##
 File path: 
sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/OutboundObserverFactory.java
 ##
 @@ -28,15 +28,17 @@
 public abstract class OutboundObserverFactory {
   /**
* Create a buffering {@link OutboundObserverFactory} for client-side RPCs 
with the specified
-   * {@link ExecutorService} and the default buffer size.
+   * {@link ExecutorService} and the default buffer size. It adds 
synchronization to provide thread
+   * safety of access to the returned observer.
*/
   public static OutboundObserverFactory clientBuffered(ExecutorService 
executorService) {
 return new Buffered(executorService, Buffered.DEFAULT_BUFFER_SIZE);
   }
 
   /**
* Create a buffering {@link OutboundObserverFactory} for client-side RPCs 
with the specified
-   * {@link ExecutorService} and buffer size.
+   * {@link ExecutorService} and buffer size. It adds synchronization to 
provide thread safety of
+   * access to the returned observer.
 
 Review comment:
   ```suggestion
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354794)
Time Spent: 2h  (was: 1h 50m)

> Force synchronization of the stream observer in BeamFnControlClient
> ---
>
> Key: BEAM-8846
> URL: https://issues.apache.org/jira/browse/BEAM-8846
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Affects Versions: 2.18.0
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Currently there is no synchronization to access the stream observer in 
> BeamFnControlClient which is not thread safe. We should fix it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8846) Force synchronization of the stream observer in BeamFnControlClient

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8846?focusedWorklogId=354795=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354795
 ]

ASF GitHub Bot logged work on BEAM-8846:


Author: ASF GitHub Bot
Created on: 05/Dec/19 23:47
Start Date: 05/Dec/19 23:47
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10242: [BEAM-8846] 
Force synchronization of the stream observer in BeamFnControlClient
URL: https://github.com/apache/beam/pull/10242#discussion_r354605342
 
 

 ##
 File path: 
sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/OutboundObserverFactory.java
 ##
 @@ -28,15 +28,17 @@
 public abstract class OutboundObserverFactory {
   /**
* Create a buffering {@link OutboundObserverFactory} for client-side RPCs 
with the specified
-   * {@link ExecutorService} and the default buffer size.
+   * {@link ExecutorService} and the default buffer size. It adds 
synchronization to provide thread
+   * safety of access to the returned observer.
*/
   public static OutboundObserverFactory clientBuffered(ExecutorService 
executorService) {
 return new Buffered(executorService, Buffered.DEFAULT_BUFFER_SIZE);
   }
 
   /**
* Create a buffering {@link OutboundObserverFactory} for client-side RPCs 
with the specified
-   * {@link ExecutorService} and buffer size.
+   * {@link ExecutorService} and buffer size. It adds synchronization to 
provide thread safety of
 
 Review comment:
   ```suggestion
  * {@link ExecutorService} and buffer size. All {@link StreamObserver}s
  * created by this factory are thread safe.
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354795)
Time Spent: 2h 10m  (was: 2h)

> Force synchronization of the stream observer in BeamFnControlClient
> ---
>
> Key: BEAM-8846
> URL: https://issues.apache.org/jira/browse/BEAM-8846
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Affects Versions: 2.18.0
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Currently there is no synchronization to access the stream observer in 
> BeamFnControlClient which is not thread safe. We should fix it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8846) Force synchronization of the stream observer in BeamFnControlClient

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8846?focusedWorklogId=354792=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354792
 ]

ASF GitHub Bot logged work on BEAM-8846:


Author: ASF GitHub Bot
Created on: 05/Dec/19 23:47
Start Date: 05/Dec/19 23:47
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10242: [BEAM-8846] 
Force synchronization of the stream observer in BeamFnControlClient
URL: https://github.com/apache/beam/pull/10242#discussion_r354605283
 
 

 ##
 File path: 
sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/OutboundObserverFactory.java
 ##
 @@ -28,15 +28,17 @@
 public abstract class OutboundObserverFactory {
   /**
* Create a buffering {@link OutboundObserverFactory} for client-side RPCs 
with the specified
-   * {@link ExecutorService} and the default buffer size.
+   * {@link ExecutorService} and the default buffer size. It adds 
synchronization to provide thread
+   * safety of access to the returned observer.
 
 Review comment:
   ```suggestion
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354792)
Time Spent: 1h 40m  (was: 1.5h)

> Force synchronization of the stream observer in BeamFnControlClient
> ---
>
> Key: BEAM-8846
> URL: https://issues.apache.org/jira/browse/BEAM-8846
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Affects Versions: 2.18.0
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Currently there is no synchronization to access the stream observer in 
> BeamFnControlClient which is not thread safe. We should fix it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8901) add experimental flag for reusing flink local environment

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8901?focusedWorklogId=354791=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354791
 ]

ASF GitHub Bot logged work on BEAM-8901:


Author: ASF GitHub Bot
Created on: 05/Dec/19 23:47
Start Date: 05/Dec/19 23:47
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #10303: [BEAM-8901] add 
experimental flag for reusing flink local environment
URL: https://github.com/apache/beam/pull/10303#issuecomment-562369120
 
 
   R: @robertwb 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354791)
Time Spent: 20m  (was: 10m)

> add experimental flag for reusing flink local environment
> -
>
> Key: BEAM-8901
> URL: https://issues.apache.org/jira/browse/BEAM-8901
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Flink job server launches a new mini cluster every time we run the pipeline 
> on Flink local environment. To prevent OOM, we need to reuse existing Flink 
> local environment if possible.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8846) Force synchronization of the stream observer in BeamFnControlClient

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8846?focusedWorklogId=354793=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354793
 ]

ASF GitHub Bot logged work on BEAM-8846:


Author: ASF GitHub Bot
Created on: 05/Dec/19 23:47
Start Date: 05/Dec/19 23:47
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10242: [BEAM-8846] 
Force synchronization of the stream observer in BeamFnControlClient
URL: https://github.com/apache/beam/pull/10242#discussion_r354605238
 
 

 ##
 File path: 
sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/stream/OutboundObserverFactory.java
 ##
 @@ -28,15 +28,17 @@
 public abstract class OutboundObserverFactory {
   /**
* Create a buffering {@link OutboundObserverFactory} for client-side RPCs 
with the specified
-   * {@link ExecutorService} and the default buffer size.
+   * {@link ExecutorService} and the default buffer size. It adds 
synchronization to provide thread
 
 Review comment:
   ```suggestion
  * {@link ExecutorService} and the default buffer size. All {@link 
StreamObserver}s
  * created by this factory are thread safe.
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354793)
Time Spent: 1h 50m  (was: 1h 40m)

> Force synchronization of the stream observer in BeamFnControlClient
> ---
>
> Key: BEAM-8846
> URL: https://issues.apache.org/jira/browse/BEAM-8846
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Affects Versions: 2.18.0
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Currently there is no synchronization to access the stream observer in 
> BeamFnControlClient which is not thread safe. We should fix it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8901) add experimental flag for reusing flink local environment

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8901?focusedWorklogId=354790=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354790
 ]

ASF GitHub Bot logged work on BEAM-8901:


Author: ASF GitHub Bot
Created on: 05/Dec/19 23:46
Start Date: 05/Dec/19 23:46
Worklog Time Spent: 10m 
  Work Description: ihji commented on pull request #10303: [BEAM-8901] add 
experimental flag for reusing flink local environment
URL: https://github.com/apache/beam/pull/10303
 
 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-8645) TimestampCombiner incorrect in beam python

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=354789=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354789
 ]

ASF GitHub Bot logged work on BEAM-8645:


Author: ASF GitHub Bot
Created on: 05/Dec/19 23:40
Start Date: 05/Dec/19 23:40
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on pull request #10143: [BEAM-8645] 
To test state backed iterable coder in py sdk.
URL: https://github.com/apache/beam/pull/10143#discussion_r354603779
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
 ##
 @@ -1579,6 +1580,49 @@ def test_lull_logging(self):
 '.*There has been a processing lull of over.*',
 'Unable to find a lull logged for this job.')
 
+class StateBackedTestElementType(object):
+  element_count = 0
 
 Review comment:
   fixed.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354789)
Time Spent: 9.5h  (was: 9h 20m)

> TimestampCombiner incorrect in beam python
> --
>
> Key: BEAM-8645
> URL: https://issues.apache.org/jira/browse/BEAM-8645
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ruoyun Huang
>Priority: Major
>  Time Spent: 9.5h
>  Remaining Estimate: 0h
>
> When we have a TimestampValue on combine: 
> {code:java}
> main_stream = (p   
> | 'main TestStream' >> TestStream()   
> .add_elements([window.TimestampedValue(('k', 100), 0)])   
> .add_elements([window.TimestampedValue(('k', 400), 9)])   
> .advance_watermark_to_infinity()   
> | 'main windowInto' >> beam.WindowInto( 
> window.FixedWindows(10),  
> timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST)   | 
> 'Combine' >> beam.CombinePerKey(sum))
> The expect timestamp should be:
> LATEST:    (('k', 500), Timestamp(9)),
> EARLIEST:    (('k', 500), Timestamp(0)),
> END_OF_WINDOW: (('k', 500), Timestamp(10)),
> But current py streaming gives following results: 
> LATEST:    (('k', 500), Timestamp(10)),
> EARLIEST:    (('k', 500), Timestamp(10)),
> END_OF_WINDOW: (('k', 500), Timestamp(9.)),
> More details and discussions:
> https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8901) add experimental flag for reusing flink local environment

2019-12-05 Thread Heejong Lee (Jira)
Heejong Lee created BEAM-8901:
-

 Summary: add experimental flag for reusing flink local environment
 Key: BEAM-8901
 URL: https://issues.apache.org/jira/browse/BEAM-8901
 Project: Beam
  Issue Type: Improvement
  Components: runner-flink
Reporter: Heejong Lee
Assignee: Heejong Lee


Flink job server launches a new mini cluster every time we run the pipeline on 
Flink local environment. To prevent OOM, we need to reuse existing Flink local 
environment if possible.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8901) add experimental flag for reusing flink local environment

2019-12-05 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-8901:
--
Status: Open  (was: Triage Needed)

> add experimental flag for reusing flink local environment
> -
>
> Key: BEAM-8901
> URL: https://issues.apache.org/jira/browse/BEAM-8901
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>
> Flink job server launches a new mini cluster every time we run the pipeline 
> on Flink local environment. To prevent OOM, we need to reuse existing Flink 
> local environment if possible.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8424) Java Dataflow ValidatesRunner tests are timeouting

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8424?focusedWorklogId=354783=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354783
 ]

ASF GitHub Bot logged work on BEAM-8424:


Author: ASF GitHub Bot
Created on: 05/Dec/19 23:31
Start Date: 05/Dec/19 23:31
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10295: [BEAM-8424] Revert 
"[BEAM-4287] Add trySplit API to Java restriction tracker matc…
URL: https://github.com/apache/beam/pull/10295#issuecomment-562365175
 
 
   If the forward fix doesn't work, feel free to re-open the revert.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354783)
Time Spent: 1h 50m  (was: 1h 40m)

> Java Dataflow ValidatesRunner tests are timeouting
> --
>
> Key: BEAM-8424
> URL: https://issues.apache.org/jira/browse/BEAM-8424
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Lukasz Gajowy
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_Dataflow/]
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_PortabilityApi_Dataflow/]
> these jobs take more than currently set timeout (3h). 
>  
> EDIT: currently, after reopening the issue the timeout is set to 4.5h. 
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8581) Python SDK labels ontime empty panes as late

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8581?focusedWorklogId=354785=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354785
 ]

ASF GitHub Bot logged work on BEAM-8581:


Author: ASF GitHub Bot
Created on: 05/Dec/19 23:31
Start Date: 05/Dec/19 23:31
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10035: [BEAM-8581] 
and [BEAM-8582] watermark and trigger fixes
URL: https://github.com/apache/beam/pull/10035#discussion_r354600989
 
 

 ##
 File path: sdks/python/apache_beam/runners/direct/transform_evaluator.py
 ##
 @@ -608,7 +606,7 @@ def start_bundle(self):
 
   def process_timer(self, timer_firing):
 if timer_firing.name not in self.user_timer_map:
-  _LOGGER.warning('Unknown timer fired: %s', timer_firing)
+  logging.warning('Unknown timer fired: %s', timer_firing)
 
 Review comment:
   Looks like this was reverted again. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354785)
Time Spent: 8h 50m  (was: 8h 40m)

> Python SDK labels ontime empty panes as late
> 
>
> Key: BEAM-8581
> URL: https://issues.apache.org/jira/browse/BEAM-8581
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> The GeneralTriggerDriver does not put watermark holds on timers, leading to 
> the ontime empty pane being considered late data.
> Fix: Add a new notion of whether a trigger has an ontime pane. If it does, 
> then set a watermark hold to end of window - 1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8424) Java Dataflow ValidatesRunner tests are timeouting

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8424?focusedWorklogId=354784=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354784
 ]

ASF GitHub Bot logged work on BEAM-8424:


Author: ASF GitHub Bot
Created on: 05/Dec/19 23:31
Start Date: 05/Dec/19 23:31
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10295: [BEAM-8424] 
Revert "[BEAM-4287] Add trySplit API to Java restriction tracker matc…
URL: https://github.com/apache/beam/pull/10295
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354784)
Time Spent: 2h  (was: 1h 50m)

> Java Dataflow ValidatesRunner tests are timeouting
> --
>
> Key: BEAM-8424
> URL: https://issues.apache.org/jira/browse/BEAM-8424
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Lukasz Gajowy
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_Dataflow/]
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_PortabilityApi_Dataflow/]
> these jobs take more than currently set timeout (3h). 
>  
> EDIT: currently, after reopening the issue the timeout is set to 4.5h. 
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-4287) SplittableDoFn: splitAtFraction() API for Java

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4287?focusedWorklogId=354779=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354779
 ]

ASF GitHub Bot logged work on BEAM-4287:


Author: ASF GitHub Bot
Created on: 05/Dec/19 23:27
Start Date: 05/Dec/19 23:27
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10302: [BEAM-4287] Fix to 
use the residual instead of the current restriction on process continuations.
URL: https://github.com/apache/beam/pull/10302#issuecomment-562364234
 
 
   R: @ehudm
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354779)
Time Spent: 2.5h  (was: 2h 20m)

> SplittableDoFn: splitAtFraction() API for Java
> --
>
> Key: BEAM-4287
> URL: https://issues.apache.org/jira/browse/BEAM-4287
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Eugene Kirpichov
>Assignee: Eugene Kirpichov
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> SDF currently only has a checkpoint() API. This Jira is about adding the 
> splitAtFraction() API and its support in runners that support the respective 
> feature for sources.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-4287) SplittableDoFn: splitAtFraction() API for Java

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4287?focusedWorklogId=354778=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354778
 ]

ASF GitHub Bot logged work on BEAM-4287:


Author: ASF GitHub Bot
Created on: 05/Dec/19 23:27
Start Date: 05/Dec/19 23:27
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10302: [BEAM-4287] 
Fix to use the residual instead of the current restriction on process 
continuations.
URL: https://github.com/apache/beam/pull/10302
 
 
   This is a forward fix for https://github.com/apache/beam/pull/10258 to fix 
Dataflow VR postcommit suite.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 

[jira] [Commented] (BEAM-8613) Add environment variable support to Docker environment

2019-12-05 Thread Robert Bradshaw (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16989247#comment-16989247
 ] 

Robert Bradshaw commented on BEAM-8613:
---

What kind of environment variables are you trying to pass here? Is there not 
another way to pass this data to the operations being performed in this 
container? I'm worried about building too much into these (unstructured) string 
fields and having to support this long term once we finally clean up our 
environment specs. 

> Add environment variable support to Docker environment
> --
>
> Key: BEAM-8613
> URL: https://issues.apache.org/jira/browse/BEAM-8613
> Project: Beam
>  Issue Type: Improvement
>  Components: java-fn-execution, runner-core, runner-direct
>Reporter: Nathan Rusch
>Priority: Trivial
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The Process environment allows specifying environment variables via a map 
> field on its payload message. The Docker environment should support this same 
> pattern, and forward the contents of the map through to the container runtime.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8900) Python SDK should stage only up-to-date versions of pipeline dependencies defined by requirements file.

2019-12-05 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev updated BEAM-8900:
--
Labels: starter  (was: )

> Python SDK should stage only up-to-date versions of pipeline dependencies 
> defined by requirements file.
> ---
>
> Key: BEAM-8900
> URL: https://issues.apache.org/jira/browse/BEAM-8900
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: Minor
>  Labels: starter
>
> When staging pipeline dependencies specified in requirements.txt file and 
> transitive deps, we stage entire content of requirements-cache directory on 
> user's machine, which may include dependencies that were cached a while ago 
> and are no longer needed. See [1] for discussion and some ideas how we can 
> avoid staging packages we don't need.
> [1] 
> https://lists.apache.org/thread.html/f99820489026a46160505ea9a4c8c4a0eabe8649143c1ee22b7088aa%40%3Cdev.beam.apache.org%3E



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8900) Python SDK should stage only up-to-date versions of pipeline dependencies defined by requirements file.

2019-12-05 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev updated BEAM-8900:
--
Status: Open  (was: Triage Needed)

> Python SDK should stage only up-to-date versions of pipeline dependencies 
> defined by requirements file.
> ---
>
> Key: BEAM-8900
> URL: https://issues.apache.org/jira/browse/BEAM-8900
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: Minor
>
> When staging pipeline dependencies specified in requirements.txt file and 
> transitive deps, we stage entire content of requirements-cache directory on 
> user's machine, which may include dependencies that were cached a while ago 
> and are no longer needed. See [1] for discussion and some ideas how we can 
> avoid staging packages we don't need.
> [1] 
> https://lists.apache.org/thread.html/f99820489026a46160505ea9a4c8c4a0eabe8649143c1ee22b7088aa%40%3Cdev.beam.apache.org%3E



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-4032) Support staging binary distributions of dependency packages.

2019-12-05 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16989243#comment-16989243
 ] 

Valentyn Tymofieiev commented on BEAM-4032:
---

https://issues.apache.org/jira/browse/BEAM-8900 is an inefficiency in 
dependency staging that can be addressed when we are working on this issue. 

> Support staging binary distributions of dependency packages.
> 
>
> Key: BEAM-4032
> URL: https://issues.apache.org/jira/browse/BEAM-4032
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: Major
>
> requirements.txt only supports source-distribution dependencies [1].
> --extra_packages does not officially support wheel files [2].
> It is possible to expand this to support binary distributions as long as we 
> have the knowledge of the target platform.
> We should take into consideration the mechanisms of staging dependencies 
> through portability framework, and perhaps consolidate some of the existing 
> options.
> [https://github.com/apache/beam/blob/a79d1b4fc27eb81db0d9a773047820a206f3d238/sdks/python/apache_beam/runners/dataflow/internal/dependency.py#L260]
> [https://github.com/apache/beam/blob/a79d1b4fc27eb81db0d9a773047820a206f3d238/sdks/python/apache_beam/runners/dataflow/internal/dependency.py#L188]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


  1   2   3   4   >