[jira] [Commented] (BEAM-7978) ArithmeticExceptions on getting backlog bytes

2019-08-15 Thread Mateusz (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908730#comment-16908730
 ] 

Mateusz commented on BEAM-7978:
---

Hello [~chamikara], I guess all the runners that execute the 
getTotalBacklogBytes() method on the reader are affected. In the case of the 
Dataflow runner, this results in constant recreation of the reader, so the 
pipeline can't progress correctly.

Also adding [~ajo.thomas24], who contributed the changes to the watermark 
calculations.

> ArithmeticExceptions on getting backlog bytes 
> --
>
> Key: BEAM-7978
> URL: https://issues.apache.org/jira/browse/BEAM-7978
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kinesis
>Affects Versions: 2.14.0
>Reporter: Mateusz
>Priority: Major
>
> Hello,
> Beam 2.14.0
>  (and, to be more precise, this 
> [commit|https://github.com/apache/beam/commit/9fe0a0387ad1f894ef02acf32edbc9623a3b13ad#diff-b4964a457006b1555c7042c739b405ec])
>  introduced a change in the watermark calculation in Kinesis IO, causing the 
> error below:
> {code:java}
> exception:  "java.lang.RuntimeException: Unknown kinesis failure, when trying 
> to reach kinesis
>   at 
> org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.wrapExceptions(SimplifiedKinesisClient.java:227)
>   at 
> org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.getBacklogBytes(SimplifiedKinesisClient.java:167)
>   at 
> org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.getBacklogBytes(SimplifiedKinesisClient.java:155)
>   at 
> org.apache.beam.sdk.io.kinesis.KinesisReader.getTotalBacklogBytes(KinesisReader.java:158)
>   at 
> org.apache.beam.runners.dataflow.worker.StreamingModeExecutionContext.flushState(StreamingModeExecutionContext.java:433)
>   at 
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1289)
>   at 
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:149)
>   at 
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1024)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ArithmeticException: Value cannot fit in an int: 
> 153748963401
>   at org.joda.time.field.FieldUtils.safeToInt(FieldUtils.java:229)
>   at 
> org.joda.time.field.BaseDurationField.getDifference(BaseDurationField.java:141)
>   at 
> org.joda.time.base.BaseSingleFieldPeriod.between(BaseSingleFieldPeriod.java:72)
>   at org.joda.time.Minutes.minutesBetween(Minutes.java:101)
>   at 
> org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.lambda$getBacklogBytes$3(SimplifiedKinesisClient.java:169)
>   at 
> org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.wrapExceptions(SimplifiedKinesisClient.java:210)
>   ... 10 more
> {code}
> We spotted this issue on the Dataflow runner. It's problematic because the 
> inability to get the backlog bytes seems to result in constant recreation of 
> the KinesisReader.
> The issue happens if the backlog bytes are retrieved before the watermark value 
> is updated from its initial default value. An easy way to reproduce it is to 
> create a pipeline with a Kinesis source for a stream where no records are being 
> put. While debugging it locally, you can observe that the watermark is set to a 
> value in the past (like "-290308-12-21T19:59:05.225Z"). After two minutes (the 
> default watermark idle duration threshold is set to 2 minutes), the watermark 
> is set to the value of 
> [watermarkIdleThreshold|https://github.com/apache/beam/blob/9fe0a0387ad1f894ef02acf32edbc9623a3b13ad/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/WatermarkPolicyFactory.java#L110],
>  so the next backlog bytes retrieval should be correct. However, as described 
> before, running the pipeline on the Dataflow runner results in the KinesisReader 
> being closed just after creation, so the watermark won't be fixed.
> The reason for the issue is the following: the introduced watermark policies 
> rely on 
> [WatermarkParameters|https://github.com/apache/beam/blob/9fe0a0387ad1f894ef02acf32edbc9623a3b13ad/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/WatermarkParameters.java],
>  which initialises currentWatermark and eventTime to 
> [BoundedWindow.TIMESTAMP_MIN_VALUE|https://github.com/apache/beam/commit/9fe0a0387ad1f894ef02acf32edbc9623a3b13ad#diff-b4964a457006b1555c7042c739b405ecR52].
>  This results in the watermark being set to new Instant(-9223372036854775L) at 
> KinesisReader creation. The calculated [period between the watermark and the 
> current 
> 
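A minimal, standalone sketch of the failing calculation, reproducing the Joda-Time overflow outside of Beam (the class name here is just for the example):

{code:java}
import org.joda.time.Instant;
import org.joda.time.Minutes;

public class MinutesOverflowDemo {
  public static void main(String[] args) {
    // BoundedWindow.TIMESTAMP_MIN_VALUE corresponds to new Instant(-9223372036854775L),
    // the initial watermark described above.
    Instant watermark = new Instant(-9223372036854775L);
    Instant now = Instant.now();

    // Joda-Time stores Minutes as an int. The span from TIMESTAMP_MIN_VALUE to now is on
    // the order of the 153748963401 minutes reported in the trace above, far larger than
    // Integer.MAX_VALUE, so FieldUtils.safeToInt throws
    // java.lang.ArithmeticException: Value cannot fit in an int.
    Minutes backlogAge = Minutes.minutesBetween(watermark, now);
    System.out.println(backlogAge.getMinutes());
  }
}
{code}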

[jira] [Work logged] (BEAM-7987) WindowingWindmillReader should not start to read from an empty windmill workitem

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7987?focusedWorklogId=296082&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296082
 ]

ASF GitHub Bot logged work on BEAM-7987:


Author: ASF GitHub Bot
Created on: 16/Aug/19 05:34
Start Date: 16/Aug/19 05:34
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #9336: [BEAM-7987] 
Drop empty Windmill workitem in WindowingWindmillReader
URL: https://github.com/apache/beam/pull/9336
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 296082)
Time Spent: 1h 20m  (was: 1h 10m)

> WindowingWindmillReader should not start to read from an empty windmill 
> workitem
> 
>
> Key: BEAM-7987
> URL: https://issues.apache.org/jira/browse/BEAM-7987
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> WindowingWindmillReader should expect that a windmill workitem has either 
> timers or elements.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (BEAM-7073) AvroUtils converting generic record to Beam Row causes class cast exception

2019-08-15 Thread Vishwas (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908696#comment-16908696
 ] 

Vishwas commented on BEAM-7073:
---

[~ryanskraba] This issue has already been fixed as part of the PR below:

[https://github.com/apache/beam/pull/8376]

This Jira can be closed.

> AvroUtils converting generic record to Beam Row causes class cast exception
> ---
>
> Key: BEAM-7073
> URL: https://issues.apache.org/jira/browse/BEAM-7073
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.11.0
> Environment: Direct Runner
>Reporter: Vishwas
>Assignee: Ryan Skraba
>Priority: Major
>
> Below is my pipeline:
> KafkaSource (KafkaIO.read) ---> ParDo ---> BeamSql ---> KafkaSink (KafkaIO.write)
> The Kafka source IO reads Avro records from a Kafka topic and deserializes them 
> to GenericRecord using the following:
> KafkaIO.Read<String, GenericRecord> kafkaIoRead = KafkaIO.<String, GenericRecord>read()
>     .withBootstrapServers(bootstrapServerUrl)
>     .withTopic(topicName)
>     .withKeyDeserializer(StringDeserializer.class)
>     .withValueDeserializerAndCoder(GenericAvroDeserializer.class,
>         AvroCoder.of(GenericRecord.class, avroSchema))
>     .updateConsumerProperties(ImmutableMap.of("schema.registry.url", schemaRegistryUrl));
> The Avro schema of the topic has a logical type (timestamp-millis), which is 
> deserialized to a Joda time:
>   {
>     "name": "timeOfRelease",
>     "type": [
>       "null",
>       {
>         "type": "long",
>         "logicalType": "timestamp-millis",
>         "connect.version": 1,
>         "connect.name": "org.apache.kafka.connect.data.Timestamp"
>       }
>     ],
>     "default": null
>   }
> Now, in my ParDo transform, I am trying to use the AvroUtils class methods to 
> convert the generic record to a Beam Row, and I get the class cast exception below:
>  AvroUtils.toBeamRowStrict(genericRecord, this.beamSchema)
> Caused by: java.lang.ClassCastException: org.joda.time.DateTime cannot be 
> cast to java.lang.Long
>     at 
> org.apache.beam.sdk.schemas.utils.AvroUtils.convertAvroFieldStrict(AvroUtils.java:664)
>     at 
> org.apache.beam.sdk.schemas.utils.AvroUtils.toBeamRowStrict(AvroUtils.java:217)
>  
> This looks like a bug, as the Joda time value created as part of deserialization 
> is being cast to Long in the code below:
>   else if (logicalType instanceof LogicalTypes.TimestampMillis) {
>     return convertDateTimeStrict((Long) value, fieldType);
>   }
> PS: I also used the avro-tools 1.8.2 jar to generate classes for the mentioned 
> Avro schema, and I see that the attribute with the timestamp-millis logical type 
> is converted to a Joda time there as well.
> 
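For context, a standalone sketch of the failing cast. The toEpochMillis helper below is hypothetical and is not necessarily how the linked fix handles it; it only illustrates the two representations involved:

{code:java}
import org.joda.time.DateTime;
import org.joda.time.ReadableInstant;

public class TimestampMillisCastDemo {
  // Hypothetical defensive conversion: accept either the raw Avro long encoding of
  // timestamp-millis or a Joda value produced by a logical-type-aware deserializer.
  static long toEpochMillis(Object value) {
    if (value instanceof ReadableInstant) {
      return ((ReadableInstant) value).getMillis();
    }
    return (Long) value;
  }

  public static void main(String[] args) {
    // A timestamp-millis field value as described above: already converted to a Joda DateTime.
    Object fromDeserializer = new DateTime(1565827200000L);

    // Casting straight to Long, as the strict conversion does, fails with
    // java.lang.ClassCastException: org.joda.time.DateTime cannot be cast to java.lang.Long.
    // long broken = (Long) fromDeserializer;

    System.out.println(toEpochMillis(fromDeserializer)); // prints 1565827200000
  }
}
{code}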



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7987) WindowingWindmillReader should not start to read from an empty windmill workitem

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7987?focusedWorklogId=296061&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296061
 ]

ASF GitHub Bot logged work on BEAM-7987:


Author: ASF GitHub Bot
Created on: 16/Aug/19 04:09
Start Date: 16/Aug/19 04:09
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on pull request #9336: [BEAM-7987] 
Drop empty Windmill workitem in WindowingWindmillReader
URL: https://github.com/apache/beam/pull/9336#discussion_r314577246
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindowingWindmillReader.java
 ##
 @@ -116,31 +117,54 @@
 final WorkItem workItem = context.getWork();
 KeyedWorkItem keyedWorkItem =
 new WindmillKeyedWorkItem<>(key, workItem, windowCoder, windowsCoder, 
valueCoder);
+final boolean isEmptyWorkItem =
+(Iterables.isEmpty(keyedWorkItem.timersIterable())
+&& Iterables.isEmpty(keyedWorkItem.elementsIterable()));
 final WindowedValue> value = new 
ValueInEmptyWindows<>(keyedWorkItem);
 
-return new NativeReaderIterator>>() {
-  private WindowedValue> current;
-
-  @Override
-  public boolean start() throws IOException {
-current = value;
-return true;
-  }
+// Return a non-op iterator when current workitem is an empty workitem.
 
 Review comment:
   Done. Thanks : )
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 296061)
Time Spent: 1h 10m  (was: 1h)

> WindowingWindmillReader should not start to read from an empty windmill 
> workitem
> 
>
> Key: BEAM-7987
> URL: https://issues.apache.org/jira/browse/BEAM-7987
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> WindowingWindmillReader should expect that a windmill workitem has either 
> timers or elements.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7987) WindowingWindmillReader should not start to read from an empty windmill workitem

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7987?focusedWorklogId=296060&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296060
 ]

ASF GitHub Bot logged work on BEAM-7987:


Author: ASF GitHub Bot
Created on: 16/Aug/19 04:08
Start Date: 16/Aug/19 04:08
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on issue #9336: [BEAM-7987] Drop 
empty Windmill workitem in WindowingWindmillReader
URL: https://github.com/apache/beam/pull/9336#issuecomment-521876861
 
 
   Run Java PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 296060)
Time Spent: 1h  (was: 50m)

> WindowingWindmillReader should not start to read from an empty windmill 
> workitem
> 
>
> Key: BEAM-7987
> URL: https://issues.apache.org/jira/browse/BEAM-7987
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> WindowingWindmillReader should expect that a windmill workitem has either 
> timers or elements.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-3342) Create a Cloud Bigtable IO connector for Python

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3342?focusedWorklogId=296042&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296042
 ]

ASF GitHub Bot logged work on BEAM-3342:


Author: ASF GitHub Bot
Created on: 16/Aug/19 03:41
Start Date: 16/Aug/19 03:41
Worklog Time Spent: 10m 
  Work Description: mf2199 commented on issue #8457: [BEAM-3342] Create a 
Cloud Bigtable IO connector for Python
URL: https://github.com/apache/beam/pull/8457#issuecomment-521873132
 
 
   @pabloem You're anything but annoying, and I really appreciate the guidance; 
will get to setting up the `TestPipeline` next.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 296042)
Time Spent: 36.5h  (was: 36h 20m)

> Create a Cloud Bigtable IO connector for Python
> ---
>
> Key: BEAM-3342
> URL: https://issues.apache.org/jira/browse/BEAM-3342
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Solomon Duskis
>Assignee: Solomon Duskis
>Priority: Major
>  Time Spent: 36.5h
>  Remaining Estimate: 0h
>
> I would like to create a Cloud Bigtable python connector.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (BEAM-7049) Merge multiple input to one BeamUnionRel

2019-08-15 Thread sridhar Reddy (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908680#comment-16908680
 ] 

sridhar Reddy commented on BEAM-7049:
-

Here is the pull request. As I mentioned in the description, the implementation 
is still at a very early stage and will break the 2-way union test. 

[https://github.com/apache/beam/pull/9358]

> Merge multiple input to one BeamUnionRel
> 
>
> Key: BEAM-7049
> URL: https://issues.apache.org/jira/browse/BEAM-7049
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: sridhar Reddy
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> BeamUnionRel assumes there are exactly two inputs and rejects more. So `a UNION b UNION c` 
> will have to be created as UNION(a, UNION(b, c)) and will have two shuffles. If 
> BeamUnionRel can handle multiple inputs, we will have only one shuffle.
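For illustration, a minimal sketch (not taken from the PR) of the kind of query this affects, assuming three Row PCollections with identical schemas, registered as SQL tables via their tuple tags:

{code:java}
import org.apache.beam.sdk.extensions.sql.SqlTransform;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.Row;
import org.apache.beam.sdk.values.TupleTag;

class ThreeWayUnionSketch {
  // Assumes a, b and c carry the same Beam schema.
  static PCollection<Row> threeWayUnion(
      PCollection<Row> a, PCollection<Row> b, PCollection<Row> c) {
    return PCollectionTuple.of(new TupleTag<Row>("a"), a)
        .and(new TupleTag<Row>("b"), b)
        .and(new TupleTag<Row>("c"), c)
        .apply(
            SqlTransform.query(
                // With a two-input-only BeamUnionRel this is planned as a nested pair of
                // unions and shuffles twice; an n-ary BeamUnionRel needs only one shuffle.
                "SELECT * FROM a UNION SELECT * FROM b UNION SELECT * FROM c"));
  }
}
{code}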



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7049) Merge multiple input to one BeamUnionRel

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7049?focusedWorklogId=296035&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296035
 ]

ASF GitHub Bot logged work on BEAM-7049:


Author: ASF GitHub Bot
Created on: 16/Aug/19 03:31
Start Date: 16/Aug/19 03:31
Worklog Time Spent: 10m 
  Work Description: sridharinuog commented on pull request #9358: 
(WIP-BEAM-7049)Changes made to make a simple case of threeway union work
URL: https://github.com/apache/beam/pull/9358
 
 
   Code changes to implement a three-way union. This is still at a very early 
stage and may break the existing two-way union. More changes to come. 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)

[jira] [Work logged] (BEAM-7989) SparkRunner CacheVisitor counts PCollections from SideInputs

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7989?focusedWorklogId=295999&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295999
 ]

ASF GitHub Bot logged work on BEAM-7989:


Author: ASF GitHub Bot
Created on: 16/Aug/19 02:56
Start Date: 16/Aug/19 02:56
Worklog Time Spent: 10m 
  Work Description: kyle-winkelman commented on issue #9357: [BEAM-7989] 
Remove side inputs from CacheVisitor calculation.
URL: https://github.com/apache/beam/pull/9357#issuecomment-521866366
 
 
   R: @iemejia @dmvk 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 295999)
Time Spent: 20m  (was: 10m)

> SparkRunner CacheVisitor counts PCollections from SideInputs
> 
>
> Key: BEAM-7989
> URL: https://issues.apache.org/jira/browse/BEAM-7989
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Affects Versions: 2.14.0
>Reporter: Kyle Winkelman
>Assignee: Kyle Winkelman
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The SparkRunner's CacheVisitor looks at all inputs for a 
> TransformHierarchy.Node. Those inputs include the PCollections from the 
> PCollectionViews that are supplied as sideInputs.
> The SparkRunner should not count these instances of sideInputs as the 
> PCollections are not actually accessed. They are only accessed when the 
> CreatePCollectionView Transform is processed.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (BEAM-7989) SparkRunner CacheVisitor counts PCollections from SideInputs

2019-08-15 Thread Kyle Winkelman (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908661#comment-16908661
 ] 

Kyle Winkelman commented on BEAM-7989:
--

I first noticed this when I had a pipeline that I expected not to cache anything, 
but it had one random cached PCollection that was only used in one place (so there 
was no reason for it to be cached). I eventually noticed that the PCollection was 
used to create a PCollectionView, and that the PCollectionView was unwrapped in the 
TransformHierarchy.Node constructor via transform.getAdditionalInputs().
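To make that concrete, here is a small hypothetical pipeline (names made up) where the lookup PCollection is only ever read by the View.asList transform, yet is counted again as an input of the ParDo through its side input:

{code:java}
import java.util.List;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.GenerateSequence;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;

public class SideInputCacheCandidateExample {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create();

    // Used exactly once: to build the view. The ParDo below never reads it directly.
    PCollection<Long> lookup = p.apply("Lookup", GenerateSequence.from(0).to(10));
    final PCollectionView<List<Long>> view = lookup.apply("AsList", View.asList());

    p.apply("Main", GenerateSequence.from(0).to(100))
        .apply(
            "UseSideInput",
            ParDo.of(
                    new DoFn<Long, Long>() {
                      @ProcessElement
                      public void process(ProcessContext c) {
                        c.output(c.element() + c.sideInput(view).size());
                      }
                    })
                .withSideInputs(view));

    // Before the fix, CacheVisitor also counts `lookup` as an input of "UseSideInput"
    // (via getAdditionalInputs()), so its cache-candidate count goes above one and the
    // Spark runner caches it even though only CreatePCollectionView ever consumes it.
    p.run().waitUntilFinish();
  }
}
{code}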

> SparkRunner CacheVisitor counts PCollections from SideInputs
> 
>
> Key: BEAM-7989
> URL: https://issues.apache.org/jira/browse/BEAM-7989
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Affects Versions: 2.14.0
>Reporter: Kyle Winkelman
>Assignee: Kyle Winkelman
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The SparkRunner's CacheVisitor looks at all inputs for a 
> TransformHierarchy.Node. Those inputs include the PCollections from the 
> PCollectionViews that are supplied as sideInputs.
> The SparkRunner should not count these instances of sideInputs as the 
> PCollections are not actually accessed. They are only accessed when the 
> CreatePCollectionView Transform is processed.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7989) SparkRunner CacheVisitor counts PCollections from SideInputs

2019-08-15 Thread Kyle Winkelman (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Winkelman updated BEAM-7989:
-
Summary: SparkRunner CacheVisitor counts PCollections from SideInputs  
(was: Spark Runner Caches PCollections from SideInputs)

> SparkRunner CacheVisitor counts PCollections from SideInputs
> 
>
> Key: BEAM-7989
> URL: https://issues.apache.org/jira/browse/BEAM-7989
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Affects Versions: 2.14.0
>Reporter: Kyle Winkelman
>Assignee: Kyle Winkelman
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The SparkRunner's CacheVisitor looks at all inputs for a 
> TransformHierarchy.Node. Those inputs include the PCollections from the 
> PCollectionViews that are supplied as sideInputs.
> The SparkRunner should not count these instances of sideInputs as the 
> PCollections are not actually accessed. They are only accessed when the 
> CreatePCollectionView Transform is processed.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7987) WindowingWindmillReader should not start to read from an empty windmill workitem

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7987?focusedWorklogId=295968&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295968
 ]

ASF GitHub Bot logged work on BEAM-7987:


Author: ASF GitHub Bot
Created on: 16/Aug/19 02:36
Start Date: 16/Aug/19 02:36
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #9336: [BEAM-7987] Drop 
empty Windmill workitem in WindowingWindmillReader
URL: https://github.com/apache/beam/pull/9336#issuecomment-521863262
 
 
   Run Java PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 295968)
Time Spent: 50m  (was: 40m)

> WindowingWindmillReader should not start to read from an empty windmill 
> workitem
> 
>
> Key: BEAM-7987
> URL: https://issues.apache.org/jira/browse/BEAM-7987
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> WindowingWindmillReader should expect that a windmill workitem has either 
> timers or elements.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7987) WindowingWindmillReader should not start to read from an empty windmill workitem

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7987?focusedWorklogId=295966&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295966
 ]

ASF GitHub Bot logged work on BEAM-7987:


Author: ASF GitHub Bot
Created on: 16/Aug/19 02:36
Start Date: 16/Aug/19 02:36
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #9336: [BEAM-7987] Drop 
empty Windmill workitem in WindowingWindmillReader
URL: https://github.com/apache/beam/pull/9336#issuecomment-521863224
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 295966)
Time Spent: 40m  (was: 0.5h)

> WindowingWindmillReader should not start to read from an empty windmill 
> workitem
> 
>
> Key: BEAM-7987
> URL: https://issues.apache.org/jira/browse/BEAM-7987
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> WindowingWindmillReader should expect that a windmill workitem has either 
> timers or elements.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7987) WindowingWindmillReader should not start to read from an empty windmill workitem

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7987?focusedWorklogId=295963&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295963
 ]

ASF GitHub Bot logged work on BEAM-7987:


Author: ASF GitHub Bot
Created on: 16/Aug/19 02:35
Start Date: 16/Aug/19 02:35
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #9336: [BEAM-7987] 
Drop empty Windmill workitem in WindowingWindmillReader
URL: https://github.com/apache/beam/pull/9336#discussion_r314565268
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindowingWindmillReader.java
 ##
 @@ -116,31 +117,54 @@
 final WorkItem workItem = context.getWork();
 KeyedWorkItem keyedWorkItem =
 new WindmillKeyedWorkItem<>(key, workItem, windowCoder, windowsCoder, 
valueCoder);
+final boolean isEmptyWorkItem =
+(Iterables.isEmpty(keyedWorkItem.timersIterable())
+&& Iterables.isEmpty(keyedWorkItem.elementsIterable()));
 final WindowedValue> value = new 
ValueInEmptyWindows<>(keyedWorkItem);
 
-return new NativeReaderIterator>>() {
-  private WindowedValue> current;
-
-  @Override
-  public boolean start() throws IOException {
-current = value;
-return true;
-  }
+// Return a non-op iterator when current workitem is an empty workitem.
 
 Review comment:
   ```suggestion
   // Return a noop iterator when current workitem is an empty workitem.
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 295963)
Time Spent: 0.5h  (was: 20m)

> WindowingWindmillReader should not start to read from an empty windmill 
> workitem
> 
>
> Key: BEAM-7987
> URL: https://issues.apache.org/jira/browse/BEAM-7987
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> WindowingWindmillReader should expect that a windmill workitem has either 
> timers or elements.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7989) Spark Runner Caches PCollections from SideInputs

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7989?focusedWorklogId=295948&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295948
 ]

ASF GitHub Bot logged work on BEAM-7989:


Author: ASF GitHub Bot
Created on: 16/Aug/19 02:16
Start Date: 16/Aug/19 02:16
Worklog Time Spent: 10m 
  Work Description: kyle-winkelman commented on pull request #9357: 
[BEAM-7989] Remove side inputs from CacheVisitor calculation.
URL: https://github.com/apache/beam/pull/9357
 
 
   The sideInput PCollectionViews of a DoFn cause SparkRunner.CacheVisitor to 
increment the cacheCandidate count for the PCollection that underlies the 
PCollectionView. The SparkRunner does not access those underlying PCollections 
when executing a DoFn; they are only accessed once, in the 
View.CreatePCollectionView transform.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)

[jira] [Updated] (BEAM-7989) Spark Runner Caches PCollections from SideInputs

2019-08-15 Thread Kyle Winkelman (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Winkelman updated BEAM-7989:
-
Status: Open  (was: Triage Needed)

> Spark Runner Caches PCollections from SideInputs
> 
>
> Key: BEAM-7989
> URL: https://issues.apache.org/jira/browse/BEAM-7989
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Affects Versions: 2.14.0
>Reporter: Kyle Winkelman
>Assignee: Kyle Winkelman
>Priority: Major
>
> The SparkRunner's CacheVisitor looks at all inputs for a 
> TransformHierarchy.Node. Those inputs include the PCollections from the 
> PCollectionViews that are supplied as sideInputs.
> The SparkRunner should not count these instances of sideInputs as the 
> PCollections are not actually accessed. They are only accessed when the 
> CreatePCollectionView Transform is processed.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Assigned] (BEAM-7989) Spark Runner Caches PCollections from SideInputs

2019-08-15 Thread Kyle Winkelman (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Winkelman reassigned BEAM-7989:


Assignee: Kyle Winkelman

> Spark Runner Caches PCollections from SideInputs
> 
>
> Key: BEAM-7989
> URL: https://issues.apache.org/jira/browse/BEAM-7989
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Affects Versions: 2.14.0
>Reporter: Kyle Winkelman
>Assignee: Kyle Winkelman
>Priority: Major
>
> The SparkRunner's CacheVisitor looks at all inputs for a 
> TransformHierarchy.Node. Those inputs include the PCollections from the 
> PCollectionViews that are supplied as sideInputs.
> The SparkRunner should not count these instances of sideInputs as the 
> PCollections are not actually accessed. They are only accessed when the 
> CreatePCollectionView Transform is processed.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7987) WindowingWindmillReader should not start to read from an empty windmill workitem

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7987?focusedWorklogId=295943&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295943
 ]

ASF GitHub Bot logged work on BEAM-7987:


Author: ASF GitHub Bot
Created on: 16/Aug/19 01:58
Start Date: 16/Aug/19 01:58
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on issue #9336: [BEAM-7987] Drop 
empty Windmill workitem in WindowingWindmillReader
URL: https://github.com/apache/beam/pull/9336#issuecomment-521856845
 
 
   Run Dataflow PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 295943)
Time Spent: 20m  (was: 10m)

> WindowingWindmillReader should not start to read from an empty windmill 
> workitem
> 
>
> Key: BEAM-7987
> URL: https://issues.apache.org/jira/browse/BEAM-7987
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> WindowingWindmillReader should expect that a windmill workitem has either 
> timers or elements.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (BEAM-7989) Spark Runner Caches PCollections from SideInputs

2019-08-15 Thread Kyle Winkelman (JIRA)
Kyle Winkelman created BEAM-7989:


 Summary: Spark Runner Caches PCollections from SideInputs
 Key: BEAM-7989
 URL: https://issues.apache.org/jira/browse/BEAM-7989
 Project: Beam
  Issue Type: Bug
  Components: runner-spark
Affects Versions: 2.14.0
Reporter: Kyle Winkelman


The SparkRunner's CacheVisitor looks at all inputs for a 
TransformHierarchy.Node. Those inputs include the PCollections from the 
PCollectionViews that are supplied as sideInputs.

The SparkRunner should not count these instances of sideInputs as the 
PCollections are not actually accessed. They are only accessed when the 
CreatePCollectionView Transform is processed.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7987) WindowingWindmillReader should not start to read from an empty windmill workitem

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7987?focusedWorklogId=295934&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295934
 ]

ASF GitHub Bot logged work on BEAM-7987:


Author: ASF GitHub Bot
Created on: 16/Aug/19 01:52
Start Date: 16/Aug/19 01:52
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on issue #9336: [BEAM-7987] Drop 
empty Windmill workitem in WindowingWindmillReader
URL: https://github.com/apache/beam/pull/9336#issuecomment-521855824
 
 
   Run Dataflow PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 295934)
Time Spent: 10m
Remaining Estimate: 0h

> WindowingWindmillReader should not start to read from an empty windmill 
> workitem
> 
>
> Key: BEAM-7987
> URL: https://issues.apache.org/jira/browse/BEAM-7987
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> WindowingWindmillReader should expect that a windmill workitem has either 
> timers or elements.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7060?focusedWorklogId=295917&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295917
 ]

ASF GitHub Bot logged work on BEAM-7060:


Author: ASF GitHub Bot
Created on: 16/Aug/19 01:13
Start Date: 16/Aug/19 01:13
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #9283: [BEAM-7060] Type hints 
from Python 3 annotations
URL: https://github.com/apache/beam/pull/9283#issuecomment-521849441
 
 
   run python 2 postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 295917)
Time Spent: 10.5h  (was: 10h 20m)

> Design Py3-compatible typehints annotation support in Beam 3.
> -
>
> Key: BEAM-7060
> URL: https://issues.apache.org/jira/browse/BEAM-7060
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>
> The existing [Typehints implementation in 
> Beam|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/]
>  heavily relies on internal details of the CPython implementation, and some of 
> the assumptions of this implementation broke as of Python 3.6 (see, for 
> example, https://issues.apache.org/jira/browse/BEAM-6877), which makes 
> typehints support unusable on Python 3.6 as of now. The [Python 3 Kanban 
> Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245&view=detail]
>  lists several specific typehints-related breakages, prefixed with "TypeHints 
> Py3 Error".
> We need to decide whether to:
> - Deprecate the in-house typehints implementation.
> - Continue to support the in-house implementation, which at this point is stale 
> code and has other known issues.
> - Attempt to use some off-the-shelf libraries for supporting 
> type annotations, like Pytype, Mypy, or PyAnnotate.
> With respect to this decision, we also need to plan immediate next steps to unblock 
> adoption of Beam for Python 3.6+ users. One potential option may be to have the 
> Beam SDK ignore any typehint annotations on Py 3.6+.
> cc: [~udim], [~altay], [~robertwb].



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7060?focusedWorklogId=295918&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295918
 ]

ASF GitHub Bot logged work on BEAM-7060:


Author: ASF GitHub Bot
Created on: 16/Aug/19 01:13
Start Date: 16/Aug/19 01:13
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #9283: [BEAM-7060] Type hints 
from Python 3 annotations
URL: https://github.com/apache/beam/pull/9283#issuecomment-521849465
 
 
   run python 3.7 postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 295918)
Time Spent: 10h 40m  (was: 10.5h)

> Design Py3-compatible typehints annotation support in Beam 3.
> -
>
> Key: BEAM-7060
> URL: https://issues.apache.org/jira/browse/BEAM-7060
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 10h 40m
>  Remaining Estimate: 0h
>
> The existing [Typehints implementation in 
> Beam|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/]
>  heavily relies on internal details of the CPython implementation, and some of 
> the assumptions of this implementation broke as of Python 3.6 (see, for 
> example, https://issues.apache.org/jira/browse/BEAM-6877), which makes 
> typehints support unusable on Python 3.6 as of now. The [Python 3 Kanban 
> Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245&view=detail]
>  lists several specific typehints-related breakages, prefixed with "TypeHints 
> Py3 Error".
> We need to decide whether to:
> - Deprecate the in-house typehints implementation.
> - Continue to support the in-house implementation, which at this point is stale 
> code and has other known issues.
> - Attempt to use some off-the-shelf libraries for supporting 
> type annotations, like Pytype, Mypy, or PyAnnotate.
> With respect to this decision, we also need to plan immediate next steps to unblock 
> adoption of Beam for Python 3.6+ users. One potential option may be to have the 
> Beam SDK ignore any typehint annotations on Py 3.6+.
> cc: [~udim], [~altay], [~robertwb].



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7495) Add support for dynamic worker re-balancing when reading BigQuery data using Cloud Dataflow

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7495?focusedWorklogId=295915&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295915
 ]

ASF GitHub Bot logged work on BEAM-7495:


Author: ASF GitHub Bot
Created on: 16/Aug/19 01:12
Start Date: 16/Aug/19 01:12
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #9083: [BEAM-7495] 
Improve the test that compares EXPORT and DIRECT_READ
URL: https://github.com/apache/beam/pull/9083#issuecomment-521849286
 
 
   Run Java PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 295915)
Time Spent: 13h  (was: 12h 50m)
Remaining Estimate: 491h  (was: 491h 10m)

> Add support for dynamic worker re-balancing when reading BigQuery data using 
> Cloud Dataflow
> ---
>
> Key: BEAM-7495
> URL: https://issues.apache.org/jira/browse/BEAM-7495
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Aryan Naraghi
>Assignee: Aryan Naraghi
>Priority: Major
>   Original Estimate: 504h
>  Time Spent: 13h
>  Remaining Estimate: 491h
>
> Currently, the BigQuery connector for reading data using the BigQuery Storage 
> API does not implement any of the facilities on the source that Dataflow uses to 
> split streams.
>  
> On the server side, the BigQuery Storage API supports splitting streams at a 
> fraction. By adding support to the connector, we enable Dataflow to split 
> streams, which unlocks dynamic worker re-balancing.
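A rough illustration of the contract involved, not the actual connector code: the StreamSlice class below is made up, and a real implementation would ask the BigQuery Storage API to split the stream server-side rather than doing local arithmetic. It only shows the bookkeeping a reader needs so that a splitAtFraction-style request from Dataflow can hand off the remainder of a stream:

{code:java}
/**
 * Hypothetical sketch of split-at-a-fraction bookkeeping. Dataflow asks the reader to
 * split at a fraction of its range; the reader keeps [start, splitPoint) and hands back
 * a residual covering [splitPoint, end).
 */
public class StreamSlice {
  private final double start; // fraction of the stream where this slice begins
  private double end;         // exclusive upper bound of this slice
  private double position;    // current read position, as a fraction of the whole stream

  public StreamSlice(double start, double end) {
    this.start = start;
    this.end = end;
    this.position = start;
  }

  /** Analogous to BoundedReader#getFractionConsumed: progress within this slice. */
  public double getFractionConsumed() {
    return (position - start) / (end - start);
  }

  /** Analogous to BoundedReader#splitAtFraction: returns the residual, or null if too late. */
  public StreamSlice splitAtFraction(double fraction) {
    double splitPoint = start + fraction * (end - start);
    if (splitPoint <= position || splitPoint >= end) {
      return null; // cannot split behind the current position or at/after the end
    }
    StreamSlice residual = new StreamSlice(splitPoint, end);
    this.end = splitPoint; // the primary keeps only the prefix
    return residual;
  }

  /** Advance the reader; returns false once the (possibly shrunken) end is reached. */
  public boolean advanceTo(double newPosition) {
    if (newPosition >= end) {
      return false;
    }
    position = newPosition;
    return true;
  }
}
{code}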



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7986) Increase minimum grpcio required version

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7986?focusedWorklogId=295914&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295914
 ]

ASF GitHub Bot logged work on BEAM-7986:


Author: ASF GitHub Bot
Created on: 16/Aug/19 01:12
Start Date: 16/Aug/19 01:12
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #9356: [BEAM-7986] Upgrade 
grpcio
URL: https://github.com/apache/beam/pull/9356#issuecomment-521849218
 
 
   R: @aaltay 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 295914)
Time Spent: 20m  (was: 10m)

> Increase minimum grpcio required version
> 
>
> Key: BEAM-7986
> URL: https://issues.apache.org/jira/browse/BEAM-7986
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> According to this question, 1.11.0 is not new enough (1.22.0 reportedly 
> works), and we list the minimum as 1.8.
> https://stackoverflow.com/questions/57479498/beam-channel-object-has-no-attribute-close?noredirect=1#comment101446049_57479498
> Affects DirectRunner Pub/Sub client.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7986) Increase minimum grpcio required version

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7986?focusedWorklogId=295913&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295913
 ]

ASF GitHub Bot logged work on BEAM-7986:


Author: ASF GitHub Bot
Created on: 16/Aug/19 01:11
Start Date: 16/Aug/19 01:11
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9356: [BEAM-7986] 
Upgrade grpcio
URL: https://github.com/apache/beam/pull/9356
 
 
   #8708 introduced a call to close the gRPC channel, an API that was only added in 
grpcio 1.12.
   
   Tested manually using (once with grpcio 1.11 and once with 1.12.1):
   ```
   python setup.py nosetests --tests 
apache_beam.io.gcp.pubsub_integration_test:PubSubIntegrationTest.test_streaming_data_only
 --test-pipeline-options='--runner=TestDirectRunner --project=foo'
   ```
   
   Remove comment about protobuf version since grpcio no longer depends on
   it.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)

[jira] [Commented] (BEAM-7924) Failure in Python 2 postcommit: crossLanguagePythonJavaFlink

2019-08-15 Thread Kyle Weaver (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908583#comment-16908583
 ] 

Kyle Weaver commented on BEAM-7924:
---

I just ran into a similar issue. Do either of you know why setting 
save_main_session causes an error? Is there a different workaround so we can 
enable this setting in Flink without causing an error?

> Failure in Python 2 postcommit: crossLanguagePythonJavaFlink
> 
>
> Key: BEAM-7924
> URL: https://issues.apache.org/jira/browse/BEAM-7924
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Udi Meiri
>Assignee: Heejong Lee
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> This seems to be the root cause:
> {code}
> 11:32:59 [grpc-default-executor-1] WARN pipeline_options.get_all_options - 
> Discarding unparseable args: [u'--app_name=None', 
> u'--shutdown_sources_on_final_watermark', u'--flink_master=[auto]', 
> u'--direct_runner_use_stacked_bundle', u'--options_id=1', 
> u'--fail_on_checkpointing_errors', u'--enable_metrics', 
> u'--pipeline_type_check', u'--parallelism=2'] 
> 11:32:59 [grpc-default-executor-1] INFO sdk_worker_main.main - Python sdk 
> harness started with pipeline_options: {'runner': u'None', 'experiments': 
> [u'worker_threads=100', u'beam_fn_api'], 'environment_cache_millis': 
> u'1', 'sdk_location': u'container', 'job_name': 
> u'BeamApp-root-0807183253-57a72c22', 'save_main_session': True, 'region': 
> u'us-central1', 'sdk_worker_parallelism': u'1'}
> 11:32:59 [grpc-default-executor-1] ERROR sdk_worker_main.main - Python sdk 
> harness failed: 
> 11:32:59 Traceback (most recent call last):
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
>  line 153, in main
> 11:32:59 sdk_pipeline_options.view_as(pipeline_options.ProfilingOptions))
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/options/pipeline_options.py",
>  line 334, in __getattr__
> 11:32:59 (type(self).__name__, name))
> 11:32:59 AttributeError: 'PipelineOptions' object has no attribute 
> 'ProfilingOptions' 
> {code}
> https://builds.apache.org/job/beam_PostCommit_Python2_PR/58/console



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7988) Error should include runner name when runner is invalid

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7988?focusedWorklogId=295897=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295897
 ]

ASF GitHub Bot logged work on BEAM-7988:


Author: ASF GitHub Bot
Created on: 16/Aug/19 00:27
Start Date: 16/Aug/19 00:27
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #9355: [BEAM-7988] py: 
include runner name when runner is invalid
URL: https://github.com/apache/beam/pull/9355
 
 
   Just a little added information to make debugging a tiny bit easier.
   
   R: @angoenka 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   

[jira] [Created] (BEAM-7988) Error should include runner name when runner is invalid

2019-08-15 Thread Kyle Weaver (JIRA)
Kyle Weaver created BEAM-7988:
-

 Summary: Error should include runner name when runner is invalid
 Key: BEAM-7988
 URL: https://issues.apache.org/jira/browse/BEAM-7988
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Reporter: Kyle Weaver
Assignee: Kyle Weaver


"TypeError: Runner must be a PipelineRunner object or the name of a registered 
runner." would be a more helpful error message if it included the runner that 
was found to be incorrect.
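
As a hypothetical sketch of the kind of change being proposed (names and wording are made up for illustration, not the actual Beam code):

{code:python}
# Hypothetical sketch: include the offending value in the error message so the
# misconfigured runner is obvious at a glance.
class PipelineRunner(object):
  pass


def validate_runner(runner):
  if not isinstance(runner, (PipelineRunner, str)):
    raise TypeError(
        'Runner %r must be a PipelineRunner object or the name of a '
        'registered runner.' % (runner,))


validate_runner(42)
# TypeError: Runner 42 must be a PipelineRunner object or the name of a
# registered runner.
{code}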



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (BEAM-7987) WindowingWindmillReader should not start to read from an empty windmill workitem

2019-08-15 Thread Boyuan Zhang (JIRA)
Boyuan Zhang created BEAM-7987:
--

 Summary: WindowingWindmillReader should not start to read from an 
empty windmill workitem
 Key: BEAM-7987
 URL: https://issues.apache.org/jira/browse/BEAM-7987
 Project: Beam
  Issue Type: Bug
  Components: runner-dataflow
Reporter: Boyuan Zhang
Assignee: Boyuan Zhang


WindowingWindmillReader should expect that a windmill workitem has either 
timers or elements.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (BEAM-7986) Increase minimum grpcio required version

2019-08-15 Thread Udi Meiri (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908569#comment-16908569
 ] 

Udi Meiri edited comment on BEAM-7986 at 8/16/19 12:19 AM:
---

Added in 1.12: https://github.com/grpc/grpc/releases/tag/v1.12.0


was (Author: udim):
Added in 1.12: 
https://github.com/grpc/grpc/pull/15254/commits/bccd32dafa1ed60e745c958a55960cf75c56d7d2

> Increase minimum grpcio required version
> 
>
> Key: BEAM-7986
> URL: https://issues.apache.org/jira/browse/BEAM-7986
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>
> According to this question, 1.11.0 is not new enough (1.22.0 reportedly 
> works), and we list the minimum as 1.8.
> https://stackoverflow.com/questions/57479498/beam-channel-object-has-no-attribute-close?noredirect=1#comment101446049_57479498
> Affects DirectRunner Pub/Sub client.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (BEAM-7986) Increase minimum grpcio required version

2019-08-15 Thread Udi Meiri (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908565#comment-16908565
 ] 

Udi Meiri commented on BEAM-7986:
-

Will upgrade dep to the oldest version that has the abstract method 
Channel.close().
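
For anyone hitting the linked StackOverflow issue, a quick local check (a sketch, assuming grpcio is installed) of whether the installed grpcio already provides Channel.close():

{code:python}
# Sketch: Channel.close() was added to grpcio in 1.12; older versions raise
# AttributeError here, which is what breaks the DirectRunner Pub/Sub client.
import grpc

channel = grpc.insecure_channel('localhost:12345')  # no server needed for this check
try:
  channel.close()
  print('grpcio provides Channel.close()')
except AttributeError:
  print('grpcio is too old (< 1.12): Channel.close() is missing')
{code}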

> Increase minimum grpcio required version
> 
>
> Key: BEAM-7986
> URL: https://issues.apache.org/jira/browse/BEAM-7986
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>
> According to this question, 1.11.0 is not new enough (1.22.0 reportedly 
> works), and we list the minimum as 1.8.
> https://stackoverflow.com/questions/57479498/beam-channel-object-has-no-attribute-close?noredirect=1#comment101446049_57479498
> Affects DirectRunner Pub/Sub client.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Assigned] (BEAM-7986) Increase minimum grpcio required version

2019-08-15 Thread Udi Meiri (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udi Meiri reassigned BEAM-7986:
---

Assignee: Udi Meiri

> Increase minimum grpcio required version
> 
>
> Key: BEAM-7986
> URL: https://issues.apache.org/jira/browse/BEAM-7986
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>
> According to this question, 1.11.0 is not new enough (1.22.0 reportedly 
> works), and we list the minimum as 1.8.
> https://stackoverflow.com/questions/57479498/beam-channel-object-has-no-attribute-close?noredirect=1#comment101446049_57479498
> Affects DirectRunner Pub/Sub client.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7909) Write integration tests to test customized containers

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7909?focusedWorklogId=295877=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295877
 ]

ASF GitHub Bot logged work on BEAM-7909:


Author: ASF GitHub Bot
Created on: 15/Aug/19 23:25
Start Date: 15/Aug/19 23:25
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on issue #9351: [BEAM-7909] 
support customized container for Python
URL: https://github.com/apache/beam/pull/9351#issuecomment-521831467
 
 
   R: @aaltay 
   Hi Ahmet, I created a PR. Could you review it and make sure I am on the 
right track?
   Thanks,
   Hannah
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 295877)
Time Spent: 1h 40m  (was: 1.5h)

> Write integration tests to test customized containers
> -
>
> Key: BEAM-7909
> URL: https://issues.apache.org/jira/browse/BEAM-7909
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=295869=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295869
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 15/Aug/19 23:21
Start Date: 15/Aug/19 23:21
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9261: [BEAM-7389] Add code 
examples for Partition page
URL: https://github.com/apache/beam/pull/9261#issuecomment-521830573
 
 
   LGTM. Please ping me once you get approval from @rosetn .
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 295869)
Time Spent: 48h 40m  (was: 48.5h)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 48h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work started] (BEAM-7909) Write integration tests to test customized containers

2019-08-15 Thread Hannah Jiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-7909 started by Hannah Jiang.
--
> Write integration tests to test customized containers
> -
>
> Key: BEAM-7909
> URL: https://issues.apache.org/jira/browse/BEAM-7909
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7909) Write integration tests to test customized containers

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7909?focusedWorklogId=295857=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295857
 ]

ASF GitHub Bot logged work on BEAM-7909:


Author: ASF GitHub Bot
Created on: 15/Aug/19 23:07
Start Date: 15/Aug/19 23:07
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on issue #9351: [WIP][BEAM-7909] 
support customized container for Python
URL: https://github.com/apache/beam/pull/9351#issuecomment-521794846
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 295857)
Time Spent: 1h 20m  (was: 1h 10m)

> Write integration tests to test customized containers
> -
>
> Key: BEAM-7909
> URL: https://issues.apache.org/jira/browse/BEAM-7909
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7909) Write integration tests to test customized containers

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7909?focusedWorklogId=295858=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295858
 ]

ASF GitHub Bot logged work on BEAM-7909:


Author: ASF GitHub Bot
Created on: 15/Aug/19 23:07
Start Date: 15/Aug/19 23:07
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on issue #9351: [WIP][BEAM-7909] 
support customized container for Python
URL: https://github.com/apache/beam/pull/9351#issuecomment-521822580
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 295858)
Time Spent: 1.5h  (was: 1h 20m)

> Write integration tests to test customized containers
> -
>
> Key: BEAM-7909
> URL: https://issues.apache.org/jira/browse/BEAM-7909
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=295851=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295851
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 15/Aug/19 22:59
Start Date: 15/Aug/19 22:59
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on pull request #9261: 
[BEAM-7389] Add code examples for Partition page
URL: https://github.com/apache/beam/pull/9261#discussion_r314530915
 
 

 ##
 File path: 
website/src/documentation/transforms/python/element-wise/partition.md
 ##
 @@ -39,12 +46,133 @@ You cannot determine the number of partitions in 
mid-pipeline
 See more information in the [Beam Programming Guide]({{ site.baseurl 
}}/documentation/programming-guide/#partition).
 
 ## Examples
-See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates. 
 
-## Related transforms 
-* [Filter]({{ site.baseurl 
}}/documentation/transforms/python/elementwise/filter) is useful if the 
function is just 
+In the following examples, we create a pipeline with a `PCollection` of 
produce with their icon, name, and duration.
+Then, we apply `Partition` in multiple ways to split the `PCollection` into 
multiple `PCollections`.
+
+`Partition` accepts a function that receives the number of partitions,
+and returns the index of the desired partition for the element.
+The number of partitions passed must be a positive integer,
+and it must return an integer in the range `0` to `num_partitions-1`.
+
+### Example 1: Partition with a function
+
+In the following example, we have a known list of durations.
+We partition the `PCollection` into one `PCollection` for every duration type.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py
 tag:partition_function %}```
+
+Output `PCollection`s:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition_test.py
 tag:partitions %}```
+
+
+  
+https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py;>
+  https://www.tensorflow.org/images/GitHub-Mark-32px.png;
+width="20px" height="20px" alt="View on GitHub" />
+  View on GitHub
+
+  
+
+
+
+### Example 2: Partition with a lambda function
+
+We can also use lambda functions to simplify **Example 1**.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py
 tag:partition_lambda %}```
+
+Output `PCollection`s:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition_test.py
 tag:partitions %}```
+
+
+  
+https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py;>
+  https://www.tensorflow.org/images/GitHub-Mark-32px.png;
+width="20px" height="20px" alt="View on GitHub" />
+  View on GitHub
+
+  
+
+
+
+### Example 3: Partition with multiple arguments
+
+You can pass functions with multiple arguments to `Partition`.
+They are passed as additional positional arguments or keyword arguments to the 
function.
+
+In machine learning, it is a common task to split data into
+[training and a testing 
datasets](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets).
+Typically, 80% of the data is used for training a model and 20% is used for 
testing.
+
+In this example, we split a `PCollection` dataset into training and testing 
datasets.
+We define `split_dataset`, which takes the `plant` element, `num_partitions`,
+and an additional argument `ratio`.
+The `ratio` is a list of numbers which represents the ratio how many items 
will go into each partition.
 
 Review comment:
   Done!
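
As an aside for readers of this thread, a small self-contained sketch of the Partition contract the quoted page describes (element values are made up for illustration; this is not the website snippet itself):

{code:python}
import apache_beam as beam

durations = ['annual', 'biennial', 'perennial']


def by_duration(plant, num_partitions):
  # Partition passes num_partitions; we return an index in [0, num_partitions).
  return durations.index(plant['duration'])


with beam.Pipeline() as pipeline:
  annuals, biennials, perennials = (
      pipeline
      | beam.Create([
          {'name': 'Tomato', 'duration': 'annual'},
          {'name': 'Carrot', 'duration': 'biennial'},
          {'name': 'Strawberry', 'duration': 'perennial'},
      ])
      | beam.Partition(by_duration, len(durations)))
  perennials | 'Print perennials' >> beam.Map(print)
{code}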
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 295851)
Time Spent: 48.5h  (was: 48h 20m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 48.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-3342) Create a Cloud Bigtable IO connector for Python

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3342?focusedWorklogId=295847=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295847
 ]

ASF GitHub Bot logged work on BEAM-3342:


Author: ASF GitHub Bot
Created on: 15/Aug/19 22:55
Start Date: 15/Aug/19 22:55
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #8457: [BEAM-3342] Create a 
Cloud Bigtable IO connector for Python
URL: https://github.com/apache/beam/pull/8457#issuecomment-521825485
 
 
   Hope I'm not being annoying - I'm just passing pointers to where you can 
find the errors : )
   
   See:
   There are some py3 compatibility checks:
   
https://scans.gradle.com/s/nzwat46hx6qh6/console-log?task=:sdks:python:test-suites:tox:py2:lintPy27_3#L30
   
   And a couple other lint issues:
   
https://scans.gradle.com/s/nzwat46hx6qh6/console-log?task=:sdks:python:test-suites:tox:py2:lintPy27#L51
   
   The tests finally arrived at the point of running the bigtableio test, and 
it's failing with this info:
   
https://scans.gradle.com/s/nzwat46hx6qh6/console-log?task=:sdks:python:test-suites:tox:py35:testPy35Gcp#L7074
   
   We're getting pretty close.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 295847)
Time Spent: 36h 20m  (was: 36h 10m)

> Create a Cloud Bigtable IO connector for Python
> ---
>
> Key: BEAM-3342
> URL: https://issues.apache.org/jira/browse/BEAM-3342
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Solomon Duskis
>Assignee: Solomon Duskis
>Priority: Major
>  Time Spent: 36h 20m
>  Remaining Estimate: 0h
>
> I would like to create a Cloud Bigtable python connector.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7909) Write integration tests to test customized containers

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7909?focusedWorklogId=295811=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295811
 ]

ASF GitHub Bot logged work on BEAM-7909:


Author: ASF GitHub Bot
Created on: 15/Aug/19 22:41
Start Date: 15/Aug/19 22:41
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on issue #9351: [WIP][BEAM-7909] 
support customized container for Python
URL: https://github.com/apache/beam/pull/9351#issuecomment-521822580
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 295811)
Time Spent: 1h 10m  (was: 1h)

> Write integration tests to test customized containers
> -
>
> Key: BEAM-7909
> URL: https://issues.apache.org/jira/browse/BEAM-7909
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7528) Save correctly Python Load Tests metrics according to it's namespace

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7528?focusedWorklogId=295809=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295809
 ]

ASF GitHub Bot logged work on BEAM-7528:


Author: ASF GitHub Bot
Created on: 15/Aug/19 22:29
Start Date: 15/Aug/19 22:29
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #8941: [BEAM-7528] 
Save load test metrics according to distribution name
URL: https://github.com/apache/beam/pull/8941#discussion_r314523476
 
 

 ##
 File path: 
sdks/python/apache_beam/testing/load_tests/load_test_metrics_utils.py
 ##
 @@ -75,17 +73,114 @@
 ]
 
 
-def get_element_by_schema(schema_name, insert_list):
-  for element in insert_list:
-if element['label'] == schema_name:
-  return element['value']
+def parse_step(step_name):
+  """Replaces white spaces and removes 'Step:' label
+
+  Args:
+step_name(str): step name passed in metric ParDo
+
+  Returns:
+lower case step name without namespace and step label
+  """
+  return step_name.lower().replace(' ', '_').strip('step:_')
+
+
+def split_metrics_by_namespace_and_name(metrics, namespace, name):
+  """Splits metrics list namespace and name.
+
+  Args:
+metrics: list of metrics from pipeline result
+namespace(str): filter metrics by namespace
+name(str): filter metrics by name
+
+  Returns:
+two lists - one of metrics which are matching filters
+and second of not matching
+  """
+  matching_metrics = []
+  not_matching_metrics = []
+  for dist in metrics:
+if dist.key.metric.namespace == namespace\
+and dist.key.metric.name == name:
+  matching_metrics.append(dist)
+else:
+  not_matching_metrics.append(dist)
+  return matching_metrics, not_matching_metrics
+
+
+def get_generic_distributions(generic_dists, metric_id):
+  """Creates flatten list of distributions per its value type.
+  A generic distribution is the one which is not processed but saved in
+  the most raw version.
+
+  Args:
+generic_dists: list of distributions to be saved
+metric_id(uuid): id of the current test run
+
+  Returns:
+list of dictionaries made from :class:`DistributionMetric`
+  """
+  return sum(
+  (get_all_distributions_by_type(dist, metric_id)
+   for dist in generic_dists),
+  []
+  )
+
+
+def get_all_distributions_by_type(dist, metric_id):
+  """Creates new list of objects with type of each distribution
+  metric value.
+
+  Args:
+dist(object): DistributionMetric object to be parsed
+metric_id(uuid): id of the current test run
+  Returns:
+list of :class:`DistributionMetric` objects
+  """
+  submit_timestamp = time.time()
+  dist_types = ['mean', 'max', 'min', 'sum']
+  return [
+  get_distribution_dict(dist_type, submit_timestamp,
+dist, metric_id)
+  for dist_type in dist_types
+  ]
+
+
+def get_distribution_dict(metric_type, submit_timestamp, dist, metric_id):
+  """Function creates :class:`DistributionMetric`
+
+  Args:
+metric_type(str): type of value from distribution metric which will
+  be saved (ex. max, min, mean, sum)
+submit_timestamp: timestamp when metric is saved
+dist(object) distribution object from pipeline result
+metric_id(uuid): id of the current test run
+
+  Returns:
+dictionary prepared for saving according to schema
+  """
+  return DistributionMetric(dist, submit_timestamp, metric_id,
 
 Review comment:
   We create a generic distribution here, right? Should we tag them as generic 
somehow? Maybe it's just the fact that it has a different name than the 
distributions we do care about. Just wondering.
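
Tangentially, for readers skimming the diff: a quick illustration of what the new `parse_step` helper returns (implementation copied from the diff above for illustration only):

{code:python}
def parse_step(step_name):
  # Copied from the diff above: lowercase, replace spaces with underscores and
  # strip the 'step:' label characters from both ends.
  return step_name.lower().replace(' ', '_').strip('step:_')


print(parse_step('Step: GroupByKey 0'))  # -> 'groupbykey_0'
{code}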
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 295809)
Time Spent: 7h 50m  (was: 7h 40m)

> Save correctly Python Load Tests metrics according to it's namespace
> 
>
> Key: BEAM-7528
> URL: https://issues.apache.org/jira/browse/BEAM-7528
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Kasia Kucharczyk
>Assignee: Kasia Kucharczyk
>Priority: Major
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> The load test framework considers every distribution metric defined in a 
> pipeline to be a `runtime` metric (which is defined by the load test 
> framework), while only the `runtime` distribution metric should be treated as 
> runtime.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=295808=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295808
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 15/Aug/19 22:14
Start Date: 15/Aug/19 22:14
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9261: [BEAM-7389] Add 
code examples for Partition page
URL: https://github.com/apache/beam/pull/9261#discussion_r314520053
 
 

 ##
 File path: 
website/src/documentation/transforms/python/element-wise/partition.md
 ##
 @@ -39,12 +46,133 @@ You cannot determine the number of partitions in 
mid-pipeline
 See more information in the [Beam Programming Guide]({{ site.baseurl 
}}/documentation/programming-guide/#partition).
 
 ## Examples
-See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates. 
 
-## Related transforms 
-* [Filter]({{ site.baseurl 
}}/documentation/transforms/python/elementwise/filter) is useful if the 
function is just 
+In the following examples, we create a pipeline with a `PCollection` of 
produce with their icon, name, and duration.
+Then, we apply `Partition` in multiple ways to split the `PCollection` into 
multiple `PCollections`.
+
+`Partition` accepts a function that receives the number of partitions,
+and returns the index of the desired partition for the element.
+The number of partitions passed must be a positive integer,
+and it must return an integer in the range `0` to `num_partitions-1`.
+
+### Example 1: Partition with a function
+
+In the following example, we have a known list of durations.
+We partition the `PCollection` into one `PCollection` for every duration type.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py
 tag:partition_function %}```
+
+Output `PCollection`s:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition_test.py
 tag:partitions %}```
+
+
+  
+https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py;>
+  https://www.tensorflow.org/images/GitHub-Mark-32px.png;
+width="20px" height="20px" alt="View on GitHub" />
+  View on GitHub
+
+  
+
+
+
+### Example 2: Partition with a lambda function
+
+We can also use lambda functions to simplify **Example 1**.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py
 tag:partition_lambda %}```
+
+Output `PCollection`s:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition_test.py
 tag:partitions %}```
+
+
+  
+https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py;>
+  https://www.tensorflow.org/images/GitHub-Mark-32px.png;
+width="20px" height="20px" alt="View on GitHub" />
+  View on GitHub
+
+  
+
+
+
+### Example 3: Partition with multiple arguments
+
+You can pass functions with multiple arguments to `Partition`.
+They are passed as additional positional arguments or keyword arguments to the 
function.
+
+In machine learning, it is a common task to split data into
+[training and a testing 
datasets](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets).
+Typically, 80% of the data is used for training a model and 20% is used for 
testing.
+
+In this example, we split a `PCollection` dataset into training and testing 
datasets.
+We define `split_dataset`, which takes the `plant` element, `num_partitions`,
+and an additional argument `ratio`.
+The `ratio` is a list of numbers which represents the ratio how many items 
will go into each partition.
 
 Review comment:
   "the ratio how"->"the ratio of how "
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 295808)
Time Spent: 48h 20m  (was: 48h 10m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 48h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA

[jira] [Updated] (BEAM-7953) Move LoadTests gradle tasks to test-suites gradle files

2019-08-15 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7953:
---
Status: Open  (was: Triage Needed)

> Move LoadTests gradle tasks to test-suites gradle files
> ---
>
> Key: BEAM-7953
> URL: https://issues.apache.org/jira/browse/BEAM-7953
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Kasia Kucharczyk
>Assignee: Kasia Kucharczyk
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7957) Warn at job submit time if a step is named with a / or empty in DataflowRunner

2019-08-15 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7957:
---
Status: Open  (was: Triage Needed)

> Warn at job submit time if a step is named with a / or empty in DataflowRunner
> --
>
> Key: BEAM-7957
> URL: https://issues.apache.org/jira/browse/BEAM-7957
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: David Yan
>Priority: Major
>
> When a job has an empty step name or a step name that contains a "/", it 
> quietly breaks the job graph in the Dataflow UI. We should at least warn the 
> user at job submit time.
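
A hypothetical sketch of the kind of submit-time check being asked for (the function name and message wording are made up; this is not the actual DataflowRunner code):

{code:python}
import logging


def warn_on_problematic_step_name(step_name):
  # Hypothetical helper; warn instead of silently producing a broken job graph.
  if not step_name or '/' in step_name:
    logging.warning(
        'Step name %r is empty or contains "/"; this can silently break the '
        'job graph rendering in the Dataflow UI.', step_name)


warn_on_problematic_step_name('Read/Parse')  # logs a warning
warn_on_problematic_step_name('ReadParse')   # no warning
{code}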



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7955) Dynamic Writer - combining computed shards' number for late events with window's

2019-08-15 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7955:
---
Status: Open  (was: Triage Needed)

> Dynamic Writer - combining computed shards' number for late events with 
> window's
> 
>
> Key: BEAM-7955
> URL: https://issues.apache.org/jira/browse/BEAM-7955
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Affects Versions: 2.10.0
>Reporter: Mariusz Rebandel
>Priority: Minor
>
> The runner attempts to combine the shard numbers computed for the window and 
> for the following panes with late events, even if the window's accumulation 
> mode is set to DISCARDING_FIRED_PANES. This results in an exception thrown by 
> SingletonCombineFn.
> Steps to recreate this behaviour:
>  - create dynamic writer with `withSharding()` option
>  - send stream of messages to Dataflow job via PubSub
>  - retain *some* messages
>  - let the rest of the messages flow to the job, until the watermark reaches 
> the window's end
>  - release retained messages
> If all PubSub traffic is halted and released after the window's end, Beam 
> won't try to merge them. This only happens if only a part of the messages 
> arrive as late events.
> Stacktrace:
> {code:java}
> java.lang.IllegalArgumentException: PCollection with more than one element 
> accessed as a singleton view. Consider using Combine.globally().asSingleton() 
> to combine the PCollection into a single value
> 
> org.apache.beam.sdk.transforms.View$SingletonCombineFn.apply(View.java:358)
> 
> org.apache.beam.sdk.transforms.Combine$BinaryCombineFn.addInput(Combine.java:448)
> 
> org.apache.beam.sdk.transforms.Combine$BinaryCombineFn.addInput(Combine.java:429)
> 
> org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillCombiningState.add(WindmillStateInternals.java:925)
> 
> org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SystemReduceFn.processValue(SystemReduceFn.java:115)
> 
> org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.ReduceFnRunner.processElement(ReduceFnRunner.java:608)
> 
> org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.ReduceFnRunner.processElements(ReduceFnRunner.java:349)
> 
> org.apache.beam.runners.dataflow.worker.StreamingGroupAlsoByWindowViaWindowSetFn.processElement(StreamingGroupAlsoByWindowViaWindowSetFn.java:94)
> 
> org.apache.beam.runners.dataflow.worker.StreamingGroupAlsoByWindowViaWindowSetFn.processElement(StreamingGroupAlsoByWindowViaWindowSetFn.java:42)
> 
> org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner.invokeProcessElement(GroupAlsoByWindowFnRunner.java:115)
> 
> org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner.processElement(GroupAlsoByWindowFnRunner.java:73)
> 
> org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.LateDataDroppingDoFnRunner.processElement(LateDataDroppingDoFnRunner.java:80)
> 
> org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowsParDoFn.processElement(GroupAlsoByWindowsParDoFn.java:134)
> 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation.process(ParDoOperation.java:44)
> 
> org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:49)
> 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:201)
> 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
> 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:76)
> 
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1233)
> 
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:144)
> 
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:972)
> 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> java.lang.Thread.run(Thread.java:745)
> {code}
> Sharding implementation:
> {code:java}
> class RecordCountSharding[T](recordsPerShard: Int) extends 
> PTransform[PCollection[T], PCollectionView[java.lang.Integer]] {
>   import RecordCountSharding._
>   override def expand(input: PCollection[T]): 
> PCollectionView[java.lang.Integer] = {
> val count = input.apply(
>   Combine.globally(Count.combineFn[T]()).withoutDefaults()
> 

[jira] [Updated] (BEAM-7914) Add python 3 test in crossLanguageValidateRunner task

2019-08-15 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7914:
---
Status: Open  (was: Triage Needed)

> Add python 3 test in crossLanguageValidateRunner task
> -
>
> Key: BEAM-7914
> URL: https://issues.apache.org/jira/browse/BEAM-7914
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Heejong Lee
>Assignee: Chamikara Jayalath
>Priority: Major
>  Labels: portability
>
> add python 3 test in crossLanguageValidateRunner task



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7853) PubsubIO and FixedWindows

2019-08-15 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7853:
---
Status: Open  (was: Triage Needed)

> PubsubIO and FixedWindows
> -
>
> Key: BEAM-7853
> URL: https://issues.apache.org/jira/browse/BEAM-7853
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Reporter: Gregory Parsons
>Priority: Major
>
> Hi all,
> I am having a potential issue with Windowing on cloud PubsubIO.
> I am finding that FixedWindows do not trigger on either DirectRunner or 
> DataflowRunner after running a GroupBy transform.
> A basic pipeline with my use case would look like:
>  
> {code:java}
> Pipeline p = Pipeline.create();
> PubsubIO.Read read = PubsubIO
> .readMessagesWithAttributes()
> .withTimestampAttribute("time")
> .fromTopic("test-topic");
> Window window = Window.into(
> FixedWindows.of(Duration.standardSeconds(10L))
> )
> .triggering(AfterWatermark.pastEndOfWindow())
> .withAllowedLateness(Duration.standardSeconds(10L))
> .discardingFiredPanes();
> PCollection<KV<String, Iterable<PubsubMessage>>> windowedMessages = p
> .apply("Read Events", read)
> .apply("Apply Window", window)
> .apply("Convert to KV", ParDo.of(new ConvertToMapOnKey()))
> .apply("Group by key", GroupByKey.create())
> .apply("Log Pairs", ParDo.of(new LogGroupedEvents()));{code}
>  
> LogGroupedEvents would log the key as a string along with the grouped array 
> of PubsubMessages, but this function never runs correctly.
> To demonstrate the issue I have simplified the pipeline and removed its 
> actual use case, so it may seem odd that I am just grouping and logging 
> simple messages; that is not what the real pipeline does.
>  
> If I swap the windowing function for one with triggers it works correctly.
> {code:java}
> Window getDefaultWindow(Long duration) {
> return Window.into(new GlobalWindows())
> .triggering(Repeatedly.forever(
> AfterProcessingTime
> .pastFirstElementInPane()
> .plusDelayOf(Duration.standardSeconds(duration)
> )
> ))
> .withAllowedLateness(Duration.standardSeconds(10L))
> .discardingFiredPanes()
> ;
> }
> {code}
>  
> This could be due to me not understanding windowing and triggers, but 
> according to the documentation and many examples online, all that people use 
> is a simple FixedWindow, because it should automatically fire a trigger at 
> the end of the window per the Beam docs:
>  
> [https://beam.apache.org/documentation/programming-guide/#setting-your-pcollections-windowing-function]
> See example 7.3.1.
>  
> I have been researching as much as I can about how windowing works 
> internally. We arrived at our triggering solution by looking at the source 
> code.
>  
> Let me know if there is any other information you need from me to help look 
> into this.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7950) Remove the Python 3 warning as it has already been supported

2019-08-15 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7950:
---
Status: Open  (was: Triage Needed)

> Remove the Python 3 warning as it has already been supported
> 
>
> Key: BEAM-7950
> URL: https://issues.apache.org/jira/browse/BEAM-7950
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.16.0
>
>
> There are warnings that Python 3 is not fully supported in Beam 
> (beam/sdks/python/setup.py). As mentioned on the mailing list, we should 
> remove the Python 3 warning, as Python 3 is already supported thanks to the 
> effort tracked in https://issues.apache.org/jira/browse/BEAM-1251.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7951) Allow runner to configure customization WindowedValue coder such as ValueOnlyWindowedValueCoder

2019-08-15 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7951:
---
Status: Open  (was: Triage Needed)

> Allow runner to configure customization WindowedValue coder such as 
> ValueOnlyWindowedValueCoder
> ---
>
> Key: BEAM-7951
> URL: https://issues.apache.org/jira/browse/BEAM-7951
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: sunjincheng
>Priority: Major
> Fix For: 2.16.0
>
>
> The coder of WindowedValue cannot be configured; it is always 
> FullWindowedValueCoder. We don't need to serialize the timestamp, window and 
> pane properties in Flink, so it would be better to make the coder 
> configurable (i.e. allow using ValueOnlyWindowedValueCoder).



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7947) Improves the interfaces of classes such as FnDataService, BundleProcessor, ActiveBundle, etc to change the parameter type from WindowedValue to T

2019-08-15 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7947:
---
Status: Open  (was: Triage Needed)

> Improves the interfaces of classes such as FnDataService, BundleProcessor, 
> ActiveBundle, etc to change the parameter type from WindowedValue to T
> 
>
> Key: BEAM-7947
> URL: https://issues.apache.org/jira/browse/BEAM-7947
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.16.0
>
>
> Both `Coder<WindowedValue<T>>` and `FnDataReceiver<WindowedValue<T>>` use 
> `WindowedValue<T>` as the data structure that both the Runner side and the 
> SDK Harness side understand. Control Plane/Data Plane/State Plane/Logging is 
> a high-level abstraction, and parts of it, such as the Control Plane and 
> Logging, are common requirements for all multi-language platforms. For 
> example, the Flink community is also discussing how to support Python UDFs, 
> as well as how to deal with the Docker environment, data transfer, state 
> access, logging, etc. If Beam can further abstract these service interfaces, 
> i.e., make the interface definitions compatible with multiple engines and 
> eventually provide them to other projects as class libraries, it will 
> definitely help other platforms that want to support multiple languages. To 
> throw out an initial idea for discussion, take the FnDataService#receive 
> interface as an example and turn `WindowedValue<T>` into `T` so that other 
> platforms can extend it arbitrarily, as follows:
> {code}
> <T> InboundDataClient receive(LogicalEndpoint inputLocation, Coder<T> coder, 
> FnDataReceiver<T> listener);
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7952) Make the input queue of the input buffer in Python SDK Harness size limited.

2019-08-15 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7952:
---
Status: Open  (was: Triage Needed)

> Make the input queue of the input buffer in Python SDK Harness size limited.
> 
>
> Key: BEAM-7952
> URL: https://issues.apache.org/jira/browse/BEAM-7952
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-harness
>Reporter: sunjincheng
>Priority: Major
> Fix For: 2.16.0
>
>
> In the Python SDK Harness, the input queue of the input buffer is neither 
> size limited nor configurable. This may become a problem if the data 
> production rate is higher than the data consumption rate.
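
For intuition, a generic bounded-queue sketch (standard library only, not Beam's actual buffer implementation) showing why a size limit gives backpressure: put() blocks once the limit is reached, so a fast producer cannot outrun a slow consumer indefinitely.

{code:python}
import queue
import threading

buf = queue.Queue(maxsize=100)  # hypothetical size limit


def producer():
  for i in range(1000):
    buf.put(i)     # blocks while the buffer already holds 100 items
  buf.put(None)    # sentinel to stop the consumer


def consumer():
  while True:
    item = buf.get()
    if item is None:
      break


threading.Thread(target=producer).start()
consumer()
{code}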



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7945) Allow runner to configure "semi_persist_dir" which is used in the SDK harness

2019-08-15 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7945:
---
Status: Open  (was: Triage Needed)

> Allow runner to configure "semi_persist_dir" which is used in the SDK harness
> -
>
> Key: BEAM-7945
> URL: https://issues.apache.org/jira/browse/BEAM-7945
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution, sdk-go, sdk-java-core, sdk-py-core
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.16.0
>
>
> Currently "semi_persist_dir" is not configurable. This may become a problem 
> in certain scenarios. For example, the default value of "semi_persist_dir" is 
> "/tmp" 
> ([https://github.com/apache/beam/blob/master/sdks/python/container/boot.go#L48])
>  in the Python SDK harness. When the environment type is "PROCESS", the disk 
> holding "/tmp" may fill up and unexpected issues will occur in a production 
> environment. We should provide a way to configure "semi_persist_dir" in the 
> EnvironmentFactory on the runner side. 
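
A tiny, hypothetical sketch of the knob being requested (the option name and lookup are made up for illustration; there is no such pipeline option today):

{code:python}
# Hypothetical: fall back to the current hard-coded default when the runner
# does not configure a directory.
def resolve_semi_persist_dir(options):
  return options.get('semi_persist_dir') or '/tmp'


print(resolve_semi_persist_dir({}))                                    # /tmp
print(resolve_semi_persist_dir({'semi_persist_dir': '/mnt/scratch'}))  # /mnt/scratch
{code}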



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7948) Add time-based cache threshold support in the Java data service

2019-08-15 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7948:
---
Status: Open  (was: Triage Needed)

> Add time-based cache threshold support in the Java data service
> ---
>
> Key: BEAM-7948
> URL: https://issues.apache.org/jira/browse/BEAM-7948
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: sunjincheng
>Priority: Major
> Fix For: 2.16.0
>
>
> Currently only a size-based cache threshold is supported in the data service. 
> It should also support a time-based cache threshold. This is very important, 
> especially for streaming jobs that are sensitive to latency.
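
For intuition, a generic sketch (not Beam's actual data service) of how a size threshold and a time threshold can be combined: flush whenever either limit is hit, so small trickles of data still go out promptly.

{code:python}
import time


class Buffer(object):
  """Generic size-or-time flushing buffer; illustrative only."""

  def __init__(self, max_bytes=1 << 20, max_delay_secs=0.5):
    self.max_bytes = max_bytes
    self.max_delay_secs = max_delay_secs
    self.items, self.size, self.first_at = [], 0, None

  def add(self, data):
    if self.first_at is None:
      self.first_at = time.monotonic()
    self.items.append(data)
    self.size += len(data)
    if (self.size >= self.max_bytes
        or time.monotonic() - self.first_at >= self.max_delay_secs):
      self.flush()

  def flush(self):
    # In a real data service the buffered elements would be sent downstream here.
    self.items, self.size, self.first_at = [], 0, None
{code}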



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7949) Add time-based cache threshold support in the data service of the Python SDK harness

2019-08-15 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7949:
---
Status: Open  (was: Triage Needed)

> Add time-based cache threshold support in the data service of the Python SDK 
> harness
> 
>
> Key: BEAM-7949
> URL: https://issues.apache.org/jira/browse/BEAM-7949
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-harness
>Reporter: sunjincheng
>Priority: Major
> Fix For: 2.16.0
>
>
> Currently only a size-based cache threshold is supported in the data service 
> of the Python SDK harness. It should also support a time-based cache 
> threshold. This is very important, especially for streaming jobs that are 
> sensitive to latency. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Assigned] (BEAM-6442) Incomplete JobService API Semantics

2019-08-15 Thread Sam Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Rohde reassigned BEAM-6442:
---

Assignee: (was: Sam Rohde)

> Incomplete JobService API Semantics
> ---
>
> Key: BEAM-6442
> URL: https://issues.apache.org/jira/browse/BEAM-6442
> Project: Beam
>  Issue Type: Test
>  Components: beam-model
>Affects Versions: 2.9.0
>Reporter: Sam Rohde
>Priority: Major
>
> The JobService API (beam_job_api.proto) allows for the possibility of never 
> seeing messages or states with Get(State|Message)Stream. This is because the  
> Get(State|Message)Stream calls need to have the job id which can only be 
> obtained from the RunJobResponse. But in order to see all messages/states the 
> streams need to be opened before the job starts.
> This is fine in Dataflow as the preparation_id == job_id, but this is not 
> true in Flink.
> The fix is to modify the API to keep a single id that is used across the 
> preparation/run APIs. 
> Consumers of the API will have to be modified to meet the new semantics.
> Dev list thread 
> (https://lists.apache.org/thread.html/3ace7585278c0545185fa4bb8d6975283d5c48c097e1bb2c2e18b9a2@%3Cdev.beam.apache.org%3E)
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7961) Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite

2019-08-15 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7961:
---
Status: Open  (was: Triage Needed)

> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite
> --
>
> Key: BEAM-7961
> URL: https://issues.apache.org/jira/browse/BEAM-7961
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>
> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7940) beam_Release_Python_NightlySnapshot is broken due to directory not exist

2019-08-15 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7940:
---
Status: Open  (was: Triage Needed)

> beam_Release_Python_NightlySnapshot is broken due to directory not exist
> 
>
> Key: BEAM-7940
> URL: https://issues.apache.org/jira/browse/BEAM-7940
> Project: Beam
>  Issue Type: Bug
>  Components: build-system, test-failures
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> :sdks:python:depSnapshot broke beam_Release_Python_NightlySnapshot due to:
> {code}
> sh: 1: cannot create 
> /home/jenkins/jenkins-slave/workspace/beam_Release_Python_NightlySnapshot/src/sdks/python/build/requirements.txt:
>  Directory nonexistent
> {code}
> This is caused by https://github.com/apache/beam/pull/9277. The directory 
> `sdks/python/build` no longer exists when writing to 
> sdks/python/build/requirements.txt. We can create an empty file first to fix 
> this problem, and clean up the old file at the same time if it exists.
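
A sketch of that workaround in plain Python (the real fix lives in the Gradle build, so the paths here are illustrative):

{code:python}
import os

build_dir = 'sdks/python/build'
req = os.path.join(build_dir, 'requirements.txt')

os.makedirs(build_dir, exist_ok=True)  # make sure the directory exists again
if os.path.exists(req):
  os.remove(req)                       # clean up the old file if it exists
open(req, 'w').close()                 # create an empty file first
{code}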



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (BEAM-6442) Incomplete JobService API Semantics

2019-08-15 Thread Sam Rohde (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908507#comment-16908507
 ] 

Sam Rohde commented on BEAM-6442:
-

I unfortunately don't have time to work on this.

> Incomplete JobService API Semantics
> ---
>
> Key: BEAM-6442
> URL: https://issues.apache.org/jira/browse/BEAM-6442
> Project: Beam
>  Issue Type: Test
>  Components: beam-model
>Affects Versions: 2.9.0
>Reporter: Sam Rohde
>Priority: Major
>
> The JobService API (beam_job_api.proto) allows for the possibility of never 
> seeing messages or states with Get(State|Message)Stream. This is because the  
> Get(State|Message)Stream calls need to have the job id which can only be 
> obtained from the RunJobResponse. But in order to see all messages/states the 
> streams need to be opened before the job starts.
> This is fine in Dataflow as the preparation_id == job_id, but this is not 
> true in Flink.
> The fix is to modify the API to keep a single id that is used across the 
> preparation/run APIs. 
> Consumers of the API will have to be modified to meet the new semantics.
> Dev list thread 
> (https://lists.apache.org/thread.html/3ace7585278c0545185fa4bb8d6975283d5c48c097e1bb2c2e18b9a2@%3Cdev.beam.apache.org%3E)
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7963) Unnesting with large schema causes error

2019-08-15 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7963:
---
Status: Open  (was: Triage Needed)

> Unnesting with large schema causes error
> 
>
> Key: BEAM-7963
> URL: https://issues.apache.org/jira/browse/BEAM-7963
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.16.0
>Reporter: Sahith Nallapareddy
>Priority: Major
>
> The query `select id from table a, UNNEST(a.nested_field)` causes the 
> following error if table a has a relatively large schema, especially one with 
> nested Row or Array fields (nested repeated records or records with repeated 
> records).
>  
>  
> {noformat}
> Aug 13, 2019 10:17:01 AM 
> org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner convertToBeamRel
> INFO: SQL:
> SELECT `a`.`artist_gid`
> FROM `beam`.`endpoint1` AS `a`,
> UNNEST(`a`.`genre`.`genres`) AS `genres`
> Aug 13, 2019 10:17:01 AM 
> org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner convertToBeamRel
> INFO: SQLPlan>
> LogicalProject(artist_gid=[$0])
>   LogicalCorrelate(correlation=[$cor0], joinType=[inner], 
> requiredColumns=[{87}])
> LogicalProject(artist_gid=[$0], artist_uri=[$1], date=[$2], id=[$3.id], 
> gid=[$3.gid], name=[$3.name], redirect=[$3.redirect], 
> fuzzyname=[$3.fuzzyname], inserted=[$3.inserted], echo_nest_artists=[$4], 
> id10=[$5.id], similars=[$5.similars], gid12=[$6.gid], version=[$6.version], 
> vector=[$6.vector], value=[$7.value], gid16=[$7.gid], domains=[$7.domains], 
> playlist_adds=[$8], uri=[$9.uri], gid20=[$9.gid], region=[$9.region], 
> popularity_raw=[$9.popularity_raw], 
> popularity_normalized=[$9.popularity_normalized], percentile=[$9.percentile], 
> rank=[$9.rank], popularity_regional=[$10], gid27=[$11.gid], 
> artist_name=[$11.artist_name], bios=[$11.bios], 
> ancestor_artists=[$11.ancestor_artists], 
> descendant_artists=[$11.descendant_artists], 
> asserted_similars=[$11.asserted_similars], tags=[$11.tags], 
> genres=[$11.genres], members=[$11.members], members_past=[$11.members_past], 
> meanings=[$11.meanings], country=[$11.country], 
> voted_descriptions=[$11.voted_descriptions], years_active=[$11.years_active], 
> amazon_urls=[$11.amazon_urls], itunes_urls=[$11.itunes_urls], 
> lastfm_urls=[$11.lastfm_urls], facebook_urls=[$11.facebook_urls], 
> urbandictionary_urls=[$11.urbandictionary_urls], 
> wikipedia_urls=[$11.wikipedia_urls], 
> twitter_screennames=[$11.twitter_screennames], categories=[$11.categories], 
> childrens=[$11.category.childrens], classical=[$11.category.classical], 
> curated=[$11.category.curated], deceptive=[$11.category.deceptive], 
> generic=[$11.category.generic], inactive=[$11.category.inactive], 
> karaoke=[$11.category.karaoke], non_artist=[$11.category.non_artist], 
> soundalike=[$11.category.soundalike], unpreferred=[$11.category.unpreferred], 
> offensive=[$11.category.offensive], 
> do_not_recommend=[$11.category.do_not_recommend], 
> do_not_support=[$11.category.do_not_support], 
> pass_on_programming=[$11.category.pass_on_programming], 
> deceased=[$11.category.deceased], edited_terms=[$11.edited_terms], 
> edited_text_terms=[$11.edited_text_terms], 
> free_text_terms=[$11.free_text_terms], display_terms=[$11.display_terms], 
> extra=[$11.extra], force_curated_sims=[$11.sims_curation.force_curated_sims], 
> curated_sims_uris=[$11.sims_curation.curated_sims_uris], 
> blacklisted_sims_uris=[$11.sims_curation.blacklisted_sims_uris], 
> display_bios=[$11.display_bios], discogs_uri=[$11.discogs_uri], 
> musicbrainz_uri=[$11.musicbrainz_uri], rovi_music_uri=[$11.rovi_music_uri], 
> blocked_display_bio_providers=[$11.blocked_display_bio_providers], 
> portrait=[$11.portrait], hidden_portraits=[$11.hidden_portraits], 
> primary_portrait=[$11.primary_portrait], imdb_urls=[$11.imdb_urls], 
> instagram_screennames=[$11.instagram_screennames], 
> myspace_urls=[$11.myspace_urls], tumblr_screennames=[$11.tumblr_screennames], 
> youtube_urls=[$11.youtube_urls], gid85=[$12.gid], 
> acousticVector=[$12.acousticVector], gid87=[$13.gid], genres88=[$13.genres], 
> extended_genres=[$13.extended_genres], currency=[$14])
>   BeamIOSourceRel(table=[[beam, endpoint1]])
> Uncollect
>   LogicalProject(genres=[$cor0.genres_88])
> LogicalValues(tuples=[[{ 0 }]])
> org.apache.beam.sdk.extensions.sql.impl.SqlConversionException: Unable to 
> convert query select artist_gid from endpoint1 a, UNNEST(a.genre.genres) as 
> genres
>   at 
> org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner.convertToBeamRel(CalciteQueryPlanner.java:170)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:103)
>   at 
> org.apache.beam.sdk.extensions.sql.SqlTransform.expand(SqlTransform.java:124)
>   at 
> 

[jira] [Commented] (BEAM-7968) UDF case insensitive

2019-08-15 Thread JIRA


[ 
https://issues.apache.org/jira/browse/BEAM-7968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908505#comment-16908505
 ] 

Ismaël Mejía commented on BEAM-7968:


Assigned to you, Andrew, because you probably already have the answer to this 
question.

> UDF case insensitive
> 
>
> Key: BEAM-7968
> URL: https://issues.apache.org/jira/browse/BEAM-7968
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Yang Zhang
>Assignee: Andrew Pilloud
>Priority: Major
>
> Currently, in Beam SQL, UDF is case sensitive. Is there a plan to make UDF 
> case insensitive?
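For context, a minimal sketch of how a Java UDF is registered and referenced in Beam SQL; the function name, field name, and UDF body are illustrative:

{code:java}
import org.apache.beam.sdk.extensions.sql.BeamSqlUdf;
import org.apache.beam.sdk.extensions.sql.SqlTransform;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;

public class UdfCaseExample {
  /** BeamSqlUdf implementations expose a static eval method. */
  public static class Increment implements BeamSqlUdf {
    public static Long eval(Long value) {
      return value + 1;
    }
  }

  public static PCollection<Row> apply(PCollection<Row> input) {
    // Registers the UDF as "increment" and calls it from the query; the question
    // above is whether this name lookup can become case-insensitive, so that the
    // casing used in the SQL text would not matter.
    return input.apply(
        SqlTransform.query("SELECT increment(f_long) FROM PCOLLECTION")
            .registerUdf("increment", Increment.class));
  }
}
{code}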



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Assigned] (BEAM-7968) UDF case insensitive

2019-08-15 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reassigned BEAM-7968:
--

Assignee: Andrew Pilloud  (was: Aizhamal Nurmamat kyzy)

> UDF case insensitive
> 
>
> Key: BEAM-7968
> URL: https://issues.apache.org/jira/browse/BEAM-7968
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Yang Zhang
>Assignee: Andrew Pilloud
>Priority: Major
>
> Currently, in Beam SQL, UDF is case sensitive. Is there a plan to make UDF 
> case insensitive?



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7968) UDF case insensitive

2019-08-15 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7968:
---
Status: Open  (was: Triage Needed)

> UDF case insensitive
> 
>
> Key: BEAM-7968
> URL: https://issues.apache.org/jira/browse/BEAM-7968
> Project: Beam
>  Issue Type: Bug
>  Components: beam-events
>Reporter: Yang Zhang
>Assignee: Aizhamal Nurmamat kyzy
>Priority: Major
>
> Currently, in Beam SQL, UDF is case sensitive. Is there a plan to make UDF 
> case insensitive?



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (BEAM-7806) org.apache.beam.sdk.extensions.sql.meta.provider.pubsub.PubsubJsonIT.testSelectsPayloadContent failed

2019-08-15 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles resolved BEAM-7806.
---
   Resolution: Fixed
Fix Version/s: Not applicable

> org.apache.beam.sdk.extensions.sql.meta.provider.pubsub.PubsubJsonIT.testSelectsPayloadContent
>  failed
> -
>
> Key: BEAM-7806
> URL: https://issues.apache.org/jira/browse/BEAM-7806
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Boyuan Zhang
>Assignee: Kenneth Knowles
>Priority: Critical
> Fix For: Not applicable
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> First failure: https://builds.apache.org/job/beam_PostCommit_SQL/2135/



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7968) UDF case insensitive

2019-08-15 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7968:
---
Component/s: (was: beam-events)
 dsl-sql

> UDF case insensitive
> 
>
> Key: BEAM-7968
> URL: https://issues.apache.org/jira/browse/BEAM-7968
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Yang Zhang
>Assignee: Aizhamal Nurmamat kyzy
>Priority: Major
>
> Currently, in Beam SQL, UDF is case sensitive. Is there a plan to make UDF 
> case insensitive?



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7976) Documentation for word count example is unclear about inputFile vs pom.xml

2019-08-15 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7976:
---
Status: Open  (was: Triage Needed)

> Documentation for word count example is unclear about inputFile vs pom.xml
> --
>
> Key: BEAM-7976
> URL: https://issues.apache.org/jira/browse/BEAM-7976
> Project: Beam
>  Issue Type: Bug
>  Components: examples-java
>Affects Versions: 2.14.0
>Reporter: niklas Hansson
>Assignee: niklas Hansson
>Priority: Minor
>
> Currently the documentation for the word count example mentions the input 
> file and gives the following example:
> ```Java
> $ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
>  -Dexec.args="--inputFile=pom.xml --output=counts" -Pdirect-runner
> ```
> As far as I can tell, the pom.xml is not supposed to be the input file in 
> this case. I will create a PR if someone can confirm that my assumption is 
> correct :) 
> /Niklas



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7974) Make RowCoder package-private

2019-08-15 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7974:
---
Status: Open  (was: Triage Needed)

> Make RowCoder package-private
> -
>
> Key: BEAM-7974
> URL: https://issues.apache.org/jira/browse/BEAM-7974
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Minor
>
> RowCoder is currently public in sdk.coders, tempting people to use it 
> directly. But the Schemas API is written such that everyone should be using 
> SchemaCoder, and RowCoder should be an implementation detail.
> Unfortunately this isn't a trivial change. I tried to do it and resolve the 
> few dependencies that cropped up, but running RowCoderTest yielded the 
> following error:
> {code:java}
> tried to access class 
> org.apache.beam.sdk.schemas.RowCoderGenerator$EncodeInstruction from class 
> org.apache.beam.sdk.coders.Coder$ByteBuddy$abBJo3R3
> java.lang.IllegalAccessError: tried to access class 
> org.apache.beam.sdk.schemas.RowCoderGenerator$EncodeInstruction from class 
> org.apache.beam.sdk.coders.Coder$ByteBuddy$abBJo3R3
>   at org.apache.beam.sdk.coders.Coder$ByteBuddy$abBJo3R3.encode(Unknown 
> Source)
>   at org.apache.beam.sdk.coders.Coder$ByteBuddy$abBJo3R3.encode(Unknown 
> Source)
>   at org.apache.beam.sdk.schemas.RowCoder.encode(RowCoder.java:159)
>   at org.apache.beam.sdk.schemas.RowCoder.encode(RowCoder.java:54)
>   at org.apache.beam.sdk.coders.Coder.encode(Coder.java:136)
>   at 
> org.apache.beam.sdk.testing.CoderProperties.encode(CoderProperties.java:334)
>   at 
> org.apache.beam.sdk.testing.CoderProperties.decodeEncode(CoderProperties.java:362)
>   at 
> org.apache.beam.sdk.testing.CoderProperties.coderDecodeEncodeEqualInContext(CoderProperties.java:104)
>   at 
> org.apache.beam.sdk.testing.CoderProperties.coderDecodeEncodeEqual(CoderProperties.java:94)
> {code}
> My attempt is available at 
> https://github.com/TheNeuralBit/beam/commit/869b8c6ba2f554bf56d8df70a754b76ef38dbc89
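For illustration, a sketch of the schema-aware path user code is supposed to take instead of constructing RowCoder directly (the schema and field names here are made up); setRowSchema attaches the schema and lets Beam install the SchemaCoder:

{code:java}
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;

public class RowSchemaExample {
  static final Schema SCHEMA =
      Schema.builder().addInt64Field("id").addStringField("name").build();

  public static PCollection<Row> toRows(PCollection<String> names) {
    return names
        .apply(ParDo.of(new DoFn<String, Row>() {
          @ProcessElement
          public void processElement(@Element String name, OutputReceiver<Row> out) {
            out.output(Row.withSchema(SCHEMA).addValues(1L, name).build());
          }
        }))
        // Instead of setCoder(RowCoder.of(SCHEMA)): declare the schema and let
        // Beam pick the coder, keeping RowCoder an implementation detail.
        .setRowSchema(SCHEMA);
  }
}
{code}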



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7971) Pycharm debugger for apache_beam/*_test.py broken

2019-08-15 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7971:
---
Status: Open  (was: Triage Needed)

> Pycharm debugger for apache_beam/*_test.py broken
> -
>
> Key: BEAM-7971
> URL: https://issues.apache.org/jira/browse/BEAM-7971
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, testing
>Reporter: Udi Meiri
>Priority: Minor
>
> This currently affects pipeline_test.py and pvalue_test.py.
> It seems that "import io" is interpreted as importing apache_beam.io, which 
> fails.
> In Python 2.7 the stacktrace shows:
> {code}
> Testing started at 3:48 PM ...
> /usr/local/google/home/ehudm/virtualenvs/beamenv/bin/python 
> /usr/local/google/home/ehudm/.local/share/JetBrains/Toolbox/apps/PyCharm-C/ch-0/191.7479.30/helpers/pydev/pydevd.py
>  --multiproc --qt-support=auto --client 127.0.0.1 --port 41493 --file 
> /usr/local/google/home/ehudm/.local/share/JetBrains/Toolbox/apps/PyCharm-C/ch-0/191.7479.30/helpers/pycharm/_jb_nosetest_runner.py
>  --path 
> /usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/pvalue_test.py
> Traceback (most recent call last):
>   File 
> "/usr/local/google/home/ehudm/.local/share/JetBrains/Toolbox/apps/PyCharm-C/ch-0/191.7479.30/helpers/pydev/pydevd.py",
>  line 15, in 
> from _pydevd_bundle.pydevd_constants import IS_JYTH_LESS25, 
> IS_PY34_OR_GREATER, IS_PY36_OR_GREATER, IS_PYCHARM, get_thread_id, \
>   File 
> "/usr/local/google/home/ehudm/.local/share/JetBrains/Toolbox/apps/PyCharm-C/ch-0/191.7479.30/helpers/pydev/_pydevd_bundle/pydevd_constants.py",
>  line 169, in 
> from _pydev_imps._pydev_saved_modules import thread
>   File 
> "/usr/local/google/home/ehudm/.local/share/JetBrains/Toolbox/apps/PyCharm-C/ch-0/191.7479.30/helpers/pydev/_pydev_imps/_pydev_saved_modules.py",
>  line 15, in 
> import xmlrpclib
>   File "/usr/lib/python2.7/xmlrpclib.py", line 145, in 
> import httplib
>   File "/usr/lib/python2.7/httplib.py", line 80, in 
> import mimetools
>   File "/usr/lib/python2.7/mimetools.py", line 6, in 
> import tempfile
>   File "/usr/lib/python2.7/tempfile.py", line 32, in 
> import io as _io
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/io/__init__.py",
>  line 22, in 
> from apache_beam.io.avroio import *
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/__init__.py", 
> line 97, in 
> from apache_beam import coders
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/coders/__init__.py",
>  line 19, in 
> from apache_beam.coders.coders import *
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/coders/coders.py",
>  line 27, in 
> from builtins import object
>   File 
> "/usr/local/google/home/ehudm/virtualenvs/beamenv/local/lib/python2.7/site-packages/builtins/__init__.py",
>  line 8, in 
> from future.builtins import *
>   File 
> "/usr/local/google/home/ehudm/virtualenvs/beamenv/local/lib/python2.7/site-packages/future/builtins/__init__.py",
>  line 13, in 
> from future.builtins.misc import (ascii, chr, hex, input, isinstance, 
> next,
>   File 
> "/usr/local/google/home/ehudm/virtualenvs/beamenv/local/lib/python2.7/site-packages/future/builtins/misc.py",
>  line 43, in 
> from io import open
> ImportError: cannot import name open
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7980) External environment with containerized worker pool

2019-08-15 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7980:
---
Status: Open  (was: Triage Needed)

> External environment with containerized worker pool
> ---
>
> Key: BEAM-7980
> URL: https://issues.apache.org/jira/browse/BEAM-7980
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Major
>
> Augment Beam Python docker image and boot.go so that it can be used to launch 
> BeamFnExternalWorkerPoolServicer.
> [https://docs.google.com/document/d/1z3LNrRtr8kkiFHonZ5JJM_L4NWNBBNcqRc_yAf6G0VI/edit#heading=h.lnhm75dhvhi0]
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7973) Python doesn't shut down Flink job server properly

2019-08-15 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7973:
---
Status: Open  (was: Triage Needed)

> Python doesn't shut down Flink job server properly
> --
>
> Key: BEAM-7973
> URL: https://issues.apache.org/jira/browse/BEAM-7973
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Using the new Python FlinkRunner [1], a new job server is created and the job 
> succeeds, but the job server is seemingly not shut down properly when the 
> Python command exits. Specifically, the java -jar command that started the job 
> server is still running in the background, eating up memory.
> Relevant args:
> python ...
>  --runner FlinkRunner \ 
>  --flink_job_server_jar $FLINK_JOB_SERVER_JAR ...
> [1] [https://github.com/apache/beam/pull/9043]



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7972) Portable Python Reshuffle does not work with windowed pcollection

2019-08-15 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7972:
---
Summary: Portable Python Reshuffle does not work with windowed pcollection  
(was: Portable Python Reshuffle does not work with with windowed pcollection)

> Portable Python Reshuffle does not work with windowed pcollection
> -
>
> Key: BEAM-7972
> URL: https://issues.apache.org/jira/browse/BEAM-7972
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> A streaming pipeline gets stuck when using Reshuffle with a windowed pcollection.
> The issue happens because the window function cannot be deserialized on the 
> Java side, so it defaults to the global window function, resulting in a window 
> function mismatch later down the code.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7969) Streaming Dataflow worker doesn't report FnAPI metrics.

2019-08-15 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7969:
---
Status: Open  (was: Triage Needed)

> Streaming Dataflow worker doesn't report FnAPI metrics.
> ---
>
> Key: BEAM-7969
> URL: https://issues.apache.org/jira/browse/BEAM-7969
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution, runner-dataflow
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> EOM



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7972) Portable Python Reshuffle does not work with windowed pcollection

2019-08-15 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7972:
---
Status: Open  (was: Triage Needed)

> Portable Python Reshuffle does not work with windowed pcollection
> -
>
> Key: BEAM-7972
> URL: https://issues.apache.org/jira/browse/BEAM-7972
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> A streaming pipeline gets stuck when using Reshuffle with a windowed pcollection.
> The issue happens because the window function cannot be deserialized on the 
> Java side, so it defaults to the global window function, resulting in a window 
> function mismatch later down the code.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7977) DataflowRunner writes to System.out instead of logging

2019-08-15 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7977:
---
Status: Open  (was: Triage Needed)

> DataflowRunner writes to System.out instead of logging
> --
>
> Key: BEAM-7977
> URL: https://issues.apache.org/jira/browse/BEAM-7977
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Sebastian Jambor
>Assignee: Sebastian Jambor
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> DataflowRunner writes two lines to stdout for every job, which bypasses the 
> logger setup. This is slightly annoying, especially if many jobs are started 
> programmatically.
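A minimal sketch of the direction such a fix would take (the exact messages and call sites in DataflowRunner are not shown here): route the output through SLF4J so it respects the logger configuration instead of going straight to stdout.

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class JobSubmissionLogging {
  private static final Logger LOG = LoggerFactory.getLogger(JobSubmissionLogging.class);

  static void announce(String jobId, String monitoringUrl) {
    // Instead of System.out.println(...), which bypasses the logger setup:
    LOG.info("Submitted job: {}", jobId);
    LOG.info("To monitor the job, visit {}", monitoringUrl);
  }
}
{code}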



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7982) Dataflow runner needs to identify the new format of metric names for distribution metrics

2019-08-15 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7982:
---
Status: Open  (was: Triage Needed)

> Dataflow runner needs to identify the new format of metric names for 
> distribution metrics
> -
>
> Key: BEAM-7982
> URL: https://issues.apache.org/jira/browse/BEAM-7982
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: David Yan
>Priority: Major
>
> For example, 
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/dataflow_metrics.py#L157]
> uses [MAX], [MIN], etc. but the new format will be _MAX, _MIN, etc.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7983) Template parameters don't work if they are only used in DoFns

2019-08-15 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7983:
---
Status: Open  (was: Triage Needed)

> Template parameters don't work if they are only used in DoFns
> -
>
> Key: BEAM-7983
> URL: https://issues.apache.org/jira/browse/BEAM-7983
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Yunqing Zhou
>Assignee: Luke Cwik
>Priority: Minor
>
> Template parameters don't work if they are only used in DoFns but not 
> anywhere else in main.
> Sample pipeline:
>  
> {code:java}
> import org.apache.beam.sdk.Pipeline;
> import org.apache.beam.sdk.options.PipelineOptions;
> import org.apache.beam.sdk.options.PipelineOptionsFactory;
> import org.apache.beam.sdk.options.ValueProvider;
> import org.apache.beam.sdk.transforms.Create;
> import org.apache.beam.sdk.transforms.DoFn;
> import org.apache.beam.sdk.transforms.ParDo;
>
> public class BugPipeline {
>   public interface Options extends PipelineOptions {
>     // The template parameter; a String value is assumed (--parameters=foo=bar).
>     ValueProvider<String> getFoo();
>
>     void setFoo(ValueProvider<String> foo);
>   }
>
>   public static void main(String[] args) throws Exception {
>     Options options = PipelineOptionsFactory.fromArgs(args).as(Options.class);
>     Pipeline p = Pipeline.create(options);
>     p.apply(Create.of(1))
>         .apply(ParDo.of(new DoFn<Integer, Void>() {
>           @ProcessElement
>           public void processElement(ProcessContext context) {
>             // The option is only read here, inside the DoFn, at execution time.
>             System.out.println(context.getPipelineOptions().as(Options.class).getFoo());
>           }
>         }));
>     p.run();
>   }
> }
> {code}
> Option "foo" is not used anywhere else than the DoFn. So to reproduce the 
> problem:
> {code:bash}
> $java BugPipeline --project=$PROJECT --stagingLocation=$STAGING 
> --templateLocation=$TEMPLATE --runner=DataflowRunner
> $gcloud dataflow jobs run $NAME --gcs-location=$TEMPLATE --parameters=foo=bar
> {code}
> it will fail with this error:
> {code}
> ERROR: (gcloud.dataflow.jobs.run) INVALID_ARGUMENT: (2621bec26c2488b7): The 
> workflow could not be created. Causes: (2621bec26c248dba): Found unexpected 
> parameters: ['foo' (perhaps you meant 'zone')]
> - '@type': type.googleapis.com/google.rpc.DebugInfo
>   detail: "(2621bec26c2488b7): The workflow could not be created. Causes: 
> (2621bec26c248dba):\
> \ Found unexpected parameters: ['foo' (perhaps you meant 'zone')]"
> {code}
> The underlying problem is that ProxyInvocationHandler.java only populates 
> options which are "invoked" into the pipeline options map in the job object:
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L159
> One way to solve it is to save all ValueProvider-typed params in the 
> pipelineOptions section. Alternatively, some registration mechanism can be 
> introduced.
> A current workaround is to annotate the parameter with 
> {code}@Validation.Required{code}, which will call invoke() behind the scenes.
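Concretely, the workaround applied to the options interface from the sample pipeline looks like this (ValueProvider<String> is assumed, as in the reproduction):

{code:java}
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.Validation;
import org.apache.beam.sdk.options.ValueProvider;

public interface Options extends PipelineOptions {
  // Validation.Required makes the option get "invoked" during template
  // construction, so 'foo' is recorded in the template's pipeline options
  // and is no longer rejected as an unexpected parameter.
  @Validation.Required
  ValueProvider<String> getFoo();

  void setFoo(ValueProvider<String> foo);
}
{code}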



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7984) [python] The coder returned for typehints.List should be IterableCoder

2019-08-15 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7984:
---
Status: Open  (was: Triage Needed)

> [python] The coder returned for typehints.List should be IterableCoder
> --
>
> Key: BEAM-7984
> URL: https://issues.apache.org/jira/browse/BEAM-7984
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> IterableCoder encodes a list and decodes to a list, but 
> typecoders.registry.get_coder(typehints.List[bytes]) returns a 
> FastPrimitivesCoder. I don't see any reason why this would be advantageous. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7981) ParDo function wrapper doesn't support Iterable output types

2019-08-15 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7981:
---
Status: Open  (was: Triage Needed)

> ParDo function wrapper doesn't support Iterable output types
> 
>
> Key: BEAM-7981
> URL: https://issues.apache.org/jira/browse/BEAM-7981
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Priority: Major
>
> I believe the bug is in CallableWrapperDoFn.default_type_hints, which 
> converts Iterable[str] to str.
> This test will be included (commented out) in 
> https://github.com/apache/beam/pull/9283
> {code}
>   def test_typed_callable_iterable_output(self):
> @typehints.with_input_types(int)
> @typehints.with_output_types(typehints.Iterable[str])
> def do_fn(element):
>   return [[str(element)] * 2]
> result = [1, 2] | beam.ParDo(do_fn)
> self.assertEqual([['1', '1'], ['2', '2']], sorted(result))
> {code}
> Result:
> {code}
> ==
> ERROR: test_typed_callable_iterable_output 
> (apache_beam.typehints.typed_pipeline_test.MainInputTest)
> --
> Traceback (most recent call last):
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/typehints/typed_pipeline_test.py",
>  line 104, in test_typed_callable_iterable_output
> result = [1, 2] | beam.ParDo(do_fn)
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/transforms/ptransform.py",
>  line 519, in __ror__
> p.run().wait_until_finish()
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/pipeline.py", 
> line 406, in run
> self._options).run(False)
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/pipeline.py", 
> line 419, in run
> return self.runner.run_pipeline(self, self._options)
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/direct/direct_runner.py",
>  line 129, in run_pipeline
> return runner.run_pipeline(pipeline, options)
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 366, in run_pipeline
> default_environment=self._default_environment))
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 373, in run_via_runner_api
> return self.run_stages(stage_context, stages)
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 455, in run_stages
> stage_context.safe_coders)
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 733, in _run_stage
> result, splits = bundle_manager.process_bundle(data_input, data_output)
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 1663, in process_bundle
> part, expected_outputs), part_inputs):
>   File "/usr/lib/python3.7/concurrent/futures/_base.py", line 586, in 
> result_iterator
> yield fs.pop().result()
>   File "/usr/lib/python3.7/concurrent/futures/_base.py", line 432, in result
> return self.__get_result()
>   File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in 
> __get_result
> raise self._exception
>   File "/usr/lib/python3.7/concurrent/futures/thread.py", line 57, in run
> result = self.fn(*self.args, **self.kwargs)
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 1663, in 
> part, expected_outputs), part_inputs):
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 1601, in process_bundle
> result_future = self._worker_handler.control_conn.push(process_bundle_req)
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 1080, in push
> response = self.worker.do_instruction(request)
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py",
>  line 343, in do_instruction
> request.instruction_id)
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py",
>  line 369, in process_bundle
> bundle_processor.process_bundle(instruction_id))
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py",
>  line 593, in process_bundle
> data.ptransform_id].process_encoded(data.data)
>   File 
> 

[jira] [Updated] (BEAM-7975) error syncing pod - failed to start container artifact (python SDK)

2019-08-15 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7975:
---
Status: Open  (was: Triage Needed)

> error syncing pod - failed to start container artifact (python SDK)
> ---
>
> Key: BEAM-7975
> URL: https://issues.apache.org/jira/browse/BEAM-7975
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Affects Versions: 2.13.0
>Reporter: James Hutchison
>Priority: Major
>
> {code:java}
> Error syncing pod 5966e59c (" name>-08131110-7hcg-harness-fbm2_default(5966e59c)"), skipping: failed to 
> "StartContainer" for "artifact" with CrashLoopBackOff: "Back-off 5m0s 
> restarting failed container=artifact pod= name>-08131110-7hcg-harness-fbm2_default(5966.e59c)"{code}
> Seeing these in a streaming pipeline. Running the pipeline in batch mode, I'm 
> not seeing anything. Messages appear about every 0.5 - 5 seconds.
> I've been trying to efficiently scale my streaming pipeline and found that 
> adding more workers / dividing into more groups isn't scaling as well as I 
> expect. Perhaps this is contributing (how do I tell if workers are being 
> utilized or not?)
> One pipeline which never completed (got to one of the last steps and then log 
> messages simply ceased without error on the workers) had this going on in the 
> kubelet logs. I checked some of my other streaming pipelines and found the 
> same thing going on, even though they would complete.
> In a couple of my streaming pipelines, I've gotten the following error 
> message, despite the pipeline eventually finishing:
> {code:java}
> Processing stuck in step s01 for at least 05m00s without outputting or 
> completing in state process{code}
> Perhaps they are related?
> This is running with 5 or 7 (or more) workers in streaming mode. I don't see 
> this when running with 1 worker
> The pipeline uses requirements.txt and setup.py, as well as using an extra 
> package and using save_main_session.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7986) Increase minimum grpcio required version

2019-08-15 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7986:
---
Status: Open  (was: Triage Needed)

> Increase minimum grpcio required version
> 
>
> Key: BEAM-7986
> URL: https://issues.apache.org/jira/browse/BEAM-7986
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Udi Meiri
>Priority: Major
>
> According to this question, 1.11.0 is not new enough (1.22.0 reportedly 
> works), and we list the minimum as 1.8.
> https://stackoverflow.com/questions/57479498/beam-channel-object-has-no-attribute-close?noredirect=1#comment101446049_57479498
> Affects DirectRunner Pub/Sub client.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7969) Streaming Dataflow worker doesn't report FnAPI metrics.

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7969?focusedWorklogId=295788=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295788
 ]

ASF GitHub Bot logged work on BEAM-7969:


Author: ASF GitHub Bot
Created on: 15/Aug/19 21:54
Start Date: 15/Aug/19 21:54
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #9330: [BEAM-7969] Report 
FnAPI counters as deltas in streaming jobs.
URL: https://github.com/apache/beam/pull/9330#issuecomment-521811390
 
 
   Thanks for the offline discussion.
   It seems reasonable to move forward with the PR and add the integration tests.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 295788)
Time Spent: 2h 10m  (was: 2h)

> Streaming Dataflow worker doesn't report FnAPI metrics.
> ---
>
> Key: BEAM-7969
> URL: https://issues.apache.org/jira/browse/BEAM-7969
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution, runner-dataflow
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> EOM



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-5980) Add load tests for Core Apache Beam operations

2019-08-15 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-5980:
---
Summary: Add load tests for Core Apache Beam operations   (was: Add load 
tests for Core Apache Beam opertaions )

> Add load tests for Core Apache Beam operations 
> ---
>
> Key: BEAM-5980
> URL: https://issues.apache.org/jira/browse/BEAM-5980
> Project: Beam
>  Issue Type: New Feature
>  Components: testing
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> This involves adding a suite of load tests described in this proposal: 
> [https://s.apache.org/load-test-basic-operations]



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7978) ArithmeticExceptions on getting backlog bytes

2019-08-15 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7978:
---
Status: Open  (was: Triage Needed)

> ArithmeticExceptions on getting backlog bytes 
> --
>
> Key: BEAM-7978
> URL: https://issues.apache.org/jira/browse/BEAM-7978
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kinesis
>Affects Versions: 2.14.0
>Reporter: Mateusz
>Priority: Major
>
> Hello,
> Beam 2.14.0
>  (and to be more precise 
> [commit|https://github.com/apache/beam/commit/9fe0a0387ad1f894ef02acf32edbc9623a3b13ad#diff-b4964a457006b1555c7042c739b405ec])
>  introduced a change in watermark calculation in Kinesis IO causing below 
> error:
> {code:java}
> exception:  "java.lang.RuntimeException: Unknown kinesis failure, when trying 
> to reach kinesis
>   at 
> org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.wrapExceptions(SimplifiedKinesisClient.java:227)
>   at 
> org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.getBacklogBytes(SimplifiedKinesisClient.java:167)
>   at 
> org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.getBacklogBytes(SimplifiedKinesisClient.java:155)
>   at 
> org.apache.beam.sdk.io.kinesis.KinesisReader.getTotalBacklogBytes(KinesisReader.java:158)
>   at 
> org.apache.beam.runners.dataflow.worker.StreamingModeExecutionContext.flushState(StreamingModeExecutionContext.java:433)
>   at 
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1289)
>   at 
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:149)
>   at 
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1024)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ArithmeticException: Value cannot fit in an int: 
> 153748963401
>   at org.joda.time.field.FieldUtils.safeToInt(FieldUtils.java:229)
>   at 
> org.joda.time.field.BaseDurationField.getDifference(BaseDurationField.java:141)
>   at 
> org.joda.time.base.BaseSingleFieldPeriod.between(BaseSingleFieldPeriod.java:72)
>   at org.joda.time.Minutes.minutesBetween(Minutes.java:101)
>   at 
> org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.lambda$getBacklogBytes$3(SimplifiedKinesisClient.java:169)
>   at 
> org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.wrapExceptions(SimplifiedKinesisClient.java:210)
>   ... 10 more
> {code}
> We spotted this issue on Dataflow runner. It's problematic as inability to 
> get backlog bytes seems to result in constant recreation of KinesisReader.
> The issue happens if the backlog bytes are retrieved before watermark value 
> is updated from initial default value. Easy way to reproduce it is to create 
> a pipeline with Kinesis source for a stream where no records are being put. 
> While debugging it locally, you can observe that the watermark is set to the 
> value on the past (like: "-290308-12-21T19:59:05.225Z"). After two minutes 
> (the default watermark idle duration threshold is set to 2 minutes), the 
> watermark is set to the value of 
> [watermarkIdleThreshold|https://github.com/apache/beam/blob/9fe0a0387ad1f894ef02acf32edbc9623a3b13ad/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/WatermarkPolicyFactory.java#L110],
>  so the next backlog bytes retrieval should be correct. However, as described 
> before, running the pipeline on the Dataflow runner results in the 
> KinesisReader being closed just after creation, so the watermark won't be fixed.
> The reason for the issue is the following: the introduced watermark policies 
> rely on 
> [WatermarkParameters|https://github.com/apache/beam/blob/9fe0a0387ad1f894ef02acf32edbc9623a3b13ad/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/WatermarkParameters.java],
>  which initialises currentWatermark and eventTime to 
> [BoundedWindow.TIMESTAMP_MIN_VALUE|https://github.com/apache/beam/commit/9fe0a0387ad1f894ef02acf32edbc9623a3b13ad#diff-b4964a457006b1555c7042c739b405ecR52].
>  This results in the watermark being set to new Instant(-9223372036854775L) at 
> KinesisReader creation. The calculated [period between the watermark and the 
> current 
> timestamp|https://github.com/apache/beam/blob/9fe0a0387ad1f894ef02acf32edbc9623a3b13ad/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/SimplifiedKinesisClient.java#L169]
>  is bigger than expected, causing the ArithmeticException to be thrown.
> The maximum retention on Kinesis streams is  [7 
> 
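The int overflow itself is easy to reproduce in isolation with Joda-Time, using the initial watermark value quoted above; a minimal sketch:

{code:java}
import org.joda.time.Instant;
import org.joda.time.Minutes;

public class MinutesOverflowDemo {
  public static void main(String[] args) {
    // Initial watermark before any record arrives, as described above.
    Instant watermark = new Instant(-9223372036854775L);
    // Same computation SimplifiedKinesisClient performs before the watermark is
    // updated; the span in minutes does not fit in an int, so Joda throws
    // java.lang.ArithmeticException: Value cannot fit in an int.
    Minutes.minutesBetween(watermark, Instant.now());
  }
}
{code}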

[jira] [Work logged] (BEAM-7802) Expose a method to make an Avro-based PCollection into an Schema-based one

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7802?focusedWorklogId=295780=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295780
 ]

ASF GitHub Bot logged work on BEAM-7802:


Author: ASF GitHub Bot
Created on: 15/Aug/19 21:38
Start Date: 15/Aug/19 21:38
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #9130: [BEAM-7802] Expose a 
method to make an Avro-based PCollection into an Schema-based one
URL: https://github.com/apache/beam/pull/9130#issuecomment-521806974
 
 
   @kanterov I restored the access modifiers and let everything as suggested.
   
   I think we should merge this as it is and open the discussion in dev@. Since 
the classes are still experimental, we can still adapt the changes we conclude 
from the discussion, and ‘experimental’ users can benefit from having this 
available in the meantime. WDYT?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 295780)
Time Spent: 5h 20m  (was: 5h 10m)

> Expose a method to make an Avro-based PCollection into an Schema-based one
> --
>
> Key: BEAM-7802
> URL: https://issues.apache.org/jira/browse/BEAM-7802
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Avro can infer the Schema for an Avro-based PCollection by using the 
> `withBeamSchemas` method; however, if the user created a PCollection with Avro 
> objects or IndexedRecord/GenericRecord, he needs to manually set the schema 
> (or coder). The idea is to expose a method in schema.AvroUtils to ease this.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-5428) Implement cross-bundle state caching.

2019-08-15 Thread Rakesh Kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh Kumar updated BEAM-5428:
---
Description: 
Tech spec: 
[https://docs.google.com/document/d/1BOozW0bzBuz4oHJEuZNDOHdzaV5Y56ix58Ozrqm2jFg/edit#heading=h.7ghoih5aig5m]

Relevant document: 
[https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit#|https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit]

Mailing list link: 
[https://lists.apache.org/thread.html/caa8d9bc6ca871d13de2c5e6ba07fdc76f85d26497d95d90893aa1f6@%3Cdev.beam.apache.org%3E]

> Implement cross-bundle state caching.
> -
>
> Key: BEAM-5428
> URL: https://issues.apache.org/jira/browse/BEAM-5428
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Rakesh Kumar
>Priority: Major
>
> Tech spec: 
> [https://docs.google.com/document/d/1BOozW0bzBuz4oHJEuZNDOHdzaV5Y56ix58Ozrqm2jFg/edit#heading=h.7ghoih5aig5m]
> Relevant document: 
> [https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit#|https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit]
> Mailing list link: 
> [https://lists.apache.org/thread.html/caa8d9bc6ca871d13de2c5e6ba07fdc76f85d26497d95d90893aa1f6@%3Cdev.beam.apache.org%3E]



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Assigned] (BEAM-5428) Implement cross-bundle state caching.

2019-08-15 Thread Rakesh Kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh Kumar reassigned BEAM-5428:
--

Assignee: Rakesh Kumar  (was: Robert Bradshaw)

> Implement cross-bundle state caching.
> -
>
> Key: BEAM-5428
> URL: https://issues.apache.org/jira/browse/BEAM-5428
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Rakesh Kumar
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (BEAM-7475) Add Python stateful processing example in blog

2019-08-15 Thread Rakesh Kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh Kumar resolved BEAM-7475.

   Resolution: Fixed
Fix Version/s: 2.14.0

> Add Python stateful processing example in blog
> --
>
> Key: BEAM-7475
> URL: https://issues.apache.org/jira/browse/BEAM-7475
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rakesh Kumar
>Assignee: Rakesh Kumar
>Priority: Major
> Fix For: 2.14.0
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (BEAM-7737) Microbenchmark script do not work consistently

2019-08-15 Thread Rakesh Kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh Kumar resolved BEAM-7737.

   Resolution: Fixed
Fix Version/s: 2.14.0

> Microbenchmark script do not work consistently 
> ---
>
> Key: BEAM-7737
> URL: https://issues.apache.org/jira/browse/BEAM-7737
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Rakesh Kumar
>Assignee: Rakesh Kumar
>Priority: Major
> Fix For: 2.14.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The microbenchmark scripts do not work consistently; at least, they run into a 
> problem for me:
> {code}
> (beam) ➜  python git:(master) python -m 
> apache_beam.tools.distribution_counter_microbenchmark
> Traceback (most recent call last):
>   File 
> "/usr/local/Cellar/python@2/2.7.14_3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py",
>  line 174, in _run_module_as_main
> "__main__", fname, loader, pkg_name)
>   File 
> "/usr/local/Cellar/python@2/2.7.14_3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py",
>  line 72, in _run_code
> exec code in run_globals
>   File 
> "/Users/rkumar/personal/opensource/beam/sdks/python/apache_beam/tools/distribution_counter_microbenchmark.py",
>  line 69, in 
> 'apache_beam.transforms.cy_dataflow_distribution_counter')
>   File "apache_beam/tools/utils.py", line 41, in check_compiled
> "Profiling uncompiled code.\n"
> RuntimeError: Profiling uncompiled code.
> To compile beam, run 'pip install Cython; python setup.py build_ext --inplace'
> {code}
> This is because [this 
> line|https://github.com/apache/beam/blob/f7cbf88f550c8918b99a13af4182d6efa07cd2b5/sdks/python/apache_beam/tools/utils.py#L37]
>  of the check_compiled method 
> doesn't work consistently. It always gets the root module instead of the 
> submodule and fails to identify whether the given module is compiled with 
> Cython.
> Also, it is not recommended to use the {{__import__}} method because it is a 
> Python internal meant to be used by the Python interpreter; 
> {{importlib.import_module}} is the recommended way of importing the module.
> Pull request: https://github.com/apache/beam/pull/9066



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7802) Expose a method to make an Avro-based PCollection into an Schema-based one

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7802?focusedWorklogId=295765=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295765
 ]

ASF GitHub Bot logged work on BEAM-7802:


Author: ASF GitHub Bot
Created on: 15/Aug/19 21:07
Start Date: 15/Aug/19 21:07
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #9130: [BEAM-7802] 
Expose a method to make an Avro-based PCollection into an Schema-based one
URL: https://github.com/apache/beam/pull/9130#discussion_r314499407
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java
 ##
 @@ -127,7 +127,7 @@ public static FixedBytesField withSize(int size) {
 
 /** Create a {@link FixedBytesField} from a Beam {@link FieldType}. */
 @Nullable
-public static FixedBytesField fromBeamFieldType(FieldType fieldType) {
+static FixedBytesField fromBeamFieldType(FieldType fieldType) {
 
 Review comment:
   I am letting things as they were and resolving this one.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 295765)
Time Spent: 5h 10m  (was: 5h)

> Expose a method to make an Avro-based PCollection into an Schema-based one
> --
>
> Key: BEAM-7802
> URL: https://issues.apache.org/jira/browse/BEAM-7802
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Avro can infer the Schema for an Avro-based PCollection by using the 
> `withBeamSchemas` method; however, if the user created a PCollection with Avro 
> objects or IndexedRecord/GenericRecord, he needs to manually set the schema 
> (or coder). The idea is to expose a method in schema.AvroUtils to ease this.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7802) Expose a method to make an Avro-based PCollection into an Schema-based one

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7802?focusedWorklogId=295759=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295759
 ]

ASF GitHub Bot logged work on BEAM-7802:


Author: ASF GitHub Bot
Created on: 15/Aug/19 20:58
Start Date: 15/Aug/19 20:58
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #9130: [BEAM-7802] 
Expose a method to make an Avro-based PCollection into an Schema-based one
URL: https://github.com/apache/beam/pull/9130#discussion_r314495688
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
 ##
 @@ -875,6 +926,13 @@ public void populateDisplayData(DisplayData.Builder 
builder) {
   recordClass,
   schemaSupplier.get());
 }
+
+private static class JsonToSchema implements Function, 
Serializable {
 
 Review comment:
   :) yaaayy !
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 295759)
Time Spent: 5h  (was: 4h 50m)

> Expose a method to make an Avro-based PCollection into an Schema-based one
> --
>
> Key: BEAM-7802
> URL: https://issues.apache.org/jira/browse/BEAM-7802
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> Avro can infer the Schema for an Avro-based PCollection by using the 
> `withBeamSchemas` method; however, if the user created a PCollection with Avro 
> objects or IndexedRecord/GenericRecord, he needs to manually set the schema 
> (or coder). The idea is to expose a method in schema.AvroUtils to ease this.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7802) Expose a method to make an Avro-based PCollection into an Schema-based one

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7802?focusedWorklogId=295756=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295756
 ]

ASF GitHub Bot logged work on BEAM-7802:


Author: ASF GitHub Bot
Created on: 15/Aug/19 20:57
Start Date: 15/Aug/19 20:57
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #9130: [BEAM-7802] 
Expose a method to make an Avro-based PCollection into an Schema-based one
URL: https://github.com/apache/beam/pull/9130#discussion_r314495186
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
 ##
 @@ -186,6 +188,49 @@
  * scalability. Note that it may decrease performance if the filepattern 
matches only a small number
  * of files.
  *
+ * Inferring Beam schemas from Avro files
+ *
+ * If you want to use SQL or schema based operations on an Avro-based 
PCollection. You must
+ * configure the read transform to infer the Beam schema and automatically 
setup the Beam related
+ * coders by doing:
+ *
+ * {@code
+ * PCollection records =
+ * p.apply(AvroIO.read(...).from(...).withBeamSchemas(true);
+ * }
+ *
+ * Inferring Beam schemas from Avro PCollections
+ *
+ * If you created an Avro-based PCollection by other means e.g. reading 
records from Kafka or as
+ * the output of another PTransform. You may be interested on making your 
PCollection schema-aware
 
 Review comment:
   done
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 295756)
Time Spent: 4h 40m  (was: 4.5h)

> Expose a method to make an Avro-based PCollection into an Schema-based one
> --
>
> Key: BEAM-7802
> URL: https://issues.apache.org/jira/browse/BEAM-7802
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Avro can infer the Schema for an Avro-based PCollection by using the 
> `withBeamSchemas` method; however, if the user created a PCollection with Avro 
> objects or IndexedRecord/GenericRecord, he needs to manually set the schema 
> (or coder). The idea is to expose a method in schema.AvroUtils to ease this.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7909) Write integration tests to test customized containers

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7909?focusedWorklogId=295757=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295757
 ]

ASF GitHub Bot logged work on BEAM-7909:


Author: ASF GitHub Bot
Created on: 15/Aug/19 20:57
Start Date: 15/Aug/19 20:57
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on issue #9351: [WIP][BEAM-7909] 
support customized container for Python
URL: https://github.com/apache/beam/pull/9351#issuecomment-521794846
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 295757)
Time Spent: 1h  (was: 50m)

> Write integration tests to test customized containers
> -
>
> Key: BEAM-7909
> URL: https://issues.apache.org/jira/browse/BEAM-7909
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7802) Expose a method to make an Avro-based PCollection into an Schema-based one

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7802?focusedWorklogId=295758=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295758
 ]

ASF GitHub Bot logged work on BEAM-7802:


Author: ASF GitHub Bot
Created on: 15/Aug/19 20:57
Start Date: 15/Aug/19 20:57
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #9130: [BEAM-7802] 
Expose a method to make an Avro-based PCollection into an Schema-based one
URL: https://github.com/apache/beam/pull/9130#discussion_r314495461
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
 ##
 @@ -186,6 +188,49 @@
  * scalability. Note that it may decrease performance if the filepattern 
matches only a small number
  * of files.
  *
+ * Inferring Beam schemas from Avro files
+ *
+ * If you want to use SQL or schema-based operations on an Avro-based PCollection, you must
+ * configure the read transform to infer the Beam schema and automatically set up the
+ * Beam-related coders by doing:
+ *
+ * {@code
+ * PCollection records =
+ * p.apply(AvroIO.read(...).from(...).withBeamSchemas(true));
+ * }
+ *
+ * Inferring Beam schemas from Avro PCollections
+ *
+ * If you created an Avro-based PCollection by other means, e.g. by reading records from Kafka
+ * or as the output of another PTransform, you may be interested in making your PCollection
+ * schema-aware so you can use the Schema-based APIs or Beam's SqlTransform.
+ *
+ * If you are using Avro specific records (classes generated from an Avro schema), you can
+ * register a schema provider for the specific Avro class to make any PCollection of these
+ * objects schema-aware.
+ *
+ * {@code
+ * pipeline.getSchemaRegistry().registerSchemaProvider(AvroAutoGenClass.class, 
new AvroRecordSchema());
 
 Review comment:
   good one, definitely simpler
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 295758)
Time Spent: 4h 50m  (was: 4h 40m)

> Expose a method to make an Avro-based PCollection into an Schema-based one
> --
>
> Key: BEAM-7802
> URL: https://issues.apache.org/jira/browse/BEAM-7802
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Avro-based PCollections read with AvroIO can have their Beam Schema inferred via the
> `withBeamSchemas` method; however, if the user created a PCollection of Avro
> objects (IndexedRecord/GenericRecord) by other means, they need to set the
> schema (or coder) manually. The idea is to expose a method in schema.AvroUtils to ease this.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7802) Expose a method to make an Avro-based PCollection into an Schema-based one

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7802?focusedWorklogId=295755&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295755
 ]

ASF GitHub Bot logged work on BEAM-7802:


Author: ASF GitHub Bot
Created on: 15/Aug/19 20:56
Start Date: 15/Aug/19 20:56
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #9130: [BEAM-7802] 
Expose a method to make an Avro-based PCollection into an Schema-based one
URL: https://github.com/apache/beam/pull/9130#discussion_r314494996
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
 ##
 @@ -186,6 +188,49 @@
  * scalability. Note that it may decrease performance if the filepattern 
matches only a small number
  * of files.
  *
+ * Inferring Beam schemas from Avro files
+ *
+ * If you want to use SQL or schema-based operations on an Avro-based PCollection, you must
 
 Review comment:
   done
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 295755)
Time Spent: 4.5h  (was: 4h 20m)

> Expose a method to make an Avro-based PCollection into an Schema-based one
> --
>
> Key: BEAM-7802
> URL: https://issues.apache.org/jira/browse/BEAM-7802
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Avro-based PCollections read with AvroIO can have their Beam Schema inferred via the
> `withBeamSchemas` method; however, if the user created a PCollection of Avro
> objects (IndexedRecord/GenericRecord) by other means, they need to set the
> schema (or coder) manually. The idea is to expose a method in schema.AvroUtils to ease this.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=295745&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295745
 ]

ASF GitHub Bot logged work on BEAM-7013:


Author: ASF GitHub Bot
Created on: 15/Aug/19 20:48
Start Date: 15/Aug/19 20:48
Worklog Time Spent: 10m 
  Work Description: robinyqiu commented on issue #9144: [BEAM-7013] 
Integrating ZetaSketch's HLL++ algorithm with Beam
URL: https://github.com/apache/beam/pull/9144#issuecomment-521791865
 
 
   Run Java PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 295745)
Time Spent: 20h  (was: 19h 50m)

> A new count distinct transform based on BigQuery compatible HyperLogLog++ 
> implementation
> 
>
> Key: BEAM-7013
> URL: https://issues.apache.org/jira/browse/BEAM-7013
> Project: Beam
>  Issue Type: New Feature
>  Components: extensions-java-sketching, sdk-java-core
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 20h
>  Remaining Estimate: 0h
>
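
A minimal usage sketch of what the new count-distinct transform could look like; the `HllCount` class name and the `Init`/`Extract` API shape are taken from the ZetaSketch integration work in PR #9144 and are assumptions here, not a finalized API.

```java
import java.util.Arrays;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.extensions.zetasketch.HllCount;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;

public class HllCountSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create();

    // Toy input: six elements, three distinct values.
    PCollection<Long> ids = p.apply(Create.of(Arrays.asList(1L, 2L, 2L, 3L, 3L, 3L)));

    // Build an HLL++ sketch over the whole collection, then extract the
    // approximate distinct count from the sketch bytes.
    PCollection<Long> approxDistinct =
        ids.apply(HllCount.Init.forLongs().globally())
           .apply(HllCount.Extract.globally());

    p.run().waitUntilFinish();
  }
}
```

The intermediate sketch bytes produced by the Init step are what make the results mergeable and, per the issue title, compatible with BigQuery's HyperLogLog++ implementation.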




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-3342) Create a Cloud Bigtable IO connector for Python

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3342?focusedWorklogId=295708&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295708
 ]

ASF GitHub Bot logged work on BEAM-3342:


Author: ASF GitHub Bot
Created on: 15/Aug/19 19:50
Start Date: 15/Aug/19 19:50
Worklog Time Spent: 10m 
  Work Description: mf2199 commented on issue #8457: [BEAM-3342] Create a 
Cloud Bigtable IO connector for Python
URL: https://github.com/apache/beam/pull/8457#issuecomment-521773234
 
 
   @pabloem running `tox -e py27-lint` locally throws an error [reproduced on 2 
different machines]:
   ```
   (Beam-MF) C:\git\beam-MF\sdks\python>tox -e py27-lint
   GLOB sdist-make: C:\git\beam-MF\sdks\python\setup.py
   py27-lint create: C:\git\beam-MF\sdks\python\target\.tox\py27-lint
   py27-lint installdeps: pycodestyle==2.3.1, pylint==1.9.3, future==0.16.0, isort==4.2.15, flake8==3.5.0
   ERROR: invocation failed (exit code 2), logfile: C:\git\beam-MF\sdks\python\target\.tox\py27-lint\log\py27-lint-1.log
   == log start ==
   C:\git\beam-MF\sdks\python\target\.tox\py27-lint\Scripts/python: can't open file 'C:\git\beam-MF\sdks\python\target\.tox\py27-lint\Scripts/pip': [Errno 2] No such file or directory

   === log end ===
   ERROR: could not install deps [pycodestyle==2.3.1, pylint==1.9.3, future==0.16.0, isort==4.2.15, flake8==3.5.0]; v = InvocationError(u"'C:\\git\\beam-MF\\sdks\\python\\target\\.tox\\py27-lint\\Scripts/python' 'C:\\git\\beam-MF\\sdks\\python\\target\\.tox\\py27-lint\\Scripts/pip' install --retries 10 pycodestyle==2.3.1 pylint==1.9.3 future==0.16.0 isort==4.2.15 flake8==3.5.0", 2)
   ___ summary ___
   ERROR:   py27-lint: could not install deps [pycodestyle==2.3.1, pylint==1.9.3, future==0.16.0, isort==4.2.15, flake8==3.5.0]; v = InvocationError(u"'C:\\git\\beam-MF\\sdks\\python\\target\\.tox\\py27-lint\\Scripts/python' 'C:\\git\\beam-MF\\sdks\\python\\target\\.tox\\py27-lint\\Scripts/pip' install --retries 10 pycodestyle==2.3.1 pylint==1.9.3 future==0.16.0 isort==4.2.15 flake8==3.5.0", 2)
   ```
   Although not ideal, as of right now relying on the checks run on GitHub seems to be the only 
viable way.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 295708)
Time Spent: 36h 10m  (was: 36h)

> Create a Cloud Bigtable IO connector for Python
> ---
>
> Key: BEAM-3342
> URL: https://issues.apache.org/jira/browse/BEAM-3342
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Solomon Duskis
>Assignee: Solomon Duskis
>Priority: Major
>  Time Spent: 36h 10m
>  Remaining Estimate: 0h
>
> I would like to create a Cloud Bigtable python connector.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=295672&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295672
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 15/Aug/19 19:00
Start Date: 15/Aug/19 19:00
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9263: [BEAM-7389] Add 
code examples for Keys page
URL: https://github.com/apache/beam/pull/9263
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 295672)
Time Spent: 48h 10m  (was: 48h)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 48h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=295670&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295670
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 15/Aug/19 18:59
Start Date: 15/Aug/19 18:59
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9258: [BEAM-7389] Add 
code examples for KvSwap page
URL: https://github.com/apache/beam/pull/9258
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 295670)
Time Spent: 48h  (was: 47h 50m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 48h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=295669&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295669
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 15/Aug/19 18:59
Start Date: 15/Aug/19 18:59
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9267: [BEAM-7389] Add 
code examples for WithTimestamps page
URL: https://github.com/apache/beam/pull/9267
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 295669)
Time Spent: 47h 50m  (was: 47h 40m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 47h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (BEAM-7986) Increase minimum grpcio required version

2019-08-15 Thread Udi Meiri (JIRA)
Udi Meiri created BEAM-7986:
---

 Summary: Increase minimum grpcio required version
 Key: BEAM-7986
 URL: https://issues.apache.org/jira/browse/BEAM-7986
 Project: Beam
  Issue Type: Bug
  Components: build-system
Reporter: Udi Meiri


According to the Stack Overflow question below, grpcio 1.11.0 is not new enough
(1.22.0 reportedly works), yet we currently list the minimum as 1.8.

https://stackoverflow.com/questions/57479498/beam-channel-object-has-no-attribute-close?noredirect=1#comment101446049_57479498

Affects DirectRunner Pub/Sub client.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=295667&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295667
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 15/Aug/19 18:54
Start Date: 15/Aug/19 18:54
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9265: [BEAM-7389] Add 
code examples for Map page
URL: https://github.com/apache/beam/pull/9265
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 295667)
Time Spent: 47h 40m  (was: 47.5h)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 47h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

