[jira] [Work logged] (BEAM-7420) Including the Flink runner causes exceptions unless running in a flink environment

2019-07-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7420?focusedWorklogId=272399&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-272399
 ]

ASF GitHub Bot logged work on BEAM-7420:


Author: ASF GitHub Bot
Created on: 05/Jul/19 02:17
Start Date: 05/Jul/19 02:17
Worklog Time Spent: 10m 
  Work Description: mikekap commented on pull request #8894: [BEAM-7420]: 
Allow including the flink runner without flink on the classpath.
URL: https://github.com/apache/beam/pull/8894#discussion_r300523080
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
 ##
 @@ -146,11 +149,12 @@
*/
   @Description(
   "Sets the state backend to use in streaming mode. "
-  + "Otherwise the default is read from the Flink config.")
+  + "Otherwise the default is read from the Flink config. "
+  + "This should be an instance of {@link 
org.apache.flink.runtime.state.StateBackend}")
   @JsonIgnore
-  StateBackend getStateBackend();
+  Object getStateBackend();
 
-  void setStateBackend(StateBackend stateBackend);
+  void setStateBackend(Object stateBackend);
 
 Review comment:
   I tried to do this, but I didn't quite know how to let folks configure the 
backend nicely. The backends in flink have different constructor arguments 
which makes something generic...a little difficult. The factory gets the 
options object, so presumably it can be somewhat dynamic. Let me know if you 
were thinking of something different.
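
   For context, a rough sketch of how such a loosely-typed option could be
   consumed at the call site (hypothetical code, not the PR's actual
   implementation; it assumes Flink classes are only loaded once a Flink
   environment really exists):

   ```java
   import org.apache.flink.runtime.state.StateBackend;
   import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

   class StateBackendConfigurer {
     // Validate and cast the Object-typed option only inside Flink-specific
     // code, so FlinkPipelineOptions itself never references Flink types.
     static void apply(StreamExecutionEnvironment env, Object configured) {
       if (configured == null) {
         return; // keep the default backend from the Flink config
       }
       if (!(configured instanceof StateBackend)) {
         throw new IllegalArgumentException(
             "stateBackend must implement org.apache.flink.runtime.state.StateBackend");
       }
       env.setStateBackend((StateBackend) configured);
     }
   }
   ```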
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 272399)
Time Spent: 50m  (was: 40m)

> Including the Flink runner causes exceptions unless running in a flink 
> environment
> --
>
> Key: BEAM-7420
> URL: https://issues.apache.org/jira/browse/BEAM-7420
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Mike Kaplinskiy
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The {{FlinkPipelineOptions}} imports various Flink enums which make it 
> impossible to e.g. run the direct runner with the same classpath but without 
> the flink runtime. The fix is potentially easy - make the arguments strings 
> and convert them to enums at the callsites.
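
A minimal sketch of the string-based approach the issue suggests (hypothetical,
assuming the option is stored as a plain String): the Flink enum is resolved
only at the point of use, so it is never loaded on a Flink-free classpath.

```java
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

class CheckpointingConfigurer {
  // mode arrives from pipeline options as a String, e.g. "EXACTLY_ONCE";
  // CheckpointingMode is only touched here, inside Flink-specific code.
  static void configure(StreamExecutionEnvironment env, String mode) {
    env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.valueOf(mode));
  }
}
```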



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5440) Add option to mount a directory inside SDK harness containers

2019-07-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5440?focusedWorklogId=272391&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-272391
 ]

ASF GitHub Bot logged work on BEAM-5440:


Author: ASF GitHub Bot
Created on: 05/Jul/19 01:30
Start Date: 05/Jul/19 01:30
Worklog Time Spent: 10m 
  Work Description: sambvfx commented on pull request #8982: [BEAM-5440] 
Pass docker run options to SDK harness containers
URL: https://github.com/apache/beam/pull/8982#discussion_r300518136
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerEnvironmentFactory.java
 ##
 @@ -136,6 +138,10 @@ public RemoteEnvironment createEnvironment(Environment 
environment) throws Excep
   dockerArgsBuilder.add("--rm");
 }
 
+if (!dockerOptions.isEmpty()) {
+  dockerArgsBuilder.addAll(Arrays.asList(dockerOptions.split("\\s+")));
 
 Review comment:
   Thanks for taking a look @mxm! 
   
   I'm not sure I understand the desire to introduce JSON here. I admit this 
java string parsing is not very robust. Is the motivation to move the docker 
option shell string parsing into the SDKs and send along an encoded list of 
strings?
   
   For reference the docker options could be anything acceptable to pass into 
`docker run [OPTIONS]`.
   
   ```
   '-v /Volumes/mounts/foo:/Volumes/mounts/foo:ro'
   '--user sambvfx'
   '--env "ENV_VAR=bar"'
   ```
   
   If this were Python, I would likely parse the string with 
[shlex.split](https://docs.python.org/2.7/library/shlex.html#shlex.split) - is 
there an equivalent in Java for safely parsing shell-style strings?
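
   As a rough answer to the question above: the JDK has no direct shlex
   equivalent, but a minimal hand-rolled tokenizer could look like the sketch
   below (illustrative only, not what the PR uses; it handles single and
   double quotes but not escapes).

   ```java
   import java.util.ArrayList;
   import java.util.List;

   class ShellSplit {
     // Splits on whitespace while keeping quoted segments together, roughly
     // like Python's shlex.split in its simplest cases.
     static List<String> split(String input) {
       List<String> tokens = new ArrayList<>();
       StringBuilder current = new StringBuilder();
       char quote = 0; // 0 means we are not inside a quoted segment
       for (char c : input.toCharArray()) {
         if (quote != 0) {
           if (c == quote) { quote = 0; } else { current.append(c); }
         } else if (c == '\'' || c == '"') {
           quote = c;
         } else if (Character.isWhitespace(c)) {
           if (current.length() > 0) { tokens.add(current.toString()); current.setLength(0); }
         } else {
           current.append(c);
         }
       }
       if (current.length() > 0) { tokens.add(current.toString()); }
       return tokens;
     }
   }
   ```

   For example, `ShellSplit.split("--env \"ENV_VAR=bar\"")` yields
   `[--env, ENV_VAR=bar]`, whereas a plain whitespace split would break the
   quoted value apart.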
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 272391)
Time Spent: 1h 10m  (was: 1h)

> Add option to mount a directory inside SDK harness containers
> -
>
> Key: BEAM-5440
> URL: https://issues.apache.org/jira/browse/BEAM-5440
> Project: Beam
>  Issue Type: New Feature
>  Components: java-fn-execution, sdk-java-core
>Reporter: Maximilian Michels
>Priority: Major
>  Labels: portability, portability-flink
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> While experimenting with the Python SDK locally, I found it inconvenient that 
> I can't mount a host directory to the Docker containers, i.e. the input must 
> already be in the container and the results of a Write remain inside the 
> container. For local testing, users may want to mount a host directory.
> Since BEAM-5288, the {{Environment}} carries explicit environment information; 
> we could a) add volume args to the {{DockerPayload}}, or b) provide a general 
> Docker arguments field.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7534) add --mountTempDir option for easier data sharing with Docker container

2019-07-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7534?focusedWorklogId=272386&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-272386
 ]

ASF GitHub Bot logged work on BEAM-7534:


Author: ASF GitHub Bot
Created on: 05/Jul/19 00:05
Start Date: 05/Jul/19 00:05
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #8828: [BEAM-7534] add 
--mountTempDir option for Docker container
URL: https://github.com/apache/beam/pull/8828#issuecomment-508592895
 
 
   @ihji there’s another PR that covers this same topic over at 
https://github.com/apache/beam/pull/8982. It seems like it covers 
more cases and does so without adding a docker-specific flag.  Do you think it 
will work sufficiently for your use case?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 272386)
Time Spent: 3h  (was: 2h 50m)

> add --mountTempDir option for easier data sharing with Docker container
> ---
>
> Key: BEAM-7534
> URL: https://issues.apache.org/jira/browse/BEAM-7534
> Project: Beam
>  Issue Type: Improvement
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> add --mountTempDir option for easier data sharing with Docker container.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5440) Add option to mount a directory inside SDK harness containers

2019-07-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5440?focusedWorklogId=272381&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-272381
 ]

ASF GitHub Bot logged work on BEAM-5440:


Author: ASF GitHub Bot
Created on: 04/Jul/19 22:33
Start Date: 04/Jul/19 22:33
Worklog Time Spent: 10m 
  Work Description: sambvfx commented on pull request #8982: [BEAM-5440] 
Pass docker run options to SDK harness containers
URL: https://github.com/apache/beam/pull/8982#discussion_r300506366
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py
 ##
 @@ -1235,26 +1235,32 @@ class DockerSdkWorkerHandler(GrpcWorkerHandler):
   def __init__(self, payload, state, provision_info):
 super(DockerSdkWorkerHandler, self).__init__(state, provision_info)
 self._container_image = payload.container_image
+self._options = payload.options
 self._container_id = None
 
   def start_worker(self):
 try:
   subprocess.check_call(['docker', 'pull', self._container_image])
 except Exception:
   logging.info('Unable to pull image %s' % self._container_image)
-self._container_id = subprocess.check_output(
-['docker',
- 'run',
- '-d',
- # TODO:  credentials
- '--network=host',
- self._container_image,
- '--id=%s' % uuid.uuid4(),
- '--logging_endpoint=%s' % self.logging_api_service_descriptor().url,
- '--control_endpoint=%s' % self.control_address,
- '--artifact_endpoint=%s' % self.control_address,
- '--provision_endpoint=%s' % self.control_address,
-]).strip()
+cmd = [
+  'docker',
+  'run',
+  '-d',
+  # TODO:  credentials
+  '--network=host',
+]
+if self._options:
+  cmd.extend(self._options.split(' '))
+cmd.extend([
+  self._container_image,
+  '--id=%s' % uuid.uuid4(),
+  '--logging_endpoint=%s' % self.logging_api_service_descriptor().url,
+  '--control_endpoint=%s' % self.control_address,
+  '--artifact_endpoint=%s' % self.control_address,
+  '--provision_endpoint=%s' % self.control_address,
+])
+self._container_id = subprocess.check_output(cmd).strip()
 
 Review comment:
   The reformatting was necessary to add the optional `DockerPayload.options` 
into the docker run command.
   
   ```python
   if self._options:
 cmd.extend(self._options.split(' '))
   ```
   
   I didn't see anywhere this class was being used... I'm happy to revert if we 
don't want/need this functionality here.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 272381)
Time Spent: 1h  (was: 50m)

> Add option to mount a directory inside SDK harness containers
> -
>
> Key: BEAM-5440
> URL: https://issues.apache.org/jira/browse/BEAM-5440
> Project: Beam
>  Issue Type: New Feature
>  Components: java-fn-execution, sdk-java-core
>Reporter: Maximilian Michels
>Priority: Major
>  Labels: portability, portability-flink
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> While experimenting with the Python SDK locally, I found it inconvenient that 
> I can't mount a host directory to the Docker containers, i.e. the input must 
> already be in the container and the results of a Write remain inside the 
> container. For local testing, users may want to mount a host directory.
> Since BEAM-5288, the {{Environment}} carries explicit environment information; 
> we could a) add volume args to the {{DockerPayload}}, or b) provide a general 
> Docker arguments field.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7690) Port WordCountTest off DoFnTester

2019-07-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7690?focusedWorklogId=272346&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-272346
 ]

ASF GitHub Bot logged work on BEAM-7690:


Author: ASF GitHub Bot
Created on: 04/Jul/19 19:31
Start Date: 04/Jul/19 19:31
Worklog Time Spent: 10m 
  Work Description: cademarkegard commented on issue #9003: [BEAM-7690] 
Port WordCountTest off DoFnTester
URL: https://github.com/apache/beam/pull/9003#issuecomment-508562499
 
 
   R: @lukecwik
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 272346)
Time Spent: 20m  (was: 10m)

> Port WordCountTest off DoFnTester
> -
>
> Key: BEAM-7690
> URL: https://issues.apache.org/jira/browse/BEAM-7690
> Project: Beam
>  Issue Type: Sub-task
>  Components: examples-java
>Reporter: Cade Markegard
>Assignee: Cade Markegard
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-7692) Create a transform to assign an index when reading data from built in I/O transforms

2019-07-04 Thread SAILOKESH DIVI (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16878864#comment-16878864
 ] 

SAILOKESH DIVI commented on BEAM-7692:
--

Beam user mail thread

[https://lists.apache.org/thread.html/5c10b7edf982ef63d1d1d70545e3fe2716d00628ff5c2a7854383413@%3Cuser.beam.apache.org%3E]

> Create a transform to assign an index when reading data from built in I/O 
> transforms
> 
>
> Key: BEAM-7692
> URL: https://issues.apache.org/jira/browse/BEAM-7692
> Project: Beam
>  Issue Type: New Feature
>  Components: io-ideas
>Affects Versions: 2.13.0
>Reporter: SAILOKESH DIVI
>Priority: Major
>  Labels: beam
>
> As a Beam user,
> when using any of the existing Beam I/O transforms,
> I would like to add an index to each line read as part of the transform.
>  
> Since Spark has zipWithIndex to assign an index when reading files, and Beam 
> is an abstraction layer for many runners, I would expect this feature to be 
> added to Beam.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7692) Create a transform to assign an index when reading data from built in I/O transforms

2019-07-04 Thread SAILOKESH DIVI (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SAILOKESH DIVI updated BEAM-7692:
-
Labels: beam  (was: )

> Create a transform to assign an index when reading data from built in I/O 
> transforms
> 
>
> Key: BEAM-7692
> URL: https://issues.apache.org/jira/browse/BEAM-7692
> Project: Beam
>  Issue Type: New Feature
>  Components: io-ideas
>Affects Versions: 2.13.0
>Reporter: SAILOKESH DIVI
>Priority: Major
>  Labels: beam
>
> As a Beam user,
> when using any of the existing Beam I/O transforms,
> I would like to add an index to each line read as part of the transform.
>  
> Since Spark has zipWithIndex to assign an index when reading files, and Beam 
> is an abstraction layer for many runners, I would expect this feature to be 
> added to Beam.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-7692) Create a transform to assign an index when reading data from built in I/O transforms

2019-07-04 Thread SAILOKESH DIVI (JIRA)
SAILOKESH DIVI created BEAM-7692:


 Summary: Create a transform to assign an index when reading data 
from built in I/O transforms
 Key: BEAM-7692
 URL: https://issues.apache.org/jira/browse/BEAM-7692
 Project: Beam
  Issue Type: New Feature
  Components: io-ideas
Affects Versions: 2.13.0
Reporter: SAILOKESH DIVI


As a Beam user,

when using any of the existing Beam I/O transforms,

I would like to add an index to each line read as part of the transform.

 

Since Spark has zipWithIndex to assign an index when reading files, and Beam is 
an abstraction layer for many runners, I would expect this feature to be added 
to Beam.
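
For illustration, a rough sketch of what such a transform might look like in 
the Java SDK (hypothetical code, not an existing Beam transform; note that, 
unlike Spark's zipWithIndex, a globally contiguous index is hard to produce in 
a parallel pipeline, so this version only numbers elements within each bundle):

```java
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;

class AssignBundleIndexFn extends DoFn<String, KV<Long, String>> {
  private transient long index;

  @StartBundle
  public void startBundle() {
    index = 0; // numbering restarts for every bundle
  }

  @ProcessElement
  public void processElement(ProcessContext c) {
    c.output(KV.of(index++, c.element()));
  }
}
```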



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4087) Gradle build does not allow to overwrite versions of provided dependencies

2019-07-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4087?focusedWorklogId=272324&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-272324
 ]

ASF GitHub Bot logged work on BEAM-4087:


Author: ASF GitHub Bot
Created on: 04/Jul/19 18:26
Start Date: 04/Jul/19 18:26
Worklog Time Spent: 10m 
  Work Description: adude3141 commented on issue #8255: [BEAM-4087] 
implement configurable dependency versions
URL: https://github.com/apache/beam/pull/8255#issuecomment-508553164
 
 
   @iemejia I guess the approach on the other ticket is still valid and is not 
really impacted by the changes here. I'd need to have a look at it again, but 
IIRC it was just a matter of extracting that into a reusable 
module/plugin/whatever-you-call-it...
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 272324)
Time Spent: 4h 20m  (was: 4h 10m)

> Gradle build does not allow to overwrite versions of provided dependencies
> --
>
> Key: BEAM-4087
> URL: https://issues.apache.org/jira/browse/BEAM-4087
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Affects Versions: 2.5.0
>Reporter: Ismaël Mejía
>Assignee: Michael Luckey
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> In order to test modules with provided dependencies in maven we can execute 
> for example for Kafka `mvn verify -Prelease -Dkafka.clients.version=0.9.0.1 
> -pl 'sdks/java/io/kafka'` However we don't have an equivalent way to do this 
> with gradle because the version of the dependencies are defined locally and 
> not in the gradle.properties.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7427) JmsCheckpointMark Avro Serialisation issue with UnboundedSource

2019-07-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7427?focusedWorklogId=272284&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-272284
 ]

ASF GitHub Bot logged work on BEAM-7427:


Author: ASF GitHub Bot
Created on: 04/Jul/19 16:02
Start Date: 04/Jul/19 16:02
Worklog Time Spent: 10m 
  Work Description: jbonofre commented on issue #8757: [BEAM-7427] Fix 
JmsCheckpointMark Avro Encoding
URL: https://github.com/apache/beam/pull/8757#issuecomment-508528032
 
 
   @zouabimourad ok, let me propose a fix then.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 272284)
Time Spent: 1h 10m  (was: 1h)

> JmsCheckpointMark Avro Serialisation issue with UnboundedSource
> ---
>
> Key: BEAM-7427
> URL: https://issues.apache.org/jira/browse/BEAM-7427
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-jms
>Affects Versions: 2.12.0
> Environment: Message Broker : solace
> JMS Client (Over AMQP) : "org.apache.qpid:qpid-jms-client:0.42.0
>Reporter: Mourad
>Assignee: Mourad
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> I get the following exception when reading from unbounded JMS Source:
>   
> {code:java}
> Caused by: org.apache.avro.SchemaParseException: Illegal character in: this$0
> at org.apache.avro.Schema.validateName(Schema.java:1151)
> at org.apache.avro.Schema.access$200(Schema.java:81)
> at org.apache.avro.Schema$Field.(Schema.java:403)
> at org.apache.avro.Schema$Field.(Schema.java:396)
> at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:622)
> at org.apache.avro.reflect.ReflectData.createFieldSchema(ReflectData.java:740)
> at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:604)
> at org.apache.avro.specific.SpecificData$2.load(SpecificData.java:218)
> at org.apache.avro.specific.SpecificData$2.load(SpecificData.java:215)
> at 
> avro.shaded.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
> at 
> avro.shaded.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
> at 
> avro.shaded.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
> at 
> avro.shaded.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
> {code}
>  
> The exception is thrown by Avro when introspecting {{JmsCheckpointMark}} to 
> generate schema.
> JmsIO config :
>  
> {code:java}
> PCollection messages = pipeline.apply("read messages from the 
> events broker", JmsIO.readMessage() 
> .withConnectionFactory(jmsConnectionFactory) .withTopic(options.getTopic()) 
> .withMessageMapper(new DFAMessageMapper()) 
> .withCoder(AvroCoder.of(DFAMessage.class)));
> {code}
>  
>  
>  
>  
>  
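
For illustration, the root cause is the synthetic this$0 field the Java 
compiler adds to non-static inner classes: Avro's reflection rejects it as an 
illegal field name. A minimal standalone example (assumed demo classes, not 
Beam's actual code):

```java
import org.apache.avro.reflect.ReflectData;

public class InnerClassSchemaDemo {
  class Inner { long offset; }          // carries a hidden this$0 reference to the outer class
  static class Nested { long offset; }  // no synthetic field, reflects cleanly

  public static void main(String[] args) {
    System.out.println(ReflectData.get().getSchema(Nested.class)); // works
    // Throws org.apache.avro.SchemaParseException: Illegal character in: this$0
    System.out.println(ReflectData.get().getSchema(Inner.class));
  }
}
```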



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-7683) MongoDBIO.withQueryFn ignores custom filter

2019-07-04 Thread Ahmed El.Hussaini (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16878740#comment-16878740
 ] 

Ahmed El.Hussaini commented on BEAM-7683:
-

LOOL. Note to self, never reply to issues before having my morning cappuccino :D

> MongoDBIO.withQueryFn ignores custom filter
> ---
>
> Key: BEAM-7683
> URL: https://issues.apache.org/jira/browse/BEAM-7683
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-mongodb
>Affects Versions: 2.13.0
>Reporter: Chaim
>Assignee: Chaim
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> begin.apply(tableInfo.getTableName(), MongoDbIO.read()
>  .withUri("mongodb://" + connectionParams)
>  .withQueryFn(FindQuery.create().withFilters(filter))
>  .withDatabase(dbName)
>  .withNumSplits(splitCount)
>  .withCollection(tableInfo.getTableName()));
>  
> the filter in withQueryFn is ignored



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-7683) MongoDBIO.withQueryFn ignores custom filter

2019-07-04 Thread JIRA


[ 
https://issues.apache.org/jira/browse/BEAM-7683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16878739#comment-16878739
 ] 

Ismaël Mejía commented on BEAM-7683:


Hehe it is already merged [~sandboxws] :)

> MongoDBIO.withQueryFn ignores custom filter
> ---
>
> Key: BEAM-7683
> URL: https://issues.apache.org/jira/browse/BEAM-7683
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-mongodb
>Affects Versions: 2.13.0
>Reporter: Chaim
>Assignee: Chaim
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> begin.apply(tableInfo.getTableName(), MongoDbIO.read()
>  .withUri("mongodb://" + connectionParams)
>  .withQueryFn(FindQuery.create().withFilters(filter))
>  .withDatabase(dbName)
>  .withNumSplits(splitCount)
>  .withCollection(tableInfo.getTableName()));
>  
> the filter in withQueryFn is ignored



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7503) Create CoGBK Python Load Test Job Jenkins

2019-07-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7503?focusedWorklogId=272229&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-272229
 ]

ASF GitHub Bot logged work on BEAM-7503:


Author: ASF GitHub Bot
Created on: 04/Jul/19 14:46
Start Date: 04/Jul/19 14:46
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on issue #8969: [BEAM-7503] Created 
CoGBK Python Load Test Jenkins job
URL: https://github.com/apache/beam/pull/8969#issuecomment-508507248
 
 
   R: @pabloem Could you take a look at this?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 272229)
Time Spent: 50m  (was: 40m)

> Create CoGBK Python Load Test Job Jenkins
> -
>
> Key: BEAM-7503
> URL: https://issues.apache.org/jira/browse/BEAM-7503
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kasia Kucharczyk
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7503) Create CoGBK Python Load Test Job Jenkins

2019-07-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7503?focusedWorklogId=272217&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-272217
 ]

ASF GitHub Bot logged work on BEAM-7503:


Author: ASF GitHub Bot
Created on: 04/Jul/19 14:27
Start Date: 04/Jul/19 14:27
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on issue #8969: [BEAM-7503] Created 
CoGBK Python Load Test Jenkins job
URL: https://github.com/apache/beam/pull/8969#issuecomment-508501378
 
 
   Run Load Tests Python CoGBK Dataflow Batch
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 272217)
Time Spent: 40m  (was: 0.5h)

> Create CoGBK Python Load Test Job Jenkins
> -
>
> Key: BEAM-7503
> URL: https://issues.apache.org/jira/browse/BEAM-7503
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kasia Kucharczyk
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-7683) MongoDBIO.withQueryFn ignores custom filter

2019-07-04 Thread Ahmed El.Hussaini (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16878700#comment-16878700
 ] 

Ahmed El.Hussaini commented on BEAM-7683:
-

[~Turkel] feel free to send the PR my way when it's ready.

> MongoDBIO.withQueryFn ignores custom filter
> ---
>
> Key: BEAM-7683
> URL: https://issues.apache.org/jira/browse/BEAM-7683
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-mongodb
>Affects Versions: 2.13.0
>Reporter: Chaim
>Assignee: Chaim
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> begin.apply(tableInfo.getTableName(), MongoDbIO.read()
>  .withUri("mongodb://" + connectionParams)
>  .withQueryFn(FindQuery.create().withFilters(filter))
>  .withDatabase(dbName)
>  .withNumSplits(splitCount)
>  .withCollection(tableInfo.getTableName()));
>  
> the filter in withQueryFn is ignored



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7503) Create CoGBK Python Load Test Job Jenkins

2019-07-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7503?focusedWorklogId=272213&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-272213
 ]

ASF GitHub Bot logged work on BEAM-7503:


Author: ASF GitHub Bot
Created on: 04/Jul/19 14:12
Start Date: 04/Jul/19 14:12
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on issue #8969: [BEAM-7503] Created 
CoGBK Python Load Test Jenkins job
URL: https://github.com/apache/beam/pull/8969#issuecomment-508496838
 
 
   Run Seed Job
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 272213)
Time Spent: 0.5h  (was: 20m)

> Create CoGBK Python Load Test Job Jenkins
> -
>
> Key: BEAM-7503
> URL: https://issues.apache.org/jira/browse/BEAM-7503
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kasia Kucharczyk
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7557) Migrate DynamoDBIO to AWS SDK for Java 2

2019-07-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7557?focusedWorklogId=272209&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-272209
 ]

ASF GitHub Bot logged work on BEAM-7557:


Author: ASF GitHub Bot
Created on: 04/Jul/19 14:02
Start Date: 04/Jul/19 14:02
Worklog Time Spent: 10m 
  Work Description: cmachgodaddy commented on pull request #8987: 
[BEAM-7557] - Migrate DynamoDBIO to AWS SDK for Java 2
URL: https://github.com/apache/beam/pull/8987#discussion_r300412735
 
 

 ##
 File path: 
sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/dynamodb/DynamoDBIO.java
 ##
 @@ -0,0 +1,533 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.aws2.dynamodb;
+
+import static 
org.apache.beam.vendor.guava.v20_0.com.google.common.base.Preconditions.checkArgument;
+
+import com.google.auto.value.AutoValue;
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.function.Predicate;
+import java.util.stream.Collectors;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.coders.ListCoder;
+import org.apache.beam.sdk.coders.MapCoder;
+import org.apache.beam.sdk.coders.SerializableCoder;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.metrics.Counter;
+import org.apache.beam.sdk.metrics.Metrics;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.Reshuffle;
+import org.apache.beam.sdk.transforms.SerializableFunction;
+import org.apache.beam.sdk.util.BackOff;
+import org.apache.beam.sdk.util.BackOffUtils;
+import org.apache.beam.sdk.util.FluentBackoff;
+import org.apache.beam.sdk.util.Sleeper;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.TypeDescriptor;
+import 
org.apache.beam.vendor.guava.v20_0.com.google.common.annotations.VisibleForTesting;
+import 
org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableSet;
+import org.apache.http.HttpStatus;
+import org.joda.time.Duration;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
+import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
+import software.amazon.awssdk.services.dynamodb.model.BatchWriteItemRequest;
+import software.amazon.awssdk.services.dynamodb.model.DynamoDbException;
+import software.amazon.awssdk.services.dynamodb.model.ScanRequest;
+import software.amazon.awssdk.services.dynamodb.model.ScanResponse;
+import software.amazon.awssdk.services.dynamodb.model.WriteRequest;
+
+/**
+ * {@link PTransform}s to read/write from/to <a href="https://aws.amazon.com/dynamodb/">DynamoDB</a>.
+ *
+ * Writing to DynamoDB
+ *
+ * Example usage:
+ *
+ * {@code
+ * PCollection<T> data = ...;
+ * data.apply(
+ *   DynamoDBIO.<T>write()
+ *   .withWriteRequestMapperFn(
+ *   (SerializableFunction<T, KV<String, WriteRequest>>)
+ *   //Transforming your T data into KV<String, WriteRequest>
+ *   t -> KV.of(tableName, writeRequest))
+ *   .withRetryConfiguration(
+ *    DynamoDBIO.RetryConfiguration.create(5, Duration.standardMinutes(1)))
+ *   .withAwsClientsProvider(new BasicSnsProvider(accessKey, secretKey, region));
+ * }
+ *
+ * As a client, you need to provide at least the following things:
+ *
+ *   Retry configuration
+ *   Specify AwsClientsProvider. You can pass on the default one BasicSnsProvider
+ *   Mapper function with a table name to map or transform your object into KV<String, WriteRequest>
+ *
+ * Reading from DynamoDB
+ *
+ * Example usage:
+ *
+ * {@code
+ * PCollection<List<Map<String, AttributeValue>>> output =
+ * pipeline.apply(
+ * 

[jira] [Work logged] (BEAM-7557) Migrate DynamoDBIO to AWS SDK for Java 2

2019-07-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7557?focusedWorklogId=272208&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-272208
 ]

ASF GitHub Bot logged work on BEAM-7557:


Author: ASF GitHub Bot
Created on: 04/Jul/19 14:00
Start Date: 04/Jul/19 14:00
Worklog Time Spent: 10m 
  Work Description: cmachgodaddy commented on pull request #8987: 
[BEAM-7557] - Migrate DynamoDBIO to AWS SDK for Java 2
URL: https://github.com/apache/beam/pull/8987#discussion_r300411997
 
 

 ##
 File path: 
sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/dynamodb/DynamoDBIO.java
 ##
 @@ -0,0 +1,533 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.aws2.dynamodb;
+
+import static 
org.apache.beam.vendor.guava.v20_0.com.google.common.base.Preconditions.checkArgument;
+
+import com.google.auto.value.AutoValue;
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.function.Predicate;
+import java.util.stream.Collectors;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.coders.ListCoder;
+import org.apache.beam.sdk.coders.MapCoder;
+import org.apache.beam.sdk.coders.SerializableCoder;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.metrics.Counter;
+import org.apache.beam.sdk.metrics.Metrics;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.Reshuffle;
+import org.apache.beam.sdk.transforms.SerializableFunction;
+import org.apache.beam.sdk.util.BackOff;
+import org.apache.beam.sdk.util.BackOffUtils;
+import org.apache.beam.sdk.util.FluentBackoff;
+import org.apache.beam.sdk.util.Sleeper;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.TypeDescriptor;
+import 
org.apache.beam.vendor.guava.v20_0.com.google.common.annotations.VisibleForTesting;
+import 
org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableSet;
+import org.apache.http.HttpStatus;
+import org.joda.time.Duration;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
+import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
+import software.amazon.awssdk.services.dynamodb.model.BatchWriteItemRequest;
+import software.amazon.awssdk.services.dynamodb.model.DynamoDbException;
+import software.amazon.awssdk.services.dynamodb.model.ScanRequest;
+import software.amazon.awssdk.services.dynamodb.model.ScanResponse;
+import software.amazon.awssdk.services.dynamodb.model.WriteRequest;
+
+/**
+ * {@link PTransform}s to read/write from/to <a href="https://aws.amazon.com/dynamodb/">DynamoDB</a>.
+ *
+ * Writing to DynamoDB
+ *
+ * Example usage:
+ *
+ * {@code
+ * PCollection<T> data = ...;
+ * data.apply(
+ *   DynamoDBIO.<T>write()
+ *   .withWriteRequestMapperFn(
+ *   (SerializableFunction<T, KV<String, WriteRequest>>)
+ *   //Transforming your T data into KV<String, WriteRequest>
+ *   t -> KV.of(tableName, writeRequest))
+ *   .withRetryConfiguration(
+ *    DynamoDBIO.RetryConfiguration.create(5, Duration.standardMinutes(1)))
+ *   .withAwsClientsProvider(new BasicSnsProvider(accessKey, secretKey, region));
+ * }
+ *
+ * As a client, you need to provide at least the following things:
+ *
+ *   Retry configuration
+ *   Specify AwsClientsProvider. You can pass on the default one BasicSnsProvider
+ *   Mapper function with a table name to map or transform your object into KV<String, WriteRequest>
+ *
+ * Reading from DynamoDB
+ *
+ * Example usage:
+ *
+ * {@code
+ * PCollection<List<Map<String, AttributeValue>>> output =
+ * pipeline.apply(
+ * 

[jira] [Assigned] (BEAM-3759) Add support for PaneInfo descriptor in Python SDK

2019-07-04 Thread Tanay Tummalapalli (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tanay Tummalapalli reassigned BEAM-3759:


Assignee: Tanay Tummalapalli

> Add support for PaneInfo descriptor in Python SDK
> -
>
> Key: BEAM-3759
> URL: https://issues.apache.org/jira/browse/BEAM-3759
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Affects Versions: 2.3.0
>Reporter: Charles Chen
>Assignee: Tanay Tummalapalli
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> The PaneInfo descriptor allows a user to determine which particular 
> triggering emitted a value.  This allows the user to differentiate between 
> speculative (early), on-time (at end of window) and late value emissions 
> coming out of a GroupByKey.  We should add support for this feature in the 
> Python SDK.
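
For reference, the Java SDK already exposes this; a sketch of the Java
equivalent that the Python SDK would mirror (illustrative code, using the
existing PaneInfo API):

```java
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.windowing.PaneInfo;

class LogPaneTimingFn extends DoFn<Long, String> {
  @ProcessElement
  public void processElement(ProcessContext c) {
    PaneInfo pane = c.pane(); // which firing produced this element
    switch (pane.getTiming()) {
      case EARLY:   c.output("speculative: " + c.element()); break;
      case ON_TIME: c.output("on-time: " + c.element()); break;
      case LATE:    c.output("late: " + c.element()); break;
      default:      c.output("unknown: " + c.element());
    }
  }
}
```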



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-7589) Kinesis IO.write throws LimitExceededException

2019-07-04 Thread Alexey Romanenko (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16878643#comment-16878643
 ] 

Alexey Romanenko commented on BEAM-7589:


Agree with [~iemejia] that it would be better to test it in advance if it's 
possible, before starting a release process.

> Kinesis IO.write throws LimitExceededException
> --
>
> Key: BEAM-7589
> URL: https://issues.apache.org/jira/browse/BEAM-7589
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kinesis
>Affects Versions: 2.11.0
>Reporter: Anton Kedin
>Assignee: Alexey Romanenko
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Follow up from https://issues.apache.org/jira/browse/BEAM-7357:
>  
> 
> Brachi Packter added a comment - 13/Jun/19 09:05
>  [~aromanenko] I think I found what triggers the shard map update now.
> You create a producer per bundle (in the SetUp function), and if I multiply it 
> by the number of workers, this gives a huge amount of producers; I believe 
> this triggers the "update shard map" call.
> If I copy your code and create *one* producer for every worker, then this 
> error disappears.
> Can you just remove the producer creation from the setUp method and move it to 
> a static field in the class, created once when the class is initialized?
> See the similar issue with JDBCIO: the connection pool was created per setUp 
> method, and we moved it to be a static member, so there is one pool per JVM. 
> Ask [~iemejia] for more detail.
> 
> Alexey Romanenko added a comment - 14/Jun/19 14:31 - edited
>   
>  [~brachi_packter] What kind of error do you have in this case? Could you 
> post an error stacktrace / exception message? 
>  Also, it would be helpful (if it's possible) if you could provide more 
> details about your environment and pipeline, like what is your pipeline 
> topology, which runner do you use, number of workers in your cluster, etc. 
>  For now, I can't reproduce it on my side, so all additional info will be 
> helpful.
> 
> Brachi Packter added a comment - 16/Jun/19 06:44
>  I get the same error:
> {code:java}
> [0x1728][0x7f13ed4c4700] [error] [shard_map.cc:150] Shard map update 
> for stream "**" failed. Code: LimitExceededException Message: Rate exceeded 
> for stream poc-test under account **.; retrying in 5062 ms
> {code}
> I'm not seeing full stack trace, but can see in log also this:
> {code:java}
> [2019-06-13 08:29:09.427018] [0x07e1][0x7f8d508d3700] [warning] [AWS 
> Log: WARN](AWSErrorMarshaller)Encountered AWSError Throttling Rate exceeded
> {code}
> More details:
>  I'm using the Dataflow runner, Java SDK 2.11.
> 60 workers initially (with autoscaling and also with the flag 
> "enableStreamingEngine").
> Normally, I'm producing 4-5k records per second, but when I have latency, this 
> can be multiplied by 3-4 times.
> When I start the Dataflow job I have latency, so I produce more data, 
> and I fail immediately.
> Also, I have consumers (a 3rd-party tool); I know that they call 
> DescribeStream every 30 seconds.
> My pipeline runs on GCP and reads data from PubSub at around 
> 20,000 records per second (in regular time; at latency time even 100,000 
> records per second). It does many aggregations and counts based on some 
> dimensions (using Beam SQL). This is done over a 1-minute sliding window, 
> writing the result of the aggregations to a Kinesis stream.
> My stream has 10 shards, and my partition key logic is generating a UUID per 
> record: 
> UUID.randomUUID().toString()
> Hope this gave you some more context on my problem.
> Another suggestion: can you try to fix the issue as I suggested and provide 
> me a specific version for testing, without merging it to master? (I would 
> do it myself, but I had trouble building the huge Apache Beam repository 
> locally.)
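
A sketch of the workaround suggested in the comments above (hypothetical code, 
not the actual KinesisIO implementation): share one KinesisProducer per JVM via 
a static field rather than creating one per DoFn instance in @Setup.

```java
import com.amazonaws.services.kinesis.producer.KinesisProducer;
import com.amazonaws.services.kinesis.producer.KinesisProducerConfiguration;
import org.apache.beam.sdk.transforms.DoFn;

class KinesisWriteFn extends DoFn<byte[], Void> {
  // One producer per JVM instead of per bundle/instance, so the worker count
  // no longer multiplies the producers issuing shard-map update calls.
  private static volatile KinesisProducer producer;

  @Setup
  public void setup() {
    if (producer == null) {
      synchronized (KinesisWriteFn.class) {
        if (producer == null) {
          producer = new KinesisProducer(new KinesisProducerConfiguration());
        }
      }
    }
  }
}
```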



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-7673) README.md doesn't mention that Docker is required for build

2019-07-04 Thread elharo (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16878638#comment-16878638
 ] 

elharo commented on BEAM-7673:
--

So far I have not succeeded in building this project, so I'm hesitant to 
suggest a fix that might or might not work.

> README.md doesn't mention that Docker is required for build
> ---
>
> Key: BEAM-7673
> URL: https://issues.apache.org/jira/browse/BEAM-7673
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Elliotte Rusty Harold
>Priority: Minor
>
> README.md on github says:
> If you'd like to build and install the whole project from the source 
> distribution, you may need some additional tools installed in your system. In 
> a Debian-based distribution:
> sudo apt-get install \
>  openjdk-8-jdk \
>  python-setuptools \
>  python-pip \
>  virtualenv
> That's correct as far as it goes, but it's incomplete. ./gradlew will fail in 
> task ':website:buildDockerImage' unless Docker is installed and running.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7437) Integration Test for BQ streaming inserts for streaming pipelines

2019-07-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7437?focusedWorklogId=272186&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-272186
 ]

ASF GitHub Bot logged work on BEAM-7437:


Author: ASF GitHub Bot
Created on: 04/Jul/19 12:30
Start Date: 04/Jul/19 12:30
Worklog Time Spent: 10m 
  Work Description: ttanay commented on issue #8934: [BEAM-7437] Add 
streaming flag to BQ streaming inserts IT test
URL: https://github.com/apache/beam/pull/8934#issuecomment-508465544
 
 
   Thanks @udim!
   I thought they were flaky. I'll try again later.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 272186)
Time Spent: 5h 10m  (was: 5h)

> Integration Test for BQ streaming inserts for streaming pipelines
> -
>
> Key: BEAM-7437
> URL: https://issues.apache.org/jira/browse/BEAM-7437
> Project: Beam
>  Issue Type: Test
>  Components: io-python-gcp
>Affects Versions: 2.12.0
>Reporter: Tanay Tummalapalli
>Assignee: Tanay Tummalapalli
>Priority: Minor
>  Labels: test
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Integration Test for BigQuery Sink using Streaming Inserts for streaming 
> pipelines.
> Integration tests currently exist for batch pipelines, it can also be added 
> for streaming pipelines using TestStream. This will be a precursor to the 
> failing integration test to be added for [BEAM-6611| 
> https://issues.apache.org/jira/browse/BEAM-6611].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7427) JmsCheckpointMark Avro Serialisation issue with UnboundedSource

2019-07-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7427?focusedWorklogId=272166&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-272166
 ]

ASF GitHub Bot logged work on BEAM-7427:


Author: ASF GitHub Bot
Created on: 04/Jul/19 11:53
Start Date: 04/Jul/19 11:53
Worklog Time Spent: 10m 
  Work Description: zouabimourad commented on issue #8757: [BEAM-7427] Fix 
JmsCheckpointMark Avro Encoding
URL: https://github.com/apache/beam/pull/8757#issuecomment-508455106
 
 
   > IMHO, as original author of this IO, I would refactor the checkpoint mark 
to use `JmsRecord` instead of `Message` and ack on this. It would be 
straightforward without wrapping.
   > @zouabimourad let me know if you want me to do the change or you wanna try.
   
   If you have a better fix, please commit it.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 272166)
Time Spent: 1h  (was: 50m)

> JmsCheckpointMark Avro Serialisation issue with UnboundedSource
> ---
>
> Key: BEAM-7427
> URL: https://issues.apache.org/jira/browse/BEAM-7427
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-jms
>Affects Versions: 2.12.0
> Environment: Message Broker : solace
> JMS Client (Over AMQP) : "org.apache.qpid:qpid-jms-client:0.42.0
>Reporter: Mourad
>Assignee: Mourad
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> I get the following exception when reading from unbounded JMS Source:
>   
> {code:java}
> Caused by: org.apache.avro.SchemaParseException: Illegal character in: this$0
> at org.apache.avro.Schema.validateName(Schema.java:1151)
> at org.apache.avro.Schema.access$200(Schema.java:81)
> at org.apache.avro.Schema$Field.(Schema.java:403)
> at org.apache.avro.Schema$Field.(Schema.java:396)
> at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:622)
> at org.apache.avro.reflect.ReflectData.createFieldSchema(ReflectData.java:740)
> at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:604)
> at org.apache.avro.specific.SpecificData$2.load(SpecificData.java:218)
> at org.apache.avro.specific.SpecificData$2.load(SpecificData.java:215)
> at 
> avro.shaded.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
> at 
> avro.shaded.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
> at 
> avro.shaded.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
> at 
> avro.shaded.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
> {code}
>  
> The exception is thrown by Avro when introspecting {{JmsCheckpointMark}} to 
> generate schema.
> JmsIO config :
>  
> {code:java}
> PCollection messages = pipeline.apply("read messages from the 
> events broker", JmsIO.readMessage() 
> .withConnectionFactory(jmsConnectionFactory) .withTopic(options.getTopic()) 
> .withMessageMapper(new DFAMessageMapper()) 
> .withCoder(AvroCoder.of(DFAMessage.class)));
> {code}
>  
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5967) ProtoCoder doesn't support DynamicMessage

2019-07-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5967?focusedWorklogId=272161&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-272161
 ]

ASF GitHub Bot logged work on BEAM-5967:


Author: ASF GitHub Bot
Created on: 04/Jul/19 11:46
Start Date: 04/Jul/19 11:46
Worklog Time Spent: 10m 
  Work Description: alexvanboxel commented on issue #8496: [BEAM-5967] Add 
handling of DynamicMessage in ProtoCoder
URL: https://github.com/apache/beam/pull/8496#issuecomment-508453493
 
 
   Well, as I'm sharing some code with the Protobuf Schema support, I'd like to
   keep this one on hold a bit, if that's ok.
   
_/
   _/ Alex Van Boxel
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 272161)
Time Spent: 3h 40m  (was: 3.5h)

> ProtoCoder doesn't support DynamicMessage
> -
>
> Key: BEAM-5967
> URL: https://issues.apache.org/jira/browse/BEAM-5967
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Affects Versions: 2.8.0
>Reporter: Alex Van Boxel
>Assignee: Alex Van Boxel
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> The ProtoCoder makes some assumptions about static messages being 
> available. DynamicMessage doesn't satisfy some of them, mainly because the 
> proto schema is defined at runtime and not at compile time.
> Does it make sense to make a special coder for DynamicMessage, or to build it 
> into the normal ProtoCoder?
> Here is an example of the assumption being made in the current Coder:
> {code:java}
> try {
>   @SuppressWarnings("unchecked")
>   T protoMessageInstance = (T) 
> protoMessageClass.getMethod("getDefaultInstance").invoke(null);
>   @SuppressWarnings("unchecked")
>   Parser tParser = (Parser) protoMessageInstance.getParserForType();
>   memoizedParser = tParser;
> } catch (IllegalAccessException | InvocationTargetException | 
> NoSuchMethodException e) {
>   throw new IllegalArgumentException(e);
> }
> {code}
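
To make the contrast concrete, a minimal sketch (not the PR's implementation) 
of decoding a DynamicMessage: it has no static getDefaultInstance(), so a 
Descriptor must be supplied at runtime instead.

```java
import com.google.protobuf.Descriptors;
import com.google.protobuf.DynamicMessage;
import java.io.IOException;

class DynamicMessageDecoder {
  private final Descriptors.Descriptor descriptor;

  DynamicMessageDecoder(Descriptors.Descriptor descriptor) {
    this.descriptor = descriptor; // schema known only at runtime
  }

  DynamicMessage decode(byte[] bytes) throws IOException {
    // Parses against the runtime descriptor, replacing the
    // getDefaultInstance()/getParserForType() path used for generated classes.
    return DynamicMessage.parseFrom(descriptor, bytes);
  }
}
```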



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7550) Implement missing pipeline parameters in ParDo Load Test

2019-07-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7550?focusedWorklogId=272159&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-272159
 ]

ASF GitHub Bot logged work on BEAM-7550:


Author: ASF GitHub Bot
Created on: 04/Jul/19 11:44
Start Date: 04/Jul/19 11:44
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #8847: [BEAM-7550] 
Missing pipeline parameters in ParDo Load Test
URL: https://github.com/apache/beam/pull/8847
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 272159)
Time Spent: 4h 10m  (was: 4h)

> Implement missing pipeline parameters in ParDo Load Test
> 
>
> Key: BEAM-7550
> URL: https://issues.apache.org/jira/browse/BEAM-7550
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Without some pipeline parameters in ParDo Load Test in Python, it is 
> impossible to create all required test cases (see proposal: 
> [https://s.apache.org/load-test-basic-operations]).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-7544) [workaround available] Please provide a build against scala 2.12 for Flink runner

2019-07-04 Thread Maximilian Michels (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels resolved BEAM-7544.
--
   Resolution: Workaround
Fix Version/s: Not applicable

I'm resolving this because a sensible workaround is available. Most users won't 
need to worry about the Scala version anyways.

> [workaround available] Please provide a build against scala 2.12 for Flink 
> runner
> -
>
> Key: BEAM-7544
> URL: https://issues.apache.org/jira/browse/BEAM-7544
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Affects Versions: 2.11.0, 2.13.0
> Environment: scio 0.7.4 + scala 2.12.8
>Reporter: Cyrille Chépélov
>Priority: Minor
> Fix For: Not applicable
>
>
> Flink has supported Scala 2.12 since version 1.7, while Beam uses Flink 1.8.
> It would be useful to begin supporting Scala 2.12, as preparation for 
> Scala 2.13 as well, as soon as Flink supports it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7420) Including the Flink runner causes exceptions unless running in a flink environment

2019-07-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7420?focusedWorklogId=272150&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-272150
 ]

ASF GitHub Bot logged work on BEAM-7420:


Author: ASF GitHub Bot
Created on: 04/Jul/19 11:34
Start Date: 04/Jul/19 11:34
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #8894: [BEAM-7420]: Allow 
including the flink runner without flink on the classpath.
URL: https://github.com/apache/beam/pull/8894#discussion_r300358988
 
 

 ##
 File path: 
runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkPipelineOptionsTest.java
 ##
 @@ -77,7 +75,7 @@ public void testDefaults() {
 assertThat(options.getLatencyTrackingInterval(), is(0L));
 assertThat(options.isShutdownSourcesOnFinalWatermark(), is(false));
 assertThat(options.getObjectReuse(), is(false));
-assertThat(options.getCheckpointingMode(), 
is(CheckpointingMode.EXACTLY_ONCE));
+assertThat(options.getCheckpointingMode(), is("EXACTLY_ONCE"));
 
 Review comment:
   ```suggestion
   assertThat(options.getCheckpointingMode(), 
is(CheckpointingMode.EXACTLY_ONCE.name()));
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 272150)
Time Spent: 40m  (was: 0.5h)

> Including the Flink runner causes exceptions unless running in a flink 
> environment
> --
>
> Key: BEAM-7420
> URL: https://issues.apache.org/jira/browse/BEAM-7420
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Mike Kaplinskiy
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The {{FlinkPipelineOptions}} imports various Flink enums which make it 
> impossible to e.g. run the direct runner with the same classpath but without 
> the flink runtime. The fix is potentially easy - make the arguments strings 
> and convert them to enums at the callsites.
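A minimal sketch of the proposed pattern (illustrative only; the exact call sites differ in the actual patch): the option stays a plain String, and only code that already depends on Flink converts it to the enum.

{code:java}
// No Flink enum is referenced by the options interface itself, so loading
// FlinkPipelineOptions no longer requires the Flink runtime on the classpath.
String mode = options.getCheckpointingMode();  // e.g. "EXACTLY_ONCE"
org.apache.flink.streaming.api.CheckpointingMode checkpointing =
    org.apache.flink.streaming.api.CheckpointingMode.valueOf(mode);
{code}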



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7420) Including the Flink runner causes exceptions unless running in a flink environment

2019-07-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7420?focusedWorklogId=272149=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-272149
 ]

ASF GitHub Bot logged work on BEAM-7420:


Author: ASF GitHub Bot
Created on: 04/Jul/19 11:34
Start Date: 04/Jul/19 11:34
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #8894: [BEAM-7420]: Allow 
including the flink runner without flink on the classpath.
URL: https://github.com/apache/beam/pull/8894#discussion_r300358891
 
 

 ##
 File path: 
runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkPipelineOptionsTest.java
 ##
 @@ -88,7 +86,7 @@ public void testDefaults() {
 assertThat(options.getStateBackend(), is(nullValue()));
 assertThat(options.getMaxBundleSize(), is(1000L));
 assertThat(options.getMaxBundleTimeMills(), is(1000L));
-assertThat(options.getExecutionModeForBatch(), is(ExecutionMode.PIPELINED));
+assertThat(options.getExecutionModeForBatch(), is("PIPELINED"));
 
 Review comment:
   ```suggestion
   assertThat(options.getExecutionModeForBatch(), is(ExecutionMode.PIPELINED.name()));
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 272149)
Time Spent: 40m  (was: 0.5h)

> Including the Flink runner causes exceptions unless running in a flink 
> environment
> --
>
> Key: BEAM-7420
> URL: https://issues.apache.org/jira/browse/BEAM-7420
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Mike Kaplinskiy
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The {{FlinkPipelineOptions}} imports various Flink enums which make it 
> impossible to e.g. run the direct runner with the same classpath but without 
> the flink runtime. The fix is potentially easy - make the arguments strings 
> and convert them to enums at the callsites.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5967) ProtoCoder doesn't support DynamicMessage

2019-07-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5967?focusedWorklogId=272146=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-272146
 ]

ASF GitHub Bot logged work on BEAM-5967:


Author: ASF GitHub Bot
Created on: 04/Jul/19 11:29
Start Date: 04/Jul/19 11:29
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #8496: [BEAM-5967] Add handling 
of DynamicMessage in ProtoCoder
URL: https://github.com/apache/beam/pull/8496#issuecomment-508448961
 
 
   @kennknowles @reuvenlax Should this go through another review round?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 272146)
Time Spent: 3.5h  (was: 3h 20m)

> ProtoCoder doesn't support DynamicMessage
> -
>
> Key: BEAM-5967
> URL: https://issues.apache.org/jira/browse/BEAM-5967
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Affects Versions: 2.8.0
>Reporter: Alex Van Boxel
>Assignee: Alex Van Boxel
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> The ProtoCoder makes some assumptions about static messages being 
> available. DynamicMessage doesn't satisfy some of them, mainly because its 
> proto schema is defined at runtime rather than at compile time.
> Does it make sense to make a special coder for DynamicMessage, or to build 
> support into the normal ProtoCoder?
> Here is an example of the assumption being made in the current coder:
> {code:java}
> try {
>   @SuppressWarnings("unchecked")
>   T protoMessageInstance = (T) protoMessageClass.getMethod("getDefaultInstance").invoke(null);
>   @SuppressWarnings("unchecked")
>   Parser<T> tParser = (Parser<T>) protoMessageInstance.getParserForType();
>   memoizedParser = tParser;
> } catch (IllegalAccessException | InvocationTargetException | NoSuchMethodException e) {
>   throw new IllegalArgumentException(e);
> }
> {code}
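For contrast, a hedged sketch of how a parser can be obtained for a {{DynamicMessage}}, which has no static {{getDefaultInstance()}} method because its {{Descriptor}} is only known at runtime (illustrative, not the merged design):

{code:java}
import com.google.protobuf.Descriptors.Descriptor;
import com.google.protobuf.DynamicMessage;
import com.google.protobuf.Parser;

// Build the parser from a runtime Descriptor instead of reflecting on a
// generated message class; the Descriptor would need to be handed to the coder.
static Parser<DynamicMessage> parserFor(Descriptor descriptor) {
  return DynamicMessage.getDefaultInstance(descriptor).getParserForType();
}
{code}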



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7577) Allow the use of ValueProviders in datastore.v1new.datastoreio.ReadFromDatastore query

2019-07-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7577?focusedWorklogId=272140=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-272140
 ]

ASF GitHub Bot logged work on BEAM-7577:


Author: ASF GitHub Bot
Created on: 04/Jul/19 11:16
Start Date: 04/Jul/19 11:16
Worklog Time Spent: 10m 
  Work Description: EDjur commented on pull request #8950: [BEAM-7577] 
Allow ValueProviders in Datastore Query filters
URL: https://github.com/apache/beam/pull/8950#discussion_r300351754
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/datastore/v1new/types.py
 ##
 @@ -84,6 +89,30 @@ def _to_client_query(self, client):
   def clone(self):
 return copy.copy(self)
 
+  def _set_runtime_filters(self):
+"""
+Extracts values from ValueProviders in `self.filters` if available
+:param filters: sequence of tuple[str, str, str] or
+sequence of tuple[ValueProvider, ValueProvider, ValueProvider]
 
 Review comment:
   Been thinking more about this today and have a question regarding 
ValueProviders.
   
   Say I have a GCP Cloud Function that starts my Dataflow job using the 
template I've staged. This function can pass parameters to the template that 
would specify a runtime filter that I want to apply when I ReadFromDatastore.
   
   When executing the template by using an API call, I cannot provide a fully 
qualified ValueProvider object as a parameter, I can only use strings as 
parameters. This was the reason I originally used `sequence of 
tuple[ValueProvider, ValueProvider, ValueProvider]`.
   
   When using a `ValueProvider[tuple[str, str, str]]`, I'm not sure if there is 
a clean way (other than ast.literal_eval but we should avoid that) to 
instantiate it using strings received as arguments to the Dataflow template.
   
   I might be misunderstanding exactly how ValueProviders work, but is it 
possible to instantiate a `ValueProvider[tuple[str, str, str]]` where the 
strings in the filter would be determined at runtime, say e.g. through an API 
call?
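For what it's worth, the usual pattern for this (shown with the Java API, which the Python SDK mirrors; the comma-separated encoding is an assumption for illustration) is a NestedValueProvider that defers parsing the template's string argument until runtime:

```java
import java.util.Arrays;
import java.util.List;
import org.apache.beam.sdk.options.ValueProvider;
import org.apache.beam.sdk.options.ValueProvider.NestedValueProvider;

// 'raw' arrives as a plain template string, e.g. "timestamp,>,2019-07-04";
// the split only happens when get() is first called at runtime.
static ValueProvider<List<String>> filterFrom(ValueProvider<String> raw) {
  return NestedValueProvider.of(raw, (String s) -> Arrays.asList(s.split(",")));
}
```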
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 272140)
Time Spent: 4h 10m  (was: 4h)

> Allow the use of ValueProviders in 
> datastore.v1new.datastoreio.ReadFromDatastore query
> --
>
> Key: BEAM-7577
> URL: https://issues.apache.org/jira/browse/BEAM-7577
> Project: Beam
>  Issue Type: New Feature
>  Components: io-python-gcp
>Affects Versions: 2.13.0
>Reporter: EDjur
>Assignee: EDjur
>Priority: Minor
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> The current implementation of ReadFromDatastore does not support specifying 
> the query parameter at runtime. This could potentially be fixed through the 
> usage of a ValueProvider to specify and build the Datastore query.
> Allowing specifying the query at runtime makes it easier to use dynamic 
> queries in Dataflow templates. Currently, there is no way to have a Dataflow 
> template that includes a dynamic query (such as filtering by a timestamp or 
> similar).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5440) Add option to mount a directory inside SDK harness containers

2019-07-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5440?focusedWorklogId=272136=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-272136
 ]

ASF GitHub Bot logged work on BEAM-5440:


Author: ASF GitHub Bot
Created on: 04/Jul/19 11:14
Start Date: 04/Jul/19 11:14
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #8982: [BEAM-5440] Pass 
docker run options to SDK harness containers
URL: https://github.com/apache/beam/pull/8982#discussion_r300351156
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerEnvironmentFactory.java
 ##
 @@ -136,6 +138,10 @@ public RemoteEnvironment createEnvironment(Environment 
environment) throws Excep
   dockerArgsBuilder.add("--rm");
 }
 
+if (!dockerOptions.isEmpty()) {
+  dockerArgsBuilder.addAll(Arrays.asList(dockerOptions.split("\\s+")));
 
 Review comment:
   Can we make the options JSON encoded?
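A sketch of what a JSON encoding could look like (an assumption about the format, not the agreed design), so that option values containing spaces survive intact:

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Decode e.g. ["-v", "/host dir:/mnt"] instead of splitting on whitespace,
// which would break the mount path at the embedded space.
static List<String> parseDockerOptions(String json) throws IOException {
  return Arrays.asList(new ObjectMapper().readValue(json, String[].class));
}
```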
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 272136)
Time Spent: 0.5h  (was: 20m)

> Add option to mount a directory inside SDK harness containers
> -
>
> Key: BEAM-5440
> URL: https://issues.apache.org/jira/browse/BEAM-5440
> Project: Beam
>  Issue Type: New Feature
>  Components: java-fn-execution, sdk-java-core
>Reporter: Maximilian Michels
>Priority: Major
>  Labels: portability, portability-flink
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> While experimenting with the Python SDK locally, I found it inconvenient that 
> I can't mount a host directory to the Docker containers, i.e. the input must 
> already be in the container and the results of a Write remain inside the 
> container. For local testing, users may want to mount a host directory.
> Since BEAM-5288 the {{Environment}} carries explicit environment information, 
> we could a) add volume args to the {{DockerPayload}}, or b) provide a general 
> Docker arguments field.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5440) Add option to mount a directory inside SDK harness containers

2019-07-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5440?focusedWorklogId=272137=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-272137
 ]

ASF GitHub Bot logged work on BEAM-5440:


Author: ASF GitHub Bot
Created on: 04/Jul/19 11:14
Start Date: 04/Jul/19 11:14
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #8982: [BEAM-5440] Pass 
docker run options to SDK harness containers
URL: https://github.com/apache/beam/pull/8982#discussion_r300350057
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py
 ##
 @@ -1235,26 +1235,32 @@ class DockerSdkWorkerHandler(GrpcWorkerHandler):
   def __init__(self, payload, state, provision_info):
 super(DockerSdkWorkerHandler, self).__init__(state, provision_info)
 self._container_image = payload.container_image
+self._options = payload.options
 self._container_id = None
 
   def start_worker(self):
 try:
   subprocess.check_call(['docker', 'pull', self._container_image])
 except Exception:
   logging.info('Unable to pull image %s' % self._container_image)
-self._container_id = subprocess.check_output(
-['docker',
- 'run',
- '-d',
- # TODO:  credentials
- '--network=host',
- self._container_image,
- '--id=%s' % uuid.uuid4(),
- '--logging_endpoint=%s' % self.logging_api_service_descriptor().url,
- '--control_endpoint=%s' % self.control_address,
- '--artifact_endpoint=%s' % self.control_address,
- '--provision_endpoint=%s' % self.control_address,
-]).strip()
+cmd = [
+  'docker',
+  'run',
+  '-d',
+  # TODO:  credentials
+  '--network=host',
+]
+if self._options:
+  cmd.extend(self._options.split(' '))
+cmd.extend([
+  self._container_image,
+  '--id=%s' % uuid.uuid4(),
+  '--logging_endpoint=%s' % self.logging_api_service_descriptor().url,
+  '--control_endpoint=%s' % self.control_address,
+  '--artifact_endpoint=%s' % self.control_address,
+  '--provision_endpoint=%s' % self.control_address,
+])
+self._container_id = subprocess.check_output(cmd).strip()
 
 Review comment:
   As far as I can see this is just reformatting? Please consider reverting 
since this is unrelated.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 272137)
Time Spent: 40m  (was: 0.5h)

> Add option to mount a directory inside SDK harness containers
> -
>
> Key: BEAM-5440
> URL: https://issues.apache.org/jira/browse/BEAM-5440
> Project: Beam
>  Issue Type: New Feature
>  Components: java-fn-execution, sdk-java-core
>Reporter: Maximilian Michels
>Priority: Major
>  Labels: portability, portability-flink
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> While experimenting with the Python SDK locally, I found it inconvenient that 
> I can't mount a host directory to the Docker containers, i.e. the input must 
> already be in the container and the results of a Write remain inside the 
> container. For local testing, users may want to mount a host directory.
> Since BEAM-5288 the {{Environment}} carries explicit environment information, 
> we could a) add volume args to the {{DockerPayload}}, or b) provide a general 
> Docker arguments field.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5440) Add option to mount a directory inside SDK harness containers

2019-07-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5440?focusedWorklogId=272138=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-272138
 ]

ASF GitHub Bot logged work on BEAM-5440:


Author: ASF GitHub Bot
Created on: 04/Jul/19 11:14
Start Date: 04/Jul/19 11:14
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #8982: [BEAM-5440] Pass 
docker run options to SDK harness containers
URL: https://github.com/apache/beam/pull/8982#discussion_r300351520
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/portable_runner.py
 ##
 @@ -128,13 +128,17 @@ def _create_environment(options):
 'Unknown environment type: %s' % portable_options.environment_type)
 
 if environment_urn == common_urns.environments.DOCKER.urn:
-  docker_image = (
-  portable_options.environment_config
-  or PortableRunner.default_docker_image())
+  if not portable_options.environment_config:
+docker_image = PortableRunner.default_docker_image()
+docker_options = ''
+  else:
+docker_options, _, docker_image = portable_options.environment_config \
+  .strip().rpartition(' ')
 
 Review comment:
   I believe this should decode a JSON config string here.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 272138)
Time Spent: 50m  (was: 40m)

> Add option to mount a directory inside SDK harness containers
> -
>
> Key: BEAM-5440
> URL: https://issues.apache.org/jira/browse/BEAM-5440
> Project: Beam
>  Issue Type: New Feature
>  Components: java-fn-execution, sdk-java-core
>Reporter: Maximilian Michels
>Priority: Major
>  Labels: portability, portability-flink
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> While experimenting with the Python SDK locally, I found it inconvenient that 
> I can't mount a host directory to the Docker containers, i.e. the input must 
> already be in the container and the results of a Write remain inside the 
> container. For local testing, users may want to mount a host directory.
> Since BEAM-5288 the {{Environment}} carries explicit environment information, 
> we could a) add volume args to the {{DockerPayload}}, or b) provide a general 
> Docker arguments field.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7577) Allow the use of ValueProviders in datastore.v1new.datastoreio.ReadFromDatastore query

2019-07-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7577?focusedWorklogId=272135=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-272135
 ]

ASF GitHub Bot logged work on BEAM-7577:


Author: ASF GitHub Bot
Created on: 04/Jul/19 11:12
Start Date: 04/Jul/19 11:12
Worklog Time Spent: 10m 
  Work Description: EDjur commented on pull request #8950: [BEAM-7577] 
Allow ValueProviders in Datastore Query filters
URL: https://github.com/apache/beam/pull/8950#discussion_r300351754
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/datastore/v1new/types.py
 ##
 @@ -84,6 +89,30 @@ def _to_client_query(self, client):
   def clone(self):
 return copy.copy(self)
 
+  def _set_runtime_filters(self):
+"""
+Extracts values from ValueProviders in `self.filters` if available
+:param filters: sequence of tuple[str, str, str] or
+sequence of tuple[ValueProvider, ValueProvider, ValueProvider]
 
 Review comment:
   Been thinking more about this today and have a question regarding 
ValueProviders.
   
   Say I have a GCP Cloud Function that starts my Dataflow job using the 
template I've staged. This function can pass parameters to the template that 
would specify a runtime filter that I want to apply when I ReadFromDatastore.
   
   When executing the template by using an API call, I cannot provide a fully 
qualified ValueProvider object as a parameter, I can only use strings as 
parameters. This was the reason I originally used `sequence of 
tuple[ValueProvider, ValueProvider, ValueProvider]`.
   
   When using a `ValueProvider[tuple[str, str, str]]`, I'm not sure if there is 
a clean way to instantiate it using strings received as arguments to the 
Dataflow template.
   
   I might be misunderstanding exactly how ValueProviders work, but is it 
possible to instantiate a `ValueProvider[tuple[str, str, str]]` where the 
strings in the filter would be determined at runtime, say e.g. through an API 
call?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 272135)
Time Spent: 4h  (was: 3h 50m)

> Allow the use of ValueProviders in 
> datastore.v1new.datastoreio.ReadFromDatastore query
> --
>
> Key: BEAM-7577
> URL: https://issues.apache.org/jira/browse/BEAM-7577
> Project: Beam
>  Issue Type: New Feature
>  Components: io-python-gcp
>Affects Versions: 2.13.0
>Reporter: EDjur
>Assignee: EDjur
>Priority: Minor
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> The current implementation of ReadFromDatastore does not support specifying 
> the query parameter at runtime. This could potentially be fixed through the 
> usage of a ValueProvider to specify and build the Datastore query.
> Allowing specifying the query at runtime makes it easier to use dynamic 
> queries in Dataflow templates. Currently, there is no way to have a Dataflow 
> template that includes a dynamic query (such as filtering by a timestamp or 
> similar).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7534) add --mountTempDir option for easier data sharing with Docker container

2019-07-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7534?focusedWorklogId=272132=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-272132
 ]

ASF GitHub Bot logged work on BEAM-7534:


Author: ASF GitHub Bot
Created on: 04/Jul/19 11:05
Start Date: 04/Jul/19 11:05
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #8828: [BEAM-7534] add 
--mountTempDir option for Docker container
URL: https://github.com/apache/beam/pull/8828#discussion_r300342413
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/options/ManualDockerEnvironmentOptions.java
 ##
 @@ -32,6 +32,14 @@
 
   void setRetainDockerContainers(boolean retainDockerContainers);
 
+  @Description(
+  "Mount temporary directory to Docker container instance. "
+  + "The parameter is a path relative to the system temporary 
directory "
+  + "e.g. '--mountTempDir foo' implies '/tmp/foo' on Linux.")
+  String getMountTempDir();
+
+  void setMountTempDir(String mountTempDir);
 
 Review comment:
   I wonder, should this be part of the environment config? The same could also 
be said for the other options here. It is a bit odd that those options won't do 
anything if a different environment is used.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 272132)
Time Spent: 2h 50m  (was: 2h 40m)

> add --mountTempDir option for easier data sharing with Docker container
> ---
>
> Key: BEAM-7534
> URL: https://issues.apache.org/jira/browse/BEAM-7534
> Project: Beam
>  Issue Type: Improvement
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> add --mountTempDir option for easier data sharing with Docker container.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work started] (BEAM-7662) Create Combine Python Load Test Jenkins Job [Flink]

2019-07-04 Thread Kamil Wasilewski (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-7662 started by Kamil Wasilewski.
--
> Create Combine Python Load Test Jenkins Job [Flink]
> ---
>
> Key: BEAM-7662
> URL: https://issues.apache.org/jira/browse/BEAM-7662
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-7662) Create Combine Python Load Test Jenkins Job [Flink]

2019-07-04 Thread Kamil Wasilewski (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski reassigned BEAM-7662:
--

Assignee: Kamil Wasilewski

> Create Combine Python Load Test Jenkins Job [Flink]
> ---
>
> Key: BEAM-7662
> URL: https://issues.apache.org/jira/browse/BEAM-7662
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work started] (BEAM-6694) ApproximateQuantiles transform for Python SDK

2019-07-04 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-6694 started by Shehzaad Nakhoda.
--
> ApproximateQuantiles transform for Python SDK
> -
>
> Key: BEAM-6694
> URL: https://issues.apache.org/jira/browse/BEAM-6694
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Shehzaad Nakhoda
>Priority: Minor
>
> Add PTransforms for getting an idea of a PCollection's data distribution 
> using approximate N-tiles (e.g. quartiles, percentiles, etc.), either 
> globally or per-key.
> It should offer the same API as its Java counterpart: 
> https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateQuantiles.java
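For reference, here is how the existing Java transform is used, which the Python version would mirror ({{durations}} is an assumed {{PCollection}} of integers):

{code:java}
// Emits a single 5-element list [min, Q1, median, Q3, max] for the whole
// collection; ApproximateQuantiles.perKey(5) would do the same per key.
PCollection<java.util.List<Integer>> quartiles =
    durations.apply(ApproximateQuantiles.<Integer>globally(5));
{code}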



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work started] (BEAM-7018) Regex transform for Python SDK

2019-07-04 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-7018 started by Shehzaad Nakhoda.
--
> Regex transform for Python SDK
> --
>
> Key: BEAM-7018
> URL: https://issues.apache.org/jira/browse/BEAM-7018
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Rose Nguyen
>Assignee: Shehzaad Nakhoda
>Priority: Minor
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> PTransforms to use Regular Expressions to process elements in a PCollection
> It should offer the same API as its Java counterpart: 
> [https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Regex.java]
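For reference, a small usage example of the Java transform being mirrored ({{lines}} is an assumed {{PCollection}} of strings):

{code:java}
// Keeps only the elements that match the whole pattern.
PCollection<String> matches = lines.apply(Regex.matches("[A-Z][a-z]+"));
{code}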



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-6694) ApproximateQuantiles transform for Python SDK

2019-07-04 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda reassigned BEAM-6694:
--

Assignee: (was: Shehzaad Nakhoda)

> ApproximateQuantiles transform for Python SDK
> -
>
> Key: BEAM-6694
> URL: https://issues.apache.org/jira/browse/BEAM-6694
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Priority: Minor
>
> Add PTransforms for getting an idea of a PCollection's data distribution 
> using approximate N-tiles (e.g. quartiles, percentiles, etc.), either 
> globally or per-key.
> It should offer the same API as its Java counterpart: 
> https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateQuantiles.java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work started] (BEAM-6756) Support lazy iterables in schemas

2019-07-04 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-6756 started by Shehzaad Nakhoda.
--
> Support lazy iterables in schemas
> -
>
> Key: BEAM-6756
> URL: https://issues.apache.org/jira/browse/BEAM-6756
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>
> The iterables returned by GroupByKey and CoGroupByKey are lazy; this allows a 
> runner to page data into memory if the full iterable is too large. We 
> currently don't support this in Schemas, so the Schema Group and CoGroup 
> transforms materialize all data into memory. We should add support for this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-881) Provide a PTransform in IOs providing a "standard" Avro IndexedRecord

2019-07-04 Thread Ryan Skraba (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16878502#comment-16878502
 ] 

Ryan Skraba commented on BEAM-881:
--

[~jbonofre] – Two years later, it seems obvious that 
org.apache.beam.sdk.values.Row ("immutable tuple-like schema to represent one 
element in a PCollection") is the gold standard that meets all of the 
requirements in this feature!  What do you think?

> Provide a PTransform in IOs providing a "standard" Avro IndexedRecord
> -
>
> Key: BEAM-881
> URL: https://issues.apache.org/jira/browse/BEAM-881
> Project: Beam
>  Issue Type: New Feature
>  Components: io-ideas
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>
> Now, each IO uses a different data format. For instance, {{JmsIO.Read}} 
> provides a {{PCollection}} of {{JmsRecord}} (and {{JmsIO.Write}} also 
> expects a {{JmsRecord}}), while {{KafkaIO.Read}} provides a {{PCollection}} 
> of {{KafkaRecord}}.
> It can appear a bit "complex" for users to manipulate so many kinds of data 
> formats: some users may expect a standard format.
> Without modifying the existing IOs, we could add a {{PTransform}} (as part of 
> each IO) that a user can optionally apply. This transform would convert the 
> IO's data format ({{JmsRecord}}, for instance) to a standard Avro 
> {{IndexedRecord}}.
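A hedged sketch of what such an optional transform could look like for JmsIO (the record schema and field mapping are assumptions for illustration, not an existing API; note that Avro's {{GenericRecord}} is an {{IndexedRecord}}):

{code:java}
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.io.jms.JmsRecord;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;

class JmsRecordToAvro extends PTransform<PCollection<JmsRecord>, PCollection<GenericRecord>> {
  // A minimal "standard" schema: just the text payload of the message.
  private static final Schema SCHEMA = new Schema.Parser().parse(
      "{\"type\":\"record\",\"name\":\"JmsPayload\","
          + "\"fields\":[{\"name\":\"payload\",\"type\":\"string\"}]}");

  @Override
  public PCollection<GenericRecord> expand(PCollection<JmsRecord> input) {
    return input.apply(ParDo.of(new DoFn<JmsRecord, GenericRecord>() {
      @ProcessElement
      public void processElement(@Element JmsRecord record, OutputReceiver<GenericRecord> out) {
        GenericRecord result = new GenericData.Record(SCHEMA);
        result.put("payload", record.getPayload());
        out.output(result);
      }
    }));
  }
}
{code}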



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7427) JmsCheckpointMark Avro Serialisation issue with UnboundedSource

2019-07-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7427?focusedWorklogId=272103=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-272103
 ]

ASF GitHub Bot logged work on BEAM-7427:


Author: ASF GitHub Bot
Created on: 04/Jul/19 09:44
Start Date: 04/Jul/19 09:44
Worklog Time Spent: 10m 
  Work Description: jbonofre commented on issue #8757: [BEAM-7427] Fix 
JmsCheckpointMark Avro Encoding
URL: https://github.com/apache/beam/pull/8757#issuecomment-508416823
 
 
   Why not just change the coder of the checkpoint?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 272103)
Time Spent: 50m  (was: 40m)

> JmsCheckpointMark Avro Serialisation issue with UnboundedSource
> ---
>
> Key: BEAM-7427
> URL: https://issues.apache.org/jira/browse/BEAM-7427
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-jms
>Affects Versions: 2.12.0
> Environment: Message Broker : solace
> JMS Client (Over AMQP) : "org.apache.qpid:qpid-jms-client:0.42.0
>Reporter: Mourad
>Assignee: Mourad
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> I get the following exception when reading from unbounded JMS Source:
>   
> {code:java}
> Caused by: org.apache.avro.SchemaParseException: Illegal character in: this$0
> at org.apache.avro.Schema.validateName(Schema.java:1151)
> at org.apache.avro.Schema.access$200(Schema.java:81)
> at org.apache.avro.Schema$Field.<init>(Schema.java:403)
> at org.apache.avro.Schema$Field.<init>(Schema.java:396)
> at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:622)
> at org.apache.avro.reflect.ReflectData.createFieldSchema(ReflectData.java:740)
> at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:604)
> at org.apache.avro.specific.SpecificData$2.load(SpecificData.java:218)
> at org.apache.avro.specific.SpecificData$2.load(SpecificData.java:215)
> at 
> avro.shaded.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
> at 
> avro.shaded.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
> at 
> avro.shaded.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
> at 
> avro.shaded.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
> {code}
>  
> The exception is thrown by Avro when introspecting {{JmsCheckpointMark}} to 
> generate schema.
> JmsIO config :
>  
> {code:java}
> PCollection<DFAMessage> messages = pipeline.apply("read messages from the events broker",
>     JmsIO.<DFAMessage>readMessage()
>         .withConnectionFactory(jmsConnectionFactory)
>         .withTopic(options.getTopic())
>         .withMessageMapper(new DFAMessageMapper())
>         .withCoder(AvroCoder.of(DFAMessage.class)));
> {code}
>  
>  
>  
>  
>  
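The {{Illegal character in: this$0}} message points at a general Java pitfall, illustrated by the sketch below: a non-static inner class carries a synthetic reference to its enclosing instance, which Avro's reflection surfaces as a field named {{this$0}}, and that is not a valid Avro name.

{code:java}
public class Outer {
  // The compiler adds a hidden Outer.this reference to Inner; reflection
  // sees it as a field called "this$0", which Avro rejects.
  class Inner {}

  // A static nested class has no such synthetic field and reflects cleanly.
  static class Nested {}
}
{code}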



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7550) Implement missing pipeline parameters in ParDo Load Test

2019-07-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7550?focusedWorklogId=272102=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-272102
 ]

ASF GitHub Bot logged work on BEAM-7550:


Author: ASF GitHub Bot
Created on: 04/Jul/19 09:43
Start Date: 04/Jul/19 09:43
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on issue #8847: [BEAM-7550] Missing 
pipeline parameters in ParDo Load Test
URL: https://github.com/apache/beam/pull/8847#issuecomment-508417431
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 272102)
Time Spent: 4h  (was: 3h 50m)

> Implement missing pipeline parameters in ParDo Load Test
> 
>
> Key: BEAM-7550
> URL: https://issues.apache.org/jira/browse/BEAM-7550
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Without some pipeline parameters in ParDo Load Test in Python, it is 
> impossible to create all required test cases (see proposal: 
> [https://s.apache.org/load-test-basic-operations]).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-7682) Combine.GroupedValues javadoc code snippet does not work

2019-07-04 Thread Alexey Romanenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko resolved BEAM-7682.

   Resolution: Fixed
Fix Version/s: 2.15.0

> Combine.GroupedValues javadoc code snippet does not work
> 
>
> Key: BEAM-7682
> URL: https://issues.apache.org/jira/browse/BEAM-7682
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Labels: documentation, javadoc
> Fix For: 2.15.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The snippet in the javadoc of GroupedValues says:
> {code:java}
> PCollection<KV<String, Integer>> pc = ...;
> PCollection<KV<String, Iterable<Integer>>> groupedByKey = pc.apply(new GroupByKey<String, Integer>());
> PCollection<KV<String, Integer>> sumByKey = groupedByKey.apply(Combine.groupedValues(new Sum.SumIntegerFn()));
> {code}
> but should be:
> {code:java}
> PCollection<KV<String, Integer>> pc = ...;
> PCollection<KV<String, Iterable<Integer>>> groupedByKey = pc.apply(GroupByKey.create());
> PCollection<KV<String, Integer>> sumByKey = groupedByKey.apply(Combine.groupedValues(Sum.ofIntegers()));
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7682) Combine.GroupedValues javadoc code snippet does not work

2019-07-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7682?focusedWorklogId=272100=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-272100
 ]

ASF GitHub Bot logged work on BEAM-7682:


Author: ASF GitHub Bot
Created on: 04/Jul/19 09:41
Start Date: 04/Jul/19 09:41
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on pull request #8995: 
[BEAM-7682] Fix Combine.GroupedValues javadoc code snippet
URL: https://github.com/apache/beam/pull/8995
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 272100)
Time Spent: 40m  (was: 0.5h)

> Combine.GroupedValues javadoc code snippet does not work
> 
>
> Key: BEAM-7682
> URL: https://issues.apache.org/jira/browse/BEAM-7682
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Labels: documentation, javadoc
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The snippet in the javadoc of GroupedValues says:
> {code:java}
> PCollection<KV<String, Integer>> pc = ...;
> PCollection<KV<String, Iterable<Integer>>> groupedByKey = pc.apply(new GroupByKey<String, Integer>());
> PCollection<KV<String, Integer>> sumByKey = groupedByKey.apply(Combine.groupedValues(new Sum.SumIntegerFn()));
> {code}
> but should be:
> {code:java}
> PCollection<KV<String, Integer>> pc = ...;
> PCollection<KV<String, Iterable<Integer>>> groupedByKey = pc.apply(GroupByKey.create());
> PCollection<KV<String, Integer>> sumByKey = groupedByKey.apply(Combine.groupedValues(Sum.ofIntegers()));
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7427) JmsCheckpointMark Avro Serialisation issue with UnboundedSource

2019-07-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7427?focusedWorklogId=272101=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-272101
 ]

ASF GitHub Bot logged work on BEAM-7427:


Author: ASF GitHub Bot
Created on: 04/Jul/19 09:41
Start Date: 04/Jul/19 09:41
Worklog Time Spent: 10m 
  Work Description: jbonofre commented on issue #8757: [BEAM-7427] Fix 
JmsCheckpointMark Avro Encoding
URL: https://github.com/apache/beam/pull/8757#issuecomment-508416823
 
 
   Just on the rationale: what's the point of an Avro coder for the checkpoint 
mark? It's Serializable already. The elements of the PCollection should be Avro 
compliant, but I don't see the point for the checkpoint.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 272101)
Time Spent: 40m  (was: 0.5h)

> JmsCheckpointMark Avro Serialisation issue with UnboundedSource
> ---
>
> Key: BEAM-7427
> URL: https://issues.apache.org/jira/browse/BEAM-7427
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-jms
>Affects Versions: 2.12.0
> Environment: Message Broker : solace
> JMS Client (Over AMQP) : "org.apache.qpid:qpid-jms-client:0.42.0
>Reporter: Mourad
>Assignee: Mourad
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> I get the following exception when reading from unbounded JMS Source:
>   
> {code:java}
> Caused by: org.apache.avro.SchemaParseException: Illegal character in: this$0
> at org.apache.avro.Schema.validateName(Schema.java:1151)
> at org.apache.avro.Schema.access$200(Schema.java:81)
> at org.apache.avro.Schema$Field.<init>(Schema.java:403)
> at org.apache.avro.Schema$Field.<init>(Schema.java:396)
> at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:622)
> at org.apache.avro.reflect.ReflectData.createFieldSchema(ReflectData.java:740)
> at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:604)
> at org.apache.avro.specific.SpecificData$2.load(SpecificData.java:218)
> at org.apache.avro.specific.SpecificData$2.load(SpecificData.java:215)
> at 
> avro.shaded.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
> at 
> avro.shaded.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
> at 
> avro.shaded.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
> at 
> avro.shaded.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
> {code}
>  
> The exception is thrown by Avro when introspecting {{JmsCheckpointMark}} to 
> generate schema.
> JmsIO config :
>  
> {code:java}
> PCollection<DFAMessage> messages = pipeline.apply("read messages from the events broker",
>     JmsIO.<DFAMessage>readMessage()
>         .withConnectionFactory(jmsConnectionFactory)
>         .withTopic(options.getTopic())
>         .withMessageMapper(new DFAMessageMapper())
>         .withCoder(AvroCoder.of(DFAMessage.class)));
> {code}
>  
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7683) MongoDBIO.withQueryFn ignores custom filter

2019-07-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7683?focusedWorklogId=272071=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-272071
 ]

ASF GitHub Bot logged work on BEAM-7683:


Author: ASF GitHub Bot
Created on: 04/Jul/19 09:18
Start Date: 04/Jul/19 09:18
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #9004: [BEAM-7683] 
MongoDBIO.withQueryFn ignores custom filter
URL: https://github.com/apache/beam/pull/9004
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 272071)
Time Spent: 20m  (was: 10m)

> MongoDBIO.withQueryFn ignores custom filter
> ---
>
> Key: BEAM-7683
> URL: https://issues.apache.org/jira/browse/BEAM-7683
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-mongodb
>Affects Versions: 2.13.0
>Reporter: Chaim
>Assignee: Chaim
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> begin.apply(tableInfo.getTableName(), MongoDbIO.read()
>  .withUri("mongodb://" + connectionParams)
>  .withQueryFn(FindQuery.create().withFilters(filter))
>  .withDatabase(dbName)
>  .withNumSplits(splitCount)
>  .withCollection(tableInfo.getTableName()));
>  
> the filter in withQueryFn is ignored
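A sketch of the shape of the fix (an illustration of the idea, not the merged patch): the range filter generated for each split should be AND-ed with the user's filter instead of replacing it. The names {{userFilter}}, {{lower}} and {{upper}} are hypothetical.

{code:java}
import static com.mongodb.client.model.Filters.and;
import static com.mongodb.client.model.Filters.gte;
import static com.mongodb.client.model.Filters.lt;
import org.bson.conversions.Bson;

// userFilter comes from FindQuery.create().withFilters(...); lower/upper are
// the _id bounds of the current split.
static Bson effectiveFilter(Bson userFilter, Object lower, Object upper) {
  Bson splitRange = and(gte("_id", lower), lt("_id", upper));
  return userFilter == null ? splitRange : and(userFilter, splitRange);
}
{code}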



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7691) UnboundedSource.CheckpointMark should mention that implementations should be Serializable or have an associated Coder

2019-07-04 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7691:
---
Labels: documentation javadoc newbie starter  (was: documentation javadoc)

> UnboundedSource.CheckpointMark should mention that implementations should be 
> Serializable or have an associated Coder
> --
>
> Key: BEAM-7691
> URL: https://issues.apache.org/jira/browse/BEAM-7691
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Priority: Minor
>  Labels: documentation, javadoc, newbie, starter
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7691) UnboundedSource.CheckpointMark should mention that implementations should be Serializable or have an associated Coder

2019-07-04 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7691:
---
Labels: documentation javadoc  (was: )

> UnboundedSource.CheckpointMark should mention that implementations should be 
> Serializable or have an associated Coder
> --
>
> Key: BEAM-7691
> URL: https://issues.apache.org/jira/browse/BEAM-7691
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Priority: Minor
>  Labels: documentation, javadoc
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-7691) UnboundedSource.CheckpointMark should mention that implementations should be Serializable or have an associated Coder

2019-07-04 Thread JIRA


[ 
https://issues.apache.org/jira/browse/BEAM-7691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16878456#comment-16878456
 ] 

Ismaël Mejía commented on BEAM-7691:


We should probably add documentation on this to the PTransform style guide, and 
it is a good idea to cover this in the tests too.

> UnboundedSource.CheckpointMark should mention that implementations should be 
> Serializable or have an associated Coder
> --
>
> Key: BEAM-7691
> URL: https://issues.apache.org/jira/browse/BEAM-7691
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Priority: Minor
>  Labels: documentation, javadoc
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7691) UnboundedSource.CheckpointMark should mention that implementations should be Serializable or have an associated Coder

2019-07-04 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7691:
---
Status: Open  (was: Triage Needed)

> UnboundedSource.CheckpointMark should mention that implementations should be 
> Serializable or have an associated Coder
> --
>
> Key: BEAM-7691
> URL: https://issues.apache.org/jira/browse/BEAM-7691
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-7691) UnboundedSource.CheckpointMark should mention that implementations should be Serializable or have an associated Coder

2019-07-04 Thread JIRA
Ismaël Mejía created BEAM-7691:
--

 Summary: UnboundedSource.CheckpointMark should mention that 
implementations should be Serializable or have an associated Coder
 Key: BEAM-7691
 URL: https://issues.apache.org/jira/browse/BEAM-7691
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-core
Reporter: Ismaël Mejía






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7427) JmsCheckpointMark Avro Serialisation issue with UnboundedSource

2019-07-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7427?focusedWorklogId=272060=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-272060
 ]

ASF GitHub Bot logged work on BEAM-7427:


Author: ASF GitHub Bot
Created on: 04/Jul/19 08:50
Start Date: 04/Jul/19 08:50
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #8757: [BEAM-7427] 
Fix JmsCheckpointMark Avro Encoding
URL: https://github.com/apache/beam/pull/8757#discussion_r300294831
 
 

 ##
 File path: 
sdks/java/io/jms/src/test/java/org/apache/beam/sdk/io/jms/JmsIOTest.java
 ##
 @@ -333,6 +335,13 @@ public void testCheckpointMark() throws Exception {
 assertEquals(0, count(QUEUE));
   }
 
+  @Test
+  public void testJmsCheckpointMarkAvroEncoding() {
+AvroCoder<JmsCheckpointMark> avroCoder = AvroCoder.of(JmsCheckpointMark.class);
+
+assertNotNull(avroCoder);
 
 Review comment:
   Can you please add a test here that validates that we can encode and decode 
an instance of `JmsCheckpointMark` correctly? For example by using: 
`CoderProperties.coderDecodeEncodeEqual(avroCoder, checkpointMark);` or 
`CoderProperties.structuralValueDecodeEncodeEqual(avroCoder, checkpointMark);`
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 272060)
Time Spent: 0.5h  (was: 20m)

> JmsCheckpointMark Avro Serialisation issue with UnboundedSource
> ---
>
> Key: BEAM-7427
> URL: https://issues.apache.org/jira/browse/BEAM-7427
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-jms
>Affects Versions: 2.12.0
> Environment: Message Broker : solace
> JMS Client (Over AMQP) : "org.apache.qpid:qpid-jms-client:0.42.0
>Reporter: Mourad
>Assignee: Mourad
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I get the following exception when reading from unbounded JMS Source:
>   
> {code:java}
> Caused by: org.apache.avro.SchemaParseException: Illegal character in: this$0
> at org.apache.avro.Schema.validateName(Schema.java:1151)
> at org.apache.avro.Schema.access$200(Schema.java:81)
> at org.apache.avro.Schema$Field.<init>(Schema.java:403)
> at org.apache.avro.Schema$Field.<init>(Schema.java:396)
> at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:622)
> at org.apache.avro.reflect.ReflectData.createFieldSchema(ReflectData.java:740)
> at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:604)
> at org.apache.avro.specific.SpecificData$2.load(SpecificData.java:218)
> at org.apache.avro.specific.SpecificData$2.load(SpecificData.java:215)
> at 
> avro.shaded.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
> at 
> avro.shaded.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
> at 
> avro.shaded.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
> at 
> avro.shaded.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
> {code}
>  
> The exception is thrown by Avro when introspecting {{JmsCheckpointMark}} to 
> generate schema.
> JmsIO config :
>  
> {code:java}
> PCollection<DFAMessage> messages = pipeline.apply("read messages from the events broker",
>     JmsIO.<DFAMessage>readMessage()
>         .withConnectionFactory(jmsConnectionFactory)
>         .withTopic(options.getTopic())
>         .withMessageMapper(new DFAMessageMapper())
>         .withCoder(AvroCoder.of(DFAMessage.class)));
> {code}
>  
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7427) JmsCheckpointMark Avro Serialisation issue with UnboundedSource

2019-07-04 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7427:
---
Summary: JmsCheckpointMark Avro Serialisation issue with UnboundedSource  
(was: JmsCheckpointMark Avro Serialisation issue with unbounded Source)

> JmsCheckpointMark Avro Serialisation issue with UnboundedSource
> ---
>
> Key: BEAM-7427
> URL: https://issues.apache.org/jira/browse/BEAM-7427
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-jms
>Affects Versions: 2.12.0
> Environment: Message Broker : solace
> JMS Client (Over AMQP) : "org.apache.qpid:qpid-jms-client:0.42.0
>Reporter: Mourad
>Assignee: Mourad
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I get the following exception when reading from unbounded JMS Source:
>   
> {code:java}
> Caused by: org.apache.avro.SchemaParseException: Illegal character in: this$0
> at org.apache.avro.Schema.validateName(Schema.java:1151)
> at org.apache.avro.Schema.access$200(Schema.java:81)
> at org.apache.avro.Schema$Field.<init>(Schema.java:403)
> at org.apache.avro.Schema$Field.<init>(Schema.java:396)
> at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:622)
> at org.apache.avro.reflect.ReflectData.createFieldSchema(ReflectData.java:740)
> at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:604)
> at org.apache.avro.specific.SpecificData$2.load(SpecificData.java:218)
> at org.apache.avro.specific.SpecificData$2.load(SpecificData.java:215)
> at 
> avro.shaded.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
> at 
> avro.shaded.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
> at 
> avro.shaded.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
> at 
> avro.shaded.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
> {code}
>  
> The exception is thrown by Avro when introspecting {{JmsCheckpointMark}} to 
> generate schema.
> JmsIO config :
>  
> {code:java}
> PCollection<DFAMessage> messages = pipeline.apply("read messages from the events broker",
>     JmsIO.<DFAMessage>readMessage()
>         .withConnectionFactory(jmsConnectionFactory)
>         .withTopic(options.getTopic())
>         .withMessageMapper(new DFAMessageMapper())
>         .withCoder(AvroCoder.of(DFAMessage.class)));
> {code}
>  
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7427) JmsCheckpointMark Avro Serialisation issue with unbounded Source

2019-07-04 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7427:
---
Summary: JmsCheckpointMark Avro Serialisation issue with unbounded Source  
(was: JmsCheckpointMark AVRO Serialisation issue with unbounded Source)

> JmsCheckpointMark Avro Serialisation issue with unbounded Source
> 
>
> Key: BEAM-7427
> URL: https://issues.apache.org/jira/browse/BEAM-7427
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-jms
>Affects Versions: 2.12.0
> Environment: Message Broker : solace
> JMS Client (Over AMQP) : "org.apache.qpid:qpid-jms-client:0.42.0
>Reporter: Mourad
>Assignee: Mourad
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I get the following exception when reading from unbounded JMS Source:
>   
> {code:java}
> Caused by: org.apache.avro.SchemaParseException: Illegal character in: this$0
> at org.apache.avro.Schema.validateName(Schema.java:1151)
> at org.apache.avro.Schema.access$200(Schema.java:81)
> at org.apache.avro.Schema$Field.<init>(Schema.java:403)
> at org.apache.avro.Schema$Field.<init>(Schema.java:396)
> at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:622)
> at org.apache.avro.reflect.ReflectData.createFieldSchema(ReflectData.java:740)
> at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:604)
> at org.apache.avro.specific.SpecificData$2.load(SpecificData.java:218)
> at org.apache.avro.specific.SpecificData$2.load(SpecificData.java:215)
> at 
> avro.shaded.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
> at 
> avro.shaded.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
> at 
> avro.shaded.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
> at 
> avro.shaded.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
> {code}
>  
> The exception is thrown by Avro when introspecting {{JmsCheckpointMark}} to 
> generate schema.
> JmsIO config :
>  
> {code:java}
> PCollection<DFAMessage> messages = pipeline.apply("read messages from the events broker",
>     JmsIO.<DFAMessage>readMessage()
>         .withConnectionFactory(jmsConnectionFactory)
>         .withTopic(options.getTopic())
>         .withMessageMapper(new DFAMessageMapper())
>         .withCoder(AvroCoder.of(DFAMessage.class)));
> {code}
>  
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7683) MongoDBIO - withQueryFn, ignoring custom filter

2019-07-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7683?focusedWorklogId=272049=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-272049
 ]

ASF GitHub Bot logged work on BEAM-7683:


Author: ASF GitHub Bot
Created on: 04/Jul/19 08:19
Start Date: 04/Jul/19 08:19
Worklog Time Spent: 10m 
  Work Description: chaimt commented on pull request #9004: [BEAM-7683] - 
fix withQueryFn when split is more than 0
URL: https://github.com/apache/beam/pull/9004
 
 
   In the case of a split count greater than 0, the user-defined filter was 
overridden by the split filter.
   
   @jbonofre
   
[jira] [Assigned] (BEAM-7427) JmsCheckpointMark AVRO Serialisation issue with unbounded Source

2019-07-04 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reassigned BEAM-7427:
--

Assignee: Mourad

> JmsCheckpointMark AVRO Serialisation issue with unbounded Source
> 
>
> Key: BEAM-7427
> URL: https://issues.apache.org/jira/browse/BEAM-7427
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-jms
>Affects Versions: 2.12.0
> Environment: Message Broker : solace
> JMS Client (Over AMQP) : "org.apache.qpid:qpid-jms-client:0.42.0
>Reporter: Mourad
>Assignee: Mourad
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I get the following exception when reading from an unbounded JMS source:
>   
> {code:java}
> Caused by: org.apache.avro.SchemaParseException: Illegal character in: this$0
> at org.apache.avro.Schema.validateName(Schema.java:1151)
> at org.apache.avro.Schema.access$200(Schema.java:81)
> at org.apache.avro.Schema$Field.(Schema.java:403)
> at org.apache.avro.Schema$Field.(Schema.java:396)
> at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:622)
> at org.apache.avro.reflect.ReflectData.createFieldSchema(ReflectData.java:740)
> at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:604)
> at org.apache.avro.specific.SpecificData$2.load(SpecificData.java:218)
> at org.apache.avro.specific.SpecificData$2.load(SpecificData.java:215)
> at 
> avro.shaded.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
> at 
> avro.shaded.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
> at 
> avro.shaded.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
> at 
> avro.shaded.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
> {code}
>  
> The exception is thrown by Avro when introspecting {{JmsCheckpointMark}} to 
> generate a schema.
> JmsIO config:
>  
> {code:java}
> PCollection<DFAMessage> messages = pipeline.apply(
> "read messages from the events broker",
> JmsIO.<DFAMessage>readMessage()
> .withConnectionFactory(jmsConnectionFactory)
> .withTopic(options.getTopic())
> .withMessageMapper(new DFAMessageMapper())
> .withCoder(AvroCoder.of(DFAMessage.class)));
> {code}
>  
>  
>  
>  
>  
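For context, Avro's reflective schema generation rejects synthetic field names such as this$0, which the Java compiler adds to non-static inner classes; the error above suggests one of the classes reached through {{JmsCheckpointMark}} is such a class. A minimal standalone reproduction of the Avro behavior (class names hypothetical):

{code:java}
import org.apache.avro.reflect.ReflectData;

public class Outer {
  // Non-static inner class: javac adds a synthetic this$0 reference to Outer.
  class Inner {
    int value;
  }

  public static void main(String[] args) {
    // Throws org.apache.avro.SchemaParseException: Illegal character in: this$0
    System.out.println(ReflectData.get().getSchema(Inner.class));
  }
}
{code}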



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4087) Gradle build does not allow to overwrite versions of provided dependencies

2019-07-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4087?focusedWorklogId=272047=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-272047
 ]

ASF GitHub Bot logged work on BEAM-4087:


Author: ASF GitHub Bot
Created on: 04/Jul/19 08:12
Start Date: 04/Jul/19 08:12
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #8255: [BEAM-4087] implement 
configurable dependency versions
URL: https://github.com/apache/beam/pull/8255#issuecomment-508386323
 
 
   @adude3141 sorry for having deprioritized this subject; I really do care 
about it, there was just too much going on and I could not look at it. Do you 
think we should keep this open, take the approach you mention in the other PR, 
or try some other new idea?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 272047)
Time Spent: 4h 10m  (was: 4h)

> Gradle build does not allow to overwrite versions of provided dependencies
> --
>
> Key: BEAM-4087
> URL: https://issues.apache.org/jira/browse/BEAM-4087
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Affects Versions: 2.5.0
>Reporter: Ismaël Mejía
>Assignee: Michael Luckey
>Priority: Major
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> In order to test modules with provided dependencies in maven we can execute 
> for example for Kafka `mvn verify -Prelease -Dkafka.clients.version=0.9.0.1 
> -pl 'sdks/java/io/kafka'`. However, we don't have an equivalent way to do this 
> with Gradle, because the versions of the dependencies are defined locally and 
> not in gradle.properties.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-7589) Kinesis IO.write throws LimitExceededException

2019-07-04 Thread JIRA


[ 
https://issues.apache.org/jira/browse/BEAM-7589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16878423#comment-16878423
 ] 

Ismaël Mejía commented on BEAM-7589:


Mmm, it might also be that the Dataflow worker has not been updated yet. Maybe 
[~kedin] or [~pabloem] can help us confirm whether this is the case and whether 
there is a way Brachi can test this in advance.

> Kinesis IO.write throws LimitExceededException
> --
>
> Key: BEAM-7589
> URL: https://issues.apache.org/jira/browse/BEAM-7589
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kinesis
>Affects Versions: 2.11.0
>Reporter: Anton Kedin
>Assignee: Alexey Romanenko
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Follow up from https://issues.apache.org/jira/browse/BEAM-7357:
>  
> 
> Brachi Packter added a comment - 13/Jun/19 09:05
>  [~aromanenko] I think I've found what triggers the shard map update now.
> You create a producer per bundle (in the SetUp function), and if I multiply 
> that by the number of workers, it gives a huge number of producers; I believe 
> this triggers the "update shard map" call.
> If I copy your code and create *one* producer for every worker, this 
> error disappears.
> Can you just remove the producer creation from the setUp method and move it to 
> a static field in the class, created once when the class is initialized? (A 
> sketch of this follows the issue summary below.)
> See the similar issue with JdbcIO: the connection pool was created per setUp 
> method, and we moved it to a static member so that there is one pool per JVM. 
> Ask [~iemejia] for more details.
> 
> Alexey Romanenko added a comment - 14/Jun/19 14:31 - edited
>   
>  [~brachi_packter] What kind of error do you have in this case? Could you 
> post an error stacktrace / exception message? 
>  Also, it would be helpful (if it's possible) if you could provide more 
> details about your environment and pipeline, like what is your pipeline 
> topology, which runner do you use, number of workers in your cluster, etc. 
>  For now, I can't reproduce it on my side, so all additional info will be 
> helpful.
> 
> Brachi Packter added a comment - 16/Jun/19 06:44
>  I get the same error:
> {code:java}
> [0x1728][0x7f13ed4c4700] [error] [shard_map.cc:150] Shard map update 
> for stream "**" failed. Code: LimitExceededException Message: Rate exceeded 
> for stream poc-test under account **.; retrying in 5062 ms
> {code}
> I'm not seeing the full stack trace, but I can also see this in the log:
> {code:java}
> [2019-06-13 08:29:09.427018] [0x07e1][0x7f8d508d3700] [warning] [AWS 
> Log: WARN](AWSErrorMarshaller)Encountered AWSError Throttling Rate exceeded
> {code}
> More details:
>  I'm using the Dataflow runner, Java SDK 2.11.
> 60 workers initially (with autoscaling, and also with the flag 
> "enableStreamingEngine").
> Normally I'm producing 4-5k records per second, but when I have latency this 
> can be multiplied by 3-4 times.
> When I start the Dataflow job I have latency, so I produce more data, and I 
> fail immediately.
> Also, I have consumers (a 3rd-party tool); I know that they call describe 
> stream every 30 seconds.
> My pipeline, running on GCP, reads data from PubSub at around 20,000 records 
> per second (in regular time; under latency even 100,000 records per second). 
> It does many aggregations and counts based on some dimensions (using Beam 
> SQL) over a 1-minute sliding window, and writes the aggregation results to a 
> Kinesis stream.
> My stream has 10 shards, and my partition key logic generates a UUID per 
> record: 
> UUID.randomUUID().toString()
> Hope this gives you some more context on my problem.
> One more suggestion: can you fix the issue as I suggested and provide me a 
> specific version for testing, without merging it to master? (I would do it 
> myself, but I had trouble building the huge Apache Beam repository locally..)
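A minimal sketch of the suggestion above, assuming the Amazon KPL's KinesisProducer; the DoFn name and wiring are hypothetical:

{code:java}
import com.amazonaws.services.kinesis.producer.KinesisProducer;
import com.amazonaws.services.kinesis.producer.KinesisProducerConfiguration;
import org.apache.beam.sdk.transforms.DoFn;

class KinesisWriterFn extends DoFn<byte[], Void> {
  // One producer per worker JVM rather than one per bundle, so the number of
  // shard map lookups no longer grows with the number of bundles.
  private static KinesisProducer producer;

  private static synchronized KinesisProducer getOrCreateProducer(
      KinesisProducerConfiguration config) {
    if (producer == null) {
      producer = new KinesisProducer(config);
    }
    return producer;
  }

  @ProcessElement
  public void processElement(ProcessContext context) {
    // Writing via the shared producer is elided; the point is only that
    // construction happens at most once per JVM.
  }
}
{code}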



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-7673) README.md doesn't mention that Docker is required for build

2019-07-04 Thread JIRA


[ 
https://issues.apache.org/jira/browse/BEAM-7673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16878415#comment-16878415
 ] 

Ismaël Mejía commented on BEAM-7673:


Would you be interested in providing a PR to fix this one, [~elharo]?

> README.md doesn't mention that Docker is required for build
> ---
>
> Key: BEAM-7673
> URL: https://issues.apache.org/jira/browse/BEAM-7673
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Elliotte Rusty Harold
>Priority: Minor
>
> README.md on github says:
> If you'd like to build and install the whole project from the source 
> distribution, you may need some additional tools installed in your system. In 
> a Debian-based distribution:
> sudo apt-get install \
>  openjdk-8-jdk \
>  python-setuptools \
>  python-pip \
>  virtualenv
> That's correct as far as it goes, but it's incomplete. .gradlew will fail in 
> task ':website:buildDockerImage' unless docker is installed and running.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7673) README.md doesn't mention that Docker is required for build

2019-07-04 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7673:
---
Description: 
README.md on github says:

If you'd like to build and install the whole project from the source 
distribution, you may need some additional tools installed in your system. In a 
Debian-based distribution:

sudo apt-get install \
 openjdk-8-jdk \
 python-setuptools \
 python-pip \
 virtualenv

That's correct as far as it goes, but it's incomplete. .gradlew will fail in 
task ':website:buildDockerImage' unless docker is installed and running.

  was:
README.md on github says:

If you'd like to build and install the whole project from the source 
distribution, you may need some additional tools installed in your system. In a 
Debian-based distribution:

sudo apt-get install \
openjdk-8-jdk \
python-setuptools \
python-pip \
virtualenv


That's correct as far as it goes, but it's incomplete. .gradlew will fail in  
task ':website:buildDockerImage' unless dopcker is installed and running.



> README.md doesn't mention that Docker is required for build
> ---
>
> Key: BEAM-7673
> URL: https://issues.apache.org/jira/browse/BEAM-7673
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Affects Versions: 2.13.0
>Reporter: Elliotte Rusty Harold
>Priority: Minor
>
> README.md on github says:
> If you'd like to build and install the whole project from the source 
> distribution, you may need some additional tools installed in your system. In 
> a Debian-based distribution:
> sudo apt-get install \
>  openjdk-8-jdk \
>  python-setuptools \
>  python-pip \
>  virtualenv
> That's correct as far as it goes, but it's incomplete. .gradlew will fail in 
> task ':website:buildDockerImage' unless docker is installed and running.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7674) Define streaming ITs tests for direct runner in consistent way in Python 2 and Python 3 suites.

2019-07-04 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7674:
---
Status: Open  (was: Triage Needed)

> Define streaming ITs tests for direct runner in consistent way in Python 2 
> and  Python 3 suites.
> 
>
> Key: BEAM-7674
> URL: https://issues.apache.org/jira/browse/BEAM-7674
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Valentyn Tymofieiev
>Priority: Major
>
> Currently in Python 2 direct runner test suite  some tests run in streaming 
> mode:
> https://github.com/apache/beam/blob/fbd1f4cf7118c7b2fb4e3a4cf46646e98f3e3b8d/sdks/python/build.gradle#L130
> However in Python 3, we run both Batch and Streaming direct runner tests in 
> Batch mode: 
> https://github.com/apache/beam/blob/fbd1f4cf7118c7b2fb4e3a4cf46646e98f3e3b8d/sdks/python/test-suites/direct/py35/build.gradle#L32
> We should check whether we need to explicitly separate the tests into batch 
> and streaming and define all directrunner suites consistently.
> cc: [~Juta]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7673) README.md doesn't mention that Docker is required for build

2019-07-04 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7673:
---
Status: Open  (was: Triage Needed)

> README.md doesn't mention that Docker is required for build
> ---
>
> Key: BEAM-7673
> URL: https://issues.apache.org/jira/browse/BEAM-7673
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Affects Versions: 2.13.0
>Reporter: Elliotte Rusty Harold
>Priority: Minor
>
> README.md on github says:
> If you'd like to build and install the whole project from the source 
> distribution, you may need some additional tools installed in your system. In 
> a Debian-based distribution:
> sudo apt-get install \
>  openjdk-8-jdk \
>  python-setuptools \
>  python-pip \
>  virtualenv
> That's correct as far as it goes, but it's incomplete. .gradlew will fail in 
> task ':website:buildDockerImage' unless docker is installed and running.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7673) README.md doesn't mention that Docker is required for build

2019-07-04 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7673:
---
Affects Version/s: (was: 2.13.0)

> README.md doesn't mention that Docker is required for build
> ---
>
> Key: BEAM-7673
> URL: https://issues.apache.org/jira/browse/BEAM-7673
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Elliotte Rusty Harold
>Priority: Minor
>
> README.md on github says:
> If you'd like to build and install the whole project from the source 
> distribution, you may need some additional tools installed in your system. In 
> a Debian-based distribution:
> sudo apt-get install \
>  openjdk-8-jdk \
>  python-setuptools \
>  python-pip \
>  virtualenv
> That's correct as far as it goes, but it's incomplete. .gradlew will fail in 
> task ':website:buildDockerImage' unless docker is installed and running.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7681) Support Fanout in Apache BEAM SQL extension

2019-07-04 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7681:
---
Status: Open  (was: Triage Needed)

> Support Fanout in Apache BEAM SQL extension
> ---
>
> Key: BEAM-7681
> URL: https://issues.apache.org/jira/browse/BEAM-7681
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.13.0
>Reporter: Brachi Packter
>Priority: Major
>
> I want to use Fanout improvements: 
> [https://www.waitingforcode.com/apache-beam/fanouts-apache-beam-combine-transform/read]
> How can I use it when I'm doing my join and aggregation via BeamSQL?
> Can you add an API through which I can configure the fanout?
> For example:
> `SqlTransform.query(query).withFanOut()`
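For reference, outside of SQL the core Java SDK already exposes this knob; a sketch assuming an existing input PCollection<KV<String, Long>>:

{code:java}
import org.apache.beam.sdk.transforms.Combine;
import org.apache.beam.sdk.transforms.Sum;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

// Spread hot keys across 16 intermediate combiners before the final merge.
PCollection<KV<String, Long>> sums =
    input.apply(
        Combine.<String, Long, Long>perKey(Sum.ofLongs()).withHotKeyFanout(16));
{code}

SqlTransform currently exposes no such knob, which is what this ticket asks for.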



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7681) Support Fanout in Apache BEAM SQL extension

2019-07-04 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7681:
---
Component/s: (was: sdk-java-core)
 dsl-sql

> Support Fanout in Apache BEAM SQL extension
> ---
>
> Key: BEAM-7681
> URL: https://issues.apache.org/jira/browse/BEAM-7681
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.13.0
>Reporter: Brachi Packter
>Priority: Major
>
> I want to use Fanout improvements: 
> [https://www.waitingforcode.com/apache-beam/fanouts-apache-beam-combine-transform/read]
> How can I use it when I'm doing my join and aggregation via BeamSQL?
> Can you add an API through which I can configure the fanout?
> For example:
> `SqlTransform.query(query).withFanOut()`



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7684) WindmillStateCache grossly misunderestimates object size in its weighting function

2019-07-04 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7684:
---
Status: Open  (was: Triage Needed)

> WindmillStateCache grossly misunderestimates object size in its weighting 
> function
> --
>
> Key: BEAM-7684
> URL: https://issues.apache.org/jira/browse/BEAM-7684
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Steve Niemitz
>Priority: Major
>
> As a little experiment, we set the WindmillStateCache size to 2 GB.  After we 
> let a job run for a few hours, the workers all died from OOMs.  Digging into 
> the memory usage, over 10 GB of heap was used by the WindmillStateCache.  The 
> total weight the cache had calculated however was only ~800 MB.
> It looks like the StateCacheEntry calculation is way off.  The weight of an 
> entry was calculated as only ~200, while the actual heap usage of it (as 
> reported by YourKit at least) was around 2,400 bytes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7676) All SDK workers have worker_id="1"

2019-07-04 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7676:
---
Status: Open  (was: Triage Needed)

> All SDK workers have worker_id="1"
> --
>
> Key: BEAM-7676
> URL: https://issues.apache.org/jira/browse/BEAM-7676
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> SDK workers are created using multiple job factories, which all give their 
> initial workers id 1 [1]. We could perhaps identify sdk workers also by the 
> factory that created them, for example worker_id=$FACTORY-$WORKER (e.g. 
> worker_id="1-1", "1-2"..."2-1"...)
>  
> [1] 
> [https://github.com/apache/beam/blob/89b08e133be5a2c6bcdbd36242f16ef7ab796902/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DefaultJobBundleFactory.java#L115]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7676) All SDK workers have worker_id="1"

2019-07-04 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7676:
---
Summary: All SDK workers have worker_id="1"  (was: all SDK workers have 
worker_id="1")

> All SDK workers have worker_id="1"
> --
>
> Key: BEAM-7676
> URL: https://issues.apache.org/jira/browse/BEAM-7676
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> SDK workers are created using multiple job factories, which all give their 
> initial workers id 1 [1]. We could perhaps identify sdk workers also by the 
> factory that created them, for example worker_id=$FACTORY-$WORKER (e.g. 
> worker_id="1-1", "1-2"..."2-1"...)
>  
> [1] 
> [https://github.com/apache/beam/blob/89b08e133be5a2c6bcdbd36242f16ef7ab796902/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DefaultJobBundleFactory.java#L115]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7678) typehints with_output_types annotation doesn't work for stateful DoFn

2019-07-04 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7678:
---
Status: Open  (was: Triage Needed)

> typehints with_output_types annotation doesn't work for stateful DoFn 
> --
>
> Key: BEAM-7678
> URL: https://issues.apache.org/jira/browse/BEAM-7678
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.13.0
>Reporter: Enrico Canzonieri
>Priority: Minor
>
> The output types typehints seem to be ignored when using a stateful DoFn, but 
> the same typehint works perfectly when used without state. This issue 
> prevents a custom Coder from being used because Beam will default to one of 
> the {{FastCoders}} (I believe Pickle).
> Example code:
> {code}
> @typehints.with_output_types(Message)
> class StatefulDoFn(DoFn):
>   COUNTER_STATE = BagStateSpec('counter', VarIntCoder())
>
>   def process(self, element, counter=DoFn.StateParam(COUNTER_STATE)):
>     (key, messages) = element
>     newMessage = Message()
>     return [newMessage]
> {code}
> The example code just defines a stateful DoFn in Python. The runner used is 
> the Flink 1.6.4 portable runner.
> Finally, overriding {{infer_output_type}} to return a 
> {{typehints.List[Message]}} solves the issue.
> Looking at the code, it seems to me that in 
> [https://github.com/apache/beam/blob/v2.13.0/sdks/python/apache_beam/pipeline.py#L643]
>  we do not take the typehints into account.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7680) synthetic_pipeline_test.py flaky

2019-07-04 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7680:
---
Status: Open  (was: Triage Needed)

> synthetic_pipeline_test.py flaky
> 
>
> Key: BEAM-7680
> URL: https://issues.apache.org/jira/browse/BEAM-7680
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Udi Meiri
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> {code:java}
> 11:51:43 FAIL: testSyntheticSDFStep 
> (apache_beam.testing.synthetic_pipeline_test.SyntheticPipelineTest)
> 11:51:43 
> --
> 11:51:43 Traceback (most recent call last):
> 11:51:43   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/testing/synthetic_pipeline_test.py",
>  line 82, in testSyntheticSDFStep
> 11:51:43 self.assertTrue(0.5 <= elapsed <= 3, elapsed)
> 11:51:43 AssertionError: False is not true : 3.659700632095337{code}
> [https://builds.apache.org/job/beam_PreCommit_Python_Cron/1502/consoleFull]
>  
> Two flaky TODOs: 
> [https://github.com/apache/beam/blob/b79f24ced1c8519c29443ea7109c59ad18be2ebe/sdks/python/apache_beam/testing/synthetic_pipeline_test.py#L69-L82]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7688) Flink portable runner gets stuck when waiting for SDK Harness to close

2019-07-04 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7688:
---
Status: Open  (was: Triage Needed)

> Flink portable runner gets stuck when waiting for SDK Harness to close
> --
>
> Key: BEAM-7688
> URL: https://issues.apache.org/jira/browse/BEAM-7688
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>
> When parallelism = nproc:
> "MapPartition (MapPartition at [37]{Analyze, RandomizeData, ReadFromText, 
> DecodeForAnalyze}) (9/12)" #2855 prio=5 os_prio=0 tid=0x7f9184022800 
> nid=0x2b58 waiting on condition [0x7f9091592000]
>  java.lang.Thread.State: WAITING (parking)
>  at (C/C++) 0x7f926a97a9f2 (Unknown Source)
>  at (C/C++) 0x7f9269f1dd99 (Unknown Source)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0xca218030> (a 
> java.util.concurrent.CompletableFuture$Signaller)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>  at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>  at 
> org.apache.beam.sdk.fn.data.CompletableFutureInboundDataClient.awaitCompletion(CompletableFutureInboundDataClient.java:48)
>  at 
> org.apache.beam.sdk.fn.data.BeamFnDataInboundObserver.awaitCompletion(BeamFnDataInboundObserver.java:90)
>  at 
> org.apache.beam.runners.fnexecution.control.SdkHarnessClient$ActiveBundle.close(SdkHarnessClient.java:298)
>  at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.$closeResource(FlinkExecutableStageFunction.java:209)
>  at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:209)
>  at 
> org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103)
>  at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:503)
>  at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368)
>  at org.apache.flink.runtime.taskmanager.Task.run(Task.java:712)
>  at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7685) Join Reordering

2019-07-04 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7685:
---
Status: Open  (was: Triage Needed)

> Join Reordering
> ---
>
> Key: BEAM-7685
> URL: https://issues.apache.org/jira/browse/BEAM-7685
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Alireza Samadianzakaria
>Assignee: Alireza Samadianzakaria
>Priority: Major
>
> The query planner does not reorder joins based on their costs. The fix is 
> simple: we need to include the rules related to join reordering, such as 
> JoinCommuteRule. However, reordering joins may produce plans that have a cross 
> join or other unsupported join types. We should either rewrite those rules to 
> account for that, or return an infinite cost for those join types so that 
> they will never be selected.
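A sketch of the infinite-cost option, assuming the join RelNode can detect unsupported shapes; the isUnsupportedJoin check is hypothetical:

{code:java}
import org.apache.calcite.plan.RelOptCost;
import org.apache.calcite.plan.RelOptPlanner;
import org.apache.calcite.rel.metadata.RelMetadataQuery;

// Inside a Beam join RelNode subclass:
@Override
public RelOptCost computeSelfCost(RelOptPlanner planner, RelMetadataQuery mq) {
  if (isUnsupportedJoin()) { // e.g. a cross join the runner cannot execute
    // An infinite cost means the planner never selects this plan.
    return planner.getCostFactory().makeInfiniteCost();
  }
  return super.computeSelfCost(planner, mq);
}
{code}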



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7686) Erroneous Beam Deprecation warnings are confusing Bigquery IO users.

2019-07-04 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7686:
---
Status: Open  (was: Triage Needed)

> Erroneous Beam Deprecation warnings are confusing Bigquery IO users.
> 
>
> Key: BEAM-7686
> URL: https://issues.apache.org/jira/browse/BEAM-7686
> Project: Beam
>  Issue Type: Improvement
>  Components: io-python-gcp
>Reporter: Valentyn Tymofieiev
>Assignee: Pablo Estrada
>Priority: Major
>
> Bigquery sink is marked deprecated: 
> https://github.com/apache/beam/blob/32e2e3e910619d400568073dde0d7b36698a416a/sdks/python/apache_beam/io/gcp/bigquery.py#L499,
>  however it is used by the Dataflow runner in the codepath that processes 
> WriteToBigQuery: 
> https://github.com/apache/beam/blob/32e2e3e910619d400568073dde0d7b36698a416a/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L680.
> This generates a warning message for all BQ users, including those who use 
> WriteToBigQuery: "Warning: BigQuerySink is deprecated since 2.11.0. Use 
> WriteToBigQuery instead".
> This message is a red herring and confuses BQ users.
> One possibility to address this is to make BigQuerySink an alias for  
> _BigQuerySink, and use _BigQuerySink internally.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7687) python postcommits and precommits flaky

2019-07-04 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7687:
---
Status: Open  (was: Triage Needed)

> python postcommits and precommits flaky
> ---
>
> Key: BEAM-7687
> URL: https://issues.apache.org/jira/browse/BEAM-7687
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Udi Meiri
>Priority: Major
>
> Dataflow pipelines are failing with "Workflow failed." Sometimes there are 
> "Internal Issue" errors logged:
> {code:java}
> 13:17:13 root: INFO: 2019-07-03T19:51:33.859Z: JOB_MESSAGE_ERROR: Workflow 
> failed. Causes: Internal Issue (5ffafa8893c163ee): 82159483:17
> {code}
> [https://builds.apache.org/job/beam_PostCommit_Python_Verify_PR/858/consoleFull]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7687) Python postcommits and precommits flaky

2019-07-04 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7687:
---
Summary: Python postcommits and precommits flaky  (was: python postcommits 
and precommits flaky)

> Python postcommits and precommits flaky
> ---
>
> Key: BEAM-7687
> URL: https://issues.apache.org/jira/browse/BEAM-7687
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Udi Meiri
>Priority: Major
>
> Dataflow pipelines are failing with "Workflow failed." Sometimes there are 
> "Internal Issue" errors logged:
> {code:java}
> 13:17:13 root: INFO: 2019-07-03T19:51:33.859Z: JOB_MESSAGE_ERROR: Workflow 
> failed. Causes: Internal Issue (5ffafa8893c163ee): 82159483:17
> {code}
> [https://builds.apache.org/job/beam_PostCommit_Python_Verify_PR/858/consoleFull]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7689) Temporary directory for WriteOperation may not be unique in FileBaseSink

2019-07-04 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7689:
---
Summary: Temporary directory for WriteOperation may not be unique in 
FileBaseSink  (was: temporary directory for WriteOperation may not be unique in 
FileBaseSink)

> Temporary directory for WriteOperation may not be unique in FileBaseSink
> 
>
> Key: BEAM-7689
> URL: https://issues.apache.org/jira/browse/BEAM-7689
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-files
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Critical
>
> The temporary directory for a WriteOperation in FileBasedSink is generated 
> from a second-granularity timestamp (yyyy-MM-dd_HH-mm-ss) and a unique id. 
> Such granularity is not enough to make temporary directories unique between 
> different jobs. When two jobs share the same temporary directory, an output 
> file may not be produced in one job, since the required temporary file can be 
> deleted by the other job.
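One possible direction, sketched: keep the human-readable timestamp but let uniqueness rest on a strong random component, so two jobs started in the same second cannot collide (variable names hypothetical):

{code:java}
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.UUID;

// Collision-resistant temp directory name: the timestamp stays readable,
// while uniqueness comes from the UUID rather than second granularity.
String tempDirName =
    String.format(
        "%s_%s",
        LocalDateTime.now().format(DateTimeFormatter.ofPattern("yyyy-MM-dd_HH-mm-ss")),
        UUID.randomUUID());
{code}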



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7689) temporary directory for WriteOperation may not be unique in FileBaseSink

2019-07-04 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7689:
---
Status: Open  (was: Triage Needed)

> temporary directory for WriteOperation may not be unique in FileBaseSink
> 
>
> Key: BEAM-7689
> URL: https://issues.apache.org/jira/browse/BEAM-7689
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-files
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Critical
>
> The temporary directory for a WriteOperation in FileBasedSink is generated 
> from a second-granularity timestamp (yyyy-MM-dd_HH-mm-ss) and a unique id. 
> Such granularity is not enough to make temporary directories unique between 
> different jobs. When two jobs share the same temporary directory, an output 
> file may not be produced in one job, since the required temporary file can be 
> deleted by the other job.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7190) Enable file system based token authentication for Samza portable runner

2019-07-04 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7190:
---
Summary: Enable file system based token authentication for Samza portable 
runner  (was: enable file system based token authentication for portable runner)

> Enable file system based token authentication for Samza portable runner
> ---
>
> Key: BEAM-7190
> URL: https://issues.apache.org/jira/browse/BEAM-7190
> Project: Beam
>  Issue Type: Task
>  Components: runner-samza
>Reporter: Hai Lu
>Assignee: Hai Lu
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> For Samza and potentially other portable runners, there is a need to secure 
> the communication between the SDK worker and the runner. Currently the 
> SSL/TLS support in portability is half done.
> However, after investigation we found that it is sufficient to 1) use the 
> loopback address and 2) enforce authentication; that way the communication is 
> both authenticated and secured.
> This ticket tracks the implementation of the solution above. More 
> details can be found in the following PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-7677) Create a CLI to convert a pipeline to graphviz dot format

2019-07-04 Thread JIRA


[ 
https://issues.apache.org/jira/browse/BEAM-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16878404#comment-16878404
 ] 

Ismaël Mejía commented on BEAM-7677:


So far the only Beam module that offers a CLI is the SQL one, so it is probably 
good to put it there for the moment. A really, REALLY interesting reason to put 
it there would be to get the pipeline dot definition for a given SQL query; 
that would be a really nice addition to your idea.

> Create a CLI to convert a pipeline to graphviz dot format
> -
>
> Key: BEAM-7677
> URL: https://issues.apache.org/jira/browse/BEAM-7677
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>
> As a Beam developer (especially one using python or go) I want a way to 
> convert a pipeline to graphviz dot format so that I can easily visualize it.  
> The Beam source comes with a great utility in the form of 
> [PortablePipelineDotRenderer.java|https://github.com/apache/beam/blob/2df702a1448fa6cbd22cd225bf16e9ffc4c82595/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/renderer/PortablePipelineDotRenderer.java#L29].
>  It seems like a great util to wrap with a CLI, so that it can be easily 
> accessed from other languages and just for general ease-of-use for java 
> developers.
> The tool would take an input pipeline message as either a path to a file or 
> via stdin, and would output a dot string to stdout or to a file.
> I'm happy to make a PR for this.
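A minimal sketch of such a wrapper, assuming PortablePipelineDotRenderer.toDotString(RunnerApi.Pipeline) as the entry point; the class name and argument handling are hypothetical:

{code:java}
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.beam.model.pipeline.v1.RunnerApi;
import org.apache.beam.runners.core.construction.renderer.PortablePipelineDotRenderer;

public class PipelineToDot {
  public static void main(String[] args) throws Exception {
    // Read a serialized RunnerApi.Pipeline proto from a file argument, or stdin.
    try (InputStream in = args.length > 0 ? new FileInputStream(args[0]) : System.in) {
      RunnerApi.Pipeline pipeline = RunnerApi.Pipeline.parseFrom(in);
      System.out.print(PortablePipelineDotRenderer.toDotString(pipeline));
    }
  }
}
{code}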



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7550) Implement missing pipeline parameters in ParDo Load Test

2019-07-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7550?focusedWorklogId=272036=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-272036
 ]

ASF GitHub Bot logged work on BEAM-7550:


Author: ASF GitHub Bot
Created on: 04/Jul/19 07:40
Start Date: 04/Jul/19 07:40
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on issue #8847: [BEAM-7550] Missing 
pipeline parameters in ParDo Load Test
URL: https://github.com/apache/beam/pull/8847#issuecomment-508375981
 
 
   Thanks @lgajowy. I squashed the commits as you suggested.
   Moreover, I extracted the `get_option_or_default` method to the LoadTest class, 
because I'm going to reuse it in other load tests.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 272036)
Time Spent: 3h 50m  (was: 3h 40m)

> Implement missing pipeline parameters in ParDo Load Test
> 
>
> Key: BEAM-7550
> URL: https://issues.apache.org/jira/browse/BEAM-7550
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Without some pipeline parameters in ParDo Load Test in Python, it is 
> impossible to create all required test cases (see proposal: 
> [https://s.apache.org/load-test-basic-operations]).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7677) Create a CLI to convert a pipeline to graphviz dot format

2019-07-04 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7677:
---
Summary: Create a CLI to convert a pipeline to graphviz dot format  (was: 
create a CLI to convert a pipeline to graphviz dot format)

> Create a CLI to convert a pipeline to graphviz dot format
> -
>
> Key: BEAM-7677
> URL: https://issues.apache.org/jira/browse/BEAM-7677
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Chad Dombrova
>Priority: Major
>
> As a Beam developer (especially one using python or go) I want a way to 
> convert a pipeline to graphviz dot format so that I can easily visualize it.  
> The Beam source comes with a great utility in the form of 
> [PortablePipelineDotRenderer.java|https://github.com/apache/beam/blob/2df702a1448fa6cbd22cd225bf16e9ffc4c82595/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/renderer/PortablePipelineDotRenderer.java#L29].
>  It seems like a great util to wrap with a CLI, so that it can be easily 
> accessed from other languages and just for general ease-of-use for java 
> developers.
> The tool would take an input pipeline message as either a path to a file or 
> via stdin, and would output a dot string to stdout or to a file.
> I'm happy to make a PR for this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-7677) Create a CLI to convert a pipeline to graphviz dot format

2019-07-04 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reassigned BEAM-7677:
--

Assignee: Chad Dombrova

> Create a CLI to convert a pipeline to graphviz dot format
> -
>
> Key: BEAM-7677
> URL: https://issues.apache.org/jira/browse/BEAM-7677
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>
> As a Beam developer (especially one using python or go) I want a way to 
> convert a pipeline to graphviz dot format so that I can easily visualize it.  
> The Beam source comes with a great utility in the form of 
> [PortablePipelineDotRenderer.java|https://github.com/apache/beam/blob/2df702a1448fa6cbd22cd225bf16e9ffc4c82595/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/renderer/PortablePipelineDotRenderer.java#L29].
>  It seems like a great util to wrap with a CLI, so that it can be easily 
> accessed from other languages and just for general ease-of-use for java 
> developers.
> The tool would take an input pipeline message as either a path to a file or 
> via stdin, and would output a dot string to stdout or to a file.
> I'm happy to make a PR for this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7677) create a CLI to convert a pipeline to graphviz dot format

2019-07-04 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7677:
---
Status: Open  (was: Triage Needed)

> create a CLI to convert a pipeline to graphviz dot format
> -
>
> Key: BEAM-7677
> URL: https://issues.apache.org/jira/browse/BEAM-7677
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Chad Dombrova
>Priority: Major
>
> As a Beam developer (especially one using python or go) I want a way to 
> convert a pipeline to graphviz dot format so that I can easily visualize it.  
> The Beam source comes with a great utility in the form of 
> [PortablePipelineDotRenderer.java|https://github.com/apache/beam/blob/2df702a1448fa6cbd22cd225bf16e9ffc4c82595/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/renderer/PortablePipelineDotRenderer.java#L29].
>  It seems like a great util to wrap with a CLI, so that it can be easily 
> accessed from other languages and just for general ease-of-use for java 
> developers.
> The tool would take an input pipeline message as either a path to a file or 
> via stdin, and would output a dot string to stdout or to a file.
> I'm happy to make a PR for this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-7683) MongoDBIO - withQueryFn, ignoring custom filter

2019-07-04 Thread JIRA


[ 
https://issues.apache.org/jira/browse/BEAM-7683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16878399#comment-16878399
 ] 

Ismaël Mejía commented on BEAM-7683:


Ah, that's great. I pinged Ahmed since he worked in that area too, but feel 
free to open a PR. In the meantime I will assign the ticket to you.

> MongoDBIO - withQueryFn, ignoring custom filter
> ---
>
> Key: BEAM-7683
> URL: https://issues.apache.org/jira/browse/BEAM-7683
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-mongodb
>Affects Versions: 2.13.0
>Reporter: Chaim
>Priority: Major
>
> begin.apply(tableInfo.getTableName(), MongoDbIO.read()
>  .withUri("mongodb://" + connectionParams)
>  .withQueryFn(FindQuery.create().withFilters(filter))
>  .withDatabase(dbName)
>  .withNumSplits(splitCount)
>  .withCollection(tableInfo.getTableName()));
>  
> the filter in withQueryFn is ignored
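The general shape of a fix, sketched with the MongoDB Java driver's Filters helpers (the helper class is hypothetical): AND the user's filter with the per-split range filter instead of replacing it.

{code:java}
import static com.mongodb.client.model.Filters.and;

import org.bson.conversions.Bson;

class SplitFilters {
  // Keep the user's filter by AND-ing it with the split's _id range filter.
  static Bson effectiveFilter(Bson userFilter, Bson splitRangeFilter) {
    return userFilter == null ? splitRangeFilter : and(userFilter, splitRangeFilter);
  }
}
{code}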



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-7683) MongoDBIO - withQueryFn, ignoring custom filter

2019-07-04 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reassigned BEAM-7683:
--

Assignee: Chaim

> MongoDBIO - withQueryFn, ignoring custom filter
> ---
>
> Key: BEAM-7683
> URL: https://issues.apache.org/jira/browse/BEAM-7683
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-mongodb
>Affects Versions: 2.13.0
>Reporter: Chaim
>Assignee: Chaim
>Priority: Major
>
> begin.apply(tableInfo.getTableName(), MongoDbIO.read()
>  .withUri("mongodb://" + connectionParams)
>  .withQueryFn(FindQuery.create().withFilters(filter))
>  .withDatabase(dbName)
>  .withNumSplits(splitCount)
>  .withCollection(tableInfo.getTableName()));
>  
> the filter in withQueryFn is ignored



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-7683) MongoDBIO - withQueryFn, ignoring custom filter

2019-07-04 Thread Chaim (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16878394#comment-16878394
 ] 

Chaim commented on BEAM-7683:
-

I have a solution for it; I am checking it and will send a pull request.

> MongoDBIO - withQueryFn, ignoring custom filter
> ---
>
> Key: BEAM-7683
> URL: https://issues.apache.org/jira/browse/BEAM-7683
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-mongodb
>Affects Versions: 2.13.0
>Reporter: Chaim
>Priority: Major
>
> begin.apply(tableInfo.getTableName(), MongoDbIO.read()
>  .withUri("mongodb://" + connectionParams)
>  .withQueryFn(FindQuery.create().withFilters(filter))
>  .withDatabase(dbName)
>  .withNumSplits(splitCount)
>  .withCollection(tableInfo.getTableName()));
>  
> the filter in withQueryFn is ignored



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-7683) MongoDBIO - withQueryFn, ignoring custom filter

2019-07-04 Thread JIRA


[ 
https://issues.apache.org/jira/browse/BEAM-7683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16878389#comment-16878389
 ] 

Ismaël Mejía commented on BEAM-7683:


[~sandboxws] any chance you can take a look at this one?

> MongoDBIO - withQueryFn, ignoring custom filter
> ---
>
> Key: BEAM-7683
> URL: https://issues.apache.org/jira/browse/BEAM-7683
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-mongodb
>Affects Versions: 2.13.0
>Reporter: Chaim
>Priority: Major
> Fix For: 2.14.0
>
>
> begin.apply(tableInfo.getTableName(), MongoDbIO.read()
>  .withUri("mongodb://" + connectionParams)
>  .withQueryFn(FindQuery.create().withFilters(filter))
>  .withDatabase(dbName)
>  .withNumSplits(splitCount)
>  .withCollection(tableInfo.getTableName()));
>  
> the filter in withQueryFn is ignored



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7683) MongoDBIO - withQueryFn, ignoring custom filter

2019-07-04 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7683:
---
Fix Version/s: (was: 2.14.0)

> MongoDBIO - withQueryFn, ignoring custom filter
> ---
>
> Key: BEAM-7683
> URL: https://issues.apache.org/jira/browse/BEAM-7683
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-mongodb
>Affects Versions: 2.13.0
>Reporter: Chaim
>Priority: Major
>
> begin.apply(tableInfo.getTableName(), MongoDbIO.read()
>  .withUri("mongodb://" + connectionParams)
>  .withQueryFn(FindQuery.create().withFilters(filter))
>  .withDatabase(dbName)
>  .withNumSplits(splitCount)
>  .withCollection(tableInfo.getTableName()));
>  
> the filter in withQueryFn is ignored



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7683) MongoDBIO - withQueryFn, ignoring custom filter

2019-07-04 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7683:
---
Status: Open  (was: Triage Needed)

> MongoDBIO - withQueryFn, ignoring custom filter
> ---
>
> Key: BEAM-7683
> URL: https://issues.apache.org/jira/browse/BEAM-7683
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-mongodb
>Affects Versions: 2.13.0
>Reporter: Chaim
>Priority: Major
> Fix For: 2.14.0
>
>
> begin.apply(tableInfo.getTableName(), MongoDbIO.read()
>  .withUri("mongodb://" + connectionParams)
>  .withQueryFn(FindQuery.create().withFilters(filter))
>  .withDatabase(dbName)
>  .withNumSplits(splitCount)
>  .withCollection(tableInfo.getTableName()));
>  
> the filter in withQueryFn is ignored



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6694) ApproximateQuantiles transform for Python SDK

2019-07-04 Thread Shoaib Zafar (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16878386#comment-16878386
 ] 

Shoaib Zafar commented on BEAM-6694:


[~shehzaadn], this task is in progress.

> ApproximateQuantiles transform for Python SDK
> -
>
> Key: BEAM-6694
> URL: https://issues.apache.org/jira/browse/BEAM-6694
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Shehzaad Nakhoda
>Priority: Minor
>
> Add PTransforms for getting an idea of a PCollection's data distribution 
> using approximate N-tiles (e.g. quartiles, percentiles, etc.), either 
> globally or per-key.
> It should offer the same API as its Java counterpart: 
> https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateQuantiles.java
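For reference, usage of the Java counterpart looks like this (assuming an existing input PCollection<Integer>):

{code:java}
import java.util.List;
import org.apache.beam.sdk.transforms.ApproximateQuantiles;
import org.apache.beam.sdk.values.PCollection;

// Five quantiles of a numeric collection: min, quartiles, and max.
PCollection<List<Integer>> quartiles =
    input.apply(ApproximateQuantiles.<Integer>globally(5));
{code}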



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-7261) Add support for BasicSessionCredentials for aws credentials.

2019-07-04 Thread JIRA


[ 
https://issues.apache.org/jira/browse/BEAM-7261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16878383#comment-16878383
 ] 

Ismaël Mejía commented on BEAM-7261:


I also added you as a contributor in case you want to self-assign this ticket.

> Add support for BasicSessionCredentials for aws credentials.
> 
>
> Key: BEAM-7261
> URL: https://issues.apache.org/jira/browse/BEAM-7261
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-aws
>Reporter: David Brown
>Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently the AWS support in Beam only handles basic AWS credentials with a 
> secret and a key. We need to support session tokens for S3 instances with 
> tighter credentials. This would involve adding BasicSessionCredentials to the 
> AwsModule.
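For reference, the AWS SDK class in question carries a session token alongside the usual key pair (the literal values here are placeholders):

{code:java}
import com.amazonaws.auth.AWSSessionCredentials;
import com.amazonaws.auth.BasicSessionCredentials;

AWSSessionCredentials credentials =
    new BasicSessionCredentials("accessKeyId", "secretAccessKey", "sessionToken");
{code}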



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-3154) Support multiple KeyRanges when reading from BigTable

2019-07-04 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved BEAM-3154.

   Resolution: Fixed
Fix Version/s: 2.4.0

> Support multiple KeyRanges when reading from BigTable
> -
>
> Key: BEAM-3154
> URL: https://issues.apache.org/jira/browse/BEAM-3154
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Ryan Niemocienski
>Assignee: Solomon Duskis
>Priority: Minor
> Fix For: 2.4.0
>
>
> BigTableIO.Read currently only supports reading one KeyRange from BT. It 
> would be nice to read multiple ranges from BigTable in one read. Thoughts on 
> the feasibility of this before I dig into it?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-3032) Add RedshiftIO

2019-07-04 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reassigned BEAM-3032:
--

Assignee: (was: Jacob Marble)

> Add RedshiftIO
> --
>
> Key: BEAM-3032
> URL: https://issues.apache.org/jira/browse/BEAM-3032
> Project: Beam
>  Issue Type: New Feature
>  Components: io-ideas
> Environment: AWS Redshift
>Reporter: Jacob Marble
>Priority: Minor
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> I would like to add a RedshiftIO Java extension to perform bulk read/write 
> to/from AWS Redshift via the UNLOAD and COPY Redshift SQL commands. This 
> requires S3, which is the subject of BEAM-2500.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-2295) Hadoop IO connectors require additional repositories

2019-07-04 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reassigned BEAM-2295:
--

Assignee: Jean-Baptiste Onofré  (was: Jean-Baptiste Onofré)

> Hadoop IO connectors require additional repositories
> 
>
> Key: BEAM-2295
> URL: https://issues.apache.org/jira/browse/BEAM-2295
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Pei He
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Labels: triaged
> Fix For: Not applicable
>
>
> Several dependencies are not in https://repo.maven.apache.org/maven2.
> 404 when trying to visit https://repo.maven.apache.org/maven2/cascading/
> My workaround is to add conjars.org repository to my local profile.
> <repository>
>   <id>clojars</id>
>   <url>https://clojars.org/repo/</url>
>   <releases>
>     <enabled>true</enabled>
>   </releases>
>   <snapshots>
>     <enabled>true</enabled>
>   </snapshots>
> </repository>
> The question is how to make the io connector modules have the additional repo 
> by default? I can see this issue will come up when more io connectors are 
> added, and when they need additional repos.
> [INFO] 
> 
> [INFO] Building Apache Beam :: SDKs :: Java :: IO :: Hadoop :: jdk1.8-tests 
> 2.0.0
> [INFO] 
> 
> [INFO]
> [INFO] --- maven-clean-plugin:3.0.0:clean (default-clean) @ 
> beam-sdks-java-io-hadoop-jdk1.8-tests ---
> [INFO] Deleting 
> /Users/peihe/Downloads/apache-beam-2.0.0/sdks/java/io/hadoop/jdk1.8-tests 
> (includes = [**/*.pyc, **/*.egg-info/, **/sdks/python/LICENSE, 
> **/sdks/python/NOTICE, **/sdks/python/README.md], excludes = [])
> [INFO]
> [INFO] --- maven-enforcer-plugin:1.4.1:enforce (enforce) @ 
> beam-sdks-java-io-hadoop-jdk1.8-tests ---
> Downloading: 
> https://repo.maven.apache.org/maven2/org/pentaho/pentaho-aggdesigner-algorithm/5.1.5-jhyde/pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar
> [WARNING] Could not find artifact 
> org.pentaho:pentaho-aggdesigner-algorithm:jar:5.1.5-jhyde in central 
> (https://repo.maven.apache.org/maven2)
> Try downloading the file manually from the project website.
> Then, install it using the command:
> mvn install:install-file -DgroupId=org.pentaho 
> -DartifactId=pentaho-aggdesigner-algorithm -Dversion=5.1.5-jhyde 
> -Dpackaging=jar -Dfile=/path/to/file
> Alternatively, if you host your own repository you can deploy the file there:
> mvn deploy:deploy-file -DgroupId=org.pentaho 
> -DartifactId=pentaho-aggdesigner-algorithm -Dversion=5.1.5-jhyde 
> -Dpackaging=jar -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id]
> Path to dependency:
>   1) org.apache.beam:beam-sdks-java-io-hadoop-jdk1.8-tests:jar:2.0.0
>   2) org.elasticsearch:elasticsearch-hadoop:jar:5.0.0
>   3) org.apache.hive:hive-service:jar:1.2.1
>   4) org.apache.hive:hive-exec:jar:1.2.1
>   5) org.apache.calcite:calcite-core:jar:1.2.0-incubating
>   6) org.pentaho:pentaho-aggdesigner-algorithm:jar:5.1.5-jhyde
>   org.pentaho:pentaho-aggdesigner-algorithm:jar:5.1.5-jhyde
> from the specified remote repositories:
>   apache.snapshots (https://repository.apache.org/snapshots, releases=false, 
> snapshots=true),
>   central (https://repo.maven.apache.org/maven2, releases=true, 
> snapshots=false)
> Downloading: 
> https://repo.maven.apache.org/maven2/cascading/cascading-hadoop/2.6.3/cascading-hadoop-2.6.3.jar
> [WARNING] Could not find artifact cascading:cascading-hadoop:jar:2.6.3 in 
> central (https://repo.maven.apache.org/maven2)
> Try downloading the file manually from the project website.
> Then, install it using the command:
> mvn install:install-file -DgroupId=cascading 
> -DartifactId=cascading-hadoop -Dversion=2.6.3 -Dpackaging=jar 
> -Dfile=/path/to/file
> Alternatively, if you host your own repository you can deploy the file there:
> mvn deploy:deploy-file -DgroupId=cascading -DartifactId=cascading-hadoop 
> -Dversion=2.6.3 -Dpackaging=jar -Dfile=/path/to/file -Durl=[url] 
> -DrepositoryId=[id]
> Path to dependency:
>   1) org.apache.beam:beam-sdks-java-io-hadoop-jdk1.8-tests:jar:2.0.0
>   2) org.elasticsearch:elasticsearch-hadoop:jar:5.0.0
>   3) cascading:cascading-hadoop:jar:2.6.3
>   cascading:cascading-hadoop:jar:2.6.3
> from the specified remote repositories:
>   apache.snapshots (https://repository.apache.org/snapshots, releases=false, 
> snapshots=true),
>   central (https://repo.maven.apache.org/maven2, releases=true, 
> snapshots=false)
> Downloading: 
> https://repo.maven.apache.org/maven2/cascading/cascading-local/2.6.3/cascading-local-2.6.3.jar
> [WARNING] Could not 
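
On the "by default" question above, one option (a hedged sketch, not a decided
approach) is for each IO connector module that needs it to declare the extra
repository in its own pom.xml. Maven also consults repositories declared in
dependency POMs when resolving transitive artifacts, which is what would make
this work downstream without per-user profile changes:

{code}
<!-- Hypothetical fragment of an IO connector module's pom.xml. -->
<repositories>
  <repository>
    <id>conjars</id>
    <url>https://conjars.org/repo/</url>
    <releases>
      <enabled>true</enabled>
    </releases>
    <snapshots>
      <enabled>false</enabled>
    </snapshots>
  </repository>
</repositories>
{code}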

[jira] [Assigned] (BEAM-2110) HDFSFileSource throws IndexOutOfBoundsException when trying to read big file (gaming_data1.csv)

2019-07-04 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reassigned BEAM-2110:
--

Assignee: Jean-Baptiste Onofré  (was: Jean-Baptiste Onofré)

> HDFSFileSource throws IndexOutOfBoundsException when trying to read big file 
> (gaming_data1.csv)
> ---
>
> Key: BEAM-2110
> URL: https://issues.apache.org/jira/browse/BEAM-2110
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-hadoop-file-system
> Environment: CentOS 7, Oracle JDK 8
>Reporter: Gergely Novák
>Assignee: Jean-Baptiste Onofré
>Priority: Major
> Fix For: Not applicable
>
>
> I modified the wordcount example to read from HDFS with this code:
> {code}
> pipeline.apply(Read.from(HDFSFileSource.fromText(options.getInput())));
> {code}
> This worked for a number of small files I tried with, but the included 
> example, gs://apache-beam-samples/game/gaming_data*.csv (moved to HDFS), fails 
> with the following trace:
> {noformat}
> Caused by: java.lang.IndexOutOfBoundsException
>   at java.nio.Buffer.checkBounds(Buffer.java:567)
>   at java.nio.ByteBuffer.get(ByteBuffer.java:686)
>   at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:285)
>   at 
> org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:168)
>   at 
> org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:775)
>   at 
> org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:831)
>   at 
> org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:891)
>   at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
>   at java.io.DataInputStream.read(DataInputStream.java:149)
>   at 
> org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.fillBuffer(UncompressedSplitLineReader.java:59)
>   at 
> org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
>   at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
>   at 
> org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.readLine(UncompressedSplitLineReader.java:91)
>   at 
> org.apache.hadoop.mapreduce.lib.input.LineRecordReader.skipUtfByteOrderMark(LineRecordReader.java:144)
>   at 
> org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:184)
>   at 
> org.apache.beam.sdk.io.hdfs.HDFSFileSource$HDFSFileReader.advance(HDFSFileSource.java:492)
>   at 
> org.apache.beam.sdk.io.hdfs.HDFSFileSource$HDFSFileReader.start(HDFSFileSource.java:465)
>   at 
> org.apache.beam.runners.flink.metrics.ReaderInvocationUtil.invokeStart(ReaderInvocationUtil.java:50)
>   at 
> org.apache.beam.runners.flink.translation.wrappers.SourceInputFormat.open(SourceInputFormat.java:79)
>   at 
> org.apache.beam.runners.flink.translation.wrappers.SourceInputFormat.open(SourceInputFormat.java:45)
>   at 
> org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:144)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:655)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
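
HDFSFileSource was later superseded by the Hadoop filesystem registration
(beam-sdks-java-io-hadoop-file-system), which lets TextIO read hdfs:// paths
directly. A minimal sketch of that route, assuming a placeholder namenode
address and input path:

{code}
import java.util.Collections;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.hdfs.HadoopFileSystemOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.hadoop.conf.Configuration;

public class HdfsTextReadSketch {
  public static void main(String[] args) {
    // Point the Beam HDFS filesystem at the cluster (address is a placeholder).
    HadoopFileSystemOptions options =
        PipelineOptionsFactory.create().as(HadoopFileSystemOptions.class);
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://namenode:8020");
    options.setHdfsConfiguration(Collections.singletonList(conf));

    Pipeline p = Pipeline.create(options);
    // TextIO resolves hdfs:// paths through the registered filesystem.
    p.apply(TextIO.read().from("hdfs://namenode:8020/data/gaming_data1.csv"));
    p.run().waitUntilFinish();
  }
}
{code}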



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-1717) Maven release/deploy tries to uploads some artifacts more than once

2019-07-04 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved BEAM-1717.

   Resolution: Invalid
 Assignee: (was: Jean-Baptiste Onofré)
Fix Version/s: Not applicable

> Maven release/deploy tries to uploads some artifacts more than once 
> 
>
> Key: BEAM-1717
> URL: https://issues.apache.org/jira/browse/BEAM-1717
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Amit Sela
>Priority: Minor
> Fix For: Not applicable
>
>
> Running maven {{release}} or {{deploy}} causes some artifacts to deploy more 
> than once which fails deployments to release Nexus.
> While this is not an issue for the Apache release process (because it uses a 
> staging Nexus), this affects users who wish to deploy their own fork. 
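
One frequent cause of this symptom (offered as a hedged example, not the
confirmed root cause of this ticket) is the maven-source-plugin's jar goal,
which forks the lifecycle and can attach the sources artifact twice; binding
jar-no-fork instead avoids the duplicate upload:

{code}
<!-- Hypothetical pom.xml fragment; plugin coordinates are standard Maven. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-source-plugin</artifactId>
  <executions>
    <execution>
      <id>attach-sources</id>
      <goals>
        <!-- jar-no-fork skips the forked lifecycle, so the sources jar
             is built and attached exactly once per module. -->
        <goal>jar-no-fork</goal>
      </goals>
    </execution>
  </executions>
</plugin>
{code}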



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-2295) Hadoop IO connectors require additional repositories

2019-07-04 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved BEAM-2295.

   Resolution: Invalid
Fix Version/s: Not applicable

> Hadoop IO connectors require additional repositories
> 
>
> Key: BEAM-2295
> URL: https://issues.apache.org/jira/browse/BEAM-2295
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Pei He
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Labels: triaged
> Fix For: Not applicable
>

[jira] [Work logged] (BEAM-7079) Run Chicago Taxi Example on Dataflow

2019-07-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7079?focusedWorklogId=272005=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-272005
 ]

ASF GitHub Bot logged work on BEAM-7079:


Author: ASF GitHub Bot
Created on: 04/Jul/19 06:11
Start Date: 04/Jul/19 06:11
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #8939: [BEAM-7079] Add 
Chicago Taxi Example running on Dataflow
URL: https://github.com/apache/beam/pull/8939#issuecomment-508352376
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 272005)
Time Spent: 19h 20m  (was: 19h 10m)

> Run Chicago Taxi Example on Dataflow
> 
>
> Key: BEAM-7079
> URL: https://issues.apache.org/jira/browse/BEAM-7079
> Project: Beam
>  Issue Type: Test
>  Components: testing
>Reporter: Michal Walenia
>Assignee: Michal Walenia
>Priority: Minor
>  Time Spent: 19h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

