[jira] [Work logged] (BEAM-7945) Allow runner to configure "semi_persist_dir" which is used in the SDK harness

2020-04-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7945?focusedWorklogId=415409=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415409
 ]

ASF GitHub Bot logged work on BEAM-7945:


Author: ASF GitHub Bot
Created on: 03/Apr/20 09:37
Start Date: 03/Apr/20 09:37
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #9452: [BEAM-7945] Allow 
runner to configure semi_persist_dir which is used …
URL: https://github.com/apache/beam/pull/9452#discussion_r402882661
 
 

 ##
 File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerEnvironmentFactory.java
 ##
 @@ -132,20 +133,30 @@ public RemoteEnvironment createEnvironment(Environment environment) throws Excep
             // host networking on Mac)
             .add("--env=DOCKER_MAC_CONTAINER=" + System.getenv("DOCKER_MAC_CONTAINER"));
 
-    List<String> args =
-        ImmutableList.of(
-            String.format("--id=%s", workerId),
-            String.format("--logging_endpoint=%s", loggingEndpoint),
-            String.format("--artifact_endpoint=%s", artifactEndpoint),
-            String.format("--provision_endpoint=%s", provisionEndpoint),
-            String.format("--control_endpoint=%s", controlEndpoint));
+    Boolean retainDockerContainer =
+        pipelineOptions.as(ManualDockerEnvironmentOptions.class).getRetainDockerContainers();
+    if (!retainDockerContainer) {
+      dockerOptsBuilder.add("--rm");
 
 Review comment:
   Fix: https://github.com/apache/beam/pull/11303
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 415409)
Time Spent: 4h 40m  (was: 4.5h)

> Allow runner to configure "semi_persist_dir" which is used in the SDK harness
> -
>
> Key: BEAM-7945
> URL: https://issues.apache.org/jira/browse/BEAM-7945
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution, sdk-go, sdk-java-core, sdk-py-core
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Currently "semi_persist_dir" is not configurable, which can become a problem 
> in certain scenarios. For example, the default value of "semi_persist_dir" in 
> the Python SDK harness is "/tmp" 
> ([https://github.com/apache/beam/blob/master/sdks/python/container/boot.go#L48]).
> When the environment type is "PROCESS", the "/tmp" disk may fill up, causing 
> unexpected issues in production environments. We should provide a way to 
> configure "semi_persist_dir" in the EnvironmentFactory on the runner side. 
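The behaviour requested above can be sketched minimally as follows. This assumes the runner receives an optional directory from its configuration and otherwise falls back to the "/tmp" default named in the issue. `SemiPersistDirConfig` and `resolve` are hypothetical names for illustration only. They are not Beam APIs.

```java
import java.util.Optional;

/** Hypothetical helper: picks the semi-persistent directory for an SDK harness. */
final class SemiPersistDirConfig {
  // Default used by the Python SDK harness boot code, according to the issue text.
  static final String DEFAULT_SEMI_PERSIST_DIR = "/tmp";

  private SemiPersistDirConfig() {}

  /** Returns the runner-configured directory if present and non-blank, else the default. */
  static String resolve(Optional<String> configured) {
    return configured
        .map(String::trim)
        .filter(dir -> !dir.isEmpty())
        .orElse(DEFAULT_SEMI_PERSIST_DIR);
  }
}
```

For example, `resolve(Optional.of("/data/beam"))` returns "/data/beam", and `resolve(Optional.empty())` returns "/tmp". This keeps the existing default for users who do not set anything.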



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7945) Allow runner to configure "semi_persist_dir" which is used in the SDK harness

2020-04-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7945?focusedWorklogId=415392=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415392
 ]

ASF GitHub Bot logged work on BEAM-7945:


Author: ASF GitHub Bot
Created on: 03/Apr/20 08:53
Start Date: 03/Apr/20 08:53
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #9452: [BEAM-7945] Allow 
runner to configure semi_persist_dir which is used …
URL: https://github.com/apache/beam/pull/9452#discussion_r402852054
 
 

 ##
 File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerEnvironmentFactory.java
 ##
 @@ -132,20 +133,30 @@ public RemoteEnvironment createEnvironment(Environment environment) throws Excep
             // host networking on Mac)
             .add("--env=DOCKER_MAC_CONTAINER=" + System.getenv("DOCKER_MAC_CONTAINER"));
 
-    List<String> args =
-        ImmutableList.of(
-            String.format("--id=%s", workerId),
-            String.format("--logging_endpoint=%s", loggingEndpoint),
-            String.format("--artifact_endpoint=%s", artifactEndpoint),
-            String.format("--provision_endpoint=%s", provisionEndpoint),
-            String.format("--control_endpoint=%s", controlEndpoint));
+    Boolean retainDockerContainer =
+        pipelineOptions.as(ManualDockerEnvironmentOptions.class).getRetainDockerContainers();
+    if (!retainDockerContainer) {
+      dockerOptsBuilder.add("--rm");
 
 Review comment:
   Indeed, this looks like a rebasing error, and a tricky one to get right: we 
changed how container removal worked before the rebase was done. We initially 
used `--rm`, but before the rebase we changed it to remove the container 
explicitly via `docker remove `.
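The two container-removal strategies discussed here can be contrasted in a small sketch. It assumes the runner builds `docker run` options as a list of strings. When the runner removes containers itself, the sketch assumes it issues `docker rm <containerId>` after shutdown. `ContainerCleanupSketch` and its methods are hypothetical. They are not the code in `DockerEnvironmentFactory`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch contrasting two ways of cleaning up an SDK harness container. */
final class ContainerCleanupSketch {
  private ContainerCleanupSketch() {}

  /**
   * Strategy 1: let Docker delete the container when it exits by passing --rm to
   * `docker run`, unless the user asked to retain containers.
   */
  static List<String> runOptionsWithAutoRemove(boolean retainContainer) {
    List<String> opts = new ArrayList<>();
    if (!retainContainer) {
      opts.add("--rm");
    }
    return opts;
  }

  /**
   * Strategy 2: never pass --rm. Instead, return the command the runner would run itself
   * to remove the stopped container, or nothing if the container should be kept.
   */
  static Optional<List<String>> explicitRemoveCommand(String containerId, boolean retainContainer) {
    if (retainContainer) {
      return Optional.empty();
    }
    return Optional.of(List.of("docker", "rm", containerId));
  }
}
```

In this sketch, the two strategies are alternatives. If a container were started with `--rm` and also removed explicitly, the explicit `docker rm` could find the container already gone.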
 



Issue Time Tracking
---

Worklog Id: (was: 415392)
Time Spent: 4.5h  (was: 4h 20m)



[jira] [Work logged] (BEAM-7945) Allow runner to configure "semi_persist_dir" which is used in the SDK harness

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7945?focusedWorklogId=415099=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415099
 ]

ASF GitHub Bot logged work on BEAM-7945:


Author: ASF GitHub Bot
Created on: 02/Apr/20 22:40
Start Date: 02/Apr/20 22:40
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #9452: [BEAM-7945] 
Allow runner to configure semi_persist_dir which is used …
URL: https://github.com/apache/beam/pull/9452#discussion_r402637170
 
 

 ##
 File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerEnvironmentFactory.java
 ##
 @@ -132,20 +133,30 @@ public RemoteEnvironment createEnvironment(Environment environment) throws Excep
             // host networking on Mac)
             .add("--env=DOCKER_MAC_CONTAINER=" + System.getenv("DOCKER_MAC_CONTAINER"));
 
-    List<String> args =
-        ImmutableList.of(
-            String.format("--id=%s", workerId),
-            String.format("--logging_endpoint=%s", loggingEndpoint),
-            String.format("--artifact_endpoint=%s", artifactEndpoint),
-            String.format("--provision_endpoint=%s", provisionEndpoint),
-            String.format("--control_endpoint=%s", controlEndpoint));
+    Boolean retainDockerContainer =
+        pipelineOptions.as(ManualDockerEnvironmentOptions.class).getRetainDockerContainers();
+    if (!retainDockerContainer) {
+      dockerOptsBuilder.add("--rm");
 
 Review comment:
   Why was this added in this PR? It seems orthogonal to `semi_persist_dir`. I 
believe this is a regression; perhaps a rebasing error?
 



Issue Time Tracking
---

Worklog Id: (was: 415099)
Time Spent: 4h 20m  (was: 4h 10m)



[jira] [Work logged] (BEAM-7945) Allow runner to configure "semi_persist_dir" which is used in the SDK harness

2019-09-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7945?focusedWorklogId=315209=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-315209
 ]

ASF GitHub Bot logged work on BEAM-7945:


Author: ASF GitHub Bot
Created on: 19/Sep/19 17:51
Start Date: 19/Sep/19 17:51
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #9452: [BEAM-7945] Allow 
runner to configure semi_persist_dir which is used …
URL: https://github.com/apache/beam/pull/9452
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 315209)
Time Spent: 4h 10m  (was: 4h)



[jira] [Work logged] (BEAM-7945) Allow runner to configure "semi_persist_dir" which is used in the SDK harness

2019-09-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7945?focusedWorklogId=315208=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-315208
 ]

ASF GitHub Bot logged work on BEAM-7945:


Author: ASF GitHub Bot
Created on: 19/Sep/19 17:50
Start Date: 19/Sep/19 17:50
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #9452: [BEAM-7945] Allow 
runner to configure semi_persist_dir which is used …
URL: https://github.com/apache/beam/pull/9452#discussion_r326304747
 
 

 ##
 File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerEnvironmentFactory.java
 ##
 @@ -132,20 +133,30 @@ public RemoteEnvironment createEnvironment(Environment environment) throws Excep
             // host networking on Mac)
             .add("--env=DOCKER_MAC_CONTAINER=" + System.getenv("DOCKER_MAC_CONTAINER"));
 
-    List<String> args =
-        ImmutableList.of(
-            String.format("--id=%s", workerId),
-            String.format("--logging_endpoint=%s", loggingEndpoint),
-            String.format("--artifact_endpoint=%s", artifactEndpoint),
-            String.format("--provision_endpoint=%s", provisionEndpoint),
-            String.format("--control_endpoint=%s", controlEndpoint));
+    Boolean retainDockerContainer =
+        pipelineOptions.as(ManualDockerEnvironmentOptions.class).getRetainDockerContainers();
+    if (!retainDockerContainer) {
+      dockerOptsBuilder.add("--rm");
+    }
+
+    String semiPersistDir = pipelineOptions.as(RemoteEnvironmentOptions.class).getSemiPersistDir();
+    ImmutableList.Builder<String> argsBuilder =
+        ImmutableList.<String>builder()
+            .add(String.format("--id=%s", workerId))
+            .add(String.format("--logging_endpoint=%s", loggingEndpoint))
+            .add(String.format("--artifact_endpoint=%s", artifactEndpoint))
+            .add(String.format("--provision_endpoint=%s", provisionEndpoint))
+            .add(String.format("--control_endpoint=%s", controlEndpoint));
+    if (semiPersistDir != null) {
 
 Review comment:
   Actually, the semi_persist_dir is not inferred from the pipeline options in 
the bootloader code. As you said, it has to be this way currently, but it 
would be nice to avoid duplicating this information in the future.
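Without the Beam and Guava types, the argument construction in this hunk follows a simple pattern. The endpoint flags are always passed, and a semi-persistent directory flag is appended only when the option is set. The following is a minimal standalone sketch that uses plain `java.util` collections in place of `ImmutableList.Builder`. `HarnessArgs.build` is a hypothetical method. The `--semi_persist_dir=` spelling is an assumption based on the `semi_persist_dir` flag name in `boot.go`, since the hunk is cut off before the appended argument.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical standalone version of the harness argument construction shown in the diff. */
final class HarnessArgs {
  private HarnessArgs() {}

  static List<String> build(
      String workerId,
      String loggingEndpoint,
      String artifactEndpoint,
      String provisionEndpoint,
      String controlEndpoint,
      String semiPersistDir) { // null when the option is unset
    List<String> args = new ArrayList<>();
    args.add(String.format("--id=%s", workerId));
    args.add(String.format("--logging_endpoint=%s", loggingEndpoint));
    args.add(String.format("--artifact_endpoint=%s", artifactEndpoint));
    args.add(String.format("--provision_endpoint=%s", provisionEndpoint));
    args.add(String.format("--control_endpoint=%s", controlEndpoint));
    // Only override the boot program's default when the runner was configured with a value.
    if (semiPersistDir != null) {
      args.add(String.format("--semi_persist_dir=%s", semiPersistDir));
    }
    return Collections.unmodifiableList(args);
  }
}
```

When a directory such as "/data/beam" is passed, the result has six entries ending with `--semi_persist_dir=/data/beam`. With `null`, only the five endpoint and ID flags are produced, so the boot program keeps its own default.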
 



Issue Time Tracking
---

Worklog Id: (was: 315208)
Time Spent: 4h  (was: 3h 50m)



[jira] [Work logged] (BEAM-7945) Allow runner to configure "semi_persist_dir" which is used in the SDK harness

2019-09-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7945?focusedWorklogId=315202=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-315202
 ]

ASF GitHub Bot logged work on BEAM-7945:


Author: ASF GitHub Bot
Created on: 19/Sep/19 17:32
Start Date: 19/Sep/19 17:32
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #9452: [BEAM-7945] Allow 
runner to configure semi_persist_dir which is used …
URL: https://github.com/apache/beam/pull/9452#discussion_r326296442
 
 

 ##
 File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerEnvironmentFactory.java
 ##
 @@ -132,20 +133,30 @@ public RemoteEnvironment createEnvironment(Environment environment) throws Excep
             // host networking on Mac)
             .add("--env=DOCKER_MAC_CONTAINER=" + System.getenv("DOCKER_MAC_CONTAINER"));
 
-    List<String> args =
-        ImmutableList.of(
-            String.format("--id=%s", workerId),
-            String.format("--logging_endpoint=%s", loggingEndpoint),
-            String.format("--artifact_endpoint=%s", artifactEndpoint),
-            String.format("--provision_endpoint=%s", provisionEndpoint),
-            String.format("--control_endpoint=%s", controlEndpoint));
+    Boolean retainDockerContainer =
+        pipelineOptions.as(ManualDockerEnvironmentOptions.class).getRetainDockerContainers();
+    if (!retainDockerContainer) {
+      dockerOptsBuilder.add("--rm");
+    }
+
+    String semiPersistDir = pipelineOptions.as(RemoteEnvironmentOptions.class).getSemiPersistDir();
+    ImmutableList.Builder<String> argsBuilder =
+        ImmutableList.<String>builder()
+            .add(String.format("--id=%s", workerId))
+            .add(String.format("--logging_endpoint=%s", loggingEndpoint))
+            .add(String.format("--artifact_endpoint=%s", artifactEndpoint))
+            .add(String.format("--provision_endpoint=%s", provisionEndpoint))
+            .add(String.format("--control_endpoint=%s", controlEndpoint));
+    if (semiPersistDir != null) {
 
 Review comment:
   That's right. This is redundant here. 
 



Issue Time Tracking
---

Worklog Id: (was: 315202)
Time Spent: 3h 50m  (was: 3h 40m)



[jira] [Work logged] (BEAM-7945) Allow runner to configure "semi_persist_dir" which is used in the SDK harness

2019-09-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7945?focusedWorklogId=315174=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-315174
 ]

ASF GitHub Bot logged work on BEAM-7945:


Author: ASF GitHub Bot
Created on: 19/Sep/19 16:50
Start Date: 19/Sep/19 16:50
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #9452: [BEAM-7945] 
Allow runner to configure semi_persist_dir which is used …
URL: https://github.com/apache/beam/pull/9452#discussion_r326278269
 
 

 ##
 File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerEnvironmentFactory.java
 ##
 @@ -132,20 +133,30 @@ public RemoteEnvironment createEnvironment(Environment environment) throws Excep
             // host networking on Mac)
             .add("--env=DOCKER_MAC_CONTAINER=" + System.getenv("DOCKER_MAC_CONTAINER"));
 
-    List<String> args =
-        ImmutableList.of(
-            String.format("--id=%s", workerId),
-            String.format("--logging_endpoint=%s", loggingEndpoint),
-            String.format("--artifact_endpoint=%s", artifactEndpoint),
-            String.format("--provision_endpoint=%s", provisionEndpoint),
-            String.format("--control_endpoint=%s", controlEndpoint));
+    Boolean retainDockerContainer =
+        pipelineOptions.as(ManualDockerEnvironmentOptions.class).getRetainDockerContainers();
+    if (!retainDockerContainer) {
+      dockerOptsBuilder.add("--rm");
+    }
+
+    String semiPersistDir = pipelineOptions.as(RemoteEnvironmentOptions.class).getSemiPersistDir();
+    ImmutableList.Builder<String> argsBuilder =
+        ImmutableList.<String>builder()
+            .add(String.format("--id=%s", workerId))
+            .add(String.format("--logging_endpoint=%s", loggingEndpoint))
+            .add(String.format("--artifact_endpoint=%s", artifactEndpoint))
+            .add(String.format("--provision_endpoint=%s", provisionEndpoint))
+            .add(String.format("--control_endpoint=%s", controlEndpoint));
+    if (semiPersistDir != null) {
 
 Review comment:
   So we essentially pass the same piece of information to the worker twice: as 
an entry point argument and again within the pipeline options. It needs to be 
done this way due to the container contract, but it would be nice to revisit 
this in the future.
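The container contract described here can be illustrated from the receiving side. The boot program reads the directory from its own command-line flag, with a "/tmp" default, rather than from the pipeline options it obtains later. The following is a hypothetical Java analogue of that flag handling. The real boot program is written in Go. `BootFlags` is an illustrative name, and only the `--name=value` form is handled.

```java
/** Hypothetical Java analogue of a boot program reading its semi-persistent directory flag. */
final class BootFlags {
  static final String FLAG_PREFIX = "--semi_persist_dir=";
  static final String DEFAULT_DIR = "/tmp";

  private BootFlags() {}

  /**
   * Returns the value of the last --semi_persist_dir=VALUE argument, or the default when the
   * flag is absent. Other flag syntaxes (e.g. a separate value argument) are not handled here.
   */
  static String semiPersistDir(String[] args) {
    String dir = DEFAULT_DIR;
    for (String arg : args) {
      if (arg.startsWith(FLAG_PREFIX)) {
        dir = arg.substring(FLAG_PREFIX.length());
      }
    }
    return dir;
  }
}
```

Because the value is taken only from the arguments, the runner must pass it on the command line even when the same setting also travels inside the pipeline options.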
 



Issue Time Tracking
---

Worklog Id: (was: 315174)
Time Spent: 3h 40m  (was: 3.5h)



[jira] [Work logged] (BEAM-7945) Allow runner to configure "semi_persist_dir" which is used in the SDK harness

2019-09-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7945?focusedWorklogId=315166=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-315166
 ]

ASF GitHub Bot logged work on BEAM-7945:


Author: ASF GitHub Bot
Created on: 19/Sep/19 16:37
Start Date: 19/Sep/19 16:37
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #9452: [BEAM-7945] 
Allow runner to configure semi_persist_dir which is used …
URL: https://github.com/apache/beam/pull/9452#discussion_r326273162
 
 

 ##
 File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerEnvironmentFactory.java
 ##
 @@ -132,20 +133,30 @@ public RemoteEnvironment createEnvironment(Environment environment) throws Excep
             // host networking on Mac)
             .add("--env=DOCKER_MAC_CONTAINER=" + System.getenv("DOCKER_MAC_CONTAINER"));
 
-    List<String> args =
-        ImmutableList.of(
-            String.format("--id=%s", workerId),
-            String.format("--logging_endpoint=%s", loggingEndpoint),
-            String.format("--artifact_endpoint=%s", artifactEndpoint),
-            String.format("--provision_endpoint=%s", provisionEndpoint),
-            String.format("--control_endpoint=%s", controlEndpoint));
+    Boolean retainDockerContainer =
+        pipelineOptions.as(ManualDockerEnvironmentOptions.class).getRetainDockerContainers();
+    if (!retainDockerContainer) {
+      dockerOptsBuilder.add("--rm");
+    }
+
+    String semiPersistDir = pipelineOptions.as(RemoteEnvironmentOptions.class).getSemiPersistDir();
+    ImmutableList.Builder<String> argsBuilder =
+        ImmutableList.<String>builder()
+            .add(String.format("--id=%s", workerId))
+            .add(String.format("--logging_endpoint=%s", loggingEndpoint))
+            .add(String.format("--artifact_endpoint=%s", artifactEndpoint))
+            .add(String.format("--provision_endpoint=%s", provisionEndpoint))
+            .add(String.format("--control_endpoint=%s", controlEndpoint));
+    if (semiPersistDir != null) {
 
 Review comment:
   Isn't this provided to the environment through the provision endpoint?
 



Issue Time Tracking
---

Worklog Id: (was: 315166)
Time Spent: 3.5h  (was: 3h 20m)



[jira] [Work logged] (BEAM-7945) Allow runner to configure "semi_persist_dir" which is used in the SDK harness

2019-09-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7945?focusedWorklogId=315153=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-315153
 ]

ASF GitHub Bot logged work on BEAM-7945:


Author: ASF GitHub Bot
Created on: 19/Sep/19 16:30
Start Date: 19/Sep/19 16:30
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #9452: [BEAM-7945] Allow runner 
to configure semi_persist_dir which is used …
URL: https://github.com/apache/beam/pull/9452#issuecomment-533209634
 
 
   I've squashed the fixup commits and updated the PR. Will merge once the 
tests pass again.
 



Issue Time Tracking
---

Worklog Id: (was: 315153)
Time Spent: 3h 20m  (was: 3h 10m)



[jira] [Work logged] (BEAM-7945) Allow runner to configure "semi_persist_dir" which is used in the SDK harness

2019-09-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7945?focusedWorklogId=314955=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-314955
 ]

ASF GitHub Bot logged work on BEAM-7945:


Author: ASF GitHub Bot
Created on: 19/Sep/19 10:36
Start Date: 19/Sep/19 10:36
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on issue #9452: [BEAM-7945] 
Allow runner to configure semi_persist_dir which is used …
URL: https://github.com/apache/beam/pull/9452#issuecomment-533071697
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 314955)
Time Spent: 3h 10m  (was: 3h)



[jira] [Work logged] (BEAM-7945) Allow runner to configure "semi_persist_dir" which is used in the SDK harness

2019-09-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7945?focusedWorklogId=314875=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-314875
 ]

ASF GitHub Bot logged work on BEAM-7945:


Author: ASF GitHub Bot
Created on: 19/Sep/19 07:39
Start Date: 19/Sep/19 07:39
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on issue #9452: [BEAM-7945] 
Allow runner to configure semi_persist_dir which is used …
URL: https://github.com/apache/beam/pull/9452#issuecomment-533008240
 
 
   @mxm @tweise Thanks a lot for the review, and sorry that I missed the 
comments from @tweise.
   I have updated the PR; it would be great if you could take another look.
 



Issue Time Tracking
---

Worklog Id: (was: 314875)
Time Spent: 3h  (was: 2h 50m)



[jira] [Work logged] (BEAM-7945) Allow runner to configure "semi_persist_dir" which is used in the SDK harness

2019-09-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7945?focusedWorklogId=311277=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-311277
 ]

ASF GitHub Bot logged work on BEAM-7945:


Author: ASF GitHub Bot
Created on: 12/Sep/19 08:58
Start Date: 12/Sep/19 08:58
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on issue #9452: [BEAM-7945] 
Allow runner to configure semi_persist_dir which is used …
URL: https://github.com/apache/beam/pull/9452#issuecomment-530733430
 
 
   R: @mxm 
 



Issue Time Tracking
---

Worklog Id: (was: 311277)
Time Spent: 2h 50m  (was: 2h 40m)

> Allow runner to configure "semi_persist_dir" which is used in the SDK harness
> -
>
> Key: BEAM-7945
> URL: https://issues.apache.org/jira/browse/BEAM-7945
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution, sdk-go, sdk-java-core, sdk-py-core
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7945) Allow runner to configure "semi_persist_dir" which is used in the SDK harness

2019-09-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7945?focusedWorklogId=308248=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308248
 ]

ASF GitHub Bot logged work on BEAM-7945:


Author: ASF GitHub Bot
Created on: 07/Sep/19 00:59
Start Date: 07/Sep/19 00:59
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #9452: [BEAM-7945] 
Allow runner to configure semi_persist_dir which is used …
URL: https://github.com/apache/beam/pull/9452#discussion_r321948469
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/options/RemoteEnvironmentOptions.java
 ##
 @@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.options;
+
+import com.google.auto.service.AutoService;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
+
+/** Options that are used to control configuration of the remote environment. */
+@Experimental
+@Hidden
+public interface RemoteEnvironmentOptions extends PipelineOptions {
+
+  @Description("Local semi-persistent directory")
+  @Default.String("/tmp")
 
 Review comment:
   I think the default should be null (no default), so that the environment can 
pick its suitable tmp directory when nothing is specified by the user. 
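To illustrate the null-default alternative suggested here: if the option has no default, the runner can simply omit the flag when the user leaves it unset, and the harness then keeps its own default. This is a sketch under that assumption; `OptionalSemiPersistDir` and `extraArgs` are hypothetical, and `--semi_persist_dir` is assumed to be the harness flag.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: forward semi_persist_dir only when the user actually set it. */
final class OptionalSemiPersistDir {
  private OptionalSemiPersistDir() {}

  /**
   * Returns extra worker arguments. With a null default, an unset option yields no flag,
   * so the harness keeps whatever default it picks for its own environment.
   */
  static List<String> extraArgs(String semiPersistDir) {
    List<String> extra = new ArrayList<>();
    if (semiPersistDir != null && !semiPersistDir.isEmpty()) {
      extra.add("--semi_persist_dir=" + semiPersistDir);
    }
    return extra;
  }
}
```

Treating an empty string like null is an extra assumption of this sketch; it avoids passing a flag with no value.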
 



Issue Time Tracking
---

Worklog Id: (was: 308248)
Time Spent: 2h 40m  (was: 2.5h)

> Allow runner to configure "semi_persist_dir" which is used in the SDK harness
> -
>
> Key: BEAM-7945
> URL: https://issues.apache.org/jira/browse/BEAM-7945
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution, sdk-go, sdk-java-core, sdk-py-core
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7945) Allow runner to configure "semi_persist_dir" which is used in the SDK harness

2019-09-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7945?focusedWorklogId=308245=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308245
 ]

ASF GitHub Bot logged work on BEAM-7945:


Author: ASF GitHub Bot
Created on: 07/Sep/19 00:58
Start Date: 07/Sep/19 00:58
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #9452: [BEAM-7945] 
Allow runner to configure semi_persist_dir which is used …
URL: https://github.com/apache/beam/pull/9452#discussion_r321948469
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/options/RemoteEnvironmentOptions.java
 ##
 @@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.options;
+
+import com.google.auto.service.AutoService;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
+
+/** Options that are used to control configuration of the remote environment. */
+@Experimental
+@Hidden
+public interface RemoteEnvironmentOptions extends PipelineOptions {
+
+  @Description("Local semi-persistent directory")
+  @Default.String("/tmp")
 
 Review comment:
   I think the default should be null, so that the environment can pick its 
suitable tmp directory when nothing is specified by the user. 
 



Issue Time Tracking
---

Worklog Id: (was: 308245)
Time Spent: 2.5h  (was: 2h 20m)

> Allow runner to configure "semi_persist_dir" which is used in the SDK harness
> -
>
> Key: BEAM-7945
> URL: https://issues.apache.org/jira/browse/BEAM-7945
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution, sdk-go, sdk-java-core, sdk-py-core
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7945) Allow runner to configure "semi_persist_dir" which is used in the SDK harness

2019-09-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7945?focusedWorklogId=308241=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308241
 ]

ASF GitHub Bot logged work on BEAM-7945:


Author: ASF GitHub Bot
Created on: 07/Sep/19 00:49
Start Date: 07/Sep/19 00:49
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on issue #9452: [BEAM-7945] 
Allow runner to configure semi_persist_dir which is used …
URL: https://github.com/apache/beam/pull/9452#issuecomment-529056178
 
 
   Thanks for the review @mxm !
   I have updated the PR according to your comments. I would appreciate it if
you could take another look.
 



Issue Time Tracking
---

Worklog Id: (was: 308241)
Time Spent: 2h 20m  (was: 2h 10m)

> Allow runner to configure "semi_persist_dir" which is used in the SDK harness
> -
>
> Key: BEAM-7945
> URL: https://issues.apache.org/jira/browse/BEAM-7945
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution, sdk-go, sdk-java-core, sdk-py-core
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7945) Allow runner to configure "semi_persist_dir" which is used in the SDK harness

2019-09-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7945?focusedWorklogId=307892=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307892
 ]

ASF GitHub Bot logged work on BEAM-7945:


Author: ASF GitHub Bot
Created on: 06/Sep/19 14:24
Start Date: 06/Sep/19 14:24
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #9452: [BEAM-7945] Allow 
runner to configure semi_persist_dir which is used …
URL: https://github.com/apache/beam/pull/9452#discussion_r321759869
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/options/RemoteEnvironmentOptions.java
 ##
 @@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.options;
+
+import com.google.auto.service.AutoService;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
+
+/** Options that are used to control configuration of the remote environment. */
+@Experimental
+@Hidden
+public interface RemoteEnvironmentOptions extends PipelineOptions {
+
+  @Description("Local semi-persistent directory")
+  @Default.String("/tmp")
 
 Review comment:
   Let's keep the existing default.
 



Issue Time Tracking
---

Worklog Id: (was: 307892)
Time Spent: 2h 10m  (was: 2h)

> Allow runner to configure "semi_persist_dir" which is used in the SDK harness
> -
>
> Key: BEAM-7945
> URL: https://issues.apache.org/jira/browse/BEAM-7945
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution, sdk-go, sdk-java-core, sdk-py-core
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7945) Allow runner to configure "semi_persist_dir" which is used in the SDK harness

2019-09-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7945?focusedWorklogId=307745=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307745
 ]

ASF GitHub Bot logged work on BEAM-7945:


Author: ASF GitHub Bot
Created on: 06/Sep/19 10:29
Start Date: 06/Sep/19 10:29
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on pull request #9452: 
[BEAM-7945] Allow runner to configure semi_persist_dir which is used …
URL: https://github.com/apache/beam/pull/9452#discussion_r321674314
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/options/RemoteEnvironmentOptions.java
 ##
 @@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.options;
+
+import com.google.auto.service.AutoService;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
+
+/** Options that are used to control configuration of the remote environment. */
+@Experimental
+@Hidden
+public interface RemoteEnvironmentOptions extends PipelineOptions {
+
+  @Description("Local semi-persistent directory")
+  @Default.String("/tmp")
 
 Review comment:
   Currently, this keeps the same default value as the existing configuration
elsewhere, such as `boot.go`:
   - 
https://github.com/apache/beam/blob/d21bbaf4c70986c2dbdbe8f6fce35b2b2cb4843d/sdks/go/container/boot.go#L41
   - 
https://github.com/apache/beam/blob/d21bbaf4c70986c2dbdbe8f6fce35b2b2cb4843d/sdks/python/container/boot.go#L51
   
   So, how about we keep using `/tmp` as default value ?
 



Issue Time Tracking
---

Worklog Id: (was: 307745)
Time Spent: 2h  (was: 1h 50m)

> Allow runner to configure "semi_persist_dir" which is used in the SDK harness
> -
>
> Key: BEAM-7945
> URL: https://issues.apache.org/jira/browse/BEAM-7945
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution, sdk-go, sdk-java-core, sdk-py-core
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7945) Allow runner to configure "semi_persist_dir" which is used in the SDK harness

2019-09-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7945?focusedWorklogId=307725=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307725
 ]

ASF GitHub Bot logged work on BEAM-7945:


Author: ASF GitHub Bot
Created on: 06/Sep/19 09:39
Start Date: 06/Sep/19 09:39
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #9452: [BEAM-7945] Allow 
runner to configure semi_persist_dir which is used …
URL: https://github.com/apache/beam/pull/9452#discussion_r321655867
 
 

 ##
 File path: 
sdks/java/core/src/test/java/org/apache/beam/sdk/options/RemoteEnvironmentOptionsTest.java
 ##
 @@ -0,0 +1,37 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.options;
+
+import static org.junit.Assert.assertEquals;
+
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/** Tests for {@link RemoteEnvironmentOptions}. */
+@RunWith(JUnit4.class)
+public class RemoteEnvironmentOptionsTest {
+
+  @Test
+  public void testSemiDirectory() {
+    RemoteEnvironmentOptions options = PipelineOptionsFactory.as(RemoteEnvironmentOptions.class);
+    String semiDir = "/ab/cd";
+    options.setSemiPersistDir(semiDir);
+    assertEquals(semiDir, options.getSemiPersistDir());
 
 Review comment:
   This should also test the default.
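In Beam the default would be checked by calling `getSemiPersistDir()` on a fresh `PipelineOptionsFactory.as(RemoteEnvironmentOptions.class)` before any setter runs. The dependency-free sketch below shows the shape of such a test: assert the default first, then the override. `FakeRemoteEnvironmentOptions` is a hypothetical stand-in for the proxy-backed options interface, using the `/tmp` default from the diff under review.

```java
/** Hypothetical stand-in for the proxy-backed RemoteEnvironmentOptions interface. */
final class FakeRemoteEnvironmentOptions {
  static final String DEFAULT_SEMI_PERSIST_DIR = "/tmp"; // default used in the reviewed diff
  private String semiPersistDir = DEFAULT_SEMI_PERSIST_DIR;

  String getSemiPersistDir() {
    return semiPersistDir;
  }

  void setSemiPersistDir(String dir) {
    this.semiPersistDir = dir;
  }
}

/** Sketch of a test that covers both the default and an explicit override. */
final class RemoteEnvironmentOptionsSketchTest {
  private RemoteEnvironmentOptionsSketchTest() {}

  /** Throws AssertionError if either check fails. */
  static void run() {
    FakeRemoteEnvironmentOptions options = new FakeRemoteEnvironmentOptions();
    // Check the default before any setter is called.
    check("/tmp".equals(options.getSemiPersistDir()), "default should be /tmp");
    options.setSemiPersistDir("/ab/cd");
    check("/ab/cd".equals(options.getSemiPersistDir()), "override should be returned");
  }

  private static void check(boolean condition, String message) {
    if (!condition) {
      throw new AssertionError(message);
    }
  }
}
```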
 



Issue Time Tracking
---

Worklog Id: (was: 307725)
Time Spent: 1h 50m  (was: 1h 40m)

> Allow runner to configure "semi_persist_dir" which is used in the SDK harness
> -
>
> Key: BEAM-7945
> URL: https://issues.apache.org/jira/browse/BEAM-7945
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution, sdk-go, sdk-java-core, sdk-py-core
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7945) Allow runner to configure "semi_persist_dir" which is used in the SDK harness

2019-09-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7945?focusedWorklogId=307723=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307723
 ]

ASF GitHub Bot logged work on BEAM-7945:


Author: ASF GitHub Bot
Created on: 06/Sep/19 09:39
Start Date: 06/Sep/19 09:39
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #9452: [BEAM-7945] Allow 
runner to configure semi_persist_dir which is used …
URL: https://github.com/apache/beam/pull/9452#discussion_r321655867
 
 

 ##
 File path: 
sdks/java/core/src/test/java/org/apache/beam/sdk/options/RemoteEnvironmentOptionsTest.java
 ##
 @@ -0,0 +1,37 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.options;
+
+import static org.junit.Assert.assertEquals;
+
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/** Tests for {@link RemoteEnvironmentOptions}. */
+@RunWith(JUnit4.class)
+public class RemoteEnvironmentOptionsTest {
+
+  @Test
+  public void testSemiDirectory() {
+    RemoteEnvironmentOptions options = PipelineOptionsFactory.as(RemoteEnvironmentOptions.class);
+    String semiDir = "/ab/cd";
+    options.setSemiPersistDir(semiDir);
+    assertEquals(semiDir, options.getSemiPersistDir());
 
 Review comment:
   This should test the default.
 



Issue Time Tracking
---

Worklog Id: (was: 307723)
Time Spent: 1.5h  (was: 1h 20m)

> Allow runner to configure "semi_persist_dir" which is used in the SDK harness
> -
>
> Key: BEAM-7945
> URL: https://issues.apache.org/jira/browse/BEAM-7945
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution, sdk-go, sdk-java-core, sdk-py-core
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7945) Allow runner to configure "semi_persist_dir" which is used in the SDK harness

2019-09-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7945?focusedWorklogId=307724=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307724
 ]

ASF GitHub Bot logged work on BEAM-7945:


Author: ASF GitHub Bot
Created on: 06/Sep/19 09:39
Start Date: 06/Sep/19 09:39
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #9452: [BEAM-7945] Allow 
runner to configure semi_persist_dir which is used …
URL: https://github.com/apache/beam/pull/9452#discussion_r321656526
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/options/RemoteEnvironmentOptions.java
 ##
 @@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.options;
+
+import com.google.auto.service.AutoService;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
+
+/** Options that are used to control configuration of the remote environment. */
+@Experimental
+@Hidden
+public interface RemoteEnvironmentOptions extends PipelineOptions {
+
+  @Description("Local semi-persistent directory")
+  @Default.String("/tmp")
 
 Review comment:
   Should this be `System.getProperty("java.io.tmpdir")`?
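A sketch of what a JVM-derived default could look like, assuming the default is computed at runtime rather than fixed in `@Default.String`. In Beam this would normally go through `@Default.InstanceFactory` with a `DefaultValueFactory`; to stay dependency-free, the hypothetical `TmpDirDefault` below implements `java.util.function.Supplier` instead. One caveat: `java.io.tmpdir` reflects the runner's JVM, which may not match the filesystem of the environment (for example a Docker container) where the harness actually runs.

```java
import java.util.function.Supplier;

/**
 * Hypothetical stand-in for a Beam DefaultValueFactory: computes the default at runtime
 * from the JVM's "java.io.tmpdir" property instead of hard-coding "/tmp".
 */
final class TmpDirDefault implements Supplier<String> {
  @Override
  public String get() {
    // The property key is a String, so it needs double quotes in Java.
    String tmp = System.getProperty("java.io.tmpdir");
    // Keep "/tmp" as a last resort if the property is missing or empty.
    return (tmp == null || tmp.isEmpty()) ? "/tmp" : tmp;
  }
}
```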
 



Issue Time Tracking
---

Worklog Id: (was: 307724)
Time Spent: 1h 40m  (was: 1.5h)

> Allow runner to configure "semi_persist_dir" which is used in the SDK harness
> -
>
> Key: BEAM-7945
> URL: https://issues.apache.org/jira/browse/BEAM-7945
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution, sdk-go, sdk-java-core, sdk-py-core
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7945) Allow runner to configure "semi_persist_dir" which is used in the SDK harness

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7945?focusedWorklogId=307599=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307599
 ]

ASF GitHub Bot logged work on BEAM-7945:


Author: ASF GitHub Bot
Created on: 06/Sep/19 03:21
Start Date: 06/Sep/19 03:21
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on pull request #9452: 
[BEAM-7945] Allow runner to configure semi_persist_dir which is used …
URL: https://github.com/apache/beam/pull/9452#discussion_r321562506
 
 

 ##
 File path: model/fn-execution/src/main/proto/beam_fn_api.proto
 ##
 @@ -815,6 +815,7 @@ message StartWorkerRequest {
   org.apache.beam.model.pipeline.v1.ApiServiceDescriptor logging_endpoint = 3;
   org.apache.beam.model.pipeline.v1.ApiServiceDescriptor artifact_endpoint = 4;
   org.apache.beam.model.pipeline.v1.ApiServiceDescriptor provision_endpoint = 5;
+  string semi_persist_dir = 6;
 
 Review comment:
   Oh, yes, I see: this change is unnecessary, since `pipeline_options` is
already defined in `ProvisionInfo`.
 



Issue Time Tracking
---

Worklog Id: (was: 307599)
Time Spent: 1h 20m  (was: 1h 10m)

> Allow runner to configure "semi_persist_dir" which is used in the SDK harness
> -
>
> Key: BEAM-7945
> URL: https://issues.apache.org/jira/browse/BEAM-7945
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution, sdk-go, sdk-java-core, sdk-py-core
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7945) Allow runner to configure "semi_persist_dir" which is used in the SDK harness

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7945?focusedWorklogId=307221=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307221
 ]

ASF GitHub Bot logged work on BEAM-7945:


Author: ASF GitHub Bot
Created on: 05/Sep/19 14:54
Start Date: 05/Sep/19 14:54
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #9452: [BEAM-7945] 
Allow runner to configure semi_persist_dir which is used …
URL: https://github.com/apache/beam/pull/9452#discussion_r321316686
 
 

 ##
 File path: model/fn-execution/src/main/proto/beam_fn_api.proto
 ##
 @@ -815,6 +815,7 @@ message StartWorkerRequest {
   org.apache.beam.model.pipeline.v1.ApiServiceDescriptor logging_endpoint = 3;
   org.apache.beam.model.pipeline.v1.ApiServiceDescriptor artifact_endpoint = 4;
   org.apache.beam.model.pipeline.v1.ApiServiceDescriptor provision_endpoint = 5;
+  string semi_persist_dir = 6;
 
 Review comment:
   This should not be added here. Pipeline options are provided through the 
provisioning endpoint.
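The reviewer suggests reading the directory from the pipeline options rather than adding a proto field. The minimal Java sketch below illustrates that alternative. It assumes the worker receives its pipeline options from the provisioning service as a simple key/value map. The class name `ProvisionedOptionsSketch`, the `semi_persist_dir` key and the map representation are hypothetical stand-ins, not Beam's provisioning API. Only the "/tmp" fallback comes from the issue description.

```java
import java.util.Map;

/**
 * Hypothetical sketch: a worker resolves its semi-persistent directory from the
 * pipeline options delivered by the provisioning service, rather than from a
 * dedicated StartWorkerRequest field. The Map representation and the
 * "semi_persist_dir" key are assumptions made for illustration.
 */
final class ProvisionedOptionsSketch {
  // Fallback mirroring the "/tmp" default cited for the Python SDK harness.
  static final String DEFAULT_SEMI_PERSIST_DIR = "/tmp";

  private ProvisionedOptionsSketch() {}

  /** Returns the provisioned directory, or the default when it is absent or empty. */
  static String resolveSemiPersistDir(Map<String, ?> provisionedOptions) {
    Object value = provisionedOptions.get("semi_persist_dir");
    if (value instanceof String && !((String) value).isEmpty()) {
      return (String) value;
    }
    return DEFAULT_SEMI_PERSIST_DIR;
  }

  public static void main(String[] args) {
    // A job that configured a dedicated disk versus one that left the option unset.
    System.out.println(resolveSemiPersistDir(Map.of("semi_persist_dir", "/mnt/beam")));
    System.out.println(resolveSemiPersistDir(Map.of()));
  }
}
```

With this approach the proto message stays unchanged. A new setting only needs a new pipeline option, not a new Fn API field.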
 



Issue Time Tracking
---

Worklog Id: (was: 307221)
Time Spent: 1h 10m  (was: 1h)



[jira] [Work logged] (BEAM-7945) Allow runner to configure "semi_persist_dir" which is used in the SDK harness

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7945?focusedWorklogId=307217=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307217
 ]

ASF GitHub Bot logged work on BEAM-7945:


Author: ASF GitHub Bot
Created on: 05/Sep/19 14:49
Start Date: 05/Sep/19 14:49
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #9452: [BEAM-7945] 
Allow runner to configure semi_persist_dir which is used …
URL: https://github.com/apache/beam/pull/9452#discussion_r321313810
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DefaultJobBundleFactory.java
 ##
 @@ -92,18 +92,18 @@
   private final int environmentExpirationMillis;
 
   public static DefaultJobBundleFactory create(JobInfo jobInfo) {
+PipelineOptions pipelineOption =
 
 Review comment:
   "pipelineOptions"
 



Issue Time Tracking
---

Worklog Id: (was: 307217)
Time Spent: 1h  (was: 50m)



[jira] [Work logged] (BEAM-7945) Allow runner to configure "semi_persist_dir" which is used in the SDK harness

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7945?focusedWorklogId=307165=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307165
 ]

ASF GitHub Bot logged work on BEAM-7945:


Author: ASF GitHub Bot
Created on: 05/Sep/19 14:00
Start Date: 05/Sep/19 14:00
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #9452: [BEAM-7945] Allow 
runner to configure semi_persist_dir which is used …
URL: https://github.com/apache/beam/pull/9452#discussion_r321282379
 
 

 ##
 File path: model/fn-execution/src/main/proto/beam_fn_api.proto
 ##
 @@ -815,6 +815,7 @@ message StartWorkerRequest {
   org.apache.beam.model.pipeline.v1.ApiServiceDescriptor logging_endpoint = 3;
   org.apache.beam.model.pipeline.v1.ApiServiceDescriptor artifact_endpoint = 4;
   org.apache.beam.model.pipeline.v1.ApiServiceDescriptor provision_endpoint = 5;
+  string semi_persist_dir = 6;
 
 Review comment:
   I'm not sure whether this flexibility is desired. I could imagine that the 
person who starts the worker pool does not want arbitrary persist directories, 
but rather a fixed one. 
 



Issue Time Tracking
---

Worklog Id: (was: 307165)
Time Spent: 50m  (was: 40m)



[jira] [Work logged] (BEAM-7945) Allow runner to configure "semi_persist_dir" which is used in the SDK harness

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7945?focusedWorklogId=307135=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307135
 ]

ASF GitHub Bot logged work on BEAM-7945:


Author: ASF GitHub Bot
Created on: 05/Sep/19 13:14
Start Date: 05/Sep/19 13:14
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on pull request #9452: 
[BEAM-7945] Allow runner to configure semi_persist_dir which is used …
URL: https://github.com/apache/beam/pull/9452#discussion_r321250359
 
 

 ##
 File path: model/fn-execution/src/main/proto/beam_fn_api.proto
 ##
 @@ -815,6 +815,7 @@ message StartWorkerRequest {
   org.apache.beam.model.pipeline.v1.ApiServiceDescriptor logging_endpoint = 3;
   org.apache.beam.model.pipeline.v1.ApiServiceDescriptor artifact_endpoint = 4;
   org.apache.beam.model.pipeline.v1.ApiServiceDescriptor provision_endpoint = 5;
+  string semi_persist_dir = 6;
 
 Review comment:
   Great to have your suggestions. :)
   Maybe I have not understood your idea. If we configure the directory for the 
whole pool, don't we lose the flexibility for different jobs to set different 
semi_persist_dir values for the workers in the same worker pool?
 



Issue Time Tracking
---

Worklog Id: (was: 307135)
Time Spent: 40m  (was: 0.5h)



[jira] [Work logged] (BEAM-7945) Allow runner to configure "semi_persist_dir" which is used in the SDK harness

2019-09-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7945?focusedWorklogId=305641=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-305641
 ]

ASF GitHub Bot logged work on BEAM-7945:


Author: ASF GitHub Bot
Created on: 03/Sep/19 15:15
Start Date: 03/Sep/19 15:15
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #9452: [BEAM-7945] Allow 
runner to configure semi_persist_dir which is used …
URL: https://github.com/apache/beam/pull/9452#discussion_r320328068
 
 

 ##
 File path: model/fn-execution/src/main/proto/beam_fn_api.proto
 ##
 @@ -815,6 +815,7 @@ message StartWorkerRequest {
   org.apache.beam.model.pipeline.v1.ApiServiceDescriptor logging_endpoint = 3;
   org.apache.beam.model.pipeline.v1.ApiServiceDescriptor artifact_endpoint = 4;
   org.apache.beam.model.pipeline.v1.ApiServiceDescriptor provision_endpoint = 5;
+  string semi_persist_dir = 6;
 
 Review comment:
   Should this be dynamic or rather configured up front for the worker pool?
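The review weighs two designs: a directory configured up front for the worker pool, or one chosen dynamically per job. A small hypothetical Java sketch can contrast them. `WorkerPoolDirPolicy` and its `allowJobOverride` switch are invented for illustration and do not exist in Beam.

```java
/**
 * Hypothetical sketch contrasting the two designs from the review: a
 * semi-persistent directory fixed when the worker pool starts, versus one a job
 * may override. Not a Beam API; names are illustrative only.
 */
final class WorkerPoolDirPolicy {
  private final String poolDir;           // chosen by whoever starts the worker pool
  private final boolean allowJobOverride; // false = fixed per pool, true = per-job choice

  WorkerPoolDirPolicy(String poolDir, boolean allowJobOverride) {
    this.poolDir = poolDir;
    this.allowJobOverride = allowJobOverride;
  }

  /** Returns the directory a worker should use; requestedByJob may be null. */
  String dirForJob(String requestedByJob) {
    if (allowJobOverride && requestedByJob != null && !requestedByJob.isEmpty()) {
      return requestedByJob;
    }
    return poolDir;
  }

  public static void main(String[] args) {
    WorkerPoolDirPolicy fixed = new WorkerPoolDirPolicy("/data/beam", false);
    WorkerPoolDirPolicy flexible = new WorkerPoolDirPolicy("/data/beam", true);
    System.out.println(fixed.dirForJob("/tmp/job-a"));    // /data/beam
    System.out.println(flexible.dirForJob("/tmp/job-a")); // /tmp/job-a
    System.out.println(flexible.dirForJob(null));         // /data/beam
  }
}
```

With `allowJobOverride` set to false, the pool operator keeps full control over disk placement. Setting it to true trades that control for per-job flexibility.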
 



Issue Time Tracking
---

Worklog Id: (was: 305641)
Time Spent: 0.5h  (was: 20m)



[jira] [Work logged] (BEAM-7945) Allow runner to configure "semi_persist_dir" which is used in the SDK harness

2019-08-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7945?focusedWorklogId=303951=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-303951
 ]

ASF GitHub Bot logged work on BEAM-7945:


Author: ASF GitHub Bot
Created on: 29/Aug/19 23:34
Start Date: 29/Aug/19 23:34
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on issue #9452: [BEAM-7945] 
Allow runner to configure semi_persist_dir which is used …
URL: https://github.com/apache/beam/pull/9452#issuecomment-526399094
 
 
   I'd appreciate it if you have time to look at the changes, @robertwb @mxm :)
 



Issue Time Tracking
---

Worklog Id: (was: 303951)
Time Spent: 20m  (was: 10m)



[jira] [Work logged] (BEAM-7945) Allow runner to configure "semi_persist_dir" which is used in the SDK harness

2019-08-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7945?focusedWorklogId=303612=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-303612
 ]

ASF GitHub Bot logged work on BEAM-7945:


Author: ASF GitHub Bot
Created on: 29/Aug/19 11:45
Start Date: 29/Aug/19 11:45
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on pull request #9452: 
[BEAM-7945] Allow runner to configure semi_persist_dir which is used …
URL: https://github.com/apache/beam/pull/9452
 
 
   Currently "semi_persist_dir" is not configurable, which may become a problem 
in certain scenarios. For example, the default value of "semi_persist_dir" in 
the Python SDK harness is "/tmp" 
(https://github.com/apache/beam/blob/master/sdks/python/container/boot.go#L48). 
When the environment type is "PROCESS", the "/tmp" disk may fill up, causing 
unexpected issues in production environments. 
   
   This pull request therefore makes semi_persist_dir configurable by adding 
a new PipelineOption (RemoteEnvironmentOptions). The pipeline option is 
passed to `DefaultJobBundleFactory` and then used in each 
EnvironmentFactory (docker, process, external and embedded).
   
   Details of the discussion can be found in [1].
   
   [1] 
https://lists.apache.org/list.html?d...@beam.apache.org:lte=1M:%5BDISCUSS%5D%20Turn%20%60WindowedValue
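Below is a rough Java sketch of the runner-side flow described in this pull request. It assumes two things: the option exposes a nullable getter, and the SDK harness accepts a `--semi_persist_dir` flag, the name used by the Python boot.go. `SemiPersistDirOptions` and `HarnessArgsSketch` are hypothetical stand-ins for RemoteEnvironmentOptions and an EnvironmentFactory. The endpoint flags other than `--id` and `--control_endpoint` are omitted for brevity.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a pipeline-options interface; not Beam's RemoteEnvironmentOptions. */
@FunctionalInterface
interface SemiPersistDirOptions {
  /** Returns the configured directory, or null if the user did not set one. */
  String getSemiPersistDir();
}

/**
 * Hypothetical sketch of an environment factory turning the option into a harness
 * command-line argument, alongside endpoint flags like those in DockerEnvironmentFactory.
 */
final class HarnessArgsSketch {
  private HarnessArgsSketch() {}

  static List<String> buildArgs(
      String workerId, String controlEndpoint, SemiPersistDirOptions options) {
    List<String> args = new ArrayList<>();
    args.add(String.format("--id=%s", workerId));
    args.add(String.format("--control_endpoint=%s", controlEndpoint));
    String dir = options.getSemiPersistDir();
    // Only pass the flag when set, so the harness keeps its own default ("/tmp").
    if (dir != null && !dir.isEmpty()) {
      args.add(String.format("--semi_persist_dir=%s", dir));
    }
    return args;
  }

  public static void main(String[] args) {
    SemiPersistDirOptions configured = () -> "/mnt/disks/beam";
    SemiPersistDirOptions unset = () -> null;
    System.out.println(buildArgs("1", "localhost:8099", configured));
    System.out.println(buildArgs("1", "localhost:8099", unset));
  }
}
```

Leaving the flag off when the option is unset keeps existing deployments on the harness's current default. The option therefore only changes behavior when a user explicitly sets it.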
   
   
   
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build