[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=97=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-97 ]

ASF GitHub Bot logged work on BEAM-4290:
Author: ASF GitHub Bot
Created on: 12/Jun/18 18:37
Start Date: 12/Jun/18 18:37
Worklog Time Spent: 10m
Work Description: jkff commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService
URL: https://github.com/apache/beam/pull/5591#discussion_r194573872

## File path: runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingServiceTest.java ##
@@ -0,0 +1,406 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.artifact;
+
+import static com.google.common.base.Preconditions.checkArgument;
+
+import com.google.common.base.Joiner;
+import com.google.common.base.Strings;
+import com.google.common.collect.ImmutableMap;
+import com.google.protobuf.ByteString;
+import com.google.protobuf.util.JsonFormat;
+import io.grpc.inprocess.InProcessChannelBuilder;
+import io.grpc.stub.StreamObserver;
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.nio.charset.Charset;
+import java.nio.charset.StandardCharsets;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactChunk;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.Manifest;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Builder;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse;
+import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc;
+import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceStub;
+import org.apache.beam.runners.fnexecution.GrpcFnServer;
+import org.apache.beam.runners.fnexecution.InProcessServerFactory;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/**
+ * Tests for {@link BeamFileSystemArtifactStagingService}.
+ */
+@RunWith(JUnit4.class)
+public class BeamFileSystemArtifactStagingServiceTest {
+
+  private static final Joiner JOINER = Joiner.on("");
+  private static final Charset CHARSET = StandardCharsets.UTF_8;
+  private static final int DATA_1KB = 1 << 10;
+  private GrpcFnServer server;
+  private BeamFileSystemArtifactStagingService artifactStagingService;
+  private ArtifactStagingServiceStub stub;
+  private Path srcDir;
+  private Path destDir;
+
+  @Before
+  public void setUp() throws Exception {
+    artifactStagingService = new BeamFileSystemArtifactStagingService();
+    server = GrpcFnServer
+        .allocatePortAndCreateFor(artifactStagingService, InProcessServerFactory.create());
+    stub =
+        ArtifactStagingServiceGrpc.newStub(
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=94=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94 ]

ASF GitHub Bot logged work on BEAM-4290:
Author: ASF GitHub Bot
Created on: 12/Jun/18 18:37
Start Date: 12/Jun/18 18:37
Worklog Time Spent: 10m
Work Description: jkff commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService
URL: https://github.com/apache/beam/pull/5591#discussion_r194573699

## File path: runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingServiceTest.java ##
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=98=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-98 ]

ASF GitHub Bot logged work on BEAM-4290:
Author: ASF GitHub Bot
Created on: 12/Jun/18 18:37
Start Date: 12/Jun/18 18:37
Worklog Time Spent: 10m
Work Description: jkff closed pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService
URL: https://github.com/apache/beam/pull/5591

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java
new file mode 100644
index 000..48d8ad6d610
--- /dev/null
+++ b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java
@@ -0,0 +1,302 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.artifact;
+
+import static com.google.common.base.Preconditions.checkNotNull;
+
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.google.common.hash.Hashing;
+import com.google.protobuf.util.JsonFormat;
+import io.grpc.stub.StreamObserver;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.nio.charset.StandardCharsets;
+import java.util.Collections;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse;
+import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceImplBase;
+import org.apache.beam.runners.fnexecution.FnService;
+import org.apache.beam.sdk.io.FileSystems;
+import org.apache.beam.sdk.io.fs.MoveOptions.StandardMoveOptions;
+import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions;
+import org.apache.beam.sdk.io.fs.ResourceId;
+import org.apache.beam.sdk.util.MimeTypes;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * This implementation is experimental.
+ *
+ * {@link ArtifactStagingServiceImplBase} based on beam file system. {@link
+ * BeamFileSystemArtifactStagingService} requires {@link StagingSessionToken} in every method call.
+ * The manifest is put in {@link StagingSessionToken#getBasePath()}/{@link
+ * StagingSessionToken#getSessionId()} and artifacts are put in {@link
+ * StagingSessionToken#getBasePath()}/{@link StagingSessionToken#getSessionId()}/{@link
+ * BeamFileSystemArtifactStagingService#ARTIFACTS}.
+ *
+ * The returned token is the path to the manifest file.
+ *
+ * The manifest file is encoded in {@link ProxyManifest}.
+ */
+public class BeamFileSystemArtifactStagingService extends ArtifactStagingServiceImplBase implements
+    FnService {
+
+  private static final Logger LOG =
+      LoggerFactory.getLogger(BeamFileSystemArtifactStagingService.class);
+  private static final ObjectMapper MAPPER = new ObjectMapper();
+  // Use UTF8 for all text encoding.
+  private static final Charset CHARSET = StandardCharsets.UTF_8;
+  public static final String MANIFEST = "MANIFEST";
+  public static final String
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=95=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-95 ]

ASF GitHub Bot logged work on BEAM-4290:
Author: ASF GitHub Bot
Created on: 12/Jun/18 18:37
Start Date: 12/Jun/18 18:37
Worklog Time Spent: 10m
Work Description: jkff commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService
URL: https://github.com/apache/beam/pull/5591#discussion_r194573653

## File path: runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingServiceTest.java ##
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=93=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-93 ]

ASF GitHub Bot logged work on BEAM-4290:
Author: ASF GitHub Bot
Created on: 12/Jun/18 18:37
Start Date: 12/Jun/18 18:37
Worklog Time Spent: 10m
Work Description: jkff commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService
URL: https://github.com/apache/beam/pull/5591#discussion_r194573173

## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java ##
@@ -0,0 +1,302 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.artifact;
+
+import static com.google.common.base.Preconditions.checkNotNull;
+
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.google.common.hash.Hashing;
+import com.google.protobuf.util.JsonFormat;
+import io.grpc.stub.StreamObserver;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.nio.charset.StandardCharsets;
+import java.util.Collections;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse;
+import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceImplBase;
+import org.apache.beam.runners.fnexecution.FnService;
+import org.apache.beam.sdk.io.FileSystems;
+import org.apache.beam.sdk.io.fs.MoveOptions.StandardMoveOptions;
+import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions;
+import org.apache.beam.sdk.io.fs.ResourceId;
+import org.apache.beam.sdk.util.MimeTypes;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * This implementation is experimental.
+ *
+ * {@link ArtifactStagingServiceImplBase} based on beam file system. {@link
+ * BeamFileSystemArtifactStagingService} requires {@link StagingSessionToken} in every method call.
+ * The manifest is put in {@link StagingSessionToken#getBasePath()}/{@link
+ * StagingSessionToken#getSessionId()} and artifacts are put in {@link
+ * StagingSessionToken#getBasePath()}/{@link StagingSessionToken#getSessionId()}/{@link
+ * BeamFileSystemArtifactStagingService#ARTIFACTS}.
+ *
+ * The returned token is the path to the manifest file.
+ *
+ * The manifest file is encoded in {@link ProxyManifest}.
+ */
+public class BeamFileSystemArtifactStagingService extends ArtifactStagingServiceImplBase implements
+    FnService {
+
+  private static final Logger LOG =
+      LoggerFactory.getLogger(BeamFileSystemArtifactStagingService.class);
+  private static final ObjectMapper MAPPER = new ObjectMapper();
+  // Use UTF8 for all text encoding.
+  private static final Charset CHARSET = StandardCharsets.UTF_8;
+  public static final String MANIFEST = "MANIFEST";
+  public static final String ARTIFACTS = "artifacts";
+
+  @Override
+  public StreamObserver<PutArtifactRequest> putArtifact(
+      StreamObserver<PutArtifactResponse> responseObserver) {
+    return new PutArtifactStreamObserver(responseObserver);
+  }
+
+  @Override
+  public void commitManifest(
+      CommitManifestRequest request, StreamObserver<CommitManifestResponse> responseObserver) {
+    try {
+      ResourceId manifestResourceId = getManifestFileResourceId(request.getStagingSessionToken());
+      ResourceId artifactDirResourceId = getArtifactDirResourceId(request.getStagingSessionToken());
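
The directory layout described in the class Javadoc above can be illustrated with a minimal sketch. It assumes plain local paths via java.nio.file rather than Beam's FileSystems/ResourceId API that the PR imports, and the class and method names (StagingLayoutSketch, sessionDir, manifestPath, artifactDir) are hypothetical, not taken from the PR. The MANIFEST location under basePath/sessionId matches the path the PR's test asserts for the returned staging token.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper, not part of the PR: mirrors the layout described in the Javadoc. */
final class StagingLayoutSketch {
  // Same constant values as MANIFEST and ARTIFACTS in the quoted service class.
  static final String MANIFEST = "MANIFEST";
  static final String ARTIFACTS = "artifacts";

  /** basePath/sessionId: the per-session root directory. */
  static Path sessionDir(String basePath, String sessionId) {
    return Paths.get(basePath, sessionId);
  }

  /** basePath/sessionId/MANIFEST: the manifest file for the session. */
  static Path manifestPath(String basePath, String sessionId) {
    return sessionDir(basePath, sessionId).resolve(MANIFEST);
  }

  /** basePath/sessionId/artifacts: the directory that holds staged artifacts. */
  static Path artifactDir(String basePath, String sessionId) {
    return sessionDir(basePath, sessionId).resolve(ARTIFACTS);
  }

  public static void main(String[] args) {
    // Illustrative values only; "/tmp/staging" and "123" are made up for this sketch.
    System.out.println(manifestPath("/tmp/staging", "123"));
    System.out.println(artifactDir("/tmp/staging", "123"));
  }
}
```

Keeping every session under its own directory means two sessions that share a base path never write to the same manifest or artifact directory.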
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=96=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-96 ]

ASF GitHub Bot logged work on BEAM-4290:
Author: ASF GitHub Bot
Created on: 12/Jun/18 18:37
Start Date: 12/Jun/18 18:37
Worklog Time Spent: 10m
Work Description: jkff commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService
URL: https://github.com/apache/beam/pull/5591#discussion_r194573619

## File path: runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingServiceTest.java ##
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=19=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-19 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 12/Jun/18 16:43 Start Date: 12/Jun/18 16:43 Worklog Time Spent: 10m Work Description: robertwb commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194810545 ## File path: runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingServiceTest.java ## @@ -251,17 +258,129 @@ public void putArtifactsMultipleFilesTest() throws Exception { assertFiles(files.keySet(), stagingToken); } + @Test + public void putArtifactsMultipleFilesConcurrentlyTest() throws Exception { +String stagingSession = "123"; +Map files = new HashMap<>(); +files.put("file5cb", (DATA_1KB / 2) /*500b*/); +files.put("file1kb", DATA_1KB /*1 kb*/); +files.put("file15cb", (DATA_1KB * 3) / 2 /*1.5 kb*/); +files.put("nested/file1kb", DATA_1KB /*1 kb*/); +files.put("file10kb", 10 * DATA_1KB /*10 kb*/); +files.put("file100kb", 100 * DATA_1KB /*100 kb*/); + +final String text = "abcdefghinklmop\n"; +files.forEach((fileName, size) -> { + Path filePath = Paths.get(srcDir.toString(), fileName).toAbsolutePath(); + try { +Files.createDirectories(filePath.getParent()); +Files.write(filePath, +Strings.repeat(text, Double.valueOf(Math.ceil(size * 1.0 / text.length())).intValue()) +.getBytes(CHARSET)); + } catch (IOException ignored) { + } +}); +String stagingSessionToken = BeamFileSystemArtifactStagingService +.generateStagingSessionToken(stagingSession, destDir.toUri().getPath()); + +List metadata = new ArrayList<>(); +ExecutorService executorService = Executors.newFixedThreadPool(8); +try { + for (String fileName : files.keySet()) { +executorService.execute(() -> { + try { +putArtifact(stagingSessionToken, 
+Paths.get(srcDir.toString(), fileName).toAbsolutePath().toString(), fileName); + } catch (Exception e) { +Assert.fail(e.getMessage()); + } + metadata.add(ArtifactMetadata.newBuilder().setName(fileName).build()); +}); + } +} finally { + executorService.shutdown(); + executorService.awaitTermination(2, TimeUnit.SECONDS); +} + +String stagingToken = commitManifest(stagingSessionToken, metadata); +Assert.assertEquals( +Paths.get(destDir.toAbsolutePath().toString(), stagingSession, "MANIFEST").toString(), +stagingToken); +assertFiles(files.keySet(), stagingToken); + } + + @Test + public void putArtifactsMultipleFilesConcurrentSessionsTest() throws Exception { +String stagingSession1 = "123"; +String stagingSession2 = "abc"; +Map files = new HashMap<>(); +files.put("file5cb", (DATA_1KB / 2) /*500b*/); +files.put("file1kb", DATA_1KB /*1 kb*/); +files.put("file15cb", (DATA_1KB * 3) / 2 /*1.5 kb*/); +files.put("nested/file1kb", DATA_1KB /*1 kb*/); +files.put("file10kb", 10 * DATA_1KB /*10 kb*/); +files.put("file100kb", 100 * DATA_1KB /*100 kb*/); + +final String text = "abcdefghinklmop\n"; +files.forEach((fileName, size) -> { + Path filePath = Paths.get(srcDir.toString(), fileName).toAbsolutePath(); + try { +Files.createDirectories(filePath.getParent()); +Files.write(filePath, +Strings.repeat(text, Double.valueOf(Math.ceil(size * 1.0 / text.length())).intValue()) +.getBytes(CHARSET)); + } catch (IOException ignored) { + } +}); +String stagingSessionToken1 = BeamFileSystemArtifactStagingService +.generateStagingSessionToken(stagingSession1, destDir.toUri().getPath()); +String stagingSessionToken2 = BeamFileSystemArtifactStagingService +.generateStagingSessionToken(stagingSession2, destDir.toUri().getPath()); + +List metadata = new ArrayList<>(); +ExecutorService executorService = Executors.newFixedThreadPool(8); +try { + for (String fileName : files.keySet()) { +executorService.execute(() -> { + try { +putArtifact(stagingSessionToken1, Review comment: I was actually 
thinking of something much simpler, i.e., upload file 1a, upload file 2a, upload file 1b, then check; but this should have the same coverage.
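The simpler sequence suggested in the review comment above (upload 1a, upload 2a, upload 1b, then check) could look roughly like the following. This is only a sketch: `InterleavedStagingSketch`, `putArtifact`, `read`, and `stagedNames` are hypothetical in-memory stand-ins, not the Beam service or its test helpers, and reading "1a" as "file a in session 1" is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in for a staging service; not Beam's API. */
final class InterleavedStagingSketch {
  // session name -> (artifact name -> contents)
  private final Map<String, Map<String, String>> sessions = new ConcurrentHashMap<>();

  /** Stores one artifact under the given session. */
  void putArtifact(String session, String name, String contents) {
    sessions.computeIfAbsent(session, s -> new ConcurrentHashMap<>()).put(name, contents);
  }

  /** Returns the contents staged for (session, name), or null if nothing was staged. */
  String read(String session, String name) {
    return sessions.getOrDefault(session, Collections.emptyMap()).get(name);
  }

  /** Returns the artifact names staged for a session, sorted for stable comparison. */
  List<String> stagedNames(String session) {
    List<String> names =
        new ArrayList<>(sessions.getOrDefault(session, Collections.emptyMap()).keySet());
    Collections.sort(names);
    return names;
  }

  public static void main(String[] args) {
    InterleavedStagingSketch staging = new InterleavedStagingSketch();
    staging.putArtifact("1", "a", "one-a"); // upload file 1a
    staging.putArtifact("2", "a", "two-a"); // upload file 2a
    staging.putArtifact("1", "b", "one-b"); // upload file 1b

    // Check: each session sees only its own files, with its own contents.
    System.out.println(staging.stagedNames("1")); // [a, b]
    System.out.println(staging.stagedNames("2")); // [a]
    System.out.println(staging.read("2", "a"));   // two-a
  }
}
```

The interleaving is sequential on purpose: it exercises session isolation without depending on thread scheduling.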
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110878=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110878 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 11/Jun/18 22:40 Start Date: 11/Jun/18 22:40 Worklog Time Spent: 10m Work Description: angoenka commented on issue #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#issuecomment-396409541 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 110878) Time Spent: 14h 40m (was: 14.5h) > ArtifactStagingService that stages to a distributed filesystem > -- > > Key: BEAM-4290 > URL: https://issues.apache.org/jira/browse/BEAM-4290 > Project: Beam > Issue Type: Sub-task > Components: runner-core >Reporter: Eugene Kirpichov >Assignee: Ankur Goenka >Priority: Major > Time Spent: 14h 40m > Remaining Estimate: 0h > > Using the job's staging directory from PipelineOptions. > Physical layout on the distributed filesystem is TBD but it should allow for > arbitrary filenames and ideally for eventually avoiding uploading artifacts > that are already there. > Handling credentials is TBD. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110876=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110876 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 11/Jun/18 22:31 Start Date: 11/Jun/18 22:31 Worklog Time Spent: 10m Work Description: angoenka commented on issue #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#issuecomment-396407813 retest please Issue Time Tracking --- Worklog Id: (was: 110876) Time Spent: 14.5h (was: 14h 20m)
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110846=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110846 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 11/Jun/18 21:25 Start Date: 11/Jun/18 21:25 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194551115 ## File path: runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingServiceTest.java ## @@ -251,17 +258,129 @@ public void putArtifactsMultipleFilesTest() throws Exception { assertFiles(files.keySet(), stagingToken); } + @Test + public void putArtifactsMultipleFilesConcurrentlyTest() throws Exception { +String stagingSession = "123"; +Map files = new HashMap<>(); +files.put("file5cb", (DATA_1KB / 2) /*500b*/); +files.put("file1kb", DATA_1KB /*1 kb*/); +files.put("file15cb", (DATA_1KB * 3) / 2 /*1.5 kb*/); +files.put("nested/file1kb", DATA_1KB /*1 kb*/); +files.put("file10kb", 10 * DATA_1KB /*10 kb*/); +files.put("file100kb", 100 * DATA_1KB /*100 kb*/); + +final String text = "abcdefghinklmop\n"; +files.forEach((fileName, size) -> { + Path filePath = Paths.get(srcDir.toString(), fileName).toAbsolutePath(); + try { +Files.createDirectories(filePath.getParent()); +Files.write(filePath, +Strings.repeat(text, Double.valueOf(Math.ceil(size * 1.0 / text.length())).intValue()) +.getBytes(CHARSET)); + } catch (IOException ignored) { + } +}); +String stagingSessionToken = BeamFileSystemArtifactStagingService +.generateStagingSessionToken(stagingSession, destDir.toUri().getPath()); + +List metadata = new ArrayList<>(); +ExecutorService executorService = Executors.newFixedThreadPool(8); +try { + for (String fileName : files.keySet()) { +executorService.execute(() -> { + try { +putArtifact(stagingSessionToken, 
+Paths.get(srcDir.toString(), fileName).toAbsolutePath().toString(), fileName); + } catch (Exception e) { +Assert.fail(e.getMessage()); + } + metadata.add(ArtifactMetadata.newBuilder().setName(fileName).build()); +}); + } +} finally { + executorService.shutdown(); + executorService.awaitTermination(2, TimeUnit.SECONDS); +} + +String stagingToken = commitManifest(stagingSessionToken, metadata); +Assert.assertEquals( +Paths.get(destDir.toAbsolutePath().toString(), stagingSession, "MANIFEST").toString(), +stagingToken); +assertFiles(files.keySet(), stagingToken); + } + + @Test + public void putArtifactsMultipleFilesConcurrentSessionsTest() throws Exception { +String stagingSession1 = "123"; +String stagingSession2 = "abc"; +Map files = new HashMap<>(); +files.put("file5cb", (DATA_1KB / 2) /*500b*/); +files.put("file1kb", DATA_1KB /*1 kb*/); +files.put("file15cb", (DATA_1KB * 3) / 2 /*1.5 kb*/); +files.put("nested/file1kb", DATA_1KB /*1 kb*/); +files.put("file10kb", 10 * DATA_1KB /*10 kb*/); +files.put("file100kb", 100 * DATA_1KB /*100 kb*/); + +final String text = "abcdefghinklmop\n"; +files.forEach((fileName, size) -> { + Path filePath = Paths.get(srcDir.toString(), fileName).toAbsolutePath(); + try { +Files.createDirectories(filePath.getParent()); +Files.write(filePath, +Strings.repeat(text, Double.valueOf(Math.ceil(size * 1.0 / text.length())).intValue()) +.getBytes(CHARSET)); + } catch (IOException ignored) { + } +}); +String stagingSessionToken1 = BeamFileSystemArtifactStagingService +.generateStagingSessionToken(stagingSession1, destDir.toUri().getPath()); +String stagingSessionToken2 = BeamFileSystemArtifactStagingService +.generateStagingSessionToken(stagingSession2, destDir.toUri().getPath()); + +List metadata = new ArrayList<>(); +ExecutorService executorService = Executors.newFixedThreadPool(8); +try { + for (String fileName : files.keySet()) { +executorService.execute(() -> { + try { +putArtifact(stagingSessionToken1, Review comment: The API does not allow 
uploading multiple chunks of the same file in parallel. This test case simulates files being uploaded by two separate sessions in parallel. I will create two sets of files here, which should make the sessions completely independent.
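The two-session scenario described in the comment above, with a separate set of files per session, might be sketched as follows. Everything here is an illustrative assumption rather than the code in the pull request: `ConcurrentSessionsSketch`, `putArtifact`, `stagedNames`, and `stageInParallel` are hypothetical names, and the in-memory map merely stands in for the staging directory. The session names "123" and "abc" are borrowed from the quoted test.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch: two sessions stage disjoint file sets in parallel. Not Beam's API. */
final class ConcurrentSessionsSketch {
  // session name -> names of the artifacts staged under it (thread-safe set per session)
  private final Map<String, Set<String>> staged = new ConcurrentHashMap<>();

  void putArtifact(String session, String name) {
    staged.computeIfAbsent(session, s -> ConcurrentHashMap.newKeySet()).add(name);
  }

  /** Returns a sorted copy of the names staged under the given session. */
  Set<String> stagedNames(String session) {
    return new TreeSet<>(staged.getOrDefault(session, Collections.emptySet()));
  }

  /** Uploads both file sets from a shared pool and waits for every upload to finish. */
  static ConcurrentSessionsSketch stageInParallel(List<String> files1, List<String> files2) {
    ConcurrentSessionsSketch service = new ConcurrentSessionsSketch();
    ExecutorService pool = Executors.newFixedThreadPool(8);
    List<Future<?>> uploads = new ArrayList<>();
    try {
      for (String name : files1) {
        uploads.add(pool.submit(() -> service.putArtifact("123", name)));
      }
      for (String name : files2) {
        uploads.add(pool.submit(() -> service.putArtifact("abc", name)));
      }
      for (Future<?> upload : uploads) {
        upload.get(10, TimeUnit.SECONDS); // surfaces any worker failure on this thread
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while staging", e);
    } catch (ExecutionException | TimeoutException e) {
      throw new IllegalStateException("upload failed", e);
    } finally {
      pool.shutdown();
    }
    return service;
  }

  public static void main(String[] args) {
    ConcurrentSessionsSketch service = stageInParallel(
        Arrays.asList("file1kb", "nested/file1kb"), Arrays.asList("file10kb", "file100kb"));
    System.out.println(service.stagedNames("123"));
    System.out.println(service.stagedNames("abc"));
  }
}
```

Collecting the `Future`s and calling `get` is one way to make a failed upload visible to the calling thread, and the concurrent set avoids relying on a plain list being safe for concurrent `add` calls.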
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110843=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110843 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 11/Jun/18 21:15 Start Date: 11/Jun/18 21:15 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194550351 ## File path: runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingServiceTest.java ## @@ -251,17 +258,129 @@ public void putArtifactsMultipleFilesTest() throws Exception { assertFiles(files.keySet(), stagingToken); } + @Test + public void putArtifactsMultipleFilesConcurrentlyTest() throws Exception { +String stagingSession = "123"; +Map files = new HashMap<>(); +files.put("file5cb", (DATA_1KB / 2) /*500b*/); +files.put("file1kb", DATA_1KB /*1 kb*/); +files.put("file15cb", (DATA_1KB * 3) / 2 /*1.5 kb*/); +files.put("nested/file1kb", DATA_1KB /*1 kb*/); +files.put("file10kb", 10 * DATA_1KB /*10 kb*/); +files.put("file100kb", 100 * DATA_1KB /*100 kb*/); + +final String text = "abcdefghinklmop\n"; +files.forEach((fileName, size) -> { + Path filePath = Paths.get(srcDir.toString(), fileName).toAbsolutePath(); + try { +Files.createDirectories(filePath.getParent()); +Files.write(filePath, +Strings.repeat(text, Double.valueOf(Math.ceil(size * 1.0 / text.length())).intValue()) +.getBytes(CHARSET)); + } catch (IOException ignored) { + } +}); +String stagingSessionToken = BeamFileSystemArtifactStagingService +.generateStagingSessionToken(stagingSession, destDir.toUri().getPath()); + +List metadata = new ArrayList<>(); +ExecutorService executorService = Executors.newFixedThreadPool(8); +try { + for (String fileName : files.keySet()) { +executorService.execute(() -> { + try { +putArtifact(stagingSessionToken, 
+Paths.get(srcDir.toString(), fileName).toAbsolutePath().toString(), fileName); + } catch (Exception e) { +Assert.fail(e.getMessage()); + } + metadata.add(ArtifactMetadata.newBuilder().setName(fileName).build()); +}); + } +} finally { + executorService.shutdown(); + executorService.awaitTermination(2, TimeUnit.SECONDS); +} + +String stagingToken = commitManifest(stagingSessionToken, metadata); +Assert.assertEquals( +Paths.get(destDir.toAbsolutePath().toString(), stagingSession, "MANIFEST").toString(), +stagingToken); +assertFiles(files.keySet(), stagingToken); + } + + @Test + public void putArtifactsMultipleFilesConcurrentSessionsTest() throws Exception { +String stagingSession1 = "123"; +String stagingSession2 = "abc"; +Map files = new HashMap<>(); Review comment: ImmutableMap.of(k,v) is only applicable for 5 k-v while we are having more KVs. Using builders to create the immutable map. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 110843) Time Spent: 14h 10m (was: 14h) > ArtifactStagingService that stages to a distributed filesystem > -- > > Key: BEAM-4290 > URL: https://issues.apache.org/jira/browse/BEAM-4290 > Project: Beam > Issue Type: Sub-task > Components: runner-core >Reporter: Eugene Kirpichov >Assignee: Ankur Goenka >Priority: Major > Time Spent: 14h 10m > Remaining Estimate: 0h > > Using the job's staging directory from PipelineOptions. > Physical layout on the distributed filesystem is TBD but it should allow for > arbitrary filenames and ideally for eventually avoiding uploading artifacts > that are already there. > Handling credentials is TBD. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110836=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110836 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 11/Jun/18 21:04 Start Date: 11/Jun/18 21:04 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194535246 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java ## @@ -0,0 +1,292 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.fnexecution.artifact; + +import com.fasterxml.jackson.core.JsonProcessingException; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.common.hash.Hashing; +import com.google.protobuf.util.JsonFormat; +import io.grpc.stub.StreamObserver; +import java.io.FileNotFoundException; +import java.io.IOException; +import java.io.Serializable; +import java.nio.channels.WritableByteChannel; +import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; +import java.util.Collections; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceImplBase; +import org.apache.beam.runners.fnexecution.FnService; +import org.apache.beam.sdk.io.FileSystems; +import org.apache.beam.sdk.io.fs.MoveOptions.StandardMoveOptions; +import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions; +import org.apache.beam.sdk.io.fs.ResourceId; +import org.apache.beam.sdk.util.MimeTypes; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link ArtifactStagingServiceImplBase} based on beam file system. 
+ */ +public class BeamFileSystemArtifactStagingService extends ArtifactStagingServiceImplBase implements +FnService { + + private static final Logger LOG = + LoggerFactory.getLogger(BeamFileSystemArtifactStagingService.class); + private static final ObjectMapper MAPPER = new ObjectMapper(); + // Use UTF8 for all text encoding. + private static final Charset CHARSET = StandardCharsets.UTF_8; + public static final String MANIFEST = "MANIFEST"; + + @Override + public StreamObserver putArtifact( + StreamObserver responseObserver) { +return new PutArtifactStreamObserver(responseObserver); + } + + @Override + public void commitManifest( + CommitManifestRequest request, StreamObserver responseObserver) { +try { + ResourceId manifestResourceId = getManifestFileResourceId(request.getStagingSessionToken()); + ResourceId artifactDirResourceId = getArtifactDirResourceId(request.getStagingSessionToken()); + ProxyManifest.Builder proxyManifestBuilder = ProxyManifest.newBuilder() + .setManifest(request.getManifest()); + for (ArtifactMetadata artifactMetadata : request.getManifest().getArtifactList()) { +proxyManifestBuilder.addLocation(Location.newBuilder() +.setName(artifactMetadata.getName()) +.setUri(artifactDirResourceId +.resolve(encodedFileName(artifactMetadata), StandardResolveOptions.RESOLVE_FILE) +.toString()).build()); + } + try (WritableByteChannel manifestWritableByteChannel = FileSystems + .create(manifestResourceId, MimeTypes.TEXT)) { +
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110830=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110830 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 11/Jun/18 21:04 Start Date: 11/Jun/18 21:04 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194525812 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java ## @@ -0,0 +1,292 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110835=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110835 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 11/Jun/18 21:04 Start Date: 11/Jun/18 21:04 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194535202 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java ## @@ -0,0 +1,292 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110832=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110832 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 11/Jun/18 21:04 Start Date: 11/Jun/18 21:04 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194527254 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java ## @@ -0,0 +1,292 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110833=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110833 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 11/Jun/18 21:04 Start Date: 11/Jun/18 21:04 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194547405 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java ## @@ -0,0 +1,292 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.fnexecution.artifact; + +import com.fasterxml.jackson.core.JsonProcessingException; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.common.hash.Hashing; +import com.google.protobuf.util.JsonFormat; +import io.grpc.stub.StreamObserver; +import java.io.FileNotFoundException; +import java.io.IOException; +import java.io.Serializable; +import java.nio.channels.WritableByteChannel; +import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; +import java.util.Collections; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceImplBase; +import org.apache.beam.runners.fnexecution.FnService; +import org.apache.beam.sdk.io.FileSystems; +import org.apache.beam.sdk.io.fs.MoveOptions.StandardMoveOptions; +import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions; +import org.apache.beam.sdk.io.fs.ResourceId; +import org.apache.beam.sdk.util.MimeTypes; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link ArtifactStagingServiceImplBase} based on beam file system. Review comment: sure. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 110833)
Time Spent: 13h 40m (was: 13.5h)

> ArtifactStagingService that stages to a distributed filesystem
> --
>
> Key: BEAM-4290
> URL: https://issues.apache.org/jira/browse/BEAM-4290
> Project: Beam
> Issue Type: Sub-task
> Components: runner-core
> Reporter: Eugene Kirpichov
> Assignee: Ankur Goenka
> Priority: Major
> Time Spent: 13h 40m
> Remaining Estimate: 0h
>
> Using the job's staging directory from PipelineOptions.
> Physical layout on the distributed filesystem is TBD but it should allow for
> arbitrary filenames and ideally for eventually avoiding uploading artifacts
> that are already there.
> Handling credentials is TBD.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
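The issue description asks for a layout that allows arbitrary artifact filenames. One way to get there is to derive the stored file name from a hash of the artifact name, so that characters such as `/` never reach the filesystem. The following is only a minimal sketch under that assumption: `ArtifactNameEncoder` and `encode` are hypothetical names, SHA-256 via the JDK is an illustrative choice, and the PR's own `encodedFileName` (which is not shown in full here) may work differently.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not the implementation from the pull request.
final class ArtifactNameEncoder {
  private ArtifactNameEncoder() {}

  // Maps an arbitrary artifact name to a fixed-length, filesystem-safe hex string.
  static String encode(String artifactName) {
    try {
      MessageDigest digest = MessageDigest.getInstance("SHA-256");
      byte[] hash = digest.digest(artifactName.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder(hash.length * 2);
      for (byte b : hash) {
        // %02x renders each byte as two lowercase hex digits.
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 is a mandatory algorithm on every Java platform.
      throw new IllegalStateException("SHA-256 not available", e);
    }
  }
}
```

Hashing the name only addresses the arbitrary-filename requirement. Skipping uploads of artifacts that are already present would presumably need a key derived from the artifact content rather than its name.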
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110834=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110834 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 11/Jun/18 21:04 Start Date: 11/Jun/18 21:04 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194536154 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java ## @@ -0,0 +1,292 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.fnexecution.artifact; + +import com.fasterxml.jackson.core.JsonProcessingException; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.common.hash.Hashing; +import com.google.protobuf.util.JsonFormat; +import io.grpc.stub.StreamObserver; +import java.io.FileNotFoundException; +import java.io.IOException; +import java.io.Serializable; +import java.nio.channels.WritableByteChannel; +import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; +import java.util.Collections; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceImplBase; +import org.apache.beam.runners.fnexecution.FnService; +import org.apache.beam.sdk.io.FileSystems; +import org.apache.beam.sdk.io.fs.MoveOptions.StandardMoveOptions; +import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions; +import org.apache.beam.sdk.io.fs.ResourceId; +import org.apache.beam.sdk.util.MimeTypes; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link ArtifactStagingServiceImplBase} based on beam file system. 
+ */ +public class BeamFileSystemArtifactStagingService extends ArtifactStagingServiceImplBase implements +FnService { + + private static final Logger LOG = + LoggerFactory.getLogger(BeamFileSystemArtifactStagingService.class); + private static final ObjectMapper MAPPER = new ObjectMapper(); + // Use UTF8 for all text encoding. + private static final Charset CHARSET = StandardCharsets.UTF_8; + public static final String MANIFEST = "MANIFEST"; + + @Override + public StreamObserver putArtifact( + StreamObserver responseObserver) { +return new PutArtifactStreamObserver(responseObserver); + } + + @Override + public void commitManifest( + CommitManifestRequest request, StreamObserver responseObserver) { +try { + ResourceId manifestResourceId = getManifestFileResourceId(request.getStagingSessionToken()); + ResourceId artifactDirResourceId = getArtifactDirResourceId(request.getStagingSessionToken()); + ProxyManifest.Builder proxyManifestBuilder = ProxyManifest.newBuilder() + .setManifest(request.getManifest()); + for (ArtifactMetadata artifactMetadata : request.getManifest().getArtifactList()) { +proxyManifestBuilder.addLocation(Location.newBuilder() +.setName(artifactMetadata.getName()) +.setUri(artifactDirResourceId +.resolve(encodedFileName(artifactMetadata), StandardResolveOptions.RESOLVE_FILE) +.toString()).build()); + } + try (WritableByteChannel manifestWritableByteChannel = FileSystems + .create(manifestResourceId, MimeTypes.TEXT)) { +
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110837=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110837 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 11/Jun/18 21:04 Start Date: 11/Jun/18 21:04 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194534830 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java ## @@ -0,0 +1,292 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110829=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110829 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 11/Jun/18 21:04 Start Date: 11/Jun/18 21:04 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194526541 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java ## @@ -0,0 +1,292 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110831=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110831 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 11/Jun/18 21:04 Start Date: 11/Jun/18 21:04 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194531422 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java ## @@ -0,0 +1,292 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110828=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110828 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 11/Jun/18 21:03 Start Date: 11/Jun/18 21:03 Worklog Time Spent: 10m Work Description: robertwb commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194545548 ## File path: runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingServiceTest.java ## @@ -251,17 +258,129 @@ public void putArtifactsMultipleFilesTest() throws Exception { assertFiles(files.keySet(), stagingToken); } + @Test + public void putArtifactsMultipleFilesConcurrentlyTest() throws Exception { +String stagingSession = "123"; +Map files = new HashMap<>(); +files.put("file5cb", (DATA_1KB / 2) /*500b*/); +files.put("file1kb", DATA_1KB /*1 kb*/); +files.put("file15cb", (DATA_1KB * 3) / 2 /*1.5 kb*/); +files.put("nested/file1kb", DATA_1KB /*1 kb*/); +files.put("file10kb", 10 * DATA_1KB /*10 kb*/); +files.put("file100kb", 100 * DATA_1KB /*100 kb*/); + +final String text = "abcdefghinklmop\n"; +files.forEach((fileName, size) -> { + Path filePath = Paths.get(srcDir.toString(), fileName).toAbsolutePath(); + try { +Files.createDirectories(filePath.getParent()); +Files.write(filePath, +Strings.repeat(text, Double.valueOf(Math.ceil(size * 1.0 / text.length())).intValue()) +.getBytes(CHARSET)); + } catch (IOException ignored) { + } +}); +String stagingSessionToken = BeamFileSystemArtifactStagingService +.generateStagingSessionToken(stagingSession, destDir.toUri().getPath()); + +List metadata = new ArrayList<>(); +ExecutorService executorService = Executors.newFixedThreadPool(8); +try { + for (String fileName : files.keySet()) { +executorService.execute(() -> { + try { +putArtifact(stagingSessionToken, 
+Paths.get(srcDir.toString(), fileName).toAbsolutePath().toString(), fileName); + } catch (Exception e) { +Assert.fail(e.getMessage()); + } + metadata.add(ArtifactMetadata.newBuilder().setName(fileName).build()); +}); + } +} finally { + executorService.shutdown(); + executorService.awaitTermination(2, TimeUnit.SECONDS); +} + +String stagingToken = commitManifest(stagingSessionToken, metadata); +Assert.assertEquals( +Paths.get(destDir.toAbsolutePath().toString(), stagingSession, "MANIFEST").toString(), +stagingToken); +assertFiles(files.keySet(), stagingToken); + } + + @Test + public void putArtifactsMultipleFilesConcurrentSessionsTest() throws Exception { +String stagingSession1 = "123"; +String stagingSession2 = "abc"; +Map files = new HashMap<>(); +files.put("file5cb", (DATA_1KB / 2) /*500b*/); +files.put("file1kb", DATA_1KB /*1 kb*/); +files.put("file15cb", (DATA_1KB * 3) / 2 /*1.5 kb*/); +files.put("nested/file1kb", DATA_1KB /*1 kb*/); +files.put("file10kb", 10 * DATA_1KB /*10 kb*/); +files.put("file100kb", 100 * DATA_1KB /*100 kb*/); + +final String text = "abcdefghinklmop\n"; +files.forEach((fileName, size) -> { + Path filePath = Paths.get(srcDir.toString(), fileName).toAbsolutePath(); + try { +Files.createDirectories(filePath.getParent()); +Files.write(filePath, +Strings.repeat(text, Double.valueOf(Math.ceil(size * 1.0 / text.length())).intValue()) +.getBytes(CHARSET)); + } catch (IOException ignored) { + } +}); +String stagingSessionToken1 = BeamFileSystemArtifactStagingService +.generateStagingSessionToken(stagingSession1, destDir.toUri().getPath()); +String stagingSessionToken2 = BeamFileSystemArtifactStagingService +.generateStagingSessionToken(stagingSession2, destDir.toUri().getPath()); + +List metadata = new ArrayList<>(); +ExecutorService executorService = Executors.newFixedThreadPool(8); +try { + for (String fileName : files.keySet()) { +executorService.execute(() -> { + try { +putArtifact(stagingSessionToken1, Review comment: There should be at 
least *some* difference in what is being placed to actually verify there is no cross-staging interference. Granted, there's also no need to place a huge number of multi-chunk files here; a single file for one and two for the other would be perfectly fine.
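The suggestion in this review comment (one file for one session, two for the other, with differing content) could be sketched roughly as follows. This is a simplified model under stated assumptions, not the test from the pull request: `SessionIsolationSketch`, `stage`, and `stagedNames` are hypothetical names, and plain `java.nio.file` writes into per-session directories stand in for the gRPC `putArtifact` calls.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Stream;

// Simplified stand-in for the staging service: every session writes below its own directory.
final class SessionIsolationSketch {
  private SessionIsolationSketch() {}

  // Hypothetical replacement for putArtifact(...): writes one file under root/session/name.
  static void stage(Path root, String session, String name, String content) throws IOException {
    Path target = root.resolve(session).resolve(name);
    Files.createDirectories(target.getParent());
    Files.write(target, content.getBytes(StandardCharsets.UTF_8));
  }

  // Lists the relative names of all regular files staged for one session.
  static Set<String> stagedNames(Path root, String session) throws IOException {
    Path sessionDir = root.resolve(session);
    Set<String> names = new TreeSet<>();
    try (Stream<Path> paths = Files.walk(sessionDir)) {
      paths.filter(Files::isRegularFile)
          .forEach(p -> names.add(sessionDir.relativize(p).toString().replace('\\', '/')));
    }
    return names;
  }

  // Stages one file in session "123" and two in session "abc", then checks for interference.
  static boolean sessionsAreIsolated() {
    try {
      Path root = Files.createTempDirectory("staging-sketch");
      stage(root, "123", "file1kb", "session-one");
      stage(root, "abc", "file1kb", "session-two");
      stage(root, "abc", "nested/file1kb", "session-two-nested");

      boolean namesMatch =
          stagedNames(root, "123").equals(new TreeSet<>(Arrays.asList("file1kb")))
              && stagedNames(root, "abc")
                  .equals(new TreeSet<>(Arrays.asList("file1kb", "nested/file1kb")));
      // The same artifact name must keep session-specific content.
      String first = new String(
          Files.readAllBytes(root.resolve("123").resolve("file1kb")), StandardCharsets.UTF_8);
      String second = new String(
          Files.readAllBytes(root.resolve("abc").resolve("file1kb")), StandardCharsets.UTF_8);
      return namesMatch && first.equals("session-one") && second.equals("session-two");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Because the two sessions stage different file sets and different bytes under the same name, a mix-up between sessions would show up either in the name check or in the content check.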
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110825=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110825 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 11/Jun/18 21:02 Start Date: 11/Jun/18 21:02 Worklog Time Spent: 10m Work Description: robertwb commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194546744 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java ## @@ -0,0 +1,292 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110827=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110827 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 11/Jun/18 21:02 Start Date: 11/Jun/18 21:02 Worklog Time Spent: 10m Work Description: robertwb commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194544535 ## File path: runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingServiceTest.java ## @@ -251,17 +258,129 @@ public void putArtifactsMultipleFilesTest() throws Exception { assertFiles(files.keySet(), stagingToken); } + @Test + public void putArtifactsMultipleFilesConcurrentlyTest() throws Exception { +String stagingSession = "123"; +Map files = new HashMap<>(); +files.put("file5cb", (DATA_1KB / 2) /*500b*/); +files.put("file1kb", DATA_1KB /*1 kb*/); +files.put("file15cb", (DATA_1KB * 3) / 2 /*1.5 kb*/); +files.put("nested/file1kb", DATA_1KB /*1 kb*/); +files.put("file10kb", 10 * DATA_1KB /*10 kb*/); +files.put("file100kb", 100 * DATA_1KB /*100 kb*/); + +final String text = "abcdefghinklmop\n"; +files.forEach((fileName, size) -> { + Path filePath = Paths.get(srcDir.toString(), fileName).toAbsolutePath(); + try { +Files.createDirectories(filePath.getParent()); +Files.write(filePath, +Strings.repeat(text, Double.valueOf(Math.ceil(size * 1.0 / text.length())).intValue()) +.getBytes(CHARSET)); + } catch (IOException ignored) { + } +}); +String stagingSessionToken = BeamFileSystemArtifactStagingService +.generateStagingSessionToken(stagingSession, destDir.toUri().getPath()); + +List metadata = new ArrayList<>(); +ExecutorService executorService = Executors.newFixedThreadPool(8); +try { + for (String fileName : files.keySet()) { +executorService.execute(() -> { + try { +putArtifact(stagingSessionToken, 
+Paths.get(srcDir.toString(), fileName).toAbsolutePath().toString(), fileName); + } catch (Exception e) { +Assert.fail(e.getMessage()); + } + metadata.add(ArtifactMetadata.newBuilder().setName(fileName).build()); +}); + } +} finally { + executorService.shutdown(); + executorService.awaitTermination(2, TimeUnit.SECONDS); +} + +String stagingToken = commitManifest(stagingSessionToken, metadata); +Assert.assertEquals( +Paths.get(destDir.toAbsolutePath().toString(), stagingSession, "MANIFEST").toString(), +stagingToken); +assertFiles(files.keySet(), stagingToken); + } + + @Test + public void putArtifactsMultipleFilesConcurrentSessionsTest() throws Exception { +String stagingSession1 = "123"; +String stagingSession2 = "abc"; +Map files = new HashMap<>(); Review comment: Nit (here and above): I prefer using ImmutableMaps for constants like this. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 110827) Time Spent: 12h 50m (was: 12h 40m) > ArtifactStagingService that stages to a distributed filesystem > -- > > Key: BEAM-4290 > URL: https://issues.apache.org/jira/browse/BEAM-4290 > Project: Beam > Issue Type: Sub-task > Components: runner-core >Reporter: Eugene Kirpichov >Assignee: Ankur Goenka >Priority: Major > Time Spent: 12h 50m > Remaining Estimate: 0h > > Using the job's staging directory from PipelineOptions. > Physical layout on the distributed filesystem is TBD but it should allow for > arbitrary filenames and ideally for eventually avoiding uploading artifacts > that are already there. > Handling credentials is TBD. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
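For reference, the ImmutableMap nit above could look roughly like the following. This is a minimal sketch, not code from the PR: the class name StagingTestFiles is hypothetical, DATA_1KB is assumed to be an int byte count (its definition is not shown in the quoted diff, so 1024 is a guess), and java.util.Map.of stands in for Guava's ImmutableMap so that the snippet has no dependencies. With Guava on the classpath, ImmutableMap.of(...) or an ImmutableMap builder would play the same role.

```java
import java.util.Map;

public class StagingTestFiles {
  // Assumed value; the real DATA_1KB constant is not shown in the quoted diff.
  public static final int DATA_1KB = 1024;

  // Immutable file name -> size (bytes) map that the tests can share as a constant.
  // Map.of (Java 9+) is a dependency-free stand-in for Guava's ImmutableMap.
  public static final Map<String, Integer> FILES = Map.of(
      "file5cb", DATA_1KB / 2,
      "file1kb", DATA_1KB,
      "file15cb", (DATA_1KB * 3) / 2,
      "nested/file1kb", DATA_1KB,
      "file10kb", 10 * DATA_1KB,
      "file100kb", 100 * DATA_1KB);

  public static void main(String[] args) {
    System.out.println(FILES.size() + " files defined");
    try {
      FILES.put("extra", 1); // any mutation attempt is rejected
    } catch (UnsupportedOperationException e) {
      System.out.println("map is immutable");
    }
  }
}
```

Declaring the map once as a constant also removes the repeated files.put(...) blocks that each test currently carries.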
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110826=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110826 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 11/Jun/18 21:02 Start Date: 11/Jun/18 21:02 Worklog Time Spent: 10m Work Description: robertwb commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194545548 ## File path: runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingServiceTest.java ## @@ -251,17 +258,129 @@ public void putArtifactsMultipleFilesTest() throws Exception { assertFiles(files.keySet(), stagingToken); } + @Test + public void putArtifactsMultipleFilesConcurrentlyTest() throws Exception { +String stagingSession = "123"; +Map files = new HashMap<>(); +files.put("file5cb", (DATA_1KB / 2) /*500b*/); +files.put("file1kb", DATA_1KB /*1 kb*/); +files.put("file15cb", (DATA_1KB * 3) / 2 /*1.5 kb*/); +files.put("nested/file1kb", DATA_1KB /*1 kb*/); +files.put("file10kb", 10 * DATA_1KB /*10 kb*/); +files.put("file100kb", 100 * DATA_1KB /*100 kb*/); + +final String text = "abcdefghinklmop\n"; +files.forEach((fileName, size) -> { + Path filePath = Paths.get(srcDir.toString(), fileName).toAbsolutePath(); + try { +Files.createDirectories(filePath.getParent()); +Files.write(filePath, +Strings.repeat(text, Double.valueOf(Math.ceil(size * 1.0 / text.length())).intValue()) +.getBytes(CHARSET)); + } catch (IOException ignored) { + } +}); +String stagingSessionToken = BeamFileSystemArtifactStagingService +.generateStagingSessionToken(stagingSession, destDir.toUri().getPath()); + +List metadata = new ArrayList<>(); +ExecutorService executorService = Executors.newFixedThreadPool(8); +try { + for (String fileName : files.keySet()) { +executorService.execute(() -> { + try { +putArtifact(stagingSessionToken, 
+Paths.get(srcDir.toString(), fileName).toAbsolutePath().toString(), fileName); + } catch (Exception e) { +Assert.fail(e.getMessage()); + } + metadata.add(ArtifactMetadata.newBuilder().setName(fileName).build()); +}); + } +} finally { + executorService.shutdown(); + executorService.awaitTermination(2, TimeUnit.SECONDS); +} + +String stagingToken = commitManifest(stagingSessionToken, metadata); +Assert.assertEquals( +Paths.get(destDir.toAbsolutePath().toString(), stagingSession, "MANIFEST").toString(), +stagingToken); +assertFiles(files.keySet(), stagingToken); + } + + @Test + public void putArtifactsMultipleFilesConcurrentSessionsTest() throws Exception { +String stagingSession1 = "123"; +String stagingSession2 = "abc"; +Map files = new HashMap<>(); +files.put("file5cb", (DATA_1KB / 2) /*500b*/); +files.put("file1kb", DATA_1KB /*1 kb*/); +files.put("file15cb", (DATA_1KB * 3) / 2 /*1.5 kb*/); +files.put("nested/file1kb", DATA_1KB /*1 kb*/); +files.put("file10kb", 10 * DATA_1KB /*10 kb*/); +files.put("file100kb", 100 * DATA_1KB /*100 kb*/); + +final String text = "abcdefghinklmop\n"; +files.forEach((fileName, size) -> { + Path filePath = Paths.get(srcDir.toString(), fileName).toAbsolutePath(); + try { +Files.createDirectories(filePath.getParent()); +Files.write(filePath, +Strings.repeat(text, Double.valueOf(Math.ceil(size * 1.0 / text.length())).intValue()) +.getBytes(CHARSET)); + } catch (IOException ignored) { + } +}); +String stagingSessionToken1 = BeamFileSystemArtifactStagingService +.generateStagingSessionToken(stagingSession1, destDir.toUri().getPath()); +String stagingSessionToken2 = BeamFileSystemArtifactStagingService +.generateStagingSessionToken(stagingSession2, destDir.toUri().getPath()); + +List metadata = new ArrayList<>(); +ExecutorService executorService = Executors.newFixedThreadPool(8); +try { + for (String fileName : files.keySet()) { +executorService.execute(() -> { + try { +putArtifact(stagingSessionToken1, Review comment: There should be at 
least *some* difference in what is being placed to actually verify there is no cross-staging interference. Granted, there's also no need to place a huge number of files here, a single file for one and two for the other would be perfectly fine.
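One way to act on this comment is sketched below, under loud assumptions: it does not call the Beam service at all. The stage helper is a hypothetical stand-in for putArtifact that simply writes a file under destDir/session/, and the class and method names are made up for illustration. What it shows is the shape of the check: one file for the first session, two for the second, staged concurrently, then a per-session listing that must match exactly.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ConcurrentSessionsSketch {
  // Hypothetical stand-in for putArtifact: writes one file under destDir/session/.
  static void stage(Path destDir, String session, String name) {
    try {
      Path target = destDir.resolve(session).resolve(name);
      Files.createDirectories(target.getParent());
      Files.write(target, (session + ":" + name).getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Names of all regular files under destDir/session, relative to that directory.
  static Set<String> staged(Path destDir, String session) throws IOException {
    Path root = destDir.resolve(session);
    try (Stream<Path> paths = Files.walk(root)) {
      return paths.filter(Files::isRegularFile)
          .map(p -> root.relativize(p).toString().replace('\\', '/'))
          .collect(Collectors.toCollection(TreeSet::new));
    }
  }

  // Stages a different file set per session on a shared thread pool.
  public static Map<String, Set<String>> run(Path destDir) throws Exception {
    Map<String, Set<String>> plan = Map.of(
        "123", Set.of("file1kb"),
        "abc", Set.of("file10kb", "nested/file1kb"));
    ExecutorService pool = Executors.newFixedThreadPool(8);
    try {
      List<Future<?>> pending = new ArrayList<>();
      plan.forEach((session, names) ->
          names.forEach(name -> pending.add(pool.submit(() -> stage(destDir, session, name)))));
      for (Future<?> f : pending) {
        f.get(); // rethrows any worker failure in the calling thread
      }
    } finally {
      pool.shutdown();
    }
    return Map.of("123", staged(destDir, "123"), "abc", staged(destDir, "abc"));
  }

  public static void main(String[] args) throws Exception {
    System.out.println(run(Files.createTempDirectory("staging")));
  }
}
```

Because the two sessions stage different names, a file leaking from one session directory into the other would show up as a mismatch in the per-session sets. Waiting on each Future is a design choice of this sketch: it makes a failed upload fail the caller instead of being lost on a worker thread.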
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110442=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110442 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 09/Jun/18 20:32 Start Date: 09/Jun/18 20:32 Worklog Time Spent: 10m Work Description: jkff commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194238250 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java ## @@ -0,0 +1,292 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.fnexecution.artifact; + +import com.fasterxml.jackson.core.JsonProcessingException; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.common.hash.Hashing; +import com.google.protobuf.util.JsonFormat; +import io.grpc.stub.StreamObserver; +import java.io.FileNotFoundException; +import java.io.IOException; +import java.io.Serializable; +import java.nio.channels.WritableByteChannel; +import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; +import java.util.Collections; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceImplBase; +import org.apache.beam.runners.fnexecution.FnService; +import org.apache.beam.sdk.io.FileSystems; +import org.apache.beam.sdk.io.fs.MoveOptions.StandardMoveOptions; +import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions; +import org.apache.beam.sdk.io.fs.ResourceId; +import org.apache.beam.sdk.util.MimeTypes; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link ArtifactStagingServiceImplBase} based on beam file system. 
+ */ +public class BeamFileSystemArtifactStagingService extends ArtifactStagingServiceImplBase implements +FnService { + + private static final Logger LOG = + LoggerFactory.getLogger(BeamFileSystemArtifactStagingService.class); + private static final ObjectMapper MAPPER = new ObjectMapper(); + // Use UTF8 for all text encoding. + private static final Charset CHARSET = StandardCharsets.UTF_8; + public static final String MANIFEST = "MANIFEST"; + + @Override + public StreamObserver<PutArtifactRequest> putArtifact( + StreamObserver<PutArtifactResponse> responseObserver) { +return new PutArtifactStreamObserver(responseObserver); + } + + @Override + public void commitManifest( + CommitManifestRequest request, StreamObserver<CommitManifestResponse> responseObserver) { +try { + ResourceId manifestResourceId = getManifestFileResourceId(request.getStagingSessionToken()); + ResourceId artifactDirResourceId = getArtifactDirResourceId(request.getStagingSessionToken()); + ProxyManifest.Builder proxyManifestBuilder = ProxyManifest.newBuilder() + .setManifest(request.getManifest()); + for (ArtifactMetadata artifactMetadata : request.getManifest().getArtifactList()) { +proxyManifestBuilder.addLocation(Location.newBuilder() +.setName(artifactMetadata.getName()) +.setUri(artifactDirResourceId +.resolve(encodedFileName(artifactMetadata), StandardResolveOptions.RESOLVE_FILE) +.toString()).build()); + } + try (WritableByteChannel manifestWritableByteChannel = FileSystems + .create(manifestResourceId, MimeTypes.TEXT)) { +
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110434=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110434 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 09/Jun/18 20:32 Start Date: 09/Jun/18 20:32 Worklog Time Spent: 10m Work Description: jkff commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194238156 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java ## @@ -0,0 +1,292 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.fnexecution.artifact; + +import com.fasterxml.jackson.core.JsonProcessingException; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.common.hash.Hashing; +import com.google.protobuf.util.JsonFormat; +import io.grpc.stub.StreamObserver; +import java.io.FileNotFoundException; +import java.io.IOException; +import java.io.Serializable; +import java.nio.channels.WritableByteChannel; +import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; +import java.util.Collections; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceImplBase; +import org.apache.beam.runners.fnexecution.FnService; +import org.apache.beam.sdk.io.FileSystems; +import org.apache.beam.sdk.io.fs.MoveOptions.StandardMoveOptions; +import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions; +import org.apache.beam.sdk.io.fs.ResourceId; +import org.apache.beam.sdk.util.MimeTypes; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link ArtifactStagingServiceImplBase} based on beam file system. 
+ */ +public class BeamFileSystemArtifactStagingService extends ArtifactStagingServiceImplBase implements +FnService { + + private static final Logger LOG = + LoggerFactory.getLogger(BeamFileSystemArtifactStagingService.class); + private static final ObjectMapper MAPPER = new ObjectMapper(); + // Use UTF8 for all text encoding. + private static final Charset CHARSET = StandardCharsets.UTF_8; + public static final String MANIFEST = "MANIFEST"; + + @Override + public StreamObserver<PutArtifactRequest> putArtifact( + StreamObserver<PutArtifactResponse> responseObserver) { +return new PutArtifactStreamObserver(responseObserver); + } + + @Override + public void commitManifest( + CommitManifestRequest request, StreamObserver<CommitManifestResponse> responseObserver) { +try { + ResourceId manifestResourceId = getManifestFileResourceId(request.getStagingSessionToken()); + ResourceId artifactDirResourceId = getArtifactDirResourceId(request.getStagingSessionToken()); + ProxyManifest.Builder proxyManifestBuilder = ProxyManifest.newBuilder() + .setManifest(request.getManifest()); + for (ArtifactMetadata artifactMetadata : request.getManifest().getArtifactList()) { +proxyManifestBuilder.addLocation(Location.newBuilder() +.setName(artifactMetadata.getName()) +.setUri(artifactDirResourceId +.resolve(encodedFileName(artifactMetadata), StandardResolveOptions.RESOLVE_FILE) +.toString()).build()); + } + try (WritableByteChannel manifestWritableByteChannel = FileSystems + .create(manifestResourceId, MimeTypes.TEXT)) { +
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110436=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110436 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 09/Jun/18 20:32 Start Date: 09/Jun/18 20:32 Worklog Time Spent: 10m Work Description: jkff commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194238135 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java ## @@ -0,0 +1,292 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.fnexecution.artifact; + +import com.fasterxml.jackson.core.JsonProcessingException; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.common.hash.Hashing; +import com.google.protobuf.util.JsonFormat; +import io.grpc.stub.StreamObserver; +import java.io.FileNotFoundException; +import java.io.IOException; +import java.io.Serializable; +import java.nio.channels.WritableByteChannel; +import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; +import java.util.Collections; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceImplBase; +import org.apache.beam.runners.fnexecution.FnService; +import org.apache.beam.sdk.io.FileSystems; +import org.apache.beam.sdk.io.fs.MoveOptions.StandardMoveOptions; +import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions; +import org.apache.beam.sdk.io.fs.ResourceId; +import org.apache.beam.sdk.util.MimeTypes; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link ArtifactStagingServiceImplBase} based on beam file system. 
+ */ +public class BeamFileSystemArtifactStagingService extends ArtifactStagingServiceImplBase implements +FnService { + + private static final Logger LOG = + LoggerFactory.getLogger(BeamFileSystemArtifactStagingService.class); + private static final ObjectMapper MAPPER = new ObjectMapper(); + // Use UTF8 for all text encoding. + private static final Charset CHARSET = StandardCharsets.UTF_8; + public static final String MANIFEST = "MANIFEST"; + + @Override + public StreamObserver<PutArtifactRequest> putArtifact( + StreamObserver<PutArtifactResponse> responseObserver) { +return new PutArtifactStreamObserver(responseObserver); + } + + @Override + public void commitManifest( + CommitManifestRequest request, StreamObserver<CommitManifestResponse> responseObserver) { +try { + ResourceId manifestResourceId = getManifestFileResourceId(request.getStagingSessionToken()); + ResourceId artifactDirResourceId = getArtifactDirResourceId(request.getStagingSessionToken()); + ProxyManifest.Builder proxyManifestBuilder = ProxyManifest.newBuilder() + .setManifest(request.getManifest()); + for (ArtifactMetadata artifactMetadata : request.getManifest().getArtifactList()) { +proxyManifestBuilder.addLocation(Location.newBuilder() +.setName(artifactMetadata.getName()) +.setUri(artifactDirResourceId +.resolve(encodedFileName(artifactMetadata), StandardResolveOptions.RESOLVE_FILE) +.toString()).build()); + } + try (WritableByteChannel manifestWritableByteChannel = FileSystems + .create(manifestResourceId, MimeTypes.TEXT)) { +
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110437=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110437 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 09/Jun/18 20:32 Start Date: 09/Jun/18 20:32 Worklog Time Spent: 10m Work Description: jkff commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194238107 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java ## @@ -0,0 +1,292 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.fnexecution.artifact; + +import com.fasterxml.jackson.core.JsonProcessingException; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.common.hash.Hashing; +import com.google.protobuf.util.JsonFormat; +import io.grpc.stub.StreamObserver; +import java.io.FileNotFoundException; +import java.io.IOException; +import java.io.Serializable; +import java.nio.channels.WritableByteChannel; +import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; +import java.util.Collections; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceImplBase; +import org.apache.beam.runners.fnexecution.FnService; +import org.apache.beam.sdk.io.FileSystems; +import org.apache.beam.sdk.io.fs.MoveOptions.StandardMoveOptions; +import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions; +import org.apache.beam.sdk.io.fs.ResourceId; +import org.apache.beam.sdk.util.MimeTypes; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link ArtifactStagingServiceImplBase} based on beam file system. 
+ */ +public class BeamFileSystemArtifactStagingService extends ArtifactStagingServiceImplBase implements +FnService { + + private static final Logger LOG = + LoggerFactory.getLogger(BeamFileSystemArtifactStagingService.class); + private static final ObjectMapper MAPPER = new ObjectMapper(); + // Use UTF8 for all text encoding. + private static final Charset CHARSET = StandardCharsets.UTF_8; + public static final String MANIFEST = "MANIFEST"; + + @Override + public StreamObserver<PutArtifactRequest> putArtifact( + StreamObserver<PutArtifactResponse> responseObserver) { +return new PutArtifactStreamObserver(responseObserver); + } + + @Override + public void commitManifest( + CommitManifestRequest request, StreamObserver<CommitManifestResponse> responseObserver) { +try { + ResourceId manifestResourceId = getManifestFileResourceId(request.getStagingSessionToken()); + ResourceId artifactDirResourceId = getArtifactDirResourceId(request.getStagingSessionToken()); + ProxyManifest.Builder proxyManifestBuilder = ProxyManifest.newBuilder() Review comment: Oh nice, didn't know we already had such a proto. CC: @axelmagn This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 110437) Time Spent: 11h 40m (was: 11.5h) > ArtifactStagingService that stages to a distributed filesystem >
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110440=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110440 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 09/Jun/18 20:32 Start Date: 09/Jun/18 20:32 Worklog Time Spent: 10m Work Description: jkff commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194238203 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java ## @@ -0,0 +1,292 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.fnexecution.artifact; + +import com.fasterxml.jackson.core.JsonProcessingException; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.common.hash.Hashing; +import com.google.protobuf.util.JsonFormat; +import io.grpc.stub.StreamObserver; +import java.io.FileNotFoundException; +import java.io.IOException; +import java.io.Serializable; +import java.nio.channels.WritableByteChannel; +import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; +import java.util.Collections; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceImplBase; +import org.apache.beam.runners.fnexecution.FnService; +import org.apache.beam.sdk.io.FileSystems; +import org.apache.beam.sdk.io.fs.MoveOptions.StandardMoveOptions; +import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions; +import org.apache.beam.sdk.io.fs.ResourceId; +import org.apache.beam.sdk.util.MimeTypes; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link ArtifactStagingServiceImplBase} based on beam file system. Review comment: Please document more about how this works - how it stores artifacts, manifests etc. This is an automated message from the Apache Git Service. 
To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 110440) > ArtifactStagingService that stages to a distributed filesystem > -- > > Key: BEAM-4290 > URL: https://issues.apache.org/jira/browse/BEAM-4290 > Project: Beam > Issue Type: Sub-task > Components: runner-core >Reporter: Eugene Kirpichov >Assignee: Ankur Goenka >Priority: Major > Time Spent: 12h > Remaining Estimate: 0h > > Using the job's staging directory from PipelineOptions. > Physical layout on the distributed filesystem is TBD but it should allow for > arbitrary filenames and ideally for eventually avoiding uploading artifacts > that are already there. > Handling credentials is TBD. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
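As a starting point for the documentation requested above, the storage layout can be sketched as follows. Only one part is taken from the quoted diff: the tests assert that the manifest ends up at <staging dir>/<session>/MANIFEST. Everything else is an assumption made for illustration: the "artifacts" directory name, the use of hex SHA-256 as the file-name encoding (the diff only shows that an encodedFileName helper and Guava's Hashing exist), and the class name StagingLayoutSketch.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class StagingLayoutSketch {
  public static final String MANIFEST = "MANIFEST";
  // Hypothetical directory name; the real one is not visible in the quoted diff.
  public static final String ARTIFACTS = "artifacts";

  // <stagingDir>/<session>/MANIFEST, matching the path asserted in the tests.
  public static Path manifestPath(String stagingDir, String session) {
    return Paths.get(stagingDir, session, MANIFEST);
  }

  // Hypothetical encoding: hex SHA-256 of the artifact name, so that a name such as
  // "nested/file1kb" maps to a single flat file name.
  public static String encodedFileName(String artifactName) {
    try {
      byte[] digest = MessageDigest.getInstance("SHA-256")
          .digest(artifactName.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder();
      for (byte b : digest) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e); // SHA-256 is required on every Java platform
    }
  }

  // <stagingDir>/<session>/<ARTIFACTS>/<encoded name> (layout assumed, see above).
  public static Path artifactPath(String stagingDir, String session, String artifactName) {
    return Paths.get(stagingDir, session, ARTIFACTS, encodedFileName(artifactName));
  }

  public static void main(String[] args) {
    System.out.println(manifestPath("/tmp/staging", "123"));
    System.out.println(artifactPath("/tmp/staging", "123", "nested/file1kb"));
  }
}
```

Encoding the artifact name rather than using it directly is what would let the layout accept arbitrary file names, which the issue description lists as a requirement.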
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110439=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110439 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 09/Jun/18 20:32 Start Date: 09/Jun/18 20:32 Worklog Time Spent: 10m Work Description: jkff commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194238189 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java ## @@ -0,0 +1,292 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.fnexecution.artifact; + +import com.fasterxml.jackson.core.JsonProcessingException; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.common.hash.Hashing; +import com.google.protobuf.util.JsonFormat; +import io.grpc.stub.StreamObserver; +import java.io.FileNotFoundException; +import java.io.IOException; +import java.io.Serializable; +import java.nio.channels.WritableByteChannel; +import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; +import java.util.Collections; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceImplBase; +import org.apache.beam.runners.fnexecution.FnService; +import org.apache.beam.sdk.io.FileSystems; +import org.apache.beam.sdk.io.fs.MoveOptions.StandardMoveOptions; +import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions; +import org.apache.beam.sdk.io.fs.ResourceId; +import org.apache.beam.sdk.util.MimeTypes; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link ArtifactStagingServiceImplBase} based on beam file system. 
+ */ +public class BeamFileSystemArtifactStagingService extends ArtifactStagingServiceImplBase implements +FnService { + + private static final Logger LOG = + LoggerFactory.getLogger(BeamFileSystemArtifactStagingService.class); + private static final ObjectMapper MAPPER = new ObjectMapper(); + // Use UTF8 for all text encoding. + private static final Charset CHARSET = StandardCharsets.UTF_8; + public static final String MANIFEST = "MANIFEST"; + + @Override + public StreamObserver putArtifact( + StreamObserver responseObserver) { +return new PutArtifactStreamObserver(responseObserver); + } + + @Override + public void commitManifest( + CommitManifestRequest request, StreamObserver responseObserver) { +try { + ResourceId manifestResourceId = getManifestFileResourceId(request.getStagingSessionToken()); + ResourceId artifactDirResourceId = getArtifactDirResourceId(request.getStagingSessionToken()); + ProxyManifest.Builder proxyManifestBuilder = ProxyManifest.newBuilder() + .setManifest(request.getManifest()); + for (ArtifactMetadata artifactMetadata : request.getManifest().getArtifactList()) { +proxyManifestBuilder.addLocation(Location.newBuilder() +.setName(artifactMetadata.getName()) +.setUri(artifactDirResourceId +.resolve(encodedFileName(artifactMetadata), StandardResolveOptions.RESOLVE_FILE) +.toString()).build()); + } + try (WritableByteChannel manifestWritableByteChannel = FileSystems + .create(manifestResourceId, MimeTypes.TEXT)) { +
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110443=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110443 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 09/Jun/18 20:32 Start Date: 09/Jun/18 20:32 Worklog Time Spent: 10m Work Description: jkff commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194238229 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java ## @@ -0,0 +1,292 @@
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110435=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110435 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 09/Jun/18 20:32 Start Date: 09/Jun/18 20:32 Worklog Time Spent: 10m Work Description: jkff commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194238157 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java ## @@ -0,0 +1,292 @@
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110438=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110438 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 09/Jun/18 20:32 Start Date: 09/Jun/18 20:32 Worklog Time Spent: 10m Work Description: jkff commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194238271 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java ## @@ -0,0 +1,292 @@
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110441=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110441 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 09/Jun/18 20:32 Start Date: 09/Jun/18 20:32 Worklog Time Spent: 10m Work Description: jkff commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194238285 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java ## @@ -0,0 +1,292 @@
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110369=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110369 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 09/Jun/18 02:03 Start Date: 09/Jun/18 02:03 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194207605 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java ## @@ -0,0 +1,286 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.fnexecution.artifact; + +import com.fasterxml.jackson.core.JsonProcessingException; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.common.hash.Hashing; +import com.google.protobuf.util.JsonFormat; +import io.grpc.stub.StreamObserver; +import java.io.FileNotFoundException; +import java.io.IOException; +import java.io.Serializable; +import java.nio.channels.WritableByteChannel; +import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; +import java.util.Collections; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Builder; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceImplBase; +import org.apache.beam.runners.fnexecution.FnService; +import org.apache.beam.sdk.io.FileSystems; +import org.apache.beam.sdk.io.fs.MoveOptions.StandardMoveOptions; +import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions; +import org.apache.beam.sdk.io.fs.ResourceId; +import org.apache.beam.sdk.util.MimeTypes; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link ArtifactStagingServiceImplBase} based on beam file system. 
+ */ +public class BeamFileSystemArtifactStagingService extends ArtifactStagingServiceImplBase implements +FnService { + + private static final Logger LOG = + LoggerFactory.getLogger(BeamFileSystemArtifactStagingService.class); + private static final ObjectMapper MAPPER = new ObjectMapper(); + // Use UTF8 for all text encoding. + private static final Charset CHARSET = StandardCharsets.UTF_8; + public static final String MANIFEST = "MANIFEST"; + + @Override + public StreamObserver putArtifact( + StreamObserver responseObserver) { +return new PutArtifactStreamObserver(responseObserver); + } + + @Override + public void commitManifest( + CommitManifestRequest request, StreamObserver responseObserver) { +try { + ResourceId jobResourceDirId = getJobDirResourceId(request.getStagingSessionToken()); + ResourceId manifestResourceId = jobResourceDirId + .resolve(MANIFEST, StandardResolveOptions.RESOLVE_FILE); + ResourceId artifactDirResourceId = getArtifactDirResourceId(request.getStagingSessionToken()); + Builder proxyManifestBuilder = ProxyManifest.newBuilder().setManifest(request.getManifest()); + for (ArtifactMetadata artifactMetadata : request.getManifest().getArtifactList()) { + proxyManifestBuilder.addLocation(Location.newBuilder().setName(artifactMetadata.getName()) +.setUri(artifactDirResourceId +.resolve(encodedFileName(artifactMetadata), StandardResolveOptions.RESOLVE_FILE) +
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110363=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110363 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 09/Jun/18 02:03 Start Date: 09/Jun/18 02:03 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194207729 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java ## @@ -0,0 +1,286 @@
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110362=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110362 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 09/Jun/18 02:03 Start Date: 09/Jun/18 02:03 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194207275 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java ## @@ -0,0 +1,286 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.fnexecution.artifact; + +import com.fasterxml.jackson.core.JsonProcessingException; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.common.hash.Hashing; +import com.google.protobuf.util.JsonFormat; +import io.grpc.stub.StreamObserver; +import java.io.FileNotFoundException; +import java.io.IOException; +import java.io.Serializable; +import java.nio.channels.WritableByteChannel; +import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; +import java.util.Collections; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Builder; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceImplBase; +import org.apache.beam.runners.fnexecution.FnService; +import org.apache.beam.sdk.io.FileSystems; +import org.apache.beam.sdk.io.fs.MoveOptions.StandardMoveOptions; +import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions; +import org.apache.beam.sdk.io.fs.ResourceId; +import org.apache.beam.sdk.util.MimeTypes; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link ArtifactStagingServiceImplBase} based on beam file system. 
+ */ +public class BeamFileSystemArtifactStagingService extends ArtifactStagingServiceImplBase implements +FnService { + + private static final Logger LOG = + LoggerFactory.getLogger(BeamFileSystemArtifactStagingService.class); + private static final ObjectMapper MAPPER = new ObjectMapper(); + // Use UTF8 for all text encoding. + private static final Charset CHARSET = StandardCharsets.UTF_8; + public static final String MANIFEST = "MANIFEST"; + + @Override + public StreamObserver<PutArtifactRequest> putArtifact( + StreamObserver<PutArtifactResponse> responseObserver) { +return new PutArtifactStreamObserver(responseObserver); + } + + @Override + public void commitManifest( + CommitManifestRequest request, StreamObserver<CommitManifestResponse> responseObserver) { +try { + ResourceId jobResourceDirId = getJobDirResourceId(request.getStagingSessionToken()); + ResourceId manifestResourceId = jobResourceDirId + .resolve(MANIFEST, StandardResolveOptions.RESOLVE_FILE); + ResourceId artifactDirResourceId = getArtifactDirResourceId(request.getStagingSessionToken()); + Builder proxyManifestBuilder = ProxyManifest.newBuilder().setManifest(request.getManifest()); + for (ArtifactMetadata artifactMetadata : request.getManifest().getArtifactList()) { + proxyManifestBuilder.addLocation(Location.newBuilder().setName(artifactMetadata.getName()) Review comment: Location is a proto, so I will keep it as it is.
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110365=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110365 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 09/Jun/18 02:03 Start Date: 09/Jun/18 02:03 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194207894 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java ## @@ -0,0 +1,286 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.fnexecution.artifact; + +import com.fasterxml.jackson.core.JsonProcessingException; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.common.hash.Hashing; +import com.google.protobuf.util.JsonFormat; +import io.grpc.stub.StreamObserver; +import java.io.FileNotFoundException; +import java.io.IOException; +import java.io.Serializable; +import java.nio.channels.WritableByteChannel; +import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; +import java.util.Collections; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Builder; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceImplBase; +import org.apache.beam.runners.fnexecution.FnService; +import org.apache.beam.sdk.io.FileSystems; +import org.apache.beam.sdk.io.fs.MoveOptions.StandardMoveOptions; +import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions; +import org.apache.beam.sdk.io.fs.ResourceId; +import org.apache.beam.sdk.util.MimeTypes; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link ArtifactStagingServiceImplBase} based on beam file system. 
+ */ +public class BeamFileSystemArtifactStagingService extends ArtifactStagingServiceImplBase implements +FnService { + + private static final Logger LOG = + LoggerFactory.getLogger(BeamFileSystemArtifactStagingService.class); + private static final ObjectMapper MAPPER = new ObjectMapper(); + // Use UTF8 for all text encoding. + private static final Charset CHARSET = StandardCharsets.UTF_8; + public static final String MANIFEST = "MANIFEST"; + + @Override + public StreamObserver<PutArtifactRequest> putArtifact( + StreamObserver<PutArtifactResponse> responseObserver) { +return new PutArtifactStreamObserver(responseObserver); + } + + @Override + public void commitManifest( + CommitManifestRequest request, StreamObserver<CommitManifestResponse> responseObserver) { +try { + ResourceId jobResourceDirId = getJobDirResourceId(request.getStagingSessionToken()); + ResourceId manifestResourceId = jobResourceDirId + .resolve(MANIFEST, StandardResolveOptions.RESOLVE_FILE); + ResourceId artifactDirResourceId = getArtifactDirResourceId(request.getStagingSessionToken()); + Builder proxyManifestBuilder = ProxyManifest.newBuilder().setManifest(request.getManifest()); + for (ArtifactMetadata artifactMetadata : request.getManifest().getArtifactList()) { + proxyManifestBuilder.addLocation(Location.newBuilder().setName(artifactMetadata.getName()) +.setUri(artifactDirResourceId +.resolve(encodedFileName(artifactMetadata), StandardResolveOptions.RESOLVE_FILE) +
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110366=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110366 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 09/Jun/18 02:03 Start Date: 09/Jun/18 02:03 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194206311 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java ## @@ -0,0 +1,286 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.fnexecution.artifact; + +import com.fasterxml.jackson.core.JsonProcessingException; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.common.hash.Hashing; +import com.google.protobuf.util.JsonFormat; +import io.grpc.stub.StreamObserver; +import java.io.FileNotFoundException; +import java.io.IOException; +import java.io.Serializable; +import java.nio.channels.WritableByteChannel; +import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; +import java.util.Collections; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Builder; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceImplBase; +import org.apache.beam.runners.fnexecution.FnService; +import org.apache.beam.sdk.io.FileSystems; +import org.apache.beam.sdk.io.fs.MoveOptions.StandardMoveOptions; +import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions; +import org.apache.beam.sdk.io.fs.ResourceId; +import org.apache.beam.sdk.util.MimeTypes; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link ArtifactStagingServiceImplBase} based on beam file system. 
+ */ +public class BeamFileSystemArtifactStagingService extends ArtifactStagingServiceImplBase implements +FnService { + + private static final Logger LOG = + LoggerFactory.getLogger(BeamFileSystemArtifactStagingService.class); + private static final ObjectMapper MAPPER = new ObjectMapper(); + // Use UTF8 for all text encoding. + private static final Charset CHARSET = StandardCharsets.UTF_8; + public static final String MANIFEST = "MANIFEST"; + + @Override + public StreamObserver<PutArtifactRequest> putArtifact( + StreamObserver<PutArtifactResponse> responseObserver) { +return new PutArtifactStreamObserver(responseObserver); + } + + @Override + public void commitManifest( + CommitManifestRequest request, StreamObserver<CommitManifestResponse> responseObserver) { +try { + ResourceId jobResourceDirId = getJobDirResourceId(request.getStagingSessionToken()); + ResourceId manifestResourceId = jobResourceDirId + .resolve(MANIFEST, StandardResolveOptions.RESOLVE_FILE); + ResourceId artifactDirResourceId = getArtifactDirResourceId(request.getStagingSessionToken()); Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 110366) > ArtifactStagingService that stages to a
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110368=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110368 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 09/Jun/18 02:03 Start Date: 09/Jun/18 02:03 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194209128 ## File path: runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingServiceTest.java ## @@ -0,0 +1,273 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.fnexecution.artifact; + + +import com.google.common.base.Joiner; +import com.google.common.base.Strings; +import com.google.protobuf.ByteString; +import com.google.protobuf.util.JsonFormat; +import io.grpc.inprocess.InProcessChannelBuilder; +import io.grpc.stub.StreamObserver; +import java.io.FileInputStream; +import java.io.IOException; +import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; +import java.nio.file.FileVisitResult; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.nio.file.SimpleFileVisitor; +import java.nio.file.attribute.BasicFileAttributes; +import java.util.ArrayList; +import java.util.Collections; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Set; +import java.util.concurrent.CompletableFuture; +import java.util.concurrent.TimeUnit; +import java.util.stream.Collectors; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactChunk; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.Manifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Builder; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceStub; 
+import org.apache.beam.runners.fnexecution.GrpcFnServer; +import org.apache.beam.runners.fnexecution.InProcessServerFactory; +import org.junit.After; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** + * Tests for {@link BeamFileSystemArtifactStagingService}. + */ +@RunWith(JUnit4.class) +public class BeamFileSystemArtifactStagingServiceTest { + + public static final Joiner JOINER = Joiner.on(""); + public static final Charset CHARSET = StandardCharsets.UTF_8; + private GrpcFnServer<BeamFileSystemArtifactStagingService> server; + private BeamFileSystemArtifactStagingService artifactStagingService; + private ArtifactStagingServiceStub stub; + private Path srcDir; + private Path destDir; + + @Before + public void setUp() throws Exception { +artifactStagingService = new BeamFileSystemArtifactStagingService(); +server = GrpcFnServer +.allocatePortAndCreateFor(artifactStagingService, InProcessServerFactory.create()); +stub = +ArtifactStagingServiceGrpc.newStub( + InProcessChannelBuilder.forName(server.getApiServiceDescriptor().getUrl()).build()); + +srcDir = Files.createTempDirectory("BFSTemp"); +destDir = Files.createTempDirectory("BFDTemp"); + + } + + @After + public void tearDown() throws Exception { +if (server != null) { +
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110361=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110361 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 09/Jun/18 02:03 Start Date: 09/Jun/18 02:03 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194208734 ## File path: runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingServiceTest.java ## @@ -0,0 +1,273 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.fnexecution.artifact; + + +import com.google.common.base.Joiner; +import com.google.common.base.Strings; +import com.google.protobuf.ByteString; +import com.google.protobuf.util.JsonFormat; +import io.grpc.inprocess.InProcessChannelBuilder; +import io.grpc.stub.StreamObserver; +import java.io.FileInputStream; +import java.io.IOException; +import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; +import java.nio.file.FileVisitResult; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.nio.file.SimpleFileVisitor; +import java.nio.file.attribute.BasicFileAttributes; +import java.util.ArrayList; +import java.util.Collections; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Set; +import java.util.concurrent.CompletableFuture; +import java.util.concurrent.TimeUnit; +import java.util.stream.Collectors; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactChunk; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.Manifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Builder; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceStub; 
+import org.apache.beam.runners.fnexecution.GrpcFnServer; +import org.apache.beam.runners.fnexecution.InProcessServerFactory; +import org.junit.After; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** + * Tests for {@link BeamFileSystemArtifactStagingService}. + */ +@RunWith(JUnit4.class) +public class BeamFileSystemArtifactStagingServiceTest { + + public static final Joiner JOINER = Joiner.on(""); + public static final Charset CHARSET = StandardCharsets.UTF_8; + private GrpcFnServer<BeamFileSystemArtifactStagingService> server; + private BeamFileSystemArtifactStagingService artifactStagingService; + private ArtifactStagingServiceStub stub; + private Path srcDir; + private Path destDir; + + @Before + public void setUp() throws Exception { +artifactStagingService = new BeamFileSystemArtifactStagingService(); +server = GrpcFnServer +.allocatePortAndCreateFor(artifactStagingService, InProcessServerFactory.create()); +stub = +ArtifactStagingServiceGrpc.newStub( + InProcessChannelBuilder.forName(server.getApiServiceDescriptor().getUrl()).build()); + +srcDir = Files.createTempDirectory("BFSTemp"); +destDir = Files.createTempDirectory("BFDTemp"); + + } + + @After + public void tearDown() throws Exception { +if (server != null) { +
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110364=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110364 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 09/Jun/18 02:03 Start Date: 09/Jun/18 02:03 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194207039 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java ## @@ -0,0 +1,286 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.fnexecution.artifact; + +import com.fasterxml.jackson.core.JsonProcessingException; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.common.hash.Hashing; +import com.google.protobuf.util.JsonFormat; +import io.grpc.stub.StreamObserver; +import java.io.FileNotFoundException; +import java.io.IOException; +import java.io.Serializable; +import java.nio.channels.WritableByteChannel; +import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; +import java.util.Collections; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Builder; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceImplBase; +import org.apache.beam.runners.fnexecution.FnService; +import org.apache.beam.sdk.io.FileSystems; +import org.apache.beam.sdk.io.fs.MoveOptions.StandardMoveOptions; +import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions; +import org.apache.beam.sdk.io.fs.ResourceId; +import org.apache.beam.sdk.util.MimeTypes; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link ArtifactStagingServiceImplBase} based on beam file system. 
+ */ +public class BeamFileSystemArtifactStagingService extends ArtifactStagingServiceImplBase implements +FnService { + + private static final Logger LOG = + LoggerFactory.getLogger(BeamFileSystemArtifactStagingService.class); + private static final ObjectMapper MAPPER = new ObjectMapper(); + // Use UTF8 for all text encoding. + private static final Charset CHARSET = StandardCharsets.UTF_8; + public static final String MANIFEST = "MANIFEST"; + + @Override + public StreamObserver<PutArtifactRequest> putArtifact( + StreamObserver<PutArtifactResponse> responseObserver) { +return new PutArtifactStreamObserver(responseObserver); + } + + @Override + public void commitManifest( + CommitManifestRequest request, StreamObserver<CommitManifestResponse> responseObserver) { +try { + ResourceId jobResourceDirId = getJobDirResourceId(request.getStagingSessionToken()); + ResourceId manifestResourceId = jobResourceDirId + .resolve(MANIFEST, StandardResolveOptions.RESOLVE_FILE); + ResourceId artifactDirResourceId = getArtifactDirResourceId(request.getStagingSessionToken()); + Builder proxyManifestBuilder = ProxyManifest.newBuilder().setManifest(request.getManifest()); Review comment: Makes sense.
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110370=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110370 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 09/Jun/18 02:03 Start Date: 09/Jun/18 02:03 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194208961 ## File path: runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingServiceTest.java ## @@ -0,0 +1,273 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.fnexecution.artifact; + + +import com.google.common.base.Joiner; +import com.google.common.base.Strings; +import com.google.protobuf.ByteString; +import com.google.protobuf.util.JsonFormat; +import io.grpc.inprocess.InProcessChannelBuilder; +import io.grpc.stub.StreamObserver; +import java.io.FileInputStream; +import java.io.IOException; +import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; +import java.nio.file.FileVisitResult; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.nio.file.SimpleFileVisitor; +import java.nio.file.attribute.BasicFileAttributes; +import java.util.ArrayList; +import java.util.Collections; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Set; +import java.util.concurrent.CompletableFuture; +import java.util.concurrent.TimeUnit; +import java.util.stream.Collectors; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactChunk; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.Manifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Builder; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceStub; 
+import org.apache.beam.runners.fnexecution.GrpcFnServer; +import org.apache.beam.runners.fnexecution.InProcessServerFactory; +import org.junit.After; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** + * Tests for {@link BeamFileSystemArtifactStagingService}. + */ +@RunWith(JUnit4.class) +public class BeamFileSystemArtifactStagingServiceTest { + + public static final Joiner JOINER = Joiner.on(""); + public static final Charset CHARSET = StandardCharsets.UTF_8; + private GrpcFnServer<BeamFileSystemArtifactStagingService> server; + private BeamFileSystemArtifactStagingService artifactStagingService; + private ArtifactStagingServiceStub stub; + private Path srcDir; + private Path destDir; + + @Before + public void setUp() throws Exception { +artifactStagingService = new BeamFileSystemArtifactStagingService(); +server = GrpcFnServer +.allocatePortAndCreateFor(artifactStagingService, InProcessServerFactory.create()); +stub = +ArtifactStagingServiceGrpc.newStub( + InProcessChannelBuilder.forName(server.getApiServiceDescriptor().getUrl()).build()); + +srcDir = Files.createTempDirectory("BFSTemp"); +destDir = Files.createTempDirectory("BFDTemp"); + + } + + @After + public void tearDown() throws Exception { +if (server != null) { +
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110367=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110367 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 09/Jun/18 02:03 Start Date: 09/Jun/18 02:03 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194208717 ## File path: runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingServiceTest.java ## @@ -0,0 +1,273 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.fnexecution.artifact; + + +import com.google.common.base.Joiner; +import com.google.common.base.Strings; +import com.google.protobuf.ByteString; +import com.google.protobuf.util.JsonFormat; +import io.grpc.inprocess.InProcessChannelBuilder; +import io.grpc.stub.StreamObserver; +import java.io.FileInputStream; +import java.io.IOException; +import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; +import java.nio.file.FileVisitResult; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.nio.file.SimpleFileVisitor; +import java.nio.file.attribute.BasicFileAttributes; +import java.util.ArrayList; +import java.util.Collections; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Set; +import java.util.concurrent.CompletableFuture; +import java.util.concurrent.TimeUnit; +import java.util.stream.Collectors; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactChunk; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.Manifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Builder; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceStub; 
+import org.apache.beam.runners.fnexecution.GrpcFnServer; +import org.apache.beam.runners.fnexecution.InProcessServerFactory; +import org.junit.After; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** + * Tests for {@link BeamFileSystemArtifactStagingService}. + */ +@RunWith(JUnit4.class) +public class BeamFileSystemArtifactStagingServiceTest { + + public static final Joiner JOINER = Joiner.on(""); + public static final Charset CHARSET = StandardCharsets.UTF_8; + private GrpcFnServer server; + private BeamFileSystemArtifactStagingService artifactStagingService; + private ArtifactStagingServiceStub stub; + private Path srcDir; + private Path destDir; + + @Before + public void setUp() throws Exception { +artifactStagingService = new BeamFileSystemArtifactStagingService(); +server = GrpcFnServer +.allocatePortAndCreateFor(artifactStagingService, InProcessServerFactory.create()); +stub = +ArtifactStagingServiceGrpc.newStub( + InProcessChannelBuilder.forName(server.getApiServiceDescriptor().getUrl()).build()); + +srcDir = Files.createTempDirectory("BFSTemp"); +destDir = Files.createTempDirectory("BFDTemp"); + + } + + @After + public void tearDown() throws Exception { +if (server != null) { +
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110343=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110343 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 09/Jun/18 00:09 Start Date: 09/Jun/18 00:09 Worklog Time Spent: 10m Work Description: robertwb commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194203941 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java ## @@ -0,0 +1,286 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.fnexecution.artifact; + +import com.fasterxml.jackson.core.JsonProcessingException; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.common.hash.Hashing; +import com.google.protobuf.util.JsonFormat; +import io.grpc.stub.StreamObserver; +import java.io.FileNotFoundException; +import java.io.IOException; +import java.io.Serializable; +import java.nio.channels.WritableByteChannel; +import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; +import java.util.Collections; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Builder; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceImplBase; +import org.apache.beam.runners.fnexecution.FnService; +import org.apache.beam.sdk.io.FileSystems; +import org.apache.beam.sdk.io.fs.MoveOptions.StandardMoveOptions; +import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions; +import org.apache.beam.sdk.io.fs.ResourceId; +import org.apache.beam.sdk.util.MimeTypes; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link ArtifactStagingServiceImplBase} based on beam file system. 
+ */ +public class BeamFileSystemArtifactStagingService extends ArtifactStagingServiceImplBase implements +FnService { + + private static final Logger LOG = + LoggerFactory.getLogger(BeamFileSystemArtifactStagingService.class); + private static final ObjectMapper MAPPER = new ObjectMapper(); + // Use UTF8 for all text encoding. + private static final Charset CHARSET = StandardCharsets.UTF_8; + public static final String MANIFEST = "MANIFEST"; + + @Override + public StreamObserver<PutArtifactRequest> putArtifact( + StreamObserver<PutArtifactResponse> responseObserver) { +return new PutArtifactStreamObserver(responseObserver); + } + + @Override + public void commitManifest( + CommitManifestRequest request, StreamObserver<CommitManifestResponse> responseObserver) { +try { + ResourceId jobResourceDirId = getJobDirResourceId(request.getStagingSessionToken()); + ResourceId manifestResourceId = jobResourceDirId + .resolve(MANIFEST, StandardResolveOptions.RESOLVE_FILE); + ResourceId artifactDirResourceId = getArtifactDirResourceId(request.getStagingSessionToken()); Review comment: Nit: for symmetry, maybe also encapsulate the above into a getManifestFileResourceId? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking ---
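The refactoring suggested in the review comment above might look roughly like the sketch below. It is an illustration only, not the code from the pull request: it substitutes java.nio.file.Path for Beam's ResourceId so that it runs without Beam on the classpath, the class name StagingPaths and the "artifacts" directory name are hypothetical, and treating the staging session token as the job directory is an assumption. Only the helper names getJobDirResourceId, getArtifactDirResourceId and getManifestFileResourceId come from the quoted diff and the comment.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical stand-in: java.nio.file.Path is used where the PR uses Beam's ResourceId. */
public class StagingPaths {
  static final String MANIFEST = "MANIFEST";
  // Assumed directory name for illustration; not taken from the PR.
  static final String ARTIFACTS = "artifacts";

  // Assumption: the staging session token is used directly as the job's base directory.
  static Path getJobDirResourceId(String stagingSessionToken) {
    return Paths.get(stagingSessionToken);
  }

  // Existing-style helper: resolves the artifact directory under the job directory.
  static Path getArtifactDirResourceId(String stagingSessionToken) {
    return getJobDirResourceId(stagingSessionToken).resolve(ARTIFACTS);
  }

  // The helper proposed in the review: resolves the manifest file the same way,
  // so callers no longer build the manifest path inline.
  static Path getManifestFileResourceId(String stagingSessionToken) {
    return getJobDirResourceId(stagingSessionToken).resolve(MANIFEST);
  }

  public static void main(String[] args) {
    String token = "job-1";
    System.out.println(getManifestFileResourceId(token));
    System.out.println(getArtifactDirResourceId(token));
  }
}
```

With both helpers in place, commitManifest would derive the manifest and artifact locations from the session token through two parallel calls, which is the symmetry the comment asks for.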
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110349=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110349 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 09/Jun/18 00:09 Start Date: 09/Jun/18 00:09 Worklog Time Spent: 10m Work Description: robertwb commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194205918 ## File path: runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingServiceTest.java ## @@ -0,0 +1,273 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.fnexecution.artifact; + + +import com.google.common.base.Joiner; +import com.google.common.base.Strings; +import com.google.protobuf.ByteString; +import com.google.protobuf.util.JsonFormat; +import io.grpc.inprocess.InProcessChannelBuilder; +import io.grpc.stub.StreamObserver; +import java.io.FileInputStream; +import java.io.IOException; +import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; +import java.nio.file.FileVisitResult; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.nio.file.SimpleFileVisitor; +import java.nio.file.attribute.BasicFileAttributes; +import java.util.ArrayList; +import java.util.Collections; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Set; +import java.util.concurrent.CompletableFuture; +import java.util.concurrent.TimeUnit; +import java.util.stream.Collectors; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactChunk; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.Manifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Builder; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceStub; 
+import org.apache.beam.runners.fnexecution.GrpcFnServer; +import org.apache.beam.runners.fnexecution.InProcessServerFactory; +import org.junit.After; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** + * Tests for {@link BeamFileSystemArtifactStagingService}. + */ +@RunWith(JUnit4.class) +public class BeamFileSystemArtifactStagingServiceTest { + + public static final Joiner JOINER = Joiner.on(""); + public static final Charset CHARSET = StandardCharsets.UTF_8; + private GrpcFnServer server; + private BeamFileSystemArtifactStagingService artifactStagingService; + private ArtifactStagingServiceStub stub; + private Path srcDir; + private Path destDir; + + @Before + public void setUp() throws Exception { +artifactStagingService = new BeamFileSystemArtifactStagingService(); +server = GrpcFnServer +.allocatePortAndCreateFor(artifactStagingService, InProcessServerFactory.create()); +stub = +ArtifactStagingServiceGrpc.newStub( + InProcessChannelBuilder.forName(server.getApiServiceDescriptor().getUrl()).build()); + +srcDir = Files.createTempDirectory("BFSTemp"); +destDir = Files.createTempDirectory("BFDTemp"); + + } + + @After + public void tearDown() throws Exception { +if (server != null) { +
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110350=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110350 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 09/Jun/18 00:09 Start Date: 09/Jun/18 00:09 Worklog Time Spent: 10m Work Description: robertwb commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194205879 ## File path: runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingServiceTest.java ## @@ -0,0 +1,273 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.fnexecution.artifact; + + +import com.google.common.base.Joiner; +import com.google.common.base.Strings; +import com.google.protobuf.ByteString; +import com.google.protobuf.util.JsonFormat; +import io.grpc.inprocess.InProcessChannelBuilder; +import io.grpc.stub.StreamObserver; +import java.io.FileInputStream; +import java.io.IOException; +import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; +import java.nio.file.FileVisitResult; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.nio.file.SimpleFileVisitor; +import java.nio.file.attribute.BasicFileAttributes; +import java.util.ArrayList; +import java.util.Collections; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Set; +import java.util.concurrent.CompletableFuture; +import java.util.concurrent.TimeUnit; +import java.util.stream.Collectors; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactChunk; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.Manifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Builder; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceStub; 
+import org.apache.beam.runners.fnexecution.GrpcFnServer; +import org.apache.beam.runners.fnexecution.InProcessServerFactory; +import org.junit.After; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** + * Tests for {@link BeamFileSystemArtifactStagingService}. + */ +@RunWith(JUnit4.class) +public class BeamFileSystemArtifactStagingServiceTest { + + public static final Joiner JOINER = Joiner.on(""); + public static final Charset CHARSET = StandardCharsets.UTF_8; + private GrpcFnServer server; + private BeamFileSystemArtifactStagingService artifactStagingService; + private ArtifactStagingServiceStub stub; + private Path srcDir; + private Path destDir; + + @Before + public void setUp() throws Exception { +artifactStagingService = new BeamFileSystemArtifactStagingService(); +server = GrpcFnServer +.allocatePortAndCreateFor(artifactStagingService, InProcessServerFactory.create()); +stub = +ArtifactStagingServiceGrpc.newStub( + InProcessChannelBuilder.forName(server.getApiServiceDescriptor().getUrl()).build()); + +srcDir = Files.createTempDirectory("BFSTemp"); +destDir = Files.createTempDirectory("BFDTemp"); + + } + + @After + public void tearDown() throws Exception { +if (server != null) { +
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110345=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110345 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 09/Jun/18 00:09 Start Date: 09/Jun/18 00:09 Worklog Time Spent: 10m Work Description: robertwb commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194204766 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java ## @@ -0,0 +1,286 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.fnexecution.artifact; + +import com.fasterxml.jackson.core.JsonProcessingException; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.common.hash.Hashing; +import com.google.protobuf.util.JsonFormat; +import io.grpc.stub.StreamObserver; +import java.io.FileNotFoundException; +import java.io.IOException; +import java.io.Serializable; +import java.nio.channels.WritableByteChannel; +import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; +import java.util.Collections; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Builder; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceImplBase; +import org.apache.beam.runners.fnexecution.FnService; +import org.apache.beam.sdk.io.FileSystems; +import org.apache.beam.sdk.io.fs.MoveOptions.StandardMoveOptions; +import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions; +import org.apache.beam.sdk.io.fs.ResourceId; +import org.apache.beam.sdk.util.MimeTypes; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link ArtifactStagingServiceImplBase} based on beam file system. 
+ */ +public class BeamFileSystemArtifactStagingService extends ArtifactStagingServiceImplBase implements +FnService { + + private static final Logger LOG = + LoggerFactory.getLogger(BeamFileSystemArtifactStagingService.class); + private static final ObjectMapper MAPPER = new ObjectMapper(); + // Use UTF8 for all text encoding. + private static final Charset CHARSET = StandardCharsets.UTF_8; + public static final String MANIFEST = "MANIFEST"; + + @Override + public StreamObserver<PutArtifactRequest> putArtifact( + StreamObserver<PutArtifactResponse> responseObserver) { +return new PutArtifactStreamObserver(responseObserver); + } + + @Override + public void commitManifest( + CommitManifestRequest request, StreamObserver<CommitManifestResponse> responseObserver) { +try { + ResourceId jobResourceDirId = getJobDirResourceId(request.getStagingSessionToken()); + ResourceId manifestResourceId = jobResourceDirId + .resolve(MANIFEST, StandardResolveOptions.RESOLVE_FILE); + ResourceId artifactDirResourceId = getArtifactDirResourceId(request.getStagingSessionToken()); + Builder proxyManifestBuilder = ProxyManifest.newBuilder().setManifest(request.getManifest()); + for (ArtifactMetadata artifactMetadata : request.getManifest().getArtifactList()) { + proxyManifestBuilder.addLocation(Location.newBuilder().setName(artifactMetadata.getName()) +.setUri(artifactDirResourceId +.resolve(encodedFileName(artifactMetadata), StandardResolveOptions.RESOLVE_FILE) +
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110348=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110348 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 09/Jun/18 00:09 Start Date: 09/Jun/18 00:09 Worklog Time Spent: 10m Work Description: robertwb commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194204644 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java ## @@ -0,0 +1,286 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.fnexecution.artifact; + +import com.fasterxml.jackson.core.JsonProcessingException; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.common.hash.Hashing; +import com.google.protobuf.util.JsonFormat; +import io.grpc.stub.StreamObserver; +import java.io.FileNotFoundException; +import java.io.IOException; +import java.io.Serializable; +import java.nio.channels.WritableByteChannel; +import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; +import java.util.Collections; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Builder; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceImplBase; +import org.apache.beam.runners.fnexecution.FnService; +import org.apache.beam.sdk.io.FileSystems; +import org.apache.beam.sdk.io.fs.MoveOptions.StandardMoveOptions; +import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions; +import org.apache.beam.sdk.io.fs.ResourceId; +import org.apache.beam.sdk.util.MimeTypes; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link ArtifactStagingServiceImplBase} based on beam file system. 
+ */ +public class BeamFileSystemArtifactStagingService extends ArtifactStagingServiceImplBase implements +FnService { + + private static final Logger LOG = + LoggerFactory.getLogger(BeamFileSystemArtifactStagingService.class); + private static final ObjectMapper MAPPER = new ObjectMapper(); + // Use UTF8 for all text encoding. + private static final Charset CHARSET = StandardCharsets.UTF_8; + public static final String MANIFEST = "MANIFEST"; + + @Override + public StreamObserver<PutArtifactRequest> putArtifact( + StreamObserver<PutArtifactResponse> responseObserver) { +return new PutArtifactStreamObserver(responseObserver); + } + + @Override + public void commitManifest( + CommitManifestRequest request, StreamObserver<CommitManifestResponse> responseObserver) { +try { + ResourceId jobResourceDirId = getJobDirResourceId(request.getStagingSessionToken()); + ResourceId manifestResourceId = jobResourceDirId + .resolve(MANIFEST, StandardResolveOptions.RESOLVE_FILE); + ResourceId artifactDirResourceId = getArtifactDirResourceId(request.getStagingSessionToken()); + Builder proxyManifestBuilder = ProxyManifest.newBuilder().setManifest(request.getManifest()); + for (ArtifactMetadata artifactMetadata : request.getManifest().getArtifactList()) { + proxyManifestBuilder.addLocation(Location.newBuilder().setName(artifactMetadata.getName()) Review comment: Should location instead be a map? This is an automated message from the Apache Git
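To make the reviewer's question about a map concrete, here is a rough sketch contrasting the two shapes. It is not the Beam proto or the code under review: Location is a hypothetical plain-Java stand-in for the ProxyManifest.Location message (a name plus a URI), and the class and method names are invented for illustration. The thread excerpt does not say which shape was finally chosen.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LocationLookup {
  /** Hypothetical stand-in for the ProxyManifest.Location message (name + uri). */
  static final class Location {
    final String name;
    final String uri;

    Location(String name, String uri) {
      this.name = name;
      this.uri = uri;
    }
  }

  // List shape (what the quoted diff builds with addLocation): lookup is a linear scan.
  static String findInList(List<Location> locations, String name) {
    for (Location location : locations) {
      if (location.name.equals(name)) {
        return location.uri;
      }
    }
    return null;
  }

  // Map shape (what the review asks about): keyed by artifact name, so lookup is direct
  // and a repeated name overwrites the earlier entry instead of producing a duplicate.
  static Map<String, String> toMap(List<Location> locations) {
    Map<String, String> byName = new LinkedHashMap<>();
    for (Location location : locations) {
      byName.put(location.name, location.uri);
    }
    return byName;
  }

  public static void main(String[] args) {
    List<Location> locations = Arrays.asList(
        new Location("a.jar", "/staging/job-1/artifacts/a"),
        new Location("b.jar", "/staging/job-1/artifacts/b"));
    System.out.println(findInList(locations, "b.jar"));
    System.out.println(toMap(locations).get("b.jar"));
  }
}
```

A list keeps insertion order and tolerates duplicate names; a map makes name-to-URI lookup direct and enforces one URI per name.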
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110344=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110344 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 09/Jun/18 00:09 Start Date: 09/Jun/18 00:09 Worklog Time Spent: 10m Work Description: robertwb commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194202354 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java ## @@ -0,0 +1,286 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.fnexecution.artifact; + +import com.fasterxml.jackson.core.JsonProcessingException; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.common.hash.Hashing; +import com.google.protobuf.util.JsonFormat; +import io.grpc.stub.StreamObserver; +import java.io.FileNotFoundException; +import java.io.IOException; +import java.io.Serializable; +import java.nio.channels.WritableByteChannel; +import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; +import java.util.Collections; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Builder; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceImplBase; +import org.apache.beam.runners.fnexecution.FnService; +import org.apache.beam.sdk.io.FileSystems; +import org.apache.beam.sdk.io.fs.MoveOptions.StandardMoveOptions; +import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions; +import org.apache.beam.sdk.io.fs.ResourceId; +import org.apache.beam.sdk.util.MimeTypes; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link ArtifactStagingServiceImplBase} based on beam file system. 
+ */ +public class BeamFileSystemArtifactStagingService extends ArtifactStagingServiceImplBase implements +FnService { + + private static final Logger LOG = + LoggerFactory.getLogger(BeamFileSystemArtifactStagingService.class); + private static final ObjectMapper MAPPER = new ObjectMapper(); + // Use UTF8 for all text encoding. + private static final Charset CHARSET = StandardCharsets.UTF_8; + public static final String MANIFEST = "MANIFEST"; + + @Override + public StreamObserver<PutArtifactRequest> putArtifact( + StreamObserver<PutArtifactResponse> responseObserver) { +return new PutArtifactStreamObserver(responseObserver); + } + + @Override + public void commitManifest( + CommitManifestRequest request, StreamObserver<CommitManifestResponse> responseObserver) { +try { + ResourceId jobResourceDirId = getJobDirResourceId(request.getStagingSessionToken()); + ResourceId manifestResourceId = jobResourceDirId + .resolve(MANIFEST, StandardResolveOptions.RESOLVE_FILE); + ResourceId artifactDirResourceId = getArtifactDirResourceId(request.getStagingSessionToken()); + Builder proxyManifestBuilder = ProxyManifest.newBuilder().setManifest(request.getManifest()); + for (ArtifactMetadata artifactMetadata : request.getManifest().getArtifactList()) { + proxyManifestBuilder.addLocation(Location.newBuilder().setName(artifactMetadata.getName()) +.setUri(artifactDirResourceId +.resolve(encodedFileName(artifactMetadata), StandardResolveOptions.RESOLVE_FILE) +
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110340=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110340 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 09/Jun/18 00:09 Start Date: 09/Jun/18 00:09 Worklog Time Spent: 10m Work Description: robertwb commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194204099 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java ## @@ -0,0 +1,286 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.fnexecution.artifact; + +import com.fasterxml.jackson.core.JsonProcessingException; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.common.hash.Hashing; +import com.google.protobuf.util.JsonFormat; +import io.grpc.stub.StreamObserver; +import java.io.FileNotFoundException; +import java.io.IOException; +import java.io.Serializable; +import java.nio.channels.WritableByteChannel; +import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; +import java.util.Collections; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Builder; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceImplBase; +import org.apache.beam.runners.fnexecution.FnService; +import org.apache.beam.sdk.io.FileSystems; +import org.apache.beam.sdk.io.fs.MoveOptions.StandardMoveOptions; +import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions; +import org.apache.beam.sdk.io.fs.ResourceId; +import org.apache.beam.sdk.util.MimeTypes; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link ArtifactStagingServiceImplBase} based on beam file system. 
+ */ +public class BeamFileSystemArtifactStagingService extends ArtifactStagingServiceImplBase implements +FnService { + + private static final Logger LOG = + LoggerFactory.getLogger(BeamFileSystemArtifactStagingService.class); + private static final ObjectMapper MAPPER = new ObjectMapper(); + // Use UTF8 for all text encoding. + private static final Charset CHARSET = StandardCharsets.UTF_8; + public static final String MANIFEST = "MANIFEST"; + + @Override + public StreamObserver<PutArtifactRequest> putArtifact( + StreamObserver<PutArtifactResponse> responseObserver) { +return new PutArtifactStreamObserver(responseObserver); + } + + @Override + public void commitManifest( + CommitManifestRequest request, StreamObserver<CommitManifestResponse> responseObserver) { +try { + ResourceId jobResourceDirId = getJobDirResourceId(request.getStagingSessionToken()); + ResourceId manifestResourceId = jobResourceDirId + .resolve(MANIFEST, StandardResolveOptions.RESOLVE_FILE); + ResourceId artifactDirResourceId = getArtifactDirResourceId(request.getStagingSessionToken()); + Builder proxyManifestBuilder = ProxyManifest.newBuilder().setManifest(request.getManifest()); Review comment: It would be easier to read if this were fully qualified. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110341&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110341 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 09/Jun/18 00:09 Start Date: 09/Jun/18 00:09 Worklog Time Spent: 10m Work Description: robertwb commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194200497 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java ## @@ -0,0 +1,285 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.fnexecution.artifact; + +import com.fasterxml.jackson.core.JsonProcessingException; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.common.hash.Hashing; +import com.google.protobuf.util.JsonFormat; +import io.grpc.stub.StreamObserver; +import java.io.FileNotFoundException; +import java.io.IOException; +import java.io.Serializable; +import java.nio.channels.WritableByteChannel; +import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; +import java.util.Collections; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Builder; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceImplBase; +import org.apache.beam.runners.fnexecution.FnService; +import org.apache.beam.sdk.io.FileSystems; +import org.apache.beam.sdk.io.fs.MoveOptions.StandardMoveOptions; +import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions; +import org.apache.beam.sdk.io.fs.ResourceId; +import org.apache.beam.sdk.util.MimeTypes; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link ArtifactStagingServiceImplBase} based on beam file system. 
+ */ +public class BeamFileSystemArtifactStagingService extends ArtifactStagingServiceImplBase implements +FnService { + + private static final Logger LOG = + LoggerFactory.getLogger(BeamFileSystemArtifactStagingService.class); + private static final ObjectMapper MAPPER = new ObjectMapper(); + // Use UTF8 for all text encoding. + private static final Charset CHARSET = StandardCharsets.UTF_8; + public static final String MANIFEST = "MANIFEST"; + + @Override + public StreamObserver<PutArtifactRequest> putArtifact( + StreamObserver<PutArtifactResponse> responseObserver) { +return new PutArtifactStreamObserver(responseObserver); + } + + @Override + public void commitManifest( + CommitManifestRequest request, StreamObserver<CommitManifestResponse> responseObserver) { +try { + ResourceId jobResourceDirId = getJobDirResourceId(request.getStagingSessionToken()); + ResourceId manifestResourceId = jobResourceDirId + .resolve(MANIFEST, StandardResolveOptions.RESOLVE_FILE); + ResourceId artifactDirResourceId = getArtifactDirResourceId(request.getStagingSessionToken()); + Builder proxyManifestBuilder = ProxyManifest.newBuilder().setManifest(request.getManifest()); + for (ArtifactMetadata artifactMetadata : request.getManifest().getArtifactList()) { + proxyManifestBuilder.addLocation(Location.newBuilder().setName(artifactMetadata.getName()) +.setUri(artifactDirResourceId +.resolve(encodedFileName(artifactMetadata), StandardResolveOptions.RESOLVE_FILE) +
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110346&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110346 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 09/Jun/18 00:09 Start Date: 09/Jun/18 00:09 Worklog Time Spent: 10m Work Description: robertwb commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194204959 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java ## @@ -0,0 +1,286 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.fnexecution.artifact; + +import com.fasterxml.jackson.core.JsonProcessingException; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.common.hash.Hashing; +import com.google.protobuf.util.JsonFormat; +import io.grpc.stub.StreamObserver; +import java.io.FileNotFoundException; +import java.io.IOException; +import java.io.Serializable; +import java.nio.channels.WritableByteChannel; +import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; +import java.util.Collections; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Builder; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceImplBase; +import org.apache.beam.runners.fnexecution.FnService; +import org.apache.beam.sdk.io.FileSystems; +import org.apache.beam.sdk.io.fs.MoveOptions.StandardMoveOptions; +import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions; +import org.apache.beam.sdk.io.fs.ResourceId; +import org.apache.beam.sdk.util.MimeTypes; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link ArtifactStagingServiceImplBase} based on beam file system. 
+ */ +public class BeamFileSystemArtifactStagingService extends ArtifactStagingServiceImplBase implements +FnService { + + private static final Logger LOG = + LoggerFactory.getLogger(BeamFileSystemArtifactStagingService.class); + private static final ObjectMapper MAPPER = new ObjectMapper(); + // Use UTF8 for all text encoding. + private static final Charset CHARSET = StandardCharsets.UTF_8; + public static final String MANIFEST = "MANIFEST"; + + @Override + public StreamObserver<PutArtifactRequest> putArtifact( + StreamObserver<PutArtifactResponse> responseObserver) { +return new PutArtifactStreamObserver(responseObserver); + } + + @Override + public void commitManifest( + CommitManifestRequest request, StreamObserver<CommitManifestResponse> responseObserver) { +try { + ResourceId jobResourceDirId = getJobDirResourceId(request.getStagingSessionToken()); + ResourceId manifestResourceId = jobResourceDirId + .resolve(MANIFEST, StandardResolveOptions.RESOLVE_FILE); + ResourceId artifactDirResourceId = getArtifactDirResourceId(request.getStagingSessionToken()); + Builder proxyManifestBuilder = ProxyManifest.newBuilder().setManifest(request.getManifest()); + for (ArtifactMetadata artifactMetadata : request.getManifest().getArtifactList()) { + proxyManifestBuilder.addLocation(Location.newBuilder().setName(artifactMetadata.getName()) +.setUri(artifactDirResourceId +.resolve(encodedFileName(artifactMetadata), StandardResolveOptions.RESOLVE_FILE) +
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110342&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110342 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 09/Jun/18 00:09 Start Date: 09/Jun/18 00:09 Worklog Time Spent: 10m Work Description: robertwb commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194205152 ## File path: runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingServiceTest.java ## @@ -0,0 +1,273 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.fnexecution.artifact; + + +import com.google.common.base.Joiner; +import com.google.common.base.Strings; +import com.google.protobuf.ByteString; +import com.google.protobuf.util.JsonFormat; +import io.grpc.inprocess.InProcessChannelBuilder; +import io.grpc.stub.StreamObserver; +import java.io.FileInputStream; +import java.io.IOException; +import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; +import java.nio.file.FileVisitResult; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.nio.file.SimpleFileVisitor; +import java.nio.file.attribute.BasicFileAttributes; +import java.util.ArrayList; +import java.util.Collections; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Set; +import java.util.concurrent.CompletableFuture; +import java.util.concurrent.TimeUnit; +import java.util.stream.Collectors; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactChunk; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.Manifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Builder; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceStub; 
+import org.apache.beam.runners.fnexecution.GrpcFnServer; +import org.apache.beam.runners.fnexecution.InProcessServerFactory; +import org.junit.After; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** + * Tests for {@link BeamFileSystemArtifactStagingService}. + */ +@RunWith(JUnit4.class) +public class BeamFileSystemArtifactStagingServiceTest { + + public static final Joiner JOINER = Joiner.on(""); + public static final Charset CHARSET = StandardCharsets.UTF_8; + private GrpcFnServer<BeamFileSystemArtifactStagingService> server; + private BeamFileSystemArtifactStagingService artifactStagingService; + private ArtifactStagingServiceStub stub; + private Path srcDir; + private Path destDir; + + @Before + public void setUp() throws Exception { +artifactStagingService = new BeamFileSystemArtifactStagingService(); +server = GrpcFnServer +.allocatePortAndCreateFor(artifactStagingService, InProcessServerFactory.create()); +stub = +ArtifactStagingServiceGrpc.newStub( + InProcessChannelBuilder.forName(server.getApiServiceDescriptor().getUrl()).build()); + +srcDir = Files.createTempDirectory("BFSTemp"); +destDir = Files.createTempDirectory("BFDTemp"); + + } + + @After + public void tearDown() throws Exception { +if (server != null) { +
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110351&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110351 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 09/Jun/18 00:09 Start Date: 09/Jun/18 00:09 Worklog Time Spent: 10m Work Description: robertwb commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194205685 ## File path: runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingServiceTest.java ## @@ -0,0 +1,273 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.fnexecution.artifact; + + +import com.google.common.base.Joiner; +import com.google.common.base.Strings; +import com.google.protobuf.ByteString; +import com.google.protobuf.util.JsonFormat; +import io.grpc.inprocess.InProcessChannelBuilder; +import io.grpc.stub.StreamObserver; +import java.io.FileInputStream; +import java.io.IOException; +import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; +import java.nio.file.FileVisitResult; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.nio.file.SimpleFileVisitor; +import java.nio.file.attribute.BasicFileAttributes; +import java.util.ArrayList; +import java.util.Collections; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Set; +import java.util.concurrent.CompletableFuture; +import java.util.concurrent.TimeUnit; +import java.util.stream.Collectors; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactChunk; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.Manifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Builder; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceStub; 
+import org.apache.beam.runners.fnexecution.GrpcFnServer; +import org.apache.beam.runners.fnexecution.InProcessServerFactory; +import org.junit.After; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** + * Tests for {@link BeamFileSystemArtifactStagingService}. + */ +@RunWith(JUnit4.class) +public class BeamFileSystemArtifactStagingServiceTest { + + public static final Joiner JOINER = Joiner.on(""); + public static final Charset CHARSET = StandardCharsets.UTF_8; + private GrpcFnServer<BeamFileSystemArtifactStagingService> server; + private BeamFileSystemArtifactStagingService artifactStagingService; + private ArtifactStagingServiceStub stub; + private Path srcDir; + private Path destDir; + + @Before + public void setUp() throws Exception { +artifactStagingService = new BeamFileSystemArtifactStagingService(); +server = GrpcFnServer +.allocatePortAndCreateFor(artifactStagingService, InProcessServerFactory.create()); +stub = +ArtifactStagingServiceGrpc.newStub( + InProcessChannelBuilder.forName(server.getApiServiceDescriptor().getUrl()).build()); + +srcDir = Files.createTempDirectory("BFSTemp"); +destDir = Files.createTempDirectory("BFDTemp"); + + } + + @After + public void tearDown() throws Exception { +if (server != null) { +
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110347&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110347 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 09/Jun/18 00:09 Start Date: 09/Jun/18 00:09 Worklog Time Spent: 10m Work Description: robertwb commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194205444 ## File path: runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingServiceTest.java ## @@ -0,0 +1,273 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.fnexecution.artifact; + + +import com.google.common.base.Joiner; +import com.google.common.base.Strings; +import com.google.protobuf.ByteString; +import com.google.protobuf.util.JsonFormat; +import io.grpc.inprocess.InProcessChannelBuilder; +import io.grpc.stub.StreamObserver; +import java.io.FileInputStream; +import java.io.IOException; +import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; +import java.nio.file.FileVisitResult; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.nio.file.SimpleFileVisitor; +import java.nio.file.attribute.BasicFileAttributes; +import java.util.ArrayList; +import java.util.Collections; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Set; +import java.util.concurrent.CompletableFuture; +import java.util.concurrent.TimeUnit; +import java.util.stream.Collectors; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactChunk; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.Manifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Builder; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceStub; 
+import org.apache.beam.runners.fnexecution.GrpcFnServer; +import org.apache.beam.runners.fnexecution.InProcessServerFactory; +import org.junit.After; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** + * Tests for {@link BeamFileSystemArtifactStagingService}. + */ +@RunWith(JUnit4.class) +public class BeamFileSystemArtifactStagingServiceTest { + + public static final Joiner JOINER = Joiner.on(""); + public static final Charset CHARSET = StandardCharsets.UTF_8; + private GrpcFnServer<BeamFileSystemArtifactStagingService> server; + private BeamFileSystemArtifactStagingService artifactStagingService; + private ArtifactStagingServiceStub stub; + private Path srcDir; + private Path destDir; + + @Before + public void setUp() throws Exception { +artifactStagingService = new BeamFileSystemArtifactStagingService(); +server = GrpcFnServer +.allocatePortAndCreateFor(artifactStagingService, InProcessServerFactory.create()); +stub = +ArtifactStagingServiceGrpc.newStub( + InProcessChannelBuilder.forName(server.getApiServiceDescriptor().getUrl()).build()); + +srcDir = Files.createTempDirectory("BFSTemp"); +destDir = Files.createTempDirectory("BFDTemp"); + + } + + @After + public void tearDown() throws Exception { +if (server != null) { +
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110328&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110328 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 08/Jun/18 22:45 Start Date: 08/Jun/18 22:45 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194193541 ## File path: runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingServiceTest.java ## @@ -0,0 +1,271 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.fnexecution.artifact; + + +import com.google.common.base.Joiner; +import com.google.common.base.Strings; +import com.google.protobuf.ByteString; +import com.google.protobuf.util.JsonFormat; +import io.grpc.inprocess.InProcessChannelBuilder; +import io.grpc.stub.StreamObserver; +import java.io.FileInputStream; +import java.io.IOException; +import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; +import java.nio.file.FileVisitResult; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.nio.file.SimpleFileVisitor; +import java.nio.file.attribute.BasicFileAttributes; +import java.util.ArrayList; +import java.util.Collections; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Set; +import java.util.concurrent.CompletableFuture; +import java.util.concurrent.TimeUnit; +import java.util.stream.Collectors; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactChunk; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.Manifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Builder; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceStub; 
+import org.apache.beam.runners.fnexecution.GrpcFnServer; +import org.apache.beam.runners.fnexecution.InProcessServerFactory; +import org.junit.After; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** + * Tests for {@link BeamFileSystemArtifactStagingService}. + */ +@RunWith(JUnit4.class) +public class BeamFileSystemArtifactStagingServiceTest { + + public static final Joiner JOINER = Joiner.on(""); + public static final Charset CHARSET = StandardCharsets.UTF_8; + private GrpcFnServer<BeamFileSystemArtifactStagingService> server; + private BeamFileSystemArtifactStagingService artifactStagingService; + private ArtifactStagingServiceStub stub; + private Path srcDir; + private Path destDir; + + @Before + public void setUp() throws Exception { +artifactStagingService = new BeamFileSystemArtifactStagingService(); +server = GrpcFnServer +.allocatePortAndCreateFor(artifactStagingService, InProcessServerFactory.create()); +stub = +ArtifactStagingServiceGrpc.newStub( + InProcessChannelBuilder.forName(server.getApiServiceDescriptor().getUrl()).build()); + +srcDir = Files.createTempDirectory("BFSTemp"); +destDir = Files.createTempDirectory("BFDTemp"); + + } + + @After + public void tearDown() throws Exception { +if (server != null) { +
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110327&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110327 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 08/Jun/18 22:45 Start Date: 08/Jun/18 22:45 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#discussion_r194193649 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java ## @@ -0,0 +1,285 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.fnexecution.artifact; + +import com.fasterxml.jackson.core.JsonProcessingException; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.common.hash.Hashing; +import com.google.protobuf.util.JsonFormat; +import io.grpc.stub.StreamObserver; +import java.io.FileNotFoundException; +import java.io.IOException; +import java.io.Serializable; +import java.nio.channels.WritableByteChannel; +import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; +import java.util.Collections; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Builder; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest; +import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse; +import org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceImplBase; +import org.apache.beam.runners.fnexecution.FnService; +import org.apache.beam.sdk.io.FileSystems; +import org.apache.beam.sdk.io.fs.MoveOptions.StandardMoveOptions; +import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions; +import org.apache.beam.sdk.io.fs.ResourceId; +import org.apache.beam.sdk.util.MimeTypes; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link ArtifactStagingServiceImplBase} based on beam file system. 
+ */ +public class BeamFileSystemArtifactStagingService extends ArtifactStagingServiceImplBase implements +FnService { + + private static final Logger LOG = + LoggerFactory.getLogger(BeamFileSystemArtifactStagingService.class); + private static final ObjectMapper MAPPER = new ObjectMapper(); + // Use UTF8 for all text encoding. + private static final Charset CHARSET = StandardCharsets.UTF_8; + public static final String MANIFEST = "MANIFEST"; + + @Override + public StreamObserver putArtifact( + StreamObserver responseObserver) { +return new PutArtifactStreamObserver(responseObserver); + } + + @Override + public void commitManifest( + CommitManifestRequest request, StreamObserver responseObserver) { +try { + ResourceId jobResourceDirId = getJobDirResourceId(request.getStagingSessionToken()); + ResourceId manifestResourceId = jobResourceDirId + .resolve(MANIFEST, StandardResolveOptions.RESOLVE_FILE); + ResourceId artifactDirResourceId = getArtifactDirResourceId(request.getStagingSessionToken()); + Builder proxyManifestBuilder = ProxyManifest.newBuilder().setManifest(request.getManifest()); + for (ArtifactMetadata artifactMetadata : request.getManifest().getArtifactList()) { + proxyManifestBuilder.addLocation(Location.newBuilder().setName(artifactMetadata.getName()) +.setUri(artifactDirResourceId +.resolve(encodedFileName(artifactMetadata), StandardResolveOptions.RESOLVE_FILE) +
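The commitManifest excerpt above resolves a MANIFEST file in a per-job directory and one location per artifact in an artifact directory. A minimal sketch of such a layout follows, using plain java.nio paths rather than Beam's ResourceId. The class name, the "artifacts" sub-directory, the use of the session token as the job directory, and the SHA-256 file-name encoding are all assumptions made for illustration; the quoted diff is truncated before it shows how getJobDirResourceId or encodedFileName are actually implemented.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch of a per-session staging layout; not the code under review. */
final class StagingLayoutSketch {
  static final String MANIFEST = "MANIFEST";
  // Assumed name of the sub-directory that holds staged artifacts.
  static final String ARTIFACTS = "artifacts";

  /** Assumption: the staging session token is itself the job's staging directory. */
  static Path jobDir(String stagingSessionToken) {
    return Paths.get(stagingSessionToken);
  }

  /** The manifest sits directly inside the job directory. */
  static Path manifestPath(String stagingSessionToken) {
    return jobDir(stagingSessionToken).resolve(MANIFEST);
  }

  /** Assumption: names are hashed so that arbitrary artifact names map to safe file names. */
  static String encodedFileName(String artifactName) {
    try {
      MessageDigest digest = MessageDigest.getInstance("SHA-256");
      StringBuilder hex = new StringBuilder();
      for (byte b : digest.digest(artifactName.getBytes(StandardCharsets.UTF_8))) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 is required on every Java platform, so this is not expected.
      throw new IllegalStateException(e);
    }
  }

  /** Each artifact lands under the artifact directory with its encoded name. */
  static Path artifactPath(String stagingSessionToken, String artifactName) {
    return jobDir(stagingSessionToken).resolve(ARTIFACTS).resolve(encodedFileName(artifactName));
  }

  public static void main(String[] args) {
    String token = "/tmp/staging/job-1";
    System.out.println(manifestPath(token));
    System.out.println(artifactPath(token, "dir/my artifact?.jar"));
  }
}
```

Hashing the name is one way to meet the issue's requirement that the layout "allow for arbitrary filenames"; whether the PR does this is not visible in the excerpt.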
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110309=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110309 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 08/Jun/18 22:14 Start Date: 08/Jun/18 22:14 Worklog Time Spent: 10m Work Description: angoenka opened a new pull request #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591 Artifact staging service which uses BeamFileSystem to stage files on various file systems. Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). It will help us expedite review of your Pull Request if you tag someone (e.g. `@username`) to look at it. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 110309) Time Spent: 7h 50m (was: 7h 40m) > ArtifactStagingService that stages to a distributed filesystem > -- > > Key: BEAM-4290 > URL: https://issues.apache.org/jira/browse/BEAM-4290 > Project: Beam > Issue Type: Sub-task > Components: runner-core >Reporter: Eugene Kirpichov >Assignee: Ankur Goenka >Priority: Major > Time Spent: 7h 50m > Remaining Estimate: 0h > > Using the job's staging directory from PipelineOptions. 
> Physical layout on the distributed filesystem is TBD but it should allow for > arbitrary filenames and ideally for eventually avoiding uploading artifacts > that are already there. > Handling credentials is TBD. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110310=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110310 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 08/Jun/18 22:14 Start Date: 08/Jun/18 22:14 Worklog Time Spent: 10m Work Description: angoenka commented on issue #5591: [BEAM-4290] Beam File System based ArtifactStagingService URL: https://github.com/apache/beam/pull/5591#issuecomment-395906244 R: @bsidhom @jkff @axelmagn @robertwb This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 110310) Time Spent: 8h (was: 7h 50m) > ArtifactStagingService that stages to a distributed filesystem > -- > > Key: BEAM-4290 > URL: https://issues.apache.org/jira/browse/BEAM-4290 > Project: Beam > Issue Type: Sub-task > Components: runner-core >Reporter: Eugene Kirpichov >Assignee: Ankur Goenka >Priority: Major > Time Spent: 8h > Remaining Estimate: 0h > > Using the job's staging directory from PipelineOptions. > Physical layout on the distributed filesystem is TBD but it should allow for > arbitrary filenames and ideally for eventually avoiding uploading artifacts > that are already there. > Handling credentials is TBD. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=108920=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108920 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 05/Jun/18 02:23 Start Date: 05/Jun/18 02:23 Worklog Time Spent: 10m Work Description: herohde commented on issue #5489: [BEAM-4290] proto changes to support staging_session_token URL: https://github.com/apache/beam/pull/5489#issuecomment-394559684 @angoenka This PR seems to have broken the build: ./gradlew build [...] Task :beam-sdks-go-container:resolveBuildDependencies Resolving ./github.com/apache/beam/sdks/go@/Users/herohde/go/src/github.com/apache/beam/sdks/go github.com/apache/beam/runners/gcp/gcsproxy/vendor/github.com/apache/beam/sdks/go/pkg/beam/artifact/gcsproxy vendor/github.com/apache/beam/sdks/go/pkg/beam/artifact/gcsproxy/staging.go:133:32: md.Name undefined (type *jobmanagement_v1.PutArtifactMetadata has no field or method Name) vendor/github.com/apache/beam/sdks/go/pkg/beam/artifact/gcsproxy/staging.go:145:58: md.Name undefined (type *jobmanagement_v1.PutArtifactMetadata has no field or method Name) vendor/github.com/apache/beam/sdks/go/pkg/beam/artifact/gcsproxy/staging.go:148:7: md.Md5 undefined (type *jobmanagement_v1.PutArtifactMetadata has no field or method Md5) vendor/github.com/apache/beam/sdks/go/pkg/beam/artifact/gcsproxy/staging.go:149:66: md.Name undefined (type *jobmanagement_v1.PutArtifactMetadata has no field or method Name) vendor/github.com/apache/beam/sdks/go/pkg/beam/artifact/gcsproxy/staging.go:149:81: md.Md5 undefined (type *jobmanagement_v1.PutArtifactMetadata has no field or method Md5) vendor/github.com/apache/beam/sdks/go/pkg/beam/artifact/gcsproxy/staging.go:153:12: md.Name undefined (type *jobmanagement_v1.PutArtifactMetadata has no field or method Name) This is an automated message from the Apache Git Service. 
To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 108920) Time Spent: 7h 40m (was: 7.5h) > ArtifactStagingService that stages to a distributed filesystem > -- > > Key: BEAM-4290 > URL: https://issues.apache.org/jira/browse/BEAM-4290 > Project: Beam > Issue Type: Sub-task > Components: runner-core >Reporter: Eugene Kirpichov >Assignee: Ankur Goenka >Priority: Major > Time Spent: 7h 40m > Remaining Estimate: 0h > > Using the job's staging directory from PipelineOptions. > Physical layout on the distributed filesystem is TBD but it should allow for > arbitrary filenames and ideally for eventually avoiding uploading artifacts > that are already there. > Handling credentials is TBD. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=108919=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108919 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 05/Jun/18 02:22 Start Date: 05/Jun/18 02:22 Worklog Time Spent: 10m Work Description: herohde commented on issue #5489: [BEAM-4290] proto changes to support staging_session_token URL: https://github.com/apache/beam/pull/5489#issuecomment-394559684 @angoenka This PR seems to have broken the build: # ./gradlew build [...] > Task :beam-sdks-go-container:resolveBuildDependencies Resolving ./github.com/apache/beam/sdks/go@/Users/herohde/go/src/github.com/apache/beam/sdks/go # github.com/apache/beam/runners/gcp/gcsproxy/vendor/github.com/apache/beam/sdks/go/pkg/beam/artifact/gcsproxy vendor/github.com/apache/beam/sdks/go/pkg/beam/artifact/gcsproxy/staging.go:133:32: md.Name undefined (type *jobmanagement_v1.PutArtifactMetadata has no field or method Name) vendor/github.com/apache/beam/sdks/go/pkg/beam/artifact/gcsproxy/staging.go:145:58: md.Name undefined (type *jobmanagement_v1.PutArtifactMetadata has no field or method Name) vendor/github.com/apache/beam/sdks/go/pkg/beam/artifact/gcsproxy/staging.go:148:7: md.Md5 undefined (type *jobmanagement_v1.PutArtifactMetadata has no field or method Md5) vendor/github.com/apache/beam/sdks/go/pkg/beam/artifact/gcsproxy/staging.go:149:66: md.Name undefined (type *jobmanagement_v1.PutArtifactMetadata has no field or method Name) vendor/github.com/apache/beam/sdks/go/pkg/beam/artifact/gcsproxy/staging.go:149:81: md.Md5 undefined (type *jobmanagement_v1.PutArtifactMetadata has no field or method Md5) vendor/github.com/apache/beam/sdks/go/pkg/beam/artifact/gcsproxy/staging.go:153:12: md.Name undefined (type *jobmanagement_v1.PutArtifactMetadata has no field or method Name) > Task :beam-runners-gcp-gcsproxy:buildLinuxAmd64 FAILED This is an automated message from the Apache Git Service. 
To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 108919) Time Spent: 7.5h (was: 7h 20m) > ArtifactStagingService that stages to a distributed filesystem > -- > > Key: BEAM-4290 > URL: https://issues.apache.org/jira/browse/BEAM-4290 > Project: Beam > Issue Type: Sub-task > Components: runner-core >Reporter: Eugene Kirpichov >Assignee: Ankur Goenka >Priority: Major > Time Spent: 7.5h > Remaining Estimate: 0h > > Using the job's staging directory from PipelineOptions. > Physical layout on the distributed filesystem is TBD but it should allow for > arbitrary filenames and ideally for eventually avoiding uploading artifacts > that are already there. > Handling credentials is TBD. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=108773=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108773 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 04/Jun/18 21:41 Start Date: 04/Jun/18 21:41 Worklog Time Spent: 10m Work Description: angoenka commented on issue #5489: [BEAM-4290] proto changes to support staging_session_token URL: https://github.com/apache/beam/pull/5489#issuecomment-394508969 Yes, the missing "token" will be filled in by a subsequent PR where applicable. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 108773) Time Spent: 7h 10m (was: 7h) > ArtifactStagingService that stages to a distributed filesystem > -- > > Key: BEAM-4290 > URL: https://issues.apache.org/jira/browse/BEAM-4290 > Project: Beam > Issue Type: Sub-task > Components: runner-core >Reporter: Eugene Kirpichov >Assignee: Ankur Goenka >Priority: Major > Time Spent: 7h 10m > Remaining Estimate: 0h > > Using the job's staging directory from PipelineOptions. > Physical layout on the distributed filesystem is TBD but it should allow for > arbitrary filenames and ideally for eventually avoiding uploading artifacts > that are already there. > Handling credentials is TBD. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=108740=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108740 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 04/Jun/18 20:53 Start Date: 04/Jun/18 20:53 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5489: [BEAM-4290] proto changes to support staging_session_token URL: https://github.com/apache/beam/pull/5489#discussion_r192501650 ## File path: model/job-management/src/main/proto/beam_artifact_api.proto ## @@ -102,13 +99,19 @@ message ArtifactChunk { bytes data = 1; } +message PutArtifactMetadata { + // (Required) A token for artifact staging session. Review comment: Yes, staging_session_token is a session token so all the artifacts related to that session are expected to use the same token. Sure, I will update the documentation. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 108740) Time Spent: 6.5h (was: 6h 20m) > ArtifactStagingService that stages to a distributed filesystem > -- > > Key: BEAM-4290 > URL: https://issues.apache.org/jira/browse/BEAM-4290 > Project: Beam > Issue Type: Sub-task > Components: runner-core >Reporter: Eugene Kirpichov >Assignee: Ankur Goenka >Priority: Major > Time Spent: 6.5h > Remaining Estimate: 0h > > Using the job's staging directory from PipelineOptions. > Physical layout on the distributed filesystem is TBD but it should allow for > arbitrary filenames and ideally for eventually avoiding uploading artifacts > that are already there. > Handling credentials is TBD. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
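The reply above states that staging_session_token identifies a session and that every artifact belonging to that session is expected to carry the same token. The sketch below models that contract in memory with plain Java collections. The class and method names are hypothetical and are not Beam's API, and the commit-time check is an illustrative assumption rather than behaviour confirmed by the PR.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory model of staging sessions; not Beam's API. */
final class StagingSessionSketch {
  // Artifact names staged so far, keyed by staging session token.
  private final Map<String, Set<String>> stagedBySession = new HashMap<>();

  /** Records one uploaded artifact under the session identified by the token. */
  void putArtifact(String stagingSessionToken, String artifactName) {
    stagedBySession
        .computeIfAbsent(stagingSessionToken, t -> new HashSet<>())
        .add(artifactName);
  }

  /** Returns true only if every manifest entry was staged under this same token. */
  boolean canCommit(String stagingSessionToken, List<String> manifestNames) {
    Set<String> staged =
        stagedBySession.getOrDefault(stagingSessionToken, new HashSet<>());
    return staged.containsAll(manifestNames);
  }

  public static void main(String[] args) {
    StagingSessionSketch service = new StagingSessionSketch();
    service.putArtifact("session-a", "pipeline.jar");
    service.putArtifact("session-a", "deps.zip");
    service.putArtifact("session-b", "other.jar");

    // Both artifacts were staged with the token used for the commit.
    System.out.println(service.canCommit("session-a", Arrays.asList("pipeline.jar", "deps.zip")));
    // other.jar was staged under a different session token.
    System.out.println(service.canCommit("session-a", Arrays.asList("other.jar")));
  }
}
```

Keying all state by the token is what lets a single service instance serve several concurrent staging sessions without mixing their artifacts.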
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=108745=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108745 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 04/Jun/18 20:53 Start Date: 04/Jun/18 20:53 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5489: [BEAM-4290] proto changes to support staging_session_token URL: https://github.com/apache/beam/pull/5489#discussion_r192509819 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ArtifactServiceStager.java ## @@ -145,7 +146,9 @@ public ArtifactMetadata get() throws Exception { StreamObserver requestObserver = stub.putArtifact(responseObserver); ArtifactMetadata metadata = ArtifactMetadata.newBuilder().setName(file.getStagingName()).build(); - requestObserver.onNext(PutArtifactRequest.newBuilder().setMetadata(metadata).build()); + PutArtifactMetadata putMetadata = PutArtifactMetadata.newBuilder().setMetadata(metadata) + .setStagingSessionToken("token").build(); Review comment: I will add the todo as the token is not implemented yet. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 108745) > ArtifactStagingService that stages to a distributed filesystem > -- > > Key: BEAM-4290 > URL: https://issues.apache.org/jira/browse/BEAM-4290 > Project: Beam > Issue Type: Sub-task > Components: runner-core >Reporter: Eugene Kirpichov >Assignee: Ankur Goenka >Priority: Major > Time Spent: 6h 50m > Remaining Estimate: 0h > > Using the job's staging directory from PipelineOptions. 
> Physical layout on the distributed filesystem is TBD but it should allow for > arbitrary filenames and ideally for eventually avoiding uploading artifacts > that are already there. > Handling credentials is TBD. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=108741=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108741 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 04/Jun/18 20:53 Start Date: 04/Jun/18 20:53 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5489: [BEAM-4290] proto changes to support staging_session_token URL: https://github.com/apache/beam/pull/5489#discussion_r192502440 ## File path: model/job-management/src/main/proto/beam_artifact_api.proto ## @@ -124,6 +127,8 @@ message PutArtifactResponse { message CommitManifestRequest { // (Required) The manifest to commit. Manifest manifest = 1; + // (Required) A token for artifact staging session. + string staging_session_token = 2; Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 108741) Time Spent: 6h 40m (was: 6.5h) > ArtifactStagingService that stages to a distributed filesystem > -- > > Key: BEAM-4290 > URL: https://issues.apache.org/jira/browse/BEAM-4290 > Project: Beam > Issue Type: Sub-task > Components: runner-core >Reporter: Eugene Kirpichov >Assignee: Ankur Goenka >Priority: Major > Time Spent: 6h 40m > Remaining Estimate: 0h > > Using the job's staging directory from PipelineOptions. > Physical layout on the distributed filesystem is TBD but it should allow for > arbitrary filenames and ideally for eventually avoiding uploading artifacts > that are already there. > Handling credentials is TBD. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=108746=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108746 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 04/Jun/18 20:53 Start Date: 04/Jun/18 20:53 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5489: [BEAM-4290] proto changes to support staging_session_token URL: https://github.com/apache/beam/pull/5489#discussion_r192511665 ## File path: runners/reference/java/src/main/java/org/apache/beam/runners/reference/testing/TestJobService.java ## @@ -59,6 +59,7 @@ public void prepare( PrepareJobResponse.newBuilder() .setPreparationId(preparationId) .setArtifactStagingEndpoint(stagingEndpoint) +.setStagingSessionToken("TestStagingToken") Review comment: At this point, it should only be set as it is not currently used. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 108746) Time Spent: 7h (was: 6h 50m) > ArtifactStagingService that stages to a distributed filesystem > -- > > Key: BEAM-4290 > URL: https://issues.apache.org/jira/browse/BEAM-4290 > Project: Beam > Issue Type: Sub-task > Components: runner-core >Reporter: Eugene Kirpichov >Assignee: Ankur Goenka >Priority: Major > Time Spent: 7h > Remaining Estimate: 0h > > Using the job's staging directory from PipelineOptions. > Physical layout on the distributed filesystem is TBD but it should allow for > arbitrary filenames and ideally for eventually avoiding uploading artifacts > that are already there. > Handling credentials is TBD. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=108744=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108744 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 04/Jun/18 20:53 Start Date: 04/Jun/18 20:53 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5489: [BEAM-4290] proto changes to support staging_session_token URL: https://github.com/apache/beam/pull/5489#discussion_r192872171 ## File path: sdks/python/apache_beam/runners/portability/local_job_service.py ## @@ -87,7 +87,9 @@ def Prepare(self, request, context=None): use_grpc=self._use_grpc, sdk_harness_factory=sdk_harness_factory) logging.debug("Prepared job '%s' as '%s'", request.job_name, preparation_id) -return beam_job_api_pb2.PrepareJobResponse(preparation_id=preparation_id) +# TODO(angoenka): Pass an appropriate staging_session_token Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 108744) > ArtifactStagingService that stages to a distributed filesystem > -- > > Key: BEAM-4290 > URL: https://issues.apache.org/jira/browse/BEAM-4290 > Project: Beam > Issue Type: Sub-task > Components: runner-core >Reporter: Eugene Kirpichov >Assignee: Ankur Goenka >Priority: Major > Time Spent: 6h 50m > Remaining Estimate: 0h > > Using the job's staging directory from PipelineOptions. > Physical layout on the distributed filesystem is TBD but it should allow for > arbitrary filenames and ideally for eventually avoiding uploading artifacts > that are already there. > Handling credentials is TBD. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=108738=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108738 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 04/Jun/18 20:53 Start Date: 04/Jun/18 20:53 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5489: [BEAM-4290] proto changes to support staging_session_token URL: https://github.com/apache/beam/pull/5489#discussion_r192509534 ## File path: model/job-management/src/main/proto/beam_job_api.proto ## @@ -69,12 +69,16 @@ message PrepareJobRequest { message PrepareJobResponse { // (required) The ID used to associate calls made while preparing the job. preparationId is used - // to run the job, as well as in other pre-execution APIs such as Artifact staging. + // to run the job. string preparation_id = 1; // An endpoint which exposes the Beam Artifact Staging API. Artifacts used by the job should be // staged to this endpoint, and will be available during job execution. org.apache.beam.model.pipeline.v1.ApiServiceDescriptor artifact_staging_endpoint = 2; + + // (required) Token for the artifact staging. This token also represent an artifact Review comment: The job service is expected to generate the token somehow. Describing how it is generated here would amount to suggesting an implementation. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 108738) Time Spent: 6h 10m (was: 6h) > ArtifactStagingService that stages to a distributed filesystem > -- > > Key: BEAM-4290 > URL: https://issues.apache.org/jira/browse/BEAM-4290 > Project: Beam > Issue Type: Sub-task > Components: runner-core >Reporter: Eugene Kirpichov >Assignee: Ankur Goenka >Priority: Major > Time Spent: 6h 10m > Remaining Estimate: 0h > > Using the job's staging directory from PipelineOptions. > Physical layout on the distributed filesystem is TBD but it should allow for > arbitrary filenames and ideally for eventually avoiding uploading artifacts > that are already there. > Handling credentials is TBD. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=108743=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108743 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 04/Jun/18 20:53 Start Date: 04/Jun/18 20:53 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5489: [BEAM-4290] proto changes to support staging_session_token URL: https://github.com/apache/beam/pull/5489#discussion_r192510656 ## File path: runners/direct-java/src/main/java/org/apache/beam/runners/direct/portable/job/ReferenceRunnerJobService.java ## @@ -97,6 +97,7 @@ public void prepare( PrepareJobResponse.newBuilder() .setPreparationId(preparationId) .setArtifactStagingEndpoint(artifactStagingService.getApiServiceDescriptor()) + .setStagingSessionToken(tempDir.toFile().getAbsolutePath()) Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 108743) Time Spent: 6h 50m (was: 6h 40m) > ArtifactStagingService that stages to a distributed filesystem > -- > > Key: BEAM-4290 > URL: https://issues.apache.org/jira/browse/BEAM-4290 > Project: Beam > Issue Type: Sub-task > Components: runner-core >Reporter: Eugene Kirpichov >Assignee: Ankur Goenka >Priority: Major > Time Spent: 6h 50m > Remaining Estimate: 0h > > Using the job's staging directory from PipelineOptions. > Physical layout on the distributed filesystem is TBD but it should allow for > arbitrary filenames and ideally for eventually avoiding uploading artifacts > that are already there. > Handling credentials is TBD. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=108742=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108742 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 04/Jun/18 20:53 Start Date: 04/Jun/18 20:53 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5489: [BEAM-4290] proto changes to support staging_session_token URL: https://github.com/apache/beam/pull/5489#discussion_r192511287 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/jobsubmission/InMemoryJobService.java ## @@ -114,6 +114,8 @@ public void prepare( .newBuilder() .setPreparationId(preparationId) .setArtifactStagingEndpoint(stagingServiceDescriptor) + // TODO: Pass the correct token for staging. Review comment: The correct token will depend on the implementation of ArtifactStagingService. We will have to revisit this in the next PR, where I will add the ArtifactStagingService. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 108742) Time Spent: 6h 40m (was: 6.5h) > ArtifactStagingService that stages to a distributed filesystem > -- > > Key: BEAM-4290 > URL: https://issues.apache.org/jira/browse/BEAM-4290 > Project: Beam > Issue Type: Sub-task > Components: runner-core >Reporter: Eugene Kirpichov >Assignee: Ankur Goenka >Priority: Major > Time Spent: 6h 40m > Remaining Estimate: 0h > > Using the job's staging directory from PipelineOptions. > Physical layout on the distributed filesystem is TBD but it should allow for > arbitrary filenames and ideally for eventually avoiding uploading artifacts > that are already there. > Handling credentials is TBD. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=108739=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108739 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 04/Jun/18 20:53 Start Date: 04/Jun/18 20:53 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5489: [BEAM-4290] proto changes to support staging_session_token URL: https://github.com/apache/beam/pull/5489#discussion_r192510959 ## File path: runners/direct-java/src/test/java/org/apache/beam/runners/direct/portable/artifact/LocalFileSystemArtifactStagerServiceTest.java ## @@ -88,7 +88,11 @@ public void singleDataPutArtifactSucceeds() throws Exception { String name = "my-artifact"; requestObserver.onNext( ArtifactApi.PutArtifactRequest.newBuilder() - .setMetadata(ArtifactApi.ArtifactMetadata.newBuilder().setName(name).build()) +.setMetadata( Review comment: Not really. The LocalFileSystemArtifactStagerServiceTest does not use StagingSessionToken so there is no need to verify at the moment. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 108739) Time Spent: 6h 20m (was: 6h 10m) > ArtifactStagingService that stages to a distributed filesystem > -- > > Key: BEAM-4290 > URL: https://issues.apache.org/jira/browse/BEAM-4290 > Project: Beam > Issue Type: Sub-task > Components: runner-core >Reporter: Eugene Kirpichov >Assignee: Ankur Goenka >Priority: Major > Time Spent: 6h 20m > Remaining Estimate: 0h > > Using the job's staging directory from PipelineOptions. > Physical layout on the distributed filesystem is TBD but it should allow for > arbitrary filenames and ideally for eventually avoiding uploading artifacts > that are already there. > Handling credentials is TBD. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=108174=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108174 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 01/Jun/18 19:57 Start Date: 01/Jun/18 19:57 Worklog Time Spent: 10m Work Description: jkff commented on a change in pull request #5489: [BEAM-4290] proto changes to support staging_session_token URL: https://github.com/apache/beam/pull/5489#discussion_r192499066 ## File path: model/job-management/src/main/proto/beam_artifact_api.proto ## @@ -102,13 +99,19 @@ message ArtifactChunk { bytes data = 1; } +message PutArtifactMetadata { + // (Required) A token for artifact staging session. Review comment: Are all PutArtifactMetadata's within the same PutArtifactRequest stream required to use the same token? Also: document where this token is supposed to come from? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 108174) Time Spent: 5h 20m (was: 5h 10m) > ArtifactStagingService that stages to a distributed filesystem > -- > > Key: BEAM-4290 > URL: https://issues.apache.org/jira/browse/BEAM-4290 > Project: Beam > Issue Type: Sub-task > Components: runner-core >Reporter: Eugene Kirpichov >Assignee: Ankur Goenka >Priority: Major > Time Spent: 5h 20m > Remaining Estimate: 0h > > Using the job's staging directory from PipelineOptions. > Physical layout on the distributed filesystem is TBD but it should allow for > arbitrary filenames and ideally for eventually avoiding uploading artifacts > that are already there. > Handling credentials is TBD. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=108176=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108176 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 01/Jun/18 19:57 Start Date: 01/Jun/18 19:57 Worklog Time Spent: 10m Work Description: jkff commented on a change in pull request #5489: [BEAM-4290] proto changes to support staging_session_token URL: https://github.com/apache/beam/pull/5489#discussion_r192499216 ## File path: model/job-management/src/main/proto/beam_artifact_api.proto ## @@ -124,6 +127,8 @@ message PutArtifactResponse { message CommitManifestRequest { // (Required) The manifest to commit. Manifest manifest = 1; + // (Required) A token for artifact staging session. + string staging_session_token = 2; Review comment: Document where this token comes from? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 108176) Time Spent: 5h 40m (was: 5.5h) > ArtifactStagingService that stages to a distributed filesystem > -- > > Key: BEAM-4290 > URL: https://issues.apache.org/jira/browse/BEAM-4290 > Project: Beam > Issue Type: Sub-task > Components: runner-core >Reporter: Eugene Kirpichov >Assignee: Ankur Goenka >Priority: Major > Time Spent: 5h 40m > Remaining Estimate: 0h > > Using the job's staging directory from PipelineOptions. > Physical layout on the distributed filesystem is TBD but it should allow for > arbitrary filenames and ideally for eventually avoiding uploading artifacts > that are already there. > Handling credentials is TBD. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=108179=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108179 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 01/Jun/18 19:57 Start Date: 01/Jun/18 19:57 Worklog Time Spent: 10m Work Description: jkff commented on a change in pull request #5489: [BEAM-4290] proto changes to support staging_session_token URL: https://github.com/apache/beam/pull/5489#discussion_r192499781 ## File path: runners/reference/java/src/main/java/org/apache/beam/runners/reference/testing/TestJobService.java ## @@ -59,6 +59,7 @@ public void prepare( PrepareJobResponse.newBuilder() .setPreparationId(preparationId) .setArtifactStagingEndpoint(stagingEndpoint) +.setStagingSessionToken("TestStagingToken") Review comment: Ditto, does it need to be verified or only set? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 108179) Time Spent: 6h (was: 5h 50m) > ArtifactStagingService that stages to a distributed filesystem > -- > > Key: BEAM-4290 > URL: https://issues.apache.org/jira/browse/BEAM-4290 > Project: Beam > Issue Type: Sub-task > Components: runner-core >Reporter: Eugene Kirpichov >Assignee: Ankur Goenka >Priority: Major > Time Spent: 6h > Remaining Estimate: 0h > > Using the job's staging directory from PipelineOptions. > Physical layout on the distributed filesystem is TBD but it should allow for > arbitrary filenames and ideally for eventually avoiding uploading artifacts > that are already there. > Handling credentials is TBD. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=108177=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108177 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 01/Jun/18 19:57 Start Date: 01/Jun/18 19:57 Worklog Time Spent: 10m Work Description: jkff commented on a change in pull request #5489: [BEAM-4290] proto changes to support staging_session_token URL: https://github.com/apache/beam/pull/5489#discussion_r192499599 ## File path: runners/direct-java/src/main/java/org/apache/beam/runners/direct/portable/job/ReferenceRunnerJobService.java ## @@ -97,6 +97,7 @@ public void prepare( PrepareJobResponse.newBuilder() .setPreparationId(preparationId) .setArtifactStagingEndpoint(artifactStagingService.getApiServiceDescriptor()) + .setStagingSessionToken(tempDir.toFile().getAbsolutePath()) Review comment: Add a comment saying that we intentionally use the temp dir path as the staging token, and clarify who (what class) is going to interpret it this way? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 108177) Time Spent: 5h 50m (was: 5h 40m) > ArtifactStagingService that stages to a distributed filesystem > -- > > Key: BEAM-4290 > URL: https://issues.apache.org/jira/browse/BEAM-4290 > Project: Beam > Issue Type: Sub-task > Components: runner-core >Reporter: Eugene Kirpichov >Assignee: Ankur Goenka >Priority: Major > Time Spent: 5h 50m > Remaining Estimate: 0h > > Using the job's staging directory from PipelineOptions. > Physical layout on the distributed filesystem is TBD but it should allow for > arbitrary filenames and ideally for eventually avoiding uploading artifacts > that are already there. > Handling credentials is TBD. 
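A minimal sketch of what the review comment on ReferenceRunnerJobService asks to be documented, namely that the staging token is intentionally the temp dir path and that the staging service is the component interpreting it that way. The function name `stage_artifact` and the name check are hypothetical and written only for illustration; they are not taken from the Beam code base.

```python
import os
import tempfile


def stage_artifact(staging_session_token: str, name: str, data: bytes) -> str:
    """Writes one artifact under the directory named by the token.

    Assumption: the token is the absolute path of an existing directory,
    mirroring tempDir.toFile().getAbsolutePath() in the quoted diff.
    """
    staging_dir = os.path.abspath(staging_session_token)
    if not os.path.isdir(staging_dir):
        raise ValueError("unknown staging session: %r" % staging_session_token)
    # Reject names that would escape the staging directory.
    if os.path.basename(name) != name or name in ("", ".", ".."):
        raise ValueError("invalid artifact name: %r" % name)
    path = os.path.join(staging_dir, name)
    with open(path, "wb") as f:
        f.write(data)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as temp_dir:
        print(stage_artifact(temp_dir, "my-artifact", b"hello"))
```

Treating the token as an opaque string on the client side and interpreting it only inside the service keeps the "token is a path" convention in one place, which seems to be the point of the review request.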
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=108180=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108180 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 01/Jun/18 19:57 Start Date: 01/Jun/18 19:57 Worklog Time Spent: 10m Work Description: jkff commented on a change in pull request #5489: [BEAM-4290] proto changes to support staging_session_token URL: https://github.com/apache/beam/pull/5489#discussion_r192499671 ## File path: runners/direct-java/src/test/java/org/apache/beam/runners/direct/portable/artifact/LocalFileSystemArtifactStagerServiceTest.java ## @@ -88,7 +88,11 @@ public void singleDataPutArtifactSucceeds() throws Exception { String name = "my-artifact"; requestObserver.onNext( ArtifactApi.PutArtifactRequest.newBuilder() - .setMetadata(ArtifactApi.ArtifactMetadata.newBuilder().setName(name).build()) +.setMetadata( Review comment: In this file we only set the token - should we also verify it somewhere or it's not needed? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 108180) > ArtifactStagingService that stages to a distributed filesystem > -- > > Key: BEAM-4290 > URL: https://issues.apache.org/jira/browse/BEAM-4290 > Project: Beam > Issue Type: Sub-task > Components: runner-core >Reporter: Eugene Kirpichov >Assignee: Ankur Goenka >Priority: Major > Time Spent: 6h > Remaining Estimate: 0h > > Using the job's staging directory from PipelineOptions. > Physical layout on the distributed filesystem is TBD but it should allow for > arbitrary filenames and ideally for eventually avoiding uploading artifacts > that are already there. > Handling credentials is TBD. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=108172=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108172 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 01/Jun/18 19:57 Start Date: 01/Jun/18 19:57 Worklog Time Spent: 10m Work Description: jkff commented on a change in pull request #5489: [BEAM-4290] proto changes to support staging_session_token URL: https://github.com/apache/beam/pull/5489#discussion_r192499486 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ArtifactServiceStager.java ## @@ -145,7 +146,9 @@ public ArtifactMetadata get() throws Exception { StreamObserver requestObserver = stub.putArtifact(responseObserver); ArtifactMetadata metadata = ArtifactMetadata.newBuilder().setName(file.getStagingName()).build(); - requestObserver.onNext(PutArtifactRequest.newBuilder().setMetadata(metadata).build()); + PutArtifactMetadata putMetadata = PutArtifactMetadata.newBuilder().setMetadata(metadata) + .setStagingSessionToken("token").build(); Review comment: This looks like a dummy value in a non-test class: is this correct? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 108172) Time Spent: 5h (was: 4h 50m) > ArtifactStagingService that stages to a distributed filesystem > -- > > Key: BEAM-4290 > URL: https://issues.apache.org/jira/browse/BEAM-4290 > Project: Beam > Issue Type: Sub-task > Components: runner-core >Reporter: Eugene Kirpichov >Assignee: Ankur Goenka >Priority: Major > Time Spent: 5h > Remaining Estimate: 0h > > Using the job's staging directory from PipelineOptions. 
> Physical layout on the distributed filesystem is TBD but it should allow for > arbitrary filenames and ideally for eventually avoiding uploading artifacts > that are already there. > Handling credentials is TBD. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=108178=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108178 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 01/Jun/18 19:57 Start Date: 01/Jun/18 19:57 Worklog Time Spent: 10m Work Description: jkff commented on a change in pull request #5489: [BEAM-4290] proto changes to support staging_session_token URL: https://github.com/apache/beam/pull/5489#discussion_r192499706 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/jobsubmission/InMemoryJobService.java ## @@ -114,6 +114,8 @@ public void prepare( .newBuilder() .setPreparationId(preparationId) .setArtifactStagingEndpoint(stagingServiceDescriptor) + // TODO: Pass the correct token for staging. Review comment: Address this - what would be the correct token? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 108178) > ArtifactStagingService that stages to a distributed filesystem > -- > > Key: BEAM-4290 > URL: https://issues.apache.org/jira/browse/BEAM-4290 > Project: Beam > Issue Type: Sub-task > Components: runner-core >Reporter: Eugene Kirpichov >Assignee: Ankur Goenka >Priority: Major > Time Spent: 5h 50m > Remaining Estimate: 0h > > Using the job's staging directory from PipelineOptions. > Physical layout on the distributed filesystem is TBD but it should allow for > arbitrary filenames and ideally for eventually avoiding uploading artifacts > that are already there. > Handling credentials is TBD. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=108181=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108181 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 01/Jun/18 19:57 Start Date: 01/Jun/18 19:57 Worklog Time Spent: 10m Work Description: jkff commented on a change in pull request #5489: [BEAM-4290] proto changes to support staging_session_token URL: https://github.com/apache/beam/pull/5489#discussion_r192499852 ## File path: sdks/python/apache_beam/runners/portability/local_job_service.py ## @@ -87,7 +87,9 @@ def Prepare(self, request, context=None): use_grpc=self._use_grpc, sdk_harness_factory=sdk_harness_factory) logging.debug("Prepared job '%s' as '%s'", request.job_name, preparation_id) -return beam_job_api_pb2.PrepareJobResponse(preparation_id=preparation_id) +# TODO(angoenka): Pass an appropriate staging_session_token Review comment: Ditto This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 108181) > ArtifactStagingService that stages to a distributed filesystem > -- > > Key: BEAM-4290 > URL: https://issues.apache.org/jira/browse/BEAM-4290 > Project: Beam > Issue Type: Sub-task > Components: runner-core >Reporter: Eugene Kirpichov >Assignee: Ankur Goenka >Priority: Major > Time Spent: 6h > Remaining Estimate: 0h > > Using the job's staging directory from PipelineOptions. > Physical layout on the distributed filesystem is TBD but it should allow for > arbitrary filenames and ideally for eventually avoiding uploading artifacts > that are already there. > Handling credentials is TBD. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=108173=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108173 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 01/Jun/18 19:57 Start Date: 01/Jun/18 19:57 Worklog Time Spent: 10m Work Description: jkff commented on a change in pull request #5489: [BEAM-4290] proto changes to support staging_session_token URL: https://github.com/apache/beam/pull/5489#discussion_r192499261 ## File path: model/job-management/src/main/proto/beam_job_api.proto ## @@ -69,12 +69,16 @@ message PrepareJobRequest { message PrepareJobResponse { // (required) The ID used to associate calls made while preparing the job. preparationId is used - // to run the job, as well as in other pre-execution APIs such as Artifact staging. + // to run the job. string preparation_id = 1; // An endpoint which exposes the Beam Artifact Staging API. Artifacts used by the job should be // staged to this endpoint, and will be available during job execution. org.apache.beam.model.pipeline.v1.ApiServiceDescriptor artifact_staging_endpoint = 2; + + // (required) Token for the artifact staging. This token also represent an artifact Review comment: Ditto This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 108173) Time Spent: 5h 10m (was: 5h) > ArtifactStagingService that stages to a distributed filesystem > -- > > Key: BEAM-4290 > URL: https://issues.apache.org/jira/browse/BEAM-4290 > Project: Beam > Issue Type: Sub-task > Components: runner-core >Reporter: Eugene Kirpichov >Assignee: Ankur Goenka >Priority: Major > Time Spent: 5h 10m > Remaining Estimate: 0h > > Using the job's staging directory from PipelineOptions. 
> Physical layout on the distributed filesystem is TBD but it should allow for > arbitrary filenames and ideally for eventually avoiding uploading artifacts > that are already there. > Handling credentials is TBD. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=108175=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108175 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 01/Jun/18 19:57 Start Date: 01/Jun/18 19:57 Worklog Time Spent: 10m Work Description: jkff commented on a change in pull request #5489: [BEAM-4290] proto changes to support staging_session_token URL: https://github.com/apache/beam/pull/5489#discussion_r192500128 ## File path: sdks/python/apache_beam/runners/portability/portable_stager_test.py ## @@ -64,7 +64,9 @@ def _stage_files(self, files): test_port = server.add_insecure_port('[::]:0') server.start() stager = portable_stager.PortableStager( -grpc.insecure_channel('localhost:%s' % test_port)) +artifact_service_channel=grpc.insecure_channel( +'localhost:%s' % test_port), +staging_session_token='token') Review comment: Ditto - verify that it's present in the requests This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 108175) Time Spent: 5.5h (was: 5h 20m) > ArtifactStagingService that stages to a distributed filesystem > -- > > Key: BEAM-4290 > URL: https://issues.apache.org/jira/browse/BEAM-4290 > Project: Beam > Issue Type: Sub-task > Components: runner-core >Reporter: Eugene Kirpichov >Assignee: Ankur Goenka >Priority: Major > Time Spent: 5.5h > Remaining Estimate: 0h > > Using the job's staging directory from PipelineOptions. > Physical layout on the distributed filesystem is TBD but it should allow for > arbitrary filenames and ideally for eventually avoiding uploading artifacts > that are already there. > Handling credentials is TBD. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
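A rough sketch of the kind of check requested for portable_stager_test.py, i.e. verifying that the token actually shows up in the requests instead of only being set. `RecordingArtifactService` and `stage_files` are hypothetical stand-ins written for this illustration; the real test would go through the gRPC servicer and `PortableStager`, whose internals are not shown in this thread.

```python
class RecordingArtifactService:
    """Hypothetical in-memory fake that records the tokens it receives."""

    def __init__(self):
        self.put_tokens = []
        self.commit_tokens = []

    def put_artifact(self, staging_session_token, name, data):
        self.put_tokens.append(staging_session_token)

    def commit_manifest(self, staging_session_token, names):
        self.commit_tokens.append(staging_session_token)


def stage_files(service, staging_session_token, files):
    """Stand-in for a stager: one put per file, then a single commit."""
    for name, data in files.items():
        service.put_artifact(staging_session_token, name, data)
    service.commit_manifest(staging_session_token, sorted(files))


def check_token_is_forwarded():
    service = RecordingArtifactService()
    stage_files(service, "token", {"a.txt": b"a", "b.txt": b"b"})
    # Every put and the final commit should carry the same token.
    assert service.put_tokens == ["token", "token"]
    assert service.commit_tokens == ["token"]


if __name__ == "__main__":
    check_token_is_forwarded()
```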
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=107937=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-107937 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 31/May/18 23:25 Start Date: 31/May/18 23:25 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5489: [BEAM-4290] proto changes to support artifact_staging_id URL: https://github.com/apache/beam/pull/5489#discussion_r192264472 ## File path: sdks/go/gogradle.lock ## @@ -200,7 +200,7 @@ dependencies: - "g...@github.com:golang/protobuf.git" vcs: "git" name: "github.com/golang/protobuf" -commit: "bbd03ef6da3a115852eaf24c8a1c46aeb39aa175" +commit: "3a3da3a4e26776cc22a79ef46d5d58477532dede" Review comment: The new proto generator is incompatible with the old golang/protobuf. Reference discussion: https://github.com/google/protobuf/issues/4582 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 107937) Time Spent: 4h 50m (was: 4h 40m) > ArtifactStagingService that stages to a distributed filesystem > -- > > Key: BEAM-4290 > URL: https://issues.apache.org/jira/browse/BEAM-4290 > Project: Beam > Issue Type: Sub-task > Components: runner-core >Reporter: Eugene Kirpichov >Assignee: Ankur Goenka >Priority: Major > Time Spent: 4h 50m > Remaining Estimate: 0h > > Using the job's staging directory from PipelineOptions. > Physical layout on the distributed filesystem is TBD but it should allow for > arbitrary filenames and ideally for eventually avoiding uploading artifacts > that are already there. > Handling credentials is TBD. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=107936=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-107936 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 31/May/18 23:25 Start Date: 31/May/18 23:25 Worklog Time Spent: 10m Work Description: angoenka commented on issue #5489: [BEAM-4290] proto changes to support artifact_staging_id URL: https://github.com/apache/beam/pull/5489#issuecomment-393713259 The PR is ready for review. PTAL. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 107936) Time Spent: 4h 40m (was: 4.5h) > ArtifactStagingService that stages to a distributed filesystem > -- > > Key: BEAM-4290 > URL: https://issues.apache.org/jira/browse/BEAM-4290 > Project: Beam > Issue Type: Sub-task > Components: runner-core >Reporter: Eugene Kirpichov >Assignee: Ankur Goenka >Priority: Major > Time Spent: 4h 40m > Remaining Estimate: 0h > > Using the job's staging directory from PipelineOptions. > Physical layout on the distributed filesystem is TBD but it should allow for > arbitrary filenames and ideally for eventually avoiding uploading artifacts > that are already there. > Handling credentials is TBD. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=107911=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-107911 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 31/May/18 22:10 Start Date: 31/May/18 22:10 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5489: [BEAM-4290] proto changes to support artifact_staging_id URL: https://github.com/apache/beam/pull/5489#discussion_r192252323
## File path: model/job-management/src/main/proto/beam_artifact_api.proto ##
@@ -102,13 +99,19 @@ message ArtifactChunk {
   bytes data = 1;
 }
 
+message PutArtifactMetadata {
+  // (Required) An identifier for artifact staging session.
+  string artifact_staging_id = 1;
+  // (Required) The Artifact metadata.
+  ArtifactMetadata metadata = 2;
+}
+
 // A request to stage an artifact.
 message PutArtifactRequest {
   // (Required)
   oneof content {
-    // The Artifact metadata. The first message in a PutArtifact call must contain the name
-    // of the artifact.
-    ArtifactMetadata metadata = 1;
+    // The first message in a PutArtifact call must contain this field.
Review comment: The structure of the proto makes it difficult to pass an additional field there. To pass an additional field in PutArtifactRequest we would have to do something like this:
```
message PutArtifactRequest {
  // (Required)
  oneof content {
    // The staging session token. The FIRST message in a PutArtifact call must contain
    // this token.
    string staging_session_token = 1;
    // The Artifact metadata. The SECOND message in a PutArtifact call must contain the name
    // of the artifact.
    ArtifactMetadata metadata = 2;
    // A chunk of the artifact. All messages after the second in a PutArtifact call must contain a
    // chunk.
    ArtifactChunk data = 3;
  }
}
```
To avoid this sequencing of fields, I prefer to add a separate message that is passed in the first request.
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 107911) Time Spent: 4.5h (was: 4h 20m) > ArtifactStagingService that stages to a distributed filesystem > -- > > Key: BEAM-4290 > URL: https://issues.apache.org/jira/browse/BEAM-4290 > Project: Beam > Issue Type: Sub-task > Components: runner-core >Reporter: Eugene Kirpichov >Assignee: Ankur Goenka >Priority: Major > Time Spent: 4.5h > Remaining Estimate: 0h > > Using the job's staging directory from PipelineOptions. > Physical layout on the distributed filesystem is TBD but it should allow for > arbitrary filenames and ideally for eventually avoiding uploading artifacts > that are already there. > Handling credentials is TBD. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
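To make the trade-off in the comment above concrete, here is a small sketch of the server-side check implied by the "separate message in the first request" design: the first request carries the metadata together with the token, and every later request carries a chunk. The dataclasses are simplified stand-ins for the proposed proto messages, not the generated protobuf classes, and the artifact name is flattened into `PutArtifactMetadata` for brevity; `read_put_artifact_stream` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class PutArtifactMetadata:
    """Stand-in for the proposed message (token plus artifact name)."""
    staging_session_token: str
    name: str


@dataclass
class PutArtifactRequest:
    """Stand-in for the oneof: exactly one of metadata / data is set."""
    metadata: Optional[PutArtifactMetadata] = None
    data: Optional[bytes] = None


def read_put_artifact_stream(
        requests: Iterable[PutArtifactRequest]) -> Tuple[str, str, bytes]:
    """Returns (token, artifact name, payload) or raises ValueError."""
    it = iter(requests)
    first = next(it, None)
    if first is None or first.metadata is None:
        raise ValueError("first message must carry PutArtifactMetadata")
    if not first.metadata.staging_session_token:
        raise ValueError("staging_session_token is required")
    chunks = []
    for request in it:
        if request.data is None:
            raise ValueError("messages after the first must carry a chunk")
        chunks.append(request.data)
    return (first.metadata.staging_session_token,
            first.metadata.name,
            b"".join(chunks))
```

With the token as its own oneof case, the same loop would need an extra state for "token seen, metadata not yet seen", which is the sequencing the comment wants to avoid.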
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=107910=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-107910 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 31/May/18 22:09 Start Date: 31/May/18 22:09 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5489: [BEAM-4290] proto changes to support artifact_staging_id URL: https://github.com/apache/beam/pull/5489#discussion_r192252323
## File path: model/job-management/src/main/proto/beam_artifact_api.proto ##
@@ -102,13 +99,19 @@ message ArtifactChunk {
   bytes data = 1;
 }
 
+message PutArtifactMetadata {
+  // (Required) An identifier for artifact staging session.
+  string artifact_staging_id = 1;
+  // (Required) The Artifact metadata.
+  ArtifactMetadata metadata = 2;
+}
+
 // A request to stage an artifact.
 message PutArtifactRequest {
   // (Required)
   oneof content {
-    // The Artifact metadata. The first message in a PutArtifact call must contain the name
-    // of the artifact.
-    ArtifactMetadata metadata = 1;
+    // The first message in a PutArtifact call must contain this field.
Review comment: The structure of the proto makes it difficult to pass an additional field there. To pass an additional field in PutArtifactRequest we would have to do something like this:
```
message PutArtifactRequest {
  // (Required)
  oneof content {
    // The staging session token. The FIRST message in a PutArtifact call must contain
    // this token.
    string staging_session_token = 1;
    // The Artifact metadata. The SECOND message in a PutArtifact call must contain the name
    // of the artifact.
    ArtifactMetadata metadata = 2;
    // A chunk of the artifact. All messages after the second in a PutArtifact call must contain a
    // chunk.
    ArtifactChunk data = 3;
  }
}
```
This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 107910) Time Spent: 4h 20m (was: 4h 10m) > ArtifactStagingService that stages to a distributed filesystem > -- > > Key: BEAM-4290 > URL: https://issues.apache.org/jira/browse/BEAM-4290 > Project: Beam > Issue Type: Sub-task > Components: runner-core >Reporter: Eugene Kirpichov >Assignee: Ankur Goenka >Priority: Major > Time Spent: 4h 20m > Remaining Estimate: 0h > > Using the job's staging directory from PipelineOptions. > Physical layout on the distributed filesystem is TBD but it should allow for > arbitrary filenames and ideally for eventually avoiding uploading artifacts > that are already there. > Handling credentials is TBD. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=107877=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-107877 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 31/May/18 20:49 Start Date: 31/May/18 20:49 Worklog Time Spent: 10m Work Description: jkff commented on a change in pull request #5489: [BEAM-4290] proto changes to support artifact_staging_id URL: https://github.com/apache/beam/pull/5489#discussion_r192168198 ## File path: model/job-management/src/main/proto/beam_artifact_api.proto ## @@ -102,13 +99,19 @@ message ArtifactChunk { bytes data = 1; } +message PutArtifactMetadata { + // (Required) An identifier for artifact staging session. + string artifact_staging_id = 1; + // (Required) The Artifact metadata. + ArtifactMetadata metadata = 2; +} + // A request to stage an artifact. message PutArtifactRequest { // (Required) oneof content { -// The Artifact metadata. The first message in a PutArtifact call must contain the name -// of the artifact. -ArtifactMetadata metadata = 1; +// The first message in a PutArtifact call must contain this field. Review comment: Any reason not to put the staging session token as a top-level field in PutArtifactRequest, instead of adding the new message PutArtifactMetadata? The latter feels confusing since the session token is not really associated with any particular artifact. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 107877) Time Spent: 4h 10m (was: 4h) > ArtifactStagingService that stages to a distributed filesystem > -- > > Key: BEAM-4290 > URL: https://issues.apache.org/jira/browse/BEAM-4290 > Project: Beam > Issue Type: Sub-task > Components: runner-core >Reporter: Eugene Kirpichov >Assignee: Ankur Goenka >Priority: Major > Time Spent: 4h 10m > Remaining Estimate: 0h > > Using the job's staging directory from PipelineOptions. > Physical layout on the distributed filesystem is TBD but it should allow for > arbitrary filenames and ideally for eventually avoiding uploading artifacts > that are already there. > Handling credentials is TBD. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=107733=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-107733 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 31/May/18 16:15 Start Date: 31/May/18 16:15 Worklog Time Spent: 10m Work Description: herohde commented on a change in pull request #5489: [BEAM-4290] proto changes to support artifact_staging_id URL: https://github.com/apache/beam/pull/5489#discussion_r192155816 ## File path: model/job-management/src/main/proto/beam_artifact_api.proto ## @@ -102,13 +99,19 @@ message ArtifactChunk { bytes data = 1; } +message PutArtifactMetadata { + // (Required) An identifier for artifact staging session. + string artifact_staging_id = 1; Review comment: LGTM This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. Issue Time Tracking --- Worklog Id: (was: 107733) Time Spent: 4h (was: 3h 50m)
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=107403=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-107403 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 30/May/18 22:28 Start Date: 30/May/18 22:28 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5489: [BEAM-4290] proto changes to support artifact_staging_id URL: https://github.com/apache/beam/pull/5489#discussion_r191942404 ## File path: model/job-management/src/main/proto/beam_artifact_api.proto ## @@ -102,13 +99,19 @@ message ArtifactChunk { bytes data = 1; } +message PutArtifactMetadata { + // (Required) An identifier for artifact staging session. + string artifact_staging_id = 1; Review comment: Just to reiterate on naming: I don't have a strong preference between `artifact_staging_id` and `staging_session_token`, but I will go ahead with `staging_session_token`. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. Issue Time Tracking --- Worklog Id: (was: 107403) Time Spent: 3h 50m (was: 3h 40m)
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=107402=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-107402 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 30/May/18 22:25 Start Date: 30/May/18 22:25 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5489: [BEAM-4290] proto changes to support artifact_staging_id URL: https://github.com/apache/beam/pull/5489#discussion_r191941805 ## File path: model/job-management/src/main/proto/beam_job_api.proto ## @@ -69,12 +69,16 @@ message PrepareJobRequest { message PrepareJobResponse { // (required) The ID used to associate calls made while preparing the job. preparationId is used - // to run the job, as well as in other pre-execution APIs such as Artifact staging. + // to run the job. string preparation_id = 1; // An endpoint which exposes the Beam Artifact Staging API. Artifacts used by the job should be // staged to this endpoint, and will be available during job execution. org.apache.beam.model.pipeline.v1.ApiServiceDescriptor artifact_staging_endpoint = 2; Review comment: Had an offline discussion with @herohde. Enhancing `ApiServiceDescriptor` is certainly something to consider, but it's out of scope for this PR. For this PR we are going with approach 3 mentioned in the document: https://docs.google.com/document/d/12zNk3O2nhTB8Zmxw5U78qXrvlk5r42X8tqF248IDlpI/edit#heading=h.mvxjskcybk6q This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 107402) Time Spent: 3h 40m (was: 3.5h)
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=107392=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-107392 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 30/May/18 21:41 Start Date: 30/May/18 21:41 Worklog Time Spent: 10m Work Description: axelmagn commented on a change in pull request #5489: [BEAM-4290] proto changes to support artifact_staging_id URL: https://github.com/apache/beam/pull/5489#discussion_r191932190 ## File path: model/job-management/src/main/proto/beam_artifact_api.proto ## @@ -102,13 +99,19 @@ message ArtifactChunk { bytes data = 1; } +message PutArtifactMetadata { + // (Required) An identifier for artifact staging session. + string artifact_staging_id = 1; Review comment: I see. I guess I didn't realize it was that widespread. In that case I retract my objections, since you're right that we should follow the prevailing style. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. Issue Time Tracking --- Worklog Id: (was: 107392) Time Spent: 3.5h (was: 3h 20m)
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=107348=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-107348 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 30/May/18 20:02 Start Date: 30/May/18 20:02 Worklog Time Spent: 10m Work Description: lukecwik commented on a change in pull request #5489: [BEAM-4290] proto changes to support artifact_staging_id URL: https://github.com/apache/beam/pull/5489#discussion_r191904156 ## File path: model/job-management/src/main/proto/beam_artifact_api.proto ## @@ -102,13 +99,19 @@ message ArtifactChunk { bytes data = 1; } +message PutArtifactMetadata { + // (Required) An identifier for artifact staging session. + string artifact_staging_id = 1; Review comment: We have a bunch of places where there is a token/id and a runner may choose to put something there. Migrating to a struct for all the tokens is a valid discussion to have, but I feel it should be separate from this PR, as this PR already copies existing behavior when it comes to tokens. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. Issue Time Tracking --- Worklog Id: (was: 107348) Time Spent: 3h 20m (was: 3h 10m)
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=107342=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-107342 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 30/May/18 19:44 Start Date: 30/May/18 19:44 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5489: [BEAM-4290] proto changes to support artifact_staging_id URL: https://github.com/apache/beam/pull/5489#discussion_r191898873 ## File path: model/job-management/src/main/proto/beam_job_api.proto ## @@ -69,12 +69,16 @@ message PrepareJobRequest { message PrepareJobResponse { // (required) The ID used to associate calls made while preparing the job. preparationId is used - // to run the job, as well as in other pre-execution APIs such as Artifact staging. + // to run the job. string preparation_id = 1; // An endpoint which exposes the Beam Artifact Staging API. Artifacts used by the job should be // staged to this endpoint, and will be available during job execution. org.apache.beam.model.pipeline.v1.ApiServiceDescriptor artifact_staging_endpoint = 2; Review comment: Thanks for the link. I agree with @lukecwik on adding more information to `ApiServiceDescriptor` as ApiServiceDescriptor is meant to hold all the relevant connection information. In that case it is fair to pack headers for connections in `ApiServiceDescriptor` which the framework should simply pass. Enhancing `ApiServiceDescriptor` to have headers (for now and credentials etc later) is a more suitable approach. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 107342) Time Spent: 3h 10m (was: 3h)
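A rough sketch of what "packing headers into ApiServiceDescriptor" from the comment above could look like. The message below is a simplified, hypothetical stand-in rather than the real org.apache.beam.model.pipeline.v1.ApiServiceDescriptor; the package name and the `headers` field and its number are assumptions for illustration only.

```proto
syntax = "proto3";

package sketch.endpoints;  // hypothetical package, not Beam's

// Simplified, hypothetical stand-in for ApiServiceDescriptor.
message ApiServiceDescriptorSketch {
  // Address of the service to connect to.
  string url = 1;

  // Hypothetical: opaque key/value pairs chosen by whoever hands out the
  // descriptor. The framework would attach them unchanged to every call it
  // makes to `url`, without interpreting them. A staging session identifier
  // could travel here instead of inside each request message.
  map<string, string> headers = 2;
}
```

The design point is that the caller forwards the headers blindly, so a service can add or change what it needs without another proto change on the request messages.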
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=107337=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-107337 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 30/May/18 19:36 Start Date: 30/May/18 19:36 Worklog Time Spent: 10m Work Description: axelmagn commented on a change in pull request #5489: [BEAM-4290] proto changes to support artifact_staging_id URL: https://github.com/apache/beam/pull/5489#discussion_r191896270 ## File path: model/job-management/src/main/proto/beam_artifact_api.proto ## @@ -102,13 +99,19 @@ message ArtifactChunk { bytes data = 1; } +message PutArtifactMetadata { + // (Required) An identifier for artifact staging session. + string artifact_staging_id = 1; Review comment: Okay. Understandable. Have we considered using an anonymous Struct for implementation-specific metadata, so that we can decouple it from the identifier? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. Issue Time Tracking --- Worklog Id: (was: 107337) Time Spent: 3h (was: 2h 50m)
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=107336=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-107336 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 30/May/18 19:34 Start Date: 30/May/18 19:34 Worklog Time Spent: 10m Work Description: herohde commented on a change in pull request #5489: [BEAM-4290] proto changes to support artifact_staging_id URL: https://github.com/apache/beam/pull/5489#discussion_r191895524 ## File path: model/job-management/src/main/proto/beam_job_api.proto ## @@ -69,12 +69,16 @@ message PrepareJobRequest { message PrepareJobResponse { // (required) The ID used to associate calls made while preparing the job. preparationId is used - // to run the job, as well as in other pre-execution APIs such as Artifact staging. + // to run the job. string preparation_id = 1; // An endpoint which exposes the Beam Artifact Staging API. Artifacts used by the job should be // staged to this endpoint, and will be available during job execution. org.apache.beam.model.pipeline.v1.ApiServiceDescriptor artifact_staging_endpoint = 2; Review comment: Sorry. Forgot to link to it: https://github.com/apache/beam/pull/5349#discussion_r190624823. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. Issue Time Tracking --- Worklog Id: (was: 107336) Time Spent: 2h 50m (was: 2h 40m)
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=107333=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-107333 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 30/May/18 19:19 Start Date: 30/May/18 19:19 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5489: [BEAM-4290] proto changes to support artifact_staging_id URL: https://github.com/apache/beam/pull/5489#discussion_r191891215 ## File path: model/job-management/src/main/proto/beam_artifact_api.proto ## @@ -102,13 +99,19 @@ message ArtifactChunk { bytes data = 1; } +message PutArtifactMetadata { + // (Required) An identifier for artifact staging session. + string artifact_staging_id = 1; Review comment: Given the diversity of ArtifactStagingService implementations, it's really hard to capture all the required metadata. One thing we are trying to do in the proto is to avoid enforcing any implementation details. From the proto's perspective, artifact_staging_id/token is just a text string. It can very well be just an id, and the implementation of the service can look up that id in another system, or it can be JSON, so that the implementation can extract all the relevant information from the token without referring to another system. This gives implementers the flexibility to build the service the way they want. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 107333) Time Spent: 2h 40m (was: 2.5h)
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=107332=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-107332 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 30/May/18 19:13 Start Date: 30/May/18 19:13 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5489: [BEAM-4290] proto changes to support artifact_staging_id URL: https://github.com/apache/beam/pull/5489#discussion_r191889464 ## File path: model/job-management/src/main/proto/beam_job_api.proto ## @@ -69,12 +69,16 @@ message PrepareJobRequest { message PrepareJobResponse { // (required) The ID used to associate calls made while preparing the job. preparationId is used - // to run the job, as well as in other pre-execution APIs such as Artifact staging. + // to run the job. string preparation_id = 1; // An endpoint which exposes the Beam Artifact Staging API. Artifacts used by the job should be // staged to this endpoint, and will be available during job execution. org.apache.beam.model.pipeline.v1.ApiServiceDescriptor artifact_staging_endpoint = 2; Review comment: Sorry, I don't have context on that PR. Does the client_id refer to a common id shared across all gRPC connections to identify the client, or is it a service-specific id so that the service can identify the client connected to it? I think if it is the former, then we can pack all the relevant info into that id. However, I like the idea of having an explicit token for artifact staging. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 107332) Time Spent: 2.5h (was: 2h 20m)
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=107331=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-107331 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 30/May/18 19:11 Start Date: 30/May/18 19:11 Worklog Time Spent: 10m Work Description: axelmagn commented on a change in pull request #5489: [BEAM-4290] proto changes to support artifact_staging_id URL: https://github.com/apache/beam/pull/5489#discussion_r191888360 ## File path: model/job-management/src/main/proto/beam_artifact_api.proto ## @@ -102,13 +99,19 @@ message ArtifactChunk { bytes data = 1; } +message PutArtifactMetadata { + // (Required) An identifier for artifact staging session. + string artifact_staging_id = 1; Review comment: The discussion document alludes to the following items to be specified by the job service and passed to the staging service: - Base directory to put artifacts in. - TTL for the artifacts. - Authentication to submit artifacts. - Credentials to store artifacts in distributed file system. Of these, how much is going to fit into the `artifact_staging_id`? Since this is already a metadata proto, why are we packing data into a string to be parsed later? Is there a particular parser for it that we already have an implementation for? If so, we need to document it as such. Otherwise I'd recommend that any metadata contained within the `artifact_staging_id` should be made explicit as fields in `PutArtifactMetadata`. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 107331) Time Spent: 2h 20m (was: 2h 10m)
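To make the recommendation in the comment above concrete, here is a minimal sketch of PutArtifactMetadata with the four listed items as explicit fields instead of being packed into a single string. Every field name, type, and number below is a hypothetical choice for illustration, as are the package name and the trimmed-down ArtifactMetadata; none of this is taken from the PR.

```proto
syntax = "proto3";

package sketch.artifact;  // hypothetical package, not Beam's

import "google/protobuf/duration.proto";

// Simplified stand-in for the real ArtifactMetadata message.
message ArtifactMetadata {
  string name = 1;
}

// Hypothetical variant of PutArtifactMetadata with explicit session fields.
message PutArtifactMetadataSketch {
  // Base directory to put artifacts in.
  string base_directory = 1;

  // TTL for the artifacts.
  google.protobuf.Duration ttl = 2;

  // Authentication to submit artifacts.
  string submit_auth_token = 3;

  // Credentials to store artifacts in the distributed file system.
  bytes filesystem_credentials = 4;

  // (Required) The Artifact metadata.
  ArtifactMetadata metadata = 5;
}
```

The trade-off discussed in the thread is visible here: explicit fields are self-documenting, but they commit every ArtifactStagingService implementation to the same set of fields, whereas an opaque string leaves that choice to each implementation.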
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=106967=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-106967 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 30/May/18 01:24 Start Date: 30/May/18 01:24 Worklog Time Spent: 10m Work Description: herohde commented on a change in pull request #5489: [BEAM-4290] proto changes to support artifact_staging_id URL: https://github.com/apache/beam/pull/5489#discussion_r191617046 ## File path: model/job-management/src/main/proto/beam_job_api.proto ## @@ -69,12 +69,16 @@ message PrepareJobRequest { message PrepareJobResponse { // (required) The ID used to associate calls made while preparing the job. preparationId is used - // to run the job, as well as in other pre-execution APIs such as Artifact staging. + // to run the job. string preparation_id = 1; // An endpoint which exposes the Beam Artifact Staging API. Artifacts used by the job should be // staged to this endpoint, and will be available during job execution. org.apache.beam.model.pipeline.v1.ApiServiceDescriptor artifact_staging_endpoint = 2; Review comment: @lukecwik suggested in a separate PR to add a client_id to the ApiServiceDescriptor proto and always send that as a header. Would that ID be suitable for an artifact_staging_id separate from preparation_id? Then no other changes would be needed (except fixing the comment above) and the propagation would be done by general purpose logic. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 106967) Time Spent: 2h 10m (was: 2h)
[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem
[ https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=106957=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-106957 ] ASF GitHub Bot logged work on BEAM-4290: Author: ASF GitHub Bot Created on: 30/May/18 00:07 Start Date: 30/May/18 00:07 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #5489: [BEAM-4290] proto changes to support artifact_staging_id URL: https://github.com/apache/beam/pull/5489#discussion_r191610707 ## File path: model/job-management/src/main/proto/beam_artifact_api.proto ## @@ -102,13 +99,19 @@ message ArtifactChunk { bytes data = 1; } +message PutArtifactMetadata { + // (Required) An identifier for artifact staging session. + string artifact_staging_id = 1; Review comment: It makes sense. And thanks for giving it a better name. I will go with `artifact_staging_id -> staging_session_token` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. Issue Time Tracking --- Worklog Id: (was: 106957) Time Spent: 1h 50m (was: 1h 40m)