[GitHub] [flink] xintongsong commented on pull request #15599: [FLINK-11838][flink-gs-fs-hadoop] Create Google Storage file system with recoverable writer support
xintongsong commented on pull request #15599:
URL: https://github.com/apache/flink/pull/15599#issuecomment-1057574820

> @xintongsong How does this look? https://issues.apache.org/jira/browse/FLINK-26451

Perfect!

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
xintongsong commented on pull request #15599:
URL: https://github.com/apache/flink/pull/15599#issuecomment-1056063900

@galenwarren, yes, I think we should create one asap. Sorry, I should have brought this up earlier. I haven't been closely following the release progress these days, having been distracted by some other work.

A manual testing issue should provide enough information for an arbitrary third party to test the feature:
- Links to the documentation
- Suggestions on how to verify the feature
- In our case, we may want to add a notice that a Google Cloud / Storage account is needed for the testing

Moreover, in order for the issue to be properly tracked, the following fields should be set:
- Fix Version/s: 1.15.0
- Priority: Blocker
- Labels: release-testing

You may find examples [here](https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20fixVersion%20%3D%201.15.0%20AND%20labels%20%3D%20release-testing%20ORDER%20BY%20priority%2C%20Rank%20ASC).
xintongsong commented on pull request #15599:
URL: https://github.com/apache/flink/pull/15599#issuecomment-1019700632

@galenwarren, sorry for the late response. I was OoO the past few days.

> The change I'd propose to help @sap1ens (or anyone else who used the pre-released code) would be to change 0 to 1 in the following lines in GSCommitRecoverableSerializer and GSResumeRecoverableSerializer

We don't usually provide compatibility guarantees for pre-released code. However, in this specific case, I see barely any cost in making life easier for users of the pre-released code. So +1 for changing the serializer version. I've already opened a hotfix PR #18409 for including `flink-gs-fs-hadoop` in `flink-dist`, and I can change the serializer version there.

Concerning the authentication, do you have an idea how much effort the second approach may require?
- I think it would be ideal if we could fall back to `core-site.xml` when it contains the credentials AND `GOOGLE_APPLICATION_CREDENTIALS` is not specified. This would be my top preference, if feasible.
- If the above approach requires too much effort / time (as we are approaching the feature freeze date), I'd also be fine with documenting this as a known limitation (that `core-site.xml` does not work with the `RecoverableWriter` part) and addressing it in a follow-up issue in the next release cycle.
- I would suggest not excluding the `core-site.xml` approach from the documentation unless we decide to deprecate / remove it for good. Otherwise, users might be confused about whether the approach is still available. And as far as I can see, there are not yet strong reasons for making such an incompatible change.
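The serializer-version change discussed above can be sketched as follows. This is a minimal, stdlib-only illustration — the class name and payload layout are hypothetical, not the actual `GSCommitRecoverableSerializer` code. The point it shows: bump the current version while still accepting payloads written under the pre-release version 0, so state written by the pre-released code stays readable.

```java
import java.io.IOException;
import java.nio.ByteBuffer;

// Hypothetical sketch of a version-tolerant serializer; not Flink's actual code.
class VersionedPositionSerializer {
    static final int CURRENT_VERSION = 1; // bumped from 0 for pre-release users

    int getVersion() {
        return CURRENT_VERSION;
    }

    // The payload layout is unchanged between versions; only the version number moved.
    byte[] serialize(long position) {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
        buf.putLong(position);
        return buf.array();
    }

    // Accept both the old (0) and new (1) versions, since the wire format is identical.
    long deserialize(int version, byte[] bytes) throws IOException {
        switch (version) {
            case 0: // pre-released format, same layout
            case 1:
                return ByteBuffer.wrap(bytes).getLong();
            default:
                throw new IOException("Unknown serializer version: " + version);
        }
    }
}
```

With this shape, a reader handed old version-0 state deserializes it exactly as version-1 state, and genuinely unknown versions still fail loudly.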
xintongsong commented on pull request #15599:
URL: https://github.com/apache/flink/pull/15599#issuecomment-1017058084

@sap1ens, thanks for the notice, and sorry that your previous comment was overlooked among the many others. I think you're correct that this should be included in `flink-dist`. I'll fix it with a hotfix commit.
xintongsong commented on pull request #15599:
URL: https://github.com/apache/flink/pull/15599#issuecomment-1017032455

Thanks @galenwarren, merging this.
xintongsong commented on pull request #15599:
URL: https://github.com/apache/flink/pull/15599#issuecomment-1012679021

@galenwarren, the commit message doesn't need to be too detailed, as more details can be found via the JIRA id. Something that briefly describes the major changes would be good enough.
xintongsong commented on pull request #15599:
URL: https://github.com/apache/flink/pull/15599#issuecomment-1008252314

@galenwarren, thanks for updating the PR. It looks very close to a mergeable state, and the tests look really good to me. I have only a few minor comments, mostly relating to the exposure of internal implementation state for testing.

Let me try to list the remaining actions before merging this PR:
- Address the last comments.
- Clean up the commit history. (This can be as simple as squashing everything into one big commit with a proper descriptive message.)
- Wait a bit longer on the compiling issue, to see if @zentol has any insight when he comes back.

Concerning the documentation, we definitely need it for this new feature. It would be even better to revisit the documentation for the GCS connectivity as a whole: the [existing GCS page](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/filesystems/gcs/), FLINK-25577, and this new recoverable writer. I'd suggest working on the documentation in a separate JIRA, as this PR is already quite large. We would need to get the documentation ready by the feature freeze (Feb 6th), as required for all new features in the 1.15 release, so that people can start testing the features as soon as the freeze begins.
xintongsong commented on pull request #15599:
URL: https://github.com/apache/flink/pull/15599#issuecomment-1007069020

We can for sure proceed with the code review. I'll give it another pass this weekend.
xintongsong commented on pull request #15599:
URL: https://github.com/apache/flink/pull/15599#issuecomment-1006401289

> It seems that with this change, compiling Flink will work: [6e568b8](https://github.com/apache/flink/commit/6e568b86e5ce14fe03c6a74f91bb686ed393f0be) Now, the license check is failing. Not sure why. Is it related to this change?

@rmetzger, I think the license check failure is because you put `grpclib` in the pom, which should be `grpclb`. But I have to admit that I have no idea why this commit fixes the compiling issue.
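For illustration, the kind of pom correction being described might look like the fragment below. The thread does not show where exactly the `grpclib` typo sat in the pom, so treating it as a direct `io.grpc` dependency entry (and the version, matched to the `grpc-core` 1.41.1 seen in CI) is an assumption:

```xml
<!-- Illustrative fragment only: the artifactId must read grpclb, not grpclib. -->
<dependency>
    <groupId>io.grpc</groupId>
    <artifactId>grpc-grpclb</artifactId>
    <version>1.41.1</version>
</dependency>
```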
xintongsong commented on pull request #15599:
URL: https://github.com/apache/flink/pull/15599#issuecomment-1004488375

Hi @galenwarren, nice to hear from you and to know that you're ready to move this PR forward. I'll give it a review asap.

Regarding the CI failures:
- The compiling issue is likely an environmental or connectivity problem. The build succeeded in the `compile_ci` stage, where Alibaba-provided hosts and Maven mirror are used. The failures came from `e2e_1_ci` and `e2e_2_ci`, where Azure-provided hosts and the Google Maven mirror are used. I've verified that the Google mirror contains the required [jar](https://maven-central-eu.storage-download.googleapis.com/maven2/io/grpc/grpc-core/1.41.1/grpc-core-1.41.1.jar) and that the [metadata](https://maven-central-eu.storage-download.googleapis.com/maven2/io/grpc/grpc-core/maven-metadata.xml) is correct. I suspect there's a problem / instability in downloading artifacts from the Google mirror on our Azure agents, and only the new dependencies run into the problem because we have cached the local Maven repository from previous runs. CC @zentol, any insight to share?
- The UnalignedCheckpointRescaleITCase failure should be a known instability (FLINK-25426), which is prioritized as a blocker and should be addressed soon. We can rebase this PR once FLINK-25426 is closed and see if the problem is solved.
xintongsong commented on pull request #15599:
URL: https://github.com/apache/flink/pull/15599#issuecomment-938283249

@galenwarren Great to hear from you. It would be nice to see this in the next release.
xintongsong commented on pull request #15599:
URL: https://github.com/apache/flink/pull/15599#issuecomment-877911839

@galenwarren No worries, and please take your time.
xintongsong commented on pull request #15599:
URL: https://github.com/apache/flink/pull/15599#issuecomment-876855655

Hi @galenwarren, how are things going? Just to let you know that the feature freeze for the 1.14 release will be around early August, which is about 3 weeks from now. No worries if you need more time for this; we don't necessarily need to get this feature into the 1.14 release. This notice is just in case you'd like to see it in the release.
xintongsong commented on pull request #15599:
URL: https://github.com/apache/flink/pull/15599#issuecomment-862042096

Thanks for the info, @galenwarren. I guess I'll wait for the remaining work to be done.

Concerning the testability, ideally yes. It would be nice to have test cases covering the behaviors of both the general interfaces (FileSystem, RecoverableWriter, etc.) and the internal GCS-specific components, and the standard test suites are supposed to make the former easier. However, these standard test suites require corresponding external storage services to be available. E.g., HDFS tests leverage the Hadoop-provided `MiniDFSCluster`, and S3 tests deploy a MinIO instance in Docker. I'm not sure if there's a good way to set up a lightweight Google Storage service for testing purposes. If it requires too much effort, I'd be fine with adding only the GCS-specific unit tests, leaving the IT cases / E2E tests as a separate follow-up effort.

PS: `AbstractRecoverableWriterTest` is probably also relevant, in addition to the two classes you've mentioned.

PPS: We may also consider updating the documentation in a separate follow-up PR. For a major feature like this, we might want to involve some product specialists in the documentation review. It would help simplify the review process if the engineering and documentation changes are separated.
xintongsong commented on pull request #15599:
URL: https://github.com/apache/flink/pull/15599#issuecomment-861150767

Thanks for updating the PR, @galenwarren.
- The Azure build has been triggered on your latest commit (e751250). It's probably not very timely.
- It's good to hear that excluding `javax.annotation-api` works. That indeed fixes part of the problems: the complaint about the prohibited GPL license is gone.
- The `LicenseChecker` is still complaining: `Found a total of 1 severe license issues`. After checking the code, I found that `Module flink-gs-fs-hadoop is missing a NOTICE file.` is considered a severe issue. Admittedly, this is a bit implicit, being logged at the warning level.
  - Could you please add a NOTICE file for this new module? You can take a look at these [guidelines](https://cwiki.apache.org/confluence/display/FLINK/Licensing).
  - I'll open a separate ticket to take care of the logging-level issue.

Except for the licensing issue, are we ready for another round of review? I noticed you have pushed a commit adding logs. How is the rework of the unit tests going?
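As a rough illustration of what such a module NOTICE file looks like (the actual bundled-dependency list and versions must be derived from the module's shaded jar, so the entries below are placeholders based only on dependencies mentioned in this thread):

```
flink-gs-fs-hadoop
Copyright 2014-2021 The Apache Software Foundation

This product includes software developed at
The Apache Software Foundation (http://www.apache.org/).

This project bundles the following dependencies under the Apache Software License 2.0.
(http://www.apache.org/licenses/LICENSE-2.0.txt)

- com.google.cloud:google-cloud-storage:<version>
- io.grpc:grpc-core:<version>
```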
xintongsong commented on pull request #15599:
URL: https://github.com/apache/flink/pull/15599#issuecomment-850241316

@galenwarren, sorry, I haven't finished the review yet; I've been distracted by some other work. I'm halfway there and should be able to finish early next week.

Meantime, I did manage to look into the license issue. The problem is that `flink-gs-fs-hadoop` depends on `javax.annotation-api` (indirectly, via `google-cloud-storage`) at the `compile` scope. `javax.annotation-api` has a GPLv2 license, which means per the [Apache guidelines](https://www.apache.org/legal/resolved.html#category-x) we cannot bundle it into any Apache Flink artifacts.

Could you try excluding `javax.annotation-api` in the pom and see if that fixes the CI failure? Moreover, could you check whether the feature works without bundling this dependency? If not, users would need to manually download this dependency into the lib folder, and we would need to mention this in the documentation.
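A sketch of such an exclusion, assuming `google-cloud-storage` is declared directly in the module's pom.xml (the version property is a placeholder):

```xml
<!-- Illustrative fragment: exclude the GPL-licensed javax.annotation-api
     pulled in transitively via google-cloud-storage. -->
<dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-storage</artifactId>
    <version>${gcs.version}</version>
    <exclusions>
        <exclusion>
            <groupId>javax.annotation</groupId>
            <artifactId>javax.annotation-api</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```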
xintongsong commented on pull request #15599:
URL: https://github.com/apache/flink/pull/15599#issuecomment-846680704

Thanks for the updates, @galenwarren. I'll try to give it another pass this week.
xintongsong commented on pull request #15599:
URL: https://github.com/apache/flink/pull/15599#issuecomment-833201079

@rmetzger, thanks for the pointer :)
xintongsong commented on pull request #15599:
URL: https://github.com/apache/flink/pull/15599#issuecomment-829741180

Thank you, @galenwarren. I'm not aware of any guidelines on logging either.
xintongsong commented on pull request #15599:
URL: https://github.com/apache/flink/pull/15599#issuecomment-829053315

@galenwarren FYI, next week is a public holiday in China, and I may be less responsive.
xintongsong commented on pull request #15599:
URL: https://github.com/apache/flink/pull/15599#issuecomment-827297699

@rmetzger, thanks for helping to look into this. :)
xintongsong commented on pull request #15599:
URL: https://github.com/apache/flink/pull/15599#issuecomment-826292351

@galenwarren, in most cases I pull the code to review it, but don't necessarily build it. This time, I tried building your code locally and did not run into any problems.

I checked the logs, and it seems there's a connectivity issue between the Azure workers and the Google Maven mirror. This is a bit complicated; in short, IIUC, the problem should be gone once we fix the license issues.
- We have two worker pools for our CI build.
- The `compile_ci` stage is executed on Alibaba-hosted machines, downloading artifacts from the Alibaba mirror. All artifacts can be properly downloaded, thus the license check fails after 20+ minutes, which is after downloading the `grpc` dependencies.
- The `e2e_ci` stage is executed on Azure-hosted machines, downloading artifacts from the Google mirror. The error happens within about 5 minutes, complaining about not finding the dependency. I think it's a connectivity issue because I tried downloading the same artifact from the same Google mirror on my local machine and it succeeded.
- We leverage the Azure cache for sharing Maven dependency artifacts across builds. That means all dependencies downloaded previously are fetched from the Azure cache rather than downloaded again from the Maven mirror. That's probably why the `e2e_ci` build does not fail for other artifacts. This can be confirmed from the Maven log, where no "Downloading xxx" can be found until the failure.

I'll find people to look into the connectivity issue between the Google mirror and the Azure workers. Meantime, I think once we fix the license issues and make the `compile_ci` stage pass, the new dependencies should be uploaded to the Azure cache, and we should no longer have this error.
xintongsong commented on pull request #15599:
URL: https://github.com/apache/flink/pull/15599#issuecomment-819175527

Thanks for preparing this PR, @galenwarren. I'll try to take a look asap. Quick responses to your questions:
- The licensing requirements are described [here](https://cwiki.apache.org/confluence/display/FLINK/Licensing). In short, you need to manually create the NOTICE file. The script is mainly used for generating the root NOTICE file from those of the sub-modules.
- I think `commit` should fail if the file is already committed. The contract of this interface says it publishes the file, making it visible. We should not allow a job restarted from an earlier checkpoint/savepoint to overwrite a published file.

BTW, AZP failed during compilation. Please take a look.
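To make the commit contract concrete, here is a minimal stdlib-only sketch. It is illustrative only — the real recoverable writer works against GCS objects, not the local filesystem, and the class and method names here are hypothetical — but it captures the rule above: publishing an in-progress file must fail if the target was already published, so a job restarted from an earlier checkpoint cannot overwrite it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of the commit contract: refuse to overwrite an
// already-published file, otherwise move the in-progress file into place.
class CommitterSketch {
    static void commit(Path inProgress, Path published) throws IOException {
        if (Files.exists(published)) {
            throw new IOException("Target already committed: " + published);
        }
        Files.move(inProgress, published, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

A restarted job that re-attempts a commit against an already-visible target then surfaces an error instead of silently clobbering published data.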