Hi, I've learned that Azure has released a new Java SDK for blob storage that replaces the SDK originally used to create AzureDataStore. The new SDK is not backwards compatible with the original, but it contains a key fix for the bug behind OAK-8013.
I'd like to have a discussion about whether we should update AzureDataStore to use this latest Azure SDK. Please take some time to read and weigh in.

Question 1 - Why move from the old SDK to the new SDK?

The old SDK has a bug that prevents a fix for OAK-8013 (see also OAK-8104). In the current state, Oak does not properly support direct download of binaries with special characters in the filename. The way to fix this issue is to move away from the old SDK.

Question 2 - Why is moving to the new SDK a big deal?

The new SDK is completely different from the old SDK. While it introduces new classes and so on, the primary difference is a new paradigm: a more fluent, event-driven, async-style programming model. Using the new SDK will require AzureDataStore to do some tricks to perform the async operations in synchronous ways, manage conversions from byte buffers to streams, and so on. So not only is the new SDK not backward compatible, it also takes a different approach altogether. This will result in substantial changes to AzureDataStore, with significant accompanying risk.

In addition, I've been experimenting with the new SDK over the past few days, and I have concerns about the SDK itself. A very basic sample application, nearly a verbatim copy of their online sample, prints warnings to the console when it is run:

> WARNING: An illegal reflective access operation has occurred
> WARNING: Illegal reflective access by com.microsoft.rest.v2.Validator to field java.util.HashMap.serialVersionUID
> WARNING: Please consider reporting this to the maintainers of com.microsoft.rest.v2.Validator
> WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
> WARNING: All illegal access operations will be denied in a future release

I've seen other issues, like unhandled exceptions, in other sample apps I've created, even in code that otherwise performs the desired tasks correctly.

Question 3 - What are our options?

I see three:

1. Stay with the current, deprecated Azure SDK. We would probably be unable to fix OAK-8013/OAK-8105 correctly in that case, at least for Azure, which would mean direct downloads of files with special characters in the filename would not work. (It is theoretically possible that Microsoft would implement a fix in the deprecated SDK, but since the bug is already fixed in the new SDK I think that is unlikely.)

2. Update AzureDataStore to use the latest SDK. I expect this to be a significant effort - probably several weeks at least, given the many unknowns and the need to work the errors and exceptions I've seen out of the code.

3. Rip out the Azure SDK dependencies altogether and instead implement AzureDataStore directly against the Azure REST endpoints.

The last option is the one I'm strongly considering. Moving away from the SDK is perhaps not ideal at first, but it avoids this problem in the future, and we wouldn't have to accommodate an asynchronous API in our synchronous access model. I don't expect the work to be any greater than option 2. My primary concern is whether we can rely on backwards compatibility in the REST APIs going forward; I'm trying to find that out.

What does everyone else think? What questions do I need to get answered? Which option sounds best, or is there a better option I didn't list?

-MR
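P.S. To make the "tricks" under Question 2 concrete, here is a minimal sketch of the two pieces of glue code I mean: blocking on an async result, and turning byte-buffer chunks into the InputStream our synchronous callers expect. This is only an illustration; CompletableFuture stands in for the SDK's actual reactive types, and every name here is hypothetical, not the real SDK API.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.ByteBuffer;
import java.util.List;
import java.util.Vector;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

class AsyncBridgeSketch {

    // Block on an async result so it fits a synchronous call path.
    // CompletableFuture is a stand-in for the SDK's reactive types.
    static <T> T awaitSync(CompletableFuture<T> future, long timeoutSeconds) throws Exception {
        return future.get(timeoutSeconds, TimeUnit.SECONDS);
    }

    // Convert the ByteBuffer chunks an async download typically emits into
    // the single InputStream a DataStore caller expects.
    static InputStream toInputStream(List<ByteBuffer> chunks) {
        Vector<InputStream> parts = new Vector<>();
        for (ByteBuffer chunk : chunks) {
            byte[] bytes = new byte[chunk.remaining()];
            chunk.get(bytes);
            parts.add(new ByteArrayInputStream(bytes));
        }
        return new SequenceInputStream(parts.elements());
    }
}
```

Even this toy version shows the cost: every synchronous entry point needs a timeout policy, and every download buffers or re-wraps data before our existing stream-based code can touch it.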
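P.P.S. For option 3, the documented Azure Blob endpoint format is https://{account}.blob.core.windows.net/{container}/{blob}. A sketch of building such a URL ourselves follows; the account, container, and blob names are made up, and a real request would still need authentication, which is omitted here. The point is that explicit percent-encoding of the blob name is exactly the control over special characters that OAK-8013 needs.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class AzureRestSketch {

    // Build the REST URL for a blob, percent-encoding the blob name so that
    // spaces, '#', etc. survive as a valid path segment.
    static String blobUrl(String account, String container, String blobName) {
        // URLEncoder produces form encoding; swap '+' for "%20" in a path segment.
        String encoded = URLEncoder.encode(blobName, StandardCharsets.UTF_8)
                .replace("+", "%20");
        return "https://" + account + ".blob.core.windows.net/" + container + "/" + encoded;
    }
}
```

A real Put Blob request would also need the x-ms-blob-type and x-ms-version headers plus an Authorization header (shared key signature or SAS token); the long-term stability of those pieces is the backwards-compatibility question I'm trying to get answered.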
