I just realized the problem is probably due to the fact that HTML is generally case insensitive and the regex expects the HTML tag to be exactly <UploadId>. It's much easier (for me) to imagine the match failed because they used a tag with a different case than to imagine they embedded a space in the id string.
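A quick way to check this hypothesis is to run the existing pattern against two bodies that differ only in the case of the tag. The same pattern is also compiled with the inline (?i) flag for comparison. This is a rough sketch under a few assumptions:

- The response bodies are made up, since the actual OTC payload isn't in this thread.
- extractUploadId is a hypothetical helper that returns null when nothing matches. It roughly mirrors the "Null id" failure in the stack trace; it is not jclouds' actual code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UploadIdCaseCheck {
    // Pattern quoted in this thread from UploadIdFromHttpResponseViaRegex (case-sensitive).
    static final Pattern ORIGINAL =
            Pattern.compile("<UploadId>([\\S&&[^<]]+)</UploadId>");
    // Same pattern with the inline (?i) flag: literal characters match case-insensitively
    // (US-ASCII only, unless UNICODE_CASE is also enabled).
    static final Pattern CASE_INSENSITIVE =
            Pattern.compile("(?i)<UploadId>([\\S&&[^<]]+)</UploadId>");

    // Hypothetical helper: returns the captured id, or null when the pattern does not match.
    static String extractUploadId(Pattern pattern, String body) {
        Matcher m = pattern.matcher(body);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical response bodies that differ only in the case of the UploadId tag.
        String exactCase = "<InitiateMultipartUploadResult><UploadId>abc123</UploadId></InitiateMultipartUploadResult>";
        String otherCase = "<InitiateMultipartUploadResult><uploadId>abc123</uploadId></InitiateMultipartUploadResult>";

        System.out.println(extractUploadId(ORIGINAL, exactCase));
        System.out.println(extractUploadId(ORIGINAL, otherCase));
        System.out.println(extractUploadId(CASE_INSENSITIVE, otherCase));
    }
}
```

Running main should print abc123, then null, then abc123. With the original pattern, the differently cased tag simply finds nothing, which matches the shape of the reported failure. The (?i) variant recovers the id.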
The simplest fix then would be:

    Pattern pattern = Pattern.compile("(?i)<UploadId>([\\S&&[^<]]+)</UploadId>");

Adding (?i) at the front makes all literal characters in the regex match case-insensitively. I've tested this with a Java regex tester and it works. I'll also compile and test it myself and let you know whether it fixes the issue.

John

On Tue, Sep 14, 2021 at 4:52 PM John Calcote <john.calc...@gmail.com> wrote:
> Hi Andrew,
>
> Thanks for the quick response, but I don't think there's anything wrong
> with that regex. The expression "<UploadId>([\\S&&[^<]]+)</UploadId>"
> is pretty simple once you understand what the double ampersand does: it's
> an intersection operator. This regex means: match the string
> <UploadId>X</UploadId>, where X is one or more non-whitespace characters
> (\\S) other than '<'. This lets the expression pick up all the
> non-whitespace characters between the opening tag and either the start of
> the closing tag or the first whitespace character. The parens around X
> define a capture group, so X can be extracted from the results. If there
> ARE any whitespace characters between the opening and closing HTML tags,
> the expression will not match, because the first character after the
> capture group is expected to be a '<' character.
>
> The system to which we're trying to upload is a T-Systems OTC (Open
> Telecom Cloud) S3 service. We've heard OTC is based on Huawei Object
> Storage. It's possible it's not a perfect S3 implementation, and this is the
> first time we've tried to hit it with a multipart upload. It's possible
> the S3 service is sending an UploadId with an embedded whitespace
> character, which would cause the match to fail and leave the upload id
> null. Although it seems stupid to do so, I don't see anything in the
> Amazon spec about not using whitespace in the upload id.
> To make a space work properly, you'd have to URL-encode the uploadId when
> using it in subsequent PUT request parameters.
>
> Further thoughts?
>
> John
>
> On Thu, Sep 9, 2021 at 5:15 PM Andrew Gaul <g...@apache.org> wrote:
>
>> On Thu, Sep 09, 2021 at 07:37:49PM -0000, John Calcote wrote:
>> > java.lang.NullPointerException: Null id
>> >     at org.jclouds.blobstore.domain.AutoValue_MultipartUpload.<init>(AutoValue_MultipartUpload.java:32) ~[jclouds-blobstore-2.3.0.jar:2.3.0]
>> >     at org.jclouds.blobstore.domain.MultipartUpload.create(MultipartUpload.java:35) ~[jclouds-blobstore-2.3.0.jar:2.3.0]
>> >     at org.jclouds.s3.blobstore.S3BlobStore.initiateMultipartUpload(S3BlobStore.java:371) ~[s3-2.3.0.jar:2.3.0]
>> >     at org.jclouds.blobstore.internal.BaseBlobStore.putMultipartBlob(BaseBlobStore.java:356) ~[jclouds-blobstore-2.3.0.jar:2.3.0]
>> >     at org.jclouds.blobstore.internal.BaseBlobStore.putMultipartBlob(BaseBlobStore.java:349) ~[jclouds-blobstore-2.3.0.jar:2.3.0]
>> >     at org.jclouds.s3.blobstore.S3BlobStore.putBlob(S3BlobStore.java:262) ~[s3-2.3.0.jar:2.3.0]
>>
>> UploadIdFromHttpResponseViaRegex has a suspicious regular expression:
>>
>>     Pattern.compile("<UploadId>([\\S&&[^<]]+)</UploadId>")
>>
>> Do you use AWS or another S3 object store? I suspect that this regex
>> fails to match in some corner case. Could you simplify it and submit a
>> GitHub PR?
>>
>> --
>> Andrew Gaul
>> http://gaul.org/
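For reference, here is a small sketch of the character-class behavior discussed in this thread. [\\S&&[^<]] is the intersection of \\S (non-whitespace) and [^<] (anything except '<'). That is the same set of characters as the single negated class [^\\s<], which might be one way to act on Andrew's suggestion to simplify the pattern. That equivalent form is my own assumption, not something jclouds ships. Either way, an id with leading, trailing, or embedded whitespace produces no match. The test bodies are made up, and capture is a hypothetical helper.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UploadIdClassCheck {
    // Class from the thread: non-whitespace (\S) intersected (&&) with "anything but '<'".
    static final Pattern INTERSECTION =
            Pattern.compile("<UploadId>([\\S&&[^<]]+)</UploadId>");
    // Assumed simplification: one negated class excluding whitespace and '<' (same character set).
    static final Pattern NEGATED =
            Pattern.compile("<UploadId>([^\\s<]+)</UploadId>");

    // Hypothetical helper: returns the captured id, or null when the pattern does not match.
    static String capture(Pattern p, String body) {
        Matcher m = p.matcher(body);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] bodies = {
            "<UploadId>abc123</UploadId>",    // plain id: matches
            "<UploadId>abc 123</UploadId>",   // embedded space: no match
            "<UploadId> abc123</UploadId>",   // leading space: no match
            "<UploadId>abc123\n</UploadId>",  // trailing newline: no match
        };
        for (String body : bodies) {
            // Print both results side by side to show the two classes agree.
            System.out.println(capture(INTERSECTION, body) + " / " + capture(NEGATED, body));
        }
    }
}
```

Running main should print "abc123 / abc123" for the first body and "null / null" for the other three. Both forms reject any whitespace inside the tags, so neither would help if the server really did embed a space in the id.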