I just realized the problem is probably due to the fact that HTML is
generally case insensitive and the regex expects the HTML tag to be exactly
<UploadId>. It's much easier (for me) to imagine the match failed because
they used an HTML tag with a different case than to imagine they embedded a
space in the id string.

The simplest fix then would be:

   Pattern pattern = Pattern.compile("(?i)<UploadId>([\\S&&[^<]]+)</UploadId>");

The (?i) flag at the front makes the entire regex match its literal
characters case-insensitively. I've tested this with a Java regex tester
and it works.
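
To illustrate the difference, here is a minimal sketch that runs both patterns against a few sample response bodies. The class name, the helper method, and the sample strings are made up for illustration; this is not the jclouds code itself, and it assumes the parser uses Matcher.find() and reads group 1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UploadIdRegexDemo {
    // Pattern quoted earlier in this thread (case-sensitive).
    static final Pattern ORIGINAL =
        Pattern.compile("<UploadId>([\\S&&[^<]]+)</UploadId>");

    // Same pattern with the embedded case-insensitive flag.
    static final Pattern PROPOSED =
        Pattern.compile("(?i)<UploadId>([\\S&&[^<]]+)</UploadId>");

    // Hypothetical helper: returns the captured id, or null on no match.
    static String extract(Pattern pattern, String body) {
        Matcher m = pattern.matcher(body);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical response fragments, not captured from a real service.
        String exact = "<UploadId>abc123</UploadId>";
        String lower = "<uploadid>abc123</uploadid>";
        String spaced = "<UploadId>abc 123</UploadId>";

        System.out.println(extract(ORIGINAL, exact));  // abc123
        System.out.println(extract(ORIGINAL, lower));  // null
        System.out.println(extract(PROPOSED, lower));  // abc123
        System.out.println(extract(PROPOSED, spaced)); // null
    }
}
```

Note that (?i) alone only folds case for US-ASCII characters, which is enough for the tag name here. The last case shows that the flag does nothing for an id containing whitespace, so that possibility is not ruled out by this change.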

I'll compile and test this myself. I'll let you know if it seems to fix the
issue.

John

On Tue, Sep 14, 2021 at 4:52 PM John Calcote <john.calc...@gmail.com> wrote:

> Hi Andrew,
>
> Thanks for the quick response, but I don't think there's anything wrong
> with that regex. The expression - "<UploadId>([\\S&&[^<]]+)</UploadId>" -
> is pretty simple once you understand what the double ampersand does - it's
> an intersection operator. This regex means: Match the string
> <UploadId>X</UploadId> where X can be any list of non-whitespace characters
> (\\S) except the '<' character (which, of course, allows the expression to
> pick up all the non-whitespace characters between the opening tag and the
> start of the closing tag or the first whitespace character). The parens
> around X in the expression indicate a group, so it can be extracted from
> the results. If there ARE any whitespace characters between the opening and
> closing html tags, the expression will not match because the first
> character after the capture group is expected to be a '<' character.
>
> The system to which we're trying to upload is a T-Systems OTC (Open
> Telecom Cloud) S3 service. We've heard OTC is based on Huawei Object
> Storage. It's possible it's not a perfect S3 implementation and this is the
> first time we've tried to hit it with a multi-part upload. It's possible
> the S3 service is sending an UploadId with an embedded whitespace
> character, which would cause the match to fail, and the capture group to
> return null. Although it seems stupid to do so, I don't see anything in the
> Amazon spec about not using whitespace in the upload id. To make a space
> work properly, you'd have to URL-encode the uploadId when using it in
> subsequent PUT request parameters.
>
> Further thoughts?
>
> John
>
> On Thu, Sep 9, 2021 at 5:15 PM Andrew Gaul <g...@apache.org> wrote:
>
>> On Thu, Sep 09, 2021 at 07:37:49PM -0000, John Calcote wrote:
>> > java.lang.NullPointerException: Null id
>> >         at org.jclouds.blobstore.domain.AutoValue_MultipartUpload.<init>(AutoValue_MultipartUpload.java:32) ~[jclouds-blobstore-2.3.0.jar:2.3.0]
>> >         at org.jclouds.blobstore.domain.MultipartUpload.create(MultipartUpload.java:35) ~[jclouds-blobstore-2.3.0.jar:2.3.0]
>> >         at org.jclouds.s3.blobstore.S3BlobStore.initiateMultipartUpload(S3BlobStore.java:371) ~[s3-2.3.0.jar:2.3.0]
>> >         at org.jclouds.blobstore.internal.BaseBlobStore.putMultipartBlob(BaseBlobStore.java:356) ~[jclouds-blobstore-2.3.0.jar:2.3.0]
>> >         at org.jclouds.blobstore.internal.BaseBlobStore.putMultipartBlob(BaseBlobStore.java:349) ~[jclouds-blobstore-2.3.0.jar:2.3.0]
>> >         at org.jclouds.s3.blobstore.S3BlobStore.putBlob(S3BlobStore.java:262) ~[s3-2.3.0.jar:2.3.0]
>>
>> UploadIdFromHttpResponseViaRegex has a suspicious regular expression:
>>
>>     Pattern.compile("<UploadId>([\\S&&[^<]]+)</UploadId>")
>>
>> Do you use AWS or another S3 object store?  I suspect that this regex
>> fails to match in some corner case.  Could you simplify it and submit a
>> GitHub PR?
>>
>> --
>> Andrew Gaul
>> http://gaul.org/
>>
>
