Maybe use SHA-256 or stronger. SHA-1 is obsolete, and one never knows when it
will be removed from the JDK.
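
For illustration, here is a minimal sketch of what the "Hash if long" option could look like with SHA-256 instead of SHA-1. The class and method names (DocumentIdSketch, sha256Hex, idFor) are hypothetical and not taken from the patch, and the sketch assumes the 512-byte limit is measured on the UTF-8 bytes of the ID.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of the ManifoldCF code base or the patch.
class DocumentIdSketch {
    // ID length limit quoted in the original report (assumed to be UTF-8 bytes).
    static final int MAX_ID_BYTES = 512;

    // Returns the SHA-256 digest of the URI as 64 lowercase hex characters.
    static String sha256Hex(String uri) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(uri.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    // "Hash if long": keep short URIs unchanged, hash only those over the limit.
    static String idFor(String uri) {
        int length = uri.getBytes(StandardCharsets.UTF_8).length;
        return length > MAX_ID_BYTES ? sha256Hex(uri) : uri;
    }
}
```

A hex-encoded SHA-256 digest is always 64 ASCII characters, so it stays well under the limit regardless of the URI length.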

> On 02.03.2021 at 04:10, Shirai Takashi/ 白井隆 <shi...@nintendo.co.jp> wrote:
> 
> Hi, there.
> 
> I've found another problem in the Elasticsearch connector.
> The Elasticsearch output connector uses the URI string as the document ID.
> Elasticsearch limits the length of an ID to 512 bytes.
> If the URI is too long, indexing fails with an HTTP 400 error.
> 
> I have prepared two solutions in the attached patch.
> The first is URI decoding.
> If the URI includes multibyte characters,
> the ID ends up URL-encoded twice.
> Ex) U+3000 -> %E3%80%80 -> %25E3%2580%2580
> This enlarges the ID length unnecessarily.
> So I added an option to decode the URI before it is encoded as the ID.
> 
> But the ID may still be longer than 512 bytes.
> The other solution is hashing.
> The newly added options are the following.
> Raw) uses the URI string as is.
> Hash) always hashes (SHA1) the URI string.
> Hash if long) hashes the URI only if its length exceeds 512 bytes.
> The last one is provided for compatibility.
> 
> Both solutions cause a new problem.
> If the URI is decoded or hashed,
> the original URI can no longer be kept in each document.
> So I added new fields.
> URI field name) keeps the original URI string as is.
> Decoded URI field name) keeps the decoded URI string.
> The default settings leave these fields empty.
> 
> 
> I sent the patch for Ingest-Attachment the other day.
> So this mail attaches two patches.
> apache-manifoldcf-2.18-elastic-id.patch.gz:
> The patch for 2.18, including the patch from the other day.
> apache-manifoldcf-elastic-id.patch.gz:
> The patch for the source that was already patched the other day.
> 
> By the way, I tried to describe the above in the documentation.
> But I found no suitable document in the ManifoldCF package.
> The Elasticsearch document may have been written for an old specification.
> Where can I describe these new specifications?
> 
> ----
> Nintendo, Co., Ltd.
> Product Technology Dept.
> Takashi SHIRAI
> PHONE: +81-75-662-9600
> mailto:shi...@nintendo.co.jp
> <apache-manifoldcf-2.18-elastic-id.patch.gz>
> <apache-manifoldcf-elastic-id.patch.gz>
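
The double-encoding effect described in the quoted message can be reproduced with the standard java.net classes. This is only a sketch: the class and method names are hypothetical, and it assumes the connector percent-encodes the ID in a way comparable to URLEncoder, which may not match the actual connector code. Note also that URLDecoder treats "+" as a space, which may not be what a real URI needs.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demonstration class, not part of the patch.
class UriIdEncodingSketch {
    // Percent-encodes a string using UTF-8.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Reverses percent-encoding using UTF-8.
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        String once = encode("\u3000");          // U+3000 encoded once
        String twice = encode(once);             // already-encoded URI encoded again
        String fixed = encode(decode(once));     // decode first, then encode
        System.out.println(once + " -> " + twice + " (decode first: " + fixed + ")");
    }
}
```

Decoding before encoding keeps each multibyte character at 9 bytes in the ID instead of 15, which is the saving the decoding option aims for.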
