Maybe use SHA-256 or a stronger hash instead. SHA-1 is obsolete, and one never knows when it will be removed from the JDK.
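
To make the suggestion concrete, here is a minimal sketch of a "hash if long" ID with SHA-256 in place of SHA-1. It is not taken from the patch: the class name, the method names, and the choice to measure the 512-byte limit on the UTF-8 bytes of the raw URI are assumptions for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of ManifoldCF or of the attached patch.
class DocumentIdSketch {
    // Limit quoted in the original mail: IDs longer than 512 bytes are rejected.
    static final int MAX_ID_BYTES = 512;

    /** Returns the lowercase hex SHA-256 digest (64 characters) of the URI's UTF-8 bytes. */
    static String sha256Hex(String uri) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(uri.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes still print as two hex digits.
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    /** "Hash if long": short URIs are kept as they are, longer ones are hashed. */
    static String idFor(String uri) {
        // Assumption: the limit is checked against the UTF-8 byte length of the raw URI.
        int length = uri.getBytes(StandardCharsets.UTF_8).length;
        return length > MAX_ID_BYTES ? sha256Hex(uri) : uri;
    }

    public static void main(String[] args) {
        System.out.println(idFor("http://example.com/short"));
        System.out.println(sha256Hex("abc"));
    }
}
```

A SHA-256 hex digest is always 64 characters, so the hashed ID stays well below the 512-byte limit whatever the URI length is.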
> On 02.03.2021 at 04:10, Shirai Takashi/ 白井隆 <shi...@nintendo.co.jp> wrote:
>
> Hi, there.
>
> I've found another problem in the Elasticsearch connector.
> The Elasticsearch output connector uses the URI string as the ID.
> Elasticsearch allows an ID of no more than 512 bytes.
> If the URL is too long, it causes an HTTP 400 error.
>
> I have prepared two solutions in the attached patch.
> The first is URI decoding.
> If the URI includes multibyte characters,
> the ID is URL-encoded twice.
> Ex) U+3000 -> %E3%80%80 -> %25E3%2580%2580
> This enlarges the ID unnecessarily.
> So I added an option to decode the URI before it is encoded as the ID.
>
> But the ID may still be longer than 512 bytes.
> The other solution is hashing.
> The newly added options are the following.
> Raw) uses the URI string as is.
> Hash) always hashes (SHA1) the URI string.
> Hash if long) hashes the URI only if its length exceeds 512 bytes.
> The last one is provided for compatibility.
>
> Both solutions cause a new problem.
> If the URI is decoded or hashed,
> the original URI cannot be kept in each document.
> So I added new fields.
> URI field name) keeps the original URI string as is.
> Decoded URI field name) keeps the decoded URI string.
> The default settings leave these fields empty.
>
>
> I sent the patch for Ingest-Attachment the other day.
> So this mail attaches two patches.
> apache-manifoldcf-2.18-elastic-id.patch.gz:
> The patch for 2.18, including the patch from the other day.
> apache-manifoldcf-elastic-id.patch.gz:
> The patch for the source already patched the other day.
>
> By the way, I tried to describe the above in some document.
> But I found no suitable document in the ManifoldCF package.
> The Elasticsearch document may have been written for an older specification.
> Where can I describe these new specifications?
>
> ----
> Nintendo, Co., Ltd.
> Product Technology Dept.
> Takashi SHIRAI
> PHONE: +81-75-662-9600
> mailto:shi...@nintendo.co.jp
> <apache-manifoldcf-2.18-elastic-id.patch.gz>
> <apache-manifoldcf-elastic-id.patch.gz>
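
The double encoding described in the quoted mail (U+3000 -> %E3%80%80 -> %25E3%2580%2580) can be reproduced with the short sketch below. It only illustrates the effect with the JDK's `URLEncoder` and `URLDecoder`; whether the connector uses these classes is an assumption, and the class and method names are hypothetical. Note that `URLDecoder` follows form-encoding rules and turns `+` into a space, so it may not be the right decoder for every URI.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper, not part of ManifoldCF or of the attached patch.
class UriDecodeSketch {
    /** Percent-encodes a string as UTF-8. */
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Decodes a percent-encoded string as UTF-8. */
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String alreadyEncoded = "%E3%80%80"; // U+3000 as it appears in a URI

        // Encoding the already encoded URI escapes every '%' again.
        System.out.println(encode(alreadyEncoded));

        // Decoding first, then encoding, yields the single-encoded form.
        System.out.println(encode(decode(alreadyEncoded)));
    }
}
```

Each multibyte character costs 9 characters when encoded once and 15 when encoded twice, which is why decoding before encoding shortens the ID.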