Hello, Oleksandr.

It deserves a JIRA issue; please raise one.

On Tue, Oct 15, 2019 at 8:17 PM Oleksandr Drapushko <drapus...@gmail.com> wrote:
> Hello Community,
>
> I've discovered a data loss bug and couldn't find any mention of it. Please
> confirm this bug hasn't been reported yet.
>
>
> Description:
>
> If you try to update non-pre-analyzed fields in a document using atomic
> updates, data in pre-analyzed fields (if there is any) will be lost. The
> bug was discovered in Solr 8.2 and 7.7.2.
>
>
> Steps to reproduce:
>
> 1. Index this document into techproducts
> {
>   "id": "a",
>   "n_s": "s1",
>   "pre": "{\"v\":\"1\",\"str\":\"Alaska\",\"tokens\":[{\"t\":\"alaska\",\"s\":0,\"e\":6,\"i\":1}]}"
> }
>
> 2. Query the document
> {
>   "response":{"numFound":1,"start":0,"maxScore":1.0,"docs":[
>     {
>       "id":"a",
>       "n_s":"s1",
>       "pre":"Alaska",
>       "_version_":1647475215142223872}]
> }}
>
> 3. Update using atomic syntax
> {
>   "add": {
>     "doc": {
>       "id": "a",
>       "n_s": {"set": "s2"}
>     }
>   }
> }
>
> 4. Observe the warning in the Solr log
> UI:
> WARN x:techproducts_shard2_replica_n6 PreAnalyzedField Error parsing
> pre-analyzed field 'pre'
>
> solr.log:
> WARN (qtp1384454980-23) [c:techproducts s:shard2 r:core_node8
> x:techproducts_shard2_replica_n6] o.a.s.s.PreAnalyzedField Error parsing
> pre-analyzed field 'pre' => java.io.IOException: Invalid JSON type
> java.lang.String, expected Map
> at org.apache.solr.schema.JsonPreAnalyzedParser.parse(JsonPreAnalyzedParser.java:86)
>
> 5. Query the document again
> {
>   "response":{"numFound":1,"start":0,"maxScore":1.0,"docs":[
>     {
>       "id":"a",
>       "n_s":"s2",
>       "_version_":1647475461695995904}]
> }}
>
> Result: There is no 'pre' field in the document anymore.
>
>
> My thoughts on it:
>
> 1. Data loss can be prevented if the warning is replaced with an error
> (re-throwing the exception). Atomic updates for such documents still won't
> work, but the updates will be explicitly rejected.
>
> 2. Solr tries to read the document from the index, merge it with the input
> document, and re-index the result. But when it reads indexed pre-analyzed
> fields, the stored format differs from the input format, so Solr cannot
> parse and re-index those fields properly.
>
>
> Thank you,
> Oleksandr
>

--
Sincerely yours
Mikhail Khludnev
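
For anyone following the thread, the two thoughts in the quoted report can be illustrated with a small Python model. This is only a sketch under stated assumptions, not Solr's actual code: `parse_pre_analyzed`, `index`, `atomic_update`, and the `strict` flag are hypothetical names made up for the illustration, and the parser here is far simpler than `JsonPreAnalyzedParser`. It assumes, as the report describes, that only the "str" part of a pre-analyzed field is stored and that the stored value is fed back through the parser during an atomic update.

```python
import json


class PreAnalyzedParseError(Exception):
    """Raised when a value is not in the pre-analyzed JSON map format."""


def parse_pre_analyzed(value):
    """Toy stand-in for a pre-analyzed parser: expects a JSON object."""
    try:
        parsed = json.loads(value)
    except json.JSONDecodeError as exc:
        raise PreAnalyzedParseError(f"not JSON: {value!r}") from exc
    if not isinstance(parsed, dict):
        raise PreAnalyzedParseError(f"expected a map, got {type(parsed).__name__}")
    return parsed


def index(doc, pre_fields, strict):
    """Return the stored form of doc; pre-analyzed fields keep only "str"."""
    stored = {}
    for name, value in doc.items():
        if name in pre_fields:
            try:
                stored[name] = parse_pre_analyzed(value)["str"]
            except PreAnalyzedParseError:
                if strict:
                    raise  # models thought 1: reject the update
                continue   # models the current behaviour: warn and drop the field
        else:
            stored[name] = value
    return stored


def atomic_update(stored, update, pre_fields, strict=False):
    """Merge a {"field": {"set": value}} update into the stored doc and re-index."""
    merged = dict(stored)
    for name, op in update.items():
        merged[name] = op["set"] if isinstance(op, dict) else op
    return index(merged, pre_fields, strict)


PRE_FIELDS = {"pre"}
original = {
    "id": "a",
    "n_s": "s1",
    "pre": json.dumps({"v": "1", "str": "Alaska",
                       "tokens": [{"t": "alaska", "s": 0, "e": 6, "i": 1}]}),
}
update = {"id": "a", "n_s": {"set": "s2"}}

stored = index(original, PRE_FIELDS, strict=True)       # step 1
after_update = atomic_update(stored, update, PRE_FIELDS)  # steps 3 and 5
print(stored)
print(after_update)
```

In this model the stored value of 'pre' is the plain string "Alaska", which the parser cannot read back as a map, so the lenient path silently drops the field. Calling `atomic_update(stored, update, PRE_FIELDS, strict=True)` raises instead, which corresponds to rejecting the update rather than losing data.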