[ https://issues.apache.org/jira/browse/ASTERIXDB-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15896273#comment-15896273 ]
Wenhai commented on ASTERIXDB-1813:
-----------------------------------

Refer to patch https://asterix-gerrit.ics.uci.edu/#/c/1076/.

> similarity-jaccard-prefix() issue
> ---------------------------------
>
>                 Key: ASTERIXDB-1813
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1813
>             Project: Apache AsterixDB
>          Issue Type: Bug
>            Reporter: Taewoo Kim
>            Assignee: Wenhai
>
> For the following two records, similarity-jaccard-prefix() doesn't generate
> the correct result. Switch the line (skip-index, indexnl) to see the
> difference. In order to see this, you need to enable the fuzzy join rule. It
> doesn't happen in the master yet. This bug needs to be fixed before enabling
> the fuzzy join rule.
> {code}
> drop dataverse test if exists;
> create dataverse test;
> use dataverse test;
> create type DBLPType as open {
>   id: uuid
> }
> create dataset AmazonReviewNoDup(DBLPType)
>   primary key id;
> create index AmazonReviewNoDup_summary_b_idx
>   on AmazonReviewNoDup(summary:string?) type btree enforced;
> create index AmazonReviewNoDup_summary_kw_idx
>   on AmazonReviewNoDup(summary:string?) type keyword enforced;
> insert into dataset AmazonReviewNoDup(
>   { "id": uuid("83208a78-7007-8d77-935b-d9127e4cc9dc"), "summary": "Clear, Concise, and fun!" }
> );
> insert into dataset AmazonReviewNoDup(
>   { "id": uuid("83208a78-7007-8d77-935b-d9127e4cc9dd"), "summary": "Clear, Concise, and Charitable" }
> );
> for $o in dataset AmazonReviewNoDup
> for $i in dataset AmazonReviewNoDup
> //where /* +indexnl */ similarity-jaccard(word-tokens($o.summary), word-tokens($i.summary)) >= 0.6
> where /* +skip-index */ similarity-jaccard(word-tokens($o.summary), word-tokens($i.summary)) >= 0.6
>   and $o.id < $i.id
> return {"oid":$o.id, "iid":$i.id};
> {code}
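
For reference, here is a rough Python sketch of what the where clause in the quoted query should evaluate to for the two records. It is not AsterixDB code. The helper names word_tokens and jaccard are hypothetical, and the sketch assumes that word-tokens() lowercases its input and splits on non-alphanumeric characters, and that similarity-jaccard() is the plain set-based Jaccard coefficient. Under those assumptions the two summaries share 3 of 5 distinct tokens, so the pair sits exactly on the 0.6 threshold and should be returned whichever hint is used.

```python
import re


def word_tokens(text):
    # Assumption: lowercase, then split on runs of non-alphanumeric characters.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def jaccard(a, b):
    # Plain set-based Jaccard coefficient: |A ∩ B| / |A ∪ B|.
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0  # convention chosen for this sketch only
    return len(sa & sb) / len(sa | sb)


o_summary = "Clear, Concise, and fun!"
i_summary = "Clear, Concise, and Charitable"

o_tokens = word_tokens(o_summary)
i_tokens = word_tokens(i_summary)
sim = jaccard(o_tokens, i_tokens)

print(o_tokens, i_tokens, sim, sim >= 0.6)
```

Because the expected similarity equals the threshold, this pair is a boundary case: any off-by-one in a prefix-length or threshold computation would be enough to drop it from the result.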