If you do reindex, I’ve become a big fan of adding a date field that records
the index timestamp.
That lets you check whether everything has been reindexed.
<field name="indexed_datetime" type="date" stored="true" indexed="true"
multiValued="false" default="NOW" docValues="true" />
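
A rough sketch of that check, in Python. It only builds the query parameters for a Solr select request. The indexed_datetime field is the one defined above; the function name and the idea of noting the reindex start time yourself are assumptions for illustration.

```python
from datetime import datetime, timezone


def stale_docs_params(reindex_started: datetime) -> dict:
    """Build select parameters that count docs not touched by the reindex.

    reindex_started should be a timezone-aware datetime taken just before
    the reindex began (hypothetical bookkeeping on your side).
    """
    # Solr date fields expect UTC in ISO-8601 form with a trailing Z.
    cutoff = reindex_started.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        # Everything except docs stamped at or after the cutoff. This also
        # catches old docs that have no indexed_datetime value at all.
        "q": f"*:* -indexed_datetime:[{cutoff} TO *]",
        # Only the count is needed, not the documents themselves.
        "rows": 0,
    }
```

Send those parameters to the collection's select handler. If numFound comes back as 0, nothing is left over from before the reindex.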
wunder
Walter Underwood
[email protected]
http://observer.wunderwood.org/ (my blog)
> On Jul 28, 2020, at 2:11 PM, Jörn Franke <[email protected]> wrote:
>
> A regex search at query time would leave room for attacks (e.g., a regex can
> easily be designed to block the Solr server forever).
>
> If the field is stored, you can also use a cursor to go through all
> entries and reindex each doc based on the field:
>
> https://lucene.apache.org/solr/guide/8_4/pagination-of-results.html
>
> This also assumes that the other fields are stored. Otherwise you have to
> reindex.
> You can do this in parallel to the existing index and, once finished, simply
> change the alias for the collection (that way there is no downtime for the
> users, but of course you need the corresponding extra space).
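
A minimal sketch of the cursor walk described above, in Python. The cursorMark and nextCursorMark parameters and the requirement to sort on the uniqueKey come from the Solr pagination guide linked above. The function names, the uniqueKey being called "id", and the page size are assumptions. The fetch function is passed in so the loop can be tried without a running server.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def http_fetch(select_url):
    """Return a fetch function that queries a Solr select URL (hypothetical URL)."""
    def fetch(params):
        with urlopen(select_url + "?" + urlencode(params)) as resp:
            return json.load(resp)
    return fetch


def iter_all_docs(fetch, query="*:*", rows=500, unique_key="id"):
    """Yield every doc matching query, one cursor page at a time."""
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        page = fetch({
            "q": query,
            "rows": rows,
            # Cursors require a sort that includes the uniqueKey field.
            "sort": f"{unique_key} asc",
            "cursorMark": cursor,
            "wt": "json",
        })
        yield from page["response"]["docs"]
        next_cursor = page["nextCursorMark"]
        # Solr returns the same cursor again once there is nothing more to read.
        if next_cursor == cursor:
            return
        cursor = next_cursor
```

Each yielded doc only contains stored fields, which is why the other fields need to be stored for this route to work. The docs would then be sent to the new collection before the alias is switched.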
>
>> On 28.07.2020 at 21:06, lstusr 5u93n4 <[email protected]> wrote:
>>
>> Possible... yes. Agreed that this is the right approach. But if we already
>> have a big index that we're searching through? Any way to "hack it"?
>>
>>> On Tue, 28 Jul 2020 at 14:55, Walter Underwood <[email protected]>
>>> wrote:
>>>
>>> I’d do that at index time. Add an update request processor script that
>>> does the regex and adds a field has_credit_card_number:true.
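
The matching logic for such a processor could look roughly like the sketch below. It is plain Python to show the idea, not an actual Solr update processor script (those are typically written in JavaScript). The text field name "body", the function name, and the exact pattern are assumptions. Only has_credit_card_number comes from the suggestion above.

```python
import re

# Four groups of four ASCII digits, optionally separated by a space or hyphen.
# This is a loose pattern: it flags any 16-digit number, valid card or not.
CARD_PATTERN = re.compile(r"\b(?:[0-9]{4}[ -]?){3}[0-9]{4}\b")


def flag_credit_card(doc: dict, text_field: str = "body") -> dict:
    """Return a copy of doc with has_credit_card_number set to True or False."""
    text = doc.get(text_field) or ""
    flagged = dict(doc)  # leave the caller's doc untouched
    flagged["has_credit_card_number"] = CARD_PATTERN.search(text) is not None
    return flagged
```

Because the regex runs on the raw field value before tokenization, it sees the whole "1234 5678 ..." run at once. At query time the search is then just has_credit_card_number:true.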
>>>
>>> wunder
>>> Walter Underwood
>>> [email protected]
>>> http://observer.wunderwood.org/ (my blog)
>>>
>>>> On Jul 28, 2020, at 11:50 AM, lstusr 5u93n4 <[email protected]> wrote:
>>>>
>>>> Let's say I have a text field that's been indexed with the standard
>>>> tokenizer, and I want to match the docs that have credit card numbers in
>>>> them (this is for altruistic purposes, not nefarious ones!). What's the
>>>> best way to build a search that will do this?
>>>>
>>>> Searching for "???? ???? ???? ????" seems to return inconsistent results.
>>>>
>>>> Maybe a regex search? "[0-9]{4}?[0-9]{4}?[0-9]{4}?[0-9]{4}" seems like it
>>>> should work, but that's not matching the docs I think it should either...
>>>>
>>>> Any suggestions?
>>>>
>>>> Thanks In Advance!
>>>
>>>