Hello guys,

Hey, I think I've found how to do this by just adding a filter. In case
anyone is curious:

    <fieldType name="emails" class="solr.TextField" sortMissingLast="true" omitNorms="true">
      <analyzer>
        <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
        <filter class="solr.TypeTokenFilterFactory" types="email_type.txt" useWhitelist="true"/>
      </analyzer>
    </fieldType>
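For anyone reproducing this: email_type.txt is presumably a plain-text file in the core's conf directory listing the token types to keep, one per line. UAX29URLEmailTokenizer labels e-mail addresses with the type `<EMAIL>` and URLs with `<URL>`, so a file containing just `<EMAIL>` should keep only the addresses. The Python sketch below only illustrates that whitelist logic; it is not Lucene code, and the file content and sample tokens are assumptions.

```python
# Illustration only: mimics what solr.TypeTokenFilterFactory does with
# useWhitelist="true". This is plain Python, not Lucene; the tokens and
# their types below are hand-written examples, not real tokenizer output.

def load_types(lines):
    """Read a types file: one token type per line, blank lines ignored."""
    return {line.strip() for line in lines if line.strip()}


def whitelist_filter(tokens, allowed_types):
    """Keep only the terms whose token type is listed in allowed_types."""
    return [term for term, token_type in tokens if token_type in allowed_types]


# Assumed content of email_type.txt: a single line "<EMAIL>"
email_types = load_types(["<EMAIL>"])

# Hypothetical (term, type) pairs, in the style of UAX29URLEmailTokenizer
tokens = [
    ("Contact", "<ALPHANUM>"),
    ("luis@example.com", "<EMAIL>"),
    ("or", "<ALPHANUM>"),
    ("http://example.com", "<URL>"),
]

print(whitelist_filter(tokens, email_types))  # ['luis@example.com']
```

Swapping the file content for `<URL>` would keep the URLs instead, which is the "select just e-mail patterns, not URL ones" behaviour asked about further down the thread.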

Anyway, I still need a query like the following to retrieve the documents
with at least one e-mail address detected:

http://localhost:8080/mysolr/select?q=emails:[* TO *]&start=0&rows=10&sort=mydate desc

And to be honest, I don't like it.
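Side note on the query above: when it is sent from code instead of typed into a browser, the colon, brackets, asterisks and spaces should be URL-encoded. A minimal sketch using only Python's standard library, with the host, path and parameters copied from that URL (nothing here is Solr-specific):

```python
from urllib.parse import urlencode

# Base URL and parameter values copied from the query above
BASE_URL = "http://localhost:8080/mysolr/select"

params = {
    "q": "emails:[* TO *]",  # open-ended range: the field has at least one term
    "start": 0,
    "rows": 10,
    "sort": "mydate desc",
}

# urlencode percent-encodes ':', '[', '*' and ']' and turns spaces into '+'
url = BASE_URL + "?" + urlencode(params)
print(url)
# http://localhost:8080/mysolr/select?q=emails%3A%5B%2A+TO+%2A%5D&start=0&rows=10&sort=mydate+desc
```

urlencode uses quote_plus by default, so spaces become '+', which Solr decodes back to spaces as in any form-encoded query string.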

Regards,




2013/7/30 Luis Cappa Banda <luisca...@gmail.com>

> Hello Jack, Steve,
>
> Thank you for your answers. I've never used UAX29URLEmailTokenizerFactory,
> but I read about it before trying regexp queries. As far as I know,
> UAX29URLEmailTokenizerFactory tokenizes the input text and recognizes
> tokens that match URLs, e-mail addresses, etc. Reading the documentation, I
> haven't found any way to keep just the e-mail tokens and not the URL ones,
> for example. I think it would make sense to be able to specify one or more
> patterns in a configuration file referenced from the tokenizer definition
> in schema.xml, but I found nothing.
>
> I just want to retrieve the indexed documents that contain at least one
> e-mail address in the text. However, even using
> UAX29URLEmailTokenizerFactory, and supposing that I store the e-mail data
> in a field called 'emails' (I feel creative, hehe), a query like the
> following seems... dirty:
>
> http://localhost:8080/mysolr/select?q=emails:[* TO *]&start=0&rows=10&sort=mydate desc
>
> What do you think?
>
> And Andy... I know many regexps for finding e-mail patterns in text; that
> wasn't my question, and of course there is no perfect one. However, Lucene
> RegExp syntax is different from the classic RegExp syntax, so it is not as
> easy as copying and pasting any regexp and, voilà! E-mails everywhere.
>
> Thank you very much in advance,
>
> Best regards,
>
>
>
>
>
> 2013/7/30 Jack Krupansky <j...@basetechnology.com>
>
>> Just use the UAX29URLEmailTokenizerFactory, which recognizes email
>> addresses.
>>
>> Any particular reason that you're trying to reinvent the wheel?
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Luis Cappa Banda
>> Sent: Tuesday, July 30, 2013 10:53 AM
>> To: solr-user@lucene.apache.org
>> Subject: Email regular expression.
>>
>>
>> Hello everyone!
>>
>> Unfortunately, I have to search for all e-mail addresses found in a text
>> field of each document. I've been reading for a while about how to use
>> regexps in Solr, but the ones I tried didn't work. I've noticed that Lucene
>> RegExp syntax is sometimes very different from the classic RegExp syntax,
>> so that may be why they didn't work for me, and maybe someone more expert
>> can help me.
>>
>> The expression is the following:
>>
>> E-mail:
>>
>> text:/[a-z0-9_\|-]+(\.[a-z0-9_\|-]|)*@[a-z0-9-]|(\.[a-z0-9-]|)*\.([a-z]{2,4})/
>>
>> Thank you very much in advance!
>>
>> Best regards,
>>
>> --
>> - Luis Cappa
>>
>
>
>
> --
> - Luis Cappa
>



-- 
- Luis Cappa
