[ 
https://issues.apache.org/jira/browse/SOLR-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680807#action_12680807
 ] 

Fergus McMenemie commented on SOLR-1061:
----------------------------------------

Yes, yes. Another use case I ran into a lot was having lat/long within the same 
XML field; this would have been really useful there. I guess if the matcher fails, 
the fields/columns firstName and secondName are left undefined? However, although 
the proposed syntax is neat and clean, it can of course already be done as follows:
{code}
   <field column="firstName"  regex="Mr(\w*)\b\w*" replaceWith="$1" sourceColName="full_name"/>
   <field column="secondName" regex="Mr\w*\b(\w*)" replaceWith="$1" sourceColName="full_name"/>
{code}
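
To make the two-field approach concrete, here is a minimal sketch in plain java.util.regex of what a regex plus replaceWith="$1" pair amounts to. It is not DataImportHandler code: the class ReplaceWithSketch and the method extract are hypothetical names, and the patterns assume whitespace-separated names such as "Mr John Smith", which is my assumption about the data rather than something stated in the issue. It also shows one consequence of the replace approach: when the pattern does not match, replaceAll returns the input unchanged rather than leaving the value undefined.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of DataImportHandler.
final class ReplaceWithSketch {

    // Replaces every match of regex in source with replaceWith ("$1" = first group).
    static String extract(String regex, String replaceWith, String source) {
        return Pattern.compile(regex).matcher(source).replaceAll(replaceWith);
    }

    public static void main(String[] args) {
        String fullName = "Mr John Smith"; // assumed sample value for full_name

        // One pattern per target column, each keeping a different group.
        System.out.println(extract("Mr\\s+(\\w+)\\s+\\w+", "$1", fullName)); // John
        System.out.println(extract("Mr\\s+\\w+\\s+(\\w+)", "$1", fullName)); // Smith

        // No match: the source string comes back untouched.
        System.out.println(extract("Mr\\s+(\\w+)\\s+\\w+", "$1", "Dr Jane Doe")); // Dr Jane Doe
    }
}
```

Note that each column runs its own pattern over the same source string, which is the cost the groupNames proposal avoids.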

Also, I would think the following is a related, common use case: imagine a 
field that lists an indeterminate number of aliases or alternate names for a 
person. This is bad data design, but it happens. We need to expose the regex 
global (match-all) feature.

{code}
<firstName>josephine</firstName>
<aliases>jo,joe,jos</aliases>
{code}

{code}
   <field column="alias" regex="([^,]+)" regex_options="global" sourceColName="aliases"/>
{code}

which would populate the column alias with multiple values. The regex_options 
attribute would also allow other regex options, such as case insensitivity, to 
be specified.
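
As a rough illustration of what a "global" option would mean, the sketch below uses plain java.util.regex to collect every match instead of only the first. The names GlobalMatchSketch and extractAll are hypothetical, and passing java.util.regex flags directly is only my assumption about how something like regex_options might be mapped; none of this is existing DataImportHandler behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of DataImportHandler.
final class GlobalMatchSketch {

    // Collects one value per match: group 1 if the pattern has a group,
    // otherwise the whole match. flags are java.util.regex.Pattern flags.
    static List<String> extractAll(String regex, int flags, String source) {
        Matcher m = Pattern.compile(regex, flags).matcher(source);
        List<String> values = new ArrayList<>();
        while (m.find()) { // repeated find() is the "global" behaviour
            values.add(m.groupCount() > 0 ? m.group(1) : m.group());
        }
        return values;
    }

    public static void main(String[] args) {
        // The aliases example: one value per comma-separated entry.
        System.out.println(extractAll("([^,]+)", 0, "jo,joe,jos")); // [jo, joe, jos]

        // A second option combined with global matching: case insensitivity.
        System.out.println(
            extractAll("(jo[a-z]*)", Pattern.CASE_INSENSITIVE, "Jo,JOE,jos")); // [Jo, JOE, jos]
    }
}
```

Each element of the returned list would correspond to one value of a multi-valued alias column.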



> Improve regexTransformer to create multiple columns from regexGroups
> --------------------------------------------------------------------
>
>                 Key: SOLR-1061
>                 URL: https://issues.apache.org/jira/browse/SOLR-1061
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - DataImportHandler
>            Reporter: Noble Paul
>             Fix For: 1.4
>
>
> example
> {code:xml}
> <field column="doesnotmatter" regex="Mr(\w*)\b(\w*)" 
> sourceColName="full_name"  groupNames="1:firstName,2:secondName"/>
> {code}
> This is more efficient in extracting multiple values from a single String. In 
> this case the column is redundant but it is ok

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
