Re: Split words with period in between into separate tokens

2016-10-12 Thread Derek Poh
Why didn't I think of that? That's another alternative. Thank you for
your suggestion. I appreciate it.


On 10/13/2016 5:41 AM, Georg Sorst wrote:

You could use a PatternReplaceCharFilter before your tokenizer to replace
the dot with a space character.

Derek Poh wrote on Wed, Oct 12, 2016 at 11:38:

It seems LetterTokenizerFactory also splits on, and discards, numbers. The
field does have values with numbers in them, so it is not applicable.
Thank you.


On 10/12/2016 4:22 PM, Dheerendra Kulkarni wrote:

You can use LetterTokenizerFactory instead.

Regards,
Dheerendra Kulkarni

On Wed, Oct 12, 2016 at 6:24 AM, Derek Poh wrote:

Hi

How can I split words with a period in between into separate tokens?
E.g. "Co.Ltd" => "Co" "Ltd".

I am using StandardTokenizerFactory, and it does not split on periods:
periods (dots) that are not followed by whitespace are kept as part of the
token, including Internet domain names.

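This is the field definition,

[XML markup stripped in the archive; surviving fragments: positionIncrementGap="100", words="stopwords.txt" (twice), synonyms="synonyms.txt" ignoreCase="true" expand="true"]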

Solr version is 10.4.10.

Derek

--
CONFIDENTIALITY NOTICE
This e-mail (including any attachments) may contain confidential and/or
privileged information. If you are not the intended recipient or have
received this e-mail in error, please inform the sender immediately and
delete this e-mail (including any attachments) from your computer, and you
must not use, disclose to anyone else or copy this e-mail (including any
attachments), whether in whole or in part.
This e-mail and any reply to it may be monitored for security, legal,
regulatory compliance and/or other appropriate reasons.





Re: Split words with period in between into separate tokens

2016-10-12 Thread Georg Sorst
You could use a PatternReplaceCharFilter before your tokenizer to replace
the dot with a space character.
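
A minimal sketch of how that could look in the schema, for anyone finding this thread later. The field type name text_dot_split is hypothetical, and the pattern assumes every dot should become a space, which would also break up decimals and domain names. The filters from the existing definition would stay in place after the tokenizer.

```xml
<fieldType name="text_dot_split" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Runs before the tokenizer: every "." becomes a space,
         so "Co.Ltd" reaches the tokenizer as "Co Ltd" -->
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\." replacement=" "/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- existing filters (stop words, synonyms) go here unchanged -->
  </analyzer>
</fieldType>
```

With separate index and query analyzers, the same charFilter line would normally go into both so that indexed and queried text are split the same way. Documents have to be reindexed after the change.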



Re: Split words with period in between into separate tokens

2016-10-12 Thread Derek Poh

It seems LetterTokenizerFactory also splits on, and discards, numbers. The
field does have values with numbers in them, so it is not applicable.
Thank you.





Re: Split words with period in between into separate tokens

2016-10-12 Thread Dheerendra Kulkarni
You can use LetterTokenizerFactory instead.

Regards,
Dheerendra Kulkarni




