Re:How to incorporate the SmartCnAnalyzer in the core lucene jar

2012-09-06 Thread 齐保元


Importing contrib/smartcn.jar is not complicated. Or you can try FatJAR.


At 2012-09-06 22:04:58,Cheng  wrote:
>Hi,
>
>The default Lucene core jar does not contain the smartcn analyzer. How can I
>include it in the core jar?
>
>Thanks!


Re: How to incorporate the SmartCnAnalyzer in the core lucene jar

2012-09-06 Thread Cheng
Thanks. I will try that.

Another question: how can I use my own dictionary instead of the default one,
either in FatJAR or smartcn.jar?

On Thu, Sep 6, 2012 at 10:07 AM, 齐保元  wrote:

>
>
> Importing contrib/smartcn.jar is not complicated. Or you can try FatJAR.
>


Re: How to incorporate the SmartCnAnalyzer in the core lucene jar

2012-09-06 Thread Cheng
Also, I checked and couldn't find smartcn.jar in the originally shipped
Lucene jar. Should I build it myself? If so, how?
Thanks.

On Thu, Sep 6, 2012 at 10:10 AM, Cheng  wrote:

> Thanks. I will try that.
>
> Another question: how can I use my own dictionary instead of the default one,
> either in FatJAR or smartcn.jar?


Re:Re: How to incorporate the SmartCnAnalyzer in the core lucene jar

2012-09-06 Thread 齐保元

1. FatJAR is a tool for archiving jars/classes together, NOT an analyzer.
2. smartcn does not seem able to import your own dictionary; it can only import a
stop word dictionary. You can try IKAnalyzer instead.


At 2012-09-06 22:10:15,Cheng  wrote:
>Thanks. I will try that.
>
>Another question: how can I use my own dictionary instead of the default one,
>either in FatJAR or smartcn.jar?


Re: Re: How to incorporate the SmartCnAnalyzer in the core lucene jar

2012-09-06 Thread Cheng
IKAnalyzer is not supported in Lucene, right?

On Thu, Sep 6, 2012 at 10:14 AM, 齐保元  wrote:

>
> 1. FatJAR is a tool for archiving jars/classes together, NOT an analyzer.
> 2. smartcn does not seem able to import your own dictionary; it can only import a
> stop word dictionary. You can try IKAnalyzer instead.


Re:Re: How to incorporate the SmartCnAnalyzer in the core lucene jar

2012-09-06 Thread 齐保元
It's under contrib/analyzers/smartcn in Lucene 3.6. Maybe what you are using is
the source code.


At 2012-09-06 22:14:27,Cheng  wrote:
>Also, I checked and couldn't find smartcn.jar in the originally shipped
>Lucene jar. Should I build it myself? If so, how?
>Thanks.


Re:Re: Re: How to incorporate the SmartCnAnalyzer in the core lucene jar

2012-09-06 Thread 齐保元


You'd better tell me which version of Lucene you use. The latest version,
IKAnalyzer 2012, supports Lucene 3.6.



 
>IKAnalyzer is not supported in Lucene, right?


Re: Re: Re: How to incorporate the SmartCnAnalyzer in the core lucene jar

2012-09-06 Thread Cheng
I use 3.5 now and plan to try 3.6. How can I make IKAnalyzer use my own
dictionary and work together with Lucene?

Thanks so much for your help.

On Thu, Sep 6, 2012 at 10:19 AM, 齐保元  wrote:

>
>
> You'd better tell me which version of Lucene you use. The latest version,
> IKAnalyzer 2012, supports Lucene 3.6.


Re:Re: Re: Re: How to incorporate the SmartCnAnalyzer in the core lucene jar

2012-09-06 Thread qibaoyuan
Check out http://code.google.com/p/ik-analyzer/ - it's quite straightforward.



At 2012-09-06 22:22:45,Cheng  wrote:
>I use 3.5 now and plan to try 3.6. How can I make IKAnalyzer use my own
>dictionary and work together with Lucene?
>
>Thanks so much for your help.


Re: Re: Re: Re: How to incorporate the SmartCnAnalyzer in the core lucene jar

2012-09-06 Thread Cheng
Thanks.

The instructions say that users can use IKAnalyzer.cfg.xml to configure the
extension dictionary and stopword dictionary. They also mention that the XML
file should be put in the class root.

In an Eclipse Java project, where is the class root?

Thanks





On Thu, Sep 6, 2012 at 10:27 AM, qibaoyuan  wrote:

> Check out http://code.google.com/p/ik-analyzer/ - it's quite
> straightforward.


Re:Re: Re: Re: Re: How to incorporate the SmartCnAnalyzer in the core lucene jar

2012-09-06 Thread qibaoyuan
The src folder.
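
For anyone checking where the "class root" ends up: in a default Eclipse Java project, non-Java files placed directly under src are copied to the output folder (usually bin), which is on the runtime classpath. Below is a minimal sketch of how to verify that a file is visible there. It assumes the configuration file is named IKAnalyzer.cfg.xml (an assumption about the IK Analyzer distribution; pass a different name if yours differs). ClasspathRootCheck is a hypothetical helper that uses only the JDK; it is not part of IK Analyzer or Lucene.

```java
import java.net.URL;

/** Hypothetical helper, not part of IK Analyzer or Lucene. */
final class ClasspathRootCheck {

    /**
     * Looks up a resource relative to the classpath root and returns its URL,
     * or null if the class loader cannot find it.
     */
    static URL locate(String resourceName) {
        ClassLoader loader = ClasspathRootCheck.class.getClassLoader();
        return loader.getResource(resourceName);
    }

    public static void main(String[] args) {
        // Assumed file name; pass another name as the first argument if needed.
        String name = args.length > 0 ? args[0] : "IKAnalyzer.cfg.xml";
        URL url = locate(name);
        System.out.println(url != null
                ? "Found at classpath root: " + url
                : name + " is not at the classpath root");
    }
}
```

Run from inside the Eclipse project, this should report a location in the output folder once the file has been placed directly under src.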






At 2012-09-06 22:50:01,Cheng  wrote:
>Thanks.
>
>The instructions say that users can use IKAnalyzer.cfg.xml to configure the
>extension dictionary and stopword dictionary. They also mention that the XML
>file should be put in the class root.
>
>In an Eclipse Java project, where is the class root?
>
>Thanks


Issue with documentation for org.apache.lucene.analysis.synonym.SynonymMap.Builder.add() method

2012-09-06 Thread Mark Parker
I'm building documentation from the Lucene 4.0.0-BETA source (though
this was also an issue with the ALPHA source), and the output has null
characters in it. I believe that this is because the source looks like
this:

/**
 * Add a phrase->phrase synonym mapping.
 * Phrases are character sequences where words are
 * separated with character zero (\u0000).  Empty words
 * (two \u0000s in a row) are not allowed in the input nor
 * the output!
 *
 * @param input input phrase
 * @param output output phrase
 * @param includeOrig true if the original should be included
 */

These \u0000 characters are converted to null (\0) characters in the
output, which are invalid in XML (I'm outputting XML). Indeed, this is
a problem in the built documentation at the Apache Lucene site
(http://lucene.apache.org/core/4_0_0-BETA/analyzers-common/org/apache/lucene/analysis/synonym/SynonymMap.Builder.html)
where the documentation looks like this (in my browser):

Add a phrase->phrase synonym mapping. Phrases are character sequences
where words are separated with character zero (). Empty words (two s
in a row) are not allowed in the input nor the output!

The actual HTML file does have null characters at the two locations,
which may be technically correct, but not very helpful. I believe the
"\u0000" in the source ought to be escaped in some way, so that
something more meaningful than \0 ends up in the output. I'd submit a
patch, just for the prestige of it, but I don't have the slightest
idea what the change should be, not being a Java guy at all.
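
To make the convention in that javadoc concrete, here is a minimal sketch of a phrase whose words are separated by character zero, with empty words rejected. ZeroSeparatedPhrase and its methods are hypothetical names for illustration only; this is not Lucene's SynonymMap code, just plain JDK string handling under the rules the comment states.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration; this is not Lucene's SynonymMap code. */
final class ZeroSeparatedPhrase {

    // Character zero (U+0000), written as a cast so the source needs no Unicode escape.
    static final char WORD_SEPARATOR = (char) 0;

    /** Joins words into one phrase, rejecting empty words as the javadoc requires. */
    static String join(String... words) {
        StringBuilder phrase = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (words[i].isEmpty()) {
                throw new IllegalArgumentException("empty word at position " + i);
            }
            if (i > 0) {
                phrase.append(WORD_SEPARATOR);
            }
            phrase.append(words[i]);
        }
        return phrase.toString();
    }

    /** Splits a phrase back into its words at each separator character. */
    static List<String> split(String phrase) {
        return Arrays.asList(phrase.split(String.valueOf(WORD_SEPARATOR), -1));
    }

    public static void main(String[] args) {
        String phrase = join("wi", "fi", "network");
        System.out.println(phrase.length()); // 13: eleven letters plus two separators
        System.out.println(split(phrase));   // [wi, fi, network]
    }
}
```

The separator is invisible when printed, which is the same reason the rendered documentation appears to show empty parentheses.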

For those interested in why I'm messing with this: I'm using
IKVM to convert the Java Lucene libraries to .NET assemblies (well,
one assembly) and converting the javadoc comments to XML documentation
for good IntelliSense in Visual Studio. It works wonderfully, and we
use it in very successful commercial software!

Note that I'm not subscribed to the list, so please CC me if there are
questions.

Mark

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Issue with documentation for org.apache.lucene.analysis.synonym.SynonymMap.Builder.add() method

2012-09-06 Thread Robert Muir
Thanks for reporting this, Mark.

I think it was not intended to have actual null characters here (or
probably anywhere in javadocs).

Our javadocs checkers should be failing on stuff like this...


-- 
lucidworks.com




Re: Issue with documentation for org.apache.lucene.analysis.synonym.SynonymMap.Builder.add() method

2012-09-06 Thread Benson Margulies
On Thu, Sep 6, 2012 at 1:59 PM, Robert Muir  wrote:

> Thanks for reporting this Mark.
>
> I think it was not intended to have actual null characters here (or
> probably anywhere in javadocs).
>
> Our javadocs checkers should be failing on stuff like this...
>
> On Thu, Sep 6, 2012 at 1:52 PM, Mark Parker  wrote:
> > I'm building documentation from the Lucene 4.0.0-BETA source (though
> > this was also an issue with the ALPHA source), and the output has null
> > characters in it. I believe that this is because the source looks like
> > this:
> >
> > /**
> >  * Add a phrase->phrase synonym mapping.
> >  * Phrases are character sequences where words are
> >  * separated with character zero (\u0000).  Empty words
> >  * (two \u0000s in a row) are not allowed in the input nor
> >  * the output!
> >  *
> >  * @param input input phrase
> >  * @param output output phrase
> >  * @param includeOrig true if the original should be included
> >  */
> >
> > These \u0000 characters are converted to null (\0) characters in the
> > output, which are invalid in XML (I'm outputting XML). Indeed, this is
> > a problem in the built documentation at the Apache Lucene site
> > (http://lucene.apache.org/core/4_0_0-BETA/analyzers-common/org/apache/lucene/analysis/synonym/SynonymMap.Builder.html)
> > where the documentation looks like this (in my browser):
> >
>

Converted to U+0000 by what, I wonder? Javadoc shouldn't be doing that. If
it does, I wonder if we need \\u0000 instead?




Re: Issue with documentation for org.apache.lucene.analysis.synonym.SynonymMap.Builder.add() method

2012-09-06 Thread Chris Hostetter

: Converted to U+0000 by what, I wonder? Javadoc shouldn't be doing that. If
: it does, I wonder if we need \\u0000 instead?

apparently it is...

https://mail-archives.apache.org/mod_mbox/harmony-dev/200802.mbox/%3c47b2f7ae.2000...@gmail.com%3E




-Hoss




Re: Issue with documentation for org.apache.lucene.analysis.synonym.SynonymMap.Builder.add() method

2012-09-06 Thread Robert Muir
On Thu, Sep 6, 2012 at 2:12 PM, Chris Hostetter
 wrote:
>
> : Converted to U+0000 by what, I wonder? Javadoc shouldn't be doing that. If
> : it does, I wonder if we need \\u0000 instead?
>
> apparently it is...
>
> https://mail-archives.apache.org/mod_mbox/harmony-dev/200802.mbox/%3c47b2f7ae.2000...@gmail.com%3E
>
>

It's definitely javadoc. For now I used U+0000:
http://svn.apache.org/viewvc?view=revision&revision=1381711
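
For background on why this happens: Java translates Unicode escapes in source text before anything else is processed (JLS section 3.3), and that includes text inside comments, so a tool reading the source sees a real NUL character rather than six visible characters. The sketch below shows the same translation in string literals, where it is easy to observe. UnicodeEscapeDemo is a hypothetical class name and uses only the JDK.

```java
/** Illustration of Java's Unicode-escape translation; the class name is hypothetical. */
final class UnicodeEscapeDemo {

    // The escape in this literal is translated early: the string holds one char with value 0.
    static final String TRANSLATED = "\u0000";

    // A doubled backslash does not start an escape: six visible characters remain.
    static final String LITERAL = "\\u0000";

    public static void main(String[] args) {
        System.out.println(TRANSLATED.length());       // 1
        System.out.println((int) TRANSLATED.charAt(0)); // 0
        System.out.println(LITERAL.length());           // 6
        System.out.println(LITERAL);                    // the six visible characters
    }
}
```

Writing U+0000 in the comment, as in the commit above, sidesteps the translation entirely because that notation is not an escape.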

-- 
lucidworks.com




Re: Issue with documentation for org.apache.lucene.analysis.synonym.SynonymMap.Builder.add() method

2012-09-06 Thread Mark Parker
On Thu, Sep 6, 2012 at 12:40 PM, Robert Muir  wrote:
> It's definitely javadoc. For now I used U+0000:
> http://svn.apache.org/viewvc?view=revision&revision=1381711

Thanks!

Mark




RE: Using a Lucene ShingleFilter to extract frequencies of bigrams in Lucene

2012-09-06 Thread Martin O'Shea
Thanks for that piece of advice.

I ended up passing my snowballAnalyzer and standardAnalyzers as parameters to
ShingleAnalyzerWrapper instances and processing the outputs via a TermVectorMapper.

It seems to work quite well.

-----Original Message-----
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: 05 Sep 2012 01:53
To: java-user@lucene.apache.org
Subject: Re: Using a Lucene ShingleFilter to extract frequencies of bigrams in Lucene

On Tue, Sep 4, 2012 at 12:37 PM, Martin O'Shea  wrote:
>
> Does anyone know if this can be used in conjunction with other
> analyzers to return the frequencies of the bigrams or trigrams found, e.g.:
>
> "please divide this please divide sentence into shingles"
>
> Would return 2 for "please divide"?
>
> I'm currently using Lucene 3.0.2 to extract frequencies of unigrams
> from a string using a combination of a TermVectorMapper and
> Standard/Snowball analyzers.
>
> I should add that my strings are built up from a database and then
> indexed by Lucene in memory and are not persisted beyond this. Use of
> other products like Solr is not intended.
>

The bigrams etc generated by shingles are terms just like the unigrams. So you 
can wrap any other analyzer with a ShingleAnalyzerWrapper if you want the 
shingles.

If you just want to use Lucene's analyzers to tokenize the text and compute 
within-document frequencies for a one-off purpose, I think indexing and 
creating term vectors could be overkill: you could just consume the tokens from 
the Analyzer and make a hashmap or whatever you need...

There are examples in the org.apache.lucene.analysis package javadocs.
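
As a rough sketch of the "consume the tokens and make a hashmap" idea, the following counts bigrams for the example sentence from the question. It is only an illustration under stated assumptions: lowercasing and whitespace splitting stand in for a real Lucene Analyzer, and BigramCounter and countBigrams are hypothetical names, not Lucene API.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch; whitespace splitting stands in for a Lucene Analyzer. */
final class BigramCounter {

    /** Counts adjacent word pairs (two-word shingles) in the given text. */
    static Map<String, Integer> countBigrams(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return counts;
        }
        String[] tokens = trimmed.toLowerCase(Locale.ROOT).split("\\s+");
        for (int i = 0; i + 1 < tokens.length; i++) {
            // Each bigram is a term of its own, just like a unigram would be.
            String bigram = tokens[i] + " " + tokens[i + 1];
            counts.merge(bigram, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
                countBigrams("please divide this please divide sentence into shingles");
        System.out.println(counts.get("please divide")); // 2
    }
}
```

With a real analyzer chain the tokens would come from the TokenStream instead of split(), but the counting step would look much the same.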

--
lucidworks.com




How to create a Lucene in-memory index at webapp deployment time

2012-09-06 Thread Kasun Perera
I have a Java/JSP web application running on an Apache Tomcat server. In this
web application I use Lucene to index and calculate similarity between some
PDF documents (the PDF documents are in the database). My live server doesn't
allow the web app to access files, so I create an in-memory Lucene index using
the RAMDirectory class.

As currently coded, my application creates a new in-memory index each time a
user accesses the Lucene-based functionality.

Is there any way to create the in-memory index at webapp deployment time, so
that the index is created only once and I can access it for as long as the
web app is live?
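
One possible shape for the "build once, reuse afterwards" part is sketched below. This is only a sketch under stated assumptions: InMemoryIndex is a placeholder for whatever wraps the RAMDirectory, SharedIndexHolder is a hypothetical name, and the code uses only the JDK so that it runs standalone. In a servlet container, the standard deployment-time hook is ServletContextListener.contextInitialized, from which get() could be called so the build happens at startup rather than on the first request.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch; InMemoryIndex stands in for a RAMDirectory-backed index. */
final class SharedIndexHolder {

    /** Placeholder type for whatever the application builds once. */
    static final class InMemoryIndex {
        final long builtAtMillis = System.currentTimeMillis();
    }

    // Counts how often the expensive build step ran, for demonstration only.
    static final AtomicInteger BUILD_COUNT = new AtomicInteger();

    // The JVM initializes this nested class once, on first use, in a thread-safe way.
    private static final class Holder {
        static final InMemoryIndex INSTANCE = build();
    }

    private SharedIndexHolder() {
    }

    private static InMemoryIndex build() {
        BUILD_COUNT.incrementAndGet();
        // Real code would read the PDF text from the database and index it here.
        return new InMemoryIndex();
    }

    /** Returns the single shared instance, building it on the first call. */
    static InMemoryIndex get() {
        return Holder.INSTANCE;
    }

    public static void main(String[] args) {
        InMemoryIndex first = get();
        InMemoryIndex second = get();
        System.out.println(first == second);   // true: same instance both times
        System.out.println(BUILD_COUNT.get()); // 1: the build step ran once
    }
}
```

The holder idiom is used here because class initialization already guarantees the build runs at most once, without explicit locking.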

-- 
Regards

Kasun Perera