Re: Inconsistent results in Solr Search with Lucene Index

2007-12-03 Thread trysteps

I fixed that problem by reconfiguring schema.xml.
Thanks for your help.
Jak

Grant Ingersoll wrote:
Have you set up your Analyzers, etc. so they correspond to the exact 
ones that you were using in Lucene? Under the Solr Admin you can try 
the analysis tool to see how your index and queries are treated. What 
happens if you do a *:* query from the Admin query screen?


If your index is reasonably sized, I would just reindex, but you 
shouldn't have to do this.


-Grant

On Nov 27, 2007, at 8:18 AM, trysteps wrote:


Hi All,
I am trying to use Solr to search an existing Lucene index, so I have set 
up the schema.xml configuration (tokenizers, required fields, and so on) to match.

But I can not get the same results as with Lucene.
For example,
a search for 'dog' returns lots of results with Lucene, but in Solr I 
can't get any result. A search for 'dog*', however, returns the same 
results as Lucene.
What is the best way to integrate Lucene index to Solr, are there any 
well-documented sources?

Thanks for your Attention,
Trysteps



--
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ








RE: Tips for searching

2007-12-03 Thread Will Johnson
If you want any letter and any possible substring, you might be better off
breaking every word into single letters with special tokens between words,
i.e.:

the quick brown fox

Becomes

t h e ZZ q u i c k ZZ b r o w n ZZ f o x

then you can do all the single-letter searches directly, and multi-letter
searches turn into phrase searches, i.e.:

uic (from quick)

would be rewritten as

u i c

And so on.  This should give you better performance and more predictable
results than wildcard searches, depending on the size and complexity of your
data.  Relevancy would be horrible, since the tf/idf would always have a
common denominator depending on the character set, but there are ways around
that as well.
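A minimal sketch of the scheme described above (the ZZ separator is from the example; this is an illustration, not Solr analyzer code):

```python
def letterize(text, sep="ZZ"):
    """Index-time: split each word into single-letter tokens with a
    sentinel token between words ("the quick" -> t h e ZZ q u i c k)."""
    tokens = []
    for word in text.split():
        tokens.extend(word)   # one token per character
        tokens.append(sep)    # word-boundary sentinel
    return tokens[:-1] if tokens else tokens

def substring_as_phrase(sub):
    """Query-time: rewrite a substring query as a phrase of letters
    ("uic" -> "u i c")."""
    return '"' + " ".join(sub) + '"'
```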

- will 

 

-Original Message-
From: Mike Klaas [mailto:[EMAIL PROTECTED] 
Sent: Friday, November 30, 2007 7:51 PM
To: solr-user@lucene.apache.org
Subject: Re: Tips for searching

On 30-Nov-07, at 4:43 PM, Dave C. wrote:


 Thanks for the quick response Mike...
 Ideally it should match more than just a single character, i.e.  
 'the' in 'weather', or 'pro' in 'profile', or '000' in '18000'.

 Would these cases be taken care of by the StopFilterFactory?

No... you are looking for a variant of WildcardQuery.  Prefix  
wildcards are supported (pro* matches profile), but general wildcard  
queries aren't enabled by default.  There has been lots of discussion  
on the list if you do a search.

-Mike



Tomcat6?

2007-12-03 Thread Jörg Kiegeland
The Solr wiki does not describe how to install Solr on Tomcat 6, and I 
have not managed it myself :(
The chapter "Configuring Solr Home with JNDI" mentions the directory 
$CATALINA_HOME/conf/Catalina/localhost, which does not exist in Tomcat 6.


Alternatively I tried the folder $CATALINA_HOME/work/Catalina/localhost, 
but with no success (I can query the top-level page, but the Solr 
Admin link then does not work).


Can anybody help?

--
Dipl.-Inf. Jörg Kiegeland
ikv++ technologies ag
Bernburger Strasse 24-25, D-10963 Berlin
e-mail: [EMAIL PROTECTED], web: http://www.ikv.de
phone: +49 30 34 80 77 18, fax: +49 30 34 80 78 0
=
Handelsregister HRB 81096; Amtsgericht Berlin-Charlottenburg
board of  directors: Dr. Olaf Kath (CEO); Dr. Marc Born (CTO)
supervising board: Prof. Dr. Bernd Mahr (chairman)
_



Re: Tomcat6?

2007-12-03 Thread Matthew Runo

In context.xml, I added:

<Environment name="solr/home" value="/Users/mruno/solr-src/example/solr" type="java.lang.String" />


I think that's all I did to get it working in Tomcat 6.

--Matthew Runo

On Dec 3, 2007, at 7:58 AM, Jörg Kiegeland wrote:

The Solr wiki does not describe how to install Solr on Tomcat 6, and I 
have not managed it myself :(
The chapter "Configuring Solr Home with JNDI" mentions the directory 
$CATALINA_HOME/conf/Catalina/localhost, which does not exist in Tomcat 6.


Alternatively I tried the folder $CATALINA_HOME/work/Catalina/localhost, 
but with no success (I can query the top-level page, but the Solr 
Admin link then does not work).


Can anybody help?

--
Dipl.-Inf. Jörg Kiegeland
ikv++ technologies ag
Bernburger Strasse 24-25, D-10963 Berlin
e-mail: [EMAIL PROTECTED], web: http://www.ikv.de
phone: +49 30 34 80 77 18, fax: +49 30 34 80 78 0
=
Handelsregister HRB 81096; Amtsgericht Berlin-Charlottenburg
board of  directors: Dr. Olaf Kath (CEO); Dr. Marc Born (CTO)
supervising board: Prof. Dr. Bernd Mahr (chairman)
_





RE: Tomcat6?

2007-12-03 Thread Charlie Jackson
$CATALINA_HOME/conf/Catalina/localhost doesn't exist by default, but you can 
create it and it will work exactly the same way it did in Tomcat 5. It's not 
created by default because it's no longer needed by the manager webapp.
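One way to apply this (a sketch following the Solr wiki's JNDI approach; both paths below are placeholders, not from this thread): create the directory, then drop a context fragment into it as $CATALINA_HOME/conf/Catalina/localhost/solr.xml:

```xml
<!-- Hypothetical conf/Catalina/localhost/solr.xml; both paths are examples -->
<Context docBase="/path/to/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="/path/to/solr/home" override="true"/>
</Context>
```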


-Original Message-
From: Matthew Runo [mailto:[EMAIL PROTECTED] 
Sent: Monday, December 03, 2007 10:15 AM
To: solr-user@lucene.apache.org
Subject: Re: Tomcat6?

In context.xml, I added:

<Environment name="solr/home" value="/Users/mruno/solr-src/example/solr" type="java.lang.String" />

I think that's all I did to get it working in Tomcat 6.

--Matthew Runo

On Dec 3, 2007, at 7:58 AM, Jörg Kiegeland wrote:

 The Solr wiki does not describe how to install Solr on Tomcat 6, and I  
 have not managed it myself :(
 The chapter "Configuring Solr Home with JNDI" mentions the directory  
 $CATALINA_HOME/conf/Catalina/localhost, which does not exist in Tomcat 6.

 Alternatively I tried the folder $CATALINA_HOME/work/Catalina/localhost,  
 but with no success (I can query the top-level page, but the Solr Admin  
 link then does not work).

 Can anybody help?

 -- 
 Dipl.-Inf. Jörg Kiegeland
 ikv++ technologies ag
 Bernburger Strasse 24-25, D-10963 Berlin
 e-mail: [EMAIL PROTECTED], web: http://www.ikv.de
 phone: +49 30 34 80 77 18, fax: +49 30 34 80 78 0
 =
 Handelsregister HRB 81096; Amtsgericht Berlin-Charlottenburg
 board of  directors: Dr. Olaf Kath (CEO); Dr. Marc Born (CTO)
 supervising board: Prof. Dr. Bernd Mahr (chairman)
 _




RE: Solr Highlighting, word index

2007-12-03 Thread Owens, Martin


 You can tell lucene to store token offsets using TermVectors  
 (configurable via schema.xml).  Then you can customize the request  
 handler to return the token offsets (and/or positions) by retrieving  
 the TVs.

I think that is the best plan of action. How do I create a custom request 
handler that will use the existing indexed fields? As I see it there will be 
two requests: one for the search, and one to retrieve the offsets when you 
view one of the found items. Any advice you can give me will be much 
appreciated, as I've had no luck with Google so far.

Thanks for your help so far,

Best Regards, Martin Owens
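For reference, the schema.xml side of the suggestion above might look roughly like this (the field name "body" is a placeholder; termVectors/termPositions/termOffsets are the relevant attributes):

```xml
<!-- Hypothetical field declaration; "body" is a placeholder name -->
<field name="body" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```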



How to delete records that don't contain a field?

2007-12-03 Thread Jeff Leedy
I was wondering if there was a way to post a delete query using curl to 
delete all records that do not contain a certain field--something like this:


curl http://localhost:8080/solr/update --data-binary 
'<delete><query>-_title:[* TO *]</query></delete>' -H 
'Content-type:text/xml; charset=utf-8'


The minus syntax seems to return the correct list of ids (that is, all 
records that do not contain the _title field) when I use the Solr 
administrative console to do the above query, so I'm wondering if Solr 
just doesn't support this type of delete.


Thanks for any help...


Re: How to delete records that don't contain a field?

2007-12-03 Thread Yonik Seeley
On Dec 3, 2007 5:22 PM, Jeff Leedy [EMAIL PROTECTED] wrote:

 I was wondering if there was a way to post a delete query using curl to
 delete all records that do not contain a certain field--something like
 this:

 curl http://localhost:8080/solr/update --data-binary
 '<delete><query>-_title:[* TO *]</query></delete>' -H
 'Content-type:text/xml; charset=utf-8'

 The minus syntax seems to return the correct list of ids (that is, all
 records that do not contain the _title field) when I use the Solr
 administrative console to do the above query, so I'm wondering if Solr
 just doesn't support this type of delete.


Not yet... it makes sense to support this in the future though.

-Yonik


1.2 commit script chokes on 1.2 response format

2007-12-03 Thread Charles Hornberger
Like others before me, I stumbled across this bug, where
solr/bin/commit warns that a commit failed when in fact it succeeded
quite nicely, while getting collection distribution up & running
today:

http://www.mail-archive.com/solr-user@lucene.apache.org/msg04585.html

It's a trivial fix, and it seems like it's already been done in trunk:


http://svn.apache.org/viewvc/lucene/solr/trunk/src/scripts/commit?r1=543259&r2=555612&view=patch

The change has not been applied to 1.2. It might be nice if it were.

-Charlie


RE: How to delete records that don't contain a field?

2007-12-03 Thread Norskog, Lance
Wouldn't this be: *:* AND <negative query>?
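Until delete-by-negative-query is supported, a hypothetical two-step workaround (base URL and row count are assumptions): select the ids of documents missing _title using the match-all plus negative clause, then delete each one by id, which the update handler does support.

```python
import urllib.parse

SOLR = "http://localhost:8080/solr"  # assumed base URL from the thread

def select_missing_title_url(rows=1000):
    # A purely negative clause must be anchored to a match-all query
    # in Lucene syntax: +*:* -_title:[* TO *]
    q = urllib.parse.quote("+*:* -_title:[* TO *]")
    return f"{SOLR}/select?q={q}&fl=id&rows={rows}"

def delete_by_id_xml(doc_id):
    # Delete-by-id works today, unlike delete-by-negative-query.
    return f"<delete><id>{doc_id}</id></delete>"
```

Each returned id would then be POSTed to /solr/update as a separate delete message, followed by a commit.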

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Monday, December 03, 2007 2:23 PM
To: solr-user@lucene.apache.org
Subject: Re: How to delete records that don't contain a field?

On Dec 3, 2007 5:22 PM, Jeff Leedy [EMAIL PROTECTED] wrote:

 I was wondering if there was a way to post a delete query using curl 
 to delete all records that do not contain a certain field--something 
 like
 this:

 curl http://localhost:8080/solr/update --data-binary
 '<delete><query>-_title:[* TO *]</query></delete>' -H 
 'Content-type:text/xml; charset=utf-8'

 The minus syntax seems to return the correct list of ids (that is, all

 records that do not contain the _title field) when I use the Solr 
 administrative console to do the above query, so I'm wondering if Solr

 just doesn't support this type of delete.


Not yet... it makes sense to support this in the future though.

-Yonik


Re: CJK Analyzers for Solr

2007-12-03 Thread James liu
It seems good.

On Dec 3, 2007 1:01 AM, Ken Krugler [EMAIL PROTECTED] wrote:

 Wunder - are you aware of any free dictionaries
 for either C or J or K?  When I dealt with this
 in the past, I looked for something free, but
 found only commercial dictionaries.

 I would use data files from:

 http://ftp.monash.edu.au/pub/nihongo/00INDEX.html

 -- Ken


 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

 - Original Message -
 From: Walter Underwood [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Wednesday, November 28, 2007 5:43:32 PM
 Subject: Re: CJK Analyzers for Solr

 With Ultraseek, we switched to a dictionary-based segmenter for Chinese
 because the N-gram highlighting wasn't acceptable to our Chinese
 customers. I guess it is something to check for each application.

 wunder

 On 11/27/07 10:46 PM, Otis Gospodnetic [EMAIL PROTECTED] wrote:

 For what it's worth, I worked on indexing and searching a *massive* pile
 of data, a good portion of which was in CJ and some K. The n-gram
 approach was used for all 3 languages, and the quality of search results,
 including highlighting, was evaluated and okay-ed by native speakers of
 these languages.

 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

 - Original Message -
 From: Walter Underwood [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Tuesday, November 27, 2007 2:41:38 PM
 Subject: Re: CJK Analyzers for Solr

 Dictionaries are surprisingly expensive to build and maintain, and
 bi-gram is surprisingly effective for Chinese. See this paper:

 http://citeseer.ist.psu.edu/kwok97comparing.html

 I expect that n-gram indexing would be less effective for Japanese
 because it is an inflected language. Korean is even harder. It might
 work to break Korean into the phonetic subparts and use n-gram on those.

 You should not do term highlighting with any of the n-gram methods.
 The relevance can be very good, but the highlighting just looks dumb.

 wunder

 On 11/27/07 8:54 AM, Eswar K [EMAIL PROTECTED] wrote:

 Is there any specific reason why the CJK analyzers in Solr were chosen
 to be n-gram based instead of a morphological analyzer, which is kind of
 what Google implements, as it is considered to be more effective than
 the n-gram ones?

 Regards,
 Eswar

 On Nov 27, 2007 7:57 AM, Eswar K [EMAIL PROTECTED] wrote:

 thanks james...

 How much time does it take to index 18m docs?

 - Eswar

 On Nov 27, 2007 7:43 AM, James liu [EMAIL PROTECTED] wrote:

 i not use HYLANDA analyzer.

 i use je-analyzer and indexing at least 18m docs.

 i m sorry i only use chinese analyzer.

 On Nov 27, 2007 10:01 AM, Eswar K [EMAIL PROTECTED] wrote:

 What is the performance of these CJK analyzers (one in lucene and
 hylanda)? We would potentially be indexing millions of documents.

 James,

 We would have a look at hylanda too. What abt japanese and korean
 analyzers, any recommendations?

 - Eswar

 On Nov 27, 2007 7:21 AM, James liu [EMAIL PROTECTED] wrote:

 I don't think NGram is good method for Chinese.

 CJKAnalyzer of Lucene is 2-Gram.

 Eswar K:

 if it is chinese analyzer, i recommend hylanda (www.hylanda.com), it
 is the best chinese analyzer and it not free.

 if u wanna free chinese analyzer, maybe u can try je-analyzer. it
 have some problem when using it.

 On Nov 27, 2007 5:56 AM, Otis Gospodnetic [EMAIL PROTECTED] wrote:

 Eswar,

 We've used the NGram stuff that exists in Lucene's contrib/analyzers
 instead of CJK. Doesn't that allow you to do everything that the
 Chinese and CJK analyzers do? It's been a few months since I've
 looked at the Chinese and CJK Analyzers, so I could be off.

 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

 - Original Message -
 From: Eswar K [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Monday, November 26, 2007 8:30:52 AM
 Subject: CJK Analyzers for Solr

 Hi,

 Does Solr come with Language analyzers for CJK? If not, can you
 please direct me to some good CJK analyzers?

 Regards,
 Eswar

 --
 regards
 jl


 --
 Ken Krugler
 Krugle, Inc.
 +1 530-210-6378
 If you can't find it, you can't fix it




-- 
regards
jl
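For reference, the 2-gram tokenization that the thread attributes to Lucene's CJKAnalyzer can be sketched as follows (a simplification; the real analyzer also handles script boundaries, whitespace, and Latin tokens):

```python
def cjk_bigrams(text):
    """Produce overlapping character bigrams from a run of CJK text,
    the 2-gram scheme discussed in the thread (simplified sketch)."""
    if len(text) < 2:
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]
```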