Phrase matching on a text field
Hi, I'm trying to figure out why phrase matching on a text field only works some of the time. I have a SOLR index containing a document titled "FUTURE DIRECTIONS FOR INTEGRATED CATCHMENT". The "FOR" seems to be causing a problem... The title field is indexed as both s_title and t_title (string and text, as defined in the demo schema), thus:

  <field name="title" type="string" indexed="false" stored="false" multiValued="false"/>
  <field name="s_title" type="string" indexed="true" stored="true" multiValued="false"/>
  <field name="t_title" type="text" indexed="true" stored="false" multiValued="false"/>
  <copyField source="title" dest="s_title"/>
  <copyField source="title" dest="t_title"/>

I can match the document with an exact query on the string:

  q=s_title:"FUTURE DIRECTIONS FOR INTEGRATED CATCHMENT"

I can match the document with this phrase query on the text:

  q=t_title:"future directions"

which uses the parsedquery shown by debugQuery=true:

  <str name="rawquerystring">t_title:"future directions"</str>
  <str name="querystring">t_title:"future directions"</str>
  <str name="parsedquery">PhraseQuery(t_title:"futur direct")</str>
  <str name="parsedquery_toString">t_title:"futur direct"</str>

Similarly, I can match the document with this query:

  q=t_title:"integrated catchment"

which uses the parsedquery shown by debugQuery=true:

  <str name="rawquerystring">t_title:"integrated catchment"</str>
  <str name="querystring">t_title:"integrated catchment"</str>
  <str name="parsedquery">PhraseQuery(t_title:"integr catchment")</str>
  <str name="parsedquery_toString">t_title:"integr catchment"</str>

But I cannot match the document with the query:

  q=t_title:"future directions for integrated catchment"

which uses the phrase query shown by debugQuery=true:

  <str name="rawquerystring">t_title:"future directions for integrated catchment"</str>
  <str name="querystring">t_title:"future directions for integrated catchment"</str>
  <str name="parsedquery">PhraseQuery(t_title:"futur direct integr catchment")</str>
  <str name="parsedquery_toString">t_title:"futur direct integr catchment"</str>

Any wisdom gratefully accepted. Cheers, -- Phil

640K ought to be enough for anybody. -- Bill Gates, in 1981
What are the Unicode encodings supported by Solr?
Hi, I'd like to know about the different Unicode [/any other?] encodings supported by Solr for posting docs [through SolrJ in my case]. Is it just UTF-8 and UCN that are supported, or are other character encodings like NCR(decimal), NCR(hex) etc. supported as well? Now the problem is that while automating the crawling and indexing process for Solr, I found that for most pages the encoding is UTF-8 [in this case searching works fine], but for other pages the encoding is some other character encoding [like NCR(dec), NCR(hex), or maybe something else; I don't have much idea on this]. So when I fetch the page content through Java methods using InputStreamReaders, what I obtain after stripping various tags is raw text in some encoding not supported by Solr. So either I have to configure Solr to support these other encodings as well [only if that is possible], or convert whatever the raw text is to UTF-8 using some standard encoders [this solution seems better to me, provided I'm able to detect the encoding of the input]. I'd like to know if there are standard encoders available for this purpose [there must be, right? didn't google much]. Any advice on this is highly appreciated. An off-beat Q: In some of the pages I'm getting some \ufffd chars, which I think is some sort of unmappable [by Java?] character, right? Any idea on how to handle this? Just replacing it with a blank char will not do [this depends on the requirement, though]. Thanks, KK.
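[For the "standard encoders" part of the question, one common option - an assumption here, not something KK mentions - is ICU4J's CharsetDetector, which guesses the charset of raw bytes and decodes them to a Java String; SolrJ then sends the string over HTTP as UTF-8. A minimal sketch, assuming the icu4j jar is on the classpath:

  import com.ibm.icu.text.CharsetDetector;
  import com.ibm.icu.text.CharsetMatch;

  public class EncodingSniffer {
      // Guess the charset of undecoded page bytes and return decoded text.
      public static String toUnicode(byte[] rawBytes) {
          CharsetDetector detector = new CharsetDetector();
          detector.setText(rawBytes);             // feed the raw, undecoded bytes
          CharsetMatch match = detector.detect(); // best-guess charset match
          return match.getString();               // decode using the detected charset
      }
  }

One caveat: NCR(dec)/NCR(hex) (e.g. &#233; / &#xE9;) are HTML numeric character references inside the markup, not byte-level encodings, so they are handled by unescaping HTML entities during tag stripping rather than by charset conversion.]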
Dataimporthandler Timestamp Error ?
Hi, when I do a full import I get the following error:

  Caused by: java.sql.SQLException: Cannot convert value '0000-00-00 00:00:00' from column 10 to TIMESTAMP.
          at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1055)
          at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:956)
          at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:926)
          at com.mysql.jdbc.ResultSetRow.getTimestampFast(ResultSetRow.java:1321)
          at com.mysql.jdbc.BufferRow.getTimestampFast(BufferRow.java:573)
          at com.mysql.jdbc.ResultSetImpl.getTimestampInternal(ResultSetImpl.java:6617)
          at com.mysql.jdbc.ResultSetImpl.getTimestamp(ResultSetImpl.java:5943)
          at com.mysql.jdbc.ResultSetImpl.getObject(ResultSetImpl.java:4901)
          at com.mysql.jdbc.ResultSetImpl.getObject(ResultSetImpl.java:4951)
          at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.getARow(JdbcDataSource.java:220)
          ... 11 more
  Caused by: java.sql.SQLException: Value '[...@14f9f4a' can not be represented as java.sql.Timestamp
          at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1055)
          at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:956)
          at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:926)
          at com.mysql.jdbc.ResultSetRow.getTimestampFast(ResultSetRow.java:1027)
          ... 17 more

But I thought the Timestamp is generated automatically and has nothing to do with my MySQL database? best regards, Sebastian
Re: Dataimporthandler Timestamp Error ?
you may need to change the MySQL connection parameters so that the driver does not throw an error for zero dates:

  jdbc:mysql://localhost/test?zeroDateTimeBehavior=convertToNull

On Thu, May 7, 2009 at 1:39 PM, gateway0 reiterwo...@yahoo.de wrote: Hi, when I do a full import I get the following error: Cannot convert value '0000-00-00 00:00:00' from column 10 to TIMESTAMP. [...] -- - Noble Paul | Principal Engineer| AOL | http://aol.com
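[In DataImportHandler terms, that parameter goes on the JDBC URL of the data source in data-config.xml. A sketch - driver/user/password values are illustrative, not from the thread:

  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/test?zeroDateTimeBehavior=convertToNull"
              user="user" password="password"/>
]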
Re: Creating new QParserPlugin
Hi! I agree that Solr is difficult to extend in many cases. We just patch Solr, and I guess many other users patch it too. What I propose is to create some Solr community site (Solr incubator?) to publish patches, so the Solr core team could look there and choose patches to apply to the Solr codebase. I know that one can use Jira for that, but it's not convenient to use it in this way. On Thu, May 7, 2009 at 2:41 AM, KaktuChakarabati jimmoe...@gmail.com wrote: Hello everyone, I am trying to write a new QParserPlugin+QParser, one that will work similarly to how DisMax does, but will give me more control over the FunctionQuery-related part of the query processing (e.g. in regard to a specified bf parameter). Specifically, I want to be able to affect the way the queryNorm (and possibly other factors) interact with a pre-computed value I store in a static field (i.e. I compute an index-time score for a document that I wish to use in a bf as a ValueSource, without being affected by queryNorm or other such extraneous considerations). While trying this, I notice I run a lot into cases where some parts I try to override/inherit from are private to a Java package namespace, and this makes the whole thing very cumbersome. Examples of this are the DismaxQParser class, which is defined as a local class inside the DisMaxQParserPlugin.java file (I think this is bad practice - FunctionQParserPlugin/FunctionQParser do have their own separate files, and I think that is a good convention to follow generally). Another case is where I try to inherit from FunctionQParser and end up not being able to replicate some of the parse() logic, because it uses the QueryParsing.StrParser class, which is a static inner class and so is only accessible from the solr.search namespace. In short, many such cases seem to arise, and I think this poses a considerable limitation on the possibilities of extending Solr. If this resonates with more people here, I'd take this issue up with solr-dev. Otherwise, if some of you have notions about doing what I'm trying to do differently, I would be happy to hear them. Thanks, -Chak -- Andrew Klochkov
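[For reference, the plugin surface itself is small. A minimal hedged sketch of a custom parser plugin - this is not Chak's actual code; it simply delegates to another registered parser, which is where custom FunctionQuery/queryNorm handling would be hooked in:

  import org.apache.lucene.queryParser.ParseException;
  import org.apache.lucene.search.Query;
  import org.apache.solr.common.params.SolrParams;
  import org.apache.solr.common.util.NamedList;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.search.QParser;
  import org.apache.solr.search.QParserPlugin;

  public class MyQParserPlugin extends QParserPlugin {
      public void init(NamedList args) { }

      public QParser createParser(String qstr, SolrParams localParams,
                                  SolrParams params, SolrQueryRequest req) {
          return new QParser(qstr, localParams, params, req) {
              public Query parse() throws ParseException {
                  // Delegate to an existing parser, then adjust the resulting
                  // Query (e.g. wrap or rescale boosts) before returning it.
                  QParser delegate = QParser.getParser(qstr, "dismax", req);
                  return delegate.parse();
              }
          };
      }
  }

It would be registered in solrconfig.xml with <queryParser name="myparser" class="com.example.MyQParserPlugin"/> and selected per-request with defType=myparser (names illustrative).]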
Re: Dataimporthandler Timestamp Error ?
Awesome, thanks!!! I first thought it could be blob-field related. Have a nice day, Sebastian Noble Paul നോബിള് नोब्ळ्-2 wrote: you may need to change the MySQL connection parameters so that the driver does not throw an error for zero dates: jdbc:mysql://localhost/test?zeroDateTimeBehavior=convertToNull [...]
French and SpellingQueryConverter
Hi, I have tried to run the following code:

  package org.apache.solr.spelling;

  import org.apache.lucene.analysis.fr.FrenchAnalyzer;

  public class Test {
      public static void main(String[] args) {
          SpellingQueryConverter sqc = new SpellingQueryConverter();
          sqc.analyzer = new FrenchAnalyzer();
          System.out.println(sqc.convert("français"));
      }
  }

I would expect to get [(français,0,8,type=ALPHANUM)]. However, I get [(fran,0,4,type=ALPHANUM), (ais,5,8,type=ALPHANUM)]. Is there any issue with the support of special characters? Thanks, Jonathan
Re: no subject aka Replication Stall
We have not pushed the fix into production yet. However, I am wondering two things. 1. If the download takes more than 10 seconds (our replication can take up to 90 seconds), will that be an issue? 2. There are 3 patches; 2 have 2-line changes, 1 has a large amount. Do we need the latest 2 or just the latest 1? -- Jeff Newburn Software Engineer, Zappos.com jnewb...@zappos.com - 702-943-7562 From: Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com Reply-To: solr-user@lucene.apache.org Date: Wed, 6 May 2009 10:05:49 +0530 To: solr-user@lucene.apache.org Subject: Re: no subject aka Replication Stall SOLR-1096
Re: Solr Plugins Simple Questions for the Simpleton
On May 6, 2009, at 3:25 PM, Jeff Newburn wrote:

  We are trying to implement a SearchComponent plugin. I have been looking at QueryElevationComponent trying to weed through what needs to be done. My basic desire is to get the results back and manipulate them, either by altering the actual results or the facets. Questions: 1. Do the components fire off in order or all individually? If so, how does one chain them together?

http://wiki.apache.org/solr/SearchComponent

  I apologize. This question was more looking for insight into how the requests are made. One more interesting question: what does each component get from the previous one? 2. Where are the actual documents returned (i.e. what object gets the return results)?

Look on the ResponseBuilder object.

  I looked into the javadoc for this class and the description is as follows: "This class is experimental and will be changing in the future." Are there any tips to point us in the right direction to use and manipulate this? Also, does this class get passed from component to component? 3. Is there any specific place I should manipulate the result set?

I've done it in the past right on the response docset/doclist, but I've seen others discourage this kind of thing b/c you might not know the downstream effects.

  So does the doc list get passed down the chain in the ResponseBuilder? 4. Can the individual documents be changed before returning to the client?

In what way?

  In a way that might manipulate what is returned. We have 2 potential avenues: 1. Change the document to remove some values out of a multivalued field, or 2. Change the facets returned. -- Jeff Newburn Software Engineer, Zappos.com jnewb...@zappos.com - 702-943-7562

-- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
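[To make the ResponseBuilder handoff concrete: the same ResponseBuilder instance is passed to every component in the chain, so a component registered after QueryComponent sees whatever QueryComponent stored. A hedged sketch - class name is illustrative, and the exact SolrInfoMBean boilerplate and accessors vary by Solr version:

  import java.io.IOException;
  import org.apache.solr.handler.component.ResponseBuilder;
  import org.apache.solr.handler.component.SearchComponent;
  import org.apache.solr.search.DocList;

  public class ResultTweakComponent extends SearchComponent {
      @Override
      public void prepare(ResponseBuilder rb) throws IOException {
          // prepare() runs for every component before any process() call
      }

      @Override
      public void process(ResponseBuilder rb) throws IOException {
          // QueryComponent has already run: the ranked page of doc ids is here
          DocList results = rb.getResults().docList;
          // ... build a modified DocList and assign it back to
          // rb.getResults().docList, or rewrite facet counts in the response
          // values before the response writer serializes them
      }

      @Override public String getDescription() { return "post-query result manipulation"; }
      @Override public String getSourceId()    { return "n/a"; }
      @Override public String getSource()      { return "n/a"; }
      @Override public String getVersion()     { return "1.0"; }
  }

It would be registered with <searchComponent name="tweak" class="..."/> and appended to a handler via <arr name="last-components"><str>tweak</str></arr>, per the SearchComponent wiki page linked above.]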
Re: Upgrading from 1.2.0 to 1.3.0
this isn't advice on how to upgrade, but if you/your-project have a bit of time to wait, 1.4 sounds like it's getting close to an official release... FYI. cheers, rob On Tue, May 5, 2009 at 1:05 PM, Francis Yakin fya...@liquid.com wrote: What's the best way to upgrade Solr from 1.2.0 to 1.3.0? We have the current index that our users search running on the 1.2.0 Solr version. We would like to upgrade it to 1.3.0. We have a Master/Slaves env. What's the best way to upgrade it without affecting the search? Do we need to do it on the master or slaves first? Thanks, Francis
Re: Solr autocompletion in rails
Thanks a lot for the information. But I am still a bit confused about the use of TermsComponent. Like, where exactly are we going to put this code in Solr? For example, I changed schema.xml to add the autocomplete feature. I read your blog too, it's very helpful. But still a little confused. :-(( Can you explain it a bit?

Matt Weber-2 wrote: You will probably want to use the new TermsComponent in Solr 1.4. See http://wiki.apache.org/solr/TermsComponent. I just recently wrote a blog post about using autocompletion with TermsComponent, a servlet, and jQuery. You can probably follow these instructions, but instead of writing a servlet you can write a rails handler parsing the json output directly. http://www.mattweber.org/2009/05/02/solr-autosuggest-with-termscomponent-and-jquery/. Thanks, Matt Weber

On May 4, 2009, at 9:39 AM, manisha_5 wrote: Hi, I am new to solr. I am using a solr server to index the data and make search in a Ruby on Rails project. I want to add an autocompletion feature. I tried with the xml patch in the schema.xml file of solr, but don't know how to test if the feature is working. Also I haven't been able to integrate the same in the Rails project that is using Solr. Can anyone please provide some help in this regard?? The patch of code in schema.xml is:

  <fieldType name="autocomplete" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="15"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement="" replace="all"/>
      <filter class="solr.EdgeNGramFilterFactory" maxGramSize="100" minGramSize="1"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement="" replace="all"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="^(.{20})(.*)?" replacement="$1" replace="all"/>
    </analyzer>
  </fieldType>
Re: When should I optimize/
Great... thanks for the response! 2009/5/7 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: it is wise to optimize the index once in a while (daily, maybe). But it depends on how many commits you do in a day. Every commit causes fragmentation of index files, and your search can become slow if you do not optimize. But optimizing constantly is not recommended, because it is time consuming and your replication (if it is a master/slave setup) can take longer. If you do a delete-all, then do an optimize anyway. On Wed, May 6, 2009 at 9:18 PM, Eric Sabourin eric.sabourin2...@gmail.com wrote: Is the optimize xml command something which is only required when I delete all the docs? Or should I also send the optimize command following other operations? Or daily? Thanks... Eric -- - Noble Paul | Principal Engineer| AOL | http://aol.com -- Eric Sent from Halifax, NS, Canada
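[For reference, the optimize command discussed here is just an XML message posted to the update handler; a sketch assuming the example server's default URL:

  curl http://localhost:8983/solr/update -H "Content-Type: text/xml" --data-binary "<optimize/>"

A plain commit is the same call with <commit/> as the body.]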
Is it possible to writing solr result on disk from the server side?
Do you know if it's possible to write Solr results directly to a hard disk on the server side, rather than using an HTTP connection to transfer the results? While the query time is very fast for Solr, I want to do this because of the time taken to transfer the results between the client and the Solr server when you have a lot of rows. For instance, for 10,000 rows the query time could be 50 ms, but it takes 19 s to get the results from the server. As my client and server are on the same system, I could get the results faster directly from the hard disk (or better, a RAM disk). Is it possible to configure Solr for that? Regards,
Re: no subject aka Replication Stall
the patches have gone into the trunk. The latest patch should be the one if you wish to run a patched Solr. 10 secs readTimeout means that if there is no data coming from the other end for 10 secs, then the waiting thread returns, throwing an exception. It is not the total time taken to read the entire data. At least that is what I observed while testing. BTW, if the timeout occurs it resumes from the point where the failure happened. It retries 5 times before giving up. On Thu, May 7, 2009 at 7:32 PM, Jeff Newburn jnewb...@zappos.com wrote: We have not pushed the fix into production yet. However, I am wondering two things. [...] -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Is it possible to writing solr result on disk from the server side?
did you consider using an EmbeddedSolrServer? On Thu, May 7, 2009 at 8:25 PM, arno13 arnaud.gaudi...@healthonnet.org wrote: Do you know if it's possible to writing solr results directly on a hard disk from server side and not to use an HTTP connection to transfer the results? [...] -- - Noble Paul | Principal Engineer| AOL | http://aol.com
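[A minimal sketch of that suggestion - running Solr in-process via SolrJ's EmbeddedSolrServer, so results come back as Java objects with no HTTP hop. Assumes solr.solr.home points at a valid single-core Solr home (the empty core name targets that setup); the path is illustrative:

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.core.CoreContainer;

  public class EmbeddedQuery {
      public static void main(String[] args) throws Exception {
          System.setProperty("solr.solr.home", "/path/to/solr/home");
          CoreContainer.Initializer initializer = new CoreContainer.Initializer();
          CoreContainer container = initializer.initialize();
          EmbeddedSolrServer server = new EmbeddedSolrServer(container, "");

          SolrQuery query = new SolrQuery("*:*");
          query.setRows(10000);                 // large page, no HTTP transfer cost
          QueryResponse rsp = server.query(query);
          System.out.println(rsp.getResults().getNumFound());

          container.shutdown();
      }
  }
]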
Re: no subject aka Replication Stall
Excellent! Thank you. I am going to start testing that. -- Jeff Newburn Software Engineer, Zappos.com jnewb...@zappos.com - 702-943-7562 From: Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com Reply-To: solr-user@lucene.apache.org Date: Thu, 7 May 2009 20:26:02 +0530 To: solr-user@lucene.apache.org Subject: Re: no subject aka Replication Stall the patches have gone into the trunk. The latest patch should be the one if you wish to run a patched Solr. [...]
Re: large index vs multicore
Hi, and sorry for slightly hijacking the thread. On Mar 26, 2009, at 2:54, Otis Gospodnetic wrote: Hi, Without knowing the details, I'd say keep it in the same index if the additional information shares some/enough fields with the main product data, and separately if it's sufficiently distinct (this also means 2 queries and manual merging/joining). Where would this manual merging/joining occur? At the client side, or inside Solr before returning the results? I was wondering what relevancy, sorting, etc. would become. -- Nicolas Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Manepalli, Kalyan kalyan.manepa...@orbitz.com To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Wednesday, March 25, 2009 5:46:40 PM Subject: large index vs multicore Hi All, In my project, I have one primary core containing all the basic information for a product. Now I need to add additional information which will be searched and displayed in conjunction with the product results. My question is: from a design and query-speed point of view, should I add a new core to handle the additional data, or should I add the data to the existing core? The data size is not very large, around 150,000 - 200,000 documents. Any insights into this will be helpful. Thanks, Kalyan Manepalli -- Nicolas Pastorino Consultant - Trainer - System Developer Phone : +33 (0)4.78.37.01.34 eZ Systems ( Western Europe ) | http://ez.no
RE: When should I optimize/
We do optimize once a day at 1am. Ching-hsien Wang, Manager Library and Archives System Support Branch Office of Chief Information Officer Smithsonian Institution 202-633-5581(office) 202-312-2874(fax) wan...@si.edu Visit us online: www.siris.si.edu -----Original Message----- From: Eric Sabourin [mailto:eric.sabourin2...@gmail.com] Sent: Thursday, May 07, 2009 10:52 AM To: solr-user@lucene.apache.org Subject: Re: When should I optimize/ Great... thanks for the response! [...]
Re: Phrase matching on a text field
The string fieldtype is not tokenized, while the text fieldtype is tokenized. So the stop word "for" is being removed by a stop word filter, which doesn't happen with the string field type (no tokenizing). Have a look at the schema.xml in the example dir and look at the default configuration for both the text and string fieldtypes. The string fieldtype is not analyzed, whereas the text fieldtype has a number of different filters that take action. -Jay On Wed, May 6, 2009 at 11:09 PM, Phil Chadwick p.chadw...@internode.on.net wrote: Hi, I'm trying to figure out why phrase matching on a text field only works some of the time. I have a SOLR index containing a document titled "FUTURE DIRECTIONS FOR INTEGRATED CATCHMENT". The "FOR" seems to be causing a problem... [...]
Re: French and SpellingQueryConverter
It seems to me that this is just the expected behavior of the FrenchAnalyzer using the FrenchStemmer. I'm not familiar with the French language, but in English words like running, runner, and runs are all stemmed down to run as intended. I don't know what other words in French would stem down to franc, but wouldn't this be what you would want? If not, maybe experiment with some of the other Analyzers to see if they give you what you need. -Jay On Thu, May 7, 2009 at 6:51 AM, Jonathan Mamou ma...@il.ibm.com wrote: Hi I have tried to run the following code package org.apache.solr.spelling; import org.apache.lucene.analysis.fr.FrenchAnalyzer; public class Test { public static void main (String args[]) { SpellingQueryConverter sqc = new SpellingQueryConverter(); sqc.analyzer = new FrenchAnalyzer(); System.out.println(sqc.convert(français)); }; }}; I would expect to get [(français,0,8,type=ALPHANUM)] However I get [(fran,0,4,type=ALPHANUM), (ais,5,8,type=ALPHANUM)] Is there any issue with the support of special characters? Thanks Jonathan
RE: What are the Unicode encodings supported by Solr?
Hi KK, On 5/7/2009 at 2:55 AM, KK wrote: In some of the pages I'm getting some \ufffd chars which I think is some sort of unmappable[by Java?] character, right?. Any idea on how to handle this? Just replacing with blank char will not do [this depends on the requirement, though]. From http://www.unicode.org/charts/PDF/UFFF0.pdf: FFFD: REPLACEMENT CHARACTER: used to replace an incoming character whose value is unknown or unrepresentable in Unicode. Also, from http://www.unicode.org/versions/Unicode5.1.0/: Applications are free to use any of these noncharacter code points internally but should never attempt to exchange them. If a noncharacter is received in open interchange, an application is not required to interpret it in any way. It is good practice, however, to recognize it as a noncharacter and to take appropriate action, such as replacing it with U+FFFD REPLACEMENT CHARACTER, to indicate the problem in the text. It is not recommended to simply delete noncharacter code points from such text, because of the potential security issues caused by deleting uninterpreted characters. (See conformance clause C7 in Section 3.2, Conformance Requirements, and Unicode Technical Report #36, Unicode Security Considerations.) So if you're seeing \ufffd in text, you (or someone before you in the processing chain) attempted to convert the text from some other encoding into Unicode, but the encoding conversion failed (no target Unicode character corresponding to the source character). This can happen when attempting to convert from an incorrectly identified source encoding. Steve
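[Steve's point in code form: a hedged sketch of strict decoding with the standard java.nio.charset API, so a wrongly guessed source encoding fails fast instead of silently yielding U+FFFD in the indexed text (class name is illustrative):

  import java.nio.ByteBuffer;
  import java.nio.charset.CharacterCodingException;
  import java.nio.charset.Charset;
  import java.nio.charset.CharsetDecoder;
  import java.nio.charset.CodingErrorAction;

  public class StrictDecode {
      // Decode bytes strictly: REPORT makes malformed or unmappable input
      // throw CharacterCodingException rather than substituting U+FFFD.
      public static String decodeStrict(byte[] bytes, String charsetName)
              throws CharacterCodingException {
          CharsetDecoder decoder = Charset.forName(charsetName).newDecoder()
              .onMalformedInput(CodingErrorAction.REPORT)
              .onUnmappableCharacter(CodingErrorAction.REPORT);
          return decoder.decode(ByteBuffer.wrap(bytes)).toString();
      }
  }

Catching the exception is a cheap way to detect a mis-identified source encoding before the page reaches Solr.]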
Sorting by 'starts with'
I have an index of product names. I'd like to sort results so that entries starting with the user query come first. E.g. q=kitchen. Results would sort something like: 1. kitchen appliance 2. kitchenaid dishwasher 3. fridge for kitchen. It looks like using a query-based FunctionQuery comes close, but I don't know how to write a subquery that only matches if the value starts with the query string. Has anyone solved a similar need? Thanks, Wojtek
Solr spring application context error
I have configured Solr using Tomcat. Everything works fine. I overrode QParserPlugin and configured it. The overridden QParserPlugin has a dependency on another project, say project1, so I made a jar of that project and copied the jar to the solr/home lib dir. The project1 project is using Spring. It has a factory class which loads the beans. I am using this factory class in the QParserPlugin to get a bean. When I start my Tomcat, the factory class loads fine. But the problem is it's not loading the beans, and I am getting the exception: org.springframework.beans.factory.BeanDefinitionStoreException: IOException parsing XML document from class path resource [com/mypackage/applicationContext.xml]; nested exception is java.io.FileNotFoundException: class path resource [com/mypackage/applicationContext.xml] cannot be opened because it does not exist Do I need to do something else? Can anybody please help me. Thanks, Raju
Re: solrcofig.xml - need some info
This is resolved. I solved it by reading the SolrPlugins page on the Solr wiki. Thanks, Raju Raju444us wrote: Hi Hoss, If I extend SolrQueryParser and override the method getFieldQuery for some customization, can I configure my new query parser something like below?

  <requestHandler name="standard" class="solr.MynewParser" default="true">
    <!-- default values for query parameters -->
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <!--
      <int name="rows">10</int>
      <str name="fl">*</str>
      <str name="version">2.1</str>
      -->
    </lst>
  </requestHandler>

Do I need to place my new parser class in the solr/home/lib folder? Is this the right way to do this? Thanks, Raju hossman wrote: : I am pretty new to solr. I was wondering what is this mm attribute in : requestHandler in solrconfig.xml and how it works. Tried to search wiki : could not find it Hmmm... yeah, wiki search does mid-word matching doesn't it? the key thing to realize is that the requestHandler you were looking at when you saw that option was the DisMaxRequestHandler... http://wiki.apache.org/solr/DisMaxRequestHandler -Hoss
Re: Control segment size
Thanks Otis. I did set maxMergeDocs to 10M, but I still see a couple of index files over 30G, which does not match that max number of documents. Here are some numbers: 1) My total index size = 66GB 2) Number of total documents = 200M 3) 1M docs = 300MB 4) 10M docs should be roughly around 3-4GB. Under the index I see:

  -rw-r--r-- 1 dssearch staff 31771545312 May 6 14:15 _2tp.cfs
  -rw-r--r-- 1 dssearch staff 31932190573 May 7 08:13 _5ne.cfs
  -rw-r--r-- 1 dssearch staff   543118747 May 7 08:32 _5p2.cfs
  -rw-r--r-- 1 dssearch staff   543124452 May 7 08:53 _5qr.cfs
  -rw-r--r-- 1 dssearch staff   543100201 May 7 09:18 _5sg.cfs
  ..
  ..

As you can see, a couple of files are huge. Are those documents or index files? How can I control the file size so no single file grows beyond 10GB? Thanks, -vivek On Thu, Apr 23, 2009 at 10:26 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi, You are looking for maxMergeDocs, I believe. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: vivek sar vivex...@gmail.com To: solr-user@lucene.apache.org Sent: Thursday, April 23, 2009 1:08:20 PM Subject: Control segment size Hi, Is there any configuration to control the segments' file size in Solr? Currently, I've an index (70G) with 80 segment files, and one of the files is 24G. We noticed that in some cases commit takes over 2 hours to complete (committing 50K records), whereas usually it finishes in 20 seconds. After further investigation it turns out the system was doing a lot of paging - the file system buffer was trying to write the big segment back to disk. I've got 20G memory on the system with 6G assigned to Solr (running 2 instances). It seems if I can keep the segment size to a max of 4-5GB I'll be OK. Is there any way to do so? I've got a merge factor of 100 - does that impact the size too? Why do different segments have different sizes? Thanks, -vivek
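[For reference, these knobs live in solrconfig.xml's index section; a hedged sketch with illustrative values. One likely explanation for the 30G files (an inference, not from the thread): maxMergeDocs only constrains segments produced by future merges, so segments that already grew large stay that way until the index is rebuilt or optimized:

  <mainIndex>
    <useCompoundFile>true</useCompoundFile>
    <mergeFactor>10</mergeFactor>          <!-- fewer segments merged at once than 100 -->
    <maxMergeDocs>10000000</maxMergeDocs>  <!-- cap docs per merged segment (~10M here) -->
  </mainIndex>
]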
Backups using Java-based Replication (forced snapshot)
On the page http://wiki.apache.org/solr/SolrReplication, it says the following: "Force a snapshot on master. This is useful to take periodic backups. command: http://master_host:port/solr/replication?command=snapshoot" This then puts the snapshot under the data directory. Perfectly reasonable thing to do. However, is it possible to have it take in a directory location and store the snapshot there? For instance, I may want to have it write to a specific directory that is being watched for backup data. Thanks, Grant -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: French and SpellingQueryConverter
Hi, It does not seem to be related to FrenchStemmer; the stemmer does not split a word into 2 words. I have checked with other words, and SpellingQueryConverter always splits words containing a special character. I think that the issue is in the SpellingQueryConverter class regex, Pattern.compile("(?:(?!(\\w+:|\\d+)))\\w+"). According to http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html, \w matches a word character: [a-zA-Z_0-9]. I think that special characters should also be added to the regex. Best regards, Jonathan

Jay Hill jayallenh...@gmail.com wrote: It seems to me that this is just the expected behavior of the FrenchAnalyzer using the FrenchStemmer. [...]
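[A small self-contained illustration of the character-class change Jonathan is suggesting (a hedged sketch, not the actual patch): Java's \w is ASCII-only, while the Unicode categories \p{L} (letters) and \p{M} (combining marks) keep accented words intact:

  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  public class TokenSplit {
      public static void main(String[] args) {
          // With \w+ the ç is excluded, so "français" splits into "fran" and "ais".
          // A Unicode-aware class keeps it as one token:
          Pattern unicodeWord = Pattern.compile("[\\p{L}\\p{M}_0-9]+");
          Matcher m = unicodeWord.matcher("français");
          while (m.find()) {
              System.out.println(m.group()); // prints: français
          }
      }
  }
]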
Autocommit blocking adds? AutoCommit Speedup?
Question 1: I see in DirectUpdateHandler2 that there is a read/write lock used between addDoc and commit. My mental model of the process was this: clients can add/update documents until the autocommit threshold is hit. At that point the commit tracker would schedule a background commit. The commit would run and NOT BLOCK subsequent adds. Clearly that's not happening, because when the autocommit background thread runs it takes the iwCommit lock, blocking anyone in addDoc trying to get the iwAccess lock. Is this just the way it is, or is it possible to configure Solr to process the pending documents in the background, queuing new documents in memory as before? Question 2: I ask this question because autocommits are taking a LONG time to complete, like 10-25 seconds. I have a 40M document index, many 10s of GBs. What can I do to speed this up? Thanks, Jim
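[For context, the autocommit threshold Jim refers to is configured in solrconfig.xml under the update handler; a sketch with illustrative values:

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxDocs>10000</maxDocs>  <!-- commit once this many docs are pending -->
      <maxTime>60000</maxTime>  <!-- or once the oldest pending doc is this many ms old -->
    </autoCommit>
  </updateHandler>
]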
Re: Autocommit blocking adds? AutoCommit Speedup?
On Thu, May 7, 2009 at 5:03 PM, Jim Murphy jim.mur...@pobox.com wrote: Question 1: I see in DirectUpdateHandler2 that there is a read/write lock used between addDoc and commit. [...] Background: in the past, you had to close the Lucene IndexWriter so all changes would be flushed to disk (and you could then open a new IndexReader to see the changes). You obviously can't be adding new documents while you're trying to close the writer - hence the locking. It was also the case that readers and writers had to be opened and closed in the right way to handle things like deletes (which had to go through the reader). This is no longer the case, and we should revisit the locking. I do think we should be able to continue indexing while doing a commit. -Yonik http://www.lucidimagination.com
preImportDeleteQuery
Hi, I'm importing data using the DIH. I manage all my data updates outside of Solr, so I use the full-import command to update my index (with clean=false). Everything works fine, except that I can't delete documents easily using the DIH. I noticed the preImportDeleteQuery attribute, but it doesn't seem to do what I'm looking for. I'm looking to do something like: preImportDeleteQuery="ItemId={select ItemId from table where status='delete'}" http://issues.apache.org/jira/browse/SOLR-1059 SOLR-1059 seems to address this, but I couldn't find any documentation for it in the wiki. Can someone provide an example of how to use this? Thanks, Wojtek
Re: Solr autocompletion in rails
First, your solrconfig.xml should have something similar to the following:

  <searchComponent name="termsComp" class="org.apache.solr.handler.component.TermsComponent"/>

  <requestHandler name="/autoSuggest" class="org.apache.solr.handler.component.SearchHandler">
    <arr name="components">
      <str>termsComp</str>
    </arr>
  </requestHandler>

This will give you a request handler called /autoSuggest that you will use for suggestions. Then you need to write some rails code to access this. I am not very familiar with ruby, but I believe you might want to try http://wiki.apache.org/solr/solr-ruby. Make sure you set your query type to /autoSuggest. If that won't work for you, then just use the standard http libraries to access the autoSuggest url directly and get json output. With any of these methods make sure you set the following parameters:

  terms=true
  terms.fl=source_field
  terms.lower=input_term
  terms.prefix=input_term
  terms.lower.incl=false

For direct access to the json output you will want these as well:

  indent=true
  wt=json

The terms.fl parameter specifies the field(s) you want to use as the source for suggestions. Make sure this field has very little processing done on it, maybe lowercasing and tokenization only. Here is an example url that should give you some output once things are working:

  http://localhost:8983/solr/autoSuggest?terms=true&terms.fl=spell&terms.lower=t&terms.prefix=t&terms.lower.incl=false&indent=true&wt=json

The next thing is to parse the json output and do whatever you want with the results. In my example, I just printed out each suggestion on a single line of the response because this is what the jQuery autocomplete plugin wanted. The easiest way to parse the json output is to use the json ruby library, http://json.rubyforge.org/. After you have your rails controller working, you can hook it into your FE with some javascript like I did in the example on my blog. Hope this helps. Thanks, Matt Weber eSr Technologies http://www.esr-technologies.com

On May 7, 2009, at 7:37 AM, manisha_5 wrote: Thanks a lot for the information. But I am still a bit confused about the use of TermsComponent. Like, where exactly are we going to put this code in Solr? [...]
Re: preImportDeleteQuery
On May 7, 2009, at 4:52 PM, wojtekpia wrote: Hi, I'm importing data using the DIH. I manage all my data updates outside of Solr, so I use the full-import command to update my index (with clean=false). [...] I haven't used those special variables, but I noticed an example of $skipDoc in the wiki under the Indexing Wikipedia example (http://wiki.apache.org/solr/DataImportHandler). -- Martin
bug? No highlighting results with dismax and q.alt=*:*
For the Drupal Apache Solr Integration module, we are exploring the possibility of doing facet browsing - since we are using dismax as the default handler, this would mean issuing a query with an empty q and falling back to to q.alt='*:*' or some other q.alt that matches all docs. However, I notice when I do this that we do not get any highlights back in the results despite defining a highlight alternate field. In contrast, if I force the standard request handler then I do get text back from the highlight alternate field: select/?q=*:*qt=standardhl=truehl.fl=bodyhl.alternateField=bodyhl.maxAlternateFieldLength=256 However, I then loose the nice dismax features of weighting the results using bq and bf parameters. So, is this a bug or the intended behavior? The relevant fragment of the solrconfig.xml is this: requestHandler name=partitioned class=solr.SearchHandler default=true lst name=defaults str name=defTypedismax/str str name=q.alt*:*/str !-- example highlighter config, enable per-query with hl=true -- str name=hltrue/str str name=hl.flbody/str int name=hl.snippets3/int str name=hl.mergeContiguoustrue/str !-- instructs Solr to return the field itself if no query terms are found -- str name=f.body.hl.alternateFieldbody/str str name=f.body.hl.maxAlternateFieldLength256/str Full solrconfig.xml and other files: http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/apachesolr/?pathrev=DRUPAL-6--1 -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: Autocommit blocking adds? AutoCommit Speedup?
Interesting. So is there a JIRA ticket open for this already? Any chance of getting it into 1.4? Its seriously kicking out butts right now. We write into our masters with ~50ms response times till we hit the autocommit then add/update response time is 10-30 seconds. Ouch. I'd be willing to work on submitting a patch given a better understanding of the issue. Jim -- View this message in context: http://www.nabble.com/Autocommit-blocking-adds---AutoCommit-Speedup--tp23435224p23438134.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Phrase matching on a text field
Hi Jay, Thank you for your response. The data relating to the string (s_title) defines *exactly* what was fed into the SOLR indexing. The string is not otherwise relevant to the question. The essence of my question is: why can the indexed text (t_title) not be phrase matched by the query on the text when the word "for" is present in the query? The following work (and I would expect them to work):

  q=s_title:"FUTURE DIRECTIONS FOR INTEGRATED CATCHMENT"
  q=t_title:"future directions"
  q=t_title:"integrated catchment"

The following does not work (and I would expect it to work):

  q=t_title:"directions for integrated"

The following does not work (not sure if I expect it to work or not):

  q=t_title:"directions integrated"

My reading is that if the "FOR" is removed in the text indexing, it should also be removed from the text query! I also added 'enablePositionIncrements="true"' to the text query analyzer to make it the same as the text index analyzer:

  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>

There was no change in the outcome. The definitions for text and string were exactly as in the SOLR 1.3 example schema. The section of that schema for text is shown below.

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/> <!-- enablePositionIncrements="true" -->
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>

Cheers, -- Phil

The art of being wise is the art of knowing what to overlook. -- William James

Jay Hill wrote: The string fieldtype is not tokenized, while the text fieldtype is tokenized. So the stop word "for" is being removed by a stop word filter... [...]
StatsComponent and 1.3
Foreword: I'm not a java developer :) OSVDB.org and datalossdb.org make use of solr pretty extensively via acts_as_solr. I found myself with a real need for some of the StatsComponent stuff (mainly the sum feature), so I pulled down a nightly build and played with it. StatsComponent proved perfect, but... the nightly build output seems to be different, and thus incompatible with acts_as_solr. Now, I realize this is more or less an acts_as_solr issue, but... Is it possible, with some degree of effort (obviously) for me to essentially port some of the functionality of StatsComponent to 1.3 myself? It's that, or waiting for 1.4 to come out and someone developing support for it into acts_as_solr, or myself fixing what I have for acts_as_solr to work with the output. I'm just trying to gauge the easiest solution :) Any feedback or suggestions would be grand. Thanks, Dave Open Security Foundation
Re: Autocommit blocking adds? AutoCommit Speedup?
On Thu, May 7, 2009 at 8:37 PM, Jim Murphy jim.mur...@pobox.com wrote: Interesting. So is there a JIRA ticket open for this already? Any chance of getting it into 1.4? No ticket currently open, but IMO it could make it for 1.4. It's seriously kicking our butts right now. We write into our masters with ~50ms response times till we hit the autocommit; then add/update response time is 10-30 seconds. Ouch. It's probably been made a little worse lately since Lucene now does fsync on index files before writing the segments file that points to those files. A necessary evil to prevent index corruption. I'd be willing to work on submitting a patch given a better understanding of the issue. Great, go for it! -Yonik http://www.lucidimagination.com
Re: Backups using Java-based Replication (forced snapshot)
makes sense. I'll open an issue On Fri, May 8, 2009 at 1:53 AM, Grant Ingersoll gsing...@apache.org wrote: On the page http://wiki.apache.org/solr/SolrReplication, it says the following: "Force a snapshot on master. This is useful to take periodic backups. command: http://master_host:port/solr/replication?command=snapshoot" [...] -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: preImportDeleteQuery
Are you doing a full-import or a delta-import? For delta-import there is a deletedPkQuery option which should meet your needs. On Fri, May 8, 2009 at 5:22 AM, wojtekpia wojte...@hotmail.com wrote: Hi, I'm importing data using the DIH. I manage all my data updates outside of Solr, so I use the full-import command to update my index (with clean=false). Everything works fine, except that I can't delete documents easily using the DIH. I noticed the preImportDeleteQuery attribute, but it doesn't seem to do what I'm looking for. I'm looking to do something like: preImportDeleteQuery=ItemId={select ItemId from table where status='delete'} http://issues.apache.org/jira/browse/SOLR-1059 SOLR-1059 seems to address this, but I couldn't find any documentation for it in the wiki. Can someone provide an example of how to use this? Thanks, Wojtek -- - Noble Paul | Principal Engineer| AOL | http://aol.com
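For reference, deletedPkQuery is declared on the entity in the DIH data-config.xml and runs during delta-import; a minimal sketch, with the table name item_table and the last_modified column as placeholders loosely borrowed from Wojtek's example:

  <entity name="item" pk="ItemId"
          query="select * from item_table"
          deltaQuery="select ItemId from item_table where last_modified > '${dataimporter.last_index_time}'"
          deletedPkQuery="select ItemId from item_table where status='delete'">
    <!-- column-to-field mappings go here -->
  </entity>

On each delta-import, rows returned by deletedPkQuery are removed from the index by primary key, which avoids bending preImportDeleteQuery to do incremental deletes.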
Re: Solr spring application context error
A point to keep in mind is that all the plugin code and everything else it needs must be put into the solrhome/lib directory. Where have you placed the file com/mypackage/applicationContext.xml? On Fri, May 8, 2009 at 12:19 AM, Raju444us gudipal...@gmail.com wrote: I have configured Solr using Tomcat. Everything works fine. I overrode QParserPlugin and configured it. The overridden QParserPlugin has a dependency on another project, say project1. So I made a jar of that project and copied the jar to the solr/home lib dir. The project1 project uses Spring. It has a factory class which loads the beans. I am using this factory class in QParserPlugin to get a bean. When I start my Tomcat the factory class loads fine. But the problem is it's not loading the beans, and I am getting the exception org.springframework.beans.factory.BeanDefinitionStoreException: IOException parsing XML document from class path resource [com/mypackage/applicationContext.xml]; nested exception is java.io.FileNotFoundException: class path resource [com/mypackage/applicationContext.xml] cannot be opened because it does not exist Do I need to do something else? Can anybody please help me? Thanks, Raju -- - Noble Paul | Principal Engineer| AOL | http://aol.com
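The FileNotFoundException suggests the context file was never packaged, rather than that Spring is misconfigured. A quick check, assuming the jar is named project1.jar (the name is a guess): run jar tf project1.jar and confirm that com/mypackage/applicationContext.xml appears in the listing. If it is missing, the XML was left out of the build, and no Solr-side configuration will let Spring resolve that classpath resource from solrhome/lib.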
Fwd: Solr MultiCore dataDir bug - a fix
I didn't notice that the mail was not sent to the list. Please send all your communication to the mailing list. -- Forwarded message -- From: Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com Date: 2009/5/8 Subject: Re: Solr MultiCore dataDir bug - a fix To: pasi.j.matilai...@tieto.com Are you sure that your solrconfig.xml does not have a dataDir tag? If it is there, it is supposed to take precedence over the one you have put in solr.xml. On Fri, May 8, 2009 at 10:43 AM, pasi.j.matilai...@tieto.com wrote: Hello, I encountered the problem yesterday that multicore Solr doesn't properly handle the dataDir setting in solr.xml, regardless of whether it's specified as a nested property or as an attribute of the core element. I found a mail thread where you, on March 4th, 2009, promised to have it fixed in a day or two. (http://markmail.org/message/oylfeldy53lebsfe#query:solr%20multicore%20datadir+page:1+mid:abfbhdxxt3r3zujs+state:results) Anyway, as the current Solr trunk didn't contain the fix, I hunted the bug down myself. And as I don't want to take the time to get an account to submit the patch to Solr SVN myself, I'm sending the fix to you instead. In the current trunk, in the SolrCore constructor, at line 491, there currently is:

  if (dataDir == null) dataDir = config.get("dataDir", cd.getDataDir());

I replaced this with the following code:

  if (dataDir == null) {
    if (cd.getDataDir() != null) dataDir = cd.getDataDir();
    else dataDir = config.get("dataDir", cd.getDefaultDataDir());
  }

I'm not sure this fully represents how this is supposed to work, but it works anyway. At least when I specify dataDir as an attribute of the core element with a path relative to instanceDir:

  <!-- instanceDir resolves to solr/current/ and dataDir to solr/current/data -->
  <core name="current" instanceDir="current" dataDir="data" />

Best regards, Pasi J. Matilainen, Software Engineer, Tieto Finland Oy -- - Noble Paul | Principal Engineer| AOL | http://aol.com
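For readers following along, the two competing declarations look roughly like this; a minimal multicore solr.xml sketch using Pasi's core name (the surrounding elements are the stock multicore layout):

  <solr persistent="true">
    <cores adminPath="/admin/cores">
      <!-- dataDir given as an attribute, resolved relative to instanceDir -->
      <core name="current" instanceDir="current" dataDir="data" />
    </cores>
  </solr>

Per Noble's comment above, if this core's own solrconfig.xml also contains a dataDir element, that value is supposed to win over the attribute here.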
Re: Phrase matching on a text field
Hi, I have tracked this problem to: https://issues.apache.org/jira/browse/SOLR-879 The executive summary is that there are errors relating to text fields in both: - src/java/org/apache/solr/search/SolrQueryParser.java - example/solr/conf/schema.xml It is fixed in 1.4. Thank you Yonik Seeley for the original diagnosis and fix. Cheers, -- Phil It may be that your sole purpose in life is simply to serve as a warning to others. Phil Chadwick wrote: Hi Jay, Thank you for your response. The data relating to the string (s_title) defines *exactly* what was fed into the SOLR indexing. The string is not otherwise relevant to the question. The essence of my question is: why can the indexed text (t_title) not be phrase matched by a query on the text field when the word "for" is present in the query? The following work (and I would expect them to work): q=s_title:FUTURE DIRECTIONS FOR INTEGRATED CATCHMENT q=t_title:future directions q=t_title:integrated catchment The following does not work (and I would expect it to work): q=t_title:directions for integrated The following does not work (not sure whether I expect it to work or not): q=t_title:directions integrated My reading is that if the "FOR" is removed in the text indexing, it should also be removed from the text query! I also added enablePositionIncrements=true to the text query analyzer to make it the same as the text index analyzer:

  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>

There was no change in the outcome. The definitions for text and string were exactly as in the SOLR 1.3 example schema; the section of that schema for text is shown below:

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" <!-- enablePositionIncrements="true" --> />
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>

Cheers, -- Phil The art of being wise is the art of knowing what to overlook. -- William James Jay Hill wrote: The string fieldtype is not being tokenized, while the text fieldtype is tokenized. So the stop word "for" is being removed by a stop word filter, which doesn't happen with the string field type (no tokenizing). Have a look at the schema.xml in the example dir and look at the default configuration for both the text and string fieldtypes. The string fieldtype is not analyzed, whereas the text fieldtype has a number of different filters that take action.
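Following up on SOLR-879 for anyone hitting the same symptom: once the fix is in and enablePositionIncrements=true is honored by both the index and query analyzers, the parser should produce a phrase query with a position gap where the stopword was removed, so a query like t_title:"future directions for integrated catchment" would parse to roughly (the exact rendering is illustrative, not copied from debug output):

  PhraseQuery(t_title:"futur direct ? integr catchment")

The "?" marks the position the stopword used to occupy, which lets the phrase line up with the indexed positions again.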
RE: Core Reload issue
From my understanding, re-indexing the documents is a different thing. If you have the stop word filter for a field type, say text, then after reloading the core, a query consisting only of a stop word should be stripped by the stop word filter and so never be searched against the index. But in my case I am getting results containing the stop word; hence the issue. ~Sagar From: noble.p...@gmail.com Date: Tue, 5 May 2009 10:09:29 +0530 Subject: Re: Core Reload issue To: solr-user@lucene.apache.org If you change the conf files and reindex the documents, the changes must be reflected. Are you sure you re-indexed? On Tue, May 5, 2009 at 10:00 AM, Sagar Khetkade sagar.khetk...@hotmail.com wrote: Hi, I came across a strange problem while reloading the core in a multicore scenario. In the config of one of the cores I am making changes to the synonym and stopword files and then reloading the core. The core gets reloaded, but the changes in the stopword and synonym files do not get reflected when I query. The filters for index and query are the same. I face this problem even if I reindex the documents. But when I restart the servlet container in which Solr is embedded, the problem does not resurface. My ultimate goal is/was to have the changes made to the text files inside the config folder take effect. Is this the expected behaviour or some problem on my side? Could anyone suggest a possible workaround? Thanks in advance! Regards, Sagar Khetkade -- - Noble Paul | Principal Engineer| AOL | http://aol.com
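For anyone reproducing this, the reload being discussed is the multicore admin command, typically invoked as follows (host, port, and the core name core0 are from the stock multicore example, not Sagar's setup):

  http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0

After a successful RELOAD, query-time analysis changes (stopwords.txt, synonyms.txt on the query analyzer) should take effect immediately, while index-time analysis changes only affect documents indexed afterwards; symptoms like Sagar's, where old stop word behaviour persists until a container restart, may point at the old core still serving requests or at the files being edited in a different conf directory than the one the core actually loads.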