Re: Extended stats via JMX

2010-02-25 Thread Matthew Runo
https://issues.apache.org/jira/browse/SOLR-1750 might help you, since I don't 
think that all of stats.jsp is exposed via MBeans. I could be wrong about that 
though.. (apologies, our solr servers are firewalled and I can't connect via 
JMX at the moment)
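For anyone who can connect, a minimal standalone sketch of dumping whatever Solr exposes over JMX. The host/port are placeholders for your own -Dcom.sun.management.jmxremote.port setting, and the solr* domain assumes <jmx/> is enabled in solrconfig.xml so the cores register their MBeans:

import java.util.Set;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class SolrMBeanLister {
    public static void main(String[] args) throws Exception {
        // Placeholder URL - match the JMX port your Tomcat was started with.
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // Solr registers one MBean per SolrInfoMBean (handlers, caches,
            // searcher) under a domain beginning with "solr".
            Set<ObjectName> names = conn.queryNames(new ObjectName("solr*:*"), null);
            for (ObjectName name : names) {
                System.out.println(name);
                for (MBeanAttributeInfo attr : conn.getMBeanInfo(name).getAttributes()) {
                    System.out.println("  " + attr.getName());
                }
            }
        } finally {
            connector.close();
        }
    }
}

Dumping the attribute names this way is a quick check of whether the counters you see on stats.jsp (lookups, hits, hitratio on the caches, for instance) made it into the MBeans on your build.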

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Feb 24, 2010, at 9:26 PM, Dan Trainor wrote:

 Hi -
 
 This is my first night working with JMX, particularly for the purpose of 
 querying Solr statistics running under Tomcat.  Before, I was trying to xpath 
 to stats.jsp which just felt dirty.
 
 I gotta say - I think this is pretty neat.
 
 Right now, being inexperienced with JMX and all, I was wondering if there was 
 a way to pull all Solr-specific items out of there.  I see some general 
 counters pertaining to each of my Solr instances, but nothing along the lines 
 of lookups, hits, hit ratios, and the like.  They're all more Tomcat-centric 
 - memory usage etc etc.  That's fine, too, but the whole point of this 
 exercise is to get instance-specific statistics regarding Solr.
 
 Is this information exposed via JMX under Solr?  Can I pull a list somewhere 
 of all items that find their way through JMX to be seen from an external 
 source?
 
 Thanks in advance
 -dant
 



Solr Admin XPath

2009-12-02 Thread Matthew Runo
Hello folks!

I'm attempting (!) to pull some data from the stats.jsp page using XPath so 
that it can be reported in a different application. I cannot seem to get the 
average QPS for the dismax handler, no matter how I try:

try {
    XPathExpression reqPerSec = xpath.compile(
        "/solr/solr-info/QUERYHANDLER/entry[name = 'dismax']/stats/stat[@name = 'avgRequestsPerSecond']");

    this.setDismaxRequestsPerSecond((String) reqPerSec.evaluate(doc,
        XPathConstants.STRING));
} catch (XPathExpressionException e) {
    LOG.error("Failed to parse solr stats output", e);
}

This doesn't throw any errors, and the XPath works just fine in /any/ XPath 
tester I try... except Java. 

Any tips? I have a feeling this is something obvious that I'm missing =\
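In case it helps the next person who hits this: one common gotcha is that stats.jsp can pad element text with whitespace, which a plain string-equality predicate trips over while many GUI testers quietly normalize. A self-contained sketch using normalize-space() - the whitespace theory is a guess, not a confirmed diagnosis, and the URL is a placeholder:

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class StatsXPath {
    public static void main(String[] args) throws Exception {
        // Placeholder URL - point at your own instance.
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse("http://localhost:8983/solr/admin/stats.jsp");

        XPath xpath = XPathFactory.newInstance().newXPath();
        // normalize-space() strips leading/trailing whitespace from the
        // element text before comparing, so padded values still match.
        String qps = (String) xpath.evaluate(
            "/solr/solr-info/QUERYHANDLER/entry[normalize-space(name) = 'dismax']"
                + "/stats/stat[@name = 'avgRequestsPerSecond']",
            doc, XPathConstants.STRING);
        System.out.println("avgRequestsPerSecond: " + qps.trim());
    }
}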

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833



Re: What does this error mean?

2009-11-27 Thread Matthew Runo
It means that there were two warming searchers, and then a commit came in and 
caused a third to try to warm up at the same time. Do you use any warming 
queries, or have large caches?
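For reference, the limit in the error comes from solrconfig.xml; a minimal excerpt:

<!-- solrconfig.xml: how many searchers may warm at once. Raising it
     hides the symptom; committing less often, or shrinking the warming
     queries / autowarmCount on large caches, addresses the cause. -->
<maxWarmingSearchers>2</maxWarmingSearchers>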

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Nov 27, 2009, at 5:46 AM, Paul Tomblin wrote:

 INFO: start 
 commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false)
 Nov 27, 2009 3:45:35 AM
 org.apache.solr.update.processor.LogUpdateProcessor finish
 INFO: {} 0 634
 Nov 27, 2009 3:45:35 AM org.apache.solr.core.SolrCore getSearcher
 WARNING: [nutch] Error opening new searcher. exceeded limit of
 maxWarmingSearchers=2, try again later.
 Nov 27, 2009 3:45:35 AM
 org.apache.solr.update.processor.LogUpdateProcessor finish
 INFO: {} 0 635
 Nov 27, 2009 3:45:35 AM org.apache.solr.common.SolrException log
 SEVERE: org.apache.solr.common.SolrException: Error opening new
 searcher. exceeded limit of maxWarmingSearchers=2, try again later.
   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1029)
   at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:418)
   at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
   at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:107)
   at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:48)
   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
   at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
   at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
   at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
   at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
   at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
   at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
   at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
   at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
   at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
   at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
   at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
   at java.lang.Thread.run(Thread.java:619)
 
 Nov 27, 2009 3:45:35 AM org.apache.solr.core.SolrCore execute
 INFO: [nutch] webapp=/solrChunk path=/update
 params={waitSearcher=true&commit=true&wt=javabin&waitFlush=true&version=1}
 status=503 QTime=634
 Nov 27, 2009 3:45:35 AM org.apache.solr.common.SolrException log
 SEVERE: org.apache.solr.common.SolrException: Error opening new
 searcher. exceeded limit of maxWarmingSearchers=2, try again later.
   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1029)
   at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:418)
   at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
   at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:107)
   at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:48)
   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
   at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
   at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
   at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
   at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
   at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
   at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
   at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
   at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293

Min/Max for a numeric field?

2009-11-19 Thread Matthew Runo
Hello folks!

We're trying to code a UI element which requires us to know both the
minimum and maximum values for a given field - say, price.

This seems like it has to be a solved problem, but we're just not sure
of the best way to go about getting this data out of solr. We could
facet on price and then loop through the values to find the min and
max -- but it seems like there's a better way out there.

Is there a way to get a min and max for a given field, in one solr call?
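One way to get both in a single call, as of 1.4, is the StatsComponent; a sketch, assuming a numeric price field:

http://localhost:8983/solr/select?q=*:*&rows=0&stats=true&stats.field=price

The stats block in the response carries min and max for the field (along with count, sum, mean, and so on), so there's no need to facet and loop.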

--Matthew Runo


Re: Nonsensical Solr Relevancy Score

2009-09-11 Thread Matthew Runo
I'd actually like to see a detailed wiki page on how all the parts of  
a score are actually calculated and inter-related, but I'm not  
knowledgeable enough to write it =\


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Sep 9, 2009, at 3:00 PM, Jeff Newburn wrote:

I have done a search on the word “blue” in our index.  The  
debugQuery shows
some extremely strange methods of scoring.  Somehow product 1 gets a  
higher
score with only 1 match on the word blue when product 2 gets a lower  
score
with the same field match AND an additional field match.  Can  
someone please
help me understand why such an obviously more relevant product is given a
lower score.

<str name="954058">
2.3623571 = (MATCH) sum of:
  0.26248413 = (MATCH) max plus 0.5 times others of:
    0.26248413 = (MATCH) weight(productNameSearch:blue in 112779), product of:
      0.032673787 = queryWeight(productNameSearch:blue), product of:
        8.033478 = idf(docFreq=120, numDocs=136731)
        0.0040672035 = queryNorm
      8.033478 = (MATCH) fieldWeight(productNameSearch:blue in 112779), product of:
        1.0 = tf(termFreq(productNameSearch:blue)=1)
        8.033478 = idf(docFreq=120, numDocs=136731)
        1.0 = fieldNorm(field=productNameSearch, doc=112779)
  2.099873 = (MATCH) max plus 0.5 times others of:
    2.099873 = (MATCH) weight(productNameSearch:blue^8.0 in 112779), product of:
      0.2613903 = queryWeight(productNameSearch:blue^8.0), product of:
        8.0 = boost
        8.033478 = idf(docFreq=120, numDocs=136731)
        0.0040672035 = queryNorm
      8.033478 = (MATCH) fieldWeight(productNameSearch:blue in 112779), product of:
        1.0 = tf(termFreq(productNameSearch:blue)=1)
        8.033478 = idf(docFreq=120, numDocs=136731)
        1.0 = fieldNorm(field=productNameSearch, doc=112779)
</str>
<str name="402943">
1.9483687 = (MATCH) sum of:
  0.63594794 = (MATCH) max plus 0.5 times others of:
    0.16405259 = (MATCH) weight(productNameSearch:blue in 8142), product of:
      0.032673787 = queryWeight(productNameSearch:blue), product of:
        8.033478 = idf(docFreq=120, numDocs=136731)
        0.0040672035 = queryNorm
      5.0209236 = (MATCH) fieldWeight(productNameSearch:blue in 8142), product of:
        1.0 = tf(termFreq(productNameSearch:blue)=1)
        8.033478 = idf(docFreq=120, numDocs=136731)
        0.625 = fieldNorm(field=productNameSearch, doc=8142)
    0.55392164 = (MATCH) weight(color:blue^10.0 in 8142), product of:
      0.15009704 = queryWeight(color:blue^10.0), product of:
        10.0 = boost
        3.6904235 = idf(docFreq=9309, numDocs=136731)
        0.0040672035 = queryNorm
      3.6904235 = (MATCH) fieldWeight(color:blue in 8142), product of:
        1.0 = tf(termFreq(color:blue)=1)
        3.6904235 = idf(docFreq=9309, numDocs=136731)
        1.0 = fieldNorm(field=color, doc=8142)
  1.3124207 = (MATCH) max plus 0.5 times others of:
    1.3124207 = (MATCH) weight(productNameSearch:blue^8.0 in 8142), product of:
      0.2613903 = queryWeight(productNameSearch:blue^8.0), product of:
        8.0 = boost
        8.033478 = idf(docFreq=120, numDocs=136731)
        0.0040672035 = queryNorm
      5.0209236 = (MATCH) fieldWeight(productNameSearch:blue in 8142), product of:
        1.0 = tf(termFreq(productNameSearch:blue)=1)
        8.033478 = idf(docFreq=120, numDocs=136731)
        0.625 = fieldNorm(field=productNameSearch, doc=8142)
</str>
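For what it's worth, the arithmetic inside the two explains accounts for the ordering. 954058 totals 0.26248413 + 2.099873 = 2.3623571; 402943 totals (0.55392164 + 0.5 * 0.16405259) + 1.3124207 = 0.63594794 + 1.3124207 = 1.9483687. The extra color:blue match really does lift 402943's first clause, from roughly 0.26 to 0.63594794. But its productNameSearch fieldNorm is 0.625 rather than 1.0 (a longer name field), so the boosted name clause falls from 2.099873 to 1.3124207 - a loss of about 0.787 against a gain of about 0.373. Length normalization on the heavily boosted field outweighs the additional match.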

--
Jeff Newburn
Software Engineer, Zappos.com
jnewb...@zappos.com - 702-943-7562





Re: solr v1.4 in production?

2009-07-01 Thread Matthew Runo
We're using an svn grab of 1.4 in production mostly to get the Java  
replication code. We don't have any problems to report.


Here's the version we're using:

1.4-dev 749558:749756M - built on 2009-03-03 at 13:10:05

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Jul 1, 2009, at 5:47 AM, Ed Summers wrote:


Here at the Library of Congress we've got several production Solr
instances running v1.3. We've been itching to get at what will be v1.4
and were wondering if anyone else happens to be using it in production
yet. Any information you can provide would be most welcome.

//Ed





Re: Solr vs Sphinx

2009-05-15 Thread Matthew Runo
I agree regarding posting different types of files - because right now  
if you're just starting out with Solr, taking the sample files from  
the distro and going from there is the /only path/ =\


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On May 15, 2009, at 6:41 AM, Eric Pugh wrote:

Something that would be interesting is to share solr configs for  
various types of indexing tasks.  From a solr configuration aimed at  
indexing web pages to one doing large amounts of text to one that  
indexes specific structured data.  I could see those being posted on  
the wiki and helping folks who say "I want to do X, is there an
example?".


I think most folks start with the example Solr install and tweak  
from there, which probably isn't the best path...


Eric

On May 15, 2009, at 8:09 AM, Mark Miller wrote:


In the spirit of good defaults:

I think we should change the Solr highlighter to highlight phrase  
queries by default, as well as prefix,range,wildcard constantscore  
queries. Its awkward to have to tell people you have to turn those  
on. I'd certainly prefer to have to turn them off if I have some  
limitation rather than on.


- Mark


-
Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com
Free/Busy: http://tinyurl.com/eric-cal








Re: Who is running 1.4 nightly in production?

2009-05-12 Thread Matthew Runo
We're using 1.4-dev 749558:749756M that we built on 2009-03-03  
13:10:05 for our master/slave production environment using the Java  
Replication code.


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On May 12, 2009, at 2:02 PM, Walter Underwood wrote:


We're planning our move to 1.4, and want to run one of our production
servers with the new code. Just to feel better about it, is anyone  
else

running 1.4 in production?

I'm building 2009-05-11 right now.

wunder





Re: Additive filter queries

2009-04-10 Thread Matthew Runo
That would work, but the other part of our problem comes in when we  
then try to facet on the resulting set.. If we filter by size 1, for  
example, and then facet Width again - we get facet results that have  
no size 1's, because we have not taught solr what 1_W means, etc etc..


I think field collapsing might solve this for us, maybe..

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Apr 9, 2009, at 5:23 PM, Chris Hostetter wrote:


: Right now a document looks like this:
:
: <doc>
: <!-- style level -->
: <productID>1598548</productID>
: <styleID>12545</styleID>
: <brand>Adidas</brand>
: <size>1, 2, 3, 4, 5, 6, 7</size>
: <width>AA, A, B, W, WW</width>
: <color>Brown</color>
: </doc>
:
: If we went down a level, it could look like..
: <doc>
: <!-- stock level -->
: <productID>1598548</productID>
: <styleID>12545</styleID>
: <stockID>654641654684</stockID>
: <brand>Adidas</brand>
: <size>1</size>
: <width>AA</width>
: <color>Brown</color>
: </doc>

If you want results at the product level then you don't have to have one
*doc* per legal size+width pair ... you just need one *term* per
valid size+width pair

 <size>1, 2, 3, 4, 5, 6, 7</size>
 <width>AA, A, B, W, WW</width>
 <opts>1_W 2_W 3_B 3_W 4_AA 4_A 4_B 4_W 4_WW 5_W 5_ 6_ 7_</opts>


a search for size 4 clogs would look like...

  q=clogs&fq=size:4&facet.field=opts&f.opts.facet.prefix=4_

...and the facet counts for opts would tell me what widths were
available (and how many).

for completeness you typically want to index the pairs in both directions
(1_W and W_1 ... typically in separate fields) so the user can filter by
either option first ... for something like size+color this makes sense,
but i'm guessing with shoes no one expects to narrow by width until
they've narrowed by size first.


-Hoss
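A sketch of generating those pair tokens at index time - class, method, and field names here are invented for illustration:

import java.util.ArrayList;
import java.util.List;

public class OptsBuilder {
    /** Size-first tokens ("4_W") for the field driven by a size filter. */
    public static List<String> sizeFirst(List<String[]> stockPairs) {
        List<String> opts = new ArrayList<String>();
        for (String[] pair : stockPairs) {   // pair[0] = size, pair[1] = width
            opts.add(pair[0] + "_" + pair[1]);
        }
        return opts;
    }

    /** Width-first tokens ("W_4") for filtering in the other direction. */
    public static List<String> widthFirst(List<String[]> stockPairs) {
        List<String> opts = new ArrayList<String>();
        for (String[] pair : stockPairs) {
            opts.add(pair[1] + "_" + pair[0]);
        }
        return opts;
    }
}

Feed sizeFirst() into one multivalued field and widthFirst() into another, and f.<field>.facet.prefix can narrow from whichever option the user picks first.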





Field Collapsing Patch

2009-04-08 Thread Matthew Runo

Hello folks -

Is anyone using the Field Collapsing patch from SOLR-236 (https://issues.apache.org/jira/browse/SOLR-236 
) in their production environment? We're considering using it, but  
wanted to ensure it was at a point where it could be used before  
spending a lot of time on it.


Any thoughts on the patch / issue? Any reasons not to use it?

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833



Re: Additive filter queries

2009-04-03 Thread Matthew Runo
We could do that by going down one level in our inventory, but then we  
have other problems.. for example:


Right now a document looks like this:

<doc>
<!-- style level -->
<productID>1598548</productID>
<styleID>12545</styleID>
<brand>Adidas</brand>
<size>1, 2, 3, 4, 5, 6, 7</size>
<width>AA, A, B, W, WW</width>
<color>Brown</color>
</doc>

If we went down a level, it could look like..
<doc>
<!-- stock level -->
<productID>1598548</productID>
<styleID>12545</styleID>
<stockID>654641654684</stockID>
<brand>Adidas</brand>
<size>1</size>
<width>AA</width>
<color>Brown</color>
</doc>

The question now is this:
- At the stock level, we don't want a search for brown shoes to
return with all the various size/width combos as separate results -
each productId / styleId combo should be a single result


- At the stock level, if you filter by Size: 7 and then Width: B  
you're assured to only get things that are width B and size 7


- At the style level, we can't tell for sure which size / width combos  
are in stock, since this data is not exposed to solr


This seems like a problem that isn't unique to us. Any store that has  
size/width or anything like that will have the same issue. How might  
it be solved?


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Apr 3, 2009, at 1:13 AM, Fergus McMenemie wrote:

I have a design question for all of those who might be willing to  
provide an

answer.

We are looking for a way to do a type of additive filters.  Our documents
are comprised of a single item of a specified color.  We will use shoes as
an example.  Each document contains a multivalued "size" field with all
sizes and a multivalued "width" field for all widths available for a given
color.  Our issue is that the values are not linked to each other.  This
issue can be seen when a user chooses a size (e.g. 7) and we filter the
options down to only size 7.  When the width facet is displayed it will have
all widths available for all documents that match on size 7 even though most
don't come in a wide width.  We are looking for strategies to filter facets
based on other facets in separate queries.

--
Jeff Newburn
Software Engineer, Zappos.com
jnewb...@zappos.com - 702-943-7562


Ditto!

As best I understand, you somehow need to arrange for each different
combination of colour, size and width to be indexed as a separate solr
document.

--

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===





Re: How to Index IP address

2009-03-24 Thread Matthew Runo
I don't think that Solr is the best thing to use for searching a text  
file. I'd use grep myself, if you're on a unix-like system.


To use solr, you'd need to throw each network 'event' (GET, POST, etc
etc) into an XML document, and post those into Solr so it could
generate the index. You could then do things like
ip:10.206.158.154 to find a specific IP address, or even
ip:10.206.158* to get a subnet.
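For illustration, one such event document might look like this (every field name here is invented - define whatever matches your log format in schema.xml):

<add>
  <doc>
    <field name="id">evt-000001</field>
    <field name="timestamp">2009-03-24T09:32:00Z</field>
    <field name="method">GET</field>
    <field name="ip">10.206.158.154</field>
  </doc>
</add>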


Perhaps the thing that's building your text file could post to Solr  
instead?


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Mar 24, 2009, at 9:32 AM, nga pham wrote:


Hi All,

I have a txt file, that captured all of my network traffic.  How can  
I use

Solr to filter out a particular IP address?

Thank you,
Nga.




Re: How to Index IP address

2009-03-24 Thread Matthew Runo
Well, I think you'll have the same problem. Lucene, and Solr (since  
it's built on Lucene) are both going to expect a structured document  
as input. Once you send in a bunch of documents, you can then query  
them for whatever you want to find.


A quick search of the internets found me this Apache Labs project -  
called Pinpoint. It's designed to take log data in, and build an index  
out of it. I'm not sure how developed it is, but it might be a good  
starting point for you. There are probably other projects out there  
along the same lines.. Here's Pinpoint: http://svn.apache.org/repos/asf/labs/pinpoint/trunk/


Why do you want to use Solr / Lucene to look through your files? If  
you have a huge dataset, some people are using Hadoop (a version of  
Google's MapReduce) to look through very large sets of logfiles: http://www.lexemetech.com/2008/01/hadoop-and-log-file-analysis.html


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Mar 24, 2009, at 10:28 AM, nga pham wrote:

Do you think luence is better to filter out a particular IP address  
from a

txt file?

Thank you Runo,
Nga

On Tue, Mar 24, 2009 at 10:21 AM, Matthew Runo mr...@zappos.com  
wrote:


I don't think that Solr is the best thing to use for searching a  
text file.

I'd use grep myself, if you're on a unix-like system.

To use solr, you'd need to throw each network 'event' (GET, POST,  
etc etc)
into an XML document, and post those into Solr so it could generate  
the

index. You could then do things like
ip:10.206.158.154 to find a specific IP address, or even
ip:10.206.158* to get a subnet.

Perhaps the thing that's building your text file could post to Solr
instead?

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833


On Mar 24, 2009, at 9:32 AM, nga pham wrote:

Hi All,


I have a txt file, that captured all of my network traffic.  How  
can I use

Solr to filter out a particular IP address?

Thank you,
Nga.








Re: Version 1.4 of Solr

2009-03-11 Thread Matthew Runo
Yes, we are using the Java replication feature to send our index and  
configuration files from our master server to 4 slaves.
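For anyone wiring this up, a sketch of the solrconfig.xml handler on each side - host, poll interval, and conf file list are illustrative:

<!-- on the master -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- on each slave -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>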


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Mar 11, 2009, at 9:29 AM, Jon Baer wrote:


Are you using the replication feature by any chance?

- Jon

On Mar 10, 2009, at 2:28 PM, Matthew Runo wrote:

We're currently using 1.4 in production right now, using a recent  
nightly. It's working fine for us.


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Mar 10, 2009, at 10:25 AM, Vauthrin, Laurent wrote:


Hello,



I'm not sure if this is the right forum for this, but I'm  
wondering if I

could get a rough timeline of when version 1.4 of Solr might be out?
I'm trying to figure out whether we will be able to use the new  
built-in

replication as opposed to the current rsync collection distribution.



Thanks,

Laurent









Re: How can I configure different types in Solr?

2009-03-06 Thread Matthew Runo
I'm not 100% sure what you mean by custom types, but if you're  
talking about objects then there's no reason they can't both be in  
your schema. Any given document does not need to have all the fields  
in it, so you could flatten them both into one schema if you wanted.


You could also use the multicore feature and have a core for Object As  
and a core for Object Bs and then you'd just query both of them and  
then combine to get your results.
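If you go the multicore route, the solr.xml wiring is small; a sketch with made-up core names:

<solr persistent="false">
  <cores adminPath="/admin/cores">
    <core name="objectA" instanceDir="objectA" />
    <core name="objectB" instanceDir="objectB" />
  </cores>
</solr>

Each core gets its own conf/schema.xml, so the two types never have to share field definitions.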


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Mar 6, 2009, at 5:43 AM, Praveen_Kumar_J wrote:



Hi

How do I configure different custom types or schemas in Solr?


Assume I have some custom types type1 and type2 (some composite
classes).


Can I configure these 2 types in a single schema file?


I need these 2 types to be online for creating and searching data.

Please provide me some sample configuration.



Regards,
Praveen
--
View this message in context: 
http://www.nabble.com/How-can-I-configure-different-types-in-Solr--tp22372731p22372731.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: solr and tomcat

2009-03-03 Thread Matthew Runo
It looks like if you set a -Dsolr.data.dir=foo then you could specify  
where the index would be stored, yes?  Are you properly setting your  
solr.home? I've never had to set the data directory specifically, Solr  
has always put it under my home.


From solrconfig.xml:
  <dataDir>${solr.data.dir:./solr/data}</dataDir>

Since Solr is running under tomcat, I'd assume that the index will  
always appear to be owned by tomcat as well. I don't think there is  
any way to have a different user for the written files - but someone  
else might want to chime in before you believe me 100% on this one.
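For completeness, the per-instance context file from the SolrTomcat wiki approach looks roughly like this - paths are placeholders:

<Context docBase="/opt/solr/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="/opt/solr-home" override="true" />
</Context>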


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Mar 2, 2009, at 5:46 PM, Matt Mitchell wrote:


Hi. I'm sorry if this is the second time this message comes through!

A few questions here...

#1
Does anyone know how to set the user/group and/or permissions on the  
index
that solr creates? It's always the tomcat user. Is it possible to  
change

this in my context file? Help!

#2
I'm deploying Solr via Tomcat and really thought I had this stuff  
down. But
it seems that with some recent system upgrades, my scheme is failing  
to set

the data dir correctly.

I'm deploying solr to tomcat, using a context file as described here:
http://wiki.apache.org/solr/SolrTomcat#head-024d7e11209030f1dbcac9974e55106abae837ac

But when I deploy, Tomcat says that it can't find a ./data/index  
directory

-- relative to the tomcat home directory. How can I set the data dir
relative to the solr home value I'm specifying in the tomcat context  
file?

Note: a hard-coded absolute path works, but I want to configure at
deployment time.

In the past, I tried setting the data dir in the same way the solr  
home is
set in the context file without luck. Does this now work in the  
latest solr

nightly?

Thanks,




Re: solr and tomcat

2009-03-03 Thread Matthew Runo
Perhaps you could hard code it in the solrconfig.xml file for each  
solr instance? Other than that, what we did was run multiple instances  
of Tomcat. That way if something goes bad in one, it doesn't affect  
the others.


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Mar 3, 2009, at 8:39 AM, Matt Mitchell wrote:


Hi Matthew,

The problem is that we have multiple instances of solr running under  
one
tomcat. So setting -Dsolr.data.dir=foo would set the home for every  
solr. I

guess multi-core might solve my problem, but that'd change our app
architecture too much, maybe some other day.

I *kind* of have a solution for the permissions thing though:

- The project user is part of the tomcat group.
- The tomcat user is part of the project user group.
- We're making a call to umask 002 in the tomcat catalina.sh file  
(means

all files created will have group write)

So when solr (tomcat) creates the index, they're group writable now  
and I

can remove etc.!

So, I still need to figure out the data.dir problem. Hmm.

Thanks for your help,
Matt

On Tue, Mar 3, 2009 at 11:31 AM, Matthew Runo mr...@zappos.com  
wrote:


It looks like if you set a -Dsolr.data.dir=foo then you could  
specify where
the index would be stored, yes?  Are you properly setting your  
solr.home?
I've never had to set the data directory specifically, Solr has  
always put

it under my home.

From solrconfig.xml:
<dataDir>${solr.data.dir:./solr/data}</dataDir>

Since Solr is running under tomcat, I'd assume that the index will  
always
appear to be owned by tomcat as well. I don't think there is any  
way to have
a different user for the written files - but someone else might  
want to

chime in before you believe me 100% on this one.

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833


On Mar 2, 2009, at 5:46 PM, Matt Mitchell wrote:

Hi. I'm sorry if this is the second time this message comes through!


A few questions here...

#1
Does anyone know how to set the user/group and/or permissions on  
the index
that solr creates? It's always the tomcat user. Is it possible to  
change

this in my context file? Help!

#2
I'm deploying Solr via Tomcat and really thought I had this stuff  
down.

But
it seems that with some recent system upgrades, my scheme is  
failing to

set
the data dir correctly.

I'm deploying solr to tomcat, using a context file as described  
here:


http://wiki.apache.org/solr/SolrTomcat#head-024d7e11209030f1dbcac9974e55106abae837ac

But when I deploy, Tomcat says that it can't find a ./data/index  
directory

-- relative to the tomcat home directory. How can I set the data dir
relative to the solr home value I'm specifying in the tomcat  
context file?

Note: a hard-coded absolute path works, but I want to configure at
deployment time.

In the past, I tried setting the data dir in the same way the solr  
home is
set in the context file without luck. Does this now work in the  
latest

solr
nightly?

Thanks,








Re: solr and tomcat

2009-03-03 Thread Matthew Runo
I see where your problems come in then. I'm not sure of the answer  
though =\


We've not had issues running multiple tomcat instances per server. I  
think at one point a few weeks ago we ran 6 instances per server, on  
quad core Xeon servers with 16gb of ram. Our use case might be  
different than yours though - each of these instances was basically  
the same for us (getting around a lucene sync issue) and they were all  
load balanced together so no single instance got more than a few  
requests per second.


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Mar 3, 2009, at 8:53 AM, Matt Mitchell wrote:

That's exactly what we're doing (setting the value in each config).  
The main
problem with that is we have multiple people working on each of  
these solr
projects, in different environments. Their data.dir path is always  
the same
(relative) value which works fine under Jetty. But running under  
tomcat, the
data dir is relative to tomcat's home. So an absolute hard-coded  
path is the
only solution. My hope was that we'd be able to override it using  
the same

method as setting the solr/home value in the tomcat context file.

The thought of running multiple tomcats is interesting. Do you have  
any

issues with memory or cpu performance?

Thanks,
Matt

On Tue, Mar 3, 2009 at 11:45 AM, Matthew Runo mr...@zappos.com  
wrote:


Perhaps you could hard code it in the solrconfig.xml file for each  
solr
instance? Other than that, what we did was run multiple instances  
of Tomcat.

That way if something goes bad in one, it doesn't affect the others.

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Mar 3, 2009, at 8:39 AM, Matt Mitchell wrote:

Hi Matthew,


The problem is that we have multiple instances of solr running  
under one
tomcat. So setting -Dsolr.data.dir=foo would set the home for  
every solr.

I
guess multi-core might solve my problem, but that'd change our app
architecture too much, maybe some other day.

I *kind* of have a solution for the permissions thing though:

- The project user is part of the tomcat group.
- The tomcat user is part of the project user group.
- We're making a call to umask 002 in the tomcat catalina.sh  
file (means

all files created will have group write)

So when solr (tomcat) creates the index, they're group writable  
now and I

can remove etc.!

So, I still need to figure out the data.dir problem. Hmm.

Thanks for your help,
Matt

On Tue, Mar 3, 2009 at 11:31 AM, Matthew Runo mr...@zappos.com  
wrote:


It looks like if you set a -Dsolr.data.dir=foo then you could  
specify

where
the index would be stored, yes?  Are you properly setting your  
solr.home?
I've never had to set the data directory specifically, Solr has  
always

put
it under my home.

From solrconfig.xml:
<dataDir>${solr.data.dir:./solr/data}</dataDir>

Since Solr is running under tomcat, I'd assume that the index  
will always
appear to be owned by tomcat as well. I don't think there is any  
way to

have
a different user for the written files - but someone else might  
want to

chime in before you believe me 100% on this one.

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833


On Mar 2, 2009, at 5:46 PM, Matt Mitchell wrote:

Hi. I'm sorry if this is the second time this message comes  
through!




A few questions here...

#1
Does anyone know how to set the user/group and/or permissions on  
the

index
that solr creates? It's always the tomcat user. Is it possible  
to change

this in my context file? Help!

#2
I'm deploying Solr via Tomcat and really thought I had this  
stuff down.

But
it seems that with some recent system upgrades, my scheme is  
failing to

set
the data dir correctly.

I'm deploying solr to tomcat, using a context file as described  
here:



http://wiki.apache.org/solr/SolrTomcat#head-024d7e11209030f1dbcac9974e55106abae837ac

But when I deploy, Tomcat says that it can't find a ./data/index
directory
-- relative to the tomcat home directory. How can I set the data  
dir
relative to the solr home value I'm specifying in the tomcat  
context

file?
Note: a hard-coded absolute path works, but I want to configure at
deployment time.

In the past, I tried setting the data dir in the same way the  
solr home

is
set in the context file without luck. Does this now work in the  
latest

solr
nightly?

Thanks,











Re: Lucene sync bottleneck?

2009-02-27 Thread Matthew Runo

We're using:

Solr Specification Version: 1.3.0.2009.01.23.10.46.02
Solr Implementation Version: 1.4-dev 737141M - root - 2009-01-23  
10:46:02

Lucene Specification Version: 2.9-dev
Lucene Implementation Version: 2.9-dev 724059 - 2008-12-06 20:08:54

We'll see about getting up to trunk and firing off our load test and  
seeing if we can get it to happen with that.


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Feb 27, 2009, at 7:44 AM, Yonik Seeley wrote:


I'm using trunk, but I set a breakpoint on SegmentReader.isDeleted()
on an index with deletions, and I couldn't get it to be called.

numDocs : 26
maxDoc : 130
reader:SolrIndexReader{this=1935e6f,r=readonlymultisegmentrea...@1935e6f,segments=5}



-Yonik
http://www.lucidimagination.com


On Thu, Feb 26, 2009 at 4:55 PM, Matthew Runo mr...@zappos.com  
wrote:
I see a ReadOnlySegmentReader now - we're on an optimized index now  
which

gets around the isDeleted() check.

(solr4, optimized)
searcherName : searc...@260f8e27 main
caching : true
numDocs : 139583
maxDoc : 139583
readerImpl : ReadOnlySegmentReader
readerDir : org.apache.lucene.store.NIOFSDirectory@/opt/solr-data/zeta-main/index
indexVersion : 1233423823917
openedAt : Thu Feb 26 13:29:25 PST 2009
registeredAt : Thu Feb 26 13:29:42 PST 2009
warmupTime : 16910

(solr1, non optimized)
searcherName : searc...@36be11a1 main
caching : true
numDocs : 139561
maxDoc : 139591
readerImpl : ReadOnlyMultiSegmentReader
readerDir : org.apache.lucene.store.NIOFSDirectory@/opt/solr-data/zeta-main/index
indexVersion : 1233423823924
openedAt : Thu Feb 26 13:48:16 PST 2009
registeredAt : Thu Feb 26 13:49:11 PST 2009
warmupTime : 54785

I did a thread dump against the optimized server just now, but  
didn't find

anything blocked to check which reader was actually in use this time.

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Feb 26, 2009, at 1:39 PM, Yonik Seeley wrote:


That's interesting.
We should be using read-only readers, which should not synchronize  
on

the deleted docs check.  But as your stack trace shows, you're using
SegmentReader and MultiSegmentReader.

Right now, if I look at the admin/statistics page at the searcher,  
it

shows the following for the reader:


reader:SolrIndexReader{this=42f352,r=readonlymultisegmentrea...@42f352,segments=6}


Hopefully the fact that it's a ReadOnlyMultiSegmentReader means that
it contains ReadOnlySegmentReader instances, which don't synchronize
on isDeleted.

What do you see?

-Yonik

On Thu, Feb 26, 2009 at 4:09 PM, Matthew Runo mr...@zappos.com  
wrote:


Hello folks!

I was under the impression that this sync bottleneck was fixed in  
recent
versions of Solr/Lucene, but we're seeing it with 1.4-dev right  
now. When

we
load test a server with 100 threads (using jmeter), we see several
threads
all blocked at the same spot:

http-8080-exec-505 - Thread t...@594
  java.lang.Thread.State: BLOCKED on org.apache.lucene.index.segmentrea...@2b6f5d18 owned by: http-8080-exec-434
   at org.apache.lucene.index.SegmentReader.isDeleted(SegmentReader.java:737)
   at org.apache.lucene.index.MultiSegmentReader.isDeleted(MultiSegmentReader.java:266)
   at org.apache.solr.search.function.FunctionQuery$AllScorer.next(FunctionQuery.java:118)
   at org.apache.solr.search.function.FunctionQuery$AllScorer.skipTo(FunctionQuery.java:137)
   at org.apache.lucene.search.BooleanScorer2$SingleMatchScorer.skipTo(BooleanScorer2.java:170)
   at org.apache.lucene.search.ReqOptSumScorer.score(ReqOptSumScorer.java:76)
   at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:357)
   at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:320)
   at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:136)
   at org.apache.lucene.search.Searcher.search(Searcher.java:126)
   at org.apache.lucene.search.Searcher.search(Searcher.java:105)
   at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1231)
   at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:917)
   at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:338)
   at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:164)
   at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:171)
   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
   at org.apache.catalina.core.ApplicationFilterChain

Re: Lucene sync bottleneck?

2009-02-27 Thread Matthew Runo
We're just using an SVN up, with no local modifications. It's probably  
a formatting difference from having opened solr in an IDE.


We're building from lucene and solr trunk right now, and I'll let you  
all know how that goes. We'll test it as best we can with JMeter. The  
build we had up there was breaking with between 100 and 200  
simultaneous threads (due to blocking on isDeleted).


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Feb 27, 2009, at 8:01 AM, Chris Hostetter wrote:



: Solr Implementation Version: 1.4-dev 737141M - root - 2009-01-23  
10:46:02


that M indicates there were local modifications (relative svn version
#737141) at the time of compilation.

Do you have some local patches?
anything that would have affected the way IndexReaders get opened?



-Hoss





Re: Lucene sync bottleneck?

2009-02-27 Thread Matthew Runo

OK. Call me chicken little.

We must have had bad class files or something hanging out in our build  
that had the issues. Having built from trunk, we're seeing perfectly  
fine response times even at 500 requests a second.


Thank you for your help, and sorry to bring it up without testing trunk.

Thanks again for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Feb 27, 2009, at 8:11 AM, Matthew Runo wrote:

We're just using an SVN up, with no local modifications. It's  
probably a formatting difference from having opened solr in an IDE.


We're building from lucene and solr trunk right now, and I'll let  
you all know how that goes. We'll test it as best we can with  
JMeter. The build we had up there was breaking with between 100 and  
200 simultaneous threads (due to blocking on isDeleted).


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Feb 27, 2009, at 8:01 AM, Chris Hostetter wrote:



: Solr Implementation Version: 1.4-dev 737141M - root - 2009-01-23  
10:46:02


that M indicates there were local modifications (relative svn version
#737141) at the time of compilation.

Do you have some local patches?
anything that would have affected the way IndexReaders get opened?



-Hoss







Lucene sync bottleneck?

2009-02-26 Thread Matthew Runo

Hello folks!

I was under the impression that this sync bottleneck was fixed in  
recent versions of Solr/Lucene, but we're seeing it with 1.4-dev right  
now. When we load test a server with 100 threads (using jmeter), we  
see several threads all blocked at the same spot:


http-8080-exec-505 - Thread t...@594
   java.lang.Thread.State: BLOCKED on org.apache.lucene.index.segmentrea...@2b6f5d18 owned by: http-8080-exec-434
   at org.apache.lucene.index.SegmentReader.isDeleted(SegmentReader.java:737)
   at org.apache.lucene.index.MultiSegmentReader.isDeleted(MultiSegmentReader.java:266)
   at org.apache.solr.search.function.FunctionQuery$AllScorer.next(FunctionQuery.java:118)
   at org.apache.solr.search.function.FunctionQuery$AllScorer.skipTo(FunctionQuery.java:137)
   at org.apache.lucene.search.BooleanScorer2$SingleMatchScorer.skipTo(BooleanScorer2.java:170)
   at org.apache.lucene.search.ReqOptSumScorer.score(ReqOptSumScorer.java:76)
   at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:357)
   at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:320)
   at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:136)
   at org.apache.lucene.search.Searcher.search(Searcher.java:126)
   at org.apache.lucene.search.Searcher.search(Searcher.java:105)
   at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1231)
   at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:917)
   at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:338)
   at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:164)
   at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:171)
   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
   at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
   at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
   at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
   at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
   at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
   at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
   at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
   at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
   at org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:879)
   at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:719)
   at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2080)
   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
   at java.lang.Thread.run(Thread.java:619)

   Locked ownable synchronizers:
- locked java.util.concurrent.locks.reentrantlock$nonfairs...@4d54c7be


I checked the Lucene SVN and it looks like that's still appearing to  
be a bottleneck.


http://svn.apache.org/viewvc/lucene/java/trunk/src/java/org/apache/lucene/index/SegmentReader.java?view=markup

Any tips?

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833



Re: Lucene sync bottleneck?

2009-02-26 Thread Matthew Runo
I see a ReadOnlySegmentReader now - we're on an optimized index now  
which gets around the isDeleted() check.


(solr4, optimized)
searcherName : searc...@260f8e27 main
caching : true
numDocs : 139583
maxDoc : 139583
readerImpl : ReadOnlySegmentReader
readerDir : org.apache.lucene.store.NIOFSDirectory@/opt/solr-data/zeta-main/index
indexVersion : 1233423823917
openedAt : Thu Feb 26 13:29:25 PST 2009
registeredAt : Thu Feb 26 13:29:42 PST 2009
warmupTime : 16910

(solr1, non optimized)
searcherName : searc...@36be11a1 main
caching : true
numDocs : 139561
maxDoc : 139591
readerImpl : ReadOnlyMultiSegmentReader
readerDir : org.apache.lucene.store.NIOFSDirectory@/opt/solr-data/zeta-main/index
indexVersion : 1233423823924
openedAt : Thu Feb 26 13:48:16 PST 2009
registeredAt : Thu Feb 26 13:49:11 PST 2009
warmupTime : 54785

I did a thread dump against the optimized server just now, but didn't  
find anything blocked to check which reader was actually in use this  
time.


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Feb 26, 2009, at 1:39 PM, Yonik Seeley wrote:


That's interesting.
We should be using read-only readers, which should not synchronize on
the deleted docs check.  But as your stack trace shows, you're using
SegmentReader and MultiSegmentReader.

Right now, if I look at the admin/statistics page at the searcher, it
shows the following for the reader:

reader:SolrIndexReader{this=42f352,r=readonlymultisegmentrea...@42f352,segments=6}


Hopefully the fact that it's a ReadOnlyMultiSegmentReader means that
it contains ReadOnlySegmentReader instances, which don't synchronize
on isDeleted.

What do you see?

-Yonik

On Thu, Feb 26, 2009 at 4:09 PM, Matthew Runo mr...@zappos.com  
wrote:

Hello folks!

I was under the impression that this sync bottleneck was fixed in  
recent
versions of Solr/Lucene, but we're seeing it with 1.4-dev right  
now. When we
load test a server with 100 threads (using jmeter), we see several  
threads

all blocked at the same spot:

http-8080-exec-505 - Thread t...@594
  java.lang.Thread.State: BLOCKED on org.apache.lucene.index.segmentrea...@2b6f5d18 owned by: http-8080-exec-434
   at org.apache.lucene.index.SegmentReader.isDeleted(SegmentReader.java:737)
   at org.apache.lucene.index.MultiSegmentReader.isDeleted(MultiSegmentReader.java:266)
   at org.apache.solr.search.function.FunctionQuery$AllScorer.next(FunctionQuery.java:118)
   at org.apache.solr.search.function.FunctionQuery$AllScorer.skipTo(FunctionQuery.java:137)
   at org.apache.lucene.search.BooleanScorer2$SingleMatchScorer.skipTo(BooleanScorer2.java:170)
   at org.apache.lucene.search.ReqOptSumScorer.score(ReqOptSumScorer.java:76)
   at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:357)
   at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:320)
   at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:136)
   at org.apache.lucene.search.Searcher.search(Searcher.java:126)
   at org.apache.lucene.search.Searcher.search(Searcher.java:105)
   at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1231)
   at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:917)
   at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:338)
   at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:164)
   at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:171)
   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
   at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
   at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
   at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
   at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
   at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
   at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
   at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
   at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
   at org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:879)
   at org.apache.coyote.http11

Re: why don't we have a forum for discussion?

2009-02-18 Thread Matthew Runo

At the risk of sounding "me too"... me too!

Email is something I already use throughout the day - it's easy to pop  
over into the folder I send all the solr-user mail to and quickly scan  
the subject lines.


Nabble is great for searching though.. I only have 12,126 of the
solr-user messages archived locally so far..


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Feb 18, 2009, at 2:16 PM, Walter Underwood wrote:


I really prefer a mailing list. If I had to visit a website to
contribute, my participation would go to zero.

I might not be typical -- I've been handling a few hundred
messages a day for the past twenty five years.

wunder (e-mail is the killer app)

On 2/18/09 2:09 PM, Stephen Weiss swe...@stylesight.com wrote:

I third the motion! SOLR is the second largest contributor to my
e-mail glut (my company's marketing is #1).  I often have no idea what

mail glut (my company's marketing is #1).  I often have no idea what
area of Solr I'm actually asking about when I have a question, so I
would disagree and say a general forum provides a place to post when
you don't really understand the internals so well.

But almost anything would be better than the current situation.  This
list is SOLR's best documentation so I wouldn't want to just stop
getting it (and stuff just goes unnoticed in digests), but it could  
be

presented better.  A forum with a search function and notifications
would be a big improvement, especially as the community grows.

--
Steve

On Feb 18, 2009, at 3:28 PM, Jon Baer wrote:


I don't think general discussion forums really help ... it would
be great if every major page in the Solr wiki had a "discuss" link off
to somewhere though +1 for that ...

Ie:
http://wiki.apache.org/solr/SolrRequestHandler
http://wiki.apache.org/solr/SolrReplication
etc.

For me even panning over discussion history on topics would be
helpful.

- Jon

On Feb 18, 2009, at 2:56 PM, Martin Lamothe wrote:


Yep, I second the motion.
This mailing list overloads my poor BB curve.

-M

2009/2/18 Tony Wang ivyt...@gmail.com


I am just curious why we don't have a forum for discussion or you
guys
think
it's really necessary to receive lots of crap information about
Solr and
nutch in email? I can offer you a forum for discussion anyway.

--
Are you RCholic? www.RCholic.com
温 良 恭 俭 让 仁 义 礼 智 信


--
Martin Lamothe
Business Development and Operations
Wiser Web Solutions Inc.
Direct: (613) 262-5558
Toll-free: 1-800-949-4737
E-mail: m.lamo...@wiserweb.com
http://www.wiserweb.com







Re: 500 Errors on update

2009-02-02 Thread Matthew Runo

Could you also provide us with the error you were getting?

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Feb 2, 2009, at 1:46 PM, Derek Springer wrote:


Hi all,
I recently created a Solr index to track some news articles that I  
follow
and I've noticed that I occasionally receive 500 errors when posting  
an

update. It doesn't happen every time and I can't seem to reproduce the
error. I should mention that I have another Solr index setup under  
the same
instance (configured via solr.xml) and I do not seem to be having  
the same

issue. Also, I can query the index without issue.

Does anyone know if this is an error with the Tomcat server I have  
set up,
or an issue with Solr itself? Has anyone else experienced a similar  
issue?


If it's any help, here's a dump of the xml that caused an error:

Pinging Solr Error: HTTP Error 500: Internal Server Error
<?xml version="1.0" encoding="UTF-8"?><add>
  <doc>
    <field name="title">'The day the music died'? Hardly</field>
    <field name="link">http://rss.cnn.com/~r/rss/cnn_showbiz/~3/JBV2Hu7Pisg/index.html</field>
    <field name="summary">The plane crash that killed Buddy Holly, Ritchie Valens and The Big Bopper has echoed through rock 'n' roll history for 50 years, representing, if not the end of rock 'n' roll itself, the close of an era. On Monday night, the anniversary of the trio's deaths, a huge tribute concert is taking place.</field>
    <field name="pubdate">2009-02-02T15:43:54Z</field>
    <field name="source">www.cnn.com</field>
  </doc>

  <doc>
    <field name="title">'867-5309' number for sale on eBay</field>
    <field name="link">http://rss.cnn.com/~r/rss/cnn_showbiz/~3/rxehPnDAe7Y/index.html</field>
    <field name="summary">Jenny's phone number is for sale, but not for a song.</field>
    <field name="pubdate">2009-02-02T18:53:42Z</field>
    <field name="source">www.cnn.com</field>
  </doc>

  <doc>
    <field name="title">Porn airs during Super Bowl</field>
    <field name="link">http://rss.cnn.com/~r/rss/cnn_showbiz/~3/pCTDvXLkyb4/index.html</field>
    <field name="summary">Super Bowl fans in Tucson, Arizona, caught a different kind of show during Sunday's big game.</field>
    <field name="pubdate">2009-02-02T17:34:43Z</field>
    <field name="source">www.cnn.com</field>
  </doc>

  <doc>
    <field name="title">Gallery: Hayden Panettiere at the big game</field>
    <field name="link">http://rss.cnn.com/~r/rss/cnn_showbiz/~3/cygh8gfbXR0/index.html</field>
    <field name="summary">Gallery: Hayden Panettiere at the big game</field>
    <field name="pubdate">2009-02-02T14:46:26Z</field>
    <field name="source">www.cnn.com</field>
  </doc>

  <doc>
    <field name="title">Former 'Homicide' star breaks out</field>
    <field name="link">http://rss.cnn.com/~r/rss/cnn_showbiz/~3/Uxic4SVAHVo/index.html</field>
    <field name="summary">As the critics rave and the nominations flow in for her latest role in "Frozen River", Melissa Leo, a veteran of the independent film scene and shows such as "Homicide", has managed to stay grounded in her work as an actress.</field>
    <field name="pubdate">2009-02-02T13:19:10Z</field>
    <field name="source">www.cnn.com</field>
  </doc>

  <doc>
    <field name="title">Don McLean: Buddy Holly was a genius</field>
    <field name="link">http://rss.cnn.com/~r/rss/cnn_showbiz/~3/eBj6NfUFKzs/index.html</field>
    <field name="summary">Of all the unique oddities of my career, I am perhaps proudest of the fact that I am forever linked with Buddy Holly.</field>
    <field name="pubdate">2009-02-02T20:55:16Z</field>
    <field name="source">www.cnn.com</field>
  </doc>

  <doc>
    <field name="title">Sports attorney: Phelps could lose endorsements</field>
    <field name="link">http://rss.cnn.com/~r/rss/cnn_showbiz/~3/px0QszfYZ3Y/index.html</field>
    <field name="summary">Olympic gold medalist Michael Phelps has acknowledged he engaged in regrettable behavior and demonstrated bad judgment, after a British newspaper published a photograph of the swimmer using a marijuana pipe.</field>
    <field name="pubdate">2009-02-02T19:21:10Z</field>

Re: Optimizing Improving results based on user feedback

2009-01-30 Thread Matthew Runo
I've thought about patching the QueryElevationComponent to apply  
boosts rather than a specific sort. Then the file might look like..


<query text="AAA"> <doc id="A" boost="5" /> <doc id="B" boost="4" /> </query>
And I could write a script that looks at click data once a day to fill  
out this file.
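
Something like this untested sketch is what I have in mind (ClickBoostFileWriter and the clicks.csv input format - query,docId,clicks rows - are both made up for illustration):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.PrintWriter;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical daily job: turn aggregated click counts into the
// boost-style elevation file sketched above.
public class ClickBoostFileWriter {
    public static void main(String[] args) throws Exception {
        // query -> (docId -> clicks), read from "query,docId,clicks" rows
        Map<String, Map<String, Integer>> clicks =
                new LinkedHashMap<String, Map<String, Integer>>();
        BufferedReader in = new BufferedReader(new FileReader("clicks.csv"));
        String line;
        while ((line = in.readLine()) != null) {
            String[] parts = line.split(",");
            Map<String, Integer> docs = clicks.get(parts[0]);
            if (docs == null) {
                docs = new LinkedHashMap<String, Integer>();
                clicks.put(parts[0], docs);
            }
            docs.put(parts[1], Integer.parseInt(parts[2]));
        }
        in.close();

        PrintWriter out = new PrintWriter("elevate-boosts.xml", "UTF-8");
        out.println("<elevate>");
        for (Map.Entry<String, Map<String, Integer>> q : clicks.entrySet()) {
            // NOTE: real query text would need XML-escaping here.
            out.println("  <query text=\"" + q.getKey() + "\">");
            for (Map.Entry<String, Integer> d : q.getValue().entrySet()) {
                // Crude clicks-to-boost mapping; a real job would normalize
                // for first-page bias and decay old clicks over time.
                long boost = Math.min(5, 1 + Math.round(Math.log10(d.getValue() + 1)));
                out.println("    <doc id=\"" + d.getKey() + "\" boost=\"" + boost + "\"/>");
            }
            out.println("  </query>");
        }
        out.println("</elevate>");
        out.close();
    }
}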

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Jan 30, 2009, at 6:37 AM, Ryan McKinley wrote:

It may not be as fine-grained as you want, but also check the  
QueryElevationComponent.  This takes a preconfigured list of what  
the top results should be for a given query and makes those  
documents the top results.


Presumably, you could use click logs to determine what the top  
result should be.



On Jan 29, 2009, at 7:45 PM, Walter Underwood wrote:


A Decision Theoretic Framework for Ranking using Implicit Feedback
uses clicks, but the best part of that paper is all the side comments
about difficulties in evaluation. For example, if someone clicks on
three results, is that three times as good or two failures and a
success? We have to know the information need to decide. That paper
is in the LR4IR 2008 proceedings.

Both Radlinski and Joachims seem to be focusing on click data.

I'm thinking of something much simpler, like taking the first
N hits and reordering those before returning. Brute force, but
would get most of the benefit. Usually, you only have reliable
click data for a small number of documents on each query, so
it is a waste of time to rerank the whole list. Besides, if you
need to move something up 100 places on the list, you should
probably be tuning your regular scoring rather than patching
it with click data.

wunder

On 1/29/09 3:43 PM, Matthew Runo mr...@zappos.com wrote:


Agreed, it seems that a lot of the algorithms in these papers would
almost be a whole new RequestHandler ala Dismax. Luckily a lot of  
them

seem to be built on Lucene (at least the ones that I looked at that
had code samples).

Which papers did you see that actually talked about using clicks? I
don't see those, beyond Addressing Malicious Noise in Clickthrough
Data by Filip Radlinski and also his Query Chains: Learning to  
Rank

from Implicit Feedback - but neither is really on topic.

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Jan 29, 2009, at 11:36 AM, Walter Underwood wrote:


Thanks, I didn't know there was so much research in this area.
Most of the papers at those workshops are about tuning the
entire ranking algorithm with machine learning techniques.

I am interested in adding one more feature, click data, to an
existing ranking algorithm. In my case, I have enough data to
use query-specific boosts instead of global document boosts.
We get about 2M search clicks per day from logged in users
(little or no click spam).

I'm checking out some papers from Thorsten Joachims and from
Microsoft Research that are specifically about clickthrough
feedback.

wunder

On 1/27/09 11:15 PM, Neal Richter nrich...@gmail.com wrote:

OK I've implemented this before, written academic papers and  
patents

related to this task.

Here are some hints:
- you're on the right track with the editorial boosting elevators
- http://wiki.apache.org/solr/UserTagDesign
- be darn careful about assuming that one click is enough evidence
to boost a long
  'distance'
- first page effects in search will skew the learning badly if you
don't compensate.
 95% of users never go past the first page of results, 1% go
past the second
 page.  So perfectly good results on the second page get
permanently locked out
- consider forgetting what you learn under some condition

In fact this whole area is called 'learning to rank' and is a hot
research topic in IR.
http://web.mit.edu/shivani/www/Ranking-NIPS-05/
http://research.microsoft.com/en-us/um/people/lr4ir-2007/
https://research.microsoft.com/en-us/um/people/lr4ir-2008/

- Neal Richter


On Tue, Jan 27, 2009 at 2:06 PM, Matthew Runo mr...@zappos.com
wrote:

Hello folks!

We've been thinking about ways to improve organic search results
for a while
(really, who hasn't?) and I'd like to get some ideas on ways to
implement a
feedback system that uses user behavior as input. Basically, it'd
work on
the premise that what the user actually clicked on is probably a
really good
match for their search, and should be boosted up in the results
for that
search.

For example, if I search for rain boots, and really love the
10th result
down (and show it by clicking on it), then we'd like to capture
this and use
the data to boost up that result //for that search//. We've
thought about
using index time boosts for the documents, but that'd boost it
regardless of
the search terms, which isn't what we want. We've thought about
using the
Elevator handler, but we don't really want to force a product to
the top -
we'd prefer it slowly rises over time as more and more people
click it from
the same search terms. Another

Re: Question about rating documents

2009-01-29 Thread Matthew Runo
You could use a boost function to gently boost up items which were  
marked as more popular.


You would send the function query in the bf parameter with your  
query, and you can find out more about syntax here: http://wiki.apache.org/solr/FunctionQuery
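
For example, something like this (the popularity field is just an illustration - any indexed numeric field works; this is the same style of bf the example solrconfig.xml ships with):

http://localhost:8983/solr/select?qt=dismax&q=rain+boots&bf=ord(popularity)^0.5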


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Jan 29, 2009, at 10:27 AM, Reece wrote:


Currently I'm using SOLR 1.2 to index a few million documents.  It's
been requested that a way for users to rate the documents be done so
that something rated higher would show up higher in search results and
vice versa.

I've been thinking about it, but can't come up with a good way to do
this and still have the best match ranking of the results according
to search terms entered by the users.

I was hoping someone had done something similar or would have some
insight on it.

Thanks in advance!

-Reece





Re: solr as the data store

2009-01-28 Thread Matthew Runo
One thing to keep in mind is that things like joins are impossible in  
solr, but easy in a database. So if you ever need to do stuff like run  
reports, you're probably better off with a database to query on -  
unless you cover your bases very well in the solr index.


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Jan 28, 2009, at 12:37 PM, Ian Connor wrote:


Hi All,

Is anyone using Solr (and thus the Lucene index) as their database
store?


Up to now, we have been using a database to build Solr from.  
However, given
that lucene already keeps the stored data intact, and that  
rebuilding from
solr to solr can be very fast, the need for the separate database  
does not

seem so necessary.

It seems totally possible to maintain just the solr shards and treat  
them as
the database (backups, redundancy, etc are already built right in).  
The idea
that we would need to rebuild from scratch seems unlikely and the  
speed
boost by using solr shards for data massaging and reindexing seems  
very

appealing.

Has anyone else thought about this or done this and ran into  
problems that
caused them to go back to a separate database model? Is there a  
critical

need you can think is missing?

--
Regards,

Ian Connor




Optimizing Improving results based on user feedback

2009-01-27 Thread Matthew Runo

Hello folks!

We've been thinking about ways to improve organic search results for a  
while (really, who hasn't?) and I'd like to get some ideas on ways to  
implement a feedback system that uses user behavior as input.  
Basically, it'd work on the premise that what the user actually  
clicked on is probably a really good match for their search, and  
should be boosted up in the results for that search.


For example, if I search for rain boots, and really love the 10th  
result down (and show it by clicking on it), then we'd like to capture  
this and use the data to boost up that result //for that search//.  
We've thought about using index time boosts for the documents, but  
that'd boost it regardless of the search terms, which isn't what we  
want. We've thought about using the Elevator handler, but we don't  
really want to force a product to the top - we'd prefer it slowly  
rises over time as more and more people click it from the same search  
terms. Another way might be to stuff the keyword into the document,  
the more times it's in the document the higher it'd score - but  
there's gotta be a better way than that.


Obviously this can't be done 100% in solr - but if anyone had some  
clever ideas about how this might be possible it'd be interesting to  
hear them.


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833



Re: Sizing a Linux box for Solr?

2009-01-21 Thread Matthew Runo
At a certain level it will become better to have multiple smaller  
boxes rather than one huge one. I've found that even an old P4 with 2  
gigs of ram has decent response time on our 150,000 item index with  
only a few users - but it quickly goes downhill if we get more than 5  
or 6. How many documents are you going to be storing in your index?  
How much of them will be stored versus indexed? Will you be  
faceting on the results?


In general, I'd recommend a 64 bit processor with enough ram to store  
your index in ram - but that might not be possible with millions of  
records. Our 150,000 item index is about a gig and a half when  
optimized but yours will likely be different depending on how much you  
store. Faceting takes more memory than pure searching as well.


I'm sure that we could work out some better suggestions with more  
information about your use case.


http://www.nabble.com/Solr---User-f14480.html is a great place to go  
for searching the solr user list.


-Matthew

On Jan 21, 2009, at 8:55 AM, Thomas Dowling wrote:


Is there a useful guide somewhere that suggests system configurations
for machines that will support multiple large-ish Solr indexes?  I'm
working on a group of library databases (journal article citations +
abstracts, mostly), and need to provide some sort of helpful  
information

to our hardware people.  Other than "lots", is there an answer for "We
have X millions of records, of Y average size, with Z peak simultaneous
users, so the memory needed for reasonable search performance is ___"?
Or is the limiting factor on search performance going to be something else?


[Standard caveat: I did try checking the solr-user archives, but was
hampered by the fact that there's no search function.  The cobbler's
children go barefoot.]


--
Thomas Dowling
Ohio Library and Information Network
tdowl...@ohiolink.edu





Re: place log4j.properties

2009-01-15 Thread Matthew Runo
Have you tried placing it up in /WEB-INF/classes/? I'd think that'd be  
the root of the classpath for solr, and maybe where it's looking for  
the file?
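
For what it's worth, a minimal log4j.properties that should at least make the "no appenders" warning go away looks something like this (the console appender and pattern are just an example):

log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} %-5p [%c] %m%n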


If you figure it out, could you update the wiki?

--Matthew

On Jan 14, 2009, at 3:39 AM, Marc Sturlese wrote:



Hey there,
I have changed the log system in the nightly build to log4j  
following this

comment:

http://wiki.apache.org/solr/SolrLogging

Everything is loaded correclty but I am geting this INFO:

log4j:WARN No appenders could be found for logger
(org.apache.solr.servlet.SolrDispatchFilter).
log4j:WARN Please initialize the log4j system properly.

I think the problem is that the webapp is not finding the
log4j.properties.

I have tried placing it at the first class level:
./WEB-INF/classes/org/apache/solr/servlet/

But doesn't seem to recognize it... Any advice?

Thanks in advance

--
View this message in context: 
http://www.nabble.com/place-log4j.properties-tp21454379p21454379.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: Dismax Minimum Match/Stopwords Bug

2008-12-29 Thread Matthew Runo
Hmm, that makes sense to me - however I still think that even if we
have mm set to 2, a query for "the 7449078" should still match
7449078 in a productId field (it does not: http://zeta.zappos.com/search?department=&term=the+7449078).
This seems like it works against the way one would reasonably expect
it to - that stopwords shouldn't impact the counts for mm (so "the
7449078" would count as 1 term for mm, since "the" is a stopword).


Would there be a way around this? Could we possibly get it reworked?  
What would the downside to that be?


We have people asking for "the north" to return results from a brand
called "the north face" - but it doesn't, and can't, because of this
mm issue.


Thanks for your time helping us with this issue =)

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Dec 20, 2008, at 10:45 AM, Chris Hostetter wrote:



: Would this mean that, for example, if we wanted to search  
productId (long)
: we'd need to make a field type that had stopwords in it rather  
than simply

: using (long)?

not really ... that's kind of a special usecase.  if someone  
searches for
a productId that's usually *all* they search for (1 chunk of input
from the query parser) so it's mandatory and produces a clause across all
fields.  It doesn't matter if the other fields have stopwords --  
even if

the productId happens to be a stop word, that just means it doesn't
produce a clause on those stop worded fields, but it still will on
your

productId field.

The only case where you might get into trouble is if someone searches
for "the 123456" ... now you have two chunks of input, so the mm param
comes into play. you have no stopwords on your productId field so both
"the" and "123456" produce clauses, but "the" isn't going to be found in
your productId field, and because of stopwords it doesn't exist in the
other fields at all ... so you don't match anything.

FWIW: if i remember right if you want to put numeric fields in the  
qf, i
think you need *all* of them to be numeric and all of your input  
needs to

be numeric, or you get exceptions from the FieldType (not the dismax
parser) when people search for normal words.   i always copyField
productId into a productId_str field for purposes like this.
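
For reference, that copyField setup is just a couple of lines in schema.xml (slong and string here are the type names from the example schema - adjust to whatever your schema calls them):

<field name="productId" type="slong" indexed="true" stored="true"/>
<field name="productId_str" type="string" indexed="true" stored="false"/>
<copyField source="productId" dest="productId_str"/>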


-Hoss





Re: Nightly build - 2008-12-17.tgz - build error - java.lang.NoClassDefFoundError: org/mozilla/javascript/tools/shell/Main

2008-12-17 Thread Matthew Runo

I'm using Java 6 and it's compiling for me.

I'm doing..

ant clean
ant dist

and it works just fine. Maybe try an 'ant clean'?

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Dec 17, 2008, at 9:17 AM, Toby Cole wrote:

I came across this too earlier, I just deleted the contrib/javascript directory.
Of course, if you need javascript library then you'll have to get it  
building.


Sorry, probably not that helpful. :)
Toby.

On 17 Dec 2008, at 17:03, Kay Kay wrote:


I downloaded the latest .tgz and ran

$ ant dist


docs:

  [mkdir] Created dir: /opt/src/apache-solr-nightly/contrib/javascript/dist/doc
   [java] Exception in thread "main" java.lang.NoClassDefFoundError: org/mozilla/javascript/tools/shell/Main
   [java] at JsRun.main(Unknown Source)
   [java] Caused by: java.lang.ClassNotFoundException: org.mozilla.javascript.tools.shell.Main
   [java] at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
   [java] at java.security.AccessController.doPrivileged(Native Method)
   [java] at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
   [java] at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
   [java] at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
   [java] at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
   [java] at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
   [java] ... 1 more

BUILD FAILED
/opt/src/apache-solr-nightly/common-build.xml:335: The following error occurred while executing this line:
/opt/src/apache-solr-nightly/common-build.xml:212: The following error occurred while executing this line:
/opt/src/apache-solr-nightly/contrib/javascript/build.xml:74: Java returned: 1



and came across the above mentioned error.

The class seems to be from the Rhino (Mozilla JS) library. Is it
supposed to be packaged by default, or is there a license restriction
that prevents it from being so?




Toby Cole
Software Engineer

Semantico
Lees House, Floor 1, 21-23 Dyke Road, Brighton BN1 3FE
T: +44 (0)1273 358 238
F: +44 (0)1273 723 232
E: toby.c...@semantico.com
W: www.semantico.com





Re: Dismax Minimum Match/Stopwords Bug

2008-12-15 Thread Matthew Runo
Would this mean that, for example, if we wanted to search productId  
(long) we'd need to make a field type that had stopwords in it rather  
than simply using (long)?


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Dec 12, 2008, at 11:56 PM, Chris Hostetter wrote:



: I have discovered some weirdness with our Minimum Match  
functionality.
: Essentially it comes up with absolutely no results on certain  
queries.
: Basically, searches with 2 words and 1 being "the" don't have a return
: result.  From what we can gather the minimum match criteria is making it
: such that if there are 2 words then both are required.  Unfortunately, the


you haven't mentioned what qf you're using, and you only listed one  
field
type, which includes stopwords -- but i suspect your qf contains at  
least

one field that *doesn't* remove stopwords.

this is in fact an unfortunate aspect of the way dismax works --
each chunk of text recognized by the querypaser is passed to each
analyzer for each field.  Any chunk that produces a query for a field
becomes a DisjunctionMaxQuery, and is included in the mm count --  
even
if that chunk is a stopword in every other field (and produces no  
query)


so you have to either be consistent with your stopwords across all  
fields,
or make your mm really small.  searching for dismax stopwords  
turns this

up...

http://www.nabble.com/Re%3A-DisMax-request-handler-doesn%27t-work-with-stopwords--p11016770.html

...if i'm wrong about your situation (some fields in the qf with  
stopwords
and some fields without) then please post all of the params you are  
using
(not just mm) and the full parsedquery_tostring from when  
debugQuery=true

is turned on.




-Hoss




Re: Solr 1.3 - response time very long

2008-12-03 Thread Matthew Runo
Are you manipulating the query at all between the url like
/test/selector?cache=0&backend=solr&request=/relevance/search/D and what
gets sent to Solr? To me, those don't look like solr requests (I could
be missing something though). I'd be curious to see the actual
requests to try and let you know why you're getting an error (what
error is it giving you?).


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833

On Dec 3, 2008, at 1:02 AM, sunnyfr wrote:



Hi again,

In my test, I've got a maximum response time of 65 sec against an average of 3 sec,
so it may come from some requests which produce errors - for example, in my
test of 50,000 requests around 30 come back as errors, and that's why
the max response time is 65 sec.

I just don't get why I get this error on some requests, like:
/test/selector?cache=0&backend=solr&request=/relevance/search/D
/test/selector?cache=0&backend=solr&request=/relevance/search/?f+you
/test/selector?cache=0&backend=solr&request=/relevance/search/?
/test/selector?cache=0&backend=solr&request=/relevance/search/the
/test/selector?cache=0&backend=solr&request=/relevance/search/?
...
When I search it manually not by jMeter .. indeed it takes a long  
time and

then it get back ids.
What do you think?

Thanks a lot for your help.


sunnyfr wrote:


Hi Matthew, Hi Yonik,

...sorry for the flag .. didn't want to ...

Solr 1.3  / Apache 5.5

Data's directory size : 7.9G
I'm using JMeter to send HTTP requests; I'm sending exactly the same ones to Solr
and Sphinx (MySQL), both over HTTP.

solr
http://test-search.com/test/selector?cache=0&backend=solr&request=/relevance/search/dog
sphinx
http://test-search.com/test/selector?cache=0&backend=mysql&request=/relevance/search/dog

When there are more than 4 threads it gets slower. For a big test over
40 min, increasing to 100 threads/sec for Solr as for Sphinx, at the
end the average for Solr is 3 sec and for Sphinx 1 sec.

solrconfig.xml :  http://www.nabble.com/file/p20802690/solrconf.xml
solrconf.xml

schema.xml:
<fields>
   <field name="id"                      type="sint"   indexed="true" stored="true"  omitNorms="true"/>
   <field name="duration"                type="sint"   indexed="true" stored="false" omitNorms="true"/>
   <field name="created"                 type="date"   indexed="true" stored="true"  omitNorms="true"/>
   <field name="modified"                type="date"   indexed="true" stored="false" omitNorms="true"/>
   <field name="rating_binrate"          type="sint"   indexed="true" stored="true"  omitNorms="true"/>
   <field name="user_id"                 type="sint"   indexed="true" stored="false" omitNorms="true"/>
   <field name="country"                 type="string" indexed="true" stored="false" omitNorms="true"/>
   <field name="language"                type="string" indexed="true" stored="true"  omitNorms="true"/>
   ...
   <field name="stat_views"              type="sint"   indexed="true" stored="true"  omitNorms="true"/>
   <field name="stat_views_today"        type="sint"   indexed="true" stored="false" omitNorms="true"/>
   <field name="stat_views_last_week"    type="sint"   indexed="true" stored="false" omitNorms="true"/>
   <field name="stat_views_last_month"   type="sint"   indexed="true" stored="false" omitNorms="true"/>
   <field name="stat_comments"           type="sint"   indexed="true" stored="false" omitNorms="true"/>
   <field name="stat_comments_today"     type="sint"   indexed="true" stored="false" omitNorms="true"/>
   <field name="stat_comments_last_week" type="sint"   indexed="true" stored="false" omitNorms="true"/>
   <field name="stat_comments_last_month" type="sint"  indexed="true" stored="false" omitNorms="true"/>
   ...
   <field name="title"          type="text"    indexed="true" stored="true"/>
   <field name="title_fr"       type="text_fr" indexed="true" stored="false"/>
   <field name="title_en"       type="text_en" indexed="true" stored="false"/>
   <field name="title_de"       type="text_de" indexed="true" stored="false"/>
   <field name="title_es"       type="text_es" indexed="true" stored="false"/>
   <field name="title_ru"       type="text_ru" indexed="true" stored="false"/>
   <field name="title_pt"       type="text_pt" indexed="true" stored="false"/>
   <field name="title_nl"       type="text_nl" indexed="true" stored="false"/>
   <field name="title_el"       type="text_el" indexed="true" stored="false"/>
   <field name="title_ja"       type="text_ja" indexed="true" stored="false"/>
   <field name="title_it"       type="text_it" indexed="true" stored="false"/>

   <field name="description"    type="text"    indexed="true" stored="true"/>
   <field name="description_fr" type="text_fr" indexed="true" stored="false"/>
   <field name="description_en" type="text_en" indexed="true" stored="false"/>
   <field name="description_de" type="text_de" indexed="true" stored="false"/>
   <field name="description_es" type="text_es" indexed="true" stored="false"/>
   <field name="description_ru" type

Re: omiting no price documents when sorting on price

2008-11-26 Thread Matthew Runo
You also don't /need/ to put in a price in the index. If something  
doesn't have a value for a field, you can just not send the field.  
Then sorting won't be thrown off by the dummy value. Then those  
documents simply won't have a price field.


Of course, if you need to facet on it or query it or anything like  
that you can't do this..
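
And if you go that route, the sortable types in the example schema take a sortMissingLast attribute, so the docs with no price land at the end of the sort rather than the front - e.g. the stock declaration:

<fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/>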


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833

On Nov 26, 2008, at 9:09 AM, Erick Erickson wrote:


Assuming you know you want to do this at query time, couldn't
you just add a -price:0 clause to your query?

Best
Erick

On Wed, Nov 26, 2008 at 11:00 AM, joeMcElroy [EMAIL PROTECTED] wrote:



im sure this is an easy question but...

when a product doesn't have a price, I index the price as 0. When sorting on
price, these values come up first or last. How can you omit these items when
sorting against price?

thanks

joe
--
View this message in context:
http://www.nabble.com/omiting-no-price-documents-when-sorting-on-price-tp20703795p20703795.html
Sent from the Solr - User mailing list archive at Nabble.com.






Re: [VOTE] Community Logo Preferences

2008-11-24 Thread Matthew Runo

1. 
https://issues.apache.org/jira/secure/attachment/12394263/apache_solr_a_blue.jpg
2. 
https://issues.apache.org/jira/secure/attachment/12394282/solr2_maho_impression.png

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833

On Nov 23, 2008, at 8:59 AM, Ryan McKinley wrote:


Please submit your preferences for the solr logo.

For full voting details, see:
 http://wiki.apache.org/solr/LogoContest#Voting

The eligible logos are:
 http://people.apache.org/~ryan/solr-logo-options.html

Any and all members of the Solr community are encouraged to reply to  
this thread and list (up to) 5 ranked choices by listing the Jira  
attachment URLs. Votes will be assigned a point value based on rank.  
For each vote, 1st choice has a point value of 5, 5th place has a  
point value of 1, and all others follow a similar pattern.


https://issues.apache.org/jira/secure/attachment/12345/yourfrstchoice.jpg
https://issues.apache.org/jira/secure/attachment/34567/yoursecondchoice.jpg
...

This poll will be open until Wednesday November 26th, 2008 @ 11:59PM  
GMT


When the poll is complete, the solr committers will tally the  
community preferences and take a final vote on the logo.


A big thanks to everyone who submitted possible logos -- it's great
to see so many good options.




Fwd: Software Announcement: LuSql: Database to Lucene indexing

2008-11-17 Thread Matthew Runo

Hello -

I wanted to forward this on, since I thought that people here might be  
able to use this to build indexes. So long as the lucene version in  
LuSQL matches the version in Solr, it would work fine for indexing -  
yea?


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833

Begin forwarded message:


From: Glen Newton [EMAIL PROTECTED]
Date: November 17, 2008 4:32:18 AM PST
To: [EMAIL PROTECTED]
Subject: Software Announcement: LuSql: Database to Lucene indexing
Reply-To: [EMAIL PROTECTED]

LuSql is a simple but powerful tool for building Lucene indexes from
relational databases. It is a command-line Java application for the
construction of a Lucene index from an arbitrary SQL query of a
JDBC-accessible SQL database. It allows a user to control a number of
parameters, including the SQL query to use, individual
indexing/storage/term-vector nature of fields, analyzer, stop word
list, and other tuning parameters. In its default mode it uses
threading to take advantage of multiple cores.

LuSql can handle complex queries, allows for additional per record
sub-queries, and has a plug-in architecture for arbitrary Lucene
document manipulation. Its only dependencies are three Apache Commons
libraries, the Lucene core itself, and a JDBC driver.

LuSql has been extensively tested, including a large 6+ million
full-text & metadata journal article document collection, producing an
86GB Lucene index in ~13 hours.

http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql

Glen Newton






Re: Solr Core Size limit

2008-11-11 Thread Matthew Runo
What happens when we use another uniqueKey in this case? I was under
the assumption that if we say <uniqueKey>styleId</uniqueKey> then our
doc IDs will be our styleIds.


Is there a secondary ID that's kept internal to Solr/Lucene in this  
case?


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833

On Nov 11, 2008, at 10:25 AM, Otis Gospodnetic wrote:


Doc ID gaps are zapped during segment merges and index optimization.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch





From: Norberto Meijome [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Monday, November 10, 2008 6:45:01 PM
Subject: Re: Solr Core Size limit

On Mon, 10 Nov 2008 10:24:47 -0800 (PST)
Otis Gospodnetic [EMAIL PROTECTED] wrote:

I don't think there is a limit other than your hardware and the  
internal Doc

ID which limits you to 2B docs on 32-bit machines.


Hi Otis,
just curious - is this internal doc ID reused when an optimise
happens? Or are gaps left and re-filled when 2B is reached?


cheers,
b

_
{Beto|Norberto|Numard} Meijome

Whenever you find that you are on the side of the majority, it is  
time to reform.

  Mark Twain

I speak for myself, not my employer. Contents may be hot. Slippery  
when wet. Reading disclaimers makes you go blind. Writing them is  
worse. You have been Warned.




Re: How to search a DataImportHandler solr index

2008-10-23 Thread Matthew Runo
So you were able to get things working? What was your experience with  
the DataImportHandler like?


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833

On Oct 23, 2008, at 6:50 AM, Nick80 wrote:



Never mind. I needed to specify in schema.xml that the field is  
multiValued.
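
For reference, that's the multiValued attribute on the field declaration in schema.xml - a sketch with a hypothetical field name:

<field name="author" type="text" indexed="true" stored="true" multiValued="true"/>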

--
View this message in context: 
http://www.nabble.com/How-to-search-a-DataImportHandler-solr-index-tp20120698p20131412.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: SolrJ + HTTP caching

2008-10-15 Thread Matthew Runo
We've been using Varnish (http://varnish.projects.linpro.no/) in front  
of our Solr servers, and have been seeing about a 70% hit rate for the  
queries. We're using SolrJ, and have seen no bad effects of the cache.


That said, we're just caching everything for a few minutes. We don't  
pick and choose which queries get cached in Varnish, and our business  
users are fine with the index being a few minutes stale as the TTL  
expires on the cache. I don't think solr has a way to, at query time,  
change the cache control headers.


http://wiki.apache.org/solr/SolrAndHTTPCaches may be a good jumping  
off point for more thought.
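
On the Solr side, the 1.3 solrconfig.xml does let you set the Cache-Control header globally - roughly this stanza from the example config (the max-age value here is just an illustration):

<httpCaching lastModifiedFrom="openTime" etagSeed="Solr">
  <cacheControl>max-age=300, public</cacheControl>
</httpCaching>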


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833

On Oct 15, 2008, at 9:24 AM, Jon Baer wrote:


Hi,

What is the proper behavior suppose to be between SolrJ and  
caching?  Im proxying through a framework and wondering if it is  
possible to turn on / turn off caching programatically depending on  
the type of query (or if this will have no effect whatsoever) ...  
since SolrJ uses Apache HTTP client libs can it negotiate anything  
here?


SOLR-127: HTTP Caching awareness.  Solr now recognizes HTTP Request
   headers related to HTTP Caching (see RFC 2616 sec13) and will  
respond
   with 304 Not Modified when appropriate.  New options have been  
added

   to solrconfig.xml to influence this behavior.
   (Thomas Peuss via hossman)

Thanks.

- Jon





Re: Hardware config for SOLR

2008-09-18 Thread Matthew Runo
I can't speak to a lot of this - but regarding the servers I'd go with  
the more powerful ones, if only for the amount of ram. Your index will  
likely be larger than 1 gig, and with only two you'll have a lot of  
your index not stored in ram, which will slow down your QPS.


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833

On Sep 17, 2008, at 3:32 PM, Andrey Shulinskiy wrote:


Hello,



We're planning to use SOLR for our project, got some questions.



So I asked some Qs yesterday, got no answers whatsoever. Wondering if
they didn't make sense, or if the e-mail was too long... :-)

Anyway, I'll try to ask them again and hope for some answers this  
time.


It's a very new experience for us so any help is really appreciated.



First, some numbers we're expecting.

- The average size of a doc: ~100K

- The number of indexes: 1

- The query response time we're looking for:  200 - 300ms

- The number of stored docs:

1st year: 500K - 1M

2nd year: 2-3M

- The estimated number of concurrent users per second

1st year: 15 - 25

2nd year: 40 - 60

- The estimated number of queries

1st year: 15 - 25

2nd year: 40 - 60



Now the questions



1)  Should we do sharding or not?

If we start without sharding, how hard will it be to enable it?

Is it just some config changes + the index rebuild or is it more?

My personal opinion is to go without sharding at first and enable it
later if do get a lot of documents.



2)  How should we organize our clusters to ensure redundancy?

Should we have 2 or more identical Masters (means that all the
updates/optimisations/etc. are done for every one of them)?

An alternative, afaik, is to reconfigure one slave to become the new
Master, how hard is that?



3) Basically, we can get servers of two kinds:



* Single Processor, Dual Core Opteron 2214HE

* 2 GB DDR2 SDRAM

* 1 x 250 GB (7200 RPM) SATA Drive(s)



* Dual Processor, Quad Core 5335

* 16 GB Memory (Fully Buffered)

* 2 x 73 GB (10k RPM) 2.5 SAS Drive(s), RAID 1



The second - more powerful - one is more expensive, of course.



How can we take advantage of the multiprocessor/multicore servers?

Is there some special setup required to make, say, 2 instances of SOLR
run on the same server using different processors/cores?



4)  Does it make much difference to get a more powerful Master?

Or, on the contrary, as slaves will be queried more often, they should
be the better ones? Maybe just the HDDs for the slaves should be as  
fast

as possible?



5) How many slaves does it make sense to have per one Master?

What's (roughly) the performance gain from 1 to 2, 2 - 3, etc?

When does it stop making sense to add more slaves?

As far as I understand, it depends mainly on the size of the index.
However, I'd guess the time required to do a push for too many slaves
can be a problem too, correct?



Thanks,

Andrey.







Re: [SPAM] Multiple Process of the SAME solr instance

2008-09-17 Thread Matthew Runo
I'm not 100% sure on what you mean, but if you're asking if you can  
run two or more solr webapps and use them all to build up one index,  
then you can't. You'll end up with a corrupted index. Only one  
solr.war webapp can write to an index at a time.


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833

On Sep 17, 2008, at 7:54 AM, mohitranka wrote:



Hi All,
I am using Solr 1.3 with Tomcat 5.5.20 as the servlet container. I
need to create multiple processes of the same Solr instance to process
the incoming indexes effectively. Can you point me to how (and where :-) )
to do it?

Thanks and regards,
Mohit Ranka

--
View this message in context: 
http://www.nabble.com/Multiple-Process-of-the-SAME-solr-instance-tp19533951p19533951.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: Solr Slaves Sync

2008-09-04 Thread Matthew Runo
As far as I can tell, there is no need to remove a slave from a pool
while performing the sync. It's all done in the background and doesn't
change anything till the final <commit/> is run to open a new searcher.


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833

On Sep 4, 2008, at 10:46 AM, OLX - Pablo Garrido wrote:


Hello

We have a 3 Solr Servers replication schema, one Master and 2 Slaves,
commits are done every 5 minutes on the Master and an optimize is done
once a day at midnight. Snapshots are copied via rsync to the Slaves
every 10 minutes. We are facing serious problems when doing the
sync after the optimize and keeping Slaves serving queries as usual,
active connections to Slaves increase highly during Optimize Snapshot
sync, is there any way we can tune this process ? we try this  
process :


1. stopping Sync process on one Slave
2. taking the other one out of the LB pool
3. do the sync on this offline Slave
4. after sync is over add back to LB Pool synced Slave
5. take other Slave out from LB Pool
6. start sync process on the offline Slave
7. add back synced Slave to LB Pool

following these steps we sometimes face high active connections when
moving Slaves back to LB Pool. Has anybody faced this situation in
production envs ? Thanks

Pablo





Re: Building a multilevel query.

2008-09-03 Thread Matthew Runo
I think in order to do this you'd need to run two queries. We do this  
as well, for example..


Facet on the product types that match a query term.
For each product type, run another query to facet on the subcategories.
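
In SolrJ terms that looks roughly like this (an untested sketch - the type and subcategory field names are made up):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TwoLevelFacets {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // Query 1: facet on the top-level product type field.
        SolrQuery top = new SolrQuery("boots");
        top.setFacet(true);
        top.addFacetField("type");
        QueryResponse rsp = server.query(top);

        // Query 2..n: for each type value, facet on the subcategory field,
        // narrowing with a filter query. (Real values may need escaping.)
        for (FacetField.Count c : rsp.getFacetField("type").getValues()) {
            SolrQuery sub = new SolrQuery("boots");
            sub.addFilterQuery("type:" + c.getName());
            sub.setFacet(true);
            sub.addFacetField("subcategory");
            QueryResponse subRsp = server.query(sub);
            System.out.println(c.getName() + " -> "
                    + subRsp.getFacetField("subcategory").getValues());
        }
    }
}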

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833

On Sep 2, 2008, at 5:20 PM, Erik Holstad wrote:


Hi!
I want to do a query that first queries on one specific field and
for all those that match the input do a second query.

For example if we have a type field where one of the options
is user and a title fields includes the names of the users.

So I want to find all data with type field = user where the name
Erik is in the title field.

Is this possible? Have been playing with faceting, but can't get
the facet.query to work, and otherwise I just get all the results.

Regards Erik




Re: SpellCheckComponent bug?

2008-08-27 Thread Matthew Runo
"runnning" does have multiple suggestions, "Cunning" and "Running" - but it
properly picks "Running". I have not noticed this for any other term,
but I have not exhaustively tested others yet.


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833

On Aug 27, 2008, at 7:52 AM, Grant Ingersoll wrote:

Hmm, sounds like a bug.  A test case would be great, but at a  
minimum file a JIRA.


Do those other terms that collate properly have multiple suggestions?

On Aug 25, 2008, at 6:24 PM, Matthew Runo wrote:


Hello folks!

I seem to be seeing a bug in the SpellCheckComponent..

Search term: Quicksilver... I get two suggestions...

<lst name="suggestion">
  <int name="frequency">2</int>
  <str name="word">Quicksilver</str>
</lst>

<lst name="suggestion">
  <int name="frequency">220</int>
  <str name="word">Quiksilver</str>
</lst>

...and it's not correctly spelled...

<bool name="correctlySpelled">false</bool>

...but the collation is of the first term - not the one with the
highest frequency?

<str name="collation">Quicksilver</str>

This seems to be anti-what-the-docs-say collation should do. Other,  
more popular terms (shoez, runnning, etc) all seem to collate  
properly. I'm hitting Solr via SolrJ and not really doing anything  
too fancy - using SVN head at the moment. Just wondered if anyone  
had any ideas. There are no synonyms in this system, so I don't  
think that could be it. I've rebuilt the search index.


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833








SpellCheckComponent bug?

2008-08-25 Thread Matthew Runo

Hello folks!

I seem to be seeing a bug in the SpellCheckComponent..

Search term: Quicksilver... I get two suggestions...

<lst name="suggestion">
  <int name="frequency">2</int>
  <str name="word">Quicksilver</str>
</lst>

<lst name="suggestion">
  <int name="frequency">220</int>
  <str name="word">Quiksilver</str>
</lst>

...and it's not correctly spelled...

<bool name="correctlySpelled">false</bool>

...but the collation is of the first term - not the one with the
highest frequency?

<str name="collation">Quicksilver</str>

This seems to be anti-what-the-docs-say collation should do. Other,  
more popular terms (shoez, runnning, etc) all seem to collate  
properly. I'm hitting Solr via SolrJ and not really doing anything too  
fancy - using SVN head at the moment. Just wondered if anyone had any  
ideas. There are no synonyms in this system, so I don't think that  
could be it. I've rebuilt the search index.
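
For reference, the raw equivalent of what I'm sending through SolrJ, with the standard SpellCheckComponent params (host and handler below are from the example setup, not necessarily mine):

http://localhost:8983/solr/select?q=Quicksilver&spellcheck=true&spellcheck.count=5&spellcheck.collate=true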


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833



Re: Deadlock in lucene?

2008-08-19 Thread Matthew Runo
I know this isn't really the place for this, so please forgive me -  
but does this patch look reasonably safe to use to skip the isDeleted  
check inside of FunctionQuery?


My reasoning behind this is that many people (us included) will be  
building the index on a separate server, and then using the  
replication scripts to publish the files out to several read-only  
servers. On those instances, deletedDocs would always be empty, since  
it's a read only instance - and so we can conveniently skip the Lucene  
code in question. This flag would also be good for other optimizations  
that can only be made when you assume the index is read-only.


Solr seems to work with the flag set - any reasons why this will crash  
and/or kill my kitten?


(please forgive my posting this here instead of in solr-dev!)

Index: src/java/org/apache/solr/search/FunctionQParser.java
===
--- src/java/org/apache/solr/search/FunctionQParser.java	(revision  
687135)
+++ src/java/org/apache/solr/search/FunctionQParser.java	Tue Aug 19  
11:08:45 PDT 2008

@@ -49,7 +49,7 @@
 }
 ***/

-return new FunctionQuery(vs);
+return new FunctionQuery(vs,  
req.getSchema().getSolrConfig().isReadOnly() );

   }

   /**
Index: src/java/org/apache/solr/search/function/FunctionQuery.java
===
--- src/java/org/apache/solr/search/function/FunctionQuery.java	 
(revision 687135)
+++ src/java/org/apache/solr/search/function/FunctionQuery.java	Tue  
Aug 19 11:08:45 PDT 2008

@@ -31,12 +31,14 @@
  */
 public class FunctionQuery extends Query {
   ValueSource func;
+  Boolean readOnly;

   /**
* @param func defines the function to be used for scoring
*/
-  public FunctionQuery(ValueSource func) {
+  public FunctionQuery(ValueSource func, Boolean readOnly) {
 this.func=func;
+this.readOnly=readOnly;
   }

   /** @return The associated ValueSource */
@@ -113,7 +115,7 @@
 if (doc=maxDoc) {
   return false;
 }
-if (reader.isDeleted(doc)) continue;
+if (!readOnly  reader.isDeleted(doc)) continue;
 // todo: maybe allow score() to throw a specific exception
 // and continue on to the next document if it is thrown...
 // that may be useful, but exceptions aren't really good
Index: src/java/org/apache/solr/core/Config.java
===
--- src/java/org/apache/solr/core/Config.java   (revision 687135)
+++ src/java/org/apache/solr/core/Config.java	Tue Aug 19 11:08:45 PDT  
2008

@@ -45,6 +45,8 @@
   private final String name;
   private final SolrResourceLoader loader;

+  private Boolean readOnly;
+
   /**
* @deprecated Use {@link #Config(SolrResourceLoader, String,
InputStream, String)} instead.

*/
@@ -254,6 +256,19 @@
  return val!=null ? Double.parseDouble(val) : def;
}

+  /**
+   * Is the index set up to be readOnly? If so, this will cause the  
FunctionQuery stuff to not check

+   * for deleted documents.
+   * @return boolean readOnly
+   */
+   public boolean isReadOnly() {
+   if( this.readOnly == null ){
+   readOnly = getBool(/mainIndex/readOnly, false);
+   }
+
+   return readOnly;
+   }
+
   // The following functions were moved to ResourceLoader
   
//-

Index: example/solr/conf/solrconfig.xml
===
--- example/solr/conf/solrconfig.xml(revision 687135)
+++ example/solr/conf/solrconfig.xmlTue Aug 19 11:13:13 PDT 2008
@@ -114,6 +114,12 @@
          This is not needed if lock type is 'none' or 'single'
      -->
     <unlockOnStartup>false</unlockOnStartup>
+
+   <!-- In the event that you are only using this index for reads,
+        you can enable this flag. This will skip some checks that
+        can cause performance issues when under high load
+   -->
+   <readOnly>false</readOnly>
   </mainIndex>

   <!-- Enables JMX if and only if an existing MBeanServer is found, use




--- end patch ---

On Aug 18, 2008, at 8:04 PM, Yonik Seeley wrote:


It's not a deadlock (just a synchronization bottleneck) , but it is a
known issue in Lucene and there has been some progress in improving
the situation.
-Yonik


On Mon, Aug 18, 2008 at 10:55 PM, Matthew Runo [EMAIL PROTECTED]  
wrote:

Hello folks!

I was just wondering if anyone else has seen this issue under heavy  
load. We
had some servers set to very high thread limits (12 core servers  
with 32
gigs of ram), and found several threads would end up in this  
state


Name: http-8080-891
State: BLOCKED on [EMAIL PROTECTED] owned by: http-8080-191
Total blocked: 97,926  Total waited: 16

Stack trace:
org.apache.lucene.index.SegmentReader.isDeleted(SegmentReader.java:674)
org.apache.solr.search.function.FunctionQuery

Re: Deadlock in lucene?

2008-08-19 Thread Matthew Runo
Ouch, that's certainly a problem! I'll have to think some more on this  
one.
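
One direction that might rescue the idea (an untested sketch against the patch I posted; IndexReader.hasDeletions() is existing Lucene API) would be to ask the reader itself rather than trust a config flag:

-if (reader.isDeleted(doc)) continue;
+// hasDeletions() is false only when no docs are marked deleted
+// (e.g. right after an optimize); in a real patch this check
+// would be hoisted out of the loop.
+if (reader.hasDeletions() && reader.isDeleted(doc)) continue;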


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833

On Aug 19, 2008, at 1:42 PM, Otis Gospodnetic wrote:

Matthew, just because an index is read-only on some server it  
doesn't mean it contains no deletes (no docs marked as deleted, but  
not yet removed from the index).  So you still want to check  
isDeleted(doc) *unless* you are certain the index has no docs marked  
as deleted (this happens after optimization).


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: Matthew Runo [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Tuesday, August 19, 2008 4:26:59 PM
Subject: Re: Deadlock in lucene?

I know this isn't really the place for this, so please forgive me -
but does this patch look reasonably safe to use to skip the isDeleted
check inside of FunctionQuery?

My reasoning behind this is that many people (us included) will be
building the index on a separate server, and then using the
replication scripts to publish the files out to several read-only
servers. On those instances, deletedDocs would always be empty, since
it's a read only instance - and so we can conveniently skip the  
Lucene
code in question. This flag would also be good for other  
optimizations

that can only be made when you assume the index is read-only.

Solr seems to work with the flag set - any reasons why this will  
crash

and/or kill my kitten?

(please forgive my posting this here instead of in solr-dev!)

Index: src/java/org/apache/solr/search/FunctionQParser.java
===
--- src/java/org/apache/solr/search/FunctionQParser.java(revision
687135)
+++ src/java/org/apache/solr/search/FunctionQParser.javaTue Aug  
19

11:08:45 PDT 2008
@@ -49,7 +49,7 @@
 }
 ***/

-return new FunctionQuery(vs);
+return new FunctionQuery(vs,
req.getSchema().getSolrConfig().isReadOnly() );
   }

   /**
Index: src/java/org/apache/solr/search/function/FunctionQuery.java
===
--- src/java/org/apache/solr/search/function/FunctionQuery.java
(revision 687135)
+++ src/java/org/apache/solr/search/function/FunctionQuery.java 
Tue

Aug 19 11:08:45 PDT 2008
@@ -31,12 +31,14 @@
  */
 public class FunctionQuery extends Query {
   ValueSource func;
+  Boolean readOnly;

   /**
* @param func defines the function to be used for scoring
*/
-  public FunctionQuery(ValueSource func) {
+  public FunctionQuery(ValueSource func, Boolean readOnly) {
 this.func=func;
+this.readOnly=readOnly;
   }

   /** @return The associated ValueSource */
@@ -113,7 +115,7 @@
 if (doc=maxDoc) {
   return false;
 }
-if (reader.isDeleted(doc)) continue;
+if (!readOnly  reader.isDeleted(doc)) continue;
 // todo: maybe allow score() to throw a specific exception
 // and continue on to the next document if it is thrown...
 // that may be useful, but exceptions aren't really good
Index: src/java/org/apache/solr/core/Config.java
===
--- src/java/org/apache/solr/core/Config.java(revision 687135)
+++ src/java/org/apache/solr/core/Config.javaTue Aug 19  
11:08:45 PDT

2008
@@ -45,6 +45,8 @@
   private final String name;
   private final SolrResourceLoader loader;

+  private Boolean readOnly;
+
   /**
* @deprecated Use {@link #Config(SolrResourceLoader, String,
InputStream, String)} instead.
*/
@@ -254,6 +256,19 @@
  return val!=null ? Double.parseDouble(val) : def;
}

+  /**
+   * Is the index set up to be readOnly? If so, this will cause the
FunctionQuery stuff to not check
+   * for deleted documents.
+   * @return boolean readOnly
+   */
+   public boolean isReadOnly() {
+   if( this.readOnly == null ){
+   readOnly = getBool(/mainIndex/readOnly, false);
+   }
+
+   return readOnly;
+   }
+
   // The following functions were moved to ResourceLoader

//-

Index: example/solr/conf/solrconfig.xml
===
--- example/solr/conf/solrconfig.xml(revision 687135)
+++ example/solr/conf/solrconfig.xmlTue Aug 19 11:13:13 PDT 2008
@@ -114,6 +114,12 @@
          This is not needed if lock type is 'none' or 'single'
      -->
     <unlockOnStartup>false</unlockOnStartup>
+
+   <!-- In the event that you are only using this index for reads,
+        you can enable this flag. This will skip some checks that
+        can cause performance issues when under high load
+   -->
+   <readOnly>false</readOnly>
   </mainIndex>

   <!-- Enables JMX if and only if an existing MBeanServer is found, use




--- end patch ---

On Aug 18, 2008, at 8:04 PM, Yonik Seeley wrote:

It's not a deadlock (just a synchronization bottleneck) , but it  
is a

known issue in Lucene and there has been some progress in improving
the situation.
-Yonik


On Mon, Aug 18, 2008 at 10:55 PM, Matthew Runo
wrote:

Hello folks!

I was just wondering if anyone else has seen this issue under heavy
load. We
had

Synonyms with spaces not working

2008-08-18 Thread Matthew Runo

Hello folks!

Sorry to ask such a basic question but synonyms might be the end of  
me.. I suspect that there is something fundamentally wrong with the  
field type I've set up..


<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

In synonyms.txt I have a *large* list of synonyms in the following  
format..


a, b, c d e, f, g => something

I'm having the behavior that searches for a, b, f, and g all work, but  
the c d e does not. I suspected that was because things were getting  
split on white space before they were going to the synonym filter, so  
I moved the synonym filters to be before the tokenizer. Something's  
still wrong though... any help would be most appreciated!


Thank you for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833



Deadlock in lucene?

2008-08-18 Thread Matthew Runo

Hello folks!

I was just wondering if anyone else has seen this issue under heavy  
load. We had some servers set to very high thread limits (12 core  
servers with 32 gigs of ram), and found several threads would end up  
in this state


Name: http-8080-891
State: BLOCKED on [EMAIL PROTECTED] owned by: http-8080-191
Total blocked: 97,926  Total waited: 16

Stack trace:
org.apache.lucene.index.SegmentReader.isDeleted(SegmentReader.java:674)
org.apache.solr.search.function.FunctionQuery$AllScorer.next(FunctionQuery.java:116)
org.apache.lucene.util.ScorerDocQueue.topNextAndAdjustElsePop(ScorerDocQueue.java:116)
org.apache.lucene.search.DisjunctionSumScorer.advanceAfterCurrent(DisjunctionSumScorer.java:175)
org.apache.lucene.search.DisjunctionSumScorer.skipTo(DisjunctionSumScorer.java:228)
org.apache.lucene.search.ReqOptSumScorer.score(ReqOptSumScorer.java:76)
org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:357)
org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:320)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:137)
org.apache.lucene.search.Searcher.search(Searcher.java:126)
org.apache.lucene.search.Searcher.search(Searcher.java:105)
org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1148)
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:834)
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:269)
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:160)
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:169)
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:128)
org.apache.solr.core.SolrCore.execute(SolrCore.java:1143)
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:272)
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
java.lang.Thread.run(Thread.java:619)

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833



Synonyms help in 1.3-HEAD?

2008-08-14 Thread Matthew Runo

Hello folks!

Having a heck of a time trying to get a synonyms file to work  
properly. It seems that something's wrong with the way it's been set  
up, but, honestly, I can't see anything wrong with it. Some samples...


This works...
zutanoapparel => zutano

But this does not...
aadias, aadidas, aaidas, adadas, adaddas, adaddis, adadias, adadis,  
adaidas, adaies, addedas, addedis, addidaas, addidads, addidais,  
addidas, addidascom, addiddas, addides, addidis, adeadas, adedas,  
adeddas, adedias, adiada, adiadas, adiadis, adiads, adida, adidaas,  
adidas1, adidass, adidaz, adidda, adiddas, adiddias, adidias, adidis,  
adiidas, aditas, adudas, afidas, aididas, wwwadidascom => adidas


This works...
liumiani, loomiani, lumaini, lumanai, lumani, lumiami, lumian,  
lumiana, lumianai, lumiari, luminani, lumini, luminiani => lumiani


But this does not...
clegerie, cleregie, clergerie, clergie, robertclaregie, robert  
claregie, robertclargeries, robert clargeries, robertclegerie, robert  
clegerie, robertcleregie, robert cleregie, robertclergeic, robert  
clergeic, robertclergerie, robertclergi, robert clergi, robertclergie,  
robert clergie, robertclergoe, robert clergoe, robertclerige, robert  
clerige, robertclerterie, robert clerterie => Robert Clergerie


This is how they're set up in my schema..
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>


Is there a limit to the number of terms in the list of options? It  
seems that the ones that are shorter work, while the longer lists  
don't. I'm at a loss as to why though..


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833



Re: Synonyms help in 1.3-HEAD?

2008-08-14 Thread Matthew Runo
Thank you for your suggestion, I really don't see anything 'wrong'  
with the longer lists.. I entered https://issues.apache.org/jira/browse/SOLR-702 
 for this issue, and attached relevant files. If you need anything  
more, don't hesitate to contact me!


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833

On Aug 14, 2008, at 10:16 AM, Yonik Seeley wrote:


There should be no limit, so you may have uncovered a bug.  Could you
open a JIRA issue?  If it's a real bug, it should get fixed before
1.3.

-Yonik

On Thu, Aug 14, 2008 at 12:35 PM, Matthew Runo [EMAIL PROTECTED]  
wrote:

Hello folks!

Having a heck of a time trying to get a synonyms file to work  
properly. It
seems that something's wrong with the way it's been set up, but,  
honestly, I

can't see anything wrong with it. Some samples...

This works...
zutanoapparel => zutano

But this does not...
aadias, aadidas, aaidas, adadas, adaddas, adaddis, adadias, adadis,  
adaidas,
adaies, addedas, addedis, addidaas, addidads, addidais, addidas,  
addidascom,
addiddas, addides, addidis, adeadas, adedas, adeddas, adedias,  
adiada,
adiadas, adiadis, adiads, adida, adidaas, adidas1, adidass, adidaz,  
adidda,

adiddas, adiddias, adidias, adidis, adiidas, aditas, adudas, afidas,
aididas, wwwadidascom => adidas

This works...
liumiani, loomiani, lumaini, lumanai, lumani, lumiami, lumian,  
lumiana,

lumianai, lumiari, luminani, lumini, luminiani => lumiani

But this does not...
clegerie, cleregie, clergerie, clergie, robertclaregie, robert  
claregie,

robertclargeries, robert clargeries, robertclegerie, robert clegerie,
robertcleregie, robert cleregie, robertclergeic, robert clergeic,
robertclergerie, robertclergi, robert clergi, robertclergie, robert  
clergie,

robertclergoe, robert clergoe, robertclerige, robert clerige,
robertclerterie, robert clerterie => Robert Clergerie

This is how they're set up in my schema..
filter class=solr.SynonymFilterFactory synonyms=synonyms.txt
ignoreCase=true expand=true/

Is there a limit to the number of terms in the list of options? It  
seems
that the ones that are shorter work, while the longer lists don't.  
I'm at a

loss as to why though..

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833








Bug in admin center JSP?

2008-08-12 Thread Matthew Runo

Hello!

I've noticed that the admin center of SVN head seems to report two  
open searchers recently, though they appear to be the same searcher..


Example:

name:[EMAIL PROTECTED] main
class:  org.apache.solr.search.SolrIndexSearcher
version:1.0
description:index searcher
stats:  searcherName : [EMAIL PROTECTED] main
caching : true
numDocs : 157474
maxDoc : 467325
readerImpl : MultiSegmentReader
readerDir : org.apache.lucene.store.FSDirectory@/opt/solr/data/index
indexVersion : 1205944089163
openedAt : Tue Aug 12 06:48:41 PDT 2008
registeredAt : Tue Aug 12 06:48:42 PDT 2008
warmupTime : 1190

name:   searcher
class:  org.apache.solr.search.SolrIndexSearcher
version:1.0
description:index searcher
stats:  searcherName : [EMAIL PROTECTED] main
caching : true
numDocs : 157474
maxDoc : 467325
readerImpl : MultiSegmentReader
readerDir : org.apache.lucene.store.FSDirectory@/opt/solr/data/index
indexVersion : 1205944089163
openedAt : Tue Aug 12 06:48:41 PDT 2008
registeredAt : Tue Aug 12 06:48:42 PDT 2008
warmupTime : 1190


Note that the stats: 	searcherName : [EMAIL PROTECTED] main line is  
the same for both - leading me to think that this is just a display  
issue. Is anyone else seeing this?


--Matthew


Re: Bug in admin center JSP?

2008-08-12 Thread Matthew Runo
Ah, that makes sense. I just wanted to point it out in case it wasn't  
intentional since it wasn't apparent from the front end as to why they  
were listed twice.


Thanks for taking a moment to reply =)

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833

On Aug 12, 2008, at 8:06 AM, Shalin Shekhar Mangar wrote:

They are both the same searcher. The reason for displaying them twice is to show the current searcher separately (named "searcher") alongside any other searchers that are still open for any reason. The name attribute was added specifically so that one can verify that both are indeed the same.

On Tue, Aug 12, 2008 at 8:28 PM, Matthew Runo [EMAIL PROTECTED]  
wrote:



Hello!

I've noticed that the admin center of SVN head seems to report two  
open

searches recently, though they appear to be the same searcher..

Example:

name:[EMAIL PROTECTED] main
class:  org.apache.solr.search.SolrIndexSearcher
version:1.0
description:index searcher
stats:  searcherName : [EMAIL PROTECTED] main
caching : true
numDocs : 157474
maxDoc : 467325
readerImpl : MultiSegmentReader
readerDir : org.apache.lucene.store.FSDirectory@/opt/solr/data/index
indexVersion : 1205944089163
openedAt : Tue Aug 12 06:48:41 PDT 2008
registeredAt : Tue Aug 12 06:48:42 PDT 2008
warmupTime : 1190

name:   searcher
class:  org.apache.solr.search.SolrIndexSearcher
version:1.0
description:index searcher
stats:  searcherName : [EMAIL PROTECTED] main
caching : true
numDocs : 157474
maxDoc : 467325
readerImpl : MultiSegmentReader
readerDir : org.apache.lucene.store.FSDirectory@/opt/solr/data/index
indexVersion : 1205944089163
openedAt : Tue Aug 12 06:48:41 PDT 2008
registeredAt : Tue Aug 12 06:48:42 PDT 2008
warmupTime : 1190


Note that the stats:   searcherName : [EMAIL PROTECTED] main line  
is the
same for both - leading me to think that this is just a display  
issue. Is

anyone else seeing this?

--Matthew





--
Regards,
Shalin Shekhar Mangar.




Re: Did you mean functionality

2008-06-19 Thread Matthew Runo

Is there any work being done on getting this into SolrJ at the moment?

Thanks!

Matthew Runo
Software Developer
Zappos.com
702.943.7833

On Jun 18, 2008, at 3:09 AM, Lucas F. A. Teixeira wrote:


Yeah, I read it.
Thanks a lot, I'm waiting for it!

[]s,

Lucas

Lucas Frare A. Teixeira
[EMAIL PROTECTED] mailto:[EMAIL PROTECTED]
Tel: +55 11 3660.1622 - R3018



Grant Ingersoll escreveu:

Also see http://wiki.apache.org/solr/SpellCheckComponent

I expect to commit fairly soon.

On Jun 17, 2008, at 5:46 PM, Otis Gospodnetic wrote:


Hi Lucas,

Have a look at (the patch in) SOLR-572, lots of work happening  
there as we speak.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 

From: Lucas F. A. Teixeira [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Tuesday, June 17, 2008 4:30:12 PM
Subject: Did you mean functionality

Hello everybody,

I need to integrate the Lucene SpellChecker contrib lib in my application, but I'm using the EmbeddedSolrServer to access all indexes. I want to know what I should do during indexing (if someone has a step-by-step guide, link, tutorial, or smoke signal), and of course how to search through the words generated by this API.

I can use the lib itself to search the suggestions, without using Solr, but I'm confused about how to proceed when indexing the docs.

Thanks a lot,

[]s,

--
Lucas Frare A. Teixeira
[EMAIL PROTECTED]
Tel: +55 11 3660.1622 - R3018











Re: Did you mean functionality

2008-06-19 Thread Matthew Runo

Hmmm, good point. I had completely forgotten about that route.

Thanks!

Matthew Runo
Software Developer
Zappos.com
702.943.7833

On Jun 19, 2008, at 1:31 PM, Yonik Seeley wrote:

On Thu, Jun 19, 2008 at 2:07 PM, Matthew Runo [EMAIL PROTECTED]  
wrote:
Is there any work being done on getting this into SolrJ at the  
moment?


Just a note to those who may be new to SolrJ: you can still access new
or custom functionality in a generic way via getResponse() w/o
explicit SolrJ support.
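
For example, something along these lines pulls spellcheck suggestions out of the raw response (the handler name and the response key are placeholders; they depend on how your handler is configured):

  SolrQuery q = new SolrQuery("runing");
  q.setQueryType("spellchecker");               // placeholder handler name
  QueryResponse rsp = solr.query(q);
  NamedList<Object> raw = rsp.getResponse();
  Object suggestions = raw.get("suggestions");  // key depends on the handler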

-Yonik





Re: Analytics e.g. Top 10 searches

2008-06-06 Thread Matthew Runo
I'm nearly certain that everyone who maintains these stats does it  
themselves in their 'front end'. It's very easy to log terms and  
whatever else just before or after sending the query off to Solr.
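
Something along these lines is all it takes (the logger name and the variables are illustrative, not from any particular codebase):

  import org.apache.log4j.Logger;

  private static final Logger SEARCH_TERMS = Logger.getLogger("search.terms");

  // Log the raw term plus the hit count; aggregate the top N offline.
  QueryResponse rsp = solrServer.query(new SolrQuery(userQuery));
  SEARCH_TERMS.info(userQuery + "\t" + rsp.getResults().getNumFound());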


Thanks!

Matthew Runo
Software Developer
Zappos.com
702.943.7833

On Jun 6, 2008, at 3:51 AM, McBride, John wrote:



Hello,

Is anybody familiar with any SOLR-based analytical tools which would
allow us to extract top ten seaches, for example.

I imagine at the query parse level, where the query is tokenized and
filtered would be the best place to log this, due to the many
permutations possible at the user input level.

Is there an existing plugin to do this, or could you suggest how to
architect this?

Thanks,
John





Re: Announcement of Solr Javascript Client

2008-05-29 Thread Matthew Runo
Wow. This is really pretty cool. You're much further along than I  
thought you were! I'd love to see this included as an 'official' Solr client.


Thanks!

Matthew Runo
Software Developer
Zappos.com
702.943.7833

On May 29, 2008, at 8:15 AM, Matthias Epheser wrote:

The server was rebooted yesterday without my knowledge; Jetty has been restarted and should be reachable at http://lovo.test.dev.indoqa.com/mepheser/moobrowser/


As you can see, this first demo uses widget classes and is built  
with mootools.




Re: [SPAM] [poll] Change logging to SLF4J?

2008-05-19 Thread Matthew Runo
I just read through the dev list's thread.. and I'm voting for SLF4J  
as well.


Thanks!

Matthew Runo
Software Developer
Zappos.com
702.943.7833

On May 6, 2008, at 7:40 AM, Ryan McKinley wrote:

Hello-

There has been a long running thread on solr-dev proposing switching
the logging system to use something other then JDK logging.
http://www.nabble.com/Solr-Logging-td16836646.html
http://www.nabble.com/logging-through-log4j-td13747253.html

We are considering using http://www.slf4j.org/.  Check:
https://issues.apache.org/jira/browse/SOLR-560

The pro argument is that:
* SLFJ allows more flexibility for people using solr outside the
canned .war to configure logging without touching JDK logging.

The con argument goes something like:
* JDK logging is already is the standard logging framework.
* JDK logging is already in in use.
* SLF4J adds another dependency (for something that already works)

On the dev lists there are a strong opinions on either side, but we
would like to get a larger sampling of option and validation before
making this change.

[  ] Keep solr logging as it is.  (JDK Logging)
[  ] Use SLF4J.

As an bonus question (this time fill in the blank):
I have tried SOLR-560 with my logging system and  
___.


thanks
ryan





HTTP Version Not Supported errors?

2008-05-19 Thread Matthew Runo

Hello folks!

We're starting to see a lot of errors in Solr/SolrJ with the message  
"HTTP Version Not Supported". I can't reproduce it, and it only seems  
to happen under load: if no one is browsing our site, we don't get the  
errors when we try browsing around ourselves. I looked through the  
SolrJ code where it connects to the Solr server, but all seems  
well... any ideas?


Thanks!

Matthew Runo
Software Developer
Zappos.com
702.943.7833

Begin forwarded message:

From: [EMAIL PROTECTED]
Date: May 19, 2008 4:05:02 PM PDT
To: [EMAIL PROTECTED]
Subject: [Log4j] [SMTPAppender] web43 error message

2008-05-19 16:05:02,982 [ERROR] - [EMAIL PROTECTED] -  -  
service.SimpleSiteService (getBrandById:197) - Could not retrieve  
brand for ID[313]
org.apache.solr.client.solrj.SolrServerException: Invalid SOLR Query  
object

at com.zappos.domain.dao.SearchDAO.getSearch(SearchDAO.java:68)
at com.zappos.domain.dao.BrandDAO.getBrandById(BrandDAO.java:92)
	at  
com 
.zappos 
.domain 
.service.SimpleSiteService.getBrandById(SimpleSiteService.java:191)

at com.zappos.zeta.action.ViewBrand.brand(ViewBrand.java:496)
at com.zappos.zeta.action.ViewBrand.view(ViewBrand.java:286)
at sun.reflect.GeneratedMethodAccessor590.invoke(Unknown Source)
	at  
sun 
.reflect 
.DelegatingMethodAccessorImpl 
.invoke(DelegatingMethodAccessorImpl.java:25)

at java.lang.reflect.Method.invoke(Method.java:597)
	at net.sourceforge.stripes.controller.DispatcherHelper 
$6.intercept(DispatcherHelper.java:458)
	at  
net 
.sourceforge 
.stripes.controller.ExecutionContext.proceed(ExecutionContext.java: 
157)
	at  
net 
.sourceforge 
.stripes 
.controller 
.BeforeAfterMethodInterceptor 
.intercept(BeforeAfterMethodInterceptor.java:107)
	at  
net 
.sourceforge 
.stripes.controller.ExecutionContext.proceed(ExecutionContext.java: 
154)
	at  
net 
.sourceforge 
.stripes.controller.ExecutionContext.wrap(ExecutionContext.java:73)
	at  
net 
.sourceforge 
.stripes 
.controller 
.DispatcherHelper.invokeEventHandler(DispatcherHelper.java:456)
	at  
net 
.sourceforge 
.stripes 
.controller 
.DispatcherServlet.invokeEventHandler(DispatcherServlet.java:241)
	at  
net 
.sourceforge 
.stripes.controller.DispatcherServlet.doPost(DispatcherServlet.java: 
154)
	at  
net 
.sourceforge 
.stripes.controller.DispatcherServlet.doGet(DispatcherServlet.java:61)

at javax.servlet.http.HttpServlet.service(Unknown Source)
at javax.servlet.http.HttpServlet.service(Unknown Source)
	at  
org 
.apache 
.catalina.core.ApplicationFilterChain.internalDoFilter(Unknown Source)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(Unknown  
Source)
	at  
net 
.sourceforge 
.stripes.controller.StripesFilter.doFilter(StripesFilter.java:180)
	at  
org 
.apache 
.catalina.core.ApplicationFilterChain.internalDoFilter(Unknown Source)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(Unknown  
Source)
	at org.apache.catalina.core.ApplicationDispatcher.invoke(Unknown  
Source)
	at  
org 
.apache.catalina.core.ApplicationDispatcher.processRequest(Unknown  
Source)
	at org.apache.catalina.core.ApplicationDispatcher.doForward(Unknown  
Source)
	at org.apache.catalina.core.ApplicationDispatcher.forward(Unknown  
Source)
	at  
org 
.tuckey 
.web 
.filters 
.urlrewrite.NormalRewrittenUrl.doRewrite(NormalRewrittenUrl.java:195)
	at  
org 
.tuckey 
.web.filters.urlrewrite.RuleChain.handleRewrite(RuleChain.java:159)
	at  
org.tuckey.web.filters.urlrewrite.RuleChain.doRules(RuleChain.java: 
141)
	at  
org 
.tuckey 
.web.filters.urlrewrite.UrlRewriter.processRequest(UrlRewriter.java: 
90)
	at  
org 
.tuckey 
.web 
.filters.urlrewrite.UrlRewriteFilter.doFilter(UrlRewriteFilter.java: 
417)
	at  
org 
.apache 
.catalina.core.ApplicationFilterChain.internalDoFilter(Unknown Source)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(Unknown  
Source)
	at  
com 
.zappos 
.zeta.plumbing.SSLRedirectFilter.doFilter(SSLRedirectFilter.java:57)
	at  
org 
.apache 
.catalina.core.ApplicationFilterChain.internalDoFilter(Unknown Source)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(Unknown  
Source)
	at org.apache.catalina.core.StandardWrapperValve.invoke(Unknown  
Source)
	at org.apache.catalina.core.StandardContextValve.invoke(Unknown  
Source)
	at  
org.apache.catalina.authenticator.AuthenticatorBase.invoke(Unknown  
Source)

at org.apache.catalina.core.StandardHostValve.invoke(Unknown Source)
at org.apache.catalina.valves.ErrorReportValve.invoke(Unknown Source)
	at  
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java: 
563)
	at org.apache.catalina.core.StandardEngineValve.invoke(Unknown  
Source)

at org.apache.catalina.ha.tcp.ReplicationValve.invoke(Unknown Source)
	at  
org.apache.catalina.ha.session.JvmRouteBinderValve.invoke(Unknown  
Source)
	at org.apache.catalina.connector.CoyoteAdapter.service(Unknown  
Source)
	at org.apache.coyote.http11

Re: Release date of SOLR 1.3

2008-05-14 Thread Matthew Runo
There isn't a specific date so far, but I'd like to say that only once  
in the year or so I've been working with the SVN head build of Solr  
have I noticed a bug get committed. And it was fixed very quickly once  
it was found.. I think if you need to have development features you're  
probably safe to use the SVN head, but remember that it is dev, and  
you should *always* test new builds before actually using them =p


Thanks!

Matthew Runo
Software Developer
Zappos.com
702.943.7833

On May 14, 2008, at 9:08 AM, Umar Shah wrote:

Hi,

I'm using the latest trunk code from Solr.
I am basically using function queries (sum, product, scale) for my project, which are not present in 1.2.
I wanted to know if there is a decided date for the release of Solr 1.3.
If the date is far off or not decided, what would be the best practice for adopting the above-mentioned features without compromising the stability of the system?
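
For context, the kind of query I'm running looks roughly like this against the standard handler (the field names are placeholders):

  q=_val_:"scale(sum(popularity,product(hits,2)),0,100)"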

thanks
-umar




Re: Multiple open SegmentReaders?

2008-05-02 Thread Matthew Runo
Hah, thank you for doing this. Sometimes I see MultiSegmentReaders,  
sometimes SegmentReaders, so both show up from time to time. Right now  
we've got two MultiSegmentReaders open..


Thanks!

Matthew Runo
Software Developer
Zappos.com
702.943.7833

On May 1, 2008, at 7:19 PM, Koji Sekiguchi wrote:

I can reproduce with solr/example setup.
What I did:

1. $ svn co http://svn.apache.org/repos/asf/lucene/solr/trunk TEMP
2. $ cd TEMP
3. $ ant clean example
4. $ cd example
5. $ java -jar start.jar

(to post commit)
6. $ cd $SOLR_HOME/example/exampledocs
7. $ ./post.sh

then look at the admin statistics page. I can see MultiSegmentReader instead of
SegmentReader, though.

name: [EMAIL PROTECTED] main
class: org.apache.solr.search.SolrIndexSearcher
version: 1.0
description: index searcher
stats: caching : true
numDocs : 0
maxDoc : 0
readerImpl : MultiSegmentReader
readerDir : [EMAIL PROTECTED]:\Project\jakarta\lucene\solr\TEMP\example\solr\data\index
indexVersion : 1209693930226
openedAt : Fri May 02 11:05:30 JST 2008
registeredAt : Fri May 02 11:05:30 JST 2008

name: [EMAIL PROTECTED] main
class: org.apache.solr.search.SolrIndexSearcher
version: 1.0
description: index searcher
stats: caching : true
numDocs : 0
maxDoc : 0
readerImpl : MultiSegmentReader
readerDir : [EMAIL PROTECTED]:\Project\jakarta\lucene\solr\TEMP\example\solr\data\index
indexVersion : 1209693930226
openedAt : Fri May 02 11:06:13 JST 2008
registeredAt : Fri May 02 11:06:13 JST 2008

Koji


Yonik Seeley wrote:

Hmmm, if there is a bug, odds are it's due to multicore stuff  -
probably nothing else has touched core stuff like that recently.
Can you reproduce (or rather help others to reproduce) with the
solr/example setup?

-Yonik

On Wed, Apr 30, 2008 at 5:39 PM, Matthew Runo [EMAIL PROTECTED]  
wrote:



Hello!

In using the SVN head version of Solr, I've found that recently we  
started

getting multiple open SegmentReaders, all registered... etc..

Any ideas why this would happen? They don't go away unless the  
server is
restarted, and don't go away with commits, etc. In fact, commits  
seem to

cause the issue. They're causing issues since it causes really stale
searchers to be around...

For example, right now...
org.apache.solr.search.SolrIndexSearcher
caching : true
numDocs : 153312
maxDoc : 153324
readerImpl : SegmentReader
readerDir : org.apache.lucene.store.FSDirectory@/opt/solr/data/index
indexVersion : 1205944085143
openedAt : Wed Apr 30 14:04:15 PDT 2008
registeredAt : Wed Apr 30 14:04:15 PDT 2008

(and right below that one...)
org.apache.solr.search.SolrIndexSearcher
caching : true
numDocs : 153312
maxDoc : 153324
readerImpl : SegmentReader
readerDir : org.apache.lucene.store.FSDirectory@/opt/solr/data/index
indexVersion : 1205944085143
openedAt : Wed Apr 30 14:30:02 PDT 2008
registeredAt : Wed Apr 30 14:30:02 PDT 2008

Thanks!

Matthew Runo
Software Developer
Zappos.com
702.943.7833












Re: Zappos's new Solr Site

2008-05-02 Thread Matthew Runo
We have a dedicated server set up as the master, with its own local  
index. We have a read-only NFS mount on each of the other machines,  
to which the master copies its index every 20 minutes. We then run a  
commit on each slave to force them to open new readers. So far,  
it's worked fine. I would suggest keeping the reading and the writing  
on different indexes, though; it makes things easier when you can have  
a read-only NFS-mounted index (no chance of another server updating it  
at all).
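
The commit step itself is tiny; something like this SolrJ sketch (the slave URLs are placeholders for ours, and exception handling is omitted):

  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

  String[] slaves = { "http://slave1:8080/solr", "http://slave2:8080/solr" };
  for (String url : slaves) {
      // An empty commit makes the slave close its old reader and open
      // a new searcher against the freshly copied index.
      new CommonsHttpSolrServer(url).commit();
  }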


Thanks!

Matthew Runo
Software Developer
Zappos.com
702.943.7833

On May 2, 2008, at 6:41 AM, Alok Dhir wrote:

Hey Matt - congratulations on your new site -- it looks great.

I'm curious, after a few weeks of having run this way, what your  
findings are regarding running the shared index on NFS.  Any  
problems as of yet?


I assume you're indexing from one machine and calling 'commit' on  
the others on some schedule to get them to 'see' changes.


How is that working out for you?

---
Alok K. Dhir
[EMAIL PROTECTED]
Symplicity Corporation
1 703 351 0200 x 8080
www.symplicity.com

On Apr 11, 2008, at 1:35 PM, Matthew Runo wrote:


Hello folks!

First, the link: https://zeta.zappos.com (it's a very early open  
beta... we're just very proud of everyone's work and wanted to  
share it with you all)


We've been working on a new site here at Zappos for about the last  
7 months, with planning going back almost two years. We looked at  
Endeca, we looked at Fast, we looked at so many commercial  
search engine technologies in that time that I can't even remember  
them all. We ended up choosing Solr, and not just because it's  
free. Solr has a truly wonderful group of users here who respond to  
support questions far faster than most paid support contracts. I've  
never had a question that I couldn't get answered on this list, no  
matter how stupid it's been (sorry Hoss!) =p


Zappos has a long history of using open source technologies to  
drive its business, and has used Apache 1.3 + Perl 5 for the  
past 8 years. Our new site is written in Java, and is really built  
around our Solr index. Solr powers all the navigation and facets,  
as well as the brand list and brand pages. One of the issues with  
our old site was how database heavy it was, with some pages  
generating 100s of queries. Zeta is much better in this regard, and  
we really think Solr is going to serve us very well.


Here's some stats on our Solr index...  158,821 documents in about  
2 gigs of disk space, running in Tomcat 6 with 10 gigs of ram set  
aside. We have 5 servers clustered together, and each runs an  
instance of zeta.zappos.com and a local copy of solr. For now, each  
of these servers reads from a single Solr index stored on NFS -  
we'll see how this works out, and are prepared to store a local  
copy of the index on each server.


Thanks, and we'd love any feedback on the new site (keep in mind,  
some parts of it aren't quite done).


Matthew Runo
Software Developer
Zappos.com
702.943.7833







Multiple open SegmentReaders?

2008-04-30 Thread Matthew Runo

Hello!

In using the SVN head version of Solr, I've found that recently we  
started getting multiple open SegmentReaders, all registered... etc..


Any ideas why this would happen? They don't go away unless the server  
is restarted, and don't go away with commits, etc. In fact, commits  
seem to cause the issue. They're causing issues since it causes really  
stale searchers to be around...


For example, right now...
org.apache.solr.search.SolrIndexSearcher
caching : true
numDocs : 153312
maxDoc : 153324
readerImpl : SegmentReader
readerDir : org.apache.lucene.store.FSDirectory@/opt/solr/data/index
indexVersion : 1205944085143
openedAt : Wed Apr 30 14:04:15 PDT 2008
registeredAt : Wed Apr 30 14:04:15 PDT 2008

(and right below that one...)
org.apache.solr.search.SolrIndexSearcher
caching : true
numDocs : 153312
maxDoc : 153324
readerImpl : SegmentReader
readerDir : org.apache.lucene.store.FSDirectory@/opt/solr/data/index
indexVersion : 1205944085143
openedAt : Wed Apr 30 14:30:02 PDT 2008
registeredAt : Wed Apr 30 14:30:02 PDT 2008

Thanks!

Matthew Runo
Software Developer
Zappos.com
702.943.7833



Zappos's new Solr Site

2008-04-11 Thread Matthew Runo

Hello folks!

First, the link: https://zeta.zappos.com (it's a very early open  
beta... we're just very proud of everyone's work and wanted to share  
it with you all)


We've been working on a new site here at Zappos for about the last 7  
months, with planning going back almost two years. We looked at  
Endeca, we looked at Fast, we looked at so many commercial search  
engine technologies in that time that I can't even remember them all.  
We ended up choosing Solr, and not just because it's free. Solr has a  
truly wonderful group of users here who respond to support questions  
far faster than most paid support contracts. I've never had a question  
that I couldn't get answered on this list, no matter how stupid it's  
been (sorry Hoss!) =p


Zappos has a long history of using open source technologies to drive  
its business, and has used Apache 1.3 + Perl 5 for the past 8  
years. Our new site is written in Java, and is really built around our  
Solr index. Solr powers all the navigation and facets, as well as the  
brand list and brand pages. One of the issues with our old site was  
how database heavy it was, with some pages generating 100s of queries.  
Zeta is much better in this regard, and we really think Solr is going  
to serve us very well.


Here's some stats on our Solr index...  158,821 documents in about 2  
gigs of disk space, running in Tomcat 6 with 10 gigs of ram set aside.  
We have 5 servers clustered together, and each runs an instance of  
zeta.zappos.com and a local copy of solr. For now, each of these  
servers reads from a single Solr index stored on NFS - we'll see how  
this works out, and are prepared to store a local copy of the index on  
each server.


Thanks, and we'd love any feedback on the new site (keep in mind, some  
parts of it aren't quite done).


Matthew Runo
Software Developer
Zappos.com
702.943.7833



Re: Wildcard search + case insensitive

2008-04-02 Thread Matthew Runo
Hmm. I'd like the ability to turn on or off in the config case  
sensitivity... I'm looking forward to this patch.


Thanks!

Matthew Runo
Software Developer
Zappos.com
702.943.7833

On Apr 2, 2008, at 5:48 AM, Tim Mahy wrote:


Hi all,

I already found the answer to my question on the following blog : 
http://michaelkimsal.com/blog/2007/04/solr-case-sensitivty/
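
The short version: wildcard and prefix terms are not run through the analyzer chain, so the LowerCaseFilterFactory never sees "Dem*". The usual workaround is to lowercase the term in the client before sending it, roughly like this (assuming SolrJ; the variable names are mine):

  // The index side was already lowercased by LowerCaseFilterFactory,
  // so lowercasing the wildcard term restores the match.
  String term = userInput.toLowerCase();  // "Dem*" -> "dem*"
  SolrQuery q = new SolrQuery(term);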

greetings,
Tim


-Oorspronkelijk bericht-
Van: Tim Mahy [mailto:[EMAIL PROTECTED]
Verzonden: wo 2-4-2008 13:19
Aan: solr-user@lucene.apache.org
Onderwerp: Wildcard search + case insensitive

Hi all,

I use this type definition in my schema.xml :

    <fieldtype name="exactText" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldtype>

When I have a document with the term "demo" in it and I search for  
"dem*", I receive the document back from Solr, but when I search on  
"Dem*" I don't get the document.


Is the LowerCaseFilterFactory not executed when a wildcard search is  
being performed?


Greetings,
Tim




Info Support - http://www.infosupport.com








Re: Tomcat 6.0 solr home not set (solved)

2008-03-19 Thread Matthew Runo

Go for it!

Matthew Runo

On Mar 19, 2008, at 12:17 PM, David Arpad Geller wrote:


Hallelujah!

So, it's clear to me that neither the Tomcat docs nor the Solr/Tomcat  
wiki page is completely clear on this topic.  Specifically, the  
parts about:
a) the way to specify webapps using Catalina/localhost/webapp.xml  
(and how it relates to solr)
b) the need for a solr home directory and what that is for / what  
that means

c) a general desire not to run a nightly build version

Hopefully this thread will serve others but perhaps the wiki could  
be updated?  I'd be happy to provide changes to the page and provide  
it or make the update myself if allowed if you all agree.


David

Jayson Minard wrote:

I'll take the Tomcat question first:

--- snip ---

Also, the Tomcat page for Context says:
-
You may define as many *Context* elements as you wish. Each such  
Context
MUST have a unique context path. In addition, a Context must be  
present
with a context path equal to a zero-length string. This Context  
becomes

the /default/ web application for this virtual host, and is used to
process all requests that do not match any other Context's context  
path.

--
Which also isn't clear to me.  ...context path equal to a zero- 
length
string?  I guess I'm misunderstanding what context path is.  It  
seems
to me that this describes localhost/solr.xml.  Am I missing  
something

here?

--- end snip ---

It is just saying there must be at least one blank context which is
the root / URL for the Tomcat server.  It is already defined and  
you

can ignore this unless you start deleting other contexts defined
elsewhere.  So pretend you did not read that at all and you'll be
dandy!

The NoClassDefFoundError is an odd one.  You are running JDK 1.6 (not
JDK 1.5) and Tomcat 6 so base classes should be present, and the WAR
contains everything else.

Did you modify the solrconfig.xml file, possibly change any class
names in there that are referenced?  Or in your schema point to a
class that does not exist?  Something there might cause a failure
during that part of the loading.

Or you are not pointing to the right solr home.  In fact, your Solr
home looks wrong and is the likely culprit.  It should point to your
own directory that you created that contains a copy of the conf
directory from the example deployment.

mkdir solr-data
cd solr-data
mkdir conf
cd conf
cp -R /usr/local/apache-solr-1.2.0/example/solr/conf/* .

then set the /solr/home to this new solr-data directory (which now
contains the conf directory)

--j

--j

On Wed, Mar 19, 2008 at 11:37 AM, David Arpad Geller
[EMAIL PROTECTED] wrote:


So it seems that I got Tomcat to recognize where solr is with this
conf/Catalina/localhost/solr.xml:

<Context docBase="/usr/local/apache-solr-1.2.0/dist/apache-solr-1.2.0.war"
         debug="0"
         crossContext="true">
   <Environment name="solr/home"
                value="/usr/local/apache-solr-1.2.0"
                type="java.lang.String"
                override="true" />
</Context>

But there's still some problem (see below).  Thank you for all of  
the help; it's good stuff to know.

Also, the Tomcat page for Context says:
-
You may define as many *Context* elements as you wish. Each such  
Context
MUST have a unique context path. In addition, a Context must be  
present
with a context path equal to a zero-length string. This Context  
becomes

the /default/ web application for this virtual host, and is used to
process all requests that do not match any other Context's context  
path.

--
Which also isn't clear to me.  ...context path equal to a zero- 
length
string?  I guess I'm misunderstanding what context path is.  It  
seems
to me that this describes localhost/solr.xml.  Am I missing  
something

here?


---

INFO: HTMLManager: start: Starting web application at '/solr'
Mar 19, 2008 2:07:27 PM org.apache.solr.servlet.SolrDispatchFilter  
init

INFO: SolrDispatchFilter.init()
Mar 19, 2008 2:07:27 PM org.apache.solr.core.Config getInstanceDir
INFO: Using JNDI solr.home: /usr/local/apache-solr-1.2.0
Mar 19, 2008 2:07:27 PM org.apache.solr.core.Config setInstanceDir
INFO: Solr home set to '/usr/local/apache-solr-1.2.0/'
Mar 19, 2008 2:07:27 PM org.apache.solr.core.Config getClassLoader
INFO: Adding
'file:/usr/local/apache-solr-1.2.0/lib/commons-csv-0.1- 
SNAPSHOT.jar' to

Solr classloader
Mar 19, 2008 2:07:27 PM org.apache.solr.core.Config getClassLoader
INFO: Adding
'file:/usr/local/apache-solr-1.2.0/lib/lucene- 
highlighter-2007-05-20_00-04-53.jar'

to Solr classloader
Mar 19, 2008 2:07:27 PM org.apache.solr.core.Config getClassLoader
INFO: Adding
'file:/usr/local/apache-solr-1.2.0/lib/lucene- 
analyzers-2007-05-20_00-04-53.jar'

to Solr classloader
Mar 19, 2008 2:07:27 PM org.apache.solr.core.Config getClassLoader
INFO: Adding 'file:/usr/local/apache-solr-1.2.0/lib/easymock.jar' to
Solr classloader
Mar 19, 2008 2:07:27 PM org.apache.solr.core.Config getClassLoader
INFO: Adding
'file

Re: Dedup results on the fly?

2008-02-27 Thread Matthew Runo

I was going to ask the same thing, I'd support this in 1.3.

Thanks!

Matthew Runo
Software Developer
Zappos.com
702.943.7833

On Feb 27, 2008, at 12:29 PM, Alok Dhir wrote:


is this going to go into the 1.3 tree at some point?

On Feb 27, 2008, at 3:25 PM, Sean Timm wrote:

Take a look at https://issues.apache.org/jira/browse/SOLR-236 Field  
Collapsing.


-Sean

Head wrote:
I would like to be able to tell SOLR to dedup the results based on  
a certain
set of fields.   For example, I like to return only one instance  
of the set
of documents that have the same 'name' and 'address'.   But I  
would still
like to keep all instances around in case someone wants to  
retrieve one of

the duplicate instances by ID.

Is there some way to do something like this... maybe with a custom
Comparator???   Has anyone attempted to do this?









Re: Shared index base

2008-02-26 Thread Matthew Runo
We're about to do the same thing here, but have not tried yet. We  
currently run Solr with replication across several servers. So long as  
only one server is doing updates to the index, I think it should work  
fine.



Thanks!

Matthew Runo
Software Developer
Zappos.com
702.943.7833

On Feb 26, 2008, at 7:51 AM, Evgeniy Strokin wrote:

I know there have been discussions about this subject, but I want to  
ask again in case somebody can share more information.
We are planning to have several separate servers for our search  
engine. One of them will be an index/search server, and all the others  
will be search-only.
We want to use a SAN (BTW: should we consider something else?) and  
give all servers access to it. So all servers will use the same  
index base, the same files, without any replication.
Is this a good practice? Has somebody done the same? Any problems  
noticed? Any suggestions, even about different configurations, are  
highly appreciated.


Thanks,
Gene




Re: Shared index base

2008-02-26 Thread Matthew Runo
I hope so. I've found that every once in a while Solr 1.2 replication  
will die from a leftover temp-index file that seems to gum it up.  
Removing that file on all the servers fixes the issue, though.


We'd like to be able to point all the servers at an NFS location for  
their index files, and use a single server to update it.


Thanks!

Matthew Runo
Software Developer
Zappos.com
702.943.7833

On Feb 26, 2008, at 9:39 AM, Alok Dhir wrote:

Are you saying all the servers will use the same 'data' dir?  Is  
that a supported config?


On Feb 26, 2008, at 12:29 PM, Matthew Runo wrote:

We're about to do the same thing here, but have not tried yet. We  
currently run Solr with replication across several servers. So long  
as only one server is doing updates to the index, I think it should  
work fine.



Thanks!

Matthew Runo
Software Developer
Zappos.com
702.943.7833

On Feb 26, 2008, at 7:51 AM, Evgeniy Strokin wrote:

I know there was such discussions about the subject, but I want to  
ask again if somebody could share more information.
We are planning to have several separate servers for our search  
engine. One of them will be index/search server, and all others  
are search only.
We want to use SAN (BTW: should we consider something else?) and  
give access to it from all servers. So all servers will use the  
same index base, without any replication, same files.
Is this a good practice? Did somebody do the same? Any problems  
noticed? Or any suggestions, even about different configurations  
are highly appreciated.


Thanks,
Gene








Re: Shared index base

2008-02-26 Thread Matthew Runo
That's true about the commit issue. With that in mind, it might be  
better to use replication - just keep an eye on it to ensure it's  
working, as my 1.2 install (3 servers) tends to stop every once in a  
blue moon.


Thanks!

Matthew Runo
Software Developer
Zappos.com
702.943.7833

On Feb 26, 2008, at 10:53 AM, Walter Underwood wrote:


SAN is not NFS. I would expect SAN to be fast.

wunder

On 2/26/08 10:47 AM, Jae Joo [EMAIL PROTECTED] wrote:



In my environment, there is no big difference between local disk and a  
SAN-based file system: a slight slowdown, but not a problem (1 or 2%).
I have 4 sets of Solr indices, each more than 10G, across 3 servers.
I don't think it is a good idea to share a SINGLE index; disk is  
pretty cheap, and we can add more disk in the SAN pretty easily.
I have another server, called the Master, with a local-disk-based Solr  
index that performs the updates.
Occasionally an update does not complete successfully (an accident or  
a timeout), and I need to fix things manually.
If you have only one index, there is a risk of messing up the index.

Thanks,

Jae


-Original Message-
From: Walter Underwood [mailto:[EMAIL PROTECTED]
Sent: Tue 2/26/2008 1:27 PM
To: solr-user@lucene.apache.org
Subject: Re: Shared index base

I saw a 100X slowdown running with indexes on NFS.

I don't understand going through a lot of effort with unsupported
configurations just to share an index. Local disk is cheap, the
snapshot stuff works well, and local discs avoid a single point
of failure.

The testing time to make a shared index work with each new
release of Solr is almost certainly more expensive than buying
local disc.

The single point of failure is real issue. I've seen two discs
fail on one RAID. When that happens, you've lost all of your
search for hours or days.

Finally, how do you tell Solr that the index has changed and
it needs a new Searcher? Normally, that is a commit, but you
don't want to commit from a read-only Solr.

wunder

On 2/26/08 10:17 AM, Matthew Runo [EMAIL PROTECTED] wrote:

I hope so. I've found that every once in a while Solr 1.2  
replication
will die, from a temp-index file that seems to ham it up.  
Removing

that file on all the servers fixes the issue though.

We'd like to be able to point all the servers at an NFS location for
their index files, and use a single server to update it.

Thanks!

Matthew Runo
Software Developer
Zappos.com
702.943.7833

On Feb 26, 2008, at 9:39 AM, Alok Dhir wrote:


Are you saying all the servers will use the same 'data' dir?  Is
that a supported config?

On Feb 26, 2008, at 12:29 PM, Matthew Runo wrote:


We're about to do the same thing here, but have not tried yet. We
currently run Solr with replication across several servers. So  
long
as only one server is doing updates to the index, I think it  
should

work fine.


Thanks!

Matthew Runo
Software Developer
Zappos.com
702.943.7833

On Feb 26, 2008, at 7:51 AM, Evgeniy Strokin wrote:

I know there was such discussions about the subject, but I want  
to

ask again if somebody could share more information.
We are planning to have several separate servers for our search
engine. One of them will be index/search server, and all others
are search only.
We want to use SAN (BTW: should we consider something else?) and
give access to it from all servers. So all servers will use the
same index base, without any replication, same files.
Is this a good practice? Did somebody do the same? Any problems
noticed? Or any suggestions, even about different configurations
are highly appreciated.

Thanks,
Gene

















-XX:+UseLargePages ?

2008-02-26 Thread Matthew Runo

Hello!

I was wondering if there is any impact on using the LargePages JVM  
setting with Solr. Has anyone used this? Does it help performance?  
Hurt it?


We have several 64-bit servers with 16G of RAM each, and were  
wondering if we should be using the -XX:+UseLargePages setting on the  
JVM for Solr/Tomcat.
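
From what I've read so far (treat this as an assumption rather than experience), the flag is a no-op unless the OS reserves huge pages first; on Linux that would look something like this before starting Tomcat (the page count is an example, sized for a ~10G heap of 2M pages):

  sysctl -w vm.nr_hugepages=5120
  JAVA_OPTS="$JAVA_OPTS -XX:+UseLargePages -Xms10g -Xmx10g"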


Thanks!

Matthew Runo
Software Developer
Zappos.com
702.943.7833



protwords | synonyms | elevator conf files

2008-02-25 Thread Matthew Runo

Hello!

All these configuration files seem like they could be stored in a  
database just as well as they are stored in the file structure.  
Specifically the new elevator handler (which looks to be exactly what  
I needed, thanks!!) would be more useful if it could get its  
configuration from a database.


Has anyone thought about linking these conf files into a database?  
Currently I'm dumping the DB out to the file structure and restarting  
solr to read in the changes - is there a better way? One that doesn't  
clear all the caches, perhaps?
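
The dump itself is the easy part; it amounts to something like this sketch (the DB helper is hypothetical, and exception handling is omitted):

  import java.io.FileWriter;
  import java.io.PrintWriter;

  PrintWriter out = new PrintWriter(new FileWriter("/opt/solr/conf/synonyms.txt"));
  for (String[] row : loadSynonymRows()) {    // hypothetical DB helper
      out.println(row[0] + " => " + row[1]);  // variants => canonical term
  }
  out.close();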


Thanks!

Matthew Runo
Software Developer
Zappos.com
702.943.7833



Get Config / Schema, 1.3-dev Broken?

2008-02-08 Thread Matthew Runo

Hello!

Recently, using the latest SVN code, it seems that the links to view  
the schema & config files have been broken.


URLs such as /solr/admin/file/?file=solrconfig.xml result in a 404  
error. Has anyone else noticed this behavior? I just wanted to point  
it out if so.


Thanks!

Matthew Runo
Software Developer
Zappos.com
702.943.7833



Re: sorlj search

2008-02-06 Thread Matthew Runo
There really isn't any detailed documentation on SolrJ just yet. I was  
able to guess my way through using it based on method names and so  
forth, and you can generate javadoc via ant if you get the source from  
SVN.


Thanks!

Matthew Runo
Software Developer
Zappos.com
702.943.7833

On Feb 5, 2008, at 9:21 PM, Tevfik Kiziloren wrote:



Hi. I'm a newbie. I need to develop a JSF-based search application  
using Solr. I found nothing about the Solr Java implementation except  
the simple example on the Solr wiki. When I tried a console program  
similar to that example, I got the exception below. Where can I find  
more extensive documentation about SolrJ?

Thanks in advance.
Tevfik Kızılören.

try {
    String url = "http://localhost:8080/solr";
    SolrServer server = new CommonsHttpSolrServer(url);
    SolrQuery query = new SolrQuery();
    query.setQuery("solr");
    System.out.println(query.toString());
    QueryResponse rsp = server.query(query);
    System.out.println(rsp.getResults().toString());
} catch (IOException ex) {
    Logger.getLogger(SolrclientView.class.getName()).log(Level.SEVERE, null, ex);
} catch (SolrServerException ex) {
    Logger.getLogger(SolrclientView.class.getName()).log(Level.SEVERE, null, ex);
}


---
solrclient.SolrclientView jButton1ActionPerformed
SEVERE: null
org.apache.solr.client.solrj.SolrServerException: Error executing  
query

   at
org 
.apache 
.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90)
   at  
org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:96)

   at
solrclient 
.SolrclientView.jButton1ActionPerformed(SolrclientView.java:229)

   at solrclient.SolrclientView.access$800(SolrclientView.java:32)
   at
solrclient.SolrclientView$4.actionPerformed(SolrclientView.java:135)
   at
javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java: 
1995)

   at
javax.swing.AbstractButton 
$Handler.actionPerformed(AbstractButton.java:2318)

   at
javax 
.swing 
.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:387)

   at
javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:242)
   at
javax 
.swing 
.plaf 
.basic.BasicButtonListener.mouseReleased(BasicButtonListener.java:236)

   at java.awt.Component.processMouseEvent(Component.java:6038)
   at javax.swing.JComponent.processMouseEvent(JComponent.java: 
3265)

   at java.awt.Component.processEvent(Component.java:5803)
   at java.awt.Container.processEvent(Container.java:2058)
   at java.awt.Component.dispatchEventImpl(Component.java:4410)
   at java.awt.Container.dispatchEventImpl(Container.java:2116)
   at java.awt.Component.dispatchEvent(Component.java:4240)
   at
java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4322)
   at
java.awt.LightweightDispatcher.processMouseEvent(Container.java:3986)
   at  
java.awt.LightweightDispatcher.dispatchEvent(Container.java:3916)

   at java.awt.Container.dispatchEventImpl(Container.java:2102)
   at java.awt.Window.dispatchEventImpl(Window.java:2429)
   at java.awt.Component.dispatchEvent(Component.java:4240)
   at java.awt.EventQueue.dispatchEvent(EventQueue.java:599)
   at
java 
.awt 
.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java: 
273)

   at
java 
.awt 
.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:183)

   at
java 
.awt 
.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java: 
173)

   at
java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:168)
   at
java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:160)
   at java.awt.EventDispatchThread.run(EventDispatchThread.java: 
121)

Caused by: org.apache.solr.common.SolrException: parsing error
   at
org 
.apache 
.solr 
.client 
.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java: 
138)

   at
org 
.apache 
.solr 
.client 
.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java: 
99)

   at
org 
.apache 
.solr 
.client 
.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java: 
317)

   at
org 
.apache 
.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:84)

   ... 29 more
Caused by: java.lang.RuntimeException: this must be known type! not:  
int

   at
org 
.apache 
.solr 
.client 
.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java: 
217)

   at
org 
.apache 
.solr 
.client 
.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java: 
235)

   at
org 
.apache 
.solr 
.client 
.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java: 
123)

--
View this message in context: 
http://www.nabble.com/sorlj-search-tp15305698p15305698.html
Sent from the Solr - User

Re: i think it is time to release new solr version

2008-01-28 Thread Matthew Runo
If you would like to work with Lucene 2.3.0, it's been updated in SVN.  
I've been using the SVN head version for a while, and it's quite  
stable (the only thing is that conf files and APIs change from time to time).


You can get the latest by doing..

svn co http://svn.apache.org/repos/asf/lucene/solr/trunk solr
cd solr
ant dist

Thanks!

Matthew Runo
Software Developer
Zappos.com
702.943.7833

On Jan 28, 2008, at 8:06 AM, Traut wrote:


+1
Looking forward to getting the new release version :)

On Jan 28, 2008 6:01 AM, j. L [EMAIL PROTECTED] wrote:


because Lucene 2.3.0 was released today..



--
regards
j.L





--
Best regards,
Traut




runs vs. running - Query time vs Index Time stemming

2008-01-24 Thread Matthew Runo

Hello folks..

I'm seeing something that makes total sense to me, but the pointy-haired  
bosses don't like it, so I've gotta come up with a solution. We  
search a pretty standard product catalog, and due to stemming a search  
for "running shoes" matches things with "Runs 1/2 a size large" in the  
product description. I've tried tweaking the query/index time  
settings, below, but I still get the stemming. Any ideas on how I can  
make "running" not match "runs" in product descriptions, while still  
keeping the words "run", "runs", and "running" searchable in the  
product descriptions (just not stemming on them)?


Here's my field config...

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
                catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1"
                catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>

I don't think I can use stopwords, because I need to be able to search  
on all of these words; I just don't want "runs" to match when someone  
searches for "running". In most cases the other stemming is fine, and  
if possible I'd like not to turn it off completely. That is, however,  
an option. It seems to be a solvable problem, though; any ideas would  
be greatly appreciated.
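
One option I'm weighing: EnglishPorterFilterFactory leaves any term listed in protwords.txt unstemmed, so protecting the specific words keeps them searchable without the cross-matching (at the cost that "running" would then only match "running"):

  # protwords.txt excerpt: one term per line, left untouched by the stemmer
  run
  runs
  running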


Thanks!

Matthew Runo
Software Developer
Zappos.com
702.943.7833



Re: Cannot start SVN head Solr 1.3

2007-12-28 Thread Matthew Runo
(HostConfig.java: 
714)
	at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java: 
490)

at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1138)
	at  
org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java: 
311)
	at  
org 
.apache 
.catalina 
.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:117)
	at org.apache.catalina.core.ContainerBase.start(ContainerBase.java: 
1053)

at org.apache.catalina.core.StandardHost.start(StandardHost.java:719)
	at org.apache.catalina.core.ContainerBase.start(ContainerBase.java: 
1045)
	at org.apache.catalina.core.StandardEngine.start(StandardEngine.java: 
443)
	at  
org.apache.catalina.core.StandardService.start(StandardService.java:516)
	at org.apache.catalina.core.StandardServer.start(StandardServer.java: 
710)

at org.apache.catalina.startup.Catalina.start(Catalina.java:566)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at  
sun 
.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java: 
39)
	at  
sun 
.reflect 
.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java: 
25)

at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:288)
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:413)

Dec 28, 2007 8:55:29 AM org.apache.solr.servlet.SolrDispatchFilter init
INFO: SolrDispatchFilter.init() done
Dec 28, 2007 8:55:29 AM org.apache.solr.servlet.SolrServlet init
INFO: SolrServlet.init()
Dec 28, 2007 8:55:29 AM org.apache.solr.servlet.SolrServlet init
INFO: SolrServlet.init() done
Dec 28, 2007 8:55:29 AM org.apache.solr.servlet.SolrUpdateServlet init
INFO: SolrUpdateServlet.init() done
Dec 28, 2007 8:55:29 AM org.apache.coyote.http11.Http11Protocol start
INFO: Starting Coyote HTTP/1.1 on http-8080
Dec 28, 2007 8:55:29 AM org.apache.jk.common.ChannelSocket init
INFO: JK: ajp13 listening on /0.0.0.0:8009
Dec 28, 2007 8:55:29 AM org.apache.jk.server.JkMain start
INFO: Jk running ID=0 time=0/35  config=null
Dec 28, 2007 8:55:29 AM org.apache.catalina.startup.Catalina start
INFO: Server startup in 2402 ms


Thanks!

Matthew Runo
Software Developer
702.943.7833

On Dec 27, 2007, at 4:20 PM, Ryan McKinley wrote:

The XppUpdateRequestHandler was removed this afternoon... make sure  
your solrconfig.xml does not include:


  <requestHandler name="/update/xpp" class="solr.XppUpdateRequestHandler" />



holler if you have problems!

ryan


Matthew Runo wrote:

Hello!
I'm having a horrible time getting the current svn head build of  
Solr to run. I even remembered to do an 'ant clean' this time.. but  
no luck.
I have set up solr_home via a JAVA_OPTS flag, and am using Tomcat  
6...

[EMAIL PROTECTED]:/opt/tomcat]$ echo $JAVA_OPTS
-Dsolr.solr.home=/opt/solr
Here is the entire deployment log from Tomcat. I also picked out  
the errors (right here)..
SEVERE: org.apache.solr.common.SolrException: Error loading class  
'solr.XppUpdateRequestHandler'
   at  
org 
.apache 
.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java: 
206) at  
org 
.apache 
.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java: 
211) at  
org 
.apache 
.solr 
.util.plugin.AbstractPluginLoader.create(AbstractPluginLoader.java: 
83) at org.apache.solr.core.RequestHandlers 
$1.create(RequestHandlers.java:152)
   at org.apache.solr.core.RequestHandlers 
$1.create(RequestHandlers.java:137)
   at  
org 
.apache 
.solr 
.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java: 
140) at  
org 
.apache 
.solr 
.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java: 
169) at org.apache.solr.core.SolrCore.init(SolrCore.java:329)
   at  
org 
.apache 
.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java: 
85)... SEVERE: Could not start SOLR. Check solr/home property
org.apache.solr.common.SolrException: Unknown Search Component:  
org.apache.solr.handler.component.QueryComponent
   at  
org.apache.solr.core.SolrCore.getSearchComponent(SolrCore.java:507)
   at  
org.apache.solr.handler.SearchHandler.inform(SearchHandler.java:115)
   at  
org 
.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java: 
243)

   at org.apache.solr.core.SolrCore.init(SolrCore.java:350)
   at  
org 
.apache 
.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:85)
Solr home, /opt/solr, is, for sure, good - unless something has  
changed in the last few SVN commits. I didn't see any changes.

Any help would be most appreciated.
---
INFO: Deploying web application archive solr.war
Dec 27, 2007 4:00:01 PM org.apache.solr.servlet.SolrDispatchFilter  
init

INFO: SolrDispatchFilter.init()
Dec 27, 2007 4:00:01 PM org.apache.solr.core.SolrResourceLoader  
locateInstanceDir

INFO: Using JNDI solr.home: /opt/solr
Dec 27, 2007 4:00:01 PM org.apache.solr.servlet.SolrDispatchFilter  
init

INFO: looking for multicore.xml: /opt/solr/multicore.xml
Dec

Cannot start SVN head Solr 1.3

2007-12-27 Thread Matthew Runo
)
	at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java: 
771)
	at org.apache.catalina.core.StandardHost.addChild(StandardHost.java: 
525)
	at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java: 
825)
	at org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java: 
714)
	at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java: 
490)

at org.apache.catalina.startup.HostConfig.check(HostConfig.java:1206)
	at  
org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java: 
293)
	at  
org 
.apache 
.catalina 
.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:117)
	at  
org 
.apache 
.catalina.core.ContainerBase.backgroundProcess(ContainerBase.java:1337)
	at org.apache.catalina.core.ContainerBase 
$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1601)
	at org.apache.catalina.core.ContainerBase 
$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1610)
	at org.apache.catalina.core.ContainerBase 
$ContainerBackgroundProcessor.run(ContainerBase.java:1590)

at java.lang.Thread.run(Thread.java:619)
Dec 27, 2007 4:00:01 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Unknown Search  
Component: org.apache.solr.handler.component.QueryComponent

at org.apache.solr.core.SolrCore.getSearchComponent(SolrCore.java:507)
at org.apache.solr.handler.SearchHandler.inform(SearchHandler.java:115)
	at  
org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java: 
243)

at org.apache.solr.core.SolrCore.init(SolrCore.java:350)
	at  
org 
.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:85)
	at  
org 
.apache 
.catalina 
.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java: 
275)
	at  
org 
.apache 
.catalina 
.core 
.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
	at  
org 
.apache 
.catalina 
.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:108)
	at  
org 
.apache.catalina.core.StandardContext.filterStart(StandardContext.java: 
3696)
	at  
org.apache.catalina.core.StandardContext.start(StandardContext.java: 
4343)
	at  
org 
.apache 
.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
	at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java: 
771)
	at org.apache.catalina.core.StandardHost.addChild(StandardHost.java: 
525)
	at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java: 
825)
	at org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java: 
714)
	at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java: 
490)

at org.apache.catalina.startup.HostConfig.check(HostConfig.java:1206)
	at  
org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java: 
293)
	at  
org 
.apache 
.catalina 
.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:117)
	at  
org 
.apache 
.catalina.core.ContainerBase.backgroundProcess(ContainerBase.java:1337)
	at org.apache.catalina.core.ContainerBase 
$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1601)
	at org.apache.catalina.core.ContainerBase 
$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1610)
	at org.apache.catalina.core.ContainerBase 
$ContainerBackgroundProcessor.run(ContainerBase.java:1590)

at java.lang.Thread.run(Thread.java:619)

Dec 27, 2007 4:00:01 PM org.apache.solr.servlet.SolrDispatchFilter init
INFO: SolrDispatchFilter.init() done
Dec 27, 2007 4:00:01 PM org.apache.solr.servlet.SolrServlet init
INFO: SolrServlet.init()
Dec 27, 2007 4:00:01 PM org.apache.solr.servlet.SolrServlet init
INFO: SolrServlet.init() done
Dec 27, 2007 4:00:01 PM org.apache.solr.servlet.SolrUpdateServlet init
INFO: SolrUpdateServlet.init() done




Thanks!

Matthew Runo
Software Developer
702.943.7833



Re: SOLR X FAST

2007-12-11 Thread Matthew Runo

I think it all depends, what do you want out of Solr or FAST?

Thanks!

Matthew Runo
Software Developer
702.943.7833

On Dec 11, 2007, at 2:09 PM, William Silva wrote:


Hi,
How is the best way to compare SOLR and FAST Search ?
Thanks,
William.




SolrJ and MoreLikeThis / Spellchecker

2007-12-07 Thread Matthew Runo

Hello!

Please forgive my newbie question about SolrJ, but I was unable to  
find my answer in the SOLRJ source code or the wiki (I'll add it if  
someone helps).


Would anyone be so kind as to provide a quick example of using the  
Spellcheck handler and the MoreLikeThis handler with SOLR-J?


The response format for it is so different, I'm not quite sure that my  
normal way of looping through the result docs would work - there are  
no docs in the XML (spellchecker included, as a sample).


<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">22</int>
  </lst>
  <str name="words">runing</str>
  <str name="exist">false</str>
  <arr name="suggestions">
    <str>Running</str>
  </arr>
</response>

Thanks!

Matthew Runo
Software Developer
702.943.7833



Re: SolrJ and MoreLikeThis / Spellchecker

2007-12-07 Thread Matthew Runo
I'll give it a try. Seems like the Spellcheck response type is pretty  
basic.


Thanks!

Matthew Runo
Software Developer
702.943.7833

On Dec 7, 2007, at 11:23 AM, Ryan McKinley wrote:


Matthew Runo wrote:

Hello!
Please forgive my newbie question about SolrJ, but I was unable to  
find my answer in the SOLRJ source code or the wiki (I'll add it if  
someone helps).
Would anyone be so kind as to provide a quick example of using the  
Spellcheck handler and the MoreLikeThis handler with SOLR-J?


With spellcheck, you will be in new water (I think)... you can get  
the response as a NamedList, but there is not anything that puts  
that into user friendly functions.


SolrQuery q = new SolrQuery( "foo" );
q.setQueryType( "spelling" );
q.set( "anyparam", "value" );
QueryResponse rsp = solr.query( q );

NamedList nl = rsp.getResponse();

you will have to pick stuff out of the NamedList manually.  If you  
want to contribute a SpellCheckRequest/Response that would be great!
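
(A minimal sketch of that manual extraction, assuming the response shape shown earlier in this thread; the handler name and casts are guesses, untested:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.util.NamedList;
import java.util.List;

// query the spellchecker handler for a (possibly misspelled) word;
// "spellchecker" is whatever name the handler is registered under
SolrQuery q = new SolrQuery( "runing" );
q.setQueryType( "spellchecker" );
QueryResponse rsp = solr.query( q );

// pick the fields out of the raw NamedList by name
NamedList nl = rsp.getResponse();
boolean exists = Boolean.parseBoolean( String.valueOf( nl.get( "exist" ) ) );
List suggestions = (List) nl.get( "suggestions" );
)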


For MLT, the standard QueryRequest should work.  in 1.3-dev, both  
standard and dismax support mlt queries.  Perhaps we should add  
getters and setters to SolrQuery so you don't have to call:

q.set( MoreLikeThisParams.MLT, true );


ryan
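
(And a rough end-to-end MLT request via SolrJ along the lines Ryan describes -- an untested sketch; the seed query and field names are placeholders, and checked exceptions are elided:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.MoreLikeThisParams;

SolrQuery q = new SolrQuery( "id:12345" );                          // seed document (placeholder)
q.set( MoreLikeThisParams.MLT, true );                              // mlt=true
q.set( MoreLikeThisParams.SIMILARITY_FIELDS, "name,description" );  // mlt.fl
q.set( MoreLikeThisParams.MIN_TERM_FREQ, 1 );                       // mlt.mintf
q.set( MoreLikeThisParams.MIN_DOC_FREQ, 1 );                        // mlt.mindf
QueryResponse rsp = solr.query( q );
// similar documents come back under the "moreLikeThis" section of the response
)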





Re: Tomcat6 env-entry

2007-12-05 Thread Matthew Runo

Ok, I updated it. I hope it makes sense =\

I'm not really familiar enough with the Context changes to add those.  
If someone else would be so kind as to add the other way, it'd be  
much appreciated.


http://wiki.apache.org/solr/SolrTomcat

--Matthew

On Dec 5, 2007, at 9:31 AM, Erick Erickson wrote:

The beautiful thing about a wiki is that *anybody* can update them.  
It's

especially useful if someone who's just struggled through the issues
can write something up since the pain is still fresh <G>. Especially
if you're better than I am about writing things down

All of which leads me to ask if you're willing to volunteer. You  
have to

create an ID, but that's all.

Best
Erick

On Dec 5, 2007 12:05 PM, Matthew Runo [EMAIL PROTECTED] wrote:


I found that the JNDI settings for Tomcat6 were hard to figure out.
Would someone be willing to write it up for the wiki? Since I think
most people getting started with SOLR will be using Tomcat6 (or
Jetty), it would make sense to update the docs a bit to make it  
easier

to figure out the proper place and way to set all this up.

Even just a link to this thread in some archive would help.

--Matthew

On Dec 5, 2007, at 1:57 AM, Erik Hatcher wrote:


Or, instead of messing around with the JNDI setting, simply set -
Dsolr.solr.home=/opt/solr with the JVM startup parameters for
Tomcat.   Hardcoding a path in web.xml is definitely _not_ what we
want to do.  Not all containers unpack the WAR file onto disk.
Also, consider the case of upgrading to a newer version of Solr
after having tweaked web.xml.

 Erik


On Dec 4, 2007, at 9:58 PM, Yousef Ourabi wrote:


Tomcat unpacks the jar into the webapps directory based off the
context name anyway...

What was the original thinking behind not having solr/home set in
the web.xml -- seems like an easier way to deal with this.

I would imagine most people are more familiar with setting params
in web.xml than manually creating Contexts for their webapp...

In fact I would take a step further and have a default value of /
opt/solr (or whatever...) and if a specific user wants to change it
they can just edit their web.xml?

This would simplify the documentation, instead of configure your
stuff in the Context -- it becomes this is the default, copy
example/solr to /opt/solr (or we have a script do it) and deploy
the .war


- Original Message -
From: Chris Hostetter [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Tuesday, December 4, 2007 6:34:55 PM (GMT-0800) America/
Los_Angeles
Subject: Re: Tomcat6 env-entry


: It works excellently in Tomcat 6. The toughest thing I had to
deal with is
: discovering that the environment variable in web.xml for solr/
home is
: essential. If you skip that step, it won't come up.

no, there's no reason why you should need to edit the web.xml
file ... the
solr/home property can be set in a Context configuration using an
Environment directive without ever opening the solr.war.  See  
this

section of the Tomcat docs for more details...



http://tomcat.apache.org/tomcat-6.0-doc/config/context.html#Environment%20Entries


:   <env-entry>
:      <env-entry-name>solr/home</env-entry-name>
:      <env-entry-type>java.lang.String</env-entry-type>
:      <env-entry-value>F:\Tomcat-6.0.14\webapps\solr</env-entry-value>
:   </env-entry>


-Hoss










Re: Tomcat6 env-entry

2007-12-05 Thread Matthew Runo
I found that the JNDI settings for Tomcat6 were hard to figure out.  
Would someone be willing to write it up for the wiki? Since I think  
most people getting started with SOLR will be using Tomcat6 (or  
Jetty), it would make sense to update the docs a bit to make it easier  
to figure out the proper place and way to set all this up.


Even just a link to this thread in some archive would help.

--Matthew

On Dec 5, 2007, at 1:57 AM, Erik Hatcher wrote:

Or, instead of messing around with the JNDI setting, simply set - 
Dsolr.solr.home=/opt/solr with the JVM startup parameters for  
Tomcat.   Hardcoding a path in web.xml is definitely _not_ what we  
want to do.  Not all containers unpack the WAR file onto disk.   
Also, consider the case of upgrading to a newer version of Solr  
after having tweaked web.xml.


Erik


On Dec 4, 2007, at 9:58 PM, Yousef Ourabi wrote:

Tomcat unpacks the jar into the webapps directory based off the  
context name anyway...


What was the original thinking behind not having solr/home set in  
the web.xml -- seems like an easier way to deal with this.


I would imagine most people are more familiar with setting params  
in web.xml than manually creating Contexts for their webapp...


In fact I would take a step further and have a default value of / 
opt/solr (or whatever...) and if a specific user wants to change it  
they can just edit their web.xml?


This would simplify the documentation, instead of configure your  
stuff in the Context -- it becomes this is the default, copy  
example/solr to /opt/solr (or we have a script do it) and deploy  
the .war



- Original Message -
From: Chris Hostetter [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Tuesday, December 4, 2007 6:34:55 PM (GMT-0800) America/ 
Los_Angeles

Subject: Re: Tomcat6 env-entry


: It works excellently in Tomcat 6. The toughest thing I had to  
deal with is
: discovering that the environment variable in web.xml for solr/ 
home is

: essential. If you skip that step, it won't come up.

no, there's no reason why you should need to edit the web.xml  
file ... the

solr/home property can be set in a Context configuration using an
Environment directive without ever opening the solr.war.  See this
section of the Tomcat docs for more details...

http://tomcat.apache.org/tomcat-6.0-doc/config/context.html#Environment%20Entries

:   <env-entry>
:      <env-entry-name>solr/home</env-entry-name>
:      <env-entry-type>java.lang.String</env-entry-type>
:      <env-entry-value>F:\Tomcat-6.0.14\webapps\solr</env-entry-value>
:   </env-entry>


-Hoss







SOLR 1.3 trunk error

2007-12-04 Thread Matthew Runo

Hello!

I'm trying to make use of SOLR 1.3, svn trunk, and get the following  
error.


SEVERE: java.lang.NoSuchMethodError: org.apache.solr.search.QParser.getSort(Z)Lorg/apache/solr/search/QueryParsing$SortSpec;
	at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:66)
	at org.apache.solr.handler.SearchHandler.handleRequestBody(SearchHandler.java:93)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:117)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:826)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:206)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:263)
	at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
	at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:584)
	at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
	at java.lang.Thread.run(Thread.java:619)

--Matthew


Re: SOLR 1.3 trunk error

2007-12-04 Thread Matthew Runo
Ooops, I get this error when I try to search an index with a few  
documents in it.


ie..

http://dev14.zappos.com:8080/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on

caching : true
numDocs : 5
maxDoc : 5
readerImpl : MultiReader
readerDir : org.apache.lucene.store.FSDirectory@/opt/solr/data/index
indexVersion : 1196707950551
openedAt : Tue Dec 04 10:14:58 PST 2007
registeredAt : Tue Dec 04 10:14:58 PST 2007

On Dec 4, 2007, at 10:19 AM, Matthew Runo wrote:


Hello!

I'm trying to make use of SOLR 1.3, svn trunk, and get the following  
error.


SEVERE: java.lang.NoSuchMethodError: org.apache.solr.search.QParser.getSort(Z)Lorg/apache/solr/search/QueryParsing$SortSpec;
	at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:66)
	at org.apache.solr.handler.SearchHandler.handleRequestBody(SearchHandler.java:93)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:117)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:826)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:206)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:263)
	at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
	at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:584)
	at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
	at java.lang.Thread.run(Thread.java:619)

--Matthew





Re: SOLR 1.3 trunk error

2007-12-04 Thread Matthew Runo

Wow. So I feel stupid. Sorry to waste your time =p

--Matthew

On Dec 4, 2007, at 10:36 AM, Ryan McKinley wrote:


did you try 'ant clean' before running 'ant dist'?

the method signature for SortSpec changed recently


Matthew Runo wrote:
Ooops, I get this error when I try to search an index with a few  
documents in it.

ie..
http://dev14.zappos.com:8080/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on
 caching : true

numDocs : 5
maxDoc : 5
readerImpl : MultiReader
readerDir : org.apache.lucene.store.FSDirectory@/opt/solr/data/index
indexVersion : 1196707950551
openedAt : Tue Dec 04 10:14:58 PST 2007
registeredAt : Tue Dec 04 10:14:58 PST 2007
On Dec 4, 2007, at 10:19 AM, Matthew Runo wrote:

Hello!

I'm trying to make use of SOLR 1.3, svn trunk, and get the  
following error.


SEVERE: java.lang.NoSuchMethodError: org.apache.solr.search.QParser.getSort(Z)Lorg/apache/solr/search/QueryParsing$SortSpec;
   at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:66)
   at org.apache.solr.handler.SearchHandler.handleRequestBody(SearchHandler.java:93)
   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:117)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:826)
   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:206)
   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174)
   at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
   at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
   at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
   at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
   at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
   at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
   at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
   at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:263)
   at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
   at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:584)
   at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
   at java.lang.Thread.run(Thread.java:619)

--Matthew







Re: Tomcat6?

2007-12-03 Thread Matthew Runo

In context.xml, I added..

<Environment name="solr/home" value="/Users/mruno/solr-src/example/solr" type="java.lang.String" />


I think that's all I did to get it working in Tomcat 6.
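
(For reference, the same Environment entry expanded into a full Context fragment -- a sketch; the conventional location would be $CATALINA_HOME/conf/Catalina/localhost/solr.xml, and the docBase is a placeholder:

<Context docBase="/opt/tomcat/webapps/solr.war" debug="0" crossContext="true">
   <Environment name="solr/home" type="java.lang.String"
                value="/Users/mruno/solr-src/example/solr" override="true" />
</Context>
)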

--Matthew Runo

On Dec 3, 2007, at 7:58 AM, Jörg Kiegeland wrote:

In the Solr wiki it is not described how to install Solr on Tomcat 6, and I have not managed it myself :(
In the chapter Configuring Solr Home with JNDI the directory $CATALINA_HOME/conf/Catalina/localhost is mentioned, which does not exist with Tomcat 6.


Alternatively I tried the folder $CATALINA_HOME/work/Catalina/localhost, but with no success.. (I can query the top level page, but the Solr Admin link then does not work).


Can anybody help?

--
Dipl.-Inf. Jörg Kiegeland
ikv++ technologies ag
Bernburger Strasse 24-25, D-10963 Berlin
e-mail: [EMAIL PROTECTED], web: http://www.ikv.de
phone: +49 30 34 80 77 18, fax: +49 30 34 80 78 0
=
Handelsregister HRB 81096; Amtsgericht Berlin-Charlottenburg
board of  directors: Dr. Olaf Kath (CEO); Dr. Marc Born (CTO)
supervising board: Prof. Dr. Bernd Mahr (chairman)
_





Re: Combining SOLR and JAMon to monitor query execution times from a browser

2007-11-27 Thread Matthew Runo
I'd be interested in seeing more logging in the admin section! I saw  
that there is QPS in 1.3, which is great, but it'd be wonderful to see  
more.


--Matthew Runo

On Nov 27, 2007, at 9:18 AM, Siegfried Goeschl wrote:


Hi folks,

working on a closed-source project for an IP-concerned company is not always fun ... we combined SOLR with JAMon (http://jamonapi.sourceforge.net/) to keep an eye on the query times, and this might be of general interest


+) JAMon comes with a ready-to-use ServletFilter
+) we extended this implementation to keep track of queries issued by a customer and the requested domain objects, e.g. artist, album, track
+) this allows us to keep track of the execution times and their distribution, to quickly find long-running queries from a web browser without having access to the access.log

+) a small presentation can be found at
http://people.apache.org/~sgoeschl/presentations/jamon-20070717.pdf
+) if it is of general interest I can rewrite the code as a contribution

Cheers,

Siegfried Goeschl





Re: Solr cluster topology.

2007-11-20 Thread Matthew Runo

Yes. The clients will always be a minute or two behind the master.

I like the way some people are doing it - make them all masters! Just post your updates to each of them - you lose a bit of performance perhaps, but it doesn't matter if a server bombs out or you have to upgrade them, since they're all exactly the same.
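
(A sketch of that post-to-every-master approach with the SolrJ client -- hostnames are placeholders, and error handling / checked exceptions are omitted:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

String[] masters = { "http://solr1:8080/solr", "http://solr2:8080/solr" };
SolrInputDocument doc = new SolrInputDocument();
doc.addField( "id", "12345" );
// send the same update (and commit) to every instance
for ( String url : masters ) {
    SolrServer server = new CommonsHttpSolrServer( url );
    server.add( doc );
    server.commit();
}
)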


--Matthew

On Nov 20, 2007, at 7:43 AM, Alexander Wallace wrote:


Hi All!

I just started reading about Solr a couple of days ago (not full time of course) and it looks like a pretty impressive set of technologies... I still have a few questions whose answers I have not clearly found:


Q: On a cluster, as I understand it, one and only one machine is a master, and N servers could be slaves... The clients - do they all talk to the master for indexing and to a load balancer for searching? Is one particular machine configured to know it is the master? Or is it only the settings for replicating the index that matter? Or does one post reindex requests to any of the slaves, which will forward them to the master?


How can we have failover in the master?

It is a correct assumption that slaves could always be a bit out of  
sync with the master, correct? A matter of minutes perhaps...


Thanks in advance for your responses!






SOLR 1.3 Release?

2007-10-25 Thread Matthew Runo
Any ideas on when 1.3 might be released? We're starting a new project  
and I'd love to use 1.3 for it - is SVN head stable enough for use?


++
 | Matthew Runo
 | Zappos Development
 | [EMAIL PROTECTED]
 | 702-943-7833
++




Re: SOLR 1.3 Release?

2007-10-25 Thread Matthew Runo
I'm mostly interested in using the SolrJ library for now, and the spellcheck handler & the work on per-field updates.


I think I'll just go with 1.3 and report back if something seems broken.

++
 | Matthew Runo
 | Zappos Development
 | [EMAIL PROTECTED]
 | 702-943-7833
++


On Oct 25, 2007, at 3:29 PM, Ryan McKinley wrote:


Yonik Seeley wrote:

On 10/25/07, Matthew Runo [EMAIL PROTECTED] wrote:
Any ideas on when 1.3 might be released? We're starting a new  
project

and I'd love to use 1.3 for it - is SVN head stable enough for use?

I think it's stable in the sense of does the right thing and doesn't
crash, but IMO
isn't stable in the sense that new interfaces (internal and external)
added since 1.2 may still be changing.



A lot has been added since 1.2 -- if you have the time/temperament to be OK with interfaces that may be in flux, it is great to have more feedback on how they work / how they should work.  Since 1.2, I think any bugs or serious problems that have arisen (not many) have been fixed within a day or two.  (as good as paid support!)


Note the public interfaces from 1.2 are (and will be) totally  
compatible with 1.3 - the only interface issues you may run into  
are if you are writing custom code or using new features added  
since 1.2



Lots of new stuff going in (and has gone in), and I wouldn't  
expect to

see 1.3 super soon.
Just IMO of course.



I don't think it is soon either -- there are a few big things that  
need to get in and have time to settle before locking into public APIs


ryan











Re: Forced Top Document

2007-10-24 Thread Matthew Runo
I'd love to know this, as I just got a development request for this  
very feature. I'd rather not spend time on it if it already exists.


++
 | Matthew Runo
 | Zappos Development
 | [EMAIL PROTECTED]
 | 702-943-7833
++


On Oct 23, 2007, at 10:12 PM, mark angelillo wrote:


Hi all,

Is there a way to get a specific document to appear on top of  
search results even if a sorting parameter would push it further down?


Thanks in advance,
Mark

mark angelillo
snooth inc.
o: 646.723.4328
c: 484.437.9915
[EMAIL PROTECTED]
snooth -- 1.8 million ratings and counting...






Re: dismax downweighting

2007-10-12 Thread Matthew Runo
Would a dismax boost that's negative work? i.e. name^-1 and description^-1?
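
(For what it's worth: dismax expects qf boosts to be positive, so name^-1 generally won't parse. A commonly used workaround is a small fractional boost on the field to de-emphasize -- a sketch, reusing the field names from the quoted message below:

 <str name="qf">description^2.0 name^0.1</str>
)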


++
 | Matthew Runo
 | Zappos Development
 | [EMAIL PROTECTED]
 | 702-943-7833
++


On Oct 12, 2007, at 1:13 PM, Brian Whitman wrote:

I have a dismax query where I want to boost the appearance of the query terms in certain fields but down-boost their appearance in others.


The practical use is a field containing a lot of descriptive text  
and then a product name field where products might be named after a  
descriptive word. Consider an electric toothbrush called The Fast  
And Thorough Toothbrush -- if a user searches for fast toothbrush  
I'd like to down-weight that particular model's advantage. The name  
of the product might also be in the descriptive text.


I tried

 <str name="qf">
-name description
 </str>

but solr didn't like that.

Any better ideas?


--
http://variogr.am/







Re: Spell Check Handler

2007-10-11 Thread Matthew Runo
Where does the index come from in the first place? Do we have to  
enter the words, or are they entered as documents enter the SOLR index?


I'd love to be able to use my own documents as the spell check index  
of correctly spelled words.
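
(For reference: the 1.2-era SpellCheckerRequestHandler builds its own auxiliary spelling index from a field of the main index, so the words can come from your own documents. A sketch of the solrconfig.xml entry -- the directory and source field names here are assumptions:

  <requestHandler name="spellchecker" class="solr.SpellCheckerRequestHandler" startup="lazy">
    <!-- location of the auxiliary spelling index, relative to the data dir -->
    <str name="spellcheckerIndexDir">spell</str>
    <!-- field of the main index to harvest correctly spelled terms from -->
    <str name="termSourceField">word</str>
  </requestHandler>

After indexing, the spelling index can be (re)built by passing cmd=rebuild to the handler.)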


++
 | Matthew Runo
 | Zappos Development
 | [EMAIL PROTECTED]
 | 702-943-7833
++


On Oct 11, 2007, at 7:08 AM, [EMAIL PROTECTED]  
[EMAIL PROTECTED] wrote:



Climbingrose,

I think you make a valid point.  Each person may have a different  
concept of how something should work with their application.


My thought on the subject of spell checking multiple words:
  - the parameter "multiWords" enables spell checking on each word in the q parameter instead of on the whole field
  - each word is then represented in its own entry in a list of all words that are checked
  - to identify each word that is being checked within that entry, it is identified by the key "words"
  - to identify whether the word was found exactly as it is within the spell checker's index, the "exist" key contains this information
  - since there can be suggestions for both misspelled words and words that are spelled correctly, the list of suggestions is also included for both correctly spelled and misspelled words, even if the suggestion list is empty.


  - My vision is that if a user has a search query of multiple words and wants to perform a check on the words, the use of "multiWords" will check all words at one time, independently from each other, and return the list.  The presenting web app can then visually identify to the user which words are misspelled and which ones have suggestions too.  The user can then work with the various lists of suggestions without having to re-hit Solr.  Naturally, if the user manually changes a word, then Solr will have to be re-hit, but providing a single list of all words, including suggestions for correct words along with incorrect words, will help simplify applications (by reducing iteration over each word) and will help reduce the number of hits to the Solr server.
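
(A hedged sketch of what such a multiWords response might look like, using the field names described above; the exact structure produced by the patch may differ:

  <lst name="result">
    <lst name="life">
      <bool name="exist">true</bool>
      <arr name="suggestions"/>
    </lst>
    <lst name="calculatar">
      <bool name="exist">false</bool>
      <arr name="suggestions"><str>calculator</str></arr>
    </lst>
  </lst>
)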



1) I assume that when a user enters a misspelled multiword query, we should only check for words that are actually misspelled. For example, if a user enters "life expectancy calculatar", which has "calculator" misspelled, we should only spellcheck "calculatar".


I think I understand what you mean in the above statement, but you  
must admit, it does sound funny.  After all, how do you identify  
that a word is misspelled by NOT using the spelling checker?   
Correct me if I am wrong, but I think you intended to say that when  
a word is identified as being misspelled, then you should only  
include the suggestions for misspelled words.  If this is the case,  
then I would have to disagree with you.  The user may be interested  
in finding words that might mean the same, but are more popular  
(appears in more indexed documents within the Lucene index).  Hence the reason why I added the result field "exist" to identify that a word is spelled correctly even if there is a list of suggestions.
Please note, the situation can exist too where a word is misspelled  
and there are no suggestions so one cannot use the suggestion list  
as an indicator to the correctness of the individual word(s).




2) I only return the best string for a misspelled query.


You can also use the parameter suggestionCount=1 to control how  
many words are returned.  In this case, it will do what your code  
is doing, but still allow the client to dynamically change this  
value without the need to hard code it within the main source code.



As far as only including terms that are more popular than the word being checked, there is already a parameter "onlyMorePopular" that you can use to dynamically control this feature from the client side, so it does not have to be hard coded within the spelling checker.


Review these parameter options on the wiki, but keep in mind I have  
not updated the wiki with my changes or the new parameter and  
result fields:

http://wiki.apache.org/solr/SpellCheckerRequestHandler
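
(Putting those parameters together, a request might look like this -- host, port, and registered handler name are assumptions:

http://localhost:8983/solr/select?qt=spellchecker&q=life+expectancy+calculatar&multiWords=true&suggestionCount=5&onlyMorePopular=false
)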

   Thanks Climbingrose,

 Scott Tabar




 climbingrose [EMAIL PROTECTED] wrote:
Just to clarify this line of code:

String[] suggestions = spellChecker.suggestSimilar(termText, numSug,
req.getSearcher().getReader(), restrictToField, true);

I only return suggestions if they are more popular than termText. You
probably need to use code in Scott's patch to make this behaviour
configurable.

On 10/11/07, climbingrose [EMAIL PROTECTED] wrote:


Hi all,

I've been so busy the last few days so I haven't replied to this email. I modified SpellCheckerHandler a while ago to include support for multiword queries. To be honest, I didn't have time to write unit tests for the code. However, I deployed it in a production environment and it has been working

Re: Availability Issues

2007-10-09 Thread Matthew Runo
The way I'd do it would be to buy more servers, set up Tomcat on  
each, and get SOLR replicating from your current machine to the  
others. Then, throw them all behind a load balancer, and there you go.


You could also post your updates to every machine. Then you don't  
need to worry about getting replication running.


++
 | Matthew Runo
 | Zappos Development
 | [EMAIL PROTECTED]
 | 702-943-7833
++


On Oct 9, 2007, at 7:12 AM, David Whalen wrote:


All:

How can I break up my install onto more than one box?  We've
hit a learning curve here and we don't understand how best to
proceed.  Right now we have everything crammed onto one box
because we don't know any better.

So, how would you build it if you could?  Here are the specs:

a) the index needs to hold at least 25 million articles
b) the index is constantly updated at a rate of 10,000 articles
per minute
c) we need to have faceted queries

Again, real-world experience is preferred here over book knowledge.
We've tried to read the docs and it's only made us more confused.

TIA

Dave W



-Original Message-
From: Yonik Seeley [mailto:[EMAIL PROTECTED]
Sent: Monday, October 08, 2007 3:42 PM
To: solr-user@lucene.apache.org
Subject: Re: Availability Issues

On 10/8/07, David Whalen [EMAIL PROTECTED] wrote:

Do you see any requests that took a really long time to finish?


The requests that take a long time to finish are just

simple queries.

And the same queries run at a later time come back much faster.

Our logs contain 99% inserts and 1% queries.  We are

constantly adding

documents to the index at a rate of 10,000 per minute, so the logs
show mostly that.


Oh, so you are using the same boxes for updating and querying?
When you insert, are you using multiple threads?  If so, how many?

What is the full URL of those slow query requests?
Do the slow requests start after a commit?


Start with the thread dump.
I bet it's multiple queries piling up around some synchronization
points in lucene (sometimes caused by multiple threads generating
the same big filter that isn't yet cached).


What would be my next steps after that?  I'm not sure I'd

understand

enough from the dump to make heads-or-tails of it.  Can I

share that

here?


Yes, post it here.  Most likely a majority of the threads
will be blocked somewhere deep in lucene code, and you will
probably need help from people here to figure it out.

-Yonik








Re: Availability Issues

2007-10-09 Thread Matthew Runo
When we are doing a reindex (1x a day), we post around 150-200  
documents per second, on average. Our index is not as large though,  
about 200k docs. During this import, the search service (with faceted  
page navigation) remains available for front-end searches and  
performance does not noticeably change. You can see this install  
running at http://www.6pm.com, where SOLR is in use for every part of  
the navigation and search.


I believe that a sustained load of 150+ posts per second is very  
possible. At that load though, it does make sense to consider  
multiple machines.


++
 | Matthew Runo
 | Zappos Development
 | [EMAIL PROTECTED]
 | 702-943-7833
++


On Oct 9, 2007, at 10:16 AM, Charles Hornberger wrote:


I'm about to do a prototype deployment of Solr for a pretty
high-volume site, and I've been following this thread with some
interest.

One thing I want to confirm: is it really possible for Solr to handle a constant stream of 10K updates/min (~167 updates/sec) to a 25M-document index? I knew Solr and Lucene were good, but that seems like a pretty tall order. From the responses I'm seeing to David Whalen's inquiries, it seems like people think that's possible.

Thanks,
Charlie

On 10/9/07, Matthew Runo [EMAIL PROTECTED] wrote:

The way I'd do it would be to buy more servers, set up Tomcat on
each, and get SOLR replicating from your current machine to the
others. Then, throw them all behind a load balancer, and there you  
go.


You could also post your updates to every machine. Then you don't
need to worry about getting replication running.

++
  | Matthew Runo
  | Zappos Development
  | [EMAIL PROTECTED]
  | 702-943-7833
++


On Oct 9, 2007, at 7:12 AM, David Whalen wrote:


All:

How can I break up my install onto more than one box?  We've
hit a learning curve here and we don't understand how best to
proceed.  Right now we have everything crammed onto one box
because we don't know any better.

So, how would you build it if you could?  Here are the specs:

a) the index needs to hold at least 25 million articles
b) the index is constantly updated at a rate of 10,000 articles
per minute
c) we need to have faceted queries

Again, real-world experience is preferred here over book knowledge.
We've tried to read the docs and it's only made us more confused.

TIA

Dave W



-Original Message-
From: Yonik Seeley [mailto:[EMAIL PROTECTED]
Sent: Monday, October 08, 2007 3:42 PM
To: solr-user@lucene.apache.org
Subject: Re: Availability Issues

On 10/8/07, David Whalen [EMAIL PROTECTED] wrote:

Do you see any requests that took a really long time to finish?


The requests that take a long time to finish are just

simple queries.

And the same queries run at a later time come back much faster.

Our logs contain 99% inserts and 1% queries.  We are

constantly adding

documents to the index at a rate of 10,000 per minute, so the logs
show mostly that.


Oh, so you are using the same boxes for updating and querying?
When you insert, are you using multiple threads?  If so, how many?

What is the full URL of those slow query requests?
Do the slow requests start after a commit?


Start with the thread dump.
I bet it's multiple queries piling up around some synchronization
points in lucene (sometimes caused by multiple threads generating
the same big filter that isn't yet cached).


What would be my next steps after that?  I'm not sure I'd

understand

enough from the dump to make heads-or-tails of it.  Can I

share that

here?


Yes, post it here.  Most likely a majority of the threads
will be blocked somewhere deep in lucene code, and you will
probably need help from people here to figure it out.

-Yonik













Re: Does Solr Have?

2007-10-04 Thread Matthew Runo
How does one set up the LukeRequestHandler? I didn't see a document  
in the wiki about how to add new handlers, and my install (a 1.1  
install upgraded to 1.2) does not have this handler available.


I'd like to see what we're talking about, it sounds very  
interesting.. but I can't find how to turn on this request handler.


++
 | Matthew Runo
 | Zappos Development
 | [EMAIL PROTECTED]
 | 702-943-7833
++


On Oct 4, 2007, at 6:11 AM, Ryan McKinley wrote:


Robert Young wrote:

Hi,
We're just about to start work on a project in Solr and there are a
couple of points which I haven't been able to find out from the wiki
which I'm interested in.
1. Is there a REST interface for getting index stats? I would
particularly like access to terms and their document frequencies,
preferably filtered by a query.


have you checked the luke request handler?
http://wiki.apache.org/solr/LukeRequestHandler



2. Is it possible to use different synonym sets for different queries


the synonyms are linked to each field -- if different queries access different fields, they will use different synonyms.


To automatically index things with different field types, check the  
copyField ... stuff.
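
(A sketch of what that looks like in schema.xml -- two field types wired to different synonym files, plus copyField entries so the same source text is indexed both ways; all names here are placeholders:

  <fieldType name="text_syn_a" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms-a.txt" ignoreCase="true" expand="true"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_syn_b" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms-b.txt" ignoreCase="true" expand="true"/>
    </analyzer>
  </fieldType>

  <field name="title" type="string" indexed="true" stored="true"/>
  <field name="title_syn_a" type="text_syn_a" indexed="true" stored="false"/>
  <field name="title_syn_b" type="text_syn_b" indexed="true" stored="false"/>
  <copyField source="title" dest="title_syn_a"/>
  <copyField source="title" dest="title_syn_b"/>
)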



OR is it possible to search across multiple indexes with a single
query?


not currently.



3. Is it possible to change stopword and synonym sets at runtime?


By default no - but it is not hard to write a custom field type  
that would.  (I have one that loads synonyms from a SQL table -- it  
can be updated dynamically at runtime)



I'm sure I'll have lots more questions as time goes by and,  
hopefully,

I'll be able to answer others' questions in the future.


great!

ryan





Re: Does Solr Have?

2007-10-04 Thread Matthew Runo
Boo, thank you for the reply. That's what I get for customizing it  
and taking out all the other code I guess. Sorry about that.


++
 | Matthew Runo
 | Zappos Development
 | [EMAIL PROTECTED]
 | 702-943-7833
++


On Oct 4, 2007, at 9:47 AM, Ryan McKinley wrote:


add:
  <requestHandler name="/admin/luke" class="org.apache.solr.handler.admin.LukeRequestHandler" />


to your solrconfig.xml

It is in the example solrconfig.xml that comes with 1.2
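
(Once registered, it can be hit directly; for per-field terms and document frequencies something like this might work -- host and port assumed:

http://localhost:8983/solr/admin/luke?fl=name&numTerms=20
)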


Matthew Runo wrote:
How does one set up the LukeRequestHandler? I didn't see a  
document in the wiki about how to add new handlers, and my install  
(a 1.1 install upgraded to 1.2) does not have this handler available.
I'd like to see what we're talking about, it sounds very  
interesting.. but I can't find how to turn on this request handler.

++
 | Matthew Runo
 | Zappos Development
 | [EMAIL PROTECTED]
 | 702-943-7833
++
On Oct 4, 2007, at 6:11 AM, Ryan McKinley wrote:

Robert Young wrote:

Hi,
We're just about to start work on a project in Solr and there are a
couple of points which I haven't been able to find out from the  
wiki

which I'm interested in.
1. Is there a REST interface for getting index stats? I would
particularly like access to terms and their document frequencies,
preferably filtered by a query.


have you checked the luke request handler?
http://wiki.apache.org/solr/LukeRequestHandler


2. Is it possible to use different synonym sets for different  
queries


the synonyms are linked to each field -- if different queries access different fields, they will use different synonyms.


To automatically index things with different field types, check  
the copyField ... stuff.



OR is it possible to search across multiple indexes with a single
query?


not currently.



3. Is it possible to change stopword and synonym sets at runtime?


By default no - but it is not hard to write a custom field type  
that would.  (I have one that loads synonyms from a SQL table --  
it can be updated dynamically at runtime)



I'm sure I'll have lots more questions as time goes by and,  
hopefully,

I'll be able to answer others' questions in the future.


great!

ryan






