cores/shards with no leader

2013-08-29 Thread Cat Bieber

Hello,

We're running Solr 4.2.0 and recently converted to SolrCloud. We've got 
16 cores, each with 1 shard, plus 3 ZooKeeper instances and 4 replicas of 
each core. We're suddenly having trouble with very slow Tomcat restarts 
(15-45 minutes), and even when we can get a few replicas up, we aren't 
seeing a leader for many of our cores. I tried issuing a reload command 
through the cores admin, but it fails because there is no leader. Is 
there any way to cause an election? Restarting Tomcat on individual 
servers in the cluster doesn't seem to help. We do have some cores that 
are serving requests properly and would prefer not to shut down the whole 
cluster if possible -- this is a production system.


In addition, some cores are reporting a peculiar error, stack trace 
below. The cores that report this problem seem to be completely down 
across all replicas.


ERROR org.apache.solr.servlet.SolrDispatchFilter  - null:org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1415)
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1527)
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1304)
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1239)
    at org.apache.solr.request.SolrQueryRequestBase.getSearcher(SolrQueryRequestBase.java:94)
    at org.apache.solr.servlet.cache.HttpCacheHeaderUtil.calcLastModified(HttpCacheHeaderUtil.java:145)
    at org.apache.solr.servlet.cache.HttpCacheHeaderUtil.doCacheHeaderValidation(HttpCacheHeaderUtil.java:218)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:334)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:581)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:879)
    at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
    at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
    at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
    at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.RuntimeException: Already closed
    at org.apache.solr.core.CachingDirectoryFactory.get(CachingDirectoryFactory.java:237)
    at org.apache.solr.core.CachingDirectoryFactory.get(CachingDirectoryFactory.java:222)
    at org.apache.solr.core.SolrCore.getNewIndexDir(SolrCore.java:244)
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1326)


Has anyone seen either of these issues before? I'm having trouble finding 
any information on either situation.


Thanks,
-Cat


Re: cores/shards with no leader

2013-08-29 Thread Cat Bieber

Thanks Shawn. We'll give an upgrade a try and see if that helps.
-Cat

On 08/29/2013 04:32 PM, Shawn Heisey wrote:

On 8/29/2013 2:16 PM, Cat Bieber wrote:
   

We're running solr 4.2.0 and recently converted to SolrCloud. We've got
16 cores, each with 1 shard. 3 zookeeper instances, 4 replicas of each
core. We're suddenly having trouble ...
 

Solr 4.2.0 had a number of bugs.  They were severe enough that a 4.2.1
version was quickly released afterwards.  It should be possible to
upgrade without changing your config.  You should probably upgrade to
4.4, but that would be less straightforward.

   


Re: alphanumeric interval

2012-07-05 Thread Cat Bieber
I did not use facets in my implementation, so I don't have any 
facet-specific code snippet that would be helpful to you. However, if 
your handler extends SearchHandler and calls super.handleRequestBody() 
it should be running the facet component code. You have access to the 
SolrQueryResponse built by it, and may be able to get the data out of 
that object. You'll need to look at the javadoc for NamedList, and I 
found it helpful to dump the list in debug statements so I could examine 
its structure and contents. I suspect you need something like 
rsp.getValues().get("facet_counts") to get the facet data, but haven't 
tested it.

-Cat Bieber

On 07/05/2012 04:32 AM, AlexR wrote:

Hi,

thanks a lot for your answer, and sorry for my late response.

It's my first time writing a Solr plugin. I already have a plugin with an
empty handleRequestBody() method and I'm able to call it.

I need the list for the faceted field person (facet.field=person) in my
method, but I don't know how to get it.

Do you have a code snippet of your implementation?

thx
Alex




Re: alphanumeric interval

2012-06-22 Thread Cat Bieber
I had a similar issue. The solution I ended up using was a custom 
RequestHandler that extends SearchHandler. In handleRequestBody() it 
calls super.handleRequestBody(req, rsp), looks for a pageSize 
parameter (25 in your example), and loops over the array of results 
inside the response, pulling out the ones I want and building a new 
array. It performs well enough, and avoids downloading a large result 
set. It sounds like you're more interested in number of buckets whereas 
I needed a specific page size, but you could just as easily pass in the 
number of buckets in your param. This solution does require that you be 
willing to write some Java code and add it to your Solr deployment as a 
plugin. Then you configure your new request handler in solrconfig.xml 
and you can give it a default page size or bucket size.
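
As a rough illustration of the post-processing step only (not the actual handler, which needs the Solr API), the loop that picks page borders out of an already sorted result list might look like the sketch below. The class and method names are hypothetical, and a plain list of strings stands in for the documents in the response.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Solr. A plain list of sorted titles
// stands in for the result documents inside the response.
class BucketBorders {

    // Returns a {first, last} pair of titles for each page of pageSize entries.
    static List<String[]> borders(List<String> sortedTitles, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<String[]> result = new ArrayList<>();
        for (int start = 0; start < sortedTitles.size(); start += pageSize) {
            // The last page may be shorter than pageSize.
            int end = Math.min(start + pageSize, sortedTitles.size()) - 1;
            result.add(new String[] { sortedTitles.get(start), sortedTitles.get(end) });
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up data: 100 sorted entries split into pages of 25.
        List<String> titles = new ArrayList<>();
        for (int i = 1; i <= 100; i++) {
            titles.add(String.format("name%03d", i));
        }
        for (String[] b : borders(titles, 25)) {
            System.out.println(b[0] + " - " + b[1]);
        }
    }
}
```

In a real handler the same loop would run over the documents in the response after super.handleRequestBody(req, rsp) returns, with pageSize read from the request parameters.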


I actually discussed this with some people at the training before Lucene 
Revolution and there wasn't a distinct right answer. Because I was 
looking for a three-letter prefix for my ranges, one suggestion was to 
add the prefix to the solr index and facet on it. Then, by adding up 
counts you could tell what the endpoints of an interval were. That would 
still require doing some calculations on the client side, and it won't 
be useful if you have full values with few duplicates.
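
The "adding up counts" step of that suggestion could be sketched on the client side roughly as follows. This is only an illustration under assumptions: the prefix counts are taken to arrive in alphabetical (index) order, the class and method names are hypothetical, and the sample prefixes and counts are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical client-side helper, not part of Solr.
class PrefixBuckets {

    // For each bucket of bucketSize entries, returns the prefix whose
    // cumulative count first reaches the bucket's last position.
    static List<String> endPrefixes(Map<String, Integer> countsInIndexOrder, int bucketSize) {
        if (bucketSize <= 0) {
            throw new IllegalArgumentException("bucketSize must be positive");
        }
        List<String> ends = new ArrayList<>();
        long seen = 0;
        long nextEnd = bucketSize;
        String lastPrefix = null;
        for (Map.Entry<String, Integer> e : countsInIndexOrder.entrySet()) {
            seen += e.getValue();
            lastPrefix = e.getKey();
            // One prefix can close several buckets if its count is large.
            while (seen >= nextEnd) {
                ends.add(e.getKey());
                nextEnd += bucketSize;
            }
        }
        // A trailing partial bucket ends in the last prefix seen.
        if (lastPrefix != null && seen > nextEnd - bucketSize) {
            ends.add(lastPrefix);
        }
        return ends;
    }

    public static void main(String[] args) {
        // Made-up three-letter prefix counts in alphabetical order.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("ale", 10);
        counts.put("bob", 20);
        counts.put("joh", 20);
        counts.put("zed", 5);
        System.out.println(endPrefixes(counts, 25));
    }
}
```

This only narrows each border down to a prefix, which is why the approach stops being useful when most full values are distinct.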

-Cat Bieber


On 06/22/2012 09:32 AM, AlexR wrote:

I need even sized buckets and their borders.
100/4 = 25 entries

Border for first interval is entry 1 and entry 25
in this case Alex - John

i don't want to load all names and calculate the borders on the client.
Is there a way to get the borders from Solr?

   


phrase query and string/keyword tokenizer

2012-06-14 Thread Cat Bieber
I have documents that are word definitions (basically an online 
dictionary) that can have alternate titles. For example, the document 
entitled "Read-only memory" might have an alternate title of "ROM". In 
search results, I want to boost documents with an alternate title that 
is a case-insensitive exact match for the query text -- e.g. "rom" 
should work as well.


I'm running solr 3.6 and using edismax.

I've gone through a few iterations of this. What I have working best so 
far is a multi-valued text field for the alternate titles with a big boost:


<fieldType name="lowerCaseSort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

<field name="bestMatchTitle" type="lowerCaseSort" indexed="true" stored="false" multiValued="true"/>


This produces great results with single-word searches like the "ROM" 
example above. It runs into problems with a multi-word alternate title 
like "Blue Tooth". I have read some of the prior discussions about this, 
regarding how the query is parsed based on spaces before it gets to the 
keyword tokenizer for the field type.


The question I have is about phrase queries in this case. My request 
handler has:


<str name="qf">bestMatchTitle^20 title^5 summary^3 metaDescription^1.5 body^1 author^0.5</str>
<str name="pf">bestMatchTitle^20 title^5 summary^3 metaDescription^1.5 body^1 author^0.5</str>


When I run a query, I get this:

+((DisjunctionMaxQuery((metaDescription:blue^1.5 | summary:blue^3.0 | 
author:blue^0.5 | body:blue | title:blue^5.0 | 
bestMatchTitle:blue^20.0)~0.01) 
DisjunctionMaxQuery((metaDescription:tooth^1.5 | summary:tooth^3.0 | 
author:tooth^0.5 | body:tooth | title:tooth^5.0 | 
bestMatchTitle:tooth^20.0)~0.01))~2) 
DisjunctionMaxQuery((metaDescription:"blue tooth"~100^1.5 | 
summary:"blue tooth"~100^3.0 | body:"blue tooth"~100 | 
title:"blue tooth"~100^5.0)~0.01)


It looks like the phrase isn't being matched against my bestMatchTitle 
field. It also isn't matched against author, which is type string. So do 
phrases only get matched against certain field types?


When I put the quotes in the query text:

/select/?qt=best-match&q="blue+tooth"&debugQuery=on

It builds the query I was hoping to get:

+DisjunctionMaxQuery((metaDescription:"blue tooth"^1.5 | 
summary:"blue tooth"^3.0 | author:blue tooth^0.5 | body:"blue tooth" | 
title:"blue tooth"^5.0 | bestMatchTitle:blue tooth^20.0)~0.01)


But I still need the query on the individual tokens, otherwise it 
eliminates results that may be good hits. So far, any way I have tried 
to combine the two queries either opens up matching a ton of documents 
that shouldn't really match (e.g. total found goes from 24 to 4800+ 
documents) or doesn't match the one I want, giving poor results.


Does anyone have suggestions for how I can convince the phrase query to 
match against my bestMatchTitle field, or change the query text I'm 
passing in to combine these two queries and get the boost I want? Or is 
there another approach altogether that I'm missing?


Thanks for any help with this.
-Cat Bieber



Re: String ordering appears different with sort vs range query

2012-04-20 Thread Cat Bieber
Thanks for looking at this. I'll see if we can sneak an upgrade to 3.6 
into the project to get this working.

-Cat

On 04/20/2012 12:03 PM, Erick Erickson wrote:

BTW, nice problem statement...

Anyway, I see this too in 3.5. I do NOT see
this in 3.6 or trunk, so it looks like a bug that got fixed
in the 3.6 time-frame. Don't have the time right now
to go back over the JIRA's to see...

Best
Erick

On Thu, Apr 19, 2012 at 3:39 PM, Cat Bieber <cbie...@techtarget.com> wrote:
   

I'm trying to use a Solr query to find the next title in alphabetical order
after a given string. The issue I'm facing is that the sort param seems to
sort non-alphanumeric characters in a different order from the ordering used
by a range filter in the q or fq param. I can't filter the non-alphanumeric
characters out because they're integral to the data and it would not be a
useful ordering if it were based only on the alphanumeric portion of the
strings.

I'm running Solr version 3.5.

In my current approach, I have a field that is a unique string for each
document:

<fieldType name="lowerCaseSort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

<field name="uniqueSortString" type="lowerCaseSort" indexed="true" stored="true"/>

I'm passing the value for the current document in a range to query
everything after the current string, sorted ascending:

/select?fl=uniqueSortString&sort=uniqueSortString+asc&q=uniqueSortString:[$1+ZX+Spectrum+HOBETA+format+file+TO+*]&wt=xml&rows=5&version=2.2

In theory, I expect the first result to be the current item and the second
result to be the next one. However, I'm finding that the sort and the range
filter seem to use different ordering:

<result name="response" numFound="448" start="0">
  <doc>
    <str name="uniqueSortString">$1 ZX Spectrum - Emulator</str>
  </doc>
  <doc>
    <str name="uniqueSortString">$1 ZX Spectrum HOBETA format file</str>
  </doc>
  <doc>
    <str name="uniqueSortString">$1 ZX Spectrum Hobetta Picture Format</str>
  </doc>
  <doc>
    <str name="uniqueSortString">$? TR-DOS ZX Spectrum file in HOBETA format</str>
  </doc>
  <doc>
    <str name="uniqueSortString">$A AutoCAD Autosave File ( Autodesk Inc.)</str>
  </doc>
</result>

Based on the results ordering, sort believes "-" precedes "H", but the range
filter should have excluded that first result if it ordered in the same way.
Digging through the code, I think it looks like sorting uses
String.compareTo() for ordering on a text/string field. However I haven't
been able to track down where the range filter code is. If someone can point
me in the right direction to find that code I'd love to look through it. Or,
if anyone has suggestions regarding a different approach or changes I can
make to this query/field, that would be very helpful.

Thanks for your time.
-Cat Bieber
 


String ordering appears different with sort vs range query

2012-04-19 Thread Cat Bieber
I'm trying to use a Solr query to find the next title in alphabetical 
order after a given string. The issue I'm facing is that the sort param 
seems to sort non-alphanumeric characters in a different order from the 
ordering used by a range filter in the q or fq param. I can't filter the 
non-alphanumeric characters out because they're integral to the data and 
it would not be a useful ordering if it were based only on the 
alphanumeric portion of the strings.


I'm running Solr version 3.5.

In my current approach, I have a field that is a unique string for each 
document:


<fieldType name="lowerCaseSort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

<field name="uniqueSortString" type="lowerCaseSort" indexed="true" stored="true"/>


I'm passing the value for the current document in a range to query 
everything after the current string, sorted ascending:


/select?fl=uniqueSortString&sort=uniqueSortString+asc&q=uniqueSortString:[$1+ZX+Spectrum+HOBETA+format+file+TO+*]&wt=xml&rows=5&version=2.2

In theory, I expect the first result to be the current item and the 
second result to be the next one. However, I'm finding that the sort and 
the range filter seem to use different ordering:


<result name="response" numFound="448" start="0">
  <doc>
    <str name="uniqueSortString">$1 ZX Spectrum - Emulator</str>
  </doc>
  <doc>
    <str name="uniqueSortString">$1 ZX Spectrum HOBETA format file</str>
  </doc>
  <doc>
    <str name="uniqueSortString">$1 ZX Spectrum Hobetta Picture Format</str>
  </doc>
  <doc>
    <str name="uniqueSortString">$? TR-DOS ZX Spectrum file in HOBETA format</str>
  </doc>
  <doc>
    <str name="uniqueSortString">$A AutoCAD Autosave File ( Autodesk Inc.)</str>
  </doc>
</result>

Based on the results ordering, sort believes "-" precedes "H", but the range 
filter should have excluded that first result if it ordered in the same 
way. Digging through the code, I think it looks like sorting uses 
String.compareTo() for ordering on a text/string field. However I 
haven't been able to track down where the range filter code is. If 
someone can point me in the right direction to find that code I'd love 
to look through it. Or, if anyone has suggestions regarding a different 
approach or changes I can make to this query/field, that would be very 
helpful.
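
To make the expectation concrete, here is a small standalone sketch of what a consistent ordering would give. It does not use Solr at all: it assumes plain String.compareTo() on trimmed, lowercased values as a stand-in for the lowerCaseSort analysis (the HTML stripping is ignored), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Standalone illustration, not Solr code.
class SortVsRange {

    // Approximates the field analysis: trim and lowercase only.
    static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }

    // True if value falls in [lower TO *] under String.compareTo() ordering.
    static boolean inOpenEndedRange(String value, String lower) {
        return normalize(value).compareTo(normalize(lower)) >= 0;
    }

    public static void main(String[] args) {
        String lower = "$1 ZX Spectrum HOBETA format file";
        List<String> titles = new ArrayList<>();
        titles.add("$A AutoCAD Autosave File ( Autodesk Inc.)");
        titles.add("$1 ZX Spectrum Hobetta Picture Format");
        titles.add("$1 ZX Spectrum - Emulator");
        titles.add("$? TR-DOS ZX Spectrum file in HOBETA format");
        titles.add(lower);

        // Sort with the same comparison the range check uses.
        titles.sort(Comparator.comparing(SortVsRange::normalize));
        for (String t : titles) {
            System.out.println(inOpenEndedRange(t, lower) + "  " + t);
        }
    }
}
```

Under this ordering "-" (0x2D) sorts before "h" (0x68), so the "- Emulator" title sorts first but falls below the range's lower bound, which is why I expected the range filter to drop it.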


Thanks for your time.
-Cat Bieber