Solr 3.4 Grouping group.main=true results in java.lang.NoClassDefFound

2011-09-28 Thread Frank Romweber
I use Drupal for accessing the Solr search engine. After updating and 
creating my new index, everything works as before. Then I activate 
group=true and group.field=site, and Solr delivers the wanted search 
results, but in Drupal nothing appears, just an empty search page. I found 
out that grouping changes the result set names. No problem, Solr offers 
the group.main=true parameter for this case. So I added it and got 
this 500 error.


HTTP Status 500 - org/apache/commons/lang/ArrayUtils
java.lang.NoClassDefFoundError: org/apache/commons/lang/ArrayUtils
at org.apache.solr.search.Grouping$Command.createSimpleResponse(Grouping.java:573)
at org.apache.solr.search.Grouping$CommandField.finish(Grouping.java:675)
at org.apache.solr.search.Grouping.execute(Grouping.java:339)
at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:240)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:662)


I found out that Solr didn't find the class ArrayUtils. I tried a lot 
of things to get this to work: setting the JAVA_HOME and CLASSPATH vars, and I 
changed the JRE, without any success. I am really wondering, since all my other 
programs are still running, and even Solr in normal mode is working 
and accessible, but not the group.main=true function.


So my question is now: what is necessary to get this to work?
Any help is appreciated.

Thx frank
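
A minimal SolrJ sketch of the grouped request described above (group.field comes from the post; the URL and everything else are illustrative):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class GroupMainExample {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("*:*");
        q.set("group", true);         // enable result grouping
        q.set("group.field", "site"); // one group per distinct value of "site"
        q.set("group.main", true);    // flatten the groups back into a plain result list
        QueryResponse rsp = server.query(q);
        System.out.println(rsp.getResults()); // with group.main=true this list is populated again
    }
}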




Upgrading from 3.1 to 3.4

2011-09-28 Thread Rohit
I have been using Solr 3.1 and am planning to update to Solr 3.4. What are the
steps to be followed, or anything that needs to be taken care of specifically
for the upgrade?

 

Regards,

Rohit



Re: getting answers starting with a requested string first

2011-09-28 Thread elisabeth benoit
Thanks a lot for your advice.

What really matters to me is that answers with NAME_ANALYZED=Tour Eiffel
appear first. Then, whether Tour Eiffel Tower By Helicopter appears before or
after Hotel la tour Eiffel doesn't really matter.

Since I send fq=NAME_ANALYZED:"tour eiffel", I am sure NAME_ANALYZED will at
least contain those two words. So I figured that if I sort answers by
this field's length, I'll get those called Tour Eiffel first.

But I'll check the QParser anyway since it seems to be an interesting one.

Best regards,
Elisabeth

2011/9/28 Chris Hostetter hossman_luc...@fucit.org


 : 1) giving NAME_ANALYZED a type where omitNorms=false: I thought this
 would
 : give answers with shorter NAME_ANALYZED field a higher score. I've tested
 : that solution, but it's not working. I guess this is because there is no
 : score for fq parameter (all my answers have same score)

 both of those statements are correct.  omitNorms=false will cause length
 normalization to apply, so with the default similarity, shorter field
 values will generally score higher, but norms are very coarse, so it
 won't be very precise; and fq queries filter the results,
 but do not affect the score.

 : 2) sorting my answers by length desc, and I guess in this case I would
 need
 : to store the length of NAME_ANALYZED field to avoid having to compute it
 on
 : the fly. at this point, this is the only solution I can think of.

 that will also be a good way to sort on the length of the field, and will
 give you a lot of precise control.

 but sorting on length isn't what you asked about...

 :  and I have different answers like
 : 
 :  Restaurant la tour Eiffel
 :  Hotel la tour Eiffel
 :  Tour Eiffel
...
 :  Is there a way to get answers with NAME_ANALYZED beginning with tour
 :  Eiffel first?

 If you want to score documents higher because they appear at the beginning
 of the field value, that is a different problem than scoring documents
 higher because they are shorter -- ie: Tour Eiffel Tower By Helicopter
 is longer than Hotel la tour Eiffel; which one do you want to come
 first?

 If you want documents to score higher if they appear early in the field
 value, you can either index a marker token at the beginning of the field
 (ie: S_T_A_R_T Tour Eiffel) and then do all queries on that field as
 phrase queries including that token (shorter matches score higher in
 phrase queries); or you can look into using the surround QParser that
 was recently committed to the trunk.  The surround parser has special
 syntax for generating Span Queries, which support a SpanFirst query
 that scores documents higher based on how close to the beginning of a field
 value the match is.


 -Hoss
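
A minimal Lucene sketch of the SpanFirst idea described above, assuming the NAME_ANALYZED field from this thread (the position limit is illustrative):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.SpanFirstQuery;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

public class SpanFirstExample {
    public static void main(String[] args) {
        // "tour eiffel" as an exact-order span
        SpanQuery tourEiffel = new SpanNearQuery(new SpanQuery[] {
                new SpanTermQuery(new Term("NAME_ANALYZED", "tour")),
                new SpanTermQuery(new Term("NAME_ANALYZED", "eiffel"))
        }, 0, true);
        // matches only when the phrase ends within the first 2 positions,
        // i.e. the field value starts with "tour eiffel"
        SpanFirstQuery startsWith = new SpanFirstQuery(tourEiffel, 2);
        System.out.println(startsWith);
    }
}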



Re: Search for empty string in 1.4.1 vs 3.4

2011-09-28 Thread Shanmugavel SRD
Thank you for the reply Chris.
Please find the sample query which is returning results even though id does
not have any value such as "" in Solr 1.4.1:

http://localhost/solr/online/select/?q=%28%20state%20%29^1.8%20AND%20%20%28%20%28id:%22%22%29%29%20AND%20%20%28%20%28content_type_s:%22Video%22%29^1.5%20%29

PS: id is multivalued=false. Any field with multivalued=false works like
this (returning results on a search for "") in Solr 1.4.1. But at the same time
q=id:"" is not returning any results in Solr 1.4.1. This problem happens
when there is an AND clause with id:"".

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Search-for-empty-string-in-1-4-1-vs-3-4-tp3358444p3375436.html
Sent from the Solr - User mailing list archive at Nabble.com.


strange performance issue with many shards on one server

2011-09-28 Thread Frederik Kraus
 Hi, 


I am experiencing a strange issue doing some load tests. Our setup:

- 2 servers with 24 CPU cores and 130GB of RAM each
- 10 shards per server (needed for response times) running in a single tomcat 
instance
- each query queries all 20 shards (distributed search)

- each shard holds about 1.5 million documents (small shards are needed due to 
rather complex queries)
- all caches are warmed / high cache hit rates (99%) etc.


Now for some reason we cannot seem to fully utilize all CPU power (no disk IO), 
i.e. at some point increasing concurrent users doesn't increase CPU load, but 
decreases throughput and increases the response times of the individual queries.

Also 1-2% of the queries take significantly longer: avg somewhere at 100ms 
while 1-2% take 1.5s or longer. 

Any ideas are greatly appreciated :)

Fred.
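
A sketch of the kind of distributed request described above (host and shard names are illustrative, not the actual setup):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class ShardedQuery {
    public static void main(String[] args) throws Exception {
        // send the request to any one core; it fans out and merges the results
        SolrServer server = new CommonsHttpSolrServer("http://server1:8080/solr/shard01");
        SolrQuery q = new SolrQuery("some complex query");
        q.set("shards", "server1:8080/solr/shard01,server1:8080/solr/shard02,"
                + "server2:8080/solr/shard01,server2:8080/solr/shard02"); // ... all 20 shards
        server.query(q);
    }
}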



Distributed search has problems with some field names

2011-09-28 Thread Luis Neves



Hello all,

I'm experimenting with the Distributed Search bits in the nightly 
builds and I'm facing a problem.


I have on my schema.xml some dynamic fields defined like this:

<dynamicField name="$*" type="double" indexed="true" stored="true" />
<dynamicField name="@*" type="string" indexed="true" stored="true" multiValued="true" />

<dynamicField name="*" type="string" indexed="true" stored="true" />


When hitting a single shard the following query works fine:

http://solr/select?q=*:*&fl=ts,$distinct_boxes

But when I add the distrib=true parameter I get a NullPointerException:


java.lang.NullPointerException
at org.apache.solr.handler.component.QueryComponent.returnFields(QueryComponent.java:1025)
at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:725)
at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:700)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:292)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1451)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)




The $ in $distinct_boxes appears to be the culprit somehow, the query:

http://solr/select?q=*:*&fl=ts,distinct_boxes&distrib=true

works without errors, but of course doesn't retrieve the field I want.

Funnily enough when requesting the uniqueKey field there are no errors:

http://solr/select?q=*:*&fl=tid,ts,$distinct_boxes&distrib=true

But somehow the data from the field $distinct_boxes doesn't appear in 
the output.


Is there some workaround? Using fl=* returns all the data from the 
fields that start with $ but it severely increases the size of the 
response.



--
Luis Neves





Re: strange performance issue with many shards on one server

2011-09-28 Thread Federico Fissore

Frederik Kraus wrote on 28/09/2011 at 12:58:

  Hi,


I am experiencing a strange issue doing some load tests. Our setup:



just because I've listened to JUG mates talking about that at the last 
meeting, could it be that your CPUs are spending their time getting 
things from RAM to CPU cache?


maybe, say, 10% of the CPU power is spent on the bus

federico


Re: strange performance issue with many shards on one server

2011-09-28 Thread Vadim Kisselmann
Hi Fred,
analyze the queries which take longer.
We observe our queries and see problems with q-time for queries which
are complex, contain phrase queries, or contain numbers or
special characters.
if you don't know it:
http://www.hathitrust.org/blogs/large-scale-search/tuning-search-performance
Regards
Vadim


2011/9/28 Frederik Kraus frederik.kr...@gmail.com

  Hi,


 I am experiencing a strange issue doing some load tests. Our setup:

 - 2 server with each 24 cpu cores, 130GB of RAM
 - 10 shards per server (needed for response times) running in a single
 tomcat instance
 - each query queries all 20 shards (distributed search)

 - each shard holds about 1.5 mio documents (small shards are needed due to
 rather complex queries)
 - all caches are warmed / high cache hit rates (99%) etc.


 Now for some reason we cannot seem to fully utilize all CPU power (no disk
 IO), ie. increasing concurrent users doesn't increase CPU-Load at a point,
 decreases throughput and increases the response times of the individual
 queries.

 Also 1-2% of the queries take significantly longer: avg somewhere at 100ms
 while 1-2% take 1.5s or longer.

 Any ideas are greatly appreciated :)

 Fred.




Re: strange performance issue with many shards on one server

2011-09-28 Thread Frederik Kraus
Hi Vadim, 

the thing is that those exact same queries, which take longer during a load 
test, perform just fine when executed at a slower request rate; they are also 
random, i.e. there is no pattern in the bad/slow queries.

My first thought was some kind of contention and/or connection starvation in 
the internal shard communication?

Fred.


Am Mittwoch, 28. September 2011 um 13:18 schrieb Vadim Kisselmann:

 Hi Fred,
 analyze the queries which take longer.
 We observe our queries and see the problems with q-time with queries which
 are complex, with phrase queries or queries which contains numbers or
 special characters.
 if you don't know it:
 http://www.hathitrust.org/blogs/large-scale-search/tuning-search-performance
 Regards
 Vadim
 
 
 2011/9/28 Frederik Kraus frederik.kr...@gmail.com 
 (mailto:frederik.kr...@gmail.com)
 
   Hi,
  
  
  I am experiencing a strange issue doing some load tests. Our setup:
  
  - 2 server with each 24 cpu cores, 130GB of RAM
  - 10 shards per server (needed for response times) running in a single
  tomcat instance
  - each query queries all 20 shards (distributed search)
  
  - each shard holds about 1.5 mio documents (small shards are needed due to
  rather complex queries)
  - all caches are warmed / high cache hit rates (99%) etc.
  
  
  Now for some reason we cannot seem to fully utilize all CPU power (no disk
  IO), ie. increasing concurrent users doesn't increase CPU-Load at a point,
  decreases throughput and increases the response times of the individual
  queries.
  
  Also 1-2% of the queries take significantly longer: avg somewhere at 100ms
  while 1-2% take 1.5s or longer.
  
  Any ideas are greatly appreciated :)
  
  Fred.



Re: strange performance issue with many shards on one server

2011-09-28 Thread Vadim Kisselmann
Hi Fred,

ok, it's strange behavior with the same queries.
A few more questions:
- which Solr version?
- do you index during your load test? (because of index rebuilds)
- do you replicate your index?

Regards
Vadim



2011/9/28 Frederik Kraus frederik.kr...@gmail.com

 Hi Vladim,

 the thing is, that those exact same queries, that take longer during a load
 test, perform just fine when executed at a slower request rate and are also
 random, i.e. there is no pattern in bad/slow queries.

 My first thought was some kind of contention and/or connection starvation
 for the internal shard communication?

 Fred.


 Am Mittwoch, 28. September 2011 um 13:18 schrieb Vadim Kisselmann:

  Hi Fred,
  analyze the queries which take longer.
  We observe our queries and see the problems with q-time with queries
 which
  are complex, with phrase queries or queries which contains numbers or
  special characters.
  if you don't know it:
 
 http://www.hathitrust.org/blogs/large-scale-search/tuning-search-performance
  Regards
  Vadim
 
 
  2011/9/28 Frederik Kraus frederik.kr...@gmail.com (mailto:
 frederik.kr...@gmail.com)
 
Hi,
  
  
   I am experiencing a strange issue doing some load tests. Our setup:
  
   - 2 server with each 24 cpu cores, 130GB of RAM
   - 10 shards per server (needed for response times) running in a single
   tomcat instance
   - each query queries all 20 shards (distributed search)
  
   - each shard holds about 1.5 mio documents (small shards are needed due
 to
   rather complex queries)
   - all caches are warmed / high cache hit rates (99%) etc.
  
  
   Now for some reason we cannot seem to fully utilize all CPU power (no
 disk
   IO), ie. increasing concurrent users doesn't increase CPU-Load at a
 point,
   decreases throughput and increases the response times of the individual
   queries.
  
   Also 1-2% of the queries take significantly longer: avg somewhere at
 100ms
   while 1-2% take 1.5s or longer.
  
   Any ideas are greatly appreciated :)
  
   Fred.




Re: strange performance issue with many shards on one server

2011-09-28 Thread Frederik Kraus


Am Mittwoch, 28. September 2011 um 13:41 schrieb Vadim Kisselmann:

 Hi Fred,
 
 ok, it's a strange behavior with same queries.
 Another questions:
 -which solr version?

3.3 (might the NIOFSDirectory from 3.4 help?)
 
 -do you indexing during your load test? (because of index rebuilt)
nope
 
 -do you replicate your index?

nope 
 
 Regards
 Vadim
 
 
 
 2011/9/28 Frederik Kraus frederik.kr...@gmail.com 
 (mailto:frederik.kr...@gmail.com)
 
  Hi Vladim,
  
  the thing is, that those exact same queries, that take longer during a load
  test, perform just fine when executed at a slower request rate and are also
  random, i.e. there is no pattern in bad/slow queries.
  
  My first thought was some kind of contention and/or connection starvation
  for the internal shard communication?
  
  Fred.
  
  
  Am Mittwoch, 28. September 2011 um 13:18 schrieb Vadim Kisselmann:
  
   Hi Fred,
   analyze the queries which take longer.
   We observe our queries and see the problems with q-time with queries
  which
   are complex, with phrase queries or queries which contains numbers or
   special characters.
   if you don't know it:
  http://www.hathitrust.org/blogs/large-scale-search/tuning-search-performance
   Regards
   Vadim
   
   
   2011/9/28 Frederik Kraus frederik.kr...@gmail.com 
   (mailto:frederik.kr...@gmail.com) (mailto:
  frederik.kr...@gmail.com (mailto:frederik.kr...@gmail.com))
   
 Hi,


I am experiencing a strange issue doing some load tests. Our setup:

- 2 server with each 24 cpu cores, 130GB of RAM
- 10 shards per server (needed for response times) running in a single
tomcat instance
- each query queries all 20 shards (distributed search)

- each shard holds about 1.5 mio documents (small shards are needed due
  to
rather complex queries)
- all caches are warmed / high cache hit rates (99%) etc.


Now for some reason we cannot seem to fully utilize all CPU power (no
  disk
IO), ie. increasing concurrent users doesn't increase CPU-Load at a
  point,
decreases throughput and increases the response times of the individual
queries.

Also 1-2% of the queries take significantly longer: avg somewhere at
  100ms
while 1-2% take 1.5s or longer.

Any ideas are greatly appreciated :)

Fred.



Still too many files after running solr optimization

2011-09-28 Thread Kissue Kissue
Hi,

I am using Solr 3.3. I noticed that after indexing about 700,000 records
and running optimization at the end, I still have about 91 files in my index
directory. I thought that optimization was supposed to reduce the number of
files.

My settings are the defaults that came with Solr (mergeFactor, etc.)

Any ideas what I could be doing wrong?


Re: Still too many files after running solr optimization

2011-09-28 Thread Manish Bafna
Try to do optimize twice.
The 2nd one will be quick and will delete a lot of files.
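
A minimal SolrJ sketch of the double optimize being suggested (URL illustrative; the same thing can be done by posting <optimize/> to the update handler twice):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class DoubleOptimize {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        server.optimize(); // first pass: merges segments into a new, single segment
        server.optimize(); // second pass: quick, and lets the old files be deleted
    }
}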

On Wed, Sep 28, 2011 at 5:26 PM, Kissue Kissue kissue...@gmail.com wrote:
 Hi,

 I am using solr 3.3. I noticed  that after indexing about 700, 000 records
 and running optimization at the end, i still have about 91 files in my index
 directory. I thought that optimization was supposed to reduce the number of
 files.

 My settings are the default that came with Solr (mergefactor, etc)

 Any ideas what i could be doing wrong?



Re: strange performance issue with many shards on one server

2011-09-28 Thread Frederik Kraus
I just had a look at the thread-dump, pasting 3 examples here:


'pool-31-thread-8233' Id=11626, BLOCKED on lock=org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool@19dd10d9, total cpu time=20.ms user time=20.ms
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool.freeConnection(MultiThreadedHttpConnectionManager.java:982)
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.releaseConnection(MultiThreadedHttpConnectionManager.java:643)
at org.apache.commons.httpclient.HttpConnection.releaseConnection(HttpConnection.java:1179)
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.releaseConnection(MultiThreadedHttpConnectionManager.java:1423)
at org.apache.commons.httpclient.HttpMethodBase.ensureConnectionRelease(HttpMethodBase.java:2430)
at org.apache.commons.httpclient.HttpMethodBase.responseBodyConsumed(HttpMethodBase.java:2422)
at org.apache.commons.httpclient.HttpMethodBase$1.responseConsumed(HttpMethodBase.java:1892)
at org.apache.commons.httpclient.AutoCloseInputStream.notifyWatcher(AutoCloseInputStream.java:198)
at org.apache.commons.httpclient.AutoCloseInputStream.close(AutoCloseInputStream.java:158)
at org.apache.commons.httpclient.HttpMethodBase.releaseConnection(HttpMethodBase.java:1181)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:486)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421)
at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

'pool-31-thread-8232' Id=11625, BLOCKED on lock=org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool@19dd10d9, total cpu time=20.ms user time=20.ms
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:447)
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:427)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421)
at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

and

'http-8080-381' Id=6859, WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2607b720, total cpu time=990.ms user time=920.ms
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
at java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:164)
at org.apache.solr.handler.component.HttpCommComponent.takeCompletedOrError(SearchHandler.java:469)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:271)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at

Re: Still too many files after running solr optimization

2011-09-28 Thread Vadim Kisselmann
why should the optimization reduce the number of files?
That happens only when you index docs with the same unique key.

Do you have differences in numDocs and maxDocs after optimize?
If yes:
what is your optimize command?

Regards
Vadim
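
To compare the two numbers being asked about, a small SolrJ sketch using the Luke request handler (URL illustrative):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.LukeRequest;
import org.apache.solr.client.solrj.response.LukeResponse;
import org.apache.solr.common.util.NamedList;

public class NumDocsCheck {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        LukeRequest luke = new LukeRequest();
        luke.setNumTerms(0); // index summary only, skip per-field term stats
        LukeResponse rsp = luke.process(server);
        NamedList<Object> info = rsp.getIndexInfo();
        // maxDoc > numDocs means there are deleted docs an optimize would purge
        System.out.println("numDocs=" + info.get("numDocs")
                + " maxDoc=" + info.get("maxDoc"));
    }
}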



2011/9/28 Manish Bafna manish.bafna...@gmail.com

 Try to do optimize twice.
 The 2nd one will be quick and will delete lot of files.

 On Wed, Sep 28, 2011 at 5:26 PM, Kissue Kissue kissue...@gmail.com
 wrote:
  Hi,
 
  I am using solr 3.3. I noticed  that after indexing about 700, 000
 records
  and running optimization at the end, i still have about 91 files in my
 index
  directory. I thought that optimization was supposed to reduce the number
 of
  files.
 
  My settings are the default that came with Solr (mergefactor, etc)
 
  Any ideas what i could be doing wrong?
 



Re: Still too many files after running solr optimization

2011-09-28 Thread Kissue Kissue
numDocs and maxDocs are the same size.

I was worried because when I used to use only Lucene for the same indexing,
before optimization there are many files, but after optimization I always end
up with just 3 files in my index folder. Just want to find out if this is
ok.

Thanks

On Wed, Sep 28, 2011 at 1:23 PM, Vadim Kisselmann 
v.kisselm...@googlemail.com wrote:

 why should the optimization reduce the number of files?
 It happens only when you indexing docs with same unique key.

 Have you differences in numDocs und maxDocs after optimize?
 If yes:
 how is your optimize command ?

 Regards
 Vadim



 2011/9/28 Manish Bafna manish.bafna...@gmail.com

  Try to do optimize twice.
  The 2nd one will be quick and will delete lot of files.
 
  On Wed, Sep 28, 2011 at 5:26 PM, Kissue Kissue kissue...@gmail.com
  wrote:
   Hi,
  
   I am using solr 3.3. I noticed  that after indexing about 700, 000
  records
   and running optimization at the end, i still have about 91 files in my
  index
   directory. I thought that optimization was supposed to reduce the
 number
  of
   files.
  
   My settings are the default that came with Solr (mergefactor, etc)
  
   Any ideas what i could be doing wrong?
  
 



Re: Still too many files after running solr optimization

2011-09-28 Thread Vadim Kisselmann
if numDocs and maxDocs have the same number of docs, nothing will be deleted
on optimize.
You only rebuild your index.

Regards
Vadim




2011/9/28 Kissue Kissue kissue...@gmail.com

 numDocs and maxDocs are same size.

 I was worried because when i used to use only Lucene for the same indexing,
 before optimization there are many files but after optimization i always
 end
 up with just 3 files in my index filder. Just want to find out if this was
 ok.

 Thanks

 On Wed, Sep 28, 2011 at 1:23 PM, Vadim Kisselmann 
 v.kisselm...@googlemail.com wrote:

  why should the optimization reduce the number of files?
  It happens only when you indexing docs with same unique key.
 
  Have you differences in numDocs und maxDocs after optimize?
  If yes:
  how is your optimize command ?
 
  Regards
  Vadim
 
 
 
  2011/9/28 Manish Bafna manish.bafna...@gmail.com
 
   Try to do optimize twice.
   The 2nd one will be quick and will delete lot of files.
  
   On Wed, Sep 28, 2011 at 5:26 PM, Kissue Kissue kissue...@gmail.com
   wrote:
Hi,
   
I am using solr 3.3. I noticed  that after indexing about 700, 000
   records
and running optimization at the end, i still have about 91 files in
 my
   index
directory. I thought that optimization was supposed to reduce the
  number
   of
files.
   
My settings are the default that came with Solr (mergefactor, etc)
   
Any ideas what i could be doing wrong?
   
  
 



Re: help understanding match

2011-09-28 Thread Vijay Ramachandran
On Tue, Sep 27, 2011 at 10:58 PM, tamanjit.bin...@yahoo.co.in 
tamanjit.bin...@yahoo.co.in wrote:

 Hi,
 1. Just curious - you have your defaultsearchfield, defaultquery, as not
 stored; how do you know that it contains what you think it contains?
 2. the fieldType of defaultquery is query_text; I am not sure which
 analyzers you are using on this field's type, both at indexing time and
 at querying time. This could actually be the reason why stopwords were not
 used both during indexing and querying time.


Thank you. This seemed to be the problem - I had started with a schema doc
from another project, and made this mistake.


 3. Lastly, if you want the OR operator to work, don't use "" (quotes); instead
 use () brackets around your searchable term.


The quotes were from some py code for creating the query string, and were
only illustrative.

thanks again!
Vijay

-- 
Targeted direct marketing on Twitter - http://www.wisdomtap.com/


Re: strange performance issue with many shards on one server

2011-09-28 Thread Vadim Kisselmann
Hmm, sorry don't know...
My ideas:
- tomcat generate this problem (for example: maxthreads, number of
connections...)
- JVM - Options, especially GC
- index locks, eventually an open issue in jira

Regards
Vadim




2011/9/28 Frederik Kraus frederik.kr...@gmail.com

 I just had a look at the thread-dump, pasting 3 examples here:


 'pool-31-thread-8233' Id=11626, BLOCKED on
 lock=org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool@19dd10d9,
 total cpu time=20.ms user time=20.ms
 at
 org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool.freeConnection(MultiThreadedHttpConnectionManager.java:982)
 at
 org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.releaseConnection(MultiThreadedHttpConnectionManager.java:643)
 at
 org.apache.commons.httpclient.HttpConnection.releaseConnection(HttpConnection.java:1179)
 at
 org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.releaseConnection(MultiThreadedHttpConnectionManager.java:1423)
 at
 org.apache.commons.httpclient.HttpMethodBase.ensureConnectionRelease(HttpMethodBase.java:2430)
 at
 org.apache.commons.httpclient.HttpMethodBase.responseBodyConsumed(HttpMethodBase.java:2422)
 at
 org.apache.commons.httpclient.HttpMethodBase$1.responseConsumed(HttpMethodBase.java:1892)
 at
 org.apache.commons.httpclient.AutoCloseInputStream.notifyWatcher(AutoCloseInputStream.java:198)
 at
 org.apache.commons.httpclient.AutoCloseInputStream.close(AutoCloseInputStream.java:158)
 at
 org.apache.commons.httpclient.HttpMethodBase.releaseConnection(HttpMethodBase.java:1181)
 at
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:486)
 at
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
 at
 org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421)
 at
 org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)

 'pool-31-thread-8232' Id=11625, BLOCKED on
 lock=org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool@19dd10d9,
 total cpu time=20.ms user time=20.ms
 at
 org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:447)
 at
 org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416)
 at
 org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153)
 at
 org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
 at
 org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
 at
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:427)
 at
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
 at
 org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421)
 at
 org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 and

 'http-8080-381' Id=6859, WAITING on
 lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2607b720,
 total cpu time=990.ms user time=920.ms

 at sun.misc.Unsafe.park(Native Method)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
 at
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
 at
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
 at
 java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:164)
 at
 org.apache.solr.handler.component.HttpCommComponent.takeCompletedOrError(SearchHandler.java:469)
 at
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:271)
 at
 

FieldCollapsing don't return every groups

2011-09-28 Thread Rémy Loubradou
Hello,

I'm using the field collapsing feature to group my products by merchant, and
I don't understand why some merchants are missing from the result sent by Solr.
My request is
http://localhost:8983/solr/select/?q=merchant_name_t:*&version=2.2&start=0&rows=2000&indent=on&group=true&group.field=merchant_name_t&fl=merchant_name_t&wt=json
Currently the request returns 166 merchants, and it should return more than
that.
Did I do something wrong in my query?

Thank you,
Remy


Re: FieldCollapsing don't return every groups

2011-09-28 Thread lboutros
Hi Remy,

could you paste the analyzer part of the field merchant_name_t please?

And when you say "it should return more than that", could you explain why,
with examples?

If I'm not wrong, the field collapsing function is based on indexed values,
so if your analyzer is complex (not string),

"Rémy Loubradou" can be indexed as "remy" and "loubradou",

and "Rémy NotLoubradou" could be grouped with "Rémy Loubradou".

This could explain the behavior.

Ludovic.


-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/FieldCollapsing-don-t-return-every-groups-tp3376036p3376089.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Sort five random Top Offers to the top

2011-09-28 Thread MOuli
Hey Community.

I wrote my first component and now I've got a problem; here is my code:

@Override
public void prepare(ResponseBuilder rb) throws IOException {
    try {
        rb.req.getParams().getBool("topoffers.show", true);
        String client = rb.req.getParams().get("client", "1");

        // parse the user query with the dismax parser ...
        BooleanQuery[] queries = new BooleanQuery[2];
        queries[0] = (BooleanQuery) DisMaxQParser.getParser(
                rb.req.getParams().get("q"),
                DisMaxQParserPlugin.NAME,
                rb.req)
                .getQuery();
        // ... and AND it with the top-offer flag for this client
        queries[1] = new BooleanQuery();
        Occur occur = BooleanClause.Occur.MUST;
        queries[1].add(QueryParsing.parseQuery("ups_topoffer_" + client
                + ":true", rb.req.getSearcher().getSchema()), occur);

        Query q = Query.mergeBooleanQueries(queries[0], queries[1]);

        // fetch up to 5 matching top-offer docs
        DocList ergebnis = rb.req.getSearcher().getDocList(q, null,
                null, 0, 5, 0);

        String[] machineIds = new String[5];
        int position = 0;
        DocIterator iter = ergebnis.iterator();
        while (iter.hasNext()) {
            int docID = iter.nextDoc();
            Document doc = rb.req.getSearcher().getReader().document(docID);
            for (String value : doc.getValues("machine_id")) {
                machineIds[position++] = value;
            }
        }

        Sort sort = rb.getSortSpec().getSort();
        if (sort == null) {
            rb.getSortSpec().setSort(new Sort());
            sort = rb.getSortSpec().getSort();
        }

        // prepend one function sort per top-offer machine id
        SortField[] newSortings = new SortField[sort.getSort().length + 5];
        int count = 0;
        for (String machineId : machineIds) {
            SortField sortMachineId = new SortField("map(machine_id,"
                    + machineId + "," + machineId + ",1,0) desc", SortField.DOUBLE);
            newSortings[count++] = sortMachineId;
        }

        SortField[] sortings = sort.getSort();
        for (SortField sorting : sortings) {
            newSortings[count++] = sorting;
        }

        sort.setSort(newSortings);

        rb.getSortSpec().setSort(sort);

    } catch (ParseException e) {
        LoggerFactory.getLogger(Topoffers.class).error("Fehler bei den Topoffers!", this);
        LoggerFactory.getLogger(Topoffers.class).error(e.toString(), this);
    }
}

Why can't I manipulate the sort? Is there something I misunderstand?

This search component is added as a first-component in the solrconfig.xml.

Please, can anyone help me?


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Sort-five-random-Top-Offers-to-the-top-tp3355469p3376166.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Still too many files after running solr optimization

2011-09-28 Thread Manish Bafna
Will it not merge the index?
While merging on Windows, the old index files don't get deleted.
(Windows has an issue where a file opened for reading cannot be
deleted.)

So, if you call optimize again, it will delete the older index files.

On Wed, Sep 28, 2011 at 6:43 PM, Vadim Kisselmann
v.kisselm...@googlemail.com wrote:
 if numDocs und maxDocs have the same mumber of docs nothing will be deleted
 on optimize.
 You only rebuild your index.

 Regards
 Vadim




 2011/9/28 Kissue Kissue kissue...@gmail.com

 numDocs and maxDocs are same size.

 I was worried because when i used to use only Lucene for the same indexing,
 before optimization there are many files but after optimization i always
 end
 up with just 3 files in my index filder. Just want to find out if this was
 ok.

 Thanks

 On Wed, Sep 28, 2011 at 1:23 PM, Vadim Kisselmann 
 v.kisselm...@googlemail.com wrote:

  why should the optimization reduce the number of files?
  It happens only when you indexing docs with same unique key.
 
  Have you differences in numDocs und maxDocs after optimize?
  If yes:
  how is your optimize command ?
 
  Regards
  Vadim
 
 
 
  2011/9/28 Manish Bafna manish.bafna...@gmail.com
 
   Try to do optimize twice.
   The 2nd one will be quick and will delete lot of files.
  
   On Wed, Sep 28, 2011 at 5:26 PM, Kissue Kissue kissue...@gmail.com
   wrote:
Hi,
   
I am using solr 3.3. I noticed  that after indexing about 700, 000
   records
and running optimization at the end, i still have about 91 files in
 my
   index
directory. I thought that optimization was supposed to reduce the
  number
   of
files.
   
My settings are the default that came with Solr (mergefactor, etc)
   
Any ideas what i could be doing wrong?
   
  
 




Re: FieldCollapsing don't return every groups

2011-09-28 Thread Rémy Loubradou
Hi Ludovic,

I'm not sure I understand which piece of my schema exposes the analyzer, so
you will find my schema here:
https://github.com/lbdremy/solr-install/blob/master/conf/schema.xml. Hope
this will be helpful :)
The merchant_name_t is a dynamic field matching the *_t pattern, so this
field is indexed and its type is text_general.

When I said "it should return more than that", I meant the result sent by
Solr contains 166 groups (=merchants) and it should return more than 166
groups (merchants).
For example the merchant "Cult Beauty Ltd." does not appear in the result,
and no other merchant begins with "Cult", so where is this merchant
grouped?

Thank you very much for your help Ludovic.

Rémy,

On 28 September 2011 15:52, lboutros boutr...@gmail.com wrote:

 Hi Remy,

 could you paste the analyzer part of the field merchant_name_t please ?

 And when you say it should return more than that, could you explain why
 with examples ?

 If I'm not wrong, the field collapsing function is based on indexed values,
 so if your analyzer is complex (not string),

 Rémy Loubradou can be indexed as remy and loubradou.

 And Rémy NotLoubradou could be grouped with Rémy Loubradou.

 This could explain the behavior.

 Ludovic.


 -
 Jouve
 France.
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/FieldCollapsing-don-t-return-every-groups-tp3376036p3376089.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: strange performance issue with many shards on one server

2011-09-28 Thread Toke Eskildsen
On Wed, 2011-09-28 at 12:58 +0200, Frederik Kraus wrote:
 - 10 shards per server (needed for response times) running in a single tomcat 
 instance

Have you tested that sharding actually decreases response times in your
case? I see the idea in decreasing response times with sharding at the
cost of decreasing throughput, but the added overhead of merging is
non-trivial.

 - each query queries all 20 shards (distributed search)
 
 - each shard holds about 1.5 mio documents (small shards are needed due to 
 rather complex queries)
 - all caches are warmed / high cache hit rates (99%) etc.

 Now for some reason we cannot seem to fully utilize all CPU power (no disk 
 IO), ie. increasing concurrent users doesn't increase CPU-Load at a point, 
 decreases throughput and increases the response times of the individual 
 queries.

It sounds as if there's a hard limit on the number of concurrent users
somewhere. I am no expert in httpclient, but the blocked threads in your
thread dump seems to indicate that they wait for connections to be
established rather than for results to be produced.

I seem to remember that tomcat has a default limit of 200 concurrent
connections, and with 10 shards/search, that is just 200 / (10
shard_connections + 1 incoming_connection) = 18 concurrent searches.

 Also 1-2% of the queries take significantly longer: avg somewhere at 100ms 
 while 1-2% take 1.5s or longer. 

Could be garbage collection, especially since it shows under high load
which might result in more old objects and thereby trigger full gc.



Re: Still too many files after running solr optimization

2011-09-28 Thread Vadim Kisselmann
2011/9/28 Manish Bafna manish.bafna...@gmail.com

 Will it not merge the index?


yes


 While merging on windows, the old index files dont get deleted.
 (Windows has an issue where the file opened for reading cannot be
 deleted)
 
 So, if you call optimize again, it will delete the older index files.

 no.
during optimize you only delete docs which are flagged as deleted, no
matter how old they are.
if your numDocs and maxDocs have the same number of docs, you only rebuild
and merge your index, but you delete nothing.

Regards




 On Wed, Sep 28, 2011 at 6:43 PM, Vadim Kisselmann
 v.kisselm...@googlemail.com wrote:
  if numDocs und maxDocs have the same mumber of docs nothing will be
 deleted
  on optimize.
  You only rebuild your index.
 
  Regards
  Vadim
 
 
 
 
  2011/9/28 Kissue Kissue kissue...@gmail.com
 
  numDocs and maxDocs are same size.
 
  I was worried because when i used to use only Lucene for the same
 indexing,
  before optimization there are many files but after optimization i always
  end
  up with just 3 files in my index filder. Just want to find out if this
 was
  ok.
 
  Thanks
 
  On Wed, Sep 28, 2011 at 1:23 PM, Vadim Kisselmann 
  v.kisselm...@googlemail.com wrote:
 
   why should the optimization reduce the number of files?
   It happens only when you indexing docs with same unique key.
  
   Have you differences in numDocs und maxDocs after optimize?
   If yes:
   how is your optimize command ?
  
   Regards
   Vadim
  
  
  
   2011/9/28 Manish Bafna manish.bafna...@gmail.com
  
Try to do optimize twice.
The 2nd one will be quick and will delete lot of files.
   
On Wed, Sep 28, 2011 at 5:26 PM, Kissue Kissue kissue...@gmail.com
 
wrote:
 Hi,

 I am using solr 3.3. I noticed  that after indexing about 700, 000
records
 and running optimization at the end, i still have about 91 files
 in
  my
index
 directory. I thought that optimization was supposed to reduce the
   number
of
 files.

 My settings are the default that came with Solr (mergefactor, etc)

 Any ideas what i could be doing wrong?

   
  
 
 



Re: Still too many files after running solr optimization

2011-09-28 Thread Manish Bafna
We tested it many times.
The 1st time we optimize, the new (merged) index file is created, but
the existing index files are not deleted (because they might still be
open for reading).
The 2nd time we optimize, everything other than the new index file gets deleted.

This happens specifically on Windows.

On Wed, Sep 28, 2011 at 8:23 PM, Vadim Kisselmann
v.kisselm...@googlemail.com wrote:
 2011/9/28 Manish Bafna manish.bafna...@gmail.com

 Will it not merge the index?


 yes


 While merging on windows, the old index files dont get deleted.
 (Windows has an issue where the file opened for reading cannot be
 deleted)
 
 So, if you call optimize again, it will delete the older index files.

 no.
 during optimize you only delete docs, which are flagged as deleted. no
 matter how old they are.
 if your numDocs and maxDocs have the same number of Docs, you only rebuild
 and merge your index, but you delete nothing.

 Regards




 On Wed, Sep 28, 2011 at 6:43 PM, Vadim Kisselmann
 v.kisselm...@googlemail.com wrote:
  if numDocs und maxDocs have the same mumber of docs nothing will be
 deleted
  on optimize.
  You only rebuild your index.
 
  Regards
  Vadim
 
 
 
 
  2011/9/28 Kissue Kissue kissue...@gmail.com
 
  numDocs and maxDocs are same size.
 
  I was worried because when i used to use only Lucene for the same
 indexing,
  before optimization there are many files but after optimization i always
  end
  up with just 3 files in my index filder. Just want to find out if this
 was
  ok.
 
  Thanks
 
  On Wed, Sep 28, 2011 at 1:23 PM, Vadim Kisselmann 
  v.kisselm...@googlemail.com wrote:
 
   why should the optimization reduce the number of files?
   It happens only when you indexing docs with same unique key.
  
   Have you differences in numDocs und maxDocs after optimize?
   If yes:
   how is your optimize command ?
  
   Regards
   Vadim
  
  
  
   2011/9/28 Manish Bafna manish.bafna...@gmail.com
  
Try to do optimize twice.
The 2nd one will be quick and will delete lot of files.
   
On Wed, Sep 28, 2011 at 5:26 PM, Kissue Kissue kissue...@gmail.com
 
wrote:
 Hi,

 I am using solr 3.3. I noticed  that after indexing about 700, 000
records
 and running optimization at the end, i still have about 91 files
 in
  my
index
 directory. I thought that optimization was supposed to reduce the
   number
of
 files.

 My settings are the default that came with Solr (mergefactor, etc)

 Any ideas what i could be doing wrong?

   
  
 
 




Re: FieldCollapsing don't return every groups

2011-09-28 Thread lboutros
Ok, thanks for the schema.

the merchant Cult Beauty Ltd should be indexed like this:

cult 
beauty 
ltd

I think some other merchants contain at least one of these words.

you should try to group with a special field used for field collapsing:

<dynamicField name="*_t_group" type="string" indexed="true" stored="true"/>

I think you could even disable the stored value for this particular field
(not sure, I have to check).

Ludovic.

-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/FieldCollapsing-don-t-return-every-groups-tp3376036p3376289.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Still too many files after running solr optimization

2011-09-28 Thread Vadim Kisselmann
we had an understanding problem :)

docs are the docs in the index.
files are the files in the index directory (index parts).

during the optimization you don't delete docs if they aren't flagged as
deleted,
but you merge your index and delete the files in your index directory, that's
right.

after a second optimize, the files which were opened for reading are
deleted.

Regards



2011/9/28 Manish Bafna manish.bafna...@gmail.com

 We tested it so many times.
 1st time we optimize, the new index file is created (merged one), but
 the existing index files are not deleted (because they might be still
 open for reading)
 2nd time optimize, other than the new index file, all else gets deleted.

 This is happening specifically on Windows.

 On Wed, Sep 28, 2011 at 8:23 PM, Vadim Kisselmann
 v.kisselm...@googlemail.com wrote:
  2011/9/28 Manish Bafna manish.bafna...@gmail.com
 
  Will it not merge the index?
 
 
  yes
 
 
  While merging on windows, the old index files dont get deleted.
  (Windows has an issue where the file opened for reading cannot be
  deleted)
  
  So, if you call optimize again, it will delete the older index files.
 
  no.
  during optimize you only delete docs, which are flagged as deleted. no
  matter how old they are.
  if your numDocs and maxDocs have the same number of Docs, you only
 rebuild
  and merge your index, but you delete nothing.
 
  Regards
 
 
 
 
  On Wed, Sep 28, 2011 at 6:43 PM, Vadim Kisselmann
  v.kisselm...@googlemail.com wrote:
   if numDocs und maxDocs have the same mumber of docs nothing will be
  deleted
   on optimize.
   You only rebuild your index.
  
   Regards
   Vadim
  
  
  
  
   2011/9/28 Kissue Kissue kissue...@gmail.com
  
   numDocs and maxDocs are same size.
  
   I was worried because when i used to use only Lucene for the same
  indexing,
   before optimization there are many files but after optimization i
 always
   end
   up with just 3 files in my index filder. Just want to find out if
 this
  was
   ok.
  
   Thanks
  
   On Wed, Sep 28, 2011 at 1:23 PM, Vadim Kisselmann 
   v.kisselm...@googlemail.com wrote:
  
why should the optimization reduce the number of files?
It happens only when you indexing docs with same unique key.
   
Have you differences in numDocs und maxDocs after optimize?
If yes:
how is your optimize command ?
   
Regards
Vadim
   
   
   
2011/9/28 Manish Bafna manish.bafna...@gmail.com
   
 Try to do optimize twice.
 The 2nd one will be quick and will delete lot of files.

 On Wed, Sep 28, 2011 at 5:26 PM, Kissue Kissue 
 kissue...@gmail.com
  
 wrote:
  Hi,
 
  I am using solr 3.3. I noticed  that after indexing about 700,
 000
 records
  and running optimization at the end, i still have about 91
 files
  in
   my
 index
  directory. I thought that optimization was supposed to reduce
 the
number
 of
  files.
 
  My settings are the default that came with Solr (mergefactor,
 etc)
 
  Any ideas what i could be doing wrong?
 

   
  
  
 
 



Re: FieldCollapsing don't return every groups

2011-09-28 Thread lboutros
I just checked, you can disable the storing parameter and use this field:

<dynamicField name="*_t_group" type="string" indexed="true" stored="false"/>

Ludovic.

-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/FieldCollapsing-don-t-return-every-groups-tp3376036p3376316.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: FieldCollapsing don't return every groups

2011-09-28 Thread Rémy Loubradou
You're right, one of the groups is 'ltd'. Thanks :)

I fixed this issue by using a field that I know is unique for each merchant
(the merchant id).

Again, thanks for your help Ludovic.

By the way, is the weather nice in France? :)


On 28 September 2011 16:56, lboutros boutr...@gmail.com wrote:

 Ok, thanks for the schema.

 the merchant Cult Beauty Ltd should be indexed like this:

 cult
 beauty
 ltd

 I think some other merchants contain at least one of these words.

 you should try to group with a special field used for field collapsing:

 <dynamicField name="*_t_group" type="string" indexed="true" stored="true"/>

 I think you could even disable the stored value for this particular field
 (not sure, I have to check).

 Ludovic.

 -
 Jouve
 France.
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/FieldCollapsing-don-t-return-every-groups-tp3376036p3376289.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: FieldCollapsing don't return every groups

2011-09-28 Thread lboutros
excellent!

and yes, the weather is very nice in France :)

-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/FieldCollapsing-don-t-return-every-groups-tp3376036p3376362.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Searching multiple fields

2011-09-28 Thread Way Cool
It would be nice if we could have dissum in addition to dismax. ;-)

On Tue, Sep 27, 2011 at 9:26 AM, lee carroll
lee.a.carr...@googlemail.com wrote:

 see


 http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html



 On 27 September 2011 16:04, Mark static.void@gmail.com wrote:
  I thought that a similarity class would only affect the scoring of a
 single
  field, not across multiple fields? Can anyone else chime in with some
  input? Thanks.
 
  On 9/26/11 9:02 PM, Otis Gospodnetic wrote:
 
  Hi Mark,
 
  Eh, I don't have Lucene/Solr source code handy, but I *think* for that
  you'd need to write a custom Lucene similarity.
 
  Otis
  
  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
  Lucene ecosystem search :: http://search-lucene.com/
 
 
  
  From: Mark static.void@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Monday, September 26, 2011 8:12 PM
  Subject: Searching multiple fields
 
  I have a use case where I would like to search across two fields but I
  do not want to weight a document that has a match in both fields higher
  than a document that has a match in only 1 field.
 
  For example.
 
  Document 1
  - Field A: Foo Bar
  - Field B: Foo Baz
 
  Document 2
  - Field A: Foo Blarg
  - Field B: Something else
 
  Now when I search for Foo I would like documents 1 and 2 to be
  similarly scored; however, document 1 will be scored much higher in
  this use case because it matches in both fields. I could create a
  third field and use the copyField directive to search across that, but
  I was wondering if there is an alternative way. It would be nice if we
  could search across some sort of virtual field that uses both
  underlying fields but does not actually increase the size of the index.
 
  Thanks
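
  For reference, the copyField workaround mentioned above would look
  roughly like this in schema.xml (field and type names here are
  illustrative, not from this thread):

    <field name="all_text" type="text" indexed="true" stored="false"
           multiValued="true"/>
    <copyField source="fieldA" dest="all_text"/>
    <copyField source="fieldB" dest="all_text"/>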
 
 
 
 



Re: strange performance issue with many shards on one server

2011-09-28 Thread Ken Krugler
Hi Frederik,

I haven't directly run into this issue with Solr, but I have experienced 
similar issues in a related context.

In my case, I had a custom webapp that made SolrJ requests and then generated 
some aggregated/analyzed results.

During load testing, we ran into a few different issues...

1. The load test software itself had an issue with scaling - I'm assuming 
that's not the case for you, but I've seen it happen more than once.

E.g. there's a limit to max parallel connections in the client being used to 
talk to Solr.

2. We needed to tune up the SolrJ settings for the HttpConnectionManager

Under heavy load, this was running out of free connections.

Given you've got 20 shards, each request is going to spawn 20 HTTP connections.

I don't know off the top of my head how solr.SearchHandler manages connections 
(and whether it's possible to tune this), but from the stack trace below it 
sure looks like you're blocked on getting free HTTP connections.
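
As a sketch of that kind of tuning on the SolrJ/commons-httpclient side (the
pool sizes and URL here are illustrative, not a recommendation; the classes
are the same ones visible in the stack traces quoted below):

  import org.apache.commons.httpclient.HttpClient;
  import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
  import org.apache.commons.httpclient.params.HttpConnectionManagerParams;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

  public class TunedSolrClient {
      public static CommonsHttpSolrServer create(String url) throws Exception {
          // Pool sized above the commons-httpclient defaults
          // (2 connections per host, 20 total).
          MultiThreadedHttpConnectionManager mgr = new MultiThreadedHttpConnectionManager();
          HttpConnectionManagerParams params = new HttpConnectionManagerParams();
          params.setDefaultMaxConnectionsPerHost(100); // illustrative value
          params.setMaxTotalConnections(400);          // illustrative value
          mgr.setParams(params);
          return new CommonsHttpSolrServer(url, new HttpClient(mgr));
      }
  }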

3. We needed to optimize our configuration for Jetty, Ubuntu, JVM GC, etc.

There are lots of knobs to twiddle here, for better or worse.

-- Ken

On Sep 28, 2011, at 5:21am, Frederik Kraus wrote:

 I just had a look at the thread-dump, pasting 3 examples here:
 
 
 'pool-31-thread-8233' Id=11626, BLOCKED on 
 lock=org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool@19dd10d9,
  total cpu time=20.ms user time=20.ms
 at 
 org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool.freeConnection(MultiThreadedHttpConnectionManager.java:982)
  
 at 
 org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.releaseConnection(MultiThreadedHttpConnectionManager.java:643)
  
 at 
 org.apache.commons.httpclient.HttpConnection.releaseConnection(HttpConnection.java:1179)
  
 at 
 org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.releaseConnection(MultiThreadedHttpConnectionManager.java:1423)
  
 at 
 org.apache.commons.httpclient.HttpMethodBase.ensureConnectionRelease(HttpMethodBase.java:2430)
  
 at 
 org.apache.commons.httpclient.HttpMethodBase.responseBodyConsumed(HttpMethodBase.java:2422)
  
 at 
 org.apache.commons.httpclient.HttpMethodBase$1.responseConsumed(HttpMethodBase.java:1892)
  
 at 
 org.apache.commons.httpclient.AutoCloseInputStream.notifyWatcher(AutoCloseInputStream.java:198)
  
 at 
 org.apache.commons.httpclient.AutoCloseInputStream.close(AutoCloseInputStream.java:158)
  
 at 
 org.apache.commons.httpclient.HttpMethodBase.releaseConnection(HttpMethodBase.java:1181)
  
 at 
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:486)
  
 at 
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
  
 at 
 org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421)
  
 at 
 org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393)
  
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) 
 at java.util.concurrent.FutureTask.run(FutureTask.java:138) 
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) 
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) 
 at java.util.concurrent.FutureTask.run(FutureTask.java:138) 
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  
 at java.lang.Thread.run(Thread.java:662) 
 
 'pool-31-thread-8232' Id=11625, BLOCKED on 
 lock=org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool@19dd10d9,
  total cpu time=20.ms user time=20.ms
 at 
 org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:447)
  
 at 
 org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416)
  
 at 
 org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153)
  
 at 
 org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) 
 at 
 org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323) 
 at 
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:427)
  
 at 
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
  
 at 
 org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421)
  
 at 
 org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393)
  
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) 
 at java.util.concurrent.FutureTask.run(FutureTask.java:138) 
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) 
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) 
 at 

synonym filtering at index time

2011-09-28 Thread Doug McKenzie
Trying to add in synonyms at index time but it's not working as 
expected. Here's the schema and example from synonyms.txt


synonyms.txt has :
watch, watches, watche, watchs

schema for the field :
<fieldType name="text_ngram" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_en.txt" enablePositionIncrement="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
            maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

When I run analysis, the index analyzer correctly shows watche => watch, 
which is then EdgeNGrammed.


My understanding of how this is meant to work is that solr will index 
all instances of 'watche' as 'watch' when expand=false


This doesn't seem to be happening though. Any ideas on what I'm missing?

I initially set the synonym filtering to run at query time as it's user 
input; however, that was returning the same results, so I thought it might 
be because those terms were already in the index and would therefore 
show up in the results.


Thanks
Doug




RE: strange performance issue with many shards on one server

2011-09-28 Thread Jaeger, Jay - DOT
That would still show up as the CPU being busy.

-Original Message-
From: Federico Fissore [mailto:feder...@fissore.org] 
Sent: Wednesday, September 28, 2011 6:12 AM
To: solr-user@lucene.apache.org
Subject: Re: strange performance issue with many shards on one server

Frederik Kraus, on 28/09/2011 12:58, wrote:
   Hi,


 I am experiencing a strange issue doing some load tests. Our setup:


just because I've listened to JUG mates talking about that at the last 
meeting, could it be that your CPUs are spending their time getting 
things from RAM to CPU cache?

maybe that, say, 10% CPU power is spent on the bus

federico


Re: strange performance issue with many shards on one server

2011-09-28 Thread Frederik Kraus
 Hi Ken,  

the HttpConnectionManager was actually the first thing I looked at - and bumped 
the Solr default of 20 up to 50, 100, 400, 1 (which should be more or less 
unlimited ;) ). Unfortunately didn't really solve anything. I don't know if the 
static HttpClient is a problem here as it will be the same 
HttpConnectionManager for all shards …

Obviously a way of validating this would be to spawn 20 tomcat (or jetty) 
instances, one for each shard and 10 per server - hopefully there is an easier 
way ;)

By the way: Ubuntu / GC / etc. are all tuned and shouldn't be a bottleneck 
here. The GC only spends about 50-100ms during a 10min load test, and never a 
full-GC.  

Just going through a jstack dump again, it looks like the HttpConnectionManager 
is actually waiting for a lock …

pool-31-thread-15776 prio=10 tid=0x7ef544249000 nid=0x50be waiting for 
monitor entry [0x7ef4d38fc000]
 java.lang.Thread.State: BLOCKED (on object monitor)
 at 
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:447)
 - waiting to lock 0x7f07dd6bfa70 (a 
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
 at 
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416)
 at 
org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153)
 at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
 at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
 at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:427)
….

Fred.  


On Wednesday, 28 September 2011 at 17:48, Ken Krugler wrote:

 Hi Frederik,
  
 I haven't directly run into this issue with Solr, but I have experienced 
 similar issues in a related context.
  
 In my case, I had a custom webapp that made SolrJ requests and then generated 
 some aggregated/analyzed results.
  
 During load testing, we ran into a few different issues...
  
 1. The load test software itself had an issue with scaling - I'm assuming 
 that's not the case for you, but I've seen it happen more than once.
  
 E.g. there's a limit to max parallel connections in the client being used to 
 talk to Solr.
  
 2. We needed to tune up the SolrJ settings for the HttpConnectionManager
  
 Under heavy load, this was running out of free connections.
  
 Given you've got 20 shards, each request is going to spawn 20 HTTP 
 connections.
  
 I don't know off the top of my head how solr.SearchHandler manages 
 connections (and whether it's possible to tune this), but from the stack 
 trace below it sure looks like you're blocked on getting free HTTP 
 connections.
  
 3. We needed to optimize our configuration for Jetty, Ubuntu, JVM GC, etc.
  
 There are lots of knobs to twiddle here, for better or worse.
  
 -- Ken
  
 On Sep 28, 2011, at 5:21am, Frederik Kraus wrote:
  
  I just had a look at the thread-dump, pasting 3 examples here:
   
   
  'pool-31-thread-8233' Id=11626, BLOCKED on 
  lock=org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool@19dd10d9,
   total cpu time=20.ms user time=20.ms
  at 
  org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool.freeConnection(MultiThreadedHttpConnectionManager.java:982)

  at 
  org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.releaseConnection(MultiThreadedHttpConnectionManager.java:643)

  at 
  org.apache.commons.httpclient.HttpConnection.releaseConnection(HttpConnection.java:1179)

  at 
  org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.releaseConnection(MultiThreadedHttpConnectionManager.java:1423)

  at 
  org.apache.commons.httpclient.HttpMethodBase.ensureConnectionRelease(HttpMethodBase.java:2430)

  at 
  org.apache.commons.httpclient.HttpMethodBase.responseBodyConsumed(HttpMethodBase.java:2422)

  at 
  org.apache.commons.httpclient.HttpMethodBase$1.responseConsumed(HttpMethodBase.java:1892)

  at 
  org.apache.commons.httpclient.AutoCloseInputStream.notifyWatcher(AutoCloseInputStream.java:198)

  at 
  org.apache.commons.httpclient.AutoCloseInputStream.close(AutoCloseInputStream.java:158)

  at 
  org.apache.commons.httpclient.HttpMethodBase.releaseConnection(HttpMethodBase.java:1181)

  at 
  org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:486)

  at 
  org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)

  at 
  org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421)

  at 
  org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393)

  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)  
  at 

Re: strange performance issue with many shards on one server

2011-09-28 Thread Frederik Kraus


On Wednesday, 28 September 2011 at 16:40, Toke Eskildsen wrote:

 On Wed, 2011-09-28 at 12:58 +0200, Frederik Kraus wrote:
  - 10 shards per server (needed for response times) running in a single 
  tomcat instance
 
 Have you tested that sharding actually decreases response times in your
 case? I see the idea in decreasing response times with sharding at the
 cost of decreasing throughput, but the added overhead of merging is
 non-trivial.
Yep, unfortunately the queries have huge boolean filter queries for ACLs etc. 
which just take too long to compute in a single thread.

 
  - each query queries all 20 shards (distributed search)
  
  - each shard holds about 1.5 mio documents (small shards are needed due to 
  rather complex queries)
  - all caches are warmed / high cache hit rates (99%) etc.
 
  Now for some reason we cannot seem to fully utilize all CPU power (no disk 
  IO), i.e. at a certain point increasing concurrent users doesn't increase 
  CPU load, decreases throughput and increases the response times of the 
  individual queries.
 
 It sounds as if there's a hard limit on the number of concurrent users
 somewhere. I am no expert in httpclient, but the blocked threads in your
 thread dump seems to indicate that they wait for connections to be
 established rather than for results to be produced.
 
 I seem to remember that tomcat has a default limit on 200 concurrent
 connections and with 10 shards/search, that is just 200 / (10
 shard_connections + 1 incoming_connection) = 18 concurrent searches.
 

I have gradually bumped all of this up to (almost) infinity with no effect ;)
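
For reference, the Tomcat-side limit mentioned above is set on the HTTP
connector in server.xml; the values here are illustrative:

  <Connector port="8080" protocol="HTTP/1.1"
             maxThreads="1000" acceptCount="200" />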


  Also 1-2% of the queries take significantly longer: avg somewhere at 100ms 
  while 1-2% take 1.5s or longer. 
 
 Could be garbage collection, especially since it shows under high load
 which might result in more old objects and thereby trigger full gc.
 GC is only spending something like 50-100ms total for a 10min load test 





Date Faceting | Range Faceting patch not working

2011-09-28 Thread Rohit
Hi,

 

We extensively use date faceting in our application, but now since the index
has become very big we are dividing it into shards. Since date/range faceting
doesn't work on shards, I was trying to apply the patch to my Solr, currently
using 3.1 but planning for a 3.4 upgrade.

 

https://issues.apache.org/jira/browse/SOLR-1709

 

The patch is not applying on either the 3.1 or the 3.4 version; how else can I
apply the patch?

 

Regards,

Rohit

 



Re: Solr messing up the UK GBP (pound) symbol in response, even though Java environment variabe has file encoding is set to UTF 8....

2011-09-28 Thread Ravish Bhagdev
Thanks Chris.  Yes, changing connector settings not just in solr but also in
all webapps that were sending queries into it solved the problem!
 Appreciate the help.

R
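
For reference, the connector setting in question is the URIEncoding attribute
on Tomcat's HTTP connector in server.xml (port and protocol here are
illustrative):

  <Connector port="8080" protocol="HTTP/1.1" URIEncoding="UTF-8" />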

On Tue, Sep 13, 2011 at 6:11 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:


 : Any idea why solr is unable to return the pound sign as-is?
 :
 : I tried typing in £ 1 million in Solr admin GUI and got following
 response.
 ...
 : <str name="q">£ 1 million</str>
...
 : Here is my Java Properties I got also from admin interface:
...
 : catalina.home =
 : /home/rbhagdev/SCCRepos/SCC_Platform/search/solr/target/

 Looks like you are using tomcat, so I suspect you are getting bit by
 this...

 https://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config

 If that's not the problem, please try running the
 example/exampledocs/test_utf8.sh script against your Solr instance (you'll
 need to change the URL variable to match your host:port)


 -Hoss


Re: Solr 3.4 Grouping group.main=true results in java.lang.NoClassDefFound

2011-09-28 Thread Martijn v Groningen
Hi Frank,

How is Solr deployed? And how did you upgrade?
The commons-lang library (containing ArrayUtils) is included in the
Solr war file.

Martijn

On 28 September 2011 09:16, Frank Romweber fr...@romweber.de wrote:
 I use drupal for accessing the solr search engine. After updating an
 creating my new index everthing works as before. Then I activate the
 group=true and group.field=site and solr delivers me the wanted search
 results but in Drupal nothing appears just an empty search page. I found out
 that the group changes the resultset names. No problem solr offers for this
 case the group.main=true parameter. So I added this and get this 500 error.

 HTTP Status 500 - org/apache/commons/lang/ArrayUtils
 java.lang.NoClassDefFoundError: org/apache/commons/lang/ArrayUtils at
 org.apache.solr.search.Grouping$Command.createSimpleResponse(Grouping.java:573)
 [... rest of stack trace snipped ...]

 I found out that solr didt find the class ArrayUtils.class. I try a lot of
 things to get this work. Setting JAVA_HOME and CLASSPATH vars and I changed
 the jre without any success. I am really wondering all my other programms
 are still running even solr in the normal mode is working and accesibly
 but not the group.main=true function.

 So my question is now what is nessesary to get this work?
 Any help is apreciated.

 Thx frank






-- 
Kind regards,

Martijn van Groningen


Solr Hanging While Building Suggester Index

2011-09-28 Thread Stephen Duncan Jr
We have a separate Java process indexing to Solr using SolrJ.  We are
using Solr 3.4.0, and Jetty version 8.0.1.v20110908.  We experienced
Solr hanging today.  For a period of approximately 10 minutes, it did
not respond to queries.  Our indexer sends a query to build a
spellcheck index after committing once it's added all new documents
(because we have auto-commits that we don't want to trigger rebuilding
the spellcheck, we don't use buildOnCommit), and then sends a query to
build the suggest component index.  We see this from the Solr log
during the period it was hung (we attempted to send several queries
during this time, but they do not appear in the log, or appear after
waiting for several minutes):

2011-09-28 13:18:03,217 [qtp10884088-13] INFO
org.apache.solr.core.SolrCore - [report] webapp= path=/select
params={spellcheck=true&qt=dismax&wt=javabin&rows=0&spellcheck.build=true&version=2}
hits=98772 status=0 QTime=173594
2011-09-28 13:28:18,857 [qtp10884088-89] INFO
org.apache.solr.spelling.suggest.Suggester - build()
...
2011-09-28 13:29:02,873 [qtp10884088-89] INFO
org.apache.solr.core.SolrCore - [report] webapp= path=/suggest
params={spellcheck=true&qt=/suggest&wt=javabin&spellcheck.build=true&version=2}
status=0 QTime=44016

In our indexer log, we see just after this (13:28:19,217) the call to
build our suggestion index (which comes right after building the
spellcheck index) times out and throws a NoHttpResponseException: The
server localhost failed to respond.

Any ideas?  Anything else we should look at to help diagnose?
---
Stephen Duncan Jr
www.stephenduncanjr.com


Re: strange performance issue with many shards on one server

2011-09-28 Thread Federico Fissore

Jaeger, Jay - DOT, on 28/09/2011 18:40, wrote:

That  would still show up as the CPU being busy.



I don't know how the program (top, htop, whatever) displays the value, 
but when the CPU has a cache miss, that thread definitely sits and waits 
for a number of clock cycles.

With 130GB of RAM (per server?) I suspect cache misses as a rule.

Just a suspicion however, nothing I'll bet on.


Re: Still too many files after running solr optimization

2011-09-28 Thread Chris Hostetter

: I was worried because when i used to use only Lucene for the same indexing,
: before optimization there are many files but after optimization i always end
: up with just 3 files in my index filder. Just want to find out if this was
: ok.

It sounds like you were most likely using the Compound File Format 
(which causes multiple per-field files to be encapsulated into a single 
file per segment) when you were using Lucene directly (I believe it is the 
default) but in Solr you are not.

check the useCompoundFile setting(s) in your solrconfig.xml

https://lucene.apache.org/java/3_4_0/fileformats.html#Compound%20Files

For most Solr users, the compound file format is a bad idea because it 
can decrease performance -- the only reason to use it is if you are in a 
heavily constrained setup where you need to be very restrictive about the 
number of open file handles.
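
For reference, the setting lives in solrconfig.xml; in the 3.x example
config it appears under both indexDefaults and mainIndex, along these lines:

  <mainIndex>
    <useCompoundFile>false</useCompoundFile>
    ...
  </mainIndex>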


-Hoss


Trouble configuring multicore / accessing admin page

2011-09-28 Thread Joshua Miller
Hello,

I am trying to get SOLR working with multiple cores and have a problem 
accessing the admin page once I configure multiple cores.

Problem:
When accessing the admin page via http://solrhost:8080/solr/admin, I get a 404, 
missing core name in path.

Question:  when using the multicore option, is the standard admin page still 
available?

Environment:
- solr 1.4.1
- Windows server 2008 R2
- Java SE 1.6u27
- Tomcat 6.0.33
- Solr Experience:  none

I have set -Dsolr.solr.home=c:\solr and within that I have a solr.xml with the 
following contents:

<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admij/cores">
    <core name="core0" instanceDir="cores/core0" />
    <core name="core1" instanceDir="cores/core1" />
  </cores>
</solr>

I have copied the example/solr directory to c:\solr and have populated that 
directory with the cores/{core{0,1}} as well as the proper configs and data 
directories within.

When I restart tomcat, it shows a couple of exceptions related to 
queryElevationComponent and null pointers that I think are due to the DB not 
yet being available but I see that the cores appear to initialize properly 
other than that

So the problem I'm looking to solve/clarify here is the admin page - should 
that remain available and usable when using the multicore configuration or am I 
doing something wrong?  Do I need to use the CoreAdminHandler type requests to 
manage multicore instead?

Thanks,
--
Josh Miller
Open Source Solutions Architect
(425) 737-2590
http://itsecureadmin.com/



Re: Trouble configuring multicore / accessing admin page

2011-09-28 Thread Shawn Heisey

On 9/28/2011 1:40 PM, Joshua Miller wrote:

I am trying to get SOLR working with multiple cores and have a problem 
accessing the admin page once I configure multiple cores.

Problem:
When accessing the admin page via http://solrhost:8080/solr/admin, I get a 404, 
missing core name in path.

Question:  when using the multicore option, is the standard admin page still 
available?


When you enable multiple cores, the URL syntax becomes a little 
different.  On 1.4.1 and 3.2.0, I ran into a problem where the trailing 
/ is required on this URL, but that problem seems to be fixed in 3.4.0:


http://host:port/solr/corename/admin/

If you put a defaultCoreName="somecore" into the <cores> tag in 
solr.xml, the original /solr/admin URL should work as well.  I just 
tried it on Solr 3.4.0 and it does work.  According to the wiki, it 
should work in 1.4 as well.  I don't have a 1.4.1 server any more, so I 
can't verify that.


http://wiki.apache.org/solr/CoreAdmin#cores
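
Applied to a solr.xml like the one in this thread, that would look
something like:

  <cores adminPath="/admin/cores" defaultCoreName="core0">
    <core name="core0" instanceDir="cores/core0" />
    <core name="core1" instanceDir="cores/core1" />
  </cores>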

Thanks,
Shawn



Re: Trouble configuring multicore / accessing admin page

2011-09-28 Thread Rahul Warawdekar
Hi Joshua,

Can you try updating your solr.xml as follows:
Specify
<core name="core0" instanceDir="/core0" /> instead of
<core name="core0" instanceDir="cores/core0" />

Basically, remove the extra text "cores" from the instanceDir attribute of
the <core> element.

Just try and let us know if it works.

On Wed, Sep 28, 2011 at 3:40 PM, Joshua Miller jos...@itsecureadmin.comwrote:

 Hello,

 I am trying to get SOLR working with multiple cores and have a problem
 accessing the admin page once I configure multiple cores.

 Problem:
 When accessing the admin page via http://solrhost:8080/solr/admin, I get a
 404, missing core name in path.

 Question:  when using the multicore option, is the standard admin page
 still available?

 Environment:
 - solr 1.4.1
 - Windows server 2008 R2
 - Java SE 1.6u27
 - Tomcat 6.0.33
 - Solr Experience:  none

 I have set -Dsolr.solr.home=c:\solr and within that I have a solr.xml with
 the following contents:

 <solr persistent="true" sharedLib="lib">
   <cores adminPath="/admij/cores">
     <core name="core0" instanceDir="cores/core0" />
     <core name="core1" instanceDir="cores/core1" />
   </cores>
 </solr>

 I have copied the example/solr directory to c:\solr and have populated that
 directory with the cores/{core{0,1}} as well as the proper configs and data
 directories within.

 When I restart tomcat, it shows a couple of exceptions related to
 queryElevationComponent and null pointers that I think are due to the DB not
 yet being available but I see that the cores appear to initialize properly
 other than that

 So the problem I'm looking to solve/clarify here is the admin page - should
 that remain available and usable when using the multicore configuration or
 am I doing something wrong?  Do I need to use the CoreAdminHandler type
 requests to manage multicore instead?

 Thanks,
 --
 Josh Miller
 Open Source Solutions Architect
 (425) 737-2590
 http://itsecureadmin.com/




-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Trouble configuring multicore / accessing admin page

2011-09-28 Thread Joshua Miller
On Sep 28, 2011, at 1:03 PM, Shawn Heisey wrote:

 On 9/28/2011 1:40 PM, Joshua Miller wrote:
 I am trying to get SOLR working with multiple cores and have a problem 
 accessing the admin page once I configure multiple cores.
 
 Problem:
 When accessing the admin page via http://solrhost:8080/solr/admin, I get a 
 404, missing core name in path.
 
 Question:  when using the multicore option, is the standard admin page still 
 available?
 
 When you enable multiple cores, the URL syntax becomes a little different.  
 On 1.4.1 and 3.2.0, I ran into a problem where the trailing / is required on 
 this URL, but that problem seems to be fixed in 3.4.0:
 
 http://host:port/solr/corename/admin/
 
 If you put a defaultCoreName="somecore" into the <cores> tag in solr.xml, the 
 original /solr/admin URL should work as well.  I just tried it on Solr 3.4.0 
 and it does work.  According to the wiki, it should work in 1.4 as well.  I 
 don't have a 1.4.1 server any more, so I can't verify that.
 
 http://wiki.apache.org/solr/CoreAdmin#cores

Hi Shawn,

Thanks for the quick response.

I can't get any of those combinations to work.

I've added defaultCoreName="core0" into the solr.xml and restarted and 
tried the following combinations:

http://host:port/solr/admin
http://host:port/solr/admin/
http://host:port/solr/core0/admin/
…
(and many others)

I'm stuck on 1.4.1 at least temporarily, as I'm taking over an application from 
another resource and need to get it up and running before modifying anything, so 
any help here would be greatly appreciated.

Thanks, 

Josh Miller
Open Source Solutions Architect
(425) 737-2590
http://itsecureadmin.com/

Re: Trouble configuring multicore / accessing admin page

2011-09-28 Thread Joshua Miller
On Sep 28, 2011, at 1:17 PM, Rahul Warawdekar wrote:

 Can you try updating your solr.xml as follows:
 Specify
 core name=core0 instanceDir=/core0 / instead of
 core name=core0 instanceDir=cores/core0 /
 
 Basically remove the extra text cores in the core element from the
 instanceDir attribute.

I gave that a try and it didn't change anything.

Thanks,
Josh


RE: Trouble configuring multicore / accessing admin page

2011-09-28 Thread Robert Petersen
Just go to localhost:8983 (or whatever other port you are using) and use
this path to see all the cores available on the box:

In your example this should give you a core list:

http://solrhost:8080/solr/

-Original Message-
From: Joshua Miller [mailto:jos...@itsecureadmin.com] 
Sent: Wednesday, September 28, 2011 1:18 PM
To: solr-user@lucene.apache.org
Subject: Re: Trouble configuring multicore / accessing admin page

On Sep 28, 2011, at 1:03 PM, Shawn Heisey wrote:

 On 9/28/2011 1:40 PM, Joshua Miller wrote:
 I am trying to get SOLR working with multiple cores and have a
problem accessing the admin page once I configure multiple cores.
 
 Problem:
 When accessing the admin page via http://solrhost:8080/solr/admin, I
get a 404, missing core name in path.
 
 Question:  when using the multicore option, is the standard admin
page still available?
 
 When you enable multiple cores, the URL syntax becomes a little
different.  On 1.4.1 and 3.2.0, I ran into a problem where the trailing
/ is required on this URL, but that problem seems to be fixed in 3.4.0:
 
 http://host:port/solr/corename/admin/
 
 If you put a defaultCoreName="somecore" into the <cores> tag in
solr.xml, the original /solr/admin URL should work as well.  I just
tried it on Solr 3.4.0 and it does work.  According to the wiki, it
should work in 1.4 as well.  I don't have a 1.4.1 server any more, so I
can't verify that.
 
 http://wiki.apache.org/solr/CoreAdmin#cores

Hi Shawn,

Thanks for the quick response.

I can't get any of those combinations to work.

I've added defaultCoreName="core0" into the solr.xml and restarted
and tried the following combinations:

http://host:port/solr/admin
http://host:port/solr/admin/
http://host:port/solr/core0/admin/
...
(and many others)

I'm stuck on 1.4.1 at least temporarily as I'm taking over an
application from another resource and need to get it up and running
before modifying anything so any help here would be greatly appreciated.

Thanks, 

Josh Miller
Open Source Solutions Architect
(425) 737-2590
http://itsecureadmin.com/


Facet mappings

2011-09-28 Thread ntsrikanth
Hi,

  I got a set of values which need to be mapped to a facet. For example, I
want to map the codes 
SC, AC to the facet value 'Catering',
HB to Half Board
AI, IN to All\ inclusive


I tried creating the following in the schema file.

<fieldType name="alpine_field_boardbasis" class="solr.TextField"
           sortMissingLast="true" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="boardbasis_synonyms.txt"
            ignoreCase="true" expand="false"/>
  </analyzer>
</fieldType>

<copyField source="board_basis" dest="Board Basis" />
<field multiValued="false" name="Board Basis" type="field_boardbasis"
       stored="false"/>



And in boardbasis_synonyms.txt

SC => Self\ Catering
CA => Catered\ Chalet
HB => Half\ Board
FB => Full\ Board
RO => Room\ only\ no\ kitchen\ facilities
EM => Self\ catering\ with\ evening\ meal
BB => Bed\ &\ Breakfast
AI, IN => All\ inclusive


But when I do a query
(http://localhost:/solr/collection1/select/?q=brochure_year%3A12&version=2.2&start=0&rows=1&indent=on&facet=true&facet.field=Board%20Basis),

I get the following:
<lst name="Board Basis">
  <int name="catering">455</int>
  <int name="self">455</int>
  <int name="board">281</int>
  <int name="half">243</int>
  <int name="catered">114</int>
  <int name="chalet">114</int>
  <int name="">63</int>
  <int name="bed">63</int>
  <int name="breakfast">63</int>
  <int name="evening">45</int>
  <int name="meal">45</int>
  <int name="with">45</int>
  <int name="full">38</int>
  <int name="all">27</int>
  <int name="inclusive">27</int>
  <int name="facilities">9</int>
  <int name="kitchen">9</int>
  <int name="no">9</int>
  <int name="only">9</int>
  <int name="room">9</int>


I am expecting to see something like:
<lst name="Board Basis">
  <int name="Catered Chalet">455</int>
  <int name="Self Catering">455</int>
  <int name="Half Board">281</int>




Thanks in advance,

Srikanth NT





Re: Trouble configuring multicore / accessing admin page

2011-09-28 Thread Joshua Miller
On Sep 28, 2011, at 1:24 PM, Robert Petersen wrote:

 Just go to localhost:8983 (or whatever other port you are using) and use
 this path to see all the cores available on the box:
 
 In your example this should give you a core list:
 
 http://solrhost:8080/solr/
 

I see "Welcome to Solr!" and "Solr Admin" below that as a link.  When I click 
through the link, I get the 404 error, "missing core name in path".



Thanks,

Josh Miller
Open Source Solutions Architect
(425) 737-2590
http://itsecureadmin.com/




Re: Trouble configuring multicore / accessing admin page

2011-09-28 Thread Shawn Heisey

On 9/28/2011 2:24 PM, Robert Petersen wrote:

Just go to localhost:8983 (or whatever other port you are using) and use
this path to see all the cores available on the box:

In your example this should give you a core list:

http://solrhost:8080/solr/


Now this is interesting.

If I have defaultCoreName in my solr.xml (on 3.4.0), the /solr URL only 
shows one admin link, which takes me to the /solr/admin/ page for my 
default core.  On that page, I do have links to all the other core admin 
pages, as usual.


If I don't have defaultCoreName, /solr shows admin links for all defined 
cores.


A quick search didn't turn up any Jira issues for this.  Is this 
intended behavior?


Thanks,
Shawn



Re: Solr Hanging While Building Suggester Index

2011-09-28 Thread Markus Jelsma
Is this a huge index? Keep in mind that most spellchecker implementations 
rebuild the index, which can stall the entire process if there are millions of 
full text documents to process.

There is a new implementation called DirectSolrSpellchecker that doesn't do a 
complete rebuild. I haven't tried it yet, but it should work with the 
SuggesterComponent. It's still experimental though.
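
For reference, on trunk that implementation is selected via the classname
in the spellcheck component configuration, roughly like this (the component
and field names here are illustrative):

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">direct</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
      <str name="field">content</str>
    </lst>
  </searchComponent>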

 We have a separate Java process indexing to Solr using SolrJ.  We are
 using Solr 3.4.0, and Jetty version 8.0.1.v20110908.  We experienced
 Solr hanging today.  For a period of approximately 10 minutes, it did
 not respond to queries.  Our indexer sends a query to build a
 spellcheck index after committing once it's added all new documents
 (because we have auto-commits that we don't want to trigger rebuilding
 the spellcheck, we don't use buildOnCommit), and then sends a query to
 build the suggest component index.  We see this from the Solr log
 during the period it was hung (we attempted to send several queries
 during this time, but they do not appear in the log, or appear after
 waiting for several minutes):
 
 2011-09-28 13:18:03,217 [qtp10884088-13] INFO
 org.apache.solr.core.SolrCore - [report] webapp= path=/select
 params={spellcheck=true&qt=dismax&wt=javabin&rows=0&spellcheck.build=true&v
 ersion=2} hits=98772 status=0 QTime=173594
 2011-09-28 13:28:18,857 [qtp10884088-89] INFO
 org.apache.solr.spelling.suggest.Suggester - build()
 ...
 2011-09-28 13:29:02,873 [qtp10884088-89] INFO
 org.apache.solr.core.SolrCore - [report] webapp= path=/suggest
 params={spellcheck=true&qt=/suggest&wt=javabin&spellcheck.build=true&versio
 n=2} status=0 QTime=44016
 
 In our indexer log, we see just after this (13:28:19,217) the call to
 build our suggestion index (which comes right after building the
 spellcheck index) times out and throws a NoHttpResponseException: The
 server localhost failed to respond.
 
 Any ideas?  Anything else we should look at to help diagnose?
 ---
 Stephen Duncan Jr
 www.stephenduncanjr.com


RE: strange performance issue with many shards on one server

2011-09-28 Thread Jaeger, Jay - DOT
Yes, that thread waits (in the sense that nothing useful gets done), but during 
that time, from the perspective of the applications and OS, that CPU is busy: 
it is not waiting in such a way that you can dispatch a different process.

The point is that if this were actually the problem, it would show up as a 
higher CPU utilization than the correspondent reported.

-Original Message-
From: Federico Fissore [mailto:feder...@fissore.org] 
Sent: Wednesday, September 28, 2011 2:04 PM
To: solr-user@lucene.apache.org
Subject: Re: strange performance issue with many shards on one server

Jaeger, Jay - DOT, on 28/09/2011 18:40, wrote:
 That  would still show up as the CPU being busy.


I don't know how the program (top, htop, whatever) displays the value, 
but when the CPU has a cache miss, that thread definitely sits and waits 
for a number of clock cycles.
With 130GB of RAM (per server?) I suspect cache misses as a rule.

Just a suspicion however, nothing I'll bet on.


RE: Trouble configuring multicore / accessing admin page

2011-09-28 Thread Jaeger, Jay - DOT
One time when we had that problem, it was because one or more cores had a 
broken XML configuration file. 
Another time, it was because solr/home was not set right in the servlet 
container.
Another time it was because we had an older EAR pointing to a newer release 
Solr home directory.  Given what you did, I suppose that is possible in your 
case, too.

In all cases, the Solr log provided hints as to what was going wrong.

JRJ

-Original Message-
From: Joshua Miller [mailto:jos...@itsecureadmin.com] 
Sent: Wednesday, September 28, 2011 2:41 PM
To: solr-user@lucene.apache.org
Subject: Trouble configuring multicore / accessing admin page

Hello,

I am trying to get SOLR working with multiple cores and have a problem 
accessing the admin page once I configure multiple cores.

Problem:
When accessing the admin page via http://solrhost:8080/solr/admin, I get a 404, 
missing core name in path.

Question:  when using the multicore option, is the standard admin page still 
available?

Environment:
- solr 1.4.1
- Windows server 2008 R2
- Java SE 1.6u27
- Tomcat 6.0.33
- Solr Experience:  none

I have set -Dsolr.solr.home=c:\solr and within that I have a solr.xml with the 
following contents:

<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admij/cores">
    <core name="core0" instanceDir="cores/core0" />
    <core name="core1" instanceDir="cores/core1" />
  </cores>
</solr>

I have copied the example/solr directory to c:\solr and have populated that 
directory with the cores/{core{0,1}} as well as the proper configs and data 
directories within.

When I restart tomcat, it shows a couple of exceptions related to 
queryElevationComponent and null pointers that I think are due to the DB not 
yet being available but I see that the cores appear to initialize properly 
other than that

So the problem I'm looking to solve/clarify here is the admin page - should 
that remain available and usable when using the multicore configuration or am I 
doing something wrong?  Do I need to use the CoreAdminHandler type requests to 
manage multicore instead?

Thanks,
--
Josh Miller
Open Source Solutions Architect
(425) 737-2590
http://itsecureadmin.com/



UIMA DictionaryAnnotator partOfSpeach

2011-09-28 Thread chanhangfai
Hi all,

I have the UIMA DictionaryAnnotator running with Solr; 
I used my own dictionary file and it works: 
it matches all the words (Nouns, Verbs and Adjectives) from my dictionary
file.

*but now, if I only want to match Nouns (and ignore other parts of speech),*

how can I configure it?


http://uima.apache.org/d/uima-addons-current/DictionaryAnnotator/DictionaryAnnotatorUserGuide.html

From the above user guide, in section 3.3 (Input Match Type Filters),
I added the following code to my DictionaryAnnotatorDescriptor.xml:

<nameValuePair>
   <name>InputMatchFilterFeaturePath</name>
   <value>
      <string>partOfSpeach</string>
   </value>
</nameValuePair>

<nameValuePair>
   <name>FilterConditionOperator</name>
   <value>
      <string>EQUALS</string>
   </value>
</nameValuePair>

<nameValuePair>
   <name>FilterConditionValue</name>
   <value>
      <string>noun</string>
   </value>
</nameValuePair>


but it fails, and the error says the featurePathElementNames value *partOfSpeach* is
invalid:

org.apache.uima.annotator.dict_annot.impl.DictionaryAnnotatorProcessException:
EXCEPTION MESSAGE LOCALIZATION FAILED: java.util.MissingResourceException:
Can't find bundle for base name
org.apache.uima.annotator.dict_annot.dictionaryAnnotatorMessages, locale
en_US
at
org.apache.uima.annotator.dict_annot.impl.FeaturePathInfo_impl.typeSystemInit(FeaturePathInfo_impl.java:110)
at
org.apache.uima.annotator.dict_annot.impl.DictionaryAnnotator.typeSystemInit(DictionaryAnnotator.java:383)
at
org.apache.uima.analysis_component.CasAnnotator_ImplBase.checkTypeSystemChange(CasAnnotator_ImplBase.java:100)
at
org.apache.uima.analysis_component.CasAnnotator_ImplBase.process(CasAnnotator_ImplBase.java:55)
at
org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:377)
at
org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:295)
at
org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:567)
at
org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:409)
at
org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:342)
at
org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:267)
at
org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
at
org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:280)



Any idea please, 
Thanks in advance..

Frankie




RE: Trouble configuring multicore / accessing admin page

2011-09-28 Thread Jaeger, Jay - DOT
<cores adminPath="/admij/cores">

Was that a cut and paste?  If so, the /admij/cores is presumably incorrect, and 
ought to be /admin/cores

-Original Message-
From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov] 
Sent: Wednesday, September 28, 2011 4:10 PM
To: solr-user@lucene.apache.org
Subject: RE: Trouble configuring multicore / accessing admin page

One time when we had that problem, it was because one or more cores had a 
broken XML configuration file. 
Another time, it was because solr/home was not set right in the servlet 
container.
Another time it was because we had an older EAR pointing to a newer release 
Solr home directory.  Given what you did, I suppose that is possible in your 
case, too.

In all cases, the Solr log provided hints as to what was going wrong.

JRJ

-Original Message-
From: Joshua Miller [mailto:jos...@itsecureadmin.com] 
Sent: Wednesday, September 28, 2011 2:41 PM
To: solr-user@lucene.apache.org
Subject: Trouble configuring multicore / accessing admin page

Hello,

I am trying to get SOLR working with multiple cores and have a problem 
accessing the admin page once I configure multiple cores.

Problem:
When accessing the admin page via http://solrhost:8080/solr/admin, I get a 404, 
missing core name in path.

Question:  when using the multicore option, is the standard admin page still 
available?

Environment:
- solr 1.4.1
- Windows server 2008 R2
- Java SE 1.6u27
- Tomcat 6.0.33
- Solr Experience:  none

I have set -Dsolr.solr.home=c:\solr and within that I have a solr.xml with the 
following contents:

<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admij/cores">
    <core name="core0" instanceDir="cores/core0" />
    <core name="core1" instanceDir="cores/core1" />
  </cores>
</solr>

I have copied the example/solr directory to c:\solr and have populated that 
directory with the cores/{core{0,1}} as well as the proper configs and data 
directories within.

When I restart tomcat, it shows a couple of exceptions related to 
queryElevationComponent and null pointers that I think are due to the DB not 
yet being available but I see that the cores appear to initialize properly 
other than that

So the problem I'm looking to solve/clarify here is the admin page - should 
that remain available and usable when using the multicore configuration or am I 
doing something wrong?  Do I need to use the CoreAdminHandler type requests to 
manage multicore instead?

Thanks,
--
Josh Miller
Open Source Solutions Architect
(425) 737-2590
http://itsecureadmin.com/



Re: Trouble configuring multicore / accessing admin page

2011-09-28 Thread Joshua Miller
On Sep 28, 2011, at 2:11 PM, Jaeger, Jay - DOT wrote:

   <cores adminPath="/admij/cores">
 
 Was that a cut and paste?  If so, the /admij/cores is presumably incorrect, 
 and ought to be /admin/cores
 

No, that was a typo -- the config file is correct with admin/cores.  Thanks for 
pointing out the mistake here.


Josh Miller
Open Source Solutions Architect
(425) 737-2590
http://itsecureadmin.com/




Re: strange performance issue with many shards on one server

2011-09-28 Thread Frederik Kraus
 Yep, I'm not getting more than 50-60% CPU during those load tests. 


On Wednesday, 28 September 2011 at 23:01, Jaeger, Jay - DOT wrote:

 Yes, that thread waits (in the sense that nothing useful gets done), but 
 during that time, from the perspective of the applications and OS, that CPU 
 is busy: it is not waiting in such a way that you can dispatch a different 
 process.
 
 The point is, that if this was actually the problem, it would show up in a 
 higher CPU utilization than the correspondent reported.
 
 -Original Message-
 From: Federico Fissore [mailto:feder...@fissore.org] 
 Sent: Wednesday, September 28, 2011 2:04 PM
 To: solr-user@lucene.apache.org (mailto:solr-user@lucene.apache.org)
 Subject: Re: strange performance issue with many shards on one server
 
 Jaeger, Jay - DOT, on 28/09/2011 18:40, wrote:
  That would still show up as the CPU being busy.
 
 I don't know how the program (top, htop, whatever) displays the value, 
 but when the CPU has a cache miss, that thread definitely sits and waits 
 for a number of clock cycles.
 With 130GB of RAM (per server?) I suspect cache misses as a rule.
 
 just a suspicion however, nothing I'll bet on




Re: Solr Hanging While Building Suggester Index

2011-09-28 Thread Stephen Duncan Jr
No, this is on a test system that is still smallish, approx 100,000
records of dummy data with Wikipedia articles as content at the time
this occurred.

I wouldn't expect rebuilding the index to stall the entire JVM, that
seems excessive...

Stephen Duncan Jr
www.stephenduncanjr.com



On Wed, Sep 28, 2011 at 4:43 PM, Markus Jelsma
markus.jel...@openindex.io wrote:
 Is this a huge index? Keep in mind that most spellchecker implementations
 rebuild the index which can stall the entire process if there are millions of
 full text documents to process.

 There is a new implementation called DirectSolrSpellchecker that doesn't do a
 complete rebuild. I haven't tried it yet, but it should work with the
 SuggesterComponent. It's still experimental though.

 We have a separate Java process indexing to Solr using SolrJ.  We are
 using Solr 3.4.0, and Jetty version 8.0.1.v20110908.  We experienced
 Solr hanging today.  For a period of approximately 10 minutes, it did
 not respond to queries.  Our indexer sends a query to build a
 spellcheck index after committing once it's added all new documents
 (because we have auto-commits that we don't want to trigger rebuilding
 the spellcheck, we don't use buildOnCommit), and then sends a query to
 build the suggest component index.  We see this from the Solr log
 during the period it was hung (we attempted to send several queries
 during this time, but they do not appear in the log, or appear after
 waiting for several minutes):

 2011-09-28 13:18:03,217 [qtp10884088-13] INFO
 org.apache.solr.core.SolrCore - [report] webapp= path=/select
 params={spellcheck=true&qt=dismax&wt=javabin&rows=0&spellcheck.build=true&v
 ersion=2} hits=98772 status=0 QTime=173594
 2011-09-28 13:28:18,857 [qtp10884088-89] INFO
 org.apache.solr.spelling.suggest.Suggester - build()
 ...
 2011-09-28 13:29:02,873 [qtp10884088-89] INFO
 org.apache.solr.core.SolrCore - [report] webapp= path=/suggest
 params={spellcheck=true&qt=/suggest&wt=javabin&spellcheck.build=true&versio
 n=2} status=0 QTime=44016

 In our indexer log, we see just after this (13:28:19,217) the call to
 build our suggestion index (which comes right after building the
 spellcheck index) times out and throws a NoHttpResponseException: The
 server localhost failed to respond.

 Any ideas?  Anything else we should look at to help diagnose?
 ---
 Stephen Duncan Jr
 www.stephenduncanjr.com



Re: Facet mappings

2011-09-28 Thread Koji Sekiguchi

(11/09/29 5:38), ntsrikanth wrote:

Hi,

   I got a set of values which needs to be mapped to a facet. For example, I
want to map the codes
SC, AC to the facet value 'Catering',
HB to Half Board
AI, IN to All\ inclusive


I tried creating the following in the schema file.

<fieldType name="alpine_field_boardbasis" class="solr.TextField"
           sortMissingLast="true" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="boardbasis_synonyms.txt"
            ignoreCase="true" expand="false"/>
  </analyzer>
</fieldType>


Use KeywordTokenizerFactory instead of StandardTokenizerFactory. The factory class
should also be specified in the <filter/> for the synonyms, like:

<filter class="solr.SynonymFilterFactory"
        tokenizerFactory="solr.KeywordTokenizerFactory" .../>

as it uses WhitespaceTokenizerFactory to analyze synonyms.txt if a tokenizer factory is
not specified.
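
Putting the two changes together, the field type would look something like
this (an untested sketch based on the schema snippet above):

  <fieldType name="alpine_field_boardbasis" class="solr.TextField"
             sortMissingLast="true" omitNorms="true">
    <analyzer type="index">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="boardbasis_synonyms.txt"
              ignoreCase="true" expand="false"
              tokenizerFactory="solr.KeywordTokenizerFactory"/>
    </analyzer>
  </fieldType>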

koji
--
Check out Query Log Visualizer for Apache Solr
http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
http://www.rondhuit.com/en/


Re: Boost Exact matches on Specific Fields

2011-09-28 Thread Way Cool
I will give str_category more weight than ts_category because we want
str_category to win if they have exact matches (you converted to
lowercase).

On Mon, Sep 26, 2011 at 10:23 PM, Balaji S mcabal...@gmail.com wrote:

 Hi

   Do you mean to copy the String field to a Text field, or the reverse?
 This is the approach I am currently following:

 Step 1: Created a FieldType


 <fieldType name="string_lower" class="solr.TextField"
            sortMissingLast="true" omitNorms="true">
   <analyzer>
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory" />
     <filter class="solr.TrimFilterFactory" />
   </analyzer>
 </fieldType>

 Step 2: <field name="str_category" type="string_lower" indexed="true"
 stored="true"/>

 Step 3: <copyField source="ts_category" dest="str_category"/>

 And in the Solr query I am planning to use q=hospitals&qf=body^4.0 title^5.0
 ts_category^10.0 str_category^8.0


  The one question I have here is: all the above-mentioned fields will have
  Hospital present in them; will the above approach work to get the exact
  match to the top and bring Hospitalization below it in the results?


 Thanks
 Balaji


 On Tue, Sep 27, 2011 at 9:38 AM, Way Cool way1.wayc...@gmail.com wrote:

  If I were you, probably I will try defining two fields:
  1. ts_category as a string type
  2. ts_category1 as a text_en type
  Make sure copy ts_category to ts_category1.
 
  You can use the following as qf in your dismax:
  qf=body^4.0 title^5.0 ts_category^10.0 ts_category1^5.0
  or something like that.
 
  YH
  http://thetechietutorials.blogspot.com/
 
 
  On Mon, Sep 26, 2011 at 2:06 PM, balaji mcabal...@gmail.com wrote:
 
   Hi all
  
  I am new to SOLR and have a doubt on Boosting the Exact Terms to the
  top
   on a Particular field
  
   For ex :
  
   I have a text field names ts_category and I want to give more boost
  to
   this field rather than other fields, SO in my Query I pass the
 following
  in
   the QF params qf=body^4.0 title^5.0 ts_category^21.0 and also sort on
   SCORE desc
  
    When I do a search against Hospitals, I get Hospitalization
    Management and Hospital Equipment & Supplies on top rather than the
    exact matches of Hospitals.
  
So It would be great , If I could be helped over here
  
  
   Thanks
   Balaji
  
  
  
  
  
  
  
   Thanks in Advance
   Balaji
  
  
 



Re: strange performance issue with many shards on one server

2011-09-28 Thread Federico Fissore

Frederik Kraus, on 28/09/2011 23:16, wrote:

  Yep, I'm not getting more than 50-60% CPU during those load tests.



I would try reducing the number of shards. Apart from the memory 
discussion, this really seems to me a concurrency issue: too many 
threads waiting for other threads to complete, too many context switches...


Recently, on a lots-of-cores database server, we INCREASED speed by 
REDUCING the number of cores/threads each query was allowed to use 
(making sense of our customer's investment).
Maybe you can get a similar effect by reducing the number of pieces your 
distributed search has to merge.


my 2 eurocents

federico


Re: strange performance issue with many shards on one server

2011-09-28 Thread Lance Norskog
Some cache hit problems can be fixed with the Large Pages feature.

http://www.google.com/search?q=large+pages

On Wed, Sep 28, 2011 at 3:30 PM, Federico Fissore feder...@fissore.orgwrote:

 Frederik Kraus, on 28/09/2011 23:16, wrote:

   Yep, I'm not getting more than 50-60% CPU during those load tests.


  I would try reducing the number of shards. Apart from the memory
  discussion, this really seems to me a concurrency issue: too many threads
  waiting for other threads to complete, too many context switches...

 recently, on a lots-of-cores database server, we INCREASED speed by
 REDUCING the number of cores/threads each query was allowed to use (making
 sense of our customer investment)
 maybe you can get a similar effect by reducing the number of pieces your
 distributed search has to merge

 my 2 eurocents

 federico




-- 
Lance Norskog
goks...@gmail.com


Re: Questions about LocalParams syntax

2011-09-28 Thread Chris Hostetter

: 1.)  How should I deal with repeating parameters?  If I use multiple 
: boost queries, it seems that only the last one listed is used...  for 
: example:
: 
: ((_query_:"{!dismax qf=\"title^500 author^300 allfields\" 
bq=\"format:Book^50\" bq=\"format:Journal^150\"}test"))

Hmmm... that's either a bug or a silly limitation in the local params 
parsing -- I've filed a Jira for it, but I have no idea what the fix is (or 
if it was intentional for some odd reason)

https://issues.apache.org/jira/browse/SOLR-2798

...if you are interested in digging into the code to see what the cause might 
be and helping to work on a patch, that would be awesome.

: 2.)  What is the proper way to escape quotes?  Since there are multiple 
: nested layers of double quotes, things get ugly and it's easy to end up 
: with syntax errors.  I found that this syntax doesn't cause an error:
...
: ((_query_:"{!dismax qf=\"title^500 author^300 allfields\" 
bq=\"format:\\\"Book\\\"^50\" bq=\"format:\\\"Journal\\\"^150\"}test"))

backslash escaping should work, but you need to keep in mind that both the 
LocalParam syntax and most query parsers treat '"' and '\' as significant 
characters, so you may have to escape them more times than you think

For instance, even w/o local params, if you wanted a bq that contained a 
literal '"', you'd need to escape it for the lucene query parser...

bq=foo_s:inner\"quote OR foo_s:other

if you then wanted to use that bq as a quoted local param, you'd need to 
escape both the '\' and the original '"' again ...

q={!dismax bq="foo_s:inner\\\"quote OR foo_s:other"}foo

...and if you then wanted to use that entire {!dismax ... } string inside 
of a quoted expression using the _query_ hook of the Lucene QParser 
(which is what it looks like you are doing) you would need to escape *all* 
of those '\' and '"' characters once more

q=bob OR _query_:"{!dismax bq=\"foo_s:inner\\\"quote OR foo_s:other\"}foo"

...and it should work (it does for me)

But the other thing you can do to make your life a *lot* simpler is to 
leverage parameter dereferencing and put each logical query string into 
its own parameter...

q=bob OR _query_:"{!dismax bq=$myBq}foo"
   myBq=foo_s:inner\"quote OR foo_s:other

...or really make your life easy...

qq=foo
   q=bob OR _query_:"{!dismax bq=$myBq v=$qq}"
   myBq=foo_s:inner\"quote OR foo_s:other
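
As an aside, the same dereferencing trick keeps client code readable too.
A minimal SolrJ sketch (an illustration only, assuming SolrJ 3.x and a local
Solr instance; the field names are just the ones from this thread):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DerefQueryExample {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery query = new SolrQuery();
        // the main query only references $myBq and $qq ...
        query.set("q", "bob OR _query_:\"{!dismax bq=$myBq v=$qq}\"");
        // ... so the referenced params need no extra layers of escaping
        query.set("qq", "foo");
        query.set("myBq", "foo_s:inner\\\"quote OR foo_s:other");
        QueryResponse rsp = server.query(query);
        System.out.println(rsp.getResults().getNumFound() + " hits");
    }
}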






-Hoss


Re: Boost Exact matches on Specific Fields

2011-09-28 Thread Balaji S
Yeah, I will change the weight for str_category and make it higher. I
converted it to lowercase because we cannot expect users to type terms in
the correct case.
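
For illustration only, the final request might then look something like this
sketch (placeholder weights, keeping str_category above ts_category as
suggested in the quoted reply below):

http://localhost:8983/solr/select?defType=dismax&q=hospitals&qf=body^4.0+title^5.0+ts_category^10.0+str_category^20.0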

Thanks
Balaji

On Thu, Sep 29, 2011 at 3:52 AM, Way Cool way1.wayc...@gmail.com wrote:

 I would give str_category more weight than ts_category, because we want
 str_category to win when there are exact matches (you converted it to
 lowercase).

 On Mon, Sep 26, 2011 at 10:23 PM, Balaji S mcabal...@gmail.com wrote:

  Hi
 
    Do you mean to copy the String field to a Text field, or the reverse?
   This is the approach I am currently following:
 
  Step 1: Created a FieldType
 
 
   <fieldType name="string_lower" class="solr.TextField"
              sortMissingLast="true" omitNorms="true">
     <analyzer>
       <tokenizer class="solr.KeywordTokenizerFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.TrimFilterFactory"/>
     </analyzer>
   </fieldType>
 
   Step 2: <field name="str_category" type="string_lower" indexed="true" stored="true"/>
  
   Step 3: <copyField source="ts_category" dest="str_category"/>
 
   And in the Solr query I am planning to use q=hospitals&qf=body^4.0 title^5.0
   ts_category^10.0 str_category^8.0
 
 
   The one question I have here: all the above-mentioned fields will have
   Hospital present in them. Will the above approach work to get the exact
   match to the top and rank Hospitalization below it in the results?
 
 
  Thanks
  Balaji
 
 
  On Tue, Sep 27, 2011 at 9:38 AM, Way Cool way1.wayc...@gmail.com
 wrote:
 
    If I were you, I would probably try defining two fields:
   1. ts_category as a string type
   2. ts_category1 as a text_en type
    Make sure to copy ts_category to ts_category1.
  
   You can use the following as qf in your dismax:
   qf=body^4.0 title^5.0 ts_category^10.0 ts_category1^5.0
   or something like that.
  
   YH
   http://thetechietutorials.blogspot.com/
  
  
   On Mon, Sep 26, 2011 at 2:06 PM, balaji mcabal...@gmail.com wrote:
  
Hi all
   
    I am new to Solr and have a question about boosting exact terms to the
     top on a particular field.
   
For ex :
   
     I have a text field named ts_category and I want to give more boost to
     this field than to other fields, so in my query I pass the following in
     the qf params: qf=body^4.0 title^5.0 ts_category^21.0, and I also sort on
     score desc.
   
     When I do a search for Hospitals, I get Hospitalization Management
     and Hospital Equipment & Supplies on top rather than the exact
     matches for Hospitals.
   
     So it would be great if I could get some help here.
   
   
Thanks
Balaji
   
   
   
   
   
   
   
Thanks in Advance
Balaji
   
--
View this message in context:
   
  
 
 http://lucene.472066.n3.nabble.com/Boost-Exact-matches-on-Specific-Fields-tp3370513p3370513.html
Sent from the Solr - User mailing list archive at Nabble.com.
   
  
 



Re: LocalParams, bq, and highlighting

2011-09-28 Thread Chris Hostetter


: I've run into another strange behavior related to LocalParams syntax in 
: Solr 1.4.1.  If I apply Dismax boosts using bq in LocalParams syntax, 
: the contents of the boost queries get used by the highlighter.  
: Obviously, when I use bq as a separate parameter, this is not an issue.
...
: Is this a known limitation of the highlighter, or is it a bug?  Is this 
: issue resolved in newer versions of Solr?

I *think* what you're encountering here is just an inherent property of 
how the highlighter works.

HighlightComponent asks the QueryComponent and/or default QParser for the 
highlight query to extract terms from for highlighting.  

With a request like this...

http://localhost:8983/solr/select?defType=dismax&q=solr&hl=true&fl=name&hl.fl=name&bq=server

...DismaxQParser is the default query parser, and because of how it 
is designed to work (and designed to be used) it assumes that the main 
query should just be what's in the q param and not the other clauses 
like bq that were added to it for searching.

In a query like this however...

http://localhost:8983/solr/select?q=inStock:true+AND+_query_:"{!dismax}solr"&hl=true&fl=name&hl.fl=name&bq=server

...LuceneQParser is the default query parser, and it doesn't know/care 
what all of the subclauses are, or where they came from, or whether they 
are significant enough to the user that they should be included in the 
highlighting or not.  It just knows that it has a query, so it gives it to 
the highlighter.

So it is what it is.

This is definitely an interesting case that i don't think anyone ever 
really considered before.  It seems like a strong argument in favor of 
adding an hl.q param that the HighlightComponent would use as an 
override for whatever the QueryComponent thinks the highlighting query 
should be; that way, people expressing complex queries like the one you 
describe could do something like...

qq=solr
q=inStock:true AND _query_:"{!dismax v=$qq}"
hl.q={!v=$qq}
hl=true
fl=name
hl.fl=name
bq=server

...what do you think?

wanna file a Jira requesting this as a feature?  Pretty sure the change 
would only require a few lines of code (but of course we'd also need JUnit 
tests which would probably be several dozen lines of code)



-Hoss


Re: strategy for post-processing answer set

2011-09-28 Thread Chris Hostetter

: it looks to me as if Solr just brings back the URLs. what I want to do is to
: get the actual documents in the answer set, simplify their HTML and remove
: all the javascript, ads, etc., and append them into a single document.
: 
: Now ... does Nutch already have the documents? can I get them from its db?
: or do I have to go get the documents again with something like a wget?

i *think* what you are saying is that:

a) you built your index using nutch
b) when you query Solr, you only get back a url field for each matching 
document 
c) what you want is to combine the whole text of the webpages corresponding to 
all of those urls into one massive html page

If that's the case, then you should either:

1) ask on the nutch-user mailing list about how to store the whole 
content of web pages that nutch crawls so you can build up a page like 
this (nutch may already be doing it, i don't know -- depends on the 
schema)

2) write custom client code (probably outside the scope of Velocity) to 
re-fetch these urls at query time, then parse them and combine them as you 
see fit (a rough sketch of this option appears below).

which approach is right for you all depends on your goals and use case -- 
but solr can only give you back the fields you store in it.
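
A rough illustration of option 2 (hypothetical code, not from the thread: it
assumes the url list has already been pulled out of a Solr response, and it
uses crude regex cleanup where a real implementation would want a proper
HTML parser):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.List;

public class PageMerger {
    /** Fetch each url, strip scripts and markup, and append into one page. */
    public static String merge(List<String> urls) throws IOException {
        StringBuilder merged = new StringBuilder("<html><body>\n");
        for (String u : urls) {
            BufferedReader in = new BufferedReader(
                new InputStreamReader(new URL(u).openStream(), "UTF-8"));
            StringBuilder page = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                page.append(line).append('\n');
            }
            in.close();
            String text = page.toString()
                .replaceAll("(?is)<script.*?</script>", "")  // drop javascript
                .replaceAll("(?s)<[^>]+>", " ");             // drop remaining tags
            merged.append("<div>").append(text).append("</div>\n");
        }
        return merged.append("</body></html>").toString();
    }
}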

-Hoss


Re: autosuggest combination of data from documents and popular queries

2011-09-28 Thread Chris Hostetter

: If user starts typing m, I will show mango as a suggestion. And other
: suggestions should come from the document title in index. So if I have a
: document in index with title Man .. so suggestions would be
: mango
: man
...
: Is this doable ? any options ?

It's totally doable, and you've already done the hard part by building up 
a database of the popular queries you want to seed the suggestions with, 
and building up a suggestion index where each document corresponds to a 
single suggestion.
  
but in order to also have suggestions come from the fields of your 
main index, you'll need to also add them as individual documents to that same 
suggestion index.

you could either get those field values from whatever original source you 
used, or crawl your own solr index.  If you want individual *terms* 
from the index to be added as suggestions, then the LukeRequestHandler or 
the TermsComponent would probably be the easiest way to extract them.
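
For example, a TermsComponent request along these lines (a sketch only; it
assumes a /terms handler is registered and the titles live in a field named
title) would pull out indexed terms with their document counts:

http://localhost:8983/solr/terms?terms.fl=title&terms.prefix=m&terms.sort=count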

-Hoss


Re: Solr Cloud Number of Shard Limitation?

2011-09-28 Thread Jamie Johnson
Thanks Mark found the TODO in ZkStateReader.java

// TODO: - possibly: incremental update rather than reread everything

Was there a patch they provided back to address this?

On Tue, Sep 27, 2011 at 9:20 PM, Mark Miller markrmil...@gmail.com wrote:

 On Sep 26, 2011, at 11:42 AM, Jamie Johnson wrote:

 Is there any limitation, be it technical or for sanity reasons, on the
 number of shards that can be part of a solr cloud implementation?


 The loggly guys ended up hitting a limit somewhere. Essentially, whenever the 
 cloud state is updated, info is read about each shard to update the state 
 (from zookeeper). There is a TODO that I put in there that says something 
 like, consider updating this incrementally - usually the data on most 
 shards has not changed, so no reason to read it all. They implemented that 
 today in their own code, but we have not yet done this in trunk. What that 
 places the upper limit at, I don't know - I imagine it takes quite a few 
 shards before it ends up being too much of a problem - they shard by user I 
 believe, so lots of shards.


 - Mark Miller
 lucidimagination.com
 2011.lucene-eurocon.org | Oct 17-20 | Barcelona



Re: Solr Cloud Number of Shard Limitation?

2011-09-28 Thread Mark Miller
No, we don't have any patches for it yet. You might make a JIRA issue for it?

I think the big win is a fairly easy one - basically, right now when we update 
the cloud state, we look at the children of the 'shards' node, and then we read 
the data at each node individually. I imagine this is the part that breaks down 
:)

We likely already have most of that info though - really, you should just 
have to compare the children of the 'shards' node with the list we already have 
from the last time we got the cloud state - remove any that are no longer in 
the list, read the data for those not in the list, and get your new state 
efficiently.

- Mark Miller
lucidimagination.com
2011.lucene-eurocon.org | Oct 17-20 | Barcelona

On Sep 28, 2011, at 10:35 PM, Jamie Johnson wrote:

 Thanks Mark found the TODO in ZkStateReader.java
 
 // TODO: - possibly: incremental update rather than reread everything
 
 Was there a patch they provided back to address this?
 
 On Tue, Sep 27, 2011 at 9:20 PM, Mark Miller markrmil...@gmail.com wrote:
 
 On Sep 26, 2011, at 11:42 AM, Jamie Johnson wrote:
 
 Is there any limitation, be it technical or for sanity reasons, on the
 number of shards that can be part of a solr cloud implementation?
 
 
 The loggly guys ended up hitting a limit somewhere. Essentially, whenever 
 the cloud state is updated, info is read about each shard to update the 
 state (from zookeeper). There is a TODO that I put in there that says 
 something like, consider updating this incrementally - usually the data on 
 most shards has not changed, so no reason to read it all. They implemented 
 that today in their own code, but we have not yet done this in trunk. What 
 that places the upper limit at, I don't know - I imagine it takes quite a 
 few shards before it ends up being too much of a problem - they shard by 
  user I believe, so lots of shards.
 
 
 - Mark Miller
 lucidimagination.com
 2011.lucene-eurocon.org | Oct 17-20 | Barcelona
 
 
Re: autosuggest combination of data from documents and popular queries

2011-09-28 Thread abhayd
hi hoss,
This helps..
But as I understand it, TermsComponent does not allow sorting on popularity,
just count|index. Or am I missing something?

If TermsComponent allowed custom sorting, I wouldn't even have to use ngrams.

Any thoughts?

abhay



--
View this message in context: 
http://lucene.472066.n3.nabble.com/autosuggest-combination-of-data-from-documents-and-popular-queries-tp3360657p3378096.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Cloud Number of Shard Limitation?

2011-09-28 Thread Jamie Johnson
I'll definitely create a JIRA for this.  Looking at the code in
CloudState I think we could do the following:

as we iterate over shardIdNames we check to see if the oldCloudState
had the slice already; if so, get the state from there, otherwise do
what is already happening.  Something like the following:

for (String shardIdZkPath : shardIdNames) {
  Slice slice = null;
  if (oldCloudState.liveNodesContain(shardIdZkPath)) {
    slice = oldCloudState.getCollectionStates()
        .get(collection).get(shardIdZkPath);
  }

  if (slice == null) {
    Map<String,ZkNodeProps> shardsMap =
        readShards(zkClient, shardIdPaths + "/" + shardIdZkPath);
    slice = new Slice(shardIdZkPath, shardsMap);
  }

  slices.put(shardIdZkPath, slice);
}
I don't see a need to remove the old states since we only keep the
states that are already in oldCloudState and read new ones.  Does that
make sense?

On Wed, Sep 28, 2011 at 11:01 PM, Mark Miller markrmil...@gmail.com wrote:
 No, we don't have any patches for it yet. You might make a JIRA issue for it?

 I think the big win is a fairly easy one - basically, right now when we 
 update the cloud state, we look at the children of the 'shards' node, and 
 then we read the data at each node individually. I imagine this is the part 
 that breaks down :)

 We likely already have most of that info though - really, you should 
 just have to compare the children of the 'shards' node with the list we 
 already have from the last time we got the cloud state - remove any that are 
 no longer in the list, read the data for those not in the list, and get your 
 new state efficiently.

 - Mark Miller
 lucidimagination.com
 2011.lucene-eurocon.org | Oct 17-20 | Barcelona

 On Sep 28, 2011, at 10:35 PM, Jamie Johnson wrote:

 Thanks Mark found the TODO in ZkStateReader.java

 // TODO: - possibly: incremental update rather than reread everything

 Was there a patch they provided back to address this?

 On Tue, Sep 27, 2011 at 9:20 PM, Mark Miller markrmil...@gmail.com wrote:

 On Sep 26, 2011, at 11:42 AM, Jamie Johnson wrote:

 Is there any limitation, be it technical or for sanity reasons, on the
 number of shards that can be part of a solr cloud implementation?


 The loggly guys ended up hitting a limit somewhere. Essentially, whenever 
 the cloud state is updated, info is read about each shard to update the 
 state (from zookeeper). There is a TODO that I put in there that says 
 something like, consider updating this incrementally - usually the data 
 on most shards has not changed, so no reason to read it all. They 
 implemented that today in their own code, but we have not yet done this in 
 trunk. What that places the upper limit at, I don't know - I imagine it 
 takes quite a few shards before it ends up being too much of a problem - 
 they shard by user I believe, so lot's of shards.


 - Mark Miller
 lucidimagination.com
 2011.lucene-eurocon.org | Oct 17-20 | Barcelona


Re: UIMA DictionaryAnnotator partOfSpeach

2011-09-28 Thread Pulkit Singhal
At first glance it seems like a simple localization issue as indicated by this:

 org.apache.uima.annotator.dict_annot.impl.DictionaryAnnotatorProcessException:
 EXCEPTION MESSAGE LOCALIZATION FAILED: java.util.MissingResourceException:
 Can't find bundle for base name
 org.apache.uima.annotator.dict_annot.dictionaryAnnotatorMessages, locale
 en_US

Perhaps you can get the source code for UIMA, run the server
hosting Solr in debug mode, then connect to it remotely via Eclipse or
some other IDE and use a breakpoint to figure out which resource is
the issue.

After that it would be a UIMA-specific solution, I think.
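
For reference, the usual way to open a JVM for remote debugging is with the
JPDA flags (a sketch; the port is arbitrary, and start.jar stands in for
however you launch your servlet container):

java -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8000 -jar start.jar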

On Wed, Sep 28, 2011 at 4:11 PM, chanhangfai chanhang...@hotmail.com wrote:
 Hi all,

 I have the dictionary Annotator UIMA-solr running,
 used my own dictionary file and it works,
 it will match all the words (Nouns, Verbs and Adjectives) from my dictionary
 file.

 But now, if I only want to match nouns (and ignore other parts of speech),
 how can I configure it?


 http://uima.apache.org/d/uima-addons-current/DictionaryAnnotator/DictionaryAnnotatorUserGuide.html

 From the above user guide, in section 3.3 (Input Match Type Filters),
 I added the following to my DictionaryAnnotatorDescriptor.xml:

 <nameValuePair>
   <name>InputMatchFilterFeaturePath</name>
   <value>
     <string>partOfSpeach</string>
   </value>
 </nameValuePair>

 <nameValuePair>
   <name>FilterConditionOperator</name>
   <value>
     <string>EQUALS</string>
   </value>
 </nameValuePair>

 <nameValuePair>
   <name>FilterConditionValue</name>
   <value>
     <string>noun</string>
   </value>
 </nameValuePair>


 but it fails, and the error says the featurePathElementNames value
 partOfSpeach is invalid:

 org.apache.uima.annotator.dict_annot.impl.DictionaryAnnotatorProcessException:
 EXCEPTION MESSAGE LOCALIZATION FAILED: java.util.MissingResourceException:
 Can't find bundle for base name
 org.apache.uima.annotator.dict_annot.dictionaryAnnotatorMessages, locale
 en_US
        at
 org.apache.uima.annotator.dict_annot.impl.FeaturePathInfo_impl.typeSystemInit(FeaturePathInfo_impl.java:110)
        at
 org.apache.uima.annotator.dict_annot.impl.DictionaryAnnotator.typeSystemInit(DictionaryAnnotator.java:383)
        at
 org.apache.uima.analysis_component.CasAnnotator_ImplBase.checkTypeSystemChange(CasAnnotator_ImplBase.java:100)
        at
 org.apache.uima.analysis_component.CasAnnotator_ImplBase.process(CasAnnotator_ImplBase.java:55)
        at
 org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:377)
        at
 org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:295)
        at
 org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:567)
        at
 org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.init(ASB_impl.java:409)
        at
 org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:342)
        at
 org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:267)
        at
 org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
        at
 org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:280)



 Any idea please,
 Thanks in advance..

 Frankie


 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/UIMA-DictionaryAnnotator-partOfSpeach-tp3377440p3377440.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: basic solr cloud questions

2011-09-28 Thread Pulkit Singhal
@Darren: I feel that the question itself is misleading. Creating
shards is meant to separate out the data ... not keep the exact same
copy of it.

I think the two-node setup that was attempted by Sam misled him and
us into thinking that configuring two nodes which are to be named
shard1 ... somehow means that they are instantly replicated too ...
this is not the case! I can see how this misunderstanding can develop,
as I too was confused until Yury cleared it up.
@Sam: If you are interested in performing a quick exercise to
understand the pieces involved for replication rather than sharding
... perhaps this link would be of help in taking you through it:
http://pulkitsinghal.blogspot.com/2011/09/setup-solr-master-slave-replication.html
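
For a quick taste of what's involved: master/slave replication boils down to
one ReplicationHandler entry on each side in solrconfig.xml. A minimal sketch
(the host name, poll interval, and conf file list are placeholders):

<!-- on the master -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- on the slave -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>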

- Pulkit

2011/9/27 Yury Kats yuryk...@yahoo.com:
 On 9/27/2011 5:16 PM, Darren Govoni wrote:
 On 09/27/2011 05:05 PM, Yury Kats wrote:
 You need to either submit the docs to both nodes, or have a replication
 setup between the two. Otherwise they are not in sync.
 I hope that's not the case. :/ My understanding (or hope maybe) is that
 the new Solr Cloud implementation will support auto-sharding and
 distributed indexing. This means that shards will receive different
 documents regardless of which node received the submitted document
 (spread evenly based on a hash-node assignment). Distributed queries
 will thus merge all the solr shard/node responses.

 All cores in the same shard must somehow have the same index.
 Only then can you continue servicing searches when individual cores
 fail. Auto-sharding and distributed indexing don't have anything to
 do with this.

 In the future, SolrCloud may be managing replication between cores
 in the same shard automatically. But right now it does not.



Re: Why I can't take an full-import with entity name?

2011-09-28 Thread Pulkit Singhal
Can you monitor the DB side to see what results it returned for that query?

2011/8/30 于浩 yuhao.1...@gmail.com:
 I am using Solr 1.3, and I update the Solr index through a delta-import every
 two hours, but the delta-import is wasteful of database connections.
 So I want to use full-import with an entity name instead of delta-import.

 my db-data-config.xml file:

 <entity name="article" pk="Article_ID"
         query="select Article_ID,Article_Title,Article_Abstract from Article_Detail">
   <field name="Article_ID" column="Article_ID" />
 </entity>
 <entity name="delta_article" pk="Article_ID" rootEngity="false"
         query="select Article_ID,Article_Title,Article_Abstract from Article_Detail
                where Article_ID &gt; '${dataimporter.request.minID}' and Article_ID
                &lt;= '${dataimporter.request.maxID}'">
   <field name="Article_ID" column="Article_ID" />
 </entity>


 Then I use
 http://192.168.1.98:8081/solr/db_article/dataimport?command=full-import&entity=delta_article&commit=true&clean=false&maxID=1000&minID=10
 but Solr finishes nearly instantly and no records are imported, even though
 in fact there are many records that meet the maxID and minID conditions.


 the tomcat log:
 INFO: [db_article] webapp=/solr path=/dataimport
 params={maxID=6737277&clean=false&commit=true&entity=delta_article&command=full-import&minID=6736841}
 status=0 QTime=0
 2011-8-29 19:00:03 org.apache.solr.handler.dataimport.DataImporter
 doFullImport
 INFO: Starting Full Import
 2011-8-29 19:00:03 org.apache.solr.handler.dataimport.SolrWriter
 readIndexerProperties
 INFO: Read dataimport.properties
 2011-8-29 19:00:03 org.apache.solr.handler.dataimport.SolrWriter
 persistStartTime
 INFO: Wrote last indexed time to dataimport.properties
 2011-8-29 19:00:03 org.apache.solr.handler.dataimport.DocBuilder commit
 INFO: Full Import completed successfully


 Can somebody help, or offer some advice?



Re: SolrCloud: is there a programmatic way to create an ensemble

2011-09-28 Thread Pulkit Singhal
Did you find out about this?

2011/8/2 Yury Kats yuryk...@yahoo.com:
 I have multiple SolrCloud instances, each running its own Zookeeper
 (Solr launched with -DzkRun).

 I would like to create an ensemble out of them. I know about -DzkHost
 parameter, but can I achieve the same programmatically? Either with
 SolrJ or REST API?

 Thanks,
 Yury



Re: Solr Cloud Number of Shard Limitation?

2011-09-28 Thread Jamie Johnson
So I tested what I wrote, and man was that wrong.  I have updated it
and created a JIRA for this issue.  I also attached a patch against
CloudState to address it.  Feedback is appreciated.

https://issues.apache.org/jira/browse/SOLR-2799

On Wed, Sep 28, 2011 at 11:46 PM, Jamie Johnson jej2...@gmail.com wrote:
 I'll definitely create a JIRA for this.  Looking at the code in
 CloudState I think we could do the following

 as we iterate over shardIdNames we check to see if the oldCloudState
 had the slice already; if so, get the state from there, otherwise do
 what is already happening.  Something like the following:

 for (String shardIdZkPath : shardIdNames) {
   Slice slice = null;
   if (oldCloudState.liveNodesContain(shardIdZkPath)) {
     slice = oldCloudState.getCollectionStates()
         .get(collection).get(shardIdZkPath);
   }

   if (slice == null) {
     Map<String,ZkNodeProps> shardsMap =
         readShards(zkClient, shardIdPaths + "/" + shardIdZkPath);
     slice = new Slice(shardIdZkPath, shardsMap);
   }

   slices.put(shardIdZkPath, slice);
 }
 I don't see a need to remove the old states since we only keep the
 states that are already in oldCloudState and read new ones.  Does that
 make sense?

 On Wed, Sep 28, 2011 at 11:01 PM, Mark Miller markrmil...@gmail.com wrote:
 No, we don't have any patches for it yet. You might make a JIRA issue for it?

 I think the big win is a fairly easy one - basically, right now when we 
 update the cloud state, we look at the children of the 'shards' node, and 
 then we read the data at each node individually. I imagine this is the part 
 that breaks down :)

 We likely already have most of that info though - really, you should 
 just have to compare the children of the 'shards' node with the list we 
 already have from the last time we got the cloud state - remove any that are 
 no longer in the list, read the data for those not in the list, and get your 
 new state efficiently.

 - Mark Miller
 lucidimagination.com
 2011.lucene-eurocon.org | Oct 17-20 | Barcelona

 On Sep 28, 2011, at 10:35 PM, Jamie Johnson wrote:

 Thanks Mark found the TODO in ZkStateReader.java

 // TODO: - possibly: incremental update rather than reread everything

 Was there a patch they provided back to address this?

 On Tue, Sep 27, 2011 at 9:20 PM, Mark Miller markrmil...@gmail.com wrote:

 On Sep 26, 2011, at 11:42 AM, Jamie Johnson wrote:

 Is there any limitation, be it technical or for sanity reasons, on the
 number of shards that can be part of a solr cloud implementation?


 The loggly guys ended up hitting a limit somewhere. Essentially, whenever 
 the cloud state is updated, info is read about each shard to update the 
 state (from zookeeper). There is a TODO that I put in there that says 
 something like, consider updating this incrementally - usually the data 
 on most shards has not changed, so no reason to read it all. They 
 implemented that today in their own code, but we have not yet done this in 
 trunk. What that places the upper limit at, I don't know - I imagine it 
 takes quite a few shards before it ends up being too much of a problem - 
 they shard by user I believe, so lots of shards.


 - Mark Miller
 lucidimagination.com
 2011.lucene-eurocon.org | Oct 17-20 | Barcelona



Query failing because of omitTermFreqAndPositions

2011-09-28 Thread Isan Fulia
Hi All,

My schema contained a field textForQuery which was defined as

<field name="textForQuery" type="text" indexed="true" stored="false" multiValued="true"/>

After indexing 10 lakh (about a million) documents, I changed the field to

<field name="textForQuery" type="text" indexed="true" stored="false" multiValued="true" omitTermFreqAndPositions="true"/>

So documents that were indexed after that omitted the position information
of the terms. As a result I was not able to run searches that rely on
position information, e.g. the phrase "coke studio at mtv", even though it
is present in some documents.

So I again changed the field textForQuery back to

<field name="textForQuery" type="text" indexed="true" stored="false" multiValued="true"/>

But now, even for newly added documents, queries requiring position
information are still failing. For example, I reindexed certain documents
that contain "coke studio at mtv", but the query still returns no documents
when I search for textForQuery:"coke studio at mtv".

Can anyone please help me understand why this is happening?


-- 
Thanks  Regards,
Isan Fulia.


Re: SolrCloud: is there a programmatic way to create an ensemble

2011-09-28 Thread Jamie Johnson
I'm not a solrcloud guru, but why not start your zookeeper quorum separately?

I also believe that you can specify a zoo.cfg file which will create a
ZooKeeper quorum from Solr.

example zoo.cfg (from
http://zookeeper.apache.org/doc/current/zookeeperStarted.html#sc_RunningReplicatedZooKeeper)

tickTime=2000
dataDir=/var/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888
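
One detail worth adding from the standard ZooKeeper setup (not
Solr-specific): each server in the ensemble also needs a myid file under
dataDir whose contents match its server.N line, for example:

echo 1 > /var/zookeeper/myid    # on zoo1; use 2 on zoo2 and 3 on zoo3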

On Thu, Sep 29, 2011 at 12:17 AM, Pulkit Singhal
pulkitsing...@gmail.com wrote:
 Did you find out about this?

 2011/8/2 Yury Kats yuryk...@yahoo.com:
 I have multiple SolrCloud instances, each running its own Zookeeper
 (Solr launched with -DzkRun).

 I would like to create an ensemble out of them. I know about -DzkHost
 parameter, but can I achieve the same programmatically? Either with
 SolrJ or REST API?

 Thanks,
 Yury