HELP: CommonsHttpSolrServer.commit() time out after 1min

2013-02-19 Thread Siping Liu
Hi,
we have an index with 2 million documents in it. From time to time we rewrite
about 1/10 of the documents (just under 200k). No autocommit. At the end we
issue a single commit, and it timed out after 60 sec. My questions are:
1. Is it normal for a commit of this size to take more than 1 min? I
know it probably depends on the server ...
2. I know there are a few parameters I can set in the CommonsHttpSolrServer
class: setConnectionManagerTimeout(), setConnectionTimeout(),
setSoTimeout(). Which should I use?

TIA
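Regarding question 2, a sketch of how the three knobs map to failure modes (assuming Solr 3.x SolrJ; the URL and values below are illustrative, and note that setConnectionManagerTimeout's parameter type changed from int to long during the 3.x line):

```java
// Sketch, Solr 3.x SolrJ (values illustrative):
// - setConnectionTimeout: max wait to establish the TCP connection
// - setConnectionManagerTimeout: max wait to obtain a connection from the pool
// - setSoTimeout: max wait for data on the open socket; this is the timeout
//   that fires when a long-running commit keeps the server silent past 60 s
CommonsHttpSolrServer server =
    new CommonsHttpSolrServer("http://localhost:8983/solr");
server.setConnectionTimeout(5000);         // 5 s
server.setConnectionManagerTimeout(5000L); // 5 s
server.setSoTimeout(5 * 60 * 1000);        // 5 min, to outlast a slow commit
```

So for a commit that legitimately takes over a minute, raising the SO (socket read) timeout is the usual fix; the other two only matter when a connection cannot be established at all.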


Re: HELP: CommonsHttpSolrServer.commit() time out after 1min

2013-02-19 Thread Siping Liu
Thanks for the quick response. It's Solr 3.4. I'm pretty sure we have plenty
of memory.



On Tue, Feb 19, 2013 at 7:50 PM, Alexandre Rafalovitch
arafa...@gmail.com wrote:

 Which version of Solr?
 Are you sure you did not run out of memory half way through import?

 Regards,
Alex.

 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)




Re: HELP: CommonsHttpSolrServer.commit() time out after 1min

2013-02-19 Thread Siping Liu
SolrJ.


On Tue, Feb 19, 2013 at 9:08 PM, Erick Erickson erickerick...@gmail.com wrote:

 Well, your commits may have to wait until any merges are done, which _may_
 be merging your entire index into a single segment. Possibly this could
 take more than 60 seconds.

 _How_ are you doing this? DIH? SolrJ? post.jar?

 Best
 Erick





Re: custom sorter

2012-07-22 Thread Siping Liu
Hi -- thanks for the response. It's the right direction. However, on closer
look I don't think I can use it directly. The reason is that in my case
the query string is always *:*; we use filter queries to get different
results. When fq=(field1:xyz) we want to boost one document and let sort=
take care of the rest of the results, and when field1 has another value, sort=
takes care of all results.

Maybe I can define my own SearchComponent class, and specify it in

<arr name="last-components">
  <str>my_search_component</str>
</arr>

I have to try and see if that'd work.

thanks.


On Fri, Jul 20, 2012 at 3:24 AM, Lee Carroll
lee.a.carr...@googlemail.com wrote:

 take a look at
 http://wiki.apache.org/solr/QueryElevationComponent
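For reference, a sketch of the elevation configuration (file names and ids here are illustrative; as the reply above notes, elevation keys on the q text, so it may not fit a fixed q=*:* with a varying fq):

```xml
<!-- solrconfig.xml: register the component -->
<searchComponent name="elevator" class="solr.QueryElevationComponent">
  <str name="queryFieldType">string</str>
  <str name="config-file">elevate.xml</str>
</searchComponent>

<!-- elevate.xml: pin document "abc" for the query text "xyz" -->
<elevate>
  <query text="xyz">
    <doc id="abc"/>
  </query>
</elevate>
```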




custom sorter

2012-07-19 Thread Siping Liu
Hi,
I have a requirement to place a document at a pre-determined position for
special filter query values; for instance, when the filter query is
fq=(field1:xyz), place document abc as the first result (the rest of the
result set will be ordered by sort=field2). I guess I have to plug in my
Java code as a custom sorter. I'd appreciate it if someone can shed light
on this (how to add a custom sorter, etc.)
TIA.


match to non tokenizable word (helloworld)

2010-05-16 Thread siping liu

I get no match when searching for "helloworld", even though I have "hello
world" in my index. How do people usually deal with this? Write a custom
analyzer, with help from a collection of all dictionary words?

 

thanks for suggestions/comments.
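One common approach (a sketch; it assumes a Solr release whose ShingleFilterFactory supports the tokenSeparator attribute, and the field type name is made up) is to index adjacent word pairs glued together, so that "hello world" also yields the single token "helloworld":

```xml
<fieldType name="text_glued" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- emit word pairs with no separator: "hello world" -> "helloworld" -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" tokenSeparator=""/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

This avoids needing a dictionary: the index grows somewhat, but a query for helloworld then matches the glued shingle directly.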
  
_
Hotmail has tools for the New Busy. Search, chat and e-mail from your inbox.
http://www.windowslive.com/campaign/thenewbusy?ocid=PID28326::T:WLMTAGL:ON:WL:en-US:WM_HMP:042010_1

weird problem with solr.DateField

2009-11-11 Thread siping liu

Hi,

I'm using Solr 1.4 (from nightly build about 2 months ago) and have this 
defined in solrconfig:

<fieldType name="date" class="solr.DateField" sortMissingLast="true"
omitNorms="true"/>

<field name="lastUpdate" type="date" indexed="true" stored="true" default="NOW"
multiValued="false"/>

 

and the following code that gets executed once every night:

CommonsHttpSolrServer solrServer = new CommonsHttpSolrServer("http://...");
solrServer.setRequestWriter(new BinaryRequestWriter());

solrServer.add(documents);
solrServer.commit();

UpdateResponse deleteResult = solrServer.deleteByQuery("lastUpdate:[* TO NOW-2HOUR]");
solrServer.commit();

 

The purpose is to refresh index with latest data (in documents).

This works fine, except that after a few days I start to see a few documents
with no lastUpdate field (found with the query -lastUpdate:[* TO *]) -- how can
that be possible?

 

thanks in advance.

 
  
_
Windows 7: Unclutter your desktop.
http://go.microsoft.com/?linkid=9690331ocid=PID24727::T:WLMTAGL:ON:WL:en-US:WWL_WIN_evergreen:112009

RE: Solr and Garbage Collection

2009-10-02 Thread siping liu

Hi,

I read pretty much all the posts on this thread (before and after this one). It looks
like the main suggestion from you and others is to keep the max heap size (-Xmx) as
small as possible (as long as you don't see an OOM exception). This raises more
questions than answers (for me at least; I'm new to Solr).

 

First, our environment and the problem encountered: Solr 1.4 (nightly build,
downloaded about 2 months ago), Sun JDK 1.6, Tomcat 5.5, running on
Solaris (multi-CPU/core). The cache settings are from the default solrconfig.xml
(they look very small). At first we used minimal JAVA_OPTS and quickly ran into a
problem similar to the one the original poster reported -- long pauses (seconds to
minutes) under load test. jconsole showed that it pauses on GC. So more
JAVA_OPTS got added: -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
-XX:ParallelGCThreads=8 -XX:SurvivorRatio=2 -XX:NewSize=128m
-XX:MaxNewSize=512m -XX:MaxGCPauseMillis=200; the thinking is that with
multiple CPUs/cores we can get GC over with as quickly as possible. With the new
setup, it works fine until Tomcat reaches the heap size, then it blocks and takes
minutes on a full GC to reclaim space from the tenured generation. We tried
different Xmx values (from very small to large); no difference in the long GC times. We
never ran into OOM.
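Before tuning further, it may help to capture exactly what the collector is doing; a sketch of diagnostic additions to JAVA_OPTS (Sun JDK 1.6 flags; the log path is illustrative):

```sh
# Append GC diagnostics to the existing options; the log path is illustrative.
JAVA_OPTS="$JAVA_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
  -XX:+PrintTenuringDistribution -Xloggc:/var/log/tomcat/gc.log"
```

-XX:+PrintTenuringDistribution in particular shows whether per-request garbage is dying in the nursery or being promoted to the tenured generation.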

 

Questions:

* In general, caching is good for performance; we have more RAM available
and want to use more caching to boost performance. Isn't your suggestion (of
lowering the heap limit) going against that?

* It looks like the Solr caches made their way into the tenured generation on the heap;
that's good. But why do they eventually get GC'ed? I did a quick check of the Solr code
(Solr 1.3, not 1.4) and saw a single instance of using WeakReference. Is that
what is causing all this? This seems to suggest a design flaw in Solr's memory
management strategy (or just my ignorance about Solr?). I mean, wouldn't this
be the right way of doing it -- you allow the user to specify the cache sizes in
solrconfig.xml, then the user can set the heap limit in JAVA_OPTS accordingly, and
there is no need to use WeakReference (BTW, why not SoftReference)?

* Right now I have a single Tomcat hosting Solr and other applications. I guess
it's better to have Solr in its own Tomcat, given that it's tricky to
adjust the Java options.

 

thanks.


 
 From: wun...@wunderwood.org
 To: solr-user@lucene.apache.org
 Subject: RE: Solr and Garbage Collection
 Date: Fri, 25 Sep 2009 09:51:29 -0700
 
 30ms is not better or worse than 1s until you look at the service
 requirements. For many applications, it is worth dedicating 10% of your
 processing time to GC if that makes the worst-case pause short.
 
 On the other hand, my experience with the IBM JVM was that the maximum query
 rate was 2-3X better with the concurrent generational GC compared to any of
 their other GC algorithms, so we got the best throughput along with the
 shortest pauses.
 
 Solr garbage generation (for queries) seems to have two major components:
 per-request garbage and cache evictions. With a generational collector,
 these two are handled by separate parts of the collector. Per-request
 garbage should completely fit in the short-term heap (nursery), so that it
 can be collected rapidly and returned to use for further requests. If the
 nursery is too small, the per-request allocations will be made in tenured
 space and sit there until the next major GC. Cache evictions are almost
 always in long-term storage (tenured space) because an LRU algorithm
 guarantees that the garbage will be old.
 
 Check the growth rate of tenured space (under constant load, of course)
 while increasing the size of the nursery. That rate should drop when the
 nursery gets big enough, then not drop much further as it is increased more.
 
 After that, reduce the size of tenured space until major GCs start happening
 too often (a judgment call). A bigger tenured space means longer major GCs
 and thus longer pauses, so you don't want it oversized by too much.
 
 Also check the hit rates of your caches. If the hit rate is low, say 20% or
 less, make that cache much bigger or set it to zero. Either one will reduce
 the number of cache evictions. If you have an HTTP cache in front of Solr,
 zero may be the right choice, since the HTTP cache is cherry-picking the
 easily cacheable requests.
 
 Note that a commit nearly doubles the memory required, because you have two
 live Searcher objects with all their caches. Make sure you have headroom for
 a commit.
 
 If you want to test the tenured space usage, you must test with real world
 queries. Those are the only way to get accurate cache eviction rates.
 
 wunder
  
_
Bing™  brings you maps, menus, and reviews organized in one place.   Try it now.
http://www.bing.com/search?q=restaurantsform=MLOGENpubl=WLHMTAGcrea=TEXT_MLOGEN_Core_tagline_local_1x1

anyway to get Document update time stamp

2009-09-17 Thread siping liu

I understand there's no update in Solr/Lucene; it's really delete+insert. Is
there any way to get a document's insert timestamp, without explicitly creating
such a data field in the document? If so, how can I query it, for instance to get
all documents that are older than 24 hours? Thanks.
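There appears to be no built-in per-document timestamp without a field, but if adding one is acceptable, the lastUpdate pattern from the solr.DateField thread above applies (a sketch; the field name is illustrative):

```xml
<!-- schema.xml: filled in automatically at add time via the default -->
<field name="timestamp" type="date" indexed="true" stored="true" default="NOW"/>
```

Documents older than 24 hours can then be selected with the date-math query timestamp:[* TO NOW-24HOUR].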
_
Hotmail: Free, trusted and rich email service.
http://clk.atdmt.com/GBL/go/171222984/direct/01/

DisMaxRequestHandler usage

2009-06-16 Thread siping liu

Hi,

I have this standard query:

q=(field1:hello OR field2:hello) AND (field3:world)

 

Can I use the dismax handler for this (applying the same search term to field1 and
field2, but keeping field3 separate)? If it can be done, what's the
advantage of doing it this way over using the standard query?

 

thanks.
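A sketch of the dismax equivalent (parameter values illustrative; assumes a Solr release supporting defType, otherwise a request handler configured for dismax): qf spreads the user's term across both fields, and the field3 clause becomes a filter query:

```
q=hello&defType=dismax&qf=field1 field2&fq=field3:world
```

The usual advantages are that dismax tolerates arbitrary user input (no query-parser syntax errors) and supports per-field boosts, e.g. qf=field1^2 field2.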

_
Microsoft brings you a new way to search the web.  Try  Bing™ now
http://www.bing.com?form=MFEHPGpubl=WLHMTAGcrea=TEXT_MFEHPG_Core_tagline_try 
bing_1x1

Query faceting

2009-06-08 Thread siping liu

Hi,

I have a field called "service" with the following values:

- Shuttle Services
- Senior Discounts
- Laundry Rooms

- ...

 

When I conduct a query with facet=true&facet.field=service&facet.limit=-1, I
get something like this back:

- shuttle 2

- service 3

- senior 0

- laundry 0

- room 3

- ...

 

Questions:

- How do I keep field values from being broken up into words, so I can get
something like "Shuttle Services 2" back?

- How do I tell Solr not to return facets with a 0 count? The query takes a long time
to finish, seemingly because of the long list of items with a 0 count.

 

thanks for any advice.
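Both questions have standard answers, sketched here (the copy-field name is made up): facet on an untokenized string copy of the field, and use facet.mincount to suppress zero counts:

```xml
<!-- schema.xml: an untokenized copy of "service" to facet on -->
<field name="service_exact" type="string" indexed="true" stored="false"/>
<copyField source="service" dest="service_exact"/>
```

Then query with facet=true&facet.field=service_exact&facet.limit=-1&facet.mincount=1 to get whole values like "Shuttle Services" with zero-count entries dropped.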

_
Insert movie times and more without leaving Hotmail®. 
http://windowslive.com/Tutorial/Hotmail/QuickAdd?ocid=TXT_TAGLM_WL_HM_Tutorial_QuickAdd_062009

RE: Creating a distributed search in a searchComponent

2009-05-21 Thread siping liu

I was looking for an answer to the same question, and have a similar concern. It looks
like any serious customization work requires developing a custom SearchComponent,
but it's not clear to me how the Solr designers intended this to be done. I'd have more
confidence either doing it at the Lucene level, or staying on the client side and using
something like multi-core (as discussed here:
http://wiki.apache.org/solr/MultipleIndexes).


 
 Date: Wed, 20 May 2009 13:47:20 -0400
 Subject: RE: Creating a distributed search in a searchComponent
 From: nicholas.bai...@rackspace.com
 To: solr-user@lucene.apache.org
 
 It seems I sent this out a bit too soon. After looking at the source, it seems
 there are two separate paths for distributed and regular queries; however, the
 prepare method for all components is run before the shards parameter is
 checked. So I can build the shards portion using the prepare method of
 my own search component.
 
 However I'm not sure if this is the greatest idea in case solr changes at 
 some point.
 
 -Nick
 
 -Original Message-
 From: Nick Bailey nicholas.bai...@rackspace.com
 Sent: Wednesday, May 20, 2009 1:29pm
 To: solr-user@lucene.apache.org
 Subject: Creating a distributed search in a searchComponent
 
 Hi,
 
 I am wondering if it is possible to basically add the distributed portion of 
 a search query inside of a searchComponent.
 
 I am hoping to build my own component and add it as a first-component to the 
 StandardRequestHandler. Then hopefully I will be able to use this component 
 to build the shards parameter of the query and have the Handler then treat 
 the query as a distributed search. Anyone have any experience or know if this 
 is possible?
 
 Thanks,
 Nick
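Nick's prepare-method idea might look roughly like this (a sketch against the Solr 1.x SearchComponent API; the class name and shard list are made up, the SolrInfoMBean boilerplate methods are omitted, and, as he notes, it relies on prepare() running before the shards check, which is not a guaranteed contract):

```java
public class ShardInjectorComponent extends SearchComponent {

  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    // Registered as a first-component: prepare() runs before the handler
    // inspects the shards parameter, so the component can inject it here.
    ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams());
    params.set("shards", "host1:8983/solr,host2:8983/solr"); // illustrative
    rb.req.setParams(params);
  }

  @Override
  public void process(ResponseBuilder rb) throws IOException {
    // Nothing to do at process time; the work happened in prepare().
  }

  @Override
  public String getDescription() {
    return "Injects a shards parameter during prepare()";
  }
}
```

The component would then be listed as a first-component on the StandardRequestHandler, as described in the message above.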
 
 
 

_
Hotmail® has ever-growing storage! Don’t worry about storage limits.
http://windowslive.com/Tutorial/Hotmail/Storage?ocid=TXT_TAGLM_WL_HM_Tutorial_Storage1_052009

adding plug-in after search is done

2009-04-27 Thread siping liu

I'm trying to manipulate search results (e.g. filtering out unwanted documents) and
order the results differently. Where is the right place to do this?
I've been using QueryResponseWriter, but that doesn't seem to be the right place.

thanks.

_
Rediscover Hotmail®: Get quick friend updates right in your inbox. 
http://windowslive.com/RediscoverHotmail?ocid=TXT_TAGLM_WL_HM_Rediscover_Updates2_042009