HELP: CommonsHttpSolrServer.commit() time out after 1min

2013-02-19 Thread Siping Liu
Hi,
we have an index with 2 million documents in it. From time to time we rewrite
about 1/10 of the documents (just under 200k). No autocommit. At the end we
issue a single commit, and it timed out after 60 sec. My questions are:
1. Is it normal for a commit of this size to take more than 1 min? I
know it probably depends on the server ...
2. I know there are a few parameters I can set in the CommonsHttpSolrServer
class: setConnectionManagerTimeout(), setConnectionTimeout(),
setSoTimeout(). Which should I use?

TIA
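Regarding question 2, a sketch of how the three knobs map to failure modes (assuming Solr 3.x SolrJ; the URL and values below are illustrative, and note that setConnectionManagerTimeout's parameter type changed from int to long during the 3.x line):

```java
// Sketch, Solr 3.x SolrJ (values illustrative):
// - setConnectionTimeout: max wait to establish the TCP connection
// - setConnectionManagerTimeout: max wait to obtain a connection from the pool
// - setSoTimeout: max wait for data on the open socket; this is the timeout
//   that fires when a long-running commit keeps the server silent past 60 s
CommonsHttpSolrServer server =
    new CommonsHttpSolrServer("http://localhost:8983/solr");
server.setConnectionTimeout(5000);         // 5 s
server.setConnectionManagerTimeout(5000L); // 5 s
server.setSoTimeout(5 * 60 * 1000);        // 5 min, to outlast a slow commit
```

So for a commit that legitimately takes over a minute, raising the SO (socket read) timeout is the usual fix; the other two only matter when a connection cannot be established at all.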


Re: HELP: CommonsHttpSolrServer.commit() time out after 1min

2013-02-19 Thread Siping Liu
Thanks for the quick response. It's Solr 3.4. I'm pretty sure we have plenty
of memory.



On Tue, Feb 19, 2013 at 7:50 PM, Alexandre Rafalovitch
arafa...@gmail.com wrote:

 Which version of Solr?
 Are you sure you did not run out of memory half way through import?

 Regards,
Alex.

 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)




Re: HELP: CommonsHttpSolrServer.commit() time out after 1min

2013-02-19 Thread Siping Liu
SolrJ.


On Tue, Feb 19, 2013 at 9:08 PM, Erick Erickson erickerick...@gmail.com wrote:

 Well, your commits may have to wait until any merges are done, which _may_
 be merging your entire index into a single segment. Possibly this could
 take more than 60 seconds.

 _How_ are you doing this? DIH? SolrJ? post.jar?

 Best
 Erick





Re: custom sorter

2012-07-22 Thread Siping Liu
Hi -- thanks for the response. It's the right direction. However, on closer
look I don't think I can use it directly. The reason is that in my case
the query string is always *:*; we use filter queries to get different
results. When fq=(field1:xyz) we want to boost one document and let sort=
take care of the rest of the results, and when field1 has another value, sort=
takes care of all results.

Maybe I can define my own SearchComponent class, and specify it in

<arr name="last-components">
  <str>my_search_component</str>
</arr>

I have to try and see if that'd work.

thanks.


On Fri, Jul 20, 2012 at 3:24 AM, Lee Carroll
lee.a.carr...@googlemail.com wrote:

 take a look at
 http://wiki.apache.org/solr/QueryElevationComponent
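For reference, a sketch of the elevation configuration (file names and ids here are illustrative; as the reply above notes, elevation keys on the q text, so it may not fit a fixed q=*:* with a varying fq):

```xml
<!-- solrconfig.xml: register the component -->
<searchComponent name="elevator" class="solr.QueryElevationComponent">
  <str name="queryFieldType">string</str>
  <str name="config-file">elevate.xml</str>
</searchComponent>

<!-- elevate.xml: pin document "abc" for the query text "xyz" -->
<elevate>
  <query text="xyz">
    <doc id="abc"/>
  </query>
</elevate>
```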




custom sorter

2012-07-19 Thread Siping Liu
Hi,
I have a requirement to place a document at a pre-determined position for
special filter query values; for instance, when the filter query is
fq=(field1:xyz), place document abc as the first result (the rest of the
result set will be ordered by sort=field2). I guess I have to plug in my
Java code as a custom sorter. I'd appreciate it if someone can shed light
on this (how to add a custom sorter, etc.)
TIA.


match to non tokenizable word (helloworld)

2010-05-16 Thread siping liu

I get no match when searching for "helloworld", even though I have "hello
world" in my index. How do people usually deal with this? Write a custom
analyzer, with help from a collection of all dictionary words?

 

thanks for suggestions/comments.
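One common approach (a sketch; it assumes a Solr release whose ShingleFilterFactory supports the tokenSeparator attribute, and the field type name is made up) is to index adjacent word pairs glued together, so that "hello world" also yields the single token "helloworld":

```xml
<fieldType name="text_glued" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- emit word pairs with no separator: "hello world" -> "helloworld" -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" tokenSeparator=""/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

This avoids needing a dictionary: the index grows somewhat, but a query for helloworld then matches the glued shingle directly.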
  
_
Hotmail has tools for the New Busy. Search, chat and e-mail from your inbox.
http://www.windowslive.com/campaign/thenewbusy?ocid=PID28326::T:WLMTAGL:ON:WL:en-US:WM_HMP:042010_1

weird problem with solr.DateField

2009-11-11 Thread siping liu

Hi,

I'm using Solr 1.4 (from nightly build about 2 months ago) and have this 
defined in solrconfig:

<fieldType name="date" class="solr.DateField" sortMissingLast="true"
omitNorms="true"/>

<field name="lastUpdate" type="date" indexed="true" stored="true" default="NOW"
multiValued="false"/>

 

and the following code that gets executed once every night:

CommonsHttpSolrServer solrServer = new CommonsHttpSolrServer("http://...");
solrServer.setRequestWriter(new BinaryRequestWriter());

solrServer.add(documents);
solrServer.commit();

UpdateResponse deleteResult = solrServer.deleteByQuery("lastUpdate:[* TO NOW-2HOUR]");
solrServer.commit();

 

The purpose is to refresh index with latest data (in documents).

This works fine, except that after a few days I start to see a few documents
with no lastUpdate field (found with the query -lastUpdate:[* TO *]) -- how can
that be possible?

 

thanks in advance.

 
  
_
Windows 7: Unclutter your desktop.
http://go.microsoft.com/?linkid=9690331ocid=PID24727::T:WLMTAGL:ON:WL:en-US:WWL_WIN_evergreen:112009

RE: Solr and Garbage Collection

2009-10-02 Thread siping liu

Hi,

I read pretty much all the posts on this thread (before and after this one). It looks
like the main suggestion from you and others is to keep the max heap size (-Xmx) as
small as possible (as long as you don't see an OOM exception). This raises more
questions than answers (for me at least; I'm new to Solr).

 

First, our environment and the problem encountered: Solr 1.4 (nightly build,
downloaded about 2 months ago), Sun JDK 1.6, Tomcat 5.5, running on
Solaris (multi-CPU/core). The cache settings are from the default solrconfig.xml
(they look very small). At first we used minimal JAVA_OPTS and quickly ran into a
problem similar to the one the original poster reported -- long pauses (seconds to
minutes) under load test. jconsole showed that it pauses on GC. So more
JAVA_OPTS got added: -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
-XX:ParallelGCThreads=8 -XX:SurvivorRatio=2 -XX:NewSize=128m
-XX:MaxNewSize=512m -XX:MaxGCPauseMillis=200; the thinking is that with
multiple CPUs/cores we can get GC over with as quickly as possible. With the new
setup, it works fine until Tomcat reaches the heap size, then it blocks and takes
minutes on a full GC to reclaim space from the tenured generation. We tried
different Xmx values (from very small to large); no difference in the long GC times. We
never ran into OOM.
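Before tuning further, it may help to capture exactly what the collector is doing; a sketch of diagnostic additions to JAVA_OPTS (Sun JDK 1.6 flags; the log path is illustrative):

```sh
# Append GC diagnostics to the existing options; the log path is illustrative.
JAVA_OPTS="$JAVA_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
  -XX:+PrintTenuringDistribution -Xloggc:/var/log/tomcat/gc.log"
```

-XX:+PrintTenuringDistribution in particular shows whether per-request garbage is dying in the nursery or being promoted to the tenured generation.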

 

Questions:

* In general, caching is good for performance; we have more RAM available
and want to use more caching to boost performance. Isn't your suggestion (of
lowering the heap limit) going against that?

* It looks like the Solr caches made their way into the tenured generation on the heap;
that's good. But why do they eventually get GC'ed? I did a quick check of the Solr code
(Solr 1.3, not 1.4) and saw a single instance of using WeakReference. Is that
what is causing all this? This seems to suggest a design flaw in Solr's memory
management strategy (or just my ignorance about Solr?). I mean, wouldn't this
be the right way of doing it -- you allow the user to specify the cache sizes in
solrconfig.xml, then the user can set the heap limit in JAVA_OPTS accordingly, and
there is no need to use WeakReference (BTW, why not SoftReference)?

* Right now I have a single Tomcat hosting Solr and other applications. I guess
it's better to have Solr in its own Tomcat, given that it's tricky to
adjust the Java options.

 

thanks.


 
 From: wun...@wunderwood.org
 To: solr-user@lucene.apache.org
 Subject: RE: Solr and Garbage Collection
 Date: Fri, 25 Sep 2009 09:51:29 -0700
 
 30ms is not better or worse than 1s until you look at the service
 requirements. For many applications, it is worth dedicating 10% of your
 processing time to GC if that makes the worst-case pause short.
 
 On the other hand, my experience with the IBM JVM was that the maximum query
 rate was 2-3X better with the concurrent generational GC compared to any of
 their other GC algorithms, so we got the best throughput along with the
 shortest pauses.
 
 Solr garbage generation (for queries) seems to have two major components:
 per-request garbage and cache evictions. With a generational collector,
 these two are handled by separate parts of the collector. Per-request
 garbage should completely fit in the short-term heap (nursery), so that it
 can be collected rapidly and returned to use for further requests. If the
 nursery is too small, the per-request allocations will be made in tenured
 space and sit there until the next major GC. Cache evictions are almost
 always in long-term storage (tenured space) because an LRU algorithm
 guarantees that the garbage will be old.
 
 Check the growth rate of tenured space (under constant load, of course)
 while increasing the size of the nursery. That rate should drop when the
 nursery gets big enough, then not drop much further as it is increased more.
 
 After that, reduce the size of tenured space until major GCs start happening
 too often (a judgment call). A bigger tenured space means longer major GCs
 and thus longer pauses, so you don't want it oversized by too much.
 
 Also check the hit rates of your caches. If the hit rate is low, say 20% or
 less, make that cache much bigger or set it to zero. Either one will reduce
 the number of cache evictions. If you have an HTTP cache in front of Solr,
 zero may be the right choice, since the HTTP cache is cherry-picking the
 easily cacheable requests.
 
 Note that a commit nearly doubles the memory required, because you have two
 live Searcher objects with all their caches. Make sure you have headroom for
 a commit.
 
 If you want to test the tenured space usage, you must test with real world
 queries. Those are the only way to get accurate cache eviction rates.
 
 wunder
  
_
Bing™  brings you maps, menus, and reviews organized in one place.   Try it now.
http://www.bing.com/search?q=restaurantsform=MLOGENpubl=WLHMTAGcrea=TEXT_MLOGEN_Core_tagline_local_1x1

anyway to get Document update time stamp

2009-09-17 Thread siping liu

I understand there's no update in Solr/Lucene; it's really delete+insert. Is
there any way to get a document's insert timestamp, without explicitly creating
such a data field in the document? If so, how can I query it, for instance to get
all documents that are older than 24 hours? Thanks.
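There appears to be no built-in per-document timestamp without a field, but if adding one is acceptable, the lastUpdate pattern from the solr.DateField thread above applies (a sketch; the field name is illustrative):

```xml
<!-- schema.xml: filled in automatically at add time via the default -->
<field name="timestamp" type="date" indexed="true" stored="true" default="NOW"/>
```

Documents older than 24 hours can then be selected with the date-math query timestamp:[* TO NOW-24HOUR].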
_
Hotmail: Free, trusted and rich email service.
http://clk.atdmt.com/GBL/go/171222984/direct/01/

DisMaxRequestHandler usage

2009-06-16 Thread siping liu

Hi,

I have this standard query:

q=(field1:hello OR field2:hello) AND (field3:world)

 

Can I use the dismax handler for this (applying the same search term to field1 and
field2, but keeping field3 separate)? If it can be done, what's the
advantage of doing it this way over using the standard query?

 

thanks.
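A sketch of the dismax equivalent (parameter values illustrative; assumes a Solr release supporting defType, otherwise a request handler configured for dismax): qf spreads the user's term across both fields, and the field3 clause becomes a filter query:

```
q=hello&defType=dismax&qf=field1 field2&fq=field3:world
```

The usual advantages are that dismax tolerates arbitrary user input (no query-parser syntax errors) and supports per-field boosts, e.g. qf=field1^2 field2.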

_
Microsoft brings you a new way to search the web.  Try  Bing™ now
http://www.bing.com?form=MFEHPGpubl=WLHMTAGcrea=TEXT_MFEHPG_Core_tagline_try 
bing_1x1

Query faceting

2009-06-08 Thread siping liu

Hi,

I have a field called "service" with the following values:

- Shuttle Services
- Senior Discounts
- Laundry Rooms

- ...

 

When I conduct a query with facet=true&facet.field=service&facet.limit=-1, I
get something like this back:

- shuttle 2

- service 3

- senior 0

- laundry 0

- room 3

- ...

 

Questions:

- How do I keep field values from being broken up into words, so I can get
something like "Shuttle Services 2" back?

- How do I tell Solr not to return facets with a 0 count? The query takes a long time
to finish, seemingly because of the long list of items with a 0 count.

 

thanks for any advice.
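Both questions have standard answers, sketched here (the copy-field name is made up): facet on an untokenized string copy of the field, and use facet.mincount to suppress zero counts:

```xml
<!-- schema.xml: an untokenized copy of "service" to facet on -->
<field name="service_exact" type="string" indexed="true" stored="false"/>
<copyField source="service" dest="service_exact"/>
```

Then query with facet=true&facet.field=service_exact&facet.limit=-1&facet.mincount=1 to get whole values like "Shuttle Services" with zero-count entries dropped.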

_
Insert movie times and more without leaving Hotmail®. 
http://windowslive.com/Tutorial/Hotmail/QuickAdd?ocid=TXT_TAGLM_WL_HM_Tutorial_QuickAdd_062009

RE: Creating a distributed search in a searchComponent

2009-05-21 Thread siping liu

I was looking for an answer to the same question, and have a similar concern. It looks
like any serious customization work requires developing a custom SearchComponent,
but it's not clear to me how the Solr designers intended this to be done. I'd have more
confidence either doing it at the Lucene level, or staying on the client side and using
something like multi-core (as discussed here:
http://wiki.apache.org/solr/MultipleIndexes).


 
 Date: Wed, 20 May 2009 13:47:20 -0400
 Subject: RE: Creating a distributed search in a searchComponent
 From: nicholas.bai...@rackspace.com
 To: solr-user@lucene.apache.org
 
 It seems I sent this out a bit too soon. After looking at the source, it seems
 there are two separate paths for distributed and regular queries; however, the
 prepare method for all components is run before the shards parameter is
 checked. So I can build the shards portion using the prepare method of
 my own search component.
 
 However I'm not sure if this is the greatest idea in case solr changes at 
 some point.
 
 -Nick
 
 -Original Message-
 From: Nick Bailey nicholas.bai...@rackspace.com
 Sent: Wednesday, May 20, 2009 1:29pm
 To: solr-user@lucene.apache.org
 Subject: Creating a distributed search in a searchComponent
 
 Hi,
 
 I am wondering if it is possible to basically add the distributed portion of 
 a search query inside of a searchComponent.
 
 I am hoping to build my own component and add it as a first-component to the 
 StandardRequestHandler. Then hopefully I will be able to use this component 
 to build the shards parameter of the query and have the Handler then treat 
 the query as a distributed search. Anyone have any experience or know if this 
 is possible?
 
 Thanks,
 Nick
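Nick's prepare-method idea might look roughly like this (a sketch against the Solr 1.x SearchComponent API; the class name and shard list are made up, the SolrInfoMBean boilerplate methods are omitted, and, as he notes, it relies on prepare() running before the shards check, which is not a guaranteed contract):

```java
public class ShardInjectorComponent extends SearchComponent {

  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    // Registered as a first-component: prepare() runs before the handler
    // inspects the shards parameter, so the component can inject it here.
    ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams());
    params.set("shards", "host1:8983/solr,host2:8983/solr"); // illustrative
    rb.req.setParams(params);
  }

  @Override
  public void process(ResponseBuilder rb) throws IOException {
    // Nothing to do at process time; the work happened in prepare().
  }

  @Override
  public String getDescription() {
    return "Injects a shards parameter during prepare()";
  }
}
```

The component would then be listed as a first-component on the StandardRequestHandler, as described in the message above.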
 
 
 

_
Hotmail® has ever-growing storage! Don’t worry about storage limits.
http://windowslive.com/Tutorial/Hotmail/Storage?ocid=TXT_TAGLM_WL_HM_Tutorial_Storage1_052009

adding plug-in after search is done

2009-04-27 Thread siping liu

I'm trying to manipulate search results (e.g. filtering out unwanted documents) and
order the results differently. Where is the right place to do this?
I've been using QueryResponseWriter, but that doesn't seem to be the right place.

thanks.

_
Rediscover Hotmail®: Get quick friend updates right in your inbox. 
http://windowslive.com/RediscoverHotmail?ocid=TXT_TAGLM_WL_HM_Rediscover_Updates2_042009