Question on Solr Distributed Search

vivek sar Thu, 09 Apr 2009 17:02:06 -0700

Hi,

  I've another thread on multi-core distributed search, but just
wanted to put a simple question here on distributed search to get some
response. I've a search query,


   http://etsx19.co.com:8080/solr/20090409_9/select?q=usa     -
returns with 10 result

now if I add "shards" parameter to it,

  
http://etsx19.co.com:8080/solr/20090409_9/select?shards=etsx19.co.com:8080/solr/20090409_9&q=usa
 - this fails with

org.apache.solr.client.solrj.SolrServerException:
java.net.SocketException: Connection reset
org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException:
java.net.SocketException: Connection reset at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282)
at
..
        at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
        at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
        at java.lang.Thread.run(Thread.java:637)
Caused by: org.apache.solr.client.solrj.SolrServerException:
java.net.SocketException: Connection reset
        at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:473)
        at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242)
        at 
org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:422)
..
Caused by: java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:168)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
        at 
org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
        at 
org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
        at 
org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
        at 
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
        at 
org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
        at 
org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)

Attached is my solrconfig.xml. Do I need a special RequestHandler for
sharding? I haven't been able to make any distributed search
successfully. Any help is appreciated.

Note: I'm indexing using Solrj - not sure if that makes any difference
to the search part.

Thanks,
-vivek

<?xml version="1.0" ?>
<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

<config>

  <!-- Used to specify an alternate directory to hold all index data
       other than the default ./data under the Solr home.
       If replication is in use, this should match the replication configuration. -->
  <!--
  <dataDir>./solr/data</dataDir>
  -->

  <indexDefaults>
   <!-- Values here affect all index writers and act as a default unless overridden. -->
    <useCompoundFile>true</useCompoundFile>
    <mergeFactor>100</mergeFactor>
    <!-- <maxBufferedDocs>10000</maxBufferedDocs> -->
    <ramBufferSizeMB>64</ramBufferSizeMB>    
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>10000</maxFieldLength>
    <writeLockTimeout>1000</writeLockTimeout>
    <commitLockTimeout>10000</commitLockTimeout>
    <lockType>single</lockType>
  </indexDefaults>

  <mainIndex>
    <!-- options specific to the main on-disk lucene index -->
    <useCompoundFile>true</useCompoundFile>
    <mergeFactor>100</mergeFactor>
    
    <!-- <maxBufferedDocs>1000</maxBufferedDocs>  -->
    <!-- Tell Lucene when to flush documents to disk.
    Giving Lucene more memory for indexing means faster indexing at the cost of more RAM
    If both ramBufferSizeMB and maxBufferedDocs is set, then Lucene will flush based on whichever limit is hit first.
    -->
    <ramBufferSizeMB>64</ramBufferSizeMB>    
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>10000</maxFieldLength>

    <!-- If true, unlock any held write or commit locks on startup. 
         This defeats the locking mechanism that allows multiple
         processes to safely access a lucene index, and should be
         used with care. -->
    <unlockOnStartup>true</unlockOnStartup>
    <lockType>single</lockType>
  </mainIndex>

  <!-- the default high-performance update handler -->
  <updateHandler class="solr.DirectUpdateHandler2">

    <!-- A prefix of "solr." for class names is an alias that
         causes solr to search appropriate packages, including
         org.apache.solr.(search|update|request|core|analysis)
     -->

    <!-- Perform a <commit/> automatically under certain conditions:
         maxDocs - number of updates since last commit is greater than this
         maxTime - oldest uncommited update (in ms) is this long ago
     -->
     <!--  
    <autoCommit waitSearcher="false" waitFlush="false"> 
      <maxDocs>100000</maxDocs>
      <maxTime>10000</maxTime> 
    </autoCommit>
    -->

    <!-- The RunExecutableListener executes an external command.
         exe - the name of the executable to run
         dir - dir to use as the current working directory. default="."
         wait - the calling thread waits until the executable returns. default="true"
         args - the arguments to pass to the program.  default=nothing
         env - environment variables to set.  default=nothing
      -->
    <!-- A postCommit event is fired after every commit or optimize command
    <listener event="postCommit" class="solr.RunExecutableListener">
      <str name="exe">snapshooter</str>
      <str name="dir">solr/bin</str>
      <bool name="wait">true</bool>
      <arr name="args"> <str>arg1</str> <str>arg2</str> </arr>
      <arr name="env"> <str>MYVAR=val1</str> </arr>
    </listener>
    -->
    <!-- A postOptimize event is fired only after every optimize command, useful
         in conjunction with index distribution to only distribute optimized indicies 
    <listener event="postOptimize" class="solr.RunExecutableListener">
      <str name="exe">snapshooter</str>
      <str name="dir">solr/bin</str>
      <bool name="wait">true</bool>
    </listener>
    -->

  </updateHandler>


  <query>
    <!-- Maximum number of clauses in a boolean query... can affect
        range or prefix queries that expand to big boolean
        queries.  An exception is thrown if exceeded.  -->
    <maxBooleanClauses>1024</maxBooleanClauses>

    
    <!-- Cache used by SolrIndexSearcher for filters (DocSets),
         unordered sets of *all* documents that match a query.
         When a new searcher is opened, its caches may be prepopulated
         or "autowarmed" using data from caches in the old searcher.
         autowarmCount is the number of items to prepopulate.  For LRUCache,
         the autowarmed items will be the most recently accessed items.
       Parameters:
         class - the SolrCache implementation (currently only LRUCache)
         size - the maximum number of entries in the cache
         initialSize - the initial capacity (number of entries) of
           the cache.  (seel java.util.HashMap)
         autowarmCount - the number of entries to prepopulate from
           and old cache.
         -->
    <filterCache
      class="solr.LRUCache"
      size="512"
      initialSize="512"
      autowarmCount="256"/>

   <!-- queryResultCache caches results of searches - ordered lists of
         document ids (DocList) based on a query, a sort, and the range
         of documents requested.  -->
    <queryResultCache
      class="solr.LRUCache"
      size="512"
      initialSize="512"
      autowarmCount="256"/>

  <!-- documentCache caches Lucene Document objects (the stored fields for each document).
       Since Lucene internal document ids are transient, this cache will not be autowarmed.  -->
    <documentCache
      class="solr.LRUCache"
      size="512"
      initialSize="512"
      autowarmCount="0"/>

    <!-- If true, stored fields that are not requested will be loaded lazily.
    -->
    <enableLazyFieldLoading>false</enableLazyFieldLoading>

    <!-- Example of a generic cache.  These caches may be accessed by name
         through SolrIndexSearcher.getCache(),cacheLookup(), and cacheInsert().
         The purpose is to enable easy caching of user/application level data.
         The regenerator argument should be specified as an implementation
         of solr.search.CacheRegenerator if autowarming is desired.  -->
    <!--
    <cache name="myUserCache"
      class="solr.LRUCache"
      size="4096"
      initialSize="1024"
      autowarmCount="1024"
      regenerator="org.mycompany.mypackage.MyRegenerator"
      />
    -->

   <!-- An optimization that attempts to use a filter to satisfy a search.
         If the requested sort does not include score, then the filterCache
         will be checked for a filter matching the query. If found, the filter
         will be used as the source of document ids, and then the sort will be
         applied to that.
    <useFilterForSortedQuery>true</useFilterForSortedQuery>
   -->

   <!-- An optimization for use with the queryResultCache.  When a search
         is requested, a superset of the requested number of document ids
         are collected.  For example, if a search for a particular query
         requests matching documents 10 through 19, and queryWindowSize is 50,
         then documents 0 through 50 will be collected and cached.  Any further
         requests in that range can be satisfied via the cache.  -->
    <queryResultWindowSize>10</queryResultWindowSize>

    <!-- This entry enables an int hash representation for filters (DocSets)
         when the number of items in the set is less than maxSize.  For smaller
         sets, this representation is more memory efficient, more efficient to
         iterate over, and faster to take intersections.  -->
    <HashDocSet maxSize="3000" loadFactor="0.75"/>


    <!-- boolToFilterOptimizer converts boolean clauses with zero boost
         into cached filters if the number of docs selected by the clause exceeds
         the threshold (represented as a fraction of the total index) -->
    <boolTofilterOptimizer enabled="true" cacheSize="32" threshold=".05"/>


    <!-- a newSearcher event is fired whenever a new searcher is being prepared
         and there is a current searcher handling requests (aka registered). -->
    <!-- QuerySenderListener takes an array of NamedList and executes a
         local query request for each NamedList in sequence. -->
    <!--
    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst> <str name="q">solr</str> <str name="start">0</str> <str name="rows">10</str> </lst>
        <lst> <str name="q">rocks</str> <str name="start">0</str> <str name="rows">10</str> </lst>
      </arr>
    </listener>
    -->

    <!-- a firstSearcher event is fired whenever a new searcher is being
         prepared but there is no current registered searcher to handle
         requests or to gain autowarming data from. -->
    <!--
    <listener event="firstSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst> <str name="q">fast_warm</str> <str name="start">0</str> <str name="rows">10</str> </lst>
      </arr>
    </listener>
    -->

    <!-- If a search request comes in and there is no current registered searcher,
         then immediately register the still warming searcher and use it.  If
         "false" then all requests will block until the first searcher is done
         warming. -->
    <useColdSearcher>false</useColdSearcher>

  </query>


 <!-- requestHandler plugins... incoming queries will be dispatched to the
     correct handler based on the path or the qt (query type) param.
     Names starting with a '/' are accessed with the a path equal to the 
     registered name.  Names without a leading '/' are accessed with:
      http://host/app/select?qt=name
     If no qt is defined, the requestHandler that declares default="true"
     will be used.
  -->
  <requestHandler name="standard" class="solr.SearchHandler" default="true">
    <!-- default values for query parameters -->
     <lst name="defaults">
       <str name="echoParams">explicit</str>
       <!--
       <int name="rows">10</int>
       <str name="fl">*</str>
       <str name="version">2.1</str>
        -->
     </lst>
  </requestHandler>

 
  
  
  <requestHandler name="instock" class="solr.DisMaxRequestHandler" >
    <!-- for legacy reasons, DisMaxRequestHandler will assume all init
         params are "defaults" if you don't explicitly specify any defaults.
      -->
     <str name="fq">
        inStock:true
     </str>
     <str name="qf">
        text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
     </str>
     <str name="mm">
        2&lt;-1 5&lt;-2 6&lt;90%
     </str>
  </requestHandler>
  
  <requestHandler name="/update/javabin" class="solr.BinaryUpdateRequestHandler" />

  <!--
   Analysis request handler.  Since Solr 1.3.  Use to returnhow a document is analyzed.  Useful
   for debugging and as a token server for other types of applications
   -->
  <requestHandler name="/analysis" class="solr.AnalysisRequestHandler" />


  <!-- CSV update handler, loaded on demand -->
  <requestHandler name="/update/csv" class="solr.CSVRequestHandler" startup="lazy" />
  
  <requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />
  
  <!-- queryResponseWriter plugins... query responses will be written using the
    writer specified by the 'wt' request parameter matching the name of a registered
    writer.
    The "standard" writer is the default and will be used if 'wt' is not specified 
    in the request. XMLResponseWriter will be used if nothing is specified here.
    The json, python, and ruby writers are also available by default.

    <queryResponseWriter name="standard" class="org.apache.solr.request.XMLResponseWriter"/>
    <queryResponseWriter name="json" class="org.apache.solr.request.JSONResponseWriter"/>
    <queryResponseWriter name="python" class="org.apache.solr.request.PythonResponseWriter"/>
    <queryResponseWriter name="ruby" class="org.apache.solr.request.RubyResponseWriter"/>

    <queryResponseWriter name="custom" class="com.example.MyResponseWriter"/>
  -->

  <!-- XSLT response writer transforms the XML output by any xslt file found
       in Solr's conf/xslt directory.  Changes to xslt files are checked for
       every xsltCacheLifetimeSeconds.  
   -->
  <queryResponseWriter name="xslt" class="org.apache.solr.request.XSLTResponseWriter">
    <int name="xsltCacheLifetimeSeconds">5</int>
  </queryResponseWriter> 
    
  <!-- config for the admin interface --> 
  <admin>
    <defaultQuery>solr</defaultQuery>    
    <!-- pingQuery should be "URLish" ...
         &amp; separated key=val pairs ... but there shouldn't be any
         URL escaping of the values -->
    <!-- <pingQuery>
     qt=dismax&amp;q=solr&amp;start=3&amp;fq=id:[* TO *]&amp;fq=cat:[* TO *]
    </pingQuery> -->
    <!-- configure a healthcheck file for servers behind a loadbalancer
    <healthcheck type="file">server-enabled</healthcheck>
    -->
  </admin>
    
</config>

Question on Solr Distributed Search

Reply via email to