Re: Excessive Heap Usage from docValues?

2014-03-20 Thread Toke Eskildsen
On Wed, 2014-03-19 at 22:01 +0100, tradergene wrote:
 I have a Solr index with about 32 million docs.  Each doc is relatively
 small but has multiple dynamic fields that are storing INTs.  The initial
 problem that I had to resolve is that we were running into OOMs (on a 48GB
 heap, 130GB on-disk index).  I narrowed that issue down to Lucene FieldCache
 filling up the heap due to all the dynamic fields.

48GB heap for a 130GB, 32M docs index sounds excessive.  Could you tell
us how many unique fields your searcher uses in total for faceting and
maybe the overall layout of your index? Is this perhaps a case of many
distinct groups of data put in the same index, where the searches are
always within a single group and each group has its own fields for
faceting? Are the fields single- or multi-valued?
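
If the plan is to move those fields onto docValues, a declaration would look
roughly like this (field name and type are illustrative only):

  <dynamicField name="*_facet_i" type="int" indexed="true" stored="false"
                docValues="true"/>

With docValues the un-inverted data lives in disk-backed structures cached by
the OS instead of sitting in the FieldCache on the Java heap, though it does
require reindexing.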

- Toke Eskildsen, State and University Library, Denmark




Re: w/10 ? [was: Partial Counts in SOLR]

2014-03-20 Thread Salman Akram
Yup!


On Thu, Mar 20, 2014 at 5:13 AM, Otis Gospodnetic 
otis.gospodne...@gmail.com wrote:

 Hi,

 Guessing it's surround query parser's support for within backed by span
 queries.

 Otis
 Solr & ElasticSearch Support
 http://sematext.com/
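 
 A rough Solr-side equivalent with the surround query parser (hedged: assuming
 the surround parser is available and a default search field is configured;
 terms taken from the sample) would be something like:
 
 q={!surround}(stock OR share*) 10W (sale OR sell* OR sold OR bought OR buy*)
 
 where nW is the ordered within-n operator and nN is its unordered variant.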
 On Mar 19, 2014 4:44 PM, T. Kuro Kurosaka k...@healthline.com wrote:

  In the thread Partial Counts in SOLR, Salman gave us this sample query:
 
   ((stock or share*) w/10 (sale or sell* or sold or bought or buy* or
  purchase* or repurchase*)) w/10 (executive or director)
 
 
  I'm not familiar with this w/10 notation. What does this mean,
  and what parser(s) supports this syntax?
 
  Kuro
 
 




-- 
Regards,

Salman Akram


Re: Solr memory usage off-heap

2014-03-20 Thread Avishai Ish-Shalom
thanks!


On Tue, Mar 18, 2014 at 4:37 PM, Erick Erickson erickerick...@gmail.com wrote:

 Avishai:

 It sounds like you already understand mmap. Even so you might be
 interested in this excellent writeup of MMapDirectory and Lucene by
 Uwe:
 http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

 Best,
 Erick

 On Tue, Mar 18, 2014 at 7:23 AM, Avishai Ish-Shalom
 avis...@fewbytes.com wrote:
  aha! mmap explains it. thank you.
 
 
  On Tue, Mar 18, 2014 at 3:11 PM, Shawn Heisey s...@elyograg.org wrote:
 
  On 3/18/2014 5:30 AM, Avishai Ish-Shalom wrote:
   My solr instances are configured with 10GB heap (Xmx) but linux shows
   resident size of 16-20GB. even with thread stack and permgen taken
 into
   account i'm still far off from these numbers. Could it be that jvm IO
   buffers take so much space? does lucene use JNI/JNA memory
 allocations?
 
  Solr does not do anything off-heap.  There is a project called
  heliosearch underway that aims to use off-heap memory extensively with
  Solr.
 
  There IS some mis-reporting of memory usage, though.  See a screenshot
  that I just captured of top output, sorted by memory usage.  The java
  process at the top of the list is Solr, running under the included
 Jetty:
 
  https://www.dropbox.com/s/03a3pp510mrtixo/solr-ram-usage-wrong.png
 
  I have a 6GB heap and 52GB of index data on this server.  This makes the
  62.2GB virtual memory size completely reasonable.  The claimed resident
  memory size is 20GB, though.  If you add that 20GB to the 49GB that is
  allocated to the OS disk cache and the 6GB that it says is free, that's
  75GB.  I've only got 64GB of RAM on the box, so something is being
  reported wrong.
 
  If I take my 20GB resident size and subtract the 14GB shared size, that
  is closer to reality, and it makes the numbers fit into the actual
  amount of RAM that's on the machine.  I believe the misreporting is
  caused by the specific way that Java uses MMap when opening Lucene
  indexes.  This information comes from what I remember about a
  conversation I witnessed in #lucene or #lucene-dev, not from my own
  exploration.  I believe they said that the MMap methods which don't
  misreport memory usage would not do what Lucene requires.
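 
  To put rough numbers on this particular box: 20GB resident minus the 14GB
  shared size is about 6GB, which is close to the configured heap; the other
  ~14GB is mmapped index data that really lives in the OS page cache, so top
  is effectively counting it twice.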
 
  Thanks,
  Shawn
 
 



wrong query results with wdf and ngtf

2014-03-20 Thread Andreas Owen
Is there a way to tell NGramFilterFactory at index time that numbers shall
never be tokenized? Then the query should be able to find numbers.

Or do I have to change the minimum gram size for numbers (not alpha) to 1, if
that is possible? In other words, index the whole number as a single token and
not all possible grams.

Solr's analysis page shows that only WDF has no underscore in its tokens; the
rest keep it. Can I tell the query to treat numbers differently with NGTF, WT,
LCF or whatever?

I also tried <filter class="solr.WordDelimiterFilterFactory"
types="at-under-alpha.txt"/> with:
@ => ALPHA
_ => ALPHA

I have gotten nearly everything to work. There are two queries where I don't
get back what I want:

"avaloq frage 1"  - only returns results if I set minGramSize=1 while indexing
"yh_cug"          - the query parser doesn't remove the _ but the indexer
                    (WDF) does, so there is no match

Is there a way to also query the whole term "avaloq frage 1" without
tokenizing it?

Fieldtype:

<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" types="at-under-alpha.txt"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_de.txt" format="snowball"
            enablePositionIncrements="true"/> <!-- remove common words -->
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German"/>
            <!-- remove noun/adjective inflections like plural endings -->
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="15"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhiteSpaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" types="at-under-alpha.txt"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_de.txt" format="snowball"
            enablePositionIncrements="true"/> <!-- remove common words -->
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German"/>
  </analyzer>
</fieldType>


Solrconfig:

<queryParser name="synonym_edismax"
             class="solr.SynonymExpandingExtendedDismaxQParserPlugin">
  <lst name="synonymAnalyzers">
    <lst name="myCoolAnalyzer">
      <lst name="tokenizer">
        <str name="class">standard</str>
      </lst>
      <lst name="filter">
        <str name="class">shingle</str>
        <str name="outputUnigramsIfNoShingles">true</str>
        <str name="outputUnigrams">true</str>
        <str name="minShingleSize">2</str>
        <str name="maxShingleSize">4</str>
      </lst>
      <lst name="filter">
        <str name="class">synonym</str>
        <str name="tokenizerFactory">solr.KeywordTokenizerFactory</str>
        <str name="synonyms">synonyms.txt</str>
        <str name="expand">true</str>
        <str name="ignoreCase">true</str>
      </lst>
    </lst>
  </lst>
</queryParser>

<requestHandler name="/select2" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="defType">synonym_edismax</str>
    <str name="synonyms">true</str>
    <str name="qf">plain_text^10 editorschoice^200
         title^20 h_*^14
         tags^10 thema^15 inhaltstyp^6 breadcrumb^6 doctype^10
         contentmanager^5 links^5
         last_modified^5 url^5
    </str>
    <str name="bq">(expiration:[NOW TO *] OR (*:* -expiration:*))^6</str>
    <str name="bf">div(clicks,max(displays,1))^8</str> <!-- tested -->

    <str name="df">text</str>
    <str name="fl">*,path,score</str>
    <str name="wt">json</str>
    <str name="q.op">AND</str>

    <!-- Highlighting defaults -->
    <str name="hl">on</str>
    <str name="hl.fl">plain_text,title</str>
    <str name="hl.fragSize">200</str>
    <str name="hl.simple.pre">&lt;b&gt;</str>
    <str name="hl.simple.post">&lt;/b&gt;</str>

    <!-- <lst name="invariants"> -->
    <str name="facet">on</str>
    <str name="facet.mincount">1</str>
    <str name="facet.field">{!ex=inhaltstyp_s}inhaltstyp_s</str>
    <str name="f.inhaltstyp_s.facet.sort">index</str>
    <str name="facet.field">{!ex=doctype}doctype</str>
    <str name="f.doctype.facet.sort">index</str>
    <str name="facet.field">{!ex=thema_f}thema_f</str>
    <str name="f.thema_f.facet.sort">index</str>
    <str name="facet.field">{!ex=author_s}author_s</str>
    <str name="f.author_s.facet.sort">index</str>
    <str name="facet.field">{!ex=sachverstaendiger_s}sachverstaendiger_s</str>
    <str name="f.sachverstaendiger_s.facet.sort">index</str>
    <str name="facet.field">{!ex=veranstaltung_s}veranstaltung_s</str>
    <str name="f.veranstaltung_s.facet.sort">index</str>
    <str name="facet.date">{!ex=last_modified}last_modified</str>
    <str name="facet.date.gap">+1MONTH</str>
    <str name="facet.date.end">NOW/MONTH+1MONTH</str>
    <str name="facet.date.start">NOW/MONTH-36MONTHS</str>
 

Re: join and filter query with AND

2014-03-20 Thread Marcin Rzewucki
Nope. There is no line break in the string and it is not fed from a file.
What else could be the reason?



On 19 March 2014 17:57, Erick Erickson erickerick...@gmail.com wrote:

 It looks to me like you're feeding this from some
 kind of text file and you really _do_ have a
 line break after Stara

 Or have a line break in the string you paste into the URL
 or something similar.

 Kind of shooting in the dark though.

 Erick

 On Wed, Mar 19, 2014 at 8:48 AM, Marcin Rzewucki mrzewu...@gmail.com
 wrote:
  Hi,
 
  I have the following issue with join query parser and filter query. For
  such query:
 
  <str name="q">*:*</str>
  <str name="fq">
  (({!join from=inner_id to=outer_id fromIndex=othercore}city:Stara
  Zagora)) AND (prod:214)
  </str>
 
  I got error:
  <lst name="error">
  <str name="msg">
  org.apache.solr.search.SyntaxError: Cannot parse 'city:Stara': Lexical
  error at line 1, column 12. Encountered: <EOF> after : \Stara
  </str>
  <int name="code">400</int>
  </lst>
 
  Stack:
  DEBUG - 2014-03-19 13:35:20.825;
 org.eclipse.jetty.servlet.ServletHandler;
  chain=SolrRequestFilter-default
  DEBUG - 2014-03-19 13:35:20.826;
  org.eclipse.jetty.servlet.ServletHandler$CachedChain; call filter
  SolrRequestFilter
  ERROR - 2014-03-19 13:35:20.828; org.apache.solr.common.SolrException;
  org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError:
  Cannot parse 'city:Stara': Lexical error at line 1, column 12.  E
  ncountered: EOF after : \Stara
  at
 
 org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:179)
  at
 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:193)
  at
 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
  at
 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
  at
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
  at
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
  at
 
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
  at
 
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
  at
 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
  at
 
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
  at
 
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
  at
 
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
  at
  org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
  at
 
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
  at
 
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
  at
 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
  at
 
 org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
  at
 
 org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
  at
 
 org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
  at org.eclipse.jetty.server.Server.handle(Server.java:364)
  at
 
 org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
  at
 
 org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
  at
 
 org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
  at
 
 org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
  at
 org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
  at
  org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
  at
 
 org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
  at
 
 org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
  at
 
 org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
  at
 
 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
  at java.lang.Thread.run(Thread.java:744)
  Caused by: org.apache.solr.search.SyntaxError: Cannot parse
 'city:Stara':
  Lexical error at line 1, column 12.  Encountered: EOF after : \Stara
  at
 
 org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBase.java:159)
  at
 org.apache.solr.search.LuceneQParser.parse(LuceneQParser.java:50)
  at org.apache.solr.search.QParser.getQuery(QParser.java:141)
  at
 
 

Problems with the Suggest Request Handler in Solr 4.7.0

2014-03-20 Thread Steve Huckle
The Suggest Search Component that comes preconfigured in Solr 4.7.0 
solrconfig.xml seems to thread dump when I call it:


http://localhost:8983/solr/suggest?spellcheck=on&q=ac&wt=json&indent=true

msg:No suggester named default was configured

Can someone tell me what's going on there?

However, I can stop that happening if I replace the preconfigured 
Suggest Search Component and Request Handler with the Search Component 
and Request Handler configuration detailed here:


https://cwiki.apache.org/confluence/display/solr/Suggester

...but after indexing the data in exampledocs, it doesn't seem to return 
any suggestions either. Can anyone help suggest how I might get suggest 
suggesting suggestions?
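
For reference, the cwiki page wires up a *named* suggester, so the component
and the request have to agree on that name. A hedged sketch along those lines
(component, field and dictionary names here are illustrative, not necessarily
what the page uses verbatim):

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">cat</str>
    <str name="suggestAnalyzerFieldType">string</str>
  </lst>
</searchComponent>

http://localhost:8983/solr/suggest?suggest=true&suggest.dictionary=mySuggester&suggest.build=true&suggest.q=ac&wt=json

As far as I understand, the suggester also has to be built once after indexing
(suggest.build=true or buildOnCommit) before it will return anything.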


--
Steve Huckle

If you print this email, eventually you'll want to throw it away. But there is 
no away. So don't print this email, even if you have to.



Solr dih to read Clob contents

2014-03-20 Thread Prasi S
Hi,
I have a requirement to index a database table with CLOB content. Each row
in my table has a column which is XML stored as a CLOB. I want to read the
contents of the XML through DIH and map each XML tag to a separate Solr
field.

Below is my CLOB content:
<root>
   <author>A</author>
   <date>02-Dec-2013</date>
   ...
</root>

I want to read the contents of the CLOB and map author to author_solr and
date to date_solr. Is this possible with a ClobTransformer or a
ScriptTransformer?


Thanks,
Prasi


Re: Solr dih to read Clob contents

2014-03-20 Thread Gora Mohanty
On 20 March 2014 14:53, Prasi S prasi1...@gmail.com wrote:

 Hi,
 I have a requirement to index a database table with CLOB content. Each row
 in my table has a column which is XML stored as a CLOB. I want to read the
 contents of the XML through DIH and map each XML tag to a separate Solr
 field.

 Below is my CLOB content:
 <root>
    <author>A</author>
    <date>02-Dec-2013</date>
    ...
 </root>

 I want to read the contents of the CLOB and map author to author_solr and
 date to date_solr. Is this possible with a ClobTransformer or a
 ScriptTransformer?

You will need to use a FieldReaderDataSource and an XPathEntityProcessor
along with the ClobTransformer. You do not provide details of your DIH data
configuration file, but it should look something like this:

<dataSource name="xmldata" type="FieldReaderDataSource"/>
...
<document>
  <entity name="x" query="..." transformer="ClobTransformer">
     <entity name="y" dataSource="xmldata" dataField="x.clob_column"
             processor="XPathEntityProcessor" forEach="/root">
       <field column="author_solr" xpath="/author" />
       <field column="date_solr" xpath="/date" />
     </entity>
  </entity>
</document>

Regards,
Gora


Re: Solr4.7 No live SolrServers available to handle this request

2014-03-20 Thread Greg Walters
Sathya,

I assume you're using Solr Cloud. Please provide your clusterstate.json while 
you're seeing this issue and check your logs for any exceptions. With no 
information from you it's hard to troubleshoot any issues!
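
A quick way to grab it (hedged; host and port are illustrative) is the
ZooKeeper servlet on any node, e.g.
http://host:8983/solr/zookeeper?detail=true&path=%2Fclusterstate.json
or the Cloud > Tree view in the admin UI.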

Thanks,
Greg

On Mar 20, 2014, at 12:44 AM, Sathya sathia.blacks...@gmail.com wrote:

 Hi Friends,
 
 I am new to Solr. I have 5 Solr nodes on 5 different machines. When I index
 the data, sometimes a *No live SolrServers available to handle this request*
 exception occurs on 1 or 2 machines.
 
 I don't know why this happens or how to solve it. Kindly help me to solve
 this issue.
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr4-7-No-live-SolrServers-available-to-handle-this-request-tp4125679.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Bootstrapping SolrCloud cluster with multiple collections in different sharding/replication setup

2014-03-20 Thread Ugo Matrangolo
Hi,

I would like some advice about the best way to bootstrap from scratch a
SolrCloud cluster housing at least two collections with different
sharding/replication setup.

Going through the docs/'Solr In Action' book, what I have seen so far is
that there is a way to bootstrap a SolrCloud cluster with sharding
configuration using the:

  -DnumShards=2

but this (afaik) works only for a single collection. What I need is a way
to deploy from scratch a SolrCloud cluster housing (e.g.) two collections
Foo and Bar where Foo has only one shard and is replicated everywhere while
Bar has three shards and, again, is replicated.

I can't find a config file where to put this sharding plan and I'm starting
to think that the only way to do this is after the deploy using the
Collections API.

Is there a best approach way to do this ?

Ugo


Re: join and filter query with AND

2014-03-20 Thread Erick Erickson
Well, the error message really looks like your input is
getting chopped off.

It's vaguely possible that you have some super-low limit
in your servlet container configuration that is only letting very
small packets through.

What I'd do is look in the Solr log file to see exactly what
is coming through. Because regardless of what you _think_
you're sending, it _really_ looks like Solr is getting the fq
clause with something that breaks it up. So I'd like to
absolutely nail that as being wrong before speculating.

Because I can cut/paste your fq clause just fine. Of course
it fails because I don't have the other core defined, but that
means the query has made it through query parsing while
yours hasn't in your setup.

Best,
Erick

On Thu, Mar 20, 2014 at 2:19 AM, Marcin Rzewucki mrzewu...@gmail.com wrote:
 Nope. There is no line break in the string and it is not feed from file.
 What else could be the reason ?



 On 19 March 2014 17:57, Erick Erickson erickerick...@gmail.com wrote:

 It looks to me like you're feeding this from some
 kind of text file and you really _do_ have a
 line break after Stara

 Or have a line break in the string you paste into the URL
 or something similar.

 Kind of shooting in the dark though.

 Erick

 On Wed, Mar 19, 2014 at 8:48 AM, Marcin Rzewucki mrzewu...@gmail.com
 wrote:
  Hi,
 
  I have the following issue with join query parser and filter query. For
  such query:
 
  <str name="q">*:*</str>
  <str name="fq">
  (({!join from=inner_id to=outer_id fromIndex=othercore}city:Stara
  Zagora)) AND (prod:214)
  </str>
 
  I got error:
  <lst name="error">
  <str name="msg">
  org.apache.solr.search.SyntaxError: Cannot parse 'city:Stara': Lexical
  error at line 1, column 12. Encountered: <EOF> after : \Stara
  </str>
  <int name="code">400</int>
  </lst>
 
  Stack:
  DEBUG - 2014-03-19 13:35:20.825;
 org.eclipse.jetty.servlet.ServletHandler;
  chain=SolrRequestFilter-default
  DEBUG - 2014-03-19 13:35:20.826;
  org.eclipse.jetty.servlet.ServletHandler$CachedChain; call filter
  SolrRequestFilter
  ERROR - 2014-03-19 13:35:20.828; org.apache.solr.common.SolrException;
  org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError:
  Cannot parse 'city:Stara': Lexical error at line 1, column 12.  E
  ncountered: EOF after : \Stara
  at
 
 org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:179)
  at
 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:193)
  at
 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
  at
 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
  at
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
  at
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
  at
 
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
  at
 
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
  at
 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
  at
 
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
  at
 
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
  at
 
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
  at
  org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
  at
 
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
  at
 
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
  at
 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
  at
 
 org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
  at
 
 org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
  at
 
 org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
  at org.eclipse.jetty.server.Server.handle(Server.java:364)
  at
 
 org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
  at
 
 org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
  at
 
 org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
  at
 
 org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
  at
 org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
  at
  org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
  at
 
 

Re: Bootstrapping SolrCloud cluster with multiple collections in different sharding/replication setup

2014-03-20 Thread Mark Miller
Honestly, the best approach is to start with no collections defined and use the 
collections api.

If you want to preconfigure (which has its warts and will likely go away as 
an option), it's tricky to do it with different numShards, as that is a global 
property per node.

You would basically set -DnumShards=1 and start your cluster with Foo defined. 
Then you stop the cluster and define Bar and start with -DnumShards=3.

The ability to preconfigure and bootstrap like this was kind of a transitional 
system meant to help people that knew Solr pre SolrCloud get something up 
quickly back before we had a collections api.

The collections API is much better if you want multiple collections and it’s 
the future.
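
For example, the two collections above could then be created directly with
something like this (hosts, config set names and replica counts illustrative):

http://host:8983/solr/admin/collections?action=CREATE&name=Foo&numShards=1&replicationFactor=4&collection.configName=foo_conf
http://host:8983/solr/admin/collections?action=CREATE&name=Bar&numShards=3&replicationFactor=2&collection.configName=bar_conf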
-- 
Mark Miller
about.me/markrmiller

On March 20, 2014 at 10:24:18 AM, Ugo Matrangolo (ugo.matrang...@gmail.com) 
wrote:

Hi,  

I would like some advice about the best way to bootstrap from scratch a  
SolrCloud cluster housing at least two collections with different  
sharding/replication setup.  

Going through the docs/'Solr In Action' book what I have sees so far is  
that there is a way to bootstrap a SolrCloud cluster with sharding  
configuration using the:  

-DnumShards=2  

but this (afaik) works only for a single collection. What I need is a way  
to deploy from scratch a SolrCloud cluster housing (e.g.) two collections  
Foo and Bar where Foo has only one shard and is replicated everywhere while  
Bar has three shards and ,again, is replicated.  

I can't find a config file where to put this sharding plan and I'm starting  
to think that the only way to do this is after the deploy using the  
Collections API.  

Is there a best approach way to do this ?  

Ugo  


Re: Bootstrapping SolrCloud cluster with multiple collections in different sharding/replication setup

2014-03-20 Thread Erick Erickson
You might find this useful:
http://heliosearch.org/solrcloud-assigning-nodes-machines/


It uses the collections API to create your collection with zero
nodes, then shows how to assign your leaders to specific
machines (well, at least specify the nodes the leaders will
be created on, it doesn't show how to assign, for instance,
shard1 to nodeX)
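
(For reference, the CREATE call also accepts a createNodeSet parameter that
restricts which nodes the initial replicas land on; hosts and names below are
illustrative:
http://host:8983/solr/admin/collections?action=CREATE&name=mycoll&numShards=2&replicationFactor=1&createNodeSet=host1:8983_solr,host2:8983_solr )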

It also shows a way to assign specific replicas on specific nodes
to specific shards, although as Mark says this is a transitional
technique. I know there's an addreplica command in the works
for the collections API that should make this easier, but that's
not released yet.

Best,
Erick


On Thu, Mar 20, 2014 at 7:23 AM, Ugo Matrangolo
ugo.matrang...@gmail.com wrote:
 Hi,

 I would like some advice about the best way to bootstrap from scratch a
 SolrCloud cluster housing at least two collections with different
 sharding/replication setup.

 Going through the docs/'Solr In Action' book what I have sees so far is
 that there is a way to bootstrap a SolrCloud cluster with sharding
 configuration using the:

   -DnumShards=2

 but this (afaik) works only for a single collection. What I need is a way
 to deploy from scratch a SolrCloud cluster housing (e.g.) two collections
 Foo and Bar where Foo has only one shard and is replicated everywhere while
 Bar has three shards and ,again, is replicated.

 I can't find a config file where to put this sharding plan and I'm starting
 to think that the only way to do this is after the deploy using the
 Collections API.

 Is there a best approach way to do this ?

 Ugo


Multilingual indexing, search results, edismax and stopwords

2014-03-20 Thread kastania44
On our Drupal multilingual system we use Apache Solr 3.5.
The problem is well known on the various blogs and sites I read:
the search results are not the ones we want.
In our hook_apachesolr_query_alter implementation we override the default operator:
$query->replaceParam('mm', '90%');
The requirement is: when I search for "biological analyses", I want to fetch
only the results which contain both words.
When I search for "biological and chemical analyses", I want to fetch only
the results which contain biological, chemical and analyses. The "and" is not
indexed because it is a stopword.

If I set mm to 100% and my query contains stopwords, it fetches no results.
If I set mm to 100% and my query contains no stopwords, it fetches the
desired results.
If I set mm to anything between 50% and 99%, it fetches unwanted results,
i.e. results that contain only one of the searched keywords, or words similar
to the searched keywords, such as analyse (even if I searched for analyses).

If I search with + before the mandatory words it works OK, but it is not
user friendly to ask the user to type + before every word except the
stopwords.

Do I make any sense?

Below are some of our configuration details:

All the indexed fields are of type text_<language>, e.g. from our schema.xml:
<field name="label" type="text" indexed="true" stored="true"
       termVectors="true" omitNorms="true"/>
<field name="i18n_label_en" type="text_en" indexed="true" stored="true"
       termVectors="true" omitNorms="true"/>
<field name="i18n_label_fr" type="text_fr" indexed="true" stored="true"
       termVectors="true" omitNorms="true"/>
All the text field types have the same configuration except for the
protected, words, and dictionary parameters, which are language specific.
E.g. from our schema.xml:
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent_en.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_en.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" protected="protwords.txt"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
            preserveOriginal="1" splitOnNumerics="1" stemEnglishPossessive="1"/>
    <filter class="solr.LengthFilterFactory" min="2" max="100"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
            dictionary="compoundwords_en.txt" minWordSize="5" minSubwordSize="4"
            maxSubwordSize="15" onlyLongestMatch="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords_en.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent_en.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_en.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_en.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" protected="protwords.txt"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
            preserveOriginal="1" splitOnNumerics="1" stemEnglishPossessive="1"/>
    <filter class="solr.LengthFilterFactory" min="2" max="100"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
            dictionary="compoundwords_en.txt" minWordSize="5" minSubwordSize="4"
            maxSubwordSize="15" onlyLongestMatch="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords_en.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

 <solrQueryParser defaultOperator="AND"/>

solrconfig.xml

<requestHandler name="pinkPony" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">explicit</str>
    <bool name="omitHeader">true</bool>
    <float name="tie">0.01</float>

    <int name="timeAllowed">${solr.pinkPony.timeAllowed:-1}</int>
    <str name="q.alt">*:*</str>

    <str name="spellcheck">false</str>

    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.extendedResults">false</str>

    <str name="spellcheck.count">1</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>


ANY ideas are appreciated!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multilingual-indexing-search-results-edismax-and-stopwords-tp4125746.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Filter in terms component

2014-03-20 Thread Jilani Shaik
Will it work for multi-valued fields? It gives an error saying that the
FieldCache will not work for multi-valued fields, and most of the data in
the index is in multi-valued fields.

Thanks,
Jilani




On Thu, Mar 20, 2014 at 1:53 AM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,

 If you just need counts may be you can make use of
 http://wiki.apache.org/solr/FunctionQuery#Relevance_Functions

 Ahmet



 On Wednesday, March 19, 2014 9:49 PM, Jilani Shaik jilani24...@gmail.com
 wrote:
 Hi Ahmet,

 I have gone through the facet component, as our application has 300+
 million docs and it very time consuming with this component and also it
 uses cache. So I have gone through the terms component where Solr is
 reading index for field terms, is there any approach where I can get the
 terms using the filter. So that I can restrict some of the document terms
 in counts.

 Basically we have set of documents where we want to show the terms count
 based on those filters with set name. Instead of reading entire index.

 Please let me know if you need any details to throw some more pointers

 Thanks,
 Jilani



 On Thu, Mar 20, 2014 at 1:04 AM, Ahmet Arslan iori...@yahoo.com wrote:

  Hi Jilani,
 
  What features of terms component are you after? If if it is just
  terms.prefix, it could be simulated with facet component with
 facet.prefix
  parameter. faceting component respects filter queries.
 
 
 
  On Wednesday, March 19, 2014 8:58 PM, Jilani Shaik 
 jilani24...@gmail.com
  wrote:
  Hi,
 
  I have huge index and using Solr. I need terms component with filter by a
  field. Please let me know is there anything that I can get it.
 
  Please provide me some pointers, even to develop this by going through
 the
  Lucene.
 
  Please suggest.
 
  Thanks,
  Jilani
 
 




understand debuginfo from query

2014-03-20 Thread aowen
I want the info simplified so that the user can see why a doc was found.

Below is the output for one doc:

0.085597195 = (MATCH) sum of:
  0.083729245 = (MATCH) max of:
0.0019158133 = (MATCH) weight(plain_text:test^10.0 in 601) 
[DefaultSimilarity], result of:
  0.0019158133 = score(doc=601,freq=9.0 = termFreq=9.0
), product of:
0.022560213 = queryWeight, product of:
  10.0 = boost
  3.6232536 = idf(docFreq=81, maxDocs=1130)
  6.2265067E-4 = queryNorm
0.084920004 = fieldWeight in 601, product of:
  3.0 = tf(freq=9.0), with freq of:
9.0 = termFreq=9.0
  3.6232536 = idf(docFreq=81, maxDocs=1130)
  0.0078125 = fieldNorm(doc=601)
0.083729245 = (MATCH) weight(inhaltstyp:test^6.0 in 601) 
[DefaultSimilarity], result of:
  0.083729245 = score(doc=601,freq=1.0 = termFreq=1.0
), product of:
0.017686278 = queryWeight, product of:
  6.0 = boost
  4.734136 = idf(docFreq=26, maxDocs=1130)
  6.2265067E-4 = queryNorm
4.734136 = fieldWeight in 601, product of:
  1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
  4.734136 = idf(docFreq=26, maxDocs=1130)
  1.0 = fieldNorm(doc=601)
0.013458222 = (MATCH) weight(title:test^20.0 in 601) [DefaultSimilarity], 
result of:
  0.013458222 = score(doc=601,freq=1.0 = termFreq=1.0
), product of:
0.042281017 = queryWeight, product of:
  20.0 = boost
  3.395244 = idf(docFreq=102, maxDocs=1130)
  6.2265067E-4 = queryNorm
0.31830412 = fieldWeight in 601, product of:
  1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
  3.395244 = idf(docFreq=102, maxDocs=1130)
  0.09375 = fieldNorm(doc=601)
  0.001867952 = (MATCH) product of:
0.003735904 = (MATCH) sum of:
  0.003735904 = (MATCH) ConstantScore(expiration:[1395328539325 TO *]), 
product of:
1.0 = boost
0.003735904 = queryNorm
0.5 = coord(1/2)
  0.0 = (MATCH) FunctionQuery(div(int(clicks),max(int(displays),const(1, 
product of:
0.0 = div(int(clicks)=0,max(int(displays)=432,const(1)))
8.0 = boost
6.2265067E-4 = queryNorm 


Why is the sum 0.085597195? That would mean 0.083729245 + 0.001867952, and
these are not included in the sum: 0.0019158133 + 0.013458222 + 0.003735904.

Am I looking at the wrong total?
Aren't the two cases I have to sum up the lines "x = (MATCH) sum of" and
"x = score(" ?

I'm trying to extract the fields that were used to weight the doc.
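
Working it through against the output above: the top-level node is a sum of
two children, the "max of" block (which keeps only the largest of its three
per-field scores, 0.083729245) and the boost product (0.001867952), so
0.083729245 + 0.001867952 is roughly the 0.085597195 total; the other
per-field scores inside the "max of" block are simply not added.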



Shingles in Solr for bigrams, trigrams in parsed_query

2014-03-20 Thread Jyotirmoy Sundi
Hi Folks,
   I am using shingles to index bigrams/trigrams. The same is also used
for the query in the schema.xml file. But when I run the query in debug mode
for a collection, I don't see the bigrams in the parsed_query. Any idea
what I might be missing?
solr/collection/select?q=best%20price&debugQuery=on

<str name="parsedquery_toString">text:best text:price</str>
I was hoping to see
<str name="parsedquery_toString">text:best text:price text:best price</str>

My schema file looks like this:
<types>
  <fieldType name="string" class="solr.StrField" sortMissingLast="true"
             omitNorms="true"/>
  <fieldType name="int" class="solr.TrieIntField" precisionStep="0"
             omitNorms="true" positionIncrementGap="0"/>

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <charFilter class="solr.HTMLStripCharFilterFactory"/>
      <filter class="solr.ShingleFilterFactory" minShingleSize="2"
              maxShingleSize="4" outputUnigrams="true"/>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.LengthFilterFactory" min="3" max="50"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
              generateNumberParts="0" catenateWords="1" catenateNumbers="1"
              catenateAll="1" preserveOriginal="1" splitOnCaseChange="0"
              splitOnNumerics="0" stemEnglishPossessive="1"/>
      <filter class="solr.StopFilterFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="true"/>
      <filter class="solr.TrimFilterFactory"/>
    </analyzer>

    <analyzer type="query">
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.LengthFilterFactory" min="3" max="50"/>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.StopFilterFactory"/>
      <filter class="solr.TrimFilterFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="true"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
              generateNumberParts="1" catenateWords="1" catenateNumbers="1"
              catenateAll="1" splitOnCaseChange="0" splitOnNumerics="0"
              stemEnglishPossessive="1"/>
      <filter class="solr.ShingleFilterFactory" minShingleSize="2"
              maxShingleSize="4" outputUnigrams="true"/>
      <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt"
              ignoreCase="true"/>
      <!-- <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt"
              ignoreCase="true"/>
      <filter class="solr.ShingleFilterFactory" minShingleSize="2"
              maxShingleSize="4" outputUnigrams="true"/> -->
    </analyzer>
  </fieldType>
</types>



-- 
Best Regards,
Jyotirmoy Sundi


Re: Parallel queries to Solr

2014-03-20 Thread solr2020
Thanks Shawn. When we run any SolrJ application, the message below is
displayed:

org.apache.solr.client.solrj.impl.HttpClientUtil createClient
INFO: Creating new http client,
config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false

and while restarting solr we are getting this message.

org.apache.solr.client.solrj.impl.HttpClientUtil; Creating new http client,
config:maxConnectionsPerHost=20&maxConnections=1&socketTimeout=0&connTimeout=0&retry=false



Is this indicating the default number of HTTP connections? And can it be
overridden by adding the code below?

 ModifiableSolrParams params = new ModifiableSolrParams();
 params.add(HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, "300");
 params.add(HttpClientUtil.PROP_MAX_CONNECTIONS, "5000");
 HttpClient httpClient = HttpClientUtil.createClient(params);
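 
 ...and then, as a hedged sketch (URL and collection name illustrative), hand
 the client to SolrJ:
 
 SolrServer server = new HttpSolrServer("http://host:8983/solr/collection1", httpClient);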

Thanks.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Parallel-queries-to-Solr-tp4119959p4125806.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Bootstrapping SolrCloud cluster with multiple collections in different sharding/replication setup

2014-03-20 Thread Jeff Wartes

Please note that although the article talks about the ADDREPLICA command,
that feature is coming in Solr 4.8, so don't be confused if you can't find
it yet. See https://issues.apache.org/jira/browse/SOLR-5130



On 3/20/14, 7:45 AM, Erick Erickson erickerick...@gmail.com wrote:

You might find this useful:
http://heliosearch.org/solrcloud-assigning-nodes-machines/


It uses the collections API to create your collection with zero
nodes, then shows how to assign your leaders to specific
machines (well, at least specify the nodes the leaders will
be created on, it doesn't show how to assign, for instance,
shard1 to nodeX)

It also shows a way to assign specific replicas on specific nodes
to specific shards, although as Mark says this is a transitional
technique. I know there's an addreplica command in the works
for the collections API that should make this easier, but that's
not released yet.

Best,
Erick


On Thu, Mar 20, 2014 at 7:23 AM, Ugo Matrangolo
ugo.matrang...@gmail.com wrote:
 Hi,

 I would like some advice about the best way to bootstrap from scratch a
 SolrCloud cluster housing at least two collections with different
 sharding/replication setup.

 Going through the docs/'Solr In Action' book what I have sees so far is
 that there is a way to bootstrap a SolrCloud cluster with sharding
 configuration using the:

   -DnumShards=2

 but this (afaik) works only for a single collection. What I need is a
way
 to deploy from scratch a SolrCloud cluster housing (e.g.) two
collections
 Foo and Bar where Foo has only one shard and is replicated everywhere
while
 Bar has three shards and ,again, is replicated.

 I can't find a config file where to put this sharding plan and I'm
starting
 to think that the only way to do this is after the deploy using the
 Collections API.

 Is there a best approach way to do this ?

 Ugo



Re: Filter in terms component

2014-03-20 Thread Jilani Shaik
Hi,

Please provide some more pointers to go ahead in addressing this.

Thanks,
Jilani


On Thu, Mar 20, 2014 at 8:50 PM, Jilani Shaik jilani24...@gmail.com wrote:


 Will it work for multi value fields, It is saying that Field Cache will
 not work for multi value fields error. Most of the data is multi value
 fields in index.

 Thanks,
 Jilani




 On Thu, Mar 20, 2014 at 1:53 AM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,

 If you just need counts may be you can make use of
 http://wiki.apache.org/solr/FunctionQuery#Relevance_Functions

 Ahmet



 On Wednesday, March 19, 2014 9:49 PM, Jilani Shaik jilani24...@gmail.com
 wrote:
 Hi Ahmet,

 I have gone through the facet component, as our application has 300+
 million docs and it very time consuming with this component and also it
 uses cache. So I have gone through the terms component where Solr is
 reading index for field terms, is there any approach where I can get the
 terms using the filter. So that I can restrict some of the document terms
 in counts.

 Basically we have set of documents where we want to show the terms count
 based on those filters with set name. Instead of reading entire index.

 Please let me know if you need any details to throw some more pointers

 Thanks,
 Jilani



 On Thu, Mar 20, 2014 at 1:04 AM, Ahmet Arslan iori...@yahoo.com wrote:

  Hi Jilani,
 
  What features of terms component are you after? If if it is just
  terms.prefix, it could be simulated with facet component with
 facet.prefix
  parameter. faceting component respects filter queries.
 
 
 
  On Wednesday, March 19, 2014 8:58 PM, Jilani Shaik 
 jilani24...@gmail.com
  wrote:
  Hi,
 
  I have huge index and using Solr. I need terms component with filter by
 a
  field. Please let me know is there anything that I can get it.
 
  Please provide me some pointers, even to develop this by going through
 the
  Lucene.
 
  Please suggest.
 
  Thanks,
  Jilani
 
 





Limit on # of collections -SolrCloud

2014-03-20 Thread Chris W
Hi there

 Is there a limit on the # of collections solrcloud can support? Can
zk/solrcloud handle 1000s of collections?

Also I see that the boot-up time of SolrCloud increases with the number of
cores. I do not have any expensive warm-up queries. How do I speed up
Solr startup?

-- 
Best
-- 
C


Re: Limit on # of collections -SolrCloud

2014-03-20 Thread Shalin Shekhar Mangar
There are no arbitrary limits on the number of collections but yes
there are practical limits. For example, the cluster state can become
a bottleneck. There is a lot of work happening on finding and
addressing these problems. See
https://issues.apache.org/jira/browse/SOLR-5381

Boot up time is because of:
1) Core discovery, schema/config parsing etc
2) Transaction log replay on startup
3) Wait time for enough replicas to become available before leader
election happens

You can't do much about 1 right now I think. For #2, you can keep your
transaction logs smaller by a hard commit before shutdown. For #3
there is a leaderVoteWait setting, but I'd rather not touch that
unless it becomes a problem.
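
For #2, a hedged example: either issue an explicit hard commit per collection
before stopping a node (host and collection name illustrative),

http://host:8983/solr/<collection>/update?commit=true

or configure autoCommit in solrconfig.xml so the transaction logs get rolled
over regularly:

<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>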

On Fri, Mar 21, 2014 at 1:39 AM, Chris W chris1980@gmail.com wrote:
 Hi there

  Is there a limit on the # of collections solrcloud can support? Can
 zk/solrcloud handle 1000s of collections?

 Also i see that the bootup time of solrcloud increases with increase in #
 of cores. I do not have any expensive warm up queries. How do i speedup
 solr startup?

 --
 Best
 --
 C



-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr4.7 No live SolrServers available to handle this request

2014-03-20 Thread Michael Sokolov
I'm getting a similar exception when writing documents (on the client 
side).  I can write one document fine, but the second (which is being 
routed to a different shard) generates the error.  It happens every time 
- definitely not a resource issue or timing problem since this database 
is completely empty -- I'm just getting started and running some tests, 
so there must be some kind of setup problem.  But it's difficult to 
diagnose (for me, anyway)!  I'd appreciate any insight, hints, guesses, 
etc. since I'm stuck. Thanks!


One node (the leader?) is reporting Internal Server Error in its log, 
and another node (presumably the shard where the document is being 
directed) bombs out like this:


ERROR - 2014-03-20 15:56:53.022; org.apache.solr.common.SolrException; 
null:org.apache.solr.common.SolrException: ERROR adding document 
SolrInputDocument(


... long dump of document fields

)
at 
org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:99)
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:166)
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:136)
at 
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:225)
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:121)
at 
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:190)
at 
org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:116)
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:173)
at 
org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:106)
at 
org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
at 
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:721)

...
Caused by: java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method)
at 
java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)

at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
at 
org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:215)
at 
org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:480)

at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:366)
at 
org.apache.coyote.http11.InternalOutputBuffer$OutputStreamOutputBuffer.doWrite(InternalOutputBuffer.java:240)
at 
org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite(ChunkedOutputFilter.java:119)
at 
org.apache.coyote.http11.AbstractOutputBuffer.doWrite(AbstractOutputBuffer.java:192)

at org.apache.coyote.Response.doWrite(Response.java:520)
at 
org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:408)

... 37 more

This is with Solr 4.6.1, Tomcat 7.  Here's my clusterstate.json. Updates 
are being sent to the test1x3 collection



{
  "test3x1":{
    "shards":{
      "shard1":{
        "range":"8000-d554",
        "state":"active",
        "replicas":{"core_node1":{
            "state":"active",
            "base_url":"http://10.4.24.37:8080/solr",
            "core":"test3x1_shard1_replica1",
            "node_name":"10.4.24.37:8080_solr",
            "leader":"true"}}},
      "shard2":{
        "range":"d555-2aa9",
        "state":"active",
        "replicas":{"core_node3":{
            "state":"active",
            "base_url":"http://10.4.24.39:8080/solr",
            "core":"test3x1_shard2_replica1",
            "node_name":"10.4.24.39:8080_solr",
            "leader":"true"}}},
      "shard3":{
        "range":"2aaa-7fff",
        "state":"active",
        "replicas":{"core_node2":{
            "state":"active",
            "base_url":"http://10.4.24.38:8080/solr",
            "core":"test3x1_shard3_replica1",
            "node_name":"10.4.24.38:8080_solr",
            "leader":"true"}}}},
    "maxShardsPerNode":"1",
    "router":{"name":"compositeId"},
    "replicationFactor":"1"},
  "test1x3":{
    "shards":{"shard1":{
        "range":"8000-7fff",
        "state":"active",
        "replicas":{
          "core_node1":{
            "state":"active",
            "base_url":"http://10.4.24.39:8080/solr",
            "core":"test1x3_shard1_replica2",
            "node_name":"10.4.24.39:8080_solr",
            "leader":"true"},
          "core_node2":{
            "state":"active",
            "base_url":"http://10.4.24.38:8080/solr",
            "core":"test1x3_shard1_replica1",
            "node_name":"10.4.24.38:8080_solr"},

Re: Limit on # of collections -SolrCloud

2014-03-20 Thread Chris W
Thanks, Shalin. Making clusterstate.json on a collection basis sounds
awesome.

 I am not having problems with #2. #3 is a major time hog in my environment. I
have over 300+ collections, and restarting the entire cluster takes on the
order of hours (2-3 hours). Can you explain more about the leaderVoteWait
setting?
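
(For context: leaderVoteWait is how long a node waits on startup for the other
known replicas of a shard to come up before it proceeds with leader election
anyway. It is normally set in solr.xml; here is a minimal sketch, assuming the
new-style solr.xml format and a deliberately short, hypothetical 10-second
wait. The default is on the order of a few minutes in 4.x, and lowering it
trades some safety for faster restarts:

    <solr>
      <solrcloud>
        <!-- ms to wait for known replicas before leader election proceeds -->
        <int name="leaderVoteWait">10000</int>
      </solrcloud>
    </solr>

With hundreds of collections, even a modest per-shard wait can add up across a
full restart.)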




On Thu, Mar 20, 2014 at 1:28 PM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 There are no arbitrary limits on the number of collections but yes
 there are practical limits. For example, the cluster state can become
 a bottleneck. There is a lot of work happening on finding and
 addressing these problems. See
 https://issues.apache.org/jira/browse/SOLR-5381

 Boot up time is because of:
 1) Core discovery, schema/config parsing etc
 2) Transaction log replay on startup
 3) Wait time for enough replicas to become available before leader
 election happens

 You can't do much about 1 right now I think. For #2, you can keep your
 transaction logs smaller by a hard commit before shutdown. For #3
 there is a leaderVoteWait settings but I'd rather not touch that
 unless it becomes a problem.
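
(For #2, a quick way to issue that hard commit before shutdown, assuming a
collection named collection1 on the local node; openSearcher=false keeps the
commit cheap since no new searcher is needed right before a restart:

    curl 'http://localhost:8080/solr/collection1/update?commit=true&openSearcher=false'

Anything already hard-committed does not have to be replayed from the
transaction log on the next startup.)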

 On Fri, Mar 21, 2014 at 1:39 AM, Chris W chris1980@gmail.com wrote:
  Hi there
 
   Is there a limit on the # of collections solrcloud can support? Can
  zk/solrcloud handle 1000s of collections?
 
  Also i see that the bootup time of solrcloud increases with increase in #
  of cores. I do not have any expensive warm up queries. How do i speedup
  solr startup?
 
  --
  Best
  --
  C



 --
 Regards,
 Shalin Shekhar Mangar.




-- 
Best
-- 
C


Re: Filter in terms component

2014-03-20 Thread Ahmet Arslan
Hi,

I suggest you start a new thread describing your use case, with an appropriate
title/subject. Just describe the problem without assumptions.
Ahmet



On Thursday, March 20, 2014 10:01 PM, Jilani Shaik jilani24...@gmail.com 
wrote:
Hi,

Please provide some more pointers to go ahead in addressing this.

Thnks,
Jilani



On Thu, Mar 20, 2014 at 8:50 PM, Jilani Shaik jilani24...@gmail.com wrote:


 Will it work for multi-valued fields? It gives an error saying that the
 FieldCache will not work for multi-valued fields. Most of the fields in the
 index are multi-valued.

 Thanks,
 Jilani




 On Thu, Mar 20, 2014 at 1:53 AM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,

 If you just need counts may be you can make use of
 http://wiki.apache.org/solr/FunctionQuery#Relevance_Functions

 Ahmet
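
(For example, something along these lines returns per-document counts as
pseudo-fields; the field name "tags" and the term 'solr' are made up:

    http://localhost:8983/solr/collection1/select?q=*:*&rows=10
        &fq=set_name:myset
        &fl=id,tf:termfreq(tags,'solr'),df:docfreq(tags,'solr')

Note that docfreq() is computed over the whole index, so only termfreq() is
truly per-document here.)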



 On Wednesday, March 19, 2014 9:49 PM, Jilani Shaik jilani24...@gmail.com
 wrote:
 Hi Ahmet,

 I have gone through the facet component; our application has 300+ million
 docs, and faceting is very time consuming for us and also uses a lot of cache.
 So I have gone through the terms component, where Solr reads the field terms
 from the index. Is there any approach where I can get the terms while applying
 a filter, so that I can restrict which documents contribute to the term
 counts?

 Basically, we have a set of documents (identified by a set name) and we want
 to show term counts for just those documents, instead of reading the entire
 index.

 Please let me know if you need any details to throw some more pointers

 Thanks,
 Jilani



 On Thu, Mar 20, 2014 at 1:04 AM, Ahmet Arslan iori...@yahoo.com wrote:

  Hi Jilani,
 
  What features of the terms component are you after? If it is just
  terms.prefix, it can be simulated with the facet component and its
  facet.prefix parameter. The faceting component respects filter queries.
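
(A sketch of that combination, with made-up field names; the fq restricts which
documents contribute to the counts, and facet.prefix plays the role of
terms.prefix:

    http://localhost:8983/solr/collection1/select?q=*:*&rows=0
        &fq=set_name:myset
        &facet=true
        &facet.field=suggest_terms
        &facet.prefix=ab
        &facet.mincount=1
        &facet.limit=20

Unlike the terms component, this only counts terms in documents that match the
filter.)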
 
 
 
  On Wednesday, March 19, 2014 8:58 PM, Jilani Shaik 
 jilani24...@gmail.com
  wrote:
  Hi,
 
  I have huge index and using Solr. I need terms component with filter by
 a
  field. Please let me know is there anything that I can get it.
 
  Please provide me some pointers, even to develop this by going through
 the
  Lucene.
 
  Please suggest.
 
  Thanks,
  Jilani
 
 






Re: Limit on # of collections -SolrCloud

2014-03-20 Thread Erick Erickson
How many total replicas are we talking here?
As in how many shards and, for each shard,
how many replicas? I'm not asking for a long list
here, just if you have a bazillion replicas in aggregate.

Hours is surprising.

Best,
Erick

On Thu, Mar 20, 2014 at 2:17 PM, Chris W chris1980@gmail.com wrote:
 Thanks, Shalin. Making clusterstate.json on a collection basis sounds
 awesome.

  I am not having problems with #2 . #3 is a major time hog in my
 environment. I have over 300 +collections and restarting the entire cluster
 takes in the order of hours.  (2-3 hour). Can you explain more about the
 leaderVoteWait setting?




 On Thu, Mar 20, 2014 at 1:28 PM, Shalin Shekhar Mangar 
 shalinman...@gmail.com wrote:

 There are no arbitrary limits on the number of collections but yes
 there are practical limits. For example, the cluster state can become
 a bottleneck. There is a lot of work happening on finding and
 addressing these problems. See
 https://issues.apache.org/jira/browse/SOLR-5381

 Boot up time is because of:
 1) Core discovery, schema/config parsing etc
 2) Transaction log replay on startup
 3) Wait time for enough replicas to become available before leader
 election happens

 You can't do much about 1 right now I think. For #2, you can keep your
 transaction logs smaller by a hard commit before shutdown. For #3
 there is a leaderVoteWait settings but I'd rather not touch that
 unless it becomes a problem.

 On Fri, Mar 21, 2014 at 1:39 AM, Chris W chris1980@gmail.com wrote:
  Hi there
 
   Is there a limit on the # of collections solrcloud can support? Can
  zk/solrcloud handle 1000s of collections?
 
  Also i see that the bootup time of solrcloud increases with increase in #
  of cores. I do not have any expensive warm up queries. How do i speedup
  solr startup?
 
  --
  Best
  --
  C



 --
 Regards,
 Shalin Shekhar Mangar.




 --
 Best
 --
 C


Re: Limit on # of collections -SolrCloud

2014-03-20 Thread Otis Gospodnetic
Hours sounds too long indeed.  We recently had a client with several
thousand collections, but restart wasn't taking hours...

Otis
Solr  ElasticSearch Support
http://sematext.com/
On Mar 20, 2014 5:49 PM, Erick Erickson erickerick...@gmail.com wrote:

 How many total replicas are we talking here?
 As in how many shards and, for each shard,
 how many replicas? I'm not asking for a long list
 here, just if you have a bazillion replicas in aggregate.

 Hours is surprising.

 Best,
 Erick

 On Thu, Mar 20, 2014 at 2:17 PM, Chris W chris1980@gmail.com wrote:
  Thanks, Shalin. Making clusterstate.json on a collection basis sounds
  awesome.
 
   I am not having problems with #2 . #3 is a major time hog in my
  environment. I have over 300 +collections and restarting the entire
 cluster
  takes in the order of hours.  (2-3 hour). Can you explain more about the
  leaderVoteWait setting?
 
 
 
 
  On Thu, Mar 20, 2014 at 1:28 PM, Shalin Shekhar Mangar 
  shalinman...@gmail.com wrote:
 
  There are no arbitrary limits on the number of collections but yes
  there are practical limits. For example, the cluster state can become
  a bottleneck. There is a lot of work happening on finding and
  addressing these problems. See
  https://issues.apache.org/jira/browse/SOLR-5381
 
  Boot up time is because of:
  1) Core discovery, schema/config parsing etc
  2) Transaction log replay on startup
  3) Wait time for enough replicas to become available before leader
  election happens
 
  You can't do much about 1 right now I think. For #2, you can keep your
  transaction logs smaller by a hard commit before shutdown. For #3
  there is a leaderVoteWait settings but I'd rather not touch that
  unless it becomes a problem.
 
  On Fri, Mar 21, 2014 at 1:39 AM, Chris W chris1980@gmail.com
 wrote:
   Hi there
  
Is there a limit on the # of collections solrcloud can support? Can
   zk/solrcloud handle 1000s of collections?
  
   Also i see that the bootup time of solrcloud increases with increase
 in #
   of cores. I do not have any expensive warm up queries. How do i
 speedup
   solr startup?
  
   --
   Best
   --
   C
 
 
 
  --
  Regards,
  Shalin Shekhar Mangar.
 
 
 
 
  --
  Best
  --
  C



Re: Limit on # of collections -SolrCloud

2014-03-20 Thread Chris W
The replication factor is two, and I have sharded all collections equally
across all nodes. We have a 6-node cluster: 300 collections * 6 shards with 2
replicas per shard, which comes to almost 600 cores per machine.

Also, one fact is that my zk timeout is on the order of 2-3 minutes. I see very
slow zk responses and a lot of outstanding requests (found that out thanks to
https://github.com/phunt/).
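
(For reference, that timeout is usually zkClientTimeout in solr.xml; a minimal
sketch, assuming the new-style solr.xml format. A value of 2-3 minutes mostly
hides the underlying slowness, so it is probably better to keep it modest and
chase the ZooKeeper latency itself:

    <solr>
      <solrcloud>
        <!-- ZooKeeper session timeout in ms -->
        <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
      </solrcloud>
    </solr>

ZooKeeper is also very sensitive to disk latency, so putting its transaction
log on a quiet disk often helps more than raising the timeout.)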




On Thu, Mar 20, 2014 at 2:53 PM, Otis Gospodnetic 
otis.gospodne...@gmail.com wrote:

 Hours sounds too long indeed.  We recently had a client with several
 thousand collections, but restart wasn't taking hours...

 Otis
 Solr  ElasticSearch Support
 http://sematext.com/
 On Mar 20, 2014 5:49 PM, Erick Erickson erickerick...@gmail.com wrote:

  How many total replicas are we talking here?
  As in how many shards and, for each shard,
  how many replicas? I'm not asking for a long list
  here, just if you have a bazillion replicas in aggregate.
 
  Hours is surprising.
 
  Best,
  Erick
 
  On Thu, Mar 20, 2014 at 2:17 PM, Chris W chris1980@gmail.com
 wrote:
   Thanks, Shalin. Making clusterstate.json on a collection basis sounds
   awesome.
  
I am not having problems with #2 . #3 is a major time hog in my
   environment. I have over 300 +collections and restarting the entire
  cluster
   takes in the order of hours.  (2-3 hour). Can you explain more about
 the
   leaderVoteWait setting?
  
  
  
  
   On Thu, Mar 20, 2014 at 1:28 PM, Shalin Shekhar Mangar 
   shalinman...@gmail.com wrote:
  
   There are no arbitrary limits on the number of collections but yes
   there are practical limits. For example, the cluster state can become
   a bottleneck. There is a lot of work happening on finding and
   addressing these problems. See
   https://issues.apache.org/jira/browse/SOLR-5381
  
   Boot up time is because of:
   1) Core discovery, schema/config parsing etc
   2) Transaction log replay on startup
   3) Wait time for enough replicas to become available before leader
   election happens
  
   You can't do much about 1 right now I think. For #2, you can keep your
   transaction logs smaller by a hard commit before shutdown. For #3
   there is a leaderVoteWait settings but I'd rather not touch that
   unless it becomes a problem.
  
   On Fri, Mar 21, 2014 at 1:39 AM, Chris W chris1980@gmail.com
  wrote:
Hi there
   
 Is there a limit on the # of collections solrcloud can support? Can
zk/solrcloud handle 1000s of collections?
   
Also i see that the bootup time of solrcloud increases with increase
  in #
of cores. I do not have any expensive warm up queries. How do i
  speedup
solr startup?
   
--
Best
--
C
  
  
  
   --
   Regards,
   Shalin Shekhar Mangar.
  
  
  
  
   --
   Best
   --
   C
 




-- 
Best
-- 
C


SOLR synonyms - Explicit mappings

2014-03-20 Thread bbi123
I need some clarification on how to define explicit mappings in the
synonyms.txt file.

I have been using equivalent synonyms for a while and they work as expected.

I am confused by explicit mappings.

I have the below synonym added to the query analyzer.

I want a search on the keyword 'watch' to actually search for 'smartwatch',
but the mapping below seems to bring back documents that contain both
keywords, 'watch' and 'smartwatch'. Am I doing anything wrong?

watch => smartwatch

Thanks for your help!!!
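
(For reference, a minimal sketch of the one-way mapping, with a made-up field
type. With "=>" the left-hand side is replaced by the right-hand side, and the
expand option only affects comma-separated equivalent synonyms, so the query
side below should search only for 'smartwatch':

    # synonyms.txt
    watch => smartwatch

    <!-- schema.xml: synonym filter on the query side only -->
    <fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

If documents containing only 'watch' still come back, one thing worth checking
on the Analysis screen is whether the same synonym file is also applied at
index time, since indexed 'watch' tokens would then have been rewritten to
'smartwatch' as well.)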



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-synonyms-Explicit-mappings-tp4125858.html
Sent from the Solr - User mailing list archive at Nabble.com.


Best approach to handle large volume of documents with constantly high incoming rate?

2014-03-20 Thread shushuai zhu
Hi, 

I am looking for some advice on handling a large volume of documents with a
very high incoming rate. The size of each document is about 0.5 KB, the
incoming rate could be more than 20K per second, and we want to store about one
year's worth of documents in Solr for near real-time searching. The goal is to
achieve acceptable indexing and querying performance.

We will use techniques like soft commits, dedicated indexing servers, etc. My
main question is about how to structure the collections/shards/cores to achieve
these goals. Since the incoming rate is very high, we do not want the incoming
documents to affect the existing older indexes. One thought is to create a
"latest" index to hold the incoming documents (say the latest half hour's data,
about 36M docs) so that queries on older data stay fast, since the old indexes
are not being modified. There seem to be three ways to grow the time dimension,
by adding/splitting/creating a new one of the objects listed below every half
hour:

collection
shard
core

Which is the best way to grow the time dimension? Are there any limitations in
that direction? Or is there some better approach?
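
(One common pattern for the "latest index" idea is a new collection per time
window plus collection aliases, so indexers and queries do not need to know the
physical collection names. A rough sketch with the Collections API; the names,
shard counts and host are made up:

    # create the collection for the next half-hour window
    curl "http://host1:8983/solr/admin/collections?action=CREATE&name=events_20140321_0900&numShards=4&replicationFactor=2&collection.configName=events_conf"

    # repoint the alias that the indexers write to
    curl "http://host1:8983/solr/admin/collections?action=CREATEALIAS&name=events_latest&collections=events_20140321_0900"

    # a search alias can span several windows for queries over older data
    curl "http://host1:8983/solr/admin/collections?action=CREATEALIAS&name=events_recent&collections=events_20140321_0900,events_20140321_0830"

Old windows can then be dropped or moved to slower hardware without touching
the collections that are still being written to.)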

As an example, I am thinking about having 4 nodes with the following
configuration to set up a SolrCloud cluster:

Memory: 128 GB
Storage: 4 TB

How should I set up the collections/shards/cores to deal with this use case?

Thanks in advance.

Shushuai 


Memory + WeakIdentityMap

2014-03-20 Thread Harish Agarwal
I'm transitioning my index from a 3.x version to 4.6.  I'm running a large
heap (20G), primarily to accommodate a large facet cache (~5G), but I have
been able to run it stably on 3.x.

On 4.6.0 after stress testing I'm finding that all of my shards are
spending all of their time in GC.  After taking a heap dump and analyzing,
it appears that org.apache.lucene.util.WeakIdentityMap is using many Gs of
memory.  Does anyone have any insight into which Solr component(s) use this
and whether this kind of memory consumption is to be expected?

Thank You,
-Harish


Rounding errors with SOLR score

2014-03-20 Thread William Bell
When doing complex boosting/bq, we are getting rounding errors in the score.

To make the score consistent, I needed to use rint in the sort:

sort=rint(product(sum($p_score,$s_score,$q_score),100)) desc,s_query asc

<str name="p_score">recip(priority,1,.5,.01)</str>
<str name="s_score">product(recip(synonym_rank,1,1,.01),17)</str>
<str name="q_score">
  query({!dismax qf="user_query_edge^1 user_query^0.5 user_query_fuzzy"
  v=$q1})
</str>

The issue is in the qf area.

{"s_query":"Ear Irrigation","score":10.331313},
{"s_query":"Ear Piercing","score":10.331314},
{"s_query":"Ear Pinning","score":10.331313},
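
(This looks like ordinary single-precision behaviour rather than a bug: Lucene
scores and function values are floats, which carry roughly 7 significant
decimal digits, so the same components summed in a slightly different order can
land one ulp apart, which is about the size of the gap above. A small
illustration, with made-up numbers:

    public class ScoreUlp {
        public static void main(String[] args) {
            // spacing between adjacent floats near 10.33 is roughly 9.5e-7
            System.out.println(Math.ulp(10.331313f));

            // float addition is not associative, so summation order matters
            float x = 1.0e8f, y = -1.0e8f, z = 1.0f;
            System.out.println((x + y) + z);   // 1.0
            System.out.println(x + (y + z));   // 0.0

            // rounding to a coarser grid, as rint(product(...,100)) does in
            // the sort above, makes such near-ties compare equal again
            System.out.println(Math.rint(10.331313 * 100));   // 1033.0
        }
    }

So rint over a scaled score is a reasonable way to keep the ordering stable.)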

-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: solr cloud distributed optimize() becomes serialized

2014-03-20 Thread William Bell
Yeah. optimize() also used to come back immediately if the index was
already optimized. It just reopened the index.

We used to use that for cleaning up the old directories quickly. But now it
does another optimize() even though the index is already optimized.

Very strange.


On Tue, Mar 18, 2014 at 11:30 AM, Chris Lu chris...@gmail.com wrote:

 I wonder whether this is a known bug. In previous SOLR cloud versions, 4.4
 or maybe 4.5, an explicit optimize(), without any parameters, usually took
 about 2 minutes for a 32-core cluster.

 However, in 4.6.1, the same call took about 1 hour. Checking the index
 modification time for each core shows a 2-minute gap between cores when sorted.

 We are using a solrj client connecting to zookeeper. I found it is talking
 to a specific solr server A, and that server A is distributing the calls to
 all other solr servers. Here is the thread dump for this server A:

 at

 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:395)
 at

 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
 at

 org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer.request(ConcurrentUpdateSolrServer.java:293)
 at

 org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:226)
 at

 org.apache.solr.update.SolrCmdDistributor.distribCommit(SolrCmdDistributor.java:195)
 at

 org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1250)
 at

 org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)




-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: Wiki edit rights

2014-03-20 Thread William Bell
PLease add me too.


On Tue, Mar 18, 2014 at 8:33 AM, Erick Erickson erickerick...@gmail.comwrote:

 Done, thanks!

 On Tue, Mar 18, 2014 at 3:54 AM, Anders Gustafsson
 anders.gustafs...@pedago.fi wrote:
  Yes, please. My Wiki ID is Anders Gustafsson
 
  But yes, please, add the howto to Wiki. You will need to get your
  account whitelisted first (due to spammers), so send a separate email
  with your Apache wiki id and somebody will unlock you for editing.
 
  --
  Anders Gustafsson
  Engineer, CNI, CNE6, ASE
  Pedago, The Aaland Islands (N60 E20)
  www.pedago.fi
  phone +358 18 12060
  mobile +358 40506 7099
 
 




-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: solr cloud distributed optimize() becomes serialized

2014-03-20 Thread Shalin Shekhar Mangar
That's not right. Which Solr versions are you on (question for both
William and Chris)?

On Fri, Mar 21, 2014 at 8:07 AM, William Bell billnb...@gmail.com wrote:
 Yeah. optimize() also used to come back immediately if the index was
 already indexed. It just reopened the index.

 We uses to use that for cleaning up the old directories quickly. But now it
 does another optimize() even through the index is already optimized.

 Very strange.


 On Tue, Mar 18, 2014 at 11:30 AM, Chris Lu chris...@gmail.com wrote:

 I wonder whether this is a known bug. In previous SOLR cloud versions, 4.4
 or maybe 4.5, an explicit optimize(), without any parameters, it usually
 took 2 minutes for a 32 core cluster.

 However, in 4.6.1, the same call took about 1 hour. Checking the index
 modification time for each core shows 2 minutes gap if sorted.

 We are using a solrj client connecting to zookeeper. I found it is talking
 to a specific solr server A, and that server A is distributing the calls to
 all other solr servers. Here is the thread dump for this server A:

 at

 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:395)
 at

 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
 at

 org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer.request(ConcurrentUpdateSolrServer.java:293)
 at

 org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:226)
 at

 org.apache.solr.update.SolrCmdDistributor.distribCommit(SolrCmdDistributor.java:195)
 at

 org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1250)
 at

 org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)




 --
 Bill Bell
 billnb...@gmail.com
 cell 720-256-8076



-- 
Regards,
Shalin Shekhar Mangar.


Re: Wiki edit rights

2014-03-20 Thread Shalin Shekhar Mangar
What's your wiki username?

On Fri, Mar 21, 2014 at 8:12 AM, William Bell billnb...@gmail.com wrote:
 PLease add me too.


 On Tue, Mar 18, 2014 at 8:33 AM, Erick Erickson 
 erickerick...@gmail.comwrote:

 Done, thanks!

 On Tue, Mar 18, 2014 at 3:54 AM, Anders Gustafsson
 anders.gustafs...@pedago.fi wrote:
  Yes, please. My Wiki ID is Anders Gustafsson
 
  But yes, please, add the howto to Wiki. You will need to get your
  account whitelisted first (due to spammers), so send a separate email
  with your Apache wiki id and somebody will unlock you for editing.
 
  --
  Anders Gustafsson
  Engineer, CNI, CNE6, ASE
  Pedago, The Aaland Islands (N60 E20)
  www.pedago.fi
  phone +358 18 12060
  mobile +358 40506 7099
 
 




 --
 Bill Bell
 billnb...@gmail.com
 cell 720-256-8076



-- 
Regards,
Shalin Shekhar Mangar.


Re: Memory + WeakIdentityMap

2014-03-20 Thread Shawn Heisey
On 3/20/2014 6:54 PM, Harish Agarwal wrote:
 I'm transitioning my index from a 3.x version to 4.6.  I'm running a large
 heap (20G), primarily to accomodate a large facet cache (~5G), but have
 been able to run it on 3.x stably.
 
 On 4.6.0 after stress testing I'm finding that all of my shards are
 spending all of their time in GC.  After taking a heap dump and analyzing,
 it appears that org.apache.lucene.util.WeakIdentityMap is using many Gs of
 memory.  Does anyone have any insight into which Solr component(s) use this
 and whether this kind of memory consumption is to be expected?

I can't really say what WeakIdentityMap is doing.  I can trace the only
usage in Lucene to MMapDirectory, but it doesn't make a lot of sense for
this to use a lot of memory, unless this is the source of the memory
misreporting that Java 7 seems to do with MMap.  See this message in a
recent thread on this mailing list:

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201403.mbox/%3c53285ca1.9000...@elyograg.org%3E

If you have a lot of facets, one approach for performance is to use
facet.method=enum so that your Java heap does not need to be super large.

This does not actually reduce the overall system memory requirements.
It just shifts the responsibility for caching to the operating system
instead of Solr, and requires that you have enough memory to put a
majority of the index into the OS disk cache.  Ideally, there would be
enough RAM for the entire index to fit.

http://wiki.apache.org/solr/SolrPerformanceProblems
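
(A sketch of what facet.method=enum looks like on a request; the field name is
made up, and facet.enum.cache.minDf keeps very rare terms from each taking a
filterCache entry:

    http://localhost:8983/solr/collection1/select?q=*:*&rows=0
        &facet=true
        &facet.field=manufacturer
        &facet.method=enum
        &facet.enum.cache.minDf=100

The same parameters can also be set per-field, e.g.
f.manufacturer.facet.method=enum, or as defaults in the request handler.)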

Another option for facet memory optimization is docValues.  One caveat:
It is my understanding that the docValues content is the same as a
stored field.  Depending on your schema definition, this may be
different from the indexed values that facets normally use.  The
docValues feature also helps with sorting.
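
(A sketch of what that looks like in schema.xml, with a made-up field name;
docValues generally requires a non-tokenized type such as string, and existing
data has to be reindexed for it to take effect:

    <field name="manufacturer" type="string" indexed="true" stored="true"
           multiValued="true" docValues="true"/>

With docValues in place, faceting on that field reads column-oriented data from
disk (cached by the OS) instead of building FieldCache/UnInvertedField
structures on the Java heap.)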

Thanks,
Shawn