SolrCloud Architecture recommendations + related questions

2012-08-06 Thread Greg Pendlebury
Hi All,

TL;DR version: We think we want to explore Lucene/Solr 4.0 and SolrCloud,
but I’m not sure if there are any good docs/articles on how to make
architecture choices for chopping up big indexes… and what other general
considerations are part of the equation?



I’m throwing this post out to the public to see if any kind and
knowledgeable individuals could provide some educated feedback on the
options our team is currently considering for the future architecture of
our Solr indexes. We have a loose collection of Solr indexes, each with a
specific purpose and differing schemas and document makeup, containing just
over 300 million documents with varying degrees of full-text. Our existing
architecture is showing its age, as it is really just the setup used for
small/medium indexes scaled upwards.

The biggest individual index is around 140 million documents and currently
exists as a Master/Slave setup with the Master receiving all writes in the
background and the 3 load balanced slaves updating with a 5 minute poll
interval. The master index is 451gb on disk and the 3 slaves are running
JVMs with RAM allocations of 21gb (right now anyway).

We are struggling under the traffic load and/or scale of our indexes
(mainly the latter, I think). We know this isn’t the best way to run things,
but the index in question is a fairly new addition and each time we run
into issues we tend to make small changes to improve things in the short
term… like bumping the RAM allocation up, toying with poll intervals,
garbage collection config etc.

We’ve historically run into issues with facet queries generating a lot of
bloat on some types of fields. These had to be solved through internal
modifications, but I expect we’ll have to review this with the new version
anyway. Related to that, there are some question marks on generating good
facet data from a sharded approach. In particular though, we are really
struggling with garbage collection on the slave machines around the time
that the slave/master sync occurs because of multiple copies of the index
being held in memory until all searchers have de-referenced the old index.
The machines typically either crash from OOM when we occasionally have a
third and/or fourth copy of the index appear because of really old searchers
not ‘letting go’ (hence we play with widening poll intervals), or, more
rarely, they become perpetually locked in GC and have to be restarted (not
100% sure why, but large heap allocations aren’t helping, and cache warming
may be a culprit).

The team has lots of things we want to try to improve things, but given the
scale of the systems it is very hard to just try things out without
considerable resourcing implications. The entire ecosystem is spread across
7 machines that are resourced in the 64gb-100gb of RAM range (this is just
me poking around our servers… not a thorough assessment). Each machine is
running several JVMs so that for each ‘type’ of index there are typically
2-4 load balanced slaves available at any given time. One of those machines
is exclusively used as the Master for all indexes and receives no search
traffic… just lots of write traffic.

I believe the answers to some of these are going to be very much dependent
on schemas and documents, so I don’t imagine anyone can answer the
questions better than we can after testing and benchmarking… but right now
we are still trying to choose where to start, so broad ideas are very
welcome.

The kind of things we are currently thinking about:

   - Moving to v4.0 (currently just completed our v3.5 upgrade) to take
   advantage of the reduced RAM consumption:
   https://issues.apache.org/jira/browse/LUCENE-2380 We are hoping that
   this has the double-whammy impact of improving garbage collection as well.
   Lots of full-text data should equal lots of Strings, and thus lots of
   savings from this change.
   - Moving to a basic sharded approach. We’ve only just started testing
   this, and I’m not involved, so I’m not sure what early results we’ve
   got… but:
   - Given that we’d like to move to v4.0, I believe this opens up the
   option of a SolrCloud implementation… my suspicion is that this is where
   the money is at… but I’d be happy to hear feedback (good or bad) from
   people that are using it in production.
   - Hardware; we are not certain that the current approach of a few
   colossal machines is any better than lots of smaller clustered machines…
   and it is prohibitively expensive to experiment here. We don’t think that
   our current setup using SSDs and fibre-channel connections would be
   creating too many bottlenecks on I/O, and we rarely see other
   hardware-related issues, but I’d again be curious if people have observed
   contradictory evidence. My suspicion is that with the changes above,
   though, our current hardware would handle the load far better than it
   currently does.
   - Are there any pros and cons documented out there for making decisions
   on sharding? (A sketch of what a sharded query looks like follows below.)
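A distributed query in Solr is an ordinary query with a shards parameter
listing the cores to fan out to. A minimal SolrJ sketch, assuming
hypothetical shard host names and an already-constructed server:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    // Minimal sketch of a sharded query; shard host names are hypothetical.
    public class ShardedQueryExample {
        public static QueryResponse query(SolrServer server) throws Exception {
            SolrQuery q = new SolrQuery("some full-text terms");
            // Fan out to every shard; the receiving node merges the results.
            q.set("shards", "shard1:8983/solr,shard2:8983/solr,shard3:8983/solr");
            return server.query(q);
        }
    }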

error message in solr logs

2012-08-06 Thread soni.s
Hi, we have a large Lucene index base created using Solr. It is split into 16
cores, each containing almost 10GB of indexes. We have deployed 8 instances
of Solr hosting two cores each. The logic for identifying which core a
document resides in, based on the document id, is built into the application.
There are other queries that go to all the cores across all Solr instances,
because a query may not be based on a document id. We use SolrJ to connect to
and query the indexes and get results.
We have more reads than writes overall. A document is inserted once and
updated at most twice within a few days, but it could potentially be searched
tens of times in a day.
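A hypothetical sketch of the application-side routing just described (the
post doesn't show the actual logic, so this is only an illustration):

    // Hypothetical sketch: hash the document id to pick which of the
    // 16 cores a document lives in.
    public class CoreRouter {
        private final String[] coreUrls; // e.g. 16 core URLs across 8 Solr instances

        public CoreRouter(String[] coreUrls) {
            this.coreUrls = coreUrls;
        }

        public String coreFor(String docId) {
            // Mask the sign bit rather than Math.abs, which overflows on MIN_VALUE.
            int bucket = (docId.hashCode() & 0x7fffffff) % coreUrls.length;
            return coreUrls[bucket];
        }
    }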

Lately we have been noticing the exception below in our Solr logs. It happens
sometimes once or twice a day on a few cores.

SEVERE: org.apache.solr.common.SolrException: Invalid chunk header
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:72)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:852)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
    at java.lang.Thread.run(Thread.java:662)
Caused by: com.ctc.wstx.exc.WstxIOException: Invalid chunk header
    at com.ctc.wstx.stax.WstxInputFactory.doCreateSR(WstxInputFactory.java:548)
    at com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:604)
    at com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:660)
    at com.ctc.wstx.stax.WstxInputFactory.createXMLStreamReader(WstxInputFactory.java:331)
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:68)
    ... 17 more
Caused by: java.io.IOException: Invalid chunk header
    at org.apache.coyote.http11.filters.ChunkedInputFilter.doRead(ChunkedInputFilter.java:133)
    at org.apache.coyote.http11.InternalInputBuffer.doRead(InternalInputBuffer.java:710)
    at org.apache.coyote.Request.doRead(Request.java:428)
    at org.apache.catalina.connector.InputBuffer.realReadBytes(InputBuffer.java:304)
    at org.apache.tomcat.util.buf.ByteChunk.substract(ByteChunk.java:405)
    at org.apache.catalina.connector.InputBuffer.read(InputBuffer.java:327)
    at org.apache.catalina.connector.CoyoteInputStream.read(CoyoteInputStream.java:193)
    at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:264)

The env consists of:

OS: Enterprise Linux 64 bit
Tomcat version: 6.0.26
solr version: 3.3.0
JDK: 1.6
Total number of Solr documents: more than 20 million.

Can someone please tell me what this is? Googling around doesn't give me much
info. Overall, I don't see much of a problem from the application's
perspective, but I wanted to know what this error is and what its impact on
the app could be in the future. Thanks for any help in advance.








Re: Special suggestions requirement

2012-08-06 Thread Lochschmied, Alexander
Is there anything you cannot do with Solr? :-)
Thanks a lot, Erick! I only had to use '.' instead of '?', e.g.

...:8983/solr/terms?terms.fl=fieldname&terms.limit=100&terms.prefix=abcd&terms.regex.flag=case_insensitive&terms=true&terms.regex=abcd..

Adding terms.sort=index even lets me sort as I need.

Thanks,
Alexander
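
For reference, the same request can be issued from SolrJ. A sketch, assuming
the default handleSelect routing so the qt parameter reaches the /terms
handler, and with "fieldname" as a placeholder:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    // Sketch of the /terms request above via SolrJ; "fieldname" is a placeholder.
    public class TermsSuggestSketch {
        public static QueryResponse suggest(SolrServer server, String prefix) throws Exception {
            SolrQuery q = new SolrQuery();
            q.set("qt", "/terms");               // assumes handleSelect routes qt to the handler
            q.set("terms", "true");
            q.set("terms.fl", "fieldname");
            q.set("terms.limit", "100");
            q.set("terms.regex.flag", "case_insensitive");
            q.set("terms.regex", prefix + ".."); // exactly two more characters
            q.set("terms.sort", "index");        // index (alphabetical) order
            return server.query(q);
        }
    }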

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Saturday, August 4, 2012 20:11
To: solr-user@lucene.apache.org
Subject: Re: Special suggestions requirement

Would it work to use TermsComponent with wildcards?
Something like terms.regex=ABCD42??...

see: http://wiki.apache.org/solr/TermsComponent/

Best
Erick


On Fri, Aug 3, 2012 at 9:07 AM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:
 I could be crazy, but it sounds to me like you need a trie, not a 
 search index: http://en.wikipedia.org/wiki/Trie

 But in any case, what you want to do should be achievable. It seems 
 like you need to do EdgeNgrams and facet on the results, where 
 facet.counts > 1 to exclude the actual part numbers, since each of 
 those would be distinct.

 I'm on the train right now, so I can't test this. :\
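
A sketch of that facet-based approach in SolrJ (equally untested here; the
ngram field name is hypothetical, and facet.prefix narrows the facet terms to
the entered prefix):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    // Sketch: prefix suggestions by faceting on an EdgeNGram field.
    // "part_ngrams" is a hypothetical field holding the ngram terms.
    public class FacetSuggestSketch {
        public static QueryResponse suggest(SolrServer server, String prefix) throws Exception {
            SolrQuery q = new SolrQuery("*:*");
            q.setRows(0);                            // only the facet counts are wanted
            q.addFacetField("part_ngrams");
            q.setFacetPrefix("part_ngrams", prefix); // e.g. "ABCD42"
            q.setFacetMinCount(1);
            q.setFacetSort("index");                 // alphabetical, smallest first
            q.setFacetLimit(100);
            return server.query(q);
        }
    }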

 Michael Della Bitta

 
 Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017 
 www.appinions.com Where Influence Isn't a Game


 On Thu, Aug 2, 2012 at 9:19 PM, Lochschmied, Alexander 
 alexander.lochschm...@vishay.com wrote:
 Even with prefix query, I do not get ABCD02 or any ABCD02... back. BTW: 
 EdgeNGramFilterFactory is used on the field we are getting the 
 suggestions/spellchecks from.
 I think the problem is that there are a lot of different part numbers 
 starting with ABCD and every part number has the same length. I showed 
 only 4 in the example but there might be thousands.

 Here are some full part number examples that might be in the index:
 ABCD110040
 ABCD00
 ABCD99
 ABCD155500
 ...

 I'm looking for a way to make Solr return distinct list of fixed 
 length substrings of them, e.g. if ABCD is entered, I would need
 ABCD00
 ABCD01
 ABCD02
 ABCD03
 ...
 ABCD99

 Then if user chose ABCD42 from the suggestions, I would need
 ABCD4201
 ABCD4202
 ABCD4203
 ...
 ABCD4299

 and so on.

 I would be able to do some post-processing if needed, or adjust the schema 
 or indexing process. But the key functionality I need from Solr is returning 
 a distinct set of those suggestions where only the last two characters change. 
 All of the available combinations of those last two characters must be 
 considered, though. I need to show alphanumerically sorted suggestions, the 
 smallest value first.

 Thanks,
 Alexander

 -Original Message-
 From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com]
 Sent: Thursday, August 2, 2012 15:02
 To: solr-user@lucene.apache.org
 Subject: Re: Special suggestions requirement

 In this case, we're storing the overall value length and sorting on that, 
 then alphabetically.

 Also, how are your queries fashioned? If you're doing a prefix query, 
 everything that matches it should score the same. If you're only doing a 
 prefix query, you might need to add a term for exact matches as well to get 
 them to show up.

 Michael Della Bitta

 
 Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017 
 www.appinions.com Where Influence Isn't a Game


 On Wed, Aug 1, 2012 at 9:58 PM, Lochschmied, Alexander 
 alexander.lochschm...@vishay.com wrote:
 Is there a way to offer distinct, alphabetically sorted, fixed length 
 options?

 I am trying to suggest part numbers and I'm currently trying to do it with 
 the spellchecker component.
 Let's say ABCD was entered and we have indexed part numbers like
 ABCD
 ABCD2000
 ABCD2100
 ABCD2200
 ...

 I would like to have 2 characters suggested always, so for ABCD, 
 it should suggest
 ABCD00
 ABCD20
 ABCD21
 ABCD22
 ...

 No smart sorting is needed, just alphabetical sorting. The problem is 
 that, for example, 00 (or ABCD00) may not be suggested currently, as it 
 doesn't score high enough. But we are really trying to get all distinct 
 values starting from the smallest (up to a certain number of suggestions).

 I was already looking at the custom comparator class option. But this would 
 probably not work, as I would need more information to implement it there 
 (like at least the currently entered search term, ABCD in the example).

 Thanks,
 Alexander


read write solr shard setup

2012-08-06 Thread soni.s
Hi, I am trying to use a read/write Solr setup. What I mean is that I would
have a common location for the Lucene indexes and configure one instance of
Solr for reads and another instance that only writes new indexes, both
instances pointing to the same index location. The approach is described here:
http://wiki.apache.org/solr/NearRealtimeSearchTuning
My question is: is there a way that I can read the documents from the
read-only instance without calling the empty 'commit()'? I mean, is there
some configuration I can change in solrconfig.xml or something?

I have the following configuration in solrconfig.xml:
<autoCommit>
   <maxDocs>1</maxDocs>
   <maxTime>10</maxTime>
</autoCommit>

But this doesn't seem to help the RO node read the
just-committed documents.
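
For context, the "empty commit" from the wiki page carries no documents; it
just makes the read-only core open a new searcher over the shared index. A
sketch with a hypothetical URL, using the 3.x SolrJ client:

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    // Sketch of the empty-commit trick against the read-only instance.
    public class ReopenReadOnlySketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical URL of the read-only instance.
            SolrServer readOnly = new CommonsHttpSolrServer("http://localhost:8984/solr");
            readOnly.commit(); // no documents sent; only reopens the searcher
        }
    }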







Returning page numbers where match occurs

2012-08-06 Thread debdoot
Suppose, we are provisioning search over large text documents (e.g., Word,
PPT). It would be nice to have the highlighter component to return the page
numbers where the matches are found so that the same may be included in the
search result summaries. What is the most efficient way to accomplish this?

Thanks
Debdoot





Re: Returning page numbers where match occurs

2012-08-06 Thread Jack Krupansky
There is an old, open Jira, SOLR-380 ("There's no way to convert search 
results into page-level hits of a structured document"), but no recent 
activity on it. It does have a lot of interesting commentary, though. I 
wouldn't get my hopes up.


See:
https://issues.apache.org/jira/browse/SOLR-380

The short answer is that you would have to re-parse the document yourself, 
since Tika/POI called from SolrCell simply parses the document into a 
linear, unstructured stream of text, with no markers for pages. The SOLR-380 
Jira issue may give you some clues.
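
One way to act on that advice, sketched here with made-up field names, is to
index one Solr document per page, so that highlighter hits map back to pages:

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.common.SolrInputDocument;

    // Sketch: one Solr document per page. "file_id", "page" and "text" are
    // hypothetical fields that would need to exist in the schema.
    public class PageIndexerSketch {
        public static void indexPage(SolrServer server, String fileId,
                                     int pageNo, String pageText) throws Exception {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", fileId + "_p" + pageNo);
            doc.addField("file_id", fileId);
            doc.addField("page", pageNo);
            doc.addField("text", pageText);
            server.add(doc); // commit separately, e.g. after the whole file
        }
    }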


I do have a related question: would you want strictly integer page numbers, 
where the first page of any front matter is 1, or the actual literal page 
numbers (e.g. "iii" or "A-1")? The former is simpler, but incorrect if the 
user thinks they can simply look for that page number in the document.


-- Jack Krupansky

-Original Message- 
From: debdoot

Sent: Monday, August 06, 2012 9:13 AM
To: solr-user@lucene.apache.org
Subject: Returning page numbers where match occurs

Suppose, we are provisioning search over large text documents (e.g., Word,
PPT). It would be nice to have the highlighter component to return the page
numbers where the matches are found so that the same may be included in the
search result summaries. What is the most efficient way to accomplish this?

Thanks
Debdoot






Re: Returning page numbers where match occurs

2012-08-06 Thread debdoot
Thanks a lot, Jack, for your prompt reply! The JIRA issue indeed talks about
what I want to accomplish. I will try out Tricia's solution.

As regards your question - whether I want "real" page numbers - yes, ideally
I want to get real page numbers (and am willing to put in the additional
parsing effort to get those). For starters, even integer page numbers will
work - do you have a simpler solution in mind for this case (than the one
described in SOLR-380)?

The text from the documents (for which I want page numbers) is represented
in certain fields of the Solr/Lucene documents that I index, i.e., a
many-to-one relation exists between the office documents and the Solr
documents in my index.

Regards,
Debdoot





Reg Default search field

2012-08-06 Thread Lakshmi Bhargavi
Hi ,

I have a question on the default search field defined in schema.xml or in
the later versions , specified as part of the search handlers. Do we always
need to have this default search field defined in order to do search if the
field name is not passed? 

Suppose there is a field named 'Title' and it holds a value called 'solr'.

In order to get results for this search:
http://localhost:8080/solr/select/?q=solr&version=2.2&start=0&rows=10&indent=on

do I need to define a default search field and copy the contents of the
specific field into this default search field?

<defaultSearchField>text</defaultSearchField>
<copyField source="Title" dest="text"/>

Thanks in advance!
Lakshmi





Re: Reg Default search field

2012-08-06 Thread Erik Hatcher
Lakshmi - The field(s) used for querying need to be specified somewhere, 
either as a default field or as a qf parameter to (e)dismax, etc.

Erik

On Aug 6, 2012, at 10:48 , Lakshmi Bhargavi wrote:

 Hi ,
 
 I have a question on the default search field defined in schema.xml or in
 the later versions , specified as part of the search handlers. Do we always
 need to have this default search field defined in order to do search if the
 field name is not passed? 
 
 Suppose , there is a field named 'Title' and it holds a value called 'solr'.  
 
 In order to get results for this search -
 http://localhost:8080/solr/select/?q=solr&version=2.2&start=0&rows=10&indent=on
 ,
 
 do I need to define default search field and copy the contents of the
 specific field into this default search field ?
 
 <defaultSearchField>text</defaultSearchField>
 <copyField source="Title" dest="text"/>
 
 Thanks in advance!
 Lakshmi
 
 
 



Re: Reg Default search field

2012-08-06 Thread Jack Krupansky
defaultSearchField is deprecated in Solr 3.6. It is still supported, but the 
df query request parameter overrides it. So, go into solrconfig.xml and 
change the df parameter value from "text" to "Title".
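
A sketch of where that lives in solrconfig.xml (the handler shown is
illustrative):

    <requestHandler name="standard" class="solr.StandardRequestHandler" default="true">
      <lst name="defaults">
        <!-- per-handler default search field; a df request parameter still wins -->
        <str name="df">Title</str>
      </lst>
    </requestHandler>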


-- Jack Krupansky

-Original Message- 
From: Lakshmi Bhargavi

Sent: Monday, August 06, 2012 10:48 AM
To: solr-user@lucene.apache.org
Subject: Reg Default search field

Hi ,

I have a question on the default search field defined in schema.xml or in
the later versions , specified as part of the search handlers. Do we always
need to have this default search field defined in order to do search if the
field name is not passed?

Suppose , there is a field named 'Title' and it holds a value called 'solr'.

In order to get results for this search -
http://localhost:8080/solr/select/?q=solr&version=2.2&start=0&rows=10&indent=on
,

do I need to define default search field and copy the contents of the
specific field into this default search field ?

<defaultSearchField>text</defaultSearchField>
<copyField source="Title" dest="text"/>

Thanks in advance!
Lakshmi






[ANNOUNCE] Lucene/Solr @ ApacheCon Europe - August 13th Deadline for CFP and Travel Assistance applications

2012-08-06 Thread Chris Hostetter


ApacheCon Europe will be happening 5-8 November 2012 in Sinsheim, Germany 
at the Rhein-Neckar-Arena.  Early bird tickets go on sale this Monday, 6 
August.


  http://www.apachecon.eu/

The Lucene/Solr track is shaping up to be quite impressive this year, so 
make your plans to attend and submit your session proposals ASAP!


-- CALL FOR PAPERS --

The Call for Participation for ApacheCon Europe has been extended to 13 
August!


To submit a presentation and for more details, visit 
http://www.apachecon.eu/cfp/


Post a banner on your Website to show your support for ApacheCon Europe or 
North America (24-28 February 2013 in Portland, OR)! Download at 
http://www.apache.org/events/logos-banners/


We look forward to seeing you!

 -the Apache Conference Committee & ApacheCon Planners

--- TRAVEL ASSISTANCE ---

We're pleased to announce Travel Assistance (TAC) applications for 
ApacheCon Europe 2012 are now open!


The Travel Assistance Committee exists to help those who would like to 
attend ApacheCon events but are unable to do so for financial reasons. 
For more info on this year's Travel Assistance application criteria, please 
visit the TAC website at  http://www.apache.org/travel/ .


Some important dates... The original application period officially opened 
on 23rd July, 2012. Applicants have until 13th August 2012 to submit 
their applications (which should contain as much supporting material as 
required to efficiently and accurately process your request); this will 
enable the Travel Assistance Committee to announce successful awards on or 
shortly after 24th August, 2012.


As always TAC expects to deal with a range of applications from many 
diverse backgrounds so we encourage (as always) anyone thinking about 
sending in a TAC application to get it in ASAP.


We look forward to greeting everyone in Sinsheim, Germany in November.




Re: Stopping replication?

2012-08-06 Thread csscouter
Erick,

Thank you for the courtesy of your reply.

I was able to figure out the problem, and for the benefit of the list, I'll
lay out the analysis. Judging by the caliber of those on this list, this is
likely too basic for the interests of most, but newbies (among whom I still
classify myself) might benefit. Here's what occurred:

Recall that the version I'm using is 3.3. I don't know if these comments can
extend to versions other than 3.3, but I suspect so.

I noted in my initial plea: "I seem to recall that the slaves USED TO
say Solr Replication <name> Slave." It turns out that is indeed the
case, and that was a clue that they weren't being recognized as slave
servers. The file solrconfig.xml contains the configuration setup for
replication, under the entry <requestHandler name="/replication" ...> ...
</requestHandler>. A slave knows it's a slave by the following entry:

<lst name="slave">
  <str name="enable">true</str>
  <str name="masterUrl">http://<host>:<port>/<solr home location, in my case
'apache-solr-3.3.0'>/replication</str>
  <str name="pollInterval">00:00:60</str>
</lst>

The key here is the line <str name="enable">true</str>. There is at least
one fancy way to define "trueness" or "falseness" - by defining the value
as a parameter, and passing the resolution to the parameter in to Solr when
it starts.
solrconfig.xml file to be deployed to all servers running solr, and then
configuring those servers as slaves or the master at the time the servers
start. (The information on doing this is in the solr wiki documentation for
Replication at
http://wiki.apache.org/solr/SolrReplication#enable.2BAC8-disable_master.2BAC8-slave_in_a_node,
incidentally).

In my case, I'm running solr under WebLogic 10.3.2 application server. I had
defined the line:

  <str name="enable">true</str>

as:

  <str name="enable">${org.apache.solr.handler.enable.slave:false}</str>

in my solrconfig.xml, and had been starting the WebLogic managed servers
with the parameter "-Dorg.apache.solr.handler.enable.master=false". Note
that this parameter deals with the *master* and not the slave. This was
working in my existing environment, and despite the fact that no
-Dorg.apache.solr.handler.enable.slave=true parameter was being passed in
from WebLogic, the slaves were able to recognize themselves as slaves. In
the new WebLogic environment, this was no longer the case. I don't know why
at this point.
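
Presumably the matching slave-side startup arguments under this scheme,
mirroring the master flag already in use, would be something like this
(hypothetical JVM argument list, untested):

    -Dorg.apache.solr.handler.enable.slave=true -Dorg.apache.solr.handler.enable.master=false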

To solve the problem for the short term, I created a separate file for the
slave servers that bypassed the whole parameter-resolution mechanism by
defining that line under the slave configuration in its solrconfig.xml as:

  <str name="enable">true</str>

That, of course, now leaves me with 2 solrconfig.xml files - one for the
master server, and one for the slave servers. My bottom line is that at
least it's now working, people are not being impacted, and I can
troubleshoot the underlying issue at a more leisurely pace.

Hope this helps someone, somewhere. Erick, thanks for taking an interest.

Tim Hibbs





Re: Running out of memory

2012-08-06 Thread Michael Della Bitta
You might want to look at turning down or eliminating your caches if
you're running out of RAM. Possibly some of them have a low hit rate,
which you can see on the Stats page. Caches with a low hit rate are
only consuming RAM and CPU cycles.

Also, using this JVM arg might reduce the memory footprint:
-XX:+UseCompressedOops
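
For example, Jon's startup line (quoted below) with that flag added would be:

    /usr/lib/jvm/jre/bin/java -XX:+UseConcMarkSweepGC -XX:+UseCompressedOops -Xms1G -Xmx5G -jar start.jar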

In the end though, the surefire solution would be to go to an instance
type with more RAM: http://www.ec2instances.info/

Michael Della Bitta


Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
Where Influence Isn’t a Game


On Mon, Aug 6, 2012 at 1:48 PM, Jon Drukman jdruk...@gmail.com wrote:
 Hi there.  I am running Solr 1.4.1 on an Amazon EC2 box with 7.5GB of RAM.
  It was set up about 18 months ago and has been largely trouble-free.
  Unfortunately, lately it has started to run out of memory pretty much
 every day.  We are seeing

 SEVERE: java.lang.OutOfMemoryError: Java heap space

 When that happens, a simple query like
 http://localhost:8983/solr/select?q=*:*
 returns nothing.

 I am starting Solr with the following:

 /usr/lib/jvm/jre/bin/java -XX:+UseConcMarkSweepGC -Xms1G -Xmx5G -jar
 start.jar

 It would be vastly preferable if Solr could just exit when it gets a memory
 error, because we have it running under daemontools, and that would cause
 an automatic restart.  After restarting, Solr works fine for another 12-18
 hours.  Not ideal but at least it wouldn't require human intervention to
 get it going again.

 What can I do to reduce the memory pressure?  Does Solr require the entire
 index to fit in memory at all times?  The on-disk size is 15GB.  There are
 27.5 million documents, but they are all tiny (mostly one-line forum
 comments like "this game is awesome").

 We're using Sun openJava SDK 1.6 if that matters.

 -jsd-


Multiple Embedded Servers Pointing to single solrhome/index

2012-08-06 Thread Bing Hua
Hi,

I'm trying to use two embedded solr servers pointing to a same solrhome /
index. So that's something like

System.setProperty("solr.solr.home", "SomeSolrDir");
CoreContainer.Initializer initializer = new CoreContainer.Initializer();
CoreContainer coreContainer = initializer.initialize();
m_server = new EmbeddedSolrServer(coreContainer, "");

on both applications. The problem is, after I have done one add+commit of a
SolrInputDocument on one embedded server, the other server can no longer
obtain the write lock. I'm thinking there must be a way of releasing the
write lock so other servers may pick it up. Is there an API that does so?

Any inputs are appreciated.
Bing


Two questions on spellchecking

2012-08-06 Thread Uwe Reh

Hi,

even though I have read a lot, none of my spellchecker configurations works 
really well, and I have reached a dead end. Maybe someone could help me 
solve these challenges.


- How can I get case-sensitive suggestions, independent of the case given 
in the query?


- How do I configure a 'did you mean' spellchecker, as discussed in 
https://issues.apache.org/jira/browse/SOLR-2585 (Context-Sensitive 
Spelling Suggestions & Collations)?



I'm using following environment:
- Solr 4.0-alpha (downloaded 25 June)
- Java 7
- schema.xml

<fieldType name="textSuggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

 ...

<field name="suggest" type="textSuggest" indexed="true" stored="true"
required="false" multiValued="true"/>

- solrconfig.xml (suggester)

<requestHandler name="/hint" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">all</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggester</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.count">20</str>
  </lst>
  <arr name="components">
    <str>suggester</str>
  </arr>
</requestHandler>
<searchComponent name="suggester" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggester</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">suggest</str>
  </lst>
</searchComponent>

- solrconfig.xml (spellcheck)

<requestHandler name="standard" class="solr.StandardRequestHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">all</str>
    <int name="rows">10</int>
    <str name="df">allfields</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.count">20</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">suggest</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.1</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">1</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">1</int>
    <float name="maxQueryFrequency">0.1</float>
    <float name="thresholdTokenFrequency">0.001</float>
  </lst>
</searchComponent>


*Suggester problem*
With this configuration the suggester is not case sensitive, and the 
hints come back all lower case.

Example: .../hint?q=da&wt=xml&spellcheck=true&spellcheck.build=true

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">173</int>
  <lst name="params">
    <str name="spellcheck">true</str>
    <str name="echoParams">all</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.dictionary">suggester</str>
    <str name="spellcheck.count">20</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck">true</str>
    <str name="q">da</str>
    <str name="wt">xml</str>
    <str name="spellcheck.build">true</str>
  </lst>
</lst>
<str name="command">build</str>
<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="da">
      <int name="numFound">20</int>
      <int name="startOffset">0</int>
      <int name="endOffset">2</int>
      <arr name="suggestion">
        <str>dat-marktspiegel spezial</str>
        <str>data structures with c++ using stl</str>
        <str>data warehouse</str>
        <str>datan, ingeborg</str>
        <str>datenbanken mit delphi</str>
        <str>datenverschlüsselung</str>
        <str>dauner, gabriele</str>
        <str>dautermann, margit</str>
        <str>david copperfield</str>
        <str>david, horst</str>
        <str>david, leo</str>
        <str>david, nicholas</str>
        <str>davis, charles t.</str>
        <str>davis, edward l</str>
        <str>davis, leslie dorfman</str>
        <str>davis, stanley m.</str>
        <str>davor kommt noch</str>
        <str>davydova, irina n.</str>
        <str>dawidowski, bernd</str>
        <str>dayan, daniel</str>
      </arr>
    </lst>
    <bool name="correctlySpelled">false</bool>
  </lst>
</lst>
</response>
Using just solr.StrField as the field type, the suggestions keep the 
original capitalization, but I get no suggestions if the query starts 
with a lower-case character.


*Spelling problem*
One of the indexed entries in the field 'suggest' is "David Copperfield", 
and I want this string as an alternative suggestion for the query "David 
opperfield".

Example: .../select?q=david+opperfield&rows=0&wt=xml&spellcheck=true

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">15</int>
  <lst name="params">
    <str name="df">allfields</str>
    <str name="echoParams">all</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">20</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="rows">0</str>
    <str name="spellcheck">true</str>
    <str name="q">david opperfield</str>

Re: Trending topics?

2012-08-06 Thread Otis Gospodnetic
Chris,

I'm not sure if Solr by itself can really do this (easily and/or well).
Have a look 
at http://sematext.com/products/key-phrase-extractor/index.html which can do 
exactly that, but without Solr.  Some of the highlighted bits refer to trending 
topics, though not using exactly that terminology.

Otis 

Performance Monitoring for Solr / ElasticSearch / HBase - 
http://sematext.com/spm 




 From: Chris Dawson xrdaw...@gmail.com
To: solr-user@lucene.apache.org 
Sent: Thursday, August 2, 2012 11:34 AM
Subject: Trending topics?
 
How would I generate a list of trending topics using solr?

Chris




Re: Multiple Embedded Servers Pointing to single solrhome/index

2012-08-06 Thread Lance Norskog
Where is the common index? On NFS?

If it is on a native hard disk (on the same computer) Solr uses the
file locking mechanism supplied by the operating system (Linux or
Windows). This may not be working right. See this for more info on
file locking:
http://wiki.apache.org/lucene-java/AvailableLockFactories
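
For reference, the lock factory is chosen in solrconfig.xml; a sketch in the
3.x config layout (native is one of the available values, alongside simple,
single and none):

    <mainIndex>
      <!-- which LockFactory implementation guards the index write lock -->
      <lockType>native</lockType>
    </mainIndex>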

On Mon, Aug 6, 2012 at 10:56 AM, Bing Hua bh...@cornell.edu wrote:
 Hi,

 I'm trying to use two embedded solr servers pointing to a same solrhome /
 index. So that's something like

 System.setProperty("solr.solr.home", "SomeSolrDir");
 CoreContainer.Initializer initializer = new CoreContainer.Initializer();
 CoreContainer coreContainer = initializer.initialize();
 m_server = new EmbeddedSolrServer(coreContainer, "");

 on both applications. The problem is, after I have done one add+commit of a
 SolrInputDocument on one embedded server, the other server can no longer
 obtain the write lock. I'm thinking there must be a way of releasing the
 write lock so other servers may pick it up. Is there an API that does so?

 Any inputs are appreciated.
 Bing



-- 
Lance Norskog
goks...@gmail.com


Re: Problem with Solr 4.0-ALPHA and JSON response

2012-08-06 Thread Sami Siren
On Fri, Jul 27, 2012 at 6:32 PM, Federico Valeri fedeval...@gmail.com wrote:
 Hi all,

Hi,

 I'm new to Solr, I have a problem with JSON format, this is my Java
 client code:


The Java client (SolrServer) can only operate with the xml or javabin
formats. If you need to get the json response from Solr using Java, you
could just use an http client directly and bypass the Solr client.
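
A minimal sketch of that, using only the JDK (host and query are
placeholders):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;

    // Sketch: bypass SolrJ and read Solr's JSON response over plain HTTP.
    public class JsonFetchSketch {
        public static void main(String[] args) throws Exception {
            URL url = new URL("http://localhost:8983/solr/select?q=*:*&wt=json");
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream(), "UTF-8"));
            for (String line; (line = in.readLine()) != null; ) {
                System.out.println(line); // raw JSON from Solr
            }
            in.close();
        }
    }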

 Now the problem is that I receive the response but it doesn't trigger the
 javascript callback function.
 I see wt=javabin in the SolrCore.execute log, even if I set wt=json in the
 parameters; is this normal?

Yes; to control the format used by the client there's a method
HttpSolrServer#setParser that sets the client parser (and that also
overrides the wt param when the request is made).

--
 Sami Siren