SolrCloud (7.3) and Legacy replication slaves

2019-05-21 Thread Michael Tracey
Is it possible to set up an existing SolrCloud cluster as the master for
legacy replication to a slave server or two?   Another option looks to be
uni-directional CDCR, but I'm not sure which is the best choice in this
case.

-- 
Michael Tracey


Sort order, return the first 20 results, and the last 80 results

2019-02-12 Thread Michael Tracey
Hey all,  I'm interested in returning 100 rows in a query, sorted on a tfloat 
field, but returning the first 20 results followed by the last 80 results.  I'd 
like to do this without two requests, to keep down requests per second.   Is 
there any way to do this in one query, with function queries or another method?
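As far as I know, Solr offers no single-request way to splice both ends of a sorted result set; a minimal client-side sketch of the intended ordering (plain Python, with an in-memory list standing in for the Solr response and a hypothetical `price_f` field):

```python
def first_and_last(docs, key, head=20, tail=80):
    """Return the first `head` and last `tail` documents of a sorted list.

    Stand-in for two Solr requests: one with sort=<field> asc&rows=<head>,
    and one with sort=<field> desc&rows=<tail> (reversed to restore order).
    """
    ordered = sorted(docs, key=key)
    return ordered[:head] + ordered[-tail:]

# Toy data standing in for documents with a tfloat field named price_f.
docs = [{"price_f": float(i)} for i in range(1000)]
picked = first_and_last(docs, key=lambda d: d["price_f"])
# picked holds the 20 lowest values followed by the 80 highest.
```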

Thanks, Michael

SolrCloud Nodes autoSoftCommit and (temporary) missing documents

2014-05-23 Thread Michael Tracey
Hey all,

I've got a number of nodes (Solr 4.4 Cloud) that I'm balancing with HAProxy for 
queries.  I'm indexing pretty much constantly, and have autoCommit and 
autoSoftCommit on for near-real-time searching.  All works nicely, except that 
occasionally the auto-commit cycles are far enough apart that one node will 
return a document that another node doesn't.  I don't want to have to add 
something like timestamp:[* TO NOW-30MINUTES] to every query to make sure 
that all the nodes have the record.  Ideas? autoSoftCommit more often?

<autoCommit>
   <maxDocs>10</maxDocs>
   <maxTime>720</maxTime>
   <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
   <maxTime>3</maxTime>
   <maxDocs>5000</maxDocs>
</autoSoftCommit>
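If a timestamp filter turns out to be unavoidable, one refinement (an assumption on my part, not something from the thread) is to round NOW so the filter query stays identical across requests and can be served from the filter cache:

```
fq=timestamp:[* TO NOW/MINUTE-30MINUTES]
```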

Thanks,

M.


Turning on KeywordRepeat and RemoveDups on an existing fieldType.

2014-05-05 Thread Michael Tracey
As per the stemming docs ( 
https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming ), I want 
to score the original term higher than the stemmed version by adding:

   <filter class="solr.KeywordRepeatFilterFactory"/>
   <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>

to a field type that is already created (with Stemming). I have 100M documents 
in this index, and it gets slowly reindexed every month as records change.  My 
question is, can I add this to the existing fieldType, or do I need to make a 
new fieldType, and copyField the data over to it, and after it's all reindexed 
switch my code?  I'd rather be able to just add the lines to my fieldType 
because I don't think I have enough disk space on my cloud members to hold my 
primary fulltext field twice.

Just in case it helps, I'm running 4.4.0 and the field I'm wanting to mod looks 
like this:

<fieldType name="keywordText" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="keyword_stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="keyword_stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
</fieldType>
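For what it's worth, a sketch of where the two filters would sit in the index analyzer (KeywordRepeat before the stemmer, RemoveDuplicates after it, so the unstemmed copy survives only when stemming actually changed the token); whether an in-place edit is safe for already-indexed data is the open question:

```xml
<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- ...existing WordDelimiter / LowerCase / Stop filters unchanged... -->
  <filter class="solr.KeywordRepeatFilterFactory"/>
  <filter class="solr.SnowballPorterFilterFactory" language="English"
          protected="protwords.txt"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
```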

Thanks,

M.


Re: Problems bulk adding documents to Solr Cloud in 4.5.1

2013-11-19 Thread Michael Tracey
Dave, those are the exact symptoms we have all had in SOLR-5402.  After many 
attempted fixes (including upgrading Jetty, switching to Tomcat, and messing 
with buffer settings), my solution was to fall back to 4.4 and await a fix.

- Original Message -
From: Dave Seltzer dselt...@tveyes.com
To: solr-user@lucene.apache.org
Sent: Monday, November 18, 2013 9:48:46 PM
Subject: Problems bulk adding documents to Solr Cloud in 4.5.1

Hello,

I'm having quite a bit of trouble indexing content in Solr Cloud. I built a
content indexer on top of the REST API, designed to index my data quickly.
It was working very well, indexing about 100 documents per "add"
instruction.

After some tweaking of the schema I switched on a few more servers, set up
a few shards, and started indexing data. Everything was working perfectly,
but as soon as I switched to Cloud I started getting
RemoteSolrExceptions: "Illegal to have multiple roots."

I'm using the stock Jetty container on both servers.

To get things working I reduced the number of documents per add until it
worked. Unfortunately that has limited me to adding a single document per
add - which is quite slow.

I'm fairly sure it's not the size of the HTTP post because things were
working just fine until I moved over to Solr Cloud.

Does anyone have any information about this problem? It sounds a lot like
Sai Gadde's https://issues.apache.org/jira/browse/SOLR-5402

Thanks so much!

-Dave


qf match density?

2013-11-11 Thread Michael Tracey
While doing a search like:

q=great+gatsby&defType=edismax&qf=title^1.8

records whose title contains "great gatsby" twice ("great gatsby / great 
gatsby") always score higher than records containing it just once.

How do I express that a single match should be just as important as the 
query matching multiple times in the title field?
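One common way (my suggestion, not something discussed in this thread) to make term frequency irrelevant is to disable it on the field, at the cost of a reindex:

```xml
<!-- hypothetical field definition; "text_general" stands in for the title type -->
<field name="title" type="text_general" indexed="true" stored="true"
       omitTermFreqAndPositions="true"/>
```

Note that this also discards positions, so phrase queries against the field stop working.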

Thanks, m.


Is this a reasonable way to boost?

2013-11-07 Thread Michael Tracey
I'm trying to slightly boost results on a price (not currency) field that are 
closer to a certain value.  I want results that are neither too expensive nor 
too inexpensive to be favored.  Here is what we are currently trying:

bf=sub(1,abs(sub(15,price)))^0.2

where 15 is the median I want to boost towards.  Is this a good way?  I 
understand that in older Solr versions it was common to use recip(ord()) for 
this, but that you shouldn't do so now.
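One pitfall worth checking in plain Python (a sketch, not from the thread): sub(1,abs(sub(15,price))) goes negative as soon as price is more than one unit from 15, so it actively penalizes mid-range misses rather than merely fading out. A reciprocal shape such as recip(abs(sub(price,15)),1,10,10), which Solr supports as a function query, stays positive and decays smoothly:

```python
def linear_boost(price, target=15.0):
    # sub(1, abs(sub(target, price))): goes negative once |price - target| > 1
    return 1.0 - abs(target - price)

def recip_boost(price, target=15.0, m=1.0, a=10.0, b=10.0):
    # recip(x, m, a, b) = a / (m * x + b), with x = abs(price - target)
    return a / (m * abs(price - target) + b)

# The linear form punishes misses hard; the reciprocal decays gently.
samples = [(p, linear_boost(p), recip_boost(p))
           for p in (5.0, 14.0, 15.0, 16.0, 30.0)]
```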

Thanks for any comments or advice on improving this.

M.





SolrCloud (4.4) and CurrencyField refresh intervals

2013-11-04 Thread Michael Tracey
I've got a 4.4 SolrCloud cluster running, and an external process that 
rebuilds the currency.xml file and uploads the latest version to ZooKeeper 
every X minutes.

It looks like the OpenExchangeRatesOrgProvider for CurrencyField has a 
refreshInterval setting, but the documentation does not mention a 
refreshInterval on FileExchangeRateProvider.  Is there a way to do this 
without reloading the whole core on each of the nodes after updating the 
rates?  (Ideally, I'd like the changes to be picked up at the next hard commit.)
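For reference, the documented shape of the OpenExchangeRatesOrgProvider configuration (refreshInterval is in minutes; the app_id below is a placeholder, and FileExchangeRateProvider has no equivalent attribute as far as I can tell):

```xml
<fieldType name="currency" class="solr.CurrencyField"
           providerClass="solr.OpenExchangeRatesOrgProvider"
           refreshInterval="60"
           ratesFileLocation="http://www.openexchangerates.org/api/latest.json?app_id=yourPersonalAppIdKey"/>
```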

Thanks,

M.


Re: Solr 4.5.1 replication Bug? Illegal to have multiple roots (start tag in epilog?).

2013-10-28 Thread Michael Tracey
Hey, this is Michael, who was having the exact same error on the Jetty side 
with an update.  I've upgraded Jetty from the version embedded in 4.5.1 (in 
the example directory) to version 9.0.6, which meant upgrading my OpenJDK 
from 1.6 to 1.7.0_45.  I also added the suggested (very large) settings to 
my solrconfig.xml: 

<requestParsers enableRemoteStreaming="true" formdataUploadLimitInKB="2048000"
                multipartUploadLimitInKB="2048000"/>

but I am still getting the errors when I put a second server in the cloud. 
A single server (external ZooKeeper, but no cloud partner) works just fine.

I suppose my next step is to try Tomcat, but according to your post, that will 
not help!

Any help is appreciated,

M.

- Original Message -
From: Sai Gadde gadde@gmail.com
To: solr-user@lucene.apache.org
Sent: Monday, October 28, 2013 7:10:41 AM
Subject: Solr 4.5.1 replication Bug? Illegal to have multiple roots (start tag 
in epilog?).

We have a similar error to the one in this thread:

http://www.mail-archive.com/solr-user@lucene.apache.org/msg90748.html

We tried the Tomcat settings from this post, using the exact settings 
specified there. We merge 500 documents at a time. I am creating a new 
thread because Michael is using Jetty whereas we use Tomcat.


The formdataUploadLimitInKB and multipartUploadLimitInKB limits are set to a 
very high value (2GB), as suggested in the following thread:
https://issues.apache.org/jira/browse/SOLR-5331


We use out-of-the-box Solr 4.5.1 with no customization. If we merge
documents via SolrJ to a single server, it works perfectly fine.

But as soon as we add another node to the cloud, we get the
following while merging documents.



This is the error we are getting on the server where the merging happens
(10.10.10.116; the IP is irrelevant, just for clarity). 10.10.10.119
is the new node here. This server gets a RemoteSolrException:


shard update error StdNode:
http://10.10.10.119:8980/solr/mycore/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
Illegal to have multiple roots (start tag in epilog?).
 at [row,col {unknown-source}]: [1,12468]
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:425)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
at 
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:401)
at 
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:1)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown 
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)





On the other server 10.10.10.119 we get following error


org.apache.solr.common.SolrException: Illegal to have multiple roots
(start tag in epilog?).
 at [row,col {unknown-source}]: [1,12468]
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176)
at 
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
at 
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:936)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
at 
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1004)
at 

Re: Solr 4.5.1 and Illegal to have multiple roots (start tag in epilog?). (perhaps SOLR-4327 bug?)

2013-10-25 Thread Michael Tracey
Thanks Chris and Sai. I was hoping to use the standard Jetty configuration 
(noting another thread on these forums indicating that it is the default and 
supported container), but will migrate to Tomcat if needed.  Has anyone found a 
workaround that works with the standard container?

We are sending updates of around 1000 records at a time, about 500k for the 
whole JSON document. 

Sent from my iPhone

 On Oct 25, 2013, at 6:01 AM, Sai Gadde gadde@gmail.com wrote:
 
 We were trying to migrate to 4.5 from 4.0 and faced a similar issue as well.
 I saw the ticket raised by Chris and tried setting formdataUploadLimitInKB
 to a higher value, which did not resolve the issue.
 
 We currently use Solr 4.0.0, and no additional container settings are
 required. But it is very strange, since when I tested with a single instance
 there was no problem at all. How come it is so difficult for two Solr
 instances to communicate with each other! I expect a SolrCloud setup to
 be independent of container configuration.
 
 Anyway, thanks Chris for the info; we will try these Tomcat settings and see
 if this issue goes away.
 
 
 On Fri, Oct 25, 2013 at 4:35 PM, Chris Geeringh geeri...@gmail.com wrote:
 
 Hi Michael,
 
 I opened that ticket, and it looks like there is indeed a buffer or limit I
 was exceeding. As per the ticket I guess the stream is cut off at that
 limit, and is then malformed. I am using Tomcat, and since increasing some
 limits on the connector, I haven't had any issues since. I'll close that
 ticket.
 
  <Connector port="8080" protocol="HTTP/1.1"
             connectionTimeout="6"
             redirectPort="8443" maxPostSize="104857600"
             maxHttpHeaderSize="819200" maxThreads="1"/>
 
 Hope that helps.
 
 Cheers,
 Chris
 
 
 On 25 October 2013 03:48, Michael Tracey mtra...@biblio.com wrote:
 
 Hey Solr-users,
 
 I've got a single solr 4.5.1 node with 96GB ram, a 65GB index (105
 million
 records) and a lot of daily churn of newly indexed files (auto softcommit
 and commits).  I'm trying to bring another matching node into the mix,
 and
 am getting these errors on the new node:
 
 org.apache.solr.common.SolrException;
 org.apache.solr.common.SolrException: Illegal to have multiple roots
 (start
 tag in epilog?).
 
 On the old server, still running, I'm getting:
 
 shard update error StdNode: http://server1:
 /solr/collection/:org.apache.solr.client.solrj.SolrServerException:
 Server refused connection at: http://server2:/solr/collection
 
 the new core never actually comes online; it stays in recovery mode. The
 other two tiny cores (100,000+ records each, not updated frequently)
 work just fine.
 
 Is this the SOLR-4327 bug?  https://issues.apache.org/jira/browse/SOLR-5331
 And if so, how can I get the new node up and running so I can get back
 into production with some redundancy and speed?
 
 I'm running an external zookeeper, and that is all running just fine.
 Also internal Solrj/jetty with little to no modifications.
 
 Any ideas would be appreciated, thanks,
 
 M.
 


Solr 4.5.1 and Illegal to have multiple roots (start tag in epilog?). (perhaps SOLR-4327 bug?)

2013-10-24 Thread Michael Tracey
Hey Solr-users,

I've got a single solr 4.5.1 node with 96GB ram, a 65GB index (105 million 
records) and a lot of daily churn of newly indexed files (auto softcommit and 
commits).  I'm trying to bring another matching node into the mix, and am 
getting these errors on the new node:

org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: 
Illegal to have multiple roots (start tag in epilog?).

On the old server, still running, I'm getting: 

shard update error StdNode: 
http://server1:/solr/collection/:org.apache.solr.client.solrj.SolrServerException:
 Server refused connection at: http://server2:/solr/collection

The new core never actually comes online; it stays in recovery mode.  The 
other two tiny cores (100,000+ records each, not updated frequently) work 
just fine.

Is this the SOLR-4327 bug?  https://issues.apache.org/jira/browse/SOLR-5331   And 
if so, how can I get the new node up and running so I can get back into 
production with some redundancy and speed?

I'm running an external ZooKeeper, and that is all running just fine.  I'm 
also using the internal SolrJ/Jetty with little to no modification.  

Any ideas would be appreciated, thanks, 

M.


Controlling traffic between solr 4.1 nodes

2013-02-05 Thread Michael Tracey
Hey all, I'm new to Solr 4.x and am wondering whether there is any way to have 
a single collection (single or multiple shards) replicated across two 
datacenters, where only one Solr instance in each datacenter communicates with 
the other (for example, four servers in one DC and four in the other, with 
only one in each DC talking across).

From everything I've seen, all ZooKeeper nodes and replicas must have access 
to all other members.  Is there something I'm missing?

Thanks,

M.