Which fields matched?

2012-12-07 Thread Jeff Wartes
If I have an arbitrarily complex query that uses ORs, something like: q=(simple_fieldtype:foo OR complex_fieldtype:foo) AND (another_simple_fieldtype:bar OR another_complex_fieldtype:bar) I want to know which fields actually contributed to the match for each document returned. Something like:

RE: Which fields matched?

2012-12-07 Thread Jeff Wartes
-text Lucene Explanation object for the query and then traverse it to get your matched field list without all the text. No parsing would be required, but the Explanation structure could get messy. -- Jack Krupansky -Original Message- From: Jeff Wartes Sent: Friday, December 07, 2012 11:59 AM

RE: Which fields matched?

2012-12-11 Thread Jeff Wartes
-1999 sounds almost the same, but I never looked into the source. On Fri, Dec 7, 2012 at 11:00 PM, Jeff Wartes jwar...@whitepages.com wrote: Thanks, I did start to dig into how DebugComponent does its thing a little, and I'm not all the way down the rabbit hole yet, but the lucene

RE: Solr load balancer

2013-01-31 Thread Jeff Wartes
For what it's worth, Google has done some pretty interesting research into coping with the idea that particular shards might very well be busy doing something else when your query comes in. Check out this slide deck: http://research.google.com/people/jeff/latency.html Lots of interesting

Can't mix Synonyms with Shingles?

2011-08-10 Thread Jeff Wartes
on the original tokens, as they do if I remove the ShingleFilterFactory. I'm using Solr 3.3, any clarification would be appreciated. Thanks, -Jeff Wartes

RE: Can't mix Synonyms with Shingles?

2011-08-10 Thread Jeff Wartes
InternationalCorporation. If this is the form you want to use for synonym matching, it must exist in your synonym file. Does it? Steve -Original Message- From: Jeff Wartes [mailto:jwar...@whitepages.com] Sent: Wednesday, August 10, 2011 3:43 PM To: solr-user@lucene.apache.org Subject: Can't mix

RE: Can't mix Synonyms with Shingles?

2011-08-10 Thread Jeff Wartes
filter. -Original Message- From: Jeff Wartes [mailto:jwar...@whitepages.com] Sent: Wednesday, August 10, 2011 1:27 PM To: solr-user@lucene.apache.org Subject: RE: Can't mix Synonyms with Shingles? Hi Steven, The token separator was certainly a deliberate choice, are you saying

DistributedSearchDesign and multiple requests

2010-10-21 Thread Jeff Wartes
of shortcuts difficult before I dig in. Thanks, -Jeff Wartes

Re: Knowing what field caused the retrival of the document

2013-08-06 Thread Jeff Wartes
For what it's worth, I had the same question last year, and I never really got a good solution: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201212.mbox/%3C81 e9a7879c550b42a767f0b86b2b81591a15b...@ex4.corp.w3data.com%3E I dug into the highlight component for a while, but it turned

TermFrequency in a multi-valued field

2013-08-07 Thread Jeff Wartes
This might end up being more of a Lucene question, but anyway... For a multivalued field, it appears that term frequency is calculated as something a little like: sum(tf(value1), ..., tf(valueN)) I'd rather my score not give preference based on how *many* of the values in the multivalued field
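
To make the concern above concrete, here is a minimal Python sketch. It is not Lucene's scoring code: the helper names `tf`, `summed_tf`, and `max_tf` are hypothetical, and whitespace tokenization is an assumption. It only contrasts a term frequency summed over all values of a multivalued field with one taken from the single best value.

```python
def tf(term, value):
    """Count occurrences of term in one field value (naive whitespace tokens)."""
    return value.lower().split().count(term.lower())


def summed_tf(term, values):
    """Roughly the behavior described above: every matching value adds to the total."""
    return sum(tf(term, v) for v in values)


def max_tf(term, values):
    """Alternative aggregate: only the best single value counts."""
    return max((tf(term, v) for v in values), default=0)
```

With the summed form, a document gains score simply by having more matching values; with the max form it does not.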

Re: TermFrequency in a multi-valued field

2013-08-07 Thread Jeff Wartes
A multivalued text field is directly equivalent to concatenating the values, with a possible position gap between the last and first terms of adjacent values. That, in a nutshell, would be the problem. Maybe the discussion is over at this point. It could be I dumbed down the problem a bit

Distance sort on a multi-value field

2013-08-14 Thread Jeff Wartes
I'm still pondering aggregate-type operations for scoring multi-valued fields (original thread: http://goo.gl/zOX53f ), and it occurred to me that distance-sort with SpatialRecursivePrefixTreeFieldType must be doing something like that. Somewhat surprisingly I don't see this in the documentation

Re: Distance sort on a multi-value field

2013-08-14 Thread Jeff Wartes
. You're right, that doesn't look like something I can easily use for more general aggregate scoring control. Ah well. On 8/14/13 12:35 PM, Smiley, David W. dsmi...@mitre.org wrote: On 8/14/13 2:26 PM, Jeff Wartes jwar...@whitepages.com wrote: I'm still pondering aggregate-type operations

Re: Distance sort on a multi-value field

2013-08-22 Thread Jeff Wartes
, Jeff Wartes <jwartes@> wrote: Hm, "Give me all the stores that only have branches in this area" might be a plausible use case for farthest distance. That's essentially a contains question though, so maybe that's already supported? I guess it depends on how contains/intersects/etc

Required local configuration with ZK solr.xml?

2014-01-28 Thread Jeff Wartes
It was my hope that storing solr.xml would mean I could spin up a Solr node pointing it to a properly configured zookeeper ensemble, and that no further local configuration or knowledge would be necessary. However, I’m beginning to wonder if that’s sufficient. It’s looking like I may also

Re: Required local configuration with ZK solr.xml?

2014-01-29 Thread Jeff Wartes
...the difference between that example and what you are doing here is that in that example, because both of nodes already had collection1 instance dirs, they expected to be part of collection1 when they joined the cluster. And that, I think, is my misunderstanding. I had assumed that the link

Re: Required local configuration with ZK solr.xml?

2014-01-30 Thread Jeff Wartes
Work is underway towards a new mode where zookeeper is the ultimate source of truth, and each node will behave accordingly to implement and maintain that truth. I can't seem to locate a Jira issue for it, unfortunately. It's possible that one doesn't exist yet, or that it has an obscure title.

Re: Required local configuration with ZK solr.xml?

2014-01-30 Thread Jeff Wartes
Found it. In case anyone else cares, this appears to be the root issue: https://issues.apache.org/jira/browse/SOLR-5128 Thanks again. On 1/30/14, 9:01 AM, Jeff Wartes jwar...@whitepages.com wrote: Work is underway towards a new mode where zookeeper is the ultimate source of truth, and each

Re: SolrCloud how to spread out to multiple nodes

2014-02-10 Thread Jeff Wartes
If you’re only concerned with moving your shards, (rather than changing the number of shards), I’d: 1. Add a new server and fire up Solr pointed to the same ZooKeeper with the same config. At this point the new server won’t be indexing anything, but will still technically be part of the

handleSelect=true with SolrCloud

2014-02-11 Thread Jeff Wartes
I’m working on a port of a Solr service to SolrCloud. (Targeting v4.6.0 at present.) The old query style relied on using /solr/select?qt=foo to select the proper requestHandler. I know handleSelect=true is deprecated now, but it’d be pretty handy for testing to be able to be backwards

Re: handleSelect=true with SolrCloud

2014-02-11 Thread Jeff Wartes
Got it in one. Thanks! On 2/11/14, 9:50 AM, Shawn Heisey s...@elyograg.org wrote: On 2/11/2014 10:21 AM, Jeff Wartes wrote: I’m working on a port of a Solr service to SolrCloud. (Targeting v4.6.0 at present.) The old query style relied on using /solr/select?qt=foo to select the proper

ZK connection problems

2014-02-21 Thread Jeff Wartes
I’ve been experimenting with SolrCloud configurations in AWS. One issue I’ve been plagued with is that during indexing, occasionally a node decides it can’t talk to ZK, and this disables updates in the pool. The node usually recovers within a second or two. It’s possible this happens when I’m

Re: DistributedSearch: Skipping STAGE_GET_FIELDS?

2014-02-24 Thread Jeff Wartes
I’ll second that thank-you, this is awesome. I asked about this issue in 2010, but when I didn’t hear anything (and disappointingly didn’t find SOLR-1880), we ended up rolling our own version of this functionality. I’ve been laboriously migrating it every time we bump our Solr version ever

Re: SolrCloud Startup

2014-02-24 Thread Jeff Wartes
There is a RELOAD collection command you might try: https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api2 I think you’ll find this a lot faster than restarting your whole JVM. On 2/24/14, 4:12 PM, KNitin nitin.t...@gmail.com wrote: Hi I have a 4 node

Re: Automate search results filtering based on scoring

2014-03-05 Thread Jeff Wartes
It’s worth mentioning that scores should not be considered comparable across queries, so equating “confidence” and “score” is a tricky proposition. That is, the maxScore for the search “field1:foo” may be 10.0, and the maxScore for “field1:bar” may be 1.0, but that doesn’t mean the top result for

Re: Result merging takes too long

2014-03-17 Thread Jeff Wartes
This is highly anecdotal, but I tried SOLR-1880 with 4.7 for some tests I was running, and saw almost a 30% improvement in latency. If you’re only doing document selection, it’s definitely worth having. I’m reasonably certain that the patch would work in 4.6 too, but the test file relies on some

Re: Bootstrapping SolrCloud cluster with multiple collections in differene sharding/replication setup

2014-03-20 Thread Jeff Wartes
Please note that although the article talks about the ADDREPLICA command, that feature is coming in Solr 4.8, so don’t be confused if you can’t find it yet. See https://issues.apache.org/jira/browse/SOLR-5130 On 3/20/14, 7:45 AM, Erick Erickson erickerick...@gmail.com wrote: You might find

Re: Logging which client connected to Solr

2014-03-27 Thread Jeff Wartes
You could always just pass the username as part of the GET params for the query. Solr will faithfully ignore and log any parameters it doesn’t recognize, so it’d show up in your {lot of params}. That means your log parser would need more intelligence, and your client would have to pass in the

Re: svn vs GIT

2014-04-14 Thread Jeff Wartes
I vastly prefer git, but last I checked, (admittedly, some time ago) you couldn't build the project from the git clone. Some of the build scripts assumed some svn commands will work. On 4/12/14, 3:56 PM, Furkan KAMACI furkankam...@gmail.com wrote: Hi Amon; There has been a conversation about

Re: svn vs GIT

2014-04-15 Thread Jeff Wartes
. Aiyengar wrote: ant compile / ant -f solr dist / ant test certainly work, I use them with a git working copy. You trying something else? On 14 Apr 2014 19:36, Jeff Wartes jwar...@whitepages.com wrote: I vastly prefer git, but last I checked, (admittedly, some time ago) you couldn't build

Re: timeAllowed in not honoring

2014-04-30 Thread Jeff Wartes
It’s not just FacetComponent, here’s the original feature ticket for timeAllowed: https://issues.apache.org/jira/browse/SOLR-502 As I read it, timeAllowed only limits the time spent actually getting documents, not the time spent figuring out what data to get or how. I think that means the

Re: When not to use NRTCachingDirectory and what to use instead.

2014-04-30 Thread Jeff Wartes
On 4/19/14, 6:51 AM, Ken Krugler kkrugler_li...@transpac.com wrote: The code I see seems to be using an FSDirectory, or is there another layer of wrapping going on here? return new NRTCachingDirectory(FSDirectory.open(new File(path)), maxMergeSizeMB, maxCachedMB); I was also curious

Re: Strategy for removing an active shard from zookeeper

2014-07-03 Thread Jeff Wartes
To expand on that, the Collections API DELETEREPLICA command is available in Solr >= 4.6, but will not have the ability to wipe the disk until Solr 4.10. Note that whether or not it deletes anything from disk, DELETEREPLICA will remove that replica from your cluster state in ZK, so even in 4.10,

Re: Listening on SolrCloud events

2014-07-03 Thread Jeff Wartes
If you’re using SolrJ, CloudSolrServer exposes the information you need directly, although you’d have to poll it for changes. Specifically, this code path will get you a snapshot of the clusterstate: http://lucene.apache.org/solr/4_5_0/solr-solrj/org/apache/solr/client/solrj

SolrCloud extended warmup support

2014-07-21 Thread Jeff Wartes
I’d like to ensure an extended warmup is done on each SolrCloud node prior to that node serving traffic. I can do certain things prior to starting Solr, such as pump the index dir through /dev/null to pre-warm the filesystem cache, and post-start I can use the ping handler with a health check

Re: SolrCloud extended warmup support

2014-07-21 Thread Jeff Wartes
On 7/21/14, 4:50 PM, Shawn Heisey s...@elyograg.org wrote: On 7/21/2014 5:37 PM, Jeff Wartes wrote: I’d like to ensure an extended warmup is done on each SolrCloud node prior to that node serving traffic. I can do certain things prior to starting Solr, such as pump the index dir through /dev

Re: SolrCloud extended warmup support

2014-07-24 Thread Jeff Wartes
the primary, secondary etc. sorts will fill those caches. Best, Erick On Mon, Jul 21, 2014 at 5:07 PM, Jeff Wartes jwar...@whitepages.com wrote: On 7/21/14, 4:50 PM, Shawn Heisey s...@elyograg.org wrote: On 7/21/2014 5:37 PM, Jeff Wartes wrote: I’d like to ensure an extended warmup is done

Re: SolrCloud extended warmup support

2014-07-25 Thread Jeff Wartes
It’s a command like this just prior to jetty startup: find -L solrhome dir -type f -exec cat {} /dev/null \; On 7/24/14, 2:11 PM, Toke Eskildsen t...@statsbiblioteket.dk wrote: Jeff Wartes [jwar...@whitepages.com] wrote: Well, I’m not sure what to say. I’ve been observing a noticeable
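
The same pre-warming idea (read every index file once and discard the bytes so the OS page cache is populated) can be sketched in Python. This is an illustration under assumptions, not the script from the thread: the function name `warm_file_cache` and the chunk size are hypothetical, and whether a read actually leaves pages cached depends on the OS and available memory.

```python
import os


def warm_file_cache(root, chunk_size=1 << 20):
    """Read every regular file under root (following symlinks) and discard the data.

    Returns the number of bytes read, which is handy for logging.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=True):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip sockets, FIFOs, and broken symlinks
            with open(path, "rb") as fh:
                while True:
                    chunk = fh.read(chunk_size)
                    if not chunk:
                        break
                    total += len(chunk)
    return total
```

It would be called with the Solr home or index directory before the JVM starts, in the same spot as the `find` command.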

Re: SOLR cloud creating multiple copies of the same index

2014-07-25 Thread Jeff Wartes
Looks to me like you are, or were, hitting the replication handler’s backup function: http://wiki.apache.org/solr/SolrReplication#HTTP_API ie, http://master_host:port/solr/replication?command=backup You might not have been doing it explicitly, there’s some support for a backup being triggered

Re: How to restore an index from a backup over HTTP

2014-08-18 Thread Jeff Wartes
I’m able to do cross-solrcloud-cluster index copy using nothing more than careful use of the “fetchindex” replication handler command. I’m using this as a build/deployment tool, so I manually create a collection in two clusters, index into one, test, and then ask the other cluster to fetchindex

Re: Replicating Between Solr Clouds

2014-08-19 Thread Jeff Wartes
I’ve been working on this tool, which wraps the collections API to do more advanced cluster-management operations: https://github.com/whitepages/solrcloud_manager One of the operations I’ve added (copy) is a deployment mechanism that uses the replication handler’s snap puller to hot-load a

Re: How to restore an index from a backup over HTTP

2014-08-20 Thread Jeff Wartes
Message - From: Jeff Wartes jwar...@whitepages.com To: solr-user@lucene.apache.org Sent: Monday, August 18, 2014 9:49:28 PM Subject: Re: How to restore an index from a backup over HTTP I’m able to do cross-solrcloud-cluster index copy using nothing more than careful use of the “fetchindex

Re: Solr API for getting shard's leader/replica status

2014-09-08 Thread Jeff Wartes
I had a similar need. The resulting tool is in Scala, but it still might be useful to look at. I had to work through some of those same issues: https://github.com/whitepages/solrcloud_manager From a clusterstate perspective, I mostly cared about active vs non-active, so here’s a sample output

Re: Solr Sharding Help

2014-09-08 Thread Jeff Wartes
You need to specify a replication factor of 2 if you want two copies of each shard. Solr doesn’t “auto fill” available capacity, contrary to the misleading examples on the http://wiki.apache.org/solr/SolrCloud page. Those examples only have that behavior because they ask you to copy the examples

Re: replica recovery

2015-10-27 Thread Jeff Wartes
On the face of it, your scenario seems plausible. I can offer two pieces of info that may or may not help you: 1. A write request to Solr will not be acknowledged until an attempt has been made to write to all relevant replicas. So, B won’t ever be missing updates that were applied to A, unless

Re: Facet queries blow out the filterCache

2015-10-28 Thread Jeff Wartes
FWIW, since it seemed like there was at least one bug here (and possibly more), I filed https://issues.apache.org/jira/browse/SOLR-8171 On 10/6/15, 3:58 PM, "Jeff Wartes" <jwar...@whitepages.com> wrote: > >I dug far enough yesterday to find the GET_DOCSET, but not f

Re: copy data between collection

2015-10-26 Thread Jeff Wartes
The “copy” command in this tool automatically does what Upayavira describes, including bringing the replicas up to date. (if any) https://github.com/whitepages/solrcloud_manager I’ve been using it as a mechanism for copying a collection into a new cluster (different ZK), but it should work

Re: are there any SolrCloud supervisors?

2015-10-14 Thread Jeff Wartes
I’m aware of two public administration tools: This was announced to the list just recently: https://github.com/bloomreach/solrcloud-haft And I’ve been working on this: https://github.com/whitepages/solrcloud_manager Both of these hook the Solrcloud client’s ZK access to inspect the cluster state

Re: DevOps question : auto deployment/setup of Solr & Zookeeper on medium-large clusters

2015-10-20 Thread Jeff Wartes
If you’re using AWS, there’s this: https://github.com/LucidWorks/solr-scale-tk If you’re using chef, there’s this: https://github.com/vkhatri/chef-solrcloud (There are several other chef cookbooks for Solr out there, but this is the only one I’m aware of that supports Solr 5.3.) For ZK, I’m

Re: Facet queries blow out the filterCache

2015-10-06 Thread Jeff Wartes
I dug far enough yesterday to find the GET_DOCSET, but not far enough to find why. Thanks, a little context is really helpful sometimes. So, starting with an empty filterCache... http://localhost:8983/solr/techproducts/select?q=name:foo=1=true =popularity New values: lookups: 0,

Re: Data Import Handler / Backup indexes

2015-11-17 Thread Jeff Wartes
https://github.com/whitepages/solrcloud_manager supports 5.x, and I added some backup/restore functionality similar to SOLR-5750 in the last release. Like SOLR-5750, this backup strategy requires a shared filesystem, but note that unlike SOLR-5750, I haven’t yet added any backup functionality

Re: Cached fq decreases performance

2015-09-04 Thread Jeff Wartes
On 9/4/15, 7:06 AM, "Yonik Seeley" wrote: > >Lucene seems to always be changing its execution model, so it can be >difficult to keep up. What version of Solr are you using? >Lucene also changed how filters work, so now, a filter is >incorporated with the query like so: >

Re: Cached fq decreases performance

2015-09-03 Thread Jeff Wartes
Tokenizers, Filters, URPs and even a newsletter: >http://www.solr-start.com/ > > >On 3 September 2015 at 16:45, Jeff Wartes <jwar...@whitepages.com> wrote: >> >> I have a query like: >> >> q==enabled:true >> >> For purposes of this conversation

Cached fq decreases performance

2015-09-03 Thread Jeff Wartes
I have a query like: q==enabled:true For purposes of this conversation, "fq=enabled:true" is set for every query, I never open a new searcher, and this is the only fq I ever use, so the filter cache size is 1, and the hit ratio is 1. The fq=enabled:true clause matches about 15% of my

Facet queries blow out the filterCache

2015-10-01 Thread Jeff Wartes
I’m doing some fairly simple facet queries in a two-shard 5.3 SolrCloud index on fields like this:

Re: Facet queries blow out the filterCache

2015-10-01 Thread Jeff Wartes
; wrote: >what if you set f.city.facet.limit=-1 ? > >On Thu, Oct 1, 2015 at 7:43 PM, Jeff Wartes <jwar...@whitepages.com> >wrote: > >> >> I’m doing some fairly simple facet queries in a two-shard 5.3 SolrCloud >> index on fields like this: >> >> > docValue

Re: Facet queries blow out the filterCache

2015-10-01 Thread Jeff Wartes
stributed requests, it explained here >https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-Over-Re >questParameters >eg does it happen if you run with distrib=false? > >On Fri, Oct 2, 2015 at 12:27 AM, Jeff Wartes <jwar...@whitepages.com> >wrote: > &

Re: Facet queries blow out the filterCache

2015-10-02 Thread Jeff Wartes
ert, but not a lookup, so the cache hit ratio is always exactly 1. On 10/2/15, 4:18 AM, "Toke Eskildsen" <t...@statsbiblioteket.dk> wrote: >On Thu, 2015-10-01 at 22:31 +, Jeff Wartes wrote: >> It still inserts if I address the core directly and use distrib=f

Re: Cost of having multiple search handlers?

2015-09-29 Thread Jeff Wartes
ibute it. We’ve been running it in production for a year, >but the config is pretty manual. > >wunder >Walter Underwood >wun...@wunderwood.org >http://observer.wunderwood.org/ (my blog) > > >> On Sep 28, 2015, at 4:41 PM, Jeff Wartes <jwar...@whitepages.com> wrote: >

Re: Cost of having multiple search handlers?

2015-09-28 Thread Jeff Wartes
One would hope that https://issues.apache.org/jira/browse/SOLR-4735 will be done by then. On 9/28/15, 11:39 AM, "Walter Underwood" wrote: >We did the same thing, but reporting performance metrics to Graphite. > >But we won’t be able to add servlet filters in 6.x,

Autowarm and filtercache invalidation

2015-09-24 Thread Jeff Wartes
If I configure my filterCache like this: and I have <= 10 distinct filter queries I ever use, does that mean I’ve effectively disabled cache invalidation? So my cached filter query results will never change? (short of JVM restart) I’m unclear on whether autowarm simply copies the value into

Re: Autowarm and filtercache invalidation

2015-09-24 Thread Jeff Wartes
of whether it was populated via autowarm. On 9/24/15, 11:28 AM, "Jeff Wartes" <jwar...@whitepages.com> wrote: > >If I configure my filterCache like this: >autowarmCount="10"/> > >and I have <= 10 distinct filter queries I ever use, does that mean I’ve

Re: How to know index file in OS Cache

2015-09-25 Thread Jeff Wartes
I’ve been relying on this: https://code.google.com/archive/p/linux-ftools/ fincore will tell you what percentage of a given file is in cache, and fadvise can suggest to the OS that a file be cached. All of the solr start scripts at my company first call fadvise (FADV_WILLNEED) on all the

Re: Solrcloud: 1 server, 1 configset, multiple collections, multiple schemas

2015-12-04 Thread Jeff Wartes
If you want two different collections to have two different schemas, those collections need to reference two different configsets. So you need another copy of your config available using a different name, and to reference that other name when you create the second collection. On 12/4/15, 6:26

Re: Solr 5: Schema.xml vs. Managed Schema - which is advisable?

2015-12-03 Thread Jeff Wartes
I’ve never used the managed schema, so I’m probably biased, but I’ve never seen much of a point to the Schema API. I need to make changes sometimes to solrconfig.xml, in addition to schema.xml and other config files, and there’s no API for those, so my process has been like: 1. Put the entire

Re: How to list all collections in solr-4.7.2

2015-12-03 Thread Jeff Wartes
Looks like LIST was added in 4.8, so I guess you’re stuck looking at ZK, or finding some tool that looks in ZK for you. The zkCli.sh that ships with zookeeper would probably suffice for a one-off manual inspection: https://zookeeper.apache.org/doc/trunk/zookeeperStarted.html#sc_ConnectingT

Re: Fully automated replica creation in AWS

2015-12-09 Thread Jeff Wartes
It’s a pretty common misperception that since solr scales, you can just spin up new nodes and be done. Amazon ElasticSearch and older solrcloud getting-started docs encourage this misperception, as does the HDFS-only autoAddReplicas flag. I agree that auto-scaling should be approached carefully,

Re: Moving to SolrCloud, specifying dataDir correctly

2015-12-14 Thread Jeff Wartes
Don’t set solr.data.dir. Instead, set the install dir. Something like: -Dsolr.solr.home=/data/solr -Dsolr.install.dir=/opt/solr I have many solrcloud collections, and separate data/install dirs, and I’ve never had to do anything with manual per-collection or per-replica data dirs. That said,

Re: SolrCloud: Setting/finding node names for deleting replicas

2016-01-08 Thread Jeff Wartes
be... > >=xxx > >btw, for your app, isn't "slice" old notation? > > > > >On 08/01/16 22:05, Jeff Wartes wrote: >> >> I’m pretty sure you could change the name when you ADDREPLICA using a >> core.name property. I don’t know if you can when you

Re: SolrCloud: Setting/finding node names for deleting replicas

2016-01-08 Thread Jeff Wartes
I’m pretty sure you could change the name when you ADDREPLICA using a core.name property. I don’t know if you can when you initially create the collection though. The CLUSTERSTATUS command will tell you the core names:

Re: How to check when a search exceeds the threshold of timeAllowed parameter

2015-12-23 Thread Jeff Wartes
Looks like it’ll set partialResults=true on your results if you hit the timeout. https://issues.apache.org/jira/browse/SOLR-502 https://issues.apache.org/jira/browse/SOLR-5986 On 12/22/15, 5:43 PM, "Vincenzo D'Amore" wrote: >Well... I can write everything, but
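
A client-side check for that flag could look like the following Python sketch. It assumes the response was requested as JSON and already parsed into a dict, and that the flag sits under `responseHeader` as `partialResults`; the function name `hit_time_allowed` is hypothetical.

```python
def hit_time_allowed(response):
    """Return True if a parsed Solr JSON response is flagged as partial.

    Assumes the flag appears as responseHeader.partialResults; if the flag
    is absent, the query is treated as having completed in time.
    """
    header = response.get("responseHeader", {})
    return bool(header.get("partialResults", False))
```

A caller might log or retry with a larger timeAllowed whenever this returns True, rather than treating the truncated result set as complete.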

Re: replica recovery

2015-11-19 Thread Jeff Wartes
he >limit on each server but it isn't clear to me how high it should be or if >raising the limit will cause new problems. > >Any advice you could provide in this situation would be awesome! > >Cheers, >Brian > > > >> On Oct 27, 2015, at 20:50, Jeff Wartes <jwar

Re: Data Import Handler / Backup indexes

2015-11-23 Thread Jeff Wartes
dentally and the DIH cannot be run >because the database is unavailable. > >Our collection is simple: 2 nodes - 1 collection - 2 shards with 2 >replicas >each > >So a simple copy (cp command) for both the nodes/shards might work for us? >How do I restore the data back?

Re: Solr off-heap FieldCache & HelioSearch

2016-06-03 Thread Jeff Wartes
For what it’s worth, I’d suggest you go into a conversation with Azul with a more explicit “I’m looking to buy” approach. I reached out to them with a more “I’m exploring my options” attitude, and never even got a trial. I get the impression their business model involves a fairly expensive (to

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-31 Thread Jeff Wartes
t; > >> https://github.com/LucidWorks/auto-phrase-tokenfilter >> > > > >> > >> > > > >> > Is there anything else out there that you would recommend I look >> > at? >> > > > >> > >> > > > >>

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-26 Thread Jeff Wartes
Oh, interesting. I’ve certainly encountered issues with multi-word synonyms, but I hadn’t come across this. If you end up using it with a recent solr version, I’d be glad to hear your experience. I haven’t used it, but I am aware of one other project in this vein that you might be interested

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-01 Thread Jeff Wartes
r on the linux command line I get: > >/opt/solr-5.4.0/server/solr-webapp/webapp/WEB-INF/lib/hon-lucene-synonyms-2.0.0.jar > >But the log file is still carrying class not found exceptions when I >restart... > >Are you in "Cloud" mode? What version of Solr are you using?

Re: Multiple calls across the distributed nodes for a query

2016-06-15 Thread Jeff Wartes
Any distributed query falls into the two-phase process. Actually, I think some components may require a third phase. (faceting?) However, there are also cases where only a single pass is required. A fl=id,score will only be a single pass, for example, since it doesn’t need to get the field

Re: Long STW GCs with Solr Cloud

2016-06-16 Thread Jeff Wartes
Check your gc log for CMS “concurrent mode failure” messages. If a concurrent CMS collection fails, it does a stop-the-world pause while it cleans up using a *single thread*. This means the stop-the-world CMS collection in the failure case is typically several times slower than a concurrent

Re: Long STW GCs with Solr Cloud

2016-06-17 Thread Jeff Wartes
to promotion failures. I suspect there's a lot of garbage building up. >We're going to run tests with field collapsing disabled and see if that >makes a difference. > >Cas > > >On Thu, Jun 16, 2016 at 1:08 PM, Jeff Wartes <jwar...@whitepages.com> wrote: > >> Check y

Re: SolrCloud: Adding a very large collection to a pre-existing cluster

2016-06-21 Thread Jeff Wartes
There’s no official way of doing #1, but there are some less official ways: 1. The Backup/Restore API provides some hooks into loading pre-existing data dirs into an existing collection. Lots of caveats. 2. If you don’t have many shards, there’s always rsync/reload. 3. There are some third-party

Re: collection aliasing

2016-01-28 Thread Jeff Wartes
I enjoy using collection aliases in all client references, because that allows me to change the collection all clients use without updating the clients. I just move the alias. This is particularly useful if I’m doing a full index rebuild and want an atomic, zero-downtime switchover. On

Re: SolrCloud replicas out of sync

2016-01-27 Thread Jeff Wartes
On 1/27/16, 8:28 AM, "Shawn Heisey" wrote: > >I don't think any documentation states this, but it seems like a good >idea to me use an alias from day one, so that you always have the option >of swapping the "real" collection that you are using without needing to >change

Re: SolrCloud replicas out of sync

2016-01-27 Thread Jeff Wartes
If you can identify the problem documents, you can just re-index those after forcing a sync. Might save a full rebuild and downtime. You might describe your cluster setup, including ZK. It sounds like you’ve done your research, but improper ZK node distribution could certainly invalidate some

Re: Shard allocation across nodes

2016-02-01 Thread Jeff Wartes
You could write your own snitch: https://cwiki.apache.org/confluence/display/solr/Rule-based+Replica+Placement Or, it would be more annoying, but you can always add/remove replicas manually and juggle things yourself after you create the initial collection. On 2/1/16, 8:42 AM, "Tom Evans"

Re: Restoring backups of solrcores

2016-02-01 Thread Jeff Wartes
Aliases work when indexing too. Create collection: collection1 Create alias: this_week -> collection1 Index to: this_week Next week... Create collection: collection2 Create (Move) alias: this_week -> collection2 Index to: this_week On 2/1/16, 2:14 AM, "vidya" wrote:
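
The weekly rollover above can be sketched as two Collections API calls. The Python below only builds the request URLs and sends nothing; the base URL, the helper names `collections_api_url` and `weekly_rollover_urls`, and the single-shard default are assumptions for illustration. It also assumes that issuing CREATEALIAS for an existing alias name re-points it, and a real CREATE call may need further parameters such as a config name.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, action, **params):
    """Build a Collections API URL for the given action and parameters."""
    query = urlencode({"action": action, **params})
    return f"{base_url}/admin/collections?{query}"


def weekly_rollover_urls(base_url, alias, new_collection, num_shards=1):
    """URLs for: create next week's collection, then move the alias to it."""
    return [
        collections_api_url(base_url, "CREATE",
                            name=new_collection, numShards=num_shards),
        collections_api_url(base_url, "CREATEALIAS",
                            name=alias, collections=new_collection),
    ]
```

Indexing clients keep writing to the alias name throughout, so only the alias moves.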

Re: very slow frequent updates

2016-02-24 Thread Jeff Wartes
;> of >> > SOLR as the field which is the basis of the sort is not included in the >> > schema for example the price. The customer wants the list in descending >> > order of the price. >> > >> > So I have to get all the 1000 docids from solr an

Re: very slow frequent updates

2016-02-23 Thread Jeff Wartes
My suggestion would be to split your problem domain. Use Solr exclusively for search - index the id and only those fields you need to search on. Then use some other data store for retrieval. Get the id’s from the solr results, and look them up in the data store to get the rest of your fields.
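
A minimal Python sketch of that split, with an in-memory dict standing in for the external data store (an assumption for illustration; the function name `hydrate` is hypothetical):

```python
def hydrate(solr_ids, store):
    """Look up full records for the ids Solr returned, keeping Solr's rank order.

    store is any mapping from document id to the full record; ids that are
    missing from the store are skipped.
    """
    return [store[doc_id] for doc_id in solr_ids if doc_id in store]
```

Solr then only has to index the id plus the searchable fields, while frequently changing fields such as a price live in the store and never trigger a reindex.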

Re: Shard State vs Replica State

2016-02-26 Thread Jeff Wartes
I believe the shard state is a reflection of whether that shard is still in use by the collection, and has nothing to do with the state of the replicas. I think doing a split-shard operation would create two new shards, and mark the old one as inactive, for example. On 2/26/16, 8:50 AM,

Re: SolrCloud replicas out of sync

2016-01-26 Thread Jeff Wartes
My understanding is that the "version" represents the timestamp the searcher was opened, so it doesn’t really offer any assurances about your data. Although you could probably bounce a node and get your document counts back in sync (by provoking a check), it’s interesting that you’re in this

Re: SolrCloud replicas out of sync

2016-01-26 Thread Jeff Wartes
t; >>> >>> You might watch the achieved replication factor of your updates and see if >>> it ever changes >>> > >This is a good tip. I’m not sure I like the implication that any failure to >write all 3 of our replicas must be retried at the app layer. Is t

Re: Adding nodes

2016-02-17 Thread Jeff Wartes
Solrcloud does not come with any autoscaling functionality. If you want such a thing, you’ll need to write it yourself. https://github.com/whitepages/solrcloud_manager might be a useful head start though, particularly the “fill” and “cleancollection” commands. I don’t do *auto* scaling, but I

Re: SolrCloud - Strategy for recovering cluster states

2016-03-01 Thread Jeff Wartes
I’ve been running SolrCloud clusters in various versions for a few years here, and I can only think of two or three cases that the ZK-stored cluster state was broken in a way that I had to manually intervene by hand-editing the contents of ZK. I think I’ve seen Solr fixes go by for those

Re: SolrCloud backup/restore

2016-04-05 Thread Jeff Wartes
There is some automation around this process in the backup commands here: https://github.com/whitepages/solrcloud_manager It’s been tested with 5.4, and will restore arbitrary replication factors. Ever assuming the shared filesystem for backups, of course. On 4/5/16, 3:18 AM, "Reth RM"

Re: SolrCloud no leader for collection

2016-04-05 Thread Jeff Wartes
I recall I had some luck fixing a leader-less shard (after a ZK quorum failure) by forcibly removing the records for the down-state replicas from the leader election list, and then forcing an election. The ZK path looks like collections//leader_elect/shardX/election. Usually you’ll find the

Re: SolrCloud - Strategy for recovering cluster states

2016-03-02 Thread Jeff Wartes
n zookeeper? > > > >Your tool is very interesting, I just thought about writing such a tool >myself. >From the sources I understand that you represent each node as a path in the >git repository. >So, I guess that for restore purposes I will have to do >the opposite direction a

Re: Separating cores from Solr home

2016-03-03 Thread Jeff Wartes
It’s a bit backwards feeling, but I’ve had luck setting the install dir and solr home, instead of the data dir. Something like: -Dsolr.solr.home=/data/solr -Dsolr.install.dir=/opt/solr So all of the Solr files are in /opt/solr and all of the index/core related files end up in /data/solr.

Re: XX:ParGCCardsPerStrideChunk

2016-03-03 Thread Jeff Wartes
I've experimented with that a bit, and Shawn added my comments in IRC to his Solr/GC page here: https://wiki.apache.org/solr/ShawnHeisey The relevant bit: "With values of 4096 and 32768, the IRC user was able to achieve 15% and 19% reductions in average pause time, respectively, with the

Re: Replicas for same shard not in sync

2016-04-27 Thread Jeff Wartes
some retry logic in the code that distributes the updates from >the leader as well. > >Best, >Erick > >On Tue, Apr 26, 2016 at 12:51 PM, Jeff Wartes <jwar...@whitepages.com> wrote: >> >> At the risk of thread hijacking, this is an area where I don’t know I full

Re: Solr 5.2.1 on Java 8 GC

2016-04-28 Thread Jeff Wartes
Shawn Heisey’s page is the usual reference guide for GC settings: https://wiki.apache.org/solr/ShawnHeisey Most of the learnings from that are in the Solr 5.x startup scripts already, but your heap is bigger, so your mileage may vary. Some tools I’ve used while doing GC tuning: * VisualVM -
