Re: Indexing performance 7.3 vs 8.7

2020-12-23 Thread Bram Van Dam
On 23/12/2020 16:00, Ron Buchanan wrote: > - both run Java 1.8, but 7.3 is running HotSpot and 8.7 is running > OpenJDK (and a bit newer) If you're using G1GC, you probably want to give Java 11 a go. It's an easy thing to test, and it's had a positive impact for us. Your mileage may

Re: Reindexing major upgrades

2020-10-06 Thread Bram Van Dam
On 05/10/2020 16:02, Rafael Sousa wrote: > Having things reindexed from scratch is not > an option, so, is there a way of creating a 8.6.2 index from a pre-existing > 6.5 index or something like that? Sadly there is no such way. If all your fields are stored you might be able to whip up something

Re: ApacheCon at Home 2020 starts tomorrow!

2020-09-30 Thread Bram Van Dam
On 30/09/2020 05:14, Rahul Goswami wrote: > Thanks for sharing this Anshum. Day 1 had some really interesting sessions. > Missed out on a couple that I would have liked to listen to. Are the > recordings of these sessions available anywhere? The ASF will be uploading the recordings of all

Re: Many small instances, or few large instances?

2020-09-22 Thread Bram Van Dam
ions > that minimize heap requirements. And Lucene has done a lot to move memory to > the OS rather than heap (e.g. docValues, MMapDirectory etc.). > > Anyway, carry on as before for the nonce. > > Best, > Erick > >> On Sep 21, 2020, at 6:06 AM, Bram Van Dam wrote:

Many small instances, or few large instances?

2020-09-21 Thread Bram Van Dam
Hey folks, I've always heard that it's preferred to have a SolrCloud setup with many smaller instances under the CompressedOops limit in terms of memory, instead of having larger instances with, say, 256GB worth of heap space. Does this recommendation still hold true with newer garbage
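To check which side of the CompressedOops limit a given JVM is on (heaps below roughly 32 GB), you can ask the JVM itself. The sketch below assumes a HotSpot-based JVM (Oracle or OpenJDK), which exposes the `UseCompressedOops` flag through `HotSpotDiagnosticMXBean`. Other VMs may not provide that bean. The class name is made up for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

/** Reports whether the running HotSpot-based JVM uses compressed oops (hypothetical helper). */
final class CompressedOopsCheck {
    static boolean usingCompressedOops() {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // VMOption.getValue() returns the flag value as a string, "true" or "false".
        return Boolean.parseBoolean(bean.getVMOption("UseCompressedOops").getValue());
    }

    /** Effective maximum heap (roughly -Xmx), for comparison with the ~32 GB threshold. */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }
}
```

Running this on an instance with, say, 256GB of heap should report `false`, because the JVM falls back to uncompressed 64-bit references above the threshold.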

Re: "timeAllowed" param with "numFound" having a count value but doc list is empty

2020-09-16 Thread Bram Van Dam
There are a couple of open issues related to the timeAllowed parameter. For instance it currently doesn't work in conjunction with the cursorMark parameter [1]. And on Solr 7 it doesn't work at all [2]. But other than that, when users have a lot of query flexibility, it's a pretty good idea to

Re: Backups in SolrCloud using snapshots of individual cores?

2020-08-11 Thread Bram Van Dam
On 11/08/2020 13:15, Erick Erickson wrote: > CDCR is being deprecated. so I wouldn’t suggest it for the long term. Ah yes, thanks for pointing that out. That makes Dominique's alternative less attractive. I guess I'll stick to my original proposal! Thanks Erick :-) - Bram

Backups in SolrCloud using snapshots of individual cores?

2020-08-06 Thread Bram Van Dam
Hey folks, Been reading up about the various ways of creating backups. The whole "shared filesystem for Solrcloud backups"-thing is kind of a no-go in our environment, so I've been looking for ways around that, and here's what I've come up with so far: 1. Stop applications from writing to solr

Re: Solr Float/Double multivalues fields

2020-07-03 Thread Bram Van Dam
On 03/07/2020 09:50, Thomas Corthals wrote: > I think this should go in the ref guide. If your product depends on this > behaviour, you want reassurance that it isn't going to change in the next > release. Not everyone will go looking through the javadoc to see if this is > implied. This is in

Re: [EXTERNAL] Getting rid of Master/Slave nomenclature in Solr

2020-06-29 Thread Bram Van Dam
On 28/06/2020 14:42, Erick Erickson wrote: > We need to draw a sharp distinction between standalone “going away” > in terms of our internal code and going away in terms of the user > experience. It'll be hard to make it completely transparent in terms of user experience. For instance, there is

Re: Autocommit in SolrCloud with many shards

2020-06-18 Thread Bram Van Dam
; >> On Jun 17, 2020, at 5:00 PM, Bram Van Dam wrote: >> >> Thanks for pointing that out. I'm attaching a patch for the ref-guide >> which summarizes what you said. Maybe other people will find this useful >> as well? >> >> Oh and Erick, thanks for your ev

Re: Autocommit in SolrCloud with many shards

2020-06-17 Thread Bram Van Dam
follower won’t until 60 seconds later. > > Best, > Erick > >> On Jun 17, 2020, at 5:36 AM, Bram Van Dam wrote: >> >> 'morning :-) >> >> I'm wondering how autocommits work in Solr. >> >> Say I have a cluster with many nodes and many collections

Autocommit in SolrCloud with many shards

2020-06-17 Thread Bram Van Dam
'morning :-) I'm wondering how autocommits work in Solr. Say I have a cluster with many nodes and many collections with many shards. If each collection's config has a hard autocommit configured every minute, does that mean that SolrCloud (presumably the leader?) will dish out commit requests to

Re: Solr Deletes

2020-05-26 Thread Bram Van Dam
On 26/05/2020 14:07, Erick Erickson wrote: > So best practice is to go ahead and use delete-by-id. I've noticed that this can cause issues when using implicit routing, at least on 7.x. Though I can't quite remember whether the issue was a performance issue, or whether documents would sometimes

Re: +(-...) vs +(*:* -...) vs -(+...)

2020-05-22 Thread Bram Van Dam
Additional reading: https://lucidworks.com/post/why-not-and-or-and-not/ Assuming implicit AND, we perform the following rewrite on strictly negative queries: -f:a -> -f:a *:* Isn't search fun? :-) - Bram On 21/05/2020 20:51, Houston Putman wrote: > Jochen, > > For the standard query
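The rewrite rule above (`-f:a -> -f:a *:*`) can be sketched in Java. This is a minimal, hypothetical helper, not Solr or Lucene code. It only handles flat, whitespace-separated clauses under implicit AND, and it ignores quoted phrases, parentheses and explicit operators.

```java
import java.util.Arrays;

/** Appends a match-all clause when every top-level clause is negative (hypothetical helper). */
final class NegativeQueryRewriter {
    static String rewrite(String query) {
        String trimmed = query.trim();
        if (trimmed.isEmpty()) {
            return trimmed;
        }
        // Naive whitespace split: assumes no phrases, parentheses or AND/OR keywords.
        String[] clauses = trimmed.split("\\s+");
        boolean allNegative = Arrays.stream(clauses).allMatch(c -> c.startsWith("-"));
        // A purely negative query matches nothing on its own, so give it something to subtract from.
        return allNegative ? trimmed + " *:*" : trimmed;
    }
}
```

For example, `rewrite("-f:a")` yields `-f:a *:*`, while `rewrite("f:a -g:b")` is left unchanged because it already has a positive clause.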

Re: ZooKeeper 3.4 end of life

2020-04-15 Thread Bram Van Dam
On 09/04/2020 16:03, Bram Van Dam wrote: > Thanks, Erick. I'll give it a go this weekend and see how it behaves. > I'll report back so there's a record of my attempts in case anyone else > ends up asking the same question. Here's a quick update after non-exhaustive testing: Running

Re: ZooKeeper 3.4 end of life

2020-04-09 Thread Bram Van Dam
Thanks, Erick. I'll give it a go this weekend and see how it behaves. I'll report back so there's a record of my attempts in case anyone else ends up asking the same question. Some of our customers get a bit nervous when software goes out of support, even if it works fine, so I try to be prepared

ZooKeeper 3.4 end of life

2020-04-09 Thread Bram Van Dam
Hey folks, The ZK team just announced that they're dropping 3.4 support as of the 1st of June, 2020. What does this mean for those of us still on Solr < 8.2? From what I can tell, ZooKeeper 3.5+ does not work with older Solr versions. Has anyone managed to get a 3.5+ to work with Solr 7 at all?

Re: Modify ZK ensemble string in a running SolrCloud?

2020-03-23 Thread Bram Van Dam
On 23/03/2020 14:17, Erick Erickson wrote: > As of Solr 8.2, Solr is distributed with ZooKeeper 3.5.5 (will be 3.5.7 in > Solr 8.6), which allows “dynamic reconfiguration”. If you’re running an > earlier version of Zookeeper, then no you’ll have to restart to change ZK > nodes. Thanks Erick,

Modify ZK ensemble string in a running SolrCloud?

2020-03-23 Thread Bram Van Dam
Is it possible to change the ZK ensemble without restarting the entire SolrCloud? Specifically adding or removing a ZK instance from the ensemble. I'm assuming the answer is no, as far as I can tell the only place where this is configured is the zkHost parameter, which is passed to Solr as a JVM

Devoxx Antwerp

2019-11-05 Thread Bram Van Dam
I don't suppose any Solr users/devs will be attending Devoxx in Antwerp this week? If any of you are, it might be nice to have a chat to exchange some experiences. If not, I'll take that as a sign not to leave it quite so late next year .. ahem. - Bram

Re: Query number of Lucene documents using Solr?

2019-08-27 Thread Bram Van Dam
On 26/08/2019 23:12, Shawn Heisey wrote: > The numbers shown in Solr's LukeRequestHandler come directly from > Lucene.  This is the URL endpoint it will normally be at, for core XXX: > > http://host:port/solr/XXX/admin/luke Thanks Shawn, that's a great entry point! > The specific error you

Query number of Lucene documents using Solr?

2019-08-26 Thread Bram Van Dam
Possibly somewhat unusual question: I'm looking for a way to query the number of *lucene documents* from within Solr. This can be different from the number of Solr documents (because of unmerged deletes/updates/ etc). As a bit of background: we recently found this lovely little error message in a
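The Luke handler reports the relevant numbers as `numDocs` (live documents) and `maxDoc` (live plus deleted-but-not-yet-merged documents). Here is a small sketch, assuming you have already read those two values from the Luke response. The class and method names are hypothetical, and the limit constant uses the 2^31-1 figure discussed on this list.

```java
/** Lucene-level counts as reported by the Luke request handler (hypothetical holder). */
final class LuceneDocCounts {
    // Per-index limit as discussed on the list (2^31 - 1); recent Lucene
    // versions enforce a slightly lower IndexWriter.MAX_DOCS.
    static final long DOC_LIMIT = Integer.MAX_VALUE;

    final long numDocs; // live documents
    final long maxDoc;  // live documents plus deleted-but-unmerged ones

    LuceneDocCounts(long numDocs, long maxDoc) {
        this.numDocs = numDocs;
        this.maxDoc = maxDoc;
    }

    long deletedDocs() {
        return maxDoc - numDocs;
    }

    /** Remaining room before the hard limit; deletes count until they are merged away. */
    long headroom() {
        return DOC_LIMIT - maxDoc;
    }
}
```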

Re: Incorrect shard placement during Collection creation in 7.6

2019-02-14 Thread Bram Van Dam
Thanks Erick, I just created SOLR-13247 and linked it to SOLR-12944. - Bram On 13/02/2019 18:31, Erick Erickson wrote: > I haven't verified, but this looks like a JIRA to me. Looks like some > of the create logic may have issues, see: SOLR-12944 and maybe link to > that JIRA?

Re: Incorrect shard placement during Collection creation in 7.6

2019-02-13 Thread Bram Van Dam
> TL;DR; createNodeSet & shards combination is not being respected. Update: Upgraded to 7.7, no improvement sadly.

Incorrect shard placement during Collection creation in 7.6

2019-02-13 Thread Bram Van Dam
Hey folks, TL;DR; createNodeSet & shards combination is not being respected. I'm attempting to create a collection with multiple shards, but apparently the value of createNodeSet is not being respected and shards are being assigned to nodes seemingly at random. createNodeSet.shuffle is set to

CDCR "all" collections

2019-01-24 Thread Bram Van Dam
Hey folks, Is there any way to set up CDCR for *all* collections, including any newly created ones? Having to modify the solrconfig in ZK every time a collection is added is a bit of a pain, especially because I'm assuming it requires a restart to activate the config? Basically if I have DC Src

Re: "no servers hosting shard" when querying during shard creation

2019-01-15 Thread Bram Van Dam
On 13/01/2019 19:43, Erick Erickson wrote: > Yeah, that seems wrong, I'd say open a JIRA. I've created a bug in Jira: SOLR-13136. Should I assign this to anyone? Unsure what the procedure is there. Incidentally, while doing so I noticed that 7.6 is still "unreleased" according to Jira. Thanks,

Re: "no servers hosting shard" when querying during shard creation

2019-01-13 Thread Bram Van Dam
On 13/01/2019 14:28, Bram Van Dam wrote: > If a query is launched during the shard creation, I get a > SolrServerException from SolrJ: Error from server at foo: no servers > hosting shard: bar I should probably add that I'm running 7.6.0.

"no servers hosting shard" when querying during shard creation

2019-01-13 Thread Bram Van Dam
Hey folks, I'm getting SolrServerExceptions and I'm not sure whether this is by design or whether this is a concurrency bug of some sort. Basically I've got a pretty active collection which is being queried all the time. Periodically, new shards are created (using the Collection Admin API's

Re: “solr.data.dir” can only config a single directory

2018-08-29 Thread Bram Van Dam
On 28/08/18 08:03, zhenyuan wei wrote: > But this is not a common way to do so, I mean, nobody want to ADDREPLICA > after collection was created. I wouldn't say "nobody"..

Odd GC values in solr.in.sh on 7.2.1

2018-02-21 Thread Bram Van Dam
Hey folks, solr.in.sh appears to contain broken GC suggestions: # These GC settings have shown to work well for a number of common Solr workloads #GC_TUNE="-XX:NewRatio=3 -XX:SurvivorRatio=4etc. The "etc." part is copied verbatim from the file. It looks like the original GC_TUNE settings

7.0 upgrade: Trie* -> Point* migration

2017-09-26 Thread Bram Van Dam
Hey folks, We're preparing for an upgrade to 7.0, but I'm a bit worried about the deprecation of Trie* fields. Is there any way to upgrade an existing index to use Point* fields without having to reindex all documents? Does the IndexUpgrader take care of this? Thanks, - Bram

Re: Strange boolean query behaviour on 5.5.4

2017-07-05 Thread Bram Van Dam
On 04/07/17 18:10, Erick Erickson wrote: > I think you'll get what you expect by something like: > (*:* -someField:Foo) AND (otherField: (Bar OR Baz)) Yeah that's what I figured. It's not a big deal since we generate Solr syntax using a parser/generator on top of our own query syntax. Still a

Strange boolean query behaviour on 5.5.4

2017-07-04 Thread Bram Van Dam
Hey folks, I'm experiencing some strange query behaviour, and it isn't immediately clear to me why this would happen. The definition of the query syntax on the wiki is a bit fuzzy, so my interpretation of the syntax might be off. This query doesn't work (no results, when results are expected).

Re: solr 6 at scale

2017-05-25 Thread Bram Van Dam
>>> It is relatively easy to downgrade to an earlier release within the >>> same major version. We have not switched to 6.5.1 simply because we >>> have no pressing need for it - Solr 6.3 works well for us. > >> That strikes me as a little bit dangerous, unless your indexes are very >> static.

Re: Solr - example for using percentiles

2017-02-22 Thread Bram Van Dam
On 17/02/17 13:39, John Blythe wrote: > Using the stats component makes short work of things. > > stats.true=foo The stats component has been rendered obsolete by the newer and shinier json.facet stuff. - Bram

Re: Solr - example for using percentiles

2017-02-17 Thread Bram Van Dam
On 15/01/17 15:26, Vidal, Gilad wrote: > Hi, > Can you direct me for Java example using Solr percentiles? > The following 3 examples are not seems to be working. Not sure if this is still relevant, but I use the json.facet parameter with SolrJ: query.add("json.facet",
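A percentile request with `json.facet` might look like the sketch below. `percentile(field, p1, p2, ...)` is one of the JSON Facet API's aggregation functions. The helper class, the stat name `p` and the field `price` are hypothetical.

```java
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/** Builds a json.facet value requesting percentiles of one numeric field (hypothetical helper). */
final class PercentileFacet {
    static String build(String statName, String field, int... percentiles) {
        String list = IntStream.of(percentiles)
                .mapToObj(p -> Integer.toString(p))
                .collect(Collectors.joining(","));
        // e.g. {"p":"percentile(price,50,90,99)"}
        return "{\"" + statName + "\":\"percentile(" + field + "," + list + ")\"}";
    }
}
```

As in the message, the result is passed to SolrJ with `query.add("json.facet", PercentileFacet.build("p", "price", 50, 90, 99));`.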

Re: Atomic updates to increase single field bulk updates?

2017-02-17 Thread Bram Van Dam
> I am aware of the requirements to use atomic updates, but as I understood, > those would not have a big impact on performance and only a slight increase > in index size? AFAIK there won't be a difference in index size between atomic updates and full updates, as the end result is the same.
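For comparison, an atomic update only sends the changed field. Solr then generally reconstructs the rest of the document from its stored fields and reindexes the whole thing, which is why the end result matches a full update. Below is a minimal sketch of the JSON body for a single `set` operation. It assumes `id` is the uniqueKey, uses hypothetical field names, and does no JSON escaping of its inputs.

```java
/** Builds a JSON atomic-update body that sets one field (hypothetical helper). */
final class AtomicUpdate {
    static String setField(String id, String field, String value) {
        // Only the "set" modifier is shown; inputs must not contain quotes or backslashes.
        return "[{\"id\":\"" + id + "\",\"" + field + "\":{\"set\":\"" + value + "\"}}]";
    }
}
```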

Re: Solr 6 Performance Suggestions

2016-11-23 Thread Bram Van Dam
On 22/11/16 15:34, Prateek Jain J wrote: > I am not sure but I heard this in one of discussions, that you cant migrate > directly from solr 4 to solr 6. It has to be incremental like solr 4 to solr > 5 and then to solr 6. I might be wrong but is worth trying. Ideally the index needs to be

Re: 5.5.3: fieldValueCache auto-warming error

2016-11-15 Thread Bram Van Dam
On 11/11/16 18:08, Bram Van Dam wrote: > On 10/11/16 17:10, Erick Erickson wrote: >> Just facet on the text field yourself ;) Quick update: you were right. One of the users managed to find a bug in our application which enabled them to facet on the text field. It would be stil

Re: 5.5.3: fieldValueCache auto-warming error

2016-11-11 Thread Bram Van Dam
On 10/11/16 17:10, Erick Erickson wrote: > Just facet on the text field yourself ;) Wish I could, this is on premise over at a client, access is difficult and their response time is pretty bad on public holidays and weekends. So I'm basically twiddling my thumbs while waiting to get more log

Re: 5.5.3: fieldValueCache auto-warming error

2016-11-10 Thread Bram Van Dam
On 09/11/16 16:59, Erick Erickson wrote: > But my bet is that you _are_ doing something that uninverts the text > field (obviously inadvertently). If you restart Solr and monitor the > log until the first time you see this exception, what do the queries > show? My guess is that once you get some

5.5.3: fieldValueCache auto-warming error

2016-11-09 Thread Bram Van Dam
Hey folks, I'm frequently getting the following error, which has me a little puzzled: Error during auto-warming of key:text:org.apache.solr.common.SolrException: java.lang.IllegalStateException: Too many values for UnInvertedField faceting on field text This is strange because the field in

Re: SolrJ & Ridiculously Large Queries

2016-10-18 Thread Bram Van Dam
On 14/10/16 16:13, Shawn Heisey wrote: > name="solr.jetty.request.header.size" default="8192" /> A belated thanks, Shawn! 32k should be sufficient, I hope. - Bram

SolrJ & Ridiculously Large Queries

2016-10-14 Thread Bram Van Dam
Hey folks, I just noticed that Jetty barfs with HTTP 414 when request URIs are very large, which makes sense. I think the default limit is ~8k. Unfortunately I've got users who insist on executing queries that are 16k (!1!?!?) in size. Two questions: 1) is it possible to POST these oversized
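You can estimate before sending whether a query will fit in a GET request line. The sketch assumes the 8192-byte default of the `solr.jetty.request.header.size` property mentioned in the reply above. It only approximates the request line (other headers share the same buffer). The names are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Estimates whether a query is too long for a GET request line (hypothetical helper). */
final class QueryMethodChooser {
    // Default value shown for solr.jetty.request.header.size.
    static final int DEFAULT_HEADER_LIMIT = 8192;

    static boolean needsPost(String path, String q, int headerLimit) {
        String encoded = URLEncoder.encode(q, StandardCharsets.UTF_8);
        // Approximate request line: "GET <path>?q=<encoded> HTTP/1.1".
        // Other headers share the buffer, so this is a lower bound on the real size.
        int requestLine = "GET ".length() + path.length() + "?q=".length()
                + encoded.length() + " HTTP/1.1".length();
        return requestLine > headerLimit;
    }
}
```

When it returns true, SolrJ can send the parameters in the request body instead, e.g. `client.query(params, SolrRequest.METHOD.POST)`.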

Re: json.facet without a facet ...

2016-09-27 Thread Bram Van Dam
On 26/09/16 17:06, Yonik Seeley wrote: > Statistics are now fully integrated into faceting. Since we start off > with a single facet bucket with a domain defined by the main query and > filters, we can even ask for statistics for this top level bucket, > before breaking up into further buckets via
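In practice, a stats-only request is a `json.facet` object that contains aggregations and no bucketing facets. The statistics are then computed over the single top-level bucket. The helper below is hypothetical. `min`, `max`, `avg` and `sum` are standard JSON Facet aggregation functions.

```java
import java.util.StringJoiner;

/** Builds a json.facet value with only top-level aggregations (hypothetical helper). */
final class TopLevelStats {
    private static final String[] FUNCTIONS = {"min", "max", "avg", "sum"};

    static String forField(String field) {
        StringJoiner json = new StringJoiner(",", "{", "}");
        for (String fn : FUNCTIONS) {
            // Each entry is name:"fn(field)", computed over the whole result set.
            json.add("\"" + fn + "\":\"" + fn + "(" + field + ")\"");
        }
        return json.toString();
    }
}
```

This roughly replaces `stats=true&stats.field=foo` with `query.add("json.facet", TopLevelStats.forField("foo"))`. The values come back next to `count` in the top-level `facets` section.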

json.facet without a facet ...

2016-09-26 Thread Bram Van Dam
Howdy, I realize that this might be a strange question, so please bear with me here. I've been replacing my usage of the old Stats Component (stats=true, stats.field=foo, [stats.facet=bar]) with the new json.facet sugar. This has been a great improvement on all fronts. However, with the stats

Re: JSON Facet API

2016-09-21 Thread Bram Van Dam
On 21/09/16 05:40, Sandeep Khanzode wrote: > How can I specify JSON Facets in SolrJ? The below facet query for example ... SolrQuery query = new SolrQuery(); query.add("json.facet", jsonStringGoesHere); - Bram

Re: Miserable Experience Using Solr. Again.

2016-09-17 Thread Bram Van Dam
> I would like to see a future where the admin UI is more than just an > addon ... but even then, I think the HTTP API will *still* be the most > important piece of the system. In 4 years of heavily using (many instances and many versions of) Solr, the only times when I've used the admin UI has

Re: Miserable Experience Using Solr. Again.

2016-09-13 Thread Bram Van Dam
I'm sorry you're having a "miserable" experience "again". That's certainly not my experience with Solr. That being said: > First I was getting errors about "Unsupported major.minor version 52.0", so I > needed to install the Linux x64 JRE 1.8.0, which I managed on CentOS 6 with... > yum install

Re: Monitoring Apache Solr

2016-09-12 Thread Bram Van Dam
> I try to monitor apache solr, because solr often over heap and status > collection solr be "down". How to monitor apache solr ?? > is there any tools for monitoring solr or how ?? The easiest way is to use the Solr ping feature: https://cwiki.apache.org/confluence/display/solr/Ping It will
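A monitoring check built on the ping handler can be quite small. The sketch below assumes Java 11's `java.net.http.HttpClient`, a core-level `/admin/ping` endpoint, and a JSON response containing `"status":"OK"`. The class name is hypothetical. A real monitor should also alert on timeouts and non-200 responses.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Minimal health check against a core's ping handler (hypothetical helper). */
final class SolrPing {
    /** Naive check of a wt=json ping response body. */
    static boolean isOk(String body) {
        return body.replaceAll("\\s", "").contains("\"status\":\"OK\"");
    }

    /** e.g. ping("http://localhost:8983/solr", "collection1") */
    static boolean ping(String baseUrl, String core) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(5)).build();
        HttpRequest request = HttpRequest.newBuilder(
                        URI.create(baseUrl + "/" + core + "/admin/ping?wt=json"))
                .timeout(Duration.ofSeconds(10))
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode() == 200 && isOk(response.body());
    }
}
```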

SolrJ & json.facet?

2016-05-25 Thread Bram Van Dam
Hey folks, Is there any way to use the "new" json.facet features (or some Java equivalent) using SolrJ? I've had a quick look at the source code, but nothing really jumps out at me. Thanks, - Bram

Re: Solr 5.2.1 on Java 8 GC

2016-05-01 Thread Bram Van Dam
On 30/04/16 17:34, Davis, Daniel (NIH/NLM) [C] wrote: > Bram, on the subject of brute force - if your script is "clever" and uses > binary first search, I'd love to adapt it to my environment. I am trying to > build a truly multi-tenant Solr because each of our indexes is tiny, but all >

Re: Tuning solr for large index with rapid writes

2016-04-30 Thread Bram Van Dam
> If I'm reading this right, you have 420M docs on a single shard? > Yep, you were reading it right. As Erick mentioned, it's hard to give concrete sizing advice, but we've found 120M to be the magic number. When a shard contains more than 120M documents, performance goes down rapidly & GC
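As back-of-the-envelope arithmetic, here is the 120M figure above applied as an empirical rule of thumb (not a Solr limit). The names are hypothetical.

```java
/** Rough shard count from a per-shard document ceiling (hypothetical helper). */
final class ShardSizing {
    static long shardsNeeded(long totalDocs, long maxDocsPerShard) {
        // Ceiling division in integer arithmetic.
        return (totalDocs + maxDocsPerShard - 1) / maxDocsPerShard;
    }
}
```

With 420M documents and a 120M ceiling, `shardsNeeded(420_000_000L, 120_000_000L)` gives 4 shards, before allowing for any growth.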

Re: Tuning solr for large index with rapid writes

2016-04-30 Thread Bram Van Dam
On 29/04/16 16:33, Erick Erickson wrote: > You have one huge advantage when doing prototyping, you can > mine your current logs for real user queries. It's actually > surprisingly difficult to generate, say, 10,000 "realistic" queries. And > IMO you need something approaching that number to insure

Re: Solr 5.2.1 on Java 8 GC

2016-04-30 Thread Bram Van Dam
On 29/04/16 16:40, Nick Vasilyev wrote: > Not sure if it helps anyone, but I am seeing decent results with the > following. > > It was mostly a result of trial and error, I'm ashamed to admit that I've used a similar approach: wrote a simple test script to try out various GC settings with

Re: Storing different collection on different hard disk

2016-04-21 Thread Bram Van Dam
On 21/04/16 03:56, Zheng Lin Edwin Yeo wrote: > This is the working one: > dataDir=D:/collection1/data Ah yes. Backslashes are escape characters in properties files. C:\\collection1\\data would probably work as well. - bram
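You can see this directly with `java.util.Properties`, assuming the file is parsed in the standard Java properties format (as `core.properties` is). An unrecognized escape such as `\c` silently drops the backslash. The helper name is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Shows how java.util.Properties treats backslashes in a dataDir value (hypothetical helper). */
final class DataDirEscaping {
    static String dataDirFrom(String propertiesText) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(propertiesText));
        return props.getProperty("dataDir");
    }
}
```

Loading `dataDir=C:\collection1\data` yields `C:collection1data`. `dataDir=C:\\collection1\\data` and `dataDir=D:/collection1/data` both survive intact.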

Re: Indexing 700 docs per second

2016-04-20 Thread Bram Van Dam
> I have a requirement to index (mainly updation) 700 docs per second. > Suppose I have a 128GB RAM, 32 CPU machine, with each doc size around 260 > byes (6 fields out of which only 2 will undergo updation at the above > rate). This collection has around 122Million docs and that count is pretty >

Re: Storing different collection on different hard disk

2016-04-20 Thread Bram Van Dam
Have you considered simply mounting different disks under different paths? It looks like you're using Windows, so I'm not sure if that's possible, but it seems like a relatively basic task, so who knows. You could mount Disk 1 as /path/to/collection1 and Disk 2 as /path/to/collection2. That way

Re: [Possible Bug] 5.5.0 Startup script ignoring host parameter?

2016-03-31 Thread Bram Van Dam
On 30/03/16 16:45, Shawn Heisey wrote: > The host parameter does not control binding to network interfaces. It > controls what hostname is published to zookeeper when running in cloud mode. Oh I see. That wasn't clear from the documentation. Might be worth adding such a parameter to the startup

[Possible Bug] 5.5.0 Startup script ignoring host parameter?

2016-03-30 Thread Bram Van Dam
Hi folks, It looks like the "-h" parameter isn't being processed correctly. I want Solr to listen on 127.0.0.1, but instead it binds to all interfaces. Am I doing something wrong? Or am I misinterpreting what the -h parameter is for? Linux: # bin/solr start -h 127.0.0.1 -p 8180 # netstat -tlnp

Re: Next Solr Release - 5.5.1 or 6.0 ?

2016-03-24 Thread Bram Van Dam
On 23/03/16 15:50, Yonik Seeley wrote: > Kind of a unique situation for a dot-oh release, but from the Solr > perspective, 6.0 should have *fewer* bugs than 5.5 (for those features > in 5.5 at least)... we've been squashing a bunch of docValue related > issues. I've been led to understand that

Re: Solr 5.5.0: JVM args warning in console logfile.

2016-03-24 Thread Bram Van Dam
> When I made the change outlined in the patch on SOLR-8145 to my bin/solr > script, the warning disappeared. That was not the intended effect of > the patch, but I'm glad to have the mystery solved. > > Thank you for mentioning the problem so we could track it down. You're welcome. And thanks

Re: Solr 5.5.0: JVM args warning in console logfile.

2016-03-22 Thread Bram Van Dam
On 22/03/16 15:16, Shawn Heisey wrote: > This message is not coming from Solr. It's coming from Jetty. Solr > uses Jetty, but uses it completely unchanged. Ah you're right. Here's the offending code:

Solr 5.5.0: JVM args warning in console logfile.

2016-03-22 Thread Bram Van Dam
Hey folks, When I start 5.5.0 (on RHEL), the following entry is added to server/logs/solr-8983-console.log: WARNING: System properties and/or JVM args set. Consider using --dry-run or --exec I can't quite figure out what's causing this. Any clues on how to get rid of it? Thanks, - Bram

SolrCloud: Frequent "No registered leader was found" errors

2015-12-22 Thread Bram Van Dam
Hi folks, Been doing some SolrCloud testing and I've been experiencing some problems. I'll try to be relatively brief, but feel free to ask for additional information. I've added about 200 million documents to a SolrCloud. The cloud contains 3 collections, and all documents were added to all

Re: Deduplication

2015-05-20 Thread Bram Van Dam
Write a custom update processor and include it in your update chain. You will then have the ability to do anything you want with the entire input document before it hits the code to actually do the indexing. This sounded like the perfect option ... until I read Jack's comment: My

Re: Deduplication

2015-05-20 Thread Bram Van Dam
On 19/05/15 14:47, Alessandro Benedetti wrote: Hi Bram, what do you mean with : I would like it to provide the unique value myself, without having the deduplicator create a hash of field values . This is not reduplication, but simple document filtering based on a constraint. In the

Re: Solr 5.0, Jetty and WAR

2015-05-19 Thread Bram Van Dam
My organization has issues with Jetty (some customers don't want Jetty on their boxes, but are OK with WebSphere or Tomcat) so I'm trying to figure out: how to get Solr on WebSphere / Tomcat without using WAR knowing that the WAR will go away. I understand that some customers are irrational.

Deduplication

2015-05-19 Thread Bram Van Dam
Hi folks, I'm looking for a way to have Solr reject documents if a certain field value is duplicated (reject, not overwrite). There doesn't seem to be any kind of unique option in schema fields. The de-duplication feature seems to make this (somewhat) possible, but I would like it to provide the

Date Time datatypes?

2015-03-30 Thread Bram Van Dam
Howdy folks, Is there any way to index only the date and time portions of a datetime field? A Date is really a period of 24hrs, starting at 00:00 in said date's time zone. It would be useful if there was a way to search for documents of a certain date with these semantics. As for times, I'd
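Without built-in date-only or time-only types, one workaround is to derive separate fields at index time. Below is a minimal sketch using `java.time`. Indexing a local date string plus a seconds-of-day integer is an assumption on my part, not a Solr feature, and any field names would be hypothetical.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZonedDateTime;

/** Splits an instant into a local date and a time of day for separate fields (hypothetical). */
final class DateTimeParts {
    final LocalDate date;  // e.g. indexed as the string "2015-03-31"
    final int secondOfDay; // e.g. indexed as an int in 0..86399

    private DateTimeParts(LocalDate date, int secondOfDay) {
        this.date = date;
        this.secondOfDay = secondOfDay;
    }

    static DateTimeParts of(Instant instant, ZoneId zone) {
        // The calendar date depends on the zone, so convert before splitting.
        ZonedDateTime local = instant.atZone(zone);
        return new DateTimeParts(local.toLocalDate(), local.toLocalTime().toSecondOfDay());
    }
}
```

For example, `2015-03-30T22:30:00Z` falls on 31 March in Europe/Brussels but on 30 March in UTC. That is why the zone has to be fixed before the value is split.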

Re: How large is your solr index?

2015-01-11 Thread Bram Van Dam
Do note that one strategy is to create more shards than you need at the beginning. Say you determine that 10 shards will work fine, but you expect to grow your corpus by 2x. _Start_ with 20 shards (multiple shards can be hosted in the same JVM, no problem, see maxShardsPerNode in the collections

Re: How large is your solr index?

2015-01-08 Thread Bram Van Dam
On 01/07/2015 05:42 PM, Erick Erickson wrote: True, and you can do this if you take explicit control of the document routing, but... that's quite tricky. You forever after have to send any _updates_ to the same shard you did the first time, whereas SPLITSHARD will do the right thing. Hmm. That

Re: How large is your solr index?

2015-01-07 Thread Bram Van Dam
On 01/06/2015 07:54 PM, Erick Erickson wrote: Have you considered pre-supposing SolrCloud and using the SPLITSHARD API command? I think that's the direction we'll probably be going. Index size (at least for us) can be unpredictable in some cases. Some clients start out small and then grow

Re: Solr support for multi-tenant applications

2015-01-07 Thread Bram Van Dam
One possibility is to have separate core for each tenant domain. You could do that, and it's probably the way to go if you have a lot of data. However, if you don't have much data, you can achieve multi-tenancy by adding a filter to all your queries, for instance: query = userQuery
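Here is a sketch of building such a tenant filter, using a hypothetical `tenant_id` field. The value is quoted as a phrase, and backslashes and double quotes inside it are escaped. Sending it as a filter query (`fq`) rather than folding it into `q` keeps it out of scoring and lets Solr cache it separately.

```java
/** Builds a filter query value restricting results to one tenant (hypothetical helper). */
final class TenantFilter {
    static String fq(String tenantField, String tenantId) {
        // Quote as a phrase so spaces or colons in the id are not parsed as syntax;
        // backslashes and double quotes inside the phrase must themselves be escaped.
        String escaped = tenantId.replace("\\", "\\\\").replace("\"", "\\\"");
        return tenantField + ":\"" + escaped + "\"";
    }
}
```

With SolrJ this would be added as `query.addFilterQuery(TenantFilter.fq("tenant_id", tenantId));`.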

Re: How large is your solr index?

2015-01-04 Thread Bram Van Dam
On 01/04/2015 02:22 AM, Jack Krupansky wrote: The reality doesn't seem to be there today. 50 to 100 million documents, yes, but beyond that takes some kind of heroic effort, whether a much beefier box, very careful and limited data modeling or limiting of query capabilities or tolerance of

FOSDEM Open source search devroom

2015-01-02 Thread Bram Van Dam
Hi folks, There will be an Open source search devroom[1] at this year's FOSDEM in Brussels, 31st of January to 1st of February. I don't know if there will be a Lucene/Solr presence (there's no schedule for the dev room yet), but this seems like a good place to meet up and talk shop. I'll be

Re: How large is your solr index?

2014-12-31 Thread Bram Van Dam
On 12/30/2014 05:03 PM, Erick Erickson wrote: I think that it would be _extremely_ helpful to have a bunch of war stories to reference. In my experience, people dealing with large numbers of documents really are most concerned with whether what they're doing is _possible_, and are mostly looking

Re: How large is your solr index?

2014-12-30 Thread Bram Van Dam
On 12/29/2014 08:08 PM, ralph tice wrote: Like all things it really depends on your use case. We have 160B documents in our largest SolrCloud and doing a *:* to get that count takes ~13-14 seconds. Doing a text:happy query only takes ~3.5-3.6 seconds cold, subsequent queries for the same terms

Re: How large is your solr index?

2014-12-30 Thread Bram Van Dam
On 12/29/2014 09:53 PM, Jack Krupansky wrote: And that Lucene index document limit includes deleted and updated documents, so even if your actual document count stays under 2^31-1, deleting and updating documents can push the apparent document count over the limit unless you very aggressively

Re: How large is your solr index?

2014-12-30 Thread Bram Van Dam
On 12/29/2014 10:30 PM, Toke Eskildsen wrote: That being said, I acknowledge that it helps with stories to get a feel of what can be done. That's pretty much what I'm after, mostly to reassure myself that it can be done. Even if it does require a lot of hardware (which is fine). At

Re: SolrCloud Paging on large indexes

2014-12-29 Thread Bram Van Dam
On 12/23/2014 04:07 PM, Toke Eskildsen wrote: The beauty of the cursor is that it is has little to no overhead, relative to a standard top-X sorted search. A standard search uses a sliding window over the full result set, as does a cursor-search. Same amount of work. It is just a question of

How large is your solr index?

2014-12-29 Thread Bram Van Dam
Hi folks, I'm trying to get a feel of how large Solr can grow without slowing down too much. We're looking into a use-case with up to 100 billion documents (SolrCloud), and we're a little afraid that we'll end up requiring 100 servers to pull it off. The largest index we currently have is

Re: SolrCloud Paging on large indexes

2014-12-23 Thread Bram Van Dam
On 12/22/2014 04:27 PM, Erick Erickson wrote: Have you read Hossman's blog here? https://lucidworks.com/blog/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/#referrer=solr.pl Oh thanks, that's a pretty interesting read. The scale we're investigating is several orders
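The cursor approach from that post boils down to a loop. You start with `cursorMark=*` and keep passing back the `nextCursorMark` from each response until it stops changing. Here is a sketch of that loop, with a hypothetical `PageFetcher` standing in for the actual Solr request.

```java
import java.util.ArrayList;
import java.util.List;

/** Walks a result set with cursor-style paging until the cursor stops moving. */
final class CursorWalker {
    /** One page of results plus the cursor to send next (hypothetical shape). */
    static final class Page {
        final List<String> ids;
        final String nextCursorMark;

        Page(List<String> ids, String nextCursorMark) {
            this.ids = ids;
            this.nextCursorMark = nextCursorMark;
        }
    }

    /** Hypothetical fetcher: would run the query with the given cursorMark. */
    interface PageFetcher {
        Page fetch(String cursorMark);
    }

    static List<String> fetchAll(PageFetcher fetcher) {
        List<String> all = new ArrayList<>();
        String cursorMark = "*"; // start of the result set
        while (true) {
            Page page = fetcher.fetch(cursorMark);
            all.addAll(page.ids);
            // The end is reached when Solr hands back the same cursor that was sent.
            if (page.nextCursorMark.equals(cursorMark)) {
                return all;
            }
            cursorMark = page.nextCursorMark;
        }
    }
}
```

With SolrJ, the fetcher would set `CursorMarkParams.CURSOR_MARK_PARAM` on the query and read `QueryResponse.getNextCursorMark()`. The sort must include the uniqueKey field as a tiebreaker.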

SolrCloud Paging on large indexes

2014-12-22 Thread Bram Van Dam
Hi folks, If I understand things correctly, you can use paging and sorting in a SolrCloud environment. However, if I request the first 10 documents, a distributed query will be launched to all shards requesting the top 10, and then (Shards * 10) documents will be sorted so that only the
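The cost described above grows linearly with the page offset: each shard returns its top `start + rows` entries, and the coordinator merges all of them. A back-of-the-envelope helper (hypothetical names):

```java
/** Rough number of sort entries merged for classic start/rows paging (hypothetical helper). */
final class DeepPagingCost {
    static long mergedEntries(int shards, long start, long rows) {
        // Every shard returns its own top (start + rows); the coordinator merges them all.
        return shards * (start + rows);
    }
}
```

For example, page 76,662 at 10 rows per page (`start` = 766,610) on a hypothetical 10-shard collection means merging about 7.7 million entries to return 10 documents.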

Re: SolrCloud Paging on large indexes

2014-12-22 Thread Bram Van Dam
On 12/22/2014 12:47 PM, heaven wrote: I have a very bad experience with pagination on collections larger than a few millions of documents. Pagination becomes very and very slow. Just tried to switch to page 76662 and it took almost 30 seconds. Yeah that's pretty much my experience, and I think

Filter Query or Query

2014-11-10 Thread Bram Van Dam
Hi folks, I have an index with hundreds of millions of documents, which users can query in various ways. Two index fields are used by the system to hide certain documents from certain users (for instance: Department A can only view documents belonging to Department A, but not Department B).

Re: [ANN] Heliosearch 0.06 released, native code faceting

2014-06-23 Thread Bram Van Dam
On 06/20/2014 06:48 PM, Yonik Seeley wrote: Heliosearch is a Solr fork that will hopefully find it's way back to the ASF in the future. There are about 50 instances of sun.misc.Unsafe in heliosearch's code at this point. Has this been tested on non-Oracle VMs? Particularly IBM? Also: please

Paging while indexes

2014-06-23 Thread Bram Van Dam
Is there any way to take the current index version (or commit number or something) into account in paged queries? When navigating through a large result set in an NRT environment, I want the navigation to remain *fixed* on the initial results. I'm trying to avoid a scenario where a user has a

Re: How to handle multiple sub second updates to same SOLR Document

2014-01-28 Thread Bram Van Dam
On 01/25/2014 07:21 PM, christopher palm wrote: The problem I am trying to solve is that the order of these updates isn’t guaranteed once the multi threaded SOLRJ client starts sending them to SOLR, and older updates are overlaying the newer updates on the same document. Don't do that. There

Per-field/facet TimeZone in query?

2013-11-28 Thread Bram Van Dam
Howdy, Is there any way to specify a time zone per field/facet? There is a global TZ query parameter, but I would like to be able to use a different TZ for different fields or facets in a query. Thx, - Bram

Re: Core admin: create new core

2013-11-05 Thread Bram Van Dam
On 11/04/2013 04:06 PM, Bill Bell wrote: You could pre create a bunch of directories and base configs. Create as needed. Then use schema less API to set it up ... Or make changes in a script and reload the core.. I ended up creating a little API that takes schema/config as input, creates

Core admin: create new core

2013-11-04 Thread Bram Van Dam
The core admin CREATE function requires that the new instance dir and schema/config exist already. Is there a particular reason for this? It would be incredibly convenient if I could create a core with a new schema and new config simply by calling CREATE (maybe providing the contents of

Re: pivot range faceting

2013-10-20 Thread Bram Van Dam
On 10/21/2013 03:46 AM, Toby Lazar wrote: Thanks for confirming my fears. I saw some presentations where I thought this feature was used, but perhaps it was done performing multiple range queries. Probably. I had a look at implementing the feature (because it's something we rely on quite a

[SolrJ] HttpSolrServer - maxRetries

2013-10-07 Thread Bram Van Dam
Hi folks, Long story short: I'm occasionally getting exceptions under heavy load (SocketException: Connection reset). I would expect HttpSolrServer to try again maxRetries-times, but it doesn't. For reasons I don't entirely understand, the call to httpClient.execute(method) is not inside
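One workaround is to retry on the caller side. Below is a minimal sketch that retries up to `maxRetries` times when the failure's cause chain contains a `SocketException` (as in the "Connection reset" case). This is not how HttpSolrServer implements `maxRetries`, and blindly retrying non-idempotent updates may not be safe. The class name is hypothetical.

```java
import java.net.SocketException;
import java.util.concurrent.Callable;

/** Retries an action when its failure was caused by a socket-level error (hypothetical). */
final class RetryOnReset {
    static <T> T call(Callable<T> action, int maxRetries) throws Exception {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        for (int attempt = 0; ; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                // Give up on non-socket failures, or once the retry budget is spent.
                if (attempt >= maxRetries || !causedBySocketException(e)) {
                    throw e;
                }
            }
        }
    }

    /** Walks the cause chain, e.g. SolrServerException -> SocketException. */
    static boolean causedBySocketException(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof SocketException) {
                return true;
            }
        }
        return false;
    }
}
```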

Re: [SolrJ] HttpSolrServer - maxRetries

2013-10-07 Thread Bram Van Dam
On 10/07/2013 11:51 AM, Furkan KAMACI wrote: Could you send you error logs? Whoops, forgot to paste: Caused by: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://localhost:8080/solr/fooIndex at

Re: [SolrJ] HttpSolrServer - maxRetries

2013-10-07 Thread Bram Van Dam
On 10/07/2013 12:55 PM, Furkan KAMACI wrote: One more thing, could you say that which version of Solr you are using? The stacktrace comes from 4.2.1, but I suspect that this could occur on 4.4 as well. I've not been able to reproduce this consistently: it has happened twice (!) after

Re: {soft}Commit and cache flusing

2013-10-02 Thread Bram Van Dam
if there are no modifications to an index and a softCommit or hardCommit issued, then solr flushes the cache. Indeed. The easiest way to work around this is by disabling auto commits and only commit when you have to.

Re: OpenJDK or OracleJDK

2013-09-30 Thread Bram Van Dam
On 09/30/2013 01:11 PM, Raheel Hasan wrote: Could someone tell me if OpenJDK or OracleJDK will be best for Apache Solr over CentOS? If you're using Java 7 (or 8) then it doesn't matter. If you're using Java 6, stick with the Oracle version.
