Re: Importing Large CSV File into Solr Cloud Fails with 400 Bad Request

2020-02-02 Thread Shawn Heisey
On 2/2/2020 8:47 AM, Joseph Lorenzini wrote: [quoted config values: 1000 1] That autoSoftCommit setting is far too aggressive, especially for bulk indexing. I don't know whether it's causing the specific problem you're asking about here, but it's still a setting
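The bare values "1000 1" in the snippet above appear to be autoSoftCommit settings with their XML tags stripped by the archive. Assuming they were maxTime (in milliseconds) and maxDocs, the quoted config likely resembled the following sketch:

```xml
<!-- Reconstructed sketch; the tag names are inferred from the bare "1000 1"
     and from Erick's follow-up about opening a searcher every second. -->
<autoSoftCommit>
  <maxTime>1000</maxTime>  <!-- soft commit (new searcher) every second -->
  <maxDocs>1</maxDocs>     <!-- ...or after every single document -->
</autoSoftCommit>
```

Either trigger alone would force a new searcher roughly once per second during bulk indexing, which is what Shawn is calling too aggressive.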

Re: Solr 7.7 heap space is getting full

2020-02-02 Thread Walter Underwood
The only time I’ve ever had an OOM is when Solr gets a huge load spike and fires up 2000 threads. Then it runs out of space for stacks (at the JVM’s default 1 MiB -Xss, 2000 stacks is roughly 2 GiB of native memory, outside the heap). I’ve never run anything other than an 8GB heap, starting with Solr 1.3 at Netflix. Agreed about filter cache, though I’d expect heavy use of that to most often

Re: Importing Large CSV File into Solr Cloud Fails with 400 Bad Request

2020-02-02 Thread Erick Erickson
You’re opening new searchers very often, every second at least. I do not recommend this except under very unusual circumstances. This shouldn’t be the root of your problem, but it’s not helping either. I’d bump that up to 60 seconds or so. I usually just specify maxTime and not maxDocs, I

Re: Solr 7.7 heap space is getting full

2020-02-02 Thread Erick Erickson
Mostly I was reacting to the statement that the number of docs increased by over 4x and then there were memory problems. Hmmm, that said, what does “heap space is getting full” mean anyway? If you’re hitting OOMs, that’s one thing. If you’re measuring the amount of heap consumed and noticing

Re: Solr 7.7 heap space is getting full

2020-02-02 Thread Walter Underwood
We CANNOT diagnose anything until you tell us the error message! Erick, I strongly disagree that more heap is needed for bigger indexes. Except for faceting, Lucene was designed to stream index data and work regardless of the size of the index. Indexing is in RAM buffer sized chunks, so large

Re: G1GC Pauses (Young Gen)

2020-02-02 Thread Walter Underwood
Indexing shouldn’t require a massive heap. The memory used is proportional to the size of the update, not the size of the index. Our shards are 30 B and we run with an 8 GB heap. Never had a problem. You only need a lot of heap if you are running faceting (or maybe sorting) on a big index.

Re: Solr 7.7 heap space is getting full

2020-02-02 Thread Rajdeep Sahoo
We have allocated 16 GB of heap out of 24 GB. There are 3 Solr cores here; for one core, when the number of documents increases to around 4.5 lakh (450,000), this scenario happens. On Sun, 2 Feb, 2020, 9:02 PM Erick Erickson, wrote: > Allocate more heap and possibly add more

Re: Importing Large CSV File into Solr Cloud Fails with 400 Bad Request

2020-02-02 Thread Joseph Lorenzini
Hi Erick, Thanks for the help. For commit settings, you are referring to https://lucene.apache.org/solr/guide/8_3/updatehandlers-in-solrconfig.html. If so, yes, I have soft commits on. According to the docs, openSearcher is turned on by default. Here are the settings. 60

Re: Solr 7.7 heap space is getting full

2020-02-02 Thread Erick Erickson
Allocate more heap and possibly add more RAM. What are your expectations? You can’t continue to add documents to your Solr instance without regard to how much heap you’ve allocated. You’ve put over 4x the number of docs on the node. There’s no magic here. You can’t continue to add docs to a

Re: Solr 7.7 heap space is getting full

2020-02-02 Thread Rajdeep Sahoo
What can we do in this scenario, as the Solr master node is going down and the indexing is failing? Please provide some workaround for this issue. On Sat, 1 Feb, 2020, 11:51 PM Walter Underwood, wrote: > What message do you get about the heap space? > > It is completely normal for Java to use

Re: Importing Large CSV File into Solr Cloud Fails with 400 Bad Request

2020-02-02 Thread Erick Erickson
What are your commit settings? Solr keeps certain in-memory structures between commits, so it’s important to commit periodically. Say every 60 seconds as a straw-man proposal (and openSearcher should be set to true or soft commits should be enabled). When firing a zillion docs at Solr, it’s also
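Erick's 60-second straw man would look roughly like this in solrconfig.xml (the values are his suggestion, not the poster's actual config; the common pattern is a hard commit with openSearcher=false plus a soft commit for visibility):

```xml
<!-- Sketch of the suggested settings, not the thread's actual config. -->
<autoCommit>
  <maxTime>60000</maxTime>          <!-- hard commit (flush to disk) every 60 s -->
  <openSearcher>false</openSearcher> <!-- don't open a searcher on hard commit -->
</autoCommit>
<autoSoftCommit>
  <maxTime>60000</maxTime>          <!-- soft commit makes docs searchable every 60 s -->
</autoSoftCommit>
```

With openSearcher=false on the hard commit, soft commits alone control when new documents become visible, which is what the parenthetical "openSearcher should be set to true or soft commits should be enabled" is getting at.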

Importing Large CSV File into Solr Cloud Fails with 400 Bad Request

2020-02-02 Thread Joseph Lorenzini
Hi all, I have a three-node Solr Cloud cluster. The collection has a single shard. I am importing a 140 GB CSV file into Solr using curl, with a URL that looks roughly like this. I am streaming the file from disk for performance reasons.
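The actual URL was not preserved in the archive, but "streaming the file from disk" suggests Solr's stream.file parameter, which has Solr read the CSV directly rather than pushing 140 GB through the HTTP request body. A sketch, with assumed host, collection name, and file path (remote streaming must be enabled in solrconfig.xml):

```shell
# Hypothetical reconstruction -- host, collection, and path are assumptions.
# stream.file tells Solr to read the CSV straight from local disk.
curl 'http://localhost:8983/solr/mycollection/update?stream.file=/data/import/large.csv&stream.contentType=text/csv;charset=utf-8&commit=true'
```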

Re: G1GC Pauses (Young Gen)

2020-02-02 Thread Karl Stoney
So, interesting fact: setting -XX:MaxGCPauseMillis causes G1GC to dynamically adjust the size of your young space, and setting it too low makes it nose-dive as tiny as possible during the memory allocations that happen during soft commits. Setting -XX:MaxGCPauseMillis much higher has actually