RE: Trailing space issue with indexed data.

2020-08-18 Thread Markus Jelsma
Hello, You can use TrimFieldUpdateProcessorFactory [1] in your URP chain to remove leading or trailing whitespace when indexing. Regards, Markus [1] https://lucene.apache.org/solr/8_6_0//solr-core/org/apache/solr/update/processor/TrimFieldUpdateProcessorFactory.html -Original
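
If changing the URP chain is not an option, the same cleanup can be approximated on the client before documents are sent to Solr. The sketch below is not the TrimFieldUpdateProcessorFactory itself; `trim_fields` is a hypothetical helper, and Python's `str.strip()` may treat a slightly different set of characters as whitespace than the Solr processor does.

```python
def trim_fields(doc):
    """Return a copy of doc with leading/trailing whitespace removed
    from every string value (including strings inside list values)."""
    trimmed = {}
    for name, value in doc.items():
        if isinstance(value, str):
            trimmed[name] = value.strip()
        elif isinstance(value, list):
            # Multi-valued field: trim each string element, keep others as-is.
            trimmed[name] = [v.strip() if isinstance(v, str) else v for v in value]
        else:
            # Numbers, booleans, None and so on are passed through untouched.
            trimmed[name] = value
    return trimmed


if __name__ == "__main__":
    print(trim_fields({"Page_number": "1 ", "Doc_name": " office 770 toll free "}))
```

Only leading and trailing whitespace is removed; whitespace inside a value is left alone.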

RE: Drop bad document in update batch

2020-08-18 Thread Markus Jelsma
Ah yes, I should have looked at the list of subclasses of UpdateRequestProcessorFactory in the API docs as it is not mentioned in the manual. Thanks Erick! -Original message- > From:Erick Erickson > Sent: Tuesday 18th August 2020 19:04 > To: solr-user@lucene.apache.org > Subject: Re:

Re: Trailing space issue with indexed data.

2020-08-18 Thread Jörn Franke
During indexing. Do they matter for search, i.e. would the search be different with/without them? > On 18.08.2020 at 19:57, Fiz N wrote: > > Hello SOLR Experts, > > I am using SOLR 8.6 and indexing data from MSSQL DB. > > after indexing is done I am seeing > > “Page_number”:”1

Trailing space issue with indexed data.

2020-08-18 Thread Fiz N
Hello SOLR Experts, I am using SOLR 8.6 and indexing data from MSSQL DB. After indexing is done I am seeing “Page_number”:”1“, “Doc_name”:” office 770 toll free “ “Doc_text”:” From: Hyan, gan \nTo: Delacruz Decruz \n“ I was remove

Re: Deleted collection is getting back after restart

2020-08-18 Thread Erick Erickson
What this sounds like is you’re not really connecting to ZooKeeper the way you think. First, ensure that you’re really connecting from Solr by going to the admin UI and looking at the ZooKeeper information. Are the URLs and ports correct? Second, ZooKeeper by default puts the data in

Deleted collection is getting back after restart

2020-08-18 Thread yaswanth kumar
I am using Solr with a ZooKeeper ensemble, but sometimes when we delete a collection with the Solr API, it disappears from SolrCloud, and then after some days, when the machines are rebooted, it comes back on the cloud but with down status. Not really sure if it's an issue with

Re: Drop bad document in update batch

2020-08-18 Thread Erick Erickson
I think you’re looking for TolerantUpdateProcessor(Factory), added in SOLR-445. It hung around for a LONG time and didn’t actually get added until 6.1. > On Aug 18, 2020, at 12:51 PM, Markus Jelsma > wrote: > > Hello, > > Normally, if a single document is bad, the whole indexing

Re: Use of NRTCachingDirectoryFactory

2020-08-18 Thread Erick Erickson
In a word, “yes”. NRTCachingDirectory almost always “does the right thing” without any modifications based on your environment. Second, since Lucene uses MMapDirectory, the relevant portions of your index will already be in the OS’s RAM, which is why it’s a mistake to try to force it. All

Drop bad document in update batch

2020-08-18 Thread Markus Jelsma
Hello, Normally, if a single document is bad, the whole indexing batch is dropped. I think I remember there was a URP(?) that discards bad documents from the batch, but I cannot find it in the manual [1]. Is it possible or am I starting to imagine things? Thanks, Markus [1]

Re: SOLR indexing takes longer time

2020-08-18 Thread Walter Underwood
Instead of writing code, I’d fire up SQL Workbench/J, load the same JDBC driver that is being used in Solr, and run the query. https://www.sql-workbench.eu If that takes 3.5 hours, you have isolated the problem. wunder Walter Underwood wun...@wunderwood.org

Re: SOLR indexing takes longer time

2020-08-18 Thread David Hastings
Another thing to mention is to make sure the indexer you build doesn't send commits until it's actually done. Made that mistake with some early in-house indexers. On Tue, Aug 18, 2020 at 9:38 AM Charlie Hull wrote: > 1. You could write some code to pull the items out of Mongo and dump > them to
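
The "no commits until done" advice can be sketched as a loop that sends documents in batches and issues exactly one commit at the end. This is only an illustration: `index_in_batches`, `send_batch`, `send_commit` and the batch size of 500 are hypothetical names and values, and the actual HTTP calls to Solr are left to the two callables.

```python
from itertools import islice


def index_in_batches(docs, send_batch, send_commit, batch_size=500):
    """Send docs to the indexer in batches, then commit exactly once.

    send_batch(list_of_docs) and send_commit() are supplied by the caller,
    so the batching logic can be exercised without a running Solr.
    Returns the number of documents sent.
    """
    iterator = iter(docs)
    sent = 0
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        send_batch(batch)  # no commit is requested per batch
        sent += len(batch)
    send_commit()  # single commit after the final batch
    return sent


if __name__ == "__main__":
    log = []
    total = index_in_batches(
        range(5),
        send_batch=lambda batch: log.append(("batch", len(batch))),
        send_commit=lambda: log.append(("commit",)),
        batch_size=2,
    )
    print(total, log)
```

In a real indexer, `send_batch` would typically post the documents to the collection's update handler and `send_commit` would request the commit; how those requests are made depends on the client library in use.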

Re: SOLR indexing takes longer time

2020-08-18 Thread Charlie Hull
1. You could write some code to pull the items out of Mongo and dump them to disk - if this is still slow, then it's Mongo that's the problem. 2. Write a standalone indexer to replace DIH; DIH is single-threaded and deprecated anyway. 3. Minor point - consider whether you need to index everything
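
Point 1 above can be tried with a few lines that write every item to disk and report how long it took. This is a rough sketch: `dump_and_time` is a hypothetical helper, `items` stands in for whatever cursor or iterator the source database client returns, and `default=str` is an assumption for serialising values such as dates or IDs that JSON does not handle natively.

```python
import json
import time


def dump_and_time(items, path):
    """Write each item as one JSON line to path.

    Returns (count, elapsed_seconds) so the read-and-dump speed of the
    source can be judged separately from Solr indexing.
    """
    start = time.perf_counter()
    count = 0
    with open(path, "w", encoding="utf-8") as out:
        for item in items:
            # default=str turns non-JSON types into their string form.
            out.write(json.dumps(item, default=str))
            out.write("\n")
            count += 1
    return count, time.perf_counter() - start


if __name__ == "__main__":
    n, seconds = dump_and_time(({"id": i} for i in range(1000)), "dump.jsonl")
    print(f"dumped {n} items in {seconds:.3f}s")
```

If this step alone accounts for most of the total run time, the bottleneck is on the source side rather than in Solr.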

Use of NRTCachingDirectoryFactory

2020-08-18 Thread Tushar Arora
Hi, One of our indexes has a size of around 1GB, and the production server has 16GB of RAM. This is a slave server; data replicates from the master server to it every 5 minutes. Is it good practice to keep this index in RAM? I checked the solr.RAMDirectoryFactory. But, it does not work