in-place updates

2018-04-10 Thread Hendrik Haddorp
Hi, in http://lucene.472066.n3.nabble.com/In-Place-Updates-not-working-as-expected-tp4375621p4380035.html some restrictions on the supported fields are given. I could however not find out whether in-place updates are supported for all field types or whether they only work for, say, numeric fields. thanks,

Re: Solr 7.3.0 loading OpenNLPExtractNamedEntitiesUpdateProcessorFactory

2018-04-10 Thread Ryan Yacyshyn
Hi, I found the problem: there was an additional jar file in the /dist folder that needed to be loaded as well (dist/solr-analysis-extras-7.3.0.jar). I didn't see this one. Thanks, Ryan On Mon, 9 Apr 2018 at 14:58 Ryan Yacyshyn wrote: > Hi Shawn, > > I'm pretty sure

Re: Text in images are not extracted and indexed to content

2018-04-10 Thread Zheng Lin Edwin Yeo
Thanks for the reply. It was due to the Tesseract OCR problem, as I have tried out the new Tesseract 4 version on my system, and it does not set the path in the Environment Variables, unlike the older Tesseract 3, which set the path automatically during installation. Regards, Edwin On 10 April

Using Solr to search website and external Oracle ServiceCloud

2018-04-10 Thread jclaros
Hello, I have a drupal website that uses SOLR. When a user searches our website, I would like to now return results from 2 sources: (1) our website (2) our external Oracle ServiceCloud knowledge base Does anyone have any suggestions on how to do this? -- Sent from:

Using Solr to search website and external Oracle ServiceCloud

2018-04-10 Thread jclaros
Hello, I would like to use SOLR to search 2 different sources: (1) My website (2) My external knowledge base created in Oracle Service Cloud. Right now SOLR works great against my website. I want to integrate my FAQs from the knowledge base into 1 search on the website to make it easier for

Solr7.1.0 - deleting collections when using HDFS

2018-04-10 Thread Joe Obernberger
Hi All - I've noticed that if I delete a collection that is stored in HDFS, the files/directory in HDFS remain.  If I then try to recreate the collection with the same name, I get an error about unable to open searcher.  If I then remove the directory from HDFS, the error remains due to files

RE: Backup a solr cloud collection - timeout in 180s?

2018-04-10 Thread Petersen, Robert (Contr)
Erick: Good to know! Thx Robi -Original Message- From: Erick Erickson Sent: Tuesday, April 10, 2018 12:42 PM To: solr-user Subject: Re: Backup a solr cloud collection - timeout in 180s? Robi: Yeah, the ref guide has lots and

Re: Backup a solr cloud collection - timeout in 180s?

2018-04-10 Thread Erick Erickson
Robi: Yeah, the ref guide has lots and lots and lots of info, but at 1,100 pages and growing, things can be "interesting" to find. Do be aware of one thing. The async ID should be unique and before 7.3 there was a bug that if you used the same ID twice (without waiting for completion and deleting

Re: Recover a Solr Node

2018-04-10 Thread Erick Erickson
There's actually not much that's _required_ in core.properties, so if you want to try this just create a core directory by hand under SOLR_HOME and name it by the pattern that other cores use. Then use a core.properties file from another replica in the same collection and substitute every property
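A minimal sketch of the "copy and substitute" step described above, assuming a plain `key=value` core.properties file. The property names and values in the example (`name`, `shard`, `collection`, `coreNodeName`, `coll_shard1_replica_n1`, and so on) are illustrative assumptions, not values taken from this thread; check them against a real replica in your own collection.

```python
def substitute_properties(template_text, overrides):
    """Return template_text with selected key=value pairs replaced.

    Comment lines and blank lines are passed through unchanged.
    """
    lines = []
    for line in template_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#") and "=" in stripped:
            key, _, value = stripped.partition("=")
            key = key.strip()
            # Use the override when one is given, otherwise keep the original value.
            value = overrides.get(key, value.strip())
            lines.append(f"{key}={value}")
        else:
            lines.append(line)
    return "\n".join(lines) + "\n"


# Hypothetical core.properties copied from another replica of the same collection.
template = (
    "# copied from a sibling replica\n"
    "name=coll_shard1_replica_n1\n"
    "shard=shard1\n"
    "collection=coll\n"
    "coreNodeName=core_node2\n"
)

# Hypothetical values for the core being rebuilt.
rebuilt = substitute_properties(
    template,
    {"name": "coll_shard2_replica_n3", "shard": "shard2", "coreNodeName": "core_node4"},
)
```

The result would be written to `core.properties` inside the hand-created core directory under SOLR_HOME.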

Re: Recover a Solr Node

2018-04-10 Thread Shawn Heisey
On 4/9/2018 2:28 PM, Karthik Ramachandran wrote: > We are using Solr cloud with 3 nodes, no replication with 8 shards per node > per collection. We have multiple collections on that node. > > We have a backup of the data folder, so we can recover it, is there a > way to reconstruct

Re: Recover a Solr Node

2018-04-10 Thread Karthik Ramachandran
Erick, Just throwing out what's on my mind. I see that the collection cluster state has all the information to create the core.properties. If I create the core.properties from the cluster state and then reload the collection will that bring the collection up? I did try the above step, but instead

Re: Backup a solr cloud collection - timeout in 180s?

2018-04-10 Thread Petersen, Robert (Contr)
Hi Erick, I *just* found that parameter in the guide... it was waaay down at the bottom of the page (in proverbial small print)! So for other readers the steps are this: # start the backup async enabled /admin/collections?action=BACKUP&name=addrsearchBackup&collection=addrsearch&location=/apps/logs/backups&async=1234 #
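The asynchronous backup steps can be sketched as URL construction in Python. This is only an illustration: the base address is an assumed default, and the parameter names (`name`, `collection`, `location`, `async`, and `requestid` for the status check) follow the Collections API documentation rather than anything confirmed in this message.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumption: default local Solr address


def collections_url(params):
    """Build a Collections API URL from a dict of request parameters."""
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


# Step 1: start the backup in the background, tagged with an async request ID.
backup_url = collections_url({
    "action": "BACKUP",
    "name": "addrsearchBackup",
    "collection": "addrsearch",
    "location": "/apps/logs/backups",
    "async": "1234",
})

# Step 2: poll the status of that request ID until it reports completion.
status_url = collections_url({"action": "REQUESTSTATUS", "requestid": "1234"})
```

Fetching `backup_url` returns immediately, so the 180 s request timeout no longer applies to the backup itself; `status_url` can then be polled at whatever interval suits.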

Re: Backup a solr cloud collection - timeout in 180s?

2018-04-10 Thread Erick Erickson
Specify the "async" property, see: https://lucene.apache.org/solr/guide/6_6/collections-api.html There's also a way to check the status of the backup running in the background. Best, Erick On Mon, Apr 9, 2018 at 11:05 AM, Petersen, Robert (Contr) wrote: > Shouldn't

Re: Recover a Solr Node

2018-04-10 Thread Erick Erickson
Not that I know of. You might be able to do an ADDREPLICA for each one. This is a risk when running without replicas. Best, Erick On Mon, Apr 9, 2018 at 1:28 PM, Karthik Ramachandran wrote: > We are using Solr cloud with 3 nodes, no replication with 8 shards per node > per

Re: replication

2018-04-10 Thread Erick Erickson
bq: should we try to bite the solrcloud bullet and be done w it that's what I'd do. As of 7.0 there are different "flavors", TLOG, PULL and NRT so that's also a possibility, although you can't (yet) direct queries to one or the other. So just making them all NRT and forgetting about it is
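As a rough illustration of the replica "flavors" mentioned above, the sketch below builds a Collections API CREATE request. The collection name `example` and the base address are assumptions; `nrtReplicas`, `tlogReplicas` and `pullReplicas` are the per-type replica counts as documented for Solr 7.x.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumption: default local Solr address


def create_collection_url(name, num_shards, nrt=0, tlog=0, pull=0):
    """Build a CREATE URL, including only the replica types that are requested."""
    params = {"action": "CREATE", "name": name, "numShards": num_shards}
    for key, count in (("nrtReplicas", nrt), ("tlogReplicas", tlog), ("pullReplicas", pull)):
        if count > 0:
            params[key] = count
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


# "Just making them all NRT": two NRT replicas per shard, no TLOG or PULL replicas.
all_nrt_url = create_collection_url("example", num_shards=2, nrt=2)
```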

Query regarding LTR plugin in solr

2018-04-10 Thread prateek . agarwal
Hi, I'm working on ltr feature in solr. I have a feature like : ''' { "store" : "my_feature_store", "name" : "in_aggregated_terms", "class" : "org.apache.solr.ltr.feature.SolrFeature", "params" : { "q" : "{!func}scale(query({!payload_score f=aggregated_terms func=max

Ignore Field from indexing

2018-04-10 Thread swap
Hi, I have a document indexed. Email-Id is the unique key in the document. On updating, I need to ignore a few fields if they already exist. Please let me know if something more is required. -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: How to use Tika (Solr Cell) to extract content from HTML document instead of Solr's MostlyPassthroughHtmlMapper ?

2018-04-10 Thread David Hastings
I actually used Solr 5.x, the More Like This feature, and a subset of human-tagged data (about 10%) to apply subject coding with around a 95% accuracy rate to over 2 million documents, so it is definitely doable On Tue, Apr 10, 2018 at 10:40 AM, Alexandre Rafalovitch wrote:

Re: How to use Tika (Solr Cell) to extract content from HTML document instead of Solr's MostlyPassthroughHtmlMapper ?

2018-04-10 Thread Alexandre Rafalovitch
I know it was a joke, but I've been thinking of something like that. Not a chatbot per se, but perhaps something that uses Machine Learning/topic clustering on the past discussions and matches them to the new questions. Still would need to be rechecked by a human for final response, but could be

Re: Confusing error when creating a new core with TLS, service enabled

2018-04-10 Thread Shawn Heisey
On 4/10/2018 7:32 AM, Christopher Schultz wrote: What happened is that the new core directory was created as root, owned by root. Was it? If my server is running as solr, how can it create directories as root? Unless you run Solr in cloud mode (which means using zookeeper), the server cannot

Re: Confusing error when creating a new core with TLS, service enabled

2018-04-10 Thread Christopher Schultz
Shawn, On 4/9/18 8:04 PM, Shawn Heisey wrote: > On 4/9/2018 12:58 PM, Christopher Schultz wrote: >> After playing-around with a Solr 7.2.1 instance launched from the >> extracted tarball, I decided to go ahead and create a "real service" on >> my Debian-based server. >> >> I've run the 7.3.0

Re: Collapse in facet

2018-04-10 Thread Joel Bernstein
It sounds like the JSON facet API could do what you are describing. I haven't tried the exclusion of the collapse filter with the JSON facet API but I suspect it will work. Joel Bernstein http://joelsolr.blogspot.com/ On Tue, Apr 10, 2018 at 3:40 AM, Carl-Johan Syrén
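A possible shape for that request, as a sketch only: the collapse filter is tagged, and a JSON terms facet excludes the tag and counts distinct publication IDs per organisation with `unique()`. The organisation field name `id_org` and the tag name are hypothetical, and, as noted above, excluding a collapse filter this way is untested.

```python
import json


def distinct_publications_per_org(org_field="id_org", publ_field="id_publ"):
    """Build request parameters: collapsed main query plus an uncollapsed facet."""
    facet = {
        "orgs": {
            "type": "terms",
            "field": org_field,
            # Drop the tagged collapse filter for this facet only.
            "domain": {"excludeTags": "collapse"},
            # Count distinct publication IDs within each organisation bucket.
            "facet": {"distinct_publ": f"unique({publ_field})"},
        }
    }
    return {
        "q": "*:*",
        "fq": f"{{!collapse tag=collapse field={publ_field}}}",
        "json.facet": json.dumps(facet),
    }


params = distinct_publications_per_org()
```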

Re: Text in images are not extracted and indexed to content

2018-04-10 Thread Shamik Sinha
To index text in images, the image needs to be searchable, i.e. text needs to be overlaid on the image like a searchable PDF. You can do this using OCR, but it is a bit unreliable if the images are scanned copies of written text. On 10-Apr-2018 4:12 PM, "Rahul Singh"

Re: Text in images are not extracted and indexed to content

2018-04-10 Thread Rahul Singh
May need to extract outside Solr and index pure text with an external ingestion process. You have much more control over the Tika attributes and behaviors. -- Rahul Singh rahul.si...@anant.us Anant Corporation On Apr 9, 2018, 10:23 PM -0400, Zheng Lin Edwin Yeo , wrote:

Re: Score certain documents higher based on a weight field

2018-04-10 Thread Emir Arnautović
Hi, In case you are using the (e)dismax query parser, you can use bf (additive) or boost (multiplier) to boost results. You can use the field function to access the field value (or just use the field name in most places). HTH, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr &
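A small sketch of the two options, assuming the edismax parser and a hypothetical numeric field named `weight`; only the request parameters are built here, not the HTTP call.

```python
def boosted_params(user_query, weight_field, multiplicative=True):
    """Return edismax parameters that boost by the value of weight_field.

    multiplicative=True uses `boost` (score is multiplied by the function);
    multiplicative=False uses `bf` (function value is added to the score).
    """
    params = {"defType": "edismax", "q": user_query}
    key = "boost" if multiplicative else "bf"
    params[key] = f"field({weight_field})"
    return params


multiplied = boosted_params("solr", "weight")
added = boosted_params("solr", "weight", multiplicative=False)
```

A multiplicative boost scales with the relevance score, while an additive one can swamp or be swamped by it depending on the range of the weight values.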

Collapse in facet

2018-04-10 Thread Carl-Johan Syrén
Hi, I use fq='{!collapse field=id_publ}' in a query to get distinct publication IDs, and it works fine. I then have three facets, and in one of them I want to collapse on another level. For each organisation ID I want to count distinct publication IDs. To do that I start with suppressing the

Re: SOLR with Sitecore SXA

2018-04-10 Thread Stefan Matheis
You should be subscribed to the list [1], then just mail in - if you're not subscribed you don't get any follow up mails from the list (besides all other mails that happen) [1] http://lucene.apache.org/solr/community.html#mailing-lists-irc -Stefan On Mon, Apr 9, 2018, 5:03 PM Saul Nachman