Re: Default Index config

2018-04-09 Thread mganeshs
Hi Shawn, Thanks for the reply. Yes, we use only one Solr client. Though the collection name is passed to the function, we are using the same client for now. Regarding the merge config, after reading a lot of forums and listening to a presentation from Revolution 2017, the idea is to reduce the merge frequency, so

Text in images are not extracted and indexed to content

2018-04-09 Thread Zheng Lin Edwin Yeo
Hi, Currently I am facing an issue whereby the text in image files like jpg and bmp is not being extracted and indexed. After indexing, Tika did extract all the metadata and index it under the attr_* fields. However, the content field is always empty for image files. For other types of

Re: Confusing error when creating a new core with TLS, service enabled

2018-04-09 Thread Shawn Heisey
On 4/9/2018 12:58 PM, Christopher Schultz wrote: > After playing-around with a Solr 7.2.1 instance launched from the > extracted tarball, I decided to go ahead and create a "real service" on > my Debian-based server. > > I've run the 7.3.0 install script, configured Solr for TLS, and moved my >

Re: replication

2018-04-09 Thread John Blythe
Thanks a bunch for the thorough reply, Shawn. Phew. We’d chosen to go w Master-slave replication instead of SolrCloud per the sudden need we had encountered and the desire to avoid the nuances and changes related to moving to SolrCloud. But so much for this being a more straightforward solution,

Score certain documents higher based on a weight field

2018-04-09 Thread OTH
Hello, Is there a way to assign a higher score to certain documents based on a 'weight' field? E.g., if I have the following two documents: { "name":"United Kingdom", "weight":2730, } { "name":"United States of America", "weight":11246, } Currently, if I issue

Re: this IndexWriter is closed

2018-04-09 Thread Shawn Heisey
On 4/9/2018 12:31 PM, Jay Potharaju wrote: > I am getting an "IndexWriter is closed" error only on some of my shards in the > collection. This seems to be happening on leader shards only. There are > other shards on the box and they are not throwing any error. Also there is > enough disk space on

RE: [EXT] Re: How to use Tika (Solr Cell) to extract content from HTML document instead of Solr's MostlyPassthroughHtmlMapper ?

2018-04-09 Thread Hanjan, Harinder
Oh this is great! Saves me a whole bunch of manual work. Thanks! -Original Message- From: Charlie Hull [mailto:char...@flax.co.uk] Sent: Monday, April 09, 2018 2:15 PM To: solr-user@lucene.apache.org Subject: [EXT] Re: How to use Tika (Solr Cell) to extract content from HTML document

Re: replication

2018-04-09 Thread Shawn Heisey
On 4/9/2018 12:15 PM, John Blythe wrote: > we're starting to dive into master/slave replication architecture. we'll > have 1 master w 4 slaves behind it. our app is NRT. if user performs an > action in section A's data they may choose to jump to section B which will > be dependent on having the

Recover a Solr Node

2018-04-09 Thread Karthik Ramachandran
We are using SolrCloud with 3 nodes, no replication, with 8 shards per node per collection. We have multiple collections on that node. We have a backup of the data folder, so we can recover it. Is there a way to reconstruct core.properties for all the replicas on that node? -- With Thanks &

Re: How to use Tika (Solr Cell) to extract content from HTML document instead of Solr's MostlyPassthroughHtmlMapper ?

2018-04-09 Thread Charlie Hull
As a bonus here's a Dropwizard Tika wrapper that gives you a Tika web service https://github.com/mattflax/dropwizard-tika-server written by a colleague of mine at Flax. Hope this is useful. Cheers Charlie On 9 April 2018 at 19:26, Hanjan, Harinder wrote: > Thank

Re: Confusing error when creating a new core with TLS, service enabled

2018-04-09 Thread Christopher Schultz
All, On 4/9/18 2:58 PM, Christopher Schultz wrote: > All, > > After playing-around with a Solr 7.2.1 instance launched from the > extracted tarball, I decided to go ahead and create a "real service" on > my Debian-based server. > > I've run the 7.3.0 install script, configured Solr for TLS, and

Confusing error when creating a new core with TLS, service enabled

2018-04-09 Thread Christopher Schultz
All, After playing-around with a Solr 7.2.1 instance launched from the extracted tarball, I decided to go ahead and create a "real service" on my Debian-based server. I've run the 7.3.0 install script, configured Solr for TLS, and moved my existing configuration into the data directory, here: $

this IndexWriter is closed

2018-04-09 Thread Jay Potharaju
Hi, I am getting an "IndexWriter is closed" error only on some of my shards in the collection. This seems to be happening on leader shards only. There are other shards on the box and they are not throwing any error. Also there is enough disk space on the box available at this time. Solr: 5.3.0.

RE: How to use Tika (Solr Cell) to extract content from HTML document instead of Solr's MostlyPassthroughHtmlMapper ?

2018-04-09 Thread Hanjan, Harinder
Thank you Charlie, Tim. I will integrate Tika in my Java app and use SolrJ to send data to Solr. -Original Message- From: Allison, Timothy B. [mailto:talli...@mitre.org] Sent: Monday, April 09, 2018 11:24 AM To: solr-user@lucene.apache.org Subject: [EXT] RE: How to use Tika (Solr Cell)

replication

2018-04-09 Thread John Blythe
hi, all. we're starting to dive into master/slave replication architecture. we'll have 1 master w 4 slaves behind it. our app is NRT. if user performs an action in section A's data they may choose to jump to section B which will be dependent on having the updates from their action in section A.

Backup a solr cloud collection - timeout in 180s?

2018-04-09 Thread Petersen, Robert (Contr)
Shouldn't this just create the backup file(s) asynchronously? Can the timeout be adjusted? Solr 7.2.1 with five nodes and the addrsearch collection is five shards x five replicas and "numFound":38837970 docs Thx Robi

How many SynonymGraphFilterFactory can I have?

2018-04-09 Thread Vincenzo D'Amore
Hi all, in a Solr 4.8 schema I have a fieldType with a few SynonymFilter filters at index time and a few at query time. Moving this old schema to Solr 7.3.0, I see that if I use SynonymGraphFilter during indexing, I have to follow it with FlattenGraphFilter. I also know that I cannot have multiple

RE: How to use Tika (Solr Cell) to extract content from HTML document instead of Solr's MostlyPassthroughHtmlMapper ?

2018-04-09 Thread Allison, Timothy B.
+1 https://lucidworks.com/2012/02/14/indexing-with-solrj/ We should add a chatbot to the list that includes Charlie's advice and the link to Erick's blog post whenever Tika is used.  -Original Message- From: Charlie Hull [mailto:char...@flax.co.uk] Sent: Monday, April 9, 2018 12:44

Re: How to use Tika (Solr Cell) to extract content from HTML document instead of Solr's MostlyPassthroughHtmlMapper ?

2018-04-09 Thread Charlie Hull
I'd recommend you run Tika externally to Solr, which will allow you to catch this kind of problem and prevent it bringing down your Solr installation. Cheers Charlie On 9 April 2018 at 16:59, Hanjan, Harinder wrote: > Hello! > > Solr (i.e. Tika) throws a "zip bomb"
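
A minimal sketch of the approach suggested in this reply: run extraction outside Solr so that a failing document is caught and set aside rather than affecting the Solr installation. The `extract` callable is a hypothetical stand-in for whatever external Tika call is used; none of the names below come from the thread, and `fake_extract` only simulates a failure for illustration.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def extract_all(
    paths: Iterable[str],
    extract: Callable[[str], str],
) -> Tuple[List[Dict[str, str]], List[Tuple[str, str]]]:
    """Run extraction outside Solr, collecting failures instead of crashing."""
    docs: List[Dict[str, str]] = []
    failures: List[Tuple[str, str]] = []
    for path in paths:
        try:
            text = extract(path)
        except Exception as exc:
            # Record the problem document and carry on with the rest.
            failures.append((path, str(exc)))
            continue
        docs.append({"id": path, "content": text})
    return docs, failures


def fake_extract(path: str) -> str:
    """Hypothetical stand-in for a real Tika call; fails on one input."""
    if path.endswith("bomb.docx"):
        raise RuntimeError("zip bomb detected")
    return "text of " + path


docs, failures = extract_all(["a.html", "bomb.docx", "b.html"], fake_extract)
print(len(docs), "extracted;", len(failures), "failed")
```

Only `docs` would then be sent on to Solr (for example with SolrJ or any HTTP client), while `failures` can be logged and inspected separately.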

Uninverting stats on solr 5 and beyond

2018-04-09 Thread Matteo Grolla
Hi, on Solr 4 the log contained information about the time spent and memory consumed uninverting a field. Where can I find this information in current versions of Solr? Thanks --excerpt from solr 4.10 log-- INFO - 2018-04-09 15:57:58.720; org.apache.solr.request.UnInvertedField; UnInverted

How to use Tika (Solr Cell) to extract content from HTML document instead of Solr's MostlyPassthroughHtmlMapper ?

2018-04-09 Thread Hanjan, Harinder
Hello! Solr (i.e. Tika) throws a "zip bomb" exception with certain documents we have in our Sharepoint system. I have used the tika-app.jar directly to extract the document in question and it does _not_ throw an exception and extracts the contents just fine. So it would seem Solr is doing

Query regarding LTR plugin in solr

2018-04-09 Thread Prateek Agarwal
Hi, I'm working on ltr feature in solr. I have a feature like : ''' { "store" : "my_feature_store", "name" : "in_aggregated_terms", "class" : "org.apache.solr.ltr.feature.SolrFeature", "params" : { "q" : "{!func}scale(query({!payload_score f=aggregated_terms func=max

SOLR with Sitecore SXA

2018-04-09 Thread Saul Nachman
Do I ask for a subscription here first and then mail the main thread? Regards Saul

Re: Default Index config

2018-04-09 Thread Shawn Heisey
On 4/9/2018 4:04 AM, mganeshs wrote: > Regarding high CPU: while troubleshooting, we found that merge threads keep running and take most of the CPU time (as per VisualVM). With a one second autoSoftCommit, nearly constant indexing will produce a lot of very small index segments. 
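
A back-of-the-envelope sketch of the point made in this reply, assuming (as a simplification not stated in the thread) that under constant indexing every soft commit flushes one new small segment. The result is an upper bound that ignores merging, and the function name is illustrative only.

```python
def segments_flushed(duration_s: int, commit_interval_s: int) -> int:
    """Upper bound on segments flushed if each commit creates one segment."""
    if commit_interval_s <= 0:
        raise ValueError("commit interval must be positive")
    return duration_s // commit_interval_s


one_hour = 3600
per_hour_1s = segments_flushed(one_hour, 1)    # one-second autoSoftCommit
per_hour_15s = segments_flushed(one_hour, 15)  # a longer, assumed interval
print(per_hour_1s, per_hour_15s)
```

Under this assumption, a longer soft-commit interval gives the merge threads proportionally fewer small segments to combine.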

RE: PreAnalyzed URP and SchemaRequest API

2018-04-09 Thread Markus Jelsma
Hello David, The remote client has everything on the class path but just calling setTokenStream is not going to work. Remotely, all I get from the SchemaRequest API is an AnalyzerDefinition. I haven't found any Solr code that allows me to transform that directly into an analyzer. If I had that, it

Re: Solr join With must clause in fq

2018-04-09 Thread Mikhail Khludnev
it might make sense to test on the recent versions of Solr. On Sun, Apr 8, 2018 at 8:21 PM, manuj singh wrote: > Hi all, > I am trying to debug a problem which i am facing and need some help. > > I have a solr query which does join on 2 different cores. so lets say my >

Re: Match a phrase like "Apple iPhone 6 32GB white" with "iphone 6"

2018-04-09 Thread Alessandro Benedetti
Hi Sami, I agree with Mikhail: if you have relatively complex data, you could curate your own knowledge base for products and use it for Named Entity Recognition. You can then search a compatible_with field for the extracted entity. If the scenario is simpler using the analysis chain you mentioned

Re: Default Index config

2018-04-09 Thread mganeshs
Hi Shawn, Regarding high CPU: while troubleshooting, we found that merge threads keep running and take most of the CPU time (as per VisualVM). GC is not causing any issue, as we use the default GC and also tried G1 as you suggested over here

Re: Match a phrase like "Apple iPhone 6 32GB white" with "iphone 6"

2018-04-09 Thread Adhyan Arizki
You can just use synonyms for that.. rather hackish but it works On Mon, 9 Apr 2018, 05:06 Sami al Subhi, wrote: > I think this filter will output the desired result: > > indexing: > "iPhone 6" will be indexed as "iphone 6"

RE: ZKPropertiesWriter error DIH (SolrCloud 6.6.1)

2018-04-09 Thread msaunier
Bumping this thread. Thanks -Original Message- From: msaunier [mailto:msaun...@citya.com] Sent: Thursday, April 5, 2018 10:46 To: solr-user@lucene.apache.org Subject: RE: ZKPropertiesWriter error DIH (SolrCloud 6.6.1) I used this process to create the DIH: 1. Create the BLOB

Re: Solr 7.3.0 loading OpenNLPExtractNamedEntitiesUpdateProcessorFactory

2018-04-09 Thread Ryan Yacyshyn
Hi Shawn, I'm pretty sure the paths to load the jars in analysis-extras are correct; the jars in /contrib/analysis-extras/lib load fine. I verified this by changing the name of solr.OpenNLPTokenizerFactory to solr.OpenNLPTokenizerFactory2 and saw the new error. Changing it back to