Re: Cascading failures with replicas

2017-03-18 Thread Walter Underwood
6.3.0. No idea how it is happening, but I got two replicas on the same host after one host went down. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Mar 18, 2017, at 8:35 PM, Erick Erickson wrote: > > Hmmm, I'm totally

Re: How on EARTH do I remove 's in schema file?

2017-03-18 Thread Erick Erickson
OK, you're defining a <fieldType>. It has one or two <analyzer> sections, blah blah blah For the time being, these should pretty much be very, very similar if not identical. If you only have a single <analyzer> in the fieldType, then the same analysis chain is used both for indexing and querying. The admin UI analysis page can

Re: Cascading failures with replicas

2017-03-18 Thread Erick Erickson
Hmmm, I'm totally mystified about how Solr is "creating a new replica when one host is down". Are you saying this is happening automagically? You're right, the autoAddReplica bit is for HDFS, so having replicas just show up is completely weird. In days past, when a replica was discovered on

Re: How on EARTH do I remove 's in schema file?

2017-03-18 Thread donato
Thank you so much, Erick! I will try that! I do have one other question though... what sections do I do all of this in? I see like four or five sections with different things in them. Do I use all of those in each section or just in some? What is each section? What do they do? Thanks again for

Re: How on EARTH do I remove 's in schema file?

2017-03-18 Thread Erick Erickson
First, uncheck the "verbose" checkbox. The nitty-gritty information isn't relevant at this point. Second, hover over each of the light-gray labels like "MCF", "PRCF" and such. You'll see the element of the analysis chain that each one stands for, and the difference between the line before and this line is the

Re: How on EARTH do I remove 's in schema file?

2017-03-18 Thread vishal jain
Try "stemEnglishPossessive" to remove. On Sat, Mar 18, 2017 at 4:00 AM, donato wrote: > I have been racking my brain for days... I need to remove 's from say > "patrick's" If I search for "patrick" or "patricks" I get the same number > of > results, however, if I search
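The effect being asked for in this thread can be illustrated outside Solr. The sketch below is only a rough Python approximation of dropping a trailing possessive from a token; it is not Solr's implementation, and the function name `strip_possessive` is hypothetical.

```python
import re

# Matches a trailing apostrophe (straight or curly) followed by "s" or "S".
_POSSESSIVE = re.compile(r"['\u2019][sS]$")


def strip_possessive(token: str) -> str:
    """Return the token with a trailing possessive suffix removed, if present."""
    return _POSSESSIVE.sub("", token)


# Both spellings reduce to the same term, so they would match the same documents.
tokens = [strip_possessive(t) for t in ["patrick's", "patrick"]]
```

Note that "patricks" (no apostrophe) is left untouched by this step; in the thread it matches "patrick" through a separate stemming step.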

Re: OCR not working occasionally

2017-03-18 Thread Zheng Lin Edwin Yeo
Hi Rick, Thanks for your reply. I saw this error message for the file which has a failure. Am I able to index such files together with the other files which store text as an image together in the same indexing threads? 2017-03-19 01:02:26.610 INFO (qtp1543727556-19) [c:collection1 s:shard1

Re: Cascading failures with replicas

2017-03-18 Thread Walter Underwood
Thanks. This is a very CPU-heavy workload, with ngram fields and very long queries. 16.7 million docs. The whole cascading failure thing in search engines is hard. The first time I hit this was at Infoseek, over twenty years ago. > On Mar 18, 2017, at 12:46 PM, Erick Erickson

Re: OCR not working occasionally

2017-03-18 Thread Rick Leir
Hi Edwin, The PDF file format can store text as an image, and then you need OCR to get the text. However, text is more commonly not stored as an image in the PDF, and then you should not use OCR to get the text. Do you get an error message when you have a failure? Cheers -- Rick On March 18,

Re: How on EARTH do I remove 's in schema file?

2017-03-18 Thread donato
Erick, Here is the analysis: https://www.screencast.com/t/DKKklTXk Do you need everything on that page? I'm not sure what I am looking for here... Also, this is my current schema.xml file * DOWNLOAD HERE *. Not sure if I

RE: stemEnglishPossessive and contractions

2017-03-18 Thread donato
Hi Herman, I just noticed your post on possessives and I am having the same problem. With St. Patrick's Day coming up, people are searching our site for "patrick" and "patrick's" yet they are yielding different results. If we search for "patrick" and "patricks" they yield the same results. I want

Re: How on EARTH do I remove 's in schema file?

2017-03-18 Thread Erick Erickson
bq: I'm not too familiar with this technology yet. I tried adding that &debug=query at the end of my URL, but nothing happened. You need to look at the raw response. There should be a section at the end of the response where debug information is appended. Please just paste the relevant bits of your

Re: Cascading failures with replicas

2017-03-18 Thread Erick Erickson
bug #2: Solr shouldn't be adding replicas by itself unless you specified autoAddReplicas=true when you created the collection. It defaults to "false". So I'm not sure what's going on here. bug #3: The internal load balancers are round-robin, so this is expected. Not optimal I'll grant but

Re: Managed schema used with Cloudera MapreduceIndexerTool and morphlines?

2017-03-18 Thread Erick Erickson
Hey Jay! All I can say is "good luck with that". I do know Morphlines uses EmbeddedSolrServer to do its work. So I don't really see a good way to pluck just what you'd need for schemaless. The MapReduceIndexerTool is carried right along with Solr though. IIRC the Morphlines stuff is mostly the

OCR not working occasionally

2017-03-18 Thread Zheng Lin Edwin Yeo
Hi, I'm facing an issue where Tesseract OCR is occasionally unable to extract the words from a PDF file attached to an EML file and index them into Solr. Most of the time, however, the text can be extracted. What could be the reason that causes the file in the email attachment to fail to

Re: Group by range results

2017-03-18 Thread Zheng Lin Edwin Yeo
You can try using JSON Facet. It has a Range Facet, which you can use to group by date range. http://yonik.com/json-facet-api/#Range_Facet Regards, Edwin On 16 March 2017 at 21:32, Mikhail Ibraheem wrote: > Any help on this please? > > > > From: Mikhail
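A minimal sketch of what such a range facet request body might look like, built in Python. The field name `created_date`, the facet label `by_date`, and the dates and gap are illustrative assumptions, not values from the thread.

```python
import json


def date_range_facet(field: str, start: str, end: str, gap: str) -> dict:
    """Build a JSON Facet range facet that buckets documents by date."""
    return {
        "by_date": {  # hypothetical label for the facet in the response
            "type": "range",
            "field": field,
            "start": start,
            "end": end,
            "gap": gap,
        }
    }


# Hypothetical field and dates: one bucket per month for the first quarter of 2017.
body = json.dumps({
    "query": "*:*",
    "limit": 0,  # only the facet buckets are wanted, not the documents
    "facet": date_range_facet(
        "created_date",
        "2017-01-01T00:00:00Z",
        "2017-04-01T00:00:00Z",
        "+1MONTH",
    ),
})
```

The resulting string would be sent as the JSON body of a request to the collection's query handler.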

Re: fq performance

2017-03-18 Thread Damien Kamerman
You may want to consider a join, especially if you ever consider thousands of groups. e.g. fq={!join from=access_control_group to=doc_group}access_control_user_id:USERID On 18 March 2017 at 05:57, Yonik Seeley wrote: > On Fri, Mar 17, 2017 at 2:17 PM, Shawn Heisey
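As a sketch of how the join filter in the message above could be attached to a request, the Python below URL-encodes it as an `fq` parameter. The field names are taken from the message; the helper name `build_acl_query` and the `*:*` main query are assumptions for illustration.

```python
from urllib.parse import urlencode


def build_acl_query(user_query: str, user_id: str) -> str:
    """Return URL-encoded Solr parameters with a join-based access filter."""
    # Join from the access-control documents to the content documents,
    # keeping only content whose group is visible to the given user.
    join_filter = (
        "{!join from=access_control_group to=doc_group}"
        f"access_control_user_id:{user_id}"
    )
    return urlencode({"q": user_query, "fq": join_filter})


# "USERID" is the placeholder used in the message, not a real identifier.
params = build_acl_query("*:*", "USERID")
```

Encoding the filter with `urlencode` keeps the braces, `!`, and spaces in the local-params syntax from being mangled in the query string.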