Re: SOLR upgrade

2021-02-12 Thread David Hastings
i generally will only upgrade every other release. since i started with 1.4, went to 3->5->7.X, and never EVER a .0 or an even .X release, On Fri, Feb 12, 2021 at 12:01 PM Ishan Chattopadhyaya < ichattopadhy...@gmail.com> wrote: > Just avoid 8.8.0 for the moment, until 8.8.1 is released. 8.7.x

Re: Frequent Index Replication Failure in solr.

2020-11-13 Thread David Hastings
looks like your repeater is grabbing a file that the master merged into a different file, why not lower how often you go from master->repeater, and/or dont commit so often so you can make the index faster On Fri, Nov 13, 2020 at 12:13 PM Parshant Kumar wrote: > All,please help on this > > On

Re: converting string to solr.TextField

2020-10-16 Thread David Hastings
l the docs into an > existing index, things like changing from stored=true to > stored=false, adding new fields, deleting fields (although the > meta-data for the field is still kept around) etc. > > > On Oct 16, 2020, at 3:57 PM, David Hastings < > hastings.recurs...@gmail

Re: converting string to solr.TextField

2020-10-16 Thread David Hastings
and we > need to be free to make important improvements with time." > > And all that aside, you have to re-index all the docs anyway or > your search results will be inconsistent. So leaving aside the > impossible task of covering all the possibilities on the fly, it’s > b

Re: converting string to solr.TextField

2020-10-16 Thread David Hastings
"If you want to keep the same field name, you need to delete all of the documents in the index, change the schema, and reindex." actually doesnt re-indexing a document just delete/replace anyways assuming the same id? On Fri, Oct 16, 2020 at 3:07 PM Alexandre Rafalovitch wrote: > Just as a

Re: Solr endpoint on the public internet

2020-10-08 Thread David Hastings
dler. And block Config API to avoid attackers creating new > handlers. > > Regards, > Alex. > >> On Thu, 8 Oct 2020 at 14:54, David Hastings wrote: >> >> Well that’s why I suggested deleting the update handler :) >> >>>> On Oct 8, 2020

Re: Solr endpoint on the public internet

2020-10-08 Thread David Hastings
Well that’s why I suggested deleting the update handler :) > On Oct 8, 2020, at 2:52 PM, Walter Underwood wrote: > > Let me know where it is and I’ll delete all the documents in your collection. > It is easy, just one HTTP request. > >

Re: Master/Slave

2020-09-30 Thread David Hastings
>whether we should expect Master/Slave replication also to be deprecated it better not ever be depreciated. it has been the most reliable mechanism for its purpose, solr cloud isnt going to replace standalone, if it does, thats when I guess I stop upgrading or move to elastic On Wed, Sep 30,

Re: SOLR indexing takes longer time

2020-08-18 Thread David Hastings
Another thing to mention is to make sure the indexer you build doesnt send commits until its actually done. Made that mistake with some early in house indexers. On Tue, Aug 18, 2020 at 9:38 AM Charlie Hull wrote: > 1. You could write some code to pull the items out of Mongo and dump > them to

Number of times in document

2020-08-12 Thread David Hastings
Is there any way to do a query for the minimum number of times a phrase or string exists in a document? This has been a request from some users as other search services (names not to be mentioned) have such a functionality. Ive been using solr since 1.4 and i think ive tried finding this ability

Re: Multiple "df" fields

2020-08-11 Thread David Hastings
why not use a copyfield for indexing? On Tue, Aug 11, 2020 at 9:59 AM Edward Turner wrote: > Hi all, > > Is it possible to have multiple "df" fields? (We think the answer is no > because our experiments did not work when adding multiple "df" values to > solrconfig.xml -- but we just wanted to

Re: solr query returns items with spaces removed

2020-07-29 Thread David Hastings
"Oh, and returning 100K docs is an anti-pattern, if you really need that many docs consider cursorMark and/or Streaming." er, i routinely ask for 2+ million records into a single file based on a query. I mean not into a web application or anything, its meant to be processed after the fact, but

Re: Meow attacks

2020-07-28 Thread David Hastings
so, your zookeeper/solr servers have public facing addresses/ports? On Tue, Jul 28, 2020 at 4:41 PM Odysci wrote: > Folks, > > I suspect one of our Zookeeper installations on AWS was subject to a Meow > attack ( > >

Re: sorting help

2020-07-15 Thread David Hastings
ercaseFilter in front of your patternreplace, > you’re removing uppercase characters. > > Best, > Erick > > > On Jul 15, 2020, at 3:06 PM, David Hastings < > hastings.recurs...@gmail.com> wrote: > > > > howdy, > > i have a field that sorts fine all ot

sorting help

2020-07-15 Thread David Hastings
howdy, i have a field that sorts fine all other content, and i cant seem to debug why it wont sort for me on this one chunk of it. "sort":"alphatitle asc", "debugQuery":"on", "_":"1594733127740"}}, "response ":{"numFound":3,"start":0,"docs":[ { "title":"Money orders", { "title":"Finance,

Re: How to determine why solr stops running?

2020-06-29 Thread David Hastings
wing money/ram/ssd at the problem is just the best > > > answer. > > > > > > On Mon, Jun 29, 2020 at 11:38 AM Ryan W wrote: > > > > > >> Thanks everyone. Just to give an update on this issue, I bumped the > RAM > > >> available to Solr

Re: How to determine why solr stops running?

2020-06-29 Thread David Hastings
lem since. > > > On Tue, Jun 16, 2020 at 1:00 PM David Hastings < > hastings.recurs...@gmail.com> > wrote: > > > me personally, around 290gb. as much as we could shove into them > > > > On Tue, Jun 16, 2020 at 12:44 PM Erick Erickson > > > wrote: > &g

Re: Solr 7.6 optimize index size increase

2020-06-16 Thread David Hastings
I cant give you a 100% true answer but ive experienced this, and what "seemed" to happen to me was that the optimize would start, and that will drive the size up by 3 fold, and if you run out of disk space in the process the optimize will quit, since it cant optimize, and leave the live index pieces

Re: How to determine why solr stops running?

2020-06-16 Thread David Hastings
. the sum of the heap allocations across all your JVMs should be below > that percentage. See Uwe Schindler's mmapdirectiry blog... > > Shot in the dark... > > On Tue, Jun 16, 2020, 11:51 David Hastings > wrote: > > > To add to this, i generally have solr start

Re: How to determine why solr stops running?

2020-06-16 Thread David Hastings
To add to this, i generally have solr start with this: -Xms31000m -Xmx31000m and the only other thing that runs on them are MariaDB Galera cluster nodes that are not in use (aside from replication) the 31gb is not an accident either, you dont want 32gb. On Tue, Jun 16, 2020 at 11:26 AM Shawn

Re: Getting rid of zookeeper

2020-06-09 Thread David Hastings
Zookeeper is annoying to both set up and manage, but then again the same thing can be said about solr cloud. not certain why you would want to deal with either On Tue, Jun 9, 2020 at 3:29 PM S G wrote: > Hello, > > I recently stumbled across KIP-500: Replace ZooKeeper with a Self-Managed >

Re: Script to check if solr is running

2020-06-08 Thread David Hastings
> > Why have a cold backup and then switch? > my current set up is: 1. master indexer 2. master slave on a release/commit basis 3. 3 live slave searching nodes in two data different centers the three live nodes are in front of nginx load balancing and they are mostly hot but not all of them, i

Re: What is the logical order of applying sorts in SOLR?

2020-05-16 Thread David Hastings
the bq parameter, heres a SO thread for it: https://stackoverflow.com/questions/45150856/how-to-know-when-to-use-solr-bq-vs-bf-and-how-to-apply-query-logic On Sat, May 16, 2020 at 6:27 PM Stephen Lewis Bianamara < stephen.bianam...@gmail.com> wrote: > Hi Paras, > > I'm not sure I follow. How

Re: Stopwords impact on search

2020-04-24 Thread David Hastings
you should never use the stopword filter unless you have a very specific purpose On Fri, Apr 24, 2020 at 8:33 AM Steven White wrote: > Hi everyone, > > What is, if any, the impact of stopwords in to my search ranking quality? > Will my ranking improve is I do not index stopwords? > > I'm trying

Re: Solr index size has increased in solr 7.7.2

2020-04-15 Thread David Hastings
i wouldnt worry about the index size until you get above a half terabyte or so. adding doc values and other features means you sacrifice things that dont matter, like size. memory and ssd's are cheap. On Wed, Apr 15, 2020 at 1:21 PM Rajdeep Sahoo wrote: > Hi all > We are migrating from solr

Re: How do *you* restrict access to Solr?

2020-03-16 Thread David Hastings
master slave is the idea that you have an indexing server you do all indexing to and a search server that replicates the index, to deliver the results etc. if you keep the indexer separate you can tune it differently as well as protect the data. also means you can remove the delete/update

Re: How do *you* restrict access to Solr?

2020-03-16 Thread David Hastings
Honestly? I know this isnt what youre going to want to hear, but security through obscurity. no one else knows what port the servers on, and its not accessible from anything outside of the internal network. if your solr install can be accessed from an external IP you have much larger issues.

Re: [SUSPICIOUS] Re: Best Practises around relevance tuning per query

2020-02-18 Thread David Hastings
I don’t think anyone is responding because it’s too focused of a use case, where you just simply have to figure out an alternative on your own. > On Feb 19, 2020, at 12:28 AM, Ashwin Ramesh wrote: > > ping on this :) > >> On Tue, Feb 18, 2020 at 11:50 AM Ashwin Ramesh wrote: >> >> Hi, >>

Re: Re-creating deleted Managed Stopwords lists results in error

2020-02-17 Thread David Hastings
ful stuff. > > Luckily for you, the patent on that has expired. :-) > > wunder > Walter Underwood > wun...@wunderwood.org > http://observer.wunderwood.org/ (my blog) > > > On Feb 17, 2020, at 10:46 AM, David Hastings < > hastings.recurs...@gmail.com> wrote

Re: Re-creating deleted Managed Stopwords lists results in error

2020-02-17 Thread David Hastings
i use stop words for building shingles into "interesting phrases" for my machine teacher/students, so i wouldnt say theres no reason, however my use case is very specific. Otherwise yeah, theyre gone for all practical reasons/search scenarios. On Mon, Feb 17, 2020 at 1:41 PM Walter Underwood

Re: How to compute index size

2020-02-03 Thread David Hastings
Yup, I find the right calculation to be as much ram as the server can take, and as much SSD space as it will hold, when you run out, buy another server and repeat. machines/ram/SSD's are cheap. just get as much as you can. On Mon, Feb 3, 2020 at 11:59 AM Walter Underwood wrote: > What he

Re: Easiest way to export the entire index

2020-01-29 Thread David Hastings
i do this often and just create a 30gb file using wget, On Wed, Jan 29, 2020 at 10:21 AM Emir Arnautović < emir.arnauto...@sematext.com> wrote: > Hi Amanda, > I assume that you have all the fields stored so you will be able to export > full document. > > Several thousands records should not be

Re: How to negate numeric range query - or - how to get records NOT matching a certain numeric range

2020-01-24 Thread David Hastings
just tried "fq":"NOT year:[1900 TO 2000]"}}, on my data set and also worked as expected, mind if i ask why: (u_lastLendingDate_combined_ls_ns:([8610134693 TO 8611935823])) there are ()'s around your range query? On Fri, Jan 24, 2020 at 11:01 AM David Hastings < hasti
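A minimal sketch of building the negated range filter described in this message. The helper names are hypothetical. As far as I know, Solr accepts a purely negative clause at the top level of an fq, while a negative clause nested inside parentheses usually needs an explicit `*:*` to subtract from, so both forms are shown.

```python
def not_in_range(field, low, high):
    """Top-level negated range filter, as used in the message."""
    return f"NOT {field}:[{low} TO {high}]"


def not_in_range_nested(field, low, high):
    """Variant for use inside a larger boolean expression.

    Assumption: a nested pure-negative clause needs an explicit
    match-all (*:*) to subtract from.
    """
    return f"(*:* -{field}:[{low} TO {high}])"


fq_simple = not_in_range("year", 1900, 2000)
fq_nested = not_in_range_nested("year", 1900, 2000)
```

The first form matches the `fq` quoted above; the second is only worth reaching for when the negation sits inside parentheses.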

Re: How to negate numeric range query - or - how to get records NOT matching a certain numeric range

2020-01-24 Thread David Hastings
having fq=NOT field:value works for me, On Fri, Jan 24, 2020 at 10:56 AM Sebastian Riemer wrote: > Hi all! > > > > Consider a query containing fq-params like this: > > > > "*fq*":["tenant_id:1", > > "u_markedAsDeleted_b:false", > > "u_id_s:[* TO *]", > >

Re: Re: Re: Anyone have experience with Query Auto-Suggestor?

2020-01-24 Thread David Hastings
:54 AM, Audrey Lorberfeld - > > audrey.lorberf...@ibm.com wrote: > > > > > > David, > > > > > > Thank you, that is useful. So, would you recommend using a (clean) > > field over an external dictionary file? We have lots of "top qu

Re: Re: Anyone have experience with Query Auto-Suggestor?

2020-01-20 Thread David Hastings
a (clean) field > over an external dictionary file? We have lots of "top queries" and measure > their nDCG. A thought was to programmatically generate an external file > where the weight per query term (or phrase) == its nDCG. Bad idea? > > Best, > Audrey > > On 1

Re: Anyone have experience with Query Auto-Suggestor?

2020-01-20 Thread David Hastings
Ive used this quite a bit, my biggest piece of advice is to choose a field that you know is clean, with well defined terms/words, you dont want an autocomplete that has a massive dictionary, also it will make the start/reload times pretty slow On Mon, Jan 20, 2020 at 11:47 AM Audrey Lorberfeld -

Re: Failed to connect to server

2020-01-17 Thread David Hastings
something like this in your solr config: autosuggest false text 0.005 DocumentDictionaryFactory title weight true true On Fri, Jan 17, 2020 at 12:02 PM rhys J wrote: > On Thu, Jan 16, 2020 at 3:48 PM David Hastings < > hastings.recurs...@gmail.com> > wrote: > > &

Re: Failed to connect to server

2020-01-16 Thread David Hastings
> 'Error: Solr core is loading' do you have any suggesters or anything configured that would get rebuilt? On Thu, Jan 16, 2020 at 3:41 PM rhys J wrote: > On Thu, Jan 16, 2020 at 3:27 PM Edward Ribeiro > wrote: > > > A regular update is a delete followed by an indexing of the document. So >

Re: SolrCloud upgrade concern

2020-01-16 Thread David Hastings
ha, im on that thread, didnt know they got stored on a site, thats good to know! -i stand by what i said in there. so i have nothing more to add On Thu, Jan 16, 2020 at 3:29 PM Arnold Bronley wrote: > Hi, > > I am trying to upgrade my system from Solr master-slave architecture to > SolrCloud

Re: does copyFields increase indexe size ?

2019-12-26 Thread David Hastings
The field is stored somewhere > On Dec 26, 2019, at 3:22 PM, Nicolas Paris wrote: > > Hi Eric > > Below a part of the managed-schema. There is 1k section* fields. The > second experience, I removed the copyField, droped the collection and > re-indexed the whole. To mesure the index size, I

Re: Help for importing large data (approx. 8GB) from old solr version to new solr version

2019-12-25 Thread David Hastings
Exactly. Although I’m a bit curious why you’re going a .1 version up, I always wait until an x2, so I won’t be upgrading until 9.3 > On Dec 25, 2019, at 9:45 AM, Erick Erickson wrote: > > Should work. At any rate, just try it. Since all you’re doing is copying > data, even if the new

Re: xms/xmx choices

2019-12-09 Thread David Hastings
0/1000 secs : 208, 0.47% Raw SOLR over 1000/1000 secs : 5261, 11.97% post solr changes: 28369 searches Complete SOLR average : 4.77 / 10th seconds for SOLR Raw SOLR over 1/1000 secs : 94, 0.33% Raw SOLR over 1000/1000 secs : 3583, 12.63% On Fri, Dec 6, 2019 at 9:39 AM David Hastings

Re: Search returning unexpected matches at the top

2019-12-06 Thread David Hastings
whats the field type for: clt_ref_no *_no isnt a default dynamic character, and owl-2924-8 usually translates into owl 2924 8 David J. Hastings | Lead Developer dhasti...@wshein.com | 716.882.2600 x 176 William S. Hein & Co., Inc. 2350 North Forest Road | Getzville, NY 14068

Re: xms/xmx choices

2019-12-06 Thread David Hastings
t; > > We added about 2.3m docs, then I replicated it to the production master > and since there was a change it replicated out to the slave node the gc > came from > > > > I’ll set one of the slaves to 31/31 and force all load to that one and > see how she does. Thanks!

Re: xms/xmx choices

2019-12-05 Thread David Hastings
and if this may be of use: https://imgur.com/a/qXBuSxG just been more or less winging the options since solr 1.3 On Thu, Dec 5, 2019 at 2:41 PM Shawn Heisey wrote: > On 12/5/2019 11:58 AM, David Hastings wrote: > > as of now we do an xms of 8gb and xmx of 60gb, generall

Re: xms/xmx choices

2019-12-05 Thread David Hastings
That probably isnt enough data, so if youre interested: https://gofile.io/?c=rZQ2y4 On Thu, Dec 5, 2019 at 2:52 PM David Hastings wrote: > I know theres no hard answer, and I know the Xms and Xmx should be the > same, but it was a set it and forget it sort of thing from years a

Re: xms/xmx choices

2019-12-05 Thread David Hastings
youd like I would be happy to provide, this is interesting. On Thu, Dec 5, 2019 at 2:41 PM Shawn Heisey wrote: > On 12/5/2019 11:58 AM, David Hastings wrote: > > as of now we do an xms of 8gb and xmx of 60gb, generally through the > > dashboard the JVM hangs around 16gb. I k

Re: From solr to solr cloud

2019-12-05 Thread David Hastings
are you noticing performance decreases in stand alone solr as of now? On Thu, Dec 5, 2019 at 2:29 PM Vignan Malyala wrote: > Hi > I currently have 500 collections in my stand alone solr. Bcoz of day by day > increase in Data, I want to convert it into solr cloud. > Can you suggest me how to do

xms/xmx choices

2019-12-05 Thread David Hastings
Hey all, over time ive adjusted and changed the solr Xms/Xmx various times with not too much thought aside from more is better, but ive noticed in many of the emails the recommended values are much lower than the numbers ive historically put in. i never really bothered to change them as the

Re: Exact match

2019-12-02 Thread David Hastings
if the query is in quotes it will work. also, not sure if youve been following, but get rid of: StopFilterFactory and all stopwords, or just make your stop word file empty if you need it to work in non quotes, add them to the query post submission ? On Mon, Dec 2, 2019 at 3:44 PM OTH wrote: >

Re: A Last Message to the Solr Users

2019-11-27 Thread David Hastings
Personally I found nothing in solr cloud worth changing from standalone for, and just added more complications, more servers, and required becoming an expert/knowledgeable in zoo keeper, id rather spend my time developing than becoming a systems administrator On Wed, Nov 27, 2019 at 3:45 AM Mark

Re: Using an & in an indexed field and then querying for it.

2019-11-25 Thread David Hastings
wrote: > On Mon, Nov 25, 2019 at 2:36 PM David Hastings < > hastings.recurs...@gmail.com> > wrote: > > > its breaking on the & because its in the url and you are most likely > > sending a get request to solr. you should send it as post or as %26 > > &

Re: Using an & in an indexed field and then querying for it.

2019-11-25 Thread David Hastings
its breaking on the & because its in the url and you are most likely sending a get request to solr. you should send it as post or as %26 On Mon, Nov 25, 2019 at 2:32 PM rhys J wrote: > I have some fields that have text like so: > > Reliable Van & Storage. > > They indexed fine when I used curl
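A small sketch of the `%26` advice above, using only the Python standard library. The field name `name` is an assumption for illustration; the value is the one quoted in the thread.

```python
from urllib.parse import quote, urlencode

# Value taken from the thread; the field name "name" is hypothetical.
value = "Reliable Van & Storage"

# Percent-encode the value so '&' is not read as a parameter separator.
encoded_value = quote(value, safe="")

# urlencode() escapes a whole parameter dict (spaces become '+').
query_string = urlencode({"q": f'name:"{value}"'})
```

Either string can be appended to a GET URL. Sending the same parameters in a POST body avoids the URL-length and escaping concerns mentioned in the message.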

Re: How to tell which core was used based on Json or XML response from Solr

2019-11-25 Thread David Hastings
you missed the part about adding = to the query: =all=mega returns for me: "responseHeader":{ "status":0, "QTime":0, "params":{ "q":"*:*", "core":"mega", "df":"text", "q.op":"AND", "rows":"10", "echoParams":"all"}}, also we are a perl shop as

Re: How to tell which core was used based on Json or XML response from Solr

2019-11-22 Thread David Hastings
at 1:43 PM rhys J wrote: > On Fri, Nov 22, 2019 at 1:39 PM David Hastings < > hastings.recurs...@gmail.com> > wrote: > > > 2 things (maybe 3): > > 1. dont have this code facing a client thats not you, otherwise anyone > > could view the source and see w

Re: How to tell which core was used based on Json or XML response from Solr

2019-11-22 Thread David Hastings
2 things (maybe 3): 1. dont have this code facing a client thats not you, otherwise anyone could view the source and see where the solr server is, which means they can destroy your index or anything they want. put at the very least a simple api/front end in between the javascript page for the

Re: Highlighting on typing in search box

2019-11-21 Thread David Hastings
you can modify the result in this SO question to fit your needs: https://stackoverflow.com/questions/16742610/retrieve-results-from-solr-using-jquery-calls On Thu, Nov 21, 2019 at 10:42 AM rhys J wrote: > Are there any recommended APIs or code examples of using Solr and then > highlighting

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-08 Thread David Hastings
a hack that mostly works. > > Infoseek had phrase IDF and it was a killer algorithm for relevance. > > wunder > Walter Underwood > wun...@wunderwood.org > http://observer.wunderwood.org/ (my blog) > > > On Nov 8, 2019, at 11:08 AM, David Hastings < > hastings.

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-08 Thread David Hastings
;> > >>>>>>>>>> Second, I have no idea what this will do. Are the equal signs > typos? > >>>>>>>> Used by custom code? > >>>>>>>>>> > >>>>>>>>>>>> > >>

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-08 Thread David Hastings
> >>>>>>>> > >>>>>>>> Third, the easiest way to see what’s happening under the covers > is to > >>>>>> add “=true” to the query and look at the parsed query. Ignore > all the > >>>>>> relevance calculations for the nonce, or specify “=qu

Re: Good Open Source Front End for Solr

2019-11-07 Thread David Hastings
well thats pretty slick On Thu, Nov 7, 2019 at 1:59 PM Erik Hatcher wrote: > Blacklight: http://projectblacklight.org/ > > ;) > > > > > On Nov 6, 2019, at 11:16 PM, Java Developer > wrote: > > > > Hi, > > > > What is the best open source front-end for Solr >

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-07 Thread David Hastings
;>> > >>>>>> The solr.StopFilter removes all tokens that are stopwords. Those > words > >>> will > >>>>>>> not be in the index, so they can never match a query. > >>>>>> > >>>>>> > >>

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-05 Thread David Hastings
s here ? >positionIncrementGap="100" omitNorms="false" > > > > > > > words="stopwords.txt"/> > > > > > On 5 Nov 2019, at 14:15, David Hasti

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-05 Thread David Hastings
The first thing you should do is remove any reference to stop words and never use them, then re-index your data and try it again. On Tue, Nov 5, 2019 at 9:14 AM Guilherme Viteri wrote: > Hi, > > I am performing a search to match a name (text_field), however this term > contains 'and' and 'a'

Re: Delete documents from the Solr index using SolrJ

2019-11-04 Thread David Hastings
e, id won't be same. > Suppose, I have a doc with id : 20 > Now, it's newer version would be either 20.1 or 22 > What in this case? > -Original Message- > From: David Hastings [mailto:hastings.recurs...@gmail.com] > Sent: 04 November 2019 20:04 > To: solr-user@lucene.apache.

Re: Delete documents from the Solr index using SolrJ

2019-11-04 Thread David Hastings
when you add a new document using the same "id" value as another it just over writes it On Mon, Nov 4, 2019 at 9:30 AM Khare, Kushal (MIND) < kushal.kh...@mind-infotech.com> wrote: > Could you please let me know how to achieve that ? > > > -Original Message- > From: Jörn Franke

Re: Re: POS Tagger

2019-10-25 Thread David Hastings
oh i see what you mean, sorry, i explained it incorrectly. those sentences are what would be in the index, and a general search for 'rush limbaugh' would come back with results where he is an entity higher than if it was two words in a sentence On Fri, Oct 25, 2019 at 12:12 PM David Hastings

Re: Re: POS Tagger

2019-10-25 Thread David Hastings
M > audrey.lorberf...@ibm.com > > > On 10/25/19, 12:06 PM, "David Hastings" > wrote: > > I use them for query boosting, so if someone searches for: > > i dont want to rush limbaugh out the door > vs > i talked to rush limbaugh through t

Re: POS Tagger

2019-10-25 Thread David Hastings
nch. The processing time is > mitigated by the spark-corenlp package which distribute the process over > multiple node. > > Also I am interesting in the way you use POS information within solr > queries, or solr fields. > > Thanks, > On Fri, Oct 25, 2019 at 10:42:43AM -0400, D

Re: Re: POS Tagger

2019-10-25 Thread David Hastings
h > IBM > audrey.lorberf...@ibm.com > > > On 10/25/19, 10:30 AM, "David Hastings" > wrote: > > Do you mean for entity extraction? > I make a LOT of use from the stanford nlp project, and get out the > entities > and use them for different pu

Re: POS Tagger

2019-10-25 Thread David Hastings
https://nlp.stanford.edu/ On Fri, Oct 25, 2019 at 10:29 AM David Hastings < hastings.recurs...@gmail.com> wrote: > Do you mean for entity extraction? > I make a LOT of use from the stanford nlp project, and get out the > entities and use them for different purposes in solr >

Re: POS Tagger

2019-10-25 Thread David Hastings
Do you mean for entity extraction? I make a LOT of use from the stanford nlp project, and get out the entities and use them for different purposes in solr -Dave On Fri, Oct 25, 2019 at 10:16 AM Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote: > Hi All, > > Does anyone use a POS tagger with

Re: Re: Re: Re: Re: Protecting Tokens from Any Analysis

2019-10-09 Thread David Hastings
uments, multi-language, > and we get ~80k-100k queries/day) > > Are you using edismax? > > -- > Audrey Lorberfeld > Data Scientist, w3 Search > IBM > audrey.lorberf...@ibm.com > > > On 10/9/19, 3:11 PM, "David Hastings" > wrote: > > if you

Re: Re: Re: Re: Protecting Tokens from Any Analysis

2019-10-09 Thread David Hastings
ajority of you all do NOT use > stop words? > > -- > Audrey Lorberfeld > Data Scientist, w3 Search > IBM > audrey.lorberf...@ibm.com > > > On 10/9/19, 11:14 AM, "David Hastings" > wrote: > > However, with all that sa

Re: Re: Re: Protecting Tokens from Any Analysis

2019-10-09 Thread David Hastings
majority of you all do NOT use stop > words? > > -- > Audrey Lorberfeld > Data Scientist, w3 Search > IBM > audrey.lorberf...@ibm.com > > > On 10/9/19, 11:14 AM, "David Hastings" > wrote: > > However, with all that said, stopwords CAN

Re: Re: Re: Re: Protecting Tokens from Any Analysis

2019-10-09 Thread David Hastings
oh and by 'non stop' i mean close enough for me :) On Wed, Oct 9, 2019 at 2:59 PM David Hastings wrote: > if you have anything close to a decent server you wont notice it all. im > at about 21 million documents, index varies between 450gb to 800gb > depending on merges, and about 60k

Re: Protecting Tokens from Any Analysis

2019-10-09 Thread David Hastings
Google in relevance tests, probably because of phrase IDF. > > More Like This could do the same thing, but it seems to be really slow and > not especially useful as a search component. > > wunder > Walter Underwood > wun...@wunderwood.org > http://observer.wunderwood.org/ (my blo

Re: Re: Protecting Tokens from Any Analysis

2019-10-09 Thread David Hastings
.@ibm.com wrote: > > > > Hey Alex, > > > > Thank you! > > > > Re: stopwords being a thing of the past due to the affordability of > hardware...can you expand? I'm not sure I understand. > > > > -- > > Audrey Lorberfeld > > Data Scientist,

Re: Re: Protecting Tokens from Any Analysis

2019-10-09 Thread David Hastings
Lorberfeld > > Data Scientist, w3 Search > > IBM > > audrey.lorberf...@ibm.com > > > > > > On 10/8/19, 1:01 PM, "David Hastings" > wrote: > > > > Another thing to add to the above, > > > > > > IT:ibm. I

Re: Protecting Tokens from Any Analysis

2019-10-08 Thread David Hastings
Another thing to add to the above, > > IT:ibm. In this case, we would want to maintain the colon and the > capitalization (otherwise “it” would be taken out as a stopword). > stopwords are a thing of the past at this point. there is no benefit to using them now with hardware being so cheap. On

Re: SolR: How to sort (or boost) by Availability dates

2019-09-24 Thread David Hastings
It sounds like you want to do a normal search but only show available items. You could simply just add a fq parameter with dynamic values based on the current date: fq=available_from:[* TO $todays_date] AND available_to:[$todays_date TO *] On Tue, Sep 24, 2019 at 9:41 AM Audrey Lorberfeld -
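A rough sketch of generating that filter from the current date. It assumes an item counts as available when its start date is not after today and its end date is not before today, and that both fields are Solr date fields; the function name is hypothetical.

```python
from datetime import datetime, timezone


def availability_filter(now=None):
    """Build an fq string that keeps only currently available items."""
    now = now or datetime.now(timezone.utc)
    # Solr date fields expect ISO-8601 timestamps in UTC with a trailing 'Z'.
    stamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"available_from:[* TO {stamp}] AND available_to:[{stamp} TO *]"


fq = availability_filter(datetime(2019, 9, 24, 13, 41, 0, tzinfo=timezone.utc))
```

Solr's own `NOW` keyword can stand in for the timestamp, which avoids computing the date on the client at all.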

Re: Moving to solrcloud from single instance

2019-08-12 Thread David Hastings
I actually never had a problem with the index being larger than the memory for a standalone instance, but the entire index is on an SSD at least on my end On Mon, Aug 12, 2019 at 3:43 PM Erie Data Systems wrote: > I am starting the planning stages of moving from a single instance of solr > 8

Re: more like this query parser with faceting

2019-08-12 Thread David Hastings
ry parser? > > Is there a way to achieve the same with mlt as a request handler? > Roland > > David Hastings ezt írta (időpont: 2019. > aug. > 12., H, 20:44): > > > The easiest way will be to pass in a filter query (fq) > > > > On Mon, Aug 12, 2019 at 2:40 PM Szűc

Re: more like this query parser with faceting

2019-08-12 Thread David Hastings
The easiest way will be to pass in a filter query (fq) On Mon, Aug 12, 2019 at 2:40 PM Szűcs Roland wrote: > Hi All, > > Is there any tutorial or example how to use more like this functionality > when we have some other constraints set by the user through faceting > parameters like price range,

Re: Ranking

2019-07-27 Thread David Hastings
I can’t imagine this is actually true unless you have a default copy field and I is in one of them. Also the letter “I” is a bizarre test case > On Jul 27, 2019, at 3:40 PM, Steven White wrote: > > Hi everyone, > > I have 2 files like so: > > FA has the letter "i" only 2 times, and the file

Re: Getting list of unique values in a field

2019-07-12 Thread David Hastings
nks David. But is there a SolrJ sample code on how to do this? I need > to see one, or at least the API, so I know how to make the call. > > Steven > > On Fri, Jul 12, 2019 at 9:42 AM David Hastings < > hastings.recurs...@gmail.com> > wrote: > > > just u

Re: Getting list of unique values in a field

2019-07-12 Thread David Hastings
just use a facet on the field should work yes? On Fri, Jul 12, 2019 at 9:39 AM Steven White wrote: > Hi everyone, > > One of my indexed field is as follows: > > multiValued="false" indexed="true" required="true" stored="false"/> > > It holds the file extension of the files I'm indexing.

Re: Large Filter Query

2019-06-26 Thread David Hastings
yeah there is a performance hit but that is expected. in my scenario i pass sometimes a few thousand using this method, but i pre-process my results since its a set. you will not have any issues if you are using POST with the uri length. On Wed, Jun 26, 2019 at 3:02 PM Lucky Sharma wrote: >

Re: Large Filter Query

2019-06-26 Thread David Hastings
you can use the !terms operator and send them separated by a comma: {!terms f=id}id1,id2,..id1499,id1500 and run facets normally On Wed, Jun 26, 2019 at 2:31 PM Lucky Sharma wrote: > Hi all, > > What we are doing is, we will be having a set of unique Ids of solr > document at max 1500,
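A minimal sketch of assembling the `{!terms}` filter from a list of ids. The helper name is hypothetical, and the sketch assumes the ids themselves contain no commas.

```python
def terms_filter(field, ids):
    """Return a {!terms} filter query matching any of the given ids."""
    return "{!terms f=%s}%s" % (field, ",".join(str(i) for i in ids))


fq = terms_filter("id", ["id1", "id2", "id3"])
```

With around 1500 ids the resulting string gets long, so it is safer to send it in a POST body than in a GET URL.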

Re: Re: Query takes a long time Solr 6.1.0

2019-06-07 Thread David Hastings
There isnt anything wrong aside from your query is poorly thought out. On Fri, Jun 7, 2019 at 11:04 AM vishal patel wrote: > Any one is looking my issue?? > > Get Outlook for Android > > > From: vishal patel > Sent: Thursday, June 6, 2019

Re: strange behavior

2019-06-06 Thread David Hastings
audit_author.name:Burley,%20S.K. translates to audit_author.name:Burley, DEFAULT_OPERATOR DEFAULT_FIELD:S.K. On Thu, Jun 6, 2019 at 2:46 PM Wendy2 wrote: > > Hi, > > Why "AND" didn't work anymore? > > I use Solr 7.3.1 and edismax parser. > Could someone explain to me why the following query
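To illustrate the point, here is a deliberately simplified sketch: it decodes the URL and splits on whitespace, which is not Solr's real parser but shows why only the first token keeps the field prefix.

```python
from urllib.parse import unquote

raw = "audit_author.name:Burley,%20S.K."

# %20 decodes to a space, so the query becomes two whitespace-separated tokens.
decoded = unquote(raw)
tokens = decoded.split()

# Only tokens written as field:value carry an explicit field.
fielded = [t for t in tokens if ":" in t]
unfielded = [t for t in tokens if ":" not in t]
```

Quoting the whole value, as in `audit_author.name:"Burley, S.K."`, keeps both parts attached to the field.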

Re: Empty rows from /export?

2019-05-31 Thread David Hastings
> Ah. So docValues are managed by Solr outside of Lucene. Interesting. i was under the impression docValues are in lucene, and he is just saying that an optimize is not a re-index, its just taking the actual files that already exist in your index and arranging them and removing deletions, an

Re: Streaming Expression: get the value of the array at the specified position

2019-05-10 Thread David Hastings
no. On Fri, May 10, 2019 at 11:09 AM Nazerke S wrote: > Hi, > > I am interested in getting the value of the array at the given index. For > example, > > let(echo="b", a=array(1,2,3,4,5), b=getAt(a, 2)) should return 3. > > Is there a way to get access an array's element by indexing? > >

Re: Solr query takes a too much time in Solr 6.1.0

2019-05-10 Thread David Hastings
first inclination is your index is cold. On Fri, May 10, 2019 at 9:32 AM vishal patel wrote: > We have 2 shards and 2 replicas in Live environment. we have multiple > collections. > Some times some query takes much time(QTime=52552). There are so many > documents indexing and searching within

Re: Search using filter query on multivalued fields

2019-05-03 Thread David Hastings
another option is to index dynamically, so you would index in this case, or this is what i would do: INGREDIENT_SALT_i:40 INGREDIENT_EGG_i:20 etc and query INGREDIENT_SALT_i:[20 TO *] or an arbitrary max value, since these are percentages INGREDIENT_SALT_i:[20 TO 100] On Fri, May 3, 2019 at
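A short sketch of that approach: one dynamic integer field per ingredient at index time, and a range query at search time. The function names are hypothetical, and the `*_i` suffix assumes the schema has the usual integer dynamic field.

```python
def ingredient_fields(ingredients):
    """Map {"salt": 40} to {"INGREDIENT_SALT_i": 40} for indexing."""
    return {f"INGREDIENT_{name.upper()}_i": pct
            for name, pct in ingredients.items()}


def at_least(name, minimum, maximum=100):
    """Range query for an ingredient percentage (values are 0 to 100)."""
    return f"INGREDIENT_{name.upper()}_i:[{minimum} TO {maximum}]"


doc_fields = ingredient_fields({"salt": 40, "egg": 20})
query = at_least("salt", 20)
```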

Re: Compound Primary Keys

2019-04-24 Thread David Hastings
another thing to consider doing is just merge the two fields into the id value: "id": "USER_RECORD_12334", since its a string. On Wed, Apr 24, 2019 at 2:35 PM Gus Heck wrote: > Hi Vivek > > Solr is not a database, nor should one try to use it as such. You'll need > to adjust your thinking
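A tiny sketch of merging key parts into a single string id. The part names are hypothetical, and the sketch assumes the separator never appears inside a part, otherwise two different keys could collide.

```python
def composite_id(*parts, sep="_"):
    """Join the key parts into one string suitable for the id field."""
    return sep.join(str(p) for p in parts)


doc_id = composite_id("USER_RECORD", 12334)
```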

Re: Which fieldType to use for JSON Array in Solr 6.5.0?

2019-04-09 Thread David Hastings
Exactly, Solr is a search index, not a data store. you need to flatten your relationships. Right tool for the job etc. On Tue, Apr 9, 2019 at 4:28 PM Shawn Heisey wrote: > On 4/9/2019 2:04 PM, Abhijit Pawar wrote: > > Hello Guys, > > > > I am trying to index a JSON array in one of my

Re: Boolean Searches?

2019-03-14 Thread David Hastings
oh, thought it was implied with this: " and also use the edismax query parser" On Thu, Mar 14, 2019 at 11:38 AM Andy C wrote: > Dave, > > You don't mention what query parser you are using, but with the default > query parser you can field qualify all the terms entered in a text box by >

Re: Boolean Searches?

2019-03-14 Thread David Hastings
If you make your default operator "OR", or the q.op, and also use the edismax query parser you can use the qf field to boost the title heavily compared to the default field you are using, for example i use something like this, which may be over kill: title^100 description^50 topic^30 text i also
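A sketch of the request parameters this describes, using the boosts quoted in the message. The user query is a placeholder; `defType`, `q.op` and `qf` are standard Solr request parameters.

```python
from urllib.parse import urlencode

user_query = "money orders"  # placeholder search terms

params = {
    "q": user_query,
    "defType": "edismax",  # use the extended dismax query parser
    "q.op": "OR",          # default operator between terms
    # Title matches weigh far more than matches in the catch-all text field.
    "qf": "title^100 description^50 topic^30 text",
}

query_string = urlencode(params)
```

The boost values are the poster's own and, as the message says, may be overkill; they are a starting point to tune, not a recommendation.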
