Re: Protecting Tokens from Any Analysis

2019-10-09 Thread Walter Underwood
I wouldn’t worry about performance with that setup. I just checked on a production system with 13 million docs in four shards, so 3+ million per shard. I searched on the most common term in the title field and got a response in 31 milliseconds. This was probably not cached, because the

Re: Wild-card query behavior

2019-10-09 Thread Mikhail Khludnev
Hello, Paresh. Please examine debugQuery output, otherwise 'doesn't work' is vague. On Wed, Oct 9, 2019 at 8:31 AM Paresh wrote: > Hi All, > > I am trying wild-card query with query, filter query with and without !join > and finding it difficult to understand the SOLR behavior. > > (-)

Re: How to combine [child] and [subquery?]

2019-10-09 Thread Mikhail Khludnev
Hello, Bram. I guess [child] was recently extended. Docs might be outdated, don't hesitate to contribute doc improvement. [subquery] is a neat thing, it's just queries without relying on particular use case, if my understanding is right one may request something like _path_ field in [subquery],

How to combine [child] and [subquery?]

2019-10-09 Thread Bram Biesbrouck
Hi all, I'm diving deep into the ChildDocTransformer and its related SubQueryAugmenter. First of all, I think there's a bug in the Solr docs about [child]. It states: "This transformer returns all descendant documents of each parent document matching your query in a flat list nested inside the

Re: Dataimport: Could not load driver: com.mysql.jdbc.Driver

2019-10-09 Thread guptavaibhav35
Hi, Kindly help me solve the issue when I am connecting NEO4j with solr. I am facing this issue in my log file while I have the jar file of neo4j driver in the lib folder of my core. Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException:

Re: Dataimport: Could not load driver: com.mysql.jdbc.Driver

2019-10-09 Thread Alexandre Rafalovitch
Try referencing the jar directly (by absolute path) with a <lib> statement in the solrconfig.xml (and reloading the core). The DIH example shipped with Solr shows how it works. This will help to see if the problem is with not finding the jar or something else. Regards, Alex. On Wed, 9 Oct 2019 at

Re: How to combine [child] and [subquery?]

2019-10-09 Thread Mikhail Khludnev
I might not fully understand how you would like to combine them. A possible reason is that [subquery] expects a regular Solr response to act on, but [child] might yield something hairish. On Wed, Oct 9, 2019 at 2:40 PM Bram Biesbrouck < bram.biesbro...@reinvention.be> wrote: > Hi Mikhail, > >

Re: Dataimport: Could not load driver: com.mysql.jdbc.Driver

2019-10-09 Thread Erick Erickson
Try starting Solr with the “-v” option. That will echo all the jars that are loaded and the paths. Where _exactly_ is the jar file? You say “in the lib folder of my core”, but that leaves a lot of room for interpretation. Are you running stand-alone or SolrCloud? Exactly how do you start Solr?

Re: Re: Protecting Tokens from Any Analysis

2019-10-09 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Hey Alex, Thank you! Re: stopwords being a thing of the past due to the affordability of hardware...can you expand? I'm not sure I understand. -- Audrey Lorberfeld Data Scientist, w3 Search IBM audrey.lorberf...@ibm.com On 10/8/19, 1:01 PM, "David Hastings" wrote: Another thing to

Re: Wild-card query behavior

2019-10-09 Thread Paresh
E.g. In query, join with wild-card query using parenthesis I get error - "error-class","org.apache.solr.common.SolrException", "root-error-class","org.apache.solr.parser.ParseException"], "msg":"org.apache.solr.search.SyntaxError: Cannot parse 'solrField:(12*': Encountered \"\" at line

Re: How to combine [child] and [subquery?]

2019-10-09 Thread Bram Biesbrouck
Hi Mikhail, You're right, I should file an issue for the doc thing, I'll look into it. Thanks for pointing me towards parsing the _nest_path_ field. It's exactly what ChildDocTransformer does, indeed. Would you by any chance know why [child] and [subquery] can't be combined? They don't look too

Re: Re: Protecting Tokens from Any Analysis

2019-10-09 Thread Walter Underwood
Stopwords were used when we were running search engines on 16-bit computers with 50 Megabyte disks, like the PDP-11. They avoided storing and processing long posting lists. Think of removing stopwords as a binary weighting on frequent terms, either on or off (not in the index). With idf, we
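To make the contrast with binary stopword removal concrete, here is a minimal sketch of idf as a graded weight. It assumes the BM25-style formula log(1 + (N - df + 0.5) / (df + 0.5)); the function name and the document counts are made up for illustration and are not taken from the thread.

```python
import math


def bm25_idf(doc_freq: int, num_docs: int) -> float:
    """BM25-style inverse document frequency (assumed formula, see lead-in)."""
    return math.log(1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical corpus size and document frequencies.
NUM_DOCS = 1_000_000
very_common = bm25_idf(800_000, NUM_DOCS)  # a term found in 80% of documents
rare = bm25_idf(100, NUM_DOCS)             # a term found in 100 documents

print(f"common term idf: {very_common:.3f}")
print(f"rare term idf:   {rare:.3f}")
```

A very frequent term still contributes a small positive weight instead of being dropped from the index, so it stays searchable while barely affecting ranking.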

Re: How to combine [child] and [subquery?]

2019-10-09 Thread Bram Biesbrouck
My use case is this: I'd like solr to return my indexed document including all nested children. On top of that, some extra information about the root doc is added dynamically (the subquery). But I understand this is an advanced use case and probably not requested frequently. I'll try to around

Re: Re: Protecting Tokens from Any Analysis

2019-10-09 Thread Alexandre Rafalovitch
Stopwords (it was discussed on the mailing list several times, I recall): The idea is that it used to be part of the tricks to make the index as small as possible to allow faster search, stopwords being the most common words. These days, disk space is not an issue most of the time and there have

Re: Re: Protecting Tokens from Any Analysis

2019-10-09 Thread Erick Erickson
The theory behind stopwords is that they are “safe” to remove when calculating relevance, so we can squeeze every last bit of usefulness out of very constrained hardware (think 64K of memory. Yes kilobytes). We’ve come a long way since then and the necessity of removing stopwords from the

Re: Re: Protecting Tokens from Any Analysis

2019-10-09 Thread David Hastings
However, with all that said, stopwords CAN be useful in some situations. I combine stopwords with the shingle factory to create "interesting phrases" (not really) that i use in "my more like this" needs. for example, europe for vacation europe on vacation will create the shingle europe_vacation
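The shingle idea above can be sketched in plain Python. This is only an illustration of the effect, not Solr's analysis chain: the stopword set, the function name, and the "_" joiner are assumptions chosen to reproduce the europe_vacation example.

```python
# Hypothetical stopword set; a real stopwords file would be much longer.
STOPWORDS = {"for", "on"}


def shingles_without_stopwords(text: str, size: int = 2) -> list[str]:
    """Drop stopwords, then join each run of `size` adjacent tokens with '_'."""
    tokens = [t for t in text.lower().split() if t not in STOPWORDS]
    return ["_".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


for phrase in ("europe for vacation", "europe on vacation"):
    print(phrase, "->", shingles_without_stopwords(phrase))
```

Both phrases collapse to the same two-part shingle, which is what makes them useful as a shared "interesting phrase" for more-like-this matching.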

Re: Re: Protecting Tokens from Any Analysis

2019-10-09 Thread David Hastings
another add on, as the previous two were pretty much spot on:

Highlighting Solr 8

2019-10-09 Thread Eric Allen
Use case: I am querying a catchall field and then would like to highlight that term in 3 other fields, say a, b, and c. I already have full term vectors. From my reading and limited testing the fastest choice would be hl.method unified hl.termVectors true hl.termPositions true hl.termOffsets
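A request for this use case might be assembled roughly as follows. This is a sketch under assumptions: the catchall field name and the search term are hypothetical, and only the hl, hl.method, hl.fl and hl.requireFieldMatch parameters are used; whether this is the fastest configuration is not something the sketch demonstrates.

```python
from urllib.parse import urlencode


def highlight_params(term: str, catchall_field: str, highlight_fields: list[str]) -> dict[str, str]:
    """Build query parameters that search one field and highlight in others."""
    return {
        "q": f"{catchall_field}:{term}",
        "hl": "true",
        "hl.method": "unified",
        "hl.fl": ",".join(highlight_fields),
        # Highlight the term in the listed fields even though the query
        # targets a different (catchall) field.
        "hl.requireFieldMatch": "false",
    }


# "catchall" and "solr" are placeholder values for illustration.
params = highlight_params("solr", "catchall", ["a", "b", "c"])
print(urlencode(params))
```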

Windows Production

2019-10-09 Thread Suleiman Hasan
Dear all, I hope this email finds you well. I was just wondering if there is a way in which I can make solr in production mode (as a service) on windows server, not just on *nix systems. I'm working on a project and I need solr in production mode on windows server. Regards Suleiman Hassan

Re: Windows Production

2019-10-09 Thread David Barnett
Hi Suleiman As the solr distribution is the same regardless of Linux / Windows yes it's OK for Windows, to answer your specific question about Windows service we personally use NSSM to wrap the solr.cmd command. You then specify your arguments as you would starting solr in Linux Example *start

Solr ZK Status Page fails when using SSL feature of ZooKeeper

2019-10-09 Thread Ryan Rockenbaugh
I was going to file a bug in JIRA for this, but it said to discuss first on the user mailing list: I upgraded to Solr 8.2.0 and Zookeeper 3.5.5.  I added all the System properties and the missing "netty-all-4.1.29.Final.jar" file from zookeeper and put it in the classpath for solr.  Encrypted

Re: Wild-card query behavior

2019-10-09 Thread Mikhail Khludnev
Well, it reminds me of the regular awkward parsing issues. Try to experiment with q={!join to=... from=... v='field:12*'} or q={!join to=... from=... v=$qq}&qq=field:12* No more questions to ask. On Wed, Oct 9, 2019 at 4:39 PM Paresh wrote: > E.g. In query, join with wild-card query using parenthesis I get error
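The v=$qq form suggested above keeps the wildcard query out of the local-params string by passing it as a separate request parameter. A minimal sketch of building such a request follows; the from/to field names are hypothetical placeholders, since the thread elides them.

```python
from urllib.parse import urlencode


def join_wildcard_params(from_field: str, to_field: str, wildcard_query: str) -> dict[str, str]:
    """Build a join query whose inner query is dereferenced from the qq parameter."""
    return {
        # {!join ... v=$qq} tells the join parser to read its query from qq.
        "q": f"{{!join from={from_field} to={to_field} v=$qq}}",
        "qq": wildcard_query,
    }


# "parent_id" and "id" are placeholder field names for illustration only.
params = join_wildcard_params("parent_id", "id", "solrField:12*")
print(urlencode(params))
```

Because the wildcard expression travels in its own parameter, it does not need quoting or parentheses inside the {!join} block, which is where the parse error in the quoted message came from.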

Re: Protecting Tokens from Any Analysis

2019-10-09 Thread Walter Underwood
We did something like that with Infoseek and Ultraseek. We had a set of “glue words” that made noun phrases and indexed patterns like “noun glue noun” as single tokens. I remember Doug Cutting saying that Nutch did something similar using pairs, but using that as a prefilter instead of as a

Re: Re: Re: Re: Protecting Tokens from Any Analysis

2019-10-09 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Also, in terms of computational cost, it would seem that including most terms/not having a stop list would take a toll on the system. For instance, right now we have "ibm" as a stop word because it appears everywhere in our corpus. If we did not include it in the stop words file, we would have

Re: Re: Re: Re: Protecting Tokens from Any Analysis

2019-10-09 Thread David Hastings
oh and by 'non stop' i mean close enough for me :) On Wed, Oct 9, 2019 at 2:59 PM David Hastings wrote: > if you have anything close to a decent server you wont notice it all. im > at about 21 million documents, index varies between 450gb to 800gb > depending on merges, and about 60k searches

Re: Re: Re: Re: Protecting Tokens from Any Analysis

2019-10-09 Thread David Hastings
if you have anything close to a decent server you won't notice it at all. im at about 21 million documents, index varies between 450gb to 800gb depending on merges, and about 60k searches a day and stay sub second non stop, and this is on a single core/non cloud environment On Wed, Oct 9, 2019 at

Re: Re: Re: Protecting Tokens from Any Analysis

2019-10-09 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Wow, thank you so much, everyone. This is all incredibly helpful insight. So, would it be fair to say that the majority of you all do NOT use stop words? -- Audrey Lorberfeld Data Scientist, w3 Search IBM audrey.lorberf...@ibm.com On 10/9/19, 11:14 AM, "David Hastings" wrote: However,

Re: Protecting Tokens from Any Analysis

2019-10-09 Thread David Hastings
Yeah, I dont use it as a search, only well, finding more documents like that one :) . for my purposes i tested between 2 to 5 part shingles and ended up that the 2 part was actually giving me better results, for my use case, than using any more. I dont suppose you could point me to any of the

Re: Re: Re: Protecting Tokens from Any Analysis

2019-10-09 Thread David Hastings
only in my more like this tools, but they have a very specific purpose, otherwise no On Wed, Oct 9, 2019 at 2:31 PM Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote: > Wow, thank you so much, everyone. This is all incredibly helpful insight. > > So, would it be fair to say that the majority

Re: Re: Re: Re: Re: Protecting Tokens from Any Analysis

2019-10-09 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
True...I guess another rub here is that we're using the edismax parser, so all of our queries are inherently OR queries. So for a query like 'the ibm way', the search engine would have to: 1) retrieve a document list for: --> "ibm" (this list is probably 80% of the documents) --> "the"

Re: Re: Re: Re: Re: Protecting Tokens from Any Analysis

2019-10-09 Thread David Hastings
yup. youre going to find solr is WAY more efficient than you think when it comes to complex queries. On Wed, Oct 9, 2019 at 3:17 PM Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote: > True...I guess another rub here is that we're using the edismax parser, so > all of our queries are