Re: Schema API specifying different analysers for query and index

2021-03-02 Thread Alexandre Rafalovitch
RefGuide gives this for Adding, I would hope the Replace would be similar: curl -X POST -H 'Content-type:application/json' --data-binary '{ "add-field-type":{ "name":"myNewTextField", "class":"solr.TextField", "indexAnalyzer":{ "tokenizer":{
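A minimal sketch of what the Replace variant of that request body might look like, built in Python so the structure is easy to inspect. It assumes the Schema API's replace-field-type command takes the same keys as add-field-type (indexAnalyzer and queryAnalyzer); the two tokenizer classes below are placeholders chosen for illustration, not taken from the thread.

```python
import json


def field_type_command(command, name, index_tokenizer, query_tokenizer):
    """Build a Schema API command body with separate index and query analyzers."""
    return {
        command: {
            "name": name,
            "class": "solr.TextField",
            "indexAnalyzer": {"tokenizer": {"class": index_tokenizer}},
            "queryAnalyzer": {"tokenizer": {"class": query_tokenizer}},
        }
    }


payload = field_type_command(
    "replace-field-type",
    "myNewTextField",
    "solr.StandardTokenizerFactory",  # assumption: index-time tokenizer
    "solr.KeywordTokenizerFactory",   # assumption: query-time tokenizer
)
body = json.dumps(payload, indent=2)
print(body)
```

The printed JSON would be sent with the same curl options as in the message above (POST, Content-type:application/json) to the collection's /schema endpoint.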

Re: Multiword synonyms and term wildcards/substring matching

2021-03-02 Thread Alexandre Rafalovitch
I admit to not fully understanding the examples, but the Complex Phrase Query Parser looks like something worth at least reviewing: https://lucene.apache.org/solr/guide/8_8/other-parsers.html#complex-phrase-query-parser Also, I did not see any references to trying to copyField and process the same content in

Re: HTML sample.html not indexing in Solr 8.8

2021-02-20 Thread Alexandre Rafalovitch
Most likely issue is that your core configuration (solrconfig.xml) does not have the request handler for that. The same config may have had that in 7.x, but changed since. More details: https://lucene.apache.org/solr/guide/8_8/uploading-data-with-solr-cell-using-apache-tika.html Regards,

Re: Solr 8.0 query length limit

2021-02-18 Thread Alexandre Rafalovitch
Also, investigate if you have repeating conditions and push those into defaults in custom request handler endpoints (in solrconfig.xml). Also, Solr supports parameter substitutions, if you have repeated subconditions. Regards, Alex On Thu., Feb. 18, 2021, 7:08 a.m. Thomas Corthals, wrote:

Re: How to get case-sensitive Terms?

2021-02-18 Thread Alexandre Rafalovitch
. It is better to start new question threads for new questions. More people will pay attention. On Thu., Feb. 18, 2021, 1:31 a.m. elivis, wrote: > Alexandre Rafalovitch wrote > > What about copyField with the target being index only (docValue only?) > and > > no lowercase on the

Re: Meaning of "Index" flag under properties and schema

2021-02-17 Thread Alexandre Rafalovitch
I wonder if looking more directly at the indexes would allow you to get closer to the problem source. Have you tried comparing/exploring the indexes with Luke? It is in the Lucene distribution (not Solr), and there is a small explanation here:

Re: Why Solr questions on stackoverflow get very few views and answers, if at all?

2021-02-12 Thread Alexandre Rafalovitch
I answered quite a bunch a while ago, as part of the book writing process. I think a lot of them were missing core information like the version of Solr, so they were not very timeless. The list allows a conversation and multiple perspectives, which is better than a one-shot answer. Regards, Alex On

Re: Extract a list of the most recent field values?

2021-02-05 Thread Alexandre Rafalovitch
s query, as well as all the examples, are in json query format, in > a request body. The actual query will be sent using a custom API that only > accepts a regular URL query, with parameters. Any idea how I can rewrite the > json query above into a URL query? > > Also, it would be

Re: Extract a list of the most recent field values?

2021-02-05 Thread Alexandre Rafalovitch
This feels like basic faceting on category, but you are trying to use the latest record, rather than the count, as the sorting/grouping principle. How about using JSON Facets? https://lucene.apache.org/solr/guide/8_8/json-facet-api.html I would do the first level as a range facet and do your dates at
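A rough sketch of one way a JSON Facet request can be carried in a plain URL query, using the json.facet request parameter. This is not the range-facet layout hinted at in the reply; it is an alternative that sorts terms buckets by a nested statistic. The field names "category" and "created_dt" are hypothetical, and it assumes max() works on the date field in the Solr version in use.

```python
import json
from urllib.parse import urlencode, parse_qs

# Hypothetical field names: "category" and "created_dt".
facet = {
    "categories": {
        "type": "terms",
        "field": "category",
        "sort": "latest desc",  # order buckets by the nested stat instead of by count
        "facet": {"latest": "max(created_dt)"},
    }
}

# rows=0 because only the facet buckets are of interest here.
params = {"q": "*:*", "rows": 0, "json.facet": json.dumps(facet)}
query_string = urlencode(params)
print(query_string)
```

The resulting string is appended after "?" on the select URL, so no JSON request body is needed.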

Re: 404 Errors on update/extract

2021-02-05 Thread Alexandre Rafalovitch
ttps://lucene.apache.org/solr/guide/8_8/uploading-data-with-solr-cell-using-apache-tika.html#configuring-the-extractingrequesthandler-in-solrconfig-xml > > everything worked fine again. > > > What can I do to help updating the docs? > > > Best regards, > > Leon >

Re: 404 Errors on update/extract

2021-02-05 Thread Alexandre Rafalovitch
I think the extract handler is not defined in schemaless. This may be a change from before and the documentation is out of sync. Can you try 'techproducts' example instead of schemaless: bin/solr stop (if you are still running it) bin/solr start -e techproducts Then the import command. The Tika

Re: How to get case-sensitive Terms?

2021-02-03 Thread Alexandre Rafalovitch
. elivis, wrote: > Alexandre Rafalovitch wrote > > It is documented in the reference guide: > > https://lucene.apache.org/solr/guide/8_8/analysis-screen.html > > > > Hope it helps, > >Alex. > > > > On Tue, 2 Feb 2021 at 00:57, elivis > > >

Re: How to get case-sensitive Terms?

2021-02-02 Thread Alexandre Rafalovitch
It is documented in the reference guide: https://lucene.apache.org/solr/guide/8_8/analysis-screen.html Hope it helps, Alex. On Tue, 2 Feb 2021 at 00:57, elivis wrote: > > Alexandre Rafalovitch wrote > > Admin UI also allows you to run text string against a field definition to

Re: Apache Solr Reference Guide isn't accessible

2021-02-01 Thread Alexandre Rafalovitch
And if you need something more recent while this is being fixed, you can look right at the source in GitHub, though navigation, etc. is missing: https://github.com/apache/lucene-solr/blob/master/solr/solr-ref-guide/src/analyzers.adoc Open Source :-) Regards, Alex. On Mon, 1 Feb 2021 at

Re: How to get case-sensitive Terms?

2021-01-30 Thread Alexandre Rafalovitch
Check the field type and associated indexing chain in managed-schema of your core. It probably has the lowercase filter in it. Find a better type or make one yourself. Remember to reload the schema and reindex the content. Admin UI also allows you to run text string against a field definition to

Re: Multi-select faceting for nested documents

2021-01-25 Thread Alexandre Rafalovitch
I don't have an answer, but I feel that maybe explaining the situation in more detail would help a bit more. Specifically, you explain your data structure well, but not your actual presentation requirement in enough detail. How would you like the multi-select to work, how it is working for you

Re: Exact matching without using new fields

2021-01-21 Thread Alexandre Rafalovitch
If, during index time, your "information" and "informed" are tokenized into the same root (inform?), then you will not be able to distinguish them without storing original forms somewhere, usually with copyField. Same with information vs INFORMATION. The search happens based on indexed tokens.

Re: [Solr8.7] Chinese ZH language ?

2021-01-10 Thread Alexandre Rafalovitch
>possible analysis error: cannot change field "tizh" from You have content indexed against old incompatible definition. Deleted but not purged records count. Delete your index data or change field name during testing. Regards, Alex On Sun., Jan. 10, 2021, 9:19 a.m. Bruno Mannina, wrote: >

Re: DIH and UUIDProcessorFactory

2020-12-17 Thread Alexandre Rafalovitch
Try with the explicit URP chain too. It may work as well. Regards, Alex. On Thu, 17 Dec 2020 at 16:51, Dmitri Maziuk wrote: > > On 12/12/2020 4:36 PM, Shawn Heisey wrote: > > On 12/12/2020 2:30 PM, Dmitri Maziuk wrote: > >> Right, ```Every update request received by Solr is run through a

Re: DIH and UUIDProcessorFactory

2020-12-12 Thread Alexandre Rafalovitch
Why not? You should be able to put a URP chain after DIH, the usual way. Is there something about UUID that is special? Regards, Alex On Sat., Dec. 12, 2020, 2:55 p.m. Dmitri Maziuk, wrote: > Hi everyone, > > is there an easy way to use the stock UUID generator with DIH? We have a >

Re: is there a way to trigger a notification when a document is deleted in solr

2020-12-07 Thread Alexandre Rafalovitch
Maybe a postCommit listener? https://lucene.apache.org/solr/guide/8_4/updatehandlers-in-solrconfig.html Regards, Alex. On Mon, 7 Dec 2020 at 08:03, Pushkar Mishra wrote: > > Hi All, > > Is there a way to trigger a notification when a document is deleted in > solr? Or may be when auto purge

Re: chaining charFilter

2020-12-02 Thread Alexandre Rafalovitch
Did you reload the core for it to notice the new schema? Or try creating a new core from the same schema? If it is a SolrCloud, you also have to upload the schema to the Zookeeper. Regards, Alex. On Wed, 2 Dec 2020 at 09:19, Arturas Mazeika wrote: > Hi Solr-Team, > > The manual of

Re: Trouble with post.jar

2020-11-05 Thread Alexandre Rafalovitch
Are you sure you have the request handler for /update/extract defined in your solrconfig.xml? Not all the update request handlers are defined explicitly (you can check with Config API - /solr/hadoopDocs/config/requestHandler), but I am 99% sure that the /update/extract would be explicit because it

Re: Possible to add a default "appends" fq except for queries in the admin GUI?

2020-10-22 Thread Alexandre Rafalovitch
Why not have a custom handler endpoint for your online queries? You will be modifying them anyway to remove fq. Or even create individual endpoints for every significant use-case. You can share the configuration between them with initParams or useParams, but have more flexibility going forward.

Re: Faceting on indexed=false stored=false docValues=true fields

2020-10-19 Thread Alexandre Rafalovitch
I think this is all explained quite well in the Ref Guide: https://lucene.apache.org/solr/guide/8_6/docvalues.html DocValues is a different way to index/store values. Faceting is a primary use case where docValues are better than what 'indexed=true' gives you. Regards, Alex. On Mon, 19 Oct

Re: converting string to solr.TextField

2020-10-16 Thread Alexandre Rafalovitch
Just as a side note, > indexed="true" If you are storing 32K message, you probably are not searching it as a whole string. So, don't index it. You may also want to mark the field as 'large' (and lazy):

Re: Solr 8.6.3

2020-10-15 Thread Alexandre Rafalovitch
Why not do an XSLT transformation on it before it hits Solr? Or during, if it really has to be in-Solr for some reason: https://lucene.apache.org/solr/guide/8_6/uploading-data-with-index-handlers.html#using-xslt-to-transform-xml-index-updates But you have more options outside, as you could use

Re: solr-8983.pid: Permission denied

2020-10-15 Thread Alexandre Rafalovitch
>> addition, I do Solr start/stop with an /etc/init.d script (the Solr > >> distribution has the basic one which we can embellish) in which there is > >> control line RUNAS="solr". The RUNAS variable is used to properly start > >> Solr. > >> Tha

Re: Data Impor Handlert

2020-10-15 Thread Alexandre Rafalovitch
Solr now has package managers and DIH is one of the packages to reflect the fact that its development cycle is not locked to Solr's and to reduce core download. Tika may be heading the same way, as running Tika inside the Solr process could cause memory issues with complex PDFs. In terms of other

Re: solr-8983.pid: Permission denied

2020-10-15 Thread Alexandre Rafalovitch
It sounds like maybe you have started Solr in a different way than you are restarting it. E.g. maybe you started it manually (bin/solr start, probably as root) but are trying to restart it via the service script. Who owned the .pid file? I am guessing 'root', while the service script probably

Re: Analytics for Solr logs

2020-10-13 Thread Alexandre Rafalovitch
The tool was introduced in Solr 8.5 and it is located at bin/postlogs. It is quite new. Regards, Alex. On Tue, 13 Oct 2020 at 12:39, Zisis T. wrote: > > I've stumbled upon > https://github.com/apache/lucene-solr/blob/visual-guide/solr/solr-ref-guide/src/logs.adoc > which looks very

Re: Folding Repeated Letters

2020-10-09 Thread Alexandre Rafalovitch
Are there that many of those words? Because even if you deal with , there is still yas! Maybe you could just have regexp synonyms? (ye+s+) Good luck, 413x On Thu., Oct. 8, 2020, 6:02 p.m. Mike Drob, wrote: > I'm looking for a way to transform words with repeated letters into the >
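To make the regexp idea concrete, here is a small Python sketch that folds stretched spellings onto one canonical token using the pattern from the message above. It only illustrates the matching logic; where such a rule would sit in a Solr analysis chain (for example a pattern-replace style filter) is an assumption and not something the thread states.

```python
import re

# The pattern from the message above: "y", one or more "e", one or more "s".
YES_VARIANTS = re.compile(r"ye+s+")


def fold_yes(token):
    """Map any stretched spelling of "yes" to the canonical form; leave other tokens alone."""
    return "yes" if YES_VARIANTS.fullmatch(token.lower()) else token


for word in ["yes", "yeeees", "Yesss", "yas", "eyes"]:
    print(word, "->", fold_yes(word))
```

As the reply points out, "yas" is not covered by this pattern, so each family of variants needs its own rule.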

Re: Solr endpoint on the public internet

2020-10-08 Thread Alexandre Rafalovitch
of defence on top of everything. Respawn it every hour, if needed. On Thu, 8 Oct 2020 at 15:05, David Hastings wrote: > > Welp. Never mind I refer back to point #1 this is a bad idea > > > On Oct 8, 2020, at 3:01 PM, Alexandre Rafalovitch > > wrote: > > > > The u

Re: Solr endpoint on the public internet

2020-10-08 Thread Alexandre Rafalovitch
> > > https://gist.github.com/nz/673027/313f70681daa985ea13ba33a385753aef951a0f3 > > > > wunder > > Walter Underwood > > wun...@wunderwood.org > > http://observer.wunderwood.org/ (my blog) > > > >> On Oct 8, 2020, at 11:49 AM, Alexandre Rafalovitc

Re: Solr endpoint on the public internet

2020-10-08 Thread Alexandre Rafalovitch
I think there were past discussions about people doing this, but they really, really knew what they were doing from a security perspective, not just a Solr one. You are increasing your risk factor a lot, so you need to think through this. What are you protecting and what are you exposing? Are you trying

Re: MappingCharFilterFactory weird behaviour

2020-10-05 Thread Alexandre Rafalovitch
How do you know it does not apply? My Doh moment is often forgetting that stored version of the field is not affected by analyzers. One has to look in schema Admin UI to check indexed values. Regards, Alex On Mon., Oct. 5, 2020, 6:01 a.m. Lukas Brune, wrote: > Hello! > > I'm having some

Re: advice on whether to use stopwords for use case

2020-09-30 Thread Alexandre Rafalovitch
You may also want to look at something like: https://docs.querqy.org/index.html ApacheCon had (is having..) a presentation on it that seemed quite relevant to your needs. The videos should be live in a week or so. Regards, Alex. On Tue, 29 Sep 2020 at 22:56, Alexandre Rafalovitch wrote

Re: advice on whether to use stopwords for use case

2020-09-29 Thread Alexandre Rafalovitch
I am not sure why you think stop words are your first choice. Maybe I misunderstand the question. I read it as that you need to exclude completely a set of documents that include specific keywords when called from specific module. If I wanted to differentiate the searches from specific module, I

Re: Slow Solr 8 response for long query

2020-09-29 Thread Alexandre Rafalovitch
What do the debug versions of the query show between two versions? One thing that changed is sow (split on whitespace) parameter among many. It is unlikely to be the cause, but I am mentioning just in case.

Minimum set of jars to run EmbeddedSolrServer

2020-09-28 Thread Alexandre Rafalovitch
Hello, Does anybody know (or has anybody even experimented with) what the minimum set of jars needed to run EmbeddedSolrServer is? If I just include solr-core, that pulls in a huge number of jars. I don't need - for example - Lucene analyzers for Korean and Japanese for this application. But what else do I

Re: Solr 8.6.2 UI issue

2020-09-25 Thread Alexandre Rafalovitch
Sounds strange. If you had Solr installed previously, it could be cached JavaScript. Force-reload or try doing it in an anonymous window. Also try starting with an example (bin/solr start -e techproducts). Finally, if you are up to it, see if there are any serious errors in the browser's developer

Re: Solr 8.6.2 text_general

2020-09-25 Thread Alexandre Rafalovitch
e same > > > > > > > Regards, > > Anuj > > On Thu, 24 Sep 2020 at 18:58, Alexandre Rafalovitch > wrote: > > > These are field definitions for _text_ and text, your original > > question was about the fields named "country"/"currency" and wha

Re: Index Deeply Nested documents and retrieve a full nested document in solr

2020-09-24 Thread Alexandre Rafalovitch
It is yes to both questions, but I am not sure if they play well together for historical reasons. For storing/parsing original JSON in any (custom) format: https://lucene.apache.org/solr/guide/8_6/transforming-and-indexing-custom-json.html (srcField parameter) For indexing nested children (with

Re: Solr 8.6.2 text_general

2020-09-24 Thread Alexandre Rafalovitch
r 8.6.2 > multiValued="true"/> > > On Thu, 24 Sep 2020 at 18:33, Alexandre Rafalovitch > wrote: > > > I think that means your field went from multiValued to singleValued. > > Double check your schema. Remember that multiValued flag can be set > &

Re: Solr 8.6.2 text_general

2020-09-24 Thread Alexandre Rafalovitch
I think that means your field went from multiValued to singleValued. Double check your schema. Remember that multiValued flag can be set both on the field itself and on its fieldType. Regards, Alex P.s. However if your field is supposed to be single-valued, maybe you should treat it as a

Re: Pining Solr

2020-09-18 Thread Alexandre Rafalovitch
Your builder parameter should be up to the collection, so only "http://testserver-dtv:8984/solr/cpsearch". Then, on your Query object, you set query.setRequestHandler("/select_cpsearch") as per

Re: NPE Issue with atomic update to nested document or child document through SolrJ

2020-09-17 Thread Alexandre Rafalovitch
t;nest_path". * > > Is this intentional? or should it be as follows? > > name="_nest_path_" type="* _nest_path_ *" /> > > Also, should we explicitly set index=true and store=true on _nest_path_ > and _nest_parent_ fields? &g

Re: How to remove duplicate tokens from solr

2020-09-17 Thread Alexandre Rafalovitch
This is not quite enough information. There is https://lucene.apache.org/solr/guide/8_6/filter-descriptions.html#remove-duplicates-token-filter but it has specific limitations. What is the problem that you are trying to solve that you feel is due to duplicate tokens? Why are they duplicates? Is

Re: Doing what does using SolrJ API

2020-09-17 Thread Alexandre Rafalovitch
Solr has a whole pipeline that you can run during document ingesting before the actual indexing happens. It is called Update Request Processor (URP) and is defined in solrconfig.xml or in an override file. Obviously, since you are indexing from SolrJ client, you have even more flexibility, but it

Re: NPE Issue with atomic update to nested document or child document through SolrJ

2020-09-17 Thread Alexandre Rafalovitch
ild1, conceptid=c1, storeid=s1, fieldName=c1_child1_field_value1, > startTime=Mon Sep 07 12:40:37 EDT 2020, integerField_iDF=10, > booleanField_bDF=true, _root_=abcd, _version_=1678099970090074112} > {id=c1_child1, conceptid=c1, storeid=foo, fieldName=bar, startTime=Mon Sep > 07 12:40:

Re: NPE Issue with atomic update to nested document or child document through SolrJ

2020-09-17 Thread Alexandre Rafalovitch
Can you double-check your schema to see if you have all the fields required to support nested documents? You are supposed to get away with just _root_, but really you should also include _nest_path_ and _nest_parent_. Your particular exception seems to be triggering something (maybe a bug) related

Re: Why use a different analyzer for "index" and "query"?

2020-09-10 Thread Alexandre Rafalovitch
There are a lot of different use cases, and having separate analyzers for indexing and querying is part of Solr's power. For example, you could apply ngrams at indexing time to generate multiple substrings. But you don't want to do that at query time, because otherwise you are matching on 'shared
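The effect can be shown with a toy model in plain Python, with no Solr involved. As a simplification it uses prefix (edge) n-grams with a minimum size of 2, and treats a match as "at least one shared term"; both choices are assumptions made only for illustration.

```python
def edge_ngrams(token, min_size=2):
    """Return all prefixes of token that have at least min_size characters."""
    return {token[:i] for i in range(min_size, len(token) + 1)}


def matches(indexed_terms, query_terms):
    """A document matches if it shares at least one term with the query."""
    return bool(indexed_terms & query_terms)


# Index time: "shared" is expanded to sh, sha, shar, share, shared.
indexed = edge_ngrams("shared")

# Query analyzed as a single token: only real prefixes of "shared" match.
print(matches(indexed, {"sha"}))   # True
print(matches(indexed, {"shop"}))  # False

# Query analyzed with the same n-gram step: "shop" also yields "sh",
# so it now matches a document that only contains "shared".
print(matches(indexed, edge_ngrams("shop")))  # True
```

The last line is the unwanted case: applying the n-gram step on both sides makes unrelated words match through their shared short fragments.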

Re: Inverse English an digits in Arabic Text

2020-09-08 Thread Alexandre Rafalovitch
If you are uploading a PDF, then you must be doing it via Tika or via an extract handler (which uses Tika under the covers). Try getting a standalone Tika of the same version and see what it outputs. Perhaps there is something in those specific PDF pages that confuse Tika. Like, if it used

Re: Inverse English an digits in Arabic Text

2020-09-07 Thread Alexandre Rafalovitch
> Doc in Arabic with some English - English text is inverted (for example, "gro.echapa.www"), what makes search by key words impossible. What very specifically do you mean by that. How do you see the inversion? If that's within some sort of web ui, then you are probably seeing the HTML bidi

Re: Must specify either 'defaultFieldType' or declare one typeMapping as default

2020-09-05 Thread Alexandre Rafalovitch
That's a really hard way to get introduced to Solr. What about downloading Solr and running one of the built-in examples? Because you are figuring out so many variables at once. Either way, your specific issue is not in schema.xml (which should be converted to managed-schema on first run, btw,

Re: Can't get Solr to work with Dovecot

2020-08-27 Thread Alexandre Rafalovitch
Is this a Solr-side message? Looks like dovecot doing proactive trimming of some crazy long header. You can lookup the record by UID in the Admin UI (UID=153535 instead of *:*) to check what is being indexed. Check that dovecot does not do any prefixing of field names (any record from first

Re: Exclude a folder/directory from indexing

2020-08-27 Thread Alexandre Rafalovitch
If you are indexing from Drupal into Solr, that's the question for Drupal's solr module. If you are doing it some other way, which way are you doing it? bin/post command? Most likely this is not the Solr question, but whatever you have feeding data into Solr. Regards, Alex. On Thu, 27 Aug

Re: Can't get Solr to work with Dovecot

2020-08-27 Thread Alexandre Rafalovitch
; > Error 404 Not Found > > HTTP ERROR 404 > Problem accessing /solr/dovecot. Reason: > Not Found > > > > Anything else I could try? > > Best, > > Francis > > On 2020-08-27 20:46, Alexandre Rafalovitch wrote: > > Uhm right. I may have forgo

Re: Can't get Solr to work with Dovecot

2020-08-27 Thread Alexandre Rafalovitch
t; > org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:495) > > at > > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1594) > > at > > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1586)

Re: Can't get Solr to work with Dovecot

2020-08-27 Thread Alexandre Rafalovitch
Have you tried blowing the index directory away (usually 'data' directory next to 'conf'). Because: cannot change field "box" from index options=DOCS_AND_FREQS_AND_POSITIONS to inconsistent index options=DOCS This implies that your field box had different definitions, you updated it but the index

Re: Solr 8.6.1: Can't round-trip nested document from SolrJ

2020-08-24 Thread Alexandre Rafalovitch
96063ffdcef08047/solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformer.java#L201-L209 > but > not sure > > Regards, > Munendra S N > > > > On Sun, Aug 23, 2020 at 7:53 PM Alexandre Rafalovitch > wrote: > > > Thank you Nunedra, > > > &g

Re: PDF extraction using Tika

2020-08-24 Thread Alexandre Rafalovitch
The issue seems to be more with a specific file and at the level way below Solr's or possibly even Tika's: Caused by: java.io.IOException: expected='>' actual=' ' at offset 2383 at org.apache.pdfbox.pdfparser.BaseParser.readExpectedChar(BaseParser.java:1045) Are you indexing the

Re: Solr 8.6.1: Can't round-trip nested document from SolrJ

2020-08-23 Thread Alexandre Rafalovitch
> > parent1.addField("class", "foo.bar.parent1"); > > > > SolrInputDocument child1 = new SolrInputDocument(); > > > > parent1.addField("sometag", Arrays.asList(child1)); > > child1.addField("id", "c1"); > > c

Solr 8.6.1: Can't round-trip nested document from SolrJ

2020-08-22 Thread Alexandre Rafalovitch
Hello, I am trying to get up to date with both SolrJ and Nested Document implementation and not sure where I am failing with a basic test (https://github.com/arafalov/SolrJTest/blob/master/src/com/solrstart/solrj/Main.java). I am using Solr 8.6.1 with a core created with bin/solr create -c solrj

Re: Solr ping taking 600 seconds

2020-08-17 Thread Alexandre Rafalovitch
If this is reproducible, I would run Wireshark on the network and see what happens at packet level. Leaning towards firewall timing out and just starting to drop all packets. Regards, Alex On Mon., Aug. 17, 2020, 6:22 p.m. Susheel Kumar, wrote: > Thanks for the all responses. > > Shawn -

Re: Multiple "df" fields

2020-08-11 Thread Alexandre Rafalovitch
I can't remember if field aliasing works with df but it may be worth a try: https://lucene.apache.org/solr/guide/8_1/the-extended-dismax-query-parser.html#field-aliasing-using-per-field-qf-overrides Another example:

Re: wt=xml not defaulting the results to xml format

2020-08-07 Thread Alexandre Rafalovitch
You have echoParams set to all. What does that return? Regards, Alex On Fri., Aug. 7, 2020, 11:31 a.m. yaswanth kumar, wrote: > Thanks for looking into this Erick, > > > solr/PROXIMITY_DATA_V2/select?q=pkey:223_*=true=country_en=country_en > > that's what the url I am hitting, and also I

Re: Multiple fq vs combined fq performance

2020-07-09 Thread Alexandre Rafalovitch
I _think_ it will run all 3 and then do index hopping. But if you know one fq is super expensive, you could assign it a cost. A value over 100 will then try to use PostFilter and apply the query on top of the results from the other queries.
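A small sketch of how such a request could be assembled, with the expensive filter given a cost through local params. The field names are hypothetical. To my understanding the filter also has to be non-cached (cache=false) and the query type has to support post-filtering, as the function range parser (frange) does, so treat those details as assumptions to verify against the Reference Guide for your version.

```python
from urllib.parse import urlencode, parse_qsl

params = [
    ("q", "*:*"),
    # Cheap filter: cached and evaluated in the normal way.
    ("fq", "category:books"),
    # Expensive filter: not cached, and cost >= 100 asks Solr to run it last,
    # only against documents that passed everything else.
    ("fq", "{!frange l=10 cache=false cost=200}mul(popularity,price)"),
]
query_string = urlencode(params)
print(query_string)
```

Repeating the fq key, rather than combining the clauses into one fq, is what lets each filter be cached and costed independently.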

Re: Shingles behavior

2020-05-20 Thread Alexandre Rafalovitch
Did you try it with 'sow' parameter both ways? I am not sure I fully understand the question, especially with shingling on both passes rather than just indexing one. But at least it is something to try and is one of the difference areas between Solr and ES. Regards, Alex. On Tue, 19 May 2020

Re: What is the logical order of applying sorts in SOLR?

2020-05-20 Thread Alexandre Rafalovitch
If you use sort, you are basically ignoring relevancy (unless you put that into sort). Which you seem to know as your example uses FQ. Do you see performance drop on non-clustered or clustered Solr? Because, I would not be surprised if, for clustered node, all the results need to be brought into

Re: Large query size in Solr 8.3.0

2020-05-20 Thread Alexandre Rafalovitch
Does this actually work? This individual ID matching feels like a very fragile attempt at enforcing the sort order and maybe represents an architectural issue. Maybe you need to do some joins or graph walking instead. Or, more likely, you would benefit from over-fetching and just sorting on the ids on

Re: Proper way to manage managed-schema file

2020-04-13 Thread Alexandre Rafalovitch
If you are using the API (which the Admin UI does), the regenerated file will lose comments and sort everything in a particular order. That's just the implementation at the moment. If you don't like that, you can always modify the schema file by hand and reload the core to notice the changes. You can even

Re: how to use multiple update process chain?

2020-04-13 Thread Alexandre Rafalovitch
You can only have one chain at a time. You can, however, create your custom URP chain to contain configuration from all three. Or, if you do use multiple chains that are configured similarly, you can pull each URP into its own definition and then mix and match them either in the chain or even

Re: Solr Admin Console hangs on Chrome

2019-12-11 Thread Alexandre Rafalovitch
Check for popup and other tracker blockers. It is possible one of the resources has a similar name and triggers blocking. There was a thread in early October with a similar discussion, but apart from the blockers idea nothing else was discovered at the time. An easy way would be to create a new

Re: Search returning unexpected matches at the top

2019-12-06 Thread Alexandre Rafalovitch
You can enable debug which will show you what matches and why. Check the reference guide for parameters: https://lucene.apache.org/solr/guide/8_1/common-query-parameters.html#debug-parameter Regards, Alex. On Fri, 6 Dec 2019 at 11:00, rhys J wrote: > > I have a search box that is just

Re: Is it possible to use the Lucene Query Builder? Is there any API to create boolean queries?

2019-12-02 Thread Alexandre Rafalovitch
What about XMLQueryParser: https://lucene.apache.org/solr/guide/8_2/other-parsers.html#xml-query-parser Regards, Alex. On Wed, 27 Nov 2019 at 22:43, wrote: > > I am trying to simulate the following query(Lucene query builder) using Solr > > > > > BooleanQuery.Builder main = new

Re: Prevent Solr overwriting documents

2019-11-27 Thread Alexandre Rafalovitch
Oops. And the link... https://lucene.apache.org/solr/guide/6_6/updating-parts-of-documents.html#UpdatingPartsofDocuments-OptimisticConcurrency On Wed, Nov 27, 2019, 6:24 PM Alexandre Rafalovitch, wrote: > How about Optimistic Concurrency with _version_ set to negative value? > >

Re: Prevent Solr overwriting documents

2019-11-27 Thread Alexandre Rafalovitch
How about Optimistic Concurrency with _version_ set to negative value? You could inject that extra value in URP chain if need be. Regards, Alex On Wed, Nov 27, 2019, 5:41 PM Aaron Hoffer, wrote: > We want to prevent Solr from overwriting an existing document if document's > ID already
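A minimal sketch of the update body this suggests, assuming the documented optimistic concurrency rule that a negative _version_ means "the document must not exist yet". The document id and field are made up for illustration.

```python
import json


def create_only(doc):
    """Return a copy of doc that Solr should reject if a document with the same id exists."""
    guarded = dict(doc)
    guarded["_version_"] = -1  # any negative value: the document must not exist yet
    return guarded


original = {"id": "doc-42", "title": "first write wins"}  # hypothetical document
docs = [create_only(original)]
body = json.dumps(docs)
print(body)
```

Sent to the update handler, a second request for the same id should then fail with a version conflict instead of silently overwriting the first document.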

Re: How to implement NOTIN operator with Solr

2019-11-19 Thread Alexandre Rafalovitch
I think the main question here is: is the compound word "credit card" always the same? If yes, you can preprocess it during indexing to something unique and discard (see Vincenzo's reply). You could even copyField and process the copy to leave only the standalone word "credit" in it, so it basically

Re: Full-text search for Solr manual

2019-11-13 Thread Alexandre Rafalovitch
Try: site:lucene.apache.org inurl:8_2 luceneMatchVersion (8.3 does not work, seems to be not fully? indexed by google yet) https://github.com/apache/lucene-solr/search?l=AsciiDoc=luceneMatchVersion (latest development version only). You can read the rendered documents (without extra processing

Re: Full-text search for Solr manual

2019-11-11 Thread Alexandre Rafalovitch
Grep on the source of the manual (which ships with Solr source). Google search with domain or keywords limitations. Online copy searching is not powered by Solr yet. Yes, we are aware of the irony and are discussing it. Regards, Aled On Tue, Nov 12, 2019, 1:25 AM Luke Miller, wrote: >

Re: Solr missing mandatory uniqueKey field: id or Unknown field

2019-11-10 Thread Alexandre Rafalovitch
, Nov 11, 2019, 2:30 PM Sthitaprajna, wrote: > > https://stackoverflow.com/questions/58763657/solr-missing-mandatory-uniquekey-field-id-or-unknown-field?noredirect=1#comment103816164_58763657 > > May be this will help ? I added screenshots. > > On Fri, 8 Nov 2019, 22:57 Ale

Re: Mixing query between different parsers

2019-11-10 Thread Alexandre Rafalovitch
Weird. Did you try echoParams=all just to see what other defaults are picked up? It feels like it picks up the default parser and maybe a default "df" value that points to a non-existent text field. Maybe enable debug too to see what it expands to. Regards, Alex On Sun, Nov 10, 2019, 9:26 PM

Re: Solr missing mandatory uniqueKey field: id or Unknown field

2019-11-08 Thread Alexandre Rafalovitch
Something does not make sense, because your schema defines "title" as the uniqueKey field, but your message talks about "id". Are you absolutely sure that the Solr/collection you get an error for is the same Solr where you are checking the schema? Also, do you have a bit more of the error and

Re: Good Open Source Front End for Solr

2019-11-06 Thread Alexandre Rafalovitch
For what purpose? Because, for example, Solr is not designed to serve directly to the browser, just like MySQL is not. So, usually, there is a custom middleware. On the other hand, Solr can serve as a JDBC engine so you could use JDBC frontends to explore data. Or as an engine for visualisations.

Re: [Q] Ref Guide - What is Multi-Term Expansion?

2019-11-06 Thread Alexandre Rafalovitch
It mentions it in the opening paragraph "Prefix, Wildcard, Regex, etc." So, if you search for "abc*" it expands to all terms that start with "abc", but then not everything can handle this situation as it is a lot of terms in the same position. So, not all analyzers can handle that and normally it

Re: Solr Ref Guide Changes - now HTML only

2019-10-28 Thread Alexandre Rafalovitch
I've done some experiments about indexing RefGuide (from source) into Solr at: https://github.com/arafalov/solr-refguide-indexing . But the problem was creating UI, hosting, etc. There was also a thought (mine) of either shipping RefGuide in Solr with pre-built index as an example or even just

Re: regarding Extracting text from Images

2019-10-23 Thread Alexandre Rafalovitch
Hi Alex, > Thanks for your reply. How do we integrate tesseract with Solr? Do we have > to implement Custom update processor or extend the > ExtractingRequestProcessor? > > Regards > Suresh > > On Wed, Oct 23, 2019 at 11:21 AM Alexandre Rafalovitch > > wrote: > >

Re: regarding Extracting text from Images

2019-10-23 Thread Alexandre Rafalovitch
I believe Tika, which powers this, can do so with extra libraries (Tesseract?), but Solr does not bundle those extras. In any case, you may want to run Tika externally to avoid the conversion/extraction process becoming a burden to Solr itself. Regards, Alex On Wed, Oct 23, 2019, 1:58 PM suresh

Re: Importing a csv file encapsulated by " creates a large copyField field of all fields combined.

2019-10-21 Thread Alexandre Rafalovitch
What command do you use to get the file into Solr? My guess is that you are somehow not hitting the correct handler. Perhaps you are sending it to the extract handler (designed for PDF, MSWord, etc.) rather than the correct CSV handler. Solr comes with examples of CSV indexing commands. See for
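As a hedged sketch of what targeting the CSV-aware update handler can look like, the snippet below prepares (but does not send) a POST to the /update endpoint with a CSV content type. The host, the collection name "mycollection", the sample rows, and the helper name build_csv_request are assumptions for illustration only.

```python
from urllib.request import Request


def build_csv_request(base_url, collection, csv_bytes):
    """Prepare a POST to Solr's update handler with a CSV content type."""
    url = f"{base_url}/{collection}/update?commit=true"
    return Request(
        url,
        data=csv_bytes,
        headers={"Content-Type": "application/csv"},  # routes to the CSV loader
        method="POST",
    )


# Hypothetical quoted CSV payload; nothing is sent until urlopen(req) is called.
sample = b'id,title\n"1","First, with a comma"\n'
req = build_csv_request("http://localhost:8983/solr", "mycollection", sample)
print(req.get_method(), req.full_url)
```

If the same file is instead posted to an extraction endpoint, the whole file is treated as one document's text, which would match the symptom of one large combined field.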

Re: Solr Paryload example

2019-10-21 Thread Alexandre Rafalovitch
I remember a discussion/blog post from several years ago about a similar problem. The author went through a lot of thinking and decided that the best way to deal with it was to have Solr documents represent a different, more granular level of abstraction. IIRC, the equivalent for your

Re: Position search

2019-10-16 Thread Alexandre Rafalovitch
Also, you may want to note which normalized fields were truncated or > > were simply too small. This would give some guidance as to the bias of > > the normalization. If 95% of the fields were not truncated, there is > > a chance you are not doing good at normalizing b

Re: Position search

2019-10-16 Thread Alexandre Rafalovitch
ll. This would give some guidance as to the bias of the > normalization. If 95% of the fields were not truncated, there is a chance > you are not doing good at normalizing because you have a set of > particularly short messages. So I would expect a small set of side fields > remarking th

Re: Position search

2019-10-15 Thread Alexandre Rafalovitch
Is the 100 words a hard boundary or a soft one? If it is a hard one (always 100 words), the easiest approach is probably a copy field and, in the (unstored) copy, trimming off whatever you don't want to search, possibly using regular expressions. Of course, "what's a word" is an important question here.
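The trimming idea can be sketched outside Solr as follows. This assumes, purely for illustration, that a "word" is any run of non-whitespace characters; the function name first_n_words is made up, and inside Solr the equivalent trimming would be done in the copy's analysis chain or an update processor rather than in client code.

```python
import re


def first_n_words(text, n=100):
    """Keep only the first n whitespace-delimited words of the text."""
    # Assumption: a "word" is a maximal run of non-whitespace characters.
    words = re.findall(r"\S+", text)
    return " ".join(words[:n])


# A 250-word sample, trimmed to the first 100 words.
sample_text = " ".join(f"w{i}" for i in range(250))
trimmed = first_n_words(sample_text)
print(len(trimmed.split()))
```

Changing the regular expression is where a different answer to "what's a word" would go, for example treating hyphenated terms or punctuation differently.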

Re: Re: Protecting Tokens from Any Analysis

2019-10-09 Thread Alexandre Rafalovitch
the > > capitalization (otherwise “it” would be taken out as a stopword). > > > stopwords are a thing of the past at this point. there is no benefit to > using them now with hardware being so cheap. > > On Tue, Oct 8, 2019 at 12:43 PM Alexandre Rafalovitch >

Re: Dataimport: Could not load driver: com.mysql.jdbc.Driver

2019-10-09 Thread Alexandre Rafalovitch
Try referencing the jar directly (by absolute path) with a statement in the solrconfig.xml (and reloading the core). The DIH example shipped with Solr shows how it works. This will help to see if the problem is with not finding the jar or something else. Regards, Alex. On Wed, 9 Oct 2019 at

Re: Protecting Tokens from Any Analysis

2019-10-08 Thread Alexandre Rafalovitch
If you don't want it to be touched by a tokenizer, how would the protection step know that the sequence of characters you want to protect is "IT:ibm" and not "this is an IT:ibm term I want to protect"? It sounds to me like you may want to: 1) copyField to a second field 2) Apply a much

Re: Turn off weighted search

2019-09-30 Thread Alexandre Rafalovitch
Can you give a more detailed example, please? Including the schema bits. There are a bunch of assumptions in here that are hard to really make sense of. Solr works with tokens, but you are talking about letter repetitions. Also, if you want to sort by the string, why not just use the sort parameter?

Re: URGENT Documents automatically getting deleted in SOLR 6.6.0

2019-09-26 Thread Alexandre Rafalovitch
Your system is under attack; something is trying to hack into it via Solr. Possibly a cryptominer or similar. And it is using the DIH endpoint for it. Shawn explained the most likely cause for Solr actually deleting the records. I would also suggest: 1) Figure out where the request is coming from and

Re: Rename field in all documents from `i_itemNumber_l` to `i_itemNumber_cp_l`

2019-09-16 Thread Alexandre Rafalovitch
I don't think you can rename it in the index. However, you may be able to rename it during the query: https://lucene.apache.org/solr/guide/6_6/common-query-parameters.html#CommonQueryParameters-FieldNameAliases Or, if you use eDisMax, during query rewriting:
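For the first option mentioned above (renaming at query time via field name aliases), a minimal sketch of the request parameters is shown below. It relies on the fl syntax of displayName:fieldName described at the linked page; the helper name build_alias_query and the match-all query are assumptions for the example, and the stored data itself is not changed.

```python
from urllib.parse import urlencode


def build_alias_query(old_name, new_name, query="*:*"):
    """Build query params that return old_name under the label new_name."""
    params = {
        "q": query,
        "fl": f"{new_name}:{old_name}",  # alias syntax: displayName:fieldName
    }
    return urlencode(params)


# Field names taken from the thread subject.
qs = build_alias_query("i_itemNumber_l", "i_itemNumber_cp_l")
print(qs)
```

The index still holds the value under the original field name, so filters and sorts would continue to reference i_itemNumber_l.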
