RE: Filter by sibling ?

2021-03-02 Thread Manoj Mokashi
I tried passing a parent parser connected to the {!child} parser using query 
params, and it seems to work !

q=type:C1 AND {!child of='type:PR' v=$statusqry}
statusqry={!parent which='type:PR' }type:C2

Note that my real query is not exactly this, so I haven't tried the exact 
expression above
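Spelled out, the two params above nest the parsers like this (a sketch based on the example in this thread, with an illustrative status condition added to the inner query):

```
q=type:C1 AND {!child of='type:PR' v=$statusqry}
statusqry={!parent which='type:PR'}(type:C2 AND status:Done)
```

The inner {!parent} query selects PR parents that have a matching C2 child; the outer {!child} maps those parents back to all of their children, and the top-level type:C1 clause keeps only the C1 siblings.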

-Original Message-
From: Manoj Mokashi 
Sent: Wednesday, March 3, 2021 9:56 AM
To: solr-user@lucene.apache.org
Subject: RE: Filter by sibling ?

Ok. Will check. thanks !

-Original Message-
From: Joel Bernstein 
Sent: Tuesday, March 2, 2021 8:48 PM
To: solr-user@lucene.apache.org
Subject: Re: Filter by sibling ?

Solr's graph expressions can do this type of thing. It allows you to walk the 
relationships in a graph with filters:

https://lucene.apache.org/solr/guide/8_6/graph-traversal.html



Joel Bernstein
http://joelsolr.blogspot.com/


On Tue, Mar 2, 2021 at 9:00 AM Manoj Mokashi 
wrote:

> Hi,
>
> If I have a nested document structure, with say parent type:PR, child
> 1
> type:C1 and child2 type:C2,
> would it be possible to fetch documents of type C1 that are children of
> parents that have child2 docs with a certain condition ?
> e.g. for
> { type:PR,
>   Title: "XXX",
>   Children1 : [ { type:C1, city:ABC} ],
>   Children2 : [ { type:C2, status:Done}] }
>
> Can I fetch type:C1 documents which are children of parent docs that
> have child C2 docs with status:Done ?
>
> Regards,
> manoj
>
> Confidentiality Notice
> 
> This email message, including any attachments, is for the sole use of
> the intended recipient and may contain confidential and privileged 
> information.
> Any unauthorized view, use, disclosure or distribution is prohibited.
> If you are not the intended recipient, please contact the sender by
> reply email and destroy all copies of the original message. Anju Software, 
> Inc.
> 4500 S. Lakeshore Drive, Suite 620, Tempe, AZ USA 85282.
>


Query response time long for dynamicField in Solr 6.1.0

2021-03-02 Thread vishal patel
I am using Solr 6.1.0. We have 2 shards and each has one replica.

My schema field definition in one collection is below:


When I execute the query below, it takes more than 180 milliseconds every time.
http://10.38.33.24:8983/solr/forms/select?q=project_id:(2117627+2102977+2109667+2102912+2113720+2102976+2102478+2114939+2101443+2123237+2078189+2086596+2079707+2079706+2079705+2079658+2088340+2088338+2113641+2117131+2117672+2120870+2079708+2113718+2096308+2125462+2117837+2115406+2123865+2081232+2080746+2081239+2082706+2098700+2103039+2098699+2082878+2082877+2079994+2113719+2107255+2103251+2100558+2112735+2100036+2100037+2115359+2099330+2112101+2115360+2112070+2125140+2103656+2090184+2090183+2088269+2088270+2115358+2113036+2096855+2098258+2097226+2097225+2113127+2102847+2081187+2082817+2085678+2085677+2100937+2116632+2117133+2121028+2102479+2080006+2117509+2091443+2094716+2109780+2109779+2102735+2102736+2102685+2101923+2103648+2102608+2102480+2103664+2079205+2075380+2079206+2091442+2088614+2088613+2079876+2079875+2082886+2088615+2079429+2079428+2117185+2082859+2082860+2125270+2081301+2117623+2112740+2086757+2086756+2101344+2086597+2086847+2102648+2113362+2109010+2100223+2079877+2082704+2109669+2103649+2100744+2101490+2117526+2117134+2124020+2124021+2123524+2127200+2125039+2103663)=updated+desc,id+desc=0=30==id,form_id,project_id,doctype,dc,form_type_id,status_id,originator_user_id,controller_user_id,form_num,originator_proxy_user_id,originator_user_type_id,controller_user_type_id,msg_id,msg_originator_id,msg_status_id,parent_msg_id,msg_type_id,msg_code,form_code,appType,instance_group_id,bim_model_id,is_draft,InvoiceColourCode,InvoiceCountAgainstOrder,msg_content,msg_content1,msg_content3,user_ref,form_type_name,form_group_name,observationId,locationId,pf_loc_folderId,hasFormAssociation,hasCommentAssociation,hasDocAssociation,hasBimViewAssociation,hasBimListAssociation,originator_org_id,form_closeby_date,form_creation_date,status_change_userId,status_update_date,lastmodified,is_public,title,*Start_Date,*Tender_End_Date,*Tender_End_Time,*Tender_Review_Date,*Tender_Review_Time,*TenderEndDatePassed,*Package_Description,*Budget,*Currency_Sign,*allowExternalVendor,*Enable
_form_public_link,*Is_Tender_Public=off=true=http://10.38.33.24:8983/solr/forms,http://10.38.33.227:8983/solr/forms=true=form_id=msg_creation_date+desc=true

When I execute the query below, it takes less than 80 milliseconds every time.
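Although the thread is cut off here, one technique worth trying for queries like the one above (my suggestion, not something stated in the thread) is moving the long project_id list out of q and into a filter query, so Solr can cache the clause in the filterCache and reuse it across requests:

```
q=*:*
&fq=project_id:(2117627 2102977 2109667 ...)
&sort=updated desc,id desc
&rows=30
```

The fq clause carries no relevance scoring, so repeated requests with the same project_id set can skip re-evaluating it entirely.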

RE: Filter by sibling ?

2021-03-02 Thread Manoj Mokashi
Ok. Will check. thanks !

-Original Message-
From: Joel Bernstein 
Sent: Tuesday, March 2, 2021 8:48 PM
To: solr-user@lucene.apache.org
Subject: Re: Filter by sibling ?

Solr's graph expressions can do this type of thing. It allows you to walk the 
relationships in a graph with filters:

https://lucene.apache.org/solr/guide/8_6/graph-traversal.html



Joel Bernstein
http://joelsolr.blogspot.com/


On Tue, Mar 2, 2021 at 9:00 AM Manoj Mokashi 
wrote:

> Hi,
>
> If I have a nested document structure, with say parent type:PR, child
> 1
> type:C1 and child2 type:C2,
> would it be possible to fetch documents of type C1 that are children of
> parents that have child2 docs with a certain condition ?
> e.g. for
> { type:PR,
>   Title: "XXX",
>   Children1 : [ { type:C1, city:ABC} ],
>   Children2 : [ { type:C2, status:Done}] }
>
> Can I fetch type:C1 documents which are children of parent docs that
> have child C2 docs with status:Done ?
>
> Regards,
> manoj
>


Re: Caffeine Cache Metrics Broken?

2021-03-02 Thread Shawn Heisey

On 3/2/2021 3:47 PM, Stephen Lewis Bianamara wrote:

I'm investigating a weird behavior I've observed in the admin page for
caffeine cache metrics. It looks to me like on the older caches, warm-up
queries were not counted toward hit/miss ratios, which of course makes
sense, but on Caffeine cache it looks like they are. I'm using solr 8.3.

Obviously this makes measuring its true impact a little tough. Is this by
any chance a known issue and already fixed in later versions?


The earlier cache implementations are entirely native to Solr -- all the 
source code is included in the Solr codebase.


Caffeine is a third-party cache implementation that has been integrated 
into Solr.  Some of the metrics might come directly from Caffeine, not 
Solr code.


I would expect warming queries to be counted on any of the cache 
implementations.  One of the reasons that the warming capability exists 
is to pre-populate the caches before actual queries begin.  If warming 
queries are somehow excluded, then the cache metrics would not be correct.


I looked into the code and did not find anything that would keep warming 
queries from affecting stats.  But it is always possible that I just 
didn't know what to look for.


In the master branch (Solr 9.0), CaffeineCache is currently the only 
implementation available.
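For reference, autowarming is configured per cache in solrconfig.xml; a minimal sketch (the class and attribute names are real, the sizes are illustrative):

```xml
<!-- solrconfig.xml: warm the filter cache from the previous searcher -->
<filterCache class="solr.CaffeineCache"
             size="512"
             initialSize="512"
             autowarmCount="128"/>
```

With autowarmCount above zero, recently used entries are regenerated when a new searcher opens, which is exactly the traffic Stephen is trying to separate from organic hit/miss counts.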


Thanks,
Shawn


RE: Idle timeout expired and Early Client Disconnect errors

2021-03-02 Thread ufuk yılmaz
I divided the query into 1000 pieces and removed the parallel stream clause; it 
seems to be working without timeouts so far. If it times out again, I can just 
divide it into even smaller pieces.

I tried to send all 1000 pieces in a “list” expression to be executed linearly; 
it didn’t work, but I was just curious whether it could handle such a large query.

Now I’m just generating expression strings from Java code and sending them one 
by one. I tried to use SolrJ for this, but encountered a weird problem where 
even the simplest expression (echo) stops working after a few iterations in a 
loop. I’m guessing the underlying HttpClient is not closing connections 
promptly and is hitting the OS per-host connection limit. I asked a separate 
question about this. I was following the example on Lucidworks: 
https://lucidworks.com/post/streaming-expressions-in-solrj/

I just modified my code to use regular REST calls via okhttp3. It’s a shame 
that I couldn’t use SolrJ, since it truly streams every result one by one 
continuously, whereas REST just returns a single large response at the very end 
of the stream.
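The splitting itself can be done by slicing the ID list into fixed-size batches before generating each expression string; a minimal self-contained sketch (the class and method names are my own, not from the thread):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSplitter {
    // Split a list into consecutive batches of at most batchSize elements.
    public static <T> List<List<T>> split(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            // Copy the sublist so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(i, Math.min(i + batchSize, items.size()))));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 10; i++) ids.add(i);
        // 10 ids in batches of 3 produce 4 batches, the last holding the remainder.
        System.out.println(split(ids, 3));
    }
}
```

Each batch then becomes one clause in its own streaming expression, sent and fully consumed before the next one starts.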

Thanks again for your help.

Sent from Mail for Windows 10

From: Joel Bernstein
Sent: 02 March 2021 00:19
To: solr-user@lucene.apache.org
Subject: Re: Idle timeout expired and Early Client Disconnect errors

Also the parallel function builds hash partitioning filters that could lead
to timeouts if they take too long to build. Try the query without the
parallel function if you're still getting timeouts when making the query
smaller.
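In other words, unwrap the parallel() function while diagnosing. Roughly (a sketch; the collection, field, and worker values are illustrative): the first form partitions the stream across four workers, the second runs the same underlying /export stream directly:

```
parallel(workers,
         search(coll, q="*:*", fl="id", sort="id asc", qt="/export", partitionKeys="id"),
         workers="4", sort="id asc")

search(coll, q="*:*", fl="id", sort="id asc", qt="/export")
```

If the second form completes without idle timeouts, the cost of building the hash partitioning filters is the likely culprit.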



Joel Bernstein
http://joelsolr.blogspot.com/


On Mon, Mar 1, 2021 at 4:03 PM Joel Bernstein  wrote:

> The settings in your version are 30 seconds and 15 seconds for socket and
> connection timeouts.
>
> Typically timeouts occur because one or more shards in the query are idle
> beyond the timeout threshold. This happens because lot's of data is being
> read from other shards.
>
> Breaking the query into small parts would be a good strategy.
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Mon, Mar 1, 2021 at 3:30 PM ufuk yılmaz 
> wrote:
>
>> Hello Mr. Bernstein,
>>
>> I’m using version 8.4. So, if I understand correctly, I can’t increase
>> timeouts and they are bound to happen in such a large stream. Should I just
>> reduce the output of my search expressions?
>>
>> Maybe I can split my search results into ~100 parts and run the same
>> query 100 times in series. Each part would emit ~3M documents so they
>> should finish before timeout?
>>
>> Is this a reasonable solution?
>>
>> Btw how long is the default hard-coded timeout value? Because yesterday I
>> ran another query which took more than 1 hour without any timeouts and
>> finished successfully.
>>
>> Sent from Mail for Windows 10
>>
>> From: Joel Bernstein
>> Sent: 01 March 2021 23:03
>> To: solr-user@lucene.apache.org
>> Subject: Re: Idle timeout expired and Early Client Disconnect errors
>>
>> Oh wait, I misread your email. The idle timeout issue is configurable in:
>>
>> https://issues.apache.org/jira/browse/SOLR-14672
>>
>> This unfortunately missed the 8.8 release and will be 8.9.
>>
>>
>>
>>
>>
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>>
>> On Mon, Mar 1, 2021 at 2:56 PM Joel Bernstein  wrote:
>>
>> > What version are you using?
>> >
>> > Solr 8.7 has changes that caused these errors to hit the logs. These
>> used
>> > to be suppressed. This has been fixed in Solr 9.0 but it has not been
>> back
>> > ported to Solr 8.x.
>> >
>> > The errors are actually normal operational occurrences when doing joins
>> so
>> > should be suppressed in the logs and were before the specific release.
>> >
>> > It might make sense to do a release that specifically suppresses these
>> > errors without backporting the full Solr 9.0 changes which impact the
>> > memory footprint of export.
>> >
>> >
>> >
>> >
>> > Joel Bernstein
>> > http://joelsolr.blogspot.com/
>> >
>> >
>> > On Mon, Mar 1, 2021 at 10:29 AM ufuk yılmaz > >
>> > wrote:
>> >
>> >> Hello all,
>> >>
>> >> I’m running a large streaming expression and feeding the result to
>> update
>> >> expression.
>> >>
>> >>  update(targetCollection, ...long running stream here...,
>> >>
>> >> I tried sending the exact same query multiple times, it sometimes works
>> >> and indexes some results, then gives exception, other times fails with
>> an
>> >> exception after 2 minutes.
>> >>
>> >> Response is like:
>> >> "EXCEPTION":"java.util.concurrent.ExecutionException:
>> >> java.io.IOException: params distrib=false=4 and my long
>> >> stream expression
>> >>
>> >> Server log (short):
>> >> [c:DNM s:shard1 r:core_node2 x:DNM_shard1_replica_n1]
>> >> o.a.s.s.HttpSolrCall null:java.io.IOException:
>> >> java.util.concurrent.TimeoutException: Idle timeout expired:
>> 12/12
>> >> ms
>> >> o.a.s.s.HttpSolrCall null:java.io.IOException:
>> >> java.util.concurrent.TimeoutException: Idle timeout expired:
>> 12/12
>> >> ms
>> >>
>> >> I tried to increase the 

RE: Default conjunction behaving differently after field type change

2021-03-02 Thread ufuk yılmaz
I changed the tokenizer class from KeywordTokenizerFactory to 
WhitespaceTokenizerFactory for the query analyzer using the Schema API; it 
seems to have solved the problem.
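For anyone following along, a replace-field-type call along these lines would make that change (a sketch pieced together from the messages in this thread; the collection name is a placeholder):

```bash
curl -X POST -H 'Content-type:application/json' --data-binary '{
  "replace-field-type": {
    "name": "string_ci",
    "class": "solr.TextField",
    "sortMissingLast": true,
    "omitNorms": true,
    "indexAnalyzer": {
      "tokenizer": { "class": "solr.KeywordTokenizerFactory" },
      "filters": [ { "class": "solr.LowerCaseFilterFactory" } ]
    },
    "queryAnalyzer": {
      "tokenizer": { "class": "solr.WhitespaceTokenizerFactory" },
      "filters": [ { "class": "solr.LowerCaseFilterFactory" } ]
    }
  }
}' http://localhost:8983/solr/mycollection/schema
```

Note the top-level indexAnalyzer/queryAnalyzer keys rather than an "analyzers" array, matching Alexandre's earlier advice in this thread.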

Sent from Mail for Windows 10

From: ufuk yılmaz
Sent: 02 March 2021 20:47
To: solr-user@lucene.apache.org
Subject: Default conjunction behaving differently after field type change

Hello all,

From the Solr 8.4 (my version) documentation:

“The OR operator is the default conjunction operator. This means that if there 
is no Boolean operator between two terms, the OR operator is used. To search 
for documents that contain either "jakarta apache" or just "jakarta," use the 
query:

"jakarta apache" jakarta

or

"jakarta apache" OR jakarta”


I had a field type=”string” in my old schema:





I could use queries like:
username: (user1 user2 user3)

So it would find the documents of all 3 users (conjunction is OR)
-
Recently I changed the field definition in a new schema:


  
  
  
  




When I search with the same query:

username: (user1 user2 user3)

I get no results unless I change it to either:
username: (user1 OR user2 OR user3) //
username: (“user1” “user2” “user3”)


First I was thinking the default conjunction operator had changed to AND, but it 
seems the standard query parser now thinks user1 user2 user3 is a single string 
containing spaces, I guess?

I couldn’t find how the default “string” field queries are analyzed, what is 
the difference that may cause this behavior?

--ufuk yilmaz



Sent from Mail for Windows 10




RE: Schema API specifying different analysers for query and index

2021-03-02 Thread ufuk yılmaz
It worked! Thanks Mr. Rafalovitch. I just removed the “type”: “query” keys from 
the JSON, and used indexAnalyzer and queryAnalyzer in place of the analyzer JSON 
node.

Sent from Mail for Windows 10

From: Alexandre Rafalovitch
Sent: 03 March 2021 01:19
To: solr-user
Subject: Re: Schema API specifying different analysers for query and index

RefGuide gives this for Adding, I would hope the Replace would be similar:

curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-field-type":{
 "name":"myNewTextField",
 "class":"solr.TextField",
 "indexAnalyzer":{
"tokenizer":{
   "class":"solr.PathHierarchyTokenizerFactory",
   "delimiter":"/" }},
 "queryAnalyzer":{
"tokenizer":{
   "class":"solr.KeywordTokenizerFactory" }}}
}' http://localhost:8983/solr/gettingstarted/schema

So, indexAnalyzer/queryAnalyzer, rather than array:
https://lucene.apache.org/solr/guide/8_8/schema-api.html#add-a-new-field-type

Hope this works,
Alex.
P.s. Also check whether you are using matching API and V1/V2 end point.

On Tue, 2 Mar 2021 at 15:25, ufuk yılmaz  wrote:
>
> Hello,
>
> I’m trying to change a field’s query analysers. The following works but it 
> replaces both index and query type analysers:
>
> {
> "replace-field-type": {
> "name": "string_ci",
> "class": "solr.TextField",
> "sortMissingLast": true,
> "omitNorms": true,
> "stored": true,
> "docValues": false,
> "analyzer": {
> "type": "query",
> "tokenizer": {
> "class": "solr.StandardTokenizerFactory"
> },
> "filters": [
> {
> "class": "solr.LowerCaseFilterFactory"
> }
> ]
> }
> }
> }
>
> I tried to change analyzer field to analyzers, to specify different analysers 
> for query and index, but it gave error:
>
> {
> "replace-field-type": {
> "name": "string_ci",
> "class": "solr.TextField",
> "sortMissingLast": true,
> "omitNorms": true,
> "stored": true,
> "docValues": false,
> "analyzers": [{
> "type": "query",
> "tokenizer": {
> "class": "solr.StandardTokenizerFactory"
> },
> "filters": [
> {
> "class": "solr.LowerCaseFilterFactory"
> }
> ]
> },{
> "type": "index",
> "tokenizer": {
> "class": "solr.KeywordTokenizerFactory"
> },
> "filters": [
> {
> "class": "solr.LowerCaseFilterFactory"
> }
> ]
> }]
> }
> }
>
> "errorMessages":["Plugin init failure for [schema.xml]
> "msg":"error processing commands",...
>
> How can I specify different analyzers for query and index type when using 
> schema api?
>
> Sent from Mail for Windows 10
>



Running Simple Streaming expressions in a loop through SolrJ stops with read timeout after a few iterations

2021-03-02 Thread ufuk yılmaz
I’m using the following example on Lucidworks to use streaming expressions from 
SolrJ:

https://lucidworks.com/post/streaming-expressions-in-solrj/

Problem is, when I run it inside a for loop, even the simplest expression 
(echo) stops executing after about 5 iterations. I thought the underlying 
HttpClient was not closing the TCP connection to the Solr host, so after 4-5 
iterations it reaches the max connections-per-host limit of the OS (mine is 
Windows 10) and stops working.

But then I tried to manually supply a SolrClientCache with a custom configured 
HttpClient, debugged and saw my custom HttpClient is being utilized by the 
stream, but whatever I tried it didn’t change the outcome.

Do you have any idea about this problem? Am I on the right track about 
HttpClient not closing-reusing a connection after an expression is finished? Or 
is there another issue?

I also tried this with different expressions but result didn’t change.

I created a gist to share my code here: https://git.io/Jqevp
but I’m pasting a shortened version here to read without going there: 

-
String workerUrl = "http://mySolrHost:8983/solr/WorkerCollection";

String expr = "echo(x)";

for (int i = 0; i < 20; i++) {

TupleStream tplStream = null;

ModifiableSolrParams modifiableSolrParams =
new ModifiableSolrParams()
.set("expr", expr.replaceAll("x", Integer.toString(i)))
.set("preferLocalShards", true)
.set("qt", "/stream");

tplStream = new SolrStream(workerUrl, modifiableSolrParams);

tplStream.setStreamContext(new StreamContext());

tplStream.open();

Tuple tuple;
tuple = tplStream.read();
System.out.println(tuple.fields);

tplStream.close();
}
-

Sent from Mail for Windows 10



Caffeine Cache Metrics Broken?

2021-03-02 Thread Stephen Lewis Bianamara
Hi SOLR Community,

I'm investigating a weird behavior I've observed in the admin page for
caffeine cache metrics. It looks to me like on the older caches, warm-up
queries were not counted toward hit/miss ratios, which of course makes
sense, but on Caffeine cache it looks like they are. I'm using solr 8.3.

Obviously this makes measuring its true impact a little tough. Is this by
any chance a known issue and already fixed in later versions?

Thanks!
Stephen


Re: Schema API specifying different analysers for query and index

2021-03-02 Thread Alexandre Rafalovitch
RefGuide gives this for Adding, I would hope the Replace would be similar:

curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-field-type":{
 "name":"myNewTextField",
 "class":"solr.TextField",
 "indexAnalyzer":{
"tokenizer":{
   "class":"solr.PathHierarchyTokenizerFactory",
   "delimiter":"/" }},
 "queryAnalyzer":{
"tokenizer":{
   "class":"solr.KeywordTokenizerFactory" }}}
}' http://localhost:8983/solr/gettingstarted/schema

So, indexAnalyzer/queryAnalyzer, rather than array:
https://lucene.apache.org/solr/guide/8_8/schema-api.html#add-a-new-field-type

Hope this works,
Alex.
P.s. Also check whether you are using matching API and V1/V2 end point.

On Tue, 2 Mar 2021 at 15:25, ufuk yılmaz  wrote:
>
> Hello,
>
> I’m trying to change a field’s query analysers. The following works but it 
> replaces both index and query type analysers:
>
> {
> "replace-field-type": {
> "name": "string_ci",
> "class": "solr.TextField",
> "sortMissingLast": true,
> "omitNorms": true,
> "stored": true,
> "docValues": false,
> "analyzer": {
> "type": "query",
> "tokenizer": {
> "class": "solr.StandardTokenizerFactory"
> },
> "filters": [
> {
> "class": "solr.LowerCaseFilterFactory"
> }
> ]
> }
> }
> }
>
> I tried to change analyzer field to analyzers, to specify different analysers 
> for query and index, but it gave error:
>
> {
> "replace-field-type": {
> "name": "string_ci",
> "class": "solr.TextField",
> "sortMissingLast": true,
> "omitNorms": true,
> "stored": true,
> "docValues": false,
> "analyzers": [{
> "type": "query",
> "tokenizer": {
> "class": "solr.StandardTokenizerFactory"
> },
> "filters": [
> {
> "class": "solr.LowerCaseFilterFactory"
> }
> ]
> },{
> "type": "index",
> "tokenizer": {
> "class": "solr.KeywordTokenizerFactory"
> },
> "filters": [
> {
> "class": "solr.LowerCaseFilterFactory"
> }
> ]
> }]
> }
> }
>
> "errorMessages":["Plugin init failure for [schema.xml]
> "msg":"error processing commands",...
>
> How can I specify different analyzers for query and index type when using 
> schema api?
>
> Sent from Mail for Windows 10
>


Re: Location of Solr 9 Branch

2021-03-02 Thread Houston Putman
Solr 9 is an unreleased major version, so it lives in *master*. Once the
release process starts for Solr 9, it will live at *branch_9x*, and *master*
will host Solr 10.

On Tue, Mar 2, 2021 at 3:49 PM Phill Campbell 
wrote:

> I have just begun investigating Solr source code. Where is the branch for
> Solr 9?
>
>
>


Location of Solr 9 Branch

2021-03-02 Thread Phill Campbell
I have just begun investigating Solr source code. Where is the branch for Solr 
9?




Schema API specifying different analysers for query and index

2021-03-02 Thread ufuk yılmaz
Hello,

I’m trying to change a field’s query analysers. The following works but it 
replaces both index and query type analysers:

{
"replace-field-type": {
"name": "string_ci",
"class": "solr.TextField",
"sortMissingLast": true,
"omitNorms": true,
"stored": true,
"docValues": false,
"analyzer": {
"type": "query",
"tokenizer": {
"class": "solr.StandardTokenizerFactory"
},
"filters": [
{
"class": "solr.LowerCaseFilterFactory"
}
]
}
}
}

I tried to change analyzer field to analyzers, to specify different analysers 
for query and index, but it gave error:

{
"replace-field-type": {
"name": "string_ci",
"class": "solr.TextField",
"sortMissingLast": true,
"omitNorms": true,
"stored": true,
"docValues": false,
"analyzers": [{
"type": "query",
"tokenizer": {
"class": "solr.StandardTokenizerFactory"
},
"filters": [
{
"class": "solr.LowerCaseFilterFactory"
}
]
},{
"type": "index",
"tokenizer": {
"class": "solr.KeywordTokenizerFactory"
},
"filters": [
{
"class": "solr.LowerCaseFilterFactory"
}
]
}]
}
}

"errorMessages":["Plugin init failure for [schema.xml]
"msg":"error processing commands",...

How can I specify different analyzers for query and index type when using 
schema api?

Sent from Mail for Windows 10



Default conjunction behaving differently after field type change

2021-03-02 Thread ufuk yılmaz
Hello all,

From the Solr 8.4 (my version) documentation:

“The OR operator is the default conjunction operator. This means that if there 
is no Boolean operator between two terms, the OR operator is used. To search 
for documents that contain either "jakarta apache" or just "jakarta," use the 
query:

"jakarta apache" jakarta

or

"jakarta apache" OR jakarta”


I had a field type=”string” in my old schema:





I could use queries like:
username: (user1 user2 user3)

So it would find the documents of all 3 users (conjunction is OR)
-
Recently I changed the field definition in a new schema:


  
  
  
  




When I search with the same query:

username: (user1 user2 user3)

I get no results unless I change it to either:
username: (user1 OR user2 OR user3) //
username: (“user1” “user2” “user3”)


First I was thinking the default conjunction operator had changed to AND, but it 
seems the standard query parser now thinks user1 user2 user3 is a single string 
containing spaces, I guess?

I couldn’t find how the default “string” field queries are analyzed, what is 
the difference that may cause this behavior?

--ufuk yilmaz



Sent from Mail for Windows 10



Possible bug with AnalyzingInfixLookupFactory, FileDictionaryFactory and Context Filtering

2021-03-02 Thread Joaquim de Souza
Hi all,

I asked a question on StackOverflow about a problem I was having with the 
suggester module, but since then I have looked into the source code of Solr, 
and I think it is a bug.

Essentially, context filtering is being applied to a suggester that is
backed by a FileDictionaryFactory. According to the docs, this should not
happen, and context filters should be ignored.

This is my config:


  
location
AnalyzingInfixLookupFactory
FileDictionaryFactory
tdwg.txt
text_general
false
  

  
common-name
AnalyzingInfixLookupFactory
DocumentDictionaryFactory
region.vernacular_names_t
common_name_suggest
searchable.context_ss
text_general
false
  


I have tested this on the latest version of Solr (8.8.1).

The relevant bit of source code is here.
I would expect suggestions to be null, as the combination of
AnalyzingInfixLookupFactory and FileDictionaryFactory doesn't support
context filtering.

Is there anything I can do to fix this problem?

Thanks,
Joaquim


Re: Filter by sibling ?

2021-03-02 Thread Joel Bernstein
Solr's graph expressions can do this type of thing. It allows you to walk
the relationships in a graph with filters:

https://lucene.apache.org/solr/guide/8_6/graph-traversal.html



Joel Bernstein
http://joelsolr.blogspot.com/


On Tue, Mar 2, 2021 at 9:00 AM Manoj Mokashi 
wrote:

> Hi,
>
> If I have a nested document structure, with say parent type:PR, child 1
> type:C1 and child2 type:C2,
> would it be possible to fetch documents of type C1 that are children of
> parents that have child2 docs with a certain condition ?
> e.g. for
> { type:PR,
>   Title: "XXX",
>   Children1 : [ { type:C1, city:ABC} ],
>   Children2 : [ { type:C2, status:Done}]
> }
>
> Can I fetch type:C1 documents which are children of parent docs that have
> child C2 docs with status:Done ?
>
> Regards,
> manoj
>
>


Re: Partial update bug on solr 8.8.0

2021-03-02 Thread Mike Drob
This looks like a bug that is already fixed but not yet released in 8.9

https://issues.apache.org/jira/plugins/servlet/mobile#issue/SOLR-13034

On Tue, Mar 2, 2021 at 6:27 AM Mohsen Saboorian  wrote:

> Any idea about this post?
> https://stackoverflow.com/q/66335803/141438
>
> Regards.
>


Re: Multiword synonyms and term wildcards/substring matching

2021-03-02 Thread Martin Graney
Hi Alex

Thanks for the reply.
We are not using the 'copyField bucket' approach as it is inflexible. Our
textual fields are all multivalued dynamic fields, which allows us to craft
a list of `pf` (phrase fields) with associated weighting boosts that are
meant to be used in the search on a *per-collection* basis. This allows us
to have all of the textual fields indexed independently and then simply
change the query when we want to include/exclude a field from the search
without the need to reindex the entire collection. e/dismax makes this more
flexible approach possible.

I'll take a look at the ComplexQueryParser and see if it is a good fit.
We use a lot of the e/dismax params though, such as `bf` (boost functions),
`bq` (boost queries), and 'pf' (phrase fields), to influence the relevance
score.

FYI: We are using Solr 8.3.
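On the multiple-`bf` question raised in the quoted message below: since several bf parameters simply add their function scores, one workaround (my sketch, untested against this setup) is to fold them into a single bf value, which can then be passed to a nested parser as one dereferenced parameter:

```
q={!edismax sow=false qf=$tqf bf=$tbf v=$tp}
tp=bread stick
tqf=title^2 description^0.5
tbf=sum(popularity,if(exists(in_stock),10,0))
```

The field names and boost functions here are illustrative, not from the production setup.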

On Tue, 2 Mar 2021 at 13:38, Alexandre Rafalovitch 
wrote:

> I admit to not fully understanding the examples, but ComplexQueryParser
> looks like something worth at least reviewing:
>
>
> https://lucene.apache.org/solr/guide/8_8/other-parsers.html#complex-phrase-query-parser
>
> Also I did not see any references to trying to copyField and process same
> content in different ways. If copyField is not stored, the overhead is not
> as large.
>
> Regards,
> Alex
>
>
>
> On Tue., Mar. 2, 2021, 7:08 a.m. Martin Graney, 
> wrote:
>
> > Hi All
> >
> > I have been trying to implement multi word synonyms using `sow=false`
> into
> > a pre-existing system that applied pre-processing to the phrase to apply
> > wildcards around the terms, i.e. `bread stick` => `*bread* *stick*`.
> >
> > I got the synonyms expansion working perfectly, after discovering the
> > `preserveOriginal` filter param, but then I needed to re-implement the
> > existing wildcard behaviour.
> > I tried using the edge-ngram filter, but found that when searching for
> the
> > phrase `bread stick` on a field containing the word `breadstick` and
> > `q.op=AND` it returns no results, as the content `breadstick` does not
> > _start with_ `stick`. The previous wildcard behaviour would return all
> > documents that contain the substrings `bread` AND `stick`, which is the
> > desired behaviour.
> > I tried using the ngram filter, but this does not support the
> > `preserveOriginal`, and so loses a lot of relevance for exact matches,
> but
> > it also results in matches that are far too broad, creating 21 tokens
> from
> > `breadstick` for `minGramSize=3` and `maxGramSize=5` that in practice
> > essentially matches all of the documents. Which means that boosts applied
> > to other fields, such as 'in stock', push irrelevant documents to the
> top.
> >
> > Finally, I tried to strip out ngrams entirely and use subquery/LocalParam
> > syntax and local params, a solr feature that is not very well documented.
> > I created something like `q={!edismax sow=true v=$widlcards} OR {!edismax
> > sow=false v=$plain}` to effectively create a union of results, one with
> > multi word synonyms support and one with wildcard support.
> > But then I had to implement the other edismax params and immediately
> > stumbled.
> > Each query in production normally has a slew of `bf` and `bq` params,
> and I
> > cannot see a way to pass these into the nested query using local
> variables.
> > If I have 3 different `bf` params how can I pass them into the local
> param
> > subqueries?
> >
> > Also, as the search in production is across multiple fields, I found that
> > passing `qf` to both subqueries using dereferencing failed, as the parser
> > saw it as a single field and threw a 'number format exception'.
> > i.e.
> > q={!edismax sow=true v=$tw qf=$tqf} OR {!edismax sow=false v=$tp qf=$tqf}
> > tw=*bread* *stick*
> > tp=bread stick
> > tqf=title^2 description^0.5
> >
> > As you can guess, I have spent quite some time going down this rabbit hole
> > in my attempt to reproduce the existing desired functionality alongside
> > multiterm synonyms.
> > Is there a way to get multiterm synonyms working with substring matching
> > effectively?
> > I am sure there is a much simpler way that I am missing than all of my
> > attempts so far.
> >
> > Solr: 8.3
> >
> > Thanks
> > Martin Graney
> >
>


-- 
Martin Graney
Lead Developer

http://sooqr.com 
http://twitter.com/sooqrcom

Office: +31 (0) 88 766 7700
Mobile: +31 (0) 64 660 8543



Filter by sibling ?

2021-03-02 Thread Manoj Mokashi
Hi,

If I have a nested document structure, with say parent type:PR, child1 type:C1,
and child2 type:C2, would it be possible to fetch documents of type C1 that are
children of parents that have child2 docs matching a certain condition?
e.g. for
{ type:PR,
  Title: "XXX",
  Children1 : [ { type:C1, city:ABC} ],
  Children2 : [ { type:C2, status:Done}]
}

Can I fetch type:C1 documents which are children of parent docs that have C2
child docs with status:Done?
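
One shape this query can take (an untested sketch using Solr's block-join
parsers, reusing the type values from the example above):

```
q=type:C1 AND {!child of='type:PR' v=$parentq}
parentq={!parent which='type:PR'}(+type:C2 +status:Done)
```

Here `{!parent}` selects parents that have a matching C2 child, `{!child}`
steps back down to all children of those parents, and the `type:C1` clause
narrows the result to the C1 children.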

Regards,
manoj


Re: Multiword synonyms and term wildcards/substring matching

2021-03-02 Thread Alexandre Rafalovitch
I admit to not fully understanding the examples, but the
ComplexPhraseQueryParser looks like something worth at least reviewing:

https://lucene.apache.org/solr/guide/8_8/other-parsers.html#complex-phrase-query-parser
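
In this thread's terms, that parser allows wildcards inside a phrase, e.g.
(untested sketch; `title` is a hypothetical field name):

```
q={!complexphrase inOrder=false}title:"bread* stick*"
```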

Also, I did not see any mention of trying copyField to process the same
content in different ways. If the copyField destination is not stored, the
overhead is not that large.
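
A schema sketch of that idea (hypothetical field and type names, not from the
thread):

```xml
<!-- index the same content two ways: one synonym-aware, one ngram-based -->
<field name="title" type="text_synonyms" indexed="true" stored="true"/>
<field name="title_ngram" type="text_ngram" indexed="true" stored="false"/>
<copyField source="title" dest="title_ngram"/>
```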

Regards,
Alex



On Tue., Mar. 2, 2021, 7:08 a.m. Martin Graney wrote:

> Hi All
>
> I have been trying to implement multi word synonyms using `sow=false` into
> a pre-existing system that applied pre-processing to the phrase to apply
> wildcards around the terms, i.e. `bread stick` => `*bread* *stick*`.
>
> I got the synonyms expansion working perfectly, after discovering the
> `preserveOriginal` filter param, but then I needed to re-implement the
> existing wildcard behaviour.
> I tried using the edge-ngram filter, but found that when searching for the
> phrase `bread stick` on a field containing the word `breadstick` and
> `q.op=AND` it returns no results, as the content `breadstick` does not
> _start with_ `stick`. The previous wildcard behaviour would return all
> documents that contain the substrings `bread` AND `stick`, which is the
> desired behaviour.
> I tried using the ngram filter, but it does not support `preserveOriginal`,
> so it loses a lot of relevance for exact matches. It also produces matches
> that are far too broad, creating 21 tokens from `breadstick` for
> `minGramSize=3` and `maxGramSize=5`, which in practice matches essentially
> all of the documents. This means that boosts applied to other fields, such
> as 'in stock', push irrelevant documents to the top.
>
> Finally, I tried to strip out ngrams entirely and use nested queries with
> local params, a Solr feature that is not very well documented.
> I created something like `q={!edismax sow=true v=$wildcards} OR {!edismax
> sow=false v=$plain}` to effectively create a union of results, one with
> multi-word synonym support and one with wildcard support.
> But then I had to implement the other edismax params and immediately
> stumbled.
> Each query in production normally has a slew of `bf` and `bq` params, and I
> cannot see a way to pass these into the nested query using local variables.
> If I have 3 different `bf` params how can I pass them into the local param
> subqueries?
>
> Also, as the search in production is across multiple fields, I found that
> passing `qf` to both subqueries using dereferencing failed, as the parser
> saw it as a single field and threw a 'number format exception'.
> i.e.
> q={!edismax sow=true v=$tw qf=$tqf} OR {!edismax sow=false v=$tp qf=$tqf}
> tw=*bread* *stick*
> tp=bread stick
> tqf=title^2 description^0.5
>
> As you can guess, I have spent quite some time going down this rabbit hole
> in my attempt to reproduce the existing desired functionality alongside
> multiterm synonyms.
> Is there a way to get multiterm synonyms working with substring matching
> effectively?
> I am sure there is a much simpler way that I am missing than all of my
> attempts so far.
>
> Solr: 8.3
>
> Thanks
> Martin Graney
>


Partial update bug on solr 8.8.0

2021-03-02 Thread Mohsen Saboorian
Any idea about this post?
https://stackoverflow.com/q/66335803/141438

Regards.


Multiword synonyms and term wildcards/substring matching

2021-03-02 Thread Martin Graney
Hi All

I have been trying to implement multi word synonyms using `sow=false` into
a pre-existing system that applied pre-processing to the phrase to apply
wildcards around the terms, i.e. `bread stick` => `*bread* *stick*`.

I got the synonyms expansion working perfectly, after discovering the
`preserveOriginal` filter param, but then I needed to re-implement the
existing wildcard behaviour.
I tried using the edge-ngram filter, but found that when searching for the
phrase `bread stick` on a field containing the word `breadstick` and
`q.op=AND` it returns no results, as the content `breadstick` does not
_start with_ `stick`. The previous wildcard behaviour would return all
documents that contain the substrings `bread` AND `stick`, which is the
desired behaviour.
I tried using the ngram filter, but it does not support `preserveOriginal`,
so it loses a lot of relevance for exact matches. It also produces matches
that are far too broad, creating 21 tokens from `breadstick` for
`minGramSize=3` and `maxGramSize=5`, which in practice matches essentially
all of the documents. This means that boosts applied to other fields, such
as 'in stock', push irrelevant documents to the top.
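
The 21-token figure follows from the n-gram arithmetic: a 10-character term
yields (10 - n + 1) grams for each size n, so 8 + 7 + 6 = 21 for sizes 3-5.
A quick sketch (plain Python mimicking the filter's enumeration, not Solr
itself):

```python
def char_ngrams(term, min_size, max_size):
    """Enumerate character n-grams roughly the way Solr's NGramFilter does."""
    return [term[i:i + n]
            for n in range(min_size, max_size + 1)
            for i in range(len(term) - n + 1)]

grams = char_ngrams("breadstick", 3, 5)
print(len(grams))  # 21 tokens for a 10-character term
```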

Finally, I tried to strip out ngrams entirely and use nested queries with
local params, a Solr feature that is not very well documented.
I created something like `q={!edismax sow=true v=$wildcards} OR {!edismax
sow=false v=$plain}` to effectively create a union of results, one with
multi-word synonym support and one with wildcard support.
But then I had to implement the other edismax params and immediately
stumbled.
Each query in production normally has a slew of `bf` and `bq` params, and I
cannot see a way to pass these into the nested query using local variables.
If I have 3 different `bf` params how can I pass them into the local param
subqueries?
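
One possible workaround (an untested sketch; `popularity` and `launch_date`
are hypothetical field names) is to fold multiple boost functions into a
single `bf` with `sum()`, so only one parameter needs dereferencing per
subquery:

```
q={!edismax sow=true bf=$tbf v=$tw} OR {!edismax sow=false bf=$tbf v=$tp}
tbf=sum(log(popularity),recip(ms(NOW,launch_date),3.16e-11,1,1))
```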

Also, as the search in production is across multiple fields, I found that
passing `qf` to both subqueries using dereferencing failed, as the parser
saw it as a single field and threw a 'number format exception'.
i.e.
q={!edismax sow=true v=$tw qf=$tqf} OR {!edismax sow=false v=$tp qf=$tqf}
tw=*bread* *stick*
tp=bread stick
tqf=title^2 description^0.5

As you can guess, I have spent quite some time going down this rabbit hole
in my attempt to reproduce the existing desired functionality alongside
multiterm synonyms.
Is there a way to get multiterm synonyms working with substring matching
effectively?
I am sure there is a much simpler way that I am missing than all of my
attempts so far.

Solr: 8.3

Thanks
Martin Graney



Re: Solr wiki page update

2021-03-02 Thread Jan Høydahl
Vincent,

I added you as editor, please try editing that page again.

Jan

> On 11 Feb 2021, at 17:43, Vincent Brehin wrote:
> 
> Hi community members,
> I work for Adelean (https://www.adelean.com/); we offer services around
> everything Search-related, especially Solr consulting and support. We are
> based in Paris and operate mainly in France.
> Is it possible to list our company on the support page (Support - SOLR -
> Apache Software Foundation)?
> Or give me the permission to edit it on confluence (my user:
> vincent.brehin) ?
> Thanks !
> Best Regards,
> 
> Vincent