ManagedSynonymFilter

2020-04-30 Thread Kayak28
Hello, Community:

I have a simple question about managed resources and managed synonyms.

I use ManagedSynonymGraphFilter, and its managed resource is named
english.
I have PUT several synonyms to the URL below.

Now I want to delete all of the registered synonyms with one command.
I thought that the following URL could be the command to delete them all.

curl -X DELETE
"http://localhost:8983/solr/mycore/schema/analysis/synonyms/english"


However, Solr gives me an error.

{
  "responseHeader":{
    "status":403,
    "QTime":0},
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","org.apache.solr.common.SolrException"],
    "msg":"Cannot delete managed resource /schema/analysis/synonyms/english as it is being used by 1 Solr components",
    "code":403}}

Here are my questions:
- Is there any way to delete all registered synonyms with one API call?
- What does "components" refer to? Is it a field that contains a managed
synonym filter?

Sincerely,
Kaya Ota

-- 

Sincerely,
Kaya
github: https://github.com/28kayak
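
One possible approach to the first question, sketched under assumptions: the managed synonyms endpoint can list the current mappings with a GET and remove a single mapping with a DELETE on the resource URL followed by the term, so a client can remove the terms one by one. The helper below only builds those DELETE URLs from an already-fetched listing. The function name `build_delete_urls` and the sample listing are hypothetical, and the `synonymMappings` / `managedMap` layout is an assumption that should be checked against the GET response of your own Solr version.

```python
import json
from urllib.parse import quote


def build_delete_urls(base_url, listing_json):
    """Return one DELETE URL per registered synonym term.

    base_url     -- managed resource URL (.../schema/analysis/synonyms/english)
    listing_json -- body returned by a GET on base_url (assumed layout)
    """
    listing = json.loads(listing_json)
    managed_map = listing.get("synonymMappings", {}).get("managedMap", {})
    # Percent-encode each term so spaces or slashes do not break the path.
    return [f"{base_url.rstrip('/')}/{quote(term, safe='')}"
            for term in sorted(managed_map)]


# Hypothetical listing, shaped like the assumed GET response.
SAMPLE = json.dumps({
    "synonymMappings": {
        "initArgs": {"ignoreCase": True},
        "managedMap": {"happy": ["glad", "joyful"], "big sad": ["unhappy"]},
    }
})

BASE = "http://localhost:8983/solr/mycore/schema/analysis/synonyms/english"
for url in build_delete_urls(BASE, SAMPLE):
    print("curl -X DELETE", f'"{url}"')
```

Each printed command targets one mapping rather than the managed resource itself, which is what the 403 above refused to delete. The core probably needs a reload before the change is visible to the filter.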


Re: SolrCloud degraded during backup and batch CSV update

2020-04-30 Thread Ganesh Sethuraman
Are any other JVM settings changes possible?

On Tue, Apr 28, 2020, 10:15 PM Sethuraman, Ganesh
 wrote:

> Hi
>
> We are using SolrCloud 7.2.1 with a 3-node ZooKeeper ensemble. We have 92
> collections, each with 8 shards and 2 replicas on average, on 2 EC2 nodes
> with a JVM size of 18GB (G1 GC). We need your help with an issue we faced
> today: a few collections went into a degraded state when the Solr backup
> and the Solr batch CSV update load ran at the same time. The CSV data load
> was about 5 GB per shard/replica. We think this happened after the zkClient
> disconnect noted below. We had to restart Solr to bring it back to normal.
>
>
>   1.  Is it not recommended to run a backup and a large Solr batch CSV
> update load at the same time?
>   2.  In the past we have seen two CSV batch update loads running in parallel
> cause issues. Is this also not recommended (this issue is not related to
> that)?
>   3.  Do you think we should increase the ZooKeeper timeout?
>   4.  How do we know if we need to raise the JVM max memory, and by how much?
>   5.  We also see that once Solr has degraded collections and recovery
> has failed, it NEVER gets back to normal, even when there is no
> load. Is this a bug?
>
> The GC information and Solr log are below.
>
>
> https://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMjAvMDQvMjkvLS0wMl9zb2xyX2djLmxvZy56aXAtLTEtNDAtMzE==WEB
>
>
> 2020-04-27 07:34:07.322 WARN
> (zkConnectionManagerCallback-6-thread-1-processing-n:mysolrsever.com:6010_solr-SendThread(zoo-prd-n1:2181))
> [   ] o.a.z.ClientCnxn Client session timed out, have not heard from server
> in 10775ms for sessionid 0x171a6fb51310008
> 
> 2020-04-27 07:34:07.426 WARN
> (zkConnectionManagerCallback-6-thread-1-processing-n:mysolrsever.com:6010_solr-EventThread)
> [   ] o.a.s.c.c.ConnectionManager zkClient has disconnected
>
>
>
>
> SOLR Log Below (Curtailed WARN log)
> 
> 2020-04-27 07:26:45.402 WARN
> (recoveryExecutor-4-thread-697-processing-n:mysolrsever.com:6010_solr
> x:mycollection_shard13_replica_n48 s:shard13 c:mycollection r:core_node51)
> [c:mycollection s:shard13 r:core_node51 x:mycollection_shard13_replica_n48]
> o.a.s.h.IndexFetcher Error in fetching file: _1kr_r.liv (downloaded 0 of
> 587 bytes)
> java.io.EOFException
>   at
> org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:168)
>   at
> org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:160)
>   at
> org.apache.solr.handler.IndexFetcher$FileFetcher.fetchPackets(IndexFetcher.java:1579)
>   at
> org.apache.solr.handler.IndexFetcher$FileFetcher.fetch(IndexFetcher.java:1545)
>   at
> org.apache.solr.handler.IndexFetcher$FileFetcher.fetchFile(IndexFetcher.java:1526)
>   at
> org.apache.solr.handler.IndexFetcher.downloadIndexFiles(IndexFetcher.java:1008)
>   at
> org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:566)
>   at
> org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:345)
>   at
> org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:420)
>   at
> org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:225)
>   at
> org.apache.solr.cloud.RecoveryStrategy.doSyncOrReplicateRecovery(RecoveryStrategy.java:626)
>   at
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:308)
>   at
> org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:292)
>   at
> com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
>   at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:188)
>   at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> 2020-04-27 07:26:45.405 WARN
> (recoveryExecutor-4-thread-697-processing-n:mysolrsever.com:6010_solr
> x:mycollection_shard13_replica_n48 s:shard13 c:mycollection r:core_node51)
> [c:mycollection s:shard13 r:core_node51 x:mycollection_shard13_replica_n48]
> o.a.s.h.IndexFetcher Error in fetching file: _1kr_r.liv (downloaded 0 of
> 587 bytes)
> java.io.EOFException
>   at
> org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:168)
>   at
> org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:160)
>   at
> 
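
Regarding question 3 in the quoted message, it may help to collect every ZooKeeper session-timeout warning from the Solr log before changing any timeout. The sketch below is only an illustration: the function name `session_timeouts` is hypothetical, and the regular expression assumes the warning wording matches the `ClientCnxn` line quoted above.

```python
import re

# Matches the ZooKeeper client warning quoted in the message, for example:
# "... have not heard from server in 10775ms for sessionid 0x171a6fb51310008"
TIMEOUT_RE = re.compile(
    r"have not heard from server in (\d+)ms for sessionid (0x[0-9a-fA-F]+)"
)


def session_timeouts(log_text):
    """Return (silence_ms, session_id) pairs found in the log text."""
    # Mailed log lines are wrapped, so collapse all whitespace before matching.
    flat = " ".join(log_text.split())
    return [(int(ms), sid) for ms, sid in TIMEOUT_RE.findall(flat)]


SAMPLE_LOG = """2020-04-27 07:34:07.322 WARN
(zkConnectionManagerCallback-6-thread-1-processing-n:mysolrsever.com:6010_solr-SendThread(zoo-prd-n1:2181))
[   ] o.a.z.ClientCnxn Client session timed out, have not heard from server
in 10775ms for sessionid 0x171a6fb51310008"""

for silence_ms, session_id in session_timeouts(SAMPLE_LOG):
    print(session_id, silence_ms)
```

Comparing the collected silence durations with the configured ZooKeeper client timeout and with the GC pause times in the GC report is left to the reader; the sketch makes no claim about which value is appropriate.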

Re: off-heap OOM

2020-04-30 Thread Raji N
It used to occur every 3 days; we reduced the heap and it started
occurring every 5 days. We can't get much from the logs. Sometimes we
see "unable to create new native thread" in the logs, and many times there
are no exceptions.
When we hit the "unable to create new native thread" error, we got the
exceptions below, as we use CDCR. To eliminate CDCR as a cause, we disabled
it as well, but we still get the OOM.

 WARN  (cdcr-update-log-synchronizer-93-thread-1) [   ]
o.a.s.h.CdcrUpdateLogSynchronizer Caught unexpected exception

java.lang.OutOfMemoryError: unable to create new native thread

   at java.lang.Thread.start0(Native Method) ~[?:1.8.0_211]

   at java.lang.Thread.start(Thread.java:717) ~[?:1.8.0_211]

   at
org.apache.http.impl.client.IdleConnectionEvictor.start(IdleConnectionEvictor.java:96)
~[httpclient-4.5.3.jar:4.5.3]

   at
org.apache.http.impl.client.HttpClientBuilder.build(HttpClientBuilder.java:1219)
~[httpclient-4.5.3.jar:4.5.3]

   at
org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:319)
~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
- nknize - 2018-12-07 14:47:53]

   at
org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:330)
~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
- nknize - 2018-12-07 14:47:53]

   at
org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:268)
~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
- nknize - 2018-12-07 14:47:53]

   at
org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:255)
~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
- nknize - 2018-12-07 14:47:53]

   at
org.apache.solr.client.solrj.impl.HttpSolrClient.<init>(HttpSolrClient.java:200)
~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
- nknize - 2018-12-07 14:47:53]

   at
org.apache.solr.client.solrj.impl.HttpSolrClient$Builder.build(HttpSolrClient.java:957)
~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
- nknize - 2018-12-07 14:47:53]

   at
org.apache.solr.handler.CdcrUpdateLogSynchronizer$UpdateLogSynchronisation.run(CdcrUpdateLogSynchronizer.java:139)
[solr-core-7.6.0.jar:7.6.0-SNAPSHOT
34d82ed033cccd8120431b73e93554b85b24a278 - i843100 - 2019-09-30
14:02:46]

   at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[?:1.8.0_211]

   at
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
[?:1.8.0_211]

   at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
[?:1.8.0_211]

   at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
[?:1.8.0_211]

   at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[?:1.8.0_211]

   at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[?:1.8.0_211]

Thanks,
Raji
On Thu, Apr 30, 2020 at 12:24 AM Mikhail Khludnev  wrote:

> Raji, what exactly does that "OOM for solr occur in every 5 days." look like?
> What is the error message? Where exactly is it occurring?
>
> On Thu, Apr 30, 2020 at 1:30 AM Raji N  wrote:
>
> > Thanks so much Jan. Will try your suggestions , yes we are also running
> > solr inside docker.
> >
> > Thanks,
> > Raji
> >
> > On Wed, Apr 29, 2020 at 1:46 PM Jan Høydahl 
> wrote:
> >
> > > I have seen the same, but only in Docker.
> > > I think it does not relate to Solr’s off-heap usage for filters and
> > > other data structures, but rather how Docker treats memory-mapped
> > > files as virtual memory.
> > > As you know, when using MMapDirectoryFactory, you actually let Linux
> > > handle the loading and unloading of the index files, and Solr will
> > > access them as if they were in a huge virtual memory pool. Naturally the
> > > index files grow large, and there is something strange going on in the
> > > way Docker handles this, leading to OOM, not for Java heap but for the
> > > process.
> > >
> > > I have no definitive answer, but so far my research has found a few
> > > possible settings
> > >
> > > Set env.var MALLOC_ARENA_MAX=2
> > > Try to limit -XX:MaxDirectMemorySize
> > > Lower mem swappiness in Docker (--memory-swappiness 0)
> > > More generic insight into java mem allocation in Docker:
> > > https://dzone.com/articles/native-memory-allocation-in-examples
> > >
> > > Have not yet found a silver bullet, so very interested in this thread.
> > >
> > > Jan
> > >
> > > > On 29 Apr 2020 at 19:26, Raji N wrote:
> > > >
> > > > > Thank you for your reply.  When OOM happens somehow it doesn't
> > > > > generate dump file. So we have hourly heaps running to diagnose this
> > > > > issue. Heap is
> > > > 

RE: Possible issue with Stemming and nouns ended with suffix 'ion'

2020-04-30 Thread Jhonny Lopez
Yes, sounds like worth it.

Thanks guys!

-----Original Message-----
From: Mike Drob 
Sent: Thursday, April 30, 2020 5:30 PM
To: solr-user@lucene.apache.org
Subject: Re: Possible issue with Stemming and nouns ended with suffix 'ion'

This email has been sent from a source external to Publicis Groupe. Please use 
caution when clicking links or opening attachments.



Is this worth filing a bug/suggestion to the folks over at snowballstem.org?

On Thu, Apr 30, 2020 at 4:08 PM Audrey Lorberfeld - audrey.lorberf...@ibm.com 
 wrote:

> I agree with Erick. I think that's just how the cookie crumbles when
> stemming. If you have some time on your hands, you can integrate
> OpenNLP with your Solr instance and start using the lemmas of tokens
> instead of the stems. In this case, I believe if you were to lemmatize
> both "identify" and "identification," they would both condense to "identify."
>
> Best,
> Audrey
>
> On 4/30/20, 3:54 PM, "Erick Erickson"  wrote:
>
> They are being stemmed to two different tokens, “identif” and
> “identifi”. Stemming is algorithmic and imperfect and in this case
> you’re getting bitten by that algorithm. It looks like you’re using
> PorterStemFilter, if you want you can look up the exact algorithm, but
> I don’t think it’s a bug, just one of those little joys of English...
>
> To get a clearer picture of exactly what’s being searched, try
> adding &debug=query to your query, in particular looking at the parsed
> query that’s returned. That’ll tell you a bunch. In this particular
> case I don’t think it’ll tell you anything more, but for future…
>
> Best,
> Erick
>
> Oh, and un-checking the ‘verbose’ box on the analysis page removes
> a lot of distraction, the detailed information is often TMI ;)
>
> > On Apr 30, 2020, at 2:51 PM, Jhonny Lopez <
> jhonny.lo...@publicismedia.com> wrote:
> >
> > Sure, rewriting the message with links for images:
> >
> >
> > We’re facing an issue with stemming in Solr. Most cases work
> > correctly; for example, if we search for bidding, Solr brings results
> > for bidding, bid, bids, etc. However, with nouns ending in the ‘ion’
> > suffix, stemming is not working. Even when the analyzer seems to stem
> > the word correctly, the results do not reflect that. One example: if
> > I search for ‘identifying’, this is the output:
> >
> > Analyzer (image link):
> >
> > https://1drv.ms/u/s!AlRTlFq8tQbShd4-Cp40Cmc0QioS0A?e=1f3GJp
> >
> > A clip of results:
> > "haschildren_b":false,
> >"isbucket_text_s":"0",
> >"sectionbody_t":"\n\n\nIn order to identify 1st price
> auctions, leverage the proprietary tools available or manually pull a
> log file report to understand the trends and gauge auction spread
> overtime to assess the impact of variable auction dynamics.\n\n\n\n\n\n\n",
> >"parsedupdatedby_s":"sitecorecarvaini",
> >"sectionbody_t_en":"\n\n\nIn order to identify 1st price
> auctions, leverage the proprietary tools available or manually pull a
> log file report to understand the trends and gauge auction spread
> overtime to assess the impact of variable auction dynamics.\n\n\n\n\n\n\n",
> >"hide_section_b":false
> >
> >
> > As you can see, it has used the stemming correctly and brings
> results for other words based in the root, in this case “Identify”.
> >
> > However, if I search for “Identification”, this is the output:
> >
> > Analyzer (imagelink):
> >
> > https://1drv.ms/u/s!AlRTlFq8tQbShd49RpiQObzMgSjVhA
> >
> >
> > Even with proper stemming, solr is only bringing results for the
> word identification (or identifications) but nothing else.
> >
> > The queries are over the same field that has the Porter Stemming
> Filter applied for both, query and index. This behavior is consistent
> with other ‘ion’ ended nouns: representation, modification, etc.
> >
> > Solr Version: 8.1. Does anyone know why is it happening? Is it a bug?
> >
> > Thanks.
> >
> >
> >
> >
> >
> > -----Original Message-----
> >
> > From: Erick Erickson 
> >
> > Sent: Thursday, April 30, 2020 1:47 PM
> >
> > To: solr-user@lucene.apache.org
> >
> > Subject: Re: Possible issue with Stemming and nouns ended with
> suffix 'ion'
> 

Re: Possible issue with Stemming and nouns ended with suffix 'ion'

2020-04-30 Thread Mike Drob
Is this worth filing a bug/suggestion to the folks over at snowballstem.org?

On Thu, Apr 30, 2020 at 4:08 PM Audrey Lorberfeld -
audrey.lorberf...@ibm.com  wrote:

> I agree with Erick. I think that's just how the cookie crumbles when
> stemming. If you have some time on your hands, you can integrate OpenNLP
> with your Solr instance and start using the lemmas of tokens instead of the
> stems. In this case, I believe if you were to lemmatize both "identify" and
> "identification," they would both condense to "identify."
>
> Best,
> Audrey
>
> On 4/30/20, 3:54 PM, "Erick Erickson"  wrote:
>
> They are being stemmed to two different tokens, “identif” and
> “identifi”. Stemming is algorithmic and imperfect and in this case you’re
> getting bitten by that algorithm. It looks like you’re using
> PorterStemFilter, if you want you can look up the exact algorithm, but I
> don’t think it’s a bug, just one of those little joys of English...
>
> To get a clearer picture of exactly what’s being searched, try adding
> &debug=query to your query, in particular looking at the parsed query
> that’s returned. That’ll tell you a bunch. In this particular case I don’t
> think it’ll tell you anything more, but for future…
>
> Best,
> Erick
>
> Oh, and un-checking the ‘verbose’ box on the analysis page removes a
> lot of distraction, the detailed information is often TMI ;)
>
> > On Apr 30, 2020, at 2:51 PM, Jhonny Lopez <
> jhonny.lo...@publicismedia.com> wrote:
> >
> > Sure, rewriting the message with links for images:
> >
> >
> > We’re facing an issue with stemming in Solr. Most cases work
> > correctly; for example, if we search for bidding, Solr brings results
> > for bidding, bid, bids, etc. However, with nouns ending in the ‘ion’
> > suffix, stemming is not working. Even when the analyzer seems to stem
> > the word correctly, the results do not reflect that. One example: if
> > I search for ‘identifying’, this is the output:
> >
> > Analyzer (image link):
> >
> > https://1drv.ms/u/s!AlRTlFq8tQbShd4-Cp40Cmc0QioS0A?e=1f3GJp
> >
> > A clip of results:
> > "haschildren_b":false,
> >"isbucket_text_s":"0",
> >"sectionbody_t":"\n\n\nIn order to identify 1st price
> auctions, leverage the proprietary tools available or manually pull a log
> file report to understand the trends and gauge auction spread overtime to
> assess the impact of variable auction dynamics.\n\n\n\n\n\n\n",
> >"parsedupdatedby_s":"sitecorecarvaini",
> >"sectionbody_t_en":"\n\n\nIn order to identify 1st price
> auctions, leverage the proprietary tools available or manually pull a log
> file report to understand the trends and gauge auction spread overtime to
> assess the impact of variable auction dynamics.\n\n\n\n\n\n\n",
> >"hide_section_b":false
> >
> >
> > As you can see, it has used the stemming correctly and brings
> results for other words based in the root, in this case “Identify”.
> >
> > However, if I search for “Identification”, this is the output:
> >
> > Analyzer (imagelink):
> >
> > https://1drv.ms/u/s!AlRTlFq8tQbShd49RpiQObzMgSjVhA
> >
> >
> > Even with proper stemming, solr is only bringing results for the
> word identification (or identifications) but nothing else.
> >
> > The queries are over the same field that has the Porter Stemming
> Filter applied for both, query and index. This behavior is consistent with
> other ‘ion’ ended nouns: representation, modification, etc.
> >
> > Solr Version: 8.1. Does anyone know why is it happening? Is it a bug?
> >
> > Thanks.
> >
> >
> >
> >
> >
> > -----Original Message-----
> >
> > From: Erick Erickson 
> >
> > Sent: Thursday, April 30, 2020 1:47 PM
> >
> > To: solr-user@lucene.apache.org
> >
> > Subject: Re: Possible issue with Stemming and nouns ended with
> suffix 'ion'
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > The mail server is pretty aggressive about stripping links, so we
> can’t see the images.
> >
> >
> >
> > Could you put 

RE: Solr fields mapping

2020-04-30 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Sam,

Ah, okay, I see. Hm, I wonder if you could hack "debug mode" to show you how 
they're interacting with the field. I'll keep thinking ... 

Best,
Audrey

On 4/30/20, 3:20 PM, "sambasivarao giddaluri"  
wrote:

Hi Audrey,

Yes, I am aware of copyField, but it does not fit my use case. The reason is
that in the output we have to show each field with its value; copyField
combines the values, so we no longer know the field-to-value
relationship.

regards
sam

On Wed, Apr 29, 2020 at 9:53 AM Audrey Lorberfeld -
audrey.lorberf...@ibm.com  wrote:

> Hi, Sam!
>
> Have you tried creating a copyField?
> 
> https://builds.apache.org/view/L/view/Lucene/job/Solr-reference-guide-8.x/javadoc/copying-fields.html
 
>
> Best,
> Audrey
>
> On 4/28/20, 1:07 PM, "sambasivarao giddaluri" <
> sambasiva.giddal...@gmail.com> wrote:
>
> Hi All,
> Is there a way to map several fields into a single field?
> Ex: the schema has the fields below:
> createdBy.userName
> createdBy.name
> createdBy.email
>
> To retrieve these fields, I need to pass all three in the *fl*
> parameter. Instead, is there a way to have a map or an object of these
> fields in createdBy, so that I pass only createdBy in fl and get all
> three as output?
>
> Regards
> sam
>
>
>




RE: Possible issue with Stemming and nouns ended with suffix 'ion'

2020-04-30 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
I agree with Erick. I think that's just how the cookie crumbles when stemming. 
If you have some time on your hands, you can integrate OpenNLP with your Solr 
instance and start using the lemmas of tokens instead of the stems. In this 
case, I believe if you were to lemmatize both "identify" and "identification," 
they would both condense to "identify."

Best,
Audrey

On 4/30/20, 3:54 PM, "Erick Erickson"  wrote:

They are being stemmed to two different tokens, “identif” and “identifi”. 
Stemming is algorithmic and imperfect and in this case you’re getting bitten by 
that algorithm. It looks like you’re using PorterStemFilter, if you want you 
can look up the exact algorithm, but I don’t think it’s a bug, just one of 
those little joys of English...

To get a clearer picture of exactly what’s being searched, try adding 
&debug=query to your query, in particular looking at the parsed query that’s
returned. That’ll tell you a bunch. In this particular case I don’t think it’ll 
tell you anything more, but for future…

Best,
Erick

Oh, and un-checking the ‘verbose’ box on the analysis page removes a lot of
distraction, the detailed information is often TMI ;)

> On Apr 30, 2020, at 2:51 PM, Jhonny Lopez 
 wrote:
> 
> Sure, rewriting the message with links for images:
> 
> 
> We’re facing an issue with stemming in Solr. Most cases work
> correctly; for example, if we search for bidding, Solr brings results
> for bidding, bid, bids, etc. However, with nouns ending in the ‘ion’
> suffix, stemming is not working. Even when the analyzer seems to stem
> the word correctly, the results do not reflect that. One example: if
> I search for ‘identifying’, this is the output:
> 
> Analyzer (image link):
> 
> https://1drv.ms/u/s!AlRTlFq8tQbShd4-Cp40Cmc0QioS0A?e=1f3GJp
 
> 
> A clip of results:
> "haschildren_b":false,
>"isbucket_text_s":"0",
>"sectionbody_t":"\n\n\nIn order to identify 1st price auctions, 
leverage the proprietary tools available or manually pull a log file report to 
understand the trends and gauge auction spread overtime to assess the impact of 
variable auction dynamics.\n\n\n\n\n\n\n",
>"parsedupdatedby_s":"sitecorecarvaini",
>"sectionbody_t_en":"\n\n\nIn order to identify 1st price auctions, 
leverage the proprietary tools available or manually pull a log file report to 
understand the trends and gauge auction spread overtime to assess the impact of 
variable auction dynamics.\n\n\n\n\n\n\n",
>"hide_section_b":false
> 
> 
> As you can see, it has used the stemming correctly and brings results for 
other words based in the root, in this case “Identify”.
> 
> However, if I search for “Identification”, this is the output:
> 
> Analyzer (imagelink):
> 
> https://1drv.ms/u/s!AlRTlFq8tQbShd49RpiQObzMgSjVhA
 
> 
> 
> Even with proper stemming, solr is only bringing results for the word 
identification (or identifications) but nothing else.
> 
> The queries are over the same field that has the Porter Stemming Filter 
applied for both, query and index. This behavior is consistent with other ‘ion’ 
ended nouns: representation, modification, etc.
> 
> Solr Version: 8.1. Does anyone know why is it happening? Is it a bug?
> 
> Thanks.
> 
> 
> 
> 
> 
> -----Original Message-----
>
> From: Erick Erickson 
>
> Sent: Thursday, April 30, 2020 1:47 PM
> 
> To: solr-user@lucene.apache.org
> 
> Subject: Re: Possible issue with Stemming and nouns ended with suffix 
'ion'
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> The mail server is pretty aggressive about stripping links, so we can’t 
see the images.
> 
> 
> 
> Could you put them somewhere and paste a link?
> 
> 
> 
> Best,
> 
> Erick
> 
> 
> 
>> On Apr 30, 2020, at 2:40 PM, Jhonny Lopez 
 wrote:
> 
>> 
> 
>> We’re facing an issue with stemming in solr. Most of the cases are 
working correctly, for example, if we search for bidding, solr brings 

Re: Possible issue with Stemming and nouns ended with suffix 'ion'

2020-04-30 Thread matthew sporleder
If you use the stemmer in your query analysis it should act the same, right?

On Thu, Apr 30, 2020 at 3:54 PM Erick Erickson  wrote:
>
> They are being stemmed to two different tokens, “identif” and “identifi”. 
> Stemming is algorithmic and imperfect and in this case you’re getting bitten 
> by that algorithm. It looks like you’re using PorterStemFilter, if you want 
> you can look up the exact algorithm, but I don’t think it’s a bug, just one 
> of those little joys of English...
>
> To get a clearer picture of exactly what’s being searched, try adding 
> &debug=query to your query, in particular looking at the parsed query that’s
> returned. That’ll tell you a bunch. In this particular case I don’t think 
> it’ll tell you anything more, but for future…
>
> Best,
> Erick
>
> Oh, and un-checking the ‘verbose’ box on the analysis page removes a lot of
> distraction, the detailed information is often TMI ;)
>
> > On Apr 30, 2020, at 2:51 PM, Jhonny Lopez  
> > wrote:
> >
> > Sure, rewriting the message with links for images:
> >
> >
> > We’re facing an issue with stemming in Solr. Most cases work
> > correctly; for example, if we search for bidding, Solr brings results
> > for bidding, bid, bids, etc. However, with nouns ending in the ‘ion’
> > suffix, stemming is not working. Even when the analyzer seems to stem
> > the word correctly, the results do not reflect that. One example: if
> > I search for ‘identifying’, this is the output:
> >
> > Analyzer (image link):
> > https://1drv.ms/u/s!AlRTlFq8tQbShd4-Cp40Cmc0QioS0A?e=1f3GJp
> >
> > A clip of results:
> > "haschildren_b":false,
> >"isbucket_text_s":"0",
> >"sectionbody_t":"\n\n\nIn order to identify 1st price auctions, 
> > leverage the proprietary tools available or manually pull a log file report 
> > to understand the trends and gauge auction spread overtime to assess the 
> > impact of variable auction dynamics.\n\n\n\n\n\n\n",
> >"parsedupdatedby_s":"sitecorecarvaini",
> >"sectionbody_t_en":"\n\n\nIn order to identify 1st price auctions, 
> > leverage the proprietary tools available or manually pull a log file report 
> > to understand the trends and gauge auction spread overtime to assess the 
> > impact of variable auction dynamics.\n\n\n\n\n\n\n",
> >"hide_section_b":false
> >
> >
> > As you can see, it has used the stemming correctly and brings results for 
> > other words based in the root, in this case “Identify”.
> >
> > However, if I search for “Identification”, this is the output:
> >
> > Analyzer (imagelink):
> > https://1drv.ms/u/s!AlRTlFq8tQbShd49RpiQObzMgSjVhA
> >
> >
> > Even with proper stemming, solr is only bringing results for the word 
> > identification (or identifications) but nothing else.
> >
> > The queries are over the same field that has the Porter Stemming Filter 
> > applied for both, query and index. This behavior is consistent with other 
> > ‘ion’ ended nouns: representation, modification, etc.
> >
> > Solr Version: 8.1. Does anyone know why is it happening? Is it a bug?
> >
> > Thanks.
> >
> >
> >
> >
> >
> > -----Original Message-----
> >
> > From: Erick Erickson 
> >
> > Sent: Thursday, April 30, 2020 1:47 PM
> >
> > To: solr-user@lucene.apache.org
> >
> > Subject: Re: Possible issue with Stemming and nouns ended with suffix 'ion'
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > The mail server is pretty aggressive about stripping links, so we can’t see 
> > the images.
> >
> >
> >
> > Could you put them somewhere and paste a link?
> >
> >
> >
> > Best,
> >
> > Erick
> >
> >
> >
> >> On Apr 30, 2020, at 2:40 PM, Jhonny Lopez  
> >> wrote:
> >
> >>
> >
> >> We’re facing an issue with stemming in Solr. Most cases work
> >> correctly; for example, if we search for bidding, Solr brings results
> >> for bidding, bid, bids, etc. However, with nouns ending in the ‘ion’
> >> suffix, stemming is not working. Even when the analyzer seems to stem
> >> the word correctly, the results do not reflect that. One example: if
> >> I search for ‘identifying’, this is the output:
> >
> >>
> >
> >> Analyzer (image):
> >
> >>
> >
> >> A clip of results:
> >
> >> "haschildren_b":false,
> >
> >>"isbucket_text_s":"0",
> >
> >>"sectionbody_t":"\n\n\nIn order to identify 1st price auctions, 
> >> leverage the proprietary tools available or manually pull a log file 
> >> report to understand the trends and gauge auction spread overtime to 
> >> assess the impact of variable auction dynamics.\n\n\n\n\n\n\n",
> >
> >>"parsedupdatedby_s":"sitecorecarvaini",
> >
> >>"sectionbody_t_en":"\n\n\nIn order 

Re: Solr fields mapping

2020-04-30 Thread matthew sporleder
fl=createdByMap:concat("createdBy.userName:
",createdBy.userName,",","createdBy.name: ",createdBy.name," ...)

On Thu, Apr 30, 2020 at 3:20 PM sambasivarao giddaluri
 wrote:
>
> Hi Audrey,
>
> Yes, I am aware of copyField, but it does not fit my use case. The reason is
> that in the output we have to show each field with its value; copyField
> combines the values, so we no longer know the field-to-value
> relationship.
>
> regards
> sam
>
> On Wed, Apr 29, 2020 at 9:53 AM Audrey Lorberfeld -
> audrey.lorberf...@ibm.com  wrote:
>
> > Hi, Sam!
> >
> > Have you tried creating a copyField?
> > https://builds.apache.org/view/L/view/Lucene/job/Solr-reference-guide-8.x/javadoc/copying-fields.html
> >
> > Best,
> > Audrey
> >
> > On 4/28/20, 1:07 PM, "sambasivarao giddaluri" <
> > sambasiva.giddal...@gmail.com> wrote:
> >
> > Hi All,
> > Is there a way we can map fields in a single field?
> > Ex: scheme has below fields
> > createdBy.userName
> > createdBy.name
> > createdBy.email
> >
> > If have to retrieve these fields need to pass all the three fields in
> > *fl*
> > parameter  instead is there a way i can have a map or a object of these
> > fields in to createdBy and in fl i pass only createdBy and get all
> > these 3
> > as output
> >
> > Regards
> > sam
> >
> >
> >
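If the grouping can happen on the client side, one option is to fetch the flat fields (fl accepts glob patterns, so something like fl=createdBy.* may be enough) and fold them into a nested map after the response arrives. A minimal Python sketch, assuming the dotted names come back as flat keys in each document; the function name group_by_prefix and the sample document are hypothetical.

```python
def group_by_prefix(doc, prefix, sep="."):
    """Fold flat keys like 'createdBy.userName' into doc['createdBy']['userName']."""
    grouped = {}
    rest = {}
    marker = prefix + sep
    for key, value in doc.items():
        if key.startswith(marker):
            # Strip the 'createdBy.' part and keep the sub-field name.
            grouped[key[len(marker):]] = value
        else:
            rest[key] = value
    if grouped:
        rest[prefix] = grouped
    return rest


# Hypothetical document, shaped the way Solr might return it.
flat_doc = {
    "id": "42",
    "createdBy.userName": "sam",
    "createdBy.name": "Sam G",
    "createdBy.email": "sam@example.com",
}
nested_doc = group_by_prefix(flat_doc, "createdBy")
print(nested_doc)
```

This keeps the field-to-value relationship that copyField would lose, at the cost of doing the mapping outside Solr.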


Re: Possible issue with Stemming and nouns ended with suffix 'ion'

2020-04-30 Thread Erick Erickson
They are being stemmed to two different tokens, “identif” and “identifi”. 
Stemming is algorithmic and imperfect and in this case you’re getting bitten by 
that algorithm. It looks like you’re using PorterStemFilter, if you want you 
can look up the exact algorithm, but I don’t think it’s a bug, just one of 
those little joys of English...
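To make the mismatch concrete, here is a toy Python sketch. The stems are hard-coded to what the Porter algorithm is understood to produce for these words (an assumption, not output captured from Solr), and the tiny inverted index only stands in for the real one.

```python
# Stems hard-coded for illustration; assumed to match Porter's output.
PORTER_STEMS = {
    "identify": "identifi",
    "identifying": "identifi",
    "identification": "identif",
    "identifications": "identif",
}


def stem(word):
    """Look up the illustrative stem; unknown words are left unchanged."""
    word = word.lower()
    return PORTER_STEMS.get(word, word)


def build_index(docs):
    """Map each stemmed token to the set of doc ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.split():
            index.setdefault(stem(word), set()).add(doc_id)
    return index


def search(index, term):
    """Stem the query term the same way and return matching doc ids."""
    return sorted(index.get(stem(term), set()))


docs = {
    1: "in order to identify first price auctions",
    2: "identification of auction trends",
}
index = build_index(docs)
print(search(index, "identifying"))     # finds doc 1 via "identifi"
print(search(index, "Identification"))  # finds doc 2 via "identif"
```

Both sides are stemmed consistently, yet the two word families land on different tokens, so neither query reaches the other document.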

To get a clearer picture of exactly what’s being searched, try adding 
&debug=query to your query, in particular looking at the parsed query that’s 
returned. That’ll tell you a bunch. In this particular case I don’t think it’ll 
tell you anything more, but for future…
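As a rough illustration, a request with the debug parameter could be built like this in Python. The host, the core name "mycore" and the query string are placeholders, not values from this thread; the parsed query should then show up in the debug section of the response.

```python
from urllib.parse import urlencode


def debug_query_url(base_url, core, query):
    """Build a /select URL that asks Solr to include query debug output."""
    params = {"q": query, "debug": "query"}
    return f"{base_url}/{core}/select?{urlencode(params)}"


# Placeholder host, core and field; adjust to the real collection.
url = debug_query_url(
    "http://localhost:8983/solr", "mycore", "sectionbody_t_en:identification"
)
print(url)
```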

Best,
Erick

Oh, and un-checking the ‘verbose’ box on the analysis page removes a lot of 
distraction; the detailed information is often TMI ;)

> On Apr 30, 2020, at 2:51 PM, Jhonny Lopez  
> wrote:
> 
> Sure, rewriting the message with links for images:
> 
> 
> We’re facing an issue with stemming in solr. Most of the cases are working 
> correctly, for example, if we search for bidding, solr brings results for 
> bidding, bid, bids, etc. However, with nouns ended with ‘ion’ suffix, 
> stemming is not working. Even when analyzers seems to have correct stemming 
> of the word, the results are not reflecting that. One example. If I search 
> ‘identifying’, this is the output:
> 
> Analyzer (image link):
> https://1drv.ms/u/s!AlRTlFq8tQbShd4-Cp40Cmc0QioS0A?e=1f3GJp
> 
> A clip of results:
> "haschildren_b":false,
>"isbucket_text_s":"0",
>"sectionbody_t":"\n\n\nIn order to identify 1st price auctions, 
> leverage the proprietary tools available or manually pull a log file report 
> to understand the trends and gauge auction spread overtime to assess the 
> impact of variable auction dynamics.\n\n\n\n\n\n\n",
>"parsedupdatedby_s":"sitecorecarvaini",
>"sectionbody_t_en":"\n\n\nIn order to identify 1st price auctions, 
> leverage the proprietary tools available or manually pull a log file report 
> to understand the trends and gauge auction spread overtime to assess the 
> impact of variable auction dynamics.\n\n\n\n\n\n\n",
>"hide_section_b":false
> 
> 
> As you can see, it has used the stemming correctly and brings results for 
> other words based in the root, in this case “Identify”.
> 
> However, if I search for “Identification”, this is the output:
> 
> Analyzer (imagelink):
> https://1drv.ms/u/s!AlRTlFq8tQbShd49RpiQObzMgSjVhA
> 
> 
> Even with proper stemming, solr is only bringing results for the word 
> identification (or identifications) but nothing else.
> 
> The queries are over the same field that has the Porter Stemming Filter 
> applied for both, query and index. This behavior is consistent with other 
> ‘ion’ ended nouns: representation, modification, etc.
> 
> Solr Version: 8.1. Does anyone know why is it happening? Is it a bug?
> 
> Thanks.
> 
> 
> 
> 
> 
> -Original Message-
> 
> From: Erick Erickson 
> 
> Sent: jueves, 30 de abril de 2020 1:47 p. m.
> 
> To: solr-user@lucene.apache.org
> 
> Subject: Re: Possible issue with Stemming and nouns ended with suffix 'ion'
> 
> 
> 
> The mail server is pretty aggressive about stripping links, so we can’t see 
> the images.
> 
> 
> 
> Could you put them somewhere and paste a link?
> 
> 
> 
> Best,
> 
> Erick
> 
> 
> 
>> On Apr 30, 2020, at 2:40 PM, Jhonny Lopez  
>> wrote:
> 
>> 
> 
>> We’re facing an issue with stemming in solr. Most of the cases are working 
>> correctly, for example, if we search for bidding, solr brings results for 
>> bidding, bid, bids, etc. However, with nouns ended with ‘ion’ suffix, 
>> stemming is not working. Even when analyzers seems to have correct stemming 
>> of the word, the results are not reflecting that. One example. If I search 
>> ‘identifying’, this is the output:
> 
>> 
> 
>> Analyzer (image):
> 
>> 
> 
>> A clip of results:
> 
>> "haschildren_b":false,
> 
>>"isbucket_text_s":"0",
> 
>>"sectionbody_t":"\n\n\nIn order to identify 1st price auctions, 
>> leverage the proprietary tools available or manually pull a log file report 
>> to understand the trends and gauge auction spread overtime to assess the 
>> impact of variable auction dynamics.\n\n\n\n\n\n\n",
> 
>>"parsedupdatedby_s":"sitecorecarvaini",
> 
>>"sectionbody_t_en":"\n\n\nIn order to identify 1st price auctions, 
>> leverage the proprietary tools available or manually pull a log file report 
>> to understand the trends and gauge auction spread overtime to assess the 
>> impact of variable auction dynamics.\n\n\n\n\n\n\n",
> 
>>"hide_section_b":false
> 
>> 
> 
>> 
> 
>> As you can see, it has used the stemming correctly and 

Re: Solr fields mapping

2020-04-30 Thread sambasivarao giddaluri
Hi Audrey,

Yes, I am aware of copyField, but it does not fit my use case. The reason is
that in the output we have to show each field with its value. copyField
combines the values, so we no longer know which value belongs to which
field.

regards
sam

On Wed, Apr 29, 2020 at 9:53 AM Audrey Lorberfeld -
audrey.lorberf...@ibm.com  wrote:

> Hi, Sam!
>
> Have you tried creating a copyField?
> https://builds.apache.org/view/L/view/Lucene/job/Solr-reference-guide-8.x/javadoc/copying-fields.html
>
> Best,
> Audrey
>
> On 4/28/20, 1:07 PM, "sambasivarao giddaluri" <
> sambasiva.giddal...@gmail.com> wrote:
>
> Hi All,
> Is there a way we can map fields in a single field?
> Ex: scheme has below fields
> createdBy.userName
> createdBy.name
> createdBy.email
>
> If have to retrieve these fields need to pass all the three fields in
> *fl*
> parameter  instead is there a way i can have a map or a object of these
> fields in to createdBy and in fl i pass only createdBy and get all
> these 3
> as output
>
> Regards
> sam
>
>
>


RE: Possible issue with Stemming and nouns ended with suffix 'ion'

2020-04-30 Thread Jhonny Lopez
Sure, rewriting the message with links for images:


We’re facing an issue with stemming in Solr. Most cases work correctly: for 
example, if we search for bidding, Solr brings results for bidding, bid, bids, 
etc. However, with nouns ending in the ‘ion’ suffix, stemming is not working. 
Even when the analyzer seems to stem the word correctly, the results do not 
reflect that. One example: if I search ‘identifying’, this is the output:

Analyzer (image link):
https://1drv.ms/u/s!AlRTlFq8tQbShd4-Cp40Cmc0QioS0A?e=1f3GJp

A clip of results:
"haschildren_b":false,
"isbucket_text_s":"0",
"sectionbody_t":"\n\n\nIn order to identify 1st price auctions, 
leverage the proprietary tools available or manually pull a log file report to 
understand the trends and gauge auction spread overtime to assess the impact of 
variable auction dynamics.\n\n\n\n\n\n\n",
"parsedupdatedby_s":"sitecorecarvaini",
"sectionbody_t_en":"\n\n\nIn order to identify 1st price auctions, 
leverage the proprietary tools available or manually pull a log file report to 
understand the trends and gauge auction spread overtime to assess the impact of 
variable auction dynamics.\n\n\n\n\n\n\n",
"hide_section_b":false


As you can see, it has used the stemming correctly and brings results for other 
words based on the root, in this case “Identify”.

However, if I search for “Identification”, this is the output:

Analyzer (imagelink):
https://1drv.ms/u/s!AlRTlFq8tQbShd49RpiQObzMgSjVhA


Even with proper stemming, solr is only bringing results for the word 
identification (or identifications) but nothing else.

The queries are over the same field, which has the Porter Stemming Filter 
applied for both query and index. This behavior is consistent with other nouns 
ending in ‘ion’: representation, modification, etc.

Solr Version: 8.1. Does anyone know why this is happening? Is it a bug?

Thanks.





-Original Message-

From: Erick Erickson 

Sent: jueves, 30 de abril de 2020 1:47 p. m.

To: solr-user@lucene.apache.org

Subject: Re: Possible issue with Stemming and nouns ended with suffix 'ion'



The mail server is pretty aggressive about stripping links, so we can’t see the 
images.



Could you put them somewhere and paste a link?



Best,

Erick



> On Apr 30, 2020, at 2:40 PM, Jhonny Lopez  
> wrote:

>

> We’re facing an issue with stemming in solr. Most of the cases are working 
> correctly, for example, if we search for bidding, solr brings results for 
> bidding, bid, bids, etc. However, with nouns ended with ‘ion’ suffix, 
> stemming is not working. Even when analyzers seems to have correct stemming 
> of the word, the results are not reflecting that. One example. If I search 
> ‘identifying’, this is the output:

>

> Analyzer (image):

>

> A clip of results:

> "haschildren_b":false,

> "isbucket_text_s":"0",

> "sectionbody_t":"\n\n\nIn order to identify 1st price auctions, 
> leverage the proprietary tools available or manually pull a log file report 
> to understand the trends and gauge auction spread overtime to assess the 
> impact of variable auction dynamics.\n\n\n\n\n\n\n",

> "parsedupdatedby_s":"sitecorecarvaini",

> "sectionbody_t_en":"\n\n\nIn order to identify 1st price auctions, 
> leverage the proprietary tools available or manually pull a log file report 
> to understand the trends and gauge auction spread overtime to assess the 
> impact of variable auction dynamics.\n\n\n\n\n\n\n",

> "hide_section_b":false

>

>

> As you can see, it has used the stemming correctly and brings results for 
> other words based in the root, in this case “Identify”.

>

> However, if I search for “Identification”, this is the output:

>

> Analyzer (image):

>

> Even with proper stemming, solr is only bringing results for the word 
> identification (or identifications) but nothing else.

>

> The queries are over the same field that has the Porter Stemming Filter 
> applied for both, query and index. This behavior is consistent with other 
> ‘ion’ ended nouns: representation, modification, etc.

>

> Solr Version: 8.1. Does anyone know why is it happening? Is it a bug?

>

> Thanks.

>

>

>

>

>   Jhonny Lopez

>   Technical Architect

>   Avenida Calle 26 No. 92 - 32, Edificio BTS3

>   APDO. 128-1255 Bogota

>   T: +573006805461

>   jhonny.lo...@publicismedia.com

>   www.prodigious.com

>

>

>

>

>

>

> --

> -- Disclaimer The information in this email and any attachments may

> contain proprietary and confidential information that is intended 

Re: Possible issue with Stemming and nouns ended with suffix 'ion'

2020-04-30 Thread Erick Erickson
The mail server is pretty aggressive about stripping links, so we can’t see the 
images.

Could you put them somewhere and paste a link?

Best,
Erick

> On Apr 30, 2020, at 2:40 PM, Jhonny Lopez  
> wrote:
> 
> We’re facing an issue with stemming in solr. Most of the cases are working 
> correctly, for example, if we search for bidding, solr brings results for 
> bidding, bid, bids, etc. However, with nouns ended with ‘ion’ suffix, 
> stemming is not working. Even when analyzers seems to have correct stemming 
> of the word, the results are not reflecting that. One example. If I search 
> ‘identifying’, this is the output:
>  
> Analyzer (image):
> 
> A clip of results:
> "haschildren_b":false,
> "isbucket_text_s":"0",
> "sectionbody_t":"\n\n\nIn order to identify 1st price auctions, 
> leverage the proprietary tools available or manually pull a log file report 
> to understand the trends and gauge auction spread overtime to assess the 
> impact of variable auction dynamics.\n\n\n\n\n\n\n",
> "parsedupdatedby_s":"sitecorecarvaini",
> "sectionbody_t_en":"\n\n\nIn order to identify 1st price auctions, 
> leverage the proprietary tools available or manually pull a log file report 
> to understand the trends and gauge auction spread overtime to assess the 
> impact of variable auction dynamics.\n\n\n\n\n\n\n",
> "hide_section_b":false
>  
>  
> As you can see, it has used the stemming correctly and brings results for 
> other words based in the root, in this case “Identify”.
>  
> However, if I search for “Identification”, this is the output:
>  
> Analyzer (image):
> 
> Even with proper stemming, solr is only bringing results for the word 
> identification (or identifications) but nothing else.
>  
> The queries are over the same field that has the Porter Stemming Filter 
> applied for both, query and index. This behavior is consistent with other 
> ‘ion’ ended nouns: representation, modification, etc.
>  
> Solr Version: 8.1. Does anyone know why is it happening? Is it a bug?
>  
> Thanks.
>  
>  
>  
> 
>   Jhonny Lopez
>   Technical Architect 
>   Avenida Calle 26 No. 92 - 32, Edificio BTS3
>   APDO. 128-1255 Bogota
>   T: +573006805461
>   jhonny.lo...@publicismedia.com
>   www.prodigious.com
>  
>  
> 
> 
> 
> 
>  



Possible issue with Stemming and nouns ended with suffix 'ion'

2020-04-30 Thread Jhonny Lopez
We're facing an issue with stemming in Solr. Most cases work correctly: for 
example, if we search for bidding, Solr brings results for bidding, bid, bids, 
etc. However, with nouns ending in the 'ion' suffix, stemming is not working. 
Even when the analyzer seems to stem the word correctly, the results do not 
reflect that. One example: if I search 'identifying', this is the output:

Analyzer (image):
A clip of results:
"haschildren_b":false,
"isbucket_text_s":"0",
"sectionbody_t":"\n\n\nIn order to identify 1st price auctions, 
leverage the proprietary tools available or manually pull a log file report to 
understand the trends and gauge auction spread overtime to assess the impact of 
variable auction dynamics.\n\n\n\n\n\n\n",
"parsedupdatedby_s":"sitecorecarvaini",
"sectionbody_t_en":"\n\n\nIn order to identify 1st price auctions, 
leverage the proprietary tools available or manually pull a log file report to 
understand the trends and gauge auction spread overtime to assess the impact of 
variable auction dynamics.\n\n\n\n\n\n\n",
"hide_section_b":false


As you can see, it has used the stemming correctly and brings results for other 
words based on the root, in this case "Identify".

However, if I search for "Identification", this is the output:

Analyzer (image):
Even with proper stemming, solr is only bringing results for the word 
identification (or identifications) but nothing else.

The queries are over the same field, which has the Porter Stemming Filter 
applied for both query and index. This behavior is consistent with other nouns 
ending in 'ion': representation, modification, etc.

Solr Version: 8.1. Does anyone know why this is happening? Is it a bug?

Thanks.



  Jhonny Lopez
  Technical Architect
  Avenida Calle 26 No. 92 - 32, Edificio BTS3
  APDO. 128-1255 Bogota
  T: +573006805461
  jhonny.lo...@publicismedia.com
  www.prodigious.com







Disclaimer The information in this email and any attachments may contain 
proprietary and confidential information that is intended for the addressee(s) 
only. If you are not the intended recipient, you are hereby notified that any 
disclosure, copying, distribution, retention or use of the contents of this 
information is prohibited. When addressed to our clients or vendors, any 
information contained in this e-mail or any attachments is subject to the terms 
and conditions in any governing contract. If you have received this e-mail in 
error, please immediately contact the sender and delete the e-mail.


Re: Delete on 8.5.1

2020-04-30 Thread Joe Obernberger
Hi All - while I'm still getting the error, the delete does appear to work: 
a search of the data afterwards shows fewer results.  In some cases, it may be 
necessary to run the query several times.
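One way to script "run it several times" is a small loop that repeats the delete and checks how many matching documents remain after each attempt. This is only a Python sketch: delete_fn and count_fn are hypothetical callables that would wrap the real delete-by-query request and a count query, and the fake backend below exists purely to exercise the loop.

```python
import time


def delete_until_gone(delete_fn, count_fn, max_attempts=5, pause_seconds=0.0):
    """Repeat a delete until count_fn() reports 0 docs or attempts run out.

    Returns the number of attempts used, or None if documents remain.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            delete_fn()
        except Exception as exc:
            # The request may report an error even though the delete took effect.
            print(f"attempt {attempt}: delete raised {exc!r}")
        if count_fn() == 0:
            return attempt
        time.sleep(pause_seconds)
    return None


# Simulated backend: each call removes part of the data, then raises.
remaining = [30]


def fake_delete():
    remaining[0] = max(0, remaining[0] - 10)
    raise RuntimeError("Task queue processing has stalled")


attempts = delete_until_gone(fake_delete, lambda: remaining[0])
print(attempts)
```

Checking the count rather than trusting the response status mirrors what was observed here: the error is returned, but the data shrinks.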


-Joe

On 4/29/2020 9:03 AM, Joe Obernberger wrote:
Hi - I also tried deleting from solrj (8.5.1) using 
CloudSolrClient.deleteByQuery.


This results in:

Error: 
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
Error from server at http://paradigm8:9100/solr/PROCESSOR_LOGS: Async 
exception during distributed update: Error from server at 
http://paradigm7:9100/solr/PROCESSOR_LOGS_shard6_replica_n20/: null




request: http://paradigm7:9100/solr/PROCESSOR_LOGS_shard6_replica_n20/
Remote error message: Task queue processing has stalled for 20203 ms 
with 0 remaining elements to process.
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
Error from server at http://paradigm8:9100/solr/PROCESSOR_LOGS: Async 
exception during distributed update: Error from server at 
http://paradigm7:9100/solr/PROCESSOR_LOGS_shard6_replica_n20/: null




request: http://paradigm7:9100/solr/PROCESSOR_LOGS_shard6_replica_n20/
Remote error message: Task queue processing has stalled for 20203 ms 
with 0 remaining elements to process.
    at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:665)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:265)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248)
    at org.apache.solr.client.solrj.impl.LBSolrClient.doRequest(LBSolrClient.java:368)
    at org.apache.solr.client.solrj.impl.LBSolrClient.request(LBSolrClient.java:296)
    at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.sendRequest(BaseCloudSolrClient.java:1143)
    at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.requestWithRetryOnStaleState(BaseCloudSolrClient.java:906)
    at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.request(BaseCloudSolrClient.java:838)
    at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211)
    at org.apache.solr.client.solrj.SolrClient.deleteByQuery(SolrClient.java:940)
    at org.apache.solr.client.solrj.SolrClient.deleteByQuery(SolrClient.java:903)
    at com.ngc.bigdata.solrsearcher.SearcherThread.doSearch(SearcherThread.java:401)
    at com.ngc.bigdata.solrsearcher.SearcherThread.run(SearcherThread.java:125)
    at com.ngc.bigdata.solrsearcher.Worker.doSearchTest(Worker.java:145)
    at com.ngc.bigdata.solrsearcher.SolrSearcher.main(SolrSearcher.java:60)



On 4/28/2020 11:50 AM, Joe Obernberger wrote:
Hi all - I'm running this query on solr cloud 8.5.1 with the index on 
HDFS:


curl "http://enceladus:9100/solr/PROCESSOR_LOGS/update?commit=true" \
  -H "Content-Type: text/xml" \
  --data-binary '<delete><query>StartTime:[2020-01-01T01:02:43Z TO 2020-04-25T00:00:00Z]</query></delete>'
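For reference, the XML form of a delete-by-query wraps the query in delete and query elements. A small Python sketch of building that body, so that special characters in the query are escaped properly; the helper name delete_by_query_xml is made up for illustration.

```python
import xml.etree.ElementTree as ET


def delete_by_query_xml(query):
    """Return the XML body for a Solr delete-by-query update request."""
    root = ET.Element("delete")
    ET.SubElement(root, "query").text = query
    return ET.tostring(root, encoding="unicode")


body = delete_by_query_xml(
    "StartTime:[2020-01-01T01:02:43Z TO 2020-04-25T00:00:00Z]"
)
print(body)
```

The resulting string is what would be sent as the request body to the /update handler with a Content-Type of text/xml.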


getting this response:





  1
  500
  54091


  
    name="error-class">org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException
    name="root-error-class">org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException

  
  2 Async exceptions during distributed update:
Error from server at 
http://paradigm8:9100/solr/PROCESSOR_LOGS_shard2_replica_n4/: null




request: http://paradigm8:9100/solr/PROCESSOR_LOGS_shard2_replica_n4/
Remote error message: Task queue processing has stalled for 20193 ms 
with 0 remaining elements to process.
Error from server at 
http://belinda:9100/solr/PROCESSOR_LOGS_shard10_replica_n38/: null




request: http://belinda:9100/solr/PROCESSOR_LOGS_shard10_replica_n38/
Remote error message: Task queue processing has stalled for 20021 ms 
with 0 remaining elements to process.
  name="trace">org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException: 
2 Async exceptions during distributed update:
Error from server at 
http://paradigm8:9100/solr/PROCESSOR_LOGS_shard2_replica_n4/: null




request: http://paradigm8:9100/solr/PROCESSOR_LOGS_shard2_replica_n4/
Remote error message: Task queue processing has stalled for 20193 ms 
with 0 remaining elements to process.
Error from server at 
http://belinda:9100/solr/PROCESSOR_LOGS_shard10_replica_n38/: null




request: http://belinda:9100/solr/PROCESSOR_LOGS_shard10_replica_n38/
Remote error message: Task queue processing has stalled for 20021 ms 
with 0 remaining elements to process.
    at 
org.apache.solr.update.processor.DistributedZkUpdateProcessor.doDistribFinish(DistributedZkUpdateProcessor.java:1189)
    at 
org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1096)
    at 
org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:182)
    at 

MoreLikeThis only returns id and score issue

2020-04-30 Thread derrick cui
Hi,
I want to return more fields in the MoreLikeThis response. How can I achieve that?
Currently the main doc returns all fields, but the MoreLikeThis results only have id 
and score. Please help.
Thanks



Re: off-heap OOM

2020-04-30 Thread Mikhail Khludnev
Raji, what exactly does that "OOM for solr occur in every 5 days." look like?
What is the error message? Where exactly is it occurring?

On Thu, Apr 30, 2020 at 1:30 AM Raji N  wrote:

> Thanks so much Jan. Will try your suggestions , yes we are also running
> solr inside docker.
>
> Thanks,
> Raji
>
> On Wed, Apr 29, 2020 at 1:46 PM Jan Høydahl  wrote:
>
> > I have seen the same, but only in Docker.
> > I think it does not relate to Solr’s off-heap usage for filters and other
> > data structures, but rather how Docker treats memory-mapped files as
> > virtual memory.
> > As you know, when using MMapDirectoryFactory, you actually let Linux
> > handle the loading and unloading of the index files, and Solr will access
> > them as if they were in a huge virtual memory pool. Naturally the index
> > files grow large, and there is something strange going on in the way Docker
> > handles this, leading to OOM, not for Java heap but for the process.
> >
> > I have no definitive answer, but so far my research has found a few
> > possible settings
> >
> > Set env.var MALLOC_ARENA_MAX=2
> > Try to limit -XX:MaxDirectMemorySize
> > Lower mem swappiness in Docker (--memory-swappiness 0)
> > More generic insight into java mem allocation in Docker:
> > https://dzone.com/articles/native-memory-allocation-in-examples
> >
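As a rough illustration of how the three settings above might be combined when starting the container, here is a Python sketch that assembles a docker run command line. The image tag, the 2g direct-memory limit, and passing the JVM flag through a SOLR_OPTS environment variable are all assumptions made for the example, not values recommended in this thread.

```python
def docker_run_args(image, max_direct_memory, malloc_arena_max=2, swappiness=0):
    """Assemble a docker run argv list carrying the three suggested settings."""
    return [
        "docker", "run", "-d",
        # Limit the number of glibc malloc arenas.
        "-e", f"MALLOC_ARENA_MAX={malloc_arena_max}",
        # Assumed: the Solr start script picks up extra JVM flags from SOLR_OPTS.
        "-e", f"SOLR_OPTS=-XX:MaxDirectMemorySize={max_direct_memory}",
        # Lower the container's swappiness.
        "--memory-swappiness", str(swappiness),
        image,
    ]


# Hypothetical image tag and limit, for illustration only.
args = docker_run_args("solr:7.6.0", "2g")
print(" ".join(args))
```

Building the list in code makes it easy to vary one setting at a time and compare native memory growth between runs.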
> > Have not yet found a silver bullet, so very interested in this thread.
> >
> > Jan
> >
> > > 29. apr. 2020 kl. 19:26 skrev Raji N :
> > >
> > > Thank you for your reply.  When OOM happens somehow it doesn't generate
> > > a dump file. So we have hourly heaps running to diagnose this issue. Heap is
> > > around 700MB and threads around 150. But 29GB of native memory is used up;
> > > it is consumed by java.nio.DirectByteBufferR (27GB major consumption) and
> > > java.nio.DirectByteBuffer objects.
> > >
> > > We use solr 7.6.0 in solrcloud mode and OS is alpine . Java version
> > >
> > > java -version
> > >
> > > Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8
> > >
> > > java version "1.8.0_211"
> > >
> > > Java(TM) SE Runtime Environment (build 1.8.0_211-b12)
> > >
> > > Java HotSpot(TM) 64-Bit Server VM (build 25.211-b12, mixed mode)
> > >
> > >
> > >
> > > Thanks much for taking a look at it.
> > >
> > > Raji
> > >
> > >
> > >
> > > On Wed, Apr 29, 2020 at 10:04 AM Shawn Heisey wrote:
> > >
> > >> On 4/29/2020 2:07 AM, Raji N wrote:
> > >>> Has anyone encountered off-heap OOM? We are thinking of reducing heap
> > >>> further and increasing the hard commit interval. Any other suggestions?
> > >>> Please share your thoughts.
> > >>
> > >> It sounds like it's not heap memory that's running out.
> > >>
> > >> When the OutOfMemoryError is logged, it will also contain a message
> > >> mentioning which resource ran out.
> > >>
> > >> A common message that might be logged with the OOME is "Unable to create
> > >> native thread".  This type of error, if that's what's happening,
> > >> actually has nothing at all to do with memory, OOME is just how Java
> > >> happens to report it.
> > >>
> > >> You will need to know exactly which resource is running out before we
> > >> can offer any assistance.
> > >>
> > >> If the OOME is logged, the message you're looking for will be in the
> > >> solr log, not the tiny special log that is created when Solr is killed
> > >> by an OOME.  What version of Solr are you running, and what OS is it
> > >> running on?
> > >>
> > >> Thanks,
> > >> Shawn
> > >>
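To find out which resource ran out, the reason text that follows each OutOfMemoryError in the Solr log can be tallied. A minimal Python sketch; the sample log lines are invented for the example and the real log layout may differ.

```python
import re

# The JVM reports the exhausted resource after the exception class name.
OOME_PATTERN = re.compile(r"java\.lang\.OutOfMemoryError: (.+)")


def oome_reasons(log_lines):
    """Count the reason text that follows each OutOfMemoryError in the log."""
    counts = {}
    for line in log_lines:
        match = OOME_PATTERN.search(line)
        if match:
            reason = match.group(1).strip()
            counts[reason] = counts.get(reason, 0) + 1
    return counts


# Invented sample lines, not taken from a real Solr log.
sample = [
    "2020-04-29 02:07:11 ERROR ... java.lang.OutOfMemoryError: Direct buffer memory",
    "2020-04-29 02:07:12 INFO  ... commit",
    "2020-04-29 02:09:40 ERROR ... java.lang.OutOfMemoryError: unable to create new native thread",
    "2020-04-29 02:11:02 ERROR ... java.lang.OutOfMemoryError: Direct buffer memory",
]
print(oome_reasons(sample))
```

A reason such as "Java heap space" points at the heap, while "Direct buffer memory" or a native-thread message points elsewhere, which changes what is worth tuning.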
> >
> >
>


-- 
Sincerely yours
Mikhail Khludnev