Max number of cores in solr

2019-08-28 Thread Vignan Malyala
Hi
I'm planning to create a separate core for each of my clients in Solr.
Can I create around 500 cores in Solr? Is it a good idea?
For each client I have around 10 records on average currently.

How much physical memory might it consume? Please help with this.
Thank you


Re: Require searching only for file content and not metadata

2019-08-28 Thread Jörn Franke
You need to provide a little more detail. What is your schema? How is the
document structured? Where do you get the metadata from?

Have you read the Solr reference guide? Have you read a book about Solr?

> On 28.08.2019 at 08:10, Khare, Kushal (MIND) wrote:
> 
> Could anyone please help me with how to use this approach ? I humbly request 
> all the users to please help me get through this.
> Thanks !
> 
> -Original Message-
> From: Yogendra Kumar Soni [mailto:yogendra.ku...@dolcera.com]
> Sent: 28 August 2019 04:08
> To: solr-user@lucene.apache.org
> Subject: Re: Require searching only for file content and not metadata
> 
> It will be easier to parse documents create content, metadata and other 
> required fields yourself in place of using default post tool. You will have 
> better control on what is going to  which field.
> 
> 
>> On Tue 27 Aug, 2019, 6:48 PM Khare, Kushal (MIND), < 
>> kushal.kh...@mind-infotech.com> wrote:
>> 
>> Basically, what problem I am facing is - I am getting the textual
>> content
>> + other metadata in my _text_ field. But, I want only the textual
>> + content
>> written inside the document.
>> I tried various Request Handler Update Extract configurations, but
>> none of them worked for me.
>> Please help me resolve this as I am badly stuck in this.
>> 
>> -Original Message-
>> From: Khare, Kushal (MIND) [mailto:kushal.kh...@mind-infotech.com]
>> Sent: 27 August 2019 12:59
>> To: solr-user@lucene.apache.org; ch...@christopherschultz.net
>> Subject: RE: Require searching only for file content and not metadata
>> 
>> Chris,
>> What I have done is, I just created a core, used POST tool to index
>> the documents from my file system, and then moved to Solr Admin for querying.
>> For 'Metadata' vs 'Content' , I mean that I just want the field '_text_'
>> to be searched for, instead of all the fields that solr creates by
>> itself like - author name. last modified, creator, id, etc.
>> I simply want solr to search only for the content inside the document
>> (the body of the document) & not on all the fields. For an example, if
>> I search for 'Kushal', it should return the document only if it has
>> the word in it as the content, not because it has author name or owner as 
>> Kushal.
>> Hope its clear than before now. Please help me with this !
>> 
>> Thankyou!
>> Kushal Khare
>> 
>> -Original Message-
>> From: Christopher Schultz [mailto:ch...@christopherschultz.net]
>> Sent: 26 August 2019 18:47
>> To: solr-user@lucene.apache.org
>> Subject: Re: Require searching only for file content and not metadata
>> 
>> Kushal,
>> 
>>> On 8/26/19 07:52, Khare, Kushal (MIND) wrote:
>>> This is Kushal Khare, a new addition to the user-list. I started
>>> working with Solr few days ago for implementing it in my project.
>>> 
>>> Now, I have the basics done, and reached the query stage.
>>> 
>>> My problem is – I need to restrict the solr to search only for the
>>> file content and not the metadata. I have gone through various
>>> articles on the internet, but could not get any help.
>>> 
>>> Therefore, I hope I could get some solutions here.
>> 
>> How are you querying Solr? Are you querying from a web application?
>> From a thick-client application? Directly from a web browser?
>> 
>> What do you consider "metadata" versus "content"? To Solr, everything
>> is the same...
>> 
>> - -chris

RE: Index fetch failed

2019-08-28 Thread Akreeti Agarwal
Hi,

Memory details for slave1:

Filesystem  Size  Used Avail Use% Mounted on
/dev/xvda1   99G   40G   55G  43% /
tmpfs   7.8G 0  7.8G   0% /dev/shm

Memory details for slave2:

Filesystem  Size  Used Avail Use% Mounted on
/dev/xvda1   99G   45G   49G  48% /
tmpfs   7.8G 0  7.8G   0% /dev/shm

Thanks & Regards,
Akreeti Agarwal

-Original Message-
From: Atita Arora  
Sent: Wednesday, August 28, 2019 11:15 AM
To: solr-user@lucene.apache.org
Subject: Re: Index fetch failed

Hi,

Do you have enough free memory for the index chunk to be fetched/downloaded on
the slave node?


On Wed, Aug 28, 2019 at 6:57 AM Akreeti Agarwal  wrote:

> Hello Everyone,
>
> I am getting this error continuously on Solr slave, can anyone tell me 
> the solution for this:
>
> 642141666 ERROR (indexFetcher-72-thread-1) [   x:sitecore_web_index]
> o.a.s.h.ReplicationHandler Index fetch failed
> :org.apache.solr.common.SolrException: Unable to download _12i7v_f.liv 
> completely. Downloaded 0!=123
>  at
> org.apache.solr.handler.IndexFetcher$FileFetcher.cleanup(IndexFetcher.java:1434)
>  at
> org.apache.solr.handler.IndexFetcher$FileFetcher.fetchFile(IndexFetcher.java:1314)
>  at
> org.apache.solr.handler.IndexFetcher.downloadIndexFiles(IndexFetcher.java:812)
>  at
> org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.jav
> a:427)
>
>
> Thanks & Regards,
> Akreeti Agarwal
> (M) +91-8318686601
>


RE: Index fetch failed

2019-08-28 Thread Akreeti Agarwal
Yes, I am using Solr 5.5.5.
This error is intermittent. I don't think there is any issue with master
connection limits. This error is accompanied by this on the master side:

ERROR (qtp1450821318-60072) [   x:sitecore_web_index] 
o.a.s.h.ReplicationHandler Unable to get file names for indexCommit generation: 
1558637
java.nio.file.NoSuchFileException: 
/solrm-efs/solr-m/server/solr/sitecore_web_index/data/index/_12i9p_1.liv
   at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
   at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
   at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
   at 
sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
   at 
sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
   at 
sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
   at java.nio.file.Files.readAttributes(Files.java:1737)
   at java.nio.file.Files.size(Files.java:2332)
   at 
org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:210)
   at 
org.apache.lucene.store.NRTCachingDirectory.fileLength(NRTCachingDirectory.java:124)
   at 
org.apache.solr.handler.ReplicationHandler.getFileList(ReplicationHandler.java:563)
   at 
org.apache.solr.handler.ReplicationHandler.handleRequestBody(ReplicationHandler.java:253)
   at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2102)
   at 
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
   at 
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:460)
   at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
   at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)
   at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
   at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
   at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
   at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
   at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
   at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
   at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
   at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
   at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
   at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
   at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
   at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
   at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
   at org.eclipse.jetty.server.Server.handle(Server.java:499)
   at 
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
   at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
   at 
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
   at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
   at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
   at java.lang.Thread.run(Thread.java:748)

Thanks & Regards,
Akreeti Agarwal

-Original Message-
From: Atita Arora  
Sent: Wednesday, August 28, 2019 2:23 PM
To: solr-user@lucene.apache.org
Subject: Re: Index fetch failed

This looks like ample memory to get the index chunk.
Also, I looked at the IndexFetcher code, I remember you were using Solr
5.5.5 and the only reason in my view, this would happen is when the index chunk 
is not downloaded as can also be seen in the error (Downloaded
0!=123) which clearly states that the index generations are not in sync and 
this is not user aborted action too.

Is this error intermittent? could there be a possibility that your master has 
connection limits? or maybe some network hiccup?



On Wed, Aug 28, 2019 at 10:40 AM Akreeti Agarwal  wrote:

> Hi,
>
> Memory details for slave1:
>
> Filesystem  Size  Used Avail Use% Mounted on
> /dev/xvda1   99G   40G   55G  43% /
> tmpfs   7.8G 0  7.8G   0% /dev/shm
>
> Memory details for slave2:
>
> Filesystem  

Re: Index fetch failed

2019-08-28 Thread Atita Arora
This looks like ample memory to fetch the index chunk.
Also, I looked at the IndexFetcher code. I remember you are using Solr 5.5.5,
and the only reason, in my view, this would happen is that the index chunk is
not fully downloaded, as can also be seen in the error (Downloaded 0!=123),
which indicates that the index generations are not in sync and that this was
not a user-aborted action.

Is this error intermittent? Could there be a possibility that your master
has connection limits? Or maybe some network hiccup?
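
A hedged follow-up rather than a confirmed diagnosis: if the master commits or
merges while a slave is still fetching, files of the commit point being
replicated can disappear mid-download, which would also explain the
NoSuchFileException seen on the master side. The replication handler lets the
master reserve commit points for longer than the default, e.g. setting
something like <str name="commitReserveDuration">00:01:00</str> in the master
section of the /replication handler in solrconfig.xml, sized to roughly how
long a full fetch takes.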



On Wed, Aug 28, 2019 at 10:40 AM Akreeti Agarwal  wrote:

> Hi,
>
> Memory details for slave1:
>
> Filesystem  Size  Used Avail Use% Mounted on
> /dev/xvda1   99G   40G   55G  43% /
> tmpfs   7.8G 0  7.8G   0% /dev/shm
>
> Memory details for slave2:
>
> Filesystem  Size  Used Avail Use% Mounted on
> /dev/xvda1   99G   45G   49G  48% /
> tmpfs   7.8G 0  7.8G   0% /dev/shm
>
> Thanks & Regards,
> Akreeti Agarwal
>
> -Original Message-
> From: Atita Arora 
> Sent: Wednesday, August 28, 2019 11:15 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Index fetch failed
>
> Hii,
>
> Do you have enough memory free for the index chunk to be
> fetched/Downloaded on the slave node?
>
>
> On Wed, Aug 28, 2019 at 6:57 AM Akreeti Agarwal  wrote:
>
> > Hello Everyone,
> >
> > I am getting this error continuously on Solr slave, can anyone tell me
> > the solution for this:
> >
> > 642141666 ERROR (indexFetcher-72-thread-1) [   x:sitecore_web_index]
> > o.a.s.h.ReplicationHandler Index fetch failed
> > :org.apache.solr.common.SolrException: Unable to download _12i7v_f.liv
> > completely. Downloaded 0!=123
> >  at
> >
> org.apache.solr.handler.IndexFetcher$FileFetcher.cleanup(IndexFetcher.java:1434)
> >  at
> >
> org.apache.solr.handler.IndexFetcher$FileFetcher.fetchFile(IndexFetcher.java:1314)
> >  at
> >
> org.apache.solr.handler.IndexFetcher.downloadIndexFiles(IndexFetcher.java:812)
> >  at
> > org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.jav
> > a:427)
> >
> >
> > Thanks & Regards,
> > Akreeti Agarwal
> > (M) +91-8318686601
> >


RE: Require searching only for file content and not metadata

2019-08-28 Thread Khare, Kushal (MIND)
Could anyone please help me with how to use this approach? I humbly request
all the users to please help me get through this.
Thanks!

-Original Message-
From: Yogendra Kumar Soni [mailto:yogendra.ku...@dolcera.com]
Sent: 28 August 2019 04:08
To: solr-user@lucene.apache.org
Subject: Re: Require searching only for file content and not metadata

It will be easier to parse the documents and create the content, metadata and
other required fields yourself instead of using the default post tool. You
will have better control over what goes into which field.


On Tue 27 Aug, 2019, 6:48 PM Khare, Kushal (MIND), < 
kushal.kh...@mind-infotech.com> wrote:

> Basically, what problem I am facing is - I am getting the textual
> content
> + other metadata in my _text_ field. But, I want only the textual
> + content
> written inside the document.
> I tried various Request Handler Update Extract configurations, but
> none of them worked for me.
> Please help me resolve this as I am badly stuck in this.
>
> -Original Message-
> From: Khare, Kushal (MIND) [mailto:kushal.kh...@mind-infotech.com]
> Sent: 27 August 2019 12:59
> To: solr-user@lucene.apache.org; ch...@christopherschultz.net
> Subject: RE: Require searching only for file content and not metadata
>
> Chris,
> What I have done is, I just created a core, used POST tool to index
> the documents from my file system, and then moved to Solr Admin for querying.
> For 'Metadata' vs 'Content' , I mean that I just want the field '_text_'
> to be searched for, instead of all the fields that solr creates by
> itself like - author name. last modified, creator, id, etc.
> I simply want solr to search only for the content inside the document
> (the body of the document) & not on all the fields. For an example, if
> I search for 'Kushal', it should return the document only if it has
> the word in it as the content, not because it has author name or owner as 
> Kushal.
> Hope its clear than before now. Please help me with this !
>
> Thankyou!
> Kushal Khare
>
> -Original Message-
> From: Christopher Schultz [mailto:ch...@christopherschultz.net]
> Sent: 26 August 2019 18:47
> To: solr-user@lucene.apache.org
> Subject: Re: Require searching only for file content and not metadata
>
> Kushal,
>
> On 8/26/19 07:52, Khare, Kushal (MIND) wrote:
> > This is Kushal Khare, a new addition to the user-list. I started
> > working with Solr few days ago for implementing it in my project.
> >
> > Now, I have the basics done, and reached the query stage.
> >
> > My problem is – I need to restrict the solr to search only for the
> > file content and not the metadata. I have gone through various
> > articles on the internet, but could not get any help.
> >
> > Therefore, I hope I could get some solutions here.
>
> How are you querying Solr? Are you querying from a web application?
> From a thick-client application? Directly from a web browser?
>
> What do you consider "metadata" versus "content"? To Solr, everything
> is the same...
>
> - -chris

Re: Max number of cores in solr

2019-08-28 Thread Pure Host - Wolfgang Freudenberger
We run Solr with replication (n=2) and happily torture it with around 1500
cores and above; each core contains at least 10,000 docs, and most of them
are in the millions. It works.



We have 256 GB of RAM in the servers; the allocation for Solr is 140 GB.

Mit freundlichem Gruß / kind regards

Wolfgang Freudenberger
Pure Host IT-Services
Münsterstr. 14
48341 Altenberge
GERMANY
Tel.: (+49) 25 71 - 99 20 170
Fax: (+49) 25 71 - 99 20 171

Umsatzsteuer ID DE259181123

Get our whole range of services at www.pure-host.de

On 28.08.2019 at 08:55, Vignan Malyala wrote:

Hi
I'm planning to create a separate core for each of my clients in Solr.
Can I create around 500 cores in Solr? Is it a good idea?
For each client I have around 10 records on average currently.

How much physical memory might it consume? Please help with this.
Thank you





Re: Max number of cores in solr

2019-08-28 Thread Shawn Heisey

On 8/28/2019 12:55 AM, Vignan Malyala wrote:

I'm planning to create a separate core for each of my clients in Solr.
Can I create around 500 cores in Solr? Is it a good idea?
For each client I have around 10 records on average currently.


There is no limit that I know of to the number of cores.  You're only 
limited by system resources.  That many cores will have a lot of files 
to open, and a lot of threads, so you would definitely need to increase 
the OS limits on file handles and processes.


Solr startup with that many cores could take a very long time.  If you 
run SolrCloud, I would say that you should find a way to run fewer 
indexes -- SolrCloud begins to have scalability problems with only a few 
hundred.



How much physical memory might it consume? Please help with this.
Thank you


500 cores each with 10 documents is only 50 million total documents. 
 This isn't very big, but you will need plenty of resources.


The most important resource for good performance will be memory.  And we 
can't tell you how much you'll need.  That will depend on exactly how 
you use Solr and the nature of your data.  I've personally handled 
several cores with about 80 million documents total with 8GB of heap and 
64GB of total system memory, which only left enough memory to cache 
about a third of the total index size.  Some indexes can have difficulty 
handling only a few million documents on the same hardware.


https://cwiki.apache.org/confluence/display/solr/SolrPerformanceProblems#RAM

https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Thanks,
Shawn
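
For reference, a minimal SolrJ sketch of creating one core per client through
the CoreAdmin API. It is only a sketch: the URL, core names and instance
directories are illustrative, and it assumes standalone Solr where each
instance directory already contains a conf/ directory.

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class CreatePerClientCores {
  public static void main(String[] args) throws Exception {
    // Admin-level client pointing at the Solr root URL, not at a specific core
    HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr").build();
    try {
      for (String clientId : new String[]{"client_001", "client_002"}) {
        // One core per client; the instanceDir must already hold a conf/ directory
        CoreAdminRequest.createCore("core_" + clientId, "core_" + clientId, solr);
      }
    } finally {
      solr.close();
    }
  }
}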


Re: Require searching only for file content and not metadata

2019-08-28 Thread Shawn Heisey

On 8/27/2019 7:18 AM, Khare, Kushal (MIND) wrote:

Basically, what problem I am facing is - I am getting the textual content + 
other metadata in my _text_ field. But, I want only the textual content written 
inside the document.
I tried various Request Handler Update Extract configurations, but none of them 
worked for me.
Please help me resolve this as I am badly stuck in this.


Controlling exactly what gets indexed in which fields is likely going to 
require that you write the indexing software yourself -- a program that 
extracts the data you want and sends it to Solr for indexing.


We do not recommend running the Extracting Request Handler in production 
-- Tika is known to crash when given some documents (usually PDF files 
are the problematic ones, but other formats can cause it too), and if it 
crashes while running inside Solr, it will take Solr down with it.


Here is an example program that uses Tika for rich document parsing.  It 
also talks to a database, but that part could be easily removed or modified:


https://lucidworks.com/post/indexing-with-solrj/

Thanks,
Shawn


RE: Require searching only for file content and not metadata

2019-08-28 Thread Khare, Kushal (MIND)
If I try to add any metadata to a field like this:

doc.addField("meta", metadata.get("dc_creator"));
1. I don't get that field in the results, though it has been created. And the
following is the definition in the schema:
  

2. When I check the value in my code using
System.out.println(metadata.get("dc_creator")); --> I get 'null'.
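
A hedged note on point 2: the metadata dump shown elsewhere in this thread
lists the key as "dc:creator" (with a colon), so metadata.get("dc_creator")
returning null is expected. A small sketch that avoids guessing the key names,
assuming the Metadata object produced by autoParser.parse(...) in the program
above:

  // Print every metadata key/value Tika actually extracted for this file
  for (String name : metadata.names()) {
    System.out.println(name + " = " + metadata.get(name));
  }
  // Then fetch the creator with the colon-separated key (assuming the document has one)
  doc.addField("meta", metadata.get("dc:creator"));

For point 1, the "meta" field also needs stored="true" in the schema for its
value to show up in query results.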

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 28 August 2019 16:50
To: solr-user@lucene.apache.org
Subject: Re: Require searching only for file content and not metadata

Attachments are aggressively stripped of attachments, you’ll have to either 
post it someplace and provide a link or paste the relevant sections into the 
e-mail.

You’re not getting any metadata because you’re not adding any metadata to the 
documents with doc.addField(“metadatafield1”, value_of_metadata_field1);

The only thing ever in the doc is what you explicitly put there. At this point 
it’s just “id” and “_text_”.

As for why _text_ isn’t showing up, does the schema have ’stored=“true”’ for 
the field? And when you query, are you specifying =_text_? _text_ is usually 
a catch-all field in the default schemas with this definition:

> -Original Message-
> From: Khare, Kushal (MIND) [mailto:kushal.kh...@mind-infotech.com]
> Sent: 28 August 2019 16:30
> To: solr-user@lucene.apache.org
> Subject: RE: Require searching only for file content and not metadata
>
> I already tried this example, I am currently working on this. I have complied 
> the code, it is indexing the documents. But, it is not adding any thing to 
> the field - _text_ . Also, not giving any metadata.
> doc.addField("_text_", textHandler.toString()); --> here, 
> textHandler.toString() is blank for all the 40 documents. All I am getting is 
> the 'id' & 'version' field.
>
> This is the code that I tried :
>
> package mind.solr;
>
> import org.apache.solr.client.solrj.SolrServerException;
> import org.apache.solr.client.solrj.impl.HttpSolrClient;
> import org.apache.solr.client.solrj.impl.XMLResponseParser;
> import org.apache.solr.client.solrj.response.UpdateResponse;
> import org.apache.solr.common.SolrInputDocument;
> import org.apache.tika.metadata.Metadata;
> import org.apache.tika.parser.AutoDetectParser;
> import org.apache.tika.parser.ParseContext;
> import org.apache.tika.sax.BodyContentHandler;
> import org.xml.sax.ContentHandler;
>
> import java.io.File;
> import java.io.FileInputStream;
> import java.io.IOException;
> import java.io.InputStream;
> import java.util.ArrayList;
> import java.util.Collection;
>
> public class solrJExtract {
>
> private HttpSolrClient client;
>  private long start = System.currentTimeMillis();  private
> AutoDetectParser autoParser;  private int totalTika = 0;  private int
> totalSql = 0;
>
>  @SuppressWarnings("rawtypes")
> private Collection docList = new ArrayList();
>
>
> public static void main(String[] args) {
>try {
>    solrJExtract idxer = new solrJExtract("http://localhost:8983/solr/tika");
>idxer.doTikaDocuments(new File("D:\\docs"));
>idxer.endIndexing();
>} catch (Exception e) {
>  e.printStackTrace();
>}
>  }
>
>  private  solrJExtract(String url) throws IOException, SolrServerException {
>// Create a SolrCloud-aware client to send docs to Solr
>// Use something like HttpSolrClient for stand-alone
>
>    client = new HttpSolrClient.Builder("http://localhost:8983/solr/tika")
>.withConnectionTimeout(1)
>.withSocketTimeout(6)
>.build();
>
>// binary parser is used by default for responses
>client.setParser(new XMLResponseParser());
>
>// One of the ways Tika can be used to attempt to parse arbitrary files.
>autoParser = new AutoDetectParser();  }
>
> // Just a convenient place to wrap things up.
>  @SuppressWarnings("unchecked")
> private void endIndexing() throws IOException, SolrServerException {
>if ( docList.size() > 0) { // Are there any documents left over?
>  client.add(docList, 30); // Commit within 5 minutes
>}
>client.commit(); // Only needs to be done at the end,
>// commitWithin should do the rest.
>// Could even be omitted
>// assuming commitWithin was specified.
>long endTime = System.currentTimeMillis();
>System.out.println("Total Time Taken: " + (endTime - start) +
>" milliseconds to index " + totalSql +
>" SQL rows and " + totalTika + " documents");
>
>  }
>
>  /**
>   * ***Tika processing here
>   */
>  // Recursively traverse the filesystem, parsing everything found.
>  private void doTikaDocuments(File root) throws IOException,
> SolrServerException {
>
>// Simple loop for recursively indexing all the files
>// in the root directory passed in.
>for (File file : root.listFiles()) {
>  if (file.isDirectory()) {
>doTikaDocuments(file);
>continue;
>  }
>  // Get ready to parse the file.
>  ContentHandler textHandler = new BodyContentHandler();
>  Metadata metadata = new Metadata();
>  

RE: Require searching only for file content and not metadata

2019-08-28 Thread Khare, Kushal (MIND)
I already tried this example; I am currently working on it. I have compiled
the code and it is indexing the documents, but it is not adding anything to
the field _text_. Also, it is not giving any metadata.
doc.addField("_text_", textHandler.toString()); --> here,
textHandler.toString() is blank for all the 40 documents. All I am getting is
the 'id' & 'version' fields.

This is the code that I tried :

package mind.solr;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.impl.XMLResponseParser;
import org.apache.solr.client.solrj.response.UpdateResponse;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.ContentHandler;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Collection;

public class solrJExtract {

private HttpSolrClient client;
  private long start = System.currentTimeMillis();
  private AutoDetectParser autoParser;
  private int totalTika = 0;
  private int totalSql = 0;

  @SuppressWarnings("rawtypes")
private Collection docList = new ArrayList();


public static void main(String[] args) {
try {
solrJExtract idxer = new solrJExtract("http://localhost:8983/solr/tika");
idxer.doTikaDocuments(new File("D:\\docs"));
idxer.endIndexing();
} catch (Exception e) {
  e.printStackTrace();
}
  }

  private  solrJExtract(String url) throws IOException, SolrServerException {
// Create a SolrCloud-aware client to send docs to Solr
// Use something like HttpSolrClient for stand-alone

client = new HttpSolrClient.Builder("http://localhost:8983/solr/tika")
.withConnectionTimeout(1)
.withSocketTimeout(6)
.build();

// binary parser is used by default for responses
client.setParser(new XMLResponseParser());

// One of the ways Tika can be used to attempt to parse arbitrary files.
autoParser = new AutoDetectParser();
  }

// Just a convenient place to wrap things up.
  @SuppressWarnings("unchecked")
private void endIndexing() throws IOException, SolrServerException {
if ( docList.size() > 0) { // Are there any documents left over?
  client.add(docList, 30); // Commit within 5 minutes
}
client.commit(); // Only needs to be done at the end,
// commitWithin should do the rest.
// Could even be omitted
// assuming commitWithin was specified.
long endTime = System.currentTimeMillis();
System.out.println("Total Time Taken: " + (endTime - start) +
" milliseconds to index " + totalSql +
" SQL rows and " + totalTika + " documents");

  }

  /**
   * ***Tika processing here
   */
  // Recursively traverse the filesystem, parsing everything found.
  private void doTikaDocuments(File root) throws IOException, 
SolrServerException {

// Simple loop for recursively indexing all the files
// in the root directory passed in.
for (File file : root.listFiles()) {
  if (file.isDirectory()) {
doTikaDocuments(file);
continue;
  }
  // Get ready to parse the file.
  ContentHandler textHandler = new BodyContentHandler();
  Metadata metadata = new Metadata();
  ParseContext context = new ParseContext();
  // Tim Allison noted the following, thanks Tim!
  // If you want Tika to parse embedded files (attachments within your .doc 
or any other embedded
  // files), you need to send in the autodetectparser in the parsecontext:
  // context.set(Parser.class, autoParser);

  InputStream input = new FileInputStream(file);

  // Try parsing the file. Note we haven't checked at all to
  // see whether this file is a good candidate.
  try {
autoParser.parse(input, textHandler, metadata, context);
  } catch (Exception e) {
// Needs better logging of what went wrong in order to
// track down "bad" documents.
System.out.println(String.format("File %s failed", 
file.getCanonicalPath()));
e.printStackTrace();
continue;
  }
  // Just to show how much meta-data and what form it's in.
  dumpMetadata(file.getCanonicalPath(), metadata);

  // Index just a couple of the meta-data fields.
  SolrInputDocument doc = new SolrInputDocument();

  doc.addField("id", file.getCanonicalPath());

  // Crude way to get known meta-data fields.
  // Also possible to write a simple loop to examine all the
  // metadata returned and selectively index it and/or
  // just get a list of them.
  // One can also use the Lucidworks field mapping to
  // accomplish much the same thing.
  String author = metadata.get("Author");

/*
 * if 

RE: Require searching only for file content and not metadata

2019-08-28 Thread Khare, Kushal (MIND)
CURRENTLY, I AM GETTING

 "_text_" :
[" \n \n date 2019-06-24T09:52:33Z  \n cp:revision 5  \n Total-Time 1  \n 
extended-properties:AppVersion 15.  \n stream_content_type 
application/vnd.openxmlformats-officedocument.presentationml.presentation  \n 
meta:paragraph-count 18  \n meta:word-count 20  \n 
extended-properties:PresentationFormat Widescreen  \n dc:creator Khare, Kushal 
(MIND)  \n extended-properties:Company MIND  \n Word-Count 20  \n 
dcterms:created 2019-06-18T07:25:29Z  \n dcterms:modified 2019-06-24T09:52:33Z  
\n Last-Modified 2019-06-24T09:52:33Z  \n Last-Save-Date 2019-06-24T09:52:33Z  
\n Paragraph-Count 18  \n meta:save-date 2019-06-24T09:52:33Z  \n dc:title 
PowerPoint Presentation  \n Application-Name Microsoft Office PowerPoint  \n 
extended-properties:TotalTime 1  \n modified 2019-06-24T09:52:33Z  \n 
Content-Type 
application/vnd.openxmlformats-officedocument.presentationml.presentation  \n 
Slide-Count 2  \n stream_size 32234  \n X-Parsed-By 
org.apache.tika.parser.DefaultParser  \n X-Parsed-By 
org.apache.tika.parser.microsoft.ooxml.OOXMLParser  \n creator Khare, Kushal 
(MIND)  \n meta:author Khare, Kushal (MIND)  \n meta:creation-date 
2019-06-18T07:25:29Z  \n extended-properties:Application Microsoft Office 
PowerPoint  \n meta:last-author Khare, Kushal (MIND)  \n meta:slide-count 2  \n 
Creation-Date 2019-06-18T07:25:29Z  \n xmpTPg:NPages 2  \n resourceName 
D:\\docs\\DemoOutput.pptx  \n Last-Author Khare, Kushal (MIND)  \n 
Revision-Number 5  \n Application-Version 15.  \n Author Khare, Kushal 
(MIND)  \n publisher MIND  \n Presentation-Format Widescreen  \n dc:publisher 
MIND  \n PowerPoint Presentation \n \n  slide-content   \n Hello. This is just 
for Demo!  \n If you find it anywhere, throw it away !\nA.W.A.Y away away away 
away away Away AWAY! \n  \n  \n A.W.A.Y once again !  \n  \n  \n  \n  \n  \n  
\n  \n  \n  \n  \n  \n  \n \n slide-master-content  \n slide-content   \n 
A.W.A.Y \n  \n away \n \n slide-master-content  \n embedded 
/docProps/thumbnail.jpeg"],

WHAT I WANT :

"_text_"  :
["\n  slide-content   \n Hello. This is just for Demo!  \n If you find it 
anywhere, throw it away !\nA.W.A.Y away away away away away Away AWAY! \n  \n  
\n A.W.A.Y once again !  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n \n 
slide-master-content  \n slide-content   \n A.W.A.Y \n  \n away \n \n 
slide-master-content  \n embedded /docProps/thumbnail.jpeg"],

"meta" : ["\n \n date 2019-06-24T09:52:33Z  \n cp:revision 5  \n Total-Time 1  
\n extended-properties:AppVersion 15.  \n stream_content_type 
application/vnd.openxmlformats-officedocument.presentationml.presentation  \n 
meta:paragraph-count 18  \n meta:word-count 20  \n 
extended-properties:PresentationFormat Widescreen  \n dc:creator Khare, Kushal 
(MIND)  \n extended-properties:Company MIND  \n Word-Count 20  \n 
dcterms:created 2019-06-18T07:25:29Z  \n dcterms:modified 2019-06-24T09:52:33Z  
\n Last-Modified 2019-06-24T09:52:33Z  \n Last-Save-Date 2019-06-24T09:52:33Z  
\n Paragraph-Count 18  \n meta:save-date 2019-06-24T09:52:33Z  \n dc:title 
PowerPoint Presentation  \n Application-Name Microsoft Office PowerPoint  \n 
extended-properties:TotalTime 1  \n modified 2019-06-24T09:52:33Z  \n 
Content-Type 
application/vnd.openxmlformats-officedocument.presentationml.presentation  \n 
Slide-Count 2  \n stream_size 32234  \n X-Parsed-By 
org.apache.tika.parser.DefaultParser  \n X-Parsed-By 
org.apache.tika.parser.microsoft.ooxml.OOXMLParser  \n creator Khare, Kushal 
(MIND)  \n meta:author Khare, Kushal (MIND)  \n meta:creation-date 
2019-06-18T07:25:29Z  \n extended-properties:Application Microsoft Office 
PowerPoint  \n meta:last-author Khare, Kushal (MIND)  \n meta:slide-count 2  \n 
Creation-Date 2019-06-18T07:25:29Z  \n xmpTPg:NPages 2  \n resourceName 
D:\\docs\\DemoOutput.pptx  \n Last-Author Khare, Kushal (MIND)  \n 
Revision-Number 5  \n Application-Version 15.  \n Author Khare, Kushal 
(MIND)  \n publisher MIND  \n Presentation-Format Widescreen  \n dc:publisher 
MIND  \n PowerPoint Presentation \n"]
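
A minimal sketch of one way to get exactly that split with the SolrJ/Tika
program from earlier in the thread: keep only the body text for _text_ and
write the metadata dump to its own stored field. It assumes a stored "meta"
field exists in the schema and that no catch-all copyField (e.g.
<copyField source="*" dest="_text_"/>) copies other fields back into _text_;
on the query side, df=_text_ then restricts matches to the body text.

  // After autoParser.parse(input, textHandler, metadata, context) has run:
  SolrInputDocument doc = new SolrInputDocument();
  doc.addField("id", file.getCanonicalPath());

  // Body text only
  doc.addField("_text_", textHandler.toString());

  // Metadata goes into its own field instead of _text_
  StringBuilder meta = new StringBuilder();
  for (String name : metadata.names()) {
    meta.append(name).append(' ').append(metadata.get(name)).append('\n');
  }
  doc.addField("meta", meta.toString());

  docList.add(doc);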
-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: 28 August 2019 14:18
To: solr-user@lucene.apache.org
Subject: Re: Require searching only for file content and not metadata

On 8/27/2019 7:18 AM, Khare, Kushal (MIND) wrote:
> Basically, what problem I am facing is - I am getting the textual content + 
> other metadata in my _text_ field. But, I want only the textual content 
> written inside the document.
> I tried various Request Handler Update Extract configurations, but none of 
> them worked for me.
> Please help me resolve this as I am badly stuck in this.

Controlling exactly what gets indexed in which fields is likely going to 
require that you write the indexing software yourself -- a program that 
extracts the data you want and sends it to Solr for indexing.

We do not recommend running the Extracting Request Handler in production
-- Tika is known to 

Problem of Shutdown Process for Windows Server

2019-08-28 Thread Kayak28
Hello, Community:

I use Solr on Windows servers, and I cannot shut down Solr successfully.
When I try to stop Solr using solr.cmd, which is kicked off from Windows Task
Manager, it "looks" like Solr stops without any problem.
Here "looks" means that at least the log file that Solr wrote does not seem to
have any error.
(I pasted a piece of the log where I believe "success" at the end of this
email.)

However, the next time I start up Solr, I get the error message that says
"Address already in use."

This problem happens occasionally, on a different server each time, at
irregular times/dates.
So I have not been able to reproduce the situation yet.
I wonder why Solr could not shut down successfully.

If any of you has faced a similar incident or knows a solution, it would be
very helpful if you could share your advice.
Any clue will be much appreciated.


*Environment*
OS: Windows Server 2012 R2
Java: Oracle JDK 1.8.0
Solr version: 5.2.1
Solr structure: 15 Solr servers, using distributed search with sharding
(not using SolrCloud)
Memory (Solr / physical): 20GB / 32GB
Index size: around 300GB

*Logs*
INFO  - 2019-05-25 21:06:15.996; [   ]
org.apache.solr.core.CachingDirectoryFactory; looking to close
D:\Documents\solr-home\collection1\data
[CachedDir<>]
INFO  - 2019-05-25 21:06:15.996; [   ]
org.apache.solr.core.CachingDirectoryFactory; Closing directory:
D:\Documents\solr-home\collection1\data
INFO  - 2019-05-25 21:06:15.996; [   ]
org.apache.solr.core.CachingDirectoryFactory; looking to close
D:\Documents\solr-home\collection1\data\index
[CachedDir<>]
INFO  - 2019-05-25 21:06:15.996; [   ]
org.apache.solr.core.CachingDirectoryFactory; Closing directory:
D:\Documents\solr-home\collection1\data\index
INFO  - 2019-05-25 21:06:16.199; [   ]
org.eclipse.jetty.server.handler.ContextHandler; Stopped
o.e.j.w.WebAppContext@4b9e13df
{/solr,file:/D:/Documents/solr/server/solr-webapp/webapp/,UNAVAILABLE}{/solr.war}

* Note: solr-home directory is the directory where I store Solr cores.

Sincerely,
Kaya Ota
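
A hedged troubleshooting step rather than a fix: "Address already in use" on
the next start usually means the previous Jetty process is still holding the
port, so checking what owns the port before restarting shows whether the
earlier stop really completed, for example:

  netstat -ano | findstr :8983
  tasklist /FI "PID eq <pid-from-netstat>"

(8983 stands in for whatever port the instance actually uses.)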


RE: Require searching only for file content and not metadata

2019-08-28 Thread Khare, Kushal (MIND)
Yup! I have already made stored=true for _text_. I will see to it. No worries.

BUT, I really need HELP with the separation of content & metadata. I checked,
but there isn't any field that is copying values into the '_text_' field.
The only definition I have for _text_ is:


For this: doc.addField("metadatafield1", value_of_metadata_field1);
I added author name, etc. in the code, but I am not getting those fields. Also,
doc.addField("_text_", textHandler.toString()); has a blank value in it.

Please help!
-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 28 August 2019 16:50
To: solr-user@lucene.apache.org
Subject: Re: Require searching only for file content and not metadata

Attachments are aggressively stripped of attachments, you’ll have to either 
post it someplace and provide a link or paste the relevant sections into the 
e-mail.

You’re not getting any metadata because you’re not adding any metadata to the 
documents with doc.addField(“metadatafield1”, value_of_metadata_field1);

The only thing ever in the doc is what you explicitly put there. At this point 
it’s just “id” and “_text_”.

As for why _text_ isn’t showing up, does the schema have ’stored=“true”’ for 
the field? And when you query, are you specifying =_text_? _text_ is usually 
a catch-all field in the default schemas with this definition:

> -Original Message-
> From: Khare, Kushal (MIND) [mailto:kushal.kh...@mind-infotech.com]
> Sent: 28 August 2019 16:30
> To: solr-user@lucene.apache.org
> Subject: RE: Require searching only for file content and not metadata
>
> I already tried this example, I am currently working on this. I have complied 
> the code, it is indexing the documents. But, it is not adding any thing to 
> the field - _text_ . Also, not giving any metadata.
> doc.addField("_text_", textHandler.toString()); --> here, 
> textHandler.toString() is blank for all the 40 documents. All I am getting is 
> the 'id' & 'version' field.
>
> This is the code that I tried :
>
> package mind.solr;
>
> import org.apache.solr.client.solrj.SolrServerException;
> import org.apache.solr.client.solrj.impl.HttpSolrClient;
> import org.apache.solr.client.solrj.impl.XMLResponseParser;
> import org.apache.solr.client.solrj.response.UpdateResponse;
> import org.apache.solr.common.SolrInputDocument;
> import org.apache.tika.metadata.Metadata;
> import org.apache.tika.parser.AutoDetectParser;
> import org.apache.tika.parser.ParseContext;
> import org.apache.tika.sax.BodyContentHandler;
> import org.xml.sax.ContentHandler;
>
> import java.io.File;
> import java.io.FileInputStream;
> import java.io.IOException;
> import java.io.InputStream;
> import java.util.ArrayList;
> import java.util.Collection;
>
> public class solrJExtract {
>
> private HttpSolrClient client;
>  private long start = System.currentTimeMillis();  private
> AutoDetectParser autoParser;  private int totalTika = 0;  private int
> totalSql = 0;
>
>  @SuppressWarnings("rawtypes")
> private Collection docList = new ArrayList();
>
>
> public static void main(String[] args) {
>try {
>    solrJExtract idxer = new solrJExtract("http://localhost:8983/solr/tika");
>idxer.doTikaDocuments(new File("D:\\docs"));
>idxer.endIndexing();
>} catch (Exception e) {
>  e.printStackTrace();
>}
>  }
>
>  private  solrJExtract(String url) throws IOException, SolrServerException {
>// Create a SolrCloud-aware client to send docs to Solr
>// Use something like HttpSolrClient for stand-alone
>
>    client = new HttpSolrClient.Builder("http://localhost:8983/solr/tika")
>.withConnectionTimeout(1)
>.withSocketTimeout(6)
>.build();
>
>// binary parser is used by default for responses
>client.setParser(new XMLResponseParser());
>
>// One of the ways Tika can be used to attempt to parse arbitrary files.
>autoParser = new AutoDetectParser();  }
>
> // Just a convenient place to wrap things up.
>  @SuppressWarnings("unchecked")
> private void endIndexing() throws IOException, SolrServerException {
>if ( docList.size() > 0) { // Are there any documents left over?
>  client.add(docList, 30); // Commit within 5 minutes
>}
>client.commit(); // Only needs to be done at the end,
>// commitWithin should do the rest.
>// Could even be omitted
>// assuming commitWithin was specified.
>long endTime = System.currentTimeMillis();
>System.out.println("Total Time Taken: " + (endTime - start) +
>" milliseconds to index " + totalSql +
>" SQL rows and " + totalTika + " documents");
>
>  }
>
>  /**
>   * ***Tika processing here
>   */
>  // Recursively traverse the filesystem, parsing everything found.
>  private void doTikaDocuments(File root) throws IOException,
> SolrServerException {
>
>// Simple loop for recursively indexing all the files
>// in the root directory passed in.
>for (File file : root.listFiles()) {
>  if (file.isDirectory()) {
>doTikaDocuments(file);

RE: Require searching only for file content and not metadata

2019-08-28 Thread Khare, Kushal (MIND)
Yes, I have already gone through the reference guide. It is because of the
guide and documentation that I have reached this stage.
Well, I am indexing rich document formats like .docx, .pptx, .pdf, etc.
The metadata I am talking about is that currently Solr puts all the data like
author, editor, and content-type details of the documents in the _text_ field,
along with the textual content, and what I want is to separate them.
I also tried using ExtractingRequestHandler and understood fmap.content in
Tika, but still can't reach the desired output.
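
For the ExtractingRequestHandler route, a hedged sketch of request parameters
that keep Tika's metadata out of _text_. It assumes the schema has an
ignored_* dynamic field of an "ignored" (unindexed, unstored) type, that no
catch-all copyField pulls everything into _text_, and that "client" is the
HttpSolrClient from the code earlier in the thread; the file name is
illustrative.

  // Needs org.apache.solr.client.solrj.request.ContentStreamUpdateRequest
  ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
  req.addFile(new File("D:\\docs\\DemoOutput.pptx"),
      "application/vnd.openxmlformats-officedocument.presentationml.presentation");
  req.setParam("literal.id", "DemoOutput.pptx");
  req.setParam("fmap.content", "_text_");  // extracted body text goes to _text_
  req.setParam("uprefix", "ignored_");     // fields not in the schema (the metadata) become ignored_*
  req.setParam("commit", "true");
  client.request(req);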

-Original Message-
From: Jörn Franke [mailto:jornfra...@gmail.com]
Sent: 28 August 2019 12:55
To: solr-user@lucene.apache.org
Subject: Re: Require searching only for file content and not metadata

You need to provide a little bit more details.  What is your Schema? How is the 
document structured ? Where do you get metadata from?

Have you read the Solr reference guide? Have you read a book about Solr?

> On 28.08.2019 at 08:10, Khare, Kushal (MIND) wrote:
>
> Could anyone please help me with how to use this approach ? I humbly request 
> all the users to please help me get through this.
> Thanks !
>
> -Original Message-
> From: Yogendra Kumar Soni [mailto:yogendra.ku...@dolcera.com]
> Sent: 28 August 2019 04:08
> To: solr-user@lucene.apache.org
> Subject: Re: Require searching only for file content and not metadata
>
> It will be easier to parse documents create content, metadata and other 
> required fields yourself in place of using default post tool. You will have 
> better control on what is going to  which field.
>
>
>> On Tue 27 Aug, 2019, 6:48 PM Khare, Kushal (MIND), < 
>> kushal.kh...@mind-infotech.com> wrote:
>>
>> Basically, what problem I am facing is - I am getting the textual
>> content
>> + other metadata in my _text_ field. But, I want only the textual
>> + content
>> written inside the document.
>> I tried various Request Handler Update Extract configurations, but
>> none of them worked for me.
>> Please help me resolve this as I am badly stuck in this.
>>
>> -Original Message-
>> From: Khare, Kushal (MIND) [mailto:kushal.kh...@mind-infotech.com]
>> Sent: 27 August 2019 12:59
>> To: solr-user@lucene.apache.org; ch...@christopherschultz.net
>> Subject: RE: Require searching only for file content and not metadata
>>
>> Chris,
>> What I have done is, I just created a core, used POST tool to index
>> the documents from my file system, and then moved to Solr Admin for querying.
>> For 'Metadata' vs 'Content' , I mean that I just want the field '_text_'
>> to be searched for, instead of all the fields that solr creates by
>> itself like - author name. last modified, creator, id, etc.
>> I simply want solr to search only for the content inside the document
>> (the body of the document) & not on all the fields. For an example,
>> if I search for 'Kushal', it should return the document only if it
>> has the word in it as the content, not because it has author name or owner 
>> as Kushal.
>> Hope its clear than before now. Please help me with this !
>>
>> Thankyou!
>> Kushal Khare
>>
>> -Original Message-
>> From: Christopher Schultz [mailto:ch...@christopherschultz.net]
>> Sent: 26 August 2019 18:47
>> To: solr-user@lucene.apache.org
>> Subject: Re: Require searching only for file content and not metadata
>>
>> Kushal,
>>
>>> On 8/26/19 07:52, Khare, Kushal (MIND) wrote:
>>> This is Kushal Khare, a new addition to the user-list. I started
>>> working with Solr few days ago for implementing it in my project.
>>>
>>> Now, I have the basics done, and reached the query stage.
>>>
>>> My problem is – I need to restrict the solr to search only for the
>>> file content and not the metadata. I have gone through various
>>> articles on the internet, but could not get any help.
>>>
>>> Therefore, I hope I could get some solutions here.
>>
>> How are you querying Solr? Are you querying from a web application?
>> From a thick-client application? Directly from a web browser?
>>
>> What do you consider "metadata" versus "content"? To Solr, everything
>> is the same...
>>
>> - -chris

Re: Require searching only for file content and not metadata

2019-08-28 Thread Erick Erickson
E-mails to the list are aggressively stripped of attachments; you'll have to
either post it someplace and provide a link, or paste the relevant sections
into the e-mail.

You’re not getting any metadata because you’re not adding any metadata to the 
documents with 
doc.addField(“metadatafield1”, value_of_metadata_field1);

The only thing ever in the doc is what you explicitly put there. At this point 
it’s just “id” and “_text_”.

As for why _text_ isn’t showing up, does the schema have ’stored=“true”’ for 
the field? And when
you query, are you specifying fl=_text_? _text_ is usually a catch-all field
in the default schemas with
this definition:
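(In the default configsets that is typically something like
<field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true"/>
together with a catch-all <copyField source="*" dest="_text_"/>, which is also
why every extracted metadata field ends up searchable through _text_.)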

> -Original Message-
> From: Khare, Kushal (MIND) [mailto:kushal.kh...@mind-infotech.com]
> Sent: 28 August 2019 16:30
> To: solr-user@lucene.apache.org
> Subject: RE: Require searching only for file content and not metadata
> 
> I already tried this example, I am currently working on this. I have complied 
> the code, it is indexing the documents. But, it is not adding any thing to 
> the field - _text_ . Also, not giving any metadata.
> doc.addField("_text_", textHandler.toString()); --> here, 
> textHandler.toString() is blank for all the 40 documents. All I am getting is 
> the 'id' & 'version' field.
> 
> This is the code that I tried :
> 
> package mind.solr;
> 
> import org.apache.solr.client.solrj.SolrServerException;
> import org.apache.solr.client.solrj.impl.HttpSolrClient;
> import org.apache.solr.client.solrj.impl.XMLResponseParser;
> import org.apache.solr.client.solrj.response.UpdateResponse;
> import org.apache.solr.common.SolrInputDocument;
> import org.apache.tika.metadata.Metadata;
> import org.apache.tika.parser.AutoDetectParser;
> import org.apache.tika.parser.ParseContext;
> import org.apache.tika.sax.BodyContentHandler;
> import org.xml.sax.ContentHandler;
> 
> import java.io.File;
> import java.io.FileInputStream;
> import java.io.IOException;
> import java.io.InputStream;
> import java.util.ArrayList;
> import java.util.Collection;
> 
> public class solrJExtract {
> 
> private HttpSolrClient client;
>  private long start = System.currentTimeMillis();
>  private AutoDetectParser autoParser;
>  private int totalTika = 0;
>  private int totalSql = 0;
> 
>  @SuppressWarnings("rawtypes")
> private Collection docList = new ArrayList();
> 
> 
> public static void main(String[] args) {
>try {
>    solrJExtract idxer = new solrJExtract("http://localhost:8983/solr/tika");
>idxer.doTikaDocuments(new File("D:\\docs"));
>idxer.endIndexing();
>} catch (Exception e) {
>  e.printStackTrace();
>}
>  }
> 
>  private  solrJExtract(String url) throws IOException, SolrServerException {
>// Create a SolrCloud-aware client to send docs to Solr
>// Use something like HttpSolrClient for stand-alone
> 
>    client = new HttpSolrClient.Builder("http://localhost:8983/solr/tika")
>.withConnectionTimeout(1)
>.withSocketTimeout(6)
>.build();
> 
>// binary parser is used by default for responses
>client.setParser(new XMLResponseParser());
> 
>// One of the ways Tika can be used to attempt to parse arbitrary files.
>autoParser = new AutoDetectParser();
>  }
> 
> // Just a convenient place to wrap things up.
>  @SuppressWarnings("unchecked")
> private void endIndexing() throws IOException, SolrServerException {
>if ( docList.size() > 0) { // Are there any documents left over?
>  client.add(docList, 30); // Commit within 5 minutes
>}
>client.commit(); // Only needs to be done at the end,
>// commitWithin should do the rest.
>// Could even be omitted
>// assuming commitWithin was specified.
>long endTime = System.currentTimeMillis();
>System.out.println("Total Time Taken: " + (endTime - start) +
>" milliseconds to index " + totalSql +
>" SQL rows and " + totalTika + " documents");
> 
>  }
> 
>  /**
>   * ***Tika processing here
>   */
>  // Recursively traverse the filesystem, parsing everything found.
>  private void doTikaDocuments(File root) throws IOException, 
> SolrServerException {
> 
>// Simple loop for recursively indexing all the files
>// in the root directory passed in.
>for (File file : root.listFiles()) {
>  if (file.isDirectory()) {
>doTikaDocuments(file);
>continue;
>  }
>  // Get ready to parse the file.
>  ContentHandler textHandler = new BodyContentHandler();
>  Metadata metadata = new Metadata();
>  ParseContext context = new ParseContext();
>  // Tim Allison noted the following, thanks Tim!
>  // If you want Tika to parse embedded files (attachments within your 
> .doc or any other embedded
>  // files), you need to send in the autodetectparser in the parsecontext:
>  // context.set(Parser.class, autoParser);
> 
>  InputStream input = new FileInputStream(file);
> 
>  // Try parsing the file. Note we haven't checked at all to
>  // see whether this file is a good candidate.
>  try {
> 

RE: Require searching only for file content and not metadata

2019-08-28 Thread Khare, Kushal (MIND)
Attaching managed-schema.xml

-Original Message-
From: Khare, Kushal (MIND) [mailto:kushal.kh...@mind-infotech.com]
Sent: 28 August 2019 16:30
To: solr-user@lucene.apache.org
Subject: RE: Require searching only for file content and not metadata

I already tried this example; I am currently working on it. I have compiled
the code and it is indexing the documents, but it is not adding anything to the
field _text_, and it is not giving any metadata either.
doc.addField("_text_", textHandler.toString()); --> here,
textHandler.toString() is blank for all 40 documents. All I am getting is
the 'id' & 'version' field.

This is the code that I tried :

package mind.solr;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.impl.XMLResponseParser;
import org.apache.solr.client.solrj.response.UpdateResponse;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.ContentHandler;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Collection;

public class solrJExtract {

private HttpSolrClient client;
  private long start = System.currentTimeMillis();
  private AutoDetectParser autoParser;
  private int totalTika = 0;
  private int totalSql = 0;

  @SuppressWarnings("rawtypes")
private Collection docList = new ArrayList();


public static void main(String[] args) {
try {
solrJExtract idxer = new solrJExtract("http://localhost:8983/solr/tika");
idxer.doTikaDocuments(new File("D:\\docs"));
idxer.endIndexing();
} catch (Exception e) {
  e.printStackTrace();
}
  }

  private  solrJExtract(String url) throws IOException, SolrServerException {
// Create a SolrCloud-aware client to send docs to Solr
// Use something like HttpSolrClient for stand-alone

client = new HttpSolrClient.Builder("http://localhost:8983/solr/tika")
.withConnectionTimeout(10000)
.withSocketTimeout(60000)
.build();

// binary parser is used by default for responses
client.setParser(new XMLResponseParser());

// One of the ways Tika can be used to attempt to parse arbitrary files.
autoParser = new AutoDetectParser();
  }

// Just a convenient place to wrap things up.
  @SuppressWarnings("unchecked")
private void endIndexing() throws IOException, SolrServerException {
if ( docList.size() > 0) { // Are there any documents left over?
  client.add(docList, 300000); // Commit within 5 minutes (300000 ms)
}
client.commit(); // Only needs to be done at the end,
// commitWithin should do the rest.
// Could even be omitted
// assuming commitWithin was specified.
long endTime = System.currentTimeMillis();
System.out.println("Total Time Taken: " + (endTime - start) +
" milliseconds to index " + totalSql +
" SQL rows and " + totalTika + " documents");

  }

  /**
   * ***Tika processing here
   */
  // Recursively traverse the filesystem, parsing everything found.
  private void doTikaDocuments(File root) throws IOException, 
SolrServerException {

// Simple loop for recursively indexing all the files
// in the root directory passed in.
for (File file : root.listFiles()) {
  if (file.isDirectory()) {
doTikaDocuments(file);
continue;
  }
  // Get ready to parse the file.
  ContentHandler textHandler = new BodyContentHandler();
  Metadata metadata = new Metadata();
  ParseContext context = new ParseContext();
  // Tim Allison noted the following, thanks Tim!
  // If you want Tika to parse embedded files (attachments within your .doc 
or any other embedded
  // files), you need to send in the autodetectparser in the parsecontext:
  // context.set(Parser.class, autoParser);

  InputStream input = new FileInputStream(file);

  // Try parsing the file. Note we haven't checked at all to
  // see whether this file is a good candidate.
  try {
autoParser.parse(input, textHandler, metadata, context);
  } catch (Exception e) {
// Needs better logging of what went wrong in order to
// track down "bad" documents.
System.out.println(String.format("File %s failed", 
file.getCanonicalPath()));
e.printStackTrace();
continue;
  }
  // Just to show how much meta-data and what form it's in.
  dumpMetadata(file.getCanonicalPath(), metadata);

  // Index just a couple of the meta-data fields.
  SolrInputDocument doc = new SolrInputDocument();

  doc.addField("id", file.getCanonicalPath());

  // Crude way to get known meta-data fields.
  // Also possible to write a simple loop to examine all the
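For reference, a variant of the parse-and-add section with the BodyContentHandler
write limit removed and the embedded-document parser registered. The -1 argument and
the context.set line are the only changes from the code above; whether they explain
the empty _text_ output here is not certain, this is just a sketch:

  // Needs: import org.apache.tika.parser.Parser;
  // -1 removes the default write limit (BodyContentHandler stops at 100,000 chars otherwise).
  ContentHandler textHandler = new BodyContentHandler(-1);
  Metadata metadata = new Metadata();
  ParseContext context = new ParseContext();
  // Let Tika use the same parser for embedded files (attachments inside .doc etc.).
  context.set(Parser.class, autoParser);

  try (InputStream input = new FileInputStream(file)) {
    autoParser.parse(input, textHandler, metadata, context);
  } catch (Exception e) {
    System.out.println("Parse failed for " + file.getCanonicalPath());
    continue;   // skip this file, keep indexing the rest
  }

  SolrInputDocument doc = new SolrInputDocument();
  doc.addField("id", file.getCanonicalPath());
  doc.addField("_text_", textHandler.toString());   // extracted body text only
  client.add(doc, 300000);                          // commit within 5 minutes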
 

Re: Query-time synonyms without indexing

2019-08-28 Thread Erick Erickson
Not sure. You have an

section and 


section. Frankly I’m not sure which one will be used for the index-time chain.

Why don’t you just try it?
change

to 


reload and go. It’d take you 5 minutes and you’d have your answer.

Best,
Erick


> On Aug 28, 2019, at 1:57 AM, Bjarke Buur Mortensen  
> wrote:
> 
> Yes, but isn't that what I am already doing in this case (look at the
> fieldType in the original mail)?
> Is there some other way to specify that field type and achieve what I want?
> 
> Thanks,
> Bjarke
> 
> On Tue, Aug 27, 2019, 17:32 Erick Erickson  wrote:
> 
>> You can have separate index and query time analysis chains, there are many
>> examples in the stock Solr schemas.
>> 
>> Best,
>> Erick
>> 
>>> On Aug 27, 2019, at 8:48 AM, Bjarke Buur Mortensen <
>> morten...@eluence.com> wrote:
>>> 
>>> We have a solr file of type "string".
>>> It turns out that we need to do synonym expansion on query time in order
>> to
>>> account for some changes over time in the values stored in that field.
>>> 
>>> So we have tried introducing a custom fieldType that applies the synonym
>>> filter at query time only (see bottom of mail), but that requires us to
>>> change the field. But now, when we index new documents, Solr complains:
>>> 400 Bad Request
>>> Error: 'Exception writing document id someid to the index; possible
>>> analysis error: cannot change field "auth_country_code" from index
>>> options=DOCS to inconsistent index options=DOCS_AND_FREQS_AND_POSITIONS',
>>> 
>>> Since we are only making query time changes, I would really like to not
>>> have to reindex our entire collection. Is that possible somehow?
>>> 
>>> Thanks,
>>> Bjarke
>>> 
>>> 
>>> >> sortMissingLast="true" positionIncrementGap="100">
>>>   
>>> 
>>>   
>>>   
>>>   
>>>   >> synonyms="country-synonyms.txt" ignoreCase="false" expand="true"/>
>>>   
>>> 
>> 
>> 
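For reference, a field type that applies synonyms only at query time is normally
written with two analyzer sections, roughly like the sketch below. It reuses the
attributes still visible above; the field type name and the KeywordTokenizer are
assumptions, not Bjarke's original chain:

<fieldType name="string_ctry_syn" class="solr.TextField"
           sortMissingLast="true" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="country-synonyms.txt"
            ignoreCase="false" expand="true"/>
  </analyzer>
</fieldType>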



Re: Problems with restricting access to users using Basic auth

2019-08-28 Thread Jason Gerlowski
Hi Salmaan,

Are you still seeing this behavior, or were you able to figure things out?

I just got a chance to try out the security.json in Solr 7.6 myself,
and I can't reproduce the behavior you're seeing.

It might be helpful to level set here.  Make sure that our
security.json settings and our test requests are exactly the same.

This is the security.json I used in my test deployment:

{
  "authentication":{
   "blockUnknown": true,
   "class":"solr.BasicAuthPlugin",
   "credentials":{
 "solr":"gP31s0FQevh3k0i0y6g9AP/TZLWctxfZjtC9sOh8vZU=
J7an406gVyx4v4CkR8YLgmhClk9Yv/fIBSfZoi1f0kY=",
 "solr-user":"gP31s0FQevh3k0i0y6g9AP/TZLWctxfZjtC9sOh8vZU=
J7an406gVyx4v4CkR8YLgmhClk9Yv/fIBSfZoi1f0kY="
   }
  },
  "authorization":{
   "class":"solr.RuleBasedAuthorizationPlugin",
   "permissions":[
  {"name": "dev-read", "collection": ["collection1",
"collection2"], "role": ["dev", "admin"] },
  {"name": "security-edit", "role": "admin"},
  {"name": "security-read", "role": "admin"},
  {"name": "schema-edit", "role": "admin"},
  {"name": "schema-read", "role": "admin"},
  {"name": "config-edit", "role": "admin"},
  {"name": "config-read", "role": "admin"},
  {"name": "core-admin-edit", "role": "admin"},
  {"name": "core-admin-read", "role": "admin"},
  {"name": "collection-api-edit", "role": "admin"},
  {"name": "collection-api-read", "role": "admin"},
  {"name": "read", "role": "admin"},
  {"name": "update", "role": "admin"},
  {"name": "all", "role": "admin"}
   ],
   "user-role":{
 "solr":"admin",
 "solr-user": "dev"
   }
  }
}

And this is the output of a script I use to test permissions quickly:

$ ./test-security.sh

Testing permissions for user [solr]
Request [/admin/collections?action=LIST] returned status [200]
Request [/collection1/select?q=*:*] returned status [200]
Request [/collection2/select?q=*:*] returned status [200]
Request [/collection3/select?q=*:*] returned status [200]

Testing permissions for user [solr-user]
Request [/admin/collections?action=LIST] returned status [403]
Request [/collection1/select?q=*:*] returned status [200]
Request [/collection2/select?q=*:*] returned status [200]
Request [/collection3/select?q=*:*] returned status [403]

You can find this script here, to see the exact curl commands being
used and run it yourself: https://paste.apache.org/tjtdg

That output looks correct to me.  solr-user is prevented from
accessing other APIs and other collections, but can access collection1
and collection2.
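
Each of those checks boils down to a curl call like the following (the password
here is just a placeholder):

  curl -s -o /dev/null -w "%{http_code}\n" -u solr-user:password \
    "http://localhost:8983/solr/collection1/select?q=*:*"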

Does your security.json match mine, or do the permissions differ in
some way?  Can you still reproduce the behavior using my script?

Good luck,

Jason

On Thu, Aug 22, 2019 at 2:13 AM Salmaan Rashid Syed
 wrote:
>
> Hi,
>
> Any suggestions as to what can be done?
>
> Regards,
> Salmaan
>
>
> On Wed, Aug 21, 2019 at 4:33 PM Jason Gerlowski 
> wrote:
>
> > Ah, ok.  SOLR-13355 still affects 7.6, so that explains why you're
> > seeing this behavior.
> >
> > You could upgrade to get the new behavior, but you don't need to-
> > there's a workaround.  You just need to add a few extra rules to your
> > security.json.  The problem in SOLR-13355 is that the "all" permission
> > isn't being considered for APIs that are covered by other predefined
> > permissions.  So the workaround is to add a permission rule for each
> > of the predefined permissions, locking them down to the "admin" role.
> > It really bloats security.json, but should do the job.  So your
> > security.json should have a permissions section that looks like the
> > JSON below:
> >
> > {"name": "dev-read", "collection": ["collection1", "collection2"],
> > "role": "dev"},
> > {"name": "security-edit", "role": "admin"},
> > {"name": "security-read", "role": "admin"},
> > {"name": "schema-edit", "role": "admin"},
> > {"name": "schema-read", "role": "admin"},
> > {"name": "config-edit", "role": "admin"},
> > {"name": "config-read", "role": "admin"},
> > {"name": "core-admin-edit", "role": "admin"},
> > {"name": "core-admin-read", "role": "admin"},
> > {"name": "collection-api-edit", "role": "admin"},
> > {"name": "collection-api-read", "role": "admin"},
> > {"name": "read", "role": "admin"},
> > {"name": "update", "role": "admin"},
> > {"name": "all", "role": "admin"}
> >
> > Hope that helps.  Let me know if that still has any problems for you.
> >
> > Jason
> >
> > On Wed, Aug 21, 2019 at 6:48 AM Salmaan Rashid Syed
> >  wrote:
> > >
> > > Hi Jason,
> > >
> > > Is there a way to fix this in version 7.6?
> > >
> > > Or is it mandatory to upgrade to other versions?
> > >
> > > If I have to upgrade to a higher version, then what is the best way to do
> > > this without effecting the current configuration and indexed data?
> > >
> > > Thanks,
> > > Salmaan
> > >
> > >
> > >
> > > On Wed, Aug 21, 2019 at 4:13 PM Salmaan Rashid Syed <
> > > salmaan.ras...@mroads.com> wrote:
> > >
> > > > Hi Jason,
> > > >
> > > > I am using version 7.6 

RE: SOLR 7+ / Lucene 7+ and performance issues with DelegatingCollector and PostFilter

2019-08-28 Thread Wittenberg, Lucas
Ok, thank you Erick and Toke.
As suggested I switched to using DocValues and SortedDocValues.
Now QTime is down to an average of 1100, which is much, much better but still 
far from the 30 I had with SOLR 4.
I suppose it is due to the block-oriented compression you mentioned. Not sure 
if it is possible to improve this even more. Is it possible/wise to disable the 
compression?
Anyway, really appreciate the support. Thanks.
/cheers

-Original Message-
From: Wittenberg, Lucas
Sent: Tuesday, 27 August 2019 11:06
To: solr-user@lucene.apache.org
Subject: RE: SOLR 7+ / Lucene 7+ and performance issues with DelegatingCollector
and PostFilter

Thanks for the suggestion.
But the "customid" field is already set as docValues="true" actually.
Well, I guess so as it is a type="string" which by default has docValues="true".

 


-Original Message-
From: Wittenberg, Lucas
Sent: Monday, 26 August 2019 18:01
To: solr-user@lucene.apache.org
Subject: SOLR 7+ / Lucene 7+ and performance issues with DelegatingCollector and
PostFilter

Hello all,
Here is the situation I am facing.

I am migrating from SOLR 4 to SOLR 7. SOLR 4 is running on Tomcat 8, SOLR 7 
runs with built in Jetty 9.
The largest core contains about 1,800,000 documents (about 3 GB).

The migration went through smoothly. But something's bothering me.

I have a PostFilter to collect only some documents according to a pre-selected 
list.

Here is the code for the org.apache.solr.search.DelegatingCollector:

@Override
protected void doSetNextReader(LeafReaderContext context) throws IOException {
    this.reader = context.reader();
    super.doSetNextReader(context);
}

@Override
public void collect(int docNumber) throws IOException {
    if (null != this.reader
            && isValid(this.reader.document(docNumber).get("customid"))) {
        super.collect(docNumber);
    }
}

private boolean isValid(String customId) {
    boolean valid = false;
    // HashMap with the custom IDs to keep; contains an average of 2k items
    if (null != customMap) {
        valid = customMap.get(customId) != null;
    }
    return valid;
}

And here is an example of query sent to SOLR:


/select?fq=%7B!MyPostFilter%20sessionid%3DWST0DEV-QS-5BEEB1CC28B45580F92CCCEA32727083%7D&q=system%20upgrade

So, the problem is:
- It runs pretty fast on SOLR 4, with average QTime equals to 30.
- But now on SOLR 7, it is awfully slow with average QTime around 25000!

And I am wondering what can be the source of such bad performances...

With a very simplified (or should I say transparent) collect function (see 
below), there is no degradation. This test just to exclude server/platform from 
the equation.

@Override
public void collect(int docNumber) throws IOException {
super.collect(docNumber);
}

My guess is that since LUCENE 7, there have been drastic changes in the way the 
API access documents, but I am not sure to have understood everything.
I got it from this post: 
https://stackoverflow.com/questions/48474506/how-to-get-docvalue-by-document-id-in-lucene-7

I suppose this has something to do with the issues I am facing.
But I have no idea how to upgrade/change my PostFilter and/or 
DelegatingCollector to go back to good performances.

If any LUCENE/SOLR experts could provide some hints or leads, it would be very 
appreciated.
Thanks in advance.


PS:
In the core schema:



This field is string-type as it can be something like "100034_001".

In the solrconfig.xml:
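
The post filter is registered as a query parser plugin, along these lines (the class
name below is a placeholder, not the real one):

<queryParser name="MyPostFilter" class="com.example.search.MyPostFilterQParserPlugin"/>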



I can share the full schema and solrconfig.xml files if needed but so far, 
there is no other particular configuration in there.
This message contains information that may be privileged or confidential and is 
the property of the Capgemini Group. It is intended only for the person to whom 
it is addressed. If you are not the intended recipient, you are not authorized 
to read, print, retain, copy, disseminate, distribute, or use this message or 
any part thereof. If you receive this message in error, please notify the 
sender immediately and delete all copies of this message.



Re: Problem of Shutdown Process for Windows Server

2019-08-28 Thread Shawn Heisey

On 8/28/2019 4:01 AM, Kayak28 wrote:

I use Solr with Windows servers, and cannot shutdown Solr successfully.
When I try to stop Solr using solr.cmd, which is kicked from Windows Task
Manager, it "looks" like Solr stops without any problem.
Here "looks" means that at least log file that Solr wrote does not seem to
have any error.
(I pasted a piece of the log where I believe "success" at the end of this
email )


The solr.cmd script will try to stop Solr gracefully, wait five seconds, 
and then forcibly terminate it.  We have modified this operation in 
recent versions for operating systems other than windows, so that the 
bash script will wait up to three minutes for Solr to terminate 
gracefully before it is forcibly terminated.  But this has not been done 
on Windows.



*Environment*
OS: Windows Server 2012 R2
Java: Oracle JDK 1.8.0
Solr  Version: 5.2.1


Which of the many Java 8 versions are you running?  1.8.0 is not 
specific enough.  You should be running at least build 40, and something 
numbered above 100 would be better.  The latest 1.8.0 versions of Oracle 
Java have a different license than earlier versions, something you might 
need to be aware of.  Oracle is now requiring a paid license for any 
production use of their Java.  Only development can be done for free.



Solr Structures:15 Solr server, enabled to distributed search with sharding
(Not using SolrCloud)
Memory(Solr / physical) : 20GB/32GB
Index Size: around 300GB


You should probably have at least 128GB of total system memory for 300GB 
of index, and 256GB would be better.  Assuming that there is no software 
other than Solr on this machine, you only have about 12GB of memory left 
to cache that 300GB of index data.  If there is other software on the 
system, there will probably be even less memory available.  This could 
cause Solr to be very slow to shut down.


Maybe the user that's running the stop command doesn't have permission 
to forcibly terminate the Solr process.  In which case you would have to 
wait for the graceful shutdown, and as I just mentioned, that could be 
very slow on your setup.


Thanks,
Shawn


Solution for long-running highlighting

2019-08-28 Thread SOLR4189
Hi all.

In our team we came up with a somewhat tricky solution for queries with
long-running highlighting, for example highlighting that takes more than 25
seconds. So we created our own component that wraps SOLR's highlighting
component in this way:

public void inform(SolrCore core) {
. . . .
subSearchComponent = core.getSearchComponent("highlight");
. . . .
}

public void process(ResponseBuilder rb) throws Exception {
long timeout = 25000;
ExecutorService exec = null;
try {
exec = Executors.newSingleThreadExecutor();
Future<Exception> future = exec.submit(() -> {
try {
subSearchComponent.process(rb);
} catch (IOException e) {
return e;
} 
return null;
});
Exception ex = future.get(timeout, TimeUnit.MILLISECONDS);
if (ex != null) {
throw ex;
}
} catch ( TimeoutException toe) {
. . . .
} catch (Exception e) {
   throw new IOException(e);
} finally {
if (exec != null) {
exec.shutdownNow();
}
}
}

This solution works, but sometimes we see that searchers stay open and as a
result our RAM usage is pretty high (like a memory leak of SolrIndexSearcher
objects). They only disappear after a SOLR service restart.

What do you think about this solution?
Does some built-in function exist for this?



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: What are the risk of running into "Unmap hack not supported on this platform"

2019-08-28 Thread Shawn Heisey

On 8/27/2019 8:22 AM, Pushkar Raste wrote:

I am trying to run Solr 4 on JDK11, although this version is not supported
on JDK11 it seems to be working fine except for the error/exception "Unmap
hack not supported on this platform".
What the risks/downsides of running into this.


The first version of Solr that was qualified with Java 9 was Solr 7.0.0. 
 New Java versions did not work properly with older versions of Solr. 
Java 8 is as high as you can go with Solr 4.


Solr versions up through 4.7.x have a minimum Java version requirement 
of Java 6.  From 4.8.0 through 5.x, Java 7 is required as a minimum. 
Starting with Solr 6.0.0, the minimum requirement moved to Java 8.  When 
Solr 9.0.0 is released, its minimum requirement will be Java 11.


Right now, with Solr 8.x being the current version, Solr 7.x is only 
going to get major bugfixes, and there will be no updates at all to 
version 6.x and older.  The problem you're running into with Solr 4 on 
Java 11 will not be fixed.  If you want to run Java 11, you will need to 
upgrade to the latest Solr 7.x or 8.x.  Early 7.x versions would not 
work with Java 10 or later.


Thanks,
Shawn


Re: What are the risk of running into "Unmap hack not supported on this platform"

2019-08-28 Thread Pushkar Raste
Can someone help me with this?

On Tue, Aug 27, 2019 at 10:22 AM Pushkar Raste 
wrote:

> Hi,
> I am trying to run Solr 4 on JDK11, although this version is not supported
> on JDK11 it seems to be working fine except for the error/exception "Unmap
> hack not supported on this platform".
> What the risks/downsides of running into this.
>
-- 
— Pushkar Raste


Re: Clustering error - Solr 8.2

2019-08-28 Thread Erick Erickson
What it says ;) 

My guess is that your configuration mentions the field “features” in, perhaps 
carrot.snippet or carrot.title.

But it’s a guess.
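For what it's worth, that mapping normally lives in the clustering engine definition
in solrconfig.xml, and the stock example maps carrot.snippet to a field literally
named "features", which is where this error usually comes from. Roughly like this
(the field names below are the example config's, so point them at fields that exist
in the DOCS schema):

<searchComponent name="clustering" class="solr.clustering.ClusteringComponent">
  <lst name="engine">
    <str name="name">lingo</str>
    <str name="carrot.title">name</str>
    <str name="carrot.snippet">features</str>  <!-- change to a field that exists in your schema -->
  </lst>
</searchComponent>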

Best,
Erick

> On Aug 28, 2019, at 5:18 PM, Joe Obernberger  
> wrote:
> 
> Hi All - trying to use clustering with SolrCloud 8.2, but getting this error:
> 
> "msg":"Error from server at null: org.apache.solr.search.SyntaxError: Query 
> Field 'features' is not a valid field name",
> 
> The URL, I'm using is:
> http://solrServer:9100/solr/DOCS/select?q=*%3A*&qt=/clustering&clustering=true&clustering.collection=true
>   
> 
> 
> Thanks for any ideas!
> 
> Complete response:
> {
>  "responseHeader":{
>"zkConnected":true,
>"status":400,
>"QTime":38,
>"params":{
>  "q":"*:*",
>  "qt":"/clustering",
>  "clustering":"true",
>  "clustering.collection":"true"}},
>  "error":{
>"metadata":[
>  "error-class","org.apache.solr.common.SolrException",
>  "root-error-class","org.apache.solr.common.SolrException",
>  
> "error-class","org.apache.solr.client.solrj.impl.BaseHttpSolrClient$RemoteSolrException",
>  
> "root-error-class","org.apache.solr.client.solrj.impl.BaseHttpSolrClient$RemoteSolrException"],
>"msg":"Error from server at null: org.apache.solr.search.SyntaxError: 
> Query Field 'features' is not a valid field name",
>"code":400}}
> 
> 
> -Joe
> 



Solr query exclude parent when no _childDocuments exists

2019-08-28 Thread craftlogan
So in Solr I have a data structure that is made of of Restaurant Documents
that have Deals attached as _childDocuments to a Restaurant. However with my
current query

q={!parent which='content_type:restaurant'}_text_:kids
&fl=*,[child parentFilter=content_type:restaurant
childFilter="content_type:deal AND day:Monday" limit=3]

I will receive the response below but I want to exclude the bottom
restaurant Bob Evans because I want to exclude any restaurant that has no
child documents(deals) attached. Any help on this query will be greatly
appreciated.




{
  id: "restaurant_7",
  content_type: "restaurant",
  deal_image: [
"/storage/restaurants/images/BoneFire Smokehouse/BoneFire
Smokehousedeal_image.png"
  ],
  name: [
"BoneFire Smokehouse"
  ],
  cuisine: [
" BBQ"
  ],
  description: [
"In 2009, The Bone Fire Smokehouse was originally located in Kingsport,
Tennessee. In 2012, sensing a real renaissance of economic and musical
growth, this popular eatery and entertainment venue was moved to historic
downtown Abingdon. The new Smokehouse opened in the former Withers Hardware
Store building, and a new better Bone Fire Smokehouse came to life in
another vintage location. They have become a popular choice for both
residents and tourists, serving as an integral part of the growing Abingdon
Dining and Musical Renaissance for the last five years."
  ],
  phone: [
42389589166
  ],
  city: "Abingdon",
  state: "Virginia",
  zip: 24210,
  country: "United States",
  full_address: [
"260 W Main St, Abingdon, VA 24210, USA"
  ],
  latlon: "36.7089569,-81.9791766",
  _version_: 1642803109735432200,
  _childDocuments_: [
{
  id: "restaurant_7_deal_149",
  deal: "Free Kid's Meal with Purchase of Adult Meal",
  deal_type: "Kids",
  content_type: "deal",
  day: "Monday",
  _version_: 1642803109735432200
}
  ]
},
{
  id: "restaurant_5",
  content_type: "restaurant",
  deal_image: [
   
"/Applications/MAMP/htdocs/dailyeatz/storage/app/public/restaurants/images/Bob
Evans/Bob Evansdeal_image.png"
  ],
  name: [
"Bob Evans"
  ],
  cuisine: [
" American"
  ],
  description: [
"Our success is built on the basics: high-quality food and heartfelt
hospitality. In the words of our founder, Everybody is somebody at Bob
Evans. We invite you to join us for a meal as we bring the values of the
farm to the table by providing, flavorful meals at our place or yours."
  ],
  phone: [
42389589166
  ],
  city: "Johnson City",
  state: "Tennessee",
  zip: 37615,
  country: "United States",
  full_address: [
"2801 Boones Creek Rd, Johnson City, TN 37615, USA"
  ],
  latlon: "36.3802985,-82.4254633",
  _version_: 1642803106647376000
}



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: req.getCore().getCoreContainer().getCore(core_name) is returning null - Solr 8.2.0

2019-08-28 Thread Erick Erickson
No, you cannot just use the collection name. Replicas are just cores.
You can host many replicas of a single collection on a single Solr node
in a single CoreContainer (there’s only one per Solr JVM). If you just
specified a collection name how would the code have any clue which 
of the possibilities to return?

The name is in the form collection_shard1_replica_n21

How do you know where the doc you’re working on? Put the ID through
the hashing mechanism.

This isn’t the same at all if you’re running stand-alone, then there’s only
one name.

But as I indicated above, your ask for just using the collection name isn’t
going to work by definition.

So perhaps this is an XY problem. You’re asking about getCore, which is
a very specific, low-level concept. What are you trying to do at a higher
level? Why do you think you need to get a core? What do you want to _do_
with the doc that you need the core it resides in?
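
If the underlying need is just "give me whatever local core backs collection X",
one way is to go through the core descriptors. A sketch only, assuming SolrCloud
mode and that a replica of the collection actually lives on the local node
("myCollection" is a placeholder):

CoreContainer cc = req.getCore().getCoreContainer();
SolrCore found = null;
for (String name : cc.getAllCoreNames()) {
  CoreDescriptor cd = cc.getCoreDescriptor(name);
  if (cd != null && cd.getCloudDescriptor() != null
      && "myCollection".equals(cd.getCloudDescriptor().getCollectionName())) {
    found = cc.getCore(name);   // increments the core's reference count
    break;
  }
}
if (found != null) {
  try {
    // ... work with the core ...
  } finally {
    found.close();              // always release the reference
  }
}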

Best,
Erick

> On Aug 28, 2019, at 5:28 PM, Arnold Bronley  wrote:
> 
> Wait, would I need to use a core name like collection1_shard1_replica_n4
> etc.? Can't I use the collection name? What if I have multiple shards? How
> would I know where the document that I am working with currently lives?
> I would rather use the collection name and expect the core
> information to be abstracted away.
> 
> On Wed, Aug 28, 2019 at 5:13 PM Erick Erickson 
> wrote:
> 
>> Hmmm, should work. What is your core_name? There’s strings like
>> collection1_shard1_replica_n4 and core_node6. Are you sure you’re using the
>> right one?
>> 
>>> On Aug 28, 2019, at 3:56 PM, Arnold Bronley 
>> wrote:
>>> 
>>> Hi,
>>> 
>>> In a custom Solr plugin code,
>>> req.getCore().getCoreContainer().getCore(core_name) is returning null
>> even
>>> if core by name core_name is loaded and up in Solr. req is object
>>> of SolrQueryRequest class. I am using Solr 8.2.0 in SolrCloud mode.
>>> 
>>> Any ideas on why this might be the case?
>> 
>> 



Re: Turn off CDCR for only selected target clusters

2019-08-28 Thread Arnold Bronley
Hi Erick,

I have configured CDCR collection-wise only; there is no other way. I tried
defining the 3 zkHosts the way you show (comma-separated values for the zkHost
property) first, as it was more intuitive, but it did not work for me. I had to
use 3 different replica elements, one for each of the 3 target SolrCloud
clusters (sketched below). The source and target properties mention the same
collection name in my case. Instead of hardcoding it, I am using the
collection.configName variable, which gets replaced by the name of the
collection this solrconfig.xml belongs to.

If I follow your configuration (which, as I have tested, does not work in my
case), my question remains: how do I stop sending CDCR updates to targetZkHost2
and targetZkHost3 while still sending them to targetZkHost1?
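
For reference, the shape that works for me is one replica block per target,
roughly like this (a sketch of the layout, not my literal file):

<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
  <lst name="replica">
    <str name="zkHost">${targetZkHost1}</str>
    <str name="source">${collection.configName}</str>
    <str name="target">${collection.configName}</str>
  </lst>
  <lst name="replica">
    <str name="zkHost">${targetZkHost2}</str>
    <str name="source">${collection.configName}</str>
    <str name="target">${collection.configName}</str>
  </lst>
  <lst name="replica">
    <str name="zkHost">${targetZkHost3}</str>
    <str name="source">${collection.configName}</str>
    <str name="target">${collection.configName}</str>
  </lst>
  <lst name="replicator">
    <str name="threadPoolSize">8</str>
    <str name="schedule">1000</str>
    <str name="batchSize">128</str>
  </lst>
</requestHandler>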

On Tue, Aug 13, 2019 at 3:23 PM Erick Erickson 
wrote:

> You configure CDCR by _collection_, so this question really makes no
> sense.
> You’d never mention collection.configName. So what I suspect is that you’re
> misreading the docs.
>
> 
> ${targetZkHost1},${targetZkHost2},${targetZkHost3}
> sourceCollection_on_local_cluster
> targetCollection_on_targetZkHost1 2 and 3
> 
>
> “Turning off CDCR” selective for ZooKeeper instances really makes no sense
> as the
> point of ZK ensembles is to keep running even if one goes away.
>
> So can you rephrase the question? Or state the problem you’re trying to
> solve another way?
>
> Best,
> Erick
>
> > On Aug 13, 2019, at 1:57 PM, Arnold Bronley 
> wrote:
> >
> > Hi,
> >
> > Is there a way to turn off the CDCR for only selected target clusters.
> >
> > Say, I have a configuration like following. I have 3 target clusters
> > targetZkHost1, targetZkHost2 and targetZkHost3. Is it possible to turn
> off
> > the CDCR for targetZkHost2 and targetZkHost3 but keep it on for
> > targetZkHost1?
> >
> > E.g.
> >
> >  
> > 
> > ${targetZkHost1}
> > ${collection.configName}
> > ${collection.configName}
> > 
> >
> > 
> > ${targetZkHost2}
> > ${collection.configName}
> > ${collection.configName}
> > 
> >
> > 
> > ${targetZkHost3}
> > ${collection.configName}
> > ${collection.configName}
> > 
> >
> > 
> > 8
> > 1000
> > 128
> > 
> >
> > 
> > 1000
> > 
> >
> > 
> > disabled
> > 
> >  
>
>


req.getCore().getCoreContainer().getCore(core_name) is returning null - Solr 8.2.0

2019-08-28 Thread Arnold Bronley
Hi,

In a custom Solr plugin code,
req.getCore().getCoreContainer().getCore(core_name) is returning null even
if core by name core_name is loaded and up in Solr. req is object
of SolrQueryRequest class. I am using Solr 8.2.0 in SolrCloud mode.

Any ideas on why this might be the case?


Re: SOLR 7+ / Lucene 7+ and performance issues with DelegatingCollector and PostFilter

2019-08-28 Thread Toke Eskildsen
Wittenberg, Lucas  wrote:
> As suggested I switched to using DocValues and SortedDocValues.
> Now QTime is down to an average of 1100, which is much, much better
> but still far from the 30 I had with SOLR 4.
> I suppose it is due to the block-oriented compression you mentioned.

I apologize for being unclear: Only stored fields are block compressed in Solr 
7. doc values for string fields are ... well, also compressed, but in much 
smaller blocks (prefix compression as far as I remember) and each string field 
separately, so they should be very fast to access.

1100 ms for Solr 7 vs. 30 ms for Solr 4 sounds like a huge difference. You 
don't by chance use a fully merged (aka "optimized") index? Doc values in Solr 
7 can (very counter-intuitively) suffer from that for some access patterns.

Maybe you are doing something sub-optimal like calling DocValues.getSorted for 
each collect call? Could you share your code somewhere?
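
For reference, the usual pattern is to pull the doc values once per segment in
doSetNextReader and only advance them in collect, roughly like this (the field
name is taken from your earlier mail; the rest is a sketch, not your code):

private SortedDocValues customIdValues;

@Override
protected void doSetNextReader(LeafReaderContext context) throws IOException {
  super.doSetNextReader(context);
  // Fetch the doc values once per segment, not once per collected document.
  customIdValues = DocValues.getSorted(context.reader(), "customid");
}

@Override
public void collect(int docNumber) throws IOException {
  // docNumber is segment-local, which is exactly what the doc values iterator expects.
  if (customIdValues.advanceExact(docNumber)) {
    String customId = customIdValues.binaryValue().utf8ToString();
    if (customMap.containsKey(customId)) {
      super.collect(docNumber);
    }
  }
}

If that is still too slow, comparing ordinals instead of materialising strings is
the next step, but the per-segment lookup above is the important part.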

- Toke Eskildsen


Re: req.getCore().getCoreContainer().getCore(core_name) is returning null - Solr 8.2.0

2019-08-28 Thread Arnold Bronley
Wait, would I need to use a core name like collection1_shard1_replica_n4
etc.? Can't I use the collection name? What if I have multiple shards? How
would I know where the document that I am working with currently lives?
I would rather use the collection name and expect the core
information to be abstracted away.

On Wed, Aug 28, 2019 at 5:13 PM Erick Erickson 
wrote:

> Hmmm, should work. What is your core_name? There’s strings like
> collection1_shard1_replica_n4 and core_node6. Are you sure you’re using the
> right one?
>
> > On Aug 28, 2019, at 3:56 PM, Arnold Bronley 
> wrote:
> >
> > Hi,
> >
> > In a custom Solr plugin code,
> > req.getCore().getCoreContainer().getCore(core_name) is returning null
> even
> > if core by name core_name is loaded and up in Solr. req is object
> > of SolrQueryRequest class. I am using Solr 8.2.0 in SolrCloud mode.
> >
> > Any ideas on why this might be the case?
>
>


Re: Turn off CDCR for only selected target clusters

2019-08-28 Thread Shawn Heisey

On 8/28/2019 1:42 PM, Arnold Bronley wrote:

I have configured the SolrCloud collection-wise only and there is no other
way. The way you have defined 3 zkHosts (comma separated values for zkHost
property), I tried that one before as it was more intuitive. But it did not
work for me. I had to use 3 different replica elements each for one of the
3 SolrCloud clusters. source and target properties mention the same
collection name in my case. Instead of hardcoding it, I am using the
collection.configName variable which gets replaced by the collection name
to which this solrconfig.xml belongs to.


I am pretty sure that ${collection.configName} refers to the 
configuration name stored in zookeeper, NOT the collection name.  There 
is nothing at all in Solr that requires those names to be the same, and 
for many SolrCloud installs, they are not the same.  If this is working 
for you, then you're probably naming your configs the same as the 
collection.  If you were to ever use the same config on multiple 
collections, that would probably stop working.


I do not know if there is a property with the collection name.  There 
probably is.


Thanks,
Shawn


Clustering error - Solr 8.2

2019-08-28 Thread Joe Obernberger
Hi All - trying to use clustering with SolrCloud 8.2, but getting this 
error:


"msg":"Error from server at null: org.apache.solr.search.SyntaxError: Query Field 
'features' is not a valid field name",

The URL, I'm using is:
http://solrServer:9100/solr/DOCS/select?q=*%3A*&qt=/clustering&clustering=true&clustering.collection=true
  


Thanks for any ideas!

Complete response:
{
  "responseHeader":{
"zkConnected":true,
"status":400,
"QTime":38,
"params":{
  "q":"*:*",
  "qt":"/clustering",
  "clustering":"true",
  "clustering.collection":"true"}},
  "error":{
"metadata":[
  "error-class","org.apache.solr.common.SolrException",
  "root-error-class","org.apache.solr.common.SolrException",
  
"error-class","org.apache.solr.client.solrj.impl.BaseHttpSolrClient$RemoteSolrException",
  
"root-error-class","org.apache.solr.client.solrj.impl.BaseHttpSolrClient$RemoteSolrException"],
"msg":"Error from server at null: org.apache.solr.search.SyntaxError: Query 
Field 'features' is not a valid field name",
"code":400}}


-Joe



Re: 8.2.0 getting warning - unable to load jetty, not starting JettyAdminServer

2019-08-28 Thread Arnold Bronley
@Furkan: You might be right. I am getting this permission error when I
start the Solr but it hasn't caused any visible issues yet.
 /opt/solr/bin/solr: line 2130: /var/solr/solr-8983.pid: Permission denied

On Wed, Aug 21, 2019 at 6:33 AM Martijn Koster 
wrote:

> Hi Arnold,
>
> It’s hard to say without seeing exactly what you’re doing and exactly what
> you’re seeing.
> Simplify it first, ie remove your custom plugins and related config and
> see if the problem reproduces still, then try without cloud mode and see it
> it reproduces still. Then create an issue on
> https://github.com/docker-solr/docker-solr/issues <
> https://github.com/docker-solr/docker-solr/issues>, labelled as a
> question, with the exact command you run and its full output, and attach
> your zipped-up project directory (Dockerfile, config files and plugins, and
> full docker log output).
>
> — Martijn
>
> > On 20 Aug 2019, at 19:26, Arnold Bronley 
> wrote:
> >
> > Hi,
> >
> > I am using 8.2.0-slim version. I wrap it in my own image by specifying
> some
> > additional settings in Dockerfile (all it does is specify a custom Solr
> > home, copy my config files and custom Solr plugins to container and boot
> in
> > SolrCloud mode).
> > All things same, if I just change version from 8.2.0-slim to 8.1.1-slim
> > then I do not get any such warning.
> >
> > On Tue, Aug 20, 2019 at 5:01 AM Furkan KAMACI 
> > wrote:
> >
> >> Hi Arnold,
> >>
> >> Such errors may arise due to file permission issues. I can run latest
> >> version without of Solr via docker image without any errors. Could you
> >> write which steps do you follow to run Solr docker?
> >>
> >> Kind Regards,
> >> Furkan KAMACI
> >>
> >> On Tue, Aug 20, 2019 at 1:25 AM Arnold Bronley  >
> >> wrote:
> >>
> >>> Hi,
> >>>
> >>> I am getting following warning in Solr admin UI logs. I did not get
> this
> >>> warning in Solr 8.1.1
> >>> Please note that I am using Solr docker slim image from here -
> >>> https://hub.docker.com/_/solr/
> >>>
> >>> Unable to load jetty, not starting JettyAdminServer
> >>>
> >>
>
>


Re: What are the risk of running into "Unmap hack not supported on this platform"

2019-08-28 Thread Pushkar Raste
I understand that the problem will not be fixed. What I am trying to
understand is that even with the exception (the only exception I saw after
running my Solr 4 cluster on JDK 11 for 4 weeks), I am able to index and query
documents just fine.

What does this exception really affect?

On Wed, Aug 28, 2019 at 3:08 PM Shawn Heisey  wrote:

> On 8/27/2019 8:22 AM, Pushkar Raste wrote:
> > I am trying to run Solr 4 on JDK11, although this version is not
> supported
> > on JDK11 it seems to be working fine except for the error/exception
> "Unmap
> > hack not supported on this platform".
> > What the risks/downsides of running into this.
>
> The first version of Solr that was qualified with Java 9 was Solr 7.0.0.
>   New Java versions did not work properly with older versions of Solr.
> Java 8 is as high as you can go with Solr 4.
>
> Solr versions up through 4.7.x have a minimum Java version requirement
> of Java 6.  From 4.8.0 through 5.x, Java 7 is required as a minimum.
> Starting with Solr 6.0.0, the minimum requirement moved to Java 8.  When
> Solr 9.0.0 is released, its minimum requirement will be Java 11.
>
> Right now, with Solr 8.x being the current version, Solr 7.x is only
> going to get major bugfixes, and there will be no updates at all to
> version 6.x and older.  The problem you're running into with Solr 4 on
> Java 11 will not be fixed.  If you want to run Java 11, you will need to
> upgrade to the latest Solr 7.x or 8.x.  Early 7.x versions would not
> work with Java 10 or later.
>
> Thanks,
> Shawn
>
-- 
— Pushkar Raste


Re: What are the risk of running into "Unmap hack not supported on this platform"

2019-08-28 Thread Jörn Franke
It is simply a risk. It is not tested. Any functionality may fail eventually or 
have unknown side effects in the long run. It is also not clear to me why you 
want to update Java, but not Solr. If you want the latest security fixes, bug 
fixes and new features then I would go first for a new Solr version and 
afterwards for a newer JDK.

> Am 28.08.2019 um 21:58 schrieb Pushkar Raste :
> 
> I understand that the problem will not be fixed. What I am trying to
> understand is even with the exception (the only exception I saw after
> running my Solr4 cluster on JDK11 for 4 weeks), I am able index and query
> documents just fine.
> 
> What does this exception really affect.
> 
>> On Wed, Aug 28, 2019 at 3:08 PM Shawn Heisey  wrote:
>> 
>>> On 8/27/2019 8:22 AM, Pushkar Raste wrote:
>>> I am trying to run Solr 4 on JDK11, although this version is not
>> supported
>>> on JDK11 it seems to be working fine except for the error/exception
>> "Unmap
>>> hack not supported on this platform".
>>> What the risks/downsides of running into this.
>> 
>> The first version of Solr that was qualified with Java 9 was Solr 7.0.0.
>>  New Java versions did not work properly with older versions of Solr.
>> Java 8 is as high as you can go with Solr 4.
>> 
>> Solr versions up through 4.7.x have a minimum Java version requirement
>> of Java 6.  From 4.8.0 through 5.x, Java 7 is required as a minimum.
>> Starting with Solr 6.0.0, the minimum requirement moved to Java 8.  When
>> Solr 9.0.0 is released, its minimum requirement will be Java 11.
>> 
>> Right now, with Solr 8.x being the current version, Solr 7.x is only
>> going to get major bugfixes, and there will be no updates at all to
>> version 6.x and older.  The problem you're running into with Solr 4 on
>> Java 11 will not be fixed.  If you want to run Java 11, you will need to
>> upgrade to the latest Solr 7.x or 8.x.  Early 7.x versions would not
>> work with Java 10 or later.
>> 
>> Thanks,
>> Shawn
>> 
> -- 
> — Pushkar Raste


Re: What are the risk of running into "Unmap hack not supported on this platform"

2019-08-28 Thread Shawn Heisey

On 8/28/2019 1:58 PM, Pushkar Raste wrote:

What does this exception really affect.


I believe it is related in some way to how Lucene uses Java's MMAP 
capability to access data on disk.  The MMAP functionality that Lucene 
uses required changes to properly support later Java versions.  There 
were also probably other changes required.


I have no idea whether or not Lucene can still function properly when 
you run into that exception.  Your experience suggests that it can, but 
you're in untested territory, and you may find, as Jörn was saying, that 
in the long run you are damaging your index and that it could stop 
working in some catastrophic way.


Thanks,
Shawn


Re: req.getCore().getCoreContainer().getCore(core_name) is returning null - Solr 8.2.0

2019-08-28 Thread Erick Erickson
Hmmm, should work. What is your core_name? There’s strings like 
collection1_shard1_replica_n4 and core_node6. Are you sure you’re using the 
right one?

> On Aug 28, 2019, at 3:56 PM, Arnold Bronley  wrote:
> 
> Hi,
> 
> In a custom Solr plugin code,
> req.getCore().getCoreContainer().getCore(core_name) is returning null even
> if core by name core_name is loaded and up in Solr. req is object
> of SolrQueryRequest class. I am using Solr 8.2.0 in SolrCloud mode.
> 
> Any ideas on why this might be the case?



Re: Turn off CDCR for only selected target clusters

2019-08-28 Thread Arnold Bronley
@Shawn: You are right. In my case, the collection name is same as
configuration name and that is why it works. Do you know if there is some
other property that I can use that refers to the collection name instead?

On Wed, Aug 28, 2019 at 3:52 PM Shawn Heisey  wrote:

> On 8/28/2019 1:42 PM, Arnold Bronley wrote:
> > I have configured the SolrCloud collection-wise only and there is no
> other
> > way. The way you have defined 3 zkHosts (comma separated values for
> zkHost
> > property), I tried that one before as it was more intuitive. But it did
> not
> > work for me. I had to use 3 different replica elements each for one of
> the
> > 3 SolrCloud clusters. source and target properties mention the same
> > collection name in my case. Instead of hardcoding it, I am using the
> > collection.configName variable which gets replaced by the collection name
> > to which this solrconfig.xml belongs to.
>
> I am pretty sure that ${collection.configName} refers to the
> configuration name stored in zookeeper, NOT the collection name.  There
> is nothing at all in Solr that requires those names to be the same, and
> for many SolrCloud installs, they are not the same.  If this is working
> for you, then you're probably naming your configs the same as the
> collection.  If you were to ever use the same config on multiple
> collections, that would probably stop working.
>
> I do not know if there is a property with the collection name.  There
> probably is.
>
> Thanks,
> Shawn
>


Re: Problem of Shutdown Process for Windows Server

2019-08-28 Thread Kayak28
Hello, Shawn and Community:

Thank you for a quick response, and giving useful information.

As far as I understand, the main cause of this problem is something like a
timeout.
This timeout could happen because memory is insufficient for our index
size, so the waiting time is not enough to stop Solr gracefully.
And when the timeout happens, the user needs the right permission
to kill the Solr process.
If the user is not allowed to kill it, the Solr process keeps holding its
port number.
As a result, when I start Solr again, it shows me "Address already in use."

The possible actions I can take to resolve this issue are to add more
memory and to check the user's permissions.
Do I understand your email correctly?

I believe the following lines of code will shut down Solr forcefully
(because the comments say so).
These lines come from lines 718-735 of the solr.cmd file.
Do you think it would be better if I changed "timeout /T 5" to a greater number
so that the waiting time becomes longer than 5 seconds?
Would a longer waiting time help the system shut down Solr gracefully?

IF "%%x"=="0.0.0.0" (
set found_it=1
@echo Stopping Solr process %%N running on port %SOLR_PORT%
set /A STOP_PORT=%SOLR_PORT% - 1000
"%JAVA%" %SOLR_SSL_OPTS% -Djetty.home="%SOLR_SERVER_DIR%" -jar
"%SOLR_SERVER_DIR%\start.jar" "%SOLR_JETTY_CONFIG%" STOP.PORT=!STOP_PORT!
STOP.KEY=%STOP_KEY% --stop
del "%SOLR_TIP%"\bin\solr-%SOLR_PORT%.port
*timeout /T 5*
REM Kill it if it is still running after the graceful
For /f "tokens=2,5" %%j in ('netstat -nao ^| find "TCP " ^|
find ":%SOLR_PORT% "') do (
  IF "%%N"=="%%k" (
for /f "delims=: tokens=1,2" %%a IN ("%%j") do (
  IF "%%a"=="0.0.0.0" (
@echo Forcefully killing process %%N
  *  taskkill /f /PID %%N*
  )
)
  )
)
  )



I need to check my Java version later, but I wonder why Java version should
be at least 1.8.0_40?

Again, thank you for your response.

Sincerely,
Kaya Ota






2019年8月29日(木) 0:33 Shawn Heisey :

> On 8/28/2019 4:01 AM, Kayak28 wrote:
> > I use Solr with Windows servers, and cannot shutdown Solr successfully.
> > When I try to stop Solr using solr.cmd, which is kicked from Windows Task
> > Manager, it "looks" like Solr stops without any problem.
> > Here "looks" means that at least log file that Solr wrote does not seem
> to
> > have any error.
> > (I pasted a piece of the log where I believe "success" at the end of this
> > email )
>
> The solr.cmd script will try to stop Solr gracefully, wait five seconds,
> and then forcibly terminate it.  We have modified this operation in
> recent versions for operating systems other than windows, so that the
> bash script will wait up to three minutes for Solr to terminate
> gracefully before it is forcibly terminated.  But this has not been done
> on Windows.
>
> > *Environment*
> > OS: Windows Server 2012 R2
> > Java: Oracle JDK 1.8.0
> > Solr  Version: 5.2.1
>
> Which of the many Java 8 versions are you running?  1.8.0 is not
> specific enough.  You should be running at least build 40, and something
> numbered above 100 would be better.  The latest 1.8.0 versions of Oracle
> Java have a different license than earlier versions, something you might
> need to be aware of.  Oracle is now requiring a paid license for any
> production use of their Java.  Only development can be done for free.
>
> > Solr Structures:15 Solr server, enabled to distributed search with
> sharding
> > (Not using SolrCloud)
> > Memory(Solr / physical) : 20GB/32GB
> > Index Size: around 300GB
>
> You should probably have at least 128GB of total system memory for 300GB
> of index, and 256GB would be better.  Assuming that there is no software
> other than Solr on this machine, you only have about 12GB of memory left
> to cache that 300GB of index data.  If there is other software on the
> system, there will probably be even less memory available.  This could
> cause Solr to be very slow to shut down.
>
> Maybe the user that's running the stop command doesn't have permission
> to forcibly terminate the Solr process.  In which case you would have to
> wait for the graceful shutdown, and as I just mentioned, that could be
> very slow on your setup.
>
> Thanks,
> Shawn
>


Re: Question: Solr perform well with thousands of replicas?

2019-08-28 Thread Hendrik Haddorp

Hi,

we are usually using Solr Clouds with 5 nodes and up to 2000 collections
and a replication factor of 2. So we have close to 1000 cores per node.
That is on Solr 7.6 but I believe 7.3 worked as well. We tuned a few
caches down to a minimum as otherwise the memory usage goes up a lot.
The Solr UI is having some problems with a high number of collections,
like lots of timeouts when loading the status.
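"Tuned down" means something like the following inside the <query> section of
solrconfig.xml (the sizes here are illustrative, not our exact values):

<filterCache class="solr.FastLRUCache" size="64" initialSize="0" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="64" initialSize="0" autowarmCount="0"/>
<documentCache class="solr.LRUCache" size="64" initialSize="0" autowarmCount="0"/>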

Older Solr versions had problem with the overseer queue in ZooKeeper. If
you restarted too many nodes at once then the queue got too long and
Solr died and required some help and cleanup to start at all again.

regards,
Hendrik

On 29.08.19 05:27, Hongxu Ma wrote:

Hi
I have a solr-cloud cluster, but it's unstable when collection number is big: 
1000 replica/core per solr node.

To solve this issue, I have read the performance guide:
https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems

I noted there is a sentence on solr-cloud section:
"Recent Solr versions perform well with thousands of replicas."

I want to know does it mean a single solr node can handle thousands of 
replicas? or a solr cluster can (if so, what's the size of the cluster?)

My solr version is 7.3.1 and 6.6.2 (looks they are the same in performance)

Thanks for you help.






Question: Solr perform well with thousands of replicas?

2019-08-28 Thread Hongxu Ma
Hi
I have a SolrCloud cluster, but it's unstable when the number of collections is big:
1000 replicas/cores per Solr node.

To solve this issue, I have read the performance guide:
https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems

I noted there is a sentence on solr-cloud section:
"Recent Solr versions perform well with thousands of replicas."

I want to know: does this mean a single Solr node can handle thousands of
replicas, or that a whole Solr cluster can (and if so, what size of cluster)?

My Solr versions are 7.3.1 and 6.6.2 (they look about the same in performance).

Thanks for your help.