Re: Question on Solr/WordPress Integration

2019-03-01 Thread markus kalkbrenner
If you’re more familiar with PHP you can do the same using the Solarium library 
instead of SolrJ for Java.

Once the PDFs are extracted and indexed, Drupal is an alternative to WordPress 
as a front end. Using the Search API Solr module, you can access and "present" any 
existing Solr index without a single line of custom code.

Markus

> On 02.03.2019 at 01:30, Erick Erickson wrote:
> 
> Writing a Java (SolrJ) program that traverses a filesystem and extracts the 
> contents of PDF is actually quite simple, see: 
> https://lucidworks.com/2012/02/14/indexing-with-solrj/ (you can ignore the 
> RDBMS stuff). That code is a little out of date so may need some very minor 
> tweaks.
> 
> Tika (the library Solr uses to parse PDFs and most other files) may have 
> something that makes the job even easier, I’d ask on their user’s list. 
> Putting WordPress in the middle of it all seems unnecessarily complicated.
> 
> Best,
> Erick
> 
>> On Mar 1, 2019, at 11:18 AM, Paul Buiocchi  wrote:
>> 
>> Thank you Shawn !
>> 
>> Sent from Yahoo Mail on Android 
>> 
>> On Fri, Mar 1, 2019 at 12:25 PM, Paul Buiocchi 
>> wrote:   Greetings, 
>> 
>> I have a couple of questions about Solr /Wordpress integration - 
>> 
>> First , I am not "committed to using WordPress as a front end. If there is a 
>> better front end option , I would be willing to convert. For functionality , 
>> all I am looking for is the ability to full txt search , highlight the 
>> search terms in the search results  It should be pretty simple , maybe I 
>> am overanalyzing it  ...Looking for as much "out of the box" as possible 
>> 
>> My scenario is this: 
>> 
>> I am putting together an old newspaper archive site . about 25k pdf files 
>> that are full txt searchable. 
>> 
>> Questions on architecture: 
>> 1) Is there a way for Solr to index from a local file structure i.e local 
>> drive:/newpaper_name/date/page# ? . From the experimenting I have done with 
>> Wordpress/Solr integration , I found that I had to upload the documents in 
>> Wordpress to get Solr to recognize them . 
>> 
>> I'm sure I will have more questions , any help/suggestions would be greatly 
>> appreciated - thank you  
>> 
>> Sent from Yahoo Mail on Android  
> 


Re: Question on Solr/WordPress Integration

2019-03-01 Thread Erick Erickson
Writing a Java (SolrJ) program that traverses a filesystem and extracts the 
contents of PDF is actually quite simple, see: 
https://lucidworks.com/2012/02/14/indexing-with-solrj/ (you can ignore the 
RDBMS stuff). That code is a little out of date so may need some very minor 
tweaks.
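
The filesystem traversal part of such a program could look roughly like the sketch below. It uses only the JDK and stops at listing the PDF files. The class name PdfFinder and the default root directory are made up for illustration, and the text extraction and the SolrJ indexing call are left as a comment because they depend on the Tika and SolrJ versions in use.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PdfFinder {
    /** Returns every regular file under root whose name ends in ".pdf", sorted by path. */
    static List<Path> findPdfs(Path root) throws IOException {
        // Files.walk keeps directory handles open, so close the stream when done.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString()
                              .toLowerCase(Locale.ROOT).endsWith(".pdf"))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical archive root; pass the real directory as the first argument.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path pdf : findPdfs(root)) {
            // Text extraction and the call that sends the document to Solr would go here.
            System.out.println(pdf);
        }
    }
}
```

With a layout like newspaper_name/date/page, the relative path of each file could also supply the newspaper, date and page fields of the indexed document.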

Tika (the library Solr uses to parse PDFs and most other files) may have 
something that makes the job even easier; I’d ask on their users list. Putting 
WordPress in the middle of it all seems unnecessarily complicated.

Best,
Erick

> On Mar 1, 2019, at 11:18 AM, Paul Buiocchi  wrote:
> 
> Thank you Shawn !
> 
> Sent from Yahoo Mail on Android 
> 
>  On Fri, Mar 1, 2019 at 12:25 PM, Paul Buiocchi 
> wrote:   Greetings, 
> 
> I have a couple of questions about Solr /Wordpress integration - 
> 
> First , I am not "committed to using WordPress as a front end. If there is a 
> better front end option , I would be willing to convert. For functionality , 
> all I am looking for is the ability to full txt search , highlight the search 
> terms in the search results  It should be pretty simple , maybe I am 
> overanalyzing it  ...Looking for as much "out of the box" as possible 
> 
> My scenario is this: 
> 
> I am putting together an old newspaper archive site . about 25k pdf files 
> that are full txt searchable. 
> 
> Questions on architecture: 
> 1) Is there a way for Solr to index from a local file structure i.e local 
> drive:/newpaper_name/date/page# ? . From the experimenting I have done with 
> Wordpress/Solr integration , I found that I had to upload the documents in 
> Wordpress to get Solr to recognize them . 
> 
> I'm sure I will have more questions , any help/suggestions would be greatly 
> appreciated - thank you  
> 
> Sent from Yahoo Mail on Android  



Re: CloudSolrClient Question

2019-03-01 Thread Erick Erickson
First, that resource leak is worrying. Is there any way you could take a stack 
trace and/or memory dump? I suppose it’d be easy enough to simulate. It’s 
particularly worrying because SolrJ is how Solr<->Solr communications happen, so 
if there really is more than a transitory leak, that’d affect Solr as well.

Second, I don’t think there’s a method that does exactly what you want, 
anything like “isOneNodeHealthyForEachSlice()”, but there is 
DocCollection.getActiveSlices(), so I guess you could ask if 
(docCollection.getActiveSlices().size() == known_number_of_shards)…
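
A check along the lines of the hypothetical “isOneNodeHealthyForEachSlice()” could be sketched as below. This is only the decision logic on plain Java collections: ReplicaInfo is a made-up stand-in for what the cluster state reports about a replica, and the map and the live-node set would have to be filled from SolrJ's view of the collection (slices, replica state, live nodes), which is not shown here. The live-node test matters because the recorded replica state can still say "active" for a node that has gone away.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

class ShardHealth {
    /** Hypothetical stand-in for one replica's entry in the cluster state. */
    static final class ReplicaInfo {
        final String nodeName;
        final boolean active;

        ReplicaInfo(String nodeName, boolean active) {
            this.nodeName = nodeName;
            this.active = active;
        }
    }

    /**
     * True only if every shard has at least one replica that is marked active
     * and is hosted on a node currently listed as live.
     */
    static boolean everyShardHasLiveReplica(Map<String, List<ReplicaInfo>> replicasByShard,
                                            Set<String> liveNodes) {
        if (replicasByShard.isEmpty()) {
            return false; // no shards known: treat as not searchable
        }
        for (List<ReplicaInfo> replicas : replicasByShard.values()) {
            boolean usable = false;
            for (ReplicaInfo r : replicas) {
                if (r.active && liveNodes.contains(r.nodeName)) {
                    usable = true;
                    break;
                }
            }
            if (!usable) {
                return false;
            }
        }
        return true;
    }
}
```

The client application could call such a check before launching searches and back off while it returns false, rather than collecting exceptions.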

Best
Erick

> On Mar 1, 2019, at 2:24 PM, Webster Homer  
> wrote:
> 
> I am using the CloudSolrClient Solrj api for querying solr cloud collections. 
> For the most part it works well. However we recently experienced a series of 
> outages where our production cloud became unavailable. All the nodes were 
> down. That's a separate topic... The client application tried to launch 
> searches but always experienced a SolrServerException that there were no live 
> nodes available. After a few hundred such exceptions, the application ran out 
> of memory and failed when trying to allocate a thread... I'm not sure where 
> the resources are being leaked in exception handling. Is there a way to ask 
> the CloudSolrClient if there are enough replicas to execute the search.
> 
> I'm using Solr 7.2



CloudSolrClient Question

2019-03-01 Thread Webster Homer
I am using the CloudSolrClient SolrJ API for querying SolrCloud collections. 
For the most part it works well. However, we recently experienced a series of 
outages where our production cloud became unavailable. All the nodes were down. 
That's a separate topic... The client application tried to launch searches but 
always got a SolrServerException saying that there were no live nodes 
available. After a few hundred such exceptions, the application ran out of 
memory and failed when trying to allocate a thread... I'm not sure where the 
resources are being leaked in exception handling. Is there a way to ask the 
CloudSolrClient if there are enough replicas to execute the search?

I'm using Solr 7.2


Re: Question on Solr/WordPress Integration

2019-03-01 Thread Paul Buiocchi
Thank you Shawn !

Sent from Yahoo Mail on Android 
 
  On Fri, Mar 1, 2019 at 12:25 PM, Paul Buiocchi 
wrote:   Greetings, 

I have a couple of questions about Solr /Wordpress integration - 

First , I am not "committed to using WordPress as a front end. If there is a 
better front end option , I would be willing to convert. For functionality , 
all I am looking for is the ability to full txt search , highlight the search 
terms in the search results  It should be pretty simple , maybe I am 
overanalyzing it  ...Looking for as much "out of the box" as possible 

My scenario is this: 

I am putting together an old newspaper archive site . about 25k pdf files that 
are full txt searchable. 

Questions on architecture: 
1) Is there a way for Solr to index from a local file structure i.e local 
drive:/newpaper_name/date/page# ? . From the experimenting I have done with 
Wordpress/Solr integration , I found that I had to upload the documents in 
Wordpress to get Solr to recognize them . 

I'm sure I will have more questions , any help/suggestions would be greatly 
appreciated - thank you  

Sent from Yahoo Mail on Android  


Re: Python Client for Solr Cloud - Leader aware

2019-03-01 Thread Walter Underwood
There is no guarantee that sending an update to a non-leader node is slower. It 
certainly seems like a bad idea, but forwarding a document is fast and indexing 
a document is slow, so it might not even be measurable.

We’ve indexed a million docs per minute by sending all updates to the load 
balancer for the cluster, ignoring shards or leaders.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Mar 1, 2019, at 9:34 AM, Jason Gerlowski  wrote:
> 
> Hi Ganesh,
> 
> I'm not an expert on pysolr, but from a quick scan of their update
> code, it does look like pysolr attempts to send update requests to _a_
> leader node for a particular collection.  But that's all it does.  It
> doesn't check which shard the document(s) will belong to and try to
> pick the _correct_ leader. If your collections only have 1 shard, this
> is still pretty great.  But if your collections have multiple shards
> (and multiple leaders), then this will perform worse than SolrJ.
> 
> (This is based on what I gleaned from the code here:
> https://github.com/django-haystack/pysolr/blob/master/pysolr.py#L1268
> . Happy to be corrected by someone with more context.)
> 
> Best,
> 
> Jason
> 
> On Tue, Feb 26, 2019 at 1:50 PM Ganesh Sethuraman
>  wrote:
>> 
>> We are using Solr Cloud 7.2.1. Is there a leader aware python client (like
> >> SolrJ for Java), which can send the updates to the leader and is highly
> >> available?
>> I see PySolr https://pypi.org/project/pysolr/ project, not able to find any
>> documentation if it supports leader aware updates.
>> 
>> Regards
>> Ganesh



Re: Solr Reference Guide for version 7.7

2019-03-01 Thread Jason Gerlowski
Hi Edwin,

I volunteered to release the 7.7 ref-guide last week but decided to
wait until 7.7.1 came out to work on it.  (You probably know that
7.7.0 contained some serious bugs.  These would've required
non-trivial documentation effort in the ref-guide, and 7.7.1 already
had a release-manager and was coming soon, so it was simpler to wait.)

I'm back working on the 7.7 ref-guide today and hopefully we'll have
one out next week.  In the meantime, if you'd like to have the latest
documentation you can always check out the source code and build the
ref-guide locally ("ant clean default" from the solr/solr-ref-guide
directory, see the README in that same directory for more help)

Best,

Jason

On Thu, Feb 28, 2019 at 11:05 PM Zheng Lin Edwin Yeo
 wrote:
>
> Hi,
>
> I understand that Solr 7.7.1 has just been released, but Solr 7.7.0 was
> released almost a month ago.
>
> However, at http://lucene.apache.org/solr/guide/ I still cannot
> access the guide for version 7.7; the latest version is still 7.6.
>
> Are there any plans to release the guide for 7.7, or has the site been
> moved to a new URL?
>
> Regards,
> Edwin


Re: Question on Solr/WordPress Integration

2019-03-01 Thread Shawn Heisey

On 3/1/2019 10:25 AM, Paul Buiocchi wrote:

> I have a couple of questions about Solr/WordPress integration -


You would need to talk to the person who wrote the plugin for Wordpress 
that integrates with Solr.  If they indicate that a question can only be 
answered by the Solr project, then bring that to us.



> I am putting together an old newspaper archive site . about 25k pdf files that 
> are full txt searchable.


If you want Solr to index your PDF documents, you would have to use 
SolrCell, also known as the Extracting Request Handler.


We strongly recommend that this functionality should never be used in 
production.  The reason is that the underlying technology, Apache Tika, 
can crash when given certain input.  PDF documents are more likely than 
other kinds to cause this problem.  If Tika crashes when it is being run 
inside Solr, then Solr will also crash.



> Questions on architecture:
> 1) Is there a way for Solr to index from a local file structure i.e local 
> drive:/newpaper_name/date/page# ? . From the experimenting I have done with 
> Wordpress/Solr integration , I found that I had to upload the documents in 
> Wordpress to get Solr to recognize them .


Yes, you can index just about anything you like if you are willing to 
create the configuration and the software to do it.  But in order for 
Wordpress to understand that data, it most likely would have to be done 
through Wordpress.


Thanks,
Shawn


Re: Python Client for Solr Cloud - Leader aware

2019-03-01 Thread Jason Gerlowski
Hi Ganesh,

I'm not an expert on pysolr, but from a quick scan of their update
code, it does look like pysolr attempts to send update requests to _a_
leader node for a particular collection.  But that's all it does.  It
doesn't check which shard the document(s) will belong to and try to
pick the _correct_ leader. If your collections only have 1 shard, this
is still pretty great.  But if your collections have multiple shards
(and multiple leaders), then this will perform worse than SolrJ.

(This is based on what I gleaned from the code here:
https://github.com/django-haystack/pysolr/blob/master/pysolr.py#L1268
. Happy to be corrected by someone with more context.)

Best,

Jason

On Tue, Feb 26, 2019 at 1:50 PM Ganesh Sethuraman
 wrote:
>
> We are using Solr Cloud 7.2.1. Is there a leader aware python client (like
> SolrJ for Java), which can send the updates to the leader and is highly
> available?
> I see PySolr https://pypi.org/project/pysolr/ project, not able to find any
> documentation if it supports leader aware updates.
>
> Regards
> Ganesh


Question on Solr/WordPress Integration

2019-03-01 Thread Paul Buiocchi
Greetings,

I have a couple of questions about Solr/WordPress integration.

First, I am not committed to using WordPress as a front end. If there is a 
better front-end option, I would be willing to convert. For functionality, 
all I am looking for is the ability to do full-text search and highlight the 
search terms in the search results. It should be pretty simple; maybe I am 
overanalyzing it. I am looking for as much "out of the box" as possible.

My scenario is this:

I am putting together an old newspaper archive site: about 25k PDF files that 
are full-text searchable.

Questions on architecture:
1) Is there a way for Solr to index from a local file structure, i.e. local 
drive:/newpaper_name/date/page#? From the experimenting I have done with 
WordPress/Solr integration, I found that I had to upload the documents in 
WordPress to get Solr to recognize them.

I'm sure I will have more questions; any help/suggestions would be greatly 
appreciated. Thank you.

Sent from Yahoo Mail on Android

Re: Giving SolrJ credentials for Zookeeper

2019-03-01 Thread Jason Gerlowski
Hi Ryan,

I haven't tried this myself, but wanted to offer a sanity check based
on how I understand those instructions.

Are you setting the "zkCredentialsProvider", "zkDigestUsername", and
"zkDigestPassword" system-properties on your client app/process as
well as on your Solr/ZK servers?  Or are you just setting it in the
config for your Solr/ZK servers?  I expect those system properties
need to be set for the client process as well, though the ref-guide
page doesn't explicitly say so.
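
If the properties do need to be set on the client side, one way to try it is to set them programmatically before the CloudSolrClient is built, as in the sketch below (passing them as -D options on the client's command line would be the equivalent). The three property names are the ones mentioned above. The provider class name is the one I recall from the ZooKeeper Access Control page of the ref guide and should be checked against it, and the username and password are placeholders.

```java
class ZkClientCredentials {
    /** Sets the ZooKeeper digest-credential system properties for this JVM. */
    static void configure(String username, String password) {
        // Assumed provider class, as recalled from the ref-guide page; verify before use.
        System.setProperty("zkCredentialsProvider",
            "org.apache.solr.common.cloud.VMParamsSingleSetCredentialsDigestZkCredentialsProvider");
        System.setProperty("zkDigestUsername", username);
        System.setProperty("zkDigestPassword", password);
    }

    public static void main(String[] args) {
        // Placeholder credentials; use the ones configured on the Solr/ZK side.
        configure("admin-user", "CHANGE-ME");
        // Build the CloudSolrClient only after this point, so that the ZooKeeper
        // connection it opens can pick the properties up.
        System.out.println(System.getProperty("zkCredentialsProvider"));
    }
}
```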

Best,

Jason

On Tue, Feb 26, 2019 at 12:56 PM Snead, Ryan [USA]  wrote:
>
> I am following along with the example found in Zookeeper Access Control of 
> the Apache Solr 7.5 Reference Guide. I have gotten to the point where I can 
> use the zkcli.sh control script to access my secured Zookeeper environment. I 
> can also connect using Zookeeper's zkCli.sh and then authenticate using the 
> auth command. The point where I run into trouble is having completed the 
> steps in the article, how do I find what parameters to set with SolrJ to 
> allow my indexer code to communicate with Zookeeper.
>
> The error my Java code is returning when I try to process a QueryRequest is: 
> Error reading cluster properties from zookeeper 
> org.apache.zookeeper.KeeperException$NoAuthException: KeeperError Code = 
> NoAuth for /clusterprops.json
>
> My code is:
> solrClient = new CloudSolrClient.Builder("localhost:2181", 
> Optional.of("/")).build();
> String solrQuery = String.format("PRODUCT_TYPE:USER and PRODUCT_SK:%s", 
> productSk);
> SolrQuery q = new SolrQuery();
> q.set("q", solrQuery);
> QueryRequest request = new QueryRequest(q);
> numfound = request.process(solrClient).getResults().getNumFound();
> Error occurs at the last line. I suspect that I need to set a property in 
> solrClient, but it is not clear to me what that would be.
>
> References:
> https://lucene.apache.org/solr/guide/7_5/zookeeper-access-control.html
> ZooKeeper Access Control | Apache Solr Reference Guide 
> 7.5
> Content stored in ZooKeeper is critical to the operation of a SolrCloud 
> cluster. Open access to SolrCloud content on ZooKeeper could lead to a 
> variety of problems.
> lucene.apache.org
>
>


Re: %solr_logs_dir% does not like spaces

2019-03-01 Thread Jason Gerlowski
+1 to submitting a JIRA, even if you cannot find an edit to solr.cmd
to fix the issue.

And +1 to the issue likely just being a lack of double-quotes around
the reference to SOLR_LOG_DIR.

Best,

Jason Gerlowski

On Tue, Feb 26, 2019 at 11:56 AM Erick Erickson  wrote:
>
> If you can munge the solr.cmd file and it works for you, _please_ submit a 
> JIRA and a patch!
>
> Most of the Solr devs develop on *nix boxes, so this kind of thing creeps in 
> and we need to fix it.
>
> Best,
> Erick
>
> > On Feb 26, 2019, at 6:38 AM, paul.d...@ub.unibe.ch wrote:
> >
> > Perhaps the instances of %SOLR_LOGS_DIR% in the solr.cmd files should be 
> > quoted i.e. "%SOLR_LOGS_DIR%" ??
> >
> >
> >
> > Sent from Mail for Windows 10
> >
> >
> >
> > From: Arturas Mazeika
> > Sent: Tuesday, 26 February 2019 15:10
> > To: solr-user@lucene.apache.org
> > Subject: Re: %solr_logs_dir% does not like spaces
> >
> >
> >
> > Hi Paul,
> >
> > getting rid of space in "program files" is doable, you are right. One way
> > to do it is through
> >
> >   - echo %programfiles% ==> C:\Program Files
> >   - echo %programfiles(x86)% ==> C:\Program Files (x86)
> >
> > Getting rid of spaces in sub directories is very difficult as we use tons
> > of those for different components of our suite.
> >
> > Any other options to set it in some XML file or something?
> >
> > Cheers,
> > Arturas
> >
> >
> > On Tue, Feb 26, 2019 at 3:03 PM  wrote:
> >
> >> Looks like a bug in solr.cmd. You could try eliminating the spaces and/or
> >> opening an issue.
> >>
> >>
> >>
> >> Instead of ‘Program Files (x86)’ use ‘PROGRA~2’
> >>
> >> And don’t have spaces in your subdirectory…
> >>
> >>
> >>
> >> NB: Depending on your Windows Version you may Have another alias for
> >> ‘Program Files (x86)’; use «dir /X» to view the aliases.
> >>
> >>
> >>
> >> Sent from Mail for Windows 10
> >>
> >>
> >>
> >> From: Arturas Mazeika
> >> Sent: Tuesday, 26 February 2019 14:41
> >> To: solr-user@lucene.apache.org
> >> Subject: %solr_logs_dir% does not like spaces
> >>
> >>
> >>
> >> Hi All,
> >>
> >> I am testing Solr 7.7 (and 7.6) under Windows. My aim is to write the logs
> >> to a subdirectory whose name contains spaces, inside a directory whose name
> >> also contains spaces.
> >>
> >> If I set on windows:
> >>
> >> setx /m SOLR_LOGS_DIR "f:\solr_deployment\logs"
> >>
> >> and start a solr instance:
> >>
> >> F:\solr_deployment\solr-7.7.0\bin\solr.cmd start -h localhost -p 8983 -s
> >> F:\solr_deployment\solr_data -m 1g
> >>
> >> this goes smoothly.
> >>
> >> However If I set the logging directory to:
> >>
> >> setx /m SOLR_LOGS_DIR  "C:\Program Files (x86)\My Directory\Another
> >> Directory\logs\solr"
> >>
> >> then I get a cryptic error:
> >>
> >> F:\solr_deployment\solr-7.7.0\bin\solr.cmd start -h localhost -p 8983 -s
> >> F:\solr_deployment\solr_data -m 1g
> >> Files was unexpected at this time.
> >>
> >> If I comment out "@echo off" in both solr.cmd and solr.cmd.in, it shows that
> >> it dies around these lines in solr.cmd:
> >>
> >> F:\solr_deployment\solr-7.7.0\bin>IF "" == "" set STOP_KEY=solrrocks
> >> Files was unexpected at this time.
> >>
> >> In the solr.cmd the following block is shown:
> >>
> >> IF "%STOP_KEY%"=="" set STOP_KEY=solrrocks
> >>
> >> @REM This is quite hacky, but examples rely on a different log4j2.xml
> >> @REM so that we can write logs for examples to %SOLR_HOME%\..\logs
> >> IF [%SOLR_LOGS_DIR%] == [] (
> >>  set "SOLR_LOGS_DIR=%SOLR_SERVER_DIR%\logs"
> >> ) ELSE (
> >>  set SOLR_LOGS_DIR=%SOLR_LOGS_DIR:"=%
> >> )
> >>
> >> comments?
> >>
> >> Cheers,
> >> Arturas
> >>
>


Code review for SOLR related changes.

2019-03-01 Thread Fiz Ahmed
Hi Solr Experts,

Can you please suggest Code review techniques for SOLR related changes in a
Project.


Thanks
FIZ
AML Team.


Re: Spring Boot Solr+ Kerberos+ Ambari

2019-03-01 Thread Jason Gerlowski
Hi Rushikesh,

Solr's Kerberos authentication is completely independent of Ranger.
You can set it up to use Ranger, as is common with Hortonworks HDP,
but it's also possible to set up Kerberos+Solr without Ranger in the
picture at all.  I haven't come across a concise explanation of _how_
to do this within Ambari online anywhere.  But there are several
useful resources for configuring Solr+Kerberos outside Ambari that
apply just as well in an Ambari/HDP environment.  (Since Ambari gives
users ultimate flexibility in configuring components, almost anything
you do outside of Ambari can be done inside Ambari.)  See:

https://github.com/chatman/solr-kerberos-docker - a set of demo docker
containers that run a KDC and configure Solr to use it.  A helpful
starting place for configuration.
https://lucene.apache.org/solr/guide/7_6/kerberos-authentication-plugin.html
- Solr's documentation on enabling Kerberos.

That should help you get Kerberos configured.  If your questions are
more around how to do particular operations or how to change
particular configuration options in the Ambari UI, those questions are
better addressed to the Ambari mailing lists or by Hortonworks
support.

Hope that helps,

Jason

On Tue, Feb 26, 2019 at 7:58 AM Rushikesh Garadade
 wrote:
>
> Hi,
> Thanks for the links. I have followed these steps earlier as well; however,
> I did not execute the steps for Ranger as I don't want authorization.
> I didn't have any success.
>
> That's why my question is:
> *Is Ranger mandatory when you just want authentication with Kerberos?*
>
>
> Thank you,
> Rushikesh Garadade
>
> On Thu, Feb 21, 2019, 6:34 PM Furkan KAMACI  wrote:
>
> > Hi,
> >
> > You can also check here:
> >
> > https://community.hortonworks.com/articles/15159/securing-solr-collections-with-ranger-kerberos.html
> >
> > On the other hand, we have a section for Solr Kerberos in the documentation:
> >
> > https://lucene.apache.org/solr/guide/6_6/kerberos-authentication-plugin.html
> >
> > For any Ambari-specific questions, you can ask them at this forum:
> > https://community.hortonworks.com/topics/forum.html
> >
> > Kind Regards,
> > Furkan KAMACI
> >
> > On Thu, Feb 21, 2019 at 1:43 PM Rushikesh Garadade <
> > rushikeshgarad...@gmail.com> wrote:
> >
> > > Hi Furkan,
> > > I think the link you provided is for ranger audit setting, please correct
> > > me if wrong?
> > >
> > > I use HDP 2.6.5. which has Solr 5.6
> > >
> > > Thank you,
> > > Rushikesh Garadade
> > >
> > >
> > > On Thu, Feb 21, 2019, 2:57 PM Furkan KAMACI 
> > > wrote:
> > >
> > > > Hi Rushikesh,
> > > >
> > > > Did you check here:
> > > >
> > > > https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_security/content/solr_ranger_configure_solrcloud_kerberos.html
> > > >
> > > > By the way, which versions do you use?
> > > >
> > > > Kind Regards,
> > > > Furkan KAMACI
> > > >
> > > > On Thu, Feb 21, 2019 at 11:41 AM Rushikesh Garadade <
> > > > rushikeshgarad...@gmail.com> wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > I am trying to set up Kerberos for Solr, which is installed on
> > > > > Hortonworks Ambari.
> > > > >
> > > > > Q1. Is Ranger a mandatory component for Solr Kerberos configuration
> > > > > on Ambari?
> > > > >
> > > > > I am getting a little confused by the documents available on the
> > > > > internet for this. I tried to do it without Ranger but did not have
> > > > > any success.
> > > > >
> > > > > If there is any good document for this, please let me know.
> > > > >
> > > > > Thanks,
> > > > > Rushikesh Garadade.
> > > > >
> > > >
> > >
> >


Re: Old searcher to new searcher

2019-03-01 Thread Shawn Heisey

On 3/1/2019 4:42 AM, Amjad Khan wrote:

> We are trying to extend the AbstractSolrEventListener class and override the 
> newSearcher method. Was curious to know if we can copy the existing searcher's 
> cache to the new searcher instead of executing the queries received from 
> solrconfig, because we are not sure which items were searched most.


In general, no.  Cache entries refer to documents by Lucene ID.  When a 
new searcher is opening, every single Lucene ID in the index might have 
changed.  That is why searches are re-executed -- the information in the 
existing caches cannot be trusted.  There is no way for Solr to 
determine when an existing Lucene ID has not changed.


Thanks,
Shawn


Re: Porter Stem filter and employing

2019-03-01 Thread Shawn Heisey

On 3/1/2019 4:38 AM, Marisol Redondo wrote:

> When using the PorterStemFilter, I saw that the word "employing" is changed
> to "emploi" and my document is not found in the query to Solr because of
> that.
>
> This also happens with other words that end in -ying, such as annoying or
> deploying.
>
> Is there any patch for this filter or should I create a new Jira issue?



When you are using a stemming filter, you will need to use the same 
filter on both the query analysis and the index analysis, so that 
similar words are stemmed to the same root in both cases, leading to 
matches.
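
To illustrate why the odd-looking root is harmless when both sides are stemmed, here is a deliberately simplified sketch of just two Porter rules: dropping "-ing" when the remaining stem contains a vowel, and turning a final "y" into "i" when the rest of the word contains a vowel. It is not the full Porter algorithm and not the code PorterStemFilter runs; the class name PorterSketch is made up. The point is only that the indexed word and the query word end up as the same token.

```java
import java.util.Locale;

class PorterSketch {
    /** 'y' counts as a vowel only when it follows a consonant. */
    static boolean isVowel(String s, int i) {
        char c = s.charAt(i);
        if ("aeiou".indexOf(c) >= 0) {
            return true;
        }
        return c == 'y' && i > 0 && !isVowel(s, i - 1);
    }

    static boolean containsVowel(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (isVowel(s, i)) {
                return true;
            }
        }
        return false;
    }

    /** Applies only the two simplified rules described above. */
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.endsWith("ing")) {
            String base = w.substring(0, w.length() - 3);
            if (containsVowel(base)) {
                w = base; // employing -> employ
            }
        }
        if (w.endsWith("y")) {
            String base = w.substring(0, w.length() - 1);
            if (containsVowel(base)) {
                w = base + "i"; // employ -> emploi
            }
        }
        return w;
    }

    public static void main(String[] args) {
        // Index side and query side produce the same token, so they match.
        System.out.println(stem("employing") + " / " + stem("employ"));
    }
}
```

If the query side is not stemmed, the query term "employing" is compared against the indexed token "emploi" and nothing matches, which is the symptom described in the question.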


If the other steps in your analysis chain are changing the words so that 
the stemming filter cannot recognize the word, that might also cause 
problems.


Thanks,
Shawn


RE: Index database with SolrJ using xml file directly throws an error

2019-03-01 Thread Dyer, James
Instead of dataConfig=data-config.xml, use config=data-config.xml .
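
As a quick way to check the parameter change outside of SolrJ, the same request can be written as a plain URL and opened in a browser or with curl. The sketch below only builds that URL with the JDK. The core URL and the /dataimport handler path are taken from the code quoted below, the class name DataImportUrl is made up, and "optimize" is left out for brevity.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class DataImportUrl {
    /** Request parameters for a full import, using config= rather than dataConfig=. */
    static Map<String, String> fullImportParams() {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("command", "full-import");
        p.put("clean", "true");
        p.put("commit", "true");
        p.put("config", "data-config.xml");
        return p;
    }

    /** Appends the handler path and the URL-encoded parameters to the core URL. */
    static String build(String coreUrl, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(coreUrl).append("/dataimport");
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep).append(encode(e.getKey())).append('=').append(encode(e.getValue()));
            sep = '&';
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8983/solr/test", fullImportParams()));
    }
}
```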

From: sami 
Sent: Friday, March 1, 2019 3:05 AM
To: solr-user@lucene.apache.org
Subject: RE: Index database with SolrJ using xml file directly throws an error

Hi James,

Thanks for your reply. I am not absolutely sure I understood everything
correctly here. I would like to index my database to start with a fresh index.
I have already done it with the DIH execute function.

>

It works absolutely fine. But, I want to use SolrJ API instead of using the
inbuilt execute function. The data-config.xml and solrconfig.xml works fine
with my database.

I am using the same data-config.xml file and solrconfig.xml file to do the
indexing with program mentioned in my query.

String url = "http://localhost:8983/solr/test";
HttpSolrClient server = new HttpSolrClient.Builder(url).build();
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("qt", "/dataimport");
params.set("command", "full-import");
params.set("clean", "true");
params.set("commit", "true");
params.set("optimize", "true");
// I tried this too, as you suggested not to use the full path.
params.set("dataConfig", "data-config.xml");
server.query(params);

I checked the XML file for any bogus characters too, but the same files work
fine with the inbuilt DIH and not with the code. What could it be?



--
Sent from: 
http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: cve-2017-

2019-03-01 Thread Jeff Courtade
Thank you very much

On Fri, Mar 1, 2019 at 12:24 AM Tomás Fernández Löbbe 
wrote:

> I updated the description of SOLR-12770 a bit. The problem
> stated is that, since the "shards" parameter allows any URL, someone could
> make an insecure Solr instance hit some other (secure) web endpoint. Solr
> would throw an exception, but the error may include information from such
> endpoint (parsing error). I don't believe this would allow access to a
> local file (though, if you know of a way, please report to
> secur...@lucene.apache.org)
>
> The only way to know (to my knowledge) if your Solr instance was affected
> is by looking at your Solr logs. If you log queries, you should be able to
> see what's being included in the "shards" parameter and detect something
> that's not looking right. Also, if Solr is fooled to hit some other
> endpoint, it would fail with a parsing error, so you should probably see
> exceptions in your logs. The worst case, I guess, depends on how much
> access the Solr process has and how much damage it can cause to an adjacent
> web endpoint via a GET request.
>
> Note that this can only impact you if your Solr instance can be directly
> accessed by untrusted sources.
>
> HTH
>
> On Thu, Feb 28, 2019 at 11:54 AM Jeff Courtade 
> wrote:
>
> > This particular CVE came out on the mailing list on Feb 12th.
> >
> >
> > CVE-2017-3164 SSRF issue in Apache Solr
> >
> >  I need to know what the exploit for this could be?
> >
> >
> > can a user send a bogus shards param via a web request and get a local
> > file?
> >
> >
> > What does an attack vector look like for this?
> >
> >
> > I am being asked specifically this...
> >
> >
> > -  How would we know if the vulnerability in the Solr CVE was
> > taken advantage of? What are signs of us being exploited? What is the
> > worst case scenario with this CVE?
> >
> > Could someone help me answer this please?
> >
> >
> >
> >
> >
> http://mail-archives.apache.org/mod_mbox/www-announce/201902.mbox/%3CCAECwjAVjBN=wO5rYs6ktAX-5=-f5jdfwbbtsm2ttjebgo5j...@mail.gmail.com%3E
> >
> >
> >
> > the bug is
> >
> >
> >
> > https://issues.apache.org/jira/browse/SOLR-12770
> >
> >
> >
> > the mitigation is upgrading to solr 7.7
> >
>
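
The log check suggested in this thread (looking at what is being passed in the "shards" parameter) could be sketched roughly as follows. It assumes that each logged request line contains the parameters in a form like shards=host:port/solr/core,host:port/solr/core; the actual log layout depends on the Solr version and logging configuration, so the pattern is an assumption to adapt. The class name and the allowed-host set are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ShardsParamAudit {
    // Captures the value of a shards= parameter up to the next '&', '}' or whitespace.
    private static final Pattern SHARDS =
        Pattern.compile("(?:^|[?&{\\s])shards=([^&}\\s]+)");

    /** Returns the shard entries in a log line whose host:port is not in allowedHosts. */
    static List<String> unexpectedShards(String logLine, Set<String> allowedHosts) {
        List<String> suspicious = new ArrayList<>();
        Matcher m = SHARDS.matcher(logLine);
        while (m.find()) {
            // Entries are separated by ',' (shards) or '|' (alternative replicas).
            for (String shard : m.group(1).split("[,|]")) {
                String s = shard.replaceFirst("^https?://", "");
                int slash = s.indexOf('/');
                String host = slash >= 0 ? s.substring(0, slash) : s;
                if (!allowedHosts.contains(host)) {
                    suspicious.add(shard);
                }
            }
        }
        return suspicious;
    }
}
```

Any entry returned for a production log would be worth a closer look, together with the parsing exceptions mentioned above.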


Porter Stem filter and employing

2019-03-01 Thread Marisol Redondo
Hi.

When using the PorterStemFilter, I saw that the word "employing" is changed
to "emploi" and my document is not found in the query to Solr because of
that.

This also happens with other words that end in -ying, such as annoying or
deploying.

Is there any patch for this filter or should I create a new Jira issue?

Thanks


Old searcher to new searcher

2019-03-01 Thread Amjad Khan
We are trying to extend the AbstractSolrEventListener class and override the 
newSearcher method. Was curious to know if we can copy the existing searcher's 
cache to the new searcher instead of executing the queries received from 
solrconfig, because we are not sure which items were searched most.

Will appreciate the help.

Thanks

RE: Index database with SolrJ using xml file directly throws an error

2019-03-01 Thread sami
Hi James,

Thanks for your reply. I am not absolutely sure I understood everything
correctly here. I would like to index my database to start with a fresh index.
I have already done it with the DIH execute function.

 

It works absolutely fine. But, I want to use SolrJ API instead of using the
inbuilt execute function. The data-config.xml and solrconfig.xml works fine
with my database. 

I am using the same data-config.xml file and solrconfig.xml file to do the
indexing with program mentioned in my query. 

String url = "http://localhost:8983/solr/test";
HttpSolrClient server = new HttpSolrClient.Builder(url).build();
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("qt", "/dataimport");
params.set("command", "full-import");
params.set("clean", "true");
params.set("commit", "true");
params.set("optimize", "true");
// I tried this too, as you suggested not to use the full path.
params.set("dataConfig", "data-config.xml");
server.query(params);

I checked the XML file for any bogus characters too, but the same files work
fine with the inbuilt DIH and not with the code. What could it be?



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


RE: MLT and facetting

2019-03-01 Thread Martin Frank Hansen (MHQ)
Hi Walter, 

Thanks for your answer, it makes sense. 

Best regards
Martin


Internal - KMD A/S

-----Original Message-----
From: Walter Underwood  
Sent: 1 March 2019 03:30
To: solr-user@lucene.apache.org
Subject: Re: MLT and facetting

The last time I looked, the MLT was a search handler but not a search 
component. It wasn’t able to be combined with other features. The handler is 
based on very old code, like 1.3.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 28, 2019, at 5:47 PM, Zheng Lin Edwin Yeo  wrote:
> 
> Hi Martin,
> 
> I have no idea on this, as the case has not been active for almost 2 years.
> Maybe I can try to follow up.
> 
> Faceting by default will show the list according to the number of 
> occurrences. But I'm not sure how it will affect the MLT score or how 
> it will be output when combine together, as it is not working 
> currently and there is no way to test.
> 
> Regards,
> Edwin
> 
> On Thu, 28 Feb 2019 at 14:51, Martin Frank Hansen (MHQ)  wrote:
> 
>> Hi Edwin,
>> 
>> Ok that is nice to know. Do you know when this bug will get fixed?
>> 
>> By ordering I mean that MLT score the documents according to its 
>> similarity function (believe it is cosine similarity), and I don’t 
>> know how faceting will affect this score? Or ignore it all together?
>> 
>> Best regards
>> 
>> Martin
>> 
>> 
>> Internal - KMD A/S
>> 
>> -Original Message-
>> From: Zheng Lin Edwin Yeo 
>> Sent: 28. februar 2019 06:19
>> To: solr-user@lucene.apache.org
>> Subject: Re: MLT and facetting
>> 
>> Hi Martin,
>> 
>> According to the JIRA, it says it is a bug, as it was working 
>> previously in Solr 4. I have not tried Solr 4 before, so I'm not sure how it 
>> works.
>> 
>> For the ordering of the documents, do you mean to sort them according 
>> to the criteria that you want?
>> 
>> Regards,
>> Edwin
>> 
>> On Wed, 27 Feb 2019 at 14:43, Martin Frank Hansen (MHQ) 
>> wrote:
>> 
>>> Hi Edwin,
>>> 
>>> Thanks for your response. Are you sure it is a bug? Or is it not 
>>> meant to work together?
>>> After doing some thinking I do see a problem faceting a MLT-result.
>>> MLT-results have a clear ordering of the documents which will be 
>>> hard to maintain with facets. How will faceting MLT-results deal 
>>> with the ordering of the documents? Will the ordering just be ignored?
>>> 
>>> Best regards
>>> 
>>> Martin
>>> 
>>> 
>>> 
>>> Internal - KMD A/S
>>> 
>>> -Original Message-
>>> From: Zheng Lin Edwin Yeo 
>>> Sent: 27. februar 2019 03:38
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: MLT and facetting
>>> 
>>> Hi Martin,
>>> 
>>> I also get the same problem in Solr 7.7 if I turn on faceting in 
>>> /mlt requestHandler.
>>> 
>>> Found this issue in the JIRA:
>>> https://issues.apache.org/jira/browse/SOLR-7883
>>> Seems like it is a bug in Solr and it has not been resolved yet.
>>> 
>>> Regards,
>>> Edwin
>>> 
>>> On Tue, 26 Feb 2019 at 21:03, Martin Frank Hansen (MHQ) 
>>> wrote:
>>> 
 Hi Edwin,
 
 Here it is:
 
 [/mlt requestHandler XML; the tags were stripped by the archive. Surviving values: text, 1, 1, true]
 
 Internal - KMD A/S
 
 -Original Message-
 From: Zheng Lin Edwin Yeo 
 Sent: 26. februar 2019 08:24
 To: solr-user@lucene.apache.org
 Subject: Re: MLT and facetting
 
 Hi Martin,
 
 What is your setting in your /mlt requestHandler in solrconfig.xml?
 
 Regards,
 Edwin
 
 On Tue, 26 Feb 2019 at 14:43, Martin Frank Hansen (MHQ) 
 
 wrote:
 
> Hi Edwin,
> 
> Thanks for your response.
> 
> Yes you are right. It was simply the search parameters from Solr.
> 
> The query looks like this:
> 
> http://.../solr/.../mlt?df=text=Journalnummer=on=id,Journalnummer=id:*6512815*
> 
> best regards,
> 
> Martin
> 
> 
> Internal - KMD A/S
> 
> -Original Message-
> From: Zheng Lin Edwin Yeo 
> Sent: 26. februar 2019 03:54
> To: solr-user@lucene.apache.org
> Subject: Re: MLT and facetting
> 
> Hi Martin,
> 
> I think there are some pictures which are not being sent through 
> in the email.
> 
> Do send your query that you are using, and which version of Solr 
> you are using?
> 
> Regards,
> Edwin
> 
> On Mon, 25 Feb 2019 at 20:54, Martin Frank Hansen (MHQ) 
> 
> wrote:
> 
>> Hi,
>> 
>> 
>> 
>> I am trying to combine the mlt functionality with facets, but 
>> Solr throws
>> org.apache.solr.common.SolrException: ":"Unable to compute facet 
>> ranges, facet context is not set".
>> 
>> 
>> 
>> What I am trying to do is quite simple, find similar documents 
>> using mlt and group these using the facet parameter. When using 
>> mlt and facets separately everything works 

RE: MLT and facetting

2019-03-01 Thread Martin Frank Hansen (MHQ)
Hi Dave, 

The problem is that we have different levels of metadata and documents. 
The documents are arranged such that we have a case for which there are 
multiple documents (files). When we use the mlt function, we do it on 
file-level, but it needs to be displayed at case level, which means that we 
need to group files that are connected to the same case. 

Hope this makes sense. 
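
Since faceting on the /mlt handler currently fails, one workaround could be to group the MLT hits by case on the client side. Below is a minimal sketch, assuming each hit carries a file id and a case identifier (perhaps the Journalnummer field from the query, which is an assumption). The GroupByCase and Hit names are hypothetical and no Solr classes are involved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupByCase {

    // One MLT hit: a file id plus the case it belongs to.
    public static final class Hit {
        public final String fileId;
        public final String caseNumber;

        public Hit(String fileId, String caseNumber) {
            this.fileId = fileId;
            this.caseNumber = caseNumber;
        }
    }

    // Groups hits by case. The LinkedHashMap keeps cases in the order of
    // their best-ranked file, so the MLT ranking carries over to case level.
    public static Map<String, List<String>> group(List<Hit> rankedHits) {
        Map<String, List<String>> byCase = new LinkedHashMap<>();
        for (Hit h : rankedHits) {
            byCase.computeIfAbsent(h.caseNumber, k -> new ArrayList<>()).add(h.fileId);
        }
        return byCase;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("f1", "A")); // most similar file
        hits.add(new Hit("f2", "B"));
        hits.add(new Hit("f3", "A"));
        System.out.println(group(hits)); // {A=[f1, f3], B=[f2]}
    }
}
```

The size of each list gives the same count a facet would, while the map order answers the ordering question raised earlier in the thread.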


Internal - KMD A/S

-Original Message-
From: Dave  
Sent: 1. marts 2019 02:51
To: solr-user@lucene.apache.org
Subject: Re: MLT and facetting

I’m more curious what you’d expect to see, and what possible benefit you could 
get from it

> On Feb 28, 2019, at 8:48 PM, Zheng Lin Edwin Yeo  wrote:
> 
> Hi Martin,
> 
> I have no idea on this, as the case has not been active for almost 2 years.
> Maybe I can try to follow up.
> 
> Faceting by default will show the list according to the number of 
> occurrences. But I'm not sure how it will affect the MLT score or how 
> it will be output when combine together, as it is not working 
> currently and there is no way to test.
> 
> Regards,
> Edwin
> 
>> On Thu, 28 Feb 2019 at 14:51, Martin Frank Hansen (MHQ)  wrote:
>> 
>> Hi Edwin,
>> 
>> Ok that is nice to know. Do you know when this bug will get fixed?
>> 
>> By ordering I mean that MLT score the documents according to its 
>> similarity function (believe it is cosine similarity), and I don’t 
>> know how faceting will affect this score? Or ignore it all together?
>> 
>> Best regards
>> 
>> Martin
>> 
>> 
>> Internal - KMD A/S
>> 
>> -Original Message-
>> From: Zheng Lin Edwin Yeo 
>> Sent: 28. februar 2019 06:19
>> To: solr-user@lucene.apache.org
>> Subject: Re: MLT and facetting
>> 
>> Hi Martin,
>> 
>> According to the JIRA, it says it is a bug, as it was working 
>> previously in Solr 4. I have not tried Solr 4 before, so I'm not sure how it 
>> works.
>> 
>> For the ordering of the documents, do you mean to sort them according 
>> to the criteria that you want?
>> 
>> Regards,
>> Edwin
>> 
>> On Wed, 27 Feb 2019 at 14:43, Martin Frank Hansen (MHQ) 
>> wrote:
>> 
>>> Hi Edwin,
>>> 
>>> Thanks for your response. Are you sure it is a bug? Or is it not 
>>> meant to work together?
>>> After doing some thinking I do see a problem faceting a MLT-result.
>>> MLT-results have a clear ordering of the documents which will be 
>>> hard to maintain with facets. How will faceting MLT-results deal 
>>> with the ordering of the documents? Will the ordering just be ignored?
>>> 
>>> Best regards
>>> 
>>> Martin
>>> 
>>> 
>>> 
>>> Internal - KMD A/S
>>> 
>>> -Original Message-
>>> From: Zheng Lin Edwin Yeo 
>>> Sent: 27. februar 2019 03:38
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: MLT and facetting
>>> 
>>> Hi Martin,
>>> 
>>> I also get the same problem in Solr 7.7 if I turn on faceting in 
>>> /mlt requestHandler.
>>> 
>>> Found this issue in the JIRA:
>>> https://issues.apache.org/jira/browse/SOLR-7883
>>> Seems like it is a bug in Solr and it has not been resolved yet.
>>> 
>>> Regards,
>>> Edwin
>>> 
>>> On Tue, 26 Feb 2019 at 21:03, Martin Frank Hansen (MHQ) 
>>> wrote:
>>> 
 Hi Edwin,
 
 Here it is:
 
 [/mlt requestHandler XML; the tags were stripped by the archive. Surviving values: text, 1, 1, true]
 
 Internal - KMD A/S
 
 -Original Message-
 From: Zheng Lin Edwin Yeo 
 Sent: 26. februar 2019 08:24
 To: solr-user@lucene.apache.org
 Subject: Re: MLT and facetting
 
 Hi Martin,
 
 What is your setting in your /mlt requestHandler in solrconfig.xml?
 
 Regards,
 Edwin
 
 On Tue, 26 Feb 2019 at 14:43, Martin Frank Hansen (MHQ) 
 
 wrote:
 
> Hi Edwin,
> 
> Thanks for your response.
> 
> Yes you are right. It was simply the search parameters from Solr.
> 
> The query looks like this:
> 
> http://.../solr/.../mlt?df=text=Journalnummer=on=id,Journalnummer=id:*6512815*
> 
> best regards,
> 
> Martin
> 
> 
> Internal - KMD A/S
> 
> -Original Message-
> From: Zheng Lin Edwin Yeo 
> Sent: 26. februar 2019 03:54
> To: solr-user@lucene.apache.org
> Subject: Re: MLT and facetting
> 
> Hi Martin,
> 
> I think there are some pictures which are not being sent through 
> in the email.
> 
> Do send your query that you are using, and which version of Solr 
> you are using?
> 
> Regards,
> Edwin
> 
> On Mon, 25 Feb 2019 at 20:54, Martin Frank Hansen (MHQ) 
> 
> wrote:
> 
>> Hi,
>> 
>> 
>> 
>> I am trying to combine the mlt functionality with facets, but 
>> Solr throws
>> org.apache.solr.common.SolrException: ":"Unable to compute facet 
>> ranges, facet context is not set".
>> 
>> 
>> 
>> What I am trying to do is quite simple, find similar documents 

RE: MLT and facetting

2019-03-01 Thread Martin Frank Hansen (MHQ)
Hi Edwin, 

Thanks for your time, much appreciated. 

Best regards 
Martin


Internal - KMD A/S

-Original Message-
From: Zheng Lin Edwin Yeo  
Sent: 1. marts 2019 02:48
To: solr-user@lucene.apache.org
Subject: Re: MLT and facetting

Hi Martin,

I have no idea on this, as the case has not been active for almost 2 years.
Maybe I can try to follow up.

Faceting by default will show the list according to the number of occurrences. 
But I'm not sure how it will affect the MLT score or how it will be output when 
combined together, as it is not working currently and there is no way to test.

Regards,
Edwin

On Thu, 28 Feb 2019 at 14:51, Martin Frank Hansen (MHQ)  wrote:

> Hi Edwin,
>
> Ok that is nice to know. Do you know when this bug will get fixed?
>
> By ordering I mean that MLT score the documents according to its 
> similarity function (believe it is cosine similarity), and I don’t 
> know how faceting will affect this score? Or ignore it all together?
>
> Best regards
>
> Martin
>
>
> Internal - KMD A/S
>
> -Original Message-
> From: Zheng Lin Edwin Yeo 
> Sent: 28. februar 2019 06:19
> To: solr-user@lucene.apache.org
> Subject: Re: MLT and facetting
>
> Hi Martin,
>
> According to the JIRA, it says it is a bug, as it was working 
> previously in Solr 4. I have not tried Solr 4 before, so I'm not sure how it 
> works.
>
> For the ordering of the documents, do you mean to sort them according 
> to the criteria that you want?
>
> Regards,
> Edwin
>
> On Wed, 27 Feb 2019 at 14:43, Martin Frank Hansen (MHQ) 
> wrote:
>
> > Hi Edwin,
> >
> > Thanks for your response. Are you sure it is a bug? Or is it not 
> > meant to work together?
> > After doing some thinking I do see a problem faceting a MLT-result.
> > MLT-results have a clear ordering of the documents which will be 
> > hard to maintain with facets. How will faceting MLT-results deal 
> > with the ordering of the documents? Will the ordering just be ignored?
> >
> > Best regards
> >
> > Martin
> >
> >
> >
> > Internal - KMD A/S
> >
> > -Original Message-
> > From: Zheng Lin Edwin Yeo 
> > Sent: 27. februar 2019 03:38
> > To: solr-user@lucene.apache.org
> > Subject: Re: MLT and facetting
> >
> > Hi Martin,
> >
> > I also get the same problem in Solr 7.7 if I turn on faceting in 
> > /mlt requestHandler.
> >
> > Found this issue in the JIRA:
> > https://issues.apache.org/jira/browse/SOLR-7883
> > Seems like it is a bug in Solr and it has not been resolved yet.
> >
> > Regards,
> > Edwin
> >
> > On Tue, 26 Feb 2019 at 21:03, Martin Frank Hansen (MHQ) 
> > wrote:
> >
> > > Hi Edwin,
> > >
> > > Here it is:
> > >
> > > [/mlt requestHandler XML; the tags were stripped by the archive. Surviving values: text, 1, 1, true]
> > >
> > > Internal - KMD A/S
> > >
> > > -Original Message-
> > > From: Zheng Lin Edwin Yeo 
> > > Sent: 26. februar 2019 08:24
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: MLT and facetting
> > >
> > > Hi Martin,
> > >
> > > What is your setting in your /mlt requestHandler in solrconfig.xml?
> > >
> > > Regards,
> > > Edwin
> > >
> > > On Tue, 26 Feb 2019 at 14:43, Martin Frank Hansen (MHQ) 
> > > 
> > > wrote:
> > >
> > > > Hi Edwin,
> > > >
> > > > Thanks for your response.
> > > >
> > > > Yes you are right. It was simply the search parameters from Solr.
> > > >
> > > > The query looks like this:
> > > >
> > > > http://.../solr/.../mlt?df=text=Journalnummer=on=id,Journalnummer=id:*6512815*
> > > >
> > > > best regards,
> > > >
> > > > Martin
> > > >
> > > >
> > > > Internal - KMD A/S
> > > >
> > > > -Original Message-
> > > > From: Zheng Lin Edwin Yeo 
> > > > Sent: 26. februar 2019 03:54
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Re: MLT and facetting
> > > >
> > > > Hi Martin,
> > > >
> > > > I think there are some pictures which are not being sent through 
> > > > in the email.
> > > >
> > > > Do send your query that you are using, and which version of Solr 
> > > > you are using?
> > > >
> > > > Regards,
> > > > Edwin
> > > >
> > > > On Mon, 25 Feb 2019 at 20:54, Martin Frank Hansen (MHQ) 
> > > > 
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > >
> > > > >
> > > > > I am trying to combine the mlt functionality with facets, but 
> > > > > Solr throws
> > > > > org.apache.solr.common.SolrException: ":"Unable to compute 
> > > > > facet ranges, facet context is not set".
> > > > >
> > > > >
> > > > >
> > > > > What I am trying to do is quite simple, find similar documents 
> > > > > using mlt and group these using the facet parameter. When 
> > > > > using mlt and facets separately everything works fine, but not 
> > > > > when combining the
> > > > functionality.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > {
> > > > >
> > > > >   "responseHeader":{
> > > > >
> > > > > "status":500,
> > > > >
> > > > > "QTime":109},
> > > > >
> > > > >   

Re: Custom search in own SearchComponent

2019-03-01 Thread Mikhail Khludnev
Just a guess:
QueryParser is probably the Lucene class, which isn't aware of the Solr schema
and hence might not properly convert the term in "product_type:106" to a
number or point. Check that particular query with a Solr fq first, then use
QParser for parsing. See QueryComponent for a sample.
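
To illustrate the guess: a schema-unaware parser turns "106" into a text term, while a numeric field stores the value in a binary form. Below is a minimal sketch using only the JDK, assuming product_type is a point-based int field (nothing in the thread confirms this). The sortableInt method re-implements the sign-bit-flip, big-endian layout Lucene uses for int points and is not a call into Lucene itself. The TermVsPoint name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TermVsPoint {

    // What a plain text term looks like: the UTF-8 bytes of the string.
    public static byte[] textTerm(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Sortable 4-byte form of an int: flip the sign bit, then write big-endian.
    public static byte[] sortableInt(int value) {
        return ByteBuffer.allocate(4).putInt(value ^ 0x80000000).array();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(textTerm("106")));  // [49, 48, 54]
        System.out.println(Arrays.toString(sortableInt(106))); // [-128, 0, 0, 106]
    }
}
```

The two byte sequences have nothing in common, which would explain why a term query built without the schema finds no documents.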

On Fri, Mar 1, 2019 at 12:59 PM Moritz Schmidt 
wrote:

> Hi there,
>
> I’m working on a custom SearchComponent to add some docs to the response
> with another filter query than in the original request.
> Here is what I currently have:
>
> public void process(ResponseBuilder rb) throws IOException {
> if(rb.getResults() != null) {
>   QueryParser queryParser = new QueryParser("name", new
> StandardAnalyzer());
>   Query query = null;
>   try {
> query = queryParser.parse("text_tg:" + rb.getQueryString());
>   } catch (ParseException e) {
>   e.printStackTrace();
>   }
>
>   TopDocs additionalDocs = rb.req.getSearcher().search(query, 5);
>   ScoreDoc[] scoreDocs = additionalDocs.scoreDocs;
>   System.out.println(additionalDocs.scoreDocs.length);
>   ArrayList specialDocs = new ArrayList();
>
>   for(int i = 0; i < scoreDocs.length; i++) {
> Document doc = rb.req.getSearcher().doc(scoreDocs[i].doc);
> specialDocs.add(doc);
>   }
>
>   rb.rsp.add("special_responses", specialDocs);
> }
>   }
>
> This works fine but doesn’t filter the results.
> As soon as I add something like ‘ AND product_type:106’ in the query, use
> getDocList(Query query, Query filter, Sort lsort, int offset, int len) or a
> BooleanQuery a la:
> Query query = null;
> Query filterQuery = null;
> try {
>   filterQuery = queryParser.parse("product_type:106");
>   query = queryParser.parse("text_tg:" + rb.getQueryString());
> } catch (ParseException e) {
> e.printStackTrace();
> }
>
> BooleanClause queryClause = new BooleanClause(query,
> BooleanClause.Occur.SHOULD);
> BooleanClause filterClause = new BooleanClause(filterQuery,
> BooleanClause.Occur.FILTER);
>
> BooleanQuery.Builder builder = new BooleanQuery.Builder();
> builder.add(queryClause);
> builder.add(filterClause);
> BooleanQuery bQuery = builder.build();
>
> TopDocs additionalDocs = rb.req.getSearcher().search(bQuery, 5);
>
> I get 0 results.
>
> Does anyone know what I am doing wrong?
>
> Thanks in advance and best regards,
>
> Moritz Schmidt
>
> Spektrum Kompakt - Themen auf den Punkt gebracht.
> www.spektrum.de/kompakt 
> 
>
> Spektrum der Wissenschaft Verlagsgesellschaft mbH
> Sitz Heidelberg
> Registergericht Mannheim, HRB 338114
> Geschaeftsfuehrer: Markus Bossle
> 
>
> Spektrum der Wissenschaft was founded in 1978 as the German edition of
> Scientific American.
> It publishes several popular science magazines in print and digital and
> operates the biggest German language website on science news.
> Spektrum der Wissenschaft is part of SpringerNature.
>


-- 
Sincerely yours
Mikhail Khludnev


Custom search in own SearchComponent

2019-03-01 Thread Moritz Schmidt
Hi there,

I’m working on a custom SearchComponent to add some docs to the response with 
another filter query than in the original request.
Here is what I currently have:

public void process(ResponseBuilder rb) throws IOException {
  if (rb.getResults() != null) {
    QueryParser queryParser = new QueryParser("name", new StandardAnalyzer());
    Query query = null;
    try {
      query = queryParser.parse("text_tg:" + rb.getQueryString());
    } catch (ParseException e) {
      e.printStackTrace();
    }

    // Fetch up to five additional documents for the parsed query.
    TopDocs additionalDocs = rb.req.getSearcher().search(query, 5);
    ScoreDoc[] scoreDocs = additionalDocs.scoreDocs;
    System.out.println(additionalDocs.scoreDocs.length);
    ArrayList specialDocs = new ArrayList();

    // Load the stored fields of each hit.
    for (int i = 0; i < scoreDocs.length; i++) {
      Document doc = rb.req.getSearcher().doc(scoreDocs[i].doc);
      specialDocs.add(doc);
    }

    rb.rsp.add("special_responses", specialDocs);
  }
}

This works fine but doesn’t filter the results.
As soon as I add something like ‘ AND product_type:106’ in the query, use 
getDocList(Query query, Query filter, Sort lsort, int offset, int len) or a 
BooleanQuery a la:
Query query = null;
Query filterQuery = null;
try {
  filterQuery = queryParser.parse("product_type:106");
  query = queryParser.parse("text_tg:" + rb.getQueryString());
} catch (ParseException e) {
  e.printStackTrace();
}

BooleanClause queryClause = new BooleanClause(query, BooleanClause.Occur.SHOULD);
BooleanClause filterClause = new BooleanClause(filterQuery, BooleanClause.Occur.FILTER);

BooleanQuery.Builder builder = new BooleanQuery.Builder();
builder.add(queryClause);
builder.add(filterClause);
BooleanQuery bQuery = builder.build();

TopDocs additionalDocs = rb.req.getSearcher().search(bQuery, 5);

I get 0 results.

Does anyone know what I am doing wrong?

Thanks in advance and best regards,

Moritz Schmidt

Spektrum Kompakt - Themen auf den Punkt gebracht.
www.spektrum.de/kompakt 


Spektrum der Wissenschaft Verlagsgesellschaft mbH
Sitz Heidelberg
Registergericht Mannheim, HRB 338114
Geschaeftsfuehrer: Markus Bossle


Spektrum der Wissenschaft was founded in 1978 as the German edition of 
Scientific American.
It publishes several popular science magazines in print and digital and 
operates the biggest German language website on science news.
Spektrum der Wissenschaft is part of SpringerNature.


Antwort: Re: Re: High CPU usage with Solr 7.7.0

2019-03-01 Thread Lukas Weiss
This is the information of the Thread Dump screen of the Solr web 
interface:

process reaper (8195)
java.util.concurrent.SynchronousQueue$TransferStack@23ec2c53

sun.misc.Unsafe.park​(Native Method)
java.util.concurrent.locks.LockSupport.parkNanos​(LockSupport.java:215)
java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill​(SynchronousQueue.java:460)
java.util.concurrent.SynchronousQueue$TransferStack.transfer​(SynchronousQueue.java:362)
java.util.concurrent.SynchronousQueue.poll​(SynchronousQueue.java:941)
java.util.concurrent.ThreadPoolExecutor.getTask​(ThreadPoolExecutor.java:1073)
java.util.concurrent.ThreadPoolExecutor.runWorker​(ThreadPoolExecutor.java:1134)
java.util.concurrent.ThreadPoolExecutor$Worker.run​(ThreadPoolExecutor.java:624)
java.lang.Thread.run​(Thread.java:748)
0.8959ms
0.ms

commitScheduler-14-thread-35 (8174)
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.poll​(ScheduledThreadPoolExecutor.java:809)
java.util.concurrent.ThreadPoolExecutor.getTask​(ThreadPoolExecutor.java:1073)
java.util.concurrent.ThreadPoolExecutor.runWorker​(ThreadPoolExecutor.java:1134)
java.util.concurrent.ThreadPoolExecutor$Worker.run​(ThreadPoolExecutor.java:624)
java.lang.Thread.run​(Thread.java:748)
644010.9423ms
643930.ms

commitScheduler-16-thread-62 (8173)
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.poll​(ScheduledThreadPoolExecutor.java:809)
java.util.concurrent.ThreadPoolExecutor.getTask​(ThreadPoolExecutor.java:1073)
java.util.concurrent.ThreadPoolExecutor.runWorker​(ThreadPoolExecutor.java:1134)
java.util.concurrent.ThreadPoolExecutor$Worker.run​(ThreadPoolExecutor.java:624)
java.lang.Thread.run​(Thread.java:748)
644831.4905ms
644740.ms

qtp1282287470-8051 (8051)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2d2bd65e

sun.misc.Unsafe.park​(Native Method)
java.util.concurrent.locks.LockSupport.parkNanos​(LockSupport.java:215)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos​(AbstractQueuedSynchronizer.java:2078)
org.eclipse.jetty.util.BlockingArrayQueue.poll​(BlockingArrayQueue.java:392)
org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll​(QueuedThreadPool.java:656)
org.eclipse.jetty.util.thread.QueuedThreadPool.access$800​(QueuedThreadPool.java:46)
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run​(QueuedThreadPool.java:720)
java.lang.Thread.run​(Thread.java:748)
14.5521ms
10.ms

qtp1282287470-8050 (8050)
sun.nio.ch.EPollArrayWrapper.epollWait​(Native Method)
sun.nio.ch.EPollArrayWrapper.poll​(EPollArrayWrapper.java:269)
sun.nio.ch.EPollSelectorImpl.doSelect​(EPollSelectorImpl.java:93)
sun.nio.ch.SelectorImpl.lockAndDoSelect​(SelectorImpl.java:86)
sun.nio.ch.SelectorImpl.select​(SelectorImpl.java:97)
sun.nio.ch.SelectorImpl.select​(SelectorImpl.java:101)
org.eclipse.jetty.io.ManagedSelector$SelectorProducer.select​(ManagedSelector.java:423)
org.eclipse.jetty.io.ManagedSelector$SelectorProducer.produce​(ManagedSelector.java:360)
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produceTask​(EatWhatYouKill.java:357)
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce​(EatWhatYouKill.java:181)
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce​(EatWhatYouKill.java:168)
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run​(EatWhatYouKill.java:126)
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run​(ReservedThreadExecutor.java:366)
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob​(QueuedThreadPool.java:765)
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run​(QueuedThreadPool.java:683)
java.lang.Thread.run​(Thread.java:748)
10.8397ms
10.ms

qtp1282287470-8049 (8049)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@4a97a2f0

sun.misc.Unsafe.park​(Native Method)
java.util.concurrent.locks.LockSupport.parkNanos​(LockSupport.java:215)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await​(AbstractQueuedSynchronizer.java:2163)
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.reservedWait​(ReservedThreadExecutor.java:292)
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run​(ReservedThreadExecutor.java:357)
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob​(QueuedThreadPool.java:765)
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run​(QueuedThreadPool.java:683)
java.lang.Thread.run​(Thread.java:748)
2.4610ms
0.ms

qtp1282287470-8047 (8047)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2d2bd65e

sun.misc.Unsafe.park​(Native Method)
java.util.concurrent.locks.LockSupport.parkNanos​(LockSupport.java:215)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos​(AbstractQueuedSynchronizer.java:2078)
org.eclipse.jetty.util.BlockingArrayQueue.poll​(BlockingArrayQueue.java:392)
org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll​(QueuedThreadPool.java:656)
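
In the dump above, the two commitScheduler threads show roughly 644 seconds of CPU time each, far more than any other thread listed. The same per-thread figures can be collected programmatically with the JDK's ThreadMXBean. Below is a minimal sketch, assuming it runs inside the JVM of interest (for Solr it would have to be adapted to a remote JMX connection). The BusyThreads and Usage names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class BusyThreads {

    // Name, id and accumulated CPU time of one live thread.
    public static final class Usage {
        public final String name;
        public final long id;
        public final long cpuNanos;

        public Usage(String name, long id, long cpuNanos) {
            this.name = name;
            this.id = id;
            this.cpuNanos = cpuNanos;
        }
    }

    // Returns live threads sorted by accumulated CPU time, highest first.
    // The list is empty if the JVM does not support or enable CPU timing.
    public static List<Usage> byCpuTime() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<Usage> out = new ArrayList<>();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return out;
        }
        for (long id : mx.getAllThreadIds()) {
            ThreadInfo info = mx.getThreadInfo(id);
            long cpu = mx.getThreadCpuTime(id); // -1 if the thread is gone
            if (info != null && cpu >= 0) {
                out.add(new Usage(info.getThreadName(), id, cpu));
            }
        }
        out.sort(Comparator.comparingLong((Usage u) -> u.cpuNanos).reversed());
        return out;
    }

    public static void main(String[] args) {
        for (Usage u : byCpuTime()) {
            System.out.printf("%-40s (%d) %.4fms%n", u.name, u.id, u.cpuNanos / 1_000_000.0);
        }
    }
}
```

Sampling this list twice a few seconds apart and comparing the deltas shows which threads are burning CPU right now, as opposed to since startup.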

Errors during solrcloud replication (7.7.x)

2019-03-01 Thread Karl Stoney
Hey all,
I’m looking for some support with replication errors we’re seeing in SolrCloud 
7.7.x (tried both .0 and .1).

I’ve created a StackOverflow issue:

We have errors in SolrCloud (7.7.1) during replication, which we can't 
understand.  We thought it may be 
https://issues.apache.org/jira/browse/SOLR-13255 or 
https://issues.apache.org/jira/browse/SOLR-13249 which is why we upgraded to 
7.7.1 but it’s still there.

On our currently elected leader, we see:
```
request: http://solr-1.search-solr.preprod.k8.atcloud.io:80/solr/at-uk_shard1_replica_n2/update?update.distrib=FROMLEADER=http%3A%2F%2Fsolr-2.search-solr.preprod.k8.atcloud.io%3A80%2Fsolr%2Fat-uk_shard1_replica_n1%2F=javabin=2
Remote error message: org.apache.solr.common.util.ByteArrayUtf8CharSequence cannot be cast to java.lang.String
at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:385) ~[solr-solrj-7.7.1.jar:7.7.1 5bf96d32f88eb8a2f5e775339885cd6ba84a3b58 - ishan - 2019-02-23 02:39:09]
at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:183) ~[solr-solrj-7.7.1.jar:7.7.1 5bf96d32f88eb8a2f5e775339885cd6ba84a3b58 - ishan - 2019-02-23 02:39:09]
at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176) ~[metrics-core-3.2.6.jar:3.2.6]
at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209) ~[solr-solrj-7.7.1.jar:7.7.1 5bf96d32f88eb8a2f5e775339885cd6ba84a3b58 - ishan - 2019-02-23 02:39:09]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_191]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]
```
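
The remote error message describes a generic Java failure: a value that implements CharSequence but is not a String gets cast to String. Below is a minimal sketch of that failure mode, using a hypothetical Utf8Chars class as a stand-in for ByteArrayUtf8CharSequence. It is not Solr's code and says nothing about where a fix belongs.

```java
import java.nio.charset.StandardCharsets;

public class CharSequenceCast {

    // Hypothetical stand-in for a CharSequence backed by UTF-8 bytes.
    public static final class Utf8Chars implements CharSequence {
        private final String decoded;

        public Utf8Chars(byte[] utf8) {
            this.decoded = new String(utf8, StandardCharsets.UTF_8);
        }

        @Override public int length() { return decoded.length(); }
        @Override public char charAt(int index) { return decoded.charAt(index); }
        @Override public CharSequence subSequence(int start, int end) {
            return decoded.subSequence(start, end);
        }
        @Override public String toString() { return decoded; }
    }

    // Throws ClassCastException for any CharSequence that is not a String.
    public static String unsafe(Object value) {
        return (String) value;
    }

    // Works for String and for any other CharSequence implementation.
    public static String safe(Object value) {
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        Object value = new Utf8Chars("PETROL".getBytes(StandardCharsets.UTF_8));
        System.out.println(safe(value)); // PETROL
        try {
            unsafe(value);
        } catch (ClassCastException e) {
            System.out.println("cast failed: " + e.getMessage());
        }
    }
}
```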

And if you go and look in the logs for the replica, you see:
```
08:35:22.060 [qtp1540374340-20] ERROR org.apache.solr.servlet.HttpSolrCall - null:java.lang.ClassCastException: org.apache.solr.common.util.ByteArrayUtf8CharSequence cannot be cast to java.lang.String
at org.apache.solr.common.util.JavaBinCodec.readEnumFieldValue(JavaBinCodec.java:813)
at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:339)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:278)
at org.apache.solr.common.util.JavaBinCodec.readSolrInputDocument(JavaBinCodec.java:640)
at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:337)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:278)
at org.apache.solr.common.util.JavaBinCodec.readMapEntry(JavaBinCodec.java:819)
at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:341)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:278)
at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:295)
at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readIterator(JavaBinUpdateRequestCodec.java:280)
at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:333)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:278)
at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readNamedList(JavaBinUpdateRequestCodec.java:235)
at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:298)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:278)
at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:191)
at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:126)
at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:123)
at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:70)
at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2551)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:710)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:516)
at