Re: Oracle OpenJDK to Amazon Corretto OpenJDK

2020-01-31 Thread Arnold Bronley
Thanks, Jan.

The issue is created here -
https://github.com/docker-solr/docker-solr/issues/289


On Fri, Jan 31, 2020 at 7:13 PM Jan Høydahl  wrote:

> I poked around a bit and ended up reading this thread
> https://github.com/docker-library/openjdk/issues/320 <
> https://github.com/docker-library/openjdk/issues/320>
> Appears that the ‘openjdk’ official docker image is NOT the AdoptOpenJDK
> distro, but a Red Hat-built binary of the latest official OpenJDK
> project source code from upstream.
>
> The reason that the URL referenced in openjdk Dockerfile <
> https://github.com/docker-library/openjdk/blob/1b6e2ef66a086f47315f5d05ecf7de3dae7413f2/11/jdk/Dockerfile#L36>
> seems to have AdoptOpenJDK in it is simply because that project has agreed
> to host upstream binaries as well as their own, see
> https://adoptopenjdk.net/upstream.html <
> https://adoptopenjdk.net/upstream.html>
>
> So to summarize:
>
> 1) Solr’s official image uses openjdk docker images
> 2) openjdk docker image uses a RedHat-built «upstream» build of Java
> 3) This means that we are NOT using AdoptOpenJDK, but «Oracle» openjdk,
> although not built and released by Oracle
>
> This is somewhat messy.
>
> In my opinion we should switch docker-solr to using
> adoptopenjdk:11-jre-hotspot as base image.
> Reason I don’t feel like going the Corretto path is that Corretto uses
> Amazon Linux, which is RPM-based, and they support fewer
> versions/architectures than AdoptOpenJDK.
>
> Please open an issue in the docker-solr project and we’ll continue
> discussion there.
>
> Jan
>
> > 31. jan. 2020 kl. 23:29 skrev Chris Hostetter :
> >
> >
> > : Link to the issue was helpful.
> > :
> > : Although, when I take a look at the Dockerfile for any Solr version from
> here
> > : https://github.com/docker-solr/docker-solr, the very first line says
> > : FROM openjdk... It
> > : does not say FROM adoptopenjdk. Am I missing something?
> >
> > Ahhh ... I have no idea, but at least now I better understand your
> > concern.
> >
> > I would suggest opening an issue / PR in the github:docker-solr repo ...
> > there are plans to eventually officially move management of docker-solr
> > into the Apache Lucene/Solr project, but for now it's an independent
> > packaging effort...
> >
> > https://github.com/docker-solr/docker-solr/
> > https://github.com/docker-solr/docker-solr/issues/276
> >
> > ...in the meantime: If you can't use openjdk, then as far as I
> understand
> > how docker images work, you'd need to build your own using a patched
> > Dockerfile.
> >
> >
> >
> > -Hoss
> > http://www.lucidworks.com/
>
>


Re: Oracle OpenJDK to Amazon Corretto OpenJDK

2020-01-31 Thread Arnold Bronley
Chris,

Link to the issue was helpful.

Although, when I take a look at Dockerfile for any Solr version from here
https://github.com/docker-solr/docker-solr, the very first line says
FROM openjdk...It
does not say FROM adoptopenjdk. Am I missing something?

On Fri, Jan 31, 2020 at 1:58 PM Chris Hostetter 
wrote:

>
> Just upgrade?
>
> This has been fixed in most recent versions of AdoptOpenJDK builds...
> https://github.com/AdoptOpenJDK/openjdk-build/issues/465
>
> hossman@slate:~$ java8
> hossman@slate:~$ java -XshowSettings:properties -version 2>&1 | grep -e
> vendor -e version
> java.class.version = 52.0
> java.runtime.version = 1.8.0_222-b10
> java.specification.vendor = Oracle Corporation
> java.specification.version = 1.8
> java.vendor = AdoptOpenJDK
> java.vendor.url = http://java.oracle.com/
> java.vendor.url.bug = http://bugreport.sun.com/bugreport/
> java.version = 1.8.0_222
> java.vm.specification.vendor = Oracle Corporation
> java.vm.specification.version = 1.8
> java.vm.vendor = AdoptOpenJDK
> java.vm.version = 25.222-b10
> os.version = 5.0.0-32-generic
> openjdk version "1.8.0_222"
>
>
> hossman@slate:~$ java11
> hossman@slate:~$ java -XshowSettings:properties -version 2>&1 | grep -e
> vendor -e version
> java.class.version = 55.0
> java.runtime.version = 11.0.4+11
> java.specification.vendor = Oracle Corporation
> java.specification.version = 11
> java.vendor = AdoptOpenJDK
> java.vendor.url = https://adoptopenjdk.net/
> java.vendor.url.bug =
> https://github.com/AdoptOpenJDK/openjdk-build/issues
> java.vendor.version = AdoptOpenJDK
> java.version = 11.0.4
> java.version.date = 2019-07-16
> java.vm.specification.vendor = Oracle Corporation
> java.vm.specification.version = 11
> java.vm.vendor = AdoptOpenJDK
> java.vm.version = 11.0.4+11
> os.version = 5.0.0-32-generic
> openjdk version "11.0.4" 2019-07-16
>
>
>
>
> : Date: Fri, 31 Jan 2020 12:45:36 -0500
> : From: Arnold Bronley 
> : Reply-To: solr-user@lucene.apache.org
> : To: solr-user@lucene.apache.org
> : Subject: Re: Oracle OpenJDK to Amazon Corretto OpenJDK
> :
> : Thanks for the helpful information. It is a no-go because even though it
> is
> : OpenJDK and free, the vendor is Oracle and the legal dept. at our company is
> trying
> : to get away from anything Oracle.
> : It is a little paranoid reaction, I agree.
> :
> : See the java.vendor property in following output.
> :
> : $ java -XshowSettings:properties -version
> : Property settings:
> : awt.toolkit = sun.awt.X11.XToolkit
> : file.encoding = UTF-8
> : file.encoding.pkg = sun.io
> : file.separator = /
> : java.awt.graphicsenv = sun.awt.X11GraphicsEnvironment
> : java.awt.printerjob = sun.print.PSPrinterJob
> : java.class.path = .
> : java.class.version = 52.0
> : java.endorsed.dirs =
> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/endorsed
> : java.ext.dirs = /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext
> : /usr/java/packages/lib/ext
> : java.home = /usr/lib/jvm/java-8-openjdk-amd64/jre
> : java.io.tmpdir = /tmp
> : java.library.path = /usr/java/packages/lib/amd64
> : /usr/lib/x86_64-linux-gnu/jni
> : /lib/x86_64-linux-gnu
> : /usr/lib/x86_64-linux-gnu
> : /usr/lib/jni
> : /lib
> : /usr/lib
> : java.runtime.name = OpenJDK Runtime Environment
> : java.runtime.version = 1.8.0_181-8u181-b13-1~deb9u1-b13
> : java.specification.name = Java Platform API Specification
> : java.specification.vendor = Oracle Corporation
> : java.specification.version = 1.8
> : java.vendor = Oracle Corporation
> : java.vendor.url = http://java.oracle.com/
> : java.vendor.url.bug = http://bugreport.sun.com/bugreport/
> : java.version = 1.8.0_181
> : java.vm.info = mixed mode
> : java.vm.name = OpenJDK 64-Bit Server VM
> : java.vm.specification.name = Java Virtual Machine Specification
> : java.vm.specification.vendor = Oracle Corporation
> : java.vm.specification.version = 1.8
> : java.vm.vendor = Oracle Corporation
> : java.vm.version = 25.181-b13
> : line.separator = \n
> : os.arch = amd64
> : os.name = Linux
> : os.version = 4.9.0-8-amd64
> : path.separator = :
> : sun.arch.data.model = 64
> : sun.boot.class.path =
> : /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/resources.jar
> : /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar
> : /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/sunrsasign.jar
> : /usr/

Re: Oracle OpenJDK to Amazon Corretto OpenJDK

2020-01-31 Thread Arnold Bronley
Specification
> java.vm.specification.vendor = Oracle Corporation
> java.vm.specification.version = 11
> java.vm.vendor = Amazon.com Inc.
> java.vm.version = 11.0.6+10-LTS
> jdk.debug = release
> line.separator = \n
> os.arch = amd64
> os.name = Linux
> os.version = 4.19.76-linuxkit
> path.separator = :
> sun.arch.data.model = 64
> sun.boot.library.path = /usr/lib/jvm/java-11-amazon-corretto/lib
> sun.cpu.endian = little
> sun.cpu.isalist =
> sun.io.unicode.encoding = UnicodeLittle
> sun.java.launcher = SUN_STANDARD
> sun.jnu.encoding = ANSI_X3.4-1968
> sun.management.compiler = HotSpot 64-Bit Tiered Compilers
> sun.os.patch.level = unknown
> user.country = US
> user.dir = /
> user.home = /root
> user.language = en
> user.name = root
> user.timezone =
>
> openjdk version "11.0.6" 2020-01-14 LTS
> OpenJDK Runtime Environment Corretto-11.0.6.10.1 (build 11.0.6+10-LTS)
> OpenJDK 64-Bit Server VM Corretto-11.0.6.10.1 (build 11.0.6+10-LTS, mixed
> mode)
>
>
> Kevin Risden
>
>
> On Fri, Jan 31, 2020 at 1:25 PM Kevin Risden  wrote:
>
> > Whoops, forgot to share the same output from latest. The docker images are
> > clearly building from AdoptOpenJDK, so the specification vendor is
> > potentially misleading?
> >
> > ➜  ~ docker pull solr
> > Using default tag: latest
> > latest: Pulling from library/solr
> > Digest:
> > sha256:ef1f2241c1aa51746aa3ad05570123eef128d98e91bc07336c37f2a1b37df7a9
> > Status: Image is up to date for solr:latest
> > docker.io/library/solr:latest
> > ➜  ~ docker run --rm -it solr bash -c "java -XshowSettings:properties
> > -version"
> > Property settings:
> > awt.toolkit = sun.awt.X11.XToolkit
> > file.encoding = UTF-8
> > file.separator = /
> > java.awt.graphicsenv = sun.awt.X11GraphicsEnvironment
> > java.awt.printerjob = sun.print.PSPrinterJob
> > java.class.path =
> > java.class.version = 55.0
> > java.home = /usr/local/openjdk-11
> > java.io.tmpdir = /tmp
> > java.library.path = /usr/java/packages/lib
> > /usr/lib64
> > /lib64
> > /lib
> > /usr/lib
> > java.runtime.name = OpenJDK Runtime Environment
> > java.runtime.version = 11.0.6+10
> > java.specification.name = Java Platform API Specification
> > java.specification.vendor = Oracle Corporation
> > java.specification.version = 11
> > java.vendor = Oracle Corporation
> > java.vendor.url = http://java.oracle.com/
> > java.vendor.url.bug = http://bugreport.java.com/bugreport/
> > java.vendor.version = 18.9
> > java.version = 11.0.6
> > java.version.date = 2020-01-14
> > java.vm.compressedOopsMode = 32-bit
> > java.vm.info = mixed mode
> > java.vm.name = OpenJDK 64-Bit Server VM
> > java.vm.specification.name = Java Virtual Machine Specification
> > java.vm.specification.vendor = Oracle Corporation
> > java.vm.specification.version = 11
> > java.vm.vendor = Oracle Corporation
> > java.vm.version = 11.0.6+10
> > jdk.debug = release
> > line.separator = \n
> > os.arch = amd64
> > os.name = Linux
> > os.version = 4.19.76-linuxkit
> > path.separator = :
> > sun.arch.data.model = 64
> > sun.boot.library.path = /usr/local/openjdk-11/lib
> > sun.cpu.endian = little
> > sun.cpu.isalist =
> > sun.io.unicode.encoding = UnicodeLittle
> > sun.java.launcher = SUN_STANDARD
> > sun.jnu.encoding = UTF-8
> > sun.management.compiler = HotSpot 64-Bit Tiered Compilers
> > sun.os.patch.level = unknown
> > user.dir = /opt/solr-8.4.1
> > user.home = /home/solr
> > user.language = en
> > user.name = solr
> > user.timezone =
> >
> > openjdk version "11.0.6" 2020-01-14
> > OpenJDK Runtime Environment 18.9 (build 11.0.6+10)
> > OpenJDK 64-Bit Server VM 18.9 (build 11.0.6+10, mixed mode)
> >
> > Kevin Risden
> >
> >
> > On Fri, Jan 31, 2020 at 1:22 PM Kevin Risden  wrote:
> >
> >> What specific Solr tag are you using? That looks like JDK 1.8 and an
> >> older version.
> >>
> >> Just picking the current latest as an example:
> >>
> >>
> >>
> https://github.com/docker-solr/docker-solr/blob/394ead2fa128d90afb072284bce5f1715345c53c/8.4/Dockerfile
> >>
> >> which uses op

Re: Oracle OpenJDK to Amazon Corretto OpenJDK

2020-01-31 Thread Arnold Bronley
Thanks for the helpful information. It is a no-go because even though it is
OpenJDK and free, the vendor is Oracle and the legal dept. at our company is
trying to get away from anything Oracle.
It is a little paranoid reaction, I agree.

See the java.vendor property in the following output.

$ java -XshowSettings:properties -version
Property settings:
awt.toolkit = sun.awt.X11.XToolkit
file.encoding = UTF-8
file.encoding.pkg = sun.io
file.separator = /
java.awt.graphicsenv = sun.awt.X11GraphicsEnvironment
java.awt.printerjob = sun.print.PSPrinterJob
java.class.path = .
java.class.version = 52.0
java.endorsed.dirs = /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/endorsed
java.ext.dirs = /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext
/usr/java/packages/lib/ext
java.home = /usr/lib/jvm/java-8-openjdk-amd64/jre
java.io.tmpdir = /tmp
java.library.path = /usr/java/packages/lib/amd64
/usr/lib/x86_64-linux-gnu/jni
/lib/x86_64-linux-gnu
/usr/lib/x86_64-linux-gnu
/usr/lib/jni
/lib
/usr/lib
java.runtime.name = OpenJDK Runtime Environment
java.runtime.version = 1.8.0_181-8u181-b13-1~deb9u1-b13
java.specification.name = Java Platform API Specification
java.specification.vendor = Oracle Corporation
java.specification.version = 1.8
java.vendor = Oracle Corporation
java.vendor.url = http://java.oracle.com/
java.vendor.url.bug = http://bugreport.sun.com/bugreport/
java.version = 1.8.0_181
java.vm.info = mixed mode
java.vm.name = OpenJDK 64-Bit Server VM
java.vm.specification.name = Java Virtual Machine Specification
java.vm.specification.vendor = Oracle Corporation
java.vm.specification.version = 1.8
java.vm.vendor = Oracle Corporation
java.vm.version = 25.181-b13
line.separator = \n
os.arch = amd64
os.name = Linux
os.version = 4.9.0-8-amd64
path.separator = :
sun.arch.data.model = 64
sun.boot.class.path =
/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/resources.jar
/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar
/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/sunrsasign.jar
/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/jsse.jar
/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/jce.jar
/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/charsets.jar
/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/jfr.jar
/usr/lib/jvm/java-8-openjdk-amd64/jre/classes
sun.boot.library.path = /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64
sun.cpu.endian = little
sun.cpu.isalist =
sun.io.unicode.encoding = UnicodeLittle
sun.java.launcher = SUN_STANDARD
sun.jnu.encoding = UTF-8
sun.management.compiler = HotSpot 64-Bit Tiered Compilers
sun.os.patch.level = unknown
user.country = US
user.dir = /opt/solr
user.home = /home/solr
user.language = en
user.name = solr
user.timezone =

openjdk version "1.8.0_181"
OpenJDK Runtime Environment (build 1.8.0_181-8u181-b13-1~deb9u1-b13)
OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)




On Fri, Jan 31, 2020 at 10:39 AM Jan Høydahl  wrote:

> Yep, the OpenJDK in Solr image is pure open source, no Oracle license
> required.
>
> If I’m not mistaken it is the AdoptOpenJDK distro under the hood, which
> will receive patches for several years, unlike Oracle’s OpenJDK distro,
> which is only updated for 6 months.
>
> For every Solr release we refresh all docker images with the newest JRE 11
> version such that even a pull of 8.1 will get the latest patched Java.
>
> We should perhaps document this somewhere. I plan to add some “Solr on
> Docker” chapter to the reference guide.
>
> Jan Høydahl
>
> > 31. jan. 2020 kl. 16:00 skrev Koen De Groote <
> koen.degro...@limecraft.com>:
> >
> > Indeed, only Oracle JDK is affected by the commercial license, not
> OpenJDK,
> > as can be read here: https://www.baeldung.com/oracle-jdk-vs-openjdk
> >
> > Point 5 specifically.
> >
> > Also explained here:
> >
> https://www.quora.com/Does-using-OpenJDK-provide-a-way-to-be-safe-from-Oracle-Java-Licensing-fee
> >
> >
> >> On Fri, Jan 31, 2020 at 3:45 PM Erick Erickson  >
> >> wrote:
> >>
> >> Why is it a no-go? It’s free too.
> >>
> >>> On Jan 31, 2020, at 12:31 AM, Arnold Bronley 
> >> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I use Solr docker images from https://hub.docker.com/_/solr/. It uses
> >>> Oracle OpenJDK. It is a no go for where I work. What is the best way to
> >>> replace this JDK with some other OpenJDK such as Amazon Corretto
> OpenJDK
> >>> for my docker containers if I still want to use above images?
> >>
> >>
>


Oracle OpenJDK to Amazon Corretto OpenJDK

2020-01-30 Thread Arnold Bronley
Hi,

I use Solr docker images from https://hub.docker.com/_/solr/. It uses
Oracle OpenJDK. It is a no-go for where I work. What is the best way to
replace this JDK with some other OpenJDK, such as Amazon Corretto OpenJDK,
for my docker containers if I still want to use the above images?


Solr 6.3 and OpenJDK 11

2020-01-28 Thread Arnold Bronley
Hi,

How much of a problem would it be if I used OpenJDK 11 with Solr 6.3? I am
aware that the system requirements page for Solr mentions that 'You should
avoid Java 9 or later for Lucene/Solr 6.x or earlier.' I am interested in
knowing what sort of functionality would break in Solr if I try to use
OpenJDK 11 with Solr 6.3.


QParser does not retain double quotes

2020-01-22 Thread Arnold Bronley
Hi,

I have the following code that does some parsing with the QParser plugin. I
noticed that it does not retain the double quotes in the filterQueryString.
How should I make it retain the double quotes?

QParser.getParser(filterQueryString, null, req).getQuery();

filterQueryString passed = id:"x:1234"
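
For illustration, a minimal sketch (assuming the default lucene parser and a
string-typed id field) of what happens here: the quotes are parser syntax,
not part of the parsed query, so Query.toString() does not re-print them.

import org.apache.lucene.search.Query;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.SyntaxError;

public class QuoteParseSketch {
    // Hypothetical helper: parse the same filter string and inspect the result.
    public static void show(SolrQueryRequest req) throws SyntaxError {
        Query q = QParser.getParser("id:\"x:1234\"", null, req).getQuery();
        // For a StrField id this is a single TermQuery on the term x:1234;
        // q.toString() prints id:x:1234, without the quotes.
        System.out.println(q.getClass().getSimpleName() + " -> " + q);
    }
}

If the goal is only to keep the colon inside the value from being treated as
a field separator, escaping it (id:x\:1234) parses to the same term as the
quoted form.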


Re: BooleanQueryBuilder is not adding parenthesis around the query

2020-01-22 Thread Arnold Bronley
I knew about the + and other signs and their connections to MUST and other
operators. What I did not understand was why it was not adding parentheses
around the expression. In your first replay you mentioned that -  'roughly,
a builder for each query enclosed in "parenthesis"' - that was the key
point I was missing.

On Wed, Jan 22, 2020 at 2:40 PM Arnold Bronley 
wrote:

> Thanks, Edward. This was the exact answer I was looking for :)
>
> On Wed, Jan 22, 2020 at 1:08 PM Edward Ribeiro 
> wrote:
>
>> If you are using Lucene's BooleanQueryBuilder then you need to do nesting
>> of your queries (roughly, a builder for each query enclosed in
>> "parenthesis").
>>
> >> A query like (text:children AND text:toys) OR age:12 would be:
>>
>> Query query1 = new TermQuery(new Term("text", "toys"));
>> Query query2 = new TermQuery(new Term("text", "children"));
>> Query query3 = new TermQuery(new Term("age", "12"));
>>
>> BooleanQuery.Builder andBuilder = new BooleanQuery.Builder();
>> andBuilder.add(query1, BooleanClause.Occur.MUST);
>> andBuilder.add(query2, BooleanClause.Occur.MUST);
>>
>> BooleanQuery.Builder builder = new BooleanQuery.Builder();
>> builder.add(andBuilder.build(), BooleanClause.Occur.SHOULD);
>> builder.add(query3, BooleanClause.Occur.SHOULD);
>>
>> BooleanQuery booleanQuery = builder.build();
>>
>> This booleanQuery.toString() will be:
>>
>> (+text:toys +text:children) age:12
>>
> >> That is the parsing of "(text:children AND text:toys) OR age:12"
>>
>>
>> Edward
>>
>> On Tue, Jan 21, 2020 at 5:24 PM Arnold Bronley 
>> wrote:
>> >
>> > Hi,
>> >
>> > BooleanQueryBuilder is not adding parenthesis around the query. It
>> > only adds + sign at the start of the query but not the parentheses
>> around
>> > the query. Why is that? How should I add it?
>> >
>> > booleanQueryBuilder.add(query, BooleanClause.Occur.MUST)
>>
>


Re: BooleanQueryBuilder is not adding parenthesis around the query

2020-01-22 Thread Arnold Bronley
Thanks, Edward. This was the exact answer I was looking for :)

On Wed, Jan 22, 2020 at 1:08 PM Edward Ribeiro 
wrote:

> If you are using Lucene's BooleanQueryBuilder then you need to do nesting
> of your queries (roughly, a builder for each query enclosed in
> "parenthesis").
>
> A query like (text:children AND text:toys) OR age:12 would be:
>
> Query query1 = new TermQuery(new Term("text", "toys"));
> Query query2 = new TermQuery(new Term("text", "children"));
> Query query3 = new TermQuery(new Term("age", "12"));
>
> BooleanQuery.Builder andBuilder = new BooleanQuery.Builder();
> andBuilder.add(query1, BooleanClause.Occur.MUST);
> andBuilder.add(query2, BooleanClause.Occur.MUST);
>
> BooleanQuery.Builder builder = new BooleanQuery.Builder();
> builder.add(andBuilder.build(), BooleanClause.Occur.SHOULD);
> builder.add(query3, BooleanClause.Occur.SHOULD);
>
> BooleanQuery booleanQuery = builder.build();
>
> This booleanQuery.toString() will be:
>
> (+text:toys +text:children) age:12
>
> That is the parsing of "(text:children AND text:toys) OR age:12"
>
>
> Edward
>
> On Tue, Jan 21, 2020 at 5:24 PM Arnold Bronley 
> wrote:
> >
> > Hi,
> >
> > BooleanQueryBuilder is not adding parenthesis around the query. It
> > only adds + sign at the start of the query but not the parentheses around
> > the query. Why is that? How should I add it?
> >
> > booleanQueryBuilder.add(query, BooleanClause.Occur.MUST)
>


BooleanQueryBuilder is not adding parenthesis around the query

2020-01-21 Thread Arnold Bronley
Hi,

BooleanQueryBuilder is not adding parenthesis around the query. It
only adds + sign at the start of the query but not the parentheses around
the query. Why is that? How should I add it?

booleanQueryBuilder.add(query, BooleanClause.Occur.MUST)


Lucene query to Solr query

2020-01-19 Thread Arnold Bronley
Hi,

I have a Lucene query like the following (the toString representation of
Lucene's Query object):

+(topics:29)^2 (topics:38)^3 +(-id:41135)

It works fine when I use it as a Lucene query in the
SolrIndexSearcher.getDocList function.

However, now I want to use it as a Solr query and query against a
collection. I tried to use the as-is representation from the Lucene Query
object's toString method, but it does not work. How should I proceed?
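
For illustration, a minimal sketch (URL and collection name are hypothetical)
of pushing that toString() output through SolrJ with the standard lucene
parser. Note that toString() only round-trips for simple clause types (terms,
boolean clauses, boosts) like the one above, so this is not a general-purpose
conversion.

import org.apache.lucene.search.Query;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class LuceneToSolrSketch {
    public static QueryResponse run(Query luceneQuery) throws Exception {
        try (SolrClient client = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/myCollection").build()) {
            // defType=lucene selects the standard parser, which understands
            // the +/-, parentheses, and ^boost syntax that toString() emits.
            SolrQuery q = new SolrQuery(luceneQuery.toString());
            q.set("defType", "lucene");
            return client.query(q);
        }
    }
}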


SolrCloud upgrade concern

2020-01-16 Thread Arnold Bronley
Hi,

I am trying to upgrade my system from Solr master-slave architecture to
SolrCloud architecture.
Meanwhile, I stumbled upon this very negative post about SolrCloud.

https://lucene.472066.n3.nabble.com/A-Last-Message-to-the-Solr-Users-td4452980.html


Given that it is from one of the initial authors of SolrCloud
functionality, I am having second thoughts about the upgrade and I am
somewhat concerned.

I will greatly appreciate any advice/feedback on this from Solr community.


Re: remote debugging for docker solr

2020-01-14 Thread Arnold Bronley
Thanks.

The issue turned out to be a little different than expected. My IntelliJ was
on Windows (my main operating system). Solr was running inside a docker
container inside a Debian VM hosted on my Windows operating system.
The debug server that runs inside the docker container will accept
connections only from localhost (the container itself). In order for the
debug server to accept connections made from IntelliJ (which is on a
different host: it runs neither inside the docker container nor inside the
Debian VM, but on the Windows host machine), in the command-line arguments
for the JVM,

instead of
-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005

we need to pass
-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005

Notice the wildcard in the address parameter. It tells the debug server
to accept connections from any host.

Thanks for all the help.



On Mon, Jan 13, 2020 at 2:56 PM Edward Ribeiro 
wrote:

> Hi,
>
> I was able to connect my IDE to Solr running on a container by using the
> following command:
>
> command: >
>  bash -c "solr start -c -f -a
> -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005;"
>
> It starts SolrCloud ( -c ) and runs in the foreground ( -f ) so you don't
> need to resort to tail -f.
>
> OTOH, I understand that you want to run the commands below after the
> container is up and Solr is accepting connections:
>
> wait-for-solr.sh;
> solr create -c data_core -d /var/lib/solr/data_core;
>
> IMHO, you could put those commands in a bash script (myscript.sh in my
> example) and start it first in the background. Therefore, while the script
> is attempting to connect to Solr, the server is starting in the
> foreground:
>
> command: >
> bash -c "./myscript.sh& exec solr start -c -f -a
> -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005; "
>
> See that there's an 'exec' keyword between the script and the solr
> launching command.
>
> Best,
> Edward
>
> On Mon, Jan 13, 2020 at 3:09 PM Arnold Bronley 
> wrote:
> >
> > Thanks for your helpful replies, guys.
> >
> > @Edward: you were correct. I forgot to expose the 5005 port in the YAML.
> > After exposing this port, I am at least able to see the process with the
> > following command (I was not able to see it before):
> >
> > gnandre@gnandre-deb9-64:/sandbox/gnandre/mw-ruby-development-server$
> sudo
> > lsof -i tcp:5005
> > COMMAND  PID USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
> > docker-pr 181928 root4u  IPv6 91702195  0t0  TCP *:5005 (LISTEN)
> >
> > However, even after this, it is not working; i.e., I am not able to
> connect
> > to the debug session from IntelliJ. It still throws a connection refused
> > error as it did previously.
> >
> > Here is the full Solr setting now:
> >
> >   solr:
> > build:
> >   context: ${DEV_ENV_ROOT}/solr/docker
> >   dockerfile: Dockerfile.development
> >   args:
> > DEV_ENV_USER_ID: ${DEV_ENV_USER_ID}
> > networks:
> >   - default
> > ports:
> >   - "8983:8983"
> >   - "9983:9983"
> >   - "5005:5005"
> > environment:
> >   - SOLR_HEAP=2048m
> > command: >
> >  bash -c "solr start -a
> > "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005"
> -cloud
> > -s /var/lib/solr -t /var/data/solr; set -x; export; wait-for-solr.sh;
> >
> >  solr create -c data_core -d /var/lib/solr/data_core;
> >  tail -f /var/log/solr/solr.log"
> >
> > @Martijn: As you can see in the setting above, the tail command does not
> > let container exit. I had to start Solr in background otherwise I cannot
> > run command to create collection after it if Solr is started in
> foreground.
> >
> > I tried both the settings that you have given above. I see the '
> Listening
> > for transport dt_socket at address: 5005' statement in the startup logs.
> I
> > can also see that there is a process running on this port.
> > gnandre@gnandre-deb9-64:/sandbox/gnandre/mw-ruby-development-server$
> sudo
> > lsof -i tcp:5005
> > COMMAND  PID USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
> > docker-pr 181928 root4u  IPv6 91702195  0t0  TCP *:5005 (LISTEN)
> >
> > but if I try to connect it with telnet command directly from the Debian
> > machine then it fails
> > gnandre@gnandre-deb9-64:/sandbox/gnandre/mw-ruby-development-server$
> telnet
> > localhost 5005
> > Trying 127.0.0.1...
> > Connected to loca

Re: remote debugging for docker solr

2020-01-13 Thread Arnold Bronley
Thanks for your helpful replies, guys.

@Edward: you were correct. I forgot to expose the 5005 port in the YAML.
After exposing this port, I am at least able to see the process with the
following command (I was not able to see it before):

gnandre@gnandre-deb9-64:/sandbox/gnandre/mw-ruby-development-server$ sudo
lsof -i tcp:5005
COMMAND  PID USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
docker-pr 181928 root4u  IPv6 91702195  0t0  TCP *:5005 (LISTEN)

However, even after this, it is not working; i.e., I am not able to connect
to the debug session from IntelliJ. It still throws a connection refused
error as it did previously.

Here is the full Solr setting now:

  solr:
build:
  context: ${DEV_ENV_ROOT}/solr/docker
  dockerfile: Dockerfile.development
  args:
DEV_ENV_USER_ID: ${DEV_ENV_USER_ID}
networks:
  - default
ports:
  - "8983:8983"
  - "9983:9983"
  - "5005:5005"
environment:
  - SOLR_HEAP=2048m
command: >
 bash -c "solr start -a
"-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005" -cloud
-s /var/lib/solr -t /var/data/solr; set -x; export; wait-for-solr.sh;

 solr create -c data_core -d /var/lib/solr/data_core;
 tail -f /var/log/solr/solr.log"

@Martijn: As you can see in the setting above, the tail command does not
let the container exit. I had to start Solr in the background; otherwise I
cannot run the command to create the collection after it if Solr is started
in the foreground.

I tried both the settings that you have given above. I see the ' Listening
for transport dt_socket at address: 5005' statement in the startup logs. I
can also see that there is a process running on this port.
gnandre@gnandre-deb9-64:/sandbox/gnandre/mw-ruby-development-server$ sudo
lsof -i tcp:5005
COMMAND  PID USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
docker-pr 181928 root4u  IPv6 91702195  0t0  TCP *:5005 (LISTEN)

but if I try to connect to it with the telnet command directly from the
Debian machine, then it fails:
gnandre@gnandre-deb9-64:/sandbox/gnandre/mw-ruby-development-server$ telnet
localhost 5005
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Connection closed by foreign host.

if I use the Debian machine name on which this docker instance is running,
it still fails, but with a different error:
gnandre@gnandre-deb9-64:/sandbox/gnandre/mw-ruby-development-server$ sudo
lsof -i tcp:5005
(failed reverse-i-search)`telent ': ^Clnet localhost 5005
gnandre@gnandre-deb9-64:/sandbox/gnandre/mw-ruby-development-server$ telnet
gnandre-deb9-64 5005
Trying 172.28.155.62...
telnet: Unable to connect to remote host: Connection refused

I used the telnet command to test the debug port. I also tried to connect
the debugging session from IntelliJ, and it fails with the same issue, i.e.,
connection refused. This was the same error that I was getting in the
first place.

Any more suggestions?



On Sat, Jan 11, 2020 at 10:15 AM Martijn Koster 
wrote:

> I think you may have a quoting issue there: remove those inner ones. Not
> that it should matter in this instance.
> I’m not sure why you’re using “start”, which will run solr in the
> background, or what you expect to happen after the wait-for-solr.sh — if it
> all worked as you expected that would wait for solr, then exit, destroying
> the container.
>
> This seems to work:
>
> version: '3.3'
> services:
>   solr1:
> container_name: solr1
> image: solr:8.4
> ports:
>  - "8981:8983”
>  - "5005:5005”
> command: bash -c "solr-fg -a
> -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005”
>
> Or even just:
>
> version: '3.3'
> services:
>   solr1:
> container_name: solr1
> image: solr:8.4
> ports:
>  - "8981:8983”
>  - "5005:5005”
> environment:
>  -
> "SOLR_OPTS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005”
>
> The thing to look for in the logs is:   solr1| Listening for transport
> dt_socket at address: 5005
>
> BTW, when looking at these sorts of subtleties, it’s always useful to exec
> into the container, and run `tr '\000' '\n' < /proc/PID/cmdline` (where PID
> is the pid of java in your container; double-check with `ps -eflwww`), to
> make sure it’s running with the arguments you expect it to.
>
> — Martijn
>
> > On 10 Jan 2020, at 23:42, Arnold Bronley 
> wrote:
> >
> > bash -c "solr start -a
> > "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005"
> -cloud
> > -s /var/lib/solr -t /var/data/solr; set -x; export; wait-for-solr.sh;"
>
>


remote debugging for docker solr

2020-01-10 Thread Arnold Bronley
Hi,

I have a running dockerized instance of Solr which runs fine with the
following setting for command option for solr service in docker-compose.yml
file

command: >
 bash -c "solr start -cloud -s /var/lib/solr -t /var/data/solr;
set -x; export; wait-for-solr.sh;"

Recently, I wanted to debug one custom Solr plugin, so I added a command
line argument for remote debugging to above command:

 command: >
 bash -c "solr start -a
"-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005" -cloud
-s /var/lib/solr -t /var/data/solr; set -x; export; wait-for-solr.sh;"

Although Solr starts without any issues, I cannot see any process running
on the debug port, and consequently I am not able to connect to any remote
debug process through my IntelliJ.

Any suggestions?


Re: Accessing other core in SolrCloud mode

2020-01-06 Thread Arnold Bronley
Hi Erick,

Thanks for replying. I know that I should deal at the collection level in
SolrCloud mode and leave dealing with cores to SolrCloud. I am also aware
of the collection aliasing feature.

However, the plugins that I am trying to migrate to SolrCloud have some
use cases like the following:
1. One of the plugins requires another core so that it can fetch some data
from it. This data gets used in the pre-indexing phase for the main core.
This plugin gets invoked during the pre-indexing phase of the main core.
2. One other plugin requires some other cores so that it can execute
MoreLikeThis queries against them (the MoreLikeThis like method returns a
Lucene Query object, so I assume that it works at the Solr core level rather
than the collection level, unless we convert it to a SolrQuery object
somehow and execute it through some Solr client).

On Mon, Jan 6, 2020 at 3:28 PM Erick Erickson 
wrote:

> This kind of seems like an XY problem. Why do you want to get to the other
> core?
> If you need to run the same query on multiple cores… you shouldn’t be
> thinking
> that way, think “collections” rather than cores. And you can use
> “collection aliasing”
> (see the Collections API CREATEALIAS command) to alias to multiple
> _collections_.
> The rest is automatic.
>
> If that’s irrelevant, please tell us _why_ your custom code needs to
> access otherCore,
> maybe there’s something built in. It’s just that much of the time trying
> to force
> stand-alone logic on SolrCloud is making it harder than it needs to be.
>
> Best,
> Erick
>
> > On Jan 6, 2020, at 2:39 PM, Arnold Bronley 
> wrote:
> >
> > Hi,
> >
> > I have one custom Solr plugin that uses following logic to access some
> > other core present on the same Solr instance.
> >
> > request.getCore().getCoreContainer().getCore(otherCoreName) where request
> > is an object of type SolrQueryRequest
> >
> > This works fine in master-slave mode.
> >
> > Now if try to use the same logic in SolrCloud mode, it does not work
> > because what was the core name above is a collection name now and core
> > name is something like otherCoreName_shard1_replica_n1. There is also a
> > possibility that the the core might be sharded in SolrCloud mode and it
> > partially exists on two or more separate Solr instances.
> >
> > How would I change above logic for accessing other Solr core so that it
> > works in SolrCloud mode?
>
>


Re: Does CloudMLTQParser support getting related documents from different shards?

2020-01-06 Thread Arnold Bronley
*  What results do you get when you just try it in cloud mode? *

When I try it in SolrCloud mode, the part that deals with fetching the
results from the same core works fine. However, the part that deals with
fetching results from other cores does not work.




*This is a _parser_, it’s just in charge of parsing the query and, in this
case, getting the relevant terms from the indicated document to add to the
query while doing some sanity checking. The bits that distribute the query
to shards and collate the results are elsewhere.  *

AFAIK, the part that fetches the relevant documents is the following:
MoreLikeThis mlt = new MoreLikeThis(req.getSearcher().getIndexReader());
mlt.like(docId, curBoostFields, tie)

The mlt.like method returns a Lucene Query object. So should I just convert
this query object to a SolrQuery object and handle the distributed calls
myself by making a request against a collection instead of a particular
core? How would I convert the Lucene Query to a SolrQuery? Should I just use
the toString method?
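
For illustration, a rough sketch of that idea (the collection name is
hypothetical, and the like/boost arguments vary by version): take the Lucene
query that MoreLikeThis builds and re-issue its toString() against the whole
collection through a CloudSolrClient, so the normal distributed-search
machinery handles the shards. MLT queries are plain boosted term clauses, so
toString() round-trips here.

import org.apache.lucene.queries.mlt.MoreLikeThis;
import org.apache.lucene.search.Query;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DistributedMltSketch {
    public static QueryResponse relatedDocs(MoreLikeThis mlt, int docId,
            CloudSolrClient client) throws Exception {
        // Built from the local shard's copy of the source document.
        Query mltQuery = mlt.like(docId);
        SolrQuery q = new SolrQuery(mltQuery.toString());
        q.set("defType", "lucene");
        // Query the collection, not a core, so all shards are searched.
        return client.query("myCollection", q);
    }
}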





On Sat, Jan 4, 2020 at 9:21 AM Erick Erickson 
wrote:

> What results do you get when you just try it in cloud mode?
>
> This is a _parser_, it’s just in charge of parsing the query and, in this
> case
> getting the relevant from the indicated document to add to the
> query while doing some sanity checking. The bits that
> distribute the query to shards and collate the results are elsewhere.
>
> Best,
> Erick
>
> > On Jan 3, 2020, at 5:43 PM, Arnold Bronley 
> wrote:
> >
> > Hi,
> >
> > I have one custom Solr plugin that uses MoreLikeThis class. AFAIK,
> > MoreLikeThis handler does not support distributed mode and the issue is
> > still open for that - https://issues.apache.org/jira/browse/SOLR-5480.
> >
> > However, I saw that there is some possibility to use CloudMLTQParser to
> > work around above issue. CloudMLTQParser claims to work in distributed
> mode
> > too -  https://issues.apache.org/jira/browse/SOLR-6248. When I took a
> look
> > at the code though,
> >
> https://github.com/apache/lucene-solr/blob/2d690885e554dda7b4b4e0f46f2bd9cacdb32df6/solr/core/src/java/org/apache/solr/search/mlt/CloudMLTQParser.java
> > ,
> > it seems like that for the document for which we are finding out related
> > documents, that document is getting fetched now with real-time get
> request
> > handler.
> > core.getRequestHandler("/get").handleRequest(request, rsp);
> > This is good because now that document will get fetched from any shard
> > existing
> > in cloud wherever that document exists.
> >
> > However, the part where the relevant related documents are supposed to be
> > fetched it still uses the same old sort of code.
> > e.g. MoreLikeThis mlt = new MoreLikeThis(req.getSearcher().
> > getIndexReader());
> >
> > Isn't getSearcher bound to only the particular shard and AFAIK it does
> not
> > work across shards in cloud?
> >
> > So how then, CloudMLTQParser works in SolrCloud mode? Does it even work?
>
>


Accessing other core in SolrCloud mode

2020-01-06 Thread Arnold Bronley
Hi,

I have one custom Solr plugin that uses following logic to access some
other core present on the same Solr instance.

request.getCore().getCoreContainer().getCore(otherCoreName) where request
is an object of type SolrQueryRequest

This works fine in master-slave mode.

Now if I try to use the same logic in SolrCloud mode, it does not work
because what was the core name above is a collection name now, and the core
name is something like otherCoreName_shard1_replica_n1. There is also a
possibility that the core might be sharded in SolrCloud mode and
partially exist on two or more separate Solr instances.

How would I change above logic for accessing other Solr core so that it
works in SolrCloud mode?
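
For illustration, one possible direction (a sketch only, with a hypothetical
collection name): build a CloudSolrClient from the node's own ZooKeeper
address and query the other collection, letting SolrCloud route the request
to whichever shards hold it. In a real plugin the client should be created
once (e.g. in inform()) and closed in a CloseHook, as in the SolrClient
thread later in this digest, rather than per request.

import java.util.Collections;
import java.util.Optional;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.request.SolrQueryRequest;

public class OtherCollectionSketch {
    public static QueryResponse queryOther(SolrQueryRequest req) throws Exception {
        String zkHost = req.getCore().getCoreContainer()
                .getZkController().getZkServerAddress();
        try (CloudSolrClient client = new CloudSolrClient.Builder(
                Collections.singletonList(zkHost), Optional.empty()).build()) {
            // In cloud mode think in collections, not core names like
            // otherCoreName_shard1_replica_n1.
            return client.query("otherCollection", new SolrQuery("*:*"));
        }
    }
}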


Does CloudMLTQParser support getting related documents from different shards?

2020-01-03 Thread Arnold Bronley
Hi,

I have one custom Solr plugin that uses MoreLikeThis class. AFAIK,
MoreLikeThis handler does not support distributed mode and the issue is
still open for that - https://issues.apache.org/jira/browse/SOLR-5480.

However, I saw that there is some possibility to use CloudMLTQParser to
work around above issue. CloudMLTQParser claims to work in distributed mode
too -  https://issues.apache.org/jira/browse/SOLR-6248. When I took a look
at the code though,
https://github.com/apache/lucene-solr/blob/2d690885e554dda7b4b4e0f46f2bd9cacdb32df6/solr/core/src/java/org/apache/solr/search/mlt/CloudMLTQParser.java
,
it seems that the document for which we are finding related documents is
now fetched with the real-time get request handler:
core.getRequestHandler("/get").handleRequest(request, rsp);
This is good because now that document will get fetched from whichever
shard in the cloud it exists on.

However, the part where the relevant related documents are supposed to be
fetched still uses the same old sort of code, e.g.:
MoreLikeThis mlt = new MoreLikeThis(req.getSearcher().getIndexReader());

Isn't getSearcher bound to only the particular shard? AFAIK it does not
work across shards in cloud.

So how, then, does CloudMLTQParser work in SolrCloud mode? Does it even work?


Re: Solr Admin Console hangs on Chrome

2019-12-10 Thread Arnold Bronley
I am also facing a similar issue. I have also switched to other browsers to
work around it.

On Tue, Dec 10, 2019 at 2:22 PM Webster Homer <
webster.ho...@milliporesigma.com> wrote:

> It seems like the Solr Admin console has become slow when you use it on
> the chrome browser. If I go to the query tab and execute a query, even the
> default *:* after that the browser window becomes very slow.
> I'm using chrome Version 78.0.3904.108 (Official Build) (64-bit) on Windows
>
> The work around is to use Firefox
>
>


Indexing strategies for user profiles

2019-12-10 Thread Arnold Bronley
Hi,

I have a Solr collection 'products' for different products that users
interact with. With MoreLikeThis, I can retrieve for a given product
another related product. Now, I want to create a Solr collection for users
such that I can use MoreLikeThis approach between users and products. Not
just that, I would also like to get relevant products for a user based on
some sort of collaborative filtering. What should my indexing and
collection-creation strategy be to tackle this problem in general?
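
For the product-to-product piece, a minimal sketch (field, collection, and
document id are hypothetical) of an MLT-style lookup using the mlt query
parser; the user-to-product and collaborative-filtering parts would need
additional modeling on top of this.

import org.apache.solr.client.solrj.SolrQuery;

public class MltQuerySketch {
    // Finds products similar to the product whose uniqueKey is "prod123",
    // comparing on the description and tags fields.
    public static SolrQuery relatedProducts() {
        return new SolrQuery("{!mlt qf=description,tags mintf=1 mindf=2}prod123");
    }
}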


How to update range of dynamic fields in Solr

2019-10-23 Thread Arnold Bronley
Here is the detailed question on Stack Overflow. Please help.

https://stackoverflow.com/questions/14280506/how-to-update-range-of-dynamic-fields-in-solr-4


Critical issue SOLR-13141

2019-09-25 Thread Arnold Bronley
Hi,
I am using Solr version 8.2.0 and I see that there is one critical JIRA
issue open (link below) for CDCR. The issue does not mention anything about
8.2.0, but it says that it is fixed in 8.3.0. Does this mean that CDCR is
not functional in Solr 8.2.0, and should I wait for 8.3.0 to be released?

https://issues.apache.org/jira/browse/SOLR-13141


Re: Reloading after creating a collection

2019-09-19 Thread Arnold Bronley
Hi,
I am not changing the config to enable CDCR. I am just using the CDCR
API to start it. Does that count as changing configuration?

On Thu, Sep 19, 2019 at 12:20 PM Shawn Heisey  wrote:

> On 9/19/2019 9:36 AM, Arnold Bronley wrote:
> > Why is it that I need to reload collection after I created it? CDCR runs
> > into issues if I do not do this.
>
> If the config doesn't change after creation, I would not expect that to
> be required.
>
> If you do change the config to enable CDCR after the collection is
> created, then you have to reload so that Solr sees the new config.
>
> Thanks,
> Shawn
>


Reloading after creating a collection

2019-09-19 Thread Arnold Bronley
Hi,

Why is it that I need to reload a collection after I create it? CDCR runs
into issues if I do not do this.
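
For reference, a minimal SolrJ sketch (collection name hypothetical) of the
reload step in question:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class ReloadSketch {
    public static void reload(SolrClient client) throws Exception {
        // Equivalent to /admin/collections?action=RELOAD&name=myCollection
        CollectionAdminRequest.reloadCollection("myCollection").process(client);
    }
}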


OR and AND queries case sensitive in q param?

2019-09-12 Thread Arnold Bronley
Hi,

In Solr 6.3, I was able to use the OR and AND operators in a case-insensitive
manner.

E.g.
If I have two documents like following in my corpus:
document 1:
{
id:1
author:rick
}

document 2:
{
id:2
author:morty
}

Then if I pass 'rick OR morty' to the q param, I get both documents
back. I would get both documents back even if I passed 'rick or morty'.

In Solr 8.2, 'rick or morty' does not give any results
back. 'rick OR morty' gives both results back.

Is this an intentional change?
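
If these queries go through the edismax parser (an assumption; the post does
not say which parser is in use), its lowercaseOperators parameter controls
exactly this behavior, and its default changed around Solr 7. A hedged
sketch of requesting the old behavior per query:

import org.apache.solr.client.solrj.SolrQuery;

public class LowercaseOperatorsSketch {
    public static SolrQuery build() {
        SolrQuery q = new SolrQuery("rick or morty");
        q.set("defType", "edismax");
        // Treat lowercase "or"/"and" as operators again.
        q.set("lowercaseOperators", "true");
        return q;
    }
}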


Re: SolrClient from inside processAdd function

2019-09-10 Thread Arnold Bronley
Hi,

Thanks for all this information. I am doing this now like following:

@Override
public void inform(SolrCore core) {
HttpSolrClient.Builder builder = new HttpSolrClient.Builder();
String baseURL = core.getCoreContainer().getZkController().getBaseUrl()
+ "/" + dataInfo.dataSource;
builder.withBaseSolrUrl(baseURL);
solrClient = builder.build();

core.addCloseHook(new CloseHook() {
@Override
public void preClose(SolrCore core) {
//no cleanup needed before closing the core
}

@Override
public void postClose(SolrCore core) {
try {
solrClient.close();
} catch (IOException e) {
logger.error("Not able to close solr client  after closing " +
"Solr core '{}'", core.getName(), e);
}
}
});
}


On Fri, Sep 6, 2019 at 5:40 PM Arnold Bronley 
wrote:

> Hi Markus,
>
> "Depending on cloudMode we create new SolrClient instances based on these
> classes.   "
>
> But I still do not see SolrClient creation anywhere in your code snippet.
> Am I missing something? I tried the solution with system properties and it
> works but I would like to avoid that.
>
> On Thu, Sep 5, 2019 at 6:20 PM Markus Jelsma 
> wrote:
>
>> Hello Arnold,
>>
>> In the Factory's inform() method you receive a SolrCore reference. Using
>> this you can get the CloudDescriptor and the ZkController references. These
>> provide access to what you need to open a connection for SolrClient.
>>
>> Our plugins usually work in cloud and non-cloud environments, so we
>> initialize different things for each situation. Like this abstracted in
>> some CloudUtils thing:
>>
>> cloudDescriptor = core.getCoreDescriptor().getCloudDescriptor();
>> zk = core.getCoreContainer().getZkController(); // this is the
>> ZkController ref
>> coreName = core.getCoreDescriptor().getName();
>>
>> // Are we in cloud mode?
>> if (zk != null) {
>>   collectionName = core.getCoreDescriptor().getCollectionName();
>>   shardId = cloudDescriptor.getShardId();
>> } else {
>>   collectionName = null;
>>   shardId = null;
>> }
>>
>> Depending on cloudMode we create new SolrClient instances based on these
>> classes.
>>
>> Check the apidocs and you'll quickly see what you need.
>>
>> We use these api's to get what we need. But you can also find these
>> things if you check the Java system properties, which is easier. We use the
>> api's to read the data because if api's change, we get a compile error. If
>> the system properties change, we don't. So the system properties is easier,
>> but the api's are safer. Although a unit tests should guard against that as
>> well.
>>
>> Regards,
>> Markus
>>
>> ps, on this list there is normally no need to create a new thread for an
>> existing one, even if you are eagerly waiting for a reply. It might take
>> some patience though.
>>
>> -Original message-
>> > From:Arnold Bronley 
>> > Sent: Thursday 5th September 2019 18:44
>> > To: solr-user@lucene.apache.org
>> > Subject: Re: SolrClient from inside processAdd function
>> >
>> > Hi Markus,
>> >
>> > Is there any way to get the information about the current Solr endpoint
>> > from within the custom URP?
>> >
>> > On Wed, Sep 4, 2019 at 3:10 PM Markus Jelsma <
>> markus.jel...@openindex.io>
>> > wrote:
>> >
>> > > Hello Arnold,
>> > >
>> > > Yes, we do this too for several cases.
>> > >
>> > > You can create the SolrClient in the Factory's inform() method, and
>> pass
>> > > is to the URP when it is created. You must implement SolrCoreAware and
>> > > close the client when the core closes as well. Use a CloseHook for
>> this.
>> > >
>> > > If you do not close the client, it will cause trouble if you run unit
>> > > tests, and most certainly when you regularly reload cores.
>> > >
>> > > Regards,
>> > > Markus
>> > >
>> > >
>> > >
>> > > -Original message-
>> > > > From:Arnold Bronley 
>> > > > Sent: Wednesday 4th September 2019 20:10
>> > > > To: solr-user@lucene.apache.org
>> > > > Subject: Re: SolrClient from inside processAdd function
>> > > >
>> > > > I need to search some other collection inside processAdd function
>> and
>> > > > append that information to the indexing request.
>> > > >
>> > > > On Tue, Sep 3, 2019 at 7:55 PM Erick Erickson <
>> erickerick...@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > This really sounds like an XY problem. What do you need the
>> SolrClient
>> > > > > _for_? I suspect there’s an easier way to do this…..
>> > > > >
>> > > > > Best,
>> > > > > Erick
>> > > > >
>> > > > > > On Sep 3, 2019, at 6:17 PM, Arnold Bronley <
>> arnoldbron...@gmail.com>
>> > > > > wrote:
>> > > > > >
>> > > > > > Hi,
>> > > > > >
>> > > > > > Is there a way to create SolrClient from inside processAdd
>> function
>> > > for
>> > > > > > custom update processor for the same Solr on which it is
>> executing?
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>


Re: SolrClient from inside processAdd function

2019-09-06 Thread Arnold Bronley
Hi Markus,

"Depending on cloudMode we create new SolrClient instances based on these
classes.   "

But I still do not see SolrClient creation anywhere in your code snippet.
Am I missing something? I tried the solution with system properties and it
works but I would like to avoid that.

On Thu, Sep 5, 2019 at 6:20 PM Markus Jelsma 
wrote:

> Hello Arnold,
>
> In the Factory's inform() method you receive a SolrCore reference. Using
> this you can get the CloudDescriptor and the ZkController references. These
> provide access to what you need to open a connection for SolrClient.
>
> Our plugins usually work in cloud and non-cloud environments, so we
> initialize different things for each situation. Like this abstracted in
> some CloudUtils thing:
>
> cloudDescriptor = core.getCoreDescriptor().getCloudDescriptor();
> zk = core.getCoreContainer().getZkController(); // this is the
> ZkController ref
> coreName = core.getCoreDescriptor().getName();
>
> // Are we in cloud mode?
> if (zk != null) {
>   collectionName = core.getCoreDescriptor().getCollectionName();
>   shardId = cloudDescriptor.getShardId();
> } else {
>   collectionName = null;
>   shardId = null;
> }
>
> Depending on cloudMode we create new SolrClient instances based on these
> classes.
>
> Check the apidocs and you'll quickly see what you need.
>
> We use these api's to get what we need. But you can also find these things
> if you check the Java system properties, which is easier. We use the api's
> to read the data because if api's change, we get a compile error. If the
> system properties change, we don't. So the system properties is easier, but
> the api's are safer. Although a unit tests should guard against that as
> well.
>
> Regards,
> Markus
>
> ps, on this list there is normally no need to create a new thread for an
> existing one, even if you are eagerly waiting for a reply. It might take
> some patience though.
>
> -Original message-
> > From:Arnold Bronley 
> > Sent: Thursday 5th September 2019 18:44
> > To: solr-user@lucene.apache.org
> > Subject: Re: SolrClient from inside processAdd function
> >
> > Hi Markus,
> >
> > Is there any way to get the information about the current Solr endpoint
> > from within the custom URP?
> >
> > On Wed, Sep 4, 2019 at 3:10 PM Markus Jelsma  >
> > wrote:
> >
> > > Hello Arnold,
> > >
> > > Yes, we do this too for several cases.
> > >
> > > You can create the SolrClient in the Factory's inform() method, and
> pass
> > > is to the URP when it is created. You must implement SolrCoreAware and
> > > close the client when the core closes as well. Use a CloseHook for
> this.
> > >
> > > If you do not close the client, it will cause trouble if you run unit
> > > tests, and most certainly when you regularly reload cores.
> > >
> > > Regards,
> > > Markus
> > >
> > >
> > >
> > > -Original message-
> > > > From:Arnold Bronley 
> > > > Sent: Wednesday 4th September 2019 20:10
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Re: SolrClient from inside processAdd function
> > > >
> > > > I need to search some other collection inside processAdd function and
> > > > append that information to the indexing request.
> > > >
> > > > On Tue, Sep 3, 2019 at 7:55 PM Erick Erickson <
> erickerick...@gmail.com>
> > > > wrote:
> > > >
> > > > > This really sounds like an XY problem. What do you need the
> SolrClient
> > > > > _for_? I suspect there’s an easier way to do this…..
> > > > >
> > > > > Best,
> > > > > Erick
> > > > >
> > > > > > On Sep 3, 2019, at 6:17 PM, Arnold Bronley <
> arnoldbron...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Is there a way to create SolrClient from inside processAdd
> function
> > > for
> > > > > > custom update processor for the same Solr on which it is
> executing?
> > > > >
> > > > >
> > > >
> > >
> >
>


host and port for SolrTestCaseJ4 and EmbeddedSolrServer

2019-09-05 Thread Arnold Bronley
Hi,

In SolrTestCaseJ4 there is an initCore function. After using this function,
how do I know on which host and port Solr is running? The same goes for
EmbeddedSolrServer:
how do I know on which host and port it is running?
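
Worth noting for context: EmbeddedSolrServer runs in-process, so there is no
HTTP host or port to discover; you interact with it directly through the
SolrClient API. A minimal sketch (core name hypothetical):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.core.CoreContainer;

public class EmbeddedSketch {
    public static QueryResponse query(CoreContainer container) throws Exception {
        try (EmbeddedSolrServer server = new EmbeddedSolrServer(container, "mycore")) {
            return server.query(new SolrQuery("*:*")); // no HTTP involved
        }
    }
}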


Get host/port information for current Solr

2019-09-05 Thread Arnold Bronley
Hi,

is there a way to get host/port information for the current Solr from inside
a custom Solr URP plugin? One way is to use 'localhost:8983', but I feel a
little uncomfortable with such hardcoding of the port.
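
For illustration, a sketch of two ways to avoid the hardcoding (the property
names are the ones Solr itself sets at startup; treat the fallback values as
assumptions):

import org.apache.solr.core.SolrCore;

public class EndpointSketch {
    public static String baseUrl(SolrCore core) {
        if (core.getCoreContainer().isZooKeeperAware()) {
            // Cloud mode: ask ZkController for this node's base URL.
            return core.getCoreContainer().getZkController().getBaseUrl();
        }
        // Standalone: fall back to the system properties Solr sets at startup.
        String host = System.getProperty("host", "localhost");
        String port = System.getProperty("jetty.port", "8983");
        return "http://" + host + ":" + port + "/solr";
    }
}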


Re: SolrClient from inside processAdd function

2019-09-05 Thread Arnold Bronley
Hi Markus,

Is there any way to get the information about the current Solr endpoint
from within the custom URP?

On Wed, Sep 4, 2019 at 3:10 PM Markus Jelsma 
wrote:

> Hello Arnold,
>
> Yes, we do this too for several cases.
>
> You can create the SolrClient in the Factory's inform() method, and pass
> is to the URP when it is created. You must implement SolrCoreAware and
> close the client when the core closes as well. Use a CloseHook for this.
>
> If you do not close the client, it will cause trouble if you run unit
> tests, and most certainly when you regularly reload cores.
>
> Regards,
> Markus
>
>
>
> -Original message-
> > From:Arnold Bronley 
> > Sent: Wednesday 4th September 2019 20:10
> > To: solr-user@lucene.apache.org
> > Subject: Re: SolrClient from inside processAdd function
> >
> > I need to search some other collection inside processAdd function and
> > append that information to the indexing request.
> >
> > On Tue, Sep 3, 2019 at 7:55 PM Erick Erickson 
> > wrote:
> >
> > > This really sounds like an XY problem. What do you need the SolrClient
> > > _for_? I suspect there’s an easier way to do this…..
> > >
> > > Best,
> > > Erick
> > >
> > > > On Sep 3, 2019, at 6:17 PM, Arnold Bronley 
> > > wrote:
> > > >
> > > > Hi,
> > > >
> > > > Is there a way to create SolrClient from inside processAdd function
> for
> > > > custom update processor for the same Solr on which it is executing?
> > >
> > >
> >
>


Atomic indexing as default indexing mode in Solr

2019-09-04 Thread Arnold Bronley
Why is atomic indexing not the default mode of indexing in Solr? That way,
the ownership model of the content changes from the document level to the
field level for clients. Multiple clients can participate in the
contribution process of the same Solr document without overwriting each other.
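
For context, a minimal SolrJ sketch (field and collection names hypothetical)
of an atomic update: only the named field changes, which is how two clients
owning different fields avoid overwriting each other. Note that atomic
updates require fields to be stored or docValues, which is part of the
trade-off behind the default.

import java.util.Collections;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.common.SolrInputDocument;

public class AtomicUpdateSketch {
    public static void setTitle(SolrClient client) throws Exception {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc1");
        // The map value turns this into an atomic "set" on just this field.
        doc.addField("title", Collections.singletonMap("set", "new title"));
        client.add("myCollection", doc);
        client.commit("myCollection");
    }
}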


Re: SolrClient from inside processAdd function

2019-09-04 Thread Arnold Bronley
Hi Simon,

I am interested in knowing what you ended up doing in your use case, then.
Can you please share it, at least at a high level?

On Wed, Sep 4, 2019 at 2:26 PM Simon Rosenthal 
wrote:

> Similarly, I had considered a URP which would call the Solr Tagger to add
> new metadata fields  for indexing to incoming documents (and recall
> discussing this with David Smiley), but eventually decided against this
> approach on the grounds of complexity.
>
> -Simon
>
> On Wed, Sep 4, 2019 at 2:10 PM Arnold Bronley 
> wrote:
>
> > I need to search some other collection inside processAdd function and
> > append that information to the indexing request.
> >
> > On Tue, Sep 3, 2019 at 7:55 PM Erick Erickson 
> > wrote:
> >
> > > This really sounds like an XY problem. What do you need the SolrClient
> > > _for_? I suspect there’s an easier way to do this…..
> > >
> > > Best,
> > > Erick
> > >
> > > > On Sep 3, 2019, at 6:17 PM, Arnold Bronley 
> > > wrote:
> > > >
> > > > Hi,
> > > >
> > > > Is there a way to create SolrClient from inside processAdd function
> for
> > > > custom update processor for the same Solr on which it is executing?
> > >
> > >
> >
>
>
> --
> I am transferring my email  from Yahoo to simon.rosent...@gmail.com. I
> will
> continue to receive Yahoo email but will reply from this account. Please
> update your address lists accordingly.
>


Re: SolrClient from inside processAdd function

2019-09-04 Thread Arnold Bronley
I need to search some other collection inside the processAdd function and
append that information to the indexing request.

On Tue, Sep 3, 2019 at 7:55 PM Erick Erickson 
wrote:

> This really sounds like an XY problem. What do you need the SolrClient
> _for_? I suspect there’s an easier way to do this…..
>
> Best,
> Erick
>
> > On Sep 3, 2019, at 6:17 PM, Arnold Bronley 
> wrote:
> >
> > Hi,
> >
> > Is there a way to create SolrClient from inside processAdd function for
> > custom update processor for the same Solr on which it is executing?
>
>


SolrClient from inside processAdd function

2019-09-03 Thread Arnold Bronley
Hi,

Is there a way to create SolrClient from inside processAdd function for
custom update processor for the same Solr on which it is executing?


Re: req.getCore().getCoreContainer().getCore(core_name) is returning null - Solr 8.2.0

2019-08-29 Thread Arnold Bronley
@Andrea: Yeah, I would try to avoid getting that information from
System.getProperty. I am also looking for some class that will give this
information.

@Erick: Is there any way to get the information about the current Solr
endpoint/ZK ensemble from inside StatelessScriptUpdateProcessorFactory
so that I can make that HTTP request?

On Thu, Aug 29, 2019 at 5:18 PM Andrea Gazzarini 
wrote:

> I remember ZK coordinates (hosts, ports and root) are set as system
> properties in Solr nodes (please open the admin console and see their
> names). So, it would be just a matter of
>
> System.getProperty(ZK ensemble coordinates|root)
>
> Prior to going in that direction: I don't know/remember if there's some
> Solr-specific ZK class where they can be asked. If that class exists, it would
> be a better way; otherwise you can go with the system property approach.
>
> Andrea
>
> On Thu, 29 Aug 2019, 21:32 Arnold Bronley, 
> wrote:
>
> > @Andrea: I agree with you. Do you know if there is a way to initialize
> > SolrCloudClient directly from some information that I get
> > from SolrQueryRequest or from AddUpdateCommand object?
> >
> > @Erick: Thank you for the information about
> > StatelessScriptUpdateProcessorFactory.
> >
> > "In your situation, add this _before_ the update is distributed and
> instead
> > of
> > coreB, ask for collectionB."
> >
> > Right, but how do I ask for for collectionB?
> >
> > "Next, you want to get the value from “coreB”. Don’t do that, get it from
> > _collection_ B."
> >
> > Right, but how do I get value _collection_B?
> >
> >
> >
> > On Thu, Aug 29, 2019 at 2:17 PM Erick Erickson 
> > wrote:
> >
> > > Have you looked at using one of the update processors?
> > >
> > > Consider StatelessScriptUpdateProcessorFactory for instance. You can do
> > > anything
> > > you’d like to do in a script (Groovy, Postscript. Python I think, and
> > > others). See:
> > > ./example/files/conf/update-script.js for one example.
> > >
> > > You put it in your solrconfig file in the update handler, then put the
> > > script in your
> > > conf directory and push it to ZK and the rest is automagical.
> > >
> > > There are a bunch of other update processors that you can use that are
> > also
> > > pretty much by configuration, but the one I referenced is the one that
> is
> > > the
> > > most general-purpose.
> > >
> > > In your situation, add this _before_ the update is distributed and
> > instead
> > > of
> > > coreB, ask for collectionB.
> > >
> > > Distributed updates go like this:
> > > 1. the doc gets routed to a leader for a shard
> > > 2. the doc gets forwarded to each replica.
> > >
> > > Now, depending on where you put the update processor (and you’ll have
> to
> > > dig a bit. Much of this distribution logic is implicit, but you can
> > > explicitly
> > > define it in solrconfig.xml), this either happens  _before_ the docs
> are
> > > sent
> > > to the rest of the replicas or _after_ the docs arrive at each replica.
> > > From what
> > > you’ve described, you want to do this before distribution so all copies
> > > have
> > > the new field. You don’t care what replica is the leader. You don’t
> care
> > > how many
> > > other replicas exist or where they are. You don’t even care if there’s
> > any
> > > replica hosting this particular collection on the node that does this,
> it
> > > happens
> > > before distribution.
> > >
> > > Next, you want to get the value from “coreB”. Don’t do that, get it
> from
> > > _collection_ B. Since you have the doc ID (presumably the ),
> > > using get-by-id instead of a standard query will be very efficient. I
> can
> > > imagine
> > > under very heavy load this might introduce too much overhead, but it’s
> > > where I’d start.
> > >
> > > Best,
> > > Erick
> > >
> > > > On Aug 29, 2019, at 1:45 PM, Arnold Bronley  >
> > > wrote:
> > > >
> > > > I can't use  CloudSolrClient  because I need to intercept the
> incoming
> > > > indexing request and then add one more field to it. All this happens
> on
> > > > Solr side and not client side.
> > > >
> > > > On Thu, Aug 29, 2019 at 1:05 PM Andrea Gazzarini <
> a.gazzar...@sease.io
> > >
> > > > wrote:
> > &

Re: req.getCore().getCoreContainer().getCore(core_name) is returning null - Solr 8.2.0

2019-08-29 Thread Arnold Bronley
@Andrea: I agree with you. Do you know if there is a way to initialize
SolrCloudClient directly from some information that I get
from SolrQueryRequest or from AddUpdateCommand object?

@Erick: Thank you for the information about
StatelessScriptUpdateProcessorFactory.

"In your situation, add this _before_ the update is distributed and instead
of
coreB, ask for collectionB."

Right, but how do I ask for for collectionB?

"Next, you want to get the value from “coreB”. Don’t do that, get it from
_collection_ B."

Right, but how do I get value _collection_B?



On Thu, Aug 29, 2019 at 2:17 PM Erick Erickson 
wrote:

> Have you looked at using one of the update processors?
>
> Consider StatelessScriptUpdateProcessorFactory for instance. You can do
> anything
> you’d like to do in a script (Groovy, Postscript. Python I think, and
> others). See:
> ./example/files/conf/update-script.js for one example.
>
> You put it in your solrconfig file in the update handler, then put the
> script in your
> conf directory and push it to ZK and the rest is automagical.
>
> There are a bunch of other update processors that you can use that are also
> pretty much by configuration, but the one I referenced is the one that is
> the
> most general-purpose.
>
> In your situation, add this _before_ the update is distributed and instead
> of
> coreB, ask for collectionB.
>
> Distributed updates go like this:
> 1. the doc gets routed to a leader for a shard
> 2. the doc gets forwarded to each replica.
>
> Now, depending on where you put the update processor (and you’ll have to
> dig a bit. Much of this distribution logic is implicit, but you can
> explicitly
> define it in solrconfig.xml), this either happens  _before_ the docs are
> sent
> to the rest of the replicas or _after_ the docs arrive at each replica.
> From what
> you’ve described, you want to do this before distribution so all copies
> have
> the new field. You don’t care what replica is the leader. You don’t care
> how many
> other replicas exist or where they are. You don’t even care if there’s any
> replica hosting this particular collection on the node that does this, it
> happens
> before distribution.
>
> Next, you want to get the value from “coreB”. Don’t do that, get it from
> _collection_ B. Since you have the doc ID (presumably the ),
> using get-by-id instead of a standard query will be very efficient. I can
> imagine
> under very heavy load this might introduce too much overhead, but it’s
> where I’d start.
>
> Best,
> Erick
>
> > On Aug 29, 2019, at 1:45 PM, Arnold Bronley 
> wrote:
> >
> > I can't use  CloudSolrClient  because I need to intercept the incoming
> > indexing request and then add one more field to it. All this happens on
> > Solr side and not client side.
> >
> > On Thu, Aug 29, 2019 at 1:05 PM Andrea Gazzarini 
> > wrote:
> >
> >> Hi Arnold,
> >> why don't you use solrj (in this case a CloudSolrClient) instead of
> dealing
> >> with such low-level details? The actual location of the document you are
> >> looking for would be completely abstracted.
> >>
> >> Best,
> >> Andrea
> >>
> >> On Thu, 29 Aug 2019, 18:50 Arnold Bronley, 
> >> wrote:
> >>
> >>> So, here is the problem that I am trying to solve. I am moving from
> Solr
> >>> master-slave architecture to SolrCloud architecture. I have one custom
> >> Solr
> >>> plugin that does following:
> >>>
> >>> 1. When a document (say document with unique id doc1)is getting indexed
> >> to
> >>> a core say core A then this plugin adds one more field to the indexing
> >>> request. It fetches this new field from core B. Core B in our case
> >>> maintains popularity score field for each document which gets
> calculated
> >> in
> >>> a different project. It fetches the popularity score from score B for
> >> doc1
> >>> and adds it to indexing request.
> >>> 2. In following code, dataInfo.dataSource is the name of the core B.
> >>>
> >>> I can use the name of the core B like collection_shard1_replica_n21 and
> >> it
> >>> works. But it is not a good solution. What if I had a multiple shards
> for
> >>> core B? In that case the the doc1 that I am trying to find might not be
> >>> present in collection_shard1_replica_n21.
> >>>
> >>> So is there something like,
> >>>
> >>> SolrCollecton dataCollection = getCollection(dataInfo.dataSource);
> >>>
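
To make Erick's get-by-id suggestion concrete, here is a hedged sketch of a
processor that does a real-time get against another collection; the
collection name "collectionB" and the field names are invented for
illustration, and the SolrClient is assumed to be created and closed by the
factory (as in the inform()/CloseHook sketch earlier in this digest):

import java.io.IOException;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

public class PopularityFromCollectionB extends UpdateRequestProcessor {

  private final SolrClient client; // shared client, owned by the factory

  public PopularityFromCollectionB(SolrClient client,
      UpdateRequestProcessor next) {
    super(next);
    this.client = client;
  }

  @Override
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    SolrInputDocument doc = cmd.getSolrInputDocument();
    String id = (String) doc.getFieldValue("id");
    try {
      // Real-time get against the *collection*; Solr routes the request to
      // whichever shard owns this id, so no core names are needed.
      SolrDocument source = client.getById("collectionB", id);
      if (source != null && source.getFieldValue("popularity") != null) {
        doc.setField("popularity", source.getFieldValue("popularity"));
      }
    } catch (SolrServerException e) {
      throw new IOException(e);
    }
    super.processAdd(cmd); // pass it up the chain
  }
}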

Re: req.getCore().getCoreContainer().getCore(core_name) is returning null - Solr 8.2.0

2019-08-29 Thread Arnold Bronley
I can't use  CloudSolrClient  because I need to intercept the incoming
indexing request and then add one more field to it. All this happens on
Solr side and not client side.

On Thu, Aug 29, 2019 at 1:05 PM Andrea Gazzarini 
wrote:

> Hi Arnold,
> why don't you use solrj (in this case a CloudSolrClient) instead of dealing
> with such low-level details? The actual location of the document you are
> looking for would be completely abstracted.
>
> Best,
> Andrea
>
> On Thu, 29 Aug 2019, 18:50 Arnold Bronley, 
> wrote:
>
> > So, here is the problem that I am trying to solve. I am moving from Solr
> > master-slave architecture to SolrCloud architecture. I have one custom
> Solr
> > plugin that does following:
> >
> > 1. When a document (say document with unique id doc1)is getting indexed
> to
> > a core say core A then this plugin adds one more field to the indexing
> > request. It fetches this new field from core B. Core B in our case
> > maintains popularity score field for each document which gets calculated
> in
> > a different project. It fetches the popularity score from score B for
> doc1
> > and adds it to indexing request.
> > 2. In following code, dataInfo.dataSource is the name of the core B.
> >
> > I can use the name of the core B like collection_shard1_replica_n21 and
> it
> > works. But it is not a good solution. What if I had a multiple shards for
> > core B? In that case the the doc1 that I am trying to find might not be
> > present in collection_shard1_replica_n21.
> >
> > So is there something like,
> >
> > SolrCollecton dataCollection = getCollection(dataInfo.dataSource);
> >
> > @Override
> > public void processAdd(AddUpdateCommand cmd) throws IOException {
> >SolrInputDocument doc = cmd.getSolrInputDocument();
> >String uniqueId = getUniqueId(doc);
> >
> >SolrCore dataCore =
> > req.getCore().getCoreContainer().getCore(dataInfo.dataSource);
> >
> >if (dataCore == null){
> >LOG.error("Solr core '{}' to use as data source could not be
> > found!  "
> >+ "Please check if it is loaded.", dataInfo.dataSource);
> >} else{
> >
> >   Document sourceDoc = getSourceDocument(dataCore, uniqueId);
> >
> >   if (sourceDoc != null){
> >
> >   populateDocToBeAddedFromSourceDoc(doc,sourceDoc);
> >   }
> >}
> >
> >// pass it up the chain
> >super.processAdd(cmd);
> > }
> >
> >
> > On Wed, Aug 28, 2019 at 6:15 PM Erick Erickson 
> > wrote:
> >
> > > No, you cannot just use the collection name. Replicas are just cores.
> > > You can host many replicas of a single collection on a single Solr node
> > > in a single CoreContainer (there’s only one per Solr JVM). If you just
> > > specified a collection name how would the code have any clue which
> > > of the possibilities to return?
> > >
> > > The name is in the form collection_shard1_replica_n21
> > >
> > > How do you know where the doc you’re working on? Put the ID through
> > > the hashing mechanism.
> > >
> > > This isn’t the same at all if you’re running stand-alone, then there’s
> > only
> > > one name.
> > >
> > > But as I indicated above, your ask for just using the collection name
> > isn’t
> > > going to work by definition.
> > >
> > > So perhaps this is an XY problem. You’re asking about getCore, which is
> > > a very specific, low-level concept. What are you trying to do at a
> higher
> > > level? Why do you think you need to get a core? What do you want to
> _do_
> > > with the doc that you need the core it resides in?
> > >
> > > Best,
> > > Erick
> > >
> > > > On Aug 28, 2019, at 5:28 PM, Arnold Bronley  >
> > > wrote:
> > > >
> > > > Wait, would I need to use core name like
> collection1_shard1_replica_n4
> > > > etc/? Can't I use collection name? What if  I have multiple shards,
> how
> > > > would I know where does the document that I am working with lives in
> > > > currently.
> > > > I would rather prefer to use collection name and expect the core
> > > > information to be abstracted out that way.
> > > >
> > > > On Wed, Aug 28, 2019 at 5:13 PM Erick Erickson <
> > erickerick...@gmail.com>
> > > > wrote:
> > > >
> > > >> Hmmm, should work. What is your core_name? There’s strings like
> > > >> collection1_shard1_replica_n4 and core_node6. Are you sure you’re
> > using
> > > the
> > > >> right one?
> > > >>
> > > >>> On Aug 28, 2019, at 3:56 PM, Arnold Bronley <
> arnoldbron...@gmail.com
> > >
> > > >> wrote:
> > > >>>
> > > >>> Hi,
> > > >>>
> > > >>> In a custom Solr plugin code,
> > > >>> req.getCore().getCoreContainer().getCore(core_name) is returning
> null
> > > >> even
> > > >>> if core by name core_name is loaded and up in Solr. req is object
> > > >>> of SolrQueryRequest class. I am using Solr 8.2.0 in SolrCloud mode.
> > > >>>
> > > >>> Any ideas on why this might be the case?
> > > >>
> > > >>
> > >
> > >
> >
>


Re: req.getCore().getCoreContainer().getCore(core_name) is returning null - Solr 8.2.0

2019-08-29 Thread Arnold Bronley
So, here is the problem that I am trying to solve. I am moving from Solr
master-slave architecture to SolrCloud architecture. I have one custom Solr
plugin that does the following:

1. When a document (say a document with unique id doc1) is getting indexed to
a core, say core A, this plugin adds one more field to the indexing
request. It fetches this new field from core B. Core B in our case
maintains a popularity score field for each document, which gets calculated in
a different project. The plugin fetches the popularity score from core B for doc1
and adds it to the indexing request.
2. In the following code, dataInfo.dataSource is the name of core B.

I can use the name of core B like collection_shard1_replica_n21 and it
works. But it is not a good solution. What if I had multiple shards for
core B? In that case, the doc1 that I am trying to find might not be
present in collection_shard1_replica_n21.

So is there something like,

SolrCollecton dataCollection = getCollection(dataInfo.dataSource);

@Override
public void processAdd(AddUpdateCommand cmd) throws IOException {
   SolrInputDocument doc = cmd.getSolrInputDocument();
   String uniqueId = getUniqueId(doc);

   SolrCore dataCore =
req.getCore().getCoreContainer().getCore(dataInfo.dataSource);

   if (dataCore == null){
   LOG.error("Solr core '{}' to use as data source could not be found!  "
   + "Please check if it is loaded.", dataInfo.dataSource);
   } else{

  Document sourceDoc = getSourceDocument(dataCore, uniqueId);

  if (sourceDoc != null){

  populateDocToBeAddedFromSourceDoc(doc,sourceDoc);
  }
   }

   // pass it up the chain
   super.processAdd(cmd);
}


On Wed, Aug 28, 2019 at 6:15 PM Erick Erickson 
wrote:

> No, you cannot just use the collection name. Replicas are just cores.
> You can host many replicas of a single collection on a single Solr node
> in a single CoreContainer (there’s only one per Solr JVM). If you just
> specified a collection name how would the code have any clue which
> of the possibilities to return?
>
> The name is in the form collection_shard1_replica_n21
>
> How do you know where the doc you’re working on? Put the ID through
> the hashing mechanism.
>
> This isn’t the same at all if you’re running stand-alone, then there’s only
> one name.
>
> But as I indicated above, your ask for just using the collection name isn’t
> going to work by definition.
>
> So perhaps this is an XY problem. You’re asking about getCore, which is
> a very specific, low-level concept. What are you trying to do at a higher
> level? Why do you think you need to get a core? What do you want to _do_
> with the doc that you need the core it resides in?
>
> Best,
> Erick
>
> > On Aug 28, 2019, at 5:28 PM, Arnold Bronley 
> wrote:
> >
> > Wait, would I need to use core name like  collection1_shard1_replica_n4
> > etc/? Can't I use collection name? What if  I have multiple shards, how
> > would I know where does the document that I am working with lives in
> > currently.
> > I would rather prefer to use collection name and expect the core
> > information to be abstracted out that way.
> >
> > On Wed, Aug 28, 2019 at 5:13 PM Erick Erickson 
> > wrote:
> >
> >> Hmmm, should work. What is your core_name? There’s strings like
> >> collection1_shard1_replica_n4 and core_node6. Are you sure you’re using
> the
> >> right one?
> >>
> >>> On Aug 28, 2019, at 3:56 PM, Arnold Bronley 
> >> wrote:
> >>>
> >>> Hi,
> >>>
> >>> In a custom Solr plugin code,
> >>> req.getCore().getCoreContainer().getCore(core_name) is returning null
> >> even
> >>> if core by name core_name is loaded and up in Solr. req is object
> >>> of SolrQueryRequest class. I am using Solr 8.2.0 in SolrCloud mode.
> >>>
> >>> Any ideas on why this might be the case?
> >>
> >>
>
>


Re: req.getCore().getCoreContainer().getCore(core_name) is returning null - Solr 8.2.0

2019-08-28 Thread Arnold Bronley
Wait, would I need to use a core name like collection1_shard1_replica_n4,
etc.? Can't I use the collection name? What if I have multiple shards? How
would I know where the document that I am working with currently lives?
I would rather prefer to use the collection name and expect the core
information to be abstracted out that way.

On Wed, Aug 28, 2019 at 5:13 PM Erick Erickson 
wrote:

> Hmmm, should work. What is your core_name? There’s strings like
> collection1_shard1_replica_n4 and core_node6. Are you sure you’re using the
> right one?
>
> > On Aug 28, 2019, at 3:56 PM, Arnold Bronley 
> wrote:
> >
> > Hi,
> >
> > In a custom Solr plugin code,
> > req.getCore().getCoreContainer().getCore(core_name) is returning null
> even
> > if core by name core_name is loaded and up in Solr. req is object
> > of SolrQueryRequest class. I am using Solr 8.2.0 in SolrCloud mode.
> >
> > Any ideas on why this might be the case?
>
>


Re: Turn off CDCR for only selected target clusters

2019-08-28 Thread Arnold Bronley
@Shawn: You are right. In my case, the collection name is the same as the
configuration name, and that is why it works. Do you know if there is some
other property I can use that refers to the collection name instead?

On Wed, Aug 28, 2019 at 3:52 PM Shawn Heisey  wrote:

> On 8/28/2019 1:42 PM, Arnold Bronley wrote:
> > I have configured the SolrCloud collection-wise only and there is no
> other
> > way. The way you have defined 3 zkHosts (comma separated values for
> zkHost
> > property), I tried that one before as it was more intuitive. But it did
> not
> > work for me. I had to use 3 different replica elements each for one of
> the
> > 3 SolrCloud clusters. source and target properties mention the same
> > collection name in my case. Instead of hardcoding it, I am using the
> > collection.configName variable which gets replaced by the collection name
> > to which this solrconfig.xml belongs to.
>
> I am pretty sure that ${collection.configName} refers to the
> configuration name stored in zookeeper, NOT the collection name.  There
> is nothing at all in Solr that requires those names to be the same, and
> for many SolrCloud installs, they are not the same.  If this is working
> for you, then you're probably naming your configs the same as the
> collection.  If you were to ever use the same config on multiple
> collections, that would probably stop working.
>
> I do not know if there is a property with the collection name.  There
> probably is.
>
> Thanks,
> Shawn
>


Re: 8.2.0 getting warning - unable to load jetty, not starting JettyAdminServer

2019-08-28 Thread Arnold Bronley
@Furkan: You might be right. I am getting this permission error when I
start Solr, but it hasn't caused any visible issues yet:
 /opt/solr/bin/solr: line 2130: /var/solr/solr-8983.pid: Permission denied

On Wed, Aug 21, 2019 at 6:33 AM Martijn Koster 
wrote:

> Hi Arnold,
>
> It’s hard to say without seeing exactly what you’re doing and exactly what
> you’re seeing.
> Simplify it first, ie remove your custom plugins and related config and
> see if the problem reproduces still, then try without cloud mode and see it
> it reproduces still. Then create an issue on
> https://github.com/docker-solr/docker-solr/issues <
> https://github.com/docker-solr/docker-solr/issues>, labelled as a
> question, with the exact command you run and its full output, and attach
> your zipped-up project directory (Dockerfile, config files and plugins, and
> full docker log output).
>
> — Martijn
>
> > On 20 Aug 2019, at 19:26, Arnold Bronley 
> wrote:
> >
> > Hi,
> >
> > I am using 8.2.0-slim version. I wrap it in my own image by specifying
> some
> > additional settings in Dockerfile (all it does is specify a custom Solr
> > home, copy my config files and custom Solr plugins to container and boot
> in
> > SolrCloud mode).
> > All things same, if I just change version from 8.2.0-slim to 8.1.1-slim
> > then I do not get any such warning.
> >
> > On Tue, Aug 20, 2019 at 5:01 AM Furkan KAMACI 
> > wrote:
> >
> >> Hi Arnold,
> >>
> >> Such errors may arise due to file permission issues. I can run latest
> >> version without of Solr via docker image without any errors. Could you
> >> write which steps do you follow to run Solr docker?
> >>
> >> Kind Regards,
> >> Furkan KAMACI
> >>
> >> On Tue, Aug 20, 2019 at 1:25 AM Arnold Bronley  >
> >> wrote:
> >>
> >>> Hi,
> >>>
> >>> I am getting following warning in Solr admin UI logs. I did not get
> this
> >>> warning in Solr 8.1.1
> >>> Please note that I am using Solr docker slim image from here -
> >>> https://hub.docker.com/_/solr/
> >>>
> >>> Unable to load jetty, not starting JettyAdminServer
> >>>
> >>
>
>


req.getCore().getCoreContainer().getCore(core_name) is returning null - Solr 8.2.0

2019-08-28 Thread Arnold Bronley
Hi,

In a custom Solr plugin code,
req.getCore().getCoreContainer().getCore(core_name) is returning null even
if core by name core_name is loaded and up in Solr. req is object
of SolrQueryRequest class. I am using Solr 8.2.0 in SolrCloud mode.

Any ideas on why this might be the case?


Re: Turn off CDCR for only selected target clusters

2019-08-28 Thread Arnold Bronley
Hi Erick,

I have configured CDCR collection-wise only, and there is no other
way. The way you have defined the 3 zkHosts (comma-separated values for the zkHost
property) is what I tried first, as it was more intuitive. But it did not
work for me. I had to use 3 different replica elements, one for each of the
3 SolrCloud clusters. The source and target properties mention the same
collection name in my case. Instead of hardcoding it, I am using the
collection.configName variable, which gets replaced by the collection name
to which this solrconfig.xml belongs.

If I follow your configuration (which does not work in my case, and I have
tested it), my question was how to NOT send CDCR updates to targetZkHost2
and targetZkHost3, while still sending them to targetZkHost1?

On Tue, Aug 13, 2019 at 3:23 PM Erick Erickson 
wrote:

> You configure CDCR by _collection_, so this question really makes no
> sense.
> You’d never mention collection.configName. So what I suspect is that you’re
> misreading the docs.
>
> <lst name="replica">
>   <str name="zkHost">${targetZkHost1},${targetZkHost2},${targetZkHost3}</str>
>   <str name="source">sourceCollection_on_local_cluster</str>
>   <str name="target">targetCollection_on_targetZkHost1 2 and 3</str>
> </lst>
>
> “Turning off CDCR” selective for ZooKeeper instances really makes no sense
> as the
> point of ZK ensembles is to keep running even if one goes away.
>
> So can you rephrase the question? Or state the problem you’re trying to
> solve another way?
>
> Best,
> Erick
>
> > On Aug 13, 2019, at 1:57 PM, Arnold Bronley 
> wrote:
> >
> > Hi,
> >
> > Is there a way to turn off the CDCR for only selected target clusters.
> >
> > Say, I have a configuration like following. I have 3 target clusters
> > targetZkHost1, targetZkHost2 and targetZkHost3. Is it possible to turn
> off
> > the CDCR for targetZkHost2 and targetZkHost3 but keep it on for
> > targetZkHost1?
> >
> > E.g.
> >
> > <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
> >   <lst name="replica">
> >     <str name="zkHost">${targetZkHost1}</str>
> >     <str name="source">${collection.configName}</str>
> >     <str name="target">${collection.configName}</str>
> >   </lst>
> >
> >   <lst name="replica">
> >     <str name="zkHost">${targetZkHost2}</str>
> >     <str name="source">${collection.configName}</str>
> >     <str name="target">${collection.configName}</str>
> >   </lst>
> >
> >   <lst name="replica">
> >     <str name="zkHost">${targetZkHost3}</str>
> >     <str name="source">${collection.configName}</str>
> >     <str name="target">${collection.configName}</str>
> >   </lst>
> >
> >   <lst name="replicator">
> >     <str name="threadPoolSize">8</str>
> >     <str name="schedule">1000</str>
> >     <str name="batchSize">128</str>
> >   </lst>
> >
> >   <lst name="updateLogSynchronizer">
> >     <str name="schedule">1000</str>
> >   </lst>
> >
> >   <lst name="buffer">
> >     <str name="defaultState">disabled</str>
> >   </lst>
> > </requestHandler>
>
>


Re: 8.2.0 getting warning - unable to load jetty, not starting JettyAdminServer

2019-08-20 Thread Arnold Bronley
Hi,

I am using the 8.2.0-slim version. I wrap it in my own image by specifying some
additional settings in the Dockerfile (all it does is specify a custom Solr
home, copy my config files and custom Solr plugins to the container, and boot in
SolrCloud mode).
All else being the same, if I just change the version from 8.2.0-slim to 8.1.1-slim,
then I do not get any such warning.

On Tue, Aug 20, 2019 at 5:01 AM Furkan KAMACI 
wrote:

> Hi Arnold,
>
> Such errors may arise due to file permission issues. I can run latest
> version without of Solr via docker image without any errors. Could you
> write which steps do you follow to run Solr docker?
>
> Kind Regards,
> Furkan KAMACI
>
> On Tue, Aug 20, 2019 at 1:25 AM Arnold Bronley 
> wrote:
>
> > Hi,
> >
> > I am getting following warning in Solr admin UI logs. I did not get this
> > warning in Solr 8.1.1
> > Please note that I am using Solr docker slim image from here -
> > https://hub.docker.com/_/solr/
> >
> > Unable to load jetty, not starting JettyAdminServer
> >
>


8.2.0 getting warning - unable to load jetty, not starting JettyAdminServer

2019-08-19 Thread Arnold Bronley
Hi,

I am getting the following warning in the Solr admin UI logs. I did not get this
warning in Solr 8.1.1.
Please note that I am using the Solr Docker slim image from here -
https://hub.docker.com/_/solr/

Unable to load jetty, not starting JettyAdminServer


Turn off CDCR for only selected target clusters

2019-08-13 Thread Arnold Bronley
Hi,

Is there a way to turn off CDCR for only selected target clusters?

Say I have a configuration like the following, with 3 target clusters:
targetZkHost1, targetZkHost2 and targetZkHost3. Is it possible to turn off
CDCR for targetZkHost2 and targetZkHost3 but keep it on for
targetZkHost1?

E.g.

<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
  <lst name="replica">
    <str name="zkHost">${targetZkHost1}</str>
    <str name="source">${collection.configName}</str>
    <str name="target">${collection.configName}</str>
  </lst>

  <lst name="replica">
    <str name="zkHost">${targetZkHost2}</str>
    <str name="source">${collection.configName}</str>
    <str name="target">${collection.configName}</str>
  </lst>

  <lst name="replica">
    <str name="zkHost">${targetZkHost3}</str>
    <str name="source">${collection.configName}</str>
    <str name="target">${collection.configName}</str>
  </lst>

  <lst name="replicator">
    <str name="threadPoolSize">8</str>
    <str name="schedule">1000</str>
    <str name="batchSize">128</str>
  </lst>

  <lst name="updateLogSynchronizer">
    <str name="schedule">1000</str>
  </lst>

  <lst name="buffer">
    <str name="defaultState">disabled</str>
  </lst>
</requestHandler>


Using custom scoring formula

2019-08-07 Thread Arnold Bronley
Hi,

I have a topic vector calculated for each Solr document in a
collection. The topic vector is calculated using LDA (
https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation). Now I want to
return documents similar to a given document from this collection. I can
simply use the normalized dot product between the given vector and all other
vectors to see which ones have a product of ~1. That will tell me that those
are very similar documents. Is there a way to achieve this using Solr?
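
For what it's worth, the normalized dot product described above is cosine
similarity; a bare-bones Java illustration (outside Solr, assuming dense
float[] topic vectors of equal length):

// Returns a value in [-1, 1]; ~1.0 means the two topic vectors are very
// similar, assuming neither vector is all zeros.
static double cosineSimilarity(float[] a, float[] b) {
  double dot = 0.0, normA = 0.0, normB = 0.0;
  for (int i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}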


Solr 7.7.2 vs Solr 8.2.0

2019-07-30 Thread Arnold Bronley
Hi,

We are trying to decide whether we should upgrade to Solr version 7.7.2 or
Solr version 8.2.0. We are currently on Solr 6.3.0.

On one hand, 8.2.0 feels like a good choice because it is the latest
version. But experience tells us that initial versions usually have a lot
of bugs compared to the later LTS versions.

Also, there is one more issue. There is this major JIRA bug,
https://issues.apache.org/jira/browse/SOLR-13336, which most likely won't get
fixed in any 7.x version but is fixed in Solr 8.1. I checked, and our Solr
configuration is vulnerable to it. Do you have any recommendation as to
which Solr version one should move to, given these facts?


getFields function in org.apache.lucene.document.Document class does not handle dynamic fields

2019-07-17 Thread Arnold Bronley
The following is the definition of the getFields function
in the org.apache.lucene.document.Document class. As you can see, it can't
handle dynamic fields, because dynamic fields have a pattern like
field_name_*, so the equals condition won't match in the following function.
Shouldn't we use the matches function instead of equals, so that a dynamic field
pattern can be passed to the function?

/**
 * Returns an array of {@link IndexableField}s with the given name.
 * This method returns an empty array when there are no
 * matching fields.  It never returns null.
 *
 * @param name the name of the field
 * @return a Field[] array
 */
public IndexableField[] getFields(String name) {
  List<IndexableField> result = new ArrayList<>();
  for (IndexableField field : fields) {
    if (field.name().equals(name)) {
      result.add(field);
    }
  }
  return result.toArray(new IndexableField[result.size()]);
}
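
As an aside, a caller can get a similar effect today without patching Lucene
by matching field names against the dynamic-field pattern itself; a small
hedged sketch, which handles only the simple trailing-* pattern:

import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexableField;

static IndexableField[] getFieldsMatching(Document doc, String pattern) {
  // Convert a Solr-style dynamic field pattern such as "field_name_*"
  // into the regular expression "field_name_.*".
  String regex = pattern.replace("*", ".*");
  List<IndexableField> result = new ArrayList<>();
  for (IndexableField field : doc.getFields()) {
    if (field.name().matches(regex)) {
      result.add(field);
    }
  }
  return result.toArray(new IndexableField[0]);
}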


Re: CDCR one source multiple targets

2019-04-10 Thread Arnold Bronley
This had a very simple solution, if anybody else is wondering about the same
issue: I had to define separate replica elements inside the cdcr request handler.
The following is an example.

<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
  <lst name="replica">
    <str name="zkHost">target1:2181</str>
    <str name="source">techproducts</str>
    <str name="target">techproducts</str>
  </lst>
  <lst name="replica">
    <str name="zkHost">target2:2181</str>
    <str name="source">techproducts</str>
    <str name="target">techproducts</str>
  </lst>
  <lst name="replicator">
    <str name="threadPoolSize">8</str>
    <str name="schedule">1000</str>
    <str name="batchSize">128</str>
  </lst>
  <lst name="updateLogSynchronizer">
    <str name="schedule">1000</str>
  </lst>
  <lst name="buffer">
    <str name="defaultState">disabled</str>
  </lst>
</requestHandler>

On Thu, Mar 21, 2019 at 10:40 AM Arnold Bronley 
wrote:

> I see a similar question asked but no answers there too.
> http://lucene.472066.n3.nabble.com/CDCR-Replication-from-one-source-to-multiple-targets-td4308717.html
> OP there is using multiple cdcr request handlers but in my case I am using
> multiple zkhost strings. It will be pretty limiting if we cannot use cdcr
> for one source- multiple target cluster situation.
> Can somebody please confirm whether this is even supported?
>
>
> On Wed, Mar 20, 2019 at 1:12 PM Arnold Bronley 
> wrote:
>
>> Hi,
>>
>> is it possible to use CDCR with one source SolrCloud cluster and multiple
>> target SolrCloud clusters? I tried to edit the zkHost setting in source
>> cluster's solrconfig file by adding multiple comma separated values for
>> target zkhosts for multuple target clusters. But the CDCR replication
>> happens only to one of the zkhosts and not all. If this is not supported
>> then how should I go about implementing something like this?
>>
>>
>


Re: CDCR one source multiple targets

2019-03-21 Thread Arnold Bronley
I see a similar question asked but no answers there too.
http://lucene.472066.n3.nabble.com/CDCR-Replication-from-one-source-to-multiple-targets-td4308717.html
The OP there is using multiple cdcr request handlers, but in my case I am using
multiple zkHost strings. It will be pretty limiting if we cannot use CDCR
in a one-source, multiple-target cluster situation.
Can somebody please confirm whether this is even supported?


On Wed, Mar 20, 2019 at 1:12 PM Arnold Bronley 
wrote:

> Hi,
>
> is it possible to use CDCR with one source SolrCloud cluster and multiple
> target SolrCloud clusters? I tried to edit the zkHost setting in source
> cluster's solrconfig file by adding multiple comma separated values for
> target zkhosts for multuple target clusters. But the CDCR replication
> happens only to one of the zkhosts and not all. If this is not supported
> then how should I go about implementing something like this?
>
>


CDCR one source multiple targets

2019-03-20 Thread Arnold Bronley
Hi,

is it possible to use CDCR with one source SolrCloud cluster and multiple
target SolrCloud clusters? I tried to edit the zkHost setting in the source
cluster's solrconfig file by adding multiple comma-separated values for the
target zkHosts for multiple target clusters. But the CDCR replication
happens only to one of the zkHosts and not all. If this is not supported,
then how should I go about implementing something like this?


Re: Bidirectional CDCR not working

2019-03-14 Thread Arnold Bronley
Thanks, Nish. It turned out to be another issue. I had not restarted one of
the nodes in the cluster, which had become leader in the meantime.
It is good to know, though, that there is malformed XML in the example. I
will try to submit a documentation fix soon.

On Thu, Mar 14, 2019 at 5:37 PM Nish Karve  wrote:

> Arnold,
>
> Have you copied the configuration from the Solr docs? The bi directional
> cluster configuration (for cluster 1) has a malformed XML. It is missing
> the closing tag for the updateLogSynchronizer under the request handler
> configuration.
>
> Please disregard if you have already considered that in your configuration.
> I had a lot of issues trying to figure out the issue when I realized that
> it was a documentation error.
>
> Thanks
> Nishant
>
>
> On Thu, Mar 14, 2019, 2:54 PM Arnold Bronley  wrote:
>
> > Configuration is almost identical for both clusters in terms of cdcr
> except
> > for zkHost parameter configuration.
> >
> > On Thu, Mar 14, 2019 at 3:45 PM Arnold Bronley 
> > wrote:
> >
> > > Exactly. I have it defined in both clusters. I am following the
> > > instructions from here .
> > >
> >
> https://lucene.apache.org/solr/guide/7_7/cdcr-config.html#bi-directional-updates
> > >
> > > On Thu, Mar 14, 2019 at 3:40 PM Amrit Sarkar 
> > > wrote:
> > >
> > >> Hi Arnold,
> > >>
> > >> You need "cdcr-processor-chain" definitions in solrconfig.xml on both
> > >> clusters' collections. Both clusters need to act as source and target.
> > >>
> > >> Amrit Sarkar
> > >> Search Engineer
> > >> Lucidworks, Inc.
> > >> 415-589-9269
> > >> www.lucidworks.com
> > >> Twitter http://twitter.com/lucidworks
> > >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> > >> Medium: https://medium.com/@sarkaramrit2
> > >>
> > >>
> > >> On Fri, Mar 15, 2019 at 1:03 AM Arnold Bronley <
> arnoldbron...@gmail.com
> > >
> > >> wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > I used unidirectional CDCR in SolrCloud (7.7.1) without any issues.
> > But
> > >> > after setting up bidirectional cdcr configuration, I am not able to
> > >> index a
> > >> > document.
> > >> >
> > >> > Following is the error that I am getting:
> > >> >
> > >> > Async exception during distributed update: Error from server at
> > >> > http://host1:8983/solr/techproducts_shard2_replica_n6: Bad Request
> > >> > request:
> > >> > http://host1:8983/solr/techproducts_shard2_replica_n6/update?update.chain=cdcr-processor-chain&update.distrib=TOLEADER&distrib.from=http://host2:8983/solr/techproducts_shard1_replica_n1&wt=javabin&version=2
> > >> > Remote error message: unknown UpdateRequestProcessorChain:
> > >> > cdcr-processor-chain
> > >> >
> > >> > Do you know why I might be getting this error?
> > >> >
> > >>
> > >
> >
>


Re: Bidirectional CDCR not working

2019-03-14 Thread Arnold Bronley
The configuration is almost identical for both clusters in terms of CDCR, except
for the zkHost parameter configuration.

On Thu, Mar 14, 2019 at 3:45 PM Arnold Bronley 
wrote:

> Exactly. I have it defined in both clusters. I am following the
> instructions from here .
> https://lucene.apache.org/solr/guide/7_7/cdcr-config.html#bi-directional-updates
>
> On Thu, Mar 14, 2019 at 3:40 PM Amrit Sarkar 
> wrote:
>
>> Hi Arnold,
>>
>> You need "cdcr-processor-chain" definitions in solrconfig.xml on both
>> clusters' collections. Both clusters need to act as source and target.
>>
>> Amrit Sarkar
>> Search Engineer
>> Lucidworks, Inc.
>> 415-589-9269
>> www.lucidworks.com
>> Twitter http://twitter.com/lucidworks
>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> Medium: https://medium.com/@sarkaramrit2
>>
>>
>> On Fri, Mar 15, 2019 at 1:03 AM Arnold Bronley 
>> wrote:
>>
>> > Hi,
>> >
>> > I used unidirectional CDCR in SolrCloud (7.7.1) without any issues. But
>> > after setting up bidirectional cdcr configuration, I am not able to
>> index a
>> > document.
>> >
>> > Following is the error that I am getting:
>> >
>> > Async exception during distributed update: Error from server at
>> > http://host1:8983/solr/techproducts_shard2_replica_n6: Bad Request
>> > request:
>> > http://host1:8983/solr/techproducts_shard2_replica_n6/update?update.chain=cdcr-processor-chain&update.distrib=TOLEADER&distrib.from=http://host2:8983/solr/techproducts_shard1_replica_n1&wt=javabin&version=2
>> > Remote error message: unknown UpdateRequestProcessorChain:
>> > cdcr-processor-chain
>> >
>> > Do you know why I might be getting this error?
>> >
>>
>


Re: Bidirectional CDCR not working

2019-03-14 Thread Arnold Bronley
Exactly. I have it defined in both clusters. I am following the
instructions from here .
https://lucene.apache.org/solr/guide/7_7/cdcr-config.html#bi-directional-updates

On Thu, Mar 14, 2019 at 3:40 PM Amrit Sarkar  wrote:

> Hi Arnold,
>
> You need "cdcr-processor-chain" definitions in solrconfig.xml on both
> clusters' collections. Both clusters need to act as source and target.
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> Medium: https://medium.com/@sarkaramrit2
>
>
> On Fri, Mar 15, 2019 at 1:03 AM Arnold Bronley 
> wrote:
>
> > Hi,
> >
> > I used unidirectional CDCR in SolrCloud (7.7.1) without any issues. But
> > after setting up bidirectional cdcr configuration, I am not able to
> index a
> > document.
> >
> > Following is the error that I am getting:
> >
> > Async exception during distributed update: Error from server at
> > http://host1:8983/solr/techproducts_shard2_replica_n6: Bad Request
> > request:
> > http://host1:8983/solr/techproducts_shard2_replica_n6/update?update.chain=cdcr-processor-chain&update.distrib=TOLEADER&distrib.from=http://host2:8983/solr/techproducts_shard1_replica_n1&wt=javabin&version=2
> > Remote error message: unknown UpdateRequestProcessorChain:
> > cdcr-processor-chain
> >
> > Do you know why I might be getting this error?
> >
>


Re: ExactStatsCache not working for distributed IDF

2019-03-14 Thread Arnold Bronley
Hi,

I tried that as well. No change in scores.

On Thu, Mar 14, 2019 at 3:37 PM Michael Gibney 
wrote:

> Are you basing your conclusion (that it's not working as expected) on the
> scores as reported in the debug output? If you haven't already, try adding
> "score" to the "fl" param -- if different (for a given doc) than the score
> as reported in debug, then it's probably working as intended ... just a
> little confusing in the debug output.
>
> On Thu, Mar 14, 2019 at 3:23 PM Arnold Bronley 
> wrote:
>
> > Hi,
> >
> > I am using ExactStatsCache in SolrCloud (7.7.1) by adding following to
> > solrconfig.xml file for all collections. I restarted and indexed the
> > documents of all collections after this change just to be sure.
> >
> > <statsCache class="org.apache.solr.search.stats.ExactStatsCache"/>
> >
> > However, when I do multi-collection query, the scores do not change
> before
> > and after adding ExactStatsCache. I can still see the docCount in debug
> > output coming from individual shards and not even from whole collection.
> I
> > was expecting that the docCount would be of addition of all docCounts of
> > all collections included in search query.
> >
> > Do you know what I might be doing wrong?
> >
>


Bidirectional CDCR not working

2019-03-14 Thread Arnold Bronley
Hi,

I used unidirectional CDCR in SolrCloud (7.7.1) without any issues. But
after setting up bidirectional cdcr configuration, I am not able to index a
document.

Following is the error that I am getting:

Async exception during distributed update: Error from server at
http://host1:8983/solr/techproducts_shard2_replica_n6: Bad Request request:
http://host1:8983/solr/techproducts_shard2_replica_n6/update?update.chain=cdcr-processor-chain&update.distrib=TOLEADER&distrib.from=http://host2:8983/solr/techproducts_shard1_replica_n1&wt=javabin&version=2
Remote error message: unknown UpdateRequestProcessorChain:
cdcr-processor-chain

Do you know why I might be getting this error?


ExactStatsCache not working for distributed IDF

2019-03-14 Thread Arnold Bronley
Hi,

I am using ExactStatsCache in SolrCloud (7.7.1) by adding the following to the
solrconfig.xml file for all collections. I restarted and re-indexed the
documents of all collections after this change, just to be sure.

<statsCache class="org.apache.solr.search.stats.ExactStatsCache"/>

However, when I do a multi-collection query, the scores do not change before
and after adding ExactStatsCache. I can still see the docCount in the debug
output coming from individual shards, and not from the whole collection. I
was expecting the docCount to be the sum of the docCounts of
all collections included in the search query.

Do you know what I might be doing wrong?


Re: SolrCloud exclusive features

2019-02-26 Thread Arnold Bronley
Here is what I have found on my own little research. Please correct me if I
am wrong. Also, please feel free to add more features.


   - Collections API
   - ConfigSets API
   - Zookeeper CLI
   - Streaming expressions
   - Parallel SQL interface
   - Authorization plugins
   - Blob store API


On Sat, Feb 16, 2019 at 7:07 PM Arnold Bronley 
wrote:

> I am glad to learn that there are others in similar need. A list for
> SolrCloud exclusive features will be really awesome.
> Can any Solr devs please reply to this thread?
>
>
> On Fri, Feb 15, 2019 at 8:39 AM David Hastings <
> hastings.recurs...@gmail.com> wrote:
>
>> >streaming expressions are only available in
>> SolrCloud mode and not in Solr master-slave mode?
>>
>> yes, and its annoying as there are features of solr cloud I do not like.
>> as far as a comprehensive list, that I do not know but would be interested
>> in one as well
>>
>> On Thu, Feb 14, 2019 at 5:07 PM Arnold Bronley 
>> wrote:
>>
>> > Hi,
>> >
>> > Are there any features that are only exclusive to SolrCloud?
>> >
>> > e.g. when I am reading Streaming Expressions documentation, first
>> sentence
>> > there says 'Streaming Expressions provide a simple yet powerful stream
>> > processing language for Solr Cloud.'
>> >
>> > So, does this mean that streaming expressions are only available in
>> > SolrCloud mode and not in Solr master-slave mode?
>> >
>> > If yes, is there a list of such features that only exclusively
>> available in
>> > SolrCloud?
>> >
>>
>


Re: SolrCloud exclusive features

2019-02-16 Thread Arnold Bronley
I am glad to learn that there are others in similar need. A list for
SolrCloud exclusive features will be really awesome.
Can any Solr devs please reply to this thread?


On Fri, Feb 15, 2019 at 8:39 AM David Hastings 
wrote:

> >streaming expressions are only available in
> SolrCloud mode and not in Solr master-slave mode?
>
> yes, and its annoying as there are features of solr cloud I do not like.
> as far as a comprehensive list, that I do not know but would be interested
> in one as well
>
> On Thu, Feb 14, 2019 at 5:07 PM Arnold Bronley 
> wrote:
>
> > Hi,
> >
> > Are there any features that are only exclusive to SolrCloud?
> >
> > e.g. when I am reading Streaming Expressions documentation, first
> sentence
> > there says 'Streaming Expressions provide a simple yet powerful stream
> > processing language for Solr Cloud.'
> >
> > So, does this mean that streaming expressions are only available in
> > SolrCloud mode and not in Solr master-slave mode?
> >
> > If yes, is there a list of such features that only exclusively available
> in
> > SolrCloud?
> >
>


SolrCloud exclusive features

2019-02-14 Thread Arnold Bronley
Hi,

Are there any features that are only exclusive to SolrCloud?

e.g. when I am reading Streaming Expressions documentation, first sentence
there says 'Streaming Expressions provide a simple yet powerful stream
processing language for Solr Cloud.'

So, does this mean that streaming expressions are only available in
SolrCloud mode and not in Solr master-slave mode?

If yes, is there a list of such features that are exclusively available in
SolrCloud?


Is there fl.encoder just like hl.encoder

2018-12-10 Thread Arnold Bronley
hl.encoder escapes HTML characters in the highlighted text response, except for the
highlighting HTML markup that Solr itself adds. Is there something similar
available for the field text that we get back in the response from Solr?


Re: Not able to reproduce race condition issue to justify implementation of optimistic concurrency

2018-11-16 Thread Arnold Bronley
Thanks for replying, Chris.

1) depending on the number of CPUs / load on your solr server, it's
possible you're just getting lucky. it's hard to "prove" with a
multithreaded test that concurrency bugs exist.

- Agreed. However, across 200k total calls, the race condition did not happen
even once; I feel 'too' lucky.

2)  a lot depends on what your updates look like (ie: the impl of
SolrDocWriter.atomicWrite()), and what the field definitions look like.

If you are in fact doing "atomic updates" (ie: sending a "set" command on
the field) instead of sending the whole document *AND* if the fields f1 &
f2 are fields that only use docValues (ie: not stored or indexed) then
under the covers you're getting an "in-place" update in which (IIRC) it's
totally safe for the 2 updates to happen concurrently to *DIFFERENT*
fields of the same document.

- The atomicWrite() function is just a simple wrapper function that adds set
and other appropriate atomic operators before indexing the payload.
- I am not using docValues for these fields. Here are their definitions:
   

  So I don't think I am benefiting from in-place updates.

- I will try the scenario of two different threads updating one
single field of the same document, instead of two different threads writing two
different fields of the same document.
- I was actually worried about a performance issue, because I do batch
indexing and I will need to send the whole batch again if any single
document in that batch fails with a 409 response. Otherwise I will need to
somehow retry the document that failed with the 409 response, although
identifying which document failed is only possible by parsing the 409
response message string, which doesn't seem like a good way of doing it.




On Fri, Nov 16, 2018 at 1:10 PM Chris Hostetter 
wrote:

>
> 1) depending on the number of CPUs / load on your solr server, it's
> possible you're just getting lucky. it's hard to "prove" with a
> multithreaded test that concurrency bugs exist.
>
> 2) a lot depends on what your updates look like (ie: the impl of
> SolrDocWriter.atomicWrite()), and what the field definitions look like.
>
> If you are in fact doing "atomic updates" (ie: sending a "set" command on
> the field) instead of sending the whole document *AND* if the fields f1 &
> f2 are fields that only use docValues (ie: not stored or indexed) then
> under the covers you're getting an "in-place" update in which (IIRC) it's
> totally safe for the 2 updates to happen concurrently to *DIFFERENT*
> fields of the same document.
>
> Where you are almost certainly going to get into trouble, even if you are
> leveraging "in-place" updates under the hood, is if 2 diff threads try to
> update the *SAME* field -- even if the individual threads don't try to
> assert that the final count matches their expected count, you will likely
> wind up missing some updates (ie: the final value may not be equal the sum
> of the total incremements from both threads)
>
> Other problems will exist in cases where in-place updates can't be used
> (ie: if you also updated a String field when incrememebting your numeric
> counter)
>
> The key thing to remember is that there is almost no overhead in using
> optimistic concurrency -- *UNLESS* you encounter a collision/failure.  If
> you are planning on having concurrent indexing clients reading docs from
> solr, modifying them, and writing back to solr -- and there is a change
> multiple client threads will touch the same document, then the slight
> addition of optimistic concurrency params to the updates & retrying on
> failure is a trivial addition to the client code, and shouldn't have a
> noticable impact on performance.
>
>
>
> : Before implementing optimistic concurrency solution, I had written one
> test
> : case to check if two threads atomically writing two different fields (say
> : f1 and f2) of the same document (say d) run into conflict or not.
> : Thread t1 atomically writes counter c1 to field f1 of document d, commits
> : and then reads the value of f1 and makes sure that it is equal to c1. It
> : then increments c1 by 1 and resumes until c1 reaches to say 1000.
> : Thread t2 does the same, but with counter c2 and field f2 but with same
> : document d.
> : What I observed is the assertion of f1 = c1 or f2 = c2 in each loop never
> : fails.
> : I increased the max counter value to even 10 instead of mere 1000 and
> : still no conflict
> : I was under the impression that there would often be conflict and that is
> : why I will require optimistic concurrency solution. How is this possible?
> : Any idea?
> :
> : Here is the test case code:
> :
> : https://pastebin.com/KCLPYqeg
> :
>
> -Hoss
> http://www.lucidworks.com/
>
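
To make the retry-on-conflict idea above concrete, here is a hedged SolrJ
sketch of an optimistic-concurrency increment; the collection name
"collection1" and field "f1" are invented, and the 409 handling assumes the
server surfaces the version conflict as a SolrException:

import java.io.IOException;
import java.util.Collections;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrException;
import org.apache.solr.common.SolrInputDocument;

static void incrementWithRetry(SolrClient client, String id)
    throws SolrServerException, IOException {
  while (true) {
    // Real-time get returns the latest _version_ along with the document.
    SolrDocument current = client.getById("collection1", id);
    long version = (Long) current.getFieldValue("_version_");
    Object f1 = current.getFieldValue("f1");
    long count = (f1 == null) ? 0L : ((Number) f1).longValue();

    SolrInputDocument update = new SolrInputDocument();
    update.setField("id", id);
    // Atomic "set" on f1; the supplied _version_ makes Solr reject the
    // update with a 409 if another writer got there first.
    update.setField("f1", Collections.singletonMap("set", count + 1));
    update.setField("_version_", version);
    try {
      client.add("collection1", update);
      return; // success
    } catch (SolrException e) {
      if (e.code() == 409) {
        continue; // version conflict: re-read and retry
      }
      throw e;
    }
  }
}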


Not able to reproduce race condition issue to justify implementation of optimistic concurrency

2018-11-16 Thread Arnold Bronley
Hi,

Before implementing the optimistic concurrency solution, I wrote one test
case to check whether two threads atomically writing two different fields (say
f1 and f2) of the same document (say d) run into a conflict or not.
Thread t1 atomically writes counter c1 to field f1 of document d, commits,
and then reads the value of f1 and makes sure that it is equal to c1. It
then increments c1 by 1 and resumes, until c1 reaches, say, 1000.
Thread t2 does the same, but with counter c2 and field f2, on the same
document d.
What I observed is that the assertion f1 = c1 or f2 = c2 in each loop never
fails.
I increased the max counter value to even 10 instead of a mere 1000, and
still no conflict.
I was under the impression that there would often be conflicts, and that is
why I would require the optimistic concurrency solution. How is this possible?
Any ideas?

Here is the test case code:

https://pastebin.com/KCLPYqeg


Re: Judging the MoreLikeThis results for relevancy

2018-02-13 Thread Arnold Bronley
Thanks for the reply,  Alessandro.

Can you please elaborate on the point "a document which has a score 50% of
the original doc score, it doesn't
mean it is 50% similar"? I did not understand this, for two reasons:

1. In the end, we are calculating a similarity score between documents when
we are solving the problem of search, where the search query is also treated as
a small document. Similarity has the inherent meaning of how similar one thing
is to another.

2. If we think about the vector representations of documents in
multidimensional space, we are basically calculating the "distance" between
these documents. We interpret that distance as "similarity". The farther away
the document vectors are in that space, the less similar those documents are to
each other. How we calculate the distance is one thing (e.g. cosine
distance, Euclidean distance, etc.), but once we agree upon a
distance/similarity calculation method, if document vector A is at a
distance of 5 and 10 units from document vectors B and C respectively, then
can't we say that B is twice as relevant to A as C is to A? Or, in terms of
distance, that C is twice as distant from A as B is?


I found this response from jlman in the following thread, which is very similar to my
solution.

http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=print_post=561671


He also warns about the scores between two documents not being
bidirectional.

If all else remains constant (relevancy algorithm, number of documents in the
index, etc.), why is the relevancy between two documents, calculated with the
approach that I mentioned, not bidirectional? That is, why is it possible
that document A is more similar to B than B is to A?
When I think in terms of multidimensional vector space, this does not make
sense at all, because the distance between A and B in multidimensional space
is not going to change provided all else remains constant (relevancy
algorithm, number of documents in the index, etc.). If A is at a distance of 5
units from B, then B is also at a distance of 5 units from A. Isn't it?

Thanks,
Arnold

On Thu, Feb 8, 2018 at 7:02 AM, Alessandro Benedetti 
wrote:

> Hi,
> I have been personally working a lot with the MoreLikeThis and I am close
> to
> contribute a refactor of that module ( to break up the monolithic giant
> facade class mostly) .
>
> First of all the MoreLikeThis handler will return the original document (
> not scored) + the similar documents(scored).
> The original document is not considered by the MoreLikeThis query, so it is
> not returned as part of the results of the MLT lucene query, it is just
> added to the response in the beginning.
>
> if I remember well, but I am unable to check at the moment, you should be
> able to get the original document in the response set ( with max score)
> using the More Like This query parser.
> Please double check that
>
> Generally speaking at the moment TF-IDF is used under the hood, which means
> that sometime the score is not probabilistic.
> So a document which has a score 50% of the original doc score, it doesn't
> mean it is 50% similar, but for your use case it may be a feasible
> approximation.
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Judging the MoreLikeThis results for relevancy

2018-02-07 Thread Arnold Bronley
Hi,

I am using MoreLikeThis handler to get related documents for a given
document. To determine if I am getting good results or not, here is what I
do:

The same original document should be returned as a top match.

If it is not, then there is some problem with the relevancy.

Then, as the same input document will be a 100% match with itself, we can use its
absolute score to compare how the other documents (ranked 2nd, ranked 3rd and
so on) are doing in terms of relevancy, by comparing their scores to the
score of the top result, which is the same input document.

Is this a good idea?

Do you see any flaw in this logic?


Re: Spellcheck collations results

2018-02-07 Thread Arnold Bronley
Thanks for replying, Alessandro.

I am passing these parameters:

q=polt&spellcheck.q=polt&wt=json&indent=true&spellcheck=true&spellcheck.count=7&spellcheck.extendedResults=true&spellcheck.collateExtendedResults=true&spellcheck.collate=true&spellcheck.maxCollations=3&spellcheck.maxCollationTries=3&spellcheck.onlyMorePopular=true&spellcheck.accuracy=0.72





On Thu, Jan 25, 2018 at 4:28 AM, alessandro.benedetti 
wrote:

> Can you tell us the request parameters used for the spellcheck ?
>
> In particular are you using these ? (from the wiki) :
>
> " The *spellcheck.maxCollationTries* Parameter
> This parameter specifies the number of collation possibilities for Solr to
> try before giving up. Lower values ensure better performance. Higher values
> may be necessary to find a collation that can return results. The default
> value is 0, which maintains backwards-compatible (Solr 1.4) behavior (do
> not
> check collations). This parameter is ignored if spellcheck.collate is
> false.
>
> The *spellcheck.maxCollationEvaluations* Parameter
> This parameter specifies the maximum number of word correction combinations
> to rank and evaluate prior to deciding which collation candidates to test
> against the index. This is a performance safety-net in case a user enters a
> query with many misspelled words. The default is 10,000 combinations, which
> should work well in most situations. "
>
> Regards
>
>
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>
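
A hedged sketch of this request issued from Python; the host and core name
are hypothetical, and the parameter set mirrors the query string above:

import requests

params = {
    "q": "polt",
    "spellcheck.q": "polt",
    "wt": "json",
    "spellcheck": "true",
    "spellcheck.count": 7,
    "spellcheck.onlyMorePopular": "true",
    "spellcheck.extendedResults": "true",
    "spellcheck.accuracy": 0.72,
    "spellcheck.collate": "true",
    "spellcheck.maxCollations": 3,
    # per the quoted wiki text, collations are only tested against the
    # index when maxCollationTries is greater than 0
    "spellcheck.maxCollationTries": 3,
    "spellcheck.collateExtendedResults": "true",
}
resp = requests.get("http://solr:8983/solr/myapp/select", params=params).json()
print(resp["spellcheck"].get("collations", []))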


Spellcheck collations results

2018-01-24 Thread Arnold Bronley
Hi,

In a spellchecker call, if I don't get a collations object back in the
response, is it correct to assume that even if I build a query myself by
joining the individually spell-corrected words from the suggestions object in
the response, it will have 0 results?

E.g., in the following spellchecker response for the query 'here lise the
grate mighty king':

"spellcheck": {
"suggestions": [
  "lise",
  {
"numFound": 7,
"startOffset": 0,
"endOffset": 11,
"origFreq": 0,
"suggestion": [
  {
"word": "lies",
"freq": 550
  }]
  },
  "grate",
  {
"numFound": 2,
"startOffset": 15,
"endOffset": 19,
"origFreq": 778,
"suggestion": [
  {
"word": "great",
"freq": 1580
  }
]
  } ]
  }
],
"correctlySpelled": false,
"collations": []
  }

There are no collation results returned. So if one decides to reconstruct the
corrected query with the help of the individually corrected words in the
suggestions object, is it reasonable to assume that such a corrected query
(in this case, 'here lies the great mighty king') will still return 0
results, because no collations were returned?
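
A hedged client-side sketch of that reconstruction: an empty collations list
can also mean that collation checking was disabled or that the tries were
exhausted, so it seems safer to rebuild the query from the top suggestion per
misspelled word and verify it with a real query rather than assume 0 results.
The suggestions structure mirrors the response above:

def rebuild_query(original_query, suggestions):
    # suggestions is the flat ["word", {...}, "word", {...}] list Solr returns
    fixed = original_query
    for wrong, info in zip(suggestions[::2], suggestions[1::2]):
        fixed = fixed.replace(wrong, info["suggestion"][0]["word"])
    return fixed

suggestions = [
    "lise", {"suggestion": [{"word": "lies", "freq": 550}]},
    "grate", {"suggestion": [{"word": "great", "freq": 1580}]},
]
q = rebuild_query("here lise the grate mighty king", suggestions)
# q == "here lies the great mighty king" -- now run q as a normal query and
# check numFound instead of assuming it returns 0 results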


Proximity search in spellcheck collation

2018-01-18 Thread Arnold Bronley
Hi,

Does the Solr spellcheck collator consider proximity between words in a
multi-word search phrase?

i.e., instead of returning spelling suggestions by considering each
individual word separately, does it consider them as a group if the words
often occur together?

E.g., 'bll gats' should return 'bill gates' instead of 'ball gate'.

Thanks
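
Not an authoritative answer, but one common client-side approximation: enable
collation verification (spellcheck.collate=true with maxCollationTries > 0)
and prefer the verified collation with the most hits, so that combinations
which actually co-occur in the index win out. A minimal sketch (the hit
counts below are made up for illustration):

def best_collation(collations):
    # collations is the flat ["collation", {...}, "collation", {...}] list
    # produced with spellcheck.collateExtendedResults=true
    entries = collations[1::2]
    if not entries:
        return None
    return max(entries, key=lambda c: c["hits"])["collationQuery"]

collations = [
    "collation", {"collationQuery": "bill gates", "hits": 120},
    "collation", {"collationQuery": "ball gate", "hits": 3},
]
print(best_collation(collations))  # -> "bill gates"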


Re: spell-check does not return collations when using search query with filter

2017-10-19 Thread Arnold Bronley
Let me know if I should open a JIRA issue for this. Thanks.

On Tue, Oct 17, 2017 at 10:40 AM, Arnold Bronley <arnoldbron...@gmail.com>
wrote:

> I tried spellcheck.q=polt and q=tag:polt. I get collations, but they are
> only for polt and not for tag:polt. Because of that, the hits that I get
> back reflect the frequency of plot, not the frequency of tag:plot.
>
> {
>   "responseHeader": {
> "status": 0,
> "QTime": 20,
> "params": {
>   "spellcheck.collateExtendedResults": "true",
>   "indent": "true",
>   "spellcheck.maxCollations": "3",
>   "spellcheck.maxCollationTries": "3",
>   "spellcheck.extendedResults": "true",
>   "q": "tag:polt",
>   "spellcheck.q": "polt",
>   "spellcheck": "true",
>   "spellcheck.accuracy": "0.72",
>   "spellcheck.onlyMorePopular": "true",
>   "spellcheck.count": "7",
>   "wt": "json",
>   "spellcheck.collate": "true"
> }
>   },
>   "response": {
> "numFound": 0,
> "start": 0,
> "docs": [
>
> ]
>   },
>   "spellcheck": {
> "suggestions": [
>   "polt",
>   {
> "numFound": 7,
> "startOffset": 0,
> "endOffset": 4,
> "origFreq": 0,
> "suggestion": [
>   {
> "word": "plot",
> "freq": 5934
>   },
>   {
> "word": "port",
> "freq": 495
>   },
>   {
> "word": "post",
> "freq": 233
>   },
>   {
> "word": "poly",
> "freq": 216
>   },
>   {
> "word": "pole",
> "freq": 175
>   },
>   {
> "word": "poll",
> "freq": 12
>   },
>   {
> "word": "polm",
> "freq": 9
>   }
> ]
>   }
> ],
> "correctlySpelled": false,
> "collations": [
>   "collation",
>   {
> "collationQuery": "plot",
> "hits": 10538,
> "misspellingsAndCorrections": [
>   "polt",
>   "plot"
> ]
>   },
>   "collation",
>   {
> "collationQuery": "port",
> "hits": 754,
> "misspellingsAndCorrections": [
>   "polt",
>   "port"
> ]
>   },
>   "collation",
>   {
> "collationQuery": "post",
> "hits": 626,
> "misspellingsAndCorrections": [
>   "polt",
>   "post"
> ]
>   }
> ]
>   }
> }
>
> On Tue, Oct 17, 2017 at 5:01 AM, alessandro.benedetti <
> a.benede...@sease.io> wrote:
>
>> But you used :
>>
>> "spellcheck.q": "tag:polt",
>>
>> Instead of :
>> "spellcheck.q": "polt",
>>
>> Regards
>>
>>
>>
>> -
>> ---
>> Alessandro Benedetti
>> Search Consultant, R&D Software Engineer, Director
>> Sease Ltd. - www.sease.io
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>>
>
>


Re: spell-check does not return collations when using search query with filter

2017-10-17 Thread Arnold Bronley
I tried spellcheck.q=polt and q=tag:polt. I get collations, but they are
only for polt and not for tag:polt. Because of that, the hits that I get
back reflect the frequency of plot, not the frequency of tag:plot.

{
  "responseHeader": {
"status": 0,
"QTime": 20,
"params": {
  "spellcheck.collateExtendedResults": "true",
  "indent": "true",
  "spellcheck.maxCollations": "3",
  "spellcheck.maxCollationTries": "3",
  "spellcheck.extendedResults": "true",
  "q": "tag:polt",
  "spellcheck.q": "polt",
  "spellcheck": "true",
  "spellcheck.accuracy": "0.72",
  "spellcheck.onlyMorePopular": "true",
  "spellcheck.count": "7",
  "wt": "json",
  "spellcheck.collate": "true"
}
  },
  "response": {
"numFound": 0,
"start": 0,
"docs": [

]
  },
  "spellcheck": {
"suggestions": [
  "polt",
  {
"numFound": 7,
"startOffset": 0,
"endOffset": 4,
"origFreq": 0,
"suggestion": [
  {
"word": "plot",
"freq": 5934
  },
  {
"word": "port",
"freq": 495
  },
  {
"word": "post",
"freq": 233
  },
  {
"word": "poly",
"freq": 216
  },
  {
"word": "pole",
"freq": 175
  },
  {
"word": "poll",
"freq": 12
  },
  {
"word": "polm",
"freq": 9
  }
]
  }
],
"correctlySpelled": false,
"collations": [
  "collation",
  {
"collationQuery": "plot",
"hits": 10538,
"misspellingsAndCorrections": [
  "polt",
  "plot"
]
  },
  "collation",
  {
"collationQuery": "port",
"hits": 754,
"misspellingsAndCorrections": [
  "polt",
  "port"
]
  },
  "collation",
  {
"collationQuery": "post",
"hits": 626,
"misspellingsAndCorrections": [
  "polt",
  "post"
]
  }
]
  }
}

On Tue, Oct 17, 2017 at 5:01 AM, alessandro.benedetti 
wrote:

> But you used :
>
> "spellcheck.q": "tag:polt",
>
> Instead of :
> "spellcheck.q": "polt",
>
> Regards
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: spell-check does not return collations when using search query with filter

2017-10-16 Thread Arnold Bronley
With q instead of spellcheck.q, I get the following response:


{
  "responseHeader": {
"status": 0,
"QTime": 23,
"params": {
  "q": "tag:polt",
  "spellcheck.collateExtendedResults": "true",
  "indent": "true",
  "spellcheck": "true",
  "spellcheck.accuracy": "0.72",
  "spellcheck.maxCollations": "3",
  "spellcheck.onlyMorePopular": "true",
  "spellcheck.count": "7",
  "spellcheck.maxCollationTries": "3",
  "wt": "json",
  "spellcheck.extendedResults": "true",
  "spellcheck.collate": "true"
}
  },
  "response": {
"numFound": 0,
"start": 0,
"docs": [

]
  },
  "spellcheck": {
"suggestions": [
  "polt",
  {
"numFound": 7,
"startOffset": 3,
"endOffset": 8,
"origFreq": 0,
"suggestion": [
  {
"word": "plot",
"freq": 5934
  },
  {
"word": "port",
"freq": 495
  },
  {
"word": "post",
"freq": 233
  },
  {
"word": "poly",
"freq": 216
  },
  {
"word": "pole",
"freq": 175
  },
  {
"word": "poll",
"freq": 12
  },
  {
"word": "polm",
"freq": 9
  }
]
  }
],
"correctlySpelled": false,
"collations": [

]
  }
}


With q and using the workaround that I mentioned, I get a proper response as
follows (note that I passed tag:\polt to q, but the responseHeader shows the
escaped version, i.e. tag:\\polt):

{
  "responseHeader": {
"status": 0,
"QTime": 20,
"params": {
  "q": "tag:\\polt",
  "spellcheck.collateExtendedResults": "true",
  "indent": "true",
  "spellcheck": "true",
  "spellcheck.accuracy": "0.72",
  "spellcheck.maxCollations": "3",
  "spellcheck.onlyMorePopular": "true",
  "spellcheck.count": "7",
  "spellcheck.maxCollationTries": "3",
  "wt": "json",
  "spellcheck.extendedResults": "true",
  "spellcheck.collate": "true"
}
  },
  "response": {
"numFound": 0,
"start": 0,
"docs": [

]
  },
  "spellcheck": {
"suggestions": [
  "polt",
  {
"numFound": 7,
"startOffset": 4,
"endOffset": 9,
"origFreq": 0,
"suggestion": [
  {
"word": "plot",
"freq": 5934
  },
  {
"word": "port",
    "freq": 495
  },
  {
"word": "post",
"freq": 233
  },
  {
"word": "poly",
"freq": 216
  },
  {
"word": "pole",
"freq": 175
  },
  {
"word": "poll",
"freq": 12
  },
  {
"word": "polm",
"freq": 9
  }
]
  }
],
"correctlySpelled": false,
"collations": [
  "collation",
  {
"collationQuery": "tag:plot",
"hits": 703,
"misspellingsAndCorrections": [
  "polt",
  "plot"
]
  },
  "collation",
  {
"collationQuery": "tag:port",
"hits": 8,
"misspellingsAndCorrections": [
  "polt",
  "port"
]
  },
  "collation",
  {
"collationQuery": "tag:post",
"hits": 3,
"misspellingsAndCorrections": [
  "polt",
  "post"
]
  }
]
  }
}

On Mon, Oct 16, 2017 at 3:00 PM, Arnold Bronley <arnoldbron...@gmail.com>
wrote:

> with spellcheck.q I don't get anything back at all.
>
> {
>   "responseHeader": {
> "status": 0,
> "QTime": 10,
> "params": {
>   "spellcheck.collateExtendedResults": "true",
>   "spellcheck.q": "tag:polt",
>   "indent": "true",
>   "spellcheck": "true",
>   "spellcheck.accuracy": "0.72",
>   "spellcheck.maxCollations": "3",
>   "spellcheck.onlyMorePopular": "true",
>   "spellcheck.count": "7",
>   "spellcheck.maxCollationTries": "3",
>   "wt": "json",
>   "spellcheck.extendedResults": "true",
>   "spellcheck.collate": "true"
> }
>   },
>   "response": {
> "numFound": 0,
> "start": 0,
> "docs": [
>
> ]
>   },
>   "spellcheck": {
> "suggestions": [
>
> ],
> "correctlySpelled": false,
> "collations": [
>
> ]
>   }
> }
>
> On Mon, Oct 16, 2017 at 5:03 AM, alessandro.benedetti <
> a.benede...@sease.io> wrote:
>
>> Interesting, what happens when you pass it as spellcheck.q=polt?
>> What is the behavior you get?
>>
>>
>>
>>
>>
>> -
>> ---
>> Alessandro Benedetti
>> Search Consultant, R&D Software Engineer, Director
>> Sease Ltd. - www.sease.io
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>>
>
>


Re: spell-check does not return collations when using search query with filter

2017-10-16 Thread Arnold Bronley
with spellcheck.q I don't get anything back at all.

{
  "responseHeader": {
"status": 0,
"QTime": 10,
"params": {
  "spellcheck.collateExtendedResults": "true",
  "spellcheck.q": "tag:polt",
  "indent": "true",
  "spellcheck": "true",
  "spellcheck.accuracy": "0.72",
  "spellcheck.maxCollations": "3",
  "spellcheck.onlyMorePopular": "true",
  "spellcheck.count": "7",
  "spellcheck.maxCollationTries": "3",
  "wt": "json",
  "spellcheck.extendedResults": "true",
  "spellcheck.collate": "true"
}
  },
  "response": {
"numFound": 0,
"start": 0,
"docs": [

]
  },
  "spellcheck": {
"suggestions": [

],
"correctlySpelled": false,
"collations": [

]
  }
}

On Mon, Oct 16, 2017 at 5:03 AM, alessandro.benedetti 
wrote:

> Interesting, what happens when you pass it as spellcheck.q=polt?
> What is the behavior you get?
>
>
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: spell-check does not return collations when using search query with filter

2017-10-14 Thread Arnold Bronley
Thanks for replying. I tried spellcheck.q=polt and it does not help.

Here is what the query looks like:
http://solr:8983/solr/myapp/select?q=tag:polt&wt=json&indent=true&spellcheck=true&spellcheck.count=7&spellcheck.onlyMorePopular=true&spellcheck.extendedResults=true&spellcheck.collate=true&spellcheck.maxCollations=3&spellcheck.maxCollationTries=3&spellcheck.collateExtendedResults=true

One workaround I found: if you add a \ before polt, i.e. tag:\polt, then it
works as expected and the collations object is included in the JSON response.
I definitely think this is a bug unless I am missing something obvious here.
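
A minimal sketch of that workaround from Python (host and core name are
hypothetical; the raw string keeps the backslash intact):

import requests

params = {
    "q": r"tag:\polt",  # backslash before the term, as described above
    "wt": "json",
    "spellcheck": "true",
    "spellcheck.collate": "true",
    "spellcheck.maxCollations": 3,
    "spellcheck.maxCollationTries": 3,
}
resp = requests.get("http://solr:8983/solr/myapp/select", params=params).json()
# with the workaround, each collationQuery keeps the field prefix
# (e.g. "tag:plot") and the hits reflect matches within the tag field
print(resp["spellcheck"]["collations"])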

On Mon, Oct 9, 2017 at 9:45 AM, alessandro.benedetti 
wrote:

> Does spellcheck.q=polt help?
> How do your queries normally look?
> How would you like the collation to be returned?
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


spell-check does not return collations when using search query with filter

2017-10-06 Thread Arnold Bronley
When 'polt' is passed as the keyword, both the suggestions and collations
parameters are returned. But if I pass 'tag:polt' as the search query, then
only the suggestions parameter is returned. Is this a bug?


How to remove control characters in stored value at Solr side

2017-09-14 Thread Arnold Bronley
I know I can apply PatternReplaceFilterFactory to remove control characters
from the indexed value. However, is it possible to do a similar thing for the
stored value? Because of control characters included in the indexing request,
Solr throws an Illegal Character Exception.
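
Since the analysis chain only affects indexed tokens, the stored value has to
be cleaned before it reaches Solr (either client-side, or server-side in an
update request processor). A hedged client-side sketch:

import re

# ASCII control characters, excluding tab, newline and carriage return,
# which are legal in XML/JSON payloads
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")

def clean(value):
    return CONTROL_CHARS.sub("", value)

doc = {"id": "1", "body": clean("bad\x02value")}
# index doc as usual; the stored field no longer contains control characters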


origFreq/freq ratio for filtering spell-check suggestions

2017-09-07 Thread Arnold Bronley
Hi Solr users,

I can see there are some parameters that can help control the trigger
condition for the spellcheck mechanism or filter the spelling suggestions,
such as maxQueryFrequency or thresholdTokenFrequency. I could not find a
parameter that filters the suggestions based on the (origFreq/freq) ratio.
Is there any parameter like this, or will I need to add custom logic on the
client side to handle this? Please help.
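
In case it helps: a hedged sketch of the client-side fallback, filtering
suggestions by the (origFreq/freq) ratio with an arbitrary cutoff (the sample
frequencies are taken from the 'grate'/'great' example earlier in this
digest):

def filter_suggestions(suggestions, max_ratio=0.5):
    # keep only corrections whose origFreq/freq ratio is at most max_ratio,
    # i.e. the correction is sufficiently more frequent than the original
    kept = []
    for word, info in zip(suggestions[::2], suggestions[1::2]):
        orig_freq = info.get("origFreq", 0)
        for s in info["suggestion"]:
            if s["freq"] > 0 and orig_freq / s["freq"] <= max_ratio:
                kept.append((word, s["word"]))
    return kept

suggestions = [
    "grate", {"origFreq": 778, "suggestion": [{"word": "great", "freq": 1580}]},
]
print(filter_suggestions(suggestions))  # -> [('grate', 'great')]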