Re: Filter by sibling ?

2021-03-02 Thread Joel Bernstein
Solr's graph expressions can do this type of thing. They allow you to walk
the relationships in a graph while applying filters:

https://lucene.apache.org/solr/guide/8_6/graph-traversal.html
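
For the nested-docs question below, a rough, untested sketch of that kind of
walk (assuming the child documents carry the parent id in a _root_ field and
that the walked/gathered fields have docValues; collection and field names are
placeholders):

nodes(myCollection,
      search(myCollection,
             q="type:C2 AND status:Done",
             fl="_root_",
             sort="_root_ asc",
             qt="/select",
             rows="10000"),
      walk="_root_->_root_",
      fq="type:C1",
      gather="id")

The inner search finds the C2 children that match the condition, and the outer
nodes() walks back through the shared parent id to gather the sibling C1
documents.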



Joel Bernstein
http://joelsolr.blogspot.com/


On Tue, Mar 2, 2021 at 9:00 AM Manoj Mokashi 
wrote:

> Hi,
>
> If I have a nested document structure, with say parent type:PR, child 1
> type:C1 and child2 type:C2,
> would it be possible to fetch documents of type C1 that are children of
> parents that have child2 docs with a certain condition?
> e.g. for
> { type:PR,
>   Title: "XXX",
>   Children1 : [ { type:C1, city:ABC} ],
>   Children2 : [ { type:C2, status:Done}]
> }
>
> Can I fetch type:C1 documents which are children of parent docs that have
> child C2 docs with status:Done ?
>
> Regards,
> manoj
>
>


Re: Idle timeout expired and Early Client Disconnect errors

2021-03-01 Thread Joel Bernstein
Also the parallel function builds hash partitioning filters that could lead
to timeouts if they take too long to build. Try the query without the
parallel function if you're still getting timeouts when making the query
smaller.
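
For example, something shaped like this (a sketch; collection and field names
are placeholders):

parallel(workerCollection,
         rollup(
             search(colA,
                    q="*:*",
                    qt="/export",
                    fl="user.id",
                    sort="user.id asc",
                    partitionKeys="user.id"),
             over="user.id", count(*)),
         workers="4",
         sort="user.id asc")

can first be tried as a plain rollup, which skips building the hash
partitioning filters entirely:

rollup(
    search(colA,
           q="*:*",
           qt="/export",
           fl="user.id",
           sort="user.id asc"),
    over="user.id", count(*))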



Joel Bernstein
http://joelsolr.blogspot.com/


On Mon, Mar 1, 2021 at 4:03 PM Joel Bernstein  wrote:

> The settings in your version are 30 seconds and 15 seconds for socket and
> connection timeouts.
>
> Typically timeouts occur because one or more shards in the query are idle
> beyond the timeout threshold. This happens because lots of data is being
> read from other shards.
>
> Breaking the query into small parts would be a good strategy.
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Mon, Mar 1, 2021 at 3:30 PM ufuk yılmaz 
> wrote:
>
>> Hello Mr. Bernstein,
>>
>> I’m using version 8.4. So, if I understand correctly, I can’t increase
>> timeouts and they are bound to happen in such a large stream. Should I just
>> reduce the output of my search expressions?
>>
>> Maybe I can split my search results into ~100 parts and run the same
>> query 100 times in series. Each part would emit ~3M documents so they
>> should finish before timeout?
>>
>> Is this a reasonable solution?
>>
>> Btw how long is the default hard-coded timeout value? Because yesterday I
>> ran another query which took more than 1 hour without any timeouts and
>> finished successfully.
>>
>> Sent from Mail for Windows 10
>>
>> From: Joel Bernstein
>> Sent: 01 March 2021 23:03
>> To: solr-user@lucene.apache.org
>> Subject: Re: Idle timeout expired and Early Client Disconnect errors
>>
>> Oh wait, I misread your email. The idle timeout issue is configurable in:
>>
>> https://issues.apache.org/jira/browse/SOLR-14672
>>
>> This unfortunately missed the 8.8 release and will be 8.9.
>>
>>
>>
>> This i
>>
>>
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>>
>> On Mon, Mar 1, 2021 at 2:56 PM Joel Bernstein  wrote:
>>
>> > What version are you using?
>> >
>> > Solr 8.7 has changes that caused these errors to hit the logs. These
>> used
>> > to be suppressed. This has been fixed in Solr 9.0 but it has not been
>> back
>> > ported to Solr 8.x.
>> >
>> > The errors are actually normal operational occurrences when doing joins
>> so
>> > should be suppressed in the logs and were before the specific release.
>> >
>> > It might make sense to do a release that specifically suppresses these
>> > errors without backporting the full Solr 9.0 changes which impact the
>> > memory footprint of export.
>> >
>> >
>> >
>> >
>> > Joel Bernstein
>> > http://joelsolr.blogspot.com/
>> >
>> >
>> > On Mon, Mar 1, 2021 at 10:29 AM ufuk yılmaz > >
>> > wrote:
>> >
>> >> Hello all,
>> >>
>> >> I’m running a large streaming expression and feeding the result to
>> update
>> >> expression.
>> >>
>> >>  update(targetCollection, ...long running stream here...,
>> >>
>> >> I tried sending the exact same query multiple times, it sometimes works
>> >> and indexes some results, then gives exception, other times fails with
>> an
>> >> exception after 2 minutes.
>> >>
>> >> Response is like:
>> >> "EXCEPTION":"java.util.concurrent.ExecutionException:
>> >> java.io.IOException: params distrib=false=4 and my long
>> >> stream expression
>> >>
>> >> Server log (short):
>> >> [c:DNM s:shard1 r:core_node2 x:DNM_shard1_replica_n1]
>> >> o.a.s.s.HttpSolrCall null:java.io.IOException:
>> >> java.util.concurrent.TimeoutException: Idle timeout expired:
>> 12/12
>> >> ms
>> >> o.a.s.s.HttpSolrCall null:java.io.IOException:
>> >> java.util.concurrent.TimeoutException: Idle timeout expired:
>> 12/12
>> >> ms
>> >>
>> >> I tried to increase the jetty idle timeout value on the node which
>> hosts
>> >> my target collection to something like an hour. It didn’t affect.
>> >>
>> >>
>> >> Server logs (long)
>> >> ERROR (qtp832292933-589) [c:DNM s:shard1 r:core_node2
>> >> x:DNM_shard1_replica_n1] o.a.s.s.HttpSolrCall null:java.io.IOException:
>> >> java.util.concur

Re: Idle timeout expired and Early Client Disconnect errors

2021-03-01 Thread Joel Bernstein
The settings in your version are 30 seconds and 15 seconds for socket and
connection timeouts.

Typically timeouts occur because one or more shards in the query are idle
beyond the timeout threshold. This happens because lots of data is being
read from other shards.

Breaking the query into small parts would be a good strategy.




Joel Bernstein
http://joelsolr.blogspot.com/


On Mon, Mar 1, 2021 at 3:30 PM ufuk yılmaz 
wrote:

> Hello Mr. Bernstein,
>
> I’m using version 8.4. So, if I understand correctly, I can’t increase
> timeouts and they are bound to happen in such a large stream. Should I just
> reduce the output of my search expressions?
>
> Maybe I can split my search results into ~100 parts and run the same query
> 100 times in series. Each part would emit ~3M documents so they should
> finish before timeout?
>
> Is this a reasonable solution?
>
> Btw how long is the default hard-coded timeout value? Because yesterday I
> ran another query which took more than 1 hour without any timeouts and
> finished successfully.
>
> Sent from Mail for Windows 10
>
> From: Joel Bernstein
> Sent: 01 March 2021 23:03
> To: solr-user@lucene.apache.org
> Subject: Re: Idle timeout expired and Early Client Disconnect errors
>
> Oh wait, I misread your email. The idle timeout issue is configurable in:
>
> https://issues.apache.org/jira/browse/SOLR-14672
>
> This unfortunately missed the 8.8 release and will be 8.9.
>
>
>
> This i
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Mon, Mar 1, 2021 at 2:56 PM Joel Bernstein  wrote:
>
> > What version are you using?
> >
> > Solr 8.7 has changes that caused these errors to hit the logs. These used
> > to be suppressed. This has been fixed in Solr 9.0 but it has not been
> back
> > ported to Solr 8.x.
> >
> > The errors are actually normal operational occurrences when doing joins
> so
> > should be suppressed in the logs and were before the specific release.
> >
> > It might make sense to do a release that specifically suppresses these
> > errors without backporting the full Solr 9.0 changes which impact the
> > memory footprint of export.
> >
> >
> >
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> >
> > On Mon, Mar 1, 2021 at 10:29 AM ufuk yılmaz  >
> > wrote:
> >
> >> Hello all,
> >>
> >> I’m running a large streaming expression and feeding the result to
> update
> >> expression.
> >>
> >>  update(targetCollection, ...long running stream here...,
> >>
> >> I tried sending the exact same query multiple times, it sometimes works
> >> and indexes some results, then gives exception, other times fails with
> an
> >> exception after 2 minutes.
> >>
> >> Response is like:
> >> "EXCEPTION":"java.util.concurrent.ExecutionException:
> >> java.io.IOException: params distrib=false=4 and my long
> >> stream expression
> >>
> >> Server log (short):
> >> [c:DNM s:shard1 r:core_node2 x:DNM_shard1_replica_n1]
> >> o.a.s.s.HttpSolrCall null:java.io.IOException:
> >> java.util.concurrent.TimeoutException: Idle timeout expired:
> 12/12
> >> ms
> >> o.a.s.s.HttpSolrCall null:java.io.IOException:
> >> java.util.concurrent.TimeoutException: Idle timeout expired:
> 12/12
> >> ms
> >>
> >> I tried to increase the jetty idle timeout value on the node which hosts
> >> my target collection to something like an hour. It didn’t affect.
> >>
> >>
> >> Server logs (long)
> >> ERROR (qtp832292933-589) [c:DNM s:shard1 r:core_node2
> >> x:DNM_shard1_replica_n1] o.a.s.s.HttpSolrCall null:java.io.IOException:
> >> java.util.concurrent.TimeoutException: Idle timeout expired: 1
> >> 2/12 ms
> >> solr-01|at
> >>
> org.eclipse.jetty.util.SharedBlockingCallback$Blocker.block(SharedBlockingCallback.java:235)
> >> solr-01|at
> >> org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:226)
> >> solr-01|at
> >> org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:524)
> >> solr-01|at
> >>
> org.apache.solr.servlet.ServletOutputStreamWrapper.write(ServletOutputStreamWrapper.java:134)
> >> solr-01|at
> >> java.base/sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:233)
> >> solr-01|at
> >> java.base/sun.nio.cs.StreamEncoder.implWrite(Stream

Re: Idle timeout expired and Early Client Disconnect errors

2021-03-01 Thread Joel Bernstein
Oh wait, I misread your email. The idle timeout issue is configurable in:

https://issues.apache.org/jira/browse/SOLR-14672

This unfortunately missed the 8.8 release and will be in 8.9.



This i



Joel Bernstein
http://joelsolr.blogspot.com/


On Mon, Mar 1, 2021 at 2:56 PM Joel Bernstein  wrote:

> What version are you using?
>
> Solr 8.7 has changes that caused these errors to hit the logs. These used
> to be suppressed. This has been fixed in Solr 9.0 but it has not been back
> ported to Solr 8.x.
>
> The errors are actually normal operational occurrences when doing joins so
> should be suppressed in the logs and were before the specific release.
>
> It might make sense to do a release that specifically suppresses these
> errors without backporting the full Solr 9.0 changes which impact the
> memory footprint of export.
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Mon, Mar 1, 2021 at 10:29 AM ufuk yılmaz 
> wrote:
>
>> Hello all,
>>
>> I’m running a large streaming expression and feeding the result to update
>> expression.
>>
>>  update(targetCollection, ...long running stream here...,
>>
>> I tried sending the exact same query multiple times, it sometimes works
>> and indexes some results, then gives exception, other times fails with an
>> exception after 2 minutes.
>>
>> Response is like:
>> "EXCEPTION":"java.util.concurrent.ExecutionException:
>> java.io.IOException: params distrib=false=4 and my long
>> stream expression
>>
>> Server log (short):
>> [c:DNM s:shard1 r:core_node2 x:DNM_shard1_replica_n1]
>> o.a.s.s.HttpSolrCall null:java.io.IOException:
>> java.util.concurrent.TimeoutException: Idle timeout expired: 12/12
>> ms
>> o.a.s.s.HttpSolrCall null:java.io.IOException:
>> java.util.concurrent.TimeoutException: Idle timeout expired: 12/12
>> ms
>>
>> I tried to increase the jetty idle timeout value on the node which hosts
>> my target collection to something like an hour. It didn’t affect.
>>
>>
>> Server logs (long)
>> ERROR (qtp832292933-589) [c:DNM s:shard1 r:core_node2
>> x:DNM_shard1_replica_n1] o.a.s.s.HttpSolrCall null:java.io.IOException:
>> java.util.concurrent.TimeoutException: Idle timeout expired: 1
>> 2/12 ms
>> solr-01|at
>> org.eclipse.jetty.util.SharedBlockingCallback$Blocker.block(SharedBlockingCallback.java:235)
>> solr-01|at
>> org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:226)
>> solr-01|at
>> org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:524)
>> solr-01|at
>> org.apache.solr.servlet.ServletOutputStreamWrapper.write(ServletOutputStreamWrapper.java:134)
>> solr-01|at
>> java.base/sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:233)
>> solr-01|at
>> java.base/sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:303)
>> solr-01|at
>> java.base/sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:281)
>> solr-01|at
>> java.base/sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
>> solr-01|at java.base/java.io
>> .OutputStreamWriter.write(OutputStreamWriter.java:211)
>> solr-01|at
>> org.apache.solr.common.util.FastWriter.flush(FastWriter.java:140)
>> solr-01|at
>> org.apache.solr.common.util.FastWriter.write(FastWriter.java:54)
>> solr-01|at
>> org.apache.solr.response.JSONWriter._writeChar(JSONWriter.java:173)
>> solr-01|at
>> org.apache.solr.common.util.JsonTextWriter.writeStr(JsonTextWriter.java:86)
>> solr-01|at
>> org.apache.solr.common.util.TextWriter.writeVal(TextWriter.java:52)
>> solr-01|at
>> org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:152)
>> solr-01|at
>> org.apache.solr.common.util.JsonTextWriter$2.put(JsonTextWriter.java:176)
>> solr-01|at
>> org.apache.solr.common.MapWriter$EntryWriter.put(MapWriter.java:154)
>> solr-01|at
>> org.apache.solr.handler.export.StringFieldWriter.write(StringFieldWriter.java:77)
>> solr-01|at
>> org.apache.solr.handler.export.ExportWriter.writeDoc(ExportWriter.java:313)
>> solr-01|at
>> org.apache.solr.handler.export.ExportWriter.lambda$addDocsToItemWriter$4(ExportWriter.java:263)
>> --
>> solr-01|at org.eclipse.jetty.io
>> .FillInterest.fillable(FillInterest.java:103)
>> solr-01|at org.eclipse.jetty.io
>> .ChannelEndPoint$2.run(ChannelEndPoi

Re: Idle timeout expired and Early Client Disconnect errors

2021-03-01 Thread Joel Bernstein
What version are you using?

Solr 8.7 has changes that caused these errors to hit the logs. These used
to be suppressed. This has been fixed in Solr 9.0 but it has not been back
ported to Solr 8.x.

The errors are actually normal operational occurrences when doing joins, so
they should be suppressed in the logs, as they were before that specific release.

It might make sense to do a release that specifically suppresses these
errors without backporting the full Solr 9.0 changes which impact the
memory footprint of export.




Joel Bernstein
http://joelsolr.blogspot.com/


On Mon, Mar 1, 2021 at 10:29 AM ufuk yılmaz 
wrote:

> Hello all,
>
> I’m running a large streaming expression and feeding the result to update
> expression.
>
>  update(targetCollection, ...long running stream here...,
>
> I tried sending the exact same query multiple times, it sometimes works
> and indexes some results, then gives exception, other times fails with an
> exception after 2 minutes.
>
> Response is like:
> "EXCEPTION":"java.util.concurrent.ExecutionException: java.io.IOException:
> params distrib=false=4 and my long stream expression
>
> Server log (short):
> [c:DNM s:shard1 r:core_node2 x:DNM_shard1_replica_n1] o.a.s.s.HttpSolrCall
> null:java.io.IOException: java.util.concurrent.TimeoutException: Idle
> timeout expired: 12/12 ms
> o.a.s.s.HttpSolrCall null:java.io.IOException:
> java.util.concurrent.TimeoutException: Idle timeout expired: 12/12
> ms
>
> I tried to increase the jetty idle timeout value on the node which hosts
> my target collection to something like an hour. It didn’t affect.
>
>
> Server logs (long)
> ERROR (qtp832292933-589) [c:DNM s:shard1 r:core_node2
> x:DNM_shard1_replica_n1] o.a.s.s.HttpSolrCall null:java.io.IOException:
> java.util.concurrent.TimeoutException: Idle timeout expired: 1
> 2/12 ms
> solr-01|at
> org.eclipse.jetty.util.SharedBlockingCallback$Blocker.block(SharedBlockingCallback.java:235)
> solr-01|at
> org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:226)
> solr-01|at
> org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:524)
> solr-01|at
> org.apache.solr.servlet.ServletOutputStreamWrapper.write(ServletOutputStreamWrapper.java:134)
> solr-01|at
> java.base/sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:233)
> solr-01|at
> java.base/sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:303)
> solr-01|at
> java.base/sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:281)
> solr-01|at
> java.base/sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
> solr-01|at java.base/java.io
> .OutputStreamWriter.write(OutputStreamWriter.java:211)
> solr-01|at
> org.apache.solr.common.util.FastWriter.flush(FastWriter.java:140)
> solr-01|at
> org.apache.solr.common.util.FastWriter.write(FastWriter.java:54)
> solr-01|at
> org.apache.solr.response.JSONWriter._writeChar(JSONWriter.java:173)
> solr-01|at
> org.apache.solr.common.util.JsonTextWriter.writeStr(JsonTextWriter.java:86)
> solr-01|at
> org.apache.solr.common.util.TextWriter.writeVal(TextWriter.java:52)
> solr-01|at
> org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:152)
> solr-01|at
> org.apache.solr.common.util.JsonTextWriter$2.put(JsonTextWriter.java:176)
> solr-01|at
> org.apache.solr.common.MapWriter$EntryWriter.put(MapWriter.java:154)
> solr-01|at
> org.apache.solr.handler.export.StringFieldWriter.write(StringFieldWriter.java:77)
> solr-01|at
> org.apache.solr.handler.export.ExportWriter.writeDoc(ExportWriter.java:313)
> solr-01|at
> org.apache.solr.handler.export.ExportWriter.lambda$addDocsToItemWriter$4(ExportWriter.java:263)
> --
> solr-01|at org.eclipse.jetty.io
> .FillInterest.fillable(FillInterest.java:103)
> solr-01|at org.eclipse.jetty.io
> .ChannelEndPoint$2.run(ChannelEndPoint.java:117)
> solr-01|at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
> solr-01|at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
> solr-01|at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
> solr-01|at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
> solr-01|at
> org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
> solr-01|at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:781)
> solr-

Re: Congratulations to the new Apache Solr PMC Chair, Jan Høydahl!

2021-02-27 Thread Joel Bernstein
Congratulations Jan!

Joel Bernstein
http://joelsolr.blogspot.com/


On Mon, Feb 22, 2021 at 2:41 AM Danilo Tomasoni  wrote:

> Congratulations Jan!
>
> Danilo Tomasoni
>
> Fondazione The Microsoft Research - University of Trento Centre for
> Computational and Systems Biology (COSBI)
> Piazza Manifattura 1,  38068 Rovereto (TN), Italy
> tomas...@cosbi.eu
> http://www.cosbi.eu
>
> 
> Da: Yonik Seeley 
> Inviato: domenica 21 febbraio 2021 05:51
> A: solr-user@lucene.apache.org 
> Cc: Lucene Dev 
> Oggetto: Re: Congratulations to the new Apache Solr PMC Chair, Jan Høydahl!
>
>
>
> Congrats Jan! Go Solr!
> -Yonik
>
>
> On Thu, Feb 18, 2021 at 1:56 PM Anshum Gupta 
> wrote:
>
> > Hi everyone,
> >
> > I’d like to inform everyone that the newly formed Apache Solr PMC
> nominated
> > and elected Jan Høydahl for the position of the Solr PMC Chair and Vice
> > President. This decision was approved by the board in its February 2021
> > meeting.
> >
> > Congratulations Jan!
> >
> > --
> > Anshum Gupta
> >
>


Re: nodes() stream to infinite depth

2021-02-19 Thread Joel Bernstein
You could see if this meets you needs:

https://lucene.apache.org/solr/guide/8_8/stream-source-reference.html#shortestpath





Joel Bernstein
http://joelsolr.blogspot.com/


On Fri, Feb 19, 2021 at 2:45 PM Subhajit Das 
wrote:

> Hi Joel,
>
> Thanks for response. But, is there any way to simulate the same?
>
>
> From: Joel Bernstein<mailto:joels...@gmail.com>
> Sent: 20 February 2021 01:13 AM
> To: solr-user@lucene.apache.org<mailto:solr-user@lucene.apache.org>
> Subject: Re: nodes() stream to infinite depth
>
> Nodes is designed for a stepwise graph walk. It doesn't do a full
> traversal.
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Fri, Feb 19, 2021 at 4:47 AM Subhajit Das 
> wrote:
>
> >
> > Hi,
> >
> > “{!graph ...}” goes to infinite depth by default. But “nodes()” stream
> > does not go to infinite depth.
> >
> > Is there any way to go to infinite depth?
> >
> >
>
>


Re: nodes() stream to infinite depth

2021-02-19 Thread Joel Bernstein
Nodes is designed for a stepwise graph walk. It doesn't do a full traversal.


Joel Bernstein
http://joelsolr.blogspot.com/


On Fri, Feb 19, 2021 at 4:47 AM Subhajit Das 
wrote:

>
> Hi,
>
> “{!graph ...}” goes to infinite depth by default. But “nodes()” stream
> does not go to infinite depth.
>
> Is there any way to go to infinite depth?
>
>


Re: Significant terms expression giving error "length needs to be >= 1"

2021-02-16 Thread Joel Bernstein
Can you include the stack trace from the logs?


Joel Bernstein
http://joelsolr.blogspot.com/


On Mon, Feb 15, 2021 at 3:53 PM ufuk yılmaz 
wrote:

> We have a SolrCloud cluster, version 8.4
>
> At the customer’s site there’s a collection with very few documents,
> around 12. We usually have collections with hundreds of millions of
> documents, so that collection is a bit of an exception.
>
> When I send a significantTerms streaming expression it immediately gets a
> “IllegalArgumentException("length needs to be >= 1")” from that
> collection’s shard. I took a look at it, but it doesn’t seem to have
> anything different in it from other collections. We also don’t get that
> exception in our own cluster, which is very similar to customer’s.
>
> I found the exception log in
> “lucene-solr\lucene\core\src\java\org\apache\lucene\util\SparseFixedBitSet.java”
> but I don’t have enough knowlegde on the inner workings of the streaming
> expression to interpret it.
>
> What may cause this?
>
> --ufuk
>
> Sent from Mail for Windows 10
>
>


Re: Is there way to autowarm new searcher using recently ran queries

2021-01-27 Thread Joel Bernstein
Typically what you would do is add static warming queries to warm all the
caches. These queries are hardcoded into the solrconfig.xml. You'll want to
run the facets you're using in the warming queries particularly facets on
string fields.

Once you add these it will take longer to warm the new searcher so you may
need to change the auto-commit intervals.
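
A minimal sketch of such a listener in the <query> section of solrconfig.xml
(the facet field name is just a placeholder):

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="rows">0</str>
      <str name="facet">true</str>
      <str name="facet.field">category_s</str>
    </lst>
  </arr>
</listener>

A similar listener on the firstSearcher event covers the first searcher opened
after startup.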




Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, Jan 27, 2021 at 5:30 PM Pushkar Raste 
wrote:

> Hi,
>
> A rookie question. We have a Solr cluster that doesn't get too much
> traffic. We see that our queries take long time unless we run a script to
> send more traffic to Solr.
>
> We are indexing data all the time and use autoCommit.
>
> I am wondering if there is a way to warmup new searcher on commit by
> rerunning queries processed by the last searcher. May be it happens by
> default but then I can't understand why we see high query times if those
> searchers are being warmed.
>


Re: Streaming expressions, what is the effect of collection name in the request url

2021-01-26 Thread Joel Bernstein
I have never tried this and didn't even know that you could have multiple
collections in the URL. So, I'm really not sure what the behavior will be.


Joel Bernstein
http://joelsolr.blogspot.com/


On Tue, Jan 26, 2021 at 1:19 PM ufuk yılmaz 
wrote:

> Does it have any ill side effects when the url has multiple collections? Like,
> can it cause the expression to compile and run on many nodes at once?
>
> Our scripts generate the url. When we are doing a regular search on
> multiple collections, that url is necessary, but if it's bad for streaming
> ones, I should change them.
>
> Many thanks  Joel
>
> PS: how are the tomato thieves doing? 
>
> Sent from Mail for Windows 10
>
> From: Joel Bernstein
> Sent: 26 January 2021 21:14
> To: solr-user@lucene.apache.org
> Subject: Re: Streaming expressions, what is the effect of collection name
> inthe request url
>
> The URL path should be for one collection. This will be where the
> collection is compiled and run. It has no effect on what is actually being
> searched. That is specified in the expression themselves.
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Wed, Jan 20, 2021 at 1:34 PM ufuk yılmaz 
> wrote:
>
> > Do collection names in request url affect how the query works in any way?
> >
> > A streaming expression is sent to
> http://mySolrHost/solr/col1,col2/stream
> > (notice multiple collections in url)
> >
> > Col1 has 2 shards, each have 3 replicas.
> > * Shard1 has replicas on nodes A, B, C
> > * Shard2 has replicas on D,E,F
> >
> > Col2 has 2 shards, each have 3 replicas. Its shards have the same
> > configuration as Col1.
> >
> >
> > Lets say we have a simple search expression:
> > search(
> > "colA,colB",
> > q="*:*",
> > qt="/export",
> > fl="fl1,fl2",
> > sort="id asc"
> > )
> >
> > Collection names in search expression denotes which collections should be
> > searched, so we can’t change them. But what would change if we sent the
> > query to
> > http://mySolrHost/solr/someOtherCollection/stream
> >
> > and someOtherCollection has 1 shard and 6 replicas in nodes A,B,C,D,E,F ?
> >
> > I read about worker collections a bit, but as long as I don’t explicitly
> > use parallel streams, what is the difference?
> >
> >
> >
> > Sent from Mail for Windows 10
> >
> >
>
>


Re: NullPointerException in Graph Traversal nodes streaming expression

2021-01-26 Thread Joel Bernstein
How are you constructing the stream: with classes, or using a Streaming
Expression?

In either case can you post either the code or expression?

Are there more errors in the logs? The NPE occurs where an underlying
stream is null, which leads me to believe there would be some exceptions
before this, possibly on a different server if multiple servers are involved.

Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Jan 21, 2021 at 5:46 PM Mike Drob  wrote:

> Can you provide a sample expression that would be able to reproduce this?
> Are you able to try a newer version by chance - I know we've fixed a few
> NPEs recently, maybe https://issues.apache.org/jira/browse/SOLR-14700
>
> On Thu, Jan 21, 2021 at 4:13 PM ufuk yılmaz 
> wrote:
>
> > Solr version 8.4. I’m getting an unexplained NullPointerException when
> > executing a simple 2-level nodes stream. Do you have any idea what may
> > cause this?
> >
> > I tried setting /stream?partialResults=true=true and
> > shards.tolerant=true in nodes expressions, with no luck. I also tried
> > reading source of GatherNodesStream in branch 8_4, but couldn’t
> understand
> > it. Here is a beautiful stack trace:
> >
> > solr| 2021-01-21 22:00:12.726 ERROR (qtp832292933-25149)
> > [c:WorkerCollection s:shard1 r:core_node10
> > x:WorkerCollection_shard1_replica_n9] o.a.s.c.s.i.s.ExceptionStream
> > java.lang.RuntimeException: java.util.concurrent.ExecutionException:
> > java.lang.RuntimeException: java.lang.NullPointerException
> > solr|   at
> >
> org.apache.solr.client.solrj.io.graph.GatherNodesStream.read(GatherNodesStream.java:607)
> > solr|   at
> >
> org.apache.solr.client.solrj.io.stream.ExceptionStream.read(ExceptionStream.java:71)
> > solr|   at
> >
> org.apache.solr.handler.StreamHandler$TimerStream.read(StreamHandler.java:454)
> > solr|   at
> >
> org.apache.solr.client.solrj.io.stream.TupleStream.lambda$writeMap$0(TupleStream.java:84)
> > solr|   at
> >
> org.apache.solr.common.util.JsonTextWriter.writeIterator(JsonTextWriter.java:141)
> > solr|   at
> > org.apache.solr.common.util.TextWriter.writeVal(TextWriter.java:67)
> > solr|   at
> >
> org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:152)
> > solr|   at
> > org.apache.solr.common.util.JsonTextWriter$2.put(JsonTextWriter.java:176)
> > solr|   at
> >
> org.apache.solr.client.solrj.io.stream.TupleStream.writeMap(TupleStream.java:81)
> > solr|   at
> >
> org.apache.solr.common.util.JsonTextWriter.writeMap(JsonTextWriter.java:164)
> > solr|   at
> > org.apache.solr.common.util.TextWriter.writeVal(TextWriter.java:69)
> > solr|   at
> >
> org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:152)
> > solr|   at
> >
> org.apache.solr.common.util.JsonTextWriter.writeNamedListAsMapWithDups(JsonTextWriter.java:386)
> > solr|   at
> >
> org.apache.solr.common.util.JsonTextWriter.writeNamedList(JsonTextWriter.java:292)
> > solr|   at
> > org.apache.solr.response.JSONWriter.writeResponse(JSONWriter.java:73)
> > solr|   at
> >
> org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:66)
> > solr|   at
> >
> org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
> > solr|   at
> > org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:892)
> > solr|   at
> > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:594)
> > solr|   at
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
> > solr|   at
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
> > solr|   at
> >
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610)
> > solr|   at
> >
> org.eclipse.jetty.servlets.CrossOriginFilter.handle(CrossOriginFilter.java:311)
> > solr|   at
> >
> org.eclipse.jetty.servlets.CrossOriginFilter.doFilter(CrossOriginFilter.java:265)
> > solr|   at
> >
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
> > solr|   at
> >
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
> > solr|   at
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
> > solr|   at

Re: Parallel streaming expression java.lang.IndexOutOfBoundsException

2021-01-26 Thread Joel Bernstein
Yes, this is not ideal. It means that the worker collection needs to have N
shards rather than N replicas. Changing this should not be difficult if
you'd like to provide a patch.
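
For example (a sketch using the names from the expression below), a worker
collection for workers="4" would be created with four shards rather than four
replicas:

http://localhost:8983/solr/admin/collections?action=CREATE&name=WorkerCollection&numShards=4&replicationFactor=1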


Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Jan 21, 2021 at 8:00 AM ufuk yılmaz 
wrote:

> Looked at the source code of the parallel stream and it seems I need the
> number of SHARDS to equal the workers count parameter. I thought I needed as
> many replicas; it was shards.
> Maybe this helps someone.
>
> Sent from Mail for Windows 10
>
> From: ufuk yılmaz
> Sent: 21 January 2021 11:16
> To: solr-user@lucene.apache.org
> Subject: Parallel streaming expression java.lang.IndexOutOfBoundsException
>
> Hello all,
>
>
> https://lucene.apache.org/solr/guide/8_4/stream-decorator-reference.html#parallel
>
> I’m sending the same query in the docs, (just collection names changed) to
> my Solr but always getting the exception:
>
> {
>   "result-set":{
> "docs":[{
> "EXCEPTION":"java.lang.IndexOutOfBoundsException: Index 1 out of
> bounds for length 1",
> "EOF":true,
> "RESPONSE_TIME":93}]}}
>
> My query:
>
> null(
> parallel(
> WorkerCollection,
> rollup(
> search(
> colA,
> q="username: c*",
> qt="/export",
> fl="timestamp,user.id",
> sort="user.id asc",
> partitionKeys="user.id"
> ),
> over="user.id",count(*)
> ),
> workers="4",
> sort="timestamp asc"
> )
> )
>
> WorkerCollection has 1 shard and 4 replicas on 4 different machines. I
> double-triple checked for a silly syntax mistake but there’s none that I
> can see. I tried doing this a few months ago with no success, getting the
> same exception again. colA is on different machines from WorkerCollection.
> user.id is a string field, timestamp is long.
>
> What am I missing?
>
>
> Sent from Mail for Windows 10
>
>
>


Re: Streaming expressions, what is the effect of collection name in the request url

2021-01-26 Thread Joel Bernstein
The URL path should be for one collection. This will be where the
collection is compiled and run. It has no effect on what is actually being
searched. That is specified in the expression themselves.


Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, Jan 20, 2021 at 1:34 PM ufuk yılmaz 
wrote:

> Do collection names in request url affect how the query works in any way?
>
> A streaming expression is sent to http://mySolrHost/solr/col1,col2/stream
> (notice multiple collections in url)
>
> Col1 has 2 shards, each have 3 replicas.
> * Shard1 has replicas on nodes A, B, C
> * Shard2 has replicas on D,E,F
>
> Col2 has 2 shards, each have 3 replicas. Its shards have the same
> configuration as Col1.
>
>
> Lets say we have a simple search expression:
> search(
> "colA,colB",
> q="*:*",
> qt="/export",
> fl="fl1,fl2",
> sort="id asc"
> )
>
> Collection names in search expression denotes which collections should be
> searched, so we can’t change them. But what would change if we sent the
> query to
> http://mySolrHost/solr/someOtherCollection/stream
>
> and someOtherCollection has 1 shard and 6 replicas in nodes A,B,C,D,E,F ?
>
> I read about worker collections a bit, but as long as I don’t explicitly
> use parallel streams, what is the difference?
>
>
>
> Sent from Mail for Windows 10
>
>


Re: Steps to write a custom StreamingExpression

2021-01-26 Thread Joel Bernstein
I believe that would be the best path.

Joel Bernstein
http://joelsolr.blogspot.com/


On Tue, Jan 26, 2021 at 7:50 AM ufuk yılmaz 
wrote:

> Should I create a Java project with a dependency on solrj or Solr core,
> then implement the Expressible interface,
> then build my project as a jar and put it into each node of SolrCloud's
> classpath?
>
> Or should I take a completely different route?
>
> Many thanks
> ~ufuk
>
> Sent from Mail for Windows 10
>
>


Re: [Solr8.7] Performance of group.ngroups ?

2021-01-15 Thread Joel Bernstein
You can try collapse as well.



Joel Bernstein
http://joelsolr.blogspot.com/


On Fri, Jan 15, 2021 at 4:51 AM Bruno Mannina  wrote:

> Hello,
>
>
>
> I found a temporary solution to my problem.
>
>
>
> I do a request without ngroups=true => the result comes back quickly
>
> And just after, I do a simple request with my query and this param:
>
> ….={x:"unique(fid)"}
>
> Where the field « fid » is my group field name.
>
>
>
> 88 sec => 3~4 sec for both requests.
>
>
>
> Regards,
>
> Bruno
>
>
>
> From: Matheo Software [mailto:i...@matheo-software.com]
> Sent: Thursday, 14 January 2021 14:48
> To: solr-user@lucene.apache.org
> Subject: [Solr8.7] Performance of group.ngroups ?
>
>
>
> Hi All,
>
>
>
> I have more than 130 million documents, with an index size of more than
> 400GB on Solr8.7.
>
>
>
> I do a simple query and it takes around 1400ms, which is OK, but when I use
> ngroups=true, I get an answer in 88sec.
>
> I know it’s because Solr calculates the number of groups on a specific field,
> but does a solution exist to improve that? An alternative solution?
>
>
>
> Many thanks,
>
>
>
> Cordialement, Best Regards
>
> Bruno Mannina
>
> www.matheo-software.com
>
> www.patent-pulse.com
>
> Tél. +33 0 970 738 743
>
> Mob. +33 0 634 421 817
>
>
>
>
>
>
>
>
> https://www.avast.com/antivirus
>


Re: Solr collapse & expand queries.

2020-11-30 Thread Joel Bernstein
Both collapse and grouping are used quite often so I'm not sure I would
agree with the preference for collapse. There is a very specific use case
where collapse performs better and in these scenarios collapse might be the
only option that would work.

The use case where collapse works better is:

1) High cardinality grouping field, like product id.
2) Larger result sets
3) The need to know the full number of groups that match the result set. In
grouping this is group.ngroups.

At a certain point grouping will become too slow under the scenario
described above. It will all depend on the scale of #1 and #2 above. If you
remove group.ngroups, grouping will usually be just as fast as or faster than
collapse.

So in your testing, make sure you're testing the full data set with
representative queries, and decide if group.ngroups is needed.
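
As a quick sketch of the two forms (the field name and row counts are
placeholders):

Collapse, with expand to bring back group members:
q=*:*&fq={!collapse field=product_id}&expand=true&expand.rows=5

Grouping; group.ngroups=true is the part that gets expensive at scale:
q=*:*&group=true&group.field=product_id&group.limit=5&group.ngroups=true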







Joel Bernstein
http://joelsolr.blogspot.com/


On Sat, Nov 28, 2020 at 3:42 AM Parshant Kumar
 wrote:

> Hi community,
>
> I want to implement collapse queries instead of group queries. In the Solr
> documentation it is stated that we should prefer collapse & expand queries
> instead of group queries. Please explain how collapse & expand queries
> are better than grouped queries? How can I implement them? Do I need to add
> anything in the *solrconfig.xml file* as well, or just make changes in
> Solr queries like below:
>
>
> *fq={!collapse field=*field*}=n=true  instead of
> group.field=*field*=true=n*
>
> I have done performance testing by making above changes in solr queries and
> found that query times are almost the same for both collapse queries and
> group queries.
>
> Please help me how to implement it and its advantage over grouped queries.
>
> Thanks,
> Parshant Kumar.
>
> --
>
>


Re: Use stream result like a query (alternative to innerJoin)

2020-11-23 Thread Joel Bernstein
Here is the documentation for fetch:

https://lucene.apache.org/solr/guide/8_4/stream-decorator-reference.html#fetch
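
A rough, untested sketch against the collections in your example (and only
valid while the relationship stays one-to-one or many-to-one):

fetch(items,
      search(deletedItems,
             q="*:*",
             fl="deletedItemId",
             sort="deletedItemId asc",
             qt="/export"),
      fl="name",
      on="deletedItemId=id",
      batchSize=250)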


Joel Bernstein
http://joelsolr.blogspot.com/


On Mon, Nov 23, 2020 at 3:22 PM Joel Bernstein  wrote:

> There are two streams that behave like that.
>
> One is the "nodes" expression, which is not going to work for this use
> case because it does everything in memory.
>
> The second one is the "fetch" expression which behaves like a nested loop
> join with some limitations. Unfortunately the main limitation is likely to
> be a blocker for you which is that it doesn't support one-to-many joins yet.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Sun, Nov 22, 2020 at 10:37 AM ufuk yılmaz 
> wrote:
>
>> Hi all,
>>
>> I’m looking for a way to query two collections and find documents that
>> exist in both, I know this can be done with innerJoin streaming expression
>> but I want to avoid it, since one of the collection streams can possibly
>> have billions of results:
>>
>> Let’s say two collections are:
>>
>> deletedItems = [{deletedItemId: 1}, {deletedItemId: 2}...]
>> items = [
>> {
>> id: 1,
>> name: "a"
>> },
>> {   id: 2,
>> name: "b"
>> },
>> {
>> id: 3,
>> name: "c"
>> }.
>> ]
>>
>> “deletedItems” contain a few documents compared to “items” collection
>> (1mil vs 2-3 bil). If I query them both with a typical query in our system,
>> deletedItems gives a few thousand results but items give tens/hundreds of
>> millions. To use innerJoin, I have to stream the whole items result to
>> worker node over network.
>>
>> Is there a way to avoid this, something like using “deletedItems” result
>> as a query to “items” stream?
>>
>> Thanks in advance for the help
>>
>> Sent from Mail for Windows 10
>>
>>


Re: Use stream result like a query (alternative to innerJoin)

2020-11-23 Thread Joel Bernstein
There are two streams that behave like that.

One is the "nodes" expression, which is not going to work for this use case
because it does everything in memory.

The second one is the "fetch" expression which behaves like a nested loop
join with some limitations. Unfortunately the main limitation is likely to
be a blocker for you which is that it doesn't support one-to-many joins yet.

Joel Bernstein
http://joelsolr.blogspot.com/


On Sun, Nov 22, 2020 at 10:37 AM ufuk yılmaz 
wrote:

> Hi all,
>
> I’m looking for a way to query two collections and find documents that
> exist in both, I know this can be done with innerJoin streaming expression
> but I want to avoid it, since one of the collection streams can possibly
> have billions of results:
>
> Let’s say two collections are:
>
> deletedItems = [{deletedItemId: 1}, {deletedItemId: 2}...]
> items = [
> {
> id: 1,
> name: "a"
> },
> {   id: 2,
> name: "b"
> },
> {
> id: 3,
> name: "c"
> }.
> ]
>
> “deletedItems” contain a few documents compared to “items” collection
> (1mil vs 2-3 bil). If I query them both with a typical query in our system,
> deletedItems gives a few thousand results but items give tens/hundreds of
> millions. To use innerJoin, I have to stream the whole items result to
> worker node over network.
>
> Is there a way to avoid this, something like using “deletedItems” result
> as a query to “items” stream?
>
> Thanks in advance for the help
>
> Sent from Mail for Windows 10
>
>


Re: How to use the "eval" streaming expression?

2020-11-19 Thread Joel Bernstein
This blog gets more specific with some of the ideas behind the eval
expression:

https://joelsolr.blogspot.com/2017/04/having-talk-with-solr-using-new-echo.html



Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Nov 19, 2020 at 12:21 PM Joel Bernstein  wrote:

> You could have a program that writes a Streaming Expression
> programmatically then use eval to run it. You can also save Streaming
> Expression data structures: tuple, list, array etc... and eval them into
> live streams that can be iterated.
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Wed, Nov 18, 2020 at 7:49 PM ufuk yılmaz 
> wrote:
>
>> Hey,
>>
>> Can anyone give me an example on how can eval
>> https://lucene.apache.org/solr/guide/8_4/stream-decorator-reference.html#eval
>> be used?
>>
>> Docs says it allows to run streaming expressions those created on the
>> fly, but I can’t wrap my head on how an expression can be created on the
>> fly, maybe unless it was stored in a field in Solr?
>>
>> Best
>>
>> Sent from Mail for Windows 10
>>
>>


Re: How to use the "eval" streaming expression?

2020-11-19 Thread Joel Bernstein
You could have a program that writes a Streaming Expression
programmatically, then use eval to run it. You can also save Streaming
Expression data structures (tuple, list, array, etc.) and eval them into
live streams that can be iterated.
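
A minimal sketch (the collection name is a placeholder, and this assumes eval
reads the expression string from the expr_s field of the first tuple, as
described in the reference guide):

eval(select(
       echo("search(myCollection, q=\"*:*\", fl=\"id\", sort=\"id asc\", rows=\"10\")"),
       echo as expr_s))

Here echo emits one tuple containing the expression as a string, select renames
that field to expr_s, and eval compiles and runs it as a live stream.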




Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, Nov 18, 2020 at 7:49 PM ufuk yılmaz 
wrote:

> Hey,
>
> Can anyone give me an example on how can eval
> https://lucene.apache.org/solr/guide/8_4/stream-decorator-reference.html#eval
> be used?
>
> The docs say it allows running streaming expressions that are created on the
> fly, but I can’t wrap my head around how an expression can be created on the
> fly, unless maybe it was stored in a field in Solr?
>
> Best
>
> Sent from Mail for Windows 10
>
>


Re: Using Multiple collections with streaming expressions

2020-11-12 Thread Joel Bernstein
The multiple collection syntax has been implemented for only a few stream
sources: search, timeseries, facet and stats. Eventually it will be
implemented for all stream sources.


Joel Bernstein
http://joelsolr.blogspot.com/


On Tue, Nov 10, 2020 at 12:32 PM ufuk yılmaz 
wrote:

> Thanks again Erick, that’s a good idea!
>
> Alternatively, I use an alias covering multiple collections in these
> situations, but there may be too many combinations of collections, so it’s
> not always suitable.
>
> Merged significantTerms streams will have meaningless scores in tuples I
> think; it would be comparing apples and oranges, but in this case I’m only
> interested in getting foreground counts, so it’s another day’s problem.
>
> What seemed strange to me was that the source code for streams appeared to
> be handling this case.
>
>
> Sent from Mail for Windows 10
>
> From: Erick Erickson
> Sent: 10 November 2020 16:48
> To: solr-user@lucene.apache.org
> Subject: Re: Using Multiple collections with streaming expressions
>
> Y
>
>


Re: Strange fetch streaming expression doesn't fetch fields sometimes?

2020-10-14 Thread Joel Bernstein
Yes, the docs mention one-to-one and many-to-one fetches, but one-to-many
is not supported currently. I've never really been happy with fetch. It
really needs to be replaced with a standard nested loop join that handles
all scenarios.


Joel Bernstein
http://joelsolr.blogspot.com/


On Tue, Oct 13, 2020 at 6:30 PM uyilmaz  wrote:

> I think I found the reason right after asking (facepalm), but it took me
> days to realize this.
>
> I think fetch performs a naive "in" query, something like:
>
> q="userid:(123123 123123123 12432423321323)={batchSize}"
>
> When the userid-to-document relation is one-to-many, it is possible that the
> above query will return documents consisting entirely of the last two userids’
> documents, so the first one is left out, resulting in an empty username. The
> docs state that one-to-many is not supported with fetch, but I didn’t stumble
> onto this issue until recently, so I just assumed it would work.
>
> Sorry to take your time, I hope this helps somebody later.
>
> Have a nice day.
>
> On Wed, 14 Oct 2020 00:38:05 +0300
> uyilmaz  wrote:
>
> >
> > Hi all,
> >
> > I have a streaming expression looking like:
> >
> > fetch(
> >   myAlias,
> >   top(
> >   n=3,
> >   various expressions here
> > sort="count(*) desc"
> >   ),
> >   fl="username", on="userid=userid", batchSize=3
> > )
> >
> > which fails to fetch username field for the 1st result:
> >
> > {
> >  "result-set":{
> >   "docs":[{
> > "userid":"123123",
> > "count(*)":58}
> >,{
> > "userid":"123123123",
> > "count(*)":32,
> > "username":"Ayha"}
> >,{
> > "userid":"12432423321323",
> > "count(*)":30,
> > "username":"MEHM"}
> >,{
> > "EOF":true,
> > "RESPONSE_TIME":34889}]}}
> >
> > But strangely, when I change n and batchSize both to 2 and touch nothing
> else, fetch fetches the first username correctly:
> >
> > fetch(
> >   myAlias,
> >   top(
> >   n=2,
> >   various expressions here
> > sort="count(*) desc"
> >   ),
> >   fl="username", on="userid=userid", batchSize=2
> > )
> >
> > Result is:
> >
> > {
> >  "result-set":{
> >   "docs":[{
> > "userid":"123123",
> > "count(*)":58,
> > "username":"mura"}
> >,{
> > "userid":"123123123",
> > "count(*)":32,
> > "username":"Ayha"}
> >,{
> > "EOF":true,
> > "RESPONSE_TIME":34889}]}}
> >
> > What can be the problem?
> >
> > Regards
> >
> > ~~ufuk
> >
> > --
> > uyilmaz 
>
>
> --
> uyilmaz 
>


Re: Using streaming expressions with shards filter

2020-10-06 Thread Joel Bernstein
Actually it's:

.shards=shard1,shard2,shard3...



Joel Bernstein
http://joelsolr.blogspot.com/


On Tue, Oct 6, 2020 at 2:38 PM Joel Bernstein  wrote:

>
> There is a parameter in streaming expressions for this but it is not
> available for use in every stream source. The search expression should
> honor it though.
>
> If you pass the .shard=shard1,shard2,shard3...
>
> The search stream will honor this.
>
> This work was originally done for supporting no-SolrCloud streaming
> expressions but was not fully realized yet.
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Thu, Oct 1, 2020 at 11:31 AM Gael Jourdan-Weil <
> gael.jourdan-w...@kelkoogroup.com> wrote:
>
>> Hello,
>>
>> I am trying to use a Streaming Expression to query only a subset of the
>> shards of a collection.
>> I expected to be able to use the "shards" parameter like on a regular
>> query on "/select" for instance but this appear to not work or I don't know
>> how to do it.
>>
>> Is this somehow a feature/restriction of Streaming expressions?
>> Or am I missing something?
>>
>> Note that the Streaming Expression I use is actually using the "/export"
>> request handler.
>>
>> Example of the streaming expression:
>> curl -X POST -v --data-urlencode
>> 'expr=search(myCollection,q="*:*",fl="id",sort="id asc",qt="/export")' '
>> http://myserver/solr/myCollection/stream'
>>
>> Solr version: 8.4
>>
>> Best regards,
>> Gaël
>
>


Re: Using streaming expressions with shards filter

2020-10-06 Thread Joel Bernstein
There is a parameter in streaming expressions for this but it is not
available for use in every stream source. The search expression should
honor it though.

If you pass the .shard=shard1,shard2,shard3... parameter, the search stream
will honor this.

This work was originally done to support non-SolrCloud streaming
expressions but has not been fully realized yet.


Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Oct 1, 2020 at 11:31 AM Gael Jourdan-Weil <
gael.jourdan-w...@kelkoogroup.com> wrote:

> Hello,
>
> I am trying to use a Streaming Expression to query only a subset of the
> shards of a collection.
> I expected to be able to use the "shards" parameter like on a regular
> query on "/select" for instance but this appear to not work or I don't know
> how to do it.
>
> Is this somehow a feature/restriction of Streaming expressions?
> Or am I missing something?
>
> Note that the Streaming Expression I use is actually using the "/export"
> request handler.
>
> Example of the streaming expression:
> curl -X POST -v --data-urlencode
> 'expr=search(myCollection,q="*:*",fl="id",sort="id asc",qt="/export")' '
> http://myserver/solr/myCollection/stream'
>
> Solr version: 8.4
>
> Best regards,
> Gaël


Re: Loading JSON docs into Solr with Streaming Expressions?

2020-07-24 Thread Joel Bernstein
It's probably time to add JSON loading support to streaming
expressions, but nothing yet. This ticket is almost done and paves the way
for a suite of parseXYZ functions:

https://issues.apache.org/jira/browse/SOLR-14673



Joel Bernstein
http://joelsolr.blogspot.com/


On Fri, Jul 24, 2020 at 1:00 PM Eric Pugh 
wrote:

> Hey all,   I wanted to load some JSON docs into Solr and as I load them,
> do some manipulations to the documents as they go in.   I looked at
> https://lucene.apache.org/solr/guide/8_6/transforming-and-indexing-custom-json.html
> <
> https://lucene.apache.org/solr/guide/8_6/transforming-and-indexing-custom-json.html>,
> however I also wanted to see if Streaming would help.
>
> I’ve used the combination of cat and parseCSV streaming functions
> successfully to load data into Solr, so I looked a bit at what we could do
> with JSON source format.
>
> I didn’t see an obvious path for taking a .json file and loading it, so I
> played around and made this JSON w/ Lines formatted file streaming
> expression:
> https://github.com/epugh/playing-with-solr-streaming-expressions/pull/3
>
> The expression looks like
> commit(icecat,
>   update(icecat,
> parseJSONL(
>   cat('two_docs.jsonl')
> )
>   )
> )
> I was curious what other folks have done?  I saw that there is a
> JSONTupleStream, but it didn’t quite seem to fit the need.
>
> Eric
>
> ___
> Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 |
> http://www.opensourceconnections.com | My Free/Busy: http://tinyurl.com/eric-cal
> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed:
> https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw
>
>
>


Re: Parallel SQL join on multivalue fields

2020-07-01 Thread Joel Bernstein
There isn't any real support for joins in Parallel SQL currently. I'm
surprised that you're having some success doing them. Can you provide a
sample SQL join that is working for you?



Joel Bernstein
http://joelsolr.blogspot.com/


On Fri, Jun 26, 2020 at 3:32 AM Piero Scrima  wrote:

> Hi,
>
> Although there is no trace of join functionality in the official Solr
> documentation
> (https://lucene.apache.org/solr/guide/7_4/parallel-sql-interface.html),
> joining in parallel sql works in practice. It only works if the field is
> not a multivalued field. For my project it would be fantastic if it also
> worked with multivalued fields.
> Is there any way to do it? Working with streaming expressions I managed
> to do it with the following expression:
>
> innerJoin(
> sort(
> cartesianProduct(
>
>
> search(census_defence_system,q="*:*",fl="id,defence_system,description,supplier",sort="id
> asc",qt="/select",rows="1000"),
>   supplier
> ),
> by="supplier asc"
> ),
> sort(
>   cartesianProduct(
>
> search(census_components,q="*:*",fl="id,compoenent_name,supplier",sort="id
> asc",qt="/select",rows="1"),
> supplier
> ),
> by="supplier asc"
> ),
>   on="supplier"
> )
>
> supplier, of course, is a multivalued field.
>
> Is there a way to do this with parallel SQL, and if not, can we plan a new
> feature to add it? I could also work on it.
>
> (version 7.4)
>
> Thank you
>


Re: Use cases for the graph streams

2020-05-21 Thread Joel Bernstein
Good question. Let me first point to an interesting example in the Visual
Guide to Streaming Expressions and Math Expressions:

https://github.com/apache/lucene-solr/blob/visual-guide/solr/solr-ref-guide/src/search-sample.adoc#nodes

This example gets to the heart of the core use case for the nodes
expression which is to discover the relationships between nodes in a graph.
So it's a discovery tool to learn something new about the data that you
can't see without having this specific ability of walking the nodes in a
graph.
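
For a concrete flavor, a two-hop walk modeled on the example in the guide (the
emails collection, field names, and address are placeholders) that gathers who
the contacts of one sender have emailed:

nodes(emails,
      nodes(emails,
            walk="jane@example.com->from",
            gather="to"),
      walk="node->from",
      gather="to",
      scatter="leaves")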

In the broader context the nodes expression is part of a much wider set of
tools that allow people to use Solr to explore the relationships in their
data. This is described here:

https://github.com/apache/lucene-solr/blob/visual-guide/solr/solr-ref-guide/src/math-expressions.adoc

The goal of all this is to move search engines beyond basic aggregations to
study the correlations and relationships within the data set.

Graph traversal is part of this broader goal which will get developed more
over time. I'd be interested in hearing more about specific graph use cases
that you're interested in solving.

Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, May 20, 2020 at 12:32 PM Nightingale, Jonathan A (US) <
jonathan.nighting...@baesystems.com> wrote:

> This is kind of a broad question, but I was playing with the graph streams
> and was having trouble making the tools work for what I wanted to do. I'm
> wondering if the use case for the graph streams really supports standard
> graph queries you might use with Gremlin or the like? I ask because right
> now we have two implementations of our data storage to support these two
> ways of looking at it, the standard query and the semantic filtering.
>
> The use cases I usually see for the graph streams always seem to be limited
> to one-link traversal for finding things related to nodes gathered from a
> query. But even with that, it wasn't clear what the best way was to do things
> with lists of docValues. So, for example, if I wanted to represent a node
> that had many docValues, I had to use cross products to make a node for each
> value. The traversal didn't seem to allow for that kind of node linking
> inherently.
>
> So my question really is (and maybe this is not the place for this) what
> is the intent of these graph features and what is the goal for them in the
> future? I was really hoping at one point to only use solr for our product
> but it didn't seem feasible, at least not easily.
>
> Thanks for all your help
> Jonathan
>
> Jonathan Nightingale
> GXP Solutions Engineer
> (office) 315 838 2273
> (cell) 315 271 0688
>
>


Re: REINDEXCOLLECTION not working on an alias

2020-05-19 Thread Joel Bernstein
I believe the issue is that under the covers this feature is using the
"topic" streaming expressions which it was just reported doesn't work with
aliases. This is something that will get fixed, but for the current release
there isn't a workaround for this issue.


Joel Bernstein
http://joelsolr.blogspot.com/


On Tue, May 19, 2020 at 8:25 AM Bjarke Buur Mortensen 
wrote:

> Hi list,
>
> I seem to be unable to get REINDEXCOLLECTION to work on a collection alias
> (running Solr 8.2.0). The documentation seems to state that that should be
> possible:
>
> https://lucene.apache.org/solr/guide/8_2/collection-management.html#reindexcollection
> "name
> Source collection name, may be an alias. This parameter is required."
>
> If I run on my alias (qa_supplier_products):
> curl "
>
> http://localhost:8983/solr/admin/collections?action=REINDEXCOLLECTION=qa_supplier_products=1=start
> I get an error:
> "org.apache.solr.common.SolrException: Unable to copy documents from
> qa_supplier_products to .rx_qa_supplier_products_6:
> {\"result-set\":{\"docs\":[\n
>  {\"DaemonOp\":\"Deamon:.rx_qa_supplier_products_6 started on
> .rx_qa_supplier_products_0_shard1_replica_n1\"
>
> If I instead point to the underlying collection, everything works fine. Now
> I have an alias pointing to an alias, which works, but ideally I would like
> to just have my main alias point to the newly reindexed collection.
>
> Can anybody help me out here?
>
> Thanks,
> /Bjarke
>


Re: using aliases in topic stream

2020-05-16 Thread Joel Bernstein
I think you probably found the problem. The persisting of checkpoints
should not use aliases, but in theory the aliases could work for the main
collection. It may be that the decision was made not to support aliases for
the topic stream because the checkpoints for each shard are saved and
changing the alias would break the topic. But I'm not sure that's a good
enough reason not to support aliases with the topic, if it's noted in the
documentation that aliases must be stable.


Joel Bernstein
http://joelsolr.blogspot.com/


On Sat, May 16, 2020 at 11:10 AM Nightingale, Jonathan A (US) <
jonathan.nighting...@baesystems.com> wrote:

> Joel,
> I looked at the commits from that enhancement briefly. It looks like the
> work to resolve the slices was put into a central method in CloudSolrStream.
> But to check aliases the Boolean flag needs to be true.
>
> Everywhere this is used in TopicStream its set to false (3 places). The
> first two are for retrieving and persisting the topi checkpoints, bringing
> up the question, should aliases be allowed for that collection as well?
>
> The third is probably what I'm running into.
>
>   protected void constructStreams() throws IOException {
> try {
>   ZkStateReader zkStateReader = cloudSolrClient.getZkStateReader();
>   Slice[] slices = CloudSolrStream.getSlices(this.collection,
> zkStateReader, false);
>
> Anyways, this was my 5 minute investigation. I haven't taken the effort to
> try and change it. Do you think this could be it?
>
> Jonathan
> -Original Message-
> From: Joel Bernstein 
> Sent: Thursday, May 14, 2020 6:32 PM
> To: solr-user@lucene.apache.org
> Subject: Re: using aliases in topic stream
>
> This is where the alias work was done:
>
> https://issues.apache.org/jira/browse/SOLR-9077
>
> It could be though that there is a bug here. I'll see if I can reproduce
> it locally.
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Thu, May 14, 2020 at 6:24 PM Nightingale, Jonathan A (US) <
> jonathan.nighting...@baesystems.com> wrote:
>
> > I'm looking on master on git hub, the solrj tests assume never use
> > aliases Just as an example. that’s all over the place in the tests
> >
> >
> > https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/test/
> > org/apache/solr/client/solrj/io/stream/StreamDecoratorTest.java
> >
> > @Test
> >   public void testTerminatingDaemonStream() throws Exception {
> > Assume.assumeTrue(!useAlias);
> >
> > -Original Message-
> > From: Joel Bernstein 
> > Sent: Wednesday, May 13, 2020 1:11 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: using aliases in topic stream
> >
> > What version of Solr are you using? The topic stream in master seems
> > to have the code in place to query aliases.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> >
> > On Wed, May 13, 2020 at 12:33 PM Nightingale, Jonathan A (US) <
> > jonathan.nighting...@baesystems.com> wrote:
> >
> > > Hi Everyone,
> > >
> > > I'm trying to run this stream and I get the following error
> > >
> > > topic(topics,collection1,
> > > q="classes:GXP/INDEX",fl="uuid",id="feed-8",initialCheckpoint=0,chec
> > > kp
> > > ointEvery=-1)
> > >
> > > {
> > >   "result-set": {
> > > "docs": [
> > >   {
> > > "EXCEPTION": "Slices not found for collection1",
> > > "EOF": true,
> > > "RESPONSE_TIME": 6
> > >   }
> > > ]
> > >   }
> > > }
> > >
> > > "collection1" is an alias. I can search using the alias perfectly
> > > fine. In fact the search stream operation works fine with the alias.
> > > It's just this topic one I've seen so far. Does anyone know why this
> is?
> > >
> > > Thanks!
> > > Jonathan Nightingale
> > >
> > >
> >
>


Re: Solr 8.1.5 Postlogs - Basic Authentication Error

2020-05-15 Thread Joel Bernstein
Right now this is not, but this would be fairly easy to add. I'll see if I
can get that in for the next release.


Joel Bernstein
http://joelsolr.blogspot.com/


On Mon, May 11, 2020 at 5:03 PM Waheed, Imran 
wrote:

> Is there a way to use bin/postlogs with basic authentication on? I am
> getting an error if I do not give a username/password:
>
> bin/postlogs http://localhost:8983/solr/logs server/logs/
>
> Exception in thread "main"
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
> from server at http://localhost:8983/solr/logs: Expected mime type
> application/octet-stream but got text/html. 
> 
> 
> Error 401 require authentication
> 
> HTTP ERROR 401 require authentication
> 
> URI:/solr/logs/update
> STATUS:401
> MESSAGE:require authentication
> SERVLET:default
> 
>
> I get a different error if I try
> bin/postlogs -u user:@password http://localhost:8983/solr/logs
> server/logs/
>
>
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further
> details.
> Exception in thread "main" java.lang.NullPointerException
> at
> org.apache.solr.util.SolrLogPostTool.gatherFiles(SolrLogPostTool.java:127)
> at
> org.apache.solr.util.SolrLogPostTool.main(SolrLogPostTool.java:65)
>
> thank you,
> Imran
>
>


Re: using aliases in topic stream

2020-05-14 Thread Joel Bernstein
This is where the alias work was done:

https://issues.apache.org/jira/browse/SOLR-9077

It could be though that there is a bug here. I'll see if I can reproduce it
locally.



Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, May 14, 2020 at 6:24 PM Nightingale, Jonathan A (US) <
jonathan.nighting...@baesystems.com> wrote:

> I'm looking on master on git hub, the solrj tests assume never use aliases
> Just as an example. that’s all over the place in the tests
>
>
> https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/test/org/apache/solr/client/solrj/io/stream/StreamDecoratorTest.java
>
> @Test
>   public void testTerminatingDaemonStream() throws Exception {
> Assume.assumeTrue(!useAlias);
>
> -Original Message-
> From: Joel Bernstein 
> Sent: Wednesday, May 13, 2020 1:11 PM
> To: solr-user@lucene.apache.org
> Subject: Re: using aliases in topic stream
>
> What version of Solr are you using? The topic stream in master seems to
> have the code in place to query aliases.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Wed, May 13, 2020 at 12:33 PM Nightingale, Jonathan A (US) <
> jonathan.nighting...@baesystems.com> wrote:
>
> > Hi Everyone,
> >
> > I'm trying to run this stream and I get the following error
> >
> > topic(topics,collection1,
> > q="classes:GXP/INDEX",fl="uuid",id="feed-8",initialCheckpoint=0,checkp
> > ointEvery=-1)
> >
> > {
> >   "result-set": {
> > "docs": [
> >   {
> > "EXCEPTION": "Slices not found for collection1",
> > "EOF": true,
> > "RESPONSE_TIME": 6
> >   }
> > ]
> >   }
> > }
> >
> > "collection1" is an alias. I can search using the alias perfectly
> > fine. In fact the search stream operation works fine with the alias.
> > It's just this topic one I've seen so far. Does anyone know why this is?
> >
> > Thanks!
> > Jonathan Nightingale
> >
> >
>


Re: using aliases in topic stream

2020-05-13 Thread Joel Bernstein
What version of Solr are you using? The topic stream in master seems to
have the code in place to query aliases.

Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, May 13, 2020 at 12:33 PM Nightingale, Jonathan A (US) <
jonathan.nighting...@baesystems.com> wrote:

> Hi Everyone,
>
> I'm trying to run this stream and I get the following error
>
> topic(topics,collection1,
> q="classes:GXP/INDEX",fl="uuid",id="feed-8",initialCheckpoint=0,checkpointEvery=-1)
>
> {
>   "result-set": {
> "docs": [
>   {
> "EXCEPTION": "Slices not found for collection1",
> "EOF": true,
> "RESPONSE_TIME": 6
>   }
> ]
>   }
> }
>
> "collection1" is an alias. I can search using the alias perfectly fine. In
> fact the search stream operation works fine with the alias. It's just this
> topic one I've seen so far. Does anyone know why this is?
>
> Thanks!
> Jonathan Nightingale
>
>


Re: facets & docValues

2020-05-07 Thread Joel Bernstein
You can be pretty sure that adding static warming queries will improve your
performance following softcommits. But, opening new searchers every 2
seconds may be too fast to allow for warming so you may need to adjust. As
a general rule you cannot open searchers faster than you can warm them.
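
For reference, a static warming query is just a newSearcher listener in
solrconfig.xml. A minimal sketch, assuming a facet field named "category"
(adjust the query parameters to your own facet fields), would be:

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="facet">true</str>
      <str name="facet.field">category</str>
    </lst>
  </arr>
</listener>

The same block under a firstSearcher event covers the very first searcher
after startup.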

Joel Bernstein
http://joelsolr.blogspot.com/


On Tue, May 5, 2020 at 5:54 PM Revas  wrote:

> Hi joel, No, we have not, we have softCommit requirement of 2 secs.
>
> On Tue, May 5, 2020 at 3:31 PM Joel Bernstein  wrote:
>
> > Have you configured static warming queries for the facets? This will warm
> > the cache structures for the facet fields. You just want to make sure your
> > commits are spaced far enough apart that the warming completes before a new
> > searcher starts warming.
> >
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> >
> > On Mon, May 4, 2020 at 10:27 AM Revas  wrote:
> >
> > > Hi Erick, thanks for the explanation and advice. With facet queries, does
> > > docValues help at all?
> > >
> > > 1) indexed=true, docValues=true =>  all facets
> > >
> > > 2)
> > >
> > >-  indexed=true , docValues=true => only for subfacets
>    - indexed=true, docValues=false => facet query
> > >- docValues=true, indexed=false=> term facets
> > >
> > >
> > >
> > > In case of 1 above => indexing slowed considerably; overall facet
> > > performance improved many fold
> > > In case of 2 => overall performance showed only slight
> > > improvement
> > >
> > > Does that mean turning on docValues even for facet query helps improve
> > the
> > > performance,  fetching from docValues for facet query is faster than
> > > fetching from stored fields ?
> > >
> > > Thanks
> > >
> > >
> > > On Thu, Apr 16, 2020 at 1:50 PM Erick Erickson <
> erickerick...@gmail.com>
> > > wrote:
> > >
> > > > DocValues should help when faceting over fields, i.e.
> facet.field=blah.
> > > >
> > > > I would expect docValues to help with sub facets too, but don’t know
> > > > the code well enough to say definitely one way or the other.
> > > >
> > > > The empirical approach would be to set “uninvertible=true” (Solr 7.6)
> > and
> > > > turn docValues off. What that means is that if any operation tries to
> > > > uninvert
> > > > the index on the Java heap, you’ll get an exception like:
> > > > "can not sort on a field w/o docValues unless it is indexed=true
> > > > uninvertible=true and the type supports Uninversion:”
> > > >
> > > > See SOLR-12962
> > > >
> > > > Speed is only one issue. The entire point of docValues is to not
> > > “uninvert”
> > > > the field on the heap. This used to lead to very significant memory
> > > > pressure. So when turning docValues off, you run the risk of
> > > > reverting back to the old behavior and having unexpected memory
> > > > consumption, not to mention slowdowns when the uninversion
> > > > takes place.
> > > >
> > > > Also, unless your documents are very large, this is a tiny corpus. It
> > can
> > > > be
> > > > quite hard to get realistic numbers, the signal gets lost in the
> noise.
> > > >
> > > > You should only shard when your individual query times exceed your
> > > > requirement. Say you have a 95%tile requirement of 1 second response
> > > time.
> > > >
> > > > Let’s further say that you can meet that requirement with 50
> > > > queries/second,
> > > > but when you get to 75 queries/second your response time exceeds your
> > > > requirements. Do NOT shard at this point. Add another replica
> instead.
> > > > Sharding adds inevitable overhead and should only be considered when
> > > > you can’t get adequate response time even under fairly light query
> > loads
> > > > as a general rule.
> > > >
> > > > Best,
> > > > Erick
> > > >
> > > > > On Apr 16, 2020, at 12:08 PM, Revas  wrote:
> > > > >
> > > > > Hi Erick, You are correct, we have only about 1.8M documents so far
> > and
> > > > > turning on the indexing on the facet fields helped improve the
> > timings
> > > of
> > > > >

Re: facets & docValues

2020-05-05 Thread Joel Bernstein
Have you configured static warming queries for the facets? This will warm
the cache structures for the facet fields. You just want to make sure your
commits are spaced far enough apart that the warming completes before a new
searcher starts warming.


Joel Bernstein
http://joelsolr.blogspot.com/


On Mon, May 4, 2020 at 10:27 AM Revas  wrote:

> Hi Erick, thanks for the explanation and advice. With facet queries, does
> docValues help at all?
>
> 1) indexed=true, docValues=true =>  all facets
>
> 2)
>
>-  indexed=true , docValues=true => only for subfacets
>    - indexed=true, docValues=false => facet query
>- docValues=true, indexed=false=> term facets
>
>
>
> In case of 1 above => indexing slowed considerably; overall facet
> performance improved many fold
> In case of 2 => overall performance showed only slight
> improvement
>
> Does that mean turning on docValues even for facet query helps improve the
> performance,  fetching from docValues for facet query is faster than
> fetching from stored fields ?
>
> Thanks
>
>
> On Thu, Apr 16, 2020 at 1:50 PM Erick Erickson 
> wrote:
>
> > DocValues should help when faceting over fields, i.e. facet.field=blah.
> >
> > I would expect docValues to help with sub facets too, but don’t know
> > the code well enough to say definitely one way or the other.
> >
> > The empirical approach would be to set “uninvertible=true” (Solr 7.6) and
> > turn docValues off. What that means is that if any operation tries to
> > uninvert
> > the index on the Java heap, you’ll get an exception like:
> > "can not sort on a field w/o docValues unless it is indexed=true
> > uninvertible=true and the type supports Uninversion:”
> >
> > See SOLR-12962
> >
> > Speed is only one issue. The entire point of docValues is to not
> “uninvert”
> > the field on the heap. This used to lead to very significant memory
> > pressure. So when turning docValues off, you run the risk of
> > reverting back to the old behavior and having unexpected memory
> > consumption, not to mention slowdowns when the uninversion
> > takes place.
> >
> > Also, unless your documents are very large, this is a tiny corpus. It can
> > be
> > quite hard to get realistic numbers, the signal gets lost in the noise.
> >
> > You should only shard when your individual query times exceed your
> > requirement. Say you have a 95%tile requirement of 1 second response
> time.
> >
> > Let’s further say that you can meet that requirement with 50
> > queries/second,
> > but when you get to 75 queries/second your response time exceeds your
> > requirements. Do NOT shard at this point. Add another replica instead.
> > Sharding adds inevitable overhead and should only be considered when
> > you can’t get adequate response time even under fairly light query loads
> > as a general rule.
> >
> > Best,
> > Erick
> >
> > > On Apr 16, 2020, at 12:08 PM, Revas  wrote:
> > >
> > > Hi Erick, You are correct, we have only about 1.8M documents so far and
> > > turning on the indexing on the facet fields helped improve the timings
> of
> > > the facet query a lot which has (sub facets and facet queries). So does
> > > docValues help at all for sub facets and facet query, our tests
> > > revealed further query time improvement when we turned off the
> docValues.
> > > is that the right approach?
> > >
> > > Currently we have only 1 shard and  we are thinking of scaling by
> > > increasing the number of shards when we see a deterioration on query
> > time.
> > > Any suggestions?
> > >
> > > Thanks.
> > >
> > >
> > > On Wed, Apr 15, 2020 at 8:21 AM Erick Erickson <
> erickerick...@gmail.com>
> > > wrote:
> > >
> > >> In a word, “yes”. I also suspect your corpus isn’t very big.
> > >>
> > >> I think the key is the facet queries. Now, I’m talking from
> > >> theory rather than diving into the code, but querying on
> > >> a docValues=true, indexed=false field is really doing a
> > >> search. And searching on a field like that is effectively
> > >> analogous to a table scan. Even if somehow an internal
> > >> structure would be constructed to deal with it, it would
> > >> probably be on the heap, where you don’t want it.
> > >>
> > >> So the test would be to take the queries out and measure
> > >> performance, but I think that’s the root issue here.
> > >>
> > >> Best,
> > >> Erick
> > >>
> > >>> On Apr 14, 2020, at 11:51 PM, Revas  wrote:
> > >>>
> > >>> We have faceting fields that have been defined as indexed=false,
> > >>> stored=false and docValues=true
> > >>>
> > >>> However we use a lot of subfacets  using  json facets and facet
> ranges
> > >>> using facet.queries. We see that after every soft-commit our
> > performance
> > >>> worsens and performs ideal between commits
> > >>>
> > >>> how is that docValue fields are affected by soft-commit and do we
> need
> > to
> > >>> enable indexing if we use subfacets and facet query to improve
> > >> performance?
> > >>>
> > >>> Tha
> > >>
> > >>
> >
> >
>


Re: a new CLI tool bin/postlogs

2020-03-29 Thread Joel Bernstein
As long as the data is loading you are fine I believe. We can create a
ticket to figure out that error, but it's not affecting the logic of the
load in any way.


Joel Bernstein
http://joelsolr.blogspot.com/


On Sun, Mar 29, 2020 at 2:29 AM Kayak28  wrote:

> Hello, Community:
>
> Thank you for replying.
>
>
> I ran it for a single log file, and then I could upload the solr.log file to
> the core...
> But I am still failing to load the class "org.slf4j.impl.StaticLoggerBinder".
> Should I download some jar file and configure it in some configuration
> file?
>
>
> Sincerely,
> Kaya Ota
>
>
>
> 2020年3月28日(土) 2:13 Joel Bernstein :
>
> > It looks like it's not finding any files. Here is the code that's failing:
> >
> >
> >
> https://github.com/apache/lucene-solr/blob/35d8e3de6d5931bfd6cba3221cfd0dca7f97c1a1/solr/core/src/java/org/apache/solr/util/SolrLogPostTool.java#L126
> >
> > A couple of things to note:
> >
> > postlogs should only be run on log files. So if there are different types
> > of files in the directory it's pointed to it will have unexpected
> behavior.
> > So you can run it on a single log file, or a directory containing only
> log
> > files.
> >
> >
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> >
> > On Fri, Mar 27, 2020 at 5:18 AM Kayak28  wrote:
> >
> > > Hello, Community:
> > >
> > > Thank you for releasing Solr 8.5.0, which contains several interesting
> > > tools.
> > >  Especially, bin/postlogs is an interesting one.
> > > So, I have tried to run it on my computer (not-production use) as the
> > > following.
> > >
> > > bin/postlogs http://localhost:8983/solr/logs ./server/logs/solr
> > >
> > > The result ended in:
> > >
> > > SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> > >
> > > SLF4J: Defaulting to no-operation (NOP) logger implementation
> > >
> > > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for
> > further
> > > details.
> > >
> > > Exception in thread "main" java.lang.NullPointerException
> > >
> > > at
> > >
> >
> org.apache.solr.util.SolrLogPostTool.gatherFiles(SolrLogPostTool.java:127)
> > >
> > > at org.apache.solr.util.SolrLogPostTool.main(SolrLogPostTool.java:65)
> > >
> > >
> > > Is there anything I have to do before running the postlogs command ?
> > >
> > > Sincerely,
> > > Kaya Ota
> > >
> > > --
> > >
> > > Sincerely,
> > > Kaya
> > > github: https://github.com/28kayak
> > >
> >
>
>
> --
>
> Sincerely,
> Kaya
> github: https://github.com/28kayak
>


Re: a new CLI tool bin/postlogs

2020-03-27 Thread Joel Bernstein
It looks like it's not finding any files. Here is the code that's failing:

https://github.com/apache/lucene-solr/blob/35d8e3de6d5931bfd6cba3221cfd0dca7f97c1a1/solr/core/src/java/org/apache/solr/util/SolrLogPostTool.java#L126

A couple of things to note:

postlogs should only be run on log files. So if there are different types
of files in the directory it's pointed to it will have unexpected behavior.
So you can run it on a single log file, or a directory containing only log
files.



Joel Bernstein
http://joelsolr.blogspot.com/


On Fri, Mar 27, 2020 at 5:18 AM Kayak28  wrote:

> Hello, Community:
>
> Thank you for releasing Solr 8.5.0, which contains several interesting
> tools.
>  Especially, bin/postlogs is an interesting one.
> So, I have tried to run it on my computer (not-production use) as the
> following.
>
> bin/postlogs http://localhost:8983/solr/logs ./server/logs/solr
>
> The result ended in:
>
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
>
> SLF4J: Defaulting to no-operation (NOP) logger implementation
>
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further
> details.
>
> Exception in thread "main" java.lang.NullPointerException
>
> at
> org.apache.solr.util.SolrLogPostTool.gatherFiles(SolrLogPostTool.java:127)
>
> at org.apache.solr.util.SolrLogPostTool.main(SolrLogPostTool.java:65)
>
>
> Is there anything I have to do before running the postlogs command ?
>
> Sincerely,
> Kaya Ota
>
> --
>
> Sincerely,
> Kaya
> github: https://github.com/28kayak
>


Re: Stream InnerJoin to merge hierarchal data

2020-02-07 Thread Joel Bernstein
This is working as designed, I believe. The issue is that innerJoin relies on
the sort order of the streams in order to perform a streaming merge join. The
first join works because the sorts line up on childId.

  innerJoin(search(collection_name,
                   q="type:grandchild",
                   qt="/export",
                   fl="grandchild.name, grandId, childId, parentId",
                   sort="childId asc"),
            search(collection_name,
                   q="type:child",
                   qt="/export",
                   fl="child.name, childId, parentId",
                   sort="childId asc")

The second join, though, is attempting to join on parentId, but the sorts do
not allow that, as one of the incoming streams is sorted on childId.

One possible solution is to use fetch to retrieve the parent for the child:
https://lucene.apache.org/solr/guide/8_0/stream-decorator-reference.html#fetch
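
A rough sketch of that approach, reusing the collection and field names from
your example (untested against your schema, so treat it as illustrative):

fetch(collection_name,
      innerJoin(search(collection_name, q="type:grandchild", qt="/export",
                       fl="grandchild.name, grandId, childId, parentId",
                       sort="childId asc"),
                search(collection_name, q="type:child", qt="/export",
                       fl="child.name, childId, parentId",
                       sort="childId asc"),
                on="childId"),
      fl="parent.name",
      on="parentId=parentId")

fetch enriches each joined tuple with the parent fields rather than adding a
third merge join, so the childId sort order can stay as it is.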


Joel Bernstein
http://joelsolr.blogspot.com/


On Fri, Feb 7, 2020 at 2:23 PM sambasivarao giddaluri <
sambasiva.giddal...@gmail.com> wrote:

> Hi All,
>
> Our dataset is 50M records and we are using a complex graph query, and now we
> are trying to do an innerJoin on the records and facing the below issue.
> This is a critical issue.
>
> Parent
> {
> parentId:"1"
> parent.name:"foo"
> type:"parent"
>
> }
> Child
> {
> childId:"2"
> parentId:"1"
> child.name:"bar"
> type:"child"
> }
> GrandChild
> {
> grandId:"3"
> childId:"2"
> parentId:"1"
> grandchild.name:"too"
> type:"grandchild"
> }
>
> innerJoin(search(collection_name, q="type:grandchild", qt="/export", fl="
> grandchild.name,grandId,childId,parentId", sort="childId asc"),
> search(collection_name, q="type:child", qt="/export",
> fl="child.name,childId,parentId",
> sort="childId asc"),
> on="childId")
>
> this works and gives result
> {
> "parentId": "1",
> "childId": "2",
> "grandId: "3",
> "grandchild.name": "too",
> "child.name": "bar"
>  }
>
> but if I try to join the parent as well with another innerJoin, this gives
> an error
>
> innerJoin(
> innerJoin(search(collection_name, q="type:grandchild", qt="/export", fl="
> grandchild.name,grandId,childId,parentId", sort="childId asc"),
> search(collection_name, q="type:child", qt="/export",
> fl="child.name,childId,parentId",
> sort="childId asc"),
> on="childId"),
> search(collection_name, q="type:parent", qt="/export", fl="parent.name,
> parentId", sort="parentId asc"),on="parentId")
>
> ERROR
> {
>   "result-set": {
> "docs": [
>   {
> "EXCEPTION": "Invalid JoinStream - all incoming stream comparators
> (sort) must be a superset of this stream's equalitor.",
> "EOF": true
>   }
> ]
>   }
> }
>
>
> If we change the key parentId in child doc to childParentId and similarly
> childId,parentId in grandchild doc to grandchildId,grandParentId then query
> will work but this is a big change in schema..
> i also refered this issue https://issues.apache.org/jira/browse/SOLR-10512
>
> Thanks
> sam
>


Re: Bug in scoreNodes function of streaming expressions?

2020-01-29 Thread Joel Bernstein
Here is the ticket:
https://issues.apache.org/jira/browse/SOLR-14231


Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, Jan 29, 2020 at 10:03 AM Joel Bernstein  wrote:

> Hi Pratik,
>
> I'll create the ticket now and report back. If you've got a fix please
> post it to the ticket and I'll try to get this in for the next release.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Tue, Jan 28, 2020 at 11:52 AM pratik@semandex 
> wrote:
>
>> Joel Bernstein wrote
>> > Ok, that sounds like a bug. I can create a ticket for this.
>> >
>> > On Mon, Jul 1, 2019 at 5:57 PM Pratik Patel 
>>
>> > pratik@
>>
>> >  wrote:
>> >
>> >> I think the problem was that my streaming expression was always
>> returning
>> >> just one node. When I added more data so that I can have more than one
>> >> node, I started seeing the result.
>> >>
>> >> On Mon, Jul 1, 2019 at 11:21 AM Pratik Patel 
>>
>> > pratik@
>>
>> >  wrote:
>> >>
>> >>> Hello Everyone,
>> >>>
>> >>> I am trying to execute following streaming expression with
>> "scoreNodes"
>> >>> function in it. This is taken from the documentation.
>> >>>
>> >>> scoreNodes(top(n="50",
>> >>>sort="count(*) desc",
>> >>>nodes(baskets,
>> >>>  random(baskets, q="productID:ABC",
>> >>> fl="basketID", rows="500"),
>> >>>  walk="basketID->basketID",
>> >>>  fq="-productID:ABC",
>> >>>  gather="productID",
>> >>>  count(*
>> >>>
>> >>> I have ensured that I have the collection and data present for it.
>> >>> Upon executing this, I am getting an error message as follows.
>> >>>
>> >>> "No collection param specified on request and no default collection
>> has
>> >>> been set: []"
>> >>>
>> >>> Upon digging into the source code I found that there is a possible bug
>> >>> in
>> >>> ScoreNodesStream.java
>> >>>
>> >>> StringBuilder instance is never appended any string and the block
>> which
>> >>> initializes collection, needs the length of that instance to be more
>> >>> than
>> >>> zero. This condition will always be false and hence the collection
>> will
>> >>> never be set.
>> >>>
>> >>> I checked this file in solr version 8.1 and that also has the same
>> >>> issue.
>> >>> Is there any JIRA open for this or any patch available?
>> >>>
>> >>> [image: image.png]
>> >>>
>> >>> Thanks,
>> >>> Pratik
>> >>>
>> >>
>>
>>
>> Hi Joel,
>>
>> You mentioned creating a ticket for this bug, I can't find any, was it
>> created? If not then I can create one. Currently, ScoreNodes has two
>> issues.
>>
>> 1. It fails when result has only one node.
>> 2. It triggers a GET request instead of POST. GET fails if number of nodes
>> is large.
>>
>> I have been using a custom class as workaround for #2, it would be good to
>> use the original SolrJ class.
>>
>> Thanks,
>> Pratik
>>
>>
>>
>> --
>> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>>
>


Re: Bug in scoreNodes function of streaming expressions?

2020-01-29 Thread Joel Bernstein
Hi Pratik,

I'll create the ticket now and report back. If you've got a fix please post
it to the ticket and I'll try to get this in for the next release.

Joel Bernstein
http://joelsolr.blogspot.com/


On Tue, Jan 28, 2020 at 11:52 AM pratik@semandex 
wrote:

> Joel Bernstein wrote
> > Ok, that sounds like a bug. I can create a ticket for this.
> >
> > On Mon, Jul 1, 2019 at 5:57 PM Pratik Patel 
>
> > pratik@
>
> >  wrote:
> >
> >> I think the problem was that my streaming expression was always
> returning
> >> just one node. When I added more data so that I can have more than one
> >> node, I started seeing the result.
> >>
> >> On Mon, Jul 1, 2019 at 11:21 AM Pratik Patel 
>
> > pratik@
>
> >  wrote:
> >>
> >>> Hello Everyone,
> >>>
> >>> I am trying to execute following streaming expression with "scoreNodes"
> >>> function in it. This is taken from the documentation.
> >>>
> >>> scoreNodes(top(n="50",
> >>>sort="count(*) desc",
> >>>nodes(baskets,
> >>>  random(baskets, q="productID:ABC",
> >>> fl="basketID", rows="500"),
> >>>  walk="basketID->basketID",
> >>>  fq="-productID:ABC",
> >>>  gather="productID",
> >>>  count(*
> >>>
> >>> I have ensured that I have the collection and data present for it.
> >>> Upon executing this, I am getting an error message as follows.
> >>>
> >>> "No collection param specified on request and no default collection has
> >>> been set: []"
> >>>
> >>> Upon digging into the source code I found that there is a possible bug
> >>> in
> >>> ScoreNodesStream.java
> >>>
> >>> StringBuilder instance is never appended any string and the block which
> >>> initializes collection, needs the length of that instance to be more
> >>> than
> >>> zero. This condition will always be false and hence the collection will
> >>> never be set.
> >>>
> >>> I checked this file in solr version 8.1 and that also has the same
> >>> issue.
> >>> Is there any JIRA open for this or any patch available?
> >>>
> >>> [image: image.png]
> >>>
> >>> Thanks,
> >>> Pratik
> >>>
> >>
>
>
> Hi Joel,
>
> You mentioned creating a ticket for this bug, I can't find any, was it
> created? If not then I can create one. Currently, ScoreNodes has two
> issues.
>
> 1. It fails when result has only one node.
> 2. It triggers a GET request instead of POST. GET fails if number of nodes
> is large.
>
> I have been using a custom class as workaround for #2, it would be good to
> use the original SolrJ class.
>
> Thanks,
> Pratik
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: SQL selectable fields

2020-01-24 Thread Joel Bernstein
Does "_nest_path_" come back in a normal search? I would expect that the
fields that are returned by normal searches would also work in SQL. If that
turns out to be the case you could derive the fields from performing a
search and seeing what fields are returned.
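
For example, a plain query with fl=* against the same collection (a sketch
using the films collection from your query) shows which fields actually come
back and are therefore safe to reference in SQL:

curl "http://localhost:8983/solr/films/select?q=*:*&rows=1&fl=*"

Comparing that with the Schema API output should show which declared fields
never appear in returned documents.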


Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Jan 23, 2020 at 3:02 PM Nick Vercammen 
wrote:

> Hey All,
>
> is there a way to get a list of all fields in a collection that can be used
> in an SQL query? Currently I retrieve a list of fields through the schema
> api: GET col/schema/fields.
>
> This returns all fields in a collection. But when I do a select on all
> fields I get an exception because apparently _nest_path_ is no column in
> the collection table:
>
> Failed to execute sqlQuery 'SELECT  films._text_ AS text, films._nest_path_
> FROM films LIMIT 2000' against JDBC connection 'jdbc:calcitesolr:'.
> Error while executing SQL "SELECT  films._text_ AS text, films._nest_path_
> FROM films LIMIT 2000": From line 1, column 37 to line 1, column 47: Column
> '_nest_path_' not found in table 'films'
>
> Can I determine which fields can be used in a SQL query? By means of the
> type?
>
> kind regards,
>
> Nick
>


Re: Select with concat function not working in 8.4.1

2020-01-24 Thread Joel Bernstein
The concat function was changed to an evaluator. An example of the new
syntax is here:
https://github.com/apache/lucene-solr/blob/visual-guide/solr/solr-ref-guide/src/loading.adoc#unique-ids

Sorry for the confusion; the language should be settling down now, with very
few of these types of changes happening going forward.

Joel Bernstein
http://joelsolr.blogspot.com/


On Fri, Jan 24, 2020 at 5:16 AM Guilherme Nunes <
guilherme.nu...@biologis.com> wrote:

> Greetings.
>
> A follow-up to the below with my findings.
>
> The problem seems to be that the binding of “concat” on the
> StreamFactory.functionNames maps to ConcatEvaluator in 8.4.1, while in 8.2
> it mapped to ConcatOperation.
> The mapping is defined in the file Lang.java
>
> Branch_8_4:
>
> Branch_8_2:
> The workaround to achieve a functioning move from 8.2 to 8.4.1, in our
> case, was to override the mapping of concat in the solrconfig.xml with
> <expressible name="concat" class="org.apache.solr.client.solrj.io.ops.ConcatOperation" />
>
> Kind regards,
> Guilherme Nunes
>
> On 20 Jan 2020, at 11:38, Guilherme Nunes 
> wrote:
>
> Greetings.
>
> In upgrading to solr 8.4.1, the following streaming expression is not
> working for me:
>
> select(
> cartesianProduct(
> tuple(k1="1", k2=array(a)),
> k2, productSort="k1 asc"),
> "k1”,
> concat(fields="k1",delim=",",as="node")  )
>
> Returning “"Invalid expression
> select(cartesianProduct(tuple(k1=1,k2=array(a)),k2,productSort=\”k1
> asc\"),\"k1\",concat(fields=k1,delim=\",\",as=node)) - unknown operands
> found"”.
>
> But it works fine in 8.2.0.
>
> Am I missing something, or should a jira ticket be opened for this?
>
> Thank you
>
> Kind regards,
> Guilherme Nunes
>
>
>


Re: JSON Facet doesn't allow date range facets

2019-12-12 Thread Joel Bernstein
Searching on a single point in time would be extremely limiting because it
has millisecond precision. So range queries would be the only real way to
search the DatePointField.

I've used this construct many times on the DatePointField:

[2000-05-01T00:00:01Z TO 2019-06-02T00:00:01Z]


Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Dec 12, 2019 at 7:19 AM Mel Mason 
wrote:

> I need to store date ranges in the index. While DatePointField can be
> queried using date ranges, the actual value can only be a single date -
> it can't represent a range of dates as far as I know.
>
> On 12/12/2019 12:11, Joel Bernstein wrote:
> > So something like this should work:
> >
> > [2000-05-01T00:00:01Z TO 2019-06-02T00:00:01Z]
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> >
> > On Thu, Dec 12, 2019 at 7:08 AM Joel Bernstein 
> wrote:
> >
> >> With the DatePointField you can still do the range query, but I believe
> >> you'll need to specify the full ISO date string: 2000-05-01T01:01:01Z
> >>
> >> Joel Bernstein
> >> http://joelsolr.blogspot.com/
> >>
> >>
> >> On Thu, Dec 12, 2019 at 6:46 AM Mel Mason 
> >> wrote:
> >>
> >>> Unfortunately I need a date range field, e.g [2000-05-01 TO
> 2019-06-02].
> >>> DatePointFields can't represent that as far as I know.
> >>>
> >>> On 12/12/2019 11:40, Joel Bernstein wrote:
> >>>> There is a field type in the schema called pdate:
> >>>>
> >>>> 
> >>>>
> >>>> This should work for you.
> >>>>
> >>>> The timeseries Streaming Expression uses the JSON facet API for range
> >>>> faceting and works really well.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> Joel Bernstein
> >>>> http://joelsolr.blogspot.com/
> >>>>
> >>>>
> >>>> On Thu, Dec 12, 2019 at 6:28 AM Mel Mason <
> mel.ma...@bodleian.ox.ac.uk>
> >>>> wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> I'm trying to have a range facet on a field of type
> >>> solr.DateRangeField.
> >>>>> As far as I can tell, this isn't possible with JSONFacet, only with
> the
> >>>>> old facet system - a quick google turns up several other people with
> >>> the
> >>>>> same problem. When using JSONFacet I get problems with this line of
> >>>>> code:
> >>>>>
> >>>>>
> >>>
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/facet/FacetRange.java#L255
> >>>>> It looks like during the change to JSONFacet the range facets have
> been
> >>>>> restricted to only allow Trie or PointField fields. Is this likely to
> >>> be
> >>>>> fixed in future updates, or were there problems with using
> >>>>> DateRangeFields? I could use the old parameter facet system, but
> there
> >>>>> are features in JSONFacet I need.
> >>>>>
> >>>>> For completeness, the error I get:
> >>>>>
> >>>>> org.apache.solr.common.SolrException: Unable to range facet on
> >>>>> field:date_dtr
> >>>>>   at
> >>>>>
> >>>
> org.apache.solr.search.facet.FacetRangeProcessor.getCalcForField(FacetRange.java:238)
> >>>>>   at
> >>>>>
> >>>
> org.apache.solr.search.facet.FacetRangeProcessor.(FacetRange.java:122)
> >>>>>   at
> >>>>>
> >>>
> org.apache.solr.search.facet.FacetRange.createFacetProcessor(FacetRange.java:65)
> >>>>>   at
> >>>>>
> >>>
> org.apache.solr.search.facet.FacetRequest.process(FacetRequest.java:397)
> >>>>>   at
> >>>>>
> >>>
> org.apache.solr.search.facet.FacetProcessor.processSubs(FacetProcessor.java:475)
> >>>>>   at
> >>>>>
> >>>
> org.apache.solr.search.facet.FacetProcessor.fillBucket(FacetProcessor.java:432)
> >>>>>   at
> >>>>>
> >>>
> org.apache.solr.search.facet.FacetQueryProcessor.process(FacetQuery.java:64)
> >>>>>   at
> &g

Re: JSON Facet doesn't allow date range facets

2019-12-12 Thread Joel Bernstein
So something like this should work:

[2000-05-01T00:00:01Z TO 2019-06-02T00:00:01Z]

Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Dec 12, 2019 at 7:08 AM Joel Bernstein  wrote:

> With the DatePointField you can still do the range query, but I believe
> you'll need to specify the full ISO date string: 2000-05-01T01:01:01Z
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Thu, Dec 12, 2019 at 6:46 AM Mel Mason 
> wrote:
>
>> Unfortunately I need a date range field, e.g [2000-05-01 TO 2019-06-02].
>> DatePointFields can't represent that as far as I know.
>>
>> On 12/12/2019 11:40, Joel Bernstein wrote:
>> > There is a field type in the schema called pdate:
>> >
>> > 
>> >
>> > This should work for you.
>> >
>> > The timeseries Streaming Expression uses the JSON facet API for range
>> > faceting and works really well.
>> >
>> >
>> >
>> >
>> > Joel Bernstein
>> > http://joelsolr.blogspot.com/
>> >
>> >
>> > On Thu, Dec 12, 2019 at 6:28 AM Mel Mason 
>> > wrote:
>> >
>> >> Hi,
>> >>
>> >> I'm trying to have a range facet on a field of type
>> solr.DateRangeField.
>> >> As far as I can tell, this isn't possible with JSONFacet, only with the
>> >> old facet system - a quick google turns up several other people with
>> the
>> >> same problem. When using JSONFacet I get problems with this line of
>> >> code:
>> >>
>> >>
>> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/facet/FacetRange.java#L255
>> >>
>> >> It looks like during the change to JSONFacet the range facets have been
>> >> restricted to only allow Trie or PointField fields. Is this likely to
>> be
>> >> fixed in future updates, or were there problems with using
>> >> DateRangeFields? I could use the old parameter facet system, but there
>> >> are features in JSONFacet I need.
>> >>
>> >> For completeness, the error I get:
>> >>
>> >> org.apache.solr.common.SolrException: Unable to range facet on
>> >> field:date_dtr
>> >>  at
>> >>
>> org.apache.solr.search.facet.FacetRangeProcessor.getCalcForField(FacetRange.java:238)
>> >>  at
>> >>
>> org.apache.solr.search.facet.FacetRangeProcessor.(FacetRange.java:122)
>> >>  at
>> >>
>> org.apache.solr.search.facet.FacetRange.createFacetProcessor(FacetRange.java:65)
>> >>  at
>> >>
>> org.apache.solr.search.facet.FacetRequest.process(FacetRequest.java:397)
>> >>  at
>> >>
>> org.apache.solr.search.facet.FacetProcessor.processSubs(FacetProcessor.java:475)
>> >>  at
>> >>
>> org.apache.solr.search.facet.FacetProcessor.fillBucket(FacetProcessor.java:432)
>> >>  at
>> >>
>> org.apache.solr.search.facet.FacetQueryProcessor.process(FacetQuery.java:64)
>> >>  at
>> >>
>> org.apache.solr.search.facet.FacetRequest.process(FacetRequest.java:401)
>> >>  at
>> >> org.apache.solr.search.facet.FacetModule.process(FacetModule.java:146)
>> >>  at
>> >>
>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:298)
>> >>  at
>> >>
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
>> >>  at org.apache.solr.core.SolrCore.execute(SolrCore.java:2566)
>> >>  at
>> >> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:756)
>> >>  at
>> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:542)
>> >>  at
>> >>
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:397)
>> >>  at
>> >>
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
>> >>  at
>> >>
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
>> >>  at
>> >>
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
>> >>  at
>> >>
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
>> >>  at
>> >>
>>

Re: JSON Facet doesn't allow date range facets

2019-12-12 Thread Joel Bernstein
With the DatePointField you can still do the range query, but I believe
you'll need to specify the full ISO date string: 2000-05-01T01:01:01Z

Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Dec 12, 2019 at 6:46 AM Mel Mason 
wrote:

> Unfortunately I need a date range field, e.g [2000-05-01 TO 2019-06-02].
> DatePointFields can't represent that as far as I know.
>
> On 12/12/2019 11:40, Joel Bernstein wrote:
> > There is a field type in the schema called pdate:
> >
> > 
> >
> > This should work for you.
> >
> > The timeseries Streaming Expression uses the JSON facet API for range
> > faceting and works really well.
> >
> >
> >
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> >
> > On Thu, Dec 12, 2019 at 6:28 AM Mel Mason 
> > wrote:
> >
> >> Hi,
> >>
> >> I'm trying to have a range facet on a field of type solr.DateRangeField.
> >> As far as I can tell, this isn't possible with JSONFacet, only with the
> >> old facet system - a quick google turns up several other people with the
> >> same problem. When using JSONFacet I get problems with this line of
> >> code:
> >>
> >>
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/facet/FacetRange.java#L255
> >>
> >> It looks like during the change to JSONFacet the range facets have been
> >> restricted to only allow Trie or PointField fields. Is this likely to be
> >> fixed in future updates, or were there problems with using
> >> DateRangeFields? I could use the old parameter facet system, but there
> >> are features in JSONFacet I need.
> >>
> >> For completeness, the error I get:
> >>
> >> org.apache.solr.common.SolrException: Unable to range facet on
> >> field:date_dtr
> >>  at
> >>
> org.apache.solr.search.facet.FacetRangeProcessor.getCalcForField(FacetRange.java:238)
> >>  at
> >>
> org.apache.solr.search.facet.FacetRangeProcessor.(FacetRange.java:122)
> >>  at
> >>
> org.apache.solr.search.facet.FacetRange.createFacetProcessor(FacetRange.java:65)
> >>  at
> >> org.apache.solr.search.facet.FacetRequest.process(FacetRequest.java:397)
> >>  at
> >>
> org.apache.solr.search.facet.FacetProcessor.processSubs(FacetProcessor.java:475)
> >>  at
> >>
> org.apache.solr.search.facet.FacetProcessor.fillBucket(FacetProcessor.java:432)
> >>  at
> >>
> org.apache.solr.search.facet.FacetQueryProcessor.process(FacetQuery.java:64)
> >>  at
> >> org.apache.solr.search.facet.FacetRequest.process(FacetRequest.java:401)
> >>  at
> >> org.apache.solr.search.facet.FacetModule.process(FacetModule.java:146)
> >>  at
> >>
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:298)
> >>  at
> >>
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
> >>  at org.apache.solr.core.SolrCore.execute(SolrCore.java:2566)
> >>  at
> >> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:756)
> >>  at
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:542)
> >>  at
> >>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:397)
> >>  at
> >>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
> >>  at
> >>
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
> >>  at
> >>
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
> >>  at
> >>
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
> >>  at
> >>
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> >>  at
> >>
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> >>  at
> >>
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
> >>  at
> >>
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1588)
> >>  at
> >>
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
> >>  at
> >>
> org.eclipse.jetty.server.h

Re: JSON Facet doesn't allow date range facets

2019-12-12 Thread Joel Bernstein
There is a field type in the schema called pdate:



This should work for you.

The timeseries Streaming Expression uses the JSON facet API for range
faceting and works really well.
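
As a sketch, a JSON Facet range facet over a pdate field looks like this (the
field name here is only an example):

json.facet={
  "dates": {
    "type": "range",
    "field": "publish_date",
    "start": "2000-01-01T00:00:00Z",
    "end": "2020-01-01T00:00:00Z",
    "gap": "+1YEAR"
  }
}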




Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Dec 12, 2019 at 6:28 AM Mel Mason 
wrote:

> Hi,
>
> I'm trying to have a range facet on a field of type solr.DateRangeField.
> As far as I can tell, this isn't possible with JSONFacet, only with the
> old facet system - a quick google turns up several other people with the
> same problem. When using JSONFacet I get problems with this line of
> code:
>
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/facet/FacetRange.java#L255
>
> It looks like during the change to JSONFacet the range facets have been
> restricted to only allow Trie or PointField fields. Is this likely to be
> fixed in future updates, or were there problems with using
> DateRangeFields? I could use the old parameter facet system, but there
> are features in JSONFacet I need.
>
> For completeness, the error I get:
>
> org.apache.solr.common.SolrException: Unable to range facet on
> field:date_dtr
> at
> org.apache.solr.search.facet.FacetRangeProcessor.getCalcForField(FacetRange.java:238)
> at
> org.apache.solr.search.facet.FacetRangeProcessor.(FacetRange.java:122)
> at
> org.apache.solr.search.facet.FacetRange.createFacetProcessor(FacetRange.java:65)
> at
> org.apache.solr.search.facet.FacetRequest.process(FacetRequest.java:397)
> at
> org.apache.solr.search.facet.FacetProcessor.processSubs(FacetProcessor.java:475)
> at
> org.apache.solr.search.facet.FacetProcessor.fillBucket(FacetProcessor.java:432)
> at
> org.apache.solr.search.facet.FacetQueryProcessor.process(FacetQuery.java:64)
> at
> org.apache.solr.search.facet.FacetRequest.process(FacetRequest.java:401)
> at
> org.apache.solr.search.facet.FacetModule.process(FacetModule.java:146)
> at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:298)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2566)
> at
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:756)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:542)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:397)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1588)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
> at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1557)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
> at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> at
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> at org.eclipse.jetty.server.Server.handle(Server.java:502)
> at
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:364)

Re: Solr Admin Console hangs on Chrome

2019-12-10 Thread Joel Bernstein
Did a recent change to Chrome cause this?

In Solr 8x, I'm not seeing slowness with Chrome on Mac.



Joel Bernstein
http://joelsolr.blogspot.com/


On Tue, Dec 10, 2019 at 8:26 PM SAGAR INGALE 
wrote:

> I am also facing the same issue for v6.4.0
>
> On Wed, 11 Dec, 2019, 5:37 AM Joel Bernstein,  wrote:
>
> > What version of Solr?
> >
> >
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> >
> > On Tue, Dec 10, 2019 at 5:58 PM Arnold Bronley 
> > wrote:
> >
> > > I am also facing similar issue. I have also switched to other browsers
> to
> > > solve this issue.
> > >
> > > On Tue, Dec 10, 2019 at 2:22 PM Webster Homer <
> > > webster.ho...@milliporesigma.com> wrote:
> > >
> > > > It seems like the Solr Admin console has become slow when you use it
> on
> > > > the chrome browser. If I go to the query tab and execute a query,
> even
> > > the
> > > > default *:* after that the browser window becomes very slow.
> > > > I'm using chrome Version 78.0.3904.108 (Official Build) (64-bit) on
> > > Windows
> > > >
> > > > The work around is to use Firefox
> > > >
> > > >
> > >
> >
>


Re: Solr Admin Console hangs on Chrome

2019-12-10 Thread Joel Bernstein
What version of Solr?



Joel Bernstein
http://joelsolr.blogspot.com/


On Tue, Dec 10, 2019 at 5:58 PM Arnold Bronley 
wrote:

> I am also facing similar issue. I have also switched to other browsers to
> solve this issue.
>
> On Tue, Dec 10, 2019 at 2:22 PM Webster Homer <
> webster.ho...@milliporesigma.com> wrote:
>
> > It seems like the Solr Admin console has become slow when you use it on
> > the chrome browser. If I go to the query tab and execute a query, even
> the
> > default *:* after that the browser window becomes very slow.
> > I'm using chrome Version 78.0.3904.108 (Official Build) (64-bit) on
> Windows
> >
> > The work around is to use Firefox
> >
> >
>


Re: How do I add my own Streaming Expressions?

2019-11-20 Thread Joel Bernstein
Yeah, this is not documented. Here are two links that will be helpful:

https://issues.apache.org/jira/browse/SOLR-9103

Slide 40 shows the solrconfig.xml approach to registering new streams:
https://www.slideshare.net/lucidworks/creating-new-streams-presented-by-dennis-gove-bloomberg-lp
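
If I'm remembering the SOLR-9103 mechanics correctly, the registration itself
is a one-line plugin in solrconfig.xml; the class name below is just a
placeholder for your own stream implementation:

<expressible name="myFunction" class="com.example.MyFunctionStream"/>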



Joel Bernstein
http://joelsolr.blogspot.com/


On Tue, Nov 19, 2019 at 3:04 PM Eric Pugh 
wrote:

> The documentation in the StreamHandler suggests adding into Solrconfig
> some streamFunctions:
>
>  * 
>  *   name="group">org.apache.solr.client.solrj.io.stream.ReducerStream
>  *   name="count">org.apache.solr.client.solrj.io.stream.RecordCountStream
>  * 
>
>
>
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/handler/StreamHandler.java#L114
>
> What is happening in StreamHandler doesn’t seem to be working, however in
> the similar GraphHandler, there is a call to “streamFunctions”:
>
>
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/handler/GraphHandler.java#L90
>
> I’m still debugging this…
>
> Eric
>
>
>
> > On Nov 15, 2019, at 9:43 PM, Eric Pugh 
> wrote:
> >
> > What is the process for adding new Streaming Expressions?
> >
> > It appears that the org.apache.solr.client.solrj.io.Lang method
> statically loads all the streaming expressions?
> >
> > Eric
> >
> > ___
> > Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 |
> http://www.opensourceconnections.com <
> http://www.opensourceconnections.com/> | My Free/Busy <
> http://tinyurl.com/eric-cal>
> > Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <
> https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
>
> > This e-mail and all contents, including attachments, is considered to be
> Company Confidential unless explicitly stated otherwise, regardless of
> whether attachments are marked as such.
> >
>
> ___
> Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 |
> http://www.opensourceconnections.com <
> http://www.opensourceconnections.com/> | My Free/Busy <
> http://tinyurl.com/eric-cal>
> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <
> https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
>
> This e-mail and all contents, including attachments, is considered to be
> Company Confidential unless explicitly stated otherwise, regardless of
> whether attachments are marked as such.
>
>


Re: Solr 8.3 Solrj streaming expressions do not return all field values

2019-11-05 Thread Joel Bernstein
I'll probably need some more details. One thing that's useful is to look at
the logs and see the underlying Solr queries that are generated. Then try
those underlying queries against the Solr index and see what comes back. If
you're not seeing the fields with the plain Solr queries then we know it's
something going on below streaming expressions. If you are seeing the
fields then it's the expressions themselves that are not handling the data
as expected.
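
As a rough sketch of that check: the log entry will contain an /export (or /select) request that can be replayed directly against the collection; the collection and field names below are placeholders, not taken from this thread:

curl "http://localhost:8983/solr/yourCollection/export?q=*:*&fl=id,fieldA&sort=fieldA+asc&wt=json"

If the stored fields come back from the raw request but not from the expression, the problem is in the expression handling rather than in the index.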


Joel Bernstein
http://joelsolr.blogspot.com/


On Mon, Nov 4, 2019 at 9:09 AM Jörn Franke  wrote:

> Most likely this issue can bei also reproduced in the admin UI for the
> streaming handler of a collection.
>
> > Am 04.11.2019 um 13:32 schrieb Jörn Franke :
> >
> > Hi,
> >
> > I use streaming expressions, e.g.
> > Sort(Select(search(...),id,if(eq(1,1),Y,N) as found), by=“field A asc”)
> > (Using export handler, sort is not really mandatory , I will remove it
> later anyway)
> >
> > This works perfectly fine if I use Solr 8.2.0 (server + client). It
> returns Tuples in the form { “id”,”12345”, “found”:”Y”}
> >
> > However, if I use Solr 8.2.0 as server and Solr 8.3.0 as client then the
> above statement only returns the id field, but not the found field.
> >
> > Questions:
> > 1) is this expected behavior, ie Solr client 8.3.0 is in this case not
> compatible with Solr 8.2.0 and server upgrade to Solr 8.3.0 will fix this?
> > 2) has the syntax for the above expression changed? If so how?
> > 3) is this not expected behavior and I should create a Jira for it?
> >
> > Thank you.
> > Best regards
>


Re: [CAUTION] Converting graph query to stream graph query

2019-10-18 Thread Joel Bernstein
I believe we were debugging why graph results were not being returned in a
different thread. It looks like the same problem.

Is your Solr instance a straight install or have you moved config files
from an older version of Solr to a newer version?

Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, Oct 16, 2019 at 1:09 AM Natarajan, Rajeswari <
rajeswari.natara...@sap.com> wrote:

> I need to gather all the children of docid  1 . Root item has parent as
> null. (Sample data below)
>
> Tried as below
>
> nodes(graphtest,
>   walk="1->parent",
>   gather="docid",
>   scatter="branches, leaves")
>
> Response :
> {
>   "result-set": {
> "docs": [
>   {
> "node": "1",
> "collection": "graphtest,",
> "field": "node",
> "level": 0
>   },
>   {
> "EOF": true,
> "RESPONSE_TIME": 5
>   }
> ]
>   }
> }
>
> Query just gets the  root item and not it's children. Looks like I am
> missing something obvious . Any pointers , please.
>
> As I said earlier the below graph query gets all the children of docid 1.
>
> fq={!graph from=parent to=docid}docid:"1"
>
> Thanks,
> Rajeswari
>
>
>
> On 10/15/19, 12:04 PM, "Natarajan, Rajeswari" <
> rajeswari.natara...@sap.com> wrote:
>
> Hi,
>
>
> curl -XPOST -H 'Content-Type: application/json' '
> http://localhost:8983/solr/ggg/update' --data-binary '{
> "add" : { "doc" : { "id" : "a", "docid" : "1", "name" : "Root document
> one" } },
> "add" : { "doc" : { "id" : "b", "docid" : "2", "name" : "Root document
> two" } },
> "add" : { "doc" : {  "id" : "c", "docid" : "3", "name" : "Root
> document three" } },
> "add" : { "doc" : {  "id" : "d", "docid" : "11", "parent" : "1",
> "name" : "First level document 1, child one" } },
> "add" : { "doc" : {  "id" : "e", "docid" : "12", "parent" : "1",
> "name" : "First level document 1, child two" } },
> "add" : { "doc" : {  "id" : "f", "docid" : "13", "parent" : "1",
> "name" : "First level document 1, child three" } },
> "add" : { "doc" : {  "id" : "g", "docid" : "21", "parent" : "2",
> "name" : "First level document 2, child one" } },
> "add" : { "doc" : {  "id" : "h", "docid" : "22", "parent" : "2",
> "name" : "First level document 2, child two" } },
> "add" : { "doc" : {  "id" : "j", "docid" : "121", "parent" : "12",
> "name" : "Second level document 12, child one" } },
> "add" : { "doc" : {  "id" : "k", "docid" : "122", "parent" : "12",
> "name" : "Second level document 12, child two" } },
> "add" : { "doc" : {  "id" : "l", "docid" : "131", "parent" : "13",
> "name" : "Second level document 13, child three" } },
> "commit" : {}
> }'
>
>
> For the above data , the below query gets all the children of document
> with docid 1.
>
>
> http://localhost:8983/solr/graphtest/select?q=*:*&fq={!graph%20from=parent%20to=docid}docid:"1"
>
>
> How can I convert this query into streaming graph query with nodes
> expression.
>
> Thanks,
> Rajeswari
>
>
>
>


Re: Help with Stream Graph

2019-10-18 Thread Joel Bernstein
The query that is created looks good to me, but it returns no
results. Let's just do a basic query using the select handler:

product_s:product1

If this brings back zero results then we know we have a problem with the
data.
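
For a quick check against the collection used earlier in this thread (knr on localhost:8983), something like:

curl "http://localhost:8983/solr/knr/select?q=product_s:product1"

If numFound is 0 here, the CSV data was not indexed into product_s the way the expression expects.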

Joel Bernstein
http://joelsolr.blogspot.com/


On Fri, Oct 18, 2019 at 1:41 PM Rajeswari Natarajan 
wrote:

> Hi Joel,
>
> Do you see anything wrong in the config or data . I am using 7.6.
>
> Thanks,
> Rajeswari
>
> On Thu, Oct 17, 2019 at 8:36 AM Rajeswari Natarajan 
> wrote:
>
> > My config is from
> >
> >
> >
> https://github.com/apache/lucene-solr/tree/branch_7_6/solr/solrj/src/test-files/solrj/solr/configsets/streaming/conf
> >
> >
> > 
> >
> >  > docValues="true"/>
> >
> >
> >
> > 
> >
> >  omitNorms="true"
> > positionIncrementGap="0"/>
> >
> >
> >
> > Thanks,
> >
> > Rajeswari
> >
> > On Thu, Oct 17, 2019 at 8:16 AM Rajeswari Natarajan 
> > wrote:
> >
> >> I tried the below query and it returns 0 results
> >>
> >>
> >>
> http://localhost:8983/solr/knr/export?q={!terms+f%3Dproduct_s}product1&distrib=false&fl=basket_s,product_s&sort=basket_s+asc,product_s+asc&wt=json&version=2.2
> >>
> >>
> >> {
> >>   "responseHeader":{"status":0},
> >>   "response":{
> >> "numFound":0,
> >> "docs":[]}}
> >>
> >> Regards,
> >> Rajeswari
> >> On Thu, Oct 17, 2019 at 8:05 AM Rajeswari Natarajan  >
> >> wrote:
> >>
> >>> Thanks Joel.
> >>>
> >>> Here is the logs for below request
> >>>
> >>> curl --data-urlencode
> >>> 'expr=gatherNodes(knr,walk="product1->product_s",gather="basket_s")'
> >>> http://localhost:8983/solr/knr/stream
> >>>
> >>> 2019-10-17 15:02:06.969 INFO  (qtp952486988-280) [c:knr s:shard1
> >>> r:core_node2 x:knr_shard1_replica_n1] o.a.s.c.S.Request
> >>> [knr_shard1_replica_n1]  webapp=/solr path=/stream
> >>>
> params={expr=gatherNodes(knr,walk%3D"product1->product_s",gather%3D"basket_s")}
> >>> status=0 QTime=0
> >>>
> >>> 2019-10-17 15:02:06.975 INFO  (qtp952486988-192) [c:knr s:shard1
> >>> r:core_node2 x:knr_shard1_replica_n1] o.a.s.c.S.Request
> >>> [knr_shard1_replica_n1]  webapp=/solr path=/export
> >>>
> params={q={!terms+f%3Dproduct_s}product1=false=off=basket_s,product_s=basket_s+asc,product_s+asc=json=2.2}
> >>> hits=0 status=0 QTime=1
> >>>
> >>>
> >>>
> >>> Here is the logs for
> >>>
> >>>
> >>>
> >>> curl --data-urlencode
> >>>
> 'expr=gatherNodes(knr,walk="product1->product_s",gather="basket_s",scatter="branches,
> >>> leaves")' http://localhost:8983/solr/knr/stream
> >>>
> >>>
> >>> 2019-10-17 15:03:57.068 INFO  (qtp952486988-356) [c:knr s:shard1
> >>> r:core_node2 x:knr_shard1_replica_n1] o.a.s.c.S.Request
> >>> [knr_shard1_replica_n1]  webapp=/solr path=/stream
> >>>
> params={expr=gatherNodes(knr,walk%3D"product1->product_s",gather%3D"basket_s",scatter%3D"branches,+leaves")}
> >>> status=0 QTime=0
> >>>
> >>> 2019-10-17 15:03:57.071 INFO  (qtp952486988-400) [c:knr s:shard1
> >>> r:core_node2 x:knr_shard1_replica_n1] o.a.s.c.S.Request
> >>> [knr_shard1_replica_n1]  webapp=/solr path=/export
> >>>
> params={q={!terms+f%3Dproduct_s}product1=false=off=basket_s,product_s=basket_s+asc,product_s+asc=json=2.2}
> >>> hits=0 status=0 QTime=0
> >>>
> >>>
> >>>
> >>>
> >>> Thank you,
> >>>
> >>> Rajeswari
> >>>
> >>> On Thu, Oct 17, 2019 at 5:23 AM Joel Bernstein 
> >>> wrote:
> >>>
> >>>> Can you show the logs from this request. There will be a Solr query
> that
> >>>> gets sent with product1 searched against the product_s field. Let's
> see
> >>>&

Re: Help with Stream Graph

2019-10-17 Thread Joel Bernstein
Can you show the logs from this request? There will be a Solr query that
gets sent with product1 searched against the product_s field. Let's see how
many documents that query returns.


Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Oct 17, 2019 at 1:41 AM Rajeswari Natarajan 
wrote:

> Hi,
>
> Since the stream graph query for my use case didn't work, I took the
> data from the Solr source code test and also copied the schema and
> solrconfig.xml from the Solr 7.6 source code.  Had to substitute a few variables.
>
> Posted below data
>
> curl -X POST http://localhost:8983/solr/knr/update -H
> 'Content-type:text/csv' -d '
> id, basket_s, product_s, prics_f
> 90,basket1,product1,20
> 91,basket1,product3,30
> 92,basket1,product5,1
> 93,basket2,product1,2
> 94,basket2,product6,5
> 95,basket2,product7,10
> 96,basket3,product4,20
> 97,basket3,product3,10
> 98,basket3,product1,10
> 99,basket4,product4,40
> 110,basket4,product3,10
> 111,basket4,product1,10'
> After this I committed and made sure the data got published. to solr
>
> curl --data-urlencode
> 'expr=gatherNodes(knr,walk="product1->product_s",gather="basket_s")'
> http://localhost:8983/solr/knr/stream
>
> {
>
>   "result-set":{
>
> "docs":[{
>
> "EOF":true,
>
> "RESPONSE_TIME":4}]}}
>
>
> and if I add *scatter="branches, leaves" , there is one doc.*
>
>
>
> curl --data-urlencode
>
> 'expr=gatherNodes(knr,walk="product1->product_s",gather="basket_s",scatter="branches,
> leaves")' http://localhost:8983/solr/knr/stream
>
> {
>
>   "result-set":{
>
> "docs":[{
>
> "node":"product1",
>
> "collection":"knr",
>
> "field":"node",
>
> "level":0}
>
>   ,{
>
> "EOF":true,
>
> "RESPONSE_TIME":4}]}}
>
>
>
>
> Below is the data I got from
>
> https://github.com/apache/lucene-solr/blob/branch_7_6/solr/solrj/src/test/org/apache/solr/client/solrj/io/graph/GraphExpressionTest.java#L271
>
>
>
> According to this test 4 docs are expected.
>
>
> I am not sure what I am missing. Any pointers, please
>
>
> Thanks you,
>
> Rajeswari
>


Re: The Visual Guide to Streaming Expressions and Math Expressions

2019-10-16 Thread Joel Bernstein
Hi Pratik,

The visualizations are all done using Apache Zeppelin and the Zeppelin-Solr
interpreter. The getting started part of the user guide provides links for
Zeppelin-Solr. The install process is pretty quick. This is all open
source, freely available software. It's possible that Zeppelin-Solr can be
incorporated into the Solr code eventually, but the test frameworks are
quite different. I think some simple scripts can be included with Solr
to automate the downloads for Zeppelin and Zeppelin-Solr.

Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, Oct 16, 2019 at 11:27 AM Pratik Patel  wrote:

> Hi Joel,
>
> Looks like this is going to be very helpful, thank you! I am wondering
> whether the visualizations are generated through third party library or is
> it something which would be part of solr distribution?
>
> https://github.com/apache/lucene-solr/blob/visual-guide/solr/solr-ref-guide/src/visualization.adoc#visualization
>
>
> Thanks,
> Pratik
>
>
> On Wed, Oct 16, 2019 at 10:54 AM Joel Bernstein 
> wrote:
>
> > Hi,
> >
> > The Visual Guide to Streaming Expressions and Math Expressions is now
> > complete. It's been published to Github at the following location:
> >
> >
> >
> https://github.com/apache/lucene-solr/blob/visual-guide/solr/solr-ref-guide/src/math-expressions.adoc#streaming-expressions-and-math-expressions
> >
> > The guide will eventually be part of Solr's release when the RefGuide is
> > ready to accommodate it. In the meantime its been designed to be easily
> > read directly from Github.
> >
> > The guide contains close to 200 visualizations and examples showing how
> to
> > use Streaming Expressions and Math Expressions for data analysis and
> > visualization. The visual guide is also designed to guide users that are
> > not experts in math in how to apply the functions to analysis and
> visualize
> > data.
> >
> > The new visual data loading feature in Solr 8.3 is also covered in the
> > guide. This feature should cut down on the time it takes to load CSV
> files
> > so that more time can be spent on analysis and visualization.
> >
> >
> >
> https://github.com/apache/lucene-solr/blob/visual-guide/solr/solr-ref-guide/src/loading.adoc#loading-data
> >
> > Joel Bernstein
> >
>


The Visual Guide to Streaming Expressions and Math Expressions

2019-10-16 Thread Joel Bernstein
Hi,

The Visual Guide to Streaming Expressions and Math Expressions is now
complete. It's been published to Github at the following location:

https://github.com/apache/lucene-solr/blob/visual-guide/solr/solr-ref-guide/src/math-expressions.adoc#streaming-expressions-and-math-expressions

The guide will eventually be part of Solr's release when the RefGuide is
ready to accommodate it. In the meantime its been designed to be easily
read directly from Github.

The guide contains close to 200 visualizations and examples showing how to
use Streaming Expressions and Math Expressions for data analysis and
visualization. The visual guide is also designed to guide users that are
not experts in math in how to apply the functions to analysis and visualize
data.

The new visual data loading feature in Solr 8.3 is also covered in the
guide. This feature should cut down on the time it takes to load CSV files
so that more time can be spent on analysis and visualization.

https://github.com/apache/lucene-solr/blob/visual-guide/solr/solr-ref-guide/src/loading.adoc#loading-data

Joel Bernstein


Re: igain query parser generating invalid output

2019-10-11 Thread Joel Bernstein
This sounds like a great patch. I can help with the review and commit after
the jira is created.

Thanks!

Joel


On Fri, Oct 11, 2019 at 1:06 AM Peter Davie <
peter.da...@convergentsolutions.com.au> wrote:

> Hi,
>
> I apologise in advance for the length of this email, but I want to share
> my discovery steps to make sure that I haven't missed anything during my
> investigation...
>
> I am working on a classification project and will be using the
> classify(model()) stream function to classify documents.  I have noticed
> that models generated include many noise terms from the (lexically)
> early part of the term list.  To test, I have used the "BBC articles
> fulltext and category" dataset from Kaggle
> (https://www.kaggle.com/yufengdev/bbc-fulltext-and-category). I have
> indexed the data into a Solr collection (news_categories) and am
> performing the following operation to generate a model for documents
> categorised as "BUSINESS" (only keeping the 100th iteration):
>
> having(
>  train(
>  news_categories,
>  features(
>  news_categories,
>  zkHost="localhost:9983",
>  q="*:*",
>  fq="role:train",
>  fq="category:BUSINESS",
>  featureSet="business",
>  field="body",
>  outcome="positive",
>  numTerms=500
>  ),
>  fq="role:train",
>  fq="category:BUSINESS",
>  zkHost="localhost:9983",
>  name="business_model",
>  field="body",
>  outcome="positive",
>  maxIterations=100
>  ),
>  eq(iteration_i, 100)
> )
>
> The output generated includes "noise" terms, such as the following
> "1,011.15", "10.3m", "01", "02", "03", "10.50", "04", "05", "06", "07",
> "09", and these terms all have the same value for idfs_ds ("-Infinity").
>
> Investigating the "features()" output, it seems that the issue is that
> the noise terms are being returned with NaN for the score_f field:
>
>  "docs": [
>{
>  "featureSet_s": "business",
>  "score_f": "NaN",
>  "term_s": "1,011.15",
>  "idf_d": "-Infinity",
>  "index_i": 1,
>  "id": "business_1"
>},
>{
>  "featureSet_s": "business",
>  "score_f": "NaN",
>  "term_s": "10.3m",
>  "idf_d": "-Infinity",
>  "index_i": 2,
>  "id": "business_2"
>},
>{
>  "featureSet_s": "business",
>  "score_f": "NaN",
>  "term_s": "01",
>  "idf_d": "-Infinity",
>  "index_i": 3,
>  "id": "business_3"
>},
>{
>  "featureSet_s": "business",
>  "score_f": "NaN",
>  "term_s": "02",
>  "idf_d": "-Infinity",
>  "index_i": 4,
>  "id": "business_4"
>},...
>
> I have examined the code within
> org/apache/solr/client/solrj/io/stream/FeatureSelectionStream.java and
> see that the scores being returned by {!igain} include NaN values, as
> follows:
>
> {
>"responseHeader":{
>  "zkConnected":true,
>  "status":0,
>  "QTime":20,
>  "params":{
>"q":"*:*",
>"distrib":"false",
>"positiveLabel":"1",
>"field":"body",
>"numTerms":"300",
>"fq":["category:BUSINESS",
>  "role:train",
>  "{!igain}"],
>"version":"2",
>"wt":"json",
>"outcome":"positive",
>"_":"1569982496170"}},
>"featuredTerms":[
>  "0","NaN",
>  "0.0051","NaN",
>  "0.01","NaN",
>  "0.02","NaN",
>  "0.03","NaN",
>
> Looking into org/apache/solr/search/IGainTermsQParserPlugin.java, it
> seems that when a term is not included in the positive or negative
> documents, the docFreq calculation (docFreq = xc + nc) is 0, which means
> that subsequent calculations result in NaN (division by 0) which
> generates these meaningless values for the computed score.
>
> I have patched a local version of Solr to skip terms for which docFreq
> is 0 in the finish() method of IGainTermsQParserPlugin and this is now
> the result:
>
> {
>"responseHeader":{
>  "zkConnected":true,
>  "status":0,
>  "QTime":260,
>  "params":{
>"q":"*:*",
>"distrib":"false",
>"positiveLabel":"1",
>"field":"body",
>"numTerms":"300",
>"fq":["category:BUSINESS",
>  "role:train",
>  "{!igain}"],
>"version":"2",
>"wt":"json",
>"outcome":"positive",
>"_":"1569983546342"}},
>"featuredTerms":[
>  "3",-0.0173133558644304,
>  "authority",-0.0173133558644304,
>  "brand",-0.0173133558644304,
>  "commission",-0.0173133558644304,
>  "compared",-0.0173133558644304,
>  "condition",-0.0173133558644304,
>  "continuing",-0.0173133558644304,
>  "deficit",-0.0173133558644304,
>  "expectation",-0.0173133558644304,
>
> To my (admittedly inexpert) eye, it seems like this is producing more

Re: Trying to add model name to classify() output

2019-09-25 Thread Joel Bernstein
You can use the val function, which will just return the string.

val(CRIME) as expected
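
Applied to the select() quoted below, a sketch of that change (collection, model, and field names are the ones from the original message):

select(
    classify(
        model(models, id="crime_model", cacheMillis=5000),
        search(news_categories, sort="id asc", q="role:test", qt="/export", fl="id,body", rows=5),
        field="body"),
    id,
    score_d,
    probability_d,
    val(CRIME) as expected)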

Joel Bernstein
http://joelsolr.blogspot.com/


On Mon, Sep 23, 2019 at 10:00 PM Peter Davie <
peter.da...@convergentsolutions.com.au> wrote:

> Hi,
>
> I have trained a number of logistic regression classification models
> (using train()) and I am now trying to evaluate these models.  I want to
> add the model name to the classify() (output) stream.  I am trying to
> use the following select() with setValue() as follows:
>
> select(
>  classify(
>  model(
>  models,
>  id="crime_model",
>  cacheMillis=5000
>  ),
>  search(
>  news_categories,
>  sort="id asc",
>  q="role:test",
>  qt="/export",
>  fl="id,body",
>  rows=5
>  ),
>  field="body"
>  ),
>  id,
>  score_d,
>  probability_d,
>  setValue("expected","CRIME") as expected
> )
>
> However, I am not seeing the "expected" field in the output stream:
>
> {
>"result-set": {
>  "docs": [
>{
>  "probability_d": 0.9807157418649378,
>  "score_d": 1.7570993028820825,
>  "id": "0001b92f-da6e-41a6-8518-a0d083c0f870"
>},
>{
>  "probability_d": 0.7310585786300049,
>  "score_d": 0.24253562092781067,
>  "id": "0003b45b-aab9-4635-8f93-903c6f492355"
>},
>{
>  "probability_d": 0.7310585786300049,
>  "score_d": 0.2773500978946686,
>  "id": "0008ecb1-3add-4ef5-85e1-736bf37a834b"
>},
>etc.
>  ]}
> }
>
> Can anyone point out what am I doing wrong?
>
> Peter
>
>
>


Re: Incremental export of a huge collection

2019-09-12 Thread Joel Bernstein
This will do what you describe:

https://lucene.apache.org/solr/guide/8_1/stream-source-reference.html#topic
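
A minimal sketch of a topic() call (the checkpoint collection, target collection, and field list are assumptions for illustration):

topic(checkpointCollection,
      yourCollection,
      id="incrementalExport",
      q="*:*",
      fl="id,field1,field2",
      initialCheckpoint=0)

Each call returns only the tuples added since the last checkpoint stored under that id, which matches the incremental-pull use case described below.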

Joel Bernstein
http://joelsolr.blogspot.com/


On Mon, Sep 9, 2019 at 4:18 PM Mikhail Khludnev  wrote:

> Isn't _version_ a timestamp of insertion by default?
>
> On Mon, Sep 9, 2019 at 9:47 PM Vidit Asthana 
> wrote:
>
> > Hi,
> >
> > I am building a service where I have to continously read data from a Solr
> > collection and insert it into another database. Collection will receive
> > daily updates. Initial size of collection is very large. After I have
> > indexed whole data(through cursor mark), on daily basis I want to only do
> > incremental inserts.
> >
> > My documents don't have anything like timestamp which I can use to fetch
> > "only newly added" documents after a certain point. Is there any internal
> > field which I can use to create this checkpoint and then later use that
> to
> > fetch "only incremental updates" from that point onwards?
> >
> > I initially tried to sort the document by ID and use last fetched cursor
> > mark, but my unique-ID field is a string and there is NO guarantee that
> > newly added document's ID will be in sorted order.
> >
> > Solr version is 8.2.0.
> >
> > Regards,
> > Vidit
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


Re: using let() with other streaming expressions

2019-08-16 Thread Joel Bernstein
Yes, the examples you show will fail because the "let" expression reads
streams into an in-memory List. All the Streaming Expressions expect a
TupleStream to be passed in rather than a List.

There is an undocumented function that turns a List of tuples back into a
Stream. The function is called "stream".

Here is the syntax:

let(
  a=search(techproducts, q="cat:electronics", fl="id, manu, price",
sort="id asc"),
  b=search(techproducts, q="cat:electronics", fl="id, popularity,
_version_", sort="id asc"),
  c=innerJoin(stream(a),stream(b), on=id)
)




Joel Bernstein
http://joelsolr.blogspot.com/


On Fri, Aug 16, 2019 at 4:30 AM Viktors Belovs 
wrote:

> Dear Solr Comunity,
>
> Recently I've been working with the 'let()' expression.
> And I got in a sort of trouble, when I was trying combining it with the
> different streaming expressions,
> as well as trying to re-assign variables.
>
> As an example:
> let(
>   a=search(techproducts, q="cat:electronics", fl="id, manu, price",
> sort="id asc"),
>   b=search(techproducts, q="cat:electronics", fl="id, popularity,
> _version_", sort="id asc"),
>   c=innerJoin(a, b, on=id)
> )
>
> In case with re-assigning the variables:
> let(
>   a=search(techproducts, q="cat:electronics", fl="id, manu, price",
> sort="id asc"),
>   b=a,
>   c=innerJoin(a, b, on=id)
> )
>
> According to documentation (Solr v8.1 the version I use) its possible to
> store any kind values with 'let()'
> function but it seems the usage of such a function is strictly limited for
> specific mathematical operations.
>
> I was wondering if there is possible way to reduce the verbosity and
> (potentially)
> increase the efficiency of the streaming expression's performance, while
> dealing and constructing complex
> combinations of different streaming expressions.
>
> I assume the 'let()' doesn't suited for such purposes, but perhaps there
> is an alternative way to do such a thing.
>
> Regards,
> Viktors


Re: SQL equality predicate escaping single quotes

2019-08-09 Thread Joel Bernstein
It does appear that single quotes are being removed. If you want to provide
a patch that allows single quotes to get passed through, I can help with
testing and committing.


On Thu, Aug 8, 2019 at 11:28 AM Kyle Lilly  wrote:

> Hi,
>
> When using the SQL handler is there any way to escape single quotes in
> boolean predicates? A query like:
>
> SELECT title FROM books WHERE author_lastname = 'O''Reilly'
>
> Will return no results for authors with the last name "O'Reilly" but will
> return hits for books with a last name of "OReilly". I can perform a
> standard Solr term search using "lastname:O'Reilly" and get back the
> expected results. Looking through the code it appears all single quotes are
> stripped from term values in the SQL handler -
>
> https://github.com/apache/lucene-solr/blame/1d85cd783863f75cea133fb9c452302214165a4d/solr/core/src/java/org/apache/solr/handler/sql/SolrFilter.java#L136
> .
> If this is by design is there any way to use single quotes in a term
> predicate with SQL?
>
> Thanks.
>
> - Kyle
>


Re: Returning multiple fields in graph streaming expression response documents

2019-07-21 Thread Joel Bernstein
Good to hear.

Joel Bernstein
http://joelsolr.blogspot.com/


On Sun, Jul 21, 2019 at 5:21 PM Ahmed Adel  wrote:

> Yeah, it turned out to be related to the data. The “fetch” method works
> fine as you described, it’s just the data distribution that caused name
> field not to be fetched in a number of responses. I tested it with two
> other collections and it worked as expected as well. Thank you for your
> help getting this running.
>
> Best,
> A. Adel
>
> On Sun, Jul 21, 2019 at 2:36 AM Joel Bernstein  wrote:
>
> > Ok, then it sounds like a different issue. Let's look at the logs
> following
> > a request and see what the issue is. There will be a log record that
> shows
> > the query that is sent to Solr by the fetch expression. When we look at
> > that log we'll be able to see what the query is, and if results are
> > returned. It could be a bug in the code or it could be something related
> to
> > the data that's being fetched.
> >
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> >
> > On Sat, Jul 20, 2019 at 5:21 PM Ahmed Adel  wrote:
> >
> > > To validate this, I indexed the datasets and ran the same query on Solr
> > > 6.5.0 environment (https://archive.apache.org/dist/lucene/solr/6.5.0/)
> > > before cb9f15 commit gets into release but got the same response, no
> > > additional fields, as Solr 8.1.1. I have used the default managed
> schema
> > > settings in both Solr versions, which I guess means qparser is not used
> > for
> > > /select in this case, is it?
> > >
> > > On Sat, Jul 20, 2019 at 2:02 AM Joel Bernstein 
> > wrote:
> > >
> > > > I suspect fetch is having problem due to this commit:
> > > >
> > > >
> > > >
> > >
> >
> https://github.com/apache/lucene-solr/commit/cb9f151db4b5ad5c5f581b6b8cf2e5916ddb0f35#diff-98abfc8855d347035205c6f3afc2cde3
> > > >
> > > > Later local params were turned off for anything but the lucene
> qparser.
> > > > Which means this query doesn't work if /select is using edismax
> etc...
> > > >
> > > > This needs to be fixed.
> > > > Can you check to see if the qparser is for the /select handler on
> your
> > > > install?
> > > >
> > > > Anyway fetch needs to be reverted back to it's previous
> implementation
> > > > before the above commit basically broke it.
> > > >
> > > >
> > > >
> > > >
> > > > Joel Bernstein
> > > > http://joelsolr.blogspot.com/
> > > >
> > > >
> > > > On Fri, Jul 19, 2019 at 2:20 PM Ahmed Adel 
> wrote:
> > > >
> > > > > Hi - Tried swapping the equality sides but (surprisingly?) got the
> > same
> > > > > exact response. Any additional thoughts are appreciated.
> > > > >
> > > > > Best,
> > > > > A.
> > > > > http://aadel.io
> > > > >
> > > > > On Fri, Jul 19, 2019 at 5:27 PM Joel Bernstein  >
> > > > wrote:
> > > > >
> > > > > > Try:
> > > > > >
> > > > > > fetch(names,
> > > > > >  select(
> > > > > >  nodes(emails,
> > > > > >  walk="john...@apache.org->from",
> > > > > >  gather="to"),
> > > > > >  node as to_s),
> > > > > >  fl="name",
> > > > > > on="to_s=email")
> > > > > >
> > > > > >
> > > > > > According to the docs it looks like you have the fields reversed
> on
> > > the
> > > > > > fetch. If that doesn't work, I'll investigate further.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > Joel Bernstein
> > > > > > http://joelsolr.blogspot.com/
> > > > > >
> > > > > >
> > > > > > On Fri, Jul 19, 2019 at 5:51 AM Ahmed Adel 
> > > wrote:
> > > > > >
> > > > > > > Hi Joel,
> > > > > > >
> > > > > > > Thank you for your thoughts. I tried the fetch function,
> however,
> > > the
> > > > > > > response does not contain "fl" fields of the "fetch"
> expre

Re: Returning multiple fields in graph streaming expression response documents

2019-07-20 Thread Joel Bernstein
Ok, then it sounds like a different issue. Let's look at the logs following
a request and see what the issue is. There will be a log record that shows
the query that is sent to Solr by the fetch expression. When we look at
that log we'll be able to see what the query is, and if results are
returned. It could be a bug in the code or it could be something related to
the data that's being fetched.


Joel Bernstein
http://joelsolr.blogspot.com/


On Sat, Jul 20, 2019 at 5:21 PM Ahmed Adel  wrote:

> To validate this, I indexed the datasets and ran the same query on Solr
> 6.5.0 environment (https://archive.apache.org/dist/lucene/solr/6.5.0/)
> before cb9f15 commit gets into release but got the same response, no
> additional fields, as Solr 8.1.1. I have used the default managed schema
> settings in both Solr versions, which I guess means qparser is not used for
> /select in this case, is it?
>
> On Sat, Jul 20, 2019 at 2:02 AM Joel Bernstein  wrote:
>
> > I suspect fetch is having problem due to this commit:
> >
> >
> >
> https://github.com/apache/lucene-solr/commit/cb9f151db4b5ad5c5f581b6b8cf2e5916ddb0f35#diff-98abfc8855d347035205c6f3afc2cde3
> >
> > Later local params were turned off for anything but the lucene qparser.
> > Which means this query doesn't work if /select is using edismax etc...
> >
> > This needs to be fixed.
> > Can you check to see if the qparser is for the /select handler on your
> > install?
> >
> > Anyway fetch needs to be reverted back to it's previous implementation
> > before the above commit basically broke it.
> >
> >
> >
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> >
> > On Fri, Jul 19, 2019 at 2:20 PM Ahmed Adel  wrote:
> >
> > > Hi - Tried swapping the equality sides but (surprisingly?) got the same
> > > exact response. Any additional thoughts are appreciated.
> > >
> > > Best,
> > > A.
> > > http://aadel.io
> > >
> > > On Fri, Jul 19, 2019 at 5:27 PM Joel Bernstein 
> > wrote:
> > >
> > > > Try:
> > > >
> > > > fetch(names,
> > > >  select(
> > > >  nodes(emails,
> > > >  walk="john...@apache.org->from",
> > > >  gather="to"),
> > > >  node as to_s),
> > > >  fl="name",
> > > > on="to_s=email")
> > > >
> > > >
> > > > According to the docs it looks like you have the fields reversed on
> the
> > > > fetch. If that doesn't work, I'll investigate further.
> > > >
> > > >
> > > >
> > > >
> > > > Joel Bernstein
> > > > http://joelsolr.blogspot.com/
> > > >
> > > >
> > > > On Fri, Jul 19, 2019 at 5:51 AM Ahmed Adel 
> wrote:
> > > >
> > > > > Hi Joel,
> > > > >
> > > > > Thank you for your thoughts. I tried the fetch function, however,
> the
> > > > > response does not contain "fl" fields of the "fetch" expression.
> For
> > > the
> > > > > above example, the modified query is as follows:
> > > > >
> > > > > fetch(names, select(nodes(emails,
> > > > >   walk="john...@apache.org->from",
> > > > >   gather="to"), node as to_s), fl="name", on="email=to_s")
> > > > >
> > > > >
> > > > > where "names" is a collection that contains two fields representing
> > > pairs
> > > > > of name and email: ("name", "email")
> > > > >
> > > > > The response returned is:
> > > > >
> > > > > { "result-set": { "docs": [ { "to_s": "john...@apache.org"
> > > > > }, { "to_s": "johnsm...@apache.org"
> > > > > },
> > > > > ... { "EOF": true, "RESPONSE_TIME": 33 } ] } }
> > > > >
> > > > > The response should have an additional "name" field in each
> document
> > > > > returned. Any additional thoughts are appreciated.
> > > > >
> > > > > Best,
> > > > > A.
> > > > >
> > > > > On Thu, Jul 18, 2019 at 6:12 PM Joel Bernstein  >
> > > > wrote:
> > > > >
> > > > >

Re: Returning multiple fields in graph streaming expression response documents

2019-07-19 Thread Joel Bernstein
I suspect fetch is having a problem due to this commit:

https://github.com/apache/lucene-solr/commit/cb9f151db4b5ad5c5f581b6b8cf2e5916ddb0f35#diff-98abfc8855d347035205c6f3afc2cde3

Later, local params were turned off for anything but the lucene qparser,
which means this query doesn't work if /select is using edismax etc...

This needs to be fixed.
Can you check which qparser is configured for the /select handler on your
install?

Anyway, fetch needs to be reverted back to its previous implementation;
the above commit basically broke it.




Joel Bernstein
http://joelsolr.blogspot.com/


On Fri, Jul 19, 2019 at 2:20 PM Ahmed Adel  wrote:

> Hi - Tried swapping the equality sides but (surprisingly?) got the same
> exact response. Any additional thoughts are appreciated.
>
> Best,
> A.
> http://aadel.io
>
> On Fri, Jul 19, 2019 at 5:27 PM Joel Bernstein  wrote:
>
> > Try:
> >
> > fetch(names,
> >  select(
> >  nodes(emails,
> >  walk="john...@apache.org->from",
> >  gather="to"),
> >  node as to_s),
> >  fl="name",
> > on="to_s=email")
> >
> >
> > According to the docs it looks like you have the fields reversed on the
> > fetch. If that doesn't work, I'll investigate further.
> >
> >
> >
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> >
> > On Fri, Jul 19, 2019 at 5:51 AM Ahmed Adel  wrote:
> >
> > > Hi Joel,
> > >
> > > Thank you for your thoughts. I tried the fetch function, however, the
> > > response does not contain "fl" fields of the "fetch" expression. For
> the
> > > above example, the modified query is as follows:
> > >
> > > fetch(names, select(nodes(emails,
> > >   walk="john...@apache.org->from",
> > >   gather="to"), node as to_s), fl="name", on="email=to_s")
> > >
> > >
> > > where "names" is a collection that contains two fields representing
> pairs
> > > of name and email: ("name", "email")
> > >
> > > The response returned is:
> > >
> > > { "result-set": { "docs": [ { "to_s": "john...@apache.org"
> > > }, { "to_s": "johnsm...@apache.org"
> > > },
> > > ... { "EOF": true, "RESPONSE_TIME": 33 } ] } }
> > >
> > > The response should have an additional "name" field in each document
> > > returned. Any additional thoughts are appreciated.
> > >
> > > Best,
> > > A.
> > >
> > > On Thu, Jul 18, 2019 at 6:12 PM Joel Bernstein 
> > wrote:
> > >
> > > > Hi Ahmed,
> > > >
> > > > Take a look at the fetch
> > > >
> > > >
> > >
> >
> https://lucene.apache.org/solr/guide/8_0/stream-decorator-reference.html#fetch
> > > >
> > > > It probably makes sense to allow more field to be returned from a
> nodes
> > > > expression as well.
> > > >
> > > > Joel Bernstein
> > > > http://joelsolr.blogspot.com/
> > > >
> > > >
> > > > On Wed, Jul 17, 2019 at 3:12 AM Ahmed Adel 
> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Thank you for your reply. Could you give more details on the „join“
> > > > > operation, such as what the sides of the join and the joining
> > condition
> > > > > would be in this case?
> > > > >
> > > > > Best regards,
> > > > > A.
> > > > >
> > > > > On Tue, Jul 16, 2019 at 2:02 PM markus kalkbrenner <
> > > > > markus.kalkbren...@biologis.com> wrote:
> > > > >
> > > > > >
> > > > > >
> > > > > > You have to perform a „join“ to get more fields.
> > > > > >
> > > > > > > Am 16.07.2019 um 13:52 schrieb Ahmed Adel :
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > How can multiple fields be returned in graph traversal
> streaming
> > > > > > expression
> > > > > > > response documents? For example, the following query:
> > > > > > >
> > > > > > > nodes(emails,
> > > > > > >  walk="john...@apache.org->from",
> > > > > > >  gather="to")
> > > > > > >
> > > > > > >
> > > > > > > returns these documents in the response:
> > > > > > >
> > > > > > > {
> > > > > > >  "result-set": {
> > > > > > >"docs": [
> > > > > > >  {
> > > > > > >"node": "sl...@campbell.com",
> > > > > > >"collection": "emails",
> > > > > > >"field": "to",
> > > > > > >"level": 1
> > > > > > >  },
> > > > > > >  {
> > > > > > >"node": "catherine.per...@enron.com",
> > > > > > >"collection": "emails",
> > > > > > >"field": "to",
> > > > > > >"level": 1
> > > > > > >  },
> > > > > > >  {
> > > > > > >"node": "airam.arte...@enron.com",
> > > > > > >"collection": "emails",
> > > > > > >"field": "to",
> > > > > > >"level": 1
> > > > > > >  },
> > > > > > >  {
> > > > > > >"EOF": true,
> > > > > > >"RESPONSE_TIME": 44
> > > > > > >  }
> > > > > > >]
> > > > > > >  }
> > > > > > > }
> > > > > > >
> > > > > > > How can the query above be modified to return more document
> > fields,
> > > > > > > "subject" for example?
> > > > > > >
> > > > > > > Best regards,
> > > > > > >
> > > > > > > A.
> > > > > >
> > > > >
> > > >
> > >
> >
> --
> Sent from my iPhone
>


Re: Returning multiple fields in graph streaming expression response documents

2019-07-19 Thread Joel Bernstein
Try:

fetch(names,
 select(
 nodes(emails,
 walk="john...@apache.org->from",
 gather="to"),
 node as to_s),
 fl="name",
on="to_s=email")


According to the docs it looks like you have the fields reversed on the
fetch. If that doesn't work, I'll investigate further.




Joel Bernstein
http://joelsolr.blogspot.com/


On Fri, Jul 19, 2019 at 5:51 AM Ahmed Adel  wrote:

> Hi Joel,
>
> Thank you for your thoughts. I tried the fetch function, however, the
> response does not contain "fl" fields of the "fetch" expression. For the
> above example, the modified query is as follows:
>
> fetch(names, select(nodes(emails,
>   walk="john...@apache.org->from",
>   gather="to"), node as to_s), fl="name", on="email=to_s")
>
>
> where "names" is a collection that contains two fields representing pairs
> of name and email: ("name", "email")
>
> The response returned is:
>
> { "result-set": { "docs": [ { "to_s": "john...@apache.org"
> }, { "to_s": "johnsm...@apache.org"
> },
> ... { "EOF": true, "RESPONSE_TIME": 33 } ] } }
>
> The response should have an additional "name" field in each document
> returned. Any additional thoughts are appreciated.
>
> Best,
> A.
>
> On Thu, Jul 18, 2019 at 6:12 PM Joel Bernstein  wrote:
>
> > Hi Ahmed,
> >
> > Take a look at the fetch
> >
> >
> https://lucene.apache.org/solr/guide/8_0/stream-decorator-reference.html#fetch
> >
> > It probably makes sense to allow more field to be returned from a nodes
> > expression as well.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> >
> > On Wed, Jul 17, 2019 at 3:12 AM Ahmed Adel  wrote:
> >
> > > Hi,
> > >
> > > Thank you for your reply. Could you give more details on the „join“
> > > operation, such as what the sides of the join and the joining condition
> > > would be in this case?
> > >
> > > Best regards,
> > > A.
> > >
> > > On Tue, Jul 16, 2019 at 2:02 PM markus kalkbrenner <
> > > markus.kalkbren...@biologis.com> wrote:
> > >
> > > >
> > > >
> > > > You have to perform a „join“ to get more fields.
> > > >
> > > > > Am 16.07.2019 um 13:52 schrieb Ahmed Adel :
> > > > >
> > > > > Hi,
> > > > >
> > > > > How can multiple fields be returned in graph traversal streaming
> > > > expression
> > > > > response documents? For example, the following query:
> > > > >
> > > > > nodes(emails,
> > > > >  walk="john...@apache.org->from",
> > > > >  gather="to")
> > > > >
> > > > >
> > > > > returns these documents in the response:
> > > > >
> > > > > {
> > > > >  "result-set": {
> > > > >"docs": [
> > > > >  {
> > > > >"node": "sl...@campbell.com",
> > > > >"collection": "emails",
> > > > >"field": "to",
> > > > >"level": 1
> > > > >  },
> > > > >  {
> > > > >"node": "catherine.per...@enron.com",
> > > > >"collection": "emails",
> > > > >"field": "to",
> > > > >"level": 1
> > > > >  },
> > > > >  {
> > > > >"node": "airam.arte...@enron.com",
> > > > >"collection": "emails",
> > > > >"field": "to",
> > > > >"level": 1
> > > > >  },
> > > > >  {
> > > > >"EOF": true,
> > > > >"RESPONSE_TIME": 44
> > > > >  }
> > > > >]
> > > > >  }
> > > > > }
> > > > >
> > > > > How can the query above be modified to return more document fields,
> > > > > "subject" for example?
> > > > >
> > > > > Best regards,
> > > > >
> > > > > A.
> > > >
> > >
> >
>


Re: Returning multiple fields in graph streaming expression response documents

2019-07-18 Thread Joel Bernstein
Hi Ahmed,

Take a look at the fetch
https://lucene.apache.org/solr/guide/8_0/stream-decorator-reference.html#fetch

It probably makes sense to allow more fields to be returned from a nodes
expression as well.

Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, Jul 17, 2019 at 3:12 AM Ahmed Adel  wrote:

> Hi,
>
> Thank you for your reply. Could you give more details on the „join“
> operation, such as what the sides of the join and the joining condition
> would be in this case?
>
> Best regards,
> A.
>
> On Tue, Jul 16, 2019 at 2:02 PM markus kalkbrenner <
> markus.kalkbren...@biologis.com> wrote:
>
> >
> >
> > You have to perform a „join“ to get more fields.
> >
> > > Am 16.07.2019 um 13:52 schrieb Ahmed Adel :
> > >
> > > Hi,
> > >
> > > How can multiple fields be returned in graph traversal streaming
> > expression
> > > response documents? For example, the following query:
> > >
> > > nodes(emails,
> > >  walk="john...@apache.org->from",
> > >  gather="to")
> > >
> > >
> > > returns these documents in the response:
> > >
> > > {
> > >  "result-set": {
> > >"docs": [
> > >  {
> > >"node": "sl...@campbell.com",
> > >"collection": "emails",
> > >"field": "to",
> > >"level": 1
> > >  },
> > >  {
> > >"node": "catherine.per...@enron.com",
> > >"collection": "emails",
> > >"field": "to",
> > >"level": 1
> > >  },
> > >  {
> > >"node": "airam.arte...@enron.com",
> > >"collection": "emails",
> > >"field": "to",
> > >"level": 1
> > >  },
> > >  {
> > >"EOF": true,
> > >"RESPONSE_TIME": 44
> > >  }
> > >]
> > >  }
> > > }
> > >
> > > How can the query above be modified to return more document fields,
> > > "subject" for example?
> > >
> > > Best regards,
> > >
> > > A.
> >
>


Re: Bug in scoreNodes function of streaming expressions?

2019-07-02 Thread Joel Bernstein
Ok, that sounds like a bug. I can create a ticket for this.

On Mon, Jul 1, 2019 at 5:57 PM Pratik Patel  wrote:

> I think the problem was that my streaming expression was always returning
> just one node. When I added more data so that I can have more than one
> node, I started seeing the result.
>
> On Mon, Jul 1, 2019 at 11:21 AM Pratik Patel  wrote:
>
>> Hello Everyone,
>>
>> I am trying to execute following streaming expression with "scoreNodes"
>> function in it. This is taken from the documentation.
>>
>> scoreNodes(top(n="50",
>>sort="count(*) desc",
>>nodes(baskets,
>>  random(baskets, q="productID:ABC",
>> fl="basketID", rows="500"),
>>  walk="basketID->basketID",
>>  fq="-productID:ABC",
>>  gather="productID",
>>  count(*
>>
>> I have ensured that I have the collection and data present for it.
>> Upon executing this, I am getting an error message as follows.
>>
>> "No collection param specified on request and no default collection has
>> been set: []"
>>
>> Upon digging into the source code I found that there is a possible bug in
>> ScoreNodesStream.java
>>
>> StringBuilder instance is never appended any string and the block which
>> initializes collection, needs the length of that instance to be more than
>> zero. This condition will always be false and hence the collection will
>> never be set.
>>
>> I checked this file in solr version 8.1 and that also has the same issue.
>> Is there any JIRA open for this or any patch available?
>>
>>
>> Thanks,
>> Pratik
>>
>


Re: creating date facets

2019-06-20 Thread Joel Bernstein
You might find this useful. It makes creating time series aggregations a
little easier. It uses JSON facets under the covers and is very fast.

https://lucene.apache.org/solr/guide/7_6/stream-source-reference.html#timeseries
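
A minimal sketch (the collection name, date field, and date range are assumptions):

timeseries(yourCollection,
           q="*:*",
           field="timestamp_dt",
           start="2019-01-01T00:00:00Z",
           end="2019-06-30T23:59:59Z",
           gap="+1DAY",
           count(*))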

Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, Jun 19, 2019 at 1:34 PM Erick Erickson 
wrote:

> There are two approaches I might use, which is really up to you.
>
> - You can do a regex filter. So define your extra fields as you want, then
> use a regex charFilter (NOT filter, you want to transform the entire input)
> to peel out the separate parts. Then copyfield to each one, each with a
> different regex to peel out the correct part.
>
> - use a ScriptUpdateProcessor to do whatever you want in the scripting
> language you’re most comfortable with. Note that the SolrDocument that’s
> handled by a SUP is just a map of key:value pairs and you can modify it as
> you see fit.
>
> Best,
> Erick
>
> > On Jun 19, 2019, at 10:25 AM, Nightingale, Jonathan A (US) <
> jonathan.nighting...@baesystems.com> wrote:
> >
> > Hi all,
> > I'm trying to have a date field automatically generate some facets of
> itself, like hour of the day and hour of the week as examples, when its
> stored. I was looking at this tutorial and it deemed to almost do what I
> wanted
> >
> > https://nathanfriend.io/2017/06/26/smart-date-searching-with-solr.html
> >
> > I was wondering if there was a way to use these filters and tokenizers
> to generate values for different fields. Maybe in conjunction with a
> copyfield?
> >
> > I'd like it to work whenever a *_c (calendar) field is indexed it also
> indexes the hour of the day (hod_d) and hor of week (how_d) fields as well.
> >
> > Any thoughts?
> >
> > Thanks!
>
>


Re: Issues with the handling of NULLs in Streaming Expressions

2019-06-06 Thread Joel Bernstein
Interesting questions. I suspect we need to beef up our test cases that
deal with nulls and make sure they behave in a consistent manner.

One of the things that likely needs to be looked at more carefully is how
string literals are handled as opposed to nulls. In some cases I believe if
null is encountered it's treated as a string literal and doesn't preserve
the null. So I think it's worth creating a ticket outlining your findings
and we can think about solutions.

Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Jun 6, 2019 at 9:22 AM Oleksandr Chornyi 
wrote:

> Hi guys!
>
> I'm working on a generic query builder for Streaming Expressions which
> allows building various requests containing row level expressions (i.e.
> evaluators), aggregations/metrics, sorts, etc. On this way, I bumped into
> many issues related to the handling of NULL values by the engine. Here are
> the issues in the descending order of their severity (from my standpoint):
>
> 1. *There is no way to check if a value in a tuple is NULL* because
> *eq* function
> fails to accept *null *as an argument:
>
> > *eq(1,null) *
>
> fails with
>
> > "Unable to check eq(...) because a null value was found"
>
> even though the documentation says
> <
> https://lucene.apache.org/solr/guide/7_7/stream-evaluator-reference.html#eq
> >
> that "If any any parameters are null and there is at least one parameter
> that is not null then false will be returned."
> This issue makes it impossible to evaluate an expression from the *if*
> function
> documentation
> <
> https://lucene.apache.org/solr/guide/7_7/stream-evaluator-reference.html#if
> >
> :
>
> > if(eq(fieldB,null), null, div(fieldA,fieldB)) // if fieldB is null then
> > null else fieldA / fieldB
>
> I think that the root cause of the issue is coming from the fact that
> *EqualToEvaluator* extends *RecursiveBooleanEvaluator* which checks that
> none of the arguments is *null*, but I don't think that's what we want
> here. *Can you confirm that what I see is a bug and I should file it?*
>
> 2. The fact that *FieldValueEvaluator returns a field name when a value is
> null* breaks any evaluator/decorator which otherwise would handle
> *nulls*. Consider
> these examples (I'm using *cartesianProduct *on an integer array to get
> several tuples with integers because I couldn't find a way to do so
> directly):
>
> > cartesianProduct(
> > tuple(a=array(1,null,3)),
> > a
> > )
>
> returns values preserving *nulls: *
>
> > "docs": [
> >   {"a": 1},
> >   {"a": null},
> >   {"a": 3},
> > ...]
>
> If I just execute *add(1, null) *it works as expected and returns *null.*
> Now,
> if I'm trying to apply any stream evaluator which should work fine with
> *nulls* to this stream:
>
> > select(
> > cartesianProduct(
> > tuple(a=array(1,null,3)),
> > a
> > ),
> > add(a, 1) as a
> > )
>
> it fails to process the second record saying that:
>
> > "docs": [
> >   {"a": 2},
> >   {
> > "EXCEPTION": "Failed to evaluate expression add(a,val(1)) - Numeric
> > value expected but found type java.lang.String for value a",
> > ...
> >   }
> > ]
>
> It looks even more confusing when running the following query:
>
> > select(
> > cartesianProduct(
> > tuple(a=array(1,null,3)),
> > a
> > ),
> > coalesce(a, 42) as a
> > )
>
> produces
>
> > "docs": [
> >   {"a": 1},
> >   {"a": "a"},
> >   {"a": 3},
> > ...]
>
>  instead of
>
> > "docs": [
> >   {"a": 1},
> >   {"a": *42*},
> >   {"a": 3},
> > ...]
>
> As I mentioned in the issue description, I think the issue lies in these
> lines of *FieldValueEvaluator:*
>
> > if(value == null) {
> >return fieldName;
> > }
>
> I consider this to be very counterintuitive. *Can you confirm that this is
> a bug, rather than a designed feature?*
>
> 3. *Most Boolean Stream Evaluators* state that they *don't work with
> NULLs.* However,
> it's very inconvenient and there is no other way to work around it (see
> item #1)*. *I'm talking about the following evaluators: *and, eor, or, gt,
> lt, gteq, lteq. *At the moment these evaluators just throw exceptions when
> an argument is *null. **Have you considered making their behavior more
> SQL-like?* When the behavior is like this:
>
>- *gt

Re: NullPointerException with ExpandComponent on Collapsed Null Values

2019-06-04 Thread Joel Bernstein
This should be considered a bug. Feel free to file a jira for this.



Joel Bernstein
http://joelsolr.blogspot.com/


On Tue, Jun 4, 2019 at 9:16 AM aus...@3bx.org.INVALID
 wrote:

> Just wanted to provide a bit more information on this issue after
> experimenting a bit more.
>
> The error I've described below only seems to occur when I'm
> collapsing/expanding on an integer field.  If I switch the field type to a
> string, no errors occur if there are missing field values within the
> document set.  For now, this seems to be a workaround, but I'd be curious
> if there is an issue or something I missed when trying to use this feature
> with integers.
>
>
> On Tue, May 28, 2019 at 3:22 PM aus...@3bx.org  wrote:
>
> > Hi all,
> >
> >
> >
> > I’m currently running 7.5.0 and am looking to use the collapse and expand
> > results functionality.
> >
> >
> >
> > The field I’m attempting to collapse on is an “int” field that isn’t
> > required, and I’m using a null policy of expand to create a unique group
> > for each document that has a missing field.   The majority of documents
> *are
> > *missing this field at this time.
> >
> >
> >
> > I’m running into a NullPointerException on the response from within the
> > ExpandComponent.  I’ve also tried the grouping with a field where I’ve
> > generated an ID (to cause the field to never be null), which seems to
> > resolve the issue.  I’m wondering if there’s some type of issue with how
> > Solr is handling the expansion of these null records.
> >
> >
> >
> > Any thoughts?
> >
> >
> >
> >
> >
> > Here’s an example request:
> >
> >
> >
> >
> >
> https://localhost:8985/solr/products/select?fq=%7B!collapse%20field%3DGroupId_i%20nullPolicy%3Dexpand%7D=Name%5E100=0=5=2.2=test=true=1=score%20desc=ItemId
> >
> >
> >
> > … and the response
> >
> >
> >
> > 
> >
> > 
> >
> >
> >
> >   true
> >
> >   500
> >
> >   19
> >
> >   
> >
> >  test
> >
> >  true
> >
> >  1
> >
> >  Name^100
> >
> >  ItemId
> >
> >  0
> >
> >  {!collapse field=GroupId_i
> nullPolicy=expand}
> >
> >  score desc
> >
> >  5
> >
> >  2.2
> >
> >   
> >
> >
> >
> >
> >
> >   
> >
> >  186209
> >
> >   
> >
> >   
> >
> >  3516830
> >
> >   
> >
> >   
> >
> >  9764413
> >
> >   
> >
> >   
> >
> >  9764705
> >
> >   
> >
> >   
> >
> > 9764767
> >
> >   
> >
> >
> >
> >
> >
> >   java.lang.NullPointerException
> >
> > at
> >
> org.apache.solr.handler.component.ExpandComponent.process(ExpandComponent.java:351)
> >
> > at
> >
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:298)
> >
> > at
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
> >
> > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2541)
> >
> > at
> > org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:709)
> >
> > at
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:515)
> >
> > at
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)
> >
> > at
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)
> >
> > at
> >
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)
> >
> > at
> >
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
> >
> > at
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
> >
> > at
> >
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> >
> > at
> >
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> >
> > at
> >
&g

Re: Solr-8.1.0 uses much more memory

2019-05-26 Thread Joel Bernstein
I'm not sure this issue applies in this situation but it's worth taking a
look at:

https://issues.apache.org/jira/browse/SOLR-12833?focusedCommentId=16807868=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16807868

The memory issue in that ticket involves different versions than the ones I
think are being discussed here, but it's good to understand that the issue
exists and that it's resolved going forward.

Also, because this issue is attached to the original ticket that caused the
bug, rather than filed as a new bug report, it's very hard to know that the
problem actually existed.








Joel Bernstein
http://joelsolr.blogspot.com/


On Sun, May 26, 2019 at 3:30 PM Shawn Heisey  wrote:

> On 5/26/2019 12:52 PM, Joe Doupnik wrote:
> >  I do queries while indexing, have done so for a long time, without
> > difficulty nor memory usage spikes from dual use. The system has been
> > designed to support that.
> >  Again, one may look at the numbers using "top" or similar. Try Solr
> > v8.0 and 8.1 to see the difference which I experience here. For
> > reference, the only memory adjustables set in my configuration is in the
> > Solr startup script solr.in.sh saying add "-Xss1024k" in the SOLR_OPTS
> > list and setting SOLR_HEAP="4024m".
>
> There is one significant difference between 8.0 and 8.1 in the realm of
> memory management -- we have switched from the CMS garbage collector to
> the G1 collector.  So the way that Java manages the heap has changed.
> This was done because the CMS collector is slated for removal from Java.
>
> https://issues.apache.org/jira/browse/SOLR-13394
>
> Java is unlike other programs in one respect -- once it allocates heap
> from the OS, it never gives it back.  This behavior has given Java an
> undeserved reputation as a memory hog ... but in fact Java's overall
> memory usage can be very easily limited ... an option that many other
> programs do NOT have.
>
> In your configuration, you set the max heap to a little less than 4GB.
> You have to expect that it *WILL* use that memory.  By using the
> SOLR_HEAP variable, you have instructed Solr's startup script to use the
> same setting for the minimum heap as well as the maximum heap.  This is
> the design intent.
>
> If you want to know how much heap is being used, you can't ask the
> operating system, which means tools like top.  You have to ask Java.
> And you will have to look at a long-term graph, finding the low points.
> An instantaneous look at Java's heap usage could show you that the whole
> heap is allocated ... but a significant part of that allocation could be
> garbage, which becomes available once the garbage is collected.
>
> Thanks,
> Shawn
>


Re: How to use encrypted username password.

2019-05-20 Thread Joel Bernstein
Typically basic auth is encrypted using SSL.



Joel Bernstein
http://joelsolr.blogspot.com/


On Mon, May 20, 2019 at 6:49 PM Gangadhar Gangadhar 
wrote:

> Hi,
>
>I’m trying to explore if there is any way to encrypt -basicauth or
> encrypt username and password in -Dsolr.httpclient.config.
>
> Thanks
> Gangadhar
>


Re: Performance of /export requests

2019-05-12 Thread Joel Bernstein
Your query and sort criteria sound like they should be fast.

In general if you are cutting off the stream at 10K don't use the /export
handler. Use the /select handler, it will be faster for sure. The reason
for the 30K sliding window was that it maximized throughput over a long
export (many millions of documents). If you're not doing a long export then
the export handler is likely not the most efficient approach.

Each field being exported slows down the export handler, and 16 is a lot of
fields to export. Again, the only way to increase the performance of exporting
16 fields is to add more shards.

Are you exporting with Streaming Expressions?
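
(For reference, pulling an export through a Streaming Expression rather than a
raw CloudSolrStream looks roughly like the sketch below; the collection, sort
field, and field list are placeholders:)

search(someCollection,
       q="*:*",
       qt="/export",
       sort="timestamp_l asc",
       fl="id,timestamp_l")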




On Sun, May 12, 2019 at 8:44 AM Justin Sweeney 
wrote:

> Thanks for the quick response. We are generally seeing exports from Solr 5
> and 7 to be roughly the same, but I’ll check out Solr 8.
>
> Joel - We are generally sorting a on tlong field and criteria can vary from
> searching everything (*:*) to searching on a combination of a few tint and
> string types.
>
> All of our 16 fields are docvalues. Is there any performance degradation as
> the number of docvalues fields increases or should that not have an impact?
> Also, is the 30k sliding window configurable? In many cases we are
> streaming back a few thousand, maybe up to 10k and then cutting off the
> stream. If we could configure the size of that window, could that speed
> things up some?
>
> Thanks again for the info.
>
> On Sat, May 11, 2019 at 2:38 PM Joel Bernstein  wrote:
>
> > Can you share the sort criteria and search query? The main strategy for
> > improving performance of the export handler is adding more shards. This
> is
> > different than with typical distributed search, where deep paging issues
> > get worse as you add more shards. With the export handler if you double
> the
> > shards you double the pushing power. There are no deep paging drawbacks
> to
> > adding more shards.
> >
> > On Sat, May 11, 2019 at 2:17 PM Toke Eskildsen  wrote:
> >
> > > Justin Sweeney  wrote:
> > >
> > > [Index: 10 shards, 450M docs]
> > >
> > > > We are creating a CloudSolrStream and when we call
> > CloudSolrStream.open()
> > > > we see that call being slower than we had hoped. For some queries,
> that
> > > > call can take 800 ms. [...]
> > >
> > > As far as I can see in the code, CloudSolrStream.open() opens streams
> > > against the relevant shards and checks if there is a result. The last
> > step
> > > is important as that means the first batch of tuples must be calculated
> > in
> > > the shards. Streaming works internally by having a sliding window of
> 30K
> > > tuples through the result set in each shard, so open() results in (up
> to)
> > > 30K tuples being calculated. On the other hand, getting the first 30K
> > > tuples should be very fast after open().
> > >
> > > > We are currently using Solr 5, but we’ve also tried with Solr 7 and
> > seen
> > > > similar results.
> > >
> > > Solr 7 has a performance regression for export (or rather a regression
> > for
> > > DocValues that is very visible when using export. See
> > > https://issues.apache.org/jira/browse/SOLR-13013), so I would expect
> it
> > > to be slower than Solr 5. You could try with Solr 8 where this
> regression
> > > should be mitigated somewhat.
> > >
> > > - Toke Eskildsen
> > >
> >
>


Re: Performance of /export requests

2019-05-11 Thread Joel Bernstein
Can you share the sort criteria and search query? The main strategy for
improving performance of the export handler is adding more shards. This is
different than with typical distributed search, where deep paging issues
get worse as you add more shards. With the export handler if you double the
shards you double the pushing power. There are no deep paging drawbacks to
adding more shards.

On Sat, May 11, 2019 at 2:17 PM Toke Eskildsen  wrote:

> Justin Sweeney  wrote:
>
> [Index: 10 shards, 450M docs]
>
> > We are creating a CloudSolrStream and when we call CloudSolrStream.open()
> > we see that call being slower than we had hoped. For some queries, that
> > call can take 800 ms. [...]
>
> As far as I can see in the code, CloudSolrStream.open() opens streams
> against the relevant shards and checks if there is a result. The last step
> is important as that means the first batch of tuples must be calculated in
> the shards. Streaming works internally by having a sliding window of 30K
> tuples through the result set in each shard, so open() results in (up to)
> 30K tuples being calculated. On the other hand, getting the first 30K
> tuples should be very fast after open().
>
> > We are currently using Solr 5, but we’ve also tried with Solr 7 and seen
> > similar results.
>
> Solr 7 has a performance regression for export (or rather a regression for
> DocValues that is very visible when using export. See
> https://issues.apache.org/jira/browse/SOLR-13013), so I would expect it
> to be slower than Solr 5. You could try with Solr 8 where this regression
> should be mitigated somewhat.
>
> - Toke Eskildsen
>


Re: Streaming Expression: get the value of the array at the specified position

2019-05-11 Thread Joel Bernstein
There actually is an undocumented function called valueAt. It works both
for an array and for a matrix.

For an array:

let(echo="b", a=array(1,2,3,4,5), b=valueAt(a, 2))  should return 3.

I have lots of documentation still to do.
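
For a matrix the same function takes a row and a column index, presumably
zero-based as well; something along these lines (the matrix here is only an
illustration):

let(echo="v", m=matrix(array(1,2,3), array(4,5,6)), v=valueAt(m, 1, 2))  should return 6.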










Joel Bernstein
http://joelsolr.blogspot.com/


On Fri, May 10, 2019 at 11:12 AM David Hastings <
hastings.recurs...@gmail.com> wrote:

> no.
>
> On Fri, May 10, 2019 at 11:09 AM Nazerke S  wrote:
>
> > Hi,
> >
> > I am interested in getting the value of the array at the given index. For
> > example,
> >
> > let(echo="b", a=array(1,2,3,4,5), b=getAt(a, 2))  should return 3.
> >
> > Is there a way to get access an array's element by indexing?
> >
> > Thanks!
> >
> > __Nazerke
> >
>


Re: Why did Solr stats min/max values were returned as float number for field of type="pint"?

2019-05-02 Thread Joel Bernstein
This syntax is bringing back correct data types:

rows=0=2=stock_s:10=true=1556849474583=true=javabin={!max=true
}id_i={!max=true }response_d

This is the query that the stats Streaming Expression writes under the
covers. The Streaming Expression looks like this:

stats(testapp, q="stock_s:10", max(id_i), max(response_d))











Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, May 2, 2019 at 8:47 PM Wendy2  wrote:

> Hi Solr users,
>
> I have a pint field:
>  indexed="true" stored="true"/>
>
> But Solr stats min/max values were returned as float numbers ( "min":0.0,
> "max":1356.0) . I thought "pint" type fields should return min/max as int.
> Is there something that user can do to make sure it returns as int type
> (which matches the field definition)?   Thanks!
>
>
> {
>   "responseHeader":{
> "status":0,
> "QTime":17,
> "params":{
>   "q":"*:*",
>   "stats":"true",
>   "fl":"",
>   "rows":"0",
>   "stats.field":"rcsb_entry_info.disulfide_bond_count"}},
>   "response":{"numFound":151364,"start":0,"docs":[]
>   },
>   "stats":{
> "stats_fields":{
>   "rcsb_entry_info.disulfide_bond_count":{
> "min":0.0,
> "max":1356.0,
> "count":151363,
> "missing":1,
> "sum":208560.0,
> "sumOfSquares":5660388.0,
> "mean":1.3778796667613618,
> "stddev":5.958002695748158
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Pagination with streaming expressions

2019-05-02 Thread Joel Bernstein
There is an open ticket which deals with this:

https://issues.apache.org/jira/browse/SOLR-12209

I've been very focused though on anything that enhances the Solr Math
Expressions or has been needed for the Fusion SQL engine, which is what I
work on at LucidWorks. SOLR-12209 doesn't fall into that category.
Eventually though I will clear that ticket if someone else doesn't resolve
it first.




Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, May 1, 2019 at 7:56 PM Erick Erickson 
wrote:

> This sounds like an XY problem. You’re asking how to paginate, but not
> explaining the problem you want to solve with pagination.
>
> I don’t immediately see what purpose paginating serves here. What
> significance does a page have to do with the gatherNodes? What use would the
> _user_ have for these results? Especially for two unrelated queries. IOW
> if for query1 you count something for page 13, and for query2 you also
> count something for page 13, what information is the user getting in those
> two cases? Especially if the total result set for query1 is 1,000 docs but
> for query2 is 10,000,000 docs?
>
> But in general no, streaming is orthogonal to most use-cases for
> pagination and isn’t really supported except if you read through the
> returns and throw away the first N pages, probably pretty inefficient.
>
> Erick
>
> > On May 1, 2019, at 1:28 PM, Pratik Patel  wrote:
> >
> > Hello Everyone,
> >
> > Is there a way to paginate the results of Streaming Expression?
> >
> > Let's say I have a simple gatherNodes function which has count operation
> at
> > the end of it. I can sort by the count fine but now I would like to be
> able
> > to select specific sub set of result based on pagination parameters. Is
> > there any way to do that?
> >
> > Thanks!
> > Pratik
>
>


Re: solr 7.x sql query returns null

2019-04-19 Thread Joel Bernstein
Ok I updated the ticket, we can move the discussion there.

Joel Bernstein
http://joelsolr.blogspot.com/


On Fri, Apr 19, 2019 at 7:44 AM David Barnett  wrote:

> Hi Joel
>
> BUG created in jira SOLR-13414
>
> Please let me know if you need more info
>
> Thanks
>
> Dave
> On 18 Apr 2019, 20:50 -0500, Joel Bernstein , wrote:
> > That stack trace points here:
> >
> https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.3.0/solr/core/src/java/org/apache/solr/handler/sql/SolrSchema.java#L103
> >
> > So the Sql Schema is not initializing properly for this dataset. I'd be
> > interested in understanding why.
> >
> > If you want to create a jira ticket and attach your schema we can track
> > this down. I'll probably attach a special binary to the ticket which has
> > additional logging so we can can find out what field is causing the
> problem.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> >
> > On Thu, Apr 18, 2019 at 1:38 PM David Barnett  wrote:
> >
> > > Hi Joel, besides the solr log is there anywhere else i need to go ?
> > > anything I need to set to get more detail ?
> > >
> > > On Thu, 18 Apr 2019 at 10:46, Joel Bernstein 
> wrote:
> > >
> > > > This let's make sure the jdbc URL is correct.
> > > >
> > > > Reloading the collection shouldn't effect much unless the schema is
> > > > different.
> > > >
> > > > But as Shawn mentioned the stack trace is not coming from Solr. Is
> there
> > > > more in the logs beyond the Calcite exception?
> > > >
> > > > Joel Bernstein
> > > > http://joelsolr.blogspot.com/
> > > >
> > > >
> > > > On Thu, Apr 18, 2019 at 11:04 AM Shawn Heisey 
> > > wrote:
> > > >
> > > > > On 4/18/2019 1:47 AM, David Barnett wrote:
> > > > > > I have a large solr 7.3 collection 400m + documents.
> > > > > >
> > > > > > I’m trying to use the Solr JDBC driver to query the data but I
> get a
> > > > > >
> > > > > > java.io.IOException: Failed to execute sqlQuery 'select id from
> > > > document
> > > > > limit 10' against JDBC connection 'jdbc:calcitesolr:'.
> > > > > > Error while executing SQL "select id from document limit 10":
> null
> > > > >
> > > > > 
> > > > >
> > > > > By the way, either that JDBC url is extremely incomplete or you
> nuked
> > > it
> > > > > from the log before sharing. Seeing the construction of the full
> URL
> > > > > might be helpful. If you need to redact it in some way for privacy
> > > > > concerns, do so in a way so that we can still tell what the URL
> was -
> > > > > change a real password to PASSWORD, change things like host names
> to
> > > > > something like HOST_NAME, etc.
> > > > >
> > > > > > Caused by: java.lang.NullPointerException
> > > > > > at
> > > > >
> > > >
> > >
> org.apache.calcite.plan.volcano.VolcanoPlanner.validate(VolcanoPlanner.java:891
> > > > > > at
> > > > >
> > > > >
> > > >
> > >
> org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:866)
> > > > > > at
> > > > >
> > > >
> > >
> org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:883)
> > > > > > at
> > > > >
> > > >
> > >
> org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:101)
> > > > > > at
> > > > >
> > > >
> > >
> org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:336)
> > > > > > at
> > > > >
> > > >
> > >
> org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1496)
> > > > > > at
> > > > >
> > > >
> > >
> org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:863)
> > > > > > at
> > > > >
> > > >
> > >
> org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:883)
> > > > > > at
> > > > >
> > > >
> > >
> org.apache.calcite.plan.volcano.Volc

Re: solr 7.x sql query returns null

2019-04-18 Thread Joel Bernstein
That stack trace points here:
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.3.0/solr/core/src/java/org/apache/solr/handler/sql/SolrSchema.java#L103

So the Sql Schema is not initializing properly for this dataset. I'd be
interested in understanding why.

If you want to create a jira ticket and attach your schema we can track
this down. I'll probably attach a special binary to the ticket which has
additional logging so we can find out what field is causing the problem.

Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Apr 18, 2019 at 1:38 PM David Barnett  wrote:

> Hi Joel, besides the solr log is there anywhere else i need to go ?
> anything I need to set to get more detail ?
>
> On Thu, 18 Apr 2019 at 10:46, Joel Bernstein  wrote:
>
> > This let's make sure the jdbc URL is correct.
> >
> > Reloading the collection shouldn't effect much unless the schema is
> > different.
> >
> > But as Shawn mentioned the stack trace is not coming from Solr. Is there
> > more in the logs beyond the Calcite exception?
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> >
> > On Thu, Apr 18, 2019 at 11:04 AM Shawn Heisey 
> wrote:
> >
> > > On 4/18/2019 1:47 AM, David Barnett wrote:
> > > > I have a large solr 7.3 collection 400m + documents.
> > > >
> > > > I’m trying to use the Solr JDBC driver to query the data but I get a
> > > >
> > > > java.io.IOException: Failed to execute sqlQuery 'select id from
> > document
> > > limit 10' against JDBC connection 'jdbc:calcitesolr:'.
> > > > Error while executing SQL "select id from document limit 10": null
> > >
> > > 
> > >
> > > By the way, either that JDBC url is extremely incomplete or you nuked
> it
> > > from the log before sharing.  Seeing the construction of the full URL
> > > might be helpful.  If you need to redact it in some way for privacy
> > > concerns, do so in a way so that we can still tell what the URL was -
> > > change a real password to PASSWORD, change things like host names to
> > > something like HOST_NAME, etc.
> > >
> > > > Caused by: java.lang.NullPointerException
> > > >  at
> > >
> >
> org.apache.calcite.plan.volcano.VolcanoPlanner.validate(VolcanoPlanner.java:891
> > > >  at
> > >
> > >
> >
> org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:866)
> > > >  at
> > >
> >
> org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:883)
> > > >  at
> > >
> >
> org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:101)
> > > >  at
> > >
> >
> org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:336)
> > > >  at
> > >
> >
> org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1496)
> > > >  at
> > >
> >
> org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:863)
> > > >  at
> > >
> >
> org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:883)
> > > >  at
> > >
> >
> org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:101)
> > > >  at
> > >
> >
> org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:336)
> > > >  at
> > >
> >
> org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1496)
> > > >  at
> > >
> >
> org.apache.calcite.plan.volcano.VolcanoPlanner.setRoot(VolcanoPlanner.java:308)
> > > >  at
> org.apache.calcite.tools.Programs$5.run(Programs.java:309)
> > > >  at
> > >
> org.apache.calcite.tools.Programs$SequenceProgram.run(Programs.java:387)
> > > >  at
> > org.apache.calcite.prepare.Prepare.optimize(Prepare.java:186)
> > > >  at
> > > org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:319)
> > > >  at
> > > org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:228)
> > > >  at
> > >
> >
> org.apache.calcite.prepare.CalcitePrepareImpl.prepare2_(CalcitePrepareImpl.java:784)
> > > >  at
> > >
> >
> org.apache.calcite.prepare.CalcitePrepareImpl.prepare_(CalcitePrep

Re: solr 7.x sql query returns null

2019-04-18 Thread Joel Bernstein
Let's make sure the jdbc URL is correct.

Reloading the collection shouldn't affect much unless the schema is
different.

But as Shawn mentioned the stack trace is not coming from Solr. Is there
more in the logs beyond the Calcite exception?
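
(For reference, a bare-bones SolrCloud JDBC connection is sketched below; the
ZooKeeper connect string, collection name, and query are placeholders, and
solr-solrj must be on the classpath:)

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SolrJdbcCheck {
  public static void main(String[] args) throws Exception {
    // ZooKeeper connect string plus the collection to query
    String url = "jdbc:solr://localhost:9983?collection=document";
    try (Connection con = DriverManager.getConnection(url);
         Statement stmt = con.createStatement();
         ResultSet rs = stmt.executeQuery("select id from document limit 10")) {
      while (rs.next()) {
        System.out.println(rs.getString("id"));
      }
    }
  }
}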

Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Apr 18, 2019 at 11:04 AM Shawn Heisey  wrote:

> On 4/18/2019 1:47 AM, David Barnett wrote:
> > I have a large solr 7.3 collection 400m + documents.
> >
> > I’m trying to use the Solr JDBC driver to query the data but I get a
> >
> > java.io.IOException: Failed to execute sqlQuery 'select id from document
> limit 10' against JDBC connection 'jdbc:calcitesolr:'.
> > Error while executing SQL "select id from document limit 10": null
>
> 
>
> By the way, either that JDBC url is extremely incomplete or you nuked it
> from the log before sharing.  Seeing the construction of the full URL
> might be helpful.  If you need to redact it in some way for privacy
> concerns, do so in a way so that we can still tell what the URL was -
> change a real password to PASSWORD, change things like host names to
> something like HOST_NAME, etc.
>
> > Caused by: java.lang.NullPointerException
> >  at
> org.apache.calcite.plan.volcano.VolcanoPlanner.validate(VolcanoPlanner.java:891
> >  at
>
> org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:866)
> >  at
> org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:883)
> >  at
> org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:101)
> >  at
> org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:336)
> >  at
> org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1496)
> >  at
> org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:863)
> >  at
> org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:883)
> >  at
> org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:101)
> >  at
> org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:336)
> >  at
> org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1496)
> >  at
> org.apache.calcite.plan.volcano.VolcanoPlanner.setRoot(VolcanoPlanner.java:308)
> >  at org.apache.calcite.tools.Programs$5.run(Programs.java:309)
> >  at
> org.apache.calcite.tools.Programs$SequenceProgram.run(Programs.java:387)
> >  at org.apache.calcite.prepare.Prepare.optimize(Prepare.java:186)
> >  at
> org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:319)
> >  at
> org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:228)
> >  at
> org.apache.calcite.prepare.CalcitePrepareImpl.prepare2_(CalcitePrepareImpl.java:784)
> >  at
> org.apache.calcite.prepare.CalcitePrepareImpl.prepare_(CalcitePrepareImpl.java:639)
> >  at
> org.apache.calcite.prepare.CalcitePrepareImpl.prepareSql(CalcitePrepareImpl.java:609)
> >  at
> org.apache.calcite.jdbc.CalciteConnectionImpl.parseQuery(CalciteConnectionImpl.java:214)
> >  at
> org.apache.calcite.jdbc.CalciteMetaImpl.prepareAndExecute(CalciteMetaImpl.java:603)
> >  at
> org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:638)
> >  at
> org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:149)
>
> The root cause is an NPE in pure calcite code (no Solr classes listed).
> Calcite didn't like the SQL query for some reason.  I'm not at all
> familiar with Calcite.
>
> Did you try to query a single core (shard replica) rather than the
> collection?  I wonder if doing that might make the driver think it's not
> talking to SolrCloud.  Joel is the expert here, I don't know much about it.
>
> What context is this being used in?  The dataimport handler, or
> something you wrote yourself?  I don't know if this information is
> important, just trying to provide as much information for Joel as I can.
>
> Thanks,
> Shawn
>


Re: solr 7.x sql query returns null

2019-04-18 Thread Joel Bernstein
I ask this because SQL/JDBC may return a similar error if you try to run it
on a non-Solr Cloud index.


Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Apr 18, 2019 at 10:16 AM Joel Bernstein  wrote:

> Was the original index a Solr Cloud index?
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Thu, Apr 18, 2019 at 7:48 AM David Barnett 
> wrote:
>
>> I have a large solr 7.3 collection 400m + documents.
>>
>> I’m trying to use the Solr JDBC driver to query the data but I get a
>>
>>
>> java.io.IOException: Failed to execute sqlQuery 'select id from document
>> limit 10' against JDBC connection 'jdbc:calcitesolr:'.
>> Error while executing SQL "select id from document limit 10": null
>>
>>
>>
>>
>> If I export the documents to JSON and reimport (after a full delete it
>> works without issue but obviously this is not a fix or understanding the
>> issue)
>>
>>
>>
>> Several things I’ve tried try to eliminate the issue but with no success:
>>
>> I’ve tried upgrading to Solr 7.7
>>
>> I’ve run the UpgradeTool on the index
>>
>> I’ve tried replicating the collection to a new instance
>>
>>
>> I am unable to understand what it is within the collection causing the
>> issue, can you suggest a way for me to get more details about the fault
>>
>>
>> Here is the full stack trace from the Logging interface Admin UI:
>>
>>
>> java.io.IOException: Failed to execute sqlQuery 'select id from document
>> limit 10' against JDBC connection 'jdbc:calcitesolr:'.
>> Error while executing SQL "select id from document limit 10": null
>> at
>> org.apache.solr.client.solrj.io.stream.JDBCStream.open(JDBCStream.java:271)
>> at
>> org.apache.solr.client.solrj.io.stream.ExceptionStream.open(ExceptionStream.java:54)
>> at
>> org.apache.solr.handler.StreamHandler$TimerStream.open(StreamHandler.java:394)
>> at
>> org.apache.solr.client.solrj.io.stream.TupleStream.writeMap(TupleStream.java:78)
>> at
>> org.apache.solr.common.util.JsonTextWriter.writeMap(JsonTextWriter.java:164)
>> at
>> org.apache.solr.common.util.TextWriter.writeVal(TextWriter.java:69)
>> at
>> org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:152)
>> at
>> org.apache.solr.common.util.JsonTextWriter.writeNamedListAsMapWithDups(JsonTextWriter.java:386)
>> at
>> org.apache.solr.common.util.JsonTextWriter.writeNamedList(JsonTextWriter.java:292)
>> at
>> org.apache.solr.response.JSONWriter.writeResponse(JSONWriter.java:73)
>> at
>> org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:66)
>> at
>> org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
>> at
>> org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:788)
>> at
>> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:525)
>> at
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:395)
>> at
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:341)
>> at
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
>> at
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
>> at
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
>> at
>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>> at
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>> at
>> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
>> at
>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1588)
>> at
>> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
>> at
>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
>> at
>> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
>> at
>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
>> at
>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1557)
>> at
>> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHa

Re: solr 7.x sql query returns null

2019-04-18 Thread Joel Bernstein
Was the original index a Solr Cloud index?



Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Apr 18, 2019 at 7:48 AM David Barnett 
wrote:

> I have a large solr 7.3 collection 400m + documents.
>
> I’m trying to use the Solr JDBC driver to query the data but I get a
>
>
> java.io.IOException: Failed to execute sqlQuery 'select id from document
> limit 10' against JDBC connection 'jdbc:calcitesolr:'.
> Error while executing SQL "select id from document limit 10": null
>
>
>
>
> If I export the documents to JSON and reimport (after a full delete it
> works without issue but obviously this is not a fix or understanding the
> issue)
>
>
>
> Several things I’ve tried try to eliminate the issue but with no success:
>
> I’ve tried upgrading to Solr 7.7
>
> I’ve run the UpgradeTool on the index
>
> I’ve tried replicating the collection to a new instance
>
>
> I am unable to understand what it is within the collection causing the
> issue, can you suggest a way for me to get more details about the fault
>
>
> Here is the full stack trace from the Logging interface Admin UI:
>
>
> java.io.IOException: Failed to execute sqlQuery 'select id from document
> limit 10' against JDBC connection 'jdbc:calcitesolr:'.
> Error while executing SQL "select id from document limit 10": null
> at
> org.apache.solr.client.solrj.io.stream.JDBCStream.open(JDBCStream.java:271)
> at
> org.apache.solr.client.solrj.io.stream.ExceptionStream.open(ExceptionStream.java:54)
> at
> org.apache.solr.handler.StreamHandler$TimerStream.open(StreamHandler.java:394)
> at
> org.apache.solr.client.solrj.io.stream.TupleStream.writeMap(TupleStream.java:78)
> at
> org.apache.solr.common.util.JsonTextWriter.writeMap(JsonTextWriter.java:164)
> at
> org.apache.solr.common.util.TextWriter.writeVal(TextWriter.java:69)
> at
> org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:152)
> at
> org.apache.solr.common.util.JsonTextWriter.writeNamedListAsMapWithDups(JsonTextWriter.java:386)
> at
> org.apache.solr.common.util.JsonTextWriter.writeNamedList(JsonTextWriter.java:292)
> at
> org.apache.solr.response.JSONWriter.writeResponse(JSONWriter.java:73)
> at
> org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:66)
> at
> org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
> at
> org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:788)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:525)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:395)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:341)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1588)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
> at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1557)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
> at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> at
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:3

Re: Topic & Executor streaming expressions

2019-04-15 Thread Joel Bernstein
This blog covers this topic in some depth:

https://joelsolr.blogspot.com/2017/01/deploying-solrs-new-parallel-executor.html
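
(The short version, as a rough sketch: topic() subscribes to a query and only
returns tuples it has not returned before, and executor() runs the expression
stored in each tuple's expr_s field across a pool of threads, usually wrapped
in a daemon so it keeps polling. The collection names, interval, and thread
count below are placeholders:)

daemon(id="task-runner", runInterval="10000", terminate="true",
       executor(threads="4",
                topic(checkpoints, tasks, q="*:*", fl="id,expr_s", id="task-topic")))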


Joel Bernstein
http://joelsolr.blogspot.com/


On Mon, Apr 15, 2019 at 11:19 AM Nazerke S  wrote:

> Hi everyone!
>
> Can anyone elaborate the topic and executor streaming expressions? What I
> understand from the Solr reference guide was that the topic allows for
> subscribing to a query. So that whenever I execute a query, it returns the
> tuples that are not seen yet ??  What about executor function? Is it
> similar to the parallel streaming expression that is, the given number of
> threads do the task parallel? Can you provide an example which are not
> written in the Solr reference guide?
>
>
> Many thanks!
>


Re: gatherNodes question. Is this a bug?

2019-04-12 Thread Joel Bernstein
The outer gatherNodes is bringing back a different set of nodes:

{
  "result-set": {
"docs": [
  {
"node": "01/20577-4",
"collection": "my_collection",
"field": "process_number",
"ancestors": [],
"level": 0
  },
  {
"node": "01/19764-4",
"collection": "my_collection",
"field": "process_number",
"ancestors": [],
"level": 0
  },
  {
"node": "01/09299-2",
"collection": "my_collection",
"field": "process_number",
"ancestors": [],
"level": 0
  },
  {
"node": "01/21532-4",
"collection": "my_collection",
"field": "process_number",
"ancestors": [],
"level": 0
  },
  {
"node": "01/11664-0",
"collection": "my_collection",
"field": "process_number",
"ancestors": [],
"level": 0
  },
  {
"EOF": true,
"RESPONSE_TIME": 30
  }
]
  }
}

Here is the inner node set:

{
  "result-set": {
"docs": [
  {
"node": "01/02444-7",
"type_status_facet ": "Ongoing projetcs",
"amount": 154620
  },
  {
"node": "01/08149-7",
"type_status_facet ": "Ongoing projetcs",
"amount": 131115
  },
  {
"node": "01/21749-3",
"type_status_facet ": "Ongoing projetcs",
"amount": 157300
  },
  {
"node": "01/22503-8",
"type_status_facet ": "Ongoing projetcs",
"amount": 154800
  },
  {
"EOF": true,
"RESPONSE_TIME": 24
  }
]
  }
}

This is to be expected.

Try turning on tracking, with trackTraversal="true".

This will show the ancestors and hopefully make things more clear.


Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Apr 11, 2019 at 12:07 PM Kojo  wrote:

> Joel,
> thank you in advance.
>
> Follows SE code and resultset for each piece. I only change some resultset
> values, without change the logic.
> I am on Solr 6.6.2.
>
> 1. First SE - Inner gatherNodes:
>
> sort(
> gatherNodes( my_collection, gatherNodes( my_collection,
> search(my_collection, qt="/export", q=*:*,
> fl="process_number", sort="process_number asc",  fq=-ancestor:*,
> fq=situacao:("On going"), fq=area_pt:("Ciências Exatas e da Terra"),
> fq=auxilio_pesquisa_pt:("07;Auxílio Pesquisador|00;Auxilio Pesquisador -
> Brasil")),
> walk="process_number->ancestor", trackTraversal="true",
> gather="process_number"), walk="node->ancestor", trackTraversal="true",
> gather="process_number", scatter="branches, leaves"),
> by="node asc")
>
>
> {
>   "result-set": {
> "docs": [
>   {
> "node": "01/09299-2",
> "collection": "my_collection",
> "field": "process_number",
> "ancestors": [],
> "level": 0
>   },
>   {
> "node": "01/11664-0",
> "collection": "my_collection",
> "field": "process_number",
> "ancestors": [],
> "level": 0
>   },
>   {
> "node": "01/19764-4",
> "collection": "my_collection",
> "field": "process_number",
> "ancestors": [],
> "level": 0
>   },
>   {
> "node": "01/20577-4",
> "collection": "my_collection",
> "field": "process_number",
> "ancestors": [],
> "level": 0
>   },
>   {
> "node": "01/21532-4",
> "collection": "my_collection",
> "field": "process_number",
> "ancesto

Re: gatherNodes question. Is this a bug?

2019-04-10 Thread Joel Bernstein
What you're trying to do should work. Possibly if you provide more detail,
like the full query with some sample outputs, I might be able to see what
the issue is.

Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, Apr 10, 2019 at 10:55 AM Kojo  wrote:

> Hello everybody I have a question about Streaming Expression/Graph
> Traversal.
>
> The following pseudocode works fine:
>
> complement( search(),
> sort(
> gatherNodes( collection, search())
> ),
> )
>
>
> However, when I feed the SE resultset above to another gatherNodes
> function, I have a result different from what I expected. It returns the
> root nodes (branches) of the inner gatherNodes:
>
> gatherNodes( collection,
> complement( search(),
> sort(
>gatherNodes( collection, search())
> ),
> ),
> )
>
> In the case I tested, the outer gatherNodes does not have leaves. I was
> expecting the result of the "complement" function to be the root nodes
> of the outer gatherNodes function. Do you know how I can achieve this?
>
> Thank you,
>


Re: var, sttdev Streaming Evaluators.

2019-04-10 Thread Joel Bernstein
They currently are not. You can use describe() to get these values and
getValue() if you want to use a specific value.

 let(a=array(1,3,3), m=describe(a), s=getValue(m, stdev))

It makes sense to add these on their own as well.



Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, Apr 10, 2019 at 11:13 AM Nazerke S  wrote:

> Hi,
>
> I have got a question about Streaming Expression evaluators.
> I would like to calculate mean, standard deviation and variance of the
> given array.
>
> For example, the following code works for the mean:
>let(arr=array(1,3,3), m=mean(a))
>
> Also, I want to compute variance and standard deviation as well i.e.:
>  let(echo="m,v,sd", arr=array(1,3,3), m=mean(a), v=var(a),
> sd=stddev(a))
>
> It seems the var(), stddev() evaluator functions are not implemented as
> separate functions??
>
> __Nazerke
>


Re: Gather Nodes Streaming

2019-03-21 Thread Joel Bernstein
gatherNodes requires single value fields in the tuples. In certain
scenarios the cartesianProduct streaming expression can be used to explode
a multi-value field into a single field stream. But in the scenario you
describe this might not be possible.
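
(For illustration, cartesianProduct emits one tuple per value of a
multi-valued field, roughly like the sketch below; the collection and field
names are placeholders and to_ss is the multi-valued field being exploded:)

cartesianProduct(
  search(emails, q="*:*", fl="id,from_s,to_ss", sort="id asc", qt="/export"),
  to_ss,
  productSort="id asc")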



Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, Mar 20, 2019 at 10:58 PM Zheng Lin Edwin Yeo 
wrote:

> Hi,
>
> What is the fieldType of your 'to field? Which tokenizers/filters is it
> using?
>
> Also, which Solr version are you using?
>
> Regards,
> Edwin
>
> On Thu, 21 Mar 2019 at 01:57, Susmit Shukla 
> wrote:
>
> > Hi,
> >
> > Trying to use solr streaming 'gatherNodes' function. It is for extracting
> > email graph based on from and to fields.
> > It requires 'to' field to be a single value field with docvalues enabled
> > since it is used internally for sorting and unique streams
> >
> > The 'to' field can contain multiple email addresses - each being a node.
> > How to map multiple comma separated email addresses from the 'to' fields
> as
> > separate graph nodes?
> >
> > Thanks
> >
> >
> >
> > >
> > >
> >
>


Re: Re-read from CloudSolrStream

2019-02-20 Thread Joel Bernstein
It sounds like you just need to catch the exception?
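
(One possible pattern, sketched very roughly below: catch the exception,
remember the last sort value that was processed, and re-open the stream with a
filter that skips past that checkpoint. This assumes fieldA is the sort field,
that its values need no query escaping, and it mirrors the snippet quoted
below; it is not production code:)

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.solr.client.solrj.io.Tuple;
import org.apache.solr.client.solrj.io.stream.CloudSolrStream;

public class ResumableStreamRead {

   public static void readAll(String zkHost, String collection) throws IOException {
      String lastFieldA = null;   // checkpoint: last successfully processed sort value
      boolean done = false;
      while (!done) {
         Map props = new HashMap();
         props.put("q", "*:*");
         props.put("qt", "/export");
         props.put("sort", "fieldA asc");
         props.put("fl", "fieldA,fieldB,fieldC");
         if (lastFieldA != null) {
            // skip everything up to and including the checkpoint
            props.put("fq", "fieldA:{" + lastFieldA + " TO *]");
         }
         CloudSolrStream cstream = new CloudSolrStream(zkHost, collection, props);
         try {
            cstream.open();
            while (true) {
               Tuple tuple = cstream.read();
               if (tuple.EOF) {
                  done = true;
                  break;
               }
               lastFieldA = tuple.getString("fieldA");
               // process the tuple here ...
            }
         } catch (IOException e) {
            // fall through and retry from the checkpoint
         } finally {
            cstream.close();
         }
      }
   }
}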


Joel Bernstein
http://joelsolr.blogspot.com/


On Mon, Feb 18, 2019 at 3:14 AM SOLR4189  wrote:

> Hi all,
>
> Let's say I have a next code:
>
> http://joelsolr.blogspot.com/2015/04/the-streaming-api-solrjio-basics.html
> <
> http://joelsolr.blogspot.com/2015/04/the-streaming-api-solrjio-basics.html>
>
>
> public class StreamingClient {
>
>public static void main(String args[]) throws IOException {
>   String zkHost = args[0];
>   String collection = args[1];
>
>   Map props = new HashMap();
>   props.put("q", "*:*");
>   props.put("qt", "/export");
>   props.put("sort", "fieldA asc");
>   props.put("fl", "fieldA,fieldB,fieldC");
>
>   CloudSolrStream cstream = new CloudSolrStream(zkHost,
> collection,
> props);
>   try {
>
> cstream.open();
> while(true) {
>
>   Tuple tuple = cstream.read();
>   if(tuple.EOF) {
>  break;
>   }
>
>   String fieldA = tuple.getString("fieldA");
>   String fieldB = tuple.getString("fieldB");
>   String fieldC = tuple.getString("fieldC");
>   System.out.println(fieldA + ", " + fieldB + ", " + fieldC);
> }
>
>   } finally {
>cstream.close();
>   }
>}
> }
>
> What can I do if I get exception in the line *Tuple tuple =
> cstream.read();*? How can I re-read the same tuple, i.e. to continue from
> exception moment ?
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Under-utilization during streaming expression execution

2019-02-15 Thread Joel Bernstein
Use large batches and fetch instead of hashjoin and lots of parallel
workers.
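
(A rough sketch of what that looks like; the collections, fields, and batch
size are placeholders. fetch() pulls the extra fields in bulk lookups of
batchSize documents at a time instead of hashing a whole stream into memory:)

fetch(collection2,
      search(collection1, q="*:*", fl="id,joinKey_s", sort="joinKey_s asc", qt="/export"),
      fl="extraFieldA_s,extraFieldB_s",
      on="joinKey_s=joinKey_s",
      batchSize="500")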

Joel Bernstein
http://joelsolr.blogspot.com/


On Fri, Feb 15, 2019 at 7:48 PM Joel Bernstein  wrote:

> You can run in parallel and that should help quite a bit. But at a really
> large batch job is better done like this:
>
>
> https://joelsolr.blogspot.com/2016/10/solr-63-batch-jobs-parallel-etl-and.html
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Thu, Feb 14, 2019 at 6:10 PM Gus Heck  wrote:
>
>> Hi Folks,
>>
>> I'm looking for ideas on how to speed up processing for a streaming
>> expression. I can't post the full details because it's customer related,
>> but the structure is shown here: https://imgur.com/a/98sENVT What that
>> does
>> is take the results of two queries, join them and push them back into the
>> collection as a new (denormalized) doc. The second (hash) join just
>> updates
>> a field that distinguishes the new docs from either of the old docs so
>> it's
>> hashing exactly one value, and thus this is not of concern for performance
>> (if there were a good way to tell select to modify only one field and keep
>> all the rest without listing the fields explicitly it wouldn't be needed)
>> .
>>
>>
>> When I run it across a test index with 1377364 and 5146620 docs for the
>> two
>> queries. The result is that it inserts 4742322 new documents, in ~10
>> minutes. This seems pretty spiffy except this test index is ~1/1000 of the
>> real index... so obviously I want to find *at least* a factor of 10
>> improvement. So far I managed a factor of about 3 to get it down to
>> slightly over 200 seconds by programmatically building the queries
>> partitioning based on a set of percentiles from a stats query on one of
>> the
>> fields that is a floating point number with good distribution, but this
>> seems to stop helping 10-12 splits on my 50 node cluster, scaling up to
>> split to all 50 nodes brings things back to ~400 seconds.
>>
>> The CPU utilization on the machines mostly stabilizes around 30-50%, Disk
>> metrics don't seem to look bad (disk idle stat in AWS stays over 90%).
>> Still trying to get a good handle on network numbers, but I'm guessing
>> that
>> I'm either network limited or there's an inefficiency with contention
>> somewhere inside solr (no I haven't put a profiler on it yet).
>>
>> Here's the interesting bit. I happen to know that the join key in the
>> leftJoin is on a key that is used for document routing, so we're only
>> joining up with documents on the same node. Furthermore, the id generated
>> is a concatenation of these id's with a value from one of the fields and
>> should also route to the same node... Is there any way to make the whole
>> expression run locally on the nodes to avoid throwing the data back and
>> forth across the network needlessly?
>>
>> Any other ideas for making this go another factor of 2-3 faster?
>>
>> -Gus
>>
>


Re: Under-utilization during streaming expression execution

2019-02-15 Thread Joel Bernstein
You can run in parallel and that should help quite a bit. But a really
large batch job is better done like this:

https://joelsolr.blogspot.com/2016/10/solr-63-batch-jobs-parallel-etl-and.html

Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Feb 14, 2019 at 6:10 PM Gus Heck  wrote:

> Hi Folks,
>
> I'm looking for ideas on how to speed up processing for a streaming
> expression. I can't post the full details because it's customer related,
> but the structure is shown here: https://imgur.com/a/98sENVT What that
> does
> is take the results of two queries, join them and push them back into the
> collection as a new (denormalized) doc. The second (hash) join just updates
> a field that distinguishes the new docs from either of the old docs so it's
> hashing exactly one value, and thus this is not of concern for performance
> (if there were a good way to tell select to modify only one field and keep
> all the rest without listing the fields explicitly it wouldn't be needed) .
>
>
> When I run it across a test index with 1377364 and 5146620 docs for the two
> queries. The result is that it inserts 4742322 new documents, in ~10
> minutes. This seems pretty spiffy except this test index is ~1/1000 of the
> real index... so obviously I want to find *at least* a factor of 10
> improvement. So far I managed a factor of about 3 to get it down to
> slightly over 200 seconds by programmatically building the queries
> partitioning based on a set of percentiles from a stats query on one of the
> fields that is a floating point number with good distribution, but this
> seems to stop helping 10-12 splits on my 50 node cluster, scaling up to
> split to all 50 nodes brings things back to ~400 seconds.
>
> The CPU utilization on the machines mostly stabilizes around 30-50%, Disk
> metrics don't seem to look bad (disk idle stat in AWS stays over 90%).
> Still trying to get a good handle on network numbers, but I'm guessing that
> I'm either network limited or there's an inefficiency with contention
> somewhere inside solr (no I haven't put a profiler on it yet).
>
> Here's the interesting bit. I happen to know that the join key in the
> leftJoin is on a key that is used for document routing, so we're only
> joining up with documents on the same node. Furthermore, the id generated
> is a concatenation of these id's with a value from one of the fields and
> should also route to the same node... Is there any way to make the whole
> expression run locally on the nodes to avoid throwing the data back and
> forth across the network needlessly?
>
> Any other ideas for making this go another factor of 2-3 faster?
>
> -Gus
>


Re: Solr collapse result repeat in 6.6.5 cloud example techproducts.

2019-02-07 Thread Joel Bernstein
Do you have more than one shard? Collapse requires that all docs in the
same collapse group be co-located on the same shard.

Grouping, I believe, does not require this in some scenarios.
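
(If the index has more than one shard, one common way to keep a collapse group
together is compositeId routing, i.e. prefixing each document id with the
collapse key so the whole group hashes to the same shard. The ids below are
only illustrative:)

  "fantasy!0812521390"
  "fantasy!0553573403"
  "scifi!0553293354"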


Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Feb 7, 2019 at 4:07 PM 刘正  wrote:

> I tried this request against the techproducts collection:
>
> {code}
>
> select?fl=id,genre_s&fq={!collapse%20field=genre_s}&indent=on&q=genre_s:*&wt=json
> {code}
>
> and I get this response:
>
> {code:json}
> {
>   "responseHeader":{
> "zkConnected":true,
> "status":0,
> "QTime":6,
> "params":{
>   "q":"genre_s:*",
>   "indent":"on",
>   "fl":"id,genre_s",
>   "fq":"{!collapse field=genre_s}",
>   "wt":"json"}},
>   "response":{"numFound":3,"start":0,"maxScore":1.0,"docs":[
>   {
> "id":"0812521390",
> "genre_s":"fantasy"},
>   {
> "id":"0553573403",
> "genre_s":"fantasy"},
>   {
> "id":"0553293354",
> "genre_s":"scifi"}]
>   }}
> {code}
>
> when I make the request with grouping:
>
> {code}
>
> select?fl=id,genre_s&group.field=genre_s&group.limit=1&group=true&indent=on&q=genre_s:*&wt=json
> {code}
>
> i get the response
>
> {code:json}
> {
>   "responseHeader":{
> "zkConnected":true,
> "status":0,
> "QTime":9,
> "params":{
>   "q":"genre_s:*",
>   "indent":"on",
>   "fl":"id,genre_s",
>   "group.limit":"1",
>   "wt":"json",
>   "group.field":"genre_s",
>   "group":"true"}},
>   "grouped":{
> "genre_s":{
>   "matches":10,
>   "groups":[{
>   "groupValue":"fantasy",
>   "doclist":{"numFound":8,"start":0,"maxScore":1.0,"docs":[
>   {
> "id":"0553573403",
> "genre_s":"fantasy"}]
>   }},
> {
>   "groupValue":"scifi",
>   "doclist":{"numFound":2,"start":0,"docs":[
>   {
> "id":"0553293354",
> "genre_s":"scifi"}]
>   }}]}}}
> {code}
>


Re: Single query to get the count for all individual collections

2019-01-21 Thread Joel Bernstein
Streaming Expressions can do this:

plist(stats(collection1, q="*:*", count(*)),
stats(collection2, q="*:*", count(*)),
stats(collection3, q="*:*", count(*)))

The plist function is a parallel list of expressions. It will spin each
expression off in its own thread and concatenate the results of each
expression into a single result set.
Here are the docs:
https://lucene.apache.org/solr/guide/7_6/stream-source-reference.html#stats
https://lucene.apache.org/solr/guide/7_6/stream-decorator-reference.html#plist

plist is quite new, but "list" has been around for a while if you have an
older version of Solr

https://lucene.apache.org/solr/guide/7_6/stream-decorator-reference.html#list_expression
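
(If it helps, an expression like that can be POSTed to any collection's
/stream handler; the host, port, and collection names below are placeholders:)

curl --data-urlencode 'expr=plist(stats(collection1, q="*:*", count(*)),stats(collection2, q="*:*", count(*)))' http://localhost:8983/solr/collection1/stream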








Joel Bernstein
http://joelsolr.blogspot.com/


On Mon, Jan 21, 2019 at 12:53 PM Jens Brandt  wrote:

> Hi,
>
> maybe adding shards.info=true might help. In case of SolrCloud this
> gives you numFound for each shard.
>
> Regards,
>   Jens
>
> > Am 10.01.2019 um 04:40 schrieb Zheng Lin Edwin Yeo  >:
> >
> > Hi,
> >
> > I would like to find out, is there any way that I can send a single query
> > to retrieve the numFound for all the individual collections?
> >
> > I have tried with this query
> >
> http://localhost:8983/solr/collection1/select?q=*:*&collection=collection1,collection2
> > However, this query is doing the sum of all the collections, instead of
> > showing the count for each of the collection.
> >
> > I am using Solr 7.5.0.
> >
> > Regards,
> > Edwin
>
>


Re: Get MLT Interesting Terms for a set of documents corresponding to the query specified

2019-01-21 Thread Joel Bernstein
You may find the significantTerms streaming expression useful:

https://lucene.apache.org/solr/guide/7_6/stream-source-reference.html#significantterms
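
(A rough sketch of what that looks like for this case; the collection, query,
field, and thresholds are placeholders:)

significantTerms(collection1,
                 q="tags:voltage",
                 field="details",
                 limit="20",
                 minDocFreq="5",
                 minTermLength="4")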


Joel Bernstein
http://joelsolr.blogspot.com/


On Mon, Jan 21, 2019 at 3:02 PM Pratik Patel  wrote:

> Aman,
>
> Thanks for the reply!
>
> I have tried with corrected query but it doesn't solve the problem. also,
> my tags filter matches multiple documents, however the interestingTerms
> seems to correspond to just the first document.
> Here is an example of a query which matches 1900 documents.
>
>
> http://localhost:8081/solr/collection1/mlt?debugQuery=on=tags:voltage=true=my_field=details=1=2=3=*:*=100=0
>
>
> Thanks,
> Pratik
>
>
> On Mon, Jan 21, 2019 at 2:52 PM Aman Tandon 
> wrote:
>
> > I see two rows params, looks like which will be overwritten by rows=2,
> and
> > then your tags filter is resulting only one document. Please remove extra
> > rows and try.
> >
> > On Mon, Jan 21, 2019, 08:44 Pratik Patel  >
> > > Hi Everyone!
> > >
> > > I am trying to use MLT request handler. My query matches more than one
> > > documents but the response always seems to pick up the first document
> and
> > > interestingTerms also seems to be corresponding to that single document
> > > only.
> > >
> > > What I am expecting is that if my query matches multiple documents then
> > the
> > > InterestingTerms handler result also corresponds to that set of
> documents
> > > and not the first document.
> > >
> > > Following is my query,
> > >
> > >
> > >
> >
> http://localhost:8081/solr/collection1/mlt?debugQuery=on=tags:test=true=mlt.fl=textpropertymlt=details=1=2=3=*:*=100=2=0
> > >
> > > Ultimately, my goal is to get interesting terms corresponding to this
> > whole
> > > set of documents. I don't need similar documents as such. If not with
> > mlt,
> > > is there any other way I can achieve this? that is, given a query
> > matching
> > > set of documents, find interestingTerms for that set of documents based
> > on
> > > tf-idf?
> > >
> > > Thanks!
> > > Pratik
> > >
> >
>


Re: Error using collapse parser with /export

2019-01-21 Thread Joel Bernstein
I haven't had time to look into the details of this issue but it's not
clear that these two features will be able to be used together, although
it would be nice if they could.

A couple of questions about your use case:

1) After collapsing is it not possible to use the /select handler?
2) After exporting is it possible to unique the records using the unique
Streaming Expression?

Either of those cases would be the typical uses of these features.
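
(A rough sketch of the second option, with placeholder collection and field
names; the /export stream has to be sorted on the field you unique over, so
the first tuple kept per group is the latest one:)

unique(
  search(collection1,
         q="*:*",
         fl="id_field,time,other_field",
         sort="id_field asc, time desc",
         qt="/export"),
  over="id_field")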

Joel Bernstein
http://joelsolr.blogspot.com/


On Sun, Jan 20, 2019 at 10:13 PM Rahul Goswami 
wrote:

> Hello,
>
> Following up on my query. I know this might be too specific an issue. But I
> just want to know that it's a legitimate bug and the supported operation is
> allowed with the /export handler. If someone has an idea about this and
> could confirm, that would be great.
>
> Thanks,
> Rahul
>
> On Thu, Jan 17, 2019 at 4:58 PM Rahul Goswami 
> wrote:
>
> > Hello,
> >
> > I am using SolrCloud on Solr 7.2.1.
> > I get the NullPointerException in the Solr logs (in ExportWriter.java)
> > when the /stream handler is invoked with a search() streaming expression
> > with qt="/export" containing fq="{!collapse field=id_field sort="time
> > desc"} (among other fq's. I tried eliminating one fq at a time to find
> the
> > problematic one. The one with collapse parser is what makes it fail).
> >
> >
> > I see an open JIRA for this issue (with a submitted patch which has not
> > yet been accepted):
> >
> > https://issues.apache.org/jira/browse/SOLR-8291
> >
> >
> >
> > In my case useFilterForSortedQuery=false
> >
> > org.apache.solr.servlet.HttpSolrCall null:java.lang.NullPointerException
> > at org.apache.lucene.util.BitSetIterator.(BitSetIterator.java:61)
> > at org.apache.solr.handler.ExportWriter.writeDocs(ExportWriter.java:243)
> > at
> > org.apache.solr.handler.ExportWriter.lambda$null$1(ExportWriter.java:222)
> > at
> >
> org.apache.solr.response.JSONWriter.writeIterator(JSONResponseWriter.java:523)
> > at
> >
> org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:180)
> > at org.apache.solr.response.JSONWriter$2.put(JSONResponseWriter.java:559)
> > at
> > org.apache.solr.handler.ExportWriter.lambda$null$2(ExportWriter.java:222)
> > at
> > org.apache.solr.response.JSONWriter.writeMap(JSONResponseWriter.java:547)
> > at
> >
> org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:198)
> > at org.apache.solr.response.JSONWriter$2.put(JSONResponseWriter.java:559)
> > at
> >
> org.apache.solr.handler.ExportWriter.lambda$write$3(ExportWriter.java:220)
> > at
> > org.apache.solr.response.JSONWriter.writeMap(JSONResponseWriter.java:547)
> > at org.apache.solr.handler.ExportWriter.write(ExportWriter.java:218)
> > at org.apache.solr.core.SolrCore$3.write(SolrCore.java:2627)
> > at
> >
> org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:49)
> > at
> > org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:788)
> > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:525)
> >
> >
> > Above is a smaller trace; I can provide the complete stacktrace if it
> > helps. Before considering making a fix in ExportWriter.java and
> rebuilding
> > Solr as a last resort, I want to make sure I am not using something which
> > is not supported on SolrCloud. Can anybody please help?
> >
> >
> >
>


Re: 6.3 -> 6.4 Sorting responseWriter renamed

2019-01-11 Thread Joel Bernstein
The functionality should be exactly the same. The config files though need
to be changed. I would recommend adding any custom configs that you have to
the new configs following the ExportWriter changes.


Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Jan 10, 2019 at 11:21 AM Raveendra Yerraguntla
 wrote:

> Hello All,
>
> In 6.4 (Solr-9717)  SortingResponseWriter is renamed to ExportWriter and
> moved to a different package.
>
> For migrating to higher Solr (post 6.4) versions, I need help with
> compatible functionality.
>
>
> Application is using  SortingResponseWriter in the searcher handlers
> inform method to register responseWriters for the xSort.
>
> Since the class and write methods Signature   is changed, what are
> alternative ways to use the functionality.
>
>
>  Thanks
> Ravi
>
>


Re: Combine & Sort - SOLR and SQL Data

2018-12-17 Thread Joel Bernstein
You can take a look at Solr Streaming Expressions to see if they meet your
needs. The "jdbc" stream and "search" stream can be combined using the
"merge" stream.

http://lucene.apache.org/solr/guide/7_6/stream-source-reference.html
http://lucene.apache.org/solr/guide/7_6/stream-decorator-reference.html#merge
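
(A rough sketch of that combination; the connection string, SQL, collection,
and field names are placeholders, both streams have to be sorted on the field
named in the on clause, and depending on the database you may also need a
driver parameter on the jdbc stream:)

merge(
  search(products, q="*:*", fl="id,price_f", sort="price_f asc", qt="/export"),
  jdbc(connection="jdbc:mysql://localhost/mydb?user=u&password=p",
       sql="SELECT id, price AS price_f FROM products ORDER BY price ASC",
       sort="price_f asc"),
  on="price_f asc")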

Joel Bernstein
http://joelsolr.blogspot.com/


On Mon, Dec 17, 2018 at 8:57 AM Tech Support 
wrote:

> Dear Team,
>
> Once per day, my data importing tool will collect data from SQL server and
> add it into SOLRCloud.
>
> Current day data will be imported at the end of the day.
>
> I want to apply search and sorting for all my data. For this case, how can
> I
> combine & sort, both SQL and SOLR data?
>
> Is it possible to achieve both SOLR and SQL Server
>
> Note:
>
> My SOLRCloud setup is running on Windows OS with following Softwares
>
> * SOLR - 2 Instance - 7.5 Version
>
> * External Zookeeper - 3 Instance - 3.4.13 Version
>
>
>
> Thanks,
>
> Karthick Ramu
>
>
>
>


Re: Shuffling Tuples - Parallel SQL

2018-12-07 Thread Joel Bernstein
I'm not sure I understand the question. You seem to be asking how you can
know what values are in a particular field. With shuffling you don't need
to know the values; they are automatically shuffled. If you need to know the
values you can use various queries to look at the data in the field.
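
(For illustration, the shuffle is expressed by putting partitionKeys on the
underlying search and wrapping it in parallel; the collections and fields
below are placeholders. Every tuple with the same email_s value hashes to the
same worker, so per-group operations like unique see complete groups:)

parallel(workerCollection,
         unique(
           search(collection1, q="*:*", fl="id,email_s", sort="email_s asc",
                  qt="/export", partitionKeys="email_s"),
           over="email_s"),
         workers="4",
         sort="email_s asc")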

Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Dec 6, 2018 at 9:25 AM swathi  wrote:

> Hi,
>
> I am reading on Solr’s Parallel SQL.
>
>
> How each replica partition the results by using HashQParser plugin and
> shuffle the tuples with same values in the partition key field to same
> worker node ?
>
>
> How do we know the partition key field values?
>
>
>
>
> Thanks,
>
> Swathi.
>


Re: Streaming In Solr

2018-11-14 Thread Joel Bernstein
The implementation is as follows:

1) There are "stream sources" that generate results from Solr Cloud
collections. Some of these include: search, facet, knnSearch, random,
timeseries, nodes, sql etc...
2) There are "stream decorators" that wrap stream sources and operate over
the result set tuples. Some of these decorators operate over sorted result
sets and don't need to keep much data in memory. For example, the innerJoin
stream decorator merge-joins two sorted streams of tuples. Other stream
decorators read data into memory and perform operations entirely in memory.
An example of this is the sort stream decorator.
3) There are "stream evaluators" that evaluate expressions over the data.
This includes math expressions. These expressions can operate either in a
streaming context using the "select" expression, or in an in-memory context
using the "let" expression to set variables and operate on vectors and
matrices in memory.

But basically you can think of it as decorators operating over streams of
data.
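
As a small concrete sketch of that model (collection and field names are made
up):

select(
  innerJoin(
    search(orders, q="*:*", fl="custId_s,total_d", sort="custId_s asc", qt="/export"),
    search(customers, q="*:*", fl="id_s,name_s", sort="id_s asc", qt="/export"),
    on="custId_s=id_s"),
  name_s as customer,
  mult(total_d, 1.2) as totalWithTax)

Here "search" is the stream source, "innerJoin" is a decorator that merge-joins
the two sorted streams, and "mult" is an evaluator applied to each tuple by
"select".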





Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, Nov 14, 2018 at 3:26 AM Lucky Sharma  wrote:

> Hi Prakhar,
> Thanks for the reply, but what I am actually curious about is how it is
> implemented internally.
> On Wed, Nov 14, 2018 at 12:47 PM Prakhar Nigam
>  wrote:
> >
> > Hi Lucky, Prakhar here.
> >
> > We met at a training at Mahindra Comviva. I have found this
> > article; it may be a little helpful:
> >
> >
> https://medium.com/@sarkaramrit2/getting-started-with-streaming-expressions-in-apache-solr-b49111a417e3
> >
> >
> >
> >
> >
> > Regards,
> >
> > Prakhar
> >
> >
> >
> > From: Lucky Sharma<mailto:goku0...@gmail.com>
> > Sent: Wednesday, November 14, 2018 9:25 AM
> > To: solr-user@lucene.apache.org<mailto:solr-user@lucene.apache.org>
> > Subject: Streaming In Solr
> >
> >
> >
> > Hi, I have some doubts regarding how streaming expressions and
> > parallel SQL queries are evaluated in Solr. I tried to dig deep into the
> > code but wasn't able to find much. A little help would be much
> > appreciated.
> >
> > --
> > Warm Regards,
> >
> > Lucky Sharma
> > Contact No :+91 9821559918
>
>
>
> --
> Warm Regards,
>
> Lucky Sharma
> Contact No :+91 9821559918
>


Re: Median in Solr json facet api

2018-11-14 Thread Joel Bernstein
The JSON facet API uses the t-digest approach to estimate the percentiles.
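
For example, something along these lines returns the median directly in the
facet response (collection and field name assumed here):

curl http://localhost:8983/solr/collection1/query -d '
{
  "query": "*:*",
  "limit": 0,
  "facet": {
    "median_filesize": "percentile(filesize_d,50)"
  }
}'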

You can also use Solr Math Expressions to take a random sample from a field
and estimate the median from the sample. Here is the Streaming Expression:

let(a=random(collection1, q="*:*", fl="filesize_d", rows="25000"),
 b=col(a, filesize_d),
 median=percentile(b, 50))

The example above takes a random sample and sets it to variable "a".
Then the filesize_d values from the sample (in variable "a") are copied to a
vector and set to variable "b".
Then the percentile function is called on the vector and the results are
set to variable "median".

The results look like this:

{ "result-set": { "docs": [ { "median": 39980.53459335005 }, { "EOF": true,
"RESPONSE_TIME": 365 } ] } }

You can adjust the sample size to see how it affects the estimate.

Here is the link to Solr Math Expressions in the User Guide:

https://lucene.apache.org/solr/guide/7_5/math-expressions.html






Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, Nov 14, 2018 at 8:21 AM Toke Eskildsen  wrote:

> On Wed, 2018-11-14 at 17:53 +0530, Anil wrote:
> > I don't see a median aggregation in the JSON Facet API documentation.
>
> It's the 50th percentile:
>
>
>
> https://lucene.apache.org/solr/guide/7_5/json-facet-api.html#metrics-example
>
> - Toke Eskildsen, Royal Danish Library
>
>
>


Re: Unable to get Solr Graph Traversal working

2018-11-08 Thread Joel Bernstein
The basic syntax looks ok. Try it first on the /stream handler to rule out
any issues that might be related to the /graph handler. Can you provide the
logs from one of the shards in the rec_coll collection that are generated
by this request? The logs will show the query that is actually being run on
the shards.
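
For example, the same expression pointed at the /stream handler:

curl --data-urlencode 'expr=gatherNodes(rec_coll,
         walk="35d40c4b9d6ddfsdf45cbb0fe4aesd75->USER_ID",
         gather="ITEM_ID")' \
     http://localhost:8983/solr/rec_coll/stream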

Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, Nov 7, 2018 at 1:22 PM Vidhya Kailash 
wrote:

> I am unable to get even simple graph traversal expressions like the one
> below to work in my environment (7.4 and 7.5 versions). They simply yield
> no results, even though I know the data exists.
> curl --data-urlencode 'expr=gatherNodes(rec_coll,
>
> walk="35d40c4b9d6ddfsdf45cbb0fe4aesd75->USER_ID",
> gather="ITEM_ID")'
> http://localhost:8983/solr/rec_coll/graph
>
> Can someone help?
>
> thanks
> Vidhya
>


Re: streaming expressions substring-evaluator

2018-10-31 Thread Joel Bernstein
The replace operator is going to be "replaced" :)

Let's create an umbrella ticket for string operations and list out what
would be nice to have. They can probably be added very quickly.


Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, Oct 31, 2018 at 8:49 AM Gus Heck  wrote:

> Probably ReplaceWithSubstringOperation (similar to
> ReplaceWithFieldOperation, though that would probably add another class and
> be subject to https://issues.apache.org/jira/browse/SOLR-9661)
>
> On Wed, Oct 31, 2018 at 8:32 AM Joel Bernstein  wrote:
>
> > I don't think there is a substring or similar function. This would be
> quite
> > nice to add along with other string manipulations.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> >
> > On Wed, Oct 31, 2018 at 2:37 AM Aroop Ganguly 
> > wrote:
> >
> > > Hey Team
> > >
> > >
> > > Is there a way to extract a part of a string field, group by it, and
> > > obtain a histogram?
> > >
> > > for example, the field value is a DateTime of the form 20180911T00, and
> > > I want to do a substring like substring(field1,0,7) and then run a
> > > streaming expression of the form:
> > >
> > > rollup(
> > >   select(
> > >     search(col1, fl="field1", sort="field1 asc"),
> > >     substring(field1,0,7) as date),
> > >   on=date,
> > >   count(*))
> > >
> > > Is there a substring operator available, or an alternative, in streaming
> > > expressions?
> > >
> > > Thanks
> > > Aroop
> >
>
>
> --
> http://www.the111shift.com
>

