Re: How to know which value matched in multivalued field

2019-07-12 Thread Takashi Sasaki
I found this page.
https://stackoverflow.com/questions/2135072/determine-which-value-produced-a-hit-in-solr-multivalued-field-type
Hmmm...
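If it helps future readers: as far as I can tell from that thread, Solr does not report which value of a multivalued field produced the hit. One workaround is to request the distance as the score, which at least gives the distance to the closest indexed point. A sketch (field name and point taken from the sample query below; the score=distance local param for RPT fields used as the main query is an assumption on my part):

```
q={!geofilt score=distance filter=false sfield=store pt=45.15,-93.85 d=5}&fl=*,score
```

Each document's score is then the distance to its nearest matching point, which the client can compare against its known values.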

On Fri, Jul 12, 2019 at 22:08 Takashi Sasaki  wrote:
>
> Hi Solr experts,
>
> I have multivalued location on RPT field.
> Is there a way to know which location matched by query?
>
> sample query:
> q=*:*&fq={!bbox sfield=store}&pt=45.15,-93.85&d=5
>
> Of course I can recalculate on the client side,
> but I want to know how to do it using Solr's features.
>
> Solr version is 7.3.1.
>
> Thanks,
> Takashi Sasaki


Re: indexing slow in solr 8.0.0

2019-07-12 Thread Jan Høydahl
You cut CPU in half and see slower indexing. That is to be expected. But you
don't tell us any real details about your setup: your docs, how you index,
how you measure throughput, what your bottleneck is, etc.

Also note that you get better throughput when indexing for the first time than 
if you re-index on top of an existing index.

Jan

> 12. jul. 2019 kl. 15:25 skrev derrick cui :
> 
> Hi,
> I am facing a problem. I just moved my Solr Cloud cluster from one environment
> to another, but performance is extremely slow on the new servers. The only
> difference is the CPU. I copied my whole Solr folder from the old environment
> to the new one and changed the configuration file.
> Before: three servers, 8-core CPU, 32 GB RAM, 300 GB SSD;
> indexing 400k documents took only 5 minutes;
> collection: 3 shards / 2 replicas / 3 nodes.
> Now: three servers, 4-core CPU, 32 GB RAM, 300 GB SSD;
> indexing 400k documents at less than 1 document per minute;
> collection: 3 shards / 2 replicas / 3 nodes.
> 
> Does anyone know what could cause this? Thanks in advance.


Solr 7.7 restore issue

2019-07-12 Thread Mark Thill
I have a 4 node cluster.  My goal is to have 2 shards with two replicas
each and only allowing 1 core on each node.  I have a cluster policy set to:

[{"replica":"2", "shard": "#EACH", "collection":"test",
"port":"8983"},{"cores":"1", "node":"#ANY"}]

I then manually create a collection with:

name: test
config set: test
numShards: 2
replicationFact: 2

This works and I get a collection that looks like what I expect.  I then
backup this collection.  But when I try to restore the collection it fails
and says

"Error getting replica locations : No node can satisfy the rules"
[{"replica":"2", "shard": "#EACH", "collection":"test",
"port":"8983"},{"cores":"1", "node":"#ANY"}]

If I set my cluster-policy rules back to [] and try to restore it then
successfully restores my collection exactly how I expect it to be.  It
appears that having any cluster-policy rules in place is affecting my
restore, but the "error getting replica locations" is strange.
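Clearing the rules and re-applying them after the restore can be scripted against the autoscaling API (sketch only; assumes the v2 endpoint on a 7.x node at localhost):

```
curl -X POST http://localhost:8983/api/cluster/autoscaling \
  -H 'Content-Type: application/json' \
  -d '{"set-cluster-policy": []}'
```

followed by another set-cluster-policy call with the original rules once the restore finishes.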

Any suggestions?

mark 


Re: SolrCloud indexing triggers merges and timeouts

2019-07-12 Thread Rahul Goswami
Upon further investigation on this issue, I see the below log lines during
the indexing process:

2019-06-06 22:24:56.203 INFO  (qtp1169794610-5652)
[c:UM_IndexServer_MailArchiv_Spelle_66AC8340-4734-438A-9D1D-A84B659B1623
s:shard22 r:core_node87
x:UM_IndexServer_MailArchiv_Spelle_66AC8340-4734-438A-9D1D-A84B659B1623_shard22_replica_n84]
org.apache.solr.update.LoggingInfoStream [FP][qtp1169794610-5652]: trigger
flush: activeBytes=352402600 deleteBytes=279 vs limit=104857600
2019-06-06 22:24:56.203 INFO  (qtp1169794610-5652)
[c:UM_IndexServer_MailArchiv_Spelle_66AC8340-4734-438A-9D1D-A84B659B1623
s:shard22 r:core_node87
x:UM_IndexServer_MailArchiv_Spelle_66AC8340-4734-438A-9D1D-A84B659B1623_shard22_replica_n84]
org.apache.solr.update.LoggingInfoStream [FP][qtp1169794610-5652]: thread
state has 352402600 bytes; docInRAM=1
2019-06-06 22:24:56.204 INFO  (qtp1169794610-5652)
[c:UM_IndexServer_MailArchiv_Spelle_66AC8340-4734-438A-9D1D-A84B659B1623
s:shard22 r:core_node87
x:UM_IndexServer_MailArchiv_Spelle_66AC8340-4734-438A-9D1D-A84B659B1623_shard22_replica_n84]
org.apache.solr.update.LoggingInfoStream [FP][qtp1169794610-5652]: 1 in-use
non-flushing threads states
2019-06-06 22:24:56.204 INFO  (qtp1169794610-5652)
[c:UM_IndexServer_MailArchiv_Spelle_66AC8340-4734-438A-9D1D-A84B659B1623
s:shard22 r:core_node87

I have the following questions:
1) The log line which says "thread state has 352402600 bytes; docInRAM=1":
does it mean that the buffer was flushed to disk with only one huge
document?
2) If yes, does this flush create a segment with just one document?
3) Heap dump analysis shows large (>350 MB) instances of
DocumentsWriterPerThread. Does one instance of this class correspond to one
document?
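(For context on question 1: the limit=104857600 in the flush log is 100 MB, which matches the default ramBufferSizeMB. If single documents approach that size, raising the buffer in solrconfig.xml might avoid one-document flushes; 512 below is just an illustrative value:)

```xml
<indexConfig>
  <ramBufferSizeMB>512</ramBufferSizeMB>
</indexConfig>
```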


Help is much appreciated.

Thanks,
Rahul


On Fri, Jul 5, 2019 at 2:11 AM Rahul Goswami  wrote:

> Shawn,Erick,
> Thank you for the explanation. The merge scheduler params make sense now.
>
> Thanks,
> Rahul
>
> On Wed, Jul 3, 2019 at 11:30 AM Erick Erickson 
> wrote:
>
>> Two more tidbits to add to Shawn’s explanation:
>>
>> There are heuristics built in to ConcurrentMergeScheduler.
>> From the Javadocs:
>> * If it's an SSD,
>> *  {@code maxThreadCount} is set to {@code max(1, min(4,
>> cpuCoreCount/2))},
>> *  otherwise 1.  Note that detection only currently works on
>> *  Linux; other platforms will assume the index is not on an SSD.
>>
>> Second, TieredMergePolicy (the default) merges in “tiers” that
>> are of similar size. So you can have multiple merges going on
>> at the same time on disjoint sets of segments.
>>
>> Best,
>> Erick
>>
>> > On Jul 3, 2019, at 7:54 AM, Shawn Heisey  wrote:
>> >
>> > On 7/2/2019 10:53 PM, Rahul Goswami wrote:
>> >> Hi Shawn,
>> >> Thank you for the detailed suggestions. Although, I would like to
>> >> understand the maxMergeCount and maxThreadCount params better. The
>> >> documentation
>> >> <
>> https://lucene.apache.org/solr/guide/7_3/indexconfig-in-solrconfig.html#mergescheduler
>> >
>> >> mentions
>> >> that
>> >> maxMergeCount : The maximum number of simultaneous merges that are
>> allowed.
>> >> maxThreadCount : The maximum number of simultaneous merge threads that
>> >> should be running at once
>> >> Since one thread can only do 1 merge at any given point of time, how
>> does
>> >> maxMergeCount being greater than maxThreadCount help anyway? I am
>> having
>> >> difficulty wrapping my head around this, and would appreciate if you
>> could
>> >> help clear it for me.
>> >
>> > The maxMergeCount setting controls the number of merges that can be
>> *scheduled* at the same time.  As soon as that number of merges is reached,
>> the indexing thread(s) will be paused until the number of merges in the
>> schedule drops below this number.  This ensures that no more merges will be
>> scheduled.
>> >
>> > By setting maxMergeCount higher than the number of merges that are
>> expected in the schedule, you can ensure that indexing will never be
>> paused.  It would require very atypical merge policy settings for the
>> number of scheduled merges to ever reach six.  On my own indexing, I
>> reached three scheduled merges quite frequently.  The default setting for
>> maxMergeCount is three.
>> >
>> > The maxThreadCount setting controls how many of the scheduled merges
>> will be simultaneously executed. With index data on standard spinning
>> disks, you do not want to increase this number beyond 1, or you will have a
>> performance problem due to thrashing disk heads.  If your data is on SSD,
>> you can make it larger than 1.
>> >
>> > Thanks,
>> > Shawn
>>
>>
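For reference, both settings discussed above are configured on the merge scheduler in solrconfig.xml; a sketch with illustrative values (when omitted, Lucene picks defaults using the SSD heuristics Erick quoted):

```xml
<indexConfig>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxMergeCount">6</int>
    <int name="maxThreadCount">2</int>
  </mergeScheduler>
</indexConfig>
```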


Re: Spark-Solr connector

2019-07-12 Thread Dwane Hall
Thanks Shawn, I'll raise a question on the GitHub page. Cheers,
Dwane

From: Shawn Heisey 
Sent: Friday, 12 July 2019 10:05 PM
To: solr-user@lucene.apache.org
Subject: Re: Spark-Solr connector

On 7/11/2019 8:50 PM, Dwane Hall wrote:
> I’ve just started looking at the excellent spark-solr project (thanks Tim 
> Potter, Kiran Chitturi, Kevin Risden and Jason Gerlowski for their efforts 
> with this project it looks really neat!!).
>
> I’m only at the initial stages of my exploration but I’m running into a class 
> not found exception when connecting to a secure solr cloud instance (basic 
> auth, ssl).  Everything is working as expected on a non-secure solr cloud 
> instance.
>
> The process looks pretty straightforward according to the doco so I’m 
> wondering if I’m missing anything obvious or if I need to bring any extra 
> classes to the classpath when using this project?
>
> Any advice would be greatly appreciated.

The exception here (which I did not quote) is in code from Google,
Spark, and Lucidworks.  There are no Solr classes mentioned at all in
the stacktrace.

Which means that we won't be able to help you on this list.  Looking
closer at the stacktrace, it looks to me like you're going to need to
talk to Lucidworks about this problem.

Thanks,
Shawn


Re: Getting list of unique values in a field

2019-07-12 Thread David Hastings
I found this:

https://stackoverflow.com/questions/14485031/faceting-using-solrj-and-solr4

and this

https://www.programcreek.com/java-api-examples/?api=org.apache.solr.client.solrj.response.FacetField


just from a Google search
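Distilling those links, a minimal SolrJ sketch (untested here; assumes a Solr endpoint at localhost:8983 and a collection named "mycollection" — adjust both):

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class UniqueFileExtensions {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/mycollection").build();
        SolrQuery query = new SolrQuery("*:*");
        query.setRows(0);            // we only want facet counts, not documents
        query.setFacet(true);
        query.addFacetField("CC_FILE_EXT");
        query.setFacetLimit(-1);     // -1 returns every distinct value
        query.setFacetMinCount(1);   // skip values no document currently has
        QueryResponse rsp = client.query(query);
        FacetField ff = rsp.getFacetField("CC_FILE_EXT");
        for (FacetField.Count c : ff.getValues()) {
            System.out.println(c.getName() + " -> " + c.getCount());
        }
        client.close();
    }
}
```

Faceting works off the indexed terms, so this never iterates the 10 million documents themselves.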

On Fri, Jul 12, 2019 at 9:46 AM Steven White  wrote:

> Thanks David.  But is there SolrJ sample code showing how to do this?  I need
> to see one, or at least the API, so I know how to make the call.
>
> Steven
>
> On Fri, Jul 12, 2019 at 9:42 AM David Hastings <
> hastings.recurs...@gmail.com>
> wrote:
>
> > Just using a facet on the field should work, yes?
> >
> > On Fri, Jul 12, 2019 at 9:39 AM Steven White 
> wrote:
> >
> > > Hi everyone,
> > >
> > > One of my indexed field is as follows:
> > >
> > >  > > multiValued="false" indexed="true" required="true" stored="false"/>
> > >
> > > It holds the file extension of the files I'm indexing.  That is, let us
> > say
> > > I indexed 10 million files and the result of such indexing, the field
> > > CC_FILE_EXT will now have the file extension.  In my case the unique
> file
> > > extension list is about 300.
> > >
> > > Using SolrJ, is there a quick and fast way for me to get back all the
> > > unique values this field has across all of my documents?  I don't and
> > cannot
> > > scan all the 10 million indexed documents in Solr to build that list.
> > That
> > > would be very inefficient.
> > >
> > > Thanks,
> > >
> > > Steven
> > >
> >
>


Re: Getting list of unique values in a field

2019-07-12 Thread Steven White
Thanks David.  But is there SolrJ sample code showing how to do this?  I need
to see one, or at least the API, so I know how to make the call.

Steven

On Fri, Jul 12, 2019 at 9:42 AM David Hastings 
wrote:

> Just using a facet on the field should work, yes?
>
> On Fri, Jul 12, 2019 at 9:39 AM Steven White  wrote:
>
> > Hi everyone,
> >
> > One of my indexed fields is as follows:
> >
> > <field name="CC_FILE_EXT" multiValued="false" indexed="true" required="true" stored="false"/>
> >
> > It holds the file extension of the files I'm indexing.  That is, let us
> say
> > I indexed 10 million files and the result of such indexing, the field
> > CC_FILE_EXT will now have the file extension.  In my case the unique file
> > extension list is about 300.
> >
> > Using SolrJ, is there a quick and fast way for me to get back all the
> > unique values this field has across all of my documents?  I don't and
> cannot
> > scan all the 10 million indexed documents in Solr to build that list.
> That
> > would be very inefficient.
> >
> > Thanks,
> >
> > Steven
> >
>


Re: Getting list of unique values in a field

2019-07-12 Thread David Hastings
Just using a facet on the field should work, yes?

On Fri, Jul 12, 2019 at 9:39 AM Steven White  wrote:

> Hi everyone,
>
> One of my indexed fields is as follows:
>
> <field name="CC_FILE_EXT" multiValued="false" indexed="true" required="true" stored="false"/>
>
> It holds the file extension of the files I'm indexing.  That is, let us say
> I indexed 10 million files and the result of such indexing, the field
> CC_FILE_EXT will now have the file extension.  In my case the unique file
> extension list is about 300.
>
> Using SolrJ, is there a quick and fast way for me to get back all the
> unique values this field has across all of my documents? I don't and cannot
> scan all the 10 million indexed documents in Solr to build that list.  That
> would be very inefficient.
>
> Thanks,
>
> Steven
>


Getting list of unique values in a field

2019-07-12 Thread Steven White
Hi everyone,

One of my indexed fields is as follows:

<field name="CC_FILE_EXT" multiValued="false" indexed="true" required="true" stored="false"/>

It holds the file extension of the files I'm indexing.  That is, let us say
I indexed 10 million files; as a result of that indexing, the field
CC_FILE_EXT now holds each file's extension.  In my case the list of unique
file extensions is about 300.

Using SolrJ, is there a quick and fast way for me to get back all the
unique values this field has across all of my documents?  I cannot
scan all 10 million indexed documents in Solr to build that list.  That
would be very inefficient.

Thanks,

Steven


indexing slow in solr 8.0.0

2019-07-12 Thread derrick cui
Hi,
I am facing a problem. I just moved my Solr Cloud cluster from one environment
to another, but performance is extremely slow on the new servers. The only
difference is the CPU. I copied my whole Solr folder from the old environment
to the new one and changed the configuration file.
Before: three servers, 8-core CPU, 32 GB RAM, 300 GB SSD;
indexing 400k documents took only 5 minutes;
collection: 3 shards / 2 replicas / 3 nodes.
Now: three servers, 4-core CPU, 32 GB RAM, 300 GB SSD;
indexing 400k documents at less than 1 document per minute;
collection: 3 shards / 2 replicas / 3 nodes.

Does anyone know what could cause this? Thanks in advance.

How to know which value matched in multivalued field

2019-07-12 Thread Takashi Sasaki
Hi Solr experts,

I have multivalued location on RPT field.
Is there a way to know which location matched by query?

sample query:
q=*:*&fq={!bbox sfield=store}&pt=45.15,-93.85&d=5

Of course I can recalculate on the client side,
but I want to know how to do it using Solr's features.

Solr version is 7.3.1.

Thanks,
Takashi Sasaki


Re: Spark-Solr connector

2019-07-12 Thread Shawn Heisey

On 7/11/2019 8:50 PM, Dwane Hall wrote:

I’ve just started looking at the excellent spark-solr project (thanks Tim 
Potter, Kiran Chitturi, Kevin Risden and Jason Gerlowski for their efforts with 
this project it looks really neat!!).

I’m only at the initial stages of my exploration but I’m running into a class 
not found exception when connecting to a secure solr cloud instance (basic 
auth, ssl).  Everything is working as expected on a non-secure solr cloud 
instance.

The process looks pretty straightforward according to the doco so I’m wondering 
if I’m missing anything obvious or if I need to bring any extra classes to the 
classpath when using this project?

Any advice would be greatly appreciated.


The exception here (which I did not quote) is in code from Google, 
Spark, and Lucidworks.  There are no Solr classes mentioned at all in 
the stacktrace.


Which means that we won't be able to help you on this list.  Looking 
closer at the stacktrace, it looks to me like you're going to need to 
talk to Lucidworks about this problem.


Thanks,
Shawn


Re: QTime

2019-07-12 Thread Edward Ribeiro
Yeah, for measuring network latency I would recommend a tool like Charles Proxy.
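For reference, QTime is the value Solr puts in the responseHeader of every response, e.g.:

```json
"responseHeader": {
  "status": 0,
  "QTime": 400
}
```

Wall-clock time measured at the client, minus QTime, approximates network transfer plus response assembly.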

Edward

On Thu, Jul 11, 2019 at 20:59 Erick Erickson 
wrote:

> true, although there’s still network that can’t be included.
>
> > On Jul 11, 2019, at 5:55 PM, Edward Ribeiro 
> wrote:
> >
> > Wouldn't it be a case for using the rows=0 parameter on those requests? Wdyt?
> >
> > Edward
> >
> > On Thu, Jul 11, 2019 at 14:24 Erick Erickson 
> > wrote:
> >
> >> Not only does Qtime not include network latency, it also doesn't include
> >> the time it takes to assemble the docs for return, which can be lengthy
> >> when rows is large..
> >>
> >> On Wed, Jul 10, 2019, 14:39 Shawn Heisey  wrote:
> >>
> >>> On 7/10/2019 3:17 PM, Lucky Sharma wrote:
>  I am seeing one very weird behaviour of Solr's QTime.
> 
>  The scenario is:
>  When I am hitting the Solr Cloud instance, situated at a DC, from my
>  local machine during a load test, I was seeing 400 ms QTime and 1 sec
>  HTTP response time.
> >>>
> >>> How much data was in the response?  If it's large, I can see it taking
> >>> that long to transfer.  This is even more likely if there is a lot of
> >>> network latency in the network between the client and the server.
> >>>
>  While I am trying to do the same process from within the same DC
>  location, I am getting 100 ms Solr QTime and 130 ms response time.
> 
>  Does QTime count network latency too?
> >>>
> >>> There's no way Solr can include the time to send the response over the
> >>> network in QTime.  The value is calculated and put into the response
> >>> before Solr starts sending.
> >>>
> >>> Thanks,
> >>> Shawn
> >>>
> >>
>
>