Re: how to get rid of double quotes in solr

2020-04-15 Thread Paras Lehana
Hi,

Are you referring to the double quotes in the JSON result?

On Tue, 14 Apr 2020 at 08:29, sefty nindyastuti 
wrote:

> the data I use is the log from Hadoop; my problem is the Hadoop log from
> the cluster.
> The pipeline I use is filebeat --> logstash --> solr. I use a logstash
> config to parse the Hadoop log: the log is fed into logstash via filebeat,
> and the output from logstash is indexed into Solr.
>
> On Mon, 13 Apr 2020 at 19:07, Erick Erickson <
> erickerick...@gmail.com> wrote:
>
> > I don’t quite know what you’re asking about. Is that input or output to
> > Solr? Or is it output from logstash?
> >
> > What are you indexing? Because that doesn’t look like data from a Solr
> > log.
> >
> > You might want to review: https://wiki.apache.org/solr/UsingMailingLists
> >
> > Best,
> > Erick
> >
> > > On Apr 13, 2020, at 12:24 AM, sefty nindyastuti 
> > wrote:
> > >
> > > I have a problem when indexing cluster log data into Solr using logstash
> > > and filebeat: there are double quotes in the Solr index results.
> > > How do I solve this problem? Please help.
> > >
> > > I expect the index results that appear in Solr to look like this:
> > >
> > > {
> > >   "input": "log",
> > >   "hostname": "localhost",
> > >   "id": "22eddbc9-e60f-29cd-a352-b40154ba1736",
> > >   "type": "filebeat",
> > >   "ephemeral_id": "1a31d6e0-8ed9-1307-215f-5dfd361364c9",
> > >   "version": "7.6.1",
> > >   "offset": "2061794",
> > >   "path": "/var/log/hadoop/hdfs/hadoop-hdfs-secondarynamenode-xx.log",
> > >   "host": "localhostxxx",
> > >   "message": "2020-04-11 19:04:28,575 INFO common.Util (Util.java:receiveFile(314)) - Combined time for file downloads and fsync to all disks stores 0.02s. The file download stores 0.02s at 58750.00 KB/s. Synchronous (fsync) write to disk of /hadoop/hdfs/namesecondary/current/edits_tmp_"
> > > }
> > >
> >
> >
>


-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn:
*1196*



Re: How upgrade to Solr 8 impact performance

2020-04-15 Thread Paras Lehana
In January, we upgraded Solr from version 6 to 8, skipping all versions in
between.

The hardware and Solr configurations were kept the same, but we still saw
response times degrade by 30-50%. We had exceptional query times of around
25 ms with Solr 6; now we are hovering around 36 ms.

Since response times under 50 ms are very good even for Auto-Suggest, we
have not tried any changes to address this. Nevertheless, you could try
using the Caffeine cache (see the sketch below). Looking forward to reading
community input as well.
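
A hedged sketch of what that swap looks like in solrconfig.xml (solr.CaffeineCache
is available from Solr 8.3 onward; the size and autowarm values below are
illustrative placeholders, not tuned recommendations):

  <!-- Replace the stock filterCache entry with the Caffeine implementation.
       Sizes here are examples only; tune against your own hit-rate metrics. -->
  <filterCache class="solr.CaffeineCache"
               size="512"
               initialSize="512"
               autowarmCount="0"/>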



On Thu, 16 Apr 2020 at 01:34, ChienHuaWang  wrote:

> Does anyone have experience upgrading an application from Solr 7.x to 8.x?
> How's the query performance?
> Our current measurements show slightly slower response times with Solr 8,
> and we are still looking into the details.
> But I am wondering whether anyone has had a similar experience. Is that
> something we should expect with Solr 8.x?
>
> Please kindly share, thanks.
>
> Regards,
> ChienHua
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn:
*1196*



How upgrade to Solr 8 impact performance

2020-04-15 Thread ChienHuaWang
Does anyone have experience upgrading an application from Solr 7.x to 8.x?
How's the query performance?
Our current measurements show slightly slower response times with Solr 8,
and we are still looking into the details.
But I am wondering whether anyone has had a similar experience. Is that
something we should expect with Solr 8.x?

Please kindly share, thanks.

Regards,
ChienHua



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: ReversedWildcardFilter - should it be applied only at the index time?

2020-04-15 Thread TK Solr

It doesn't tell much:

"debug":{ "rawquerystring":"email:*@aol.com", "querystring":"email:*@aol.com", 
"parsedquery":"(email:*@aol.com)", "parsedquery_toString":"email:*@aol.com", 
"explain":{ "11d6e092-58b5-4c1b-83bc-f3b37e0797fd":{ "match":true, "value":1.0, 
"description":"email:*@aol.com"},


The email field uses ReversedWildcardFilter for both indexing and query.

On 4/15/20 12:04 PM, Erick Erickson wrote:

What do you see if you add debug=query? That should tell you….

Best,
Erick


On Apr 15, 2020, at 2:40 PM, TK Solr  wrote:

Thank you.

Is there any harm if I use it on the query side too? In my case it seems to
work OK (even with withOriginal="false"), and it is even faster.
I see the query parser code takes a look at the index analyzer and applies
ReversedWildcardFilter at query time, but I didn't quite understand what
happens if the query analyzer also uses ReversedWildcardFilter.

On 4/15/20 1:51 AM, Colvin Cowie wrote:

You only need to apply it in the index analyzer:
https://lucene.apache.org/solr/8_4_0/solr-core/org/apache/solr/analysis/ReversedWildcardFilterFactory.html
If it appears in the index analyzer, the query part of it is automatically
applied at query time.

The ReversedWildcardFilter indexes *every* token in reverse, with a special
character at the start ('\u0001' I believe) to avoid false positive matches
when the query term isn't reversed (e.g. if the term being indexed is mar,
then the reversed token would be \u0001ram, so a search for 'ram' wouldn't
accidentally match that). If *withOriginal* is set to true then it will
reverse the normal token as well as the reversed token.


On Thu, 9 Apr 2020 at 02:27, TK Solr  wrote:


I experimented with using ReversedWildcardFilter at index time only, and
at both index and query time.

My results show that using ReversedWildcardFilter at both times runs twice
as fast, but my dataset is not very large (on the order of 10k docs), so
I'm not sure I can draw a conclusion.

On 4/8/20 2:49 PM, TK Solr wrote:

In the usage example shown for ReversedWildcardFilter
<https://lucene.apache.org/solr/guide/8_3/filter-descriptions.html#reversed-wildcard-filter>
in the Solr Ref Guide, and in the only usage found in managed-schema (which
defines text_general_rev), the filter is used only for indexing.







maxPosQuestion="2"

maxFractionAsterisk="0.33" maxPosAsterisk="3" withOriginal="true"/>










Is it incorrect to use the same analyzer for query, like this?








maxPosQuestion="0"

maxFractionAsterisk="0" maxPosAsterisk="100" withOriginal="false"/>



In the description of the filter, I see "Tokens without wildcards are not
reversed." But the wildcard appears only in the query string. How can
ReversedWildcardFilter know whether a wildcard is being used if the filter
is applied only at indexing time?

TK






Re: Unable to RESTORE collections via Collections API

2020-04-15 Thread Eugene Livis
In case this helps somebody in the future, given how unhelpful the Solr
error message is: it turns out the problem occurred because the updateLog
was disabled in solrconfig.xml. I enabled the updateLog in the following
way, and the "restore" operation started working:


<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
</updateLog>



On Wed, Apr 8, 2020 at 5:17 PM Eugene Livis  wrote:

> Hello,
>
> I have been unsuccessfully trying to find a way to restore a collection
> using the Collections API for the last several days. *I would
> greatly appreciate any help as I am now stuck.* I am using Solr 8.2 in
> cloud mode. To simplify things, I have only a single Solr node on my local
> machine (though I have also tried multi-node setups). I am able to
> successfully create collections, index documents, and run searches against
> the index. General gist of what I am trying to do is to be able to backup
> an existing collection, delete that collection, and then restore the
> collection from backup when needed. Our use case requires creating
> thousands of collections, which I believe is highly not recommended in Solr
> Cloud (is that correct?), so this backup-delete-restore mechanism is our
> way of reducing the number of collections in Solr at any given time. Just
> FYI, in our use case it is ok for searches to take a couple of minutes to
> produce results.
>
> So far I have not been able to restore a single collection using the
> collection RESTORE API. I am able to create backups by running the
> following HTTP command in Solr admin console:
>
>
> http://localhost:8983/solr/admin/collections?action=BACKUP&name=test1_backup2&collection=test1_20200407_170438_20200407_170441&location=C:\TEST\DELETE\BACKUPS
>
>
> The backup appears to be successfully created. It contains a snapshot from
> "shard1", which is the only shard in the cluster. It also contains the
> "zk_backup" directory, which contains my Solr config and
> "collection_state.json" file. And it contains a "backup.properties" file.
> Everything looks good to me.
>
> However, when I attempt to restore the collection using the following HTTP
> command:
>
>
> http://localhost:8983/solr/admin/collections?action=RESTORE&name=test1_backup2&location=C:\TEST\DELETE\BACKUPS&collection=test1_backup_NEW
>
>
> I get the same extremely vague error messages in Solr logs:
>
>
> *RequestHandlerBase org.apache.solr.common.SolrException: Error CREATEing
> SolrCore 'test1_backup_NEW_shard1_replica_n1': Unable to create core
> [test1_backup_NEW_shard1_replica_n1] Caused by: null*
>
> *ERROR (qtp1099855928-24) [c:test1_backup_NEW   ] o.a.s.s.HttpSolrCall
> null:org.apache.solr.common.SolrException: ADDREPLICA failed to create
> replica*
>
> Below is the full Solr log of the restore command:
>
> 2020-04-08 21:10:56.753 INFO  (qtp1099855928-24) [   ]
> o.a.s.h.a.CollectionsHandler Invoked Collection Action :restore with params
> name=test1_backup2&action=RESTORE&location=C:\TEST\DELETE\BACKUPS&collection=test1_backup_NEW
> and sendToOCPQueue=true
> 2020-04-08 21:10:56.783 INFO
>  (OverseerThreadFactory-9-thread-1-processing-n:localhost:8983_solr)
> [c:test1_backup_NEW   ] o.a.s.c.a.c.RestoreCmd Using existing config
> AutopsyConfig
> 2020-04-08 21:10:56.783 INFO
>  (OverseerThreadFactory-9-thread-1-processing-n:localhost:8983_solr)
> [c:test1_backup_NEW   ] o.a.s.c.a.c.RestoreCmd Starting restore into
> collection=test1_backup_NEW with backup_name=test1_backup2 at
> location=file:///C:/TEST/DELETE/BACKUPS/
> 2020-04-08 21:10:56.784 INFO
>  (OverseerThreadFactory-9-thread-1-processing-n:localhost:8983_solr)
> [c:test1_backup_NEW   ] o.a.s.c.a.c.CreateCollectionCmd Create collection
> test1_backup_NEW
> 2020-04-08 21:10:56.900 WARN
>  (OverseerThreadFactory-9-thread-1-processing-n:localhost:8983_solr)
> [c:test1_backup_NEW   ] o.a.s.c.a.c.CreateCollectionCmd It is unusual to
> create a collection (test1_backup_NEW) without cores.
> 2020-04-08 21:10:56.908 INFO
>  (OverseerStateUpdate-72057883336638464-localhost:8983_solr-n_03) [
>   ] o.a.s.c.o.SliceMutator Update shard state invoked for collection:
> test1_backup_NEW with message: {
>   "shard1":"construction",
>   "collection":"test1_backup_NEW",
>   "operation":"updateshardstate"}
> 2020-04-08 21:10:56.908 INFO
>  (OverseerStateUpdate-72057883336638464-localhost:8983_solr-n_03) [
>   ] o.a.s.c.o.SliceMutator Update shard state shard1 to construction
> 2020-04-08 21:10:56.925 INFO
>  (OverseerThreadFactory-9-thread-1-processing-n:localhost:8983_solr)
> [c:test1_backup_NEW   ] o.a.s.c.a.c.RestoreCmd Adding replica for
> shard=shard1
> collection=DocCollection(test1_backup_NEW//collections/test1_backup_NEW/state.json/0)={
>   "pullReplicas":0,
>   "replicationFactor":1,
>   "shards":{"shard1":{
>   "range":"8000-7fff",
>   "state":"active",
>   "replicas":{}}},
>   "router":{"name":"compositeId"},
>   "maxShardsPerNode":"1",
>   "autoAddReplicas":"false",
>   "nrtReplicas":1,
>   "tlogReplicas":0}
> 2020-04-08 21:10:56.928 INFO
>  

Re: ReversedWildcardFilter - should it be applied only at the index time?

2020-04-15 Thread Erick Erickson
What do you see if you add debug=query? That should tell you….

Best,
Erick

> On Apr 15, 2020, at 2:40 PM, TK Solr  wrote:
> 
> Thank you.
> 
> Is there any harm if I use it on the query side too? In my case it seems to
> work OK (even with withOriginal="false"), and it is even faster.
> I see the query parser code takes a look at the index analyzer and applies
> ReversedWildcardFilter at query time, but I didn't quite understand what
> happens if the query analyzer also uses ReversedWildcardFilter.
> 
> On 4/15/20 1:51 AM, Colvin Cowie wrote:
>> You only need to apply it in the index analyzer:
>> https://lucene.apache.org/solr/8_4_0/solr-core/org/apache/solr/analysis/ReversedWildcardFilterFactory.html
>> If it appears in the index analyzer, the query part of it is automatically
>> applied at query time.
>> 
>> The ReversedWildcardFilter indexes *every* token in reverse, with a special
>> character at the start ('\u0001' I believe) to avoid false positive matches
>> when the query term isn't reversed (e.g. if the term being indexed is mar,
>> then the reversed token would be \u0001ram, so a search for 'ram' wouldn't
>> accidentally match that). If *withOriginal* is set to true then it will
>> reverse the normal token as well as the reversed token.
>> 
>> 
>> On Thu, 9 Apr 2020 at 02:27, TK Solr  wrote:
>> 
>>> I experimented with using ReversedWildcardFilter at index time only, and
>>> at both index and query time.
>>> 
>>> My results show that using ReversedWildcardFilter at both times runs twice
>>> as fast, but my dataset is not very large (on the order of 10k docs), so
>>> I'm not sure I can draw a conclusion.
>>> 
>>> On 4/8/20 2:49 PM, TK Solr wrote:
In the usage example shown for ReversedWildcardFilter
<https://lucene.apache.org/solr/guide/8_3/filter-descriptions.html#reversed-wildcard-filter>
in the Solr Ref Guide, and in the only usage found in managed-schema (which
defines text_general_rev), the filter is used only for indexing.
 
<fieldType name="text_general_rev" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
            maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Is it incorrect to use the same analyzer for query, like this?
 
<fieldType name="text_general_rev" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="false"
            maxPosAsterisk="100" maxPosQuestion="0" maxFractionAsterisk="0"/>
  </analyzer>
</fieldType>

In the description of the filter, I see "Tokens without wildcards are not
reversed." But the wildcard appears only in the query string. How can
ReversedWildcardFilter know whether a wildcard is being used if the filter
is applied only at indexing time?
 
 TK
 
 



Re: ReversedWildcardFilter - should it be applied only at the index time?

2020-04-15 Thread TK Solr

Thank you.

Is there any harm if I use it on the query side too? In my case it seems to
work OK (even with withOriginal="false"), and it is even faster.
I see the query parser code takes a look at the index analyzer and applies
ReversedWildcardFilter at query time, but I didn't quite understand what
happens if the query analyzer also uses ReversedWildcardFilter.


On 4/15/20 1:51 AM, Colvin Cowie wrote:

You only need to apply it in the index analyzer:
https://lucene.apache.org/solr/8_4_0/solr-core/org/apache/solr/analysis/ReversedWildcardFilterFactory.html
If it appears in the index analyzer, the query part of it is automatically
applied at query time.

The ReversedWildcardFilter indexes *every* token in reverse, with a special
character at the start ('\u0001' I believe) to avoid false positive matches
when the query term isn't reversed (e.g. if the term being indexed is mar,
then the reversed token would be \u0001ram, so a search for 'ram' wouldn't
accidentally match that). If *withOriginal* is set to true then it will
reverse the normal token as well as the reversed token.


On Thu, 9 Apr 2020 at 02:27, TK Solr  wrote:


I experimented with using ReversedWildcardFilter at index time only, and
at both index and query time.

My results show that using ReversedWildcardFilter at both times runs twice
as fast, but my dataset is not very large (on the order of 10k docs), so
I'm not sure I can draw a conclusion.

On 4/8/20 2:49 PM, TK Solr wrote:

In the usage example shown for ReversedWildcardFilter
<https://lucene.apache.org/solr/guide/8_3/filter-descriptions.html#reversed-wildcard-filter>
in the Solr Ref Guide, and in the only usage found in managed-schema (which
defines text_general_rev), the filter is used only for indexing.







maxPosQuestion="2"

maxFractionAsterisk="0.33" maxPosAsterisk="3" withOriginal="true"/>










Is it incorrect to use the same analyzer for query, like this?








maxPosQuestion="0"

maxFractionAsterisk="0" maxPosAsterisk="100" withOriginal="false"/>



In the description of the filter, I see "Tokens without wildcards are not
reversed." But the wildcard appears only in the query string. How can
ReversedWildcardFilter know whether a wildcard is being used if the filter
is applied only at indexing time?

TK




Re: Defaults Merge Policy

2020-04-15 Thread Erick Erickson
The number of deleted documents will bounce around.
The default TieredMergePolicy has a rather complex
algorithm that decides which segments to 
merge, and the percentage of deleted docs in any
given segment is a factor, but not the sole determinant.

Merging is not really based on the raw number of segments,
rather on the number of segments of similar size.

But the short answer is “no, you don’t have to configure
anything explicitly”. The percentage of deleted docs
should max out at around 30% or so, although that’s a
soft number; it’s usually lower.
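
For reference, a sketch of what spelling those defaults out would look like
in solrconfig.xml (the values below mirror the usual Solr 8.x defaults, so
you normally would not set any of this at all; treat the numbers as
illustrative):

  <!-- Making the default TieredMergePolicy explicit; omitting this whole
       block gives you the same behavior. -->
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <int name="maxMergeAtOnce">10</int>
    <double name="segmentsPerTier">10.0</double>
    <double name="deletesPctAllowed">33.0</double>
  </mergePolicyFactory>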

Unless you have some provable performance problem,
I wouldn’t worry about it. And don’t infer anything
until you’ve indexed a _lot_ of docs.

Oh, and I kind of dislike maxDocs as the commit trigger and
tend to use time instead, on the theory that it’s easier to explain;
when commits happen with maxDocs varies depending on the
throughput rate.

Best,
Erick

> On Apr 15, 2020, at 1:28 PM, Kayak28  wrote:
> 
> Hello, Solr Community:
> 
> I would like to ask about the default Merge Policy in Solr 8.3.0.
> My client (SolrJ) makes a commit every 10,000 docs.
> I have not explicitly configured a Merge Policy via solrconfig.xml.
> Each time we index, some documents are updated or deleted.
> I think the default Merge Policy will merge segments automatically
> if there are too many segments.
> But the number of deleted documents keeps increasing.
> 
> Is there a default Merge Policy configuration?
> Or do I have to configure it?
> 
> Sincerely,
> Kaya Ota
> 
> 
> 
> -- 
> 
> Sincerely,
> Kaya
> github: https://github.com/28kayak



Re: Solr index size has increased in solr 7.7.2

2020-04-15 Thread David Hastings
I wouldn't worry about the index size until you get above half a terabyte or
so. Adding docValues and other features means you sacrifice things that
don't matter much, like size. Memory and SSDs are cheap.

On Wed, Apr 15, 2020 at 1:21 PM Rajdeep Sahoo 
wrote:

> Hi all,
> We are migrating from Solr 4.6 to Solr 7.7.2.
> In Solr 4.6 the index size was 2.5 GB, but in Solr 7.7.2 it is showing 6.8
> GB with the same number of documents. Is this expected behavior, and are
> there any suggestions for optimizing the size?
>


Defaults Merge Policy

2020-04-15 Thread Kayak28
Hello, Solr Community:

I would like to ask about the default Merge Policy in Solr 8.3.0.
My client (SolrJ) makes a commit every 10,000 docs.
I have not explicitly configured a Merge Policy via solrconfig.xml.
Each time we index, some documents are updated or deleted.
I think the default Merge Policy will merge segments automatically
if there are too many segments.
But the number of deleted documents keeps increasing.

Is there a default Merge Policy configuration?
Or do I have to configure it?

Sincerely,
Kaya Ota



-- 

Sincerely,
Kaya
github: https://github.com/28kayak


Solr index size has increased in solr 7.7.2

2020-04-15 Thread Rajdeep Sahoo
Hi all,
We are migrating from Solr 4.6 to Solr 7.7.2.
In Solr 4.6 the index size was 2.5 GB, but in Solr 7.7.2 it is showing 6.8
GB with the same number of documents. Is this expected behavior, and are
there any suggestions for optimizing the size?


Re: Optimal size for queries?

2020-04-15 Thread Mark H. Wood
On Wed, Apr 15, 2020 at 10:09:59AM +0100, Colvin Cowie wrote:
> Hi, I can't answer the question as to what the optimal size of rows per
> request is. I would expect it to depend on the number of stored fields
> being marshaled, and their type, and your hardware.

It was a somewhat naive question, but I wasn't sure how to ask a
better one.  Having thought a bit more, I expect that the eventual
solution to my problem will include a number of different changes,
including larger pages, tuning several caches, providing a progress
indicator to the user, and (as you point out below) re-thinking how I
ask Solr for so many documents.

> But using start + rows is a *bad thing* for deep paging. You need to use
> cursorMark, which looks like it was added in 4.7 originally
> https://issues.apache.org/jira/browse/SOLR-5463
> There's a description on the newer reference guide
> https://lucene.apache.org/solr/guide/6_6/pagination-of-results.html#fetching-a-large-number-of-sorted-results-cursors
> and in the 4.10 PDF on page 305
> https://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-4.10.pdf
> 
> http://yonik.com/solr/paging-and-deep-paging/

Thank you for the links.  I think these will be very helpful.

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: ZooKeeper 3.4 end of life

2020-04-15 Thread Jörn Franke
The problem with Solr related to using TLS with ZK is the following:
* ZK 3.5.5 seems to support only TLS-certificate authentication together with
TLS. Solr supports only digest and Kerberos authentication. However, I have to
check the ZK JIRAs to see whether this has changed in higher ZK versions.
* Quorum TLS will work, but again only with TLS authentication (not Kerberos
etc.). Again, one needs to check the ZK JIRAs for which versions are affected
and whether this has been confirmed.

If you don't use TLS, then ZK in any version (potentially also ZK 3.6, to be
tested) should work. If you need TLS, check whether your authentication
methods are supported.

> On 15.04.2020 at 10:19, Bram Van Dam wrote:
> 
> On 09/04/2020 16:03, Bram Van Dam wrote:
>> Thanks, Erick. I'll give it a go this weekend and see how it behaves.
>> I'll report back so there's a record of my attempts in case anyone else
>> ends up asking the same question.
> 
> Here's a quick update after non-exhaustive testing: Running SolrCloud
> 7.7.2 against ZK 3.5.7 seems to work. This is using the same Ensemble
> configuration as in 3.4, but with 4-letter-words now explicitly enabled.
> 
> ZK 3.5 allegedly makes it easier to use TLS throughout the ensemble, but
> I haven't tried that in conjunction with Solr yet. I'll give it a go if
> I can find the time.
> 
> - Bram


Re: On the delay in electing a leader when the leader is dead(Solr 7.5)

2020-04-15 Thread Erick Erickson
There’s no way leader election, even with tlog replay, should take a day.
10,000 docs/minute doesn’t sound like enough to clog up 
replay either, so something’s definitely not what I’d expect.

What is your hard commit interval? That controls how big the tlog
is and thus how long it’d take to replay. Here’s more than you want to
know about that:

https://lucidworks.com/post/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

I’d set it to, say, 15 seconds (openSearcher=false). This is entirely
independent of the soft commit interval, which governs the ability
to search the docs…
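
As a concrete sketch, that suggestion in solrconfig.xml would look something
like this (the soft commit value below is purely illustrative; pick whatever
search-visibility latency you need):

  <!-- Hard commit every 15s without opening a new searcher: keeps the tlog
       small so replay at startup/leader election is fast. -->
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Soft commit controls when new docs become searchable (example value). -->
  <autoSoftCommit>
    <maxTime>60000</maxTime>
  </autoSoftCommit>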

Best,
Erick

> On Apr 15, 2020, at 1:31 AM, Taisuke Miyazaki  
> wrote:
> 
> Hi,
> 
> Made of: tlog replicas + pull replicas
> Writing: leader and tlog replicas
> Reading: pull replicas only
> 
> Solr version: 7.5
> Number of shards: 1
> Write throughput: 10,000 docs/minute
> Number of documents: 4,500,000
> Size per document: about 4KB
> 
> During verification, the replay of the transaction log took a lot of time:
> it took 1 day and 6 hours to elect a leader.
> The tlog files are a few GB at most, so I don't think I/O is the bottleneck,
> but it's taking a lot longer than I imagined!
> 
> I'd like to make the leader election quicker because I can't write until
> the leader is elected.
> Is there a faster way to do it?
> 
> We have a mechanism to re-send records that failed to write.
> So if I can't make leader election quick, I'm thinking of dropping the
> tlog replica and starting it automatically when the leader goes down (if
> it's faster to recover that way).
> 
> Thank you to everyone in the community for always being so supportive.
> 
> Translated with www.DeepL.com/Translator (free version)



Re: facets & docValues

2020-04-15 Thread Erick Erickson
In a word, “yes”. I also suspect your corpus isn’t very big.

I think the key is the facet queries. Now, I’m talking from
theory rather than diving into the code, but querying on
a docValues=true, indexed=false field is really doing a
search. And searching on a field like that is effectively
analogous to a table scan. Even if an internal structure
were somehow constructed to deal with it, it would
probably be on the heap, where you don’t want it.

So the test would be to take the queries out and measure
performance, but I think that’s the root issue here.
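
If the facet queries do turn out to be the issue, a minimal sketch of enabling
indexing on such a field in the schema (the field name "price" and type "pint"
are assumptions for illustration; the point is indexed="true" alongside
docValues="true" for fields used in facet.query):

  <!-- docValues serves field/range faceting; facet.query runs real queries,
       which want an indexed field. -->
  <field name="price" type="pint" indexed="true" stored="false" docValues="true"/>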

Best,
Erick

> On Apr 14, 2020, at 11:51 PM, Revas  wrote:
> 
> We have faceting fields that have been defined as indexed=false,
> stored=false, and docValues=true.
> 
> However, we use a lot of subfacets (via JSON facets) and facet ranges
> (via facet.queries). We see that after every soft commit our performance
> worsens, while it is ideal between commits.
> 
> How is it that docValues fields are affected by soft commits, and do we
> need to enable indexing to improve performance if we use subfacets and
> facet queries?
> 
> Thanks



Re: ZooKeeper 3.4 end of life

2020-04-15 Thread Erick Erickson
Good to hear and thanks for reporting back.

The other thing ZK 3.5 allegedly makes easier is dynamically
reconfiguring the ensemble. Again, haven’t personally
tried it but I’d be cautious about that since Solr won’t be
using the 3.5 jars and just dropping the 3.5 jars in for
Solr to use would be totally uncharted territory.

> On Apr 15, 2020, at 4:19 AM, Bram Van Dam  wrote:
> 
> On 09/04/2020 16:03, Bram Van Dam wrote:
>> Thanks, Erick. I'll give it a go this weekend and see how it behaves.
>> I'll report back so there's a record of my attempts in case anyone else
>> ends up asking the same question.
> 
> Here's a quick update after non-exhaustive testing: Running SolrCloud
> 7.7.2 against ZK 3.5.7 seems to work. This is using the same Ensemble
> configuration as in 3.4, but with 4-letter-words now explicitly enabled.
> 
> ZK 3.5 allegedly makes it easier to use TLS throughout the ensemble, but
> I haven't tried that in conjunction with Solr yet. I'll give it a go if
> I can find the time.
> 
> - Bram



Re: Optimal size for queries?

2020-04-15 Thread Colvin Cowie
Hi, I can't answer the question as to what the optimal size of rows per
request is. I would expect it to depend on the number of stored fields
being marshaled, and their type, and your hardware.

But using start + rows is a *bad thing* for deep paging. You need to use
cursorMark, which looks like it was added in 4.7 originally
https://issues.apache.org/jira/browse/SOLR-5463
There's a description on the newer reference guide
https://lucene.apache.org/solr/guide/6_6/pagination-of-results.html#fetching-a-large-number-of-sorted-results-cursors
and in the 4.10 PDF on page 305
https://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-4.10.pdf

http://yonik.com/solr/paging-and-deep-paging/
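
A minimal sketch of the cursor flow (the core name "statistics" is taken from
the log in your message; the sort is an assumption, since cursorMark requires
a sort that includes the uniqueKey field, here assumed to be "id"; also note
that start must be 0 or omitted when using a cursor):

  /solr/statistics/select?q=*:*&rows=1000&sort=time+asc,id+asc&cursorMark=*

Each response carries a nextCursorMark value; pass it back unchanged as
cursorMark on the next request, and stop when the nextCursorMark returned
equals the cursorMark you sent:

  /solr/statistics/select?q=*:*&rows=1000&sort=time+asc,id+asc&cursorMark=<value of nextCursorMark>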


On Fri, 10 Apr 2020 at 19:05, Mark H. Wood  wrote:

> I need to pull a *lot* of records out of a core, to be statistically
> analyzed and the stat.s presented to the user, who is sitting at a
> browser waiting.  So far I haven't seen a way to calculate the stat.s
> I need in Solr itself.  It's difficult to know the size of the total
> result, so I'm running the query repeatedly and windowing the results
> with 'start' and 'rows'.  I just guessed that a window of 1000
> documents would be reasonable.  We currently have about 48GB in the
> core.
>
> The product uses Solr 4.10.  Yes, I know that's very old.
>
> What I got is that every three seconds or so I get another 1000
> documents, totalling around 500KB per response.  For a user request
> for a large range, this is taking way longer than the user's browser
> is willing to wait.  The single CPU on my test box is at 99%
> continuously, and Solr's memory use is around 90% of 8GB.  The test
> hardware is a VMWare guest on an 'Intel(R) Xeon(R) Gold 6150 CPU @
> 2.70GHz'.
>
> A sample query:
>
> 0:0:0:0:0:0:0:1 - - [10/Apr/2020:13:34:18 -0400] "GET
> /solr/statistics/select?q=*%3A*&rows=1000&fq=%2Btype%3A0+%2BbundleName%3AORIGINAL+%2Bstatistics_type%3Aview&fq=%2BisBot%3Afalse&fq=%2Btime%3A%5B2018-01-01T05%3A00%3A00Z+TO+2020-01-01T04%3A59%3A59Z%5D&sort=time+asc&start=867000&wt=javabin&version=2
> HTTP/1.1" 200 497475 "-"
> "Solr[org.apache.solr.client.solrj.impl.HttpSolrServer] 1.0"
>
> As you can see, my test was getting close to 1000 windows.  It's still
> going.  I don't know how far along that is.
>
> So I'm wondering:
>
> o  how can I do better than guessing that 1000 is a good window size?
>How big a response is too big?
>
> o  what else should I be thinking about?
>
> o  given that my test on a full-sized copy of the live data has been
>running for an hour and is still going, is it totally impractical
>to expect that I can improve the process enough to give a response
>to an ad-hoc query while-you-wait?
>
> --
> Mark H. Wood
> Lead Technology Analyst
>
> University Library
> Indiana University - Purdue University Indianapolis
> 755 W. Michigan Street
> Indianapolis, IN 46202
> 317-274-0749
> www.ulib.iupui.edu
>


Re: ReversedWildcardFilter - should it be applied only at the index time?

2020-04-15 Thread Colvin Cowie
You only need to apply it in the index analyzer:
https://lucene.apache.org/solr/8_4_0/solr-core/org/apache/solr/analysis/ReversedWildcardFilterFactory.html
If it appears in the index analyzer, the query part of it is automatically
applied at query time.

The ReversedWildcardFilter indexes *every* token in reverse, with a special
character at the start ('\u0001' I believe) to avoid false positive matches
when the query term isn't reversed (e.g. if the term being indexed is mar,
then the reversed token would be \u0001ram, so a search for 'ram' wouldn't
accidentally match that). If *withOriginal* is set to true then it will
reverse the normal token as well as the reversed token.


On Thu, 9 Apr 2020 at 02:27, TK Solr  wrote:

> I experimented with using ReversedWildcardFilter at index time only, and
> at both index and query time.
>
> My results show that using ReversedWildcardFilter at both times runs twice
> as fast, but my dataset is not very large (on the order of 10k docs), so
> I'm not sure I can draw a conclusion.
>
> On 4/8/20 2:49 PM, TK Solr wrote:
> > In the usage example shown for ReversedWildcardFilter
> > <https://lucene.apache.org/solr/guide/8_3/filter-descriptions.html#reversed-wildcard-filter>
> > in the Solr Ref Guide, and in the only usage found in managed-schema
> > (which defines text_general_rev), the filter is used only for indexing.
> >
> > <fieldType name="text_general_rev" class="solr.TextField" positionIncrementGap="100">
> >   <analyzer type="index">
> >     <tokenizer class="solr.StandardTokenizerFactory"/>
> >     <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
> >             maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <tokenizer class="solr.StandardTokenizerFactory"/>
> >     <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> >     <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >   </analyzer>
> > </fieldType>
> >
> > Is it incorrect to use the same analyzer for query, like this?
> >
> > <fieldType name="text_general_rev" class="solr.TextField" positionIncrementGap="100">
> >   <analyzer>
> >     <tokenizer class="solr.StandardTokenizerFactory"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.ReversedWildcardFilterFactory" withOriginal="false"
> >             maxPosAsterisk="100" maxPosQuestion="0" maxFractionAsterisk="0"/>
> >   </analyzer>
> > </fieldType>
> >
> > In the description of the filter, I see "Tokens without wildcards are
> > not reversed." But the wildcard appears only in the query string. How
> > can ReversedWildcardFilter know whether a wildcard is being used if the
> > filter is applied only at indexing time?
> >
> > TK
> >
> >
>


Re: ZooKeeper 3.4 end of life

2020-04-15 Thread Bram Van Dam
On 09/04/2020 16:03, Bram Van Dam wrote:
> Thanks, Erick. I'll give it a go this weekend and see how it behaves.
> I'll report back so there's a record of my attempts in case anyone else
> ends up asking the same question.

Here's a quick update after non-exhaustive testing: Running SolrCloud
7.7.2 against ZK 3.5.7 seems to work. This is using the same Ensemble
configuration as in 3.4, but with 4-letter-words now explicitly enabled.
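
For the record, a sketch of the 4-letter-word whitelist in zoo.cfg for ZK
3.5.x ("*" enables all of them; a narrower list such as "mntr,conf,ruok" also
works and is safer in production):

  # Enable four-letter-word commands explicitly (required since ZK 3.5)
  4lw.commands.whitelist=*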

ZK 3.5 allegedly makes it easier to use TLS throughout the ensemble, but
I haven't tried that in conjunction with Solr yet. I'll give it a go if
I can find the time.

 - Bram