SOLR not deleting records

2017-11-13 Thread vbindal
We have two SOLR colos.

We issued a command to delete the following IDs: [1000236662963,
1000224906023, 1000240171970, 1000241597424, 1000241604072,
1000241604073, 1000240171754, 1000241604056, 1000241604062,
1000237569503]

COLO1 deleted everything, but COLO2 skipped some of the records. For example,
1000224906023 was not deleted. This happens consistently.

We are running with hard commits; soft commit is off.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Long blocking during indexing + deleteByQuery

2017-11-13 Thread Chris Troullis
I've noticed something weird since implementing the change Shawn suggested,
I wonder if someone can shed some light on it:

Since changing from deleteByQuery on _root_:... to querying for IDs by _root_:...
and then calling deleteById with the IDs from the root query, we have started to
notice some facet counts for child document facets not matching the actual query
results. For example, the facet shows a count of 10, but clicking on the facet
(which applies an fq with a block join to return parent docs) returns fewer
results than the facet count, when they should match (the facet count is doing a
unique(_root_) so it is only counting parents). I suspect that this may be
somehow caused by orphaned child documents since the delete process changed.
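
For reference, a minimal SolrJ sketch of the query-then-deleteById workaround
discussed further down this thread (the base URL, collection name, and _root_
value are hypothetical):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.common.SolrDocument;
  import java.util.ArrayList;
  import java.util.List;

  public class DeleteByRootIds {
    public static void main(String[] args) throws Exception {
      try (HttpSolrClient client =
               new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build()) {
        // Fetch only the uniqueKey values of the documents we would have deleted by query.
        SolrQuery query = new SolrQuery("_root_:parent-123");
        query.setFields("id");
        query.setRows(1000); // page or use cursorMark for large result sets

        QueryResponse rsp = client.query(query);
        List<String> ids = new ArrayList<>();
        for (SolrDocument doc : rsp.getResults()) {
          ids.add((String) doc.getFieldValue("id"));
        }

        if (!ids.isEmpty()) {
          client.deleteById(ids); // deleteById instead of deleteByQuery
          client.commit();
        }
      }
    }
  }

Note that if the id query returns only parent documents, their nested children are
not removed by deleteById, which may account for the orphaned child documents
described above.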

Does anyone know if changing from a DBQ on _root_ to the aforementioned
query-for-ids-then-deleteById approach would cause any issues with deleting
child documents? Trying it manually, it seems to work fine, but something is
going on in some of our test environments.

Thanks,

Chris

On Thu, Nov 9, 2017 at 2:52 PM, Chris Troullis  wrote:

> Thanks Mike, I will experiment with that and see if it does anything for
> this particular issue.
>
> I implemented Shawn's workaround and the problem has gone away, so that is
> good at least for the time being.
>
> Do we think that this is something that should be tracked in JIRA for 6.X?
> Or should I confirm if it is still happening in 7.X before logging anything?
>
> On Wed, Nov 8, 2017 at 6:23 AM, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> I'm not sure this is what's affecting you, but you might try upgrading to
>> Lucene/Solr 7.1; in 7.0 there were big improvements in using multiple
>> threads to resolve deletions:
>> http://blog.mikemccandless.com/2017/07/lucene-gets-concurren
>> t-deletes-and.html
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Tue, Nov 7, 2017 at 2:26 PM, Chris Troullis 
>> wrote:
>>
>> > @Erick, I see, thanks for the clarification.
>> >
>> > @Shawn, Good idea for the workaround! I will try that and see if it
>> > resolves the issue.
>> >
>> > Thanks,
>> >
>> > Chris
>> >
>> > On Tue, Nov 7, 2017 at 1:09 PM, Erick Erickson > >
>> > wrote:
>> >
>> > > bq: you think it is caused by the DBQ deleting a document while a
>> > > document with that same ID
>> > >
>> > > No. I'm saying that DBQ has no idea _if_ that would be the case so
>> > > can't carry out the operations in parallel because it _might_ be the
>> > > case.
>> > >
>> > > Shawn:
>> > >
>> > > IIUC, here's the problem. For deleteById, I can guarantee the
>> > > sequencing through the same optimistic locking that regular updates
>> > > use (i.e. the _version_ field). But I'm kind of guessing here.
>> > >
>> > > Best,
>> > > Erick
>> > >
>> > > On Tue, Nov 7, 2017 at 8:51 AM, Shawn Heisey 
>> > wrote:
>> > > > On 11/5/2017 12:20 PM, Chris Troullis wrote:
>> > > >> The issue I am seeing is when some
>> > > >> threads are adding/updating documents while other threads are
>> issuing
>> > > >> deletes (using deleteByQuery), solr seems to get into a state of
>> > extreme
>> > > >> blocking on the replica
>> > > >
>> > > > The deleteByQuery operation cannot coexist very well with other
>> > indexing
>> > > > operations.  Let me tell you about something I discovered.  I think
>> > your
>> > > > problem is very similar.
>> > > >
>> > > > Solr 4.0 and later is supposed to be able to handle indexing
>> operations
>> > > > at the same time that the index is being optimized (in Lucene,
>> > > > forceMerge).  I have some indexes that take about two hours to
>> > optimize,
>> > > > so having indexing stop while that happens is a less than ideal
>> > > > situation.  Ongoing indexing is similar in many ways to a merge,
>> enough
>> > > > that it is handled by the same Merge Scheduler that handles an
>> > optimize.
>> > > >
>> > > > I could indeed add documents to the index without issues at the same
>> > > > time as an optimize, but when I would try my full indexing cycle
>> while
>> > > > an optimize was underway, I found that all operations stopped until
>> the
>> > > > optimize finished.
>> > > >
>> > > > Ultimately what was determined (I think it was Yonik that figured it
>> > > > out) was that *most* indexing operations can happen during the
>> > optimize,
>> > > > *except* for deleteByQuery.  The deleteById operation works just
>> fine.
>> > > >
>> > > > I do not understand the low-level reasons for this, but apparently
>> it's
>> > > > not something that can be easily fixed.
>> > > >
>> > > > A workaround is to send the query you plan to use with
>> deleteByQuery as
>> > > > a standard query with a limited fl parameter, to retrieve matching
>> > > > uniqueKey values from the index, then do a deleteById with that
>> list of
>> > > > ID values instead.
>> > > >
>> > > > Thanks,
>> > > > Shawn
>> > > >
>> > >
>> >
>>
>
>


Re: solr cloud updatehandler stats mismatch

2017-11-13 Thread Wei
Thanks Amrit. Can you explain a bit more what kind of requests won't be
logged? Is that something configurable for Solr?

Best,
Wei
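
One way to compare the two counts side by side, assuming the default log location
and a hypothetical collection name (the exact log pattern may differ per setup):

  # Requests as counted by the UpdateRequestHandler MBean
  curl -s 'http://localhost:8983/solr/collection1/admin/mbeans?cat=UPDATE&stats=true&wt=json' \
    | grep -o '"requests":[0-9]*'

  # Update requests as counted from the Solr request log
  grep -c 'path=/update' /var/solr/logs/solr.log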

On Thu, Nov 9, 2017 at 3:12 AM, Amrit Sarkar  wrote:

> Wei,
>
> Is the collection the requests are going to one with multiple shards and
> replicas? Please note that an update request is received by a node, redirected to
> the particular shard the doc belongs to, and then distributed to the replicas of
> the collection. The update request is replayed on each replica, on each core.
>
> That could be a probable reason for the mismatch between the MBeans stats and
> manual counting in the logs, as not everything gets logged. Need to check that once.
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> Medium: https://medium.com/@sarkaramrit2
>
> On Thu, Nov 9, 2017 at 4:34 PM, Furkan KAMACI 
> wrote:
>
> > Hi Wei,
> >
> > Do you compare it with files which are under /var/solr/logs by default?
> >
> > Kind Regards,
> > Furkan KAMACI
> >
> > On Sun, Nov 5, 2017 at 6:59 PM, Wei  wrote:
> >
> > > Hi,
> > >
> > > I use the following api to track the number of update requests:
> > >
> > > /solr/collection1/admin/mbeans?cat=UPDATE&stats=true&wt=json
> > >
> > >
> > > Result:
> > >
> > >
> > >- class: "org.apache.solr.handler.UpdateRequestHandler",
> > >- version: "6.4.2.1",
> > >- description: "Add documents using XML (with XSLT), CSV, JSON, or
> > >javabin",
> > >- src: null,
> > >- stats:
> > >{
> > >   - handlerStart: 1509824945436,
> > >   - requests: 106062,
> > >   - ...
> > >
> > >
> > > I am quite confused that the number of requests reported above is quite
> > > different from the count from solr access logs. A few times the handler
> > > stats is much higher: handler reports ~100k requests but in the access
> > log
> > > there are only 5k update requests. What could be the possible cause?
> > >
> > > Thanks,
> > > Wei
> > >
> >
>


Re: Error when indexing EML files in Solr 7.1.0

2017-11-13 Thread Zheng Lin Edwin Yeo
Hi Erick,

I have added apache-mime4j-core-0.7.2.jar to the Java Build Path in Eclipse,
but it is still not working.

Regards,
Edwin

On 13 November 2017 at 23:33, Erick Erickson 
wrote:

> Where are you getting your mime4j file? MimeConfig is in
> /extraction/lib/apache-mime4j-core-0.7.2.jar and, at a guess, you need to make sure
> you're including that.
>
> Best,
> Erick
>
> On Mon, Nov 13, 2017 at 6:15 AM, Zheng Lin Edwin Yeo
>  wrote:
> > Hi,
> >
> > I am using Solr 7.1.0, and I am trying to index EML files using the
> > SimplePostTools.
> >
> > However, I get the following error
> >
> > java.lang.NoClassDefFoundError:
> > org/apache/james/mime4j/stream/MimeConfig$Builder
> >
> >
> > Are there any new classes or dependencies which I need to add compared to
> > Solr 6?
> >
> > The indexing is ok for other file types like .doc and .ppt. I only face the
> > error when indexing .eml files.
> >
> > Regards,
> > Edwin
>


Re: SolrCloud collection design considerations / best practice

2017-11-13 Thread Erick Erickson
Have you considered collection aliasing? You can create an alias that
points to multiple collections, so you could keep specific collections
and have aliases that encompass your regions.

The one caveat here is that sorting the final result set by score will
require that the collections be roughly similar in terms of TF/IDF.
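
As a sketch, an alias is created with the Collections API (the alias and
collection names here are hypothetical):

  # Create a region-specific alias over the shared and region-specific collections
  curl 'http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=asia&collections=collection_asia_sources,collection_b'

  # Queries can then target the alias like a normal collection
  curl 'http://localhost:8983/solr/asia/select?q=*:*'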

Best,
Erick

On Mon, Nov 13, 2017 at 11:33 AM, Shamik Bandopadhyay  wrote:
> Hi,
>
> I'm looking for some input on design considerations for defining
> collections in a SolrCloud cluster. Right now, our cluster consists of two
> collections in a 2 shard / 2 replica mode. Each collection has a dedicated
> set of sources that don't overlap, which made it an easy decision.
> Recently, we have a requirement to index a bunch of new sources that are
> region based. The search results for those regions need to come from their
> region-specific sources as well as sources from one of our existing
> collections. Here's an example of our existing collections and their
> corresponding source(s).
>
> Existing Collection:
> --
> Collection A --> Source_A, Source_B
> Collection B --> Source_C, Source_D, Source_E
>
> Proposed Collection:
> 
> Collection_Asia --> Source_Asia, Source_C, Source_D, Source_E
> Collection_Europe --> Source_Europe, Source_C, Source_D, Source_E
> Collection_Australia --> Source_Australia, Source_C, Source_D, Source_E
>
> The proposed collection part shows that each geo has its dedicated source
> as well as source(s) from existing collection B.
>
> Just wondering if creating a dedicated collection for each geo is the right
> approach here. The main motivation is to support a geo-specific relevancy
> model which can easily be customized without stepping into each other. On
> the downside, I'm not sure if it's a good idea to replicate data from the
> same source across various collections. Moreover, the data within the
> source are not relational, so joining across collections might not be
> an easy proposition.
> The other consideration is the hardware design. Right now, both shards and
> their replicas run on their dedicated instance. With two collections, we
> sometimes run into OOM scenarios, so I'm a little bit worried about adding
> more collections. Does the best practice (I know it's subjective) in
> scenarios like this call for a dedicated Solr cluster per collection? From
> index size perspective, Source_C, Source_D, and Source_E combine to close to 10
> million documents with a 60 GB index size. Each geo-based source is small and
> won't exceed 500k documents.
>
> Any pointers will be appreciated.
>
> Thanks,
> Shamik


Re: Limiting by range of sum across documents

2017-11-13 Thread Emir Arnautović
Hi Chris,

I assumed that you apply some sort of fq=price:[100 TO 200] to focus on the wanted
products.

Can you share the full JSON faceting request? numFound:0 suggests that something is
completely wrong.

Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 13 Nov 2017, at 21:56, ch...@yeeplusplus.com wrote:
> 
> 
> 
> 
> Hi Emir,
> I can't apply filters to the original query because I don't know in advance
> which filters will meet the criterion I'm looking for. Unless I'm missing
> something obvious.
>
> I tried the JSON facet you suggested but received
>
>   "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]},
>   "facet_counts":{
>     "facet_queries":{},
>     "facet_fields":{},
>     "facet_dates":{},
>     "facet_ranges":{},
>     "facet_intervals":{},
>     "facet_heatmaps":{}},
>   "facets":{
>     "count":0}}
> 
> 
>> Hi Chris,
> 
>> You mention it returns all manufacturers? Even after you apply filters
>> (don't see a filter in your example)? You can control how many facets
>> are returned with facet.limit and you can use facet.pivot.mincount to
>> determine how many facets are returned. If you calculate the sum over all
>> manufacturers, it can take a long time.
>> 
> 
>> Maybe you can try json faceting. Something like (url style):
> 
>> 
> 
>> …&json.facet={sumByManu:{terms:{field:manufacturer,facet:{sum:"sum(price)"}}}}
> 
>> 
> 
>> HTH,
> 
>> Emir
> 
>> --
> 
>> Monitoring - Log Management - Alerting - Anomaly Detection
> 
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> 
>> 
> 
>> 
> 
>> 
> 
>>> On 12 Nov 2017, at 19:09, ch...@yeeplusplus.com wrote:
> 
>>> 
> 
>>> 
> 
>>> 
> 
>>> 
> 
>>> I have documents in solr that look like this:
> 
>>> {
> 
>>> "id": "acme-1",
> 
>>> "manufacturer": "acme",
> 
>>> "product_name": "Foo",
> 
>>> "price": 3.4
> 
>>> }
> 
>>> 
> 
>>> There are about
> 
>>> 150,000 manufacturers, each of which have between 20,000 and 1,000,000 
>>> products.
> 
>>> I'd like to return the sum of all prices that are in the range [100, 200], 
>>> faceted by manufacturer. In other words, for each manufacturer, sum the 
>>> prices of all products for that manufacturer,
> 
>>> and return the sum and the manufacturer name. For example:
> 
>>> [
> 
>>> {
> 
>>> "manufacturer": "acme",
> 
>>> "sum": 150.5
> 
>>> },
> 
>>> {
> 
>>> "manufacturer": "Johnson,
> 
>>> Inc.",
> 
>>> "sum": 167.0
> 
>>> },
> 
>>> ...
> 
>>> ]
> 
>>> 
> 
>>> I tried this:
> 
>>> q=*:*&rows=0&stats=true&stats.field={!tag=piv1 
>>> sum=true}price&facet=true&facet.pivot={!stats=piv1}manufacturer
> 
>>> which "works" on a test
> 
>>> subset of 1,000 manufacturers. However, there are two problems:
> 
>>> 1) This query returns all the manufacturers, so I have to iterate over the 
>>> entire response object to extract the ones I want.
> 
>>> 2) The query on the whole data set takes more than 600 seconds to return, 
>>> which doesn't fit
> 
>>> our target response time
> 
>>> 
> 
>>> How can I perform this query?
> 
>>> We're using solr version 5.5.5.
> 
>>> 
> 
>>> 
> 
>>> 
> 
>>> Thanks,
> 
>>> Chris
> 
>>> 
> 
>> 
> 
>> 



Re: TimeoutException, IOException, Read timed out

2017-11-13 Thread Fengtan
I am happy to report that <1> fixed these:
  PERFORMANCE WARNING: Overlapping onDeckSearchers=2

We still occasionally see timeouts so we may have to explore <2>.
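
For reference, <1> refers to ignoring explicit client commits with
IgnoreCommitOptimizeUpdateProcessorFactory; a minimal solrconfig.xml sketch
(the chain name and status code are choices, not requirements):

  <updateRequestProcessorChain name="ignore-commit-from-client" default="true">
    <processor class="solr.IgnoreCommitOptimizeUpdateProcessorFactory">
      <!-- respond with 200 instead of an error when a client commit is ignored -->
      <int name="statusCode">200</int>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.DistributedUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>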





On Thu, Oct 26, 2017 at 12:12 PM, Fengtan  wrote:

> Thanks Erick and Emir -- we are going to start with <1> and possibly <2>.
>
> On Thu, Oct 26, 2017 at 7:06 AM, Emir Arnautović <
> emir.arnauto...@sematext.com> wrote:
>
>> Hi Fengtan,
>> I would just add that when merging collections, you might want to use
>> document routing (https://lucene.apache.org/sol
>> r/guide/6_6/shards-and-indexing-data-in-solrcloud.html#Shard
>> sandIndexingDatainSolrCloud-DocumentRouting <
>> https://lucene.apache.org/solr/guide/6_6/shards-and-indexin
>> g-data-in-solrcloud.html#ShardsandIndexingDatainSolrCloud-DocumentRouting>)
>> - since you are keeping separate collections, I guess you have a
>> “collection ID” to use as routing key. This will enable you to have one
>> collection but query only shard(s) with data from one “collection”.
>>
>> HTH,
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>
>>
>>
>> > On 25 Oct 2017, at 19:25, Erick Erickson 
>> wrote:
>> >
>> > <1> It's not the explicit commits are expensive, it's that they happen
>> > too fast. An explicit commit and an internal autocommit have exactly
>> > the same cost. Your "overlapping ondeck searchers"  is definitely an
>> > indication that your commits are happening from somwhere too quickly
>> > and are piling up.
>> >
>> > <2> Likely a good thing, each collection increases overhead. And
>> > 1,000,000 documents is quite small in Solr's terms unless the
>> > individual documents are enormous. I'd do this for a number of
>> > reasons.
>> >
>> > <3> Certainly an option, but I'd put that last. Fix the commit problem
>> first ;)
>> >
>> > <4> If you do this, make the autowarm count quite small. That said,
>> > this will be very little use if you have frequent commits. Let's say
>> > you commit every second. The autowarming will warm caches, which will
>> > then be thrown out a second later. And will increase the time it takes
>> > to open a new searcher.
>> >
>> > <5> Yeah, this would probably just be a band-aid.
>> >
>> > If I were prioritizing these, I'd do
>> > <1> first. If you control the client, just don't call commit. If you
>> > do not control the client, then what you've outlined is fine. Tip: set
>> > your soft commit settings to be as long as you can stand. If you must
>> > have very short intervals, consider disabling your caches completely.
>> > Here's a long article on commits
>> > https://lucidworks.com/2013/08/23/understanding-transaction-
>> logs-softcommit-and-commit-in-sorlcloud/
>> >
>> > <2> Actually, this and <1> are pretty close in priority.
>> >
>> > Then re-evaluate. Fixing the commit issue may buy you quite a bit of
>> > time. Having 1,000 collections is pushing the boundaries presently.
>> > Each collection will establish watchers on the bits it cares about in
>> > ZooKeeper, and reducing the watchers by a factor approaching 1,000 is
>> > A Good Thing.
>> >
>> > Frankly, between these two things I'd pretty much expect your problems
>> > to disappear. wouldn't be the first time I've been totally wrong, but
>> > it's where I'd start ;)
>> >
>> > Best,
>> > Erick
>> >
>> > On Wed, Oct 25, 2017 at 8:54 AM, Fengtan  wrote:
>> >> Hi,
>> >>
>> >> We run a SolrCloud 6.4.2 cluster with ZooKeeper 3.4.6 on 3 VM's.
>> >> Each VM runs RHEL 7 with 16 GB RAM and 8 CPU and OpenJDK 1.8.0_131 ;
>> each
>> >> VM has one Solr and one ZK instance.
>> >> The cluster hosts 1,000 collections ; each collection has 1 shard and
>> >> between 500 and 50,000 documents.
>> >> Documents are indexed incrementally every day ; the Solr client mostly
>> does
>> >> searching.
>> >> Solr runs with -Xms7g -Xmx7g.
>> >>
>> >> Everything has been working fine for about one month but a few days
>> ago we
>> >> started to see Solr timeouts: https://pastebin.com/raw/E2prSrQm
>> >>
>> >> Also we have always seen these:
>> >>  PERFORMANCE WARNING: Overlapping onDeckSearchers=2
>> >>
>> >>
>> >> We are not sure what is causing the timeouts, although we have
>> identified a
>> >> few things that could be improved:
>> >>
>> >> 1) Ignore explicit commits using IgnoreCommitOptimizeUpdateProc
>> essorFactory
>> >> -- we are aware that explicit commits are expensive
>> >>
>> >> 2) Drop the 1,000 collections and use a single one instead (all our
>> >> collections use the same schema/solrconfig.xml) since stability
>> problems
>> >> are expected when the number of collections reaches the low hundreds
>> >> . The
>> >> downside is that the new collection would contain 1,000,000 documents
>> which
>> >> may bring new challenges.
>> >>
>> >> 3) Tune the GC and possibly switch from CMS to G1 as it seems to bring
>> a
>> >> better performance according to this

Re: Limiting by range of sum across documents

2017-11-13 Thread chris



Hi Emir,
I can't apply filters to the original query because I don't know in advance
which filters will meet the criterion I'm looking for. Unless I'm missing
something obvious.

I tried the JSON facet you suggested but received

  "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]},
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{},
    "facet_dates":{},
    "facet_ranges":{},
    "facet_intervals":{},
    "facet_heatmaps":{}},
  "facets":{
    "count":0}}


> Hi Chris,

> You mention it returns all manufacturers? Even after you apply filters 
> (don’t see filter in your example)? You can control how many facets are 
> returned with facet.limit and you can use facet.pivot.mincount to determine
> how many facets are returned. If you calculate the sum over all
> manufacturers, it can take a long time.
>

> Maybe you can try json faceting. Something like (url style):

>

> …&json.facet={sumByManu:{terms:{field:manufacturer,facet:{sum:"sum(price)"}}}}

>

> HTH,

> Emir

> --

> Monitoring - Log Management - Alerting - Anomaly Detection

> Solr & Elasticsearch Consulting Support Training - http://sematext.com/

>

>

>

>> On 12 Nov 2017, at 19:09, ch...@yeeplusplus.com wrote:

>>

>>

>>

>>

>> I have documents in solr that look like this:

>> {

>> "id": "acme-1",

>> "manufacturer": "acme",

>> "product_name": "Foo",

>> "price": 3.4

>> }

>>

>> There are about

>> 150,000 manufacturers, each of which have between 20,000 and 1,000,000 
>> products.

>> I'd like to return the sum of all prices that are in the range [100, 200], 
>> faceted by manufacturer. In other words, for each manufacturer, sum the 
>> prices of all products for that manufacturer,

>> and return the sum and the manufacturer name. For example:

>> [

>> {

>> "manufacturer": "acme",

>> "sum": 150.5

>> },

>> {

>> "manufacturer": "Johnson,

>> Inc.",

>> "sum": 167.0

>> },

>> ...

>> ]

>>

>> I tried this:

>> q=*:*&rows=0&stats=true&stats.field={!tag=piv1 
>> sum=true}price&facet=true&facet.pivot={!stats=piv1}manufacturer

>> which "works" on a test

>> subset of 1,000 manufacturers. However, there are two problems:

>> 1) This query returns all the manufacturers, so I have to iterate over the 
>> entire response object to extract the ones I want.

>> 2) The query on the whole data set takes more than 600 seconds to return, 
>> which doesn't fit

>> our target response time

>>

>> How can I perform this query?

>> We're using solr version 5.5.5.

>>

>>

>>

>> Thanks,

>> Chris

>>

>

>


SolrCloud collection design considerations / best practice

2017-11-13 Thread Shamik Bandopadhyay
Hi,

I'm looking for some input on design considerations for defining
collections in a SolrCloud cluster. Right now, our cluster consists of two
collections in a 2 shard / 2 replica mode. Each collection has a dedicated
set of sources that don't overlap, which made it an easy decision.
Recently, we have a requirement to index a bunch of new sources that are
region based. The search results for those regions need to come from their
region-specific sources as well as sources from one of our existing
collections. Here's an example of our existing collections and their
corresponding source(s).

Existing Collection:
--
Collection A --> Source_A, Source_B
Collection B --> Source_C, Source_D, Source_E

Proposed Collection:

Collection_Asia --> Source_Asia, Source_C, Source_D, Source_E
Collection_Europe --> Source_Europe, Source_C, Source_D, Source_E
Collection_Australia --> Source_Australia, Source_C, Source_D, Source_E

The proposed collection part shows that each geo has its dedicated source
as well as source(s) from existing collection B.

Just wondering if creating a dedicated collection for each geo is the right
approach here. The main motivation is to support a geo-specific relevancy
model which can easily be customized without stepping into each other. On
the downside, I'm not sure if it's a good idea to replicate data from the
same source across various collections. Moreover, the data within the
source are not relational, so joining across collections might not be
an easy proposition.
The other consideration is the hardware design. Right now, both shards and
their replicas run on their dedicated instance. With two collections, we
sometimes run into OOM scenarios, so I'm a little bit worried about adding
more collections. Does the best practice (I know it's subjective) in
scenarios like this call for a dedicated Solr cluster per collection? From
index size perspective, Source_C, Source_D, and Source_E combine to close to 10
million documents with a 60 GB index size. Each geo-based source is small and
won't exceed 500k documents.

Any pointers will be appreciated.

Thanks,
Shamik


Re: minimum should match for only for few fields

2017-11-13 Thread Emir Arnautović
Hi Vincenzo,
It is not perfect, but you could achieve something similar using the _query_ hook,
e.g.:

&defType=lucene&q=_query_:"{!edismax qf='f1 f2' mm=2}my query" OR
_query_:"{!edismax qf='f3 f4' mm=1}my query"

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 13 Nov 2017, at 11:22, Vincenzo D'Amore  wrote:
> 
> Hi All,
> 
> Not sure if what I'm asking is possible, but I'm looking for a way to
> define minimum should match ("mm") only for few fields in a query.
> 
> And if possibile, is there any chance to configure this behaviour in a
> request handler in solrconfig.xml?
> 
> Best regards,
> Vincenzo



Re: Multiple collections for a write-alias

2017-11-13 Thread S G
We are actually very close to doing what Shawn has suggested.

Emir has a good point about new collections failing on deletes/updates of
older documents which were not present in the new collection. But even if
this feature can only be implemented for an append-only log, it would make a
good feature IMO.


Use-case for re-indexing everything again is generally that of an attribute
change like
enabling "indexed" or "docValues" on a field or adding a new field to a
schema.
While the reading client-code sits behind a flag to start using the new
attribute/field, we
have to re-index all the data without stopping older-format reads.
Currently, we have to do
dual writes to the new collections or play catch-up-after-a-bootstrap.


Note that the catch-up-after-a-bootstrap is not very easy either (it is very
similar to the one described by Shawn). If this special place is Kafka or some
table in the DB, then we have to do dual writes to the regular source-of-truth
and this special place. Dual writes with the DB and Kafka suffer from being
transaction-less (and thus lack consistency), while dual writes to the DB
increase the load on the DB.


Having created_date / modified_date fields and querying the DB to find
live-traffic documents has
its own problems and is taxing on the DB again.


Writing directly to multiple Solr collections is the simplest approach for a
client to implement, and that is exactly what this new feature could be. With a
dual-write collection alias, the client would not need to implement any of the
above if the alias does the following:

- Deletes on missing documents in new collection are simply ignored.
- Incremental updates just throw an error for not being supported on
multi-write-collection-alias.
- Regular updates (i.e. Delete-Then-Insert) should work just fine because
they will just treat the document as a brand new one and versioning
strategies can take care of out-of-order updates.


SG


On Fri, Nov 10, 2017 at 6:33 AM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> This approach could work only if it is append only index. In case you have
> updates/deletes, you have to process in order, otherwise you will get
> incorrect results. I am thinking that is one of the reasons why it might
> not be supported since not too useful.
>
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 9 Nov 2017, at 19:09, S G  wrote:
> >
> > Hi,
> >
> > We have a use-case to re-create a solr-collection by re-ingesting
> > everything but not tolerate a downtime while that is happening.
> >
> > We are using collection alias feature to point to the new collection when
> > it has been re-ingested fully.
> >
> > However, re-ingestion takes several hours to complete and during that
> time,
> > the customer has to write to both the collections - previous collection
> and
> > the one being bootstrapped.
> > This dual-write is harder to do from the client side (because client
> needs
> > to have a retry logic to ensure any update does not succeed in one
> > collection and fails in another - consistency problem) and it would be a
> > real welcome addition if collection aliasing can support this.
> >
> > Proposal:
> > If can enhance the write alias to point to multiple collections such that
> > any update to the alias is written to all the collections it points to,
> it
> > would help the client to avoid dual writes and also issue just a single
> > http call from the client instead of multiple. It would also reduce the
> > retry logic inside the client code used to keep the collections
> consistent.
> >
> >
> > Thanks
> > SG
>
>


Re: Limiting by range of sum across documents

2017-11-13 Thread Emir Arnautović
Hi Chris,
You mention it returns all manufacturers? Even after you apply filters (don't
see a filter in your example)? You can control how many facets are returned with
facet.limit, and you can use facet.pivot.mincount to determine how many facets
are returned. If you calculate the sum over all manufacturers, it can take a long time.

Maybe you can try json faceting. Something like (url style):

…&json.facet={sumByManu:{terms:{field:manufacturer,facet:{sum:"sum(price)"}}}}
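
A more complete sketch of the suggested request, assuming the price range is
applied as an fq and a hypothetical "products" collection:

  curl 'http://localhost:8983/solr/products/select' \
    --data-urlencode 'q=*:*' \
    --data-urlencode 'rows=0' \
    --data-urlencode 'fq=price:[100 TO 200]' \
    --data-urlencode 'json.facet={sumByManu:{type:terms,field:manufacturer,limit:100,facet:{sum:"sum(price)"}}}'

The fq restricts both the counts and the sums to products priced in [100, 200].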

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 12 Nov 2017, at 19:09, ch...@yeeplusplus.com wrote:
> 
> 
> 
> 
> I have documents in solr that look like this:
> {
>   "id": "acme-1",
>   "manufacturer": "acme",
>   "product_name": "Foo",
>   "price": 3.4
> }
>  
> There are about
> 150,000 manufacturers, each of which have between 20,000 and 1,000,000 
> products.  
> I'd like to return the sum of all prices that are in the range [100, 200], 
> faceted by manufacturer.  In other words, for each manufacturer, sum the 
> prices of all products for that manufacturer,
> and return the sum and the manufacturer name.  For example:
> [
>   {
> "manufacturer": "acme",
> "sum": 150.5
>   },
>   {
> "manufacturer": "Johnson,
> Inc.",
> "sum": 167.0
>   },
> ...
> ]
>  
> I tried this:
> q=*:*&rows=0&stats=true&stats.field={!tag=piv1 
> sum=true}price&facet=true&facet.pivot={!stats=piv1}manufacturer
> which "works" on a test
> subset of 1,000 manufacturers.  However, there are two problems:
> 1) This query returns all the manufacturers, so I have to iterate over the 
> entire response object to extract the ones I want.
> 2) The query on the whole data set takes more than 600 seconds to return, 
> which doesn't fit
> our target response time
>  
> How can I perform this query?
> We're using solr version 5.5.5.
>
> 
>  
> Thanks,
> Chris
>  



RE: Phrase suggester - field limit and order

2017-11-13 Thread ruby
Thanks for your reply.
I'm not seeing any documentation explaining exactly how the weightField is
used.

So, is it just a field which I define on each document and populate with
some number during indexing? And during search it will be used to sort the
suggestions?
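
For reference, weightField is configured on the suggester's dictionary; a minimal
sketch, where the suggester name and the title/popularity fields are assumptions:

  <searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">titleSuggester</str>
      <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <!-- field whose values become the suggestions -->
      <str name="field">title</str>
      <!-- numeric field read per document at build time; higher values rank suggestions higher -->
      <str name="weightField">popularity</str>
      <str name="suggestAnalyzerFieldType">text_general</str>
    </lst>
  </searchComponent>

With DocumentDictionaryFactory, the weightField value is read from each document
when the dictionary is built and is used to rank the returned suggestions.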



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Spellcheck returning suggestions for words that exist in the dictionary

2017-11-13 Thread Sanjana Sridhar
Hi Alessandro,

I'm currently on Solr version 6.2.1, but will soon be moving to 6.6. I'm
not using DirectSolrSpellcheck, but using Index and File based.
The words I was testing against are definitely available in the File and
possibly in the Index as well.

What I found was if I don't set the maxResultsForSuggest field, Solr would
always try to spell correct. So for example,

Searching for "nike", gets corrected to "bike",

{"responseHeader":{"status":0,"QTime":2167,"params":{"spellcheck.q":"*nike*
","spellcheck":"true","wt":"json","spellcheck.build":"true","spellcheck.extendedResults":"true"}},"command":"build","response":{"numFound":0,"start":0,"docs":[]},"spellcheck":{"suggestions":["nike",{"numFound":1,"startOffset":0,"endOffset":4,"origFreq":0,"suggestion":[{"word":"
*bike*
","freq":-1}]}],"correctlySpelled":false,"collations":["collation","bike"]}}

But searching for "bike", gets corrected to "bake"

{"responseHeader":{"status":0,"QTime":2048,"params":{"spellcheck.q":"*bike*
","spellcheck":"true","wt":"json","spellcheck.build":"true","spellcheck.extendedResults":"true"}},"command":"build","response":{"numFound":0,"start":0,"docs":[]},"spellcheck":{"suggestions":["bike",{"numFound":1,"startOffset":0,"endOffset":4,"origFreq":0,"suggestion":[{"word":"
*bake*
","freq":-1}]}],"correctlySpelled":false,"collations":["collation","bake"]}}




On Mon, Nov 13, 2017 at 10:43 AM, alessandro.benedetti  wrote:

> Which Solr version are you using ?
>
> From the documentation :
> "Only query words, which are absent in index or too rare ones (below
> maxQueryFrequency ) are considered as misspelled and used for finding
> suggestions.
> ...
> These parameters (maxQueryFrequency and thresholdTokenFrequency) can be a
> percentage (such as .01, or 1%) or an absolute value (such as 4)."
>
> Checking in the latest source code[1] : public static final float
> DEFAULT_MAXQUERYFREQUENCY = 0.01f;
>
> This means that for the direct Solr Spellcheck, you should not get the
> suggestion if the term has a Document Frequency >=0.01 ( so if a term is in
> the index ) .
> Can you show us the snippet of the result you got ?
>
>
>
>
>
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>



-- 

 

Sanjana Sridhar
Flipp Corporation

p: 226-600-2281
e: sanjana.srid...@flipp.com



Re: Spellcheck returning suggestions for words that exist in the dictionary

2017-11-13 Thread alessandro.benedetti
Which Solr version are you using ?

From the documentation:
"Only query words, which are absent in index or too rare ones (below
maxQueryFrequency ) are considered as misspelled and used for finding
suggestions.
...
These parameters (maxQueryFrequency and thresholdTokenFrequency) can be a
percentage (such as .01, or 1%) or an absolute value (such as 4)."

Checking in the latest source code[1] : public static final float
DEFAULT_MAXQUERYFREQUENCY = 0.01f;

This means that for the direct Solr Spellcheck, you should not get the
suggestion if the term has a Document Frequency >=0.01 ( so if a term is in
the index ) .
Can you show us the snippet of the result you got ?
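
For reference, maxQueryFrequency is set on the spellchecker definition in
solrconfig.xml; a minimal DirectSolrSpellChecker sketch (the field name and
threshold values are illustrative):

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">text</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
      <!-- only terms whose document frequency is below this are treated as misspelled -->
      <float name="maxQueryFrequency">0.01</float>
      <float name="thresholdTokenFrequency">.0001</float>
    </lst>
  </searchComponent>

The index- and file-based spellcheckers mentioned above are configured separately,
so similar thresholds would need to be applied to their definitions where supported.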








-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: minimum should match for only for few fields

2017-11-13 Thread Erick Erickson
By definition mm is applied across all the fields you define in "df"
in your edismax handler.

You can always override df on a per-query basis, but there's no way
that I know of to say "mm really only applies to fields a, b, c even
though df is set to a, b, c, d, e, f".


Best,
Erick

On Mon, Nov 13, 2017 at 2:22 AM, Vincenzo D'Amore  wrote:
> Hi All,
>
> Not sure if what I'm asking is possible, but I'm looking for a way to
> define minimum should match ("mm") only for few fields in a query.
>
> And if possibile, is there any chance to configure this behaviour in a
> request handler in solrconfig.xml?
>
> Best regards,
> Vincenzo


Re: Error when indexing EML files in Solr 7.1.0

2017-11-13 Thread Erick Erickson
Where are you getting your mime4j file? MimeConfig is in
/extraction/lib/apache-mime4j-core-0.7.2.jar and, at a guess, you need to make
sure you're including that.
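
For what it's worth, when the extraction is done server side via the extracting
handler, the extraction contrib and its dependencies (including the mime4j jars)
are normally pulled in with lib directives in solrconfig.xml along these lines
(the relative paths depend on your install layout):

  <lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar" />
  <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-cell-\d.*\.jar" />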

Best,
Erick

On Mon, Nov 13, 2017 at 6:15 AM, Zheng Lin Edwin Yeo
 wrote:
> Hi,
>
> I am using Solr 7.1.0, and I am trying to index EML files using the
> SimplePostTools.
>
> However, I get the following error
>
> java.lang.NoClassDefFoundError:
> org/apache/james/mime4j/stream/MimeConfig$Builder
>
>
> > Are there any new classes or dependencies which I need to add compared to
> > Solr 6?
> >
> > The indexing is ok for other file types like .doc and .ppt. I only face the
> > error when indexing .eml files.
>
> Regards,
> Edwin


Re: Using Ltr and payload together

2017-11-13 Thread alessandro.benedetti
It depends how you want to use the payloads.

If you want to use the payloads to calculate additional features, you can
implement a payload feature:

This feature could calculate the sum of the numerical payloads for the query
terms in each document (so it will be a query-dependent feature and will
leverage the encoded indexed payloads for the field).

Alternatively you could use the payloads to affect the original Solr score
before the re-ranking happens ( this makes sense only if you use the
original Solr score as a feature) .

I recommend this blog about payloads [1].

So, long story short, it depends.

[1] https://lucidworks.com/2017/09/14/solr-payloads/
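
As a rough sketch of the first option, an LTR feature could be built on the
payload_score query parser available in recent Solr versions; the feature store,
field name, and efi parameter below are hypothetical, and a custom Feature class
is an alternative:

  {
    "store": "myFeatureStore",
    "name": "payloadScore",
    "class": "org.apache.solr.ltr.feature.SolrFeature",
    "params": {
      "q": "{!payload_score f=terms_with_payloads func=max v=${query_text}}"
    }
  }

func can be min, max, or average depending on how the per-term payloads should be
combined; query_text would be supplied at query time via efi.query_text on the
LTR rq parameter.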



-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: How to routing document for send to particular shard range

2017-11-13 Thread Amrit Sarkar
Surely someone else can chim in;

but when you say: "so regarding to it we need to index the particular
> client data into particular shard so if its  manageable than we will
> improve the performance as we need"


You can / should create different collections for different client data, so
that you can surely improve performance as needed. There are multiple
configurations which drive indexing and querying capabilities, and
incorporating everything in a single collection will hinder that flexibility.
Also, if you need to add a new client in the future, you don't need to think about
sharding again; just add a new collection and tweak its configuration as needed.

Still, if you need to use compositeId to achieve your use case, I am honestly not
sure how to do that, since shards are predefined when the collection is created.
You cannot add more shards as such; you can only split a shard, which will divide
the index and hence the hash range. I would strongly recommend reconsidering your
SolrCloud design for your use case.
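
For completeness, a sketch of recreating a collection with the implicit router
and a routing field, as discussed below (collection, shard, config, and field
names are hypothetical):

  curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=model&collection.configName=model_config&router.name=implicit&shards=shard1,shard2&router.field=core'

  # documents carrying core=shard1 in the routing field land on shard1
  curl 'http://localhost:8983/solr/model/update?commit=true' -H 'Content-Type: application/json' \
    -d '[{"id":"doc1","client":"Org1","core":"shard1"}]'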

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Mon, Nov 13, 2017 at 7:31 PM, Ketan Thanki  wrote:

>
> Thanks Amrit,
>
> My requirement to achieve best performance while using document routing
> facility in solr so regarding to it we need to index the particular client
> data into particular shard so if its  manageable than we will improve the
> performance as we need.
>
> Please do needful.
>
>
> Regards,
>
>
> -Original Message-
> From: Amrit Sarkar [mailto:sarkaramr...@gmail.com]
> Sent: Friday, November 10, 2017 5:34 PM
> To: solr-user@lucene.apache.org
> Subject: Re: How to routing document for send to particular shard range
>
> Ketan,
>
> here I have also created new field 'core' which value is any shard where I
> > need to send documents and on retrieval use '_route_'  parameter with
> > mentioning the particular shard. But issue facing still my
> > clusterstate.json showing the "router":{"name":"compositeId"} is it
> > means my settings not impacted? or its default.
>
>
> Only answering this query, as Erick has already mentioned in the above
> comment. You need to RECREATE the collection, passing the "router.field" in
> the "create collection" API parameters, as "router.field" is a
> collection-specific property maintained in ZooKeeper (state.json /
> clusterstate.json).
>
> https://lucene.apache.org/solr/guide/6_6/collections-
> api.html#CollectionsAPI-create
>
> I highly recommend not to alter core.properties manually when dealing with
> SolrCloud and instead relying on SolrCloud APIs to make necessary change.
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> Medium: https://medium.com/@sarkaramrit2
>
> On Fri, Nov 10, 2017 at 5:23 PM, Ketan Thanki  wrote:
>
> > Hi Erik,
> >
> > My requirement is to index the documents of a particular organization to a
> > specific shard. Also, I have made changes in core.properties as mentioned
> > below.
> >
> > Model Collection:
> > name=model
> > shard=shard1
> > collection=model
> > router.name=implicit
> > router.field=core
> > shards=shard1,shard2
> >
> > Workset Collection:
> > name=workset
> > shard=shard1
> > collection=workset
> > router.name=implicit
> > router.field=core
> > shards=shard1,shard2
> >
> > here I have also created new field 'core' which value is any shard
> > where I need to send documents and on retrieval use '_route_'
> > parameter with mentioning the particular shard. But issue facing still
> > my clusterstate.json showing the "router":{"name":"compositeId"} is it
> > means my settings not impacted? or its default.
> >
> > Please do needful.
> >
> > Regards,
> >
> > -Original Message-
> > From: Erick Erickson [mailto:erickerick...@gmail.com]
> > Sent: Friday, November 10, 2017 12:06 PM
> > To: solr-user
> > Subject: Re: How to routing document for send to particular shard
> > range
> >
> > You cannot just make configuration changes, whether you use implicit
> > or compositeId is defined when you _create_ the collection and cannot
> > be changed later.
> >
> > You need to create a new collection and specify router.name=implicit
> > when you create it. Then you can route documents as you desire.
> >
> > I would caution against this though. If you use implicit routing _you_
> > have to ensure balancing. For instance, you could have 10,000,000
> > documents for "Org1" and 15 for "Org2", resulting in hugely unbalanced
> shards.
> >
> > Implicit routing is particularly useful for time-series indexing,
> > where you, say, index a day's worth of documents to each shard. It may
> > be appropriate in your case, but so far you haven't told us _why_ you
> > think routing docs to particular shards is desirable.
> >
> > Best,
> > 

Error when indexing EML files in Solr 7.1.0

2017-11-13 Thread Zheng Lin Edwin Yeo
Hi,

I am using Solr 7.1.0, and I am trying to index EML files using the
SimplePostTools.

However, I get the following error

java.lang.NoClassDefFoundError:
org/apache/james/mime4j/stream/MimeConfig$Builder


Are there any new classes or dependencies which I need to add compared to
Solr 6?

The indexing is ok for other file types like .doc and .ppt. I only face the
error when indexing .eml files.

Regards,
Edwin


RE: How to routing document for send to particular shard range

2017-11-13 Thread Ketan Thanki

Thanks Amrit,

My requirement is to achieve the best performance while using the document routing
facility in Solr. For that, we need to index a particular client's data
into a particular shard; if that is manageable, then we will improve
performance as we need.

Please do needful.


Regards,


-Original Message-
From: Amrit Sarkar [mailto:sarkaramr...@gmail.com]
Sent: Friday, November 10, 2017 5:34 PM
To: solr-user@lucene.apache.org
Subject: Re: How to routing document for send to particular shard range

Ketan,

here I have also created new field 'core' which value is any shard where I
> need to send documents and on retrieval use '_route_'  parameter with 
> mentioning the particular shard. But issue facing still my 
> clusterstate.json showing the "router":{"name":"compositeId"} is it 
> means my settings not impacted? or its default.


Only answering this query, as Erick has already mentioned in the above comment.
You need to RECREATE the collection, passing the "router.field" in the "create
collection" API parameters, as "router.field" is a collection-specific property
maintained in ZooKeeper (state.json / clusterstate.json).

https://lucene.apache.org/solr/guide/6_6/collections-api.html#CollectionsAPI-create

I highly recommend not to alter core.properties manually when dealing with 
SolrCloud and instead relying on SolrCloud APIs to make necessary change.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Fri, Nov 10, 2017 at 5:23 PM, Ketan Thanki  wrote:

> Hi Erik,
>
> My requirement is to index the documents of a particular organization to a
> specific shard. Also, I have made changes in core.properties as mentioned
> below.
>
> Model Collection:
> name=model
> shard=shard1
> collection=model
> router.name=implicit
> router.field=core
> shards=shard1,shard2
>
> Workset Collection:
> name=workset
> shard=shard1
> collection=workset
> router.name=implicit
> router.field=core
> shards=shard1,shard2
>
> here I have also created new field 'core' which value is any shard 
> where I need to send documents and on retrieval use '_route_'
> parameter with mentioning the particular shard. But issue facing still 
> my clusterstate.json showing the "router":{"name":"compositeId"} is it 
> means my settings not impacted? or its default.
>
> Please do needful.
>
> Regards,
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Friday, November 10, 2017 12:06 PM
> To: solr-user
> Subject: Re: How to routing document for send to particular shard 
> range
>
> You cannot just make configuration changes, whether you use implicit 
> or compositeId is defined when you _create_ the collection and cannot 
> be changed later.
>
> You need to create a new collection and specify router.name=implicit 
> when you create it. Then you can route documents as you desire.
>
> I would caution against this though. If you use implicit routing _you_ 
> have to ensure balancing. For instance, you could have 10,000,000
> documents for "Org1" and 15 for "Org2", resulting in hugely unbalanced shards.
>
> Implicit routing is particularly useful for time-series indexing, 
> where you, say, index a day's worth of documents to each shard. It may 
> be appropriate in your case, but so far you haven't told us _why_ you 
> think routing docs to particular shards is desirable.
>
> Best,
> Erick
>
> On Thu, Nov 9, 2017 at 10:27 PM, Ketan Thanki  wrote:
> > Thanks Amrit,
> >
> > For suggesting me the approach.
> >
> > I have got some understanding of it and I need to implement
> implicit routing on a specific-shard basis. I have tried making changes
> in core.properties but it doesn't work, so can you please let me know the
> configuration changes needed? Do I need to create an extra field for the
> document to route?
> >
> > I have below configuration Collection created manually:
> > 1: Workset with 4 shard and 4 replica
> > 2: Model with 4 shard and 4 replica
> >
> >
> > For e.g Core.properties for 1 shard :
> > Workset Colection:
> > name=workset
> > shard=shard1
> > collection=workset
> >
> > Model Collection:
> > name=model
> > shard=shard1
> > collection=model
> >
> >
> > So can you please let me know the changes needed in the configuration for
> implicit routing.
> >
> > Please do needful.
> >
> > Regards,
> >
> >
> > -Original Message-
> > From: Amrit Sarkar [mailto:sarkaramr...@gmail.com]
> > Sent: Wednesday, November 08, 2017 12:36 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: How to routing document for send to particular shard 
> > range
> >
> > Ketan,
> >
> > If you know defined indexing architecture; isn't it better to use
> "implicit" router by writing logic on your own end.
> >
> > If the document is of "Org1", send the document with extra param*
> > "_route_:shard1"* and likewise.
> >
> > Snippet from official doc:

Re: cannot create core when SSL is enabled

2017-11-13 Thread misschak
In your *solr.in.sh*, set

# By default the start script uses "localhost"; override the hostname here
# for production SolrCloud environments to control the hostname exposed to
cluster state
SOLR_HOST= 


Younge, Kent A - Norman, OK - Contractor wrote
> Hello,
> 
> I am getting an error message when trying to create a core when SSL is
> enabled: ERROR: Certificate for
>
>  doesn't match any of the subject alternative names:
> 
> However, if I turn off SSL I can create the core just fine. I have my
> certificates in the solr-6.5.1 directory; should they be placed somewhere
> else to resolve this issue?
> 
> 
> 
> 
> 
> Thanks,
> 
> Kent





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


minimum should match for only for few fields

2017-11-13 Thread Vincenzo D'Amore
Hi All,

Not sure if what I'm asking is possible, but I'm looking for a way to
define minimum should match ("mm") only for few fields in a query.

And if possibile, is there any chance to configure this behaviour in a
request handler in solrconfig.xml?

Best regards,
Vincenzo


area of overlap in polygon intersection

2017-11-13 Thread Thaer Sammar
Hi all,

we are looking for a way to use Solr functionality to return the overlapping
area resulting from a polygon intersection. As far as we know, the intersection will
return all polygons that intersect with the given radius, but we are interested
in the overlapping part only. In our index, we store polygons using
SpatialRecursivePrefixTreeFieldType.

regards,
Thaer

Are in-place doc values field updates real time?

2017-11-13 Thread Samuel Tatipamula
Hello,

I was just wondering if the in-place doc values updates are real time in
Solr 7. Since these fields are neither indexed, nor stored, and are only
present in doc values space, are the updates to these fields real time?

I tried on my local Solr 7 instance, and the updates only become
visible after a hard/soft commit, unless I am missing something obvious.

Can anyone please explain the internals of how this works?

Thanks,
Samuel