Re: does copyFields increase indexe size ?
> So what will be added is just another set of pointers to each relevant
> term. That's not going to be very large. Probably
> only a few bytes for each term.

Hi Shawn. This explains a lot! Thanks.

For text fields, highlighting is done on the source fields and the _text_ field is used only for lookup. This behavior is perfect for my needs.

On Fri, Dec 27, 2019 at 05:28:25PM -0700, Shawn Heisey wrote:
> Your schema says that the destination field is not stored and doesn't have
> docValues. So the only thing it has is indexed.
>
> All of the terms generated by index analysis will already be in the index
> from the source fields. So what will be added is just another set of
> pointers to each relevant term. That's not going to be very large. Probably
> only a few bytes for each term.
>
> So with this copyField, the index will get larger, but probably not
> significantly.
>
> Thanks,
> Shawn

--
nicolas
Re: does copyFields increase indexe size ?
On 12/26/2019 1:21 PM, Nicolas Paris wrote:
> Below is part of the managed-schema. There are 1k section* fields. For the
> second experiment, I removed the copyField, dropped the collection, and
> re-indexed everything. To measure the index size, I went to solr-cloud and
> looked in the cloud section: 40 GB per shard. I also looked at the folder
> size. I ran some tests and the _text_ field is indexed.

Your schema says that the destination field is not stored and doesn't have docValues. So the only thing it has is indexed.

All of the terms generated by index analysis will already be in the index from the source fields. So what will be added is just another set of pointers to each relevant term. That's not going to be very large. Probably only a few bytes for each term.

So with this copyField, the index will get larger, but probably not significantly.

Thanks,
Shawn
Re: does copyFields increase indexe size ?
The field is stored somewhere.

> On Dec 26, 2019, at 3:22 PM, Nicolas Paris wrote:
>
> Hi Erick
>
> Below is part of the managed-schema. There are 1k section* fields. For the
> second experiment, I removed the copyField, dropped the collection, and
> re-indexed everything. To measure the index size, I went to solr-cloud and
> looked in the cloud section: 40 GB per shard. I also looked at the folder
> size. I ran some tests and the _text_ field is indexed.
>
> multiValued="true"/>
> multiValued="true"/>
>
> positionIncrementGap="100">
> replacement=" " replace="all"/>
> articles="lang/contractions_fr.txt"/>
> words="lang/stopwords_fr.txt" format="snowball" />
>
> synonyms="synonyms-fr.txt" ignoreCase="true" expand="true"/>
> replacement=" " replace="all"/>
> articles="lang/contractions_fr.txt"/>
> words="lang/stopwords_fr.txt" format="snowball" />
>
> --
> nicolas
Re: does copyFields increase indexe size ?
Hi Erick

Below is part of the managed-schema. There are 1k section* fields. For the second experiment, I removed the copyField, dropped the collection, and re-indexed everything. To measure the index size, I went to solr-cloud and looked in the cloud section: 40 GB per shard. I also looked at the folder size. I ran some tests and the _text_ field is indexed.

On Thu, Dec 26, 2019 at 02:16:32PM -0500, Erick Erickson wrote:
> This simply cannot be true unless the destination copyField is indexed=false,
> docValues=false, stored=false. I.e. “some circumstances” means there’s really
> no use in using the copyField in the first place. I suppose that if you don’t
> store any term vectors, no position information, nothing except, say, the
> terms, then maybe you’ll have extremely minimal size. But even in that case,
> I’d use the original field in an “fq” clause, which doesn’t use any scoring,
> in place of using the copyField.
>
> Each field is stored in a separate part of the relevant files (.tim, .pos,
> etc). Term frequencies are kept on a _per field_ basis, for instance.
>
> So this pretty much has to be small sample size or other measurement error.
>
> Best,
> Erick

--
nicolas
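The folder-size check mentioned in this message can be made repeatable with a small script. The following is only a sketch: the function names are illustrative and not part of Solr, and it assumes you point it at a core's index directory on disk (the actual path depends on your installation). Comparing exact byte counts rather than a rounded "40 GB" figure makes a small increase easier to spot.

```python
import os

def directory_size_bytes(path):
    """Sum the sizes of all regular files under path (symlinked files are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total

def relative_change(before_bytes, after_bytes):
    """Fractional size change between two measurements, e.g. 0.1 for +10%."""
    return (after_bytes - before_bytes) / before_bytes
```

A run without the copyField and a run with it could then be compared with `relative_change(size_without, size_with)`. Merges and deleted documents also affect on-disk size, so both measurements should be taken on freshly built indexes in a comparable state.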
Re: does copyFields increase indexe size ?
This simply cannot be true unless the destination copyField is indexed=false, docValues=false, stored=false. I.e. “some circumstances” means there’s really no use in using the copyField in the first place. I suppose that if you don’t store any term vectors, no position information, nothing except, say, the terms, then maybe you’ll have extremely minimal size. But even in that case, I’d use the original field in an “fq” clause, which doesn’t use any scoring, in place of using the copyField.

Each field is stored in a separate part of the relevant files (.tim, .pos, etc). Term frequencies are kept on a _per field_ basis, for instance.

So this pretty much has to be small sample size or other measurement error.

Best,
Erick

> On Dec 26, 2019, at 9:27 AM, Nicolas Paris wrote:
>
> Anyway, that's good news: copyField does not increase index size in
> some circumstances:
> - the copied fields and the target field share the same datatype
> - the target field is not stored
>
> This was tested on text fields.
>
> --
> nicolas
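The per-field point above can be illustrated with a toy inverted index. This is a deliberately simplified model, not Lucene's on-disk format: postings are keyed by (field, term), so copying text into a catch-all field adds its own postings instead of reusing those of the source fields. All names and the sample documents are made up for illustration.

```python
from collections import defaultdict

def build_postings(docs, copy_sources=None, copy_dest="_text_"):
    """Return {(field, term): set of doc ids}, a toy per-field inverted index.

    Fields listed in copy_sources are also indexed under copy_dest,
    mimicking what a copyField does at index time.
    """
    postings = defaultdict(set)
    for doc_id, fields in enumerate(docs):
        for field, text in fields.items():
            for term in text.lower().split():
                postings[(field, term)].add(doc_id)
                if copy_sources and field in copy_sources:
                    postings[(copy_dest, term)].add(doc_id)
    return postings

def posting_count(postings):
    """Total number of (field, term, doc) entries, a rough proxy for size."""
    return sum(len(doc_ids) for doc_ids in postings.values())

# Hypothetical documents with two source fields each.
docs = [
    {"section_a": "solr index size", "section_b": "copy field test"},
    {"section_a": "index size grows", "section_b": "copy field"},
]

without_copy = build_postings(docs)
with_copy = build_postings(docs, copy_sources={"section_a", "section_b"})

print(posting_count(without_copy), posting_count(with_copy))
```

In this model the catch-all field always adds entries. How much that matters in a real index depends on compression, stored fields, and whether positions or term vectors are kept, none of which the toy model captures.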
Re: does copyFields increase indexe size ?
Anyway, that's good news: copyField does not increase index size in some circumstances:
- the copied fields and the target field share the same datatype
- the target field is not stored

This was tested on text fields.

On Wed, Dec 25, 2019 at 11:42:23AM +0100, Nicolas Paris wrote:
> On Wed, Dec 25, 2019 at 05:30:03AM -0500, Dave wrote:
> > #2 you initially said you were talking about 1k documents.
>
> Hi Dave. Again, sorry for the confusion. This is 1k fields
> (general_text), over 50M large documents, copied into one _text_ field.
> 4 shards, 40 GB per shard in both cases, with and without the _text_ field.

--
nicolas
Re: does copyFields increase indexe size ?
On Wed, Dec 25, 2019 at 05:30:03AM -0500, Dave wrote:
> #2 you initially said you were talking about 1k documents.

Hi Dave. Again, sorry for the confusion. This is 1k fields (general_text), over 50M large documents, copied into one _text_ field. 4 shards, 40 GB per shard in both cases, with and without the _text_ field.

--
nicolas
Re: does copyFields increase indexe size ?
#1 merry Xmas thing

#2 you initially said you were talking about 1k documents. That will not be a large enough sample size to see the index size differences with this new field. In any case, the index size should never really matter. But if you go to a few million, you will notice the size has increased by a good amount. Other things come into play, like whether the index was wiped clean with a commit before indexing or reindexed without one, or whether we are talking about documents that have a lot of similar words between them. Many other scenarios can increase or decrease the index size. But no matter what, if you have a copy field, the text is going somewhere.

> On Dec 25, 2019, at 3:07 AM, Nicolas Paris wrote:
>
>> If you are redoing the indexing after changing the schema and
>> reloading/restarting, then you can ignore me.
>
> I am sorry to say that I have to ignore you. Indeed, my tests include
> recreating the collection from scratch, with and without the copy
> fields.
> In both cases the index size is the same (while the _text_ field is
> working correctly)!
>
> --
> nicolas
Re: does copyFields increase indexe size ?
> If you are redoing the indexing after changing the schema and
> reloading/restarting, then you can ignore me.

I am sorry to say that I have to ignore you. Indeed, my tests include recreating the collection from scratch, with and without the copy fields. In both cases the index size is the same (while the _text_ field is working correctly)!

On Tue, Dec 24, 2019 at 05:32:09PM -0700, Shawn Heisey wrote:
> On 12/24/2019 5:11 PM, Nicolas Paris wrote:
> > Do you mean "copy fields" is only an action of changing the schema?
> > I was thinking it was adding a new field and eventually a new index to
> > the collection.
>
> The copy that copyField does happens at index time. Reindexing is required
> after changing the schema, or nothing happens.
>
> If you are redoing the indexing after changing the schema and
> reloading/restarting, then you can ignore me.
>
> Thanks,
> Shawn

--
nicolas
Re: does copyFields increase indexe size ?
On 12/24/2019 5:11 PM, Nicolas Paris wrote:
> Do you mean "copy fields" is only an action of changing the schema?
> I was thinking it was adding a new field and eventually a new index to
> the collection.

The copy that copyField does happens at index time. Reindexing is required after changing the schema, or nothing happens.

If you are redoing the indexing after changing the schema and reloading/restarting, then you can ignore me.

Thanks,
Shawn
Re: does copyFields increase indexe size ?
> The action of changing the schema makes zero changes in the index. It
> merely changes how Solr interacts with the index.

Do you mean "copy fields" is only an action of changing the schema? I was thinking it was adding a new field and eventually a new index to the collection.

On Tue, Dec 24, 2019 at 10:59:03AM -0700, Shawn Heisey wrote:
> On 12/24/2019 10:45 AM, Nicolas Paris wrote:
> > From my understanding, copy fields create a new index from the
> > copied fields.
> > In my tests, I copied 1k textual fields into _text_ with copyFields.
> > As a result there is no increase in the size of the collection. All the
> > source fields are indexed and stored. The _text_ field is indexed but
> > not stored.
> >
> > This is a great surprise, but is this behavior expected?
>
> The action of changing the schema makes zero changes in the index. It
> merely changes how Solr interacts with the index.
>
> If you want the index to change when the schema is changed, you need to
> restart or reload and then re-do the indexing after the change is saved.
>
> https://cwiki.apache.org/confluence/display/solr/HowToReindex
>
> Thanks,
> Shawn

--
nicolas
Re: does copyFields increase indexe size ?
On 12/24/2019 10:45 AM, Nicolas Paris wrote:
> From my understanding, copy fields create a new index from the
> copied fields.
> In my tests, I copied 1k textual fields into _text_ with copyFields.
> As a result there is no increase in the size of the collection. All the
> source fields are indexed and stored. The _text_ field is indexed but
> not stored.
>
> This is a great surprise, but is this behavior expected?

The action of changing the schema makes zero changes in the index. It merely changes how Solr interacts with the index.

If you want the index to change when the schema is changed, you need to restart or reload and then re-do the indexing after the change is saved.

https://cwiki.apache.org/confluence/display/solr/HowToReindex

Thanks,
Shawn
does copyFields increase indexe size ?
Hi

From my understanding, copy fields create a new index from the copied fields.

In my tests, I copied 1k textual fields into _text_ with copyFields. As a result there is no increase in the size of the collection. All the source fields are indexed and stored. The _text_ field is indexed but not stored.

This is a great surprise, but is this behavior expected?

--
nicolas
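For readers who want to reproduce the setup described in this message, the sketch below builds a request body for Solr's Schema API using its `add-field` and `add-copy-field` commands. Several things here are assumptions rather than details from the thread: the field type name `text_general`, the choice of `multiValued`, and the use of the `section*` glob as the copy source (the thread mentions "1k section* fields" but does not show the actual copyField definition). The script only builds and prints the JSON; it does not contact a Solr server.

```python
import json

def copy_field_commands(source_pattern, dest, field_type):
    """Build Schema API commands for an indexed, non-stored catch-all field."""
    return {
        "add-field": {
            "name": dest,
            "type": field_type,      # assumed type name, adjust to your schema
            "indexed": True,
            "stored": False,         # destination is searchable but not stored
            "multiValued": True,     # many sources are copied into one field
        },
        "add-copy-field": {"source": source_pattern, "dest": dest},
    }

# Hypothetical values modelled on the thread: section* fields copied to _text_.
payload = copy_field_commands("section*", "_text_", "text_general")
body = json.dumps(payload, indent=2)
print(body)
```

If the destination field already exists in the schema, the `add-field` command should be dropped. The documents still have to be reindexed after the schema change before the destination field contains anything.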