Re: does copyFields increase index size?

2019-12-28 Thread Nicolas Paris


> So what will be added is just another set of pointers to each relevant
> term. That's not going to be very large. Probably only a few bytes for
> each term.

Hi Shawn. That explains a lot! Thanks.
For text fields, highlighting is done on the source fields and the
_text_ field is only used for lookup. This behavior is perfect for
my needs.
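
As a rough illustration of the setup described above (match against the catch-all _text_ field, highlight on the stored source fields), the request parameters could be assembled as below. This is only a sketch: the section* pattern comes from the thread, while the helper name and the example query text are assumptions, not the actual configuration used here.

```python
from urllib.parse import urlencode

def build_highlight_query(user_query, source_field_pattern="section*"):
    """Build Solr select parameters: match on _text_, highlight on source fields."""
    params = {
        "q": user_query,
        "df": "_text_",                 # default search field: the copyField destination
        "hl": "true",                   # turn highlighting on
        "hl.fl": source_field_pattern,  # highlight the stored source fields, not _text_
    }
    return urlencode(params)

# Hypothetical query text, for illustration only.
print(build_highlight_query("heart failure"))
```

Since _text_ is not stored, it cannot supply highlight snippets itself, which is why hl.fl points at the source fields.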

On Fri, Dec 27, 2019 at 05:28:25PM -0700, Shawn Heisey wrote:
> On 12/26/2019 1:21 PM, Nicolas Paris wrote:
> > Below is part of the managed-schema. There are 1k section* fields. For
> > the second experiment, I removed the copyField, dropped the collection
> > and re-indexed everything. To measure the index size, I went to
> > SolrCloud and looked in the Cloud section: 40 GB per shard. I also
> > looked at the folder size. I ran some tests and the _text_ field is
> > indexed.
> 
> Your schema says that the destination field is not stored and doesn't have
> docValues.  So the only thing it has is indexed.
> 
> All of the terms generated by index analysis will already be in the index
> from the source fields.  So what will be added is just another set of
> pointers to each relevant term.  That's not going to be very large. Probably
> only a few bytes for each term.
> 
> So with this copyField, the index will get larger, but probably not
> significantly.
> 
> Thanks,
> Shawn
> 

-- 
nicolas


Re: does copyFields increase index size?

2019-12-27 Thread Shawn Heisey

On 12/26/2019 1:21 PM, Nicolas Paris wrote:

> Below is part of the managed-schema. There are 1k section* fields. For
> the second experiment, I removed the copyField, dropped the collection
> and re-indexed everything. To measure the index size, I went to
> SolrCloud and looked in the Cloud section: 40 GB per shard. I also
> looked at the folder size. I ran some tests and the _text_ field is
> indexed.


Your schema says that the destination field is not stored and doesn't 
have docValues.  So the only thing it has is indexed.


All of the terms generated by index analysis will already be in the 
index from the source fields.  So what will be added is just another set 
of pointers to each relevant term.  That's not going to be very large. 
Probably only a few bytes for each term.


So with this copyField, the index will get larger, but probably not 
significantly.
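
One way to check how much larger, rather than relying on a rounded figure from the admin UI, is to total the on-disk size of each index directory by file extension and compare the two builds. The sketch below assumes you have filesystem access to both index directories; the function names and the paths in the usage comment are hypothetical.

```python
import os
from collections import defaultdict

def size_by_extension(index_dir):
    """Sum file sizes under index_dir, grouped by file extension (bytes)."""
    totals = defaultdict(int)
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            ext = os.path.splitext(name)[1] or "(none)"
            totals[ext] += os.path.getsize(os.path.join(root, name))
    return dict(totals)

def size_delta(before, after):
    """Per-extension size difference (after minus before), in bytes."""
    return {ext: after.get(ext, 0) - before.get(ext, 0)
            for ext in sorted(set(before) | set(after))}

# Usage (paths are placeholders):
#   without = size_by_extension("/path/to/index-without-copyfield")
#   with_cf = size_by_extension("/path/to/index-with-copyfield")
#   print(size_delta(without, with_cf))
```

Note that segments written as compound files will hide the per-type breakdown, so the totals are the more reliable number in that case.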


Thanks,
Shawn


Re: does copyFields increase index size?

2019-12-26 Thread David Hastings
The field is stored somewhere 

> On Dec 26, 2019, at 3:22 PM, Nicolas Paris  wrote:
> 
> Hi Erick
>
> Below is part of the managed-schema. There are 1k section* fields. For
> the second experiment, I removed the copyField, dropped the collection
> and re-indexed everything. To measure the index size, I went to
> SolrCloud and looked in the Cloud section: 40 GB per shard. I also
> looked at the folder size. I ran some tests and the _text_ field is
> indexed.
> 
> [managed-schema excerpt: the XML element names were lost in the archive.
> Surviving attribute fragments: multiValued="true" (twice);
> positionIncrementGap="100"; replacement=" " replace="all";
> articles="lang/contractions_fr.txt";
> words="lang/stopwords_fr.txt" format="snowball";
> synonyms="synonyms-fr.txt" ignoreCase="true" expand="true"]
> 
>> On Thu, Dec 26, 2019 at 02:16:32PM -0500, Erick Erickson wrote:
>> This simply cannot be true unless the destination copyField is 
>> indexed=false, docValues=false stored=false. I.e. “some circumstances” means 
>> there’s really no use in using the copyField in the first place. I suppose 
>> that if you don’t store any term vectors, no position information nothing 
>> except, say, the terms then maybe you’ll have extremely minimal size. But 
>> even in that case, I’d use the original field in an “fq” clause which 
>> doesn’t use any scoring in place of using the copyField.
>> 
>> Each field is stored in a separate part of the relevant files (.tim, .pos, 
>> etc). Term frequencies are kept on a _per field_ basis for instance.
>> 
>> So this pretty much has to be small sample size or other measurement error.
>> 
>> Best,
>> Erick
>> 
>>> On Dec 26, 2019, at 9:27 AM, Nicolas Paris wrote:
>>> 
>>> Anyway, that's good news: copyField does not increase index size in
>>> some circumstances:
>>> - the copied fields and the target field share the same datatype
>>> - the target field is not stored
>>> 
>>> this is tested on text fields
>>> 
>>> 
>>> On Wed, Dec 25, 2019 at 11:42:23AM +0100, Nicolas Paris wrote:
>>>>
>>>> On Wed, Dec 25, 2019 at 05:30:03AM -0500, Dave wrote:
>>>>> #2 you initially said you were talking about 1k documents.
>>>>
>>>> Hi Dave. Again, sorry for the confusion. This is 1k fields
>>>> (general_text), over 50M large documents, copied into one _text_ field.
>>>> 4 shards, 40GB per shard in both cases, with/without the _text_ field
>>>>
>>>>>
>>>>>> On Dec 25, 2019, at 3:07 AM, Nicolas Paris wrote:
>>>>>>
>>>>>>> If you are redoing the indexing after changing the schema and
>>>>>>> reloading/restarting, then you can ignore me.
>>>>>>
>>>>>> I am sorry to say that I have to ignore you. Indeed, my tests include
>>>>>> recreating the collection from scratch - with and without the copy
>>>>>> fields.
>>>>>> In both cases the index size is the same ! (while the _text_ field is
>>>>>> working correctly)
>>>>>>
>>>>>>> On Tue, Dec 24, 2019 at 05:32:09PM -0700, Shawn Heisey wrote:
>>>>>>>> On 12/24/2019 5:11 PM, Nicolas Paris wrote:
>>>>>>>> Do you mean "copy fields" is only an action of changing the schema ?
>>>>>>>> I was thinking it was adding a new field and eventually a new index to
>>>>>>>> the collection
>>>>>>>
>>>>>>> The copy that copyField does happens at index time.  Reindexing is
>>>>>>> required after changing the schema, or nothing happens.
>>>>>>>
>>>>>>> If you are redoing the indexing after changing the schema and
>>>>>>> reloading/restarting, then you can ignore me.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Shawn
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> nicolas
>>>>>
>>>>
>>>> --
>>>> nicolas
>>>>
>>> 
>>> -- 
>>> nicolas
>> 
> 
> -- 
> nicolas


Re: does copyFields increase index size?

2019-12-26 Thread Nicolas Paris
Hi Erick

Below is part of the managed-schema. There are 1k section* fields. For
the second experiment, I removed the copyField, dropped the collection
and re-indexed everything. To measure the index size, I went to
SolrCloud and looked in the Cloud section: 40 GB per shard. I also
looked at the folder size. I ran some tests and the _text_ field is
indexed.

[managed-schema excerpt not preserved in the archive]

On Thu, Dec 26, 2019 at 02:16:32PM -0500, Erick Erickson wrote:
> This simply cannot be true unless the destination copyField is indexed=false, 
> docValues=false stored=false. I.e. “some circumstances” means there’s really 
> no use in using the copyField in the first place. I suppose that if you don’t 
> store any term vectors, no position information nothing except, say, the 
> terms then maybe you’ll have extremely minimal size. But even in that case, 
> I’d use the original field in an “fq” clause which doesn’t use any scoring in 
> place of using the copyField.
> 
> Each field is stored in a separate part of the relevant files (.tim, .pos, 
> etc). Term frequencies are kept on a _per field_ basis for instance.
> 
> So this pretty much has to be small sample size or other measurement error.
> 
> Best,
> Erick
> 
> > On Dec 26, 2019, at 9:27 AM, Nicolas Paris  wrote:
> > 
> > Anyway, that's good news: copyField does not increase index size in
> > some circumstances:
> > - the copied fields and the target field share the same datatype
> > - the target field is not stored
> > 
> > this is tested on text fields
> > 
> > 
> > On Wed, Dec 25, 2019 at 11:42:23AM +0100, Nicolas Paris wrote:
> >> 
> >> On Wed, Dec 25, 2019 at 05:30:03AM -0500, Dave wrote:
> >>> #2 you initially said you were talking about 1k documents. 
> >> 
> >> Hi Dave. Again, sorry for the confusion. This is 1k fields
> >> (general_text), over 50M large documents, copied into one _text_ field.
> >> 4 shards, 40GB per shard in both cases, with/without the _text_ field
> >> 
> >>>
> >>>> On Dec 25, 2019, at 3:07 AM, Nicolas Paris wrote:
> >>>>
> >>>>> If you are redoing the indexing after changing the schema and
> >>>>> reloading/restarting, then you can ignore me.
> >>>>
> >>>> I am sorry to say that I have to ignore you. Indeed, my tests include
> >>>> recreating the collection from scratch - with and without the copy
> >>>> fields.
> >>>> In both cases the index size is the same ! (while the _text_ field is
> >>>> working correctly)
> >>>>
> >>>>> On Tue, Dec 24, 2019 at 05:32:09PM -0700, Shawn Heisey wrote:
> >>>>>> On 12/24/2019 5:11 PM, Nicolas Paris wrote:
> >>>>>> Do you mean "copy fields" is only an action of changing the schema ?
> >>>>>> I was thinking it was adding a new field and eventually a new index to
> >>>>>> the collection
> >>>>>
> >>>>> The copy that copyField does happens at index time.  Reindexing is
> >>>>> required after changing the schema, or nothing happens.
> >>>>>
> >>>>> If you are redoing the indexing after changing the schema and
> >>>>> reloading/restarting, then you can ignore me.
> >>>>>
> >>>>> Thanks,
> >>>>> Shawn
> >>>>>
> >>>>
> >>>> --
> >>>> nicolas
> >>> 
> >> 
> >> -- 
> >> nicolas
> >> 
> > 
> > -- 
> > nicolas
> 

-- 
nicolas


Re: does copyFields increase index size?

2019-12-26 Thread Erick Erickson
This simply cannot be true unless the destination copyField is indexed=false,
docValues=false, stored=false. I.e. “some circumstances” means there’s really no
use in using the copyField in the first place. I suppose that if you don’t
store any term vectors or position information, nothing except, say, the terms,
then maybe you’ll have extremely minimal size. But even in that case, I’d use
the original field in an “fq” clause, which doesn’t use any scoring, in place of
using the copyField.

Each field is stored in a separate part of the relevant files (.tim, .pos, 
etc). Term frequencies are kept on a _per field_ basis for instance.
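
To make the per-field point concrete, here is a toy inverted index in which every field keeps its own term dictionary and postings. It is a deliberately simplified model and not Lucene's actual file layout; the field names, the sample documents and the whitespace tokenizer are all assumptions made for illustration.

```python
from collections import defaultdict

def build_index(docs, copy_to=None):
    """Toy inverted index: field -> term -> sorted list of doc ids.

    copy_to, if given, names a destination field that receives the terms
    of every source field, mimicking an indexed-only copyField target.
    """
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, fields in enumerate(docs):
        for field, text in fields.items():
            for term in text.lower().split():
                index[field][term].add(doc_id)
                if copy_to is not None:
                    index[copy_to][term].add(doc_id)
    return {f: {t: sorted(ids) for t, ids in terms.items()}
            for f, terms in index.items()}

def count_entries(index):
    """Return (number of per-field terms, number of postings) over all fields."""
    terms = sum(len(t) for t in index.values())
    postings = sum(len(ids) for t in index.values() for ids in t.values())
    return terms, postings

docs = [
    {"section_a": "heart failure", "section_b": "renal failure"},
    {"section_a": "heart surgery"},
]
print(count_entries(build_index(docs)))                    # (5, 6)
print(count_entries(build_index(docs, copy_to="_text_")))  # (9, 11)
```

In this model the destination field adds its own terms and postings on top of the source fields, so an indexed destination cannot be free, even though its real on-disk cost depends on compression and on which index options are enabled.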

So this pretty much has to be small sample size or other measurement error.

Best,
Erick

> On Dec 26, 2019, at 9:27 AM, Nicolas Paris  wrote:
> 
> Anyway, that's good news: copyField does not increase index size in
> some circumstances:
> - the copied fields and the target field share the same datatype
> - the target field is not stored
> 
> this is tested on text fields
> 
> 
> On Wed, Dec 25, 2019 at 11:42:23AM +0100, Nicolas Paris wrote:
>> 
>> On Wed, Dec 25, 2019 at 05:30:03AM -0500, Dave wrote:
>>> #2 you initially said you were talking about 1k documents. 
>> 
>> Hi Dave. Again, sorry for the confusion. This is 1k fields
>> (general_text), over 50M large documents, copied into one _text_ field.
>> 4 shards, 40GB per shard in both cases, with/without the _text_ field
>> 
>>> 
>>>> On Dec 25, 2019, at 3:07 AM, Nicolas Paris wrote:
>>>>
>>>>> If you are redoing the indexing after changing the schema and
>>>>> reloading/restarting, then you can ignore me.
>>>>
>>>> I am sorry to say that I have to ignore you. Indeed, my tests include
>>>> recreating the collection from scratch - with and without the copy
>>>> fields.
>>>> In both cases the index size is the same ! (while the _text_ field is
>>>> working correctly)
>>>>
>>>>> On Tue, Dec 24, 2019 at 05:32:09PM -0700, Shawn Heisey wrote:
>>>>>> On 12/24/2019 5:11 PM, Nicolas Paris wrote:
>>>>>> Do you mean "copy fields" is only an action of changing the schema ?
>>>>>> I was thinking it was adding a new field and eventually a new index to
>>>>>> the collection
>>>>>
>>>>> The copy that copyField does happens at index time.  Reindexing is
>>>>> required after changing the schema, or nothing happens.
>>>>>
>>>>> If you are redoing the indexing after changing the schema and
>>>>> reloading/restarting, then you can ignore me.
>>>>>
>>>>> Thanks,
>>>>> Shawn
>>>>>
>>>>
>>>> --
>>>> nicolas
>>> 
>> 
>> -- 
>> nicolas
>> 
> 
> -- 
> nicolas



Re: does copyFields increase index size?

2019-12-26 Thread Nicolas Paris
Anyway, that's good news: copyField does not increase index size in
some circumstances:
- the copied fields and the target field share the same datatype
- the target field is not stored

This was tested on text fields.


On Wed, Dec 25, 2019 at 11:42:23AM +0100, Nicolas Paris wrote:
> 
> On Wed, Dec 25, 2019 at 05:30:03AM -0500, Dave wrote:
> > #2 you initially said you were talking about 1k documents. 
> 
> Hi Dave. Again, sorry for the confusion. This is 1k fields
> (general_text), over 50M large documents, copied into one _text_ field.
> 4 shards, 40GB per shard in both cases, with/without the _text_ field
> 
> > 
> > > On Dec 25, 2019, at 3:07 AM, Nicolas Paris  
> > > wrote:
> > > 
> > > 
> > >> 
> > >> If you are redoing the indexing after changing the schema and
> > >> reloading/restarting, then you can ignore me.
> > > 
> > > I am sorry to say that I have to ignore you. Indeed, my tests include
> > > recreating the collection from scratch - with and without the copy
> > > fields.
> > > In both cases the index size is the same ! (while the _text_ field is
> > > working correctly)
> > > 
> > >> On Tue, Dec 24, 2019 at 05:32:09PM -0700, Shawn Heisey wrote:
> > >>> On 12/24/2019 5:11 PM, Nicolas Paris wrote:
> > >>> Do you mean "copy fields" is only an action of changing the schema ?
> > >>> I was thinking it was adding a new field and eventually a new index to
> > >>> the collection
> > >> 
> > >> The copy that copyField does happens at index time.  Reindexing is 
> > >> required
> > >> after changing the schema, or nothing happens.
> > >> 
> > >> If you are redoing the indexing after changing the schema and
> > >> reloading/restarting, then you can ignore me.
> > >> 
> > >> Thanks,
> > >> Shawn
> > >> 
> > > 
> > > -- 
> > > nicolas
> > 
> 
> -- 
> nicolas
> 

-- 
nicolas


Re: does copyFields increase index size?

2019-12-25 Thread Nicolas Paris


On Wed, Dec 25, 2019 at 05:30:03AM -0500, Dave wrote:
> #2 you initially said you were talking about 1k documents. 

Hi Dave. Again, sorry for the confusion. This is 1k fields
(general_text), over 50M large documents, copied into one _text_ field.
4 shards, 40GB per shard in both cases, with/without the _text_ field
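
For a sense of scale, the figures above can be turned into a total and a per-document average. This is back-of-the-envelope arithmetic only: it assumes decimal gigabytes, exactly 50M documents, and that the 40GB figure covers one replica per shard, none of which is stated in the thread.

```python
SHARDS = 4
GB_PER_SHARD = 40          # figure reported in the thread
DOCS = 50_000_000          # "50M" documents, taken as exact for the estimate

# Assumes decimal GB and a single replica per shard.
total_bytes = SHARDS * GB_PER_SHARD * 10**9
bytes_per_doc = total_bytes / DOCS

print(f"total index size: {total_bytes / 10**9:.0f} GB")
print(f"average per document: {bytes_per_doc:.0f} bytes")
```

If the reported size is rounded to whole gigabytes (again an assumption), any change smaller than that granularity per shard would not show up in the comparison.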

> 
> > On Dec 25, 2019, at 3:07 AM, Nicolas Paris  wrote:
> > 
> > 
> >> 
> >> If you are redoing the indexing after changing the schema and
> >> reloading/restarting, then you can ignore me.
> > 
> > I am sorry to say that I have to ignore you. Indeed, my tests include
> > recreating the collection from scratch - with and without the copy
> > fields.
> > In both cases the index size is the same ! (while the _text_ field is
> > working correctly)
> > 
> >> On Tue, Dec 24, 2019 at 05:32:09PM -0700, Shawn Heisey wrote:
> >>> On 12/24/2019 5:11 PM, Nicolas Paris wrote:
> >>> Do you mean "copy fields" is only an action of changing the schema ?
> >>> I was thinking it was adding a new field and eventually a new index to
> >>> the collection
> >> 
> >> The copy that copyField does happens at index time.  Reindexing is required
> >> after changing the schema, or nothing happens.
> >> 
> >> If you are redoing the indexing after changing the schema and
> >> reloading/restarting, then you can ignore me.
> >> 
> >> Thanks,
> >> Shawn
> >> 
> > 
> > -- 
> > nicolas
> 

-- 
nicolas


Re: does copyFields increase index size?

2019-12-25 Thread Dave
#1 merry Xmas thing
#2 you initially said you were talking about 1k documents. That is not a
large enough sample to see index size differences with this new field, and
in any case the index size should never really matter. But if you go to a few
million you will notice the size has increased by a good amount. Other things
come into play, like whether the index was wiped clean with a commit before
indexing or was reindexed without one, or whether we are talking about
documents that share a lot of words, so many other scenarios can increase or
decrease the index size. But no matter what, if you have a copy field, the
text is going somewhere.

> On Dec 25, 2019, at 3:07 AM, Nicolas Paris  wrote:
> 
> 
>> 
>> If you are redoing the indexing after changing the schema and
>> reloading/restarting, then you can ignore me.
> 
> I am sorry to say that I have to ignore you. Indeed, my tests include
> recreating the collection from scratch - with and without the copy
> fields.
> In both cases the index size is the same ! (while the _text_ field is
> working correctly)
> 
>> On Tue, Dec 24, 2019 at 05:32:09PM -0700, Shawn Heisey wrote:
>>> On 12/24/2019 5:11 PM, Nicolas Paris wrote:
>>> Do you mean "copy fields" is only an action of changing the schema ?
>>> I was thinking it was adding a new field and eventually a new index to
>>> the collection
>> 
>> The copy that copyField does happens at index time.  Reindexing is required
>> after changing the schema, or nothing happens.
>> 
>> If you are redoing the indexing after changing the schema and
>> reloading/restarting, then you can ignore me.
>> 
>> Thanks,
>> Shawn
>> 
> 
> -- 
> nicolas


Re: does copyFields increase index size?

2019-12-25 Thread Nicolas Paris
> If you are redoing the indexing after changing the schema and
> reloading/restarting, then you can ignore me.

I am sorry to say that I have to ignore you. Indeed, my tests include
recreating the collection from scratch - with and without the copy
fields.
In both cases the index size is the same ! (while the _text_ field is
working correctly)

On Tue, Dec 24, 2019 at 05:32:09PM -0700, Shawn Heisey wrote:
> On 12/24/2019 5:11 PM, Nicolas Paris wrote:
> > Do you mean "copy fields" is only an action of changing the schema ?
> > I was thinking it was adding a new field and eventually a new index to
> > the collection
> 
> The copy that copyField does happens at index time.  Reindexing is required
> after changing the schema, or nothing happens.
> 
> If you are redoing the indexing after changing the schema and
> reloading/restarting, then you can ignore me.
> 
> Thanks,
> Shawn
> 

-- 
nicolas


Re: does copyFields increase index size?

2019-12-24 Thread Shawn Heisey

On 12/24/2019 5:11 PM, Nicolas Paris wrote:

> Do you mean "copy fields" is only an action of changing the schema ?
> I was thinking it was adding a new field and eventually a new index to
> the collection


The copy that copyField does happens at index time.  Reindexing is 
required after changing the schema, or nothing happens.


If you are redoing the indexing after changing the schema and 
reloading/restarting, then you can ignore me.
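
The index-time behaviour described above can be mimicked with a small simulation. It is a sketch under simplifying assumptions: the helper name is made up, and a plain prefix match stands in for glob patterns such as section*.

```python
def apply_copy_fields(doc, copy_rules):
    """Return the fields indexed for doc, adding copyField destinations.

    copy_rules is a list of (source_prefix, dest) pairs; the prefix match is a
    simplification of glob patterns such as "section*".
    """
    indexed = dict(doc)
    for source_prefix, dest in copy_rules:
        values = [v for k, v in doc.items() if k.startswith(source_prefix)]
        if values:
            indexed[dest] = values
    return indexed

doc = {"id": "1", "section_intro": "heart failure"}

# Indexed under the old schema: no copy rule, so no _text_ entry is written.
index = {doc["id"]: apply_copy_fields(doc, copy_rules=[])}

# Changing the "schema" alone does not touch what is already indexed.
copy_rules = [("section", "_text_")]
assert "_text_" not in index["1"]

# Only re-indexing the document applies the new rule.
index[doc["id"]] = apply_copy_fields(doc, copy_rules)
assert index["1"]["_text_"] == ["heart failure"]
```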


Thanks,
Shawn


Re: does copyFields increase index size?

2019-12-24 Thread Nicolas Paris
> The action of changing the schema makes zero changes in the index.  It
> merely changes how Solr interacts with the index.

Do you mean "copy fields" is only an action of changing the schema ?
I was thinking it was adding a new field and eventually a new index to
the collection

On Tue, Dec 24, 2019 at 10:59:03AM -0700, Shawn Heisey wrote:
> On 12/24/2019 10:45 AM, Nicolas Paris wrote:
> > From my understanding, copyField creates a new index from the copied
> > fields.
> > In my tests, I copied 1k text fields into _text_ with copyField.
> > As a result, there is no increase in the size of the collection. All the
> > source fields are indexed and stored. The _text_ field is indexed but
> > not stored.
> >
> > This is a great surprise, but is this behavior expected?
> 
> The action of changing the schema makes zero changes in the index.  It
> merely changes how Solr interacts with the index.
> 
> If you want the index to change when the schema is changed, you need to
> restart or reload and then re-do the indexing after the change is saved.
> 
> https://cwiki.apache.org/confluence/display/solr/HowToReindex
> 
> Thanks,
> Shawn
> 

-- 
nicolas


Re: does copyFields increase index size?

2019-12-24 Thread Shawn Heisey

On 12/24/2019 10:45 AM, Nicolas Paris wrote:

> From my understanding, copyField creates a new index from the copied
> fields.
> In my tests, I copied 1k text fields into _text_ with copyField.
> As a result, there is no increase in the size of the collection. All the
> source fields are indexed and stored. The _text_ field is indexed but
> not stored.
>
> This is a great surprise, but is this behavior expected?


The action of changing the schema makes zero changes in the index.  It 
merely changes how Solr interacts with the index.


If you want the index to change when the schema is changed, you need to 
restart or reload and then re-do the indexing after the change is saved.


https://cwiki.apache.org/confluence/display/solr/HowToReindex
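
For a managed schema, one way to make the schema change itself is the Schema API's add-copy-field command; the re-indexing step described above is still needed afterwards. The sketch below only builds the request body. The helper name, host and collection are placeholders, and nothing in the thread says the schema was edited this way rather than by hand.

```python
import json

def add_copy_field_command(source, dest, max_chars=None):
    """Build the JSON body for a Schema API add-copy-field command."""
    rule = {"source": source, "dest": dest}
    if max_chars is not None:
        rule["maxChars"] = max_chars
    return json.dumps({"add-copy-field": rule})

body = add_copy_field_command("section*", "_text_")
print(body)
# POST this body with Content-Type: application/json to
#   http://localhost:8983/solr/<collection>/schema   (host and collection are placeholders)
# then re-index every document so the destination field is populated.
```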

Thanks,
Shawn


does copyFields increase index size?

2019-12-24 Thread Nicolas Paris
Hi

From my understanding, copyField creates a new index from the copied
fields.
In my tests, I copied 1k text fields into _text_ with copyField.
As a result, there is no increase in the size of the collection. All the
source fields are indexed and stored. The _text_ field is indexed but
not stored.

This is a great surprise, but is this behavior expected?


-- 
nicolas