Re: SolrInputDocument setField method

2019-06-26 Thread Mark Sholund
I noticed this yesterday as well. The toString() and jsonStr() (in later 
versions) of SolrJ both include things like

toString():
{id=id=foo123, ...}
or
jsonStr():
{"id":"id=foo123",...}

However Solr does not reject the documents so this must just be an issue with 
the two methods.
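
A minimal sketch of why the name can show up twice in printed output without being sent twice. It assumes the document keeps its fields in a map keyed by field name, and that each field prints itself as name=value. The ToStringSketch and Field classes are simplified stand-ins for illustration, not the SolrJ classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified stand-ins for illustration; these are not the SolrJ classes.
final class ToStringSketch {

    // Hypothetical field type that remembers its own name.
    static final class Field {
        final String name;
        Object value;

        Field(String name) { this.name = name; }

        @Override
        public String toString() { return name + "=" + value; }
    }

    static String render() {
        Map<String, Field> fields = new LinkedHashMap<>();
        Field id = new Field("id");
        id.value = "foo123";
        fields.put(id.name, id);
        // The map prints each entry as key=value, and the value already
        // prints itself as name=value, so the name appears twice.
        return fields.toString();
    }

    public static void main(String[] args) {
        System.out.println(render()); // {id=id=foo123}
    }
}
```

If that assumption holds, the doubled name is only an artifact of printing a map whose values also print their own name.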

On Wed, Jun 26, 2019 at 12:31 PM, Samuel Kasimalla  wrote:

> Hi Vincenzo,
>
> Maybe looking at the overridden toString() would give you a clue.
>
> As for the second part, I don't think SolrJ holds it twice (if you are worried
> about redundant usage of memory), BUT if you haven't used SolrJ so far and
> wanted to know if this is the format in which it pushes to Solr, I'm pretty
> sure it doesn't push this format into Solr.
>
> Thanks,
> Sam
> https://www.linkedin.com/in/skasimalla
>
> On Wed, Jun 26, 2019 at 11:52 AM Vincenzo D'Amore 
> wrote:
>
>> Hi all,
>>
>> I have a very basic question related to the SolrInputDocument behaviour.
>>
>> Looking at SolrInputDocument source code I found how the method setField
>> works:
>>
>>   public void setField(String name, Object value )
>>   {
>>     SolrInputField field = new SolrInputField( name );
>>     _fields.put( name, field );
>>     field.setValue( value );
>>   }
>>
>> The field name is "duplicated" into the SolrInputField.
>>
>> For example, if I'm storing a field "color" with value "red" what we have
>> is a Map like this:
>>
>> { "key" : "color", "value" : { "name" : "color", "value" : "red" } }
>>
>> the name field "color" appears twice. Very likely there is a reason for
>> this, could you please point me in the right direction?
>>
>> For example, I'm worried about what happens with SolrJ when I'm sending
>> a lot of documents, where for each field the fieldName is sent twice.
>>
>> Thanks,
>> Vincenzo
>>
>>
>> --
>> Vincenzo D'Amore

Re: SolrInputDocument setField method

2019-06-26 Thread Shawn Heisey

On 6/26/2019 9:52 AM, Vincenzo D'Amore wrote:
> I have a very basic question related to the SolrInputDocument behaviour.
>
> Looking at SolrInputDocument source code I found how the method setField
> works:
>
>   public void setField(String name, Object value )
>   {
>     SolrInputField field = new SolrInputField( name );
>     _fields.put( name, field );
>     field.setValue( value );
>   }
>
> The field name is "duplicated" into the SolrInputField.


This creates an entirely new SolrInputField object, one that does not 
yet have a value.  It then puts that object into the map of all fields 
for this document, and finally assigns the value directly to the field 
object, which is already inside the map.


Side note:  The "put" method used there will replace any existing field 
with the same name.  The old field object will then likely have no 
remaining references, so the garbage collector will eventually collect 
that object and its component objects.
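
The steps above can be sketched with plain Java collections. This is only an illustration under the assumption that the fields live in a map keyed by name. SetFieldSketch, Doc and Field are hypothetical stand-ins, not the SolrJ classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified stand-ins for illustration; these are not the SolrJ classes.
final class SetFieldSketch {

    static final class Field {
        final String name;
        Object value;

        Field(String name) { this.name = name; }
    }

    static final class Doc {
        final Map<String, Field> fields = new LinkedHashMap<>();

        // Same three steps as the quoted setField.
        void setField(String name, Object value) {
            Field field = new Field(name); // new field, no value yet
            fields.put(name, field);       // replaces any previous entry for this name
            field.value = value;           // visible through the map: same object
        }
    }

    public static void main(String[] args) {
        Doc doc = new Doc();
        doc.setField("color", "red");
        Field first = doc.fields.get("color");

        doc.setField("color", "blue");
        Field second = doc.fields.get("color");

        System.out.println(doc.fields.size()); // 1
        System.out.println(second.value);      // blue
        System.out.println(first == second);   // false
    }
}
```

In main the old object is still reachable through the local variable first. Once no such reference remains, it becomes eligible for collection.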


The only duplication I can see here is that both the inner field object 
and the outer map contain the name of the field.  Unless you have a 
really huge number of fields, this would not have a significant impact 
on the amount of memory required.
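
A related detail, sketched below: the quoted setField passes the same name reference to both the constructor and put. Assuming the field object simply stores that reference (an assumption about the constructor, which is not shown in this thread), the map key and the field's name are one String object, so the extra cost per field is a reference rather than a second copy of the characters. SharedNameSketch and Field are hypothetical stand-ins.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for illustration; these are not the SolrJ classes.
final class SharedNameSketch {

    static final class Field {
        final String name;

        // Assumption: the constructor keeps the reference it is given.
        Field(String name) { this.name = name; }
    }

    static boolean keyAndNameAreSameObject(String name) {
        Map<String, Field> fields = new HashMap<>();
        Field field = new Field(name);
        fields.put(name, field);
        String key = fields.keySet().iterator().next();
        return key == field.name; // reference comparison on purpose
    }

    public static void main(String[] args) {
        // new String(...) avoids relying on string-literal interning.
        System.out.println(keyAndNameAreSameObject(new String("color"))); // true
    }
}
```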


The map object (_fields) that basically represents the whole document 
needs *something* to use as the key for each entry.  The field name is 
convenient and relevant.  It is also usually a fairly short string.


It is likely that other code that uses a SolrInputField object will only 
have that object, not the map, so the name of the field must be in the 
field object.


It is probably possible to achieve slightly better memory efficiency by 
switching the internal implementation from Map to List or Set ... but it 
would make SolrInputDocument MUCH less efficient in other ways, 
including the setField method you have quoted above.  I do not think it 
would be a worthwhile trade.
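
To illustrate that trade, here is a rough sketch of what a List-backed document might look like. ListBackedDocSketch and Field are hypothetical names and this is not how SolrJ is implemented. The point is only that without a map, setField has to scan for an existing field with the same name before it can replace it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical List-backed variant, for comparison only; not SolrJ code.
final class ListBackedDocSketch {

    static final class Field {
        final String name;
        Object value;

        Field(String name) { this.name = name; }
    }

    final List<Field> fields = new ArrayList<>();

    // Replacing a field needs a linear scan; a map does this with one put.
    void setField(String name, Object value) {
        Field field = new Field(name);
        field.value = value;
        for (int i = 0; i < fields.size(); i++) {
            if (fields.get(i).name.equals(name)) {
                fields.set(i, field);
                return;
            }
        }
        fields.add(field);
    }

    // Lookup by name is a linear scan as well.
    Field getField(String name) {
        for (Field f : fields) {
            if (f.name.equals(name)) {
                return f;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ListBackedDocSketch doc = new ListBackedDocSketch();
        doc.setField("color", "red");
        doc.setField("size", "large");
        doc.setField("color", "green");
        System.out.println(doc.fields.size());           // 2
        System.out.println(doc.getField("color").value); // green
    }
}
```

The list does drop the separate key per entry, but every set or get by name now costs time proportional to the number of fields.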


Thanks,
Shawn


Re: SolrInputDocument setField method

2019-06-26 Thread Samuel Kasimalla
Hi Vincenzo,

Maybe looking at the overridden toString() would give you a clue.

As for the second part, I don't think SolrJ holds it twice (if you are worried
about redundant usage of memory), BUT if you haven't used SolrJ so far and
wanted to know if this is the format in which it pushes to Solr, I'm pretty
sure it doesn't push this format into Solr.

Thanks,
Sam
https://www.linkedin.com/in/skasimalla

On Wed, Jun 26, 2019 at 11:52 AM Vincenzo D'Amore 
wrote:

> Hi all,
>
> I have a very basic question related to the SolrInputDocument behaviour.
>
> Looking at SolrInputDocument source code I found how the method setField
> works:
>
>   public void setField(String name, Object value )
>   {
>     SolrInputField field = new SolrInputField( name );
>     _fields.put( name, field );
>     field.setValue( value );
>   }
>
> The field name is "duplicated" into the SolrInputField.
>
> For example, if I'm storing a field "color" with value "red"  what we have
> is a Map like this:
>
> { "key" : "color", "value" : { "name" : "color", "value" : "red" } }
>
> the name field "color" appears twice. Very likely there is a reason for
> this, could you please point me in the right direction?
>
> For example, I'm worried about what happens with SolrJ when I'm sending
> a lot of documents, where for each field the fieldName is sent twice.
>
> Thanks,
> Vincenzo
>
>
> --
> Vincenzo D'Amore
>


SolrInputDocument setField method

2019-06-26 Thread Vincenzo D'Amore
Hi all,

I have a very basic question related to the SolrInputDocument behaviour.

Looking at SolrInputDocument source code I found how the method setField
works:

  public void setField(String name, Object value )
  {
    SolrInputField field = new SolrInputField( name );
    _fields.put( name, field );
    field.setValue( value );
  }

The field name is "duplicated" into the SolrInputField.

For example, if I'm storing a field "color" with value "red"  what we have
is a Map like this:

{ "key" : "color", "value" : { "name" : "color", "value" : "red" } }

the name field "color" appears twice. Very likely there is a reason for
this, could you please point me in the right direction?

For example, I'm worried about what happens with SolrJ when I'm sending
a lot of documents, where for each field the fieldName is sent twice.

Thanks,
Vincenzo


-- 
Vincenzo D'Amore