[jira] Updated: (SOLR-272) SolrDocument performance testing

2007-06-28 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-272:
---

Attachment: SolrInputDoc.patch

This is an alternative version of SolrDocument that only creates Collections 
for multivalued fields. The one big difference from Yonik's suggestion above is 
that it returns a Collection for getFieldValues() even if it is a 
single-valued field.
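
A minimal sketch of the idea described above, not the code in SolrInputDoc.patch: a field holds a bare Object until a second value arrives, and getFieldValues() always hands back a Collection. The class name LazyCollectionDoc and the method bodies are assumptions made for illustration, and the sketch assumes that a field value is never itself a List.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the class from SolrInputDoc.patch.
class LazyCollectionDoc {
    // Each entry is either a bare Object (one value) or a List<Object> (several).
    private final Map<String, Object> fields = new LinkedHashMap<>();

    // Append a value, promoting the entry to a List only when a second value arrives.
    @SuppressWarnings("unchecked")
    void addField(String name, Object value) {
        Object existing = fields.get(name);
        if (existing == null) {
            fields.put(name, value);
        } else if (existing instanceof List) {
            ((List<Object>) existing).add(value);
        } else {
            List<Object> values = new ArrayList<>();
            values.add(existing);
            values.add(value);
            fields.put(name, values);
        }
    }

    // Always returns a Collection, wrapping a single value on demand.
    @SuppressWarnings("unchecked")
    Collection<Object> getFieldValues(String name) {
        Object existing = fields.get(name);
        if (existing == null) {
            return Collections.emptyList();
        }
        if (existing instanceof List) {
            return (List<Object>) existing;
        }
        return Collections.singletonList(existing);
    }

    public static void main(String[] args) {
        LazyCollectionDoc doc = new LazyCollectionDoc();
        doc.addField("id", "1");
        doc.addField("subject", "a");
        doc.addField("subject", "b");
        System.out.println(doc.getFieldValues("id"));      // single value, wrapped
        System.out.println(doc.getFieldValues("subject")); // two values, stored as a List
    }
}
```

The instanceof checks on every access are the kind of bookkeeping that the complexity concern in this comment refers to.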

Running the perf test for 1M docs 5 times for each implementation:

[1000000] SolrInputDocument:   9992  9827  9823  9854  9948
[1000000] SolrInputDocument2:  9636  9719  9699  9807  9729
[1000000] DocumentBuilder:     8866  8818  8946  8812  8953

To be honest, I'm not sure the complexity of dealing with a Map whose values 
are plain Objects (each of which may or may not be a Collection) is worth the 
marginal speedup.  I suppose if the docs are all single-valued it would be a 
more substantial difference.
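
To put a number on "marginal", here is a small sketch that averages the five timings quoted above and reports the relative difference. The class and method names are made up for illustration; only the timings come from this comment. Averaged over the five runs, SolrInputDocument2 comes out roughly 1.7% faster than SolrInputDocument, and DocumentBuilder roughly 10% faster.

```java
import java.util.Arrays;

// Summarizes the five timings per implementation quoted in this comment.
class TimingSummary {
    // Arithmetic mean of the runs.
    static double mean(long[] runs) {
        return Arrays.stream(runs).average().orElse(Double.NaN);
    }

    // Percentage by which 'candidate' is faster than 'baseline' (positive = faster).
    static double percentFaster(long[] baseline, long[] candidate) {
        double base = mean(baseline);
        return 100.0 * (base - mean(candidate)) / base;
    }

    public static void main(String[] args) {
        long[] solrInputDocument  = {9992, 9827, 9823, 9854, 9948};
        long[] solrInputDocument2 = {9636, 9719, 9699, 9807, 9729};
        long[] documentBuilder    = {8866, 8818, 8946, 8812, 8953};

        System.out.printf("SolrInputDocument2 vs SolrInputDocument: %.2f%% faster%n",
                percentFaster(solrInputDocument, solrInputDocument2));
        System.out.printf("DocumentBuilder vs SolrInputDocument: %.2f%% faster%n",
                percentFaster(solrInputDocument, documentBuilder));
    }
}
```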

> SolrDocument performance testing
> 
>
> Key: SOLR-272
> URL: https://issues.apache.org/jira/browse/SOLR-272
> Project: Solr
> Issue Type: Test
> Affects Versions: 1.3
> Reporter: Ryan McKinley
> Attachments: SOLR-272-SolrDocumentPerformanceTesting.patch, 
> SOLR-272-SolrDocumentPerformanceTesting.patch, 
> SolrDocumentPerformanceTester.java, SolrDocumentPerformanceTester.java, 
> SolrInputDoc.patch, SolrInputDoc.patch
>
>
> In 1.3, we added SolrInputDocument -- a temporary class to hold document 
> information.  There is concern that this may be less than ideal 
> performance-wise.
> To settle some concerns (mine included) I want to compare a few SolrDocument 
> implementations to make sure we are not doing something crazy.
> I implemented a LuceneInputDocument subclass of SolrInputDocument that stores 
> its values directly in a Lucene Document (rather than a Map).
> This is a quick test comparing:
> 1. Building documents with SolrInputDocument 
> 2. Building documents with LuceneInputDocument (same interface writing 
> directly to Document)
> 3. Building documents with DocumentBuilder (Solr 1.2, Solr 1.1)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-272) SolrDocument performance testing

2007-06-27 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-272:
--

Attachment: SolrDocumentPerformanceTester.java

Attaching the modified test program I used.
I modified it to accept separate counts and to do separate runs for the 
different implementations.
For example, 10 0 0 and 0 0 1
This was to avoid any GC effects carrying over from one implementation to 
another, and to avoid HotSpot optimizing for one code path and then having a 
different implementation switch to a different path.
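
A rough sketch of that arrangement, not the attached SolrDocumentPerformanceTester.java: one count per implementation on the command line, with a zero meaning "skip", so each implementation can be timed in its own JVM invocation. The class name, the implA/implB/implC labels and the placeholder workload are all assumptions.

```java
import java.util.function.IntConsumer;

// Hypothetical harness; run it once per implementation with the other counts set to 0
// so that GC pressure and JIT state from one run cannot influence another.
class SeparateRuns {
    // Parses exactly one non-negative document count per implementation.
    static int[] parseCounts(String[] args, int implementations) {
        if (args.length != implementations) {
            throw new IllegalArgumentException("expected " + implementations + " counts");
        }
        int[] counts = new int[implementations];
        for (int i = 0; i < implementations; i++) {
            counts[i] = Integer.parseInt(args[i]);
            if (counts[i] < 0) {
                throw new IllegalArgumentException("negative count: " + args[i]);
            }
        }
        return counts;
    }

    // Runs the workload 'count' times and returns the elapsed milliseconds.
    static long timeRun(int count, IntConsumer buildDoc) {
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            buildDoc.accept(i);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        String[] names = {"implA", "implB", "implC"}; // placeholder labels
        if (args.length != names.length) {
            System.err.println("usage: SeparateRuns <countA> <countB> <countC>");
            return;
        }
        int[] counts = parseCounts(args, names.length);
        for (int i = 0; i < names.length; i++) {
            if (counts[i] == 0) {
                continue; // implementation not selected for this invocation
            }
            // Placeholder workload; a real test would build a document here.
            long ms = timeRun(counts[i], n -> String.valueOf(n).hashCode());
            System.out.println("[" + counts[i] + "] " + ms + " ms :: " + names[i]);
        }
    }
}
```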

The SolrInputDocument builder also needed that change from setField to addField 
to be equivalent.




[jira] Updated: (SOLR-272) SolrDocument performance testing

2007-06-26 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-272:
--

Attachment: SolrInputDoc.patch

> With this test, the SolrInputDocument wins every time

Not once you correct the bugs ;-)

- copyField was not being done in the SolrInputDocument version
- setField was being used for the multiValued field instead of addField, 
resulting in fewer fields.
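
To illustrate the second bug, here is a stand-in class (not Solr's SolrInputDocument) in which setField replaces a field's values and addField appends to them. Calling setField in a loop over ten subjects leaves one value, so that version of the test was indexing less data. The class name and valueCount helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in for a document class with replace-vs-append field semantics.
class FieldSemantics {
    private final Map<String, List<Object>> fields = new LinkedHashMap<>();

    // Replaces any existing values for the field.
    void setField(String name, Object value) {
        List<Object> values = new ArrayList<>();
        values.add(value);
        fields.put(name, values);
    }

    // Appends to the existing values for the field.
    void addField(String name, Object value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    // Number of values currently stored for the field.
    int valueCount(String name) {
        List<Object> values = fields.get(name);
        return values == null ? 0 : values.size();
    }

    public static void main(String[] args) {
        FieldSemantics buggy = new FieldSemantics();
        FieldSemantics fixed = new FieldSemantics();
        for (int i = 0; i < 10; i++) {
            buggy.setField("subject", "subject " + i); // overwrites each time
            fixed.addField("subject", "subject " + i); // accumulates
        }
        System.out.println(buggy.valueCount("subject")); // prints 1
        System.out.println(fixed.valueCount("subject")); // prints 10
    }
}
```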

I modified the schema (didn't work out of the box) and removed everything that 
didn't have to do with the fields in the document (partially because copyField 
wasn't implemented).

On my P4, SolrInputDocument comes in at 14% slower.  I don't know how it 
would be with all the copyField and dynamicField stuff in there.  There are 
certainly scenarios where it could be faster since it can do a single lookup for 
a multivalued field.






[jira] Updated: (SOLR-272) SolrDocument performance testing

2007-06-24 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-272:
---

Attachment: SolrDocumentPerformanceTester.java

Since the LuceneInputDocument is an obvious loser, I removed that from the 
test.  

I also:
* removed Random from the mix -- makes the tests inconsistent
* test simple and complex docs.
  > simple is just the id
  > complex is id + name + dynamic field + 10 subjects; the subjects each have 
    a copyField to 'text'
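
For illustration, the two document shapes could be generated deterministically along these lines. This is a sketch with plain maps rather than the tester's actual code; the dynamic field name popularity_i and the value formats are assumptions, and the copyField to 'text' is a schema matter that is not modelled here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical fixture builder: values derive from the document number, never from Random,
// so every run builds exactly the same documents.
class TestDocs {
    // "simple": just the id.
    static Map<String, List<String>> simple(int n) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        add(doc, "id", "doc-" + n);
        return doc;
    }

    // "complex": id + name + one dynamic field + 10 subjects.
    static Map<String, List<String>> complex(int n) {
        Map<String, List<String>> doc = simple(n);
        add(doc, "name", "name " + n);
        add(doc, "popularity_i", String.valueOf(n % 100)); // assumed dynamic field name
        for (int s = 0; s < 10; s++) {
            add(doc, "subject", "subject " + s);
        }
        return doc;
    }

    // Appends one value to a field, creating the value list on first use.
    static void add(Map<String, List<String>> doc, String field, String value) {
        doc.computeIfAbsent(field, k -> new ArrayList<>()).add(value);
    }

    public static void main(String[] args) {
        System.out.println(simple(1));
        System.out.println(complex(1));
    }
}
```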

With this test, the SolrInputDocument wins every time:  

[100000]    2043 :: 0.02043 mili/doc :: SolrInputDocument - true
[100000]    2193 :: 0.02193 mili/doc :: DocumentBuilder - true
[1000000]  15815 :: 0.015815 mili/doc :: SolrInputDocument - true
[1000000]  19223 :: 0.019223 mili/doc :: DocumentBuilder - true
[10000000]  6228 :: 0.000623 mili/doc :: SolrInputDocument - false
[10000000] 17263 :: 0.001726 mili/doc :: DocumentBuilder - false
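
As a sanity check on these lines, the per-document figure matches the first number divided by the document count, assuming the first number is total elapsed milliseconds (the output does not label it) and that the three pairs of runs covered 100,000, 1,000,000 and 10,000,000 documents. A small sketch of that arithmetic, with a made-up class name:

```java
// Reproduces the per-document cost from a total time and a document count.
class PerDocCost {
    // Milliseconds per document; the count must be positive.
    static double millisPerDoc(long totalMillis, long docCount) {
        if (docCount <= 0) {
            throw new IllegalArgumentException("docCount must be positive");
        }
        return (double) totalMillis / docCount;
    }

    public static void main(String[] args) {
        // Totals taken from the output above; counts are the assumed run sizes.
        long[][] runs = {
            {2043, 100_000}, {2193, 100_000},
            {15815, 1_000_000}, {19223, 1_000_000},
            {6228, 10_000_000}, {17263, 10_000_000},
        };
        for (long[] run : runs) {
            System.out.println(run[0] + " ms / " + run[1] + " docs = "
                    + millisPerDoc(run[0], run[1]) + " ms/doc");
        }
    }
}
```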






[jira] Updated: (SOLR-272) SolrDocument performance testing

2007-06-24 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-272:
---

Attachment: SOLR-272-SolrDocumentPerformanceTesting.patch

dooh.  I was not resetting the time after each run
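
The patch itself is not shown here, so the following is only one plausible shape of that mistake and its fix: if the start time is taken once before the loop, every later run is charged for all the earlier ones. The class name and the injectable clock are assumptions, added so the behaviour can be checked without real timing.

```java
import java.util.function.LongSupplier;

// Times several runs, taking a fresh start reading for each one.
class TimerReset {
    static long[] timeRuns(Runnable[] runs, LongSupplier clock) {
        long[] elapsed = new long[runs.length];
        for (int i = 0; i < runs.length; i++) {
            long start = clock.getAsLong(); // reset for every run, not once before the loop
            runs[i].run();
            elapsed[i] = clock.getAsLong() - start;
        }
        return elapsed;
    }

    public static void main(String[] args) {
        Runnable work = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            if (sum < 0) {
                System.out.println(sum); // never true; keeps the loop from being trivially dead
            }
        };
        long[] nanos = timeRuns(new Runnable[] {work, work}, System::nanoTime);
        System.out.println("run 1: " + nanos[0] + " ns, run 2: " + nanos[1] + " ns");
    }
}
```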




[jira] Updated: (SOLR-272) SolrDocument performance testing

2007-06-24 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-272:
---

Attachment: SOLR-272-SolrDocumentPerformanceTesting.patch

Contains:
* LuceneInputDocument
* changed tests to use this impl (and still pass)
* a simple comparison test (far from a perfect representation)
