[jira] Updated: (SOLR-272) SolrDocument performance testing
[ https://issues.apache.org/jira/browse/SOLR-272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated SOLR-272:
---
    Attachment: SolrInputDoc.patch

This is an alternative version of SolrDocument that only creates Collections for multivalued fields.

The one big difference from Yonik's suggestion above is that getFieldValues() returns a Collection even for a single-valued field.

Running the perf test for 1M docs 5 times for each implementation:

[100] SolrInputDocument:  9992 9827 9823 9854 9948
[100] SolrInputDocument2: 9636 9719 9699 9807 9729
[100] DocumentBuilder:    8866 8818 8946 8812 8953

To be honest, I'm not sure the complexity of dealing with a Map (where the Object may or may not be a Collection) is worth the marginal speedup. I suppose if the docs are all single valued it would be a more substantial difference.

> SolrDocument performance testing
>
>                 Key: SOLR-272
>                 URL: https://issues.apache.org/jira/browse/SOLR-272
>             Project: Solr
>          Issue Type: Test
>    Affects Versions: 1.3
>            Reporter: Ryan McKinley
>         Attachments: SOLR-272-SolrDocumentPerformanceTesting.patch,
> SOLR-272-SolrDocumentPerformanceTesting.patch,
> SolrDocumentPerformanceTester.java, SolrDocumentPerformanceTester.java,
> SolrInputDoc.patch, SolrInputDoc.patch
>
> In 1.3, we added SolrInputDocument -- a temporary class to hold document
> information. There is concern that this may be less than ideal
> performance-wise.
> To settle some concerns (mine included) I want to compare a few SolrDocument
> implementations to make sure we are not doing something crazy.
> I implemented a LuceneInputDocument subclass of SolrInputDocument that stores
> its values directly in a Lucene Document (rather than a Map).
> This is a quick test comparing:
> 1. Building documents with SolrInputDocument
> 2. Building documents with LuceneInputDocument (same interface writing
> directly to Document)
> 3. using DocumentBuilder (solr 1.2, solr 1.1)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
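The storage idea described in this message (a Map whose value is a bare Object for one value and a collection only once a second value arrives) can be sketched roughly as follows. This is an illustration only, not the code in SolrInputDoc.patch; the class name `LazyMultiValueDoc` and the private `Values` marker type are hypothetical, and null values are assumed to be disallowed.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical class for illustration; not part of Solr.
class LazyMultiValueDoc {
    // Marker type so a user-supplied List value is not mistaken for
    // the internal multi-value holder.
    private static final class Values extends ArrayList<Object> {}

    // Each entry holds either one bare value or a Values list.
    private final Map<String, Object> fields = new LinkedHashMap<>();

    void addField(String name, Object value) {
        Objects.requireNonNull(value, "null values are not supported in this sketch");
        Object existing = fields.get(name);
        if (existing == null) {
            fields.put(name, value);          // first value: no collection allocated
        } else if (existing instanceof Values) {
            ((Values) existing).add(value);   // already multi-valued: append
        } else {
            Values values = new Values();     // second value: promote to a list
            values.add(existing);
            values.add(value);
            fields.put(name, values);
        }
    }

    // Always returns a Collection, even for a single-valued field.
    Collection<Object> getFieldValues(String name) {
        Object existing = fields.get(name);
        if (existing == null) {
            return Collections.emptyList();
        }
        if (existing instanceof Values) {
            return Collections.unmodifiableCollection((Values) existing);
        }
        return Collections.singletonList(existing);
    }

    public static void main(String[] args) {
        LazyMultiValueDoc doc = new LazyMultiValueDoc();
        doc.addField("id", "1");
        doc.addField("subject", "a");
        doc.addField("subject", "b");
        System.out.println(doc.getFieldValues("id"));
        System.out.println(doc.getFieldValues("subject"));
    }
}
```

The `instanceof` branches are the "complexity of dealing with a Map" that the message weighs against the speedup: every read and write has to check which of the two shapes the stored value has.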
[jira] Updated: (SOLR-272) SolrDocument performance testing
[ https://issues.apache.org/jira/browse/SOLR-272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-272:
--
    Attachment: SolrDocumentPerformanceTester.java

Attaching the modified test program I used.
I modified it to accept separate counts and do separate runs for the different implementations. For example, 10 0 0 and 0 0 1.
This was to avoid any GC effects carrying over from one implementation to another, and to avoid HotSpot optimizing for one code path and then having a different implementation switch to a different path.

The SolrInputDocument builder also needed the change from setField to addField to be equivalent.
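A rough sketch of the "separate counts, separate runs" approach described in this message: one count per implementation is read from the command line, and a count of 0 skips that implementation, so each JVM invocation can exercise a single code path. This is not the attached SolrDocumentPerformanceTester.java; the class name, the labels, and the stand-in workload are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

// Hypothetical harness for illustration; not the attached test program.
class SeparateRunBench {
    // Runs buildDoc `count` times and returns the elapsed milliseconds.
    static long timeRun(int count, IntConsumer buildDoc) {
        long start = System.nanoTime();   // start is taken fresh for every run
        for (int i = 0; i < count; i++) {
            buildDoc.accept(i);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Reads the count at `index`; a missing argument means "skip" (0).
    static int parseCount(String[] args, int index) {
        return index < args.length ? Integer.parseInt(args[index]) : 0;
    }

    // Stand-in workload; a real test would build a document here.
    static List<String> buildStandInDoc(int i) {
        List<String> doc = new ArrayList<>();
        doc.add("id:" + i);
        for (int s = 0; s < 10; s++) {
            doc.add("subject:" + s);
        }
        return doc;
    }

    public static void main(String[] args) {
        // Hypothetical labels, one per positional count argument.
        String[] names = {"implA", "implB", "implC"};
        for (int k = 0; k < names.length; k++) {
            int count = parseCount(args, k);
            if (count == 0) {
                continue;                 // 0 means: do not run this implementation
            }
            long ms = timeRun(count, i -> buildStandInDoc(i));
            System.out.println("[" + count + "] " + ms + " ms :: " + names[k]);
        }
    }
}
```

Invoking the program once per implementation (all other counts set to 0) keeps garbage from one implementation out of the next one's measurement and lets the JIT specialize for a single path.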
[jira] Updated: (SOLR-272) SolrDocument performance testing
[ https://issues.apache.org/jira/browse/SOLR-272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-272:
--
    Attachment: SolrInputDoc.patch

> With this test, the SolrInputDocument wins every time

Not once you correct the bugs ;-)
- copyField was not being done in the SolrInputDocument version
- setField was being used for the multiValued field instead of addField, resulting in fewer fields.

I modified the schema (it didn't work out of the box) and removed everything that didn't have to do with the fields in the document (partially because copyField wasn't implemented).

On my P4, SolrInputDocument comes in at 14% slower.
I don't know how it would be with all the copyField and dynamicField stuff in there. There are certainly scenarios where it could be faster, since it can do a single lookup for a multivalued field.
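To illustrate the setField/addField bug reported in this message: if setField replaces whatever is stored under a name while addField appends to it, then a loop that calls setField for a multiValued field ends up with one value instead of many, and the benchmark does less work than it should. The class below is a hypothetical stand-in written for this illustration, not SolrInputDocument itself.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration; not the Solr class.
class FieldSemanticsDemo {
    private final Map<String, List<Object>> fields = new LinkedHashMap<>();

    // Replaces any existing values for the field.
    void setField(String name, Object value) {
        List<Object> values = new ArrayList<>();
        values.add(value);
        fields.put(name, values);
    }

    // Appends to the existing values for the field.
    void addField(String name, Object value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    int valueCount(String name) {
        List<Object> values = fields.get(name);
        return values == null ? 0 : values.size();
    }

    public static void main(String[] args) {
        FieldSemanticsDemo withSet = new FieldSemanticsDemo();
        FieldSemanticsDemo withAdd = new FieldSemanticsDemo();
        for (int i = 0; i < 10; i++) {
            withSet.setField("subject", "subject " + i);  // overwrites each time
            withAdd.addField("subject", "subject " + i);  // accumulates
        }
        System.out.println("setField values: " + withSet.valueCount("subject"));
        System.out.println("addField values: " + withAdd.valueCount("subject"));
    }
}
```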
[jira] Updated: (SOLR-272) SolrDocument performance testing
[ https://issues.apache.org/jira/browse/SOLR-272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated SOLR-272:
---
    Attachment: SolrDocumentPerformanceTester.java

Since the LuceneInputDocument is an obvious loser, I removed that from the test. I also:
* removed Random from the mix -- makes the tests inconsistent
* test simple and complex docs.
  - simple is just the id
  - complex is id + name + dynamic field + 10 subjects, the subjects each have a copyField to 'text'

With this test, the SolrInputDocument wins every time:

[10]    2043 :: 0.02043 mili/doc :: SolrInputDocument - true
[10]    2193 :: 0.02193 mili/doc :: DocumentBuilder - true
[100]  15815 :: 0.015815 mili/doc :: SolrInputDocument - true
[100]  19223 :: 0.019223 mili/doc :: DocumentBuilder - true
[1000]  6228 :: 0.000623 mili/doc :: SolrInputDocument - false
[1000] 17263 :: 0.001726 mili/doc :: DocumentBuilder - false
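The "mili/doc" column in the results of this message is the elapsed time in milliseconds divided by the number of documents built. A small sketch of that arithmetic follows; the class and method names are hypothetical. One observation that follows from the numbers alone: 2043 ms at 0.02043 ms/doc implies 100,000 documents, and 15815 ms at 0.015815 ms/doc implies 1,000,000, so the bracketed counts as they appear in this archive look truncated.

```java
// Hypothetical helper for illustration only.
class PerDocTiming {
    // Milliseconds per document for a run.
    static double msPerDoc(long elapsedMs, long docCount) {
        return elapsedMs / (double) docCount;
    }

    // Document count implied by a reported total and per-document time.
    static long impliedDocCount(long elapsedMs, double msPerDoc) {
        return Math.round(elapsedMs / msPerDoc);
    }

    public static void main(String[] args) {
        System.out.println(impliedDocCount(2043, 0.02043));
        System.out.println(impliedDocCount(15815, 0.015815));
        System.out.println(msPerDoc(2043, 100_000));
    }
}
```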
[jira] Updated: (SOLR-272) SolrDocument performance testing
[ https://issues.apache.org/jira/browse/SOLR-272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated SOLR-272:
---
    Attachment: SOLR-272-SolrDocumentPerformanceTesting.patch

D'oh. I was not resetting the time after each run.
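The bug mentioned in this message (not resetting the time after each run) can be shown with a deterministic sketch. If the start time is captured once, every later run's figure also includes all earlier runs. The class below is hypothetical and uses an injected clock so the effect is reproducible; it is not the code from the patch.

```java
import java.util.Arrays;
import java.util.function.LongSupplier;

// Hypothetical illustration of the timing bug; not the patch code.
class TimerResetDemo {
    // Buggy: start is captured once, so results accumulate across runs.
    static long[] cumulativeTimes(LongSupplier clock, Runnable[] runs) {
        long start = clock.getAsLong();
        long[] elapsed = new long[runs.length];
        for (int i = 0; i < runs.length; i++) {
            runs[i].run();
            elapsed[i] = clock.getAsLong() - start;
        }
        return elapsed;
    }

    // Fixed: start is reset before every run.
    static long[] perRunTimes(LongSupplier clock, Runnable[] runs) {
        long[] elapsed = new long[runs.length];
        for (int i = 0; i < runs.length; i++) {
            long start = clock.getAsLong();
            runs[i].run();
            elapsed[i] = clock.getAsLong() - start;
        }
        return elapsed;
    }

    public static void main(String[] args) {
        long[] now = {0};                        // fake clock, in milliseconds
        Runnable work = () -> now[0] += 100;     // each run "takes" 100 ms
        Runnable[] runs = {work, work, work};
        System.out.println(Arrays.toString(cumulativeTimes(() -> now[0], runs)));
        System.out.println(Arrays.toString(perRunTimes(() -> now[0], runs)));
    }
}
```

With the fake clock, the first method reports growing totals for identical runs while the second reports the same figure for each, which is what makes a later implementation in the sequence look slower than it is.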
[jira] Updated: (SOLR-272) SolrDocument performance testing
[ https://issues.apache.org/jira/browse/SOLR-272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated SOLR-272:
---
    Attachment: SOLR-272-SolrDocumentPerformanceTesting.patch

Contains:
* LuceneInputDocument
* changed tests to use this impl (and still pass)
* a simple comparison test (far from a perfect representation)