[jira] [Commented] (SOLR-13312) write out responses without creating SolrDocument objects
[ https://issues.apache.org/jira/browse/SOLR-13312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16796581#comment-16796581 ]

Noble Paul commented on SOLR-13312:
---

bq. So pseudo code based on your response

Yeah, pretty much, that is how it should work.

bq. That might work if the codec/code for writing out responses only ever iterates linearly through the document anyway...

That's what response writers do anyway.

bq. One might even imagine a composition based strategy with an .optimizeFieldAccess() method that flips the map

Yeah, we can have a whitelist of methods that can be accessed without creating the Map, say:
* {{forEach()}}
* {{writeMap()}}
* {{getFieldValue()}}
* {{getFirstValue()}}

If any other method is invoked, we can lazily construct the Map-based structure that we use today.

> write out responses without creating SolrDocument objects
> -
>
> Key: SOLR-13312
> URL: https://issues.apache.org/jira/browse/SOLR-13312
> Project: Solr
> Issue Type: Sub-task
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Noble Paul
> Priority: Major
>
> Once we get a document from lucene there is no need to create a SolrDocument
> object to write out the response, if there are no transformers

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
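Noble's whitelist proposal might look roughly like the following minimal Java sketch. It assumes a document backed by a flat, ordered list of stored (name, value) pairs standing in for a Lucene document. `LazySolrDoc`, `addField` and `asMap()` are hypothetical names, not Solr API; `asMap()` plays the role of "any other method". Only `forEach()`, `getFieldValue()` and `getFirstValue()` are modelled here; `writeMap()` is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical document view: flat stored fields until something needs a Map. */
class LazySolrDoc {
  private final List<String> names = new ArrayList<>();
  private final List<Object> values = new ArrayList<>();
  private Map<String, List<Object>> map; // null until a non-whitelisted method needs it

  LazySolrDoc addField(String name, Object value) {
    names.add(name);
    values.add(value);
    if (map != null) { // keep the map in sync once it exists
      map.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }
    return this;
  }

  /** Whitelisted: visits each stored (name, value) pair in order; no Map is built. */
  void forEach(BiConsumer<String, Object> action) {
    for (int i = 0; i < names.size(); i++) {
      action.accept(names.get(i), values.get(i));
    }
  }

  /** Whitelisted: linear scan for the first value of a field. */
  Object getFirstValue(String name) {
    int i = names.indexOf(name);
    return i < 0 ? null : values.get(i);
  }

  /** Whitelisted: the single value, or a List if the field is multi-valued. */
  Object getFieldValue(String name) {
    List<Object> found = new ArrayList<>();
    for (int i = 0; i < names.size(); i++) {
      if (names.get(i).equals(name)) {
        found.add(values.get(i));
      }
    }
    if (found.isEmpty()) {
      return null;
    }
    return found.size() == 1 ? found.get(0) : found;
  }

  /** Stand-in for "any other method": lazily builds the map-based structure. */
  Map<String, List<Object>> asMap() {
    if (map == null) {
      map = new LinkedHashMap<>();
      for (int i = 0; i < names.size(); i++) {
        map.computeIfAbsent(names.get(i), k -> new ArrayList<>()).add(values.get(i));
      }
    }
    return Collections.unmodifiableMap(map);
  }

  boolean isMapBuilt() {
    return map != null;
  }
}
```

Whitelisted reads pay a linear scan per call but never allocate the per-document map; the first non-whitelisted call pays the conversion once, which is the memory-versus-CPU trade-off discussed in this thread.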
[jira] [Commented] (SOLR-13312) write out responses without creating SolrDocument objects
[ https://issues.apache.org/jira/browse/SOLR-13312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16796571#comment-16796571 ]

Gus Heck commented on SOLR-13312:
-

That interface sounds interesting. So pseudo code based on your response:
{code:java}
docInter = wrapOrConvert(luceneDoc, have_transformers);
for (t : transformers) t.transform(docInter);
sendResponse(convertDocToWt(wt, docInter));
{code}
That might work if the codec/code for writing out responses only ever iterates linearly through the document anyway... which seems likely for writing a response. If the interface provides direct field access, the performance of field access would vary depending on the impl behind it: one favoring memory at the expense of CPU, the other favoring CPU at the expense of memory (for cases expecting lots of direct field access). Certain use cases (low-memory systems) might want to force the tradeoff regardless.

One might even imagine a composition based strategy with an .optimizeFieldAccess() method that flips the map-based backing implementation on by swapping in a SolrDocument as a new delegate on demand, so that transformers that do nothing but add one more field don't have to require the more memory-expensive implementation either. Maybe convert the current SolrDocument class to an inner class of a wrapper that takes its name, and have that wrapper delegate either to a Lucene doc or to the current impl. Then have an optimizeForFieldAccess() method that code in transformers (or elsewhere) can call to hint that a map-based backing may be helpful for performance (I imagine perhaps allowing a sysprop or config setting to deny this request for memory-constrained systems or systems handling documents with very few fields). A new constructor would create the Lucene-backed version, and the existing constructors would create one backed by maps as before... Certain methods such as "getFieldValuesAsMap()" might automatically cause conversion...

Just a thought.
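Gus's composition-based idea can be sketched as a wrapper that starts on a memory-cheap flat backing and swaps in a map-backed delegate when `optimizeFieldAccess()` is called, unless a deny flag (standing in for the sysprop or config setting he mentions) is set. All class names here (`FieldReader`, `FlatReader`, `MapReader`, `SwitchableDoc`) are hypothetical, and `MapReader` merely stands in for the current `SolrDocument`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical read interface shared by both backings. */
interface FieldReader {
  Object getFirstValue(String name);
}

/** Memory-cheap backing: flat stored pairs, linear scan per lookup. */
class FlatReader implements FieldReader {
  final List<String> names = new ArrayList<>();
  final List<Object> values = new ArrayList<>();

  FlatReader add(String name, Object value) {
    names.add(name);
    values.add(value);
    return this;
  }

  @Override
  public Object getFirstValue(String name) {
    int i = names.indexOf(name);
    return i < 0 ? null : values.get(i);
  }
}

/** CPU-cheap backing (stand-in for today's SolrDocument): hash lookup, more memory. */
class MapReader implements FieldReader {
  private final Map<String, List<Object>> map = new LinkedHashMap<>();

  MapReader(FlatReader src) {
    for (int i = 0; i < src.names.size(); i++) {
      map.computeIfAbsent(src.names.get(i), k -> new ArrayList<>()).add(src.values.get(i));
    }
  }

  @Override
  public Object getFirstValue(String name) {
    List<Object> vs = map.get(name);
    return vs == null ? null : vs.get(0);
  }
}

/** Wrapper that starts flat and swaps to the map backing on request, unless denied. */
class SwitchableDoc implements FieldReader {
  private final boolean denyOptimize; // stand-in for a sysprop/config setting
  private FieldReader delegate;

  SwitchableDoc(FlatReader flat, boolean denyOptimize) {
    this.delegate = flat;
    this.denyOptimize = denyOptimize;
  }

  /** Hint that heavy random field access is coming; ignored when denied. */
  void optimizeFieldAccess() {
    if (!denyOptimize && delegate instanceof FlatReader) {
      delegate = new MapReader((FlatReader) delegate);
    }
  }

  boolean isMapBacked() {
    return delegate instanceof MapReader;
  }

  @Override
  public Object getFirstValue(String name) {
    return delegate.getFirstValue(name);
  }
}
```

Callers see the same answers before and after the swap; only the cost profile of lookups changes, so a transformer that merely appends a field never needs to trigger it.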
[jira] [Commented] (SOLR-13312) write out responses without creating SolrDocument objects
[ https://issues.apache.org/jira/browse/SOLR-13312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16796466#comment-16796466 ]

Noble Paul commented on SOLR-13312:
---

bq. By Transformer, you mean DocumentTransformer (e.g. fl=id,name,score,[shard] to add the shard info)?

Well, the problem is that we have the DocTransformers working on a concrete class called {{SolrDocument}}. We should be able to make transformers work on an interface instead. Then any DocTransformer that implements the new interface can potentially work this way, and the most common ones that we ship today can cut over to the interface.

bq. When you say "write it out" do you mean directly generating JSON/XML/javabin? I think javabin would require creating a SolrDocument. If not, the client side changes too.

No. The output format remains the same; there will be zero changes. So even an older client should have no problem communicating with a newer Solr.

bq. but it seems that javabin is shipping a serialized SolrDocument...

Javabin is a serialization/deserialization format. It is entirely possible to produce that format without creating an object.
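A minimal sketch of transformers written against an interface rather than the concrete `SolrDocument`. `TransformableDoc`, `DocTransformerSketch`, `FlatDoc` and `ShardAugmenterSketch` are hypothetical names for illustration only; Solr's actual `DocTransformer` API is not reproduced here. The point is that a transformer which only appends a field, like the `[shard]` example, can run unchanged on a flat, Map-free backing.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal document interface that transformers would code against. */
interface TransformableDoc {
  Object getFirstValue(String name);

  void addField(String name, Object value);
}

/** Hypothetical transformer contract: takes the interface, not SolrDocument. */
interface DocTransformerSketch {
  void transform(TransformableDoc doc);
}

/** Flat, Lucene-like backing that satisfies the interface without a Map. */
class FlatDoc implements TransformableDoc {
  final List<String> names = new ArrayList<>();
  final List<Object> values = new ArrayList<>();

  @Override
  public Object getFirstValue(String name) {
    int i = names.indexOf(name);
    return i < 0 ? null : values.get(i);
  }

  @Override
  public void addField(String name, Object value) {
    names.add(name);
    values.add(value);
  }
}

/** Shard-augmenter-like transformer: it only appends one field. */
class ShardAugmenterSketch implements DocTransformerSketch {
  private final String shard;

  ShardAugmenterSketch(String shard) {
    this.shard = shard;
  }

  @Override
  public void transform(TransformableDoc doc) {
    doc.addField("[shard]", shard);
  }
}
```

Transformers that need more than such an interface offers could keep receiving a full `SolrDocument`, in line with creating SolrDocument objects only where other transformers are used.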
[jira] [Commented] (SOLR-13312) write out responses without creating SolrDocument objects
[ https://issues.apache.org/jira/browse/SOLR-13312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16796454#comment-16796454 ]

Gus Heck commented on SOLR-13312:
-

By Transformer, you mean DocumentTransformer (e.g. fl=id,name,score,[shard] to add the shard info)? So most vanilla query use cases don't use those, including those underpinning streaming expression searches?

When you say "write it out" do you mean directly generating JSON/XML/javabin? I think javabin would require creating a SolrDocument. If not, the client side changes too... (though perhaps shipping the SolrDocument creation load to the SolrJ client will be of benefit). I'm not an expert on the codec, having never had cause to work with it directly, but it seems that javabin is shipping a serialized SolrDocument (org.apache.solr.common.util.JavaBinCodec#readSolrDocument). If the binary wire format changes, you are probably proposing javabin2 (or lucenebin?). In that case it becomes slightly confusing, since adding ,[shard] could require the wt param to change.

Perhaps I'm entirely misinterpreting what you said. A patch would probably clarify. I'm not for or against this yet, but the description seems short.
[jira] [Commented] (SOLR-13312) write out responses without creating SolrDocument objects
[ https://issues.apache.org/jira/browse/SOLR-13312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794346#comment-16794346 ]

Noble Paul commented on SOLR-13312:
---

Tradeoffs? If you have a transformer, it won't use this mode. If not, it should directly write out the response. There will be no difference for anything else; we are not giving up anything.

The changes will not affect any other part of the system. What I mean to say is, the changes are not cross-cutting.

The performance delta will be measured after a PoC is written. We will see how much of an improvement we get and go further from there.
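The rule "with a transformer, use today's path; otherwise write directly" can be sketched as two write paths that must produce identical output, which is what keeps the wire format (and older clients) unaffected. This is a hypothetical illustration: `ResponsePathSketch` writes a simplified JSON-like format with no string escaping, not javabin, and the flat name/value lists stand in for a Lucene document's stored fields.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the two write paths; both emit the same JSON-like text. */
class ResponsePathSketch {

  static String write(List<String> names, List<Object> values, boolean haveTransformers) {
    return haveTransformers ? writeViaMap(names, values) : writeDirect(names, values);
  }

  /** Fast path: no per-document Map; repeated names are grouped by rescanning. */
  static String writeDirect(List<String> names, List<Object> values) {
    StringBuilder out = new StringBuilder("{");
    for (int i = 0; i < names.size(); i++) {
      String name = names.get(i);
      if (names.indexOf(name) != i) {
        continue; // already written at its first occurrence
      }
      int count = 0;
      for (String n : names) {
        if (n.equals(name)) {
          count++;
        }
      }
      if (out.length() > 1) {
        out.append(',');
      }
      out.append('"').append(name).append("\":");
      if (count > 1) {
        out.append('[');
      }
      boolean first = true;
      for (int j = i; j < names.size(); j++) {
        if (!names.get(j).equals(name)) {
          continue;
        }
        if (!first) {
          out.append(',');
        }
        appendValue(out, values.get(j));
        first = false;
      }
      if (count > 1) {
        out.append(']');
      }
    }
    return out.append('}').toString();
  }

  /** Today's path: materialize name -> values first (transformers would run here). */
  static String writeViaMap(List<String> names, List<Object> values) {
    Map<String, List<Object>> doc = new LinkedHashMap<>();
    for (int i = 0; i < names.size(); i++) {
      doc.computeIfAbsent(names.get(i), k -> new ArrayList<>()).add(values.get(i));
    }
    StringBuilder out = new StringBuilder("{");
    for (Map.Entry<String, List<Object>> e : doc.entrySet()) {
      if (out.length() > 1) {
        out.append(',');
      }
      out.append('"').append(e.getKey()).append("\":");
      List<Object> vs = e.getValue();
      if (vs.size() > 1) {
        out.append('[');
      }
      for (int k = 0; k < vs.size(); k++) {
        if (k > 0) {
          out.append(',');
        }
        appendValue(out, vs.get(k));
      }
      if (vs.size() > 1) {
        out.append(']');
      }
    }
    return out.append('}').toString();
  }

  /** Strings are quoted (no escaping, for brevity); other values use toString(). */
  private static void appendValue(StringBuilder out, Object v) {
    if (v instanceof String) {
      out.append('"').append(v).append('"');
    } else {
      out.append(v);
    }
  }
}
```

The direct path trades repeated scans for not allocating a map per document; whether that wins is exactly what the PoC measurements would need to show.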
[jira] [Commented] (SOLR-13312) write out responses without creating SolrDocument objects
[ https://issues.apache.org/jira/browse/SOLR-13312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794311#comment-16794311 ]

Gus Heck commented on SOLR-13312:
-

Improving memory efficiency seems nice, but like all optimizations there are likely to be trade-offs. What use cases will you be evaluating for memory, CPU, and overall response time? If we can win on all fronts with a variety of use cases (without making dev too difficult), great, but if we're giving up something some of the time, we need to know. This sounds like a pretty cross-cutting change that could affect many things.
[jira] [Commented] (SOLR-13312) write out responses without creating SolrDocument objects
[ https://issues.apache.org/jira/browse/SOLR-13312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789077#comment-16789077 ]

Noble Paul commented on SOLR-13312:
---

[~dsmiley] SolrDocument is an expensive data structure. Yes, we may need a more efficient data structure to actually accomplish this; HashMaps are extremely memory-inefficient.

Skipping transformation is something we can't do now without breaking backward compatibility. We can probably rewrite transformers like ChildDocTransformer to adapt to the new format. We may need to create the SolrDocument objects where other transformers are used. But most requests never use any transformers, and they are paying a huge price.
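One possible direction for "a more efficient data structure", assuming documents carry relatively few stored fields: a `java.util.HashMap` allocates a separate node object (hash, key, value, next) per entry plus a bucket table, whereas two parallel arrays avoid the per-entry objects at the cost of a linear scan on lookup. `CompactFields` is a hypothetical class, not part of Solr.

```java
import java.util.Arrays;

/** Hypothetical compact field store: two parallel arrays, no per-entry objects. */
class CompactFields {
  private String[] names = new String[8];
  private Object[] values = new Object[8];
  private int size;

  /** Appends a field; repeated names model multi-valued fields. */
  void add(String name, Object value) {
    if (size == names.length) { // grow both arrays together
      names = Arrays.copyOf(names, size * 2);
      values = Arrays.copyOf(values, size * 2);
    }
    names[size] = name;
    values[size] = value;
    size++;
  }

  /** Linear scan: trades CPU for memory, so it suits documents with few fields. */
  Object getFirstValue(String name) {
    for (int i = 0; i < size; i++) {
      if (names[i].equals(name)) {
        return values[i];
      }
    }
    return null;
  }

  int size() {
    return size;
  }
}
```

Whether this beats a HashMap overall depends on field counts and access patterns, so it would need the same memory, CPU and response-time measurements asked for in this thread.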
[jira] [Commented] (SOLR-13312) write out responses without creating SolrDocument objects
[ https://issues.apache.org/jira/browse/SOLR-13312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789008#comment-16789008 ]

David Smiley commented on SOLR-13312:
-

If I'm not mistaken, I believe a major purpose of SolrDocument is to map a field to a list of values. Lucene's Document has it all flat and in no particular order. Wouldn't this be more difficult to work with? I'm also skeptical of how much value there is in skipping the transformation.