[jira] [Commented] (SOLR-10255) Large psuedo-stored fields via BinaryDocValuesField

2017-03-15 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925671#comment-15925671
 ] 

David Smiley commented on SOLR-10255:
-

SOLR-10286 "Declare a field as large, don't keep value in the document cache".  
It incorporates some pieces of this patch.
I'm not sure if/when I'll return to this BinaryDocValuesField approach.

> Large psuedo-stored fields via BinaryDocValuesField
> ---
>
> Key: SOLR-10255
> URL: https://issues.apache.org/jira/browse/SOLR-10255
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Assignee: David Smiley
> Attachments: SOLR-10255.patch, SOLR-10255.patch
>
>
> (sub-issue of SOLR-10117)  This is a proposal for a better way for Solr to 
> handle "large" text fields.  Large docs that are in Lucene StoredFields slow 
> requests that don't involve access to such fields.  This is fundamental to 
> the fact that StoredFields are row-stored.  Worse, the Solr documentCache 
> will wind up holding onto massive Strings.  While the latter could be tackled 
> on it's own somehow as it's the most serious issue, nevertheless it seems 
> wrong that such large fields are in row-stored storage to begin with.  After 
> all, relational DBs seemed to have figured this out and put CLOBs/BLOBs in a 
> separate place.  Here, we do similarly by using, Lucene 
> {{BinaryDocValuesField}}.  BDVF isn't well known in the DocValues family as 
> it's not for typical DocValues purposes like sorting/faceting etc.  The 
> default DocValuesFormat doesn't compress these but we could write one that 
> does.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-10255) Large psuedo-stored fields via BinaryDocValuesField

2017-03-14 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925070#comment-15925070
 ] 

David Smiley commented on SOLR-10255:
-

Update: I'm tracking down a bug that might even exist without this patch 
concerning RealtimeGet and an IndexReader "AlreadyClosedException". It seems 
RTG can ask SolrIndexSearcher for a Document, then close the searcher soon 
after, and then try to serialize the document later on.  But if the document 
contains an IndexableField that is lazy (be it the current 
LazyDocument.LazyField) or any other similar concoction (be it BinaryDocValues 
or whatever) then the index will be closed and it's too late.  It seems the 
LargeLazyField here is more likely to provoke this situation because it's 
especially lazy.

> Large psuedo-stored fields via BinaryDocValuesField
> ---
>
> Key: SOLR-10255
> URL: https://issues.apache.org/jira/browse/SOLR-10255
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Assignee: David Smiley
> Attachments: SOLR-10255.patch, SOLR-10255.patch
>
>
> (sub-issue of SOLR-10117)  This is a proposal for a better way for Solr to 
> handle "large" text fields.  Large docs that are in Lucene StoredFields slow 
> requests that don't involve access to such fields.  This is fundamental to 
> the fact that StoredFields are row-stored.  Worse, the Solr documentCache 
> will wind up holding onto massive Strings.  While the latter could be tackled 
> on it's own somehow as it's the most serious issue, nevertheless it seems 
> wrong that such large fields are in row-stored storage to begin with.  After 
> all, relational DBs seemed to have figured this out and put CLOBs/BLOBs in a 
> separate place.  Here, we do similarly by using, Lucene 
> {{BinaryDocValuesField}}.  BDVF isn't well known in the DocValues family as 
> it's not for typical DocValues purposes like sorting/faceting etc.  The 
> default DocValuesFormat doesn't compress these but we could write one that 
> does.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-10255) Large psuedo-stored fields via BinaryDocValuesField

2017-03-11 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906431#comment-15906431
 ] 

David Smiley commented on SOLR-10255:
-

I think I want to scale back the scope of what I plan to do in the short term 
(for 6.5), while still allowing a BinaryDocValuesField (with compressed 
codec/format) in the future.

I'll file a separate issue for this and try and throw up a patch Monday:  The 
field can remain stored but go _last_ (and the DocumentBuilder can ensure this 
happens).  The last stored value will _not_ be read from disk (well first 16KB 
but whatever) due to LUCENE-6898.  On the {{SolrIndexSearcher.doc(...)}} side, 
the "lazy" fields will get a LargeAlwaysDiskLazyField but one that goes to 
stored value not BinaryDocValues.

> Large psuedo-stored fields via BinaryDocValuesField
> ---
>
> Key: SOLR-10255
> URL: https://issues.apache.org/jira/browse/SOLR-10255
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Assignee: David Smiley
> Attachments: SOLR-10255.patch, SOLR-10255.patch
>
>
> (sub-issue of SOLR-10117)  This is a proposal for a better way for Solr to 
> handle "large" text fields.  Large docs that are in Lucene StoredFields slow 
> requests that don't involve access to such fields.  This is fundamental to 
> the fact that StoredFields are row-stored.  Worse, the Solr documentCache 
> will wind up holding onto massive Strings.  While the latter could be tackled 
> on it's own somehow as it's the most serious issue, nevertheless it seems 
> wrong that such large fields are in row-stored storage to begin with.  After 
> all, relational DBs seemed to have figured this out and put CLOBs/BLOBs in a 
> separate place.  Here, we do similarly by using, Lucene 
> {{BinaryDocValuesField}}.  BDVF isn't well known in the DocValues family as 
> it's not for typical DocValues purposes like sorting/faceting etc.  The 
> default DocValuesFormat doesn't compress these but we could write one that 
> does.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-10255) Large psuedo-stored fields via BinaryDocValuesField

2017-03-09 Thread Michael Braun (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15903271#comment-15903271
 ] 

Michael Braun commented on SOLR-10255:
--

We have a use case that would be improved by this. 

> Large psuedo-stored fields via BinaryDocValuesField
> ---
>
> Key: SOLR-10255
> URL: https://issues.apache.org/jira/browse/SOLR-10255
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Assignee: David Smiley
> Attachments: SOLR-10255.patch
>
>
> (sub-issue of SOLR-10117)  This is a proposal for a better way for Solr to 
> handle "large" text fields.  Large docs that are in Lucene StoredFields slow 
> requests that don't involve access to such fields.  This is fundamental to 
> the fact that StoredFields are row-stored.  Worse, the Solr documentCache 
> will wind up holding onto massive Strings.  While the latter could be tackled 
> on it's own somehow as it's the most serious issue, nevertheless it seems 
> wrong that such large fields are in row-stored storage to begin with.  After 
> all, relational DBs seemed to have figured this out and put CLOBs/BLOBs in a 
> separate place.  Here, we do similarly by using, Lucene 
> {{BinaryDocValuesField}}.  BDVF isn't well known in the DocValues family as 
> it's not for typical DocValues purposes like sorting/faceting etc.  The 
> default DocValuesFormat doesn't compress these but we could write one that 
> does.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org