[jira] [Updated] (LUCENE-5513) Binary DocValues Updates

2014-03-17 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-5513:
---

Attachment: LUCENE-5513.patch

Thanks Mike. In an edge case where there are both field updates and deletes, 
such that all of the updated documents were deleted, I created the DVFUpdates 
instances prematurely, which led to the NPE. The patch fixes this and also 
integrates BDV updates into TestIWExceptions.testNoLostDeletesOrUpdates.
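
As an illustration of the edge case only (this is not the patch's code, and all 
class and method names below are hypothetical stand-ins), a minimal sketch of 
creating a per-field updates holder lazily: the holder is created on the first 
update that hits a live document, so a field whose updated documents were all 
deleted never gets a holder that later code might trip over.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: create per-field update holders lazily, only for live docs. */
final class LazyFieldUpdatesSketch {

  /** Stand-in for a per-field updates holder (not Lucene's actual class). */
  static final class FieldUpdates {
    final Map<Integer, byte[]> docToValue = new TreeMap<>();
  }

  /**
   * Applies (doc, value) updates for one field, skipping deleted docs.
   * The holder is created on the first update that hits a live doc, so a
   * field whose updated docs were all deleted never gets a holder at all.
   */
  static Map<String, FieldUpdates> apply(
      String field, int[] docs, byte[][] values, BitSet liveDocs) {
    Map<String, FieldUpdates> byField = new HashMap<>();
    for (int i = 0; i < docs.length; i++) {
      if (!liveDocs.get(docs[i])) {
        continue; // doc was deleted: nothing to update
      }
      byField.computeIfAbsent(field, f -> new FieldUpdates())
          .docToValue.put(docs[i], values[i]);
    }
    return byField;
  }
}
```

With this shape, callers can check whether the returned map is empty instead of 
dereferencing a holder that has no documents in it.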

> Binary DocValues Updates
> 
>
> Key: LUCENE-5513
> URL: https://issues.apache.org/jira/browse/LUCENE-5513
> Project: Lucene - Core
>  Issue Type: Wish
>  Components: core/index
> Reporter: Mikhail Khludnev
> Priority: Minor
> Attachments: LUCENE-5513.patch, LUCENE-5513.patch, LUCENE-5513.patch, 
> LUCENE-5513.patch
>
>
> LUCENE-5189 was a great step forward, and I would like to continue it. The 
> reason for this feature is to build a "join index": writing children's 
> docnums into the parent's binary DV. I can try to proceed with the 
> implementation, but I'm not very experienced with such deep Lucene 
> internals. [~shaie], any hint on where to begin is much appreciated. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5513) Binary DocValues Updates

2014-03-17 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-5513:
---

Attachment: LUCENE-5513.patch

Fixed stupid bug in BinaryDocValuesFieldUpdates.merge().







[jira] [Updated] (LUCENE-5513) Binary DocValues Updates

2014-03-16 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-5513:
---

Attachment: LUCENE-5513.patch

The patch makes the following refactoring changes (all internal API):

* DocValuesUpdate is an abstract class w/ the common implementation for 
NumericDocValuesUpdate and BinaryDocValuesUpdate.

* DocValuesFieldUpdates holds the doc+updates for a single field. It mostly 
defines the API for the Numeric* and Binary* implementations.

* DocValuesFieldUpdates.Container holds numeric+binary updates for a set of 
fields. It is as its name says: a container of updates used by 
ReadersAndUpdates.
** It avoids bloating the API w/ more maps being passed around, and it 
simplifies BufferedUpdatesStream and IndexWriter.commitMergedDeletes.
** It also serves as a factory for updates, based on the update's Type.

* Finished TestBinaryDVUpdates

* Added TestMixedDVUpdates which ports some of the 'big' tests from both 
TestNDV/BDVUpdates and mixes some NDV and BDV updates.
** I'll beast it some to make sure all edge cases are covered.
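
To make the container idea above concrete, here is a rough sketch. It is not 
taken from the patch: the class names, the Object-typed add() method and the 
map-based storage are all assumptions chosen for brevity. It only shows the 
shape described in the list: one holder per field, two implementations, and a 
container that doubles as a factory keyed on the update type.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a per-field updates holder plus a container keyed by field. */
abstract class FieldUpdatesSketch {
  enum Type { NUMERIC, BINARY }

  final String field;
  final Type type;

  FieldUpdatesSketch(String field, Type type) {
    this.field = field;
    this.type = type;
  }

  /** Records the new value for a document; the expected value type depends on the subclass. */
  abstract void add(int doc, Object value);

  /** Number of documents with a pending update for this field. */
  abstract int size();

  static final class Numeric extends FieldUpdatesSketch {
    private final Map<Integer, Long> values = new HashMap<>();
    Numeric(String field) { super(field, Type.NUMERIC); }
    @Override void add(int doc, Object value) { values.put(doc, (Long) value); }
    @Override int size() { return values.size(); }
  }

  static final class Binary extends FieldUpdatesSketch {
    private final Map<Integer, byte[]> values = new HashMap<>();
    Binary(String field) { super(field, Type.BINARY); }
    @Override void add(int doc, Object value) { values.put(doc, (byte[]) value); }
    @Override int size() { return values.size(); }
  }

  /** Holds numeric and binary updates for a set of fields and acts as a factory. */
  static final class Container {
    private final Map<String, FieldUpdatesSketch> byField = new HashMap<>();

    /** Returns the existing holder for the field, or creates one of the requested type. */
    FieldUpdatesSketch getOrCreate(String field, Type type) {
      FieldUpdatesSketch updates = byField.get(field);
      if (updates == null) {
        updates = (type == Type.NUMERIC) ? new Numeric(field) : new Binary(field);
        byField.put(field, updates);
      } else if (updates.type != type) {
        // Assumption of this sketch: one field maps to exactly one update type.
        throw new IllegalArgumentException("field " + field + " is " + updates.type);
      }
      return updates;
    }

    /** True if at least one field has a holder. */
    boolean any() {
      return !byField.isEmpty();
    }
  }
}
```

Passing a single Container around, rather than one map per update type, is what 
keeps method signatures from growing each time a new update type is added.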

I may take a crack at simplifying IW.commitMergedDeletes even more by pulling a 
lot of the duplicate code into a method. This isn't possible right now because 
those sections modify more than one state variable, but I'll try to stuff these 
variables into a container to make the method saner to read.

Otherwise, I think it's ready.







[jira] [Updated] (LUCENE-5513) Binary DocValues Updates

2014-03-13 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-5513:
---

Attachment: LUCENE-5513.patch

Patch:

* Adds IW.updateBinaryDocValue
* Makes the necessary changes to DW, BufferedUpdates(Stream), ReadersAndUpdates
* Adds new BinaryUpdate and BinaryFieldUpdates
* Copies TestNumericDocValuesUpdates and changes it to add BDV fields:
** I still add numbers as it makes asserting easy, but I encode them as VLongs, 
so we get different lengths of byte[]
** There are some tests still disabled, see below
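
The VLong trick mentioned in the list can be sketched without any Lucene 
dependency. The class below is a hypothetical, self-contained codec (7 data 
bits per byte, high bit meaning "more bytes follow"), written only to show why 
encoding numbers this way yields byte[] values of different lengths. In a real 
test the resulting bytes would be wrapped and handed to 
IW.updateBinaryDocValue, whose exact signature is not shown in this message 
and is therefore not reproduced here.

```java
import java.io.ByteArrayOutputStream;

/** Self-contained VLong codec sketch: 7 data bits per byte, high bit = "more bytes follow". */
final class VLongSketch {

  /** Encodes a non-negative long; small values take fewer bytes than large ones. */
  static byte[] encode(long value) {
    if (value < 0) {
      throw new IllegalArgumentException("negative values not supported: " + value);
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    while ((value & ~0x7FL) != 0) {
      out.write((int) ((value & 0x7FL) | 0x80L)); // low 7 bits plus continuation bit
      value >>>= 7;
    }
    out.write((int) value); // last byte, continuation bit clear
    return out.toByteArray();
  }

  /** Decodes a single value produced by encode(). */
  static long decode(byte[] bytes) {
    long value = 0;
    int shift = 0;
    for (byte b : bytes) {
      value |= (long) (b & 0x7F) << shift;
      shift += 7;
    }
    return value;
  }
}
```

Because the encoded length grows with the magnitude of the number, a test that 
updates a field with a mix of small and large values exercises variable-length 
binary values while still being easy to assert on after decoding.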

Patch still doesn't handle updates that came in while a merge was in flight. 
The reason is that the code in IW.commitMergedDeletes is hairy and adding 
BinaryDV updates will make it even worse. So I want to refactor how the updates 
are represented internally, such that there is a single class DocValuesUpdates 
which captures all DV updates. Since the DV fields cannot overlap (a DV field 
cannot be both numeric and binary), I think it will allow us to use a single 
UpdatesIterator from IW.commitMergedDeletes. I'll take a look at that next and 
re-enable the tests after this has been resolved.

There's one thing to note: binary DV updates are more expensive to apply than 
NDV updates, depending on the length of the BDV. I.e. when we rewrite the DV 
file, then for NDV we know we write at most 8 bytes per document (compressed), 
but for BDV we may write a large number of bytes per document. I prefer to 
leave that as an optimization for later. One idea I have (which applies to NDV 
as well) is to write the updates to a sparse DV, or to add "stacked" DV files. 
Either will make the producing code more complex, and therefore I prefer to 
explore that later.
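
The message gives no design for "stacked" DV files, so the following is only a 
guess at what the idea could look like, with every name invented for 
illustration. It keeps the base values untouched and records each batch of 
updates as a sparse layer, so an update writes only the bytes of the documents 
it touches; the price is that a lookup has to consult the layers before falling 
back to the base.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical "stacked" doc values: a base layer plus sparse update layers. */
final class StackedBinaryValuesSketch {
  private final byte[][] base; // one value per document, written once
  private final List<Map<Integer, byte[]>> layers = new ArrayList<>(); // oldest first

  StackedBinaryValuesSketch(byte[][] base) {
    this.base = base;
  }

  /** Adds a sparse layer holding only the updated docs; the base is not rewritten. */
  void pushUpdates(Map<Integer, byte[]> updates) {
    layers.add(new HashMap<>(updates));
  }

  /** Newest layer wins; falls back to the base value when no layer has the doc. */
  byte[] get(int doc) {
    for (int i = layers.size() - 1; i >= 0; i--) {
      byte[] v = layers.get(i).get(doc);
      if (v != null) {
        return v;
      }
    }
    return base[doc];
  }

  /** Total bytes written by updates so far (sparse layers only, base excluded). */
  long bytesWrittenByUpdates() {
    long total = 0;
    for (Map<Integer, byte[]> layer : layers) {
      for (byte[] v : layer.values()) {
        total += v.length;
      }
    }
    return total;
  }
}
```

Compared with rewriting the whole DV file, the write cost here is proportional 
to the updated documents only, which is where the extra complexity on the 
producing (read) side comes from.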







[jira] [Updated] (LUCENE-5513) Binary DocValues Updates

2014-03-09 Thread Mikhail Khludnev (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated LUCENE-5513:
-

Component/s: core/index
Description: LUCENE-5189 was a great step forward, and I would like to 
continue it. The reason for this feature is to build a "join index": writing 
children's docnums into the parent's binary DV. I can try to proceed with the 
implementation, but I'm not very experienced with such deep Lucene internals. 
[~shaie], any hint on where to begin is much appreciated. 
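
For readers unfamiliar with the "join index" idea, one possible way to pack a 
parent's child docnums into a single binary value is sketched below. The issue 
does not specify an encoding; the delta plus variable-length byte scheme and 
the class name are assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: pack ascending child doc numbers into one binary value. */
final class ChildDocsCodecSketch {

  /** Delta-encodes ascending, non-negative doc numbers using 7 data bits per byte. */
  static byte[] encode(int[] sortedChildDocs) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int prev = 0;
    for (int doc : sortedChildDocs) {
      int delta = doc - prev;
      if (delta < 0) {
        throw new IllegalArgumentException("doc numbers must be non-negative and ascending");
      }
      while ((delta & ~0x7F) != 0) {
        out.write((delta & 0x7F) | 0x80); // low 7 bits plus continuation bit
        delta >>>= 7;
      }
      out.write(delta);
      prev = doc;
    }
    return out.toByteArray();
  }

  /** Reverses encode(): rebuilds the absolute doc numbers from the deltas. */
  static int[] decode(byte[] bytes) {
    List<Integer> docs = new ArrayList<>();
    int prev = 0;
    int delta = 0;
    int shift = 0;
    for (byte b : bytes) {
      delta |= (b & 0x7F) << shift;
      if ((b & 0x80) != 0) {
        shift += 7; // continuation bit set: more bytes belong to this delta
      } else {
        prev += delta; // delta complete: rebuild the absolute doc number
        docs.add(prev);
        delta = 0;
        shift = 0;
      }
    }
    int[] result = new int[docs.size()];
    for (int i = 0; i < result.length; i++) {
      result[i] = docs.get(i);
    }
    return result;
  }
}
```

Since docnums can change when segments merge, a value like this would need to 
be rewritten over time, which is what makes updatable binary doc values 
relevant to the use case.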



