[jira] Issue Comment Edited: (LUCENE-2575) Concurrent byte and int block implementations

2010-09-26 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915130#action_12915130
 ] 

Jason Rutherglen edited comment on LUCENE-2575 at 9/27/10 1:44 AM:
---

Here are the new parallel arrays. There seem to be too many, as if 
something went wrong; however, I think each one is required.

{code}
final int[] skipStarts; // address where the term's skip list starts (for reading)
final int[] skipAddrs; // where writing left off
final int[] sliceAddrs; // the start addr of the last posting slice
final byte[] sliceLevels; // posting slice levels
final int[] skipLastDoc; // last skip doc written
final int[] skipLastAddr; // last skip addr written
{code}
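
To illustrate how these arrays could fit together, here is a minimal sketch that assumes every array is indexed by the same term ID. The field names come from the listing above; the class name SkipListParallelArrays, the constructor, grow() and bytesPerTerm() are hypothetical and are not taken from the patch.

```java
// Hypothetical sketch: only the field names come from the comment above.
final class SkipListParallelArrays {
  final int[] skipStarts;   // address where the term's skip list starts (for reading)
  final int[] skipAddrs;    // where writing left off
  final int[] sliceAddrs;   // the start addr of the last posting slice
  final byte[] sliceLevels; // posting slice levels
  final int[] skipLastDoc;  // last skip doc written
  final int[] skipLastAddr; // last skip addr written

  SkipListParallelArrays(int size) {
    skipStarts = new int[size];
    skipAddrs = new int[size];
    sliceAddrs = new int[size];
    sliceLevels = new byte[size];
    skipLastDoc = new int[size];
    skipLastAddr = new int[size];
  }

  // Memory cost of one term across all six arrays: five ints plus one byte.
  static int bytesPerTerm() {
    return 5 * Integer.BYTES + 1;
  }

  // Returns a larger copy; entry i of every array still describes term i.
  SkipListParallelArrays grow(int newSize) {
    int n = skipStarts.length;
    if (newSize < n) {
      throw new IllegalArgumentException("newSize must be >= " + n);
    }
    SkipListParallelArrays next = new SkipListParallelArrays(newSize);
    System.arraycopy(skipStarts, 0, next.skipStarts, 0, n);
    System.arraycopy(skipAddrs, 0, next.skipAddrs, 0, n);
    System.arraycopy(sliceAddrs, 0, next.sliceAddrs, 0, n);
    System.arraycopy(sliceLevels, 0, next.sliceLevels, 0, n);
    System.arraycopy(skipLastDoc, 0, next.skipLastDoc, 0, n);
    System.arraycopy(skipLastAddr, 0, next.skipLastAddr, 0, n);
    return next;
  }
}
```

Under these assumptions each term costs 21 bytes across the six arrays, which is one way to weigh whether an additional parallel array is affordable.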

Regarding writing the start address of the first level 9 posting
slice into the skip list: because we're writing vints into the
posting slices, and a vint may occupy more than 1 byte, we may
(and this has happened in testing) write a vint that spans
slices. If we then record the last slice address and read a vint
from that point, we'll get an incorrect vint. If we instead start
1+ bytes into a slice, we will not know where the slice ends
(because we are assuming slices are 200 bytes in length). Perhaps
in the slice address parallel array we can somehow encode the
first slice's length, or add yet another parallel array for the
length of the first slice.  Something to think about.
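
The failure mode can be reproduced in isolation. The sketch below is a simplified model and not the patch's code: it uses a flat byte array instead of real slices, and assumes a vint encoding of 7 payload bits per byte with the high bit as a continuation flag. The class and method names are hypothetical.

```java
// Simplified model: a flat buffer stands in for two adjacent posting slices.
final class VIntSliceBoundaryDemo {

  // Writes value as a vint at pos and returns the position after the last byte.
  static int writeVInt(byte[] buf, int pos, int value) {
    while ((value & ~0x7F) != 0) {
      buf[pos++] = (byte) ((value & 0x7F) | 0x80); // low 7 bits, continuation bit set
      value >>>= 7;
    }
    buf[pos++] = (byte) value; // final byte, continuation bit clear
    return pos;
  }

  // Reads a vint starting at pos.
  static int readVInt(byte[] buf, int pos) {
    byte b = buf[pos++];
    int value = b & 0x7F;
    for (int shift = 7; (b & 0x80) != 0; shift += 7) {
      b = buf[pos++];
      value |= (b & 0x7F) << shift;
    }
    return value;
  }

  public static void main(String[] args) {
    byte[] buf = new byte[8];
    int end = writeVInt(buf, 0, 300); // 300 needs two bytes: 0xAC, 0x02
    // Pretend the slice boundary falls between byte 0 and byte 1, so the
    // recorded "slice start" address is 1.
    System.out.println("bytes used: " + end);                      // 2
    System.out.println("read from vint start: " + readVInt(buf, 0)); // 300
    System.out.println("read from slice start: " + readVInt(buf, 1)); // 2
  }
}
```

Reading from the recorded slice start decodes only the tail byte of the vint, which is why the address stored in the skip list has to point at a vint boundary or carry enough information to find one.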

  was (Author: jasonrutherglen):
Here are the new parallel arrays.  It seems like something went wrong and 
there are too many, however I think each is required.

{code}
final int[] skipStarts; // address where the term's skip list starts (for reading)
final int[] skipAddrs; // where writing left off
final int[] sliceAddrs; // the start addr of the last posting slice
final byte[] sliceLevels; // posting slice levels
final int[] skipLastDoc; // last skip doc written
final int[] skipLastAddr; // last skip addr written
{code}

In regards to writing into the skip list the start address of
the first level 9 posting slice: Because we're writing vints
into the posting slices, and vints may span more than 1 byte, we
may (and this has happened in testing) write a vint that spans
slices, so if we record the last slice address and read a vint
from that point, we'll get an incorrect vint. If we start 1+
bytes into a slice, we will not know where the slice ends
(because we are assuming they're 200 bytes in length). Perhaps
in the slice address parallel array we can somehow encode the
first slice's length, or add yet another parallel array for the
length of the first slice.  Something to think about.

We can't simply read
ahead 200 bytes (ie, level 9), nor can
  
> Concurrent byte and int block implementations
> -
>
> Key: LUCENE-2575
> URL: https://issues.apache.org/jira/browse/LUCENE-2575
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Affects Versions: Realtime Branch
> Reporter: Jason Rutherglen
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch, 
> LUCENE-2575.patch
>
>
> The current *BlockPool implementations aren't quite concurrent.
> We really need something with a locking flush method, where
> flush is called at the end of adding a document. Once flushed,
> the newly written data would be available to all other reading
> threads (i.e., postings etc.). I'm not sure I understand the slices
> concept; it seems like it'd be easier to implement a seekable,
> random-access-file-like API. One would seek to a given position,
> then read or write from there. The underlying management of byte
> arrays could then be hidden?
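
One possible shape for the API described above, as a minimal sketch and not the patch's design: a single writer thread seeks and writes, and a locking flush() publishes everything written so far to reader threads. The class SeekableBytePool and all of its members are hypothetical, and a single fixed-size array stands in for the real block management.

```java
// Hypothetical sketch of a seekable pool with a publishing flush.
final class SeekableBytePool {
  private final byte[] bytes;        // one fixed block, for simplicity
  private int writePos;              // used only by the single writer thread
  private volatile int flushedLimit; // readers may read positions below this

  SeekableBytePool(int capacity) {
    bytes = new byte[capacity];
  }

  // Writer: move to a position. Overwriting already-flushed bytes is not
  // made safe for concurrent readers in this sketch.
  void seek(int pos) {
    if (pos < 0 || pos > bytes.length) {
      throw new IndexOutOfBoundsException("pos=" + pos);
    }
    writePos = pos;
  }

  // Writer: store one byte at the current position and advance.
  void write(byte b) {
    bytes[writePos++] = b;
  }

  // Writer: called at the end of adding a document. The volatile store
  // makes all bytes written before it visible to readers that observe it.
  synchronized void flush() {
    flushedLimit = Math.max(flushedLimit, writePos);
  }

  // Reader: only flushed positions may be read.
  byte read(int pos) {
    if (pos < 0 || pos >= flushedLimit) {
      throw new IndexOutOfBoundsException("pos=" + pos + " is not flushed");
    }
    return bytes[pos];
  }
}
```

The design choice here is that readers never see partially written documents: a position becomes readable only after the flush that follows it.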

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Issue Comment Edited: (LUCENE-2575) Concurrent byte and int block implementations

2010-09-24 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914607#action_12914607
 ] 

Jason Rutherglen edited comment on LUCENE-2575 at 9/24/10 3:28 PM:
---

The current MultiLevelSkipList* system relies on writing out
fixed-length skip list buffers before they are readable. This
obviously will not work for RT (realtime), so I'm working on
modifying MLSL into new class(es) that write to and read from the
concurrent-ish BBP (ByteBlockPool).

In trunk, each level is a RAMOutputStream. That will need to
change, and each level will likely be a stream keyed into
the BBP. One question is whether we statically assign the
number of levels before creating the MLSL, or whether we
need to make the number of levels dynamic, in which case
using streams becomes slightly more complicated.
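
To make the static-versus-dynamic question concrete, here is a minimal sketch. It assumes the number of levels is the floor of the logarithm, base skipInterval, of the document count, capped at a maximum. The class name, method name and parameters are hypothetical and not taken from the patch.

```java
// Hypothetical sketch: how many skip levels a term with docCount docs needs.
final class SkipLevels {

  // floor(log_skipInterval(docCount)), capped at maxSkipLevels.
  static int numberOfSkipLevels(int docCount, int skipInterval, int maxSkipLevels) {
    if (skipInterval < 2) {
      throw new IllegalArgumentException("skipInterval must be >= 2");
    }
    int levels = 0;
    for (int n = docCount; n >= skipInterval; n /= skipInterval) {
      levels++;
    }
    return Math.min(levels, maxSkipLevels);
  }

  public static void main(String[] args) {
    // With a fixed doc count the level count is known up front.
    System.out.println(numberOfSkipLevels(1000, 16, 10)); // 2
    // In a realtime index the doc count keeps growing, so the answer changes.
    System.out.println(numberOfSkipLevels(15, 16, 10));   // 0
    System.out.println(numberOfSkipLevels(4096, 16, 10)); // 3
  }
}
```

Under this assumption a term that starts with no skip levels gains one each time its document count crosses a power of skipInterval, so a statically sized set of per-level streams would either have to be over-allocated or replaced by something that can grow.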





  