[jira] Updated: (LUCENE-1313) Realtime Search

2009-06-15 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-1313:
---

Fix Version/s: (was: 2.9)
   3.1

OK let's push it to 3.1.  It's very much in progress, but 1) the iterations are 
slow (it's a big patch), 2) it's a biggish change so I'd prefer to do it shortly 
after a release, not shortly before, so it has plenty of time to bake on 
trunk.

 Realtime Search
 ---

 Key: LUCENE-1313
 URL: https://issues.apache.org/jira/browse/LUCENE-1313
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 3.1

 Attachments: LUCENE-1313.jar, LUCENE-1313.patch, LUCENE-1313.patch, 
 LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, 
 LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, 
 LUCENE-1313.patch, LUCENE-1313.patch, lucene-1313.patch, lucene-1313.patch, 
 lucene-1313.patch, lucene-1313.patch


 Enable near realtime search in Lucene without external
 dependencies. When RAM NRT is enabled, the implementation adds a
 RAMDirectory to IndexWriter. Flushes go to the ramdir unless
 there is no available space. Merges are completed in the ram
 dir until there is no more available ram. 
 IW.optimize and IW.commit flush the ramdir to the primary
 directory, all other operations try to keep segments in ram
 until there is no more space.
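
The routing rule in the description can be sketched in plain Java,
without Lucene types. The names below (FlushRouter, Target,
chooseTarget, commit) are hypothetical and do not come from the
patch; the sketch only models "flush to the ram dir while it fits,
otherwise go to the primary directory", and "commit empties the
ram dir".

```java
/** Hypothetical model of the flush-routing rule; not code from the patch. */
final class FlushRouter {
    enum Target { RAM_DIR, PRIMARY_DIR }

    private final long ramDirMaxBytes; // assumed cap on the ram dir size
    private long ramDirUsedBytes;      // bytes currently held in the ram dir

    FlushRouter(long ramDirMaxBytes) {
        this.ramDirMaxBytes = ramDirMaxBytes;
    }

    /** Flushes go to the ram dir unless there is no available space. */
    Target chooseTarget(long segmentBytes) {
        if (ramDirUsedBytes + segmentBytes <= ramDirMaxBytes) {
            ramDirUsedBytes += segmentBytes;
            return Target.RAM_DIR;
        }
        return Target.PRIMARY_DIR;
    }

    /** Models optimize/commit: everything in the ram dir moves to the primary dir. */
    long commit() {
        long moved = ramDirUsedBytes;
        ramDirUsedBytes = 0;
        return moved;
    }

    long ramDirUsedBytes() {
        return ramDirUsedBytes;
    }

    public static void main(String[] args) {
        FlushRouter router = new FlushRouter(100);
        System.out.println(router.chooseTarget(60)); // fits in the ram dir
        System.out.println(router.chooseTarget(60)); // would exceed the cap
        System.out.println(router.commit());         // bytes moved to the primary dir
    }
}
```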

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1313) Realtime Search

2009-06-04 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-1313:
-

Attachment: LUCENE-1313.patch

* RAM buffer size is stored in the writer rather than set into
DocumentsWriter. This is due to the actual ram buffer limit in
NRT changing depending on the size of the ramdir. 

* NRTMergePolicy and IW.resolveRAMSegments merges all ram dir
segments to primaryDir (i.e. disk) when the ramDir is over
totalMax, or any new merges would put ramDir over totalMax.

* In DocumentsWriter we have a set limit on the buffer size
which is (tempMax - ramDirSize)/2. This keeps the total ram used
under the totalMax (or IW.maxBufferSize), while also keeping our
temporary ram usage under the tempMax amount. When DW.ramBuffer
limit is reached, it's auto flushed to the ramDir.

* All tests pass except TestIndexWriterRAMDir.testFSDirectory.
Will look into this further. When flushToRAM is on by default,
there seems to be a deadlock in
org.apache.lucene.TestMergeSchedulerExternal. However, when I
looked for one via jconsole (by setting
ANT_OPTS=-Dcom.sun.management.jmxremote) I didn't see any. I'm
not sure whether that is because I wasn't connected to the right
process, or something else.

* Added testReadDocuments, which ensures we can read documents
we've flushed to disk. This essentially tests our ability to
simultaneously read and write documents to and from the
docstore. It seemed to work on Windows.

* I think there's more that can be done to manage the RAM more
accurately; however, I think the way it works now is a good
starting point.
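
The DocumentsWriter limit from the third bullet above,
(tempMax - ramDirSize)/2, can be written out as a small
calculation. This is only a sketch: the class and method names
are hypothetical, and flooring the result at zero when the ram
dir already exceeds tempMax is an assumption, not something the
comment states.

```java
/** Hypothetical helper modelling the buffer limit; not code from the patch. */
final class NrtBufferLimit {
    /** Buffer limit: (tempMax - ramDirSize) / 2, floored at zero (the floor is an assumption). */
    static long bufferLimit(long tempMaxBytes, long ramDirSizeBytes) {
        return Math.max(0L, (tempMaxBytes - ramDirSizeBytes) / 2);
    }

    /** True when the buffer has reached its limit and would be auto flushed to the ram dir. */
    static boolean shouldFlush(long bufferBytes, long tempMaxBytes, long ramDirSizeBytes) {
        return bufferBytes >= bufferLimit(tempMaxBytes, ramDirSizeBytes);
    }

    public static void main(String[] args) {
        // As the ram dir grows, the buffer limit shrinks.
        System.out.println(bufferLimit(64, 0));
        System.out.println(bufferLimit(64, 16));
        System.out.println(bufferLimit(64, 48));
    }
}
```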



 Realtime Search
 ---

 Key: LUCENE-1313
 URL: https://issues.apache.org/jira/browse/LUCENE-1313
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1313.jar, LUCENE-1313.patch, LUCENE-1313.patch, 
 LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, 
 LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, 
 LUCENE-1313.patch, LUCENE-1313.patch, lucene-1313.patch, lucene-1313.patch, 
 lucene-1313.patch, lucene-1313.patch


 Enable near realtime search in Lucene without external
 dependencies. When RAM NRT is enabled, the implementation adds a
 RAMDirectory to IndexWriter. Flushes go to the ramdir unless
 there is no available space. Merges are completed in the ram
 dir until there is no more available ram. 
 IW.optimize and IW.commit flush the ramdir to the primary
 directory, all other operations try to keep segments in ram
 until there is no more space.




[jira] Updated: (LUCENE-1313) Realtime Search

2009-05-19 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-1313:
-

Attachment: LUCENE-1313.patch

 * All tests pass, added more tests

* Added DocumentsWriter.growRamBufferBy/growRamDirMaxBy methods
that allow dynamically requesting more ram. We start off at
50/50, ramdir/rambuffer. Then whenever one needs more, grow* is
called.  

* We need a RAMPolicy class that allows customizing how ram is
allocated. Currently the ramdir and the rambuffer compete for
space; the user will presumably want to customize this.

* I'm not sure the flushing always occurs when it should, and
I'm not sure yet how to test that it does (other than watching
a log). What happened to the patch that adds logging to
Lucene?
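
The 50/50 starting split and the grow* calls described above could
be modelled roughly as follows. The method names follow the
comment, but the class (RamBudget) and the behaviour are
assumptions: here, growing one side simply takes unused allowance
from the other, and the call returns how much was actually
granted.

```java
/** Hypothetical model of a RAM budget shared by the ram buffer and the ram dir. */
final class RamBudget {
    private long ramBufferMaxBytes;
    private long ramDirMaxBytes;

    RamBudget(long totalMaxBytes) {
        // Start off at 50/50, ramdir/rambuffer.
        this.ramBufferMaxBytes = totalMaxBytes / 2;
        this.ramDirMaxBytes = totalMaxBytes - ramBufferMaxBytes;
    }

    /** Moves up to {@code bytes} of allowance from the ram dir to the buffer; returns the amount moved. */
    long growRamBufferBy(long bytes) {
        long granted = Math.max(0L, Math.min(bytes, ramDirMaxBytes));
        ramDirMaxBytes -= granted;
        ramBufferMaxBytes += granted;
        return granted;
    }

    /** Moves up to {@code bytes} of allowance from the buffer to the ram dir; returns the amount moved. */
    long growRamDirMaxBy(long bytes) {
        long granted = Math.max(0L, Math.min(bytes, ramBufferMaxBytes));
        ramBufferMaxBytes -= granted;
        ramDirMaxBytes += granted;
        return granted;
    }

    long ramBufferMaxBytes() { return ramBufferMaxBytes; }

    long ramDirMaxBytes() { return ramDirMaxBytes; }

    public static void main(String[] args) {
        RamBudget budget = new RamBudget(100);
        budget.growRamBufferBy(20);
        System.out.println(budget.ramBufferMaxBytes() + " / " + budget.ramDirMaxBytes());
    }
}
```

A RAMPolicy, as proposed in the comment, would presumably replace
the fixed "take from the other side" rule with something the user
can customize.
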

 Realtime Search
 ---

 Key: LUCENE-1313
 URL: https://issues.apache.org/jira/browse/LUCENE-1313
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1313.jar, LUCENE-1313.patch, LUCENE-1313.patch, 
 LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, 
 LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, 
 LUCENE-1313.patch, lucene-1313.patch, lucene-1313.patch, lucene-1313.patch, 
 lucene-1313.patch


 Realtime search with transactional semantics.  
 Possible future directions:
   * Optimistic concurrency
   * Replication
 Encoding each transaction into a set of bytes by writing to a RAMDirectory 
 enables replication.  It is difficult to replicate using other methods 
 because while the document may easily be serialized, the analyzer cannot.
 I think this issue can hold realtime benchmarks which include indexing and 
 searching concurrently.




[jira] Updated: (LUCENE-1313) Realtime Search

2009-05-19 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-1313:
-

Description: 
Enable near realtime search in Lucene without external
dependencies. When RAM NRT is enabled, the implementation adds a
RAMDirectory to IndexWriter. Flushes go to the ramdir unless
there is no available space. Merges are completed in the ram
dir until there is no more available ram. 

IW.optimize and IW.commit flush the ramdir to the primary
directory, all other operations try to keep segments in ram
until there is no more space.

  was:
Realtime search with transactional semantics.  

Possible future directions:
  * Optimistic concurrency
  * Replication

Encoding each transaction into a set of bytes by writing to a RAMDirectory 
enables replication.  It is difficult to replicate using other methods because 
while the document may easily be serialized, the analyzer cannot.

I think this issue can hold realtime benchmarks which include indexing and 
searching concurrently.


 Realtime Search
 ---

 Key: LUCENE-1313
 URL: https://issues.apache.org/jira/browse/LUCENE-1313
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1313.jar, LUCENE-1313.patch, LUCENE-1313.patch, 
 LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, 
 LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, 
 LUCENE-1313.patch, lucene-1313.patch, lucene-1313.patch, lucene-1313.patch, 
 lucene-1313.patch


 Enable near realtime search in Lucene without external
 dependencies. When RAM NRT is enabled, the implementation adds a
 RAMDirectory to IndexWriter. Flushes go to the ramdir unless
 there is no available space. Merges are completed in the ram
 dir until there is no more available ram. 
 IW.optimize and IW.commit flush the ramdir to the primary
 directory, all other operations try to keep segments in ram
 until there is no more space.




[jira] Updated: (LUCENE-1313) Realtime Search

2009-05-11 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-1313:
-

Attachment: LUCENE-1313.patch

* A single merge scheduler is used. We will need to open a new
issue for a version of ConcurrentMergeScheduler that allocates
threads perhaps based on the merge.directory? We'd also probably
want to add thread pooling.

* There's a package protected IW ctor that accepts the ram dir.
This is used in the test case for ensuring we aren't creating
.cfs files in the ram dir.

* IW.optimize merges all segments (ram included) to the primary
dir

* IW.expungeDeletes merges segments with deletes, in ram ones
stay in ram (unless they won't fit), and primary dir ones are
handled as usual

* Added testOptimize, testExpungeDeletes, and some other test
cases

* Needs a test case to make sure we're merging to the primary
dir when the ram dir is full or a flush won't fit in the ram dir

* There's a mergeRamSegmentsToDir and resolveRamSegments. Two
different methods because mergeRamSegmentsToDir operates by
simply scheduling merges, resolveRamSegments operates in the
foreground like resolveExternalSegments. I'm not sure if we can
combine the two. resolveRamSegments seems to have a thread
notification problem and so hangs at times. I'll look into this
further unless it's obvious what the problem is.

* When RAM NRT is on (via the IndexWriter constructor), setting
the ram buffer size allocates half of the given number to the
DocumentsWriter buffer and half to the ram dir. It may be best
to dynamically change these numbers based on usage etc.

* Added NRTMergePolicy which is used only when RAM NRT is on. It
utilizes the regular merge policy and the ram merge policy.

* The ram dir size is pushed to DocumentsWriter

* RAMMergePolicy extends LogDocMergePolicy and defaults the
useCompoundFile and useCompoundDocStore to false

* Sorry for the whitespace stuff, I'll clean it up later, I
wanted to post the latest to get feedback


 Realtime Search
 ---

 Key: LUCENE-1313
 URL: https://issues.apache.org/jira/browse/LUCENE-1313
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1313.jar, LUCENE-1313.patch, LUCENE-1313.patch, 
 LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, 
 LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, 
 lucene-1313.patch, lucene-1313.patch, lucene-1313.patch, lucene-1313.patch


 Realtime search with transactional semantics.  
 Possible future directions:
   * Optimistic concurrency
   * Replication
 Encoding each transaction into a set of bytes by writing to a RAMDirectory 
 enables replication.  It is difficult to replicate using other methods 
 because while the document may easily be serialized, the analyzer cannot.
 I think this issue can hold realtime benchmarks which include indexing and 
 searching concurrently.




[jira] Updated: (LUCENE-1313) Realtime Search

2009-05-04 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-1313:
-

Attachment: LUCENE-1313.patch

* In DocumentsWriter.balanceRAM if NRT is on the total ram
consumed is (numBytesUsed * 2) + writer.getRamDirSize().
numBytesUsed is the current consumption of the ram buffer.
Basically what we flush to ram, we'll consume that much of the
buffer. This is now taken into account in the bufferIsFull
calculation.

* Double dir usage should be factored out.

* TestIndexWriterRamDir.testFSDirectory fails. It tries to
simulate a crashing IW. When the IW is created again it should
delete the old files, but for some reason it does not with
FSDirectory (perhaps because of open file handles on Windows).
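
The accounting in the first bullet above can be spelled out as a
small calculation. The formula (numBytesUsed * 2) + ramDirSize is
from the comment; the class name, the maxRamBytes parameter, and
the use of >= for the comparison are assumptions made for this
sketch.

```java
/** Hypothetical model of the NRT RAM accounting; not DocumentsWriter's actual code. */
final class NrtRamAccounting {
    /** Total RAM consumed when NRT is on: (numBytesUsed * 2) + ramDirSize. */
    static long totalRamConsumed(long numBytesUsed, long ramDirSizeBytes) {
        return numBytesUsed * 2 + ramDirSizeBytes;
    }

    /** Assumed form of the bufferIsFull check: compare the total against a configured maximum. */
    static boolean bufferIsFull(long numBytesUsed, long ramDirSizeBytes, long maxRamBytes) {
        return totalRamConsumed(numBytesUsed, ramDirSizeBytes) >= maxRamBytes;
    }

    public static void main(String[] args) {
        System.out.println(totalRamConsumed(10, 5));
        System.out.println(bufferIsFull(10, 5, 32));
    }
}
```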

{quote} we could flush the new segment directly to the real dir
as one segment, and merge all prior RAM segments as a separate
new segment in the main dir, if the free RAM is large enough.
{quote}

Yeah it's unclear what the best policy is here. Do we want to
have some sort of custom merge policy method/class to take care
of this so the user can customize it?


 Realtime Search
 ---

 Key: LUCENE-1313
 URL: https://issues.apache.org/jira/browse/LUCENE-1313
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1313.jar, LUCENE-1313.patch, LUCENE-1313.patch, 
 LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, 
 LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, lucene-1313.patch, 
 lucene-1313.patch, lucene-1313.patch, lucene-1313.patch


 Realtime search with transactional semantics.  
 Possible future directions:
   * Optimistic concurrency
   * Replication
 Encoding each transaction into a set of bytes by writing to a RAMDirectory 
 enables replication.  It is difficult to replicate using other methods 
 because while the document may easily be serialized, the analyzer cannot.
 I think this issue can hold realtime benchmarks which include indexing and 
 searching concurrently.




[jira] Updated: (LUCENE-1313) Realtime Search

2009-05-01 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-1313:
-

Attachment: LUCENE-1313.patch

* IndexFileDeleter takes into account the ram directory (which
when using NRT with the FSD caused files to not be found). 

* FSD is included and writes fdx, fdt, tvx, tvf, tvd extension
files to the primary directory (which is the same as
IW.directory). LUCENE-1618 needs to be updated with these
changes (or we simply include it in this patch as the
LUCENE-1618 patch is only a couple of files).

* Removed DocumentsWriter.ramOverLimit

* I think we need to give the option of a ram mergescheduler
because the user may not want the ram merging and disk
merging to compete for threads. I'm thinking of the use case
where NRT is a priority, so one may allocate more threads to
the ram CMS and fewer to the disk CMS. This also gives us the
option of trying out more parameters when performing benchmarks
of NRT.

* We may want to default the ram mergepolicy to not use compound
files as it's not useful when using a ram dir?

* Because FSD uses IW.directory, FSD will list files that
originated from FSD and from IW.directory, we may want to keep
track of which files are supposed to be in FSD (from the
underlying primary dir) and which are not?

{quote}If NRT is never used, the behavior of IW should be
unchanged (which is not the case w/ this patch I think). RAMDir
should be created the first time a flush is done due to NRT
creation. {quote}

In the patch if ramdir is not passed in, the behavior of IW
remains the same as it is today. You're saying we should have IW
create the ramdir by default after getReader is called and
remove the IW ramdir constructor? What if the user has an
alternative ramdir implementation they want to use?

{quote}StoredFieldsWriter & TermVectorsTermsWriter now writes to
IndexWriter.getFlushDirectory(), which is confusing because that
method returns the RAMDir if set? Shouldn't this be the
opposite? (Ie it should flush to IndexWriter.getDirectory()? Or
we should change getFlushDirectory to NOT return the
ramdir?){quote}

The attached patch uses FileSwitchDirectory, where these files
are written to the primary directory (IW.directory). So
getFlushDirectory is ok?

{quote}Why did you need to add synchronized to some of the
SegmentInfo files methods? (What breaks if you undo that?). The
contract here is IW protects access to SegmentInfo/s{quote}

SegmentInfo.files was being cleared while sizeInBytes was called,
which resulted in an NPE. The alternative is to sync on IW in
IW.size(SegmentInfos), which seems a bit extreme just to obtain
the size of a segment info?
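
The kind of race described above can be illustrated with a small
stand-in class. This is not Lucene's SegmentInfo: CachedFiles, its
fields, and the placeholder file names are all hypothetical. The
sketch assumes the NPE came from a cached list being set to null
by one thread while another was iterating it, and shows how
putting the methods under one lock rules that out.

```java
/** Hypothetical stand-in for a segment's cached file list; not Lucene's SegmentInfo. */
final class CachedFiles {
    private String[] files; // lazily built cache; null means "not computed"

    /** Returns the cached list, rebuilding it if it was cleared. */
    synchronized String[] files() {
        if (files == null) {
            files = new String[] { "_0.fdt", "_0.fdx" }; // placeholder names
        }
        return files;
    }

    /** Drops the cache, for example after the segment's files change. */
    synchronized void clearFiles() {
        files = null;
    }

    /**
     * Sums a per-file size. Because this method holds the same lock as
     * clearFiles, the cache cannot be nulled in the middle of the loop.
     */
    synchronized long sizeInBytes() {
        long sum = 0;
        for (String name : files()) {
            sum += name.length(); // placeholder for a real file-length lookup
        }
        return sum;
    }

    public static void main(String[] args) {
        CachedFiles info = new CachedFiles();
        System.out.println(info.sizeInBytes());
        info.clearFiles();
        System.out.println(info.sizeInBytes()); // cache is rebuilt on demand
    }
}
```

Locking on the object itself keeps the critical section small,
which is the trade-off the reply argues for over synchronizing on
the whole IndexWriter.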

{quote}The MergePolicy needs some smarts when it's dealing w/
RAM. EG it should not do a merge of more than XXX% of total RAM
usage (should flush to the real directory instead){quote}

Isn't this handled well enough in updatePendingMerges or is
there more that needs to be done?

 Realtime Search
 ---

 Key: LUCENE-1313
 URL: https://issues.apache.org/jira/browse/LUCENE-1313
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1313.jar, LUCENE-1313.patch, LUCENE-1313.patch, 
 LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, 
 LUCENE-1313.patch, LUCENE-1313.patch, lucene-1313.patch, lucene-1313.patch, 
 lucene-1313.patch, lucene-1313.patch


 Realtime search with transactional semantics.  
 Possible future directions:
   * Optimistic concurrency
   * Replication
 Encoding each transaction into a set of bytes by writing to a RAMDirectory 
 enables replication.  It is difficult to replicate using other methods 
 because while the document may easily be serialized, the analyzer cannot.
 I think this issue can hold realtime benchmarks which include indexing and 
 searching concurrently.




[jira] Updated: (LUCENE-1313) Realtime Search

2009-04-30 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-1313:
-

Attachment: LUCENE-1313.patch

{quote} Would you re-use MergePolicy, or make a new
RAMMergePolicy? {quote}

MergePolicy is used as is, with a special IW method that handles
merging ram segments to the real directory (which has an issue
around merging contiguous segments; can that restriction be
relaxed in this case? I don't understand why it exists).

The patch is not committable, however I am posting it to show a
path that seems to work. It includes test cases for merging in
ram and merging to the real directory.

* IW.getFlushDirectory is used by internal calls to obtain the
directory to flush segments to. This is used in DocumentsWriter
related calls.

* DocumentsWriter.directory is removed so that methods requiring
the directory call IW.getFlushDirectory instead.

* IW.setRAMDirectory sets the ram directory to be used.

* IW.setRAMMergePolicy sets the merge policy to be used for
merging segments on the ram dir.

* In IW.updatePendingMerges totalRamUsed is the size of the ram
segments + the ram buffer used. If totalRamUsed exceeds the max
ram buffer size then IW.updatePendingRamMergesToRealDir is
called.

* IW.updatePendingRamMergesToRealDir registers a merge of the
ram segments to the real directory (currently causes a
non-contiguous segments exception)

* MergePolicy.OneMerge has a directory attribute used when
building the merge.info in _mergeInit.

* Test case includes testMergeInRam, testMergeToDisk,
testMergeRamExceeded

There is one error that occurs regularly in testMergeRamExceeded
{code} MergePolicy selected non-contiguous segments to merge
(_bo:cx83 _bm:cx4 _bn:cx2 _bl:cx1-_bj _bp:cx1-_bp _bq:cx1-_bp
_c2:cx1-_c2 _c3:cx1-_c2 _c4:cx1-_c2 vs _5x:c120 _6a:c8
_6t:c11 _bo:cx83** _bm:cx4** _bn:cx2** _bl:cx1-_bj**
_bp:cx1-_bp** _bq:cx1-_bp** _c1:c10 _c2:cx1-_c2**
_c3:cx1-_c2** _c4:cx1-_c2**), which IndexWriter (currently)
cannot handle {code} 
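
To make the error above concrete: in the full segment list, the
non-RAM segment _c1 sits between the selected RAM segments, so the
selection is not one unbroken run. A contiguity check of that kind
could look like the sketch below. It is written for illustration
only (the Contiguity class is hypothetical, not IndexWriter's
code) and assumes segment names are unique.

```java
/** Hypothetical contiguity check, for illustration; not IndexWriter's actual code. */
final class Contiguity {
    /** True if {@code selected} appears as one unbroken run, in order, inside {@code all}. */
    static boolean isContiguous(String[] all, String[] selected) {
        if (selected.length == 0) {
            return true;
        }
        // Find where the first selected segment sits in the full list.
        int start = -1;
        for (int i = 0; i < all.length; i++) {
            if (all[i].equals(selected[0])) {
                start = i;
                break;
            }
        }
        if (start < 0 || start + selected.length > all.length) {
            return false;
        }
        // Every following selected segment must be the next one in the full list.
        for (int j = 0; j < selected.length; j++) {
            if (!all[start + j].equals(selected[j])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] all = { "_5x", "_6a", "_bo", "_bm", "_c1", "_c2" };
        System.out.println(isContiguous(all, new String[] { "_bo", "_bm" }));
        System.out.println(isContiguous(all, new String[] { "_bo", "_bm", "_c2" }));
    }
}
```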

 Realtime Search
 ---

 Key: LUCENE-1313
 URL: https://issues.apache.org/jira/browse/LUCENE-1313
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1313.jar, LUCENE-1313.patch, LUCENE-1313.patch, 
 LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, lucene-1313.patch, 
 lucene-1313.patch, lucene-1313.patch, lucene-1313.patch


 Realtime search with transactional semantics.  
 Possible future directions:
   * Optimistic concurrency
   * Replication
 Encoding each transaction into a set of bytes by writing to a RAMDirectory 
 enables replication.  It is difficult to replicate using other methods 
 because while the document may easily be serialized, the analyzer cannot.
 I think this issue can hold realtime benchmarks which include indexing and 
 searching concurrently.




[jira] Updated: (LUCENE-1313) Realtime Search

2009-04-30 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-1313:
-

Attachment: LUCENE-1313.patch

Fixed and cleaned up more.

All tests pass

Added entry in CHANGES.txt

I'm going to integrate LUCENE-1618 and test that out as a part of the next 
patch.

 Realtime Search
 ---

 Key: LUCENE-1313
 URL: https://issues.apache.org/jira/browse/LUCENE-1313
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1313.jar, LUCENE-1313.patch, LUCENE-1313.patch, 
 LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, 
 LUCENE-1313.patch, lucene-1313.patch, lucene-1313.patch, lucene-1313.patch, 
 lucene-1313.patch


 Realtime search with transactional semantics.  
 Possible future directions:
   * Optimistic concurrency
   * Replication
 Encoding each transaction into a set of bytes by writing to a RAMDirectory 
 enables replication.  It is difficult to replicate using other methods 
 because while the document may easily be serialized, the analyzer cannot.
 I think this issue can hold realtime benchmarks which include indexing and 
 searching concurrently.




[jira] Updated: (LUCENE-1313) Realtime Search

2009-04-17 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-1313:
-

Attachment: LUCENE-1313.patch

I added an IndexWriter.getRAMIndex method that returns a
RAMIndex object that can be updated and flushed to the
underlying writer. I think this is better than adding more
methods to IndexWriter and it separates out the logic of the RAM
based near realtime index and the rest of IW.

Package protected IW.addIndexesNoOptimize(DirectoryIndexReader[]
readers) is added which is used by RAMIndex.flush. I thought
this functionality could work for LUCENE-1589 as a public
method, however because of the way IndexWriter performs merges
using segment infos, handling generic IndexReader classes (which
may not use segmentinfos) would then be difficult in the
addIndexesNoOptimize case.

I think RAMIndex.flush to the underlying writer is not
synchronized. If the IW is using ConcurrentMergeScheduler then
the heavy lifting is performed in the background and so should
not delay adding more documents to the RAMIndex.

IW.getReader returns the normal IW reader and the RAMIndex
reader if there is one.

The RAMIndex writer can be obtained and modified directly as
opposed to duplicating the setter methods of IndexWriter such as
setMergeScheduler.

 Realtime Search
 ---

 Key: LUCENE-1313
 URL: https://issues.apache.org/jira/browse/LUCENE-1313
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1313.jar, LUCENE-1313.patch, LUCENE-1313.patch, 
 LUCENE-1313.patch, lucene-1313.patch, lucene-1313.patch, lucene-1313.patch, 
 lucene-1313.patch


 Realtime search with transactional semantics.  
 Possible future directions:
   * Optimistic concurrency
   * Replication
 Encoding each transaction into a set of bytes by writing to a RAMDirectory 
 enables replication.  It is difficult to replicate using other methods 
 because while the document may easily be serialized, the analyzer cannot.
 I think this issue can hold realtime benchmarks which include indexing and 
 searching concurrently.




[jira] Updated: (LUCENE-1313) Realtime Search

2009-04-07 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-1313:
-

Attachment: LUCENE-1313.jar

Latest realtime code, transactions are removed. 

* Needs to be benchmarked

* There could be concurrency issues around deletes that occur
while directories are being flushed to disk. 

* It's packaged as a Java JAR to include the files and directory
structure. The patch relies on LUCENE-1516, which if included
would make the changes incomprehensible





 Realtime Search
 ---

 Key: LUCENE-1313
 URL: https://issues.apache.org/jira/browse/LUCENE-1313
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1313.jar, LUCENE-1313.patch, LUCENE-1313.patch, 
 lucene-1313.patch, lucene-1313.patch, lucene-1313.patch, lucene-1313.patch


 Realtime search with transactional semantics.  
 Possible future directions:
   * Optimistic concurrency
   * Replication
 Encoding each transaction into a set of bytes by writing to a RAMDirectory 
 enables replication.  It is difficult to replicate using other methods 
 because while the document may easily be serialized, the analyzer cannot.
 I think this issue can hold realtime benchmarks which include indexing and 
 searching concurrently.




[jira] Updated: (LUCENE-1313) Realtime Search

2009-04-01 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-1313:
-

  Component/s: (was: contrib/*)
   Index
Fix Version/s: 2.9
 Priority: Minor  (was: Major)
  Description: 
Realtime search with transactional semantics.  

Possible future directions:
  * Optimistic concurrency
  * Replication

Encoding each transaction into a set of bytes by writing to a RAMDirectory 
enables replication.  It is difficult to replicate using other methods because 
while the document may easily be serialized, the analyzer cannot.

I think this issue can hold realtime benchmarks which include indexing and 
searching concurrently.

  was:
Provides realtime search using Lucene.  Conceptually, updates are divided into 
discrete transactions.  The transaction is recorded to a transaction log which 
is similar to the mysql bin log.  Deletes from the transaction are made to the 
existing indexes.  Document additions are made to an in memory 
InstantiatedIndex.  The transaction is then complete.  After each transaction 
TransactionSystem.getSearcher() may be called which allows searching over the 
index including the latest transaction.

TransactionSystem is the main class.  Methods similar to IndexWriter are 
provided for updating.  getSearcher returns a Searcher class. 

- getSearcher()
- addDocument(Document document)
- addDocument(Document document, Analyzer analyzer)
- updateDocument(Term term, Document document)
- updateDocument(Term term, Document document, Analyzer analyzer)
- deleteDocument(Term term)
- deleteDocument(Query query)
- commitTransaction(List<Document> documents, Analyzer analyzer, List<Term> 
deleteByTerms, List<Query> deleteByQueries)

Sample code:

{code}
// setup
FSDirectoryMap directoryMap = new FSDirectoryMap(new File("/testocean"), "log");
LogDirectory logDirectory = directoryMap.getLogDirectory();
TransactionLog transactionLog = new TransactionLog(logDirectory);
TransactionSystem system = new TransactionSystem(transactionLog,
    new SimpleAnalyzer(), directoryMap);

// transaction
Document d = new Document();
d.add(new Field("contents", "hello world", Field.Store.YES,
    Field.Index.TOKENIZED));
system.addDocument(d);

// search (query is assumed to be built elsewhere)
OceanSearcher searcher = system.getSearcher();
ScoreDoc[] hits = searcher.search(query, null, 1000).scoreDocs;
System.out.println(hits.length + " total results");
for (int i = 0; i < hits.length && i < 10; i++) {
  Document hit = searcher.doc(hits[i].doc);
  System.out.println(i + " " + hits[i].score + " " + hit.get("contents"));
}
{code}

There is a test class org.apache.lucene.ocean.TestSearch that was used for 
basic testing.  

A sample disk directory structure is as follows:

|/snapshot_105_00.xml | XML file containing which indexes and their generation 
numbers correspond to a snapshot.  Each transaction creates a new snapshot 
file.  In this file the 105 is the snapshotid, also known as the transactionid. 
 The 00 is the minor version of the snapshot corresponding to a merge.  A merge 
is a minor snapshot version because the data does not change, only the 
underlying structure of the index|
|/3 | Directory containing an on disk Lucene index|
|/log | Directory containing log files|
|/log/log0001.bin | Log file.  As new log files are created the suffix 
number is incremented|



Affects Version/s: 2.4.1
  Summary: Realtime Search  (was: Ocean Realtime Search)

 Realtime Search
 ---

 Key: LUCENE-1313
 URL: https://issues.apache.org/jira/browse/LUCENE-1313
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1313.patch, lucene-1313.patch, lucene-1313.patch, 
 lucene-1313.patch, lucene-1313.patch


 Realtime search with transactional semantics.  
 Possible future directions:
   * Optimistic concurrency
   * Replication
 Encoding each transaction into a set of bytes by writing to a RAMDirectory 
 enables replication.  It is difficult to replicate using other methods 
 because while the document may easily be serialized, the analyzer cannot.
 I think this issue can hold realtime benchmarks which include indexing and 
 searching concurrently.




[jira] Updated: (LUCENE-1313) Realtime Search

2009-04-01 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-1313:
-

Attachment: LUCENE-1313.patch

The patch includes RealtimeIndex, a basic class for performing atomic
transactional realtime indexing and search. A single thread
periodically flushes the ram index to disk. It relies on
LUCENE-1516.

We need to benchmark this, specifically 1) realtime w/ramdir
transaction 2) realtime w/queued documents transaction 3) normal
indexing. Realtime w/ramdir encodes the transaction to a
RAMDirectory which is added to the RAM writer using
IW.addIndexesNoOptimize. Option 1 may be slower than option 2,
however if the system is replicating it may be the only option?

Long term I believe we need to implement searching over the
IndexWriter ram buffer (if possible). However I am not sure how
option 2 would work with it?

 Realtime Search
 ---

 Key: LUCENE-1313
 URL: https://issues.apache.org/jira/browse/LUCENE-1313
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1313.patch, LUCENE-1313.patch, lucene-1313.patch, 
 lucene-1313.patch, lucene-1313.patch, lucene-1313.patch


 Realtime search with transactional semantics.  
 Possible future directions:
   * Optimistic concurrency
   * Replication
 Encoding each transaction into a set of bytes by writing to a RAMDirectory 
 enables replication.  It is difficult to replicate using other methods 
 because while the document may easily be serialized, the analyzer cannot.
 I think this issue can hold realtime benchmarks which include indexing and 
 searching concurrently.
