
[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field

2012-06-08 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291803#comment-13291803
 ] 

Jason Rutherglen commented on SOLR-2242:


Terrance, can you post a patch to the Jira?  It makes sense to start this Jira 
off non-distributed, and add a distributed version in another Jira issue...

 Get distinct count of names for a facet field
 -

 Key: SOLR-2242
 URL: https://issues.apache.org/jira/browse/SOLR-2242
 Project: Solr
  Issue Type: New Feature
  Components: Response Writers
Affects Versions: 4.0
Reporter: Bill Bell
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2242-3x.patch, SOLR-2242-3x_5_tests.patch, 
 SOLR-2242-solr40-3.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.patch, 
 SOLR-2242.shard.withtests.patch, SOLR-2242.solr3.1-fix.patch, 
 SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch


 When returning facet.field=<name of field> you will get a list of matches for
 distinct values. This is normal behavior. This patch tells you how many
 distinct values you have (# of rows). Use with facet.limit=-1 and facet.mincount=1.
 The feature is called namedistinct. Here are some examples:
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=2&facet.limit=-1&facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=0&facet.limit=-1&facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=1&facet.limit=-1&facet.field=price
 This currently only works on facet.field.
 {code}
 <lst name="facet_fields">
   <lst name="price">
     <int name="numFacetTerms">14</int>
     <int name="0.0">3</int>
     <int name="11.5">1</int>
     <int name="19.95">1</int>
     <int name="74.99">1</int>
     <int name="92.0">1</int>
     <int name="179.99">1</int>
     <int name="185.0">1</int>
     <int name="279.95">1</int>
     <int name="329.95">1</int>
     <int name="350.0">1</int>
     <int name="399.0">1</int>
     <int name="479.95">1</int>
     <int name="649.99">1</int>
     <int name="2199.0">1</int>
   </lst>
 </lst>
 {code} 
 Several people use this to get the group.field count (the # of groups).
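
A minimal sketch of what the returned count represents, assuming the per-term document counts for one field are already available as a map. DistinctFacetCount and its methods are hypothetical helpers for illustration, not the patch's implementation. With mincount=1 the count is the number of terms that pass the mincount filter, which for the sample response above is 14.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: counts distinct facet terms for one field. */
final class DistinctFacetCount {

    /** Number of terms whose document count is at least mincount. */
    static int numFacetTerms(Map<String, Integer> termCounts, int mincount) {
        int distinct = 0;
        for (int count : termCounts.values()) {
            if (count >= mincount) {
                distinct++;
            }
        }
        return distinct;
    }

    /** The price counts from the sample response, for illustration. */
    static Map<String, Integer> samplePriceCounts() {
        Map<String, Integer> counts = new TreeMap<>();
        counts.put("0.0", 3);
        String[] singles = {"11.5", "19.95", "74.99", "92.0", "179.99", "185.0",
                "279.95", "329.95", "350.0", "399.0", "479.95", "649.99", "2199.0"};
        for (String price : singles) {
            counts.put(price, 1);
        }
        return counts;
    }
}
```

Because every term has to be visited, the count is only meaningful when no facet.limit truncation is applied, which is presumably why the description recommends facet.limit=-1.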

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Resolved] (SOLR-2569) Enable facile moving of cores

2012-06-08 Thread Jason Rutherglen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen resolved SOLR-2569.


Resolution: Won't Fix

 Enable facile moving of cores
 -

 Key: SOLR-2569
 URL: https://issues.apache.org/jira/browse/SOLR-2569
 Project: Solr
  Issue Type: Improvement
  Components: multicore, replication (java)
Affects Versions: 4.0
Reporter: Jason Rutherglen

 Spin-off from this thread: 
 http://search-lucene.com/m/5CO7Z1oOrh6/elastic+search&subj=Solr+vs+ElasticSearch


[jira] [Commented] (LUCENE-3441) Add NRT support to LuceneTaxonomyReader

2011-09-21 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109599#comment-13109599
 ] 

Jason Rutherglen commented on LUCENE-3441:
--

It would be great to know what the cost of (re)opening a new LTR is, along with
an explanation of what it's doing underneath.

 Add NRT support to LuceneTaxonomyReader
 ---

 Key: LUCENE-3441
 URL: https://issues.apache.org/jira/browse/LUCENE-3441
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/facet
Reporter: Shai Erera
Priority: Minor

 Currently LuceneTaxonomyReader does not support NRT - i.e., on changes to 
 LuceneTaxonomyWriter, you cannot have the reader updated, like 
 IndexReader/Writer. In order to do that we need to do the following:
 # Add ctor to LuceneTaxonomyReader to allow you to instantiate it with 
 LuceneTaxonomyWriter.
 # Add API to LuceneTaxonomyWriter to expose its internal IndexReader
 # Change LTR.refresh() to return an LTR, rather than void. This is actually 
 not strictly related to that issue, but since we'll need to modify refresh() 
 impl, I think it'll be good to change its API as well. Since all of the facet API
 is @lucene.experimental, there are no backwards-compatibility issues here (and the sooner we do it,
 the better).
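
The proposed refresh() change mirrors the reopen pattern of IndexReader, where a call returns either the same instance (nothing changed) or a new point-in-time reader. A minimal sketch of that pattern, assuming a toy writer that simply appends category paths; ToyTaxonomyWriter and SnapshotReader are hypothetical classes, not the facet module's API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a taxonomy writer: an append-only list of category paths. */
final class ToyTaxonomyWriter {
    private final List<String> categories = new ArrayList<>();
    private long version = 0;

    synchronized void addCategory(String path) {
        categories.add(path);
        version++;
    }

    /** Opens a reader directly from the writer (the NRT case) by copying its current state. */
    synchronized SnapshotReader openReader() {
        return new SnapshotReader(this, version, new ArrayList<>(categories));
    }

    synchronized long version() {
        return version;
    }
}

/** Hypothetical point-in-time reader whose refresh() returns a reader instead of void. */
final class SnapshotReader {
    private final ToyTaxonomyWriter writer;
    private final long version;
    private final List<String> categories;

    SnapshotReader(ToyTaxonomyWriter writer, long version, List<String> categories) {
        this.writer = writer;
        this.version = version;
        this.categories = Collections.unmodifiableList(categories);
    }

    int size() {
        return categories.size();
    }

    /** Returns this reader if nothing changed, otherwise a new one; this reader stays valid. */
    SnapshotReader refresh() {
        return writer.version() == version ? this : writer.openReader();
    }
}
```

Returning a reader, rather than mutating the existing one, lets callers keep using the old reader until they choose to switch.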


[jira] [Commented] (SOLR-2778) Revise distributed code inside SearchComponents

2011-09-19 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108316#comment-13108316
]

Jason Rutherglen commented on SOLR-2778:

Sweet-ness.com!

Revise distributed code inside SearchComponents
---

Key: SOLR-2778
URL: https://issues.apache.org/jira/browse/SOLR-2778
Project: Solr
Issue Type: Improvement
Reporter: Martijn van Groningen

The distributed code inside search components such as QueryComponent and
FacetComponent is complex. By separating responsibilities more clearly,
the code becomes less complex and easier to understand. There is already a
start for this that was part of distributed grouping (SOLR-2066).
The following concepts were developed inside QueryComponent for SOLR-2066:
* ShardRequestFactory is responsible for creating requests to shards in the
cluster based on the incoming request from the client.
* ShardResultTransformer transforms a NamedList shard response into, for
example, a SearchGroup or TopDocs instance.
* ShardResponseProcessor basically merges the shard responses. The
ShardResponseProcessor uses a ShardResultTransformer to transform the shard
response into a native structure (SearchGroup / TopGroups).
These concepts are currently only used for distributed grouping, but I think they
can also be used for non-grouped distributed search.
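
A rough sketch of how the three responsibilities could be separated, using simplified, hypothetical signatures: plain Maps stand in for Solr's request parameters and NamedList responses, and a List<String> of group values stands in for SearchGroup/TopGroups. The interface names follow the concepts listed above, but the methods are assumptions, not the actual SOLR-2066 code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Creates one request per shard from the client's incoming request (hypothetical signature). */
interface ShardRequestFactory {
    List<Map<String, String>> constructRequests(Map<String, String> clientParams, List<String> shards);
}

/** Turns one raw shard response into a native structure R (hypothetical signature). */
interface ShardResultTransformer<R> {
    R transform(Map<String, Object> shardResponse);
}

/** Merges shard responses, converting each one with a transformer first (hypothetical signature). */
interface ShardResponseProcessor<R> {
    R process(List<Map<String, Object>> shardResponses, ShardResultTransformer<R> transformer);
}

final class PerShardRequestFactory implements ShardRequestFactory {
    @Override
    public List<Map<String, String>> constructRequests(Map<String, String> clientParams, List<String> shards) {
        List<Map<String, String>> requests = new ArrayList<>();
        for (String shard : shards) {
            Map<String, String> params = new HashMap<>(clientParams);
            params.put("shard", shard); // hypothetical routing parameter
            requests.add(params);
        }
        return requests;
    }
}

final class GroupsTransformer implements ShardResultTransformer<List<String>> {
    @Override
    @SuppressWarnings("unchecked")
    public List<String> transform(Map<String, Object> shardResponse) {
        // Assumes each shard reports its group values under a "groups" key.
        Object groups = shardResponse.get("groups");
        if (groups == null) {
            return new ArrayList<>();
        }
        return (List<String>) groups;
    }
}

final class DistinctGroupsProcessor implements ShardResponseProcessor<List<String>> {
    @Override
    public List<String> process(List<Map<String, Object>> shardResponses,
                                ShardResultTransformer<List<String>> transformer) {
        List<String> merged = new ArrayList<>();
        for (Map<String, Object> response : shardResponses) {
            for (String group : transformer.transform(response)) {
                if (!merged.contains(group)) {
                    merged.add(group); // keep first occurrence, preserve shard order
                }
            }
        }
        return merged;
    }
}
```

Keeping the transformer separate from the processor means the same merge logic could be reused for non-grouped responses by swapping in a different transformer.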


[jira] [Commented] (SOLR-2066) Search Grouping: support distributed search

2011-09-15 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105474#comment-13105474
 ] 

Jason Rutherglen commented on SOLR-2066:


+1 on moving the concepts that can also be used for non-grouped distributed
searches into a separate issue. The Solr distributed search code is overly complicated.

 Search Grouping: support distributed search
 ---

 Key: SOLR-2066
 URL: https://issues.apache.org/jira/browse/SOLR-2066
 Project: Solr
  Issue Type: Sub-task
Reporter: Yonik Seeley
 Fix For: 3.5, 4.0

 Attachments: SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
 SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
 SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
 SOLR-2066.patch, SOLR-2066.patch


 Support distributed field collapsing / search grouping.


[jira] [Commented] (LUCENE-3433) Random access non RAM resident IndexDocValues (CSF)

2011-09-13 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104079#comment-13104079
 ] 

Jason Rutherglen commented on LUCENE-3433:
--

This is somewhat funny, as it seems the opinion has changed on MMap'ing and the 
potential for page faults:

http://www.lucidimagination.com/search/document/8951a336dffa9535/storing_and_loading_the_fst_directly_from_disk#8951a336dffa9535

 Random access non RAM resident IndexDocValues (CSF)
 ---

 Key: LUCENE-3433
 URL: https://issues.apache.org/jira/browse/LUCENE-3433
 Project: Lucene - Java
  Issue Type: New Feature
Affects Versions: 4.0
Reporter: Yonik Seeley
 Fix For: 4.0


 There should be a way to get specific IndexDocValues by going through the 
 Directory rather than loading all of the values into memory.
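
A minimal sketch of the idea using plain java.nio, assuming a hypothetical file layout of fixed-width 8-byte values stored in docID order (this is not the IndexDocValues file format). A lookup becomes a positional read through the file rather than an index into a fully loaded array.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical on-disk column of 8-byte long values, one per docID, in docID order. */
final class DiskLongValues implements AutoCloseable {
    private final FileChannel channel;

    private DiskLongValues(FileChannel channel) {
        this.channel = channel;
    }

    static DiskLongValues open(Path file) {
        try {
            return new DiskLongValues(FileChannel.open(file, StandardOpenOption.READ));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes values to a new temp file using the layout above (illustration only). */
    static Path writeTemp(long[] values) {
        try {
            Path file = Files.createTempFile("docvalues", ".bin");
            ByteBuffer buf = ByteBuffer.allocate(values.length * Long.BYTES);
            for (long v : values) {
                buf.putLong(v);
            }
            buf.flip();
            try (FileChannel out = FileChannel.open(file, StandardOpenOption.WRITE)) {
                while (buf.hasRemaining()) {
                    out.write(buf);
                }
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads one document's value with a positional read; nothing else is loaded. */
    long get(int docId) {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
        long pos = (long) docId * Long.BYTES;
        try {
            while (buf.hasRemaining()) {
                if (channel.read(buf, pos + buf.position()) < 0) {
                    throw new IndexOutOfBoundsException("no value for docId " + docId);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        buf.flip();
        return buf.getLong();
    }

    @Override
    public void close() {
        try {
            channel.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A memory-mapped variant (FileChannel.map) would replace the explicit reads with page faults, which is the trade-off the MMap comments on this issue are about.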


[jira] [Commented] (LUCENE-3433) Random access non RAM resident IndexDocValues (CSF)

2011-09-13 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104165#comment-13104165
 ] 

Jason Rutherglen commented on LUCENE-3433:
--

Here's another thread discussing MMap'ing and field caches, where the consensus 
is against it:

http://www.lucidimagination.com/search/document/70623ef5879bca38/fst_and_fieldcache#45006a7fe2847c09
 posted in 1969-12-31 19:00 :)


[jira] [Commented] (LUCENE-2312) Search on IndexWriter's RAM Buffer

2011-09-09 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101391#comment-13101391
 ] 

Jason Rutherglen commented on LUCENE-2312:
--

There are many important use cases for immediate / zero-delay index readers.

I'm not sure if people realize it, but one of the major gains from this issue
is the ability to obtain a reader after every indexed document.

In this case, instead of performing an array copy of the RT data structures, we
will queue the changes and then apply them to the new reader. For arrays like term
freqs, we will use a temp hash map of the changes made since the main array was
created (when the hash map grows too large we can perform a full array copy).
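
A minimal sketch of that scheme for a per-term int array such as term freqs, assuming term ids index the array directly. OverlayIntArray and the maxDelta threshold are hypothetical, not code from the patch: each new reader gets an immutable view made of a shared base array plus a small map of changes made since the base was created, and a full copy happens only once the map grows past the threshold.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical immutable view of a per-term int array (e.g. term freqs) for one reader. */
final class OverlayIntArray {
    private final int[] base;                  // shared between views, never mutated
    private final Map<Integer, Integer> delta; // changes made since base was created

    private OverlayIntArray(int[] base, Map<Integer, Integer> delta) {
        this.base = base;
        this.delta = delta;
    }

    static OverlayIntArray of(int[] initial) {
        return new OverlayIntArray(initial.clone(), new HashMap<>());
    }

    int get(int index) {
        Integer changed = delta.get(index);
        if (changed != null) {
            return changed;
        }
        return index < base.length ? base[index] : 0;
    }

    /**
     * Applies queued changes and returns a new view; this view is unaffected. While the
     * accumulated changes fit in maxDelta entries only the small map is copied; beyond
     * that, a full array copy is made and the map starts empty again.
     */
    OverlayIntArray withChanges(Map<Integer, Integer> changes, int maxDelta) {
        Map<Integer, Integer> merged = new HashMap<>(delta);
        merged.putAll(changes);
        if (merged.size() <= maxDelta) {
            return new OverlayIntArray(base, merged);
        }
        int maxIndex = base.length - 1;
        for (int index : merged.keySet()) {
            maxIndex = Math.max(maxIndex, index);
        }
        int[] copy = Arrays.copyOf(base, maxIndex + 1);
        for (Map.Entry<Integer, Integer> e : merged.entrySet()) {
            copy[e.getKey()] = e.getValue();
        }
        return new OverlayIntArray(copy, new HashMap<>());
    }

    boolean sharesBaseWith(OverlayIntArray other) {
        return base == other.base;
    }
}
```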



 Search on IndexWriter's RAM Buffer
 --

 Key: LUCENE-2312
 URL: https://issues.apache.org/jira/browse/LUCENE-2312
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/search
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Michael Busch
 Attachments: LUCENE-2312-FC.patch, LUCENE-2312.patch, 
 LUCENE-2312.patch, LUCENE-2312.patch


 In order to offer users near realtime search, without incurring
 an indexing performance penalty, we can implement search on
 IndexWriter's RAM buffer. This is the buffer that is filled in
 RAM as documents are indexed. Currently the RAM buffer is
 flushed to the underlying directory (usually disk) before being
 made searchable. 
 Today's Lucene-based NRT systems must incur the cost of merging
 segments, which can slow indexing. 
 Michael Busch has good suggestions regarding how to handle deletes using max 
 doc ids.  
 https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923
 The area that isn't fully fleshed out is the terms dictionary,
 which needs to be sorted prior to queries executing. Currently
 IW implements a specialized hash table. Michael B has a
 suggestion here: 
 https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915


[jira] [Commented] (SOLR-2700) transaction logging

2011-09-07 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099138#comment-13099138
 ] 

Jason Rutherglen commented on SOLR-2700:


I'm not sure how this feature makes any sense: the documents are already being
serialized to disk, e.g., to the docstore by StoredFieldsWriter. Now the system
will be serializing the exact same documents twice, which is extremely
redundant.

 transaction logging
 ---

 Key: SOLR-2700
 URL: https://issues.apache.org/jira/browse/SOLR-2700
 Project: Solr
  Issue Type: New Feature
Reporter: Yonik Seeley
 Attachments: SOLR-2700.patch, SOLR-2700.patch, SOLR-2700.patch, 
 SOLR-2700.patch, SOLR-2700.patch, SOLR-2700.patch, SOLR-2700.patch


 A transaction log is needed for durability of updates, for a more performant 
 realtime-get, and for replaying updates to recovering peers.


[jira] [Commented] (SOLR-2700) transaction logging

2011-09-07 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099239#comment-13099239
 ] 

Jason Rutherglen commented on SOLR-2700:


This is going to be amazing. I wonder if other projects already implemented
these features years ago?


[jira] [Commented] (SOLR-2748) autocommit commits too many times

2011-09-07 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099722#comment-13099722
 ] 

Jason Rutherglen commented on SOLR-2748:


Seeing all of the bugs related to the Solr NRT code, I can't help but wonder
why the 4.x version of the project needs to be backward compatible.

I also wonder why it's not using IndexReaderWarmer, which was ostensibly created
precisely for Solr's usage (yet it's not used in Solr and never has been).

 autocommit commits too many times
 -

 Key: SOLR-2748
 URL: https://issues.apache.org/jira/browse/SOLR-2748
 Project: Solr
  Issue Type: Bug
Reporter: Yonik Seeley
 Attachments: SOLR-2748.patch


 autocommit seems to commit more frequently than configured.


[jira] [Commented] (LUCENE-3199) Add non-desctructive sort to BytesRefHash

2011-09-05 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097246#comment-13097246
 ] 

Jason Rutherglen commented on LUCENE-3199:
--

I started integrating the patch into LUCENE-2312. I think the main
functionality missing is a reverse int[] that maps a term id to its position
in the sorted ords array. That array would be used for implementing the RT
version of DocTermsIndex, i.e., a doc id -> term id -> sorted term id index.

 Add non-desctructive sort to BytesRefHash
 -

 Key: LUCENE-3199
 URL: https://issues.apache.org/jira/browse/LUCENE-3199
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Jason Rutherglen
Priority: Minor
 Attachments: LUCENE-3199.patch, LUCENE-3199.patch, LUCENE-3199.patch, 
 LUCENE-3199.patch


 Currently the BytesRefHash is destructive.  We can add a method that returns 
 a non-destructively generated int[].


[jira] [Commented] (LUCENE-3199) Add non-desctructive sort to BytesRefHash

2011-09-05 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097257#comment-13097257
 ] 

Jason Rutherglen commented on LUCENE-3199:
--

Ok, solved the above comment by taking the sorted ord array and building a new 
reverse array from that... 
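
Building the reverse array is a single pass over the sorted ords. A sketch, assuming sortedTermIds[ord] holds the term id at sorted position ord (only the first numTerms entries are valid, since the array may be oversized) and a hypothetical docToTermId array; TermOrds is an illustrative name. It also shows the doc id -> term id -> sorted ord lookup from the previous comment.

```java
/** Hypothetical helpers for an RT DocTermsIndex-style lookup. */
final class TermOrds {

    /** Inverts sortedTermIds (ord -> term id) into termIdToOrd (term id -> ord). */
    static int[] reverse(int[] sortedTermIds, int numTerms) {
        int[] termIdToOrd = new int[numTerms];
        for (int ord = 0; ord < numTerms; ord++) {
            termIdToOrd[sortedTermIds[ord]] = ord;
        }
        return termIdToOrd;
    }

    /** doc id -> term id -> sorted ord. */
    static int ordForDoc(int docId, int[] docToTermId, int[] termIdToOrd) {
        return termIdToOrd[docToTermId[docId]];
    }
}
```

For example, with terms by id {0: "kiwi", 1: "apple", 2: "fig"} the sorted ids are {1, 2, 0}, and reversing them gives {2, 0, 1}: "kiwi" sorts last, "apple" first.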


[jira] [Updated] (LUCENE-3199) Add non-desctructive sort to BytesRefHash

2011-09-02 Thread Jason Rutherglen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-3199:
-

Attachment: LUCENE-3199.patch

This is a minor update compared with the last patch.

It adds the option of pruning the [oversized] int[] returned by the compact
method.

It also adds more unit tests.
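
A sketch of a non-destructive compact with optional pruning, assuming the hash stores ids in an int[] slot table where -1 marks an empty slot (similar to BytesRefHash's internal ids table). CompactSketch.compactCopy and its prune flag are hypothetical names, not the patch's API.

```java
import java.util.Arrays;

final class CompactSketch {

    /**
     * Copies the live ids (slots != -1) to the front of a new array, leaving the slot
     * table untouched. With prune=false the result keeps the table's (oversized) length,
     * padded with -1; with prune=true it is trimmed to exactly the live ids.
     */
    static int[] compactCopy(int[] slots, boolean prune) {
        int[] out = new int[slots.length];
        int upto = 0;
        for (int id : slots) {
            if (id != -1) {
                out[upto++] = id;
            }
        }
        if (prune) {
            return Arrays.copyOf(out, upto);
        }
        Arrays.fill(out, upto, out.length, -1);
        return out;
    }
}
```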

 Add non-desctructive sort to BytesRefHash
 -

 Key: LUCENE-3199
 URL: https://issues.apache.org/jira/browse/LUCENE-3199
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Jason Rutherglen
Priority: Minor
 Attachments: LUCENE-3199.patch, LUCENE-3199.patch


 Currently the BytesRefHash is destructive.  We can add a method that returns 
 a non-destructively generated int[].


[jira] [Commented] (LUCENE-3199) Add non-desctructive sort to BytesRefHash

2011-09-02 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096108#comment-13096108
]

Jason Rutherglen commented on LUCENE-3199:
--

Simon,

In summary, this uses the BytesRefHash sort, performs array copies, and then
merge-sorts into a new copy / view.

Array copies are fast and, counterintuitively, generate far less garbage than
objects (in Java).

Instead of creating term 'segments' that would be merged while iterating the
terms enum, we'll be generating static point-in-time terms dict views. These
will be useful for enabling DocTermsIndex field caches for RT, the only
remaining design 'challenge' for RT / LUCENE-2312. Because there is a terms
hash, we can seek exact to the term rather than perform an [optimized] seek to
the term.
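
A simplified sketch of such a point-in-time view, assuming terms are plain Strings and the live terms hash is a HashMap from term to term id (the real structures are BytesRefHash and byte blocks). TermsSnapshot is hypothetical and single-threaded: the view sorts and copies the term ids once, seekExact goes through the hash (ignoring ids added after the snapshot), and seekCeil binary-searches the sorted ids.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical static point-in-time terms dict view (single-threaded sketch). */
final class TermsSnapshot {
    private final List<String> termsById;    // term id -> term, append-only, shared
    private final Map<String, Integer> hash; // term -> term id, shared with the writer
    private final int[] sortedIds;           // point-in-time copy, sorted by term
    private final int numTerms;              // terms visible to this snapshot

    TermsSnapshot(List<String> termsById, Map<String, Integer> hash) {
        this.termsById = termsById;
        this.hash = hash;
        this.numTerms = termsById.size();
        Integer[] ids = new Integer[numTerms];
        for (int i = 0; i < numTerms; i++) {
            ids[i] = i;
        }
        Arrays.sort(ids, (a, b) -> termsById.get(a).compareTo(termsById.get(b)));
        this.sortedIds = new int[numTerms];
        for (int i = 0; i < numTerms; i++) {
            sortedIds[i] = ids[i];
        }
    }

    /** Exact lookup through the hash; returns the term id, or -1 if not visible. */
    int seekExact(String term) {
        Integer id = hash.get(term);
        return (id != null && id < numTerms) ? id : -1;
    }

    /** Sorted ord of the smallest term >= target, or numTerms if there is none. */
    int seekCeil(String target) {
        int lo = 0;
        int hi = numTerms;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (termsById.get(sortedIds[mid]).compareTo(target) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    String termAt(int ord) {
        return termsById.get(sortedIds[ord]);
    }
}
```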


[jira] [Commented] (LUCENE-3199) Add non-desctructive sort to BytesRefHash

2011-09-02 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096231#comment-13096231
 ] 

Jason Rutherglen commented on LUCENE-3199:
--

Simon, I think your patch should be in a different issue, e.g., "sorted
BytesRefHash view" or something similar.


[jira] [Commented] (LUCENE-2312) Search on IndexWriter's RAM Buffer

2011-09-01 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095412#comment-13095412
]

Jason Rutherglen commented on LUCENE-2312:
--

I'll post a new patch shortly that fixes bugs and adds a bit more to the
functionality.

The benchmark results are interesting. Array copies are very fast; I don't see
any problems there, as the median time is 2 ms. Adding many tens of thousands
of terms to the ConcurrentSkipListMap is expensive, which I think is to be
expected. The strategy of amortizing that cost by creating sorted-by-term
int[]s will probably be more performant than the CSLM.

The sorted int[] terms can be merged just like segments, so RT becomes a way
to remove the [NRT] cost of merging [numerous] postings lists. The int[] terms
can be merged in the background so that raw indexing speed is not affected.
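
The merge step can be sketched as an ordinary two-way merge, assuming two runs of term ids, each sorted by term, covering disjoint ids of one terms hash, plus a lookup from term id to term (a String[] here). SortedRunMerger is a hypothetical name. As with a segment merge, the inputs are left untouched, so this could run in a background thread while readers keep using the old runs.

```java
/** Hypothetical merger for sorted-by-term runs of term ids. */
final class SortedRunMerger {

    /** Merges two runs of term ids, each sorted by term, into one new sorted run. */
    static int[] merge(int[] a, int[] b, String[] termsById) {
        int[] out = new int[a.length + b.length];
        int i = 0;
        int j = 0;
        int k = 0;
        while (i < a.length && j < b.length) {
            // Take from a on ties so the merge is stable.
            if (termsById[a[i]].compareTo(termsById[b[j]]) <= 0) {
                out[k++] = a[i++];
            } else {
                out[k++] = b[j++];
            }
        }
        while (i < a.length) {
            out[k++] = a[i++];
        }
        while (j < b.length) {
            out[k++] = b[j++];
        }
        return out;
    }
}
```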

Search on IndexWriter's RAM Buffer
--

Key: LUCENE-2312
URL: https://issues.apache.org/jira/browse/LUCENE-2312
Project: Lucene - Java
Issue Type: New Feature
Components: core/search
Affects Versions: Realtime Branch
Reporter: Jason Rutherglen
Assignee: Michael Busch
Fix For: Realtime Branch

Attachments: LUCENE-2312-FC.patch, LUCENE-2312.patch,
LUCENE-2312.patch, LUCENE-2312.patch

In order to offer users near realtime search, without incurring
an indexing performance penalty, we can implement search on
IndexWriter's RAM buffer. This is the buffer that is filled in
RAM as documents are indexed. Currently the RAM buffer is
flushed to the underlying directory (usually disk) before being
made searchable.
Today's Lucene-based NRT systems must incur the cost of merging
segments, which can slow indexing.
Michael Busch has good suggestions regarding how to handle deletes using max
doc ids.
https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923
The area that isn't fully fleshed out is the terms dictionary,
which needs to be sorted prior to queries executing. Currently
IW implements a specialized hash table. Michael B has a
suggestion here:
https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915


[jira] [Updated] (LUCENE-2312) Search on IndexWriter's RAM Buffer

2011-09-01 Thread Jason Rutherglen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-2312:
-

Fix Version/s: (was: Realtime Branch)
Affects Version/s: (was: Realtime Branch)
   4.0


[jira] [Updated] (LUCENE-3199) Add non-desctructive sort to BytesRefHash

2011-09-01 Thread Jason Rutherglen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-3199:
-

Attachment: LUCENE-3199.patch

Here's a patch for this issue. It adds a couple of new methods to
TestBytesRefHash to test the new frozen compact and sorting functionality of
BytesRefHash.

This is being posted now because it's useful in relation to LUCENE-2312 and a
terms dictionary that is composed of int[]s of term ids sorted by term.

 Add non-desctructive sort to BytesRefHash
 -

 Key: LUCENE-3199
 URL: https://issues.apache.org/jira/browse/LUCENE-3199
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Jason Rutherglen
Priority: Minor
 Attachments: LUCENE-3199.patch


 Currently the BytesRefHash is destructive.  We can add a method that returns 
 a non-destructively generated int[].


[jira] [Updated] (LUCENE-2312) Search on IndexWriter's RAM Buffer

2011-08-30 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jason Rutherglen updated LUCENE-2312:
-

Attachment: LUCENE-2312.patch

Here's a new patch that incrementally adds field cache and norms values: as
documents are added / indexed, norms and field cache values are automatically
created. Field cache values are only added if the field cache has already been
created.

The field cache functionality needs to be completed for all types.

We probably need to obtain the indexing lock while the field cache value is
initially being created (e.g., during the terms enumeration).

We're more or less feature complete now.
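
A minimal sketch of "only added if already created", assuming int values, documents modeled as Maps, and one indexing lock; IncrementalFieldCache and its methods are hypothetical. Creating a cache takes the same lock as indexing, so no document can slip in between the initial fill and the switch to incremental updates, which is the locking concern raised above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical growable int field cache kept up to date as documents are indexed. */
final class IncrementalFieldCache {
    private final Object indexingLock = new Object();
    private final List<Map<String, Integer>> docs = new ArrayList<>(); // stand-in for the RAM buffer
    private final Map<String, int[]> caches = new HashMap<>();        // field -> values by doc id

    /** Indexes a document; only field caches that already exist are updated. */
    void addDocument(Map<String, Integer> doc) {
        synchronized (indexingLock) {
            int docId = docs.size();
            docs.add(doc);
            for (Map.Entry<String, int[]> e : caches.entrySet()) {
                int[] values = e.getValue();
                if (docId >= values.length) {
                    values = Arrays.copyOf(values, Math.max(docId + 1, values.length * 2));
                    e.setValue(values); // replace the underlying array with a grown copy
                }
                values[docId] = doc.getOrDefault(e.getKey(), 0);
            }
        }
    }

    /** Creates the cache on first use (while indexing is blocked) and returns a point-in-time copy. */
    int[] getInts(String field) {
        synchronized (indexingLock) {
            int[] values = caches.get(field);
            if (values == null) {
                values = new int[Math.max(1, docs.size())];
                for (int d = 0; d < docs.size(); d++) {
                    values[d] = docs.get(d).getOrDefault(field, 0);
                }
                caches.put(field, values);
            }
            return Arrays.copyOf(values, docs.size());
        }
    }

    boolean hasCache(String field) {
        synchronized (indexingLock) {
            return caches.containsKey(field);
        }
    }
}
```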

Search on IndexWriter's RAM Buffer
--

Attachments: LUCENE-2312-FC.patch, LUCENE-2312.patch,
LUCENE-2312.patch, LUCENE-2312.patch


[jira] [Updated] (LUCENE-2312) Search on IndexWriter's RAM Buffer

2011-08-24 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jason Rutherglen updated LUCENE-2312:
-

Attachment: LUCENE-2312.patch

This is a revised version of the LUCENE-2312 patch. The following are
miscellaneous notes pertaining to the patch and what it needs before it can be
committed.

Feel free to review the approach taken, e.g., we're working around non-realtime
structures by using array copies (the arrays can be pooled at some point).

* A copy of FreqProxPostingsArray.termFreqs is made per new reader. That array
can be pooled. This is no different than the deleted docs BitVector which is
created anew per-segment for any deletes that have occurred.

* FreqProxPostingsArray freqUptosRT, proxUptosRT, lastDocIDsRT, lastDocFreqsRT
is copied into, per new reader (as opposed to an entirely new array
instantiated for each new reader), this is a slight optimization in object
allocation.

* For deleting, a DWPT is clothed in an abstract class that exposes the
necessary methods from segment info, so that deletes may be applied to the RT
RAM reader. The deleting is still performed in BufferedDeletesStream.
BitVectors are cloned as well. There is room for improvement, eg, pooling the
BV byte[]’s.

* Stored documents (FieldsWriter) and term vectors are flushed on each
get-reader call so that readers are able to load the data. We will need to test
whether this performs well; since no new files are created, this approach may
well be efficient.

* We need to measure the cost of the native System.arraycopy call. It may well
be fast enough.

* Full postings functionality, including payloads, should be working.

* Field caching may be implemented as a new field cache that is growable and
enables locked replacement of the underlying array.

* String to string ordinal comparison caches need to be figured out. The RAM
readers cannot maintain a sorted terms index the way statically sized segments
do.

* When a field cache value is first being created, it needs to obtain the
indexing lock on the DWPT. Otherwise documents will continue to be indexed and
new values created while the array misses them. The downside is that indexing
stops while the array is initially being created. This can probably be solved
at some point by locking only during the creation of the field cache array, and
then notifying the DWPT of the new array. New values would then accumulate into
the array starting from the max doc of the reader the values creator is working
from.

* The terms dictionary is a ConcurrentSkipListMap. We can periodically convert
it into an int[] sorted by term, with an FST on top.
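The per-reader array copies described in these notes can be sketched in plain Java. This is a minimal illustration under stated assumptions, not code from the patch: `ArraySnapshotPool`, `snapshot` and `release` are hypothetical names, and a bare `int[]` stands in for the FreqProxPostingsArray fields. Each new reader gets a `System.arraycopy` of the live array, copied into a previously released array when one is available.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

/**
 * Hypothetical sketch: snapshot a live, still-growing int[] (such as a
 * termFreqs array) for a new point-in-time reader, reusing released arrays.
 */
public class ArraySnapshotPool {
    // Released snapshot arrays available for reuse. Not thread-safe: assumes
    // the caller holds whatever lock already guards reader creation.
    private final ArrayDeque<int[]> free = new ArrayDeque<>();

    /**
     * Copies the first {@code length} entries of {@code live} into a pooled or
     * new array. A reused array may be longer than {@code length}; entries past
     * {@code length} are stale and must be ignored by the reader.
     */
    public int[] snapshot(int[] live, int length) {
        int[] dest = free.pollFirst();
        if (dest == null || dest.length < length) {
            // Nothing pooled, or the pooled array is too small (it is dropped).
            dest = new int[length];
        }
        // Native bulk copy; its cost is what a benchmark would need to measure.
        System.arraycopy(live, 0, dest, 0, length);
        return dest;
    }

    /** Returns a snapshot array to the pool once its reader is closed. */
    public void release(int[] snapshot) {
        free.addFirst(snapshot);
    }

    public static void main(String[] args) {
        ArraySnapshotPool pool = new ArraySnapshotPool();
        int[] termFreqs = {3, 1, 4, 1, 5};
        int[] readerView = pool.snapshot(termFreqs, termFreqs.length);
        termFreqs[0] = 9; // indexing continues; the reader's copy is unaffected
        System.out.println(Arrays.toString(readerView)); // [3, 1, 4, 1, 5]
        pool.release(readerView);
        int[] reused = pool.snapshot(termFreqs, 3);
        System.out.println(reused == readerView); // true: pooled array reused
    }
}
```

Pooling trades a little bookkeeping for fewer large allocations; whether that matters, and how expensive the copy itself is, is what an NRT vs. RT benchmark would have to show.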

Have fun reviewing! :)

Search on IndexWriter's RAM Buffer
--

Attachments: LUCENE-2312-FC.patch, LUCENE-2312.patch,
LUCENE-2312.patch

[jira] [Commented] (SOLR-2700) transaction logging

2011-08-24 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090722#comment-13090722
 ] 

Jason Rutherglen commented on SOLR-2700:


Typically a transaction log is configured to be written to a different hard
drive than the indexes / database.

 transaction logging
 ---

 Key: SOLR-2700
 URL: https://issues.apache.org/jira/browse/SOLR-2700
 Project: Solr
  Issue Type: New Feature
Reporter: Yonik Seeley
 Attachments: SOLR-2700.patch, SOLR-2700.patch, SOLR-2700.patch, 
 SOLR-2700.patch, SOLR-2700.patch


 A transaction log is needed for durability of updates, for a more performant 
 realtime-get, and for replaying updates to recovering peers.
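To illustrate the durability and replay roles listed above, here is a hypothetical append-only log (`SimpleTransactionLog` and its length-prefixed record format are assumptions for illustration, not Solr's implementation). Each update is forced to the device before `append` returns, and `replay` reads records back in order; placing the log on a different drive than the index, as suggested in the comment, only requires choosing its path accordingly.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical append-only log of update records, one UTF-8 string each. */
public class SimpleTransactionLog implements AutoCloseable {
    private final FileChannel channel;

    public SimpleTransactionLog(Path file) throws IOException {
        // Ideally "file" sits on a different drive than the index.
        this.channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }

    /** Appends one record as [int length][UTF-8 bytes] and forces it to storage. */
    public void append(String update) throws IOException {
        byte[] bytes = update.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + bytes.length);
        buf.putInt(bytes.length).put(bytes).flip();
        while (buf.hasRemaining()) {
            channel.write(buf);
        }
        // Durability: content and metadata (eg, file length) reach storage before returning.
        channel.force(true);
    }

    /** Reads back every complete record, eg, to replay updates to a recovering peer. */
    public static List<String> replay(Path file) throws IOException {
        List<String> updates = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            while (true) {
                try {
                    int len = in.readInt();
                    byte[] bytes = new byte[len];
                    in.readFully(bytes);
                    updates.add(new String(bytes, StandardCharsets.UTF_8));
                } catch (EOFException end) {
                    // End of the log, or a truncated final record from a crash mid-append.
                    break;
                }
            }
        }
        return updates;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```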

[jira] [Commented] (LUCENE-2312) Search on IndexWriter's RAM Buffer

2011-08-24 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090764#comment-13090764
]

Jason Rutherglen commented on LUCENE-2312:
--

A benchmark plan: compare the speed of NRT vs. RT.

Index documents in a single thread; in a second thread, open a reader and
perform a query. It would be nice to synchronize the point (max doc) at which
RT and NRT open new readers, so that the search results are directly comparable
and their correctness can be verified as well. To make the test fair, the
concurrent merge scheduler should be turned off in the NRT test.

The hypothesis is that array copying, even on large [RT] indexes, is no big deal
compared with the excessive segment merging of NRT.
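The two-thread plan above might be organized as the following harness. `SearchTarget` is a hypothetical interface (not a Lucene API) that an RT or an NRT implementation would provide; the harness fixes the reopen points at multiples of `reopenEvery`, so both runs search at the same max doc and their results can be compared directly. Disabling the concurrent merge scheduler for the NRT run would happen inside that target's setup and is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical two-thread benchmark harness for comparing RT and NRT setups. */
public class RtVsNrtBench {

    /** Hypothetical abstraction over an RT or NRT index; not a Lucene API. Must be thread-safe. */
    public interface SearchTarget {
        void addDocument(int docId);

        /** Opens a reader that sees the first maxDoc documents and returns a hit count. */
        int searchAt(int maxDoc);
    }

    private static final int DONE = -1; // sentinel telling the searcher to stop

    /**
     * Indexes numDocs documents in one thread; every reopenEvery documents the
     * searcher thread opens a reader at exactly that max doc and runs the query.
     * Returns the hit counts in reopen order, so RT and NRT runs can be compared.
     */
    public static List<Integer> run(SearchTarget target, int numDocs, int reopenEvery)
            throws InterruptedException {
        BlockingQueue<Integer> reopenPoints = new LinkedBlockingQueue<>();
        List<Integer> hits = new ArrayList<>(); // written only by the searcher thread

        Thread indexer = new Thread(() -> {
            for (int indexed = 1; indexed <= numDocs; indexed++) {
                target.addDocument(indexed - 1);
                if (indexed % reopenEvery == 0) {
                    reopenPoints.add(indexed); // same reopen point for RT and NRT runs
                }
            }
            reopenPoints.add(DONE);
        });
        Thread searcher = new Thread(() -> {
            try {
                for (int maxDoc = reopenPoints.take(); maxDoc != DONE; maxDoc = reopenPoints.take()) {
                    hits.add(target.searchAt(maxDoc));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        long start = System.nanoTime();
        indexer.start();
        searcher.start();
        indexer.join();
        searcher.join(); // join() makes the searcher's writes to hits visible here
        System.out.printf("elapsed: %.1f ms%n", (System.nanoTime() - start) / 1e6);
        return hits;
    }
}
```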

Search on IndexWriter's RAM Buffer
--

Attachments: LUCENE-2312-FC.patch, LUCENE-2312.patch,
LUCENE-2312.patch

[jira] [Created] (LUCENE-3399) Enable replace-able field caches

2011-08-23 Thread Jason Rutherglen (JIRA)

Enable replace-able field caches


 Key: LUCENE-3399
 URL: https://issues.apache.org/jira/browse/LUCENE-3399
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Jason Rutherglen
Priority: Minor


For LUCENE-2312 we need to be able to synchronously replace field cache values
and receive events when new field cache values are created.
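One possible shape for this, sketched under assumptions: a per-field holder whose array is swapped under a lock, with listeners notified of each creation or replacement. `ReplaceableFieldCache`, `Listener`, `getOrCreate` and the `int[]` value type are hypothetical, not the API in the attached patch.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Supplier;

/** Hypothetical field cache whose per-field int[] values can be replaced synchronously. */
public class ReplaceableFieldCache {

    /** Callback fired when a field's values are created or replaced. */
    public interface Listener {
        void valuesChanged(String field, int[] newValues);
    }

    private final Map<String, int[]> values = new ConcurrentHashMap<>();
    private final List<Listener> listeners = new CopyOnWriteArrayList<>();
    private final Object lock = new Object();

    public void addListener(Listener listener) {
        listeners.add(listener);
    }

    /** Lock-free read of the current values, or null if none exist yet. */
    public int[] get(String field) {
        return values.get(field);
    }

    /** Returns existing values, or creates them under the lock and fires a creation event. */
    public int[] getOrCreate(String field, Supplier<int[]> creator) {
        int[] existing = values.get(field);
        if (existing != null) {
            return existing;
        }
        synchronized (lock) {
            existing = values.get(field); // re-check: another thread may have created it
            if (existing == null) {
                existing = creator.get();
                values.put(field, existing);
                fire(field, existing);
            }
            return existing;
        }
    }

    /** Synchronously swaps in new values, eg, a grown array, and fires a replacement event. */
    public void replace(String field, int[] newValues) {
        synchronized (lock) {
            values.put(field, newValues);
            fire(field, newValues);
        }
    }

    // Called with the lock held, so events arrive in the same order as the
    // swaps; listeners should therefore be quick.
    private void fire(String field, int[] newValues) {
        for (Listener listener : listeners) {
            listener.valuesChanged(field, newValues);
        }
    }
}
```

In the RT case, the indexing side could call `replace` with a grown copy of the array, and a registered listener would be told about the new array without polling.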

[jira] [Updated] (LUCENE-3399) Enable replace-able field caches

2011-08-23 Thread Jason Rutherglen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-3399:
-

Attachment: LUCENE-3399.patch

A cut of this.

 Enable replace-able field caches
 

 Key: LUCENE-3399
 URL: https://issues.apache.org/jira/browse/LUCENE-3399
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Jason Rutherglen
Priority: Minor
 Attachments: LUCENE-3399.patch


 For LUCENE-2312 we need to be able to synchronously replace field cache 
 values and receive events when new field cache values are created.

[jira] [Commented] (SOLR-2702) Add support for NRTCachingDirectory

2011-08-18 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087438#comment-13087438
 ] 

Jason Rutherglen commented on SOLR-2702:


Can we mark this for Lucene 3.x as well?

 Add support for NRTCachingDirectory
 ---

 Key: SOLR-2702
 URL: https://issues.apache.org/jira/browse/SOLR-2702
 Project: Solr
  Issue Type: Improvement
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.0


 It would be nice to have this option for the new NRT support.

[jira] [Commented] (SOLR-1431) CommComponent abstracted

2011-08-17 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086754#comment-13086754
 ] 

Jason Rutherglen commented on SOLR-1431:


Can we look at backporting this one to 3.x, given that 4.x is still some way off?

 CommComponent abstracted
 

 Key: SOLR-1431
 URL: https://issues.apache.org/jira/browse/SOLR-1431
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Noble Paul
 Fix For: 4.0

 Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch


 We'll abstract CommComponent in this issue.

[jira] [Commented] (SOLR-2565) Prevent IW#close and cut over to IW#commit

2011-08-17 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086755#comment-13086755
]

Jason Rutherglen commented on SOLR-2565:

Can this one be backported to 3.x? It would probably be quite useful for
people now.

Prevent IW#close and cut over to IW#commit
--

Key: SOLR-2565
URL: https://issues.apache.org/jira/browse/SOLR-2565
Project: Solr
Issue Type: Improvement
Components: update
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Mark Miller
Fix For: 4.0

Attachments: SOLR-2565.patch, SOLR-2565.patch, SOLR-2565.patch

Spin-off from SOLR-2193. We already have a branch to work on this issue here:
https://svn.apache.org/repos/asf/lucene/dev/branches/solr2193
The main goal here is to prevent Solr from closing the IW and to use IW#commit
instead. AFAIK the main issues here are:
The update handler needs an overhaul.
A few goals I think we might want to look at:
1. Expose the SolrIndexWriter in the API or add the proper abstractions to
get done what we now do with special casing.
2. Stop closing the IndexWriter and start using commit (still lazy IW init
though).
3. Drop the iwAccess and iwCommit locks and sync mostly at the Lucene level.
4. Address the current issues we face because multiple original/'reloaded'
cores can have a different IndexWriter on the same index.
Ultimately this is preparation for NRT support in Solr, for which I will create
a follow-up issue.

[jira] [Commented] (SOLR-2565) Prevent IW#close and cut over to IW#commit

2011-08-01 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13075980#comment-13075980
]

Jason Rutherglen commented on SOLR-2565:

The comments on this issue say it was committed, however its status is Unresolved?

Prevent IW#close and cut over to IW#commit
--

Attachments: SOLR-2565.patch

[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all

2011-07-29 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13072962#comment-13072962
 ] 

Jason Rutherglen commented on LUCENE-3348:
--

Sorry to add my opinion to this, but while non-blocking deletes are quite
fancy, they seem to be open to various bugs such as this one. Is there a
compelling reason, eg, performance, for using non-locking deletes?

 IndexWriter applies wrong deletes during concurrent flush-all
 -

 Key: LUCENE-3348
 URL: https://issues.apache.org/jira/browse/LUCENE-3348
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3348.patch


 Yonik uncovered this with the TestRealTimeGet test: if a flush-all is
 underway, it is possible for an incoming update to pick a DWPT that is
 stale, ie, not yet pulled/marked for flushing, yet the DW has cut over
 to a new deletes queue.  If this happens, and the deleted term was
 also updated in one of the non-stale DWPTs, then the wrong document is
 deleted and the test fails by detecting the wrong value.
 There's a 2nd failure mode that I haven't figured out yet, whereby 2
 docs are returned when searching by id (there should only ever be 1
 doc since the test uses updateDocument which is atomic wrt
 commit/reopen).
 Yonik verified the test passes pre-DWPT, so my guess is (but I
 have yet to verify) this test also passes on 3.x.  I'll backport
 the test to 3.x to be sure.

[jira] [Commented] (LUCENE-2312) Search on IndexWriter's RAM Buffer

2011-07-13 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13064751#comment-13064751
]

Jason Rutherglen commented on LUCENE-2312:
--

I had been testing out an alternative skip list, but I think it's a bit too
esoteric at this point. I'm resuming work on this issue, using Java's CSLM for
the terms dict. There really isn't a good way to break up the patch; it's just
going to be large, eg, we can't separate the terms dict from the RT postings,
etc.

Search on IndexWriter's RAM Buffer
--

Attachments: LUCENE-2312-FC.patch, LUCENE-2312.patch

[jira] [Commented] (LUCENE-3296) Enable passing a config into PKIndexSplitter

2011-07-11 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13062862#comment-13062862
 ] 

Jason Rutherglen commented on LUCENE-3296:
--

Uwe, the first patch [1] is implemented with CURRENT.

1. https://issues.apache.org/jira/secure/attachment/12485805/LUCENE-3296.patch

 Enable passing a config into PKIndexSplitter
 

 Key: LUCENE-3296
 URL: https://issues.apache.org/jira/browse/LUCENE-3296
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/other
Affects Versions: 3.3, 4.0
Reporter: Jason Rutherglen
Assignee: Simon Willnauer
Priority: Trivial
 Attachments: LUCENE-3296.patch, LUCENE-3296.patch


 I need to be able to pass the IndexWriterConfig into the IW used by 
 PKIndexSplitter.

[jira] [Updated] (LUCENE-3296) Enable passing a config into PKIndexSplitter

2011-07-10 Thread Jason Rutherglen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-3296:
-

Attachment: LUCENE-3296.patch

This patch uses LUCENE_40.  All tests pass.

 Enable passing a config into PKIndexSplitter
 

 Key: LUCENE-3296
 URL: https://issues.apache.org/jira/browse/LUCENE-3296
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/other
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Simon Willnauer
Priority: Trivial
 Attachments: LUCENE-3296.patch, LUCENE-3296.patch


 I need to be able to pass the IndexWriterConfig into the IW used by 
 PKIndexSplitter.

[jira] [Commented] (LUCENE-3296) Enable passing a config into PKIndexSplitter

2011-07-09 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13062349#comment-13062349
 ] 

Jason Rutherglen commented on LUCENE-3296:
--

That was in there previously. Let's change it.

 Enable passing a config into PKIndexSplitter
 

 Key: LUCENE-3296
 URL: https://issues.apache.org/jira/browse/LUCENE-3296
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/other
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Simon Willnauer
Priority: Trivial
 Attachments: LUCENE-3296.patch


 I need to be able to pass the IndexWriterConfig into the IW used by 
 PKIndexSplitter.

[jira] [Commented] (LUCENE-2919) IndexSplitter that divides by primary key term

2011-07-08 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13062281#comment-13062281
 ] 

Jason Rutherglen commented on LUCENE-2919:
--

Sorry for the naive off/on-topic question.  

Ryan, what's the repository info that needs to be added to the pom.xml so that 
the project downloads the 4.0 snapshot?

Eg, I don't think it's:

{code}
<repository>
  <id>lucene</id>
  <url>https://builds.apache.org/job/Lucene-Solr-Maven-trunk/lastSuccessfulBuild/artifact/maven_artifacts/org/apache/</url>
  <snapshots>
    <enabled>true</enabled>
  </snapshots>
</repository>
{code}

 IndexSplitter that divides by primary key term
 --

 Key: LUCENE-2919
 URL: https://issues.apache.org/jira/browse/LUCENE-2919
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.3, 4.0

 Attachments: LUCENE-2919-3x.patch, LUCENE-2919-filter.patch, 
 LUCENE-2919-filter.patch, LUCENE-2919-filter.patch, LUCENE-2919.patch


 Index splitter that divides by primary key term.  The contrib 
 MultiPassIndexSplitter we have divides by docid, however to guarantee 
 external constraints it's sometimes necessary to split by a primary key term 
 id.  I think this implementation is a fairly trivial change.
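To make the contrast with docid-based splitting concrete: a primary-key split routes each document by comparing its key against a split term, so a given key always lands in the same output regardless of its docid. The sketch assumes a single mid-point term (keys that sort before it go to the first part) and uses plain `String` keys instead of Lucene's `Term` and reader classes; `PkSplit` and `split` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of splitting documents by primary key rather than by docid. */
public class PkSplit {

    /**
     * Partitions docids into two lists: docs whose key sorts before midKey,
     * and docs whose key is greater than or equal to midKey.
     * keys[docId] holds the primary key of each document.
     */
    public static List<List<Integer>> split(String[] keys, String midKey) {
        List<Integer> lower = new ArrayList<>();
        List<Integer> upper = new ArrayList<>();
        for (int docId = 0; docId < keys.length; docId++) {
            // String.compareTo stands in for term order (the two agree for ASCII keys).
            if (keys[docId].compareTo(midKey) < 0) {
                lower.add(docId);
            } else {
                upper.add(docId);
            }
        }
        List<List<Integer>> parts = new ArrayList<>();
        parts.add(lower);
        parts.add(upper);
        return parts;
    }
}
```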

[jira] [Created] (LUCENE-3296) Enable passing a config into PKIndexSplitter

2011-07-08 Thread Jason Rutherglen (JIRA)

Enable passing a config into PKIndexSplitter


 Key: LUCENE-3296
 URL: https://issues.apache.org/jira/browse/LUCENE-3296
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/other
Affects Versions: 4.0
Reporter: Jason Rutherglen
Priority: Trivial


I need to be able to pass the IndexWriterConfig into the IW used by 
PKIndexSplitter.

[jira] [Updated] (LUCENE-3296) Enable passing a config into PKIndexSplitter

2011-07-08 Thread Jason Rutherglen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-3296:
-

Attachment: LUCENE-3296.patch

Patch, all tests pass.

 Enable passing a config into PKIndexSplitter
 

 Key: LUCENE-3296
 URL: https://issues.apache.org/jira/browse/LUCENE-3296
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/other
Affects Versions: 4.0
Reporter: Jason Rutherglen
Priority: Trivial
 Attachments: LUCENE-3296.patch


 I need to be able to pass the IndexWriterConfig into the IW used by 
 PKIndexSplitter.

[jira] [Updated] (LUCENE-3245) Realtime terms dictionary

2011-06-27 Thread Jason Rutherglen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-3245:
-

Attachment: LUCENE-3245.patch

Here's a cut with a first implementation of the CSLM and AIA terms 
dictionaries.  

I think we're ready to benchmark writes.

 Realtime terms dictionary
 -

 Key: LUCENE-3245
 URL: https://issues.apache.org/jira/browse/LUCENE-3245
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Jason Rutherglen
Priority: Minor
 Attachments: LUCENE-3245.patch, LUCENE-3245.patch, LUCENE-3245.patch


 For LUCENE-2312 we need a realtime terms dictionary.  While 
 ConcurrentSkipListMap may be used, it has drawbacks in terms of high object 
 overhead, which can impact GC times and heap memory usage.
 If we implement a skip list that uses primitive backing arrays, we can 
 hopefully have a data structure that is [as] fast and memory efficient.

[jira] [Created] (LUCENE-3245) Realtime terms dictionary

2011-06-26 Thread Jason Rutherglen (JIRA)

Realtime terms dictionary
-

 Key: LUCENE-3245
 URL: https://issues.apache.org/jira/browse/LUCENE-3245
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Jason Rutherglen
Priority: Minor


For LUCENE-2312 we need a realtime terms dictionary.  While 
ConcurrentSkipListMap may be used, it has drawbacks in terms of high object 
overhead, which can impact GC times and heap memory usage.

If we implement a skip list that uses primitive backing arrays, we can 
hopefully have a data structure that is [as] fast and memory efficient.
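As a baseline for this comparison, a CSLM-based terms dictionary can be as simple as a map from term to term ID, plus a method that freezes the IDs into an array sorted by term, as the LUCENE-2312 notes suggest (the FST layer is omitted). This is a hypothetical sketch: `CslmTermsDict` is not a class from the patch, and `String` keys stand in for terms that would really live in a ByteBlockPool.

```java
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical realtime terms dictionary: one writer thread, many reader threads. */
public class CslmTermsDict {
    // Each entry costs at least one skip-list node object: the per-term
    // object overhead that motivates an array-backed alternative.
    private final ConcurrentSkipListMap<String, Integer> termToId = new ConcurrentSkipListMap<>();
    private int nextId = 0; // only touched by the single indexing thread

    /** Returns the term's ID, assigning the next ID if the term is new. Writer thread only. */
    public int addOrGet(String term) {
        Integer id = termToId.get(term);
        if (id == null) {
            id = nextId++;
            termToId.put(term, id);
        }
        return id;
    }

    /** Lookup that readers may call concurrently; returns -1 if the term is absent. */
    public int lookup(String term) {
        Integer id = termToId.get(term);
        return id == null ? -1 : id;
    }

    /**
     * Term IDs in term order, eg, to periodically build a static sorted index.
     * Iteration is weakly consistent: terms added meanwhile may or may not appear.
     */
    public int[] sortedTermIds() {
        return termToId.values().stream().mapToInt(Integer::intValue).toArray();
    }
}
```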

[jira] [Updated] (LUCENE-3245) Realtime terms dictionary

2011-06-26 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jason Rutherglen updated LUCENE-3245:
-

Attachment: LUCENE-3245.patch

Here's a basic initial patch implementing a skip list with a single-threaded
writer and multiple readers, backed by an atomic integer array.

The next step is to tie in the ByteBlockPool to store terms, eg, implement an
RTTermsDictAIA class, and an RTTermsDictCSLM class.

We can then load the same Wiki-EN terms into each and compare their write
speeds.

Then create a set of terms to look up in each terms dict and measure the time
difference.

I am not yet sure how the speed of AtomicIntegerArray will compare with CSLM's
usage of AtomicReferenceFieldUpdater. Notably, because of DWPTs we do not need a
skip list that supports concurrent writes, and because we're only adding new
unique terms, we do not need delete functionality. So AIA could be faster,
though we may need to inline code and apply various tuning tricks.
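The single-writer, no-delete property is what makes an AtomicIntegerArray-based list simple. In this hypothetical sketch of just the level-zero list (`AiaSortedList` is an illustrative name; the upper skip levels, capacity growth and ByteBlockPool term storage are omitted), the writer fully initializes a new node, then publishes it with a single `AtomicIntegerArray.set` on its predecessor. Because that set is a volatile write, a reader that sees the new node through `get` also sees its key, so no compare-and-set is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicIntegerArray;

/**
 * Hypothetical level-zero list of a single-writer, multi-reader skip list.
 * Node 0 is a head sentinel; END marks the end of the list. Fixed capacity.
 */
public class AiaSortedList {
    private static final int END = -1;
    private final int[] keys;              // never modified after a node is published
    private final AtomicIntegerArray next; // next-node pointers, read by any thread
    private int size = 1;                  // node 0 is the head; writer thread only

    public AiaSortedList(int capacity) {
        keys = new int[capacity + 1];
        next = new AtomicIntegerArray(capacity + 1);
        next.set(0, END);
    }

    /** Inserts key in sorted position; returns false if already present. Writer thread only. */
    public boolean insert(int key) {
        int prev = 0;
        int cur = next.get(prev);
        while (cur != END && keys[cur] < key) {
            prev = cur;
            cur = next.get(cur);
        }
        if (cur != END && keys[cur] == key) {
            return false; // only new unique terms are ever added
        }
        int node = size++;
        keys[node] = key;       // initialize the new node first...
        next.set(node, cur);
        next.set(prev, node);   // ...then publish it with one volatile write
        return true;
    }

    /** Iterates the list; safe to call from reader threads while insert runs. */
    public List<Integer> toList() {
        List<Integer> out = new ArrayList<>();
        for (int cur = next.get(0); cur != END; cur = next.get(cur)) {
            out.add(keys[cur]);
        }
        return out;
    }
}
```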

Realtime terms dictionary
-

Key: LUCENE-3245
URL: https://issues.apache.org/jira/browse/LUCENE-3245
Project: Lucene - Java
Issue Type: Improvement
Components: core/index
Affects Versions: 4.0
Reporter: Jason Rutherglen
Priority: Minor
Attachments: LUCENE-3245.patch

For LUCENE-2312 we need a realtime terms dictionary. While
ConcurrentSkipListMap may be used, it has drawbacks in terms of high object
overhead, which can impact GC times and heap memory usage.
If we implement a skip list that uses primitive backing arrays, we can
hopefully have a data structure that is [as] fast and memory efficient.

[jira] [Updated] (LUCENE-3245) Realtime terms dictionary

2011-06-26 Thread Jason Rutherglen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-3245:
-

Attachment: LUCENE-3245.patch

Added and fixed the code that traverses the skip list down to the level zero
linked list and iterates over it.

Next, I need to reuse the starts int array.

 Realtime terms dictionary
 -

 Key: LUCENE-3245
 URL: https://issues.apache.org/jira/browse/LUCENE-3245
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Jason Rutherglen
Priority: Minor
 Attachments: LUCENE-3245.patch, LUCENE-3245.patch


 For LUCENE-2312 we need a realtime terms dictionary.  While 
 ConcurrentSkipListMap may be used, it has drawbacks in terms of high object 
 overhead, which can impact GC times and heap memory usage.
 If we implement a skip list that uses primitive backing arrays, we can 
 hopefully have a data structure that is [as] fast and memory efficient.

[jira] [Commented] (SOLR-2610) Add an option to delete index through CoreAdmin UNLOAD action

2011-06-23 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053992#comment-13053992
 ] 

Jason Rutherglen commented on SOLR-2610:


Mark put it aptly. The problem I believe I encountered in my own version was
that leftover file handles seemed to prevent the deletion of all the files;
often some of them would be left behind. Also, I deleted the entire core
directory, which is useful for manual testing (eg, to avoid the directory
exists exception).
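Deleting the whole core directory amounts to a depth-first recursive delete. Below is a minimal sketch using `java.nio.file` (`CoreDirDeleter` is a hypothetical helper, not Solr's CoreAdmin code). It keeps going past failures and reports them, since an unclosed reader or writer can leave files behind; on Windows, for example, a file that still has an open handle generally cannot be deleted, so closing the core's searchers and IndexWriter first is assumed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper for removing a core directory after unloading the core. */
public class CoreDirDeleter {

    /**
     * Deletes dir and everything beneath it, children before parents.
     * Returns the paths that could not be deleted, eg, files still held open
     * (a directory containing such a file is reported as well).
     */
    public static List<Path> deleteRecursively(Path dir) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // A parent path sorts before its children, so reverse order
            // visits files and subdirectories before the directory holding them.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        List<Path> failed = new ArrayList<>();
        for (Path p : paths) {
            try {
                Files.deleteIfExists(p);
            } catch (IOException e) {
                failed.add(p); // keep going so as much as possible is removed
            }
        }
        return failed;
    }
}
```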

 Add an option to delete index through CoreAdmin UNLOAD action
 -

 Key: SOLR-2610
 URL: https://issues.apache.org/jira/browse/SOLR-2610
 Project: Solr
  Issue Type: Improvement
  Components: multicore
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
Priority: Minor
 Fix For: 3.3, 4.0

 Attachments: SOLR-2610-branch3x.patch, SOLR-2610.patch


 Right now, one can unload a Solr Core but the index files are left behind and 
 consume disk space. We should have an option to delete the index when 
 unloading a core.

[jira] [Commented] (LUCENE-3079) Facetiing module

2011-06-22 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053581#comment-13053581
 ] 

Jason Rutherglen commented on LUCENE-3079:
--

Schemas should probably be a module that embeds the field types per segment;
this is something the faceting module could/should use. I think this is what
LUCENE-2308 is aiming for? Though I thought there was another Jira issue
created by Simon for this as well.

 Facetiing module
 

 Key: LUCENE-3079
 URL: https://issues.apache.org/jira/browse/LUCENE-3079
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Attachments: LUCENE-3079.patch


 Faceting is a hugely important feature, available in Solr today but
 not [easily] usable by Lucene-only apps.
 We should fix this, by creating a shared faceting module.
 Ideally, we factor out Solr's faceting impl, and maybe poach/merge
 from other impls (eg Bobo browse).
 Hoss describes some important challenges we'll face in doing this
 (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here:
 {noformat}
 To look at faceting as a concrete example, there are big reasons 
 faceting works so well in Solr: Solr has total control over the 
 index, knows exactly when the index has changed to rebuild caches, has a 
 strict schema so it can make sense of field types and 
 pick faceting algos accordingly, has multi-phase distributed search 
 approach to get exact counts efficiently across multiple shards, etc...
 (and there are still a lot of additional enhancements and improvements 
 that can be made to take even more advantage of knowledge solr has because 
 it owns the index that no one has had time to tackle)
 {noformat}
 This is a great list of the things we face in refactoring.  It's also
 important because, if Solr needed to be so deeply intertwined with
 caching, schema, etc., other apps that want to facet will have the
 same needs and so we really have to address them in creating the
 shared module.
 I think we should get a basic faceting module started, but should not
 cut Solr over at first.  We should iterate on the module, fold in
 improvements, etc., and then, once we can fully verify that cutting
 over doesn't hurt Solr (ie lose functionality or performance) we can
 later cut over.

[jira] [Commented] (LUCENE-3079) Facetiing module

2011-06-22 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053591#comment-13053591
 ] 

Jason Rutherglen commented on LUCENE-3079:
--

bq. I don't think any Facet module needs to be concerned with Schemas

Right, they should be field type aware.

 Facetiing module
 

 Key: LUCENE-3079
 URL: https://issues.apache.org/jira/browse/LUCENE-3079
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Attachments: LUCENE-3079.patch


 Faceting is a hugely important feature, available in Solr today but
 not [easily] usable by Lucene-only apps.
 We should fix this, by creating a shared faceting module.
 Ideally, we factor out Solr's faceting impl, and maybe poach/merge
 from other impls (eg Bobo browse).
 Hoss describes some important challenges we'll face in doing this
 (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here:
 {noformat}
 To look at faceting as a concrete example, there are big reasons 
 faceting works so well in Solr: Solr has total control over the 
 index, knows exactly when the index has changed to rebuild caches, has a 
 strict schema so it can make sense of field types and 
 pick faceting algos accordingly, has multi-phase distributed search 
 approach to get exact counts efficiently across multiple shards, etc...
 (and there are still a lot of additional enhancements and improvements 
 that can be made to take even more advantage of knowledge solr has because 
 it owns the index that no one has had time to tackle)
 {noformat}
 This is a great list of the things we face in refactoring.  It's also
 important because, if Solr needed to be so deeply intertwined with
 caching, schema, etc., other apps that want to facet will have the
 same needs and so we really have to address them in creating the
 shared module.
 I think we should get a basic faceting module started, but should not
 cut Solr over at first.  We should iterate on the module, fold in
 improvements, etc., and then, once we can fully verify that cutting
 over doesn't hurt Solr (i.e., lose functionality or performance), we can
 cut over.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-2610) Add an option to delete index through CoreAdmin UNLOAD action

2011-06-21 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052624#comment-13052624
 ] 

Jason Rutherglen commented on SOLR-2610:


This is good!  I had to write the same functionality into a custom Solr build 
on a project.

 Add an option to delete index through CoreAdmin UNLOAD action
 -

 Key: SOLR-2610
 URL: https://issues.apache.org/jira/browse/SOLR-2610
 Project: Solr
  Issue Type: Improvement
  Components: multicore
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
Priority: Minor
 Fix For: 3.3, 4.0

 Attachments: SOLR-2610.patch


 Right now, one can unload a Solr Core but the index files are left behind and 
 consume disk space. We should have an option to delete the index when 
 unloading a core.
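As a rough illustration of what "delete the index when unloading" involves, here is a minimal sketch in Java that removes a core's index directory tree once the core is no longer in use. The class name IndexDirCleaner is hypothetical; this is not the code in SOLR-2610.patch, and it ignores details such as open file handles or another core sharing the same index directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper (not the SOLR-2610 patch): deletes an unloaded core's index directory.
final class IndexDirCleaner {
    private IndexDirCleaner() {}

    /** Deletes dir and everything beneath it; returns how many paths were removed. */
    static int deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return 0; // nothing left to clean up
        }
        Path[] paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts every child before its parent directory,
            // so each directory is already empty when it is deleted.
            paths = walk.sorted(Comparator.reverseOrder()).toArray(Path[]::new);
        }
        for (Path p : paths) {
            Files.delete(p);
        }
        return paths.length;
    }
}
```

Returning 0 for a missing directory makes a repeated unload harmless rather than an error.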

[jira] [Commented] (LUCENE-2919) IndexSplitter that divides by primary key term

2011-06-18 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051552#comment-13051552
 ] 

Jason Rutherglen commented on LUCENE-2919:
--

Thanks, committing this means I can remove a custom GitHub branch with only
this patch.  Also, it'd be great if we somehow published nightly versions to
Maven repositories, though they'd accumulate over time.

 IndexSplitter that divides by primary key term
 --

 Key: LUCENE-2919
 URL: https://issues.apache.org/jira/browse/LUCENE-2919
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.3, 4.0

 Attachments: LUCENE-2919-3x.patch, LUCENE-2919-filter.patch, 
 LUCENE-2919-filter.patch, LUCENE-2919-filter.patch, LUCENE-2919.patch


 Index splitter that divides by primary key term.  The contrib 
 MultiPassIndexSplitter we have divides by docid; however, to guarantee
 external constraints it's sometimes necessary to split by a primary key term 
 id.  I think this implementation is a fairly trivial change.
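To make the docid vs. primary key distinction concrete, here is a minimal sketch, assuming each document's primary key is already available as a String indexed by docid and that partitions are defined by sorted split points. PrimaryKeySplitter and its parameters are hypothetical and do not reflect the API in the attached LUCENE-2919 patches; the sketch only shows how documents would be assigned to output partitions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: assign documents to partitions by primary key term, not docid range.
final class PrimaryKeySplitter {
    private PrimaryKeySplitter() {}

    /**
     * @param keys        primary key of each document, indexed by docid
     * @param splitPoints sorted, exclusive upper bounds for every partition except the last
     * @return one list of docids per partition (splitPoints.length + 1 partitions)
     */
    static List<List<Integer>> split(String[] keys, String[] splitPoints) {
        List<List<Integer>> parts = new ArrayList<>();
        for (int i = 0; i <= splitPoints.length; i++) {
            parts.add(new ArrayList<>());
        }
        for (int docId = 0; docId < keys.length; docId++) {
            parts.get(partitionOf(keys[docId], splitPoints)).add(docId);
        }
        return parts;
    }

    /** Index of the first split point greater than key, or the last partition. */
    private static int partitionOf(String key, String[] splitPoints) {
        int p = 0;
        while (p < splitPoints.length && key.compareTo(splitPoints[p]) >= 0) {
            p++;
        }
        return p;
    }
}
```

Because the partition depends only on the key, a given document always lands in the same output index regardless of its docid, which is the kind of external constraint a docid-based split cannot guarantee.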

[jira] [Commented] (LUCENE-2919) IndexSplitter that divides by primary key term

2011-06-18 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051576#comment-13051576
 ] 

Jason Rutherglen commented on LUCENE-2919:
--

@Ryan Thanks!  What artifact info would one put into the pom.xml?

 IndexSplitter that divides by primary key term
 --

 Key: LUCENE-2919
 URL: https://issues.apache.org/jira/browse/LUCENE-2919
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.3, 4.0

 Attachments: LUCENE-2919-3x.patch, LUCENE-2919-filter.patch, 
 LUCENE-2919-filter.patch, LUCENE-2919-filter.patch, LUCENE-2919.patch


 Index splitter that divides by primary key term.  The contrib 
 MultiPassIndexSplitter we have divides by docid; however, to guarantee
 external constraints it's sometimes necessary to split by a primary key term 
 id.  I think this implementation is a fairly trivial change.

[jira] [Commented] (SOLR-1431) CommComponent abstracted

2011-06-16 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050521#comment-13050521
 ] 

Jason Rutherglen commented on SOLR-1431:


Seems to be fine.  It'd be great to modularize Zookeeper references into a 
separate abstract interface (like what's done here), and not tie it to 
CoreContainer.  I think it could conflict with other uses of Zookeeper when the 
library versions are different.

 CommComponent abstracted
 

 Key: SOLR-1431
 URL: https://issues.apache.org/jira/browse/SOLR-1431
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Noble Paul
 Fix For: 4.0

 Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch


 We'll abstract CommComponent in this issue.

[jira] [Commented] (SOLR-1431) CommComponent abstracted

2011-06-16 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050613#comment-13050613
]

Jason Rutherglen commented on SOLR-1431:

@Noble I agree, I don't think committing this patch should hold things up.
That was just a little note.

I've been looking at implementing Solr in HBase and am [somewhat] worried
about the ZK libraries. HBase + Solr can help with the massive-scale, near-realtime
systems you've described; e.g., HBase implements splitting, partitioning, a fast
write-ahead log, etc. Facebook has implemented the index directly in HBase,
which probably offers degraded indexing and search performance.

bq. We badly need the cloud features now

Right, many users are going with ElasticSearch for the reasons mentioned.

CommComponent abstracted

Key: SOLR-1431
URL: https://issues.apache.org/jira/browse/SOLR-1431
Project: Solr
Issue Type: Improvement
Components: search
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Noble Paul
Fix For: 4.0

Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch,
SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch,
SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch

We'll abstract CommComponent in this issue.

[jira] [Commented] (SOLR-1431) CommComponent abstracted

2011-06-16 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050638#comment-13050638
 ] 

Jason Rutherglen commented on SOLR-1431:


Noble, the Jira issue is HBASE-3529 where much of the code is offline on Git 
because of the different pieces involved.  That being said, I've linked the 
various Lucene and Solr Jira issues that are required to implement Solr in 
HBase, eg LUCENE-2919 and SOLR-2563.

 CommComponent abstracted
 

 Key: SOLR-1431
 URL: https://issues.apache.org/jira/browse/SOLR-1431
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Noble Paul
 Fix For: 4.0

 Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch


 We'll abstract CommComponent in this issue.

[jira] [Commented] (LUCENE-3199) Add non-destructive sort to BytesRefHash

2011-06-14 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13048987#comment-13048987
 ] 

Jason Rutherglen commented on LUCENE-3199:
--

I think the issue with this, as it relates to realtime search, is that in order
to sort, we'll need to freeze indexing.

 Add non-destructive sort to BytesRefHash
 -

 Key: LUCENE-3199
 URL: https://issues.apache.org/jira/browse/LUCENE-3199
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Jason Rutherglen
Priority: Minor

 Currently, sorting a BytesRefHash is destructive.  We can add a method that
 returns the sorted ids as a non-destructively generated int[].

[jira] [Created] (LUCENE-3199) Add non-destructive sort to BytesRefHash

2011-06-13 Thread Jason Rutherglen (JIRA)

Add non-destructive sort to BytesRefHash
-

 Key: LUCENE-3199
 URL: https://issues.apache.org/jira/browse/LUCENE-3199
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Jason Rutherglen
Priority: Minor


Currently, sorting a BytesRefHash is destructive.  We can add a method that
returns the sorted ids as a non-destructively generated int[].
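A minimal sketch of the idea, assuming a simplified term store rather than Lucene's actual BytesRefHash internals: terms receive dense ids in insertion order, and sortedIds() builds a fresh int[] ordered by unsigned byte comparison without rearranging the stored terms, so the structure stays usable afterwards. TermIdHash is a hypothetical name, and the class is not thread-safe; a concurrent realtime version would still need indexing paused (or a snapshot taken) while sorting, as noted in the comment above.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a BytesRefHash-like structure with a non-destructive sort.
final class TermIdHash {
    private final List<byte[]> terms = new ArrayList<>();    // id -> term bytes
    private final Map<String, Integer> ids = new HashMap<>(); // term -> id

    /** Returns the id for term, assigning the next dense id if it is new. */
    int add(String term) {
        Integer existing = ids.get(term);
        if (existing != null) {
            return existing;
        }
        int id = terms.size();
        terms.add(term.getBytes(StandardCharsets.UTF_8));
        ids.put(term, id);
        return id;
    }

    String get(int id) {
        return new String(terms.get(id), StandardCharsets.UTF_8);
    }

    int size() {
        return terms.size();
    }

    /** Ids ordered by the unsigned byte order of their terms; stored terms are not moved. */
    int[] sortedIds() {
        int n = terms.size();
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> compareUnsigned(terms.get(a), terms.get(b)));
        int[] result = new int[n];
        for (int i = 0; i < n; i++) {
            result[i] = order[i];
        }
        return result;
    }

    private static int compareUnsigned(byte[] x, byte[] y) {
        int len = Math.min(x.length, y.length);
        for (int i = 0; i < len; i++) {
            int cmp = (x[i] & 0xff) - (y[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return x.length - y.length; // a shorter prefix sorts first
    }
}
```

The trade-off is an extra array of size n per sort, in exchange for not having to clear and rebuild the hash afterwards.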

[jira] [Commented] (SOLR-1431) CommComponent abstracted

2011-06-10 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13047275#comment-13047275
 ] 

Jason Rutherglen commented on SOLR-1431:


I just downloaded http://svn.apache.org/repos/asf/lucene/dev/trunk and applied
the patch, and test-core passed.  However, the patch command reported messages
about specific hunks, though no .rej file was created.

 CommComponent abstracted
 

 Key: SOLR-1431
 URL: https://issues.apache.org/jira/browse/SOLR-1431
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Mark Miller
 Fix For: 4.0

 Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch


 We'll abstract CommComponent in this issue.

[jira] [Commented] (LUCENE-2955) Add utility class to manage NRT reopening

2011-06-09 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046644#comment-13046644
]

Jason Rutherglen commented on LUCENE-2955:
--

Perhaps we can merge this functionality with SOLR-2565 and/or SOLR-2566, such
that Solr uses it for reader opening. However, why does this issue use a
background reopen thread while Solr performs a max-time reopen?

Add utility class to manage NRT reopening
-

Key: LUCENE-2955
URL: https://issues.apache.org/jira/browse/LUCENE-2955
Project: Lucene - Java
Issue Type: Improvement
Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: 3.3

Attachments: LUCENE-2955.patch, LUCENE-2955.patch

I created a simple class, NRTManager, that tries to abstract away some
of the reopen logic when using NRT readers.
You give it your IW, tell it the minimum and maximum staleness (in nanoseconds)
you can tolerate, and it privately runs a reopen thread to periodically reopen
the searcher.
It subsumes the SearcherManager from LIA2. Besides running the reopen
thread, it also adds the notion of a generation containing changes
you've made. So eg it has addDocument, returning a long. You can
then take that long value and pass it back to the getSearcher method
and getSearcher will return a searcher that reflects the changes made
in that generation.
This gives your app the freedom to force immediate consistency (ie
wait for the reopen) only for those searches that require it, like a
verifier that adds a doc and then immediately searches for it, but
also use eventual consistency for other searches.
I want to also add support for the new applyDeletions option when
pulling an NRT reader.
Also, this is very new and I'm sure buggy -- the concurrency is either
wrong or overly locking. But it's a start...
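The "generation" idea described above can be sketched independently of Lucene. The following is a minimal sketch under stated assumptions, not the NRTManager code: GenerationTracker and its methods are hypothetical, and the actual reader reopen is left out. A write returns a generation; a reopen first records which generation it will cover, and only after the new searcher is published does it advance the searchable generation, so writes that arrive mid-reopen are not falsely reported as visible.

```java
// Hypothetical sketch of generation tracking for NRT reopen (not the NRTManager implementation).
final class GenerationTracker {
    private long indexingGen = 0;  // generation of the most recent write
    private long searchingGen = 0; // highest generation visible to searchers

    /** Records a write (e.g. an addDocument call) and returns its generation. */
    synchronized long recordWrite() {
        return ++indexingGen;
    }

    /** Call just before reopening: returns the generation the new reader will cover. */
    synchronized long beginReopen() {
        return indexingGen;
    }

    /** Call once the reader opened after beginReopen() has been published to searchers. */
    synchronized void finishReopen(long coveredGen) {
        if (coveredGen > searchingGen) {
            searchingGen = coveredGen;
            notifyAll(); // wake callers blocked in waitForGeneration
        }
    }

    /** True if a searcher reflecting targetGen is already available. */
    synchronized boolean isVisible(long targetGen) {
        return searchingGen >= targetGen;
    }

    /** Blocks until targetGen is searchable or timeoutMillis elapses; returns true if visible. */
    synchronized boolean waitForGeneration(long targetGen, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (searchingGen < targetGen) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false;
            }
            wait(remaining); // loop guards against spurious wakeups
        }
        return true;
    }
}
```

A search that needs immediate consistency would call waitForGeneration with the value returned by its own write; every other search simply uses the current searcher, which gives the eventual consistency described above.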

[jira] [Commented] (LUCENE-3176) TestNRTThreads test failure

2011-06-06 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13044981#comment-13044981
 ] 

Jason Rutherglen commented on LUCENE-3176:
--

It's probably the new DWPT code.  There was a specific issue to fix this
problem: LUCENE-2956.

 TestNRTThreads test failure
 ---

 Key: LUCENE-3176
 URL: https://issues.apache.org/jira/browse/LUCENE-3176
 Project: Lucene - Java
  Issue Type: Bug
 Environment: trunk
Reporter: Robert Muir
Assignee: Michael McCandless

 hit a fail in TestNRTThreads running tests over and over:

[jira] [Updated] (SOLR-1431) CommComponent abstracted

2011-06-02 Thread Jason Rutherglen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated SOLR-1431:
---

Fix Version/s: (was: 3.2)
 Priority: Major  (was: Trivial)
Affects Version/s: (was: 1.4)
   4.0

 CommComponent abstracted
 

 Key: SOLR-1431
 URL: https://issues.apache.org/jira/browse/SOLR-1431
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Noble Paul
 Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 We'll abstract CommComponent in this issue.

[jira] [Updated] (SOLR-1431) CommComponent abstracted

2011-06-02 Thread Jason Rutherglen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated SOLR-1431:
---

Remaining Estimate: (was: 24h)
 Original Estimate: (was: 24h)

 CommComponent abstracted
 

 Key: SOLR-1431
 URL: https://issues.apache.org/jira/browse/SOLR-1431
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Noble Paul
 Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch


 We'll abstract CommComponent in this issue.

[jira] [Updated] (SOLR-1431) CommComponent abstracted

2011-06-02 Thread Jason Rutherglen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated SOLR-1431:
---

Attachment: SOLR-1431.patch

Here's a patch updated to trunk.

 CommComponent abstracted
 

 Key: SOLR-1431
 URL: https://issues.apache.org/jira/browse/SOLR-1431
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Noble Paul
 Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch


 We'll abstract CommComponent in this issue.

[jira] [Updated] (SOLR-1431) CommComponent abstracted

2011-06-02 Thread Jason Rutherglen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated SOLR-1431:
---

Attachment: SOLR-1431.patch

Methods moved up into abstract class ShardHandler.  All tests pass.

 CommComponent abstracted
 

 Key: SOLR-1431
 URL: https://issues.apache.org/jira/browse/SOLR-1431
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Noble Paul
 Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch


 We'll abstract CommComponent in this issue.

[jira] [Commented] (SOLR-1431) CommComponent abstracted

2011-06-02 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13042866#comment-13042866
 ] 

Jason Rutherglen commented on SOLR-1431:


No worries mate!

 CommComponent abstracted
 

 Key: SOLR-1431
 URL: https://issues.apache.org/jira/browse/SOLR-1431
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Noble Paul
 Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch


 We'll abstract CommComponent in this issue.

[jira] [Commented] (SOLR-2563) Allow generic pluggable file system implementations

2011-06-02 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13042870#comment-13042870
 ] 

Jason Rutherglen commented on SOLR-2563:


One way to test this out would be to create a Solr unit test that tries to
create a Solr instance on top of HDFS using an HDFSSolrResourceLoader.  Then I
think the problem areas would reveal themselves.  It would be nice to run all
of the Solr unit tests this way; however, that seems much more complex.

 Allow generic pluggable file system implementations
 ---

 Key: SOLR-2563
 URL: https://issues.apache.org/jira/browse/SOLR-2563
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 4.0
Reporter: Jason Rutherglen

 Resources like configuration files can be loaded from places other than
 the local filesystem, such as Zookeeper or HDFS.  In this issue I will
 abstract that functionality out.

[jira] [Commented] (SOLR-2193) Re-architect Update Handler

2011-06-01 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13041995#comment-13041995
]

Jason Rutherglen commented on SOLR-2193:

I'm curious if someone who doesn't work at Lucid can be involved in Solr design
discussions. In any case, please autocratically continue.

Re-architect Update Handler
---

Key: SOLR-2193
URL: https://issues.apache.org/jira/browse/SOLR-2193
Project: Solr
Issue Type: Improvement
Reporter: Mark Miller
Assignee: Robert Muir
Fix For: 4.0

Attachments: SOLR-2193.patch, SOLR-2193.patch, SOLR-2193.patch,
SOLR-2193.patch, SOLR-2193.patch, SOLR-2193.patch

The update handler needs an overhaul.
A few goals I think we might want to look at:
1. Cleanup - drop DirectUpdateHandler(2) line - move to something like
UpdateHandler, DefaultUpdateHandler
2. Expose the SolrIndexWriter in the API or add the proper abstractions to
get done what we now do with special casing:
    if (directupdatehandler2)
      success
    else
      failish
3. Stop closing the IndexWriter and start using commit (still lazy IW init
though).
4. Drop iwAccess, iwCommit locks and sync mostly at the Lucene level.
5. Keep NRT support in mind.
6. Keep microsharding in mind (maintain logical index as multiple physical
indexes)
7. Address the current issues we face because multiple original/'reloaded'
cores can have a different IndexWriter on the same index.
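Goal 2 amounts to replacing concrete-class special casing with an abstraction that callers can query. A minimal sketch of that pattern, assuming a hypothetical capability method on the base class; none of these class or method names come from Solr or the attached patches.

```java
// Hypothetical illustration: callers query a capability instead of checking the concrete type.
abstract class UpdateHandlerSketch {
    /** True if this handler keeps a long-lived IndexWriter that callers may share. */
    abstract boolean exposesIndexWriter();
}

/** Stand-in for a handler, like DirectUpdateHandler2, that owns a shared writer. */
final class SharedWriterHandler extends UpdateHandlerSketch {
    @Override
    boolean exposesIndexWriter() {
        return true;
    }
}

/** Stand-in for any other handler implementation. */
final class OtherHandler extends UpdateHandlerSketch {
    @Override
    boolean exposesIndexWriter() {
        return false;
    }
}

final class UpdateCaller {
    private UpdateCaller() {}

    /** Replaces an "if (handler instanceof DirectUpdateHandler2)" check with a call on the abstraction. */
    static String apply(UpdateHandlerSketch handler) {
        return handler.exposesIndexWriter() ? "success" : "fallback";
    }
}
```

New handler implementations then only need to answer the capability question; calling code does not change when a handler is added or renamed.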

[jira] [Commented] (SOLR-2193) Re-architect Update Handler

2011-06-01 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13042004#comment-13042004
]

Jason Rutherglen commented on SOLR-2193:

This article is an indicator of the types of benchmarks to perform:
http://engineering.socialcast.com/2011/05/realtime-search-solr-vs-elasticsearch/

Re-architect Update Handler
---

Key: SOLR-2193
URL: https://issues.apache.org/jira/browse/SOLR-2193
Project: Solr
Issue Type: Improvement
Reporter: Mark Miller
Assignee: Robert Muir
Fix For: 4.0

Attachments: SOLR-2193.patch, SOLR-2193.patch, SOLR-2193.patch,
SOLR-2193.patch, SOLR-2193.patch, SOLR-2193.patch

[jira] [Commented] (SOLR-2193) Re-architect Update Handler

2011-06-01 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13042011#comment-13042011
]

Jason Rutherglen commented on SOLR-2193:

bq. Jason, this issue isn't intended to solve NRT

What is this line doing?

{code}
newReader = currentReader.reopen(indexWriterProvider.getIndexWriter(), true);
{code}

Re-architect Update Handler
---

Key: SOLR-2193
URL: https://issues.apache.org/jira/browse/SOLR-2193
Project: Solr
Issue Type: Improvement
Reporter: Mark Miller
Assignee: Robert Muir
Fix For: 4.0

Attachments: SOLR-2193.patch, SOLR-2193.patch, SOLR-2193.patch,
SOLR-2193.patch, SOLR-2193.patch, SOLR-2193.patch

[jira] [Commented] (SOLR-2193) Re-architect Update Handler

2011-06-01 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13042013#comment-13042013
]

Jason Rutherglen commented on SOLR-2193:

Also:

https://issues.apache.org/jira/browse/SOLR-2193?focusedCommentId=13016875page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13016875

And this comment:

{quote}
Can you elaborate on why you don't think it's implementing NRT? I've tested
basic indexing/searching using wikipedia documents at about 50-100 documents a
second, opening a new reader every second. That felt pretty near-real-time to
me, but the phrase is subjective.
{quote}

from:
https://issues.apache.org/jira/browse/SOLR-2193?focusedCommentId=13041268page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13041268

Robert, your statement's confusing.

Re-architect Update Handler
---

Key: SOLR-2193
URL: https://issues.apache.org/jira/browse/SOLR-2193
Project: Solr
Issue Type: Improvement
Reporter: Mark Miller
Assignee: Robert Muir
Fix For: 4.0

Attachments: SOLR-2193.patch, SOLR-2193.patch, SOLR-2193.patch,
SOLR-2193.patch, SOLR-2193.patch, SOLR-2193.patch

[jira] [Created] (SOLR-2569) Enable facile moving of cores

2011-06-01 Thread Jason Rutherglen (JIRA)

Enable facile moving of cores
-

 Key: SOLR-2569
 URL: https://issues.apache.org/jira/browse/SOLR-2569
 Project: Solr
  Issue Type: Improvement
  Components: multicore, replication (java)
Affects Versions: 4.0
Reporter: Jason Rutherglen


Spin-off from this thread: 
http://search-lucene.com/m/5CO7Z1oOrh6/elastic+searchsubj=Solr+vs+ElasticSearch

[jira] [Commented] (SOLR-2193) Re-architect Update Handler

2011-06-01 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13042259#comment-13042259
]

Jason Rutherglen commented on SOLR-2193:

Simon, thanks for opening new issues.

Re-architect Update Handler
---

Key: SOLR-2193
URL: https://issues.apache.org/jira/browse/SOLR-2193
Project: Solr
Issue Type: Improvement
Reporter: Mark Miller
Assignee: Robert Muir
Fix For: 4.0

Attachments: SOLR-2193.patch, SOLR-2193.patch, SOLR-2193.patch,
SOLR-2193.patch, SOLR-2193.patch, SOLR-2193.patch

[jira] [Commented] (SOLR-2193) Re-architect Update Handler

2011-05-31 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13041464#comment-13041464
]

Jason Rutherglen commented on SOLR-2193:

bq. I enjoyed our dialogue honestly

I'd prefer to simply get things done rather than banter with no results.

Re-architect Update Handler
---

Key: SOLR-2193
URL: https://issues.apache.org/jira/browse/SOLR-2193
Project: Solr
Issue Type: Improvement
Reporter: Mark Miller
Fix For: 4.0

Attachments: SOLR-2193.patch, SOLR-2193.patch, SOLR-2193.patch,
SOLR-2193.patch, SOLR-2193.patch

[jira] [Commented] (SOLR-2193) Re-architect Update Handler

2011-05-31 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13041691#comment-13041691
]

Jason Rutherglen commented on SOLR-2193:

As previously suggested, we need a new issue that refactors IndexWriter into
SolrCore, instead of placing it into an UpdateHandler. Then we can iterate on
re/factoring the NRT functionality.

Re-architect Update Handler
---

Key: SOLR-2193
URL: https://issues.apache.org/jira/browse/SOLR-2193
Project: Solr
Issue Type: Improvement
Reporter: Mark Miller
Fix For: 4.0

Attachments: SOLR-2193.patch, SOLR-2193.patch, SOLR-2193.patch,
SOLR-2193.patch, SOLR-2193.patch, SOLR-2193.patch

[jira] [Commented] (SOLR-2193) Re-architect Update Handler

2011-05-31 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13041693#comment-13041693
]

Jason Rutherglen commented on SOLR-2193:

{quote}this is a fundamentally wrong direction{quote}

Yes. The idea of adding NRT is good though.

Re-architect Update Handler
---

Key: SOLR-2193
URL: https://issues.apache.org/jira/browse/SOLR-2193
Project: Solr
Issue Type: Improvement
Reporter: Mark Miller
Fix For: 4.0

Attachments: SOLR-2193.patch, SOLR-2193.patch, SOLR-2193.patch,
SOLR-2193.patch, SOLR-2193.patch, SOLR-2193.patch

[jira] [Created] (SOLR-2563) Allow generic pluggable file system implementations

2011-05-31 Thread Jason Rutherglen (JIRA)

Allow generic pluggable file system implementations
---

 Key: SOLR-2563
 URL: https://issues.apache.org/jira/browse/SOLR-2563
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 4.0
Reporter: Jason Rutherglen


Resources like configuration files can be loaded from places other than the
local filesystem, such as Zookeeper or HDFS.  In this issue I will abstract
that functionality out.
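A minimal sketch of what such an abstraction could look like, assuming configuration is read as byte streams by name. ResourceSource, LocalDirSource, InMemorySource and ConfigReader are hypothetical names, not Solr's ResourceLoader API; the in-memory source merely stands in for a remote store such as Zookeeper or HDFS, whose client code is out of scope here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical abstraction over where configuration resources live. */
interface ResourceSource {
    InputStream open(String name) throws IOException;
}

/** Reads resources from a directory on the local filesystem. */
final class LocalDirSource implements ResourceSource {
    private final Path root;

    LocalDirSource(Path root) {
        this.root = root;
    }

    @Override
    public InputStream open(String name) throws IOException {
        return Files.newInputStream(root.resolve(name));
    }
}

/** In-memory source standing in for a remote store such as Zookeeper or HDFS. */
final class InMemorySource implements ResourceSource {
    private final Map<String, byte[]> files = new HashMap<>();

    InMemorySource put(String name, String content) {
        files.put(name, content.getBytes(StandardCharsets.UTF_8));
        return this;
    }

    @Override
    public InputStream open(String name) throws IOException {
        byte[] data = files.get(name);
        if (data == null) {
            throw new FileNotFoundException(name);
        }
        return new ByteArrayInputStream(data);
    }
}

final class ConfigReader {
    private ConfigReader() {}

    /** Reads a whole resource as UTF-8 text, regardless of which source backs it. */
    static String readText(ResourceSource source, String name) throws IOException {
        try (InputStream in = source.open(name)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

Code that only depends on ResourceSource can then be unit-tested against the in-memory implementation and deployed against a remote one.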

[jira] [Commented] (SOLR-2563) Allow generic pluggable file system implementations

2011-05-31 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13041713#comment-13041713
 ] 

Jason Rutherglen commented on SOLR-2563:


Uwe, thanks. I think there was an issue even trying to use that, though. I'll
take a look and report back!

 Allow generic pluggable file system implementations
 ---

 Key: SOLR-2563
 URL: https://issues.apache.org/jira/browse/SOLR-2563
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 4.0
Reporter: Jason Rutherglen

 For things like configuration files, they can be loaded from places other 
 than the local filesystem, such as Zookeeper or HDFS.  In this issue I will 
 abstract that functionality out.

[jira] [Commented] (SOLR-2193) Re-architect Update Handler

2011-05-31 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13041715#comment-13041715
]

Jason Rutherglen commented on SOLR-2193:

{quote}I haven't looked that closely at this patch yet, but it already fixes a
long-standing problem in Solr, that a long-running merge blocks a Solr commit,
because it switches to IW.commit instead of closing/opening the writer.{quote}

Yes, that is/was not clear in the issue. Thank you for spelling it out.
However, I think the patch creates new abstract classes that would later go
away? Why not spend a little more time on an overall design for future
refactoring?

Re-architect Update Handler
---

Key: SOLR-2193
URL: https://issues.apache.org/jira/browse/SOLR-2193
Project: Solr
Issue Type: Improvement
Reporter: Mark Miller
Fix For: 4.0

Attachments: SOLR-2193.patch, SOLR-2193.patch, SOLR-2193.patch,
SOLR-2193.patch, SOLR-2193.patch, SOLR-2193.patch

[jira] [Commented] (SOLR-2193) Re-architect Update Handler

2011-05-31 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13041767#comment-13041767
]

Jason Rutherglen commented on SOLR-2193:

{quote}solving our rather nasty reload a core, briefly different writers on the
same index problem (usually avoided because the overlap is brief and the
IndexWriter created lazily).{quote}

Robert, I fully agree; however, then the title of the Jira issue is incorrect.

Also the whole ref counted thing in Solr:

{code}
RefCounted<SolrIndexSearcher> holder = core.getNewestSearcher(false);
SolrIndexSearcher s = holder.get();
holder.decref();
// since there could be two commits in a row, don't test for a specific new searcher,
// just test that the old one has been replaced.
{code}

This should not be needed anymore. We're also adding ref counting on IWs now?
All of this is unnecessary. If we're modularizing, this isn't the right
path to go down.
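For readers unfamiliar with the pattern being criticized, here is a minimal sketch of reference counting as used around searchers: every holder must pair `incref()` with `decref()`, and the last `decref()` closes the resource. `SimpleRefCounted` is a hypothetical name; Solr's own `RefCounted` class differs in detail.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical minimal sketch of the ref-counting pattern; not Solr's actual RefCounted class.
abstract class SimpleRefCounted<T> {
  private final T resource;
  private final AtomicInteger refCount = new AtomicInteger(1); // the creator holds the first reference

  SimpleRefCounted(T resource) { this.resource = resource; }

  T get() { return resource; }

  int refCount() { return refCount.get(); }

  SimpleRefCounted<T> incref() {
    refCount.incrementAndGet();
    return this;
  }

  void decref() {
    int remaining = refCount.decrementAndGet();
    if (remaining == 0) {
      close(); // the last holder releases the underlying resource
    } else if (remaining < 0) {
      throw new IllegalStateException("decref called more times than incref");
    }
  }

  protected abstract void close();
}
```

Every forgotten `decref()` leaks a searcher and every extra one closes it under another caller, which is the bookkeeping burden the comment objects to.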

Re-architect Update Handler
---

Key: SOLR-2193
URL: https://issues.apache.org/jira/browse/SOLR-2193
Project: Solr
Issue Type: Improvement
Reporter: Mark Miller
Fix For: 4.0

Attachments: SOLR-2193.patch, SOLR-2193.patch, SOLR-2193.patch,
SOLR-2193.patch, SOLR-2193.patch, SOLR-2193.patch

[jira] [Commented] (SOLR-2563) Allow generic pluggable file system implementations

2011-05-31 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13041924#comment-13041924
 ] 

Jason Rutherglen commented on SOLR-2563:


I don't think CoreContainer is completely decoupled from the local file system.
Check out persist, persistFile, etc. Those should either be turned off or
write to the underlying generic file system.

It looks like libs are hard coded in CoreContainer?

{code}
if (libDir != null) {
  File f = FileUtils.resolvePath(new File(dir), libDir);
  log.info("loading shared library: " + f.getAbsolutePath());
  libLoader = SolrResourceLoader.createClassLoader(f, null);
}
{code}

CoreDescriptor.getDataDir() is ambiguous.

QueryElevationComponent is hardcoded:

{code}
// check if using ZooKeeper
ZkController zkController =
    core.getCoreDescriptor().getCoreContainer().getZkController();
if (zkController != null) {
  // ...
}

IndexBasedSpellChecker.initSourceReader() is also hardcoded to the local file system.

SolrIndexWriter hardcodes writing the infoStream to the local file system.

The benchmark code is as well; however, that's less of a priority.

 Allow generic pluggable file system implementations
 ---

 Key: SOLR-2563
 URL: https://issues.apache.org/jira/browse/SOLR-2563
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 4.0
Reporter: Jason Rutherglen

 For things like configuration files, they can be loaded from places other 
 than the local filesystem, such as Zookeeper or HDFS.  In this issue I will 
 abstract that functionality out.

[jira] [Commented] (SOLR-1395) Integrate Katta

2011-05-30 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13041026#comment-13041026
 ] 

Jason Rutherglen commented on SOLR-1395:


I think John Wu brings up excellent points. I don't think Solr Cloud
offers the same thing as this issue, and/or it's not articulated well on
the wiki. Lucene out of the box doesn't offer facets and other search
component features. These are things Solr provides but could/should be
modularized out, as already proposed. Solr is currently too tightly
interwoven; this is perhaps why this patch is challenging to operate.
Integrating alternative systems into Solr seems to be political from my
point of view, e.g., <political>Solr + Katta</political>.

 Integrate Katta
 ---

 Key: SOLR-1395
 URL: https://issues.apache.org/jira/browse/SOLR-1395
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 3.2

 Attachments: SOLR-1395.patch, SOLR-1395.patch, SOLR-1395.patch, 
 back-end.log, front-end.log, hadoop-core-0.19.0.jar, katta-core-0.6-dev.jar, 
 katta-solrcores.jpg, katta.node.properties, katta.zk.properties, 
 log4j-1.2.13.jar, solr-1395-1431-3.patch, solr-1395-1431-4.patch, 
 solr-1395-1431-katta0.6.patch, solr-1395-1431-katta0.6.patch, 
 solr-1395-1431.patch, solr-1395-katta-0.6.2-1.patch, 
 solr-1395-katta-0.6.2-2.patch, solr-1395-katta-0.6.2-3.patch, 
 solr-1395-katta-0.6.2.patch, solr1395.jpg, test-katta-core-0.6-dev.jar, 
 zkclient-0.1-dev.jar, zookeeper-3.2.1.jar

   Original Estimate: 336h
  Remaining Estimate: 336h

 We'll integrate Katta into Solr so that:
 * Distributed search uses Hadoop RPC
 * Shard/SolrCore distribution and management
 * Zookeeper based failover
 * Indexes may be built using Hadoop

[jira] [Commented] (SOLR-2193) Re-architect Update Handler

2011-05-30 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13041265#comment-13041265
]

Jason Rutherglen commented on SOLR-2193:

I think the Solr ref counting code should go away; it's prone to pile-ups.
Instead, as in Twitter's system, a new reader is opened per query,
because the readers are lightweight enough. I think that's a better path
to pursue than monkey-wrenching Solr's existing system, which from the
ground up is not designed for NRT. If this patch isn't implementing NRT,
what is the point?

Re-architect Update Handler
---

Key: SOLR-2193
URL: https://issues.apache.org/jira/browse/SOLR-2193
Project: Solr
Issue Type: Improvement
Reporter: Mark Miller
Assignee: Mark Miller
Fix For: 4.0

Attachments: SOLR-2193.patch, SOLR-2193.patch, SOLR-2193.patch,
SOLR-2193.patch

[jira] [Commented] (SOLR-2193) Re-architect Update Handler

2011-05-30 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13041276#comment-13041276
]

Jason Rutherglen commented on SOLR-2193:

bq. This patch certainly won't complete the NRT work needed

Mark, I was reading this comment.

bq. You are questioning my whole patch

I think it'll be easier to add what's needed for this patch into Lucene rather
than retrofit Solr. I mentioned this a while back; however, there was pushback
on re-architecting Solr. Making everything per-segment would be much more
productive than allowing NRT at this stage. Ah, I think you're simply trying
to avoid the stop-the-world pause Solr has right now? If so, that should be
stated more prominently in the Jira issue.

bq. IndexWriter writer = ((DirectUpdateHandler2)core.getUpdateHandler()).getIndexWriterProvider().getIndexWriter();

Ugly Solr style code?!

The commit in X time can be a simple contrib class for Lucene. It doesn't need
to be Solr-specific.
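A sketch of what a Solr-independent "commit within X time" helper might look like, assuming a hypothetical `Committable` interface standing in for something like `IndexWriter.commit()`. The first update after a commit schedules one commit `maxDelayMillis` later, and further updates in that window ride along with it.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for whatever performs the actual commit (e.g., IndexWriter.commit()).
interface Committable {
  void commit() throws Exception;
}

// Sketch of a "commit within maxDelayMillis" helper; not an existing Lucene class.
final class CommitWithin implements AutoCloseable {
  private final Committable target;
  private final long maxDelayMillis;
  private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
  private final AtomicBoolean pending = new AtomicBoolean(false);

  CommitWithin(Committable target, long maxDelayMillis) {
    this.target = target;
    this.maxDelayMillis = maxDelayMillis;
  }

  // Call after each update; at most one commit is scheduled per window.
  void onUpdate() {
    if (pending.compareAndSet(false, true)) {
      scheduler.schedule(this::runCommit, maxDelayMillis, TimeUnit.MILLISECONDS);
    }
  }

  private void runCommit() {
    pending.set(false); // updates arriving from now on schedule the next commit
    try {
      target.commit();
    } catch (Exception e) {
      e.printStackTrace(); // a real implementation would log and possibly retry
    }
  }

  // By default, already-scheduled delayed tasks still run after shutdown().
  @Override
  public void close() throws InterruptedException {
    scheduler.shutdown();
    scheduler.awaitTermination(10, TimeUnit.SECONDS);
  }
}
```

Nothing here depends on Solr: the helper only needs something with a `commit()` method, which is the point being made.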

Anyway, I tried to do this 2 years ago for NRT, and there was pushback on simply
getting the IndexWriter from the update handler, as the above code does.
<political>Wow</political>

Re-architect Update Handler
---

Key: SOLR-2193
URL: https://issues.apache.org/jira/browse/SOLR-2193
Project: Solr
Issue Type: Improvement
Reporter: Mark Miller
Assignee: Mark Miller
Fix For: 4.0

Attachments: SOLR-2193.patch, SOLR-2193.patch, SOLR-2193.patch,
SOLR-2193.patch

[jira] [Commented] (SOLR-2193) Re-architect Update Handler

2011-05-30 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13041399#comment-13041399
]

Jason Rutherglen commented on SOLR-2193:

bq. IndexWriter writer = ((DirectUpdateHandler2)core.getUpdateHandler()).getIndexWriterProvider().getIndexWriter();

Why isn't IW a part of SolrCore? It's the main class running the show. How
can there be a Solr core without an IW? I think IW never gets closed until
the SolrCore is closed. The next move would be to place all of the caches
at the segment level.

It's been clear for quite a while that you folks at Lucid are trying to
protect your golden goose, i.e., Solr, from changing much unless dictated by
your staff or a paying customer. I think in politics those are called
bribes? Hence a large part of the recent fracas regarding modularizing the
goose, whose 'resolution' has resulted in no changes.

It's astonishing which changes are OK for Solr when some people propose them,
but not OK when others do. This is not a meritocracy. If you insist on driving,
you should incorporate some of the feedback given. Solr was hacked
together from the beginning, and this is yet another ugly retrofit that is
being steamrolled in. If you're confident in your abilities, you're
confident enough to make major changes. I've never seen that on the Solr
side of the Lucene project.

bq. I remember that issue - I tried to make some comments to help you out with it

No, there was pushback on something silly and simple, e.g., getting the IW
from the UpdateHandler, just as you have done here. What is the point in
contributing when contributions are blocked for no reason?

bq. SOLR-1155

What happened to this poor guy's patch? Nothing.

Re-architect Update Handler
---

Key: SOLR-2193
URL: https://issues.apache.org/jira/browse/SOLR-2193
Project: Solr
Issue Type: Improvement
Reporter: Mark Miller
Assignee: Mark Miller
Fix For: 4.0

Attachments: SOLR-2193.patch, SOLR-2193.patch, SOLR-2193.patch,
SOLR-2193.patch, SOLR-2193.patch

[jira] [Commented] (SOLR-2193) Re-architect Update Handler

2011-05-30 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13041414#comment-13041414
]

Jason Rutherglen commented on SOLR-2193:

Mark,

That's an odd non-technical answer, and in the meritocracy of comedy, not funny
either.

Re-architect Update Handler
---

Key: SOLR-2193
URL: https://issues.apache.org/jira/browse/SOLR-2193
Project: Solr
Issue Type: Improvement
Reporter: Mark Miller
Assignee: Mark Miller
Fix For: 4.0

Attachments: SOLR-2193.patch, SOLR-2193.patch, SOLR-2193.patch,
SOLR-2193.patch, SOLR-2193.patch

[jira] [Commented] (SOLR-2193) Re-architect Update Handler

2011-05-30 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13041422#comment-13041422
]

Jason Rutherglen commented on SOLR-2193:

Mark, I think you're missing the point. If you're a committer, then it's implied
you review patches and interact with the community, nicely. That's not
happening in this issue, or in Solr generally, as in fact many people have noted.

Re-architect Update Handler
---

Key: SOLR-2193
URL: https://issues.apache.org/jira/browse/SOLR-2193
Project: Solr
Issue Type: Improvement
Reporter: Mark Miller
Assignee: Mark Miller
Fix For: 4.0

Attachments: SOLR-2193.patch, SOLR-2193.patch, SOLR-2193.patch,
SOLR-2193.patch, SOLR-2193.patch

[jira] [Commented] (SOLR-2193) Re-architect Update Handler

2011-05-30 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13041428#comment-13041428
]

Jason Rutherglen commented on SOLR-2193:

-1 on the patch; I just reviewed it again. IndexWriter should be a part of
SolrCore (IW is canonical), as we should not be opening and closing IWs during the
life of a Solr core.

Re-architect Update Handler
---

Key: SOLR-2193
URL: https://issues.apache.org/jira/browse/SOLR-2193
Project: Solr
Issue Type: Improvement
Reporter: Mark Miller
Assignee: Mark Miller
Fix For: 4.0

Attachments: SOLR-2193.patch, SOLR-2193.patch, SOLR-2193.patch,
SOLR-2193.patch, SOLR-2193.patch

[jira] [Commented] (SOLR-2193) Re-architect Update Handler

2011-05-30 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13041433#comment-13041433
]

Jason Rutherglen commented on SOLR-2193:

bq. Okay, -1 accepted. You win, good fight

Mark this was no fight, this is the open source Apache way.

Re-architect Update Handler
---

Key: SOLR-2193
URL: https://issues.apache.org/jira/browse/SOLR-2193
Project: Solr
Issue Type: Improvement
Reporter: Mark Miller
Fix For: 4.0

Attachments: SOLR-2193.patch, SOLR-2193.patch, SOLR-2193.patch,
SOLR-2193.patch, SOLR-2193.patch

[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-05-29 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13040845#comment-13040845
 ] 

Jason Rutherglen commented on LUCENE-2793:
--

I already posted a patch to this issue a while back, 
https://issues.apache.org/jira/secure/attachment/12468030/LUCENE-2793.patch  It 
seems we're looping here.

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Simon Willnauer
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793.patch, LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.
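A rough sketch of the proposed class, under the assumption that it carries only a read buffer size and a set of hints. `IOContextSketch`, `DirectorySketch`, and the buffer sizes are hypothetical and illustrative; this is not the API Lucene eventually shipped.

```java
import java.util.EnumSet;

// Hypothetical sketch of the proposal: one object carrying the read buffer size plus I/O hints.
// Not the IOContext API that Lucene eventually shipped.
final class IOContextSketch {
  enum Flag { DIRECT, SEQUENTIAL } // DIRECT = bypass the OS buffer cache

  // Buffer sizes are illustrative values only.
  static final IOContextSketch SEARCH = new IOContextSketch(1024, EnumSet.noneOf(Flag.class));
  static final IOContextSketch MERGE =
      new IOContextSketch(4096, EnumSet.of(Flag.DIRECT, Flag.SEQUENTIAL));

  final int readBufferSize;
  private final EnumSet<Flag> flags;

  IOContextSketch(int readBufferSize, EnumSet<Flag> flags) {
    this.readBufferSize = readBufferSize;
    this.flags = EnumSet.copyOf(flags); // defensive copy
  }

  boolean has(Flag flag) { return flags.contains(flag); }
}

// A directory could pick its I/O strategy per call instead of per Directory instance.
final class DirectorySketch {
  String describeOpen(String file, IOContextSketch ctx) {
    String mode = ctx.has(IOContextSketch.Flag.DIRECT) ? "direct" : "buffered";
    return file + ": " + mode + ", " + ctx.readBufferSize + "-byte buffer";
  }
}
```

Passing the context per `openInput`/`createOutput` call is what would let a direct-I/O directory use DIRECT only for merges while searches stay buffered.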

[jira] [Commented] (LUCENE-3112) Add IW.add/updateDocuments to support nested documents

2011-05-17 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13034734#comment-13034734
 ] 

Jason Rutherglen commented on LUCENE-3112:
--

Perhaps, like a Hadoop input format split, we can define metadata at the
segment level describing where the documents live, so that if one is 'splitting'
the index, as is being implemented with HBase, the 'splitter' can be 'smart'.
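One reading of this idea, sketched under the assumption that the per-segment metadata is simply the sorted list of docIDs at which each atomically added block (e.g., a parent with its nested children) starts. A splitter then snaps its cut point to a block boundary. `SegmentBlockMetadata` is hypothetical.

```java
import java.util.Arrays;

// Hypothetical per-segment metadata: ascending docIDs at which each atomically added block starts.
final class SegmentBlockMetadata {
  private final int[] blockStarts; // ascending, blockStarts[0] == 0
  private final int maxDoc;

  SegmentBlockMetadata(int[] blockStarts, int maxDoc) {
    this.blockStarts = blockStarts.clone();
    this.maxDoc = maxDoc;
  }

  // Returns the last block boundary at or before the requested docID,
  // so a split never separates documents belonging to one block.
  int safeSplitPoint(int desired) {
    int idx = Arrays.binarySearch(blockStarts, desired);
    if (idx >= 0) {
      return blockStarts[idx];
    }
    int insertion = -idx - 1; // index of the first block start greater than desired
    return blockStarts[Math.max(0, insertion - 1)];
  }

  // Split roughly in half without breaking a block.
  int midpointSplit() { return safeSplitPoint(maxDoc / 2); }
}
```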

 Add IW.add/updateDocuments to support nested documents
 --

 Key: LUCENE-3112
 URL: https://issues.apache.org/jira/browse/LUCENE-3112
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3112.patch


 I think nested documents (LUCENE-2454) is a very compelling addition
 to Lucene.  It's also a popular (many votes) issue.
 Beyond supporting nested document querying, which is already an
 incredible addition since it preserves the relational model on
 indexing normalized content (eg, DB tables, XML docs), LUCENE-2454
 should also enable speedups in grouping implementation when you group
 by a nested field.
 For the same reason, it can also enable very fast post-group facet
 counting impl (LUCENE-3097) when you want to
 count(distinct(nestedField)), instead of unique documents, as your
 identifier.  I expect many apps that use faceting need this ability
 (to count(distinct(nestedField)) not distinct(docID)).
 To support these use cases, I believe the only core change needed is
 the ability to atomically add or update multiple documents, which you
 cannot do today since in between add/updateDocument calls a flush (eg
 due to commit or getReader()) could occur.
 This new API (addDocuments(Iterable<Document>), updateDocuments(Term
 delTerm, Iterable<Document>)) would also further guarantee that the
 documents are assigned sequential docIDs in the order the iterator
 provided them, and that the docIDs all reside in one segment.
 Segment merging never splits segments apart, so this invariant would
 hold even as merges/optimizes take place.
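A toy model of the invariant being proposed, not Lucene's `IndexWriter`. Because `addDocuments` and `flush` take the same lock, a flush can never land between two documents of one block, and the block receives consecutive docIDs within one segment in iterator order. Documents are plain strings and `ToyWriter` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the proposed invariant; not Lucene's IndexWriter.
// Documents are plain strings and a "segment" is a list of them.
final class ToyWriter {
  private final List<String> buffer = new ArrayList<>();        // in-RAM segment being built
  private final List<List<String>> flushed = new ArrayList<>(); // flushed segments

  // All documents are added under one lock, so they receive consecutive docIDs
  // (positions in the buffer) in iterator order and end up in the same segment.
  synchronized int addDocuments(Iterable<String> docs) {
    int firstDocId = buffer.size();
    for (String doc : docs) {
      buffer.add(doc);
    }
    return firstDocId;
  }

  // flush() takes the same lock, so it can never run between two docs of one block.
  synchronized void flush() {
    if (!buffer.isEmpty()) {
      flushed.add(new ArrayList<>(buffer));
      buffer.clear();
    }
  }

  synchronized List<List<String>> segments() {
    return new ArrayList<>(flushed);
  }
}
```

An `updateDocuments(delTerm, docs)` variant would apply its delete inside the same critical section, which is what makes the update atomic from a commit's or reader's point of view.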

[jira] [Commented] (LUCENE-2956) Support updateDocument() with DWPTs

2011-04-14 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019823#comment-13019823
]

Jason Rutherglen commented on LUCENE-2956:
--

{quote}Jason, I think nothing prevents you from starting to work on this again.
Yet, I think we should freeze the branch now and only allow merging, bug fixes,
tests and documentation fixes until we land on trunk. Once we are there we can
freely push stuff in the branch again and make it work with seq. ids.
{quote}

OK, great. I remember now that our main concern was the memory usage of using
a short[] (for the seq ids) if the total number of documents is large (e.g.,
tens of millions). Also, at some point we'd have double the memory usage when we
roll over to the next set, until the previous readers are closed.
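A back-of-the-envelope check of that concern, assuming 10 million documents (the low end of "tens of millions") and one sequence id per document:

$$
\begin{aligned}
\texttt{short[]}: &\quad 10^{7} \times 2\ \text{bytes} = 2 \times 10^{7}\ \text{bytes} \approx 20\ \text{MB} \\
\text{during rollover (two arrays live)}: &\quad 2 \times 20\ \text{MB} = 40\ \text{MB} \\
\texttt{long[]}: &\quad 10^{7} \times 8\ \text{bytes} \approx 80\ \text{MB}\ (160\ \text{MB during rollover})
\end{aligned}
$$

A `short` also has only $2^{16} = 65{,}536$ distinct values, which is why wraparound becomes a concern with the smaller type.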

bq. I think we should freeze the branch now and only allow merging, bug fixes, tests and documentation fixes until we land on trunk

Maybe once LUCENE-2312 sequence ids work for deletes, we can look at creating a
separate branch that implements seq id deletes for all segments, and compare
it with the BV approach, e.g., on performance, memory usage, and simplicity.

Support updateDocument() with DWPTs
---

Key: LUCENE-2956
URL: https://issues.apache.org/jira/browse/LUCENE-2956
Project: Lucene - Java
Issue Type: Bug
Components: Index
Affects Versions: Realtime Branch
Reporter: Michael Busch
Assignee: Simon Willnauer
Priority: Minor
Fix For: Realtime Branch

Attachments: LUCENE-2956.patch, LUCENE-2956.patch

With separate DocumentsWriterPerThreads (DWPT) it can currently happen that
the delete part of an updateDocument() is flushed and committed separately
from the corresponding new document.
We need to make sure that updateDocument() is always an atomic operation from
an IW.commit() and IW.getReader() perspective. See LUCENE-2324 for more
details.

[jira] [Commented] (LUCENE-2956) Support updateDocument() with DWPTs

2011-04-13 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019370#comment-13019370
]

Jason Rutherglen commented on LUCENE-2956:
--

Simon, nice work. I agree with Michael B. that the deletes are super complex.
We had discussed using sequence ids for all segments (not just the RT-enabled
DWPT ones); however, we never worked out a specification, e.g., for things like
wraparound if a primitive short[] was used.

Shall we start again on LUCENE-2312? I think we still need/want to use
sequence ids there. The RT DWPTs shouldn't have so many documents that using a
long[] for the sequence ids is too RAM-consuming?
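A minimal sketch of sequence-id deletes, assuming each document stores the sequence id of its delete in a `long[]` (`Long.MAX_VALUE` while live) and each reader captures the current sequence id when it opens. This is one possible scheme for illustration, not the Realtime branch's code; `SeqIdDeletes` is hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch: deletedAtSeq[doc] holds the sequence id of the doc's delete
// (Long.MAX_VALUE while live); a reader remembers the sequence id current when it opened.
final class SeqIdDeletes {
  private final long[] deletedAtSeq;
  private long nextSeq = 0;

  SeqIdDeletes(int maxDoc) {
    deletedAtSeq = new long[maxDoc];
    Arrays.fill(deletedAtSeq, Long.MAX_VALUE);
  }

  // Records a delete and returns its sequence id; the earliest delete of a doc wins.
  synchronized long delete(int docId) {
    long seq = nextSeq++;
    if (deletedAtSeq[docId] == Long.MAX_VALUE) {
      deletedAtSeq[docId] = seq;
    }
    return seq;
  }

  // A reader opened now sees every delete whose sequence id is below this value.
  synchronized long snapshot() { return nextSeq; }

  // Point-in-time check against one shared array; no per-reader deleted-docs copy.
  synchronized boolean isDeleted(int docId, long readerSeq) {
    return deletedAtSeq[docId] < readerSeq;
  }
}
```

The trade-off versus the bit-vector approach is roughly 8 bytes per document held once, instead of bit vectors that may need cloning when new deletes arrive after a reader opens.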

Support updateDocument() with DWPTs
---

Attachments: LUCENE-2956.patch, LUCENE-2956.patch

[jira] [Commented] (LUCENE-2312) Search on IndexWriter's RAM Buffer

2011-04-13 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019391#comment-13019391
 ] 

Jason Rutherglen commented on LUCENE-2312:
--

In the current patch, I'm copying the parallel array for the end of a term's
postings on every reader [re]open. However, in the case where we're opening a reader
after each document is indexed, this is wasteful. We can simply queue the term
ids from the last indexed document, and only copy the newly updated values over
to the read-only, consistent parallel array.
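A sketch of the proposed optimization, under two stated assumptions: the per-term "end of postings" values live in an `int[]` indexed by term id, and readers tolerate seeing an end pointer newer than their own snapshot (e.g., because they also bound iteration by their own maxDoc). `PostingsEndPublisher` is hypothetical, and cross-thread visibility (e.g., a volatile write on publish) is omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: writerEnds[termId] is updated while indexing; readerEnds is what
// readers consult. On reopen only the term ids touched since the last reopen are copied.
// Assumes readers tolerate newer end pointers (e.g., they also bound iteration by their
// own maxDoc) and omits the memory barrier a real cross-thread publish would need.
final class PostingsEndPublisher {
  private final int[] writerEnds;
  private final int[] readerEnds;
  private final boolean[] dirty;
  private final List<Integer> dirtyTermIds = new ArrayList<>();

  PostingsEndPublisher(int numTerms) {
    writerEnds = new int[numTerms];
    readerEnds = new int[numTerms];
    dirty = new boolean[numTerms];
  }

  // Called while indexing when a posting is appended for termId.
  void recordPosting(int termId, int newEnd) {
    writerEnds[termId] = newEnd;
    if (!dirty[termId]) { // queue each term id at most once between reopens
      dirty[termId] = true;
      dirtyTermIds.add(termId);
    }
  }

  // Called on reader (re)open; cost is proportional to the number of touched terms.
  int publish() {
    for (int termId : dirtyTermIds) {
      readerEnds[termId] = writerEnds[termId];
      dirty[termId] = false;
    }
    int copied = dirtyTermIds.size();
    dirtyTermIds.clear();
    return copied;
  }

  int readerEnd(int termId) { return readerEnds[termId]; }
}
```

With a reopen after every document, `publish()` copies only that document's distinct terms rather than the whole array.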

 Search on IndexWriter's RAM Buffer
 --

 Key: LUCENE-2312
 URL: https://issues.apache.org/jira/browse/LUCENE-2312
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Search
Affects Versions: Realtime Branch
Reporter: Jason Rutherglen
Assignee: Michael Busch
 Fix For: Realtime Branch

 Attachments: LUCENE-2312-FC.patch, LUCENE-2312.patch


 In order to offer users near-realtime search without incurring
 an indexing performance penalty, we can implement search on
 IndexWriter's RAM buffer. This is the buffer that is filled in
 RAM as documents are indexed. Currently the RAM buffer is
 flushed to the underlying directory (usually disk) before being
 made searchable. 
 Today's Lucene-based NRT systems must incur the cost of merging
 segments, which can slow indexing. 
 Michael Busch has good suggestions regarding how to handle deletes using max 
 doc ids.  
 https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923
 The area that isn't fully fleshed out is the terms dictionary,
 which needs to be sorted prior to queries executing. Currently
 IW implements a specialized hash table. Michael B has a
 suggestion here: 
 https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915
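A naive sketch of the gap described above, not Michael B's suggestion: the RAM buffer keeps an unsorted hash from term to term id, and a sorted array of term ids is built when a reader opens. `RamTermsDictionary` is hypothetical; its O(n log n) cost per open is exactly what a better structure would try to avoid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of an unsorted, hash-based RAM terms table (term -> termId),
// plus the sorted view a query needs for seeks and range/prefix enumeration.
final class RamTermsDictionary {
  private final Map<String, Integer> termToId = new HashMap<>();
  private final List<String> idToTerm = new ArrayList<>();

  // Returns the existing id, or assigns the next id in insertion order.
  int addTerm(String term) {
    return termToId.computeIfAbsent(term, t -> {
      idToTerm.add(t);
      return idToTerm.size() - 1;
    });
  }

  // Term ids in term order, rebuilt per reader open in this sketch.
  // Uses String order; Lucene's UTF-8 byte order can differ for characters outside the BMP.
  int[] sortedTermIds() {
    Integer[] ids = new Integer[idToTerm.size()];
    for (int i = 0; i < ids.length; i++) {
      ids[i] = i;
    }
    Arrays.sort(ids, (a, b) -> idToTerm.get(a).compareTo(idToTerm.get(b)));
    int[] result = new int[ids.length];
    for (int i = 0; i < ids.length; i++) {
      result[i] = ids[i];
    }
    return result;
  }

  String term(int id) { return idToTerm.get(id); }
}
```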

[jira] [Commented] (LUCENE-2956) Support updateDocument() with DWPTs

2011-04-11 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13018645#comment-13018645
 ] 

Jason Rutherglen commented on LUCENE-2956:
--

I think I have an idea; however, can you explain the ticketQueue?

 Support updateDocument() with DWPTs
 ---

 Key: LUCENE-2956
 URL: https://issues.apache.org/jira/browse/LUCENE-2956
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: Realtime Branch
Reporter: Michael Busch
Assignee: Simon Willnauer
Priority: Minor
 Fix For: Realtime Branch

 Attachments: LUCENE-2956.patch


 With separate DocumentsWriterPerThreads (DWPT) it can currently happen that 
 the delete part of an updateDocument() is flushed and committed separately 
 from the corresponding new document.
 We need to make sure that updateDocument() is always an atomic operation from 
 an IW.commit() and IW.getReader() perspective.  See LUCENE-2324 for more 
 details.

[jira] [Commented] (LUCENE-2186) First cut at column-stride fields (index values storage)

2011-04-09 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13017886#comment-13017886
 ] 

Jason Rutherglen commented on LUCENE-2186:
--

bq. changing this to a random access seekable API should be not too hard

I think we could offer the option of MMap'ing the field caches, which would
help alleviate OOMs?
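A sketch of the mmap option, assuming a hypothetical file layout of one big-endian 4-byte int per document. Values are read with absolute `getInt` calls on a `MappedByteBuffer`, so they live in the OS page cache rather than on the Java heap. `MappedIntValues` is hypothetical, not a Lucene class.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch: fixed-width per-document ints stored in a file and read through a memory map.
// The file layout (one big-endian int per docID) is a hypothetical choice for illustration.
final class MappedIntValues {
  private final MappedByteBuffer buffer;

  private MappedIntValues(MappedByteBuffer buffer) { this.buffer = buffer; }

  static void write(Path file, int[] values) throws IOException {
    ByteBuffer bytes = ByteBuffer.allocate(values.length * Integer.BYTES);
    for (int v : values) {
      bytes.putInt(v);
    }
    bytes.flip();
    try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
        StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
      while (bytes.hasRemaining()) {
        ch.write(bytes);
      }
    }
  }

  static MappedIntValues open(Path file) throws IOException {
    try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
      // The mapping stays valid after the channel is closed.
      return new MappedIntValues(ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size()));
    }
  }

  // Random access by docID via an absolute read; no per-document objects on the heap.
  int get(int docId) { return buffer.getInt(docId * Integer.BYTES); }

  int maxDoc() { return buffer.capacity() / Integer.BYTES; }
}
```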

 First cut at column-stride fields (index values storage)
 

 Key: LUCENE-2186
 URL: https://issues.apache.org/jira/browse/LUCENE-2186
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Index
Reporter: Michael McCandless
Assignee: Simon Willnauer
 Fix For: CSF branch, 4.0

 Attachments: LUCENE-2186.patch, LUCENE-2186.patch, LUCENE-2186.patch, 
 LUCENE-2186.patch, LUCENE-2186.patch, mem.py


 I created an initial basic impl for storing index values (ie
 column-stride value storage).  This is still a work in progress... but
 the approach looks compelling.  I'm posting my current status/patch
 here to get feedback/iterate, etc.
 The code is standalone now, and lives under new package
 oal.index.values (plus some util changes, refactorings) -- I have yet
 to integrate into Lucene so eg you can mark that a given Field's value
 should be stored into the index values, sorting will use these values
 instead of field cache, etc.
 It handles 3 types of values:
   * Six variants of byte[] per doc, all combinations of fixed vs
 variable length, and stored either straight (good for eg a
 title field), deref (good when many docs share the same value,
 but you won't do any sorting) or sorted.
   * Integers (variable bit precision used as necessary, ie this can
 store byte/short/int/long, and all precisions in between)
   * Floats (4 or 8 byte precision)
 String fields are stored as the UTF8 byte[].  This patch adds a
 BytesRef, which does the same thing as flex's TermRef (we should merge
 them).
 This patch also adds basic initial impl of PackedInts (LUCENE-1990);
 we can swap that out if/when we get a better impl.
 This storage is dense (like field cache), so it's appropriate when the
 field occurs in all/most docs.  It's just like field cache, except the
 reading API is a get() method invocation, per document.
 Next step is to do basic integration with Lucene, and then compare
 sort performance of this vs field cache.
 For the sort by String value case, I think RAM usage & GC load of
 this index values API should be much better than field cache, since
 it does not create object per document (instead shares big long[] and
 byte[] across all docs), and because the values are stored in RAM as
 their UTF8 bytes.
 There are abstract Writer/Reader classes.  The current reader impls
 are entirely RAM resident (like field cache), but the API is (I think)
 agnostic, ie, one could make an MMAP impl instead.
 I think this is the first baby step towards LUCENE-1231.  Ie, it
 cannot yet update values, and the reading API is fully random-access
 by docID (like field cache), not like a posting list, though I
 do think we should add an iterator() api (to return flex's DocsEnum)
 -- eg I think this would be a good way to track avg doc/field length
 for BM25/lnu.ltc scoring.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2186) First cut at column-stride fields (index values storage)

2011-04-08 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13017679#comment-13017679
 ] 

Jason Rutherglen commented on LUCENE-2186:
--

I'm wondering whether anything prevents us from randomly accessing the doc 
values from the underlying Directory implementation, rather than loading all 
the values directly into the main heap space.  This seems doable; if so, let 
me know and I can provide a patch.
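A rough sketch of that idea using plain java.nio rather than Lucene's Directory API. MappedValuesSketch and its file layout (one big-endian long per doc) are assumptions for illustration. Values are written once, then read by docID through a memory-mapped view, so the OS page cache holds them instead of the Java heap.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch (plain NIO, not Lucene's Directory API).
final class MappedValuesSketch {
  /** Writes one fixed-width long per document, in docID order. */
  static void write(Path file, long[] valuesByDoc) throws IOException {
    ByteBuffer buf = ByteBuffer.allocate(valuesByDoc.length * Long.BYTES);
    for (long v : valuesByDoc) buf.putLong(v);
    buf.flip();
    try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
        StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
      while (buf.hasRemaining()) ch.write(buf);
    }
  }

  /** Maps the whole file read-only; nothing is copied onto the heap. */
  static MappedByteBuffer map(Path file) throws IOException {
    try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
      // A mapping stays valid after the channel that created it is closed.
      return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
    }
  }

  /** Random access by docID: offset = docId * 8 bytes. */
  static long get(MappedByteBuffer mapped, int docId) {
    return mapped.getLong(docId * Long.BYTES);
  }
}
```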

 First cut at column-stride fields (index values storage)
 

 Key: LUCENE-2186
 URL: https://issues.apache.org/jira/browse/LUCENE-2186
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Index
Reporter: Michael McCandless
Assignee: Simon Willnauer
 Fix For: CSF branch, 4.0

 Attachments: LUCENE-2186.patch, LUCENE-2186.patch, LUCENE-2186.patch, 
 LUCENE-2186.patch, LUCENE-2186.patch, mem.py



[jira] [Commented] (LUCENE-2956) Support updateDocument() with DWPTs

2011-04-07 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13017084#comment-13017084
 ] 

Jason Rutherglen commented on LUCENE-2956:
--

What is the status of this one?  If no one's working on it, I can take a stab.

 Support updateDocument() with DWPTs
 ---

 Key: LUCENE-2956
 URL: https://issues.apache.org/jira/browse/LUCENE-2956
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: Realtime Branch
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch


 With separate DocumentsWriterPerThreads (DWPT) it can currently happen that 
 the delete part of an updateDocument() is flushed and committed separately 
 from the corresponding new document.
 We need to make sure that updateDocument() is always an atomic operation from 
 an IW.commit() and IW.getReader() perspective.  See LUCENE-2324 for more 
 details.
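A simplified single-buffer model of the requirement, with invented names (not the DWPT code). The delete term and the replacement doc are buffered under one lock, and flush snapshots both under the same lock, so a flush never contains only one half of an update. It ignores how the delete must also reach docs in other DWPTs and in already-flushed segments, and it does not apply deletes at all.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: an update's delete-by-term and its replacement doc
// are buffered together, and a flush takes both lists atomically.
final class AtomicUpdateBuffer {
  /** What one flush hands off: deletes and docs captured at the same instant. */
  static final class Flushed {
    final List<String> deleteTerms;
    final List<String> docs;

    Flushed(List<String> deleteTerms, List<String> docs) {
      this.deleteTerms = deleteTerms;
      this.docs = docs;
    }
  }

  private List<String> pendingDeletes = new ArrayList<>();
  private List<String> pendingDocs = new ArrayList<>();

  /** Both halves of the update are recorded under the same lock. */
  synchronized void updateDocument(String idTerm, String doc) {
    pendingDeletes.add(idTerm);
    pendingDocs.add(doc);
  }

  /** Swaps out both lists under the same lock, so no update is split. */
  synchronized Flushed flush() {
    Flushed out = new Flushed(pendingDeletes, pendingDocs);
    pendingDeletes = new ArrayList<>();
    pendingDocs = new ArrayList<>();
    return out;
  }
}
```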


[jira] [Commented] (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks

2011-03-31 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014016#comment-13014016
]

Jason Rutherglen commented on LUCENE-2573:
--

bq. influenced due to the fact that flushing is very very CPU intensive

Do you think this is due mostly to the vint decoding? We're not interleaving
postings on flush with this patch so the CPU consumption should be somewhat
lower.

bq. At the same time CMS might kick in way more often since we are writing more
segments which are also smaller compared to trunk

This is probably the more likely case. In general, we may be able to default to
a higher overall RAM buffer size, and perhaps there won't be degradation in
indexing performance like there is with trunk? In the future with RT we could
get fancy and selectively merge segments as we're flushing, if writing larger
segments is important.

I'd personally prefer to write out 1-2 GB segments, and limit the number of
DWPTs to 2-3, mainly for servers that are concurrently indexing and searching
(eg, the RT use case). I think the current default number of thread states is
a bit high.

Tiered flushing of DWPTs by RAM with low/high water marks
-

Key: LUCENE-2573
URL: https://issues.apache.org/jira/browse/LUCENE-2573
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Reporter: Michael Busch
Assignee: Simon Willnauer
Priority: Minor
Fix For: Realtime Branch

Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch,
LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch,
LUCENE-2573.patch, LUCENE-2573.patch

Now that we have DocumentsWriterPerThreads we need to track total consumed
RAM across all DWPTs.
A flushing strategy idea that was discussed in LUCENE-2324 was to use a
tiered approach:
- Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
- Flush all DWPTs at a high water mark (e.g. at 110%)
- Use linear steps in between high and low watermark: E.g. when 5 DWPTs are
used, flush at 90%, 95%, 100%, 105% and 110%.
Should we allow the user to configure the low and high water mark values
explicitly using total values (e.g. low water mark at 120MB, high water mark
at 140MB)? Or should we, for simplicity, keep the single setRAMBufferSizeMB()
config method and use something like 90% and 110% for the water marks?
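The linear steps can be computed as below. This is a hypothetical helper that assumes the 90%/110% defaults from the description, expressed as fractions of the single setRAMBufferSizeMB() value. With 5 DWPTs and a 100 MB buffer it yields 90, 95, 100, 105 and 110 MB.

```java
// Hypothetical helper: spread flush triggers linearly between the low and
// high water marks, given as fractions of the configured RAM buffer.
final class TieredFlushSketch {
  static double[] flushThresholds(double ramBufferMB, double lowFraction,
                                  double highFraction, int numDwpts) {
    double[] thresholds = new double[numDwpts];
    for (int i = 0; i < numDwpts; i++) {
      // With one DWPT there is no range to spread over: use the low mark.
      double fraction = numDwpts == 1
          ? lowFraction
          : lowFraction + i * (highFraction - lowFraction) / (numDwpts - 1);
      thresholds[i] = ramBufferMB * fraction;
    }
    return thresholds;
  }

  /** How many DWPTs should be flushing once total RAM reaches usedMB. */
  static int dwptsToFlush(double usedMB, double[] thresholds) {
    int n = 0;
    for (double t : thresholds) {
      if (usedMB >= t) n++;
    }
    return n;
  }
}
```

For example, at 97 MB total, dwptsToFlush() reports that 2 DWPTs (the 90 MB and 95 MB steps) should be flushing.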


[jira] [Commented] (LUCENE-3003) Move UnInvertedField into Lucene core

2011-03-29 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13012659#comment-13012659
]

Jason Rutherglen commented on LUCENE-3003:
--

{quote}Eventually we should fold this ability into docvalues, ie we'd
write the byte[] image at indexing time, and then loading would be
fast, instead of uninverting{quote}

I'd guess that pulsing should be 'good enough' most of the time? It seems like
there'll be some overlap in terms of the gains from pulsing vis-à-vis
DocValues?

Move UnInvertedField into Lucene core
-

Key: LUCENE-3003
URL: https://issues.apache.org/jira/browse/LUCENE-3003
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
Fix For: 3.2, 4.0

Attachments: LUCENE-3003.patch

Solr's UnInvertedField lets you quickly look up all term ords for a
given doc/field.
Like FieldCache, it un-inverts the index to produce this, and creates a
RAM-resident data structure holding the bits; but, unlike FieldCache,
it can handle multiple values per doc, and it does not hold the term
bytes in RAM. Rather, it holds only term ords, and then uses
TermsEnum to resolve ord -> term.
This is great eg for faceting, where you want to use int ords for all
of your counting, and then only at the end you need to resolve the
top N ords to their text.
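A toy sketch of ord-based facet counting. It is not Solr's UnInvertedField code; the class name and the String[] term table standing in for TermsEnum are assumptions. Each doc carries the int ords of its terms, counting only touches ints, and ords are turned back into text only for the top N.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of ord-based faceting over multi-valued docs.
final class OrdFacetSketch {
  static List<String> topTerms(int[][] ordsByDoc, int[] matchingDocs,
                               String[] sortedTerms, int topN) {
    // Counting phase: only int ords, no term bytes.
    int[] counts = new int[sortedTerms.length];
    for (int doc : matchingDocs) {
      for (int ord : ordsByDoc[doc]) counts[ord]++;
    }
    // Selection phase: repeated max scan for brevity; ties go to the lower ord.
    List<String> result = new ArrayList<>();
    boolean[] taken = new boolean[counts.length];
    for (int k = 0; k < topN; k++) {
      int best = -1;
      for (int ord = 0; ord < counts.length; ord++) {
        if (!taken[ord] && counts[ord] > 0
            && (best < 0 || counts[ord] > counts[best])) {
          best = ord;
        }
      }
      if (best < 0) break;
      taken[best] = true;
      // Resolve ord -> term only for the winners.
      result.add(sortedTerms[best] + "=" + counts[best]);
    }
    return result;
  }
}
```

The selection loop rescans all counts per slot to keep the sketch short; a bounded priority queue would be the usual choice.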
I think this is a useful core functionality, and we should move most
of it into Lucene's core. It's a good complement to FieldCache. For
this first baby step, I just move it into core and refactor Solr's
usage of it.
After this, as separate issues, I think there are some things we could
explore/improve:
* The first-pass that allocates lots of tiny byte[] looks like it
could be inefficient. Maybe we could use the byte slices from the
indexer for this...
* We can improve the RAM efficiency of the TermIndex: if the codec
supports ords, and we are operating on one segment, we should just
use it. If not, we can use a more RAM-efficient data structure,
eg an FST mapping to the ord.
* We may be able to improve on the main byte[] representation by
using packed ints instead of delta-vInt?
* Eventually we should fold this ability into docvalues, ie we'd
write the byte[] image at indexing time, and then loading would be
fast, instead of uninverting
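For comparison with packed ints, this is roughly what delta-vInt coding of one doc's sorted ords looks like (hypothetical helper, not the UnInvertedField code): store the gap to the previous ord, 7 bits per byte, with the high bit meaning "more bytes follow".

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch of delta-vInt coding for one doc's sorted term ords.
final class DeltaVIntSketch {
  static byte[] encode(int[] sortedOrds) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int prev = 0;
    for (int ord : sortedOrds) {
      int delta = ord - prev; // non-negative because ords are sorted
      prev = ord;
      while ((delta & ~0x7F) != 0) {
        out.write((delta & 0x7F) | 0x80); // low 7 bits, continuation bit set
        delta >>>= 7;
      }
      out.write(delta); // last byte, continuation bit clear
    }
    return out.toByteArray();
  }

  static int[] decode(byte[] bytes, int count) {
    int[] ords = new int[count];
    int pos = 0;
    int prev = 0;
    for (int i = 0; i < count; i++) {
      int delta = 0;
      int shift = 0;
      byte b;
      do {
        b = bytes[pos++];
        delta |= (b & 0x7F) << shift;
        shift += 7;
      } while ((b & 0x80) != 0);
      prev += delta;
      ords[i] = prev;
    }
    return ords;
  }
}
```

Ords {3, 5, 300} become gaps {3, 2, 295} and take 4 bytes. The encoding is compact for small gaps, but decoding is sequential; packed ints trade some space for fixed-width, random access.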


[jira] [Commented] (LUCENE-3003) Move UnInvertedField into Lucene core

2011-03-29 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13012687#comment-13012687
]

Jason Rutherglen commented on LUCENE-3003:
--

bq. Ie Pulsing is good for terms that have only 1 or 2 docs

I thought the default is 16 docs? If there are more, then seeking to the
postings should be negligible (in comparison to a larger aggregate index size
when using CSF/DocValues, which'll consume more of the system IO cache)?

Move UnInvertedField into Lucene core
-

Attachments: LUCENE-3003.patch


[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2011-03-11 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13005631#comment-13005631
]

Jason Rutherglen commented on LUCENE-2324:
--

bq. I think making a different data structure to hold low-DF terms would
actually be a big boost in RAM efficiency. The RAM-per-unique-term is fairly
high...

However, we're not sure why a largish (1+ GB) RAM buffer seems to slow things
down. If we're round-robin indexing against the DWPTs, I think they'll have a
similar number of unique terms as today, even though each DWPT will be smaller
in total size, since each contains only 1/Nth of the docs.

Per thread DocumentsWriters that write their own private segments
-

Key: LUCENE-2324
URL: https://issues.apache.org/jira/browse/LUCENE-2324
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
Fix For: Realtime Branch

Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch,
LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch,
LUCENE-2324.patch, LUCENE-2324.patch, LUCENE-2324.patch, LUCENE-2324.patch,
lucene-2324.patch, lucene-2324.patch, test.out, test.out, test.out, test.out

See LUCENE-2293 for motivation and more details.
I'm copying here Mike's summary he posted on 2293:
Change the approach for how we buffer in RAM to a more isolated
approach, whereby IW has N fully independent RAM segments
in-process and when a doc needs to be indexed it's added to one of
them. Each segment would also write its own doc stores and
normal segment merging (not the inefficient merge we now do on
flush) would merge them. This should be a good simplification in
the chain (eg maybe we can remove the *PerThread classes). The
segments can flush independently, letting us make much better
concurrent use of IO & CPU.
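A toy model of the N-independent-buffers approach, with hypothetical names and plain String docs standing in for indexed documents. Each doc lands in exactly one buffer (round-robin here), and each buffer can flush on its own into a separate segment without blocking the others.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: N private in-RAM buffers, each flushed independently.
final class IndependentBuffersSketch {
  private final List<List<String>> buffers = new ArrayList<>();
  private final AtomicInteger next = new AtomicInteger();

  IndependentBuffersSketch(int n) {
    for (int i = 0; i < n; i++) buffers.add(new ArrayList<>());
  }

  void addDocument(String doc) {
    int slot = Math.floorMod(next.getAndIncrement(), buffers.size());
    List<String> buffer = buffers.get(slot);
    synchronized (buffer) { // locks only this buffer, not all of them
      buffer.add(doc);
    }
  }

  /** Flushes one buffer into a "segment"; the others keep accepting docs. */
  List<String> flush(int slot) {
    List<String> buffer = buffers.get(slot);
    synchronized (buffer) {
      List<String> segment = new ArrayList<>(buffer);
      buffer.clear();
      return segment;
    }
  }
}
```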


[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2011-03-11 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13005651#comment-13005651
 ] 

Jason Rutherglen commented on LUCENE-2324:
--

{quote}Ie, if a given term X occurs in 6 DWPTs (today) then we merge-sort the 
docIDs from the postings of that term, which is costly. (The normal merge 
that will merge these DWPTs after this issue lands just appends by 
docIDs).{quote}

Right, this is the same principal motivation behind implementing DWPTs for use 
with realtime search, eg, the doc-id interleaving is too expensive to be 
performed at query time.
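The difference can be sketched on plain int[] postings (hypothetical helpers, not Lucene code). With private segments, combining a term's postings is an append plus a docBase offset. With a shared docID space, it needs a merge-sort that compares the heads of every buffer for each posting.

```java
// Hypothetical sketch contrasting two ways to combine one term's postings.
final class PostingsCombineSketch {
  /** Private segments: shift each segment's docIDs by its docBase and append. */
  static int[] append(int[][] postingsPerSegment, int[] maxDocPerSegment) {
    int total = 0;
    for (int[] p : postingsPerSegment) total += p.length;
    int[] out = new int[total];
    int pos = 0;
    int docBase = 0;
    for (int s = 0; s < postingsPerSegment.length; s++) {
      for (int doc : postingsPerSegment[s]) out[pos++] = docBase + doc;
      docBase += maxDocPerSegment[s];
    }
    return out; // already sorted: segments own disjoint docID ranges
  }

  /** Shared docID space: k-way merge of sorted lists, scanning all heads. */
  static int[] mergeSort(int[][] sortedPostings) {
    int total = 0;
    for (int[] p : sortedPostings) total += p.length;
    int[] out = new int[total];
    int[] heads = new int[sortedPostings.length];
    for (int pos = 0; pos < total; pos++) {
      int best = -1;
      for (int s = 0; s < sortedPostings.length; s++) {
        if (heads[s] < sortedPostings[s].length
            && (best < 0 || sortedPostings[s][heads[s]] < sortedPostings[best][heads[best]])) {
          best = s;
        }
      }
      out[pos] = sortedPostings[best][heads[best]++];
    }
    return out;
  }
}
```

The append does constant work per posting, while this merge does work proportional to the number of buffers per posting (a heap would reduce that to a log factor, but it is still a comparison per posting).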


[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2011-03-10 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13005159#comment-13005159
 ] 

Jason Rutherglen commented on LUCENE-2324:
--

Is the max optimal DWPT size related to the size of the terms hash, or is it 
likely something else?


[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2011-03-10 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13005177#comment-13005177
 ] 

Jason Rutherglen commented on LUCENE-2324:
--

{quote}Because 1) the RAM efficiency ought to scale up very well, as you see a 
given term in more and more docs (hmm, though, maybe not, because from Zipf's 
law, half your terms will be singletons no matter how many docs you index), and 
2) less merging is required.{quote}

I'm not sure how we handled concurrency on the terms hash before; however, with 
DWPTs there won't be contention regardless.  It'd be nice if we could build 1-2 
GB segments in RAM; I think that would greatly reduce the number of merges that 
are required downstream.  Eg, then there's less need for merging by size, and 
most merges would be caused by the number/percentage of deletes.  If it turns 
out the low DF terms are causing the slowdown, maybe there is a different 
hashing system that could be used.


[jira] Updated: (LUCENE-2919) IndexSplitter that divides by primary key term

2011-03-03 Thread Jason Rutherglen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-2919:
-

Attachment: LUCENE-2919.patch

First cut.  Roughly divides an index by the exclusive mid term given.  

 IndexSplitter that divides by primary key term
 --

 Key: LUCENE-2919
 URL: https://issues.apache.org/jira/browse/LUCENE-2919
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Jason Rutherglen
Priority: Minor
 Attachments: LUCENE-2919.patch


 Index splitter that divides by primary key term.  The contrib 
 MultiPassIndexSplitter we have divides by docid; however, to guarantee 
 external constraints it's sometimes necessary to split by a primary key term 
 id.  I think this implementation is a fairly trivial change.
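The partitioning rule itself is simple. Here it is over plain String keys with a hypothetical helper (no Lucene APIs). It assumes keys that sort strictly before the mid term go to the first index, and the rest, including the mid term itself, go to the second. String.compareTo compares UTF-16 units, which agrees with UTF-8 byte order only for text without supplementary characters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of splitting by primary key term around a mid term.
final class KeySplitSketch {
  static List<List<String>> split(List<String> primaryKeys, String midTerm) {
    List<String> lower = new ArrayList<>();
    List<String> upper = new ArrayList<>();
    for (String key : primaryKeys) {
      // Assumption: midTerm is excluded from the lower part.
      if (key.compareTo(midTerm) < 0) {
        lower.add(key);
      } else {
        upper.add(key);
      }
    }
    List<List<String>> parts = new ArrayList<>();
    parts.add(lower);
    parts.add(upper);
    return parts;
  }
}
```

Unlike a docid split, every key always lands on the same side of the mid term, regardless of the docIDs the keys happen to have.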

