Re: [Announce] Solr 3.3 with RankingAlgorithm NRT capability, very high performance 10000 tps

2011-07-19 Thread Andy
Nagendra,

In another email you mentioned there's a problem where if an existing document 
is updated both the old and new version will show up in search results.

Has that been solved in Solr-RA 3.3?

--- On Mon, 7/18/11, Nagendra Nagarajayya nnagaraja...@transaxtions.com wrote:

 From: Nagendra Nagarajayya nnagaraja...@transaxtions.com
 Subject: [Announce] Solr 3.3 with RankingAlgorithm NRT capability, very high 
 performance 10000 tps
 To: solr-user@lucene.apache.org
 Date: Monday, July 18, 2011, 10:43 AM
 Hi!
 
 I would like to announce the availability of Solr 3.3 with
 RankingAlgorithm and Near Real Time (NRT) search capability
 now. The NRT performance is very high, 10,000 documents/sec
 with the MBArtists 390k index. The NRT functionality allows
 you to add documents without the IndexSearchers being closed
 or caches being cleared. A commit is also not needed with
 the document update. Searches can run concurrently with
 document updates. No changes are needed except for enabling
 the NRT through solrconfig.xml.
 
 RankingAlgorithm query performance is now 3x faster
 than before and is exposed as the Lucene API. This release
 also adds support for making the last document with a given
 unique id searchable and visible in search results when the
 document is updated multiple times.
 
 I have a wiki page that describes NRT performance in detail
 and can be accessed from here:
 
 http://solr-ra.tgels.org/wiki/en/Near_Real_Time_Search_ver3.x
 
 You can download Solr 3.3 with RankingAlgorithm (NRT
 version) from here:
 
 http://solr-ra.tgels.org
 
 I would like to invite you to give this version a try as
 the performance is very high.
 
 Regards,
 
 - Nagendra Nagarajayya
 http://solr-ra.tgels.org
 http://rankingalgorithm.tgels.org
 
 
 
 


Re: [Announce] Solr 3.3 with RankingAlgorithm NRT capability, very high performance 10000 tps

2011-07-19 Thread Nagendra Nagarajayya
Yes, this problem has been solved, though not completely; there is still 
a refresh problem. To eliminate duplicate documents with the same unique id 
during an update, you need to set 
<maxBufferedDeleteTerms>1</maxBufferedDeleteTerms>. This makes the most 
recently updated document searchable and removes the older documents. 
There is a catch, though: if some of the fields in a document differ 
between versions, older content might show up in the results even though 
the query matches the most recent document content. For example, if the 
old doc was <doc><afield>xyz</afield></doc> and it is updated to 
<doc><afield>abc</afield></doc>, then at query time q=afield:abc matches, 
but the results may show <doc><afield>xyz</afield></doc>. I am still 
researching this.
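
To illustrate why applying deletes immediately matters, here is a minimal toy 
model in Python. It is not Solr, Lucene, or Solr-RA code: the ToyIndex class, 
its methods, and the max_buffered_delete_terms parameter are hypothetical names 
chosen for this sketch, and it assumes an update is handled as "delete older 
docs with this id, then add the new doc", with the delete held in a buffer until 
the buffer reaches a threshold.

```python
class ToyIndex:
    """Toy append-only index with buffered deletes (illustration only)."""

    def __init__(self, max_buffered_delete_terms):
        self.max_buffered_delete_terms = max_buffered_delete_terms
        self.docs = []             # (sequence number, doc) pairs, append-only
        self.pending_deletes = []  # (unique id, sequence limit) not yet applied
        self.next_seq = 0

    def update(self, doc):
        # Buffer a delete for older docs with the same id, then add the new doc.
        self.pending_deletes.append((doc["id"], self.next_seq))
        self.docs.append((self.next_seq, doc))
        self.next_seq += 1
        # Apply buffered deletes once the threshold is reached.
        if len(self.pending_deletes) >= self.max_buffered_delete_terms:
            self.apply_deletes()

    def apply_deletes(self):
        # Remove every doc that is older than the delete that targets its id.
        for doc_id, limit in self.pending_deletes:
            self.docs = [(seq, d) for seq, d in self.docs
                         if not (d["id"] == doc_id and seq < limit)]
        self.pending_deletes = []

    def search(self, field, value):
        # Searches see all docs currently in the index, deleted or not yet.
        return [d for _, d in self.docs if d.get(field) == value]


# Large threshold: the delete stays buffered, so both versions are visible.
lazy = ToyIndex(max_buffered_delete_terms=1000)
lazy.update({"id": "1", "afield": "xyz"})
lazy.update({"id": "1", "afield": "abc"})
lazy_hits = lazy.search("id", "1")    # two docs: the old and the new version

# Threshold of 1: each delete is applied right away, only the newest remains.
eager = ToyIndex(max_buffered_delete_terms=1)
eager.update({"id": "1", "afield": "xyz"})
eager.update({"id": "1", "afield": "abc"})
eager_hits = eager.search("id", "1")  # one doc: {"id": "1", "afield": "abc"}
```

The toy model only covers the duplicate-document part of the problem. It does 
not reproduce the stale field content described above, where the query matches 
the new version but the old stored fields are returned.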


You can get more information about the performance and known issues here:
http://solr-ra.tgels.org/wiki/en/Near_Real_Time_Search_ver_3.x

Regards,

- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org



On 7/19/2011 1:21 AM, Andy wrote:

Nagendra,

In another email you mentioned there's a problem where if an existing document 
is updated both the old and new version will show up in search results.

Has that been solved in Solr-RA 3.3?


[Announce] Solr 3.3 with RankingAlgorithm NRT capability, very high performance 10000 tps

2011-07-18 Thread Nagendra Nagarajayya

Hi!

I would like to announce the availability of Solr 3.3 with 
RankingAlgorithm and Near Real Time (NRT) search capability now. The NRT 
performance is very high, 10,000 documents/sec with the MBArtists 390k 
index. The NRT functionality allows you to add documents without the 
IndexSearchers being closed or caches being cleared. A commit is also 
not needed with the document update. Searches can run concurrently with 
document updates. No changes are needed except for enabling the NRT 
through solrconfig.xml.


RankingAlgorithm query performance is now 3x faster than before 
and is exposed as the Lucene API. This release also adds support for 
making the last document with a given unique id searchable and visible 
in search results when the document is updated multiple times.


I have a wiki page that describes NRT performance in detail and can be 
accessed from here:


http://solr-ra.tgels.org/wiki/en/Near_Real_Time_Search_ver3.x

You can download Solr 3.3 with RankingAlgorithm (NRT version) from here:

http://solr-ra.tgels.org

I would like to invite you to give this version a try as the performance 
is very high.


Regards,

- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org





Re: [Announce] Solr 3.3 with RankingAlgorithm NRT capability, very high performance 10000 tps

2011-07-18 Thread Nagendra Nagarajayya
Thanks Mark! I made the earlier implementation of NRT with 1.4.1 
available to Solr through a JIRA issue:


 https://issues.apache.org/jira/browse/SOLR-2568
(I made the implementation details available in a paper 
published at 
http://solr-ra.tgels.com/papers/NRT_Solr_RankingAlgorithm.pdf, which 
includes the source, modifications, etc.)


I plan to make the current implementation of NRT with Solr 
3.2/3.3 and RankingAlgorithm available as a patch. This implementation has 
very high performance (10,000 docs/sec) and, on my system, is in fact faster 
than the normal update/commit.
There are some unresolved issues around when to invalidate/update 
the cache, and this does not seem to be an easy problem.


Regarding the Lucene list, I thought both Solr and Lucene were now 
shared projects. I can add a message to my emails to make it clear that 
Solr with RankingAlgorithm is an external implementation. I also plan to 
file an RFE to add plugin/API support in Solr for external text search 
libraries.


- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org


On 7/18/2011 9:45 AM, Mark Miller wrote:

Hey Nagendra - I don't mind seeing these external project announcements here 
(though you might keep Solr-related announcements off the Lucene user list), but 
please word them so that users are not misled into thinking this is an 
Apache release, and so it is clear that it is an external project built on top 
of Apache Solr.

Thanks,

- Mark

- Mark Miller
lucidimagination.com