filtering non english text from my results

2011-08-14 Thread Omri Cohen
Hi All,

I am looking for a solution to filter out text which contains non english
words. Where my goal is to present my english speaking users with results in
their language.

any ideas?

thanks
Omri
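
One possible approach, sketched here only as an illustration and not something discussed further in this thread: run a language detector over each document before indexing (or before returning results) and keep only the English ones. The snippet below uses Apache Tika's LanguageIdentifier; the class and method names are as I recall them for Tika of that era, so verify them against your Tika version.

import org.apache.tika.language.LanguageIdentifier;

public class EnglishOnlyFilter {

    // Returns true when Tika identifies the text as English with reasonable
    // certainty; callers would skip indexing (or drop from results) otherwise.
    public static boolean isEnglish(String text) {
        LanguageIdentifier id = new LanguageIdentifier(text);
        return "en".equals(id.getLanguage()) && id.isReasonablyCertain();
    }
}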


Re: Can Master push data to slave

2011-08-14 Thread Pawan Darira
Regarding point b, I mean that when the Slave server does a replication from
the Master, it creates a lock-file in its index directory. How to avoid that?


On Tue, Aug 9, 2011 at 2:56 AM, Markus Jelsma wrote:

> Hi,
>
> > Hi
> >
> > I am using Solr 1.4. and doing a replication process where my slave is
> > pulling data from Master. I have 2 questions
> >
> > a. Can Master push data to slave
>
> Not in current versions. Not sure about exotic patches for this.
>
> > b. How to make sure that lock file is not created while replication
>
> What do you mean?
>
> >
> > Please help
> >
> > thanks
> > Pawan
>


Re: exceeded limit of maxWarmingSearchers ERROR

2011-08-14 Thread Nagendra Nagarajayya

Bill:

I did look at Mark's performance tests. Looks very interesting.

Here is the Apache Solr 3.3 with RankingAlgorithm NRT performance:
http://solr-ra.tgels.com/wiki/en/Near_Real_Time_Search_ver_3.x

Regards

- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org



On 8/14/2011 7:47 PM, Bill Bell wrote:

I understand.

Have you looked at Mark's patch? From his performance tests, it looks
pretty good.

When would RA work better?

Bill


On 8/14/11 8:40 PM, "Nagendra Nagarajayya"
wrote:


Bill:

The technical details of the NRT implementation in Apache Solr with
RankingAlgorithm (SOLR-RA) is available here:

http://solr-ra.tgels.com/papers/NRT_Solr_RankingAlgorithm.pdf

(Some changes for Solr 3.x, but for most it is as above)

Regarding support for 4.0 trunk, should happen sometime soon.

Regards

- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org





On 8/14/2011 7:11 PM, Bill Bell wrote:

OK,

I'll ask the elephant in the room…

What is the difference between the new UpdateHandler from Mark and the
SOLR-RA?

The UpdateHandler works with 4.0; does SOLR-RA work with 4.0 trunk?

Pros/Cons?


On 8/14/11 8:10 PM, "Nagendra
Nagarajayya"
wrote:


Naveen:

NRT with Apache Solr 3.3 and RankingAlgorithm does not need a commit for a
document to become searchable. Any document that you add through update
becomes  immediately searchable. So no need to commit from within your
update client code.  Since there is no commit, the cache does not have
to be cleared or the old searchers closed or  new searchers opened, and
warmed (error that you are facing).

Regards

- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org



On 8/14/2011 10:37 AM, Naveen Gupta wrote:

Hi Mark/Erick/Nagendra,

I was not very confident about NRT at that point of time, when we
started
project almost 1 year ago, definitely i would try NRT and see the
performance.

The current requirement was working fine till we were using
commitWithin 10
millisecs in the XMLDocument which we were posting to SOLR.

But due to which, we were getting very poor performance (almost 3 mins
for
15,000 docs) per user. There are many parallel users committing to our
SOLR.

So we removed the commitWithin, and hence performance was much much
better.

But then we are getting this maxWarmingSearcher Error, because we are
committing separately as a curl request after once entire doc is
submitted
for indexing.

The question here is what is difference between commitWithin and
commit
(apart from the fact that commit takes memory and processes and
additional
hardware usage)

Why we want it to be visible as soon as possible, since we are
applying
many
business rules on top of the results (older indexes as well as new
one)
and
apply different filters.

upto 5 mins is fine for us. but more than that we need to think then
other
optimizations.

We will definitely try NRT. But please tell me other options which we
can
apply in order to optimize.?

Thanks
Naveen


On Sun, Aug 14, 2011 at 9:42 PM, Erick
Erickson wrote:


Ah, thanks, Mark... I must have been looking at the wrong JIRAs.

Erick

On Sun, Aug 14, 2011 at 10:02 AM, Mark Miller
wrote:

On Aug 14, 2011, at 9:03 AM, Erick Erickson wrote:


You either have to go to near real time (NRT), which is under
development, but not committed to trunk yet

NRT support is committed to trunk.

- Mark Miller
lucidimagination.com




















Re: exceeded limit of maxWarmingSearchers ERROR

2011-08-14 Thread Bill Bell
I understand.

Have you looked at Mark's patch? From his performance tests, it looks
pretty good.

When would RA work better?

Bill


On 8/14/11 8:40 PM, "Nagendra Nagarajayya" 
wrote:

>Bill:
>
>The technical details of the NRT implementation in Apache Solr with
>RankingAlgorithm (SOLR-RA) is available here:
>
>http://solr-ra.tgels.com/papers/NRT_Solr_RankingAlgorithm.pdf
>
>(Some changes for Solr 3.x, but for most it is as above)
>
>Regarding support for 4.0 trunk, should happen sometime soon.
>
>Regards
>
>- Nagendra Nagarajayya
>http://solr-ra.tgels.org
>http://rankingalgorithm.tgels.org
>
>
>
>
>
>On 8/14/2011 7:11 PM, Bill Bell wrote:
>> OK,
>>
>> I'll ask the elephant in the room…
>>
>> What is the difference between the new UpdateHandler from Mark and the
>> SOLR-RA?
>>
>> The UpdateHandler works with 4.0; does SOLR-RA work with 4.0 trunk?
>>
>> Pros/Cons?
>>
>>
>> On 8/14/11 8:10 PM, "Nagendra
>>Nagarajayya"
>> wrote:
>>
>>> Naveen:
>>>
>>> NRT with Apache Solr 3.3 and RankingAlgorithm does not need a commit for a
>>> document to become searchable. Any document that you add through update
>>> becomes  immediately searchable. So no need to commit from within your
>>> update client code.  Since there is no commit, the cache does not have
>>> to be cleared or the old searchers closed or  new searchers opened, and
>>> warmed (error that you are facing).
>>>
>>> Regards
>>>
>>> - Nagendra Nagarajayya
>>> http://solr-ra.tgels.org
>>> http://rankingalgorithm.tgels.org
>>>
>>>
>>>
>>> On 8/14/2011 10:37 AM, Naveen Gupta wrote:
 Hi Mark/Erick/Nagendra,

 I was not very confident about NRT at that point of time, when we
 started
 project almost 1 year ago, definitely i would try NRT and see the
 performance.

 The current requirement was working fine till we were using
 commitWithin 10
 millisecs in the XMLDocument which we were posting to SOLR.

 But due to which, we were getting very poor performance (almost 3 mins
 for
 15,000 docs) per user. There are many parallel users committing to our
 SOLR.

 So we removed the commitWithin, and hence performance was much much
 better.

 But then we are getting this maxWarmingSearcher Error, because we are
 committing separately as a curl request after once entire doc is
 submitted
 for indexing.

 The question here is what is difference between commitWithin and
commit
 (apart from the fact that commit takes memory and processes and
 additional
 hardware usage)

 Why we want it to be visible as soon as possible, since we are
applying
 many
 business rules on top of the results (older indexes as well as new
one)
 and
 apply different filters.

 upto 5 mins is fine for us. but more than that we need to think then
 other
 optimizations.

 We will definitely try NRT. But please tell me other options which we
 can
 apply in order to optimize.?

 Thanks
 Naveen


 On Sun, Aug 14, 2011 at 9:42 PM, Erick
 Erickson wrote:

> Ah, thanks, Mark... I must have been looking at the wrong JIRAs.
>
> Erick
>
> On Sun, Aug 14, 2011 at 10:02 AM, Mark Miller
> wrote:
>> On Aug 14, 2011, at 9:03 AM, Erick Erickson wrote:
>>
>>> You either have to go to near real time (NRT), which is under
>>> development, but not committed to trunk yet
>> NRT support is committed to trunk.
>>
>> - Mark Miller
>> lucidimagination.com
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>




Re: Cache replication

2011-08-14 Thread Bill Bell
OK. But SOLR has built-in caching. Do you not like the caching? What do
you think we should change in the SOLR cache?

Bill


On 8/10/11 9:16 AM, "didier deshommes"  wrote:

>Consider putting a cache (memcached, redis, etc) *in front* of your
>solr slaves. Just make sure to update it when replication occurs.
>
>didier
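
(For illustration only: a rough sketch of the cache-aside pattern didier describes above, with a ConcurrentHashMap standing in for memcached/redis. The key scheme and the replication hook are assumptions, not part of Solr.)

import java.util.concurrent.ConcurrentHashMap;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FrontCache {
    // Stand-in for memcached/redis.
    private final ConcurrentHashMap<String, QueryResponse> cache =
            new ConcurrentHashMap<String, QueryResponse>();

    public QueryResponse query(SolrServer slave, SolrQuery q) throws SolrServerException {
        String key = q.toString();          // canonical parameter string as the key
        QueryResponse hit = cache.get(key);
        if (hit != null) {
            return hit;                     // served without touching a slave
        }
        QueryResponse fresh = slave.query(q);
        cache.put(key, fresh);
        return fresh;
    }

    // Call this when replication to the slaves completes.
    public void invalidate() {
        cache.clear();
    }
}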
>
>On Tue, Aug 9, 2011 at 6:07 PM, arian487  wrote:
>> I'm wondering if the caches on all the slaves are replicated across
>>(such as
>> queryResultCache).  That is to say, if I hit one of my slaves and cache
>>a
>> result, and I make a search later and that search happens to hit a
>>different
>> slave, will that first cached result be available for use?
>>
>> This is pretty important because I'm going to have a lot of slaves and
>>if
>> this isn't done, then I'd have a high chance of running a lot uncached
>> queries.
>>
>> Thanks :)
>>
>> --
>> View this message in context:
>>http://lucene.472066.n3.nabble.com/Cache-replication-tp3240708p3240708.ht
>>ml
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>




Re: exceeded limit of maxWarmingSearchers ERROR

2011-08-14 Thread Nagendra Nagarajayya

Bill:

The technical details of the NRT implementation in Apache Solr with 
RankingAlgorithm (SOLR-RA) is available here:


http://solr-ra.tgels.com/papers/NRT_Solr_RankingAlgorithm.pdf

(Some changes for Solr 3.x, but for most it is as above)

Regarding support for 4.0 trunk, should happen sometime soon.

Regards

- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org





On 8/14/2011 7:11 PM, Bill Bell wrote:

OK,

I'll ask the elephant in the room…

What is the difference between the new UpdateHandler from Mark and the
SOLR-RA?

The UpdateHandler works with 4.0; does SOLR-RA work with 4.0 trunk?

Pros/Cons?


On 8/14/11 8:10 PM, "Nagendra Nagarajayya"
wrote:


Naveen:

NRT with Apache Solr 3.3 and RankingAlgorithm does not need a commit for a
document to become searchable. Any document that you add through update
becomes  immediately searchable. So no need to commit from within your
update client code.  Since there is no commit, the cache does not have
to be cleared or the old searchers closed or  new searchers opened, and
warmed (error that you are facing).

Regards

- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org



On 8/14/2011 10:37 AM, Naveen Gupta wrote:

Hi Mark/Erick/Nagendra,

I was not very confident about NRT at that point of time, when we
started
project almost 1 year ago, definitely i would try NRT and see the
performance.

The current requirement was working fine till we were using
commitWithin 10
millisecs in the XMLDocument which we were posting to SOLR.

But due to which, we were getting very poor performance (almost 3 mins
for
15,000 docs) per user. There are many parallel users committing to our
SOLR.

So we removed the commitWithin, and hence performance was much much
better.

But then we are getting this maxWarmingSearcher Error, because we are
committing separately as a curl request after once entire doc is
submitted
for indexing.

The question here is what is difference between commitWithin and commit
(apart from the fact that commit takes memory and processes and
additional
hardware usage)

Why we want it to be visible as soon as possible, since we are applying
many
business rules on top of the results (older indexes as well as new one)
and
apply different filters.

upto 5 mins is fine for us. but more than that we need to think then
other
optimizations.

We will definitely try NRT. But please tell me other options which we
can
apply in order to optimize.?

Thanks
Naveen


On Sun, Aug 14, 2011 at 9:42 PM, Erick
Erickson wrote:


Ah, thanks, Mark... I must have been looking at the wrong JIRAs.

Erick

On Sun, Aug 14, 2011 at 10:02 AM, Mark Miller
wrote:

On Aug 14, 2011, at 9:03 AM, Erick Erickson wrote:


You either have to go to near real time (NRT), which is under
development, but not committed to trunk yet

NRT support is committed to trunk.

- Mark Miller
lucidimagination.com

















Re: exceeded limit of maxWarmingSearchers ERROR

2011-08-14 Thread Bill Bell
OK,

I'll ask the elephant in the room…

What is the difference between the new UpdateHandler from Mark and the
SOLR-RA?

The UpdateHandler works with 4.0; does SOLR-RA work with 4.0 trunk?

Pros/Cons?


On 8/14/11 8:10 PM, "Nagendra Nagarajayya" 
wrote:

>Naveen:
>
>NRT with Apache Solr 3.3 and RankingAlgorithm does not need a commit for a
>document to become searchable. Any document that you add through update
>becomes  immediately searchable. So no need to commit from within your
>update client code.  Since there is no commit, the cache does not have
>to be cleared or the old searchers closed or  new searchers opened, and
>warmed (error that you are facing).
>
>Regards
>
>- Nagendra Nagarajayya
>http://solr-ra.tgels.org
>http://rankingalgorithm.tgels.org
>
>
>
>On 8/14/2011 10:37 AM, Naveen Gupta wrote:
>> Hi Mark/Erick/Nagendra,
>>
>> I was not very confident about NRT at that point of time, when we
>>started
>> project almost 1 year ago, definitely i would try NRT and see the
>> performance.
>>
>> The current requirement was working fine till we were using
>>commitWithin 10
>> millisecs in the XMLDocument which we were posting to SOLR.
>>
>> But due to which, we were getting very poor performance (almost 3 mins
>>for
>> 15,000 docs) per user. There are many parallel users committing to our
>>SOLR.
>>
>> So we removed the commitWithin, and hence performance was much much
>>better.
>>
>> But then we are getting this maxWarmingSearcher Error, because we are
>> committing separately as a curl request after once entire doc is
>>submitted
>> for indexing.
>>
>> The question here is what is difference between commitWithin and commit
>> (apart from the fact that commit takes memory and processes and
>>additional
>> hardware usage)
>>
>> Why we want it to be visible as soon as possible, since we are applying
>>many
>> business rules on top of the results (older indexes as well as new one)
>>and
>> apply different filters.
>>
>> upto 5 mins is fine for us. but more than that we need to think then
>>other
>> optimizations.
>>
>> We will definitely try NRT. But please tell me other options which we
>>can
>> apply in order to optimize.?
>>
>> Thanks
>> Naveen
>>
>>
>> On Sun, Aug 14, 2011 at 9:42 PM, Erick
>>Erickson wrote:
>>
>>> Ah, thanks, Mark... I must have been looking at the wrong JIRAs.
>>>
>>> Erick
>>>
>>> On Sun, Aug 14, 2011 at 10:02 AM, Mark Miller
>>> wrote:
 On Aug 14, 2011, at 9:03 AM, Erick Erickson wrote:

> You either have to go to near real time (NRT), which is under
> development, but not committed to trunk yet
 NRT support is committed to trunk.

 - Mark Miller
 lucidimagination.com









>




Loggly support

2011-08-14 Thread Bill Bell
How do you setup log4j to work with Loggly for SOLR logs?

Anyone have this set up?

Bill





Re: exceeded limit of maxWarmingSearchers ERROR

2011-08-14 Thread Nagendra Nagarajayya

Naveen:

NRT with Apache Solr 3.3 and RankingAlgorithm does not need a commit for a
document to become searchable. Any document that you add through update 
becomes  immediately searchable. So no need to commit from within your 
update client code.  Since there is no commit, the cache does not have 
to be cleared or the old searchers closed or  new searchers opened, and 
warmed (error that you are facing).
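
As a rough sketch of what that looks like from the client side (plain SolrJ; the URL and field names below are placeholders, and the immediate visibility is what the NRT build described above provides, not stock Solr 3.3):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrInputDocument;

public class NrtAddExample {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr =
                new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "msg-1");
        doc.addField("text", "hello nrt");
        solr.add(doc);                 // note: no solr.commit() from the client

        // With the behaviour described above the document is already searchable;
        // stock Solr 3.3 would not show it until a commit opened a new searcher.
        QueryResponse rsp = solr.query(new SolrQuery("id:msg-1"));
        System.out.println(rsp.getResults().getNumFound());
    }
}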


Regards

- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org



On 8/14/2011 10:37 AM, Naveen Gupta wrote:

Hi Mark/Erick/Nagendra,

I was not very confident about NRT at that point of time, when we started
project almost 1 year ago, definitely i would try NRT and see the
performance.

The current requirement was working fine till we were using commitWithin 10
millisecs in the XMLDocument which we were posting to SOLR.

But due to which, we were getting very poor performance (almost 3 mins for
15,000 docs) per user. There are many parallel users committing to our SOLR.

So we removed the commitWithin, and hence performance was much much better.

But then we are getting this maxWarmingSearcher Error, because we are
committing separately as a curl request after once entire doc is submitted
for indexing.

The question here is what is difference between commitWithin and commit
(apart from the fact that commit takes memory and processes and additional
hardware usage)

Why we want it to be visible as soon as possible, since we are applying many
business rules on top of the results (older indexes as well as new one) and
apply different filters.

upto 5 mins is fine for us. but more than that we need to think then other
optimizations.

We will definitely try NRT. But please tell me other options which we can
apply in order to optimize.?

Thanks
Naveen


On Sun, Aug 14, 2011 at 9:42 PM, Erick Erickson wrote:


Ah, thanks, Mark... I must have been looking at the wrong JIRAs.

Erick

On Sun, Aug 14, 2011 at 10:02 AM, Mark Miller
wrote:

On Aug 14, 2011, at 9:03 AM, Erick Erickson wrote:


You either have to go to near real time (NRT), which is under
development, but not committed to trunk yet

NRT support is committed to trunk.

- Mark Miller
lucidimagination.com













Re: exceeded limit of maxWarmingSearchers ERROR

2011-08-14 Thread Peter Sturge
It's worth noting that the fast commit rate is only an indirect part
of the issue you're seeing. As the error comes from cache warming (a
consequence of committing), it's not the fault of committing directly.
It's well worth having a good close look at exactly what your caches
are doing when they are warmed, and trying as much as possible to
remove any unneeded facet/field caching etc.
The time it takes to repopulate the caches causes the error: if it's
slower than the commit rate, you'll get into the 'try again later'
spiral.

There are a number of ways to help mitigate this - NRT is certainly
the [hopefully near] future for this. Other strategies
include distributed search/cloud/ZK - splitting the index into logical
shards, so your commits and their associated caches are smaller and
more targeted. You can also use two Solr instances - one optimized for
writes/commits, one for reads (write commits are asynchronous to the 'read'
instance) - plus there are customized solutions like RankingAlgorithm,
Zoie etc.
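
A bare-bones sketch of that two-instance split, for illustration (the URLs are placeholders, 'doc' is an existing SolrInputDocument, and replication or a shared index is assumed to keep the two in sync):

SolrServer writeSolr = new CommonsHttpSolrServer("http://indexer:8983/solr");
SolrServer readSolr  = new CommonsHttpSolrServer("http://search:8983/solr");

writeSolr.add(doc);        // commit/warming cost stays on the write instance
writeSolr.commit();

// readers never wait on warming searchers triggered by those commits
QueryResponse rsp = readSolr.query(new SolrQuery("field:value"));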


On Sun, Aug 14, 2011 at 2:47 AM, Naveen Gupta  wrote:
> Hi,
>
> Most of the settings are default.
>
> We have single node (Memory 1 GB, Index Size 4GB)
>
> We have a requirement where we are doing very fast commit. This is kind of
> real time requirement where we are polling many threads from third party and
> indexes into our system.
>
> We want these results to be available soon.
>
> We are committing for each user (may have 10k threads and inside that 1
> thread may have 10 messages). So overall documents per user will be having
> around .1 million (100,000)
>
> Earlier we were using commitWithin as 10 milliseconds inside the document,
> but that was slowing the indexing and we were not getting any error.
>
> As we removed the commitWithin, indexing became very fast. But after that
> we started experiencing this error in the system
>
> As i read many forums, everybody told that this is happening because of very
> fast commit rate, but what is the solution for our problem?
>
> We are using CURL to post the data and commit
>
> Also till now we are using default solrconfig.
>
> Aug 14, 2011 12:12:04 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
>        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1052)
>        at
> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:424)
>        at
> org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
>        at
> org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:177)
>        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
>        at
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
>        at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
>        at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
>        at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
>        at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>        at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>        at
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>        at
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>        at
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>        at
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>        at
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>        at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
>        at
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
>        at
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
>        at
> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
>        at java.lang.Thread.run(Thread.java:662)
>


Re: solr-ruby: Error undefined method `closed?' for nil:NilClass

2011-08-14 Thread Erik Hatcher
Does instantiating a Solr::Connection for each request make things better?

Erik

On Aug 14, 2011, at 11:34 , Ian Connor wrote:

> It is nothing special - just like this:
> 
>conn   = Solr::Connection.new("http://#{LOCAL_SHARD}",
> {:timeout => 1000, :autocommit => :on})
>options[:shards] = HA_SHARDS
>response = conn.query(query, options)
> 
> Where LOCAL_SHARD points to a haproxy of a single shard and HA_SHARDS is an
> array of 18 shards (via haproxy).
> 
> Ian.
> 
> On Mon, Aug 8, 2011 at 12:50 PM, Erik Hatcher wrote:
> 
>> Ian -
>> 
>> What does your solr-ruby using code look like?
>> 
>> Solr::Connection is light-weight, so you could just construct a new one of
>> those for each request.  Are you keeping an instance around?
>> 
>> Erik
>> 
>> 
>> On Aug 8, 2011, at 12:03 , Ian Connor wrote:
>> 
>>> Hi,
>>> 
>>> I have seen some of these errors come through from time to time. It looks
>>> like:
>>> 
>>> /usr/lib/ruby/1.8/net/http.rb:1060:in
>>> `request'\n/usr/lib/ruby/1.8/net/http.rb:845:in `post'
>>> 
>>> /usr/lib/ruby/gems/1.8/gems/solr-ruby-0.0.8/lib/solr/connection.rb:158:in
>>> `post'
>>> 
>>> /usr/lib/ruby/gems/1.8/gems/solr-ruby-0.0.8/lib/solr/connection.rb:151:in
>>> `send'
>>> 
>>> /usr/lib/ruby/gems/1.8/gems/solr-ruby-0.0.8/lib/solr/connection.rb:174:in
>>> `create_and_send_query'
>>> 
>>> /usr/lib/ruby/gems/1.8/gems/solr-ruby-0.0.8/lib/solr/connection.rb:92:in
>>> `query'
>>> 
>>> It is as if the http object has gone away. Would it be good to create a
>> new
>>> one inside of the connection or is something more serious going on?
>>> ubuntu 10.04
>>> passenger 3.0.8
>>> rails 2.3.11
>>> 
>>> --
>>> Regards,
>>> 
>>> Ian Connor
>> 
>> 
> 
> 
> -- 
> Regards,
> 
> Ian Connor
> 1 Leighton St #723
> Cambridge, MA 02141
> Call Center Phone: +1 (714) 239 3875 (24 hrs)
> Fax: +1(770) 818 5697
> Skype: ian.connor



Re: Some questions about SolrJ

2011-08-14 Thread Shawn Heisey

On 8/13/2011 9:59 AM, Michael Sokolov wrote:


Shawn, my experience with SolrJ in that configuration (no autoCommit) 
is that you have control over commits: if you don't issue an explicit 
commit, it won't happen.  Re lifecycle: we don't use a static 
instance; rather our app maintains a small pool of 
CommonsHttpSolrServer instances that we re-use across requests.  I 
think that will be preferable since I don't think the underlying 
HttpClient is thread safe?
Hmm, I just checked and actually CommonsHttpSolrServer uses 
MultiThreadedHttpConnectionManager so it should be thread-safe, and OK 
to use a static instance as per documentation.  Sorry for the 
misinformation.


Thanks for the help!

I've been able to muddle my way through part of my implementation on my 
own.  There doesn't seem to be any way to point to the base /solr/ url 
and then ask SolrJ to add a core when creating requests.  I did see that 
you can set the URL for the server object after it's created, but if I 
ever make this thing multithreaded, I fear doing so will cause 
problems.  I'm going with one server object (solrServer) for CoreAdmin 
and another object (solrCore) for requests against the core.
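
For what it's worth, a stripped-down sketch of that arrangement with SolrJ's CommonsHttpSolrServer and CoreAdminRequest (the URLs and core names are placeholders, and the CoreAdminRequest call is from memory, so check it against your SolrJ version):

// Base URL: used only for CoreAdmin operations.
CommonsHttpSolrServer solrServer =
        new CommonsHttpSolrServer("http://localhost:8983/solr");

// Core URL: used for queries and updates against one core.
CommonsHttpSolrServer solrCore =
        new CommonsHttpSolrServer("http://localhost:8983/solr/shard1_build");

// Core administration goes through the base URL...
CoreAdminRequest.reloadCore("shard1_build", solrServer);

// ...while searches go through the core-specific URL.
QueryResponse rsp = solrCore.query(new SolrQuery("*:*"));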


This new build system has an object representing one complete index, 
which uses a container of seven objects representing each of the 
shards.  Each of the shard objects has two objects representing a build 
core and a live core.  Each of the core objects contains the solrServer 
and solrCore already mentioned.  Since I have two complete indexes, this 
means that the final product will initialize 56 server objects.


I couldn't use static server objects as recommended by the docs, because 
I have so many instances that all need different URLs.  They are private 
class members that get created only once, so I think it will be OK.  A 
static object would be a good idea for a search application, because it 
likely only needs to deal with one URL.  Our webapp developers told me 
that they will be putting the server object into a bean in the 
application context.


When I've got everything done and debugged, I will use what I've learned 
to augment the SolrJ wiki page.  Who is the best community person to 
coordinate with on that to make sure I put up good information?


Thanks,
Shawn



Re: exceeded limit of maxWarmingSearchers ERROR

2011-08-14 Thread Mark Miller
It's somewhat confusing - I'll straighten it out though. I left the issue open 
to keep me from taking forever to doc it - hasn't helped much yet - but maybe 
later today...

On Aug 14, 2011, at 12:12 PM, Erick Erickson wrote:

> Ah, thanks, Mark... I must have been looking at the wrong JIRAs.
> 
> Erick
> 
> On Sun, Aug 14, 2011 at 10:02 AM, Mark Miller  wrote:
>> 
>> On Aug 14, 2011, at 9:03 AM, Erick Erickson wrote:
>> 
>>> You either have to go to near real time (NRT), which is under
>>> development, but not committed to trunk yet
>> 
>> NRT support is committed to trunk.
>> 
>> - Mark Miller
>> lucidimagination.com
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 

- Mark Miller
lucidimagination.com










Re: exceeded limit of maxWarmingSearchers ERROR

2011-08-14 Thread Naveen Gupta
Hi Mark/Erick/Nagendra,

I was not very confident about NRT at that point of time, when we started the
project almost 1 year ago; I would definitely try NRT and see the
performance.

The current requirement was working fine till we were using commitWithin 10
millisecs in the XMLDocument which we were posting to SOLR.

But due to which, we were getting very poor performance (almost 3 mins for
15,000 docs) per user. There are many parallel users committing to our SOLR.

So we removed the commitWithin, and hence performance was much much better.

But then we are getting this maxWarmingSearchers error, because we are
committing separately as a curl request once the entire doc is submitted
for indexing.

The question here is: what is the difference between commitWithin and commit
(apart from the fact that commit takes memory, processes and additional
hardware usage)?
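
(For readers comparing the two mechanisms in SolrJ terms, a hedged sketch: it assumes your SolrJ version exposes UpdateRequest#setCommitWithin, and 'solr' and 'doc' stand for an existing SolrServer and SolrInputDocument. The XML equivalent of the first form is the commitWithin attribute on the add element.)

// commitWithin: the server itself schedules a commit within ~10 seconds,
// batching visibility instead of opening a searcher per request.
UpdateRequest req = new UpdateRequest();
req.setCommitWithin(10000);
req.add(doc);
req.process(solr);

// explicit commit: the client forces a new searcher to open right now, which
// is the step that triggers warming (and the maxWarmingSearchers limit).
solr.add(doc);
solr.commit();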

The reason we want it to be visible as soon as possible is that we are applying many
business rules on top of the results (older indexes as well as new ones) and
applying different filters.

Up to 5 mins is fine for us, but more than that and we need to think about other
optimizations.

We will definitely try NRT. But please tell me about other options which we can
apply in order to optimize?

Thanks
Naveen


On Sun, Aug 14, 2011 at 9:42 PM, Erick Erickson wrote:

> Ah, thanks, Mark... I must have been looking at the wrong JIRAs.
>
> Erick
>
> On Sun, Aug 14, 2011 at 10:02 AM, Mark Miller 
> wrote:
> >
> > On Aug 14, 2011, at 9:03 AM, Erick Erickson wrote:
> >
> >> You either have to go to near real time (NRT), which is under
> >> development, but not committed to trunk yet
> >
> > NRT support is committed to trunk.
> >
> > - Mark Miller
> > lucidimagination.com
> >
> >
> >
> >
> >
> >
> >
> >
> >
>


Re: exceeded limit of maxWarmingSearchers ERROR

2011-08-14 Thread Erick Erickson
Ah, thanks, Mark... I must have been looking at the wrong JIRAs.

Erick

On Sun, Aug 14, 2011 at 10:02 AM, Mark Miller  wrote:
>
> On Aug 14, 2011, at 9:03 AM, Erick Erickson wrote:
>
>> You either have to go to near real time (NRT), which is under
>> development, but not committed to trunk yet
>
> NRT support is committed to trunk.
>
> - Mark Miller
> lucidimagination.com
>
>
>
>
>
>
>
>
>


Re: paging size in SOLR

2011-08-14 Thread Erick Erickson
Yep.

ResultWindowSize in
>> solrconfig.xml
>>
>> Best
>> Erick
>>
>> On Sun, Aug 14, 2011 at 8:35 AM, jame vaalet  wrote:
>> > thanks erick ... that means it depends upon the memory allocated to the
>> JVM
>> > .
>> >
>> > going back queryCacheResults factor i have got this doubt ..
>> > say, i have got 10 threads with 10 different queries ..and each of them
>> in
>> > parallel are searching the same index with millions of docs in it
>> > (multisharded ) .
>> > now each of the queries have large number of results in it hence got to
>> page
>> > them all..
>> > which all thread's (query ) result-set will be cached ? so that
>> subsequent
>> > pages can be retrieved quickly ..?
>> >
>> > On 14 August 2011 17:40, Erick Erickson  wrote:
>> >
>> >> There isn't an "optimum" page size that I know of, it'll vary with lots
>> of
>> >> stuff, not the least of which is whatever servlet container limits there
>> >> are.
>> >>
>> >> But I suspect you can get quite a few (1000s) without
>> >> too much problem, and you can always use the JSON response
>> >> writer to pack in more pages with less overhead.
>> >>
>> >> You pretty much have to try it and see.
>> >>
>> >> Best
>> >> Erick
>> >>
>> >> On Sun, Aug 14, 2011 at 5:42 AM, jame vaalet 
>> wrote:
>> >> > speaking about pagesizes, what is the optimum page size that should be
>> >> > retrieved each time ??
>> >> > i understand it depends upon the data you are fetching back from each
>> hit
>> >> > document ... but lets say when ever a document is hit am fetching back
>> >> 100
>> >> > bytes worth data from each of those docs in indexes (along with solr
>> >> > response statements ) .
>> >> > this will make 100*x bytes worth data in each page if x is the page
>> size
>> >> ..
>> >> > what is the optimum value of this x that solr can return each time
>> >> without
>> >> > going into exceptions 
>> >> >
>> >> > On 13 August 2011 19:59, Erick Erickson 
>> wrote:
>> >> >
>> >> >> Jame:
>> >> >>
>> >> >> You control the number via settings in solrconfig.xml, so it's
>> >> >> up to you.
>> >> >>
>> >> >> Jonathan:
>> >> >> Hmmm, that's seems right, after all the "deep paging" penalty is
>> really
>> >> >> about keeping a large sorted array in memory but at least you
>> only
>> >> >> pay it once per 10,000, rather than 100 times (assuming page size is
>> >> >> 100)...
>> >> >>
>> >> >> Best
>> >> >> Erick
>> >> >>
>> >> >> On Wed, Aug 10, 2011 at 10:58 AM, jame vaalet 
>> >> >> wrote:
>> >> >> > when you say queryResultCache, does it only cache n number of
>> result
>> >> for
>> >> >> the
>> >> >> > last one query or more than one queries?
>> >> >> >
>> >> >> >
>> >> >> > On 10 August 2011 20:14, simon  wrote:
>> >> >> >
>> >> >> >> Worth remembering there are some performance penalties with deep
>> >> >> >> paging, if you use the page-by-page approach. may not be too much
>> of
>> >> a
>> >> >> >> problem if you really are only looking to retrieve 10K docs.
>> >> >> >>
>> >> >> >> -Simon
>> >> >> >>
>> >> >> >> On Wed, Aug 10, 2011 at 10:32 AM, Erick Erickson
>> >> >> >>  wrote:
>> >> >> >> > Well, if you really want to you can specify start=0 and
>> rows=1
>> >> and
>> >> >> >> > get them all back at once.
>> >> >> >> >
>> >> >> >> > You can do page-by-page by incrementing the "start" parameter as
>> >> you
>> >> >> >> > indicated.
>> >> >> >> >
>> >> >> >> > You can keep from re-executing the search by setting your
>> >> >> >> queryResultCache
>> >> >> >> > appropriately, but this affects all searches so might be an
>> issue.
>> >> >> >> >
>> >> >> >> > Best
>> >> >> >> > Erick
>> >> >> >> >
>> >> >> >> > On Wed, Aug 10, 2011 at 9:09 AM, jame vaalet <
>> jamevaa...@gmail.com
>> >> >
>> >> >> >> wrote:
>> >> >> >> >> hi,
>> >> >> >> >> i want to retrieve all the data from solr (say 10,000 ids ) and
>> my
>> >> >> page
>> >> >> >> size
>> >> >> >> >> is 1000 .
>> >> >> >> >> how do i get back the data (pages) one after other ?do i have
>> to
>> >> >> >> increment
>> >> >> >> >> the "start" value each time by the page size from 0 and do the
>> >> >> iteration
>> >> >> >> ?
>> >> >> >> >> In this case am i querying the index 10 time instead of one or
>> >> after
>> >> >> >> first
>> >> >> >> >> query the result will be cached somewhere for the subsequent
>> pages
>> >> ?
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> JAME VAALET
>> >> >> >> >>
>> >> >> >> >
>> >> >> >>
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > --
>> >> >> >
>> >> >> > -JAME
>> >> >> >
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> >
>> >> > -JAME
>> >> >
>> >>
>> >
>> >
>> >
>> > --
>> >
>> > -JAME
>> >
>>
>
>
>
> --
>
> -JAME
>


Re: solr-ruby: Error undefined method `closed?' for nil:NilClass

2011-08-14 Thread Ian Connor
It is nothing special - just like this:

  conn   = Solr::Connection.new("http://#{LOCAL_SHARD}",
   {:timeout => 1000, :autocommit => :on})
  options[:shards] = HA_SHARDS
  response = conn.query(query, options)

Where LOCAL_SHARD points to a haproxy of a single shard and HA_SHARDS is an
array of 18 shards (via haproxy).

Ian.

On Mon, Aug 8, 2011 at 12:50 PM, Erik Hatcher wrote:

> Ian -
>
> What does your solr-ruby using code look like?
>
> Solr::Connection is light-weight, so you could just construct a new one of
> those for each request.  Are you keeping an instance around?
>
>Erik
>
>
> On Aug 8, 2011, at 12:03 , Ian Connor wrote:
>
> > Hi,
> >
> > I have seen some of these errors come through from time to time. It looks
> > like:
> >
> > /usr/lib/ruby/1.8/net/http.rb:1060:in
> > `request'\n/usr/lib/ruby/1.8/net/http.rb:845:in `post'
> >
> > /usr/lib/ruby/gems/1.8/gems/solr-ruby-0.0.8/lib/solr/connection.rb:158:in
> > `post'
> >
> > /usr/lib/ruby/gems/1.8/gems/solr-ruby-0.0.8/lib/solr/connection.rb:151:in
> > `send'
> >
> > /usr/lib/ruby/gems/1.8/gems/solr-ruby-0.0.8/lib/solr/connection.rb:174:in
> > `create_and_send_query'
> >
> > /usr/lib/ruby/gems/1.8/gems/solr-ruby-0.0.8/lib/solr/connection.rb:92:in
> > `query'
> >
> > It is as if the http object has gone away. Would it be good to create a
> new
> > one inside of the connection or is something more serious going on?
> > ubuntu 10.04
> > passenger 3.0.8
> > rails 2.3.11
> >
> > --
> > Regards,
> >
> > Ian Connor
>
>


-- 
Regards,

Ian Connor
1 Leighton St #723
Cambridge, MA 02141
Call Center Phone: +1 (714) 239 3875 (24 hrs)
Fax: +1(770) 818 5697
Skype: ian.connor


Re: Results Group-By using SolrJ

2011-08-14 Thread Omri Cohen
Thanks a lot!! Exactly what I was looking for..

That solved this!
On Sun, Aug 14, 2011 at 6:23 PM, Martijn v Groningen <
martijn.v.gronin...@gmail.com> wrote:

> Hi Omri,
>
> SOLR-2637 was concerned with adding grouped response parsing. There is no
> convenience method for grouping, but you can use the normal
> SolrQuery#set(...) methods to enable grouping.
> The following code should enable grouping via SolrJ api:
> SolrQuery query = new SolrQuery();
> query.set(GroupParams.GROUP, true);
> query.set(GroupParams.GROUP_FIELD, "your_field");
>
> Martijn
>
> On 14 August 2011 16:53, Omri Cohen  wrote:
>
> > Hi All,
> >
> > I am trying to group by results using SolrJ. According to
> > https://issues.apache.org/jira/browse/SOLR-2637 the feature was added,
> so
> > I
> > upgraded to SolrJ-3.4-Snapshot and I can see the necessary method for
> > grouping in QueryResponse, which is getGroupResponse(). The only thing
> left
> > that I don't understand is where do I set on which field to group. There
> is
> > no method that looks like it does so on the SolrQuery object..
> >
> > Ideas anyone ?
> >
> > thanks,
> > Omri
> >
>
>
>
> --
> Met vriendelijke groet,
>
> Martijn van Groningen
>


Re: Results Group-By using SolrJ

2011-08-14 Thread Martijn v Groningen
Hi Omri,

SOLR-2637 was concerned with adding grouped response parsing. There is no
convenience method for grouping, but you can use the normal
SolrQuery#set(...) methods to enable grouping.
The following code should enable grouping via SolrJ api:
SolrQuery query = new SolrQuery();
query.set(GroupParams.GROUP, true);
query.set(GroupParams.GROUP_FIELD, "your_field");
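
And reading the grouped results back should look roughly like this ('server' here is your SolrServer instance; the response classes are from the SolrJ 3.4 GroupResponse API as I recall it, so double-check against your snapshot):

QueryResponse rsp = server.query(query);
GroupResponse groups = rsp.getGroupResponse();
for (GroupCommand command : groups.getValues()) {      // one entry per grouped field
    for (Group group : command.getValues()) {          // one entry per distinct field value
        System.out.println(group.getGroupValue() + " -> "
                + group.getResult().getNumFound() + " docs");
    }
}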

Martijn

On 14 August 2011 16:53, Omri Cohen  wrote:

> Hi All,
>
> I am trying to group by results using SolrJ. According to
> https://issues.apache.org/jira/browse/SOLR-2637 the feature was added, so
> I
> upgraded to SolrJ-3.4-Snapshot and I can see the necessary method for
> grouping in QueryResponse, which is getGroupResponse(). The only thing left
> that I don't understand is where do I set on which field to group. There is
> no method that looks like it does so on the SolrQuery object..
>
> Ideas anyone ?
>
> thanks,
> Omri
>



-- 
Met vriendelijke groet,

Martijn van Groningen


Re: exceeded limit of maxWarmingSearchers ERROR

2011-08-14 Thread Nagendra Nagarajayya

Naveen:

You should try NRT with Apache Solr 3.3 and RankingAlgorithm. You can 
update 10,000 documents / sec while also concurrently searching. You can 
set commit  freq to about 15 mins or as desired. The 10,000 document 
update performance is with the MBArtists index on a dual core Linux 
system. So you may be able to see similar performance on your system. 
You can get more details of the NRT implementation from here:


http://solr-ra.tgels.org/wiki/en/Near_Real_Time_Search_ver_3.x

You can download Apache Solr 3.3 with RankingAlgorithm from here:

http://solr-ra.tgels.org/

(There are no changes to your existing setup, everything should work as 
earlier except for adding the  tag to your solrconfig.xml)


Regards

- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org



On 8/13/2011 6:47 PM, Naveen Gupta wrote:

Hi,

Most of the settings are default.

We have single node (Memory 1 GB, Index Size 4GB)

We have a requirement where we are doing very fast commit. This is kind of
real time requirement where we are polling many threads from third party and
indexes into our system.

We want these results to be available soon.

We are committing for each user (may have 10k threads and inside that 1
thread may have 10 messages). So overall documents per user will be having
around .1 million (100,000)

Earlier we were using commitWithin as 10 milliseconds inside the document,
but that was slowing the indexing and we were not getting any error.

As we removed the commitWithin, indexing became very fast. But after that
we started experiencing this error in the system

As i read many forums, everybody told that this is happening because of very
fast commit rate, but what is the solution for our problem?

We are using CURL to post the data and commit

Also till now we are using default solrconfig.

Aug 14, 2011 12:12:04 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
exceeded limit of maxWarmingSearchers=2, try again later.
 at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1052)
 at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:424)
 at
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
 at
org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:177)
 at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
 at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
 at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
 at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
 at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
 at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
 at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
 at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
 at
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
 at java.lang.Thread.run(Thread.java:662)





Results Group-By using SolrJ

2011-08-14 Thread Omri Cohen
Hi All,

I am trying to group by results using SolrJ. According to
https://issues.apache.org/jira/browse/SOLR-2637 the feature was added, so I
upgraded to SolrJ-3.4-Snapshot and I can see the necessary method for
grouping in QueryResponse, which is getGroupResponse(). The only thing left
that I don't understand is where do I set on which field to group. There is
no method that looks like it does so on the SolrQuery object..

Ideas anyone ?

thanks,
Omri


Re: exceeded limit of maxWarmingSearchers ERROR

2011-08-14 Thread Mark Miller

On Aug 14, 2011, at 9:03 AM, Erick Erickson wrote:

> You either have to go to near real time (NRT), which is under
> development, but not committed to trunk yet 

NRT support is committed to trunk.

- Mark Miller
lucidimagination.com










Re: paging size in SOLR

2011-08-14 Thread jame vaalet
my queryResultCache size=0 and queryResultWindowSize=50;
does this mean that I am not caching any results?

On 14 August 2011 18:27, Erick Erickson  wrote:

> As many results will be cached as you ask. See solrconfig.xml,
> the queryResultCache. This cache is essentially a map of queries
> and result document IDs. The number of doc IDs cached for
> each query is controlled by queryResultWindowSize in
> solrconfig.xml
>
> Best
> Erick
>
> On Sun, Aug 14, 2011 at 8:35 AM, jame vaalet  wrote:
> > thanks erick ... that means it depends upon the memory allocated to the
> JVM
> > .
> >
> > going back queryCacheResults factor i have got this doubt ..
> > say, i have got 10 threads with 10 different queries ..and each of them
> in
> > parallel are searching the same index with millions of docs in it
> > (multisharded ) .
> > now each of the queries have large number of results in it hence got to
> page
> > them all..
> > which all thread's (query ) result-set will be cached ? so that
> subsequent
> > pages can be retrieved quickly ..?
> >
> > On 14 August 2011 17:40, Erick Erickson  wrote:
> >
> >> There isn't an "optimum" page size that I know of, it'll vary with lots
> of
> >> stuff, not the least of which is whatever servlet container limits there
> >> are.
> >>
> >> But I suspect you can get quite a few (1000s) without
> >> too much problem, and you can always use the JSON response
> >> writer to pack in more pages with less overhead.
> >>
> >> You pretty much have to try it and see.
> >>
> >> Best
> >> Erick
> >>
> >> On Sun, Aug 14, 2011 at 5:42 AM, jame vaalet 
> wrote:
> >> > speaking about pagesizes, what is the optimum page size that should be
> >> > retrieved each time ??
> >> > i understand it depends upon the data you are fetching back from each
> hit
> >> > document ... but lets say when ever a document is hit am fetching back
> >> 100
> >> > bytes worth data from each of those docs in indexes (along with solr
> >> > response statements ) .
> >> > this will make 100*x bytes worth data in each page if x is the page
> size
> >> ..
> >> > what is the optimum value of this x that solr can return each time
> >> without
> >> > going into exceptions 
> >> >
> >> > On 13 August 2011 19:59, Erick Erickson 
> wrote:
> >> >
> >> >> Jame:
> >> >>
> >> >> You control the number via settings in solrconfig.xml, so it's
> >> >> up to you.
> >> >>
> >> >> Jonathan:
> >> >> Hmmm, that's seems right, after all the "deep paging" penalty is
> really
> >> >> about keeping a large sorted array in memory but at least you
> only
> >> >> pay it once per 10,000, rather than 100 times (assuming page size is
> >> >> 100)...
> >> >>
> >> >> Best
> >> >> Erick
> >> >>
> >> >> On Wed, Aug 10, 2011 at 10:58 AM, jame vaalet 
> >> >> wrote:
> >> >> > when you say queryResultCache, does it only cache n number of
> result
> >> for
> >> >> the
> >> >> > last one query or more than one queries?
> >> >> >
> >> >> >
> >> >> > On 10 August 2011 20:14, simon  wrote:
> >> >> >
> >> >> >> Worth remembering there are some performance penalties with deep
> >> >> >> paging, if you use the page-by-page approach. may not be too much
> of
> >> a
> >> >> >> problem if you really are only looking to retrieve 10K docs.
> >> >> >>
> >> >> >> -Simon
> >> >> >>
> >> >> >> On Wed, Aug 10, 2011 at 10:32 AM, Erick Erickson
> >> >> >>  wrote:
> >> >> >> > Well, if you really want to you can specify start=0 and
> rows=1
> >> and
> >> >> >> > get them all back at once.
> >> >> >> >
> >> >> >> > You can do page-by-page by incrementing the "start" parameter as
> >> you
> >> >> >> > indicated.
> >> >> >> >
> >> >> >> > You can keep from re-executing the search by setting your
> >> >> >> queryResultCache
> >> >> >> > appropriately, but this affects all searches so might be an
> issue.
> >> >> >> >
> >> >> >> > Best
> >> >> >> > Erick
> >> >> >> >
> >> >> >> > On Wed, Aug 10, 2011 at 9:09 AM, jame vaalet <
> jamevaa...@gmail.com
> >> >
> >> >> >> wrote:
> >> >> >> >> hi,
> >> >> >> >> i want to retrieve all the data from solr (say 10,000 ids ) and
> my
> >> >> page
> >> >> >> size
> >> >> >> >> is 1000 .
> >> >> >> >> how do i get back the data (pages) one after other ?do i have
> to
> >> >> >> increment
> >> >> >> >> the "start" value each time by the page size from 0 and do the
> >> >> iteration
> >> >> >> ?
> >> >> >> >> In this case am i querying the index 10 time instead of one or
> >> after
> >> >> >> first
> >> >> >> >> query the result will be cached somewhere for the subsequent
> pages
> >> ?
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> JAME VAALET
> >> >> >> >>
> >> >> >> >
> >> >> >>
> >> >> >
> >> >> >
> >> >> >
> >> >> > --
> >> >> >
> >> >> > -JAME
> >> >> >
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> >
> >> > -JAME
> >> >
> >>
> >
> >
> >
> > --
> >
> > -JAME
> >
>



-- 

-JAME


Re: exceeded limit of maxWarmingSearchers ERROR

2011-08-14 Thread Erick Erickson
You either have to go to near real time (NRT), which is under
development, but not committed to trunk yet, or just stop
warming up searchers and let the first user to open a searcher
pay the penalty for warmup (useColdSearcher, as I remember).

Although I'd also ask whether this is a reasonable requirement,
that the messages be searchable within milliseconds. Is 1 minute
really too much time? 5 minutes? You can estimate the minimum time
you can get away with by looking at the warmup times on the admin/stats
page.

Best
Erick

On Sat, Aug 13, 2011 at 9:47 PM, Naveen Gupta  wrote:
> Hi,
>
> Most of the settings are default.
>
> We have single node (Memory 1 GB, Index Size 4GB)
>
> We have a requirement where we are doing very fast commit. This is kind of
> real time requirement where we are polling many threads from third party and
> indexes into our system.
>
> We want these results to be available soon.
>
> We are committing for each user (may have 10k threads and inside that 1
> thread may have 10 messages). So overall documents per user will be having
> around .1 million (100,000)
>
> Earlier we were using commitWithin as 10 milliseconds inside the document,
> but that was slowing the indexing and we were not getting any error.
>
> As we removed the commitWithin, indexing became very fast. But after that
> we started experiencing this error in the system
>
> As i read many forums, everybody told that this is happening because of very
> fast commit rate, but what is the solution for our problem?
>
> We are using CURL to post the data and commit
>
> Also till now we are using default solrconfig.
>
> Aug 14, 2011 12:12:04 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
>        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1052)
>        at
> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:424)
>        at
> org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
>        at
> org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:177)
>        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
>        at
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
>        at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
>        at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
>        at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
>        at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>        at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>        at
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>        at
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>        at
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>        at
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>        at
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>        at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
>        at
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
>        at
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
>        at
> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
>        at java.lang.Thread.run(Thread.java:662)
>


Re: paging size in SOLR

2011-08-14 Thread Erick Erickson
As many results will be cached as you ask. See solrconfig.xml,
the queryResultCache. This cache is essentially a map of queries
and result document IDs. The number of doc IDs cached for
each query is controlled by queryResultWindowSize in
solrconfig.xml
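
In SolrJ terms, the page-by-page loop discussed in this thread looks roughly like the sketch below ('server' is a SolrServer, and the query and page size are placeholders). Because each request repeats the same query with a new start value, the later pages are exactly what the queryResultCache/queryResultWindowSize settings above can serve cheaply.

int pageSize = 1000;
SolrQuery q = new SolrQuery("*:*");
q.setRows(pageSize);
for (int start = 0; ; start += pageSize) {
    q.setStart(start);
    SolrDocumentList page = server.query(q).getResults();
    // ... process this page ...
    if (start + pageSize >= page.getNumFound()) {
        break;                    // no more pages
    }
}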

Best
Erick

On Sun, Aug 14, 2011 at 8:35 AM, jame vaalet  wrote:
> thanks erick ... that means it depends upon the memory allocated to the JVM
> .
>
> going back queryCacheResults factor i have got this doubt ..
> say, i have got 10 threads with 10 different queries ..and each of them in
> parallel are searching the same index with millions of docs in it
> (multisharded ) .
> now each of the queries have large number of results in it hence got to page
> them all..
> which all thread's (query ) result-set will be cached ? so that subsequent
> pages can be retrieved quickly ..?
>
> On 14 August 2011 17:40, Erick Erickson  wrote:
>
>> There isn't an "optimum" page size that I know of, it'll vary with lots of
>> stuff, not the least of which is whatever servlet container limits there
>> are.
>>
>> But I suspect you can get quite a few (1000s) without
>> too much problem, and you can always use the JSON response
>> writer to pack in more pages with less overhead.
>>
>> You pretty much have to try it and see.
>>
>> Best
>> Erick
>>
>> On Sun, Aug 14, 2011 at 5:42 AM, jame vaalet  wrote:
>> > speaking about pagesizes, what is the optimum page size that should be
>> > retrieved each time ??
>> > i understand it depends upon the data you are fetching back from each hit
>> > document ... but lets say when ever a document is hit am fetching back
>> 100
>> > bytes worth data from each of those docs in indexes (along with solr
>> > response statements ) .
>> > this will make 100*x bytes worth data in each page if x is the page size
>> ..
>> > what is the optimum value of this x that solr can return each time
>> without
>> > going into exceptions 
>> >
>> > On 13 August 2011 19:59, Erick Erickson  wrote:
>> >
>> >> Jame:
>> >>
>> >> You control the number via settings in solrconfig.xml, so it's
>> >> up to you.
>> >>
>> >> Jonathan:
>> >> Hmmm, that's seems right, after all the "deep paging" penalty is really
>> >> about keeping a large sorted array in memory but at least you only
>> >> pay it once per 10,000, rather than 100 times (assuming page size is
>> >> 100)...
>> >>
>> >> Best
>> >> Erick
>> >>
>> >> On Wed, Aug 10, 2011 at 10:58 AM, jame vaalet 
>> >> wrote:
>> >> > when you say queryResultCache, does it only cache n number of result
>> for
>> >> the
>> >> > last one query or more than one queries?
>> >> >
>> >> >
>> >> > On 10 August 2011 20:14, simon  wrote:
>> >> >
>> >> >> Worth remembering there are some performance penalties with deep
>> >> >> paging, if you use the page-by-page approach. may not be too much of
>> a
>> >> >> problem if you really are only looking to retrieve 10K docs.
>> >> >>
>> >> >> -Simon
>> >> >>
>> >> >> On Wed, Aug 10, 2011 at 10:32 AM, Erick Erickson
>> >> >>  wrote:
>> >> >> > Well, if you really want to you can specify start=0 and rows=1
>> and
>> >> >> > get them all back at once.
>> >> >> >
>> >> >> > You can do page-by-page by incrementing the "start" parameter as
>> you
>> >> >> > indicated.
>> >> >> >
>> >> >> > You can keep from re-executing the search by setting your
>> >> >> queryResultCache
>> >> >> > appropriately, but this affects all searches so might be an issue.
>> >> >> >
>> >> >> > Best
>> >> >> > Erick
>> >> >> >
>> >> >> > On Wed, Aug 10, 2011 at 9:09 AM, jame vaalet > >
>> >> >> wrote:
>> >> >> >> hi,
>> >> >> >> i want to retrieve all the data from solr (say 10,000 ids ) and my
>> >> page
>> >> >> size
>> >> >> >> is 1000 .
>> >> >> >> how do i get back the data (pages) one after other ?do i have to
>> >> >> increment
>> >> >> >> the "start" value each time by the page size from 0 and do the
>> >> iteration
>> >> >> ?
>> >> >> >> In this case am i querying the index 10 time instead of one or
>> after
>> >> >> first
>> >> >> >> query the result will be cached somewhere for the subsequent pages
>> ?
>> >> >> >>
>> >> >> >>
>> >> >> >> JAME VAALET
>> >> >> >>
>> >> >> >
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> >
>> >> > -JAME
>> >> >
>> >>
>> >
>> >
>> >
>> > --
>> >
>> > -JAME
>> >
>>
>
>
>
> --
>
> -JAME
>


Re: paging size in SOLR

2011-08-14 Thread jame vaalet
thanks erick ... that means it depends upon the memory allocated to the JVM
.

going back to the queryResultCache factor, i have got this doubt ..
say, i have got 10 threads with 10 different queries ..and each of them in
parallel are searching the same index with millions of docs in it
(multisharded ) .
now each of the queries have large number of results in it hence got to page
them all..
which all thread's (query ) result-set will be cached ? so that subsequent
pages can be retrieved quickly ..?

On 14 August 2011 17:40, Erick Erickson  wrote:

> There isn't an "optimum" page size that I know of, it'll vary with lots of
> stuff, not the least of which is whatever servlet container limits there
> are.
>
> But I suspect you can get quite a few (1000s) without
> too much problem, and you can always use the JSON response
> writer to pack in more pages with less overhead.
>
> You pretty much have to try it and see.
>
> Best
> Erick
>
> On Sun, Aug 14, 2011 at 5:42 AM, jame vaalet  wrote:
> > speaking about pagesizes, what is the optimum page size that should be
> > retrieved each time ??
> > i understand it depends upon the data you are fetching back from each hit
> > document ... but lets say when ever a document is hit am fetching back
> 100
> > bytes worth data from each of those docs in indexes (along with solr
> > response statements ) .
> > this will make 100*x bytes worth data in each page if x is the page size
> ..
> > what is the optimum value of this x that solr can return each time
> without
> > going into exceptions 
> >
> > On 13 August 2011 19:59, Erick Erickson  wrote:
> >
> >> Jame:
> >>
> >> You control the number via settings in solrconfig.xml, so it's
> >> up to you.
> >>
> >> Jonathan:
> >> Hmmm, that's seems right, after all the "deep paging" penalty is really
> >> about keeping a large sorted array in memory but at least you only
> >> pay it once per 10,000, rather than 100 times (assuming page size is
> >> 100)...
> >>
> >> Best
> >> Erick
> >>
> >> On Wed, Aug 10, 2011 at 10:58 AM, jame vaalet 
> >> wrote:
> >> > when you say queryResultCache, does it only cache n number of result
> for
> >> the
> >> > last one query or more than one queries?
> >> >
> >> >
> >> > On 10 August 2011 20:14, simon  wrote:
> >> >
> >> >> Worth remembering there are some performance penalties with deep
> >> >> paging, if you use the page-by-page approach. may not be too much of
> a
> >> >> problem if you really are only looking to retrieve 10K docs.
> >> >>
> >> >> -Simon
> >> >>
> >> >> On Wed, Aug 10, 2011 at 10:32 AM, Erick Erickson
> >> >>  wrote:
> >> >> > Well, if you really want to you can specify start=0 and rows=10000
> and
> >> >> > get them all back at once.
> >> >> >
> >> >> > You can do page-by-page by incrementing the "start" parameter as
> you
> >> >> > indicated.
> >> >> >
> >> >> > You can keep from re-executing the search by setting your
> >> >> queryResultCache
> >> >> > appropriately, but this affects all searches so might be an issue.
> >> >> >
> >> >> > Best
> >> >> > Erick
> >> >> >
> >> >> > On Wed, Aug 10, 2011 at 9:09 AM, jame vaalet  >
> >> >> wrote:
> >> >> >> hi,
> >> >> >> i want to retrieve all the data from solr (say 10,000 ids ) and my
> >> page
> >> >> size
> >> >> >> is 1000 .
> >> >> >> how do i get back the data (pages) one after other ?do i have to
> >> >> increment
> >> >> >> the "start" value each time by the page size from 0 and do the
> >> iteration
> >> >> ?
> >> >> >> In this case am i querying the index 10 time instead of one or
> after
> >> >> first
> >> >> >> query the result will be cached somewhere for the subsequent pages
> ?
> >> >> >>
> >> >> >>
> >> >> >> JAME VAALET
> >> >> >>
> >> >> >
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> >
> >> > -JAME
> >> >
> >>
> >
> >
> >
> > --
> >
> > -JAME
> >
>



-- 

-JAME


Re: Not update on duplicate key

2011-08-14 Thread Erik Hatcher
Though I think you could get the Dedupe feature to do this: 
http://wiki.apache.org/solr/Deduplication
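
A minimal sketch of what that wiki page describes is below. The field names are
the ones from the Solr example schema and are placeholders; note that this
de-duplicates on a content signature rather than literally ignoring a re-post
of the same uniqueKey, so whether it gives the "ignore the new entry"
behaviour asked for depends on how signatureField, fields and overwriteDupes
are chosen.

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">id</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">name,features,cat</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

The chain also has to be referenced from the update request handler
(update.chain in recent versions, update.processor in older ones) before it
runs at all.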


On Aug 13, 2011, at 11:52 , Erick Erickson wrote:

> If you mean just throw the new document on the floor if
> the index already contains a document with that key, I don't
> think you can do that. You could write a custom updateHandler
> that checks first to see whether the particular uniqueKey is
> in the index I suppose...
> 
> Best
> Erick
> 
> On Fri, Aug 12, 2011 at 7:31 AM, Rohit  wrote:
>> Hi All,
>> 
>> 
>> 
>> Please correct  me if I am wrong, but when I am trying to insert a document
>> into Solr which was previously index, it overwrites the current key.
>> 
>> 
>> 
>> Is there a way to change the behaviour,
>> 
>> 
>> 
>> 1. I don't want Solr to override but on the other hand it should ignore the
>> entry
>> 
>> 2. Also, if I could change the behaviour on the fly, update based on a flag
>> and ignore on another flag.
>> 
>> 
>> 
>> 
>> 
>> Thanks and Regards,
>> 
>> Rohit
>> 
>> 
>> 
>> 



Re: paging size in SOLR

2011-08-14 Thread Erick Erickson
There isn't an "optimum" page size that I know of, it'll vary with lots of
stuff, not the least of which is whatever servlet container limits there are.

But I suspect you can get quite a few (1000s) without
too much problem, and you can always use the JSON response
writer to pack in more pages with less overhead.

You pretty much have to try it and see.

Best
Erick
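
As a rough illustration of the page-by-page pattern being discussed, a SolrJ
loop along the lines below works against a 1.4/3.x server. The URL, query and
page size are placeholders; SolrJ uses its binary format by default, so the
JSON response writer mentioned above mainly helps clients that talk plain HTTP
with wt=json.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class PageThroughResults {
  public static void main(String[] args) throws Exception {
    // Placeholder URL and page size; tune rows as discussed in this thread.
    CommonsHttpSolrServer server =
        new CommonsHttpSolrServer("http://localhost:8983/solr");
    final int pageSize = 1000;

    SolrQuery query = new SolrQuery("*:*");
    query.setRows(pageSize);

    long numFound;
    int start = 0;
    do {
      // Each page is a separate search; the deep-paging cost mentioned earlier
      // comes from re-collecting and sorting start+rows ids per request.
      query.setStart(start);
      QueryResponse rsp = server.query(query);
      numFound = rsp.getResults().getNumFound();
      for (SolrDocument doc : rsp.getResults()) {
        System.out.println(doc.getFieldValue("id")); // process each document
      }
      start += pageSize;
    } while (start < numFound);
  }
}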

On Sun, Aug 14, 2011 at 5:42 AM, jame vaalet  wrote:
> speaking about pagesizes, what is the optimum page size that should be
> retrieved each time ??
> i understand it depends upon the data you are fetching back fromeach hit
> document ... but lets say when ever a document is hit am fetching back 100
> bytes worth data from each of those docs in indexes (along with solr
> response statements ) .
> this will make 100*x bytes worth data in each page if x is the page size ..
> what is the optimum value of this x that solr can return each time without
> going into exceptions 
>
> On 13 August 2011 19:59, Erick Erickson  wrote:
>
>> Jame:
>>
>> You control the number via settings in solrconfig.xml, so it's
>> up to you.
>>
>> Jonathan:
>> Hmmm, that's seems right, after all the "deep paging" penalty is really
>> about keeping a large sorted array in memory but at least you only
>> pay it once per 10,000, rather than 100 times (assuming page size is
>> 100)...
>>
>> Best
>> Erick
>>
>> On Wed, Aug 10, 2011 at 10:58 AM, jame vaalet 
>> wrote:
>> > when you say queryResultCache, does it only cache n number of result for
>> the
>> > last one query or more than one queries?
>> >
>> >
>> > On 10 August 2011 20:14, simon  wrote:
>> >
>> >> Worth remembering there are some performance penalties with deep
>> >> paging, if you use the page-by-page approach. may not be too much of a
>> >> problem if you really are only looking to retrieve 10K docs.
>> >>
>> >> -Simon
>> >>
>> >> On Wed, Aug 10, 2011 at 10:32 AM, Erick Erickson
>> >>  wrote:
>> >> > Well, if you really want to you can specify start=0 and rows=10000 and
>> >> > get them all back at once.
>> >> >
>> >> > You can do page-by-page by incrementing the "start" parameter as you
>> >> > indicated.
>> >> >
>> >> > You can keep from re-executing the search by setting your
>> >> queryResultCache
>> >> > appropriately, but this affects all searches so might be an issue.
>> >> >
>> >> > Best
>> >> > Erick
>> >> >
>> >> > On Wed, Aug 10, 2011 at 9:09 AM, jame vaalet 
>> >> wrote:
>> >> >> hi,
>> >> >> i want to retrieve all the data from solr (say 10,000 ids ) and my
>> page
>> >> size
>> >> >> is 1000 .
>> >> >> how do i get back the data (pages) one after other ?do i have to
>> >> increment
>> >> >> the "start" value each time by the page size from 0 and do the
>> iteration
>> >> ?
>> >> >> In this case am i querying the index 10 time instead of one or after
>> >> first
>> >> >> query the result will be cached somewhere for the subsequent pages ?
>> >> >>
>> >> >>
>> >> >> JAME VAALET
>> >> >>
>> >> >
>> >>
>> >
>> >
>> >
>> > --
>> >
>> > -JAME
>> >
>>
>
>
>
> --
>
> -JAME
>


Re: paging size in SOLR

2011-08-14 Thread jame vaalet
speaking about page sizes, what is the optimum page size that should be
retrieved each time?
i understand it depends upon the data you are fetching back from each hit
document ... but let's say that whenever a document is hit i am fetching back
100 bytes worth of data from each of those docs in the indexes (along with
the solr response overhead).
this will make 100*x bytes worth of data in each page, if x is the page size ..
what is the optimum value of this x that solr can return each time without
running into exceptions?

On 13 August 2011 19:59, Erick Erickson  wrote:

> Jame:
>
> You control the number via settings in solrconfig.xml, so it's
> up to you.
>
> Jonathan:
> Hmmm, that's seems right, after all the "deep paging" penalty is really
> about keeping a large sorted array in memory but at least you only
> pay it once per 10,000, rather than 100 times (assuming page size is
> 100)...
>
> Best
> Erick
>
> On Wed, Aug 10, 2011 at 10:58 AM, jame vaalet 
> wrote:
> > when you say queryResultCache, does it only cache n number of result for
> the
> > last one query or more than one queries?
> >
> >
> > On 10 August 2011 20:14, simon  wrote:
> >
> >> Worth remembering there are some performance penalties with deep
> >> paging, if you use the page-by-page approach. may not be too much of a
> >> problem if you really are only looking to retrieve 10K docs.
> >>
> >> -Simon
> >>
> >> On Wed, Aug 10, 2011 at 10:32 AM, Erick Erickson
> >>  wrote:
> >> > Well, if you really want to you can specify start=0 and rows=10000 and
> >> > get them all back at once.
> >> >
> >> > You can do page-by-page by incrementing the "start" parameter as you
> >> > indicated.
> >> >
> >> > You can keep from re-executing the search by setting your
> >> queryResultCache
> >> > appropriately, but this affects all searches so might be an issue.
> >> >
> >> > Best
> >> > Erick
> >> >
> >> > On Wed, Aug 10, 2011 at 9:09 AM, jame vaalet 
> >> wrote:
> >> >> hi,
> >> >> i want to retrieve all the data from solr (say 10,000 ids ) and my
> page
> >> size
> >> >> is 1000 .
> >> >> how do i get back the data (pages) one after other ?do i have to
> >> increment
> >> >> the "start" value each time by the page size from 0 and do the
> iteration
> >> ?
> >> >> In this case am i querying the index 10 time instead of one or after
> >> first
> >> >> query the result will be cached somewhere for the subsequent pages ?
> >> >>
> >> >>
> >> >> JAME VAALET
> >> >>
> >> >
> >>
> >
> >
> >
> > --
> >
> > -JAME
> >
>



-- 

-JAME