Re: change base Scoring Algorithms

2010-03-31 Thread MitchK

Before you do so, you should have a look at function queries. Take a
search engine and look for some examples. I know there are some quite good
ones that influence scoring by the creation date, so that newer documents
are scored higher than older ones.
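For reference, a common recipe of that kind with dismax is a boost function
over document age; a sketch (the created_date field name and the constants
are assumptions):

select?q=foo&defType=dismax&qf=title&bf=recip(ms(NOW,created_date),3.16e-11,1,1)

ms(NOW,created_date) is the document age in milliseconds, and 3.16e-11 is
roughly 1/(one year in ms), so scores decay smoothly over about a year.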

Kind regards
- Mitch
-- 
View this message in context: 
http://n3.nabble.com/change-base-Scoring-Algorithms-tp687041p687722.html
Sent from the Solr - User mailing list archive at Nabble.com.


[ANN] Eclipse GIT plugin beta version released

2010-03-31 Thread Thomas Koch
GIT is one of the most popular distributed version control systems.
In the hope that more Java developers may want to explore the world of easy
branching, merging and patch management, I'd like to inform you that a beta
version of the upcoming Eclipse GIT plugin is available:

http://www.infoq.com/news/2010/03/egit-released
http://aniszczyk.org/2010/03/22/the-start-of-an-adventure-egitjgit-0-7-1/

Maybe, one day, some apache / hadoop projects will use GIT... :-)

(Yes, I know git.apache.org.)

Best regards,

Thomas Koch, http://www.koch.ro


Re: question about synonyms and response

2010-03-31 Thread MitchK

Reading the wiki, one can see that the synonyms are added to the query when
synonym expansion at query time is enabled. That means instead of searching
only for nice, you search, for example, for nice | pretty.

I suggest you read the wiki, searching for SynonymFilter, and consider
the use cases noted there for your own schema.xml.
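For reference, a minimal sketch of query-time expansion in schema.xml (the
field type and synonym entries are illustrative):

<fieldType name="text" class="solr.TextField">
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>

with a line such as "nice, pretty" in synonyms.txt, so a query for nice
becomes nice OR pretty.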

Kind regards
- Mitch
-- 
View this message in context: 
http://n3.nabble.com/question-about-synonyms-and-response-tp686737p687878.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DIH - Unable to connect to a DB using JDBC:ODBC bridge

2010-03-31 Thread MitchK

Hi,

sorry, I don't have much experience doing this with Solr, but my
data-config.xml looks like this:

<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/db" user="user" password="..."
              batchSize="-1"/>
  <document>

  </document>
</dataConfig>

The db at the end of the URL is the name of the database you want to use.
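For completeness, a sketch of the entity/field mapping that goes inside
<document> (the table and column names are hypothetical):

<entity name="item" query="SELECT id, title FROM items">
  <field column="id" name="id"/>
  <field column="title" name="title"/>
</entity>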

Perhaps this helps a little bit.

Kind regards
- Mitch
-- 
View this message in context: 
http://n3.nabble.com/DIH-Unable-to-connect-to-a-DB-using-JDBC-ODBC-bridge-tp686781p687887.html
Sent from the Solr - User mailing list archive at Nabble.com.


Search accross more than one field (dismax) ignored

2010-03-31 Thread MitchK

Hello community,

it seems like the query parser ignores every other field I ask it to search
on. The only one that is not ignored is the default search field.

My search-url:
select/?q=video&qt=dismax&qf=titleMain^2.0+titleShort^5.3&debugQuery=on

The parsedquerystring etc.:
--
<str name="rawquerystring">video</str>
<str name="querystring">video</str>
<str name="parsedquery">
+DisjunctionMaxQuery((titleMain:video^2.0)~0.01)
DisjunctionMaxQuery((titleMain:video^2.0)~0.01)
</str>
<str name="parsedquery_toString">
+(titleMain:video^2.0)~0.01 (titleMain:video^2.0)~0.01
</str>

--

My solrconfig for the dismax handler:

  <requestHandler name="dismax" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <str name="echoParams">explicit</str>
      <float name="tie">0.01</float>
      <str name="qf">
        titleMain^2.5 titleShort^1.9 descriptionMain^2.0 descriptionExcerpt^1.5
      </str>
      <str name="pf">
        titleMain^2.0 titleShort^1.2 descriptionMain^1.2 descriptionExcerpt^1.1
      </str>
      <str name="fl">
        ID,title,score
      </str>
      <str name="mm">
        2&lt;-1 5&lt;-2 6&lt;90%
      </str>
      <int name="ps">10</int>
      <str name="q.alt">*:*</str>
    </lst>
  </requestHandler>

Every field mentioned above is set to indexed="true" in the schema.xml.

Even when I do not query against the dismax requestHandler, a search across
more than one field seems to fail.

Any suggestions on where to look for the error are welcome.

- Mitch
-- 
View this message in context: 
http://n3.nabble.com/Search-accross-more-than-one-field-dismax-ignored-tp687935p687935.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Re: Multiple QParserPlugins, Single RequestHandler

2010-03-31 Thread peter . sturge
Ahh, SearchComponent, that sounds like the one. I'll have a go with that  
and see how I get on. Hooking into log4j might be an option also.

Many thanks for pointing out the right direction.

Peter
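
For reference, a sketch of how such an audit component might be registered in
solrconfig.xml, per Erik's suggestion below (the class name
com.example.AuditLogComponent is hypothetical):

<searchComponent name="auditLog" class="com.example.AuditLogComponent"/>

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <arr name="last-components">
    <str>auditLog</str>
  </arr>
</requestHandler>

The last-components list appends the component after the built-in query
components, so it sees every request that handler serves.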


On Mar 30, 2010 9:54pm, Erik Hatcher erik.hatc...@gmail.com wrote:

On Mar 30, 2010, at 2:43 PM, Peter S wrote:

I have an existing QParserPlugin subclass that does some tagging
functionality (kind of a group alias thing). This is currently registered
with the default queryHandler.

I want to add another, quite separate plugin that writes an audit of
every query request that comes in.

Sounds like what you want is a SearchComponent, not a QParserPlugin.
You'll have to plug it into each request handler in the config though.
or...

Being able to track what has happened on a Solr instance in a
non-repudiated fashion would be [hopefully] useful for others as well (eg
if you're storing/accessing secure documents and need to know every time
someone accesses something). I know there is some logging that tracks
requests etc., but log files are difficult to secure in a
forensically-legal way. Maybe whatever generates the log entries can be
plugged into so that secure, 'tamper-proof' audit trails can be generated?

The logging is able to be hooked, so you could write your own log handler
to write the events elsewhere. This is left as an exercise for the
reader, since it will depend on which logging framework is employed.

Erik


Shred queries on EmbeddedSolrServer

2010-03-31 Thread Claudio Atzori
In my application I need to create and destroy indexes via Java code, so
to bypass the HTTP requests I'm using the EmbeddedSolrServer, and I am
creating different SolrCores, one per every index I need.
Now the point is that a requirement of my application is the capability
to perform a query on a specific index, on a subset of indexes, or on
every index.


I have been looking at the shards parameter:

http://localhost:8080/solr/core1/select?shards=localhost:8080/solr/core1,localhost:8080/solr/core2&q=some
query...


...and OK, but my Solr core instances don't expose an HTTP interface,
so how can I shard a query across all my Solr cores?


Thanks in advance,
Claudio


Re: SOLR-1316 How To Implement this autosuggest component ???

2010-03-31 Thread Andrzej Bialecki

On 2010-03-31 06:14, Andy wrote:



--- On Tue, 3/30/10, Andrzej Bialecki a...@getopt.org wrote:

From: Andrzej Bialecki a...@getopt.org
Subject: Re: SOLR-1316 How To Implement this autosuggest component ???
To: solr-user@lucene.apache.org
Date: Tuesday, March 30, 2010, 9:59 AM

On 2010-03-30 15:42, Robert Muir wrote:

On Mon, Mar 29, 2010 at 11:34 PM, Andy angelf...@yahoo.com wrote:

Reading through this thread and SOLR-1316, there seems to be a lot of
different ways to implement auto-complete in Solr. I've seen the mentions
of:

EdgeNGrams
TermsComponent
Faceting
TST
Patricia Tries
RadixTree
DAWG

Another idea is you can use the Automaton support in the lucene flexible
indexing branch: to query the index directly with a DFA that represents
whatever terms you want back. The idea is that there really isn't much
gain in building a separate Pat, Radix Tree, or DFA to do this when you
can efficiently intersect a DFA with the existing terms dictionary.

I don't really understand what autosuggest needs to do, but if you are
doing things like looking for misspellings you can easily build a DFA
that recognizes terms within some short edit distance with the support
that's there (the LevenshteinAutomata class), to quickly get back
candidates.

You can intersect/concatenate/union these DFAs with prefix or suffix DFAs
if you want to. I don't really understand what the algorithm should do,
but I'm happy to try to help.

The problem is a bit more complicated. There are two issues:

* simple term-level completion often produces wrong results for
multi-term queries (which are usually rewritten as weak phrase queries),

* the weights of suggestions should not correspond directly to IDF in
the index - much better results can be obtained when they correspond to
the frequency of terms/phrases in the query logs ...

TermsComponent and EdgeNGrams, while simple to use, suffer from both
issues.



Thanks.

I actually have 2 use cases for autosuggest:

1) The normal one - I want to suggest search terms to users after they've 
typed a few letters. Just like Google suggest. Looks like for this use case SOLR-1316 is 
the best option. Right?


Hopefully, yes - it depends on how you intend to populate the TST. If 
you populate it from the main index, then (unless you have indexed 
phrases) there won't be any benefit over the TermsComponent. It may be 
faster, but it will take more RAM. If you populate it from a list of 
top-N queries, then SOLR-1316 is the way to go.



2) I have a field city with values that are entered by users. When a user is 
entering his city, I want to make suggestion based on what cities have already been 
entered so far by other users -- in order to reduce chances of duplication. What method 
would you recommend for this use case?


If the city field is not analyzed, then TermsComponent is easiest to
use. If it is analyzed, but the vast majority of cities are single terms,
then TermsComponent is OK too. If you want to assign different
priorities to suggestions (other than a simple IDF-based priority), or
have many city names consisting of multiple tokens, then use SOLR-1316.
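For reference, a TermsComponent prefix request for such suggestions might
look like this (host, port and limit are illustrative; the city field is
from the question):

http://localhost:8983/solr/terms?terms.fl=city&terms.prefix=bos&terms.limit=10

terms.fl names the field to pull terms from, and terms.prefix restricts
results to terms starting with what the user has typed so far.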


--
Best regards,
Andrzej Bialecki 
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: Open Lucene indexs dynamically

2010-03-31 Thread Grant Ingersoll

On Mar 31, 2010, at 5:05 AM, Pierre FILSTROFF wrote:

 Hello!

 I have a question about a specific configuration of Solr: I want to
 configure Solr to open Lucene indexes when it needs them (i.e., open indexes
 dynamically), and close them after a set duration. The goal of this
 configuration is also to get a shorter initial loading of Solr
 (approximately 2 hours at the moment).

How big is your index?  And what kind of hardware are you running?  2 hours 
seems really long.

 Could Solr support this kind of behavior? And what would the
 configuration be?
 
 Thank you!
 
 ++
 
 Pierre
 
 
 
 
 
 

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search



exclude words?

2010-03-31 Thread Sebastian Funk

Hey there,

I'm sure this is a pretty easy thing, but I can't find the solution:
can I search for a text with one word (e.g. books) explicitly not in
it? So Solr returns all documents that don't have books somewhere
in them?


thanks for the help,
sebastian


Re: exclude words?

2010-03-31 Thread Siddhant Goel
I think you can use something like q=hello world -books. Should do.

On Wed, Mar 31, 2010 at 7:34 PM, Sebastian Funk
qbasti.f...@googlemail.comwrote:

 Hey there,

 I'm sure this is a pretty easy thing, but I can't find the solution:
 can I search for a text with one word (e.g. books) explicitly not in it?
 So Solr returns all documents that don't have books somewhere in them?

 thanks for the help,
 sebastian




-- 
- Siddhant


Persistent /repliction remotely...

2010-03-31 Thread Peter Sturge
Hi,

Would anyone know if it is possible to persistently enable/disable
repliction remotely using Solrj (http api)?
i.e. If http://slave_host:port/solr/replication?command=disablepoll is
issued, and slave_host is subsequently restarted, is there a way to
ensure it remembers it isn't supposed to be replic_ting?

Thanks,
Peter
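
For context, polling on a slave is configured in solrconfig.xml; a sketch
(the masterUrl and interval are illustrative):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master_host:port/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>

As far as I know, command=disablepoll only affects the running instance;
leaving pollInterval out of the config is the persistent way to stop
automatic polling.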


Watch-words for the postmaster

2010-03-31 Thread Peter Sturge
Hi,

Sorry for putting this on the user list directly - there seems to be no
allowed response from the postmaster account.

I couldn't figure out why there was so little on the subject of replic_tion
on the user-list, until I posted a question about it.
I found that the email server's spam blocker doesn't like the word
replic_tion, as it thinks the sender is trying to sell dodgy wristware.

Remplikeytion is such a great feature in Solr, it seems a shame for the
forum to miss out on talking/learning more about it simply because of cheap
bling.
If the postmaster of this list could perhaps remove this and related stem
words from its block list, that would be great.


Experience with Solr and JVM heap sizes over 2 GB

2010-03-31 Thread Burton-West, Tom
Hello all,

We have been running a configuration in production with 3 solr instances under 
one  tomcat with 16GB allocated to the JVM.  (java -Xmx16384m -Xms16384m)  I 
just noticed the warning in the LucidWorks Certified Distribution Reference 
Guide that warns against using more than 2GB (see below).
Are other people using systems with over 2GB allocated to the JVM?

What steps can we take to determine if performance is being adversely affected 
by the large heap size?

“The larger the heap the longer it takes to do garbage collection. This can 
mean minor, random pauses or, in extreme cases, “freeze the world” pauses of a 
minute or more. As a practical matter, this can become a serious problem for 
heap sizes that exceed about two gigabytes, even if far more physical memory is 
available.”
http://www.lucidimagination.com/search/document/CDRG_ch08_8.4.1?q=memory%20caching

Tom Burton-West
--
<lst name="jvm">
  <str name="version">14.2-b01</str>
  <str name="name">Java HotSpot(TM) 64-Bit Server VM</str>
  <int name="processors">16</int>
  <lst name="memory">
    <str name="free">2.3 GB</str>
    <str name="total">15.3 GB</str>
    <str name="max">15.3 GB</str>
    <str name="used">13.1 GB (%85.3)</str>
  </lst>
</lst>




Re: Watch-words for the postmaster

2010-03-31 Thread Yonik Seeley
Hmmm, I see plenty of stuff in the archives that mention replication.

Testing:
replica
replication
fake
fake watch
fake watch rolex
rolex replica
top quality replica watch brands
authentic

-Yonik
http://www.lucidimagination.com



On Wed, Mar 31, 2010 at 11:09 AM, Peter Sturge
peter.stu...@googlemail.com wrote:
 Hi,

 Sorry for putting this on the user list directly - there seems to be no
 allowed response from the postmaster account.

 I couldn't figure out why there was so little on the subject of replic_tion
 on the user-list, until I posted a question about it.
 I found that the email server's spam blocker doesn't like the word
 replic_tion, as it thinks the sender is trying to sell dodgy wristware.

 Remplikeytion is such a great feature in Solr, it seems a shame for the
 forum to miss out on talking/learning more about it simply because of cheap
 bling.
 If the postmaster of this list could perhaps remove this and related stem
 words from its block list, that would be great.



Re: Experience with Solr and JVM heap sizes over 2 GB

2010-03-31 Thread Glen Newton
I have used up to 27GB of heap with no issues, both SOLR and (just) Lucene.

-Glen Newton
http://zzzoot.blogspot.com/

On 31 March 2010 11:34, Burton-West, Tom tburt...@umich.edu wrote:
 Hello all,

 We have been running a configuration in production with 3 solr instances 
 under one  tomcat with 16GB allocated to the JVM.  (java -Xmx16384m 
 -Xms16384m)  I just noticed the warning in the LucidWorks Certified 
 Distribution Reference Guide that warns against using more than 2GB (see 
 below).
 Are other people using systems with over 2GB allocated to the JVM?

 What steps can we take to determine if performance is being adversely 
 affected by the large heap size?

 “The larger the heap the longer it takes to do garbage collection. This can 
 mean minor, random pauses or, in extreme cases, “freeze the world” pauses of 
 a minute or more. As a practical matter, this can become a serious problem 
 for heap sizes that exceed about two gigabytes, even if far more physical 
 memory is available.”
 http://www.lucidimagination.com/search/document/CDRG_ch08_8.4.1?q=memory%20caching

 Tom Burton-West
 --
 <lst name="jvm">
 <str name="version">14.2-b01</str>
 <str name="name">Java HotSpot(TM) 64-Bit Server VM</str>
 <int name="processors">16</int>
 <lst name="memory">
 <str name="free">2.3 GB</str>
 <str name="total">15.3 GB</str>
 <str name="max">15.3 GB</str>
 <str name="used">13.1 GB (%85.3)</str>
 </lst>
 </lst>








Re: Experience with Solr and JVM heap sizes over 2 GB

2010-03-31 Thread Yonik Seeley
On Wed, Mar 31, 2010 at 11:34 AM, Burton-West, Tom tburt...@umich.edu wrote:
 Hello all,

 We have been running a configuration in production with 3 solr instances 
 under one  tomcat with 16GB allocated to the JVM.  (java -Xmx16384m 
 -Xms16384m)  I just noticed the warning in the LucidWorks Certified 
 Distribution Reference Guide that warns against using more than 2GB (see 
 below).
 Are other people using systems with over 2GB allocated to the JVM?

Plenty of people.

People always want specific numbers for the general case (how many
documents, how large a heap, etc)... and those specific numbers are
always wrong for a good percent of the population and their specific
setups and needs :-)

In general, you don't want your heap larger than it needs to be - this
leaves more free RAM for the OS to cache important parts of the lucene
index files.

 What steps can we take to determine if performance is being adversely 
 affected by the large heap size?

If your query response latencies are acceptable, I wouldn't worry about it.
If they normally are, but sometimes aren't, then GC could be the
issue.  One way to investigate further is to use the -verbose:gc and
-XX:+PrintGC* options:
http://java.sun.com/javase/technologies/hotspot/vmoptions.jsp
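
For example, a sketch of a launch line with GC diagnostics enabled (the heap
sizes and start command are illustrative):

java -Xms16384m -Xmx16384m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -jar start.jar

Correlating the logged pause times with slow queries shows whether GC is the
cause.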

-Yonik
http://www.lucidimagination.com


Re: Delivery Status Notification (Failure)

2010-03-31 Thread Peter Sturge
Hi Yonik,

Well, I tried to reply to your email with all those spambot favourites, and
I got the mailer daemon error below. Maybe it likes 'private' domains like
LucidImagination better than googlemail...
To be honest, I'd much rather live with mispelling replucidate than put up
with spambots.

Thanks


On Wed, Mar 31, 2010 at 5:18 PM, Mail Delivery Subsystem 
mailer-dae...@googlemail.com wrote:

 Delivery to the following recipient failed permanently:

 solr-user@lucene.apache.org

 Technical details of permanent failure:
 Google tried to deliver your message, but it was rejected by the recipient
 domain. We recommend contacting the other email provider for further
 information about the cause of this error. The error that the other server
 returned was: 552 552 spam score (8.8) exceeded threshold (state 18).

 - Original message -

 MIME-Version: 1.0
 Received: by 10.216.176.72 with HTTP; Wed, 31 Mar 2010 09:18:11 -0700 (PDT)
 In-Reply-To: 
 y2gc68e39171003310838ja9ef7bf6g76b17634ec60b...@mail.gmail.com
 References: 
 r2s7cd732451003310809oa3915e87nb11383c7de363...@mail.gmail.com
 y2gc68e39171003310838ja9ef7bf6g76b17634ec60b...@mail.gmail.com
 Date: Wed, 31 Mar 2010 17:18:11 +0100
 Received: by 10.216.85.140 with SMTP id u12mr751266wee.78.1270052291422;
 Wed,
31 Mar 2010 09:18:11 -0700 (PDT)
 Message-ID: i2g7cd732451003310918qb0d547c0gbcc00d73c332...@mail.gmail.com
 
 Subject: Re: Watch-words for the postmaster
 From: Peter Sturge peter.stu...@googlemail.com
 To: solr-user@lucene.apache.org, yo...@lucidimagination.com
 Content-Type: multipart/alternative; boundary=0016e6db2ad6ad296e04831b1736

 Indeed. Must be me, then - and I don't even wear a watch...


 On Wed, Mar 31, 2010 at 4:38 PM, Yonik Seeley yo...@lucidimagination.com
 wrote:

(...content removed to allow send...)


Query time only Ranges

2010-03-31 Thread abhatna...@vantage.com

Hi All,

I am working on a use case wherein I need to query just time ranges,
without a date component:

search for docs between 4pm - 6pm

Approaches:
create something like [1900-01-01T16:00:00Z TO 1900-01-01T18:00:00Z] - a
fixed date component

or

create a field for hh only

or maybe create a custom field for time only


Please suggest me which will be a good approach or any other approach if
possible


Ankit



-- 
View this message in context: 
http://n3.nabble.com/Query-time-only-Ranges-tp688831p688831.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Some indexing requests to Solr fail

2010-03-31 Thread Jon Poulton
Hi there,
Thanks for the reply!

Our backend code is currently set to commit every time it sends over a  
batch of documents - so it depends on how big the batch is and how  
often edits occur - probably too often. I've looked at the code, and  
the SolrJ commit() method takes two parameters - one is called  
waitSearcher, and another waitFlush. They aren't really documented too  
well, but I assume that the waitSearcher bool (currently set to false)  
may be part of the problem.

I am considering removing the code that calls the commit() method  
altogether and relying on the settings for DirectUpdateHandler2 to  
determine when commits actually get done. That way we can tweak it on  
the Solr side without having to recompile and redeploy our main app  
(or by having to add new settings and code to handle them to our main  
app).
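
For reference, a minimal autoCommit sketch for DirectUpdateHandler2 in
solrconfig.xml (the thresholds are illustrative):

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs>
    <maxTime>60000</maxTime>
  </autoCommit>
</updateHandler>

maxTime is in milliseconds; a commit fires when either threshold is reached.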

Out of curiosity; how are people doing optimize() calls? Are you doing  
them immediately after every commit(), or periodically as part of a job?

Jon

On 31 Mar 2010, at 05:11, Lance Norskog wrote:

 How often do you commit? New searchers are only created after a
 commit. You notice that handleCommit is in the stack trace :) This
 means that commits are happening too often for the amount of other
  traffic currently happening, and so it can't finish creating the
 searcher before the next commit starts the next searcher.

 The service unavailable messages are roughly the same problem: these
 commits might be timing out because the other end is too busy doing
 commits.  You might try using autocommit instead: commits can happen
 every N documents, every T seconds, or both. This keeps the commit
 overhead to a controlled amount and commits should stay behind warming
 up previous searchers.

 On Tue, Mar 30, 2010 at 7:15 AM, Jon Poulton jon.poul...@vyre.com  
 wrote:
 Hi there,
 We have a setup in which our main application (running on a  
 separate Tomcat instance on the same machine) uses SolrJ calls to  
 an instance of Solr running on the same box. SolrJ is used both for  
 indexing and searching Solr. Searching seems to be working fine,  
 but quite frequently we see the following stack trace in our  
 application logs:

 org.apache.solr.common.SolrException: Service Unavailable
 Service Unavailable
 request: http://localhost:8070/solr/unify/update/javabin
  at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request 
 (CommonsHttpSolrServer.java:424)
  at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request 
 (CommonsHttpSolrServer.java:243)
  at  
 org.apache.solr.client.solrj.request.AbstractUpdateRequest.process 
 (AbstractUpdateRequest.java:105)
  at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java: 
 86)
  at vyre.content.rabida.index.RemoteIndexingThread.sendIndexRequest 
 (RemoteIndexingThread.java:283)
  at vyre.content.rabida.index.RemoteIndexingThread.commitBatch 
 (RemoteIndexingThread.java:195)
  at vyre.util.thread.AbstractBatchProcessor.commit 
 (AbstractBatchProcessor.java:93)
  at vyre.util.thread.AbstractBatchProcessor.run 
 (AbstractBatchProcessor.java:117)
  at java.lang.Thread.run(Thread.java:619)

 Looking in the Solr logs, there does not appear to be any problems.  
 The host and port number are correct, its just sometimes our  
 content gets indexed (visible in the solr logs), and sometimes it  
 doesn't (nothing visible in solr logs). I'm not sure what could be  
 causing this problem, but I can hazard a couple of guesses; is  
  there any upper limit on the size of a javabin request, or any
  point at which the service would decide that the POST was too
  large? Has anyone else encountered a similar problem?

 On a final note, scrolling back through the solr logs does reveal  
 the following:

 29-Mar-2010 17:05:25 org.apache.solr.core.SolrCore getSearcher
 WARNING: [unify] Error opening new searcher. exceeded limit of  
 maxWarmingSearchers=2, try again later.
 29-Mar-2010 17:05:25  
 org.apache.solr.update.processor.LogUpdateProcessor finish
 INFO: {} 0 22
 29-Mar-2010 17:05:25 org.apache.solr.common.SolrException log
 SEVERE: org.apache.solr.common.SolrException: Error opening new  
 searcher. exceeded limit of maxWarmingSearchers=2, try again later.
   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java: 
 1029)
   at org.apache.solr.update.DirectUpdateHandler2.commit 
 (DirectUpdateHandler2.java:418)
   at  
 org.apache.solr.update.processor.RunUpdateProcessor.processCommit 
 (RunUpdateProcessorFactory.java:85)
   at org.apache.solr.handler.RequestHandlerUtils.handleCommit 
 (RequestHandlerUtils.java:107)
   at  
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody 
 (ContentStreamHandlerBase.java:48)
   at org.apache.solr.handler.RequestHandlerBase.handleRequest 
 (RequestHandlerBase.java:131)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
   at org.apache.solr.servlet.SolrDispatchFilter.execute 
 (SolrDispatchFilter.java:338)
   at 

Re: DIH - Unable to connect to a DB using JDBC:ODBC bridge

2010-03-31 Thread Silent Surfer
Hi Mitch,

The configuration that you have seems to be perfectly fine.
Could you please let us know what error you are seeing in the logs?

Also, could you please confirm whether you have the
mysql-connector-java-5.1.12-bin.jar under the lib folder?

Following is the configuration that I used, and it works perfectly fine:

<dataSource driver="com.mysql.jdbc.Driver" autoCommit="true"
    url="jdbc:mysql://localhost:3306/mysql" user="username" password="password" />


Thanks,
sS


- Original Message 
From: MitchK mitc...@web.de
To: solr-user@lucene.apache.org
Sent: Wed, March 31, 2010 12:57:04 AM
Subject: Re: DIH - Unable to connect to a DB using JDBC:ODBC bridge


Hi,

sorry, I don't have much experience doing this with Solr, but my
data-config.xml looks like this:

<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/db" user="user" password="..."
              batchSize="-1"/>
  <document>

  </document>
</dataConfig>

The db at the end of the URL is the name of the database you want to use.

Perhaps this helps a little bit.

Kind regards
- Mitch
-- 
View this message in context: 
http://n3.nabble.com/DIH-Unable-to-connect-to-a-DB-using-JDBC-ODBC-bridge-tp686781p687887.html
Sent from the Solr - User mailing list archive at Nabble.com.



  



Re: Query time only Ranges

2010-03-31 Thread Silent Surfer
Hi Ankit,

Try the following approach.
Create a query like [1900-01-01T16:00:00Z/HOUR TO 1900-01-01T18:00:00Z/HOUR]

Solr will automatically take care of rounding to the HOUR specified.

For example:
the query [1900-01-01T16:43:42Z/HOUR TO 1900-01-01T18:55:23Z/HOUR]
would be equivalent to
[1900-01-01T16:00:00Z/HOUR TO 1900-01-01T18:00:00Z/HOUR]

Regards,
sS


- Original Message 
From: abhatna...@vantage.com abhatna...@vantage.com
To: solr-user@lucene.apache.org
Sent: Wed, March 31, 2010 9:56:38 AM
Subject: Query time only Ranges


Hi All,

I am working on a use case wherein I need to query just time ranges,
without a date component:

search for docs between 4pm - 6pm

Approaches:
create something like [1900-01-01T16:00:00Z TO 1900-01-01T18:00:00Z] - a
fixed date component

or

create a field for hh only

or maybe create a custom field for time only


Please suggest me which will be a good approach or any other approach if
possible


Ankit



-- 
View this message in context: 
http://n3.nabble.com/Query-time-only-Ranges-tp688831p688831.html
Sent from the Solr - User mailing list archive at Nabble.com.



  



Re: Query time only Ranges

2010-03-31 Thread Silent Surfer
Small typo. Corrected and resending:

the query [1900-01-01T16:43:42Z/HOUR TO 1900-01-01T18:55:23Z/HOUR]
would be equivalent to
[1900-01-01T16:00:00Z TO 1900-01-01T18:00:00Z]


Thx,
Tiru


- Original Message 
From: Silent Surfer silentsurfe...@yahoo.com
To: solr-user@lucene.apache.org
Sent: Wed, March 31, 2010 12:36:22 PM
Subject: Re: Query time only Ranges

Hi Ankit,

Try the following approach.
Create a query like [1900-01-01T16:00:00Z/HOUR TO 1900-01-01T18:00:00Z/HOUR]

Solr will automatically take care of rounding to the HOUR specified.

For example:
the query [1900-01-01T16:43:42Z/HOUR TO 1900-01-01T18:55:23Z/HOUR]
would be equivalent to
[1900-01-01T16:00:00Z/HOUR TO 1900-01-01T18:00:00Z/HOUR]

Regards,
sS


- Original Message 
From: abhatna...@vantage.com abhatna...@vantage.com
To: solr-user@lucene.apache.org
Sent: Wed, March 31, 2010 9:56:38 AM
Subject: Query time only Ranges


Hi All,

I am working on a use case wherein I need to query just time ranges,
without a date component:

search for docs between 4pm - 6pm

Approaches:
create something like [1900-01-01T16:00:00Z TO 1900-01-01T18:00:00Z] - a
fixed date component

or

create a field for hh only

or maybe create a custom field for time only


Please suggest me which will be a good approach or any other approach if
possible


Ankit



-- 
View this message in context: 
http://n3.nabble.com/Query-time-only-Ranges-tp688831p688831.html
Sent from the Solr - User mailing list archive at Nabble.com.


  


Re: Spatial / Local Solr radius

2010-03-31 Thread Michael
Mauricio,

I hooked up the spatial solr plugin to the Eclipse debugger and
narrowed the problem down to CartesianShapeFilter.getBoxShape(). The
algorithm used in the method can produce values of startX that are
greater than endX depending on the tier level returned by
CartesianTierPlotter.bestFit().  In this case, the for loop is
skipped altogether and the method returns a CartesianShape object with
an empty boxIds list.

I notice the problem when I have small, geographically sparse datasets.

I'm going to shoot the jteam an email regarding this.

Michael D.


On Tue, Mar 30, 2010 at 5:10 PM, Mauricio Scheffer
mauricioschef...@gmail.com wrote:
 Hi Michael

 I exchanged a few mails with jteam, ultimately I realized my longitudes'
 signs were inverted so I was mapping to China instead of U.S. Still a bug,
 but inverting those longitudes fixed the problem in my case since I'm not
 running world-wide searches.
 Before that I ran a test to determine  what radii failed for a grid of 3x3
 lat/long with radius between 10 and 2500, if you're interested I can send
 you the results to compare.
 Also I'm running RC3, I see RC4 is out but haven't tried it.
 It would be interesting to see if this happens with the new spatial
 functions in trunk.

 --
 Mauricio


 On Tue, Mar 30, 2010 at 4:00 PM, Michael solrco...@gmail.com wrote:

 Mauricio,

 I was wondering whether you had heard anything back from jteam
 regarding this issue. I have also noticed it and was wondering why it
 was happening.

 One thing I noticed is that this problem only appears for sparse
 datasets as compared to dense ones. For example, I have two datasets
 I've been testing with - one with 56 U.S. cities (the sparse set)
 and one with over 197000 towns and cities (the dense set). The dense
 set exhibited no problems with consistency searching at various radii,
 but the sparse set exhibited the same issues you experienced.

 Michael D.

 On Mon, Dec 28, 2009 at 7:39 PM, Mauricio Scheffer
 mauricioschef...@gmail.com wrote:
  It's jteam's plugin ( http://www.jteam.nl/news/spatialsolr ) which AFAIK
 is
  just the latest patch for SOLR-773 packaged as a stand-alone plugin.
 
  I'll try to contact jteam directly.
 
  Thanks
  Mauricio
 
  On Mon, Dec 28, 2009 at 8:02 PM, Grant Ingersoll gsing...@apache.org
 wrote:
 
 
  On Dec 28, 2009, at 11:47 AM, Mauricio Scheffer wrote:
 
   q={!spatial lat=43.705 long=116.3635 radius=100}*:*
 
  What's QParser is the spatial plugin? I don't know of any such QParser
 in
  Solr.  Is this a third party tool?  If so, I'd suggest asking on that
 list.
 
  
   with no other parameters.
   When changing the radius to 250 I get no results.
  
   In my config I have startTier = 9 and endTier = 17 (default values)
  
  
   On Mon, Dec 28, 2009 at 1:24 PM, Grant Ingersoll gsi...@gmail.com
  wrote:
  
   What do your queries look like?
  
   On Dec 28, 2009, at 9:30 AM, Mauricio Scheffer wrote:
  
   Hi everyone,
   I'm getting inconsistent behavior from Spatial Solr when searching
 with
   different radii. For the same lat/long I get:
  
   radius=1 - 1 result
   radius=10 - 0 result
   radius=25 - 2 results
   radius=100 - 2 results
   radius=250 - 0 results
  
   I don't understand why radius=10 and 250 return no results. Is this
 a
   known
   bug? I'm using the default configuration as specified in the PDF.
   BTW I also tried LocalSolr with the same results.
  
   Thanks
   Mauricio
  
  
 
  --
  Grant Ingersoll
  http://www.lucidimagination.com/
 
  Search the Lucene ecosystem using Solr/Lucene:
  http://www.lucidimagination.com/search
 
 
 




Re: Spatial / Local Solr radius

2010-03-31 Thread Mccleese, Sean W (388A)
Michael,

This was a problem I encountered as well, sometime late summer last year. My 
memory is a bit hazy on the details, but as far as I remember the problem 
centered around the tier level being set incorrectly. Additionally, I think 
there's a JUnit test (perhaps CartesianShapeFilterTest?) that would indicate 
the source of the problem but large sections of the test are 
invalidated/commented out for the spatial change(s).

Again, I haven't touched this code in several months but that's my recollection 
on the issue. Either way, it's certainly not an isolated problem, though my 
test datasets were also sparse and geographically distant.

-Sean

-- Forwarded Message
From: Michael solrco...@gmail.com
Reply-To: solr-user@lucene.apache.org
Date: Wed, 31 Mar 2010 13:33:39 -0700
To: solr-user@lucene.apache.org
Subject: Re: Spatial / Local Solr radius

Mauricio,

I hooked up the spatial solr plugin to the Eclipse debugger and
narrowed the problem down to CartesianShapeFilter.getBoxShape(). The
algorithm used in the method can produce values of startX that are
greater than endX depending on the tier level returned by
CartesianTierPlotter.bestFit().  In this case, the for loop is
skipped altogether and the method returns a CartesianShape object with
an empty boxIds list.

I notice the problem when I have small, geographically sparse datasets.

I'm going to shoot the jteam an email regarding this.

Michael D.


On Tue, Mar 30, 2010 at 5:10 PM, Mauricio Scheffer
mauricioschef...@gmail.com wrote:
 Hi Michael

 I exchanged a few mails with jteam, ultimately I realized my longitudes'
 signs were inverted so I was mapping to China instead of U.S. Still a bug,
 but inverting those longitudes fixed the problem in my case since I'm not
 running world-wide searches.
 Before that I ran a test to determine  what radii failed for a grid of 3x3
 lat/long with radius between 10 and 2500, if you're interested I can send
 you the results to compare.
 Also I'm running RC3, I see RC4 is out but haven't tried it.
 It would be interesting to see if this happens with the new spatial
 functions in trunk.

 --
 Mauricio


 On Tue, Mar 30, 2010 at 4:00 PM, Michael solrco...@gmail.com wrote:

 Mauricio,

 I was wondering whether you had heard anything back from jteam
 regarding this issue. I have also noticed it and was wondering why it
 was happening.

 One thing I noticed is that this problem only appears for sparse
 datasets as compared to dense ones. For example, I have two datasets
 I've been testing with - one with 56 U.S. cities (the sparse set)
 and one with over 197000 towns and cities (the dense set). The dense
 set exhibited no problems with consistency searching at various radii,
 but the sparse set exhibited the same issues you experienced.

 Michael D.

 On Mon, Dec 28, 2009 at 7:39 PM, Mauricio Scheffer
 mauricioschef...@gmail.com wrote:
  It's jteam's plugin ( http://www.jteam.nl/news/spatialsolr ) which AFAIK
 is
  just the latest patch for SOLR-773 packaged as a stand-alone plugin.
 
  I'll try to contact jteam directly.
 
  Thanks
  Mauricio
 
  On Mon, Dec 28, 2009 at 8:02 PM, Grant Ingersoll gsing...@apache.org
 wrote:
 
 
  On Dec 28, 2009, at 11:47 AM, Mauricio Scheffer wrote:
 
   q={!spatial lat=43.705 long=116.3635 radius=100}*:*
 
  What's QParser is the spatial plugin? I don't know of any such QParser
 in
  Solr.  Is this a third party tool?  If so, I'd suggest asking on that
 list.
 
  
   with no other parameters.
   When changing the radius to 250 I get no results.
  
   In my config I have startTier = 9 and endTier = 17 (default values)
  
  
   On Mon, Dec 28, 2009 at 1:24 PM, Grant Ingersoll gsi...@gmail.com
  wrote:
  
   What do your queries look like?
  
   On Dec 28, 2009, at 9:30 AM, Mauricio Scheffer wrote:
  
   Hi everyone,
   I'm getting inconsistent behavior from Spatial Solr when searching
 with
   different radii. For the same lat/long I get:
  
   radius=1 - 1 result
   radius=10 - 0 result
   radius=25 - 2 results
   radius=100 - 2 results
   radius=250 - 0 results
  
   I don't understand why radius=10 and 250 return no results. Is this
 a
   known
   bug? I'm using the default configuration as specified in the PDF.
   BTW I also tried LocalSolr with the same results.
  
   Thanks
   Mauricio
  
  
 
  --
  Grant Ingersoll
  http://www.lucidimagination.com/
 
  Search the Lucene ecosystem using Solr/Lucene:
  http://www.lucidimagination.com/search
 
 
 




-- End of Forwarded Message



Re: Query time only Ranges

2010-03-31 Thread Shashi Kant
In that case, you could just calculate an offset from 00:00:00 in
seconds (ignoring the date).
Pretty simple.
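
To make that concrete (the time_of_day field name is hypothetical): index an
integer field holding seconds since midnight, and 4pm - 6pm becomes

time_of_day:[57600 TO 64800]

since 16 * 3600 = 57600 and 18 * 3600 = 64800.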


On Wed, Mar 31, 2010 at 4:57 PM, abhatna...@vantage.com
abhatna...@vantage.com wrote:

 Hi Sashi,
 Could you elaborate on point no. 1 in light of the case where a field should
 hold just a time?


 Ankit


 --
 View this message in context: 
 http://n3.nabble.com/Query-time-only-Ranges-tp688831p689413.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: DIH - Unable to connect to a DB using JDBC:ODBC bridge

2010-03-31 Thread MitchK

I can only speculate, since I am not sure why you are using { and } in
your declarations. I don't really know what you expect {MetaMatrix
ODBC} to do.

The mysql-connector can be loaded because I have set a classpath to it (it
is stored in my JRE's root directory).

Hope this helps?

Mitch
-- 
View this message in context: 
http://n3.nabble.com/DIH-Unable-to-connect-to-a-DB-using-JDBC-ODBC-bridge-tp686781p689549.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: resetting stats

2010-03-31 Thread Trey Grainger
: reloading the core just to reset the stats definitely seems like throwing
: out the baby with the bathwater.

Agreed about throwing out the baby with the bath water - if stats need to be
reset, though, then that's the only way today.  A reset stats button would
be a nice way to prevent having to do this.
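
For reference, the reload itself is a single CoreAdmin call (host and port
are illustrative):

http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0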

: Huh? ... how would having an extra core (with no data) help you with
: getting aggregate stats from your request handlers?

Say I have 3 cores named core0, core1, and core2, where only core1 and core2
have documents and caches.  If all my searches hit core0, and core0 shards
out to core1 and core2, then the stats from core0 would be accurate for
errors, timeouts, totalTime, avgTimePerRequest, avgRequestsPerSecond, etc.
Obviously this is based upon the following two assumptions: 1) The request
handlers you are using/monitoring are distributed aware, and 2) you are
using distributed search and all your queries are going to an aggregating
core.

I'm not suggesting that anyone needs a setup like this, just pointing out
that this type of setup somewhat avoids throwing the baby out with the bath
water by not putting a baby in the bath water that is going to be thrown out
(core0).


On Wed, Mar 31, 2010 at 6:40 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:


 : You can reload the core on which you want to reset the stats - this lets
 you
 : keep the engine up and running without requiring you restart Solr.  If
 you

 reloading the core just to reset the stats definitely seems like throwing
 out the baby with the bathwater.

 : have an separate core for aggregating (i.e. a core that contains no data
 and
 : has no caches) then the overhead for reloading that core is negligable
 and
 : the time to reload is essentially zero.

 Huh? ... how would having an extra core (with no data) help you with
 getting aggregate stats from your request handlers?  If you want to know
 the avgTimePerRequest from handlerA, that number isn't going to be useful
 if it comes from a core that isn't what your users are querying
 against

 :  : Is there a way to reset the stats counters? For example in the Query
 :  handler
 :  : avgTimePerRequest is not much use after a while as it is an avg since
 the
 :  : server started.


 -Hoss




Re: resetting stats

2010-03-31 Thread Chris Hostetter

: Say I have 3 Cores names core0, core1, and core2, where only core1 and core2
: have documents and caches.  If all my searches hit core0, and core0 shards
: out to core1 and core2, then the stats from core0 would be accurate for
: errors, timeouts, totalTime, avgTimePerRequest, avgRequestsPerSecond, etc.

Ahhh yes. (I see what you mean by aggregating core now ... I thought
you meant a core just for aggregating stats.)

*If* you are using distributed search, then you can gather stats from the 
core you use for collating/aggregating from the other shards, and 
reloading that core should be cheap.

but if you aren't already using distributed searching, it would be a bad
idea from a performance standpoint to add it just to take advantage of
being able to reload the coordinator core (the overhead of searching one
distributed shard vs doing the same query directly is usually very
measurable, even if the shard is the same Solr instance as your
coordinator)



-Hoss



Re: exclude words?

2010-03-31 Thread Chris Hostetter

: I think you can use something like q=hello world -books. Should do.

or just q=-books ... finds all docs that do not have books (in the 
default search field)

:  so solr returns all documents, that don't have books somewhere in them?

somewhere is kinda vague ... if you mean don't have the word 'books' in 
any field then not unless you use copyField to create a catchall field 
you can query against containing all the text.
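
For reference, a catchall of that kind is set up in schema.xml like this
(the field and type names are illustrative):

<field name="text_all" type="text" indexed="true" stored="false" multiValued="true"/>
<copyField source="*" dest="text_all"/>

A query of -text_all:books then excludes documents containing "books" in any
copied field.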




-Hoss



Re: Solr crashing while extracting from very simple text file

2010-03-31 Thread Ross
Does anyone have any thoughts or suggestions on this?  I guess it's
really a Tika problem. Should I try to report it to the Tika project?

I wonder if someone could try it to see if it's a general problem or
just me. I can reproduce it by firing up the nano editor, creating a
file with XXBLE on one line and nothing else. Try indexing that and
Solr / Tika crashes. I can avoid it by editing the file slightly but I
haven't really been able to discover a consistent pattern. It works if
I change the word to lower case. Also a three line file like this
works

a
a
XXBLE

but not

x
x
XXBLE

It's a bit unfortunate because a similar word (a person's name ??BLE )
with the same problem appears frequently in upper case near the top of
my files.

Cheers
Ross


On Sun, Mar 21, 2010 at 12:58 PM, Ross tetr...@gmail.com wrote:
 Hi all

 I'm trying to import some text files. I'm mostly following Avi
 Rappoport's tutorial.  Some of my files cause Solr to crash while
 indexing. I've narrowed it down to a very simple example.

 I have a file named test.txt with one line. That line is the word
 XXBLE and nothing else

 This is the command I'm using.

 curl 
 http://localhost:8080/solr-example/update/extract?literal.id=1commit=true;
 -F myfi...@test.txt

 The result is pasted below. Other files work just fine. The problem
 seems to be related to the letters B and E. If I change them to
 something else or make them lower case then it works. In my real
 files, the XX is something else but the result is the same. It's a
 common word in the files. I guess for this quick and dirty job I'm
 doing I could do a bulk replace in the files to make it lower case.

 Is there any workaround for this?

 Thanks
 Ross

 <html><head><title>Apache Tomcat/6.0.20 - Error report</title>
 <style>...(stylesheet omitted)...</style></head><body><h1>HTTP Status 500 -
 org.apache.tika.exception.TikaException: Unexpected RuntimeException
 from org.apache.tika.parser.txt.txtpar...@19ccba

 org.apache.solr.common.SolrException:
 org.apache.tika.exception.TikaException: Unexpected RuntimeException
 from org.apache.tika.parser.txt.txtpar...@19ccba
        at 
 org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:211)
        at 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
        at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at 
 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
        at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
        at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
        at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
        at 
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
        at 
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
        at 
 org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
        at java.lang.Thread.run(Thread.java:636)
 Caused by: org.apache.tika.exception.TikaException: Unexpected
 RuntimeException from org.apache.tika.parser.txt.txtpar...@19ccba
        at 
 org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
        at 
 org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:105)
        at 
 org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:190)
        ... 18 more
 Caused by: java.lang.NullPointerException
        at 

Re: Shred queries on EmbeddedSolrServer

2010-03-31 Thread Lance Norskog
You can create and destroy cores over the HTTP interface:

http://www.lucidimagination.com/search/document/CDRG_ch08_8.2.5
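
For reference, sketches of the CoreAdmin calls (host, port, names and paths
are illustrative):

http://localhost:8983/solr/admin/cores?action=CREATE&name=core3&instanceDir=core3
http://localhost:8983/solr/admin/cores?action=UNLOAD&core=core3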

But you are right, the Embedded Solr API does not support Distributed
Search across multiple cores. See:

org.apache.solr.handler.component.SearchHandler.submit(), which very
definitely only issues HTTP requests.

https://issues.apache.org/jira/browse/SOLR-1858 requests this feature.

On Wed, Mar 31, 2010 at 3:51 AM, Claudio Atzori
claudio.atz...@isti.cnr.it wrote:
 In my application I need to create and destroy indexes via Java code, so to
 bypass the HTTP requests I'm using the EmbeddedSolrServer, and I am creating
 different SolrCores, one per every index I need.
 Now the point is that a requirement of my application is the capability to
 perform a query on a specific index, on a subset of indexes, or on every
 index.

 I have been looking at the shards parameter:

 http://localhost:8080/solr/core1/select?shards=localhost:8080/solr/core1,localhost:8080/solr/core2&q=some
 query...

 ...and OK, but my Solr core instances don't expose an HTTP interface, so
 how can I shard a query across all my Solr cores?

 Thanks in advance,
 Claudio




-- 
Lance Norskog
goks...@gmail.com


Re: Some indexing requests to Solr fail

2010-03-31 Thread Lance Norskog
'waitFlush' means 'wait until the data from this commit is completely
written to disk'.  'waitSearcher' means 'wait until Solr has
completely finished loading up the new index from what it wrote to
disk'.

Optimize rewrites the entire on-disk index. It needs an additional
amount of free disk space in the same partition. Usually
people run optimize overnight, not during active production hours.
There is a way to limit the optimize pass so that it makes the index
'more optimized': the maxSegments parameter:

http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22commit.22_and_.22optimize.22
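
For reference, a sketch of a partial optimize posted to the update handler
(the segment count is illustrative):

curl http://localhost:8983/solr/update -H 'Content-type:text/xml' --data-binary '<optimize maxSegments="10"/>'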

On Wed, Mar 31, 2010 at 10:04 AM, Jon Poulton jon.poul...@vyre.com wrote:
 Hi there,
 Thanks for the reply!

 Our backend code is currently set to commit every time it sends over a
 batch of documents - so it depends on how big the batch is and how
 often edits occur - probably too often. I've looked at the code, and
 the SolrJ commit() method takes two parameters - one is called
 waitSearcher, and another waitFlush. They aren't really documented too
 well, but I assume that the waitSearcher bool (currently set to false)
 may be part of the problem.

 I am considering removing the code that calls the commit() method
 altogether and relying on the settings for DirectUpdateHandler2 to
 determine when commits actually get done. That way we can tweak it on
 the Solr side without having to recompile and redeploy our main app
 (or by having to add new settings and code to handle them to our main
 app).

 Out of curiosity; how are people doing optimize() calls? Are you doing
 them immediately after every commit(), or periodically as part of a job?

 Jon

 On 31 Mar 2010, at 05:11, Lance Norskog wrote:

 How often do you commit? New searchers are only created after a
 commit. You notice that handleCommit is in the stack trace :) This
 means that commits are happening too often for the amount of other
  traffic currently happening, and so it can't finish creating the
 searcher before the next commit starts the next searcher.

 The service unavailable messages are roughly the same problem: these
 commits might be timing out because the other end is too busy doing
 commits.  You might try using autocommit instead: commits can happen
 every N documents, every T seconds, or both. This keeps the commit
 overhead to a controlled amount and commits should stay behind warming
 up previous searchers.

 On Tue, Mar 30, 2010 at 7:15 AM, Jon Poulton jon.poul...@vyre.com
 wrote:
 Hi there,
 We have a setup in which our main application (running on a
 separate Tomcat instance on the same machine) uses SolrJ calls to
 an instance of Solr running on the same box. SolrJ is used both for
 indexing and searching Solr. Searching seems to be working fine,
 but quite frequently we see the following stack trace in our
 application logs:

 org.apache.solr.common.SolrException: Service Unavailable
 Service Unavailable
 request: http://localhost:8070/solr/unify/update/javabin
  at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request
 (CommonsHttpSolrServer.java:424)
  at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request
 (CommonsHttpSolrServer.java:243)
  at
 org.apache.solr.client.solrj.request.AbstractUpdateRequest.process
 (AbstractUpdateRequest.java:105)
  at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:
 86)
  at vyre.content.rabida.index.RemoteIndexingThread.sendIndexRequest
 (RemoteIndexingThread.java:283)
  at vyre.content.rabida.index.RemoteIndexingThread.commitBatch
 (RemoteIndexingThread.java:195)
  at vyre.util.thread.AbstractBatchProcessor.commit
 (AbstractBatchProcessor.java:93)
  at vyre.util.thread.AbstractBatchProcessor.run
 (AbstractBatchProcessor.java:117)
  at java.lang.Thread.run(Thread.java:619)

 Looking in the Solr logs, there does not appear to be any problems.
 The host and port number are correct, its just sometimes our
 content gets indexed (visible in the solr logs), and sometimes it
 doesn't (nothing visible in solr logs). I'm not sure what could be
 causing this problem, but I can hazard a couple of guesses; is
  there any upper limit on the size of a javabin request, or any
  point at which the service would decide that the POST was too
  large? Has anyone else encountered a similar problem?

 On a final note, scrolling back through the solr logs does reveal
 the following:

 29-Mar-2010 17:05:25 org.apache.solr.core.SolrCore getSearcher
 WARNING: [unify] Error opening new searcher. exceeded limit of
 maxWarmingSearchers=2, try again later.
 29-Mar-2010 17:05:25
 org.apache.solr.update.processor.LogUpdateProcessor finish
 INFO: {} 0 22
 29-Mar-2010 17:05:25 org.apache.solr.common.SolrException log
 SEVERE: org.apache.solr.common.SolrException: Error opening new
 searcher. exceeded limit of maxWarmingSearchers=2, try again later.
       at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:
 1029)
       at 

backup command

2010-03-31 Thread Jake Brownell
Hi,

I'm running the official Solr 1.4 release and encountering an exception
telling me that a file does not exist when using the Java replication
command=backup. It looks very much like SOLR-1475, which was fixed for 1.4. I
tried adding a deletionPolicy within solrconfig.xml to keep commit points for
30 minutes, but still receive the error. Our index is about 25G. On occasion I
have seen the backup finish, but unfortunately it fails more often.
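
For reference, a deletionPolicy of that shape in solrconfig.xml looks roughly
like this (the values are illustrative):

<deletionPolicy class="solr.SolrDeletionPolicy">
  <str name="maxCommitsToKeep">1</str>
  <str name="maxOptimizedCommitsToKeep">0</str>
  <str name="maxCommitAge">30MINUTES</str>
</deletionPolicy>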

Does anyone have any pointers?

Thanks for your help,
Jake