Re: Replication and querying

2010-02-10 Thread Julian Hille
Hi, it would be possible to add that to the main Solr, but the problem is: let's face it (example): we have about 1.5 million documents in the Solr master. These documents are books. These books have fields like title, IDs, numbers, authors and more. This Solr is global. Now: the slave Solr

Re: after flush: fdx size mismatch on query durring writes

2010-02-10 Thread Michael McCandless
Yes, more details would be great... Is this easily repeated? The exists?=false is particularly spooky. It means, somehow, a new segment was being flushed, containing 1285 docs, but then after closing the doc stores, the stored fields index file (_X.fdx) had been deleted. Can you turn on
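The truncated question is presumably about Lucene's infoStream, the usual first step for flush-time diagnostics in this era. A minimal sketch of turning it on (assuming the Lucene 2.9/3.0-style API, where setInfoStream takes a PrintStream; the index path and log file name are illustrative):

```java
import java.io.File;
import java.io.PrintStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class InfoStreamDemo {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter(
                FSDirectory.open(new File("/tmp/index")),
                new StandardAnalyzer(Version.LUCENE_29),
                IndexWriter.MaxFieldLength.UNLIMITED);
        // Route low-level flush/merge/doc-store diagnostics to a file, so a
        // flush that loses _X.fdx can be traced to the code path that wrote it.
        writer.setInfoStream(new PrintStream("lucene-info.log"));
        // ... add documents, commit, etc. ...
        writer.close();
    }
}
```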

Solr-JMX/Jetty agentId

2010-02-10 Thread Jan Simon Winkelmann
Hi, I am (still) trying to get JMX to work. I have finally managed to get a Jetty installation running with the right parameters to enable JMX. Now the next problem appeared. I need to get Solr to register its MBeans with the Jetty MBeanServer. Using jmx
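For the agentId question: Solr's JMX hook locates an existing MBeanServer by agent id, and the id that Jetty's server actually carries can be discovered by enumerating the in-process servers. A sketch using only standard javax.management calls (nothing here is Solr-specific):

```java
import java.util.List;
import javax.management.MBeanServer;
import javax.management.MBeanServerFactory;
import javax.management.ObjectName;

public class ListAgentIds {
    public static void main(String[] args) throws Exception {
        // null matches every MBeanServer registered in this JVM.
        List<MBeanServer> servers = MBeanServerFactory.findMBeanServer(null);
        for (MBeanServer server : servers) {
            // The agent id lives on the MBeanServerDelegate MBean.
            ObjectName delegate = new ObjectName("JMImplementation:type=MBeanServerDelegate");
            Object agentId = server.getAttribute(delegate, "MBeanServerId");
            System.out.println("MBeanServer agent id: " + agentId);
        }
    }
}
```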

spellcheck

2010-02-10 Thread michaelnazaruk
Hello all! I have a problem with spellcheck! I downloaded, built and connected a dictionary (~500,000 words)! It works fine! But I get suggestions for any word (even correct words)! Is it possible to get suggestions only for wrong words?
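One knob worth trying, assuming the stock SpellCheckComponent in Solr 1.4: spellcheck.onlyMorePopular suppresses suggestions unless they are more frequent than the query term, and spellcheck.extendedResults reports whether the term exists in the index. A SolrJ sketch (URL and query are placeholders):

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SpellcheckDemo {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("beok");
        q.set("spellcheck", "true");
        // Only suggest terms more frequent than the term entered, so
        // correctly spelled (frequent) words stop producing suggestions.
        q.set("spellcheck.onlyMorePopular", "true");
        q.set("spellcheck.extendedResults", "true");
        QueryResponse rsp = solr.query(q);
        System.out.println("correctly spelled? "
                + rsp.getSpellCheckResponse().isCorrectlySpelled());
    }
}
```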

How to not limit maximum number of documents?

2010-02-10 Thread egon . o
Hi all, I'm working with Solr 1.4 and came across the fact that Solr limits the number of documents returned in a response. This number can be changed by the common query parameter 'rows'. In my scenario it is very important that the response contains ALL documents in the index! I

Re: Solr-JMX/Jetty agentId

2010-02-10 Thread Tim Terlegård
2010/2/10 Jan Simon Winkelmann winkelm...@newsfactory.de: I am (still) trying to get JMX to work. I have finally managed to get a Jetty installation running with the right parameters to enable JMX. Now the next problem appeared. I need to get Solr to register its MBeans with the Jetty

RE: analysing wild carded terms

2010-02-10 Thread Fuad Efendi
hello *, quick question, what would I have to change in the query parser to allow wildcarded terms to go through text analysis? I believe it is illogical; wildcarded terms will go through the terms enumerator.

Getting max/min dates from solr index

2010-02-10 Thread Mark N
How can we get the max and min date from the Solr index? I would need these dates to draw a graph (for example a timeline graph). Also, can we use date faceting to show how many documents are indexed every month? Consider I need to draw a timeline graph for the current year to show how many records
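A sketch of two patterns that cover both needs without a custom handler, assuming a date field named pubDate (the field name is illustrative): sort ascending with rows=1 for the min (swap to desc for the max), and facet.date for per-month counts:

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DateRangeDemo {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // Min date: sort ascending, fetch a single document.
        SolrQuery min = new SolrQuery("*:*");
        min.setSortField("pubDate", SolrQuery.ORDER.asc);
        min.setRows(1);
        min.setFields("pubDate");
        QueryResponse minRsp = solr.query(min);
        System.out.println("min: " + minRsp.getResults().get(0).getFieldValue("pubDate"));

        // Monthly document counts for 2010 via date faceting.
        SolrQuery facet = new SolrQuery("*:*");
        facet.setRows(0);
        facet.setFacet(true);
        facet.set("facet.date", "pubDate");
        facet.set("facet.date.start", "2010-01-01T00:00:00Z");
        facet.set("facet.date.end", "2011-01-01T00:00:00Z");
        facet.set("facet.date.gap", "+1MONTH");
        System.out.println(solr.query(facet).getResponse());
    }
}
```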

RE: How to not limit maximum number of documents?

2010-02-10 Thread stefan.maric
I was just thinking along similar lines. As far as I can tell, you can use the parameters 'start' and 'rows' in combination to control the retrieval of query results. So http://host:port/solr/select/?q=query will retrieve up to results 1..10, and http://host:port/solr/select/?q=query&start=11&rows=10 will

Cannot get like exact searching to work

2010-02-10 Thread Aaron Zeckoski
I am using SOLR 1.3 and my server is embedded and accessed using SOLRJ. I would like to set up my searches so that exact matches are the first results returned, followed by near matches, and finally token-based matches. For example, if I have a summary field in the schema which is created using
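One common recipe, sketched under the assumption that a minimally analyzed copyField of summary (here called summary_exact, a hypothetical name) exists in the schema: query both fields with dismax and boost the stricter one, with a phrase boost to push near matches up:

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class ExactFirstDemo {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("apache solr");
        q.set("defType", "dismax");
        // summary_exact: hypothetical minimally-analyzed copy of summary.
        // Field boosts rank exact/near matches above plain token matches.
        q.set("qf", "summary_exact^10 summary^1");
        q.set("pf", "summary^5"); // phrase boost favors near matches
        System.out.println(solr.query(q).getResults().getNumFound());
    }
}
```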

Re: How to not limit maximum number of documents?

2010-02-10 Thread egon . o
Hi Stefan, you are right. I noticed this page-based result handling too. For web pages it is handy to maintain a number-of-results-per-page parameter together with an offset to browse result pages. Both can be done by Solr's 'start' and 'rows' parameters. But as I don't use Solr in a web

RE: How to not limit maximum number of documents?

2010-02-10 Thread stefan.maric
Egon, if you first run your query with q=query&rows=0, then you get back an indication of the total number of docs: <result name="response" numFound="53" start="0"/>. Now your app can query again to get the 1st n rows and manage forward/backward traversal of results by subsequent queries. Regards
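Stefan's two-pass pattern, sketched in SolrJ terms (still subject to the performance caveats raised later in the thread): rows=0 to learn numFound, then page with start/rows, keeping fl narrow as Ron suggests below:

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrDocument;

public class FetchAllDemo {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // Pass 1: rows=0 costs almost nothing and reports the total hit count.
        SolrQuery probe = new SolrQuery("*:*");
        probe.setRows(0);
        long numFound = solr.query(probe).getResults().getNumFound();

        // Pass 2: page through everything in fixed-size chunks.
        int pageSize = 1000;
        for (int start = 0; start < numFound; start += pageSize) {
            SolrQuery page = new SolrQuery("*:*");
            page.setStart(start);
            page.setRows(pageSize);
            page.setFields("id"); // narrow fl to limit memory use
            for (SolrDocument doc : solr.query(page).getResults()) {
                System.out.println(doc.getFieldValue("id"));
            }
        }
    }
}
```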

Re: How to not limit maximum number of documents?

2010-02-10 Thread Ron Chan
Just set the rows to a very large number, larger than the number of documents available. It is useful to set the fl parameter with the fields required, to avoid memory problems if each document contains a lot of information.

AW: Solr-JMX/Jetty agentId

2010-02-10 Thread Jan Simon Winkelmann
2010/2/10 Jan Simon Winkelmann winkelm...@newsfactory.de: I am (still) trying to get JMX to work. I have finally managed to get a Jetty installation running with the right parameters to enable JMX. Now the next problem appeared. I need to get Solr to register its MBeans with the Jetty

Re: How to not limit maximum number of documents?

2010-02-10 Thread egon . o
Setting the 'rows' parameter to a number larger than the number of documents available requires that you know how many are available. That's what I intended to retrieve via the LukeRequestHandler. Anyway, nice approach, Stefan. I'm afraid I forgot this 'numFound' aspect. :) But still, it feels

RE: How to not limit maximum number of documents?

2010-02-10 Thread stefan.maric
Yes, I tried q=query&rows=-1 the other day and gave up. But as you say, it wouldn't help, because you might get a) timeouts, because you have to wait a 'long' time for the large set of results to be returned, or b) exceptions being thrown because you're retrieving too much info

Re: How to not limit maximum number of documents?

2010-02-10 Thread Walter Underwood
Solr will not do this efficiently. Getting all rows will be very slow. Adding a parameter will not make it fast. Why do you want to do this? wunder On Feb 10, 2010, at 7:06 AM, ego...@gmx.de wrote: Setting the 'rows' parameter to a number larger than the number of documents available

Re: How to not limit maximum number of documents?

2010-02-10 Thread egon . o
Okay. So we have to leave this question open for now. There might be other (more advanced) users that can answer this question. For sure, the solution we found is not quite good. In the meantime, I will look for a way to submit a feature request. :)

Re: How to not limit maximum number of documents?

2010-02-10 Thread Ron Chan
I meant available in total, not just what satisfies the particular query. You should have at least an estimate of the total number of documents, even if it grows daily. And if you are talking about millions of rows and you are trying to retrieve them all, IMHO, not getting all of them will

delete via DIH

2010-02-10 Thread Lukas Kahwe Smith
Hi, There is a solution to update via DIH, but is there also a way to define a query that fetches the IDs of documents that should be removed? regards, Lukas Kahwe Smith m...@pooteeweet.org
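For reference, Solr 1.4's DIH supports a deletedPkQuery attribute on the entity for fetching the primary keys of rows to remove during delta imports. As a client-side alternative, a SolrJ sketch (deleted_flag is a hypothetical marker field):

```java
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class DeleteByQueryDemo {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
        // deleted_flag is a hypothetical field marking rows removed upstream.
        solr.deleteByQuery("deleted_flag:true");
        solr.commit();
    }
}
```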

question/suggestion for Solr-236 patch

2010-02-10 Thread gdeconto
I have been able to apply and use the SOLR-236 patch (field collapsing) successfully. Very, very cool and powerful. My one comment/concern is that the collapseCount and aggregate function values in the collapse_counts list only represent the collapsed documents (i.e. the ones that are not shown

Re: analysing wild carded terms

2010-02-10 Thread Joe Calderon
Sorry, what I meant to say is: apply text analysis to the part of the query that is wildcarded. For example, if a term with Latin-1 diacritics is wildcarded, I'd still like to run it through ISOLatin1Filter. On Wed, Feb 10, 2010 at 4:59 AM, Fuad Efendi f...@efendi.ca wrote: hello *, quick question,

Re: question/suggestion for Solr-236 patch

2010-02-10 Thread gdeconto
Joe Calderon-2 wrote: you can do that very easily yourself in a post-processing step after you receive the solr response True (and I am already doing so). I was thinking that if this were done as part of the field collapsing code, it might be faster than doing so via post-processing (i.e. no

RE: analysing wild carded terms

2010-02-10 Thread Steven A Rowe
Hi Joe, See this recent thread from a user with a very similar issue: http://old.nabble.com/No-wildcards-with-solr.ASCIIFoldingFilterFactory--td24162104.html In the above thread, Mark Miller mentions that Lucene's AnalyzingQueryParser should do the trick, but would need to be integrated into
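A sketch of the Lucene side of that suggestion (hedged: AnalyzingQueryParser lives in Lucene contrib, its constructor varies across versions, and it would still need to be wired into Solr): the parser runs the analyzable chunks of a wildcarded term through the supplied analyzer, so an ASCII-folding analyzer normalizes diacritics before term enumeration.

```java
import java.io.Reader;
import org.apache.lucene.analysis.ASCIIFoldingFilter;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.queryParser.analyzing.AnalyzingQueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

public class AnalyzedWildcardDemo {
    public static void main(String[] args) throws Exception {
        Analyzer folding = new Analyzer() {
            @Override
            public TokenStream tokenStream(String field, Reader reader) {
                // Fold diacritics so "café*" and "cafe*" enumerate the same terms.
                return new ASCIIFoldingFilter(
                        new LowerCaseFilter(new WhitespaceTokenizer(reader)));
            }
        };
        AnalyzingQueryParser parser =
                new AnalyzingQueryParser(Version.LUCENE_29, "text", folding);
        Query q = parser.parse("café*");
        System.out.println(q); // wildcard chunks analyzed before term enumeration
    }
}
```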

Re: Distributed search and haproxy and connection build up

2010-02-10 Thread Ian Connor
Thanks, I bypassed haproxy as a test and it did reduce the number of connections - but it did not seem as though these connections were hurting anything. Ian. On Tue, Feb 9, 2010 at 11:01 PM, Lance Norskog goks...@gmail.com wrote: This goes through the Apache Commons HTTP client library:

dismax and multi-language corpus

2010-02-10 Thread Claudio Martella
Hello list, I have a corpus with 3 languages, so I set up a text content field (with no stemming) and 3 text-[en|it|de] fields with specific Snowball stemmers. I copyField the text to my language-aware fields. So, I set up this dismax searchHandler: requestHandler name=content

RE: Indexing / querying multiple data types

2010-02-10 Thread Stefan Maric
Lance, after a bit more reading I've cleaned up my configuration (case sensitivity corrected, but it didn't appear to be affecting the indexing; I don't use the atomID field for querying anyhow). I've added a docType field when I index my data and now use the fq parameter to filter on that new field
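The fq approach keeps the relevance query untouched while restricting the candidate set, and the filter is cached independently of the main query. In SolrJ terms (a sketch; the docType value is whatever was indexed):

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class FilterQueryDemo {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("title:solr");
        // Filter queries are cached separately and don't affect scoring.
        q.addFilterQuery("docType:atom");
        System.out.println(solr.query(q).getResults().getNumFound());
    }
}
```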

DataImportHandler - too many connections MySQL error after upgrade to Solr 1.4 release

2010-02-10 Thread Bojan Šmid
Hi all, I had DataImportHandler working perfectly on Solr 1.4 nightly build from June 2009. I upgraded the Solr to 1.4 release and started getting errors: Caused by: com.mysql.jdbc.exceptions.MySQLNonTransientConnectionException: Server connection failure during transaction. Due to underlying

implementing profanity detector

2010-02-10 Thread Mike Perham
FYI this does not work. It appears that the update runs on a different thread from the analysis, perhaps because the update is done when the commit happens? I'm sending the document XML with commitWithin=6. I would appreciate any other ideas. I'm drawing a blank on how to implement

Re: dismax and multi-language corpus

2010-02-10 Thread Otis Gospodnetic
Claudio - fields with '-' in them can be problematic. Side comment: do you really want to search across all languages at once? If not, maybe 3 different dismax configs would make your searches better. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search
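Following that advice, a hedged sketch: rename the fields to underscore variants (text_en, text_it, text_de are illustrative names) and, if cross-language search really is wanted, list all three in the qf of a dismax query; otherwise point each search at a language-specific config.

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class DismaxDemo {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("stemming test");
        q.set("defType", "dismax");
        // Underscored field names avoid the '-' parsing ambiguity Otis mentions.
        q.set("qf", "text_en text_it text_de");
        System.out.println(solr.query(q).getResults().getNumFound());
    }
}
```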

Need a bit of help, Solr 1.4: type text.

2010-02-10 Thread Dickey, Dan
I'm using the standard text type for a field, and part of the data being indexed is 13th, as in Friday the 13th. I can't seem to get it to match when I'm querying for Friday the 13th either quoted or not. One thing that does match is 13 th if I send the search query with a space between... Any

Re: Need a bit of help, Solr 1.4: type text.

2010-02-10 Thread Yu-Shan Fung
Check out the configuration of WordDelimiterFilterFactory in your schema.xml. Depending on your settings, it's probably tokenizing 13th into 13 and th. You can also have them concatenated back into a single token, but I can't remember the exact parameter. I think it could be catenateAll. On

Re: How to configure multiple data import types

2010-02-10 Thread Chris Hostetter
: Subject: How to configure multiple data import types : In-Reply-To: 4b6c0de5.8010...@zib.de http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh

Re: Indexing / querying multiple data types

2010-02-10 Thread Chris Hostetter
: Subject: Indexing / querying multiple data types : In-Reply-To: 8cf3f00d0572f8479efcd0783be11eb1927...@xmb-rcd-104.cisco.com http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing

Re: Faceting

2010-02-10 Thread Chris Hostetter
: NOTE: Please start a new email thread for a new topic (See : http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking) FWIW: I'm the most nit-picky person I know about thread-hijacking, but I don't see any MIME headers to indicate that Jose did that. : If i follow this path can i then

Re: How to not limit maximum number of documents?

2010-02-10 Thread Chris Hostetter
: Okay. So we have to leave this question open for now. There might be : other (more advanced) users that can answer this question. It's for : sure, the solution we found is not quite good. The question really isn't open, it's a FAQ...

Query elevation based on field

2010-02-10 Thread Jason Chaffee
Is it possible to do query elevation based on field? Basically, I would like to search the same term on three different fields: q=field1:term OR field2:term OR field3:term and I would like to sort the results by a fourth field: sort=field4+asc. However, I would like to elevate

RE: Index Courruption after replication by new Solr 1.4 Replication

2010-02-10 Thread Osborn Chan
Hi All, I found out there is a file corruption issue when using both EmbeddedSolrServer and Solr 1.4 Java-based replication together in the slave server. In my slave server, I have 2 webapps in a tomcat instance: 1) a multicore webapp with slave config 2) my custom webapp using EmbeddedSolrServer while

Re: source tree for lucene

2010-02-10 Thread Chris Hostetter
: I want to recompile Lucene with : http://issues.apache.org/jira/browse/LUCENE-2230, but I'm not sure : which source tree to use. I tried using the implied trunk revision : from the admin/system page but Solr fails to build with the generated : jars, even if I exclude the patches from 2230...

The Riddle of the Underscore and the Dollar Sign . . .

2010-02-10 Thread Christopher Ball
I am perplexed by the behavior I am seeing of the Solr Analyzer and Filters with regard to underscores. I am trying to get rid of underscores ('_') when shingling, but seem unable to do so with a Stopwords Filter. And yet underscores are being removed when I am not even trying to by the

RE: HTTP caching and distributed search

2010-02-10 Thread Chris Hostetter
: I tried your suggestion, Hoss, but committing to the new coordinator : core doesn't change the indexVersion and therefore the ETag value isn't : changed. Hmmm... so the empty commit doesn't change the indexVersion? ... I didn't realize that. Well, I suppose you could replace your empty

Re: Which schema changes are incompatible?

2010-02-10 Thread Chris Hostetter
: http://wiki.apache.org/solr/FAQ#How_can_I_rebuild_my_index_from_scratch_if_I_change_my_schema.3F : : but it is not clear about the times when this is needed. So I wonder, do I : need to do it after adding a field, removing a field, changing field type, : changing indexed/stored/multiValue

Re: dismax and multi-language corpus

2010-02-10 Thread Jason Rutherglen
Claudio - fields with '-' in them can be problematic. Why's that? On Wed, Feb 10, 2010 at 2:38 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Claudio - fields with '-' in them can be problematic. Side comment: do you really want to search across all languages at once? If not,

Question on Solr Scalability

2010-02-10 Thread abhishes
Suppose I am indexing very large data (5 billion rows in a database). Now I want to use the Solr Core feature to split the index into manageable chunks. However I have two questions: 1. Can cores reside on different physical servers? 2. When a query comes, will the query be answered by index

Re: Question on Solr Scalability

2010-02-10 Thread Juan Pedro Danculovic
To scale Solr, take a look at this article: http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr Juan Pedro Danculovic CTO - www.linebee.com On Thu, Feb 11, 2010 at 4:12 AM, abhishes abhis...@gmail.com wrote: Suppose I am indexing very large data

Re: Question on Solr Scalability

2010-02-10 Thread David Stuart
Hi, I think your needs would be better met by Distributed Search http://wiki.apache.org/solr/DistributedSearch which allows shards to live on different servers and will search across all of those shards when a query comes in. There are a few patches which will hopefully be available in the
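For the earlier question about cores on different physical servers: with distributed search each shard lives on its own host, and any node can coordinate the query via the shards parameter. A sketch (host names are placeholders):

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class DistributedSearchDemo {
    public static void main(String[] args) throws Exception {
        // Send the query to any one shard; it fans out to the others.
        CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://shard1:8983/solr");
        SolrQuery q = new SolrQuery("author:tolkien");
        q.set("shards", "shard1:8983/solr,shard2:8983/solr,shard3:8983/solr");
        System.out.println(solr.query(q).getResults().getNumFound());
    }
}
```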