Re: DIH - Example of using $nextUrl and $hasMore

2009-02-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
Currently the initial counter is not set, so the value becomes an empty string: http://subdomain.site.com/boards.rss?page=${blogs.n} becomes http://subdomain.site.com/boards.rss?page= ; we need to fix this. Unfortunately, the transformer is invoked only after the first chunk is fetched. the best

Re: DIH using values from solrconfig.xml inside data-config.xml

2009-02-04 Thread Lance Norskog
There are two XML library projects that do streaming XPath reads with full expression evaluation: Nux and dom4j. Nux is from LBL and has a kind of BSD-like license, and dom4j is under a BSD license. http://dom4j.org/dom4j-1.6.1/project-info.html http://acs.lbl.gov/nux/ The licensing probably kills these,

Re: Total count of facets

2009-02-04 Thread Bruno Aranda
Maybe I am not clear, but I am not able to find anything on the net. Basically, if I had in my index millions of names starting with A* I would like to know how many distinct surnames are present in the resultset (similar to a distinct SQL query). I will attempt to have a look at the SOLR sources

Re: Total count of facets

2009-02-04 Thread Shalin Shekhar Mangar
On Wed, Feb 4, 2009 at 2:14 PM, Bruno Aranda brunoara...@gmail.com wrote: Maybe I am not clear, but I am not able to find anything on the net. Basically, if I had in my index millions of names starting with A* I would like to know how many distinct surnames are present in the resultset

Re: New wiki pages

2009-02-04 Thread Lance Norskog
I've added them to http://wiki.apache.org/solr/FrontPage under Search and Indexing. I declare open season on them. That is, anyone can edit them for any reason. I'm sure I got some things wrong in memory sizing and sorting. These tips and opinions came from my experience on an index with hundreds

Re: Total count of facets

2009-02-04 Thread Bruno Aranda
Mmh, thanks for your answer but with that I get the count of names starting with A*, but I would like to get the count of distinct surnames (or town names, or any other field that is not the name...) for the people with name starting with A*. Is that possible? Thanks! Bruno 2009/2/4 Shalin
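
The difference Bruno describes, between counting matching documents and counting the distinct values of another field within those matches, can be sketched in a few lines. This is plain Python, not Solr code; the records, field names, and the `count_distinct` helper are made up for illustration.

```python
def count_distinct(docs, match_field, prefix, count_field):
    """Return (number of matching docs, number of distinct count_field values among them)."""
    matching = [d for d in docs if d[match_field].startswith(prefix)]
    distinct = {d[count_field] for d in matching}
    return len(matching), len(distinct)


# Hypothetical sample records, for illustration only.
people = [
    {"name": "Anna", "surname": "Smith"},
    {"name": "Alan", "surname": "Smith"},
    {"name": "Amy", "surname": "Jones"},
    {"name": "Bob", "surname": "Brown"},
]

matches, surnames = count_distinct(people, "name", "A", "surname")
print(matches, surnames)  # 3 matching people, 2 distinct surnames
```

The set built here is the part that costs memory when there are many distinct values, which is the concern raised later in this thread.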

Re: Total count of facets

2009-02-04 Thread Shalin Shekhar Mangar
On Wed, Feb 4, 2009 at 2:53 PM, Bruno Aranda brunoara...@gmail.com wrote: Mmh, thanks for your answer but with that I get the count of names starting with A*, but I would like to get the count of distinct surnames (or town names, or any other field that is not the name...) for the people with

Re: DIH using values from solrconfig.xml inside data-config.xml

2009-02-04 Thread Fergus McMenemie
: The solr data field is populated properly. So I guess that bit works. : I really wish I could use xpath=//para : The limitation comes from streaming the XML instead of creating a DOM. : XPathRecordReader is a custom streaming XPath parser implementation and : streaming is easy only because we

Re: Total count of facets

2009-02-04 Thread Bruno Aranda
Unfortunately, after some tests listing all the distinct surnames or other fields is too slow and too memory consuming with our current infrastructure. Could someone confirm that if I wanted to add this functionality (just count the total of different facets) what I should do is to subclass the

Severe errors in solr configuration

2009-02-04 Thread Anto Binish Kaspar
Hi, I am trying to configure Solr on an Ubuntu server and I am getting the following exception. I am able to get it working on a Windows box. message Severe errors in solr configuration. Check your log files for more detailed information on what may be wrong. If you want solr to continue after

Re: Severe errors in solr configuration

2009-02-04 Thread Olivier Dobberkau
On 04.02.2009 at 13:33, Anto Binish Kaspar wrote: Hi, I am trying to configure Solr on an Ubuntu server and I am getting the following exception. I am able to get it working on a Windows box. Hi Anto. Have you installed the Solr 1.2 package from Ubuntu, or the 1.3 release as a war file? Olivier --

RE: Severe errors in solr configuration

2009-02-04 Thread Anto Binish Kaspar
Hi Olivier Thanks for your quick reply. I am using the release 1.3 as war file. - Anto Binish Kaspar -Original Message- From: Olivier Dobberkau [mailto:olivier.dobber...@dkd.de] Sent: Wednesday, February 04, 2009 6:20 PM To: solr-user@lucene.apache.org Subject: Re: Severe errors in

Re: Severe errors in solr configuration

2009-02-04 Thread Olivier Dobberkau
On 04.02.2009 at 13:54, Anto Binish Kaspar wrote: Hi Olivier Thanks for your quick reply. I am using the release 1.3 as war file. - Anto Binish Kaspar OK. As far as I understand, you need to make sure that your Solr home is set. This needs to be done in Quoting:

RE: Severe errors in solr configuration

2009-02-04 Thread Anto Binish Kaspar
I am using a Context file; here is my solr.xml: $ cat /var/lib/tomcat6/conf/Catalina/localhost/solr.xml <Context docBase="/usr/local/solr/solr-1.3/solr.war" debug="0" crossContext="true"> <Environment name="/solr/home" type="java.lang.String" value="usr/local/solr/solr-1.3/solr" override="true" /> </Context> I

Re: Severe errors in solr configuration

2009-02-04 Thread Olivier Dobberkau
A slash? Olivier Sent from my iPhone On 04.02.2009 at 14:06, Anto Binish Kaspar antobin...@ec.is wrote: I am using Context file, here is my solr.xml $ cat /var/lib/tomcat6/conf/Catalina/localhost/solr.xml <Context docBase="/usr/local/solr/solr-1.3/solr.war" debug="0" crossContext="true">

RE: Severe errors in solr configuration

2009-02-04 Thread Anto Binish Kaspar
Now it’s giving a different message: Severe errors in solr configuration. Check your log files for more detailed information on what may be wrong. If you want solr to continue after configuration errors, change: <abortOnConfigurationError>false</abortOnConfigurationError> in null

Re: Boost function

2009-02-04 Thread Erick Erickson
From Hossman... index time field boosts are a way to express things like "this document's title is worth twice as much as the title of most documents"; query time boosts are a way to express "I care about matches on this clause of my query twice as much as I do about matches to other clauses of my

Re: Severe errors in solr configuration

2009-02-04 Thread Shalin Shekhar Mangar
According to http://wiki.apache.org/solr/SolrTomcat, the JNDI context should be: <Context docBase="/some/path/solr.war" debug="0" crossContext="true"> <Environment name="solr/home" type="java.lang.String" value="/my/solr/home" override="true" /> </Context> Notice that in the snippet you posted, the name was
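
The two points raised in this thread (the leading slash on the entry name, and the missing leading slash on the value) can be checked mechanically. The following is only a sketch in Python using the standard library XML parser; it is not part of Solr or Tomcat, the `context_problems` helper is hypothetical, and the two sample documents are reconstructions of the snippets quoted in this thread with the quoting restored by assumption.

```python
import xml.etree.ElementTree as ET


def context_problems(context_xml):
    """Return a list of likely problems with the solr/home entry of a Tomcat Context file."""
    root = ET.fromstring(context_xml)
    for env in root.findall("Environment"):
        name = env.get("name", "")
        if name.lstrip("/") != "solr/home":
            continue
        issues = []
        if name != "solr/home":
            issues.append("name should be solr/home without a leading slash")
        if not env.get("value", "").startswith("/"):
            issues.append("value does not look like an absolute path")
        return issues
    return ["no solr/home Environment entry found"]


# Reconstruction of the posted file (assumed quoting).
POSTED = """<Context docBase="/usr/local/solr/solr-1.3/solr.war" debug="0" crossContext="true">
  <Environment name="/solr/home" type="java.lang.String"
               value="usr/local/solr/solr-1.3/solr" override="true" />
</Context>"""

# Reconstruction of the wiki form quoted above (assumed quoting).
WIKI = """<Context docBase="/some/path/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="/my/solr/home" override="true" />
</Context>"""

for issue in context_problems(POSTED):
    print("posted:", issue)
print("wiki:", context_problems(WIKI))
```

A check like this only catches the two issues discussed here; it says nothing about whether the directory exists or is readable by the Tomcat user.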

RE: Severe errors in solr configuration

2009-02-04 Thread Anto Binish Kaspar
Yes, I removed it, but I still have the same issue. Any idea what may be the cause of this issue? - Anto Binish Kaspar -Original Message- From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: Wednesday, February 04, 2009 7:42 PM To: solr-user@lucene.apache.org Subject: Re: Severe

Highlighting on Prefix-Search Bug/Workaround (Re: query with stemming, prefix and fuzzy?)

2009-02-04 Thread Gert Brinkmann
Mark Miller wrote: Currently I am thinking about dropping the stemming and only using prefix search. But as highlighting does not work with a prefix house* this is a problem for me. The hint to use house?* instead does not work here. That's because wildcard queries are also not highlightable now.

Differences in output of spell checkers

2009-02-04 Thread Marcus Stratmann
Hello, I'm trying to learn how to use the spell checkers of solr (1.3). I found out that FileBasedSpellChecker and IndexBasedSpellChecker produce different outputs. IndexBasedSpellChecker says <lst name="spellcheck"> <lst name="suggestions"> <lst name="gane">

Boost function

2009-02-04 Thread Tushar_Gandhi
Hi, I want to know about boosting. What is it used for? How can we implement it? And how will it affect my search results? Thanks, Tushar -- View this message in context: http://www.nabble.com/Boost-function-tp21829651p21829651.html Sent from the Solr - User mailing list archive at

Re: Total count of facets

2009-02-04 Thread Bruno Aranda
Thanks, I will try that though I am talking in my case about 100,000+ distinct surnames/towns maximum per query and I just needed the count and not the whole list. In any case, this brute-force approach is still something I can try but I wonder how this will behave speed and memory wise when there

Re: DIH, assigning multiple xpaths to the same solr field: solved

2009-02-04 Thread Fergus McMenemie
Thanks Shalin, Using the following appears to work properly! <field column="para1" name="para" xpath="/record/sect1/para" /> <field column="para2" name="para" xpath="/record/list/listitem/para" /> <field column="para3" name="para" xpath="/a/b/c/para" /> <field column="para4" name="para"

Re: Total count of facets

2009-02-04 Thread Yonik Seeley
On Wed, Feb 4, 2009 at 5:42 AM, Bruno Aranda brunoara...@gmail.com wrote: Unfortunately, after some tests listing all the distinct surnames or other fields is too slow and too memory consuming with our current infrastructure. Could someone confirm that if I wanted to add this functionality

Re: Severe errors in solr configuration

2009-02-04 Thread Olivier Dobberkau
On 04.02.2009 at 15:50, Anto Binish Kaspar wrote: Yes I removed, still I have the same issue. Any idea what may be cause of this issue? Have you solved your problem? Olivier -- Olivier Dobberkau Je TYPO3, desto d.k.d d.k.d Internet Service GmbH Kaiserstr. 79 D 60329 Frankfurt/Main

Re: exceeded limit of maxWarmingSearchers

2009-02-04 Thread Jon Drukman
Otis Gospodnetic wrote: That should be fine (but apparently isn't), as long as you don't have some very slow machine or if your caches are large and configured to copy a lot of data on commit. This is becoming more and more problematic. We have periods where we get 10 of these

Re: DIH using values from solrconfig.xml inside data-config.xml

2009-02-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
The implementation assumed that most users have XML with a fixed schema. In that case, giving an absolute path is not hard. This helps us deal with a large subset of use cases rather easily. We have not added all the features which are possible with a streaming parser. It is wiser to

Multiple uniqueKey problems

2009-02-04 Thread Bruno Mateus
Hello, I'm facing some problems in generating a compound unique key. I'm indexing some database tables not related with each other. In my data-config.xml I have the following <dataConfig> <document name="objectTypes"> <entity name="node" pk="NODEID" query="select * from node"> <field

Custom Sorting Algorithm

2009-02-04 Thread wojtekpia
Is there an easy way to choose/create an alternate sorting algorithm? I'm frequently dealing with large result sets (a few million results) and I might be able to benefit from domain knowledge in my sort. -- View this message in context:

Spell checking not returning full terms

2009-02-04 Thread Rupert Fiasco
We are using Solr 1.3 and trying to get spell checking functionality. FYI, our index contains a lot of medical terms (which might or might not make a difference as they are not English-y words, if that makes any sense?) If I specify a spellcheck query of spellcheck.q=diabtes I get suggestions

Queued Requests during GC

2009-02-04 Thread wojtekpia
During full garbage collection, Solr doesn't acknowledge incoming requests. Any requests that were received during the GC are timestamped the moment GC finishes (at least that's what my logs show). Is there a limit to how many requests can queue up during a full GC? This doesn't seem like a Solr

Re: Spell checking not returning full terms

2009-02-04 Thread Grant Ingersoll
I'm guessing the field you are checking against is being stemmed. The field you spell check against should have minimal analysis done to it, i.e. tokenization and probably downcasing. See http://wiki.apache.org/solr/SpellCheckComponent and

Re: Queued Requests during GC

2009-02-04 Thread Sridhar Basam
That is the expected behaviour, all application threads are paused during GC (CMS collector being an exception, there are smaller pauses but the application threads continue to mostly run). The number of connections that could end up being queued would depend on your acceptCount setting in

Re: exceeded limit of maxWarmingSearchers

2009-02-04 Thread Otis Gospodnetic
Jon, If you can, don't commit on every update and that should help or fully solve your problem. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Jon Drukman jdruk...@gmail.com To: solr-user@lucene.apache.org Sent: Wednesday, February 4,
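
Otis's suggestion amounts to batching: send many updates, commit occasionally. A minimal sketch of that pattern in Python follows. It is not a Solr client; `send` and `commit` are hypothetical callables standing in for whatever the indexing code uses, and the batch size of 10 is an arbitrary choice for the example.

```python
class BatchingIndexer:
    """Send every document immediately, but commit only once per batch_size documents."""

    def __init__(self, send, commit, batch_size):
        self.send = send            # hypothetical: posts one document
        self.commit = commit        # hypothetical: issues one commit
        self.batch_size = batch_size
        self.pending = 0            # documents sent since the last commit

    def add(self, doc):
        self.send(doc)
        self.pending += 1
        if self.pending >= self.batch_size:
            self.flush()

    def flush(self):
        """Commit any documents sent since the last commit."""
        if self.pending:
            self.commit()
            self.pending = 0


# Usage with stand-in callables that only record what happened.
sent, commits = [], []
indexer = BatchingIndexer(sent.append, lambda: commits.append(len(sent)), batch_size=10)
for i in range(25):
    indexer.add({"id": i})
indexer.flush()
print(len(sent), len(commits))  # 25 documents sent, 3 commits issued
```

Each commit opens a new searcher that must warm up, so fewer commits means fewer overlapping warming searchers, which is the limit this thread is about.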

Re: Differences in output of spell checkers

2009-02-04 Thread Grant Ingersoll
On Feb 4, 2009, at 11:02 AM, Marcus Stratmann wrote: Hello, I'm trying to learn how to use the spell checkers of solr (1.3). I found out that FileBasedSpellChecker and IndexBasedSpellChecker produce different outputs. IndexBasedSpellChecker says <lst name="spellcheck"> <lst

Re: Custom Sorting Algorithm

2009-02-04 Thread Otis Gospodnetic
Hi, You can use one of the existing function queries (if they fit your need) or write a custom function query to reorder the results of a query. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: wojtekpia wojte...@hotmail.com To:

Re: Queued Requests during GC

2009-02-04 Thread Otis Gospodnetic
Wojtek, I'm not familiar with the details of Tomcat configuration, but this definitely sounds like a container issue, closely related to the JVM. Doing a thread dump for the Java process (the JVM your Tomcat runs in) while the GC is running will show you which threads are blocked and in turn

Re: Custom Sorting Algorithm

2009-02-04 Thread wojtekpia
That's not quite what I meant. I'm not looking for a custom comparator, I'm looking for a custom sorting algorithm. Is there a way to use quick sort or merge sort or... rather than the current algorithm? Also, what is the current algorithm? Otis Gospodnetic wrote: You can use one of the

Re: Total count of facets

2009-02-04 Thread Erik Hatcher
What about using the luke request handler to get the distinct values count? Although it is pretty seriously heavy on a big index, so probably not quite workable in your case. Erik On Feb 4, 2009, at 12:54 PM, Yonik Seeley wrote: On Wed, Feb 4, 2009 at 5:42 AM, Bruno Aranda

Re: Custom Sorting Algorithm

2009-02-04 Thread Mark Miller
It would not be simple to use a new algorithm. The current implementation takes place at the Lucene level and uses a priority queue. When you ask for the top n results, a priority queue of size n is filled with all of the matching documents. The ordering in the priority queue is the sort. The
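
The general technique Mark describes, keeping only the best n items in a bounded priority queue while scanning all matches, can be sketched as follows. This is a generic Python illustration using `heapq`, not Lucene's implementation; the `(score, doc_id)` tuples and the `top_n` helper are assumptions made for the example.

```python
import heapq


def top_n(scored_docs, n):
    """Return the n highest (score, doc_id) pairs, best first, using a heap of size <= n."""
    heap = []  # min-heap: heap[0] is the weakest entry currently kept
    for entry in scored_docs:
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            # The new entry beats the weakest kept entry, so swap it in.
            heapq.heapreplace(heap, entry)
    return sorted(heap, reverse=True)


# Hypothetical (score, doc_id) pairs.
hits = [(0.2, 1), (0.9, 2), (0.5, 3), (0.7, 4)]
print(top_n(hits, 2))  # [(0.9, 2), (0.7, 4)]
```

With m matches and a queue of size n, this costs on the order of m log n comparisons rather than a full m log m sort, which is why the approach suits requests for a small page of results out of millions of matches.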

Re: Total count of facets

2009-02-04 Thread Yonik Seeley
On Wed, Feb 4, 2009 at 3:47 PM, Erik Hatcher e...@ehatchersolutions.com wrote: What about using the luke request handler to get the distinct values count? That wouldn't restrict results by the base query and filters. -Yonik

Re: Custom Sorting Algorithm

2009-02-04 Thread wojtekpia
Ok, so maybe a better question is: should I bother trying to change the sorting algorithm? I'm concerned that with large data sets, sorting becomes a severe bottleneck (this is an assumption, I haven't profiled anything to verify). Does it become a severe bottleneck? Do you know if alternate sort

Re: Queued Requests during GC

2009-02-04 Thread Yonik Seeley
On Wed, Feb 4, 2009 at 3:12 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: I'd be curious if you could reproduce this in Jetty All application threads are blocked... it's going to be the same in Jetty or Tomcat or any other container that's pure Java. There is an OS level listening

Re: Custom Sorting Algorithm

2009-02-04 Thread Yonik Seeley
On Wed, Feb 4, 2009 at 4:45 PM, wojtekpia wojte...@hotmail.com wrote: Ok, so maybe a better question is: should I bother trying to change the sorting algorithm? I'm concerned that with large data sets, sorting becomes a severe bottleneck (this is an assumption, I haven't profiled anything to

Re: Queued Requests during GC

2009-02-04 Thread Walter Underwood
This is when a load balancer helps. The requests sent around the time that the GC starts will be stuck on that server, but later ones can be sent to other servers. We use a least connections load balancing strategy. Each connection represents a request in progress, so this is the same as
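
A least-connections strategy of the kind Walter mentions can be sketched as below. This is an illustrative Python model, not the configuration of any particular load balancer; the server names and the `LeastConnectionsBalancer` class are hypothetical.

```python
class LeastConnectionsBalancer:
    """Route each request to the server with the fewest requests in progress."""

    def __init__(self, servers):
        self.in_flight = {server: 0 for server in servers}

    def acquire(self):
        # Pick the server with the fewest open connections (first one wins ties).
        server = min(self.in_flight, key=self.in_flight.get)
        self.in_flight[server] += 1
        return server

    def release(self, server):
        self.in_flight[server] -= 1


# Hypothetical scenario: solr-a stalls in a full GC and never finishes its request.
lb = LeastConnectionsBalancer(["solr-a", "solr-b"])
stuck = lb.acquire()          # goes to solr-a, which then pauses for GC
routed = []
for _ in range(3):
    server = lb.acquire()     # solr-a still holds one connection, so solr-b wins
    routed.append(server)
    lb.release(server)        # solr-b answers promptly
print(stuck, routed)
```

Because the paused server keeps its unfinished connection open, it looks busier than its peers and stops receiving new requests until the pause ends.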

Re: Queued Requests during GC

2009-02-04 Thread Mark Miller
Walter Underwood wrote: Also, only use as much heap as you really need. A larger heap means longer GCs. Right. Ideally you want to figure out how to get longer pauses down. There is a lot of fiddling that you can do to improve gc times. On a multiprocessor machine you can parallelize

Re: Queued Requests during GC

2009-02-04 Thread Walter Underwood
On 2/4/09 2:48 PM, Mark Miller markrmil...@gmail.com wrote: If there are spots in Lucene/Solr that are producing so much garbage that we can't keep up, perhaps work can be done to address this upon pinpointing the issues. - Mark I have not had the time to pin it down, but I suspect that

Re: Queued Requests during GC

2009-02-04 Thread Yonik Seeley
On Wed, Feb 4, 2009 at 5:52 PM, Walter Underwood wunderw...@netflix.com wrote: I have not had the time to pin it down, but I suspect that items evicted from the query result cache contain a lot of objects. Are the keys a full parse tree? That could be big. Yes, keys are full Query objects. It

Re: Queued Requests during GC

2009-02-04 Thread Walter Underwood
Aha! I bet that the full Query object became a lot more complicated between Solr 1.1 and 1.3. That would explain why we did 4X as much GC after the upgrade. Items evicted from cache are tenured, so they contribute to the full GC. With an HTTP cache in front, there is hardly anything left to be

Re: Queued Requests during GC

2009-02-04 Thread Mark Miller
Walter Underwood wrote: Aha! I bet that the full Query object became a lot more complicated between Solr 1.1 and 1.3. That would explain why we did 4X as much GC after the upgrade. Items evicted from cache are tenured, so they contribute to the full GC. With an HTTP cache in front, there is

Re: Highlighting Oddities

2009-02-04 Thread ashokc
I have seen some of these oddities that Chris is referring to. In my case, terms that are NOT in the query get highlighted. For example searching for 'Intel' highlights 'Microsot Corp' as well. I do not have them as synonyms either. Do these filter factories add some extra intelligence to the

Re: Queued Requests during GC

2009-02-04 Thread Chris Hostetter
: Aha! I bet that the full Query object became a lot more complicated : between Solr 1.1 and 1.3. That would explain why we did 4X as much GC : after the upgrade. I don't think the Query class implementations themselves changed in any way that would have made them larger -- but if you

Latest on DataImportHandler and Tika?

2009-02-04 Thread Chris Harris
Back in November, Shalin and Grant were discussing integrating DataImportHandler and Tika. Shalin's estimation about the best way to do this was as follows: ** I think the best way would be a TikaEntityProcessor which knows how to handle documents. I guess a typical use-case would be

Re: Queued Requests during GC

2009-02-04 Thread Walter Underwood
On 2/4/09 3:44 PM, Chris Hostetter hossman_luc...@fucit.org wrote: I don't think the Query class implementations themselves changed in any way that would have made them larger -- but if you switched from the standard parser to dismax parser, or started using lots of boost queries, or started

Maximum Term Frequency and Minimum Document Length

2009-02-04 Thread Jonah Schwartz
We want to configure Solr so that fields are indexed with a maximum term frequency and a minimum document length. If a term appears more than N times in a field, it will be considered to have appeared only N times. If a document length is under M terms, it will be considered to have exactly M terms. We
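
The two rules Jonah states can be written down precisely as a small sketch. This is plain Python over a token list, not a Solr or Lucene configuration; the `capped_stats` helper and the sample values of N and M are assumptions made for the example.

```python
from collections import Counter


def capped_stats(tokens, max_tf, min_len):
    """Apply a term-frequency cap (N = max_tf) and a length floor (M = min_len) to one field."""
    term_freqs = {term: min(count, max_tf) for term, count in Counter(tokens).items()}
    effective_length = max(len(tokens), min_len)
    return term_freqs, effective_length


# Hypothetical field content with N = 2 and M = 10.
tokens = ["solr", "solr", "solr", "lucene"]
print(capped_stats(tokens, max_tf=2, min_len=10))
# ({'solr': 2, 'lucene': 1}, 10)
```

The cap limits how much repeating a term can raise its weight, and the floor keeps very short fields from gaining an outsized length-normalization advantage.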

Re: Spell checking not returning full terms

2009-02-04 Thread Rupert Fiasco
Awesome! After reading up on the links you sent me I got it all working. Thanks! FYI - I did previously come across one of the links you sent over: http://wiki.apache.org/solr/SpellCheckerRequestHandler But what threw me off is that when I started reading about that yesterday, in the first

Query on Level of Access to lucene in Solr

2009-02-04 Thread Nick
Hello there, I'm a Solr newbie but I've used Lucene for some complex IR projects before. Can someone please help me understand the extent to which Solr allows access to Lucene? To elaborate, say, I'm considering the use of Solr for all its wonderful properties like scaling,

instanceDir value is incorrect in multicore environment

2009-02-04 Thread Mark Ferguson
Hello, I have a problem with setting the instanceDir property for the cores in solr.xml. When I set the value to be relative, it sets it as relative to the location from which I started the application, instead of relative to the solr.home property. I am using Tomcat and I am creating a context

Re: instanceDir value is incorrect in multicore environment

2009-02-04 Thread Mark Ferguson
I looked at the core status page and it looks like the problem isn't actually the instanceDir property, but rather dataDir. It's not being appended to instanceDir so its path is relative to cwd. I'm using a patched version of Solr with some of my own custom changes relating to dataDir, so this is
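
The symptom Mark describes comes down to which base directory a relative path is joined to. A small Python sketch of the two outcomes follows; the directory names and the `resolve` helper are hypothetical, and `posixpath` is used so the result does not depend on the platform running the example.

```python
import posixpath


def resolve(base_dir, path):
    """Join path onto base_dir; an absolute path is returned unchanged (normalized)."""
    return posixpath.normpath(posixpath.join(base_dir, path))


# Hypothetical directories, for illustration only.
cwd = "/var/lib/tomcat6"            # where the container was started
instance_dir = "/opt/solr/core0"    # the core's instanceDir

print(resolve(cwd, "data"))           # /var/lib/tomcat6/data  (the observed behaviour)
print(resolve(instance_dir, "data"))  # /opt/solr/core0/data   (the expected behaviour)
print(resolve(cwd, "/srv/index"))     # /srv/index             (absolute paths are unaffected)
```

Giving an absolute path sidesteps the question of which base directory applies, at the cost of a less portable configuration.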

Re: Latest on DataImportHandler and Tika?

2009-02-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
We have not taken up anything yet. The idea is to create another contrib which will contain extensions to DIH that have external dependencies, as in SOLR-934. TikaEntityProcessor is something we wish to do, but our limited bandwidth has been the problem On Thu, Feb 5, 2009 at 5:15 AM, Chris Harris

Severe errors in solr configuration

2009-02-04 Thread David Trainor
Hello, I am running Ubuntu 8.10, with Tomcat 6.0.18 installed via the package manager, and I am trying to get Solr 1.3.0 up and running, with no success. I believe I am having the same problem described here: http://www.nabble.com/Severe-errors-in-solr-configuration-td21829562.html When I