Currently the initial counter is not set, so the value becomes an empty string:
http://subdomain.site.com/boards.rss?page=${blogs.n}
becomes
http://subdomain.site.com/boards.rss?page=
We need to fix this. Unfortunately, the transformer is invoked only
after the first chunk is fetched.
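A fix could seed the resolver with a default before the first fetch. A minimal sketch of that fallback behaviour in Python (the regex-based resolver here is illustrative only, not DIH's actual VariableResolver):

```python
import re

def resolve(template, variables, default="1"):
    """Substitute ${name} placeholders; fall back to a default
    instead of an empty string when a variable is unset."""
    def repl(match):
        value = variables.get(match.group(1))
        return str(value) if value is not None else default
    return re.sub(r"\$\{([^}]+)\}", repl, template)

# Before the first chunk is fetched, blogs.n is unset:
url = resolve("http://subdomain.site.com/boards.rss?page=${blogs.n}", {})
print(url)  # ...?page=1 instead of ...?page=
```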
the best
There are two XML library projects that do streaming XPath reads with full
expression evaluation: Nux and dom4j. Nux is from LBL under a BSD-like
license, and dom4j is BSD-licensed.
http://dom4j.org/dom4j-1.6.1/project-info.html
http://acs.lbl.gov/nux/
The licensing probably kills these,
Maybe I am not clear, but I am not able to find anything on the net.
Basically, if I had in my index millions of names starting with A* I would
like to know how many distinct surnames are present in the resultset
(similar to a distinct SQL query).
I will attempt to have a look at the SOLR sources
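Until something built-in exists, one workaround is to facet on the surname field and count the returned entries client-side. A rough sketch, assuming the default flat [term, count, term, count, ...] JSON facet format and a facet.limit high enough to return every term (which is exactly the memory cost at issue):

```python
# Count distinct values of a facet field from a Solr wt=json response,
# where facet_fields holds a flat [term, count, term, count, ...] list.
def distinct_facet_count(response, field):
    flat = response["facet_counts"]["facet_fields"][field]
    # Each facet entry occupies two slots: the term and its count;
    # skip terms with a zero count in the current result set.
    return sum(1 for i in range(0, len(flat), 2) if flat[i + 1] > 0)

resp = {"facet_counts": {"facet_fields": {
    "surname": ["aranda", 3, "smith", 1, "jones", 0]}}}
print(distinct_facet_count(resp, "surname"))  # 2
```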
On Wed, Feb 4, 2009 at 2:14 PM, Bruno Aranda brunoara...@gmail.com wrote:
Maybe I am not clear, but I am not able to find anything on the net.
Basically, if I had in my index millions of names starting with A* I would
like to know how many distinct surnames are present in the resultset
I've added them to http://wiki.apache.org/solr/FrontPage under Search and
Indexing. I declare open season on them. That is, anyone can edit them for
any reason. I'm sure I got some things wrong in memory sizing and sorting.
These tips and opinions came from my experience on an index with hundreds
Mmh, thanks for your answer but with that I get the count of names starting
with A*, but I would like to get the count of distinct surnames (or town
names, or any other field that is not the name...) for the people with name
starting with A*. Is that possible?
Thanks!
Bruno
2009/2/4 Shalin
On Wed, Feb 4, 2009 at 2:53 PM, Bruno Aranda brunoara...@gmail.com wrote:
Mmh, thanks for your answer but with that I get the count of names starting
with A*, but I would like to get the count of distinct surnames (or town
names, or any other field that is not the name...) for the people with
: The solr data field is populated properly. So I guess that bit works.
: I really wish I could use xpath=//para
: The limitation comes from streaming the XML instead of creating a DOM.
: XPathRecordReader is a custom streaming XPath parser implementation and
: streaming is easy only because we
Unfortunately, after some tests listing all the distinct surnames or other
fields is too slow and too memory consuming with our current infrastructure.
Could someone confirm that if I wanted to add this functionality (just count
the total of different facets) what I should do is to subclass the
Hi,
I am trying to configure Solr on an Ubuntu server and I am getting the following
exception. I am able to make it work on a Windows box.
message Severe errors in solr configuration. Check your log files for more
detailed information on what may be wrong. If you want solr to continue after
On 04.02.2009 at 13:33, Anto Binish Kaspar wrote:
Hi,
I am trying to configure Solr on an Ubuntu server and I am getting the
following exception. I am able to make it work on a Windows box.
Hi Anto.
Have you installed the Solr 1.2 package from Ubuntu?
Or the 1.3 release as a war file?
Olivier
--
Hi Olivier
Thanks for your quick reply. I am using the 1.3 release as a war file.
- Anto Binish Kaspar
-Original Message-
From: Olivier Dobberkau [mailto:olivier.dobber...@dkd.de]
Sent: Wednesday, February 04, 2009 6:20 PM
To: solr-user@lucene.apache.org
Subject: Re: Severe errors in
On 04.02.2009 at 13:54, Anto Binish Kaspar wrote:
Hi Olivier
Thanks for your quick reply. I am using the 1.3 release as a war file.
- Anto Binish Kaspar
OK.
As far as I understood, you need to make sure that your Solr home is set.
This needs to be done in
Quoting:
I am using Context file, here is my solr.xml
$ cat /var/lib/tomcat6/conf/Catalina/localhost/solr.xml
<Context docBase="/usr/local/solr/solr-1.3/solr.war"
    debug="0" crossContext="true">
  <Environment name="/solr/home" type="java.lang.String"
      value="usr/local/solr/solr-1.3/solr" override="true" />
</Context>
I
A slash?
Olivier
Sent from my iPhone
On 04.02.2009 at 14:06, Anto Binish Kaspar antobin...@ec.is wrote:
I am using Context file, here is my solr.xml
$ cat /var/lib/tomcat6/conf/Catalina/localhost/solr.xml
<Context docBase="/usr/local/solr/solr-1.3/solr.war"
    debug="0" crossContext="true">
Now it's giving a different message:
Severe errors in solr configuration. Check your log files for more detailed
information on what may be wrong. If you want solr to continue after
configuration errors, change:
<abortOnConfigurationError>false</abortOnConfigurationError> in null
From Hossman...
index time field boosts are a way to express things like "this document's
title is worth twice as much as the title of most documents"; query time
boosts are a way to express "i care about matches on this clause of my
query twice as much as i do about matches to other clauses of my
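The distinction boils down to multipliers applied at different times. A toy sketch (all numbers invented) showing that either kind of boost simply scales a clause's score contribution:

```python
# Toy illustration of index-time vs. query-time boosting: both end up
# as multipliers on a clause's contribution to the document score.
def clause_score(raw_score, index_boost=1.0, query_boost=1.0):
    return raw_score * index_boost * query_boost

# "this document's title is worth twice as much": set at index time.
print(clause_score(0.5, index_boost=2.0))  # 1.0
# "matches on this clause matter twice as much": set at query time.
print(clause_score(0.5, query_boost=2.0))  # 1.0
```

The practical difference is when the multiplier can change: an index-time boost is baked into the index and requires re-indexing to adjust, while a query-time boost can vary per request.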
According to http://wiki.apache.org/solr/SolrTomcat, the JNDI context should
be:
<Context docBase="/some/path/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
      value="/my/solr/home" override="true" />
</Context>
Notice that in the snippet you posted, the name was
Yes, I removed it, but I still have the same issue. Any idea what may be the
cause of this issue?
- Anto Binish Kaspar
-Original Message-
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
Sent: Wednesday, February 04, 2009 7:42 PM
To: solr-user@lucene.apache.org
Subject: Re: Severe
Mark Miller wrote:
Currently I think about dropping the stemming and only use
prefix-search. But as highlighting does not work with a prefix house*
this is a problem for me. The hint to use house?* instead does not
work here.
That's because wildcard queries are also not highlightable right now.
Hello,
I'm trying to learn how to use the spell checkers of solr (1.3). I found
out that FileBasedSpellChecker and IndexBasedSpellChecker produce
different outputs.
IndexBasedSpellChecker says
<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="gane">
Hi,
I want to know about boosting. What is it used for?
How can we implement it, and how will it affect my search results?
Thanks,
Tushar
--
View this message in context:
http://www.nabble.com/Boost-function-tp21829651p21829651.html
Sent from the Solr - User mailing list archive at
Thanks, I will try that though I am talking in my case about 100,000+
distinct surnames/towns maximum per query and I just needed the count and
not the whole list. In any case, this brute-force approach is still
something I can try but I wonder how this will behave speed and memory wise
when there
Thanks Shalin,
Using the following appears to work properly!
<field column="para1" name="para" xpath="/record/sect1/para" />
<field column="para2" name="para" xpath="/record/list/listitem/para" />
<field column="para3" name="para" xpath="/a/b/c/para" />
<field column="para4" name="para"
On Wed, Feb 4, 2009 at 5:42 AM, Bruno Aranda brunoara...@gmail.com wrote:
Unfortunately, after some tests listing all the distinct surnames or other
fields is too slow and too memory consuming with our current infrastructure.
Could someone confirm that if I wanted to add this functionality
On 04.02.2009 at 15:50, Anto Binish Kaspar wrote:
Yes, I removed it, but I still have the same issue. Any idea what may be the
cause of this issue?
Have you solved your problem?
Olivier
--
Olivier Dobberkau
Je TYPO3, desto d.k.d
d.k.d Internet Service GmbH
Kaiserstr. 79
D 60329 Frankfurt/Main
Otis Gospodnetic wrote:
That should be fine (but apparently isn't), as long as you don't have some very
slow machine or caches that are large and configured to copy a lot of
data on commit.
This is becoming more and more problematic. We have periods where we
get 10 of these
The implementation assumed that most users have XML with a
fixed schema. In that case, giving an absolute path is not hard. This
helps us deal with a large subset of use cases rather easily.
We have not added all the features which are possible with a
streaming parser. It is wiser to
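The trade-off described above can be illustrated with a toy streaming reader: matching plain absolute paths needs only an element stack, which is why expressions like //para or predicates are much harder to support. This sketch is illustrative only, not XPathRecordReader's implementation:

```python
import io
import xml.etree.ElementTree as ET

def stream_values(xml_text, absolute_path):
    """Yield the text of elements whose absolute path matches, without
    building a full DOM. Only plain /a/b/c paths are supported."""
    wanted = absolute_path.strip("/").split("/")
    stack = []
    for event, elem in ET.iterparse(io.StringIO(xml_text),
                                    events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
        else:
            if stack == wanted and elem.text:
                yield elem.text
            stack.pop()
            elem.clear()  # free processed elements as we stream

xml = "<record><sect1><para>a</para></sect1><para>top</para></record>"
print(list(stream_values(xml, "/record/sect1/para")))  # ['a']
```

Note how the top-level para is skipped because its stack does not match the absolute path; supporting //para would require matching any suffix of the stack, and predicates would require buffering.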
Hello,
I'm facing some problems in generating a compound unique key. I'm
indexing some database tables not related with each other. In my
data-config.xml I have the following
<dataConfig>
  <document name="objectTypes">
    <entity name="node" pk="NODEID" query="select * from node">
      <field
Is there an easy way to choose/create an alternate sorting algorithm? I'm
frequently dealing with large result sets (a few million results) and I
might be able to benefit from domain knowledge in my sort.
--
View this message in context:
We are using Solr 1.3 and trying to get spell checking functionality.
FYI, our index contains a lot of medical terms (which might or might
not make a difference as they are not English-y words, if that makes
any sense?)
If I specify a spellcheck query of spellcheck.q=diabtes
I get suggestions
During full garbage collection, Solr doesn't acknowledge incoming requests.
Any requests that were received during the GC are timestamped the moment GC
finishes (at least that's what my logs show). Is there a limit to how many
requests can queue up during a full GC? This doesn't seem like a Solr
I'm guessing the field you are checking against is being stemmed. The
field you spell check against should have minimal analysis done to it,
i.e. tokenization and probably downcasing. See http://wiki.apache.org/solr/SpellCheckComponent
and
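For reference, a minimally-analyzed spell-check field might look like the snippet below (the field and type names are illustrative; the tokenizer and filter factories are standard Solr classes):

```xml
<!-- Illustrative schema.xml fragment: a spell field with minimal
     analysis (tokenize + lowercase), and in particular no stemming. -->
<fieldType name="textSpell" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="spell" type="textSpell" indexed="true" stored="false"/>
<copyField source="title" dest="spell"/>
```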
That is the expected behaviour, all application threads are paused
during GC (CMS collector being an exception, there are smaller pauses
but the application threads continue to mostly run). The number of
connections that could end up being queued would depend on your
acceptCount setting in
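For Tomcat, that queue bound is the acceptCount attribute on the HTTP connector. An illustrative server.xml fragment (all values are examples only):

```xml
<!-- Illustrative Tomcat connector: acceptCount bounds the OS-level
     queue of connections waiting while all worker threads are busy,
     e.g. while application threads are paused by a full GC. -->
<Connector port="8080" protocol="HTTP/1.1"
           maxThreads="150"
           acceptCount="100"
           connectionTimeout="20000"/>
```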
Jon,
If you can, don't commit on every update and that should help or fully solve
your problem.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: Jon Drukman jdruk...@gmail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, February 4,
On Feb 4, 2009, at 11:02 AM, Marcus Stratmann wrote:
Hello,
I'm trying to learn how to use the spell checkers of solr (1.3). I
found out that FileBasedSpellChecker and IndexBasedSpellChecker
produce different outputs.
IndexBasedSpellChecker says
lst name=spellcheck
lst
Hi,
You can use one of the exiting function queries (if they fit your need) or
write a custom function query to reorder the results of a query.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: wojtekpia wojte...@hotmail.com
To:
Wojtek,
I'm not familiar with the details of Tomcat configuration, but this definitely
sounds like a container issue, closely related to the JVM.
Doing a thread dump for the Java process (the JVM your Tomcat runs in) while
the GC is running will show you which threads are blocked and in turn
That's not quite what I meant. I'm not looking for a custom comparator, I'm
looking for a custom sorting algorithm. Is there a way to use quick sort or
merge sort or... rather than the current algorithm? Also, what is the
current algorithm?
Otis Gospodnetic wrote:
You can use one of the
What about using the luke request handler to get the distinct values
count? Although it is pretty seriously heavy on a big index, so
probably not quite workable in your case.
Erik
On Feb 4, 2009, at 12:54 PM, Yonik Seeley wrote:
On Wed, Feb 4, 2009 at 5:42 AM, Bruno Aranda
It would not be simple to use a new algorithm. The current
implementation takes place at the Lucene level and uses a priority
queue. When you ask for the top n results, a priority queue of size n is
filled with all of the matching documents. The ordering in the priority
queue is the sort. The
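The bounded-priority-queue idea can be sketched as follows (a simplified illustration, not Lucene's actual HitQueue):

```python
import heapq

def top_n(docs, n, key):
    """Collect the top-n documents with a bounded priority queue:
    the heap never holds more than n entries, regardless of how
    many documents match."""
    heap = []  # min-heap of (sort_value, doc); root is the weakest hit
    for doc in docs:
        item = (key(doc), doc)
        if len(heap) < n:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            heapq.heapreplace(heap, item)  # evict the current weakest
    # Drain in descending sort order.
    return [doc for _, doc in sorted(heap, reverse=True)]

docs = [("a", 3), ("b", 9), ("c", 1), ("d", 7)]
print(top_n(docs, 2, key=lambda d: d[1]))  # [('b', 9), ('d', 7)]
```

This is why asking for the top n is cheap even on millions of matches: the cost is roughly one heap comparison per matching document, not a full sort of the result set.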
On Wed, Feb 4, 2009 at 3:47 PM, Erik Hatcher e...@ehatchersolutions.com wrote:
What about using the luke request handler to get the distinct values count?
That wouldn't restrict results by the base query and filters.
-Yonik
Ok, so maybe a better question is: should I bother trying to change the
sorting algorithm? I'm concerned that with large data sets, sorting
becomes a severe bottleneck (this is an assumption, I haven't profiled
anything to verify). Does it become a severe bottleneck? Do you know if
alternate sort
On Wed, Feb 4, 2009 at 3:12 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
I'd be curious if you could reproduce this in Jetty
All application threads are blocked... it's going to be the same in
Jetty or Tomcat or any other container that's pure Java. There is an
OS level listening
On Wed, Feb 4, 2009 at 4:45 PM, wojtekpia wojte...@hotmail.com wrote:
Ok, so maybe a better question is: should I bother trying to change the
sorting algorithm? I'm concerned that with large data sets, sorting
becomes a severe bottleneck (this is an assumption, I haven't profiled
anything to
This is when a load balancer helps. The requests sent around the
time that the GC starts will be stuck on that server, but later
ones can be sent to other servers.
We use a least connections load balancing strategy. Each connection
represents a request in progress, so this is the same as
Walter Underwood wrote:
Also, only use as much heap as you really need. A larger heap
means longer GCs.
Right. Ideally you want to figure out how to get longer pauses down.
There is a lot of fiddling that you can do to improve gc times.
On a multiprocessor machine you can parallelize
On 2/4/09 2:48 PM, Mark Miller markrmil...@gmail.com wrote:
If there are spots in Lucene/Solr that are producing so much garbage
that we can't keep up, perhaps work can be done to address this upon
pinpointing the issues.
- Mark
I have not had the time to pin it down, but I suspect that
On Wed, Feb 4, 2009 at 5:52 PM, Walter Underwood wunderw...@netflix.com wrote:
I have not had the time to pin it down, but I suspect that items
evicted from the query result cache contain a lot of objects.
Are the keys a full parse tree? That could be big.
Yes, keys are full Query objects.
It
Aha! I bet that the full Query object became a lot more complicated
between Solr 1.1 and 1.3. That would explain why we did 4X as much GC
after the upgrade.
Items evicted from cache are tenured, so they contribute to the full GC.
With an HTTP cache in front, there is hardly anything left to be
Walter Underwood wrote:
Aha! I bet that the full Query object became a lot more complicated
between Solr 1.1 and 1.3. That would explain why we did 4X as much GC
after the upgrade.
Items evicted from cache are tenured, so they contribute to the full GC.
With an HTTP cache in front, there is
I have seen some of these oddities that Chris is referring to. In my case,
terms that are NOT in the query get highlighted. For example searching for
'Intel' highlights 'Microsoft Corp' as well. I do not have them as synonyms
either. Do these filter factories add some extra intelligence to the
: Aha! I bet that the full Query object became a lot more complicated
: between Solr 1.1 and 1.3. That would explain why we did 4X as much GC
: after the upgrade.
I don't think the Query class implementations themselves changed in
any way that would have made them larger -- but if you
Back in November, Shalin and Grant were discussing integrating
DataImportHandler and Tika. Shalin's estimation about the best way to
do this was as follows:
**
I think the best way would be a TikaEntityProcessor which knows how to
handle documents. I guess a typical use-case would be
On 2/4/09 3:44 PM, Chris Hostetter hossman_luc...@fucit.org wrote:
I don't think the Query class implementations themselves changed in
any way that would have made them larger -- but if you switched from the
standard parser to dismax parser, or started using lots of boost
queries, or started
We want to configure Solr so that fields are indexed with a maximum term
frequency and a minimum document length. If a term appears more than N times
in a field, it will be considered to have appeared only N times. If a
document's length is under M terms, it will be considered to be exactly M terms.
We
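In Lucene terms this maps to overriding Similarity's tf and length norm. A sketch of the clamped formulas, mirroring the default sqrt-based tf and 1/sqrt length norm (the N and M values here are placeholders):

```python
import math

TF_CAP = 5       # N: a term is counted at most N times per field
MIN_LENGTH = 20  # M: fields shorter than M terms are treated as length M

def capped_tf(freq):
    # Lucene's DefaultSimilarity uses sqrt(freq); clamp freq first.
    return math.sqrt(min(freq, TF_CAP))

def floored_length_norm(num_terms):
    # The default length norm is 1/sqrt(numTerms); floor the length first.
    return 1.0 / math.sqrt(max(num_terms, MIN_LENGTH))

print(capped_tf(100) == capped_tf(5))                     # True
print(floored_length_norm(3) == floored_length_norm(20))  # True
```

Wiring this in would mean a custom Similarity subclass registered in schema.xml; the functions above only illustrate the clamping arithmetic.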
Awesome! After reading up on the links you sent me I got it all working. Thanks!
FYI - I did previously come across one of the links you sent over:
http://wiki.apache.org/solr/SpellCheckerRequestHandler
But what threw me off is that when I started reading about that
yesterday, in the first
Hello there,
I'm a Solr newbie but I've used Lucene for some complex
IR projects before.
Can someone please help me understand the extent to which Solr allows
access to Lucene?
To elaborate, say, i'm considering the use of solr for all its wonderful
properties like scaling,
Hello,
I have a problem with setting the instanceDir property for the cores in
solr.xml. When I set the value to be relative, it sets it as relative to the
location from which I started the application, instead of relative to the
solr.home property.
I am using Tomcat and I am creating a context
I looked at the core status page and it looks like the problem isn't
actually the instanceDir property, but rather dataDir. It's not being
appended to instanceDir so its path is relative to cwd.
I'm using a patched version of Solr with some of my own custom changes
relating to dataDir, so this is
We have not taken up anything yet. The idea is to create another
contrib which will contain extensions to DIH that have external
dependencies, as SOLR-934.
TikaEntityProcessor is something we wish to do but our limited
bandwidth has been the problem
On Thu, Feb 5, 2009 at 5:15 AM, Chris Harris
Hello,
I am running Ubuntu 8.10, with Tomcat 6.0.18 installed via the package
manager, and I am trying to get Solr 1.3.0 up and running, with no success.
I believe I am having the same problem described here:
http://www.nabble.com/Severe-errors-in-solr-configuration-td21829562.html
When I