Hi.
On the Sunspot (a Ruby Solr client) Wiki
(https://github.com/outoftime/sunspot/wiki/Matching-substrings-in-fulltext-search)
it says that the NGramFilter should allow substring indexing. As I
never got it working, I searched a bit and found this site:
Hello,
We have been running read only solr instances for a few months now,
yesterday I noticed high CPU usage coming from the JVM; it
simply uses 100% of the CPU for no reason.
Nothing was changed; we are using Jetty as the servlet container for Solr.
Where can I start looking for what causes
Hi Hoss,
Ok, that makes much more sense now. I was under the impression that values
were copied as well, which seemed a bit odd..
unless you have to deal with a use case similar to yours. :)
Cheers,
- Savvas
On 9 February 2011 02:25, Chris Hostetter hossman_luc...@fucit.org wrote:
: In my
hi,
we have a problem with our Solr test instance.
This instance is running 90 cores with about 2 GB of index data per core.
This worked fine for a few weeks.
Now we get an exception when querying data from one core:
java.lang.IndexOutOfBoundsException: Index: 104, Size: 11
at
Hello everyone,
I am currently developing a search solution based on Apache Solr. I have the
problem that I want to offer users the possibility to maintain
synonyms and stopwords in a user-friendly tool. But currently I cannot find
any way to write the stopwords.txt or
Timo,
On Wed, Feb 9, 2011 at 11:07 AM, Timo Schmidt timo.schm...@aoemedia.de wrote:
But currently I cannot find any way to write the stopwords.txt or
synonyms.txt.
What about writing the files from an external application and reloading
your Solr core!?
Seems to be the simplest way to
Hi Stefan,
I already thought about that. Maybe some PHP service or something like that.
But this would mean that I need additional software on that server, like a
normal Apache installation, which needs to be maintained. That's why I thought
a solution that is built into Solr would be nice.
Hi Timo,
of course - that's right. You could write some JSP (I guess) which could be
integrated into the already existing Jetty/Tomcat server?
Just wondering: how do you perform search requests to Solr?
Normally there is already some other service running which acts as a
'proxy' to the outer world? ;)
The parsed data is only sent to the Solr index if you tell a segment to be
indexed: solrindex crawldb linkdb segment
If you did this only once after injecting, and then ran the subsequent
fetch, parse, update, index sequence, then you, of course, only see those URLs.
If you don't index a segment
The show stopper for JTS is its license, unfortunately. Otherwise, I think it
would be done already! We could, since it's LGPL, make it an optional
dependency, assuming someone can stub it out.
On Feb 8, 2011, at 11:18 PM, Adam Estrada wrote:
I just came across a ~nudge post over in the
How could I stub this out, not being a Java guy? What is needed in order to do
this?
Licensing is always going to be an issue with JTS, which is why I am interested
in the SIS project sitting in incubation right now.
I'm willing to put forth the effort if I had a little direction from the peanut
Yes we have something, but on another machine.
Timo Schmidt
Developer (Diplom-Informatiker FH)
AOE media GmbH
Borsigstr. 3
65205 Wiesbaden
Germany
Tel. +49 (0) 6122 70 70 7 - 234
Fax. +49 (0) 6122 70 70 7 -199
e-Mail: timo.schm...@aoemedia.de
Web: http://www.aoemedia.de/
Mandatory information
Thought I would share this on web mapping... it's a great write-up and something
to consider when talking about working with spatial data.
http://www.tokumine.com/2010/09/20/gis-data-payload-sizes/
Adam
On Feb 9, 2011, at 7:03 AM, Grant Ingersoll gsing...@apache.org wrote:
The show stopper
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.EdgeNGramFilterFactory
There is only EdgeNGramFilterFactory listed (which I got working for
prefix indexing), but no NGramFilterFactory. Is that filter not
supported anymore, or is that list not up to date?
It should be there.
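For anyone wanting to check what the filter actually produces, here is a minimal, self-contained sketch (assuming a Lucene 3.1-era API; the class name NGramDemo, the Version constant and the sample text are only illustrative and may need adjusting for other releases):

import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.ngram.NGramTokenFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class NGramDemo {
    public static void main(String[] args) throws Exception {
        // Split on whitespace, then break each token into 2- and 3-character grams;
        // indexing those grams is what makes substring matching possible at query time.
        TokenStream ts = new NGramTokenFilter(
                new WhitespaceTokenizer(Version.LUCENE_31, new StringReader("usb cable")), 2, 3);
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        ts.reset();
        while (ts.incrementToken()) {
            System.out.println(term.toString()); // e.g. us, sb, usb, ca, ab, ...
        }
        ts.end();
        ts.close();
    }
}

In schema.xml the equivalent is a solr.NGramFilterFactory entry (with minGramSize/maxGramSize) in the field type's analyzer chain, right after the tokenizer.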
Grant,
How could I stub this out, not being a Java guy? What is needed in order to do
this?
Licensing is always going to be an issue with JTS, which is why I am interested
in the SIS project sitting in incubation right now.
I'm willing to put forth the effort if I had a little direction on
Thought I would share this on web mapping... it's a great write-up and something
to consider when talking about working with spatial data.
http://www.tokumine.com/2010/09/20/gis-data-payload-sizes/
Adam
On Feb 9, 2011, at 7:03 AM, Grant Ingersoll wrote:
The show stopper for JTS is its
Timo,
then use cronjobs on your Solr machine to fetch the generated
synonyms file, put it in the correct location, and reload the
core configuration (which is required to pick up the updated synonyms file)? :)
Regards
Stefan
On Wed, Feb 9, 2011 at 1:15 PM, Timo Schmidt timo.schm...@aoemedia.de wrote:
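If the tool that writes the files is Java-based, the reload Stefan mentions can also be triggered through SolrJ's CoreAdmin API instead of a cron-driven HTTP call; a rough sketch (the URL and core name are placeholders):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class ReloadCore {
    public static void main(String[] args) throws Exception {
        // CoreAdmin requests go against the Solr root URL, not a specific core.
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        // Reload the core so the freshly written synonyms.txt / stopwords.txt take effect.
        CoreAdminRequest.reloadCore("core0", server);
    }
}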
I think we had a similar exception recently when attempting to sort on a
multi-valued field ... could that be possible in your case?
André
-----Original Message-----
From: Dominik Lange [mailto:dominikla...@searchmetrics.com]
Sent: Wednesday, 9 February 2011 10:55
To:
You can try attaching jConsole to the process to see what it shows. If
you're on a *nix box
you can get a gross idea what's going on with top.
Best
Erick
On Wed, Feb 9, 2011 at 4:31 AM, Erez Zarum e...@icinga.org.il wrote:
Hello,
We have been running read only solr instances for a few months
No, we do not have multivalued fields and we do not sort (in this case).
We reindexed the CSV file and the error disappeared, but it would be interesting
to know why this error occurred...
Thank you for your suggestion.
Dominik
-----Original Message-----
From: André Widhani
In addition to Koji's note, see the bold comment at the top of that
page that says this is not a complete list; the definitive list is
always the Javadocs...
Best
Erick
On Wed, Feb 9, 2011 at 3:34 AM, Kai Schlamp schl...@gmx.de wrote:
Hi.
On the Sunspot (a Ruby Solr client) Wiki
(
Hello,
On Tue, Feb 8, 2011 at 11:12 PM, Grant Ingersoll gsing...@apache.org wrote:
It's a little hard to read due to the indentation, but AFAICT you have two
terms, usb and cabl. USB appears at position 0 and cabl at position 1.
Those are the relative positions to each other. Perhaps you
Hi Markus,
I am sorry for not being clear, I meant to say that...
Suppose a URL, namely www.somehost.com/gifts/greetingcard.html (which in
turn contains links to a.html, b.html, c.html, d.html), is injected into the
seed.txt; after the whole process I was expecting a bunch of other pages
which
WARNING: I don't do Nutch much, but could it be that your
crawl depth is 1? See:
http://wiki.apache.org/nutch/NutchTutorial and search for depth
Best
Erick
On Wed, Feb 9, 2011 at 9:06 AM, .: Abhishek :. ab1s...@gmail.com wrote:
Hi Markus,
I am sorry
Are you using the depth parameter with the crawl command or are you using the
separate generate, fetch etc. commands?
What's $ nutch readdb crawldb -stats returning?
On Wednesday 09 February 2011 15:06:40 .: Abhishek :. wrote:
Hi Markus,
I am sorry for not being clear, I meant to say
On Tue, Feb 8, 2011 at 9:02 PM, Andy angelf...@yahoo.com wrote:
Is it possible to do a query like {!boost b=log(popularity)}foo over sharded
indexes?
Yep, that should work fine.
-Yonik
http://lucidimagination.com
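For reference, a rough SolrJ sketch of such a sharded boost query (the shard host names and the popularity field are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ShardedBoostQuery {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://host1:8983/solr");
        // {!boost} multiplies the score of the wrapped query by the b function.
        SolrQuery q = new SolrQuery("{!boost b=log(popularity)}foo");
        // Fan the request out over both shards; the coordinating node merges the results.
        q.set("shards", "host1:8983/solr,host2:8983/solr");
        QueryResponse rsp = server.query(q);
        System.out.println("numFound=" + rsp.getResults().getNumFound());
    }
}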
Hi Erick,
Thanks a bunch for the response
Could be a chance.. but all I am wondering is where to specify the depth in
the whole process described at
http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/ ? I tried
specifying it during the fetcher phase but it was just ignored :(
Hi Solr Users,
We are in the process of upgrading from Solr 1.3 to Solr 1.4.1.
While performing a stress test on Solr 1.4.1 to measure the performance
improvement in query times (QTime) and confirm there were no more blocked
threads, we ran into memory issues with Solr 1.4.1.
Test Setup details:
- 2 identical hosts
Searching and sorting is now done on a per-segment basis, meaning that
the FieldCache entries used for sorting and for function queries are
created and used per-segment and can be reused for segments that don't
change between index updates. While generally beneficial, this can lead
to increased
Solr does handle concurrency fine. But there is NOT transaction isolation
like you'll get from an rdbms. All 'pending' changes are (conceptually, anyway)
held in a single queue, and any commit will commit ALL of them. There aren't
going to be any data corruption issues or anything from
However, the Solr book, in the Commit, Optimise, Rollback section reads:
if more than one Solr client were to submit modifications and commit them
at similar times, it is possible for part of one client's set of changes to
be committed before that client told Solr to commit
which suggests
Hello,
Thanks very much for your quick replies.
So, according to Pierre, all updates will be immediately posted to Solr, but
all commits will be serialised. But doesn't that contradict Jonathan's
example where you can end up with FIVE 'new indexes' being warmed? If
commits are serialised, then
Don't think commit, that is confusing. Solr is not a database. In particular,
it does not have the isolation property from ACID.
Solr indexes new documents as a batch, then installs a new version of the
entire index. Installing a new index isn't instant, especially with warming
queries. Solr
Hi Savvas,
well, although it sounds strange: If a commit happens, a new Index Searcher
is warming. If a new commit happens while a 'new' Index Searcher is warming,
another Index Searcher is warming. So, at this point in time, you have 3
Index Searchers: the old one, the 'new' one and the newest
Well, Jonathan's explanations are much more accurate than mine. :)
I took the word 'serialization' as meaning a kind of isolation between commits,
which is not very smart. Sorry to have introduced more confusion into this.
Pierre
-----Original Message-----
From: Savvas-Andreas Moysidis
Yes, we'll probably go towards that path as our index files are relatively
small, so auto warming might not be extremely useful in our case..
Yep, we do realise the difference between a db and a Solr commit. :)
Thanks.
On 9 February 2011 16:15, Walter Underwood wun...@wunderwood.org wrote:
Thanks very much Em.
- Savvas
On 9 February 2011 16:22, Savvas-Andreas Moysidis
savvas.andreas.moysi...@googlemail.com wrote:
Yes, we'll probably go towards that path as our index files are relatively
small, so auto warming might not be extremely useful in our case..
Yep, we do realise the
Hi All,
This is Rahul, and I am using Solr for one of my upcoming projects.
I have a query regarding search term counts using Solr.
We have a requirement in one of our search-based projects to search the
results based on search term counts per document.
For example,
if a user searches for something like
Hi Abhishek,
depth is a param of the crawl command, not the fetch command.
If you are using a custom script calling the individual stages of the Nutch
crawl, then depth N means you run that script N times. You can put a
loop in the script.
Thanks,
Charan
On Wed, Feb 9, 2011 at 6:26 AM, .: Abhishek :.
Hello folks,
I have a question regarding a custom QueryWeight implementation for a special
use case.
For the current use case we want to experiment with different values for the
idf, based on different algorithms, and see how they affect the scoring.
Is there a way to plug in a custom weight implementation
On Wed, Feb 9, 2011 at 12:16 PM, Em mailformailingli...@yahoo.de wrote:
For the current use case we want to experiment with different values for the
idf, based on different algorithms, and see how they affect the scoring.
For tf, idf, lengthNorm, coord, etc., see Similarity.
Solr already allows you to
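If the experiment is only about the idf formula, overriding Similarity is usually enough. A minimal sketch, assuming the Lucene 2.9/3.x Similarity API that Solr 1.4 ships with; the class and formula here are just an example, and it would then be registered in schema.xml via a <similarity class="..."/> element:

import org.apache.lucene.search.DefaultSimilarity;

// Experimental idf: plain log(N / (df + 1)) instead of the default log(N / (df + 1)) + 1.
public class ExperimentalIdfSimilarity extends DefaultSimilarity {
    @Override
    public float idf(int docFreq, int numDocs) {
        return (float) Math.log((double) numDocs / (docFreq + 1));
    }
}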
I suspect it's worthwhile to back up and ask whether this is a reasonable
requirement. What is the use-case? Because unless the input is very
uniform, I wouldn't be surprised if this will produce poor results. For
instance,
if solr appears once in a field 5 words long and 5 times in another
Dear Adam,
I also got the OutOfMemory exception. I changed the JAVA_OPTS in catalina.sh
as follows.
...
if [ -z "$LOGGING_MANAGER" ]; then
JAVA_OPTS="$JAVA_OPTS -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager"
else
JAVA_OPTS="$JAVA_OPTS -server -Xms8096m -Xmx8096m"
Hi Yonik,
thanks for the fast feedback.
Well, as far as I can see there is no possibility to get the original query
from the similarity-class...
Let me ask differently: I know there are some distributed
idf-implementations out there.
One approach is to ask every shard for its idf for a term
Bing Li,
One should be conservative when setting Xmx. Also, just setting Xmx might not
do the trick at all because the garbage collector might also be the issue
here. Configure the JVM to output debug logs of the garbage collector and
monitor the heap usage (especially the tenured generation)
I should also add that reducing the caches and autowarm sizes (or not using
them at all) drastically reduces memory consumption when a new searcher is
being prepared after a commit. The memory usage will spike at these events.
Again, use a monitoring tool to get more information on your
I have a data set indexed over two irons, with M docs per Solr core for a
total of N cores.
If I perform a query across all N cores with start=0 and rows=30, I get,
say, numFound=27521. If I simply change the start param to start=27510
(simulating being on the last page of data), I get a
On Wed, Feb 9, 2011 at 1:18 PM, Em mailformailingli...@yahoo.de wrote:
How do they store these idfs for the current request so that the
similarity is aware of them?
The df (as opposed to idf) is requested from the searcher by the
weight, which then uses the similarity to produce the idf. See
Thanks, again. :)
Okay, so if one wants a distributed idf one should extend a searcher instead
of the query-class.
But it doesn't seem to be pluggable, right?
Well, for our purposes extending the query-class is enough, but just out of
curiosity: where should one start if one wants to make
mrw wrote:
I have a data set indexed over two irons, with M docs per Solr core for a
total of N cores.
If I perform a query across all N cores with start=0 and rows=30, I get,
say, numFound=27521. If I simply change the start param to start=27510
(simulating being on the last page of
On Wed, Feb 9, 2011 at 1:51 PM, Em mailformailingli...@yahoo.de wrote:
Okay, so if one wants a distributed idf one should extend a searcher instead
of the query-class.
Yes.
If you're interested in distributed search for Solr, there is a patch
in progress:
On Wed, Feb 9, 2011 at 1:42 PM, mrw mikerobertsw...@gmail.com wrote:
I have a data set indexed over two irons, with M docs per Solr core for a
total of N cores.
If I perform a query across all N cores with start=0 and rows=30, I get,
say, numFound=27521. If I simply change the start param
Hello all,
I am looking into an enterprise search solution for our architecture and I am
very pleased to see all the features Solr provides. In our case, we will have a
need for a highly scalable application for multiple clients. This application
will be built to serve many users who each will
What about standing up a VM (search appliance that you would make) for
each client?
If there's no data sharing across clients, then using the same Solr
server/index doesn't seem necessary.
Solr will easily meet your needs though, it's the best there is.
On Wed, 2011-02-09 at 14:23 -0500, Greg
From what I understand about multicore, each of the indexes is independent
of the others, right? Or would one index have access to the info of the other?
My requirement is like you mention: a client has access only to his or her
search data, based on their documents. Other clients have no
This application will be built to serve many users
If this means that you have thousands of users, thousands of VMs and/or
thousands of cores are not going to scale.
Have an ID in the index for each user, and filter using it.
Then they can see only their own documents.
Assuming that you are building an
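A rough SolrJ sketch of the per-user filtering described above (the owner_id field, the URL and the user id are made-up placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class PerUserSearch {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("laptop");
        // The application adds this filter query on every request, so a user
        // can only ever see documents carrying their own id; it is never
        // exposed to or editable by the user.
        q.addFilterQuery("owner_id:12345");
        System.out.println(server.query(q).getResults().getNumFound());
    }
}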
Hi,
I am asked whether Solr can render biased search results. For example, for
this search (query all movie titles in the Comedy genre), for a user who
indicates a preference for 1950's movies, would Solr render the 1950's movies
with a higher score (top of the list)? Or if the user is a kid, then the
Another option (assuming the case where a user can be granted access to
a certain class of documents, and more than one user would be able to
access certain documents) would be to store the access filter (as an OR
query of content types) in an external cache (perhaps a database or an
external cache
Cyang,
why can't you, for a kid, add a boosting query
genre:kid^2.0
alongside the rest?
That would double the score of a match if the users are kids.
But note that you'd better calibrate the coefficient with some test battery.
This is part of the fine art, I think.
paul
On 9 February 2011, at
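With the dismax handler the same boost can be passed as a bq parameter from the application; a hedged SolrJ sketch (the field name, genre value and URL are only illustrative):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class KidBoostQuery {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("comedy");
        q.set("defType", "dismax");
        // A boost query, not a filter: kid titles score higher, other genres still match.
        q.set("bq", "genre:kid^2.0");
        System.out.println(server.query(q).getResults().getNumFound());
    }
}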
Hi,
I have a class (in a jar) that reads from properties (text) files. I have
these
files in the same jar file as the class.
However, when my class reads those properties files, those files cannot be
found
since Solr reads from Tomcat's bin directory.
I don't really want to put the config
Hi,
I'm scheduling Solr to build every hour or so.
I'd like to do some pre- and post-processing for each index build. The
preprocessing would do some checks and perhaps skip the build.
For post-processing, I will do some checks and either commit or roll back the
build.
Can I write some
I am trying to use the regex transformer but it's not returning anything.
Either my regex is wrong, or I've done something else wrong in the setup of the
entity. Is there any way to debug this? Making a change and waiting 7 minutes
to reindex the entity sucks.
<entity name="boxshot"
Hello,
Andy, so did you get a final answer to your question?
I am also trying to do something similar. Please give me pointers if you
have any.
Basically I also need to use NGram with WhitespaceTokenizer; any help will be
appreciated.
Is there a reason why the StatsComponent only deals with indexed fields?
I just updated the wiki: http://wiki.apache.org/solr/StatsComponent to call
this fact out since it was not apparent previously.
I've briefly skimmed the source of StatsComponent, but am not familiar
enough with the code or
That makes sense. It is a little bit indirect. You have to translate that
user preference/profile into a search field value and then dictate search
result boosting for the docs with that preference value.
What kinds of information would you expect for a stored-only field? I mean,
the stored part is just a blob that Solr doesn't peek inside of, so I'm not
sure
what useful information *could* be returned
Best
Erick
On Wed, Feb 9, 2011 at 3:55 PM, Travis Truman trum...@gmail.com wrote:
Is
What *could* Solr do for you? You've outlined a domain-specific requirement;
I'm not sure how a general-purpose search engine would incorporate
that functionality
Best
Erick
On Wed, Feb 9, 2011 at 4:08 PM, cyang2010 ysxsu...@hotmail.com wrote:
That makes sense. It is a little bit
Is your war always deployed to the same location, i.e. /usr/mycomp/
myapplication/webapps/myapp.war? If so, then on startup copy the
files out of your directory and put them under CATALINA_BASE/solr (usr/
mycomp/myapplication/solr), and in your war file have the META-INF/
context.xml JNDI
Hi,
I'd like to communicate errors between my entity processor and the DataImporter
in case of error.
Should there be an error in my entity processor, I'd like the index build to
roll back. How can I do this?
I want to throw an exception of some sort. Only thing I can think of is to
force a
Wanted to add some more details to my problem. I have many jars that have
their
own config files, so I'd have to copy files for every jar. Can Solr read from
the classpath (jar files)?
Yes, my war is always deployed to the same location under webapps. I already
have solr/home defined in
I can throw DataImportHandlerException (a runtime exception) from my
entity processor, which will force a rollback.
Tri
From: Tri Nguyen tringuye...@yahoo.com
To: solr-user@lucene.apache.org
Sent: Wed, February 9, 2011 3:50:05 PM
Subject: communication between
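A bare-bones sketch of the approach Tri describes: an entity processor that aborts the import on failure so DIH rolls the build back (assuming the Solr 1.4-era DIH API; fetchNextRowSomehow is a made-up placeholder for the real data-fetching logic):

import java.util.Map;
import org.apache.solr.handler.dataimport.DataImportHandlerException;
import org.apache.solr.handler.dataimport.EntityProcessorBase;

public class FailFastEntityProcessor extends EntityProcessorBase {
    @Override
    public Map<String, Object> nextRow() {
        try {
            return fetchNextRowSomehow(); // hypothetical data-fetching helper
        } catch (Exception e) {
            // SEVERE aborts the whole import, which makes DIH roll back the build.
            throw new DataImportHandlerException(
                    DataImportHandlerException.SEVERE, "failed to read source row", e);
        }
    }

    private Map<String, Object> fetchNextRowSomehow() {
        return null; // returning null tells DIH this entity has no more rows
    }
}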
Tri:
You might want to consider, rather than going through DIH with your own
entity
processor, just using SolrJ in a separate process. That allows you much
finer
control over the behavior of your indexing process.
Making a connection to Solr via SolrJ and adding a one-field document is
maybe
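Along the lines of Erick's suggestion, a minimal SolrJ indexing sketch (the URL and field values are placeholders; assumes the schema's uniqueKey is id):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class SimpleIndexer {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1"); // a one-field document keyed on the uniqueKey
        server.add(doc);
        server.commit(); // make it visible to searchers; commit policy is up to you
    }
}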
Hi Charan,
Thanks for the clarifications.
The link I have been referring to
(http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/) does not say
anything about using the crawl command. Do I have to do it after the last step
mentioned?
Thanks,
Abi
On Thu, Feb 10, 2011 at 12:58 AM, charan kumar
I tried the multi-core route and it gets too complicated and cumbersome to
maintain. That is just from my own personal testing... It was suggested that
each user have their own ID in a single index that you can query against
accordingly. In the example schema.xml I believe there is a field
Hi,
What is the significance of copyField when used in faceting?
Please explain with an example.
Thanks!
Isha
What is the facet.pivot field? Please explain with an example.