Where is NGramFilter?

2011-02-09 Thread Kai Schlamp
Hi. On the Sunspot (a Ruby Solr client) Wiki (https://github.com/outoftime/sunspot/wiki/Matching-substrings-in-fulltext-search) it says that the NGramFilter should allow substring indexing. As I never got it working, I searched a bit and found this site:

high cpu usage

2011-02-09 Thread Erez Zarum
Hello, We have been running read-only Solr instances for a few months now. Yesterday I noticed high CPU usage coming from the JVM; it simply uses 100% of the CPU for no reason. Nothing was changed, and we are using Jetty as the servlet container for Solr. Where can I start looking what cause

Re: General question about Solr Caches

2011-02-09 Thread Savvas-Andreas Moysidis
Hi Hoss, Ok, that makes much more sense now. I was under the impression that values were copied as well which seemed a bit odd.. unless you have to deal with a use case similar to yours. :) Cheers, - Savvas On 9 February 2011 02:25, Chris Hostetter hossman_luc...@fucit.org wrote: : In my

IndexOutOfBoundsException

2011-02-09 Thread Dominik Lange
hi, we have a problem with our solr test instance. This instance is running with 90 cores with about 2 GB of Index-Data per core. This worked fine for a few weeks. Now we get an exception querying data from one core : java.lang.IndexOutOfBoundsException: Index: 104, Size: 11 at

Maintain stopwords.txt and synonyms.txt

2011-02-09 Thread Timo Schmidt
Hello everyone, I am currently developing a search solution based on Apache Solr. Currently I have the problem that I want to offer the user the possibility to maintain synonyms and stopwords in a user-friendly tool. But so far I could not find any way to write the stopwords.txt or

Re: Maintain stopwords.txt and synonyms.txt

2011-02-09 Thread Stefan Matheis
Timo, On Wed, Feb 9, 2011 at 11:07 AM, Timo Schmidt timo.schm...@aoemedia.de wrote: But currently I could not find any possibility to write the stopwords.txt or synonyms.txt. what about writing the Files from an external Application and reload your Solr Core!? Seemed to be the simplest way to

AW: Maintain stopwords.txt and synonyms.txt

2011-02-09 Thread Timo Schmidt
Hi Stefan, I already thought about that. Maybe some PHP service or something like that. But this would mean that I need additional software on that server, like a normal Apache installation, which needs to be maintained. That's why I thought a solution that is built into Solr would be nice.

Re: Maintain stopwords.txt and synonyms.txt

2011-02-09 Thread Stefan Matheis
Hi Timo, of course - that's right. Write some JSP (I guess) which could be integrated into the already existing Jetty/Tomcat server? Just wondering, how do you send search requests to Solr? Normally, there is already some other service running, which acts as a 'proxy' to the outer world? ;)

Re: Nutch and Solr search on the fly

2011-02-09 Thread Markus Jelsma
The parsed data is only sent to the Solr index if you tell a segment to be indexed; solrindex crawldb linkdb segment If you did this only once after injecting and then the consequent fetch,parse,update,index sequence then you, of course, only see those URLs. If you don't index a segment

Re: [WKT] Spatial Searching

2011-02-09 Thread Grant Ingersoll
The show stopper for JTS is its license, unfortunately. Otherwise, I think it would be done already! We could, since it's LGPL, make it an optional dependency, assuming someone can stub it out. On Feb 8, 2011, at 11:18 PM, Adam Estrada wrote: I just came across a ~nudge post over in the

Re: [WKT] Spatial Searching

2011-02-09 Thread Estrada Groups
How could I stub this out, not being a Java guy? What is needed in order to do this? Licensing is always going to be an issue with JTS which is why I am interested in the project SIS sitting in incubation right now. I am willing to put forth the effort if I had a little direction from the peanut

AW: Maintain stopwords.txt and synonyms.txt

2011-02-09 Thread Timo Schmidt
Yes, we have something, but on another machine. Timo Schmidt Developer (Diplom-Informatiker FH) AOE media GmbH Borsigstr. 3 65205 Wiesbaden Germany Tel. +49 (0) 6122 70 70 7 - 234 Fax. +49 (0) 6122 70 70 7 -199 e-Mail: timo.schm...@aoemedia.de Web: http://www.aoemedia.de/ Mandatory disclosures

Re: [WKT] Spatial Searching

2011-02-09 Thread Estrada Groups
Thought I would share this on web mapping...it's a great write up and something to consider when talking about working with spatial data. http://www.tokumine.com/2010/09/20/gis-data-payload-sizes/ Adam On Feb 9, 2011, at 7:03 AM, Grant Ingersoll gsing...@apache.org wrote: The show stopper

Re: Where is NGramFilter?

2011-02-09 Thread Koji Sekiguchi
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.EdgeNGramFilterFactory There is only EdgeNGramFilterFactory listed (which I got working for prefix indexing), but no NGramFilterFactory. Is that filter not supported anymore, or is that list not up to date? It should be there.

Re: [WKT] Spatial Searching

2011-02-09 Thread Adam Estrada
Grant, How could I stub this out, not being a Java guy? What is needed in order to do this? Licensing is always going to be an issue with JTS which is why I am interested in the project SIS sitting in incubation right now. I'm willing to put forth the effort if I had a little direction on

Re: [WKT] Spatial Searching

2011-02-09 Thread Adam Estrada
Thought I would share this on web mapping...it's a great write up and something to consider when talking about working with spatial data. http://www.tokumine.com/2010/09/20/gis-data-payload-sizes/ Adam On Feb 9, 2011, at 7:03 AM, Grant Ingersoll wrote: The show stopper for JTS is it's

Re: Maintain stopwords.txt and synonyms.txt

2011-02-09 Thread Stefan Matheis
Timo, then use cron jobs on your Solr machine to fetch the generated synonyms file, put it into the correct location and reload the core configuration (which is required to update the synonyms file)? :) Regards Stefan On Wed, Feb 9, 2011 at 1:15 PM, Timo Schmidt timo.schm...@aoemedia.de wrote:
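
The fetch-and-reload approach suggested above could look roughly like the following Python sketch. It only covers the last two steps (installing the file and building the reload request); how the generated file is fetched is left out. The Solr base URL, the core name and the conf directory are assumptions made for illustration, and the reload request uses the CoreAdmin RELOAD action, which requires a multicore setup.

```python
import os
import tempfile
from urllib.parse import urlencode


def install_synonyms(new_content, conf_dir):
    """Replace synonyms.txt in a core's conf directory without leaving a half-written file."""
    target = os.path.join(conf_dir, "synonyms.txt")
    # Write to a temporary file in the same directory, then rename it over the target.
    fd, tmp_path = tempfile.mkstemp(dir=conf_dir, prefix="synonyms.", suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(new_content)
    os.replace(tmp_path, target)
    return target


def reload_url(solr_base, core):
    """Build the CoreAdmin RELOAD URL for one core (base URL and core name are assumptions)."""
    query = urlencode({"action": "RELOAD", "core": core})
    return solr_base.rstrip("/") + "/admin/cores?" + query
```

A cron job would call install_synonyms() with the fetched content and then issue an HTTP GET against the URL returned by reload_url(), for example with urllib.request.urlopen.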

AW: IndexOutOfBoundsException

2011-02-09 Thread André Widhani
I think we had a similar exception recently when attempting to sort on a multi-valued field ... could that be possible in your case? André -Original Message- From: Dominik Lange [mailto:dominikla...@searchmetrics.com] Sent: Wednesday, 9 February 2011 10:55 To:

Re: high cpu usage

2011-02-09 Thread Erick Erickson
You can try attaching jConsole to the process to see what it shows. If you're on a *nix box you can get a gross idea what's going on with top. Best Erick On Wed, Feb 9, 2011 at 4:31 AM, Erez Zarum e...@icinga.org.il wrote: Hello, We have been running read only solr instances for a few months

AW: IndexOutOfBoundsException

2011-02-09 Thread Dominik Lange
No, we do not have multivalued fields and we do not sort (in this case). We reindexed the CSV file and the error disappeared, but it would be interesting to know why this error occurred... Thank you for your suggestion. Dominik -Original Message- From: André Widhani

Re: Where is NGramFilter?

2011-02-09 Thread Erick Erickson
In addition to Koji's note, see the bold comment at the top of that page that says that this is not a complete list; the definitive list is always the javadocs... Best Erick On Wed, Feb 9, 2011 at 3:34 AM, Kai Schlamp schl...@gmx.de wrote: Hi. On the Sunspot (a Ruby Solr client) Wiki (

Re: TermVector query using Solr Tutorial

2011-02-09 Thread Ryan Chan
Hello, On Tue, Feb 8, 2011 at 11:12 PM, Grant Ingersoll gsing...@apache.org wrote: It's a little hard to read due to the indentation, but AFAICT you have two terms, usb and cabl.  USB appears at position 0 and cabl at position 1.   Those are the relative positions to each other.  Perhaps you

Re: Nutch and Solr search on the fly

2011-02-09 Thread .: Abhishek :.
Hi Markus, I am sorry for not being clear, I meant to say that... Suppose a URL such as www.somehost.com/gifts/greetingcard.html (which in turn contains links to a.html, b.html, c.html, d.html) is injected into the seed.txt; after the whole process I was expecting a bunch of other pages which

Re: Nutch and Solr search on the fly

2011-02-09 Thread Erick Erickson
WARNING: I don't do Nutch much, but could it be that your crawl depth is 1? See: http://wiki.apache.org/nutch/NutchTutorial and search for depth Best Erick On Wed, Feb 9, 2011 at 9:06 AM, .: Abhishek :. ab1s...@gmail.com wrote: Hi Markus, I am sorry

Re: Nutch and Solr search on the fly

2011-02-09 Thread Markus Jelsma
Are you using the depth parameter with the crawl command or are you using the separate generate, fetch etc. commands? What's $ nutch readdb crawldb -stats returning? On Wednesday 09 February 2011 15:06:40 .: Abhishek :. wrote: Hi Markus, I am sorry for not being clear, I meant to say

Re: Does Distributed Search support {!boost }?

2011-02-09 Thread Yonik Seeley
On Tue, Feb 8, 2011 at 9:02 PM, Andy angelf...@yahoo.com wrote: Is it possible to do a query like {!boost b=log(popularity)}foo over sharded indexes? Yep, that should work fine. -Yonik http://lucidimagination.com
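
As an illustration of the request discussed above, here is a minimal Python sketch that builds a distributed query using the {!boost} syntax from the question together with the shards parameter. The base URL and the shard addresses are made up for the example; only the query syntax itself comes from the thread.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def boosted_sharded_query(base_url, shards, text, boost_function="log(popularity)"):
    """Build a /select URL that wraps `text` in a boost query and fans out to `shards`."""
    params = {
        "q": "{!boost b=%s}%s" % (boost_function, text),
        # Distributed search takes a comma-separated list of shard addresses.
        "shards": ",".join(shards),
    }
    return base_url.rstrip("/") + "/select?" + urlencode(params)


def query_params(url):
    """Decode the query string of `url` back into a plain dict (first value per key)."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}
```

Decoding the generated URL with query_params() is a quick way to confirm that the braces, the exclamation mark and the parentheses survive URL encoding unchanged.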

Re: Nutch and Solr search on the fly

2011-02-09 Thread .: Abhishek :.
Hi Erick, Thanks a bunch for the response. That could be it, but what I am wondering is where to specify the depth in the whole process described at http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/? I tried specifying it during the fetcher phase but it was just ignored :(

Solr 1.4.1 using more memory than Solr 1.3

2011-02-09 Thread Rachita Choudhary
Hi Solr Users, We are in the process of upgrading from Solr 1.3 to Solr 1.4.1. While performing stress test on Solr 1.4.1 to measure the performance improvement in Query times (QTime) and no more blocked threads, we ran into memory issues with Solr 1.4.1. Test Setup details: - 2 identical hosts

Re: Solr 1.4.1 using more memory than Solr 1.3

2011-02-09 Thread Markus Jelsma
Searching and sorting is now done on a per-segment basis, meaning that the FieldCache entries used for sorting and for function queries are created and used per-segment and can be reused for segments that don't change between index updates. While generally beneficial, this can lead to increased

RE: Concurrent updates/commits

2011-02-09 Thread Jonathan Rochkind
Solr does handle concurrency fine. But there is NOT transaction isolation like you'll get from an rdbms. All 'pending' changes are (conceptually, anyway) held in a single queue, and any commit will commit ALL of them. There isn't going to be any data corruption issues or anything from
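
The "single queue" description above can be pictured with a toy model. The Python class below is not Solr code and makes no claim about Solr internals; it only mimics the conceptual behaviour described in the message, where pending changes from all clients share one queue and any commit makes all of them visible.

```python
class ToyIndex:
    """Conceptual model only: one shared pending queue, no per-client isolation."""

    def __init__(self):
        self.pending = []  # uncommitted changes from every client, in arrival order
        self.visible = []  # changes that searches can see

    def add(self, client, doc):
        """Queue a document; the client name is tracked only to show the mixing."""
        self.pending.append((client, doc))

    def commit(self):
        """Make every pending change visible, regardless of which client asks."""
        committed = list(self.pending)
        self.visible.extend(committed)
        self.pending.clear()
        return committed
```

If client A adds two documents, client B adds one and then B commits, all three documents become visible, including the ones A had not yet asked to commit.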

RE: Concurrent updates/commits

2011-02-09 Thread Pierre GOSSE
However, the Solr book, in the Commit, Optimise, Rollback section reads: if more than one Solr client were to submit modifications and commit them at similar times, it is possible for part of one client's set of changes to be committed before that client told Solr to commit which suggests

Re: Concurrent updates/commits

2011-02-09 Thread Savvas-Andreas Moysidis
Hello, Thanks very much for your quick replies. So, according to Pierre, all updates will be immediately posted to Solr, but all commits will be serialised. But doesn't that contradict Jonathan's example where you can end up with FIVE 'new indexes' being warmed? If commits are serialised, then

Re: Concurrent updates/commits

2011-02-09 Thread Walter Underwood
Don't think commit, that is confusing. Solr is not a database. In particular, it does not have the isolation property from ACID. Solr indexes new documents as a batch, then installs a new version of the entire index. Installing a new index isn't instant, especially with warming queries. Solr

Re: Concurrent updates/commits

2011-02-09 Thread Em
Hi Savvas, well, although it sounds strange: If a commit happens, a new Index Searcher is warming. If a new commit happens while a 'new' Index Searcher is warming, another Index Searcher is warming. So, at this point of time, you got 3 Index Searchers: The old one, the 'new' one and the newest

RE: Concurrent updates/commits

2011-02-09 Thread Pierre GOSSE
Well, Jonathan's explanations are much more accurate than mine. :) I took the word serialization as meaning a kind of isolation between commits, which was not very smart. Sorry to have introduced more confusion here. Pierre -Original Message- From: Savvas-Andreas Moysidis

Re: Concurrent updates/commits

2011-02-09 Thread Savvas-Andreas Moysidis
Yes, we'll probably go towards that path as our index files are relatively small, so auto warming might not be extremely useful in our case.. Yep, we do realise the difference between a db and a Solr commit. :) Thanks. On 9 February 2011 16:15, Walter Underwood wun...@wunderwood.org wrote:

Re: Concurrent updates/commits

2011-02-09 Thread Savvas-Andreas Moysidis
Thanks very much Em. - Savvas On 9 February 2011 16:22, Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com wrote: Yes, we'll probably go towards that path as our index files are relatively small, so auto warming might not be extremely useful in our case.. Yep, we do realise the

Query regarding search term count in Solr

2011-02-09 Thread Rahul Warawdekar
Hi All, This is Rahul, and I am using Solr for one of my upcoming projects. I have a question regarding search term counts in Solr. We have a requirement in one of our search-based projects to search the results based on search term counts per document. For example, if a user searches for something like

Re: Nutch and Solr search on the fly

2011-02-09 Thread charan kumar
Hi Abishek, depth is a parameter of the crawl command, not the fetch command. If you are using a custom script that calls the individual stages of a Nutch crawl, then depth N means you run that script N times. You can put a loop in the script. Thanks, Charan On Wed, Feb 9, 2011 at 6:26 AM, .: Abhishek :.

QueryWeight for Solr

2011-02-09 Thread Em
Hello folks, I got a question regarding an own QueryWeight implementation for a special usecase. For the current usecase we want to experiment with different values for the idf based on different algorithms and how they affect the scoring. Is there a way to plug-in an own weight-implementation

Re: QueryWeight for Solr

2011-02-09 Thread Yonik Seeley
On Wed, Feb 9, 2011 at 12:16 PM, Em mailformailingli...@yahoo.de wrote: For the current usecase we want to experiment with different values for the idf based on different algorithms and how they affect the scoring. For tf, idf, lengthNorm, coord, etc, see Similarity. Solr already allows you to

Re: Query regarding search term count in Solr

2011-02-09 Thread Erick Erickson
I suspect it's worthwhile to back up and ask whether this is a reasonable requirement. What is the use-case? Because unless the input is very uniform, I wouldn't be surprised if this will produce poor results. For instance, if solr appears once in a field 5 words long and 5 times in another

Re: Solr Out of Memory Error

2011-02-09 Thread Bing Li
Dear Adam, I also got the OutOfMemory exception. I changed the JAVA_OPTS in catalina.sh as follows. ... if [ -z $LOGGING_MANAGER ]; then JAVA_OPTS=$JAVA_OPTS -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager else JAVA_OPTS=$JAVA_OPTS -server -Xms8096m -Xmx8096m

Re: QueryWeight for Solr

2011-02-09 Thread Em
Hi Yonik, thanks for the fast feedback. Well, as far as I can see there is no possibility to get the original query from the similarity-class... Let me ask differently: I know there are some distributed idf-implementations out there. One approach is to ask every shard for its idf for a term

Re: Solr Out of Memory Error

2011-02-09 Thread Markus Jelsma
Bing Li, One should be conservative when setting Xmx. Also, just setting Xmx might not do the trick at all because the garbage collector might also be the issue here. Configure the JVM to output debug logs of the garbage collector and monitor the heap usage (especially the tenured generation)

Re: Solr Out of Memory Error

2011-02-09 Thread Markus Jelsma
I should also add that reducing the caches and autowarm sizes (or not using them at all) drastically reduces memory consumption when a new searcher is being prepared after a commit. The memory usage will spike at these events. Again, use a monitoring tool to get more information on your

Changing value of start parameter affects numFound?

2011-02-09 Thread mrw
I have a data set indexed over two irons, with M docs per Solr core for a total of N cores. If I perform a query across all N cores with start=0 and rows=30, I get, say, numFound=27521). If I simply change the start param to start=27510 (simulating being on the last page of data), I get a

Re: QueryWeight for Solr

2011-02-09 Thread Yonik Seeley
On Wed, Feb 9, 2011 at 1:18 PM, Em mailformailingli...@yahoo.de wrote: How do they store these idfs for the current request so that the similarity is aware of them? The df (as opposed to idf) is requested from the searcher by the weight, which then uses the similarity to produce the idf. See

Re: QueryWeight for Solr

2011-02-09 Thread Em
Thanks, again. :) Okay, so if one wants a distributed idf one should extend a searcher instead of the query-class. But it doesn't seem to be pluggable, right? Well, for our purposes extending the query-class is enough, but just out of curiosity: Where should one start if one wants to make

Re: Changing value of start parameter affects numFound?

2011-02-09 Thread mrw
mrw wrote: I have a data set indexed over two irons, with M docs per Solr core for a total of N cores. If I perform a query across all N cores with start=0 and rows=30, I get, say, numFound=27521). If I simply change the start param to start=27510 (simulating being on the last page of

Re: QueryWeight for Solr

2011-02-09 Thread Yonik Seeley
On Wed, Feb 9, 2011 at 1:51 PM, Em mailformailingli...@yahoo.de wrote: Okay, so if one wants a distributed idf one should extend a searcher instead of the query-class. Yes. If you're interested in distributed search for Solr, there is a patch in progress:

Re: Changing value of start parameter affects numFound?

2011-02-09 Thread Yonik Seeley
On Wed, Feb 9, 2011 at 1:42 PM, mrw mikerobertsw...@gmail.com wrote: I have a data set indexed over two irons, with M docs per Solr core for a total of N cores. If I perform a query across all N cores with start=0 and rows=30, I get, say, numFound=27521).  If I simply change the start param

Architecture decisions with Solr

2011-02-09 Thread Greg Georges
Hello all, I am looking into an enterprise search solution for our architecture and I am very pleased to see all the features Solr provides. In our case, we will have a need for a highly scalable application for multiple clients. This application will be built to serve many users who each will

Re: Architecture decisions with Solr

2011-02-09 Thread Darren Govoni
What about standing up a VM (search appliance that you would make) for each client? If there's no data sharing across clients, then using the same Solr server/index doesn't seem necessary. Solr will easily meet your needs though, it's the best there is. On Wed, 2011-02-09 at 14:23 -0500, Greg

RE: Architecture decisions with Solr

2011-02-09 Thread Greg Georges
From what I understand about multicore, each of the indexes is independent of the others, right? Or would one index have access to the info of the other? My requirement is as you mention: a client has access only to his or her search data based on their documents. Other clients have no

Re: Architecture decisions with Solr

2011-02-09 Thread Glen Newton
This application will be built to serve many users If this means that you have thousands of users, 1000s of VMs and/or 1000s of cores is not going to scale. Have an ID in the index for each user, and filter using it. Then they can see only their own documents. Assuming that you are building an
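
The suggestion above of keeping an ID in the index for each user and filtering on it can be sketched as follows in Python. The field name owner_id is hypothetical (the thread does not name one); the sketch only shows the general shape of attaching a filter query (fq) to every search issued on behalf of a user.

```python
def user_scoped_params(user_query, user_id, owner_field="owner_id"):
    """Return query parameters that restrict results to documents owned by `user_id`.

    `owner_field` is a hypothetical field name; each indexed document is assumed
    to carry the numeric ID of the user it belongs to in that field.
    """
    # Force the ID to an integer so that nothing user-supplied ends up in the filter syntax.
    safe_id = int(user_id)
    return {
        "q": user_query,
        "fq": "%s:%d" % (owner_field, safe_id),
    }
```

Because the restriction lives in fq rather than in q, it does not influence relevance scoring and Solr can cache the filter per user.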

solr render biased search result

2011-02-09 Thread cyang2010
Hi, I have been asked whether Solr can render biased search results. For example, for this search (query all movie titles in the Comedy genre), for a user who indicates a preference for 1950's movies, can Solr render the 1950's movies with a higher score (at the top of the list)? Or if the user is a kid, then the

Re: Architecture decisions with Solr

2011-02-09 Thread Sujit Pal
Another option (assuming the case where a user can be granted access to a certain class of documents, and more than one user would be able to access certain documents) would be to store the access filter (as an OR query of content types) in an external cache (perhaps a database or an eternal cache

Re: solr render biased search result

2011-02-09 Thread Paul Libbrecht
Cyang, why can't you, for a kid, add a boosting query genre:kid^2.0 alongside the rest? That would double the score of a match if the users are kids. But note that you'd better calibrate the coefficient with some test battery. This is part of the fine art, I think. paul On 9 Feb 2011, at
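
A minimal Python sketch of the boosting-query idea above, assuming the dismax query parser, which accepts a boost query through its bq parameter. The is_kid flag stands in for whatever user profile the application keeps; it and the function name are illustrative, not part of the thread.

```python
def profile_aware_params(user_query, is_kid, boost="2.0"):
    """Build dismax parameters, adding a boost query only for kid profiles."""
    params = {"defType": "dismax", "q": user_query}
    if is_kid:
        # Documents whose genre field matches "kid" get an extra score contribution.
        params["bq"] = "genre:kid^%s" % boost
    return params
```

The boost value is passed in as a parameter because, as the message notes, the coefficient needs to be calibrated against a set of test queries.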

solr current working directory or reading config files

2011-02-09 Thread Tri Nguyen
Hi, I have a class (in a jar) that reads from properties (text) files.  I have these files in the same jar file as the class. However, when my class reads those properties files, those files cannot be found since solr reads from tomcat's bin directory. I don't really want to put the config

pre and post processing when building index

2011-02-09 Thread Tri Nguyen
Hi, I'm scheduling solr to build every hour or so. I'd like to do some pre and post processing for each index build.  The preprocessing would do some checks and perhaps will skip the build. For post processing, I will do some checks and either commit or rollback the build. Can I write some

DataImportHandler: regex debugging

2011-02-09 Thread Jon Drukman
I am trying to use the regex transformer but it's not returning anything. Either my regex is wrong, or I've done something else wrong in the setup of the entity. Is there any way to debug this? Making a change and waiting 7 minutes to reindex the entity sucks. entity name=boxshot

Re: NGramFilterFactory for auto-complete that matches the middle of multi-lingual tags?

2011-02-09 Thread pravin
Hello, Andy, so did you get a final answer to your question? I am also trying to do something similar. Please give me pointers if you have any. Basically, I too need to use NGram with WhitespaceTokenizer; any help will be appreciated. -- View this message in context:

Why does the StatsComponent only work with indexed fields?

2011-02-09 Thread Travis Truman
Is there a reason why the StatsComponent only deals with indexed fields? I just updated the wiki: http://wiki.apache.org/solr/StatsComponent to call this fact out since it was not apparent previously. I've briefly skimmed the source of StatsComponent, but am not familiar enough with the code or

Re: solr render biased search result

2011-02-09 Thread cyang2010
That makes sense. It is a little bit indirect. You have to translate that user preference/profile into a search field value and then dictate search result boosting the doc with that preference value. -- View this message in context:

Re: Why does the StatsComponent only work with indexed fields?

2011-02-09 Thread Erick Erickson
What kinds of information would you expect for a stored-only field? I mean, the stored part is just a blob that Solr doesn't peek inside of, so I'm not sure what useful information *could* be returned Best Erick On Wed, Feb 9, 2011 at 3:55 PM, Travis Truman trum...@gmail.com wrote: Is

Re: solr render biased search result

2011-02-09 Thread Erick Erickson
What *could* solr do for you? You've outlined a domain-specific requirement, I'm not sure how a general-purpose search engine would incorporate that functionality Best Erick On Wed, Feb 9, 2011 at 4:08 PM, cyang2010 ysxsu...@hotmail.com wrote: That makes sense. It is a little bit

Re: solr current working directory or reading config files

2011-02-09 Thread Wilkes, Chris
Is your war always deployed to the same location, i.e. /usr/mycomp/myapplication/webapps/myapp.war? If so then on startup copy the files out of your directory and put them under CATALINA_BASE/solr (usr/mycomp/myapplication/solr) and in your war file have the META-INF/context.xml JNDI

communication between entity processor and solr DataImporter

2011-02-09 Thread Tri Nguyen
Hi, I'd like to communicate errors between my entity processor and the DataImporter in case of error. Should there be an error in my entity processor, I'd like the index build to rollback. How can I do this? I want to throw an exception of some sort.  Only thing I can think of is to force a

Re: solr current working directory or reading config files

2011-02-09 Thread Tri Nguyen
Wanted to add some more details to my problem.  I have many jars that have their own config files.  So I'd have to copy files for every jar.  Can solr read from the classpath (jar files)? Yes my war is always deployed to the same location under webapps.  I do already have solr/home defined in

Re: communication between entity processor and solr DataImporter

2011-02-09 Thread Tri Nguyen
I can throw DataImportHandlerException (a runtime exception) from my entityprocessor which will force a rollback. Tri From: Tri Nguyen tringuye...@yahoo.com To: solr-user@lucene.apache.org Sent: Wed, February 9, 2011 3:50:05 PM Subject: communication between

Re: communication between entity processor and solr DataImporter

2011-02-09 Thread Erick Erickson
Tri: You might want to consider, rather than going through DIH with your own entity processor, just using SolrJ in a separate process. That allows you much finer control over the behavior of your indexing process. Making a connection to Solr via SolrJ and adding a one-field document is maybe

Re: Nutch and Solr search on the fly

2011-02-09 Thread .: Abhishek :.
Hi Charan, Thanks for the clarifications. The link I have been referring to( http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/) does not say anything about using the crawl? Do I have to do it after the last step mentioned? Thanks, Abi On Thu, Feb 10, 2011 at 12:58 AM, charan kumar

Re: Architecture decisions with Solr

2011-02-09 Thread Adam Estrada
I tried the multi-core route and it gets too complicated and cumbersome to maintain. That is just from my own personal testing...It was suggested that each user have their own ID in a single index that you can query against accordingly. In the example schema.xml I believe there is a field

Faceting Query

2011-02-09 Thread Isha Garg
Hi, What is the significance of a copy field when used in faceting? Please explain with an example. Thanks! Isha

Faceting Query

2011-02-09 Thread Isha Garg
What is the facet.pivot field? Please explain with an example.