Re: Using EmbeddedSolrServer with static documents

2011-04-05 Thread vinodreddyr17
You can unmarshal the XML docs using JAXB and use Solr's POJO-adding capabilities to index the doc. You may need to generate the classes from the schema using the xjc tool.

RE: Very very large scale Solr Deployment = how to do (Expert Question)?

2011-04-05 Thread Ephraim Ofir
I'm not sure about the scale you're aiming for, but you probably want to do both sharding and replication. There's no central server which would be the bottleneck. The guidelines should probably be something like: 1. Split your index to enough shards so it can keep up with the update rate. 2.
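A minimal sketch of guideline 1, assuming documents are routed to shards by a hash of their id. The helper names `shard_for` and `partition` are made up for illustration and the thread does not say how Ephraim actually splits the index.

```python
import zlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document id to a shard number (hypothetical helper).

    crc32 is used because it is stable across processes, unlike the
    built-in hash() on strings.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def partition(doc_ids, num_shards: int):
    """Group document ids into num_shards lists, one list per shard."""
    shards = [[] for _ in range(num_shards)]
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return shards
```

The same id always lands on the same shard, so an update to an existing document goes to the shard that already holds it.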

Why did you trash Wiki page Troubleshooting HTTP Status 404 - missing core name in path?

2011-04-05 Thread Gabriele Kahlout
Hello, As I had the same problem I went to the wiki looking for the page to solve my problem again, and there under recent changes I found that you had trashed it. I can still solve my problem but why don't you keep it for others to benefit from too? As linked it's a recurring problem for

Re: Mongo REST interface and full data import

2011-04-05 Thread Stefan Matheis
andrew, you're really wondering why the XPathEntityProcessor does not work well, with a JSON-Structure !? The Links Erick posted are stating, that you could push JSON-structured Data to a Solr-HTTP Interface .. but not, that the DataImport Handler will work with them. IIRC there is no way for

Re: Mongo REST interface and full data import

2011-04-05 Thread andrew_s
Hi Stefan, Thanks for the clear explanation. I used XPathEntityProcessor as an example because I didn't find a JSON entity processor. I'll write a script to generate an XML file for data import. Regards, Andrew
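A rough sketch of the kind of script Andrew describes, assuming the JSON export is a list of flat objects. It emits the add/doc/field layout of Solr's XML update messages. If the file is meant to be read by the DataImportHandler through XPathEntityProcessor instead, the element names would have to match whatever XPath expressions the import config uses. The function name `json_to_solr_xml` is made up for illustration.

```python
import json
import xml.etree.ElementTree as ET


def json_to_solr_xml(json_text: str) -> str:
    """Convert a JSON list of flat objects into an <add> document.

    Assumption: each object is one document, and list values become
    repeated <field> elements (multi-valued fields).
    """
    records = json.loads(json_text)
    add = ET.Element("add")
    for record in records:
        doc = ET.SubElement(add, "doc")
        for name, value in record.items():
            values = value if isinstance(value, list) else [value]
            for item in values:
                field = ET.SubElement(doc, "field", name=name)
                field.text = str(item)
    return ET.tostring(add, encoding="unicode")
```

ElementTree escapes special characters such as & and < in field values, which a hand-rolled string concatenation would miss.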

Solrj and display which Solr version is used

2011-04-05 Thread Marc SCHNEIDER
Hi, I'm wondering how to find out which version of Solr is currently running using the Solrj library? Thanks, Marc.

RE: Using MLT feature

2011-04-05 Thread Frederico Azeiteiro
Sorry, the reply I made yesterday was directed to Markus and not the list... Here's my thoughts on this. At this point I'm a little confused if SOLR is a good option to find near duplicate docs. Yes there is, try set overwriteDupes to true and documents yielding the same signature will be

Re: Solrj performance bottleneck

2011-04-05 Thread rahul
Thanks Stefan and Victor ! we are using GWT for front end. We stopped issuing multiple asynchronous queries and issue a request and fetch results and then filter the results based on what has been typed subsequent to the request and then re trigger the request only if we don't get the expected

Re: Using MLT feature

2011-04-05 Thread Markus Jelsma
On Tuesday 05 April 2011 12:19:33 Frederico Azeiteiro wrote: Sorry, the reply I made yesterday was directed to Markus and not the list... Here's my thoughts on this. At this point I'm a little confused if SOLR is a good option to find near duplicate docs. Yes there is, try set

normalizing the score

2011-04-05 Thread Paul Libbrecht
Hello list, I did not find a wiki page about normalization. All I found was: http://search.lucidimagination.com/search/document/9d06882d97db5c59/a_question_about_solr_score where Hoss suggests to normalize depending on the maxScore. I am not comfortable with that since, at least, I want that
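For reference, a minimal sketch of the maxScore approach mentioned above, assuming the scores of one result list are available as plain floats. The function name `normalize_scores` is made up for illustration.

```python
def normalize_scores(scores):
    """Divide every score by the largest score in the same result list."""
    if not scores:
        return []
    max_score = max(scores)
    if max_score <= 0:
        # Nothing sensible to divide by; report all hits as 0.0.
        return [0.0 for _ in scores]
    return [score / max_score for score in scores]
```

With this scheme the best hit of every query maps to 1.0 no matter how well it matched, so normalized values are only comparable within a single result list.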

Re: Very very large scale Solr Deployment = how to do (Expert Question)?

2011-04-05 Thread François Schiettecatte
And if you have control over machine placement, split them across racks so that a power outage on one rack does not take out your search cluster. François On Apr 5, 2011, at 3:19 AM, Ephraim Ofir wrote: I'm not sure about the scale you're aiming for, but you probably want to do both sharding

Re: Matching on a multi valued field

2011-04-05 Thread Michael Sokolov
Could you try creating fields dynamically: common_names_1, common_names_2, etc. Keep track of the max number of fields and generate queries listing all the fields? Gross, but it handles all the cases mentioned in the thread (wildcards, phrases, etc). -Mike On 3/29/2011 4:57 PM, Brian
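A small sketch of the query-generation half of this idea, assuming the application already tracks the highest field number in use. The helper name `build_query` and the OR-joined form are illustrative choices, not something the thread specifies.

```python
def build_query(base: str, max_fields: int, clause: str) -> str:
    """Repeat one clause across base_1 .. base_N and OR the results."""
    parts = [f"{base}_{i}:{clause}" for i in range(1, max_fields + 1)]
    return " OR ".join(parts)
```

For example, build_query("common_names", 2, '"red fox"') produces common_names_1:"red fox" OR common_names_2:"red fox", so a phrase can only match inside a single value.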

Different Result for the same query depending on using SolrServer or SolrCore ?

2011-04-05 Thread Amel Fraisse
Hello everybody, I am using Solr for indexing and searching. I am using 2 classes for searching documents: In the first one I'm instantiating a SolrServer to search documents as follows: server = new EmbeddedSolrServer(coreContainer, ""); server.add(doc); query.setQuery("id:" + idDoc);

RE: Problems indexing very large set of documents

2011-04-05 Thread Brandon Waterloo
It wasn't just a single file, it was dozens of files all having problems toward the end just before I killed the process. IPADDR - - [04/04/2011:17:17:03 +] POST /solr/update/extract?literal.id=32-130-AFB-84&commit=false HTTP/1.1 500 4558 IPADDR - - [04/04/2011:17:17:05 +] POST

Re: Matching on a multi valued field

2011-04-05 Thread Renaud Delbru
Hi, you could try the SIREn plugin [1] which supports multi-valued fields. [1] http://siren.sindice.com -- Renaud Delbru On 29/03/11 21:57, Brian Lamb wrote: Hi all, I have a field set up like this: <field name="common_names" multiValued="true" type="text" indexed="true" stored="true" required="false" />

RE: Using MLT feature

2011-04-05 Thread Frederico Azeiteiro
Thank you, I'll try to create a C# method that produces the same signature as Solr, and then compare both signatures before indexing the doc. This way I can avoid indexing docs that already exist. If anyone needs to use this parameter (as this info is not on the wiki), you can add the option str
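A rough sketch of the "compare signatures before indexing" idea. This is not Solr's TextProfileSignature: it uses a plain MD5 over lower-cased, whitespace-collapsed text, so it only catches duplicates that are identical after that normalization. The names `signature` and `filter_new` are made up for illustration.

```python
import hashlib
import re


def signature(text: str) -> str:
    """MD5 hex digest of the text after simple normalization (assumption:
    lower-casing and collapsing whitespace is enough for this sketch)."""
    normalized = re.sub(r"\s+", " ", text.strip().lower())
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def filter_new(docs, seen):
    """Return only docs whose signature is not yet in the seen set."""
    fresh = []
    for doc in docs:
        sig = signature(doc)
        if sig not in seen:
            seen.add(sig)
            fresh.append(doc)
    return fresh
```

To agree with what Solr computes, the client-side function would have to reproduce the server's signature class and its parameters exactly, which this sketch does not attempt.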

Re: Using MLT feature

2011-04-05 Thread Markus Jelsma
If you check the code for TextProfileSignature [1] you'll notice the init method reading params. You can set those params as you did. Reading Javadoc [2] might help as well. But what's not documented in the Javadoc is how QUANT is computed; it rounds. [1]:

question on solr.ASCIIFoldingFilterFactory

2011-04-05 Thread Nemani, Raj
All, I am using solr.ASCIIFoldingFilterFactory to perform accent-insensitive search. One of the words that got indexed as part of my indexing process is después. Having used the ASCIIFoldingFilterFactory, I expected that if I searched for the word despues I should have the document containing the

Re: question on solr.ASCIIFoldingFilterFactory

2011-04-05 Thread Ben Davies
I can't remember where I read it, but I think MappingCharFilterFactory is preferred. There is an example in the example schema: <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/> From this, I get: org.apache.solr.analysis.MappingCharFilterFactory

Re: question on solr.ASCIIFoldingFilterFactory

2011-04-05 Thread lboutros
Is there any stemming configured for this field in your schema configuration file? Ludovic. 2011/4/5 Nemani, Raj [via Lucene] ml-node+2780463-48954297-383...@n3.nabble.com All, I am using solr.ASCIIFoldingFilterFactory to perform accent insensitive search. One of the words that got

RE: question on solr.ASCIIFoldingFilterFactory

2011-04-05 Thread Steven A Rowe
I added this test method locally to TestASCIIFoldingFilter.java in the Lucene/Solr 3.1.0 source tree, and it passed, so the filter is not the problem (and the Solr factory certainly isn't either - it's just a wrapper) - I second Ludovic's question - you must have other filters configured:
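To illustrate what accent folding is expected to do with the word from this thread, here is a small stand-in written with the Python standard library. It is not the Lucene filter: it only strips combining marks after Unicode decomposition, and the real filter may well handle more characters than this approach does. The name `fold_accents` is made up for illustration.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Decompose characters (NFD) and drop the combining accent marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

Folding alone turns después into despues, so if an index lookup for despues still fails, some other step in the analysis chain is changing the token.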

Re: Why did you trash Wiki page Troubleshooting HTTP Status 404 - missing core name in path?

2011-04-05 Thread Chris Hostetter
: As I had the same problem I went to the wiki looking for the page to solve : my problem again, and there under recent changes I found that you had : trashed it. I'm confused -- the page did not have any troubleshooting suggestions or advice, it was just the details of a specific -- it seemed

Indexing data with Trade Mark Symbol

2011-04-05 Thread mechravi25
Hi, Has anyone indexed the data with Trade Mark symbol??...when i tried to index, the data appears as below. Data: 79797 - Siebel Research– AI Fund, 79797 - Siebel Research– AI Fund,l Original Data: 79797 - Siebel Research™ AI Fund, Please help me to resolve this Regards, Ravi

Problem with qf in solrconfig with an embedded server

2011-04-05 Thread belokys
Hello everyone! I need your help. I have tried to add a qf that applies a boost to a field in my queries via solrconfig.xml. I have tested the solution on a Solr server running in standalone mode and it runs perfectly, but when I try to do it on an embedded server, the query doesn't return me

Re: Indexing data with Trade Mark Symbol

2011-04-05 Thread Markus Jelsma
Any word delimiter filter will get rid of that symbol. Use a char pattern replace filter, that should work. Use admin/analysis.jsp to see which filter is removing it. Configure a field type appropriate to what you want to index. On Mon, Apr 4, 2011 at 9:55 AM, mechravi25

Re: help with Jetty log message

2011-04-05 Thread Kaufman Ng
Looks like you are using openjdk. Can you try using Sun jdk? On Mon, Apr 4, 2011 at 6:53 AM, Upayavira u...@odoko.co.uk wrote: This is not Solr crashing, per se, it is your JVM. I personally haven't generally had much success debugging these kinds of failure - see whether it happens again,

RE: question on solr.ASCIIFoldingFilterFactory

2011-04-05 Thread Nemani, Raj
Here is the field type definition for ‘text’ field which is what I am using for the indexed fields. Can you guys notice any obvious filter that could be the issue? --- fieldType name=text class=solr.TextField

Re: Why did you trash Wiki page Troubleshooting HTTP Status 404 - missing core name in path?

2011-04-05 Thread Gabriele Kahlout
Oh I see. I unfortunately didn't see your earlier email. Thank you! On Tue, Apr 5, 2011 at 6:41 PM, Chris Hostetter hossman_luc...@fucit.orgwrote: : As I had the same problem I went to the wiki looking for the page to solve : my problem again, and there under recent changes I found that you

Re: question on solr.ASCIIFoldingFilterFactory

2011-04-05 Thread Markus Jelsma
It's not the ASCII folding filter but the stemmer that's removing some trailing characters. Something you can easily spot on the analysis page. Here is the field type definition for ‘text’ field which is what I am using for the indexed fields. Can you guys notice any obvious filter that could

RE: question on solr.ASCIIFoldingFilterFactory

2011-04-05 Thread lboutros
Your analyzer contains these two filters : before : So two things : The words you are testing are not english words (no ?), so the stemming will have strange behavior. If you really want to remove accents, try to put the ASCIIFoldingFilterFactory before the two others. Ludovic. -

Re: Problems indexing very large set of documents

2011-04-05 Thread Anuj Kumar
Hi Brandon, Sorry, I can't make out much here. The exception is a Tika error that indicates a problem parsing the PDF. That's all I can make out. Maybe someone else on this mailing list can help. Sorry. - Anuj On Tue, Apr 5, 2011 at 6:35 PM, Brandon Waterloo brandon.water...@matrix.msu.edu

RE: question on solr.ASCIIFoldingFilterFactory

2011-04-05 Thread lboutros
this analyzer seems to work : I used Spanish stemming, put the ASCIIFoldingFilterFactory before the stemming filter and added it in the

Script to remove all index.* leftovers

2011-04-05 Thread William Bell
There is a bug that leaves old index.* directories in the Solr data directory. Here is a script that will clean it up. I wanted to make sure this is okay, without doing a core reload. Thanks. #!/bin/bash DIR=/mnt/servers/solr/data LIST=`ls $DIR` INDEX=`cat $DIR/index.properties | grep index\=
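A dry-run sketch of the same idea, assuming (as the script above does) that index.properties contains a line of the form index=NAME naming the live directory. It only reports candidates and deletes nothing. The function names are made up for illustration.

```python
import os


def current_index_dir(data_dir: str):
    """Return the directory name from the index= line, or None if absent."""
    props = os.path.join(data_dir, "index.properties")
    if not os.path.exists(props):
        return None
    with open(props, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith("index="):
                return line.split("=", 1)[1]
    return None


def stale_index_dirs(data_dir: str):
    """List index.* directories other than the one currently in use."""
    current = current_index_dir(data_dir)
    if current is None:
        # Without a known live directory, refuse to guess.
        return []
    stale = []
    for name in sorted(os.listdir(data_dir)):
        path = os.path.join(data_dir, name)
        if name.startswith("index.") and os.path.isdir(path) and name != current:
            stale.append(name)
    return stale
```

Printing the list and reviewing it before removing anything keeps the plain index directory, the spellchecker directory and index.properties itself out of harm's way, since none of them match the index.* directory test.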

Re: Script to remove all index.* leftovers

2011-04-05 Thread Markus Jelsma
Hi, This seems alright as it leaves the current index in place, doesn't mess with the spellchecker and leaves the properties alone. But, there are two problems: 1. it doesn't take into account the commitsToKeep value set in the deletion policy, and; 2. it will remove any directory to which a

RE: Problems indexing very large set of documents

2011-04-05 Thread Chris Hostetter
: It wasn't just a single file, it was dozens of files all having problems : toward the end just before I killed the process. ... : That is by no means all the errors, that is just a sample of a few. : You can see they all threw HTTP 500 errors. What is strange is, nearly : every

Re: dismax boost query not useful?

2011-04-05 Thread Chris Hostetter
Short answer: the existence is entirely historic. I added bq because I needed it, and then I added bf because the _val_:... syntax was annoying. : can't think of a useful case when I want to both *add* a component to : the ultimate score, and for that component to be a non-function query :

apache-solr-3.1 slow stats component queries

2011-04-05 Thread Johannes Goll
Hi, thank you for making the new apache-solr-3.1 available. I have installed the version from http://apache.tradebit.com/pub//lucene/solr/3.1.0/ and am running into very slow stats component queries (~ 1 minute) for fetching the computed sum of the stats field url:

Keywords/terms mutual exclusion

2011-04-05 Thread Octavian Covalschi
Hi there, I'm trying to use Solr in one of my projects and I've got a small problem that I can't figure out. Basically our application is collecting data submitted by users. Now the problem is that submitted data may contain some incorrect info, like some keywords that will mess up search

Re: Keywords/terms mutual exclusion

2011-04-05 Thread Jonathan Rochkind
I don't completely understand. I think maybe you replaced your domain-specific actualities with another example in an attempt to be more general or not reveal your business, but just made your explanation even more confusing! But. At the point you are indexing, is it possible to know that

RE: question on solr.ASCIIFoldingFilterFactory

2011-04-05 Thread Nemani, Raj
Thank you so much. I will give this a try. Thanks again everybody for your help Raj -Original Message- From: lboutros [mailto:boutr...@gmail.com] Sent: Tuesday, April 05, 2011 2:28 PM To: solr-user@lucene.apache.org Subject: RE: question on solr.ASCIIFoldingFilterFactory this

what happens to docsPending if stop solr before commit

2011-04-05 Thread Robert Petersen
Hello fellow enthusiastic solr users, I tried to find the answer to this simple question online, but failed. I was wondering about this, what happens to uncommitted docsPending if I stop solr and then restart solr? Are they lost? Are they still there but still uncommitted? Do they get

ConcurrentLRUCache$Stats error

2011-04-05 Thread Paul
I'm using solr 1.4.1 and just noticed a bunch of these errors in the solr.log file: SEVERE: java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: org.apache.solr.common.util.ConcurrentLRUCache$Stats.add(Lorg/apache/solr/common/util/ConcurrentLRUCache$Stats;)V They appear to

Re: ConcurrentLRUCache$Stats error

2011-04-05 Thread Markus Jelsma
https://issues.apache.org/jira/browse/SOLR-1797 I'm using solr 1.4.1 and just noticed a bunch of these errors in the solr.log file: SEVERE: java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: org.apache.solr.common.util.ConcurrentLRUCache$Stats.add(Lorg/apache/solr/c

Re: Keywords/terms mutual exclusion

2011-04-05 Thread Octavian Covalschi
Yes, you may be right sorry for the confusion. Our ultimate goal is to collect user entered data, with least possible interaction (users are lazy you know) from them. So basically users just point out where they found that particular item, and app's job is to index it and later show it in search

Re: Possible Memory Leaks / Upgrading to a Later Version of Solr or Lucene

2011-04-05 Thread Jan Høydahl
Hi, Just curious, was there any resolution to this? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 8. feb. 2011, at 03.40, Markus Jelsma wrote: Do you have GC logging enabled? Tail -f the log file and you'll see what CMS is telling you. Tuning the occupation

Eclipse: Invalid character constant

2011-04-05 Thread Eric Grobler
Hi Everyone, Some language-specific classes like GermanLightStemmer have invalid character compiler errors for code like: switch(s[i]) { case 'ä': case 'à': case 'á': in Eclipse with JDK 1.6. How do I get rid of these errors? Thanks Regards Ericz

Re: Eclipse: Invalid character constant

2011-04-05 Thread Robert Muir
in eclipse you need to set your project's character encoding to UTF-8. if you are checking out the source code from svn, you can run 'ant eclipse' from the top level, and then hit refresh on your project. it will set your encoding and your classpath up. On Tue, Apr 5, 2011 at 6:10 PM, Eric
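A short illustration of why the encoding setting matters, assuming the project was being read as Windows-1252 (the thread does not say which default encoding Eclipse had picked). A UTF-8 encoded accented letter occupies two bytes, and a single-byte encoding shows them as two characters, which no longer fits between the quotes of a Java char literal. The helper name `misdecode` is made up for illustration.

```python
def misdecode(source_text: str, wrong_encoding: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode the bytes with the wrong charset."""
    return source_text.encode("utf-8").decode(wrong_encoding)
```

Calling misdecode("ä") yields a two-character string, so a literal written as one accented letter in the source file reaches the compiler as two characters.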

Re: Eclipse: Invalid character constant

2011-04-05 Thread Eric Grobler
Hi Robert, Thanks for the fast response! I used https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_1/ but did not find 'ant eclipse'. However, setting my project's Resource encoding to UTF-8 worked. Thanks for your help and have a nice day :-) Regards Ericz On Tue, Apr 5, 2011

Re: Eclipse: Invalid character constant

2011-04-05 Thread Stefan Matheis
Eric, have a look at Line #67 in build.xml :) target name=eclipse description=Setup Eclipse configuration -- Only available with SVN checkout Regards Stefan Am 06.04.2011 00:28, schrieb Eric Grobler: Hi Robert, Thanks for the fast response! I used

Re: Possible Memory Leaks / Upgrading to a Later Version of Solr or Lucene

2011-04-05 Thread Simon Wistow
On Wed, Apr 06, 2011 at 12:05:57AM +0200, Jan Høydahl said: Just curious, was there any resolution to this? Not really. We tuned the GC pretty aggressively - we use these options -server -Xmx20G -Xms20G -Xss10M -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSIncrementalMode

Re: Eclipse: Invalid character constant

2011-04-05 Thread Eric Grobler
Hi Stefan, Thanks for the information. I used Checkout Projects from SVN inside eclipse which does not have the root build.xml file. What does this eclipse build actually do? Thanks Regards Eric On Tue, Apr 5, 2011 at 11:34 PM, Stefan Matheis matheis.ste...@googlemail.com wrote: Eric,

Re: FW: Very very large scale Solr Deployment = how to do (Expert Question)?

2011-04-05 Thread Jens Mueller
Hello Ephraim, thank you so much for the great Document/Scaling-Concept!! First I think you really should publish this on the solr wiki. This approach is nowhere documented there and not really obvious for newbies and your document is great and explains this very well! Please allow me to

Synonym-time Reindexing Issues

2011-04-05 Thread Preston Marshall
Hello all, I am having an issue with Solr and the SynonymFilterFactory. I am using a library to interface with Solr called sunspot. I realize that is not what this list is for, but I believe this may be an issue with Solr, not the library (plus the lib author doesn't know the answer). I am

Re: Script to remove all index.* leftovers

2011-04-05 Thread William Bell
Thank you for pointing out #2. The commitsToKeep is interesting, but I thought each commit would create a segment (before optimized) and be self contained in the index.* directory? I would only run this on the slave. Bill On Tue, Apr 5, 2011 at 2:54 PM, Markus Jelsma markus.jel...@openindex.io

RE: FW: Very very large scale Solr Deployment = how to do (Expert Question)?

2011-04-05 Thread Tirthankar Chatterjee
Hi Jen, Can you please forward the diagram attachment too that Ephraim sent. :-) Thanks, Tirthankar -Original Message- From: Jens Mueller [mailto:supidupi...@googlemail.com] Sent: Tuesday, April 05, 2011 10:30 PM To: solr-user@lucene.apache.org Subject: Re: FW: Very very large scale

Embedded Solr constructor not returning

2011-04-05 Thread Greg Pendlebury
Hi All, I'm hoping this is a reasonably trivial issue, but it's frustrating me to no end. I'm putting together a tiny command line app to write data into an index. It has no web based Solr running against it; the index will be moved at a later time to have a proper server instance start for

Re: Embedded Solr constructor not returning

2011-04-05 Thread Greg Pendlebury
Hmmm, after being stuck on this for hours, I find the answer myself 15 minutes after asking for help... as usual. :) For anyone interested, and no doubt this will not be a revelation for some, I need the servlet API in my app for it to work, despite being command line. So adding this to the maven

Re: FW: Very very large scale Solr Deployment = how to do (Expert Question)?

2011-04-05 Thread Isan Fulia
Hi Ephraim/Jen, Can you share that diagram with all? It may really help all of us. Thanks, Isan Fulia. On 6 April 2011 10:15, Tirthankar Chatterjee tchatter...@commvault.com wrote: Hi Jen, Can you please forward the diagram attachment too that Ephraim sent. :-) Thanks, Tirthankar

Re: FW: Very very large scale Solr Deployment = how to do (Expert Question)?

2011-04-05 Thread Jonathan DeMello
I third that request. Would greatly appreciate taking a look at that diagram! Regards, Jonathan On Wed, Apr 6, 2011 at 9:12 AM, Isan Fulia isan.fu...@germinait.com wrote: Hi Ephraim/Jen, Can u share that diagram with all.It may really help all of us. Thanks, Isan Fulia. On 6 April 2011