Russian stopwords

2008-12-05 Thread tushar kapoor
I am trying to filter Russian stopwords but have not been successful with that. I am using the following schema entry: <fieldType name="text" class="solr.TextField"> <analyzer> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.StopFilterFactory"

Re: new faceting algorithm

2008-12-05 Thread Till Kinstler
Yonik Seeley wrote: We'd love some feedback on how it works to ensure that it actually is a win for the majority and should be the default. I just did a quick test using Solr nightly 2008-11-30. I have an index of about 2.9 million bibliographic records, size: 16G. I tested faceting author

JSONResponseWriter bug ? (solr-1.3)

2008-12-05 Thread Grégoire Neuville
Hi, I think I've discovered a bug with the JSONResponseWriter: starting from the following query - http://127.0.0.1:8080/solr-urbamet/select?q=(tout:1)&rows=0&sort=TITRE+desc&facet=true&facet.query=SUJET:b*&facet.field=SUJET&facet.prefix=b&facet.limit=1&facet.missing=true&wt=json&json.nl=arrarr - which

multiValued multiValued fields

2008-12-05 Thread Joel Karlsson
Hello, I want to index a field with an array of arrays, is that possible in Solr? I.e., I have one multi-valued field with persons and would like one multi-valued field with their employers, but sometimes there is more than one employer per person and therefore it would've been good to use a

Can Solr follow links?

2008-12-05 Thread Joel Karlsson
Hello, Is there any way for Solr to follow links stored in my database and index the content of these files and HTTP-resources? Thanks in advance! // Joel

Re: new faceting algorithm

2008-12-05 Thread Andre Hagenbruch
Till Kinstler wrote: Hi, I just did a quick test using Solr nightly 2008-11-30. I have an index of about 2.9 million bibliographic records, size: 16G. I tested faceting author names, each index document may contain multiple author names, so

Re: Is there a clean way to determine whether a core exists?

2008-12-05 Thread Dean Thompson
Wow -- thanks for all the help!! With everyone's help, I did end up in a *much* better place: private static boolean solrCoreExists(String coreName, String solrRootUrl) throws IOException, SolrServerException { CommonsHttpSolrServer adminServer = new

Re: new faceting algorithm

2008-12-05 Thread Peter Keegan
Hi Yonik, May I ask in which class(es) this improvement was made? I've been using the DocSet, DocList, BitDocSet, HashDocSet from Solr from a few years ago with a Lucene based app. to do faceting. Thanks, Peter On Mon, Nov 24, 2008 at 11:12 PM, Yonik Seeley [EMAIL PROTECTED] wrote: A new

DataImportHandler - time stamp format in

2008-12-05 Thread Jae Joo
In the dataimport.properties file, there is the timestamp. #Thu Dec 04 15:36:22 EST 2008 last_index_time=2008-12-04 15\:36\:20 I am using Oracle (10g) and would like to know which timestamp format I have to use in Oracle. Thanks, Jae
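For reference, the sample value in this post looks like the pattern yyyy-MM-dd HH:mm:ss, and the backslashes before the colons are ordinary java.util.Properties escaping. The sketch below is only an illustration under that assumption: the class name LastIndexTime and both helper methods are hypothetical, and the pattern is inferred from the one sample value, not taken from the DataImportHandler documentation. It reads the property and builds an Oracle TO_DATE expression with the matching format mask.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Properties;

// Hypothetical helper, not part of Solr or DataImportHandler.
final class LastIndexTime {
    // Assumption: pattern inferred from the sample value "2008-12-04 15:36:20".
    static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Loads properties-file text and parses last_index_time.
    // Properties.load() turns the escaped "\:" back into ":".
    static LocalDateTime parse(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return LocalDateTime.parse(props.getProperty("last_index_time"), FORMAT);
    }

    // Builds an Oracle expression; HH24/MI are Oracle's 24-hour and minute masks.
    static String oracleLiteral(LocalDateTime t) {
        return "TO_DATE('" + FORMAT.format(t) + "', 'YYYY-MM-DD HH24:MI:SS')";
    }

    public static void main(String[] args) {
        String text = "#Thu Dec 04 15:36:22 EST 2008\n"
                + "last_index_time=2008-12-04 15\\:36\\:20\n";
        System.out.println(oracleLiteral(parse(text)));
    }
}
```

In a real query a bind variable would usually be preferable to string concatenation; the concatenation here only makes the format mask visible.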

Re: JSONResponseWriter bug ? (solr-1.3)

2008-12-05 Thread Yonik Seeley
Thanks for the report Grégoire, it definitely looks like a bug. Would you mind opening a JIRA issue for this? -Yonik On Fri, Dec 5, 2008 at 6:26 AM, Grégoire Neuville [EMAIL PROTECTED] wrote: Hi, I think I've discovered a bug with the JSONResponseWriter : starting from the following query -

Re: Solr on Solaris

2008-12-05 Thread Jae Joo
I have had the same experience. What is the CPU in the Solaris box? It does not depend on the operating system (Linux or Solaris). It depends on the CPU (Intel or SPARC). I don't know why, but based on my performance test, a SPARC machine requires MORE memory for a Java application. Jae On Thu, Dec 4,

RE: Solr on Solaris

2008-12-05 Thread Kashyap, Raghu
Jon, What do you mean by off a Zone? Please clarify -Raghu -Original Message- From: Jon Baer [mailto:[EMAIL PROTECTED] Sent: Thursday, December 04, 2008 9:56 PM To: solr-user@lucene.apache.org Subject: Re: Solr on Solaris Just curious, is this off a zone by any chance? - Jon On Dec

RE: Solr on Solaris

2008-12-05 Thread Kashyap, Raghu
Hi Jae, It's an Intel-based CPU. -Raghu -Original Message- From: Jae Joo [mailto:[EMAIL PROTECTED] Sent: Friday, December 05, 2008 9:53 AM To: solr-user@lucene.apache.org Subject: Re: Solr on Solaris I do have same experience. What is the CPU in the Solaris box? it is not depending on

Re: new faceting algorithm

2008-12-05 Thread Rob Casson
very similar situation to those already reported. 2.9M bibliographic records, with authors being the (previous) bottleneck, and the one we're starting to test with the new algorithm. so far, no load tests, but just in single requests i'm seeing the same improvements...phenomenal improvements,

Re: new faceting algorithm

2008-12-05 Thread Koji Sekiguchi
Peter, It is UnInvertedField class. See also: https://issues.apache.org/jira/browse/SOLR-475 Peter Keegan wrote: Hi Yonik, May I ask in which class(es) this improvement was made? I've been using the DocSet, DocList, BitDocSet, HashDocSet from Solr from a few years ago with a Lucene based

RE: Russian stopwords

2008-12-05 Thread Steven A Rowe
Hi Tushar, On 12/05/2008 at 5:18 AM, tushar kapoor wrote: I am trying to filter russian stopwords but have not been successful with that. [...] <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/> <filter class="solr.SynonymFilterFactory"

Re: Solr on Solaris

2008-12-05 Thread Jon Baer
Are you running Solr in a container? More specifically, I've had a few issues w/ zones in the past and Solr (I believe there are some networking issues w/ older Solaris versions) ... They are basically where you can slice (virtualize) your resources and divide a box up into something similar

Re: Merging Indices

2008-12-05 Thread Shalin Shekhar Mangar
On Fri, Dec 5, 2008 at 5:09 AM, ashokc [EMAIL PROTECTED] wrote: The SOLR wiki says 3. Make sure both indexes you want to merge are closed. What exactly does 'closed' mean? I think that would mean that the IndexReader and IndexWriter on that index are closed. 1. Do I need to stop SOLR

Re: DataImportHandler - time stamp format in

2008-12-05 Thread Noble Paul നോബിള്‍ नोब्ळ्
I guess you are trying to pass it in the SQL query. Try it as it is. If Oracle does not take it, you can format the date according to what Oracle likes http://wiki.apache.org/solr/DataImportHandler#head-5675e913396a42eb7c6c5d3c894ada5dadbb62d7 On Fri, Dec 5, 2008 at 8:09 PM, Jae Joo [EMAIL

Re: Can Solr follow links?

2008-12-05 Thread Noble Paul നോബിള്‍ नोब्ळ्
Look at http://wiki.apache.org/solr/DataImportHandler You may use an outer entity with SqlEntityProcessor and an inner entity with XPathEntityProcessor On Fri, Dec 5, 2008 at 5:35 PM, Joel Karlsson [EMAIL PROTECTED] wrote: Hello, Is there any way for Solr to follow links stored in my

Re: Merging Indices

2008-12-05 Thread Yonik Seeley
On Thu, Dec 4, 2008 at 6:39 PM, ashokc [EMAIL PROTECTED] wrote: The SOLR wiki says 3. Make sure both indexes you want to merge are closed. What exactly does 'closed' mean? If you do a commit, and then prevent updates, the index should be closed (no open IndexWriter). 1. Do I need to stop

Re: Merging Indices

2008-12-05 Thread ashokc
Thanks for the help, Yonik & Shalin. It really makes it easy for me if I do not have to stop/start the SOLR app during the merge operations. The reason I have to do this many times a day is that I am implementing a simple-minded entity-extraction procedure for the content I am indexing. I have a

RE: Solr on Solaris

2008-12-05 Thread Kashyap, Raghu
Jon, We are running under tomcat. Thanks for the link I will check it out -Raghu -Original Message- From: Jon Baer [mailto:[EMAIL PROTECTED] Sent: Friday, December 05, 2008 10:57 AM To: solr-user@lucene.apache.org Subject: Re: Solr on Solaris Are you running Solr in a container more

Re: IOException: Mark invalid while analyzing HTML

2008-12-05 Thread Dean Thompson
Was this one ever addressed? I'm seeing it in some small percentage of the documents that I index in 1.4-dev 708596M. I don't see a corresponding JIRA issue. James Brady-3 wrote: Hi, I'm seeing a problem mentioned in Solr-42, Highlighting problems with

Re: Solr on Solaris

2008-12-05 Thread Jeryl Cook
you're out of memory :). each instance of an application server you can technically only allocate like 1024mb to the JVM, to take advantage of the memory you need to run multiple instances of the application server. are you using RAMDirectory with SOLR? On Thu, Dec 4, 2008 at 10:40 PM, Kashyap,

getting xml out of a SolrDocument ?

2008-12-05 Thread Dan Robin
I am using solrj to query solr and the QueryResponse.getResults() returns a SolrDocumentList. There is a SolrDocument in the list with the results I want. The problem is that I want to view these results as XML. How can I get the SolrDocument to give me XML? Thanks in advance. -Dan --

Re: Solr on Solaris

2008-12-05 Thread Glen Newton
When you say application server, do you mean Tomcat? If yes, I have allocated 8GB of heap to Tomcat and it uses it all no problem (64 bit Intel/64 bit Java). -glen 2008/12/5 Jeryl Cook [EMAIL PROTECTED]: you're out of memory :). each instance of an application server you can technically

Re: Stemmer vs. exact match

2008-12-05 Thread Grant Ingersoll
On Dec 4, 2008, at 8:19 PM, Jonathan Ariel wrote: Hi! I'm wondering what Solr is really doing with the exact word vs. the stemmed word. So for example I have 2 documents. The first one has in the title the word "convertible". The second one has "convert". When Solr stems the titles, both will be

Re: getting xml out of a SolrDocument ?

2008-12-05 Thread Erik Hatcher
I'd somehow pass through Solr's XML response, or perhaps consider using Solr's XSLT response writer to convert to the format you want. I don't have the magic incantation handy, but it should be possible to make a request through SolrJ and get the raw response string back in whatever

Smaller filterCache giving better performance

2008-12-05 Thread wojtekpia
I've seen some strange results in the last few days of testing, but this one flies in the face of everything I've read on this forum: Reducing filterCache size has increased performance. I have posted my setup here: http://www.nabble.com/Throughput-Optimization-td20335132.html. My original

Re: Smaller filterCache giving better performance

2008-12-05 Thread Mike Klaas
On 5-Dec-08, at 2:24 PM, wojtekpia wrote: I've seen some strange results in the last few days of testing, but this one flies in the face of everything I've read on this forum: Reducing filterCache size has increased performance. This isn't really unexpected behaviour. The problem with a

Re: getting xml out of a SolrDocument ?

2008-12-05 Thread Yonik Seeley
On Fri, Dec 5, 2008 at 5:24 PM, Erik Hatcher [EMAIL PROTECTED] wrote: I'd somehow pass through Solr's XML response, or perhaps consider using Solr's XSLT response writer to convert to the format you want. I don't have the magic incantation handy, but it should be possible to make a request

Re: Smaller filterCache giving better performance

2008-12-05 Thread Yonik Seeley
On Fri, Dec 5, 2008 at 5:24 PM, wojtekpia [EMAIL PROTECTED] wrote: I've seen some strange results in the last few days of testing, but this one flies in the face of everything I've read on this forum: Reducing filterCache size has increased performance. I have posted my setup here:

Re: Smaller filterCache giving better performance

2008-12-05 Thread wojtekpia
Reducing the amount of memory given to Java slowed down Solr at first, then quickly caused the garbage collector to behave badly (same issue as I referenced above). I am using the concurrent cache for all my caches. -- View this message in context:

Re: Dealing with field values as key/value pairs

2008-12-05 Thread Chris Hostetter
: So i'm basically looking for design pattern/best practice for that scenario : based on people's experience. I've taken two approaches in the past... 1) encode the id and the label in the field value; facet on it; require clients to know how to decode. This works really well for simple
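Approach 1 above (encode the id and the label in one field value, facet on it, and have clients decode) might look roughly like the sketch below. This is an illustration only: the class FacetValueCodec, the '|' delimiter, and the rule that ids never contain the delimiter are all assumptions made for this example, not details from the post.

```java
// Hypothetical codec for packing an id and a display label into one
// facet field value; not part of Solr or SolrJ.
final class FacetValueCodec {
    // Assumption: ids never contain this character, so the first
    // occurrence always separates id from label.
    static final char SEPARATOR = '|';

    // Value to index in the faceted field, e.g. "42|Acme Corp".
    static String encode(String id, String label) {
        if (id.indexOf(SEPARATOR) >= 0) {
            throw new IllegalArgumentException("id must not contain " + SEPARATOR);
        }
        return id + SEPARATOR + label;
    }

    // Splits a facet value returned by the server back into {id, label}.
    static String[] decode(String value) {
        int cut = value.indexOf(SEPARATOR);
        if (cut < 0) {
            throw new IllegalArgumentException("not an encoded value: " + value);
        }
        return new String[] { value.substring(0, cut), value.substring(cut + 1) };
    }

    public static void main(String[] args) {
        String indexed = encode("42", "Acme | Sons");
        String[] parts = decode(indexed);
        System.out.println(parts[0] + " -> " + parts[1]);
    }
}
```

Splitting on the first separator only means labels may contain the delimiter freely, which is why the constraint is placed on the id and not on the label.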

Re: Ordering updates

2008-12-05 Thread Shalin Shekhar Mangar
On Fri, Dec 5, 2008 at 5:40 AM, Laurence Rowe [EMAIL PROTECTED] wrote: 2008/12/4 Shalin Shekhar Mangar [EMAIL PROTECTED]: I think we have a slight misunderstanding here. Because there are many CMS processes it is possible that the same document will be updated concurrently (from different