Re: solr setup

2006-03-20 Thread Yonik Seeley
Caused by: java.lang.RuntimeException: Can't find resource solrconfig.xml Hmmm, we've been moving around the config directory lately... what version are you working off of? Check that the example directory has ./solrconf in it. Then check that there is a ./solrconf/ from wherever you are

Re: Multiple indices

2006-03-21 Thread Yonik Seeley
On 3/21/06, Grant Ingersoll [EMAIL PROTECTED] wrote: I was wondering if it is possible to have one SOLR instance host multiple indices? Otherwise, I would need to deploy a separate WAR for every SOLR instance I want, correct? It's not currently possible. A fair amount would have to change

Re: SEVERE: java.lang.OutOfMemoryError: Java heap space

2006-03-22 Thread Yonik Seeley
On 3/22/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Occasionally when inserting I get the error message SEVERE: java.lang.OutOfMemoryError: Java heap space Any clues how to track down when/where it's happening? Or any good way I can get better clues how to track it down? What's the heap

Re: What is proper way to re-init index?

2006-03-27 Thread Yonik Seeley
Hi John, The error message undefined field form means Solr doesn't know about the form field. Did you copy your schema.xml to the example/solrconf directory and restart the app server? I tried your schema and doc, and didn't get the error you did. I got an error further down due to an invalid

Re: What is proper way to re-init index?

2006-03-27 Thread Yonik Seeley
, and that you restarted the server so it would be re-read? -Yonik -Original Message- From: Yonik Seeley [mailto:[EMAIL PROTECTED] Sent: Monday, March 27, 2006 10:17 AM To: solr-user@lucene.apache.org Subject: Re: What is proper way to re-init index? Hi John, The error message undefined

Re: solr setup

2006-03-28 Thread Yonik Seeley
It might be easier to download a recent Tomcat 5.5 distribution and get it working with that first... then try with the bundled version of Tomcat once you understand how everything works. Thanks Yonik, maybe I should try that, though I now think that the configuration is not the main

Re: faceted browsing

2006-03-29 Thread Yonik Seeley
On 3/29/06, Clay Webster [EMAIL PROTECTED] wrote: How could faceted browsing be accomplished without [Chris's] metadata documents? The most basic form: consider if a field called category existed on each document. You could then ask for the counts of the top 10 values in category field for all
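
As a rough illustration of the "counts of the top 10 values" idea, here is a minimal Python sketch that tallies the values of one field over a set of matching documents. It is not how Solr computes facets internally; the function name `top_facet_values` and the sample documents are made up for the example.

```python
from collections import Counter

def top_facet_values(docs, field, limit=10):
    """Count how often each value of `field` occurs and return the top `limit`.

    `docs` is a plain list of dicts standing in for the documents that
    matched a query (an assumption made for this sketch).
    """
    counts = Counter(doc[field] for doc in docs if field in doc)
    return counts.most_common(limit)

# Hypothetical documents; only some of them carry a "category" field.
docs = [
    {"id": 1, "category": "camera"},
    {"id": 2, "category": "camera"},
    {"id": 3, "category": "printer"},
    {"id": 4},
]
print(top_facet_values(docs, "category"))
```

Documents without the field are simply skipped, so they contribute to no count.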

Re: highlighting

2006-04-04 Thread Yonik Seeley
It's probably best to focus on the ideal interface first (query parameters as input format, and desired XML output format). We might also want to keep termvectors in mind when thinking about this stuff... seems like they are related (per-field optional/extra data). -Yonik

Re: Solr Multisearcher

2006-04-05 Thread Yonik Seeley
On 4/5/06, Chris Hostetter [EMAIL PROTECTED] wrote: but the first step would probably be to provide the same level of functionality MultiSearcher Ahh, I was thinking the first step would be to try and use MultiSearcher via RemoteSearcher/RemoteSearchable. -Yonik

Re: Run solr on windows with IIS

2006-04-07 Thread Yonik Seeley
On 4/7/06, Mike Austin [EMAIL PROTECTED] wrote: When is the replication part done or what is it used for? I need to get more familiar with that. It's not builtin to Solr, and it's only needed if you want a single master Solr instance that you update, and automated copying of the index that

Re: Run solr on windows with IIS

2006-04-07 Thread Yonik Seeley
It's not builtin to Solr, I should clarify that part... index replication/distribution is a part of Solr, but it's *very* loosely coupled (and not enabled or set-up by default). So you can come up with alternate ways of doing it if you need multiple searchers for high-availability or traffic

Re: highlighting

2006-04-18 Thread Yonik Seeley
On 4/18/06, Chris Hostetter [EMAIL PROTECTED] wrote: To add to that: when thinking about how clients will specify what extra info they want we should consider not only external clients using HTTP and the StandardRequestHandler, but also what the internal API looks like for people wanting to

Re: Solr is indexing XML only?

2006-04-27 Thread Yonik Seeley
On 4/27/06, David Trattnig [EMAIL PROTECTED] wrote: thank you so much! Could you also explain me how to use these two Tokenizers? Here's the HTMLStrip tokenizer description: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-031d5d370010955fdcc529d208395cd556f4a73e Read through

Re: Adding xml to SolrQueryResponse

2006-05-01 Thread Yonik Seeley
On 5/1/06, Mike Austin [EMAIL PROTECTED] wrote: Is there a way to add attributes besides name to an xml node returned from SolrQueryResponse? I've looked at the SolrQueryResponse.add and it looks like a NamedList is my only option. I know that I can get by with nodes that have only the name

Re: Java heap space

2006-05-04 Thread Yonik Seeley
On 5/3/06, Yonik Seeley [EMAIL PROTECTED] wrote: I just tried sending in 100,000 deletes and it didn't cause a problem: the memory grew from 22M to 30M. Random thought: perhaps it has something to do with how you are sending your requests? Yep, I was able to reproduce a memory problem w

Re: Java heap space

2006-05-04 Thread Yonik Seeley
I verified that Tomcat 5.5.17 doesn't experience this problem. -Yonik On 5/4/06, Yonik Seeley [EMAIL PROTECTED] wrote: On 5/3/06, Yonik Seeley [EMAIL PROTECTED] wrote: I just tried sending in 100,000 deletes and it didn't cause a problem: the memory grew from 22M to 30M. Random thought

Re: One big XML file vs. many HTTP requests

2006-05-12 Thread Yonik Seeley
On 5/12/06, Michael Levy [EMAIL PROTECTED] wrote: How efficient is making a separate HTTP request per-document, when there are millions of documents? If you use persistent connections and make multiple requests in parallel, there won't be much difference from multiple docs per request.
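
For the "multiple docs per request" alternative, a minimal Python sketch of grouping documents into batches and building one XML add message per batch might look like this. The helper names `build_add_message` and `batches`, the batch size, and the sample documents are assumptions for illustration; actually posting the message to the update URL is left out.

```python
import xml.etree.ElementTree as ET

def build_add_message(docs):
    """Build a single <add> message containing one <doc> per input dict."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")

def batches(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Hypothetical documents, grouped two per request.
docs = [{"id": i, "title": "doc %d" % i} for i in range(5)]
messages = [build_add_message(batch) for batch in batches(docs, 2)]
print(len(messages))
print(messages[-1])
```

Each message would be sent as the body of one HTTP POST, so five documents become three requests here instead of five.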

Re: Java heap space

2006-05-15 Thread Yonik Seeley
On 5/15/06, Marcus Stratmann [EMAIL PROTECTED] wrote: The only situation I get OutOfMemory errors is after an optimize when the server performs an auto-warming of the caches: A single filter that is big enough to be represented as a bitset (>3000 in general) will take up 1.3MB Some ways to
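
The memory cost of a bitset filter follows from one bit per document in the index. A small Python sketch of that arithmetic, assuming a hypothetical index of ten million documents (the index size is not stated in the post):

```python
def bitset_bytes(max_doc):
    """Bytes needed for a bitset with one bit per document, rounded up."""
    return (max_doc + 7) // 8

# Hypothetical index size, chosen only to show the order of magnitude.
max_doc = 10_000_000
size = bitset_bytes(max_doc)
print(size, "bytes, about", round(size / 1_000_000, 2), "MB per cached filter")
```

Because the cost depends on the total number of documents rather than on how many match, many cached bitset filters add up quickly during auto-warming.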

Re: Documentation?

2006-05-16 Thread Yonik Seeley
On 5/16/06, Jeff Rodenburg [EMAIL PROTECTED] wrote: I was checking around the solr site and pages at apache.org and wasn't finding much. Make sure you don't miss the Wiki, there's a fair amount of stuff there. -Yonik

Re: Newbie with problems getting Solr to run on Tomcat

2006-05-16 Thread Yonik Seeley
On 5/16/06, Morten Fangel [EMAIL PROTECTED] wrote: Could someone just guide me with a step by step install of the example from solr-nightly/example into Tomcat. For development, it's probably easiest to download Tomcat Core (in tgz or zip format) from

Re: fresh start question: exception running the demo - element 'web-app' not found

2006-05-16 Thread Yonik Seeley
On 5/16/06, Doron Cohen [EMAIL PROTECTED] wrote: Btw, it seems the post.sh script has a typo - for the demo add-docs stage to work, the URL should be set by URL=http://localhost:8983/solr/update; (rather than URL=http://localhost:7070/update;). Hmmm, another victim of the downgrade. It's fixed

Re: Separate config and index per webapp

2006-05-17 Thread Yonik Seeley
/solr/conf/index (In my case I'm testing under Windows and it's C:\Tomcat 5.5\solr\data\index) Any ideas? Thanks. Mike Baranczak wrote: On May 15, 2006, at 3:26 PM, Yonik Seeley wrote: On 5/15/06, Michael Levy [EMAIL PROTECTED] wrote: I'd like to use Solr for a number of separate projects

Re: Multiple uniqueKey fields

2006-05-22 Thread Yonik Seeley
On 5/22/06, Nick Snels [EMAIL PROTECTED] wrote: is it possible to define multiple uniqueKey fields in schema.xml? Not currently, as one can normally get by with a single uniqueKey field. If you have multiple unique key fields on each document you can make a compound key. If you have multiple

Re: Multiple uniqueKey fields

2006-05-22 Thread Yonik Seeley
On 5/22/06, Nick Snels [EMAIL PROTECTED] wrote: Thanks for the answer. I don't quite get how to make a compound key (but I'll give it a try). I have multiple tables with an id, but if I put all of them in a Solr index, the id isn't a unique field. You have the different type of documents in the
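
One common way to build a compound key when several tables each have their own id is to prefix the id with the table (or document type) name. A minimal Python sketch, where the function name `compound_key`, the separator, and the table names are assumptions for illustration:

```python
def compound_key(doc_type, row_id, sep=":"):
    """Combine a document type and a per-table id into one unique key."""
    return f"{doc_type}{sep}{row_id}"

# Two rows with the same numeric id from different (hypothetical) tables.
print(compound_key("articles", 42))
print(compound_key("comments", 42))
```

The resulting string would be stored in the single uniqueKey field, so rows from different tables no longer collide.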

Re: solrconfig environment variable

2006-05-23 Thread Yonik Seeley
On 5/23/06, maustin75 [EMAIL PROTECTED] wrote: Ahh.. ok.. java -D solr.solr.home=/myhome/solr -jar start.jar - That will work. It won't if you put a space after the -D ;-) java -Dsolr.solr.home=/myhome/solr -jar start.jar -Yonik

Re: Range vs Term lookup

2006-05-30 Thread Yonik Seeley
On 5/30/06, maustin75 [EMAIL PROTECTED] wrote: I'm doing a search based on price and was wondering what the performance difference would be between these two queries: 1) +price:[0 TO 20] 2) +price:4567 Basically, to do a search with a range or pre-determine the range and do a search based on

Re: Range vs Term lookup

2006-05-30 Thread Yonik Seeley
On 5/30/06, maustin75 [EMAIL PROTECTED] wrote: The same speed if they are in Solr's cache :-) Range query will be slightly slower, but if it becomes a bottleneck or not depends on the total complexity of the queries/requests. What does the cache use as a key to determine if it is cached?

Re: Graduation, and SoC

2006-05-31 Thread Yonik Seeley
On 5/31/06, Chris Hostetter [EMAIL PROTECTED] wrote: : I also saw the two student proposals for using Solr and Lucene to index : Apache mailing lists. That project should already have started. Where : is the code for that going to go? I'd like to see how that's going. I had no idea there had

Re: solr newbie

2006-06-01 Thread Yonik Seeley
Hi Tim, Curl is a little command-line networking tool. The easiest way to get it is cygwin if you are not on a UNIX system. See the 'Requirements' section of the tutorial: 3. On Win32, cygwin, for shell support. (If you plan to use Subversion on Win32, be sure to select the subversion package

Re: solr newbie

2006-06-01 Thread Yonik Seeley
On 6/1/06, Tim Archambault [EMAIL PROTECTED] wrote: Don't understand what web category means. SH. The cygwin installer has different categories of packages... base,devel,etc. If you are looking for the curl package, it should be filed under web. It's not installed by default, so you need to

Re: !Solr

2006-06-01 Thread Yonik Seeley
Thanks for the report Karl, much appreciated. It looks like a problem with your servlet container/JVM not liking the XML entity ../../../conf/web.external.xml in the web.xml I guess the IBM JVM uses some stricter XML parsing rules or something. If you remove that from the web.xml, it should be

Re: solr newbie

2006-06-01 Thread Yonik Seeley
On 6/1/06, Tim Archambault [EMAIL PROTECTED] wrote: I'll need to install cygwin again I think. Thanks. Don't uninstall cygwin... just re-run the cygwin setup.exe and it will do incremental updates, installing packages that have changed, and allowing you to select new packages to install.

Re: solr newbie

2006-06-01 Thread Yonik Seeley
On 6/1/06, Tim Archambault [EMAIL PROTECTED] wrote: I found the web options. Thank you very much. While that is installing incrementally, two last questions. Are there any example stylesheets to review to see how the data flows into the layout? How would one go about injecting database

Re: solr newbie

2006-06-01 Thread Yonik Seeley
On 6/2/06, Darren Vengroff [EMAIL PROTECTED] wrote: I wrote just such a client within the last 24h to support load-testing Solr for my application. The client stub is simple and independent of my particular application, so it would be easy for me to contribute it if there is interest. It has

Re: stylesheet issue

2006-06-02 Thread Yonik Seeley
On 6/2/06, Tim Archambault [EMAIL PROTECTED] wrote: I've got solr installed and running, with only one failure left to date. Whenever I try to select a stylesheet for my search, I get an error message such as this: Hi Tim, There is no stylesheet :-) It's a hold-over from an old XML format

Re: stylesheet issue

2006-06-02 Thread Yonik Seeley
On 6/2/06, Tim Archambault [EMAIL PROTECTED] wrote: That'll be fine. As you can probably tell, I'm not a programmer. I am just a dangerous end-user with expertise in marketing online operations trying to save a buck. I am going to try to learn XSL or if that doesn't work, I'll bastardize the

Re: List of indexed terms for a field

2006-06-07 Thread Yonik Seeley
On 6/7/06, Paul Terray [EMAIL PROTECTED] wrote: I am trying to make an index: Is there any way to get a list of all indexed terms for a field (especially a string or text one)? Hi Paul, There isn't currently a way to do this, except perhaps writing your own custom request handler and using the

Re: embedding solr in a webapp?

2006-06-07 Thread Yonik Seeley
On 6/7/06, Joachim Martin [EMAIL PROTECTED] wrote: We are looking at running read-only solr nodes embedded in our webapp nodes. This would give us the additional features of solr over lucene, but would keep it in memory and reduce the overhead of http/xml transport of results. Looks like we

Re: Finding documents with undefined field

2006-06-07 Thread Yonik Seeley
On 6/7/06, Erik Hatcher [EMAIL PROTECTED] wrote: Solr's DocSets are a better way to go in the long run, I'm convinced - I'm just now starting to leverage them in other ways. Some random performance numbers... when I enabled HashDocSet support, performance of CNET shoppers faceted browsing

Re: OutOfMemory error while sorting

2006-06-14 Thread Yonik Seeley
On 6/14/06, Chris Hostetter [EMAIL PROTECTED] wrote: Off the top of my head, I don't remember if omitting norms for fields reduces the amount of resident memory needed by the index It does indeed. 1 byte per document for the indexed field. -Yonik
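
The "1 byte per document for the indexed field" figure makes the saving easy to estimate. A short Python sketch of the arithmetic, using a hypothetical document count and number of fields:

```python
def norms_bytes(num_docs, fields_with_norms):
    """Resident memory for norms: one byte per document per indexed field."""
    return num_docs * fields_with_norms

# Hypothetical index: 7 million documents, 5 indexed fields that keep norms.
total = norms_bytes(7_000_000, 5)
print(total, "bytes, about", total // 1_000_000, "MB")
```

Omitting norms on fields that do not need length normalization or index-time boosts removes their share of this total.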

Re: custom query response writer

2006-06-15 Thread Yonik Seeley
On 6/15/06, Erik Hatcher [EMAIL PROTECTED] wrote: Having a way to hook into the response writing by leveraging the ever improving Solr codebase and its utilities rather than copy/pasting would be a nice way to aim, I think. It's a double edged sword. Making more things public facilitates

Re: Error posting document

2006-06-20 Thread Yonik Seeley
On 6/20/06, Kerry Wilson [EMAIL PROTECTED] wrote: I am getting the following error when trying to post any document. Hi Kerry, could you provide an example document that shows this? -Yonik

Re: Using Lucene index in Solr

2006-06-21 Thread Yonik Seeley
On 6/21/06, Tricia Williams [EMAIL PROTECTED] wrote: I was wondering if there are any major differences in building an index using Lucene and Solr. If there are no substantial differences, how would one go about using an existing index created using Lucene in Solr? You can definitely do

Re: Faceted Browsing questions

2006-06-24 Thread Yonik Seeley
On 6/24/06, Erik Hatcher [EMAIL PROTECTED] wrote: This weekend :) I have imported more data than my hacked implementation can handle without bumping up Jetty's JVM heap size, so I'm now at the point where it is necessary for me to start using the LRUCache. Though I have already refactored to

Re: Splitting and matching words

2006-06-25 Thread Yonik Seeley
On 6/25/06, Eric Jain [EMAIL PROTECTED] wrote: I'd like to have PowerShot, powershot and power-shot match each other. Solr has a WordDelimiterFilter, which works quite well, except that powershot still won't match PowerShot (tokenized into power (shot powershot), so power powershot would
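
To see why PowerShot and power-shot line up with each other but not automatically with powershot, here is a toy Python sketch of splitting on hyphens and case changes and also emitting the concatenated form. It is only an approximation written for this example, not Solr's WordDelimiterFilter, and the function name `split_word` is made up.

```python
import re

def split_word(word):
    """Split on non-letters and lower-to-upper case changes, lowercase the
    parts, and append the concatenation when there is more than one part."""
    parts = [p.lower() for p in
             re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|[0-9]+", word)]
    tokens = list(parts)
    if len(parts) > 1:
        tokens.append("".join(parts))
    return tokens

for word in ("PowerShot", "power-shot", "powershot"):
    print(word, "->", split_word(word))
```

The first two inputs produce the parts plus the joined form, while plain powershot has no delimiter to split on and yields a single token, which is the asymmetry discussed in the thread.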

Re: Splitting and matching words

2006-06-25 Thread Yonik Seeley
On 6/25/06, Yonik Seeley [EMAIL PROTECTED] wrote: 1) a new QueryParser smart enough to make a boolean query instead of a MultiPhraseQuery. Power Shot OR PowerShot Thinking about this option a bit more... The problem is ambiguity. Sometimes a MultiPhraseQuery is the correct interpretation

Re: Faceted Browsing questions

2006-06-28 Thread Yonik Seeley
On 6/26/06, Chris Hostetter [EMAIL PROTECTED] wrote: : My next challenge is to re-implement the catch-all facets that I used : to do by unioning all documents in an (Open)BitSet and inverting it. : How can I invert a DocSet (I realize I can get the bits and do it : that way, but is there a

Re: ApacheCon EU slides

2006-07-11 Thread Yonik Seeley
On 7/11/06, Tim Archambault [EMAIL PROTECTED] wrote: Yonik, Thanks for the slides. Quick question. I'm looking at a new hosting provider for our newspaper website. When reviewing your High Availability slide, I see a lot of redundancy which is great, but it is not within my budget constraints

Re: Is solr scalable with respect to number of documents?

2006-07-11 Thread Yonik Seeley
On 7/11/06, Wang, Ningjun (LNG-NPV) [EMAIL PROTECTED] wrote: Is SOLR scalable with respect to number of documents? Suppose I have billions of documents that need to be indexed. I cannot store them on one single machine. I have to spread them over to several machines. Can I issue a search over

Re: including time as a factor in relevance?

2006-07-15 Thread Yonik Seeley
On 7/15/06, WHIRLYCOTT [EMAIL PROTECTED] wrote: I need to have my search result relevance influenced by time. Older things in my index are less relevant than newer things. I don't want to do a strict sort by date. Is this supported somehow by using a dismax request handler? Or if you have

Solr has integrated highlighting

2006-07-15 Thread Yonik Seeley
Those who aren't subscribed to solr-dev may be interested to know that the lucene highlighter has been integrated into Solr, for both the standard request handler, and the dismax handler. See the highlight, highlightFields, and maxSnippets params documented here:

Re: Cyrillic characters

2006-07-18 Thread Yonik Seeley
OK, let's split up the indexing side from the query side for a moment and assume that you are indexing correctly (setting the content-type correctly, etc). I just added a new value to the multi-valued features field to the solr.xml example document: Good unicode support: héllo (hello with an

Re: Cyrillic characters

2006-07-18 Thread Yonik Seeley
Definitely some Firefox bugs with UTF8 at least: If I go to the admin screen, and paste in héllo into the query box, then kill Solr and run netcat to see exactly what I get, it's the following: $ nc -l -p 8983 GET /solr/select/?stylesheet=&q=h%E9llo&version=2.1&start=0&rows=10&indent=on HTTP/1.1
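
The h%E9llo in the captured request is what percent-encoding produces when the text is encoded as Latin-1 rather than UTF-8. A small Python sketch using the standard library shows the difference; the variable names are only for illustration.

```python
from urllib.parse import quote, unquote

text = "h\u00e9llo"  # "héllo", with the accented e written as an escape

latin1_form = quote(text, encoding="latin-1")
utf8_form = quote(text, encoding="utf-8")
print(latin1_form)  # h%E9llo
print(utf8_form)    # h%C3%A9llo

# Decoding the Latin-1 form as UTF-8 cannot recover the original character:
# the invalid byte is replaced with U+FFFD.
print(unquote(latin1_form, encoding="utf-8") == text)
print(unquote(utf8_form, encoding="utf-8") == text)
```

A server that assumes UTF-8 for query strings would therefore not see the accented character when it receives the single byte %E9.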

Re: Recompilation of latest lucene seems to break update of Solr

2006-07-26 Thread Yonik Seeley
Hi Tom, I had fixed the LUCENE-545 backward incompatibility in Lucene here: http://issues.apache.org/jira/browse/LUCENE-609 Although it shouldn't be necessary, maybe it would work if you put the new lucene libs in solr's lib dir and rebuild solr? -Yonik On 7/26/06, Tom Weber [EMAIL PROTECTED]

Re: Doc add limit

2006-07-26 Thread Yonik Seeley
could have detected low memory and tried to reload the webapp. -Yonik On 7/26/06, sangraal aiken [EMAIL PROTECTED] wrote: Thanks for your help Yonik, I've responded to your questions below: On 7/26/06, Yonik Seeley [EMAIL PROTECTED] wrote: It's possible it's not hanging, but just takes a long time

Re: Doc add limit

2006-07-26 Thread Yonik Seeley
updates while the server is hung... weird I know. Thanks for all your help, I'll send a post if/when I find a solution. -S On 7/26/06, Yonik Seeley [EMAIL PROTECTED] wrote: Tomcat problem, or a Solr problem that is only manifesting on your platform, or a JVM or libc problem, or even a client

Solr's JSON, Python, Ruby output format

2006-07-27 Thread Yonik Seeley
Solr now has a JSON response format, in addition to Python and Ruby versions that can be directly eval'd. http://wiki.apache.org/solr/SolJSON -Yonik

Re: Doc add limit

2006-07-27 Thread Yonik Seeley
On 7/26/06, sangraal aiken [EMAIL PROTECTED] wrote: I removed everything from the Add xml so the docs looked like this: <doc> <field name="id">187880</field> </doc> <doc> <field name="id">187852</field> </doc> and it still hung at 6,144... Maybe you can try the following simple Python client to try and rule out

Re: Doc add limit

2006-07-27 Thread Yonik Seeley
You might also try the Java update client here: http://issues.apache.org/jira/browse/SOLR-20 -Yonik

Re: Doc add limit

2006-07-27 Thread Yonik Seeley
On 7/27/06, sangraal aiken [EMAIL PROTECTED] wrote: Commenting out the following line in SolrCore fixes my problem... but of course I don't get the result status info... but this isn't a problem for me really. -Sangraal writer.write("<result status=\"" + status + "\"></result>"); While it's possible

Re: Doc add limit

2006-07-28 Thread Yonik Seeley
It may be some sort of weird interaction with persistent connections and timeouts (both client and server have connection timeouts I assume). Does anything change if you remove your .disconnect() call (it shouldn't be needed). Do you ever see any exceptions in the client side? The code you show

Re: Doc add limit

2006-07-28 Thread Yonik Seeley
writing something multi-threaded. -Andrew Bertrand Delacretaz wrote: On 7/28/06, Yonik Seeley [EMAIL PROTECTED] wrote: ...Getting all the little details of connection handling correct can be tough... it's probably a good idea if we work toward common client libraries so everyone doesn't have

Re: Incremental updates/Sorting problem

2006-08-08 Thread Yonik Seeley
On 8/8/06, bo_b [EMAIL PROTECTED] wrote: As mentioned in another post I am trying to index a vbulletin database containing roughly 7 million posts. The very first query where I apply sorting after a full indexing, seems to take roughly <QTime>264998</QTime> ms. Subsequent searches are fast. I

Re: [ANN] Solr article on xml.com

2006-08-10 Thread Yonik Seeley
On 8/10/06, Bertrand Delacretaz [EMAIL PROTECTED] wrote: FYI, xml.com just published an article that I wrote after testing Solr in the last few weeks: Solr: Indexing XML with Lucene and REST http://www.xml.com/pub/a/2006/08/09/solr-indexing-xml-with-lucene-andrest.html It's basic stuff, but

Re: Searching with access controls

2006-08-10 Thread Yonik Seeley
On 8/10/06, Martyn Smith [EMAIL PROTECTED] wrote: I'm trying to index data in a system that implements some rather nasty access controls on the data. Basically, there are users, and communities, and users are members of the communities. Potentially a user could be a member of hundreds or even

Re: SOlr crashes

2006-08-14 Thread Yonik Seeley
On 8/14/06, Chris Hostetter [EMAIL PROTECTED] wrote: Something else to consider is using the compound file format to reduce the number of files for your index. this is mentioned in the Lucene FAQ... Yeah, although unless you have a *lot* of fields with norms, I'd sooner reduce the mergeFactor

Re: dismax and field indicators

2006-08-16 Thread Yonik Seeley
On 8/16/06, Chris Hostetter [EMAIL PROTECTED] wrote: yeah ... the dismax handler is designed this way -- it doesn't support the full syntax of the lucene QueryParser, instead it treats the text input as literal text the user is searching for, I think there is value in this type of feature... A

Re: Viewing Lucene indexes generated by Solr

2006-08-17 Thread Yonik Seeley
On 8/17/06, Ken Krugler [EMAIL PROTECTED] wrote: Hi all, I have a Lucene index generated by Solr. After optimizing it, I'm able to view it using LIMO. But when I download it to my Mac and try to use Luke, it fails - Luke complains that: .../index/_jdi4.f1 (No such file or Directory). The

Re: Possible bug in copyField

2006-08-25 Thread Yonik Seeley
On 8/25/06, jason rutherglen [EMAIL PROTECTED] wrote: When doing a copyField into a text field that is supposed to be stemmed I'm not seeing the stemming occur. How did you determine that stemming didn't occur? -Yonik

Re: Possible bug in copyField

2006-08-28 Thread Yonik Seeley
On 8/28/06, Chris Hostetter [EMAIL PROTECTED] wrote: : By looking at what is stored. Has this worked for others? the stored value of a field is always going to be the pre-analyzed text -- that's why the stored values in your text fields still have upper case characters and stop words. And
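
The distinction between the stored value and the indexed terms can be shown with a toy Python sketch. The analyzer below only lowercases and drops a few stop words; it is a stand-in invented for this example, not Solr's analysis chain, and it does no stemming.

```python
STOP_WORDS = {"the", "a", "of"}  # tiny illustrative stop list

def analyze(text):
    """Toy analysis: lowercase, split on whitespace, drop stop words."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]

def index_document(text):
    """Keep the original text as the stored value; only the indexed
    terms go through analysis."""
    return {"stored": text, "indexed_terms": analyze(text)}

doc = index_document("The Art of Running")
print(doc["stored"])         # what comes back in search results
print(doc["indexed_terms"])  # what queries are matched against
```

Looking at the stored value therefore says nothing about whether lowercasing, stop-word removal, or stemming was applied at index time.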

Re: Add doc limit - Follow Up

2006-08-30 Thread Yonik Seeley
On 8/29/06, sangraal aiken [EMAIL PROTECTED] wrote: The problem only occurs when adding docs that contain <![CDATA[]]> tags in the body of the field tag. The problem also only seems to cause an add limit on an individual post. I limited the size of my HTTP posts to 5000 documents per post, and the

Re: Error in faceted browsing

2006-09-13 Thread Yonik Seeley
On 9/13/06, Jeff Rodenburg [EMAIL PROTECTED] wrote: Thanks for the heads up on the merchant_name. I would probably just keep a dictionary in memory, but if I wanted to pull the stored merchant_name back, how would/can I do that? If you don't want merchant_name tokenized at all, just change

Re: Faceted Searching problems

2006-09-13 Thread Yonik Seeley
On 9/13/06, Erik Hatcher [EMAIL PROTECTED] wrote: Would it ever make sense to generate facets on a tokenized field? Maybe the facet implementation could throw an error if the field name specified is tokenized? I think it probably can make sense... - finding top terms in a full-text field that

Re: Faceted Searching problems

2006-09-13 Thread Yonik Seeley
On 9/13/06, Erik Hatcher [EMAIL PROTECTED] wrote: You need to use an untokenized field for facets. At least 3 answers in 5 minutes... we should try synchronized swimming ;-) -Yonik

Re: Index term list - using facets ?

2006-09-15 Thread Yonik Seeley
On 9/15/06, Yonik Seeley [EMAIL PROTECTED] wrote: On 9/15/06, Paul Terray [EMAIL PROTECTED] wrote: I think I am close to have a list of index terms, using facet searching. However, I still have a question: I would like to limit the terms to a query. My goal is to do a simple google suggest

Re: redundant Solr servers

2006-09-15 Thread Yonik Seeley
On 9/15/06, Mike Baranczak [EMAIL PROTECTED] wrote: Solr has a replication scheme built-in: http://wiki.apache.org/solr/CollectionDistribution Wow, that was easy. This looks like exactly what I need. I had somehow completely missed that. The main pain of it all is configuration (of cron,

Re: Facet performance with heterogeneous 'facets'?

2006-09-18 Thread Yonik Seeley
On 9/18/06, Michael Imbeault [EMAIL PROTECTED] wrote: Just a little follow-up - I did a little more testing, and the query takes 20 seconds no matter what - If there's one document in the results set, or if I do a query that returns all 13 documents. Yes, currently the same strategy is

Re: Facet performance with heterogeneous 'facets'?

2006-09-19 Thread Yonik Seeley
On 9/18/06, Michael Imbeault [EMAIL PROTECTED] wrote: Yonik Seeley wrote: For cases like author, if there is only one value per document, then a possible fix is to use the field cache. If there can be multiple occurrences, there doesn't seem to be a good way that preserves exact counts

Re: Facet performance with heterogeneous 'facets'?

2006-09-19 Thread Yonik Seeley
I just updated the comments in solrconfig.xml: <!-- Cache used by SolrIndexSearcher for filters (DocSets), unordered sets of *all* documents that match a query. When a new searcher is opened, its caches may be prepopulated or autowarmed using data from caches in the old

Re: strange highlighting behavior

2006-09-19 Thread Yonik Seeley
On 9/19/06, Brian Lucas [EMAIL PROTECTED] wrote: The unusual characters on <lst name="…"> are what I can't figure out, as it DEFINITELY is not the id. I've tried indexed id with integer, sint, and string all with the same result. Yes, looks like you hit a bug where you are seeing the indexed

Re: strange highlighting behavior

2006-09-19 Thread Yonik Seeley
On 9/19/06, Yonik Seeley [EMAIL PROTECTED] wrote: The fix would be to use FieldType.indexedToReadable() to convert the indexed form back to a readable form. Oops, that should be storedToReadable since the id is obtained from the stored fields, not from the index. Hmmm, a quick look

Re: strange highlighting behavior

2006-09-19 Thread Yonik Seeley
On 9/19/06, Brian Lucas [EMAIL PROTECTED] wrote: Converting to 'integer' and deleting/reindexing fixed it. Can 'sint' be used for the id with highlighting, or does one need to use integer or string for that? It should be usable (but I personally haven't tested that). If it's not, it's a bug

Re: Facet performance with heterogeneous 'facets'?

2006-09-19 Thread Yonik Seeley
On 9/19/06, Chris Hostetter [EMAIL PROTECTED] wrote: Quick Question: did you say you are faceting on the first name field separately from the last name field? ... why? You'll probably see a sharp increase in performance if you have a single untokenized author field containing the full name and

Re: wana use CJKAnalyzer

2006-09-20 Thread Yonik Seeley
On 9/20/06, James liu [EMAIL PROTECTED] wrote: My step to support CJK...: 1:add lucene-analyzers-2.0.0.jar to C:\cygwin\tmp\solr-nightly\lib 2:use cmd, cd C:\cygwin\tmp\solr-nightly,ant dist 3:copy C:\cygwin\tmp\solr-nightly\dist\solr-1.0.war to

Re: Solr Newbie question: doubts about dynamic filed

2006-09-20 Thread Yonik Seeley
On 9/20/06, Marcio Pinto Motta [EMAIL PROTECTED] wrote: I have some doubts about dynamic fields, when we add a doc with a new dynamic field, is this new dynamic field only appended to docs that will have it defined in the xml, or for every document in the index? Just for documents that the

Re: Default XML Output Schema

2006-09-21 Thread Yonik Seeley
On 9/21/06, sangraal aiken [EMAIL PROTECTED] wrote: Perhaps a silly question, but I'm wondering if anyone can tell me why solr outputs XML like this: During the initial development of Solr (2004), I remember throwing up both options, and most developers preferred to have a limited number of

Re: wana use CJKAnalyzer

2006-09-21 Thread Yonik Seeley
On 9/21/06, Chris Hostetter [EMAIL PROTECTED] wrote: : i just wanna say: no your help,maybe i will give up.thk u again. : : http://www.flickr.com/photos/[EMAIL PROTECTED]/248815068/ : thk Hoss,Nick Snels,Koji,Mike and everybody who helped me and wanna help : me.. : : i can use solr

Re: Reloading solrconfig.xml

2006-09-21 Thread Yonik Seeley
On 9/21/06, Otis Gospodnetic [EMAIL PROTECTED] wrote: What's the best way to dynamically change solrconfig.xml and have the changes take effect? Everything would need to be designed for that, and it's currently not. You might be able to reload the config, but all the classes that looked at

Re: Facet performance with heterogeneous 'facets'?

2006-09-21 Thread Yonik Seeley
On 9/21/06, Michael Imbeault [EMAIL PROTECTED] wrote: It turns out that journal_name has 17038 different tokens, which is manageable, but first_author has 400 000. I don't think this will ever yield good performance, so I might only do journal_name facets. Hang in there Michael, a fix is on

Re: Facet performance with heterogeneous 'facets'?

2006-09-21 Thread Yonik Seeley
On 9/21/06, Michael Imbeault [EMAIL PROTECTED] wrote: Btw, Any plans for a facets cache? Maybe a partial one (like caching top terms to implement some other optimizations). My general philosophy on caching in Solr has been to cache things the client can't: elemental things, or *parts* of

Re: Reloading solrconfig.xml

2006-09-21 Thread Yonik Seeley
On 9/21/06, Otis Gospodnetic [EMAIL PROTECTED] wrote: Thanks, that's actually simpler and it will work for me. Since I'm thinking of only changing mergeFactor and friends on the fly, I suppose I'd only need to modify Master's solrconfig.xml. Is this for testing or something? I could think of

Re: Facet performance with heterogeneous 'facets'?

2006-09-21 Thread Yonik Seeley
On 9/21/06, Yonik Seeley [EMAIL PROTECTED] wrote: Hang in there Michael, a fix is on the way for your scenario (and subscribe to solr-dev if you want to stay on the bleeding edge): OK, the optimization has been checked in. You can checkout from svn and build Solr, or wait for the 9-22 nightly

Re: Fixed first hits - custom RequestHandler?

2006-09-21 Thread Yonik Seeley
On 9/21/06, Otis Gospodnetic [EMAIL PROTECTED] wrote: I have a situation where I want certain documents to appear at the top of the hit list for certain searches, regardless of their score. One can think of it as the ads right on top of Google's search results (but I'm not dealing with ads).

Re: Fixed first hits - custom RequestHandler?

2006-09-21 Thread Yonik Seeley
On 9/21/06, Yonik Seeley [EMAIL PROTECTED] wrote: You could make anything with an isSpecial boolean field appear first: search_field:java; score desc, special desc Oops, that should be search_field:java; special desc, score desc score desc should be the secondary sort, or whatever you
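
The corrected sort order (special first, then score) can be mimicked on plain data with a two-part sort key. A minimal Python sketch, where the hit dictionaries and the function name `rank` are hypothetical and `special` stands in for the boolean field from the post:

```python
def rank(hits):
    """Order hits by `special` descending, then by `score` descending."""
    return sorted(hits, key=lambda h: (h["special"], h["score"]), reverse=True)

# Hypothetical hits: the special document has the lowest score.
hits = [
    {"id": "a", "special": False, "score": 0.9},
    {"id": "b", "special": True, "score": 0.2},
    {"id": "c", "special": False, "score": 0.5},
]
print([h["id"] for h in rank(hits)])
```

Putting score first instead would leave the special document at the bottom, which is the mistake the follow-up message corrects.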

Re: Simple Faceted Searching out of the box

2006-09-22 Thread Yonik Seeley
On 9/22/06, Tim Archambault [EMAIL PROTECTED] wrote: I have a couple of questions from some online newspaper folks who are interested in Solr and are trying to understand how and why it came to be. I think inherent in these questions is the underlying theme I hear all the time and that is Solr

Re: wana use CJKAnalyzer

2006-09-22 Thread Yonik Seeley
On 9/22/06, Walter Underwood [EMAIL PROTECTED] wrote: This might be a Solr bug. Solr should be able to accept XML in any of the required encodings (ASCII, Latin 1, UTF-8, and UTF-16). Getting XML content types exactly right is tricky, see RFC 3023. Right now Solr pays attention to Content-type

Re: Simple Faceted Searching out of the box

2006-09-22 Thread Yonik Seeley
On 9/22/06, Tim Archambault [EMAIL PROTECTED] wrote: I've been talking with other papers about Solr and I think what bothers many is that there is a deposit of information in a structured database here [named A], then we have another set of basically the same data over here [named B] and they

Re: Extending Solr's Admin functionality

2006-09-25 Thread Yonik Seeley
On 9/23/06, Otis Gospodnetic [EMAIL PROTECTED] wrote: How about another approach - expose all Solr admin data via HTTP/XML, just like it's done with search requests? Things like the stats page should already be XML with a stylesheet (for exactly the reasons you mention). IIRC, the XML may be

Re: Multiple schemas

2006-09-26 Thread Yonik Seeley
On 9/26/06, climbingrose [EMAIL PROTECTED] wrote: Am I right that we can only have one schema per solr server? If so, how would you deal with the issue of submitting completely different data models (such as clothes and cars)? If they have no relation, put them in separate servers or webapps.

Re: Extending Solr's Admin functionality

2006-09-26 Thread Yonik Seeley
On 9/26/06, Otis Gospodnetic [EMAIL PROTECTED] wrote: On the other hand, some people I talked to also expressed interest in JMX, so I'd encourage Simon to make that contribution. I'm also interested in JMX. It has different adapters, including an HTTP one AFAIK, but I don't know how easy it
