Re: wana use CJKAnalyzer

2006-09-22 Thread James liu
On 2006/9/23, Walter Underwood <[EMAIL PROTECTED]> wrote:
> On 9/21/06 5:37 PM, "James liu" <[EMAIL PROTECTED]> wrote:
>> Yes, it's working. The root of my problem was that the XML must be encoded in UTF-8.
>> If you use PHP, it's not about the web browser; just note that the curl header must specify UTF-8.
>> If you use post.sh, the XML mu
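
James's fix comes down to two things: the bytes on the wire must really be UTF-8, and the request header must say so. Here is a minimal Python sketch of the same idea. It assumes Solr's XML update handler is at `http://localhost:8983/solr/update` (the example server's address; adjust for your install). The helpers `build_update_request` and `post_update` are hypothetical and not part of any Solr client library.

```python
import urllib.request

# Assumption: Solr's XML update handler at the example server's default URL.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_update_request(xml_text, url=SOLR_UPDATE_URL):
    """Encode an XML update message as UTF-8 and declare that charset in the header."""
    body = xml_text.encode("utf-8")  # the bytes actually sent are UTF-8
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},  # header matches the bytes
        method="POST",
    )


def post_update(xml_text, url=SOLR_UPDATE_URL):
    """Send the update and return Solr's response body (needs a running server)."""
    with urllib.request.urlopen(build_update_request(xml_text, url)) as resp:
        return resp.read().decode("utf-8")
```

Building the request separately from sending it means the encoding and header can be checked without a running server, for example on a document containing CJK text.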

Re: Simple Faceted Searching out of the box

2006-09-22 Thread Tim Archambault
Amen, Hoss. I appreciated you explaining it in terms I can understand ("jobs"); it makes it easier for me to learn. What you are saying is right on with what I'm trying to understand. Right now I have simple Lucene indexes that are basically re-created once daily, and that simply isn't doing the job

Re: relational design in solr?

2006-09-22 Thread Chris Hostetter
: The best example I can think of is a resume database. You could certainly just put the whole resume document into the text index and do full-text searches. But to answer the question of which people received a Harvard MBA in the last 10 years and have worked at Intel in the last 5 yea
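
One common way to get relational data like the resume example into a flat index is to denormalize each person into a single document with multi-valued fields. The minimal sketch below uses hypothetical field names (`school`, `degree`, `school_degree`, `employer`). It also shows the usual pitfall: separate multi-valued fields lose track of which school granted which degree, so a composite field keeps each pair together. Date-range constraints such as "in the last 10 years" are not handled here.

```python
def flatten_resume(person, degrees, jobs):
    """Denormalize one person's relational rows into a single flat document.

    person:  dict with "id" and "name" (hypothetical columns)
    degrees: list of dicts with "school" and "degree"
    jobs:    list of dicts with "employer"
    """
    return {
        "id": person["id"],
        "name": person["name"],
        # Independent multi-valued fields: easy to search, but the pairing is lost.
        "school": [d["school"] for d in degrees],
        "degree": [d["degree"] for d in degrees],
        # Composite tokens such as "Harvard|MBA" keep school and degree together.
        "school_degree": [f'{d["school"]}|{d["degree"]}' for d in degrees],
        "employer": [j["employer"] for j in jobs],
    }
```

A person with a Harvard BA and a Stanford MBA would match `school:Harvard AND degree:MBA` on the separate fields, but not `school_degree:"Harvard|MBA"`.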

Re: Simple Faceted Searching out of the box

2006-09-22 Thread Chris Hostetter
: I've been talking with other papers about Solr, and I think what bothers many is that there is a deposit of information in a structured database here [named A], then we have another set of basically the same data over here [named B], and they don't understand why they have to manage two dif

Re: Simple Faceted Searching out of the box

2006-09-22 Thread J.J. Larrea
Regarding XML databases, there is an excellent open-source XML database, 'eXist', which currently uses indexes to speed up both structure-based and content-based retrieval via XQuery; there are plans on their development roadmap to replace parts of the indexing mechanism, particularly full-text ana

Re: Facet performance with heterogeneous 'facets'?

2006-09-22 Thread Yonik Seeley
On 9/22/06, Michael Imbeault <[EMAIL PROTECTED]> wrote:
> Excellent news; as you guessed, my schema was (for some reason) set to version 1.0.
Yeah, I just realized that having "version" right next to "name" would lead people to think it's "their" version number, when it's really Solr's version nu

Re: Facet performance with heterogeneous 'facets'?

2006-09-22 Thread Michael Imbeault
Excellent news; as you guessed, my schema was (for some reason) set to version 1.0. This also caused some of the problems I had with the original SolrPHP (parsing the wrong response). But better yet, the 800-second query now runs in 0.5 to 2 seconds! Amazing optimization! I can now do face

Re: Simple Faceted Searching out of the box

2006-09-22 Thread Tim Archambault
Okay. We are all on the same page. I just don't express myself as well in "programming speak" yet. I'm going to read up on Otis' "Lucene in Action" tonight. I'd swear he had an example of how to inject records into a Lucene index using Java and SQL. Maybe I'm wrong, though. On 9/22/06, Walter U

Re: Simple Faceted Searching out of the box

2006-09-22 Thread Joachim Martin
I think you will find that this architecture is quite common. What commercial packages provide (remember you are getting this for free!) are the tools for managing the dynamic export of data out of your database into the full-text search engine. Solr provides a very easy way to do this, but ye

Re: Simple Faceted Searching out of the box

2006-09-22 Thread Walter Underwood
Sorry, I was not being exact with "store". Lucene has separate control over whether the value of a field is stored and whether it is indexed. The term "nurse" might be searchable, but the only value that is stored in the index for retrieval is the database key for each matching job. It seems like
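
Walter's pattern (index the searchable text, store only the database key, then load the full records from the database) might look like this on the application side. This sketch assumes a hypothetical `jobs` table with `id`, `title`, and `body` columns and uses SQLite only so the example is self-contained. The list of IDs stands in for what a search would return.

```python
import sqlite3


def fetch_jobs_by_ids(conn, ids):
    """Load full job rows for the IDs a search returned, keeping the search's rank order."""
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, title, body FROM jobs WHERE id IN ({placeholders})", list(ids)
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    # Preserve relevance order; silently skip IDs deleted from the DB since indexing.
    return [by_id[i] for i in ids if i in by_id]
```

The database stays the system of record for display. The index only has to answer "which keys match `nurse`", in ranked order.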

Re: Simple Faceted Searching out of the box

2006-09-22 Thread Yonik Seeley
On 9/22/06, Tim Archambault <[EMAIL PROTECTED]> wrote:
> I've been talking with other papers about Solr, and I think what bothers many is that there is a deposit of information in a structured database here [named A], then we have another set of basically the same data over here [named B], and they

Re: Simple Faceted Searching out of the box

2006-09-22 Thread Tim Archambault
I'm really confused. I don't mean "store" the data in the technical sense of a Lucene/Solr command. Storing an ID number in a Solr index isn't going to help a user find "nurse". I think part of this is that some people feel that databases like MSSQL and MySQL should be able to provide a quality search experie

Re: Simple Faceted Searching out of the box

2006-09-22 Thread Walter Underwood
On 9/22/06 12:25 PM, "Tim Archambault" <[EMAIL PROTECTED]> wrote:
> A recruitment (jobs) customer goes onto our website and posts an online job
> posting to our newspaper website. Upon insert into the database, I need to
> generate an xml file to be sent to SOLR to ADD as a record to the search

Re: Simple Faceted Searching out of the box

2006-09-22 Thread Tim Archambault
Okay, I'll use an example. A recruitment (jobs) customer goes onto our website and posts an online job posting to our newspaper website. Upon insert into the database, I need to generate an XML file to be sent to SOLR to ADD as a record to the search engine. The same goes for an edit: my database u
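
The insert/edit/delete flow maps onto Solr's XML update messages: `<add>` containing `<doc>` and `<field name="...">` elements, and `<delete><id>...</id></delete>`. This minimal sketch assumes the schema's unique key field is `id`. With a unique key defined, re-adding a document with the same ID replaces the old one, so an edit can reuse the add message. The function names are hypothetical. A `<commit/>` message still has to be sent before the changes become visible to searchers.

```python
import xml.etree.ElementTree as ET


def add_xml(job):
    """Build a Solr <add> message for one job posting (used for both insert and edit).

    job: dict of field name -> value; a list value becomes a multi-valued field.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in job.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(v)  # ElementTree escapes &, <, > for us
    return ET.tostring(add, encoding="unicode")


def delete_xml(job_id):
    """Build a Solr <delete> message for the posting with this unique key."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = str(job_id)
    return ET.tostring(delete, encoding="unicode")
```

Using an XML library instead of string concatenation keeps user-entered text such as "R&D" from producing malformed messages.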

Re: Simple Faceted Searching out of the box

2006-09-22 Thread Erik Hatcher
On Sep 22, 2006, at 2:45 PM, Tim Archambault wrote:
> I believe there's a way to access MSSQL, MySQL etc. directly with Lucene, but not sure how to do this with SOLR.
Nope. Lucene is a pure search engine, with no hooks to databases, or document parsers, etc. Lots of folks have built these
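
Since Lucene (and Solr) will not pull from the database, the application has to push rows in. A single `<add>` message can carry several `<doc>` elements, so a batch of rows can go in one request instead of one request per row. This minimal sketch uses SQLite as a stand-in for MSSQL or MySQL. The table, columns, and function name are hypothetical, and it assumes the SQL column names match the Solr field names.

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_rows_as_add(conn, query):
    """Turn every row of a SQL query into one <doc> inside a single Solr <add> message."""
    cursor = conn.execute(query)
    columns = [c[0] for c in cursor.description]  # column names become field names
    add = ET.Element("add")
    for row in cursor:
        doc = ET.SubElement(add, "doc")
        for name, value in zip(columns, row):
            if value is None:
                continue  # omit NULL columns rather than indexing the string "None"
            ET.SubElement(doc, "field", name=name).text = str(value)
    return ET.tostring(add, encoding="unicode")
```

The same function can serve a nightly full rebuild (`SELECT ... FROM jobs`) or an incremental sync restricted to recently changed rows.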

Re: Simple Faceted Searching out of the box

2006-09-22 Thread Tim Archambault
Obvious data sources: MSSQL, MySQL, etc. I'm under the impression that I have to send an XML request to SOLR for every add, update, delete, etc. in my database. I believe there's a way to access MSSQL, MySQL, etc. directly with Lucene, but I'm not sure how to do this with SOLR. Thanks for all your fee

Re: Extending Solr's Admin functionality

2006-09-22 Thread Chris Hostetter
: I may need to add functionality to Solr's admin pages. The functionality that I'm looking to add is the ability to trigger certain indexing functions and monitor their progress. I'm wondering if people have thoughts about the best way to do this. Here are my initial ideas:
:
: 1. Add ad

Re: wana use CJKAnalyzer

2006-09-22 Thread Walter Underwood
On 9/22/06 10:22 AM, "Yonik Seeley" <[EMAIL PROTECTED]> wrote:
> What I think might be ideal: If there is a charset definition, then
> let the servlet handle it by requesting a Reader. If there isn't
> a charset definition, request a byte-oriented InputStream from the
> container and let the XML
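
The proposal splits on whether the Content-Type carries a charset. Here is one reading of it as a Python sketch. If a charset is declared, decode the bytes with it. If not, hand the raw bytes to the XML parser, which can use the XML declaration or a byte-order mark to choose the encoding. That second branch is an assumption about where the truncated sentence goes. Note that RFC 3023 defaults `text/xml` without a charset to US-ASCII, which this sketch deliberately does not follow. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def charset_from_content_type(content_type):
    """Return the charset parameter of a Content-Type header, or None if absent."""
    for param in content_type.split(";")[1:]:
        key, _, value = param.strip().partition("=")
        if key.strip().lower() == "charset" and value:
            return value.strip().strip('"')
    return None


def parse_update_body(content_type, body):
    """Parse an XML update body (bytes), honoring an explicit charset if one is given."""
    charset = charset_from_content_type(content_type)
    if charset is not None:
        # Explicit charset: decode to text first (the character-stream path).
        return ET.fromstring(body.decode(charset))
    # No charset: give the parser raw bytes; it reads the XML declaration or
    # byte-order mark to pick the encoding, falling back to UTF-8.
    return ET.fromstring(body)
```

With this split, a Latin-1 file sent with no header still parses correctly as long as its XML declaration names the encoding.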

Re: wana use CJKAnalyzer

2006-09-22 Thread Yonik Seeley
On 9/22/06, Walter Underwood <[EMAIL PROTECTED]> wrote:
> This might be a Solr bug. Solr should be able to accept XML in any of the required encodings (ASCII, Latin 1, UTF-8, and UTF-16). Getting XML content types exactly right is tricky; see RFC 3023.
Right now Solr pays attention to Content-typ

Re: wana use CJKAnalyzer

2006-09-22 Thread Walter Underwood
On 9/21/06 5:37 PM, "James liu" <[EMAIL PROTECTED]> wrote:
> Yes, it's working. The root of my problem was that the XML must be encoded in UTF-8.
> If you use PHP, it's not about the web browser; just note that the curl header must specify UTF-8.
> If you use post.sh, the XML must be encoded in UTF-8. (My EditPlus default e

Re: Simple Faceted Searching out of the box

2006-09-22 Thread Yonik Seeley
On 9/22/06, Tim Archambault <[EMAIL PROTECTED]> wrote:
> I have a couple of questions from some online newspaper folks who are interested in Solr and are trying to understand how and why it came to be. I think inherent in these questions is the underlying theme I hear all the time, and that is "Solr

Re: relational design in solr?

2006-09-22 Thread Joachim Martin
Chris, I think what I am trying to do is actually much simpler than what you are talking about here. I do plan on returning document IDs and retrieving full entity data from the database; Solr would just be used for the search, not for results display. The problem is that some data cannot be

Re: Facet performance with heterogeneous 'facets'?

2006-09-22 Thread Yonik Seeley
On 9/22/06, Michael Imbeault <[EMAIL PROTECTED]> wrote:
> I upgraded to the most recent Solr build (9-22) and sadly it's still really slow: an 800-second query with a single facet on first_author, 15 million documents total, and the query returns 180. Maybe I'm doing something wrong? Also, this is on my
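
For reference, a faceted request like this one is just a few parameters on the select URL. This sketch assumes the example server's select handler at `http://localhost:8983/solr/select`. `facet`, `facet.field`, and `facet.limit` are Solr's simple faceting parameters. The query value is hypothetical, and `rows=0` is used because only the counts are wanted. The helper name `facet_query_url` is also hypothetical.

```python
from urllib.parse import urlencode

# Assumption: Solr's select handler at the example server's default URL.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def facet_query_url(query, facet_field, limit=10, base=SOLR_SELECT_URL):
    """Build a select URL that returns only the top facet counts for one field."""
    params = [
        ("q", query),
        ("rows", 0),                  # skip the documents; only facet counts are wanted
        ("facet", "true"),            # turn faceting on
        ("facet.field", facet_field), # e.g. first_author
        ("facet.limit", limit),       # top N values by count
    ]
    return base + "?" + urlencode(params)
```

Keeping `rows=0` and a modest `facet.limit` makes the response small, but how fast the facet counts come back depends on the server side, not on the URL.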

Re: Simple Faceted Searching out of the box

2006-09-22 Thread Tim Archambault
I have a couple of questions from some online newspaper folks who are interested in Solr and are trying to understand how and why it came to be. I think inherent in these questions is the underlying theme I hear all the time, and that is "Solr is not a content management system. It's a search engin