Re: exception while feeding converted text from pdf

2008-05-14 Thread Brian Carmalt
Hello Cam, Are you writing your XML by hand, as in no XML writer? That can cause problems. In your exception it says "latitude 59&"; the & should have been converted to '&amp;' (I think). If you can use Java 6, there is an XMLStreamWriter in javax.xml.stream that does automatic special character escaping. This
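A minimal sketch of the XMLStreamWriter suggestion above, assuming the document is posted to Solr in the usual add/doc/field update format. The class name, method name, and the field name "text" are made up for illustration; only the javax.xml.stream calls come from the JDK.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class EscapedFieldWriter {

    // Builds <add><doc><field name="...">value</field></doc></add>.
    // writeCharacters() escapes &, < and > in the value, so raw text
    // extracted from a PDF does not break the XML.
    public static String toAddXml(String fieldName, String value) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartElement("add");
            xml.writeStartElement("doc");
            xml.writeStartElement("field");
            xml.writeAttribute("name", fieldName);
            xml.writeCharacters(value);
            xml.writeEndElement(); // field
            xml.writeEndElement(); // doc
            xml.writeEndElement(); // add
            xml.flush();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write update XML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "text" is a hypothetical field name; use one from your own schema.
        System.out.println(toAddXml("text", "latitude 59& longitude 10"));
    }
}
```

The point of the sketch is that the escaping happens inside the writer, so the calling code never has to touch '&' itself.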

Fwd: [Solr Wiki] Update of "DataImportHandler" by YukiDog

2008-05-14 Thread Shalin Shekhar Mangar
Hello, If you find a problem with DataImportHandler or the wiki documentation, then please do report it back in the mailing list so that we may have a chance to verify your problem and propose solutions. It may help us improve the tool as well as the documentation. In the wiki edit below, the cha

Re: Field Grouping

2008-05-14 Thread oleg_gnatovskiy
Yes, that is the patch I am trying to get to work. It doesn't have a feature for distributed search. ryantxu wrote: > > You may want to check "field collapsing" > https://issues.apache.org/jira/browse/SOLR-236 > > There is a patch that works against 1.2, but the one for trunk needs > some wo

Re: Chinese Language + Solr

2008-05-14 Thread j . L
I don't know the cost. I know the bigger Chinese search sites use it. Many Chinese people who study and use full-text search think it is the best Chinese analyzer you can buy. Baidu (www.baidu.com) is the biggest Chinese search engine, and Google China is No. 2. Baidu does not use it (http://www.hylanda.c

Re: Chinese Language + Solr

2008-05-14 Thread Otis Gospodnetic
Out of curiosity, what's the cost (the site is in Chinese, so I can't tell :( )? BasisTech are the main people for this type of stuff. Expensive, though, I believe. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: j.L <[EMAIL PROTECTED]> > T

Re: result count query

2008-05-14 Thread Otis Gospodnetic
There is no way to know without doing the search. Using rows=0 you are really just avoiding getting the actual hits in the response. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: solr_user <[EMAIL PROTECTED]> > To: solr-user@lucene.apach
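A rough sketch of the rows=0 approach described above: the search still runs, but no documents come back, and the total is read from the numFound attribute of the result element in the XML response. The base URL is an assumption (the stock example server); the class and method names are made up for illustration, and the regex is a shortcut rather than a proper XML parse.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CountOnlyQuery {

    // Assumed location of the select handler; adjust for a real deployment.
    static final String SELECT_URL = "http://localhost:8983/solr/select";

    private static final Pattern NUM_FOUND = Pattern.compile("numFound=\"(\\d+)\"");

    // Builds a query URL that asks for zero rows, so only the count comes back.
    public static String countUrl(String query) {
        try {
            return SELECT_URL + "?q=" + URLEncoder.encode(query, "UTF-8") + "&rows=0";
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Pulls the hit count out of a response body; returns -1 if it is missing.
    public static long parseNumFound(String responseXml) {
        Matcher m = NUM_FOUND.matcher(responseXml);
        return m.find() ? Long.parseLong(m.group(1)) : -1L;
    }

    public static void main(String[] args) {
        System.out.println(countUrl("title:\"the world\""));
        String sample = "<response><result name=\"response\" numFound=\"42\" start=\"0\"/></response>";
        System.out.println(parseNumFound(sample) > 0 ? "has hits" : "no hits");
    }
}
```

Fetching the URL is left out on purpose; any HTTP client will do, and the "has any hits" check is then just parseNumFound(body) > 0.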

Re: Chinese Language + Solr

2008-05-14 Thread j . L
if u can read chinese and wanna write ur chinese-analyzer,,, maybe u can see it http://www.googlechinablog.com/2006/04/blog-post_10.html 2008/5/15 j. L <[EMAIL PROTECTED]>: > if commercial analyzers, i recommend > http://www.hylanda.com/(itis the best analyzer >

Re: Chinese Language + Solr

2008-05-14 Thread j . L
if commercial analyzers, i recommend http://www.hylanda.com/(it is the best analyzer in chinese word) On Thu, May 15, 2008 at 8:32 AM, j. L <[EMAIL PROTECTED]> wrote: > u can try je-analyzer,,,i building 17m docs search site by solr and > je-analyzer > > > On Thu, May 15, 2008 at 6:44 AM, Walter

Re: result count query

2008-05-14 Thread solr_user
Thanks Otis, Actually what I really want to do is just check whether the query is going to return any results or not. I tried the rows=0 thing and that works quite efficiently. Just wondering if there is anything even more efficient than that which will answer whether the query has any hits or

Re: Chinese Language + Solr

2008-05-14 Thread j . L
You can try je-analyzer; I am building a search site with 17M docs using Solr and je-analyzer. On Thu, May 15, 2008 at 6:44 AM, Walter Underwood <[EMAIL PROTECTED]> wrote: > N-gram works pretty well for Chinese, there are even studies to > back that up. > > Do not use the N-gram matches for highlighting. They

Re: Stop words and exact phrase

2008-05-14 Thread Walter Underwood
Sorry, I was hurrying before class (training to get a service dog). I use the DisMax handler, which can expand a query to go against multiple fields. The per-field analysis applies at both index and query time, so the exact field does not have stopwords removed. Very helpful for queries like "Being

Re: Chinese Language + Solr

2008-05-14 Thread Walter Underwood
N-gram works pretty well for Chinese, there are even studies to back that up. Do not use the N-gram matches for highlighting. They look really stupid to native speakers. wunder On 5/14/08 2:03 PM, "Otis Gospodnetic" <[EMAIL PROTECTED]> wrote: > There are no free morphological analyzers for Chin
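To make the N-gram idea concrete, here is a small sketch that splits a run of characters into overlapping two-character tokens (bigrams). This is plain Java written for illustration, not the Lucene contrib analyzer itself; the class and method names are invented, and it assumes the input is a single unbroken run of CJK text with no whitespace or punctuation handling.

```java
import java.util.ArrayList;
import java.util.List;

public class CharBigrams {

    // Returns overlapping two-character tokens: ABCD -> AB, BC, CD.
    // Works on code points so characters outside the BMP stay intact.
    public static List<String> bigrams(String text) {
        int[] cps = text.codePoints().toArray();
        List<String> grams = new ArrayList<>();
        if (cps.length == 1) {
            // A lone character is kept as its own token.
            grams.add(new String(cps, 0, 1));
        }
        for (int i = 0; i + 1 < cps.length; i++) {
            grams.add(new String(cps, i, 2));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Four Chinese characters written as Unicode escapes,
        // so the source compiles under any file encoding.
        for (String gram : bigrams("\u4e2d\u6587\u641c\u7d22")) {
            System.out.println(gram);
        }
    }
}
```

Because every token straddles two characters regardless of real word boundaries, a match can cover half of one word and half of the next, which is consistent with the warning above about using these matches for highlighting.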

RE: solr highlighting

2008-05-14 Thread Kevin Xiao
Yes. I did all that. Maybe my custom analyzer conflicts with highlighting. Thanks for the tips. - Kevin -Original Message- From: Mike Klaas [mailto:[EMAIL PROTECTED] Sent: Wednesday, May 14, 2008 12:03 PM To: solr-user@lucene.apache.org Subject: Re: solr highlighting The minimum "stuff"

Re: Fwd: Grouping products

2008-05-14 Thread Tricia Williams
Perhaps the Synonym Filter would work for this. http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters will tell you more. Tricia Otis Gospodnetic wrote: Hi Vender, Solr can't do the grouping for you. Solr can do the searching/finding for you, but it won't be able to recognize differ

Re: Fwd: Grouping products

2008-05-14 Thread Vender Livre
Thanks. I will study more about it. Cheers. On Wed, May 14, 2008 at 6:29 PM, Daniel Papasian < [EMAIL PROTECTED]> wrote: > Vender Livre wrote: > >> But it can find the most probable product, can't it? >> >> Is there a library or tool that do something like that? >> >> Someone told me SOLR would

Re: Fwd: Grouping products

2008-05-14 Thread Daniel Papasian
Vender Livre wrote: But it can find the most probable product, can't it? Is there a library or tool that do something like that? Someone told me SOLR would solve this problem. I wouldn't say solr would solve this problem... sounds like someone sold you snake oil! If you wanted to use solr,

Re: Fwd: Grouping products

2008-05-14 Thread Vender Livre
Would be easier implementing this idea with SOLR than with Lucene? I'm a bit confused. Thanks for help. On Wed, May 14, 2008 at 6:21 PM, Vender Livre <[EMAIL PROTECTED]> wrote: > But it can find the most probable product, can't it? > > Is there a library or tool that do something like that? > >

Re: Fwd: Grouping products

2008-05-14 Thread Vender Livre
But it can find the most probable product, can't it? Is there a library or tool that do something like that? Someone told me SOLR would solve this problem. The idea i had was to get a product name and match it against other names, and then find the best scored. Then I would group the product to

Re: Duplicates results when using a non optimized index

2008-05-14 Thread Otis Gospodnetic
Tim, Hm, not sure what caused this. 1.2 is now quite old (yes, I know it's the last stable release), so if I were you I would consider moving to 1.3-dev. It sounds like the index is already "polluted" with duplicate documents, so you'll want to rebuild the index whether you decide to stay wit

Re: Fwd: Grouping products

2008-05-14 Thread Otis Gospodnetic
Hi Vender, Solr can't do the grouping for you. Solr can do the searching/finding for you, but it won't be able to recognize different model names and figure out which ones represent the same product. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message --

Re: Stop words and exact phrase

2008-05-14 Thread Otis Gospodnetic
You can use the field:query syntax to specify the field to search. For example: title:"who moved my cheese". There is nothing in Solr that would let you instruct it to send phrase queries to one field, and other queries to other field(s). However, you can add that logic to your application and alter th
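A sketch of the application-side logic mentioned above, under the assumption that the schema has a second field that is indexed without stopword removal. The field name "title_exact" is hypothetical, as are the class and method names; the only Solr feature relied on is the field:"phrase" syntax from the example.

```java
public class PhraseRouter {

    // Hypothetical field that keeps stopwords; it must exist in your schema.
    static final String EXACT_FIELD = "title_exact";

    // If the whole user query is a quoted phrase, point it at the exact field;
    // otherwise pass it through unchanged for the default field(s).
    public static String route(String userQuery) {
        String q = userQuery.trim();
        boolean quoted = q.length() >= 2 && q.startsWith("\"") && q.endsWith("\"");
        return quoted ? EXACT_FIELD + ":" + q : q;
    }

    public static void main(String[] args) {
        System.out.println(route("\"the world\"")); // phrase query, sent to the exact field
        System.out.println(route("world"));         // ordinary query, left alone
    }
}
```

This only handles the case where the entire input is one quoted phrase; mixed queries would need real query parsing.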

Re: Help with Solr + KStem

2008-05-14 Thread Otis Gospodnetic
Hung, You included the KStem jar itself, and that is good, but class KStemFilterFactory does not exist anywhere in Solr. You need to get it from here: https://issues.apache.org/jira/browse/SOLR-379 Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message >

Re: result count query

2008-05-14 Thread Otis Gospodnetic
I think specifying rows=0 in the URL gets you that number without giving you the actual results. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: solr_user <[EMAIL PROTECTED]> > To: solr-user@lucene.apache.org > Sent: Wednesday, May 14, 2008

Re: Chinese Language + Solr

2008-05-14 Thread Otis Gospodnetic
There are no free morphological analyzers for Chinese (are there for any language?) that I know. People tend to use one of the n-gram analyzers from Lucene contrib. I've used them before and they do OK. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message

Fwd: Grouping products

2008-05-14 Thread Vender Livre
-- Forwarded message -- From: Vender Livre <[EMAIL PROTECTED]> Date: Wed, May 14, 2008 at 5:59 PM Subject: Grouping products To: [EMAIL PROTECTED] Hi, I'm working in a software that must group similar products. For example: CANON IP1300 PRINTER CANON IP 1300 IP1300 CANON PRINT

Chinese Language + Solr

2008-05-14 Thread Francisco Sanmartin
I have had successful experiences using Solr with an English website, and now I am going to deploy Solr on a Chinese site. I've been looking through the mailing list and there is some useful information in the old posts. But we would like some kind of feedback from the people who already have deploye

result count query

2008-05-14 Thread solr_user
Hi, Is there an efficient way to just get the result count of a query issued to Solr? Solr-user -- View this message in context: http://www.nabble.com/result-count-query-tp17240159p17240159.html Sent from the Solr - User mailing list archive at Nabble.com.

Help with Solr + KStem

2008-05-14 Thread Hung Huynh
I have KStem.jar in solr/lib and solr/example/lib and made a change to schema.xml to include the KStem line (removed the Porter line). This is what I get when I try to hit the Solr Admin page. How can I go about resolving this error? Thanks, HH --- HTTP

Re: solr highlighting

2008-05-14 Thread Mike Klaas
The minimum "stuff" needed to highlight term X in field F is: field F must be 'stored' field F must have an analyzer defined a query with term X is sent (e.g., q=X) with parameters hl=true (or 'on'), hl.fl=F Try it on the example: 1. get the example running 2. cd example/exampledocs 3. ./post.sh
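The request side of the checklist above can be sketched as a small helper that builds the query string with q, hl=true and hl.fl set. The class and method names are invented for illustration; the parameter names are the ones given in the message. It does nothing about the schema requirements (the field being stored and having an analyzer), which still have to hold.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class HighlightParams {

    // Builds "q=<term>&hl=true&hl.fl=<field>" with both values URL-encoded.
    public static String build(String term, String field) {
        return "q=" + enc(term) + "&hl=true&hl.fl=" + enc(field);
    }

    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // "name" is just an example field; use a stored field from your schema.
        System.out.println(build("solr", "name"));
    }
}
```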

Re: Release date of SOLR 1.3

2008-05-14 Thread Matthew Runo
There isn't a specific date so far, but I'd like to say that only once in the year or so I've been working with the SVN head build of Solr have I noticed a bug get committed. And it was fixed very quickly once it was found.. I think if you need to have development features you're probably s

Re: Stop words and exact phrase

2008-05-14 Thread cricdigs
Hi wunder, Thanks for your response. I am still a little confused. Solr's analysis page shows that the stop word is removed from the query; it has nothing to do with the indexing, imo. If indexing has removed the stop words then I should not get any results, right? But I get the results with the

RE: solr highlighting

2008-05-14 Thread Kevin Xiao
Thanks Christian. I did try many options indicated in wiki, didn't work. So I want to see if the basics work, i.e. only define hl=true and a field for hl.fl. Do I need to include something global to make hl settings work? Thanks, - Kevin -Original Message- From: Christian Vogler [mailto

Re: exception while feeding converted text from pdf

2008-05-14 Thread Shalin Shekhar Mangar
Yes, you need to XML encode your text. If you use SolrJ to add documents to Solr, it will take care of the encoding for you. On Wed, May 14, 2008 at 9:53 PM, Cam Bazz <[EMAIL PROTECTED]> wrote: > Hello, > > I made a simple java program to convert my pdfs to text, and then to xml > file. > I am ge

exception while feeding converted text from pdf

2008-05-14 Thread Cam Bazz
Hello, I made a simple Java program to convert my PDFs to text, and then to an XML file. I am getting a strange exception. I think the converted files have some errors. Should I encode the text string that I extract from the PDFs in a special way? Best, -C.B. SEVERE: org.xmlpull.v1.XmlPullParserExcep

Release date of SOLR 1.3

2008-05-14 Thread Umar Shah
Hi, I'm using the latest trunk code from Solr. I am basically using function queries (sum, product, scale) for my project, which are not present in 1.2. I wanted to know if there is a decided date for the release of Solr 1.3. If the date is far off or not decided, what would be the best practice to adopt

Re: Stop words and exact phrase

2008-05-14 Thread Walter Underwood
Try creating a separate field that does not remove stopwords, populating that with and configuring the phrase queries to go against that field instead. I do something similar. For both regular and phrase queries, we have a stemmed and stopped field and another field with neither. The "exact" fiel

Stop words and exact phrase

2008-05-14 Thread cricdigs
Hi all, Is there a config setting that I could use to not remove stop words when doing an exact phrase match. For example when searching for "the world" (in quotes) I would like to look for just that and not get results for just "world". When I look at the analysis, I see that word "the" is remov

Re: solr highlighting

2008-05-14 Thread Christian Vogler
On Wednesday 14 May 2008 09:21:36 Kevin Xiao wrote: > Hi there, > > I am new to solr. I want search term to be highlighted on the results. I > thought it is pretty simple, but could not make it work. I read a lot of > solr documents and mail archives (I wish there is a search function for > this, w

Re: help for preprocessing the query

2008-05-14 Thread Umar Shah
On Tue, May 13, 2008 at 5:04 PM, Umar Shah <[EMAIL PROTECTED]> wrote: > > > > > On Tue, May 13, 2008 at 4:39 PM, Shalin Shekhar Mangar < > [EMAIL PROTECTED]> wrote: > >> Did you put a filter-mapping in web.xml? > > > no, > I just did that and it seems to be working... > thanks for all the help fo

RE: How Special Character '&' used in indexing

2008-05-14 Thread Steven A Rowe
Hi Ricky, Mike and wunder, neither of whom are newbies, were trying to educate you about this mailing list's etiquette (generally accepted rules of conduct). While you may *think* that two regular correspondents on this list are just hassling you, I expect that you will find it difficult to co

Re: Loading performance slowdown at ~ 400K documents

2008-05-14 Thread David Pratt
Hi Tracy. I appreciate your taking the time to provide this. Overall, it is helpful to see comparative information for boosting performance. Many thanks. Regards David Tracy Flynn wrote: David The main content organization I index is some number of articles existing under a common title.

Re: indexing pdf documents

2008-05-14 Thread Brian Carmalt
Hello Cam, The wiki for RichDocuments explains how you can add metadata to the RDUpdater. http://wiki.apache.org/solr/UpdateRichDocuments I have used the patch to index docs and their metadata, but it was not exactly what we needed. Brian. On Wednesday, 14.05.2008, 12:38 +0300, wrote

Re: Differences between nightly builds

2008-05-14 Thread Lucas F. A. Teixeira
Thanks Otis! []s, Lucas Lucas Frare A. Teixeira [EMAIL PROTECTED] Tel: +55 11 3660.1622 - R3018 Otis Gospodnetic wrote: Lucas, Look at the Solr SVN repository's root and you will see a file called CHANGES.txt. That contains all major Solr changes back

Re: Loading performance slowdown at ~ 400K documents

2008-05-14 Thread Tracy Flynn
David The main content organization I index is some number of articles existing under a common title. I have three SOLR instances containing: - Instance 1 - All 'live' articles ~ 750K articles - 3-4KB each - Instance 2 - All 'live' titles' - ~ 95K titles - < 1 KB each - Instance 3 - All arti

Re: indexing pdf documents

2008-05-14 Thread Cam Bazz
Hello Elizabeth; Yes, I have PDF files, and metadata about them already extracted. so I need something like: someone content of my pdf file it seems that the updaterichdocument patch can only accept pdfs in raw form - so it is not possible to feed metadata. Have you found a solution other th

Re[2]: the time factor

2008-05-14 Thread JLIST
Hello Otis, Got it. I'll take a look. Thanks for spending so much time helping others out! Jack Tuesday, May 13, 2008, 9:06:18 PM, you wrote: > Jack, > The answer is: function queries! :) > You can easily use function queries with DisMaxRequestHandler. > For example, this is what you can add

Re: How Special Character '&' used in indexing

2008-05-14 Thread Ricky
I dont know whats your problem when the people who had answered my question had no issues. Please dont Spam anymore ! Thanks, Ricky. On Tue, May 13, 2008 at 11:09 AM, Walter Underwood <[EMAIL PROTECTED]> wrote: > "ASAP" means "As Soon As Possible", not "As Soon As Convenient". > Please don't say

RE: Duplicates results when using a non optimized index

2008-05-14 Thread Tim Mahy
Hi, thanks for the answer, - do duplicates go away after optimization is done? --> no, if we search the index even after it is optimized, we still get the duplicate results and even if we search on one of the slaves servers which have the same index through synchronization ... btw this is the