Missing content stream

2008-05-09 Thread Ricky Martin
Hello, I am a newbie to Solr and am trying to learn it now. I have downloaded the apache-solr-1.2.0.zip file and tried the examples in the exampledocs directory of Solr 1.2. The XML file examples work fine and I am able to index them, but I could not get a result for the CSV file, i.e. books.csv. I am getting

Re: Solr feasibility with terabyte-scale data

2008-05-09 Thread marcusherou
Hi. I will as well head into a path like yours within some months from now. Currently I have an index of ~10M docs and only store id's in the index for performance and distribution reasons. When we enter a new market I'm assuming we will soon hit 100M and quite soon after that 1G documents. Each

How Special Character '&' used in indexing

2008-05-09 Thread Ricky Martin
Hello, I have a field <field name="company">A & K Inc</field>, which I cannot parse when using XML to POST data to Solr. When I search for "A & K", I should get back exactly this field. Please someone help me with this ASAP. Thanks, Ricky.

Re: Function Query result

2008-05-09 Thread Umar Shah
Thanks Mike, somehow it was not evident from the wiki example, or I was too presumptuous ;-). -umar On Fri, May 9, 2008 at 2:53 AM, Mike Klaas [EMAIL PROTECTED] wrote: On 7-May-08, at 11:40 PM, Umar Shah wrote: That would be sufficient for my requirements, I'm using the following

Re: Solr feasibility with terabyte-scale data

2008-05-09 Thread James Brady
Hi, we have an index of ~300GB, which is at least approaching the ballpark you're in. Luckily for us, to coin a phrase, we have an 'embarrassingly partitionable' index, so we can just scale out horizontally across commodity hardware with no problems at all. We're also using the multicore

Re: How Special Character '' used in indexing

2008-05-09 Thread Shalin Shekhar Mangar
You need to XML-encode special characters: use &amp; instead of &. On Fri, May 9, 2008 at 12:07 PM, Ricky Martin [EMAIL PROTECTED] wrote: Hello, I have a field <field name="company">A & K Inc</field>, which I cannot parse when using XML to POST data to Solr. When I search for "A & K", I should

Re: Solr feasibility with terabyte-scale data

2008-05-09 Thread Marcus Herou
Cool. Since you must certainly already have a good partitioning scheme, could you elaborate at a high level on how you set this up? I'm certain that I will shoot myself in the foot both once and twice before getting it right, but this is what I'm good at: never stop trying :) However, it is nice to

Weird problems with document size

2008-05-09 Thread Andrew Savory
Hi, I'm trying to debug a misbehaving Solr search setup. Here's the scenario: - a custom index client that posts insert/delete events to Solr via HTTP; - custom content handlers in Solr; - tcpmon in the middle to see what's going on. When I post an add event to Solr of less than about 5k,

Multilingual Search

2008-05-09 Thread Sachit P. Menon
Can we have multilingual search using Solr? Thanks and Regards, Sachit P. Menon | Programmer Analyst | MindTree Ltd. | West Campus, Phase-1, Global Village, RVCE Post, Mysore Road, Bangalore-560 059, INDIA | Voice +91 80 26264000 | Extn 65377 | Fax +91 80 26264100 | Mob : +91

Re: Multilingual Search

2008-05-09 Thread Grant Ingersoll
Yes. Solr handles UTF-8 and has many analyzers for non-English languages. -Grant On May 9, 2008, at 7:23 AM, Sachit P. Menon wrote: Can we have multilingual search using Solr? Thanks and Regards, Sachit P. Menon | Programmer Analyst | MindTree Ltd. | West Campus, Phase-1, Global
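
For reference, a minimal sketch of what a language-specific field could look like in a Solr 1.2-era schema.xml, assuming the Lucene contrib analyzers jar is on the classpath (the type and field names here are illustrative, not from the thread):

  <!-- schema.xml: a German-analyzed text field -->
  <fieldType name="text_de" class="solr.TextField">
    <analyzer class="org.apache.lucene.analysis.de.GermanAnalyzer"/>
  </fieldType>
  <field name="title_de" type="text_de" indexed="true" stored="true"/>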

Loading performance slowdown at ~ 400K documents

2008-05-09 Thread Tracy Flynn
Hi, I'm starting to see a significant slowdown in loading performance after I have loaded about 400K documents: I go from a load rate of near 40 docs/sec to 20-25 docs/sec. Am I correct in assuming that, during indexing operations, Lucene/Solr tries to hold as much of the indexes in

Re: Loading performance slowdown at ~ 400K documents

2008-05-09 Thread Nick Jenkin
Hi Tracy, do you have autocommit enabled (or are you manually committing every few thousand docs)? If not, try that. -Nick On 5/10/08, Tracy Flynn [EMAIL PROTECTED] wrote: Hi, I'm starting to see a significant slowdown in loading performance after I have loaded about 400K documents. I go from a

Re: Solr hardware specs

2008-05-09 Thread Nick Jenkin
Hi, it all depends on the load your server is under, how many documents you have, etc. -- I am not sure what you mean by network connectivity -- Solr really should not be run on a publicly accessible IP address. Can you provide some more info on the setup? -Nick On 5/10/08, dudes dudes [EMAIL

RE: Solr hardware specs

2008-05-09 Thread dudes dudes
Hi Nick, I'm quite new to Solr, so excuse my ignorance for any Solr-related settings :). We think we would have up to 400K docs in a loaded environment. We surely don't want Solr to be publicly accessible (just for internal use). We are not sure if we could have 2 network

Re: Unlimited number of return documents?

2008-05-09 Thread Francisco Sanmartin
Yeah, I understand the possible problems of changing this value. It's just a very particular case and there won't be a lot of documents to return. I guess I'll have to use a very high int number, I just wanted to know if there was any proper configuration for this situation. Thanks for the

Re: Unlimited number of return documents?

2008-05-09 Thread Erik Hatcher
Or make two requests... one with rows=0 to see how many documents match without retrieving any, then another with that amount specified. Erik On May 9, 2008, at 8:54 AM, Francisco Sanmartin wrote: Yeah, I understand the possible problems of changing this value. It's just a very
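
A sketch of the two-request pattern Erik describes, with an illustrative query (host, port, and query string are assumptions):

  http://localhost:8983/solr/select?q=title:solr&rows=0    -- read numFound from the result element
  http://localhost:8983/solr/select?q=title:solr&rows=N    -- repeat with rows set to that numFound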

Re: Solr hardware specs

2008-05-09 Thread Erick Erickson
This still isn't very helpful. How big are the docs? How many fields do you expect to index? What is your expected query rate? You can get away with an old laptop if your docs are, say, 5K each and you only expect to query it once a day and have one text field. If each doc is 10M, you're

Re: How Special Character '' used in indexing

2008-05-09 Thread Ricky
I have tried sending '&amp' instead of '&' like the following: <field name="company">A &amp K Inc</field>. But I still get the same error: entity reference name can not contain character ' ' position: START_TAG seen ...<field name="company">A &amp .. Please kindly reply ASAP. Thanks, Ricky On Fri, May

Re: How Special Character '' used in indexing

2008-05-09 Thread Alan Rykhus
&amp; -- you're missing the ;. On Fri, 2008-05-09 at 08:26 -0500, Ricky wrote: I have tried sending '&amp' instead of '&' like the following: <field name="company">A &amp K Inc</field>. But I still get the same error: entity reference name can not contain character ' ' position: START_TAG seen

Re: How Special Character '' used in indexing

2008-05-09 Thread Erick Erickson
I don't see a semi-colon at the end of your entity reference; is that a typo? i.e. &amp; On Fri, May 9, 2008 at 9:26 AM, Ricky [EMAIL PROTECTED] wrote: I have tried sending '&amp' instead of '&' like the following: <field name="company">A &amp K Inc</field>. But I still get the same error: entity

Re: How Special Character '' used in indexing

2008-05-09 Thread Ricky
Thanks all, I got it, it's &amp; /Ricky On Fri, May 9, 2008 at 9:38 AM, Erick Erickson [EMAIL PROTECTED] wrote: I don't see a semi-colon at the end of your entity reference; is that a typo? i.e. &amp; On Fri, May 9, 2008 at 9:26 AM, Ricky [EMAIL PROTECTED] wrote: I have tried sending the
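
For reference, a corrected version of the document from this thread would look like the sketch below. & is one of the five predefined XML entities (&amp; &lt; &gt; &quot; &apos;); & and < always need escaping in element content, while the quote entities matter mainly inside attribute values:

  <add>
    <doc>
      <field name="company">A &amp; K Inc</field>
    </doc>
  </add>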

Missing content Stream

2008-05-09 Thread Ricky
Hello, I am a newbie to Solr and am trying to learn it now. I have downloaded the apache-solr-1.2.0.zip file and tried the examples in the exampledocs directory of Solr 1.2. The XML file examples work fine and I am able to index them, but I could not get a result for the CSV file, i.e. books.csv. I am getting

Re: Missing content Stream

2008-05-09 Thread Ryan McKinley
Make sure you are following all the directions on http://wiki.apache.org/solr/UpdateCSV -- in particular, check "Methods of uploading CSV records". On May 9, 2008, at 9:58 AM, Ricky wrote: Hello, I am a newbie to Solr. I am trying to learn it now. I have downloaded the apache-solr-1.2.0.zip file. I have
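
For anyone hitting the same "missing content stream" error, the wiki page boils down to making sure a content stream actually reaches the handler. A sketch of the two usual curl forms, assuming the example server on port 8983 (the CSV handler must be mapped in solrconfig.xml, and stream.file additionally requires enableRemoteStreaming="true"):

  curl http://localhost:8983/solr/update/csv --data-binary @books.csv -H 'Content-type:text/plain; charset=utf-8'
  curl 'http://localhost:8983/solr/update/csv?stream.file=exampledocs/books.csv'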

Solr Multicore, are there any way to retrieve all the cores registered?

2008-05-09 Thread Walter Ferrara
In Solr, latest trunk version in svn, is it possible to access the core registry, or what used to be the static MultiCore object? My goal is to retrieve all the cores registered in a given (multicore) environment. It used to be MultiCore.getRegistry() initially, at the first stages of SOLR-350; but

Re: Missing content Stream

2008-05-09 Thread Ricky
Yes, I have followed the directions on http://wiki.apache.org/solr/UpdateCSV; I am learning Solr from that page. Can it be a problem with curl? /Ricky On Fri, May 9, 2008 at 10:15 AM, Ryan McKinley [EMAIL PROTECTED] wrote: make sure you are

Re: Solr Multicore, are there any way to retrieve all the cores registered?

2008-05-09 Thread Ryan McKinley
Check the status action; also, check the index.jsp page (I don't have the code in front of me). On May 9, 2008, at 10:16 AM, Walter Ferrara wrote: In Solr, latest trunk version in svn, is it possible to access the core registry, or what used to be the static MultiCore object? My goal is to

JSON updates?

2008-05-09 Thread kirk beers
Hi folks, I was wondering if XML is the only format used for updating Solr documents, or can JSON or Ruby be used as well? K

Re: Solr Multicore, are there any way to retrieve all the cores registered?

2008-05-09 Thread Walter Ferrara
Ryan McKinley wrote: check the status action; also, check the index.jsp page. index.jsp does: org.apache.solr.core.MultiCore multicore = (org.apache.solr.core.MultiCore) request.getAttribute("org.apache.solr.MultiCore"); which is OK in a servlet, but how should I do the same inside a handler,

Re: JSON updates?

2008-05-09 Thread Otis Gospodnetic
Hi, Input is XML only, I believe. It's the output that can be XML or JSON or... Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: kirk beers [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Friday, May 9, 2008 10:59:22 AM Subject:
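
A sketch of that asymmetry in practice (URL, id, and field values are illustrative): the update goes in as XML posted to /update, while the search side can answer in JSON via the wt parameter.

  <add>
    <doc>
      <field name="id">42</field>
      <field name="title">hello</field>
    </doc>
  </add>

  http://localhost:8983/solr/select?q=title:hello&wt=json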

Re: Weird problems with document size

2008-05-09 Thread Otis Gospodnetic
Andrew, I don't understand what that lock and unlock is for... Just do this: <add> <add> <add> <add> ... ... optionally <commit/> or <optimize/>. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Andrew Savory [EMAIL PROTECTED] To:

Re: Extending XmlRequestHandler

2008-05-09 Thread Alexander Ramos Jardim
OK, thanks for the advice! I got the XmlRequestHandler code. I see it uses StAX directly on the XML it receives. There isn't anything to plug in or out to get an easy way to change the XML format. So, I am thinking about creating my own RequestHandler, as already said. Would it be too slow to use a

Re: Weird problems with document size

2008-05-09 Thread Andrew Savory
Hi, On 09/05/2008, Otis Gospodnetic [EMAIL PROTECTED] wrote: I don't understand what that lock and unlock is for... Just do this: <add> <add> <add> <add> ... ... optionally <commit/> or <optimize/>. Yeah, I didn't understand what the lock/unlock was for either - but on further reviewing the

Re: Solr feasibility with terabyte-scale data

2008-05-09 Thread James Brady
So our problem is made easier by having complete index partitionability by a user_id field. That means at one end of the spectrum we could have one monolithic index for everyone, while at the other end of the spectrum we could have individual cores for each user_id. At the moment, we've gone

Re: Extending XmlRequestHandler

2008-05-09 Thread Daniel Papasian
Alexander Ramos Jardim wrote: Ok, Thanks for the advice! I got the XmlRequestHandler code. I see it uses Stax right at the XML it gets. There isn't anything to plug in or out to get an easy way to change the xml format. To maybe save you from reinventing the wheel, when I asked a similar

Re: Weird problems with document size

2008-05-09 Thread Otis Gospodnetic
Right, there is no need for that locking, you can safely have multiple indexing/update requests hitting Solr in parallel. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Andrew Savory [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent:

Re: Loading performance slowdown at ~ 400K documents

2008-05-09 Thread Mike Klaas
Hi Tracy, What is your Solr/Lucene version? Is the slowdown sustained or temporary (it is not strange to see a slowdown for a few minutes if a large segment merge is happening)? I disagree with Nick's advice of enabling autocommit. -Mike On 9-May-08, at 5:02 AM, Tracy Flynn wrote: Hi,

Re: Extending XmlRequestHandler

2008-05-09 Thread Alexander Ramos Jardim
Thanks. To maybe save you from reinventing the wheel: when I asked a similar question a couple of weeks back, hossman pointed me towards SOLR-285 and SOLR-370; 285 does XSLT, 370 does STX. But sorry, can you point me to the version? I am not accustomed to version control. -- Alexander

Re: Function Query result

2008-05-09 Thread Mike Klaas
No problem. You can return the favour by clarifying the wiki example, since it is publicly editable :). (It is hard for developers who are very familiar with a system to write good documentation for beginners, alas.) -Mike On 8-May-08, at 11:44 PM, Umar Shah wrote: thanks mike, some

Re: How Special Character '' used in indexing

2008-05-09 Thread Mike Klaas
On 9-May-08, at 6:26 AM, Ricky wrote: I have tried sending '&amp' instead of '&' like the following: <field name="company">A &amp K Inc</field>. But I still get the same error: entity reference name can not contain character ' ' position: START_TAG seen ...<field name="company">A &amp .. Please use

Re: Function Query result

2008-05-09 Thread Umar Shah
Mike, as asked, I have added an example; hope it will be helpful to future users. Thanks again. On Sat, May 10, 2008 at 12:11 AM, Mike Klaas [EMAIL PROTECTED] wrote: No problem. You can return the favour by clarifying the wiki example, since it is publicly editable :). (It is hard for

Re: Solr feasibility with terabyte-scale data

2008-05-09 Thread Ken Krugler
Hi Marcus, It seems a lot of what you're describing is really similar to MapReduce, so I think Otis' suggestion to look at Hadoop is a good one: it might prevent a lot of headaches, and they've already solved a lot of the tricky problems. There are a number of ridiculously sized projects using it

Re: Solr hardware specs

2008-05-09 Thread Walter Underwood
And use a log of real queries, captured from your website or one like it. Query statistics are not uniform. wunder On 5/9/08 6:20 AM, Erick Erickson [EMAIL PROTECTED] wrote: This still isn't very helpful. How big are the docs? How many fields do you expect to index? What is your expected

RE: Solr feasibility with terabyte-scale data

2008-05-09 Thread Lance Norskog
A useful schema trick: MD5 or SHA-1 ids. We generate our unique ID with the MD5 cryptographic checksumming algorithm. This takes X bytes of data and creates a 128-bit digest that behaves like 128 random bits. At this point there are no reports of two different datasets that give the same checksum.
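
A minimal sketch of that id scheme using only the JDK (the class name and the choice of natural key are illustrative):

  import java.nio.charset.StandardCharsets;
  import java.security.MessageDigest;

  public class Md5Id {
      // Hash any natural key (e.g. a source URL) down to a 32-char hex id.
      static String md5Hex(String naturalKey) throws Exception {
          MessageDigest md = MessageDigest.getInstance("MD5");
          byte[] digest = md.digest(naturalKey.getBytes(StandardCharsets.UTF_8));
          StringBuilder sb = new StringBuilder(32);
          for (byte b : digest) {
              sb.append(String.format("%02x", b)); // 16 bytes -> 128 bits
          }
          return sb.toString();
      }

      public static void main(String[] args) throws Exception {
          System.out.println(md5Hex("http://example.com/some/document"));
      }
  }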

Re: Solr feasibility with terabyte-scale data

2008-05-09 Thread Otis Gospodnetic
You can't believe how much it pains me to see such a nice piece of work live so separately. But I also think I know why it happened :(. Do you know if Stefan & Co. have the intention to bring it under some contrib/ around here? Would that not make sense? Otis -- Sematext --

Re: Solr feasibility with terabyte-scale data

2008-05-09 Thread Ken Krugler
Hi Otis, You can't believe how much it pains me to see such a nice piece of work live so separately. But I also think I know why it happened :(. Do you know if Stefan & Co. have the intention to bring it under some contrib/ around here? Would that not make sense? I'm not working on the

Re: Function Query result

2008-05-09 Thread Mike Klaas
Thanks so much Umar! -Mike On 9-May-08, at 1:22 PM, Umar Shah wrote: Mike, as asked, I have added an example , hope it will be helpful to future users . thanks again. On Sat, May 10, 2008 at 12:11 AM, Mike Klaas [EMAIL PROTECTED] wrote: No problem. You can return the favour by

exceeded limit of maxWarmingSearchers

2008-05-09 Thread Sasha Voynow
Hi: I'm getting flurries of these error messages: WARNING: Error opening new searcher. exceeded limit of maxWarmingSearchers=4, try again later. SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=4, try again later. On a solr

Re: exceeded limit of maxWarmingSearchers

2008-05-09 Thread Otis Gospodnetic
Sasha, Do you have postCommit or postOptimize hooks enabled? Are you sending commits, or do you have autoCommit on? My suggestions: comment out the post* hooks; do not send a commit until you are done (or just optimize at the end); disable autoCommit. If there is anything else that could
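
For reference, the limit in the error is set in solrconfig.xml (a sketch; 4 matches the message above). As Otis' suggestions imply, though, committing less often is the real fix; raising the limit mostly hides the symptom:

  <!-- solrconfig.xml, inside the <query> section -->
  <maxWarmingSearchers>4</maxWarmingSearchers>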

Re: Solr feasibility with terabyte-scale data

2008-05-09 Thread Otis Gospodnetic
From what I can tell from the overview on http://katta.wiki.sourceforge.net/, it's a partial replication of Solr/Nutch functionality, plus some goodies. It might have been better to work those goodies into some friendly contrib/ be it Solr, Nutch, Hadoop, or Lucene. Anyhow, let's see what

Re: exceeded limit of maxWarmingSearchers

2008-05-09 Thread Otis Gospodnetic
Bah, ignore 30% of what I said below - 30% of my mind was following Sesame Street, another 30% was looking at some Hadoop jobs, and the last 30% was writing the response. The missing 10% is missing. Leave the post* hook(s) in, they are fine -- you have to trigger the snapshooter somehow,

Re: exceeded limit of maxWarmingSearchers

2008-05-09 Thread Sasha Voynow
It happened without auto-commit, although I would like to be able to use a reasonably infrequent autocommit setting. Is it generally better to handle batching your commits programmatically on the client side rather than relying on auto-commit? As far as post* hooks go, I will comment out a post

Re: exceeded limit of maxWarmingSearchers

2008-05-09 Thread Ryan McKinley
On May 9, 2008, at 7:33 PM, Sasha Voynow wrote: Is it generally better to handle batching your commits programmatically on the client side rather than relying on auto-commit? The time-based auto-commit is useful if you are indexing from multiple clients to a single server. Rather than
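
A sketch of the corresponding solrconfig.xml stanza (the thresholds are illustrative; it lives inside the update handler configuration):

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxDocs>10000</maxDocs> <!-- commit after this many pending docs -->
      <maxTime>60000</maxTime> <!-- or after this many milliseconds -->
    </autoCommit>
  </updateHandler>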

Simple Solr POST using java

2008-05-09 Thread Marshall Gunter
Can someone please tell me why this code snippet would not add a document to the Solr index after a <commit/> was issued, or please post a snippet of Java code to add a document to the Solr index that includes the URL reference as a String? Code example: String strToAdd = "<add><doc>
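
Since the original snippet is cut off, here is a minimal, self-contained sketch of the same idea using only java.net.HttpURLConnection -- the URL and field values are assumptions, not Marshall's actual code:

  import java.io.InputStream;
  import java.io.OutputStream;
  import java.net.HttpURLConnection;
  import java.net.URL;

  public class SimpleSolrPost {
      public static void main(String[] args) throws Exception {
          String strToAdd = "<add><doc>"
              + "<field name=\"id\">doc-1</field>"
              + "<field name=\"title\">hello world</field>"
              + "</doc></add>";
          post("http://localhost:8983/solr/update", strToAdd);
          post("http://localhost:8983/solr/update", "<commit/>"); // make the add visible
      }

      static void post(String url, String body) throws Exception {
          HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
          conn.setRequestMethod("POST");
          conn.setDoOutput(true);
          conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
          OutputStream out = conn.getOutputStream();
          out.write(body.getBytes("UTF-8"));
          out.close();
          // Reading the response code forces the request and surfaces any error.
          System.out.println(url + " -> HTTP " + conn.getResponseCode());
          InputStream in = conn.getInputStream();
          in.close();
      }
  }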