Hello,
I'm a newbie to Solr and am trying to learn it now. I have downloaded the
apache-solr-1.2.0.zip file and tried the examples in the exampledocs of
Solr 1.2. The XML file examples are working fine; I'm able to index them too.
But I could not get a result for the CSV file, i.e. books.csv. I am getting
Hi.
I will likewise be heading down a path like yours within a few months.
Currently I have an index of ~10M docs and store only ids in the index, for
performance and distribution reasons. When we enter a new market I'm
assuming we will soon hit 100M, and quite soon after that 1G documents. Each
Hello,
I have a field <field name="company">A K Inc</field>, which I cannot
parse using XML to POST data to Solr. When I search using A K, I should be
getting exactly this field back.
Please, someone help me with this ASAP.
Thanks,
Ricky.
thanks mike,
somehow it was not evident from the wiki example, or I was too presumptuous
;-).
-umar
On Fri, May 9, 2008 at 2:53 AM, Mike Klaas [EMAIL PROTECTED] wrote:
On 7-May-08, at 11:40 PM, Umar Shah wrote:
That would be sufficient for my requirements,
I'm using the following
Hi, we have an index of ~300GB, which is at least approaching the
ballpark you're in.
Lucky for us, to coin a phrase, we have an 'embarrassingly
partitionable' index so we can just scale out horizontally across
commodity hardware with no problems at all. We're also using the
multicore
You need to XML-encode special characters. Use &amp; instead of &.
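For example, a minimal sketch of that escaping (the helper name is illustrative; Solr just needs well-formed XML in the posted body):

```java
public class XmlEscape {
    // Replace the XML special characters; '&' must be handled first so we
    // don't re-escape the entities we just produced.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println("<field name=\"company\">"
                + escape("A & K Inc") + "</field>");
        // prints: <field name="company">A &amp; K Inc</field>
    }
}
```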
On Fri, May 9, 2008 at 12:07 PM, Ricky Martin [EMAIL PROTECTED]
wrote:
Hello,
I have a field <field name="company">A K Inc</field>, which I
cannot parse using XML to POST data to Solr. When I search using A K, I
should be
Cool.
Since you must certainly already have a good partitioning scheme, could you
elaborate at a high level on how you set this up?
I'm certain that I will shoot myself in the foot once or twice before
getting it right, but this is what I'm good at: never stop trying :)
However it is nice to
Hi,
I'm trying to debug a misbehaving solr search setup. Here's the scenario:
- custom index client that posts insert/delete events to solr via http;
- custom content handlers in solr;
- tcpmon in the middle to see what's going on
When I post an add event to solr of less than about 5k,
Can we have a multilingual search using Solr?
Thanks and Regards
Sachit P. Menon| Programmer Analyst| MindTree Ltd. |West Campus, Phase-1,
Global Village, RVCE Post, Mysore Road, Bangalore-560 059, INDIA |Voice +91
80 26264000 |Extn 65377|Fax +91 80 26264100 | Mob : +91
Yes. Solr handles UTF-8 and has many analyzers for non-English
languages.
-Grant
On May 9, 2008, at 7:23 AM, Sachit P. Menon wrote:
Can we have a multilingual search using Solr?
Thanks and Regards
Hi,
I'm starting to see significant slowdown in loading performance after
I have loaded about 400K documents. I go from a load rate of near 40
docs/sec to 20-25 docs a second.
Am I correct in assuming that, during indexing operations, Lucene/Solr
tries to hold as much of the indexes in
Hi Tracy
Do you have autocommit enabled (or are you manually committing every
few thousand docs)?
If not, try that.
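For reference, an autocommit section in solrconfig.xml looks roughly like this (the thresholds are illustrative; tune them to your load):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- commit automatically once either threshold is reached -->
  <autoCommit>
    <maxDocs>10000</maxDocs>   <!-- commit every 10k added docs -->
    <maxTime>60000</maxTime>   <!-- or every 60 seconds, whichever comes first -->
  </autoCommit>
</updateHandler>
```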
-Nick
On 5/10/08, Tracy Flynn [EMAIL PROTECTED] wrote:
Hi,
I'm starting to see significant slowdown in loading performance after I
have loaded about 400K documents. I go from a
Hi
It all depends on the load your server is under, how many documents
you have etc. -- I am not sure what you mean by network connectivity
-- solr really should not be run on a publicly accessible IP address.
Can you provide some more info on the setup?
-Nick
On 5/10/08, dudes dudes [EMAIL
Hi Nick,
I'm quite new to solr, so excuse my ignorance of any solr-related settings :).
We think we would have up to 400K docs in a high-load environment. We surely
don't want solr to be publicly
accessible (just for internal use). We are not sure if we could have 2
network
Yeah, I understand the possible problems of changing this value. It's
just a very particular case and there won't be a lot of documents to
return. I guess I'll have to use a very high int number, I just wanted
to know if there was any proper configuration for this situation.
Thanks for the
Or make two requests... one with rows=0 to see how many documents
match without retrieving any, then another with that amount specified.
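Sketched in Java (the `query` function stands in for the actual HTTP request to Solr's select handler; all names here are illustrative):

```java
import java.util.List;
import java.util.function.IntFunction;

public class FetchAll {
    // One page of results: the total hit count (numFound) plus the
    // docs actually returned for the requested rows= value.
    record Page(long numFound, List<String> docs) {}

    // Request 1: rows=0, just to learn numFound.
    // Request 2: rows=numFound, to fetch exactly that many.
    static List<String> fetchAll(IntFunction<Page> query) {
        long total = query.apply(0).numFound();
        return query.apply((int) total).docs();
    }

    public static void main(String[] args) {
        // Fake backend standing in for Solr, to show the call pattern.
        List<String> all = List.of("a", "b", "c");
        IntFunction<Page> fake = rows ->
                new Page(all.size(), all.subList(0, Math.min(rows, all.size())));
        System.out.println(fetchAll(fake));  // [a, b, c]
    }
}
```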
Erik
On May 9, 2008, at 8:54 AM, Francisco Sanmartin wrote:
Yeah, I understand the possible problems of changing this value.
It's just a very
This still isn't very helpful. How big are the docs? How many fields do you
expect to index? What is your expected query rate?
You can get away with an old laptop if your docs are, say, 5K each and you
only
expect to query it once a day and have one text field.
If each doc is 10M, you're
I have tried sending '&amp' instead of '&', like the following:
<field name="company">A &amp K Inc</field>.
But I still get the same error: entity reference name can not contain
character ' ' position: START_TAG seen ...<field name="company">A &amp ..
Please kindly reply ASAP.
Thanks,
Ricky
On Fri, May
&amp;
you're missing the ;
On Fri, 2008-05-09 at 08:26 -0500, Ricky wrote:
I have tried sending '&amp' instead of '&', like the following:
<field name="company">A &amp K Inc</field>.
But I still get the same error: entity reference name can not contain
character ' ' position: START_TAG seen
I don't see a semi-colon at the end of your entity reference, is that a
typo?
i.e. &amp;
On Fri, May 9, 2008 at 9:26 AM, Ricky [EMAIL PROTECTED] wrote:
I have tried sending '&amp' instead of '&', like the following:
<field name="company">A &amp K Inc</field>.
But I still get the same error: entity
Thanks all,
I got it, it's &amp;
/Ricky
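For what it's worth, an XML library does this escaping for you; a sketch using the JDK's DOM API, with the field name from the thread:

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class BuildAddXml {
    // Build <add><doc><field name="company">...</field></doc></add>;
    // the serializer escapes '&' in the text content for us.
    static String addDoc(String company) throws Exception {
        Document d = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element add = d.createElement("add");
        Element doc = d.createElement("doc");
        Element field = d.createElement("field");
        field.setAttribute("name", "company");
        field.setTextContent(company);
        doc.appendChild(field);
        add.appendChild(doc);
        d.appendChild(add);
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(d), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(addDoc("A & K Inc"));
        // ...<field name="company">A &amp; K Inc</field>...
    }
}
```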
On Fri, May 9, 2008 at 9:38 AM, Erick Erickson [EMAIL PROTECTED]
wrote:
I don't see a semi-colon at the end of your entity reference, is that a
typo?
i.e. &amp;
On Fri, May 9, 2008 at 9:26 AM, Ricky [EMAIL PROTECTED] wrote:
I have tried sending the
Hello,
I'm a newbie to Solr and am trying to learn it now. I have downloaded the
apache-solr-1.2.0.zip file and tried the examples in the exampledocs of
Solr 1.2. The XML file examples are working fine; I'm able to index them too.
But I could not get a result for the CSV file, i.e. books.csv. I am getting
make sure you are following all the directions on:
http://wiki.apache.org/solr/UpdateCSV
in particular, check 'Methods of uploading CSV records'
On May 9, 2008, at 9:58 AM, Ricky wrote:
Hello,
Am a newbie to SOLR. I am trying to learn it now. i have downloaded
apache-solr 1.2.0.zip file. I have
In solr, last trunk version in svn, is it possible to access the core
registry, or what used to be the static MultiCore object? My goal is to
retrieve all the cores registered in a given (multicore) environment.
It used to be MultiCore.getRegistry() initially, at first stages of
solr-350; but
Yes, I have followed the directions on http://wiki.apache.org/solr/UpdateCSV;
I am learning Solr from that page.
Could it be a problem with curl?
/Ricky
On Fri, May 9, 2008 at 10:15 AM, Ryan McKinley [EMAIL PROTECTED] wrote:
make sure you are
check the status action
also, check the index.jsp page
(i don't have the code in front of me)
On May 9, 2008, at 10:16 AM, Walter Ferrara wrote:
In solr, last trunk version in svn, is it possible to access the
core registry, or what used to be the static MultiCore object? My
goal is to
Hi folks,
I was wondering: is XML the only format used for updating Solr documents,
or can JSON or Ruby be used as well?
K
Ryan McKinley wrote:
check the status action
also, check the index.jsp page
index.jsp does:
org.apache.solr.core.MultiCore multicore =
(org.apache.solr.core.MultiCore)request.getAttribute("org.apache.solr.MultiCore");
which is ok in a servlet, but how should I do the same inside a
handler,
Hi,
Input is XML only, I believe. It's the output that can be XML or JSON or...
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: kirk beers [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Friday, May 9, 2008 10:59:22 AM
Subject:
Andrew,
I don't understand what that lock and unlock is for...
Just do this:
<add>...</add>
<add>...</add>
<add>...</add>
<add>...</add>
...
...
optionally <commit/> or <optimize/>
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: Andrew Savory [EMAIL PROTECTED]
To:
Ok,
Thanks for the advice!
I got the XmlRequestHandler code. I see it uses StAX directly on the XML it
gets. There isn't anything to plug in or out to get an easy way to change
the XML format.
So, I am thinking about creating my own RequestHandler, as said already.
Would it be too slow to use a
Hi,
On 09/05/2008, Otis Gospodnetic [EMAIL PROTECTED] wrote:
I don't understand what that lock and unlock is for...
Just do this:
<add>...</add>
<add>...</add>
<add>...</add>
<add>...</add>
...
...
optionally <commit/> or <optimize/>
Yeah, I didn't understand what the lock/unlock was for either - but on
further reviewing the
So our problem is made easier by having complete index
partitionability by a user_id field. That means at one end of the
spectrum, we could have one monolithic index for everyone, while at
the other end of the spectrum we could have individual cores for each
user_id.
At the moment, we've gone
Alexander Ramos Jardim wrote:
Ok,
Thanks for the advice!
I got the XmlRequestHandler code. I see it uses StAX directly on the XML it
gets. There isn't anything to plug in or out to get an easy way to change
the XML format.
To maybe save you from reinventing the wheel, when I asked a similar
Right, there is no need for that locking, you can safely have multiple
indexing/update requests hitting Solr in parallel.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: Andrew Savory [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent:
Hi Tracy,
What is your Solr/Lucene version? Is the slowdown sustained or
temporary (it is not strange to see a slowdown for a few minutes if a
large segment merge is happening)?
I disagree with Nick's advice of enabling autocommit.
-Mike
On 9-May-08, at 5:02 AM, Tracy Flynn wrote:
Hi,
Thanks,
To maybe save you from reinventing the wheel, when I asked a similar
question a couple weeks back, hossman pointed me towards SOLR-285 and
SOLR-370. 285 does XSLT, 370 does STX.
But sorry, can you point me to the version? I am not accustomed to version
control.
--
Alexander
No problem. You can return the favour by clarifying the wiki example,
since it is publicly editable :).
(It is hard for developers who are very familiar with a system to
write good documentation for beginners, alas.)
-Mike
On 8-May-08, at 11:44 PM, Umar Shah wrote:
thanks mike,
some
On 9-May-08, at 6:26 AM, Ricky wrote:
I have tried sending '&amp' instead of '&', like the following:
<field name="company">A &amp K Inc</field>.
But I still get the same error: entity reference name can not contain
character ' ' position: START_TAG seen ...<field name="company">A
&amp ..
Please use
Mike,
as asked, I have added an example, hope it will be helpful to future users.
thanks again.
On Sat, May 10, 2008 at 12:11 AM, Mike Klaas [EMAIL PROTECTED] wrote:
No problem. You can return the favour by clarifying the wiki example,
since it is publicly editable :).
(It is hard for
Hi Marcus,
It seems a lot of what you're describing is really similar to
MapReduce, so I think Otis' suggestion to look at Hadoop is a good
one: it might prevent a lot of headaches and they've already solved
a lot of the tricky problems. There are a number of ridiculously sized
projects using it
And use a log of real queries, captured from your website or one
like it. Query statistics are not uniform.
wunder
On 5/9/08 6:20 AM, Erick Erickson [EMAIL PROTECTED] wrote:
This still isn't very helpful. How big are the docs? How many fields do you
expect to index? What is your expected
A useful schema trick: MD5 or SHA-1 ids. We generate our unique id with the
MD5 cryptographic checksumming algorithm. This takes any amount of data and
creates a 128-bit value, effectively 128 random bits. At this point
there are no reports of two different datasets that give the same checksum.
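A sketch of that id scheme (method and field names here are my own, not from the original setup):

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class DocId {
    // Hash the record's bytes down to a stable 128-bit id,
    // rendered as 32 hex characters.
    static String md5Id(String content) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] digest = md.digest(content.getBytes(StandardCharsets.UTF_8));
        return String.format("%032x", new BigInteger(1, digest));
    }

    public static void main(String[] args) throws Exception {
        // The same input always yields the same id, so re-posting a record
        // overwrites it in the index rather than duplicating it.
        System.out.println(md5Id("abc"));
        // prints: 900150983cd24fb0d6963f7d28e17f72
    }
}
```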
You can't believe how much it pains me to see such a nice piece of work live so
separately. But I also think I know why it happened :(. Do you know if Stefan
& Co. have the intention to bring it under some contrib/ around here? Would
that not make sense?
Otis
--
Sematext --
Hi Otis,
You can't believe how much it pains me to see such a nice piece of
work live so separately. But I also think I know why it happened
:(. Do you know if Stefan & Co. have the intention to bring it
under some contrib/ around here? Would that not make sense?
I'm not working on the
Thanks so much Umar!
-Mike
On 9-May-08, at 1:22 PM, Umar Shah wrote:
Mike,
as asked, I have added an example, hope it will be helpful to
future users.
thanks again.
On Sat, May 10, 2008 at 12:11 AM, Mike Klaas [EMAIL PROTECTED]
wrote:
No problem. You can return the favour by
Hi:
I'm getting flurries of these error messages:
WARNING: Error opening new searcher. exceeded limit of
maxWarmingSearchers=4, try again later.
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
exceeded limit of maxWarmingSearchers=4, try again later.
On a solr
Sasha,
Do you have postCommit or postOptimize hooks enabled? Are you sending commits
or have autoCommit on?
My suggestions:
- comment out post* hooks
- do not send a commit until you are done (or you can just optimize at the end)
- disable autoCommit
If there is anything else that could
From what I can tell from the overview on http://katta.wiki.sourceforge.net/,
it's a partial replication of Solr/Nutch functionality, plus some goodies. It
might have been better to work those goodies into some friendly contrib/ be it
Solr, Nutch, Hadoop, or Lucene. Anyhow, let's see what
Bah, ignore 30% of what I said below - 30% of my mind was following Sesame
Street, another 30% was looking at some Hadoop jobs, and the last 30% was
writing the response. The missing 10% is missing.
Leave the post* hook(s) in, they are fine -- you have to trigger the
snapshooter somehow,
It happened without auto-commit. Although I would like to be able to use a
reasonably infrequent autocommit setting. Is it generally better to handle
batching your commits programmatically on the client side rather than
relying on auto-commit? As far as post* hooks, I will comment out a post
On May 9, 2008, at 7:33 PM, Sasha Voynow wrote:
Is it generally better to handle
batching your commits programmatically on the client side rather
than
relying on auto-commit?
the time based auto-commit is useful if you are indexing from multiple
clients to a single server. Rather than
Can someone please tell me why this code snippet would not add a
document to the Solr index after a <commit/> was issued, or
please post a snippet of Java code to add a document to the Solr index
that includes the URL reference as a String?
Code example:
String strToAdd =
"<add>
<doc>
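The snippet above is cut off, so this is only a guess at its intent; a hedged sketch of a complete payload of that shape (field names are illustrative):

```java
public class AddDoc {
    // Build the full <add> body the truncated snippet starts. POST this
    // string to the update handler (e.g. http://localhost:8983/solr/update),
    // then POST <commit/> separately -- without a commit the new document
    // is not visible to searches.
    // (Values here contain no XML special characters; escape them if they might.)
    static String buildAdd(String id, String name) {
        return "<add><doc>"
                + "<field name=\"id\">" + id + "</field>"
                + "<field name=\"name\">" + name + "</field>"
                + "</doc></add>";
    }

    public static void main(String[] args) {
        System.out.println(buildAdd("9885A004", "Canon PowerShot SD500"));
    }
}
```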