Hi
Yes, the XML is inside the DB in a CLOB. Would love to use XPath
inside SQLEntityProcessor as it will save me tons of trouble with file
dumping (given that I am not able to post it). This is how I set up my
DIH for DB import.
driver="oracle.jdbc.driver.OracleDriver" url="jdbc:oracl
: I was trying to index three sets of documents having 2000 articles using
: three threads of embedded solr server. But while indexing, it gives me the
: exception "org.apache.lucene.store.LockObtainFailedException: Lock
something doesn't sound right here ... i'm no expert on embedding solr, i
think
: If I query for 'ferrar*' on my index, I will get 'ferrari' and 'red ferrari'
: as a result. And that's fine. But if I try to query for 'red ferrar*', I
: have to put it between double quotes as I want to guarantee that it will be
: used as only one term, but the '*' is being ignored, as I don't get
: Because I used server.setParser(new XMLResponseParser()), I get the
: wt=xml parameter in the responseHeader, but the format of the
: responseHeader is clearly not XML at all. I expect Solr does output XML,
: but that the QueryResponse, when I print its contents, formats this as
: the string
: Still, searching on any field (?q=searchTerm) gives the following error
: "The request sent by the client was syntactically incorrect (Invalid Date
: String:'searchTerm')."
because "searchTerm" isn't a valid date string
: Is it valid to define *_dt (i.e. date fields) in solrConfig.xml?
if you re
: Is there a way to pass the analyzer to the query parser plugin
Solr uses a variant of the PerFieldAnalyzer -- you specify in the
schema.xml what analyzer you want to use for each field.
if you have some sort of *really* exotic situation, you can always design
a custom QParser that looks at s
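(A minimal sketch of that per-field wiring in schema.xml; the field and type
names here are made up:)

    <fieldType name="text_std" class="solr.TextField" positionIncrementGap="100">
      <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
    </fieldType>
    <field name="body" type="text_std" indexed="true" stored="true"/>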
: i'm a newbie with solr. We have installed it together with eZ Find on
: eZ Publish web sites and it is working. But on one of the servers we
: have this kind of problem. It works, for example, for 3 hours, and then at
: one moment it stops working; searching and indexing do not work.
it's prett
On Thu, Jan 22, 2009 at 7:02 AM, Gunaranjan Chandraraju
wrote:
> Thanks
>
> Yes the source of data is a DB. However the xml is also posted on updates
> via a publish framework. So I can just plug in an adapter here to listen for
> changes and post to SOLR. I was trying to use the XPathProcessor i
: what i need is to log the existing urlid and the new urlid (of course both
: will not be the same), when a .xml file of the same id (unique field) is posted.
:
: I want to make this by modifying the solr source. Which file do i need to
: modify so that i could get the above details in the log?
:
: I tried with
Thanks
Yes the source of data is a DB. However the xml is also posted on
updates via a publish framework. So I can just plug in an adapter here
to listen for changes and post to SOLR. I was trying to use the
XPathProcessor inside the SQLEntityProcessor and this did not work
(using 1.3 -
Hi Grant
Thanks for the reply. My response below.
The data is stored as XML. Each record/entity corresponds to an
XML document. The XML is of the form
...
I have currently put it in schema.xml and the DIH handler as follows:
schema.xml
data-import.xml
Ron Chan wrote:
I'm using out of the box Solr 1.3 that I had just downloaded, so I guess it is the StandardAnalyzer
It seems WordDelimiterFilter worked for you.
Go to Admin console, click analysis, then give:
Field name: text
Field value (Index): SD/DDeck
verbose output: checked
highlight
A ballpark calculation would be
Collected Amount (from GC logging) / # of Requests.
The GC logging can tell you how much it collected each time, no need to
try and snapshot before and after heap sizes. However (big caveat here),
this is a ballpark figure. The garbage collector is not guaranteed t
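(To produce that GC logging on Sun's JVM, a sketch of the usual flags; the
log file name is an assumption:)

    java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log ...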
(Thanks for the responses)
My filterCache hit rate is ~60% (so I'll try making it bigger), and I am CPU
bound.
How do I measure the size of my per-request garbage? Is it (total heap size
before collection - total heap size after collection) / # of requests to
cause a collection?
I'll try your
After some tests with System.currentTimeMillis I have seen that the difference
is barely noticeable ... but the PHP response was a little bit faster...
Marc Sturlese wrote:
>
> Hey there, I am using Solr as backend and I don't mind how I get back
> the results. Which is faster for Solr to create the r
Is there anyway to suppress the logging of the /admin/ping requests? We have
HAProxy configured to do health checks to this URI every couple of seconds
and it is really cluttering our logs. I'd still like to see the logging from
the other requestHandlers.
Thanks!
Todd
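(One possible way, assuming the stock java.util.logging setup Solr 1.3 ships
with; the filter class and its wiring are a sketch, not a tested recipe:)

    import java.util.logging.Filter;
    import java.util.logging.LogRecord;

    // Drops any log record whose message mentions the ping handler's path.
    // Wire it up in logging.properties with:
    //   java.util.logging.ConsoleHandler.filter = PingLogFilter
    public class PingLogFilter implements Filter {
        public boolean isLoggable(LogRecord record) {
            String msg = record.getMessage();
            return msg == null || !msg.contains("/admin/ping");
        }
    }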
Oops, missed that. I'll have to defer to folks with more
SOLR experience than I have, I've pretty much worked
in Lucene.
Best
Erick
On Wed, Jan 21, 2009 at 3:57 PM, Ron Chan wrote:
> I'm using out of the box Solr 1.3 that I had just downloaded, so I guess it
> is the StandardAnalyzer
>
> bu
I'm using out of the box Solr 1.3 that I had just downloaded, so I guess it is
the StandardAnalyzer
but shouldn't the returned docs equal numFound?
- Original Message -
From: "Erick Erickson"
To: solr-user@lucene.apache.org
Sent: Wednesday, 21 January, 2009 20:49:56 GMT +00:00 GMT
It depends (tm). What analyzer are you using when indexing?
I'd expect (though I haven't checked) that StandardAnalyzer
would break SD/DDeck into two tokens, SD and DDeck, which
corresponds nicely with what you're reporting.
Other analyzers and/or filters are easy to specify
I'd recommend get
: > I guess most people store it as a simple string "key(separator)value". Is
or use dynamic fields to put the "key" into the field name...
: > > > multiValued="true" />
...could be...
...then index value
if you omitNorms, the overhead of having many fields should be low -
although i'm no
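(A sketch of that dynamic-field approach; the *_s suffix convention is just
one common choice:)

    <dynamicField name="*_s" type="string" indexed="true" stored="true"
                  multiValued="true" omitNorms="true"/>

A key/value pair like color=red then goes into the index as field color_s
with value red.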
I have a test search which I know should return 34 docs, and it does;
however, numFound says 40
with debug enabled, I can see the 40 it has found
my search looks for "SD DDeck" in the description
34 of them had "SD DDeck" with 6 of them having "SD/DDeck"
now, I can probably work round it if
Have you tried different sizes for the nursery? It should be several
times larger than the per-request garbage.
Also, check your cache sizes. Objects evicted from the cache are
almost always tenured, so those will add to the time needed for
a full GC.
Guess who was tuning GC for a week or two in
From a high-level view, there is a certain amount of garbage collection
that must occur. That garbage is generated per request, through a
variety of means (buffers, request, response, cache expulsion). The only
thing that JVM parameters can address is *when* that collection occurs.
It can occur
: Right, that's probably the crux of it - distributed search required
: some extensions to response writers... things like handling
: SolrDocument and SolrDocumentList.
Grrr... that's right, i forgot that there wasn't any way to make
SolrDocumentList implement DocList ... and i don't think this
The large drop in old generation from 27GB->6GB indicates that things
are getting into your old generation prematurely. They really don't need
to get there at all, and should be collected sooner (more frequently).
Look into increasing young generation sizes via JVM parameters. Also
look into concu
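(A sketch of such parameters for Sun's JVM; the sizes are purely
illustrative, not recommendations:)

    java -Xms27g -Xmx27g -Xmn4g -XX:+UseParNewGC -XX:+UseConcMarkSweepGC ...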
: I am using a ranking algorithm by modifying the XMLWriter to use a
: formulation which takes the top 3 results and queries with the 3 results, and
: now presents the result as a function of the results from these 3
: queries. Can anyone reply whether I can take the top 3 results and query with them
:
I would say that putting up more Solr instances, each with its own data
directory, could help if you can qualify your docs, in such a way that you
can put "A" type docs in index "A", "B" type docs in index "B", and so on.
2009/1/21 wojtekpia
>
> I'm using a recent version of Sun's JVM (6 update
One other useful piece of information would be how big you
expect your indexes to be. Which you should be able to estimate
quite easily by indexing, say, 20,000 documents from the
relevant databases.
Of particular interest will be the delta between the size of the
index at, say, 10,000 documents a
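(A worked example of that estimate, with made-up numbers: if the index is
50 MB at 10,000 documents and 95 MB at 20,000, the marginal cost is roughly
(95 - 50) MB / 10,000, or about 4.5 KB per document, so 10 million documents
would come to around 45 GB on top of the fixed overhead.)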
I'm using a recent version of Sun's JVM (6 update 7) and am using the
concurrent generational collector. I've tried several other collectors, none
seemed to help the situation.
I've tried reducing my heap allocation. The search performance got worse as
I reduced the heap. I didn't monitor the gar
>Hi Fergus,
>
>It seems a field it is expecting is missing from the XML.
You mean there is some field in the document we are indexing
that is missing?
>
>sourceColName="*fileAbsePath*"/>
>
>I guess "fileAbsePath" is a typo? Can you check if that is the cause?
Well spotted. I had made a mess of sa
How many boxes running your index? If it is just one, maybe distributing
your index will get you better performance during garbage collection.
2009/1/21 wojtekpia
>
> I'm intermittently experiencing severe performance drops due to Java
> garbage
> collection. I'm allocating a lot of RAM to my
Definitely you will want to have more than one box for your index.
You can take a look at distributed search and multicore at the wiki.
2009/1/21 Thomas Dowling
> On 01/21/2009 12:25 PM, Matthew Runo wrote:
> > At a certain level it will become better to have multiple smaller boxes
> > rather
Can someone please make sense of why the following occurs in our system.
The first item barely matches but scores higher than the second one that
matches all over the place. The second one is a MUCH better match but has a
worse score. These are in the same query results. All I can see are the
nor
What JVM and garbage collector setting? We are using the IBM JVM with
their concurrent generational collector. I would strongly recommend
trying a similar collector on your JVM. Hint: how much memory is in
use after a full GC? That is a good approximation to the working set.
27GB is a very, very l
On Mon, Jan 19, 2009 at 9:42 PM, David Shettler wrote:
> Thank you Shalin, I'm in the process of implementing your suggestion,
> and it works marvelously. Had to upgrade to solr 1.3, and had to hack
> up acts_as_solr to function correctly.
>
> Is there a way to receive a search for a given field
What exactly does Solr do when it receives a new Index? How does it keep
serving while performing the updates? It seems that the part that causes the
slowdown is this transition.
Otis Gospodnetic wrote:
>
> This is an old and long thread, and I no longer recall what the specific
> suggestions
I guess Noble meant the Solr log.
On Tue, Jan 20, 2009 at 9:29 PM, Nick Friedrich <
nick.friedr...@student.uni-magdeburg.de> wrote:
> no, there are no exceptions
> but I have to admit that I'm not sure what you mean by console
>
>
> Quote from Noble Paul:
>
> it got rolled bac
On 01/21/2009 12:25 PM, Matthew Runo wrote:
> At a certain level it will become better to have multiple smaller boxes
> rather than one huge one. I've found that even an old P4 with 2 gigs of
> ram has decent response time on our 150,000 item index with only a few
> users - but it quickly goes down
Created SOLR-974: https://issues.apache.org/jira/browse/SOLR-974
I'm intermittently experiencing severe performance drops due to Java garbage
collection. I'm allocating a lot of RAM to my Java process (27GB of the 32GB
physically available). Under heavy load, the performance drops approximately
every 10 minutes, and the drop lasts for 30-40 seconds. This coinci
Hi Fergus,
It seems a field it is expecting is missing from the XML.
I guess "fileAbsePath" is a typo? Can you check if that is the cause?
On Wed, Jan 21, 2009 at 5:40 PM, Fergus McMenemie wrote:
> Shalin
>
> Downloaded the nightly for 21 Jan and tried DIH again. It's better but
> still broken.
On Wed, Jan 21, 2009 at 6:05 PM, Fergus McMenemie wrote:
>
> After looking at http://issues.apache.org/jira/browse/SOLR-964, where
> where
> it seems this issue has been addressed, I had another go at indexing
> documents
> containing DOCTYPE. It failed as follows.
>
>
That patch has not been c
Yes please. Even though the fix is small, it is important enough to be
mentioned in the release notes.
On Wed, Jan 21, 2009 at 11:05 PM, wojtekpia wrote:
>
> Thanks Shalin, a short circuit would definitely solve it. Should I open a
> JIRA issue?
>
>
> Shalin Shekhar Mangar wrote:
> >
> > I guess
Thanks Shalin, a short circuit would definitely solve it. Should I open a
JIRA issue?
Shalin Shekhar Mangar wrote:
>
> I guess Data Import Handler still calls commit even if there were no
> documents created. We can add a short circuit in the code to make sure
> that
> does not happen.
>
--
At a certain level it will become better to have multiple smaller
boxes rather than one huge one. I've found that even an old P4 with 2
gigs of ram has decent response time on our 150,000 item index with
only a few users - but it quickly goes downhill if we get more than 5
or 6. How many do
Is there a useful guide somewhere that suggests system configurations
for machines that will support multiple large-ish Solr indexes? I'm
working on a group of library databases (journal article citations +
abstracts, mostly), and need to provide some sort of helpful information
to our hardware pe
Hi.
Any good protwords.txt out there?
In a fairly standard solr analyzer chain, we use the English Porter analyzer
like so:
For most purposes the porter does just fine, but occasionally words come along
that really don't work out too well, e.g.,
"maine" is stemmed to "main" - clearly goofing
I have been doing some testing (with System.currentTimeMillis) and the
difference is barely noticeable, but PHPResponseWriter is a bit faster; I
just would like to be sure I am right. Does anybody know for sure?
Marc Sturlese wrote:
>
> Hey there, I am using Solr as backend and I don't mind how
Hi Shalin,
I have not faced any memory problems as of now. But I had previously asked a
question regarding caching and memory
(http://www.nabble.com/How-to-open-a-new-searcher-and-close-the-old-one-by-sending-HTTP-request-td21496803.html)-
--
Hello,
After looking at http://issues.apache.org/jira/browse/SOLR-964, where
it seems this issue has been addressed, I had another go at indexing documents
containing DOCTYPE. It failed as follows.
This was using the nightly build from 21-jan 2009.
The comments section within jira sugges
Hey there, I am using Solr as backend and I don't mind how I get back the
results. Which is faster for Solr to create the response:
XMLResponseWriter or PHPResponseWriter?
For my front end it is faster to process the response created by
PHPResponseWriter but I would not like to improve speed p
On Wed, Jan 21, 2009 at 4:31 PM, Manupriya wrote:
>
> 2. I had asked previously regarding caching and memory
> management(
> http://www.nabble.com/How-to-open-a-new-searcher-and-close-the-old-one-by-sending-HTTP-request-td21496803.html
> ).
> So how do I schedule auto-commit for my Solr server.
>
>
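(For the auto-commit part, solrconfig.xml's updateHandler section takes an
autoCommit block; the thresholds below are illustrative only:)

    <autoCommit>
      <maxDocs>10000</maxDocs>
      <maxTime>60000</maxTime> <!-- milliseconds -->
    </autoCommit>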
Shalin
Downloaded the nightly for 21 Jan and tried DIH again. It's better but
still broken. Dozens of embedded tags are stripped from documents
but it now fails every few documents for no reason I can see. Manually
removing embedded tags causes a given problem document to be indexed,
only to have it fai
Hi
Did you resolve the problem? Because I have the same problem.
Thanks
On Wed, Jan 21, 2009 at 4:31 PM, Manupriya wrote:
>
> Hi,
>
> Our Solr server is a standalone server and some web applications send HTTP
> queries to search and get back the results.
>
> Now I have following two requirements -
>
> 1. we want to schedule 'delta-import' at a specified time. So that we
Hi,
Our Solr server is a standalone server and some web applications send HTTP
queries to search and get back the results.
Now I have following two requirements -
1. we want to schedule 'delta-import' at a specified time, so that we don't
have to explicitly send an HTTP request for delta-import.
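(For the scheduling part, any external scheduler can fire the DIH command; a
sketch using cron and curl, where host, port, and core path are assumptions:)

    # crontab entry: run a delta-import every hour on the hour
    0 * * * * curl -s 'http://localhost:8983/solr/dataimport?command=delta-import'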
I've solved the problem.
It was a time zone problem: Solr returns dates in UTC, and
2007-12-31T23:00:00Z is midnight on 1 January 2008 in my UTC+1 time zone. :)
L.M.
2009/1/21 Luca Molteni :
> Hello list,
>
> Using SolrJ with Solr 1.3 stable, NamedListCodec's readVal method
> (line 161) unmarshals the number
>
> 119914200
>
> as a date (1 January 2008),
>
> While executing the same query with
Otis Gospodnetic wrote:
now it works:
positionIncrementGap="100">
words="stopwords.txt"/>
max="50" />
language="German" />
protected="protwords.txt
On Wed, Jan 21, 2009 at 3:42 PM, Jaco wrote:
> Thanks for the fast replies!
>
> It appears that I made a (probably classical) error... I didn't make the
> change to solrconfig.xml to include the when applying the
> upgrade. I include this now, but the slave is not cleaning up. Will this be
> done
Hello list,
Using SolrJ with Solr 1.3 stable, NamedListCodec's readVal method
(line 161) unmarshals the number
119914200
as a date (1 January 2008),
While executing the same query with the solr administration console,
it gives me a different date value:
2007-12-31T23:00:00Z
It seems like
Thanks for the fast replies!
It appears that I made a (probably classical) error... I didn't make the
change to solrconfig.xml to include the when applying the
upgrade. I include this now, but the slave is not cleaning up. Will this be
done at some point automatically? Can I trigger this?
User a
Otis Gospodnetic wrote:
Ralf,
Can you paste the part of your schema.xml where you defined the relevant field?
Otis
Sure !
positionIncrementGap="100">
language="German" />
Hi,
There shouldn't be so many files on the slave. Since the empty index.x
folders are not getting deleted, is it possible that the Solr process user
does not have enough privileges to delete files/folders?
Also, have you made any changes to the IndexDeletionPolicy configuration?
On Wed, Jan 21, 2009
the index.xxx directories are supposed to be deleted (automatically).
you can safely delete them.
But, I am wondering why the index files in the slave did not get
deleted. By default the deletionPolicy is KeepOnlyLastCommit.
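(That policy is configured in solrconfig.xml; a sketch of spelling it out
explicitly - treat the values as assumptions to verify:)

    <deletionPolicy class="solr.SolrDeletionPolicy">
      <str name="maxCommitsToKeep">1</str>
      <str name="maxOptimizedCommitsToKeep">0</str>
    </deletionPolicy>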
On Wed, Jan 21, 2009 at 2:15 PM, Jaco wrote:
> Hi,
>
> I'm running So
Hello,
> Hi,
> I'm running Solr nightly build of 20.12.2008, with patch as discussed on
> http://markmail.org/message/yq2ram4f3jblermd, using Solr replication.
> On various systems running, I see that the disk space consumed on the slave
> is much higher than on the master. One example:
> - Mast
Hi,
I'm running Solr nightly build of 20.12.2008, with patch as discussed on
http://markmail.org/message/yq2ram4f3jblermd, using Solr replication.
On various systems running, I see that the disk space consumed on the slave
is much higher than on the master. One example:
- Master: 30 GB in 138 fil