Hi,
Is there a way to get the DataImportHandler to skip already-seen records
rather than reindexing them?
The UpdateHandler has an <add overwrite="false" ...> capability which (as I
understand it) means that a document whose uniqueKey matches one already in
the index will be skipped instead of
Marc Sturlese wrote:
You can use deduplication to do that. Create the signature based on the
unique field or any field you want.
Cool, thanks, I hadn't thought of that.
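For anyone finding this thread later, a sketch of the kind of deduplication setup Marc describes, in solrconfig.xml -- field names and chain wiring here are illustrative, not from the original poster's config:

```xml
<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <!-- overwrite documents whose signature matches instead of adding duplicates -->
    <bool name="overwriteDupes">true</bool>
    <!-- compute the signature from the unique field (or any fields you want) -->
    <str name="fields">id</str>
    <str name="signatureField">signature</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

You also need a "signature" field declared in schema.xml, and the chain has to be referenced from your update handler for it to take effect.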
--
View this message in context:
Hi,
I'm trying to get the Velocity / Solritas feature to work for one core of a
two-core Solr instance, but it's not playing nice.
I know the right jars are being loaded, because I can see them mentioned in
the log, but still I get a class not found exception:
09-May-2010 15:34:02
Erik Hatcher-4 wrote:
What version of Solr? Try switching to
class=solr.VelocityResponseWriter, and if that doesn't work use
class=org.apache.solr.request.VelocityResponseWriter. The first
form is the recommended way to do it. The actual package changed in
trunk not too long
Sorry -- in the second of those error messages (the NPE) I meant
<str name="defType">lucene</str>
not standard.
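For reference, the registration Erik is suggesting would look something like this in the affected core's solrconfig.xml (a sketch; pick whichever form matches your Solr version):

```xml
<!-- short form, the recommended way on recent builds -->
<queryResponseWriter name="velocity" class="solr.VelocityResponseWriter"/>

<!-- fully-qualified form, for older builds where the short name fails -->
<queryResponseWriter name="velocity"
    class="org.apache.solr.request.VelocityResponseWriter"/>
```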
Andrew Clegg wrote:
Erik Hatcher-4 wrote:
What version of Solr? Try switching to
class=solr.VelocityResponseWriter, and if that doesn't work use
class
or /solr/itas and insert your core name in the
middle.
(Does anyone know if there'd be a simple way to make that automatic?)
Andrew Clegg wrote:
Erik Hatcher-4 wrote:
What version of Solr? Try switching to
class=solr.VelocityResponseWriter, and if that doesn't work use
class
Hi folks,
I had a Solr instance (in Jetty on Linux) taken down by a process monitoring
tool (God) with a SIGKILL recently.
How bad is this? Can it cause index corruption if it's in the middle of
indexing something? Or will it just lose uncommitted changes? What if the
signal arrives in the
Lance Norskog-2 wrote:
The PatternReplace and HTMLStrip tokenizers might be the right bet.
The easiest way to go about this is to make a bunch of text fields
with different analysis stacks and investigate them in the Schema
Browser. You can paste an HTML document into the text box and see
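A sketch of the sort of experimental field type Lance is describing, using the Solr 1.4-era HTML-strip char filter -- the type name and stack here are illustrative; define several variants and compare them in the analysis tool:

```xml
<fieldType name="text_html" class="solr.TextField">
  <analyzer>
    <!-- strip HTML markup before tokenizing -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```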
findbestopensource wrote:
Could you tell us the schema you used for indexing? In my opinion, using
StandardAnalyzer / Snowball analyzer will do the best. They will not break
the URLs. Add href and other related HTML tags as stop words and they
will be removed while indexing.
This
Andrew Clegg wrote:
Re. your config, I don't see a minTokenLength in the wiki page for
deduplication, is this a recent addition that's not documented yet?
Sorry about this -- stupid question -- I should have read back through the
thread and refreshed my memory.
Markus Jelsma wrote:
Well, it got me too! KMail didn't properly order this thread. Can't seem to
find Hatcher's reply anywhere. ??!!?
Whole thread here:
http://lucene.472066.n3.nabble.com/Filtering-near-duplicates-using-TextProfileSignature-tt479039.html
Hi,
I'm after a bit of clarification about the 'limitations' section of the
distributed search page on the wiki.
The first two limitations say:
* Documents must have a unique key and the unique key must be stored
(stored=true in schema.xml)
* When duplicate doc IDs are received, Solr chooses
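In schema.xml terms, the first limitation just means the key field has to be declared stored, e.g. (field name illustrative):

```xml
<field name="id" type="string" indexed="true" stored="true" required="true"/>

<uniqueKey>id</uniqueKey>
```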
Mark Miller-3 wrote:
On 7/4/10 12:49 PM, Andrew Clegg wrote:
I thought so but thanks for clarifying. Maybe a wording change on the
wiki
Sounds like a good idea - go ahead and make the change if you'd like.
That page seems to be marked immutable...
Chris Hostetter-3 wrote:
a cleaner way to deal with this would be do use something like
RewriteRule -- either in your appserver (if it supports a feature like
that) or in a proxy sitting in front of Solr.
I think we'll go with this -- seems like the most bulletproof way.
Cheers,
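A rough sketch of the RewriteRule approach Chris suggests, assuming Apache httpd in front of Solr with mod_rewrite and mod_proxy enabled -- the hostname, port, and path pattern are illustrative:

```apache
# Map /itas/<core> on the front-end to the core's /itas handler in Solr,
# proxying the request rather than redirecting the browser.
RewriteEngine On
RewriteRule ^/itas/([^/]+)$ http://localhost:8080/solr/$1/itas [P]
```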
Is anyone using ZooKeeper-based Solr Cloud in production yet? Any war
stories? Any problematic missing features?
Thanks,
Andrew.
Hi,
I'm a little confused about how the tuning params in solrconfig.xml actually
work.
My index currently has mergeFactor=25 and maxMergeDocs=2147483647.
So this means that up to 25 segments can be created before a merge happens,
and each segment can have up to 2bn docs in, right?
But this
Okay, thanks Marc. I don't really have any complaints about performance
(yet!) but I'm still wondering how the mechanics work, e.g. when you have a
number of segments equal to mergeFactor, and each contains maxMergeDocs
documents.
The docs are a bit fuzzy on this...
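The settings under discussion, as they'd appear in solrconfig.xml:

```xml
<indexDefaults>
  <!-- up to 25 segments can accumulate at one size level before a merge -->
  <mergeFactor>25</mergeFactor>
  <!-- effectively unlimited: no segment is too large to take part in a merge -->
  <maxMergeDocs>2147483647</maxMergeDocs>
</indexDefaults>
```

To the mechanics question: with the log merge policy, merging happens per size level. When mergeFactor segments of roughly the same size accumulate, they are merged into one segment at the next level up -- so you can have up to mergeFactor-1 segments at each of several levels, not mergeFactor segments total.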
Hi,
I have a field in my index called related_ids, indexed and stored, with the
following field type:
<!--
A text field that tokenizes on whitespace, removing non-word
characters at the start and end of each token, but preserving
meaningful punctuation *within*
That 1cuk is past the 10,000th term in record 2.40? For this to be
possible, I have to assume that the FieldAnalysis tool ignores this limit
FWIW
Erick
On Fri, Oct 23, 2009 at 12:01 PM, Andrew Clegg andrew.cl...@gmail.com wrote:
Hi,
I have a field in my index called related_ids
Morning,
Last week I was having a problem with terms visible in my search results in
large documents not causing query hits:
http://www.nabble.com/Result-missing-from-query%2C-but-match-shows-in-Field-Analysis-tool-td26029040.html#a26029351
Erick suggested it might be related to
being
ignored.
-Yonik
http://www.lucidimagination.com
On Mon, Oct 26, 2009 at 7:11 AM, Andrew Clegg andrew.cl...@gmail.com
wrote:
Morning,
Last week I was having a problem with terms visible in my search results
in
large documents not causing query hits:
http://www.nabble.com
Yonik Seeley-2 wrote:
Sorry Andrew, this is something that's bitten people before.
search for maxFieldLength and you will see *2* of them in your config
- one for indexDefaults and one for mainIndex.
The one in mainIndex is set at 10,000 and hence overrides the one in
indexDefaults.
Yonik Seeley-2 wrote:
If you could, it would be great if you could test commenting out the
one in mainIndex and see if it inherits correctly from
indexDefaults... if so, I can comment it out in the example and remove
one other little thing that people could get wrong.
Yep, it seems
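For anyone hitting the same thing: the two occurrences Yonik mentions look like this in the example solrconfig.xml, and the one inside mainIndex is the one that actually takes effect:

```xml
<indexDefaults>
  <maxFieldLength>10000</maxFieldLength>
</indexDefaults>

<mainIndex>
  <!-- this one overrides indexDefaults; comment it out (or raise it) so
       large documents are not silently truncated at indexing time -->
  <maxFieldLength>10000</maxFieldLength>
</mainIndex>
```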
which
make it ugly but whatcha gonna do?
Erik
On Oct 27, 2009, at 11:50 AM, Andrew Clegg wrote:
Hi,
If I have a DataImportHandler query with a greater-than sign in,
like this:
<entity name="higher_node" dataSource="database"
        query="select *,
               title as keywords from
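Since the query lives inside an XML attribute in data-config.xml, the usual workaround is to escape the greater-than sign as an entity -- ugly, but it keeps the XML parser happy. A sketch with a made-up query:

```xml
<!-- "&gt;" is unescaped back to ">" before the query reaches the database -->
<entity name="higher_node" dataSource="database"
        query="select * from nodes where depth &gt; 2"/>
```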
http://wiki.apache.org/solr/TermsComponent
Helps?
Cheers
Avlesh
On Wed, Oct 28, 2009 at 11:32 PM, Andrew Clegg andrew.cl...@gmail.com wrote:
Hi,
If I give a query that matches a single document, and facet on a
particular
field, I get a list of all the terms in that field which appear
Morning,
Can someone clarify how dismax queries work under the hood? I couldn't work
this particular point out from the documentation...
I get that they pretty much issue the user's query against all of the fields
in the schema -- or rather, all of the fields you've specified in the qf
to that particular field for queries (as opposed to indexing).
For example, if "test" is matched against a string vs a text field,
different analyzers may be applied to string or text
Hope that helps
Amit
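A minimal illustration of Amit's point -- two field definitions (names illustrative) where dismax would analyze the same query term differently per field listed in qf:

```xml
<!-- string: no analysis, so the query term must match the value exactly -->
<field name="title_exact" type="string" indexed="true" stored="true"/>

<!-- text: tokenized and (typically) lowercased, so "Test" can match "test" -->
<field name="title_text" type="text" indexed="true" stored="true"/>
```

With qf="title_exact title_text", each field applies its own query-time analysis before matching.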
On Thu, Oct 29, 2009 at 4:39 AM, Andrew Clegg andrew.cl...@gmail.com wrote:
Morning,
Can someone
-value facets.
On Wed, Oct 28, 2009 at 11:36 AM, Andrew Clegg andrew.cl...@gmail.com
wrote:
Isn't the TermVectorComponent more for one document at a time, and the
TermsComponent for the whole index?
Actually -- having done some digging... What I'm really after is the most
informative terms
Hi,
I've recently added the TermVectorComponent as a separate handler, following
the example in the supplied config file, i.e.:
<searchComponent name="tvComponent"
    class="org.apache.solr.handler.component.TermVectorComponent"/>
<requestHandler name="/tvrh"
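The handler registration truncated above looks roughly like this in the stock example config (the component and handler names are the example's, not necessarily yours):

```xml
<requestHandler name="/tvrh"
    class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <!-- turn term vectors on by default for this handler -->
    <bool name="tv">true</bool>
  </lst>
  <arr name="last-components">
    <str>tvComponent</str>
  </arr>
</requestHandler>
```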
Hi everyone,
I'm experimenting with highlighting for the first time, and it seems
shockingly slow for some queries.
For example, this query:
http://server:8080/solr/select/?q=transferase&qt=dismax&version=2.2&start=0&rows=10&indent=on
takes 313ms. But when I add highlighting:
not with those really long response
times). Fixed by moving to JRE 1.6 and tuning garbage collection.
Bye,
Jaco.
2009/11/3 Andrew Clegg andrew.cl...@gmail.com
Hi everyone,
I'm experimenting with highlighting for the first time, and it seems
shockingly slow for some queries.
For example
Nicolas Dessaigne wrote:
Alternatively, you could use a copyField with a maxChars limit as your
highlighting field. Works well in my case.
Thanks for the tip. We did think about doing something similar (only
enabling highlighting for certain shorter fields) but we decided that
perhaps
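Nicolas's copyField suggestion, sketched out -- field names and the character limit are illustrative:

```xml
<!-- a shorter copy of the body field, used only for highlighting -->
<field name="content_hl" type="text" indexed="true" stored="true"/>

<!-- copy at most the first 10,000 characters of content -->
<copyField source="content" dest="content_hl" maxChars="10000"/>
```

Then point hl.fl at content_hl in your highlighting requests, while still querying the full field.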
Hi,
If I run a MoreLikeThis query like the following:
http://www.cathdb.info/solr/mlt?q=id:3.40.50.720&rows=0&mlt.interestingTerms=list&mlt.match.include=false&mlt.fl=keywords&mlt.mintf=1&mlt.mindf=1
one of the hits in the results is "and" (I don't do any stopword removal on
this field).
However if I
Lukáš Vlček wrote:
I am looking for good arguments to justify implementation a search for
sites
which are available on the public internet. There are many sites in
powered
by Solr section which are indexed by Google and other search engines but
still they decided to invest resources into
Morning all,
I'm having problems joining a child entity from one database to a
parent from another...
My entity definitions look like this (names changed for brevity):
<entity name="parent" dataSource="db1" query="select a, b, c from
parent_table">
<entity name="child" dataSource="db2"
Lukáš Vlček wrote:
When you need to search for something Lucene or Solr related, which one do
you use:
- generic Google
- go to a particular mail list web site and search from here (if there is
any search form at all)
Both of these (Nabble in the second case) in case any recent posts
Any ideas on this? Is it worth sending a bug report?
Those links are live, by the way, in case anyone wants to verify that MLT is
returning suggestions with very low tf.idf.
Cheers,
Andrew.
Andrew Clegg wrote:
Hi,
If I run a MoreLikeThis query like the following:
http
Noble Paul നോബിള് नोब्ळ्-2 wrote:
no obvious issues.
you may post your entire data-config.xml
Here it is, exactly as last attempt but with usernames etc. removed.
Ignore the comments and the unused FileDataSource...
http://old.nabble.com/file/p26335171/dataimport.temp.xml
Chantal Ackermann wrote:
no idea, I'm afraid - but could you sent the output of
interestingTerms=details?
This at least would show what MoreLikeThis uses, in comparison to the
TermVectorComponent you've already pasted.
I can, but I'm afraid they're not very illuminating!
aerox7 wrote:
Hi Andrew,
I downloaded the latest build of Solr (1.4) and I have the same problem with
Debug Now in the DataImport dev console. Have you found a solution?
Sorry about the slow reply, I've been on holiday. No, I never found a solution;
it worked in some nightlies but not in others,
Hi,
I'm interested in near-dupe removal as mentioned (briefly) here:
http://wiki.apache.org/solr/Deduplication
However the link for TextProfileSignature hasn't been filled in yet.
Does anyone have an example of using TextProfileSignature that demonstrates
the tunable parameters mentioned in
something.
Thanks again,
Andrew.
Erik Hatcher-4 wrote:
On Jan 12, 2010, at 7:56 AM, Andrew Clegg wrote:
I'm interested in near-dupe removal as mentioned (briefly) here:
http://wiki.apache.org/solr/Deduplication
However the link for TextProfileSignature hasn't been filled in yet.
Does
Erik Hatcher-4 wrote:
On Jan 12, 2010, at 9:15 AM, Andrew Clegg wrote:
Thanks Erik, but I'm still a little confused as to exactly where in
the Solr
config I set these parameters.
You'd configure them within the processor element, something like
this:
<str name="minTokenLen">5</str>
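Putting Erik's answer in context, a sketch of a dedupe chain using TextProfileSignature with its tunables set -- field names and parameter values here are illustrative:

```xml
<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <str name="fields">content</str>
    <str name="signatureClass">org.apache.solr.update.processor.TextProfileSignature</str>
    <!-- TextProfileSignature tunables: ignore tokens shorter than 5 chars,
         and quantize token frequencies so near-duplicates hash alike -->
    <str name="minTokenLen">5</str>
    <str name="quantRate">0.01</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```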
(Many apologies if this appears twice, I tried to send it via Nabble
first but it seems to have got stuck, and is fairly urgent/serious.)
Hi,
I'm trying to use the replication handler to take snapshots, then
archive them and ship them off-site.
Just now I got a message from tar that worried me:
:30, Andrew Clegg andrew.cl...@gmail.com wrote:
(Many apologies if this appears twice, I tried to send it via Nabble
first but it seems to have got stuck, and is fairly urgent/serious.)
Hi,
I'm trying to use the replication handler to take snapshots, then
archive them and ship them off-site
Thanks,
Andrew.
On 16 January 2011 12:55, Andrew Clegg andrew.cl...@gmail.com wrote:
PS one other point I didn't mention is that this server has a very
fast autocommit limit (2 seconds max time).
But I don't know if this is relevant -- I thought the files in the
snapshot wouldn't
First of all, apologies if you get this twice. I posted it by email an hour
ago but it hasn't appeared in any of the archives, so I'm worried it's got
junked somewhere.
I'm trying to use a DataImportHandler to merge some data from a database
with some other fields from a collection of XML files,
Chantal Ackermann wrote:
Hi Andrew,
your inner entity uses an XML type datasource. The default entity
processor is the SQL one, however.
For your inner entity, you have to specify the correct entity processor
explicitly. You do that by adding the attribute processor, and the
value
Erik Hatcher wrote:
On Jul 30, 2009, at 11:54 AM, Andrew Clegg wrote:
<entity dataSource="filesystem" name="domain_pdb"
url="${domain.pdb_code}-noatom.xml" processor="XPathEntityProcessor"
forEach="/">
<field column="content"
xpath="//*[local-name()='structCategory']/*[local
Chantal Ackermann wrote:
my experience with XPathEntityProcessor is non-existent. ;-)
Don't worry -- your hints put me on the right track :-)
I got it working with:
<entity dataSource="filesystem" name="domain_pdb"
url="${domain.pdb_code}-noatom.xml"
A couple of questions about the DIH XPath syntax...
The docs say it supports:
xpath="/a/b/subject[@qualifier='fullTitle']"
xpath="/a/b/subject/@qualifier"
xpath="/a/b/c"
Does the second one mean select the value of the attribute called qualifier
in the /a/b/subject element?
e.g. For this
Andrew Clegg wrote:
subject qualifier=some text /
Sorry, Nabble swallowed my XML example. That was supposed to be
[a]
[b]
[subject qualifier=some text /]
[/b]
[/a]
... but in XML.
Andrew.
Noble Paul നോബിള് नोब्ळ्-2 wrote:
On Thu, Aug 13, 2009 at 6:35 PM, Andrew Clegg andrew.cl...@gmail.com wrote:
Does the second one mean select the value of the attribute called
qualifier
in the /a/b/subject element?
yes you are right. Isn't that the semantics of standard XPath?
Noble Paul നോബിള് नोब्ळ्-2 wrote:
yes. look at the 'flatten' attribute in the field. It should give you
all the text (not attributes) under a given node.
I missed that one -- many thanks.
Andrew.
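The flatten attribute Noble mentions, sketched in context (column name illustrative):

```xml
<!-- flatten="true" concatenates the text of /a/b and all its
     descendant elements into a single value (attributes excluded) -->
<field column="body_text" xpath="/a/b" flatten="true"/>
```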
Hi folks,
I'm trying to use the Debug Now button in the development console to test
the effects of some changes in my data import config (see attached).
However, each time I click it, the right-hand frame fails to load -- it just
gets replaced with the standard 'connection reset' message from
Noble Paul നോബിള് नोब्ळ्-2 wrote:
apparently I do not see any command full-import, delta-import being
fired. Is that true?
It seems that way -- they're not appearing in the logs. I've tried Debug Now
with both full and delta selected from the dropdown, no difference either
way.
If I
Try a sdouble or sfloat field type?
Andrew.
johan.sjoberg wrote:
Hi,
we're performing range queries of a field which is of type double. Some
queries which should generate results does not, and I think it's best
explained by the following examples; it's also expected to exist data in
Paul Tomblin wrote:
Is there such a thing as a wildcard search? If I have a simple
solr.StrField with no analyzer defined, can I query for foo* or
foo.* and get everything that starts with foo such as 'foobar and
foobaz?
Yes. foo* is fine even on a simple string field.
Andrew.
--
You can use the Data Import Handler to pull data out of any XML or SQL data
source:
http://wiki.apache.org/solr/DataImportHandler
Andrew.
Elaine Li wrote:
Hi,
I am new solr user. I want to use solr search to run query against
many xml files I have.
I have set up the solr server to
Hi all, I'm having problems getting Solr to start on Tomcat 6.
Tomcat is installed in /opt/apache-tomcat , solr is in
/opt/apache-tomcat/webapps/solr , and my Solr home directory is /opt/solr .
My config file is in /opt/solr/conf/solrconfig.xml .
I have a Solr-specific context file in
Constantijn Visinescu wrote:
This might be a bit of a hack but I got this in the web.xml of my
application and it works great.
<!-- People who want to hardcode their Solr Home directly into the
WAR File can set the JNDI property here...
-->
<env-entry>
hossman wrote:
: Hi all, I'm having problems getting Solr to start on Tomcat 6.
which version of Solr?
Sorry -- a nightly build from about a month ago. Re. your other message, I
was sure the two machines had the same version on, but maybe not -- when I'm
back in the office tomorrow
Andrew Clegg wrote:
hossman wrote:
This is why the examples of using context files on the wiki talk about
keeping the war *outside* of the webapps directory, and using docBase in
your Context declaration...
http://wiki.apache.org/solr/SolrTomcat
Great, I'll try
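The arrangement hossman points to on the SolrTomcat wiki page looks roughly like this -- a per-app context file, with the war kept outside webapps; the paths below reuse the ones from earlier in this thread:

```xml
<!-- /opt/apache-tomcat/conf/Catalina/localhost/solr.xml -->
<Context docBase="/opt/solr/solr.war" debug="0" crossContext="true">
  <!-- point Solr at its home directory via JNDI -->
  <Environment name="solr/home" type="java.lang.String"
               value="/opt/solr" override="true"/>
</Context>
```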
Hi folks,
I'm using the 2009-09-30 build, and any single or double quotes in the query
string cause an NPE. Is this normal behaviour? I never tried it with my
previous installation.
Example:
http://myserver:8080/solr/select/?title:%22Creatine+kinase%22
(I've also tried without the URL
=... :)
Erik
On Oct 1, 2009, at 9:49 AM, Andrew Clegg wrote:
Hi folks,
I'm using the 2009-09-30 build, and any single or double quotes in
the query
string cause an NPE. Is this normal behaviour? I never tried it with
my
previous installation.
Example:
http://myserver:8080/solr/select