After trying some search cases and different parameter combinations of
WordDelimiterFilterFactory, I wonder what the best strategy is to index the
string 2DA012_ISO MARK 2 so that it can be found by the term 2DA012.
What if I just want the _ to be removed at both query and index time: what
should I configure, and how?
Floyd
2013/8/22 Floyd
Thanks for the suggestion, but in our view re-indexing all the data each and
every time is not the right approach, meaning every time we migrate Solr from
an older to the latest version. Solr should provide a solution for this,
because re-indexing 50 lakh (5 million) documents is not an easy job.
I am not using DIH for indexing CSV files; I'm pushing data through SolrJ
code. But I want a status report like the one DIH gives, i.e. fire
command=status and get the response back. Is anything like that available for
any type of file indexing done through the API?
On Thu, Aug 22, 2013 at
Hi All
I have some difficulty understanding the relation between
optimize and merge.
Can anyone give some tips about the difference?
Regards
Hi all
About RAMBufferSize and commit, I have read the doc:
http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/60544
but I cannot figure out how they work together.
Given the settings:
<ramBufferSizeMB>10</ramBufferSizeMB>
<autoCommit>
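Written out as full solrconfig.xml elements, settings like those quoted would look roughly as follows (a sketch; the autoCommit values are illustrative, not from the original message):

```xml
<!-- Flush the in-memory indexing buffer to a new segment once it reaches 10 MB -->
<ramBufferSizeMB>10</ramBufferSizeMB>

<!-- Illustrative autoCommit block: hard commit every 15 s or every 10,000 docs,
     without opening a new searcher -->
<autoCommit>
  <maxDocs>10000</maxDocs>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
```

Note that a commit also flushes the RAM buffer, so whichever threshold is reached first triggers the write.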
Aliasing instead of swapping removed this problem!
DO NOT USE SWAP WHEN IN CLOUD MODE (solr 4.3)
I have been running DIH imports (15,000,000 rows) all day, and every now and
then I get some weird errors. Some examples:
A letter is replaced by an unknown character (it should have been a 'V'):
285680 [Thread-20] ERROR org.apache.solr.update.SolrCmdDistributor - shard
update error StdNode:
You can use the /admin/mbeans handler to get all system stats. You can
find stats such as adds and cumulative_adds under the update
handler section.
http://localhost:8983/solr/collection1/admin/mbeans?stats=true
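If you only want the update handler section, the mbeans handler also accepts a category filter; for example (a sketch; exact parameter support may vary by Solr version):

```
http://localhost:8983/solr/collection1/admin/mbeans?stats=true&cat=UPDATEHANDLER&wt=json
```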
On Thu, Aug 22, 2013 at 12:35 PM, Prasi S prasi1...@gmail.com wrote:
I am not using
No one is asking you to re-index data. The Solr 3.5 index can be read
and written by a Solr 4.x installation.
On Thu, Aug 22, 2013 at 12:08 PM, Montu v Boda
montu.b...@highqsolutions.com wrote:
Thanks for suggestion
but as per us this is not the right way to re-index all the data each and
Hi, I'm using DIH to index data into Solr 4.4. Indexing
proceeds normally in the beginning.
I have some 10 data-config files:
file1 - select * from table where id between 1 and 100
file2 - select * from table where id between 100 and 300, and so
on.
Here 4 batches
Thanks much . This was useful.
On Thu, Aug 22, 2013 at 2:24 PM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:
You can use the /admin/mbeans handler to get all system stats. You can
find stats such as adds and cumulative_adds under the update
handler section.
But is it really good benchmarking if you flush the cache? Wouldn't you
want to benchmark against a system that is comparable to what is
under real (= production) load?
Dmitry
On Tue, Aug 20, 2013 at 9:39 PM, Jean-Sebastien Vachon
jean-sebastien.vac...@wantedanalytics.com wrote:
I
Thanks.
Actually the problem is that we have migrated the Solr 1.4 index data to
Solr 3.5 using the replication feature of Solr 3.5, so whatever data we
have in Solr 3.5 is still in the Solr 1.4 format.
So I do not think it will work in Solr 4.x.
Please suggest your view based on the above point.
Thanks
Hello All,
I am also facing a similar issue. I am using Solr 4.3.
Following is the configuration I gave in schema.xml
<fieldType name="string_lower_case" class="solr.TextField"
           sortMissingLast="true" omitNorms="true">
  <analyzer type="index">
    <tokenizer
On Wed, 2013-08-21 at 10:09 +0200, sivaprasad wrote:
The slave will poll for every 1hr.
And are there normally changes?
We have configured ~2000 facets and the machine configuration is given
below.
I assume that you only request a subset of those facets at a time.
How much RAM does your
optimize is an explicit request to perform a merge. Merges occur in the
background, automatically, as needed or indicated by the parameters of the
merge policy. An optimize is requested from outside of Solr.
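For example, the explicit request can be sent to the update handler as an XML message (a sketch; collection1 is a placeholder core name):

```xml
<!-- POST this message to http://localhost:8983/solr/collection1/update -->
<optimize waitSearcher="true"/>
```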
-- Jack Krupansky
-Original Message-
From: YouPeng Yang
Sent: Thursday,
Call optimize on your Solr 3.5 server which will write a new index
segment in v3.5 format. Such an index should be read in Solr 4.x
without any problem.
On Thu, Aug 22, 2013 at 5:00 PM, Montu v Boda
montu.b...@highqsolutions.com wrote:
thanks
actually the problem is that we have migrated the
How can you validate that the changes you just made had any impact on the
performance of the cloud if you don't have the same starting conditions?
What we do basically is running a batch of requests to warm up the index and
then launch the benchmark itself. That way we can measure the impact of
On 8/22/2013 2:25 AM, YouPeng Yang wrote:
Hi all
About RAMBufferSize and commit, I have read the doc:
http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/60544
but I cannot figure out how they work together.
Given the settings:
<ramBufferSizeMB>10</ramBufferSizeMB>
On Tue, 2013-08-20 at 20:04 +0200, Jean-Sebastien Vachon wrote:
Is there a way to flush the cache of all nodes in a Solr Cloud (by
reloading all the cores, through the collection API, ...) without
having to restart all nodes?
As MMapDirectory shares data with the OS disk cache, flushing of
Dear Users,
(Solr 3.6 + Tomcat 7)
I have been using Solr with one core for two years; I would now like to add
another core (a new database).
Can I do this without re-indexing my core1?
Could you point me to a good tutorial for doing that?
(My current database is around 200 GB for 86,000,000 docs.)
My
One small clarification: I'm on Ubuntu 12.04 LTS.
On 22/08/2013 15:56, Bruno Mannina wrote:
Dear Users,
(Solr3.6 + Tomcat7)
I use since two years Solr with one core, I would like now to add one
another core (a new database).
Can I do this without re-indexing my core1 ?
could you point me to a
First, a core is a separate index, so it is completely independent from
the already existing core(s). So basically you don't need to reindex.
In order to have two cores (the same applies for n cores): you must
have in your solr.home the file (solr.xml) described here
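As a minimal sketch in the legacy solr.xml format used by Solr 3.x (core names and instanceDir values are placeholders):

```xml
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <!-- the existing core keeps pointing at its current index -->
    <core name="core1" instanceDir="core1" />
    <!-- the new core for the second database -->
    <core name="core2" instanceDir="core2" />
  </cores>
</solr>
```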
Hello All,
I am currently doing a spatial query in Solr. I indexed coordinates
(type=location, class=solr.LatLonType), but the following query failed:
http://localhost/solr/quan/select?q=*:*&stats=true&stats.field=coordinates&stats.facet=township&rows=0
It showed an error:
Field type
I'm trying to index an HTML page and only use the div with id=content.
Unfortunately nothing is working within the tika-entity; only the standard
text (content) is populated.
Do I have to use copyField for text_test to get the data?
or is there a problem with the
Can you try SOLR-4530 switch:
https://issues.apache.org/jira/browse/SOLR-4530
Specifically, setting htmlMapper=identity on the entity definition. This
will tell Tika to send full HTML rather than a seriously stripped one.
Regards,
Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn:
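Applied to a DIH configuration, the switch goes on the Tika entity, roughly like this (a sketch; the entity attributes are borrowed from later messages in this thread):

```xml
<entity name="tika" processor="TikaEntityProcessor"
        url="${rec.path}${rec.file}" dataSource="dataUrl"
        htmlMapper="identity" format="html">
  <field column="text" name="text_test" />
</entity>
```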
I have an updateProcessor defined. It seems to work perfectly when I
index with SolrJ, but when I use DIH (which I do for a full index
rebuild), it doesn't work. This is the case with both Solr 4.4 and Solr
4.5-SNAPSHOT, svn revision 1516342.
Here's a solrconfig.xml excerpt:
You should declare this
<str name="update.chain">nohtml</str>
in the defaults section of the RequestHandler that corresponds to your
DataImportHandler. You should have something like this:
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst
On 8/22/2013 9:42 AM, Andrea Gazzarini wrote:
You should declare this
<str name="update.chain">nohtml</str>
in the defaults section of the RequestHandler that corresponds to your
DataImportHandler. You should have something like this:
<requestHandler name="/dataimport"
You could declare your update chain as the default by adding default="true"
to its declaring element:
<updateRequestProcessorChain name="nohtml" default="true">
and then you wouldn't need to declare it as the default update.chain in either
of your request handlers.
On Aug 22, 2013, at 11:57 AM,
Yes, yes, of course, you should use your already declared request
handler... that was just a copy-and-pasted example :)
I'm curious about what kind of error you got... I copied the snippet
above from a working core (just replaced the name of the chain).
BTW: AFAIK it is the update.processor that
On 8/22/2013 10:02 AM, Steve Rowe wrote:
You could declare your update chain as the default by adding default="true"
to its declaring element:
<updateRequestProcessorChain name="nohtml" default="true">
and then you wouldn't need to declare it as the default update.chain in either
of your
I put it in the tika-entity as an attribute, but it doesn't change anything.
My bigger concern is why text_test isn't populated at all.
On 22. Aug 2013, at 5:27 PM, Alexandre Rafalovitch wrote:
Can you try SOLR-4530 switch:
https://issues.apache.org/jira/browse/SOLR-4530
Specifically, setting
On 8/22/2013 10:06 AM, Andrea Gazzarini wrote:
yes, yes of course, you should use your already declared request
handler...that was just a copied and pasted example :)
I'm curious about what kind of error you gotI copied the snippet
above from a working core (just replaced the name of the
Ok, found it:
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dih-config.xml</str>
    <str name="update.chain">nohtml</str>
  </lst>
</requestHandler>
Of course, my mistake...when I
Hello there,
I have installed Solr and it's working fine on localhost. I have indexed the
example files (CSV and XML) shipped with solr-4.4.0. Now I want
to index a MySQL database for a Django project, let users search it, and
also implement more features. What should I do?
Now use DIH to get the data from the MySQL database into Solr:
http://wiki.apache.org/solr/DataImportHandler
You need to define the field mapping (between MySQL and the Solr document) in
data-config.xml.
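A minimal data-config.xml for a MySQL table might look like this (a sketch; the connection details, table, and field names are placeholders):

```xml
<dataConfig>
  <!-- JDBC connection to the MySQL database -->
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/mydb"
              user="dbuser" password="dbpass"/>
  <document>
    <!-- one Solr document per row; columns are mapped to Solr fields -->
    <entity name="item" query="SELECT id, name, description FROM item">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
      <field column="description" name="description"/>
    </entity>
  </document>
</dataConfig>
```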
I was afraid someone would tell me that... thanks for your input
-Original Message-
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
Sent: August-22-13 9:56 AM
To: solr-user@lucene.apache.org
Subject: Re: Flushing cache without restarting everything?
On Tue, 2013-08-20 at
Hi all,
I think that there is a gap in Solr's reference doc.
The section Running Solr says to run Solr using the command:
$ java -jar start.jar
But if I do this with a fresh install, I get a stack trace like this:
http://pastebin.com/5YRRccTx
Is this behavior expected?
-
Best regards
I can do it like this, but then the content isn't copied to text; it's just
in text_test:
<entity name="tika" processor="TikaEntityProcessor"
        url="${rec.path}${rec.file}" dataSource="dataUrl">
  <field column="text" name="text_test" />
  <copyField source="text_test" dest="text" />
</entity>
On 22. Aug
I don't think there's a Solr-to-SVN connector available out of the box.
You can write a custom SolrJ indexer program to get the necessary data from
SVN (using the Java API) and add the data to Solr.
On Thu, Aug 22, 2013 at 10:56 PM, SolrLover [via Lucene]
ml-node+s472066n4086140...@n3.nabble.com wrote:
Now use DIH to get the data from MYSQL database in to SOLR..
http://wiki.apache.org/solr/DataImportHandler
These are for versions 1.3, 1.4, 3.6 or 4.0.
Why are versions mentioned there?
Your first problem is that the terms aren't getting to the field
analysis chain as a unit. If you attach debug=query to your
query and say you're searching lastName:(ogden erickson),
you'll see something like
lastName:ogden lastName:erickson
when what you want is
lastName:"ogden erickson"
(note,
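The contrast can be sketched like this (hypothetical query strings; quotes must be URL-encoded as %22 in a real request):

```
q=lastName:(ogden erickson)    -> parsed as two term queries: lastName:ogden lastName:erickson
q=lastName:"ogden erickson"    -> parsed as a single phrase query on lastName
```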
We warm the file buffers before starting Solr to avoid spending time waiting
for disk IO. The script is something like this:
for core in core1 core2 core3
do
  find /apps/solr/data/${core}/index -type f | xargs cat > /dev/null
done
It makes a big difference in the first few minutes of service.
After you connect to Subversion, you'll need parsers for code, etc.
You might want to try Krugle instead, since they have already written all that
stuff: http://krugle.org/
wunder
On Aug 22, 2013, at 10:43 AM, SolrLover wrote:
I don't think there's an SOLR- SVN connector available out of
Hello, I am dealing with an issue with highlighting, and so far the other
posts that I've read have not provided a solution.
When using proximity search ("coming soon"~10) I get some documents with no
highlights, while other documents highlight these words even when they are
not within a 10-word proximity.
You need to:
1) crawl the SVN database
2) index the files
3) make a UI that fetches the original file when you click on a search
result.
Solr only has #2. If you run a subversion web browser app, you can
download the developer-only version of the LucidWorks product and crawl
the SVN web
Thanks a lot !!!
On 22/08/2013 16:23, Andrea Gazzarini wrote:
First, a core is a separate index so it is completely indipendent from
the already existing core(s). So basically you don't need to reindex.
In order to have two cores (but the same applies for n cores): you
must have in your
Hi,
I have a Solr cloud set up with 12 shards with 2 replicas each, divided on 6
servers (each server hosting 4 cores). Solr version is 4.3.1.
Due to memory errors on one machine, 3 of its 4 indexes became corrupted. I
unloaded the cores, repaired the indexes with the Lucene CheckIndex tool,
Erick,
I've read over SOLR-4816 after finding your comment about the server-side
stack traces showing threads locked up over semaphores and I'm curious how
that issue cures the problem on the server-side as the patch only includes
client-side changes. Do the servers get so tied up shuffling
Right, it's a little arcane. But the lockup is because the
various leaders send documents to each other and wait
for returns. If there are a _lot_ of incoming packets to
various leaders, it can generate the distributed deadlock.
So the shuffling you refer to is the root of the issue.
If the
On Aug 22, 2013, at 19:53 , Kamaljeet Kaur kamal.kaur...@gmail.com wrote:
On Thu, Aug 22, 2013 at 10:56 PM, SolrLover [via Lucene]
ml-node+s472066n4086140...@n3.nabble.com wrote:
Now use DIH to get the data from MYSQL database in to SOLR..
http://wiki.apache.org/solr/DataImportHandler
Versions mentioned in the wiki only tell you the Solr version in which a
feature became available. This will not be a concern in your
case since you are using the latest version, so everything you find in the
wiki would be available in Solr 4.4.
I don't think you can go into production with that. But cloudera
distribution (with Hue) might be a similar or better option.
Regards,
Alex
On 22 Aug 2013 14:38, Lance Norskog goks...@gmail.com wrote:
You need to:
1) crawl the SVN database
2) index the files
3) make a UI that fetches the
Ah. That's because Tika processor does not support path extraction. You
need to nest one more level.
Regards,
Alex
On 22 Aug 2013 13:34, Andreas Owen a...@conx.ch wrote:
i can do it like this but then the content isn't copied to text. it's just
in text_test
<entity name="tika"
Thanks, Erick that's exactly the clarification/confirmation I was looking for!
Greg
What version of solr are you using? Have you copied a solr.xml from
somewhere else? I can almost reproduce the error you're getting if I put a
non-existent core in my solr.xml, e.g.:
<solr>
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="a_non_existent_core" />
  </cores>
...
On Thu,
If I am using solr.SchemaSimilarityFactory to allow different similarities
for different fields, do I set discountOverlaps=true on the factory or
per field?
What is the syntax? The below does not seem to work
<similarity class="solr.BM25SimilarityFactory" discountOverlaps="true">
</similarity>
Hi Tom,
Don't set it as attributes but as lists, as Solr does everywhere:
<similarity class="solr.SchemaSimilarityFactory">
  <bool name="discountOverlaps">true</bool>
</similarity>
For BM25 you can also set k1 and b which is very convenient!
Cheers
-Original message-
From:Tom Burton-West
Thanks Markus,
I set it, but it seems to make no difference in the scores or statistics
listed in debugQuery, or in the ranking. I'm using a field with
CommonGrams and a huge list of common words, so there should be a huge
difference in document length with and without discountOverlaps.
I should have said that I have set it both to true and to false and
restarted Solr each time and the rankings and info in the debug query
showed no change.
Does this have to be set at index time?
Tom
I am in the process of setting up a search application that allows the user
to view paginated query results. The documents are highly dynamic, but I
want the search results to be static, i.e. I don't want the user to click
the next-page button, have the query rerun, and now see a different set of
Hi,
How can I prevent Solr from updating some fields when updating a doc?
The problem is, I have a UUID in a field named uuid, but it is not a
unique key. When an RSS source updates a feed, Solr will update the doc with
the same link, but it generates a new UUID. This is not the desired
Hi jfeist,
Your mail reminds me of this blog post; not sure about Solr though:
http://blog.mikemccandless.com/2011/11/searcherlifetimemanager-prevents-broken.html
From: jfeist jfe...@llminc.com
To: solr-user@lucene.apache.org
Sent: Friday, August 23, 2013 12:09 AM
What we need is similar to what is discussed here, except not as a filter
but as an actual query:
http://lucene.472066.n3.nabble.com/filter-query-from-external-list-of-Solr-unique-IDs-td1709060.html
We'd like to implement a query parser/scorer that would allow us to combine
SOLR searches with
Hi,
I am using Solr 4.3 with 3 Solr hosts and an external ZooKeeper
ensemble of 3 servers, and just 1 shard currently.
When I create collections using collections api it creates collections with
names,
collection1_shard1_replica1, collection1_shard1_replica2,
collection1_shard1_replica3.
Hi Dmitry,
So it seems solrjmeter should not assume the adminPath - and perhaps needs
to be passed as an argument. When you set the adminPath, are you able to
access localhost:8983/solr/statements/admin/cores ?
roman
On Wed, Aug 21, 2013 at 7:36 AM, Dmitry Kan solrexp...@gmail.com wrote:
Hi
be careful with drop_caches - make sure you sync first
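On Linux the usual sequence is the following (a sketch; requires root, and it drops the page cache system-wide, not just for Solr):

```
sync
echo 3 > /proc/sys/vm/drop_caches
```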
On Thu, Aug 22, 2013 at 1:28 PM, Jean-Sebastien Vachon
jean-sebastien.vac...@wantedanalytics.com wrote:
I was afraid someone would tell me that... thanks for your input
-Original Message-
From: Toke Eskildsen
Suppose I have two documents with different id, and there is another field,
for instance content-hash which is something like a 16-byte hash of the
content.
Can Solr be configured to return just one copy, and drop the other if both
are relevant?
If Solr does drop one result, do you get any
Alright, thanks for all your help. I finally fixed this problem using
PatternReplaceFilterFactory + WordDelimiterFilterFactory.
I first replace _ (underscore) using PatternReplaceFilterFactory and then
use WordDelimiterFilterFactory to generate the word and number parts to
increase user search hits.
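Such an analyzer chain in schema.xml could be sketched like this (the field type name and the exact pattern/parameters are illustrative, not the poster's actual configuration):

```xml
<fieldType name="text_partnum" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- strip the underscore so it no longer glues the parts together -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="_" replacement=" " replace="all"/>
    <!-- split on delimiters and alpha/numeric transitions, keeping the original token -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            splitOnNumerics="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```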
Ah, but what is the definition of punctuation in Solr?
On Wed, Aug 21, 2013 at 11:15 PM, Jack Krupansky j...@basetechnology.com wrote:
I thought that the StandardTokenizer always split on punctuation,
Proving that you haven't read my book! The section on the standard
tokenizer details the
You are right, but here's my null hypothesis for studying the impact on
relevance: hash the query to deterministically seed a random number
generator, then pick one from column A or column B randomly.
This is of course wrong: a query might find two non-relevant results in
corpus A and lots of
OK - I see that this can be done with Field Collapsing/Grouping. I also
see the mentions in the Wiki for avoiding duplicates using a 16-byte hash.
So, question withdrawn...
On Thu, Aug 22, 2013 at 10:21 PM, Dan Davis dansm...@gmail.com wrote:
Suppose I have two documents with different id,
This is actually pretty far afield from my original subject, but it turns
out that I also had issues with NRT and multi-field geospatial
performance in Solr 4, so I'll follow that up.
I've been testing and working with David's SOLR-5170 patch ever since he
posted it, and I pushed it into
Awesome!
Be sure to watch the JIRA issue as it develops. The patch will improve
(I've already improved it but not posted it) and one day a solution is bound
to get committed.
~ David
Jeff Wartes wrote
This is actually pretty far afield from my original subject, but it turns
out that I also
Hi Quan
You claim to be using LatLonType, yet the error you posted makes it clear
you are in fact using SpatialRecursivePrefixTreeFieldType (RPT).
Regardless of which spatial field you use, it's not clear to me what sort of
statistics could be useful on a spatial field. The stats component
Dan,
StandardTokenizer implements the word boundary rules from the Unicode Text
Segmentation standard annex UAX#29:
http://www.unicode.org/reports/tr29/#Word_Boundaries
Every character sequence within UAX#29 boundaries that contains a numeric or an
alphabetic character is emitted as a