I had to overcome this issue, as I needed to analyze multivalued fields. The
fact that UIMA doesn't analyze multivalued fields is a known bug in UIMA. With
the help of Maryam, I solved the issue. The JIRA issue, along with a working
patch, can be found here:
Hi,
I want to set a 0000-00-00T00:00:00Z value for the date field where I do not
have a value. When I index the date field with the desired value, it is
getting indexed as 0002-11-30T00:00:00Z.
What is the reason behind this?
With Regards
Aman Tandon
On Wed, 2014-10-29 at 23:37 +0100, Will Martin wrote:
This command only touches OS-level caches that hold pages destined for (or
not) the swap cache. Its use means that disk will be hit on future requests,
but in many instances the pages were headed for eviction anyway.
It does not have
Hello all,
Is there a parameter in the Solr 4.10.1 API allowing the user to fix the
prefix length in fuzzy search?
Best regards,
Elisabeth
Hi,
We did some tests with 4 shards / 4 different tomcat instances on the
same server and the average latency was smaller than the one when having
only one shard.
We also tested 2 shards on different servers and the performance results
were also worse.
It seems that the sharding does not make
Hi guys,
just wondering if any solution was found for this?
I have a similar problem - Solr 4.7.2, 2-server cloud, single replicated
shard.
At random times one of the servers dies with the same message as in the
title of this thread.
I was hoping there might be a solution? (upgrading Solr is
Hi,
Luke has a feature of index exporting, given that output format suits your
needs (xml). https://github.com/DmitryKey/luke/releases/tag/luke-4.10.1
http://dmitrykan.blogspot.fi/2014/09/exporting-lucene-index-to-xml-with-luke.html
It does not have the option to export select fields only,
On top of what Shawn rightly said, two things:
1. Try to benchmark yourself (best bet) solution with and without the
shingles. Then you know better and have story with numbers to tell.
2. If you go with the shingles approach, consider removing duplicates with
How can I tell if the stop words issue is resolved? This is what I get when I
turn debugging on:
http://apaste.info/0Uz
When I put:
q=title:(what if) OR title:"what if"^10
I get this:
rawquerystring: title:(what if) OR title:\"what if\"^10,
querystring: title:(what if) OR
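When a phrase-boosted query like title:"what if"^10 is sent over HTTP, the double quotes must survive URL encoding for the boost to apply to the phrase; a rough Python sketch of the encoded parameter (illustrative only, not Solr code):

```python
from urllib.parse import urlencode

# Build the phrase-boosted query exactly as typed, then URL-encode it.
# The quotes must reach Solr intact, otherwise the phrase degenerates
# into separate tokens and the boost lands on the wrong thing.
q = 'title:(what if) OR title:"what if"^10'
qs = urlencode({"q": q})
# the quotes become %22 and the boost caret becomes %5E
```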
Edit:
I filtered my query to author:randall so I could see the score that it's
getting from the query. This is the score of the record that contains what
if:
score: 0.004032644
The other two books are getting this score:
score: 0.0069850935
So... the boost is obviously not hitting that record. I
Thank you Dmitry. Any ideas why the Solr /export is not working for me? I
forgot to mention that this is Solr Cloud.
I believe I've defined the field correctly, and I've also tried using
another field (title), but I get the same error:
Title must have DocValues to use this feature..
My goal is
Solr 4.10 is the very first release of the export feature. It does require
that all fields being sorted and exported have docValues = true in the
schema. This is likely to change in the future, but DocValues will likely
always provide the best indexing option for sorting and exporting full
result
On 10/30/2014 4:32 AM, Anca Kopetz wrote:
We did some tests with 4 shards / 4 different tomcat instances on the
same server and the average latency was smaller than the one when having
only one shard.
We also tested 2 shards on different servers and the performance results
were also worse.
Hello, I have a problem with the design of a schema in Solr. I have a
transcript of a telephone conversation in this format. I parse it into
individual fields. I have this schema:
<?xml version="1.0"?>
<add>
<doc>
<field name="id">01.cn</field>
<field name="t">0<br /> 1<br /> 2<br /> 2 <br /> 3 <br /> </field>
<field name="st">0.00<br />
Are you going to use the values stored in Solr to display the data in HTML? For
searching purposes I suggest deleting all the HTML tags and storing the plain
text. For this you could use the HTMLStripCharFilterFactory char filter; this
will clean your content and only pass the actual text, which
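As a rough illustration of what a tag-stripping step does at analysis time (plain Python with the stdlib HTML parser, not the actual HTMLStripCharFilterFactory implementation):

```python
from html.parser import HTMLParser

class TextOnly(HTMLParser):
    """Collects only character data, roughly mimicking what an
    HTML-stripping char filter passes on to the tokenizer."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

parser = TextOnly()
parser.feed("<p>Hello <b>world</b></p>")
plain = "".join(parser.parts)
assert plain == "Hello world"
```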
Hi,
You are right, it is a mistake in my phrase: for the tests with 4
shards / 4 instances, the latency was worse (therefore *bigger*) than
for the tests with one shard.
In our case, the query rate is high.
Thanks,
Anca
On 10/30/2014 03:48 PM, Shawn Heisey wrote:
On 10/30/2014 4:32 AM, Anca
The other ones are still ranking higher. I think it's because the other two
titles contain "what" 3 times; the more it says "what", the higher it scores.
I'm not sure what else can be done. Does anybody else have any ideas?
--
View this message in context:
Thanks Michael. We’re looking into the use of localparams now.
On 29 Oct 2014, at 12:56, Michael Ryan mr...@moreover.com wrote:
It is indeed possible. Just need to use a different syntax. As far as I know,
the facet parameters need to be local parameters, like this...
I am afraid it is not very clear what you are trying to do here (the
sentence below). Could you explain the business-level goal again?
Are you trying to search for words within a particular given time range?
Can those words span the segments? Or are you trying to find segments
with all their
Thanks for the info Daniel. I will go forth and make a better client.
On Oct 29, 2014, at 2:28 AM, Daniel Collins danwcoll...@gmail.com wrote:
I kind of think this might be working as designed, but I'll be happy to
be corrected by others :)
We had a similar issue which we discovered by
Hi All,
This might be a simple question. I tried to find a solution, but I'm not
exactly finding what I want. I have the following fields f1, f2 and f3. I want to do
an AND query in these fields.
If I want to search for single word in these 3 fields, then I am facing no
problem. I can simply
On 10/30/2014 04:47 AM, Otis Gospodnetic wrote:
Hi/Bok Jakov,
2) sounds good to me. It means no down-time. 1) means stoppage. If
stoppage is not OK, but falling behind with indexing new content is OK, you
could:
* add a new cluster
* start reading from old index and indexing into the new
Solr has never really worked well with years prior to 1 because the
specs for how they should be formatted/parsed -- in particular related to
year 0 -- have always been painfully ambiguous/contradictory.
https://issues.apache.org/jira/browse/SOLR-2773
If you are really trying to deal with year 0
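The 0002-11-30 result in the original question is consistent with lenient date parsing: month 00 rolls back to December of the previous year, day 00 rolls back to the last day of the previous month, and the BC era marker is dropped when formatting. A hedged sketch of that arithmetic (plain Python, not the actual Solr/JDK code path):

```python
# Lenient rollover, mimicking how java.text-style parsing could turn
# 0000-00-00 into 0002-11-30. This illustrates the arithmetic only;
# it is not Solr's or the JDK's actual implementation.
year, month, day = 0, 0, 0
if month == 0:                  # month 0 = the month before January...
    month, year = 12, year - 1  # ...i.e. December of the previous year
if day == 0:                    # day 0 = the day before the 1st...
    month -= 1                  # ...i.e. the last day of the previous month
    day = 30                    # November has 30 days
# astronomical year -1 is the historical year 2 BC; a formatter that
# drops the era prints it as 0002
displayed_year = 1 - year
assert (displayed_year, month, day) == (2, 11, 30)
```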
Hi, a simple question: how to boost a field being non-empty? For some reason
Solr (4.6) returns rows with empty fields first (while the fields are not
part of the search query).
I came across this old thread
http://grokbase.com/t/lucene/solr-user/125e4yenha/boosting-on-field-empty-or-not
, but no solution
Simple question:
What is the best way to automate re-indexing Solr? Set up a cron job / curl script?
Thanks,
Craig
--
Craig Hoffman
w: http://www.craighoffmanphotography.com
FB: www.facebook.com/CraigHoffmanPhotography
TW: https://twitter.com/craiglhoffman
You don't reindex Solr. You reindex data into Solr. So this depends on
where your data is coming from and how often it changes. If the data
does not change, there's no point re-indexing it. And how do you get the data
into Solr in the first place?
Regards,
Alex.
Personal:
Right, of course. The data changes every few days. According to this
article, you can run a CRON Job to create a new index.
http://www.finalconcept.com.au/article/view/apache-solr-hints-and-tips
On Thu, Oct 30, 2014 at 12:04 PM, Alexandre Rafalovitch arafa...@gmail.com
wrote:
You don't reindex
The data gets into Solr via MySQL script.
--
Craig Hoffman
w: http://www.craighoffmanphotography.com
FB: www.facebook.com/CraigHoffmanPhotography
TW: https://twitter.com/craiglhoffman
On Oct 30, 2014, at 12:11 PM, Craig Hoffman mountain@gmail.com wrote:
Right, of course. The
Then you have to run it again and again
On 30 Oct 2014 19:18, Craig Hoffman mountain@gmail.com wrote:
The data gets into Solr via MySQL script.
--
Craig Hoffman
w: http://www.craighoffmanphotography.com
FB: www.facebook.com/CraigHoffmanPhotography
TW:
Do you mean DataImportHandler? If so, you can create full and
incremental queries and trigger them - from CRON - as often as you
would like. E.g. 1am nightly.
Regards,
Alex.
On 30 October 2014 14:17, Craig Hoffman mountain@gmail.com wrote:
The data gets into Solr via MySQL script.
Simply add this line to your crontab with the crontab -e command:
0,30 * * * * /usr/bin/wget
http://solr_host:8983/solr/core_name/dataimport?command=full-import
This will run a full import every 30 minutes. Replace solr_host and core_name
with your configuration.
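If you build the trigger URL programmatically instead of hard-coding it, the query parameters stay properly separated; a small sketch (solr_host and core_name are placeholders, as above):

```python
from urllib.parse import urlencode

# Assemble the DataImportHandler trigger URL that the cron job fetches.
# solr_host and core_name are placeholders for your configuration.
base = "http://solr_host:8983/solr/core_name/dataimport"
url = base + "?" + urlencode({
    "command": "full-import",
    "clean": "true",
    "optimize": "true",
})
```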
*Using delta-import command*
Delta
You can use a FunctionQuery, which allows one to use the actual value of a field,
and functions of those fields, in a relevancy score.
Two functions will help you, which are:
*exists*
exists(field|function) returns true if a value exists for a given document.
Example use: exists(myField) will return
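One common way to apply exists() to the boost-non-empty question is through a boost function parameter; a sketch of the request parameters, assuming edismax and an illustrative field name myField (handler, field, and boost value are all assumptions):

```python
from urllib.parse import urlencode

# Boost documents whose myField is non-empty by adding 10 to their score.
# Field name, query parser, and boost value here are illustrative only.
params = {
    "q": "some search terms",
    "defType": "edismax",
    "bf": "if(exists(myField),10,0)",
}
query_string = urlencode(params)
```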
Thanks! One more question. wget seems to be choking on my URL, in particular the
# and the & character. What's the best method of escaping?
http://My Host
:8983/solr/#/articles/dataimport//dataimport?command=full-import&clean=true&optimize=true
--
Craig Hoffman
w:
You probably just need to put double quotes around the url.
On 10/30/14 15:27, Craig Hoffman wrote:
Thanks! One more question. wget seems to be choking on my URL, in particular the #
and the & character. What's the best method of escaping?
http://My Host
On 10/30/2014 1:27 PM, Craig Hoffman wrote:
Thanks! One more question. wget seems to be choking on my URL, in particular
the # and the & character. What's the best method of escaping?
http://My Host
:8983/solr/#/articles/dataimport//dataimport?command=full-import&clean=true&optimize=true
Putting
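Besides shell quoting, the # in that URL matters for another reason: everything after it is a URL fragment, which HTTP clients never send to the server, so the admin-UI-style path silently loses the dataimport part. A quick sketch (placeholder host):

```python
from urllib.parse import urlsplit

# The admin UI URL contains a '#' fragment; HTTP clients such as wget
# drop fragments, so the server only ever sees the /solr/ path.
ui_url = "http://localhost:8983/solr/#/articles/dataimport"
parts = urlsplit(ui_url)
assert parts.path == "/solr/"
assert parts.fragment == "/articles/dataimport"
# The fix is to call the core's real endpoint directly, e.g.
# http://localhost:8983/solr/articles/dataimport?command=full-import
```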
Thanks everyone. I got it working.
--
Craig Hoffman
w: http://www.craighoffmanphotography.com
FB: www.facebook.com/CraigHoffmanPhotography
TW: https://twitter.com/craiglhoffman
On Oct 30, 2014, at 1:48 PM, Shawn Heisey apa...@elyograg.org wrote:
On 10/30/2014 1:27 PM, Craig
Howdy all -
The short version is: We are not seeing Solr Cloud performance scale (even
close to) linearly as we add nodes. Can anyone suggest good diagnostics for
finding scaling bottlenecks? Are there known 'gotchas' that make Solr Cloud
fail to scale?
In detail:
We have used Solr (in
On 10/30/2014 2:23 PM, Ian Rose wrote:
My methodology is as follows.
1. Start up K Solr servers.
2. Remove all existing collections.
3. Create N collections, with numShards=K for each.
4. Start load testing. Every minute, print the number of successful
updates and the number of failed
Thanks :)
On Thu, Oct 30, 2014 at 7:49 PM, Ramzi Alqrainy ramzi.alqra...@gmail.com
wrote:
You can use FunctionQuery that allows one to use the actual value of a
field
and functions of those fields in a relevancy score.
Two functions will help you, which are:
*exists*
If you want to increase QPS, you should not be increasing numShards.
You need to increase replicationFactor. When your numShards matches the
number of servers, every single server will be doing part of the work
for every query.
I think this is true only for actual queries, right? I am
If you are issuing writes to shard non-leaders, then there is a large overhead
for the eventual redirect to the leader. I noticed a 3-5x performance
increase by making my write client leader-aware.
On Oct 30, 2014, at 2:56 PM, Ian Rose ianr...@fullstory.com wrote:
If you want to
On 10/30/2014 2:56 PM, Ian Rose wrote:
I think this is true only for actual queries, right? I am not issuing
any queries, only writes (document inserts). In the case of writes,
increasing the number of shards should increase my throughput (in
ops/sec) more or less linearly, right?
No, that
Hi All,
We have a SOLR cloud instance that has been humming along nicely for months.
Last week we started experiencing missing records.
Admin DIH Example:
Fetched: 903,993 (736/s), Skipped: 0, Processed: 903,993 (736/s)
A *:* search claims that there are only 903,902; this is the first full
Thank you Alexandre, Jürgen and Erick for your replies. It is clear for me.
Regards
Olivier
2014-10-28 23:35 GMT+01:00 Erick Erickson erickerick...@gmail.com:
And one other consideration in addition to the two excellent responses
so far
In a SolrCloud environment, SolrJ via
I am curious , how many shards do you have and whats the replication factor
you are using ?
On Thu, Oct 30, 2014 at 5:27 PM, AJ Lemke aj.le...@securitylabs.com wrote:
Hi All,
We have a SOLR cloud instance that has been humming along nicely for
months.
Last week we started experiencing
Hi All,
As I previously reported, due to no overlap in terms of the documents in the
SolrCloud replicas of the index shards, I have turned off the replication
and basically have three shards with a replication factor of 1.
It obviously will not be scalable due to the fact that the same core
I think ZK stuff may actually be easier to handle, no?
Add new ones to the existing ZK cluster and then remove the old ones.
Won't this work smoothly?
Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr Elasticsearch Support * http://sematext.com/
On Thu, Oct
Actually I found out how to form the query. I just need to use,
q=f1:(word1 word2) AND f2:(word3 word4) AND f3:(word5 word6)
Thanks,
V.Sriram
--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-And-query-tp4166685p4166744.html
Sent from the Solr - User mailing list
This is not too surprising. There are additional hops necessary for a
cloud setup. This is the sequence; let's say there are 4 shards, the
rows parameter on the query is 10, and you're sorting by score:
node1 receives the request.
node1 sends the request out to each shard.
node1 receives the top 10
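The merge step in that sequence can be sketched as a top-k merge over per-shard results (toy Python with made-up doc ids and scores, not Solr's implementation):

```python
import heapq

# Each shard returns its own top `rows` (doc id, score) pairs; the
# coordinating node merges them into a single global top `rows`.
rows = 3
shard_results = [
    [("a", 0.9), ("b", 0.5)],   # shard 1's top docs
    [("c", 0.8), ("d", 0.7)],   # shard 2's top docs
]
merged = heapq.nlargest(
    rows,
    (pair for shard in shard_results for pair in shard),
    key=lambda pair: pair[1],
)
assert [doc for doc, _ in merged] == ["a", "c", "d"]
```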
So what happens if you increase the boost to 100? Or 20?
The problem is that boosting will always be more art than science.
What about the other 3 possibilities I mentioned?
Basically, you have to tweak things to fit your corpus, and it's often
an empirically determined thing.
Best,
Erick
On
Matt:
You might want to look at SolrJ, in particular with the use of CloudSolrServer.
The big benefit here is that it'll route the docs to the correct leader for each
shard rather than relying on the nodes to communicate with each other.
Here's a SolrJ example. NOTE: it used
bq: ...while the fields are not part of the search query
I'm really confused. The presence or absence of fields that
aren't part of the search should be totally irrelevant to
scoring. Are you perhaps sorting by a different field?
It'd help if you showed us the query you're sending, a sample
of
Hi Chris,
Thanks for replying.
but if your goal, as you said, is to index 0000-00-00T00:00:00Z for
documents that have no value in the date field -- I have to ask why?
I was just trying to index the fields returned by my MySQL and I found this
issue. So I asked in the group. Sorry for writing
Your indexing client, if written in SolrJ, should use CloudSolrServer,
which is, in Matt's terms, leader-aware. It divides up the
documents to be indexed into packets where each doc in
the packet belongs on the same shard, and then sends the packet
to the shard leader. This avoids a lot of
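The routing idea can be sketched as hash-the-id-to-a-shard batching (toy Python; the real CloudSolrServer uses the collection's hash ranges from ZooKeeper, not this exact scheme):

```python
import zlib

# Group documents into per-shard batches by hashing the unique key, so
# each batch can be sent straight to that shard's leader.
num_shards = 3
batches = {shard: [] for shard in range(num_shards)}
for doc_id in ["doc1", "doc2", "doc3", "doc4"]:
    shard = zlib.crc32(doc_id.encode("utf-8")) % num_shards
    batches[shard].append(doc_id)
# every document lands in exactly one batch
assert sum(len(b) for b in batches.values()) == 4
```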
First question: Is there any possibility that some of the docs
have duplicate IDs (uniqueKeys)? If so, then some of
the docs will be replaced, which will lower your returns.
One way to figure this out is to go to the admin screen: if
numDocs < maxDoc, then documents have been replaced.
Also,
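The duplicate-ID effect is easy to picture: the index keeps one document per uniqueKey, so re-sent IDs replace rather than add (toy Python sketch, not Solr internals):

```python
# DIH reports every fetched row as Processed, but the index keeps only
# the last document for each uniqueKey, so duplicates shrink numDocs.
rows = [{"id": "a"}, {"id": "b"}, {"id": "a"}]  # "a" fetched twice
index = {}
for doc in rows:
    index[doc["id"]] = doc      # same id replaces the earlier doc
processed = len(rows)           # what the DIH status shows
num_docs = len(index)           # what a *:* search counts
assert processed == 3 and num_docs == 2
```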
Jakov:
Be particularly aware of the ADDREPLICA collections API
command here:
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api_addreplica
That allows you to specify exactly which node the new replica should be on,
so you can force it to be on the new HW. Here's
: I was just trying to index the fields returned by my MySQL and I found this
If you are importing dates from MySQL where you have 0000-00-00T00:00:00Z
as the default value, you should actually be getting an error last time I
checked, but this explains the right way to tell the MySQL JDBC
Right, but do be aware of one thing. The form
f1:(word1 word2) has an implicit OR between them
based on q.op which is specified in your
solrconfig.xml file for the request handler you're
using.
This is no problem, but if you ever specify q.op as AND
either in solrconfig.xml or as an explicit
+1 for CloudSolrServer
CloudSolrServer also has built in fault tolerance (i.e. if the master shard
is not reachable then it adds to the replica) and much better error
reporting than ConcurrentUpdateSolrServer. The only downside is lack of
batching. As long as you are adding documents in decent
Thanks Erick. I tried q.op=AND and noticed that it is equivalent to
specifying,
q=f1:word1 word2 AND f2:word3 word4 AND f3:word5 word6
--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-And-query-tp4166685p4166760.html
Sent from the Solr - User mailing list archive at
Thanks for the suggestions so for, all.
1) We are not using SolrJ on the client (not using Java at all) but I am
working on writing a smart router so that we can always send to the
correct node. I am certainly curious to see how that changes things.
Nonetheless even with the overhead of extra
Uh... that may be true for your particular example data set, but not
in the general case, so don't be fooled.
q.op=AND is equivalent to
q=f1:(word1 AND word2) AND f2:(word3 AND word4) AND f3:(word5 AND word6)
This query
q=f1:word1 word2 AND f2:word3 word4 AND f3:word5 word6
would not match a
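That q.op distinction can be made concrete with a toy matcher (illustrative Python, nothing like Solr's actual query parser):

```python
# With q.op=OR a parenthesized group needs any of its terms; with
# q.op=AND it needs all of them.
def group_matches(doc, field, terms, q_op):
    words = doc.get(field, "").split()
    hits = [term in words for term in terms]
    return all(hits) if q_op == "AND" else any(hits)

doc = {"f1": "word1", "f2": "word3 word4"}
# f1:(word1 word2) under the default q.op=OR: word1 alone is enough
assert group_matches(doc, "f1", ["word1", "word2"], "OR")
# under q.op=AND the same group now requires both terms, so it fails
assert not group_matches(doc, "f1", ["word1", "word2"], "AND")
```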
I'm really confused:
bq: I am not issuing any queries, only writes (document inserts)
bq: It's clear that once the load test client has ~40 simulated users
bq: A cluster of 3 shards over 3 Solr nodes *should* support
a higher QPS than 2 shards over 2 Solr nodes, right
QPS is usually used to