Thanks Chris. I understand this. But this test is to determine the *maximum*
latency a query can have and hence I have disabled all caches.
After disabling all caches in solrconfig, I was able to remove latency
variation for a single query in most of the cases. But still *sort* queries
are
Dears,
Hi,
For my requirements I need to change Solr's default behavior of
overwriting the whole document on unique-key duplication. I want the
overwrite to replace just part of the document (some fields) while the other
parts of the document (the other fields) remain unchanged. First of all I
Hi all,
This is what happens when I run a regular wget query to log the current
number of documents indexed:
2014-07-08:07:23:28 QTime=20 numFound=5720168
2014-07-08:07:24:28 QTime=12 numFound=5721126
2014-07-08:07:25:28 QTime=19 numFound=5721126
2014-07-08:07:27:18 QTime=50071
Please look at https://wiki.apache.org/solr/Atomic_Updates
This does what you want: just update the relevant fields.
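For example, an atomic update that changes only selected fields might look like this (field names and values here are made up for illustration):

```json
[ { "id":    "doc1",
    "price": { "set": 99 },
    "tags":  { "add": "sale" } } ]
```

POST that to /update with Content-type: application/json and only those fields change; the rest of the stored document is left as it was.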
Thanks,
Himanshu
On Tue, Jul 8, 2014 at 1:09 PM, Ali Nazemian alinazem...@gmail.com wrote:
Dears,
Hi,
According to my requirement I need to change the default behavior of Solr
Hi all,
I am new to Solr and HDFS. I am trying to index text content
extracted from binary files (PDF, MS Office, etc.) which are stored on
HDFS (single node). So far I have Solr running on HDFS and have created the core,
but I couldn't send the files to Solr for indexing.
Can someone
My First assumption: full gc.
Can you please tell us about your JVM setup and maybe trace what happens
in the JVMs?
On Jul 8, 2014 9:54 AM, Harald Kirsch harald.kir...@raytion.com wrote:
Hi all,
This is what happens when I run a regular wget query to log the current
number of documents indexed:
Dear Himanshu,
Hi,
You misunderstood what I meant. I am not going to update some fields. I am
going to change what Solr does on duplication of the unique-key field. I don't
want Solr to overwrite the whole document; I just want to overwrite some parts
of the document. This situation does not come from the user side
No, no full GC.
The JVM does nothing during the outages, no CPU, no GC, as checked with
jvisualvm and htop.
Harald.
On 08.07.2014 10:12, Heyde, Ralf wrote:
My First assumption: full gc.
Can you please tell us about your JVM setup and maybe trace what happens
in the JVMs?
On Jul 8, 2014 9:54
Hi,
I need to optimize an index created using the CloudSolrServer API under a
SolrCloud setup of 3 instances on separate machines. Currently it optimizes
sequentially when I invoke cloudSolrServer.optimize().
To make it parallel I tried making three separate HttpSolrServer instances
and invoked
Hi guys,
I am wondering if there is any proper way to access the Schema API via SolrJ.
Of course it is possible to reach it in Java with a specific HTTP request,
but that way, when using SolrCloud for example, we become coupled to one
specific instance (and we don't want that).
Code Example :
-- Forwarded message --
From: Poornima Jay poornima...@rocketmail.com
Date: Tue, Jul 8, 2014 at 5:03 PM
Subject: Re: Language detection for solr 3.6.1
When I try to use the solr-langid-3.6.1.jar file in my path
/apache-tomcat-5.5.25/webapps/solr_multilangue_3.6_jar/WEB-INF/lib/
and
When I use the solr-langid-3.5.0.jar file, after reloading the core I am getting the
below error:
SEVERE: java.lang.NoClassDefFoundError: net/arnx/jsonic/JSONException
even after adding the solr-jsonic-3.5.0.jar file in the webapps folder.
Thanks,
Poornima
On Tuesday, 8 July 2014 3:36 PM,
I just realized you are not using the Solr language detection libraries. You
are using a third-party one. You did mention that in your first message.
I don't see that library integrated with Solr though, just as a
standalone library. So you can't just plug it in.
Is there any reason you cannot use one
Hi,
Is it possible to not count the facets for blank values?
e.g. for the field cat:
cat: [
  "", 34324,
  "10", 8635,
  "20", 8226,
  "50", 5162,
  "30", 759,
  "100", 188,
  "40", 13,
  "200", 7 ]
How is this possible?
With Regards
Aman Tandon
I'm using the Google library which I mentioned in my first mail, saying I'm
using http://code.google.com/p/language-detection/. I have downloaded the jar
file from the below URL:
https://www.versioneye.com/java/org.apache.solr:solr-langid/3.6.1
Please let me know from where I need to
Do you need those values stored/indexed? If not, why not remove them
before they hit Solr with an appropriate UpdateRequestProcessor?
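For example, Solr ships a RemoveBlankFieldUpdateProcessorFactory that drops zero-length string values before they are indexed; a sketch of wiring it into solrconfig.xml (the chain name is made up):

```xml
<updateRequestProcessorChain name="skip-blanks">
  <!-- drop fields whose value is a zero-length string -->
  <processor class="solr.RemoveBlankFieldUpdateProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain then has to be referenced from your update request handler via the update.chain parameter.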
Regards,
Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency
On Tue, Jul 8,
On 8 July 2014 15:46, Aman Tandon amantandon...@gmail.com wrote:
Hi,
Is this possible to not to count the facets for the blank values?
e.g. cat:
[...]
Either filter them out in the query, or remove them client-side when
displaying the results.
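For the query-side option, a negative filter on the blank value keeps those documents out of the result set, and therefore out of the facet counts (the field name here is just the one from the example):

```
q=*:*&fq=-cat:""&facet=true&facet.field=cat
```

Note this removes the documents themselves from the results, not just the blank facet bucket.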
Regards,
Gora
@Alex, yes we need them to be indexed and stored, as we are doing some
processing when fields are blank.
@Gora Thanks, I will try this one.
Thanks for your quick replies.
With Regards
Aman Tandon
On Tue, Jul 8, 2014 at 3:53 PM, Gora Mohanty g...@mimirtech.com wrote:
On 8 July 2014 15:46, Aman
Right, but a blank field and a missing field are different things. Are
they for you? If yes, then correct, you are stuck with getting them
back. But if a blank field is the same as a missing/empty field, then
you can pre-process to unify them.
Regards,
Alex.
Personal website:
Yes, I also have the same problem.
In my case I have 2 types (parent and children) in a single collection, and I
want to retrieve only the parents with a facet on a child field.
I've seen that this is possible via a block join query (available since Solr 4.5).
I have Solr 1.2 and I've thought about static facet
Hi,
I think most people on this list have heard of Sematext
http://sematext.com/, so I'll skip the company info, and just jump to the
meat, which involves a lot of fun work with Solr and/or Elasticsearch:
We have an opening for an engineer who knows either Elasticsearch or Solr
or both and wants
Hello,
A new Google Group has been recently started for Solr Users who want
to discuss Solr in Thai or need to discuss Solr issues around Thai
language (in Thai or English).
https://groups.google.com/forum/#!forum/solr-user-thailand
The group is monitored by the local Solr consultancy, one of
Good morning to one and all,
I'm using Solr 4.0 Final and I've been struggling mightily with the
elevation component.
It is too limited for our needs; it doesn't handle phrases very well and I
need to have more than one doc with the same keyword or phrase.
So, I need a better solution. One that
No, both are the same for me.
With Regards
Aman Tandon
On Tue, Jul 8, 2014 at 4:01 PM, Alexandre Rafalovitch arafa...@gmail.com
wrote:
Right, but the blank field and missing field are different things. Are
they for you? If yes, then correct, you are stuck with getting them
back. But if blank
Hi
I'm encountering a surprisingly high increase in response times when I insert
new documents into a SolrCloud, compared with a standalone Solr instance.
I have a SolrCloud set up for test and evaluation purposes. I have four
shards, each with a leader and a replica, distributed over four
Thanks Shawn, I am already using boosting, but the OR condition works for me
as you mentioned.
One question:
If I search the field (TAGs) as-is, it returns a lot of results, but if I try
with parentheses, something like (TAGs), it returns fewer. Why do the
parentheses change the results? They won't
You can sponsor more than one document per keyword:

<query text="AAA">
  <doc id="A" />
  <doc id="B" />
</query>

And you might want to try <str name="queryFieldType">string</str> instead
of another FieldType. I found that text fields remove whitespace and
concatenate the tokens.
Not sure if this is intended or
Updates are currently done locally before concurrently being sent to all
replicas - so on a single update, you can expect 2x just from that.
As for your results, it sounds like perhaps there is more overhead than we
would like in the code that sends to replicas and forwards updates? Someone
You probably do not need to force merge (mistakenly called optimize) your
index.
Solr does automatic merges, which work just fine.
There are only a few situations where a forced merge is even a good idea. The
most common one is a replicated (non-cloud) setup with a full reindex every
night.
Thanks for your suggestions and recommendations.
If I understand correctly, the MIGRATE command does shard splitting
(around the range of the split.key) and merging behind the scene.
Though, it's a bit difficult to properly monitor the actual migration,
set the proper timeouts, know when to
Hi,
I have an issue regarding collection delete.
When a Solr node is down and I delete a collection, everything
seems fine and it deletes the collection from the cluster state too.
But when the dead node comes back, it registers the collection again.
Even when I delete the collection by
I think you are misunderstanding what Himanshu is suggesting to you.
You don't need to make lots of big changes to the internals of Solr's code
to get what you want -- instead you can leverage the Atomic Updates &
Optimistic Concurrency features of Solr to get the existing internal Solr
to
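For instance (a sketch; the field name and version value are made up), an atomic update that also asserts the expected document version is rejected with a conflict if someone else changed the document first:

```json
[ { "id":        "doc1",
    "price":     { "set": 99 },
    "_version_": 1234567890123456789 } ]
```

With a positive _version_ value, Solr only applies the update if the stored document still has exactly that version.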
I have the below config for the field type text_general. But when I search with
a keyword, e.g. 100-001, it gets 100-001 and 100 in the starting records, ending
with 001. I want '-' treated as just another character, not as a split point.
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
The word delimiter filter has a types parameter where you specify a file
that can map hyphen to alpha or numeric.
There is an example in my e-book.
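A sketch of the idea (the file name and the filter's other attributes are whatever your schema already uses):

```xml
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        types="wdfftypes.txt"/>
```

with wdfftypes.txt containing a line such as:

```
- => DIGIT
```

so the hyphen is treated like a digit and 100-001 stays one token.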
-- Jack Krupansky
-Original Message-
From: EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
Sent: Tuesday, July 8, 2014 2:18
On 7/8/2014 1:53 AM, Harald Kirsch wrote:
Hi all,
This is what happens when I run a regular wget query to log the
current number of documents indexed:
2014-07-08:07:23:28 QTime=20 numFound=5720168
2014-07-08:07:24:28 QTime=12 numFound=5721126
2014-07-08:07:25:28 QTime=19 numFound=5721126
Local disks or shared network disks? --wunder
On Jul 8, 2014, at 11:43 AM, Shawn Heisey s...@elyograg.org wrote:
On 7/8/2014 1:53 AM, Harald Kirsch wrote:
Hi all,
This is what happens when I run a regular wget query to log the
current number of documents indexed:
2014-07-08:07:23:28
All,
There is a tech talk on AOL Dulles campus tomorrow. Do swing by if you can and
share it with your colleagues and friends.
www.meetup.com/Code-Brew/events/192361672/
There will be free food and beer served at this event :)
Thanks,
Rishi.
Alessandro,
I just got this to work myself:
public static final String DEFINED_FIELDS_API = "/schema/fields";
public static final String DYNAMIC_FIELDS_API = "/schema/dynamicfields";
...
// just get a connection to Solr as usual (the factory is mine - it
will use
Solr atomic updates allow changing one or more fields of a
document without having to re-index the entire document. But what about
the case where I am sending in the entire document? In that case the whole
document will be re-indexed anyway, right? So I assume that there will be
no
Hello there,
I'm using a project named LIRE for image retrieval based on the Solr platform.
There is a part of the code which I can't understand, so maybe you could help
me.
The project implements request handler named lireq :
public class LireRequestHandler extends RequestHandlerBase
The search
Sure sounds like a socket bug, doesn't it? I turn to tcpdump when Solr starts
behaving strangely in a socket-related way. Knowing exactly what's happening at
the transport level is worth a month of guessing and poking.
On Jul 8, 2014, at 3:53 AM, Harald Kirsch harald.kir...@raytion.com wrote:
(Sorry - my mail was sent half-finished.)
hashes is an array of hash values generated somehow from the image.
So my question is: what is the query being done in this part?
I tried to reconstruct it on my own by constructing a select query with the
hash values separated by OR, but the results were
Atomic updates fetch the doc with RealTimeGet, apply the updates to the fetched
doc, then reindex. Whether you use atomic updates or send the entire doc to
Solr, it has to deleteById then add. The perf difference between the atomic
updates and normal updates is likely minimal.
Atomic updates
Thanks for that under-the-cover explanation.
I am not sure what you mean by mix atomic updates with regular field
values. Can you give an example?
Thanks.
Bill
On Tue, Jul 8, 2014 at 6:56 PM, Steve McKay st...@b.abbies.us wrote:
Atomic updates fetch the doc with RealTimeGet, apply the
Take a look at this update XML:

<add>
  <doc>
    <field name="employeeId">05991</field>
    <field name="employeeName">Steve McKay</field>
    <field name="office" update="set">Walla Walla</field>
    <field name="skills" update="add">Python</field>
  </doc>
</add>
Let's say employeeId is the key. If there's a fourth field,
I see what you mean now. Thanks for the example. It makes things very
clear.
I have been thinking about the explanation in the original response more.
According to that, both a regular update with the entire doc and an atomic
update involve a delete-by-id followed by an add. But both the Solr
The URL in the Solr atomic update documentation should end with json.
Here is the page:
https://wiki.apache.org/solr/UpdateJSON#Solr_4.0_Example
curl http://localhost:8983/solr/update/json -H 'Content-type:application/json'
Why do you think so?
As of Solr 4, the CSV and JSON handlers have been unified into the
general update handler, and /update/json is there for legacy
reasons.
The example should work. If it is not for you, it might be a different reason.
Regards,
Alex.
Personal website:
Hi,
I am currently using Solr 4.7.2 and have a SolrCloud setup running on 2
servers with the number of shards as 2, replication factor as 2 and max shards
per node as 4.
Now, I want to add another server to the SolrCloud as a replica. I can see
Collection API to add a new replica but that was added in
Hi ,
Our Solr setup consists of 2 masters and 2 slaves. The slaves point to
one of the masters through a load balancer and replicate the data.
Master1 (M1) is the primary indexer. I send data to M1. In case M1 fails, I
have a failover master, M2, which would then be indexing the data. The
Thanks Walter for your inputs.
Our use case and performance benchmark requires us to invoke optimize.
Here we see a chance of improvement in performance of optimize() if invoked
in parallel.
I found that if distrib=false is used, the optimization will happen in
parallel.
But I could not find
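One way to sketch that (host and core names are placeholders): hit each core's update handler directly with distrib=false so the optimize stays local to that node, and run the requests concurrently:

```shell
curl 'http://host1:8983/solr/collection1/update?optimize=true&distrib=false' &
curl 'http://host2:8983/solr/collection1/update?optimize=true&distrib=false' &
curl 'http://host3:8983/solr/collection1/update?optimize=true&distrib=false' &
wait
```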
I'm working on a product hosted with AWS that uses Elastic Beanstalk
auto-scaling to good effect and we are trying to set up similar (more or
less) runtime scaling support with Solr. I think I understand how to set
this up, and wanted to check I was on the right track.
We currently run 3 cores on
Yes, you can just call a Core Admin CREATE on the new node with the
collection name and optionally the shard name.
On Wed, Jul 9, 2014 at 9:46 AM, Varun Gupta varun.vgu...@gmail.com wrote:
Hi,
I am currently using Solr 4.7.2 and have SolrCloud setup running on 2
servers with number of
Yes, there is a way.
On the node where the replica needs to be created, hit:

curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=corename&collection=collectionname&shard=shardid'

For example:

curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=mycore&collection=collection1&shard=shard2'
I seriously doubt that you are required to force merge.
How much improvement? And is the big performance cost also OK?
I have worked on search engines that do automatic merges and offer forced
merges for over fifteen years. For all that time, forced merges have usually
caused problems.
Stop
Our index has almost 100M documents running on SolrCloud of 3 shards and
each shard has an index size of about 700GB (for the record, we are not
using stored fields - our documents are pretty large). We perform a full
indexing every weekend and during the week there are no updates made to the