Hi Friends,
I'm new to Solr, having worked with it for the past 2-3 months trying to
really get my feet wet so that I can transition the search engine at my
current job to Solr. (Eww, Sphinx, haha.) Anyway, I need some
help. I was running around the net getting my suggester working and
On 12/12/2016 1:14 PM, Piyush Kunal wrote:
> We did the following change:
>
> 1. Previously we had 1 shard and 32 replicas for 1.2 million documents of
> size 5 GB.
> 2. We changed it to 4 shards and 8 replicas for 1.2 million documents of
> size 5 GB.
How many machines, and how many shards per machine, were involved?
Sharding adds inevitable overhead. In particular,
each request, rather than being serviced on a
single replica, has to send out a first request
to one replica of each shard, get the IDs and sort criteria back,
and then send out a second request to fetch the actual docs.
Especially if you're asking for a lot of rows, this adds up.
bq: We are indexing with autocommit at 30 minutes
OK, check the size of your tlogs. What this means is that all the
updates accumulate for 30 minutes in a single tlog. That tlog will be
closed when autocommit happens and a new one opened for the
next 30 minutes. The first tlog won't be purged until later commits
roll it off, so long autocommit intervals mean very large tlogs.
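For reference, the autocommit interval lives in the <updateHandler> section of solrconfig.xml; a minimal sketch (the 15-second value is only illustrative, not a recommendation for this case):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: closes the current tlog and starts a new one.
       openSearcher=false keeps the commit cheap, since no new
       searcher (and no cache warming) is triggered. -->
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```

Shorter hard-commit intervals cap tlog size without affecting document visibility, since visibility is governed by openSearcher/softCommit.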
bq: ...so I wonder if reducing the heap is going to help or it won’t
matter that much...
Well, if you're hitting OOM errors then you have no _choice_ but to
reduce the heap requirements. Or increase the memory. And you don't have
much physical memory to grow into.
Longer term, reducing the JVM size (assuming
The post you linked to strongly advises buying SSDs. I
got in touch with the systems department in my organization and it turns
out that our VM storage is SSD-backed, so I wonder whether reducing the
heap is going to help or won’t matter that much. Of course, there’s
nothing
One option:
First, you may purge all documents you don't need before the full re-index;
you don't need to run optimize unless you need the old data to serve
queries at the same time.
I think you are running into out-of-space because your 43 million documents
may be consuming 30% of total disk space, and when you re-index the total
We are having an issue with running out of space when trying to do a
full re-index.
We are indexing with autocommit at 30 minutes.
We have it set to only optimize at the end of an indexing cycle.
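To see why an index under 30% of disk can still run out of space, here is a rough back-of-the-envelope model (a sketch; the 2x/3x worst-case multipliers are assumptions about re-index plus forceMerge behavior, not measurements):

```python
# Rough model of peak disk usage during a full re-index followed by an
# optimize. All numbers are illustrative assumptions.

def peak_disk_fraction(index_fraction):
    """index_fraction: current optimized index size as a fraction of disk.

    During a full re-index the old segments stay on disk until the new
    ones replace them, so old + new can approach 2x. An optimize
    (forceMerge) then rewrites everything into one segment, transiently
    needing another full copy: up to ~3x the optimized size, worst case.
    """
    reindex_peak = 2 * index_fraction              # old + new index
    optimize_peak = reindex_peak + index_fraction  # plus the merge target
    return optimize_peak

# An index at 30% of disk can transiently want ~90% of the disk,
# which matches the "out of space" failure described above.
print(peak_disk_fraction(0.30))
```

This is why deferring optimize until after the old documents are purged (or skipping it entirely) lowers the peak.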
On 12/12/2016 02:43 PM, Erick Erickson wrote:
First off, optimize is actually rarely necessary.
All our shards and replicas reside on different machines with 16 GB RAM and
4 cores.
On Tue, Dec 13, 2016 at 1:44 AM, Piyush Kunal
wrote:
We did the following change:
1. Previously we had 1 shard and 32 replicas for 1.2 million documents of
size 5 GB.
2. We changed it to 4 shards and 8 replicas for 1.2 million documents of
size 5 GB.
We have a combined request rate of around 20k RPM for Solr.
But unfortunately we saw a degradation in performance.
How much difference is there between the two parameters (num docs vs. max
docs) on your Solr stats screen? For example, in our case we have very
frequent updates, which results in max docs = num docs x 2 over time, and in
that case I have seen optimization help query performance. Unless you have huge
Sorry, my mistake... sent to the wrong list.
- Original Message -
From: "Shawn Heisey"
To: solr-user@lucene.apache.org
Sent: Monday, December 12, 2016 2:36:26 PM
Subject: Re: regex-urlfilter help
On 12/12/2016 12:19 PM, KRIS MUSSHORN wrote:
> I'm using nutch
I've scoured my Nutch and Solr config files and I can't find any cause.
Suggestions?
Monday, December 12, 2016 2:37:13 PM ERROR null RequestHandlerBase
org.apache.solr.common.SolrException: Unexpected character '&' (code 38) in
epilog; expected '<'
Hi,
I have an external ZooKeeper; I don't want to use the embedded one, even for
a test. I upload the configs to ZooKeeper:
server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd upconfig
-confdir server/solr/my_collection/conf -confname my_collection
Start servers:
Server 1: bin/solr start -cloud -d
First off, optimize is actually rarely necessary. I wouldn't bother
unless you have measurements to prove that it's desirable.
I would _certainly_ not call optimize every 10M docs. If you must call
it at all call it exactly once when indexing is complete. But see
above.
As far as the commit, I'd
The biggest bang for the buck is _probably_ docValues for the fields
you facet on. If that's the culprit, you can also reduce your JVM heap
considerably; as Toke says, leaving too little memory for the OS is
bad. Here's the writeup on why:
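For reference, enabling docValues is a schema.xml change (the field name below is hypothetical, and the change requires a full re-index to take effect):

```xml
<!-- Sketch: docValues on a field used for faceting. docValues keeps the
     uninverted structure on disk / off-heap, instead of building it on
     the Java heap at query time as the fieldCache does. -->
<field name="category" type="string" indexed="true" stored="true"
       docValues="true"/>
```

With the facet data off-heap, the JVM heap can usually shrink, leaving more memory for the OS page cache.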
On 12/12/2016 12:19 PM, KRIS MUSSHORN wrote:
> I'm using nutch 1.12 and Solr 5.4.1.
>
> Crawling a website and indexing into nutch.
>
> AFAIK the regex-urlfilter.txt file will cause content to not be crawled..
>
> what if I have
> https:///inside/default.cfm as my seed url...
>
> I don't see any weird character when I manual copy it to any text editor.
That's a good diagnostic step, but there's a chance that Adobe (or your viewer)
got it right, and Tika or PDFBox isn't getting it right.
If you run tika-app on the file [0], do you get the same problem? See our stub
I'm using Nutch 1.12 and Solr 5.4.1.
Crawling a website and indexing into Solr.
AFAIK the regex-urlfilter.txt file will cause content to not be crawled.
What if I have
https:///inside/default.cfm as my seed URL...
I want the links on this page to be crawled and indexed but I
Multilingual is - hard - fun. What you are trying to do is probably
not super-doable, as copyField copies the original text representation. You
don't want to copy tokens anyway, as your query-time analysis chains
are different too.
I would recommend looking at the books first.
Mine talks about
Double-check whether your queries are running into deep pagination
(q=*:*&start=<large>&rows=<large>). This is something I recently experienced,
and it was the sole cause of an OOM. If you have the GC logs from when the
OOM happened, plotting them in GCViewer may give insight into how gradually
your heap filled up before hitting the OOM.
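A quick illustration of the cost (the shard count and offsets below are made up):

```python
# Why deep paging hurts: to serve a page at a deep offset, each shard
# must return its top (start + rows) entries, and the coordinating node
# must merge all of them in memory.

def merged_queue_entries(start, rows, shards):
    """Number of (id, sort-value) entries the coordinator must rank."""
    return (start + rows) * shards

# Page 1 is tiny; a deep offset balloons into millions of entries,
# which is how a harmless-looking q=*:* query can drive an OOM.
print(merged_queue_entries(0, 10, 4))          # 40
print(merged_queue_entries(1_000_000, 10, 4))  # 4000040
```

Solr's cursorMark (available since 4.7) avoids this by carrying a sort-value checkpoint instead of a start offset, so each page costs roughly the same.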
Halp!
I need to reindex over 43 million documents; when optimized, the
collection currently takes < 30% of disk space. We tried it over this
weekend and it ran out of space during the reindexing.
I'm thinking the best solution for what we are trying to do is to
call commit/optimize every
We use the jdeb Maven plugin to build the Debian packages; we use it for Solr
as well.
On Dec 12, 2016 9:03 AM, "Adjamilton Junior" wrote:
> Hi folks,
>
> I am new here and I wonder to know why there's no Solr 6.x packages for
> ubuntu/debian?
>
> Thank you.
>
> Adjamilton Junior
>
Hi,
One can use a wildcard for the highlight fields, like:
content_*
so both content_de and content_en can match it. However, the response will
include the individual fields:
"highlighting":{
"my query":{
"content_de":
"content_en":
...
Is it possible to map the matched fields into a predefined field?
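As far as I know there is no server-side option for this, but it is easy to collapse client-side. A Python sketch (the response dict is an invented illustration and the function name is mine, not a Solr API):

```python
# Collapse all content_* highlight snippet lists into one target field.

def merge_highlight_fields(highlighting, prefix="content_", target="content"):
    """For each doc in the highlighting section, gather snippets from
    every field starting with `prefix` under a single `target` key."""
    merged = {}
    for doc_id, fields in highlighting.items():
        snippets = []
        for name, frags in fields.items():
            if name.startswith(prefix):
                snippets.extend(frags)
        merged[doc_id] = {target: snippets}
    return merged

response = {"doc1": {"content_de": ["<em>Hallo</em>"],
                     "content_en": ["<em>hello</em>"]}}
print(merge_highlight_fields(response))
# {'doc1': {'content': ['<em>Hallo</em>', '<em>hello</em>']}}
```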
Hi,
I'm testing language identification. I've enabled it in solrconfig.xml. Here
are my dynamic fields in the schema:
So, after indexing, I see that these fields are generated:
content_en
content_ru
I copy my fields into a text field:
Here is my text field:
I want to let users search on only
Hi Ahmet,
I don't see any weird character when I manually copy it to any text editor.
On Sat, Dec 10, 2016 at 6:19 PM, Ahmet Arslan
wrote:
> Hi Furkan,
>
> I am pretty sure this is a pdf extraction thing.
> Turkish characters caused us trouble in the past during
Thanks again.
I’m learning more about Solr in this thread than in my previous months
reading about it!
Moving to Solr Cloud is a possibility we’ve discussed and I guess it
will eventually happen, as the index will grow no matter what.
I’ve already lowered filterCache from 512 to 64 and I’m
Ah, 2-phase distributed search is the most likely answer (and
currently classified as more of a limitation than a bug)...
Phase 1 collects the top N ids from each shard (and merges them to
find the global top N)
Phase 2 retrieves the stored fields for the global top N
If any of the ids have been
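The two phases can be sketched in miniature (the scores, ids, and shard contents below are invented illustrative data):

```python
# Toy model of 2-phase distributed search: phase 1 merges per-shard
# (sort-value, id) pairs; phase 2 fetches stored fields for the winners.

def phase1_merge(shards, n):
    """Phase 1: each shard contributes its top-n (score, doc_id) pairs;
    the coordinator merges them into the global top n."""
    candidates = []
    for docs in shards:
        candidates.extend(sorted(docs, reverse=True)[:n])
    return sorted(candidates, reverse=True)[:n]

def phase2_fetch(stores, top):
    """Phase 2: a second round-trip retrieves stored fields, but only
    for the global top-n ids."""
    return [stores[doc_id] for _, doc_id in top]

shards = [[(0.9, "a"), (0.2, "b")], [(0.8, "c"), (0.7, "d")]]
stores = {"a": {"title": "A"}, "b": {"title": "B"},
          "c": {"title": "C"}, "d": {"title": "D"}}
top = phase1_merge(shards, 2)     # [(0.9, 'a'), (0.8, 'c')]
print(phase2_fetch(stores, top))  # [{'title': 'A'}, {'title': 'C'}]
```

Note that every shard does top-n work in phase 1 even though most of its candidates lose the merge, which is the per-request overhead sharding adds.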
On 12/12/2016 7:03 AM, Adjamilton Junior wrote:
> I am new here and I wonder to know why there's no Solr 6.x packages
> for ubuntu/debian?
There are no official Solr packages for ANY operating system. We have
binary releases that include an installation script for UNIX-like
operating systems
On 12/12/2016 3:13 AM, Alfonso Muñoz-Pomer Fuentes wrote:
> I’m writing because in our web application we’re using Solr 5.1.0 and
> currently we’re hosting it on a VM with 32 GB of RAM (of which 30 are
> dedicated to Solr and nothing else is running there). We have four
> cores, that are this
You can also try the following:
1. Reduce the thread stack size using the -Xss flag.
2. Try sharding instead of a single large instance (if possible).
3. Reduce the cache sizes in solrconfig.xml.
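For point 3, a sketch of a smaller filterCache in solrconfig.xml (the sizes are illustrative; each cached filter can cost up to maxDoc/8 bytes, so large caches add up quickly on big indexes):

```xml
<filterCache class="solr.FastLRUCache"
             size="64"
             initialSize="64"
             autowarmCount="16"/>
```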
Regards,
Prateek Jain
-Original Message-
From: Alfonso Muñoz-Pomer Fuentes
Hi folks,
I am new here and I wonder to know why there's no Solr 6.x packages for
ubuntu/debian?
Thank you.
Adjamilton Junior
On 12/12/2016 3:56 AM, Rainer Gnan wrote:
> Do the query this way:
> http://hostname.de:8983/solr/live/select?indent=on&q=*:*
>
> I have no idea whether the behavior you are seeing is correct or wrong,
> but if you send the traffic directly to the alias it should work correctly.
>
> It might turn
I wasn’t aware of docValues and filterCache policies. We’ll try to
fine-tune it and see if it helps.
Thanks so much for the info!
On 12/12/2016 12:13, Toke Eskildsen wrote:
On Mon, 2016-12-12 at 10:13 +, Alfonso Muñoz-Pomer Fuentes wrote:
I’m writing because in our web application we’re
Thanks for the reply. Here’s some more info...
Disk space:
39 GB / 148 GB (used / available)
Deployment model:
Single instance
JVM version:
1.7.0_04
Number of queries:
avgRequestsPerSecond: 0.5478469104833896
GC algorithm:
None specified, so I guess it defaults to the parallel GC.
On
Hello - I need to traverse the list of response docs in a SearchComponent,
get all the values for a specific field, and then conditionally add a new field.
The request handler is configured as follows:
dostuff
I can see that Solr calls the component's process() method, but from
On Mon, 2016-12-12 at 10:13 +, Alfonso Muñoz-Pomer Fuentes wrote:
> I’m writing because in our web application we’re using Solr 5.1.0
> and currently we’re hosting it on a VM with 32 GB of RAM (of which 30
> are dedicated to Solr and nothing else is running there).
This leaves very little
Hi Shawn,
your workaround works and is exactly what I was looking for.
Did you find this solution via trial and error, or can you point me to the
appropriate section of the Apache Solr Reference Guide?
Thanks a lot!
Rainer
Rainer Gnan
Bayerische Staatsbibliothek
On 12/12/2016 3:32 AM, Rainer Gnan wrote:
> Hi,
>
> actually I am trying to use Collection Aliasing in a SolrCloud-environment.
>
> My set up is as follows:
>
> 1. Collection_1 (alias "live") linked with config_1
> 2. Collection_2 (alias "test") linked with config_2
> 3. Collection_1 is different
On 12/11/2016 8:00 PM, Brian Narsi wrote:
> We are using Solr 5.1.0 and DIH to build index.
>
> We are using DIH with clean=true and commit=true and optimize=true.
> Currently retrieving about 10.5 million records in about an hour.
>
> I will like to find from other member's experiences as to how
Hi,
actually I am trying to use Collection Aliasing in a SolrCloud-environment.
My setup is as follows:
1. Collection_1 (alias "live") linked with config_1
2. Collection_2 (alias "test") linked with config_2
3. Collection_1 is different to Collection _2
4. config_1 is different to config_2
Please provide some information, like:
disk space available
deployment model of Solr (SolrCloud or single instance)
JVM version
number and type of queries
GC algorithm used, etc.
Regards,
Prateek Jain
-Original Message-
From: Alfonso Muñoz-Pomer Fuentes
I am not sure that it's related, but with local tests we got to a scenario
where we add a doc that somehow has an *empty key*, and then, when querying
with a sort over creationTime with rows=1, we get an empty result set.
When specifying the recent doc's shard with shards=shard2 we do have results.
I
Hi
Thanks for the reply.
We are using
select?q=*:*&sort=creationTimestamp+desc&rows=1
So as you said we should have got results.
Another piece of information is that we commit within 300ms when inserting
the "sanity" doc.
And again, we delete by query.
We don't have any custom plugin/query
Am 12.12.2016 um 04:00 schrieb Brian Narsi:
> We are using Solr 5.1.0 and DIH to build index.
>
> We are using DIH with clean=true and commit=true and optimize=true.
> Currently retrieving about 10.5 million records in about an hour.
>
> I will like to find from other member's experiences as to
Hi Erick,
thanks for the hint. Indeed, I just forgot to paste the
section into the email. It was configured just the same way as you
wrote. Do you have any idea what else could be the cause of the error?
Best regards,
Gero
-Original Message-
From: Erick Erickson