Re: Parallel optimize of index on SolrCloud.

2014-07-08 Thread Shalin Shekhar Mangar
Hi Walter, I wonder why you think SolrCloud isn't necessary if you're indexing once per week. Isn't the automatic failover and auto-sharding still useful? One can also do custom sharding with SolrCloud if necessary. On Wed, Jul 9, 2014 at 11:38 AM, Walter Underwood wrote: > More memory or fast

Re: Parallel optimize of index on SolrCloud.

2014-07-08 Thread Walter Underwood
More memory or faster disks will make a much bigger improvement than a forced merge. What are you measuring? If it is average query time, that is not a good measure. Look at 90th or 95th percentile. Test with queries from logs. No user can see a 10% or 20% difference. If your managers are watch
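As an illustration of the point about percentiles, here is a minimal Python sketch, assuming the query times (in milliseconds) have already been extracted from the request logs. The sample values and the function name are hypothetical; the percentile uses the simple nearest-rank definition.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical query times in milliseconds, e.g. parsed from request logs.
qtimes_ms = [12, 15, 11, 14, 13, 16, 12, 900, 13, 1500]

mean_ms = sum(qtimes_ms) / len(qtimes_ms)
p90_ms = percentile(qtimes_ms, 90)
p95_ms = percentile(qtimes_ms, 95)

print(f"mean={mean_ms:.1f}ms p90={p90_ms}ms p95={p95_ms}ms")
```

With samples like these, the mean is dominated by two slow requests and describes neither the typical query nor the slow tail, which is why the percentiles are more informative.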

Re: External File Field eating memory

2014-07-08 Thread Kamal Kishore Aggarwal
Hi All, It was found that the external file, which was being replicated every 10 minutes, was reloading the core as well. This was increasing the query time. Thanks Kamal Kishore On Thu, Jul 3, 2014 at 12:48 PM, Kamal Kishore Aggarwal <kkroyal@gmail.com> wrote: > With the above replic

Re: Parallel optimize of index on SolrCloud.

2014-07-08 Thread Modassar Ather
Our index has almost 100M documents running on SolrCloud of 3 shards and each shard has an index size of about 700GB (for the record, we are not using stored fields - our documents are pretty large). We perform a full indexing every weekend and during the week there are no updates made to the index

Re: Parallel optimize of index on SolrCloud.

2014-07-08 Thread Walter Underwood
I seriously doubt that you are required to force merge. How much improvement? And is the big performance cost also OK? I have worked on search engines that do automatic merges and offer forced merges for over fifteen years. For all that time, forced merges have usually caused problems. Stop do

Re: Add a new replica to SolrCloud

2014-07-08 Thread Himanshu Mehrotra
Yes, there is a way. On the node on which the replica needs to be created, run curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=&collection=&shard=<shardid>' For example curl ' http:/

Re: Add a new replica to SolrCloud

2014-07-08 Thread Shalin Shekhar Mangar
Yes, you can just call a Core Admin CREATE on the new node with the collection name and optionally the shard name. On Wed, Jul 9, 2014 at 9:46 AM, Varun Gupta wrote: > Hi, > > I am currently using Solr 4.7.2 and have SolrCloud setup running on 2 > servers with number of shards as 2, replication
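A minimal Python sketch of building the Core Admin CREATE request described above. The parameter names (action, name, collection, shard) are the ones shown in this thread; the core, collection, and shard values are hypothetical placeholders, and the sketch only builds the URL rather than sending it.

```python
from urllib.parse import urlencode


def core_create_url(base_url, core_name, collection, shard=None):
    """Build a Core Admin CREATE URL; the shard parameter is optional."""
    params = {"action": "CREATE", "name": core_name, "collection": collection}
    if shard is not None:
        params["shard"] = shard
    return base_url.rstrip("/") + "/admin/cores?" + urlencode(params)


# Hypothetical names; substitute the real core, collection, and shard.
url = core_create_url(
    "http://localhost:8983/solr",
    "mycollection_shard1_replica3",
    "mycollection",
    "shard1",
)
print(url)
```

The resulting URL would be requested against the new node itself (for example with curl), since that is where the core is created.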

Planning ahead for Solr Cloud and Scaling

2014-07-08 Thread Zane Rockenbaugh
I'm working on a product hosted with AWS that uses Elastic Beanstalk auto-scaling to good effect and we are trying to set up similar (more or less) runtime scaling support with Solr. I think I understand how to set this up, and wanted to check I was on the right track. We currently run 3 cores on

Re: Parallel optimize of index on SolrCloud.

2014-07-08 Thread Modassar Ather
Thanks Walter for your inputs. Our use case and performance benchmark require us to invoke optimize. Here we see a chance of improvement in performance of optimize() if invoked in parallel. I found that if distrib=false is used, the optimization will happen in parallel. But I could not find a
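A rough Python sketch of the idea in this message: send an optimize request with distrib=false to each core separately so the requests run concurrently. This is only an illustration under assumptions, not the poster's code. The core URLs are hypothetical, and the `send` argument is injectable so the dispatch logic can be tried without a running Solr.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def optimize_url(core_url):
    """URL asking a single core to optimize without distributing the request."""
    return core_url.rstrip("/") + "/update?optimize=true&distrib=false"


def http_get(url):
    """Default sender: plain HTTP GET that returns the response status code."""
    with urlopen(url) as response:
        return response.status


def optimize_in_parallel(core_urls, send=http_get):
    """Dispatch one optimize request per core concurrently; map URL -> result."""
    urls = [optimize_url(u) for u in core_urls]
    with ThreadPoolExecutor(max_workers=max(1, len(urls))) as pool:
        return dict(zip(urls, pool.map(send, urls)))
```

It could be called as `optimize_in_parallel(["http://host1:8983/solr/collection1", "http://host2:8983/solr/collection1"])`, where the host and core names are placeholders. Note the caution elsewhere in this thread that a forced merge is often unnecessary.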

Synchronising two masters

2014-07-08 Thread Prasi S
Hi, Our Solr setup consists of 2 masters and 2 slaves. The slaves point to either of the masters through a load balancer and replicate the data. Master1 (M1) is the primary indexer. I send data to M1. In case M1 fails, I have a failover master, M2, which would index the data. The pro

Add a new replica to SolrCloud

2014-07-08 Thread Varun Gupta
Hi, I am currently using Solr 4.7.2 and have a SolrCloud setup running on 2 servers with number of shards as 2, replication factor as 2 and max shards per node as 4. Now, I want to add another server to the SolrCloud as a replica. I can see a Collection API to add a new replica but that was added in

Re: fix wiki error

2014-07-08 Thread Alexandre Rafalovitch
Why do you think so? As of Solr 4, the CSV and JSON handlers have been unified in the general update handler and the /update/json is there for legacy reasons. The example should work. If it does not work for you, there might be a different reason. Regards, Alex. Personal website: http://www.outerthought

fix wiki error

2014-07-08 Thread Susmit Shukla
The URL for the Solr atomic update documentation should contain json at the end. Here is the page - https://wiki.apache.org/solr/UpdateJSON#Solr_4.0_Example curl http://localhost:8983/solr/update/json -H 'Content-type:application/json'

Re: Solr atomic updates question

2014-07-08 Thread Bill Au
I see what you mean now. Thanks for the example. It makes things very clear. I have been thinking about the explanation in the original response more. According to that, both a regular update with the entire doc and an atomic update involve a delete by id followed by an add. But both the Solr reference

Re: Solr atomic updates question

2014-07-08 Thread Steve McKay
Take a look at this update XML: 05991 Steve McKay Walla Walla Python Let's say employeeId is the key. If there's a fourth field, salary, on the existing doc, should it be deleted or retained? With this update it will obviously be deleted: 05991 Steve McKay

Re: Solr atomic updates question

2014-07-08 Thread Bill Au
Thanks for that under-the-cover explanation. I am not sure what you mean by "mix atomic updates with regular field values". Can you give an example? Thanks. Bill On Tue, Jul 8, 2014 at 6:56 PM, Steve McKay wrote: > Atomic updates fetch the doc with RealTimeGet, apply the updates to the > fe

Re: Solr atomic updates question

2014-07-08 Thread Steve McKay
Atomic updates fetch the doc with RealTimeGet, apply the updates to the fetched doc, then reindex. Whether you use atomic updates or send the entire doc to Solr, it has to deleteById then add. The perf difference between the atomic updates and "normal" updates is likely minimal. Atomic updates
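The fetch, apply, and reindex sequence described here can be modelled with a small in-memory Python sketch. This is a simulation of the semantics only, not Solr code: the index is a plain dict, the field names employeeId and salary come from this thread while name and city are assumed, and only the set, add, and inc modifiers are modelled.

```python
def regular_update(index, doc, key="employeeId"):
    """Replace the whole stored document; fields absent from doc are lost."""
    index[doc[key]] = dict(doc)


def atomic_update(index, update, key="employeeId"):
    """Fetch the stored doc, apply per-field modifiers, then replace it."""
    merged = dict(index.get(update[key], {}))
    for field, modifiers in update.items():
        if field == key:
            continue
        if not isinstance(modifiers, dict):
            # Mixing plain values into an atomic update is not modelled here.
            raise ValueError(f"field {field!r} needs a modifier such as 'set'")
        for op, arg in modifiers.items():
            if op == "set":
                merged[field] = arg
            elif op == "inc":
                merged[field] = merged.get(field, 0) + arg
            elif op == "add":
                current = merged.get(field, [])
                if not isinstance(current, list):
                    current = [current]
                merged[field] = current + (arg if isinstance(arg, list) else [arg])
            else:
                raise ValueError(f"unsupported modifier {op!r}")
    merged[key] = update[key]
    index[update[key]] = merged  # same final step as a regular update


index = {}
regular_update(index, {"employeeId": "05991", "name": "Steve McKay",
                       "city": "Walla Walla", "salary": 1000})

# Atomic update: only city changes, salary is retained.
atomic_update(index, {"employeeId": "05991", "city": {"set": "Seattle"}})
after_atomic = dict(index["05991"])

# Regular update without salary: the stored salary is dropped.
regular_update(index, {"employeeId": "05991", "name": "Steve McKay",
                       "city": "Seattle"})
after_regular = dict(index["05991"])
```

In both cases the stored document is replaced as a whole at the end; the difference is only in how the replacement document is assembled.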

Re: What does getSearcher method of SolrQueryRequest means ?

2014-07-08 Thread Yossi Biton
(Sorry - my mail was sent half ready) hashes is an array of hash values generated somehow from the image. So my question is: what query is being done in this part? I tried to reconstruct it on my own, by constructing a select query with the hash values separated by OR, but the results were diff

Re: Solr irregularly having QTime > 50000ms, stracing solr cures the problem

2014-07-08 Thread Steve McKay
Sure sounds like a socket bug, doesn't it? I turn to tcpdump when Solr starts behaving strangely in a socket-related way. Knowing exactly what's happening at the transport level is worth a month of guessing and poking. On Jul 8, 2014, at 3:53 AM, Harald Kirsch wrote: > Hi all, > > This is wha

What does getSearcher method of SolrQueryRequest means ?

2014-07-08 Thread Yossi Biton
Hello there, I'm using a project named LIRE for image retrieval based on the Solr platform. There is a part of the code which I can't understand, so maybe you could help me. The project implements a request handler named lireq : public class LireRequestHandler extends RequestHandlerBase The search metho

Solr atomic updates question

2014-07-08 Thread Bill Au
Solr atomic update allows for changing only one or more fields of a document without having to re-index the entire document. But what about the case where I am sending in the entire document? In that case the whole document will be re-indexed anyway, right? So I assume that there will be no savi

RE: [Solr Schema API] SolrJ Access

2014-07-08 Thread Cario, Elaine
Alessandro, I just got this to work myself: public static final String DEFINED_FIELDS_API = "/schema/fields"; public static final String DYNAMIC_FIELDS_API = "/schema/dynamicfields"; ... // just get a connection to Solr as usual (the factory is mine - it will use CloudSol

SOLR Talk at AOL Dulles Campus.

2014-07-08 Thread Rishi Easwaran
All, There is a tech talk on AOL Dulles campus tomorrow. Do swing by if you can and share it with your colleagues and friends. www.meetup.com/Code-Brew/events/192361672/ There will be free food and beer served at this event :) Thanks, Rishi.

Re: Solr irregularly having QTime > 50000ms, stracing solr cures the problem

2014-07-08 Thread Walter Underwood
Local disks or shared network disks? --wunder On Jul 8, 2014, at 11:43 AM, Shawn Heisey wrote: > On 7/8/2014 1:53 AM, Harald Kirsch wrote: >> Hi all, >> >> This is what happens when I run a regular wget query to log the >> current number of documents indexed: >> >> 2014-07-08:07:23:28 QTime=

Re: Solr irregularly having QTime > 50000ms, stracing solr cures the problem

2014-07-08 Thread Shawn Heisey
On 7/8/2014 1:53 AM, Harald Kirsch wrote: > Hi all, > > This is what happens when I run a regular wget query to log the > current number of documents indexed: > > 2014-07-08:07:23:28 QTime=20 numFound="5720168" > 2014-07-08:07:24:28 QTime=12 numFound="5721126" > 2014-07-08:07:25:28 QTime=19 numFoun

Re: Hypen in search keyword

2014-07-08 Thread Jack Krupansky
The word delimiter filter has a "types" parameter where you specify a file that can map hyphen to alpha or numeric. There is an example in my e-book. -- Jack Krupansky -Original Message- From: EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions) Sent: Tuesday, July 8, 2014 2:18

Hypen in search keyword

2014-07-08 Thread EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
I have the below config for the field type text_general. But when I search with a keyword such as 100-001, I get 100-001 as well as records starting with 100 and records ending with 001. I want "-" to be treated as an ordinary character, not as a point to split on.

Re: Changing default behavior of solr for overwrite the whole document on uniquekey duplication

2014-07-08 Thread Chris Hostetter
I think you are misunderstanding what Himanshu is suggesting to you. You don't need to make lots of big changes to the internals of Solr's code to get what you want -- instead you can leverage the Atomic Updates & Optimistic Concurrency features of Solr to get the existing internal Solr to re

SolrCloud delete replica

2014-07-08 Thread Arvin Barooni
Hi, I have an issue regarding collection delete. When a Solr node is down and I delete a collection, everything seems fine and the collection is deleted from the cluster state too. But when the dead node comes back, it registers the collection again. Even when I delete the collection by DELETER

Re: Transparently rebalancing a Solr cluster without splitting or moving shards

2014-07-08 Thread Damien Dykman
Thanks for your suggestions and recommendations. If I understand correctly, the MIGRATE command does shard splitting (around the range of the split.key) and merging behind the scenes. Though, it's a bit difficult to properly monitor the actual migration, set the proper timeouts, know when to di

Re: Parallel optimize of index on SolrCloud.

2014-07-08 Thread Walter Underwood
You probably do not need to force merge (mistakenly called "optimize") your index. Solr does automatic merges, which work just fine. There are only a few situations where a forced merge is even a good idea. The most common one is a replicated (non-cloud) setup with a full reindex every night.

Re: Slow inserts when using Solr Cloud

2014-07-08 Thread Mark Miller
Updates are currently done locally before concurrently being sent to all replicas - so on a single update, you can expect 2x just from that. As for your results, it sounds like perhaps there is more overhead than we would like in the code that sends to replicas and forwards updates? Someone wou

Re: I need a replacement for the QueryElevation Component

2014-07-08 Thread O. Klein
You can sponsor more than 1 document per keyword. And you might want to try string instead of another FieldType. I found that textFields remove whitespace and concatenate the tokens. Not sure if this is intended or not.

RE: Exact Match first in the list.

2014-07-08 Thread EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
Thanks Shawn, I am already using boosting, but the OR condition works for me as you mentioned. One question: if I use "(TAGs)" in the search field, it returns a lot of fields, but if I try something like "TAGs", it returns fewer. Why are the "( )" changing the results? They wo

Slow inserts when using Solr Cloud

2014-07-08 Thread Ian Williams (NWIS - Applications Design)
Hi I'm encountering a surprisingly high increase in response times when I insert new documents into a SolrCloud, compared with a standalone Solr instance. I have a SolrCloud set up for test and evaluation purposes. I have four shards, each with a leader and a replica, distributed over four Win

Re: don't count facet on blank values

2014-07-08 Thread Aman Tandon
No, both are the same for me. With Regards Aman Tandon On Tue, Jul 8, 2014 at 4:01 PM, Alexandre Rafalovitch wrote: > Right, but the blank field and missing field are different things. Are > they for you? If yes, then correct, you are stuck with getting them > back. But if "" blank field is the same

I need a replacement for the QueryElevation Component

2014-07-08 Thread eShard
Good morning to one and all, I'm using Solr 4.0 Final and I've been struggling mightily with the elevation component. It is too limited for our needs; it doesn't handle phrases very well and I need to have more than one doc with the same keyword or phrase. So, I need a better solution. One that all

[ANN] Solr Users Thailand - unofficial group

2014-07-08 Thread Alexandre Rafalovitch
Hello, A new Google Group has been recently started for Solr Users who want to discuss Solr in Thai or need to discuss Solr issues around Thai language (in Thai or English). https://groups.google.com/forum/#!forum/solr-user-thailand The group is monitored by the local Solr consultancy, one of Tha

JOB: Solr / Elasticsearch engineer @ Sematext

2014-07-08 Thread Otis Gospodnetic
Hi, I think most people on this list have heard of Sematext , so I'll skip the company info, and just jump to the meat, which involves a lot of fun work with Solr and/or Elasticsearch: We have an opening for an engineer who knows either Elasticsearch or Solr or both and want

Re: Facets on Nested documents

2014-07-08 Thread Walter Liguori
Yes, I have the same problem too. In my case I have 2 types (parent and children) in a single collection and I want to retrieve only the parent with a facet on a children field. I've seen that this is possible via a block join query (available from Solr 4.5). I have Solr 1.2 and I've thought about static facet f

Re: don't count facet on blank values

2014-07-08 Thread Alexandre Rafalovitch
Right, but the blank field and missing field are different things. Are they for you? If yes, then correct, you are stuck with getting them back. But if "" blank field is the same as missing/empty field, then you can unify them in pre-processing. Regards, Alex. Personal website: http://www.outerthough

Re: don't count facet on blank values

2014-07-08 Thread Aman Tandon
@Alex, yes we need them to be indexed and stored, as we are doing some processing if fields are blank. @Gora Thanks, I will try this one. Thanks for your quick replies. With Regards Aman Tandon On Tue, Jul 8, 2014 at 3:53 PM, Gora Mohanty wrote: > On 8 July 2014 15:46, Aman Tandon wrote: > > H

Re: don't count facet on blank values

2014-07-08 Thread Gora Mohanty
On 8 July 2014 15:46, Aman Tandon wrote: > Hi, > > Is this possible to not to count the facets for the blank values? > e.g. cat: [...] Either filter them out in the query, or remove them client-side when displaying the results. Regards, Gora
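A minimal Python sketch of the client-side option mentioned above, assuming the facet counts arrive as the flat list of alternating value and count shown in the original question of this thread. The function name is hypothetical.

```python
def drop_blank_facets(flat_counts):
    """Remove value/count pairs whose facet value is empty or whitespace.

    flat_counts alternates facet value and count: [value, count, value, count, ...].
    """
    kept = []
    for value, count in zip(flat_counts[0::2], flat_counts[1::2]):
        if value.strip():
            kept.extend([value, count])
    return kept


# Facet counts as posted in the original question.
cats = ["", 34324, "10", 8635, "20", 8226, "50", 5162,
        "30", 759, "100", 188, "40", 13, "200", 7]

cleaned = drop_blank_facets(cats)
print(cleaned)
```

This only changes what is displayed; the blank values remain indexed and stored, which matters if they are still needed for other processing.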

Re: don't count facet on blank values

2014-07-08 Thread Alexandre Rafalovitch
Do you need those values stored/indexed? If not, why not remove them before they hit Solr with appropriate UpdateRequestProcessor? Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Tue, Jul 8, 201

Re: Fwd: Language detection for solr 3.6.1

2014-07-08 Thread Poornima Jay
I'm using the Google library which I mentioned in my first mail, saying I'm using http://code.google.com/p/language-detection/. I have downloaded the jar file from the below URL https://www.versioneye.com/java/org.apache.solr:solr-langid/3.6.1 Please let me know from where I need to download

don't count facet on blank values

2014-07-08 Thread Aman Tandon
Hi, Is it possible not to count the facets for blank values? e.g. cat: "cats":["",34324, "10",8635, "20",8226, "50",5162, "30",759, "100",188, "40",13, "200",7] How is this possible? With Regards Aman Tandon

Re: Fwd: Language detection for solr 3.6.1

2014-07-08 Thread Alexandre Rafalovitch
I just realized you are not using the Solr language detection libraries. You are using a third-party one. You did mention that in your first message. I don't see that library integrated with Solr though, just as a standalone library. So, you can't just plug it in. Is there any reason you cannot use one of

Re: Fwd: Language detection for solr 3.6.1

2014-07-08 Thread Poornima Jay
When I use the solr-langid-3.5.0.jar file, after reloading the core I am getting the below error: SEVERE: java.lang.NoClassDefFoundError: net/arnx/jsonic/JSONException, even after adding the solr-jsonic-3.5.0.jar file in the webapps folder. Thanks, Poornima On Tuesday, 8 July 2014 3:36 PM, Alexan

Fwd: Language detection for solr 3.6.1

2014-07-08 Thread Alexandre Rafalovitch
-- Forwarded message -- From: Poornima Jay Date: Tue, Jul 8, 2014 at 5:03 PM Subject: Re: Language detection for solr 3.6.1 When i try to use solr-langid-3.6.1.jar file in my path /apache-tomcat-5.5.25/webapps/solr_multilangue_3.6_jar/WEB-INF/lib/ and define the path in the solrc

[Solr Schema API] SolrJ Access

2014-07-08 Thread Alessandro Benedetti
Hi guys, wondering if there is any proper way to access the Schema API via SolrJ. Of course it is possible to reach it in Java with a specific HTTP request, but in this way, using SolrCloud for example, we become coupled to one specific instance (which we don't want). Code Example : HttpRe

Parallel optimize of index on SolrCloud.

2014-07-08 Thread Modassar Ather
Hi, Need to optimize index created using CloudSolrServer APIs under SolrCloud setup of 3 instances on separate machines. Currently it optimizes sequentially if I invoke cloudSolrServer.optimize(). To make it parallel I tried making three separate HttpSolrServer instances and invoked httpSolrServe

Re: Solr irregularly having QTime > 50000ms, stracing solr cures the problem

2014-07-08 Thread Harald Kirsch
No, no full GC. The JVM does nothing during the outages, no CPU, no GC, as checked with jvisualvm and htop. Harald. On 08.07.2014 10:12, Heyde, Ralf wrote: My First assumption: full gc. Can you please tell us about your jvm setup and maybe trace what happens the jvms? On Jul 8, 2014 9:54 AM,

Re: Changing default behavior of solr for overwrite the whole document on uniquekey duplication

2014-07-08 Thread Ali Nazemian
Dear Himanshu, Hi, You misunderstood what I meant. I am not going to update some field. I am going to change what Solr does on duplication of the uniquekey field. I don't want Solr to overwrite the whole document; I just want to overwrite some parts of the document. This situation does not come from user side this

Re: Solr irregularly having QTime > 50000ms, stracing solr cures the problem

2014-07-08 Thread Heyde, Ralf
My First assumption: full gc. Can you please tell us about your jvm setup and maybe trace what happens the jvms? On Jul 8, 2014 9:54 AM, "Harald Kirsch" wrote: > Hi all, > > This is what happens when I run a regular wget query to log the current > number of documents indexed: > > 2014-07-08:07:23

Re: SOLR on hdfs

2014-07-08 Thread shlash
Hi all, I am new to Solr and HDFS. I am trying to index text content extracted from binary files like PDF, MS Office, etc. which are stored on HDFS (single node). So far I have Solr running on HDFS and have created the core, but I couldn't send the files to Solr for indexing. Can someone please

Re: Changing default behavior of solr for overwrite the whole document on uniquekey duplication

2014-07-08 Thread Himanshu Mehrotra
Please look at https://wiki.apache.org/solr/Atomic_Updates This does what you want: it updates just the relevant fields. Thanks, Himanshu On Tue, Jul 8, 2014 at 1:09 PM, Ali Nazemian wrote: > Dears, > Hi, > According to my requirement I need to change the default behavior of Solr > for overwriting the

Solr irregularly having QTime > 50000ms, stracing solr cures the problem

2014-07-08 Thread Harald Kirsch
Hi all, This is what happens when I run a regular wget query to log the current number of documents indexed: 2014-07-08:07:23:28 QTime=20 numFound="5720168" 2014-07-08:07:24:28 QTime=12 numFound="5721126" 2014-07-08:07:25:28 QTime=19 numFound="5721126" 2014-07-08:07:27:18 QTime=50071 numFound=

Changing default behavior of solr for overwrite the whole document on uniquekey duplication

2014-07-08 Thread Ali Nazemian
Dears, Hi, According to my requirement I need to change the default behavior of Solr for overwriting the whole document on unique-key duplication. I want to change it so that only part of the document (some fields) is overwritten and the other parts of the document (other fields) remain unchanged. First of all I