Trouble in using Solr on Spatial Index

2015-08-31 Thread hanyabin
We use Solr for POI search in our apps, and we use spatial search for LBS. Every morning we do a full import of the POIs into Solr for indexing; the index contains a field like this, and when we do the full index, a full GC occurs and takes more than 30s, resulting in a ZooKeeper timeout, so the cluster goes down.

RE: Solr 5.2: Same Document in Multiple Shard

2015-08-31 Thread Maulin Rathod
We are not doing anything special in terms of routing. The issue seems fixed after setting the numShards=2 parameter in the solr.in.cmd file. set -DnumShards=2 Not sure if anything changed in Solr 5.2 that requires adding this parameter to the solr.in.cmd file. In Solr 4.8 it was working fine even

Re: Get distinct results in Solr

2015-08-31 Thread Alexandre Rafalovitch
Re-read the question. You want to de-dupe on the full text-content. I would actually try to use the dedupe chain as per the link I gave but put results into a separate string field. Then, you group on that field. You cannot actually group on the long text field, that would kill any performance.
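As a rough sketch of Alexandre's suggestion: once a dedupe chain writes a signature into a separate string field, the query side only needs Solr's standard result-grouping parameters. The field name `content_signature`, the collection `collection1` and the host are hypothetical placeholders, not names from the thread.

```python
from urllib.parse import urlencode


def grouped_query_url(base_url, collection, query, signature_field, rows=10):
    """Build a select URL that groups results on a dedupe signature field.

    signature_field is assumed (hypothetically) to be a single-valued string
    field filled by a de-duplication update chain at index time.
    """
    params = {
        "q": query,
        "group": "true",                 # enable result grouping
        "group.field": signature_field,  # one group per distinct signature
        "group.limit": 1,                # keep only the top document per group
        "rows": rows,
        "wt": "json",
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


print(grouped_query_url("http://localhost:8983/solr", "collection1",
                        "content:solr", "content_signature"))
```

Grouping on the short signature rather than on the long text field is the point of the suggestion: the grouped field stays small regardless of document size.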

Re: Get distinct results in Solr

2015-08-31 Thread Zheng Lin Edwin Yeo
I tried to follow the de-duplication guide, but after I configured it in solrconfig.xml and schema.xml, nothing is indexed into Solr, and there is no error message. I'm using SimplePostTool to index rich-text documents. Below are my configurations: In solrconfig.xml dedupe

Re: Get distinct results in Solr

2015-08-31 Thread Zheng Lin Edwin Yeo
Hi Alexandre, Will treating it as String affect the search or other functions like highlighting? Yes, the content must be in my index, unless I do a copyField to do de-duplication on that field.. Will that help? Regards, Edwin On 1 September 2015 at 10:04, Alexandre Rafalovitch

Re: Get distinct results in Solr

2015-08-31 Thread Zheng Lin Edwin Yeo
Thank you for your advice Alexandre. Will try out the de-duplication from the link you gave. Regards, Edwin On 1 September 2015 at 10:34, Alexandre Rafalovitch wrote: > Re-read the question. You want to de-dupe on the full text-content. > > I would actually try to use the

Re: Using join vs flattening structure

2015-08-31 Thread Erick Erickson
For 1-3, test and see. The problem I often see is that it is _assumed_ that flattening the data will cost a lot in terms of index size and maintenance. Test that assumption before going down the relational road. You haven't talked about how many documents you have, how much data would have to be

Re: Slow Replication between Shard & Replica

2015-08-31 Thread Upayavira
On Mon, Aug 31, 2015, at 02:23 PM, Maulin Rathod wrote: > We are using solrcloud 5.2 with 1 shard (in UK Data Center) and 1 replica > (in Australia Data Center). We observed that data inserted/updated in > shard > (UK Data center) is replicated very slowly to Replica in AUSTRALIA Data > Center

Re: 'missing content stream' issuing expungeDeletes=true

2015-08-31 Thread Upayavira
If you really must expunge deletes, use optimize. That will merge all index segments into one, and in the process will remove any deleted documents. Why do you need to expunge deleted documents anyway? It is generally done in the background for you, so you shouldn't need to worry about it.
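A minimal sketch of the two options discussed in this thread, assuming a collection named `collection1` on a local node (both hypothetical). `expungeDeletes` is a commit option, so it is sent together with `commit=true`; a body-less `/update` request with no commit or optimize flag is what produces the "missing content stream" error.

```python
from urllib.parse import urlencode


def update_url(base_url, collection, **params):
    """Build an /update URL that carries only request parameters (no body)."""
    # Booleans are rendered as lowercase "true"/"false" as Solr expects.
    query = urlencode({k: str(v).lower() for k, v in params.items()})
    return f"{base_url}/{collection}/update?{query}"


# Merge away deleted documents as part of a commit (lighter than optimize).
expunge = update_url("http://localhost:8983/solr", "collection1",
                     commit=True, expungeDeletes=True)
# Merge all segments into one, which also drops deleted documents.
optimize = update_url("http://localhost:8983/solr", "collection1",
                      optimize=True)
print(expunge)
print(optimize)
```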

Re: Solr 4.6.1 Cloud Stops Replication

2015-08-31 Thread Rallavagu
Erick, Apologies for not following up on the status of the indexing (replication) issues, as I originally started this thread. After switching to CloudSolrServer instead of ConcurrentUpdateSolrServer, things were much better. I simply wanted to follow up on understanding the memory behavior better

DataImportHandler scheduling

2015-08-31 Thread Troy Edwards
I am having a hard time finding documentation on DataImportHandler scheduling in SolrCloud. Can someone please post a link to that? I have a requirement that the DIH should be initiated at a specific time Monday through Friday. Thanks!

Re: DataImportHandler scheduling

2015-08-31 Thread Ahmet Arslan
Hi Troy, I think folks use cron jobs (with the curl utility) provided by the operating system. Ahmet On Monday, August 31, 2015 8:26 PM, Troy Edwards wrote: I am having a hard time finding documentation on DataImportHandler scheduling in SolrCloud. Can someone please

RE: DataImportHandler scheduling

2015-08-31 Thread Davis, Daniel (NIH/NLM) [C]
So, I think cron jobs are not a utility but a pattern: you have cron run curl to invoke something on your web application on the localhost (and elsewhere), and it runs the job if the job needs running, so the webapp keeps the state. There's a utility cronlock (https://github.com/kvz/cronlock)
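A sketch of the cron-plus-curl pattern for Troy's Monday-through-Friday requirement. It assumes the DataImportHandler is registered at `/dataimport` in a collection named `collection1` and that the import should run at 02:00; all three are hypothetical. The helper mirrors the "scheduler fires, the check decides" idea described above.

```python
import datetime
from urllib.parse import urlencode


def dih_full_import_url(base_url, collection, clean=True):
    """URL that asks the DataImportHandler to start a full import."""
    params = {"command": "full-import", "clean": str(clean).lower()}
    return f"{base_url}/{collection}/dataimport?{urlencode(params)}"


def is_weekday_run(now, hour):
    """True when `now` falls on Monday-Friday at the configured hour."""
    return now.weekday() < 5 and now.hour == hour  # Monday == 0 ... Friday == 4


# A roughly equivalent crontab entry (02:00, Monday-Friday) would be:
#   0 2 * * 1-5 curl -s "http://localhost:8983/solr/collection1/dataimport?command=full-import"
if __name__ == "__main__":
    now = datetime.datetime.now()
    if is_weekday_run(now, hour=2):
        print(dih_full_import_url("http://localhost:8983/solr", "collection1"))
```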

Re: replication and HDFS

2015-08-31 Thread Joseph Obernberger
Thank you Erick. What about cache size? If we add replicas to our cluster and each replica has nGBytes of RAM allocated for HDFS caching, would that help performance? Specifically the performance we want to increase is time to facet data, time to cluster data and search time. While we

Custom merge logic in SolrCloud.

2015-08-31 Thread Mohan gupta
Hi Folks, I need to merge docs received from multiple shards via a custom logic, a straightforward score based priority queue doesn't work for my scenario (I need to maintain a blend/distribution of docs). How can I plugin my custom merge logic? One way might be to fully implement the

Re: Using join vs flattening structure

2015-08-31 Thread Brian Narsi
We have about 15 million items. Each item has 10 attributes that we are indexing at this time. We are planning on adding 15 more attributes in future. We have about 1 customers. Each of the items mentioned above can have special pricing, etc for each of the customers. There are 6 attributes

Solr 5.3 Faceting on Children with Block Join Parser

2015-08-31 Thread Tom Devel
Apologies for cross posting a question from SO here. I am very interested in the new faceting on child documents feature of Solr 5.3 and would like to know if somebody has figured out how to do it as asked in the question on

Re: Using join vs flattening structure

2015-08-31 Thread Erick Erickson
Mostly just do the most naive data-flattening you can and see how big the index is. You really have to generate the index then run representative queries at it. But naively flattening the data in this case approaches 15B documents, which is a problem, you're sharding over quite a few shards etc.
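The 15B figure follows from a naive flattening that writes one document per (item, customer) pair, so the document count is simply the product. The 1,000-customer value below is an assumption inferred from the ~15B estimate divided by 15 million items; it is not stated in this excerpt.

```python
def flattened_doc_count(items, customers):
    """One document per (item, customer) pair when pricing is flattened."""
    return items * customers


items = 15_000_000   # from Brian's message
customers = 1_000    # assumption: inferred from the ~15B estimate
print(f"{flattened_doc_count(items, customers):,}")  # 15,000,000,000
```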

Re: replication and HDFS

2015-08-31 Thread Erick Erickson
Yes, No, Maybe. bq: Specifically the performance we want to increase is time to facet data, time to cluster data and search time Well, that about covers everything ;) You cannot talk about this without also talking about cache warming. Given your setup, I'm guessing you have very few searches on

Re: Solr 4.6.1 Cloud Stops Replication

2015-08-31 Thread Erick Erickson
OK, thanks for wrapping this up! On Mon, Aug 31, 2015 at 10:08 AM, Rallavagu wrote: > Erick, > > Apologies for missing out on status on indexing (replication) issues as I > have originally started this thread. After implementing CloudSolrServer > instead of

Re: Lucene/Solr 5.0 and custom FieldCache implementation

2015-08-31 Thread Tomás Fernández Löbbe
Sorry Jamie, I totally missed this email. There was no Jira that I could find. I created SOLR-7996 On Sat, Aug 29, 2015 at 5:26 AM, Jamie Johnson wrote: > This sounds like a good idea, I'm assuming I'd need to make my own > UnInvertingReader (or subclass) to do this right?

Re: Issue Using Solr 5.3 Authentication and Authorization Plugins

2015-08-31 Thread Kevin Lee
Anyone else running into any issues trying to get the authentication and authorization plugins in 5.3 working? > On Aug 29, 2015, at 2:30 AM, Kevin Lee wrote: > > Hi, > > I’m trying to use the new basic auth plugin for Solr 5.3 and it doesn’t seem > to be working

Re: Slow Replication between Shard & Replica

2015-08-31 Thread Shawn Heisey
On 8/31/2015 7:23 AM, Maulin Rathod wrote: > We are using solrcloud 5.2 with 1 shard (in UK Data Center) and 1 replica > (in Australia Data Center). We observed that data inserted/updated in shard > (UK Data center) is replicated very slowly to Replica in AUSTRALIA Data > Center (Due to high

Re: DataImportHandler scheduling

2015-08-31 Thread Shawn Heisey
On 8/31/2015 11:26 AM, Troy Edwards wrote: > I am having a hard time finding documentation on DataImportHandler > scheduling in SolrCloud. Can someone please post a link to that? I have a > requirement that the DIH should be initiated at a specific time Monday > through Friday. Every modern

Overseer Leader gone

2015-08-31 Thread Rishi Easwaran
Hi All, I have a cluster whose overseer leader is gone. This is on Solr 4.10.3. It's completely gone from ZooKeeper, and bouncing any instance does not start a new election process. Has anyone experienced this issue before, and are there any ideas to fix it? Thanks, Rishi.

Re: 'missing content stream' issuing expungeDeletes=true

2015-08-31 Thread Derek Poh
Hi Upayavira, In fact, we are currently using optimize but were advised to use expungeDeletes as it is less resource-intensive. So expungeDeletes will only remove deleted documents; it will not merge all index segments into one? If we don't use optimize, the deleted documents in the index

Re: Get distinct results in Solr

2015-08-31 Thread Zheng Lin Edwin Yeo
Thanks Jan. But I read that the field that is being collapsed on must be a single valued String, Int or Float. As I'm required to get the distinct results from "content" field that was indexed from a rich text document, I got the following error: "error":{ "msg":"java.io.IOException: 64

RE: testing with EmbeddedSolrServer

2015-08-31 Thread Moen Endre
Hi Mikhail, I'm trying to read 7-8 XML files containing realistic data from our production server. Then I would like to load this data into EmbeddedSolrServer to test edge cases for our custom date search. The use of EmbeddedSolrServer is purely to separate the data testing from

Re: Get distinct results in Solr

2015-08-31 Thread Jan Høydahl
Hi, Check out the CollapsingQParser (https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results). As long as you have a field that will be the same for all duplicates, you can “collapse” on that field. If you do not have a “group id”, you can create one using e.g. an MD5
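A minimal sketch of Jan's suggestion, assuming the hash is computed client-side before posting (as opposed to a server-side update chain) and stored in a hypothetical single-valued string field `content_md5`. The 32-character hex digest fits the single-valued String requirement Edwin mentions, whereas the full rich-text content does not. Whitespace normalisation is a design choice of this sketch, not something the thread prescribes.

```python
import hashlib
from urllib.parse import urlencode


def content_signature(text):
    """MD5 hex digest of whitespace-normalised text, usable as a group id."""
    normalised = " ".join(text.split())
    return hashlib.md5(normalised.encode("utf-8")).hexdigest()


def collapse_query_params(query, group_field):
    """Query parameters that keep one document per distinct group_field value."""
    return urlencode({"q": query, "fq": "{!collapse field=%s}" % group_field})


doc_a = "Solr  de-duplication\nexample"
doc_b = "Solr de-duplication example"
print(content_signature(doc_a) == content_signature(doc_b))  # True
print(collapse_query_params("*:*", "content_md5"))
```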

Re: .nabble.com is indexing each post, is it possible to delete my post or hide email id

2015-08-31 Thread Upayavira
Apache only removes or modifies posts when personal information is revealed, such as social security numbers. Email addresses and phone numbers are not considered such. Apache has no control over Nabble and such third party services. I would suggest you resubscribe with a different email address

.nabble.com is indexing each post, is it possible to delete my post or hide email id

2015-08-31 Thread Roshan Agarwal
.nabble.com is indexing each post; is it possible to delete my post or hide my email ID? On Mon, Aug 10, 2015 at 11:24 AM, Roshan Agarwal wrote: > Dear All, > > Can anyone let us know how to implement a plagiarism checker with Solr, > how to index content with shingles and what

Clustering speed become slow after splitting shards

2015-08-31 Thread Zheng Lin Edwin Yeo
Hi, I've tried to split my collection from 1 shard to 2 shards using the command: http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1 The shard was split successfully with all the index intact. The search and highlight give the same results before and after the

Sorting parent documents based on a field from children

2015-08-31 Thread Florin Mandoc
Hi, I am trying to model an index from a relational database, and I have 3 main entity types: products, buyers and sellers. I am using nested documents for sellers and buyers, as I have many sellers and many buyers for one product: { "Active" : "true", "CategoryID" : 59, "CategoryName" :

Re: Get distinct results in Solr

2015-08-31 Thread Alexandre Rafalovitch
Can't you just treat it as String? Also, do you actually want those documents in your index in the first place? If not, have you looked at De-duplication: https://cwiki.apache.org/confluence/display/solr/De-Duplication Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a

Re: solrcloud and core swapping

2015-08-31 Thread Upayavira
It doesn't matter which node you do it on. And you can replace an existing alias by just creating another one with the same name. Upayavira On Mon, Aug 31, 2015, at 02:04 PM, Bill Au wrote: > Thanks, Shawn. So I only need to issue the command to update the alias on > one of the nodes in the
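Upayavira's point can be sketched as a single Collections API call: issuing CREATEALIAS again with an existing alias name repoints it. The alias `products`, the target `products_v2` and the host are hypothetical placeholders.

```python
from urllib.parse import urlencode


def create_alias_url(base_url, alias, collections):
    """Collections API call that creates an alias, or repoints it if it exists."""
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),  # comma-separated target list
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Swap the hypothetical alias "products" over to the new collection;
# the request can be sent to any node in the cluster.
print(create_alias_url("http://localhost:8983/solr", "products", ["products_v2"]))
```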

Re: Sorting parent documents based on a field from children

2015-08-31 Thread Mikhail Khludnev
Florin, I disclose some details in the recent post http://blog.griddynamics.com/2015/08/scoring-join-party-in-solr-53.html. Let me know if you have further questions afterwards. I also notice that you use "obvious" syntax: BuyerID=83 but it's hardly ever possible. There is a good habit of
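Since Solr queries use `field:value` rather than `field=value`, a child-to-parent block join would look roughly like the string built below. The parent filter `type:product` is a hypothetical field Florin's schema would need in order to distinguish product documents from nested buyer/seller documents. The optional `score` local parameter is the one discussed in Mikhail's post for Solr 5.3; treat its use here as an assumption to check against that post.

```python
from urllib.parse import urlencode


def parent_block_join(parent_filter, child_query, score=None):
    """Build a {!parent} block join query: match children, return their parents."""
    local = 'which="%s"' % parent_filter
    if score:
        # e.g. "max": how child scores roll up to the parent (assumed Solr 5.3+)
        local += " score=%s" % score
    return "{!parent %s}%s" % (local, child_query)


# Solr uses field:value syntax, so BuyerID=83 becomes BuyerID:83.
q = parent_block_join("type:product", "BuyerID:83")
print(q)  # {!parent which="type:product"}BuyerID:83
print(urlencode({"q": q}))
```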

Re: testing with EmbeddedSolrServer

2015-08-31 Thread Mikhail Khludnev
Endre, As I suggested before, consider avoiding the test framework; just put all code interacting with EmbeddedSolrServer into a main() method. On Mon, Aug 31, 2015 at 12:15 PM, Moen Endre wrote: > Hi Mikhail, > > I'm trying to read 7-8 xml files of data that contain realistic data

Slow Replication between Shard & Replica

2015-08-31 Thread Maulin Rathod
We are using solrcloud 5.2 with 1 shard (in UK Data Center) and 1 replica (in Australia Data Center). We observed that data inserted/updated in shard (UK Data center) is replicated very slowly to Replica in AUSTRALIA Data Center (Due to high latency between UK and AUSTRALIA). We are looking to