Re: CDCR - how to deal with the transaction log files

2017-07-10 Thread Xie, Sean
My guess is this is a documentation gap. I did a test where I turned off CDCR with action=stop while continuously sending documents to the source cluster. The tlog files kept growing, and after a hard commit a new tlog file was created while the old files stayed there forever. As soon as
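
For reference, CDCR is driven through its HTTP API; a minimal sketch of the stop/start calls used in a test like this (host and collection name are placeholders):

    curl 'http://source-host:8983/solr/my_collection/cdcr?action=STOP'
    curl 'http://source-host:8983/solr/my_collection/cdcr?action=START'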

RE: ZooKeeper transaction logs

2017-07-10 Thread Xie, Sean
Not sure if I can answer the question. We previously used the manual command to clean up the logs, with a Linux daemon to schedule it; on Windows there should be a corresponding tool. We currently use Netflix Exhibitor to manage the ZooKeeper instances, and it works pretty well.
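
The manual cleanup command referred to here is ZooKeeper's PurgeTxnLog class; a minimal sketch (jar version and paths are placeholders, and ZooKeeper requires keeping at least 3 snapshots):

    java -cp zookeeper-3.4.10.jar:lib/* org.apache.zookeeper.server.PurgeTxnLog \
        /var/lib/zookeeper/data /var/lib/zookeeper/data -n 3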

Re: CDCR - how to deal with the transaction log files

2017-07-10 Thread Varun Thacker
Yeah, it just seems weird that you would need to disable the buffer on the source cluster, though. The docs say "Replicas do not need to buffer updates, and it is recommended to disable buffer on the target SolrCloud", which implies the source should have it enabled. But the fact that it's working

Re: CDCR - how to deal with the transaction log files

2017-07-10 Thread Xie, Sean
Yes, documents are being sent to the target. Monitoring the output from "action=queues" (depending on your settings), you will see the document replication progress. On the other hand, if you enable the buffer, lastProcessedVersion always returns -1. Reading the source code, the
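
A sketch of the monitoring call described here (host and collection are placeholders); the response lists, per target cluster, the queue size and last processed timestamp:

    curl 'http://source-host:8983/solr/my_collection/cdcr?action=QUEUES'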

Re: CDCR - how to deal with the transaction log files

2017-07-10 Thread Varun Thacker
After disabling the buffer, are you still seeing documents being replicated to the target cluster(s)? On Mon, Jul 10, 2017 at 1:07 PM, Xie, Sean wrote: > After several experiments and observation, finally make it work. > The key point is you have to also disablebuffer on

Re: Cross DC SolrCloud anti-patterns in presentation shalinmangar/cross-datacenter-replication-in-apache-solr-6

2017-07-10 Thread Arcadius Ahouansou
Hello Shawn. Thank you very much for the comment. On 24 June 2017 at 16:14, Shawn Heisey wrote: > On 6/24/2017 2:14 AM, Arcadius Ahouansou wrote: > > Interpretation 1: > > ZooKeeper doesn't *need* an odd number of servers, but there's no > benefit to an even number. If

Re: How to "chain" import handlers: import from DB and from file system

2017-07-10 Thread Susheel Kumar
Use SolrJ if you end up developing an indexer in Java to send documents to Solr. It's been a long time since I have used DIH, but you can give it a try first; otherwise, as Walter suggested, developing an external indexer is best. On Sun, Jul 9, 2017 at 6:46 PM, Walter Underwood wrote: > 4.

Re: How to "chain" import handlers: import from DB and from file system

2017-07-10 Thread Giovanni De Stefano
Thank you guys for your advice! I would rather take advantage as much as possible of the existing handlers/processors. I just realised that nested entities in DIH are extremely slow; I fixed that with a view on the DB (that does a join between two tables). The other thing I have to do is chain

Re: mm = 1 and multi-field searches

2017-07-10 Thread Susheel Kumar
How are you specifying multiple fields? Use the qf parameter to specify multiple fields, e.g. http://localhost:8983/solr/techproducts/select?indent=on&q=Samsung%20Maxtor%20hard&wt=json&defType=edismax&qf=name%20manu&debugQuery=on&mm=1 On Mon, Jul 10, 2017 at 4:51 PM, Michael Joyner wrote: > Hello all, > >
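
The archived URL was mangled, so the parameter names above are reconstructed; a readable curl equivalent of the intended request:

    curl 'http://localhost:8983/solr/techproducts/select' \
        --data-urlencode 'q=Samsung Maxtor hard' \
        --data-urlencode 'defType=edismax' \
        --data-urlencode 'qf=name manu' \
        --data-urlencode 'mm=1' \
        --data-urlencode 'debugQuery=on' \
        --data-urlencode 'wt=json' \
        --data-urlencode 'indent=on'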

mm = 1 and multi-field searches

2017-07-10 Thread Michael Joyner
Hello all, How does setting mm = 1 for edismax impact multi-field searches? We set mm to 1 and get zero results back when specifying multiple fields to search across. Is there a way to set mm = 1 for each field, but to OR the individual field searches together? -Mike/NewsRx
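
One workaround sketch for the per-field OR behavior being asked about, using explicit fielded clauses with the lucene parser instead of edismax qf (field names are hypothetical):

    q=(name:(Samsung Maxtor hard) OR manu:(Samsung Maxtor hard))&q.op=OR

Each parenthesized clause matches when any of its terms matches that field, and the outer OR combines the per-field results — effectively mm=1 per field.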

Re: How to "chain" import handlers: import from DB and from file system

2017-07-10 Thread Walter Underwood
I did this at Netflix with Solr 1.3: read stuff out of various databases and sent it all to Solr. I’m not sure DIH even existed then. At Chegg, we have a slightly more elaborate system because we have so many collections and data sources. Each content owner writes an “extractor” that makes a

RE: CDCR - how to deal with the transaction log files

2017-07-10 Thread Xie, Sean
After several experiments and observation, I finally made it work. The key point is that you also have to disable the buffer (action=disablebuffer) on the source cluster. I don’t know why the wiki doesn’t mention it, but I figured this out through the source code. Once the buffer is disabled on the source cluster, the lastProcessedVersion
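
A sketch of the calls involved (host and collection are placeholders): disable the buffer on the source cluster, then verify that LASTPROCESSEDVERSION starts advancing instead of returning -1:

    curl 'http://source-host:8983/solr/my_collection/cdcr?action=DISABLEBUFFER'
    curl 'http://source-host:8983/solr/my_collection/cdcr?action=LASTPROCESSEDVERSION'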

Re: Returning results for multi-word search term

2017-07-10 Thread Erick Erickson
Well, one issue is that Paddle* Arm* has an implicit OR between the terms. Try +Paddle* +Arm*. That'll reduce the documents found, although it would still find "Paddle robotic armature" (no such thing, just sayin'). Another possibility is that you're really sending some_field:Paddle* Arm*
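
To illustrate the parsing point (a sketch; "text" stands in for whatever the default field is):

    q=some_field:Paddle* Arm*                 is parsed as   some_field:Paddle* OR text:Arm*
    q=+some_field:Paddle* +some_field:Arm*    requires both terms in some_field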

RE: How to "chain" import handlers: import from DB and from file system

2017-07-10 Thread Allison, Timothy B.
>4. Write an external program that fetches the file, fetches the metadata, combines them, and sends them to Solr. I've done this with some custom crawls. Thanks to Erick Erickson, this is a snap: https://lucidworks.com/2012/02/14/indexing-with-solrj/ With the caveat that Tika should really be
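
The linked article boils down to a small SolrJ loop; a minimal sketch (URL, collection, and field names are placeholders):

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class ExternalIndexer {
        public static void main(String[] args) throws Exception {
            SolrClient client =
                new HttpSolrClient.Builder("http://localhost:8983/solr/my_collection").build();
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-1");
            // combine the DB row and the file-system metadata here before adding
            doc.addField("title_t", "example title");
            client.add(doc);
            client.commit();
            client.close();
        }
    }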

RE: Returning results for multi-word search term

2017-07-10 Thread Miller, William K - Norman, OK - Contractor
I forgot to mention that I am using Solr 6.5.1 and I am indexing XML files. My Solr server is running on a Linux OS. ~~~ William Kevin Miller ECS Federal, Inc. USPS/MTSC (405) 573-2158 From: Miller, William K - Norman, OK - Contractor

Returning results for multi-word search term

2017-07-10 Thread Miller, William K - Norman, OK - Contractor
I am trying to return results when using a multi-word term. I am using "Paddle Arm" as my search term (including the quotes). I know that the field that I am querying against has these words together. If I run the query using Paddle* Arm* I get the following results, but I want to get only

Re: uploading solr.xml to zk

2017-07-10 Thread Cassandra Targett
In your command, you are missing the "zk" part of the command. Try: bin/solr zk cp file:local/file/path/to/solr.xml zk:/solr.xml -z localhost:2181 I see this is wrong in the documentation; I will fix it for the next release of the Ref Guide. I'm not sure about how to refer to it - I don't think

RE: DIH issue with streaming xml file

2017-07-10 Thread Miller, William K - Norman, OK - Contractor
Please consider this issue closed, as we are looking at moving our XML files to the Solr server for now. ~~~ William Kevin Miller ECS Federal, Inc. USPS/MTSC (405) 573-2158 -Original Message- From: Miller, William K - Norman, OK - Contractor Sent: Monday, June

Re: Solr 6.5.1 crashing when too many queries with error or high memory usage are queried

2017-07-10 Thread Joel Bernstein
Yes, the hashJoin will read the entire "hashed" query into memory; the documentation explains this. In general, the streaming joins were designed for OLAP-type workloads. Unless you have a large cluster powering streaming joins, you are going to have problems with high-QPS workloads. Joel
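
For context, a hashJoin streaming expression looks like this (collections and fields are hypothetical); the stream passed as "hashed" is the one read fully into memory:

    hashJoin(
        search(orders, q="*:*", fl="orderId,personId", sort="personId asc", qt="/export"),
        hashed=search(people, q="*:*", fl="personId,name", sort="personId asc", qt="/export"),
        on="personId"
    )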

Re: Multiple Field Search on Solr

2017-07-10 Thread Erik Hatcher
I recommend first understanding the Solr API and the parameters you need to build the capabilities with just the /select API. Once you are familiar with that, you can then learn what’s needed and apply that to the HTML and JavaScript. While the /browse UI is fairly straightforward, there’s a

RE: CDCR - how to deal with the transaction log files

2017-07-10 Thread Xie, Sean
I did some source code reading, and it looks like when lastProcessedVersion == -1 it will do nothing: https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/handler/CdcrUpdateLogSynchronizer.java // if we received -1, it means that the log reader on the

Multiple Field Search on Solr

2017-07-10 Thread Clare Lee
Hello, My name is Clare Lee and I'm working on Apache Solr 6.6.0 (Solritas) right now, and I'm not able to do something I want to do. Could you help me with this? I want to be able to search Solr with multiple fields. With the basic configurations (I'm using the core techproducts and just changing

Re: High disk write usage

2017-07-10 Thread Shawn Heisey
On 7/10/2017 2:57 AM, Antonio De Miguel wrote: > I continue deeping inside this problem... high writing rates continues. > > Searching in logs i see this: > > 2017-07-10 08:46:18.888 INFO (commitScheduler-11-thread-1) [c:ads s:shard2 > r:core_node47 x:ads_shard2_replica3]

RE: CDCR - how to deal with the transaction log files

2017-07-10 Thread Michael McCarthy
We have been experiencing this same issue for months now, with version 6.2. No solution to date. -Original Message- From: Xie, Sean [mailto:sean@finra.org] Sent: Sunday, July 09, 2017 9:41 PM To: solr-user@lucene.apache.org Subject: [EXTERNAL] Re: CDCR - how to deal with the

Resources for solr design and Architecture

2017-07-10 Thread Ranganath B N
Hi, Is there any resource (article or book) which sheds light on the Solr design and architecture (interaction between the client and server modules in Solr, interaction between Solr modules (Java source files))? Thanks, Ranganath B. N.

RE: ZooKeeper transaction logs

2017-07-10 Thread Avi Steiner
I did run this class from a batch file (on a Windows server), but it still does not remove anything. I set the number of snapshots to keep to 3, but I have more in my folder. -Original Message- From: Xie, Sean [mailto:sean@finra.org] Sent: Sunday, July 9, 2017 7:33 PM To:
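
As an alternative to running PurgeTxnLog by hand, ZooKeeper 3.4+ can purge on its own; a sketch of the zoo.cfg settings (values are examples — purgeInterval is in hours, and the default of 0 disables purging):

    autopurge.snapRetainCount=3
    autopurge.purgeInterval=24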

Re: High disk write usage

2017-07-10 Thread Antonio De Miguel
Hi! I continue digging into this problem... the high write rates continue. Searching the logs, I see this: 2017-07-10 08:46:18.888 INFO (commitScheduler-11-thread-1) [c:ads s:shard2 r:core_node47 x:ads_shard2_replica3] o.a.s.u.LoggingInfoStream [DWPT][commitScheduler-11-thread-1]: flushed:
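
For context, commitScheduler threads are driven by the auto-commit settings in solrconfig.xml; the knobs that control flush frequency look like this (values are examples, not a recommendation):

    <autoCommit>
        <maxTime>60000</maxTime>        <!-- hard commit (flush) at most every 60s -->
        <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
        <maxTime>5000</maxTime>         <!-- soft commit for visibility every 5s -->
    </autoSoftCommit>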

Re: index new discovered fileds of different types

2017-07-10 Thread Jan Høydahl
I think Thaer’s answer clarifies how they do it. So at the time they assemble the full Solr doc to index, there may be a new field name not known in advance, but to my understanding the RDF source contains information on the type (else they could not do the mapping to a dynamic field either), and so

Re: index new discovered fileds of different types

2017-07-10 Thread Thaer Sammar
Hi Rick, yes, the RDF structure has subject, predicate and object. The object data type is not only text; it can be integer or double as well, or other data types. The structure of our Solr document doesn't only contain these three fields: we compose one document per subject, and we use all found
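
A sketch of the dynamic-field mapping being discussed, where the field-name suffix encodes the object's data type (schema.xml, field types as in a stock Solr 6 schema):

    <dynamicField name="*_s" type="string" indexed="true" stored="true"/>
    <dynamicField name="*_i" type="int"    indexed="true" stored="true"/>
    <dynamicField name="*_d" type="double" indexed="true" stored="true"/>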

RE: help on implicit routing

2017-07-10 Thread imran
Thanks for the reference. I am guessing this feature is not available through the post utility inside solr/bin. Regards, Imran Sent from Mail for Windows 10 From: Jan Høydahl Sent: Friday, July 7, 2017 1:51 AM To: solr-user@lucene.apache.org Subject: Re: help on implicit routing
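
For reference, with a collection created using the implicit router, the target shard can be addressed directly on the update request via the _route_ parameter; a curl sketch (collection and shard names are placeholders):

    curl 'http://localhost:8983/solr/my_collection/update?_route_=shard1&commit=true' \
        -H 'Content-Type: application/json' \
        -d '[{"id":"doc1"}]'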