Re: Import html data in mysql and map schemas using only SolrCELL+TIKA+DIH [scottchu]

2016-05-23 Thread scott.chu
Can anyone show me an example, or give me a short pointer, on how I can do this? I plan to use Solr 5 or later to carry it out. scott.chu,scott@udngroup.com 2016/5/24 (Tue) - Original Message - From: scott (self) To: solr-user CC: Date: 2016/5/20 (Fri) 14:17 Subject: Import html data in mysql and map

Re: SolrCloud increase replication factor

2016-05-23 Thread Hendrik Haddorp
Hi Tom, the pointer to rule-based placement was indeed what I was missing! I simply had to add the rule "shard:*,replica:<2,node:*", as documented, and my replicas now get distributed as expected :-) thanks, Hendrik On 23/05/16 15:28, Tom Evans wrote: > On Mon, May 23, 2016 at 10:37 AM,
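A minimal sketch of passing the rule Hendrik quotes as the `rule` parameter of a Collections API CREATE call. The host, collection name, and shard/replica counts are hypothetical placeholders; the snippet only builds the URL and does not contact a server.

```python
from urllib.parse import urlencode

# Hypothetical address of any node in the cluster.
SOLR_BASE = "http://localhost:8983/solr"


def create_collection_url(name, num_shards, replication_factor, rule):
    """Build a Collections API CREATE URL that carries a replica placement rule."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        # "shard:*,replica:<2,node:*": for every shard, fewer than 2 replicas per node.
        "rule": rule,
    }
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


url = create_collection_url("mycollection", 2, 2, "shard:*,replica:<2,node:*")
print(url)
```

`urlencode` percent-encodes the `:`, `*`, `,` and `<` characters in the rule, so the value survives the query string intact.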

Re: What to do best when expaning from 2 nodes to 4 nodes? [scottchu]

2016-05-23 Thread scott.chu
Thanks for your thoughtful opinion. I'll try ADDREPLICA first. scott.chu,scott@udngroup.com 2016/5/24 (Tue) - Original Message - From: Erick Erickson To: solr-user ; scott (self) CC: Date: 2016/5/24 (Tue) 01:56 Subject: Re: What to do best when expaning from 2 nodes to 4 nodes?

Re: SolrCloud increase replication factor

2016-05-23 Thread Erick Erickson
About (1), bq: The Solr Admin UI showed that my replication factor changed but otherwise nothing happened. This is as designed AFAIK. There's nothing built into Solr to _automatically_ add replicas when this property is changed. My guess is that the MODIFYCOLLECTION code was written to help with
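Since changing the replication factor does not create replicas by itself, the replicas have to be added explicitly. A minimal sketch that builds one Collections API ADDREPLICA URL per shard, optionally pinned to a target node; the host, collection, shard and node names are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical address of any node in the cluster.
SOLR_BASE = "http://localhost:8983/solr"


def add_replica_urls(collection, shards, target_node=None):
    """One ADDREPLICA call per shard; changing replicationFactor alone adds nothing."""
    urls = []
    for shard in shards:
        params = {"action": "ADDREPLICA", "collection": collection, "shard": shard}
        if target_node is not None:
            # Place the new replica on a specific node, named like "host:8983_solr".
            params["node"] = target_node
        urls.append(f"{SOLR_BASE}/admin/collections?{urlencode(params)}")
    return urls


for u in add_replica_urls("mycollection", ["shard1", "shard2"], "node2:8983_solr"):
    print(u)
```

Choosing `node` per call is the manual placement Hendrik mentions later in the thread; leaving it out lets Solr pick the node.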

Re: How to use a regex search within a phrase query?

2016-05-23 Thread Erick Erickson
I'd play with the timeAllowed option against a full corpus to get a sense of how painful these queries are. There's also the impact of queries like this on other users to consider. Other than that, I think you're on the right path in terms of supporting some common use-cases with
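A minimal sketch of trying timeAllowed as Erick suggests: cap the search time of a regex query and then check whether the response was cut short. It assumes Solr flags such a response with `partialResults` in the `responseHeader`; the query, field name and 500 ms budget are hypothetical.

```python
from urllib.parse import urlencode


def capped_query_params(q, time_allowed_ms):
    """Search params with a time cap so an expensive regex query cannot run unbounded."""
    return {"q": q, "timeAllowed": time_allowed_ms, "wt": "json"}


def hit_time_limit(response):
    """True if a parsed JSON response is flagged partial because timeAllowed ran out."""
    return bool(response.get("responseHeader", {}).get("partialResults", False))


params = capped_query_params("body:/colou?r/", 500)
print(urlencode(params))
# A hypothetical response header, as it might look when the cap is hit:
print(hit_time_limit({"responseHeader": {"status": 0, "partialResults": True}}))
```

Counting how often `hit_time_limit` fires over a realistic query set gives a rough measure of how painful these queries are on the full corpus.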

Re: Using solr with increasing complicated access control

2016-05-23 Thread Erick Erickson
I know this seems facetious, but talk to your clients about _why_ they want such increasingly complex access requirements. Often the logic behind the complexity is pretty flawed. Things like "allow user X to see document Y if they're part of groups A, B, C but not D or E unless they are also part

Re: Solr 6.0 Parallel SQL

2016-05-23 Thread Erick Erickson
For <2> and <3>: well, yes. To do _anything_ in Solr you need to index the data into Solr. It doesn't magically reach out into the DB and do stuff. For <3>, you can either use DIH or a SolrJ program, and yes, you do have to do some kind of mapping of database columns into Solr documents. I want to
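A minimal sketch of the column-to-field mapping Erick describes, written in Python rather than SolrJ or DIH. The table, column and field names are hypothetical, and an in-memory sqlite3 database stands in for MySQL so the sketch runs without a server; the result is a JSON array of documents of the kind the /update handler accepts.

```python
import json
import sqlite3

# Hypothetical mapping from database column names to Solr field names.
COLUMN_TO_FIELD = {"product_id": "id", "product_name": "name_t", "price": "price_f"}


def rows_to_solr_docs(cursor, mapping):
    """Turn each result row into a Solr document, renaming columns via the mapping."""
    cols = [d[0] for d in cursor.description]
    docs = []
    for row in cursor:
        doc = {}
        for col, value in zip(cols, row):
            # Skip unmapped columns and NULLs (Solr documents simply omit the field).
            if col in mapping and value is not None:
                doc[mapping[col]] = value
        docs.append(doc)
    return docs


# sqlite3 stands in for MySQL here so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_id TEXT, product_name TEXT, price REAL)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)",
                 [("p1", "Red mug", 7.5), ("p2", "Blue mug", None)])
docs = rows_to_solr_docs(
    conn.execute("SELECT * FROM products ORDER BY product_id"), COLUMN_TO_FIELD)
print(json.dumps(docs))  # body suitable for POSTing to /update
```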

Re: Streaming expression not hitting all replicas?

2016-05-23 Thread Erick Erickson
Well, ya learn somethin' new every day On Mon, May 23, 2016 at 4:31 PM, Timothy Potter wrote: > Thanks Joel, that cleared things up nicely ... using 4 workers against > 4 shards resulted in 16 queries to the collection. However, not all > replicas were used for all

Using solr with increasing complicated access control

2016-05-23 Thread Lisheng Zhang
Hi, I have been using Solr for many years and it is VERY helpful. My problem is that our app has increasingly complicated access control to satisfy clients' requirements. In Solr/Lucene this means we need to add more and more fields to each document and use more and more complicated

Re: Streaming expression not hitting all replicas?

2016-05-23 Thread Timothy Potter
Thanks Joel, that cleared things up nicely ... using 4 workers against 4 shards resulted in 16 queries to the collection. However, not all replicas were used for all shards, so it's not as balanced as I thought it would be, but we're dealing with small numbers of shards and replicas here. On Mon,

Solr mysql Json import

2016-05-23 Thread vsriram30
Hi all, I have a use case where I want to index a JSON field from MySQL into Solr. The JSON field will contain entries as key-value pairs. The JSON can be nested, but I want to index only the first-level key-value pairs as Solr fields, and nested levels can be present as the value of
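A minimal sketch of the flattening described above: each top-level key becomes a Solr field, lists of scalars are kept as multivalued values, and anything more deeply nested is stored as its JSON text. The `attr_` prefix and the sample row are hypothetical; the prefix would need a matching dynamic field in the schema.

```python
import json


def field_value(value):
    """Scalars and lists of scalars pass through; deeper nesting is kept as JSON text."""
    if isinstance(value, list) and all(not isinstance(v, (dict, list)) for v in value):
        return value  # suitable for a multivalued field
    if isinstance(value, (dict, list)):
        return json.dumps(value, sort_keys=True)
    return value


def first_level_fields(json_text, prefix="attr_"):
    """Map each top-level key of a JSON object to a (hypothetical) prefixed Solr field."""
    return {prefix + key: field_value(value) for key, value in json.loads(json_text).items()}


row_id, json_column = "42", '{"color": "red", "tags": ["a", "b"], "dims": {"w": 2, "h": 1}}'
doc = {"id": row_id, **first_level_fields(json_column)}
print(doc)
```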

Re: highlight don't work if df not specified

2016-05-23 Thread Ahmet Arslan
Hi Solomon, How come hl.q=blah blah=normal_text,title would produce "undefined field text" error message? Please try hl.q=blah blah=normal_text,title just to verify there is a problem with the fielded queries. Ahmet On Monday, May 23, 2016 10:31 AM, michael solomon

Re: Solr cloud with Grouping query gives inconsistent results

2016-05-23 Thread Jeff Wartes
My first thought is that you haven’t indexed such that all values of the field you’re grouping on are found in the same cores. See the end of the article here: (Distributed Result Grouping Caveats) https://cwiki.apache.org/confluence/display/solr/Result+Grouping And the “Document Routing”
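A minimal sketch of the co-location Jeff points to, assuming the default compositeId router: prefixing each document id with the grouping value and `!` makes all documents of one group hash to the same shard. The field names and values are hypothetical, and existing documents would have to be reindexed with the new ids.

```python
def routed_id(group_value, doc_id):
    """compositeId router: ids sharing the prefix before '!' hash to the same shard."""
    if "!" in str(group_value):
        raise ValueError("the routing prefix must not itself contain '!'")
    return f"{group_value}!{doc_id}"


def routing_prefix(solr_id):
    """The part of the id the router hashes to choose a shard."""
    return solr_id.split("!", 1)[0]


docs = [
    {"id": routed_id("brandA", "doc1"), "brand_s": "brandA"},
    {"id": routed_id("brandA", "doc2"), "brand_s": "brandA"},
    {"id": routed_id("brandB", "doc3"), "brand_s": "brandB"},
]
```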

Re: Commit (hard) at shutdown?

2016-05-23 Thread Per Steffensen
Sorry, I did not see the responses here because I found out myself. It definitely seems like a hard commit is performed when shutting down gracefully. The info I got from production was wrong. It is not necessarily obvious that you will lose data on "kill -9". The tlog ought to save you, but it

Re: Streaming expression not hitting all replicas?

2016-05-23 Thread Joel Bernstein
Streaming expressions will utilize all replicas of a cluster when the number of workers >= the number of replicas. For example, suppose there are 40 workers, 40 shards and 5 replicas. For a single parallel request, each worker will send 1 query to a random replica in each shard. This is 1600
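Joel's arithmetic can be checked with a small simulation: each worker sends one query to a randomly chosen replica of every shard, so a single parallel request issues workers × shards queries. The random choice is modeled with Python's `random` module and is an assumption about the selection, not Solr's code.

```python
import random
from collections import Counter


def simulate_parallel_request(workers, shards, replicas, seed=0):
    """Each worker sends one query to a randomly chosen replica of every shard."""
    rng = random.Random(seed)
    hits = Counter()
    for _ in range(workers):
        for shard in range(shards):
            hits[(shard, rng.randrange(replicas))] += 1
    return hits


hits = simulate_parallel_request(40, 40, 5)
print(sum(hits.values()))  # 40 workers * 40 shards = 1600 queries
print(len(hits), "of", 40 * 5, "replicas received at least one query")
```

With few workers per shard (Timothy's case was 4 workers, 4 shards, 2 replicas, so 16 queries), random selection does not guarantee that every replica is picked, which matches his observation that the load was not fully balanced.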

Re: Streaming expression not hitting all replicas?

2016-05-23 Thread Joel Bernstein
The image is the correct flow. Are you using workers? Joel Bernstein http://joelsolr.blogspot.com/ On Mon, May 23, 2016 at 7:16 PM, Timothy Potter wrote: > This image from the wiki kind of gives that impression to me: > > >

Re: Streaming expression not hitting all replicas?

2016-05-23 Thread Timothy Potter
This image from the wiki kind of gives that impression to me: https://cwiki.apache.org/confluence/download/attachments/61311194/cluster.png?version=1=1447365789000=v2 On Mon, May 23, 2016 at 11:50 AM, Erick Erickson wrote: > I _think_ this is a distinction between >

Re: Solr 6.0 Parallel SQL

2016-05-23 Thread Joel Bernstein
The docs describe the current capabilities, so if it's not in the docs, it's not supported yet. For example, the docs don't mention joins or intersections, and they are not supported. Another example is that select count(*) is supported, and select distinct is supported, but select count(distinct)
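A minimal sketch of sending a statement to a collection's /sql handler (the statement goes in the `stmt` parameter), plus a crude pre-flight check for the unsupported constructs Joel names here: joins, intersections and count(distinct). The check covers only those examples, not the full list of limitations, and the collection URL and field names are hypothetical.

```python
import re
from urllib.parse import urlencode

# Only the limitations named in this thread; not a complete list.
UNSUPPORTED = [
    (re.compile(r"\bjoin\b", re.I), "joins"),
    (re.compile(r"\bintersect\b", re.I), "intersections"),
    (re.compile(r"count\s*\(\s*distinct\b", re.I), "count(distinct ...)"),
]


def unsupported_features(stmt):
    """Names of known-unsupported constructs found in a SQL statement."""
    return [name for pattern, name in UNSUPPORTED if pattern.search(stmt)]


def sql_request(collection_base, stmt):
    """Return (url, form-encoded body) for a POST to the collection's /sql handler."""
    return f"{collection_base}/sql", urlencode({"stmt": stmt})


url, body = sql_request(
    "http://localhost:8983/solr/mycollection",
    "SELECT fieldA, count(*) FROM mycollection GROUP BY fieldA "
    "ORDER BY count(*) DESC LIMIT 10")
print(url)
print(unsupported_features("SELECT count(distinct fieldA) FROM mycollection"))
```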

Re: What to do best when expaning from 2 nodes to 4 nodes? [scottchu]

2016-05-23 Thread Erick Erickson
Take a look at the SPLITSHARD Collections API here: https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api3 As for the best value of numShards and replicationFactor: impossible to say. You have to stress test against your SLAs. See:
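A minimal sketch of the SPLITSHARD call Erick points to, with an optional `async` id so that splitting a large shard (90 GB in Scott's case) runs in the background and can be polled with `action=REQUESTSTATUS&requestid=<id>`. The host, collection and request id are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical address of any node in the cluster.
SOLR_BASE = "http://localhost:8983/solr"


def split_shard_url(collection, shard, async_id=None):
    """Collections API SPLITSHARD URL; async_id makes the call return immediately."""
    params = {"action": "SPLITSHARD", "collection": collection, "shard": shard}
    if async_id:
        # Poll later with action=REQUESTSTATUS&requestid=<async_id>.
        params["async"] = async_id
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


print(split_shard_url("mycollection", "shard1", "split-shard1-1"))
```

The split produces two sub-shards (shard1_0 and shard1_1), which can then be spread over the new nodes with ADDREPLICA.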

Re: Streaming expression not hitting all replicas?

2016-05-23 Thread Erick Erickson
I _think_ this is a distinction between serving the query and processing the results. The query is the standard Solr processing returning results from one replica per shard. Those results can be partitioned out to N Solr instances for sub-processing, where N is however many worker nodes you

Re: [Solr 6] Migration from Solr 4.10.2

2016-05-23 Thread Alessandro Benedetti
Furthermore, I was checking the internals of the old facet implementation (the one used with classic request parameters, as opposed to the JSON Facet API). It seems that if you enable docValues, even with the enum method passed as a parameter, fc with docValues will actually be used. I will

Re: SolrCloud increase replication factor

2016-05-23 Thread Hendrik Haddorp
What I find odd is that creating a collection with a replication factor greater than 1 does not seem to end up with replicas on the same node. However, when one wants to add replicas later on, one needs to do the whole placement manually to avoid single points of failure. On 23/05/16 15:28, Tom

Re: Auto Suggestion in solr

2016-05-23 Thread Erick Erickson
Have you seen: https://lucidworks.com/blog/2015/03/04/solr-suggester/ Best, Erick On Sun, May 22, 2016 at 10:07 PM, Mugeesh Husain wrote: > Hello everyone, > > I am looking for some suggestion for auto-suggest like imdb.com. > > just type "samp" in search box in imdb.com

Re: How to use "fq"

2016-05-23 Thread Yonik Seeley
On Mon, May 23, 2016 at 12:41 PM, Steven White wrote: > Thank you Erik and Scott. {!terms} did the job!! I tested like so: > fq={!terms f=category}1,2,3,4,...N > > I read that {!terms} treats the terms in the list as OR, if I have a need > to force AND on my terms, how do

Re: Specifying dynamic field type without polluting actual field names with type indicators

2016-05-23 Thread Abdel Belkasri
That would be a welcomed feature for sure! On Mon, May 23, 2016 at 6:11 AM, Horváth Péter Gergely < peter.gergely.horv...@gmail.com> wrote: > Hi Steve, > > Thank you very much for your inputs. Yes, I do know the aliasing mechanism > offered in Solr. I think the whole question boils down to one

Re: Atomic updates and "stored"

2016-05-23 Thread Erick Erickson
Yes, currently when using atomic updates _all_ fields have to be stored, except the _destinations_ of copyField directives. Yes, it will make your index bigger. The effects on speed are probably minimal though. The stored data is in your *.fdt and *.fdx segment files and is not referenced only
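A minimal sketch of an atomic-update payload of the kind Mark describes: only the changed fields are sent, each wrapped in a `set` modifier (`inc` and `add` are other modifiers). Field names are hypothetical. Solr rebuilds the rest of the document from its stored fields, which is why non-stored fields that are not copyField destinations would be lost.

```python
import json


def atomic_update(doc_id, **changes):
    """Atomic-update document: each changed field maps to a {"set": value} modifier."""
    doc = {"id": doc_id}
    for field, value in changes.items():
        doc[field] = {"set": value}
    return doc


# Hypothetical field names; the payload is a JSON array for the /update handler.
payload = json.dumps([atomic_update("doc1", price_f=19.99, in_stock_b=True)])
print(payload)
```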

Re: How to use "fq"

2016-05-23 Thread Erick Erickson
Steven: I'm not sure you can; the terms query parser is built to OR things together. You might be able to use some of the nested query stuff. Or, assuming you have an _additional_ fq clause you want to use, just use it as: fq={!terms f=category}1,2,3,4,...N=whatever Then you're taking advantage

Re: SolrCloud increase replication factor

2016-05-23 Thread Jeff Wartes
https://github.com/whitepages/solrcloud_manager was designed to provide some easier operations for common kinds of cluster operation. It hasn’t been tested with 6.0 though, so if you try it, please let me know your experience. On 5/23/16, 6:28 AM, "Tom Evans"

Re: How to stop searches to solr while full data import is going in SOLR

2016-05-23 Thread Jeff Wartes
The PingRequestHandler contains support for a file check, which allows you to control whether the ping request succeeds based on the presence/absence of a file on disk on the node. http://lucene.apache.org/solr/6_0_0/solr-core/org/apache/solr/handler/PingRequestHandler.html I suppose you could
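A minimal sketch of the ping-file approach Jeff suggests, assuming `healthcheckFile` is configured for the /admin/ping handler and the load balancer health-checks that URL: disable the ping before the full import so traffic goes elsewhere, then re-enable it afterwards. The core URL is hypothetical.

```python
from urllib.parse import urlencode


def ping_url(core_base, action=None):
    """PingRequestHandler URL; 'enable'/'disable' toggle the configured healthcheck file."""
    if action not in (None, "enable", "disable"):
        raise ValueError(f"unsupported action: {action}")
    query = "?" + urlencode({"action": action}) if action else ""
    return f"{core_base}/admin/ping{query}"


core = "http://localhost:8983/solr/mycore"  # hypothetical core URL
print(ping_url(core, "disable"))  # before the full import: LB stops routing here
print(ping_url(core, "enable"))   # after the import has committed
```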

Solr 6.0 Parallel SQL

2016-05-23 Thread Steven White
Hi everyone, I'm reading up on Solr's Parallel SQL. I see some good examples but not much on how to set it up or what the limitations are. My reading is that I can use Parallel SQL to send SQL syntax to Solr to search in Solr, but: 1) Does this mean all of SQL's query statements are

Re: How to use "fq"

2016-05-23 Thread Steven White
Thank you Erik and Scott. {!terms} did the job!! I tested like so: fq={!terms f=category}1,2,3,4,...N I read that {!terms} treats the terms in the list as OR; if I need to force AND on my terms, how do I do that? Steve On Mon, May 23, 2016 at 9:39 AM, Scott Chu
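A minimal sketch that builds the `{!terms}` filter Steven tested from a list of ids. The parser ORs the values; the field name is the one from the thread, and the guard against commas assumes the default `,` separator.

```python
def terms_fq(field, values):
    """Build an fq for the {!terms} parser, which matches docs having ANY listed value (OR)."""
    vals = [str(v) for v in values]
    if any("," in v for v in vals):
        raise ValueError("values must not contain ',' (the default separator)")
    return "{!terms f=" + field + "}" + ",".join(vals)


# 1000 ids; Steven reports {!terms} handled this without "too many boolean clauses".
fq = terms_fq("category", range(1, 1001))
print(fq[:40])
```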

Streaming expression not hitting all replicas?

2016-05-23 Thread Timothy Potter
I've seen docs and diagrams that seem to indicate a streaming expression can utilize all replicas of a shard but I'm seeing only 1 replica per shard (I have 2) being queried. All replicas are on the same host for my experimentation, could that be the issue? What are the circumstances where all

Re: [Solr 6] Migration from Solr 4.10.2

2016-05-23 Thread Joel Bernstein
If you can make min/max work for you instead of sort then it should be faster, but I haven't spent time comparing the performance. But if you're using the top_fc with the min/max param the performance between Solr 4 & Solr 6 should be very close as the data structures behind them are the same.

What to do best when expaning from 2 nodes to 4 nodes? (fix typo) [scottchu]

2016-05-23 Thread Scott Chu
Sorry for the typo. Here is my question again: I just created a 90 GB collection with 1 shard and 2 replicas on 2 nodes. I am going to migrate from 2 nodes to 4 nodes. I am wondering what the best strategy is to split this single shard. Furthermore, if I am OK with reindexing, what's the best

What to do best when expaning from 2 nodes to 4 nodes? [scottchu]

2016-05-23 Thread Scott Chu
I just created a 90 GB collection with 1 shard and 2 replicas on 2 nodes. I am going to migrate from 2 nodes to 4 nodes. I am wondering what the best strategy is to split this single shard. Furthermore, if I am OK with reindexing, what are the best values, from experience, of numShards and

Re: [Solr 6] Migration from Solr 4.10.2

2016-05-23 Thread Alessandro Benedetti
Hi Joel, thanks for the reply. Actually, we were not using field collapsing before; we basically want to replace grouping with it. The grouping performance between Solr 4 and 6 is basically comparable. It's surprising that I got such a big degradation with field collapsing. So basically the

Re: [Solr 6] Migration from Solr 4.10.2

2016-05-23 Thread Joel Bernstein
For exact syntax of the top_fc hint use the official docs. The blog is using an upper case hint, but that was changed to a lower case hint. Joel Bernstein http://joelsolr.blogspot.com/ On Mon, May 23, 2016 at 2:56 PM, Joel Bernstein wrote: > Also I wrote a guide for Solr 5

Re: [Solr 6] Migration from Solr 4.10.2

2016-05-23 Thread Joel Bernstein
Also, I wrote a guide to Solr 5 Collapsing/Expand performance that used to be on Heliosearch.org. It's no longer available except through the magic of the Wayback Machine. What's not covered is the sort param, which came later. Here it is:

Re: hello i am solr cloud user! i have question!

2016-05-23 Thread Shawn Heisey
On 5/23/2016 6:35 AM, 김두형 wrote: > actually, i want to insert some logs into solrindexsearcher. so the place > where solrindexsearcher is solr-core.jar in dist. i replace new made > solr-core.jar with old solr-core.jar in dist. > in solrconfig i made this solrconfig refered this jar like below. >

Re: [Solr 6] Migration from Solr 4.10.2

2016-05-23 Thread Joel Bernstein
Were you using the sort param or min/max param in Solr 4 to select the group head? The sort work came later and I'm not sure how it compares in performance to the min/max param. Since you are collapsing on a string field you can use the top_fc hint which will use a top level field cache for the
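A minimal sketch of the collapse filter Joel describes: the group head is selected by the min or max of a field, and `hint=top_fc` (lower case, per Joel's follow-up) applies when collapsing on a string field. The field names are hypothetical; adding `expand=true` to the request would bring back the collapsed group members via the expand component.

```python
def collapse_fq(field, max_field=None, min_field=None, hint=None):
    """Build a {!collapse} filter selecting each group head by min/max of a field."""
    if max_field and min_field:
        raise ValueError("use either max or min, not both")
    parts = ["field=" + field]
    if max_field:
        parts.append("max=" + max_field)
    if min_field:
        parts.append("min=" + min_field)
    if hint:
        parts.append("hint=" + hint)  # e.g. "top_fc" for string fields
    return "{!collapse " + " ".join(parts) + "}"


print(collapse_fq("group_s", max_field="price_f", hint="top_fc"))
```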

Re: How to use "fq"

2016-05-23 Thread Scott Chu
Yonik has a very good article about the terms query parser: Solr Terms Query for matching many terms - Solr 'n Stuff http://yonik.com/solr-terms-query/ Scott Chu,scott@udngroup.com 2016/5/23 (Mon) - Original Message - From: Erik Hatcher To: solr-user CC: Date: 2016/5/23 (Mon) 21:14 Subject: Re:

Re: SolrCloud increase replication factor

2016-05-23 Thread Tom Evans
On Mon, May 23, 2016 at 10:37 AM, Hendrik Haddorp wrote: > Hi, > > I have a SolrCloud 6.0 setup and created my collection with a > replication factor of 1. Now I want to increase the replication factor > but would like the replicas for the same shard to be on different

Re: How to use "fq"

2016-05-23 Thread Erik Hatcher
Try the {!terms} query parser. That should make it work well for you. Let us know how it does. Erik > On May 23, 2016, at 08:52, Steven White wrote: > > Hi everyone, > > I'm trying to figure out what's the best way for me to use "fq" when the > list of items is

Atomic updates and "stored"

2016-05-23 Thread Mark Robinson
Hi, I have some 150 fields in my schema out of which about 100 are dynamic fields which I am not storing (stored="false"). In case I need to do an atomic update to one or two fields which belong to the stored list of fields, do I need to change my dynamic fields (100 or so now not "stored") to

How to use "fq"

2016-05-23 Thread Steven White
Hi everyone, I'm trying to figure out the best way for me to use "fq" when the list of items is large (up to 200, but I have a few cases with up to 1000). My current usage is like so: =category:(1 OR 2 OR 3 OR 4 ... 200) When I tested with up to 1000, I hit the "too many boolean clauses",

hello i am solr cloud user! i have question!

2016-05-23 Thread 김두형
Actually, I want to add some logging to SolrIndexSearcher. SolrIndexSearcher lives in solr-core.jar in dist, so I replaced the old solr-core.jar in dist with my newly built one. In solrconfig, I made it refer to this jar like below. . . . However, Solr did not refer to what

Re: Sorting on child document field.

2016-05-23 Thread Pranaya Behera
Hi Mikhail, Thanks. I missed it completely; I thought it would be handled by default. On Monday 23 May 2016 02:08 PM, Mikhail Khludnev wrote: https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThesortParameter sort=score asc On Mon, May 23,

Re: [Solr 6] Migration from Solr 4.10.2

2016-05-23 Thread Alessandro Benedetti
Let's add some additional details, guys: 1) *Faceting* Currently the facet method used is "enum", and it runs over 20 fields, more or less, mainly low-cardinality fields except one which has a cardinality of 1000 terms. I am aware of the famous JIRA issue about the faceting regression:

Solr cloud with Grouping query gives inconsistent results

2016-05-23 Thread preeti kumari
Hi all, I am using a grouping query with SolrCloud version 5.2.1. The parameters in my query are =SIM*group=true=amid=1=true. But each time I run the query I get different results, i.e. the top 10 results are different each time. Why is that? Please help me with this. Is there any way by which I

Re: Specifying dynamic field type without polluting actual field names with type indicators

2016-05-23 Thread Horváth Péter Gergely
Hi Steve, Thank you very much for your inputs. Yes, I do know the aliasing mechanism offered in Solr. I think the whole question boils down to one thing: how much do you know about the data being stored -- and sometimes you know nothing about that. In some cases, you have to provide a generic

RE: How to use a regex search within a phrase query?

2016-05-23 Thread Erez Michalak
Good points, thanks Erick. As you guessed, the use case is not in the main flow for the general user, but an advanced flow for a technical one. Regarding the performance issue, I thought of a few optimizations for some expected expressions I need to support. For instance, to work around the

Re: problems with nested queries

2016-05-23 Thread Matteo Grolla
Sure, sorry for the delay 2016-05-16 16:57 GMT+02:00 Yonik Seeley : > Thanks Matteo, looks like you found a bug. > I can reproduce this with simpler queries too: > > _query_:"ABC" name_t:"white cat"~3 > is parsed to > text:abc name_t:"white cat" > > Can you open a JIRA

Re: Sorting on child document field.

2016-05-23 Thread Mikhail Khludnev
https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThesortParameter sort=score asc On Mon, May 23, 2016 at 11:17 AM, Pranaya Behera wrote: > Hi Mikhail, > I saw the blog post tried to do that with parent block

Re: Error opening new searcher

2016-05-23 Thread Victor D'agostino
Hi Erick, thanks for your help, it is alright now. Have a good day, Victor. Original message *Subject: *Re: Error opening new searcher *From: *Erick Erickson *To: *solr-user *Date: *20/05/2016 17:57 Actually, it almost

Re: Sorting on child document field.

2016-05-23 Thread Pranaya Behera
Hi Mikhail, I saw the blog post and tried to do that with the block join parent query {!parent}, as I don't have a reference to the parent in the child to use in {!join}. This is my result. https://gist.github.com/shadow-fox/b728683b27a2f39d1b5e1aac54b7a8fb . This yields me the

Re: Parallel SQL and function queries?

2016-05-23 Thread Joel Bernstein
Also, I believe this syntax should work with SQL as well; we'll need to test it out: _query_:"{!dismax qf=myfield}how now brown cow" Joel Bernstein http://joelsolr.blogspot.com/ On Mon, May 23, 2016 at 2:59 AM, Joel Bernstein wrote: > I opened SOLR-9148 and added a patch to

Re: highlight don't work if df not specified

2016-05-23 Thread michael solomon
Hi, when I increase hl.maxAnalyzedChars, nothing happens. And with hl.q=blah blah=normal_text,title I get: "error":{ "metadata":[ "error-class","org.apache.solr.common.SolrException", "root-error-class","org.apache.solr.common.SolrException"], "msg":"undefined field text",