Re: Allow Join over two sharded collection

2017-06-29 Thread Damien Kamerman
Joins will work with shards as long as the docs you're joining from/to are in the shard. Why not go compositeId routing (either ID=uniqueKey!docId or router.field)? Is there no 'uniqueKey' which will distribute randomly? You may need to put the same ACL docs in all shards depending on your use
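As a rough illustration of the compositeId suggestion above: with compositeId routing, the part of the unique key before the "!" acts as the routing prefix, so documents that share a prefix are routed together. The helper names and the "tenantA" key below are hypothetical; the actual hashing and shard assignment are done by Solr and are not reproduced here.

```python
def composite_id(shard_key: str, doc_id: str) -> str:
    """Build a compositeId-style unique key of the form '<shard_key>!<doc_id>'."""
    if "!" in shard_key:
        raise ValueError("shard key must not contain '!'")
    return f"{shard_key}!{doc_id}"


def shard_key_of(unique_key: str) -> str:
    """Return the routing prefix (text before the first '!'), or '' if there is none."""
    prefix, sep, _ = unique_key.partition("!")
    return prefix if sep else ""


# Hypothetical example: a document and the ACL document it joins to share a prefix.
parent = composite_id("tenantA", "doc-1")
acl = composite_id("tenantA", "acl-1")
print(parent, acl, shard_key_of(parent) == shard_key_of(acl))
```

Whether a shared prefix fits depends on the data: a prefix that covers too many documents can unbalance the shards.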

Re: Live update the zookeeper for SOLR

2017-06-29 Thread Xie, Sean
I believe I found the issue: it was caused by a script that runs when starting the zk instance, which also cleared all the Solr configuration data from zk, causing Solr to stop working. However, a new issue has come up: when using static IPs for the zookeeper ensemble, it works perfectly and SOLR can reconnect

Re: Not highlighting "and" and "or"?

2017-06-29 Thread Rick Leir
Walter, Erick, David Thanks for the info. Maybe the default for stopwords should be disabled? Cheers -- Rick On June 29, 2017 5:14:16 PM EDT, Walter Underwood wrote: >My blog post has a list of movie titles. I forgot to list the TV series >“Once and Again”. > >Some bands

Re: Allow Join over two sharded collection

2017-06-29 Thread mganeshs
Hi Erick, Initially I also thought of using Streaming for joins. But it looks like joins with Streaming are not meant for heavy-QPS queries, and that's my use case. Currently things are working fine for us with a normal join, as we have only one shard. But in the coming days the number of documents to be

Re: Is there any particular reason why ExternalFileField is read from data directory

2017-06-29 Thread Koji Sekiguchi
Hi, ExternalFileField was introduced via SOLR-351. https://issues.apache.org/jira/browse/SOLR-351 The author thought values could optionally be updated often... I think that explains why it is read from the data directory rather than the config. Koji On 2017/06/29 17:17, apoorvqwerty wrote: Hi, As per the

Include JSON facet inside Solr Streaming

2017-06-29 Thread Zheng Lin Edwin Yeo
Hi, Is it currently possible to include JSON facet inside Solr Streaming? I am trying out with the following query, which combines JSON facet together with the hashJoin from Streaming, but we get the error saying that is not a proper expression clause. If it is possible, what should be the

Re: Issue with SynonymGraphFilterFactory

2017-06-29 Thread Diogo Edelmuth
Opened: https://issues.apache.org/jira/plugins/servlet/mobile#issue/SOLR-10980 Let me know if I marked something inappropriately. Diogo > On Jun 29, 2017, at 18:54, Steve Rowe wrote: > > Hi Diogo, > > That sounds like a bug to me. Would you mind filing a JIRA? > > --

SOLR 4.10 Data import error

2017-06-29 Thread Ghorpade, Parinita
Hi, I am getting the following error when I index data using Dataimporter. I am using a File Data source in the data config file; here is the config file

Re: Issue with SynonymGraphFilterFactory

2017-06-29 Thread Steve Rowe
Hi Diogo, That sounds like a bug to me. Would you mind filing a JIRA? -- Steve www.lucidworks.com > On Jun 29, 2017, at 4:46 PM, diogo wrote: > > I just checked debug=query > > Seems like spanNearQuery function is getting the slop parameter as 0, no > matter what comes

Re: Not highlighting "and" and "or"?

2017-06-29 Thread Walter Underwood
My blog post has a list of movie titles. I forgot to list the TV series “Once and Again”. Some bands that are not searchable with stopwords: * The Who * Was (not Was) * The The wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jun 29, 2017, at 2:09

Re: Not highlighting "and" and "or"?

2017-06-29 Thread Erick Erickson
bq: Mostly, stopwords were a performance hack back when people ran search engines on 16-bit machines Ah, _those_ were the days when programmers were _real_ programmers. Actually I'm glad they're gone but that's another story. "to be or not to be". Can't search that if you enable stopwords.

Re: Not highlighting "and" and "or"?

2017-06-29 Thread David Hastings
Agreed. Stop words from the moment I started using them caused complaints and problems right off the bat. They may have been implemented less than a week before needing a re-index to fix all the problems they caused. On Thu, Jun 29, 2017 at 4:55 PM, Walter Underwood

Re: Not highlighting "and" and "or"?

2017-06-29 Thread Walter Underwood
Ultraseek (and Infoseek) never used stopwords. They cause odd failures, like not being able to search for “Vitamin A”. Stopwords are an on/off approach to term frequency. idf is a proportional approach. Once you have idf, you don’t need stopwords. When I was bringing up Solr for Netflix, I
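To make the on/off versus proportional distinction concrete, here is a minimal sketch of a textbook idf, log(N / df). This is not Lucene's exact similarity formula, and the tiny corpus simply reuses titles mentioned in this thread; the function names are illustrative.

```python
import math


def document_frequency(term: str, docs: list[str]) -> int:
    """Count how many documents contain the term (whitespace tokens, lowercased)."""
    return sum(1 for doc in docs if term in doc.lower().split())


def idf(term: str, docs: list[str]) -> float:
    """Textbook inverse document frequency: log(N / df)."""
    df = document_frequency(term, docs)
    if df == 0:
        raise ValueError(f"term {term!r} does not occur in the corpus")
    return math.log(len(docs) / df)


titles = ["The Who", "The The", "Once and Again", "Vitamin A", "Was (not Was)"]

# A common word gets a lower weight than a rare one, but it stays searchable.
for term in ("the", "vitamin"):
    print(term, round(idf(term, titles), 3))
```

A stopword list would drop "the" entirely, which makes "The Who" and "The The" unsearchable; with idf the word merely contributes less to the score.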

Re: Not highlighting "and" and "or"?

2017-06-29 Thread Rick Leir
Walter Sorry for the tangent, but the stopwords feature sounds useful. You say you do not use this? Did Ultraseek not do it either? Thanks Rick On June 29, 2017 10:53:42 AM EDT, Walter Underwood wrote: >Nope. Haven’t used stopwords for the last 20 years. > >I wonder if

Re: Issue with SynonymGraphFilterFactory

2017-06-29 Thread diogo
I just checked debug=query Seems like spanNearQuery function is getting the slop parameter as 0, no matter what comes after the tilde: "parsedquery":"SpanNearQuery(spanNear([laudo:mother, spanOr([laudo:hipoatenuaca, laudo:hipodens])],* 0*, true))" For searching: "mother grandmother"~8 or

Re: Trouble connecting to IRC

2017-06-29 Thread Aravind D
Hi Anshum, I'm not having any issue connecting. -Aravind -- View this message in context: http://lucene.472066.n3.nabble.com/Trouble-connecting-to-IRC-tp4343512p4343515.html Sent from the Solr - User mailing list archive at Nabble.com.

Trouble connecting to IRC

2017-06-29 Thread Anshum Gupta
Hi, I’ve been having issues connecting to the freenode IRC server for about 45 min now. Is anyone else seeing something similar? -Anshum

Re: Not highlighting "and" and "or"?

2017-06-29 Thread Walter Underwood
Setting lowercaseOperators=false for the request handler defaults fixes this. Probably also fixes some relevance anomalies. Thanks! wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jun 29, 2017, at 6:38 AM, Shawn Heisey wrote:

Re: master slave replication taking time

2017-06-29 Thread Erick Erickson
So you say that master/slave replication takes up to 20 seconds but it takes 8-10 minutes just to copy the entire index? Solr is _already_ speeding up your copy enormously by just copying changed segments. There's nothing magic you can do to make it faster. And if your master happens to merge

Re: Is there any particular reason why ExternalFileField is read from data directory

2017-06-29 Thread Erick Erickson
Should be OK if you put it in your conf directory of your configset. Do note that each and every time you start Solr, each and every replica will download it. So if it's a large file you'll have problems. The default limit for jute.maxbuffer is 1M so you'd also have to bump that up. Frankly,

Re: Solrcloud updating issue.

2017-06-29 Thread Erick Erickson
bq: we have also 5 zookeeper instances running on each node If that's not a typo, it's bad practice. Do you mean "5 Solr instances"? You should need no more than 3 ZK instances in this case. My guess is that you're seeing timeouts but that the indexing is going on in the background. Are you
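For background on the ensemble-size remark, a ZooKeeper ensemble stays available only while a strict majority of its servers is up. The arithmetic can be sketched as follows; the function names are illustrative and nothing here talks to a real ensemble.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble with the given number of servers."""
    if ensemble_size < 1:
        raise ValueError("ensemble must have at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (1, 3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

Three servers tolerate one failure and five tolerate two, while an even-sized ensemble of four still tolerates only one.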

Re: Allow Join over two sharded collection

2017-06-29 Thread Erick Erickson
Probably won't be in 7.0. In fact it appears to have lost momentum so I don't know if it'll ever be committed. Don't know that it _won't_, but there's no way to say. There's been a lot of work in the Solr Streaming world to do joins and it's quite possible that that'll do what you need. Best,

Re: Unique() metrics not supported in Solr Streaming facet stream source

2017-06-29 Thread Erick Erickson
Can you work up a patch if it's a priority for you? Best, Erick On Thu, Jun 29, 2017 at 8:51 AM, Zheng Lin Edwin Yeo wrote: > Hi Joel, > > Thanks for your reply. > > Hopefully we can see it in the new version soon, as it will be helpful for > the project which we are

Re: Number of occurrences in Solr Documents

2017-06-29 Thread David Hastings
I am using 5.2 and this works: select?q=*%3A*=csv=true=totaltermfreq(text%2WORDIWANTTOFIND)=1 On Thu, Jun 29, 2017 at 11:52 AM, Kaushik wrote: > Thanks to Susheel and Shawn. Unfortunately the Solr version we have is Solr > 5.3 and it does not include the
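The request above appears to have lost its parameter names in the archive. A hedged guess at what such a request could look like, built with Python's standard library, is sketched below. The parameter names (wt, fl, rows), the base URL, and the collection name are assumptions, not recovered from the original message; totaltermfreq(field,term) is the Solr function query being referred to.

```python
from urllib.parse import urlencode, parse_qs


def totaltermfreq_url(base: str, field: str, term: str) -> str:
    """Build a /select URL that returns totaltermfreq(field,term) as a pseudo-field."""
    params = {
        "q": "*:*",                              # match all documents
        "wt": "csv",                             # assumed response format
        "fl": f"totaltermfreq({field},{term})",  # function query in the field list
        "rows": 1,                               # the value is the same for every row
    }
    return f"{base}/select?{urlencode(params)}"


# Hypothetical host and collection name.
url = totaltermfreq_url("http://localhost:8983/solr/mycollection", "text", "WORDIWANTTOFIND")
print(url)
```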

Re: Number of occurrences in Solr Documents

2017-06-29 Thread Kaushik
Thanks to Susheel and Shawn. Unfortunately the Solr version we have is Solr 5.3 and it does not include the totaltermfrequency feature. Is there any downside to using TermVectorFrequency, such as performance issues? On Thu, Jun 29, 2017 at 11:49 AM, Susheel Kumar wrote: >

Re: Unique() metrics not supported in Solr Streaming facet stream source

2017-06-29 Thread Zheng Lin Edwin Yeo
Hi Joel, Thanks for your reply. Hopefully we can see it in the new version soon, as it will be helpful for the project which we are working on. Regards, Edwin On 29 June 2017 at 21:32, Joel Bernstein wrote: > This is mainly due to focus on other things. It would be great to

Re: Number of occurrences in Solr Documents

2017-06-29 Thread Susheel Kumar
That's even better. Thanks, Shawn. On Thu, Jun 29, 2017 at 11:45 AM, Shawn Heisey wrote: > On 6/29/2017 8:40 AM, Kaushik wrote: > > We are trying to get the most frequently used words in a collection. > > My understanding is that using facet.field=content_txt. An e.g. of >

Re: Number of occurrences in Solr Documents

2017-06-29 Thread Shawn Heisey
On 6/29/2017 8:40 AM, Kaushik wrote: > We are trying to get the most frequently used words in a collection. > My understanding is that using facet.field=content_txt. An e.g. of > content_txt value is "The fox jumped over another fox". In such a > scenario, I am expecting the facet to return with

Re: Number of occurrences in Solr Documents

2017-06-29 Thread Susheel Kumar
Check out the Term Vector component https://wiki.apache.org/solr/TermVectorComponent On Thu, Jun 29, 2017 at 10:40 AM, Kaushik wrote: > Hello, > > We are trying to get the most frequently used words in a collection. My > understanding is that using facet.field=content_txt. An

Re: Not highlighting "and" and "or"?

2017-06-29 Thread Walter Underwood
Nope. Haven’t used stopwords for the last 20 years. I wonder if lowercaseOperators is true. The docs don’t give the default value for that in edismax. https://lucene.apache.org/solr/guide/6_6/the-extended-dismax-query-parser.html wunder Walter Underwood wun...@wunderwood.org

Allow Join over two sharded collection

2017-06-29 Thread mganeshs
All, Any idea when this ticket will be addressed? https://issues.apache.org/jira/browse/SOLR-8297 One of the comments says by SOLR 7.0. Can we expect that by 7.0? Regards,

Number of occurrences in Solr Documents

2017-06-29 Thread Kaushik
Hello, We are trying to get the most frequently used words in a collection. My understanding is that using facet.field=content_txt. An e.g. of content_txt value is "The fox jumped over another fox". In such a scenario, I am expecting the facet to return with "fox" and with a count value of 2.
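The distinction behind this question can be sketched in plain Python: field faceting counts documents per term, so a term repeated inside one document still contributes 1 for that document, whereas a total term frequency counts every occurrence. The second sample document below is made up for illustration, and the whitespace tokenizer is a stand-in for a real analysis chain.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


docs = [
    "The fox jumped over another fox",  # example value from the question
    "A fox sleeps",                     # made-up second document
]

total_occurrences = Counter()  # every occurrence of a term, across all documents
document_counts = Counter()    # number of documents containing the term (facet-like)

for doc in docs:
    tokens = tokenize(doc)
    total_occurrences.update(tokens)
    document_counts.update(set(tokens))

print("occurrences of fox:", total_occurrences["fox"])
print("documents containing fox:", document_counts["fox"])
```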

Re: Tlogs not being deleted/truncated

2017-06-29 Thread Webster Homer
I don't think that this is part of the problem, but it's bad practice. Looking at the source code for TransactionLog I noticed this, it's very bad form: if (deleteOnClose) { try { Files.deleteIfExists(tlogFile.toPath()); } catch (IOException e) { // TODO:

RE: SolrJ 6.6.0 Connection pool shutdown

2017-06-29 Thread Markus Jelsma
Thanks. I probably should have mentioned there is no firewall limiting connections between those hosts. Actually, the processes run on the same hosts as the Solr cluster is running on. Thanks, Markus -Original message- > From:Alexandre Rafalovitch > Sent:

Re: Limit for facet function of Streaming Expressions in solr cloud

2017-06-29 Thread Pratik Patel
Thanks Joel. For my use case I can switch to rollup for now which can work with "/export" query type. On Thu, Jun 29, 2017 at 10:11 AM, Joel Bernstein wrote: > Yes, I see this is hardcoded into the parameter checks. We can create a > ticket to allow unlimited. > > Joel

Re: Limit for facet function of Streaming Expressions in solr cloud

2017-06-29 Thread Joel Bernstein
Yes, I see this is hardcoded into the parameter checks. We can create a ticket to allow unlimited. Joel Bernstein http://joelsolr.blogspot.com/ On Thu, Jun 29, 2017 at 10:06 AM, Pratik Patel wrote: > Hey Everyone, > > This is about the facet function of Streaming

Limit for facet function of Streaming Expressions in solr cloud

2017-06-29 Thread Pratik Patel
Hey Everyone, This is about the facet function of Streaming Expressions. Is there any way to make the number of facet buckets unlimited? The bucketSizeLimit parameter seems to accept only numbers greater than 0. Thanks, Pratik

Re: Not highlighting "and" and "or"?

2017-06-29 Thread Shawn Heisey
On 6/28/2017 10:46 PM, Walter Underwood wrote: > Hmm, “and” is missing from the individual terms but present in the phrase. > > "rawquerystring":"once and again", > "querystring":"once and again", > > "parsedquery":"(+(+DisjunctionMaxQuery(((concept_ai_concepts_names_default:once)^2.0

Re: SolrJ 6.6.0 Connection pool shutdown

2017-06-29 Thread Alexandre Rafalovitch
One thing to check is whether there is a firewall between the client and the server. They - sometimes - cut the silent connections in the _middle_ (at the firewall). The usual solution is keepAlive request of some kind or not using the connection pool. One way to check is with network tracer like

Re: Unique() metrics not supported in Solr Streaming facet stream source

2017-06-29 Thread Joel Bernstein
This is mainly due to focus on other things. It would be great to support all the aggregate functions in facet, rollup and timeseries expressions. Joel Bernstein http://joelsolr.blogspot.com/ On Thu, Jun 29, 2017 at 8:23 AM, Zheng Lin Edwin Yeo wrote: > Hi, > > We are

Re:Opposite termfrequency / Solr LTR

2017-06-29 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Hi Stefan, Thanks for the question. The existing FieldValueFeature class uses the field value and you would instead like to map the value to a number that will vary from query to query. External feature information (efi) can help with the query-to-query variation but for the mapping we don't

Unique() metrics not supported in Solr Streaming facet stream source

2017-06-29 Thread Zheng Lin Edwin Yeo
Hi, We are working on a Solr Streaming expression, using the facet stream source. As the underlying structure uses JSON Facet, we would like to find out why the unique() metric is not supported. Currently, it only supports sum(col), avg(col), min(col), max(col), count(*) I'm using Solr

RE: SolrJ 6.6.0 Connection pool shutdown

2017-06-29 Thread Markus Jelsma
Hi, Everything is 6.6.0. I could include a stack trace (I don't print them in my program), but that would only be the trace from getById() to CloudSolrClient.requestWithRetryOnStaleState() and a little deeper. Is that what you're looking for? We haven't called close() in that particular part

Re: Not highlighting "and" and "or"?

2017-06-29 Thread Rick Leir
Stopwords? On June 28, 2017 5:13:43 PM EDT, Walter Underwood wrote: >Is there some special casing in the highlighter to skip query syntax >words? The words “and” and “or” don’t get highlighted. > >This is in 6.5.0. > > question > html > 440 >

Issue with SynonymGraphFilterFactory

2017-06-29 Thread Diogo Leão
Hello, I have recently started using query-time synonym searching, and have configured a field with the SynonymGraphFilterFactory to handle multi-term synonyms. All has worked well for simple (non quoted) queries, as well as for exact sequences (quoted) queries. However, I do not seem to get

Opposite termfrequency / Solr LTR

2017-06-29 Thread steveWunderbar
Hello everybody, I'm using Solr LTR and I want to calculate a feature value in the following way: I have a String with all categories that are in a user's search history: e.g. "shoes,shoes,socks,shoes" Now I'd like to count the occurrences of the value of the category field in the String (for
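The computation being asked for can be written out in plain Python as below. This only illustrates the intended mapping from a history string to a number; it is not a Solr LTR feature definition, and the function name is hypothetical.

```python
def category_occurrences(history: str, category: str) -> int:
    """Count how often a category appears in a comma-separated search history."""
    entries = [entry.strip() for entry in history.split(",") if entry.strip()]
    return entries.count(category)


history = "shoes,shoes,socks,shoes"  # example string from the question
print(category_occurrences(history, "shoes"))
print(category_occurrences(history, "socks"))
print(category_occurrences(history, "hats"))
```

Splitting on commas before counting avoids substring matches, so a category such as "shoe" is not counted inside "shoes".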

Solrcloud updating issue.

2017-06-29 Thread Wudong Liu
Hi All: We are trying to index a large number of documents in solrcloud and keep seeing the following error: org.apache.solr.common.SolrException: Service Unavailable, or org.apache.solr.common.SolrException: Service Unavailable but with a similar stack: request:

Re: Suggester and fuzzy/infix suggestions

2017-06-29 Thread alessandro.benedetti
Another path to follow could be to design a specific collection(index) for the auto-suggestion. In there you can define the analysis chain as you like ( for example using edge-ngram filtering on top of tokenisation) to provide infix autocompletion. Then you can play with your queries as you like
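A minimal sketch of what edge n-grams on top of tokenisation produce, assuming a whitespace tokenizer and gram sizes chosen purely for illustration. In Solr this work would be done by the field's analysis chain; the Python below only shows the resulting terms.

```python
def edge_ngrams(token: str, min_gram: int, max_gram: int) -> list[str]:
    """Return the prefixes of a token with lengths from min_gram to max_gram."""
    upper = min(max_gram, len(token))
    return [token[:size] for size in range(min_gram, upper + 1)]


def index_terms(text: str, min_gram: int = 2, max_gram: int = 5) -> list[str]:
    """Tokenize on whitespace, lowercase, then emit edge n-grams for every token."""
    terms = []
    for token in text.lower().split():
        terms.extend(edge_ngrams(token, min_gram, max_gram))
    return terms


# Because every token is expanded, a prefix of a word in the middle of the
# phrase (for example "clo") is also an indexed term.
print(index_terms("Apache Solr Cloud"))
```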

How to index binary files from ftp Servers using Solr DIH?

2017-06-29 Thread Alejandro Rivas Martinez
I need a way to index binary files from ftp servers, using UrlDataSource. I’m doing this locally but I need to do the same from remote sources (Ftp servers). I read a lot and I can’t find any example of indexing binary files from ftps. Is it possible to achieve that? How can I use Data Import

RE: Using asterik(*) with unicode characters.

2017-06-29 Thread Preeti Bhat
Thanks Erick, it's working now as expected. Thanks and Regards, Preeti Bhat -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, June 28, 2017 9:20 PM To: solr-user Subject: Re: Using asterik(*) with unicode characters. There's a long blog on

Is there any particular reason why ExternalFileField is read from data directory

2017-06-29 Thread apoorvqwerty
Hi, As per the documentation for ExternalFileField we need to put external_field with the map in parallel with the data directory on all the shards. Is it possible to read the file from a central location or zookeeper?

Re: master slave replication taking time

2017-06-29 Thread Midas A
Erick, when we copy the entire index it takes 8-10 mins. On Wed, Jun 28, 2017 at 9:22 PM, Erick Erickson wrote: > How long it takes to copy the entire index from one machine to another > over your network. Solr can't go any faster than your network can > support.