ttl on merge-time possible somehow ?

2016-12-15 Thread Dorian Hoxha
Hello searchers, I did some searching for TTL in Solr and found only a way to do it with a delete-query. But that ~sucks, because you have to do a lot of inserts (and queries). The other (kinda better) way to do it would be to set a collection-level TTL, and when indexes are merged, they would drop the
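For comparison, the delete-query approach Dorian describes can be sketched as below. This is a minimal sketch, assuming a hypothetical `expire_at` date field that the indexing client fills with indexing time plus a TTL. The JSON body is meant for Solr's `/update` handler; actually sending it (and committing) is left out.

```java
import java.time.Duration;
import java.time.Instant;

final class TtlByDeleteQuery {
    // Hypothetical date field holding each document's absolute expiry time.
    static final String EXPIRE_FIELD = "expire_at";

    // Index time: expiry = indexing time + TTL, as an ISO-8601 UTC string (the format Solr date fields accept).
    static String expireAt(Instant indexedAt, Duration ttl) {
        return indexedAt.plus(ttl).toString();
    }

    // Periodic cleanup: JSON update body deleting every document whose expiry is already in the past.
    static String deleteExpiredBody() {
        return "{\"delete\":{\"query\":\"" + EXPIRE_FIELD + ":[* TO NOW]\"}}";
    }

    public static void main(String[] args) {
        Instant indexedAt = Instant.parse("2016-12-15T00:00:00Z");
        System.out.println(EXPIRE_FIELD + "=" + expireAt(indexedAt, Duration.ofDays(30))); // expire_at=2017-01-14T00:00:00Z
        System.out.println(deleteExpiredBody());
    }
}
```

This is exactly the cost Dorian objects to: the cleanup is a separate, scheduled write against the index rather than something that happens for free during segment merges.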

Re: Solr has a CPU% spike when indexing a batch of data

2016-12-15 Thread forest_soup
Thanks a lot, Shawn. We'll consider your suggestion to tune our Solr servers. Will let you know the result. Thanks!

Re: Solr MapReduce Indexer Tool is failing for empty core name.

2016-12-15 Thread Reth RM
The primary difference in later versions, starting from Solr 4.0, has been the move from standalone Solr to SolrCloud. And what happens if you try starting Solr in standalone mode? SolrCloud does not consider 'core' anymore; it takes 'collection' as the parameter. On Thu, Dec 15, 2016 at 11:05 PM, Manan Sheth

Re: Solr - Amazon like search

2016-12-15 Thread Reth RM
There's an e-commerce features checklist of what Solr can do listed here: https://lucidworks.com/blog/2011/01/25/implementing-the-ecommerce-checklist-with-apache-solr-and-lucidworks/ That should be a good start, and there are some other reference links listed below. I would try all of

Re: Solr MapReduce Indexer Tool is failing for empty core name.

2016-12-15 Thread Manan Sheth
Thanks Reth. As noted, this is the same MapReduce-based indexer tool that ships with the Solr distribution by default. It only takes the zk_host details and extracts all required information from there. It does not have core-specific configurations. The same tool released with solr

Re: (Newbie Help!) Seeking guidance in regards to Solr's suggestor and others

2016-12-15 Thread Reth RM
This issue is in the solarium-client PHP code, which likely does not traverse further to pick up results from the collation tag of the Solr response. At line 190 https://github.com/solariumphp/solarium/blob/master/library/Solarium/QueryType/Suggester/Result/Result.php#L190 verify whether this is the issue and do a pull

Re: error diagnosis help.

2016-12-15 Thread Reth RM
Are you indexing XML files through Nutch? This exception looks like it comes purely from processing an incorrectly formatted XML file. On Mon, Dec 12, 2016 at 11:53 AM, KRIS MUSSHORN wrote: > ive scoured my nutch and solr config files and I cant find any cause. > suggestions? > Monday,

Re: Solr MapReduce Indexer Tool is failing for empty core name.

2016-12-15 Thread Reth RM
It looks like the command-line tool that you are using to initiate the indexing process is expecting a Solr core name via its respective command-line parameter. Run the tool with -help, check the key of the solr-core-name parameter, and pass that too with some value. On Tue, Dec 13,

Re: Solr on HDFS: increase in query time with increase in data

2016-12-15 Thread Reth RM
I think the shard index size is huge and should be split. On Wed, Dec 14, 2016 at 10:58 AM, Chetas Joshi wrote: > Hi everyone, > > I am running Solr 5.5.0 on HDFS. It is a solrCloud of 50 nodes and I have > the following config. > maxShardsperNode: 1 > replicationFactor:

Re: CharacterUtils is removed from lucene-analyzers-common >6.1

2016-12-15 Thread Xie, Sean
Thanks for pointing out the java.lang.Character. I did find the existence of org.apache.lucene.analysis.CharacterUtils, but I was not able to find the needed methods in it. Sean On 12/15/16, 8:58 PM, "Shawn Heisey" wrote: On 12/15/2016 6:20 PM, Xie, Sean wrote:
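Sean's reply mentions `java.lang.Character` as the way around the removed `org.apache.lucene.analysis.util.CharacterUtils`. A minimal sketch, assuming a custom filter only needs in-place, code-point-aware lowercasing of a `char[]` term buffer; the class and method names are hypothetical and are not the API of either `CharacterUtils` class.

```java
final class LowerCaseCodePoints {
    // Lowercases buffer[offset, limit) in place, one Unicode code point at a time,
    // so supplementary characters stored as surrogate pairs are handled correctly.
    static void toLowerCase(char[] buffer, int offset, int limit) {
        int i = offset;
        while (i < limit) {
            int cp = Character.codePointAt(buffer, i, limit);
            // toChars writes the lowercased code point back at i and returns how many chars it used (1 or 2).
            i += Character.toChars(Character.toLowerCase(cp), buffer, i);
        }
    }

    public static void main(String[] args) {
        char[] term = "HELLO W\u00d6RLD".toCharArray();
        toLowerCase(term, 0, term.length);
        System.out.println(new String(term));
    }
}
```

Like any in-place approach, this assumes the lowercase form of each code point has the same UTF-16 length as the original.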

Re: Stemming with SOLR

2016-12-15 Thread Alexandre Rafalovitch
If you need the full fidelity solution taking care of multiple edge-cases, it could be worth looking at commercial solutions. http://www.basistech.com/ has one, including a free-level SAAS plan. Regards, Alex. http://www.solr-start.com/ - Resources for Solr users, new and experienced

Re: Stemming with SOLR

2016-12-15 Thread Lasitha Wattaladeniya
Hi all, Thanks for the replies. @erick, ahmet: since those stemmers are logical stemmers, they won't work on words such as caught, ran and so on. So in our case they won't work. @susheel: Yes, I thought about it, but the problem we have is that the documents we index are somewhat large texts, so copy

Re: CharacterUtils is removed from lucene-analyzers-common >6.1

2016-12-15 Thread Shawn Heisey
On 12/15/2016 6:20 PM, Xie, Sean wrote: > We have implemented some customized filter/tokenizer, that is using > org.apache.lucene.analysis.util.CharacterUtils. After upgrading to > Solr 6.3, the class is no longer available. Is there any reason the > utility class is removed? This is not really

CharacterUtils is removed from lucene-analyzers-common >6.1

2016-12-15 Thread Xie, Sean
Dear user group, We have implemented some customized filter/tokenizer, that is using org.apache.lucene.analysis.util.CharacterUtils. After upgrading to Solr 6.3, the class is no longer available. Is there any reason the utility class is removed? What I had to do is copy the class

Re: Solr Cloud Replica Cores Give different Results for the Same query

2016-12-15 Thread Erick Erickson
bq: shouldn't the two replicas have the same number of deletions Not necessarily. We're back to the fact that commits on the replicas in a single shard fire at different wall clock times. Plus, when segments are merged, the deleted docs are purged. So it's quite common that two replicas in the

Re: Uncaught exception java.lang.StackOverflowError in 6.3.0

2016-12-15 Thread Erick Erickson
Right, so if I'm doing the math right you have 2,400 replicas per JVM? I'm not clear whether each node has a single JVM or not. Anyway. 2048 is indeed much too high. If nothing else, dropping it to, say, 64 would show whether this was the real root of your problem or not. Even if it slowed

Re: Checking Optimal Values for BM25

2016-12-15 Thread Sascha Szott
Hi Furkan, in order to change the BM25 parameter values k1 and b, the following XML snippet needs to be added to your schema.xml configuration file: <similarity class="solr.BM25SimilarityFactory"><float name="k1">1.3</float><float name="b">0.7</float></similarity> It is even possible to specify the SimilarityFactory on individual index fields. See [1] for more details. Best Sascha [1]

Re: field length within BM25 score calculation in Solr 6.3

2016-12-15 Thread Sascha Szott
Hi, bumping my question after 10 days. Any clarification is appreciated. Best Sascha Hi folks, my Solr index consists of one document with a single valued field "title" of type "text_general". The title field was indexed with the content: 1 2 3 4 5 6 7 8 9. The field type text_general uses

Re: Uncaught exception java.lang.StackOverflowError in 6.3.0

2016-12-15 Thread Yago Riveiro
Yes, I changed the value of coreLoadThreads. With the default value a node takes like 40 minutes to be available with all replicas up. Right now I have ~1.2K collections with 12 shards each, 2 replicas spread in 12 nodes. Indeed the value I configured maybe is too much (2048) but I can start
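Erick's figure of 2,400 replicas per JVM follows from the numbers Yago gives: about 1,200 collections × 12 shards × 2 replicas spread over 12 nodes. The arithmetic, assuming an even spread and one JVM per node:

```java
final class ReplicasPerNode {
    // Total cores divided by node count; assumes replicas are spread evenly and one JVM per node.
    static long replicasPerNode(long collections, long shardsPerCollection, long replicasPerShard, long nodes) {
        long totalCores = collections * shardsPerCollection * replicasPerShard;
        return totalCores / nodes;
    }

    public static void main(String[] args) {
        // About 1,200 collections, 12 shards each, 2 replicas per shard, 12 nodes.
        System.out.println(replicasPerNode(1200, 12, 2, 12)); // prints 2400
    }
}
```

Against roughly 2,400 cores per node, a `coreLoadThreads` value of 2048 means nearly every core on a node is loaded in parallel at startup.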

Re: Solr Cloud Replica Cores Give different Results for the Same query

2016-12-15 Thread Webster Homer
Something I hadn't known until now: the source CDCR collection has 2 shards with 1 replica, while our target cloud has 2 shards with 2 replicas. Both source and target have indexes that are not current. Also, we have set all of our collections to ignore external commits. On Thu, Dec 15, 2016 at 1:31 PM,

Re: Solr Cloud Replica Cores Give different Results for the Same query

2016-12-15 Thread Webster Homer
Looking through our replicas I noticed that in one of our shards (each shard has 2 replicas) 1 replica shows: "replicas": [ { "name": "core_node1", "core": "sial-catalog-material_shard2_replica2", "baseUrl": "http://ae1b-ecom-msc04:8983/solr", "nodeName": "ae1b-ecom-msc04:8983_solr", "state":

RE: DocTransformer not always working

2016-12-15 Thread Chris Hostetter
: Well, i can work with this really fine knowing this, but does it make : sense? I did assume (or be wrong in doing so) that fl=minhash:[binstr] : should mean get that field and pass it through the transformer. At least : i just now fell for it, maybe other shouldn't :) that's what it *can*

Re: Solr Cloud Replica Cores Give different Results for the Same query

2016-12-15 Thread Webster Homer
I am trying to find the reported inconsistencies now. The timestamp I have was created by our ETL process, which may not be in exactly the same order as the indexing occurred. When I tried to sort the results by _docid_ desc, Solr threw a 500 error: { "responseHeader":{ "zkConnected":true,

Re: Exception while creating a HttpSolrClinet

2016-12-15 Thread Shawn Heisey
On 12/15/2016 10:32 AM, tesm...@gmail.com wrote: > I am getting the following exception while creating a Solr client. Any help > is appreciated > > =This is code snipper to create a SolrClient=== > > public void populate (String args) throws IOException, SolrServerException > { >

Re: Has anyone used linode.com to run Solr | ??Best way to deliver PHP/Apache clients with Solr question

2016-12-15 Thread Shawn Heisey
On 12/14/2016 7:36 AM, GW wrote: > I understand accessing solr directly. I'm doing REST calls to a single > machine. > > If I have a cluster of five servers and say three Apache servers, I can > round robin the REST calls to all five in the cluster? > > I guess I'm going to find out. :-) If so I
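GW's question is whether REST calls can be spread round-robin over all five nodes. In SolrCloud any node can accept a request and route it internally, so a simple client-side rotation is one option. A minimal sketch with hypothetical host names and collection; it does no health checking, which is what a ZooKeeper-aware client (such as SolrJ's `CloudSolrClient`, or the PHP ZK extension discussed later in the thread) adds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class RoundRobinSolrUrls {
    private final List<String> baseUrls;
    // Shared counter so concurrent callers still rotate through the nodes.
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinSolrUrls(List<String> baseUrls) {
        if (baseUrls.isEmpty()) {
            throw new IllegalArgumentException("need at least one Solr node");
        }
        this.baseUrls = Collections.unmodifiableList(new ArrayList<>(baseUrls));
    }

    // Returns the next base URL in turn; floorMod keeps the index valid after int overflow.
    String nextBaseUrl() {
        return baseUrls.get(Math.floorMod(next.getAndIncrement(), baseUrls.size()));
    }

    public static void main(String[] args) {
        // Hypothetical node URLs and collection name.
        RoundRobinSolrUrls rr = new RoundRobinSolrUrls(Arrays.asList(
                "http://solr1:8983/solr", "http://solr2:8983/solr", "http://solr3:8983/solr"));
        for (int i = 0; i < 4; i++) {
            System.out.println(rr.nextBaseUrl() + "/collection1/select?q=*:*");
        }
    }
}
```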

Exception while creating a HttpSolrClinet

2016-12-15 Thread tesm...@gmail.com
Hi, I am getting the following exception while creating a Solr client. Any help is appreciated =This is code snippet to create a SolrClient=== public void populate (String args) throws IOException, SolrServerException { String urlString = "http://localhost:8983/solr";

Re: Stemming with SOLR

2016-12-15 Thread Susheel Kumar
We did extensive comparison in the past for Snowball, KStem and Hunspell and there are cases where one of them works better but not other or vice-versa. You may utilise all three of them by having 3 different fields (fieldTypes) and during query, search in all of them. For some of the cases where
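Susheel's suggestion of one field per stemmer, all searched at query time, maps naturally onto the edismax `qf` parameter. A sketch of the request parameters, assuming hypothetical field names `text_snowball`, `text_kstem` and `text_hunspell` (for example filled via `copyField`); each field applies its own query-time analyzer. By default the best-matching field wins, because dismax takes the maximum per-field score, and the `tie` parameter adjusts that.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MultiStemmerQuery {
    // Hypothetical fields, each using a differently stemmed fieldType (Snowball, KStem, Hunspell).
    static final String[] FIELDS = {"text_snowball", "text_kstem", "text_hunspell"};

    // Request parameters asking edismax to search the same user input in all stemmed fields.
    static Map<String, String> params(String userQuery) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("defType", "edismax");
        p.put("q", userQuery);
        p.put("qf", String.join(" ", FIELDS));
        return p;
    }

    public static void main(String[] args) {
        System.out.println(params("caught")); // {defType=edismax, q=caught, qf=text_snowball text_kstem text_hunspell}
    }
}
```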

Re: Nested JSON Facets (Subfacets)

2016-12-15 Thread Chantal Ackermann
> > Interesting I don't recall a bug like that being fixed. > Anyway, glad it works for you now! > -Yonik Then it’s probably because it’s Christmas time! :-)

Re: Uncaught exception java.lang.StackOverflowError in 6.3.0

2016-12-15 Thread Erick Erickson
Hmmm, have you changed coreLoadThreads? We had a problem with this a while back with loading lots and lots of cores, see: https://issues.apache.org/jira/browse/SOLR-7280 But that was fixed in 6.2, so unless you changed the number of threads used to load cores it shouldn't be a problem on 6.3...

Re: Search only for single value of Solr multivalue field

2016-12-15 Thread Erick Erickson
Phrase queries and slop and positionIncrementGap ;) The fieldType has a positionIncrementGap. This is the token delta between the end token of one entry and the beginning of the next. so the first entry: IFREMER, Ctr Brest, DRO Geosci Marines, F-29280 Plouzane, France IFREMER would have a
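Erick's point can be checked with a small model of token positions. This is a simplified sketch, assuming a punctuation-splitting lowercase tokenizer in place of `text_general` and a gap of 100 (the `positionIncrementGap` used in Solr's example schemas). Following Erick's description, the first token of each new value is placed `gap` positions after the last token of the previous one. The two-term check approximates Lucene's sloppy-phrase match length, `|(posB - 1) - posA| <= slop`, so a slop smaller than the gap cannot bridge two values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class MultiValuedPositions {
    // Maps each token to its positions across all values of one multivalued field.
    // Simplified model: the next value's first token lands `gap` positions after the previous value's last token.
    static Map<String, List<Integer>> index(List<String> values, int gap) {
        Map<String, List<Integer>> postings = new HashMap<>();
        int pos = -1;
        boolean firstValue = true;
        for (String value : values) {
            boolean firstToken = true;
            for (String tok : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (tok.isEmpty()) {
                    continue; // split yields a leading empty string if the value starts with punctuation
                }
                pos += (firstToken && !firstValue) ? gap : 1;
                firstToken = false;
                postings.computeIfAbsent(tok, k -> new ArrayList<>()).add(pos);
            }
            firstValue = false;
        }
        return postings;
    }

    // Two-term sloppy phrase "a b"~slop, using the Lucene-style match length |(posB - 1) - posA|.
    static boolean sloppyPhrase(Map<String, List<Integer>> postings, String a, String b, int slop) {
        List<Integer> pa = postings.getOrDefault(a, Collections.emptyList());
        List<Integer> pb = postings.getOrDefault(b, Collections.emptyList());
        for (int x : pa) {
            for (int y : pb) {
                if (Math.abs((y - 1) - x) <= slop) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> affiliations = Arrays.asList(
                "IFREMER, Ctr Brest, DRO Geosci Marines, F-29280 Plouzane, France",
                "Univ Lisbon, Ctr Geofis, P-1269102 Lisbon, Portugal");
        Map<String, List<Integer>> idx = index(affiliations, 100);
        System.out.println(sloppyPhrase(idx, "ifremer", "plouzane", 10)); // true: same value
        System.out.println(sloppyPhrase(idx, "ifremer", "lisbon", 50));   // false: only across values
    }
}
```

With these two affiliations, "ifremer plouzane"~10 matches inside the first value, while "ifremer lisbon"~50 does not, because those terms only co-occur across values.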

Re: Stemming with SOLR

2016-12-15 Thread Ahmet Arslan
Hi, KStemFilter returns legitimate English words, please use it. Ahmet On Thursday, December 15, 2016 6:17 PM, Lasitha Wattaladeniya wrote: Hello devs, I'm trying to develop this indexing and querying flow where it converts the words to its original form (lemmatization).

Re: Stemming with SOLR

2016-12-15 Thread Erick Erickson
What about things like PorterStemFilterFactory, EnglishMinimalStemFilterFactory and the like? Best, Erick On Thu, Dec 15, 2016 at 7:16 AM, Lasitha Wattaladeniya wrote: > Hello devs, > > I'm trying to develop this indexing and querying flow where it converts the > words to its

Re: File system choices?

2016-12-15 Thread Walter Underwood
About ten years ago, I accidentally put indexes on an NFS volume. Solr ran about 100X slower, so I haven’t tried it since. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Dec 15, 2016, at 8:17 AM, Michael Kuhlmann wrote: > > Yes,

Re: File system choices?

2016-12-15 Thread Erick Erickson
NFS isn't the first choice. That said, a number of organizations _do_ use NFS for their Lucene indexes. See the recommendations here: https://lucene.apache.org/core/5_4_0/core/org/apache/lucene/store/NativeFSLockFactory.html What it really amounts to is that you may find

Re: Has anyone used linode.com to run Solr | ??Best way to deliver PHP/Apache clients with Solr question

2016-12-15 Thread GW
Thanks Tom, It looks like there is a PHP extension on Git. It seems like a phpized C lib to create a Zend module to work with ZK. No mention of Solr, but I'm guessing I can poll the ensemble for pretty much anything ZK. Thanks for the direction! A ZK-aware app is the way I need to go. I'll give it

Re: File system choices?

2016-12-15 Thread Michael Kuhlmann
Yes, and we're doing such things at my company. However we most often do things you shouldn't do; this is one of these. Solr needs to load data quite fast, otherwise you'll be having a performance killer. It's often recommended to use an SSD instead of a normal hard disk; a network share would be

Re: Nested JSON Facets (Subfacets)

2016-12-15 Thread Yonik Seeley
Interesting I don't recall a bug like that being fixed. Anyway, glad it works for you now! -Yonik On Thu, Dec 15, 2016 at 11:01 AM, Chantal Ackermann wrote: > Hi Yonik, > > after upgrading to Solr 6.3.0, the nested function works as expected! (Both > with and

Re: Nested JSON Facets (Subfacets)

2016-12-15 Thread Chantal Ackermann
Hi Yonik, after upgrading to Solr 6.3.0, the nested function works as expected! (Both with and without docValues.) "facets":{ "count":3179500, "all_pop":1.5901646171168616E8, "shop_cat":{ "buckets":[{ "val":"Kontaktlinsen > Torische Linsen", "count":75168,

File system choices?

2016-12-15 Thread Michael Joyner (NewsRx)
Hello all, Can the Solr indexes be safely stored and used via mounted NFS shares? -Mike

Stemming with SOLR

2016-12-15 Thread Lasitha Wattaladeniya
Hello devs, I'm trying to develop an indexing and querying flow that converts words to their original form (lemmatization). I've been doing a bit of research lately, but the information on the internet is very limited. I tried using hunspellfactory but it doesn't convert the word to its

Re: Nested JSON Facets (Subfacets)

2016-12-15 Thread Chantal Ackermann
Hi Yonik, are you certain that nesting a function works as documented on http://yonik.com/solr-subfacets/? top_authors:{ type: terms, field: author, limit: 7, sort: "revenue desc", facet:{ revenue: "sum(sales)" } } I’m getting

Re: Has anyone used linode.com to run Solr | ??Best way to deliver PHP/Apache clients with Solr question

2016-12-15 Thread Tom Evans
On Thu, Dec 15, 2016 at 12:37 PM, GW wrote: > While my client is all PHP it does not use a solr client. I wanted to stay > with he latest Solt Cloud and the PHP clients all seemed to have some kind > of issue being unaware of newer Solr Cloud versions. The client makes pure

Re: Searching for a term which isn't a part of an expression

2016-12-15 Thread Dean Gurvitz
I think queries would usually not contain more than one phrase per query, but there isn't a fixed list. Anyways, your solution is very very good for us. We could write a QueryParser or a SearchComponent that edits the Lucene Query object in the ResponseBuilder to include the relevant

Re: Nested JSON Facets (Subfacets)

2016-12-15 Thread Chantal Ackermann
Hi Yonik, here is an update on what I’ve tried so far, unfortunately without any more luck. The field directive is (should have included this when asking the question): /query? json.facet={ num_pop:{query: "popularity[* TO *]“}, all_pop: "sum(popularity)“, shop_cat: {type:terms,

Re: Searching for a term which isn't a part of an expression

2016-12-15 Thread Ahmet Arslan
Hi, Span query family would be a pure query-time solution, SpanNotQuery in particular. SpanQuery include = new SpanTermQuery(new Term(FIELD, "world")); SpanNearQuery exclude = new SpanNearQuery(new SpanQuery[] { new SpanTermQuery(new Term(FIELD, "hello")), new SpanTermQuery(new
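Ahmet's SpanNotQuery keeps `world` spans that do not overlap a `hello world` span. Below is a plain-Java illustration of that matching rule over one tokenized field, not Lucene's implementation, assuming the excluded expression is an exact in-order phrase (the slop in Ahmet's snippet is cut off).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SpanNotSketch {
    // Positions of `term` that are not covered by any occurrence of the exact phrase `exclude`.
    static List<Integer> termNotInPhrase(String[] tokens, String term, String[] exclude) {
        boolean[] covered = new boolean[tokens.length];
        // Mark every token position that belongs to an occurrence of the excluded phrase.
        for (int start = 0; start + exclude.length <= tokens.length; start++) {
            boolean match = true;
            for (int k = 0; k < exclude.length && match; k++) {
                match = tokens[start + k].equals(exclude[k]);
            }
            if (match) {
                Arrays.fill(covered, start, start + exclude.length, true);
            }
        }
        // Keep include-term positions that fall outside every excluded span.
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(term) && !covered[i]) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] phrase = {"hello", "world"};
        // A document matches when at least one position survives.
        System.out.println(termNotInPhrase("hello world and the world".split(" "), "world", phrase)); // [4]
        System.out.println(termNotInPhrase("say hello world".split(" "), "world", phrase)); // []
    }
}
```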

Checking Optimal Values for BM25

2016-12-15 Thread Furkan KAMACI
Hi, Solr's default similarity is now BM25. Its parameters default to k1=1.2 and b=0.75. However, is there any way to check the effect of using different coefficients in BM25 to find the optimal values? Kind Regards, Furkan KAMACI
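One way to see what different k1 and b values do is to evaluate the per-term BM25 formula directly. A sketch, assuming the Lucene 6.x form of `BM25Similarity`, idf × tf·(k1+1) / (tf + k1·(1 - b + b·dl/avgdl)), and made-up corpus statistics. Real Solr scores differ slightly because Lucene 6 stores field lengths in a lossy one-byte norm, and choosing parameters properly still needs relevance judgments for your own queries.

```java
final class Bm25Sketch {
    // idf as in Lucene's BM25Similarity: log(1 + (N - n + 0.5) / (n + 0.5)).
    static double idf(long docCount, long docFreq) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Per-term score in the Lucene 6.x form, which includes the (k1 + 1) factor.
    // b controls length normalization; k1 controls how quickly term frequency saturates.
    static double score(double tf, double docLen, double avgDocLen,
                        long docCount, long docFreq, double k1, double b) {
        double norm = k1 * (1 - b + b * docLen / avgDocLen);
        return idf(docCount, docFreq) * tf * (k1 + 1) / (tf + norm);
    }

    public static void main(String[] args) {
        // Made-up statistics: 1000 docs, term in 50 of them, tf = 2, average length 10.
        double[][] params = {{1.2, 0.75}, {1.3, 0.7}, {1.2, 0.0}};
        for (double[] p : params) {
            System.out.printf("k1=%.1f b=%.2f  short(dl=5)=%.4f  long(dl=20)=%.4f%n", p[0], p[1],
                    score(2, 5, 10, 1000, 50, p[0], p[1]),
                    score(2, 20, 10, 1000, 50, p[0], p[1]));
        }
    }
}
```

With b = 0 the short and long documents score the same; raising b widens the gap in favor of shorter fields.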

Re: Has anyone used linode.com to run Solr | ??Best way to deliver PHP/Apache clients with Solr question

2016-12-15 Thread GW
While my client is all PHP, it does not use a Solr client. I wanted to stay with the latest SolrCloud, and the PHP clients all seemed to have some kind of issue being unaware of newer SolrCloud versions. The client makes pure REST calls with Curl. It is stateful through local storage. There is no

Re: Search only for single value of Solr multivalue field

2016-12-15 Thread Dorian Hoxha
You should be able to filter "(word1 in field OR word2 in field) AND NOT(word1 in field AND word2 in field)". Translate that into the right syntax. I don't know if lucene is smart enough to execute the filter only once (it should be i guess). Makes sense ? On Thu, Dec 15, 2016 at 12:12 PM, Leo

Search only for single value of Solr multivalue field

2016-12-15 Thread Leo BRUVRY-LAGADEC
Hi, I have a multivalued field in my schema called "idx_affilliation". IFREMER, Ctr Brest, DRO Geosci Marines, F-29280 Plouzane, France. Univ Lisbon, Ctr Geofis, P-1269102 Lisbon, Portugal. Univ Bretagne Occidentale, Inst Univ Europeen Mer, Lab Domaines Ocean, F-29280 Plouzane, France. Total

Uncaught exception java.lang.StackOverflowError in 6.3.0

2016-12-15 Thread Yago Riveiro
Hi, I'm getting this error in my log 12/15/2016, 9:28:18 AM ERROR true ExecutorUtilUncaught exception java.lang.StackOverflowError thrown by thread: coreZkRegister-1-thread-48-processing-n:XXX.XXX.XXX.XXX:8983_solr x:collection1_shard3_replica2 s:shard3 c:collection1-visitors

Re: Getting Error - Session expired for /collections/sprod/state.json

2016-12-15 Thread Piyush Kunal
This is happening when heavy indexing like 100/second is going on. On Thu, Dec 15, 2016 at 4:33 PM, Piyush Kunal wrote: > - We have solr6.1.0 cluster running on production with 1 shard and 5 > replicas. > - Zookeeper quorum on 3 nodes. > - Using a chroot in zookeeper to

Getting Error - Session expired for /collections/sprod/state.json

2016-12-15 Thread Piyush Kunal
- We have solr6.1.0 cluster running on production with 1 shard and 5 replicas. - Zookeeper quorum on 3 nodes. - Using a chroot in zookeeper to segregate the configs from other collections. - Using solrj5.1.0 as our client to query solr. Usually things work fine but on and off we witness this

HBase table indexing in Solr using morphline conf

2016-12-15 Thread Gurdeep Singh
Hi All, I am trying to index an HBase table into Solr using HBase indexer and a morphline conf file. The issue I'm facing is that one of the columns in the HBase table is a count field (with integer values), and except for this column all other string-type HBase columns are getting indexed in Solr as

Re: Has anyone used linode.com to run Solr | ??Best way to deliver PHP/Apache clients with Solr question

2016-12-15 Thread Dorian Hoxha
See replies inline: On Wed, Dec 14, 2016 at 3:36 PM, GW wrote: > Thanks, > > I understand accessing solr directly. I'm doing REST calls to a single > machine. > > If I have a cluster of five servers and say three Apache servers, I can > round robin the REST calls to all

Re: Nested JSON Facets (Subfacets)

2016-12-15 Thread CA
Hi Yonik, thank you for your quick reply. (I just sent my original e-mail a second time; I did not confirm the subscription, so I thought it might not have been sent the first time. I’m sorry.) We are using SOLR 6.1.0. Sorry, I should have mentioned. The low number is because of the test

Nested JSON Facets (Subfacets)

2016-12-15 Thread CA
Hi all, this is about using a function in nested facets, specifically the „sum()“ function inside a „terms“ facet using the json.facet api. My json.facet parameter looks like this: json.facet={shop_cat: {type:terms, field:shop_cat, facet: {cat_pop:"sum(popularity)"}}} A snippet of the
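Later in the thread, the nested `sum()` sub-facet is reported working after upgrading to Solr 6.3.0. When the parameter is sent in a GET request, its braces, quotes and parentheses need URL-encoding. A sketch using the facet from the question (whitespace removed) against an endpoint such as `/query`; the `q` value is illustrative.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class JsonFacetParam {
    // Terms facet on shop_cat with a sum(popularity) sub-facet computed per bucket (from the question, whitespace removed).
    static final String FACET =
            "{shop_cat:{type:terms,field:shop_cat,facet:{cat_pop:\"sum(popularity)\"}}}";

    // Query string for a GET request; braces, quotes and parentheses must be percent-encoded.
    static String queryString(String q, String jsonFacet) {
        return "q=" + encode(q) + "&json.facet=" + encode(jsonFacet);
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("/query?" + queryString("*:*", FACET));
    }
}
```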