Re: Boost(bf) function in solr

2016-05-30 Thread Mugeesh Husain
Thanks Doug, that clears up my understanding. When I get some free time, I will study your
book.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Boost-bf-function-in-solr-tp4279792p4279860.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Slow date filter query

2016-05-30 Thread Jay Potharaju
There are about 30 million docs and the index size is 75 GB. I am using a full
timestamp value when querying, not NOW. The fq query covers
almost all the docs (20+ million) in the index.
Thanks


On Mon, May 30, 2016 at 8:17 PM, Erick Erickson 
wrote:

> Oops, fat fingers.
>
> see:
> searchhub.org/2012/02/23/date-math-now-and-filter-queries/
>
> If you're not re-using the _same_ filter query, you'll be better
> off using fq={!cache=false}range_query
>
> Best,
> Erick
>
> On Mon, May 30, 2016 at 8:16 PM, Erick Erickson 
> wrote:
> > That does seem long, but you haven't provided many details
> > about the fields. Are there 100 docs in your index? 100M docs? 500M docs?
> >
> > Are you using NOW inappropriately? See:
> >
> > On Fri, May 27, 2016 at 1:32 PM, Jay Potharaju 
> wrote:
> >> Hi,
> >> I am running filter query(range query) on date fields(high cardinality)
> and
> >> the performance is really bad ...it takes about 2-5 seconds for it to
> come
> >> back with response. I am rebuilding the index to have docvalues & tdates
> >> instead of "date" field. But not sure if that will alleviate the problem
> >> because of high cardinality.
> >>
> >> Can I store the date as YYYYMMDD and run range queries on them instead of
> >> date fields?
> >> Is that a good option?
> >>
> >> --
> >> Thanks
> >> Jay
>



-- 
Thanks
Jay Potharaju


Re: Clarity on Sharding Concepts.

2016-05-30 Thread Mugeesh Husain
Hi,

Read through this document
https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud
for a proper understanding.

FYI, with the default compositeId router, documents are distributed
across shards based on a hash of the document id.

If you index 50 documents, they will be divided into 2 parts: some go to
shard1 and the rest to shard2, and each document will also be copied to its
shard's replica.
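The hash-based split can be sketched with a toy partition. This is an illustration only: Solr's compositeId router actually hashes the document id with MurmurHash3, so the stand-in hash below is an assumption made for a self-contained example.

```python
# Toy illustration of hash-based routing of documents across 2 shards.
# The hash function here is a deterministic stand-in, NOT Solr's real
# MurmurHash3-based compositeId routing.

def route(doc_id: str, num_shards: int = 2) -> int:
    """Pick a shard for a document id by hashing (illustrative only)."""
    return sum(ord(c) for c in doc_id) % num_shards

docs = [f"doc-{i}" for i in range(50)]
shards = {0: [], 1: []}
for d in docs:
    shards[route(d)].append(d)

# Each document lands on exactly one shard, so the per-shard counts
# add up to the total. A replica holds a full copy of its shard's
# documents, not additional ones.
total = len(shards[0]) + len(shards[1])
print(total)  # 50
```

This is why, in the question above, shard1 + shard2 should sum to 50, rather than each shard holding all 50.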


Thanks
Mugeesh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Clarity-on-Sharding-Concepts-tp4279842p4279856.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re(2): Can "Using cp replica and modifying core.properties" rather than ADDREPLICA API work?

2016-05-30 Thread scott.chu
Thanks for your advice, Erick. I think you've pointed out something I didn't think of and 
a possible side effect in the future. I'll go back to the "normal" way next 
time I do the same job.


scott.chu,scott@udngroup.com
2016/5/31 (Tue)
- Original Message - 
From: Erick Erickson 
To: solr-user ; scott (self) 
CC: 
Date: 2016/5/31 (Tue) 11:12
Subject: Re: Can "Using cp replica and modifying core.properties" rather than ADDREPLICA API work?


Well, that'll work, but you better know _exactly_ what you're doing. 
For instance, you better not be indexing and have committed before 
you start your copy. You better make sure your third node 
is up before you index anything. Etc. Etc. 

Why do you think this "saves time"? Have you measured? Is the 
time savings worth the risk? Because using ADDREPLICA handles all 
the edge cases for you and essentially does what you're talking about 
behind the scenes. And will work even if you're actively indexing. 

Manually copying things seems to add additional places for you 
to get it wrong, for a pretty rare operation. I'd bet that the time you spend 
manually creating a replica could be better spent letting the ADDREPLICA 
run in the background while you do other more important things. 

Best, 
Erick 

On Thu, May 26, 2016 at 7:33 PM, scott.chu  wrote: 

> 
> On my lab under Windows PC: 
> 
> 2 Solrcloud nodes, 1 collection, named cugna, with numShards=1 and 
> replicationFactor=2, add index up to 90GB 
> 
> After it worked, I migrated them to CentOS (1 node, 1 machine), but I want to 
> add a 3rd node on a 3rd machine. I think there's only 1 shard and 
> replicationFactor is only a "startup" parameter, not a "limitation". So I do 
> these tasks: 
> 
> * Copy node 2's solr to 3rd machine 
> * Go into solr.home 
> * Rename folder 'cugna_shard1_replica2' to 'cugna_shard1_replica3' 
> Go into 'cugna_shard1_replica3' folder, edit core.properties by 
> change 'name' parameter to 'cugna_shard1_replica3' 
> change 'coreNodeName' parameter to 'core_node3' 
> 
> Then Start 3 nodes, they look ok when I go to admin UI to see the cloud 
> diagram. 
> 
> However, I'm wondering if this is gonna be OK, or if there's something that might 
> cause inconsistency that doesn't show on the admin UI? 
> 
> p.s. I did this because I want to save the time to create a new replica. 

> 
> scott.chu,scott@udngroup.com 
> 2016/5/27 (Fri) 


- 
No virus found in this message. 
Checked by AVG - www.avg.com 
Version: 2015.0.6201 / Virus DB: 4591/12331 - Release date: 05/30/16 


Re: Recommended api/lib to search Solr using PHP

2016-05-30 Thread scott.chu

For those who might have the same need to use Solarium, this is the best tutorial I 
could find by googling. It's actually a chapter in the book "Apache Solr PHP 
Integration":

https://www.packtpub.com/packtlib/book/Big-Data-and-Business-Intelligence/9781782164920/1/ch01lvl1sec13/Installing%20Solarium

I followed it and was able to install and use Solarium correctly.

scott.chu,scott@udngroup.com
2016/5/31 (Tue)
- Original Message - 
From: Shawn Heisey 
To: solr-user 
CC: 
Date: 2016/5/31 (Tue) 02:57
Subject: Re: Recommended api/lib to search Solr using PHP


On 5/30/2016 12:32 PM, GW wrote: 
> I would say look at the urls for searches you build in the query tool 
> 
> In my case 
> 
> http://172.16.0.1:8983/solr/#/products/query 
> 
> When you build queries with the Query tool, for example an edismax query, 
> the URL is there for you to copy. 
> Use the url structure with curl in your programming/scripting. The results 
> come back as REST data. 
> 
> This is what I do with PHP and it's pretty tight. 

Be careful with URLs in the admin UI. 

URLs with "#" in them will *only* work in a browser. They are not the 
REST endpoints. 

When you run a query in the admin UI, it will give you a URL to make the 
same query, but it will NOT be the URL in the address bar of the 
browser. There is a link right above the query results. 

Thanks, 
Shawn 





Re: Cloud Solr 5.3.1 + 6.0.1 cannot delete documents

2016-05-30 Thread Erick Erickson
bq: I checked in the Solr Admin and noticed that the same document
resided in both shards on the same node

If this means two _different_ shards (as opposed to two replicas in
the _same_ shard) showed the document, then that's the proverbial
"smoking gun": somehow your setup isn't what you think it is. Perhaps
you are somehow using implicit routing and routing the doc with the
same ID to two different shards?

Try querying each of your replicas with distrib=false to see if the
doc is somehow on two different shards. If so, I suspect that's the
root of your problems, and figuring out _how_ that happened is the
next step I'd recommend.
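For reference, a non-distributed query against a single replica uses the `distrib=false` parameter, which asks that core to answer from its own index only. A sketch of building such a request — the host and core name below are assumptions for illustration:

```python
from urllib.parse import urlencode

# Build a non-distributed query against one replica's core.
# "localhost" and the core name are illustrative assumptions; use the
# actual core names shown in the admin UI's Core Selector.
base = "http://localhost:8983/solr/mycoll_shard1_replica1/select"
params = urlencode({"q": "id:12535", "distrib": "false", "wt": "json"})
url = f"{base}?{params}"
print(url)
```

Running this URL against each replica in turn shows which cores actually hold the document.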

As to why the raw URL deletes should work and CloudSolrClient doesn't,
CloudSolrClient
tries to send updates only to the shard that they should end up on. So
if your routing is
odd or you somehow have the same doc on two shards, the "wrong" shard wouldn't
see the delete. There's some speculation here BTW, I didn't trace
through the code...

But this functionality is tested in the unit tests
(CloudSolrClientTest.java), so I suspect it's
something odd in your setup.

Best,
Erick

On Mon, May 30, 2016 at 12:33 PM, Moritz Becker  wrote:
> Hi,
>
> I have the following issue:
> I initially started with a Solr 5.3.1 + Zookeeper 3.4.6 cloud setup with 2 
> solr nodes and with one collection consisting of 2 shards and 2 replicas.
>
> I am accessing the cluster using the CloudSolrClient. When I tried to delete 
> a document, no error occurred but after deletion and subsequent commit, the 
> document was still available via index queries.
> I checked in the Solr Admin and noticed that the same document resided in 
> both shards on the same node which I thought was odd.
> Also after deleting the collection and recreating it, the issue remained.
>
> Then I tried upgrading to latest Solr 6.0.1 with the same setup. Again, I 
> recreated the collection but I still could not delete the documents. Here is 
> a log snippet of the deletion attempt of a single document:
>
> 
>
> 126023 INFO  (qtp12209492-16) [c:cc5363_dm_documentversion s:shard1 
> r:core_node4 x:cc5363_dm_documentversion_shard1_replica1] 
> o.a.s.u.p.LogUpdateProcessorFactory 
> [cc5363_dm_documentversion_shard1_replica1]  webapp=/solr path=/update 
> params={update.distrib=FROMLEADER=http://localhost:8983/solr/cc5363_dm_documentversion_shard1_replica2/=javabin=2}{delete=[12535
>  (-1535773473331216384)]} 0 16
> 126024 INFO  (commitScheduler-15-thread-1) [c:cc5363_dm_documentversion 
> s:shard1 r:core_node4 x:cc5363_dm_documentversion_shard1_replica1] 
> o.a.s.u.DirectUpdateHandler2 start 
> commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
> 126036 INFO  (commitScheduler-15-thread-1) [c:cc5363_dm_documentversion 
> s:shard1 r:core_node4 x:cc5363_dm_documentversion_shard1_replica1] 
> o.a.s.c.SolrCore SolrIndexSearcher has not changed - not re-opening: 
> org.apache.solr.search.SolrIndexSearcher
> 126038 INFO  (commitScheduler-15-thread-1) [c:cc5363_dm_documentversion 
> s:shard1 r:core_node4 x:cc5363_dm_documentversion_shard1_replica1] 
> o.a.s.u.DirectUpdateHandler2 end_commit_flush
> 126049 INFO  (qtp12209492-20) [c:cc5363_dm_documentversion s:shard2 
> r:core_node1 x:cc5363_dm_documentversion_shard2_replica1] 
> o.a.s.u.DirectUpdateHandler2 start 
> commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
> 126050 INFO  (qtp12209492-20) [c:cc5363_dm_documentversion s:shard2 
> r:core_node1 x:cc5363_dm_documentversion_shard2_replica1] 
> o.a.s.u.DirectUpdateHandler2 No uncommitted changes. Skipping IW.commit.
> 126051 INFO  (qtp12209492-19) [c:cc5363_dm_documentversion s:shard1 
> r:core_node4 x:cc5363_dm_documentversion_shard1_replica1] 
> o.a.s.u.DirectUpdateHandler2 start 
> commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
> 126054 INFO  (qtp12209492-20) [c:cc5363_dm_documentversion s:shard2 
> r:core_node1 x:cc5363_dm_documentversion_shard2_replica1] o.a.s.c.SolrCore 
> SolrIndexSearcher has not changed - not re-opening: 
> org.apache.solr.search.SolrIndexSearcher
> 126056 INFO  (qtp12209492-20) [c:cc5363_dm_documentversion s:shard2 
> r:core_node1 x:cc5363_dm_documentversion_shard2_replica1] 
> o.a.s.u.DirectUpdateHandler2 end_commit_flush
> 126055 INFO  (qtp12209492-19) [c:cc5363_dm_documentversion s:shard1 
> r:core_node4 x:cc5363_dm_documentversion_shard1_replica1] 
> o.a.s.u.DirectUpdateHandler2 No uncommitted changes. Skipping IW.commit.
> 126057 INFO  (qtp12209492-20) [c:cc5363_dm_documentversion s:shard2 
> r:core_node1 x:cc5363_dm_documentversion_shard2_replica1] 
> o.a.s.u.p.LogUpdateProcessorFactory 
> [cc5363_dm_documentversion_shard2_replica1]  webapp=/solr path=/update 
> 

Re: Slow date filter query

2016-05-30 Thread Erick Erickson
Oops, fat fingers.

see:
searchhub.org/2012/02/23/date-math-now-and-filter-queries/

If you're not re-using the _same_ filter query, you'll be better
off using fq={!cache=false}range_query
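The two forms can be sketched by building the fq strings; the field name `timestamp` is an assumption:

```python
# A raw NOW-based range resolves to the current millisecond, so its
# filter-cache entry is never re-used across requests. Rounding with
# date math (NOW/DAY) keeps the fq text identical between requests,
# and {!cache=false} skips the filter cache entirely for one-off fqs.
field = "timestamp"  # assumed field name

uncacheable = f"{field}:[NOW-1DAY TO NOW]"          # new cache entry every query
cacheable = f"{field}:[NOW/DAY-1DAY TO NOW/DAY]"    # identical text -> cache hit
one_off = "{!cache=false}" + f"{field}:[NOW-1DAY TO NOW]"
print(cacheable)
```

The middle form is what the linked article recommends when the same filter is re-used; the last form is for filters that will never repeat.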

Best,
Erick

On Mon, May 30, 2016 at 8:16 PM, Erick Erickson  wrote:
> That does seem long, but you haven't provided many details
> about the fields. Are there 100 docs in your index? 100M docs? 500M docs?
>
> Are you using NOW inappropriately? See:
>
> On Fri, May 27, 2016 at 1:32 PM, Jay Potharaju  wrote:
>> Hi,
>> I am running filter query(range query) on date fields(high cardinality) and
>> the performance is really bad ...it takes about 2-5 seconds for it to come
>> back with response. I am rebuilding the index to have docvalues & tdates
>> instead of "date" field. But not sure if that will alleviate the problem
>> because of high cardinality.
>>
>> Can I store the date as YYYYMMDD and run range queries on them instead of
>> date fields?
>> Is that a good option?
>>
>> --
>> Thanks
>> Jay


Re: Slow date filter query

2016-05-30 Thread Erick Erickson
That does seem long, but you haven't provided many details
about the fields. Are there 100 docs in your index? 100M docs? 500M docs?

Are you using NOW inappropriately? See:

On Fri, May 27, 2016 at 1:32 PM, Jay Potharaju  wrote:
> Hi,
> I am running filter query(range query) on date fields(high cardinality) and
> the performance is really bad ...it takes about 2-5 seconds for it to come
> back with response. I am rebuilding the index to have docvalues & tdates
> instead of "date" field. But not sure if that will alleviate the problem
> because of high cardinality.
>
> Can I store the date as YYYYMMDD and run range queries on them instead of
> date fields?
> Is that a good option?
>
> --
> Thanks
> Jay
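On the idea above of storing dates as padded numbers such as YYYYMMDD: zero-padded date strings do compare in chronological order, which is why range queries over them behave. A quick sketch:

```python
# Zero-padded YYYYMMDD values compare in true chronological order,
# both as strings and as integers.
dates = ["20160530", "20151231", "20160101"]
chronological = sorted(dates)
print(chronological)  # ['20151231', '20160101', '20160530']

# Without fixed-width padding, numeric strings mis-sort,
# because comparison is character-by-character:
unpadded = sorted(["9", "10"])  # ['10', '9'] -- "1" < "9"
```

Whether this actually beats a proper trie-encoded date field (tdate) with docValues is a separate question; the sketch only shows the ordering property.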


Re: Can "Using cp replica and modifying core.properties" rather than ADDREPLICA API work?

2016-05-30 Thread Erick Erickson
Well, that'll work, but you better know _exactly_ what you're doing.
For instance, you better not be indexing and have committed before
you start your copy. You better make sure your third node
is up before you index anything. Etc. Etc.

Why do you think this "saves time"? Have you measured? Is the
time savings worth the risk? Because using ADDREPLICA handles all
the edge cases for you and essentially does what you're talking about
behind the scenes. And will work even if you're actively indexing.

Manually copying things seems to add additional places for you
to get it wrong, for a pretty rare operation. I'd bet that the time you spend
manually creating a replica could be better spent letting the ADDREPLICA
run in the background while you do other more important things.
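The ADDREPLICA route is a single Collections API call. A sketch of constructing it — the collection and shard echo the example in this thread, while the host and target node name are assumptions:

```python
from urllib.parse import urlencode

# Collections API ADDREPLICA request (illustrative values).
# The "node" value must match the new machine's live node name as
# registered in ZooKeeper; "third-host:8983_solr" is an assumption.
base = "http://localhost:8983/solr/admin/collections"
params = urlencode({
    "action": "ADDREPLICA",
    "collection": "cugna",
    "shard": "shard1",
    "node": "third-host:8983_solr",
})
url = f"{base}?{params}"
print(url)
```

Solr then replicates the shard's index to the new node itself, handling the edge cases (ongoing indexing, recovery) that a manual copy does not.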

Best,
Erick

On Thu, May 26, 2016 at 7:33 PM, scott.chu  wrote:
>
> On my lab under Windows PC:
>
> 2 Solrcloud nodes, 1 collection, named cugna, with numShards=1 and 
> replicationFactor=2, add index up to 90GB
>
> After it worked, I migrated them to CentOS (1 node, 1 machine), but I want to 
> add a 3rd node on a 3rd machine. I think there's only 1 shard and 
> replicationFactor is only a "startup" parameter, not a "limitation". So I do 
> these tasks:
>
> * Copy node 2's solr to 3rd machine
> * Go into solr.home
> * Rename folder 'cugna_shard1_replica2' to 'cugna_shard1_replica3'
> * Go into 'cugna_shard1_replica3' folder, edit core.properties by
> change 'name' parameter to 'cugna_shard1_replica3'
> change 'coreNodeName' parameter to 'core_node3'
>
> Then Start 3 nodes, they look ok when I go to admin UI to see the cloud 
> diagram.
>
> However, I'm wondering if this is gonna be OK, or if there's something that might 
> cause inconsistency that doesn't show on the admin UI?
>
> p.s. I did this because I want to save the time to create a new replica.
>
> scott.chu,scott@udngroup.com
> 2016/5/27 (Fri)


Re: SolrCloud and Zookeeper integration issue in .net application

2016-05-30 Thread Erick Erickson
You'd probably get a more knowledgeable response on the SolrNet user's list.

I have no idea the state of that project, the Java client is the one maintained
by the Apache Solr project.

On a quick look at the SolrNet project, I don't see any
recent activity,
but I have no clue what the real status is.

There was quite a bit of work done in the SolrJ client to make it
zookeeper-aware and "do the right thing" when talking to SolrCloud, it's quite
possible that this work hasn't been keeping up to date with Solr.

So, unless there's some kind of constructor in the SolrNet code that takes
a ZooKeeper ensemble, you'll have to write your own.
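Absent a ZooKeeper-aware client, a minimal client-side failover over a fixed list of Solr URLs might look like the sketch below. The host names are assumptions, and a real load balancer (or a proper cluster-aware client) is still the more robust option:

```python
# Minimal client-side failover over a fixed list of Solr base URLs.
# Host names are illustrative assumptions.
SOLR_URLS = [
    "http://solr1:8983/solr",
    "http://solr2:8983/solr",
]

def pick_url(is_up):
    """Return the first base URL the health probe accepts.

    `is_up` is injected so the probe (e.g. an HTTP ping to the
    /admin/ping handler) can be swapped in or faked for testing.
    Raises if every node is down."""
    for url in SOLR_URLS:
        if is_up(url):
            return url
    raise RuntimeError("no Solr node reachable")

# With a fake probe that reports solr1 down, solr2 is chosen:
chosen = pick_url(lambda u: "solr2" in u)
print(chosen)  # http://solr2:8983/solr
```

The same shape translates directly to a .net client; the point is only that the client, not Solr, must know about multiple entry points.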

Best,
Erick

On Sun, May 29, 2016 at 10:26 PM, shivendra.tiwari
 wrote:
> Hi Shawn,
>
> Thank you for the reply. What I meant was that I have configured Solr 5.3 and
> Zookeeper 3.4.8.
> Currently I am using SolrNet with the master/slave concept, but I am trying to
> use Solr in cloud mode as mentioned previously, and I believe these clients
> don't have a solution for talking to multiple Solr servers.
>
> You have mentioned a load balancer; please tell me what a load balancer is
> (is it a hardware load balancer?) and how to configure one on my machine for a
> .net client.
>
> Thanks in advance.
>
>
> Warm Regards!
> Shivendra
>
> -Original Message- From: Shawn Heisey
> Sent: Friday, May 27, 2016 9:20 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Fw: SolrCloud and Zookeeper integration issue in .net
> application
>
> On 5/27/2016 5:57 AM, shivendra.tiwari wrote:
>>
>> Currently I am using a lower Solr version and it is working fine, but now we
>> are trying to configure SolrCloud for load balancing. I have configured
>> 2 Solr nodes and 1 ZooKeeper node, created collections and shards, and I am
>> getting data from SQL Server into Solr, but I need to call SolrCloud from a
>> .net application. Please tell me whether I need to call ZooKeeper or
>> SolrCloud, how to configure it in my .net application, and what I need to
>> set up. Please help, anyone.
>
>
> Unless you write the code to do it, a .net application will not know how
> to talk to zookeeper.
>
> There are multiple choices for .net clients which know how to talk to
> Solr via HTTP:
>
> https://wiki.apache.org/solr/IntegratingSolr#C.23_.2F_.NET
>
> I do not know if any of these clients know how to talk to multiple Solr
> servers so you don't have a single point of failure.  If they don't, you
> will need a load balancer in front of your cloud.
>
> Solr itself will load balance the requests across the cloud, but if your
> application only talks HTTP to a single server:port, that instance of
> Solr becomes a single point of failure.
>
> For redundancy, you will need three zookeeper nodes:
>
> http://zookeeper.apache.org/doc/r3.4.8/zookeeperAdmin.html#sc_zkMulitServerSetup
>
> Thanks,
> Shawn
>
>


Re: float or string type for a field with whole number and decimal number values?

2016-05-30 Thread Erick Erickson
bq: Should I change the field type to "float" or "string"?

I'd go with float. Let's assume you want to sort by
this field: "10.00" sorts before "9.0" if you
just use strings. Plus floats are generally much more
compact.
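The sorting point is easy to verify with a quick sketch:

```python
# Character-by-character string comparison puts "10.00" before "9.0"
# because "1" < "9"; numeric comparison gives the intended order.
as_strings = sorted(["10.00", "9.0"])
as_floats = sorted([10.00, 9.0])
print(as_strings)  # ['10.00', '9.0']
print(as_floats)   # [9.0, 10.0]
```

The same mis-ordering would hit any string-typed field used for range filters like field:[1 TO *].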

bq: do I need to delete all documents in the index and do a full indexing

That's the way I'd do it. You can always index to a _new_ collection
(assuming SolrCloud) and use collection aliasing to switch your
search all at once.

Best,
Erick

On Sun, May 29, 2016 at 12:56 AM, Derek Poh  wrote:
> I am using solr 4.10.4.
>
>
> On 5/29/2016 3:52 PM, Derek Poh wrote:
>>
>> Hi
>>
>> I have a field that is of "int" type currently and its values are whole
>> numbers.
>>
>> > stored="true" multiValued="false"/>
>>
>> Due to a change in business requirements, this field will need to take in
>> decimal numbers as well.
>> This field is sorted on and filtered by range (field:[1 TO *]).
>>
>> Should I change the field type to "float" or "string"?
>> For the change to take effect, do I need to delete all documents in the
>> index and do a full indexing? Or can I just do a full indexing without
>> the need to delete all documents first?
>>
>> Derek
>>
>> --
>> CONFIDENTIALITY NOTICE
>> This e-mail (including any attachments) may contain confidential and/or
>> privileged information. If you are not the intended recipient or have
>> received this e-mail in error, please inform the sender immediately and
>> delete this e-mail (including any attachments) from your computer, and you
>> must not use, disclose to anyone else or copy this e-mail (including any
>> attachments), whether in whole or in part.
>> This e-mail and any reply to it may be monitored for security, legal,
>> regulatory compliance and/or other appropriate reasons.
>
>
>


Re(2): Recommended api/lib to search Solr using PHP

2016-05-30 Thread scott.chu
Thanks, guys! My engineers just found another thing called 'SolrPhpClient', but
I am trying Solarium again. It just looks like a well-structured API. (Note:
I actually noticed it from the very beginning, when it was developed, but never
gave it a try.)


scott.chu,scott@udngroup.com
2016/5/31 (Tue)
- Original Message - 
From: GW 
To: solr-user ; scott (self) 
CC: 
Date: 2016/5/31 (Tue) 02:32
Subject: Re: Recommended api/lib to search Solr using PHP


I would say look at the urls for searches you build in the query tool 

In my case 

http://172.16.0.1:8983/solr/#/products/query 

When you build queries with the Query tool, for example an edismax query, 
the URL is there for you to copy. 
Use the url structure with curl in your programming/scripting. The results 

come back as REST data. 

This is what I do with PHP and it's pretty tight. 


On 30 May 2016 at 02:29, scott.chu  wrote: 

> 
> We have two legacy in-house applications written in PHP 5.2.6 and 5.5.3. 

> Our engineers currently just use fopen with a URL to search Solr, but it's 
> kind of insufficient when we want to do more advanced, complex queries. We've 
> tried to use something called 'Solarium', but its installation steps have 
> something to do with Symfony, which is kind of complicated. We couldn't get the 
> installation done OK. I'd like to know if there are some other 
> better-structured PHP libraries or APIs? 
> 
> Note: Solr is 5.4.1. 
> 
> scott.chu,scott@udngroup.com 
> 2016/5/30 (Mon) 
> 





Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-30 Thread MaryJo Sminkey
It's been a while since I installed it, so I really can't say. I'm more of a
code monkey than a server gal (particularly Linux... I'm amazed I got Solr
installed in the first place, LOL!). So I had asked our network guy to look
it over recently and see if it looked like I did it okay. He said that since it
shows up in the list of jars in the Solr admin, it's installed. If that's not
necessarily true, I probably need to point him in the right
direction for what else to do, since he really doesn't know Solr well
either.

Mary Jo




On Mon, May 30, 2016 at 7:49 PM, John Bickerstaff 
wrote:

> Thanks for the comment Mary Jo...
>
> The error loading the class rings a bell - did you find and follow
> instructions for adding that to the WAR file?  I vaguely remember seeing
> something about that.
>
> I'm going to try my own tests on the auto phrasing one..  If I'm
> successful, I'll post back.
>
> On Mon, May 30, 2016 at 3:45 PM, MaryJo Sminkey 
> wrote:
>
> > This is a very timely discussion for me as well, as we're trying to tackle
> > the multi-term synonym issue and have not been able to get the hon-lucene
> > plugin to work. The jar shows up as installed, but when we set up the sample
> > request handler it throws this error:
> >
> >
> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> > Error loading class
> >
> 'com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin'
> >
> > I have tried the auto-phrasing one as well (I did set up a field using
> copy
> > to configure it on) but when testing it didn't seem to return the
> synonyms
> > as expected. So gave up on that one too (am willing to give it another
> try
> > though, that was awhile ago). Would definitely like to hear what other
> > people have found works on the latest versions of Solr 5.x and/or 6. Just
> > sucks that this issue has never been fixed in the core product such that
> > you still need to mess with plugins and patches to get such a basic
> > functionality working properly.
> >
> >
> > *Mary Jo Sminkey*
> > *Senior ColdFusion Developer*
> >
> > *CF Webtools*
> > You Dream It... We Build It. 
> > 11204 Davenport Suite 100
> > Omaha, Nebraska 68154
> > O: 402.408.3733 x128
> > E:  maryjo.smin...@cfwebtools.com
> > Skype: maryjos.cfwebtools
> >
> >
> > On Mon, May 30, 2016 at 5:02 PM, John Bickerstaff <
> > j...@johnbickerstaff.com>
> > wrote:
> >
> > > So I'm looking at the solution mentioned here:
> > >
> > >
> >
> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
> > >
> > > The thing that's troubling me slightly is that the way it's documented
> it
> > > seems to be missing a small but important link...
> > >
> > > What exactly causes the results listed to be returned?
> > >
> > > Here's my thought process:
> > >
> > > 1. The entry for /autophrase searchHandler does not specify a default
> > > search field.
> > > 2. The field type "text_autophrase" is set up as the one with the
> > > AutoPhrasingFilterFactory as part of it's indexing
> > >
> > > There isn't any mention (perhaps because it's too obvious) of the need
> to
> > > copy or otherwise get data into the "text_autophrase" field at index
> > time.
> > >
> > > There isn't any explicit listing of "text_autophrase" as the default
> > search
> > > field in the /autophrase search handler
> > >
> > > There isn't any explicit statement of "df=text_autophrase" in the query
> > > statement: [/autophrase?q=New+York]
> > >
> > > Therefore it seems to me that if someone tries to implement this,
> they're
> > > going to be disappointed in the results unless they:
> > > a. copy or otherwise get ALL the text they're interested in -- into the
> > > "text_autophrase" field as part of the schema.xml setup (to happen at
> > index
> > > time)
> > > b. somehow explicitly declare "text_autophrase" as the default search
> > field
> > > - either in the searchHandler or wherever else the default field is
> > > configured.
> > >
> > > If anyone out there has done this specific approach - could you
> validate
> > > whether my thought process is correct and / or if I'm missing
> something?
> > > Yes - I get that I can set it all up and try - but it's what I don't
> > know I
> > > don't know that bothers me...
> > >
> > > On Fri, May 27, 2016 at 11:57 AM, John Bickerstaff <
> > > j...@johnbickerstaff.com
> > > > wrote:
> > >
> > > > Thank you Steve -- very helpful.
> > > >
> > > > I can see that whatever implementation I decide to try, some testing
> > will
> > > > be in order.  If anyone is aware of significant gotchas with this
> > synonym
> > > > thing that are not mentioned in the already-listed URLs, please feel
> > free
> > > > to comment.
> > > >
> > > > On Fri, May 27, 2016 at 10:28 AM, Steve Rowe 
> wrote:
> > > >
> > > >> I’m working on addressing problems using multi-term synonyms at
> query

Clarity on Sharding Concepts.

2016-05-30 Thread Siddhartha Singh Sandhu
Hi Community,

I need some help understanding some concepts.

I have this config on 2 servers:

2 shards each with 1 replica.

Hence, on each server I have:
1. shard1_replica1
2. shard2_replica1

Suppose I have 50 documents then,
shard1_replica1 + shard2_replica1 = 50 ?

or shard2_replica1 = 50 && shard1_replica1 = 50 ?

Regards,

Sid.


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-30 Thread John Bickerstaff
Thanks for the comment Mary Jo...

The error loading the class rings a bell - did you find and follow
instructions for adding that to the WAR file?  I vaguely remember seeing
something about that.

I'm going to try my own tests on the auto phrasing one..  If I'm
successful, I'll post back.

On Mon, May 30, 2016 at 3:45 PM, MaryJo Sminkey  wrote:

> This is a very timely discussion for me as well, as we're trying to tackle
> the multi-term synonym issue and have not been able to get the hon-lucene
> plugin to work. The jar shows up as installed, but when we set up the sample
> request handler it throws this error:
>
> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> Error loading class
> 'com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin'
>
> I have tried the auto-phrasing one as well (I did set up a field using copy
> to configure it on) but when testing it didn't seem to return the synonyms
> as expected. So gave up on that one too (am willing to give it another try
> though, that was awhile ago). Would definitely like to hear what other
> people have found works on the latest versions of Solr 5.x and/or 6. Just
> sucks that this issue has never been fixed in the core product such that
> you still need to mess with plugins and patches to get such a basic
> functionality working properly.
>
>
> *Mary Jo Sminkey*
> *Senior ColdFusion Developer*
>
> *CF Webtools*
> You Dream It... We Build It. 
> 11204 Davenport Suite 100
> Omaha, Nebraska 68154
> O: 402.408.3733 x128
> E:  maryjo.smin...@cfwebtools.com
> Skype: maryjos.cfwebtools
>
>
> On Mon, May 30, 2016 at 5:02 PM, John Bickerstaff <
> j...@johnbickerstaff.com>
> wrote:
>
> > So I'm looking at the solution mentioned here:
> >
> >
> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
> >
> > The thing that's troubling me slightly is that the way it's documented it
> > seems to be missing a small but important link...
> >
> > What exactly causes the results listed to be returned?
> >
> > Here's my thought process:
> >
> > 1. The entry for /autophrase searchHandler does not specify a default
> > search field.
> > 2. The field type "text_autophrase" is set up as the one with the
> > AutoPhrasingFilterFactory as part of it's indexing
> >
> > There isn't any mention (perhaps because it's too obvious) of the need to
> > copy or otherwise get data into the "text_autophrase" field at index
> time.
> >
> > There isn't any explicit listing of "text_autophrase" as the default
> search
> > field in the /autophrase search handler
> >
> > There isn't any explicit statement of "df=text_autophrase" in the query
> > statement: [/autophrase?q=New+York]
> >
> > Therefore it seems to me that if someone tries to implement this, they're
> > going to be disappointed in the results unless they:
> > a. copy or otherwise get ALL the text they're interested in -- into the
> > "text_autophrase" field as part of the schema.xml setup (to happen at
> index
> > time)
> > b. somehow explicitly declare "text_autophrase" as the default search
> field
> > - either in the searchHandler or wherever else the default field is
> > configured.
> >
> > If anyone out there has done this specific approach - could you validate
> > whether my thought process is correct and / or if I'm missing something?
> > Yes - I get that I can set it all up and try - but it's what I don't
> know I
> > don't know that bothers me...
> >
> > On Fri, May 27, 2016 at 11:57 AM, John Bickerstaff <
> > j...@johnbickerstaff.com
> > > wrote:
> >
> > > Thank you Steve -- very helpful.
> > >
> > > I can see that whatever implementation I decide to try, some testing
> will
> > > be in order.  If anyone is aware of significant gotchas with this
> synonym
> > > thing that are not mentioned in the already-listed URLs, please feel
> free
> > > to comment.
> > >
> > > On Fri, May 27, 2016 at 10:28 AM, Steve Rowe  wrote:
> > >
> > >> I’m working on addressing problems using multi-term synonyms at query
> > >> time in Lucene and Solr.
> > >>
> > >> I recommend these two blogs for understanding the issues (the second
> one
> > >> was mentioned earlier in this thread):
> > >>
> > >> <
> > >>
> >
> http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
> > >> >
> > >> 
> > >>
> > >> In addition to the already-mentioned projects, there is also:
> > >>
> > >> 
> > >>
> > >> All of these projects try in various ways to work around the fact that
> > >> Lucene’s QueryParser splits on whitespace before sending text to
> > analysis,
> > >> one token at a time, so in a synonym filter, multi-word synonyms can
> > never
> > >> match and add alternatives.  See <
> > >> https://issues.apache.org/jira/browse/LUCENE-2605>, 

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-30 Thread MaryJo Sminkey
This is a very timely discussion for me, as we're trying to tackle
the multi-term synonym issue as well and have not been able to get the
hon-lucene plugin to work. The jar shows up as installed, but when we set up
the sample request handler it throws this error:

org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
Error loading class
'com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin'

I have tried the auto-phrasing one as well (I did set up a field using copy
to configure it on), but when testing it didn't seem to return the synonyms
as expected, so I gave up on that one too (I am willing to give it another try
though; that was a while ago). I would definitely like to hear what other
people have found works on the latest versions of Solr 5.x and/or 6. It just
sucks that this issue has never been fixed in the core product, such that
you still need to mess with plugins and patches to get such basic
functionality working properly.


*Mary Jo Sminkey*
*Senior ColdFusion Developer*

*CF Webtools*
You Dream It... We Build It. 
11204 Davenport Suite 100
Omaha, Nebraska 68154
O: 402.408.3733  x128
E:  maryjo.smin...@cfwebtools.com
Skype: maryjos.cfwebtools


On Mon, May 30, 2016 at 5:02 PM, John Bickerstaff 
wrote:

> So I'm looking at the solution mentioned here:
>
> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
>
> The thing that's troubling me slightly is that the way it's documented it
> seems to be missing a small but important link...
>
> What exactly causes the results listed to be returned?
>
> Here's my thought process:
>
> 1. The entry for /autophrase searchHandler does not specify a default
> search field.
> 2. The field type "text_autophrase" is set up as the one with the
> AutoPhrasingFilterFactory as part of its indexing
>
> There isn't any mention (perhaps because it's too obvious) of the need to
> copy or otherwise get data into the "text_autophrase" field at index time.
>
> There isn't any explicit listing of "text_autophrase" as the default search
> field in the /autophrase search handler
>
> There isn't any explicit statement of "df=text_autophrase" in the query
> statement: [/autophrase?q=New+York]
>
> Therefore it seems to me that if someone tries to implement this, they're
> going to be disappointed in the results unless they:
> a. copy or otherwise get ALL the text they're interested in -- into the
> "text_autophrase" field as part of the schema.xml setup (to happen at index
> time)
> b. somehow explicitly declare "text_autophrase" as the default search field
> - either in the searchHandler or wherever else the default field is
> configured.
>
> If anyone out there has done this specific approach - could you validate
> whether my thought process is correct and / or if I'm missing something?
> Yes - I get that I can set it all up and try - but it's what I don't know I
> don't know that bothers me...
>
> On Fri, May 27, 2016 at 11:57 AM, John Bickerstaff <
> j...@johnbickerstaff.com
> > wrote:
>
> > Thank you Steve -- very helpful.
> >
> > I can see that whatever implementation I decide to try, some testing will
> > be in order.  If anyone is aware of significant gotchas with this synonym
> > thing that are not mentioned in the already-listed URLs, please feel free
> > to comment.
> >
> > On Fri, May 27, 2016 at 10:28 AM, Steve Rowe  wrote:
> >
> >> I’m working on addressing problems using multi-term synonyms at query
> >> time in Lucene and Solr.
> >>
> >> I recommend these two blogs for understanding the issues (the second one
> >> was mentioned earlier in this thread):
> >>
> >> <
> >>
> http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
> >> >
> >> 
> >>
> >> In addition to the already-mentioned projects, there is also:
> >>
> >> 
> >>
> >> All of these projects try in various ways to work around the fact that
> >> Lucene’s QueryParser splits on whitespace before sending text to
> analysis,
> >> one token at a time, so in a synonym filter, multi-word synonyms can
> never
> >> match and add alternatives.  See <
> >> https://issues.apache.org/jira/browse/LUCENE-2605>, where I’ve posted a
> >> patch to directly address that problem - note that it’s still a work in
> >> progress.
> >>
> >> Once LUCENE-2605 has been fixed, there is still work to do getting
> >> (e)dismax to work with the modified Lucene QueryParser, and addressing
> >> problems with how queries are constructed from Lucene’s “sausagized”
> token
> >> stream.
> >>
> >> --
> >> Steve
> >> www.lucidworks.com
> >>
> >> > On May 26, 2016, at 2:21 PM, John Bickerstaff <
> j...@johnbickerstaff.com>
> >> wrote:
> >> >
> >> > Thanks Chris --
> >> >
> >> > The two projects I'm aware of are:
> >> >
> >> > 

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-30 Thread John Bickerstaff
So I'm looking at the solution mentioned here:
https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/

The thing that's troubling me slightly is that the way it's documented it
seems to be missing a small but important link...

What exactly causes the results listed to be returned?

Here's my thought process:

1. The entry for /autophrase searchHandler does not specify a default
search field.
2. The field type "text_autophrase" is set up as the one with the
AutoPhrasingFilterFactory as part of its indexing

There isn't any mention (perhaps because it's too obvious) of the need to
copy or otherwise get data into the "text_autophrase" field at index time.

There isn't any explicit listing of "text_autophrase" as the default search
field in the /autophrase search handler

There isn't any explicit statement of "df=text_autophrase" in the query
statement: [/autophrase?q=New+York]

Therefore it seems to me that if someone tries to implement this, they're
going to be disappointed in the results unless they:
a. copy or otherwise get ALL the text they're interested in -- into the
"text_autophrase" field as part of the schema.xml setup (to happen at index
time)
b. somehow explicitly declare "text_autophrase" as the default search field
- either in the searchHandler or wherever else the default field is
configured.

If anyone out there has done this specific approach - could you validate
whether my thought process is correct and / or if I'm missing something?
Yes - I get that I can set it all up and try - but it's what I don't know I
don't know that bothers me...
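For concreteness, the two missing pieces (a) and (b) would, I assume, look
something like the snippets below. The source field names are hypothetical;
only "text_autophrase" and "/autophrase" come from the blog post:

```xml
<!-- (a) copy all searchable text into the autophrase field at index time;
     "title" and "body" are placeholder source fields -->
<copyField source="title" dest="text_autophrase"/>
<copyField source="body" dest="text_autophrase"/>

<!-- (b) declare text_autophrase as the default search field
     on the /autophrase handler -->
<requestHandler name="/autophrase" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="df">text_autophrase</str>
  </lst>
</requestHandler>
```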

On Fri, May 27, 2016 at 11:57 AM, John Bickerstaff  wrote:

> Thank you Steve -- very helpful.
>
> I can see that whatever implementation I decide to try, some testing will
> be in order.  If anyone is aware of significant gotchas with this synonym
> thing that are not mentioned in the already-listed URLs, please feel free
> to comment.
>
> On Fri, May 27, 2016 at 10:28 AM, Steve Rowe  wrote:
>
>> I’m working on addressing problems using multi-term synonyms at query
>> time in Lucene and Solr.
>>
>> I recommend these two blogs for understanding the issues (the second one
>> was mentioned earlier in this thread):
>>
>> <
>> http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
>> >
>> 
>>
>> In addition to the already-mentioned projects, there is also:
>>
>> 
>>
>> All of these projects try in various ways to work around the fact that
>> Lucene’s QueryParser splits on whitespace before sending text to analysis,
>> one token at a time, so in a synonym filter, multi-word synonyms can never
>> match and add alternatives.  See <
>> https://issues.apache.org/jira/browse/LUCENE-2605>, where I’ve posted a
>> patch to directly address that problem - note that it’s still a work in
>> progress.
>>
>> Once LUCENE-2605 has been fixed, there is still work to do getting
>> (e)dismax to work with the modified Lucene QueryParser, and addressing
>> problems with how queries are constructed from Lucene’s “sausagized” token
>> stream.
>>
>> --
>> Steve
>> www.lucidworks.com
>>
>> > On May 26, 2016, at 2:21 PM, John Bickerstaff 
>> wrote:
>> >
>> > Thanks Chris --
>> >
>> > The two projects I'm aware of are:
>> >
>> > https://github.com/healthonnet/hon-lucene-synonyms
>> >
>> > and the one referenced from the Lucidworks page here:
>> >
>> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
>> >
>> > ... which is here :
>> https://github.com/LucidWorks/auto-phrase-tokenfilter
>> >
>> > Is there anything else out there that you would recommend I look at?
>> >
>> > On Thu, May 26, 2016 at 12:01 PM, Chris Morley 
>> wrote:
>> >
>> >> Chris Morley here, from Wayfair.  (Depahelix = my domain)
>> >>
>> >> Suyash Sonawane and I have worked on multiple word synonyms at Wayfair.
>> >> We worked mostly off of Ted Sullivan's work and also off of some
>> >> suggestions from Koorosh Vakhshoori.  We have gotten to a point where
>> we
>> >> have a more sophisticated internal implementation, however, we've found
>> >> that it is very difficult to make it do what you want it to do, and
>> also be
>> >> sufficiently performant.  Watch out for exceptional situations with mm
>> >> (minimum should match).
>> >>
>> >> Trey Grainger (now at Lucidworks) and Simon Hughes of Dice.com have
>> also
>> >> done work in this area.
>> >>
>> >> It should be very possible to get this kind of thing working on
>> >> SolrCloud.  I haven't tried it yet but I think theoretically, it should
>> >> just work.  The synonyms stuff is mostly about doing things at index
>> time
>> >> and query time.  The index time stuff should translate to SolrCloud
>> >> 

Cloud Solr 5.3.1 + 6.0.1 cannot delete documents

2016-05-30 Thread Moritz Becker
Hi,
 
I have the following issue:
I initially started with a Solr 5.3.1 + Zookeeper 3.4.6 cloud setup with 2 solr 
nodes and with one collection consisting of 2 shards and 2 replicas.

I am accessing the cluster using the CloudSolrClient. When I tried to delete a 
document, no error occurred but after deletion and subsequent commit, the 
document was still available via index queries.
I checked in the Solr Admin and noticed that the same document resided in both 
shards on the same node which I thought was odd.
Also after deleting the collection and recreating it, the issue remained.
 
Then I tried upgrading to latest Solr 6.0.1 with the same setup. Again, I 
recreated the collection but I still could not delete the documents. Here is a 
log snippet of the deletion attempt of a single document:
 


126023 INFO  (qtp12209492-16) [c:cc5363_dm_documentversion s:shard1 
r:core_node4 x:cc5363_dm_documentversion_shard1_replica1] 
o.a.s.u.p.LogUpdateProcessorFactory [cc5363_dm_documentversion_shard1_replica1] 
 webapp=/solr path=/update 
params={update.distrib=FROMLEADER=http://localhost:8983/solr/cc5363_dm_documentversion_shard1_replica2/=javabin=2}{delete=[12535
 (-1535773473331216384)]} 0 16
126024 INFO  (commitScheduler-15-thread-1) [c:cc5363_dm_documentversion 
s:shard1 r:core_node4 x:cc5363_dm_documentversion_shard1_replica1] 
o.a.s.u.DirectUpdateHandler2 start 
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
126036 INFO  (commitScheduler-15-thread-1) [c:cc5363_dm_documentversion 
s:shard1 r:core_node4 x:cc5363_dm_documentversion_shard1_replica1] 
o.a.s.c.SolrCore SolrIndexSearcher has not changed - not re-opening: 
org.apache.solr.search.SolrIndexSearcher
126038 INFO  (commitScheduler-15-thread-1) [c:cc5363_dm_documentversion 
s:shard1 r:core_node4 x:cc5363_dm_documentversion_shard1_replica1] 
o.a.s.u.DirectUpdateHandler2 end_commit_flush
126049 INFO  (qtp12209492-20) [c:cc5363_dm_documentversion s:shard2 
r:core_node1 x:cc5363_dm_documentversion_shard2_replica1] 
o.a.s.u.DirectUpdateHandler2 start 
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
126050 INFO  (qtp12209492-20) [c:cc5363_dm_documentversion s:shard2 
r:core_node1 x:cc5363_dm_documentversion_shard2_replica1] 
o.a.s.u.DirectUpdateHandler2 No uncommitted changes. Skipping IW.commit.
126051 INFO  (qtp12209492-19) [c:cc5363_dm_documentversion s:shard1 
r:core_node4 x:cc5363_dm_documentversion_shard1_replica1] 
o.a.s.u.DirectUpdateHandler2 start 
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
126054 INFO  (qtp12209492-20) [c:cc5363_dm_documentversion s:shard2 
r:core_node1 x:cc5363_dm_documentversion_shard2_replica1] o.a.s.c.SolrCore 
SolrIndexSearcher has not changed - not re-opening: 
org.apache.solr.search.SolrIndexSearcher
126056 INFO  (qtp12209492-20) [c:cc5363_dm_documentversion s:shard2 
r:core_node1 x:cc5363_dm_documentversion_shard2_replica1] 
o.a.s.u.DirectUpdateHandler2 end_commit_flush
126055 INFO  (qtp12209492-19) [c:cc5363_dm_documentversion s:shard1 
r:core_node4 x:cc5363_dm_documentversion_shard1_replica1] 
o.a.s.u.DirectUpdateHandler2 No uncommitted changes. Skipping IW.commit.
126057 INFO  (qtp12209492-20) [c:cc5363_dm_documentversion s:shard2 
r:core_node1 x:cc5363_dm_documentversion_shard2_replica1] 
o.a.s.u.p.LogUpdateProcessorFactory [cc5363_dm_documentversion_shard2_replica1] 
 webapp=/solr path=/update 
params={update.distrib=FROMLEADER=true=true=true=false=http://localhost:8983/solr/cc5363_dm_documentversion_shard2_replica2/_end_point=true=javabin=2=false}{commit=}
 0 10
126059 INFO  (qtp12209492-19) [c:cc5363_dm_documentversion s:shard1 
r:core_node4 x:cc5363_dm_documentversion_shard1_replica1] o.a.s.c.SolrCore 
SolrIndexSearcher has not changed - not re-opening: 
org.apache.solr.search.SolrIndexSearcher
126063 INFO  (qtp12209492-19) [c:cc5363_dm_documentversion s:shard1 
r:core_node4 x:cc5363_dm_documentversion_shard1_replica1] 
o.a.s.u.DirectUpdateHandler2 end_commit_flush
126064 INFO  (qtp12209492-19) [c:cc5363_dm_documentversion s:shard1 
r:core_node4 x:cc5363_dm_documentversion_shard1_replica1] 
o.a.s.u.p.LogUpdateProcessorFactory [cc5363_dm_documentversion_shard1_replica1] 
 webapp=/solr path=/update 
params={update.distrib=FROMLEADER=true=true=true=false=http://localhost:8983/solr/cc5363_dm_documentversion_shard2_replica2/_end_point=true=javabin=2=false}{commit=}
 0 13

 
I used the CloudSolrClient.deleteById(collection, id); to delete the document.
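For reference, here is a sketch of the raw HTTP equivalent of that delete (not
the client code actually used; the collection name and id are copied from the
log above). The helper only builds the URL and JSON body:

```python
import json

def build_delete_request(base_url, collection, doc_id):
    """Build the URL and JSON body for a delete-by-id against the
    /update handler; commit=true makes the deletion visible right away."""
    url = f"{base_url}/solr/{collection}/update?commit=true"
    body = json.dumps({"delete": {"id": doc_id}})
    return url, body

url, body = build_delete_request("http://localhost:8983",
                                 "cc5363_dm_documentversion", "12535")
print(url)
print(body)
```

The request itself can then be sent with curl or any HTTP client.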
 
According to the logs, Solr thinks that nothing has changed and does not 
recreate the searcher so I tried to restart the instances but the document was 
still there.
Finally, I was able to manually delete the document via the following request:
 
POST 

Re: Recommended api/lib to search Solr using PHP

2016-05-30 Thread Shawn Heisey
On 5/30/2016 12:32 PM, GW wrote:
> I would say: look at the URLs for the searches you build in the query tool.
>
> In my case
>
> http://172.16.0.1:8983/solr/#/products/query
>
> When you build queries with the Query tool, for example an edismax query,
> the URL is there for you to copy.
> Use the url structure with curl in your programming/scripting. The results
> come back as REST data.
>
> This is what I do with PHP and it's pretty tight.

Be careful with URLs in the admin UI.

URLs with "#" in them will *only* work in a browser.  They are not the
REST endpoints. 

When you run a query in the admin UI, it will give you a URL to make the
same query, but it will NOT be the URL in the address bar of the
browser.  There is a link right above the query results.
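To make the difference concrete, here is a small sketch that maps an admin-UI
query URL onto the real REST endpoint. It only handles the simple core/query
case; everything after "#" never reaches the server:

```python
def admin_url_to_rest(admin_url):
    """Convert an admin-UI query URL like
    http://host:8983/solr/#/products/query
    into the REST endpoint http://host:8983/solr/products/select.
    The fragment after '#' is interpreted by the browser only, which is
    why the admin URL does not work from scripts."""
    base, _, fragment = admin_url.partition("/#/")
    core = fragment.split("/")[0]
    return f"{base}/{core}/select"

print(admin_url_to_rest("http://172.16.0.1:8983/solr/#/products/query"))
# http://172.16.0.1:8983/solr/products/select
```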

Thanks,
Shawn



Re: Recommended api/lib to search Solr using PHP

2016-05-30 Thread GW
I would say: look at the URLs for the searches you build in the query tool.

In my case

http://172.16.0.1:8983/solr/#/products/query

When you build queries with the Query tool, for example an edismax query,
the URL is there for you to copy.
Use the url structure with curl in your programming/scripting. The results
come back as REST data.

This is what I do with PHP and it's pretty tight.
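In script form (Python here purely for illustration; the same URL works from
PHP with curl or fopen), building such an edismax query might look like this.
Host, core, and the qf fields are placeholders:

```python
from urllib.parse import urlencode

def build_edismax_query(base_url, core, user_query, qf="name description"):
    """Assemble an edismax /select URL like the one the admin query tool
    shows; fetching it with any HTTP client returns JSON."""
    params = {"q": user_query, "defType": "edismax", "qf": qf, "wt": "json"}
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"

print(build_edismax_query("http://172.16.0.1:8983", "products", "red shoes"))
```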


On 30 May 2016 at 02:29, scott.chu  wrote:

>
> We have two legacy in-house applications written in PHP 5.2.6 and 5.5.3.
> Our engineers currently just use fopen with a URL to search Solr, but it's
> not quite enough when we want to do more advanced, complex queries. We've
> tried to use something called 'Solarium', but its installation steps have
> something to do with Symfony, which is kinda complicated. We can't get the
> installation done OK. I'd like to know if there are some other
> better-structured PHP libraries or APIs?
>
> Note: Solr is 5.4.1.
>
> scott.chu,scott@udngroup.com
> 2016/5/30 (Mon)
>


Re: Recommended api/lib to search Solr using PHP

2016-05-30 Thread John Blythe
We also use Solarium. The documentation is pretty spotty in some cases (though
they've recently updated it, or at least the formatting, which seems to be
a move in the right direction), but overall it's pretty simple to use. There are
some good plugins at hand to help extend the base power, too. I'd say give it a whirl.

-- 
*John Blythe*
Product Manager & Lead Developer

251.605.3071 | j...@curvolabs.com
www.curvolabs.com

58 Adams Ave
Evansville, IN 47713

On Mon, May 30, 2016 at 4:27 AM, Georg Sorst  wrote:

> We've had good experiences with Solarium, so it's probably worth spending
> some time in getting it to run.
>
> scott.chu  wrote on Mon, 30 May 2016 at 09:30:
>
> >
> > We have two legacy in-house applications written in PHP 5.2.6 and 5.5.3.
> > Our engineers currently just use fopen with a URL to search Solr, but it's
> > not quite enough when we want to do more advanced, complex queries. We've
> > tried to use something called 'Solarium', but its installation steps have
> > something to do with Symfony, which is kinda complicated. We can't get the
> > installation done OK. I'd like to know if there are some other
> > better-structured PHP libraries or APIs?
> >
> > Note: Solr is 5.4.1.
> >
> > scott.chu,scott@udngroup.com
> > 2016/5/30 (Mon)
> >
>


Re: Recommended api/lib to search Solr using PHP

2016-05-30 Thread Shawn Heisey
On 5/30/2016 1:29 AM, scott.chu wrote:
> We have two legacy in-house applications written in PHP 5.2.6 and 5.5.3. Our
> engineers currently just use fopen with a URL to search Solr, but it's not
> quite enough when we want to do more advanced, complex queries. We've tried to
> use something called 'Solarium', but its installation steps have something to do
> with Symfony, which is kinda complicated. We can't get the installation done
> OK. I'd like to know if there are some other better-structured PHP libraries
> or APIs?

There are a *lot* of PHP clients out there.  Note that none of them can
be supported here, because they are all third-party software.

https://wiki.apache.org/solr/IntegratingSolr#PHP

Thanks,
Shawn



Re: Boost(bf) function in solr

2016-05-30 Thread Doug Turnbull
Let's say you're building search for your blog. If popularity is, say, number
of page views, then a handful might have a million (they made it to Hacker
News and Slashdot). A few dozen may have hundreds of thousands (they only
made it to Slashdot). The vast majority might have fewer than 100 page
views.

Taking the log ensures you're boosting less aggressively by popularity.
You're letting the unpopular articles still compete.

I might also recommend a multiplicative boost in this case. log(popularity)
* text relevance
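The dampening is easy to see numerically; assuming base-10 logs and page-view
counts like the ones above:

```python
import math

# Raw popularity spans four orders of magnitude...
views = {"front-page hit": 1_000_000,
         "slashdot only": 100_000,
         "typical post": 100}

# ...but after log10 the spread collapses from 10000x to a factor of 3
# (log10 values of 6, 5, and 2), so text relevance can still compete.
for name, v in views.items():
    print(f"{name:>15}: views={v:>9,} log10={math.log10(v):.1f}")
```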

Shameless plug, but this is discussed quite a bit in chapter 7 of my book,
Relevant Search. Email me directly if you'd like a discount code

http://manning.com/turnbull

Best
Doug
On Mon, May 30, 2016 at 9:03 AM Mugeesh Husain  wrote:

> Hi,
> Could anyone explain to me why people use a log function for boosting, like
> below?
> product(log(sum(popularity,weight)),100)^20
>
> What is the log function? Please elaborate.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Boost-bf-function-in-solr-tp4279792.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


SQL Interface vs geofilter (radius search)

2016-05-30 Thread Vachon , Jean-Sébastien
Hi All,

Does the SQL interface allow searching around a specific lat/long coordinates 
for all documents within a radius of 50 kilometers?
If so, what is the syntax to perform such a query?

Thanks




CEB Canada Inc. Registration No: 1781071. Registered office: 199 Bay Street 
Commerce Court West, # 2800, Toronto, Ontario, Canada, M5L 1AP.



This e-mail and/or its attachments are intended only for the use of the 
addressee(s) and may contain confidential and legally privileged information 
belonging to CEB and/or its subsidiaries, including SHL. If you have received 
this e-mail in error, please notify the sender and immediately, destroy all 
copies of this email and its attachments. The publication, copying, in whole or 
in part, or use or dissemination in any other way of this e-mail and 
attachments by anyone other than the intended person(s) is prohibited.




Boost(bf) function in solr

2016-05-30 Thread Mugeesh Husain
Hi,
Could anyone explain to me why people use a log function for boosting, like below?
product(log(sum(popularity,weight)),100)^20

What is the log function? Please elaborate.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Boost-bf-function-in-solr-tp4279792.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: searching in two indices

2016-05-30 Thread Bernd Fehling
Thanks for sharing your solution and experience.

I'm just thinking about loading all article data (100 million)
and all personal data (4 million) into one core with a selector
field "db" containing either "article" or "pdata".
But I'm still not really satisfied with this solution.

Anyway, MySQL is a good hint.

Regards,
Bernd
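A sketch of what querying that combined core could look like (purely
illustrative; "db" is the selector field described above):

```python
from urllib.parse import urlencode

def combined_core_query(q, record_type):
    """Query one combined core holding both record types; the 'db'
    selector field becomes a cheap, cacheable filter query."""
    return urlencode({"q": q, "fq": f"db:{record_type}", "wt": "json"})

print(combined_core_query("solr", "article"))   # q=solr&fq=db%3Aarticle&wt=json
```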

On 30.05.2016 at 13:36, John Blythe wrote:
> We had previously done something of the sort. With some sources of truth type 
> of cores we would do initial searches on customer transaction data before 
> fetching the related information from those "truth" tables. We would use the 
> various pertinent fields from results #1 to find related data in core #2.
> 
> We moved just last week to making this match during the initial processing
> stage of core #1. Instead of processing our data during a large XML import, we
> moved toward having the processing save this information in a new table (in
> MySQL) and then having Solr do a quick read directly from that source. It's
> given us more flexibility and a ton more speed/efficiency in terms of giving
> our users that second-tier data right out of the gate with their first result set.
> 
> Worth noting: our hand was a bit forced as some search results would need to 
> be in the thousands and as such the secondary lookup would be incredibly slow 
> and painful, so YMMV
> 
> 
> On May 30, 2016, 6:21 AM -0400, Bernd 
> Fehling, wrote:
>> Does anyone have experience with searching in two indices?
>>
>> E.g. having one index with nearly static data (like personal data)
>> and a second index with articles that change quite frequently.
>>
>> A search would then start with articles and, from the list of results
>> (e.g. first page, 10 articles), start a sub-search in the second
>> index for personal data to display the results side by side.
>>
>> Has anyone managed this and how?
>>
>> If not, how would you try to solve this?
>>
>>
>> Regards,
>> Bernd
> 


Re: searching in two indices

2016-05-30 Thread John Blythe
We had previously done something of the sort. With some sources of truth type 
of cores we would do initial searches on customer transaction data before 
fetching the related information from those "truth" tables. We would use the 
various pertinent fields from results #1 to find related data in core #2.

We moved just last week to making this match during the initial processing
stage of core #1. Instead of processing our data during a large XML import, we
moved toward having the processing save this information in a new table (in
MySQL) and then having Solr do a quick read directly from that source. It's
given us more flexibility and a ton more speed/efficiency in terms of giving
our users that second-tier data right out of the gate with their first result set.

Worth noting: our hand was a bit forced as some search results would need to be 
in the thousands and as such the secondary lookup would be incredibly slow and 
painful, so YMMV


On May 30, 2016, 6:21 AM -0400, Bernd Fehling, 
wrote:
> Does anyone have experience with searching in two indices?
> 
> E.g. having one index with nearly static data (like personal data)
> and a second index with articles that change quite frequently.
> 
> A search would then start with articles and, from the list of results
> (e.g. first page, 10 articles), start a sub-search in the second
> index for personal data to display the results side by side.
> 
> Has anyone managed this and how?
> 
> If not, how would you try to solve this?
> 
> 
> Regards,
> Bernd


Re: After Solr 5.5, mm parameter doesn't work properly

2016-05-30 Thread Jan Høydahl
Hi,

This may be related to SOLR-8812, but still different. Please file a JIRA issue 
for this.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 29 May 2016 at 18:20, Issei Nishigata wrote:
> 
> Hi,
> 
> “mm" parameter does not work properly, when I set "q.op=AND” after Solr 5.5.
> In Solr 5.4, mm parameter works expectedly with the following setting.
> 
> ---
> [schema]
> 
>   
>  <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="2"/>
>   
> 
> 
> 
> [request]
> http://localhost:8983/solr/collection1/select?defType=edismax&q.op=AND&mm=2&q=solar
> —
> 
> After Solr 5.5, the result is not the same as in Solr 5.4.
> Has the mm parameter's specified behavior, or its documented configuration, changed?
> 
> 
> [Solr 5.4]
> 
> ...
>   <lst name="params">
>     <str name="mm">2</str>
>     <str name="q">solar</str>
>     <str name="defType">edismax</str>
>     <str name="q.op">AND</str>
>   </lst>
> ...
> 
>   
> 0
> 
>   solr
> 
>   
> 
> 
>   <str name="rawquerystring">solar</str>
>   <str name="querystring">solar</str>
>   <str name="parsedquery">
>   (+DisjunctionMaxQuery(((text:so text:ol text:la text:ar)~2)))/no_coord
>   </str>
>   <str name="parsedquery_toString">+(((text:so text:ol text:la
> text:ar)~2))</str>
>   ...
> 
> 
> 
> 
> [Solr 6.0.1]
> 
> 
> ...
>   <lst name="params">
>     <str name="mm">2</str>
>     <str name="q">solar</str>
>     <str name="defType">edismax</str>
>     <str name="q.op">AND</str>
>   </lst>
> ...
> 
>   
> <str name="rawquerystring">solar</str>
> <str name="querystring">solar</str>
> <str name="parsedquery">
> (+DisjunctionMaxQuery(((+text:so +text:ol +text:la +text:ar))))/no_coord
> </str>
> <str name="parsedquery_toString">+((+text:so +text:ol +text:la
> +text:ar))</str>
> ...
> 
> 
> As shown above, parsedquery also differs between Solr 5.4 and Solr 6.0.1
> (after Solr 5.5).
> 
> 
> —
> Thanks 
> Issei Nishigata
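For reference, the bigram tokens in the debug output above can be reproduced
outside Solr. This standalone sketch shows why "solar" becomes four tokens,
all of which Solr 6 now requires when q.op=AND:

```python
def ngrams(term, n=2):
    """Reproduce what an NGram analysis with minGramSize=maxGramSize=2
    emits for a single term."""
    return [term[i:i + n] for i in range(len(term) - n + 1)]

# "solar" yields four bigrams; with q.op=AND Solr 6 requires all four,
# which is why mm=2 no longer has any effect on the parsed query.
print(ngrams("solar"))   # ['so', 'ol', 'la', 'ar']
```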



searching in two indices

2016-05-30 Thread Bernd Fehling
Does anyone have experience with searching in two indices?

E.g. having one index with nearly static data (like personal data)
and a second index with articles that change quite frequently.

A search would then start with articles and, from the list of results
(e.g. first page, 10 articles), start a sub-search in the second
index for personal data to display the results side by side.

Has anyone managed this and how?

If not, how would you try to solve this?


Regards,
Bernd
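One client-side sketch of the two-step approach: run the article query first,
then build a single follow-up query against the personal-data index from the
results. Field names ("author_id", "id") are assumptions:

```python
def build_person_subquery(article_docs, join_field="author_id"):
    """Given the first page of article results (list of dicts), build a
    terms filter that fetches the matching personal-data records from
    the second index in one follow-up request."""
    ids = sorted({doc[join_field] for doc in article_docs if join_field in doc})
    if not ids:
        return None
    return {"q": "*:*", "fq": f"{{!terms f=id}}{','.join(ids)}"}

page = [{"title": "a1", "author_id": "p7"}, {"title": "a2", "author_id": "p3"}]
print(build_person_subquery(page))  # fq={!terms f=id}p3,p7
```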


Re: Activate Fuzzy Queries for each term by default

2016-05-30 Thread Sebastian Landwehr
The „Did you mean“ thing is more the spell checker, which I already included. 
Fuzzy Queries are for terms where multiple spellings in fact exist in the 
index. At least that’s what I’m aiming at.

> Am 30.05.2016 um 10:33 schrieb Georg Sorst :
> 
> AFAIK this is not possible, but it probably doesn't make so much sense
> either. In my experience fuzzy search should be explicit to the user
> (Google does a pretty good job at this, eg. "Did you mean" etc.).
> 
> What are you trying to achieve and what results do you want to return?
> 
> Sebastian Landwehr  schrieb am Mo., 30. Mai 2016 um
> 09:41 Uhr:
> 
>> Hi there,
>> 
>> I got a question regarding fuzzy queries:
>> 
>> I know that I can create a fuzzy query by appending a „~" with the maximal
>> edit distance to a word. Is it also possible to automatically create a
>> fuzzy query for each search term? I know that I could theoretically append
>> the „~" programmatically, but it seems to be hard to handle all features if
>> the query syntax.
>> 
>> Thanks and best wishes,
>> Sebastian



Re: Solr 6 CDCR does not work

2016-05-30 Thread Renaud Delbru

Hi Adam,

could you check the responses of the monitoring commands [1] (QUEUES,
ERRORS, OPS)? This might help in understanding whether documents are flowing
or whether there are issues.


Also, do you have autocommit configured on the target? CDCR does not
replicate commits, and therefore you have to send a commit command on the
target to ensure that the latest replicated documents are visible.


[1] 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62687462#CrossDataCenterReplication%28CDCR%29-Monitoringcommands
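The monitoring actions are plain GET requests against the collection's /cdcr
handler; a small helper to build them (host and collection are placeholders):

```python
def cdcr_monitor_urls(base_url, collection):
    """Build the CDCR monitoring URLs (QUEUES, ERRORS, OPS) for one
    collection; each returns a status document you can inspect."""
    handler = f"{base_url}/solr/{collection}/cdcr"
    return {a: f"{handler}?action={a}" for a in ("QUEUES", "ERRORS", "OPS")}

for action, url in cdcr_monitor_urls("http://sourceip:8983", "corehol").items():
    print(action, url)
```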


--
Renaud Delbru

On 29/05/16 12:10, Adam Majid Sanjaya wrote:

I’m testing Solr 6 CDCR, but it does not seem to be working.

Source configuration:

   
<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
  <lst name="replica">
    <str name="zkHost">targetzkip:2181</str>
    <str name="source">corehol</str>
    <str name="target">corehol</str>
  </lst>

  <lst name="replicator">
    <str name="threadPoolSize">1</str>
    <str name="schedule">1000</str>
    <str name="batchSize">128</str>
  </lst>

  <lst name="updateLogSynchronizer">
    <str name="schedule">5000</str>
  </lst>
</requestHandler>

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog class="solr.CdcrUpdateLog">
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
</updateHandler>


Target(s) configuration:

   
<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
  <lst name="buffer">
    <str name="defaultState">disabled</str>
  </lst>
</requestHandler>

<updateRequestProcessorChain name="cdcr-proccessor-chain">
  <processor class="solr.CdcrUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">cdcr-proccessor-chain</str>
  </lst>
</requestHandler>

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog class="solr.CdcrUpdateLog">
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
</updateHandler>


Source Log: no cdcr
Target Log: no cdcr

Create a core (solrconfig.xml modification directly from the folder
data_driven_schema_configs):
#bin/solr create -c corehol -p 8983

Start cross-data center replication by running the START command on the
source data center
http://sourceip:8983/solr/corehol/cdcr?action=START

Disable buffer by running the DISABLEBUFFER command on the target data
center
http://targetip:8983/solr/corehol/cdcr?action=DISABLEBUFFER

The documents are not replicated to the target zone.

What should I examine?





Re: Activate Fuzzy Queries for each term by default

2016-05-30 Thread Georg Sorst
AFAIK this is not possible, but it probably doesn't make so much sense
either. In my experience fuzzy search should be explicit to the user
(Google does a pretty good job at this, eg. "Did you mean" etc.).

What are you trying to achieve and what results do you want to return?

Sebastian Landwehr  schrieb am Mo., 30. Mai 2016 um
09:41 Uhr:

> Hi there,
>
> I got a question regarding fuzzy queries:
>
> I know that I can create a fuzzy query by appending a „~" with the maximal
> edit distance to a word. Is it also possible to automatically create a
> fuzzy query for each search term? I know that I could theoretically append
> the „~" programmatically, but it seems to be hard to handle all features if
> the query syntax.
>
> Thanks and best wishes,
> Sebastian


Re: Recommended api/lib to search Solr using PHP

2016-05-30 Thread Georg Sorst
We've had good experiences with Solarium, so it's probably worth spending
some time in getting it to run.

> scott.chu  wrote on Mon, 30 May 2016 at 09:30:

>
> > We have two legacy in-house applications written in PHP 5.2.6 and 5.5.3.
> > Our engineers currently just use fopen with a URL to search Solr, but it's
> > not quite enough when we want to do more advanced, complex queries. We've
> > tried to use something called 'Solarium', but its installation steps have
> > something to do with Symfony, which is kinda complicated. We can't get the
> > installation done OK. I'd like to know if there are some other
> > better-structured PHP libraries or APIs?
>
> Note: Solr is 5.4.1.
>
> scott.chu,scott@udngroup.com
> 2016/5/30 (Mon)
>


Activate Fuzzy Queries for each term by default

2016-05-30 Thread Sebastian Landwehr
Hi there,

I got a question regarding fuzzy queries:

I know that I can create a fuzzy query by appending a „~" with the maximal edit 
distance to a word. Is it also possible to automatically create a fuzzy query 
for each search term? I know that I could theoretically append the „~" 
programmatically, but it seems to be hard to handle all features of the query 
syntax.

Thanks and best wishes,
Sebastian
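A naive pre-processor along these lines is easy to sketch, and also shows
where it breaks down: it only handles bare terms and skips anything that
already looks like query syntax:

```python
import re

def fuzzify(query, max_edits=2):
    """Append ~N to every bare term; leave operators, field-qualified
    terms, phrases and terms that already carry a modifier untouched.
    A real implementation would need a proper query parser -- this is
    only a sketch of the idea."""
    def repl(m):
        term = m.group(0)
        # don't fuzz the boolean operators
        return term if term in ("AND", "OR", "NOT", "TO") else f"{term}~{max_edits}"
    # a bare term: not adjacent to ~ * : . " or another word character
    return re.sub(r'(?<![\w~:.*"])\w+(?![\w~:.*"])', repl, query)

print(fuzzify("solr search"))     # solr~2 search~2
print(fuzzify("title:solr"))      # title:solr  (left as-is)
```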

Recommended api/lib to search Solr using PHP

2016-05-30 Thread scott.chu

We have two legacy in-house applications written in PHP 5.2.6 and 5.5.3. Our 
engineers currently just use fopen with a URL to search Solr, but it's not 
quite enough when we want to do more advanced, complex queries. We've tried to 
use something called 'Solarium', but its installation steps have something to 
do with Symfony, which is kinda complicated. We can't get the installation 
done OK. I'd like to know if there are some other better-structured PHP 
libraries or APIs? 

Note: Solr is 5.4.1.

scott.chu,scott@udngroup.com
2016/5/30 (Mon)