Not (!) operator

2016-05-26 Thread Anil
Hi,

We have a status text field in our Solr documents, and it is optional.

The search query status:!Closed returns documents with no status as well.

How do I get only documents that have a status and whose status is not Closed?

One way is status:* AND status:!Closed. Is there any other way? Thanks.
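
For reference, the same workaround written out in plain Lucene/Solr syntax, where the range clause requires the field to have a value:

q=status:[* TO *] AND -status:Closed

status:[* TO *] matches only documents where status has a value, and -status:Closed then drops the Closed ones, so documents with no status no longer match.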


Regards,
Anil


Re: How to save index data to other place? [scottchu]

2016-05-26 Thread scott.chu

I want to migrate my SolrCloud from Windows to CentOS. Because I am new to
CentOS, not familiar with how to install Solr on it, and had done a lot of
configuration in my SolrCloud on Windows, I used FTP to upload the solr-5.4.1
and zookeeper-3.4.6 folders to 3 different servers running CentOS. (They are
all under /local.) Then I tweaked something on the 3rd machine (see my other
post titled "Can "Using cp replica and modifying core.properties" rather than
ADDREPLICA API work?") and got my SolrCloud running with 3 replicas OK.

I do wish to follow Solr's default folder/file conventions. So can you show me
(or give me hints on) how to:

* Install Solr and ZooKeeper with the install shell script under CentOS 6.4?
* Auto-start Solr and ZooKeeper under CentOS 6.4?

Thanks in advance!

scott.chu,scott@udngroup.com
2016/5/27 (週五)
- Original Message - 
From: Shawn Heisey 
To: solr-user 
CC: 
Date: 2016/5/27 (週五) 02:34
Subject: Re: How to save index data to other place? [scottchu]


On 5/25/2016 10:16 PM, scott.chu wrote: 
> Thanks! I thought I had to tune solrconfig.xml. 
> 
> scott.chu,scott@udngroup.com 
> 2016/5/26 (週四) 
> - Original Message - 
> From: Jay Potharaju 
> To: solr-user ; scott(自己) 
> CC: 
> Date: 2016/5/26 (週四) 11:31 
> Subject: Re: How to save index data to other place? [scottchu] 
> 
> 
> use property.*dataDir*=*value* 
> https://cwiki.apache.org/confluence/display/solr/Defining+core.properties 

In general, I only place *relative* paths in dataDir ... but for most 
people, I would actually recommend not setting dataDir at all, and 
letting it default to "data". 

Rather, what I would do is set the solr home to another location, so 
*all* cores live there by default, and let Solr create its default 
directory structure in that location. 

You haven't indicated in this thread whether you're running on a UNIX or 
UNIX-like OS such as Linux, or whether you're running on Windows. You 
used both slashes and backslashes when you described your actual and 
desired index paths. 

For most operating systems other than Windows, I strongly recommend 
using the solr installer shell script. This script has a concept of a 
"var dir", and the solr home is a directory inside that. 

If you are using the installer shell script to install Solr and do not 
provide any options, then Solr itself will install to /opt/solr-X.Y.Z, a 
symlink at /opt/solr will be created that points to the install 
directory, and Solr will set its var dir to /var/solr. A configuration 
script will be created at /etc/default/solr.in.sh. In the configuration 
script, the solr home gets set to /var/solr/data. Each core created 
would create a directory under /var/solr/data with the core's name, and 
inside that directory would be a data directory, containing the index 
directory and the tlog directory. These paths assume you've got 5.5 or 
later. 

On a dev server, I used the installer script with a service name (-s 
option) of solr5. With this option, most of the paths get changed from 
the default. The symlink is /opt/solr5 pointing to /opt/solr-5.5.1, the 
var dir is /var/solr5, the configuration script is 
/etc/default/solr5.in.sh, and the solr home is /var/solr5/data. 

Thanks, 
Shawn 





Can "Using cp replica and modifying core.properties" rather than ADDREPLICA API work?

2016-05-26 Thread scott.chu

In my lab, on a Windows PC:

2 SolrCloud nodes, 1 collection named cugna, with numShards=1 and
replicationFactor=2, with the index built up to 90GB.

After it worked, I migrated them to CentOS (1 node per machine), but I want to
add a 3rd node on a 3rd machine. I think that since there's only 1 shard,
replicationFactor is only a "startup" parameter, not a "limitation". So I did
these tasks:

* Copy node 2's solr directory to the 3rd machine
* Go into solr.home
* Rename the folder 'cugna_shard1_replica2' to 'cugna_shard1_replica3'
* Go into the 'cugna_shard1_replica3' folder and edit core.properties (see the
  sketch below), changing the 'name' parameter to 'cugna_shard1_replica3' and
  the 'coreNodeName' parameter to 'core_node3'
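
A hedged sketch of the edited core.properties after those changes (the
collection and shard lines are assumed to carry over unchanged from the copied
replica):

name=cugna_shard1_replica3
coreNodeName=core_node3
collection=cugna
shard=shard1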

Then I start the 3 nodes, and they look OK when I view the cloud diagram in
the admin UI.

However, I'm wondering whether this is going to be OK, or whether something
might cause an inconsistency that doesn't show in the admin UI?

p.s. I did this because I want to save the time of creating a new replica.

scott.chu,scott@udngroup.com
2016/5/27 (週五)


Re: Facet data type

2016-05-26 Thread Nick D
Although you did mention that you won't need to sort and that you are using
multivalued=true, on the off chance you do change something like
multivalued=false docValues=false, then this will come into play:

https://issues.apache.org/jira/browse/SOLR-7495

This has been a rather large pain to deal with in terms of faceting. (The
Lucene change that caused a number of issues is also referenced in this
JIRA.)

Nick


On Thu, May 26, 2016 at 11:45 AM, Erick Erickson 
wrote:

> I always prefer ints to strings, they can't help but take
> up less memory, comparing two ints is much faster than
> two strings etc. Although Lucene can play some tricks
> to make that less noticeable.
>
> Although if these are just a few values, it'll be hard to
> actually measure the perf difference.
>
> And if it's a _lot_ of unique values, you have other problems
> than the int/string distinction. Faceting on very high
> cardinality fields is something that can have performance
> implications.
>
> But I'd certainly add docValues="true" to the definition no matter
> which you decide on.
>
> Best,
> Erick
>
> On Wed, May 25, 2016 at 9:29 AM, Steven White 
> wrote:
> > Hi everyone,
> >
> > I will be faceting on data of type integers and I'm wondering if there is
> > any difference in how I design my schema.  I have no need to sort or use
> > range facets; given this, in terms of Lucene performance and index size,
> > does it make any difference if I use:
> >
> > #1: <field name="..." type="string" indexed="true"
> > required="true" stored="false"/>
> >
> > Or
> >
> > #2: <field name="..." type="int" indexed="true"
> > required="true" stored="false"/>
> >
> > (notice how I changed the "type" from "string" to "int" in #2)
> >
> > Thanks in advance.
> >
> > Steve
>


Re: "data import handler : import data from sql database :how to search in all fields"

2016-05-26 Thread Erick Erickson
There's nothing saying you have
to highlight fields you search on. So you
can specify hl.fl to be the "normal" (perhaps
stored-only) fields and still search on the
uber-field.
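
For illustration, a hedged sketch of that setup; the field and type names here
are assumptions, not from the thread:

<field name="uber_field" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="*" dest="uber_field"/>

and a query that searches the uber-field while highlighting the normal stored
fields:

q=some+term&df=uber_field&hl=true&hl.fl=title,content

multiValued="true" matters because many source fields are copied into the
single destination field.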

Best,
Erick

On Thu, May 26, 2016 at 2:08 PM, kostali hassan
 wrote:
> I did it: I copied all my dynamic fields into the text field and it works
> great. Just one question: even though I copied text into content, and the
> inverse, to get highlighting, that does not work. Is there another way to get
> highlighting? Thank you, Erick.
>
> 2016-05-26 18:28 GMT+01:00 Erick Erickson :
>
>> And, you can copy all of the fields into an "uber field" using the
>> copyField directive and just search the "uber field".
>>
>> Best,
>> Erick
>>
>> On Thu, May 26, 2016 at 7:35 AM, kostali hassan
>>  wrote:
>> > Thank you, that makes sense.
>> > Have a good day.
>> >
>> > 2016-05-26 15:31 GMT+01:00 Siddhartha Singh Sandhu > >:
>> >
>> >> The schema.xml/managed_schema defines the default search field as
>> `text`.
>> >>
>> >> You can make all fields that you want searchable type `text`.
>> >>
>> >> On Thu, May 26, 2016 at 10:23 AM, kostali hassan <
>> >> med.has.kost...@gmail.com>
>> >> wrote:
>> >>
>> >> > I import data from SQL databases with DIH. I am looking to search for a
>> >> > term in all fields, not field by field.
>> >> >
>> >>
>>


Re: Issues with coordinates in Solr during updating of fields

2016-05-26 Thread Erick Erickson
Should be fine. When the location field is
re-indexed (as it is with Atomic Updates)
the two fields will be filled back in.

Best,
Erick

On Thu, May 26, 2016 at 4:45 PM, Zheng Lin Edwin Yeo
 wrote:
> Thanks Erick for your reply.
>
> It works when I remove the 'stored="true" ' from the gps_0_coordinate and
> gps_1_coordinate.
>
> But will this affect the search functions of the gps coordinates in the
> future?
>
> Yes, I am referring to Atomic Updates.
>
> Regards,
> Edwin
>
>
> On 27 May 2016 at 02:02, Erick Erickson  wrote:
>
>> Try removing the 'stored="true" ' from the gps_0_coordinate and
>> gps_1_coordinate.
>>
> When you say "...tried to do an update on any other fields" I'm assuming
>> you're
>> talking about Atomic Updates, which require that the destinations of
>> copyFields are single valued. Under the covers the location type is
>> split and copied to the other two fields so I suspect that's what's going
>> on.
>>
>> And you could also try one of the other types, see:
>> https://cwiki.apache.org/confluence/display/solr/Spatial+Search
>>
>> Best,
>> Erick
>>
>> On Thu, May 26, 2016 at 1:46 AM, Zheng Lin Edwin Yeo
>>  wrote:
>> > Anyone has any solutions to this problem?
>> >
>> > I tried to remove the gps_0_coordinate and gps_1_coordinate, but I will
>> get
>> > the following error during indexing.
>> > ERROR: [doc=id1] unknown field 'gps_0_coordinate'
>> >
>> > Regards,
>> > Edwin
>> >
>> >
>> > On 25 May 2016 at 11:37, Zheng Lin Edwin Yeo 
>> wrote:
>> >
>> >> Hi,
>> >>
>> >> I have an implementation of storing the coordinates in Solr during
>> >> indexing.
>> >> During indexing, I will only store the value in the field name ="gps".
>> For
>> >> the field name = "gps_0_coordinate" and "gps_1_coordinate", the value
>> will
>> >> be auto filled and indexed from the "gps" field.
>> >>
>> >> <field name="gps" type="location" indexed="true" stored="true" required="false"/>
>> >> <field name="gps_0_coordinate" type="tdouble" indexed="true" stored="true" required="false"/>
>> >> <field name="gps_1_coordinate" type="tdouble" indexed="true" stored="true" required="false"/>
>> >>
>> >> But when I tried to do an update on any other fields in the index, Solr
>> >> will try to add another value in the "gps_0_coordinate" and
>> >> "gps_1_coordinate". However, as these 2 fields are not multi-Valued, it
>> >> will lead to an error:
>> >> multiple values encountered for non multiValued field gps_0_coordinate:
>> >> [1.0,1.0]
>> >>
>> >> Does anyone knows how we can solve this issue?
>> >>
>> >> I am using Solr 5.4.0
>> >>
>> >> Regards,
>> >> Edwin
>> >>
>>


Re: Issues with coordinates in Solr during updating of fields

2016-05-26 Thread Zheng Lin Edwin Yeo
Thanks Erick for your reply.

It works when I remove the 'stored="true" ' from the gps_0_coordinate and
gps_1_coordinate.

But will this affect the search functions of the gps coordinates in the
future?

Yes, I am referring to Atomic Updates.

Regards,
Edwin


On 27 May 2016 at 02:02, Erick Erickson  wrote:

> Try removing the 'stored="true" ' from the gps_0_coordinate and
> gps_1_coordinate.
>
> When you say "...tried to do an update on any other fields" I'm assuming
> you're
> talking about Atomic Updates, which require that the destinations of
> copyFields are single valued. Under the covers the location type is
> split and copied to the other two fields so I suspect that's what's going
> on.
>
> And you could also try one of the other types, see:
> https://cwiki.apache.org/confluence/display/solr/Spatial+Search
>
> Best,
> Erick
>
> On Thu, May 26, 2016 at 1:46 AM, Zheng Lin Edwin Yeo
>  wrote:
> > Anyone has any solutions to this problem?
> >
> > I tried to remove the gps_0_coordinate and gps_1_coordinate, but I will
> get
> > the following error during indexing.
> > ERROR: [doc=id1] unknown field 'gps_0_coordinate'
> >
> > Regards,
> > Edwin
> >
> >
> > On 25 May 2016 at 11:37, Zheng Lin Edwin Yeo 
> wrote:
> >
> >> Hi,
> >>
> >> I have an implementation of storing the coordinates in Solr during
> >> indexing.
> >> During indexing, I will only store the value in the field name ="gps".
> For
> >> the field name = "gps_0_coordinate" and "gps_1_coordinate", the value
> will
> >> be auto filled and indexed from the "gps" field.
> >>
> >> <field name="gps" type="location" indexed="true" stored="true" required="false"/>
> >> <field name="gps_0_coordinate" type="tdouble" indexed="true" stored="true" required="false"/>
> >> <field name="gps_1_coordinate" type="tdouble" indexed="true" stored="true" required="false"/>
> >>
> >> But when I tried to do an update on any other fields in the index, Solr
> >> will try to add another value in the "gps_0_coordinate" and
> >> "gps_1_coordinate". However, as these 2 fields are not multi-Valued, it
> >> will lead to an error:
> >> multiple values encountered for non multiValued field gps_0_coordinate:
> >> [1.0,1.0]
> >>
> >> Does anyone knows how we can solve this issue?
> >>
> >> I am using Solr 5.4.0
> >>
> >> Regards,
> >> Edwin
> >>
>


Re: "data import handler : import data from sql database :how to search in all fields"

2016-05-26 Thread kostali hassan
I did it: I copied all my dynamic fields into the text field and it works
great. Just one question: even though I copied text into content, and the
inverse, to get highlighting, that does not work. Is there another way to get
highlighting? Thank you, Erick.

2016-05-26 18:28 GMT+01:00 Erick Erickson :

> And, you can copy all of the fields into an "uber field" using the
> copyField directive and just search the "uber field".
>
> Best,
> Erick
>
> On Thu, May 26, 2016 at 7:35 AM, kostali hassan
>  wrote:
> > Thank you, that makes sense.
> > Have a good day.
> >
> > 2016-05-26 15:31 GMT+01:00 Siddhartha Singh Sandhu  >:
> >
> >> The schema.xml/managed_schema defines the default search field as
> `text`.
> >>
> >> You can make all fields that you want searchable type `text`.
> >>
> >> On Thu, May 26, 2016 at 10:23 AM, kostali hassan <
> >> med.has.kost...@gmail.com>
> >> wrote:
> >>
> >> > I import data from SQL databases with DIH. I am looking to search for a
> >> > term in all fields, not field by field.
> >> >
> >>
>


Re: Can a DocTransformer access the whole results tree?

2016-05-26 Thread Mikhail Khludnev
public abstract class ResultContext {

  // here are all the results
  public abstract DocList getDocList();

  public abstract ReturnFields getReturnFields();

  public abstract SolrIndexSearcher getSearcher();

  public abstract Query getQuery();

  public abstract SolrQueryRequest getRequest();
}
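
For illustration, a minimal sketch of a transformer that uses that context
field (against the 5.4+-era DocTransformer API; the class name and the output
field are illustrative, and the TransformerFactory registration needed to
actually invoke it is omitted):

import java.io.IOException;

import org.apache.solr.common.SolrDocument;
import org.apache.solr.response.transform.DocTransformer;

public class WholeResultsTransformer extends DocTransformer {

  @Override
  public String getName() {
    return "wholeResults";
  }

  @Override
  public void transform(SolrDocument doc, int docid) throws IOException {
    // context is the protected ResultContext field on DocTransformer, so
    // the whole result set, searcher, query, and request are reachable here.
    doc.setField("hitCount", context.getDocList().matches());
  }
}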

On Thu, May 26, 2016 at 11:25 PM, Upayavira  wrote:

> Hi Mikhail,
>
> Is there really? If I look at ResultContext, I see it is an abstract
> class, completed by BasicResultContext. I don't see any context method
> there. I can see a getContext() on SolrQueryRequest which just returns a
> hashmap. Will I find the response in there? Is that what you are
> suggesting?
>
> Upayavira
>
> On Thu, 26 May 2016, at 06:28 PM, Mikhail Khludnev wrote:
> > Hello,
> >
> > There is a protected ResultContext field named context.
> >
> > On Thu, May 26, 2016 at 5:31 PM, Upayavira  wrote:
> >
> > > Looking at the code for a sample DocTransformer, it seems that a
> > > DocTransformer only has access to the document itself, not to the whole
> > > results. Because of this, it isn't possible to use a DocTransformer to
> > > merge, for example, the highlighting results into the main document.
> > >
> > > Am I missing something?
> > >
> > > Upayavira
> > >
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > Principal Engineer,
> > Grid Dynamics
> >
> > 
> > 
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Can a DocTransformer access the whole results tree?

2016-05-26 Thread Upayavira
Hi Mikhail,

Is there really? If I look at ResultContext, I see it is an abstract
class, completed by BasicResultContext. I don't see any context method
there. I can see a getContext() on SolrQueryRequest which just returns a
hashmap. Will I find the response in there? Is that what you are
suggesting?

Upayavira

On Thu, 26 May 2016, at 06:28 PM, Mikhail Khludnev wrote:
> Hello,
> 
> There is a protected ResultContext field named context.
> 
> On Thu, May 26, 2016 at 5:31 PM, Upayavira  wrote:
> 
> > Looking at the code for a sample DocTransformer, it seems that a
> > DocTransformer only has access to the document itself, not to the whole
> > results. Because of this, it isn't possible to use a DocTransformer to
> > merge, for example, the highlighting results into the main document.
> >
> > Am I missing something?
> >
> > Upayavira
> >
> 
> 
> 
> -- 
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
> 
> 
> 


Re: SolrCloud Shard console shows roughly same number of documents?

2016-05-26 Thread Siddhartha Singh Sandhu
Hi Erick,

Thank you for the reply. What I meant was suppose I have the config:

2 shards each with 1 replica.

Hence, on both servers I have
1. shard1_replica1
2. shard2_replica1

Suppose I have 50 documents. Is it then:
shard1_replica1 + shard2_replica1 = 50 ?

or shard2_replica1 = 50 && shard1_replica1 = 50 ?

Regards,

Sid.

On Thu, May 26, 2016 at 2:30 PM, Erick Erickson 
wrote:

> Q1: Not quite sure what you mean. Let's say I have 2 shards, 3
> replicas each, 16 docs on each. I _think_ you're
> talking about the "core selector", which shows the docs on that
> particular core, 16 in our case not 48.
>
> Q2: Yes, that's how SolrCloud is designed. It has to be for HA/DR.
> Every replica in a shard has all the docs, 16 as above. Otherwise if
> one of your machines went down there could be no guarantee even
> attempted about there not being data loss.
>
> Q3: Yes, indexing will be slower when there is more than one replica
> per shard since the raw document is forwarded from the leader to all
> followers before acking back. In distributed situations, you will have
> a bunch (potentially) more machines doing indexing so total throughput
> can be faster.
>
> Why do you care? Is there a problem or is this just general background
> info? There are a number of techniques for speeding up indexing, the
> first is to use SolrJ and CloudSolrClient and send batches of docs at
> once rather than one-at-a-time.
>
> Best,
> Erick
>
> On Wed, May 25, 2016 at 1:54 PM, Siddhartha Singh Sandhu
>  wrote:
> > Hi,
> >
> > I recently moved to a SolrCloud config. I had a few questions:
> >
> > Q1. Does a shard show cumulative number of documents or documents present
> > in that particular shard on the admin console of respective shard?
> >
> > Q2. If 1's answer is non-cumulative then my shards(on different servers)
> > are indexing all the documents on each instance of shard. Is this
> natural?
> > I created the shards with compositeId.
> >
> > Q3. If the answer to 1 is cumulative then my indexing was slower then a
> > single core instance which was on the same machine of which I have 2
> >  now(my shards). What could I be missing while configuring Solr?
> >
> >
> > I am using Solr 6.0.0 on Ubuntu 14.04 with external zookeeper.
> >
> > Regards,
> >
> > Sid.
>


Re: import efficiencies

2016-05-26 Thread John Blythe
all good ideas and recs, guys. erick, i'd thought of much the same after
reading through the SolrJ post and beginning to get a bit anxious at the
idea of implementation (not a java dev here lol). we're already doing some
processing before the import, taking a few million records, rolling them up
/ flattening them down into single versions of representative data, and
then running some processes on them to get even more insight out of them. i
think the lowest hanging fruit at this point will be to simply include some
extra processing at this stage, further upstream, and grab up the related
data that the DIH is currently straining under due to the plethora of open
connections.

thanks for all the thoughts and sparks flying around on this one, guys!

best,


-- 
*John Blythe*
Product Manager & Lead Developer

251.605.3071 | j...@curvolabs.com
www.curvolabs.com

58 Adams Ave
Evansville, IN 47713

On Thu, May 26, 2016 at 2:42 PM, John Bickerstaff 
wrote:

> Having more carefully read Erick's post - I see that is essentially what he
> said in a much more straightforward way.
>
> I will also second Erick's suggestion of hammering on the SQL.  We found
> that fruitful many times at the same gig.  I develop and am not a SQL
> master.  In a similar situation I'll usually seek out a specialist to help
> me make sure the query isn't wasteful.  It frequently was and I learned a
> lot.
>
> On Thu, May 26, 2016 at 12:31 PM, John Bickerstaff <
> j...@johnbickerstaff.com
> > wrote:
>
> > It may or may not be helpful, but there's a similar class of problem that
> > is frequently solved either by stored procedures or by running the query
> on
> > a time-frame and storing the results...  Doesn't matter if the end-point
> > for the data is Solr or somewhere else.
> >
> > The problem is long running queries that are extremely complex and stress
> > the database performance too heavily.
> >
> > The solution is to de-normalize the data you need... store it in that
> form
> > and then the query gets really fast... sort of like a data warehouse type
> > of thing.  (Don't shoot, I know this isn't data warehousing...)
> >
> > Postgres even has something called an "automatically updateable view"
> that
> > might serve - if that's your back end.
> >
> > Anyway - the underlying strategy is to find a way to flatten your data
> > preparatory to turning it into solr documents by some means - either by
> > getting it out on shorter-running queries all the time into some kind of
> > store (Kafka, text file, whatever) or by using some feature of the
> database
> > (stored procs writing to a summary table, automatically updatable view or
> > similar).
> >
> > In this way, when you make your query, you make it against the
> "flattened"
> > data - which is, ideally, all in one table - and then all the complexity
> of
> > joins etc... is washed away and things ought to run pretty fast.
> >
> > The cost, of course, is a huge table with tons of duplicated data...
> Only
> > you can say if that's worth it.  I did this at my last gig and we
> truncated
> > the table every 2 weeks to prevent it growing forever.
> >
> > In case it's helpful...
> >
> > PS - if you have the resources, a duplicate database can really help here
> > too - again my experience is mostly with Postgres which allows a "warm"
> > backup to be live.  We frequently used this for executive queries that
> were
> > using the database like a data warehouse because they were so
> > time-consuming.  It kept the load off production.
> >
> > On Thu, May 26, 2016 at 12:18 PM, Erick Erickson <
> erickerick...@gmail.com>
> > wrote:
> >
> >> Forgot to add... sometimes really hammering at the SQL query in DIH
> >> can be fruitful, can you make a huge, monster query that's faster than
> >> the sub-queries?
> >>
> >> I've also seen people run processes on the DB that move all the
> >> data into a temporary place making use of all of the nifty stuff you
> >> can do there and then use DIH on _that_. Or the view.
> >>
> >> All that said, I generally prefer using SolrJ if DIH doesn't do the job
> >> after a day or two of fiddling, it gives more control.
> >>
> >> Good Luck!
> >> Erick
> >>
> >> On Thu, May 26, 2016 at 11:02 AM, John Blythe 
> wrote:
> >> > oo gotcha. cool, will make sure to check it out and bounce any related
> >> > questions through here.
> >> >
> >> > thanks!
> >> >
> >> > best,
> >> >
> >> >
> >> > --
> >> > *John Blythe*
> >> > Product Manager & Lead Developer
> >> >
> >> > 251.605.3071 | j...@curvolabs.com
> >> > www.curvolabs.com
> >> >
> >> > 58 Adams Ave
> >> > Evansville, IN 47713
> >> >
> >> > On Thu, May 26, 2016 at 1:45 PM, Erick Erickson <
> >> erickerick...@gmail.com>
> >> > wrote:
> >> >
> >> >> Solr commits aren't the issue I'd guess. All the time is
> >> >> probably being spent getting the data from MySQL.
> >> >>
> >> >> I've had some luck writing to Solr from a DB through a
> >> >> SolrJ program, here's a place to get started:
> >> >> searchhub.org/2012/02/14/indexing-with-solrj/

Re: Facet data type

2016-05-26 Thread Erick Erickson
I always prefer ints to strings, they can't help but take
up less memory, comparing two ints is much faster than
two strings etc. Although Lucene can play some tricks
to make that less noticeable.

Although if these are just a few values, it'll be hard to
actually measure the perf difference.

And if it's a _lot_ of unique values, you have other problems
than the int/string distinction. Faceting on very high
cardinality fields is something that can have performance
implications.

But I'd certainly add docValues="true" to the definition no matter
which you decide on.
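
For illustration, the #2 definition from the question with that change applied
(the field name stays elided, as in the original):

<field name="..." type="int" indexed="true" required="true" stored="false" docValues="true"/>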

Best,
Erick

On Wed, May 25, 2016 at 9:29 AM, Steven White  wrote:
> Hi everyone,
>
> I will be faceting on data of type integers and I'm wondering if there is any
> difference in how I design my schema.  I have no need to sort or use range
> facets; given this, in terms of Lucene performance and index size, does it
> make any difference if I use:
>
> #1: <field name="..." type="string" indexed="true" required="true" stored="false"/>
>
> Or
>
> #2: <field name="..." type="int" indexed="true" required="true" stored="false"/>
>
> (notice how I changed the "type" from "string" to "int" in #2)
>
> Thanks in advance.
>
> Steve


Re: import efficiencies

2016-05-26 Thread John Bickerstaff
Having more carefully read Erick's post - I see that is essentially what he
said in a much more straightforward way.

I will also second Erick's suggestion of hammering on the SQL.  We found
that fruitful many times at the same gig.  I develop and am not a SQL
master.  In a similar situation I'll usually seek out a specialist to help
me make sure the query isn't wasteful.  It frequently was and I learned a
lot.

On Thu, May 26, 2016 at 12:31 PM, John Bickerstaff  wrote:

> It may or may not be helpful, but there's a similar class of problem that
> is frequently solved either by stored procedures or by running the query on
> a time-frame and storing the results...  Doesn't matter if the end-point
> for the data is Solr or somewhere else.
>
> The problem is long running queries that are extremely complex and stress
> the database performance too heavily.
>
> The solution is to de-normalize the data you need... store it in that form
> and then the query gets really fast... sort of like a data warehouse type
> of thing.  (Don't shoot, I know this isn't data warehousing...)
>
> Postgres even has something called an "automatically updateable view" that
> might serve - if that's your back end.
>
> Anyway - the underlying strategy is to find a way to flatten your data
> preparatory to turning it into solr documents by some means - either by
> getting it out on shorter-running queries all the time into some kind of
> store (Kafka, text file, whatever) or by using some feature of the database
> (stored procs writing to a summary table, automatically updatable view or
> similar).
>
> In this way, when you make your query, you make it against the "flattened"
> data - which is, ideally, all in one table - and then all the complexity of
> joins etc... is washed away and things ought to run pretty fast.
>
> The cost, of course, is a huge table with tons of duplicated data...  Only
> you can say if that's worth it.  I did this at my last gig and we truncated
> the table every 2 weeks to prevent it growing forever.
>
> In case it's helpful...
>
> PS - if you have the resources, a duplicate database can really help here
> too - again my experience is mostly with Postgres which allows a "warm"
> backup to be live.  We frequently used this for executive queries that were
> using the database like a data warehouse because they were so
> time-consuming.  It kept the load off production.
>
> On Thu, May 26, 2016 at 12:18 PM, Erick Erickson 
> wrote:
>
>> Forgot to add... sometimes really hammering at the SQL query in DIH
>> can be fruitful, can you make a huge, monster query that's faster than
>> the sub-queries?
>>
>> I've also seen people run processes on the DB that move all the
>> data into a temporary place making use of all of the nifty stuff you
>> can do there and then use DIH on _that_. Or the view.
>>
>> All that said, I generally prefer using SolrJ if DIH doesn't do the job
>> after a day or two of fiddling, it gives more control.
>>
>> Good Luck!
>> Erick
>>
>> On Thu, May 26, 2016 at 11:02 AM, John Blythe  wrote:
>> > oo gotcha. cool, will make sure to check it out and bounce any related
>> > questions through here.
>> >
>> > thanks!
>> >
>> > best,
>> >
>> >
>> > --
>> > *John Blythe*
>> > Product Manager & Lead Developer
>> >
>> > 251.605.3071 | j...@curvolabs.com
>> > www.curvolabs.com
>> >
>> > 58 Adams Ave
>> > Evansville, IN 47713
>> >
>> > On Thu, May 26, 2016 at 1:45 PM, Erick Erickson <
>> erickerick...@gmail.com>
>> > wrote:
>> >
>> >> Solr commits aren't the issue I'd guess. All the time is
>> >> probably being spent getting the data from MySQL.
>> >>
>> >> I've had some luck writing to Solr from a DB through a
>> >> SolrJ program, here's a place to get started:
>> >> searchhub.org/2012/02/14/indexing-with-solrj/
>> >> you can peel out the Tika bits pretty easily I should
>> >> think.
>> >>
>> >> One technique I've used is to cache
>> >> some of the DB tables in Java's memory to keep
>> >> from having to do the secondary lookup(s). This only
>> >> really works if the "secondary table" is small enough to fit in
>> >> Java's memory of course. You can do some creative
>> >> things with caching partial tables if you can sort appropriately.
>> >>
>> >> Best,
>> >> Erick
>> >>
>> >> On Thu, May 26, 2016 at 9:01 AM, John Blythe 
>> wrote:
>> >> > hi all,
>> >> >
>> >> > i've got layered entities in my solr import. it's calling on some
>> >> > transactional data from a MySQL instance. there are two fields that
>> are
>> >> > used to then lookup other information from other tables via their
>> related
>> >> > UIDs, one of which has its own child entity w yet another select
>> >> statement
>> >> > to grab up more data.
>> >> >
>> >> > it fetches at about 120/s but processes at ~50-60/s. we currently
>> only
>> >> have
>> >> > close to 500k records, but it's growing quickly and thus is becoming
> >> > increasingly painful to make modifications due to the reimport that
> >> > needs to then occur.

Re: Solr 5.5.2

2016-05-26 Thread Erick Erickson
Note that <3> is actually not hard at all; the "ant package" target
does it all for you. You do need to install ant and a Java JDK, but
the rest is pretty automatic, just apply the patch and execute the
above target.
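
For reference, a hedged sketch of those steps (assuming a lucene-solr 5.5
source checkout and the patch file downloaded from the JIRA; the -p level can
vary with how the patch was generated):

cd lucene-solr
patch -p1 -i SOLR-8940.patch
cd solr
ant package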

Details are here in case you get desperate ;)...

https://wiki.apache.org/solr/HowToContribute

Best,
Erick

On Thu, May 26, 2016 at 11:29 AM, Nick Vasilyev
 wrote:
> Thanks Erick, option 4 is my favorite so far :)
>
> On Thu, May 26, 2016 at 2:15 PM, Erick Erickson 
> wrote:
>
>> There is no plan to release 5.5.2, development has moved to trunk and
>> 6.x. Also, while there
>> is a patch for that JIRA it hasn't been committed even in trunk/6.0.
>>
>> So I think your choices are:
>> 1> find a work-around
>> 2> see about moving to Solr 6.0.1 (in release process now),
>> assuming that it solves the problem.
>> 3> See if the patch supplied with SOLR-8940 works for you and compile
>> it locally.
>> 4> agitate for a 5.5.2 that includes this fix (after the fix has been
>> vetted).
>>
>> Best,
>> Erick
>>
>> On Thu, May 26, 2016 at 11:08 AM, Nick Vasilyev
>>  wrote:
>> > Is there an anticipated release date for 5.5.2? I know 5.5.1 was just
>> > released a while ago and although it fixes the faceting performance
>> > (SOLR-8096), distributed grouping is broken (SOLR-8940).
>> >
>> > I just need a solid 5.x release that is stable and with all core
>> > functionality working.
>> >
>> > Thanks
>>


Re: How to save index data to other place? [scottchu]

2016-05-26 Thread Shawn Heisey
On 5/25/2016 10:16 PM, scott.chu wrote:
> Thanks! I thought I had to tune solrconfig.xml.
>
> scott.chu,scott@udngroup.com
> 2016/5/26 (週四)
> - Original Message - 
> From: Jay Potharaju 
> To: solr-user ; scott(自己) 
> CC: 
> Date: 2016/5/26 (週四) 11:31
> Subject: Re: How to save index data to other place? [scottchu]
>
>
> use property.*dataDir*=*value* 
> https://cwiki.apache.org/confluence/display/solr/Defining+core.properties 

In general, I only place *relative* paths in dataDir ... but for most
people, I would actually recommend not setting dataDir at all, and
letting it default to "data".

Rather, what I would do is set the solr home to another location, so
*all* cores live there by default, and let Solr create its default
directory structure in that location.

You haven't indicated in this thread whether you're running on a UNIX or
UNIX-like OS such as Linux, or whether you're running on Windows.  You
used both slashes and backslashes when you described your actual and
desired index paths.

For most operating systems other than Windows, I strongly recommend
using the solr installer shell script.  This script has a concept of a
"var dir", and the solr home is a directory inside that.

If you are using the installer shell script to install Solr and do not
provide any options, then Solr itself will install to /opt/solr-X.Y.Z, a
symlink at /opt/solr will be created that points to the install
directory, and Solr will set its var dir to /var/solr.  A configuration
script will be created at /etc/default/solr.in.sh.  In the configuration
script, the solr home gets set to /var/solr/data.  Each core created
would create a directory under /var/solr/data with the core's name, and
inside that directory would be a data directory, containing the index
directory and the tlog directory.  These paths assume you've got 5.5 or
later.

On a dev server, I used the installer script with a service name (-s
option) of solr5.  With this option, most of the paths get changed from
the default.  The symlink is /opt/solr5 pointing to /opt/solr-5.5.1, the
var dir is /var/solr5, the configuration script is
/etc/default/solr5.in.sh, and the solr home is /var/solr5/data.
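
For reference, a hedged sketch of the commands behind all of this (version and
service name taken from the examples above; assumes the tarball has already
been downloaded):

# extract just the installer script from the tarball
tar xzf solr-5.5.1.tgz solr-5.5.1/bin/install_solr_service.sh --strip-components=2

# default install: /opt/solr -> /opt/solr-5.5.1, var dir /var/solr,
# config script /etc/default/solr.in.sh, solr home /var/solr/data
sudo bash ./install_solr_service.sh solr-5.5.1.tgz

# named-service install, as on the dev server: /opt/solr5, /var/solr5,
# /etc/default/solr5.in.sh, solr home /var/solr5/data
sudo bash ./install_solr_service.sh solr-5.5.1.tgz -s solr5

The installer also registers an init script, so "sudo service solr start" (or
"service solr5 start" for the named service) handles starting Solr at boot on
CentOS.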

Thanks,
Shawn



Re: import efficiencies

2016-05-26 Thread John Bickerstaff
It may or may not be helpful, but there's a similar class of problem that
is frequently solved either by stored procedures or by running the query on
a time-frame and storing the results...  Doesn't matter if the end-point
for the data is Solr or somewhere else.

The problem is long running queries that are extremely complex and stress
the database performance too heavily.

The solution is to de-normalize the data you need... store it in that form
and then the query gets really fast... sort of like a data warehouse type
of thing.  (Don't shoot, I know this isn't data warehousing...)

Postgres even has something called an "automatically updateable view" that
might serve - if that's your back end.

Anyway - the underlying strategy is to find a way to flatten your data
preparatory to turning it into solr documents by some means - either by
getting it out on shorter-running queries all the time into some kind of
store (Kafka, text file, whatever) or by using some feature of the database
(stored procs writing to a summary table, automatically updatable view or
similar).

In this way, when you make your query, you make it against the "flattened"
data - which is, ideally, all in one table - and then all the complexity of
joins etc... is washed away and things ought to run pretty fast.

The cost, of course, is a huge table with tons of duplicated data...  Only
you can say if that's worth it.  I did this at my last gig and we truncated
the table every 2 weeks to prevent it growing forever.

In case it's helpful...

PS - if you have the resources, a duplicate database can really help here
too - again my experience is mostly with Postgres which allows a "warm"
backup to be live.  We frequently used this for executive queries that were
using the database like a data warehouse because they were so
time-consuming.  It kept the load off production.

On Thu, May 26, 2016 at 12:18 PM, Erick Erickson 
wrote:

> Forgot to add... sometimes really hammering at the SQL query in DIH
> can be fruitful, can you make a huge, monster query that's faster than
> the sub-queries?
>
> I've also seen people run processes on the DB that move all the
> data into a temporary place making use of all of the nifty stuff you
> can do there and then use DIH on _that_. Or the view.
>
> All that said, I generally prefer using SolrJ if DIH doesn't do the job
> after a day or two of fiddling, it gives more control.
>
> Good Luck!
> Erick
>
> On Thu, May 26, 2016 at 11:02 AM, John Blythe  wrote:
> > oo gotcha. cool, will make sure to check it out and bounce any related
> > questions through here.
> >
> > thanks!
> >
> > best,
> >
> >
> > --
> > *John Blythe*
> > Product Manager & Lead Developer
> >
> > 251.605.3071 | j...@curvolabs.com
> > www.curvolabs.com
> >
> > 58 Adams Ave
> > Evansville, IN 47713
> >
> > On Thu, May 26, 2016 at 1:45 PM, Erick Erickson  >
> > wrote:
> >
> >> Solr commits aren't the issue I'd guess. All the time is
> >> probably being spent getting the data from MySQL.
> >>
> >> I've had some luck writing to Solr from a DB through a
> >> SolrJ program, here's a place to get started:
> >> searchhub.org/2012/02/14/indexing-with-solrj/
> >> you can peel out the Tika bits pretty easily I should
> >> think.
> >>
> >> One technique I've used is to cache
> >> some of the DB tables in Java's memory to keep
> >> from having to do the secondary lookup(s). This only
> >> really works if the "secondary table" is small enough to fit in
> >> Java's memory of course. You can do some creative
> >> things with caching partial tables if you can sort appropriately.
> >>
> >> Best,
> >> Erick
> >>
> >> On Thu, May 26, 2016 at 9:01 AM, John Blythe 
> wrote:
> >> > hi all,
> >> >
> >> > i've got layered entities in my solr import. it's calling on some
> >> > transactional data from a MySQL instance. there are two fields that
> are
> >> > used to then lookup other information from other tables via their
> related
> >> > UIDs, one of which has its own child entity w yet another select
> >> statement
> >> > to grab up more data.
> >> >
> >> > it fetches at about 120/s but processes at ~50-60/s. we currently only
> >> have
> >> > close to 500k records, but it's growing quickly and thus is becoming
> >> > increasingly painful to make modifications due to the reimport that
> needs
> >> > to then occur.
> >> >
> >> > i feel like i'd seen some threads regarding commits of new data,
> >> > master/slave, or solrcloud/sharding that could help in some ways
> related
> >> to
> >> > this but as of yet can't scrounge them up w my searches (ironic :p).
> >> >
> >> > can someone help by pointing me to some good material related to this
> >> sort
> >> > of thing?
> >> >
> >> > thanks-
> >>
>


Re: SolrCloud Shard console shows roughly same number of documents?

2016-05-26 Thread Erick Erickson
Q1: Not quite sure what you mean. Let's say I have 2 shards, 3
replicas each, 16 docs on each. I _think_ you're
talking about the "core selector", which shows the docs on that
particular core, 16 in our case not 48.

Q2: Yes, that's how SolrCloud is designed. It has to be for HA/DR.
Every replica in a shard has all the docs, 16 as above. Otherwise if
one of your machines went down there could be no guarantee even
attempted about there not being data loss.

Q3: Yes, indexing will be slower when there is more than one replica
per shard since the raw document is forwarded from the leader to all
followers before acking back. In distributed situations, you will have
a bunch (potentially) more machines doing indexing so total throughput
can be faster.

Why do you care? Is there a problem or is this just general background
info? There are a number of techniques for speeding up indexing, the
first is to use SolrJ and CloudSolrClient and send batches of docs at
once rather than one-at-a-time.
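
For illustration, a hedged sketch of that batching approach against the
5.x/6.0-era SolrJ API (the JDBC URL, zkHost, collection, and field names are
all assumptions):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexer {
  public static void main(String[] args) throws Exception {
    try (Connection db = DriverManager.getConnection(
             "jdbc:mysql://dbhost/mydb", "user", "pass");
         CloudSolrClient solr =
             new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181")) {
      solr.setDefaultCollection("collection1");
      List<SolrInputDocument> batch = new ArrayList<>();
      try (Statement st = db.createStatement();
           ResultSet rs = st.executeQuery("SELECT id, title FROM docs")) {
        while (rs.next()) {
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", rs.getString("id"));
          doc.addField("title", rs.getString("title"));
          batch.add(doc);
          if (batch.size() >= 1000) {  // send batches, not one doc at a time
            solr.add(batch);
            batch.clear();
          }
        }
      }
      if (!batch.isEmpty()) {
        solr.add(batch);
      }
      solr.commit();
    }
  }
}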

Best,
Erick

On Wed, May 25, 2016 at 1:54 PM, Siddhartha Singh Sandhu
 wrote:
> Hi,
>
> I recently moved to a SolrCloud config. I had a few questions:
>
> Q1. Does a shard show cumulative number of documents or documents present
> in that particular shard on the admin console of respective shard?
>
> Q2. If 1's answer is non-cumulative then my shards(on different servers)
> are indexing all the documents on each instance of shard. Is this natural?
> I created the shards with compositeId.
>
> Q3. If the answer to 1 is cumulative then my indexing was slower then a
> single core instance which was on the same machine of which I have 2
>  now(my shards). What could I be missing while configuring Solr?
>
>
> I am using Solr 6.0.0 on Ubuntu 14.04 with external zookeeper.
>
> Regards,
>
> Sid.


Re: Solr 5.5.2

2016-05-26 Thread Nick Vasilyev
Thanks Erick, option 4 is my favorite so far :)

On Thu, May 26, 2016 at 2:15 PM, Erick Erickson 
wrote:

> There is no plan to release 5.5.2, development has moved to trunk and
> 6.x. Also, while there
> is a patch for that JIRA it hasn't been committed even in trunk/6.0.
>
> So I think your choices are:
> 1> find a work-around
> 2> see about moving to Solr 6.0.1 (in release process now),
> assuming that it solves the problem.
> 3> See if the patch supplied with SOLR-8940 works for you and compile
> it locally.
> 4> agitate for a 5.5.2 that includes this fix (after the fix has been
> vetted).
>
> Best,
> Erick
>
> On Thu, May 26, 2016 at 11:08 AM, Nick Vasilyev
>  wrote:
> > Is there an anticipated release date for 5.5.2? I know 5.5.1 was just
> > released a while ago and although it fixes the faceting performance
> > (SOLR-8096), distributed grouping is broken (SOLR-8940).
> >
> > I just need a solid 5.x release that is stable and with all core
> > functionality working.
> >
> > Thanks
>


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-26 Thread John Bickerstaff
Thanks Chris --

The two projects I'm aware of are:

https://github.com/healthonnet/hon-lucene-synonyms

and the one referenced from the Lucidworks page here:
https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/

... which is here : https://github.com/LucidWorks/auto-phrase-tokenfilter

Is there anything else out there that you would recommend I look at?

On Thu, May 26, 2016 at 12:01 PM, Chris Morley  wrote:

> Chris Morley here, from Wayfair.  (Depahelix = my domain)
>
>  Suyash Sonawane and I have worked on multiple word synonyms at Wayfair.
> We worked mostly off of Ted Sullivan's work and also off of some
> suggestions from Koorosh Vakhshoori.  We have gotten to a point where we
> have a more sophisticated internal implementation, however, we've found
> that it is very difficult to make it do what you want it to do, and also be
> sufficiently performant.  Watch out for exceptional situations with mm
> (minimum should match).
>
>  Trey Grainger (now at Lucidworks) and Simon Hughes of Dice.com have also
> done work in this area.
>
>  It should be very possible to get this kind of thing working on
> SolrCloud.  I haven't tried it yet but I think theoretically, it should
> just work.  The synonyms stuff is mostly about doing things at index time
> and query time.  The index time stuff should translate to SolrCloud
> directly, while the query time stuff might pose some issues, but probably
> not too bad, if there are any issues at all.
>
>  I've had decent luck porting our various plugins from 4.10.x to 5.5.0
> because a lot of stuff is just Java, and it still works within the Jetty
> context.
>
>  -Chris.
>
>
>
>
> 
>  From: "John Bickerstaff" 
> Sent: Thursday, May 26, 2016 1:51 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser
> Hey Jeff (or anyone interested in multi-word synonyms) here are some
> potentially interesting links...
>
> http://wiki.apache.org/solr/QueryParser (search the page for
> synonym_edismax)
>
> https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ (blog
> post about what became the synonym_edismax query parser)
>
>
> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
>
> This last was useful for lots of reasons and contains links to other
> interesting, related web pages...
>
> On Thu, May 26, 2016 at 11:45 AM, Jeff Wartes 
> wrote:
>
> Oh, interesting. I've certainly encountered issues with multi-word
> > synonyms, but I hadn't come across this. If you end up using it with a
> > recent solr verison, I'd be glad to hear your experience.
> >
> > I haven't used it, but I am aware of one other project in this vein that
> > you might be interested in looking at:
> > https://github.com/LucidWorks/auto-phrase-tokenfilter
> >
> >
> > On 5/26/16, 9:29 AM, "John Bickerstaff" 
> wrote:
> >
> > >Ahh - for question #3 I may have spoken too soon. This line from the
> > >github repository readme suggests a way.
> > >
> > >Update: We have tested to run with the jar in $SOLR_HOME/lib as well,
> and
> > >it works (Jetty).
> > >
> > >I'll try that and only respond back if that doesn't work.
> > >
> > >Questions 1 and 2 still stand of course... If anyone on the list has
> > >experience in this area...
> > >
> > >Thanks.
> > >
> > >On Thu, May 26, 2016 at 10:25 AM, John Bickerstaff <
> > j...@johnbickerstaff.com
> > >> wrote:
> > >
> > >> Hi all,
> > >>
> > >> I'm creating a Solr Cloud that will index and search medical text.
> > >> Multi-word synonyms are a pretty important factor.
> > >>
> > >> I find that there are some challenges around multi-word synonyms and I
> > >> also found on the wiki that there is a recommended 3rd-party parser
> > >> (synonym_edismax parser) created by Nolan Lawson and found here:
> > >> https://github.com/healthonnet/hon-lucene-synonyms
> > >>
> > >> Here's the thing - the instructions on the github site involve
> bringing
> > >> the jar file into the war file - which is not applicable any more...
> at
> > >> least I think it's not...
> > >>
> > >> I have three questions:
> > >>
> > >> 1. Is this still a good solution for multi-word synonyms (I.e. Solr
> > Cloud
> > >> doesn't break it in some way)
> > >> 2. Is there a tool or plug-in out there that the contributors would
> > >> recommend above this one?
> > >> 3. Assuming 1 = yes and 2 = no, can anyone tell me an updated
> procedure
> > >> for bringing it in to Solr Cloud (I'm running 5.4.x)
> > >>
> > >> Thanks
> > >>
> >
> >
>
>
>


Re: import efficiencies

2016-05-26 Thread Erick Erickson
Forgot to add... sometimes really hammering at the SQL query in DIH
can be fruitful, can you make a huge, monster query that's faster than
the sub-queries?

I've also seen people run processes on the DB that move all the
data into a temporary place making use of all of the nifty stuff you
can do there and then use DIH on _that_. Or the view.

All that said, I generally prefer using SolrJ if DIH doesn't do the job
after a day or two of fiddling, it gives more control.

Good Luck!
Erick

On Thu, May 26, 2016 at 11:02 AM, John Blythe  wrote:
> oo gotcha. cool, will make sure to check it out and bounce any related
> questions through here.
>
> thanks!
>
> best,
>
>
> --
> *John Blythe*
> Product Manager & Lead Developer
>
> 251.605.3071 | j...@curvolabs.com
> www.curvolabs.com
>
> 58 Adams Ave
> Evansville, IN 47713
>
> On Thu, May 26, 2016 at 1:45 PM, Erick Erickson 
> wrote:
>
>> Solr commits aren't the issue I'd guess. All the time is
>> probably being spent getting the data from MySQL.
>>
>> I've had some luck writing to Solr from a DB through a
>> SolrJ program, here's a place to get started:
>> searchhub.org/2012/02/14/indexing-with-solrj/
>> you can peel out the Tika bits pretty easily I should
>> think.
>>
>> One technique I've used is to cache
>> some of the DB tables in Java's memory to keep
>> from having to do the secondary lookup(s). This only
>> really works if the "secondary table" is small enough to fit in
>> Java's memory of course. You can do some creative
>> things with caching partial tables if you can sort appropriately.
>>
>> Best,
>> Erick
>>
>> On Thu, May 26, 2016 at 9:01 AM, John Blythe  wrote:
>> > hi all,
>> >
>> > i've got layered entities in my solr import. it's calling on some
>> > transactional data from a MySQL instance. there are two fields that are
>> > used to then lookup other information from other tables via their related
>> > UIDs, one of which has its own child entity w yet another select
>> statement
>> > to grab up more data.
>> >
>> > it fetches at about 120/s but processes at ~50-60/s. we currently only
>> have
>> > close to 500k records, but it's growing quickly and thus is becoming
>> > increasingly painful to make modifications due to the reimport that needs
>> > to then occur.
>> >
>> > i feel like i'd seen some threads regarding commits of new data,
>> > master/slave, or solrcloud/sharding that could help in some ways related
>> to
>> > this but as of yet can't scrounge them up w my searches (ironic :p).
>> >
>> > can someone help by pointing me to some good material related to this
>> sort
>> > of thing?
>> >
>> > thanks-
>>


Re: Solr 5.5.2

2016-05-26 Thread Erick Erickson
There is no plan to release 5.5.2; development has moved to trunk and
6.x. Also, while there is a patch for that JIRA, it hasn't been committed
even in trunk/6.0.

So I think your choices are:
1> find a work-around
2> see about moving to Solr 6.0.1 (in release process now),
assuming that it solves the problem.
3> See if the patch supplied with SOLR-8940 works for you and compile
it locally.
4> agitate for a 5.5.2 that includes this fix (after the fix has been vetted).

Best,
Erick

On Thu, May 26, 2016 at 11:08 AM, Nick Vasilyev
 wrote:
> Is there an anticipated release date for 5.5.2? I know 5.5.1 was just
> released a while ago and although it fixes the faceting performance
> (SOLR-8096), distributed grouping is broken (SOLR-8940).
>
> I just need a solid 5.x release that is stable and with all core
> functionality working.
>
> Thanks


Re: Even if my Solr has the document id, SolrCloud query gives no result

2016-05-26 Thread Erick Erickson
Don't mess with distrib=true|false to start. What you're
seeing is that when a query comes in to SolrCloud, a
sub-query is being sent to one replica of every shard. That
sub-query has distrib=false set. Then when preliminary
results are returned and collated by the distributor, another
request is made to the shards to get the actual data for the
true top N. So this is a total red herring.

Likewise, "ids" is a red-herring, part of internal sub-query
processing (I think).

In essence, you should pretty much ignore these sub-queries
and concentrate on the query you send to Solr. In particular
add debug=query and see what the _parsed_ query looks
like. Odds are it's not quite parsing as you expect. If that doesn't
help, please post the results and we'll see.
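
For example, with the query from this thread:

q=(sysid:11382)&debug=query

The debug section of the response then includes rawquerystring, querystring,
and parsedquery entries, which show exactly how (sysid:11382) was parsed
against the schema.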

What guarantee do you have that you have any document in
your corpus with sysid of 11382?

Best,
Erick

On Wed, May 25, 2016 at 5:27 AM, preeti kumari  wrote:
> Hi,
>
> I am using Solr 5.2.1 in cloud mode and I am facing an issue.
> From the client, the query that goes to Solr is: q=(sysid:11382)
>
> But in my solr logs i can see the actual query getting fired is :
>
> ids=0323_00011382&distrib=false&wt=javabin&version=2&rows=1&NOW=1464177290469&shard.url=
> http://host1:8009/solr/collection1_shard3_replica2/|http://host2:8009/solr/collection1_shard3_replica1/&fl=*,score&df=text&start=0&q=(sysid:11382)
>
> which gives no result.
> In my schema.xml I have the unique field as "docid", not "ids".
> I need to know what this field "ids" is and why distrib=false is added to the
> query. I tried adding distrib=true in my /select request handler in
> solrconfig, but I still get the same query.
>
> If I give a query with "ids=0323_00011382&distrib=true" it gives the error
> java.lang.NullPointerException\n\tat
> org.apache.solr.handler.component.QueryComponent.unmarshalSortValues(QueryComponent.java:1176)\n\tat
> org.apache.solr.handler.component.QueryComponent.mergeIds
>
>
> When I remove "ids=0323_00011382&distrib=false" from the query, I get
> results with docid 0323_00011382.
>
> Please help me understand this, and how I can override this behaviour of
> SolrCloud.
>
> Thanks
> Preeti


Solr 5.5.2

2016-05-26 Thread Nick Vasilyev
Is there an anticipated release date for 5.5.2? I know 5.5.1 was just
released a while ago and although it fixes the faceting performance
(SOLR-8096), distributed grouping is broken (SOLR-8940).

I just need a solid 5.x release that is stable and with all core
functionality working.

Thanks


Re: is API versioning supported in Solr?

2016-05-26 Thread Shawn Heisey
On 5/26/2016 2:37 AM, Nuhaa All Bakry wrote:
> Wondering if versioning is built into Solr? Say I have deployed a working
> SolrCloud (v1.0) and there are applications consuming the REST APIs. Is there
> a way to deploy the next v1.1 without removing v1.0? The reason I ask is
> because we don't want the deployment of Solr to be tightly dependent on the
> deployment of the applications, or vice versa.
>
> I can't find any documentation on this (yet). Please share if you know where
> I can read more about this.

In general, the Solr HTTP API does *not* change very much.  Quite a lot
of the HTTP API is dictated by solrconfig.xml, with recommendations in
the examples, so it is frequently possible to make a new *major* version
behave like the previous major version.  Upgrading through two major
versions might reveal differences that cannot be adjusted by configuration.

The response formats (json, xml, javabin, etc) can accept a version
parameter which can control aspects of that response format, and using
an old version number can ensure compatibility with older code.  New
versions of the response formats are very rare, and many people use them
without a version number and don't have any problems.
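
For example, the XML response writer accepts an explicit format version on
each request (2.2 being the only documented value in this era):

q=*:*&wt=xml&version=2.2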

If you are only using the HTTP API, then I would not be concerned too
much about this when upgrading within a major version ... and your code
may also work perfectly fine through a major version upgrade.

There is one exception to what I said above: If you are using
CloudSolrClient from SolrJ (or another cloud-aware client) to talk to
your SolrCloud, then this is much more tightly coupled to the version,
because CloudSolrClient talks to *zookeeper* (as well as using the HTTP
API) and is strongly aware of SolrCloud internals.  Mixing different
SolrJ and Solr versions is discouraged when using CloudSolrClient,
unless the version difference is very small and SolrJ is newer.  The
reason this can be a problem is that SolrCloud internals have been
evolving *very* rapidly with each new release.

Thanks,
Shawn



Re: import efficiencies

2016-05-26 Thread John Blythe
oo gotcha. cool, will make sure to check it out and bounce any related
questions through here.

thanks!

best,


-- 
*John Blythe*
Product Manager & Lead Developer

251.605.3071 | j...@curvolabs.com
www.curvolabs.com

58 Adams Ave
Evansville, IN 47713

On Thu, May 26, 2016 at 1:45 PM, Erick Erickson 
wrote:

> Solr commits aren't the issue I'd guess. All the time is
> probably being spent getting the data from MySQL.
>
> I've had some luck writing to Solr from a DB through a
> SolrJ program, here's a place to get started:
> searchhub.org/2012/02/14/indexing-with-solrj/
> you can peel out the Tika bits pretty easily I should
> think.
>
> One technique I've used is to cache
> some of the DB tables in Java's memory to keep
> from having to do the secondary lookup(s). This only
> really works if the "secondary table" is small enough to fit in
> Java's memory of course. You can do some creative
> things with caching partial tables if you can sort appropriately.
>
> Best,
> Erick
>
> On Thu, May 26, 2016 at 9:01 AM, John Blythe  wrote:
> > hi all,
> >
> > i've got layered entities in my solr import. it's calling on some
> > transactional data from a MySQL instance. there are two fields that are
> > used to then lookup other information from other tables via their related
> > UIDs, one of which has its own child entity w yet another select
> statement
> > to grab up more data.
> >
> > it fetches at about 120/s but processes at ~50-60/s. we currently only
> have
> > close to 500k records, but it's growing quickly and thus is becoming
> > increasingly painful to make modifications due to the reimport that needs
> > to then occur.
> >
> > i feel like i'd seen some threads regarding commits of new data,
> > master/slave, or solrcloud/sharding that could help in some ways related
> to
> > this but as of yet can't scrounge them up w my searches (ironic :p).
> >
> > can someone help by pointing me to some good material related to this
> sort
> > of thing?
> >
> > thanks-
>


Re: Issues with coordinates in Solr during updating of fields

2016-05-26 Thread Erick Erickson
Try removing the 'stored="true" ' from the gps_0_coordinate and
gps_1_coordinate.

When you say "...tried to do an update on any other fields" I'm assuming you're
talking about Atomic Updates, which require that the destinations of
copyFields are single valued. Under the covers the location type is
split and copied to the other two fields so I suspect that's what's going on.

And you could also try one of the other types, see:
https://cwiki.apache.org/confluence/display/solr/Spatial+Search
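
For reference, the stock example schema wires the type up roughly like this.
Note the stored="false" on the sub-fields, which is what avoids the
atomic-update clash (names as in the default example schema):

<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>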

Best,
Erick

On Thu, May 26, 2016 at 1:46 AM, Zheng Lin Edwin Yeo
 wrote:
> Anyone has any solutions to this problem?
>
> I tried to remove the gps_0_coordinate and gps_1_coordinate, but I will get
> the following error during indexing.
> ERROR: [doc=id1] unknown field 'gps_0_coordinate'
>
> Regards,
> Edwin
>
>
> On 25 May 2016 at 11:37, Zheng Lin Edwin Yeo  wrote:
>
>> Hi,
>>
>> I have an implementation of storing the coordinates in Solr during
>> indexing.
>> During indexing, I will only store the value in the field name ="gps". For
>> the field name = "gps_0_coordinate" and "gps_1_coordinate", the value will
>> be auto filled and indexed from the "gps" field.
>>
>> <field name="gps" type="location" ... required="false"/>
>> <field name="gps_0_coordinate" ... stored="true" required="false"/>
>> <field name="gps_1_coordinate" ... stored="true" required="false"/>
>>
>> But when I tried to do an update on any other fields in the index, Solr
>> will try to add another value in the "gps_0_coordinate" and
>> "gps_1_coordinate". However, as these 2 fields are not multi-Valued, it
>> will lead to an error:
>> multiple values encountered for non multiValued field gps_0_coordinate:
>> [1.0,1.0]
>>
>> Does anyone knows how we can solve this issue?
>>
>> I am using Solr 5.4.0
>>
>> Regards,
>> Edwin
>>


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-26 Thread Chris Morley
Chris Morley here, from Wayfair.  (Depahelix = my domain)

 Suyash Sonawane and I have worked on multiple word synonyms at Wayfair.  We 
worked mostly off of Ted Sullivan's work and also off of some suggestions from 
Koorosh Vakhshoori.  We have gotten to a point where we have a more 
sophisticated internal implementation; however, we've found that it is very 
difficult to make it do what you want it to do and also be sufficiently 
performant.  Watch out for exceptional situations with mm (minimum should 
match).

 Trey Grainger (now at Lucidworks) and Simon Hughes of Dice.com have also done 
work in this area.

 It should be very possible to get this kind of thing working on SolrCloud.  I 
haven't tried it yet but I think theoretically, it should just work.  The 
synonyms stuff is mostly about doing things at index time and query time.  The 
index time stuff should translate to SolrCloud directly, while the query time 
stuff might pose some issues, but probably not too bad, if there are any issues 
at all.

 I've had decent luck porting our various plugins from 4.10.x to 5.5.0 because 
a lot of stuff is just Java, and it still works within the Jetty context.

 -Chris.





 From: "John Bickerstaff" 
Sent: Thursday, May 26, 2016 1:51 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser  
Hey Jeff (or anyone interested in multi-word synonyms) here are some
potentially interesting links...

http://wiki.apache.org/solr/QueryParser (search the page for
synonum_edismax)

https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ (blog
post about what became the synonym_edismax Query Parser)

https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/

This last was useful for lots of reasons and contains links to other
interesting, related web pages...

On Thu, May 26, 2016 at 11:45 AM, Jeff Wartes 
wrote:

> Oh, interesting. I've certainly encountered issues with multi-word
> synonyms, but I hadn't come across this. If you end up using it with a
> recent solr version, I'd be glad to hear your experience.
>
> I haven't used it, but I am aware of one other project in this vein that
> you might be interested in looking at:
> https://github.com/LucidWorks/auto-phrase-tokenfilter
>
>
> On 5/26/16, 9:29 AM, "John Bickerstaff"  wrote:
>
> >Ahh - for question #3 I may have spoken too soon. This line from the
> >github repository readme suggests a way.
> >
> >Update: We have tested to run with the jar in $SOLR_HOME/lib as well, and
> >it works (Jetty).
> >
> >I'll try that and only respond back if that doesn't work.
> >
> >Questions 1 and 2 still stand of course... If anyone on the list has
> >experience in this area...
> >
> >Thanks.
> >
> >On Thu, May 26, 2016 at 10:25 AM, John Bickerstaff <
> j...@johnbickerstaff.com
> >> wrote:
> >
> >> Hi all,
> >>
> >> I'm creating a Solr Cloud that will index and search medical text.
> >> Multi-word synonyms are a pretty important factor.
> >>
> >> I find that there are some challenges around multi-word synonyms and I
> >> also found on the wiki that there is a recommended 3rd-party parser
> >> (synonym_edismax parser) created by Nolan Lawson and found here:
> >> https://github.com/healthonnet/hon-lucene-synonyms
> >>
> >> Here's the thing - the instructions on the github site involve bringing
> >> the jar file into the war file - which is not applicable any more... at
> >> least I think it's not...
> >>
> >> I have three questions:
> >>
> >> 1. Is this still a good solution for multi-word synonyms (I.e. Solr
> Cloud
> >> doesn't break it in some way)
> >> 2. Is there a tool or plug-in out there that the contributors would
> >> recommend above this one?
> >> 3. Assuming 1 = yes and 2 = no, can anyone tell me an updated procedure
> >> for bringing it in to Solr Cloud (I'm running 5.4.x)
> >>
> >> Thanks
> >>
>
>




Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-26 Thread John Bickerstaff
fixing typo:

http://wiki.apache.org/solr/QueryParser  (search the page for
synonym_edismax)

On Thu, May 26, 2016 at 11:50 AM, John Bickerstaff  wrote:

> Hey Jeff (or anyone interested in multi-word synonyms) here are some
> potentially interesting links...
>
> http://wiki.apache.org/solr/QueryParser  (search the page for
> synonum_edismax)
>
> https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/
>  (blog post about what became the synonym_edismax Query Parser)
>
>
> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
>
> This last was useful for lots of reasons and contains links to other
> interesting, related web pages...
>
> On Thu, May 26, 2016 at 11:45 AM, Jeff Wartes 
> wrote:
>
>> Oh, interesting. I’ve certainly encountered issues with multi-word
>> synonyms, but I hadn’t come across this. If you end up using it with a
>> recent solr version, I’d be glad to hear your experience.
>>
>> I haven’t used it, but I am aware of one other project in this vein that
>> you might be interested in looking at:
>> https://github.com/LucidWorks/auto-phrase-tokenfilter
>>
>>
>> On 5/26/16, 9:29 AM, "John Bickerstaff"  wrote:
>>
>> >Ahh - for question #3 I may have spoken too soon.  This line from the
>> >github repository readme suggests a way.
>> >
>> >Update: We have tested to run with the jar in $SOLR_HOME/lib as well, and
>> >it works (Jetty).
>> >
>> >I'll try that and only respond back if that doesn't work.
>> >
>> >Questions 1 and 2 still stand of course...  If anyone on the list has
>> >experience in this area...
>> >
>> >Thanks.
>> >
>> >On Thu, May 26, 2016 at 10:25 AM, John Bickerstaff <
>> j...@johnbickerstaff.com
>> >> wrote:
>> >
>> >> Hi all,
>> >>
>> >> I'm creating a Solr Cloud that will index and search medical text.
>> >> Multi-word synonyms are a pretty important factor.
>> >>
>> >> I find that there are some challenges around multi-word synonyms and I
>> >> also found on the wiki that there is a recommended 3rd-party parser
>> >> (synonym_edismax parser) created by Nolan Lawson and found here:
>> >> https://github.com/healthonnet/hon-lucene-synonyms
>> >>
>> >> Here's the thing - the instructions on the github site involve bringing
>> >> the jar file into the war file - which is not applicable any more... at
>> >> least I think it's not...
>> >>
>> >> I have three questions:
>> >>
>> >> 1. Is this still a good solution for multi-word synonyms (I.e. Solr
>> Cloud
>> >> doesn't break it in some way)
>> >> 2. Is there a tool or plug-in out there that the contributors would
>> >> recommend above this one?
>> >> 3. Assuming 1 = yes and 2 = no, can anyone tell me an updated procedure
>> >> for bringing it in to Solr Cloud (I'm running 5.4.x)
>> >>
>> >> Thanks
>> >>
>>
>>
>


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-26 Thread John Bickerstaff
Hey Jeff (or anyone interested in multi-word synonyms) here are some
potentially interesting links...

http://wiki.apache.org/solr/QueryParser  (search the page for
synonum_edismax)

https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/  (blog
post about what became the synonym_edismax Query Parser)

https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/

This last was useful for lots of reasons and contains links to other
interesting, related web pages...

On Thu, May 26, 2016 at 11:45 AM, Jeff Wartes 
wrote:

> Oh, interesting. I’ve certainly encountered issues with multi-word
> synonyms, but I hadn’t come across this. If you end up using it with a
> recent solr version, I’d be glad to hear your experience.
>
> I haven’t used it, but I am aware of one other project in this vein that
> you might be interested in looking at:
> https://github.com/LucidWorks/auto-phrase-tokenfilter
>
>
> On 5/26/16, 9:29 AM, "John Bickerstaff"  wrote:
>
> >Ahh - for question #3 I may have spoken too soon.  This line from the
> >github repository readme suggests a way.
> >
> >Update: We have tested to run with the jar in $SOLR_HOME/lib as well, and
> >it works (Jetty).
> >
> >I'll try that and only respond back if that doesn't work.
> >
> >Questions 1 and 2 still stand of course...  If anyone on the list has
> >experience in this area...
> >
> >Thanks.
> >
> >On Thu, May 26, 2016 at 10:25 AM, John Bickerstaff <
> j...@johnbickerstaff.com
> >> wrote:
> >
> >> Hi all,
> >>
> >> I'm creating a Solr Cloud that will index and search medical text.
> >> Multi-word synonyms are a pretty important factor.
> >>
> >> I find that there are some challenges around multi-word synonyms and I
> >> also found on the wiki that there is a recommended 3rd-party parser
> >> (synonym_edismax parser) created by Nolan Lawson and found here:
> >> https://github.com/healthonnet/hon-lucene-synonyms
> >>
> >> Here's the thing - the instructions on the github site involve bringing
> >> the jar file into the war file - which is not applicable any more... at
> >> least I think it's not...
> >>
> >> I have three questions:
> >>
> >> 1. Is this still a good solution for multi-word synonyms (I.e. Solr
> Cloud
> >> doesn't break it in some way)
> >> 2. Is there a tool or plug-in out there that the contributors would
> >> recommend above this one?
> >> 3. Assuming 1 = yes and 2 = no, can anyone tell me an updated procedure
> >> for bringing it in to Solr Cloud (I'm running 5.4.x)
> >>
> >> Thanks
> >>
>
>


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-26 Thread Jeff Wartes
Oh, interesting. I’ve certainly encountered issues with multi-word synonyms, 
but I hadn’t come across this. If you end up using it with a recent solr 
version, I’d be glad to hear your experience.

I haven’t used it, but I am aware of one other project in this vein that you 
might be interested in looking at: 
https://github.com/LucidWorks/auto-phrase-tokenfilter


On 5/26/16, 9:29 AM, "John Bickerstaff"  wrote:

>Ahh - for question #3 I may have spoken too soon.  This line from the
>github repository readme suggests a way.
>
>Update: We have tested to run with the jar in $SOLR_HOME/lib as well, and
>it works (Jetty).
>
>I'll try that and only respond back if that doesn't work.
>
>Questions 1 and 2 still stand of course...  If anyone on the list has
>experience in this area...
>
>Thanks.
>
>On Thu, May 26, 2016 at 10:25 AM, John Bickerstaff > wrote:
>
>> Hi all,
>>
>> I'm creating a Solr Cloud that will index and search medical text.
>> Multi-word synonyms are a pretty important factor.
>>
>> I find that there are some challenges around multi-word synonyms and I
>> also found on the wiki that there is a recommended 3rd-party parser
>> (synonym_edismax parser) created by Nolan Lawson and found here:
>> https://github.com/healthonnet/hon-lucene-synonyms
>>
>> Here's the thing - the instructions on the github site involve bringing
>> the jar file into the war file - which is not applicable any more... at
>> least I think it's not...
>>
>> I have three questions:
>>
>> 1. Is this still a good solution for multi-word synonyms (I.e. Solr Cloud
>> doesn't break it in some way)
>> 2. Is there a tool or plug-in out there that the contributors would
>> recommend above this one?
>> 3. Assuming 1 = yes and 2 = no, can anyone tell me an updated procedure
>> for bringing it in to Solr Cloud (I'm running 5.4.x)
>>
>> Thanks
>>



Re: import efficiencies

2016-05-26 Thread Erick Erickson
Solr commits aren't the issue I'd guess. All the time is
probably being spent getting the data from MySQL.

I've had some luck writing to Solr from a DB through a
SolrJ program, here's a place to get started:
searchhub.org/2012/02/14/indexing-with-solrj/
you can peel out the Tika bits pretty easily I should
think.

One technique I've used is to cache
some of the DB tables in Java's memory to keep
from having to do the secondary lookup(s). This only
really works if the "secondary table" is small enough to fit in
Java's memory of course. You can do some creative
things with caching partial tables if you can sort appropriately.
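
A rough sketch of that caching idea (SolrJ 5.x plus plain JDBC; the table,
column and field names are invented for illustration, and exception handling
is omitted):

import java.sql.*;
import java.util.*;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

// (inside your indexing method)
Connection conn = DriverManager.getConnection("jdbc:mysql://localhost/shop", "user", "pass");

// Load the small secondary table into memory once...
Map<String, String> vendorNameById = new HashMap<>();
try (Statement st = conn.createStatement();
     ResultSet rs = st.executeQuery("SELECT id, name FROM vendor")) {
  while (rs.next()) vendorNameById.put(rs.getString("id"), rs.getString("name"));
}

// ...then stream the big transaction table and resolve the lookup from the
// map instead of running a sub-select per row.
HttpSolrClient solr = new HttpSolrClient("http://localhost:8983/solr/collection1");
List<SolrInputDocument> batch = new ArrayList<>();
try (Statement st = conn.createStatement();
     ResultSet rs = st.executeQuery("SELECT id, vendor_id, amount FROM transactions")) {
  while (rs.next()) {
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", rs.getString("id"));
    doc.addField("vendor_name", vendorNameById.get(rs.getString("vendor_id")));
    doc.addField("amount", rs.getDouble("amount"));
    batch.add(doc);
    if (batch.size() == 1000) { solr.add(batch); batch.clear(); }
  }
}
if (!batch.isEmpty()) solr.add(batch);
solr.commit();
solr.close();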

Best,
Erick

On Thu, May 26, 2016 at 9:01 AM, John Blythe  wrote:
> hi all,
>
> i've got layered entities in my solr import. it's calling on some
> transactional data from a MySQL instance. there are two fields that are
> used to then lookup other information from other tables via their related
> UIDs, one of which has its own child entity w yet another select statement
> to grab up more data.
>
> it fetches at about 120/s but processes at ~50-60/s. we currently only have
> close to 500k records, but it's growing quickly and thus is becoming
> increasingly painful to make modifications due to the reimport that needs
> to then occur.
>
> i feel like i'd seen some threads regarding commits of new data,
> master/slave, or solrcloud/sharding that could help in some ways related to
> this but as of yet can't scrounge them up w my searches (ironic :p).
>
> can someone help by pointing me to some good material related to this sort
> of thing?
>
> thanks-


Re: "data import handler : import data from sql database :how to search in all fields"

2016-05-26 Thread Erick Erickson
And, you can copy all of the fields into an "uber field" using the
copyField directive and just search the "uber field".
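
In schema.xml that is just, for example (field name and type here are
illustrative):

<field name="uber" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="*" dest="uber"/>

The destination must be multiValued, and you can then point df (or the
edismax qf) at "uber".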

Best,
Erick

On Thu, May 26, 2016 at 7:35 AM, kostali hassan
 wrote:
> thank you, it makes sense.
> have a good day
>
> 2016-05-26 15:31 GMT+01:00 Siddhartha Singh Sandhu :
>
>> The schema.xml/managed_schema defines the default search field as `text`.
>>
>> You can make all fields that you want searchable type `text`.
>>
>> On Thu, May 26, 2016 at 10:23 AM, kostali hassan <
>> med.has.kost...@gmail.com>
>> wrote:
>>
> > I import data from sql databases with DIH. I am looking for a search term
> > in all fields, not by field.
>> >
>>


Re: Can a DocTransformer access the whole results tree?

2016-05-26 Thread Mikhail Khludnev
Hello,

There is a protected ResultContext field named context.
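
So a transformer can reach the rest of the response through that field, along
these lines (a sketch; class and transformer names are invented, the exact
transform() signature varies a little between 5.x and 6.x, and the actual
merge logic is left out):

import org.apache.solr.common.SolrDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.transform.DocTransformer;
import org.apache.solr.search.DocList;

public class MergeExtrasTransformer extends DocTransformer {

  @Override
  public String getName() { return "mergeextras"; }

  @Override
  public void transform(SolrDocument doc, int docid) {
    // 'context' is the protected ResultContext inherited from DocTransformer;
    // it sees the whole result set, not just the current document.
    DocList results = context.getDocList();
    SolrQueryRequest req = context.getRequest();
    // ...look up e.g. the highlighting entry for this docid and add it to 'doc'...
  }
}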

On Thu, May 26, 2016 at 5:31 PM, Upayavira  wrote:

> Looking at the code for a sample DocTransformer, it seems that a
> DocTransformer only has access to the document itself, not to the whole
> results. Because of this, it isn't possible to use a DocTransformer to
> merge, for example, the highlighting results into the main document.
>
> Am I missing something?
>
> Upayavira
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: how can we use multi term search along with stop words

2016-05-26 Thread Ahmet Arslan
Hi,

Are you firing both trailing and leading wildcard query?
Or you just put stars for emphasizing purposes?

Please consider using normal queries, since you are already using a tokenized 
field.

By the way what is 'tollc soon'?

Ahmet



On Thursday, May 26, 2016 4:33 PM, Preeti Bhat  wrote:
Hi Ahmet & Sid,

Thanks for the reply

I have the below requirement
1) If I search with, say, company_nm:*llc*, then we should not return any results,
or only a few results where llc is embedded in other words like tollc soon. So I
had implemented the stopwords.
2) But if I search with, say, company_nm:*google llc*, then it should return the
result of google llc and so on.

The problem here is 1st part is working perfectly, while the second part is not 
working.


Thanks and Regards,
Preeti Bhat
Shore Group Associates LLC
(C) +91-996-644-8187
www.ShoreGroupAssociates.com

-Original Message-
From: Siddhartha Singh Sandhu [mailto:sandhus...@gmail.com]
Sent: Thursday, May 26, 2016 6:54 PM
To: solr-user@lucene.apache.org; Ahmet Arslan
Subject: Re: how can we use multi term search along with stop words

Hi Preeti,

You can use the analysis tool in the Solr console to see how your queries are 
being tokenized. Based on your results you might need to make changes in 
"strings_ci".

Also, If you want to be able to search on stopwords you might want to remove 
solr.StopFilterFactory from indexing and query analyzer of "strings_ci". The 
stopwords.txt is present in the core conf directory. You will need to re-index 
after you make these changes.

Regards,

Sid.


On Thu, May 26, 2016 at 7:26 AM, Ahmet Arslan 
wrote:

> Hi Bhat,
>
> What do you mean by multi term search?
> In your first e-mail, your example uses quotes, which means
> phrase/proximity search.
>
> ahmet
>
>
>
> On Thursday, May 26, 2016 11:49 AM, Preeti Bhat
> 
> wrote:
> HI All,
>
> Sorry for asking the same question again, but could someone please
> advise me on this.
>
>
> Thanks and Regards,
> Preeti Bhat
>
>
> From: Preeti Bhat
> Sent: Wednesday, May 25, 2016 2:22 PM
> To: solr-user@lucene.apache.org
> Subject: how can we use multi term search along with stop words
>
> HI,
>
> I am trying to search the field named company_nm with value "Google llc".
> We have the stopword on "llc", so when I try to search it returns 0
> results. Could anyone please guide me through the process of using
> stopwords in multi term search.
>
> Please note I am using solr 6.0.0 and using standard parser.
>
> <fieldType name="strings_ci" ...>
>   <analyzer type="index">
>     ...
>     <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>   </analyzer>
>   <analyzer type="query">
>     ...
>     <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>   </analyzer>
> </fieldType>
>
> <field name="company_nm" type="strings_ci" ... stored="true"/>
>
>
> Thanks and Regards,
> Preeti Bhat
>
>
>
> NOTICE TO RECIPIENTS: This communication may contain confidential
> and/or privileged information. If you are not the intended recipient
> (or have received this communication in error) please notify the
> sender and it-supp...@shoregrp.com immediately, and destroy this
> communication. Any unauthorized copying, disclosure or distribution of
> the material in this communication is strictly forbidden. Any views or
> opinions presented in this email are solely those of the author and do
> not necessarily represent those of the company. Finally, the recipient
> should check this email and any attachments for the presence of
> viruses. The company accepts no liability for any damage caused by any virus 
> transmitted by this email.

>

NOTICE TO RECIPIENTS: This communication may contain confidential and/or 
privileged information. If you are not the intended recipient (or have received 
this communication in error) please notify the sender and 
it-supp...@shoregrp.com immediately, and destroy this communication. Any 
unauthorized copying, disclosure or distribution of the material in this 
communication is strictly forbidden. Any views or opinions presented in this 
email are solely those of the author and do not necessarily represent those of 
the company. Finally, the recipient should check this email and any attachments 
for the presence of viruses. The company accepts no liability for any damage 
caused by any virus transmitted by this email.


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-26 Thread John Bickerstaff
Ahh - for question #3 I may have spoken too soon.  This line from the
github repository readme suggests a way.

Update: We have tested to run with the jar in $SOLR_HOME/lib as well, and
it works (Jetty).

I'll try that and only respond back if that doesn't work.

Questions 1 and 2 still stand of course...  If anyone on the list has
experience in this area...

Thanks.

On Thu, May 26, 2016 at 10:25 AM, John Bickerstaff  wrote:

> Hi all,
>
> I'm creating a Solr Cloud that will index and search medical text.
> Multi-word synonyms are a pretty important factor.
>
> I find that there are some challenges around multi-word synonyms and I
> also found on the wiki that there is a recommended 3rd-party parser
> (synonym_edismax parser) created by Nolan Lawson and found here:
> https://github.com/healthonnet/hon-lucene-synonyms
>
> Here's the thing - the instructions on the github site involve bringing
> the jar file into the war file - which is not applicable any more... at
> least I think it's not...
>
> I have three questions:
>
> 1. Is this still a good solution for multi-word synonyms (I.e. Solr Cloud
> doesn't break it in some way)
> 2. Is there a tool or plug-in out there that the contributors would
> recommend above this one?
> 3. Assuming 1 = yes and 2 = no, can anyone tell me an updated procedure
> for bringing it in to Solr Cloud (I'm running 5.4.x)
>
> Thanks
>


Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-26 Thread John Bickerstaff
Hi all,

I'm creating a Solr Cloud that will index and search medical text.
Multi-word synonyms are a pretty important factor.

I find that there are some challenges around multi-word synonyms and I also
found on the wiki that there is a recommended 3rd-party parser
(synonym_edismax parser) created by Nolan Lawson and found here:
https://github.com/healthonnet/hon-lucene-synonyms

Here's the thing - the instructions on the github site involve bringing the
jar file into the war file - which is not applicable any more... at least I
think it's not...

I have three questions:

1. Is this still a good solution for multi-word synonyms (I.e. Solr Cloud
doesn't break it in some way)
2. Is there a tool or plug-in out there that the contributors would
recommend above this one?
3. Assuming 1 = yes and 2 = no, can anyone tell me an updated procedure for
bringing it in to Solr Cloud (I'm running 5.4.x)

Thanks


import efficiencies

2016-05-26 Thread John Blythe
hi all,

i've got layered entities in my solr import. it's calling on some
transactional data from a MySQL instance. there are two fields that are
used to then lookup other information from other tables via their related
UIDs, one of which has its own child entity w yet another select statement
to grab up more data.

it fetches at about 120/s but processes at ~50-60/s. we currently only have
close to 500k records, but it's growing quickly and thus is becoming
increasingly painful to make modifications due to the reimport that needs
to then occur.

i feel like i'd seen some threads regarding commits of new data,
master/slave, or solrcloud/sharding that could help in some ways related to
this but as of yet can't scrounge them up w my searches (ironic :p).

can someone help by pointing me to some good material related to this sort
of thing?

thanks-


Re: "data import handler : import data from sql database :how to search in all fields"

2016-05-26 Thread kostali hassan
> thank you, it makes sense.
have a good day

2016-05-26 15:31 GMT+01:00 Siddhartha Singh Sandhu :

> The schema.xml/managed_schema defines the default search field as `text`.
>
> You can make all fields that you want searchable type `text`.
>
> On Thu, May 26, 2016 at 10:23 AM, kostali hassan <
> med.has.kost...@gmail.com>
> wrote:
>
> > I import data from sql databases with DIH. I am looking for a search term
> > in all fields, not by field.
> >
>


Re: "data import handler : import data from sql database :how to search in all fields"

2016-05-26 Thread Siddhartha Singh Sandhu
The schema.xml/managed_schema defines the default search field as `text`.

You can make all fields that you want searchable type `text`.

On Thu, May 26, 2016 at 10:23 AM, kostali hassan 
wrote:

> I import data from sql databases with DIH. I am looking for a search term in
> all fields, not by field.
>


Re: sort by custom function of similarity score

2016-05-26 Thread Joel Bernstein
Reranking is done after the collapse. So you'll get the original score in
cscore()

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, May 26, 2016 at 12:56 PM, aanilpala  wrote:

> with cscore() in collapse, will I get the similarity score from lucene or
> the
> reranked score by the raranker if I am using a plugin that reranks the
> results? I guess the answer depends on which of fq or rq is applied first.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/sort-by-custom-function-of-similarity-score-tp4279228p4279233.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Can a DocTransformer access the whole results tree?

2016-05-26 Thread Upayavira
Looking at the code for a sample DocTransformer, it seems that a
DocTransformer only has access to the document itself, not to the whole
results. Because of this, it isn't possible to use a DocTransformer to
merge, for example, the highlighting results into the main document.

Am I missing something?

Upayavira


"data import handler : import data from sql database :how to search in all fields"

2016-05-26 Thread kostali hassan
I import data from sql databases with DIH. I am looking for a search term in
all fields, not by field.


Metadata and HTML ending up in searchable text

2016-05-26 Thread Simon Blandford

Hi,

I am using Solr 6.0 on Ubuntu 14.04.

I am ending up with loads of junk in the text body.

The JSON output of a search result shows the indexed text starting
with...
body_txt_en: " stream_size 36499 X-Parsed-By 
org.apache.tika.parser.DefaultParser X-Parsed-By"


And then once it gets to the actual text I get CSS class names appearing 
that were in  or  tags etc.
e.g. "the power of calibre3 silence calibre2 and", where 
"calibre3" etc are the CSS class names.


All this junk is searchable and is polluting the index.

I would like to index _only_ the actual content I am interested in 
searching for.


Steps to reproduce:

1) Solr installed by untaring solr tgz in /opt.

2) Core created by typing "bin/solr create -c mycore"

3) Solr started with bin/solr start

4) TXT document indexed using the following command:
curl "http://localhost:8983/solr/mycore/update/extract?literal.id=doc1&uprefix=attr_&fmap.content=body_txt_en&commit=true" \
  -F "content/UsingMailingLists.txt=@/home/user/Documents/library/UsingMailingLists.txt"


5) HTML document indexed using the following command:
curl "http://localhost:8983/solr/mycore/update/extract?literal.id=doc2&uprefix=attr_&fmap.content=body_txt_en&commit=true" \
  -F "content/UsingMailingLists.html=@/home/user/Documents/library/UsingMailingLists.html"


6) Query using URL:
http://localhost:8983/solr/mycore/select?q=especially&wt=json


Result:

For the txt file, I get the following JSON for the document...

{
id: "doc1",
attr_stream_size: [
"8107"
],
attr_x_parsed_by: [
"org.apache.tika.parser.DefaultParser",
"org.apache.tika.parser.txt.TXTParser"
],
attr_stream_content_type: [
"text/plain"
],
attr_stream_name: [
"UsingMailingLists.txt"
],
attr_stream_source_info: [
"content/UsingMailingLists.txt"
],
attr_content_encoding: [
"ISO-8859-1"
],
attr_content_type: [
"text/plain; charset=ISO-8859-1"
],
body_txt_en: " stream_size 8107 X-Parsed-By 
org.apache.tika.parser.DefaultParser X-Parsed-By 
org.apache.tika.parser.txt.TXTParser stream_content_type text/plain 
stream_name UsingMailingLists.txt stream_source_info 
content/UsingMailingLists.txt Content-Encoding ISO-8859-1 Content-Type 
text/plain; charset=ISO-8859-1 Search: [value ] [Titles] [Text] 
Solr_Wiki Login ** UsingMailingLists ** * FrontPage * 
RecentChanges...etc",

_version_: 1535398235801124900
}

For the HTML file,  I get the following JSON for the document...

{
id: "doc2",
attr_stream_size: [
"20440"
],
attr_x_parsed_by: [
"org.apache.tika.parser.DefaultParser",
"org.apache.tika.parser.html.HtmlParser"
],
attr_stream_content_type: [
"text/html"
],
attr_stream_name: [
"UsingMailingLists.html"
],
attr_stream_source_info: [
"content/UsingMailingLists.html"
],
attr_dc_title: [
"UsingMailingLists - Solr Wiki"
],
attr_content_encoding: [
"UTF-8"
],
attr_robots: [
"index,nofollow"
],
attr_title: [
"UsingMailingLists - Solr Wiki"
],
attr_content_type: [
"text/html; charset=utf-8"
],
body_txt_en: " stylesheet text/css utf-8 all 
/wiki/modernized/css/common.css stylesheet text/css utf-8 screen 
/wiki/modernized/css/screen.css stylesheet text/css utf-8 print 
/wiki/modernized/css/print.css stylesheet text/css utf-8 projection 
/wiki/modernized/css/projection.css alternate Solr Wiki: 
UsingMailingLists 
/solr/UsingMailingLists?diffs=1_att=1=rss_rc=0=UsingMailingLists=1 
application/rss+xml Start /solr/FrontPage Alternate Wiki Markup 
/solr/UsingMailingLists?action=raw Alternate print Print View 
/solr/UsingMailingLists?action=print Search /solr/FindPage Index 
/solr/TitleIndex Glossary /solr/WordIndex Help /solr/HelpOnFormatting 
stream_size 20440 X-Parsed-By org.apache.tika.parser.DefaultParser 
X-Parsed-By org.apache.tika.parser.html.HtmlParser stream_content_type 
text/html stream_name UsingMailingLists.html stream_source_info...etc",

_version_: 1535398408383103000
}





RE: how can we use multi term search along with stop words

2016-05-26 Thread Preeti Bhat
Hi Ahmet & Sid,

Thanks for the reply

I have the below requirement
1) If I search with, say, company_nm:*llc*, then we should not return any results,
or only a few results where llc is embedded in other words like tollc soon. So I
had implemented the stopwords.
2) But if I search with, say, company_nm:*google llc*, then it should return the
result of google llc and so on.

The problem here is 1st part is working perfectly, while the second part is not 
working.


Thanks and Regards,
Preeti Bhat
Shore Group Associates LLC
(C) +91-996-644-8187
www.ShoreGroupAssociates.com

-Original Message-
From: Siddhartha Singh Sandhu [mailto:sandhus...@gmail.com]
Sent: Thursday, May 26, 2016 6:54 PM
To: solr-user@lucene.apache.org; Ahmet Arslan
Subject: Re: how can we use multi term search along with stop words

Hi Preeti,

You can use the analysis tool in the Solr console to see how your queries are 
being tokenized. Based on your results you might need to make changes in 
"strings_ci".

Also, If you want to be able to search on stopwords you might want to remove 
solr.StopFilterFactory from indexing and query analyzer of "strings_ci". The 
stopwords.txt is present in the core conf directory. You will need to re-index 
after you make these changes.

Regards,

Sid.


On Thu, May 26, 2016 at 7:26 AM, Ahmet Arslan 
wrote:

> Hi Bhat,
>
> What do you mean by multi term search?
> In your first e-mail, your example uses quotes, which means
> phrase/proximity search.
>
> ahmet
>
>
>
> On Thursday, May 26, 2016 11:49 AM, Preeti Bhat
> 
> wrote:
> HI All,
>
> Sorry for asking the same question again, but could someone please
> advise me on this.
>
>
> Thanks and Regards,
> Preeti Bhat
>
>
> From: Preeti Bhat
> Sent: Wednesday, May 25, 2016 2:22 PM
> To: solr-user@lucene.apache.org
> Subject: how can we use multi term search along with stop words
>
> HI,
>
> I am trying to search the field named company_nm with value "Google llc".
> We have the stopword on "llc", so when I try to search it returns 0
> results. Could anyone please guide me through the process of using
> stopwords in multi term search.
>
> Please note I am using solr 6.0.0 and using standard parser.
>
> <fieldType name="strings_ci" ...>
>   <analyzer type="index">
>     ...
>     <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>   </analyzer>
>   <analyzer type="query">
>     ...
>     <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>   </analyzer>
> </fieldType>
>
> <field name="company_nm" type="strings_ci" ... stored="true"/>
>
>
> Thanks and Regards,
> Preeti Bhat
>
>
>
> NOTICE TO RECIPIENTS: This communication may contain confidential
> and/or privileged information. If you are not the intended recipient
> (or have received this communication in error) please notify the
> sender and it-supp...@shoregrp.com immediately, and destroy this
> communication. Any unauthorized copying, disclosure or distribution of
> the material in this communication is strictly forbidden. Any views or
> opinions presented in this email are solely those of the author and do
> not necessarily represent those of the company. Finally, the recipient
> should check this email and any attachments for the presence of
> viruses. The company accepts no liability for any damage caused by any virus 
> transmitted by this email.
>

NOTICE TO RECIPIENTS: This communication may contain confidential and/or 
privileged information. If you are not the intended recipient (or have received 
this communication in error) please notify the sender and 
it-supp...@shoregrp.com immediately, and destroy this communication. Any 
unauthorized copying, disclosure or distribution of the material in this 
communication is strictly forbidden. Any views or opinions presented in this 
email are solely those of the author and do not necessarily represent those of 
the company. Finally, the recipient should check this email and any attachments 
for the presence of viruses. The company accepts no liability for any damage 
caused by any virus transmitted by this email.




Re: how can we use multi term search along with stop words

2016-05-26 Thread Siddhartha Singh Sandhu
Hi Preeti,

You can use the analysis tool in the Solr console to see how your queries
are being tokenized. Based on your results you might need to make changes
in "strings_ci".

Also, If you want to be able to search on stopwords you might want to
remove solr.StopFilterFactory from indexing and query analyzer of
"strings_ci". The stopwords.txt is present in the core conf directory. You
will need to re-index after you make these changes.
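
That is, each analyzer in the strings_ci type would drop its stop filter line,
something like (a sketch; the tokenizer and any other filters stay whatever
your type already uses):

<analyzer type="index">
  ...
  <!-- removed: <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/> -->
</analyzer>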

Regards,

Sid.


On Thu, May 26, 2016 at 7:26 AM, Ahmet Arslan 
wrote:

> Hi Bhat,
>
> What do you mean by multi term search?
> In your first e-mail, your example uses quotes, which means
> phrase/proximity search.
>
> ahmet
>
>
>
> On Thursday, May 26, 2016 11:49 AM, Preeti Bhat 
> wrote:
> HI All,
>
> Sorry for asking the same question again, but could someone please advise
> me on this.
>
>
> Thanks and Regards,
> Preeti Bhat
>
>
> From: Preeti Bhat
> Sent: Wednesday, May 25, 2016 2:22 PM
> To: solr-user@lucene.apache.org
> Subject: how can we use multi term search along with stop words
>
> HI,
>
> I am trying to search the field named company_nm with value "Google llc".
> We have the stopword on "llc", so when I try to search it returns 0
> results. Could anyone please guide me through the process of using
> stopwords in multi term search.
>
> Please note I am using solr 6.0.0 and using standard parser.
>
> <fieldType name="strings_ci" ...>
>   <analyzer type="index">
>     ...
>     <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>   </analyzer>
>   <analyzer type="query">
>     ...
>     <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>   </analyzer>
> </fieldType>
>
> <field name="company_nm" type="strings_ci" ... stored="true"/>
>
>
> Thanks and Regards,
> Preeti Bhat
>
>
>
> NOTICE TO RECIPIENTS: This communication may contain confidential and/or
> privileged information. If you are not the intended recipient (or have
> received this communication in error) please notify the sender and
> it-supp...@shoregrp.com immediately, and destroy this communication. Any
> unauthorized copying, disclosure or distribution of the material in this
> communication is strictly forbidden. Any views or opinions presented in
> this email are solely those of the author and do not necessarily represent
> those of the company. Finally, the recipient should check this email and
> any attachments for the presence of viruses. The company accepts no
> liability for any damage caused by any virus transmitted by this email.
>


Re: debugging solr query

2016-05-26 Thread Jay Potharaju
Hi,
Thanks for the feedback. The queries I run are very basic filter queries
with some sorting.

q=*:*&fq=(dt1:[date1 TO *] && dt2:[* TO NOW/DAY+1]) && fieldA:abc &&
fieldB:(123 OR 456)&sort=dt1 asc,field2 asc,fieldC desc

I noticed that the date fields(dt1,dt2) are using date instead of tdate
fields & there are no docValues set on any of the fields used for sorting.

In order to fix this I plan to add new fields using tdate & docValues where
required to the schema & update the new fields only for documents that have
fieldA set to abc (a sketch of the field definitions follows the questions
below). Once the fields are updated, query on the new fields to measure query
performance.


   - Would the newly added fields be used effectively by the solr index when
   querying & filtering? What I am not sure of is whether populating only a small
   number of documents (fieldA:abc) that are used for the above query provides
   performance benefits.
   - Would there be a performance penalty because the majority of the
   documents (!fieldA:abc) don't have values in the new fields?
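
The new definitions might look something like this (field names are
illustrative; tdate is the trie-based date type from the stock schema):

<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
<field name="dt1_tdt" type="tdate" indexed="true" stored="true" docValues="true"/>
<field name="dt2_tdt" type="tdate" indexed="true" stored="true" docValues="true"/>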

Thanks

On Wed, May 25, 2016 at 8:40 PM, Jay Potharaju 
wrote:

> Any links that illustrate and talk about solr internals and how
> indexing/querying works would be a great help.
> Thanks
> Jay
>
> On Wed, May 25, 2016 at 6:30 PM, Jay Potharaju 
> wrote:
>
>> Hi,
>> Thanks for the feedback. The queries I run are very basic filter queries
>> with some sorting.
>>
>> q=*:*&fq=(dt1:[date1 TO *] && dt2:[* TO NOW/DAY+1]) && fieldA:abc &&
>> fieldB:(123 OR 456)&sort=dt1 asc,field2 asc,fieldC desc
>>
>> I noticed that the date fields(dt1,dt2) are using date instead of tdate
>> fields & there are no docValues set on any of the fields used for sorting.
>>
>> In order to fix this I plan to add new fields using tdate & docValues
>> where required to the schema & update the new fields only for documents
>> that have fieldA set to abc. Once the fields are updated, query on the new
>> fields to measure query performance.
>>
>>
>>- Would the newly added fields be used effectively by the solr index
>>when querying & filtering? What I am not sure of is whether populating
>>only a small number of documents (fieldA:abc) that are used for the above
>>query provides performance benefits.
>>- Would there be a performance penalty because the majority of the
>>documents (!fieldA:abc) don't have values in the new fields?
>>
>>
>> Thanks
>> Jay
>>
>> On Tue, May 24, 2016 at 8:06 PM, Erick Erickson 
>> wrote:
>>
>>> Try adding debug=timing, that'll give you an idea of what component is
>>> taking all the time.
>>> From there, it's "more art than science".
>>>
>>> But you haven't given us much to go on. What is the query? Are you
>>> grouping?
>>> Faceting on high-cardinality fields? Returning 10,000 rows?
>>>
>>> Best,
>>> Erick
>>>
>>> On Tue, May 24, 2016 at 4:52 PM, Ahmet Arslan 
>>> wrote:
>>> >
>>> >
>>> > Hi,
>>> >
>>> > Is it QueryComponent taking time?
>>> > Ot other components?
>>> >
>>> > Also make sure there is plenty of RAM for OS cache.
>>> >
>>> > Ahmet
>>> >
>>> > On Wednesday, May 25, 2016 1:47 AM, Jay Potharaju <
>>> jspothar...@gmail.com> wrote:
>>> >
>>> >
>>> >
>>> > Hi,
>>> > I am trying to debug solr performance problems on an old version of
>>> solr,
>>> > 4.3.1.
>>> > The queries are taking really long -in the range of 2-5 seconds!!.
>>> > Running filter query with only one condition also takes about a second.
>>> >
>>> > There is memory available on the box for solr to use. I have been
>>> looking
>>> > at the following link but was looking for some more reference that
>>> would
>>> > tell me why a particular query is slow.
>>> >
>>> > https://wiki.apache.org/solr/SolrPerformanceProblems
>>> >
>>> > Solr version:4.3.1
>>> > Index size:128 GB
>>> > Heap:65 GB
>>> > Index size:75 GB
>>> > Memory usage:70 GB
>>> >
>>> > Even though there is available memory is high all is not being used ..i
>>> > would expect the complete index to be in memory but it doesnt look
>>> like it
>>> > is. Any recommendations ??
>>> >
>>> > --
>>> > Thanks
>>> > Jay
>>>
>>
>>
>>
>> --
>> Thanks
>> Jay Potharaju
>>
>>
>
>
>
> --
> Thanks
> Jay Potharaju
>
>



-- 
Thanks
Jay Potharaju


DocTransformer [explain] not working in Solr 5

2016-05-26 Thread Charles Sanders
Not able to get the DocTransformer [explain] to work in Solr 5. I'm sure I'm 
doing something wrong. But I'm following the example in the documentation. 
https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents 

Other transformers ( [docid] and [shard] ) are working as expected. Anyone know 
how to make this work? I've used it in the past with Solr 4. Any information 
greatly appreciated. Query and results follow. 

Thanks, 
Charles 

http://localhost:8983/solr/access_shard1_replica1/select?q=generic&rows=1&fl=id,[docid],[shard],[explain style=nl]&wt=json&indent=true&defType=edismax&qf=allTitle^3 allText^2 contents&debugQuery=true


{
  "responseHeader":{
"status":0,
"QTime":13,
"params":{
  "q":"generic\n",
  "defType":"edismax",
  "indent":"true",
  "qf":"allTitle^3 allText^2 contents",
  "fl":"id,[docid],[shard],[explain style=nl]",
  "rows":"1",
  "wt":"json",
  "debugQuery":"true"}},
  "response":{"numFound":5,"start":0,"maxScore":2.890213,"docs":[
  {
"id":"99",
"[docid]":70,
"[shard]":"http://10.13.49.9:8983/solr/access_shard1_replica1/"}]
  },
  "spellcheck":{
"suggestions":[],
"collations":[]},
  "debug":{
"track":{
  "rid":"-access_shard1_replica1-1464265350378-8",
  "EXECUTE_QUERY":{
"http://10.13.49.9:8983/solr/access_shard1_replica1/":{
  "QTime":"1",
  "ElapsedTime":"5",
  "RequestPurpose":"GET_TOP_IDS",
  "NumFound":"5",
  
"Response":"{responseHeader={status=0,QTime=1,params={df=allText,distrib=false,spellcheck.dictionary=andreasAutoComplete,fl=[uri,
 
score],shards.purpose=4,spellcheck.maxCollations=5,fsv=true,shard.url=http://10.13.49.9:8983/solr/access_shard1_replica1/,rid=-access_shard1_replica1-1464265350378-8,defType=edismax,qf=allTitle^3
 allText^2 contents,wt=javabin,debug=[false, timing, 
track],qt=/select,start=0,rows=1,version=2,shards.qt=/select,q=generic\n,enableElevation=false,spellcheck=true,requestPurpose=GET_TOP_IDS,NOW=1464265350377,spellcheck.onlyMorePopular=true,isShard=true,spellcheck.count=5,debugQuery=false,spellcheck.collate=true}},response={numFound=5,start=0,maxScore=2.890213,docs=[SolrDocument{uri=http://foo/bar/Generic/99,
 
score=2.890213}]},sort_values={},spellcheck={suggestions={},collations={},originalTerms=[generic]},debug={timing={time=1.0,prepare={time=0.0,query={time=0.0},facet={time=0.0},facet_module={time=0.0},mlt={time=0.0},highlight={time=0.0},stats={time=0.0},expand={time=0.0},elevator={time=0.0},autoComplete={time=0.0},debug={time=0.0}},process={time=0.0,query={time=0.0},facet={time=0.0},facet_module={time=0.0},mlt={time=0.0},highlight={time=0.0},stats={time=0.0},expand={time=0.0},elevator={time=0.0},autoComplete={time=0.0},debug={time=0.0}"}},
  "GET_FIELDS":{
"http://10.13.49.9:8983/solr/access_shard1_replica1/":{
  "QTime":"2",
  "ElapsedTime":"4",
  "RequestPurpose":"GET_FIELDS,GET_DEBUG",
  "NumFound":"1",
  
"Response":"{responseHeader={status=0,QTime=2,params={df=allText,distrib=false,spellcheck.dictionary=andreasAutoComplete,fl=[id,[docid],[shard],[explain
 style=nl], 
uri],shards.purpose=320,spellcheck.maxCollations=5,shard.url=http://10.13.49.9:8983/solr/access_shard1_replica1/,rid=-access_shard1_replica1-1464265350378-8,defType=edismax,qf=allTitle^3
 allText^2 contents,wt=javabin,debug=[timing, 
track],qt=/select,rows=1,version=2,shards.qt=/select,q=generic\n,enableElevation=false,spellcheck=false,requestPurpose=GET_FIELDS,GET_DEBUG,NOW=1464265350377,spellcheck.onlyMorePopular=true,ids=http://foo/bar/Generic/99,isShard=true,spellcheck.count=5,debugQuery=true,spellcheck.collate=true}},response={numFound=1,start=0,docs=[SolrDocument{id=99,
 [docid]=70, 
[shard]=http://10.13.49.9:8983/solr/access_shard1_replica1/}]},debug={rawquerystring=generic\n,querystring=generic\n,parsedquery=(+DisjunctionMaxQuery((allTitle:gener^3.0
 | contents:gener | 
allText:gener^2.0)))/no_coord,parsedquery_toString=+(allTitle:gener^3.0 | 
contents:gener | 
allText:gener^2.0),explain={http://foo/bar/Generic/99=\n2.890213 = max of:\n  
2.890213 = weight(allTitle:gener^3.0 in 70) [DefaultSimilarity], result of:\n   
 2.890213 = fieldWeight in 70, product of:\n  1.0 = tf(freq=1.0), with freq 
of:\n1.0 = termFreq=1.0\n  4.624341 = idf(docFreq=1, maxDocs=75)\n  
0.625 = fieldNorm(doc=70)\n  0.33601448 = weight(contents:gener in 70) 
[DefaultSimilarity], result of:\n0.33601448 = score(doc=70,freq=1.0), 
product of:\n  0.2541428 = queryWeight, product of:\n3.5257287 = 
idf(docFreq=5, maxDocs=75)\n0.07208234 = queryNorm\n  1.3221483 = 
fieldWeight in 70, product of:\n1.0 = tf(freq=1.0), with freq of:\n 
 1.0 = termFreq=1.0\n3.5257287 = idf(docFreq=5, maxDocs=75)\n   
 0.375 = 

Re: join and faceting

2016-05-26 Thread Zaccheo Bagnati
Thank you for your answer but I'm not sure I've understood: document.type
is not in the same core as annotations, how can I facet on that field?

Il giorno gio 26 mag 2016 alle ore 14:06 Upayavira  ha
scritto:

>
>
> On Thu, 26 May 2016, at 01:02 PM, Zaccheo Bagnati wrote:
> > Hi all,
> > I have a SOLR core containing documents:
> >   document (id, type, text)
> > and a core containing annotations (each document has 0 or more
> > annotations):
> > annotation (id, document_id, user, text)
> >
> > I can filter annotations on document fields using JoinQueryParser but how
> > can I create a faceting? Let's say I want to build a faceting on
> > document.type counting how many annotations there are per each document
> > type.
> > how would you deal with such a case in SOLR? Is there a better data
> > design
> > to obtain that result?
>
> Just do the facet query on the annotations collection. A pivot facet on
> document id and type would give you what you want. Or, use a join to the
> docs collection to limit the number of documents you are faceting on.
>


Re: join and faceting

2016-05-26 Thread Upayavira


On Thu, 26 May 2016, at 01:02 PM, Zaccheo Bagnati wrote:
> Hi all,
> I have a SOLR core containing documents:
>   document (id, type, text)
> and a core containing annotations (each document has 0 or more
> annotations):
> annotation (id, document_id, user, text)
> 
> I can filter annotations on document fields using JoinQueryParser but how
> can I create a faceting? Let's say I want to build a faceting on
> document.type counting how many annotations there are per each document
> type.
> how would you deal with such a case in SOLR? Is there a better data
> design
> to obtain that result?

Just do the facet query on the annotations collection. A pivot facet on
document id and type would give you what you want. Or, use a join to the
docs collection to limit the number of documents you are faceting on.
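
For instance, the join-to-limit variant might look something like this (core
and field names taken from your description; the type value "report" and the
single-node setup that cross-core joins require are assumptions):

http://localhost:8983/solr/annotation/select?q=*:*
  &fq={!join fromIndex=document from=id to=document_id}type:report
  &facet=true&facet.field=document_id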


join and faceting

2016-05-26 Thread Zaccheo Bagnati
Hi all,
I have a SOLR core containing documents:
  document (id, type, text)
and a core containing annotations (each document has 0 or more annotations):
annotation (id, document_id, user, text)

I can filter annotations on document fields using JoinQueryParser but how
can I create a faceting? Let's say I want to build a faceting on
document.type counting how many annotations there are per each document
type.
how would you deal with such a case in SOLR? Is there a better data design
to obtain that result?
Thank you

Zaccheo


Re: sort by custom function of similarity score

2016-05-26 Thread aanilpala
with cscore() in collapse, will I get the similarity score from lucene or the
reranked score by the raranker if I am using a plugin that reranks the
results? I guess the answer depends on which of fq or rq is applied first.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/sort-by-custom-function-of-similarity-score-tp4279228p4279233.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: sort by custom function of similarity score

2016-05-26 Thread Joel Bernstein
Also if you're using the min/max param within a collapse you can use the
cscore() function, which is much more efficient than the query() function.
But cscore() is only available within the context of a collapse, to select
the group head. Outside of the collapse, query() is the approach.
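
For example, to keep the highest-scoring document per user while collapsing
(field name borrowed from the earlier collapse thread):

fq={!collapse field=user_id max=cscore()}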

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, May 26, 2016 at 12:32 PM, Ahmet Arslan 
wrote:

> Hi,
>
> Probably, using the 'query' function query, which returns the score of a
> given query.
>
> https://cwiki.apache.org/confluence/display/solr/Function+Queries#FunctionQueries-UsingFunctionQuery
>
>
>
>
> On Thursday, May 26, 2016 1:59 PM, aanilpala  wrote:
> is it allowed to provide a sort function (sortspec) that is using
> similarity
> score. for example something like in the following:
>
> sort=product(2,score) desc
>
> seems that it won't work. is there an alternative way to achieve this?
>
> using solr6
>
> thanks in advance.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/sort-by-custom-function-of-similarity-score-tp4279228.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: sort by custom function of similarity score

2016-05-26 Thread Ahmet Arslan
Hi,

Probably, using the 'query' function query, which returns the score of a given 
query.
https://cwiki.apache.org/confluence/display/solr/Function+Queries#FunctionQueries-UsingFunctionQuery
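
For example, something along these lines (a sketch; $qq refers to a separate
request parameter holding the query whose score you want to scale):

q=*:*&qq=text:hello&sort=product(query($qq),2) desc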




On Thursday, May 26, 2016 1:59 PM, aanilpala  wrote:
is it allowed to provide a sort function (sortspec) that is using similarity
score. for example something like in the following:

sort=product(2,score) desc

seems that it won't work. is there an alternative way to achieve this?

using solr6

thanks in advance.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/sort-by-custom-function-of-similarity-score-tp4279228.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: collapsing on a filter query

2016-05-26 Thread Joel Bernstein
No need for a new thread.

Yes, there can only be one ranking collector. I believe the effect of
having two rq's is that one would simply be ignored as you mentioned.

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, May 26, 2016 at 11:21 AM, aanilpala  wrote:

> thanks, it indeed works that way.
>
> I was curious if the same would work with rq but it seems not (from the
> results I can at least say that one reranker is ignored). Is there a way to
> combine two rq components?
>
> PS: I know this is now a different question; should I start a new thread?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/collapsing-on-a-filter-query-tp4279218p4279225.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: how can we use multi term search along with stop words

2016-05-26 Thread Ahmet Arslan
Hi Bhat,

What do you mean by multi term search?
In your first e-mail, your example uses quotes, which means phrase/proximity 
search.

ahmet



On Thursday, May 26, 2016 11:49 AM, Preeti Bhat  
wrote:
HI All,

Sorry for asking the same question again, but could someone please advise me on 
this.


Thanks and Regards,
Preeti Bhat


From: Preeti Bhat
Sent: Wednesday, May 25, 2016 2:22 PM
To: solr-user@lucene.apache.org
Subject: how can we use multi term search along with stop words

HI,

I am trying to search the field named company_nm with value "Google llc". We 
have the stopword on "llc", so when I try to search it returns 0 results. Could 
anyone please guide me through the process of using stopwords in multi term 
search.

Please note I am using solr 6.0.0 and using standard parser.

<fieldType name="strings_ci" ...>
  <analyzer type="index">
    ...
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  </analyzer>
  <analyzer type="query">
    ...
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>

<field name="company_nm" type="strings_ci" ... stored="true"/>

Thanks and Regards,
Preeti Bhat



NOTICE TO RECIPIENTS: This communication may contain confidential and/or 
privileged information. If you are not the intended recipient (or have received 
this communication in error) please notify the sender and 
it-supp...@shoregrp.com immediately, and destroy this communication. Any 
unauthorized copying, disclosure or distribution of the material in this 
communication is strictly forbidden. Any views or opinions presented in this 
email are solely those of the author and do not necessarily represent those of 
the company. Finally, the recipient should check this email and any attachments 
for the presence of viruses. The company accepts no liability for any damage 
caused by any virus transmitted by this email.


sort by custom function of similarity score

2016-05-26 Thread aanilpala
is it allowed to provide a sort function (sortspec) that is using similarity
score. for example something like in the following:

sort=product(2,score) desc

seems that it won't work. is there an alternative way to achieve this?

using solr6

thanks in advance.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/sort-by-custom-function-of-similarity-score-tp4279228.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: collapsing on a filter query

2016-05-26 Thread aanilpala
thanks, it indeed works that way.

I was curious if the same would work with rq but it seems not (from the
results I can at least say that one reranker is ignored). Is there a way to
combine two rq components? 

PS: I know this is now a different question; should I start a new thread?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/collapsing-on-a-filter-query-tp4279218p4279225.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: collapsing on a filter query

2016-05-26 Thread Joel Bernstein
Could you use two filter queries:

fq=is_valid:true&fq={!collapse field=user_id}

This syntax should work fine. It first filters the results based on
is_valid:true and then collapses the results.




Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, May 26, 2016 at 10:41 AM, aanilpala  wrote:

> hi there,
>
> I can't seem to find a way to collapse results on a filter query. For
> example, imagine that I have a query with filter is_valid:true. Now, if I
> want to collapse the results on a field other than is_valid (i.e. user_id),
> neither of the following works:
>
> fq=is_valid:true AND {!collapse field=user_id}
> fq={!collapse field=user_id}is_valid:true
>
> The first one causes a "Query does not implement createWeight" error, and in
> the result set of the second one the scores are extremely high (in the
> millions), far bigger than the scores of the documents in the non-collapsed
> version of the query.
>
> I am using solr6 without shards.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/collapsing-on-a-filter-query-tp4279218.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


RE: Unit tests, Session expired for ...state.json in AbstractFullDistribZkTestBase

2016-05-26 Thread Markus Jelsma
Also, tests sometimes fail with:
org.apache.solr.common.SolrException: No registered leader was found after 
waiting for 1ms , collection: collection1 slice: shard1

Despite having waitForThingsToLevelOut(45);

If anyone has a suggestion for this as well, it would be much appreciated :)

Markus
 
 
-Original message-
> From:Markus Jelsma 
> Sent: Thursday 26th May 2016 11:37
> To: solr-user 
> Subject: Unit tests, Session expired for ...state.json in 
> AbstractFullDistribZkTestBase
> 
> Hi,
> 
> We have a bunch of tests extending AbstractFullDistribZkTestBase on 6.0 and 
> our builds sometimes fail with the following message:
> 
> org.apache.solr.common.SolrException: Could not load collection from ZK: 
> collection1 
> at io.
> Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: 
> KeeperErrorCode = Session expired for /collections/collection1/state.json 
> at io
> 
> Strange thing is, the line where it failed is after 
> waitForThingsToLevelOut(45); and waitForRecoveriesToFinish(false); ánd having 
> indexed a few thousand documents! These sporadic failures are a nuisance 
> because it raises red flags in our Jenkins. Is there anything we can do to 
> prevent these things from happening? 
> 
> Many thanks!
> Markus
> 


collapsing on a filter query

2016-05-26 Thread aanilpala
hi there,

I can't seem to find a way to collapse results on a filter query. For
example, imagine that I have a query with filter is_valid:true. Now, if I
want to collapse the results on a field other than is_valid (i.e. user_id),
neither of the following works:

fq=is_valid:true AND {!collapse field=user_id}
fq={!collapse field=user_id}is_valid:true

The first one causes a "Query does not implement createWeight" error, and in
the result set of the second one the scores are extremely high (in the
millions), far bigger than the scores of the documents in the non-collapsed
version of the query.

I am using solr6 without shards.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/collapsing-on-a-filter-query-tp4279218.html
Sent from the Solr - User mailing list archive at Nabble.com.


Unit tests, Session expired for ...state.json in AbstractFullDistribZkTestBase

2016-05-26 Thread Markus Jelsma
Hi,

We have a bunch of tests extending AbstractFullDistribZkTestBase on 6.0 and our 
builds sometimes fail with the following message:

org.apache.solr.common.SolrException: Could not load collection from ZK: 
collection1 
at io.
Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: 
KeeperErrorCode = Session expired for /collections/collection1/state.json 
at io

The strange thing is, the line where it fails comes after 
waitForThingsToLevelOut(45); and waitForRecoveriesToFinish(false); ánd after 
having indexed a few thousand documents! These sporadic failures are a nuisance 
because they raise red flags in our Jenkins. Is there anything we can do to 
prevent them from happening? 

Many thanks!
Markus


RE: how can we use multi term search along with stop words

2016-05-26 Thread Preeti Bhat
Hi all,

Sorry for asking the same question again, but could someone please advise me on 
this.


Thanks and Regards,
Preeti Bhat

From: Preeti Bhat
Sent: Wednesday, May 25, 2016 2:22 PM
To: solr-user@lucene.apache.org
Subject: how can we use multi term search along with stop words

Hi,

I am trying to search the field named company_nm with the value "Google llc".
We have a stopword for "llc", so when I search, it returns 0 results. Could
anyone please guide me through the process of using stopwords in a multi-term
search?

Please note that I am using Solr 6.0.0 with the standard parser.


[the fieldType and field definitions for company_nm were stripped by the list
archive; only whitespace remains]

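A common way to keep a term like "llc" searchable is a field type for
company_nm whose analyzer does not apply a stop filter, at least at query
time. A hypothetical sketch, not the actual schema from this message:

<fieldType name="text_company" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- no solr.StopFilterFactory, so "llc" is kept at index and query time -->
  </analyzer>
</fieldType>
<field name="company_nm" type="text_company" indexed="true" stored="true"/>

Note that removing the stop filter only at query time still returns 0 results
if "llc" was already dropped at index time, so a change like this needs a
reindex of the field.
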
Thanks and Regards,
Preeti Bhat







Re: Issues with coordinates in Solr during updating of fields

2016-05-26 Thread Zheng Lin Edwin Yeo
Does anyone have a solution to this problem?

I tried removing the gps_0_coordinate and gps_1_coordinate fields, but then I
get the following error during indexing:
ERROR: [doc=id1] unknown field 'gps_0_coordinate'

Regards,
Edwin


On 25 May 2016 at 11:37, Zheng Lin Edwin Yeo  wrote:

> Hi,
>
> I have an implementation that stores coordinates in Solr during
> indexing.
> During indexing, I only store the value in the field named "gps". For
> the fields "gps_0_coordinate" and "gps_1_coordinate", the values are
> auto-filled and indexed from the "gps" field.
>
> [three <field ... required="false"/> definitions for gps, gps_0_coordinate,
> and gps_1_coordinate; the opening of each tag was stripped by the list
> archive]
>
> But when I try to update any other field in the index, Solr
> tries to add another value to the "gps_0_coordinate" and
> "gps_1_coordinate" fields. However, as these two fields are not multiValued,
> this leads to an error:
> multiple values encountered for non multiValued field gps_0_coordinate:
> [1.0,1.0]
>
> Does anyone knows how we can solve this issue?
>
> I am using Solr 5.4.0
>
> Regards,
> Edwin
>
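
A commonly reported cause of this symptom, offered as an assumption about the
schema above: when the LatLonType subfields are stored, an atomic update reads
the stored values back and adds them on top of the values regenerated from
"gps", which trips the non-multiValued check. The usual workaround is to keep
the subfields unstored, for example:

<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>

This assumes the subfields are only needed for searching on "gps" rather than
for retrieval, and the change requires reindexing.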


is API versioning supported in Solr?

2016-05-26 Thread Nuhaa All Bakry
Hello,

I am wondering if versioning is built into Solr. Say I have deployed a working
SolrCloud (v1.0) and there are applications consuming the REST APIs. Is there a
way to deploy the next v1.1 without removing v1.0? The reason I ask is that we
don't want the deployment of Solr to be tightly dependent on the deployment of
the applications, or vice versa.

I can't find any documentation on this (yet). Please share if you know where I
can read more about this.
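
One pattern that may fit, assuming v1.0/v1.1 refer to the index and its schema
rather than to Solr's own REST API: collection aliases. Clients query a stable
alias while the underlying collection is rebuilt and swapped. A sketch with
illustrative names:

# build products_v1_1 as a new collection, index into it, then repoint:
http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=products&collections=products_v1_1

Applications keep querying /solr/products/select throughout, so neither
deployment has to wait for the other.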


regards,
nuhaa