Exact match

2019-12-02 Thread OTH
Hello,

What would be the best way to get exact matches (if any) to a query?

E.g., let's say the document text is "united states of america".
Currently, any query containing one or more of the three words "united",
"states", or "america" will match the above document.  I would like the
document to match if and only if the query is also
"united states of america" (case-insensitive).

Document field type:  TextField
Index Analyzer: TokenizerChain
Index Tokenizer: StandardTokenizerFactory
Index Token Filters: StopFilterFactory, LowerCaseFilterFactory,
SnowballPorterFilterFactory
The Query Analyzer / Tokenizer / Token Filters are the same as the Index
ones above.

FYI, I'm a relative novice at Solr / Lucene / search.

Much appreciated
Omer
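
A minimal sketch of the distinction being asked about, in plain Python rather than Solr. It assumes a simplified stand-in for the analysis chain (lowercasing and whitespace splitting only, with no stop words or stemming), and the function names are hypothetical. Term matching succeeds when any query token occurs in the document; the exact match wanted here compares the whole normalised string. One common way to get that in Solr, not confirmed anywhere in this thread, is to search an untokenized, lowercased copy of the field.

```python
def tokens(text):
    """Simplified stand-in for an analyzer: lowercase, split on whitespace."""
    return text.lower().split()

def term_match(doc, query):
    """True if any query token occurs in the document (OR-style matching)."""
    doc_tokens = set(tokens(doc))
    return any(t in doc_tokens for t in tokens(query))

def exact_match(doc, query):
    """True only if the whole normalised strings are identical."""
    return " ".join(tokens(doc)) == " ".join(tokens(query))

doc = "united states of america"
print(term_match(doc, "United"))                     # term-level match
print(exact_match(doc, "United"))                    # not an exact match
print(exact_match(doc, "United States of America"))  # case-insensitive exact match
```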


Score certain documents higher based on a weight field

2018-04-09 Thread OTH
Hello,

Is there a way to assign a higher score to certain documents based on a
'weight' field?


E.g., if I have the following two documents:
{
"name":"United Kingdom",
"weight":2730
}
{
"name":"United States of America",
"weight":11246
}

Currently, if I issue the following query:
q=name:united

These are the scores I get:
[{
"name":"United Kingdom",
"weight":2730,
"score":9.464103},
{
"name":"United States of America",
"weight":11246,
"score":7.766276}]


However, I'd like the score to somehow factor in the number in the "weight"
column.  (And hence, increase the score assigned to "United States of
America" in this case.)

Much thanks
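
One possible approach, offered as a sketch and not confirmed in this thread: the edismax parser accepts a `boost` parameter whose function value is multiplied into the score. The Python below only re-ranks the two scores quoted above to show that the choice of function matters; the use of sqrt(weight), and of `qf=name`, are assumptions, and real Solr scores may differ.

```python
import math
from urllib.parse import urlencode

# Scores and weights copied from the post above.
docs = {
    "United Kingdom": (9.464103, 2730),
    "United States of America": (7.766276, 11246),
}

def rerank(boost_fn):
    """Multiply each base score by boost_fn(weight); return names, best first."""
    boosted = {name: score * boost_fn(weight)
               for name, (score, weight) in docs.items()}
    return sorted(boosted, key=boosted.get, reverse=True)

# A log10 boost is too gentle to change the order of these two documents...
print(rerank(math.log10))
# ...while a square-root boost moves the heavier document to the top.
print(rerank(math.sqrt))

# Request parameters one might send; sqrt(weight) is an assumed choice.
params = urlencode({"defType": "edismax", "q": "united", "qf": "name",
                    "boost": "sqrt(weight)", "fl": "name,weight,score"})
print(params)
```

Whether a gentle or a steep function is appropriate depends on how widely the weights are spread across the index.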


Re: Solr JDBC with Core (vs Collection)

2017-10-16 Thread OTH
Hello,
Sorry for continuing this thread after such a long time.
I just wanted to check whether streaming expressions / SQL now work in
non-SolrCloud mode in the latest Solr release.
Much thanks
Omer

On Thu, Mar 9, 2017 at 1:27 AM, Joel Bernstein <joels...@gmail.com> wrote:

> Getting streaming expression and SQL working in non-SolrCloud mode is my
> top priority right now.
>
> I'm testing the first parts of
> https://issues.apache.org/jira/browse/SOLR-10200 today and will be
> committing soon. The first functionality delivered will be the
> significantTerms Streaming Expression. Here is a sample query:
>
> expr=significantTerms(enron, q="from:tana.jo...@enron.com", field="to",
> limit="20")&enron.shards=http://localhost:8983/solr/enron
>
> Notice the enron.shards http param. This provides the shards for the
> "enron" collection.
>
> This will release as part of the first release of the significantTerms
> expression in Solr 6.5.
>
> Solr 6.6 will likely have support for all stream source and parallel
> SQL/JDBC.
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, Mar 8, 2017 at 2:19 PM, OTH <omer.t@gmail.com> wrote:
>
> > Hello,
> >
> > Yes, I was trying to use it with a non-cloud setup.
> >
> > Basically, our application probably won't be requiring cloud features;
> > however, it would be extremely helpful to use JDBC with Solr.
> >
> > Of course, we don't mind using SolrCloud if that's what is needed for
> JDBC.
> >
> > Are there any drawbacks to using SolrCloud, if a distributed setup
> probably
> > won't be required?
> >
> > Much thanks
> >
> > On Thu, Mar 9, 2017 at 12:13 AM, Alexandre Rafalovitch <
> arafa...@gmail.com
> > >
> > wrote:
> >
> > > I believe JDBC requires streams, which requires SolrCloud, which
> > > requires Collections (even if it is a single-core collection).
> > >
> > > Are you trying to use it with non-cloud setup?
> > >
> > > Regards,
> > >Alex.
> > > 
> > > http://www.solr-start.com/ - Resources for Solr users, new and
> > experienced
> > >
> > >
> > > On 8 March 2017 at 14:02, OTH <omer.t@gmail.com> wrote:
> > > > Hello,
> > > >
> > > > From the examples I am seeing online and in the reference guide (
> > > > https://cwiki.apache.org/confluence/display/solr/Solr+
> > > JDBC+-+SQuirreL+SQL),
> > > > I can only see Solr JDBC being used against a collection.  Is it
> > possible
> > > > however to use it with a core?  What should the JDBC URL be like in
> > that
> > > > case?
> > > >
> > > > Thanks
> > >
> >
>


Re: Need help with query syntax

2017-08-11 Thread OTH
Hi, thanks for sharing the article.

On Fri, Aug 11, 2017 at 4:38 AM, Erick Erickson <erickerick...@gmail.com>
wrote:

> Omer:
>
> Solr does not implement pure boolean logic, see:
> https://lucidworks.com/2011/12/28/why-not-and-or-and-not/.
>
> With appropriate parentheses it can give the same results as you're
> discovering.
>
> Best
> Erick
>
> On Thu, Aug 10, 2017 at 3:00 PM, OTH <omer.t@gmail.com> wrote:
> > Thanks for the help!
> > That's resolved the issue.
> >
> > On Fri, Aug 11, 2017 at 1:48 AM, David Hastings <
> > hastings.recurs...@gmail.com> wrote:
> >
> >> type:value AND (name:america^1+name:state^1+name:united^1)
> >>
> >> but in reality what you want to do is use the fq parameter with
> type:value
> >>
> >> On Thu, Aug 10, 2017 at 4:36 PM, OTH <omer.t@gmail.com> wrote:
> >>
> >> > Hello,
> >> >
> >> > I have the following use case:
> >> >
> >> > I have two fields (among others); one is 'name' and the other is
> 'type'.
> >> >  'Name' is the field I need to search, whereas, with 'type', I need to
> >> make
> >> > sure that it has a certain value, depending on the situation.  Often,
> >> when
> >> > I search the 'name' field, the search query would have multiple
> tokens.
> >> > Furthermore, each query token needs to have a scoring weight attached
> to
> >> > it.
> >> >
> >> > However, I'm unable to figure out the syntax which would allow all
> these
> >> > things to happen.
> >> >
> >> > For example, if I use the following query:
> >> > select?q=type:value+AND+name:america^1+name:state^1+name:united^1
> >> > It would only return documents where 'name' includes the token
> 'america'
> >> > (and where type==value).  It will totally ignore
> >> > "+name:state^1+name:united^1", it seems.
> >> >
> >> > This does not happen if I omit "type:value+AND+".  So, with the
> following
> >> > query:
> >> > select?q=name:america^1+name:state^1+name:united^1
> >> > It returns all documents which contain any of the three tokens
> {america,
> >> > state, united}; which is what I need.  However, it also returns
> documents
> >> > where type != value; which I can't have.
> >> >
> >> > If I put "type:value" at the end of the query command, like so:
> >> > select?q=name:america^1+name:state^1+name:united^1+AND+type:value
> >> > In this case, it will only return documents which contain the "united"
> >> > token in the name field (and where type==value).  Again, it will
> totally
> >> > ignore "name:america^1+name:state^1", it seems.
> >> >
> >> > I tried putting an "AND" between everything, like so:
> >> > select?q=type:value+AND+name:america^1+AND+name:state^1+
> >> AND+name:united^1
> >> > But this, of course, would only return documents which contain all the
> >> > tokens {america, state, united}; whereas I need all documents which
> >> contain
> >> > any of those tokens.
> >> >
> >> >
> >> > If anyone could help me out with how this could be done / what the
> >> correct
> >> > syntax would be, that would be a huge help.
> >> >
> >> > Much thanks
> >> > Omer
> >> >
> >>
>


Re: Need help with query syntax

2017-08-10 Thread OTH
Thanks for the help!
That's resolved the issue.

On Fri, Aug 11, 2017 at 1:48 AM, David Hastings <
hastings.recurs...@gmail.com> wrote:

> type:value AND (name:america^1+name:state^1+name:united^1)
>
> but in reality what you want to do is use the fq parameter with type:value
>
> On Thu, Aug 10, 2017 at 4:36 PM, OTH <omer.t@gmail.com> wrote:
>
> > Hello,
> >
> > I have the following use case:
> >
> > I have two fields (among others); one is 'name' and the other is 'type'.
> >  'Name' is the field I need to search, whereas, with 'type', I need to
> make
> > sure that it has a certain value, depending on the situation.  Often,
> when
> > I search the 'name' field, the search query would have multiple tokens.
> > Furthermore, each query token needs to have a scoring weight attached to
> > it.
> >
> > However, I'm unable to figure out the syntax which would allow all these
> > things to happen.
> >
> > For example, if I use the following query:
> > select?q=type:value+AND+name:america^1+name:state^1+name:united^1
> > It would only return documents where 'name' includes the token 'america'
> > (and where type==value).  It will totally ignore
> > "+name:state^1+name:united^1", it seems.
> >
> > This does not happen if I omit "type:value+AND+".  So, with the following
> > query:
> > select?q=name:america^1+name:state^1+name:united^1
> > It returns all documents which contain any of the three tokens {america,
> > state, united}; which is what I need.  However, it also returns documents
> > where type != value; which I can't have.
> >
> > If I put "type:value" at the end of the query command, like so:
> > select?q=name:america^1+name:state^1+name:united^1+AND+type:value
> > In this case, it will only return documents which contain the "united"
> > token in the name field (and where type==value).  Again, it will totally
> > ignore "name:america^1+name:state^1", it seems.
> >
> > I tried putting an "AND" between everything, like so:
> > select?q=type:value+AND+name:america^1+AND+name:state^1+
> AND+name:united^1
> > But this, of course, would only return documents which contain all the
> > tokens {america, state, united}; whereas I need all documents which
> contain
> > any of those tokens.
> >
> >
> > If anyone could help me out with how this could be done / what the
> correct
> > syntax would be, that would be a huge help.
> >
> > Much thanks
> > Omer
> >
>


Need help with query syntax

2017-08-10 Thread OTH
Hello,

I have the following use case:

I have two fields (among others); one is 'name' and the other is 'type'.
 'Name' is the field I need to search, whereas, with 'type', I need to make
sure that it has a certain value, depending on the situation.  Often, when
I search the 'name' field, the search query would have multiple tokens.
Furthermore, each query token needs to have a scoring weight attached to
it.

However, I'm unable to figure out the syntax which would allow all these
things to happen.

For example, if I use the following query:
select?q=type:value+AND+name:america^1+name:state^1+name:united^1
It would only return documents where 'name' includes the token 'america'
(and where type==value).  It will totally ignore
"+name:state^1+name:united^1", it seems.

This does not happen if I omit "type:value+AND+".  So, with the following
query:
select?q=name:america^1+name:state^1+name:united^1
It returns all documents which contain any of the three tokens {america,
state, united}; which is what I need.  However, it also returns documents
where type != value; which I can't have.

If I put "type:value" at the end of the query command, like so:
select?q=name:america^1+name:state^1+name:united^1+AND+type:value
In this case, it will only return documents which contain the "united"
token in the name field (and where type==value).  Again, it will totally
ignore "name:america^1+name:state^1", it seems.

I tried putting an "AND" between everything, like so:
select?q=type:value+AND+name:america^1+AND+name:state^1+AND+name:united^1
But this, of course, would only return documents which contain all the
tokens {america, state, united}; whereas I need all documents which contain
any of those tokens.


If anyone could help me out with how this could be done / what the correct
syntax would be, that would be a huge help.

Much thanks
Omer
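
The fix suggested in the replies to this thread (keep the name clauses together, and move the type restriction into an fq parameter) can be sketched as follows. This is only an illustration of building the request parameters; the function name is hypothetical, and it assumes the default operator is OR so that space-separated clauses are optional.

```python
from urllib.parse import urlencode, parse_qs

def build_query(terms, type_value, boost=1):
    """Return encoded parameters: OR'ed name terms in q, type filter in fq."""
    # Space-separated clauses are optional (OR) under the assumed default operator.
    q = " ".join(f"name:{t}^{boost}" for t in terms)
    return urlencode({"q": q, "fq": f"type:{type_value}"})

params = build_query(["america", "state", "united"], "value")
print(params)
# Decoding shows the two parameters as the server would receive them.
print(parse_qs(params))
```

Keeping the filter in fq also means it does not contribute to the score of the name clauses.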


Re: Token "states" not getting lemmatized by Solr?

2017-08-10 Thread OTH
Hello - Sorry, I obviously made a mistake here.

I said earlier that it seemed to me that the word 'united' was being
lemmatized (to 'unite').  But that does not seem to be the case.  It seems
that no lemmatization or stemming is being done at all.  I had previously
assumed that the default 'text_general' fieldtype in Solr handles
this, but apparently it does not.

I realize that what is happening in my case is something else.  I will start
another email thread for that.

Thanks.
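
A toy Python model of a chain that only tokenizes, removes stop words, and lowercases, which is what the correction above describes. It is not Lucene code: the regular-expression tokenizer and the stop-word set are assumptions. Because nothing in such a chain reduces "states" to "state", a query for "state" cannot match the indexed token.

```python
import re

STOP_WORDS = {"of", "the", "a"}  # hypothetical list; the real stopwords.txt may differ

def analyze(text):
    """Tokenize on letters/digits, lowercase, drop stop words. No stemming."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    return [t.lower() for t in tokens if t.lower() not in STOP_WORDS]

indexed = analyze("United States of America")
print(indexed)
print("states" in indexed)  # the indexed token is found
print("state" in indexed)   # nothing stemmed "states" to "state"
```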


On Thu, Aug 10, 2017 at 11:33 PM, OTH <omer.t@gmail.com> wrote:

> Hi,
>
> Regarding 'analysis chain':
>
> I'm using Solr 6.4.1, and in the managed-schema file, I find the following:
> [text_general fieldType definition; the XML tags were stripped by the
> archive.  Surviving fragments: positionIncrementGap="100"
> multiValued="true"; ignoreCase="true" (twice); ignoreCase="true"
> synonyms="synonyms.txt"]
>
>
> Regarding the Admin UI >> Analysis page:  I just tried that, and to be
> honest, I can't seem to get much useful info out of it, especially in terms
> of lemmatization.
>
> For example, for any text I enter in it to "analyse", all it does is seem
> to tell me which analysers (if that's the right term?) are being used for
> the selected field / fieldtype, and for each of these analyzers, it would
> give some very basic info, like text, raw_bytes, etc.  Eg, for the input
> "united" in the "field value (index)" box, having "text_general" selected
> for fieldtype, all I get is this:
>
>       text    raw_bytes            start  end  positionLength  type     position
> ST    united  [75 6e 69 74 65 64]  0      6    1               (blank)  1
> SF    united  [75 6e 69 74 65 64]  0      6    1               (blank)  1
> LCF   united  [75 6e 69 74 65 64]  0      6    1               (blank)  1
>
> Placing the mouse cursor on "ST", "SF", or "LCF" shows a tooltip saying
> "org.apache.lucene.analysis.standard.StandardTokenizer",
> "org...core.StopFilter", and "org...core.LowerCaseFilter", respectively.
>
>
> So - should 'states' not be lemmatized to 'state' using these settings?
>  (If not, then I would need to figure out how to use a different lemmatizer)
>
> Thanks
>
> On Thu, Aug 10, 2017 at 10:28 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> saying the field is "text_general" is not sufficient, please post the
>> analysis chain defined in your schema.
>>
>> Also the admin UI>>analysis page will help you figure out exactly what
>> part of the analysis chain does what.
>>
>> Best,
>> Erick
>>
>> On Thu, Aug 10, 2017 at 8:37 AM, OTH <omer.t@gmail.com> wrote:
>> > Hello,
>> >
>> > It seems to me that the token "states" is not getting lemmatized to
>> > "state" by Solr.
>> >
>> > Eg, I have a document with the value "united states of america".
>> > This document is not returned when the following query is issued:
>> > q=name:state^1+name:america^1+name:united^1
>> > However, all documents which contain the token "state" are indeed
>> returned,
>> > with the above query.
>> > The "united states of america" document is returned if I change "state"
>> in
>> > the query to "states"; so:
>> > q=name:states^1+name:america^1+name:united^1
>> >
>> > At first I thought maybe the lemmatization isn't working for some
>> reason.
>> > However, when I changed "united" in the query to "unite", then it did
>> still
>> > return the "united states of america" document:
>> > q=name:states^1+name:america^1+name:unite^1
>> > Which means that the lemmatization is working for the token "united",
>> but
>> > not for the token "states".
>> >
>> > The "name" field above is defined as "text_general".
>> >
>> > So it seems to me, that perhaps the default Solr lemmatizer does not
>> > lemmatize "states" to "state"?
>> > Can anyone confirm if this is indeed the expected behaviour?
>> > And what can I do to change it?
>> > If I need to put in a custom lemmatizer, then what would be the (best)
>> > way to do that?
>> >
>> > Much thanks
>> > Omer
>>
>
>


Re: Token "states" not getting lemmatized by Solr?

2017-08-10 Thread OTH
Hi,

Regarding 'analysis chain':

I'm using Solr 6.4.1, and in the managed-schema file, I find the following:

[text_general fieldType definition; the XML was stripped by the archive]


Regarding the Admin UI >> Analysis page:  I just tried that, and to be
honest, I can't seem to get much useful info out of it, especially in terms
of lemmatization.

For example, for any text I enter in it to "analyse", all it does is seem
to tell me which analysers (if that's the right term?) are being used for
the selected field / fieldtype, and for each of these analyzers, it would
give some very basic info, like text, raw_bytes, etc.  Eg, for the input
"united" in the "field value (index)" box, having "text_general" selected
for fieldtype, all I get is this:

      text    raw_bytes            start  end  positionLength  type     position
ST    united  [75 6e 69 74 65 64]  0      6    1               (blank)  1
SF    united  [75 6e 69 74 65 64]  0      6    1               (blank)  1
LCF   united  [75 6e 69 74 65 64]  0      6    1               (blank)  1

Placing the mouse cursor on "ST", "SF", or "LCF" shows a tooltip saying
"org.apache.lucene.analysis.standard.StandardTokenizer",
"org...core.StopFilter", and "org...core.LowerCaseFilter", respectively.


So - should 'states' not be lemmatized to 'state' using these settings?
 (If not, then I would need to figure out how to use a different lemmatizer)

Thanks

On Thu, Aug 10, 2017 at 10:28 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> saying the field is "text_general" is not sufficient, please post the
> analysis chain defined in your schema.
>
> Also the admin UI>>analysis page will help you figure out exactly what
> part of the analysis chain does what.
>
> Best,
> Erick
>
> On Thu, Aug 10, 2017 at 8:37 AM, OTH <omer.t@gmail.com> wrote:
> > Hello,
> >
> > It seems to me that the token "states" is not getting lemmatized to
> > "state" by Solr.
> >
> > Eg, I have a document with the value "united states of america".
> > This document is not returned when the following query is issued:
> > q=name:state^1+name:america^1+name:united^1
> > However, all documents which contain the token "state" are indeed
> returned,
> > with the above query.
> > The "united states of america" document is returned if I change "state"
> in
> > the query to "states"; so:
> > q=name:states^1+name:america^1+name:united^1
> >
> > At first I thought maybe the lemmatization isn't working for some reason.
> > However, when I changed "united" in the query to "unite", then it did
> still
> > return the "united states of america" document:
> > q=name:states^1+name:america^1+name:unite^1
> > Which means that the lemmatization is working for the token "united", but
> > not for the token "states".
> >
> > The "name" field above is defined as "text_general".
> >
> > So it seems to me, that perhaps the default Solr lemmatizer does not
> > lemmatize "states" to "state"?
> > Can anyone confirm if this is indeed the expected behaviour?
> > And what can I do to change it?
> > If I need to put in a custom lemmatizer, then what would be the (best)
> > way to do that?
> >
> > Much thanks
> > Omer
>


Token "states" not getting lemmatized by Solr?

2017-08-10 Thread OTH
Hello,

It seems to me that the token "states" is not getting lemmatized to
"state" by Solr.

Eg, I have a document with the value "united states of america".
This document is not returned when the following query is issued:
q=name:state^1+name:america^1+name:united^1
However, all documents which contain the token "state" are indeed returned,
with the above query.
The "united states of america" document is returned if I change "state" in
the query to "states"; so:
q=name:states^1+name:america^1+name:united^1

At first I thought maybe the lemmatization isn't working for some reason.
However, when I changed "united" in the query to "unite", then it did still
return the "united states of america" document:
q=name:states^1+name:america^1+name:unite^1
Which means that the lemmatization is working for the token "united", but
not for the token "states".

The "name" field above is defined as "text_general".

So it seems to me, that perhaps the default Solr lemmatizer does not
lemmatize "states" to "state"?
Can anyone confirm if this is indeed the expected behaviour?
And what can I do to change it?
If I need to put in a custom lemmatizer, then what would be the (best)
way to do that?

Much thanks
Omer


Re: Score higher if multiple terms match

2017-06-08 Thread OTH
Hi - Sorry, it was very late at night for me and I think I didn't choose my
words well.
bq: it is indeed returning documents with only either one of the two query
terms
What I meant was:  Initially, I thought it was only returning documents
which contained both 'tv' and 'promotion'.  Then I realized I was mistaken;
it was also returning documents which contained either 'tv' or 'promotion'
(as well as documents which contained both, which were scored higher).
I hope that clears the confusion.
Thanks
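
The query form Erick proposes in the reply quoted below (match either term, with a boosted clause for documents that contain both) can be generated for any list of terms. This is only a sketch; the function name is hypothetical, and the boost of 100 is taken from his example.

```python
def any_with_all_boost(field, terms, boost=100):
    """Build 'f:a OR f:b OR (f:a AND f:b)^boost' for the given terms."""
    clauses = [f"{field}:{t}" for t in terms]
    any_part = " OR ".join(clauses)    # matches documents with at least one term
    all_part = " AND ".join(clauses)   # matches documents with every term
    return f"{any_part} OR ({all_part})^{boost}"

print(any_with_all_boost("name", ["tv", "promotion"]))
```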

On Thu, Jun 8, 2017 at 9:04 AM, Erick Erickson <erickerick...@gmail.com>
wrote:

> bq: it is indeed returning documents with only either one of the two query
> terms
>
> Uhm, this should not be true. What's the output of adding debug=query?
> And are you totally sure the above is true and you're just not seeing
> the other term in the return? Or that you have a synonyms file that is
> somehow making docs match? Or ???
>
> So you're saying you get the exact same number of hits for
> name:tv OR name:promotion
> and
> name:tv AND name:promotion
> ??? Definitely not expected unless all docs happen to have both these
> terms in the name field either through normal input or synonyms etc.
>
> You should need something like:
> name:tv OR name:promotion OR (name:tv AND name:promotion)^100
> to score all the docs with both terms in the name field higher than just
> one.
>
> Best,
> Erick
>
> On Wed, Jun 7, 2017 at 3:05 PM, OTH <omer.t@gmail.com> wrote:
> > I'm sorry, there was a mistake.
> >
> > I previously wrote:
> >
> > However, these are returning only those documents which have both the
> terms
> >> 'tv promotion' in them (there are a few).  It's not returning any
> >> document which have only 'tv' or only 'promotion' in them.
> >
> >
> > That's not true at all; it is indeed returning documents with only either
> > one of the two query terms (so, documents with only 'tv' or only
> > 'promotion' in them).  Sorry.  You can disregard my question in the last
> > email.
> >
> > Thanks
> >
> > On Thu, Jun 8, 2017 at 2:03 AM, OTH <omer.t@gmail.com> wrote:
> >
> >> Thanks.
> >> Both of these are working in my case:
> >> name:"tv promotion"   -->  name:"tv promotion"
> >> name:tv AND name:promotion --> name:tv AND name:promotion
> >> (Although I'm assuming, the first might not have worked if my document
> had
> >> been say 'promotion tv' or 'tv xyz promotion')
> >>
> >> However, these are returning only those documents which have both the
> >> terms 'tv promotion' in them (there are a few).  It's not returning any
> >> document which have only 'tv' or only 'promotion' in them.
> >>
> >> That's not an absolute requirement of mine, I could work around it, but
> I
> >> was just wondering, if it were possible to pass a single solr query with
> >> both the terms 'tv' and 'promotion' in them, and have them return all
> the
> >> documents which contain either of those terms, but with higher scores
> >> attached to those documents with both those terms?
> >>
> >> Much thanks
> >>
> >> On Thu, Jun 8, 2017 at 1:43 AM, David Hastings <
> >> hastings.recurs...@gmail.com> wrote:
> >>
> >>> sorry, i meant debug query where you would get output like this:
> >>>
> >>> "debug": {
> >>> "rawquerystring": "name:tv promotion",
> >>> "querystring": "name:tv promotion",
> >>> "parsedquery": "+name:tv +text:promotion",
> >>>
> >>>
> >>> On Wed, Jun 7, 2017 at 4:41 PM, David Hastings <
> >>> hastings.recurs...@gmail.com
> >>> > wrote:
> >>>
> >>> > well, short answer, use the analyzer to see whats happening.
> >>> > long answer
> >>> >  theres a difference between
> >>> > name:tv promotion   -->  name:tv default_field:promotion
> >>> > name:"tv promotion"   -->  name:"tv promotion"
> >>> > name:tv AND name:promotion --> name:tv AND name:promotion
> >>> >
> >>> >
> >>> > since your default field most likely isnt name, its going to search
> only
> >>> > the default field for it.  you can alter this behavior using qf
> >>> parameters:
> >>> >
> >>> >
> >>> >
> >>> > qf='name^5 text'
> >>> >

Re: Score higher if multiple terms match

2017-06-07 Thread OTH
I'm sorry, there was a mistake.

I previously wrote:

> However, these are returning only those documents which have both the terms
> 'tv promotion' in them (there are a few).  It's not returning any
> document which have only 'tv' or only 'promotion' in them.


That's not true at all; it is indeed returning documents with only either
one of the two query terms (so, documents with only 'tv' or only
'promotion' in them).  Sorry.  You can disregard my question in the last
email.

Thanks

On Thu, Jun 8, 2017 at 2:03 AM, OTH <omer.t@gmail.com> wrote:

> Thanks.
> Both of these are working in my case:
> name:"tv promotion"   -->  name:"tv promotion"
> name:tv AND name:promotion --> name:tv AND name:promotion
> (Although I'm assuming, the first might not have worked if my document had
> been say 'promotion tv' or 'tv xyz promotion')
>
> However, these are returning only those documents which have both the
> terms 'tv promotion' in them (there are a few).  It's not returning any
> document which have only 'tv' or only 'promotion' in them.
>
> That's not an absolute requirement of mine, I could work around it, but I
> was just wondering, if it were possible to pass a single solr query with
> both the terms 'tv' and 'promotion' in them, and have them return all the
> documents which contain either of those terms, but with higher scores
> attached to those documents with both those terms?
>
> Much thanks
>
> On Thu, Jun 8, 2017 at 1:43 AM, David Hastings <
> hastings.recurs...@gmail.com> wrote:
>
>> sorry, i meant debug query where you would get output like this:
>>
>> "debug": {
>> "rawquerystring": "name:tv promotion",
>> "querystring": "name:tv promotion",
>> "parsedquery": "+name:tv +text:promotion",
>>
>>
>> On Wed, Jun 7, 2017 at 4:41 PM, David Hastings <
>> hastings.recurs...@gmail.com
>> > wrote:
>>
>> > well, short answer, use the analyzer to see whats happening.
>> > long answer
>> >  theres a difference between
>> > name:tv promotion   -->  name:tv default_field:promotion
>> > name:"tv promotion"   -->  name:"tv promotion"
>> > name:tv AND name:promotion --> name:tv AND name:promotion
>> >
>> >
>> > since your default field most likely isnt name, its going to search only
>> > the default field for it.  you can alter this behavior using qf
>> parameters:
>> >
>> >
>> >
>> > qf='name^5 text'
>> >
>> >
>> > for example would apply a boost of 5 if it matched the field 'name', and
>> > only 1 for 'text'
>> >
>> > On Wed, Jun 7, 2017 at 4:35 PM, OTH <omer.t@gmail.com> wrote:
>> >
>> >> Hello,
>> >>
>> >> I have what I would think to be a fairly simple problem to solve,
>> however
>> >> I'm not sure how it's done in Solr and couldn't find an answer on
>> Google.
>> >>
>> >> Say I have two documents, "TV" and "TV promotion".  If the search
>> query is
>> >> "TV promotion", then, obviously, I would like the document "TV
>> promotion"
>> >> to score higher.  However, that is not the case right now.
>> >>
>> >> My syntax is something like this:
>> >> http://localhost:8983/solr/sales/select?indent=on&wt=json&fl=*,score&q=name:tv
>> >> promotion
>> >> (I tried "q=name:tv+promotion" (added the '+'), but it made no
>> difference.)
>> >>
>> >> It's not scoring the document "TV promotion" higher than "TV"; in fact
>> >> it's
>> >> scoring it lower.
>> >>
>> >> Thanks
>> >>
>> >
>> >
>>
>
>


Re: Score higher if multiple terms match

2017-06-07 Thread OTH
Thanks.
Both of these are working in my case:
name:"tv promotion"   -->  name:"tv promotion"
name:tv AND name:promotion --> name:tv AND name:promotion
(Although I'm assuming, the first might not have worked if my document had
been say 'promotion tv' or 'tv xyz promotion')

However, these are returning only those documents which have both the terms
'tv promotion' in them (there are a few).  It's not returning any documents
which have only 'tv' or only 'promotion' in them.

That's not an absolute requirement of mine, I could work around it, but I
was just wondering, if it were possible to pass a single solr query with
both the terms 'tv' and 'promotion' in them, and have them return all the
documents which contain either of those terms, but with higher scores
attached to those documents with both those terms?

Much thanks

On Thu, Jun 8, 2017 at 1:43 AM, David Hastings <hastings.recurs...@gmail.com
> wrote:

> sorry, i meant debug query where you would get output like this:
>
> "debug": {
> "rawquerystring": "name:tv promotion",
> "querystring": "name:tv promotion",
> "parsedquery": "+name:tv +text:promotion",
>
>
> On Wed, Jun 7, 2017 at 4:41 PM, David Hastings <
> hastings.recurs...@gmail.com
> > wrote:
>
> > well, short answer, use the analyzer to see whats happening.
> > long answer
> >  theres a difference between
> > name:tv promotion   -->  name:tv default_field:promotion
> > name:"tv promotion"   -->  name:"tv promotion"
> > name:tv AND name:promotion --> name:tv AND name:promotion
> >
> >
> > since your default field most likely isnt name, its going to search only
> > the default field for it.  you can alter this behavior using qf
> parameters:
> >
> >
> >
> > qf='name^5 text'
> >
> >
> > for example would apply a boost of 5 if it matched the field 'name', and
> > only 1 for 'text'
> >
> > On Wed, Jun 7, 2017 at 4:35 PM, OTH <omer.t@gmail.com> wrote:
> >
> >> Hello,
> >>
> >> I have what I would think to be a fairly simple problem to solve,
> however
> >> I'm not sure how it's done in Solr and couldn't find an answer on
> Google.
> >>
> >> Say I have two documents, "TV" and "TV promotion".  If the search query
> is
> >> "TV promotion", then, obviously, I would like the document "TV
> promotion"
> >> to score higher.  However, that is not the case right now.
> >>
> >> My syntax is something like this:
> >> http://localhost:8983/solr/sales/select?indent=on&wt=json&fl=*,score&q=name:tv
> >> promotion
> >> (I tried "q=name:tv+promotion" (added the '+'), but it made no
> difference.)
> >>
> >> It's not scoring the document "TV promotion" higher than "TV"; in fact
> >> it's
> >> scoring it lower.
> >>
> >> Thanks
> >>
> >
> >
>


Score higher if multiple terms match

2017-06-07 Thread OTH
Hello,

I have what I would think is a fairly simple problem to solve; however,
I'm not sure how it's done in Solr and couldn't find an answer on Google.

Say I have two documents, "TV" and "TV promotion".  If the search query is
"TV promotion", then, obviously, I would like the document "TV promotion"
to score higher.  However, that is not the case right now.

My syntax is something like this:
http://localhost:8983/solr/sales/select?indent=on&wt=json&fl=*,score&q=name:tv
promotion
(I tried "q=name:tv+promotion" (added the '+'), but it made no difference.)

It's not scoring the document "TV promotion" higher than "TV"; in fact it's
scoring it lower.

Thanks
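
A sketch of building the request with encoded parameters, so that the space in the query and the separators between parameters are transmitted intact. It uses the edismax `qf` form suggested in a reply to this thread. The `sales` core is from the post; the `text` field and the use of edismax are assumptions.

```python
from urllib.parse import urlencode, parse_qs

base = "http://localhost:8983/solr/sales/select"
params = {
    "defType": "edismax",   # assumption: the edismax parser is used
    "q": "tv promotion",    # both terms are searched in the qf fields
    "qf": "name^5 text",    # field boosts from the reply; 'text' is an assumed field
    "fl": "*,score",
    "wt": "json",
    "indent": "on",
    "debug": "query",       # asks Solr to report how the query was parsed
}
url = base + "?" + urlencode(params)
print(url)
```

With the query written as `name:tv promotion`, only the first term is tied to the name field, which is the behaviour the replies describe.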


Re: AnalyzingInfixSuggester performance

2017-04-18 Thread OTH
I see.  I had actually overlooked the fact that Suggester provides a
'weightField', and I could possibly use that in my case instead of the
regular Solr index with bq.

So, if I understand correctly, the main advantage of using the
AnalyzingInfixSuggester instead of a regular Solr index (since both use
standard Lucene?) is that the AnalyzingInfixSuggester sorts at index time
using the weightField?  So it's only advantageous to use this Suggester if
you need sorting based on a field?

Thanks
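
A toy illustration of the index-time sort described in the reply quoted below: if entries are stored in descending weight order, a lookup can stop at the first N matches, because they are already the highest-weighted ones. This is not the Lucene implementation, and the names and weights are made up.

```python
# Entries pre-sorted by weight, highest first (done once, at "index time").
# The names and weights here are invented for illustration.
ENTRIES = sorted(
    [("London", 90), ("Londonderry", 20), ("New London", 10), ("Paris", 80)],
    key=lambda e: e[1], reverse=True,
)

def suggest(fragment, limit):
    """Return up to `limit` infix matches and the number of entries scanned."""
    hits, scanned = [], 0
    for name, _weight in ENTRIES:
        scanned += 1
        if fragment.lower() in name.lower():
            hits.append(name)
            if len(hits) == limit:
                break  # already the top-weighted matches; no sorting needed
    return hits, scanned

print(suggest("lon", 2))
```

Ranking by anything other than the stored weight would require scanning and sorting all matches, which is the lost benefit the reply mentions.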

On Tue, Apr 18, 2017 at 2:20 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> AnalyzingInfixSuggester uses index-time sort, to sort all postings by the
> suggest weight, so that lookup, as long as you sort by the suggest weight,
> is extremely fast.
>
> But if you need to rank at lookup time by something not "congruent" with
> the index-time sort then you lose that benefit.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Sun, Apr 16, 2017 at 11:46 AM, OTH <omer.t@gmail.com> wrote:
>
> > Hello,
> >
> > From what I understand, the AnalyzingInfixSuggester is using a simple
> > Lucene query; so I was wondering, how then would this suggester have
> better
> > performance than using a simple Solr 'select' query on a regular Solr
> index
> > (with an asterisk placed at the start and end of the query string).  I
> > could understand why say an FST based suggester would be faster, but I
> > wanted to confirm if that indeed is the case with
> AnalyzingInfixSuggester.
> >
> > One reason I ask is:
> > I needed the results to be boosted based on the value of another field;
> > e.g., if a user in the UK is searching for cities, then I'd need the
> cities
> > which are in the UK to be boosted.  I was able to do this with a regular
> > Solr index by adding something like these parameters:
> > defType=edismax&bq=country:UK^2.0
> >
> > However, I'm not sure if this is possible with the Suggester.  Moreover -
> > other than the 'country' field above, there are other fields as well
> which
> > I need to be returned with the results.  Since the Suggester seems to
> only
> > allow one additional field, called 'payload', I'm able to do this by
> > putting the values of all the other fields into a JSON and then placing
> > that into the 'payload' field - however, I don't know if it would be
> > possible then to incorporate the boosting mechanism I showed above.
> >
> > So I was thinking of just using a regular Solr index instead of the
> > Suggester; I wanted to confirm, what if any is the performance
> improvement
> > in using the AnalyzingInfixSuggester over using a regular index?
> >
> > Much thanks
> >
>


Re: Need help with Query Syntax

2017-04-17 Thread OTH
I tried that, but it returned no results.
I understand now that the issue is that the field has been tokenized:
searching for "*san\ *" looks for individual tokens which contain the
string sequence "san ", and since no token contains a space, it finds none.
I think I've found another workaround though which might work for me.
Thanks
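
The tokenization point can be illustrated with a small Python sketch.  It
is only a rough stand-in: tokenize() approximates a standard tokenizer plus
lowercasing with a regular expression, and both function names are
hypothetical.  It shows why an infix that includes a space can never match
a single token, but can match when the whole value is kept as one term (as
an untokenized field would).

```python
import re


def tokenize(text):
    """Rough approximation of word tokenization plus lowercasing."""
    return re.findall(r"\w+", text.lower())


def infix_matches_token(text, infix):
    """Wildcard-style match against individual tokens (tokenized field)."""
    return any(infix in token for token in tokenize(text))


def infix_matches_value(text, infix):
    """Wildcard-style match against the whole value (untokenized field)."""
    return infix in text.lower()


values = ["San Francisco", "Santiago", "Los Angeles"]

# "san" is found inside tokens of the first two values.
by_token = [v for v in values if infix_matches_token(v, "san")]

# "san " (with the space) is never inside a token, but it is inside the
# whole value "san francisco".
by_token_with_space = [v for v in values if infix_matches_token(v, "san ")]
by_value_with_space = [v for v in values if infix_matches_value(v, "san ")]
```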

On Tue, Apr 18, 2017 at 12:56 AM, Mikhail Khludnev <m...@apache.org> wrote:

> This can be done with escaping space
> select?q=field:*san\ *
> Probably sow=false in new version might also help
>
>
> On Mon, Apr 17, 2017 at 4:42 PM, OTH <omer.t@gmail.com> wrote:
>
> > If I submit the query:
> >  "select?q=field:*san*"
> > Then it works as expected; returning all values in the field which
> contain
> > the string "san".
> >
> > However if I submit:
> > "select?q=field:*san *"
> > It then seems to return all the values of the field, regardless of what
> the
> > value is (!)
> >
> > I only wish in this case to get the values which contain the string "san
> ",
> > but I'm unable to achieve that.
> >
> > Thanks
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


Re: Need help with Query Syntax

2017-04-17 Thread OTH
Ok.  What analyzer / fieldtype should I use to be able to search across
tokens?
Basically, I'm just trying to replicate the functionality of the
AnalyzingInfixLookupFactory Suggester, but I need to do it using a regular
index, because I need to utilize multiple fields using edismax bq, which
seems not to be possible with the Suggester.
Thanks

On Mon, Apr 17, 2017 at 10:46 PM, Binoy Dalal <binoydala...@gmail.com>
wrote:

> Use the analyser available in the solr admin console to find out exactly
> how your query is analysed. That should give you a lot more information.
>
> On Mon 17 Apr, 2017, 21:58 OTH, <omer.t@gmail.com> wrote:
>
> > Ok, I get it now, it's because the field has been indexed as tokens.  So
> > maybe I should use a field which does not have a tokenizer index?  I'll
> try
> > something like that.  Thanks
> >
> > On Mon, Apr 17, 2017 at 9:16 PM, OTH <omer.t@gmail.com> wrote:
> >
> > > The field type is "text_general".
> > >
> > > On Mon, Apr 17, 2017 at 7:15 PM, Binoy Dalal <binoydala...@gmail.com>
> > > wrote:
> > >
> > >> I think it returns everything because your query matches *san or " *".
> > >> What is your field type definition?
> > >>
> > >> On Mon 17 Apr, 2017, 19:12 OTH, <omer.t@gmail.com> wrote:
> > >>
> > >> > If I submit the query:
> > >> >  "select?q=field:*san*"
> > >> > Then it works as expected; returning all values in the field which
> > >> contain
> > >> > the string "san".
> > >> >
> > >> > However if I submit:
> > >> > "select?q=field:*san *"
> > >> > It then seems to return all the values of the field, regardless of
> > what
> > >> the
> > >> > value is (!)
> > >> >
> > >> > I only wish in this case to get the values which contain the string
> > >> "san ",
> > >> > but I'm unable to achieve that.
> > >> >
> > >> > Thanks
> > >> >
> > >> --
> > >> Regards,
> > >> Binoy Dalal
> > >>
> > >
> > >
> >
> --
> Regards,
> Binoy Dalal
>


Re: Need help with Query Syntax

2017-04-17 Thread OTH
Ok, I get it now, it's because the field has been indexed as tokens.  So
maybe I should use a field which does not have a tokenizer index?  I'll try
something like that.  Thanks

On Mon, Apr 17, 2017 at 9:16 PM, OTH <omer.t@gmail.com> wrote:

> The field type is "text_general".
>
> On Mon, Apr 17, 2017 at 7:15 PM, Binoy Dalal <binoydala...@gmail.com>
> wrote:
>
>> I think it returns everything because your query matches *san or " *".
>> What is your field type definition?
>>
>> On Mon 17 Apr, 2017, 19:12 OTH, <omer.t@gmail.com> wrote:
>>
>> > If I submit the query:
>> >  "select?q=field:*san*"
>> > Then it works as expected; returning all values in the field which
>> contain
>> > the string "san".
>> >
>> > However if I submit:
>> > "select?q=field:*san *"
>> > It then seems to return all the values of the field, regardless of what
>> the
>> > value is (!)
>> >
>> > I only wish in this case to get the values which contain the string
>> "san ",
>> > but I'm unable to achieve that.
>> >
>> > Thanks
>> >
>> --
>> Regards,
>> Binoy Dalal
>>
>
>


Re: Need help with Query Syntax

2017-04-17 Thread OTH
The field type is "text_general".

On Mon, Apr 17, 2017 at 7:15 PM, Binoy Dalal <binoydala...@gmail.com> wrote:

> I think it returns everything because your query matches *san or " *".
> What is your field type definition?
>
> On Mon 17 Apr, 2017, 19:12 OTH, <omer.t@gmail.com> wrote:
>
> > If I submit the query:
> >  "select?q=field:*san*"
> > Then it works as expected; returning all values in the field which
> contain
> > the string "san".
> >
> > However if I submit:
> > "select?q=field:*san *"
> > It then seems to return all the values of the field, regardless of what
> the
> > value is (!)
> >
> > I only wish in this case to get the values which contain the string "san
> ",
> > but I'm unable to achieve that.
> >
> > Thanks
> >
> --
> Regards,
> Binoy Dalal
>


Need help with Query Syntax

2017-04-17 Thread OTH
If I submit the query:
 "select?q=field:*san*"
Then it works as expected; returning all values in the field which contain
the string "san".

However if I submit:
"select?q=field:*san *"
It then seems to return all the values of the field, regardless of what the
value is (!)

I only wish in this case to get the values which contain the string "san ",
but I'm unable to achieve that.

Thanks


AnalyzingInfixSuggester performance

2017-04-16 Thread OTH
Hello,

From what I understand, the AnalyzingInfixSuggester is using a simple
Lucene query; so I was wondering, how then would this suggester have better
performance than using a simple Solr 'select' query on a regular Solr index
(with an asterisk placed at the start and end of the query string).  I
could understand why say an FST based suggester would be faster, but I
wanted to confirm if that indeed is the case with AnalyzingInfixSuggester.

One reason I ask is:
I needed the results to be boosted based on the value of another field;
e.g., if a user in the UK is searching for cities, then I'd need the cities
which are in the UK to be boosted.  I was able to do this with a regular
Solr index by adding something like these parameters:
defType=edismax&bq=country:UK^2.0

However, I'm not sure if this is possible with the Suggester.  Moreover -
other than the 'country' field above, there are other fields as well which
I need to be returned with the results.  Since the Suggester seems to only
allow one additional field, called 'payload', I'm able to do this by
putting the values of all the other fields into a JSON and then placing
that into the 'payload' field - however, I don't know if it would be
possible then to incorporate the boosting mechanism I showed above.

So I was thinking of just using a regular Solr index instead of the
Suggester; I wanted to confirm, what if any is the performance improvement
in using the AnalyzingInfixSuggester over using a regular index?

Much thanks
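
For reference, the edismax boost mentioned above can be assembled as
request parameters like this.  It is a minimal Python sketch under stated
assumptions: the field names city and country come from the related
threads, while qf, fl and the function name boosted_city_params are
illustrative choices, not something confirmed for this index.

```python
from urllib.parse import urlencode


def boosted_city_params(user_input, user_country, boost=2.0):
    """Parameters for an edismax query that boosts one country's cities."""
    return {
        "defType": "edismax",
        "q": user_input,
        "qf": "city",  # assumed: search only the city field
        # bq adds an additive boost for documents matching this clause.
        "bq": f"country:{user_country}^{boost}",
        "fl": "city,country,score",
    }


params = boosted_city_params("lon", "UK")
query_string = urlencode(params)
```

Because bq only adds to the score of matching documents, cities outside
the boosted country are still returned, just ranked lower.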


Re: Need help with auto-suggester

2017-04-15 Thread OTH
I see, thanks.  So I'm just using a string field to store the JSON.
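
A minimal Python sketch of the round trip discussed in this thread: the
extra fields are packed into one JSON string for the payload field, and the
client decodes the payload a second time after parsing the suggester
response.  The sample suggestion is the escaped one quoted below; the
function names are hypothetical.

```python
import json

# One suggestion as it appears on the wire: the payload is a JSON string
# nested inside the JSON response, hence the escaped quotes.
RAW_SUGGESTION = (
    r'{"term":"microsoft office","weight":14,'
    r'"payload":"{\"count\": 1534255, \"id\": \"microsoft office\"}"}'
)


def encode_payload(fields):
    """Pack several field values into one string for the payload field."""
    return json.dumps(fields, sort_keys=True)


def decode_suggestion(raw):
    """Parse a suggestion, then decode its payload string separately."""
    suggestion = json.loads(raw)
    suggestion["payload"] = json.loads(suggestion["payload"])
    return suggestion


decoded = decode_suggestion(RAW_SUGGESTION)
packed = encode_payload({"country": "UK", "population": 8900000})
```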

On Sat, Apr 15, 2017 at 11:15 PM, Walter Underwood <wun...@wunderwood.org>
wrote:

> Sorry, that was formatted. The quotes are actually escaped, like this:
>
> {"term":"microsoft office","weight":14,"payload":"{\"count\":
> 1534255, \"id\": \"microsoft office\"}"}
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Apr 15, 2017, at 10:40 AM, Walter Underwood <wun...@wunderwood.org>
> wrote:
> >
> > JSON does not have a binary data type, so true BLOBs are not possible in
> JSON. Sorry, I wasn’t clear.
> >
> > The payload I use is JSON in a string. It looks like this:
> >
> > suggest: {
> > skill_names_infix: {
> > m: {
> > numFound: 10,
> > suggestions: [
> > {
> > term: "microsoft office",
> > weight: 14,
> > payload: "{"count": 1534255, "id": "microsoft office"}"
> > },
> > {
> > term: "microsoft excel",
> > weight: 13,
> > payload: "{"count": 940151, "id": "microsoft excel"}"
> > },
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> >
> >> On Apr 15, 2017, at 9:07 AM, OTH <omer.t@gmail.com> wrote:
> >>
> >> Hi - just wondering, what would be the difference between using a blob /
> >> binary field to store the JSON rather than simply using a string field?
> >> Thanks
> >>
> >> On Sat, Apr 15, 2017 at 2:50 AM, Walter Underwood <
> wun...@wunderwood.org>
> >> wrote:
> >>
> >>> We recently needed multiple values in the payload, so I put a JSON
> blob in
> >>> there. It comes back as a string, so you have to decode that JSON
> >>> separately. Otherwise, it was a pretty clean solution.
> >>>
> >>> wunder
> >>> Walter Underwood
> >>> wun...@wunderwood.org
> >>> http://observer.wunderwood.org/  (my blog)
> >>>
> >>>
> >>>> On Apr 14, 2017, at 1:57 PM, OTH <omer.t@gmail.com> wrote:
> >>>>
> >>>> Thanks, that works!  But is it possible to have multiple
> payloadFields?
> >>>>
> >>>> On Sat, Apr 15, 2017 at 1:23 AM, Marek Tichy <ma...@gn.apc.org>
> wrote:
> >>>>
> >>>>> Utilize the payload field.
> >>>>>> I don't need to search multiple fields; I need to search just one
> field
> >>>>> but
> >>>>>> get the corresponding values from another field as well.
> >>>>>> I.e. if a user is searching for cities, I wouldn't need the
> countries
> >>> to
> >>>>>> also be searched.  However, when the list of cities is displayed, I
> >>> need
> >>>>>> their corresponding countries to also be displayed.
> >>>>>> This is obviously possible with the regular Solr index, but I can't
> >>>>> figure
> >>>>>> out how to do it with the Suggester index, which seems to only be
> able
> >>> to
> >>>>>> have one field.
> >>>>>> Thanks
> >>>>>>
> >>>>>> On Fri, Apr 14, 2017 at 8:46 AM, Binoy Dalal <
> binoydala...@gmail.com>
> >>>>> wrote:
> >>>>>>
> >>>>>>> You can create a copy field and copy to it from all the fields you
> >>> want
> >>>>> to
> >>>>>>> retrieve the suggestions from and then use that field with the
> >>>>> suggester.
> >>>>>>>
> >>>>>>> On Thu 13 Apr, 2017, 23:21 OTH, <omer.t@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> Hello,
> >>>>>>>>
> >>>>>>>> I've followed the steps here to set up auto-suggest:
> >>>>>>>> https://lucidworks.com/2015/03/04/solr-suggester/
> >>>>>>>>
> >>>>>>>> So basically I configured the auto-suggester in solrconfig.xml,
> >>> where I
> >>>>>>>> told it which field in my index needs to be used for
> auto-suggestion.
> >>>>>>>>
> >>>>>>>> The problem is:
> >>>>>>>> When the user searches in the text box in the front end, if they
> are
> >>>>>>>> searching for cities, I also need the countries to appear in the
> >>>>>>> drop-down
> >>>>>>>> list which the user sees.
> >>>>>>>> The field which is being searched is only 'city' here.  However, I
> >>> need
> >>>>>>> to
> >>>>>>>> retrieve the corresponding value in the 'country' field as well.
> >>>>>>>>
> >>>>>>>> How could I do this using the suggester?
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>>
> >>>>>>> --
> >>>>>>> Regards,
> >>>>>>> Binoy Dalal
> >>>>>>>
> >>>>>
> >>>>>
> >>>
> >>>
> >
>
>


Re: Need help with auto-suggester

2017-04-15 Thread OTH
Hi - just wondering, what would be the difference between using a blob /
binary field to store the JSON rather than simply using a string field?
Thanks

On Sat, Apr 15, 2017 at 2:50 AM, Walter Underwood <wun...@wunderwood.org>
wrote:

> We recently needed multiple values in the payload, so I put a JSON blob in
> there. It comes back as a string, so you have to decode that JSON
> separately. Otherwise, it was a pretty clean solution.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Apr 14, 2017, at 1:57 PM, OTH <omer.t@gmail.com> wrote:
> >
> > Thanks, that works!  But is it possible to have multiple payloadFields?
> >
> > On Sat, Apr 15, 2017 at 1:23 AM, Marek Tichy <ma...@gn.apc.org> wrote:
> >
> >> Utilize the payload field.
> >>> I don't need to search multiple fields; I need to search just one field
> >> but
> >>> get the corresponding values from another field as well.
> >>> I.e. if a user is searching for cities, I wouldn't need the countries
> to
> >>> also be searched.  However, when the list of cities is displayed, I
> need
> >>> their corresponding countries to also be displayed.
> >>> This is obviously possible with the regular Solr index, but I can't
> >> figure
> >>> out how to do it with the Suggester index, which seems to only be able
> to
> >>> have one field.
> >>> Thanks
> >>>
> >>> On Fri, Apr 14, 2017 at 8:46 AM, Binoy Dalal <binoydala...@gmail.com>
> >> wrote:
> >>>
> >>>> You can create a copy field and copy to it from all the fields you
> want
> >> to
> >>>> retrieve the suggestions from and then use that field with the
> >> suggester.
> >>>>
> >>>> On Thu 13 Apr, 2017, 23:21 OTH, <omer.t@gmail.com> wrote:
> >>>>
> >>>>> Hello,
> >>>>>
> >>>>> I've followed the steps here to set up auto-suggest:
> >>>>> https://lucidworks.com/2015/03/04/solr-suggester/
> >>>>>
> >>>>> So basically I configured the auto-suggester in solrconfig.xml,
> where I
> >>>>> told it which field in my index needs to be used for auto-suggestion.
> >>>>>
> >>>>> The problem is:
> >>>>> When the user searches in the text box in the front end, if they are
> >>>>> searching for cities, I also need the countries to appear in the
> >>>> drop-down
> >>>>> list which the user sees.
> >>>>> The field which is being searched is only 'city' here.  However, I
> need
> >>>> to
> >>>>> retrieve the corresponding value in the 'country' field as well.
> >>>>>
> >>>>> How could I do this using the suggester?
> >>>>>
> >>>>> Thanks
> >>>>>
> >>>> --
> >>>> Regards,
> >>>> Binoy Dalal
> >>>>
> >>
> >>
>
>


Re: Need help with auto-suggester

2017-04-14 Thread OTH
Great!  That's what I was about to resort to do, but thanks for the
confirmation!

On Sat, Apr 15, 2017 at 2:50 AM, Walter Underwood <wun...@wunderwood.org>
wrote:

> We recently needed multiple values in the payload, so I put a JSON blob in
> there. It comes back as a string, so you have to decode that JSON
> separately. Otherwise, it was a pretty clean solution.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Apr 14, 2017, at 1:57 PM, OTH <omer.t@gmail.com> wrote:
> >
> > Thanks, that works!  But is it possible to have multiple payloadFields?
> >
> > On Sat, Apr 15, 2017 at 1:23 AM, Marek Tichy <ma...@gn.apc.org> wrote:
> >
> >> Utilize the payload field.
> >>> I don't need to search multiple fields; I need to search just one field
> >> but
> >>> get the corresponding values from another field as well.
> >>> I.e. if a user is searching for cities, I wouldn't need the countries
> to
> >>> also be searched.  However, when the list of cities is displayed, I
> need
> >>> their corresponding countries to also be displayed.
> >>> This is obviously possible with the regular Solr index, but I can't
> >> figure
> >>> out how to do it with the Suggester index, which seems to only be able
> to
> >>> have one field.
> >>> Thanks
> >>>
> >>> On Fri, Apr 14, 2017 at 8:46 AM, Binoy Dalal <binoydala...@gmail.com>
> >> wrote:
> >>>
> >>>> You can create a copy field and copy to it from all the fields you
> want
> >> to
> >>>> retrieve the suggestions from and then use that field with the
> >> suggester.
> >>>>
> >>>> On Thu 13 Apr, 2017, 23:21 OTH, <omer.t@gmail.com> wrote:
> >>>>
> >>>>> Hello,
> >>>>>
> >>>>> I've followed the steps here to set up auto-suggest:
> >>>>> https://lucidworks.com/2015/03/04/solr-suggester/
> >>>>>
> >>>>> So basically I configured the auto-suggester in solrconfig.xml,
> where I
> >>>>> told it which field in my index needs to be used for auto-suggestion.
> >>>>>
> >>>>> The problem is:
> >>>>> When the user searches in the text box in the front end, if they are
> >>>>> searching for cities, I also need the countries to appear in the
> >>>> drop-down
> >>>>> list which the user sees.
> >>>>> The field which is being searched is only 'city' here.  However, I
> need
> >>>> to
> >>>>> retrieve the corresponding value in the 'country' field as well.
> >>>>>
> >>>>> How could I do this using the suggester?
> >>>>>
> >>>>> Thanks
> >>>>>
> >>>> --
> >>>> Regards,
> >>>> Binoy Dalal
> >>>>
> >>
> >>
>
>


Re: Need help with auto-suggester

2017-04-14 Thread OTH
Thanks, that works!  But is it possible to have multiple payloadFields?

On Sat, Apr 15, 2017 at 1:23 AM, Marek Tichy <ma...@gn.apc.org> wrote:

> Utilize the payload field.
> > I don't need to search multiple fields; I need to search just one field
> but
> > get the corresponding values from another field as well.
> > I.e. if a user is searching for cities, I wouldn't need the countries to
> > also be searched.  However, when the list of cities is displayed, I need
> > their corresponding countries to also be displayed.
> > This is obviously possible with the regular Solr index, but I can't
> figure
> > out how to do it with the Suggester index, which seems to only be able to
> > have one field.
> > Thanks
> >
> > On Fri, Apr 14, 2017 at 8:46 AM, Binoy Dalal <binoydala...@gmail.com>
> wrote:
> >
> >> You can create a copy field and copy to it from all the fields you want
> to
> >> retrieve the suggestions from and then use that field with the
> suggester.
> >>
> >> On Thu 13 Apr, 2017, 23:21 OTH, <omer.t@gmail.com> wrote:
> >>
> >>> Hello,
> >>>
> >>> I've followed the steps here to set up auto-suggest:
> >>> https://lucidworks.com/2015/03/04/solr-suggester/
> >>>
> >>> So basically I configured the auto-suggester in solrconfig.xml, where I
> >>> told it which field in my index needs to be used for auto-suggestion.
> >>>
> >>> The problem is:
> >>> When the user searches in the text box in the front end, if they are
> >>> searching for cities, I also need the countries to appear in the
> >> drop-down
> >>> list which the user sees.
> >>> The field which is being searched is only 'city' here.  However, I need
> >> to
> >>> retrieve the corresponding value in the 'country' field as well.
> >>>
> >>> How could I do this using the suggester?
> >>>
> >>> Thanks
> >>>
> >> --
> >> Regards,
> >> Binoy Dalal
> >>
>
>


Re: Need help with auto-suggester

2017-04-14 Thread OTH
I don't need to search multiple fields; I need to search just one field but
get the corresponding values from another field as well.
I.e. if a user is searching for cities, I wouldn't need the countries to
also be searched.  However, when the list of cities is displayed, I need
their corresponding countries to also be displayed.
This is obviously possible with the regular Solr index, but I can't figure
out how to do it with the Suggester index, which seems to only be able to
have one field.
Thanks

On Fri, Apr 14, 2017 at 8:46 AM, Binoy Dalal <binoydala...@gmail.com> wrote:

> You can create a copy field and copy to it from all the fields you want to
> retrieve the suggestions from and then use that field with the suggester.
>
> On Thu 13 Apr, 2017, 23:21 OTH, <omer.t@gmail.com> wrote:
>
> > Hello,
> >
> > I've followed the steps here to set up auto-suggest:
> > https://lucidworks.com/2015/03/04/solr-suggester/
> >
> > So basically I configured the auto-suggester in solrconfig.xml, where I
> > told it which field in my index needs to be used for auto-suggestion.
> >
> > The problem is:
> > When the user searches in the text box in the front end, if they are
> > searching for cities, I also need the countries to appear in the
> drop-down
> > list which the user sees.
> > The field which is being searched is only 'city' here.  However, I need
> to
> > retrieve the corresponding value in the 'country' field as well.
> >
> > How could I do this using the suggester?
> >
> > Thanks
> >
> --
> Regards,
> Binoy Dalal
>


Re: Autosuggestion

2017-04-13 Thread OTH
Hello
So, from what I've picked up so far:
FST only matches from the beginning of the input, but can handle spelling
errors and do stemming.
AnalyzingInfix can't handle spelling errors or stemming but can match from
the middle of the string.
(Is there any way to achieve both of the functionalities above, if need be?)
Performance-wise, FSTs are faster and more compact?

Thanks

On Thu, Apr 13, 2017 at 7:57 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> bq:  FST-based vs AnalyzingInfix
>
> They are two totally different things. FST-based suggesters are very
> fast and compact. But they only match from the beginning of the input.
>
> AnalyzingInfix creates a "sidecar" index that's searched like a normal
> index and the _field_ is returned. Thus analyzinginfix can suggest
> "my dog has fleas" when entering "fleas", but the FST-based suggesters
> cannot.
>
> Best,
> Erick
>
> On Thu, Apr 13, 2017 at 6:24 AM, OTH <omer.t@gmail.com> wrote:
> > Thanks, that's very helpful!
> > The third link especially is quite helpful.
> > Is there any recommendation regarding using FST-based vs AnalyzingInfix
> > suggesters?
> > Thanks
> >
> > On Wed, Apr 12, 2017 at 6:23 PM, Andrea Gazzarini <gxs...@gmail.com>
> wrote:
> >
> >> Hi,
> >> I think you got an old post. I would have a look at the built-in
> feature,
> >> first. These posts can help you to get a quick overview:
> >>
> >> https://cwiki.apache.org/confluence/display/solr/Suggester
> >> http://alexbenedetti.blogspot.it/2015/07/solr-you-complete-me.html
> >> https://lucidworks.com/2015/03/04/solr-suggester/
> >>
> >> HTH,
> >> Andrea
> >>
> >>
> >> On 12/04/17 14:43, OTH wrote:
> >>
> >>> Hello,
> >>>
> >>> Is there any recommended way to achieve auto-suggestion in textboxes
> using
> >>> Solr?
> >>>
> >>> I'm new to Solr, but right now I have achieved this functionality by
> using
> >>> an example I found online, doing this:
> >>>
> >>> I added a copy field, which is of the following type:
> >>>
> >>> [fieldType XML stripped by the list archive; surviving attributes:
> >>> positionIncrementGap="100", minGramSize="2", maxGramSize="10"]
> >>>
> >>> In the search box, after each character is typed, the above field is
> >>> queried, and the results are shown in a drop-down list.
> >>>
> >>> However, this is performing quite slow.  I'm not sure if that has to do
> >>> with the front-end code, or because I'm not using the recommended
> approach
> >>> in terms of how I'm using Solr.  Is there any other recommended way to
> use
> >>> Solr to achieve this functionality?
> >>>
> >>> Thanks
> >>>
> >>>
> >>
>


Need help with auto-suggester

2017-04-13 Thread OTH
Hello,

I've followed the steps here to set up auto-suggest:
https://lucidworks.com/2015/03/04/solr-suggester/

So basically I configured the auto-suggester in solrconfig.xml, where I
told it which field in my index needs to be used for auto-suggestion.

The problem is:
When the user searches in the text box in the front end, if they are
searching for cities, I also need the countries to appear in the drop-down
list which the user sees.
The field which is being searched is only 'city' here.  However, I need to
retrieve the corresponding value in the 'country' field as well.

How could I do this using the suggester?

Thanks


Re: Autosuggestion

2017-04-13 Thread OTH
Thanks, that's very helpful!
The third link especially is quite helpful.
Is there any recommendation regarding using FST-based vs AnalyzingInfix
suggesters?
Thanks

On Wed, Apr 12, 2017 at 6:23 PM, Andrea Gazzarini <gxs...@gmail.com> wrote:

> Hi,
> I think you got an old post. I would have a look at the built-in feature,
> first. These posts can help you to get a quick overview:
>
> https://cwiki.apache.org/confluence/display/solr/Suggester
> http://alexbenedetti.blogspot.it/2015/07/solr-you-complete-me.html
> https://lucidworks.com/2015/03/04/solr-suggester/
>
> HTH,
> Andrea
>
>
> On 12/04/17 14:43, OTH wrote:
>
>> Hello,
>>
>> Is there any recommended way to achieve auto-suggestion in textboxes using
>> Solr?
>>
>> I'm new to Solr, but right now I have achieved this functionality by using
>> an example I found online, doing this:
>>
>> I added a copy field, which is of the following type:
>>
>> [fieldType XML stripped by the list archive; surviving attributes:
>> positionIncrementGap="100", minGramSize="2", maxGramSize="10"]
>>
>> In the search box, after each character is typed, the above field is
>> queried, and the results are shown in a drop-down list.
>>
>> However, this is performing quite slow.  I'm not sure if that has to do
>> with the front-end code, or because I'm not using the recommended approach
>> in terms of how I'm using Solr.  Is there any other recommended way to use
>> Solr to achieve this functionality?
>>
>> Thanks
>>
>>
>


Autosuggestion

2017-04-12 Thread OTH
Hello,

Is there any recommended way to achieve auto-suggestion in textboxes using
Solr?

I'm new to Solr, but right now I have achieved this functionality by using
an example I found online, doing this:

I added a copy field, which is of the following type:

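[fieldType XML stripped by the list archive; surviving attributes:
positionIncrementGap="100", minGramSize="2", maxGramSize="10"]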

In the search box, after each character is typed, the above field is
queried, and the results are shown in a drop-down list.

However, this is performing quite slowly.  I'm not sure if that has to do
with the front-end code, or because I'm not using the recommended approach
in terms of how I'm using Solr.  Is there any other recommended way to use
Solr to achieve this functionality?

Thanks
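
To give a feel for what an n-gram field with minGramSize="2" and
maxGramSize="10" indexes, here is a small Python sketch.  The original
field type definition did not survive, so whether it used a plain n-gram
filter or an edge n-gram filter is an assumption; both are shown, and the
function names are hypothetical.  The order of grams is illustrative and
not necessarily the order Lucene emits.

```python
def ngrams(token, min_gram=2, max_gram=10):
    """Every substring of `token` with length between min_gram and max_gram."""
    grams = []
    for size in range(min_gram, min(max_gram, len(token)) + 1):
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


def edge_ngrams(token, min_gram=2, max_gram=10):
    """Only the prefixes of `token` with length between min_gram and max_gram."""
    return [token[:size] for size in range(min_gram, min(max_gram, len(token)) + 1)]


all_grams = ngrams("solr")
prefix_grams = edge_ngrams("solr")
```

A plain n-gram filter produces many more terms per token than an edge
n-gram filter, which is one possible reason for a larger index and slower
queries; matching from the middle of a word is what the extra terms buy.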


Possible bug

2017-04-06 Thread OTH
I'm not sure if anyone else had this problem, but this is a problem I had:

I'm using Solr 6.4.1, on Windows, and when I would run 'bin\solr delete -c
<collection>', it wouldn't work properly.  It turned out it was because
there was a space character which shouldn't have been there at the end of
line 1380 in the solr.bat file.  I'm not sure if that's the way it came or
if maybe I had accidentally added that space at some point, though I don't
seem to remember doing anything like that.

After removing that space, the delete command works fine.

Regards


Searchable archive of this mailing list

2017-03-31 Thread OTH
Hi all,

Is there a searchable archive of this mailing list?

I'm asking just so I don't have to post a question in the future which may
have been answered before already.

Thanks


Re: Data Import

2017-03-17 Thread OTH
Are Kafka and SQS interchangeable?  (The latter does not seem to be free.)

@Wunder:
I'm assuming that an update to Solr would fail if Solr is unavailable, not
just when posting via, say, a DB trigger, but probably also when posting
through SolrJ?  (Which is what I'm using for now.)  So, even when using
SolrJ, would it be a good idea to use queuing software?

Thanks
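
The EVENTS-table queue that Erick describes in the quoted reply below can
be sketched in Python with SQLite standing in for the RDBMS.  Everything
here is illustrative: the products table, the trigger names and the send
callback are hypothetical, and a real consumer would call SolrJ (or
another client) where send() is invoked.  The point is that an event is
deleted only after send() returns, so changes stay queued while Solr is
unavailable.

```python
import sqlite3

SCHEMA = """
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE events (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    product_id INTEGER NOT NULL,
    op TEXT NOT NULL
);
CREATE TRIGGER products_ins AFTER INSERT ON products BEGIN
    INSERT INTO events (product_id, op) VALUES (NEW.id, 'upsert');
END;
CREATE TRIGGER products_upd AFTER UPDATE ON products BEGIN
    INSERT INTO events (product_id, op) VALUES (NEW.id, 'upsert');
END;
CREATE TRIGGER products_del AFTER DELETE ON products BEGIN
    INSERT INTO events (product_id, op) VALUES (OLD.id, 'delete');
END;
"""


def open_db():
    """Create an in-memory database with the main table, queue and triggers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def drain_events(conn, send):
    """Forward queued events in order; remove each only after send() returns."""
    rows = conn.execute(
        "SELECT seq, product_id, op FROM events ORDER BY seq"
    ).fetchall()
    for seq, product_id, op in rows:
        send(product_id, op)  # hypothetical indexing call; may raise if Solr is down
        conn.execute("DELETE FROM events WHERE seq = ?", (seq,))
        conn.commit()
    return len(rows)
```

Calling drain_events() on a timer gives the polling behaviour described
in the quoted reply; deletes are captured as well, which is hard to do by
inspecting the main table alone.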

On Fri, Mar 17, 2017 at 10:12 PM, vishal jain <jain02...@gmail.com> wrote:

> Streaming the data through kafka would be a good option if near real time
> data indexing is the key requirement.
> In our application the RDBMS data is populated by an ETL job periodically
> so we don't need real time data indexing for now.
>
> Cheers,
> Vishal
>
> On Fri, Mar 17, 2017 at 10:30 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> > Or set a trigger on your RDBMS's main table to put the relevant
> > information in a different table (call it EVENTS) and have your SolrJ
> > consult the EVENTS table periodically. Essentially you're using the
> > EVENTS table as a queue where the trigger is the producer and the
> > SolrJ program is the consumer.
> >
> > It's a polling solution though, so not event-driven. There's no
> > mechanism that I know of to have, say, your RDBMS push an event to DIH
> > for instance.
> >
> > Hmmm, I do wonder if anyone's done anything with queueing (e.g. Kafka)
> > for this kind of problem..
> >
> > Best,
> > Erick
> >
> > On Fri, Mar 17, 2017 at 8:41 AM, Alexandre Rafalovitch
> > <arafa...@gmail.com> wrote:
> > > One assumes by hooking into the same code that updates RDBMS, as
> > > opposed to be reverse engineering the changes from looking at the DB
> > > content. This would be especially the case for Delete changes.
> > >
> > > Regards,
> > >Alex.
> > > 
> > > http://www.solr-start.com/ - Resources for Solr users, new and
> > experienced
> > >
> > >
> > > On 17 March 2017 at 11:37, OTH <omer.t@gmail.com> wrote:
> > >>>
> > >>> Also, solrj is good when you want your RDBMS updates make immediately
> > >>> available in solr.
> > >>
> > >> How can SolrJ be used to make RDBMS updates immediately available?
> > >> Thanks
> > >>
> > >> On Fri, Mar 17, 2017 at 2:28 PM, Sujay Bawaskar <
> > sujaybawas...@gmail.com>
> > >> wrote:
> > >>
> > >>> Hi Vishal,
> > >>>
> > >>> As per my experience DIH is the best for RDBMS to solr index. DIH
> with
> > >>> caching has best performance. DIH nested entities allow you to define
> > >>> simple queries.
> > >>> Also, solrj is good when you want your RDBMS updates make immediately
> > >>> available in solr. DIH full import can be used for index all data
> first
> > >>> time or restore index in case index is corrupted.
> > >>>
> > >>> Thanks,
> > >>> Sujay
> > >>>
> > >>> On Fri, Mar 17, 2017 at 2:34 PM, vishal jain <jain02...@gmail.com>
> > wrote:
> > >>>
> > >>> > Hi,
> > >>> >
> > >>> >
> > >>> > I am new to Solr and am trying to move data from my RDBMS to Solr.
> I
> > know
> > >>> > the available options are:
> > >>> > 1) Post Tool
> > >>> > 2) DIH
> > >>> > 3) SolrJ (as ours is a J2EE application).
> > >>> >
> > >>> > I want to know what is the recommended way for Data import in
> > production
> > >>> > environment.
> > >>> > Will sending data via SolrJ in batches be faster than posting a csv
> > using
> > >>> > POST tool?
> > >>> >
> > >>> >
> > >>> > Thanks,
> > >>> > Vishal
> > >>> >
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Thanks,
> > >>> Sujay P Bawaskar
> > >>> M:+91-77091 53669
> > >>>
> >
>


Re: Data Import

2017-03-17 Thread OTH
Could the database trigger not just post the change to solr?

On Fri, Mar 17, 2017 at 10:00 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> Or set a trigger on your RDBMS's main table to put the relevant
> information in a different table (call it EVENTS) and have your SolrJ
> consult the EVENTS table periodically. Essentially you're using the
> EVENTS table as a queue where the trigger is the producer and the
> SolrJ program is the consumer.
>
> It's a polling solution though, so not event-driven. There's no
> mechanism that I know of to have, say, your RDBMS push an event to DIH
> for instance.
>
> Hmmm, I do wonder if anyone's done anything with queueing (e.g. Kafka)
> for this kind of problem..
>
> Best,
> Erick
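
For reference, the EVENTS-table idea quoted above can be sketched in plain Java. This is only an illustration under assumptions: the table is replaced by an in-memory queue, and the class, method and field names (EventsQueueSketch, enqueue, poll, docId) are made up for the example. A real consumer would read rows over JDBC and push them to Solr with SolrJ.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

final class EventsQueueSketch {
    /** One row of the hypothetical EVENTS table: which document changed and how. */
    static final class Event {
        final String docId;
        final boolean delete;

        Event(String docId, boolean delete) {
            this.docId = docId;
            this.delete = delete;
        }
    }

    // In-memory stand-in for the EVENTS table.
    private final Queue<Event> events = new ArrayDeque<>();

    /** Producer side: what the database trigger would do on each change. */
    synchronized void enqueue(String docId, boolean delete) {
        events.add(new Event(docId, delete));
    }

    /** Consumer side: on each polling tick, take up to max events, oldest first. */
    synchronized List<Event> poll(int max) {
        List<Event> batch = new ArrayList<>();
        while (batch.size() < max && !events.isEmpty()) {
            batch.add(events.remove());
        }
        return batch;
    }

    public static void main(String[] args) {
        EventsQueueSketch queue = new EventsQueueSketch();
        queue.enqueue("42", false);
        queue.enqueue("7", true);
        for (Event e : queue.poll(100)) {
            // A real consumer would add or delete the document in Solr here.
            System.out.println((e.delete ? "delete " : "index ") + e.docId);
        }
    }
}
```

Recording deletes as events is what lets the consumer handle them without having to infer them from the current database contents.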
>
> On Fri, Mar 17, 2017 at 8:41 AM, Alexandre Rafalovitch
> <arafa...@gmail.com> wrote:
> > One assumes by hooking into the same code that updates the RDBMS, as
> > opposed to reverse engineering the changes from looking at the DB
> > content. This would be especially the case for Delete changes.
> >
> > Regards,
> >Alex.
> > 
> > http://www.solr-start.com/ - Resources for Solr users, new and
> experienced
> >
> >
> > On 17 March 2017 at 11:37, OTH <omer.t@gmail.com> wrote:
> >>>
> >>> Also, solrj is good when you want your RDBMS updates make immediately
> >>> available in solr.
> >>
> >> How can SolrJ be used to make RDBMS updates immediately available?
> >> Thanks
> >>
> >> On Fri, Mar 17, 2017 at 2:28 PM, Sujay Bawaskar <
> sujaybawas...@gmail.com>
> >> wrote:
> >>
> >>> Hi Vishal,
> >>>
> >>> As per my experience DIH is the best for RDBMS to solr index. DIH with
> >>> caching has best performance. DIH nested entities allow you to define
> >>> simple queries.
> >>> Also, solrj is good when you want your RDBMS updates make immediately
> >>> available in solr. DIH full import can be used for index all data first
> >>> time or restore index in case index is corrupted.
> >>>
> >>> Thanks,
> >>> Sujay
> >>>
> >>> On Fri, Mar 17, 2017 at 2:34 PM, vishal jain <jain02...@gmail.com>
> wrote:
> >>>
> >>> > Hi,
> >>> >
> >>> >
> >>> > I am new to Solr and am trying to move data from my RDBMS to Solr. I
> know
> >>> > the available options are:
> >>> > 1) Post Tool
> >>> > 2) DIH
> >>> > 3) SolrJ (as ours is a J2EE application).
> >>> >
> >>> > I want to know what is the recommended way for Data import in
> production
> >>> > environment.
> >>> > Will sending data via SolrJ in batches be faster than posting a csv
> using
> >>> > POST tool?
> >>> >
> >>> >
> >>> > Thanks,
> >>> > Vishal
> >>> >
> >>>
> >>>
> >>>
> >>> --
> >>> Thanks,
> >>> Sujay P Bawaskar
> >>> M:+91-77091 53669
> >>>
>


Re: Data Import

2017-03-17 Thread OTH
>
> Also, solrj is good when you want your RDBMS updates make immediately
> available in solr.

How can SolrJ be used to make RDBMS updates immediately available?
Thanks

On Fri, Mar 17, 2017 at 2:28 PM, Sujay Bawaskar 
wrote:

> Hi Vishal,
>
> As per my experience DIH is the best for RDBMS to solr index. DIH with
> caching has best performance. DIH nested entities allow you to define
> simple queries.
> Also, solrj is good when you want your RDBMS updates make immediately
> available in solr. DIH full import can be used for index all data first
> time or restore index in case index is corrupted.
>
> Thanks,
> Sujay
>
> On Fri, Mar 17, 2017 at 2:34 PM, vishal jain  wrote:
>
> > Hi,
> >
> >
> > I am new to Solr and am trying to move data from my RDBMS to Solr. I know
> > the available options are:
> > 1) Post Tool
> > 2) DIH
> > 3) SolrJ (as ours is a J2EE application).
> >
> > I want to know what is the recommended way for Data import in production
> > environment.
> > Will sending data via SolrJ in batches be faster than posting a csv using
> > POST tool?
> >
> >
> > Thanks,
> > Vishal
> >
>
>
>
> --
> Thanks,
> Sujay P Bawaskar
> M:+91-77091 53669
>


Best way to synonymize with Wordnet

2017-03-13 Thread OTH
Hello all,

I am looking to incorporate synonymization using Wordnet in my Solr
application.

Does anyone have any advice on how to do this, and what the 'best
practices' would be in this regard?

Much thanks


Re: Solr JDBC with Core (vs Collection)

2017-03-08 Thread OTH
Hello,

Yes, I was trying to use it with a non-cloud setup.

Basically, our application probably won't be requiring cloud features;
however, it would be extremely helpful to use JDBC with Solr.

Of course, we don't mind using SolrCloud if that's what is needed for JDBC.
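
For reference, a minimal sketch of the JDBC URL for the SolrCloud case. As far as I can tell from the reference guide, the URL names the ZooKeeper connection string and a collection, not a core. The host, port and collection name below are placeholders, and the helper class is made up for the example.

```java
final class SolrJdbcUrl {
    /** Builds a Solr JDBC URL from a ZooKeeper connection string and a collection name. */
    static String forCollection(String zkHost, String collection) {
        return "jdbc:solr://" + zkHost + "?collection=" + collection;
    }

    public static void main(String[] args) {
        // "localhost:9983" and "sales_history" are placeholders, not values from this setup.
        String url = forCollection("localhost:9983", "sales_history");
        System.out.println(url);
        // With solr-solrj and its dependencies on the classpath, this URL would be
        // passed to java.sql.DriverManager.getConnection(url).
    }
}
```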

Are there any drawbacks to using SolrCloud, if a distributed setup probably
won't be required?

Much thanks

On Thu, Mar 9, 2017 at 12:13 AM, Alexandre Rafalovitch <arafa...@gmail.com>
wrote:

> I believe JDBC requires streams, which requires SolrCloud, which
> requires Collections (even if it is a single-core collection).
>
> Are you trying to use it with non-cloud setup?
>
> Regards,
>Alex.
> 
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
>
> On 8 March 2017 at 14:02, OTH <omer.t@gmail.com> wrote:
> > Hello,
> >
> > From the examples I am seeing online and in the reference guide (
> > https://cwiki.apache.org/confluence/display/solr/Solr+JDBC+-+SQuirreL+SQL),
> > I can only see Solr JDBC being used against a collection.  Is it possible
> > however to use it with a core?  What should the JDBC URL be like in that
> > case?
> >
> > Thanks
>


Solr JDBC with Core (vs Collection)

2017-03-08 Thread OTH
Hello,

From the examples I am seeing online and in the reference guide (
https://cwiki.apache.org/confluence/display/solr/Solr+JDBC+-+SQuirreL+SQL),
I can only see Solr JDBC being used against a collection.  Is it possible
however to use it with a core?  What should the JDBC URL be like in that
case?

Thanks


Re: Managed schema vs schema.xml

2017-03-07 Thread OTH
In the reference guide, in the chapter named "The Well Configured Solr
Instance", it says (I'm copying+pasting from the PDF version) :

Switching from Managed Schema to Manually Edited schema.xml
> If you have started Solr with managed schema enabled and you would like to
> switch to manually editing a schema.xml file, you should take the following
> steps:
> 1) Rename the managed-schema file to schema.xml.
> 2) Modify solrconfig.xml to replace the schemaFactory class.
>    a) Remove any ManagedIndexSchemaFactory definition if it exists.
>    b) Add a ClassicIndexSchemaFactory definition as shown above.
> 3) Reload the core(s).
> If you are using SolrCloud, you may need to modify the files via
> ZooKeeper. The bin/solr script provides an easy way to download the files
> from ZooKeeper and upload them back after edits. See the section ZooKeeper
> Operations for more information.
> To have full control over your schema.xml file, you may also want to
> disable schema guessing, which allows unknown fields to be added to the
> schema during indexing. The properties that enable this feature are
> discussed in the section Schemaless Mode.
>
> IndexConfig in SolrConfig
> The <indexConfig> section of solrconfig.xml defines low-level behavior of
> the Lucene index writers.
> By default, the settings are commented out in the sample solrconfig.xml
> included with Solr, which means the defaults are used. In most cases, the
> defaults are fine.
>
> ...
>
> Parameters covered in this section:
> Writing New Segments
> Merging Index Segments
> Compound File Segments
> Index Locks
> Other Indexing Settings
>
> Writing New Segments
> ramBufferSizeMB
> Once accumulated document updates exceed this much memory space (defined
> in megabytes), then the pending updates are flushed. This can also create
> new segments or trigger a merge. Using this setting is generally preferable
> to maxBufferedDocs. If both maxBufferedDocs and ramBufferSizeMB are set in
> solrconfig.xml, then a flush will occur when either limit is reached. The
> default is 100Mb.
> <ramBufferSizeMB>100</ramBufferSizeMB>
> maxBufferedDocs
> Sets the number of document updates to buffer in memory before they are
> flushed as a new segment. This may also trigger a merge. The default Solr
> configuration sets to flush by RAM usage (ramBufferSizeMB).
> <maxBufferedDocs>1000</maxBufferedDocs>
> useCompoundFile
> Controls whether newly written (and not yet merged) index segments should
> use the Compound File Segment format. The default is false.
> <useCompoundFile>false</useCompoundFile>


On Wed, Mar 8, 2017 at 1:32 AM, Phil Scadden  wrote:

> I would second that guide could be clearer on that. I read and reread
> several times trying to get my head around the schema.xml/managed-schema
> bit. I came away from first cursory reading with the idea that
> managed-schema was mostly for schema-less mode and only after some stuff
> ups and puzzling over comments in the basic-config schema file itself did I
> go back for more careful re-read. I am still not sure that I have got all
> the nuances. My understanding is:
>
> If you don’t want ability to edit it via admin UI or config api, rename to
> schema.xml. Unclear whether you have to make changes to other configs to do
> this. Also unclear to me whether there was any upside at all to using
> schema.xml? Why degrade functionality? Does the capacity for schema.xml
> only exist for backward compatibility?
>
> If you want to run schema-less, you have to use managed-schema? (I
> didn’t delve too deep into this).
>
> In the end, I used basic-config to create core and then hacked
> managed-schema from there.
>
>
> I would have to say the "basic-config" seems distinctly more than basic.
> It is still a huge file. I thought perhaps I could delete every unused
> field type, but worried there were some "system" dependencies. Ie if you
> want *target type wildcard queries do you need to have text_general_reverse
> and a copy to it? If you always explicitly set only defined fields in a
> custom indexer, then can you dump the whole dynamic fields bit?
> Notice: This email and any attachments are confidential and may not be
> used, published or redistributed without the prior written consent of the
> Institute of Geological and Nuclear Sciences Limited (GNS Science). If
> received in error please destroy and immediately notify GNS Science. Do not
> copy or disclose the contents.
>


Re: Managed schema vs schema.xml

2017-03-07 Thread OTH
Hi,

Thanks, I should've consulted this guide more thoroughly.  I actually had
encountered this section when reading the guide, but somehow forgot about
it when asking this question.  I think it doesn't clarify some things very
well, which could leave a beginner a bit confused.

Specifically, it isn't clear that 'managed-schema' can indeed be modified
by hand, or even that what the HTTP API does is actually modify this file.
When I was first checking out Solr, I saw this section and remember
thinking how verbose it was to make changes this way, because I saw on some
website how someone was making changes to a 'schema.xml' file instead, and
that seemed easier.  This file was supposed to be in 'conf' but I couldn't
find it... so I tried making the changes to managed-schema instead and it
worked.  But then I also read somewhere that you aren't supposed to do
that, so I wasn't sure how to do things going forward.

Anyways, I'm clearer now that the managed-schema does safely allow
hand-edits if done properly, which might in some cases be easier than the
HTTP calls; and at the same time it offers the HTTP API as an option as
well when needed / preferred.
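
For reference, a sketch of what adding a fieldType over the HTTP API could look like, using the Schema API's add-field-type command. The core URL, the field type name "text_simple" and the analyzer chain are assumptions for illustration only. The code builds the request with the JDK's java.net.http client (Java 11+) and does not send it.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class AddFieldTypeRequest {
    // Hypothetical field type; the name and analyzer chain are only for illustration.
    static final String PAYLOAD =
        "{\"add-field-type\": {"
      + "\"name\": \"text_simple\", "
      + "\"class\": \"solr.TextField\", "
      + "\"analyzer\": {"
      + "\"tokenizer\": {\"class\": \"solr.StandardTokenizerFactory\"}, "
      + "\"filters\": [{\"class\": \"solr.LowerCaseFilterFactory\"}]}}}";

    /** Builds a POST to <coreUrl>/schema carrying the add-field-type command. */
    static HttpRequest build(String coreUrl) {
        return HttpRequest.newBuilder(URI.create(coreUrl + "/schema"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(PAYLOAD))
            .build();
    }

    public static void main(String[] args) {
        // "mycore" is a placeholder core name.
        HttpRequest request = build("http://localhost:8983/solr/mycore");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(PAYLOAD);
        // To actually send it (needs a running Solr):
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

Per the advice elsewhere in this thread, if the file has been hand-edited, reload or restart before sending a request like this.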

Much thanks

On Tue, Mar 7, 2017 at 9:50 PM, Alexandre Rafalovitch <arafa...@gmail.com>
wrote:

> Yes, it has been asked many times and has been answered both on the
> list and in the - awesome - Reference Guide. I'd recommend reading
> that and then coming back again with more specific question:
> https://cwiki.apache.org/confluence/display/solr/Overview+of+Documents%2C+Fields%2C+and+Schema+Design
>
> One confusion to clarify though. API is HTTP API, Admin UI just uses
> it and does not - yet - expose everything possible. You can always
> just hit Solr directly for the missing bits. Again, RTARG (.. Awesome
> Reference Guide) and then come back with specifics:
> https://cwiki.apache.org/confluence/display/solr/Schema+API
>
> Regards,
>Alex.
>
> 
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
>
> On 7 March 2017 at 11:41, OTH <omer.t@gmail.com> wrote:
> > Hello
> >
> > I'm sure this has been asked many times but I'm having some confusion
> here.
> >
> > I understand that managed-schema is not supposed to be edited by hand but
> > only via the "API".  All I understand about this "API" however, is that
> it
> > may be referring to the "Schema" page in the Solr browser-based Admin.
> >
> > However, in this "Schema" page, it provides options for "Add Field", "Add
> > Dynamic Field", "Add Copy Field"; but when I was trying to add a
> > "fieldType", I couldn't find any way to do this from this web page.
> >
> > So I instead edited the managed-schema page by hand, which I understand
> can
> > be problematic if the schema is ever edited it via the API later on?
> >
> > I am using v. 6.4.1; when I create a new core, it creates the
> > managed-schema file in the 'conf' folder.  Is there any way to use the
> > older 'schema.xml' format instead?  Because there seems to be more
> > documentation available for that, and like I describe, the browser API
> > seems to perhaps be lacking.
> >
> > If so - what do users usually prefer; schema.xml or managed-schema?  (I'm
> > aware this depends on individual preference, but would be nice to get
> > others' feedback.)
> >
> > Thanks
>


Re: Managed schema vs schema.xml

2017-03-07 Thread OTH
Hi,

Thanks, that sufficiently answers the question.
It's especially good to know now that hand-editing is fine, as long as it's
separated from API calls with restarts in between.

Thanks

On Tue, Mar 7, 2017 at 9:57 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 3/7/2017 9:41 AM, OTH wrote:
> > I understand that managed-schema is not supposed to be edited by hand but
> > only via the "API".  All I understand about this "API" however, is that
> it
> > may be referring to the "Schema" page in the Solr browser-based Admin.
> >
> > However, in this "Schema" page, it provides options for "Add Field", "Add
> > Dynamic Field", "Add Copy Field"; but when I was trying to add a
> > "fieldType", I couldn't find any way to do this from this web page.
>
> The schema page in the admin UI is not actually the Schema API, but it
> USES the Schema API.  The admin UI is a javascript app that runs in your
> browser and makes Solr API requests.  Admin UI URLs are useless outside
> of a full browser.
>
> > So I instead edited the managed-schema page by hand, which I understand
> can
> > be problematic if the schema is ever edited it via the API later on?
>
> Hand-editing is only problematic if you mix those edits with using the
> API and forget to reload or restart after a hand-edit and before using
> the API.  If you are careful to reload/restart before switching editing
> methods, there will be no problems.
>
> > I am using v. 6.4.1; when I create a new core, it creates the
> > managed-schema file in the 'conf' folder.  Is there any way to use the
> > older 'schema.xml' format instead?  Because there seems to be more
> > documentation available for that, and like I describe, the browser API
> > seems to perhaps be lacking.
>
> The "format" of the schema never changes.  It is exactly the same with
> either file.  It is the filename that is different.  Also, the managed
> schema allows the Schema API to be used, so you can edit it with HTTP
> requests.  If you switch to the Classic schema, then it will go back to
> schema.xml.  Depending on which example configuration you start with,
> switching back to Classic may require more config edits beyond just
> changing the schema factory.  There are additional features Solr can use
> that rely on the managed schema.
>
> > If so - what do users usually prefer; schema.xml or managed-schema?  (I'm
> > aware this depends on individual preference, but would be nice to get
> > others' feedback.)
>
> As for what users prefer, I do not know.  I can tell you that the
> default schema factory has been the managed schema since version 5.5,
> and all example configs since that version are using it.  When I upgrade
> to a 6.x version in production, I plan on keeping the managed schema,
> because it's good to go with defaults unless there's a good reason not
> to, but I will continue to hand-edit for all changes.
>
> Thanks,
> Shawn
>
>


Re: Tokenized querying

2017-03-07 Thread OTH
Hi,

Thanks a lot for the help.  Adding 'score' to 'fl' worked.
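
For reference, a minimal sketch of a select URL that asks for the score along with the stored fields. The core URL and the query are placeholders and the helper class is made up for the example. URLEncoder.encode with a Charset argument needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class ScoreQueryUrl {
    /** Builds a /select URL whose fl parameter requests all fields plus the score. */
    static String build(String coreUrl, String q) {
        return coreUrl + "/select"
            + "?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8)
            + "&fl=" + URLEncoder.encode("*,score", StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "mycore" is a placeholder core name.
        System.out.println(build("http://localhost:8983/solr/mycore", "name:San"));
        // prints http://localhost:8983/solr/mycore/select?q=name%3ASan&fl=*%2Cscore
    }
}
```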

I had been using Lucene for some time (though not at an expert level), and
I was usually pretty satisfied with the scoring; so I'm assuming Solr
should work fine for me too.  For the time being I'm just trying to get a
handle on how to use Solr in the first place though.

Thanks

On Tue, Mar 7, 2017 at 9:45 PM, Alexandre Rafalovitch <arafa...@gmail.com>
wrote:

> Try adding "score" as a pseudo-field in the 'fl' parameter:
> https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-Thefl(FieldList)Parameter
>
> You can also enable debug and debug.explain.structured, if you want to
> go all inception on figuring the scores out:
> https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThedebugParameter
> . And if you do, https://www.manning.com/books/relevant-search is your
> friend and I think Manning is running 40% discount right now on
> Twitter.
>
> Regards,
>Alex.
> 
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
>
> On 7 March 2017 at 11:41, OTH <omer.t@gmail.com> wrote:
> > Hello,
> >
> > Thanks for your response; it turned out the fields were indeed of
> 'string'
> > type, and when I changed them to 'text_general', it started to work as I
> > wanted.
> >
> > However, I'm still not sure how to extract the scores?  I don't seem to
> be
> > getting that in the response.
> >
> > Much thanks
> >
> > On Tue, Mar 7, 2017 at 8:07 PM, Alexandre Rafalovitch <
> arafa...@gmail.com>
> > wrote:
> >
> >> The default text field definition (text_general) tokenizes on spaces,
> >> so - if I understand the question correctly - it should just work. Are
> >> you by any chance searching against name field that is defined as
> >> String (and is not tokenized).
> >>
> >> If you do Solr tutorial, you search on "ipod", which seems like a
> >> similar case to me. So, can you start from there? You can just index
> >> your own text into the example config for example.
> >>
> >> Regards,
> >>Alex.
> >> P.s. If you are coming from Lucene, copyField instruction may be
> >> slightly confusing. In the examples provided, your text is copied from
> >> named specific fields to text/_text_ field which is actually the
> >> default field searched, using the type definition associated with that
> >> text/_text_ field, rather than with the original field.
> >> 
> >> http://www.solr-start.com/ - Resources for Solr users, new and
> experienced
> >>
> >>
> >> On 7 March 2017 at 09:30, OTH <omer.t@gmail.com> wrote:
> >> > Hello,
> >> >
> >> > I am new to Solr.  I am using v. 6.4.1.  I have what is probably a
> pretty
> >> > simple question.
> >> >
> >> > Let's say I have these documents with the following values in a single
> >> > field (let's call it "name"):
> >> >
> >> > sando...@company.example.com
> >> > sandb...@company.example.com
> >> > sa...@company.example.com
> >> > Sancho Landolt
> >> > Sanders Greenley
> >> > Sanders Massey
> >> > Santa Catarina
> >> > San Carlos de Bariloche
> >> > San Francisco
> >> > San Mateo
> >> >
> >> > I would like, if the search query is "San", for Solr to return the
> >> > following and only the following:
> >> > San Carlos de Bariloche
> >> > San Francisco
> >> > San Mateo
> >> >
> >> > So basically, I'd like to search based on tokens.  I'd also like Solr
> to
> >> > return an associated score.  So eg, if the user searches "San
> Francisco",
> >> > it should still return the above results, but obviously the score for
> the
> >> > document with "San Francisco" would be much higher.
> >> >
> >> > I've been doing this pretty easily using Lucene from Java, however I'm
> >> > unable to figure out how to do it using Solr.
> >> >
> >> > Much thanks
> >>
>


Managed schema vs schema.xml

2017-03-07 Thread OTH
Hello

I'm sure this has been asked many times but I'm having some confusion here.

I understand that managed-schema is not supposed to be edited by hand but
only via the "API".  All I understand about this "API" however, is that it
may be referring to the "Schema" page in the Solr browser-based Admin.

However, in this "Schema" page, it provides options for "Add Field", "Add
Dynamic Field", "Add Copy Field"; but when I was trying to add a
"fieldType", I couldn't find any way to do this from this web page.

So I instead edited the managed-schema file by hand, which I understand can
be problematic if the schema is ever edited via the API later on?

I am using v. 6.4.1; when I create a new core, it creates the
managed-schema file in the 'conf' folder.  Is there any way to use the
older 'schema.xml' format instead?  Because there seems to be more
documentation available for that, and as I described above, the browser API
seems to perhaps be lacking.

If so - what do users usually prefer; schema.xml or managed-schema?  (I'm
aware this depends on individual preference, but would be nice to get
others' feedback.)

Thanks


Re: Tokenized querying

2017-03-07 Thread OTH
Hello,

Thanks for your response; it turned out the fields were indeed of 'string'
type, and when I changed them to 'text_general', it started to work as I
wanted.
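
For anyone hitting the same thing, here is a rough illustration of why the field type matters. It is a simplification and not Solr's actual analysis: an untokenized value only matches when the whole value equals the query, while a tokenized one matches on individual lowercased words. The splitting rule below (lowercase, split on non-alphanumerics) is an assumption standing in for the real tokenizer and filters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class TokenMatchSketch {
    // Sample values from the question (the obfuscated email addresses are left out).
    static final List<String> NAMES = Arrays.asList(
        "Sancho Landolt", "Sanders Greenley", "Sanders Massey", "Santa Catarina",
        "San Carlos de Bariloche", "San Francisco", "San Mateo");

    /** Untokenized ("string"-like) match: the whole value must equal the query. */
    static List<String> exactMatches(List<String> values, String query) {
        List<String> hits = new ArrayList<>();
        for (String value : values) {
            if (value.equals(query)) {
                hits.add(value);
            }
        }
        return hits;
    }

    /** Tokenized match: lowercase, split on non-alphanumerics, compare whole tokens. */
    static List<String> tokenMatches(List<String> values, String query) {
        String needle = query.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String value : values) {
            String[] tokens = value.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+");
            if (Arrays.asList(tokens).contains(needle)) {
                hits.add(value);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(exactMatches(NAMES, "San")); // prints []
        System.out.println(tokenMatches(NAMES, "San"));
        // prints [San Carlos de Bariloche, San Francisco, San Mateo]
    }
}
```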

However, I'm still not sure how to extract the scores?  I don't seem to be
getting that in the response.

Much thanks

On Tue, Mar 7, 2017 at 8:07 PM, Alexandre Rafalovitch <arafa...@gmail.com>
wrote:

> The default text field definition (text_general) tokenizes on spaces,
> so - if I understand the question correctly - it should just work. Are
> you by any chance searching against name field that is defined as
> String (and is not tokenized).
>
> If you do Solr tutorial, you search on "ipod", which seems like a
> similar case to me. So, can you start from there? You can just index
> your own text into the example config for example.
>
> Regards,
>Alex.
> P.s. If you are coming from Lucene, copyField instruction may be
> slightly confusing. In the examples provided, your text is copied from
> named specific fields to text/_text_ field which is actually the
> default field searched, using the type definition associated with that
> text/_text_ field, rather than with the original field.
> 
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
>
> On 7 March 2017 at 09:30, OTH <omer.t@gmail.com> wrote:
> > Hello,
> >
> > I am new to Solr.  I am using v. 6.4.1.  I have what is probably a pretty
> > simple question.
> >
> > Let's say I have these documents with the following values in a single
> > field (let's call it "name"):
> >
> > sando...@company.example.com
> > sandb...@company.example.com
> > sa...@company.example.com
> > Sancho Landolt
> > Sanders Greenley
> > Sanders Massey
> > Santa Catarina
> > San Carlos de Bariloche
> > San Francisco
> > San Mateo
> >
> > I would like, if the search query is "San", for Solr to return the
> > following and only the following:
> > San Carlos de Bariloche
> > San Francisco
> > San Mateo
> >
> > So basically, I'd like to search based on tokens.  I'd also like Solr to
> > return an associated score.  So eg, if the user searches "San Francisco",
> > it should still return the above results, but obviously the score for the
> > document with "San Francisco" would be much higher.
> >
> > I've been doing this pretty easily using Lucene from Java, however I'm
> > unable to figure out how to do it using Solr.
> >
> > Much thanks
>


Tokenized querying

2017-03-07 Thread OTH
Hello,

I am new to Solr.  I am using v. 6.4.1.  I have what is probably a pretty
simple question.

Let's say I have these documents with the following values in a single
field (let's call it "name"):

sando...@company.example.com
sandb...@company.example.com
sa...@company.example.com
Sancho Landolt
Sanders Greenley
Sanders Massey
Santa Catarina
San Carlos de Bariloche
San Francisco
San Mateo

I would like, if the search query is "San", for Solr to return the
following and only the following:
San Carlos de Bariloche
San Francisco
San Mateo

So basically, I'd like to search based on tokens.  I'd also like Solr to
return an associated score.  So eg, if the user searches "San Francisco",
it should still return the above results, but obviously the score for the
document with "San Francisco" would be much higher.

I've been doing this pretty easily using Lucene from Java, however I'm
unable to figure out how to do it using Solr.

Much thanks


Re: Viewing more than 10 results in Solr Admin

2017-02-28 Thread OTH
As per your advice I just tried submitting with the following text in the
"q" field:
>
> *:*=0=20

However I got the following response / error:

{
  "responseHeader":{
"status":400,
"QTime":13,
"params":{
  "q":"*:*=0=20",
  "indent":"on",
  "wt":"json",
  "_":"1488305315988"}},
  "error":{
"metadata":[
  "error-class","org.apache.solr.common.SolrException",
  "root-error-class","org.apache.solr.common.SolrException"],
"msg":"undefined field *",
"code":400}}

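For reference: the echoed params above show the whole string, paging part included, arriving as the value of q. In Solr, start and rows are request parameters of their own, separate from q. A minimal sketch of a request URL that pages this way, assuming a placeholder core name and Java 10 or later (the helper class is made up for the example):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class PagedSelectUrl {
    /** Builds a /select URL with q, start and rows as separate parameters. */
    static String build(String coreUrl, String q, int start, int rows) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", q);
        params.put("start", Integer.toString(start));
        params.put("rows", Integer.toString(rows));

        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return coreUrl + "/select?" + query;
    }

    public static void main(String[] args) {
        // "mycore" is a placeholder core name.
        System.out.println(build("http://localhost:8983/solr/mycore", "*:*", 0, 20));
        // prints http://localhost:8983/solr/mycore/select?q=*%3A*&start=0&rows=20
    }
}
```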

On Tue, Feb 28, 2017 at 11:19 PM, OTH <omer.t@gmail.com> wrote:

> Hello, thanks for response - There are no screenshots attached to your
> email though.
>
> On Tue, Feb 28, 2017 at 11:12 PM, Deeksha Sharma <
> dsha...@flexerasoftware.com> wrote:
>
>> By default it's 10 rows in the Admin UI, and indeed it's gray. But did you
>> try writing a number into the text fields for start and rows? See the
>> screenshots attached.
>>
>>
>>
>>
>> On 2/28/17, 10:08 AM, "OTH" <omer.t@gmail.com> wrote:
>>
>> Hello,
>>
>> In the browser-based Solr Admin, in the 'Query' page, the "start" and
>> "rows" input boxes have default values of 0 and 10 respectively, but
>> these
>> values are grayed out the input boxes are not allowing me to change
>> their
>> values.  Therefore, whenever I submit a query on this page, I am only
>> ever
>> able to see the first 10 rows.  How can I see more rows / results?
>>
>> Thanks
>>
>>
>>
>


Re: Viewing more than 10 results in Solr Admin

2017-02-28 Thread OTH
Hello, thanks for the response, but there are no screenshots attached to
your email.

On Tue, Feb 28, 2017 at 11:12 PM, Deeksha Sharma <
dsha...@flexerasoftware.com> wrote:

> By default it's 10 rows in the Admin UI, and indeed it's gray. But did you
> try writing a number into the text fields for start and rows? See the
> screenshots attached.
>
>
>
>
> On 2/28/17, 10:08 AM, "OTH" <omer.t@gmail.com> wrote:
>
> Hello,
>
> In the browser-based Solr Admin, in the 'Query' page, the "start" and
> "rows" input boxes have default values of 0 and 10 respectively, but
> these
> values are grayed out the input boxes are not allowing me to change
> their
> values.  Therefore, whenever I submit a query on this page, I am only
> ever
> able to see the first 10 rows.  How can I see more rows / results?
>
> Thanks
>
>
>


Viewing more than 10 results in Solr Admin

2017-02-28 Thread OTH
Hello,

In the browser-based Solr Admin, in the 'Query' page, the "start" and
"rows" input boxes have default values of 0 and 10 respectively, but these
values are grayed out and the input boxes do not let me change them.
Therefore, whenever I submit a query on this page, I am only ever able to
see the first 10 rows.  How can I see more rows / results?

Thanks


Add fieldType from Solr API

2017-02-26 Thread OTH
Hello,

I am new to Solr, and am using Solr v. 6.4.1.

I need to add a new "fieldType" to my schema.  My version of Solr is using
the "managed-schema" XML file, which I gather one is not supposed to modify
directly.  Is it possible to add a new fieldType using the Solr Admin via
the browser?  The "schema" page doesn't seem to provide this option, at
least from what I can tell.

Thanks


Re: Auto-generate unique key when adding documents from SolrJ

2017-02-26 Thread OTH
Thanks, great, it's working now!
Omer
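
For the archives: the fix was what Alex describes below, i.e. making sure the declared chain actually runs, either by marking it default or by passing update.chain. A different option, if you would rather not depend on server-side config, is to generate the id on the client before sending the document. The sketch below uses only the JDK. The field name "customer" and the helper class are made up for illustration, and the resulting entries would still have to be copied into a SolrInputDocument.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

final class ClientSideId {
    /** Returns a copy of the fields with an "id" added if none was supplied. */
    static Map<String, Object> withId(Map<String, Object> fields) {
        Map<String, Object> doc = new LinkedHashMap<>(fields);
        doc.putIfAbsent("id", UUID.randomUUID().toString());
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("customer", "ACME"); // hypothetical field, for illustration only
        Map<String, Object> doc = withId(fields);
        // Each entry would then go into a SolrInputDocument via addField(name, value).
        System.out.println(doc);
    }
}
```

The trade-off is that re-sending the same source record produces a new id each time, so this only suits data where duplicates are not a concern or are handled elsewhere.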

On Sun, Feb 26, 2017 at 8:24 PM, Alexandre Rafalovitch <arafa...@gmail.com>
wrote:

> It is not enough to declare URP chain, you have to invoke it.
>
> Either by marking it default or by adding the update.chain parameter
> to the request handler (or in initParams) you use to update the
> documents (usually /update). See, for example:
> https://github.com/apache/lucene-solr/blob/master/solr/server/solr/configsets/data_driven_schema_configs/conf/solrconfig.xml#L837
>
> Regards,
>Alex.
> 
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
>
> On 26 February 2017 at 10:11, OTH <omer.t@gmail.com> wrote:
> > Hello all,
> >
> > First of all, I am very new to Solr.
> >
> > I am using Solr version 6.4.1.  I have a Solr core (non-cloud), where
> there
> > is a mandatory unique key field called "id".
> >
> > I am trying to add documents to the core from Java, without having to
> > specify the "id" field explicitly; i.e. to have it auto-generated.
> >
> > I learned that this is possible by including the following information in
> > the conf/solrconfig.xml file:
> >
> >> 
> >> 
> >> 
> >> id
> >>   
> >> ...
> >> 
> >> 
> >> 
> >>   
> >
> >
> > (I did restart the server after adding the above text to the xml file.)
> >
> > However, when I try to add documents from Java using SolrJ (without
> > specifying the "id" field), I get the following exception:
> >
> >> Exception in thread "main"
> >> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
> Error
> >> from server at http://localhost:8983/solr/sales_history: Document is
> >> missing mandatory uniqueKey field: id
> >
> >
> > My Java code is like this:
> >
> >> SolrClient solr = new HttpSolrClient.Builder(SOLR_URL).build();
> >> SolrInputDocument document = new SolrInputDocument();
> >> document.addField(..., ...);
> >> document.addField(..., ...);
> >> UpdateResponse updateResponse = solr.add(document);
> >
> >
> > The exception is thrown from the last line above.
> >
> > Is there any way to add documents from Java and have the uniqueKey field
> be
> > auto-generated?
> >
> >
> > Thank you
>


Auto-generate unique key when adding documents from SolrJ

2017-02-26 Thread OTH
Hello all,

First of all, I am very new to Solr.

I am using Solr version 6.4.1.  I have a Solr core (non-cloud), where there
is a mandatory unique key field called "id".

I am trying to add documents to the core from Java, without having to
specify the "id" field explicitly; i.e. to have it auto-generated.

I learned that this is possible by including the following information in
the conf/solrconfig.xml file:

> 
> 
> 
> id
>   
> ...
> 
> 
> 
>   


(I did restart the server after adding the above text to the xml file.)

However, when I try to add documents from Java using SolrJ (without
specifying the "id" field), I get the following exception:

> Exception in thread "main"
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
> from server at http://localhost:8983/solr/sales_history: Document is
> missing mandatory uniqueKey field: id


My Java code is like this:

> SolrClient solr = new HttpSolrClient.Builder(SOLR_URL).build();
> SolrInputDocument document = new SolrInputDocument();
> document.addField(..., ...);
> document.addField(..., ...);
> UpdateResponse updateResponse = solr.add(document);


The exception is thrown from the last line above.

Is there any way to add documents from Java and have the uniqueKey field be
auto-generated?


Thank you