change default id in results clustering

2016-02-18 Thread Dmitry Kan
Hi,

Is it possible to change the ID field, which defaults to 'id', in Carrot2-based
result clustering? I have another field, 'externalId', that is stamped on each
document, and I would like to return it in clusters instead.

-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com


Re: Facet Filter

2016-02-18 Thread Anil
Thanks, Shawn. This really helps. We are using 4.10.3 now and will look into
5.4.1. Thanks.

Regards,
Anil

On 18 February 2016 at 20:04, Shawn Heisey  wrote:

> On 2/18/2016 7:12 AM, Anil wrote:
> > Thank you, I just checked in 5.1.
> >
> > As facet fields have to be strings and cannot be tokenized, is there any
> > way to do a case-insensitive search on this field (not in a facet
> > filter scenario)?
>
> If you configure docValues on the field in schema.xml and reindex, then
> the returned facets will be the original input values even if the field
> is tokenized, just as if you had used a string type without docValues.
> This should allow you to use one field for queries *and* facets.
>
> The reindex *is* required after adding docValues, and the index will be
> larger.
>
> Note that using 5.1 isn't recommended at this point.  You should use the
> latest version available.  Currently that's 5.4.1, but soon it will be 5.5.
>
> Thanks,
> Shawn
>
>


Re: Do all SolrCloud nodes communicate with the database when indexing a collection?

2016-02-18 Thread Anshum Gupta
I'd suggest using CloudSolrClient. It uses ConcurrentUpdateSolrClient under
the hood and is ZooKeeper-aware, so it routes documents from the client to
the correct Solr nodes, saving you an extra hop.
Another thing to remember here is to reuse the Solr client as it is
thread-safe.

Reading up about commits would also be useful and this blog by Erick
Erickson is a good place to learn about that:
https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

In terms of running SolrJ on each node, you could just run a single
multi-threaded indexer that gets data from your database and injects it
into Solr. This process would run outside of Solr and could potentially run
anywhere.
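A minimal plain-Java sketch of such a single multi-threaded indexer. It assumes a hypothetical `UpdateClient` interface in place of the shared SolrJ client (in practice one `CloudSolrClient` reused by every thread, as suggested above) and a list of strings in place of the rows read from the database; batch size and thread count are arbitrary assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for one shared, thread-safe SolrJ client; not a SolrJ type. */
interface UpdateClient {
    void add(List<String> batch) throws Exception;
}

final class ParallelIndexerSketch {

    /** Splits rows into batches and sends them from a fixed thread pool sharing one client. */
    static void index(List<String> rows, UpdateClient client, int batchSize, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Object>> pending = new ArrayList<>();
            for (int start = 0; start < rows.size(); start += batchSize) {
                // Copy the slice so each task owns its own batch.
                List<String> batch = new ArrayList<>(
                        rows.subList(start, Math.min(start + batchSize, rows.size())));
                pending.add(pool.submit(() -> {
                    client.add(batch);
                    return null; // a Callable, so failures are kept in the Future
                }));
            }
            for (Future<Object> f : pending) {
                f.get(); // rethrows a failed batch as ExecutionException
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Because every `Future` is checked, a failed batch surfaces as an exception from `index` instead of disappearing.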

As far as routing goes, I suggest you just try the default compositeId
router unless you hit issues there. If you do, you could read up about how
routing in SolrCloud works here:
https://lucidworks.com/blog/2013/06/13/solr-cloud-document-routing/

and also about advanced concepts here:
https://lucidworks.com/blog/2014/01/06/multi-level-composite-id-routing-solrcloud/
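To make the default behaviour concrete, here is a deliberately simplified model of hash-range routing in plain Java. Solr's compositeId router actually uses MurmurHash3 and, for `prefix!id` keys, combines bits from both parts; this sketch instead mixes `String.hashCode()` with MurmurHash3's 32-bit finalizer and hashes only the prefix. It illustrates the idea (sequential plain IDs spread evenly, IDs sharing a prefix land together), not Solr's exact placement.

```java
final class RoutingSketch {

    /** MurmurHash3's 32-bit finalizer, used here only to spread String.hashCode() bits. */
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    /** Maps a 32-bit hash onto one of numShards equal, contiguous ranges. */
    static int shardForHash(int hash, int numShards) {
        long unsigned = (long) hash - Integer.MIN_VALUE; // shift into [0, 2^32)
        long rangeSize = (1L << 32) / numShards;
        return (int) Math.min(unsigned / rangeSize, numShards - 1);
    }

    /** Plain IDs hash the whole ID; "prefix!id" IDs hash only the prefix in this model. */
    static int shardForId(String id, int numShards) {
        int bang = id.indexOf('!');
        String routingKey = bang >= 0 ? id.substring(0, bang) : id;
        return shardForHash(mix(routingKey.hashCode()), numShards);
    }
}
```

With plain IDs there is nothing to configure; co-location only happens if you deliberately add a shared prefix.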



On Thu, Feb 18, 2016 at 2:08 PM, Colin Freas  wrote:

>
> Thanks for the info, Anshum.
>
> Writing up a SolrJ program to do this is entirely within my wheelhouse.
>
> Read through some of the SolrJ docs and found some examples to start.
>
> A handful of questions if anyone has some pointers.
>
> 1. From a performance perspective, is it worth it to use
> ConcurrentUpdateSolrServer? Also, documentation says best for updates;
> does that include adding documents?
>
> 2. When I run the importer via my SolrJ program to distribute the
> indexing, I'll create some kind of Solr client within SolrJ and point them
> at zookeeper.  But the communication with the SQL Server db is independent
> of the communication with zookeeper, right?  In that case, is it
> possible/does it make sense to run the SolrJ program on each node, so that
> each node communicates with the DB but they're both communicating with zk?
>
> One more question: for document routing to specific shards, the particular
> documents I have don't really have a natural way for routing.  Even if
> they did, my intuition is that I want the documents randomly and evenly
> distributed across all the machines in the cluster that will perform the
> querying.  Or is that intuition wrong, and it's better to have documents
> that fit a search criteria sorted in some way and placed near each other
> on a single or small number of machines?
>
> Any insights much appreciated!
>
> -Colin
>
>
>
> On 2/18/16, 2:01 AM, "Anshum Gupta"  wrote:
>
> >Hi Colin,
> >
> >When I last checked, DIH works with SolrCloud but has its
> >limitations. It was designed for the non-cloud mode and is single
> >threaded.
> >It runs on whatever node you set it up on, and that node might not host the
> >leader for the shard a document belongs to, adding an extra hop for those
> >documents.
> >
> >SolrCloud is designed for multi-threaded indexing and I'd highly recommend
> >you use SolrJ to speed up your indexing. Yes, that would involve writing
> >some code, but it would speed things up considerably.
> >
> >
> >On Wed, Feb 17, 2016 at 10:51 PM, Colin Freas  wrote:
> >
> >>
> >> I just set up a SolrCloud instance with 2 Solr nodes & another machine
> >> running zookeeper.
> >>
> >> I've imported 200M records from a SQL Server database, and those records
> >> are split nicely between the 2 nodes.  Everything seems ok.
> >>
> >> I did the data import via the admin ui.  It took not quite 8 hours, which
> >> I guess is fine.  So, in the middle of the import I checked to see what was
> >> connected to the SQL Server machine.  It turned out that only the node that
> >> I had started the import on was actually connected to my database server.
> >>
> >> Is that the expected behavior?  Is there any way to have all nodes of a
> >> SolrCloud index communicate with the database during the indexing?  Would
> >> that speed up indexing?  Maybe this isn't a bottleneck I should be worried
> >> about.
> >>
> >> Thanks,
> >> -Colin
> >>
> >
> >
> >
> >--
> >Anshum Gupta
>
>


-- 
Anshum Gupta


Re: Do all SolrCloud nodes communicate with the database when indexing a collection?

2016-02-18 Thread Shawn Heisey
On 2/18/2016 3:08 PM, Colin Freas wrote:
> Thanks for the info, Anshum.
>
> Writing up a SolrJ program to do this is entirely within my wheelhouse.
>
> Read through some of the SolrJ docs and found some examples to start.
>
> A handful of questions if anyone has some pointers.
>
> 1. From a performance perspective, is it worth it to use
> ConcurrentUpdateSolrServer? Also, documentation says best for updates;
> does that include adding documents?

An add is one kind of update.

ConcurrentUpdateSolrClient (Server in 4.x and earlier) automates the
process of using multiple threads to send your update requests in
parallel, but it comes with a downside -- all exceptions that Solr or
SolrJ would normally throw due to problems with update requests will
*never* make it to your application.  The Solr server could be
completely down when you start your program, and every update request
you send will appear to succeed.
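The failure mode described here is not specific to Solr: any fire-and-forget background sender loses errors unless something inspects them. A minimal plain-Java illustration, using a hypothetical `send` method that always fails to stand in for requests to a server that is down; it is not how ConcurrentUpdateSolrClient is implemented internally.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class FireAndForgetSketch {

    /** Hypothetical update request that always fails, as if the server were down. */
    static void send(String doc) {
        throw new IllegalStateException("server unreachable, could not add " + doc);
    }

    /**
     * Queues every document on a background pool and returns how many were queued.
     * The Futures returned by submit() are discarded, so the failures stored in
     * them are never observed and the caller sees apparent success.
     */
    static int sendAll(String... docs) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        int queued = 0;
        for (String doc : docs) {
            pool.submit(() -> send(doc)); // exception is captured in an ignored Future
            queued++;
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return queued;
    }
}
```

`sendAll("a", "b", "c")` returns 3 without throwing, even though every request failed.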

If you care about your program knowing when updates fail, use
HttpSolrClient or CloudSolrClient, and set up multiple threads that can
send updates to Solr with that object.  The client objects are fully
thread safe, so you can use the same object in multiple threads with no
problem.  You will need to know how to properly write a multi-threaded
program.
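A plain-Java sketch of that pattern: several threads share one thread-safe client and the caller checks every result, so a rejected update is reported instead of lost. `SyncClient` is a hypothetical interface standing in for a single HttpSolrClient or CloudSolrClient instance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical synchronous client; in SolrJ this would be one shared client instance. */
interface SyncClient {
    void add(String doc) throws Exception;
}

final class CheckedSenderSketch {

    /** Sends docs from several threads sharing one client; returns the docs that failed. */
    static List<String> sendAll(SyncClient client, List<String> docs, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Object>> results = new ArrayList<>();
            for (String doc : docs) {
                results.add(pool.submit(() -> {
                    client.add(doc);
                    return null;
                }));
            }
            List<String> failed = new ArrayList<>();
            for (int i = 0; i < docs.size(); i++) {
                try {
                    results.get(i).get(); // rethrows the worker's exception, if any
                } catch (ExecutionException e) {
                    failed.add(docs.get(i)); // the application learns exactly what failed
                }
            }
            return failed;
        } finally {
            pool.shutdown();
        }
    }
}
```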

Thanks,
Shawn



Re: SOLR ranking

2016-02-18 Thread Binoy Dalal
Hi Alessandro,
Don't get me wrong. Using mm, ps and pf can and absolutely will solve his
problem.

Like I said above, my solution is meant to be a quick and dirty fix. It's
really not that complex and shouldn't take more than an hour to set up at
the app level. Moreover, I suggested it because he said it was urgent for
him, and setting up a proper config with mm, pf and ps might take him much
longer.

Hope this clears things up :)
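The app-layer step of that two-query approach boils down to an ordered merge: phrase-match results first, then OR-query results that were not already returned. A minimal plain-Java sketch, assuming each query has already been run and reduced to a list of document IDs in score order (running the queries is out of scope here).

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class TwoPhaseMergeSketch {

    /** Returns phrase hits first, then OR-query hits not already present, each in its own rank order. */
    static List<String> merge(List<String> phraseHits, List<String> orHits) {
        Set<String> ordered = new LinkedHashSet<>(phraseHits);
        ordered.addAll(orHits); // a duplicate keeps its earlier (phrase) position
        return new ArrayList<>(ordered);
    }
}
```

For example, merging phrase hits `[d3, d1]` with OR hits `[d1, d7, d3, d9]` yields `[d3, d1, d7, d9]`.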

On Fri, 19 Feb 2016, 05:31 Alessandro Benedetti 
wrote:

> Hey Binoy,
> I can't understand why such complexity, to be honest :/
> Can you explain to me why playing with:
>
> edismax
> mm (percentage of query terms you want to be in the results)
> pf (the fields you want to be boosted if the phrase matches)
> ps (slop to allow)
>
> should not solve the problem instead of the two-phase query?
>
> Cheers
>
> On 18 February 2016 at 18:09, Binoy Dalal  wrote:
>
> > Here's an alternative solution that may be of some help.
> > Here I'm assuming that you are not directly outputting the search results
> > to the user and have some sort of layer between the results from Solr and
> > presentation to the user where some additional processing can be
> > performed.
> >
> > 1) You already know that you want phrase matches to show up higher than
> > single matches. In this case, why not do an explicit phrase match first,
> > with some slop or as is, based on how close you want the phrase terms to
> > be to each other.
> > 2) Once you have the results from the first query, fire an OR query with
> > your terms and get those results.
> > 3) Put results from (2) after (1) and present them to the user. This
> > happens in the app layer.
> >
> > This is essentially the same as running a query such as: "Rheumatoid
> > Arthritis"~slop OR (Rheumatoid AND Arthritis), but you don't need to worry
> > about the ordering because you're sorting your results.
> >
> > Now, this will obviously take more time since you're querying twice and
> > then doing the additional processing in the app layer, but provided your
> > architecture is balanced enough and can cope with a little extra load, I
> > do not think that your performance will take that bad a hit. Moreover,
> > since you're in a hurry, you could implement this as a quick and dirty
> > solution to meet the project goals, provided it fits the acceptance
> > parameters, and then later play around with the scoring/sorting and
> > figure out the best possible setup to suit your needs.
> >
> > On Thu, Feb 18, 2016 at 4:22 PM Emir Arnautovic <
> > emir.arnauto...@sematext.com> wrote:
> >
> > > Hi Nitin,
> > > Can you send us what your parsed query looks like (from the debug output)?
> > >
> > > Thanks,
> > > Emir
> > >
> > > On 17.02.2016 08:38, Nitin.K wrote:
> > > > Hi Binoy,
> > > >
> > > > We are searching for both phrases and individual words, but we want
> > > > only the documents containing the phrase to come first in the order,
> > > > followed by the individual-word matches.
> > > >
> > > > termPositions = true is also not working in my case.
> > > >
> > > > I have also removed the string type from the copy fields. Kindly look
> > > > into the changed configuration below:
> > > >
> > > > Hi Emir,
> > > >
> > > > I have changed the configuration as per your suggestion and added pf2 /
> > > > pf3. Yes, I saw the difference, but the ranking is still not correct in
> > > > the case of phrases.
> > > >
> > > > Changed configuration:
> > > >
> > > > [field definitions not preserved in the archive; surviving attributes
> > > > include stored="true", stored="false" and multiValued="true"]
> > > >
> > > > Copy fields again for reference:
> > > >
> > > > [copyField definitions not preserved in the archive]
> > > >
> > > > Added the following field type:
> > > >
> > > > [fieldType definition not preserved; surviving attributes:
> > > > positionIncrementGap="100" omitNorms="true", plus a filter with
> > > > ignoreCase="true" words="stopwords.txt"]
> > > >
> > > > Removed the string type from the copy fields.
> > > >
> > > > Changed Query :
> > > >
> > > >
> > >
> >
> http://localhost:8983/solr/tgl/select?q=rheumatoid%20arthritis=xml=1.0=200=AND=true=edismax=true=true=true;
> > > > pf=topTitle^200 subtopTitle^80 indTerm^40 drugString^30 tglData^6&
> > > > pf2=topTitle^200 subtopTitle^80 indTerm^40 drugString^30 tglData^6&
> > > > pf3=topTitle^200 subtopTitle^80 indTerm^40 drugString^30 tglData^6&
> > > > qf=topic_title^100 subtopic_title^40 index_term^20 drug^15 content^3
> > > >
> > > > After making these changes, I am able to get my search results
> > > > correctly for a single term, but in the case of phrase search I am
> > > > still not able to get the results in

RE: Hitting complex multilevel pivot queries in solr

2016-02-18 Thread Lewin Joy (TMS)
Hi,

The fields are single valued. But, the requirement will be at query time rather 
than index time. This is because, we will be having many such scenarios with 
different fields.
I hoped we could concatenate at query time. I just need top 100 counts from the 
leaf level of the pivot.
I'm also looking at facet.threads which could give responses to an extent. But 
It does not solve my issue.

Hovewer, the Endeca equivalent of this application seems to be working well. 
Example Endeca Query: 

RETURN Results as SELECT Count(1) as "Total" GROUP BY "Country", "State", 
"part_num", "part_code" ORDER BY "Total" desc PAGE(0,100)


-Lewin
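To make the target output concrete: a facet on one field holding the concatenated country/state/part_num/part_code value would return the same counts as the Endeca GROUP BY above. A plain-Java sketch of that computation over in-memory records; the map-based record shape and the `/` separator are assumptions, and this is not a Solr API.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class TopCombinationsSketch {

    /** Joins the four pivot fields into one key, e.g. "US/CA/1234/A". */
    static String key(Map<String, String> doc) {
        return String.join("/",
                doc.get("country"), doc.get("state"), doc.get("part_num"), doc.get("part_code"));
    }

    /** Counts each combination and keeps the {@code limit} most frequent, highest first. */
    static Map<String, Long> top(List<Map<String, String>> docs, int limit) {
        Map<String, Long> counts = docs.stream()
                .collect(Collectors.groupingBy(TopCombinationsSketch::key, Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()))
                .limit(limit)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }
}
```

Calling `top(docs, 100)` corresponds to `ORDER BY "Total" desc PAGE(0,100)`.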


-Original Message-
From: Alvaro Cabrerizo [mailto:topor...@gmail.com] 
Sent: Thursday, February 18, 2016 3:06 PM
To: solr-user@lucene.apache.org
Subject: Re: Hitting complex multilevel pivot queries in solr

Hi,

The idea of copying fields into a new one (or several) during indexing and then
faceting on the new field (or fields) looks promising. More information about the
data would be helpful (for example, whether the fields country, state, ... are
single- or multivalued). If all of the fields are single-valued, then the
combination of country,state,part_num,part_code looks like a file path,
country/state/part_num/part_code, and maybe (I don't know your business rules)
the solr.PathHierarchyTokenizerFactory could be an option to research instead of
facet pivoting. On the other hand, I don't think that the copy field feature can
help you build that auxiliary field. I think that configuring an
updateRequestProcessorChain and building your own
UpdateRequestProcessorFactory to concatenate the country,state,part_num,part_code
values may be a better way.

Hope it helps.

On Thu, Feb 18, 2016 at 8:47 PM, Lewin Joy (TMS) 
wrote:

> Still splitting my head over this one.
> Let me know if anyone has any idea I could try.
>
> Or, is there a way to concatenate these 4 fields onto a dynamic field 
> and do a facet.field on top of this one?
>
> Thanks. Any idea is helpful to try.
>
> -Lewin
>
> -Original Message-
> From: Lewin Joy (TMS) [mailto:lewin@toyota.com]
> Sent: Wednesday, February 17, 2016 4:29 PM
> To: solr-user@lucene.apache.org
> Subject: Hitting complex multilevel pivot queries in solr
>
> Hi,
>
> Is there an efficient way to hit Solr with complex, time-consuming queries?
> I have a requirement where I need to pivot on 4 fields. Two fields have
> close to 50 facet values, and the other 2 fields have 5000 and 8000 values.
> Pivoting on the 4 fields would crash the server.
>
> Is there a better way to get the data?
>
> Example Query Params looks like this:
> =country,state,part_num,part_code
>
> Thanks,
> Lewin
>
>
>
>


Re: SOLR ranking

2016-02-18 Thread Alessandro Benedetti
Hey Binoy,
I can't understand why such complexity, to be honest :/
Can you explain to me why playing with:

edismax
mm (percentage of query terms you want to be in the results)
pf (the fields you want to be boosted if the phrase matches)
ps (slop to allow)

should not solve the problem instead of the two-phase query?
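For comparison, this single-query alternative is just a set of edismax request parameters. A plain-Java sketch that assembles and URL-encodes them; the field names and boosts are copied from Nitin's query quoted below, while the `mm` and `ps` values are illustrative assumptions to tune, not recommendations.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class EdismaxParamsSketch {

    static String queryString(String userQuery) throws UnsupportedEncodingException {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("defType", "edismax");
        p.put("q", userQuery);
        p.put("qf", "topic_title^100 subtopic_title^40 index_term^20 drug^15 content^3");
        p.put("mm", "100%"); // every query term must match (assumed value)
        p.put("pf", "topTitle^200 subtopTitle^80 indTerm^40 drugString^30 tglData^6"); // phrase boosts
        p.put("ps", "2");    // slop allowed for the pf phrase boost (assumed value)
        StringJoiner out = new StringJoiner("&");
        for (Map.Entry<String, String> e : p.entrySet()) {
            out.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), "UTF-8"));
        }
        return out.toString();
    }
}
```

Append the result to `/select?` on the core; documents where the terms appear as a (near-)phrase get the pf boost on top of the normal qf score.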

Cheers

On 18 February 2016 at 18:09, Binoy Dalal  wrote:

> Here's an alternative solution that may be of some help.
> Here I'm assuming that you are not directly outputting the search results
> to the user and have some sort of layer between the results from Solr and
> presentation to the user where some additional processing can be performed.
>
> 1) You already know that you want phrase matches to show up higher than
> single matches. In this case, why not do an explicit phrase match first,
> with some slop or as is, based on how close you want the phrase terms to be
> to each other.
> 2) Once you have the results from the first query, fire an OR query with
> your terms and get those results.
> 3) Put results from (2) after (1) and present them to the user. This happens
> in the app layer.
>
> This is essentially the same as running a query such as: "Rheumatoid
> Arthritis"~slop OR (Rheumatoid AND Arthritis), but you don't need to worry
> about the ordering because you're sorting your results.
>
> Now, this will obviously take more time since you're querying twice and
> then doing the additional processing in the app layer, but provided your
> architecture is balanced enough and can cope with a little extra load, I do
> not think that your performance will take that bad a hit. Moreover, since
> you're in a hurry, you could implement this as a quick and dirty solution
> to meet the project goals, provided it fits the acceptance parameters, and
> then later play around with the scoring/sorting and figure out the best
> possible setup to suit your needs.
>
> On Thu, Feb 18, 2016 at 4:22 PM Emir Arnautovic <
> emir.arnauto...@sematext.com> wrote:
>
> > Hi Nitin,
> > Can you send us what your parsed query looks like (from the debug output)?
> >
> > Thanks,
> > Emir
> >
> > On 17.02.2016 08:38, Nitin.K wrote:
> > > Hi Binoy,
> > >
> > > We are searching for both phrases and individual words, but we want
> > > only the documents containing the phrase to come first in the order,
> > > followed by the individual-word matches.
> > >
> > > termPositions = true is also not working in my case.
> > >
> > > I have also removed the string type from the copy fields. Kindly look
> > > into the changed configuration below:
> > >
> > > Hi Emir,
> > >
> > > I have changed the configuration as per your suggestion and added pf2 /
> > > pf3. Yes, I saw the difference, but the ranking is still not correct in
> > > the case of phrases.
> > >
> > > Changed configuration:
> > >
> > > [field definitions not preserved in the archive; surviving attributes
> > > include stored="true", stored="false" and multiValued="true"]
> > >
> > > Copy fields again for reference:
> > >
> > > [copyField definitions not preserved in the archive]
> > >
> > > Added the following field type:
> > >
> > > [fieldType definition not preserved; surviving attributes:
> > > positionIncrementGap="100" omitNorms="true", plus a filter with
> > > words="stopwords.txt"]
> > >
> > > Removed the string type from the copy fields.
> > >
> > > Changed Query :
> > >
> > >
> >
> http://localhost:8983/solr/tgl/select?q=rheumatoid%20arthritis=xml=1.0=200=AND=true=edismax=true=true=true;
> > > pf=topTitle^200 subtopTitle^80 indTerm^40 drugString^30 tglData^6&
> > > pf2=topTitle^200 subtopTitle^80 indTerm^40 drugString^30 tglData^6&
> > > pf3=topTitle^200 subtopTitle^80 indTerm^40 drugString^30 tglData^6&
> > > qf=topic_title^100 subtopic_title^40 index_term^20 drug^15 content^3
> > >
> > > After making these changes, I am able to get my search results
> > > correctly for a single term, but in the case of phrase search I am still
> > > not able to get the results in the correct order.
> > >
> > > Hi Modassar,
> > >
> > > I tried using mm=100, but the order is still the same.
> > >
> > > Hi Alessandro,
> > >
> > > I have not yet tried the slop parameter. When I looked in debug mode, it
> > > was defaulting to 1.0. I will definitely get back to you, so let me try
> > > this option too.
> > >
> > > All,
> > >
> > > Please let me know if anyone has any other suggestions on this. I have to
> > > implement it urgently, and I think I am very close. Thanks to all
> > > of you. I have reached this level only because of you guys.
> > >
> > > Thanks and Regards,
> > > Nitin
> > >
> > >
> > >
> > > --
> > > View this message in context:
> > http://lucene.472066.n3.nabble.com/SOLR-ranking-tp4257367p4257782.html
> > > Sent from the Solr - User mailing list archive at Nabble.com.
> >

SolrCloud shards marked as down and do not recover connection to ZK

2016-02-18 Thread KNitin
Hi,

I am indexing about 5M docs in a 4-shard, 1-replica setup. During indexing,
one of the shards is marked as down in ZooKeeper, but when I tail the logs,
all the updates are received by the shard, and a hard commit at the end of
the job also succeeds. (The auto commit is set to trigger every 10 minutes
or 150K documents.) The shard does not recover until I force a restart of
Solr on that node. Memory, CPU, and load on Solr are very low during this time.

How and when does Solr try to reconnect to ZooKeeper?

Thanks,
Nitin


Re: Hitting complex multilevel pivot queries in solr

2016-02-18 Thread Alvaro Cabrerizo
Hi,

The idea of copying fields into a new one (or several) during indexing and
then faceting on the new field (or fields) looks promising. More information
about the data would be helpful (for example, whether the fields country,
state, ... are single- or multivalued). If all of the fields are
single-valued, then the combination of country,state,part_num,part_code looks
like a file path, country/state/part_num/part_code, and maybe (I don't know
your business rules) the solr.PathHierarchyTokenizerFactory could be an
option to research instead of facet pivoting. On the other hand, I don't
think that the copy field feature can help you build that auxiliary field.
I think that configuring an updateRequestProcessorChain and building your
own UpdateRequestProcessorFactory to concatenate the
country,state,part_num,part_code values may be a better way.

Hope it helps.
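A plain-Java sketch of the string logic behind both ideas: concatenating the four single-valued fields into one path-like value (the job a custom UpdateRequestProcessorFactory would do at index time), and expanding that value into the leading prefixes a PathHierarchyTokenizer with `/` as delimiter emits, so one faceted field can carry counts for every level of the hierarchy. Class and method names are hypothetical, this is not the Solr update-processor API, and it assumes field values never contain `/`.

```java
import java.util.ArrayList;
import java.util.List;

final class PathKeySketch {

    /** Joins field values in pivot order, e.g. ("US","CA","1234","A") -> "US/CA/1234/A". */
    static String concat(String... values) {
        return String.join("/", values);
    }

    /** Emits every leading prefix: "US", "US/CA", "US/CA/1234", "US/CA/1234/A". */
    static List<String> prefixes(String path) {
        List<String> out = new ArrayList<>();
        int idx = path.indexOf('/');
        while (idx >= 0) {
            out.add(path.substring(0, idx));
            idx = path.indexOf('/', idx + 1);
        }
        out.add(path);
        return out;
    }
}
```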

On Thu, Feb 18, 2016 at 8:47 PM, Lewin Joy (TMS) 
wrote:

> Still splitting my head over this one.
> Let me know if anyone has any idea I could try.
>
> Or, is there a way to concatenate these 4 fields onto a dynamic field and
> do a facet.field on top of this one?
>
> Thanks. Any idea is helpful to try.
>
> -Lewin
>
> -Original Message-
> From: Lewin Joy (TMS) [mailto:lewin@toyota.com]
> Sent: Wednesday, February 17, 2016 4:29 PM
> To: solr-user@lucene.apache.org
> Subject: Hitting complex multilevel pivot queries in solr
>
> Hi,
>
> Is there an efficient way to hit Solr with complex, time-consuming queries?
> I have a requirement where I need to pivot on 4 fields. Two fields have close
> to 50 facet values, and the other 2 fields have 5000 and 8000 values.
> Pivoting on the 4 fields would crash the server.
>
> Is there a better way to get the data?
>
> Example Query Params looks like this:
> =country,state,part_num,part_code
>
> Thanks,
> Lewin
>
>
>
>


Re: How to use DocValues with TextField

2016-02-18 Thread Harry Yoo
Thanks for the pointer. 

Please advise me how I can contribute.

H

> On Jan 27, 2016, at 2:16 AM, Toke Eskildsen  wrote:
> 
> Erick Erickson  wrote:
>> DocValues was designed to support unanalyzed types
>> originally. I don't know that code, but given my respect
>> for the people who wrote it, I'd be amazed if there weren't
>> very good reasons this is true. I suspect your work-around
>> is going to be "surprising".
> 
> Hoss talked about this at the last Lucene/Solr Revolution and has opened
> https://issues.apache.org/jira/browse/SOLR-8362
> 
> Harry: Maybe you could follow up on that JIRA issue?
> 
> - Toke Eskildsen



Re: How to use DocValues with TextField

2016-02-18 Thread Harry Yoo
RE: separating a column into two for different behavior.

Yes, that is exactly what I have been advised multiple times. However, it causes
a problem when I apply it to my application.
I have one core with more than 50 columns (out of 100) that need to be searchable
with case-insensitive and partial matching, as well as faceting.
Having many columns is not a real problem in itself, but if I want to search,
sort, and facet, I need to map the relationships between columns, and that
became a headache.

RE: scalability / performance

My current index size is about 1.1TB across 25 cores; the biggest core has 435M
records. I have not experienced any memory or performance issues.


I am currently maintaining my own repo just for this feature. It would be nice if
this were supported out of the box.

Harry

> On Jan 26, 2016, at 11:28 AM, Erick Erickson  wrote:
> 
> DocValues was designed to support unanalyzed types
> originally. I don't know that code, but given my respect
> for the people who wrote it, I'd be amazed if there weren't
> very good reasons this is true. I suspect your work-around
> is going to be "surprising".
> 
> And have you tested your change at scale? I suspect
> searching won't scale well.
> 
> bq:  I need a case-insensitive search for a relatively short string
> and at the same time, I need faceting on the original string
> 
> There's no reason at all to change code to do this. Just use a copyField.
> The field that's to be faceted on is a "string" type with docValues=true, and
> the searchable field is some text type with the appropriate analysis chain.
> 
> This doesn't really make much difference memory wise since the indexing
> and docValues are separate in the first place. I.e. if I specify
> indexed=true and docValues=true I get _two_ sets of data indexed.
> 
> Best,
> Erick
> 
> On Tue, Jan 26, 2016 at 8:50 AM, Harry Yoo  wrote:
>> Hi, I have needed this functionality for a long time, so I made an
>> extended field type as a workaround.
>> 
>> In my use case, I need a case-insensitive search for a relatively short
>> string and, at the same time, faceting on the original string. For
>> example, “Human, Homo sapiens” is an original input, and I want it to be
>> searched by human, Human, homo sapiens or Homo Sapiens.
>> 
>> Here is my workaround,
>> 
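>> // Assumes Solr 5.x FieldType method signatures.
>> import java.util.ArrayList;
>> import java.util.List;
>> 
>> import org.apache.lucene.document.SortedDocValuesField;
>> import org.apache.lucene.document.SortedSetDocValuesField;
>> import org.apache.lucene.index.IndexableField;
>> import org.apache.lucene.util.BytesRef;
>> import org.apache.solr.schema.SchemaField;
>> import org.apache.solr.schema.TextField;
>> 
>> public class TextDocValueField extends TextField {
>> 
>>   @Override
>>   public List<IndexableField> createFields(SchemaField field, Object value,
>>       float boost) {
>>     if (!field.hasDocValues()) {
>>       return super.createFields(field, value, boost);
>>     }
>>     List<IndexableField> fields = new ArrayList<>();
>>     // Indexed/stored part, analyzed by the normal TextField chain
>>     // (createField returns null if the field is neither indexed nor stored).
>>     IndexableField indexed = createField(field, value, boost);
>>     if (indexed != null) {
>>       fields.add(indexed);
>>     }
>>     // DocValues keep the original, unanalyzed string for faceting and sorting.
>>     final BytesRef bytes = new BytesRef(value.toString());
>>     if (field.multiValued()) {
>>       fields.add(new SortedSetDocValuesField(field.getName(), bytes));
>>     } else {
>>       fields.add(new SortedDocValuesField(field.getName(), bytes));
>>     }
>>     return fields;
>>   }
>> 
>>   @Override
>>   public void checkSchemaField(final SchemaField field) {
>>     // Intentionally empty: skips the base-class check that rejects
>>     // docValues on this field type.
>>   }
>> 
>>   @Override
>>   public boolean multiValuedFieldCache() {
>>     return false;
>>   }
>> }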
>> 
>> I wish Solr supported this so that I don’t have to maintain my own
>> repo.
>> 
>> 
>> 
>> What do you think?
>> 
>> Regards,
>> Harry
>> 
>> 
>>> On Jan 5, 2016, at 10:51 PM, Alok Bhandari 
>>>  wrote:
>>> 
>>> Thanks Markus.
>>> 
>>> 
>>> 
>>> --
>>> View this message in context: 
>>> http://lucene.472066.n3.nabble.com/How-to-use-DocValues-with-TextField-tp4248647p4248797.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>> 
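Erick's copyField suggestion keeps two representations of the same value: the original string for faceting and an analyzed copy for case-insensitive search. A plain-Java model of that split, to show why facet labels stay intact while matching ignores case; the lowercase-and-split-on-non-letters "analysis" is a simplifying assumption, not Solr's tokenizer chain, and partial matching is not modelled.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class CopyFieldSketch {

    /** Simplified analysis for the search copy: lowercase, split on non-letters. */
    static Set<String> analyze(String text) {
        Set<String> tokens = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** True if every query token appears in the analyzed copy of the value. */
    static boolean matches(String originalValue, String query) {
        return analyze(originalValue).containsAll(analyze(query));
    }

    /** Facet counts use the untouched original values, like a docValues string field. */
    static Map<String, Integer> facet(List<String> originalValues) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String v : originalValues) counts.merge(v, 1, Integer::sum);
        return counts;
    }
}
```

Searching "homo SAPIENS" matches "Human, Homo sapiens", while the facet label stays "Human, Homo sapiens".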



Re: Do all SolrCloud nodes communicate with the database when indexing a collection?

2016-02-18 Thread Colin Freas

Thanks for the info, Anshum.

Writing up a SolrJ program to do this is entirely within my wheelhouse.

Read through some of the SolrJ docs and found some examples to start.

A handful of questions if anyone has some pointers.

1. From a performance perspective, is it worth it to use
ConcurrentUpdateSolrServer? Also, documentation says best for updates;
does that include adding documents?

2. When I run the importer via my SolrJ program to distribute the
indexing, I'll create some kind of Solr client within SolrJ and point them
at zookeeper.  But the communication with the SQL Server db is independent
of the communication with zookeeper, right?  In that case, is it
possible/does it make sense to run the SolrJ program on each node, so that
each node communicates with the DB but they're both communicating with zk?

One more question: for document routing to specific shards, the particular
documents I have don't really have a natural way for routing.  Even if
they did, my intuition is that I want the documents randomly and evenly
distributed across all the machines in the cluster that will perform the
querying.  Or is that intuition wrong, and it's better to have documents
that fit a search criteria sorted in some way and placed near each other
on a single or small number of machines?

Any insights much appreciated!

-Colin



On 2/18/16, 2:01 AM, "Anshum Gupta"  wrote:

>Hi Colin,
>
>When I last checked, DIH works with SolrCloud but has its
>limitations. It was designed for the non-cloud mode and is single
>threaded.
>It runs on whatever node you set it up on, and that node might not host the
>leader for the shard a document belongs to, adding an extra hop for those
>documents.
>
>SolrCloud is designed for multi-threaded indexing and I'd highly recommend
>you use SolrJ to speed up your indexing. Yes, that would involve writing
>some code, but it would speed things up considerably.
>
>
>On Wed, Feb 17, 2016 at 10:51 PM, Colin Freas  wrote:
>
>>
>> I just set up a SolrCloud instance with 2 Solr nodes & another machine
>> running zookeeper.
>>
>> I've imported 200M records from a SQL Server database, and those records
>> are split nicely between the 2 nodes.  Everything seems ok.
>>
>> I did the data import via the admin ui.  It took not quite 8 hours, which
>> I guess is fine.  So, in the middle of the import I checked to see what was
>> connected to the SQL Server machine.  It turned out that only the node that
>> I had started the import on was actually connected to my database server.
>>
>> Is that the expected behavior?  Is there any way to have all nodes of a
>> SolrCloud index communicate with the database during the indexing?  Would
>> that speed up indexing?  Maybe this isn't a bottleneck I should be worried
>> about.
>>
>> Thanks,
>> -Colin
>>
>
>
>
>-- 
>Anshum Gupta



Re: Display entire string containing query string

2016-02-18 Thread Alvaro Cabrerizo
Hi,

To understand Binoy's answer, please check the documentation for the fl
(Field List) parameter.
If you want "*fragments of documents that match the user's query to be
included with the query response*", please check the Highlighting feature.
Solr is so well documented!

Regards.
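A sketch of the kind of request this points to, built as a URL in plain Java: `fl` limits which stored fields come back, and `hl=true` with `hl.fl` asks Solr for highlighted fragments around the matched term. The base URL, core name, and the `url`/`content` field names are assumptions (check the schema your Nutch integration uses), and the highlighted field must be stored.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class HighlightRequestSketch {

    /** Builds a select URL returning the url field plus highlighted snippets from content. */
    static String selectUrl(String baseUrl, String term) throws UnsupportedEncodingException {
        return baseUrl + "/select"
                + "?q=" + URLEncoder.encode("content:" + term, "UTF-8")
                + "&fl=url"
                + "&hl=true"
                + "&hl.fl=content"
                + "&hl.snippets=3"     // up to 3 fragments per document (assumed value)
                + "&hl.fragsize=200";  // approximate fragment length in characters (assumed value)
    }
}
```

The fragments come back in a separate `highlighting` section of the response, keyed by document ID, rather than inside each document.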

On Thu, Feb 18, 2016 at 7:19 PM, Tom Running  wrote:

> Hello
> Thank you for your reply.
> I am wondering if you can clarify a bit more for me. Is
> field_where_string_may_be_present something that I have to specify? I am
> searching HTML pages.
> For example, if I search for the word "name", I am trying to display the
> entire sentence containing "name = T" or maybe "name: T". Ultimately, by
> searching for the string "name" I am trying to find the value of name.
>
> Thanks for your time. I appreciate your help
> -T
> On Feb 18, 2016 1:18 AM, "Binoy Dalal"  wrote:
>
> > Append =
> >
> > On Thu, 18 Feb 2016, 11:35 Tom Running  wrote:
> >
> > > Hello,
> > >
> > > I am working on a project using Solr to search data retrieved from
> > > Nutch.
> > >
> > > I have successfully integrated Nutch with Solr, and Solr is able to
> > search
> > > Nutch's data.
> > >
> > > However I am having a bit of a problem. If I query Solr, it will bring
> > back
> > > the numfound and which document the query string was found in, but it
> > will
> > > not display the string that contains the query string.
> > >
> > > Can anyone help on how to display the entire string that contains the
> > > query.
> > >
> > >
> > > I appreciate your time and guidance. Thank you so much!
> > >
> > > -T
> > >
> > --
> > Regards,
> > Binoy Dalal
> >
>


RE: Hitting complex multilevel pivot queries in solr

2016-02-18 Thread Lewin Joy (TMS)
Still splitting my head over this one. 
Let me know if anyone has any idea I could try.

Or, is there a way to concatenate these 4 fields onto a dynamic field and do a 
facet.field on top of this one?

Thanks. Any idea is helpful to try.

-Lewin

-Original Message-
From: Lewin Joy (TMS) [mailto:lewin@toyota.com] 
Sent: Wednesday, February 17, 2016 4:29 PM
To: solr-user@lucene.apache.org
Subject: Hitting complex multilevel pivot queries in solr

Hi,

Is there an efficient way to hit Solr with complex, time-consuming queries?
I have a requirement where I need to pivot on 4 fields. Two fields have close
to 50 facet values, and the other 2 fields have 5000 and 8000 values.
Pivoting on the 4 fields would crash the server.

Is there a better way to get the data?

The query parameters look like this:
=country,state,part_num,part_code

Thanks,
Lewin
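Lewin's concatenation idea can be sketched on the indexing side: compute one combined value per document before sending it to Solr, then facet on that single field with facet.field instead of a four-level pivot. A minimal sketch, assuming each document has exactly one value per field; the field name combo_facet and the | separator are hypothetical.

```python
from collections import Counter

# The four fields Lewin pivots on, in pivot order.
PIVOT_FIELDS = ["country", "state", "part_num", "part_code"]
# Assumed never to occur inside any of the field values.
SEPARATOR = "|"


def add_combined_key(doc, field_name="combo_facet"):
    """Return a copy of doc with one concatenated value to facet on.

    "combo_facet" is a hypothetical string field name.
    """
    combined = SEPARATOR.join(str(doc[f]) for f in PIVOT_FIELDS)
    return {**doc, field_name: combined}


def split_combined_key(value):
    """Turn a returned facet value back into its four components."""
    return dict(zip(PIVOT_FIELDS, value.split(SEPARATOR)))


if __name__ == "__main__":
    docs = [
        {"country": "US", "state": "CA", "part_num": "P1", "part_code": "A"},
        {"country": "US", "state": "CA", "part_num": "P1", "part_code": "A"},
        {"country": "US", "state": "TX", "part_num": "P2", "part_code": "B"},
    ]
    indexed = [add_combined_key(d) for d in docs]
    # Local stand-in for the counts facet.field=combo_facet would return.
    counts = Counter(d["combo_facet"] for d in indexed)
    for value, n in counts.most_common():
        print(split_combined_key(value), n)
```

The trade-off is that a flat facet only reports counts for full four-value combinations; per-level subtotals (for example, per country) would have to be summed in the client. Whether this is lighter on the server than the pivot query is something to measure on real data.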





Re: Filter query and Faceting problem

2016-02-18 Thread Filippo La Torre
Thank you, Alessandro, for the suggestion (I have to look into SolrJ),
and thank you, Mikhail, for the explanation.
My problem isn’t related to that simple query, and it doesn’t depend on the
Java framework.
Maybe I’m not smart enough, but suppose I have this query:

WHERE (macro_category = DRINKS AND micro_category = WATER) OR
macro_category = FOODS

where I am saying "I want drinks that are only water, and also all the foods".
I don’t understand how to express this with separate fq parameters per field. Obviously this:

fq={!tag=MACRO}macrocategory:(DRINKS FOOD)={!tag=MICRO}microcategory:WATER

doesn’t work.
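One way to keep this as a single filter, sketched below under the assumption that the standard (lucene) query parser is used: put the {!tag=...} local param at the very start of the fq value so it covers the whole boolean expression, rather than in the middle of it (which, per Mikhail's reply quoted below, is not a recommended way), and exclude that tag in facet.field. The tag name CAT and the helper functions are hypothetical; field names follow the pseudo-query above.

```python
from urllib.parse import urlencode


def tagged_filter(tag, expression):
    """Prefix a whole boolean filter with a single tag local param.

    The tag sits at the start of the value so it applies to the entire
    expression instead of being embedded inside it.
    """
    return "{!tag=%s}%s" % (tag, expression)


def build_params(tag="CAT"):
    fq = tagged_filter(
        tag,
        "(macro_category:DRINKS AND micro_category:WATER) OR macro_category:FOODS",
    )
    # A list of pairs keeps repeated keys such as several facet.field values.
    return [
        ("q", "*:*"),
        ("fq", fq),
        ("facet", "true"),
        # Exclude the tagged filter so these facet counts ignore it.
        ("facet.field", "{!ex=%s}macro_category" % tag),
        ("facet.field", "{!ex=%s}micro_category" % tag),
        ("wt", "json"),
    ]


if __name__ == "__main__":
    print("/select?" + urlencode(build_params()))
```

The limitation is that excluding CAT drops the whole filter for faceting; excluding only the macro_category part would still need separate fq parameters, as Mikhail describes.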

Thanks,
Filippo


> On 18 Feb 2016, at 15:47, Alessandro Benedetti  wrote:
> 
> As Mikhail suggests, are you sure Spring Data Solr is the right tool for
> you?
> It is probably a great tool for a newbie (to be honest, I only just noticed it)
> but maybe you need a more customisable approach to build
> tag/exclusion filter queries on top of facets.
> You could use Spring Data, and then use SolrJ for the customised parts.
> 
> Cheers
> 
> On 18 February 2016 at 13:41, Mikhail Khludnev 
> wrote:
> 
>> Tagging works only in the way I described. Otherwise you might need to
>> issue separate queries.
>> However, are you sure that this case shouldn't be done in a more regular way:
>> fq={!tag=foo}department:foods={!tag=foo}macro_category.key:(drinks food)
>> 
>> On Thu, Feb 18, 2016 at 1:48 PM, Filippo La Torre <
>> filippo.lato...@stentle.com> wrote:
>> 
>>> Hi,
>>> thanks for your response.
>>> The problem is that I build this query with Spring Data Solr and I have to
>>> make complex AND/OR combinations.
>>> Example :
>>> 
>>> ( department:foods AND macro_category.key:drinks) OR ( department:foods
>>> AND macro_category.key:food)
>>> 
>>> What is the best practice for combining complex AND/OR queries with tagging?
>>> 
>>> Thanks,
>>> Filippo
 On 18 Feb 2016, at 11:19, Mikhail Khludnev >> 
>>> wrote:
 
 just do
 
>>> 
>> fq={!tag=DEPARTMENT}department:foods={!tag=MACROCATEGORY}macro_category.key:drinks
 tagging in the middle of the query may somehow work, but it's not a
 recommended way.
 
 On Thu, Feb 18, 2016 at 11:48 AM, Filippo La Torre <
 filippo.lato...@stentle.com> wrote:
 
> Hello everyone,
> 
> this is my first mail to the solr-user mailing list.
> I’m new to Solr too; my Solr version is 5.4.1.
> I have a problem with filter queries and faceting: I have to make a filter
> query with AND/OR that also uses faceting (I will build this query with
> Spring Data Solr).
> It seems that the same filter query gives different results with and
> without brackets. The version with brackets doesn’t honour tag/exclude. How
> is that possible?
> 
> Best regards.
> 
> With brackets:
> 
> {
> "responseHeader": {
>   "status": 0,
>   "QTime": 2,
>   "params": {
> "q": "*:*",
> "facet.field":
>> "{!ex=DEPARTMENT,MACROCATEGORY}macro_category.key",
> "indent": "true",
> "fq": "({!tag=DEPARTMENT}department:foods AND
> {!tag=MACROCATEGORY}macro_category.key:drinks)",
> "wt": "json",
> "facet": "true",
> "_": "1455700122431"
>   }
> },
> "response": {
>   "numFound": 1,
>   "start": 0,
>   "docs": [
> {
>   "id": "5672a222fa4d0e4c0d965cc5",
>   "published": true,
>   "micro_category.key": "drinks-beer",
>   "department": "foods",
>   "macro_category.key": "drinks",
>   "retail_price": 1,
>   "selling_price": 1
> }
>   ]
> },
> "facet_counts": {
>   "facet_queries": {},
>   "facet_fields": {
> "macro_category.key": [
>   "drinks",
>   1,
>   "box-collection",
>   0
> ]
>   },
>   "facet_dates": {},
>   "facet_ranges": {},
>   "facet_intervals": {},
>   "facet_heatmaps": {}
> }
> }
> 
> Without brackets:
> 
> {
> "responseHeader": {
>   "status": 0,
>   "QTime": 1,
>   "params": {
> "q": "*:*",
> "facet.field":
>> "{!ex=DEPARTMENT,MACROCATEGORY}macro_category.key",
> "indent": "true",
> "fq": "{!tag=DEPARTMENT}department:foods AND
> {!tag=MACROCATEGORY}macro_category.key:drinks",
> "wt": "json",
> "facet": "true",
> "_": "1455702347556"
>   }
> },
> "response": {
>   "numFound": 1,
>   "start": 0,
>   "docs": [
> {
>   "id": "5672a222fa4d0e4c0d965cc5",
>   "published": true,
>   "micro_category.key": "drinks-beer",
>   "department": "foods",
>   "macro_category.key": "drinks",
>   "retail_price": 1,
>   "selling_price": 1
> }
>   ]
> },
> "facet_counts": {
>   "facet_queries": {},
>   "facet_fields": {
>  

Re: Display entire string containing query string

2016-02-18 Thread Tom Running
Hello
Thank you for your reply.
I am wondering if you can clarify a bit more for me. Is
field_where_string_may_be_present something that I have to specify? I am
searching HTML pages.
For example, if I search for the word "name", I am trying to display the
entire sentence containing "name = T" or maybe "name: T". Ultimately, by
searching for the string "name", I am trying to find the value of name.

Thanks for your time. I appreciate your help.
-T
On Feb 18, 2016 1:18 AM, "Binoy Dalal"  wrote:

> Append =
>
> On Thu, 18 Feb 2016, 11:35 Tom Running  wrote:
>
> > Hello,
> >
> > I am working on a project using Solr to search data retrieved from
> > Nutch.
> >
> > I have successfully integrated Nutch with Solr, and Solr is able to
> > search Nutch's data.
> >
> > However, I am having a bit of a problem. If I query Solr, it brings back
> > numFound and the document the query string was found in, but it does
> > not display the string that contains the query string.
> >
> > Can anyone help me display the entire string that contains the
> > query?
> >
> >
> > I appreciate your time and guidance. Thank you so much!
> >
> > -T
> >
> --
> Regards,
> Binoy Dalal
>
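Returning the text around a match is normally done with Solr's highlighting component, and field_where_string_may_be_present is the field Solr should build snippets from (it must be a stored field). A minimal sketch of the request parameters, assuming the page text is in a field named content (a placeholder here; use whichever stored field your Nutch setup populates):

```python
from urllib.parse import urlencode


def highlight_params(term, field="content", fragsize=200):
    """Build /select parameters asking Solr to return matching snippets.

    `field` must be a stored field; "content" is an assumed name.
    """
    return {
        "q": "%s:%s" % (field, term),
        "hl": "true",
        "hl.fl": field,  # field(s) to build snippets from
        "hl.fragsize": str(fragsize),  # approximate snippet length in characters
        "hl.snippets": "3",  # up to 3 snippets per document
        "fl": "id",
        "wt": "json",
    }


if __name__ == "__main__":
    print("/select?" + urlencode(highlight_params("name")))
```

Snippets come back in a separate highlighting section of the response, keyed by each document's unique key, with the matched term wrapped in <em> tags by default. Extracting the value that follows "name =" or "name:" would still happen in your own code.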


Re: SOLR ranking

2016-02-18 Thread Binoy Dalal
Here's an alternative solution that may be of some help.
Here I'm assuming that you are not directly outputting the search results
to the user and have some sort of layer between the results from solr and
presentation to the user where some additional processing can be performed.

1) You already know that you want phrase matches to show up higher than
single matches. In this case, why not do an explicit phrase match first,
with some slop or none, depending on how close you want the phrase terms
to be to each other.
2) Once you have the results from the first query, fire an OR query with
your terms and get those results.
3) Put the results from (2) after those from (1) and present them to the
user. This happens in the app layer.

This is essentially the same as running a query such as "Rheumatoid
Arthritis"~slop OR (Rheumatoid AND Arthritis), but you don't need to worry
about the ordering because you're ordering the results yourself.

Now, this will obviously take more time since you're querying twice and
then doing the additional processing in the app layer, but provided your
architecture is balanced enough and can cope with a little extra load, I do
not think that your performance will take that bad a hit. Moreover, since
you're in a hurry, you could implement this as a quick and dirty solution
to meet the project goals, provided it fits the acceptance parameters, and
then later play around with the scoring/sorting and figure out the best
possible setup to suit your needs.
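The app-layer part of this approach can be sketched as follows. This is a minimal sketch with hypothetical helper names; it only builds the two query strings and merges two lists of document IDs, and the two Solr requests themselves are left out. A document matched by both queries keeps its position from the phrase query.

```python
def phrase_query(text, slop=0):
    """Step 1: an explicit phrase query, optionally with slop."""
    query = '"%s"' % text
    return query + ("~%d" % slop if slop else "")


def terms_query(text):
    """Step 2: the same terms combined with OR."""
    return " OR ".join(text.split())


def merge_two_pass(phrase_ids, term_ids):
    """Step 3: phrase hits first, then term hits not already listed.

    Each list keeps the order Solr returned it in.
    """
    seen = set()
    merged = []
    for doc_id in list(phrase_ids) + list(term_ids):
        if doc_id not in seen:
            seen.add(doc_id)
            merged.append(doc_id)
    return merged


if __name__ == "__main__":
    print(phrase_query("rheumatoid arthritis", slop=2))  # "rheumatoid arthritis"~2
    print(terms_query("rheumatoid arthritis"))  # rheumatoid OR arthritis
    # IDs here are made up; in practice they come from two Solr requests.
    print(merge_two_pass(["d3", "d7"], ["d1", "d3", "d9"]))  # ['d3', 'd7', 'd1', 'd9']
```

Because the merge happens outside Solr, paging probably has to be done over the merged list rather than with Solr's start/rows parameters on each query.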

On Thu, Feb 18, 2016 at 4:22 PM Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:

> Hi Nitin,
> Can you send us what your parsed query looks like (from the debug output)?
>
> Thanks,
> Emir
>
> On 17.02.2016 08:38, Nitin.K wrote:
> > Hi Binoy,
> >
> > We are searching for both phrases and individual words,
> > but we want the documents that contain the phrase to come
> > first in the order, followed by those matching the individual words.
> >
> > termPositions = true is also not working in my case.
> >
> > I have also removed the string type from the copy fields. Kindly look at the
> > changed configuration below:
> >
> > Hi Emir,
> >
> > I have changed the configuration as per your suggestion and added pf2/pf3.
> > I do see a difference, but the ranking is still not correct
> > for phrases.
> >
> > Changed configuration:
> >
> >  stored="true"
> > />
> >  />
> >
> >  > stored="true"/>
> >  stored="false"/>
> >
> >  > multiValued="true"/>
> >  > multiValued="true"/>
> >
> >  > multiValued="true"/>
> >  > multiValued="true"/>
> >
> > 
> >
> > Copy fields again, for reference:
> >
> > 
> > 
> > 
> > 
> > 
> >
> > Added the following field type:
> >
> >  > positionIncrementGap="100" omitNorms="true">
> >   
> >   
> >> words="stopwords.txt" />
> >   
> >   
> > 
> >
> > Removed the string type from the copy fields.
> >
> > Changed query:
> >
> >
> http://localhost:8983/solr/tgl/select?q=rheumatoid%20arthritis=xml=1.0=200=AND=true=edismax=true=true=true;
> > pf=topTitle^200 subtopTitle^80 indTerm^40 drugString^30 tglData^6&
> > pf2=topTitle^200 subtopTitle^80 indTerm^40 drugString^30 tglData^6&
> > pf3=topTitle^200 subtopTitle^80 indTerm^40 drugString^30 tglData^6&
> > qf=topic_title^100 subtopic_title^40 index_term^20 drug^15 content^3
> >
> > After making these changes, I get correctly ordered results for a single
> > term, but for a phrase search I still do not get the results in the
> > correct order.
> >
> > Hi Modassar,
> >
> > I tried using mm=100, but the order is still the same.
> >
> > Hi Alessandro,
> >
> > I have not yet tried the slope parameter. In debug mode its default
> > shows as 1.0. Let me try this option too; I will definitely get back
> > to you.
> >
> > All,
> >
> > Please let me know if anyone has any other suggestions. I have to
> > implement this urgently, and I think I am very close. Thanks to all
> > of you; I have only got this far because of your help.
> >
> > Thanks and Regards,
> > Nitin
> >
> >
> >
> > --
> > View this message in context:
> http://lucene.472066.n3.nabble.com/SOLR-ranking-tp4257367p4257782.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
> --
Regards,
Binoy Dalal


Re: Reverse Eningeer Query For a Given Result Set?

2016-02-18 Thread Jack Krupansky
Out of the box? No. Could you develop one? Probably, or at least a rough
approximation, at least some of the time... but probably at a cost
significantly greater than converting queries by hand.

If it is taking you 2-4 hours per query then that suggests that the query
complexity is not amenable to any simple mechanical reverse engineering.

What aspects of the conversion are taking you so many hours? A few examples
would be helpful.

A mechanical reverse engineering from results would likely reduce the
semantic content of the original query, so that the query may then return a
false positive or false negative as new documents are added to the index
that are no longer in the same pattern as the old results but still within
the pattern of the original Oracle query. The trick may be whether the
delta is meaningful for the actual application use case.
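Whichever way the queries are converted, the delta described above can be measured directly by diffing each converted query's result IDs against the original result IDs. A minimal sketch; fetching the two ID lists is left out, and the function name is hypothetical.

```python
def compare_result_sets(expected_ids, solr_ids):
    """Diff the IDs a converted Solr query returns against the original
    (for example Oracle Text) result set for the same query."""
    expected, actual = set(expected_ids), set(solr_ids)
    return {
        "false_positives": sorted(actual - expected),  # returned, not expected
        "false_negatives": sorted(expected - actual),  # expected, not returned
        "matched": len(expected & actual),
    }


if __name__ == "__main__":
    report = compare_result_sets(["1", "2", "3"], ["2", "3", "4"])
    print(report)
```

Re-running such a check after new documents are indexed is one way to see whether a converted query still follows the original query's intent.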

-- Jack Krupansky

On Thu, Feb 18, 2016 at 4:07 AM, Christian Effertz 
wrote:

> Hi,
>
> Can I somehow feed Solr with a result set or a list of primary keys and get
> the shortest query that leads to this result? In other terms, can I reverse
> engineer a query for a given result set?
>
> Some background why I ask this question:
> We are currently migrating a search application from Oracle Text to Solr.
> Our users have several (>30) complex queries that we need to migrate to our
> new Solr index. This can be done by hand, but is rather time consuming. To
> get an idea of how long the whole task would take, we started with a
> handful of them. We spent ~2-4h per query to get everything right.
>
> Thank you for your input
>


Re: Filter query and Faceting problem

2016-02-18 Thread Alessandro Benedetti
As Mikhail suggests, are you sure Spring Data Solr is the right tool for
you?
It is probably a great tool for a newbie (to be honest, I only just noticed it)
but maybe you need a more customisable approach to build
tag/exclusion filter queries on top of facets.
You could use Spring Data, and then use SolrJ for the customised parts.

Cheers

On 18 February 2016 at 13:41, Mikhail Khludnev 
wrote:

> Tagging works only in the way I described. Otherwise you might need to
> issue separate queries.
> However, are you sure that this case shouldn't be done in a more regular way:
> fq={!tag=foo}department:foods={!tag=foo}macro_category.key:(drinks food)
>
> On Thu, Feb 18, 2016 at 1:48 PM, Filippo La Torre <
> filippo.lato...@stentle.com> wrote:
>
> > Hi,
> > thanks for your response.
> > The problem is that I build this query with Spring Data Solr and I have to
> > make complex AND/OR combinations.
> > Example :
> >
> > ( department:foods AND macro_category.key:drinks) OR ( department:foods
> > AND macro_category.key:food)
> >
> > What is the best practice for combining complex AND/OR queries with tagging?
> >
> > Thanks,
> > Filippo
> > > On 18 Feb 2016, at 11:19, Mikhail Khludnev  >
> > wrote:
> > >
> > > just do
> > >
> >
> fq={!tag=DEPARTMENT}department:foods={!tag=MACROCATEGORY}macro_category.key:drinks
> > > tagging in the middle of the query may somehow work, but it's not a
> > > recommended way.
> > >
> > > On Thu, Feb 18, 2016 at 11:48 AM, Filippo La Torre <
> > > filippo.lato...@stentle.com> wrote:
> > >
> > >> Hello everyone,
> > >>
> > >> this is my first mail to the solr-user mailing list.
> > >> I’m new to Solr too; my Solr version is 5.4.1.
> > >> I have a problem with filter queries and faceting: I have to make a filter
> > >> query with AND/OR that also uses faceting (I will build this query with
> > >> Spring Data Solr).
> > >> It seems that the same filter query gives different results with and
> > >> without brackets. The version with brackets doesn’t honour tag/exclude. How
> > >> is that possible?
> > >>
> > >> Best regards.
> > >>
> > >> With brackets:
> > >>
> > >> {
> > >>  "responseHeader": {
> > >>"status": 0,
> > >>"QTime": 2,
> > >>"params": {
> > >>  "q": "*:*",
> > >>  "facet.field":
> "{!ex=DEPARTMENT,MACROCATEGORY}macro_category.key",
> > >>  "indent": "true",
> > >>  "fq": "({!tag=DEPARTMENT}department:foods AND
> > >> {!tag=MACROCATEGORY}macro_category.key:drinks)",
> > >>  "wt": "json",
> > >>  "facet": "true",
> > >>  "_": "1455700122431"
> > >>}
> > >>  },
> > >>  "response": {
> > >>"numFound": 1,
> > >>"start": 0,
> > >>"docs": [
> > >>  {
> > >>"id": "5672a222fa4d0e4c0d965cc5",
> > >>"published": true,
> > >>"micro_category.key": "drinks-beer",
> > >>"department": "foods",
> > >>"macro_category.key": "drinks",
> > >>"retail_price": 1,
> > >>"selling_price": 1
> > >>  }
> > >>]
> > >>  },
> > >>  "facet_counts": {
> > >>"facet_queries": {},
> > >>"facet_fields": {
> > >>  "macro_category.key": [
> > >>"drinks",
> > >>1,
> > >>"box-collection",
> > >>0
> > >>  ]
> > >>},
> > >>"facet_dates": {},
> > >>"facet_ranges": {},
> > >>"facet_intervals": {},
> > >>"facet_heatmaps": {}
> > >>  }
> > >> }
> > >>
> > >> Without brackets:
> > >>
> > >> {
> > >>  "responseHeader": {
> > >>"status": 0,
> > >>"QTime": 1,
> > >>"params": {
> > >>  "q": "*:*",
> > >>  "facet.field":
> "{!ex=DEPARTMENT,MACROCATEGORY}macro_category.key",
> > >>  "indent": "true",
> > >>  "fq": "{!tag=DEPARTMENT}department:foods AND
> > >> {!tag=MACROCATEGORY}macro_category.key:drinks",
> > >>  "wt": "json",
> > >>  "facet": "true",
> > >>  "_": "1455702347556"
> > >>}
> > >>  },
> > >>  "response": {
> > >>"numFound": 1,
> > >>"start": 0,
> > >>"docs": [
> > >>  {
> > >>"id": "5672a222fa4d0e4c0d965cc5",
> > >>"published": true,
> > >>"micro_category.key": "drinks-beer",
> > >>"department": "foods",
> > >>"macro_category.key": "drinks",
> > >>"retail_price": 1,
> > >>"selling_price": 1
> > >>  }
> > >>]
> > >>  },
> > >>  "facet_counts": {
> > >>"facet_queries": {},
> > >>"facet_fields": {
> > >>  "macro_category.key": [
> > >>"box-collection",
> > >>2,
> > >>"drinks",
> > >>1
> > >>  ]
> > >>},
> > >>"facet_dates": {},
> > >>"facet_ranges": {},
> > >>"facet_intervals": {},
> > >>"facet_heatmaps": {}
> > >>  }
> > >> }
> > >
> > >
> > >
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
> > > Principal Engineer,
> > > Grid Dynamics
> > >
> > > 
> > > 
> >
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> 

Re: Facet Filter

2016-02-18 Thread Shawn Heisey
On 2/18/2016 7:12 AM, Anil wrote:
> Thank you, I just checked in 5.1.
>
> As facet fields have to be strings and cannot be tokenized, is there any
> way to do a case-insensitive search on this field (not in a facet
> filter scenario)?

If you configure docValues on the field in schema.xml and reindex, then
the returned facets will be the original input values even if the field
is tokenized, just as if you had used a string type without docValues. 
This should allow you to use one field for queries *and* facets.

The reindex *is* required after adding docValues, and the index will be
larger.

Note that using 5.1 isn't recommended at this point.  You should use the
latest version available.  Currently that's 5.4.1, but soon it will be 5.5.

Thanks,
Shawn



Re: Unable to query the spellchecker in a distributed way

2016-02-18 Thread Damien Picard
I got it: in Solr 4.4, the component
org.apache.solr.handler.component.SpellCheckComponent
didn't implement the method
distributedProcess(ResponseBuilder rb), which
org.apache.solr.handler.component.SearchHandler needs in order to handle
distributed searches the right way.

And it seems that in 4.10 the SpellCheckComponent doesn't implement it either...

Do you have a workaround for these versions?


2016-01-28 14:20 GMT+01:00 Damien Picard :

> (we use Solr 4.4)
>
> 2016-01-28 11:07 GMT+01:00 Damien Picard :
>
>> Hi,
>>
>> We are using SolrCloud (4 nodes) and we have defined a suggester using
>> the spellcheck component.
>>
>> The suggester is defined as :
>>
>> 
>>   
>> suggestOpeGes
>> > name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup
>> > name="classname">org.apache.solr.spelling.suggest.Suggester
>> ref_opegestion
>> 0
>> true
>> true
>>   
>>   
>> suggestRefCre
>> > name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup
>> > name="classname">org.apache.solr.spelling.suggest.Suggester
>> ref_cre
>> 0
>> true
>> true
>>   
>>   
>> suggestRefEcr
>> > name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup
>> > name="classname">org.apache.solr.spelling.suggest.Suggester
>> ref_ecriture
>> 0
>> true
>> true
>>   
>>   
>>   > startup="lazy">
>> 
>> true
>> suggestOpeGes
>> 20
>> true
>> false
>> 
>> 
>>   suggest
>> 
>>   
>>
>> When I query the suggest handler of this collection with the shards parameters:
>> GET
>> /solr/ppd_piste_audit_gsie_traite_001/suggest?q=GSIEBBA=json=true=true=suggestOpeGes=suggest/
>>
>> I get no results:
>>
>> {
>>   "responseHeader":{
>> "status":0,
>> "QTime":0}}
>>
>> But when I disable distributed search:
>> GET
>> /solr/ppd_piste_audit_gsie_traite_001/suggest?q=GSIEMMA=json=true=true=suggestOpeGes=false
>>
>> I get the results I expect :
>>
>> {
>>   "responseHeader":{
>> "status":0,
>> "QTime":28},
>>   "spellcheck":{
>> "suggestions":[
>>   "GSIEBBA",{
>> "numFound":20,
>> "startOffset":0,
>> "endOffset":7,
>> "suggestion":["GSIEMMA44257700010010401",
>>   "GSIEBBA64257700010013501",
>>   "GSIEBBA70723503779040201",
>>   "GSIEBBA71257700030012101",
>>   "GSIEBBA71723503830023601",
>>   "GSIEBBA74001300670011701",
>>   "GSIEBBA74001300670011801",
>>   "GSIEBBA74772000136021201",
>>   "GSIEBBA76257700040010501",
>>   "GSIEBBA76600101133030501",
>>   "GSIEBBA76680400195030601",
>>   "GSIEBBA77692100093024401",
>>   "GSIEBBA77692100093024501",
>>   "GSIEBBA78450700227020701",
>>   "GSIEBBA78450700227020801",
>>   "GSIEBBA78854102439020301",
>>   "GSIEBBA78854102439020401",
>>   "GSIEBBA79441700201040401",
>>   "GSIEBBA79723504720012701",
>>   "GSIEBBA79763600779010501"]},
>>   "collation","GSIEBBA44257700010010401"]}}
>>
>> I also tried sending a "manual" distributed search, without success:
>>
>> GET
>> /solr/ppd_piste_audit_gsie_traite_001-03_shard1_replica2/suggest?q=GSIEMMA=suggest=json=true=true=suggestOpeGes=suggest/=dn330003.xxx.priv:8983/solr/ppd_piste_audit_gsie_traite_001-03_shard2_replica1/|dn330004.xxx.priv:8983/solr/ppd_piste_audit_gsie_traite_001-03_shard1_replica1/
>>
>> What am I doing wrong?
>>
>> Thank you.
>> --
>> Damien Picard
>> Expert GWT
>> 
>> Mob : 06 11 51 47 78
>>
>
>
>
> --
> Damien Picard
> Expert GWT
> 
> Mob : 06 11 51 47 78
>



-- 
Damien Picard
Expert GWT

Mob : 06 11 51 47 78


Re: Error creating document SolrInputDocument

2016-02-18 Thread Shawn Heisey
On 2/18/2016 7:07 AM, Bernd Fehling wrote:
> the DIH is doing the splitting:
>
> ...
> 
>  xpath="/documents/document/element[@name='dccreator']/value" />
> 
> ...

This DIH config says it's the "dccreator" field, but the schema.xml
excerpts you included earlier were for the "creator" field.

Can you put your entire DIH config and your entire schema somewhere and
provide URLs to access them, and let me know what names I should look at?

If there's something sensitive in them (like usernames and passwords)
feel free to redact those.  I won't need them.

Thanks,
Shawn



Re: Facet Filter

2016-02-18 Thread Anil
Thank you, I just checked in 5.1.

As facet fields have to be strings and cannot be tokenized, is there any
way to do a case-insensitive search on this field (not in a facet
filter scenario)?

Regards,
Anil

On 18 February 2016 at 17:34, Upayavira  wrote:

> facet.contains=
>
> Beware that it is relatively new, so it will only be in the latest few Solr
> releases.
>
> I think this was it [1], which suggests it is in 5.1+
>
> Upayavira
> [1] https://issues.apache.org/jira/browse/SOLR-1387
>
> On Thu, Feb 18, 2016, at 10:38 AM, Anil wrote:
> > Hi,
> >
> > The following are the facets in my use case:
> >
> > CLOSED
> > IN PROCESS
> > RE PROCESS
> > OPEN
> >
> > I know facet.prefix returns the facets starting with a given prefix.
> >
> > I just want to check whether any facet parameter exists in current Solr
> > to return facets matching any word in the facet text.
> >
> > Example: PROCESS must return IN PROCESS and RE PROCESS.
> >
> > This can be achieved by including it in the query ( : *PROCESS*,
> >  : PROCESS), but it is a little expensive.
> >
> > Regards,
> > Anil
>


Re: Error creating document SolrInputDocument

2016-02-18 Thread Bernd Fehling
Hi Shawn,

the DIH is doing the splitting:

...



...


Bernd


Am 18.02.2016 um 14:42 schrieb Shawn Heisey:
> On 2/18/2016 3:45 AM, Bernd Fehling wrote:
>> Now this is strange with solr 4.10.4,
>> I have a multivalue string field for creator.
>> > multiValued="true" />
>>
>> And a multivalue string field for f_person, prepared for faceting with
>> docValues.
>> > multiValued="true" docValues="true" />
>>
>> To fill f_person I use copyField.
>> 
>>
>> The input to creator is 43470 bytes long with names, split at ";" for each 
>> subfield.
>> Klionsky, Daniel J; JFA; CORA; Abdelmohsen, Kotb; Abe, Akihisa; ...
> 
> How are you handling splitting that information into multiple pieces? 
> If it's done with analysis configuration in schema.xml, then the data
> copied to f_person is *not* split into multiple values.  The copyField
> functionality always copies the original input data -- *before*
> analysis.
> 
> If the information were split into multiple small values before it got
> to Solr, then this error would not be happening.
> 
> Thanks,
> Shawn
> 


Re: Error creating document SolrInputDocument

2016-02-18 Thread Shawn Heisey
On 2/18/2016 3:45 AM, Bernd Fehling wrote:
> Now this is strange with solr 4.10.4,
> I have a multivalue string field for creator.
>  multiValued="true" />
>
> And a multivalue string field for f_person, prepared for faceting with
> docValues.
>  multiValued="true" docValues="true" />
>
> To fill f_person I use copyField.
> 
>
> The input to creator is 43470 bytes long with names, split at ";" for each 
> subfield.
> Klionsky, Daniel J; JFA; CORA; Abdelmohsen, Kotb; Abe, Akihisa; ...

How are you handling splitting that information into multiple pieces? 
If it's done with analysis configuration in schema.xml, then the data
copied to f_person is *not* split into multiple values.  The copyField
functionality always copies the original input data -- *before*
analysis.

If the information were split into multiple small values before it got
to Solr, then this error would not be happening.
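Splitting on the client before indexing can be sketched as below, so that creator (and, through copyField, f_person) receive many short values instead of one 43,470-byte value. A minimal sketch, assuming the ";" separator from Bernd's sample. The size check reflects Lucene's limit of 32,766 bytes per indexed term; that this limit is the cause of the error is an assumption, since the error text is not quoted here.

```python
# Lucene refuses to index a single term longer than this many UTF-8 bytes.
MAX_TERM_BYTES = 32766


def split_creators(raw, sep=";"):
    """Split a ';'-separated creator list into trimmed, non-empty values."""
    return [part.strip() for part in raw.split(sep) if part.strip()]


def oversized(values):
    """Return any values that would still exceed the per-term limit."""
    return [v for v in values if len(v.encode("utf-8")) > MAX_TERM_BYTES]


if __name__ == "__main__":
    raw = "Klionsky, Daniel J; JFA; CORA; Abdelmohsen, Kotb; Abe, Akihisa"
    values = split_creators(raw)
    print(values)
    print(oversized(values))  # []
```

The resulting list would then be sent as the multiple values of creator in the update request.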

Thanks,
Shawn



Re: Filter query and Faceting problem

2016-02-18 Thread Mikhail Khludnev
Tagging works only in the way I described. Otherwise you might need to
issue separate queries.
However, are you sure that this case shouldn't be done in a more regular way:
fq={!tag=foo}department:foods={!tag=foo}macro_category.key:(drinks food)

On Thu, Feb 18, 2016 at 1:48 PM, Filippo La Torre <
filippo.lato...@stentle.com> wrote:

> Hi,
> thanks for your response.
> The problem is that I build this query with Spring Data Solr and I have to
> make complex AND/OR combinations.
> Example :
>
> ( department:foods AND macro_category.key:drinks) OR ( department:foods
> AND macro_category.key:food)
>
> What is the best practice for combining complex AND/OR queries with tagging?
>
> Thanks,
> Filippo
> > On 18 Feb 2016, at 11:19, Mikhail Khludnev 
> wrote:
> >
> > just do
> >
> fq={!tag=DEPARTMENT}department:foods={!tag=MACROCATEGORY}macro_category.key:drinks
> > tagging in the middle of the query may somehow work, but it's not a
> > recommended way.
> >
> > On Thu, Feb 18, 2016 at 11:48 AM, Filippo La Torre <
> > filippo.lato...@stentle.com> wrote:
> >
> >> Hello everyone,
> >>
> >> this is my first mail to the solr-user mailing list.
> >> I’m new to Solr too; my Solr version is 5.4.1.
> >> I have a problem with filter queries and faceting: I have to make a filter
> >> query with AND/OR that also uses faceting (I will build this query with
> >> Spring Data Solr).
> >> It seems that the same filter query gives different results with and
> >> without brackets. The version with brackets doesn’t honour tag/exclude. How
> >> is that possible?
> >>
> >> Best regards.
> >>
> >> With brackets:
> >>
> >> {
> >>  "responseHeader": {
> >>"status": 0,
> >>"QTime": 2,
> >>"params": {
> >>  "q": "*:*",
> >>  "facet.field": "{!ex=DEPARTMENT,MACROCATEGORY}macro_category.key",
> >>  "indent": "true",
> >>  "fq": "({!tag=DEPARTMENT}department:foods AND
> >> {!tag=MACROCATEGORY}macro_category.key:drinks)",
> >>  "wt": "json",
> >>  "facet": "true",
> >>  "_": "1455700122431"
> >>}
> >>  },
> >>  "response": {
> >>"numFound": 1,
> >>"start": 0,
> >>"docs": [
> >>  {
> >>"id": "5672a222fa4d0e4c0d965cc5",
> >>"published": true,
> >>"micro_category.key": "drinks-beer",
> >>"department": "foods",
> >>"macro_category.key": "drinks",
> >>"retail_price": 1,
> >>"selling_price": 1
> >>  }
> >>]
> >>  },
> >>  "facet_counts": {
> >>"facet_queries": {},
> >>"facet_fields": {
> >>  "macro_category.key": [
> >>"drinks",
> >>1,
> >>"box-collection",
> >>0
> >>  ]
> >>},
> >>"facet_dates": {},
> >>"facet_ranges": {},
> >>"facet_intervals": {},
> >>"facet_heatmaps": {}
> >>  }
> >> }
> >>
> >> Without brackets:
> >>
> >> {
> >>  "responseHeader": {
> >>"status": 0,
> >>"QTime": 1,
> >>"params": {
> >>  "q": "*:*",
> >>  "facet.field": "{!ex=DEPARTMENT,MACROCATEGORY}macro_category.key",
> >>  "indent": "true",
> >>  "fq": "{!tag=DEPARTMENT}department:foods AND
> >> {!tag=MACROCATEGORY}macro_category.key:drinks",
> >>  "wt": "json",
> >>  "facet": "true",
> >>  "_": "1455702347556"
> >>}
> >>  },
> >>  "response": {
> >>"numFound": 1,
> >>"start": 0,
> >>"docs": [
> >>  {
> >>"id": "5672a222fa4d0e4c0d965cc5",
> >>"published": true,
> >>"micro_category.key": "drinks-beer",
> >>"department": "foods",
> >>"macro_category.key": "drinks",
> >>"retail_price": 1,
> >>"selling_price": 1
> >>  }
> >>]
> >>  },
> >>  "facet_counts": {
> >>"facet_queries": {},
> >>"facet_fields": {
> >>  "macro_category.key": [
> >>"box-collection",
> >>2,
> >>"drinks",
> >>1
> >>  ]
> >>},
> >>"facet_dates": {},
> >>"facet_ranges": {},
> >>"facet_intervals": {},
> >>"facet_heatmaps": {}
> >>  }
> >> }
> >
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > Principal Engineer,
> > Grid Dynamics
> >
> > 
> > 
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Facet Filter

2016-02-18 Thread Upayavira
facet.contains=

Beware that it is relatively new, so it will only be in the latest few Solr
releases.

I think this was it [1], which suggests it is in 5.1+

Upayavira
[1] https://issues.apache.org/jira/browse/SOLR-1387

On Thu, Feb 18, 2016, at 10:38 AM, Anil wrote:
> Hi,
> 
> The following are the facets in my use case:
> 
> CLOSED
> IN PROCESS
> RE PROCESS
> OPEN
> 
> I know facet.prefix returns the facets starting with a given prefix.
> 
> I just want to check whether any facet parameter exists in current Solr
> to return facets matching any word in the facet text.
> 
> Example: PROCESS must return IN PROCESS and RE PROCESS.
> 
> This can be achieved by including it in the query ( : *PROCESS*,
>  : PROCESS), but it is a little expensive.
> 
> Regards,
> Anil
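The request Upayavira describes can be sketched as below, together with a small local function that mimics the substring filtering facet.contains performs, which can help check expectations against sample values. facet.contains.ignoreCase is a companion parameter added with the same feature; status is a placeholder, since the real field name is not given in the message.

```python
from urllib.parse import urlencode


def contains_facet_params(field, substring, ignore_case=True):
    """Facet request that only returns terms containing `substring`."""
    return {
        "q": "*:*",
        "rows": "0",
        "facet": "true",
        "facet.field": field,
        "facet.contains": substring,
        "facet.contains.ignoreCase": "true" if ignore_case else "false",
    }


def emulate_contains(facet_values, substring, ignore_case=True):
    """Local stand-in for the filtering facet.contains performs."""
    if ignore_case:
        needle = substring.lower()
        return [v for v in facet_values if needle in v.lower()]
    return [v for v in facet_values if substring in v]


if __name__ == "__main__":
    values = ["CLOSED", "IN PROCESS", "RE PROCESS", "OPEN"]
    print(emulate_contains(values, "process"))  # ['IN PROCESS', 'RE PROCESS']
    print("/select?" + urlencode(contains_facet_params("status", "PROCESS")))
```

Note that this is a substring match rather than a word match, so PROCESS would also match a value such as PROCESSED.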


Re: SOLR ranking

2016-02-18 Thread Emir Arnautovic

Hi Nitin,
Can you send us what your parsed query looks like (from the debug output)?

Thanks,
Emir

On 17.02.2016 08:38, Nitin.K wrote:

Hi Binoy,

We are searching for both phrases and individual words,
but we want the documents that contain the phrase to come
first in the order, followed by those matching the individual words.

termPositions = true is also not working in my case.

I have also removed the string type from the copy fields. Kindly look at the
changed configuration below:

Hi Emir,

I have changed the configuration as per your suggestion and added pf2/pf3.
I do see a difference, but the ranking is still not correct
for phrases.

Changed configuration:

Copy fields again, for reference:

Added the following field type:

Removed the string type from the copy fields.

Changed query:

http://localhost:8983/solr/tgl/select?q=rheumatoid%20arthritis=xml=1.0=200=AND=true=edismax=true=true=true;
pf=topTitle^200 subtopTitle^80 indTerm^40 drugString^30 tglData^6&
pf2=topTitle^200 subtopTitle^80 indTerm^40 drugString^30 tglData^6&
pf3=topTitle^200 subtopTitle^80 indTerm^40 drugString^30 tglData^6&
qf=topic_title^100 subtopic_title^40 index_term^20 drug^15 content^3
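The pf, pf2, pf3 and qf values above contain spaces and ^ characters, which need URL encoding when the request is built in code rather than typed into a browser. A minimal sketch with Python's standard library, reusing the fields and boosts from the message; only the parameters named in the message are included, plus debugQuery=true to get the parsed query Emir asks for.

```python
from urllib.parse import urlencode

# Field boosts as listed in the message (pf, pf2 and pf3 share one list).
PHRASE_FIELDS = "topTitle^200 subtopTitle^80 indTerm^40 drugString^30 tglData^6"
QUERY_FIELDS = "topic_title^100 subtopic_title^40 index_term^20 drug^15 content^3"


def edismax_params(user_query):
    return {
        "q": user_query,
        "defType": "edismax",
        "qf": QUERY_FIELDS,
        "pf": PHRASE_FIELDS,   # boost when the whole query appears as a phrase
        "pf2": PHRASE_FIELDS,  # boost for adjacent word pairs
        "pf3": PHRASE_FIELDS,  # boost for adjacent word triples
        "debugQuery": "true",  # shows the parsed query in the response
    }


if __name__ == "__main__":
    base = "http://localhost:8983/solr/tgl/select?"
    print(base + urlencode(edismax_params("rheumatoid arthritis")))
```

One thing the sketch makes visible: qf and pf list different field names (topic_title versus topTitle, and so on). Phrase boosts only apply to the fields listed in pf, pf2 and pf3, so it may be worth confirming that those fields exist and are analysed as intended.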

After making these changes, I get correctly ordered results for a single
term, but for a phrase search I still do not get the results in the
correct order.

Hi Modassar,

I tried using mm=100, but the order is still the same.

Hi Alessandro,

I have not yet tried the slope parameter. In debug mode its default
shows as 1.0. Let me try this option too; I will definitely get back to you.

All,

Please let me know if anyone has any other suggestions. I have to
implement this urgently, and I think I am very close. Thanks to all
of you; I have only got this far because of your help.

Thanks and Regards,
Nitin



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-ranking-tp4257367p4257782.html
Sent from the Solr - User mailing list archive at Nabble.com.


--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: Filter query and Faceting problem

2016-02-18 Thread Filippo La Torre
Hi,
thanks for your response.
The problem is that I build this query with Spring Data Solr and I have to
make complex AND/OR combinations.
Example :

( department:foods AND macro_category.key:drinks) OR ( department:foods AND 
macro_category.key:food)

What is the best practice for combining complex AND/OR queries with tagging?
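Both branches of the OR above share `department:foods`, so by the distributive law the expression can be factored into `department:foods AND macro_category.key:(drinks OR food)`. Multiple `fq` parameters are intersected, so the two conjuncts could then be sent as separately tagged filters, e.g. `fq={!tag=DEPARTMENT}department:foods` and `fq={!tag=MACROCATEGORY}macro_category.key:(drinks OR food)`. This only works when the top-level operator is AND. The truth-table check below is a small sketch confirming the rewrite is equivalent; the function names are illustrative.

```python
from itertools import product


def original(in_foods_dept, is_drinks, is_food):
    # (department:foods AND macro_category.key:drinks)
    #   OR (department:foods AND macro_category.key:food)
    return (in_foods_dept and is_drinks) or (in_foods_dept and is_food)


def factored(in_foods_dept, is_drinks, is_food):
    # department:foods AND macro_category.key:(drinks OR food)
    return in_foods_dept and (is_drinks or is_food)


# Compare both forms for all 8 combinations of the three conditions.
equivalent = all(
    original(*combo) == factored(*combo)
    for combo in product([False, True], repeat=3)
)
print(equivalent)  # True
```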

Thanks,
Filippo
> On 18 Feb 2016, at 11:19, Mikhail Khludnev  wrote:
> 
> Just do
> fq={!tag=DEPARTMENT}department:foods&fq={!tag=MACROCATEGORY}macro_category.key:drinks
> Tagging in the middle of the query may somehow work, but it's not the
> recommended way.
> 
> On Thu, Feb 18, 2016 at 11:48 AM, Filippo La Torre <
> filippo.lato...@stentle.com> wrote:
> 
>> Hello everyone,
>> 
>> This is my first mail to the Solr user mailing list.
>> I’m new to Solr too; my Solr version is 5.4.1.
>> I have a problem with filter queries and faceting: I have to make a filter
>> query with AND/OR that also uses faceting (I will build this query with Spring
>> Data Solr).
>> It seems that the same filter query gives different results with and
>> without brackets. The version with brackets doesn't honor tag/exclude. How
>> is that possible?
>> 
>> Best regards.
> 
> 
> 
> 
> -- 
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
> 
> 
> 



Error creating document SolrInputDocument

2016-02-18 Thread Bernd Fehling
Now this is strange, with Solr 4.10.4:
I have a multivalued string field for creator.


And a multivalued string field for f_person, prepared for faceting with
docValues.


To fill f_person I use copyField.


The input to creator is 43470 bytes long, with names separated by ";" for
each subfield:
Klionsky, Daniel J; JFA; CORA; Abdelmohsen, Kotb; Abe, Akihisa; ...

No errors for creator,
but for f_person I get:
java.lang.IllegalArgumentException: Document contains at least one immense term 
in field="f_person"
(whose UTF8 encoding is longer than the max length 32766), all of which were 
skipped.
Please correct the analyzer to not produce such terms.
The prefix of the first immense term is: '[75, 108, 105, 111, 110, 115, 107, 
121, 44, 32, 68, 97,
110, 105, 101, 108, 32, 74, 59, 32, 74, 70, 65, 59, 32, 67, 79, 82, 65, 59]...',
original message: bytes can be at most 32766 in length; got 43470

What is causing the problem?
1) Is copyField converting the multivalued creator into one long string for
f_person?
2) Can a string fieldType with docValues not be multivalued?
3) ...?

Any idea what causes this and how to solve it?

I suspect docValues, but Solr does not report any error in my schema.xml saying
that a multivalued string fieldType might be a problem with docValues.
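The decimal byte prefix in the exception can be decoded to see what actually reached `f_person`. A minimal sketch using only the numbers quoted above: they decode to the start of the creator input with the ";" separators still inside, which suggests `f_person` received the whole joined string as one term. That would fit how copyField works (it copies the original input, not the tokens produced by creator's analyzer) and how an unanalyzed string field indexes each value as a single term. `split_creators` is a hypothetical client-side fix that sends each name as a separate value instead.

```python
MAX_TERM_BYTES = 32766  # limit quoted in the exception message

# Prefix of the "immense term" copied from the exception.
prefix = bytes([75, 108, 105, 111, 110, 115, 107, 121, 44, 32, 68, 97,
                110, 105, 101, 108, 32, 74, 59, 32, 74, 70, 65, 59,
                32, 67, 79, 82, 65, 59])
print(prefix.decode("utf-8"))  # Klionsky, Daniel J; JFA; CORA;


def split_creators(raw):
    """Hypothetical client-side fix: turn 'A; B; C' into separate values."""
    return [name.strip() for name in raw.split(";") if name.strip()]


def oversized(values, limit=MAX_TERM_BYTES):
    """Return the values whose UTF-8 encoding is longer than the term limit."""
    return [v for v in values if len(v.encode("utf-8")) > limit]


raw = "Klionsky, Daniel J; JFA; CORA; Abdelmohsen, Kotb; Abe, Akihisa"
values = split_creators(raw)
print(values)
print(len(oversized([raw * 1000])))  # 1: one joined value far over the limit
print(len(oversized(values)))        # 0: each name is well under the limit
```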

Regards
Bernd


Facet Filter

2016-02-18 Thread Anil
Hi,

The following are the facets in my use case:

CLOSED
IN PROCESS
RE PROCESS
OPEN

I know facet.prefix returns the facets starting with a given prefix.

I want to check whether any facet parameter exists in the current Solr version
to return facets matching any word in the facet text.

Ex: PROCESS must return IN PROCESS and RE PROCESS.

This can be achieved by including it in the query ( : *PROCESS*,
 : PROCESS), but it is a little expensive.

Regards,
Anil
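One client-side workaround is to fetch the facet values (with `facet.limit=-1` so none are cut off) and keep only those containing the word. A minimal sketch; the counts below are made up. As far as I know, Solr 5.1 and later also offer `facet.contains` (and `facet.contains.ignoreCase`) for server-side substring filtering of facet values, which is worth checking in the reference guide for your version.

```python
import re


def facets_containing_word(flat_facets, word):
    """Filter Solr's flat [value, count, value, count, ...] facet list to
    the values that contain `word` as a whole word (case-insensitive)."""
    pattern = re.compile(r"\b%s\b" % re.escape(word), re.IGNORECASE)
    pairs = zip(flat_facets[::2], flat_facets[1::2])
    return [(value, count) for value, count in pairs if pattern.search(value)]


# Hypothetical counts for the four facet values in the message.
facets = ["CLOSED", 5, "IN PROCESS", 3, "RE PROCESS", 2, "OPEN", 7]
print(facets_containing_word(facets, "process"))
# [('IN PROCESS', 3), ('RE PROCESS', 2)]
```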


Re: Filter query and Faceting problem

2016-02-18 Thread Mikhail Khludnev
Just do
fq={!tag=DEPARTMENT}department:foods&fq={!tag=MACROCATEGORY}macro_category.key:drinks
Tagging in the middle of the query may somehow work, but it's not the
recommended way.
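A minimal sketch of Mikhail's suggestion: one `fq` parameter per tagged clause instead of a single bracketed expression, plus a `facet.field` that excludes both tags. The helper `build_tagged_request` is hypothetical; only the parameter names and the `{!tag}` / `{!ex}` local params come from this thread.

```python
from urllib.parse import urlencode


def build_tagged_request(filters, facet_field):
    """Build Solr params with one tagged fq per (tag, query) pair and a facet
    that excludes all of those tags (hypothetical helper)."""
    params = [("q", "*:*"), ("facet", "true"), ("wt", "json")]
    for tag, query in filters:
        # Local params placed at the very start of each fq, as Mikhail suggests.
        params.append(("fq", "{!tag=%s}%s" % (tag, query)))
    tags = ",".join(tag for tag, _ in filters)
    params.append(("facet.field", "{!ex=%s}%s" % (tags, facet_field)))
    return params


params = build_tagged_request(
    [("DEPARTMENT", "department:foods"),
     ("MACROCATEGORY", "macro_category.key:drinks")],
    "macro_category.key",
)
# Repeated fq keys are kept as separate parameters by urlencode.
print(urlencode(params))
```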

On Thu, Feb 18, 2016 at 11:48 AM, Filippo La Torre <
filippo.lato...@stentle.com> wrote:

> Hello everyone,
>
> This is my first mail to the Solr user mailing list.
> I’m new to Solr too; my Solr version is 5.4.1.
> I have a problem with filter queries and faceting: I have to make a filter
> query with AND/OR that also uses faceting (I will build this query with Spring
> Data Solr).
> It seems that the same filter query gives different results with and
> without brackets. The version with brackets doesn't honor tag/exclude. How
> is that possible?
>
> Best regards.




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Reverse Engineer Query For a Given Result Set?

2016-02-18 Thread Charlie Hull

On 18/02/2016 09:07, Christian Effertz wrote:

Hi,

Can I somehow feed Solr with a result set or a list of primary keys and get
the shortest query that leads to this result? In other words, can I reverse
engineer a query for a given result set?

Some background why I ask this question:
We are currently migrating a search application from Oracle Text to Solr.
Our users have several (>30) complex queries that we need to migrate to our
new Solr index. This can be done by hand, but is rather time-consuming. To
get an idea of how long the whole task would take, we started with a
handful of them. We spent ~2-4h per query to get everything right.

Thank you for your input


Hi Christian,

This sounds very much like some of the work we've done migrating media 
monitoring applications to Solr, although in these cases we're dealing 
with 10k-1m stored queries. We have done Oracle Text but have dealt with 
dtLucene & Verity (VQL).


I don't think there's any way to reverse engineer your query in this way,
I'm afraid. Approaches we've taken include writing Lucene query parsers 
that can ingest the old query language (for translation on the fly) or 
parsers that turn the old language into either Lucene syntax or some 
intermediate engine-neutral language (which can then be simply parsed 
into Lucene syntax). For your small volume, manual translation may be best.


The key question here is how you will know the new queries are returning the same
results as the old queries: for this you'll need some kind of test setup 
with an archive of old data. It's important to remember (and often very 
hard to convince people!) that by changing underlying engines you *will* 
get different results, whatever you do, but you'll need to work out 
exactly what differences you can tolerate.
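For the kind of test setup described above, a minimal sketch of comparing the document IDs returned by the old engine and by Solr for the same stored query. The `compare_results` helper and the IDs are hypothetical, and it ignores ranking order; what Jaccard overlap (or which missing/extra IDs) is acceptable remains a project decision.

```python
def compare_results(old_ids, new_ids):
    """Summarise how a migrated query's results differ from the old engine's."""
    old_set, new_set = set(old_ids), set(new_ids)
    union = old_set | new_set
    return {
        "missing": sorted(old_set - new_set),  # returned by the old engine only
        "extra": sorted(new_set - old_set),    # returned by Solr only
        # Share of all returned IDs that both engines agree on.
        "jaccard": len(old_set & new_set) / len(union) if union else 1.0,
    }


# Hypothetical document IDs for one stored query.
report = compare_results(["d1", "d2", "d3", "d4"], ["d2", "d3", "d4", "d5"])
print(report)  # {'missing': ['d1'], 'extra': ['d5'], 'jaccard': 0.6}
```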


HTH,

Charlie
--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Re: join and NOT together

2016-02-18 Thread Sergio García Maroto
Hi Mikhail. Sorry for all the confusion.

This is the original query, which doesn't work:
q=PersonName:peter AND {!type=join from=DocPersonID to=PersonID
fromIndex=document v='(*:* -DocType:pdf)' }

I figured out that negating outside the cross-join query does the trick
for me.
I take the negation out of the v='' and put it in the person collection
part of the query.
That way I can exclude everyone who has a pdf document.

q=PersonName:peter AND (*:* - {!type=join from=DocPersonID to=PersonID
fromIndex=document v='(DocType:pdf)' })
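A small set-based sketch of why moving the negation outside the join changes the result. It uses made-up data and plain Python sets to mimic `{!join from=DocPersonID to=PersonID}`; it is not how Solr evaluates the query internally. With the negation inside, the join returns anyone who has at least one non-pdf document, so person 1 (pdf + word) still matches; with the negation outside, anyone with any pdf is removed. A person with no documents at all (person 3 here) also matches the outer form.

```python
# Hypothetical documents, linked to persons via DocPersonID.
docs = [
    {"DocPersonID": 1, "DocType": "pdf"},
    {"DocPersonID": 1, "DocType": "word"},
    {"DocPersonID": 2, "DocType": "word"},
]
people = {1: "peter", 2: "peter", 3: "peter"}


def join(doc_filter):
    """Persons linked to at least one document matching doc_filter."""
    return {d["DocPersonID"] for d in docs if doc_filter(d)}


peters = {pid for pid, name in people.items() if name == "peter"}

# Negation inside the join: persons with at least one non-pdf document.
inner = peters & join(lambda d: d["DocType"] != "pdf")
# Negation outside the join: persons with no pdf document at all.
outer = peters - join(lambda d: d["DocType"] == "pdf")

print(sorted(inner))  # [1, 2]
print(sorted(outer))  # [2, 3]
```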


On 17 February 2016 at 12:13, Mikhail Khludnev 
wrote:

> Sergo,
>
> Please provide more debug output, I want to see how query was parsed.
>
> On Tue, Feb 16, 2016 at 1:20 PM, Sergio García Maroto 
> wrote:
>
> > My debugQuery=true output for the NOT part is:
> >
> > 0.06755901 = (MATCH) sum of: 0.06755901 = (MATCH) MatchAllDocsQuery,
> > product of: 0.06755901 = queryNorm
> >
> > I tried changing v='(*:* -DocType:pdf)' to v='(-DocType:pdf)'
> > and it worked.
> >
> > Could anyone explain the difference?
> >
> > Thanks
> > Sergo
> >
> >
> > On 15 February 2016 at 21:12, Mikhail Khludnev <
> mkhlud...@griddynamics.com
> > >
> > wrote:
> >
> > > Hello Sergio,
> > >
> > > What does the debugQuery=true output look like?
> > >
> > > On Mon, Feb 15, 2016 at 7:10 PM, marotosg  wrote:
> > >
> > > > Hi,
> > > >
> > > > I am trying to solve an issue when doing a search that joins two
> > > > collections and negates the cross-core query.
> > > >
> > > > Let's say I have a person collection and a document collection, and
> > > > I can join them using the !join local param because I have PersonIDs
> > > > in the document collection.
> > > >
> > > > My query is like the one below, executed against the Person core. I
> > > > want to retrieve people named Peter who do not have attached
> > > > documents of type pdf.
> > > >
> > > > q=PersonName:peter AND {!type=join from=DocPersonID to=PersonID
> > > > fromIndex=document v='(*:* -DocType:pdf)' }
> > > >
> > > > If person 1, called Peter, has two documents, one of type:pdf and
> > > > the other of type:word, then this person will come back.
> > > >
> > > > Is there any way of excluding that person if any of the docs fulfill
> > > > the NOT?
> > > >
> > > > Thanks
> > > > Sergio
> > >
> > >
> >
>


Re: Field exclusion from fl and hl.fl

2016-02-18 Thread Anil
I am looking for the same. Please do let me know in case you find a
workaround.

On 18 February 2016 at 14:18, Zheng Lin Edwin Yeo 
wrote:

> Hi,
>
> Would like to find out: is there already a way to exclude a field from the
> Solr response? I came across SOLR-3191, which was created about 4 years
> ago, but could not find any workable solution there.
>
> As my collections can have more than 100 fields, and I need to return
> most of them except for one or two, it would be good if there were a way to
> exclude fields; if not, I have to list all the remaining fields
> (which can be more than 100 for each collection).
>
> I am using Solr 5.4.0.
>
> Regards,
> Edwin
>


Reverse Engineer Query For a Given Result Set?

2016-02-18 Thread Christian Effertz
Hi,

Can I somehow feed Solr with a result set or a list of primary keys and get
the shortest query that leads to this result? In other words, can I reverse
engineer a query for a given result set?

Some background why I ask this question:
We are currently migrating a search application from Oracle Text to Solr.
Our users have several (>30) complex queries that we need to migrate to our
new Solr index. This can be done by hand, but is rather time-consuming. To
get an idea of how long the whole task would take, we started with a
handful of them. We spent ~2-4h per query to get everything right.

Thank you for your input


Filter query and Faceting problem

2016-02-18 Thread Filippo La Torre
Hello everyone,

This is my first mail to the Solr user mailing list.
I’m new to Solr too; my Solr version is 5.4.1.
I have a problem with filter queries and faceting: I have to make a filter query
with AND/OR that also uses faceting (I will build this query with Spring Data Solr).
It seems that the same filter query gives different results with and without
brackets. The version with brackets doesn't honor tag/exclude. How is that
possible?

Best regards.

With brackets:

{
  "responseHeader": {
"status": 0,
"QTime": 2,
"params": {
  "q": "*:*",
  "facet.field": "{!ex=DEPARTMENT,MACROCATEGORY}macro_category.key",
  "indent": "true",
  "fq": "({!tag=DEPARTMENT}department:foods AND 
{!tag=MACROCATEGORY}macro_category.key:drinks)",
  "wt": "json",
  "facet": "true",
  "_": "1455700122431"
}
  },
  "response": {
"numFound": 1,
"start": 0,
"docs": [
  {
"id": "5672a222fa4d0e4c0d965cc5",
"published": true,
"micro_category.key": "drinks-beer",
"department": "foods",
"macro_category.key": "drinks",
"retail_price": 1,
"selling_price": 1
  }
]
  },
  "facet_counts": {
"facet_queries": {},
"facet_fields": {
  "macro_category.key": [
"drinks",
1,
"box-collection",
0
  ]
},
"facet_dates": {},
"facet_ranges": {},
"facet_intervals": {},
"facet_heatmaps": {}
  }
}

Without brackets:

{
  "responseHeader": {
"status": 0,
"QTime": 1,
"params": {
  "q": "*:*",
  "facet.field": "{!ex=DEPARTMENT,MACROCATEGORY}macro_category.key",
  "indent": "true",
  "fq": "{!tag=DEPARTMENT}department:foods AND 
{!tag=MACROCATEGORY}macro_category.key:drinks",
  "wt": "json",
  "facet": "true",
  "_": "1455702347556"
}
  },
  "response": {
"numFound": 1,
"start": 0,
"docs": [
  {
"id": "5672a222fa4d0e4c0d965cc5",
"published": true,
"micro_category.key": "drinks-beer",
"department": "foods",
"macro_category.key": "drinks",
"retail_price": 1,
"selling_price": 1
  }
]
  },
  "facet_counts": {
"facet_queries": {},
"facet_fields": {
  "macro_category.key": [
"box-collection",
2,
"drinks",
1
  ]
},
"facet_dates": {},
"facet_ranges": {},
"facet_intervals": {},
"facet_heatmaps": {}
  }
}

Field exclusion from fl and hl.fl

2016-02-18 Thread Zheng Lin Edwin Yeo
Hi,

Would like to find out: is there already a way to exclude a field from the
Solr response? I came across SOLR-3191, which was created about 4 years
ago, but could not find any workable solution there.

As my collections can have more than 100 fields, and I need to return
most of them except for one or two, it would be good if there were a way to
exclude fields; if not, I have to list all the remaining fields
(which can be more than 100 for each collection).

I am using Solr 5.4.0.

Regards,
Edwin
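Since the thread did not turn up a built-in exclusion syntax in 5.4.0, one workaround is to build `fl` from the full field list minus the fields to hide. A minimal sketch; the field names are hypothetical, and reading the list from the Schema API (`/solr/<collection>/schema/fields`) is an assumption about how it would be obtained. Fields created from dynamicField patterns may need separate handling.

```python
def build_fl(all_fields, excluded):
    """Return an fl value listing every field except the excluded ones, in order."""
    skip = set(excluded)
    return ",".join(name for name in all_fields if name not in skip)


# Hypothetical field list, e.g. as read from the Schema API.
fields = ["id", "title", "content", "author", "raw_html", "last_modified"]
print(build_fl(fields, ["raw_html", "content"]))  # id,title,author,last_modified
```

The same comma-separated string could be passed as `hl.fl` if the highlighted fields should follow the same exclusions.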


Does hl.fl show fields of type int and tdate?

2016-02-18 Thread Zheng Lin Edwin Yeo
Hi,

In my configuration, I have fields that are of type "string", "int" and
"tdate".

However, when I tried to do highlighting by setting fl=* and hl.fl=*, all
the fields are returned for the "fl" parameter, but only those of
type=string are returned for the "hl.fl" parameter.

Is this correct behaviour? And is there any way that I can show all the
fields for the "hl.fl" parameter, including those of type=int and
type=tdate?

I am using Solr 5.4.0.

Regards,
Edwin


Re: Querying data based on field type

2016-02-18 Thread Binoy Dalal
My apologies. I thought you wanted to remove all the arr values.

On Thu, 18 Feb 2016, 13:55 Salman Ansari  wrote:

> Not sure if I am getting this, but I am not interested in updating
> documents. I am interested in getting documents in which a specific field
> is stored as an array.
>
> Regards,
> Salman
>
> On Thu, Feb 18, 2016 at 11:13 AM, Binoy Dalal 
> wrote:
>
> > Take a look at atomic updates and remove regex.
> >
> >
> https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents
> >
> > On Thu, 18 Feb 2016, 13:07 Salman Ansari 
> wrote:
> >
> > > Hi,
> > >
> > > Due to some misconfiguration issues, I have a field that has values as
> > > both single strings and arrays of strings. It looks like some old
> > > values got indexed as an array of strings, while anything new is a
> > > single-valued string. I have checked the configuration, and multiValued
> > > for that field is set to false. What I want is to remove all the
> > > occurrences of the field as an array (multi-valued), where it shows as
> > > <arr> instead of <str>. Is there a way to query the field so it returns
> > > only those documents that have the field as an array and not as a
> > > single string?
> > >
> > > Appreciate your comments/feedback.
> > >
> > > Regards,
> > > Salman
> > >
> > --
> > Regards,
> > Binoy Dalal
> >
>
-- 
Regards,
Binoy Dalal


Re: Querying data based on field type

2016-02-18 Thread Salman Ansari
Not sure if I am getting this, but I am not interested in updating
documents. I am interested in getting documents in which a specific field
is stored as an array.

Regards,
Salman

On Thu, Feb 18, 2016 at 11:13 AM, Binoy Dalal 
wrote:

> Take a look at atomic updates and remove regex.
>
> https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents
>
> On Thu, 18 Feb 2016, 13:07 Salman Ansari  wrote:
>
> > Hi,
> >
> > Due to some misconfiguration issues, I have a field that has values as
> > both single strings and arrays of strings. It looks like some old
> > values got indexed as an array of strings, while anything new is a
> > single-valued string. I have checked the configuration, and multiValued
> > for that field is set to false. What I want is to remove all the
> > occurrences of the field as an array (multi-valued), where it shows as
> > <arr> instead of <str>. Is there a way to query the field so it returns
> > only those documents that have the field as an array and not as a
> > single string?
> >
> > Appreciate your comments/feedback.
> >
> > Regards,
> > Salman
> >
> --
> Regards,
> Binoy Dalal
>


Re: Querying data based on field type

2016-02-18 Thread Binoy Dalal
Take a look at atomic updates and remove regex.
https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents

On Thu, 18 Feb 2016, 13:07 Salman Ansari  wrote:

> Hi,
>
> Due to some misconfiguration issues, I have a field that has values as
> both single strings and arrays of strings. It looks like some old values
> got indexed as an array of strings, while anything new is a single-valued
> string. I have checked the configuration, and multiValued for that field
> is set to false. What I want is to remove all the occurrences of the
> field as an array (multi-valued), where it shows as <arr> instead of
> <str>. Is there a way to query the field so it returns only those
> documents that have the field as an array and not as a single string?
>
> Appreciate your comments/feedback.
>
> Regards,
> Salman
>
-- 
Regards,
Binoy Dalal
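The thread does not identify a query that matches on whether a stored value is single- or multi-valued, so one workaround is to page through the documents with `wt=json` and `fl=id,<field>` and check the type of each returned value on the client. A minimal sketch; the field name `category` and the sample response are hypothetical.

```python
import json


def docs_with_array_value(response_text, field):
    """Return ids of docs where `field` came back as a JSON array (wt=json)
    rather than as a single value."""
    docs = json.loads(response_text)["response"]["docs"]
    return [doc["id"] for doc in docs if isinstance(doc.get(field), list)]


# Hypothetical wt=json response: one old multi-valued doc, one new single-valued doc.
sample = '''{"response": {"numFound": 2, "start": 0, "docs": [
  {"id": "old-1", "category": ["a", "b"]},
  {"id": "new-1", "category": "a"}
]}}'''
print(docs_with_array_value(sample, "category"))  # ['old-1']
```

The returned ids could then be fed to whatever cleanup is chosen, such as the atomic updates Binoy mentions.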