Re: How to control the number of grouped results [DRUPAL]

2019-12-02 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
This parameter refers to the Solr request, for example: https://lucene.apache.org/solr/guide/7_0/result-grouping.html#grouping-by-query Drupal should expose it in the API, I guess? Cheers, diego From: solr-user@lucene.apache.org At: 12/02/19 14:47:06 To: solr-user@lucene.apache.org

Re: Learning to Rank Feature creation in python

2019-04-24 Thread Diego Ceccarelli
Hi Ashis, Short answer: No, I don't think it's possible. I'm also considering extending Solr to allow plugging in features from outside, but it will require time because at the moment the features can see only the current document being processed, while to do that ideally you want to process in one

Re:LTR: Normalize Feature Weights

2019-04-23 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi Kamal, You can use a MinMaxNormalizer[1] and get min and max from historical data. For the original score this won't guarantee that the value will **always** be between 0..1, but it should happen in the majority of cases. If the 0..1 constraint is not super strong I would rather use a
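
A minimal sketch of the min-max idea in the message above, in plain Python rather than Solr configuration. The bounds are hypothetical stand-ins for values taken from historical data; the point is only to show why a score outside the historical range lands outside 0..1.

```python
def min_max_normalize(value: float, min_value: float, max_value: float) -> float:
    """Scale value linearly so that min_value maps to 0 and max_value maps to 1."""
    if max_value <= min_value:
        raise ValueError("max_value must be greater than min_value")
    return (value - min_value) / (max_value - min_value)


# Hypothetical bounds, assumed to come from historical original scores.
HISTORICAL_MIN = 0.0
HISTORICAL_MAX = 20.0

# A score inside the historical range stays within 0..1.
in_range = min_max_normalize(5.0, HISTORICAL_MIN, HISTORICAL_MAX)

# A score above the historical max is NOT clamped, so it exceeds 1.
out_of_range = min_max_normalize(25.0, HISTORICAL_MIN, HISTORICAL_MAX)
```

If a hard 0..1 bound were required, the result would have to be clamped explicitly; this sketch deliberately does not do that.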

Re: Performance problems with extremely common terms in collection (Solr 7.4)

2019-04-09 Thread Diego Ceccarelli
Another way to make queries faster is, if you can, to identify a subset of documents that are generally relevant for the users (most recent ones, most browsed, etc.), index those documents into a separate collection, and then query the small collection and fall back to the full one if the small one
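
A rough sketch of the two-collection idea above. The message is cut off before it states when to fall back, so the `min_results` threshold is purely an assumption for illustration, and `search_small` / `search_full` are hypothetical callables standing in for whatever client queries each collection.

```python
from typing import Callable, List


def tiered_search(
    query: str,
    search_small: Callable[[str], List[str]],
    search_full: Callable[[str], List[str]],
    min_results: int = 10,
) -> List[str]:
    """Query the small collection first; use the full one only when needed.

    The fallback condition (fewer than min_results hits) is an assumption,
    not something stated in the original message.
    """
    results = search_small(query)
    if len(results) >= min_results:
        return results
    return search_full(query)


# Hypothetical stand-ins for the two collections.
def small_collection(query: str) -> List[str]:
    return ["d1", "d2"]


def full_collection(query: str) -> List[str]:
    return ["d1", "d2", "d3", "d4"]


enough = tiered_search("solr", small_collection, full_collection, min_results=2)
fallback = tiered_search("solr", small_collection, full_collection, min_results=3)
```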

Re:BM25F in Solr

2019-03-20 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
If you want a 'global' IDF across different fields, maybe one solution is to use a copyfield to copy all the fields into a common field (e.g., title, authors, body, footer all copied into a copyfield called text), and then you should be able to use it with a function query or by implementing your

search devroom @ FOSDEM 2019

2018-12-03 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi all, I just noticed this and I just wanted to share with you: Full-text search is everywhere nowadays and FOSDEM 2019 will have a dedicated devroom for search on Sunday the 3rd of February. We would like to invite submissions of presentations from developers, researchers, and users of

Re: solr and diversification

2018-10-04 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
relevance. > > > > Joel Bernstein > > http://joelsolr.blogspot.com/ > > > > > > On Thu, Sep 27, 2018 at 1:39 PM Diego Ceccarelli (BLOOMBERG/ LONDON) < > > dceccarel...@bloomberg.net> wrote: > > > > > Yeah, I think Kmeans might be a way to implement the &q

Re: solr and diversification

2018-09-27 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
> threshold. I would allow defining the strategy and selecting it from the request. From: solr-user@lucene.apache.org At: 09/27/18 18:25:43 To: Diego Ceccarelli (BLOOMBERG/ LONDON), solr-user@lucene.apache.org Subject: Re: solr and diversification I've thought about this problem a littl

solr and diversification

2018-09-27 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi, I'm considering writing a component for diversifying the results. I know that diversification can be achieved by using grouping but I'm thinking about something different and query biased. The idea is to have something that gets applied after the normal retrieval and selects the top k

Re: Learning to rank - Bad Request

2018-07-16 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi Akshay, did you run Solr with learning to rank enabled? ./bin/solr -e techproducts -Dsolr.ltr.enabled=true If you don't pass -Dsolr.ltr.enabled=true, LTR will not be available. Cheers, Diego From: solr-user@lucene.apache.org At: 07/16/18 09:00:39 To: solr-user@lucene.apache.org Subject: Re:

Re:LTR performance issues

2018-05-08 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hello ilayaraja, I think it would be good to move this discussion to the Jira item: https://issues.apache.org/jira/browse/SOLR-8776?attachmentOrder=asc You can add your comments there, and also in the page I explained how it works. On the performance you are right: at the moment it is slow.

Re:the number of docs in each group depends on rows

2018-05-04 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hello, I'm not 100% sure but I think that if you have multiple shards the number of docs matched in each group is *not* guaranteed to be exact. Increasing the rows will increase the amount of partial information that each shard sends to the federator and make the number more precise. For

Re: Learning to Rank (LTR) with grouping

2018-05-03 Thread Diego Ceccarelli
Thanks ilayaraja, I updated the PR today, integrating your and Alan's comments. Now it also works in distributed mode. Please let me know what you think :) Cheers Diego On Wed, May 2, 2018, 17:46 ilayaraja wrote: > Figured out that offset is used as part of the grouping

Re: Learning to Rank (LTR) with grouping

2018-04-18 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
I just updated the PR to upstream - I still have to fix some things in distributed mode, but unit tests in non-distributed mode work. Hope this helps, Diego From: solr-user@lucene.apache.org At: 04/15/18 03:37:54 To: solr-user@lucene.apache.org Subject: Re: Learning to Rank (LTR) with

Re:Support LTR RankQuery with Grouping

2018-04-06 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
The patch has not been merged yet; it is available here: https://github.com/apache/lucene-solr/pull/162 You can try to apply the patch on the current master and see if it fixes the issue. Please let us know if you have any questions. Cheers, Diego From: solr-user@lucene.apache.org At: 04/05/18

Re:Defining Document Transformers in Solr Configuration

2018-02-27 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
I don't think you can define a docTransformer in the SolrConfig at the moment; I agree it would be a cool feature. Maybe one possibility could be to use the update request processors [1] and precompute the fields at index time. It would be more expensive in disk and index time, but then it

Re:SOLR Similarity Difference

2018-02-27 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi Rick, I don't think the issue is BM25 vs TFIDF (the old similarity), it seems more due to the "matching" logic. You are asking to match: "(Action AND Technical AND Temporaries AND t/a AND CTR AND Corporation)" This (in theory) means that you want to retrieve **only** the documents that

Re:FileDictionaryFactory:- pick source file from solr instead of zk config.

2018-02-26 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
A similar problem came out with learning to rank models, and was fixed by https://issues.apache.org/jira/browse/SOLR-11250 Maybe it can be useful.. From: solr-user@lucene.apache.org At: 02/26/18 13:13:28 To: solr-user@lucene.apache.org Subject: FileDictionaryFactory:- pick source file from

Benchmarking Solr Query performance

2018-02-09 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi all, We would like to perform a benchmark of https://issues.apache.org/jira/browse/SOLR-11831 The patch improves the performance of grouped queries asking only for one result per group (aka. group.limit=1). I remember seeing a page showing a benchmark of the query performance on Wikipedia,

Re:skip slow tests?

2018-02-02 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
ant -Dtests.slow=false From: solr-user@lucene.apache.org At: 02/02/18 17:07:14 To: solr-user@lucene.apache.org Subject: skip slow tests? Hi *, Some (slow) tests in Solr are annotated with @Slow. Is there a way to run ant test skipping them? thanks, Diego

skip slow tests?

2018-02-02 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi *, Some (slow) tests in Solr are annotated with @Slow. Is there a way to run ant test skipping them? thanks, Diego

Re: Searching for an efficient and scalable way to filter query results using non-indexed and dynamic range values

2018-02-01 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi Luigi, I don't know that part of Lucene well; I would check blog posts and the code to understand if you can use NumericDocValues (my gut says yes). Also, I don't know if it is important, but please note that if you index all the documents at the beginning your scores will be different -

Re:Searching for an efficient and scalable way to filter query results using non-indexed and dynamic range values

2018-01-31 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi Luigi, What about using an updatable DocValue [1] for the field x? You could initially set it to -1, and then update it for the docs in step j. Range queries should still work and the update should be fast. Cheers [1]

Re: LTR original score feature

2018-01-29 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
I think it really depends on the particular use case. Sometimes the absolute score is a good feature, sometimes not. If you are using the default BM25, I think that increasing the number of terms in the query will increase the average doc score in the results. So maybe I would normalize the

Re: ***UNCHECKED*** Limit Solr search to number of character/words (without changing index)

2018-01-29 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
In theory it should be possible if you are indexing the positions of the tokens in your field, but I am not aware of any Solr query that allows you to weight the matches based on the position. Does anyone know if it is possible? From: solr-user@lucene.apache.org At: 01/29/18 11:25:36 To:

Re:***UNCHECKED*** Limit Solr search to number of character/words (without changing index)

2018-01-26 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi Zahid, if you want to allow searching only if the query is shorter than a certain number of terms / characters, I would probably do it before calling Solr; otherwise you could write a QueryParserPlugin (see [1]) and check that the query is sound before processing it. See also:
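
A small sketch of the "check before calling Solr" option from the message above. The function name and the default limits are made up for illustration, and terms are counted by naive whitespace splitting, which will not match what a real Solr analyzer does.

```python
def is_query_acceptable(query: str, max_terms: int = 10, max_chars: int = 200) -> bool:
    """Return True if the query is non-empty and within both length limits.

    max_terms and max_chars are hypothetical defaults; terms are counted by
    whitespace splitting, which is only an approximation of real tokenization.
    """
    stripped = query.strip()
    if not stripped:
        return False
    if len(stripped) > max_chars:
        return False
    return len(stripped.split()) <= max_terms


short_ok = is_query_acceptable("learning to rank", max_terms=5)
too_many_terms = is_query_acceptable("a b c d e f", max_terms=5)
too_many_chars = is_query_acceptable("x" * 300, max_chars=200)
```

The calling service would run this check first and only send the request to Solr when it returns True.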

RE: Using lucene to post-process Solr query results

2018-01-23 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
And you want to show to the users only the Lucene documents that matched the original query sent to Solr? (what if a lucene document matches only part of the query?) From: solr-user@lucene.apache.org At: 01/23/18 13:55:46 To: Diego Ceccarelli (BLOOMBERG/ LONDON), solr-user

Re: Using lucene to post-process Solr query results

2018-01-23 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Rahul, can you provide more details on how you decide that the smaller lucene objects are part of the same solr document? From: solr-user@lucene.apache.org At: 01/23/18 09:59:17 To: solr-user@lucene.apache.org Subject: Re: Using lucene to post-process Solr query results Hi Rahul, Looks like

Re:Frequently Used Search Terms.

2018-01-18 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi Fiz, It is not possible at the moment; you will have to log the queries (from Solr, or before you send them) and use external tools to do that. There is a Jira item on that if you are interested: https://issues.apache.org/jira/browse/SOLR-10359 Diego From: solr-user@lucene.apache.org At:

Re: LTR and working with feature stores

2018-01-13 Thread Diego Ceccarelli
Hi Dariusz, On Jan 12, 2018 14:40, "Dariusz Wojtas" wrote: Hi, I am working with the LTR rescoring. Works beautifully, but I am curious about something. How do I specify the feature store in a way different than using the [features] syntax? [features

Re: Learning to Rank (LTR) with grouping

2018-01-11 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
natives would be helpful. Do I read your >> response as needing to go to 7.0 when you say upstream? >> >> Thank you, >> Roopa >> >> >> On Tue, Dec 19, 2017 at 1:37 PM, Diego Ceccarelli < >> diego.ceccare...@gmail.com> wrote: >> >>

Re: Personalized search parameters

2018-01-08 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
I'm assuming that you are writing the cosine similarity and you have two vectors containing the pairs . The two vectors could have different sizes because they only contain the terms that have tfidf != 0. if you want to compute cosine similarity between the two lists you just have
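
A minimal sketch of cosine similarity computed directly on sparse vectors, as discussed above. Each vector is assumed to be a dict mapping a term to its non-zero tf-idf weight; that representation and the example weights are illustrative choices, not taken from the thread.

```python
import math
from typing import Dict


def sparse_cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse term -> weight vectors.

    Only terms present in both vectors contribute to the dot product,
    so there is no need to build full-size vectors.
    """
    # Iterate over the smaller vector and look terms up in the larger one.
    if len(a) > len(b):
        a, b = b, a
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up tf-idf weights; the two vectors have different term sets.
doc_vector = {"solr": 1.0, "rank": 2.0}
user_vector = {"rank": 2.0, "lucene": 1.0}
similarity = sparse_cosine(doc_vector, user_vector)
```

The vectors may have different sizes; terms missing from one side simply add nothing to the dot product.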

Re: Personalized search parameters

2018-01-06 Thread Diego Ceccarelli
Maybe I misunderstood the question, but why do you need to create the full-size vectors? Can't you just compute the cosine using the sparse vectors? On Fri, Jan 5, 2018 at 10:09 PM, marco wrote: > At the moment I have another problem: is there an efficient way to calculate

Re: Personalized search parameters

2018-01-05 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
From: solr-user@lucene.apache.org At: 01/05/18 15:35:46 To: solr-user@lucene.apache.org Subject: Re: Personalized search parameters In particular we have to retrieve the documents with a normal search followed by a result reranking phase where we calculate the cosine similarity between the

Re:Personalized search parameters

2018-01-05 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Why do you want the personalization to happen in Similarity? Similarity will score all the docs matching your query, so it has to be really fast. Unless your personalization is very easy (e.g., tf/idf computed in a different way based on the user) I would not put it there. Did you consider

Re: SOLR 7.2 and LTR

2017-12-29 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
> at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume. > executeProduceConsume(ExecuteProduceConsume.java:303) > at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume. > produceConsume(ExecuteProduceConsume.java:148) > at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run( > Exec

Re: SOLR 7.2 and LTR

2017-12-28 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
) at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) at java.lang.Thread.run(Unknown Source) Best regards, Dariusz Wojtas On Thu, Dec 28, 2017 at 1:03 PM, Diego Ceccarelli (BLOOMBERG/ LONDON) < dceccarel...@bloomberg.net> wrote: > Hello Dariusz, > > Can you look into the solr l

Re:SOLR 7.2 and LTR

2017-12-28 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hello Dariusz, Can you look into the solr logs for a stack trace or ERROR logs? From: solr-user@lucene.apache.org At: 12/27/17 19:01:29 To: solr-user@lucene.apache.org Subject: SOLR 7.2 and LTR Hi, I am using SOLR 7.0 and use the ltr parser. The configuration I use works nicely under SOLR

Re: Learning to Rank (LTR) with grouping

2017-12-19 Thread Diego Ceccarelli
: >> >> Hi Diego, >> >> Thank you, >> >> I am interested in reranking the documents inside one of the groups. >> >> I will try the options you mentioned here. >> >> Thank you, >> Roopa >> >> On Mon, Dec 11, 2017 at 6:

Re: How to restrict the fields solr returns?

2017-12-19 Thread Diego Ceccarelli
Instead of putting this into Solr, did you consider adding this logic into the service that will call Solr? On Tue, Dec 19, 2017 at 4:41 PM, Solrmails wrote: > Thank you for your answer. I'd like to restrict the returned fields > dynamically based on a permission

Re: How to restrict the fields solr returns?

2017-12-19 Thread Diego Ceccarelli
If you need to return only a subset of the fields for each request you can set them as defaults in the solrconfig.xml. On Dec 19, 2017 13:45, "Solrmails" wrote: > I found a solution: I created a custom Search Handler and overrode > 'handleRequestBody'. Then I modify

Re: using rank queries(rq) with grouping in solr cloud

2017-12-18 Thread Diego Ceccarelli
Hi Tomerg, 1. Did you consider using the collapse component? https://lucene.apache.org/solr/guide/6_6/collapse-and-expand-results.html It is compatible with rq. 2. If you implement group reranking as a separate component you will end up with a lot of code duplicated from QueryComponent, you
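
A sketch of how request parameters combining collapse with a rerank query (rq) might be assembled. The `{!collapse field=...}` and `{!rerank reRankQuery=$rqq reRankDocs=... reRankWeight=...}` local-params syntax follows the Solr reference guide; the helper name, the field name `group_id`, and the queries are hypothetical.

```python
from typing import Dict
from urllib.parse import urlencode


def collapse_rerank_params(
    query: str,
    collapse_field: str,
    rerank_query: str,
    rerank_docs: int = 100,
    rerank_weight: int = 3,
) -> Dict[str, str]:
    """Build Solr request parameters that collapse on a field and rerank the top docs."""
    return {
        "q": query,
        # Collapse keeps one representative document per value of the field.
        "fq": f"{{!collapse field={collapse_field}}}",
        # The rerank query is passed by reference through the rqq parameter.
        "rq": f"{{!rerank reRankQuery=$rqq reRankDocs={rerank_docs} reRankWeight={rerank_weight}}}",
        "rqq": rerank_query,
    }


# Hypothetical field and queries, for illustration only.
params = collapse_rerank_params("body:search", "group_id", "title:solr")
query_string = urlencode(params)
```

The resulting `query_string` would be appended to a select handler URL; whether this gives the same results as grouping for a particular use case would need to be checked.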

Re: Learning to Rank (LTR) with grouping

2017-12-11 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
<roop...@gmail.com> wrote: > Hi Diego, > > Thank you, I will look into this and see how I could patch this. > > Thank you for your quick response, > Roopa > > > On Fri, Dec 8, 2017 at 5:44 PM, Diego Ceccarelli < > diego.ceccare...@gmail.com> wrote: >

Re: Learning to Rank (LTR) with grouping

2017-12-08 Thread Diego Ceccarelli
Hi Roopa, LTR is implemented using RankQuery, and at the moment grouping doesn't support RankQuery. I opened a Jira item some time ago (https://issues.apache.org/jira/browse/SOLR-8776) and I would be happy to receive feedback on that. You can find the code here

Dedupe documents inside of each group

2017-11-30 Thread Diego Ceccarelli (BLOOMBERG/ QUEEN VIC)
Hello, I have a use case where I need to dedupe documents in each group based on a particular field: example: doc1 = { field_a=1 field_b=2 } doc2 = { field_a=1 field_b=2 } doc3 = { field_a=1 field_b=3 } doc4 = { field_a=2 field_b=3 } doc5 = { field_a=2 field_b=3 } and I want to run "Group

Re: LTR training

2017-11-19 Thread Diego Ceccarelli
Hello Ilay, Answers in line: On Sat, Nov 18, 2017 at 2:22 PM, ilay wrote: > > 1. Does LTR only support phrase matching (complete user query) from training > data for extracting feature score: > ex. > efi.user_query='tv+stand' matches the title feature only if title contains

Re:Given path of Ranklib model in Solr Model Json

2017-11-08 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hello isspek, Unfortunately no, it would be nice to patch RankLib to output the model in json. Jfyi, I've a script to convert the xml into the json format https://github.com/bloomberg/lucene-solr/blob/ltr-demo-lucene-solr/py-solr-buzzwords/tree_model.py Cheers, Diego From:

vespa

2017-09-27 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi all, Yesterday Yahoo open sourced Vespa (i.e.: The open big data serving engine: Store, search, rank and organize big data at user serving time.), looking at the API they provide search. I did a quick search on the code for lucene, getting only 5 results. Does anyone know more about the

Re: Is there a way to delete multiple documents using wildcard?

2017-09-21 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
https://wiki.apache.org/solr/FAQ#How_can_I_delete_all_documents_from_my_index.3F have a look also at the last post here: https://gist.github.com/nz/673027 I think there's a way to disallow delete by *:* in the solrconfig.xml but I can't find it (I would take a look in the solrconfig just in

Re: Rescoring from 0 - full

2017-09-21 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi Dariusz, If you use *:* you'll rerank only the top N random documents, as Emir said, which will probably not produce interesting results. If you want to replace the original score, you can take a look at the learning to rank module [1], which would allow you to reassign a new score to the top

Re: Solr learning to rank features question

2017-09-13 Thread Diego Ceccarelli
> { > "name" : "FeatureA", > "store" : "commonFeatureStore", > "class" : "org.apache.solr.ltr.feature.SolrFeature", > "params" : { > "q" : "{!func}if(gt(ms(CutoffDate,NOW),0),exists

Re: Searching With UTF-8

2017-08-29 Thread Diego Ceccarelli
Hello Lawrence, Which type did you use in the solr schema for your fields? Cheers, Diego On Tue, Aug 29, 2017 at 5:34 PM, Elitzer, Lawrence < lelit...@lgsinnovations.com> wrote: > Hello! > > > > It seems I can correctly import (with DIH) UTF-8 characters such as J but > I am unable to search

Re: Solr learning to rank features question

2017-08-29 Thread Diego Ceccarelli
Hi Brian, The plugin doesn't allow you to express multiple function queries in the same feature. Maybe in this case you can express both queries in a single function query, using the if function. Something like: "fq":"if(gt(ms(NOW,mydatefield),0),query(PreCutOffZones:${zone}), query(

Re: Learn To Rank Questions

2017-06-02 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi, Sorry for the delay, here are my replies: 1. I'm not yet a Spark user (but I'm working on that :)) 2. I'm not sure I understand how you would use a feature that is not a float in a model; in my experience all the learning to rank methods always train and predict from a list of floats.

Support RankQuery in grouping

2017-05-11 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi All, At the moment RankQueries [1] are not supported when you perform grouping: if you perform a ReRankQuery and ask for the groups, reranking will be ignored in the scoring. In SOLR-8776, I added support for ReRankQueries in grouping and I opened a PR on github [2]. ReRankQueries are

Re: How to train the model using user clicks when use ltr(learning to rank) module?

2017-01-06 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi Jeffery, I submitted a patch to the README of the learning to rank example folder, trying to explain better how to produce a training set given a log with interaction data. Patch is available here: https://issues.apache.org/jira/browse/SOLR-9929 And you can see the new version of the

Re: Solr Support for BM25F

2016-04-14 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi David, I implemented bm25f for Europeana on Solr 4.x a couple of years ago, you can find it here: https://github.com/europeana/contrib/tree/master/bm25f-ranking maybe I should contribute it back.. Please do not hesitate to contact me if you need help :) Cheers, Diego From:

Re: Extracting article keywords using tf-idf algorithm

2015-07-18 Thread Diego Ceccarelli
Dear Ali, I'm not sure I understand what you are trying to do, please correct me if I misunderstood: given a document indexed into lucene you want to retrieve the top-k terms with highest tf-idf right? Could you please post your code somewhere? I don't understand what is mlt :) Cheers, Diego

Re: Rerank queries and grouping

2015-07-16 Thread Diego Ceccarelli
PM, Diego Ceccarelli diego.ceccare...@gmail.com wrote: Hi Everyone, I need to use a RankQuery within a grouping [1]. I did some experiments with RerankQuery [2] and solr 4.10.2 and it seems that if you group on a field, the reranking query is completely ignored (on the cloud

Rerank queries and grouping

2015-07-15 Thread Diego Ceccarelli
Hi Everyone, I need to use a RankQuery within a grouping [1]. I did some experiments with RerankQuery [2] and solr 4.10.2 and it seems that if you group on a field, the reranking query is completely ignored (on the cloud, and on a single instance). I would expect to see the results in each group