Re: Query Keyword Storage

2015-10-11 Thread Imtiaz Shakil Siddique
Hi Erik,

Thank you for the solution. I'll surely give it a try.
But I was hoping to collect the query keywords directly inside Solr (maybe
by extending the edismax query parser), because that way I don't have to
write them into log files first. After that I want to feed that data
into Banana.
Is that possible?

Regards,
Imtiaz Shakil Siddique
Senior Software Engineer
Chorki Limited
www.chorki.com


On 10 October 2015 at 05:43, Erik Hatcher  wrote:

> There’s no built-in query log handling, other than the (jetty) request
> logs.
>
> More and more these days, folks are logging directly or processing log
> files back into Solr, in a separate collection, and driving analytics from
> that.   You can do a lot with logstash + banana (
> https://github.com/LucidWorks/banana ).
> We, at Lucidworks, wrap all this up into our [excuse the commercial
> interruption] platform Fusion.  Fusion logs (optionally) all requests to
> the query pipeline to a logs collection and drive the Silk (banana)
> dashboard from that.
>
> —
> Erik Hatcher, Senior Solutions Architect
> http://www.lucidworks.com 
>
>
>
>
> > On Oct 9, 2015, at 6:29 PM, Imtiaz Shakil Siddique <
> shakilsust...@gmail.com> wrote:
> >
> > Hi,
> >
> > I'd like to know whether there is any built-in feature/plugin in Solr that
> > can store user queries.
> >
> > I know that I can always check the log files of the Jetty server that ships
> > with Solr to collect user queries. But is there a better way? And
> > if I needed to write a plugin for this case, which plugin should I extend?
> >
> > Thank you.
> > Imtiaz Shakil Siddique
> > Senior Software Engineer
> > Chorki Limited
> > www.chorki.com
>
>


Grouping facets: Possible to get facet results for each Group?

2015-10-11 Thread Peter Sturge
Hello Solr Forum,

Been trying to coerce Group faceting to give some faceting back for each
group, but maybe this use case isn't catered for in Grouping? :

So the Use Case is this:
Let's say I do a grouped search that returns say, 9 distinct groups, and in
these groups are various numbers of unique field values that need faceting
- but the faceting needs to be within each group:


user:*=true=user=true=host=true

This query gives back grouped facets for each 'host' value (i.e. the facet
counts are 'collapsed') - but the facet counts (unique values of 'user'
field) are aggregated for all the groups, not on a 'per-group' basis (i.e.
returned as 'global facets' - outside of the grouped results).
The results from the query above don't say which unique values of
'user' are in which group. If the number of doc hits is very large (it can
easily be in the hundreds of thousands), it's not practical to iterate through
the docs looking for unique values.
This Use Case necessitates the unique values within each group, rather than
the total doc hits.
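
To make the difference concrete, here is a small Python sketch of the two result shapes on made-up data. It only illustrates the output I'm after; it is not a Solr feature, and looping over docs like this is exactly what isn't practical at real hit counts. The function names are hypothetical, and reading 'collapsed' counts as "number of distinct groups containing the value" is my assumption.

```python
from collections import Counter, defaultdict


def facets_per_group(docs, group_field="host", facet_field="user"):
    """Count facet_field values separately inside each group_field value."""
    per_group = defaultdict(Counter)
    for doc in docs:
        per_group[doc[group_field]][doc[facet_field]] += 1
    return {group: dict(counts) for group, counts in per_group.items()}


def collapsed_facets(docs, group_field="host", facet_field="user"):
    """For each facet value, count how many distinct groups contain it."""
    groups_seen = defaultdict(set)
    for doc in docs:
        groups_seen[doc[facet_field]].add(doc[group_field])
    return {value: len(groups) for value, groups in groups_seen.items()}


# Made-up documents, for illustration only.
docs = [
    {"host": "h1", "user": "alice"},
    {"host": "h1", "user": "alice"},
    {"host": "h1", "user": "bob"},
    {"host": "h2", "user": "alice"},
    {"host": "h2", "user": "carol"},
]

print(collapsed_facets(docs))   # one global list, no per-group breakdown
print(facets_per_group(docs))   # the per-group breakdown wanted here
```

The first result loses the information about which users belong to which host; the second keeps it.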

Is this possible with grouping, or in conjunction with another module?

Many thanks,
+Peter


Re: Solr cross core join special condition

2015-10-11 Thread Shawn Heisey
On 10/11/2015 8:01 AM, Ali Nazemian wrote:
> I did check the jira issue that you mentioned but it seems its target is
> Solr 6! Am I correct? The patch failed for Solr 5.3 due to class not found.
> For Solr 5.x should I try to implement something similar myself?

Virtually all changes that are made to Solr are done first in trunk
(currently trunk is slated to be the 6.0 release) and then are
backported to the stable branch (currently branch_5x).

If a change is particularly controversial, difficult to directly and
reliably test, very large, or breaks backward compatibility in a major
way, then it may not get backported to the stable branch.  The feature
will remain unreleased until a large amount of work is completed to make
a new major version ready for release.

Sometimes a new feature will be committed to trunk, worked on for
several weeks or months to make sure it's ready, and *then* get moved to
the stable branch for release.

I admit to not really understanding what SOLR-7584 is about, but if the
code (when it's fully done) is solid and includes some tests to make
sure it works, it doesn't look like something that needs to wait for
6.0.  The patch is somewhat large, at 50K, but it looks like the bulk of
it is new code, not large-scale changes to existing code.

Thanks,
Shawn



Re: How to show some (paid) documents ahead of others (non-paid) - fantasy scenario

2015-10-11 Thread liviuchristian
Hi, 
What if we write all paid results to a new, dedicated core (let's call it
"PaidResultsCore") and call the non-paid results core "NonPaidResultsCore"?
When a user asks for "red pepper", we first run the query against
"PaidResultsCore" and take the top 3 results, and then we run the
query against "NonPaidResultsCore" and take the top 9 results. Then we
mix them all together and deliver a 12-result page to the user.

Could that be achieved, and how?
Thank you,
Christian
 Christian Fotache Tel: 0728.297.207 Fax: 0351.411.570
  From: Upayavira 
 To: solr-user@lucene.apache.org 
 Sent: Saturday, October 10, 2015 6:13 PM
 Subject: Re: How to show some documents ahead of others - requirements
   
I've seen a similar requirement to this recently.

Basically, a sorting requirement that is close to impossible to
implement as a scoring/boosting formula, because the *position* of the
result features in the score, and that's not something I believe can be
done right now.

The way we solved the issue in the similar case I referred to above was
by using a RerankQuery. That query class has a getTopDocsCollector()
function, which you can override, providing your own Collector.

If you then refer to your query (actually your query parser) with the
rerank query param in Solr, rq={!myRerankQuery}, it will trigger
your new collector. When its topDocs() method is called, the collector
will call topDocs() on its parent query, get a list of documents,
order them in whatever way you require, and return them in a
non-score order.

Not sure I've made that very clear, but hope it helps a little.

Upayavira



On Sat, Oct 10, 2015, at 03:13 PM, liviuchrist...@yahoo.com.INVALID
wrote:
> Hi Upayavira & Walter & everyone else
> 
> About the requirements:
> 1. I need to return no more than 3 paid results on a page of 12 results.
> 2. Paid results should be sorted like this: let's say a user is searching
> for "chocolate almonds cake". Now, let's say that 2000 results match the
> query and there are about 10 of these that are "paid results". I need to
> list the first 3 (1-2-3) of the paid results (in their ranking decreasing
> order) on the first page (maybe by improving the ranking of the 20 paid
> results over the non-paid ones and listing the first 3 of them) and then
> list 9 non-paid results on the page in their ranking decreasing order.
> Then, on the second page, I want to list first the next 3 paid results
> (4-5-6) and so on.
> 
> Kind regards,Christian
>  Christian Fotache Tel: 0728.297.207 
> 
>      From: Upayavira 
>  To: solr-user@lucene.apache.org 
>  Sent: Thursday, October 8, 2015 7:03 PM
>  Subject: Re: How to show some documents ahead of others
>    
> Hence the suggestion to group by the paid field - would give you two
> lists of the number you ask for.
> 
> What I'm trying to say is that the QueryElevationComponent might do it,
> but it is also relatively clunky, so a pure search solution might do it.
> 
> However, the thing we lack right now is a full take on the requirements,
> e.g. how should paid results be sorted, how many paid results do you
> show, etc, etc. Without these details we're all guessing.
> 
> Upayavira
> 
> 
> On Thu, Oct 8, 2015, at 04:45 PM, Walter Underwood wrote:
> > Sorting all paid above all unpaid will give bad results when there are
> > many matches. It will show 1000 paid items, include all the barely
> > relevant ones, before it shows the first highly relevant unpaid recipe.
> > What if that was the only correct result?
> > 
> > Two approaches that work:
> > 
> > 1. Boost paid items using the “boost” parameter in edismax. Adjust it to
> > be a tiebreaker between documents with similar score.
> > 
> > 2. Show two lists, one with the five most relevant paid, the next with
> > the five most relevant unpaid.
> > 
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> > 
> > 
> > > On Oct 8, 2015, at 7:39 AM, Alessandro Benedetti 
> > >  wrote:
> > > 
> > > Is it possible to understand better this : "as it doesn't
> > > allow any meaningful customization " ?
> > > 
> > > Cheers
> > > 
> > > On 8 October 2015 at 15:27, Andrea Roggerone wrote:
> > > 
> > >> Hi guys,
> > >> I don't think that sorting is a good solution in this case, as it doesn't
> > >> allow any meaningful customization. I believe that the advised
> > >> QueryElevationComponent is one of the viable alternatives. Another one
> > >> would be to boost a particular field at query time, for instance paid.
> > >> That would allow you to assign different boosts to different values
> > >> using a function.
> > >> 
> > >> On Thu, Oct 8, 2015 at 1:48 PM, Upayavira  wrote:
> > >> 
> > >>> Or just have a field in your index -
> > >>> 
> > >>> paid: true/false
> > >>> 
> > >>> Then sort=paid desc, 

Re: Solr cross core join special condition

2015-10-11 Thread Ali Nazemian
Dear Susheel,
Hi,

I did check the jira issue that you mentioned but it seems its target is
Solr 6! Am I correct? The patch failed for Solr 5.3 due to class not found.
For Solr 5.x should I try to implement something similar myself?

Sincerely yours.


On Wed, Oct 7, 2015 at 7:15 PM, Susheel Kumar  wrote:

> You may want to take a look at new Solr feature of Streaming API &
> Expressions
> https://issues.apache.org/jira/browse/SOLR-7584?filter=12333278
> for making joins between collections.
>
> On Wed, Oct 7, 2015 at 9:42 AM, Ryan Josal  wrote:
>
> > I developed a join transformer plugin that did that (although it didn't
> > flatten the results like that).  The one thing that was painful about it
> > is that the TextResponseWriter has references to both the IndexSchema and
> > SolrReturnFields objects for the primary core.  So when you add a
> > SolrDocument from another core it returned the wrong fields.  I worked
> > around that by transforming the SolrDocument to a NamedList.  Then when
> > it gets to processing the IndexableFields it uses the wrong IndexSchema, I
> > worked around that by transforming each field to a hard Java object
> > (through the IndexSchema and FieldType of the correct core).  I think it
> > would be great to patch TextResponseWriter with multi core writing
> > abilities, but there is one question, how can it tell which core a
> > SolrDocument or IndexableField is from?  Seems we'd have to add an
> > attribute for that.
> >
> > The other possibly simpler thing to do is execute the join at index time
> > with an update processor.
> >
> > Ryan
> >
> > On Tuesday, October 6, 2015, Mikhail Khludnev
> > <mkhlud...@griddynamics.com> wrote:
> >
> > > On Wed, Oct 7, 2015 at 7:05 AM, Ali Nazemian wrote:
> > >
> > > > it seems there is not any way to do that right now and it should
> > > > be developed somehow. Am I right?
> > > >
> > >
> > > yep
> > >
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
> > > Principal Engineer,
> > > Grid Dynamics
> > >
> > > 
> > > >
> > >
> >
>



-- 
A.Nazemian


Re: How to show some (paid) documents ahead of others (non-paid) - fantasy scenario

2015-10-11 Thread Upayavira
I think Walter suggested the simplest: make two requests. When you've
got both results back, you can stick them together to make results.

At present, there is no method to do multiple actions within a single
request.
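
A rough Python sketch of the "two requests, then stick them together" idea follows. Everything named here is hypothetical: `list_searcher` is an in-memory stand-in for a real Solr request, and a real client would pass `rows` and `start` to Solr instead. It assumes the 3 paid + 9 non-paid layout from the requirements and simply returns a shorter page once the paid results run out.

```python
from typing import Callable, Dict, List

Doc = Dict[str, str]
# A searcher takes (rows, start) and returns ranked documents, best first.
Searcher = Callable[[int, int], List[Doc]]


def list_searcher(ranked_docs: List[Doc]) -> Searcher:
    """In-memory stand-in for one search request (hypothetical helper)."""
    def search(rows: int, start: int) -> List[Doc]:
        return ranked_docs[start:start + rows]
    return search


def build_page(search_paid: Searcher, search_nonpaid: Searcher, page: int,
               paid_rows: int = 3, nonpaid_rows: int = 9) -> List[Doc]:
    """Make two requests for the given zero-based page and join the results."""
    paid = search_paid(paid_rows, page * paid_rows)
    nonpaid = search_nonpaid(nonpaid_rows, page * nonpaid_rows)
    # Paid results first, each list kept in its own ranking order.
    return paid + nonpaid


# Made-up, already-ranked results for illustration only.
paid_hits = [{"id": f"p{i}"} for i in range(10)]
nonpaid_hits = [{"id": f"n{i}"} for i in range(2000)]

first_page = build_page(list_searcher(paid_hits), list_searcher(nonpaid_hits), 0)
print([doc["id"] for doc in first_page])
```

Paging each list independently keeps the second page at paid results 4-5-6, as asked for, without either request needing to know about the other.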

Upayavira

On Sun, Oct 11, 2015, at 01:38 PM, liviuchrist...@yahoo.com.INVALID
wrote:
> Hi, 
> What if we write all paid results in a new, dedicated, core... let's call
> it: "PaidResultsCore" and lets call the non-paid results core:
> "NonPaidResultsCore"
> When a user asks for "red pepper" we first perform the query upon
> "PaidResultsCore" and get the first ranking 3 results and then we perform
> the query upon "NonPaidResultsCore" and get the first ranking 9 results.
> Then we mix them all together and deliver a 12 results page to the user. 
> 
> Could that be achieved and how???
> Thank you,Christian
>  Christian Fotache Tel: 0728.297.207 Fax: 0351.411.570
>   From: Upayavira 
>  To: solr-user@lucene.apache.org 
>  Sent: Saturday, October 10, 2015 6:13 PM
>  Subject: Re: How to show some documents ahead of others - requirements
>
> I've seen a similar requirement to this recently.
> 
> Basically, a sorting requirement that is close to impossible to
> implement as a scoring/boosting formula, because the *position* of the
> result features in the score, and that's not something I believe can be
> done right now.
> 
> The way we solved the issue in the similar case I referred to above was
> by using a RerankQuery. That query class has a getTopDocsCollector()
> function, which you can override, providing your own Collector.
> 
> If you then refer to your query (actually your query parser) with the
> rerank query param in Solr, rq={!myRerankQuery}, it will trigger
> your new collector. When its topDocs() method is called, the collector
> will call topDocs() on its parent query, get a list of documents,
> order them in whatever way you require, and return them in a
> non-score order.
> 
> Not sure I've made that very clear, but hope it helps a little.
> 
> Upayavira
> 
> 
> 
> On Sat, Oct 10, 2015, at 03:13 PM, liviuchrist...@yahoo.com.INVALID
> wrote:
> > Hi Upayavira & Walter & everyone else
> > 
> > About the requirements:
> > 1. I need to return no more than 3 paid results on a page of 12 results.
> > 2. Paid results should be sorted like this: let's say a user is searching
> > for "chocolate almonds cake". Now, let's say that 2000 results match the
> > query and there are about 10 of these that are "paid results". I need to
> > list the first 3 (1-2-3) of the paid results (in their ranking decreasing
> > order) on the first page (maybe by improving the ranking of the 20 paid
> > results over the non-paid ones and listing the first 3 of them) and then
> > list 9 non-paid results on the page in their ranking decreasing order.
> > Then, on the second page, I want to list first the next 3 paid results
> > (4-5-6) and so on.
> > 
> > Kind regards,Christian
> >  Christian Fotache Tel: 0728.297.207 
> > 
> >      From: Upayavira 
> >  To: solr-user@lucene.apache.org 
> >  Sent: Thursday, October 8, 2015 7:03 PM
> >  Subject: Re: How to show some documents ahead of others
> >    
> > Hence the suggestion to group by the paid field - would give you two
> > lists of the number you ask for.
> > 
> > What I'm trying to say is that the QueryElevationComponent might do it,
> > but it is also relatively clunky, so a pure search solution might do it.
> > 
> > However, the thing we lack right now is a full take on the requirements,
> > e.g. how should paid results be sorted, how many paid results do you
> > show, etc, etc. Without these details we're all guessing.
> > 
> > Upayavira
> > 
> > 
> > On Thu, Oct 8, 2015, at 04:45 PM, Walter Underwood wrote:
> > > Sorting all paid above all unpaid will give bad results when there are
> > > many matches. It will show 1000 paid items, include all the barely
> > > relevant ones, before it shows the first highly relevant unpaid recipe.
> > > What if that was the only correct result?
> > > 
> > > Two approaches that work:
> > > 
> > > 1. Boost paid items using the “boost” parameter in edismax. Adjust it to
> > > be a tiebreaker between documents with similar score.
> > > 
> > > 2. Show two lists, one with the five most relevant paid, the next with
> > > the five most relevant unpaid.
> > > 
> > > wunder
> > > Walter Underwood
> > > wun...@wunderwood.org
> > > http://observer.wunderwood.org/  (my blog)
> > > 
> > > 
> > > > On Oct 8, 2015, at 7:39 AM, Alessandro Benedetti 
> > > >  wrote:
> > > > 
> > > > Is it possible to understand better this : "as it doesn't
> > > > allow any meaningful customization " ?
> > > > 
> > > > Cheers
> > > > 
> > > > On 8 October 2015 at 15:27, Andrea Roggerone wrote:
> > > > 
> > > >> Hi guys,
> > > >> I don't think that sorting is a good solution in this case as it 
> > > >> doesn't
> > > >> allow any 

Re: How to show some (paid) documents ahead of others (non-paid) - fantasy scenario

2015-10-11 Thread Mikhail Khludnev
On Sun, Oct 11, 2015 at 3:38 PM,  wrote:

> Hi,
> What if we write all paid results in a new, dedicated, core... let's call
> it: "PaidResultsCore" and lets call the non-paid results core:
> "NonPaidResultsCore"
> When a user asks for "red pepper" we first perform the query upon
> "PaidResultsCore" and get the first ranking 3 results and then we perform
> the query upon "NonPaidResultsCore" and get the first ranking 9 results.
> Then we mix them all together and deliver a 12 results page to the user.
>

you can experiment with sending =, or
similarly =.. see
https://cwiki.apache.org/confluence/display/solr/Advanced+Distributed+Request+Options
Also, . However, there is no precise control over relevance and merging;
fwiw it might be a handy extension for SolrCloud.


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: How to show some (paid) documents ahead of others (non-paid) - fantasy scenario

2015-10-11 Thread Alexandre Rafalovitch
What about Streaming Expressions? Could they be used here? Disclaimer:
I have not used them myself yet.

https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 11 October 2015 at 13:56, Upayavira  wrote:
> I think Walter suggested the simplest: make two requests. When you've
> got both results back, you can stick them together to make results.
>
> At present, there is no method to do multiple actions within a single
> request.
>
> Upayavira
>
> On Sun, Oct 11, 2015, at 01:38 PM, liviuchrist...@yahoo.com.INVALID
> wrote:
>> Hi,
>> What if we write all paid results in a new, dedicated, core... let's call
>> it: "PaidResultsCore" and lets call the non-paid results core:
>> "NonPaidResultsCore"
>> When a user asks for "red pepper" we first perform the query upon
>> "PaidResultsCore" and get the first ranking 3 results and then we perform
>> the query upon "NonPaidResultsCore" and get the first ranking 9 results.
>> Then we mix them all together and deliver a 12 results page to the user.
>>
>> Could that be achieved and how???
>> Thank you,Christian
>>  Christian Fotache Tel: 0728.297.207 Fax: 0351.411.570
>>   From: Upayavira 
>>  To: solr-user@lucene.apache.org
>>  Sent: Saturday, October 10, 2015 6:13 PM
>>  Subject: Re: How to show some documents ahead of others - requirements
>>
>> I've seen a similar requirement to this recently.
>>
>> Basically, a sorting requirement that is close to impossible to
>> implement as a scoring/boosting formula, because the *position* of the
>> result features in the score, and that's not something I believe can be
>> done right now.
>>
>> The way we solved the issue in the similar case I referred to above was
>> by using a RerankQuery. That query class has a getTopDocsCollector()
>> function, which you can override, providing your own Collector.
>>
>> If you then refer to your query (actually your query parser) with the
>> rerank query param in Solr, rq={!myRerankQuery}, it will trigger
>> your new collector. When its topDocs() method is called, the collector
>> will call topDocs() on its parent query, get a list of documents,
>> order them in whatever way you require, and return them in a
>> non-score order.
>>
>> Not sure I've made that very clear, but hope it helps a little.
>>
>> Upayavira
>>
>>
>>
>> On Sat, Oct 10, 2015, at 03:13 PM, liviuchrist...@yahoo.com.INVALID
>> wrote:
>> > Hi Upayavira & Walter & everyone else
>> >
>> > About the requirements:
>> > 1. I need to return no more than 3 paid results on a page of 12 results.
>> > 2. Paid results should be sorted like this: let's say a user is searching
>> > for "chocolate almonds cake". Now, let's say that 2000 results match the
>> > query and there are about 10 of these that are "paid results". I need to
>> > list the first 3 (1-2-3) of the paid results (in their ranking decreasing
>> > order) on the first page (maybe by improving the ranking of the 20 paid
>> > results over the non-paid ones and listing the first 3 of them) and then
>> > list 9 non-paid results on the page in their ranking decreasing order.
>> > Then, on the second page, I want to list first the next 3 paid results
>> > (4-5-6) and so on.
>> >
>> > Kind regards,Christian
>> >  Christian Fotache Tel: 0728.297.207
>> >
>> >  From: Upayavira 
>> >  To: solr-user@lucene.apache.org
>> >  Sent: Thursday, October 8, 2015 7:03 PM
>> >  Subject: Re: How to show some documents ahead of others
>> >
>> > Hence the suggestion to group by the paid field - would give you two
>> > lists of the number you ask for.
>> >
>> > What I'm trying to say is that the QueryElevationComponent might do it,
>> > but it is also relatively clunky, so a pure search solution might do it.
>> >
>> > However, the thing we lack right now is a full take on the requirements,
>> > e.g. how should paid results be sorted, how many paid results do you
>> > show, etc, etc. Without these details we're all guessing.
>> >
>> > Upayavira
>> >
>> >
>> > On Thu, Oct 8, 2015, at 04:45 PM, Walter Underwood wrote:
>> > > Sorting all paid above all unpaid will give bad results when there are
>> > > many matches. It will show 1000 paid items, include all the barely
>> > > relevant ones, before it shows the first highly relevant unpaid recipe.
>> > > What if that was the only correct result?
>> > >
>> > > Two approaches that work:
>> > >
>> > > 1. Boost paid items using the “boost” parameter in edismax. Adjust it to
>> > > be a tiebreaker between documents with similar score.
>> > >
>> > > 2. Show two lists, one with the five most relevant paid, the next with
>> > > the five most relevant unpaid.
>> > >
>> > > wunder
>> > > Walter Underwood
>> > > wun...@wunderwood.org
>> > > http://observer.wunderwood.org/  (my blog)
>> > >
>> > >
>> > > > On Oct 8, 2015, at 7:39 AM, Alessandro Benedetti 
>> > > > 

Re: admin-extra

2015-10-11 Thread Bill Au
admin-extra allows one to include additional links and/or information in
the Solr admin main page:

https://cwiki.apache.org/confluence/display/solr/Core-Specific+Tools

Bill

On Wed, Oct 7, 2015 at 5:40 PM, Upayavira  wrote:

> Do you use admin-extra within the admin UI?
>
> If so, please go to [1] and document your use case. The feature
> currently isn't implemented in the new admin UI, and without use-cases,
> it likely won't be - so if you want it in there, please help us
> understand how you use it!
>
> Thanks!
>
> Upayavira
>
> [1] https://issues.apache.org/jira/browse/SOLR-8140
>


Re: Solr cross core join special condition

2015-10-11 Thread Susheel Kumar
Yes, Ali.  These are targeted for Solr 6, but you have the option to download
the source from trunk, build it, and try out these features if that helps in the
meantime.

Thanks
Susheel

On Sun, Oct 11, 2015 at 10:01 AM, Ali Nazemian 
wrote:

> Dear Susheel,
> Hi,
>
> I did check the jira issue that you mentioned but it seems its target is
> Solr 6! Am I correct? The patch failed for Solr 5.3 due to class not found.
> For Solr 5.x should I try to implement something similar myself?
>
> Sincerely yours.
>
>
> On Wed, Oct 7, 2015 at 7:15 PM, Susheel Kumar 
> wrote:
>
> > You may want to take a look at new Solr feature of Streaming API &
> > Expressions
> > https://issues.apache.org/jira/browse/SOLR-7584?filter=12333278
> > for making joins between collections.
> >
> > On Wed, Oct 7, 2015 at 9:42 AM, Ryan Josal  wrote:
> >
> > > I developed a join transformer plugin that did that (although it didn't
> > > flatten the results like that).  The one thing that was painful about
> > > it is that the TextResponseWriter has references to both the IndexSchema
> > > and SolrReturnFields objects for the primary core.  So when you add a
> > > SolrDocument from another core it returned the wrong fields.  I worked
> > > around that by transforming the SolrDocument to a NamedList.  Then when
> > > it gets to processing the IndexableFields it uses the wrong IndexSchema, I
> > > worked around that by transforming each field to a hard Java object
> > > (through the IndexSchema and FieldType of the correct core).  I think
> > > it would be great to patch TextResponseWriter with multi core writing
> > > abilities, but there is one question, how can it tell which core a
> > > SolrDocument or IndexableField is from?  Seems we'd have to add an
> > > attribute for that.
> > >
> > > The other possibly simpler thing to do is execute the join at index
> > > time with an update processor.
> > >
> > > Ryan
> > >
> > > On Tuesday, October 6, 2015, Mikhail Khludnev
> > > <mkhlud...@griddynamics.com> wrote:
> > >
> > > > On Wed, Oct 7, 2015 at 7:05 AM, Ali Nazemian wrote:
> > > >
> > > > > it seems there is not any way to do that right now and it should
> > > > > be developed somehow. Am I right?
> > > > >
> > > >
> > > > yep
> > > >
> > > >
> > > > --
> > > > Sincerely yours
> > > > Mikhail Khludnev
> > > > Principal Engineer,
> > > > Grid Dynamics
> > > >
> > > > 
> > > > >
> > > >
> > >
> >
>
>
>
> --
> A.Nazemian
>


Re: Unexpected delayed document deletion with atomic updates

2015-10-11 Thread John Smith
Hi Alessandro,

In the example I set the value to 1, but it's actually incremented in
the code, so with time it should go up. You're right though, I could use
an inc update instead.
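
For reference, a small Python sketch of the difference between the two modifiers. It builds the update in Solr's JSON atomic-update form rather than the XML I posted, and `apply_atomic` is only a toy local simulation written for this message (hypothetical helper, not Solr code); the document id is made up, and treating a missing field as 0 for "inc" is an assumption of the sketch.

```python
import json


def click_update(doc_id, field="Clicks", amount=1, modifier="inc"):
    """Build one atomic-update document using the 'set' or 'inc' modifier."""
    if modifier not in ("set", "inc"):
        raise ValueError("modifier must be 'set' or 'inc'")
    return {"id": doc_id, field: {modifier: amount}}


def apply_atomic(stored, update):
    """Toy simulation of applying a set/inc update to a stored document."""
    result = dict(stored)
    for name, change in update.items():
        if name == "id":
            continue
        for modifier, operand in change.items():
            if modifier == "set":
                result[name] = operand  # overwrite whatever was stored
            elif modifier == "inc":
                # Assumption for this sketch: a missing field counts as 0.
                result[name] = result.get(name, 0) + operand
            else:
                raise ValueError("unsupported modifier: " + modifier)
    return result


# Body that would be POSTed to the update handler as JSON.
print(json.dumps([click_update("doc-1")]))

doc = {"id": "doc-1", "Clicks": 5}
for _ in range(3):
    doc = apply_atomic(doc, click_update("doc-1", modifier="inc"))
print(doc["Clicks"])
```

With "set" the client has to know the current count before sending; with "inc" it only sends the delta.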

John


On 08/10/15 16:45, Alessandro Benedetti wrote:
> Not related to the deletion problem, only as a curiosity for your use case :
>
> 1
>
> Have I misunderstood your use case, or should you use:
>
> inc
>
> Increments a numeric value by a specific amount.
>
> Must be specified as a single numeric value.
>
> Basically, every time you click, you always set the value for that field to
> "1".
> So a document with 1 click will be considered equal to one with 1000 clicks.
> My 2 cents
>
> Cheers
>
> On 8 October 2015 at 14:10, John Smith  wrote:
>
>> Well, every day we update a lot of documents (usually several millions)
>> so the DIH is a good fit.
>>
>> Calling the update chain would make sense there: after all a data import
>> is just a batch update. Otherwise, the same operations would have to be
>> made upfront, possibly in another environment and/or language. That's
>> probably what I'm gonna do anyway.
>>
>> Thanks for your help!
>> John
>>
>>
>> On 08/10/15 13:39, Upayavira wrote:
>>> You can either specify the update chain via an update.chain request
>>> parameter, or you can configure a new request handler with its own URL
>>> and separate update.chain value.
>>>
>>> I have no idea how you would then reference that in the DIH - I've never
>>> really used it.
>>>
>>> Upayavira
>>>
>>> On Thu, Oct 8, 2015, at 09:25 AM, John Smith wrote:
 After some further investigation, for those interested: the
 SignatureUpdateProcessorFactory fields were somehow mis-configured (I
 guess copied over from another collection). The initial import had been
 made using a data import handler: I suppose the update chain isn't
 called in this process and no signature field is created - am I right?

 The first time a document was updated, a signature field with value
 "" was added. The next time, the same signature was
 generated for the new update, which triggered the deletion of all
 documents with the same signature (i.e. the first one) as overwriteDupes
 was set to true. Correct behavior but quite tricky...

 So my conclusion here (please correct me if I'm wrong) is of course to
 fix the signature configuration problem, but also to manage calling the
 update chain (or maybe a simplified one, e.g. by skipping logging) in
 the data import handler. Is there an easy way to do this? Conceptually,
 shouldn't the update chain be callable from the data import process -
 maybe it is?

 John


 On 08/10/15 09:43, Upayavira wrote:
> Yay!
>
> On Thu, Oct 8, 2015, at 08:38 AM, John Smith wrote:
>> Yes indeed, the update chain had been activated... I commented it out
>> again and the problem vanished.
>>
>> Good job, thanks Erick and Upayavira!
>> John
>>
>>
>> On 08/10/15 08:58, Upayavira wrote:
>>> Look for the DedupUpdateProcessor in an update chain.
>>>
>>> that is there, but commented out IIRC in the techproducts sample
>>> configs.
>>>
>>> Perhaps you uncommented it to use your own update processors, but
>> didn't
>>> remove that component?
>>>
>>> On Thu, Oct 8, 2015, at 07:38 AM, John Smith wrote:
 Oh, I forgot Erick's mention of the logs: there's nothing unusual in
 INFO level, the update request just gets mentioned. No exception. I
 reran it with the DEBUG level, but most of the log was related to
>> jetty.
 Here's a line I noticed though:

 org.apache.solr.servlet.HttpSolrCall; Closing out SolrRequest:
 {wt=json=true=dedupe}

 The update.chain parameter wasn't part of the original request, and
 "dedupe" looks suspicious to me. Perhaps should I investigate
>> further
 there?

 Thanks,
 John.


 On 08/10/15 08:25, John Smith wrote:
> The ids are all different: they're unique numbers followed by a
>> couple
> of keywords. I've made a test with a small collection of 10
>> documents to
> make sure I can manage them manually: all ids are confirmed as
>> different.
> I also dumped the exact command, here's one example:
>
> 101084385_Sebago_ sebago
>> shoes name="Clicks" update="set">1 update="set">1.8701925463775
>
> It's sent as the body of a POST request to
> http://127.0.0.1:8080/solr/ato_test/update?wt=json=true,
>> with a
> Content-Type: text/xml header. I still noted the consistent loss of
> another document with the update above.
>
> John
>
>
> On 08/10/15 00:38, Upayavira wrote:
>> What ID are you using? Are you possibly using the same ID field
>> 

Indexing logs when using post.jar

2015-10-11 Thread Zheng Lin Edwin Yeo
Hi,

I am using Solr 5.3.0, and I would like to find out: are the logs for
indexing with post.jar stored anywhere in Solr?

I need to know which files have been successfully indexed and which
have not, so that I can re-run the indexing for the files that were not
indexed successfully for various reasons.

Thank you.

Regards,
Edwin