Re: Why do I get different results for the same query with two Solr versions?

2021-01-04 Thread nettadalet
Tulsi wrote
> Can you post the managed schema and solrconfig content here ?

Schema for the 4.6 index (I omitted all non-relevant data):


Schema for the 7.5 index (I omitted all non-relevant data):

About the solrconfig.xml file - I don't think I can share it because it may
contain sensitive information. Is there something specific from this file
that may be relevant for our discussion?


Tulsi wrote
> Do try the solr admin analysis screen
> once as well to see the behaviour for this field.
> https://lucene.apache.org/solr/guide/7_6/index.html

I looked at the analysis screen, but it wasn't helpful. That's why I started
using the "debug=query" parameter and the content of parsedquery.



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Why do I get different results for the same query with two Solr versions?

2020-12-29 Thread Tulsi Das
Can you post the managed schema and solrconfig content here?

Also try the Solr Admin analysis screen once to see the behaviour for this field.

https://lucene.apache.org/solr/guide/7_6/index.html



Re: Re:Re: Why do I get different results for the same query with two Solr versions?

2020-12-29 Thread nettadalet
Hi,
thanks for the comment, but I tried both "sow=false" and "sow=true"
and I still get the same result. For the query (TITLE_ItemCode_t:KI_7) I still
see:
Solr 4.6: "parsedquery": "PhraseQuery(TITLE_ItemCode_t:\"ki 7\")"
Solr 7.5: "parsedquery":"+(+(TITLE_ItemCode_t:ki7 (+TITLE_ItemCode_t:ki +TITLE_ItemCode_t:7)))"



Re:Re: Re:Re: Why do I get different results for the same query with two Solr versions?

2020-12-28 Thread xiefengchang
sow defaults to false?
But this seems to be true, right?
For Solr 7.5 I get
"parsedquery":"+(+(text1:ki7 (+text1:ki +text1:7)))"


Re: Re:Re: Why do I get different results for the same query with two Solr versions?

2020-12-27 Thread Tulsi Das
Hi,
Yes, this looks related to the change in the default behaviour of the sow
(split on whitespace) parameter in Solr 7.

The sow parameter (short for "Split on Whitespace") now defaults to
false, which allows support for multi-word synonyms out of the box.
This parameter is used with the eDismax and standard/"lucene" query
parsers. If this parameter is not explicitly specified as true, query
text will not be split on whitespace before analysis.

https://lucene.apache.org/solr/guide/7_0/major-changes-in-solr-7.html
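To compare the two versions on equal terms, it can help to pin every relevant parameter in the request instead of relying on handler defaults. A minimal Python sketch (standard library only) that builds a /select URL with defType, sow and debug set explicitly; the host, port and core name are hypothetical placeholders, not values from this thread.

```python
from urllib.parse import urlencode


def build_select_url(base: str, core: str, q: str, sow: bool,
                     def_type: str = "lucene") -> str:
    """Build a /select URL with the query parser and sow set explicitly."""
    params = {
        "q": q,
        "defType": def_type,                # don't rely on the handler's default parser
        "sow": "true" if sow else "false",  # Solr 7 changed the default to false
        "debug": "query",                   # return the parsed query for inspection
    }
    return f"{base}/solr/{core}/select?{urlencode(params)}"


# Hypothetical host and core name; the query is the one discussed in this thread.
url = build_select_url("http://localhost:8983", "mycore", "TITLE_ItemCode_t:KI_7", sow=True)
print(url)
```

Per the follow-up in this thread, setting sow either way did not change the parsed query for TITLE_ItemCode_t:KI_7, so a request like this mainly helps rule the setting out.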



Re: Re:Re: Why do I get different results for the same query with two Solr versions?

2020-12-27 Thread nettadalet
I added "defType=lucene" to both searches to make sure I use the same query
parser, but it didn't change the results.



Re: Re:Re: Why do I get different results for the same query with two Solr versions?

2020-12-27 Thread nettadalet
I'm not sure how to check the implementation of the query parser, or how to
change the query parser that I use. I think I'm using the standard query
parser.

I use Solr Admin to run the queries. If I look at the URL, I see
Solr 4.6:
select?q=TITLE_ItemCode_t:KI_7&fl=TITLE_ItemCode_t
Solr 7.5:
select?q=TITLE_ItemCode_t:KI_7&fl=TITLE_ItemCode_t

Should I change something?
Where should I look?



Re:Re: Why do I get different results for the same query with two Solr versions?

2020-12-27 Thread xiefengchang
Which query parser are you using? I think that to answer your question, you need to
check the implementation of the query parser.


Re: Why do I get different results for the same query with two Solr versions?

2020-12-27 Thread nettadalet
Thank you, that was helpful!

For Solr 4.6 I get
"parsedquery": "PhraseQuery(TITLE_ItemCode_t:\"ki 7\")"

For Solr 7.5 I get
"parsedquery":"+(+(TITLE_ItemCode_t:ki7 (+TITLE_ItemCode_t:ki +TITLE_ItemCode_t:7)))"

So this is the cause of the difference in the search result, but I still
don't know why the parsedquery is different between the two versions.
Any idea/guess?
Is it some internal implementation that changed sometime between 4.6 and
7.5?



Re: Why do I get different results for the same query with two Solr versions?

2020-12-24 Thread Tulsi Das
Hi,
Try adding debug=true or debug=query to the URL and look at the parsed query at
the end of the response.
That should show you why the results are different.
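With debug=query, the parsed query comes back in the "debug" section of the response. A minimal Python sketch that pulls it out of a JSON response body; the two sample responses are abbreviated (real ones contain more keys) and hold only the parsedquery values reported later in this thread.

```python
import json


def parsed_query(raw_json: str) -> str:
    """Return debug.parsedquery from a Solr JSON response.

    The key is only present when the request included debug=query (or debug=true).
    """
    body = json.loads(raw_json)
    return body["debug"]["parsedquery"]


# Abbreviated responses carrying the parsedquery values quoted in this thread.
resp_46 = json.dumps({"debug": {"parsedquery": 'PhraseQuery(TITLE_ItemCode_t:"ki 7")'}})
resp_75 = json.dumps({"debug": {"parsedquery":
    "+(+(TITLE_ItemCode_t:ki7 (+TITLE_ItemCode_t:ki +TITLE_ItemCode_t:7)))"}})

for label, raw in (("4.6", resp_46), ("7.5", resp_75)):
    print(label, parsed_query(raw))
```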


On Thu, 24 Dec, 2020, 8:05 pm nettadalet,  wrote:

> Hello,
>
> I have the same field type defined in Solr 4.6 and Solr 7.5. When I
> search with both versions, I get different results, and I don't know why.
>
> I have the following *field type definition in Solr 4.6*:
>  positionIncrementGap="1000">
> 
> 
> 
>  words="stopwords.txt" />
>  generateWordParts="1"
> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="0"/>
> 
> 
> 
> 
> 
>  synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
>  ignoreCase="true"
> words="stopwords.txt"
> />
>  generateWordParts="1"
> generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> catenateAll="0" splitOnCaseChange="0"/>
> 
> 
> 
>
>
> I have the following *field type definition in Solr 7.5*:
>  positionIncrementGap="1000">
> 
> 
> 
>  words="stopwords.txt" />
>  generateWordParts="1"
> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="0"/>
> 
> 
> 
> 
> 
> 
>  synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
> ignoreCase="true"
>words="stopwords.txt"
>/>
>  generateWordParts="1"
> generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> catenateAll="0" splitOnCaseChange="0"/>
> 
> 
> 
>
> * I tried to use solr.WordDelimiterFilterFactory with Solr 7.5 instead of
> solr.WordDelimiterGraphFilterFactory so the field types will be more alike,
> but the result was the same.
>
> I have the following *6 values set for field text1 of type text_type1 for 6
> different documents* (the type(s) from above):
> KI_d5e7b43a
> KI_b7c490bd
> KI_7df2f026
> KI_fa7d129d
> KI_5867aec7
> KI_7c3c0b93
>
>
> My query is *text1:KI_7*.
> Using Solr 4.6, I get 2 results - KI_7df2f026, KI_7c3c0b93
> Using Solr 7.5, I get all 6 results.
>
> Questions:
> 1. How come I get different results with the same data, when my field
> definitions are the same (as far as I can tell)?
>
> 2. What are the expected results?
> I think that the results Solr 7.5 returns are the correct ones, since at
> the end of the analysis I get *ki* as a term and *7* as a term, both
> during the indexing analysis and the query analysis, so, to my
> understanding, all 6 results should be found.
> Is this correct? If not, what am I missing? What don't I understand
> correctly?
>
> I would very much appreciate a full/partial answer, but even a link that
> could explain at least the expected results part would be great.
>
> Thanks in advance, I know this might be a tough one to answer [Hope not
> :)]
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Why do I get different results for the same query with two Solr versions?

2020-12-24 Thread nettadalet
Hello,

I have the same field type defined in Solr 4.6 and Solr 7.5. When I
search with both versions, I get different results, and I don't know why.

I have the following *field type definition in Solr 4.6*:

I have the following *field type definition in Solr 7.5*:

* I tried to use solr.WordDelimiterFilterFactory with Solr 7.5 instead of
solr.WordDelimiterGraphFilterFactory so the field types will be more alike,
but the result was the same.

I have the following *6 values set for field text1 of type text_type1 for 6
different documents* (the type(s) from above):
KI_d5e7b43a
KI_b7c490bd
KI_7df2f026
KI_fa7d129d
KI_5867aec7
KI_7c3c0b93


My query is *text1:KI_7*.
Using Solr 4.6, I get 2 results - KI_7df2f026, KI_7c3c0b93
Using Solr 7.5, I get all 6 results.

Questions:
1. How come I get different results with the same data, when my field
definitions are the same (as far as I can tell)?

2. What are the expected results?
I think that the results Solr 7.5 returns are the correct ones, since at the
end of the analysis I get *ki* as a term and *7* as a term, both
during the indexing analysis and the query analysis, so, to my
understanding, all 6 results should be found.
Is this correct? If not, what am I missing? What don't I understand
correctly?
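The two parsed queries reported in this thread behave differently on these six values, and a toy model reproduces both result counts. A Python sketch, assuming the word delimiter splits on "_" and on letter/digit transitions and that a lowercase filter is in the chain (the lowercase terms in parsedquery suggest one). It is not Solr's analyzer: stopwords, synonyms and the catenated "ki7" branch are ignored. It only compares the 4.6 PhraseQuery, which needs "ki" immediately followed by "7", with the (+ki +7) branch of the 7.5 query, which needs both terms anywhere in the value.

```python
import re
from typing import List

VALUES = ["KI_d5e7b43a", "KI_b7c490bd", "KI_7df2f026",
          "KI_fa7d129d", "KI_5867aec7", "KI_7c3c0b93"]


def toy_analyze(value: str) -> List[str]:
    """Rough stand-in for a word delimiter plus lowercasing: split on '_'
    and on letter/digit transitions. Not Solr's actual analysis chain."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z]+|[0-9]+", value)]


def phrase_match(tokens: List[str], phrase: List[str]) -> bool:
    """Terms must be adjacent and in order, like PhraseQuery("ki 7")."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def all_terms_match(tokens: List[str], terms: List[str]) -> bool:
    """Terms may appear anywhere, like (+ki +7)."""
    return all(term in tokens for term in terms)


query = toy_analyze("KI_7")  # ['ki', '7']
phrase_hits = [v for v in VALUES if phrase_match(toy_analyze(v), query)]
and_hits = [v for v in VALUES if all_terms_match(toy_analyze(v), query)]
print(phrase_hits)    # → ['KI_7df2f026', 'KI_7c3c0b93']
print(len(and_hits))  # → 6
```

Every value contains a standalone "7" somewhere after splitting, so the unordered form matches all six, while only two values have "7" directly after "KI".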

I would very much appreciate a full/partial answer, but even a link that
could explain at least the expected results part would be great. 

Thanks in advance, I know this might be a tough one to answer [Hope not  :)]



Re: different results in numFound vs using the cursor

2019-11-12 Thread rhys J
> : I am going to adjust my schema, re-index, and try again. See if that
> : doesn't fix this problem. I didn't know that having the uniqueKey be a
> : textField was a bad idea.
>
>
> https://lucene.apache.org/solr/guide/8_3/other-schema-elements.html#OtherSchemaElements-UniqueKey
>
> "The fieldType of uniqueKey must not be analyzed"
>
> (hence my comment about "possible, but hard to get right" ... you can use
> something like the KeywordTokenizer, but at that point you might as well
> use StrField except in some really esoteric special situations)
>
>
Good news. I added a field called ID, and made it string. Then I deleted
documents, re-indexed my data, and tried the search again.

Now solrResults size and numFound size are exactly the same.

Thanks for your help.

Rhys


Re: different results in numFound vs using the cursor

2019-11-12 Thread Chris Hostetter


: > whoa... that's not normal .. what *exactly* does the fieldType declaration
: > (with all analyzers) look like, and what does the  declaration
: > look like?
: >
: >

NOTE: "text_general" != "text_gen_sort"

Assuming your "text_general" declaration looks like it does in the
_default config set, then using that for uniqueKey or sorting is definitely
not a good idea.

If you were *actually* using SortableTextField for your uniqueKey field ...
well, that should be ok to *sort* on, but I still wouldn't suggest using
it as a uniqueKey field ... honestly, I'm not sure what behavior that might
have with things like deleteById, etc...


: I am going to adjust my schema, re-index, and try again. See if that
: doesn't fix this problem. I didn't know that having the uniqueKey be a
: textField was a bad idea.

https://lucene.apache.org/solr/guide/8_3/other-schema-elements.html#OtherSchemaElements-UniqueKey

"The fieldType of uniqueKey must not be analyzed"

(hence my comment about "possible, but hard to get right" ... you can use
something like the KeywordTokenizer, but at that point you might as well
use StrField except in some really esoteric special situations)



-Hoss
http://www.lucidworks.com/


Re: different results in numFound vs using the cursor

2019-11-12 Thread rhys J
On Tue, Nov 12, 2019 at 12:18 PM Chris Hostetter 
wrote:

>
> : > a) What is the fieldType of the uniqueKey field in use?
> : >
> :
> : It is a textField
>
> whoa... that's not normal .. what *exactly* does the fieldType declaration
> (with all analyzers) look like, and what does the  declaration
> look like?
>
>



> you should really never use TextField for a uniqueKey ... it's possible,
> but incredibly tricky to get "right".
>
>
I am going to adjust my schema, re-index, and try again. See if that
doesn't fix this problem. I didn't know that having the uniqueKey be a
textField was a bad idea.


> Independent from that, "sorting" on a TextField doesn't always do what you
> might think (again: depending on the analysis in use)
>
> With a cursorMark you have other factors to consider: I bet what's
> happening is that the post-analysis terms for your docs result in
> duplicate values, so the cursorMark is skipping all docs that have the
> same (post analysis) sort value ... this could also manifest itself in
> other weird ways, like trying to deleteById.
>
> Step #1: switch to using a simple StrField for your uniqueKey field and
> see if that solves all your problems.
>
>
Thanks, doing this now.

Rhys


Re: different results in numFound vs using the cursor

2019-11-12 Thread Chris Hostetter


: > a) What is the fieldType of the uniqueKey field in use?
: >
: 
: It is a textField

whoa... that's not normal .. what *exactly* does the fieldType declaration 
(with all analyzers) look like, and what does the  declaration 
look like?

you should really never use TextField for a uniqueKey ... it's possible, 
but incredibly tricky to get "right".

Independent from that, "sorting" on a TextField doesn't always do what you 
might think (again: depending on the analysis in use)

With a cursorMark you have other factors to consider: I bet what's
happening is that the post-analysis terms for your docs result in
duplicate values, so the cursorMark is skipping all docs that have the
same (post analysis) sort value ... this could also manifest itself in
other weird ways, like trying to deleteById.

Step #1: switch to using a simple StrField for your uniqueKey field and
see if that solves all your problems.
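The skipping effect described above can be shown with a toy model. A Python sketch, assuming the cursor only remembers the post-analysis sort value of the last document on a page, so the next page starts strictly after that value; this is an illustration, not Solr's cursor implementation. The "analysis" that keeps only the part before "-" is hypothetical and stands in for any analyzer that maps distinct ids to the same term.

```python
from typing import Callable, List, Optional


def cursor_walk(ids: List[str], sort_value: Callable[[str], str], rows: int) -> List[str]:
    """Toy cursor paging: each page starts strictly after the last sort value seen."""
    ordered = sorted(ids, key=sort_value)
    seen: List[str] = []
    cursor: Optional[str] = None  # stands in for cursorMark=*
    while True:
        page = [d for d in ordered if cursor is None or sort_value(d) > cursor][:rows]
        if not page:
            return seen
        seen.extend(page)
        cursor = sort_value(page[-1])


ids = ["A-1", "A-2", "A-3", "B-1", "C-1"]
exact = cursor_walk(ids, lambda s: s, rows=2)                           # StrField-like: unique values
analyzed = cursor_walk(ids, lambda s: s.split("-")[0].lower(), rows=2)  # hypothetical analysis
print(len(exact), len(analyzed))  # → 5 4
```

A-3 is never returned because it shares the sort value "a" with A-2, the last document of the first page, which is the same kind of gap as 22006 docs returned out of 35008 found.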


-Hoss
http://www.lucidworks.com/


Re: different results in numFound vs using the cursor

2019-11-12 Thread rhys J
On Mon, Nov 11, 2019 at 8:32 PM Chris Hostetter 
wrote:

>
> Based on the info provided, it's hard to be certain, but reading between
> the lines, here are the assumptions I'm making...
>
> 1) your core name is "dbtr"
> 2) the uniqueId field for the "dbtr" core is "debtor_id"
>
> ..are those assumptions correct?
>

Yes they are. Sorry I didn't provide that from the beginning.


> Two key pieces of information that don't seem to be assumable from the
> info you've provided:
>
> a) What is the fieldType of the uniqueKey field in use?
>

It is a textField


> b) how are you determining that "The numFound: 35008"
>
>
I do a preliminary query to the solr core and print out the numFound from
this:

 my $solrResponse = $ua->post( $solrURI );

 my $decoded = decode_json( $solrResponse->{_content} );
 my $numFound = $decoded->{response}{numFound};


> ...
>
> You show the code that prints out "size of solrResults: 22006" but nothing
> in your code ever prints $numFound. There is a snippet of code at the top
>

I am printing numFound every time it loops. This should remain constant,
because it is the total of all documents found. It's not really necessary
that I am printing it.

The number of docs is the size that I also print, and that is 1000 every
time, until the last little bit, and then it is 6 docs found.


> of your perl logic that seems disconnected from the rest of the code which
> makes me think that before you do anything with a cursor you are already
> parsing some *other* query response to get $numFound that way...
>
>
I am running this query first, to get the cursor set:

"http://10.40.10.14:8983/solr/debt/select?indent=on&rows=1000&sort=id asc&q=debt_id: 608384 OR debt_id: 393291&cursorMark=*"

This sets the cursor, and then returns a cursorMark that I start using in
order to grab 1000 documents at a time.



> ...what exactly does all the code *before* this look like? what is the
> request that you are using to get that initial '$solrResponse' that you
> are parsing to extract '$numFound'  are you sure it's exactly the same as
> the query whose cursor you are iterating over?
>
>
query from before the loop:

"http://10.40.10.14:8983/solr/debt/select?indent=on&rows=1000&sort=id asc&q=debt_id: 608384 OR debt_id: 393291&cursorMark=*"

query in the loop:

http://10.40.10.14:8983/solr/debt/select?indent=on&rows=1000&sort=id+asc&q=debt_id: 608384 OR debt_id: 393291&cursorMark=AoElMTg1MzE=

I do have some logic to make sure I grab the first 1000 from the first
query, but other than that, it's a simple loop.


> It looks like you are (also) extracting 'my $numFound =
> $decoded->{response}{numFound};' on every (cursor) request ... what do you
> get if you add this to your cursor loop...
>
>print STDERR "numFound = $numFound at '$cursor'";
>

numFound is always 35008 because that is how many total documents are
found. The number of docs in the response is the number that I care about,
because that shows me how many came back for this slice.


> ...because unless documents are being added/deleted as you iterate over
> the cursor, the numFound value should be consistent on each request.
>
>
numFound is consistently 35008.

Thanks

Rhys


Re: different results in numFound vs using the cursor

2019-11-11 Thread Chris Hostetter


Based on the info provided, it's hard to be certain, but reading between
the lines, here are the assumptions I'm making...

1) your core name is "dbtr"
2) the uniqueId field for the "dbtr" core is "debtor_id"

..are those assumptions correct?

Two key pieces of information that don't seem to be assumable from the
info you've provided:

a) What is the fieldType of the uniqueKey field in use?
b) how are you determining that "The numFound: 35008"

...

You show the code that prints out "size of solrResults: 22006" but nothing 
in your code ever prints $numFound. There is a snippet of code at the top
of your perl logic that seems disconnected from the rest of the code which 
makes me think that before you do anything with a cursor you are already 
parsing some *other* query response to get $numFound that way...

: i am using this logic in perl:
: 
: my $decoded = decode_json( $solrResponse->{_content} );
: my $numFound = $decoded->{response}{numFound};
: 
: $cursor = "*";
: $prevCursor = '';
: 
: while ( $prevCursor ne $cursor )
: {
:   my $solrURI = "\"http://[SOLR URL]:8983/solr/";
:   $solrURI .= $fdat{core};
...

...what exactly does all the code *before* this look like? What is the
request that you are using to get that initial '$solrResponse' that you
are parsing to extract '$numFound'? Are you sure it's exactly the same as
the query whose cursor you are iterating over?

It looks like you are (also) extracting 'my $numFound =
$decoded->{response}{numFound};' on every (cursor) request ... what do you
get if you add this to your cursor loop...

   print STDERR "numFound = $numFound at '$cursor'";


...because unless documents are being added/deleted as you iterate over
the cursor, the numFound value should be consistent on each request.
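The loop pattern and the numFound check can be sketched together. A Python sketch of cursorMark paging that stops when nextCursorMark comes back unchanged and fails loudly if numFound moves between requests. fetch_page is a hypothetical callable standing in for whatever HTTP client sends the same q and sort (the sort must include the uniqueKey) with the given cursorMark; the PAGES dict is fake data, not a real Solr response.

```python
from typing import Any, Callable, Dict, List


def fetch_all_ids(fetch_page: Callable[[str], Dict[str, Any]], id_field: str) -> List[Any]:
    """Walk a result set with cursorMark, checking that numFound stays constant.

    fetch_page(cursor_mark) is hypothetical: it should send the same q and sort
    with cursorMark=cursor_mark and return the decoded JSON response.
    """
    ids: List[Any] = []
    cursor = "*"  # the first request always uses cursorMark=*
    num_found = None
    while True:
        resp = fetch_page(cursor)
        found = resp["response"]["numFound"]
        if num_found is None:
            num_found = found
        elif found != num_found:  # the index changed while iterating
            raise RuntimeError(f"numFound changed from {num_found} to {found}")
        ids.extend(doc[id_field] for doc in resp["response"]["docs"])
        next_cursor = resp["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged cursor means nothing is left
            return ids
        cursor = next_cursor


# Fake pages standing in for Solr responses (hypothetical data: 5 docs, rows=2).
PAGES = {
    "*":  {"response": {"numFound": 5, "docs": [{"id": "1"}, {"id": "2"}]}, "nextCursorMark": "c1"},
    "c1": {"response": {"numFound": 5, "docs": [{"id": "3"}, {"id": "4"}]}, "nextCursorMark": "c2"},
    "c2": {"response": {"numFound": 5, "docs": [{"id": "5"}]}, "nextCursorMark": "c3"},
    "c3": {"response": {"numFound": 5, "docs": []}, "nextCursorMark": "c3"},
}
ids = fetch_all_ids(PAGES.__getitem__, "id")
print(len(ids))  # → 5
```

With a well-behaved uniqueKey, len(ids) should equal numFound once the loop ends; a shortfall with a constant numFound points at the sort field rather than at index churn.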


-Hoss
http://www.lucidworks.com/


different results in numFound vs using the cursor

2019-11-11 Thread rhys J
I am using this logic in Perl:

my $decoded = decode_json( $solrResponse->{_content} );
my $numFound = $decoded->{response}{numFound};

$cursor = "*";
$prevCursor = '';

while ( $prevCursor ne $cursor )
{
    my $solrURI = "\"http://[SOLR URL]:8983/solr/";
    $solrURI .= $fdat{core};

    $solrSort = ( $fdat{core} eq 'dbtr' ) ? "debtor_id+asc" : "id+asc";
    $solrOptions = "/select?indent=on&rows=$getrows&sort=$solrSort&q=";
    $solrURI .= $solrOptions;
    $solrURI .= $query;

    $solrURI .= ( $prevCursor eq '' ) ? "&cursorMark=*\"" :
        "&cursorMark=$cursor\"";

    print STDERR "solrURI '$solrURI'\n";
    my $solrResponse = $ua->post( $solrURI );
    my $decoded = decode_json( $solrResponse->{_content} );
    my $numFound = $decoded->{response}{numFound};

    foreach my $d ( $decoded->{response}{docs} )
    {
        my @docs = @$d;
        print STDERR "size of docs '" . scalar( @docs ) . "'\n";
        foreach my $r ( @docs )
        {
            if ( $fdat{cust_num} and $fdat{core} eq 'dbtr' )
            {
                push ( @solrResults, $r->{debtor_id} );
            }
            elsif ( $fdat{cust_num} and $fdat{core} eq 'debt' )
            {
                push ( @solrResults, $r->{debt_id} );
            }
        }
    }
    $prevCursor = ( $prevCursor eq '' ) ? "*" : $cursor;
    $cursor = $decoded->{nextCursorMark};
    print STDERR "cursor '$cursor'\n";
    print STDERR "prevCursor '$prevCursor'\n";
    print STDERR "size of solrResults '" . scalar( @solrResults ) . "'\n";
}

print out:

http://[SOLR URL]:8983/solr/debt/select?indent=on&rows=1000&sort=id+asc&q=debt_id: 608384 OR debt_id: 393291&cursorMark=AoEmMzkzMjkx

The numFound: 35008
final size of solrResults: 22006

Am I missing something I should be using with cursorMark? Or is this
expected?

I've checked my logic, and I'm using the cursors the way this page is using
them in examples:

https://lucene.apache.org/solr/guide/6_6/pagination-of-results.html

Thanks

Rhys


Re: Different results due to sharding and problems with interesting terms in MLT

2019-09-27 Thread Lucky Sharma
Hi Salman,

1. For the first issue:
 One suggestion could be: don't create [@, ., -, _, +, #, *] as
individual tokens. I guess you need to update your tokenizer in that case.

2. For the second issue: is the score of both results the same? If the
scores are the same and the queries are the same, then the reason would be
the Lucene doc ID. I have also observed the same thing in Solr 7.6.0, and the
reason there was that the docID for the same doc could be different on the
two nodes. So to get the same record order, you can add "id desc" as the
very last sort criterion.
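The tie-break in point 2 amounts to a two-key sort. A minimal Python sketch, assuming the uniqueKey is called id (in Solr terms, something like sort=score desc,id desc); the document ids and scores below are hypothetical, not taken from this thread.

```python
from typing import List, Tuple

Hit = Tuple[str, float]  # (id, score); hypothetical values


def order_hits(hits: List[Hit]) -> List[str]:
    """Sort by score descending, breaking ties by id descending
    (the analogue of sort=score desc,id desc)."""
    by_id = sorted(hits, key=lambda h: h[0], reverse=True)  # secondary key first
    # Python's sort is stable, so equal scores keep the id order from the previous pass.
    return [h[0] for h in sorted(by_id, key=lambda h: h[1], reverse=True)]


# The same two equal-score docs arriving in different internal order on two replicas.
replica_a = [("doc2", 1.5), ("doc1", 1.5), ("doc3", 0.9)]
replica_b = [("doc1", 1.5), ("doc2", 1.5), ("doc3", 0.9)]
print(order_hits(replica_a) == order_hits(replica_b))  # → True
```

Without the second key, the order of doc1 and doc2 would depend on whichever internal order each node happened to produce.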

Regards,
Lucky Sharma



Different results due to sharding and problems with interesting terms in MLT

2019-09-27 Thread Salmaan Rashid Syed
Hi Solr Users,

I have two questions,

1) I am working on Solr 7.6 and I have incorporated the MLT feature into it. I
need to allow users to search on emails and skills, so I have allowed a few
of the special characters such as [@, ., -, _, +, #, *]. I am not using a
stemmer because it removes the letter "s" from many useful words, e.g.
"AngularJS" becomes "AngularJ".

Now when I enter processed text as a query in the search bar, I usually get "."
as the "*most interesting term*", with the highest boost. I
can't figure out how to remove it from the interesting terms without removing
it from the field I am searching in.

2) I have 2 shards per collection on two nodes (8983 and 7574) in cloud
mode. I am getting different results for the same query.

I have learned from forums and the documentation that this is
happening because of sharding: stats are calculated on each individual
shard rather than on the entire collection. So I implemented one of the
solutions mentioned in the forums/documentation in solrconfig.xml as follows,



It still doesn't work and gives different results for the same query. Please
let me know what can be done to avoid these issues.

Regards,
Salmaan


Re:Solr query fetching different results

2019-09-19 Thread Ramsey Haddad (BLOOMBERG/ LONDON)
Your query seems simple enough that this may not be your issue, but just 
mentioning it:

Your collection has 1 shard. Depending on how the query is sent, queries to 1 
shard collections can sometimes get interpreted as a "distributed query" and 
sometimes as a "non-distributed query". These have different code paths that 
should *in theory* give identical results. When we made some code extensions to 
Solr in our private plugins, we decided not to support both code paths and so 
instead we use shortCircuit=false (we sent this in the config of our 
) to force use of the distributed query code path. (We want our 
change to work for both our 60 shard collection and our 1 shard collection.) 
This gives us more consistent results from different ways of invoking the 
search.

But, again, your query seems too simple for this to be the cause -- why would 
the distributed vs non-distributed return different results for this??



Re: Solr query fetching different results

2019-09-19 Thread Erick Erickson
Multiple replicas of the same shard will execute their autocommits at
different wall clock times. Thus there may be a _temporary_ window when a
newly-indexed document is found by a query that happens to get served by
replica1 but not by replica2. If you have a timestamp in the doc, and
a soft commit interval of, say, 1 minute, you can test whether this is
the case by adding &fq=timestamp:[* TO NOW-2MINUTE]. In that case you
should see identical results.

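The test described above can be sketched as a query string that adds the settling filter. A minimal Python sketch, assuming the documents carry a date field named timestamp (as in the suggestion above) and reusing the category_id query from this thread; the function name and defaults are illustrative.

```python
from urllib.parse import urlencode, parse_qs


def settled_query(q: str, ts_field: str = "timestamp", cutoff: str = "NOW-2MINUTE") -> str:
    """Encode q plus a filter that drops docs indexed after the cutoff.

    If the soft commit interval is shorter than the cutoff window, every replica
    should already have opened a searcher that includes the remaining docs.
    """
    return urlencode({"q": q, "fq": f"{ts_field}:[* TO {cutoff}]"})


qs = settled_query("category_id:5a0aeaeea6bc7239cc21ee39")
print(qs)
print(parse_qs(qs)["fq"])  # → ['timestamp:[* TO NOW-2MINUTE]']
```

If both replicas return the same hits with this filter but differ without it, commit timing is the likely explanation rather than the query itself.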
Best,
Erick



Solr query fetching different results

2019-09-18 Thread Jayadevan Maymala
Hi all,

There is something "strange" happening in our Solr cluster. If I execute a
query from the server via the Solarium client, I get one result. If I execute
the same or a similar query from the Admin Panel, I get another result. If I go
to Admin Panel > Collections, select the collection, click "Reload" and
then repeat the query, the result I get is consistent with the one I get
from the server via the Solarium client. So I pulled the executed queries
from the Solr logs. Evidently, the queries were going to different nodes.

The query sent from the Admin Panel went to node 4 and fetched 0 documents:
2019-09-19 05:02:04.549 INFO  (qtp434091818-205178)
[c:paymetryproducts s:shard1 r:*core_node4*
x:paymetryproducts_shard1_replica_n2] o.a.s.c.S.Request
[paymetryproducts_shard1_replica_n2]  webapp=/solr path=/select
params={q=category_id:5a0aeaeea6bc7239cc21ee39&_=1568868718031} *hits=0*
status=0 QTime=0


The query sent from the Solarium client running on a server went to node 3
and fetched 4 documents:

2019-09-19 05:06:41.511 INFO  (qtp434091818-17)
[c:paymetryproducts s:shard1 r:*core_node3*
x:paymetryproducts_shard1_replica_n1] o.a.s.c.S.Request
[paymetryproducts_shard1_replica_n1]  webapp=/solr path=/select
params={q=category_id:5a0aeaeea6bc7239cc21ee39&json.nl=flat&omitHeader=true&fl=ID&start=0&rows=90&wt=json}
*hits=4* status=0 QTime=104
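To compare entries like these two side by side, one option is to pull the replica, core, params and hit count out of each request-log line. This is a minimal Python sketch. It assumes each entry is a single line, as Solr writes it (the mail client wrapped them above). It also assumes the *...* around core_node4 and hits=0 is only email emphasis, so the patterns tolerate it. The function and class names are illustrative.

```python
import re
from typing import NamedTuple, Optional


class RequestLogEntry(NamedTuple):
    replica: str  # value after "r:", e.g. core_node4
    core: str     # value after "x:", e.g. paymetryproducts_shard1_replica_n2
    params: str   # contents of params={...}
    hits: int


# The optional "\*?" tolerates the *bold* markers the mail client added.
_FIELDS = {
    "replica": re.compile(r"\br:\*?([^\s\]*]+)"),
    "core": re.compile(r"\bx:\*?([^\s\]*]+)"),
    "params": re.compile(r"\bparams=\{([^}]*)\}"),
    "hits": re.compile(r"\bhits=(\d+)"),
}


def parse_request_log(line: str) -> Optional[RequestLogEntry]:
    """Extract replica, core, params and hit count from one request-log line."""
    values = {}
    for name, pattern in _FIELDS.items():
        match = pattern.search(line)
        if match is None:
            return None  # not a request-log line in the expected shape
        values[name] = match.group(1)
    return RequestLogEntry(values["replica"], values["core"], values["params"], int(values["hits"]))


admin = parse_request_log(
    "2019-09-19 05:02:04.549 INFO  (qtp434091818-205178) [c:paymetryproducts s:shard1 "
    "r:*core_node4* x:paymetryproducts_shard1_replica_n2] o.a.s.c.S.Request "
    "[paymetryproducts_shard1_replica_n2]  webapp=/solr path=/select "
    "params={q=category_id:5a0aeaeea6bc7239cc21ee39&_=1568868718031} *hits=0* status=0 QTime=0"
)
print(admin)
```

Parsed this way, the two entries share the same q but differ in replica (core_node4 vs core_node3) and hits. That points at the two replicas disagreeing, not at the query itself.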

What could be causing this strange behaviour? How can I fix this?
Solr version: 7.3
Shard count: 1
replicationFactor: 2
maxShardsPerNode: 1

Regards,
Jayadevan


Re: Consecutive calls to a query give different results

2017-09-08 Thread Erick Erickson
Here's Mike McCandless' blog on the topic:

https://www.elastic.co/blog/lucenes-handling-of-deleted-documents

The same options he mentions are available in Solr, since both use Lucene
under the covers.

The long and short of it is that you can have a significant amount of
deleted documents in your index, depending on the update pattern.

One thing Mike doesn't mention is at the root of why I'm so negative
about optimize (and expungeDeletes, i.e. Lucene's forceMergeDeletes, is just
an optimize that only mashes together segments with > X% deleted docs).
Let's say your max segment size is 5G and you optimize an index down to a
single 100G segment. That segment will _not_ be merged until it has < 2.5G
of live docs. That's not a typo: 97.5% deleted docs.
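The arithmetic behind the 97.5% figure, as a small Python sketch. It assumes the rule Erick describes: an oversized segment only becomes a merge candidate once its live data drops below half the maximum segment size. That matches TieredMergePolicy as discussed in this 2017 thread. Later Lucene releases changed how very large segments are handled, so treat the result as illustrative.

```python
def deleted_fraction_before_merge(segment_gb: float, max_segment_gb: float) -> float:
    """Share of a segment that must be deleted before it becomes merge-eligible.

    Assumes the rule described above: a segment is only considered for merging
    once its live (non-deleted) data is below half the maximum segment size.
    """
    live_limit_gb = max_segment_gb / 2
    if segment_gb <= live_limit_gb:
        return 0.0  # small enough to be an ordinary merge candidate already
    return 1 - live_limit_gb / segment_gb


# Erick's numbers: 5G maximum segment size, one 100G segment left by optimize.
print(f"{deleted_fraction_before_merge(100, 5):.1%}")  # 97.5%
```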

You could ameliorate this somewhat by specifying the number of
segments when optimizing (the default is 1). Say you determine that you
have 100G of live data; then specify 20 segments for the optimize. I'd guess
this would be better, but I haven't tested it personally.
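Erick's untested alternative, sketched in Python. The code picks the segment count from the live data size and the maximum segment size, then passes it as maxSegments on the optimize request. The host and collection name are placeholders. The code only builds the URL. Sending it still starts a full, expensive optimize, just with more output segments.

```python
import math
from urllib.parse import urlencode


def optimize_segment_count(live_data_gb: float, max_segment_gb: float) -> int:
    """Smallest segment count that keeps each segment at or under max_segment_gb."""
    return max(1, math.ceil(live_data_gb / max_segment_gb))


def optimize_url(base_url: str, collection: str, max_segments: int) -> str:
    # optimize=true with maxSegments merges down to at most that many segments
    params = {"optimize": "true", "maxSegments": max_segments}
    return f"{base_url.rstrip('/')}/{collection}/update?{urlencode(params)}"


segments = optimize_segment_count(100, 5)  # Erick's example: 100G live, 5G max
print(segments)  # 20
print(optimize_url("http://solr-host:8983/solr", "mycollection", segments))
```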

Best,
Erick


Re: Consecutive calls to a query give different results

2017-09-08 Thread Webster Homer
Thank you, Erick Erickson and Shawn Heisey, for your excellent answers.
For some of our collections, it would seem that an occasional optimize
would be a good thing. However, we have some collections that are updated
constantly.

Would using expungeDeletes on commit help mitigate the issue?

I also came across a discussion of Lucene merge policies and the
TieredMergePolicy. Is there documentation about this? I notice that a
couple of replicas in some of our collections have ~30% deleted documents,
which I would think would contribute to the problem.
I have at least 3 collections that are updated constantly and would not
lend themselves to being optimized. What is the best approach for these?

Thanks




Re: Consecutive calls to a query give different results

2017-09-08 Thread Shawn Heisey
On 9/7/2017 8:54 AM, Webster Homer wrote:
> I am not concerned about deleted documents. I am concerned that the same
> search gives different results after each search. The top document seems to
> cycle between 3 different documents
>
> I have an enhanced collections info api call that calls the core admin api
> to get the index information for the replica.
> When I said the numdocs were the same I meant exactly that. maxdocs and
> deleted documents are not the same for the replicas, but the number of
> numdocs is.
>
> Or are you saying that the search is looking at deleted documents wouldn't
> that be a very significant bug?

Lucene score calculations take a lot of information in the index into
account when calculating the score.  That includes deleted documents,
because they are part of the index.  When you delete a document, Lucene
just makes a note saying "internal document ID number  is deleted." 
The actual information for that document is not removed from the index,
because doing so could take a very long time.

When you make queries against a replicated SolrCloud, the queries are
load balanced across the entire cloud, so different queries will hit
different replicas.  With different numbers of deleted documents in
different replicas (which is not unusual), the scores are going to come
out a little bit different on each query.  If you're sorting by score
(which is the default sort), that *can* affect the order.  Your replicas
have a fairly high percentage of deleted documents, so there is a lot of
extra information affecting the scores.  The relative difference in the
deleted document count between the replicas is high as well, so multiple
queries could be substantially different.
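A rough Python illustration of why different deleted-document counts shift scores. It assumes the collection uses BM25 (Solr's default similarity since 6.0), with the IDF formula from Lucene's BM25Similarity. It also assumes every document has the queried field, so the field's document count equals maxDocs. The maxDocs values are the two shard1 replicas from Webster's core admin output in this thread. Both replicas report the same numDocs (383817). The docFreq values are made up. Because deleted documents still count, each replica computes a different IDF for the same term, and that factor scales every matching document's score.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """IDF term of BM25: log(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# maxDocs (live + deleted) reported for the two shard1 replicas.
max_docs = {"core_node1": 611592, "core_node3": 571737}
# Made-up docFreq values: deleted copies of matching docs still count, and
# each replica has merged away a different number of them.
doc_freq = {"core_node1": 9000, "core_node3": 8000}

for replica, n_docs in max_docs.items():
    print(replica, round(bm25_idf(n_docs, doc_freq[replica]), 4))
```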

It is not a bug that Lucene and Solr look at deleted documents. 
Removing deleted document information from things like the score
calculation would be VERY computationally intense, bordering on the
impossible.  To assure good performance, Lucene doesn't even try. 
Because the way Lucene tracks deleted documents is with a list of
internal Lucene document IDs, those documents are easily removed from
*results*, but their contents are an integral part of the index and that
information can only be truly removed by completely rewriting (merging)
the index segments.
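A toy Python model of what Shawn describes. It is not Lucene's real data structures. Deleting only marks a document ID, so the document disappears from results at once. Statistics computed from the postings, such as document frequency, keep counting it until the segment is rewritten by a merge. The class and method names are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToySegment:
    """Illustrative segment: per-doc terms plus a set of deleted doc IDs."""
    docs: dict[int, set[str]]  # doc ID -> terms in that doc
    deleted: set[int] = field(default_factory=set)

    def delete(self, doc_id: int) -> None:
        # Deleting only marks the ID; the postings are left untouched.
        self.deleted.add(doc_id)

    def doc_freq(self, term: str) -> int:
        # Computed from postings, so deleted docs are still counted.
        return sum(1 for terms in self.docs.values() if term in terms)

    def search(self, term: str) -> list[int]:
        # Results are checked against the deleted set, so deletes vanish here.
        return [d for d, terms in self.docs.items() if term in terms and d not in self.deleted]

    def merge(self) -> ToySegment:
        # Rewriting the segment is what finally drops deleted docs.
        return ToySegment({d: t for d, t in self.docs.items() if d not in self.deleted})


seg = ToySegment({1: {"solr"}, 2: {"solr"}, 3: {"lucene"}})
seg.delete(2)
print(seg.search("solr"), seg.doc_freq("solr"))  # [1] 2
merged = seg.merge()
print(merged.search("solr"), merged.doc_freq("solr"))  # [1] 1
```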

You can get rid of all deleted documents with an optimize operation,
which is a forced merge of the entire index down to one segment -- but
just like it sounds, that is a complete rewrite of the index.  It
involves a huge amount of CPU resources and disk I/O, and can severely
impact normal indexing and query operations while it's happening.  If
the collection is extremely large, an optimize could take hours.  For
indexes that change rapidly, optimize is strongly discouraged, except as
an occasional "clean things up" operation, run during non-peak times.

Thanks,
Shawn



Re: Consecutive calls to a query give different results

2017-09-08 Thread Webster Homer
We have several cloud collections, but this one is updated once a day with
a partial load, and once a week with a full load, followed by a delete
which is based upon an index_date field (timestamp of the solr record).

For this and related collections optimizing once per day is probably
acceptable.

We do have other collections that are updated every 15 minutes; from what
you write, I don't think those could be optimized.




Re: Consecutive calls to a query give different results

2017-09-07 Thread Erick Erickson
bq: So apparently it IS essential to run optimize after a data load

Don't do this if you can avoid it; you run the risk of excessive
amounts of your index consisting of deleted documents unless you are
following a process whereby you periodically (and I'm talking at least
hours, if not once per day) index data and then don't change the index for
a bunch more hours.

You're missing the point when it comes to deleted docs. Different
replicas of the _same_ shard commit at different wall clock times due
to network delays. Therefore, which segments are merged will not be
identical between replicas when a commit happens, since commits are
local.

So replica1 may merge segments 1, 3 and 6 into segment 7, while
replica2 may merge segments 1, 2 and 4 into segment 7.

Here's the key: now replica1 may have 100 deleted documents (ones
marked as deleted but still in segments 2, 4 and 5), while replica2 may
have 90 deleted documents (the ones still in segments 3, 5 and 6).

The statistics in the term frequency and document frequency for some
terms are _not_ the same. Therefore the scoring will be slightly
different. Therefore, depending on which replica serves the query, the
order of docs may be somewhat different if the scores are close.

Optimizing squeezes all the deleted documents out of all the replicas,
so the scores become identical.
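Erick's segment example reduced to arithmetic, as an illustrative Python sketch. Both replicas start with the same six segments and the same deletes but merge different subsets. Only merged segments drop their deleted documents. The per-segment counts are invented so the totals match his 100 vs. 90 figures. A full optimize merges every segment and leaves zero deletes on both replicas.

```python
from __future__ import annotations


def remaining_deletes(deleted_per_segment: dict[int, int], merged: set[int]) -> int:
    """Deleted docs left after the given segments are merged together.

    A merge copies only live documents, so deletes in merged segments vanish,
    while deletes in untouched segments stay and keep affecting term statistics.
    """
    return sum(count for seg, count in deleted_per_segment.items() if seg not in merged)


# Made-up deleted-doc counts for segments 1-6, identical on both replicas.
deleted = {1: 10, 2: 40, 3: 30, 4: 35, 5: 25, 6: 35}

print(remaining_deletes(deleted, merged={1, 3, 6}))     # replica1: 100 (2, 4, 5 remain)
print(remaining_deletes(deleted, merged={1, 2, 4}))     # replica2: 90 (3, 5, 6 remain)
print(remaining_deletes(deleted, merged=set(deleted)))  # optimize: 0 on both
```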

This doesn't happen, of course, if you have only one replica.

Best,
Erick


Re: Consecutive calls to a query give different results

2017-09-07 Thread Webster Homer
We have several Solr clouds, and a couple of them have only 1 replica per
shard. We have never observed the problem when there is a single replica,
only when there are multiple replicas per shard.


Re: Consecutive calls to a query give different results

2017-09-07 Thread Webster Homer
The scores are not the same:
Doc      Score
305340   432.44238
C2646    428.24185
12837    430.61722

One other thing: I just ran optimize and now document 305340 is
consistently the top score.
So apparently it IS essential to run optimize after a data load.

Note that we see this behavior fairly commonly on our SolrCloud instances.
This was not the first time. This particular situation was on a development
system.


Re: Consecutive calls to a query give different results

2017-09-07 Thread Webster Homer
The scores are not the same:
Doc      Score
305340   432.44238

On Thu, Sep 7, 2017 at 10:02 AM, David Hastings <
hastings.recurs...@gmail.com> wrote:

> "I am concerned that the same
> search gives different results after each search. The top document seems to
> cycle between 3 different documents"
>
>
> if you do debug query on the search, are the scores for the top 3 documents
> the same or not?  you can easily have three documents with the same score,
> so when you have a result set that is ranked 1-1-1-2-3-4 you can expect
> 1-1-1 to rotate based on whatever.  use a second element like id to your
> ranking perhaps.
>
>
>
>
> On Thu, Sep 7, 2017 at 10:54 AM, Webster Homer 
> wrote:
>
> > I am not concerned about deleted documents. I am concerned that the same
> > search gives different results after each search. The top document seems
> to
> > cycle between 3 different documents
> >
> > I have an enhanced collections info api call that calls the core admin
> api
> > to get the index information for the replica.
> > When I said the numdocs were the same I meant exactly that. maxdocs and
> > deleted documents are not the same for the replicas, but the number of
> > numdocs is.
> >
> > Or are you saying that the search is looking at deleted documents
> wouldn't
> > that be a very significant bug?
> >
> > The four replicas:
> > shard1
> > core_node1
> > "numDocs": 383817,
> > "maxDocs": 611592,
> > "deletedDocs": 227775,
> > "size": "2.49 GB",
> > "lastModified": "2017-09-07T08:18:03.639Z",
> > "current": true,
> > "version": 35644,
> > "segmentCount": 28
> >
> > core_node3
> > "numDocs": 383817,
> > "maxDocs": 571737,
> > "deletedDocs": 187920,
> > "size": "2.85 GB",
> > "lastModified": "2017-09-07T08:18:03.634Z",
> > "current": false,
> > "version": 35562,
> > "segmentCount": 36
> > shard2
> > core_node2
> > "numDocs": 385326,
> > "maxDocs": 529214,
> > "deletedDocs": 143888,
> > "size": "2.13 GB",
> > "lastModified": "2017-09-07T08:18:03.632Z",
> > "current": true,
> > "version": 34783,
> > "segmentCount": 24
> > core_node4
> > "numDocs": 385326,
> > "maxDocs": 488201,
> > "deletedDocs": 102875,
> > "size": "1.96 GB",
> > "lastModified": "2017-09-07T08:18:03.633Z",
> > "current": true,
> > "version": 34932,
> > "segmentCount": 21
> >
> >
> > On Thu, Sep 7, 2017 at 7:58 AM, Yonik Seeley  wrote:
> >
> > > On Thu, Sep 7, 2017 at 12:47 AM, Erick Erickson <
> erickerick...@gmail.com
> > >
> > > wrote:
> > > > bq: and deleted documents are irrelevant to term statistics...
> > > >
> > > > Did you mean "relevant"? Or do I have to adjust my thinking _again_?
> > >
> > > One can make it work either way ;-)
> > > Whether a document is marked as deleted or not has no effect on term
> > > statistics (i.e. irrelevant)
> > > OR documents marked for deletion still count in term statistics (i.e.
> > > relevant)
> > >
> > > I guess I used the former because we don't go out of our way to still
> > > include deleted documents... it's just a side effect of the index
> > > structure that we don't (and can't easily) update statistics when a
> > > document is marked as deleted.
> > >
> > > -Yonik
> > >
> > >
> > > > Erick
> > > >
> > > > On Wed, Sep 6, 2017 at 7:48 PM, Yonik Seeley wrote:
> > > >> Different replicas of the same shard can have different numbers of
> > > >> deleted documents (really just marked as deleted), and deleted
> > > >> documents are irrelevant to term statistics (like the number of
> > > >> documents a term appears in).  Documents marked for deletion stop
> > > >> contributing to corpus statistics when they are actually removed
> > > >> (via expunge deletes, merges, optimizes).
> > > >> -Yonik
> > > >>
> > > >>
> > > >> On Wed, Sep 6, 2017 at 5:51 PM, Webster Homer <
> 

Re: Consecutive calls to a query give different results

2017-09-07 Thread David Hastings
"I am concerned that the same
search gives different results after each search. The top document seems to
cycle between 3 different documents"


If you run the query with debugQuery, are the scores for the top 3 documents
the same or not? You can easily have three documents with the same score,
so when you have a result set that is ranked 1-1-1-2-3-4 you can expect
the tied 1-1-1 documents to rotate in arbitrary order. Perhaps add a second
element, like id, to your ranking.
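A minimal sketch of the tie-breaking idea, assuming the results are available as plain (id, score) pairs; the document ids and scores below are made up for illustration. Sorting on score alone leaves tied documents in whatever order they happen to arrive, while adding the id as a secondary key pins their order. In a Solr request this corresponds to a multi-key sort parameter such as `sort=score desc,id asc` (the field name `id` is an assumption about the schema).

```python
from collections import Counter

# Hypothetical (doc_id, score) pairs; the first three tie on score,
# like the "1-1-1-2-3-4" ranking described above.
hits = [("doc-c", 1.0), ("doc-a", 1.0), ("doc-b", 1.0),
        ("doc-d", 0.8), ("doc-e", 0.5), ("doc-f", 0.2)]


def tied_scores(results):
    """Return the scores shared by more than one document, highest first."""
    counts = Counter(score for _, score in results)
    return sorted((s for s, n in counts.items() if n > 1), reverse=True)


def rank_with_tiebreak(results):
    """Order by score descending, then by id ascending to break ties."""
    return sorted(results, key=lambda hit: (-hit[1], hit[0]))


if __name__ == "__main__":
    print("tied scores:", tied_scores(hits))
    # Whatever order the tied hits arrive in, the tie-broken ranking is the same.
    assert rank_with_tiebreak(hits) == rank_with_tiebreak(list(reversed(hits)))
    print([doc_id for doc_id, _ in rank_with_tiebreak(hits)])
```

A secondary key only matters when scores are exactly equal; if the debug output shows different scores for the rotating documents, the cause lies in scoring rather than in sort order.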




On Thu, Sep 7, 2017 at 10:54 AM, Webster Homer 
wrote:

> I am not concerned about deleted documents. I am concerned that the same
> search gives different results after each search. The top document seems to
> cycle between 3 different documents.
>
> I have an enhanced collections info API call that calls the core admin API
> to get the index information for the replica.
> When I said the numDocs were the same I meant exactly that. maxDocs and
> deleted documents are not the same for the replicas, but numDocs is.
>
> Or are you saying that the search is looking at deleted documents?
> Wouldn't that be a very significant bug?
>
> The four replicas:
> shard1
> core_node1
> "numDocs": 383817,
> "maxDocs": 611592,
> "deletedDocs": 227775,
> "size": "2.49 GB",
> "lastModified": "2017-09-07T08:18:03.639Z",
> "current": true,
> "version": 35644,
> "segmentCount": 28
>
> core_node3
> "numDocs": 383817,
> "maxDocs": 571737,
> "deletedDocs": 187920,
> "size": "2.85 GB",
> "lastModified": "2017-09-07T08:18:03.634Z",
> "current": false,
> "version": 35562,
> "segmentCount": 36
> shard2
> core_node2
> "numDocs": 385326,
> "maxDocs": 529214,
> "deletedDocs": 143888,
> "size": "2.13 GB",
> "lastModified": "2017-09-07T08:18:03.632Z",
> "current": true,
> "version": 34783,
> "segmentCount": 24
> core_node4
> "numDocs": 385326,
> "maxDocs": 488201,
> "deletedDocs": 102875,
> "size": "1.96 GB",
> "lastModified": "2017-09-07T08:18:03.633Z",
> "current": true,
> "version": 34932,
> "segmentCount": 21
>
>
> On Thu, Sep 7, 2017 at 7:58 AM, Yonik Seeley  wrote:
>
> > On Thu, Sep 7, 2017 at 12:47 AM, Erick Erickson wrote:
> > > bq: and deleted documents are irrelevant to term statistics...
> > >
> > > Did you mean "relevant"? Or do I have to adjust my thinking _again_?
> >
> > One can make it work either way ;-)
> > Whether a document is marked as deleted or not has no effect on term
> > statistics (i.e. irrelevant)
> > OR documents marked for deletion still count in term statistics (i.e.
> > relevant)
> >
> > I guess I used the former because we don't go out of our way to still
> > include deleted documents... it's just a side effect of the index
> > structure that we don't (and can't easily) update statistics when a
> > document is marked as deleted.
> >
> > -Yonik
> >
> >
> > > Erick
> > >
> > > On Wed, Sep 6, 2017 at 7:48 PM, Yonik Seeley wrote:
> > >> Different replicas of the same shard can have different numbers of
> > >> deleted documents (really just marked as deleted), and deleted
> > >> documents are irrelevant to term statistics (like the number of
> > >> documents a term appears in).  Documents marked for deletion stop
> > >> contributing to corpus statistics when they are actually removed (via
> > >> expunge deletes, merges, optimizes).
> > >> -Yonik
> > >>
> > >>
> > >> On Wed, Sep 6, 2017 at 5:51 PM, Webster Homer wrote:
> > >>> I am using Solr 6.2.0 configured as SolrCloud with 2 shards and 4
> > >>> replicas (total of 4 nodes).
> > >>>
> > >>> If I run the query multiple times I see three different top-scoring
> > >>> results.
> > >>> No data load is running; all data has been committed.
> > >>>
> > >>> I get these three different hits with their scores:
> > >>> copperiinitratehemipentahydrate2325919004194 430.61722
> > >>> copperiinitrateoncelite1234598765   432.44238
> > >>> copperiinitratehydrate18756anhydrousbasis13778319 428.

Re: Consecutive calls to a query give different results

2017-09-07 Thread Webster Homer
I am not concerned about deleted documents. I am concerned that the same
search gives different results after each search. The top document seems to
cycle between 3 different documents.

I have an enhanced collections info API call that calls the core admin API
to get the index information for the replica.
When I said the numDocs were the same I meant exactly that. maxDocs and
deleted documents are not the same for the replicas, but numDocs is.

Or are you saying that the search is looking at deleted documents?
Wouldn't that be a very significant bug?

The four replicas:
shard1
core_node1
"numDocs": 383817,
"maxDocs": 611592,
"deletedDocs": 227775,
"size": "2.49 GB",
"lastModified": "2017-09-07T08:18:03.639Z",
"current": true,
"version": 35644,
"segmentCount": 28

core_node3
"numDocs": 383817,
"maxDocs": 571737,
"deletedDocs": 187920,
"size": "2.85 GB",
"lastModified": "2017-09-07T08:18:03.634Z",
"current": false,
"version": 35562,
"segmentCount": 36
shard2
core_node2
"numDocs": 385326,
"maxDocs": 529214,
"deletedDocs": 143888,
"size": "2.13 GB",
"lastModified": "2017-09-07T08:18:03.632Z",
"current": true,
"version": 34783,
"segmentCount": 24
core_node4
"numDocs": 385326,
"maxDocs": 488201,
"deletedDocs": 102875,
"size": "1.96 GB",
"lastModified": "2017-09-07T08:18:03.633Z",
"current": true,
"version": 34932,
"segmentCount": 21
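The replica stats above can be cross-checked with a little arithmetic. This sketch uses only the numbers posted here: it confirms that maxDocs minus numDocs equals deletedDocs for every replica, and it shows what share of each index still consists of documents that are marked deleted but not yet merged away. Replicas of the same shard carry noticeably different shares even though their numDocs match.

```python
# numDocs, maxDocs and deletedDocs copied from the replica stats above.
replicas = {
    "shard1/core_node1": {"numDocs": 383817, "maxDocs": 611592, "deletedDocs": 227775},
    "shard1/core_node3": {"numDocs": 383817, "maxDocs": 571737, "deletedDocs": 187920},
    "shard2/core_node2": {"numDocs": 385326, "maxDocs": 529214, "deletedDocs": 143888},
    "shard2/core_node4": {"numDocs": 385326, "maxDocs": 488201, "deletedDocs": 102875},
}


def deleted_fraction(stats):
    """Share of maxDocs made up of deleted-but-not-yet-merged documents."""
    return (stats["maxDocs"] - stats["numDocs"]) / stats["maxDocs"]


for name, stats in replicas.items():
    # maxDocs counts live plus deleted documents, so the gap is deletedDocs.
    assert stats["maxDocs"] - stats["numDocs"] == stats["deletedDocs"]
    print(f"{name}: {deleted_fraction(stats):.1%} deleted")
```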


On Thu, Sep 7, 2017 at 7:58 AM, Yonik Seeley  wrote:

> On Thu, Sep 7, 2017 at 12:47 AM, Erick Erickson 
> wrote:
> > bq: and deleted documents are irrelevant to term statistics...
> >
> > Did you mean "relevant"? Or do I have to adjust my thinking _again_?
>
> One can make it work either way ;-)
> Whether a document is marked as deleted or not has no effect on term
> statistics (i.e. irrelevant)
> OR documents marked for deletion still count in term statistics (i.e.
> relevant)
>
> I guess I used the former because we don't go out of our way to still
> include deleted documents... it's just a side effect of the index
> structure that we don't (and can't easily) update statistics when a
> document is marked as deleted.
>
> -Yonik
>
>
> > Erick
> >
> > On Wed, Sep 6, 2017 at 7:48 PM, Yonik Seeley  wrote:
> >> Different replicas of the same shard can have different numbers of
> >> deleted documents (really just marked as deleted), and deleted
> >> documents are irrelevant to term statistics (like the number of
> >> documents a term appears in).  Documents marked for deletion stop
> >> contributing to corpus statistics when they are actually removed (via
> >> expunge deletes, merges, optimizes).
> >> -Yonik
> >>
> >>
> >> On Wed, Sep 6, 2017 at 5:51 PM, Webster Homer 
> wrote:
> >>> I am using Solr 6.2.0 configured as SolrCloud with 2 shards and 4
> >>> replicas (total of 4 nodes).
> >>>
> >>> If I run the query multiple times I see three different top-scoring
> >>> results.
> >>> No data load is running; all data has been committed.
> >>>
> >>> I get these three different hits with their scores:
> >>> copperiinitratehemipentahydrate2325919004194 430.61722
> >>> copperiinitrateoncelite1234598765   432.44238
> >>> copperiinitratehydrate18756anhydrousbasis13778319 428.24185
> >>>
> >>> How is it that the same search against the same data can give different
> >>> responses?
> >>> I looked at the specific cores and they look OK: the numDocs for the
> >>> replicas in a shard match.
> >>>
> >>> This is the query:
> >>> http://ae1c-ecomdev-msc01.sial.com:8983/solr/sial-catalog-product/select?defType=edismax&fl=searchmv_en_keywords,%20searchmv_keywords,searchmv_pno,%20searchmv_en_s_pri_name,%20search_en_p_pri_name,%20search_pno%20[explain%20style=nl]&group.field=id_s&group.limit=30&group=true&group.sort=sort_ds%20asc&indent=on&mm=2%3C-25%25&q.op=OR&q=copper%20nitrate&qf=search_pid^500%20search_concat_pno^400%20searchmv_concat_sku^400%20searchmv_pno^300%20search_concat_pno_genr^100%20searchmv_pno_genr%20searchmv_p_skus_genr%20searchmv_user_term^200%20search_lform^190%20searchmv_en_acronym^180%20search_en_root_name^170

Re: Consecutive calls to a query give different results

2017-09-07 Thread Erick Erickson
Whew! I haven't been lying to people for _years_..

On Thu, Sep 7, 2017 at 5:58 AM, Yonik Seeley  wrote:
> On Thu, Sep 7, 2017 at 12:47 AM, Erick Erickson  
> wrote:
>> bq: and deleted documents are irrelevant to term statistics...
>>
>> Did you mean "relevant"? Or do I have to adjust my thinking _again_?
>
> One can make it work either way ;-)
> Whether a document is marked as deleted or not has no effect on term
> statistics (i.e. irrelevant)
> OR documents marked for deletion still count in term statistics (i.e. 
> relevant)
>
> I guess I used the former because we don't go out of our way to still
> include deleted documents... it's just a side effect of the index
> structure that we don't (and can't easily) update statistics when a
> document is marked as deleted.
>
> -Yonik
>
>
>> Erick
>>
>> On Wed, Sep 6, 2017 at 7:48 PM, Yonik Seeley  wrote:
>>> Different replicas of the same shard can have different numbers of
>>> deleted documents (really just marked as deleted), and deleted
>>> documents are irrelevant to term statistics (like the number of
>>> documents a term appears in).  Documents marked for deletion stop
>>> contributing to corpus statistics when they are actually removed (via
>>> expunge deletes, merges, optimizes).
>>> -Yonik
>>>
>>>
>>> On Wed, Sep 6, 2017 at 5:51 PM, Webster Homer  
>>> wrote:
 I am using Solr 6.2.0 configured as SolrCloud with 2 shards and 4
 replicas (total of 4 nodes).

 If I run the query multiple times I see three different top-scoring
 results.
 No data load is running; all data has been committed.

 I get these three different hits with their scores:
 copperiinitratehemipentahydrate2325919004194 430.61722
 copperiinitrateoncelite1234598765   432.44238
 copperiinitratehydrate18756anhydrousbasis13778319 428.24185

 How is it that the same search against the same data can give different
 responses?
 I looked at the specific cores and they look OK: the numDocs for the
 replicas in a shard match.

 This is the query:
 http://ae1c-ecomdev-msc01.sial.com:8983/solr/sial-catalog-product/select?defType=edismax&fl=searchmv_en_keywords,%20searchmv_keywords,searchmv_pno,%20searchmv_en_s_pri_name,%20search_en_p_pri_name,%20search_pno%20[explain%20style=nl]&group.field=id_s&group.limit=30&group=true&group.sort=sort_ds%20asc&indent=on&mm=2%3C-25%25&q.op=OR&q=copper%20nitrate&qf=search_pid^500%20search_concat_pno^400%20searchmv_concat_sku^400%20searchmv_pno^300%20search_concat_pno_genr^100%20searchmv_pno_genr%20searchmv_p_skus_genr%20searchmv_user_term^200%20search_lform^190%20searchmv_en_acronym^180%20search_en_root_name^170%20searchmv_en_s_pri_name^160%20search_en_p_pri_name^150%20searchmv_en_synonyms^145%20searchmv_en_keywords^140%20search_en_sortkey^120%20searchmv_p_skus^100%20searchmv_chem_comp^90%20searchmv_en_name_suf%20searchmv_cas_number^80%20searchmv_component_cas^70%20search_beilstein^50%20search_color_idx^40%20search_ecnumber^30%20search_egecnumber^30%20search_femanumber^20%20searchmv_isbn^10%20search_mdl_number%20searchmv_en_page_title%20searchmv_en_descriptions%20searchmv_en_attributes%20searchmv_rtecs%20searchmv_lookahead_terms%20searchmv_xref_comparable_pno%20searchmv_xref_comparable_sku%20searchmv_xref_equivalent_pno%20searchmv_xref_exact_pno%20searchmv_xref_exact_sku%20searchmv_component_molform&rows=30&sort=score%20desc,sort_en_name%20asc,sort_ds%20asc,search_pid%20asc&wt=json

 --


 This message and any attachment are confidential and may be privileged or
 otherwise protected from disclosure. If you are not the intended recipient,
 you must not copy this message or attachment or disclose the contents to
 any other person. If you have received this transmission in error, please
 notify the sender immediately and delete the message and any attachment
 from your system. Merck KGaA, Darmstadt, Germany and any of its
 subsidiaries do not accept liability for any omissions or errors in this
 message which may arise as a result of E-Mail-transmission or for damages
 resulting from any unauthorized changes of the content of this message and
 any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its
 subsidiaries do not guarantee that this message is free of viruses and does
 not accept liability for any damages caused by any virus transmitted
 therewith.

 Click http://www.emdgroup.com/disclaimer to access the German, French,
 Spanish and Portuguese versions of this disclaimer.


Re: Consecutive calls to a query give different results

2017-09-07 Thread Yonik Seeley
On Thu, Sep 7, 2017 at 12:47 AM, Erick Erickson  wrote:
> bq: and deleted documents are irrelevant to term statistics...
>
> Did you mean "relevant"? Or do I have to adjust my thinking _again_?

One can make it work either way ;-)
Whether a document is marked as deleted or not has no effect on term
statistics (i.e. irrelevant)
OR documents marked for deletion still count in term statistics (i.e. relevant)

I guess I used the former because we don't go out of our way to still
include deleted documents... it's just a side effect of the index
structure that we don't (and can't easily) update statistics when a
document is marked as deleted.

-Yonik


> Erick
>
> On Wed, Sep 6, 2017 at 7:48 PM, Yonik Seeley  wrote:
>> Different replicas of the same shard can have different numbers of
>> deleted documents (really just marked as deleted), and deleted
>> documents are irrelevant to term statistics (like the number of
>> documents a term appears in).  Documents marked for deletion stop
>> contributing to corpus statistics when they are actually removed (via
>> expunge deletes, merges, optimizes).
>> -Yonik
>>
>>
>> On Wed, Sep 6, 2017 at 5:51 PM, Webster Homer  wrote:
>>> I am using Solr 6.2.0 configured as SolrCloud with 2 shards and 4
>>> replicas (total of 4 nodes).
>>>
>>> If I run the query multiple times I see three different top-scoring
>>> results.
>>> No data load is running; all data has been committed.
>>>
>>> I get these three different hits with their scores:
>>> copperiinitratehemipentahydrate2325919004194 430.61722
>>> copperiinitrateoncelite1234598765   432.44238
>>> copperiinitratehydrate18756anhydrousbasis13778319 428.24185
>>>
>>> How is it that the same search against the same data can give different
>>> responses?
>>> I looked at the specific cores and they look OK: the numDocs for the
>>> replicas in a shard match.
>>>
>>> This is the query:
>>> http://ae1c-ecomdev-msc01.sial.com:8983/solr/sial-catalog-product/select?defType=edismax&fl=searchmv_en_keywords,%20searchmv_keywords,searchmv_pno,%20searchmv_en_s_pri_name,%20search_en_p_pri_name,%20search_pno%20[explain%20style=nl]&group.field=id_s&group.limit=30&group=true&group.sort=sort_ds%20asc&indent=on&mm=2%3C-25%25&q.op=OR&q=copper%20nitrate&qf=search_pid^500%20search_concat_pno^400%20searchmv_concat_sku^400%20searchmv_pno^300%20search_concat_pno_genr^100%20searchmv_pno_genr%20searchmv_p_skus_genr%20searchmv_user_term^200%20search_lform^190%20searchmv_en_acronym^180%20search_en_root_name^170%20searchmv_en_s_pri_name^160%20search_en_p_pri_name^150%20searchmv_en_synonyms^145%20searchmv_en_keywords^140%20search_en_sortkey^120%20searchmv_p_skus^100%20searchmv_chem_comp^90%20searchmv_en_name_suf%20searchmv_cas_number^80%20searchmv_component_cas^70%20search_beilstein^50%20search_color_idx^40%20search_ecnumber^30%20search_egecnumber^30%20search_femanumber^20%20searchmv_isbn^10%20search_mdl_number%20searchmv_en_page_title%20searchmv_en_descriptions%20searchmv_en_attributes%20searchmv_rtecs%20searchmv_lookahead_terms%20searchmv_xref_comparable_pno%20searchmv_xref_comparable_sku%20searchmv_xref_equivalent_pno%20searchmv_xref_exact_pno%20searchmv_xref_exact_sku%20searchmv_component_molform&rows=30&sort=score%20desc,sort_en_name%20asc,sort_ds%20asc,search_pid%20asc&wt=json
>>>


Re: Consecutive calls to a query give different results

2017-09-06 Thread Erick Erickson
bq: and deleted documents are irrelevant to term statistics...

Did you mean "relevant"? Or do I have to adjust my thinking _again_?

Erick

On Wed, Sep 6, 2017 at 7:48 PM, Yonik Seeley  wrote:
> Different replicas of the same shard can have different numbers of
> deleted documents (really just marked as deleted), and deleted
> documents are irrelevant to term statistics (like the number of
> documents a term appears in).  Documents marked for deletion stop
> contributing to corpus statistics when they are actually removed (via
> expunge deletes, merges, optimizes).
> -Yonik
>
>
> On Wed, Sep 6, 2017 at 5:51 PM, Webster Homer  wrote:
>> I am using Solr 6.2.0 configured as SolrCloud with 2 shards and 4
>> replicas (total of 4 nodes).
>>
>> If I run the query multiple times I see three different top-scoring
>> results.
>> No data load is running; all data has been committed.
>>
>> I get these three different hits with their scores:
>> copperiinitratehemipentahydrate2325919004194 430.61722
>> copperiinitrateoncelite1234598765   432.44238
>> copperiinitratehydrate18756anhydrousbasis13778319 428.24185
>>
>> How is it that the same search against the same data can give different
>> responses?
>> I looked at the specific cores and they look OK: the numDocs for the
>> replicas in a shard match.
>>
>> This is the query:
>> http://ae1c-ecomdev-msc01.sial.com:8983/solr/sial-catalog-product/select?defType=edismax&fl=searchmv_en_keywords,%20searchmv_keywords,searchmv_pno,%20searchmv_en_s_pri_name,%20search_en_p_pri_name,%20search_pno%20[explain%20style=nl]&group.field=id_s&group.limit=30&group=true&group.sort=sort_ds%20asc&indent=on&mm=2%3C-25%25&q.op=OR&q=copper%20nitrate&qf=search_pid^500%20search_concat_pno^400%20searchmv_concat_sku^400%20searchmv_pno^300%20search_concat_pno_genr^100%20searchmv_pno_genr%20searchmv_p_skus_genr%20searchmv_user_term^200%20search_lform^190%20searchmv_en_acronym^180%20search_en_root_name^170%20searchmv_en_s_pri_name^160%20search_en_p_pri_name^150%20searchmv_en_synonyms^145%20searchmv_en_keywords^140%20search_en_sortkey^120%20searchmv_p_skus^100%20searchmv_chem_comp^90%20searchmv_en_name_suf%20searchmv_cas_number^80%20searchmv_component_cas^70%20search_beilstein^50%20search_color_idx^40%20search_ecnumber^30%20search_egecnumber^30%20search_femanumber^20%20searchmv_isbn^10%20search_mdl_number%20searchmv_en_page_title%20searchmv_en_descriptions%20searchmv_en_attributes%20searchmv_rtecs%20searchmv_lookahead_terms%20searchmv_xref_comparable_pno%20searchmv_xref_comparable_sku%20searchmv_xref_equivalent_pno%20searchmv_xref_exact_pno%20searchmv_xref_exact_sku%20searchmv_component_molform&rows=30&sort=score%20desc,sort_en_name%20asc,sort_ds%20asc,search_pid%20asc&wt=json
>>


Re: Consecutive calls to a query give different results

2017-09-06 Thread Yonik Seeley
Different replicas of the same shard can have different numbers of
deleted documents (really just marked as deleted), and deleted
documents are irrelevant to term statistics (like the number of
documents a term appears in).  Documents marked for deletion stop
contributing to corpus statistics when they are actually removed (via
expunge deletes, merges, optimizes).
-Yonik
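To see how deleted-but-unmerged documents can move scores, here is a worked sketch. It assumes the BM25 idf form used by Lucene, idf = ln(1 + (N - df + 0.5) / (df + 0.5)), where N is the document count and df the number of documents containing the term, and that both still count documents that are only marked as deleted. The maxDocs values come from the replica stats posted in this thread and stand in for N; the df values are hypothetical. Whether this collection actually scores with BM25 is also an assumption.

```python
import math


def bm25_idf(doc_count, doc_freq):
    """BM25-style idf: ln(1 + (N - df + 0.5) / (df + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# maxDocs from the posted shard1 replica stats (live + deleted documents).
replica_max_docs = {"core_node1": 611592, "core_node3": 571737}

# Hypothetical docFreq for one query term: both replicas hold the same live
# documents, but a different number of deleted ones still contain the term.
replica_doc_freq = {"core_node1": 1200, "core_node3": 1100}

for core, n in replica_max_docs.items():
    print(core, round(bm25_idf(n, replica_doc_freq[core]), 4))

# Once deletes are expunged (merge/optimize), both replicas would report the
# same statistics: the 383817 live documents and a shared (hypothetical) df.
live_docs, live_df = 383817, 1000
print("after expunge", round(bm25_idf(live_docs, live_df), 4))
```

The same term gets a slightly different idf on each replica, so a query routed to different replicas of one shard can rank near-equal documents differently until merges bring the statistics back in line.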


On Wed, Sep 6, 2017 at 5:51 PM, Webster Homer  wrote:
> I am using Solr 6.2.0 configured as SolrCloud with 2 shards and 4
> replicas (total of 4 nodes).
>
> If I run the query multiple times I see three different top-scoring
> results.
> No data load is running; all data has been committed.
>
> I get these three different hits with their scores:
> copperiinitratehemipentahydrate2325919004194 430.61722
> copperiinitrateoncelite1234598765   432.44238
> copperiinitratehydrate18756anhydrousbasis13778319 428.24185
>
> How is it that the same search against the same data can give different
> responses?
> I looked at the specific cores and they look OK: the numDocs for the
> replicas in a shard match.
>
> This is the query:
> http://ae1c-ecomdev-msc01.sial.com:8983/solr/sial-catalog-product/select?defType=edismax&fl=searchmv_en_keywords,%20searchmv_keywords,searchmv_pno,%20searchmv_en_s_pri_name,%20search_en_p_pri_name,%20search_pno%20[explain%20style=nl]&group.field=id_s&group.limit=30&group=true&group.sort=sort_ds%20asc&indent=on&mm=2%3C-25%25&q.op=OR&q=copper%20nitrate&qf=search_pid^500%20search_concat_pno^400%20searchmv_concat_sku^400%20searchmv_pno^300%20search_concat_pno_genr^100%20searchmv_pno_genr%20searchmv_p_skus_genr%20searchmv_user_term^200%20search_lform^190%20searchmv_en_acronym^180%20search_en_root_name^170%20searchmv_en_s_pri_name^160%20search_en_p_pri_name^150%20searchmv_en_synonyms^145%20searchmv_en_keywords^140%20search_en_sortkey^120%20searchmv_p_skus^100%20searchmv_chem_comp^90%20searchmv_en_name_suf%20searchmv_cas_number^80%20searchmv_component_cas^70%20search_beilstein^50%20search_color_idx^40%20search_ecnumber^30%20search_egecnumber^30%20search_femanumber^20%20searchmv_isbn^10%20search_mdl_number%20searchmv_en_page_title%20searchmv_en_descriptions%20searchmv_en_attributes%20searchmv_rtecs%20searchmv_lookahead_terms%20searchmv_xref_comparable_pno%20searchmv_xref_comparable_sku%20searchmv_xref_equivalent_pno%20searchmv_xref_exact_pno%20searchmv_xref_exact_sku%20searchmv_component_molform&rows=30&sort=score%20desc,sort_en_name%20asc,sort_ds%20asc,search_pid%20asc&wt=json
>


Consecutive calls to a query give different results

2017-09-06 Thread Webster Homer
I am using Solr 6.2.0 configured as SolrCloud with 2 shards and 4
replicas (total of 4 nodes).

If I run the query multiple times I see three different top-scoring
results.
No data load is running; all data has been committed.

I get these three different hits with their scores:
copperiinitratehemipentahydrate2325919004194 430.61722
copperiinitrateoncelite1234598765   432.44238
copperiinitratehydrate18756anhydrousbasis13778319 428.24185

How is it that the same search against the same data can give different
responses?
I looked at the specific cores and they look OK: the numDocs for the
replicas in a shard match.

This is the query:
http://ae1c-ecomdev-msc01.sial.com:8983/solr/sial-catalog-product/select?defType=edismax&fl=searchmv_en_keywords,%20searchmv_keywords,searchmv_pno,%20searchmv_en_s_pri_name,%20search_en_p_pri_name,%20search_pno%20[explain%20style=nl]&group.field=id_s&group.limit=30&group=true&group.sort=sort_ds%20asc&indent=on&mm=2%3C-25%25&q.op=OR&q=copper%20nitrate&qf=search_pid^500%20search_concat_pno^400%20searchmv_concat_sku^400%20searchmv_pno^300%20search_concat_pno_genr^100%20searchmv_pno_genr%20searchmv_p_skus_genr%20searchmv_user_term^200%20search_lform^190%20searchmv_en_acronym^180%20search_en_root_name^170%20searchmv_en_s_pri_name^160%20search_en_p_pri_name^150%20searchmv_en_synonyms^145%20searchmv_en_keywords^140%20search_en_sortkey^120%20searchmv_p_skus^100%20searchmv_chem_comp^90%20searchmv_en_name_suf%20searchmv_cas_number^80%20searchmv_component_cas^70%20search_beilstein^50%20search_color_idx^40%20search_ecnumber^30%20search_egecnumber^30%20search_femanumber^20%20searchmv_isbn^10%20search_mdl_number%20searchmv_en_page_title%20searchmv_en_descriptions%20searchmv_en_attributes%20searchmv_rtecs%20searchmv_lookahead_terms%20searchmv_xref_comparable_pno%20searchmv_xref_comparable_sku%20searchmv_xref_equivalent_pno%20searchmv_xref_exact_pno%20searchmv_xref_exact_sku%20searchmv_component_molform&rows=30&sort=score%20desc,sort_en_name%20asc,sort_ds%20asc,search_pid%20asc&wt=json
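One way to pin down which replica returns which top hit is to send the same request straight to each core. Setting `distrib=false` makes a SolrCloud core answer from its own index instead of fanning the request out, and `debugQuery=true` adds the score explanation. The sketch below only builds the URLs with the Python standard library. The host and collection come from the query above, but the core names are hypothetical (SolrCloud 6.x cores are typically named like `<collection>_shard1_replica1`), and only a small subset of the original parameters is kept.

```python
from urllib.parse import urlencode

# Host and collection from the query above; the core names are hypothetical.
BASE = "http://ae1c-ecomdev-msc01.sial.com:8983/solr"
CORES = [
    "sial-catalog-product_shard1_replica1",
    "sial-catalog-product_shard1_replica2",
]

# A trimmed-down subset of the original request parameters.
params = {
    "defType": "edismax",
    "q": "copper nitrate",
    "q.op": "OR",
    "fl": "search_pid,score",
    "rows": 3,
    "wt": "json",
    "debugQuery": "true",  # include per-document score explanations
    "distrib": "false",    # answer from this core only, no fan-out
}


def replica_url(core):
    """Build a select URL that targets a single core directly."""
    return f"{BASE}/{core}/select?{urlencode(params)}"


for core in CORES:
    print(replica_url(core))
```

Comparing the `explain` sections returned by each replica of the same shard should show whether docFreq or document-count values differ between them.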



Solr suggester query with quotes produces different results

2017-07-01 Thread Angel Todorov
Hi guys,

I have the Suggester configured using the FreeTextFactory. I noticed that if
I don't use quotation marks, I only get single-term results. If I use
quotation marks around my query, then I only get results that are composed
of multiple terms. There is no configuration that would return both types
of results with a single query.

Thanks
Angel


Re: Solr Cloud Replica Cores Give different Results for the Same query

2016-12-15 Thread Erick Erickson
ler.component.ShardFieldSortedHitQueue$ShardComparator.sortVal(ShardFieldSortedHitQueue.java:146)\n\tat
>>> org.apache.solr.handler.component.ShardFieldSortedHitQueue$1.compare(ShardFieldSortedHitQueue.java:167)\n\tat
>>> org.apache.solr.handler.component.ShardFieldSortedHitQueue$1.compare(ShardFieldSortedHitQueue.java:159)\n\tat
>>> org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardFieldSortedHitQueue.java:91)\n\tat
>>> org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardFieldSortedHitQueue.java:33)\n\tat
>>> org.apache.lucene.util.PriorityQueue.insertWithOverflow(PriorityQueue.java:158)\n\tat
>>> org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:1098)\n\tat
>>> org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:758)\n\tat
>>> org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:737)\n\tat
>>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:428)\n\tat
>>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:154)\n\tat
>>> org.apache.solr.core.SolrCore.execute(SolrCore.java:2089)\n\tat
>>> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:652)\n\tat
>>> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:459)\n\tat
>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)\n\tat
>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)\n\tat
>>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)\n\tat
>>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)\n\tat
>>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
>>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
>>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\n\tat
>>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)\n\tat
>>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)\n\tat
>>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat
>>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)\n\tat
>>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
>>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\n\tat
>>> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\n\tat
>>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat
>>> org.eclipse.jetty.server.Server.handle(Server.java:518)\n\tat
>>> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)\n\tat
>>> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)\n\tat
>>> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)\n\tat
>>> org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)\n\tat
>>> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\n\tat
>>> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)\n\tat
>>> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)\n\tat
>>> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)\n\tat
>>> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)\n\tat
>>> java.lang.Thread.run(Thread.java:745)\n", "code":500}}

Re: Solr Cloud Replica Cores Give different Results for the Same query

2016-12-15 Thread Webster Homer

Re: Solr Cloud Replica Cores Give different Results for the Same query

2016-12-15 Thread Webster Homer

Re: Solr Cloud Replica Cores Give different Results for the Same query

2016-12-15 Thread Webster Homer

Re: Solr Cloud Replica Cores Give different Results for the Same query

2016-12-14 Thread Erick Erickson
Let's back up a bit. You say "This seems to cause two replicas to
return different hits depending upon which one is queried."

OK, _how_ are they different? I've been assuming different numbers of
hits. If you're getting the same number of hits but different document
ordering, that's a completely different issue and may be easily
explainable. If this is true, skip the rest of this message. I only
realized we may be using a different definition of "different hits"
part way through writing this reply.



Having the timestamp as a string isn't a problem; you can do something
very similar with wildcards and the like if it's a string that sorts
the same way the timestamp would. And it's best if it's created
upstream anyway, since that way it's guaranteed to be the same for the
doc on all replicas.

If the date is in canonical form (YYYY-MM-DDTHH:MM:SSZ), then a simple
copyField to a date field would do the trick.
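A minimal Python sketch of that check, assuming the ETL strings are meant to be in the fixed-width canonical form YYYY-MM-DDTHH:MM:SSZ (the helper names `is_canonical` and `sorts_chronologically` are hypothetical, not Solr APIs). Fixed-width canonical strings sort lexically in chronological order, which is why a string field can stand in for the timestamp.

```python
import re
from datetime import datetime

# Fixed-width canonical form without fractional seconds, e.g. 2016-12-14T13:10:00Z.
# Mixing in fractional seconds ("...:00.5Z") would break lexical ordering,
# because "." sorts before "Z".
CANONICAL_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")
CANONICAL_FMT = "%Y-%m-%dT%H:%M:%SZ"


def is_canonical(value):
    """True if value is a fixed-width canonical UTC timestamp that also parses."""
    if not CANONICAL_RE.fullmatch(value):
        return False
    try:
        datetime.strptime(value, CANONICAL_FMT)  # rejects e.g. month 13
    except ValueError:
        return False
    return True


def sorts_chronologically(values):
    """Check that plain string sorting matches sorting by parsed time."""
    by_string = sorted(values)
    by_time = sorted(values, key=lambda v: datetime.strptime(v, CANONICAL_FMT))
    return by_string == by_time


stamps = ["2016-12-14T13:10:00Z", "2016-02-01T00:00:00Z", "2015-12-31T23:59:59Z"]
print(all(is_canonical(s) for s in stamps))  # True
print(sorts_chronologically(stamps))         # True
print(is_canonical("12/14/2016 13:10"))      # False
```

If every stored value passes `is_canonical`, the copyField route should work without any upstream reformatting.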

But there's no real reason to do any of that. Given that you see this
when there's no indexing going on, there's no point to those tests;
they were just a way to examine your nodes while there was active
indexing.

How do you fix this problem when you see it? If it goes away by itself,
that would give us at least a start on where to look. If you have to
manually intervene, it would be good to know what you do.

The CDCR pattern is that docs go from the leader on the source cluster
to the leader on the target cluster. Once the target leader gets the
docs, it's supposed to send them to all the replicas.

To try to narrow down the issue, next time it occurs can you look at
_both_ the source and target clusters and see if they _both_ show the
same discrepancy? What I'm looking for is whether both are
self-consistent. That is, all the replicas for shardN on the source
cluster show the same number of docs (M), and all the replicas for
shardN on the target cluster show the same number of docs (N). I'm not
as concerned if M != N at this point. Note I'm looking at the number of
hits here, not, say, the document ordering.

To do this you'll have to do the trick I mentioned where you query
each replica separately.
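A sketch of that self-consistency check in Python, assuming the per-replica numFound values have already been collected (for example with per-core distrib=false queries) into a nested dict; the data layout, the numbers and the `inconsistent_shards` name are illustrative only. It flags a shard only when its replicas disagree within one cluster and deliberately ignores M != N between clusters.

```python
def inconsistent_shards(counts):
    """Return {cluster: [shard, ...]} for shards whose replicas disagree.

    counts is assumed to look like {cluster: {shard: {replica_name: numFound}}}.
    """
    problems = {}
    for cluster, shards in counts.items():
        for shard, replicas in sorted(shards.items()):
            if len(set(replicas.values())) > 1:
                problems.setdefault(cluster, []).append(shard)
    return problems


# Hypothetical numbers: source is self-consistent, target shard1 is not.
# The source/target totals differ (M != N), which this check ignores.
counts = {
    "source": {"shard1": {"replica1": 5000, "replica2": 5000},
               "shard2": {"replica1": 4800, "replica2": 4800}},
    "target": {"shard1": {"replica1": 4990, "replica2": 4985},
               "shard2": {"replica1": 4800, "replica2": 4800}},
}
print(inconsistent_shards(counts))  # {'target': ['shard1']}
```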

And are you absolutely sure that your different results are coming
from the _same_ cluster? If you're comparing a query from the source
cluster with a query from the target cluster, that's different than if
the queries come from the same cluster.

Best,
Erick


Re: Solr Cloud Replica Cores Give different Results for the Same query

2016-12-14 Thread Webster Homer
Thanks for the quick feedback.

We are not doing continuous indexing; we do a complete load once a week and
then a daily partial load for any documents that have changed since
the last load. These partial loads take only a few minutes every morning.

The problem is that we see this discrepancy long after the data load completes.

We have a source collection that uses CDCR to replicate to the target. I
see current=false reported for both the source and target collections.
Only the target collection is being heavily searched, so that is where my
concern is. So what could cause this kind of issue?
Do we have a configuration problem?

It doesn't happen all the time, so I don't currently have a reproducible
test case yet.

I will see about adding the timestamp. We have one, but it was created as a
string and was generated by our ETL job.

-- 


This message and any attachment are confidential and may be privileged or 
otherwise protected from disclosure. If you are not the intended recipient, 

Re: Solr Cloud Replica Cores Give different Results for the Same query

2016-12-14 Thread Erick Erickson
The commit points on different replicas will trip at different wall
clock times, so the leader and replica may return slightly different
results depending on whether doc X was included in the commit on one
replica but not on the second. After the _next_ commit interval (2
seconds in your case), doc X will be committed on the second replica;
that is, it's not lost.
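A toy Python model of that timing argument, assuming each replica's soft commits fire on a fixed schedule (offset + k * interval) that is out of phase with the other replica's, and that a new searcher becomes visible after a fixed warmup time. Solr's real autocommit timers are triggered by update activity rather than a fixed clock, so treat this as an illustration only; all names are hypothetical.

```python
import math


def next_commit(t, interval, offset):
    """First commit time >= t on a replica whose commits fire at offset + k*interval."""
    k = math.ceil((t - offset) / interval)
    return offset + k * interval


def visible(doc_time, query_time, interval, offset, warmup=0.0):
    """Is a doc indexed at doc_time searchable on this replica at query_time?"""
    return next_commit(doc_time, interval, offset) + warmup <= query_time


def worst_case_wait(interval, warmup):
    """Upper bound on how long a doc can stay invisible: one interval plus warmup."""
    return interval + warmup


# Two replicas with a 2 s soft-commit interval but different phases.
INTERVAL = 2.0
replicas = {"replica_a": 0.0, "replica_b": 1.5}
doc_time = 3.0

for query_time in (3.7, 5.0):
    seen = {name: visible(doc_time, query_time, INTERVAL, off)
            for name, off in replicas.items()}
    print(query_time, seen)
# 3.7 {'replica_a': False, 'replica_b': True}
# 5.0 {'replica_a': True, 'replica_b': True}
```

`worst_case_wait` is the "interval + autowarm time" figure from step 1 below: once indexing stops, waiting that long should let every replica catch up.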

Here are a couple of ways to verify:

1> Turn off indexing and wait a few seconds. The replicas should have
the exact same documents. "A few seconds" is your autocommit (soft in
your case) interval + autowarm time. This last is unknown, but you can
check your admin/plugins-stats search handler times; it's reported
there. Now issue your queries. If the replicas don't report the same
docs, that's A Bad Thing and should be worrying. BTW, with a 2 second soft
commit interval, which is really aggressive, you _better not_ have
very long autowarm times!

2> Include a timestamp in your docs when they are indexed (there's an
automatic way to do that, BTW). Now do your queries and append an fq
clause like &fq=timestamp:[* TO some_point_in_the_past]. The replicas
should have the same counts unless you are deleting documents. I
mention deletes on the off chance that you're deleting documents that
fall in the interval, in which case the same thing as above could
theoretically occur. Updates should be fine.
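A small Python sketch for building that filter, assuming the docs carry a date field literally named `timestamp` (the name used in the example above); `solr_date` and `settled_fq` are hypothetical helpers. Picking one fixed, absolute cutoff and reusing it for every replica keeps the comparison identical across cores, rather than relying on relative date math evaluated separately on each node.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def solr_date(dt):
    """Format a timezone-aware datetime in Solr's canonical UTC form."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def settled_fq(cutoff, field="timestamp"):
    """Range filter matching only docs stamped at or before the cutoff."""
    return f"{field}:[* TO {solr_date(cutoff)}]"


# Pick the cutoff once, e.g. ten minutes ago, and reuse it for every replica.
cutoff = datetime.now(timezone.utc) - timedelta(minutes=10)
params = urlencode({"q": "*:*", "fq": settled_fq(cutoff), "rows": 0})
print(params)
```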

BTW, I've seen continuous monitoring of this done by automated
scripts. The key is to get the shard URL and ping that with
&distrib=false. It'll look something like
http://host:port/solr/collection_shard1_replica1. People usually
just use *:* and compare numFound.
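A Python sketch of such a monitoring script, assuming core URLs of the form shown above and JSON responses; the core names and the injectable `fetch` argument are illustrative, and the demo uses a stub instead of a live cluster. rows=0 skips returning documents, and distrib=false keeps each query on the one core instead of fanning out across the collection.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def replica_count_url(core_url, q="*:*"):
    """Build a non-distributed count query against a single core."""
    params = urlencode({"q": q, "rows": 0, "distrib": "false", "wt": "json"})
    return f"{core_url.rstrip('/')}/select?{params}"


def fetch_num_found(url, timeout=10):
    """Run the query and return response.numFound from the JSON body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)["response"]["numFound"]


def compare_replicas(core_urls, fetch=fetch_num_found):
    """Return ({core_url: numFound}, all_equal) for the given replica cores."""
    counts = {url: fetch(replica_count_url(url)) for url in core_urls}
    return counts, len(set(counts.values())) <= 1


# Hypothetical core URLs; against a live cluster call compare_replicas(cores).
cores = [
    "http://host:8983/solr/collection_shard1_replica1",
    "http://host:8983/solr/collection_shard1_replica2",
]


def stub_fetch(url):
    """Offline stand-in for fetch_num_found: pretend replica2 is missing docs."""
    return 998 if "_replica2/" in url else 1000


print(compare_replicas(cores, fetch=stub_fetch))
```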

Best,
Erick





Solr Cloud Replica Cores Give different Results for the Same query

2016-12-14 Thread Webster Homer
We are using Solr Cloud 6.2

We have been noticing an issue where the index in a core shows as
current=false.

We have autocommit set for 15 seconds and soft commit at 2 seconds.

This seems to cause two replicas to return different hits depending on
which one is queried.

What would lead to the indexes not being "current"? The documentation on
the meaning of "current" is vague.

The collections in our cloud have two shards, each with two replicas. I see
this with several of the collections.

We don't know how they get like this, but it's troubling.

-- 


This message and any attachment are confidential and may be privileged or 
otherwise protected from disclosure. If you are not the intended recipient, 
you must not copy this message or attachment or disclose the contents to 
any other person. If you have received this transmission in error, please 
notify the sender immediately and delete the message and any attachment 
from your system. Merck KGaA, Darmstadt, Germany and any of its 
subsidiaries do not accept liability for any omissions or errors in this 
message which may arise as a result of E-Mail-transmission or for damages 
resulting from any unauthorized changes of the content of this message and 
any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its 
subsidiaries do not guarantee that this message is free of viruses and does 
not accept liability for any damages caused by any virus transmitted 
therewith.

Click http://www.merckgroup.com/disclaimer to access the German, French, 
Spanish and Portuguese versions of this disclaimer.


Different results for comma and whitespace separated query string using eDisMax Query Parser

2016-10-31 Thread Frank.Zirkelbach
Hi,

different results are obtained for a query separated by comma and one separated 
by whitespace,

   "q":"foo,bar",
   "q":"foo bar",

even though solr.StandardTokenizerFactory is used. The eDisMax query parser is
used, and the fields of interest are determined by the 'qf' parameter.
   
   "defType":"edismax",
   "qf":"STREET_NAME COMMPART_NAME",

The different results are also reflected within the parsedquery debug output:

Whitespace:
"rawquerystring":"foo bar",
"querystring":"foo bar",
"parsedquery":"(+(DisjunctionMaxQuery((STREET_NAME:foo | 
COMMPART_NAME:foo)) DisjunctionMaxQuery((STREET_NAME:bar | 
COMMPART_NAME:bar/no_coord",
"parsedquery_toString":"+((STREET_NAME:foo | COMMPART_NAME:foo) 
(STREET_NAME:bar | COMMPART_NAME:bar))",
"explain":{},
"QParser":"ExtendedDismaxQParser",

Comma:
"rawquerystring":"foo,bar",
"querystring":"foo,bar",
"parsedquery":"(+DisjunctionMaxQuery(((STREET_NAME:foo STREET_NAME:bar) | 
(COMMPART_NAME:foo COMMPART_NAME:bar/no_coord",
"parsedquery_toString":"+((STREET_NAME:foo STREET_NAME:bar) | 
(COMMPART_NAME:foo COMMPART_NAME:bar))",
"explain":{},
"QParser":"ExtendedDismaxQParser",

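A toy Python model that reproduces the two parsedquery_toString values above, assuming (as in Solr 6.x eDisMax) that the raw query is split on whitespace before each field's analyzer runs, and approximating StandardTokenizer with a simple `\w+` regex. Each whitespace chunk becomes one DisjunctionMaxQuery across the qf fields; "foo,bar" is a single chunk, so both of its tokens land inside one dismax instead of two. The function names are hypothetical and `qf` is a Python list here rather than Solr's space-separated string.

```python
import re


def tokenize(chunk):
    """Rough stand-in for StandardTokenizer: runs of word characters.

    Whitespace and punctuation such as "," act as delimiters.
    """
    return re.findall(r"\w+", chunk)


def field_clause(field, tokens):
    """One field's clause: a bare term, or a parenthesized group of terms."""
    terms = [f"{field}:{tok}" for tok in tokens]
    return terms[0] if len(terms) == 1 else "(" + " ".join(terms) + ")"


def toy_edismax(q, qf):
    """Mimic the shape of parsedquery_toString for a plain-terms query."""
    dismaxes = []
    for chunk in q.split():  # whitespace split happens *before* analysis
        tokens = tokenize(chunk)
        if not tokens:
            continue
        clauses = [field_clause(field, tokens) for field in qf]
        dismaxes.append("(" + " | ".join(clauses) + ")")
    if len(dismaxes) == 1:
        return "+" + dismaxes[0]
    return "+(" + " ".join(dismaxes) + ")"


qf = ["STREET_NAME", "COMMPART_NAME"]
print(toy_edismax("foo bar", qf))
print(toy_edismax("foo,bar", qf))
```

Later Solr releases added a sow (split-on-whitespace) parameter that controls this pre-analysis split.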
The way I understand the standard tokenizer, both query strings should be split 
in the same way,
treating whitespace and punctuation as delimiters.

However, obviously, different separators result in different evaluations.
In the first case, the score values of both DisjunctionMaxQuery evaluations are 
added together.
In the second case, only one (the maximum) of these score values is returned.
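The score difference can be worked through with made-up per-field term scores (hypothetical numbers, ignoring coord and query normalization). In this sketch a DisjunctionMaxQuery scores max + tie * (sum - max), with tie=0 unless set, while the clauses of the outer BooleanQuery are summed:

```python
def dismax(scores, tie=0.0):
    """DisjunctionMaxQuery-style combination: best score plus tie * the rest."""
    best = max(scores)
    return best + tie * (sum(scores) - best)


def whitespace_case(scores, terms, fields, tie=0.0):
    """'foo bar': one dismax per term, summed by the outer BooleanQuery."""
    return sum(dismax([scores[(f, t)] for f in fields], tie) for t in terms)


def comma_case(scores, terms, fields, tie=0.0):
    """'foo,bar': one dismax whose per-field clauses each sum both terms."""
    return dismax([sum(scores[(f, t)] for t in terms) for f in fields], tie)


# Hypothetical doc: "foo" matches only STREET_NAME, "bar" only COMMPART_NAME.
fields = ["STREET_NAME", "COMMPART_NAME"]
terms = ["foo", "bar"]
scores = {
    ("STREET_NAME", "foo"): 1.0, ("STREET_NAME", "bar"): 0.0,
    ("COMMPART_NAME", "foo"): 0.0, ("COMMPART_NAME", "bar"): 2.0,
}
print(whitespace_case(scores, terms, fields))  # 3.0
print(comma_case(scores, terms, fields))       # 2.0
```

A doc whose two terms match in different fields therefore ranks higher for "foo bar" than for "foo,bar".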

Any ideas what I am missing here?

I am using Solr 6.2.0.
Configuration details:
   
   
and
   
 
   
   
   
   
   
 
   


Thanks and all the best,

Frank

-- 

Frank Zirkelbach
LEW Verteilnetz GmbH (LVN), GIS/NIS
Schaezlerstraße 3, 86150 Augsburg

Internal phone: 71-1379
External phone: +49-821-328-1379
External fax: +49-821-328-1360
mailto:frank.zirkelb...@lew-verteilnetz.de
www.lew-verteilnetz.de

Chairman of the Supervisory Board: Dr. Markus Litpher;
Managing Directors: Manfred Lux, Theo Schmidtner, Eugen Wiedemann
Registered office: Augsburg; VAT ID: DE240432124
Commercial register HRB 20929, registry court: Amtsgericht Augsburg (Augsburg Local Court)



Re: Solr MLT with stream.body returns different results on each shard

2015-08-11 Thread Chris Hostetter

: I have a fresh install of Solr 5.2.1 with about 3 million docs freshly
: indexed (I can also reproduce this issue on 4.10.0). When I use the Solr
: MoreLikeThisHandler with content stream I'm getting different results per
: shard.

I haven't looked at the code recently, but I'm 99% certain that the MLT
handler in general doesn't work with distributed (i.e., sharded) queries,
unlike the MLT component and the recently added MLT qparser.

I suspect that in the specific case of stream.body, what you are seeing is
that the interesting terms are being computed relative to the local tf/idf
stats for that shard, and then only local results from that shard are
being returned.
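Shard-local statistics can change the outcome. Here is a rough Python sketch of picking "interesting terms" by tf*idf, using Lucene's classic idf formula, 1 + ln(numDocs / (docFreq + 1)). The two shards and their document frequencies are hypothetical. The real MoreLikeThis code also applies further thresholds (mlt.mintf, mlt.mindf and so on), which this sketch leaves out.

```python
import math
from collections import Counter


def interesting_terms(text, doc_freq, num_docs, max_terms=3):
    """Rank the terms of text by tf * idf, using one shard's local statistics.

    doc_freq maps term -> number of documents on this shard containing it;
    num_docs is the shard's total document count.
    """
    tf = Counter(text.lower().split())
    scores = {
        term: freq * (1.0 + math.log(num_docs / (doc_freq.get(term, 0) + 1)))
        for term, freq in tf.items()
    }
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda t: (-scores[t], t))[:max_terms]


text = "electronics camera battery"
shard1 = {"electronics": 900, "camera": 10, "battery": 200}  # hypothetical
shard2 = {"electronics": 5, "camera": 400, "battery": 300}   # hypothetical

print(interesting_terms(text, shard1, num_docs=1000))  # ['camera', 'battery', 'electronics']
print(interesting_terms(text, shard2, num_docs=1000))  # ['electronics', 'battery', 'camera']
```

Each shard ranks the same input text against its own statistics. It therefore builds a different query and, since it returns only its local matches, a different result list.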

: I also looked at using a standard MLT query, but I need to be able to
: stream in a fairly large block of text for comparison that is not in the
: index (different type of document). A standard MLT  query

Until/unless the MLT parser supports arbitrary text (there's some mention
of this in SOLR-7639, but I'm not sure what its status is), you might find
that just POSTing all of your text as a regular query (q) using dismax or
edismax is suitable for your needs. That's essentially the equivalent of
what the MLT handler does with stream.body, except that the handler tries to
focus only on "interesting terms" based on tf/idf; but if your fields are
all configured with stopword files anyway, the results and performance may
be similar.
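Here is a minimal sketch of that suggestion using Python's standard library. The large text goes into the body of a form-encoded POST to /select with defType=edismax, which avoids URL-length limits. The host and collection come from the example URLs in this thread. The qf value "text" and the fl/rows values are assumptions to adjust for your schema.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request


def build_edismax_post(base_url, text, qf, rows=10):
    """Build (without sending) a POST request that runs text as an eDisMax query."""
    body = urlencode({
        "q": text,
        "defType": "edismax",
        "qf": qf,
        "fl": "id,score",
        "rows": rows,
    }).encode("utf-8")
    return Request(
        base_url.rstrip("/") + "/select",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = build_edismax_post("http://testsolr1:8983/solr/mega", "electronics", "text")
sent = parse_qs(req.data.decode("utf-8"))  # decode the body to check what will be sent
# urllib.request.urlopen(req) would actually send the request.
```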


-Hoss
http://www.lucidworks.com/


Solr MLT with stream.body returns different results on each shard

2015-08-11 Thread Aaron Gibbons
I have a fresh install of Solr 5.2.1 with about 3 million docs freshly
indexed (I can also reproduce this issue on 4.10.0). When I use the Solr
MoreLikeThisHandler with content stream I'm getting different results per
shard.

I also looked at using a standard MLT query, but I need to be able to
stream in a fairly large block of text for comparison that is not in the
index (a different type of document). A standard MLT query
http://testsolr2:8983/solr/mega/select?q=electronics&mlt.flt=text&mlt.mintf=0&fl=id,score
appears to return consistent results between shards.

Any reason why the content stream query would be different between shards?
Thank you for your help!
Aaron


*Content Stream Example:*
http://testsolr1:8983/solr/mega/mlt?stream.body=electronics&mlt.flt=text&mlt.mintf=0&fl=id,score
*Returns: *


0
3



http://testsolr2:8983/solr/mega/mlt?stream.body=electronics&mlt.flt=text&mlt.mintf=0&fl=id,score

*Returns: *


0
1




Solr Clustering component different results than Carrot workbench

2014-08-18 Thread Yavar Husain
Although I am discussing this with Dawid (creator of Carrot2) on the Carrot2
mailing list, I wanted to post my problem to a wider audience.

I am using Solr 4.7 (on both Windows and Linux) with the
lingo-attributes.xml file that I saved from the Workbench. Note that for
testing I have just one Solr index, and all the queries are run against it.

The clusters I get in the Workbench (Carrot2) are good, but the ones I get
in Solr are pathetic. In the (Jetty) logs I can see:

Loaded Solr resource: clustering/carrot2/lingo-attributes.xml

which indicates that my attributes file is being loaded.

I am really confused about what accounts for the difference between the two
outputs (Workbench vs. Solr). To reiterate, the data source is the same
(a single Solr index, and the same queries with 100 results). This happens
on both Linux and Windows.

Below is my search component and request handler configuration:



  lingo

  
  org.carrot2.clustering.lingo.LingoClusteringAlgorithm
  30


  
  clustering/carrot2



  

  
  

  true
  true
  
  org.carrot2.clustering.lingo.LingoClusteringAlgorithm
  clustering/carrot2
  film_id
  
  description
  
  true
  
  
  
  false
  100


  clustering

  


Re: Join and non-Join query give different results

2014-07-19 Thread atawfik
I have figured it out. 

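The reason is simply the way join works in Solr. Each {!join} filter query is
evaluated separately, much like an SQL "IN (subquery)", and the resulting sets
of houses are then intersected. So a house is returned if it has some available
document with discount >= 1 (within the start_date window) and some, possibly
different, available document with sd_year:2014 AND sd_month:11, even though my
intention was to apply both conditions to the same available document.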

In the second case, by contrast, both conditions are applied at the same time to
find available documents, and houses are then returned based on the matching
available documents. Since no available document satisfies both conditions,
no house matches, which gives zero results.
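The difference can be reproduced without Solr. In the Python sketch below, each join filter is modeled as the set of house_id values from available documents that match the filter. Separate fq parameters then intersect those sets. The documents are hypothetical, and "in_window" is a stand-in for the start_date range.

```python
# Hypothetical "available" child documents.
available = [
    {"house_id_fk": "h1", "discount": 2, "in_window": True, "sd_year": 2014, "sd_month": 10},
    {"house_id_fk": "h1", "discount": 0, "in_window": True, "sd_year": 2014, "sd_month": 11},
    {"house_id_fk": "h2", "discount": 0, "in_window": False, "sd_year": 2014, "sd_month": 11},
]


def join(predicate):
    # {!join from=house_id_fk to=house_id}: collect parent ids of matching children.
    return {doc["house_id_fk"] for doc in available if predicate(doc)}


def discount_filter(doc):
    return doc["discount"] >= 1 and doc["in_window"]


def month_filter(doc):
    return doc["sd_year"] == 2014 and doc["sd_month"] == 11


# Two fq parameters: each join runs on its own, then the parent sets are intersected.
separate = join(discount_filter) & join(month_filter)

# One fq parameter: both conditions must hold for the same available document.
combined = join(lambda doc: discount_filter(doc) and month_filter(doc))

print(separate)  # {'h1'}
print(combined)  # set()
```

In Solr terms, the second behavior corresponds to putting every condition inside a single join filter, for example fq={!join from=house_id_fk to=house_id}doctype:available AND discount:[1 TO *] AND start_date:[NOW/DAY TO NOW/DAY+21DAYS] AND sd_year:2014 AND sd_month:11 (with "+" URL-encoded as %2B).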

It took me some time to figure this out; I hope this helps someone else.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Join-and-non-Join-query-give-different-results-tp4146922p4148131.html
Sent from the Solr - User mailing list archive at Nabble.com.


Join and non-Join query give different results

2014-07-13 Thread atawfik
Hi everyone,

I am trying to link two types of documents in my Solr index. The parent is
named "house" and the child is named "available". I want to return a list
of houses that have available documents matching some filters. However,
the following query returns around 18 documents, which is wrong; it should
return 0 documents.

q=*:*
&fq={!join from=house_id_fk to=house_id}doctype:available AND discount:[1 TO
*] AND start_date:[NOW/DAY TO NOW/DAY%2B21DAYS]
&fq={!join from=house_id_fk to=house_id}doctype:available AND sd_year:2014
AND sd_month:11

To debug it, I first checked whether there are any available documents matching
the given filter queries, using the following query:
q=*:*
&fq=doctype:available AND discount:[1 TO *] AND start_date:[NOW/DAY TO
NOW/DAY%2B21DAYS]
&fq=doctype:available AND sd_year:2014 AND sd_month:11

This query gives 0 results, which is correct. As you can see, both queries
are the same; the only difference is the use of the join query parser. I am a bit
confused about why the first query returns results. My understanding is that this
should not happen, because the second query shows that there are no
available documents that satisfy the given filter queries.
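The %2B in the queries above is the URL-encoded form of "+". A bare "+" in a query string is decoded as a space, which would break the NOW/DAY+21DAYS date math. When the request is built programmatically, Python's urlencode handles this and also repeats the fq parameter. Here is a sketch:

```python
from urllib.parse import parse_qs, urlencode

params = [
    ("q", "*:*"),
    ("fq", "{!join from=house_id_fk to=house_id}doctype:available AND discount:[1 TO *]"
           " AND start_date:[NOW/DAY TO NOW/DAY+21DAYS]"),
    ("fq", "{!join from=house_id_fk to=house_id}doctype:available"
           " AND sd_year:2014 AND sd_month:11"),
]

query_string = urlencode(params)  # '+' becomes %2B, spaces become '+'
decoded = parse_qs(query_string)  # round-trip: what the server will see

print("%2B21DAYS" in query_string)  # True
print(len(decoded["fq"]))           # 2
```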



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Join-and-non-Join-query-give-different-results-tp4146922.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Debug different Results from different Request Handlers

2014-06-18 Thread O. Olson
Thank you, Erik (and thanks to steffkes, who helped me in the #Solr IRC chat).
Sorry for the delay in responding, but I got this to work.

Your suggestion about adding debug=true to the query helped me. Since I was
adding it to the Velocity request handler, I could not see the debug output,
but when I also added wt=xml, i.e. /products?q=hp|lync&debug=true&wt=xml,
I could see the parsed query as well as the query parser used by each handler.

Thanks also to steffkes, who answered the question from my original post (on
IRC): both of my handlers go through
org.apache.solr.servlet.SolrDispatchFilter, and in particular its doFilter()
method is what I was looking for.

Also, as steffkes pointed out (based on my original post), the /products
request handler uses the ExtendedDismaxQParser, whereas the second request
handler (/search or /select) uses the LuceneQParser. It seems that these two
parsers handle the | sign very differently. For my limited private
installation, I went to the common base class of ExtendedDismaxQParser and
LuceneQParser, i.e. QParser, and stripped the | sign from the qstr parameter
in its constructor. This is probably the dirtiest way to get this to work,
but it works for now.
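For comparison, the same transformation could be applied before the request reaches Solr, leaving QParser untouched. The sketch below is hypothetical. The post does not say whether | is deleted or replaced, so this version replaces it with a space and collapses repeated whitespace.

```python
import re


def strip_pipes(qstr):
    # Replace each '|' with a space, then collapse runs of whitespace.
    return re.sub(r"\s+", " ", qstr.replace("|", " ")).strip()


print(strip_pipes("hp|lync"))  # hp lync
```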

Thanks again to you all.
O. O. 

 




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Debug-different-Results-from-different-Request-Handlers-tp4141804p4142716.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Debug different Results from different Request Handlers

2014-06-16 Thread Erik Hatcher
If you want the two request handlers to behave the same, with only the
Velocity settings differing, then remove everything except echoParams, wt,
v.template, v.base_dir, and v.layout (and title, if your templates use it;
the default ones do).

You can see which query parser is being used by adding debug=true to the
request (or debugQuery=true, the legacy parameter).
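With wt=json, the same information appears under the "debug" key of the response. Here is a small sketch that extracts the parser name and the parsed query. The sample is abbreviated from an eDisMax debug output posted elsewhere on this list, not taken from this thread's handlers.

```python
import json


def summarize_debug(response_text):
    """Return (QParser, parsedquery_toString) from a debug=true&wt=json response."""
    debug = json.loads(response_text).get("debug", {})
    return debug.get("QParser"), debug.get("parsedquery_toString")


sample = """{
  "debug": {
    "rawquerystring": "foo bar",
    "querystring": "foo bar",
    "parsedquery_toString": "+((STREET_NAME:foo | COMMPART_NAME:foo) (STREET_NAME:bar | COMMPART_NAME:bar))",
    "QParser": "ExtendedDismaxQParser"
  }
}"""

parser, parsed = summarize_debug(sample)
print(parser)  # ExtendedDismaxQParser
```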

Erik

On Jun 14, 2014, at 1:47 PM, O. Olson  wrote:

> Thank you, Erik. I tried /products?q=hp|lync&wt=xml and I get no results, i.e.
> numFound="0", so I think something is wrong. You are correct that the
> problem is not the VRW but the query parser. Could you please let me know
> how to determine which query parser is used?
> 
> For the most part, I have not changed these request handlers from the Solr
> examples. The request handler that uses Apache Velocity looks like this:
> 
> 
> 
>   explicit
>   velocity
>   browse
>  true
>  VMTemplates
>   layout
>   Solritas
>   edismax
>   
>  text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
>  title^10.0 description^5.0 keywords^5.0 author^2.0
> resourcename^1.0
>   
>   text
>   100%
>   *:*
>   10
>   *,score
>   
> text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
> title^10.0 description^5.0 keywords^5.0 author^2.0 resourcename^1.0
>
>name="mlt.fl">text,features,name,sku,id,manu,cat,title,description,keywords,author,resourcename
>   3
>   on
>  CategoryID
>   on
>   false   
>   5
>   2
>   5   
>   true
>   true  
>   5
>   3  
> 
> 
>   spellcheck
> 
>  
> 
> And the regular XML handler looks like: 
> 
>  class="org.apache.solr.handler.component.SearchHandler">
>
>  explicit
>
>  
> 
> Does this show which query parser is used? I can post more of my
> solrconfig.xml if necessary.
> 
> I am curious where the query parser hands the parameters over to the part of
> the Solr engine that is common to all request handlers. I am trying
> to put debugging statements into that common code so that it can dump
> intermediate results to the log.
> 
> Thanks again Erik.
> O. O.
> 
> 
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Debug-different-Results-from-different-Request-Handlers-tp4141804p4141859.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Debug different Results from different Request Handlers

2014-06-14 Thread O. Olson
Thank you, Erik. I tried /products?q=hp|lync&wt=xml and I get no results, i.e.
numFound="0", so I think something is wrong. You are correct that the
problem is not the VRW but the query parser. Could you please let me know
how to determine which query parser is used?

For the most part, I have not changed these request handlers from the Solr
examples. The request handler that uses Apache Velocity looks like this:


 
   explicit
   velocity
   browse
   true
   VMTemplates
   layout
   Solritas
   edismax
   
  text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
  title^10.0 description^5.0 keywords^5.0 author^2.0
resourcename^1.0
   
   text
   100%
   *:*
   10
   *,score
   
 text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
 title^10.0 description^5.0 keywords^5.0 author^2.0 resourcename^1.0

   text,features,name,sku,id,manu,cat,title,description,keywords,author,resourcename
   3
   on
   CategoryID
   on
   false   
   5
   2
   5   
   true
   true  
   5
   3  
 
 
   spellcheck
 
  

And the regular XML handler looks like: 



  explicit

  

Does this show which query parser is used? I can post more of my
solrconfig.xml if necessary.

I am curious where the query parser hands the parameters over to the part of
the Solr engine that is common to all request handlers. I am trying
to put debugging statements into that common code so that it can dump
intermediate results to the log.

Thanks again Erik.
O. O.






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Debug-different-Results-from-different-Request-Handlers-tp4141804p4141859.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Debug different Results from different Request Handlers

2014-06-14 Thread Erik Hatcher
Try /products?wt=xml and compare.  VRW is just a writer; it doesn't affect the 
results in any way.   Let's see the rest of those handler definitions - 
different query parser is my hunch. Or maybe your velocity template is not 
showing the actual results?

  Erik

> On Jun 13, 2014, at 22:44, "O. Olson"  wrote:
> 
> Hi,
> 
> In my solrconfig.xml I have one request handler that displays the results using
> Apache Velocity:
> 
>   
> 
> And another with regular XML: 
>  class="org.apache.solr.handler.component.SearchHandler">
> 
> I am seeing different results when I use these two handlers. 
> 
> Search query: hp|lync (or, URL-encoded, q=hp%7Clync)
> 
> I see 0 results when I use the first handler (Velocity), but I see many
> results (tens) with the second handler. I am trying to debug why this problem
> occurs. I am certain the problem is with the first handler, and I would be
> grateful if anyone can help me debug this. I do not know Solr well enough, so
> a few pointers would help.
> 
> 1. First, are class="solr.SearchHandler" and
> class="org.apache.solr.handler.component.SearchHandler" the same? If not,
> what does "solr.SearchHandler" refer to?
> 
> 2. Second, I am working with the source of Solr 4.7 (yes, it is a bit old,
> but I don't think it has changed fundamentally). I have put log.debug()
> statements in the org.apache.solr.response.VelocityResponseWriter.write()
> method to verify that my query is not getting mangled by URL encoding,
> and it is not. Since I am getting different results for the same query,
> I would like to see what the core Solr engine receives when I run the
> same query from different handlers. Could someone tell me which class contains
> the core Solr engine that is used regardless of which request handler makes
> the request? I want to put debug statements into this class to log the
> value of the query parameter that it receives. The results are different, so
> I think one or more parameters are different.
> 
> Thank you in advance,
> O. O.
> 


Debug different Results from different Request Handlers

2014-06-13 Thread O. Olson
Hi,

In my solrconfig.xml I have one request handler that displays the results using
Apache Velocity:

  

And another with regular XML: 


I am seeing different results when I use these two handlers. 

Search query: hp|lync (or, URL-encoded, q=hp%7Clync)
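For reference when encoding the query by hand, "|" percent-encodes as %7C, while %7E is the encoding of "~". Here is a quick check with Python's standard library:

```python
from urllib.parse import quote, unquote

encoded = quote("hp|lync")   # percent-encode the raw query
print(encoded)               # hp%7Clync
print(unquote("hp%7Elync"))  # hp~lync
```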

I see 0 results when I use the first handler (Velocity), but I see many results
(tens) with the second handler. I am trying to debug why this problem occurs.
I am certain the problem is with the first handler, and I would be grateful if
anyone can help me debug this. I do not know Solr well enough, so a few
pointers would help.

1. First, are class="solr.SearchHandler" and
class="org.apache.solr.handler.component.SearchHandler" the same? If not,
what does "solr.SearchHandler" refer to?

2. Second, I am working with the source of Solr 4.7 (yes, it is a bit old, but
I don't think it has changed fundamentally). I have put log.debug() statements
in the org.apache.solr.response.VelocityResponseWriter.write() method to verify
that my query is not getting mangled by URL encoding, and it is not. Since
I am getting different results for the same query, I would like to see
what the core Solr engine receives when I run the same query from different
handlers. Could someone tell me which class contains the core Solr engine that
is used regardless of which request handler makes the request? I want to
put debug statements into this class to log the value of the query parameter
that it receives. The results are different, so I think one or more parameters
are different.

Thank you in advance,
O. O.



Re: Luke and SOLR search giving different results

2012-12-04 Thread Erol Akarsu
Thanks Shawn and Jack,

I changed solrconfig to set the default query field (qf) to field content. It
now works fine.

Erol Akarsu

On Mon, Dec 3, 2012 at 5:03 PM, Shawn Heisey  wrote:

> On 12/3/2012 1:44 PM, Erol Akarsu wrote:
>
>> I tried "features:baş" instead of "baş" as the search query in the "q"
>> field of the Solr GUI. And I got a result!
>>
>> In that one document, I had some fields of type text_eng and text_general
>> and one field, features, of type text_tr. If I don't specify a field name,
>> Solr uses the EnglishAnalyzer. If I do, it uses the analyzer of the field
>> specified in the query string.
>>
>
> Your config is set up to search against a field named "text" by default -
> either by a setting in schema.xml or a "df" parameter in your search
> handler definition in solrconfig.xml.  If you are using (e)dismax, it might
> be qf/pf parameters instead of df.
>
> The field named text is not properly set up for this search.  Your
> attachment at the beginning of this thread indicates that either you do not
> have a text field for this document at all, or that field is not stored.
>  If the text field is a copyField as Jack has mentioned, note that it
> doesn't matter what analysis you are doing on features -- the copy is done
> before analysis, so it is completely separate.
>
> Thanks,
> Shawn
>
>


Re: Luke and SOLR search giving different results

2012-12-03 Thread Shawn Heisey

On 12/3/2012 1:44 PM, Erol Akarsu wrote:

I tried "features:baş" instead of "baş" as the search query in the "q"
field of the Solr GUI. And I got a result!

In that one document, I had some fields of type text_eng and text_general
and one field, features, of type text_tr. If I don't specify a field name,
Solr uses the EnglishAnalyzer. If I do, it uses the analyzer of the field
specified in the query string.


Your config is set up to search against a field named "text" by default 
- either by a setting in schema.xml or a "df" parameter in your search 
handler definition in solrconfig.xml.  If you are using (e)dismax, it 
might be qf/pf parameters instead of df.


The field named text is not properly set up for this search.  Your 
attachment at the beginning of this thread indicates that either you do 
not have a text field for this document at all, or that field is not 
stored.  If the text field is a copyField as Jack has mentioned, note 
that it doesn't matter what analysis you are doing on features -- the 
copy is done before analysis, so it is completely separate.
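Shawn's point about copyField can be illustrated with a toy model in Python. Both analyzers below are hypothetical stand-ins. The "Turkish" one just strips a few suffixes, whereas the real text_tr type in this thread uses Turkish stopwords and stemming. The raw value of "features" is copied to "text" before analysis, so each field indexes different terms. A query for "baş" only matches the field whose analyzer produced that stem.

```python
def general_analyzer(text):
    # Stand-in for a text_general-style type: whitespace split + lowercase, no stemming.
    return [token.lower() for token in text.split()]


TOY_SUFFIXES = ("tan", "ten", "dan", "den")


def toy_turkish_analyzer(text):
    # Hypothetical stand-in for text_tr: lowercase, then strip one of a few suffixes.
    stems = []
    for token in general_analyzer(text):
        for suffix in TOY_SUFFIXES:
            if token.endswith(suffix) and len(token) > len(suffix) + 1:
                token = token[: -len(suffix)]
                break
        stems.append(token)
    return stems


FIELD_ANALYZERS = {"features": toy_turkish_analyzer, "text": general_analyzer}
COPY_FIELDS = {"features": "text"}  # copyField source -> dest


def index_document(doc):
    # copyField copies the raw value; every field then runs its own analyzer.
    values = dict(doc)
    for source, dest in COPY_FIELDS.items():
        if source in doc:
            values[dest] = doc[source]
    return {field: set(FIELD_ANALYZERS[field](value)) for field, value in values.items()}


def matches(indexed, field, query_text):
    # Query text is analyzed with the analyzer of the field being searched.
    return all(term in indexed[field] for term in FIELD_ANALYZERS[field](query_text))


indexed = index_document({"features": "baştan savma"})
print(matches(indexed, "text", "baş"))      # False: "text" holds "baştan"
print(matches(indexed, "features", "baş"))  # True: "features" holds the stem "baş"
```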


Thanks,
Shawn



Re: Luke and SOLR search giving different results

2012-12-03 Thread Jack Krupansky
As I pointed out in my message, your query is indicating that "text" is your
default search field. So either choose a different default search field, or
ensure that the "text" field has the desired field type.


If you want to change the default search field, either use a "df" request
parameter or change the "df" default value for the request handler in
solrconfig.xml.
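Naming the field in the query (the "features:baş" form) and overriding df per request can be compared side by side. Here is a minimal Python sketch that builds both query strings. "features" is the field from this thread. The output also shows that "ş" travels as the UTF-8 bytes %C5%9F, which is why the Tomcat URIEncoding="UTF-8" setting discussed in this thread matters.

```python
from urllib.parse import urlencode


def field_query_params(term, field):
    """Two ways to search one field instead of the handler's default field."""
    explicit_field = urlencode({"q": f"{field}:{term}"})  # field named in the query
    via_df = urlencode({"q": term, "df": field})          # default field overridden per request
    return explicit_field, via_df


explicit_field, via_df = field_query_params("baş", "features")
print(explicit_field)  # q=features%3Aba%C5%9F
print(via_df)          # q=ba%C5%9F&df=features
```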


-- Jack Krupansky

-Original Message- 
From: Erol Akarsu

Sent: Monday, December 03, 2012 3:44 PM
To: solr-user@lucene.apache.org
Subject: Re: Luke and SOLR search giving different results

Jack,

I see interesting stuff here now.

I tried "features:baş" instead of "baş" as the search query in the "q"
field of the Solr GUI. And I got a result!

In that one document, I had some fields of type text_eng and text_general
and one field, features, of type text_tr. If I don't specify a field name,
Solr uses the EnglishAnalyzer. If I do, it uses the analyzer of the field
specified in the query string.

Is this true?

Erol Akarsu

On Mon, Dec 3, 2012 at 1:30 PM, Erol Akarsu  wrote:


Jack,

I have these definitions in schema.xml, which define "features" as type text_tr.

But unfortunately, this fails.


 




  
 


 
  
  




 
  





On Mon, Dec 3, 2012 at 1:15 PM, Jack Krupansky 
wrote:


Ah! See where parsedquery_toString says "text:baş"?

Your query is against the "text" field, which probably doesn't have the
Turkish analysis.

There is probably a copyField from "features" to "text". You use the
"text_tr" field type for "features", but probably not for the "text" 
field.



-- Jack Krupansky

-----Original Message- From: Erol Akarsu
Sent: Monday, December 03, 2012 1:06 PM

To: solr-user@lucene.apache.org
Subject: Re: Luke and SOLR search giving different results

Jack,

I had already configured the Tomcat server for UTF-8 encoding: I added
URIEncoding="UTF-8" to all <Connector> elements in server.xml in Tomcat 7.

As you can see below, when I search for the word "baş" in debug mode, I get an
empty response. But when I search for the word "baştan", I get the correct
response.

It seems to me that the TurkishAnalyzer is not being used in Solr search,
because only the full-word search "baştan" works, not the root word "baş".
Probably the EnglishAnalyzer is being used and cannot find the root word. For
example, in Luke, if I change "Analyser to use for query parsing" to
EnglishAnalyzer, it cannot find the word "baş", but it can with
TurkishAnalyzer. I guess Solr is not using the TurkishAnalyzer.

Is this assumption true? I could not find any other reason.




   
   0
   58
   
   true
   baş
   xml
   
   
   
   
   baş
   baş
   text:baş
   text:baş
   
   LuceneQParser
   
   38.0
   
   16.0
   
   3.0
   
   
   0.0
   
   
   0.0
   
   
   0.0
   
   
   0.0
   
   
   0.0
   
   
   
   10.0
   
   0.0
   
   
   0.0
   
   
   0.0
   
   
   0.0
   
   
   0.0
   
   
   10.0
   
   
   
   



   
   0
   2
   
   true
   baştan
   xml
   
   
   
   
   htt://111.a.b1
   6H500F0
   tr
   Maxtor DiamondMax 11 - hard drive - 500 GB -
SATA-300
   
   Maxtor Corp.
   maxtor
   
   electronics
   hard drive
   
   
   SATA 3.0Gb/s, NCQ
   8.5ms seek
   16MB cache
   
   Firmalarsa "Nasılsa buldum oynatacak ünlüyü, neyleyim
senaryoyu!" diyerek
   baştan savma reklamlarla kotarmaya bakıyor işi.
Futbolcu Arda Turan
   ve büyük umutlarla Türkiye'ye getirilen Paris 
Hilton'un

oynatıldığı
   giyim firması reklamı da tam bir fiyasko. Birbirinden
ünlü bu iki
   ismin oynadığı reklam Arda'nın kabinde papağan gibi
tekrarladığı
   "My darling!" repliği, sonunda Paris'i görünce anlam
veremediğimiz
   uyduruk bayılma sahnesi, bir de Paris'in ancak 5 kez
izle

Re: Luke and SOLR search giving different results

2012-12-03 Thread Erol Akarsu
Jack,

I see interesting stuff here now.

I tried "features:baş" instead of "baş" as the search query in the "q"
field of the Solr GUI. And I got a result!

In that one document, I had some fields of type text_eng and text_general
and one field, features, of type text_tr. If I don't specify a field name,
Solr uses the EnglishAnalyzer. If I do, it uses the analyzer of the field
specified in the query string.

Is this true?

Erol Akarsu

On Mon, Dec 3, 2012 at 1:30 PM, Erol Akarsu  wrote:

> Jack,
>
> I have these definitions in schema.xml, which define "features" as type text_tr.
>
> But unfortunately, this fails.
>
>
>   multiValued="true"/>
> 
>
>
>  positionIncrementGap="100">
>   
>  
> 
>  words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
>   language="Turkish"/>
>   
>   
>
> 
> 
>  words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
>   language="Turkish"/>
>   
> 
>
>
>
>
> On Mon, Dec 3, 2012 at 1:15 PM, Jack Krupansky wrote:
>
>> Ah! See where it says "text:baş"?
>> Your query is against the "text" field, which probably doesn't have the
>> Turkish analysis.
>>
>> There is probably a copyField from "features" to "text". You use the
>> "text_tr" field type for "features", but probably not for the "text" field.
>>
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Erol Akarsu
>> Sent: Monday, December 03, 2012 1:06 PM
>>
>> To: solr-user@lucene.apache.org
>> Subject: Re: Luke and SOLR search giving different results
>>
>> Jack,
>>
>> I had already configured the Tomcat server for UTF-8 encoding: I added
>> URIEncoding="UTF-8" to all <Connector> elements in server.xml in Tomcat 7.
>>
>> As you can see below, when I search for the word "baş" in debug mode, I get
>> an empty response. But when I search for the word "baştan", I get the
>> correct response.
>>
>> It seems to me that the TurkishAnalyzer is not being used in Solr search,
>> because only the full-word search "baştan" works, not the root word "baş".
>> Probably the EnglishAnalyzer is being used and cannot find the root word.
>> For example, in Luke, if I change "Analyser to use for query parsing" to
>> EnglishAnalyzer, it cannot find the word "baş", but it can with
>> TurkishAnalyzer. I guess Solr is not using the TurkishAnalyzer.
>>
>> Is this assumption true? I could not find any other reason.
>>
>>
>> 
>> 
>>
>>0
>>58
>>
>>true
>>baş
>>xml
>>
>>
>>
>>
>>baş
>>baş
>>text:baş
>>text:baş
>>
>>LuceneQParser
>>
>>38.0
>>
>>16.0
>>> name="org.apache.solr.handler.component.QueryComponent">
>>3.0
>>
>>> name="org.apache.solr.handler.component.FacetComponent">
>>0.0
>>
>>> name="org.apache.solr.handler.component.MoreLikeThisComponent">
>>0.0
>>
>>> name="org.apache.solr.handler.component.HighlightComponent">
>>0.0
>>
>>> name="org.apache.solr.handler.component.StatsComponent">
>>0.0
>>
>>> name="org.apache.solr.handler.component.DebugComponent">
>>0.0
>>
>>
>>
>>10.0
>>> name="org.apache.solr.handler.component.QueryComponent">
>>0.0
>>
>>> name="org.apache.solr.handler.component.FacetComponent">
>>0.0
>>
>>> name="org.apache.solr.handler.component.MoreLikeThisComponent">
>>0.0
>>
>>>

Re: Luke and SOLR search giving different results

2012-12-03 Thread Erol Akarsu
Jack,

I have these definitions in schema.xml, which define "features" as type text_tr.

But unfortunately, this fails.

 



  




  
  




  




On Mon, Dec 3, 2012 at 1:15 PM, Jack Krupansky wrote:

> Ah! See where it says "text:baş"?
> Your query is against the "text" field, which probably doesn't have the
> Turkish analysis.
>
> There is probably a copyField from "features" to "text". You use the
> "text_tr" field type for "features", but probably not for the "text" field.
>
>
> -- Jack Krupansky
>
> -Original Message- From: Erol Akarsu
> Sent: Monday, December 03, 2012 1:06 PM
>
> To: solr-user@lucene.apache.org
> Subject: Re: Luke and SOLR search giving different results
>
> Jack,
>
> I had already configured the Tomcat server for UTF-8 encoding: I added
> URIEncoding="UTF-8" to all <Connector> elements in server.xml in Tomcat 7.
>
> As you can see below, when I search for the word "baş" in debug mode, I get
> an empty response. But when I search for the word "baştan", I get the
> correct response.
>
> It seems to me that the TurkishAnalyzer is not being used in Solr search,
> because only the full-word search "baştan" works, not the root word "baş".
> Probably the EnglishAnalyzer is being used and cannot find the root word.
> For example, in Luke, if I change "Analyser to use for query parsing" to
> EnglishAnalyzer, it cannot find the word "baş", but it can with
> TurkishAnalyzer. I guess Solr is not using the TurkishAnalyzer.
>
> Is this assumption true? I could not find any other reason.
>
>
> 
> 
>
>0
>58
>
>true
>baş
>xml
>
>
>
>
>baş
>baş
>text:baş
>text:baş
>
>LuceneQParser
>
>38.0
>
>16.0
> name="org.apache.solr.handler.component.QueryComponent">
>3.0
>
> name="org.apache.solr.handler.component.FacetComponent">
>0.0
>
> name="org.apache.solr.handler.component.MoreLikeThisComponent">
>0.0
>
> name="org.apache.solr.handler.component.HighlightComponent">
>0.0
>
> name="org.apache.solr.handler.component.StatsComponent">
>0.0
>
> name="org.apache.solr.handler.component.DebugComponent">
>0.0
>
>
>
>10.0
> name="org.apache.solr.handler.component.QueryComponent">
>0.0
>
> name="org.apache.solr.handler.component.FacetComponent">
>0.0
>
> name="org.apache.solr.handler.component.MoreLikeThisComponent">
>0.0
>
> name="org.apache.solr.handler.component.HighlightComponent">
>0.0
>
> name="org.apache.solr.handler.component.StatsComponent">
>0.0
>
> name="org.apache.solr.handler.component.DebugComponent">
>10.0
>
>
>
>
> 
>
> 
>
>0
>2
>
>true
>baştan
>xml
>
>
>
>
>htt://111.a.b1
>6H500F0
>tr
>Maxtor DiamondMax 11 - hard drive - 500 GB -
> SATA-300
>
>Maxtor Corp.
>maxtor
>
>electronics
>hard drive
>
>
>SATA 3.0Gb/s, NCQ
>8.5ms seek
>16MB cache
>
>Firmalarsa "Nasılsa buldum oynatacak ünlüyü, neyleyim
> senaryoyu!" diyerek
>baştan savma reklamlarla kotarmaya bakıyor işi.
> Futbolcu Arda Turan
>ve büyük umutlarla Türkiye'ye getirilen Paris Hilton'un
> oynatıldığı
&

Re: Luke and SOLR search giving different results

2012-12-03 Thread Jack Krupansky
Ah! See where it says "text:baş"? 
Your query is against the "text" field, which probably doesn't have the 
Turkish analysis.


There is probably a copyField from "features" to "text". You use the 
"text_tr" field type for "features", but probably not for the "text" field.


-- Jack Krupansky

-Original Message- 
From: Erol Akarsu

Sent: Monday, December 03, 2012 1:06 PM
To: solr-user@lucene.apache.org
Subject: Re: Luke and SOLR search giving different results

Jack,

I had already configured the Tomcat server for UTF-8 encoding: I added
URIEncoding="UTF-8" to all <Connector> elements in server.xml in Tomcat 7.

As you can see below, when I search for the word "baş" in debug mode, I get an
empty response. But when I search for the word "baştan", I get the correct
response.

It seems to me that the TurkishAnalyzer is not being used in Solr search,
because only the full-word search "baştan" works, not the root word "baş".
Probably the EnglishAnalyzer is being used and cannot find the root word. For
example, in Luke, if I change "Analyser to use for query parsing" to
EnglishAnalyzer, it cannot find the word "baş", but it can with
TurkishAnalyzer. I guess Solr is not using the TurkishAnalyzer.

Is this assumption true? I could not find any other reason.




   
   0
   58
   
   true
   baş
   xml
   
   
   
   
   baş
   baş
   text:baş
   text:baş
   
   LuceneQParser
   
   38.0
   
   16.0
   
   3.0
   
   
   0.0
   
   
   0.0
   
   
   0.0
   
   
   0.0
   
   
   0.0
   
   
   
   10.0
   
   0.0
   
   
   0.0
   
   
   0.0
   
   
   0.0
   
   
   0.0
   
   
   10.0
   
   
   
   



   
   0
   2
   
   true
   baştan
   xml
   
   
   
   
   htt://111.a.b1
   6H500F0
   tr
   Maxtor DiamondMax 11 - hard drive - 500 GB -
SATA-300
   
   Maxtor Corp.
   maxtor
   
   electronics
   hard drive
   
   
   SATA 3.0Gb/s, NCQ
   8.5ms seek
   16MB cache
   
   Firmalarsa "Nasılsa buldum oynatacak ünlüyü, neyleyim
senaryoyu!" diyerek
   baştan savma reklamlarla kotarmaya bakıyor işi.
Futbolcu Arda Turan
   ve büyük umutlarla Türkiye'ye getirilen Paris Hilton'un
oynatıldığı
   giyim firması reklamı da tam bir fiyasko. Birbirinden
ünlü bu iki
   ismin oynadığı reklam Arda'nın kabinde papağan gibi
tekrarladığı
   "My darling!" repliği, sonunda Paris'i görünce anlam
veremediğimiz
   uyduruk bayılma sahnesi, bir de Paris'in ancak 5 kez
izledikten
   sonra anlaşılan "Paris seçti, firma yaptı, Arda
bayıldı."
   sözleriyle kazındı hafızalara, "Keşke unutabilsek!"
dedirterek.
   
   
   350.0
   350,USD
   6
   true
   2006-02-13T15:26:37Z
   1420300467908378624
   
   
   
   baştan
   baştan
   text:baştan
   text:baştan
   
   
   0.028767452 = (MATCH) weight(text:baştan in 0) [DefaultSimilarity], result of:
   0.028767452 = fieldWeight in 0, product of:
   1.0 = tf(freq=1.0), with freq of:
   1.0 = termFreq=1.0
   0.30685282 = idf(docFreq=1, maxDocs=1)
   0.09375 = fieldNorm(doc=0)
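The explain output for "baştan" can be checked by hand against Lucene's classic TF-IDF formulas (DefaultSimilarity here): tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), and fieldWeight = tf * idf * fieldNorm. The field norm, 0.09375, is taken from the output rather than recomputed, since Lucene stores norms in a lossy single-byte encoding.

```python
import math


def classic_tf(freq):
    # DefaultSimilarity: tf = sqrt(freq)
    return math.sqrt(freq)


def classic_idf(doc_freq, max_docs):
    # DefaultSimilarity: idf = 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, matching the explain tree
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


print(round(classic_idf(doc_freq=1, max_docs=1), 8))  # 0.30685282
print(round(field_weight(1.0, 1, 1, 0.09375), 9))     # 0.028767452
```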
   
   
   LuceneQParser
   
   2.0
   
   1.0
   
   1.0
   
   
   0.0
   
   
   0.0
   
   
   0.0
   
   
   0.0
   
   
   0.0
   
   
   
   1.0
   
   0.0
   
   
   0.0
   
   
   0.0
   
   
   0.0
   
   
  

Re: Luke and SOLR search giving different results

2012-12-03 Thread Erol Akarsu
expected term that matches what Luke reports for the
> index and what Solr Admin Analysis also reports for index analysis.
>
> -- Jack Krupansky
>
> -Original Message- From: Erol Akarsu
> Sent: Monday, December 03, 2012 11:35 AM
>
> To: solr-user@lucene.apache.org
> Subject: Re: Luke and SOLR search giving different results
>
> Jack,
>
> Yes.
>
> I expect SOLR should give same search results as Luked does.
>
> Term analyzer gives correct answer in SOLR as expected. But SOLR does not
> return correct search results.
>
> I don't know why.
>
> Erol Akarsu
>
> On Mon, Dec 3, 2012 at 11:21 AM, Jack Krupansky
> wrote:
>
>  So, does that highlight the problem for you or not? Is the term analyzed
>> as you expected?
>>
>> -- Jack Krupansky
>>
>> From: Erol Akarsu
>> Sent: Monday, December 03, 2012 8:44 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Luke and SOLR search giving different results
>>
>> Jack,
>>
>> Thanks for help.
>>
>> I removed data folder  of SOLR and indexed this sample doc from scratch,
>> there was no document in SOLR but only one.
>>
>> When I analysed , I can see stemming is correct and I can see these for
>> words "bul", "baş" ,"gör" and "umut" in SF row
>> I attached analyse screens
>>
>> Erol Akarsu
>>
>>
>> On Sun, Dec 2, 2012 at 11:00 PM, Jack Krupansky 
>> wrote:
>>
>>   Have you tried using the Solr Admin Analysis page, using the word and a
>> few words of context for index analysis and the word alone for query
>> analysis?
>>
>>   And be sure to fully reindex if you change ANYTHING in the schema fields
>> or field types.
>>
>>   -- Jack Krupansky
>>
>>   From: Erol Akarsu
>>   Sent: Sunday, December 02, 2012 10:38 PM
>>   To: solr-user@lucene.apache.org
>>   Subject: Luke and SOLR search giving different results
>>
>>
>>   Hi,
>>
>>   I am trying to apply SOLR for Turkish Language for my research.
>>
>>   Instead of using language identification, I manually assigned Turkish
>> language for a sample test document. I have configured SOLR schema.xml,
>> activated the part below. I have added the attached document
>> testTurkishDoc.xml that is inserted to SOLR database.
>>
>>   But searching for raw Lucene index through Luke and SOLR 4.0 search
>> though GUI is giving different results. In picture Selection_006.png, the
>> word "baş" is listed as top term. I search the word "baş" in Luke and I got
>> the result result that is only document, shown in Selection_004.png.
>>
>>   But in SOLR GUI, I am getting empty result for word "baş" in picture
>> Selection_002.png.
>>
>>   In the text we have  features field, that has word "baştan" that is
>> being derived from root word "baş" in Turkish Grammar. Somehow, SOLR GUI is
>> doing search different than Luke. I could not figure it out why I could not
>> find it while getting in Luke. The same thing happens for words "umut",
>> "bul" and "gör".
>>
>>   I will appreciate if you can help me to get same results from SOLR UI.
>>
>>
>>   
>>  Firmalarsa "Nasılsa buldum oynatacak ünlüyü, neyleyim senaryoyu!"
>> diyerek baştan savma reklamlarla kotarmaya bakıyor işi. Futbolcu Arda Turan
>> ve büyük umutlarla Türkiye'ye getirilen Paris Hilton'un oynatıldığı giyim
>> firması reklamı da tam bir fiyasko. Birbirinden ünlü bu iki ismin oynadığı
>> reklam Arda'nın kabinde papağan gibi tekrarladığı "My darling!" repliği,
>> sonunda Paris'i görünce anlam veremediğimiz uyduruk bayılma sahnesi, bir de
>> Paris'in ancak 5 kez izledikten sonra anlaşılan "Paris seçti, firma yaptı,
>> Arda bayıldı." sözleriyle kazındı hafızalara, "Keşke unutabilsek!"
>> dedirterek.
>> 
>>
>>
>>
>>   Added to schema.xml for SOLR:
>>
>>   > multiValued="true"/>
>>   > positionIncrementGap="100">
>> 
>>   
>>   
>>   > words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
>>   > language="Turkish"/>
>> 
>> 
>>   
>>   
>>   > words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
>>   > language="Turkish"/>
>> 
>>   
>>
>>
>>
>>
>>
>


Re: Luke and SOLR search giving different results

2012-12-03 Thread Jack Krupansky

Two points:

1. Possibly an encoding problem with your container? Is UTF-8 encoding
enabled?
2. Add &debugQuery=true to your query (from the browser) and see if the
parsedquery has the expected term that matches what Luke reports for the
index and what Solr Admin Analysis also reports for index analysis.
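Both checks can be scripted. The Python sketch below assumes a local Solr 4.0 instance; the host, port and the example core name "collection1" are assumptions, not taken from this thread. It builds a debugQuery request for "baş" and shows what the q parameter turns into if a servlet container decodes the query string as ISO-8859-1 instead of UTF-8. For example, Tomcat of that era did this unless URIEncoding="UTF-8" was set on its HTTP connector.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint: host, port and core name are assumptions
SELECT_URL = "http://localhost:8983/solr/collection1/select"


def debug_url(term):
    # urlencode percent-encodes non-ASCII text as UTF-8 by default
    params = {"q": term, "debugQuery": "true", "wt": "xml"}
    return SELECT_URL + "?" + urlencode(params)


def decoded_q(url, encoding):
    # Decode the q parameter the way a container using this URI encoding would
    return parse_qs(urlsplit(url).query, encoding=encoding)["q"][0]


url = debug_url("baş")
print(url)  # ...select?q=ba%C5%9F&debugQuery=true&wt=xml
print(decoded_q(url, "utf-8"))          # baş
print(repr(decoded_q(url, "latin-1")))  # 'baÅ\x9f', a term that is not in the index
```

If parsedquery in the debug output shows something like the second form, the container encoding is the problem. If it shows text:baş, the query reached Solr intact.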


-- Jack Krupansky

-Original Message- 
From: Erol Akarsu

Sent: Monday, December 03, 2012 11:35 AM
To: solr-user@lucene.apache.org
Subject: Re: Luke and SOLR search giving different results

Jack,

Yes.

I expect SOLR should give same search results as Luked does.

Term analyzer gives correct answer in SOLR as expected. But SOLR does not
return correct search results.

I don't know why.

Erol Akarsu

On Mon, Dec 3, 2012 at 11:21 AM, Jack Krupansky 
wrote:



So, does that highlight the problem for you or not? Is the term analyzed
as you expected?

-- Jack Krupansky

From: Erol Akarsu
Sent: Monday, December 03, 2012 8:44 AM
To: solr-user@lucene.apache.org
Subject: Re: Luke and SOLR search giving different results

Jack,

Thanks for help.

I removed data folder  of SOLR and indexed this sample doc from scratch,
there was no document in SOLR but only one.

When I analysed , I can see stemming is correct and I can see these for
words "bul", "baş" ,"gör" and "umut" in SF row
I attached analyse screens

Erol Akarsu


On Sun, Dec 2, 2012 at 11:00 PM, Jack Krupansky 
wrote:

  Have you tried using the Solr Admin Analysis page, using the word and a
few words of context for index analysis and the word alone for query
analysis?

  And be sure to fully reindex if you change ANYTHING in the schema fields
or field types.

  -- Jack Krupansky

  From: Erol Akarsu
  Sent: Sunday, December 02, 2012 10:38 PM
  To: solr-user@lucene.apache.org
  Subject: Luke and SOLR search giving different results


  Hi,

  I am trying to apply SOLR for Turkish Language for my research.

  Instead of using language identification, I manually assigned Turkish
language for a sample test document. I have configured SOLR schema.xml,
activated the part below. I have added the attached document
testTurkishDoc.xml that is inserted to SOLR database.

  But searching for raw Lucene index through Luke and SOLR 4.0 search
though GUI is giving different results. In picture Selection_006.png, the
word "baş" is listed as top term. I search the word "baş" in Luke and I got
the result result that is only document, shown in Selection_004.png.

  But in SOLR GUI, I am getting empty result for word "baş" in picture
Selection_002.png.

  In the text we have  features field, that has word "baştan" that is
being derived from root word "baş" in Turkish Grammar. Somehow, SOLR GUI is
doing search different than Luke. I could not figure it out why I could not
find it while getting in Luke. The same thing happens for words "umut",
"bul" and "gör".

  I will appreciate if you can help me to get same results from SOLR UI.


  
 Firmalarsa "Nasılsa buldum oynatacak ünlüyü, neyleyim senaryoyu!"
diyerek baştan savma reklamlarla kotarmaya bakıyor işi. Futbolcu Arda Turan
ve büyük umutlarla Türkiye'ye getirilen Paris Hilton'un oynatıldığı giyim
firması reklamı da tam bir fiyasko. Birbirinden ünlü bu iki ismin oynadığı
reklam Arda'nın kabinde papağan gibi tekrarladığı "My darling!" repliği,
sonunda Paris'i görünce anlam veremediğimiz uyduruk bayılma sahnesi, bir de
Paris'in ancak 5 kez izledikten sonra anlaşılan "Paris seçti, firma yaptı,
Arda bayıldı." sözleriyle kazındı hafızalara, "Keşke unutabilsek!"
dedirterek.




  Added to schema.xml for SOLR:

  [XML fragment not preserved in the archive]








Re: Luke and SOLR search giving different results

2012-12-03 Thread Erol Akarsu
Jack,

Yes.

I expect SOLR to give the same search results as Luke does.

The term analysis in SOLR gives the correct answer, as expected, but SOLR
does not return the correct search results.

I don't know why.

Erol Akarsu

On Mon, Dec 3, 2012 at 11:21 AM, Jack Krupansky wrote:

> So, does that highlight the problem for you or not? Is the term analyzed
> as you expected?
>
> -- Jack Krupansky
>
> From: Erol Akarsu
> Sent: Monday, December 03, 2012 8:44 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Luke and SOLR search giving different results
>
> Jack,
>
> Thanks for help.
>
> I removed data folder  of SOLR and indexed this sample doc from scratch,
> there was no document in SOLR but only one.
>
> When I analysed , I can see stemming is correct and I can see these for
> words "bul", "baş" ,"gör" and "umut" in SF row
> I attached analyse screens
>
> Erol Akarsu
>
>
> On Sun, Dec 2, 2012 at 11:00 PM, Jack Krupansky 
> wrote:
>
>   Have you tried using the Solr Admin Analysis page, using the word and a
> few words of context for index analysis and the word alone for query
> analysis?
>
>   And be sure to fully reindex if you change ANYTHING in the schema fields
> or field types.
>
>   -- Jack Krupansky
>
>   From: Erol Akarsu
>   Sent: Sunday, December 02, 2012 10:38 PM
>   To: solr-user@lucene.apache.org
>   Subject: Luke and SOLR search giving different results
>
>
>   Hi,
>
>   I am trying to apply SOLR for Turkish Language for my research.
>
>   Instead of using language identification, I manually assigned Turkish
> language for a sample test document. I have configured SOLR schema.xml,
> activated the part below. I have added the attached document
> testTurkishDoc.xml that is inserted to SOLR database.
>
>   But searching for raw Lucene index through Luke and SOLR 4.0 search
> though GUI is giving different results. In picture Selection_006.png, the
> word "baş" is listed as top term. I search the word "baş" in Luke and I got
> the result result that is only document, shown in Selection_004.png.
>
>   But in SOLR GUI, I am getting empty result for word "baş" in picture
> Selection_002.png.
>
>   In the text we have  features field, that has word "baştan" that is
> being derived from root word "baş" in Turkish Grammar. Somehow, SOLR GUI is
> doing search different than Luke. I could not figure it out why I could not
> find it while getting in Luke. The same thing happens for words "umut",
> "bul" and "gör".
>
>   I will appreciate if you can help me to get same results from SOLR UI.
>
>
>   
>  Firmalarsa "Nasılsa buldum oynatacak ünlüyü, neyleyim senaryoyu!"
> diyerek baştan savma reklamlarla kotarmaya bakıyor işi. Futbolcu Arda Turan
> ve büyük umutlarla Türkiye'ye getirilen Paris Hilton'un oynatıldığı giyim
> firması reklamı da tam bir fiyasko. Birbirinden ünlü bu iki ismin oynadığı
> reklam Arda'nın kabinde papağan gibi tekrarladığı "My darling!" repliği,
> sonunda Paris'i görünce anlam veremediğimiz uyduruk bayılma sahnesi, bir de
> Paris'in ancak 5 kez izledikten sonra anlaşılan "Paris seçti, firma yaptı,
> Arda bayıldı." sözleriyle kazındı hafızalara, "Keşke unutabilsek!"
> dedirterek.
> 
>
>
>
>   Added to schema.xml for SOLR:
>
>multiValued="true"/>
>positionIncrementGap="100">
> 
>   
>   
>words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
>language="Turkish"/>
> 
> 
>   
>   
>words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
>language="Turkish"/>
> 
>   
>
>
>
>


Re: Luke and SOLR search giving different results

2012-12-03 Thread Jack Krupansky
So, does that highlight the problem for you or not? Is the term analyzed as you 
expected?

-- Jack Krupansky

From: Erol Akarsu 
Sent: Monday, December 03, 2012 8:44 AM
To: solr-user@lucene.apache.org 
Subject: Re: Luke and SOLR search giving different results

Jack,

Thanks for help.

I removed data folder  of SOLR and indexed this sample doc from scratch, there 
was no document in SOLR but only one. 

When I analysed , I can see stemming is correct and I can see these for words 
"bul", "baş" ,"gör" and "umut" in SF row
I attached analyse screens

Erol Akarsu


On Sun, Dec 2, 2012 at 11:00 PM, Jack Krupansky  wrote:

  Have you tried using the Solr Admin Analysis page, using the word and a few 
words of context for index analysis and the word alone for query analysis?

  And be sure to fully reindex if you change ANYTHING in the schema fields or 
field types.

  -- Jack Krupansky

  From: Erol Akarsu
  Sent: Sunday, December 02, 2012 10:38 PM
  To: solr-user@lucene.apache.org
  Subject: Luke and SOLR search giving different results


  Hi,

  I am trying to apply SOLR for Turkish Language for my research.

  Instead of using language identification, I manually assigned Turkish 
language for a sample test document. I have configured SOLR schema.xml, 
activated the part below. I have added the attached document testTurkishDoc.xml 
that is inserted to SOLR database.

  But searching for raw Lucene index through Luke and SOLR 4.0 search though 
GUI is giving different results. In picture Selection_006.png, the word "baş" 
is listed as top term. I search the word "baş" in Luke and I got the result 
result that is only document, shown in Selection_004.png.

  But in SOLR GUI, I am getting empty result for word "baş" in picture 
Selection_002.png.

  In the text we have  features field, that has word "baştan" that is being 
derived from root word "baş" in Turkish Grammar. Somehow, SOLR GUI is doing 
search different than Luke. I could not figure it out why I could not find it 
while getting in Luke. The same thing happens for words "umut", "bul" and "gör".

  I will appreciate if you can help me to get same results from SOLR UI.


  
 Firmalarsa “Nasılsa buldum oynatacak ünlüyü, neyleyim senaryoyu!” 
diyerek baştan savma reklamlarla kotarmaya bakıyor işi. Futbolcu Arda Turan ve 
büyük umutlarla Türkiye’ye getirilen Paris Hilton’un oynatıldığı giyim firması 
reklamı da tam bir fiyasko. Birbirinden ünlü bu iki ismin oynadığı reklam 
Arda’nın kabinde papağan gibi tekrarladığı “My darling!” repliği, sonunda 
Paris’i görünce anlam veremediğimiz uyduruk bayılma sahnesi, bir de Paris’in 
ancak 5 kez izledikten sonra anlaşılan “Paris seçti, firma yaptı, Arda 
bayıldı.” sözleriyle kazındı hafızalara, “Keşke unutabilsek!” dedirterek.




  Added to schema.xml for SOLR:

  [XML fragment not preserved in the archive]





Re: Luke and SOLR search giving different results

2012-12-03 Thread Erol Akarsu
Jack,

Thanks for the help.

I removed the SOLR data folder and indexed this sample doc from scratch, so
there is only one document in SOLR.

When I run the analysis, I can see that stemming is correct: the words
"bul", "baş", "gör" and "umut" appear in the SF row.
I attached the analysis screens.

Erol Akarsu

On Sun, Dec 2, 2012 at 11:00 PM, Jack Krupansky wrote:

> Have you tried using the Solr Admin Analysis page, using the word and a
> few words of context for index analysis and the word alone for query
> analysis?
>
> And be sure to fully reindex if you change ANYTHING in the schema fields
> or field types.
>
> -- Jack Krupansky
>
> From: Erol Akarsu
> Sent: Sunday, December 02, 2012 10:38 PM
> To: solr-user@lucene.apache.org
> Subject: Luke and SOLR search giving different results
>
> Hi,
>
> I am trying to apply SOLR for Turkish Language for my research.
>
> Instead of using language identification, I manually assigned Turkish
> language for a sample test document. I have configured SOLR schema.xml,
> activated the part below. I have added the attached document
> testTurkishDoc.xml that is inserted to SOLR database.
>
> But searching for raw Lucene index through Luke and SOLR 4.0 search though
> GUI is giving different results. In picture Selection_006.png, the word
> "baş" is listed as top term. I search the word "baş" in Luke and I got the
> result result that is only document, shown in Selection_004.png.
>
> But in SOLR GUI, I am getting empty result for word "baş" in picture
> Selection_002.png.
>
> In the text we have  features field, that has word "baştan" that is being
> derived from root word "baş" in Turkish Grammar. Somehow, SOLR GUI is doing
> search different than Luke. I could not figure it out why I could not find
> it while getting in Luke. The same thing happens for words "umut", "bul"
> and "gör".
>
> I will appreciate if you can help me to get same results from SOLR UI.
>
>
> 
>Firmalarsa "Nasılsa buldum oynatacak ünlüyü, neyleyim senaryoyu!"
> diyerek baştan savma reklamlarla kotarmaya bakıyor işi. Futbolcu Arda Turan
> ve büyük umutlarla Türkiye'ye getirilen Paris Hilton'un oynatıldığı giyim
> firması reklamı da tam bir fiyasko. Birbirinden ünlü bu iki ismin oynadığı
> reklam Arda'nın kabinde papağan gibi tekrarladığı "My darling!" repliği,
> sonunda Paris'i görünce anlam veremediğimiz uyduruk bayılma sahnesi, bir de
> Paris'in ancak 5 kez izledikten sonra anlaşılan "Paris seçti, firma yaptı,
> Arda bayıldı." sözleriyle kazındı hafızalara, "Keşke unutabilsek!"
> dedirterek.
>   
>
>
>
> Added to schema.xml for SOLR:
>
>  multiValued="true"/>
>  positionIncrementGap="100">
>   
> 
> 
>  words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
>  language="Turkish"/>
>   
>   
> 
> 
>  words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
>  language="Turkish"/>
>   
> 
>
>
>


Re: Luke and SOLR search giving different results

2012-12-02 Thread Jack Krupansky
Have you tried using the Solr Admin Analysis page, using the word and a few 
words of context for index analysis and the word alone for query analysis?

And be sure to fully reindex if you change ANYTHING in the schema fields or 
field types.
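The Admin Analysis check Jack suggests can also be run as a plain HTTP request. This Python sketch assumes the FieldAnalysisRequestHandler is mapped at /analysis/field, as in the Solr 4.0 example solrconfig.xml, on the example core "collection1". "text_tr" is a hypothetical name for the Turkish field. It follows Jack's recipe: the word with a few words of context on the index side, and the word alone on the query side.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumptions: example core "collection1", handler at /analysis/field,
# hypothetical field name "text_tr"
ANALYSIS_URL = "http://localhost:8983/solr/collection1/analysis/field"


def analysis_url(field, index_text, query_text):
    params = {
        "analysis.fieldname": field,        # field whose analyzers are used
        "analysis.fieldvalue": index_text,  # run through the index-time chain
        "analysis.query": query_text,       # run through the query-time chain
        "analysis.showmatch": "true",       # flag index tokens that match query tokens
        "wt": "json",
    }
    return ANALYSIS_URL + "?" + urlencode(params)


url = analysis_url("text_tr", "diyerek baştan savma reklamlarla", "baş")
print(url)
print(parse_qs(urlsplit(url).query)["analysis.query"])  # ['baş']
```

Comparing the final index-side token for "baştan" with the final query-side token for "baş" shows directly whether the two chains meet.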

-- Jack Krupansky

From: Erol Akarsu 
Sent: Sunday, December 02, 2012 10:38 PM
To: solr-user@lucene.apache.org 
Subject: Luke and SOLR search giving different results

Hi,

I am trying to apply SOLR for Turkish Language for my research.

Instead of using language identification, I manually assigned Turkish language 
for a sample test document. I have configured SOLR schema.xml, activated the 
part below. I have added the attached document testTurkishDoc.xml that is 
inserted to SOLR database.

But searching for raw Lucene index through Luke and SOLR 4.0 search though GUI 
is giving different results. In picture Selection_006.png, the word "baş" is 
listed as top term. I search the word "baş" in Luke and I got the result result 
that is only document, shown in Selection_004.png.

But in SOLR GUI, I am getting empty result for word "baş" in picture 
Selection_002.png.

In the text we have  features field, that has word "baştan" that is being 
derived from root word "baş" in Turkish Grammar. Somehow, SOLR GUI is doing 
search different than Luke. I could not figure it out why I could not find it 
while getting in Luke. The same thing happens for words "umut", "bul" and 
"gör". 

I will appreciate if you can help me to get same results from SOLR UI.



   Firmalarsa “Nasılsa buldum oynatacak ünlüyü, neyleyim senaryoyu!” 
diyerek baştan savma reklamlarla kotarmaya bakıyor işi. Futbolcu Arda Turan ve 
büyük umutlarla Türkiye’ye getirilen Paris Hilton’un oynatıldığı giyim firması 
reklamı da tam bir fiyasko. Birbirinden ünlü bu iki ismin oynadığı reklam 
Arda’nın kabinde papağan gibi tekrarladığı “My darling!” repliği, sonunda 
Paris’i görünce anlam veremediğimiz uyduruk bayılma sahnesi, bir de Paris’in 
ancak 5 kez izledikten sonra anlaşılan “Paris seçti, firma yaptı, Arda 
bayıldı.” sözleriyle kazındı hafızalara, “Keşke unutabilsek!” dedirterek.
  



Added to schema.xml for SOLR:

[XML fragment not preserved in the archive]





Luke and SOLR search giving different results

2012-12-02 Thread Erol Akarsu
Hi,

I am trying to use SOLR for the Turkish language in my research.

Instead of using language identification, I manually assigned the Turkish
language to a sample test document. I configured the SOLR schema.xml by
activating the part below, and I indexed the attached document
testTurkishDoc.xml.

But searching the raw Lucene index through Luke and searching through the
SOLR 4.0 GUI give different results. In picture Selection_006.png, the word
"baş" is listed as a top term. When I search for the word "baş" in Luke, I
get the only document as the result, shown in Selection_004.png.

But in the SOLR GUI, I get an empty result for the word "baş" (picture
Selection_002.png).

In the text we have a features field that contains the word "baştan", which
is derived from the root word "baş" in Turkish grammar. Somehow the SOLR GUI
searches differently than Luke: I could not figure out why I cannot find the
document in SOLR while I can find it in Luke. The same thing happens for the
words "umut", "bul" and "gör".

I would appreciate it if you could help me get the same results from the
SOLR UI.



   Firmalarsa "Nasılsa buldum oynatacak ünlüyü, neyleyim senaryoyu!"
diyerek baştan savma reklamlarla kotarmaya bakıyor işi. Futbolcu Arda Turan
ve büyük umutlarla Türkiye'ye getirilen Paris Hilton'un oynatıldığı giyim
firması reklamı da tam bir fiyasko. Birbirinden ünlü bu iki ismin oynadığı
reklam Arda'nın kabinde papağan gibi tekrarladığı "My darling!" repliği,
sonunda Paris'i görünce anlam veremediğimiz uyduruk bayılma sahnesi, bir de
Paris'in ancak 5 kez izledikten sonra anlaşılan "Paris seçti, firma yaptı,
Arda bayıldı." sözleriyle kazındı hafızalara, "Keşke unutabilsek!"
dedirterek.
  



Added to schema.xml for SOLR:

[XML fragment not preserved in the archive]



  htt://111.a.b1
  6H500F0
  tr
  Maxtor DiamondMax 11 - hard drive - 500 GB - SATA-300
  Maxtor Corp.
  
  maxtor
  electronics
  hard drive
  SATA 3.0Gb/s, NCQ
  8.5ms seek
  16MB cache
  350
  6
  true
  
   Firmalarsa “Nasılsa buldum oynatacak ünlüyü, neyleyim senaryoyu!” diyerek baştan savma reklamlarla kotarmaya bakıyor işi. Futbolcu Arda Turan ve büyük umutlarla Türkiye’ye getirilen Paris Hilton’un oynatıldığı giyim firması reklamı da tam bir fiyasko. Birbirinden ünlü bu iki ismin oynadığı reklam Arda’nın kabinde papağan gibi tekrarladığı “My darling!” repliği, sonunda Paris’i görünce anlam veremediğimiz uyduruk bayılma sahnesi, bir de Paris’in ancak 5 kez izledikten sonra anlaşılan “Paris seçti, firma yaptı, Arda bayıldı.” sözleriyle kazındı hafızalara, “Keşke unutabilsek!” dedirterek.
   
  
  2006-02-13T15:26:37Z





Re: synonyms.txt: different results on admin and on site..

2011-09-08 Thread deniz
You are right about wildcards and the analysis stuff...

So is there any way of putting wildcards through analysis?

-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/synonyms-txt-different-results-on-admin-and-on-site-tp3318338p3322026.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: synonyms.txt: different results on admin and on site..

2011-09-08 Thread François Schiettecatte
Wildcard terms are not analyzed, so your synonyms.txt may come into play here.
Have you checked the analysis for deniz*?
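François's point can be illustrated with a toy model. This is plain Python, not Solr code, and deniz's real field type is not shown here. It assumes, hypothetically, that the index chain lowercases and applies the synonym line "deniz,denis,denise" with expand=false, so every variant is rewritten to "deniz". Plain term queries are analyzed like the index and match. Wildcard prefixes are used raw and can miss the rewritten term.

```python
# Toy model, not Solr code. Hypothetical synonyms.txt line "deniz,denis,denise"
# applied with expand=false, so every variant is rewritten to "deniz".
SYNONYM_MAP = {"denis": "deniz", "denise": "deniz"}


def analyze(text):
    # Toy analysis chain: whitespace tokenizer, lowercase, synonym mapping
    return [SYNONYM_MAP.get(tok, tok) for tok in text.lower().split()]


def term_query(indexed, user_input):
    # Plain term queries go through the same chain as the indexed text
    return any(tok in indexed for tok in analyze(user_input))


def prefix_query(indexed, user_input):
    # Wildcard queries skip analysis: the raw text before "*" is the prefix
    prefix = user_input.rstrip("*")
    return any(term.startswith(prefix) for term in indexed)


indexed_terms = set(analyze("Denise Smith"))  # {'deniz', 'smith'}
for q in ["denise", "denis", "denise*", "denis*", "deniz*"]:
    hit = prefix_query(indexed_terms, q) if q.endswith("*") else term_query(indexed_terms, q)
    print(q, "->", "match" if hit else "no match")
```

Here "denise" and "denis" match, while "denise*" and "denis*" do not, because the raw prefixes never pass through the synonym mapping.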

François

On Sep 7, 2011, at 10:08 PM, deniz wrote:

> well yea you are right... i realised that lack of detail issue here... so
> here it comes... 
> 
> 
> This is from my schema.xml and basically i have a synonyms.txt file which
> contains
> 
> deniz,denis,denise
> 
> 
> After posting here, I have checked some stuff that I have faced before,
> while trying to add accented letters to the system... so it seems like same
> or similar stuff... so...
> 
> As i want to support partial matches, the search string is modified on php
> side. if user enters deniz, it is sent to solr as deniz*
> 
> when i check on solr admin, i was able to make searches with 
> deniz,denise,denis and they all return correct results, but when i put the
> wildcard, i get nothing...
> 
> so with the above settings;
> 
> deniz
> denise
> denis
> works smoothly
> 
> deniz*
> denise*
> denis*
> returns nothing...
> 
> 
> should i implement some kinda analyzer or tokenizer or any kinda component
> to overtime this thing? 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Rob Casson wrote:
>> 
>> you should probably post your schema.xml and some parts of your
>> synonyms.txt.  it could be differences between your index and query
>> analysis chains, synonym expansion errors, etc, but folks will likely
>> need more details to help you out.
>> 
>> cheers,
>> rob
>> 
>> On Wed, Sep 7, 2011 at 9:46 PM, deniz <denizdurmu...@gmail.com>
>> wrote:
>>> could it be related with analysis issue about synonyms once again?
>>> 
>>> 
>>> 
>>> -
>>> Zeki ama calismiyor... Calissa yapar...
>>> --
>>> View this message in context:
>>> http://lucene.472066.n3.nabble.com/synonyms-txt-different-results-on-admin-and-on-site-tp3318338p3318464.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>> 
>> 
> 
> 
> -
> Zeki ama calismiyor... Calissa yapar...
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/synonyms-txt-different-results-on-admin-and-on-site-tp3318338p3318503.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: synonyms.txt: different results on admin and on site..

2011-09-07 Thread deniz
Well, yes, you are right... I realised I had left out the details, so
here they are:

[schema.xml fragment not preserved in the archive]

This is from my schema.xml, and basically I have a synonyms.txt file that
contains

deniz,denis,denise


After posting here, I checked some things I had run into before while
trying to add accented letters to the system, and this seems to be the same
or a similar problem.

Because I want to support partial matches, the search string is modified on
the PHP side: if the user enters deniz, it is sent to Solr as deniz*.

When I check in the Solr admin, searches for deniz, denise and denis all
return correct results, but when I add the wildcard, I get nothing.

So, with the above settings:

deniz
denise
denis
works smoothly

deniz*
denise*
denis*
returns nothing...


Should I implement some kind of analyzer, tokenizer or other component
to overcome this?










Rob Casson wrote:
> 
> you should probably post your schema.xml and some parts of your
> synonyms.txt.  it could be differences between your index and query
> analysis chains, synonym expansion errors, etc, but folks will likely
> need more details to help you out.
> 
> cheers,
> rob
> 
> On Wed, Sep 7, 2011 at 9:46 PM, deniz <denizdurmu...@gmail.com>
> wrote:
>> could it be related with analysis issue about synonyms once again?
>>
>>
>>
>> -
>> Zeki ama calismiyor... Calissa yapar...
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/synonyms-txt-different-results-on-admin-and-on-site-tp3318338p3318464.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
> 


-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/synonyms-txt-different-results-on-admin-and-on-site-tp3318338p3318503.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: synonyms.txt: different results on admin and on site..

2011-09-07 Thread Rob Casson
You should probably post your schema.xml and some parts of your
synonyms.txt. It could be differences between your index and query
analysis chains, synonym expansion errors, etc., but folks will likely
need more details to help you out.

cheers,
rob

On Wed, Sep 7, 2011 at 9:46 PM, deniz  wrote:
> could it be related with analysis issue about synonyms once again?
>
>
>
> -
> Zeki ama calismiyor... Calissa yapar...
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/synonyms-txt-different-results-on-admin-and-on-site-tp3318338p3318464.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: synonyms.txt: different results on admin and on site..

2011-09-07 Thread deniz
Could it be related to an analysis issue with synonyms once again?



-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/synonyms-txt-different-results-on-admin-and-on-site-tp3318338p3318464.html
Sent from the Solr - User mailing list archive at Nabble.com.


synonyms.txt: different results on admin and on site..

2011-09-07 Thread deniz
Hi all,

I have checked the list for the issue in the title, but couldn't find any
related info, so here is my problem:

I change synonyms.txt and then reload the core without restarting the
server. The new synonyms work smoothly if I use the Solr admin interface;
however, when I use the site, which is written in PHP, I get nothing when I
search for one of the synonyms I have added.

Any ideas why this is happening?

-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/synonyms-txt-different-results-on-admin-and-on-site-tp3318338p3318338.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Searching similar values for same field results in different results

2011-01-06 Thread PeterKerk

That was it! thanks!
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Searching-similar-values-for-same-field-results-in-different-results-tp2199269p2206087.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Searching similar values for same field results in different results

2011-01-06 Thread Juan Grande
You have a problem with the analysis chain. When you do a query, the
EnglishPorterFilter is cutting off the last part of your word, but you're
not doing the same when indexing. I think that removing that filter from the
chain will solve your problem.

Remember that there are two different analysis chains, one for indexing time
and one for querying time. I think that you didn't see the shortened word in
analysis.jsp because you entered the text in the "Field Value (Index)" text
box, so it was using the indexing time analysis chain. If you want to see
the results of applying the querying time analysis chain, you should enter
the text in the "Field Value (Query)" text box.
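The index/query asymmetry Juan describes can be shown with a toy model in plain Python. It is a sketch under loose assumptions. crude_stem is a hypothetical stand-in that only strips a trailing "ed", which is enough to turn "landgoed" into "landgo" as the debug output showed. It is not the Porter algorithm, and the stopword and word-delimiter filters from PeterKerk's chains are left out.

```python
# Toy model, not Solr's implementation. crude_stem is a hypothetical stand-in
# for EnglishPorterFilterFactory that only strips a trailing "ed".
def crude_stem(token):
    return token[:-2] if token.endswith("ed") and len(token) > 4 else token


def index_chain(text):
    # Index-time chain without the stemmer
    return [tok.lower() for tok in text.split()]


def query_chain(text):
    # Query-time chain with the stemmer: the asymmetry being illustrated
    return [crude_stem(tok.lower()) for tok in text.split()]


def phrase_matches(doc_tokens, phrase_tokens):
    # A phrase query matches if its tokens appear consecutively in the document
    n = len(phrase_tokens)
    return any(doc_tokens[i:i + n] == phrase_tokens
               for i in range(len(doc_tokens) - n + 1))


for theme in ["Strand en Zee", "Kasteel en Landgoed"]:
    print(theme, query_chain(theme), phrase_matches(index_chain(theme), query_chain(theme)))
```

"Strand en Zee" survives both chains unchanged and matches. "Kasteel en Landgoed" becomes "kasteel en landgo" only on the query side, so the phrase no longer lines up with the indexed tokens. Using the same chain on both sides restores the match.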

Good luck,

Juan Grande

On Thu, Jan 6, 2011 at 10:58 AM, PeterKerk  wrote:

>
> @iorixxx:
> I ran: http://localhost:8983/solr/db/update/?optimize=true
> This is the response:
> 
>
>0
>58
>
> 
>
> Then I ran:
>
> http://localhost:8983/solr/db/select/?indent=on&facet=on&q=*:*&facet.field=themes_raw
>
> This is response:
> 
>
>366
>153
> 16
> 
> 
>
> So, it seems that nothing has changed there, and it looks like also before
> the optimize operation the results were shown correct?
>
> when you say http caching, you mean the caching by the browser? Or does
> Solr
> have some caching by default? If the latter, how can I clear that cache?
>
>
> @Erick: I added debugquery
>
> For "Strand en Zee" I see this:
> 
> PhraseQuery(themes:"strand en zee")
> 
>
> Looks correct.
>
>
> For "Kasteel en Landgoed" I see this:
> 
> PhraseQuery(themes:"kasteel en landgo")
> 
>
> Which isnt correct! So it seems herein lies the problem.
>
> Now Im wondering why the value is cut off...this is my schema.xml:
>
> 
>  
>
> words="stopwords_dutch.txt"/>
> generateWordParts="1"
> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="1"/>
>
>
>  
>  
>
> ignoreCase="true" expand="true"/>
> words="stopwords_dutch.txt"/>
> generateWordParts="1"
> generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> catenateAll="0" splitOnCaseChange="1"/>
>
> protected="protwords.txt"/>
>
>  
> 
>
>  multiValued="true"  />
>  multiValued="true"/>
>
>
> I checked analysis.jsp:
> filled in Field: "themes"
> and Field value: "Kasteel en Landgoed"
>
> and schema.jsp, but I didnt see any weird results
>
> Now, Im wondering what else it could be..
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Searching-similar-values-for-same-field-results-in-different-results-tp2199269p2205706.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Searching similar values for same field results in different results

2011-01-06 Thread PeterKerk

@iorixxx:
I ran: http://localhost:8983/solr/db/update/?optimize=true
This is the response:


0
58



Then I ran:
http://localhost:8983/solr/db/select/?indent=on&facet=on&q=*:*&facet.field=themes_raw

This is response:


366
153
16



So it seems that nothing has changed there, and it looks like the results
were also shown correctly before the optimize operation?

When you say HTTP caching, do you mean caching by the browser? Or does Solr
have some caching by default? If the latter, how can I clear that cache?


@Erick: I added debugQuery.

For "Strand en Zee" I see this:

PhraseQuery(themes:"strand en zee")


Looks correct.


For "Kasteel en Landgoed" I see this:

PhraseQuery(themes:"kasteel en landgo")


Which isn't correct! So it seems herein lies the problem.

Now I'm wondering why the value is cut off. This is my schema.xml:


  





  
  







  






I checked analysis.jsp, filling in Field: "themes"
and Field value: "Kasteel en Landgoed",

and schema.jsp, but I didn't see any weird results.

Now I'm wondering what else it could be.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Searching-similar-values-for-same-field-results-in-different-results-tp2199269p2205706.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Searching similar values for same field results in different results

2011-01-05 Thread Erick Erickson
Often adding &debugQuery=on to the URL can show you very useful information
that helps pinpoint the problem. I confess I don't see anything amiss in
what you've shown, though.

Also, look at the "schema browser" page off the admin page, and look
at your "themes" field to see what is actually in your index; it may
surprise you.

Finally, the admin/analysis page (turn debug on) may also help you to see
exactly what tokenization is happening when indexing and querying. I'd guess
that the behavior isn't exactly what you expect.
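A minimal Java sketch of the first suggestion, assuming Java 10+ and the base URL and core name (`db`) used in this thread; the class and method names are hypothetical. It appends `debugQuery=on` to the failing filter query and URL-encodes every value, so the quoted phrase reaches Solr intact:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class DebugQueryUrl {

    // Joins the base URL, the /select/ path and the URL-encoded parameters in insertion order.
    // URLEncoder writes spaces as '+', which the servlet container decodes the same way as %20.
    static String selectUrl(String baseUrl, Map<String, String> params) {
        StringJoiner url = new StringJoiner("&", baseUrl + "/select/?", "");
        for (Map.Entry<String, String> p : params.entrySet()) {
            url.add(p.getKey() + "=" + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return url.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        params.put("fq", "themes:\"Kasteel en Landgoed\"");
        params.put("fl", "id,title");
        params.put("debugQuery", "on"); // asks Solr to include parsing details in the response
        System.out.println(selectUrl("http://localhost:8983/solr/db", params));
    }
}
```

The debug section of the response should then show how Solr parsed the filter query, which is where an unexpected analysis step becomes visible.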

Best
Erick


On Wed, Jan 5, 2011 at 10:47 AM, PeterKerk  wrote:

>
> Something weird is happening.
>
> I have locations that can have 1 or more themes.
> A theme can be: "Kasteel en Landgoed", or a theme can be "Strand en Zee"
>
> I checked in the database: there are many locations that have 1 or more of
> these themes assigned to them.
>
> Also, in the response XML, when I do a general search I get:
> 
> 
> 
> 
>366
>153<- 153 found
>16  <- 16 found
> 
>
>
> When I request this:
>
> http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=themes:%22Strand%20en%20Zee%22&q=*:*&fl=id,title
> I get 16 results, which is expected.
>
> When I request this:
>
> http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=themes:%22Kasteel%20en%20Landgoed%22&q=*:*&fl=id,title
> I get 0 results!!!
>
> why?!?
>
>
> definition in schema.xml:
>
>
>  multiValued="true"  />
>  multiValued="true"/>
>
> 
>
> Why are these results differing?
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Searching-similar-values-for-same-field-results-in-different-results-tp2199269p2199269.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Searching similar values for same field results in different results

2011-01-05 Thread Ahmet Arslan

> 
> uhm...how do I perform an optimize operation? :)


http://localhost:8983/solr/db/update/?optimize=true


  


Re: Searching similar values for same field results in different results

2011-01-05 Thread PeterKerk

uhm...how do I perform an optimize operation? :)
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Searching-similar-values-for-same-field-results-in-different-results-tp2199269p2199795.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Searching similar values for same field results in different results

2011-01-05 Thread Ahmet Arslan
> Something weird is happening.
> 
> I have locations that can have 1 or more themes.
> A theme can be: "Kasteel en Landgoed", or a theme can be
> "Strand en Zee"
> 
> I checked in the database: there are many locations that
> have 1 or more of these themes assigned to them.
> 
> Also, in the response XML, when I do a general search I get:
> 
> 
> 
> 
>     366
>     153    <- 153 found
>     16    <- 16 found
> 
> 
> 
> When I request this:
> http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=themes:%22Strand%20en%20Zee%22&q=*:*&fl=id,title
> I get 16 results, which is expected.
> 
> When I request this:
> http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=themes:%22Kasteel%20en%20Landgoed%22&q=*:*&fl=id,title
> I get 0 results!!!
> 
> why?!?

Maybe you deleted those documents? Deleted terms can appear in the facet section
until you optimize. Can you run these queries after an optimize operation?
What is the output of this after an optimize:
facet=on&q=*:*&facet.field=themes_raw

Also, using a browser to query/test Solr sometimes gives old results due to
HTTP caching.





Searching similar values for same field results in different results

2011-01-05 Thread PeterKerk

Something weird is happening.

I have locations that can have 1 or more themes.
A theme can be: "Kasteel en Landgoed", or a theme can be "Strand en Zee"

I checked in the database: there are many locations that have 1 or more of
these themes assigned to them.

Also, in the response XML, when I do a general search I get:




366
153<- 153 found
16  <- 16 found



When I request this:
http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=themes:%22Strand%20en%20Zee%22&q=*:*&fl=id,title
I get 16 results, which is expected.

When I request this:
http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=themes:%22Kasteel%20en%20Landgoed%22&q=*:*&fl=id,title
I get 0 results!!!

why?!?


definition in schema.xml:







Why are these results differing?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Searching-similar-values-for-same-field-results-in-different-results-tp2199269p2199269.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Different Results..

2010-12-22 Thread Ahmet Arslan

--- On Wed, 12/22/10, satya swaroop  wrote:

> From: satya swaroop 
> Subject: Different Results..
> To: solr-user@lucene.apache.org
> Date: Wednesday, December 22, 2010, 10:44 AM
> Hi All,
>          I am getting
> different results when I use some escape characters.
> For example:
> 1) When I use this request
>             http://localhost:8080/solr/select?q=erlang!ericson
>            
>    the result obtained is
>            
>     start="0">
> 
> 2) When the request is
>              http://localhost:8080/solr/select?q=erlang/ericson
>                
>     the result is
>                
>            name="response" numFound="1" start="0">
> 
> 
> My question is: does Solr treat the two queries
> differently, and how does
> it treat !, / and all other escape characters?
> 

First of all, ! has a special meaning: it means NOT. It is part of the query
syntax and is equivalent to the minus (-) operator.

q=erlang!ericson is parsed into:
defaultSearchField:erlang -defaultSearchField:ericson

You can see this by appending &debugQuery=on to your search URL.

So you need to escape ! in your case:
q=erlang\!ericson will return the same result set as q=erlang/ericson.

You can see the complete list of special characters here:
http://lucene.apache.org/java/2_9_1/queryparsersyntax.html#Escaping%20Special%20Characters
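To illustrate the escaping described above, here is a small standalone Java sketch (the class name is hypothetical) that backslash-escapes the special characters listed on the Lucene 2.9 query syntax page. In real code SolrJ's `ClientUtils.escapeQueryChars` serves the same purpose and is the safer choice, since later Lucene versions treat additional characters (such as `/` for regular expressions) as special:

```java
public class QueryEscaper {

    // Special characters from the Lucene 2.9 query parser syntax page.
    // '&' and '|' are escaped individually, which also neutralizes "&&" and "||".
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\";

    // Prefixes every special character with a backslash so the query parser
    // reads it as a literal character instead of an operator.
    static String escape(String term) {
        StringBuilder sb = new StringBuilder(term.length() * 2);
        for (char c : term.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Unescaped, "!" is parsed as NOT; escaped, it is just part of the term.
        System.out.println(escape("erlang!ericson")); // prints erlang\!ericson
    }
}
```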








Re: Different Results..

2010-12-22 Thread Marco Martinez
We need more information about the analyzers and tokenizers of the
default field of your search.

Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


2010/12/22 satya swaroop 

> Hi All,
> I am getting different results when I use some escape characters.
> For example:
> 1) When I use this request
>http://localhost:8080/solr/select?q=erlang!ericson
>   the result obtained is
>   
>
> 2) When the request is
> http://localhost:8080/solr/select?q=erlang/ericson
>the result is
>  
>
>
> My question is: does Solr treat the two queries differently, and how does
> it treat !, / and all other escape characters?
>
>
> Regards,
> satya
>


Different Results..

2010-12-22 Thread satya swaroop
Hi All,
I am getting different results when I use some escape characters.
For example:
1) When I use this request
http://localhost:8080/solr/select?q=erlang!ericson
   the result obtained is
   

2) When the request is
 http://localhost:8080/solr/select?q=erlang/ericson
the result is
  


My question is: does Solr treat the two queries differently, and how does
it treat !, / and all other escape characters?


Regards,
satya


Re: different results depending on result format

2010-10-22 Thread Mike Sokolov
OK, I solved the problem. It turns out that I was connecting to the
server using its FQDN (rosen.ifactory.com).  When, instead, I connect to 
it using the name "rosen" (which maps to the same IP using the default 
domain name configured in my resolver, ifactory.com), I get results back.


I am looking into the virtual hosts config in Tomcat; it seems as if
there must indeed be another Solr instance running. In fact, I'm now
concerned there might be two Solr instances running against the same
data folder. Yargh.


-Mike


On 10/22/2010 09:05 AM, Mike Sokolov wrote:
Yes - I really only have the one solr instance.  And I have plenty of 
other cases where I am getting good results back via solrj.  It's 
really a mystery.  Unfortunately I have to catch up on other stuff I 
have been neglecting, but I'll follow up when I'm able to get a 
solution...


-Mike


On 10/22/2010 06:58 AM, Savvas-Andreas Moysidis wrote:
Strange. Are you absolutely sure the two queries are directed to the same
Solr instance? I'm running the same query from the admin page (which
specifies the xml format) and I get the exact same results as solrj.

On 21 October 2010 22:25, Mike Sokolov  wrote:

quick follow-up: I also notice that the query from solrj gets version=1,
whereas the admin webapp puts version=2.2 on the query string, although this
param doesn't seem to change the xml results at all.  Does this indicate an
older version of solrj perhaps?

-Mike


On 10/21/2010 04:47 PM, Mike Sokolov wrote:

I'm experiencing something really weird: I get different results depending
on whether I specify wt=javabin, and retrieve using SolrJ, or wt=xml.  I
spent quite a while staring at query params to make sure everything else is
the same, and they do seem to be.  At first I thought the problem related to
the javabin format change that has been talked about recently, but I am
using solr 1.4.0 and solrj 1.4.0.

Notice in the two entries that the wt param is different and the hits
result count is different.

Oct 21, 2010 4:22:19 PM org.apache.solr.core.SolrCore execute
INFO: [bopp.ba] webapp=/solr path=/select/
params={wt=xml&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1}
hits=261 status=0 QTime=1
Oct 21, 2010 4:22:28 PM org.apache.solr.core.SolrCore execute
INFO: [bopp.ba] webapp=/solr path=/select
params={wt=javabin&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1}
hits=57 status=0 QTime=0

The xml format results seem to be the correct ones. So one thought I had
is that I could somehow fall back to using xml format in solrj, but I tried
SolrQuery.set("wt", "xml") and that didn't have the desired effect (I get
'&wt=javabin&wt=javabin' in the log, i.e. the param is repeated, but still
javabin).


Am I crazy? Is this a known issue?

Thanks for any suggestions




Re: different results depending on result format

2010-10-22 Thread Mike Sokolov
Yes - I really only have the one solr instance.  And I have plenty of 
other cases where I am getting good results back via solrj.  It's really 
a mystery.  Unfortunately I have to catch up on other stuff I have been 
neglecting, but I'll follow up when I'm able to get a solution...


-Mike


On 10/22/2010 06:58 AM, Savvas-Andreas Moysidis wrote:

Strange. Are you absolutely sure the two queries are directed to the same
Solr instance? I'm running the same query from the admin page (which
specifies the xml format) and I get the exact same results as solrj.

On 21 October 2010 22:25, Mike Sokolov  wrote:

   

quick follow-up: I also notice that the query from solrj gets version=1,
whereas the admin webapp puts version=2.2 on the query string, although this
param doesn't seem to change the xml results at all.  Does this indicate an
older version of solrj perhaps?

-Mike


On 10/21/2010 04:47 PM, Mike Sokolov wrote:

 

I'm experiencing something really weird: I get different results depending
on whether I specify wt=javabin, and retrieve using SolrJ, or wt=xml.  I
spent quite a while staring at query params to make sure everything else is
the same, and they do seem to be.  At first I thought the problem related to
the javabin format change that has been talked about recently, but I am
using solr 1.4.0 and solrj 1.4.0.

Notice in the two entries that the wt param is different and the hits
result count is different.

Oct 21, 2010 4:22:19 PM org.apache.solr.core.SolrCore execute
INFO: [bopp.ba] webapp=/solr path=/select/
params={wt=xml&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1}
hits=261 status=0 QTime=1
Oct 21, 2010 4:22:28 PM org.apache.solr.core.SolrCore execute
INFO: [bopp.ba] webapp=/solr path=/select
params={wt=javabin&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1}
hits=57 status=0 QTime=0


The xml format results seem to be the correct ones. So one thought I had
is that I could somehow fall back to using xml format in solrj, but I tried
SolrQuery.set("wt", "xml") and that didn't have the desired effect (I get
'&wt=javabin&wt=javabin' in the log, i.e. the param is repeated, but still
javabin).


Am I crazy? Is this a known issue?

Thanks for any suggestions


   
   


Re: different results depending on result format

2010-10-22 Thread Savvas-Andreas Moysidis
Strange. Are you absolutely sure the two queries are directed to the same
Solr instance? I'm running the same query from the admin page (which
specifies the xml format) and I get the exact same results as solrj.

On 21 October 2010 22:25, Mike Sokolov  wrote:

> quick follow-up: I also notice that the query from solrj gets version=1,
> whereas the admin webapp puts version=2.2 on the query string, although this
> param doesn't seem to change the xml results at all.  Does this indicate an
> older version of solrj perhaps?
>
> -Mike
>
>
> On 10/21/2010 04:47 PM, Mike Sokolov wrote:
>
>> I'm experiencing something really weird: I get different results depending
>> on whether I specify wt=javabin, and retrieve using SolrJ, or wt=xml.  I
>> spent quite a while staring at query params to make sure everything else is
>> the same, and they do seem to be.  At first I thought the problem related to
>> the javabin format change that has been talked about recently, but I am
>> using solr 1.4.0 and solrj 1.4.0.
>>
>> Notice in the two entries that the wt param is different and the hits
>> result count is different.
>>
>> Oct 21, 2010 4:22:19 PM org.apache.solr.core.SolrCore execute
>> INFO: [bopp.ba] webapp=/solr path=/select/
>> params={wt=xml&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1}
>> hits=261 status=0 QTime=1
>> Oct 21, 2010 4:22:28 PM org.apache.solr.core.SolrCore execute
>> INFO: [bopp.ba] webapp=/solr path=/select
>> params={wt=javabin&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1}
>> hits=57 status=0 QTime=0
>>
>>
>> The xml format results seem to be the correct ones. So one thought I had
>> is that I could somehow fall back to using xml format in solrj, but I tried
>> SolrQuery.set("wt", "xml") and that didn't have the desired effect (I get
>> '&wt=javabin&wt=javabin' in the log, i.e. the param is repeated, but still
>> javabin).
>>
>>
>> Am I crazy? Is this a known issue?
>>
>> Thanks for any suggestions
>>
>>


Re: different results depending on result format

2010-10-21 Thread Mike Sokolov
quick follow-up: I also notice that the query from solrj gets version=1, 
whereas the admin webapp puts version=2.2 on the query string, although 
this param doesn't seem to change the xml results at all.  Does this 
indicate an older version of solrj perhaps?


-Mike

On 10/21/2010 04:47 PM, Mike Sokolov wrote:
I'm experiencing something really weird: I get different results 
depending on whether I specify wt=javabin, and retrieve using SolrJ, 
or wt=xml.  I spent quite a while staring at query params to make sure 
everything else is the same, and they do seem to be.  At first I 
thought the problem related to the javabin format change that has been 
talked about recently, but I am using solr 1.4.0 and solrj 1.4.0.


Notice in the two entries that the wt param is different and the hits 
result count is different.


Oct 21, 2010 4:22:19 PM org.apache.solr.core.SolrCore execute
INFO: [bopp.ba] webapp=/solr path=/select/ 
params={wt=xml&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1} 
hits=261 status=0 QTime=1

Oct 21, 2010 4:22:28 PM org.apache.solr.core.SolrCore execute
INFO: [bopp.ba] webapp=/solr path=/select 
params={wt=javabin&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1} 
hits=57 status=0 QTime=0



The xml format results seem to be the correct ones. So one thought I 
had is that I could somehow fall back to using xml format in solrj, 
but I tried SolrQuery.set("wt", "xml") and that didn't have the desired
effect (I get '&wt=javabin&wt=javabin' in the log, i.e. the param is
repeated, but still javabin).



Am I crazy? Is this a known issue?

Thanks for any suggestions



different results depending on result format

2010-10-21 Thread Mike Sokolov
I'm experiencing something really weird: I get different results 
depending on whether I specify wt=javabin, and retrieve using SolrJ, or 
wt=xml.  I spent quite a while staring at query params to make sure 
everything else is the same, and they do seem to be.  At first I thought 
the problem related to the javabin format change that has been talked 
about recently, but I am using solr 1.4.0 and solrj 1.4.0.


Notice in the two entries that the wt param is different and the hits 
result count is different.


Oct 21, 2010 4:22:19 PM org.apache.solr.core.SolrCore execute
INFO: [bopp.ba] webapp=/solr path=/select/ 
params={wt=xml&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1} 
hits=261 status=0 QTime=1

Oct 21, 2010 4:22:28 PM org.apache.solr.core.SolrCore execute
INFO: [bopp.ba] webapp=/solr path=/select 
params={wt=javabin&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1} 
hits=57 status=0 QTime=0
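Comparing the two params={...} entries by eye is error-prone. Below is a minimal Java sketch (hypothetical class and method names) that parses both logged parameter strings and reports only the keys whose values differ. It assumes single-valued parameters, and it does not look at the host or request path, which the log records separately (note path=/select/ versus path=/select above):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

public class ParamDiff {

    // Parses the contents of a logged "params={...}" entry into a map.
    // Splits on the first '=' only, so values such as "*:*" stay intact;
    // a repeated key (e.g. wt=javabin&wt=javabin) keeps its last value.
    static Map<String, String> parse(String params) {
        Map<String, String> map = new LinkedHashMap<>();
        for (String pair : params.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                map.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return map;
    }

    // Returns "key: a -> b" for every key whose value differs between the two requests.
    static Set<String> diff(Map<String, String> a, Map<String, String> b) {
        Set<String> keys = new TreeSet<>(a.keySet());
        keys.addAll(b.keySet());
        Set<String> out = new TreeSet<>();
        for (String k : keys) {
            if (!Objects.equals(a.get(k), b.get(k))) {
                out.add(k + ": " + a.get(k) + " -> " + b.get(k));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String xml = "wt=xml&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1";
        String javabin = "wt=javabin&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1";
        System.out.println(diff(parse(xml), parse(javabin))); // prints [wt: xml -> javabin]
    }
}
```

An empty or wt-only difference points away from the query itself and toward where the request is actually sent, which is how this thread was eventually resolved.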



The xml format results seem to be the correct ones. So one thought I had
is that I could somehow fall back to using xml format in solrj, but I
tried SolrQuery.set("wt", "xml") and that didn't have the desired effect
(I get '&wt=javabin&wt=javabin' in the log, i.e. the param is repeated,
but still javabin).



Am I crazy? Is this a known issue?

Thanks for any suggestions

--
Michael Sokolov
Engineering Director
www.ifactory.com
@iFactoryBoston

PubFactory: the revolutionary e-publishing platform from iFactory



Re: SolrJ - how to separate different results from the same facet query?

2010-03-15 Thread Jon Baer
I am interested in this as well ... I'm also having trouble determining
whether a result has been elevated by the QueryElevation component. It seems like
SolrJ would need to know about some type of metadata contained within the docs,
but I haven't seen SolrJ dealing w/ payloads specifically yet.

I also can't tell if these would require some feature request on those
components or if it's something so custom that it would require
writing new components.

It sounds like retrieving a document should answer questions like ...

"did this document come from a facet query?"
"was this document elevated?"

Etc.  Maybe something the Debug component can handle if it can write payloads 
back to the results, etc.

- Jon

On Mar 15, 2010, at 7:56 AM, Saïd Radhouani wrote:

> I'm faceting with two different query ranges using addFacetQuery. I
> wonder whether it's possible, using SolrJ, to extract the result of each query
> range separately. Here is an example:
> 
> addFacetQuery("price:[* TO 150]"); addFacetQuery("price:[151 TO 300]"); etc.
> addFacetQuery("length:[* TO 5]");addFacetQuery("length:[5 TO 10]"); etc.
> 
> When I use getFacetQuery, SolrJ gives me the responses of both query ranges
> (prices and lengths) mixed in the same list. I wonder whether it's possible
> to tell SolrJ to extract the response of a specific query range, i.e., tell
> it to extract the price-based response in a list and the length-based
> response in another list. It would be helpful to have something like
> getFacetQuery(field=price), getFacetQuery(field=length), etc.
> 
> Any ideas?
> 
> Thanks.



SolrJ - how to separate different results from the same facet query?

2010-03-15 Thread Saïd Radhouani
I'm faceting with two different query ranges using addFacetQuery. I
wonder whether it's possible, using SolrJ, to extract the result of each query
range separately. Here is an example:

addFacetQuery("price:[* TO 150]"); addFacetQuery("price:[151 TO 300]"); etc.
addFacetQuery("length:[* TO 5]");addFacetQuery("length:[5 TO 10]"); etc.

When I use getFacetQuery, SolrJ gives me the responses of both query ranges
(prices and lengths) mixed in the same list. I wonder whether it's possible
to tell SolrJ to extract the response of a specific query range, i.e., tell
it to extract the price-based response in a list and the length-based
response in another list. It would be helpful to have something like
getFacetQuery(field=price), getFacetQuery(field=length), etc.
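Since SolrJ's QueryResponse.getFacetQuery() returns a Map<String, Integer> keyed by each facet query string, one client-side workaround is to split that map on the field name in front of the first colon. A minimal sketch, assuming plain field:[range] queries (no local params or boolean expressions); the class and method names are hypothetical and the counts in main are placeholders, not real results:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FacetQueryGrouper {

    // Groups facet-query counts by the field name before the first ':'.
    // Assumes each key has the simple form "field:[range]"; keys without a
    // colon are grouped under the whole key.
    static Map<String, Map<String, Integer>> groupByField(Map<String, Integer> facetQueries) {
        Map<String, Map<String, Integer>> grouped = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : facetQueries.entrySet()) {
            String query = e.getKey();
            int colon = query.indexOf(':');
            String field = colon > 0 ? query.substring(0, colon) : query;
            grouped.computeIfAbsent(field, f -> new LinkedHashMap<>()).put(query, e.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Stand-in for response.getFacetQuery(); placeholder counts for illustration only.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("price:[* TO 150]", 1);
        counts.put("price:[151 TO 300]", 2);
        counts.put("length:[* TO 5]", 3);
        counts.put("length:[5 TO 10]", 4);
        Map<String, Map<String, Integer>> byField = groupByField(counts);
        System.out.println(byField.get("price"));
        System.out.println(byField.get("length"));
    }
}
```

In real code the counts map would be the value returned by getFacetQuery() on the query response.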

Any ideas?

Thanks.


SolrJ - separate different results from the same facet query?

2010-03-11 Thread Steve Radhouani
I'm faceting with two different query ranges using addFacetQuery. I
wonder whether it's possible, using SolrJ, to extract the result of each query
range separately. Here is my example:

addFacetQuery("price:[* TO 150]"); addFacetQuery("price:[151 TO 300]"); etc.
addFacetQuery("date:[* TO NOW]");

When I use getFacetQuery, SolrJ gives me the responses of both query ranges
(prices and dates) mixed in the same list. I wonder whether it's possible to
tell SolrJ to extract the response of a specific query range, i.e., tell it
to extract the price-based response in a list and the date-based response in
another list. It would be helpful to have something like
getFacetQuery(field=price).

Any ideas?

Thanks.


Re: Different results return for capital and small letters.

2009-01-02 Thread Otis Gospodnetic
Tushar,

Could you ask on solr-user in the future, please?
Your last sentence got cut off.  Do you have LowerCaseFilter in both the index 
and query-time analyzer sections?  Perhaps you should just paste that section 
of the config.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Tushar_Gandhi 
> To: solr-...@lucene.apache.org
> Sent: Wednesday, December 31, 2008 3:26:32 AM
> Subject: Different results return for capital and small letters.
> 
> 
> Hi,
> I am using Solr 1.3.
> I am facing a problem with the ordering of the results returned by
> Solr.
> Whenever I search for "cats", it gives me results. Next, whenever I
> search for "CATS", I get the same results, but the ordering is different. Is
> this the behavior of Solr? Is there any priority in searching
> depending on case?
> I want the same results for both. What should I do if this is the default
> behavior of Solr?
> Is there any problem with my indexing?
> Also, I already have LowerCaseFilter configuration for the
> Thanks,
> Tushar
> -- 
> View this message in context: 
> http://www.nabble.com/Different-results-return-for-capital-and-small-letters.-tp21228594p21228594.html
> Sent from the Solr - Dev mailing list archive at Nabble.com.