Re: strange behavior of solr query parser

2020-03-02 Thread Hongtai Xue
Hi Phil.Staley

Thanks for your reply,
but I'm afraid that's a different problem.

Our problem can be reproduced at least as far back as Solr 7.3.0 (the oldest version we 
have),
and we suspect it may have existed since SOLR-9786.
https://github.com/apache/lucene-solr/commit/bf9db95f218f49bac8e7971eb953a9fd9d13a2f0#diff-269ae02e56283ced3ce781cce21b3147R563

Sincerely,
Hongtai

From: "Staley, Phil R - DCF" 
Reply-To: "d...@lucene.apache.org" 
Date: Monday, March 2, 2020 22:38
To: solr_user lucene_apache , 
"d...@lucene.apache.org" 
Subject: Re: strange behavior of solr query parser

I believe we are experiencing the same thing.

We recently upgraded our Drupal 8 sites to SOLR 8.3.1.  We are now getting 
reports of certain patterns of search terms resulting in an error that reads, 
“The website encountered an unexpected error. Please try again later.”
 
Below is a list of example terms that always result in this error and a similar 
list that works fine.  The problem pattern seems to be a search term that 
contains 2 or 3 characters followed by a space, followed by additional text.
 
To confirm that the problem is version 8 of SOLR, I have updated our local and 
UAT sites with the latest Drupal updates that did include an update to the 
Search API Solr module and tested the terms below under SOLR 7.7.2, 8.3.1, and 
8.4.1.  Under version 7.7.2, everything works fine. Under either version 8 
release, the problem returns.
 
Thoughts?
 
Search terms that result in error
• w-2 agency directory
• agency w-2 directory
• w-2 agency
• w-2 directory
• w2 agency directory
• w2 agency
• w2 directory
 
Search terms that do not result in error
• w-22 agency directory
• agency directory w-2
• agency w-2directory
• agencyw-2 directory
• w-2
• w2
• agency directory
• agency
• directory
• -2 agency directory
• 2 agency directory
• w-2agency directory
• w2agency directory
 



From: Hongtai Xue 
Sent: Monday, March 2, 2020 3:45 AM
To: solr_user lucene_apache 
Cc: d...@lucene.apache.org 
Subject: strange behavior of solr query parser 
 
Hi,
 
Our team found some strange behavior in the Solr query parser.
In some specific cases, conditional clauses on an unindexed field are 
ignored.
 
For a query like q=A:1 OR B:1 OR A:2 OR B:2,
if field B is not indexed (but has docValues="true"), "B:1" is lost.
 
But if you write the query as q=A:1 OR A:2 OR B:1 OR B:2, 
it works perfectly.
 
The only difference between the two queries is the order of the clauses:
one is ABAB, the other is AABB.
 
■ Steps to reproduce, with an example
You can easily reproduce this problem on a Solr collection with the _default 
configset and the exampledocs/books.csv data.
 
1. Create a collection with the _default configset.
bin/solr create -c books -s 2 -rf 2
 
2. Post books.csv.
bin/post -c books example/exampledocs/books.csv
 
3. Run the following query.
http://localhost:8983/solr/books/select?q=%2B%28name_str%3AFoundation+OR+cat%3Abook+OR+name_str%3AJhereg+OR+cat%3Acd%29&debug=query
 
 
I printed the query-parsing debug information. 
You can see that "name_str:Foundation" is lost.
 
query: "name_str:Foundation OR cat:book OR name_str:Jhereg OR cat:cd"
(Please note that "Jhereg" is "4a 68 65 72 65 67" and "Foundation" is
"46 6f 75 6e 64 61 74 69 6f 6e" in hex.)

  "debug":{
    "rawquerystring":"+(name_str:Foundation OR cat:book OR name_str:Jhereg OR cat:cd)",
    "querystring":"+(name_str:Foundation OR cat:book OR name_str:Jhereg OR cat:cd)",
    "parsedquery":"+(cat:book cat:cd (name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]]))",
    "parsedquery_toString":"+(cat:book cat:cd name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]])",
    "QParser":"LuceneQParser"}}

 
But for the query "name_str:Foundation OR name_str:Jhereg OR cat:book OR cat:cd",
everything is OK: "name_str:Foundation" is not lost.

  "debug":{
    "rawquerystring":"+(name_str:Foundation OR name_str:Jhereg OR cat:book OR cat:cd)",
    "querystring":"+(name_str:Foundation OR name_str:Jhereg OR cat:book OR cat:cd)",
    "parsedquery":"+(cat:book cat:cd ((name_str:[[46 6f 75 6e 64 61 74 69 6f 6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]]) (name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]])))",
    "parsedquery_toString":"+(cat:book cat:cd (name_str:[[46 6f 75 6e 64 61 74 69 6f 6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]] name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]]))",
    "QParser":"LuceneQParser"}}

http://localhost:8983/solr/books/select?q=%2B%28name_str%3AFoundation+OR+name_str%3AJhereg+OR+cat%3Abook+OR+cat%3Acd%29&debug=query
 
We did a little research, and we wonder whether this is a bug in SolrQueryParser.
More specifically, we think the if statement here might be wrong.
https://github.com/apache/lucene-solr/blob/branch_8_4/solr/core/src/java/org/apache/solr/parser/SolrQueryParserBase.java#L711
 
Could you please tell us whether this is a bug or just an incorrectly written query?
 
Thanks,
Hongtai Xue



Re: strange behavior of solr query parser

2020-03-02 Thread Staley, Phil R - DCF
I believe we are experiencing the same thing.


We recently upgraded our Drupal 8 sites to SOLR 8.3.1.  We are now getting 
reports of certain patterns of search terms resulting in an error that reads, 
“The website encountered an unexpected error. Please try again later.”



Below is a list of example terms that always result in this error and a similar 
list that works fine.  The problem pattern seems to be a search term that 
contains 2 or 3 characters followed by a space, followed by additional text.



To confirm that the problem is version 8 of SOLR, I have updated our local and 
UAT sites with the latest Drupal updates that did include an update to the 
Search API Solr module and tested the terms below under SOLR 7.7.2, 8.3.1, and 
8.4.1.  Under version 7.7.2, everything works fine. Under either version 8 
release, the problem returns.



Thoughts?



Search terms that result in error

  *   w-2 agency directory
  *   agency w-2 directory
  *   w-2 agency
  *   w-2 directory
  *   w2 agency directory
  *   w2 agency
  *   w2 directory



Search terms that do not result in error

  *   w-22 agency directory
  *   agency directory w-2
  *   agency w-2directory
  *   agencyw-2 directory
  *   w-2
  *   w2
  *   agency directory
  *   agency
  *   directory
  *   -2 agency directory
  *   2 agency directory
  *   w-2agency directory
  *   w2agency directory





From: Hongtai Xue 
Sent: Monday, March 2, 2020 3:45 AM
To: solr_user lucene_apache 
Cc: d...@lucene.apache.org 
Subject: strange behavior of solr query parser


Hi,



Our team found some strange behavior in the Solr query parser.
In some specific cases, conditional clauses on an unindexed field are 
ignored.

For a query like q=A:1 OR B:1 OR A:2 OR B:2,
if field B is not indexed (but has docValues="true"), "B:1" is lost.

But if you write the query as q=A:1 OR A:2 OR B:1 OR B:2,
it works perfectly.

The only difference between the two queries is the order of the clauses:
one is ABAB, the other is AABB.



■ Steps to reproduce, with an example
You can easily reproduce this problem on a Solr collection with the _default 
configset and the exampledocs/books.csv data.

1. Create a collection with the _default configset.
bin/solr create -c books -s 2 -rf 2

2. Post books.csv.
bin/post -c books example/exampledocs/books.csv

3. Run the following query.

http://localhost:8983/solr/books/select?q=%2B%28name_str%3AFoundation+OR+cat%3Abook+OR+name_str%3AJhereg+OR+cat%3Acd%29&debug=query





I printed the query-parsing debug information.
You can see that "name_str:Foundation" is lost.



query: "name_str:Foundation OR cat:book OR name_str:Jhereg OR cat:cd"
(Please note that "Jhereg" is "4a 68 65 72 65 67" and "Foundation" is
"46 6f 75 6e 64 61 74 69 6f 6e" in hex.)

  "debug":{
    "rawquerystring":"+(name_str:Foundation OR cat:book OR name_str:Jhereg OR cat:cd)",
    "querystring":"+(name_str:Foundation OR cat:book OR name_str:Jhereg OR cat:cd)",
    "parsedquery":"+(cat:book cat:cd (name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]]))",
    "parsedquery_toString":"+(cat:book cat:cd name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]])",
    "QParser":"LuceneQParser"}}





But for the query "name_str:Foundation OR name_str:Jhereg OR cat:book OR cat:cd",
everything is OK: "name_str:Foundation" is not lost.

  "debug":{
    "rawquerystring":"+(name_str:Foundation OR name_str:Jhereg OR cat:book OR cat:cd)",
    "querystring":"+(name_str:Foundation OR name_str:Jhereg OR cat:book OR cat:cd)",
    "parsedquery":"+(cat:book cat:cd ((name_str:[[46 6f 75 6e 64 61 74 69 6f 6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]]) (name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]])))",
    "parsedquery_toString":"+(cat:book cat:cd (name_str:[[46 6f 75 6e 64 61 74 69 6f 6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]] name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]]))",
    "QParser":"LuceneQParser"}}



http://localhost:8983/solr/books/select?q=%2B%28name_str%3AFoundation+OR+name_str%3AJhereg+OR+cat%3Abook+OR+cat%3Acd%29&debug=query



We did a little research, and we wonder whether this is a bug in SolrQueryParser.
More specifically, we think the if statement here might be wrong.
https://github.com/apache/lucene-solr/blob/branch_8_4/solr/core/src/java/org/apache/solr/parser/SolrQueryParserBase.java#L711

Could you please tell us whether this is a bug or just an incorrectly written query?



Thanks,

Hongtai Xue


strange behavior of solr query parser

2020-03-02 Thread Hongtai Xue
Hi,

Our team found some strange behavior in the Solr query parser.
In some specific cases, conditional clauses on an unindexed field are 
ignored.

For a query like q=A:1 OR B:1 OR A:2 OR B:2,
if field B is not indexed (but has docValues="true"), "B:1" is lost.

But if you write the query as q=A:1 OR A:2 OR B:1 OR B:2,
it works perfectly.

The only difference between the two queries is the order of the clauses:
one is ABAB, the other is AABB.

■ Steps to reproduce, with an example
You can easily reproduce this problem on a Solr collection with the _default 
configset and the exampledocs/books.csv data.

1. Create a collection with the _default configset.
bin/solr create -c books -s 2 -rf 2

2. Post books.csv.
bin/post -c books example/exampledocs/books.csv

3. Run the following query.
http://localhost:8983/solr/books/select?q=%2B%28name_str%3AFoundation+OR+cat%3Abook+OR+name_str%3AJhereg+OR+cat%3Acd%29&debug=query
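For anyone who wants to try other clause orders, here is a minimal sketch (not part of the original report) that rebuilds the request URL from the raw query string using only the Python standard library. The helper name select_url is hypothetical; the host, collection, and parameters are the ones from the steps above.

```python
from urllib.parse import urlencode

# Endpoint from the reproduction steps (local Solr, "books" collection).
BASE = "http://localhost:8983/solr/books/select"


def select_url(raw_query: str) -> str:
    """Return the /select URL for raw_query with debug=query enabled."""
    # urlencode uses quote_plus, so '+' becomes %2B and spaces become '+'.
    return BASE + "?" + urlencode({"q": raw_query, "debug": "query"})


# The two clause orders from the report: ABAB and AABB.
abab = "+(name_str:Foundation OR cat:book OR name_str:Jhereg OR cat:cd)"
aabb = "+(name_str:Foundation OR name_str:Jhereg OR cat:book OR cat:cd)"

print(select_url(abab))
print(select_url(aabb))
```

Pasting either printed URL into a browser should return the same debug section as shown in this mail, assuming the collection was created as in steps 1 and 2.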


I printed the query-parsing debug information.
You can see that "name_str:Foundation" is lost.

query: "name_str:Foundation OR cat:book OR name_str:Jhereg OR cat:cd"
(Please note that "Jhereg" is "4a 68 65 72 65 67" and "Foundation" is
"46 6f 75 6e 64 61 74 69 6f 6e" in hex.)

  "debug":{
    "rawquerystring":"+(name_str:Foundation OR cat:book OR name_str:Jhereg OR cat:cd)",
    "querystring":"+(name_str:Foundation OR cat:book OR name_str:Jhereg OR cat:cd)",
    "parsedquery":"+(cat:book cat:cd (name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]]))",
    "parsedquery_toString":"+(cat:book cat:cd name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]])",
    "QParser":"LuceneQParser"}}


But for the query "name_str:Foundation OR name_str:Jhereg OR cat:book OR cat:cd",
everything is OK: "name_str:Foundation" is not lost.

  "debug":{
    "rawquerystring":"+(name_str:Foundation OR name_str:Jhereg OR cat:book OR cat:cd)",
    "querystring":"+(name_str:Foundation OR name_str:Jhereg OR cat:book OR cat:cd)",
    "parsedquery":"+(cat:book cat:cd ((name_str:[[46 6f 75 6e 64 61 74 69 6f 6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]]) (name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]])))",
    "parsedquery_toString":"+(cat:book cat:cd (name_str:[[46 6f 75 6e 64 61 74 69 6f 6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]] name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]]))",
    "QParser":"LuceneQParser"}}

http://localhost:8983/solr/books/select?q=%2B%28name_str%3AFoundation+OR+name_str%3AJhereg+OR+cat%3Abook+OR+cat%3Acd%29&debug=query
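Reading the hex ranges by eye is error-prone, so the comparison can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper names hex_term and missing_terms are hypothetical, and the two strings are the parsedquery_toString values copied from the debug output in this mail. It relies on bytes.hex with a separator, which needs Python 3.8 or later.

```python
def hex_term(value: str) -> str:
    """Render a term the way the debug output shows it, e.g. '4a 68 65 72 65 67'."""
    return value.encode("utf-8").hex(" ")


def missing_terms(parsed_query: str, field: str, values: list) -> list:
    """Return the values whose single-term range clause is absent from parsed_query."""
    missing = []
    for value in values:
        h = hex_term(value)
        clause = f"{field}:[[{h}] TO [{h}]]"
        if clause not in parsed_query:
            missing.append(value)
    return missing


# parsedquery_toString for the ABAB order and the AABB order, copied from above.
abab_parsed = "+(cat:book cat:cd name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]])"
aabb_parsed = (
    "+(cat:book cat:cd (name_str:[[46 6f 75 6e 64 61 74 69 6f 6e] TO "
    "[46 6f 75 6e 64 61 74 69 6f 6e]] name_str:[[4a 68 65 72 65 67] TO "
    "[4a 68 65 72 65 67]]))"
)

print(missing_terms(abab_parsed, "name_str", ["Foundation", "Jhereg"]))
print(missing_terms(aabb_parsed, "name_str", ["Foundation", "Jhereg"]))
```

The first call reports Foundation as missing and the second reports nothing missing, which is the behavior described in this report.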

We did a little research, and we wonder whether this is a bug in SolrQueryParser.
More specifically, we think the if statement here might be wrong.
https://github.com/apache/lucene-solr/blob/branch_8_4/solr/core/src/java/org/apache/solr/parser/SolrQueryParserBase.java#L711

Could you please tell us whether this is a bug or just an incorrectly written query?

Thanks,
Hongtai Xue


Re: strange behavior

2019-06-06 Thread Wendy2
Hi David,

I see. It is fixed now after adding the ().  Thank you so much!
q=audit_author.name:(Burley,%20S.K.)%20AND%20entity.type:polymer
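As a small sketch for anyone building this query in code rather than by hand: the encoded form above can be produced from the readable query with Python's urllib. The choice of characters to leave unescaped is an assumption made here only so the output matches the query string in this mail.

```python
from urllib.parse import quote

# Readable form of the fixed query: the parentheses bind both terms to the field.
readable = "audit_author.name:(Burley, S.K.) AND entity.type:polymer"

# Leave ':', '(', ')' and ',' unescaped; spaces are encoded as %20.
encoded = quote(readable, safe=":(),")

print("q=" + encoded)
```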



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: strange behavior

2019-06-06 Thread Wendy2
Hi Shawn,

I see. 

I added () and it works now. Thank you very much for your help!

q=audit_author.name:(Burley,%20S.K.)%20AND%20entity.type:polymer&rows=1





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: strange behavior

2019-06-06 Thread Shawn Heisey

On 6/6/2019 12:46 PM, Wendy2 wrote:

> Why doesn't "AND" work anymore?
>
> I use Solr 7.3.1 and the edismax parser.
> Could someone explain to me why the following query doesn't work any more?
> What could be the cause? Thanks!
>
> q=audit_author.name:Burley,%20S.K.%20AND%20entity.type:polymer
>
> It worked previously but now returns a much lower number of documents.
> I had to use "fq" to make it work correctly:
>
> q=audit_author.name:Burley,%20S.K.&fq=entity.type:polymer&rows=1


That should work no problem with edismax.  It would not however work 
properly with dismax, and it would be easy to mix up the two query parsers.


The way you have written your query is somewhat ambiguous, because of 
the space after the comma.  That ambiguity exists in both of the queries 
mentioned, even the one with the fq.


Thanks,
Shawn


Re: strange behavior

2019-06-06 Thread David Hastings
audit_author.name:Burley,%20S.K.

translates to
audit_author.name:Burley, DEFAULT_OPERATOR DEFAULT_FIELD:S.K.
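To make the translation above concrete, here is a toy model in Python. It is not Solr's parser and does not model the default operator; it only illustrates the one rule that matters here, namely that a field prefix binds to the next term or to a parenthesized group, and every other bare term falls back to the default field. The function name expand_fields and the DEFAULT_FIELD placeholder are hypothetical.

```python
import re


def expand_fields(query: str, default_field: str = "DEFAULT_FIELD") -> list:
    """Toy model: show which field each whitespace-separated term ends up in."""
    # A token is either field:(group of terms) or any run of non-space characters.
    tokens = re.findall(r"[\w.]+:\([^)]*\)|\S+", query)
    clauses = []
    for tok in tokens:
        if tok in ("AND", "OR", "NOT"):
            clauses.append(tok)  # boolean operators pass through unchanged
            continue
        grouped = re.match(r"([\w.]+):\((.*)\)$", tok)
        if grouped:
            # field:(a b) applies the field to every term in the group
            field, group = grouped.groups()
            clauses.extend(f"{field}:{term}" for term in group.split())
        elif re.match(r"[\w.]+:", tok):
            clauses.append(tok)  # explicit field on a single term
        else:
            clauses.append(f"{default_field}:{tok}")  # bare term
    return clauses


print(expand_fields("audit_author.name:Burley, S.K. AND entity.type:polymer"))
print(expand_fields("audit_author.name:(Burley, S.K.) AND entity.type:polymer"))
```

In the first call S.K. lands in the default field; in the second, the parentheses keep it in audit_author.name.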




On Thu, Jun 6, 2019 at 2:46 PM Wendy2  wrote:

>
> Hi,
>
> Why doesn't "AND" work anymore?
>
> I use Solr 7.3.1 and edismax parser.
> Could someone explain to me why the following query doesn't work any
> more?
> What could be the cause? Thanks!
>
> q=audit_author.name:Burley,%20S.K.%20AND%20entity.type:polymer
>
> It worked previously but now returns a much lower number of documents.
> I had to use "fq" to make it work correctly:
>
> q=audit_author.name:Burley,%20S.K.&fq=entity.type:polymer&rows=1
>
>
>
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


strange behavior

2019-06-06 Thread Wendy2


Hi,

Why doesn't "AND" work anymore?

I use Solr 7.3.1 and the edismax parser.
Could someone explain to me why the following query doesn't work any more?
What could be the cause? Thanks!

q=audit_author.name:Burley,%20S.K.%20AND%20entity.type:polymer

It worked previously but now returns a much lower number of documents.
I had to use "fq" to make it work correctly:

q=audit_author.name:Burley,%20S.K.&fq=entity.type:polymer&rows=1







--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Strange Behavior When Extracting Features

2017-10-16 Thread Michael Alcorn
If anyone else is following this thread, I replied on the Jira.

On Mon, Oct 16, 2017 at 4:07 AM, alessandro.benedetti 
wrote:

> This is interesting, the EFI parameter resolution should work using the
> quotes independently of the query parser.
> At that point, the query parsers (both) receive a multi term text.
> Both of them should work the same.
> At the time I saw the mail I tried to reproduce it through the LTR module
> tests and I didn't succeed.
> It would be quite useful if you could contribute a test that fails with
> the field query parser.
> Have you tried the same query, but in a request handler?
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Strange Behavior When Extracting Features

2017-10-16 Thread alessandro.benedetti
This is interesting, the EFI parameter resolution should work using the
quotes independently of the query parser.
At that point, the query parsers (both) receive a multi term text.
Both of them should work the same.
At the time I saw the mail I tried to reproduce it through the LTR module
tests and I didn't succeed.
It would be quite useful if you could contribute a test that fails with
the field query parser.
Have you tried the same query, but in a request handler?



-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Strange Behavior When Extracting Features

2017-10-13 Thread Michael Alcorn
I believe I've discovered a workaround. If you use:

{
"store": "redhat_efi_feature_store",
"name": "case_description_issue_tfidf",
"class": "org.apache.solr.ltr.feature.SolrFeature",
"params": {
"q":"{!dismax qf=text_tfidf}${text}"
}
}

instead of:

{
"store": "redhat_efi_feature_store",
"name": "case_description_issue_tfidf",
"class": "org.apache.solr.ltr.feature.SolrFeature",
"params": {
"q": "{!field f=issue_tfidf}${case_description}"
}
}

you can then use single quotes to incorporate multi-term arguments as
Alessandro suggested. I've added this information to the Jira.

On Fri, Sep 22, 2017 at 8:30 AM, alessandro.benedetti 
wrote:

> I think this has nothing to do with the LTR plugin.
> The problem here is probably just the way you use the local params:
> to pass a multi-term local param in Solr properly, you need to wrap it in single quotes ('):
>
> efi.case_description='added couple of fiber channel'
>
> This should work.
> If not, only the first term will be passed as a local param and then passed
> in the efi map to LTR.
>
> I will update the Jira issue as well.
>
> Cheers
>
>
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Strange Behavior When Extracting Features

2017-09-22 Thread alessandro.benedetti
I think this has nothing to do with the LTR plugin.
The problem here is probably just the way you use the local params:
to pass a multi-term local param in Solr properly, you need to wrap it in single quotes ('):

efi.case_description='added couple of fiber channel'

This should work.
If not, only the first term will be passed as a local param and then passed
in the efi map to LTR.
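The quoting rule above can be sketched as a small helper that assembles the rq parameter. This is a minimal sketch under assumptions: the function name ltr_rq is hypothetical, the model name and efi keys are the ones from the original question, and values that themselves contain a single quote are not handled.

```python
def ltr_rq(model: str, rerank_docs: int, efi: dict) -> str:
    """Build an {!ltr ...} rq value, single-quoting multi-term efi values."""
    parts = [f"model={model}", f"reRankDocs={rerank_docs}"]
    for name, value in efi.items():
        # Without quotes, only the first term would be read as the param value.
        quoted = f"'{value}'" if " " in value else value
        parts.append(f"efi.{name}={quoted}")
    return "{!ltr " + " ".join(parts) + "}"


rq = ltr_rq(
    "redhat_efi_model",
    1,
    {"case_summary": "the", "case_description": "added couple of fiber channel"},
)
print(rq)
```

The resulting string still needs URL encoding before it is sent as a request parameter.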

I will update the Jira issue as well.

Cheers





-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Strange Behavior When Extracting Features

2017-09-20 Thread Michael Alcorn
Hi all,

I'm getting some extremely strange behavior when trying to extract features
for a learning to rank model. The following query incorrectly says all
features have zero values:

http://gss-test-fusion.usersys.redhat.com:8983/solr/access/query?q=added
couple of fiber channel&rq={!ltr model=redhat_efi_model reRankDocs=1
efi.case_summary=the efi.case_description=added couple of fiber channel
efi.case_issue=the efi.case_environment=the}&fl=id,score,[features]&rows=10

But this query, which simply moves the word "added" from the front of the
provided text to the back, properly fills in the feature values:

http://gss-test-fusion.usersys.redhat.com:8983/solr/access/query?q=couple
of fiber channel added&rq={!ltr model=redhat_efi_model reRankDocs=1
efi.case_summary=the efi.case_description=couple of fiber channel added
efi.case_issue=the efi.case_environment=the}&fl=id,score,[features]&rows=10

The explain output for the failing query can be found here:

https://gist.github.com/manisnesan/18a8f1804f29b1b62ebfae1211f38cc4

and the explain output for the properly functioning query can be found here:

https://gist.github.com/manisnesan/47685a561605e2229434b38aed11cc65

Have any of you run into this issue? Seems like it could be a bug.

Thanks,
Michael A. Alcorn


Re: Strange behavior of solr

2015-09-02 Thread Erik Hatcher
See example/films/README.txt

The “name” field is guessed incorrectly (because the first film has name=“.45”), 
so indexing fails once it hits a name value that is not numeric.  The 
README provides a command to define the name field *before* indexing.  If 
you’ve indexed and had the name field guessed incorrectly and created, you’ll 
need to delete and recreate the collection, then define the name field, then 
reindex.
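For illustration, defining the field up front can be sketched as a Schema API add-field request. This is a sketch under assumptions, not the README's exact command: the collection name mycore comes from the question, while the field type text_general, the stored flag, and the helper name add_field_request are assumptions. The request is only constructed here, not sent.

```python
import json
from urllib.request import Request


def add_field_request(base_url: str, collection: str, field_name: str, field_type: str) -> Request:
    """Build (but do not send) a Schema API request that defines one field."""
    # "add-field" is the Schema API command; the type and stored flag are assumptions.
    payload = {"add-field": {"name": field_name, "type": field_type, "stored": True}}
    return Request(
        url=f"{base_url}/solr/{collection}/schema",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = add_field_request("http://localhost:8983", "mycore", "name", "text_general")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing req to urllib.request.urlopen would send it; that should happen after recreating the collection and before indexing the films file.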

We used to have a fake film at the top to allow field guessing to “work”, but I 
felt that was too fake and that the example should be true to what happens with 
real world data and the pitfalls of allowing field type guessing to guess 
incorrectly.

—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com




> On Sep 2, 2015, at 5:17 AM, Long Yan  wrote:
> 
> Hey,
> I have created a core with
> bin\solr create -c mycore
> 
> I want to index the csv sample files from solr-5.2.1
> 
> If I index film.csv under solr-5.2.1\example\films\, solr can only index this 
> file until the line
> "2046,Wong Kar-wai,Romance Film|Fantasy|Science 
> Fiction|Drama,,/en/2046_2004,2004-05-20"
> 
> But if I at first index books.csv under solr-5.2.1\example\exampledocs and 
> then index film.csv, solr can index all lines in film.csv
> 
> Why?
> 
> Regards
> Long Yan
> 
> 



Re: Strange behavior of solr

2015-09-02 Thread Zheng Lin Edwin Yeo
Is there any error message in the log when Solr stops indexing the file at
line 2046?

Regards,
Edwin

On 2 September 2015 at 17:17, Long Yan  wrote:

> Hey,
> I have created a core with
> bin\solr create -c mycore
>
> I want to index the csv sample files from solr-5.2.1
>
> If I index film.csv under solr-5.2.1\example\films\, solr can only index
> this file until the line
> "2046,Wong Kar-wai,Romance Film|Fantasy|Science
> Fiction|Drama,,/en/2046_2004,2004-05-20"
>
> But if I at first index books.csv under solr-5.2.1\example\exampledocs and
> then index film.csv, solr can index all lines in film.csv
>
> Why?
>
> Regards
> Long Yan
>
>
>


Strange behavior of solr

2015-09-02 Thread Long Yan
Hey,
I have created a core with
bin\solr create -c mycore

I want to index the csv sample files from solr-5.2.1

If I index film.csv under solr-5.2.1\example\films\, solr can only index this 
file until the line
"2046,Wong Kar-wai,Romance Film|Fantasy|Science 
Fiction|Drama,,/en/2046_2004,2004-05-20"

But if I at first index books.csv under solr-5.2.1\example\exampledocs and then 
index film.csv, solr can index all lines in film.csv

Why?

Regards
Long Yan




Re: Strange Behavior

2014-08-23 Thread Shawn Heisey
On 8/23/2014 9:01 AM, Jack Krupansky wrote:
> It sounds as if you are trying to treat hyphen as a digit so that
> negative numbers are discrete terms. But... that conflicts with the use
> of hyphen as a word separator. Sorry, but WDF does not support both.
> Pick one or the other, you can't have both.
> 
> But first, please explain your intended use case clearly - there may be
> some better way to try to achieve it.
> 
> Use the analysis page of the Solr Admin UI to see the detailed query and
> index analysis of your terms. You'll be surprised.

You can force WDF to treat hyphen as a digit if you want to, but you are
right that you cannot have both.  To change WDF, create a text file, put
the following in it, and reference it with the types parameter on
WordDelimiterFilterFactory:

- => DIGIT

I use this functionality to build a special analysis chain for
mimetypes.  For that fieldType, I treat hyphen and underscore as ALPHANUM.

Search for "wdfftypes" on this page for more info:

https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

Naturally you have to reindex after making this change.  For anyone who
doesn't know what that entails:

http://wiki.apache.org/solr/HowToReindex

Thanks,
Shawn



Re: Strange Behavior

2014-08-23 Thread Jack Krupansky
It sounds as if you are trying to treat hyphen as a digit so that negative 
numbers are discrete terms. But... that conflicts with the use of hyphen as 
a word separator. Sorry, but WDF does not support both. Pick one or the 
other, you can't have both.


But first, please explain your intended use case clearly - there may be some 
better way to try to achieve it.


Use the analysis page of the Solr Admin UI to see the detailed query and 
index analysis of your terms. You'll be surprised.


-- Jack Krupansky

-Original Message- 
From: EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)

Sent: Thursday, August 21, 2014 2:31 PM
To: solr-user@lucene.apache.org
Subject: Strange Behavior

Hi, I have a field type text_general whose query analyzer uses the word 
delimiter filter configuration below, where wddftype.txt contains "- DIGIT".



When I run a query I do not get the right results. E.g. Name:"Wi-Fi" 
returns results, but Name:"Wi-Fi Devices Make" returns none;

if I change it to Name:"Wi-Fi Devices Make"~3, it works.

Could someone explain what is happening here? FYI, I 
have types="wdfftypes.txt" in the query analyzer.



My Fieldtype

positionIncrementGap="100">

 

   
   

   words="stopwords.txt" />


   
   

   generateWordParts="1" generateNumberParts="0" splitOnCaseChange="0"
splitOnNumerics="0" stemEnglishPossessive="0" 
catenateWords="1" catenateNumbers="1"

catenateAll="1" preserveOriginal="1" />

   synonyms="synonyms.txt" ignoreCase="true" expand="true"/>



 

   
   

   words="stopwords.txt" />


   
   

generateWordParts="1" generateNumberParts="0" splitOnCaseChange="0"
splitOnNumerics="0" stemEnglishPossessive="0" 
catenateWords="1" catenateNumbers="1"
catenateAll="1" preserveOriginal="1" 
types="wdfftypes.txt" />
   synonyms="synonyms.txt" ignoreCase="true" expand="true"/>



   





Strange Behavior

2014-08-21 Thread EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
Hi, I have a field type text_general whose query analyzer uses the word 
delimiter filter configuration below, where wddftype.txt contains "- DIGIT".


When I run a query I do not get the right results. E.g. Name:"Wi-Fi" returns 
results, but Name:"Wi-Fi Devices Make" returns none;
if I change it to Name:"Wi-Fi Devices Make"~3, it works.

Could someone explain what is happening here? FYI, I 
have types="wdfftypes.txt" in the query analyzer.


My Fieldtype


  













 
  









 


 






Re: Strange Behavior with Solr in Tomcat.

2014-06-07 Thread Shalin Shekhar Mangar
Interesting, thanks for reporting back. I've re-opened SOLR-4408.


On Sat, Jun 7, 2014 at 10:50 PM, S.L  wrote:

> Thanks, Meraj, that was exactly the issue. Setting
> <useColdSearcher>true</useColdSearcher> worked like a charm and the server
> starts up as usual.
>
> Thanks again!
>
>
> On Fri, Jun 6, 2014 at 2:42 PM, Meraj A. Khan  wrote:
>
> > This looks distinctly related to
> > https://issues.apache.org/jira/browse/SOLR-4408 , try coldSearcher =
> true
> > as being suggested in JIRA and let us know .
> >
> >
> > On Fri, Jun 6, 2014 at 2:39 PM, Jean-Sebastien Vachon <
> > jean-sebastien.vac...@wantedanalytics.com> wrote:
> >
> > > I would try a thread dump and check the output to see what`s going on.
> > > You could also strace the process if you`re running on Unix or changed
> > the
> > > log level in Solr to get more information logged
> > >
> > > > -Original Message-
> > > > From: S.L [mailto:simpleliving...@gmail.com]
> > > > Sent: June-06-14 2:33 PM
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Re: Strange Behavior with Solr in Tomcat.
> > > >
> > > > Anyone folks?
> > > >
> > > >
> > > > On Wed, Jun 4, 2014 at 10:25 AM, S.L 
> > wrote:
> > > >
> > > > >  Hi Folks,
> > > > >
> > > > > I recently started using the spellchecker in my solrconfig.xml. I
> am
> > > > > able to build up an index in Solr.
> > > > >
> > > > > But,if I ever shutdown tomcat I am not able to restart it.The
> server
> > > > > never spits out the server startup time in seconds in the logs,nor
> > > > > does it print any error messages in the catalina.out file.
> > > > >
> > > > > The only way for me to get around this is by delete the data
> > directory
> > > > > of the index and then start the server,obviously this makes me
> loose
> > my
> > > > index.
> > > > >
> > > > > Just wondering if anyone faced a similar issue and if they were
> able
> > > > > to solve this.
> > > > >
> > > > > Thanks.
> > > > >
> > > > >
> > > >
> > >
> >
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Strange Behavior with Solr in Tomcat.

2014-06-07 Thread S.L
Thanks, Meraj, that was exactly the issue. Setting
<useColdSearcher>true</useColdSearcher> worked like a charm and the server
starts up as usual.

Thanks again!


On Fri, Jun 6, 2014 at 2:42 PM, Meraj A. Khan  wrote:

> This looks distinctly related to
> https://issues.apache.org/jira/browse/SOLR-4408 , try coldSearcher = true
> as being suggested in JIRA and let us know .
>
>
> On Fri, Jun 6, 2014 at 2:39 PM, Jean-Sebastien Vachon <
> jean-sebastien.vac...@wantedanalytics.com> wrote:
>
> > I would try a thread dump and check the output to see what`s going on.
> > You could also strace the process if you`re running on Unix or changed
> the
> > log level in Solr to get more information logged
> >
> > > -Original Message-
> > > From: S.L [mailto:simpleliving...@gmail.com]
> > > Sent: June-06-14 2:33 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Strange Behavior with Solr in Tomcat.
> > >
> > > Anyone folks?
> > >
> > >
> > > On Wed, Jun 4, 2014 at 10:25 AM, S.L 
> wrote:
> > >
> > > >  Hi Folks,
> > > >
> > > > I recently started using the spellchecker in my solrconfig.xml. I am
> > > > able to build up an index in Solr.
> > > >
> > > > But,if I ever shutdown tomcat I am not able to restart it.The server
> > > > never spits out the server startup time in seconds in the logs,nor
> > > > does it print any error messages in the catalina.out file.
> > > >
> > > > The only way for me to get around this is by delete the data
> directory
> > > > of the index and then start the server,obviously this makes me loose
> my
> > > index.
> > > >
> > > > Just wondering if anyone faced a similar issue and if they were able
> > > > to solve this.
> > > >
> > > > Thanks.
> > > >
> > > >
> > >
> >
>


Re: Strange Behavior with Solr in Tomcat.

2014-06-06 Thread Meraj A. Khan
This looks closely related to
https://issues.apache.org/jira/browse/SOLR-4408. Try coldSearcher = true,
as suggested in the JIRA issue, and let us know.
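Before restarting, it can help to confirm what the config currently says. Here is a minimal sketch, assuming the setting is the useColdSearcher element inside the query section of solrconfig.xml; the helper name use_cold_searcher and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET


def use_cold_searcher(solrconfig_xml: str) -> bool:
    """Return True if <query><useColdSearcher> is present and set to true."""
    root = ET.fromstring(solrconfig_xml)
    node = root.find("./query/useColdSearcher")
    # A missing element is treated as false here.
    return node is not None and (node.text or "").strip().lower() == "true"


# Made-up minimal config used only to exercise the helper.
SAMPLE = """<config>
  <query>
    <useColdSearcher>true</useColdSearcher>
  </query>
</config>"""

print(use_cold_searcher(SAMPLE))
```

For a real installation, read the core's solrconfig.xml from disk and pass its contents to the helper.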


On Fri, Jun 6, 2014 at 2:39 PM, Jean-Sebastien Vachon <
jean-sebastien.vac...@wantedanalytics.com> wrote:

> I would try a thread dump and check the output to see what`s going on.
> You could also strace the process if you`re running on Unix or changed the
> log level in Solr to get more information logged
>
> > -Original Message-
> > From: S.L [mailto:simpleliving...@gmail.com]
> > Sent: June-06-14 2:33 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Strange Behavior with Solr in Tomcat.
> >
> > Anyone folks?
> >
> >
> > On Wed, Jun 4, 2014 at 10:25 AM, S.L  wrote:
> >
> > >  Hi Folks,
> > >
> > > I recently started using the spellchecker in my solrconfig.xml. I am
> > > able to build up an index in Solr.
> > >
> > > But,if I ever shutdown tomcat I am not able to restart it.The server
> > > never spits out the server startup time in seconds in the logs,nor
> > > does it print any error messages in the catalina.out file.
> > >
> > > The only way for me to get around this is by delete the data directory
> > > of the index and then start the server,obviously this makes me loose my
> > index.
> > >
> > > Just wondering if anyone faced a similar issue and if they were able
> > > to solve this.
> > >
> > > Thanks.
> > >
> > >
> >
>


RE: Strange Behavior with Solr in Tomcat.

2014-06-06 Thread Jean-Sebastien Vachon
I would try a thread dump and check the output to see what's going on. 
You could also strace the process if you're running on Unix, or change the log 
level in Solr to get more information logged.

> -Original Message-
> From: S.L [mailto:simpleliving...@gmail.com]
> Sent: June-06-14 2:33 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Strange Behavior with Solr in Tomcat.
> 
> Anyone folks?
> 
> 
> On Wed, Jun 4, 2014 at 10:25 AM, S.L  wrote:
> 
> >  Hi Folks,
> >
> > I recently started using the spellchecker in my solrconfig.xml. I am
> > able to build up an index in Solr.
> >
> > But,if I ever shutdown tomcat I am not able to restart it.The server
> > never spits out the server startup time in seconds in the logs,nor
> > does it print any error messages in the catalina.out file.
> >
> > The only way for me to get around this is by delete the data directory
> > of the index and then start the server,obviously this makes me loose my
> index.
> >
> > Just wondering if anyone faced a similar issue and if they were able
> > to solve this.
> >
> > Thanks.
> >
> >
> 


Re: Strange Behavior with Solr in Tomcat.

2014-06-06 Thread S.L
Anyone folks?


On Wed, Jun 4, 2014 at 10:25 AM, S.L  wrote:

>  Hi Folks,
>
> I recently started using the spellchecker in my solrconfig.xml. I am able
> to build up an index in Solr.
>
> But,if I ever shutdown tomcat I am not able to restart it.The server never
> spits out the server startup time in seconds in the logs,nor does it print
> any error messages in the catalina.out file.
>
> The only way for me to get around this is by delete the data directory of
> the index and then start the server,obviously this makes me loose my index.
>
> Just wondering if anyone faced a similar issue and if they were able to
> solve this.
>
> Thanks.
>
>


Re: Strange Behavior with Solr in Tomcat.

2014-06-04 Thread S.L
Hi,

This is not a case of accidental deletion. The only way I can restart
Tomcat is by deleting the data directory for the index that was created
earlier. This started happening after I started using spellcheckers in my
solrconfig.xml. As long as Tomcat is running, it's fine.

Any help from anyone who has faced a similar issue would be appreciated.

Thanks.



On Wed, Jun 4, 2014 at 11:08 AM, Aman Tandon  wrote:

> I guess if you try to copy the index and then kill the process of tomcat
> then it might help. If still the index need to be delete you would have the
> back up. Next time always make back up.
> On Jun 4, 2014 7:55 PM, "S.L"  wrote:
>
> > Hi Folks,
> >
> > I recently started using the spellchecker in my solrconfig.xml. I am able
> > to build up an index in Solr.
> >
> > But,if I ever shutdown tomcat I am not able to restart it.The server
> never
> > spits out the server startup time in seconds in the logs,nor does it
> print
> > any error messages in the catalina.out file.
> >
> > The only way for me to get around this is by delete the data directory of
> > the index and then start the server,obviously this makes me loose my
> index.
> >
> > Just wondering if anyone faced a similar issue and if they were able to
> > solve this.
> >
> > Thanks.
> >
> >
>


Re: Strange Behavior with Solr in Tomcat.

2014-06-04 Thread Aman Tandon
I guess it might help if you copy the index first and then kill the Tomcat
process. If the index still needs to be deleted, you would have the
backup. Next time, always make a backup.
On Jun 4, 2014 7:55 PM, "S.L"  wrote:

> Hi Folks,
>
> I recently started using the spellchecker in my solrconfig.xml. I am able
> to build up an index in Solr.
>
> But,if I ever shutdown tomcat I am not able to restart it.The server never
> spits out the server startup time in seconds in the logs,nor does it print
> any error messages in the catalina.out file.
>
> The only way for me to get around this is by delete the data directory of
> the index and then start the server,obviously this makes me loose my index.
>
> Just wondering if anyone faced a similar issue and if they were able to
> solve this.
>
> Thanks.
>
>


Strange Behavior with Solr in Tomcat.

2014-06-04 Thread S.L
Hi Folks,

I recently started using the spellchecker in my solrconfig.xml. I am able to 
build up an index in Solr.

But if I ever shut down Tomcat, I am not able to restart it. The server never 
prints the server startup time in seconds in the logs, nor does it print any 
error messages in the catalina.out file.

The only way for me to get around this is by deleting the data directory of the 
index and then starting the server; obviously, this makes me lose my index.

Just wondering if anyone has faced a similar issue and whether they were able 
to solve this.

Thanks.



Re: Strange behavior of edismax and mm=0 with long queries (bug?)

2014-04-06 Thread Nils Kaiser
Actually, I found out why... I had "and" as a lowercase word in my queries, and
the checkbox does not seem to work in the admin UI.
Adding lowercaseOperators=false made the queries work.
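
For anyone hitting the same thing, here is a minimal sketch (not part of the original setup) that scans free text for words edismax may treat as operators and builds request parameters with lowercaseOperators=false. Judging from the parsed query in this thread, the mixed-case "And" after "Period." was picked up as well, so the check below is case-insensitive. The helper names are made up for illustration.

```python
import re
from urllib.parse import urlencode

def find_lowercase_operators(query_text):
    """Return words that read as AND/OR but are not written in all caps."""
    words = re.findall(r"[A-Za-z']+", query_text)
    return [w for w in words if w.upper() in {"AND", "OR"} and w != w.upper()]

def build_params(query_text):
    """Hypothetical helper: edismax parameters with lowercase operators disabled."""
    return urlencode({
        "defType": "edismax",
        "q": query_text,
        "mm": "1",
        "lowercaseOperators": "false",
    })

comment = "put them both together and there is no competition. Period. And it's a competition"
print(find_lowercase_operators(comment))  # ['and', 'And']
print(build_params("sunny side and street"))
```

If the first call returns anything, those words are candidates for being turned into operators, which would explain otherwise optional terms showing up as required in debugQuery.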


2014-04-04 18:10 GMT+02:00 Nils Kaiser :

> Hey,
>
> I am currently using solr to recognize songs and people from a list of
> user comments. My index stores the titles of the songs. At the moment my
> application builds word ngrams and fires a search with that query, which
> works well but is quite inefficient.
>
> So my thought was to simply use the collated comments as query. So it is a
> case where the query is much longer. I need to use mm=0 or mm=1.
>
> My plan was to use edismax as the pf2 and pf3 parameters should work well
> for my usecase.
>
> However when using longer queries, I get a strange behavior which can be
> seen in debugQuery.
>
> Here is an example:
>
> Collated Comments (used as query)
>
> "I love Henry so much. It is hard to tear your eyes away from Maria, but
> watch just his feet. You'll be amazed.
> sometimes pure skill can will a comp, sometimes pure joy can win... put
> them both together and there is no competition
> This video clip makes me smile.
> Pure joy!
> so good!
> Who's the person that gave this a thumbs down?!? This is one of the best
> routines I've ever seen. Period. And it's a competitionl! How is that
> possible? They're so good it boggles my mind.
> It's gorgeous. Flawless victory.
> Great number! Does anybody know the name of the piece?
> I believe it's called Sunny side of the street
> Maria is like, the best 'follow' I've ever seen. She's so amazing.
> Thanks so much Johnathan!"
>
> Song name in Index
> Louis Armstrong - Sunny Side of The Street
>
> parsedquery_toString:
> +(((text:I) (text:love) (text:Henry) (text:so) (text:much.) (text:It)
> (text:is) (text:hard) (text:to) (text:tear) (text:your) (text:eyes)
> (text:away) (text:from) (text:Maria,) (text:but) (text:watch) (text:just)
> (text:his) (text:feet.) (text:You'll) (text:be) (text:amazed.)
> (text:sometimes) (text:pure) (text:skill) (text:can) (text:will) (text:a)
> (text:comp,) (text:sometimes) (text:pure) (text:joy) (text:can)
> (text:win...) (text:put) (text:them) (text:both) +(text:together)
> +(text:there) (text:is) (text:no) (text:competition) (text:This)
> (text:video) (text:clip) (text:makes) (text:me) (text:smile.) (text:Pure)
> (text:joy!) (text:so) (text:good!) (text:Who's) (text:the) (text:person)
> (text:that) (text:gave) (text:this) (text:a) (text:thumbs) (text:down?!?)
> (text:This) (text:is) (text:one) (text:of) (text:the) (text:best)
> (text:routines) (text:I've) (text:ever) (text:seen.) +(text:Period.)
> +(text:it's) (text:a) (text:competitionl!) (text:How) (text:is) (text:that)
> (text:possible?) (text:They're) (text:so) (text:good) (text:it)
> (text:boggles) (text:my) (text:mind.) (text:It's) (text:gorgeous.)
> (text:Flawless) (text:victory.) (text:Great) (text:number!) (text:Does)
> (text:anybody) (text:know) (text:the) (text:name) (text:of) (text:the)
> (text:piece?) (text:I) (text:believe) (text:it's) (text:called)
> (text:Sunny) (text:side) (text:of) (text:the) (text:street) (text:Maria)
> (text:is) (text:like,) (text:the) (text:best) (text:'follow') (text:I've)
> (text:ever) (text:seen.) (text:She's) (text:so) (text:amazing.)
> (text:Thanks) (text:so) (text:much) (text:Johnathan!))~1)
>
> This query generates 0 results. The reason is it expects terms together,
> there, Period., it's to be part of the document (see parsedquery above, all
> other terms are optional, those terms are must).
>
> Is there any reason for this behavior? If I use shorter queries it works
> flawlessly and returns the document.
>
> I've appended the whole query.
>
> Best,
>
> Nils
>


Re: Strange behavior of edismax and mm=0 with long queries (bug?)

2014-04-05 Thread Jack Krupansky
Set the q.op parameter to OR and set mm=10% or something like that. The idea is 
to not excessively restrict the documents that will match, but weight the 
matched results based on how many word pairs and triples do match.

In addition, use the pf parameter to provide extra weight when the full query 
term phrase matches exactly.
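
A minimal sketch of what those parameters could look like when built from client code. The field name "text" is taken from the parsed query in the original post; using it for qf, pf, pf2 and pf3 alike is an assumption, and the function name is made up for illustration.

```python
from urllib.parse import urlencode, parse_qs

def build_edismax_params(user_text, field="text"):
    """Hypothetical helper: loose matching (q.op=OR, mm=10%) plus phrase boosts."""
    params = {
        "defType": "edismax",
        "q": user_text,
        "qf": field,
        "q.op": "OR",   # do not require every term
        "mm": "10%",    # only a small share of the terms has to match
        "pf": field,    # extra weight when the whole query matches as a phrase
        "pf2": field,   # extra weight for matching word pairs
        "pf3": field,   # extra weight for matching word triples
    }
    return urlencode(params)

query_string = build_edismax_params("I believe it's called Sunny side of the street")
print(query_string)
print(parse_qs(query_string)["mm"])  # ['10%']
```

Note that the percent sign in mm has to be URL-encoded (10%25), which urlencode takes care of here.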

-- Jack Krupansky

From: Nils Kaiser 
Sent: Friday, April 4, 2014 10:10 AM
To: solr-user@lucene.apache.org 
Subject: Strange behavior of edismax and mm=0 with long queries (bug?)

Hey, 

I am currently using solr to recognize songs and people from a list of user 
comments. My index stores the titles of the songs. At the moment my application 
builds word ngrams and fires a search with that query, which works well but is 
quite inefficient.

So my thought was to simply use the collated comments as query. So it is a case 
where the query is much longer. I need to use mm=0 or mm=1.

My plan was to use edismax as the pf2 and pf3 parameters should work well for 
my usecase.

However when using longer queries, I get a strange behavior which can be seen 
in debugQuery.

Here is an example:

Collated Comments (used as query)

"I love Henry so much. It is hard to tear your eyes away from Maria, but watch 
just his feet. You'll be amazed.
sometimes pure skill can will a comp, sometimes pure joy can win... put them 
both together and there is no competition
This video clip makes me smile.
Pure joy!
so good!
Who's the person that gave this a thumbs down?!? This is one of the best 
routines I've ever seen. Period. And it's a competitionl! How is that possible? 
They're so good it boggles my mind.
It's gorgeous. Flawless victory.
Great number! Does anybody know the name of the piece?
I believe it's called Sunny side of the street
Maria is like, the best 'follow' I've ever seen. She's so amazing.
Thanks so much Johnathan!"

Song name in Index
Louis Armstrong - Sunny Side of The Street

parsedquery_toString:
+(((text:I) (text:love) (text:Henry) (text:so) (text:much.) (text:It) (text:is) 
(text:hard) (text:to) (text:tear) (text:your) (text:eyes) (text:away) 
(text:from) (text:Maria,) (text:but) (text:watch) (text:just) (text:his) 
(text:feet.) (text:You'll) (text:be) (text:amazed.) (text:sometimes) 
(text:pure) (text:skill) (text:can) (text:will) (text:a) (text:comp,) 
(text:sometimes) (text:pure) (text:joy) (text:can) (text:win...) (text:put) 
(text:them) (text:both) +(text:together) +(text:there) (text:is) (text:no) 
(text:competition) (text:This) (text:video) (text:clip) (text:makes) (text:me) 
(text:smile.) (text:Pure) (text:joy!) (text:so) (text:good!) (text:Who's) 
(text:the) (text:person) (text:that) (text:gave) (text:this) (text:a) 
(text:thumbs) (text:down?!?) (text:This) (text:is) (text:one) (text:of) 
(text:the) (text:best) (text:routines) (text:I've) (text:ever) (text:seen.) 
+(text:Period.) +(text:it's) (text:a) (text:competitionl!) (text:How) (text:is) 
(text:that) (text:possible?) (text:They're) (text:so) (text:good) (text:it) 
(text:boggles) (text:my) (text:mind.) (text:It's) (text:gorgeous.) 
(text:Flawless) (text:victory.) (text:Great) (text:number!) (text:Does) 
(text:anybody) (text:know) (text:the) (text:name) (text:of) (text:the) 
(text:piece?) (text:I) (text:believe) (text:it's) (text:called) (text:Sunny) 
(text:side) (text:of) (text:the) (text:street) (text:Maria) (text:is) 
(text:like,) (text:the) (text:best) (text:'follow') (text:I've) (text:ever) 
(text:seen.) (text:She's) (text:so) (text:amazing.) (text:Thanks) (text:so) 
(text:much) (text:Johnathan!))~1)
 
This query generates 0 results. The reason is it expects terms together, there, 
Period., it's to be part of the document (see parsedquery above, all other 
terms are optional, those terms are must).

Is there any reason for this behavior? If I use shorter queries it works 
flawlessly and returns the document.

I've appended the whole query.

Best,

Nils

Strange behavior of edismax and mm=0 with long queries (bug?)

2014-04-04 Thread Nils Kaiser
Hey,

I am currently using solr to recognize songs and people from a list of user
comments. My index stores the titles of the songs. At the moment my
application builds word ngrams and fires a search with that query, which
works well but is quite inefficient.

So my thought was to simply use the collated comments as query. So it is a
case where the query is much longer. I need to use mm=0 or mm=1.

My plan was to use edismax as the pf2 and pf3 parameters should work well
for my usecase.

However when using longer queries, I get a strange behavior which can be
seen in debugQuery.

Here is an example:

Collated Comments (used as query)

"I love Henry so much. It is hard to tear your eyes away from Maria, but
watch just his feet. You'll be amazed.
sometimes pure skill can will a comp, sometimes pure joy can win... put
them both together and there is no competition
This video clip makes me smile.
Pure joy!
so good!
Who's the person that gave this a thumbs down?!? This is one of the best
routines I've ever seen. Period. And it's a competitionl! How is that
possible? They're so good it boggles my mind.
It's gorgeous. Flawless victory.
Great number! Does anybody know the name of the piece?
I believe it's called Sunny side of the street
Maria is like, the best 'follow' I've ever seen. She's so amazing.
Thanks so much Johnathan!"

Song name in Index
Louis Armstrong - Sunny Side of The Street

parsedquery_toString:
+(((text:I) (text:love) (text:Henry) (text:so) (text:much.) (text:It)
(text:is) (text:hard) (text:to) (text:tear) (text:your) (text:eyes)
(text:away) (text:from) (text:Maria,) (text:but) (text:watch) (text:just)
(text:his) (text:feet.) (text:You'll) (text:be) (text:amazed.)
(text:sometimes) (text:pure) (text:skill) (text:can) (text:will) (text:a)
(text:comp,) (text:sometimes) (text:pure) (text:joy) (text:can)
(text:win...) (text:put) (text:them) (text:both) +(text:together)
+(text:there) (text:is) (text:no) (text:competition) (text:This)
(text:video) (text:clip) (text:makes) (text:me) (text:smile.) (text:Pure)
(text:joy!) (text:so) (text:good!) (text:Who's) (text:the) (text:person)
(text:that) (text:gave) (text:this) (text:a) (text:thumbs) (text:down?!?)
(text:This) (text:is) (text:one) (text:of) (text:the) (text:best)
(text:routines) (text:I've) (text:ever) (text:seen.) +(text:Period.)
+(text:it's) (text:a) (text:competitionl!) (text:How) (text:is) (text:that)
(text:possible?) (text:They're) (text:so) (text:good) (text:it)
(text:boggles) (text:my) (text:mind.) (text:It's) (text:gorgeous.)
(text:Flawless) (text:victory.) (text:Great) (text:number!) (text:Does)
(text:anybody) (text:know) (text:the) (text:name) (text:of) (text:the)
(text:piece?) (text:I) (text:believe) (text:it's) (text:called)
(text:Sunny) (text:side) (text:of) (text:the) (text:street) (text:Maria)
(text:is) (text:like,) (text:the) (text:best) (text:'follow') (text:I've)
(text:ever) (text:seen.) (text:She's) (text:so) (text:amazing.)
(text:Thanks) (text:so) (text:much) (text:Johnathan!))~1)

This query generates 0 results. The reason is that it requires the terms
together, there, Period., and it's to be part of the document (see parsedquery
above: all other terms are optional, but those terms are mandatory).

Is there any reason for this behavior? If I use shorter queries it works
flawlessly and returns the document.

I've appended the whole query.

Best,

Nils




[Attached Solr response: the XML markup was lost in archiving. What survives is the values 0 and 11 and the query text quoted above, repeated and cut off mid-sentence.]

Re: Strange behavior while deleting

2014-03-31 Thread Jack Krupansky

So, how big is the discrepancy?

If you do a *:* query for rows=100, is the 100th result the same for both?

Do a bunch of random queries and see if you can find a document key that is 
missing from one core, but present in the other, and check if it should have 
been deleted.
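
As a rough sketch of that comparison: once the ids from each core have been pulled into lists (how you fetch them is up to you and is left out here), a set difference shows which keys exist on only one side. The function name and the sample ids are made up for illustration.

```python
def compare_cores(ids_a, ids_b):
    """Return (ids only in core A, ids only in core B), each sorted."""
    set_a, set_b = set(ids_a), set(ids_b)
    return sorted(set_a - set_b), sorted(set_b - set_a)

# Made-up ids standing in for what each core returns after the deletes.
core_a_ids = ["1", "2", "3", "5"]
core_b_ids = ["1", "2", "5", "8"]

only_a, only_b = compare_cores(core_a_ids, core_b_ids)
print(only_a)  # ['3']
print(only_b)  # ['8']
```

Any id that shows up in only one list can then be checked against the delete file to see whether it should have been removed.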


Are you deleting by "id" or by "query"?

Do you do an explicit commit on your update request? If not, it could just 
take a few minutes before the commit actually occurs.


Are the two Solr servers on the same machine or different machines? If the 
latter, is one of the machines significantly faster than the other?


-- Jack Krupansky

-Original Message- 
From: abhishek.netj...@gmail.com

Sent: Monday, March 31, 2014 5:48 AM
To: solr-user@lucene.apache.org ; solr-user@lucene.apache.org
Subject: Re: Strange behavior while deleting

Hi,
These settings are commented out in the schema. These are two different Solr servers 
with almost identical schemas, the exception being one stemmed field.


Same solr versions are running.
Please help.

Thanks
Abhishek

 Original Message
From: Jack Krupansky
Sent: Monday, 31 March 2014 14:54
To: solr-user@lucene.apache.org
Reply To: solr-user@lucene.apache.org
Subject: Re: Strange behavior while deleting

Do the two cores have identical schema and solrconfig files? Are the delete
and merge config settings identical?

Are these two cores running on the same Solr server, or two separate Solr
servers? If the latter, are they both running the same release of Solr?

How big is the discrepancy - just a few, dozens, 10%, 50%?

-- Jack Krupansky

-Original Message- 
From: abhishek jain

Sent: Monday, March 31, 2014 3:26 AM
To: solr-user@lucene.apache.org
Subject: Strange behavior while deleting

hi friends,
I have observed a strange behavior,

I have two indexes of same ids and same number of docs, and i am using a
json file to delete records from both the indexes,
after deleting the ids, the resulting indexes now show different count of
docs,

Not sure why
I used curl with the same json file to delete from both the indexes.

Please advise asap,
thanks

--
Thanks and kind Regards,
Abhishek 



Re: Strange behavior while deleting

2014-03-31 Thread abhishek . netjain
Hi,
These settings are commented out in the schema. These are two different Solr servers 
with almost identical schemas, the exception being one stemmed field.

The same Solr version is running on both.
Please help.

Thanks 
Abhishek

  Original Message  
From: Jack Krupansky
Sent: Monday, 31 March 2014 14:54
To: solr-user@lucene.apache.org
Reply To: solr-user@lucene.apache.org
Subject: Re: Strange behavior while deleting

Do the two cores have identical schema and solrconfig files? Are the delete 
and merge config settings identical?

Are these two cores running on the same Solr server, or two separate Solr 
servers? If the latter, are they both running the same release of Solr?

How big is the discrepancy - just a few, dozens, 10%, 50%?

-- Jack Krupansky

-Original Message- 
From: abhishek jain
Sent: Monday, March 31, 2014 3:26 AM
To: solr-user@lucene.apache.org
Subject: Strange behavior while deleting

hi friends,
I have observed a strange behavior,

I have two indexes of same ids and same number of docs, and i am using a
json file to delete records from both the indexes,
after deleting the ids, the resulting indexes now show different count of
docs,

Not sure why
I used curl with the same json file to delete from both the indexes.

Please advise asap,
thanks

-- 
Thanks and kind Regards,
Abhishek 



Re: Strange behavior while deleting

2014-03-31 Thread Jack Krupansky
Do the two cores have identical schema and solrconfig files? Are the delete 
and merge config settings identical?


Are these two cores running on the same Solr server, or two separate Solr 
servers? If the latter, are they both running the same release of Solr?


How big is the discrepancy - just a few, dozens, 10%, 50%?

-- Jack Krupansky

-Original Message- 
From: abhishek jain

Sent: Monday, March 31, 2014 3:26 AM
To: solr-user@lucene.apache.org
Subject: Strange behavior while deleting

hi friends,
I have observed a strange behavior,

I have two indexes of same ids and same number of docs, and i am using a
json file to delete records from both the indexes,
after deleting the ids, the resulting indexes now show different count of
docs,

Not sure why
I used curl with the same json file to delete from both the indexes.

Please advise asap,
thanks

--
Thanks and kind Regards,
Abhishek 



Strange behavior while deleting

2014-03-31 Thread abhishek jain
Hi friends,
I have observed some strange behavior.

I have two indexes with the same ids and the same number of docs, and I am
using a JSON file to delete records from both indexes.
After deleting the ids, the resulting indexes now show different doc
counts.

I'm not sure why.
I used curl with the same JSON file to delete from both indexes.

Please advise ASAP,
thanks

-- 
Thanks and kind Regards,
Abhishek


Strange behavior of gap fragmenter on highlighting

2013-11-13 Thread Ing. Jorge Luis Betancourt Gonzalez
I'm seeing strange behavior from the gap fragmenter on Solr 3.6. Right now this 
is my configuration for the gap fragmenter:

  

  [fragmenter XML markup lost in archiving; the surviving value is the fragsize: 150]

  

This is the basic configuration; I just tweaked the fragsize parameter to get 
shorter fragments. The thing is that for one particular PDF document in my 
results I get a really long snippet, way over 150 characters. It gets a little 
odder: if I change the value from 150 to 100, the snippet for the same document 
is normal, about 100 characters. The type of the field being highlighted is this:

[field type definition lost in archiving]

Any ideas about what's happening? Or how I could debug what is really going 
on?

Greetings!



Re: Strange behavior on text field with number-text content

2013-05-29 Thread Erick Erickson
Hmmm, there are two things you _must_ get familiar with when diagnosing
these ..

1> admin/analysis. That'll show you exactly what the analysis chain does,
   and it's not always obvious.
2> add &debug=query to your input and look at the parsed query results. For
   instance, this "name:4nSolution Inc." parses as
   name:4nSolution defaultfield:inc.

That doesn't explain why name=4nSolutions, except..

your index chain has splitOnCaseChange=1 and your query chain has
splitOnCaseChange=0, which doesn't seem right.
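
To make the mismatch concrete, here is a very rough approximation of the splitting step only. This is not how WordDelimiterFilterFactory is implemented, and it ignores catenation, preserveOriginal, lowercasing and stemming; it only shows that the same input breaks into different parts depending on splitOnCaseChange. The function name is made up for illustration.

```python
import re

def split_word_parts(token, split_on_case_change):
    """Split on letter/digit boundaries, optionally also on lower-to-upper case changes."""
    parts = re.findall(r"[0-9]+|[A-Za-z]+", token)
    if not split_on_case_change:
        return parts
    result = []
    for part in parts:
        # Insert a break wherever a lowercase letter is followed by an uppercase one.
        result.extend(re.sub(r"(?<=[a-z])(?=[A-Z])", " ", part).split())
    return result

print(split_word_parts("4nSolution", True))   # ['4', 'n', 'Solution']  (index-style setting)
print(split_word_parts("4nSolution", False))  # ['4', 'nSolution']      (query-style setting)
print(split_word_parts("300letters", False))  # ['300', 'letters']
```

With different parts on the index and query side, the admin/analysis page is the place to see which tokens really line up.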

Best
Erick


On Tue, May 28, 2013 at 10:31 AM, Алексей Цой  wrote:

> solr-user-unsubscribe 
>
>
> 2013/5/28 Michał Matulka 
>
>>  Thanks for your responses, I must admit that after hours of trying I
>> made some mistakes.
>> So the most problematic phrase will now be:
>> "4nSolution Inc." which cannot be found using query:
>>
>> name:4nSolution
>>
>> or even
>>
>> name:4nSolution Inc.
>>
>> but can be using following queries:
>>
>> name:nSolution
>> name:4
>> name:inc
>>
>> Sorry for the mess, it turned out I didn't reindex fields after modyfying
>> schema so I thought that the problem also applies to 300letters .
>>
>> The cause of all of this is the WordDelimiter filter defined as following:
>>
>> 
>>   
>> 
>> 
>> 
>> > ignoreCase="true"
>> words="stopwords.txt"
>> enablePositionIncrements="true"
>> />
>> > generateWordParts="1" generateNumberParts="1" catenateWords="1"
>> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
>> preserveOriginal="1"/>
>> 
>> > language="English" protected="protwords.txt"/>
>>   
>>   
>> 
>> > ignoreCase="true" expand="true"/>
>> > ignoreCase="true"
>> words="stopwords.txt"
>> enablePositionIncrements="true"
>> />
>> > generateWordParts="1" generateNumberParts="1" catenateWords="0"
>> catenateNumbers="0" catenateAll="1" splitOnCaseChange="0"
>> preserveOriginal="1" />
>> 
>> > language="English" protected="protwords.txt"/>
>>   
>> 
>>
>> and I still don't know why it behaves like that - after all there is
>> "preserveOriginal" attribute set to 1...
>>
>> W dniu 28.05.2013 14:21, Erick Erickson pisze:
>>
>> Hmmm, with 4.x I get much different behavior than you're
>> describing, what version of Solr are you using?
>>
>> Besides Alex's comments, try adding &debug=query to the url and see what 
>> comes
>> out from the query parser.
>>
>> A quick glance at the code shows that DefaultAnalyzer is used, which doesn't 
>> do
>> any analysis, here's the javadoc...
>>  /**
>>* Default analyzer for types that only produces 1 verbatim token...
>>* A maximum size of chars to be read must be specified
>>*/
>>
>> so it's much like the "string" type. Which means I'm totally perplexed by 
>> your
>> statement that 300 and letters return a hit. Have you perhaps changed the
>> field definition and not re-indexed?
>>
>> The behavior you're seeing really looks like somehow 
>> WordDelimiterFilterFactory
>> is getting into your analysis chain with settings that don't mash the parts 
>> back
>> together, i.e. you can set up WDDF to split on letter/number transitions, 
>> index
>> each and NOT index the original, but I have no explanation for how that
>> could happen with the field definition you indicated
>>
>> FWIW,
>> Erick
>>
>> On Tue, May 28, 2013 at 7:47 AM, Alexandre Rafalovitch 
>>  wrote:
>>
>>   What does analyzer screen say in the Web AdminUI when you try to do that?
>> Also, what are the tokens stored in the field (also in Web AdminUI).
>>
>> I think it is very strange to have TextField without a tokenizer chain.
>> Maybe you get a standard one assigned by default, but I don't know what the
>> standard chain would be.
>>
>> Regards,
>>
>>   Alex.
>> On 28 May 2013 04:44, "Michał Matulka"  
>>  wrote:
>>
>>
>>  Hello,
>>
>> I've got following problem. I have a text type in my schema and a field
>> "name" of that type.
>> That field contains a data, there is, for example, record that has
>> "300letters" as name.
>>
>> Now field type definition:
>> 
>>
>> And, of course, field definition:
>> 
>>
>> yes, that's all - there are no tokenizers.
>>
>> And now time for my question:
>>
>> Why following queries:
>>
>> name:300
>>
>> and
>>
>> name:letters
>>
>> are returning that result, but:
>>
>> name:300letters
>>
>> is not (0 results)?
>>
>> Best regards,
>> Michał Matulka
>>
>>
>>
>>
>> --
>>  Pozdrawiam,
>> Michał Matulka
>>  Programista
>>  michal.matu...@gowork.pl
>>
>>
>>  *[image: GoWork.pl]*
>>  ul. Zielna 39
>>  00-108 Warszawa
>>  www.GoWork.pl
>>
>
>


Re: Distributed query: strange behavior.

2013-05-28 Thread Valery Giner

Eric,

Thank you for the explanation.

My problem was that allowing docs with the same unique id to be 
present in multiple shards in a "normal" situation 
makes it impossible to estimate the number of shards needed for an index 
with a "really large" number of docs.


Thanks,
Val

On 05/26/2013 11:16 AM, Erick Erickson wrote:

Valery:

I share your puzzlement. _If_ you are letting Solr do the document
routing, and not doing any of the custom routing, then the same unique
key should be going to the same shard and replacing the previous doc
with that key.

But if you're using custom routing, or if you've been experimenting with
different configurations and didn't start over, or in general if your
configuration is in an "interesting" state, this could happen.

So in the normal case if you have a document with the same key indexed
in multiple shards, that would indicate a bug. But there are many
ways, especially when experimenting, that you could have this happen
which are _not_ a bug. I'm guessing that Luis may be trying the custom
routing option?

Best
Erick

On Fri, May 24, 2013 at 9:09 AM, Valery Giner  wrote:

Shawn,

How is it possible for more than one document with the same unique key to
appear in the index, even in different shards?
Isn't it a bug by definition?
What am I missing here?

Thanks,
Val


On 05/23/2013 09:55 AM, Shawn Heisey wrote:

On 5/23/2013 1:51 AM, Luis Cappa Banda wrote:

I've queried each Solr shard server one by one and the total number of
documents is correct. However, when I change the rows parameter from 10 to
100, the total numFound of documents changes:

I've seen this problem on the list before, and each time the cause has
turned out to be documents with the same uniqueKey value appearing in
more than one shard.

What I think happens here:

With rows=10, you get the top ten docs from each of the three shards,
and each shard sends its numFound for that query to the core that's
coordinating the search.  The coordinator adds up numFound, looks
through those thirty docs, and arranges them according to the requested
sort order, returning only the top 10.  In this case, there happen to be
no duplicates.

With rows=100, you get a total of 300 docs.  This time, duplicates are
found and removed by the coordinator.  I think that the coordinator
adjusts the total numFound by the number of duplicate documents it
removed, in an attempt to be more accurate.
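
A toy simulation of that guess, to show how numFound could depend on rows. This is only a model of the behavior described above, not Solr's actual merge code; the shard contents, ids and scores are made up.

```python
def coordinate(shards, rows):
    """shards: lists of (doc_id, score), each sorted by score descending.

    Returns (adjusted numFound, merged top `rows` docs).
    """
    num_found = sum(len(docs) for docs in shards)  # each shard reports its own count
    best_score = {}
    duplicates = 0
    for docs in shards:
        for doc_id, score in docs[:rows]:          # only the top `rows` docs are sent
            if doc_id in best_score:
                duplicates += 1                    # same uniqueKey seen from two shards
                best_score[doc_id] = max(best_score[doc_id], score)
            else:
                best_score[doc_id] = score
    merged = sorted(best_score.items(), key=lambda kv: kv[1], reverse=True)[:rows]
    return num_found - duplicates, merged

def make_shard(prefix, n):
    return [(f"{prefix}{i}", float(1000 - i)) for i in range(n)]

# "dup" lives in two shards but ranks low, so it only surfaces with a larger rows value.
shards = [
    make_shard("a", 40) + [("dup", 1.0)],
    make_shard("b", 40) + [("dup", 1.0)],
    make_shard("c", 40),
]

print(coordinate(shards, 10)[0])   # 122
print(coordinate(shards, 100)[0])  # 121
```

In this model the duplicate is invisible at rows=10 because neither copy makes it into a shard's top ten, which matches the symptom of numFound changing with rows.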

I don't know if adjusting numFound when duplicates are found in a
sharded query is the right thing to do, I'll leave that for smarter
people.  Perhaps Solr should return a message with the results saying
that duplicates were found, and if a config option is not enabled, the
server should throw an exception and return a 4xx HTTP error code.  One
idea for a config parameter name would be allowShardDuplicates, but
something better can probably be found.

Thanks,
Shawn





Re: Strange behavior on text field with number-text content

2013-05-28 Thread Алексей Цой
solr-user-unsubscribe 


2013/5/28 Michał Matulka 

>  Thanks for your responses, I must admit that after hours of trying I
> made some mistakes.
> So the most problematic phrase will now be:
> "4nSolution Inc." which cannot be found using query:
>
> name:4nSolution
>
> or even
>
> name:4nSolution Inc.
>
> but can be using following queries:
>
> name:nSolution
> name:4
> name:inc
>
> Sorry for the mess, it turned out I didn't reindex fields after modyfying
> schema so I thought that the problem also applies to 300letters .
>
> The cause of all of this is the WordDelimiter filter defined as following:
>
> 
>   
> 
> 
> 
>  ignoreCase="true"
> words="stopwords.txt"
> enablePositionIncrements="true"
> />
>  generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
> preserveOriginal="1"/>
> 
>  language="English" protected="protwords.txt"/>
>   
>   
> 
>  ignoreCase="true" expand="true"/>
>  ignoreCase="true"
> words="stopwords.txt"
> enablePositionIncrements="true"
> />
>  generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="1" splitOnCaseChange="0"
> preserveOriginal="1" />
> 
>  language="English" protected="protwords.txt"/>
>   
> 
>
> and I still don't know why it behaves like that - after all there is
> "preserveOriginal" attribute set to 1...
>
> W dniu 28.05.2013 14:21, Erick Erickson pisze:
>
> Hmmm, with 4.x I get much different behavior than you're
> describing, what version of Solr are you using?
>
> Besides Alex's comments, try adding &debug=query to the url and see what comes
> out from the query parser.
>
> A quick glance at the code shows that DefaultAnalyzer is used, which doesn't 
> do
> any analysis, here's the javadoc...
>  /**
>* Default analyzer for types that only produces 1 verbatim token...
>* A maximum size of chars to be read must be specified
>*/
>
> so it's much like the "string" type. Which means I'm totally perplexed by your
> statement that 300 and letters return a hit. Have you perhaps changed the
> field definition and not re-indexed?
>
> The behavior you're seeing really looks like somehow 
> WordDelimiterFilterFactory
> is getting into your analysis chain with settings that don't mash the parts 
> back
> together, i.e. you can set up WDDF to split on letter/number transitions, 
> index
> each and NOT index the original, but I have no explanation for how that
> could happen with the field definition you indicated
>
> FWIW,
> Erick
>
> On Tue, May 28, 2013 at 7:47 AM, Alexandre Rafalovitch 
>  wrote:
>
>   What does analyzer screen say in the Web AdminUI when you try to do that?
> Also, what are the tokens stored in the field (also in Web AdminUI).
>
> I think it is very strange to have TextField without a tokenizer chain.
> Maybe you get a standard one assigned by default, but I don't know what the
> standard chain would be.
>
> Regards,
>
>   Alex.
> On 28 May 2013 04:44, "Michał Matulka"  
>  wrote:
>
>
>  Hello,
>
> I've got following problem. I have a text type in my schema and a field
> "name" of that type.
> That field contains a data, there is, for example, record that has
> "300letters" as name.
>
> Now field type definition:
> 
>
> And, of course, field definition:
> 
>
> yes, that's all - there are no tokenizers.
>
> And now time for my question:
>
> Why following queries:
>
> name:300
>
> and
>
> name:letters
>
> are returning that result, but:
>
> name:300letters
>
> is not (0 results)?
>
> Best regards,
> Michał Matulka
>
>
>
>
> --
>  Pozdrawiam,
> Michał Matulka
>  Programista
>  michal.matu...@gowork.pl
>
>
>  *[image: GoWork.pl]*
>  ul. Zielna 39
>  00-108 Warszawa
>  www.GoWork.pl
>


Re: Strange behavior on text field with number-text content

2013-05-28 Thread Michał Matulka

  
  
Thanks for your responses. I must admit
  that after hours of trying I made some mistakes.
  The most problematic phrase is now
  "4nSolution Inc.", which cannot be found using the query:
  
  name:4nSolution
  
  or even
  
  name:4nSolution Inc.
  
  but can be found using the following queries:
  
  name:nSolution
  name:4
  name:inc
  
  Sorry for the mess. It turned out I didn't reindex the fields after
  modifying the schema, so I thought that the problem also applied to
  300letters.
  
  The cause of all of this is the WordDelimiter filter, defined as
  follows:
  
  
    
      
      
      
      
      ignoreCase="true"
      words="stopwords.txt"
      enablePositionIncrements="true"
      />
      
      
      
    
    
      
      
      
      ignoreCase="true"
      words="stopwords.txt"
      enablePositionIncrements="true"
      />
      
      
      
    
      
  
  and I still don't know why it behaves like that. After all, the
  "preserveOriginal" attribute is set to 1...
  
  On 28.05.2013 14:21, Erick Erickson wrote:


  Hmmm, with 4.x I get much different behavior than you're
describing, what version of Solr are you using?

Besides Alex's comments, try adding &debug=query to the url and see what comes
out from the query parser.

A quick glance at the code shows that DefaultAnalyzer is used, which doesn't do
any analysis, here's the javadoc...
 /**
   * Default analyzer for types that only produces 1 verbatim token...
   * A maximum size of chars to be read must be specified
   */

so it's much like the "string" type. Which means I'm totally perplexed by your
statement that 300 and letters return a hit. Have you perhaps changed the
field definition and not re-indexed?

The behavior you're seeing really looks like somehow WordDelimiterFilterFactory
is getting into your analysis chain with settings that don't mash the parts back
together, i.e. you can set up WDDF to split on letter/number transitions, index
each and NOT index the original, but I have no explanation for how that
could happen with the field definition you indicated

FWIW,
Erick

On Tue, May 28, 2013 at 7:47 AM, Alexandre Rafalovitch
 wrote:

  
 What does analyzer screen say in the Web AdminUI when you try to do that?
Also, what are the tokens stored in the field (also in Web AdminUI).

I think it is very strange to have TextField without a tokenizer chain.
Maybe you get a standard one assigned by default, but I don't know what the
standard chain would be.

Regards,

  Alex.
On 28 May 2013 04:44, "Michał Matulka"  wrote:



  Hello,

I've got following problem. I have a text type in my schema and a field
"name" of that type.
That field contains a data, there is, for example, record that has
"300letters" as name.

Now field type definition:


And, of course, field definition:


yes, that's all - there are no tokenizers.

And now time for my question:

Why following queries:

name:300

and

name:letters

are returning that result, but:

name:300letters

is not (0 results)?

Best regards,
Michał Matulka



  
  




-- 
  
 Regards,
  Michał Matulka
 Programmer
 michal.matu...@gowork.pl
  

  
 
 ul. Zielna 39
 00-108 Warszawa
 www.GoWork.pl
  

  



Re: Strange behavior on text field with number-text content

2013-05-28 Thread Erick Erickson
Hmmm, with 4.x I get much different behavior than you're
describing, what version of Solr are you using?

Besides Alex's comments, try adding &debug=query to the url and see what comes
out from the query parser.
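
As a rough illustration of that suggestion, the sketch below builds a select URL with debug=query appended. The host, port and core name are assumptions made for the example (they do not come from this thread), and debug_query_url is a hypothetical helper, not part of any Solr client.

```python
from urllib.parse import urlencode


def debug_query_url(base_url, query):
    """Build a /select URL that asks Solr to include parsed-query debug output.

    `base_url` is an assumed core URL; adjust it to your own installation.
    """
    params = {"q": query, "debug": "query", "wt": "json"}
    # urlencode percent-escapes the ':' in the field:value query.
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical local core, used only for illustration.
print(debug_query_url("http://localhost:8983/solr/collection1", "name:300letters"))
```

The debug section of the response should then show how the query parser rewrote name:300letters, which can be compared against the tokens shown on the analysis screen.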

A quick glance at the code shows that DefaultAnalyzer is used, which doesn't do
any analysis, here's the javadoc...
 /**
   * Default analyzer for types that only produces 1 verbatim token...
   * A maximum size of chars to be read must be specified
   */

so it's much like the "string" type. Which means I'm totally perplexed by your
statement that 300 and letters return a hit. Have you perhaps changed the
field definition and not re-indexed?

The behavior you're seeing really looks like somehow WordDelimiterFilterFactory
is getting into your analysis chain with settings that don't mash the parts back
together, i.e. you can set up WDDF to split on letter/number transitions, index
each and NOT index the original, but I have no explanation for how that
could happen with the field definition you indicated

FWIW,
Erick

On Tue, May 28, 2013 at 7:47 AM, Alexandre Rafalovitch
 wrote:
>  What does analyzer screen say in the Web AdminUI when you try to do that?
> Also, what are the tokens stored in the field (also in Web AdminUI).
>
> I think it is very strange to have TextField without a tokenizer chain.
> Maybe you get a standard one assigned by default, but I don't know what the
> standard chain would be.
>
> Regards,
>
>   Alex.
> On 28 May 2013 04:44, "Michał Matulka"  wrote:
>
>> Hello,
>>
>> I've got following problem. I have a text type in my schema and a field
>> "name" of that type.
>> That field contains a data, there is, for example, record that has
>> "300letters" as name.
>>
>> Now field type definition:
>> 
>>
>> And, of course, field definition:
>> 
>>
>> yes, that's all - there are no tokenizers.
>>
>> And now time for my question:
>>
>> Why following queries:
>>
>> name:300
>>
>> and
>>
>> name:letters
>>
>> are returning that result, but:
>>
>> name:300letters
>>
>> is not (0 results)?
>>
>> Best regards,
>> Michał Matulka
>>


Re: Strange behavior on text field with number-text content

2013-05-28 Thread Alexandre Rafalovitch
 What does the analysis screen in the Web Admin UI show when you try that?
Also, what tokens are stored in the field (also visible in the Web Admin UI)?

I think it is very strange to have TextField without a tokenizer chain.
Maybe you get a standard one assigned by default, but I don't know what the
standard chain would be.

Regards,

  Alex.
On 28 May 2013 04:44, "Michał Matulka"  wrote:

> Hello,
>
> I've got following problem. I have a text type in my schema and a field
> "name" of that type.
> That field contains a data, there is, for example, record that has
> "300letters" as name.
>
> Now field type definition:
> 
>
> And, of course, field definition:
> 
>
> yes, that's all - there are no tokenizers.
>
> And now time for my question:
>
> Why following queries:
>
> name:300
>
> and
>
> name:letters
>
> are returning that result, but:
>
> name:300letters
>
> is not (0 results)?
>
> Best regards,
> Michał Matulka
>


Strange behavior on text field with number-text content

2013-05-28 Thread Michał Matulka

Hello,

I've got the following problem. I have a text type in my schema and a field 
"name" of that type.
That field contains data; for example, there is a record that has 
"300letters" as its name.


Now field type definition:


And, of course, field definition:


yes, that's all - there are no tokenizers.

And now time for my question:

Why do the following queries:

name:300

and

name:letters

return that record, but:

name:300letters

does not (0 results)?

Best regards,
Michał Matulka


Re: Distributed query: strange behavior.

2013-05-27 Thread Luis Cappa Banda
Hello, guys!

Well, I've done some tests, and I think there is some kind of bug
related to distributed search. I'm now using a key field that cannot
be duplicated, and I still see the same wrong behavior in the numFound
field when changing the rows parameter. Has anyone experienced the
same?

Best regards,

- Luis Cappa


2013/5/27 Luis Cappa Banda 

> Hi, Erick!
>
> That's it! I'm using a custom implementation of a SolrServer with
> distributed behavior that routes queries and updates using an in-house
> Round Robin method. But the thing is that I'm doing this myself because
> I've noticed that duplicated documents appears using LBHttpSolrServer
> implementation. Last week I modified my implementation to avoid that with
> this changes:
>
>
>- I have normalized the key field to all documents. Now every document
>indexed must include *_id_* field that stores the selected key value.
>The value is setted with a *copyField*.
>- When I index a new document a *HttpSolrServer* from the shard list
>is selected using a Round Robin strategy. Then, a field called *_shard_
>* is setted to *SolrInputDocument*. That field value includes a
>relationship with the main shard selected.
>- If a document wants to be indexed/updated and it includes *_shard_*field 
> to update it automatically the belonged shard (
>*HttpSolrServer*) is selected.
>- If a document wants to be indexed/updated and *_shard_* field is not
>included then the key value from *_id_* is getted from *
>SolrInputDocument*. With that key a distributed search query is
>executed by it's key to retrieve *_shard_* field. With *_shard_* field
>we can now choose the correct shard (*HttpSolrServer*). It's not a
>good practice and performance isn't the best, but it's secure.
>
> Best Regards,
>
> - Luis Cappa
>
>
> 2013/5/26 Erick Erickson 
>
>> Valery:
>>
>> I share your puzzlement. _If_ you are letting Solr do the document
>> routing, and not doing any of the custom routing, then the same unique
>> key should be going to the same shard and replacing the previous doc
>> with that key.
>>
>> But, if you're using custom routing, if you've been experimenting with
>> different configurations and didn't start over, in general if you're
>> configuration is in an "interesting" state this could happen.
>>
>> So in the normal case if you have a document with the same key indexed
>> in multiple shards, that would indicate a bug. But there are many
>> ways, especially when experimenting, that you could have this happen
>> which are _not_ a bug. I'm guessing that Luis may be trying the custom
>> routing option maybe?
>>
>> Best
>> Erick
>>
>> On Fri, May 24, 2013 at 9:09 AM, Valery Giner 
>> wrote:
>> > Shawn,
>> >
>> > How is it possible for more than one document with the same unique key
>> to
>> > appear in the index, even in different shards?
>> > Isn't it a bug by definition?
>> > What am I missing here?
>> >
>> > Thanks,
>> > Val
>> >
>> >
>> > On 05/23/2013 09:55 AM, Shawn Heisey wrote:
>> >>
>> >> On 5/23/2013 1:51 AM, Luis Cappa Banda wrote:
>> >>>
>> >>> I've query each Solr shard server one by one and the total number of
>> >>> documents is correct. However, when I change rows parameter from 10 to
>> >>> 100
>> >>> the total numFound of documents change:
>> >>
>> >> I've seen this problem on the list before and the cause has been
>> >> determined each time to be caused by documents with the same uniqueKey
>> >> value appearing in more than one shard.
>> >>
>> >> What I think happens here:
>> >>
>> >> With rows=10, you get the top ten docs from each of the three shards,
>> >> and each shard sends its numFound for that query to the core that's
>> >> coordinating the search.  The coordinator adds up numFound, looks
>> >> through those thirty docs, and arranges them according to the requested
>> >> sort order, returning only the top 10.  In this case, there happen to
>> be
>> >> no duplicates.
>> >>
>> >> With rows=100, you get a total of 300 docs.  This time, duplicates are
>> >> found and removed by the coordinator.  I think that the coordinator
>> >> adjusts the total numFound by the number of duplicate documents it
>> >> removed, in an attempt to be more accurate.
>> >>
>> >> I don't know if adjusting numFound when duplicates are found in a
>> >> sharded query is the right thing to do, I'll leave that for smarter
>> >> people.  Perhaps Solr should return a message with the results saying
>> >> that duplicates were found, and if a config option is not enabled, the
>> >> server should throw an exception and return a 4xx HTTP error code.  One
>> >> idea for a config parameter name would be allowShardDuplicates, but
>> >> something better can probably be found.
>> >>
>> >> Thanks,
>> >> Shawn
>> >>
>> >
>>
>
>
>
> --
> - Luis Cappa
>



-- 
- Luis Cappa


Re: Distributed query: strange behavior.

2013-05-26 Thread Luis Cappa Banda
Hi, Erick!

That's it! I'm using a custom implementation of a SolrServer with
distributed behavior that routes queries and updates using an in-house
Round Robin method. I'm doing this myself because I noticed that
duplicate documents appear when using the LBHttpSolrServer
implementation. Last week I modified my implementation to avoid that with
these changes:


   - I have normalized the key field for all documents. Every indexed
   document must now include an *_id_* field that stores the selected key
   value. The value is set with a *copyField*.
   - When I index a new document, an *HttpSolrServer* from the shard list is
   selected using a Round Robin strategy. Then a field called *_shard_* is
   set on the *SolrInputDocument*. That field's value identifies the shard
   that was selected.
   - If a document to be indexed/updated includes the *_shard_* field, the
   shard it belongs to (*HttpSolrServer*) is selected automatically.
   - If a document to be indexed/updated does not include the *_shard_*
   field, the key value is read from *_id_* in the *SolrInputDocument*.
   A distributed search by that key is then executed to retrieve the
   *_shard_* field, which lets us choose the correct shard
   (*HttpSolrServer*). It's not good practice and performance isn't the
   best, but it's safe.
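
For comparison only, here is a minimal sketch of why key-based routing avoids the extra lookup described above: a stable hash of the key always picks the same shard, while round robin ignores the key. This is not the implementation from this message and not Solr's own hash function; the shard names and the helpers hash_route and make_round_robin are hypothetical.

```python
import hashlib
from itertools import cycle

SHARDS = ["shard1", "shard2", "shard3"]  # hypothetical shard names


def hash_route(doc_id, shards=SHARDS):
    """Pick a shard from a stable hash of the document key."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    # The first four bytes are enough to spread keys across shards.
    return shards[int.from_bytes(digest[:4], "big") % len(shards)]


def make_round_robin(shards=SHARDS):
    """Return a router that ignores the key and simply rotates shards."""
    rotation = cycle(shards)
    return lambda doc_id: next(rotation)


round_robin = make_round_robin()
key = "http://example.com/page"

# Re-indexing the same key three times with round robin hits three shards.
print([round_robin(key) for _ in range(3)])

# With hash routing the same key always lands on one shard.
print(len({hash_route(key) for _ in range(3)}))
```

With a deterministic route, re-indexing an existing key overwrites the earlier copy on the same shard, so no *_shard_* field or pre-query is needed to find it.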

Best Regards,

- Luis Cappa


2013/5/26 Erick Erickson 

> Valery:
>
> I share your puzzlement. _If_ you are letting Solr do the document
> routing, and not doing any of the custom routing, then the same unique
> key should be going to the same shard and replacing the previous doc
> with that key.
>
> But, if you're using custom routing, if you've been experimenting with
> different configurations and didn't start over, in general if you're
> configuration is in an "interesting" state this could happen.
>
> So in the normal case if you have a document with the same key indexed
> in multiple shards, that would indicate a bug. But there are many
> ways, especially when experimenting, that you could have this happen
> which are _not_ a bug. I'm guessing that Luis may be trying the custom
> routing option maybe?
>
> Best
> Erick
>
> On Fri, May 24, 2013 at 9:09 AM, Valery Giner 
> wrote:
> > Shawn,
> >
> > How is it possible for more than one document with the same unique key to
> > appear in the index, even in different shards?
> > Isn't it a bug by definition?
> > What am I missing here?
> >
> > Thanks,
> > Val
> >
> >
> > On 05/23/2013 09:55 AM, Shawn Heisey wrote:
> >>
> >> On 5/23/2013 1:51 AM, Luis Cappa Banda wrote:
> >>>
> >>> I've query each Solr shard server one by one and the total number of
> >>> documents is correct. However, when I change rows parameter from 10 to
> >>> 100
> >>> the total numFound of documents change:
> >>
> >> I've seen this problem on the list before and the cause has been
> >> determined each time to be caused by documents with the same uniqueKey
> >> value appearing in more than one shard.
> >>
> >> What I think happens here:
> >>
> >> With rows=10, you get the top ten docs from each of the three shards,
> >> and each shard sends its numFound for that query to the core that's
> >> coordinating the search.  The coordinator adds up numFound, looks
> >> through those thirty docs, and arranges them according to the requested
> >> sort order, returning only the top 10.  In this case, there happen to be
> >> no duplicates.
> >>
> >> With rows=100, you get a total of 300 docs.  This time, duplicates are
> >> found and removed by the coordinator.  I think that the coordinator
> >> adjusts the total numFound by the number of duplicate documents it
> >> removed, in an attempt to be more accurate.
> >>
> >> I don't know if adjusting numFound when duplicates are found in a
> >> sharded query is the right thing to do, I'll leave that for smarter
> >> people.  Perhaps Solr should return a message with the results saying
> >> that duplicates were found, and if a config option is not enabled, the
> >> server should throw an exception and return a 4xx HTTP error code.  One
> >> idea for a config parameter name would be allowShardDuplicates, but
> >> something better can probably be found.
> >>
> >> Thanks,
> >> Shawn
> >>
> >
>



-- 
- Luis Cappa


Re: Distributed query: strange behavior.

2013-05-26 Thread Erick Erickson
Valery:

I share your puzzlement. _If_ you are letting Solr do the document
routing, and not doing any of the custom routing, then the same unique
key should be going to the same shard and replacing the previous doc
with that key.

But if you're using custom routing, or if you've been experimenting with
different configurations and didn't start over, or in general if your
configuration is in an "interesting" state, this could happen.

So in the normal case if you have a document with the same key indexed
in multiple shards, that would indicate a bug. But there are many
ways, especially when experimenting, that you could have this happen
which are _not_ a bug. I'm guessing that Luis may be trying the custom
routing option maybe?

Best
Erick

On Fri, May 24, 2013 at 9:09 AM, Valery Giner  wrote:
> Shawn,
>
> How is it possible for more than one document with the same unique key to
> appear in the index, even in different shards?
> Isn't it a bug by definition?
> What am I missing here?
>
> Thanks,
> Val
>
>
> On 05/23/2013 09:55 AM, Shawn Heisey wrote:
>>
>> On 5/23/2013 1:51 AM, Luis Cappa Banda wrote:
>>>
>>> I've query each Solr shard server one by one and the total number of
>>> documents is correct. However, when I change rows parameter from 10 to
>>> 100
>>> the total numFound of documents change:
>>
>> I've seen this problem on the list before and the cause has been
>> determined each time to be caused by documents with the same uniqueKey
>> value appearing in more than one shard.
>>
>> What I think happens here:
>>
>> With rows=10, you get the top ten docs from each of the three shards,
>> and each shard sends its numFound for that query to the core that's
>> coordinating the search.  The coordinator adds up numFound, looks
>> through those thirty docs, and arranges them according to the requested
>> sort order, returning only the top 10.  In this case, there happen to be
>> no duplicates.
>>
>> With rows=100, you get a total of 300 docs.  This time, duplicates are
>> found and removed by the coordinator.  I think that the coordinator
>> adjusts the total numFound by the number of duplicate documents it
>> removed, in an attempt to be more accurate.
>>
>> I don't know if adjusting numFound when duplicates are found in a
>> sharded query is the right thing to do, I'll leave that for smarter
>> people.  Perhaps Solr should return a message with the results saying
>> that duplicates were found, and if a config option is not enabled, the
>> server should throw an exception and return a 4xx HTTP error code.  One
>> idea for a config parameter name would be allowShardDuplicates, but
>> something better can probably be found.
>>
>> Thanks,
>> Shawn
>>
>


Re: Distributed query: strange behavior.

2013-05-24 Thread Shalin Shekhar Mangar
The uniqueKey is enforced within the same shard/index only.


On Fri, May 24, 2013 at 6:39 PM, Valery Giner wrote:

> Shawn,
>
> How is it possible for more than one document with the same unique key to
> appear in the index, even in different shards?
> Isn't it a bug by definition?
> What am I missing here?
>
> Thanks,
> Val
>
>
> On 05/23/2013 09:55 AM, Shawn Heisey wrote:
>
>> On 5/23/2013 1:51 AM, Luis Cappa Banda wrote:
>>
>>> I've query each Solr shard server one by one and the total number of
>>> documents is correct. However, when I change rows parameter from 10 to
>>> 100
>>> the total numFound of documents change:
>>>
>> I've seen this problem on the list before and the cause has been
>> determined each time to be caused by documents with the same uniqueKey
>> value appearing in more than one shard.
>>
>> What I think happens here:
>>
>> With rows=10, you get the top ten docs from each of the three shards,
>> and each shard sends its numFound for that query to the core that's
>> coordinating the search.  The coordinator adds up numFound, looks
>> through those thirty docs, and arranges them according to the requested
>> sort order, returning only the top 10.  In this case, there happen to be
>> no duplicates.
>>
>> With rows=100, you get a total of 300 docs.  This time, duplicates are
>> found and removed by the coordinator.  I think that the coordinator
>> adjusts the total numFound by the number of duplicate documents it
>> removed, in an attempt to be more accurate.
>>
>> I don't know if adjusting numFound when duplicates are found in a
>> sharded query is the right thing to do, I'll leave that for smarter
>> people.  Perhaps Solr should return a message with the results saying
>> that duplicates were found, and if a config option is not enabled, the
>> server should throw an exception and return a 4xx HTTP error code.  One
>> idea for a config parameter name would be allowShardDuplicates, but
>> something better can probably be found.
>>
>> Thanks,
>> Shawn
>>
>>
>


-- 
Regards,
Shalin Shekhar Mangar.


Re: Distributed query: strange behavior.

2013-05-24 Thread Valery Giner

Shawn,

How is it possible for more than one document with the same unique key 
to appear in the index, even in different shards?

Isn't it a bug by definition?
What am I missing here?

Thanks,
Val

On 05/23/2013 09:55 AM, Shawn Heisey wrote:

On 5/23/2013 1:51 AM, Luis Cappa Banda wrote:

I've query each Solr shard server one by one and the total number of
documents is correct. However, when I change rows parameter from 10 to 100
the total numFound of documents change:

I've seen this problem on the list before and the cause has been
determined each time to be caused by documents with the same uniqueKey
value appearing in more than one shard.

What I think happens here:

With rows=10, you get the top ten docs from each of the three shards,
and each shard sends its numFound for that query to the core that's
coordinating the search.  The coordinator adds up numFound, looks
through those thirty docs, and arranges them according to the requested
sort order, returning only the top 10.  In this case, there happen to be
no duplicates.

With rows=100, you get a total of 300 docs.  This time, duplicates are
found and removed by the coordinator.  I think that the coordinator
adjusts the total numFound by the number of duplicate documents it
removed, in an attempt to be more accurate.

I don't know if adjusting numFound when duplicates are found in a
sharded query is the right thing to do, I'll leave that for smarter
people.  Perhaps Solr should return a message with the results saying
that duplicates were found, and if a config option is not enabled, the
server should throw an exception and return a 4xx HTTP error code.  One
idea for a config parameter name would be allowShardDuplicates, but
something better can probably be found.

Thanks,
Shawn





Re: Distributed query: strange behavior.

2013-05-24 Thread Luis Cappa Banda
Uhm... that sounds reasonable. My data model may allow duplicate keys,
though it's quite difficult. My key is a hash built from a URL during a
crawling process, and it's possible to re-crawl an existing URL. I think
I need to find a new way to compose a unique key to avoid this kind of
bad behavior. However, it would be very useful if Solr could alert about
duplicate keys or something. Maybe an extra field in the response,
alongside numFound, docs, facets, etc., would be nice. Thank you
very much!
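
Until something like that exists, a client-side check is one option. The sketch below assumes the uniqueKey values of each shard have already been fetched by some other means (for example by querying each shard separately) and only shows the comparison step. The function name and the shard labels are hypothetical.

```python
from collections import defaultdict


def find_cross_shard_duplicates(ids_by_shard):
    """Return {doc_id: [shards]} for every key present in more than one shard.

    `ids_by_shard` maps a shard label to an iterable of uniqueKey values.
    """
    shards_for_id = defaultdict(set)
    for shard, ids in ids_by_shard.items():
        for doc_id in ids:
            shards_for_id[doc_id].add(shard)
    return {
        doc_id: sorted(shards)
        for doc_id, shards in shards_for_id.items()
        if len(shards) > 1
    }


# Made-up keys, only to show the shape of the result.
sample = {
    "shard1": ["k1", "k2", "k3"],
    "shard2": ["k4", "k2"],
    "shard3": ["k5"],
}
print(find_cross_shard_duplicates(sample))
```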

Best regards,

- Luis Cappa


2013/5/23 Shawn Heisey 

> On 5/23/2013 1:51 AM, Luis Cappa Banda wrote:
> > I've query each Solr shard server one by one and the total number of
> > documents is correct. However, when I change rows parameter from 10 to
> 100
> > the total numFound of documents change:
>
> I've seen this problem on the list before and the cause has been
> determined each time to be caused by documents with the same uniqueKey
> value appearing in more than one shard.
>
> What I think happens here:
>
> With rows=10, you get the top ten docs from each of the three shards,
> and each shard sends its numFound for that query to the core that's
> coordinating the search.  The coordinator adds up numFound, looks
> through those thirty docs, and arranges them according to the requested
> sort order, returning only the top 10.  In this case, there happen to be
> no duplicates.
>
> With rows=100, you get a total of 300 docs.  This time, duplicates are
> found and removed by the coordinator.  I think that the coordinator
> adjusts the total numFound by the number of duplicate documents it
> removed, in an attempt to be more accurate.
>
> I don't know if adjusting numFound when duplicates are found in a
> sharded query is the right thing to do, I'll leave that for smarter
> people.  Perhaps Solr should return a message with the results saying
> that duplicates were found, and if a config option is not enabled, the
> server should throw an exception and return a 4xx HTTP error code.  One
> idea for a config parameter name would be allowShardDuplicates, but
> something better can probably be found.
>
> Thanks,
> Shawn
>
>


-- 
- Luis Cappa


Re: Distributed query: strange behavior.

2013-05-23 Thread Shawn Heisey
On 5/23/2013 1:51 AM, Luis Cappa Banda wrote:
> I've query each Solr shard server one by one and the total number of
> documents is correct. However, when I change rows parameter from 10 to 100
> the total numFound of documents change:

I've seen this problem on the list before, and each time the cause has
been determined to be documents with the same uniqueKey value appearing
in more than one shard.

What I think happens here:

With rows=10, you get the top ten docs from each of the three shards,
and each shard sends its numFound for that query to the core that's
coordinating the search.  The coordinator adds up numFound, looks
through those thirty docs, and arranges them according to the requested
sort order, returning only the top 10.  In this case, there happen to be
no duplicates.

With rows=100, you get a total of 300 docs.  This time, duplicates are
found and removed by the coordinator.  I think that the coordinator
adjusts the total numFound by the number of duplicate documents it
removed, in an attempt to be more accurate.
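
The behavior guessed at above can be mimicked with a small simulation. This is only a sketch of that hypothesis, not Solr's actual merge code; merge_shard_results and the sample documents are made up for illustration.

```python
from itertools import chain


def merge_shard_results(shards, rows):
    """Simulate a coordinator merging the top `rows` docs from each shard.

    `shards` is a list of shards; each shard is a list of (doc_id, score)
    tuples already sorted by score, highest first.
    """
    # Every shard reports its full hit count, independent of `rows`.
    num_found = sum(len(shard) for shard in shards)

    seen = set()
    merged = []
    for doc_id, score in chain.from_iterable(shard[:rows] for shard in shards):
        if doc_id in seen:
            # Duplicate key inside the fetched window: drop it and lower
            # the total, as hypothesised in the paragraph above.
            num_found -= 1
            continue
        seen.add(doc_id)
        merged.append((doc_id, score))

    merged.sort(key=lambda doc: doc[1], reverse=True)
    return num_found, merged[:rows]


shards = [
    [("a", 9.0), ("b", 5.0), ("dup", 1.0)],
    [("c", 8.0), ("d", 4.0), ("dup", 1.0)],
    [("e", 7.0), ("f", 3.0)],
]
print(merge_shard_results(shards, rows=2)[0])  # 8
print(merge_shard_results(shards, rows=3)[0])  # 7
```

The duplicate only affects the total once rows is large enough for both copies to be fetched, which matches numFound changing with rows.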

I don't know if adjusting numFound when duplicates are found in a
sharded query is the right thing to do, I'll leave that for smarter
people.  Perhaps Solr should return a message with the results saying
that duplicates were found, and if a config option is not enabled, the
server should throw an exception and return a 4xx HTTP error code.  One
idea for a config parameter name would be allowShardDuplicates, but
something better can probably be found.

Thanks,
Shawn



Distributed query: strange behavior.

2013-05-23 Thread Luis Cappa Banda
Hello, guys!

I'm running Solr 4.3.0 and I've noticed a strange behavior during
distributed query execution. Currently I have three Solr servers as
shards, and when I run the following query...


http://localhost:11080/twitter/data/select?&q=*:*&rows=10&shards=localhost:11080/twitter/data,localhost:12080/twitter/data,localhost:13080/twitter/data&wt=json

*Numfound* = 47131


I've queried each Solr shard server one by one and the total number of
documents is correct. However, when I change the rows parameter from 10
to 100, the total numFound changes:

http://localhost:11080/twitter/data/select?&q=*:*&rows=100&shards=localhost:11080/twitter/data,localhost:12080/twitter/data,localhost:13080/twitter/data&wt=json

*Numfound* = 47124

And if I set rows=50, the numFound count changes again:

http://localhost:11080/twitter/data/select?&q=*:*&rows=50&shards=localhost:11080/twitter/data,localhost:12080/twitter/data,localhost:13080/twitter/data&wt=json

*Numfound* = 47129


What's happening here? Does anybody know? Is it a distributed search bug
or something?

Thank you very much in advance!


Best regards,

-- 
- Luis Cappa


Re: SolrCloud: Very strange behavior when doing atomic updates or documents reindexation.

2012-11-25 Thread Luis Cappa Banda
Yes! I opened that issue, :-P Next week I'll test with the latest trunk
artifacts and check if the problem still happens.

Regards,

- Luis Cappa.
On 25/11/2012 13:35, "joe.cohe...@gmail.com" 
wrote:

>
> I'm having a smiliar problem.
>
> Did you by any chance try the suggestion here:
>
> https://issues.apache.org/jira/browse/SOLR-4080?focusedCommentId=13498055&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13498055
>
> ?
>
>
>
> Rakudten wrote
> > More info:
> >
> > -  I´m trying to update the document re-indexing the whole document
> again.
> > I first retrieve the document querying by it´s id, then delete it by it´s
> > id, and re-index including the new changes.
> > - At the same time there are other index writing operations.
> >
> > *RESULT*: in most cases the document wasn´t updated. Bad news... it
> smells
> > like a critical bug.
> >
> > Regards,
> >
> >
> > - Luis Cappa.
> >
> > 2012/11/22 Luis Cappa Banda <
>
> > luiscappa@
>
> > >
> >
> >> For more details, my indexation App is:
> >>
> >> 1. Multithreaded.
> >> 2. NRT indexation.
> >> 3. It´s a Web App with a REST API. It receives asynchronous requests
> that
> >> produces those atomic updates / document reindexations I told before.
> >>
> >> I´m pretty sure that the wrong behavior is related with CloudSolrServer
> >> and with the fact that maybe you are trying to modify the index while an
> >> index update is in course.
> >>
> >> Regards,
> >>
> >>
> >> - Luis Cappa.
> >>
> >>
> >> 2012/11/22 Luis Cappa Banda <
>
> > luiscappa@
>
> > >
> >>
> >>> Hello!
> >>>
> >>> I´m using a simple test configuration with nShards=1 without any
> >>> replica.
> >>> SolrCloudServer is suposed to forward properly those index/update
> >>> operations, isn´t it? I test with a complete document reindexation, not
> >>> atomic updates, using the official LBHttpSolrServer, not my custom
> >>> BinaryLBHttpSolrServer, and it dosn´t work. I think is not just a bug
> >>> related with atomic updates via CloudSolrServer but a general bug when
> >>> an
> >>> index changes with reindexations/updates frequently.
> >>>
> >>> Regards,
> >>>
> >>> - Luis Cappa.
> >>>
> >>>
> >>> 2012/11/22 Sami Siren <
>
> > ssiren@
>
> > >
> >>>
> >>>> It might even depend on the cluster layout! Let's say you have 2
> shards
> >>>> (no
> >>>> replicas) if the doc belongs to the node you send it to so that it
> does
> >>>> not
> >>>> get forwarded to another node then the update should work and in case
> >>>> where
> >>>> the doc gets forwarded to another node the problem occurs. With
> >>>> replicas
> >>>> it
> >>>> could appear even more strange: the leader might have the doc right
> and
> >>>> the
> >>>> replica not.
> >>>>
> >>>> I only briefly looked at the bits that deal with this so perhaps
> >>>> there's
> >>>> something more involved.
> >>>>
> >>>>
> >>>> On Thu, Nov 22, 2012 at 8:29 PM, Luis Cappa Banda <
>
> > luiscappa@
>
> > >>> >wrote:
> >>>>
> >>>> > Hi, Sami!
> >>>> >
> >>>> > But isn´t strange that some documents were updated (atomic updates)
> >>>> > correctly and other ones not? Can´t it be a more serious problem
> like
> >>>> some
> >>>> > kind of index writer lock, or whatever?
> >>>> >
> >>>> > Regards,
> >>>> >
> >>>> > - Luis Cappa.
> >>>> >
> >>>> > 2012/11/22 Sami Siren <
>
> > ssiren@
>
> > >
> >>>> >
> >>>> > > I think the problem is that even though you were able to work
> >>>> around
> >>>> the
> >>>> > > bug in the client solr still uses the xml format internally so the
> >>>> atomic
> >>>> > > update (with multivalued field) fails later down the stack. The
> bug
> >>

Re: SolrCloud: Very strange behavior when doing atomic updates or documents reindexation.

2012-11-25 Thread joe.cohe...@gmail.com

I'm having a similar problem.

Did you by any chance try the suggestion here:
https://issues.apache.org/jira/browse/SOLR-4080?focusedCommentId=13498055&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13498055

?



Rakudten wrote
> More info:
> 
> -  I´m trying to update the document re-indexing the whole document again.
> I first retrieve the document querying by it´s id, then delete it by it´s
> id, and re-index including the new changes.
> - At the same time there are other index writing operations.
> 
> *RESULT*: in most cases the document wasn´t updated. Bad news... it smells
> like a critical bug.
> 
> Regards,
> 
> 
> - Luis Cappa.
> 
> 2012/11/22 Luis Cappa Banda <

> luiscappa@

> >
> 
>> For more details, my indexation App is:
>>
>> 1. Multithreaded.
>> 2. NRT indexation.
>> 3. It´s a Web App with a REST API. It receives asynchronous requests that
>> produces those atomic updates / document reindexations I told before.
>>
>> I´m pretty sure that the wrong behavior is related with CloudSolrServer
>> and with the fact that maybe you are trying to modify the index while an
>> index update is in course.
>>
>> Regards,
>>
>>
>> - Luis Cappa.
>>
>>
>> 2012/11/22 Luis Cappa Banda <

> luiscappa@

> >
>>
>>> Hello!
>>>
>>> I´m using a simple test configuration with nShards=1 without any
>>> replica.
>>> SolrCloudServer is suposed to forward properly those index/update
>>> operations, isn´t it? I test with a complete document reindexation, not
>>> atomic updates, using the official LBHttpSolrServer, not my custom
>>> BinaryLBHttpSolrServer, and it dosn´t work. I think is not just a bug
>>> related with atomic updates via CloudSolrServer but a general bug when
>>> an
>>> index changes with reindexations/updates frequently.
>>>
>>> Regards,
>>>
>>> - Luis Cappa.
>>>
>>>
>>> 2012/11/22 Sami Siren <

> ssiren@

> >
>>>
>>>> It might even depend on the cluster layout! Let's say you have 2 shards
>>>> (no
>>>> replicas) if the doc belongs to the node you send it to so that it does
>>>> not
>>>> get forwarded to another node then the update should work and in case
>>>> where
>>>> the doc gets forwarded to another node the problem occurs. With
>>>> replicas
>>>> it
>>>> could appear even more strange: the leader might have the doc right and
>>>> the
>>>> replica not.
>>>>
>>>> I only briefly looked at the bits that deal with this so perhaps
>>>> there's
>>>> something more involved.
>>>>
>>>>
>>>> On Thu, Nov 22, 2012 at 8:29 PM, Luis Cappa Banda <

> luiscappa@

> >>> >wrote:
>>>>
>>>> > Hi, Sami!
>>>> >
>>>> > But isn´t strange that some documents were updated (atomic updates)
>>>> > correctly and other ones not? Can´t it be a more serious problem like
>>>> some
>>>> > kind of index writer lock, or whatever?
>>>> >
>>>> > Regards,
>>>> >
>>>> > - Luis Cappa.
>>>> >
>>>> > 2012/11/22 Sami Siren <

> ssiren@

> >
>>>> >
>>>> > > I think the problem is that even though you were able to work
>>>> around
>>>> the
>>>> > > bug in the client solr still uses the xml format internally so the
>>>> atomic
>>>> > > update (with multivalued field) fails later down the stack. The bug
>>>> you
>>>> > > filed needs to be fixed to get the problem solved.
>>>> > >
>>>> > >
>>>> > > On Thu, Nov 22, 2012 at 8:19 PM, Luis Cappa Banda <
>>>> 

> luiscappa@

>>>> > > >wrote:
>>>> > >
>>>> > > > Hello everyone.
>>>> > > >
>>>> > > > I´ve starting to seriously worry about with SolrCloud due an
>>>> strange
>>>> > > > behavior that I have detected. The situation is this the
>>>> following:
>>>> > > >
>>>> > > > *1.* SolrCloud with one shard and two Solr instances.
>>>> > > > *2.* Indexation via SolrJ with CloudServer and a custom
>

Re: SolrCloud: Very strange behavior when doing atomic updates or documents reindexation.

2012-11-22 Thread Luis Cappa Banda
More info:

- I'm trying to update the document by re-indexing the whole document again.
I first retrieve the document by querying by its id, then delete it by its
id, and re-index it including the new changes.
- At the same time there are other index writing operations.

*RESULT*: in most cases the document wasn't updated. Bad news... it smells
like a critical bug.
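
For what it's worth, a read / delete / re-index cycle can lose changes on its own when two writers overlap, independently of any Solr bug. The sketch below is only an illustration of that general race, using a plain dict as a stand-in for the index; the helper names and the document fields are hypothetical, and nothing here claims to be the cause of the behavior described above.

```python
# A dict stands in for the index; this is not Solr code.
index = {"doc1": {"id": "doc1", "title": "old", "tags": []}}


def read(doc_id):
    # Return a snapshot, as a client would get from a query.
    return dict(index[doc_id])


def delete_and_reindex(doc):
    # Delete by id, then add the full document again.
    index.pop(doc["id"], None)
    index[doc["id"]] = doc


def run_interleaved():
    # Both writers read before either of them writes.
    snapshot_a = read("doc1")
    snapshot_b = read("doc1")

    snapshot_a["title"] = "new title"
    delete_and_reindex(snapshot_a)

    # Writer B still holds the old title and overwrites writer A's change.
    snapshot_b["tags"] = ["x"]
    delete_and_reindex(snapshot_b)
    return index["doc1"]


final_doc = run_interleaved()
print(final_doc)
```

In this interleaving the final document carries writer B's tags but the old title, so writer A's update looks as if it never happened.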

Regards,


- Luis Cappa.

2012/11/22 Luis Cappa Banda 

> For more details, my indexation App is:
>
> 1. Multithreaded.
> 2. NRT indexation.
> 3. It´s a Web App with a REST API. It receives asynchronous requests that
> produces those atomic updates / document reindexations I told before.
>
> I´m pretty sure that the wrong behavior is related with CloudSolrServer
> and with the fact that maybe you are trying to modify the index while an
> index update is in course.
>
> Regards,
>
>
> - Luis Cappa.
>
>
> 2012/11/22 Luis Cappa Banda 
>
>> Hello!
>>
>> I´m using a simple test configuration with nShards=1 without any replica.
>> SolrCloudServer is suposed to forward properly those index/update
>> operations, isn´t it? I test with a complete document reindexation, not
>> atomic updates, using the official LBHttpSolrServer, not my custom
>> BinaryLBHttpSolrServer, and it dosn´t work. I think is not just a bug
>> related with atomic updates via CloudSolrServer but a general bug when an
>> index changes with reindexations/updates frequently.
>>
>> Regards,
>>
>> - Luis Cappa.
>>
>>
>> 2012/11/22 Sami Siren 
>>
>>> It might even depend on the cluster layout! Let's say you have 2 shards
>>> (no
>>> replicas) if the doc belongs to the node you send it to so that it does
>>> not
>>> get forwarded to another node then the update should work and in case
>>> where
>>> the doc gets forwarded to another node the problem occurs. With replicas
>>> it
>>> could appear even more strange: the leader might have the doc right and
>>> the
>>> replica not.
>>>
>>> I only briefly looked at the bits that deal with this so perhaps there's
>>> something more involved.
>>>
>>>
>>> On Thu, Nov 22, 2012 at 8:29 PM, Luis Cappa Banda >> >wrote:
>>>
>>> > Hi, Sami!
>>> >
>>> > But isn´t strange that some documents were updated (atomic updates)
>>> > correctly and other ones not? Can´t it be a more serious problem like
>>> some
>>> > kind of index writer lock, or whatever?
>>> >
>>> > Regards,
>>> >
>>> > - Luis Cappa.
>>> >
>>> > 2012/11/22 Sami Siren 
>>> >
>>> > > I think the problem is that even though you were able to work around
>>> the
>>> > > bug in the client solr still uses the xml format internally so the
>>> atomic
>>> > > update (with multivalued field) fails later down the stack. The bug
>>> you
>>> > > filed needs to be fixed to get the problem solved.
>>> > >
>>> > >
>>> > > On Thu, Nov 22, 2012 at 8:19 PM, Luis Cappa Banda <
>>> luisca...@gmail.com
>>> > > >wrote:
>>> > >
>>> > > > Hello everyone.
>>> > > >
>>> > > > I´ve starting to seriously worry about with SolrCloud due an
>>> strange
>>> > > > behavior that I have detected. The situation is this the following:
>>> > > >
>>> > > > *1.* SolrCloud with one shard and two Solr instances.
>>> > > > *2.* Indexation via SolrJ with CloudServer and a custom
>>> > > > BinaryLBHttpSolrServer that uses BinaryRequestWriter to execute
>>> > correctly
>>> > > > atomic updates. Check
>>> > > > JIRA-4080<
>>> > > >
>>> > >
>>> >
>>> https://issues.apache.org/jira/browse/SOLR-4080?focusedCommentId=13498055&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13498055
>>> > > > >
>>> > > > *3.* An asynchronous proccess updates partially some document
>>> fields.
>>> > > After
>>> > > > that operation I automatically execute a commit, so the index must
>>> be
>>> > > > reloaded.
>>> > > >
>>> > > > What I have checked is that both using atomic updates or complete
>>> > > document
>>> > > > reindexations* aleatory documents are not updated* *even if I saw
>>> > > debugging
>>> > > > how the add() and commit() operations were executed correctly* *and
>>> > > without
>>> > > > errors*. Has anyone experienced a similar behavior? Is it posible
>>> that
>>> > if
>>> > > > an index update operation didn´t finish and CloudSolrServer
>>> receives a
>>> > > new
>>> > > > one this second update operation doesn´t complete?
>>> > > >
>>> > > > Thank you in advance.
>>> > > >
>>> > > > Regards,
>>> > > >
>>> > > > --
>>> > > >
>>> > > > - Luis Cappa
>>> > > >
>>> > >
>>> >
>>> >
>>> >
>>> > --
>>> >
>>> > - Luis Cappa
>>> >
>>>
>>
>>
>>
>> --
>>
>> - Luis Cappa
>>
>>
>
>
> --
>
> - Luis Cappa
>
>


-- 

- Luis Cappa


Re: SolrCloud: Very strange behavior when doing atomic updates or documents reindexation.

2012-11-22 Thread Luis Cappa Banda
For more details, my indexing app is:

1. Multithreaded.
2. NRT indexing.
3. A web app with a REST API. It receives asynchronous requests that
produce the atomic updates / document reindexing I mentioned before.

I'm pretty sure that the wrong behavior is related to CloudSolrServer and
to the fact that you may be trying to modify the index while another index
update is in progress.

Regards,


- Luis Cappa.


2012/11/22 Luis Cappa Banda 

> Hello!
>
> I´m using a simple test configuration with nShards=1 without any replica.
> SolrCloudServer is suposed to forward properly those index/update
> operations, isn´t it? I test with a complete document reindexation, not
> atomic updates, using the official LBHttpSolrServer, not my custom
> BinaryLBHttpSolrServer, and it dosn´t work. I think is not just a bug
> related with atomic updates via CloudSolrServer but a general bug when an
> index changes with reindexations/updates frequently.
>
> Regards,
>
> - Luis Cappa.
>
>
> 2012/11/22 Sami Siren 
>
>> It might even depend on the cluster layout! Let's say you have 2 shards
>> (no
>> replicas) if the doc belongs to the node you send it to so that it does
>> not
>> get forwarded to another node then the update should work and in case
>> where
>> the doc gets forwarded to another node the problem occurs. With replicas
>> it
>> could appear even more strange: the leader might have the doc right and
>> the
>> replica not.
>>
>> I only briefly looked at the bits that deal with this so perhaps there's
>> something more involved.
>>
>>
>> On Thu, Nov 22, 2012 at 8:29 PM, Luis Cappa Banda > >wrote:
>>
>> > Hi, Sami!
>> >
>> > But isn´t strange that some documents were updated (atomic updates)
>> > correctly and other ones not? Can´t it be a more serious problem like
>> some
>> > kind of index writer lock, or whatever?
>> >
>> > Regards,
>> >
>> > - Luis Cappa.
>> >
>> > 2012/11/22 Sami Siren 
>> >
>> > > I think the problem is that even though you were able to work around
>> the
>> > > bug in the client solr still uses the xml format internally so the
>> atomic
>> > > update (with multivalued field) fails later down the stack. The bug
>> you
>> > > filed needs to be fixed to get the problem solved.
>> > >
>> > >
>> > > On Thu, Nov 22, 2012 at 8:19 PM, Luis Cappa Banda <
>> luisca...@gmail.com
>> > > >wrote:
>> > >
>> > > > Hello everyone.
>> > > >
>> > > > I´ve starting to seriously worry about with SolrCloud due an strange
>> > > > behavior that I have detected. The situation is this the following:
>> > > >
>> > > > *1.* SolrCloud with one shard and two Solr instances.
>> > > > *2.* Indexation via SolrJ with CloudServer and a custom
>> > > > BinaryLBHttpSolrServer that uses BinaryRequestWriter to execute
>> > correctly
>> > > > atomic updates. Check
>> > > > JIRA-4080<
>> > > >
>> > >
>> >
>> https://issues.apache.org/jira/browse/SOLR-4080?focusedCommentId=13498055&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13498055
>> > > > >
>> > > > *3.* An asynchronous proccess updates partially some document
>> fields.
>> > > After
>> > > > that operation I automatically execute a commit, so the index must
>> be
>> > > > reloaded.
>> > > >
>> > > > What I have checked is that both using atomic updates or complete
>> > > document
>> > > > reindexations* aleatory documents are not updated* *even if I saw
>> > > debugging
>> > > > how the add() and commit() operations were executed correctly* *and
>> > > without
>> > > > errors*. Has anyone experienced a similar behavior? Is it posible
>> that
>> > if
>> > > > an index update operation didn´t finish and CloudSolrServer
>> receives a
>> > > new
>> > > > one this second update operation doesn´t complete?
>> > > >
>> > > > Thank you in advance.
>> > > >
>> > > > Regards,
>> > > >
>> > > > --
>> > > >
>> > > > - Luis Cappa
>> > > >
>> > >
>> >
>> >
>> >
>> > --
>> >
>> > - Luis Cappa
>> >
>>
>
>
>
> --
>
> - Luis Cappa
>
>


-- 

- Luis Cappa


Re: SolrCloud: Very strange behavior when doing atomic updates or documents reindexation.

2012-11-22 Thread Luis Cappa Banda
Hello!

I'm using a simple test configuration with nShards=1 and no replicas.
CloudSolrServer is supposed to forward those index/update operations
properly, isn't it? I tested with a complete document reindexing, not
atomic updates, using the official LBHttpSolrServer, not my custom
BinaryLBHttpSolrServer, and it doesn't work. I think it is not just a bug
related to atomic updates via CloudSolrServer but a general bug that occurs
when an index changes frequently through reindexing/updates.

Regards,

- Luis Cappa.


2012/11/22 Sami Siren 

> It might even depend on the cluster layout! Let's say you have 2 shards (no
> replicas) if the doc belongs to the node you send it to so that it does not
> get forwarded to another node then the update should work and in case where
> the doc gets forwarded to another node the problem occurs. With replicas it
> could appear even more strange: the leader might have the doc right and the
> replica not.
>
> I only briefly looked at the bits that deal with this so perhaps there's
> something more involved.
>
>
> On Thu, Nov 22, 2012 at 8:29 PM, Luis Cappa Banda  >wrote:
>
> > Hi, Sami!
> >
> > But isn´t strange that some documents were updated (atomic updates)
> > correctly and other ones not? Can´t it be a more serious problem like
> some
> > kind of index writer lock, or whatever?
> >
> > Regards,
> >
> > - Luis Cappa.
> >
> > 2012/11/22 Sami Siren 
> >
> > > I think the problem is that even though you were able to work around
> the
> > > bug in the client solr still uses the xml format internally so the
> atomic
> > > update (with multivalued field) fails later down the stack. The bug you
> > > filed needs to be fixed to get the problem solved.
> > >
> > >
> > > On Thu, Nov 22, 2012 at 8:19 PM, Luis Cappa Banda  > > >wrote:
> > >
> > > > Hello everyone.
> > > >
> > > > I´ve starting to seriously worry about with SolrCloud due an strange
> > > > behavior that I have detected. The situation is this the following:
> > > >
> > > > *1.* SolrCloud with one shard and two Solr instances.
> > > > *2.* Indexation via SolrJ with CloudServer and a custom
> > > > BinaryLBHttpSolrServer that uses BinaryRequestWriter to execute
> > correctly
> > > > atomic updates. Check
> > > > JIRA-4080<
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/SOLR-4080?focusedCommentId=13498055&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13498055
> > > > >
> > > > *3.* An asynchronous proccess updates partially some document fields.
> > > After
> > > > that operation I automatically execute a commit, so the index must be
> > > > reloaded.
> > > >
> > > > What I have checked is that both using atomic updates or complete
> > > document
> > > > reindexations* aleatory documents are not updated* *even if I saw
> > > debugging
> > > > how the add() and commit() operations were executed correctly* *and
> > > without
> > > > errors*. Has anyone experienced a similar behavior? Is it posible
> that
> > if
> > > > an index update operation didn´t finish and CloudSolrServer receives
> a
> > > new
> > > > one this second update operation doesn´t complete?
> > > >
> > > > Thank you in advance.
> > > >
> > > > Regards,
> > > >
> > > > --
> > > >
> > > > - Luis Cappa
> > > >
> > >
> >
> >
> >
> > --
> >
> > - Luis Cappa
> >
>



-- 

- Luis Cappa


Re: SolrCloud: Very strange behavior when doing atomic updates or documents reindexation.

2012-11-22 Thread Sami Siren
It might even depend on the cluster layout! Let's say you have 2 shards (no
replicas): if the doc belongs to the node you send it to, so that it does not
get forwarded to another node, then the update should work; in the case where
the doc gets forwarded to another node, the problem occurs. With replicas it
could appear even stranger: the leader might have the doc right and the
replica not.
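
To make the layout argument concrete, here is a rough sketch of which documents would stay local and which would be forwarded when updates are sent to one node of a multi-shard cluster. It assumes a plain CRC32-modulo routing rule purely for illustration; Solr's real router uses its own hashing, and the function names and ids are hypothetical.

```python
import zlib


def shard_for(doc_id, num_shards):
    # Assumption: simple CRC32 modulo routing, NOT Solr's actual algorithm.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def split_by_forwarding(doc_ids, receiving_shard, num_shards):
    """Separate ids handled locally from ids forwarded to another shard."""
    local, forwarded = [], []
    for doc_id in doc_ids:
        if shard_for(doc_id, num_shards) == receiving_shard:
            local.append(doc_id)
        else:
            forwarded.append(doc_id)
    return local, forwarded


doc_ids = ["doc%d" % i for i in range(10)]
local, forwarded = split_by_forwarding(doc_ids, receiving_shard=0, num_shards=2)
print("handled locally:", local)
print("forwarded:", forwarded)
```

If only the forwarded ids show the failed updates, that would support the layout theory; with a single shard nothing is forwarded between shards at all.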

I only briefly looked at the bits that deal with this so perhaps there's
something more involved.


On Thu, Nov 22, 2012 at 8:29 PM, Luis Cappa Banda wrote:

> Hi, Sami!
>
> But isn´t strange that some documents were updated (atomic updates)
> correctly and other ones not? Can´t it be a more serious problem like some
> kind of index writer lock, or whatever?
>
> Regards,
>
> - Luis Cappa.
>
> 2012/11/22 Sami Siren 
>
> > I think the problem is that even though you were able to work around the
> > bug in the client solr still uses the xml format internally so the atomic
> > update (with multivalued field) fails later down the stack. The bug you
> > filed needs to be fixed to get the problem solved.
> >
> >
> > On Thu, Nov 22, 2012 at 8:19 PM, Luis Cappa Banda  > >wrote:
> >
> > > Hello everyone.
> > >
> > > I´ve starting to seriously worry about with SolrCloud due an strange
> > > behavior that I have detected. The situation is this the following:
> > >
> > > *1.* SolrCloud with one shard and two Solr instances.
> > > *2.* Indexation via SolrJ with CloudServer and a custom
> > > BinaryLBHttpSolrServer that uses BinaryRequestWriter to execute
> correctly
> > > atomic updates. Check
> > > JIRA-4080<
> > >
> >
> https://issues.apache.org/jira/browse/SOLR-4080?focusedCommentId=13498055&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13498055
> > > >
> > > *3.* An asynchronous proccess updates partially some document fields.
> > After
> > > that operation I automatically execute a commit, so the index must be
> > > reloaded.
> > >
> > > What I have checked is that both using atomic updates or complete
> > document
> > > reindexations* aleatory documents are not updated* *even if I saw
> > debugging
> > > how the add() and commit() operations were executed correctly* *and
> > without
> > > errors*. Has anyone experienced a similar behavior? Is it posible that
> if
> > > an index update operation didn´t finish and CloudSolrServer receives a
> > new
> > > one this second update operation doesn´t complete?
> > >
> > > Thank you in advance.
> > >
> > > Regards,
> > >
> > > --
> > >
> > > - Luis Cappa
> > >
> >
>
>
>
> --
>
> - Luis Cappa
>


Re: SolrCloud: Very strange behavior when doing atomic updates or documents reindexation.

2012-11-22 Thread Luis Cappa Banda
Hi, Sami!

But isn't it strange that some documents were updated (atomic updates)
correctly and others were not? Couldn't it be a more serious problem, like
some kind of index writer lock, or whatever?

Regards,

- Luis Cappa.

2012/11/22 Sami Siren 

> I think the problem is that even though you were able to work around the
> bug in the client solr still uses the xml format internally so the atomic
> update (with multivalued field) fails later down the stack. The bug you
> filed needs to be fixed to get the problem solved.
>
>
> On Thu, Nov 22, 2012 at 8:19 PM, Luis Cappa Banda  >wrote:
>
> > Hello everyone.
> >
> > I´ve starting to seriously worry about with SolrCloud due an strange
> > behavior that I have detected. The situation is this the following:
> >
> > *1.* SolrCloud with one shard and two Solr instances.
> > *2.* Indexation via SolrJ with CloudServer and a custom
> > BinaryLBHttpSolrServer that uses BinaryRequestWriter to execute correctly
> > atomic updates. Check
> > JIRA-4080<
> >
> https://issues.apache.org/jira/browse/SOLR-4080?focusedCommentId=13498055&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13498055
> > >
> > *3.* An asynchronous proccess updates partially some document fields.
> After
> > that operation I automatically execute a commit, so the index must be
> > reloaded.
> >
> > What I have checked is that both using atomic updates or complete
> document
> > reindexations* aleatory documents are not updated* *even if I saw
> debugging
> > how the add() and commit() operations were executed correctly* *and
> without
> > errors*. Has anyone experienced a similar behavior? Is it posible that if
> > an index update operation didn´t finish and CloudSolrServer receives a
> new
> > one this second update operation doesn´t complete?
> >
> > Thank you in advance.
> >
> > Regards,
> >
> > --
> >
> > - Luis Cappa
> >
>



-- 

- Luis Cappa


Re: SolrCloud: Very strange behavior when doing atomic updates or documents reindexation.

2012-11-22 Thread Sami Siren
I think the problem is that even though you were able to work around the
bug in the client, Solr still uses the XML format internally, so the atomic
update (with multivalued field) fails later down the stack. The bug you
filed needs to be fixed to get the problem solved.


On Thu, Nov 22, 2012 at 8:19 PM, Luis Cappa Banda wrote:

> Hello everyone.
>
> I´ve starting to seriously worry about with SolrCloud due an strange
> behavior that I have detected. The situation is this the following:
>
> *1.* SolrCloud with one shard and two Solr instances.
> *2.* Indexation via SolrJ with CloudServer and a custom
> BinaryLBHttpSolrServer that uses BinaryRequestWriter to execute correctly
> atomic updates. Check
> JIRA-4080<
> https://issues.apache.org/jira/browse/SOLR-4080?focusedCommentId=13498055&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13498055
> >
> *3.* An asynchronous proccess updates partially some document fields. After
> that operation I automatically execute a commit, so the index must be
> reloaded.
>
> What I have checked is that both using atomic updates or complete document
> reindexations* aleatory documents are not updated* *even if I saw debugging
> how the add() and commit() operations were executed correctly* *and without
> errors*. Has anyone experienced a similar behavior? Is it posible that if
> an index update operation didn´t finish and CloudSolrServer receives a new
> one this second update operation doesn´t complete?
>
> Thank you in advance.
>
> Regards,
>
> --
>
> - Luis Cappa
>


SolrCloud: Very strange behavior when doing atomic updates or documents reindexation.

2012-11-22 Thread Luis Cappa Banda
Hello everyone.

I've started to seriously worry about SolrCloud due to a strange
behavior that I have detected. The situation is the following:

*1.* SolrCloud with one shard and two Solr instances.
*2.* Indexing via SolrJ with CloudSolrServer and a custom
BinaryLBHttpSolrServer that uses BinaryRequestWriter to execute atomic
updates correctly. Check SOLR-4080:
https://issues.apache.org/jira/browse/SOLR-4080?focusedCommentId=13498055&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13498055
*3.* An asynchronous process partially updates some document fields. After
that operation I automatically execute a commit, so the index must be
reloaded.

What I have found is that, with both atomic updates and complete document
reindexing, *random documents are not updated, even though I saw while
debugging that the add() and commit() operations were executed correctly and
without errors*. Has anyone experienced a similar behavior? Is it possible
that, if an index update operation hasn't finished and CloudSolrServer
receives a new one, this second update operation doesn't complete?
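
For readers unfamiliar with the two update styles compared above, this sketch shows how the request bodies differ in the JSON update format as I understand it: an atomic update sends the id plus a modifier such as "set" per changed field, while a full reindex sends every field. The field names, the "id" uniqueKey and the helper names are assumptions made for the example.

```python
import json


def atomic_update(doc_id, changes):
    # Only the unique key plus a {"set": value} modifier per changed field.
    doc = {"id": doc_id}
    for field, value in changes.items():
        doc[field] = {"set": value}
    return doc


def full_reindex(doc):
    # The complete document is sent again and replaces the stored one.
    return dict(doc)


stored = {"id": "doc1", "title": "Hello", "status": "new"}

atomic_body = json.dumps([atomic_update("doc1", {"status": "processed"})])
full_body = json.dumps([full_reindex(dict(stored, status="processed"))])

print(atomic_body)
print(full_body)
```

Either body would then be posted to the update handler and followed by a commit; how the request is transported (XML, JSON or javabin) is exactly the part SOLR-4080 is about, so treat this only as a picture of the payload shape.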

Thank you in advance.

Regards,

-- 

- Luis Cappa


Re: Field names w/ leading digits cause strange behavior

2012-04-24 Thread bleakley
Thank you for verifying the issue. I've created a ticket at
https://issues.apache.org/jira/browse/SOLR-3407

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Field-names-w-leading-digits-cause-strange-behavior-tp3936354p3936599.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Field names w/ leading digits cause strange behavior

2012-04-24 Thread Erick Erickson
Hmmm, this does NOT happen on 3.6, and it DOES happen on
trunk. Sure sounds like a JIRA to me, would you mind raising one?

I can't imagine this is desired behavior, it's just weird.

Thanks for pointing this out!
Erick

On Tue, Apr 24, 2012 at 3:38 PM, bleakley  wrote:
> When specifying a field name that starts with a digit (or digits) in the "fl"
> parameter solr returns both the field name and field value as the those
> digits. For example, using nightly build
> "apache-solr-4.0-2012-04-24_08-27-47" I run:
>
> java -jar start.jar
> and
> java -jar post.jar solr.xml monitor.xml
>
> If I then add a field to the field list that starts with a digit (
> localhost:8983/solr/select?q=*:*&fl=24 ) the results look like:
> ...
> 
> 24
> 
> ...
>
> if I try fl=24_7 it looks like everything after the underscore is truncated
> ...
> 
> 24
> 
> ...
>
> and if I try fl=3test it looks like everything after the last digit is
> truncated
> ...
> 
> 3
> 
> ...
>
> If I have an actual value for that field (say I've indexed 24_7 to be "true"
> ) I get back that value as well as the behavior above.
> ...
> 
> true
> 24
> 
> ...
>
> Is it ok the have fields that start with digits? If so, is there a different
> way to specify them using the "fl" parameter? Thanks!
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Field-names-w-leading-digits-cause-strange-behavior-tp3936354p3936354.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Field names w/ leading digits cause strange behavior

2012-04-24 Thread bleakley
When specifying a field name that starts with a digit (or digits) in the "fl"
parameter, Solr returns both the field name and the field value as those
digits. For example, using nightly build
"apache-solr-4.0-2012-04-24_08-27-47" I run:

java -jar start.jar
and
java -jar post.jar solr.xml monitor.xml

If I then add a field to the field list that starts with a digit (
localhost:8983/solr/select?q=*:*&fl=24 ) the results look like:
...

24

...

if I try fl=24_7 it looks like everything after the underscore is truncated
...

24

...

and if I try fl=3test it looks like everything after the last digit is
truncated
...

3

...

If I have an actual value for that field (say I've indexed 24_7 to be "true"
) I get back that value as well as the behavior above.
...

true
24

...

Is it OK to have fields that start with digits? If so, is there a different
way to specify them using the "fl" parameter? Thanks!
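
The three cases above (24, 24_7, 3test) are all consistent with a parser that reads the leading digits as a numeric constant and ignores the rest of the entry. The sketch below only reproduces that symptom under that assumption; `parse_fl_entry` is a hypothetical function and is not taken from Solr's source.

```python
import re


def parse_fl_entry(entry):
    """Return (kind, key, value) for one entry of a hypothetical fl parser.

    Assumption: leading digits are consumed as a constant, and that constant
    is used both as the returned key and as the returned value.
    """
    match = re.match(r"\d+", entry)
    if match:
        literal = match.group(0)
        return ("constant", literal, literal)
    return ("field", entry, None)  # normal case: look up the stored field


for entry in ("24", "24_7", "3test", "name"):
    print(entry, "->", parse_fl_entry(entry))
```

Under this reading, a field such as 24_7 is never looked up by that name through fl, which would match the truncated output shown above.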

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Field-names-w-leading-digits-cause-strange-behavior-tp3936354p3936354.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Strange behavior with search on empty string and NOT

2012-04-09 Thread Chris Hostetter

: Would it be a good idea to have Solr throw syntax error if an empty string
: query occurs? 

erick's explanation wasn't very precise ... 

solr doesn't have any special handling of "empty strings", but what you 
are searching for *might* be a totally valid query based on how the field 
type is configured (ie: strfield, or keywordtokenizer, etc...)

in your case, you seem to be searching for "" in a field for which the 
analyzer produces no tokens for "", so it falls out of the query.


-Hoss


Re: Strange behavior with search on empty string and NOT

2012-03-13 Thread Lan
Would it be a good idea to have Solr throw a syntax error if an empty string
query occurs? 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Strange-behavior-with-search-on-empty-string-and-NOT-tp3818023p3823572.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Strange behavior with search on empty string and NOT

2012-03-12 Thread Erick Erickson
Because Lucene query syntax is not a strict Boolean logic system.
There's a good explanation here:
http://www.lucidimagination.com/blog/2011/12/28/why-not-and-or-and-not/

Adding &debugQuery=on to your search is your friend. You'll see
that your query (at least on 3.5, going to /solr/select) returns
this as the parsed query:

-name:foobar

Solr really doesn't have the semantics for empty strings (or NULL for
that matter) so it just gets dropped out.

Best
Erick

On Sun, Mar 11, 2012 at 11:36 PM, Lan  wrote:
> I am curious why Solr results are inconsistent for the queries below, which
> search for an empty string on a TextField.
>
> q=name:"" returns 0 results
> q=name:"" AND NOT name:"FOOBAR" returns all results in the Solr index.
> Shouldn't it return 0 results too?
>
> Here is the debugQuery output.
>
> responseHeader:
>   status: 0
>   QTime: 1
>   params: q=name:"" AND NOT name:"BLAH232282" (other values: on, on, 0, 0, 2.2)
> debug:
>   rawquerystring: name:"" AND NOT name:"BLAH232282"
>   querystring: name:"" AND NOT name:"BLAH232282"
>   parsedquery: -PhraseQuery(name:"blah 232282")
>   parsedquery_toString: -name:"blah 232282"
>   QParser: LuceneQParser
>   timing: 1.0 (prepare: 1.0, process: 0.0)
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Strange-behavior-with-search-on-empty-string-and-NOT-tp3818023p3818023.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Strange behavior with search on empty string and NOT

2012-03-11 Thread Lan
I am curious why Solr results are inconsistent for the queries below, which
search for an empty string on a TextField.

q=name:"" returns 0 results
q=name:"" AND NOT name:"FOOBAR" returns all results in the Solr index.
Shouldn't it return 0 results too?

Here is the debugQuery output.

responseHeader:
  status: 0
  QTime: 1
  params: q=name:"" AND NOT name:"BLAH232282" (other values: on, on, 0, 0, 2.2)
debug:
  rawquerystring: name:"" AND NOT name:"BLAH232282"
  querystring: name:"" AND NOT name:"BLAH232282"
  parsedquery: -PhraseQuery(name:"blah 232282")
  parsedquery_toString: -name:"blah 232282"
  QParser: LuceneQParser
  timing: 1.0 (prepare: 1.0, process: 0.0)

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Strange-behavior-with-search-on-empty-string-and-NOT-tp3818023p3818023.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: strange behavior of scores and term proximity use

2011-11-25 Thread Erick Erickson
You  might try with a less "fraught" search phrase,
"to be or not to be" is a classic query that may be all
stop words.

Otherwise, I'm clueless.

On Wed, Nov 23, 2011 at 3:15 PM, Ariel Zerbib  wrote:
> I tested with the version 4.0-2011-11-04_09-29-42.
>
> Ariel
>
>
> 2011/11/17 Erick Erickson 
>
>> Hmmm, I'm not seeing similar behavior on a trunk from today, when did
>> you get your copy?
>>
>> Erick
>>
>> On Wed, Nov 16, 2011 at 2:06 PM, Ariel Zerbib 
>> wrote:
>> > Hi,
>> >
>> > For this term proximity query: ab_main_title_l0:"to be or not to be"~1000
>> >
>> >
>> http://localhost:/solr/select?q=ab_main_title_l0%3A%22og54ct8n+to+be+or+not+to+be+5w8ojsx2%22~1000&sort=score+desc&start=0&rows=3&fl=ab_main_title_l0%2Cscore%2Cid&debugQuery=true
>> >
>> > The third first results are the following one:
>> >
>> > 
>> > 
>> > 
>> >  0
>> >  5
>> > 
>> > 
>> >  
>> >    2315190010001021
>> >    
>> >      og54ct8n To be or not to be a Jew. 5w8ojsx2
>> >    
>> >    3.0814114
>> >  
>> >    2313006480001021
>> >    
>> >      og54ct8n To be or not to be 5w8ojsx2
>> >    
>> >    3.0814114
>> >  
>> >    2356410250001021
>> >    
>> >      og54ct8n Rumspringa : to be or not to be Amish / 5w8ojsx2
>> >    
>> >    3.0814114
>> > 
>> > 
>> >  ab_main_title_l0:"og54ct8n to be or not to be
>> > 5w8ojsx2"~1000
>> >  ab_main_title_l0:"og54ct8n to be or not to be
>> > 5w8ojsx2"~1000
>> >  PhraseQuery(ab_main_title_l0:"og54ct8n to be or
>> > not to be 5w8ojsx2"~1000)
>> >  ab_main_title_l0:"og54ct8n to be or not
>> > to be 5w8ojsx2"~1000
>> >  
>> >    
>> > 5.337161 = (MATCH) weight(ab_main_title_l0:"og54ct8n to be or not to be
>> > 5w8ojsx2"~1000 in 378403) [DefaultSimilarity], result of:
>> >  5.337161 = fieldWeight in 378403, product of:
>> >    0.57735026 = tf(freq=0.3334), with freq of:
>> >      0.3334 = phraseFreq=0.3334
>> >    29.581549 = idf(), sum of:
>> >      1.0012436 = idf(docFreq=3297332, maxDocs=3301436)
>> >      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
>> >      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
>> >      4.3826413 = idf(docFreq=112108, maxDocs=3301436)
>> >      6.3982043 = idf(docFreq=14937, maxDocs=3301436)
>> >      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
>> >      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
>> >      1.0017256 = idf(docFreq=3295743, maxDocs=3301436)
>> >    0.3125 = fieldNorm(doc=378403)
>> > 
>> >    
>> > 9.244234 = (MATCH) weight(ab_main_title_l0:"og54ct8n to be or not to be
>> > 5w8ojsx2"~1000 in 482807) [DefaultSimilarity], result of:
>> >  9.244234 = fieldWeight in 482807, product of:
>> >    1.0 = tf(freq=1.0), with freq of:
>> >      1.0 = phraseFreq=1.0
>> >    29.581549 = idf(), sum of:
>> >      1.0012436 = idf(docFreq=3297332, maxDocs=3301436)
>> >      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
>> >      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
>> >      4.3826413 = idf(docFreq=112108, maxDocs=3301436)
>> >      6.3982043 = idf(docFreq=14937, maxDocs=3301436)
>> >      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
>> >      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
>> >      1.0017256 = idf(docFreq=3295743, maxDocs=3301436)
>> >    0.3125 = fieldNorm(doc=482807)
>> > 
>> >    
>> > 5.337161 = (MATCH) weight(ab_main_title_l0:"og54ct8n to be or not to be
>> > 5w8ojsx2"~1000 in 1317563) [DefaultSimilarity], result of:
>> >  5.337161 = fieldWeight in 1317563, product of:
>> >    0.57735026 = tf(freq=0.3334), with freq of:
>> >      0.3334 = phraseFreq=0.3334
>> >    29.581549 = idf(), sum of:
>> >      1.0012436 = idf(docFreq=3297332, maxDocs=3301436)
>> >      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
>> >      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
>> >      4.3826413 = idf(docFreq=112108, maxDocs=3301436)
>> >      6.3982043 = idf(docFreq=14937, maxDocs=3301436)
>> >      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
>> >      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
>> >      1.0017256 = idf(docFreq=3295743, maxDocs=3301436)
>> >    0.3125 = fieldNorm(doc=1317563)
>> > 
>> > 
>> >
>> > The used version is a 4.0 October snapshot.
>> >
>> > I have 2 questions about the result:
>> > - Why debug print and scores in result are different?
>> > - What is the expected behavior of this kind of term proximity query?
>> >          - The debug scores seem to be well ordered but the result scores
>> > seem to be wrong.
>> >
>> >
>> > Thanks,
>> > Ariel
>> >
>>
>


Re: strange behavior of scores and term proximity use

2011-11-23 Thread Ariel Zerbib
I tested with the version 4.0-2011-11-04_09-29-42.

Ariel


2011/11/17 Erick Erickson 

> Hmmm, I'm not seeing similar behavior on a trunk from today, when did
> you get your copy?
>
> Erick
>
> On Wed, Nov 16, 2011 at 2:06 PM, Ariel Zerbib 
> wrote:
> > Hi,
> >
> > For this term proximity query: ab_main_title_l0:"to be or not to be"~1000
> >
> >
> http://localhost:/solr/select?q=ab_main_title_l0%3A%22og54ct8n+to+be+or+not+to+be+5w8ojsx2%22~1000&sort=score+desc&start=0&rows=3&fl=ab_main_title_l0%2Cscore%2Cid&debugQuery=true
> >
> > The third first results are the following one:
> >
> > 
> > 
> > 
> >  0
> >  5
> > 
> > 
> >  
> >2315190010001021
> >
> >  og54ct8n To be or not to be a Jew. 5w8ojsx2
> >
> >3.0814114
> >  
> >2313006480001021
> >
> >  og54ct8n To be or not to be 5w8ojsx2
> >
> >3.0814114
> >  
> >2356410250001021
> >
> >  og54ct8n Rumspringa : to be or not to be Amish / 5w8ojsx2
> >
> >3.0814114
> > 
> > 
> >  ab_main_title_l0:"og54ct8n to be or not to be
> > 5w8ojsx2"~1000
> >  ab_main_title_l0:"og54ct8n to be or not to be
> > 5w8ojsx2"~1000
> >  PhraseQuery(ab_main_title_l0:"og54ct8n to be or
> > not to be 5w8ojsx2"~1000)
> >  ab_main_title_l0:"og54ct8n to be or not
> > to be 5w8ojsx2"~1000
> >  
> >
> > 5.337161 = (MATCH) weight(ab_main_title_l0:"og54ct8n to be or not to be
> > 5w8ojsx2"~1000 in 378403) [DefaultSimilarity], result of:
> >  5.337161 = fieldWeight in 378403, product of:
> >0.57735026 = tf(freq=0.3334), with freq of:
> >  0.3334 = phraseFreq=0.3334
> >29.581549 = idf(), sum of:
> >  1.0012436 = idf(docFreq=3297332, maxDocs=3301436)
> >  3.0405464 = idf(docFreq=429046, maxDocs=3301436)
> >  5.3583193 = idf(docFreq=42257, maxDocs=3301436)
> >  4.3826413 = idf(docFreq=112108, maxDocs=3301436)
> >  6.3982043 = idf(docFreq=14937, maxDocs=3301436)
> >  3.0405464 = idf(docFreq=429046, maxDocs=3301436)
> >  5.3583193 = idf(docFreq=42257, maxDocs=3301436)
> >  1.0017256 = idf(docFreq=3295743, maxDocs=3301436)
> >0.3125 = fieldNorm(doc=378403)
> > 
> >
> > 9.244234 = (MATCH) weight(ab_main_title_l0:"og54ct8n to be or not to be
> > 5w8ojsx2"~1000 in 482807) [DefaultSimilarity], result of:
> >  9.244234 = fieldWeight in 482807, product of:
> >1.0 = tf(freq=1.0), with freq of:
> >  1.0 = phraseFreq=1.0
> >29.581549 = idf(), sum of:
> >  1.0012436 = idf(docFreq=3297332, maxDocs=3301436)
> >  3.0405464 = idf(docFreq=429046, maxDocs=3301436)
> >  5.3583193 = idf(docFreq=42257, maxDocs=3301436)
> >  4.3826413 = idf(docFreq=112108, maxDocs=3301436)
> >  6.3982043 = idf(docFreq=14937, maxDocs=3301436)
> >  3.0405464 = idf(docFreq=429046, maxDocs=3301436)
> >  5.3583193 = idf(docFreq=42257, maxDocs=3301436)
> >  1.0017256 = idf(docFreq=3295743, maxDocs=3301436)
> >0.3125 = fieldNorm(doc=482807)
> > 
> >
> > 5.337161 = (MATCH) weight(ab_main_title_l0:"og54ct8n to be or not to be
> > 5w8ojsx2"~1000 in 1317563) [DefaultSimilarity], result of:
> >  5.337161 = fieldWeight in 1317563, product of:
> >0.57735026 = tf(freq=0.3334), with freq of:
> >  0.3334 = phraseFreq=0.3334
> >29.581549 = idf(), sum of:
> >  1.0012436 = idf(docFreq=3297332, maxDocs=3301436)
> >  3.0405464 = idf(docFreq=429046, maxDocs=3301436)
> >  5.3583193 = idf(docFreq=42257, maxDocs=3301436)
> >  4.3826413 = idf(docFreq=112108, maxDocs=3301436)
> >  6.3982043 = idf(docFreq=14937, maxDocs=3301436)
> >  3.0405464 = idf(docFreq=429046, maxDocs=3301436)
> >  5.3583193 = idf(docFreq=42257, maxDocs=3301436)
> >  1.0017256 = idf(docFreq=3295743, maxDocs=3301436)
> >0.3125 = fieldNorm(doc=1317563)
> > 
> > 
> >
> > The used version is a 4.0 October snapshot.
> >
> > I have 2 questions about the result:
> > - Why debug print and scores in result are different?
> > - What is the expected behavior of this kind of term proximity query?
> >  - The debug scores seem to be well ordered but the result scores
> > seem to be wrong.
> >
> >
> > Thanks,
> > Ariel
> >
>


Re: strange behavior of scores and term proximity use

2011-11-17 Thread Erick Erickson
Hmmm, I'm not seeing similar behavior on a trunk from today, when did
you get your copy?

Erick

On Wed, Nov 16, 2011 at 2:06 PM, Ariel Zerbib  wrote:
> Hi,
>
> For this term proximity query: ab_main_title_l0:"to be or not to be"~1000
>
> http://localhost:/solr/select?q=ab_main_title_l0%3A%22og54ct8n+to+be+or+not+to+be+5w8ojsx2%22~1000&sort=score+desc&start=0&rows=3&fl=ab_main_title_l0%2Cscore%2Cid&debugQuery=true
>
> The third first results are the following one:
>
> 
> 
> 
>  0
>  5
> 
> 
>  
>    2315190010001021
>    
>      og54ct8n To be or not to be a Jew. 5w8ojsx2
>    
>    3.0814114
>  
>    2313006480001021
>    
>      og54ct8n To be or not to be 5w8ojsx2
>    
>    3.0814114
>  
>    2356410250001021
>    
>      og54ct8n Rumspringa : to be or not to be Amish / 5w8ojsx2
>    
>    3.0814114
> 
> 
>  ab_main_title_l0:"og54ct8n to be or not to be
> 5w8ojsx2"~1000
>  ab_main_title_l0:"og54ct8n to be or not to be
> 5w8ojsx2"~1000
>  PhraseQuery(ab_main_title_l0:"og54ct8n to be or
> not to be 5w8ojsx2"~1000)
>  ab_main_title_l0:"og54ct8n to be or not
> to be 5w8ojsx2"~1000
>  
>    
> 5.337161 = (MATCH) weight(ab_main_title_l0:"og54ct8n to be or not to be
> 5w8ojsx2"~1000 in 378403) [DefaultSimilarity], result of:
>  5.337161 = fieldWeight in 378403, product of:
>    0.57735026 = tf(freq=0.3334), with freq of:
>      0.3334 = phraseFreq=0.3334
>    29.581549 = idf(), sum of:
>      1.0012436 = idf(docFreq=3297332, maxDocs=3301436)
>      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
>      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
>      4.3826413 = idf(docFreq=112108, maxDocs=3301436)
>      6.3982043 = idf(docFreq=14937, maxDocs=3301436)
>      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
>      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
>      1.0017256 = idf(docFreq=3295743, maxDocs=3301436)
>    0.3125 = fieldNorm(doc=378403)
> 
>    
> 9.244234 = (MATCH) weight(ab_main_title_l0:"og54ct8n to be or not to be
> 5w8ojsx2"~1000 in 482807) [DefaultSimilarity], result of:
>  9.244234 = fieldWeight in 482807, product of:
>    1.0 = tf(freq=1.0), with freq of:
>      1.0 = phraseFreq=1.0
>    29.581549 = idf(), sum of:
>      1.0012436 = idf(docFreq=3297332, maxDocs=3301436)
>      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
>      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
>      4.3826413 = idf(docFreq=112108, maxDocs=3301436)
>      6.3982043 = idf(docFreq=14937, maxDocs=3301436)
>      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
>      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
>      1.0017256 = idf(docFreq=3295743, maxDocs=3301436)
>    0.3125 = fieldNorm(doc=482807)
> 
>    
> 5.337161 = (MATCH) weight(ab_main_title_l0:"og54ct8n to be or not to be
> 5w8ojsx2"~1000 in 1317563) [DefaultSimilarity], result of:
>  5.337161 = fieldWeight in 1317563, product of:
>    0.57735026 = tf(freq=0.3334), with freq of:
>      0.3334 = phraseFreq=0.3334
>    29.581549 = idf(), sum of:
>      1.0012436 = idf(docFreq=3297332, maxDocs=3301436)
>      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
>      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
>      4.3826413 = idf(docFreq=112108, maxDocs=3301436)
>      6.3982043 = idf(docFreq=14937, maxDocs=3301436)
>      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
>      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
>      1.0017256 = idf(docFreq=3295743, maxDocs=3301436)
>    0.3125 = fieldNorm(doc=1317563)
> 
> 
>
> The used version is a 4.0 October snapshot.
>
> I have 2 questions about the result:
> - Why debug print and scores in result are different?
> - What is the expected behavior of this kind of term proximity query?
>          - The debug scores seem to be well ordered but the result scores
> seem to be wrong.
>
>
> Thanks,
> Ariel
>


strange behavior of scores and term proximity use

2011-11-16 Thread Ariel Zerbib
Hi,

For this term proximity query: ab_main_title_l0:"to be or not to be"~1000

http://localhost:/solr/select?q=ab_main_title_l0%3A%22og54ct8n+to+be+or+not+to+be+5w8ojsx2%22~1000&sort=score+desc&start=0&rows=3&fl=ab_main_title_l0%2Cscore%2Cid&debugQuery=true

The first three results are the following:

responseHeader:
  status: 0
  QTime: 5

response:
  id: 2315190010001021
  ab_main_title_l0: og54ct8n To be or not to be a Jew. 5w8ojsx2
  score: 3.0814114

  id: 2313006480001021
  ab_main_title_l0: og54ct8n To be or not to be 5w8ojsx2
  score: 3.0814114

  id: 2356410250001021
  ab_main_title_l0: og54ct8n Rumspringa : to be or not to be Amish / 5w8ojsx2
  score: 3.0814114

debug:
  rawquerystring: ab_main_title_l0:"og54ct8n to be or not to be 5w8ojsx2"~1000
  querystring: ab_main_title_l0:"og54ct8n to be or not to be 5w8ojsx2"~1000
  parsedquery: PhraseQuery(ab_main_title_l0:"og54ct8n to be or not to be 5w8ojsx2"~1000)
  parsedquery_toString: ab_main_title_l0:"og54ct8n to be or not to be 5w8ojsx2"~1000
  explain:

5.337161 = (MATCH) weight(ab_main_title_l0:"og54ct8n to be or not to be
5w8ojsx2"~1000 in 378403) [DefaultSimilarity], result of:
  5.337161 = fieldWeight in 378403, product of:
    0.57735026 = tf(freq=0.3334), with freq of:
      0.3334 = phraseFreq=0.3334
    29.581549 = idf(), sum of:
      1.0012436 = idf(docFreq=3297332, maxDocs=3301436)
      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
      4.3826413 = idf(docFreq=112108, maxDocs=3301436)
      6.3982043 = idf(docFreq=14937, maxDocs=3301436)
      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
      1.0017256 = idf(docFreq=3295743, maxDocs=3301436)
    0.3125 = fieldNorm(doc=378403)

9.244234 = (MATCH) weight(ab_main_title_l0:"og54ct8n to be or not to be
5w8ojsx2"~1000 in 482807) [DefaultSimilarity], result of:
  9.244234 = fieldWeight in 482807, product of:
    1.0 = tf(freq=1.0), with freq of:
      1.0 = phraseFreq=1.0
    29.581549 = idf(), sum of:
      1.0012436 = idf(docFreq=3297332, maxDocs=3301436)
      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
      4.3826413 = idf(docFreq=112108, maxDocs=3301436)
      6.3982043 = idf(docFreq=14937, maxDocs=3301436)
      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
      1.0017256 = idf(docFreq=3295743, maxDocs=3301436)
    0.3125 = fieldNorm(doc=482807)

5.337161 = (MATCH) weight(ab_main_title_l0:"og54ct8n to be or not to be
5w8ojsx2"~1000 in 1317563) [DefaultSimilarity], result of:
  5.337161 = fieldWeight in 1317563, product of:
    0.57735026 = tf(freq=0.3334), with freq of:
      0.3334 = phraseFreq=0.3334
    29.581549 = idf(), sum of:
      1.0012436 = idf(docFreq=3297332, maxDocs=3301436)
      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
      4.3826413 = idf(docFreq=112108, maxDocs=3301436)
      6.3982043 = idf(docFreq=14937, maxDocs=3301436)
      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
      1.0017256 = idf(docFreq=3295743, maxDocs=3301436)
    0.3125 = fieldNorm(doc=1317563)


The version used is a 4.0 October snapshot.

I have 2 questions about the result:
- Why are the debug output and the scores in the result different?
- What is the expected behavior of this kind of term proximity query?
  - The debug scores seem to be well ordered, but the result scores
seem to be wrong.
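
For reference, the debug scores above can be recomputed by hand. The sketch below assumes the classic Lucene DefaultSimilarity formulas as I understand them (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), sloppy phrase frequency = 1 / (distance + 1)). It also assumes that the printed freq of 0.3334 stands for 1/3, which is what tf = 0.57735026 implies. The function names are mine. It reproduces only the explain values, not the 3.0814114 reported in the result list, so it does not by itself explain the discrepancy.

```python
import math

MAX_DOCS = 3301436
# docFreq of each of the eight query terms, copied from the explain output
DOC_FREQS = [3297332, 429046, 42257, 112108, 14937, 429046, 42257, 3295743]
FIELD_NORM = 0.3125

def idf(doc_freq, max_docs):
    # assumed DefaultSimilarity idf
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def sloppy_freq(distance):
    # assumed contribution of one sloppy phrase match at the given edit distance
    return 1.0 / (distance + 1)

def phrase_score(phrase_freq):
    tf = math.sqrt(phrase_freq)
    idf_sum = sum(idf(df, MAX_DOCS) for df in DOC_FREQS)
    return tf * idf_sum * FIELD_NORM

exact = phrase_score(1.0)              # compare with 9.244234 (doc 482807)
sloppy = phrase_score(sloppy_freq(2))  # compare with 5.337161 (docs 378403, 1317563)
print(round(exact, 4), round(sloppy, 4))
```

Under these assumptions, a phraseFreq of 1/3 corresponds to a single match at edit distance 2, while the exact-title document gets a phraseFreq of 1.0.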


Thanks,
Ariel


Re: Strange behavior

2011-06-16 Thread Denis Kuzmenok
Of course, I did stop Solr before copying the index. Deleting the
index and reindexing on the production server did solve the issue. Strange,
but working..


> Have you stopped Solr before manually copying the data? This way you
> can be sure that index is the same and you didn't have any new docs on
> the fly.



Re: Strange behavior

2011-06-16 Thread Alexey Serba
Did you stop Solr before manually copying the data? That way you
can be sure that the index is the same and that no new docs were added on
the fly.

2011/6/14 Denis Kuzmenok :
> What  should  i provide, OS is the same, environment is the same, solr
> is  completely  copied,  searches  work,  except that one, and that is
> strange..
>
>> I think you will need to provide more information than this, no-one on this 
>> list is omniscient AFAIK.
>
>> François
>
>> On Jun 14, 2011, at 10:44 AM, Denis Kuzmenok wrote:
>
>>> Hi.
>>>
>>> I've  debugged search on test machine, after copying to production server
>>> the  entire  directory  (entire solr directory), i've noticed that one
>>> query  (SDR  S70EE  K)  does  match  on  test  server, and does not on
>>> production.
>>> How can that be?
>>>
>
>
>
>
>


Re: Strange behavior

2011-06-14 Thread Erick Erickson
Well, you could provide the results with &debugQuery=on. You could
provide the schema.xml and solrconfig.xml files for both. You
could provide a listing of your index files. You could provide some
evidence that you've tried chasing down your problem using tools
like Luke or the Solr admin interface. Something please...
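
As a rough sketch of the first suggestion: once both servers have been queried with &debugQuery=on and wt=json, the two responses can be compared field by field. The helper names and the sample dictionaries below are hypothetical; the keys (debug, parsedquery, QParser, response, numFound) are the ones that appear in Solr's JSON debug output elsewhere on this list.

```python
def debug_summary(response):
    """Pull out the parts of a Solr JSON response that matter for comparison."""
    debug = response.get("debug", {})
    return {
        "parsedquery": debug.get("parsedquery"),
        "QParser": debug.get("QParser"),
        "numFound": response.get("response", {}).get("numFound"),
    }

def diff_servers(test_response, prod_response):
    """Return {key: (test_value, prod_value)} for every key that differs."""
    a = debug_summary(test_response)
    b = debug_summary(prod_response)
    return {key: (a[key], b[key]) for key in a if a[key] != b[key]}

# Made-up responses, only to show the shape of the comparison.
test_resp = {"response": {"numFound": 3},
             "debug": {"parsedquery": "text:sdr text:s70ee text:k",
                       "QParser": "LuceneQParser"}}
prod_resp = {"response": {"numFound": 0},
             "debug": {"parsedquery": "text:sdr text:s70ee text:k",
                       "QParser": "LuceneQParser"}}
print(diff_servers(test_resp, prod_resp))
```

If the parsed queries are identical and only numFound differs, the configuration is probably not the culprit and the index contents are the next thing to check.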

You might also review:
http://wiki.apache.org/solr/UsingMailingLists

Best
Erick

2011/6/14 Denis Kuzmenok :
> What  should  i provide, OS is the same, environment is the same, solr
> is  completely  copied,  searches  work,  except that one, and that is
> strange..
>
>> I think you will need to provide more information than this, no-one on this 
>> list is omniscient AFAIK.
>
>> François
>
>> On Jun 14, 2011, at 10:44 AM, Denis Kuzmenok wrote:
>
>>> Hi.
>>>
>>> I've  debugged search on test machine, after copying to production server
>>> the  entire  directory  (entire solr directory), i've noticed that one
>>> query  (SDR  S70EE  K)  does  match  on  test  server, and does not on
>>> production.
>>> How can that be?
>>>
>
>
>
>
>


Re: Strange behavior

2011-06-14 Thread Denis Kuzmenok
What should I provide? The OS is the same, the environment is the same, Solr
is completely copied, and searches work, except that one, and that is
strange.. 

> I think you will need to provide more information than this, no-one on this 
> list is omniscient AFAIK.

> François

> On Jun 14, 2011, at 10:44 AM, Denis Kuzmenok wrote:

>> Hi.
>> 
>> I've  debugged search on test machine, after copying to production server
>> the  entire  directory  (entire solr directory), i've noticed that one
>> query  (SDR  S70EE  K)  does  match  on  test  server, and does not on
>> production.
>> How can that be?
>> 






Re: Strange behavior

2011-06-14 Thread François Schiettecatte
I think you will need to provide more information than this, no-one on this 
list is omniscient AFAIK.

François

On Jun 14, 2011, at 10:44 AM, Denis Kuzmenok wrote:

> Hi.
> 
> I've  debugged search on test machine, after copying to production server
> the  entire  directory  (entire solr directory), i've noticed that one
> query  (SDR  S70EE  K)  does  match  on  test  server, and does not on
> production.
> How can that be?
> 



Strange behavior

2011-06-14 Thread Denis Kuzmenok
Hi.

I've debugged search on the test machine. After copying the entire Solr
directory to the production server, I've noticed that one query
(SDR S70EE K) does match on the test server and does not on production.
How can that be?



Re: strange behavior of echoParams

2011-04-13 Thread Bernd Fehling

Hi Erik,

never mind.
I can't reproduce this strange behavior.
Apparently, stopping and restarting Solr solved it.

Thanks,
Bernd


On 13.04.2011 16:00, Erik Hatcher wrote:

What does the parsed query look like with debugQuery=true for both scenarios?
Any difference?
Doesn't make any sense that echoParams would have an effect, unless somehow 
your search client is relying on parameters returned to do something with 
them.?!

Erik

On Apr 13, 2011, at 09:57 , Bernd Fehling wrote:


Dear list,

after setting "echoParams" to "none" wildcard search isn't working.
Only if I set "echoParams" to "explicit" then wildcard is possible.

http://wiki.apache.org/solr/CoreQueryParameters
states that "echoParams" is for debugging purposes.

We use Solr 3.1.0.

Snippet from solrconfig.xml:

 
   none

   xml
   10
 


Any explanation about this behavior?

Regards,
Bernd






Re: strange behavior of echoParams

2011-04-13 Thread Erik Hatcher
What does the parsed query look like with debugQuery=true for both scenarios?  
Any difference?  Doesn't make any sense that echoParams would have an effect, 
unless somehow your search client is relying on parameters returned to do 
something with them.?!

Erik

On Apr 13, 2011, at 09:57 , Bernd Fehling wrote:

> Dear list,
> 
> after setting "echoParams" to "none" wildcard search isn't working.
> Only if I set "echoParams" to "explicit" then wildcard is possible.
> 
> http://wiki.apache.org/solr/CoreQueryParameters
> states that "echoParams" is for debugging purposes.
> 
> We use Solr 3.1.0.
> 
> Snippet from solrconfig.xml:
> 
> 
>   none
> 
>   xml
>   10
> 
> 
> 
> Any explanation about this behavior?
> 
> Regards,
> Bernd



strange behavior of echoParams

2011-04-13 Thread Bernd Fehling

Dear list,

after setting "echoParams" to "none", wildcard search stops working.
Wildcard search only works if I set "echoParams" to "explicit".

http://wiki.apache.org/solr/CoreQueryParameters
states that "echoParams" is for debugging purposes.

We use Solr 3.1.0.

Snippet from solrconfig.xml:

 
   none

   xml
   10
 


Any explanation about this behavior?

Regards,
Bernd


RE: Strange behavior for certain words

2010-05-13 Thread Ahmet Arslan
Hi,
       Thanks for your response. Attached are the Schema.xml and sample docs 
that were indexed. The query and response are as below. The attachment 
Prodsku4270257.xml has a field "paymenttype" whose value is 'prepaid'.

query:
q=prepaid&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=json&debugQuery=on&explainOther=&hl=on

But you are populating your text field from deviceType, features, description 
and color. paymentType is not copied into text, so this behavior is normal.
Either add this copyField declaration:
   <copyField source="paymentType" dest="text"/>
or query the field directly: q=paymentType:prepaid
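
A small sketch of the difference between the two requests, in Python. The base URL is a placeholder (host, port and path are assumptions), and select_url is a made-up helper; only the q values come from this thread.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/select"  # hypothetical host, port and path

def select_url(query, **extra):
    """Build a select URL with JSON output and debug information switched on."""
    params = {"q": query, "wt": "json", "debugQuery": "on"}
    params.update(extra)
    return BASE + "?" + urlencode(params)

# Searches the default field ("text"), which paymentType is not copied into.
default_field = select_url("prepaid")
# Names the field explicitly, so the default field is not involved.
fielded = select_url("paymentType:prepaid")

print(default_field)
print(fielded)
```

The first request should show parsedquery text:prepaid in the debug section, as in the response posted earlier; the second is expected to parse against paymentType instead.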



  

RE: Strange behavior for certain words

2010-05-12 Thread Naga Darbha
Hi Rama,

What are the field types of Title and Description?

You may go to the SOLR admin console and try "Analysis": select the field type 
that you have used for Title and Description, provide the words Prepaid 
and Postpaid in the indexing analyzer, and see how it stores the information.

regards,
Naga Ranjan

-Original Message-
From: RamaKrishna Atmakur [mailto:ramkrishn...@hotmail.com] 
Sent: Thursday, May 13, 2010 5:57 AM
To: solr-user@lucene.apache.org
Subject: Strange behavior for certain words


Hi,
   We are trying to use SOLR for searching our catalog online and during QA 
came across a interesting case where SOLR is not returning results that it 
should.

Specificially, we have indexed things like "Title" and "Description", of the 
words in the Title happens to be "Prepaid' and "Postpaid". However when we 
search on those words, SOLR does not return any results.
But if we search on some other words in the same title in which the word 
"Prepaid" occurs then the correct results are returned. In fact SOLR even 
returns the result count for the Prepaid and Postpaid facets.

We know that there are no synonyms associated with both those words and these 
words are also not in any other list such as stopwords.txt etc.

Any idea as to why this should be happening ?

Thanks in advance,
Rama
  


RE: Strange behavior for certain words

2010-05-12 Thread RamaKrishna Atmakur

Hi,
   Thanks for your response. Attached are the Schema.xml and sample docs 
that were indexed. The query and response are as below. The attachment 
Prodsku4270257.xml has a field "paymenttype" whose value is 'prepaid'.

query:
q=prepaid&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=json&debugQuery=on&explainOther=&hl=on
Result:
{
 "responseHeader":{
  "status":0,
  "QTime":0,
  "params":{
   "wt":"json",
   "debugQuery":"on",
   "start":"0",
   "rows":"10",
   "explainOther":"",
   "indent":"on",
   "fl":"*,score",
   "hl":"on",
   "qt":"standard",
   "version":"2.2",
   "q":"prepaid",
   "hl.fl":""}},
 "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
 },
 "highlighting":{},
 "debug":{
  "rawquerystring":"prepaid",
  "querystring":"prepaid",
  "parsedquery":"text:prepaid",
  "parsedquery_toString":"text:prepaid",
  "explain":{},
  "QParser":"OldLuceneQParser",
  "timing":{
   "time":0.0,
   "prepare":{
    "time":0.0,
    "org.apache.solr.handler.component.QueryComponent":{
     "time":0.0},
    "org.apache.solr.handler.component.FacetComponent":{
     "time":0.0},
    "org.apache.solr.handler.component.MoreLikeThisComponent":{
     "time":0.0},
    "org.apache.solr.handler.component.HighlightComponent":{
     "time":0.0},
    "org.apache.solr.handler.component.DebugComponent":{
     "time":0.0}},
   "process":{
    "time":0.0,
    "org.apache.solr.handler.component.QueryComponent":{
     "time":0.0},
    "org.apache.solr.handler.component.FacetComponent":{
     "time":0.0},
    "org.apache.solr.handler.component.MoreLikeThisComponent":{
     "time":0.0},
    "org.apache.solr.handler.component.HighlightComponent":{
     "time":0.0},
    "org.apache.solr.handler.component.DebugComponent":{
     "time":0.0}}}}}
Thanks and Regards
Rama K Atmakur.

> Date: Wed, 12 May 2010 20:46:11 -0400
> Subject: Re: Strange behavior for certain words
> From: erickerick...@gmail.com
> To: solr-user@lucene.apache.org
> 
> Hmmm, there's not much information to go on here.
> You might review this page:
> http://wiki.apache.org/solr/UsingMailingLists
> and post with more information. At minimum,
> the field definitions, the query output (include
> &debugQuery=on), perhaps what comes out
> of the analysis admin page for both indexing
> and querying the problem text, and whatever
> else you can think of that would help analyze the
> problem.
> 
> Best
> Erick
> 
> On Wed, May 12, 2010 at 8:26 PM, RamaKrishna Atmakur <
> ramkrishn...@hotmail.com> wrote:
> 
> >
> > Hi,
> >   We are trying to use SOLR for searching our catalog online and during QA
> > came across a interesting case where SOLR is not returning results that it
> > should.
> >
> > Specificially, we have indexed things like "Title" and "Description", of
> > the words in the Title happens to be "Prepaid' and "Postpaid". However when
> > we search on those words, SOLR does not return any results.
> > But if we search on some other words in the same title in which the word
> > "Prepaid" occurs then the correct results are returned. In fact SOLR even
> > returns the result count for the Prepaid and Postpaid facets.
> >
> > We know that there are no synonyms associated with both those words and
> > these words are also not in any other list such as stopwords.txt etc.
> >
> > Any idea as to why this should be happening ?
> >
> > Thanks in advance,
> > Rama
> >
  


Re: Strange behavior for certain words

2010-05-12 Thread Erick Erickson
Hmmm, there's not much information to go on here.
You might review this page:
http://wiki.apache.org/solr/UsingMailingLists
and post with more information. At minimum,
the field definitions, the query output (include
&debugQuery=on), perhaps what comes out
of the analysis admin page for both indexing
and querying the problem text, and whatever
else you can think of that would help analyze the
problem.

Best
Erick

On Wed, May 12, 2010 at 8:26 PM, RamaKrishna Atmakur <
ramkrishn...@hotmail.com> wrote:

>
> Hi,
>   We are trying to use SOLR for searching our catalog online and during QA
> came across a interesting case where SOLR is not returning results that it
> should.
>
> Specificially, we have indexed things like "Title" and "Description", of
> the words in the Title happens to be "Prepaid' and "Postpaid". However when
> we search on those words, SOLR does not return any results.
> But if we search on some other words in the same title in which the word
> "Prepaid" occurs then the correct results are returned. In fact SOLR even
> returns the result count for the Prepaid and Postpaid facets.
>
> We know that there are no synonyms associated with both those words and
> these words are also not in any other list such as stopwords.txt etc.
>
> Any idea as to why this should be happening ?
>
> Thanks in advance,
> Rama
>


Strange behavior for certain words

2010-05-12 Thread RamaKrishna Atmakur

Hi,
   We are trying to use SOLR for searching our catalog online and during QA 
came across an interesting case where SOLR is not returning results that it 
should.

Specifically, we have indexed things like "Title" and "Description", and one of 
the words in the Title happens to be "Prepaid" or "Postpaid". However, when we 
search on those words, SOLR does not return any results.
But if we search on some other words in the same title in which the word 
"Prepaid" occurs, then the correct results are returned. In fact, SOLR even 
returns the result count for the Prepaid and Postpaid facets.

We know that there are no synonyms associated with either of those words, and 
they are also not in any other list such as stopwords.txt.

Any idea as to why this should be happening?

Thanks in advance,
Rama
  

Re: Strange Behavior When Using CSVRequestHandler

2010-01-07 Thread Erick Erickson
It puzzles me too. I don't know the internals of that code
well enough to speculate, but once you're into undefined
behavior, I have great faith in *many* inexplicable things
happening.

Erick

On Thu, Jan 7, 2010 at 9:45 AM, danben  wrote:

>
> Erick - thanks very much, all of this makes sense.  But the one thing I
> still
> find puzzling is the fact that re-adding the file a second, third, fourth
> etc time causes numDocs to increase, and ALWAYS by the same amount
> (141,645).  Any ideas as to what could cause that?
>
> Dan
>
>
> Erick Erickson wrote:
> >
> > I think the root of your problem is that unique fields should NOT
> > be multivalued. See
> >
> http://wiki.apache.org/solr/FieldOptionsByUseCase?highlight=(unique)|(key)
> >
> > <
> http://wiki.apache.org/solr/FieldOptionsByUseCase?highlight=(unique)|(key)
> >In
> > this case, since you're tokenizing, your "query" field is
> > implicitly multi-valued, I don't know what the behavior will be.
> >
> > But there's another problem:
> > All the filters in your analyzer definition will mess up the
> > correspondence between the Unix uniq and numDocs even
> > if you got by the above. I.e
> >
> > StopFilter would make the lines "a problem" and "the problem" identical.
> > WordDelimiter would do all kinds of interesting things
> > LowerCaseFilter would make "Myproblem" and "myproblem" identical.
> > RemoveDuplicatesFilter would make "interesting interesting" and
> > "interesting" identical
> >
> > You could define a second field, make *that* one unique and NOT analyzer
> > it in any way...
> >
> > You could hash your sentences and define the hash as your unique key.
> >
> > You could
> >
> > HTH
> > Erick
> >
> > On Wed, Jan 6, 2010 at 1:06 PM, danben  wrote:
> >
> >>
> >> The problem:
> >>
> >> Not all of the documents that I expect to be indexed are showing up in
> >> the
> >> index.
> >>
> >> The background:
> >>
> >> I start off with an empty index based on a schema with a single field
> >> named
> >> 'query', marked as unique and using the following analyzer:
> >>
> >> 
> >>
> >> >> words="stopwords.txt" enablePositionIncrements="true"/>
> >> >> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> >> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> >>
> >>
> >> 
> >>
> >> My input is a utf-8 encoded file with one sentence per line.  Its total
> >> size
> >> is about 60MB.  I would like each line of the file to correspond to a
> >> single
> >> document in the solr index.  If I print the number of unique lines in
> the
> >> file (using cat | sort | uniq | wc -l), I get a little over 2M.
>  Printing
> >> the total number of lines in the file gives me around 2.7M.
> >>
> >> I use the following to start indexing:
> >>
> >> curl
> >> '
> >>
> http://localhost:8983/solr/update/csv?commit=true&separator=%09&stream.file=/home/gkropitz/querystage2map/file1&stream.contentType=text/plain;charset=utf-8&fieldnames=query&escape=
> >> \'
> >>
> >> When this command completes, I see numDocs is approximately 470k (which
> >> is
> >> what I find strange) and maxDocs is approximately 890k (which is fine
> >> since
> >> I know I have around 700k duplicates).  Even more confusing is that if I
> >> run
> >> this exact command a second time without performing any other
> operations,
> >> numDocs goes up to around 610k, and a third time brings it up to about
> >> 750k.
> >>
> >> Can anyone tell me what might cause Solr not to index everything in my
> >> input
> >> file the first time, and why it would be able to index new documents the
> >> second and third times?
> >>
> >> I also have this line in solrconfig.xml, if it matters:
> >>
> >>  >> multipartUploadLimitInKB="2048" />
> >>
> >> Thanks,
> >> Dan
> >>
> >> --
> >> View this message in context:
> >>
> http://old.nabble.com/Strange-Behavior-When-Using-CSVRequestHandler-tp27026926p27026926.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> >>
> >
> >
>
> --
> View this message in context:
> http://old.nabble.com/Strange-Behavior-When-Using-CSVRequestHandler-%28Solr-1.4%29-tp27026926p27061086.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: Strange Behavior When Using CSVRequestHandler

2010-01-07 Thread danben

Erick - thanks very much, all of this makes sense.  But the one thing I still
find puzzling is the fact that re-adding the file a second, third, fourth
etc time causes numDocs to increase, and ALWAYS by the same amount
(141,645).  Any ideas as to what could cause that?

Dan


Erick Erickson wrote:
> 
> I think the root of your problem is that unique fields should NOT
> be multivalued. See
> http://wiki.apache.org/solr/FieldOptionsByUseCase?highlight=(unique)|(key)
> 
> <http://wiki.apache.org/solr/FieldOptionsByUseCase?highlight=(unique)|(key)>In
> this case, since you're tokenizing, your "query" field is
> implicitly multi-valued, I don't know what the behavior will be.
> 
> But there's another problem:
> All the filters in your analyzer definition will mess up the
> correspondence between the Unix uniq and numDocs even
> if you got by the above. I.e
> 
> StopFilter would make the lines "a problem" and "the problem" identical.
> WordDelimiter would do all kinds of interesting things
> LowerCaseFilter would make "Myproblem" and "myproblem" identical.
> RemoveDuplicatesFilter would make "interesting interesting" and
> "interesting" identical
> 
> You could define a second field, make *that* one unique and NOT analyzer
> it in any way...
> 
> You could hash your sentences and define the hash as your unique key.
> 
> You could
> 
> HTH
> Erick
> 
> On Wed, Jan 6, 2010 at 1:06 PM, danben  wrote:
> 
>>
>> The problem:
>>
>> Not all of the documents that I expect to be indexed are showing up in
>> the
>> index.
>>
>> The background:
>>
>> I start off with an empty index based on a schema with a single field
>> named
>> 'query', marked as unique and using the following analyzer:
>>
>> 
>>
>>> words="stopwords.txt" enablePositionIncrements="true"/>
>>> generateWordParts="1" generateNumberParts="1" catenateWords="1"
>> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>>
>>
>> 
>>
>> My input is a utf-8 encoded file with one sentence per line.  Its total
>> size
>> is about 60MB.  I would like each line of the file to correspond to a
>> single
>> document in the solr index.  If I print the number of unique lines in the
>> file (using cat | sort | uniq | wc -l), I get a little over 2M.  Printing
>> the total number of lines in the file gives me around 2.7M.
>>
>> I use the following to start indexing:
>>
>> curl
>> '
>> http://localhost:8983/solr/update/csv?commit=true&separator=%09&stream.file=/home/gkropitz/querystage2map/file1&stream.contentType=text/plain;charset=utf-8&fieldnames=query&escape=
>> \'
>>
>> When this command completes, I see numDocs is approximately 470k (which
>> is
>> what I find strange) and maxDocs is approximately 890k (which is fine
>> since
>> I know I have around 700k duplicates).  Even more confusing is that if I
>> run
>> this exact command a second time without performing any other operations,
>> numDocs goes up to around 610k, and a third time brings it up to about
>> 750k.
>>
>> Can anyone tell me what might cause Solr not to index everything in my
>> input
>> file the first time, and why it would be able to index new documents the
>> second and third times?
>>
>> I also have this line in solrconfig.xml, if it matters:
>>
>> > multipartUploadLimitInKB="2048" />
>>
>> Thanks,
>> Dan
>>
>> --
>> View this message in context:
>> http://old.nabble.com/Strange-Behavior-When-Using-CSVRequestHandler-tp27026926p27026926.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Strange-Behavior-When-Using-CSVRequestHandler-%28Solr-1.4%29-tp27026926p27061086.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Strange Behavior When Using CSVRequestHandler

2010-01-06 Thread Erick Erickson
I think the root of your problem is that unique fields should NOT
be multivalued. See
http://wiki.apache.org/solr/FieldOptionsByUseCase?highlight=(unique)|(key)

In this case, since you're tokenizing, your "query" field is
implicitly multi-valued, and I don't know what the behavior will be.

But there's another problem:
All the filters in your analyzer definition will mess up the
correspondence between the Unix uniq and numDocs even
if you got past the above. For example:

StopFilter would make the lines "a problem" and "the problem" identical.
WordDelimiter would do all kinds of interesting things.
LowerCaseFilter would make "Myproblem" and "myproblem" identical.
RemoveDuplicatesFilter would make "interesting interesting" and
"interesting" identical.

You could define a second field, make *that* one unique and NOT analyze
it in any way...
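
To make the point about the filters concrete, here is a rough Python sketch.
It is not Solr code: the stopword list and the normalize() helper are made up
for illustration and only crudely imitate lowercasing, stopword removal and
duplicate removal. It shows how lines that uniq counts as different can end up
with the same analyzed value, while the raw (un-analyzed) lines stay distinct.

```python
STOPWORDS = {"a", "the"}  # hypothetical list, stands in for stopwords.txt


def normalize(line):
    """Crude stand-in for an analyzer chain: lowercase, drop stopwords,
    then drop immediately repeated tokens."""
    tokens = [t.lower() for t in line.split()]
    tokens = [t for t in tokens if t not in STOPWORDS]
    deduped = []
    for t in tokens:
        if not deduped or deduped[-1] != t:
            deduped.append(t)
    return tuple(deduped)


lines = ["a problem", "the problem", "Myproblem", "myproblem",
         "interesting interesting", "interesting"]

unique_raw = set(lines)                          # what sort | uniq counts
unique_analyzed = {normalize(l) for l in lines}  # what an analyzed key would see
print(len(unique_raw), len(unique_analyzed))
```

With an un-analyzed unique field the count should follow the first number
rather than the second.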

You could hash your sentences and define the hash as your unique key.
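
A minimal sketch of the hashing idea, assuming the input is pre-processed
before it is sent to Solr. The helper names (line_key, to_rows) are invented
for this example, and MD5 is just one possible choice of hash; the resulting
two columns would have to be mapped to a unique key field and the sentence
field in your own schema.

```python
import hashlib


def line_key(line):
    """Hex digest of the raw, un-analyzed line, meant to be used as the unique key."""
    return hashlib.md5(line.encode("utf-8")).hexdigest()


def to_rows(lines):
    """Yield (key, sentence) pairs, e.g. to write out as a two-column
    tab-separated file."""
    for line in lines:
        text = line.rstrip("\n")
        yield line_key(text), text


rows = list(to_rows(["a problem\n", "the problem\n", "a problem\n"]))
for key, text in rows:
    print(key, text, sep="\t")
```

Identical lines get identical keys, so exact duplicates would still overwrite
each other, but lines that differ only after analysis would not.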

You could

HTH
Erick

On Wed, Jan 6, 2010 at 1:06 PM, danben  wrote:

>
> The problem:
>
> Not all of the documents that I expect to be indexed are showing up in the
> index.
>
> The background:
>
> I start off with an empty index based on a schema with a single field named
> 'query', marked as unique and using the following analyzer:
>
> 
>
> words="stopwords.txt" enablePositionIncrements="true"/>
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>
>
> 
>
> My input is a utf-8 encoded file with one sentence per line.  Its total
> size
> is about 60MB.  I would like each line of the file to correspond to a
> single
> document in the solr index.  If I print the number of unique lines in the
> file (using cat | sort | uniq | wc -l), I get a little over 2M.  Printing
> the total number of lines in the file gives me around 2.7M.
>
> I use the following to start indexing:
>
> curl
> '
> http://localhost:8983/solr/update/csv?commit=true&separator=%09&stream.file=/home/gkropitz/querystage2map/file1&stream.contentType=text/plain;charset=utf-8&fieldnames=query&escape=
> \'
>
> When this command completes, I see numDocs is approximately 470k (which is
> what I find strange) and maxDocs is approximately 890k (which is fine since
> I know I have around 700k duplicates).  Even more confusing is that if I
> run
> this exact command a second time without performing any other operations,
> numDocs goes up to around 610k, and a third time brings it up to about
> 750k.
>
> Can anyone tell me what might cause Solr not to index everything in my
> input
> file the first time, and why it would be able to index new documents the
> second and third times?
>
> I also have this line in solrconfig.xml, if it matters:
>
>  multipartUploadLimitInKB="2048" />
>
> Thanks,
> Dan
>
> --
> View this message in context:
> http://old.nabble.com/Strange-Behavior-When-Using-CSVRequestHandler-tp27026926p27026926.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Strange Behavior When Using CSVRequestHandler

2010-01-06 Thread danben

The problem:

Not all of the documents that I expect to be indexed are showing up in the
index.

The background:

I start off with an empty index based on a schema with a single field named
'query', marked as unique and using the following analyzer:









My input is a utf-8 encoded file with one sentence per line.  Its total size
is about 60MB.  I would like each line of the file to correspond to a single
document in the solr index.  If I print the number of unique lines in the
file (using cat | sort | uniq | wc -l), I get a little over 2M.  Printing
the total number of lines in the file gives me around 2.7M.

I use the following to start indexing:

curl
'http://localhost:8983/solr/update/csv?commit=true&separator=%09&stream.file=/home/gkropitz/querystage2map/file1&stream.contentType=text/plain;charset=utf-8&fieldnames=query&escape=\'

When this command completes, I see numDocs is approximately 470k (which is
what I find strange) and maxDocs is approximately 890k (which is fine since
I know I have around 700k duplicates).  Even more confusing is that if I run
this exact command a second time without performing any other operations,
numDocs goes up to around 610k, and a third time brings it up to about 750k.

Can anyone tell me what might cause Solr not to index everything in my input
file the first time, and why it would be able to index new documents the
second and third times?

I also have this line in solrconfig.xml, if it matters:



Thanks,
Dan

-- 
View this message in context: 
http://old.nabble.com/Strange-Behavior-When-Using-CSVRequestHandler-tp27026926p27026926.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: SOLR - extremely strange behavior! Documents disappeared...

2009-08-18 Thread Fuad Efendi
UPDATE:

Crazy stuff with the SLES10 SP2 default installation/partitioning: LVM (Logical
Volume Manager) shows 400GB available, but... I lost 90% of the index without
even noticing it!

Aug 16, 2009 8:04:32 PM org.apache.solr.common.SolrException log
SEVERE: java.io.IOException: No space left on device
at java.io.RandomAccessFile.writeBytes(RandomAccessFile.java)

- then somehow no exceptions at all for a few hours and no corrupted index
after several commits, then again "not enough space", etc.; finally a
corrupted index (still, SATA)
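
A small sketch of the kind of check that might have caught this earlier,
assuming it is run against the filesystem that actually holds the index
directory (not the volume group as a whole). The function names and the 20%
threshold are arbitrary choices for illustration.

```python
import shutil


def free_fraction(path):
    """Fraction of the filesystem containing `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def has_headroom(path, minimum=0.2):
    """True if at least `minimum` of the filesystem is free (threshold is arbitrary)."""
    return free_fraction(path) >= minimum


# "." is a placeholder; point this at the real index directory.
print(round(free_fraction("."), 3), has_headroom("."))
```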


Thanks


-Original Message-
From: Funtick [mailto:f...@efendi.ca] 
Sent: August-18-09 12:25 AM
To: solr-user@lucene.apache.org
Subject: Re: SOLR  - extremely strange behavior! Documents
disappeared...


sorry for typo in prev msg,

Increase = 2,297,231 - 1,786,552  = 500,000 (average)

RATE (non-unique-id:unique-id) = 7,000,000 : 500,000 = 14:1

but 125:1 (initial 30 hours) was very strange...



Funtick wrote:
> 
> UPDATE:
> 
> After few more minutes (after previous commit):
> docsPending: about 7,000,000
> 
> After commit:
> numDocs: 2,297,231
> 
> Increase = 2,297,231 - 1,281,851 = 1,000,000 (average)
> 
> So that I have 7 docs with same ID in average.
> 
> Having 100,000,000 and then dropping below 1,000,000 is strange; it is a
> bug somewhere... need to investigate ramBufferSize and MergePolicy,
> including SOLR uniqueId implementation...
> 
> 
> 
> Funtick wrote:
>> 
>> After running an application which heavily uses MD5 HEX-representation as
>>  for SOLR v.1.4-dev-trunk:
>> 
>> 1. After 30 hours: 
>> 101,000,000 documents added
>> 
>> 2. Commit: 
>> numDocs = 783,714 
>> maxDoc = 3,975,393
>> 
>> 3. Upload new docs to SOLR during 1 hour(!!!), then commit, then
>> optimize:
>> numDocs=1,281,851
>> maxDocs=1,281,851
>> 
>> It looks _extremely_ strange that within an hour I have such a huge
>> increase with same 'average' document set...
>> 
>> I am suspecting something goes wrong with Lucene buffer flush / index
>> merge OR SOLR - Unique ID handling...
>> 
>> According to my own estimates, I should have about 10,000,000 new
>> documents now... I had 0.5 millions within an hour, and 0.8 mlns within a
>> day; same 'random' documents.
>> 
>> This morning index size was about 4Gb, then suddenly dropped below 0.5
>> Gb. Why? I haven't issued any "commit"...
>> 
>> I am using ramBufferMB=8192
>> 
>> 
>> 
>> 
>> 
>> 
>> 
> 
> 

-- 
View this message in context:
http://www.nabble.com/SOLR-%3CuniqueKey%3E---extremely-strange-behavior%21-Documents-disappeared...-tp25017728p25018263.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: SOLR - extremely strange behavior! Documents disappeared...

2009-08-17 Thread Funtick

sorry for typo in prev msg,

Increase = 2,297,231 - 1,786,552  = 500,000 (average)

RATE (non-unique-id:unique-id) = 7,000,000 : 500,000 = 14:1

but 125:1 (initial 30 hours) was very strange...
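
For reference, the same arithmetic spelled out in Python, using only the
figures quoted in this thread (the variable names are mine). The exact
values come out slightly different from the rounded ones above.

```python
added_total = 101_000_000    # documents sent during the first 30 hours
num_docs_30h = 783_714       # numDocs after the first commit
num_docs_before = 1_786_552  # numDocs after the second one-hour run
num_docs_after = 2_297_231   # numDocs after the latest commit
pending = 7_000_000          # approximate docsPending before that commit

increase = num_docs_after - num_docs_before
rate_recent = pending / increase           # non-unique : unique, latest run
rate_initial = added_total / num_docs_30h  # non-unique : unique, first 30 hours

print(increase, round(rate_recent, 1), round(rate_initial, 1))
```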



Funtick wrote:
> 
> UPDATE:
> 
> After few more minutes (after previous commit):
> docsPending: about 7,000,000
> 
> After commit:
> numDocs: 2,297,231
> 
> Increase = 2,297,231 - 1,281,851 = 1,000,000 (average)
> 
> So that I have 7 docs with same ID in average.
> 
> Having 100,000,000 and then dropping below 1,000,000 is strange; it is a
> bug somewhere... need to investigate ramBufferSize and MergePolicy,
> including SOLR uniqueId implementation...
> 
> 
> 
> Funtick wrote:
>> 
>> After running an application which heavily uses MD5 HEX-representation as
>>  for SOLR v.1.4-dev-trunk:
>> 
>> 1. After 30 hours: 
>> 101,000,000 documents added
>> 
>> 2. Commit: 
>> numDocs = 783,714 
>> maxDoc = 3,975,393
>> 
>> 3. Upload new docs to SOLR during 1 hour(!!!), then commit, then
>> optimize:
>> numDocs=1,281,851
>> maxDocs=1,281,851
>> 
>> It looks _extremely_ strange that within an hour I have such a huge
>> increase with same 'average' document set...
>> 
>> I am suspecting something goes wrong with Lucene buffer flush / index
>> merge OR SOLR - Unique ID handling...
>> 
>> According to my own estimates, I should have about 10,000,000 new
>> documents now... I had 0.5 millions within an hour, and 0.8 mlns within a
>> day; same 'random' documents.
>> 
>> This morning index size was about 4Gb, then suddenly dropped below 0.5
>> Gb. Why? I haven't issued any "commit"...
>> 
>> I am using ramBufferMB=8192
>> 
>> 
>> 
>> 
>> 
>> 
>> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/SOLR-%3CuniqueKey%3E---extremely-strange-behavior%21-Documents-disappeared...-tp25017728p25018263.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: SOLR - extremely strange behavior! Documents disappeared...

2009-08-17 Thread Funtick

UPDATE:

After a few more minutes (after the previous commit):
docsPending: about 7,000,000

After commit:
numDocs: 2,297,231

Increase = 2,297,231 - 1,281,851 = 1,000,000 (average)

So I have 7 docs with the same ID on average.

Having 100,000,000 and then dropping below 1,000,000 is strange; there is a bug
somewhere... I need to investigate ramBufferSize and MergePolicy, including the
SOLR uniqueId implementation...



Funtick wrote:
> 
> After running an application which heavily uses MD5 HEX-representation as
>  for SOLR v.1.4-dev-trunk:
> 
> 1. After 30 hours: 
> 101,000,000 documents added
> 
> 2. Commit: 
> numDocs = 783,714 
> maxDoc = 3,975,393
> 
> 3. Upload new docs to SOLR during 1 hour(!!!), then commit, then
> optimize:
> numDocs=1,281,851
> maxDocs=1,281,851
> 
> It looks _extremely_ strange that within an hour I have such a huge
> increase with same 'average' document set...
> 
> I am suspecting something goes wrong with Lucene buffer flush / index
> merge OR SOLR - Unique ID handling...
> 
> According to my own estimates, I should have about 10,000,000 new
> documents now... I had 0.5 millions within an hour, and 0.8 mlns within a
> day; same 'random' documents.
> 
> This morning index size was about 4Gb, then suddenly dropped below 0.5 Gb.
> Why? I haven't issued any "commit"...
> 
> I am using ramBufferMB=8192
> 
> 
> 
> 
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/SOLR-%3CuniqueKey%3E---extremely-strange-behavior%21-Documents-disappeared...-tp25017728p25018221.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: SOLR - extremely strange behavior! Documents disappeared...

2009-08-17 Thread Funtick

One more hour, and I have 0.5 million more (after commit/optimize).

Something strange is happening with the SOLR buffer flush (if we have a single
segment???)... an explicit commit prevents it...

30 hours, with index flush, commit: 783,714
+ 1 hour, commit, optimize: 1,281,851
+ 1 hour, commit, optimize: 1,786,552

Same random docs retrieved from web...



Funtick wrote:
> 
> 
> But how to explain that within an hour (after commit) I have had about
> 500,000 new documents, and within 30 hours (after commit) only 783,714?
> 
> Same _random_enough_ documents... 
> 
> BTW, SOLR Console was showing only few hundreds "deletesById" although I
> don't use any deleteById explicitly; only "update" with "allowOverwrite"
> and "uniqueId".
> 
> 
> 
> 
> markrmiller wrote:
>> 
>> I'd say you have a lot of documents that have the same id.
>> When you add a doc with the same id, first the old one is deleted, then
>> the
>> new one is added (atomically though).
>> 
>> The deleted docs are not removed from the index immediately though - the
>> doc
>> id is just marked as deleted.
>> 
>> Over time though, as segments are merged due to hitting triggers while
>> adding new documents, deletes are removed (which deletes depends on which
>> segments have been merged).
>> 
>> So if you add a tone of documents over time, many with the same ids, you
>> would likely see this type of maxDoc, numDoc churn. maxDoc will include
>> deleted docs while numDoc will not.
>> 
>> 
>> -- 
>> - Mark
>> 
>> http://www.lucidimagination.com
>> 
>> On Mon, Aug 17, 2009 at 11:09 PM, Funtick  wrote:
>> 
>>>
>>> After running an application which heavily uses MD5 HEX-representation
>>> as
>>>  for SOLR v.1.4-dev-trunk:
>>>
>>> 1. After 30 hours:
>>> 101,000,000 documents added
>>>
>>> 2. Commit:
>>> numDocs = 783,714
>>> maxDoc = 3,975,393
>>>
>>> 3. Upload new docs to SOLR during 1 hour(!!!), then commit, then
>>> optimize:
>>> numDocs=1,281,851
>>> maxDocs=1,281,851
>>>
>>> It looks _extremely_ strange that within an hour I have such a huge
>>> increase
>>> with same 'average' document set...
>>>
>>> I am suspecting something goes wrong with Lucene buffer flush / index
>>> merge
>>> OR SOLR - Unique ID handling...
>>>
>>> According to my own estimates, I should have about 10,000,000 new
>>> documents
>>> now... I had 0.5 millions within an hour, and 0.8 mlns within a day;
>>> same
>>> 'random' documents.
>>>
>>> This morning index size was about 4Gb, then suddenly dropped below 0.5
>>> Gb.
>>> Why? I haven't issued any "commit"...
>>>
>>> I am using ramBufferMB=8192
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/SOLR-%3CuniqueKey%3E---extremely-strange-behavior%21-Documents-disappeared...-tp25017728p25017728.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>> 
>> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/SOLR-%3CuniqueKey%3E---extremely-strange-behavior%21-Documents-disappeared...-tp25017728p25017967.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: SOLR - extremely strange behavior! Documents disappeared...

2009-08-17 Thread Funtick


But how do you explain that within an hour (after commit) I had about
500,000 new documents, and within 30 hours (after commit) only 1,300,000?

Same _random_enough_ documents... 

BTW, the SOLR Console was showing only a few hundred "deletesById", although I
don't use any deleteById explicitly; only "update" with "allowOverwrite" and
"uniqueId".




markrmiller wrote:
> 
> I'd say you have a lot of documents that have the same id.
> When you add a doc with the same id, first the old one is deleted, then
> the
> new one is added (atomically though).
> 
> The deleted docs are not removed from the index immediately though - the
> doc
> id is just marked as deleted.
> 
> Over time though, as segments are merged due to hitting triggers while
> adding new documents, deletes are removed (which deletes depends on which
> segments have been merged).
> 
> So if you add a tone of documents over time, many with the same ids, you
> would likely see this type of maxDoc, numDoc churn. maxDoc will include
> deleted docs while numDoc will not.
> 
> 
> -- 
> - Mark
> 
> http://www.lucidimagination.com
> 
> On Mon, Aug 17, 2009 at 11:09 PM, Funtick  wrote:
> 
>>
>> After running an application which heavily uses MD5 HEX-representation as
>>  for SOLR v.1.4-dev-trunk:
>>
>> 1. After 30 hours:
>> 101,000,000 documents added
>>
>> 2. Commit:
>> numDocs = 783,714
>> maxDoc = 3,975,393
>>
>> 3. Upload new docs to SOLR during 1 hour(!!!), then commit, then
>> optimize:
>> numDocs=1,281,851
>> maxDocs=1,281,851
>>
>> It looks _extremely_ strange that within an hour I have such a huge
>> increase
>> with same 'average' document set...
>>
>> I am suspecting something goes wrong with Lucene buffer flush / index
>> merge
>> OR SOLR - Unique ID handling...
>>
>> According to my own estimates, I should have about 10,000,000 new
>> documents
>> now... I had 0.5 millions within an hour, and 0.8 mlns within a day; same
>> 'random' documents.
>>
>> This morning index size was about 4Gb, then suddenly dropped below 0.5
>> Gb.
>> Why? I haven't issued any "commit"...
>>
>> I am using ramBufferMB=8192
>>
>>
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/SOLR-%3CuniqueKey%3E---extremely-strange-behavior%21-Documents-disappeared...-tp25017728p25017728.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://www.nabble.com/SOLR-%3CuniqueKey%3E---extremely-strange-behavior%21-Documents-disappeared...-tp25017728p25017826.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: SOLR - extremely strange behavior! Documents disappeared...

2009-08-17 Thread Mark Miller
I'd say you have a lot of documents that have the same id.
When you add a doc with the same id, first the old one is deleted, then the
new one is added (atomically though).

The deleted docs are not removed from the index immediately though - the doc
id is just marked as deleted.

Over time though, as segments are merged due to hitting triggers while
adding new documents, deletes are removed (which deletes depends on which
segments have been merged).

So if you add a ton of documents over time, many with the same ids, you
would likely see this type of maxDoc, numDocs churn. maxDoc will include
deleted docs while numDocs will not.
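
A toy Python model of the bookkeeping described above. This is not Lucene
code and the class is invented for illustration; it only tracks which copies
are marked deleted, and treats "merge everything" as a stand-in for what an
optimize does.

```python
class ToyIndex:
    """Toy model of overwrite-by-id: old copies are marked deleted, not removed."""

    def __init__(self):
        self.slots = []  # every copy written and not yet merged away: [id, deleted?]
        self.live = {}   # id -> position of the current live copy in self.slots

    def add(self, doc_id):
        if doc_id in self.live:
            # The old copy is only flagged as deleted; it still occupies a slot.
            self.slots[self.live[doc_id]][1] = True
        self.live[doc_id] = len(self.slots)
        self.slots.append([doc_id, False])

    def merge_all(self):
        """Stand-in for merging every segment: deleted copies are dropped."""
        self.slots = [s for s in self.slots if not s[1]]
        self.live = {s[0]: i for i, s in enumerate(self.slots)}

    @property
    def max_doc(self):
        return len(self.slots)  # includes deleted copies

    @property
    def num_docs(self):
        return len(self.live)   # live documents only


index = ToyIndex()
for doc_id in ["a", "b", "a", "c", "a", "b"]:
    index.add(doc_id)
before = (index.num_docs, index.max_doc)
index.merge_all()
after = (index.num_docs, index.max_doc)
print(before, after)
```

In this model numDocs never exceeds the number of distinct ids, no matter how
many documents are sent, and maxDoc only falls back to numDocs after a merge.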


-- 
- Mark

http://www.lucidimagination.com

On Mon, Aug 17, 2009 at 11:09 PM, Funtick  wrote:

>
> After running an application which heavily uses MD5 HEX-representation as
>  for SOLR v.1.4-dev-trunk:
>
> 1. After 30 hours:
> 101,000,000 documents added
>
> 2. Commit:
> numDocs = 783,714
> maxDoc = 3,975,393
>
> 3. Upload new docs to SOLR during 1 hour(!!!), then commit, then
> optimize:
> numDocs=1,281,851
> maxDocs=1,281,851
>
> It looks _extremely_ strange that within an hour I have such a huge
> increase
> with same 'average' document set...
>
> I am suspecting something goes wrong with Lucene buffer flush / index merge
> OR SOLR - Unique ID handling...
>
> According to my own estimates, I should have about 10,000,000 new documents
> now... I had 0.5 millions within an hour, and 0.8 mlns within a day; same
> 'random' documents.
>
> This morning index size was about 4Gb, then suddenly dropped below 0.5 Gb.
> Why? I haven't issued any "commit"...
>
> I am using ramBufferMB=8192
>
>
>
>
>
>
> --
> View this message in context:
> http://www.nabble.com/SOLR-%3CuniqueKey%3E---extremely-strange-behavior%21-Documents-disappeared...-tp25017728p25017728.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


SOLR - extremely strange behavior! Documents disappeared...

2009-08-17 Thread Funtick

After running an application which heavily uses the MD5 HEX-representation as
<uniqueKey> for SOLR v.1.4-dev-trunk:

1. After 30 hours: 
101,000,000 documents added

2. Commit: 
numDocs = 783,714 
maxDoc = 3,975,393

3. Upload new docs to SOLR during 1 hour(!!!), then commit, then
optimize:
numDocs=1,281,851
maxDocs=1,281,851

It looks _extremely_ strange that within an hour I have such a huge increase
with the same 'average' document set...

I suspect something is going wrong with the Lucene buffer flush / index merge
OR the SOLR unique ID handling...

According to my own estimates, I should have about 10,000,000 new documents
now... I had 0.5 million within an hour, and 0.8 million within a day; same
'random' documents.

This morning the index size was about 4GB, then it suddenly dropped below 0.5GB.
Why? I haven't issued any "commit"...

I am using ramBufferMB=8192






-- 
View this message in context: 
http://www.nabble.com/SOLR-%3CuniqueKey%3E---extremely-strange-behavior%21-Documents-disappeared...-tp25017728p25017728.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Strange behavior

2008-02-12 Thread Yonik Seeley
On Feb 12, 2008 9:50 AM, Traut <[EMAIL PROTECTED]> wrote:
> Thank you, it works. Stemming filter works only with lowercased words?

I've never tried it in the order you have it.
You could try the analysis admin page and report back what happens...

-Yonik


> On Feb 12, 2008 4:29 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>
> > Try putting the stemmer after the lowercase filter.
> > -Yonik
> >
> > On Feb 12, 2008 9:15 AM, Traut <[EMAIL PROTECTED]> wrote:
> > > Hi all
> > >
> > > Please take a look at this strange behavior (connected with stemming I
> > > suppose):
> > >
> > >
> > > type:
> > >
> > >  > > stored="false">
> > >   
> > > 
> > > 
> > > 
> > > 
> > >   
> > >   
> > > 
> > > 
> > > 
> > > 
> > >   
> > > 
> > >
> > > field:
> > >
> > >  >  stored="false"/>
> > >
> > >
> > >
> > > I'm adding a document:
> > >
> > > 99 > > name="name">Apple
> > >
> > > 
> > >
> > >
> > > Queriyng "name:apple" - 0 results. Searching "name:Apple" - 1 result.
> > But
> > > "name:appl*" - 1 result
> > >
> > >
> > > Adding next document:
> > >
> > > 8 > > name="name">Somenamele
> > >
> > > 
> > >
> > >
> > > Searching for "name:somenamele" - 1 result, for "name:Somenamele" - 1
> > result
> > >
> > >
> > > What is the problem with "Apple" ? Maybe StandardTokenizer understands
> > it as
> > > trademark :) ?
> > >
> > >
> > > Thank you in advence
> > >
> > >
> > > --
> > > Best regards,
> > > Traut
> > >
> >
>
>
>
> --
> Best regards,
> Traut
>


Re: Strange behavior

2008-02-12 Thread Traut
Thank you, it works. Does the stemming filter work only with lowercased words?

On Feb 12, 2008 4:29 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:

> Try putting the stemmer after the lowercase filter.
> -Yonik
>
> On Feb 12, 2008 9:15 AM, Traut <[EMAIL PROTECTED]> wrote:
> > Hi all
> >
> > Please take a look at this strange behavior (connected with stemming I
> > suppose):
> >
> >
> > type:
> >
> >  > stored="false">
> >   
> > 
> > 
> > 
> > 
> >   
> >   
> > 
> > 
> > 
> > 
> >   
> > 
> >
> > field:
> >
> >   stored="false"/>
> >
> >
> >
> > I'm adding a document:
> >
> > 99 > name="name">Apple
> >
> > 
> >
> >
> > Queriyng "name:apple" - 0 results. Searching "name:Apple" - 1 result.
> But
> > "name:appl*" - 1 result
> >
> >
> > Adding next document:
> >
> > 8 > name="name">Somenamele
> >
> > 
> >
> >
> > Searching for "name:somenamele" - 1 result, for "name:Somenamele" - 1
> result
> >
> >
> > What is the problem with "Apple" ? Maybe StandardTokenizer understands
> it as
> > trademark :) ?
> >
> >
> > Thank you in advence
> >
> >
> > --
> > Best regards,
> > Traut
> >
>



-- 
Best regards,
Traut


Re: Strange behavior

2008-02-12 Thread Yonik Seeley
Try putting the stemmer after the lowercase filter.
-Yonik
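
A rough Python sketch of why the order can matter. The toy_stem() function is
a made-up, deliberately case-sensitive stemmer, not the one Solr uses; it is
only an assumption that the real stemmer behaves similarly on capitalized
input. It reproduces the pattern reported below: with stemming before
lowercasing, "Apple" and "apple" end up as different terms.

```python
VOWELS = set("aeiou")  # lowercase only: the toy stemmer is case-sensitive on purpose


def toy_stem(token):
    """Hypothetical stemmer: drop a trailing 'e' if the rest of the word
    contains a lowercase vowel."""
    if token.endswith("e") and any(ch in VOWELS for ch in token[:-1]):
        return token[:-1]
    return token


def stem_then_lower(token):
    return toy_stem(token).lower()


def lower_then_stem(token):
    return toy_stem(token.lower())


for analyze in (stem_then_lower, lower_then_stem):
    indexed = analyze("Apple")  # term stored for the document
    print(analyze.__name__, indexed,
          analyze("apple") == indexed,   # does the query "apple" match?
          analyze("Apple") == indexed)   # does the query "Apple" match?
```

With the lowercase filter first, the stemmer always sees the same input for
"Apple" and "apple", so index and query terms agree.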

On Feb 12, 2008 9:15 AM, Traut <[EMAIL PROTECTED]> wrote:
> Hi all
>
> Please take a look at this strange behavior (connected with stemming I
> suppose):
>
>
> type:
>
>  stored="false">
>   
> 
> 
> 
> 
>   
>   
> 
> 
> 
> 
>   
> 
>
> field:
>
> 
>
>
>
> I'm adding a document:
>
> 99 name="name">Apple
>
> 
>
>
> Queriyng "name:apple" - 0 results. Searching "name:Apple" - 1 result. But
> "name:appl*" - 1 result
>
>
> Adding next document:
>
> 8 name="name">Somenamele
>
> 
>
>
> Searching for "name:somenamele" - 1 result, for "name:Somenamele" - 1 result
>
>
> What is the problem with "Apple" ? Maybe StandardTokenizer understands it as
> trademark :) ?
>
>
> Thank you in advence
>
>
> --
> Best regards,
> Traut
>


Strange behavior

2008-02-12 Thread Traut
Hi all

Please take a look at this strange behavior (connected with stemming I
suppose):


type:


  




  
  




  


field:





I'm adding a document:

99Apple




Querying "name:apple" - 0 results. Searching "name:Apple" - 1 result. But
"name:appl*" - 1 result


Adding next document:

8Somenamele




Searching for "name:somenamele" - 1 result, for "name:Somenamele" - 1 result


What is the problem with "Apple" ? Maybe StandardTokenizer understands it as
trademark :) ?


Thank you in advance


-- 
Best regards,
Traut


Re: Strange behavior MoreLikeThis Feature

2007-11-22 Thread Rishabh Joshi
Thanks Ryan. I now know the reason why.
Before I explain the reason, let me correct the mistake I made in my earlier
mail. I was not using the first document mentioned in the XML. Instead it
was this one:

  IW-02
  iPod & iPod Mini USB 2.0 Cable
  Belkin
  electronics
  connector
  car power adapter for iPod, white
  2
  11.50
  1
  false


I was getting strange results because of the character "i".
Here is what I learnt from the debug info:

"debug":{
  "rawquerystring":"id:neardup06",
  "querystring":"id:neardup06",
  "parsedquery":"features:og features:en features:til features:er
features:af features:der features:ts features:se features:i features:p
features:pet features:brag features:efter features:zombier features:k
features:tilbag features:ala features:sviner features:folk
features:klassisk features:resid features:horder features:lidt
features:man features:denn",
  "parsedquery_toString":"features:og features:en features:til
features:er features:af features:der features:ts features:se
features:i features:p features:pet features:brag features:efter
features:zombier features:k features:tilbag features:ala
features:sviner features:folk features:klassisk features:resid
features:horder features:lidt features:man features:denn",
  "explain":{
"id=IW-02,internal_docid=8":"\n0.0050230525 = (MATCH) product of:\n
0.12557632 = (MATCH) sum of:\n0.12557632 = (MATCH)
weight(features:i in 8), product of:\n  0.17474915 =
queryWeight(features:i), product of:\n1.9162908 =
idf(docFreq=3)\n0.09119135 = queryNorm\n  0.71860904 =
(MATCH) fieldWeight(features:i in 8), product of:\n1.0 =
tf(termFreq(features:i)=1)\n1.9162908 = idf(docFreq=3)\n
 0.375 = fieldNorm(field=features, doc=8)\n  0.04 = coord(1/25)\n"}}}
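
The "explain" entry above can be checked by hand. The sketch below only multiplies the factors printed in the debug output; the variable names are mine, and the small differences in the last digits are presumably due to Solr computing in single precision.

```python
# Factors copied from the "explain" output for id=IW-02
idf = 1.9162908          # idf(docFreq=3)
query_norm = 0.09119135  # queryNorm
tf = 1.0                 # tf(termFreq(features:i)=1)
field_norm = 0.375       # fieldNorm(field=features, doc=8)
coord = 1 / 25           # only 1 of the 25 query terms matched

query_weight = idf * query_norm        # reported as 0.17474915
field_weight = tf * idf * field_norm   # reported as 0.71860904
term_score = query_weight * field_weight  # reported as 0.12557632
score = term_score * coord             # reported as 0.0050230525

print(query_weight, field_weight, term_score, score)
```

So the whole score for IW-02 comes from the single term features:i; none of the other 24 terms contributes.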

The field "features" uses the default fieldtype - "text" in the schema.xml.
The problem was solved by adding the character "i" to the stopwords.txt
file. The "i"s in document 2 were matched with the "i" in "iPod" of
document 1.

I still have to figure out why a single character - "i" - matched the "i" in
a word - "iPod".
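
One possible explanation, offered only as an assumption: if the "text" field type includes a word-delimiter filter that splits tokens on a lowercase-to-uppercase change and then lowercases the parts, "iPod" would be indexed as "i" and "pod". The sketch below imitates just that one behaviour; split_on_case_change is a made-up helper, not a Solr API.

```python
import re


def split_on_case_change(token):
    """Split at each lowercase-to-uppercase boundary, then lowercase the parts."""
    parts = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", token).split()
    return [p.lower() for p in parts]


features = "car power adapter for iPod, white"
tokens = []
for word in features.split():
    tokens.extend(split_on_case_change(word.strip(",")))

print(split_on_case_change("iPod"))  # "iPod" becomes two terms
print("i" in tokens)
```

If that is what happens, a standalone "i" in another document matches the "i" part of "iPod", which would be consistent with the features:i match in the debug output.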

Regards,
Rishabh

On 22/11/2007, Ryan McKinley <[EMAIL PROTECTED]> wrote:
>
> >
> > Now when I run the following query:
> >
> http://localhost:8080/solr/mlt?q=id:neardup06&mlt.fl=features&mlt.mindf=1&mlt.mintf=1&mlt.displayTerms=details&wt=json&indent=on
> >
>
> try adding:
>   &debugQuery=on
>
> to your query string and you can see why each document matches...
>
> My guess is that "features" uses a text field with stemming and a
> stemmed word matches
>
> ryan
>

