[jira] [Commented] (LUCENE-2562) Make Luke a Lucene/Solr Module

2018-08-01 Thread Dmitry Kan (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565703#comment-16565703
 ] 

Dmitry Kan commented on LUCENE-2562:


[~arafalov] thanks for your input! Can you please elaborate on 'If Luke is 
supposed to be part of Lucene-only distribution, I guess the discussion is a 
bit more complicated' ?

> Make Luke a Lucene/Solr Module
> --
>
> Key: LUCENE-2562
> URL: https://issues.apache.org/jira/browse/LUCENE-2562
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Mark Miller
>Priority: Major
>  Labels: gsoc2014
> Attachments: LUCENE-2562-Ivy.patch, LUCENE-2562-Ivy.patch, 
> LUCENE-2562-Ivy.patch, LUCENE-2562-ivy.patch, LUCENE-2562.patch, 
> LUCENE-2562.patch, Luke-ALE-1.png, Luke-ALE-2.png, Luke-ALE-3.png, 
> Luke-ALE-4.png, Luke-ALE-5.png, luke-javafx1.png, luke-javafx2.png, 
> luke-javafx3.png, luke1.jpg, luke2.jpg, luke3.jpg, lukeALE-documents.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> see
> "RE: Luke - in need of maintainer": 
> http://markmail.org/message/m4gsto7giltvrpuf
> "Web-based Luke": http://markmail.org/message/4xwps7p7ifltme5q
> I think it would be great if there was a version of Luke that always worked 
> with trunk - and it would also be great if it was easier to match Luke jars 
> with Lucene versions.
> While I'd like to get GWT Luke into the mix as well, I think the easiest 
> starting point is to straight port Luke to another UI toolkit before 
> abstracting out DTO objects that both GWT Luke and Pivot Luke could share.
> I've started slowly converting Luke's use of Thinlet to Apache Pivot. I 
> don't have a lot of time for this at the moment, but I've plugged away here 
> and there over the past week or two. There is still a *lot* to do.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2562) Make Luke a Lucene/Solr Module

2018-07-21 Thread Dmitry Kan (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551676#comment-16551676
 ] 

Dmitry Kan commented on LUCENE-2562:


Hi [~steve_rowe] thanks for your support with filing the ticket. Looking to 
solve this one way or another.

Thanks [~Tomoko Uchida] for your contribution and research so far!







[jira] [Commented] (SOLR-10231) Cursor value always different for last page with sorting by a date based function using NOW

2017-03-13 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15907528#comment-15907528
 ] 

Dmitry Kan commented on SOLR-10231:
---

[~hossman] thanks for the clarification and suggestions. I'm going to test a 
fixed timestamp value for the NOW param. In the meantime we have fallen back to 
a non-cursor pagination method. By the way, would the same issue exist in 6.x?







[jira] [Created] (SOLR-10231) Cursor value always different for last page with sorting by function

2017-03-05 Thread Dmitry Kan (JIRA)
Dmitry Kan created SOLR-10231:
-

 Summary: Cursor value always different for last page with sorting 
by function
 Key: SOLR-10231
 URL: https://issues.apache.org/jira/browse/SOLR-10231
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: SearchComponents - other
Affects Versions: 4.10.2
Reporter: Dmitry Kan


Cursor-based results fetching is a game changer for search performance.
It works extremely well when paging using sort by field(s).

Example that works (Id is a unique field in the schema):
Query:
{code}
http://solr-host:8983/solr/documents/select?q=*:*&fq=DocumentId:76581059&cursorMark=AoIGAC5TU1ItNzY1ODEwNTktMQ==&fl=DocumentId&sort=UserId+asc%2CId+desc&rows=1
{code}
Response:
{code}
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">4</int>
    <lst name="params">
      <str name="q">*:*</str>
      <str name="fl">DocumentId</str>
      <str name="cursorMark">AoIGAC5TU1ItNzY1ODEwNTktMQ==</str>
      <str name="fq">DocumentId:76581059</str>
      <str name="sort">UserId asc,Id desc</str>
      <str name="rows">1</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0"/>
  <str name="nextCursorMark">AoIGAC5TU1ItNzY1ODEwNTktMQ==</str>
</response>
{code}

nextCursorMark equals cursorMark, so we know this is the last page.

However, sorting by function behaves differently:
Query:
{code}
http://solr-host:8983/solr/documents/select?rows=1&q=*:*&fq=DocumentId:76581059&cursorMark=AoIFQf9yCCAuU1NSLTc2NTgxMDU5LTE=&fl=DocumentId&sort=min(ms(NOW,DynamicDateField_1),ms(NOW,DynamicDateField_12),ms(NOW,DynamicDateField_3),ms(NOW,DynamicDateField_5))%20asc,Id%20desc
{code}
Response:
{code}
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">6</int>
    <lst name="params">
      <str name="q">*:*</str>
      <str name="fl">DocumentId</str>
      <str name="cursorMark">AoIFQf9yCCAuU1NSLTc2NTgxMDU5LTE=</str>
      <str name="fq">DocumentId:76581059</str>
      <str name="sort">min(ms(NOW,DynamicDateField_1),ms(NOW,DynamicDateField_12),ms(NOW,DynamicDateField_3),ms(NOW,DynamicDateField_5)) asc,Id desc</str>
      <str name="rows">1</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="DocumentId">76581059</str>
    </doc>
  </result>
  <str name="nextCursorMark">AoIFQf9yFyAuU1NSLTc2NTgxMDU5LTE=</str>
</response>
{code}

nextCursorMark does not equal cursorMark, which suggests there are more 
results. That is not true (numFound=1), so the client goes into an infinite 
loop.






[jira] [Commented] (SOLR-4722) Highlighter which generates a list of query term position(s) for each item in a list of documents, or returns null if highlighting is disabled.

2015-04-09 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487389#comment-14487389
 ] 

Dmitry Kan commented on SOLR-4722:
--

Thanks for the great patch. I confirm it works in Solr 4.10.3, although 
recompilation was necessary.

 Highlighter which generates a list of query term position(s) for each item in 
 a list of documents, or returns null if highlighting is disabled.
 ---

 Key: SOLR-4722
 URL: https://issues.apache.org/jira/browse/SOLR-4722
 Project: Solr
  Issue Type: New Feature
  Components: highlighter
Affects Versions: 4.3, Trunk
Reporter: Tricia Jenkins
Priority: Minor
 Attachments: SOLR-4722.patch, SOLR-4722.patch, 
 solr-positionshighlighter.jar


 As an alternative to returning snippets, this highlighter provides the (term) 
 position for query matches.  One usecase for this is to reconcile the term 
 position from the Solr index with 'word' coordinates provided by an OCR 
 process.  In this way we are able to 'highlight' an image, like a page from a 
 book or an article from a newspaper, in the locations that match the user's 
 query.
 This is based on the FastVectorHighlighter and requires that termVectors, 
 termOffsets and termPositions be stored.
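The OCR use case described above amounts to a position-to-coordinates lookup. A minimal Python sketch, with hypothetical data shapes (a list of per-token bounding boxes, assuming token positions line up with OCR word order):

```python
def boxes_for_matches(match_positions, ocr_words):
    """Map term positions returned by the highlighter to OCR word boxes.
    ocr_words[i] is the (x, y, w, h) box of the i-th token on the page,
    so the term position doubles as an index into the OCR output."""
    return [ocr_words[p] for p in match_positions if p < len(ocr_words)]

# Three tokens on one line of a scanned page, left to right.
ocr = [(10, 10, 40, 12), (55, 10, 30, 12), (90, 10, 50, 12)]
print(boxes_for_matches([1, 2], ocr))
```

The returned boxes can then be drawn as highlight overlays on the page image.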






[jira] [Commented] (SOLR-6152) Pre-populating values into search parameters on the query page of solr admin

2014-09-25 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147585#comment-14147585
 ] 

Dmitry Kan commented on SOLR-6152:
--

I'm ready to work on this, but I need some guidance on the feature spec, i.e. 
what would be the most natural way of configuring pre-populated values? Should 
it be a UI feature, or could it be a special config entry in solrconfig.xml? 
Thoughts?







[jira] [Commented] (SOLR-6152) Pre-populating values into search parameters on the query page of solr admin

2014-09-25 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147625#comment-14147625
 ] 

Dmitry Kan commented on SOLR-6152:
--

Ok, I see what you are getting at. I think I like this; it sounds useful. This 
jira and what you describe may potentially reuse some code, but the two sound 
like different features to me.

I need to take a first stab at this so that there is something material to 
contemplate. Hoping to get moral support from [~steffkes] too :)









[jira] [Updated] (SOLR-5178) Admin UI - Memory Graph on Dashboard shows NaN for unused Swap

2014-08-12 Thread Dmitry Kan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Kan updated SOLR-5178:
-

Attachment: SOLR-5178.patch

A patch for Solr 4.6.0. It adds a check for when both free swap and total swap 
are 0 (dividing one by the other would give NaN).

 Admin UI - Memory Graph on Dashboard shows NaN for unused Swap
 --

 Key: SOLR-5178
 URL: https://issues.apache.org/jira/browse/SOLR-5178
 Project: Solr
  Issue Type: Bug
  Components: web gui
Affects Versions: 4.3, 4.4
Reporter: Stefan Matheis (steffkes)
Assignee: Stefan Matheis (steffkes)
Priority: Minor
 Fix For: 4.9, 5.0

 Attachments: SOLR-5178.patch, screenshot-vladimir.jpeg


 If the system doesn't use swap, the displayed memory graph on the dashboard 
 shows {{NaN}} (not a number) because it tries to divide by zero.
 {code}"system": {
   "name": "Linux",
   "version": "3.2.0-39-virtual",
   "arch": "amd64",
   "systemLoadAverage": 3.38,
   "committedVirtualMemorySize": 32454287360,
   "freePhysicalMemorySize": 912945152,
   "freeSwapSpaceSize": 0,
   "processCpuTime": 5627465000,
   "totalPhysicalMemorySize": 71881908224,
   "totalSwapSpaceSize": 0,
   "openFileDescriptorCount": 350,
   "maxFileDescriptorCount": 4096,
   "uname": "Linux ip-xxx-xxx-xxx-xxx 3.2.0-39-virtual #62-Ubuntu SMP Thu Feb 28 00:48:27 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux\n",
   "uptime": " 11:24:39 up 4 days, 23:03, 1 user, load average: 3.38, 3.10, 2.95\n"
 }{code}
 We should add an additional check for that.
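The guard the patch describes amounts to special-casing a zero total swap before dividing. The dashboard itself is JavaScript, so this is only an illustrative Python sketch of the check:

```python
def swap_usage_percent(free_swap, total_swap):
    """Guard the dashboard's division: with no swap configured, both values
    are 0 and (total - free) / total would be 0/0, i.e. NaN."""
    if total_swap == 0:
        return 0.0                     # report 0% used instead of NaN
    return 100.0 * (total_swap - free_swap) / total_swap

print(swap_usage_percent(0, 0))        # no swap configured: 0.0, not NaN
print(swap_usage_percent(2_000, 8_000))
```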






[jira] [Commented] (SOLR-3585) processing updates in multiple threads

2014-06-11 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028325#comment-14028325
 ] 

Dmitry Kan commented on SOLR-3585:
--

I agree with [~dsmiley]. Every good API (and to some extent Solr is an API 
from the client's point of view) takes advantage of multi-threading by itself. 
In this case a client can be as thin as possible and not care about threads. 
And if a client has enough idle CPUs, sure, it could post in parallel. For 
example, we run Solr on pretty beefy machines with lots of CPU cores, and most 
of the time those are idling.

Our latest findings with soft commits and high posting pressure show that 
posting may sometimes even fail, and re-posting the failed docs fixes the 
issue.

 processing updates in multiple threads
 --

 Key: SOLR-3585
 URL: https://issues.apache.org/jira/browse/SOLR-3585
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 4.0-ALPHA, 5.0
Reporter: Mikhail Khludnev
 Attachments: SOLR-3585.patch, SOLR-3585.patch, multithreadupd.patch, 
 report.tar.gz


 Hello,
 I'd like to contribute an update processor that forks many threads to 
 concurrently process the stream of commands. It may be beneficial for users 
 who stream many docs through a single request.
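The idea of one request streaming many docs through a pool of worker threads can be sketched with a generic worker pool. This is an illustrative Python sketch, not the actual patch; the handler is a toy stand-in for per-document processing work:

```python
from concurrent.futures import ThreadPoolExecutor

def process_stream(commands, handle, workers=4):
    """Fan a stream of update commands out to a small worker pool, the way
    a forking update processor could, collecting results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle, commands))

# Toy handler standing in for per-document indexing work.
results = process_stream(range(8), lambda doc: doc * doc, workers=3)
print(results)
```

`pool.map` preserves input order even though the handlers run concurrently, which keeps the client's view of the command stream simple.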






[jira] [Created] (SOLR-6152) Pre-populating values into search parameters on the query page of solr admin

2014-06-09 Thread Dmitry Kan (JIRA)
Dmitry Kan created SOLR-6152:


 Summary: Pre-populating values into search parameters on the query 
page of solr admin
 Key: SOLR-6152
 URL: https://issues.apache.org/jira/browse/SOLR-6152
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Affects Versions: 4.3.1
Reporter: Dmitry Kan
 Attachments: prepoluate_query_parameters_query_page.bmp

In some use cases, it is highly desirable to be able to pre-populate the query 
page of solr admin with specific values.

In one particular use case of mine, the Solr admin user must pass a date range 
value without which the query would fail.

It isn't easy for non-Solr experts to remember the value format, so I would 
like to have a way of hooking an example value into the query page.

See the screenshot attached, where I have inserted the fq parameter with date 
range into the Raw Query Parameters.






[jira] [Updated] (SOLR-6152) Pre-populating values into search parameters on the query page of solr admin

2014-06-09 Thread Dmitry Kan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Kan updated SOLR-6152:
-

Attachment: prepoluate_query_parameters_query_page.bmp

screenshot of query page







[jira] [Updated] (SOLR-4903) Solr sends all doc ids to all shards in the query counting facets

2014-03-24 Thread Dmitry Kan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Kan updated SOLR-4903:
-

Labels: patch  (was: )

 Solr sends all doc ids to all shards in the query counting facets
 -

 Key: SOLR-4903
 URL: https://issues.apache.org/jira/browse/SOLR-4903
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 3.4, 4.3, 4.3.1
Reporter: Dmitry Kan

 Setup: a front-end Solr instance and shards.
 Summary: the Solr frontend sends all doc ids received from QueryComponent to 
 all shards, which causes a POST request buffer size overflow.
 Symptoms:
 The query is: http://pastebin.com/0DndK1Cs
 I have omitted the shards parameter.
 The router log: http://pastebin.com/FTVH1WF3
 Notice the port of the affected shard; it changes all the time, even for the 
 same request.
 The log entry is prepended with the lines:
 SEVERE: org.apache.solr.common.SolrException: Internal Server Error
 Internal Server Error
 (they are not in the pastebin link)
 The shard log: http://pastebin.com/exwCx3LX
 Suggestion: change the data structure in FacetComponent to send only the doc 
 ids that belong to a shard, not a concatenation of all doc ids.
 Why is this important: for scaling. Adding more shards will overflow the POST 
 request buffer at some point anyway.
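The suggested fix, sending each shard only its own doc ids, is essentially a group-by over the id list. An illustrative sketch (the routing function is hypothetical; real routing would come from the cluster's shard assignment):

```python
def ids_per_shard(doc_ids, shard_of):
    """Group doc ids by owning shard so each refinement request carries
    only that shard's ids, instead of the concatenation of all of them."""
    buckets = {}
    for doc_id in doc_ids:
        buckets.setdefault(shard_of(doc_id), []).append(doc_id)
    return buckets

# Toy routing: shard chosen by hashing the id into two buckets.
print(ids_per_shard([1, 2, 3, 4, 5], lambda i: "shard%d" % (i % 2)))
```

Each refinement POST then grows with the size of one shard's result set, not with the number of shards times that size.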






[jira] [Updated] (SOLR-4903) Solr sends all doc ids to all shards in the query counting facets

2014-03-24 Thread Dmitry Kan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Kan updated SOLR-4903:
-

Labels:   (was: patch)







[jira] [Commented] (SOLR-5394) facet.method=fcs seems to be using threads when it shouldn't

2014-03-21 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943275#comment-13943275
 ] 

Dmitry Kan commented on SOLR-5394:
--

[~mikemccand] can you reproduce the bug with the patch?

 facet.method=fcs seems to be using threads when it shouldn't
 

 Key: SOLR-5394
 URL: https://issues.apache.org/jira/browse/SOLR-5394
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.6
Reporter: Michael McCandless
 Attachments: SOLR-5394.patch, SOLR-5394.patch, 
 SOLR-5394_keep_threads_original_value.patch


 I built a wikipedia index, with multiple fields for faceting.
 When I do facet.method=fcs with facet.field=dateFacet and 
 facet.field=userNameFacet, and then kill -QUIT the java process, I see a 
 bunch (46, I think) of facetExecutor-7-thread-N threads had spun up.
 But I thought threads for each field is turned off by default?
 Even if I add facet.threads=0, it still spins up all the threads.
 I think something is wrong in SimpleFacets.parseParams; somehow, that method 
 returns early (because localParams) is null, leaving threads=-1, and then the 
 later code that would have set threads to 0 never runs.
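The expected semantics (no facet.threads parameter, or facet.threads=0, means no extra executor threads, never a leftover -1) can be stated as a tiny resolver. This is an illustrative sketch of the intended behaviour described above, not Solr's actual parsing code:

```python
def resolve_facet_threads(param):
    """Intended behaviour: an absent or zero facet.threads means
    'no extra threads'; negative leftovers are clamped to 0."""
    if param is None:
        return 0          # default: direct execution, no executor threads
    return max(int(param), 0)

print([resolve_facet_threads(p) for p in (None, "0", "4")])
```

The bug report is precisely that the early return in parseParams leaves the internal value at -1, bypassing the branch that would have applied this default.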






[jira] [Updated] (SOLR-5394) facet.method=fcs seems to be using threads when it shouldn't

2014-03-20 Thread Dmitry Kan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Kan updated SOLR-5394:
-

Attachment: SOLR-5394.patch

This patch sets the default number of threads to 1 (single-threaded 
execution), as per Vitaly's suggestion, and fixes the test case with an 
unspecified threads parameter: the number of threads is expected to be the 
default (=1). The tests in TestSimpleFacet pass.







[jira] [Commented] (LUCENE-3758) Allow the ComplexPhraseQueryParser to search order or un-order proximity queries.

2014-03-16 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937293#comment-13937293
 ] 

Dmitry Kan commented on LUCENE-3758:


[~erickerickson] right, agreed; this should be handled in another jira as a 
local param. We have implemented this as an operator, since we allow mixing 
ordered and unordered clauses in the same query.

 Allow the ComplexPhraseQueryParser to search order or un-order proximity 
 queries.
 -

 Key: LUCENE-3758
 URL: https://issues.apache.org/jira/browse/LUCENE-3758
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser
Affects Versions: 4.0-ALPHA
Reporter: Tomás Fernández Löbbe
Assignee: Erick Erickson
Priority: Minor
 Fix For: 4.8, 5.0

 Attachments: LUCENE-3758.patch, LUCENE-3758.patch, LUCENE-3758.patch


 The ComplexPhraseQueryParser uses SpanNearQuery, but always sets the inOrder 
 value hardcoded to true. This could be configurable.






[jira] [Commented] (SOLR-4904) Send internal doc ids and index version in distributed faceting to make queries more compact

2014-03-11 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930792#comment-13930792
 ] 

Dmitry Kan commented on SOLR-4904:
--

[~kamaci] yes, it is still valid. I would imagine that for some extreme commit 
policy cases, like soft-committing every second, this might not be a good fit 
(as the index changes so fast), but for other cases this sounds like a good 
idea.

 Send internal doc ids and index version in distributed faceting to make 
 queries more compact
 

 Key: SOLR-4904
 URL: https://issues.apache.org/jira/browse/SOLR-4904
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 3.4, 4.3
Reporter: Dmitry Kan

 This was suggested by [~ab] at the bbuzz conf 2013. It makes a lot of sense 
 and works nicely with fixing the root cause of issue SOLR-4903.
 Basically, QueryComponent could send internal Lucene ids along with the index 
 version number, so that in subsequent queries to other Solr components, like 
 FacetComponent, the internal ids would be sent. The index version is required 
 to ensure we deal with the same index.
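The proposal can be sketched as a versioned refinement request: compact internal ids are only usable while the index version still matches, otherwise the receiver must fall back. Names and shapes here are hypothetical, purely to illustrate the check:

```python
def refine_request(index_version, internal_ids, current_version):
    """A follow-up component call carries (index version, internal ids);
    the receiver only trusts the compact internal ids if the index has not
    changed underneath it, otherwise the ids are stale and unusable."""
    if index_version != current_version:
        return None                    # index moved on: caller must fall back
    return {"version": index_version, "ids": internal_ids}

print(refine_request(42, [7, 9, 13], current_version=42))
print(refine_request(42, [7, 9, 13], current_version=43))
```

This is also why very aggressive soft-commit policies are a poor fit: the version check would fail on almost every refinement round-trip.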






[jira] [Commented] (LUCENE-5422) Postings lists deduplication

2014-03-10 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13926081#comment-13926081
 ] 

Dmitry Kan commented on LUCENE-5422:


I agree with [~mikemccand] that the issue should be better scoped. The case 
of compressing stemmed / non-stemmed terms' posting lists is quite tricky and 
requires more thought.

One clear case for this issue is storing a reversed term along with its 
original non-reversed version. Both should point to the same posting list 
(subject to some after-stemming hash check).

What do you guys think?

 Postings lists deduplication
 

 Key: LUCENE-5422
 URL: https://issues.apache.org/jira/browse/LUCENE-5422
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/codecs, core/index
Reporter: Dmitry Kan
  Labels: gsoc2014

 The context:
 http://markmail.org/thread/tywtrjjcfdbzww6f
 Robert Muir and I have discussed what Robert eventually named "postings 
 lists deduplication" at the Berlin Buzzwords 2013 conference.
 The idea is to allow multiple terms to point to the same postings list to
 save space. This can be achieved by new index codec implementation, but this 
 jira is open to other ideas as well.
 The application / impact of this is positive for synonyms, exact / inexact
 terms, leading wildcard support via storing reversed term etc.
 For example, at the moment, when supporting exact (unstemmed) and inexact 
 (stemmed)
 searches, we store both unstemmed and stemmed variant of a word form and
 that leads to index bloating. That is why we had to remove the leading
 wildcard support via reversing a token on index and query time because of
 the same index size considerations.
 Comment from Mike McCandless:
 Neat idea!
 Would this idea allow a single term to point to (the union of) N other
 posting lists?  It seems like that's necessary e.g. to handle the
 exact/inexact case.
 And then, to produce the Docs/AndPositionsEnum you'd need to do the
 merge sort across those N posting lists?
 Such a thing might also be do-able as a runtime-only wrapper around the
 postings API (FieldsProducer), if you could at runtime do the reverse
 expansion (e.g. stem -> all of its surface forms).
 Comment from Robert Muir:
 I think the exact/inexact case is trickier (detecting it would be the hard
 part), and you are right, another solution might work better.
 But for the reverse wildcard and synonyms situation, it seems we could even
 detect it on write if we created some hash of the previous term's postings.
 If the hash matches for the current term, we know it might be a duplicate
 and would have to actually do the costly check that they are the same.
 Maybe there are better ways to do it, but it might be a fun posting-format
 experiment to try.
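The write-time scheme Robert describes (hash each term's postings, and on a hash hit do the costly equality check before sharing the already-written list) can be sketched as an in-memory toy. This is illustrative Python only, not a codec implementation:

```python
import hashlib

def dedup_postings(term_postings):
    """Write-time sharing: hash each term's postings list; on a hash hit,
    verify the lists really are identical, then point the new term at the
    already-written list instead of storing it again."""
    written = {}                 # hash -> (postings, slot in the "file")
    term_to_slot, storage = {}, []
    for term, postings in term_postings.items():
        key = hashlib.sha1(repr(postings).encode()).hexdigest()
        if key in written and written[key][0] == postings:  # costly check
            term_to_slot[term] = written[key][1]            # share the slot
        else:
            storage.append(postings)
            written[key] = (postings, len(storage) - 1)
            term_to_slot[term] = len(storage) - 1
    return term_to_slot, storage

# A term and its reversal end up pointing at the same stored list.
slots, store = dedup_postings({"run": [1, 4, 9], "nur": [1, 4, 9], "cat": [2]})
print(slots, store)
```

The hash only nominates candidates; the full comparison is what guarantees the two terms really share identical postings before a pointer is reused.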






[jira] [Comment Edited] (LUCENE-5422) Postings lists deduplication

2014-03-10 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13926081#comment-13926081
 ] 

Dmitry Kan edited comment on LUCENE-5422 at 3/10/14 7:27 PM:
-

I agree with [~mikemccand] that the issue should be better scoped. The case 
of compressing stemmed / non-stemmed terms' postings lists is quite tricky and 
requires more thought.

One clear case for this issue is storing a reversed term along with its original 
non-reversed version. Both should point to the same postings list (subject to 
some after-stemming hash check).

What do you guys think?


was (Author: dmitry_key):
I agree with [~mikemccand] in that the issue should be better scoped. The case 
with compressing stemmed / non-stemmed terms posting lists is quite tricky and 
requires more thought.

One clear case for this issue is storing reversed term along with it is 
original non-reversed version. Both should point to the same posting list 
(subject to some after-stemming-hash-check).

What do you guys think?
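The reversed-term case mentioned above (much like what Solr's ReversedWildcardFilterFactory provides) can be illustrated without Lucene: index each token's reversed form, then rewrite a leading wildcard "*suffix" into a prefix scan over the reversed terms. The sketch below uses illustrative names and a plain list in place of a term dictionary.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Toy leading-wildcard matching via reversed terms. */
public class ReversedWildcard {

    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    /** Matches a leading wildcard "*suffix" by prefix-scanning reversed terms. */
    static List<String> leadingWildcard(List<String> terms, String suffix) {
        String prefix = reverse(suffix);  // "*ing" becomes a prefix scan for "gni"
        return terms.stream()
                // A real index would store the reversed form as its own term;
                // here we reverse on the fly for clarity.
                .filter(t -> reverse(t).startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> terms = List.of("running", "runner", "sing");
        System.out.println(leadingWildcard(terms, "ing")); // prints [running, sing]
    }
}
```

Deduplication would let the reversed term point at the original term's postings list instead of storing a second copy, which is exactly the index-size concern this jira targets.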

 Postings lists deduplication
 

 Key: LUCENE-5422
 URL: https://issues.apache.org/jira/browse/LUCENE-5422
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/codecs, core/index
Reporter: Dmitry Kan
  Labels: gsoc2014

 The context:
 http://markmail.org/thread/tywtrjjcfdbzww6f






[jira] [Comment Edited] (LUCENE-5422) Postings lists deduplication

2014-03-05 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13921155#comment-13921155
 ] 

Dmitry Kan edited comment on LUCENE-5422 at 3/5/14 6:23 PM:


[~Vishmi Money]

LUCENE-2082 deals with segment merging, which is a _process_ performed on the 
Lucene index every now and then.

This jira deals with the index _structure_ and suggests that compression of the 
index can be achieved for certain (described) use cases. While these jiras are 
related, this jira can be considered a standalone project in itself.

perhaps [~otis] could add something?


was (Author: dmitry_key):
[~Vishmi Money]

LUCENE-2082 deals with segment merging which is <i>process</i> performed on 
Lucene index every now and then.

This jira deals with the index <em>structure</em> and suggests that compression 
of index can be achieved for certain (described) use cases. While these jiras 
are related, this jira can be considered as standalone project in itself.

perhaps [~otis] could add something?







[jira] [Commented] (LUCENE-5422) Postings lists deduplication

2014-03-05 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13921155#comment-13921155
 ] 

Dmitry Kan commented on LUCENE-5422:


[~Vishmi Money]

LUCENE-2082 deals with segment merging which is <i>process</i> performed on 
Lucene index every now and then.

This jira deals with the index <em>structure</em> and suggests that compression 
of index can be achieved for certain (described) use cases. While these jiras 
are related, this jira can be considered as standalone project in itself.

perhaps [~otis] could add something?







[jira] [Commented] (SOLR-5697) Delete by query does not work properly with customly configured query parser

2014-02-13 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900240#comment-13900240
 ] 

Dmitry Kan commented on SOLR-5697:
--

Hoss: thanks for looking into this. I can confirm all test cases work fine with 
Solr 4.7 (solr-4.7-2014-02-12_02-54-24.tgz). I'm guessing there's very little 
chance this gets backported to Solr 4.3.1? BTW, using the exact same configs 
didn't produce an NPE for Solr 4.7 (it gets thrown, as you said, for 4.6.1 
however).

 Delete by query does not work properly with customly configured query parser
 

 Key: SOLR-5697
 URL: https://issues.apache.org/jira/browse/SOLR-5697
 Project: Solr
  Issue Type: Bug
  Components: query parsers, update
Affects Versions: 4.3.1
Reporter: Dmitry Kan
 Fix For: 5.0, 4.7

 Attachments: query_parser_maven_project.tgz, shard.tgz


 The shard with the configuration illustrating the issue is attached. Since 
 the size of the archive exceeds the upload limit, I have dropped the solr.war 
 from the webapps directory. Please add it (Solr 4.3.1).
 Also attached is an example query parser maven project. The binary has 
 already been deployed onto the lib directories of each core.
 Start the shard using startUp_multicore.sh.
 1. curl 
 'http://localhost:8983/solr/metadata/update?commit=false&debugQuery=on' 
 --data-binary '<delete><query>Title:this_title</query></delete>' -H 
 Content-type:text/xml
 This query produces an exception:
 <?xml version="1.0" encoding="UTF-8"?>
 <response>
 <lst name="responseHeader"><int name="status">400</int><int 
 name="QTime">33</int></lst><lst name="error"><str name="msg">Unknown query 
 parser 'lucene'</str><int name="code">400</int></lst>
 </response>
 2. Change the multicore/metadata/solrconfig.xml and 
 multicore/statements/solrconfig.xml by uncommenting the defType parameters on 
 <requestHandler name="/select">.
 Issue the same query. The result is the same:
 <?xml version="1.0" encoding="UTF-8"?>
 <response>
 <lst name="responseHeader"><int name="status">400</int><int 
 name="QTime">30</int></lst><lst name="error"><str name="msg">Unknown query 
 parser 'lucene'</str><int name="code">400</int></lst>
 </response>
 3. Keep the same config as in 2. and specify the query parser in the local params:
 curl 'http://localhost:8983/solr/metadata/update?commit=false&debugQuery=on' 
 --data-binary '<delete><query>{!qparser1}Title:this_title</query></delete>' 
 -H Content-type:text/xml
 The result:
 <?xml version="1.0" encoding="UTF-8"?>
 <response>
 <lst name="responseHeader"><int name="status">400</int><int 
 name="QTime">3</int></lst><lst name="error"><str name="msg">no field name 
 specified in query and no default specified via 'df' param</str><int 
 name="code">400</int></lst>
 </response>
 The reason is that our query parser misbehaves: it removes colons from the 
 input queries => we get on the server side:
 Modified input query: Title:this_title ---> Titlethis_title
 5593 [qtp2121668094-15] INFO  
 org.apache.solr.update.processor.LogUpdateProcessor  – [metadata] 
 webapp=/solr path=/update params={debugQuery=on&commit=false} {} 0 31
 5594 [qtp2121668094-15] ERROR org.apache.solr.core.SolrCore  – 
 org.apache.solr.common.SolrException: no field name specified in query and no 
 default specified via 'df' param
   at 
 org.apache.solr.parser.SolrQueryParserBase.checkNullField(SolrQueryParserBase.java:924)
   at 
 org.apache.solr.parser.SolrQueryParserBase.getFieldQuery(SolrQueryParserBase.java:944)
   at 
 org.apache.solr.parser.SolrQueryParserBase.handleBareTokenQuery(SolrQueryParserBase.java:765)
   at org.apache.solr.parser.QueryParser.Term(QueryParser.java:300)
   at org.apache.solr.parser.QueryParser.Clause(QueryParser.java:186)
   at org.apache.solr.parser.QueryParser.Query(QueryParser.java:108)
   at org.apache.solr.parser.QueryParser.TopLevelQuery(QueryParser.java:97)
   at 
 org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBase.java:160)
   at 
 org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:72)
   at org.apache.solr.search.QParser.getQuery(QParser.java:142)
   at 
 org.apache.solr.update.DirectUpdateHandler2.getQuery(DirectUpdateHandler2.java:319)
   at 
 org.apache.solr.update.DirectUpdateHandler2.deleteByQuery(DirectUpdateHandler2.java:349)
   at 
 org.apache.solr.update.processor.RunUpdateProcessor.processDelete(RunUpdateProcessorFactory.java:80)
   at 
 org.apache.solr.update.processor.UpdateRequestProcessor.processDelete(UpdateRequestProcessor.java:55)
   at 
 org.apache.solr.update.processor.DistributedUpdateProcessor.doDeleteByQuery(DistributedUpdateProcessor.java:931)
   at 
 org.apache.solr.update.processor.DistributedUpdateProcessor.processDelete(DistributedUpdateProcessor.java:772)
   at 
 org.apache.solr.update.processor.LogUpdateProcessor.processDelete(LogUpdateProcessorFactory.java:121)
   
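The failure in step 3 can be reproduced in miniature: if a parser strips the colon before splitting field from term, the fielded query degrades to a bare token, and with no `df` default there is no field left to search. The toy parser below is illustrative only (nothing here is Solr's API; the error string merely mirrors the one SolrQueryParserBase reports).

```java
/** Toy illustration of SOLR-5697 step 3: stripping colons loses the field name. */
public class ColonStripDemo {

    /** Splits "field:term" into {field, term}; bare tokens fall back to defaultField. */
    static String[] parseFielded(String query, String defaultField) {
        int colon = query.indexOf(':');
        if (colon >= 0) {
            return new String[] {query.substring(0, colon), query.substring(colon + 1)};
        }
        if (defaultField == null) {
            // Mirrors the checkNullField complaint seen in the stack trace above.
            throw new IllegalArgumentException(
                "no field name specified in query and no default specified via 'df' param");
        }
        return new String[] {defaultField, query};
    }

    public static void main(String[] args) {
        // A well-behaved parser sees the field:
        String[] ok = parseFielded("Title:this_title", null);
        System.out.println(ok[0] + " / " + ok[1]);             // prints Title / this_title

        // The misbehaving parser strips the colon first, as in the bug report:
        String mangled = "Title:this_title".replace(":", "");  // "Titlethis_title"
        try {
            parseFielded(mangled, null);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```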

[jira] [Closed] (SOLR-5697) Delete by query does not work properly with customly configured query parser

2014-02-13 Thread Dmitry Kan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Kan closed SOLR-5697.



Works as expected with Solr 4.7. See the previous comment.

 

[jira] [Created] (SOLR-5697) Delete by query does not work properly with customly configured query parser

2014-02-05 Thread Dmitry Kan (JIRA)
Dmitry Kan created SOLR-5697:


 Summary: Delete by query does not work properly with customly 
configured query parser
 Key: SOLR-5697
 URL: https://issues.apache.org/jira/browse/SOLR-5697
 Project: Solr
  Issue Type: Bug
  Components: query parsers, update
Affects Versions: 4.3.1
Reporter: Dmitry Kan
 Attachments: query_parser_maven_project.tgz

The shard with the configuration illustrating the issue is attached.
Also attached is an example query parser maven project. The binary has 
already been deployed onto the lib directories of each core.

Start the shard using startUp_multicore.sh.


[jira] [Updated] (SOLR-5697) Delete by query does not work properly with customly configured query parser

2014-02-05 Thread Dmitry Kan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Kan updated SOLR-5697:
-

Attachment: query_parser_maven_project.tgz


[jira] [Updated] (SOLR-5697) Delete by query does not work properly with customly configured query parser

2014-02-05 Thread Dmitry Kan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Kan updated SOLR-5697:
-

Description: 
The shard with the configuration illustrating the issue is attached. Since the 
size of the archive exceeds the upload limit, I have dropped the solr.war from 
the webapps directory. Please add it (Solr 4.3.1).


at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
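To make the failure mode above concrete, here is a minimal, self-contained sketch of the offending behavior. This is hypothetical illustration code, not the actual qparser1 plugin: it only shows why stripping the colon from a fielded clause leads to the "no field name specified" error when no 'df' default is configured.

```java
// Hypothetical sketch (not the actual qparser1 code): stripping ':' turns
// a fielded query into a bare token, so the parser can no longer resolve a
// field and fails with "no field name specified in query and no default
// specified via 'df' param".
public class ColonStripDemo {
    // The buggy preprocessing step: removes every colon from the query.
    static String stripColons(String q) {
        return q.replace(":", "");
    }

    // Crude stand-in for field detection: a fielded clause still contains ':'.
    static boolean hasExplicitField(String q) {
        return q.contains(":");
    }

    public static void main(String[] args) {
        String original = "Title:this_title";
        String mangled = stripColons(original);
        System.out.println(original + " ---> " + mangled);
        // With the colon gone and no 'df' default field configured,
        // the parse cannot resolve a field name.
        if (!hasExplicitField(mangled)) {
            System.out.println("no field name specified in query and no default specified via 'df' param");
        }
    }
}
```

The delete-by-query path goes through the same query parser as searches, which is why the mangled query surfaces here as a 400 rather than a silent no-op delete.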
 

[jira] [Updated] (SOLR-5697) Delete by query does not work properly with customly configured query parser

2014-02-05 Thread Dmitry Kan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Kan updated SOLR-5697:
-

Attachment: shard.tgz

shard with config files without solr.war file.

 Delete by query does not work properly with customly configured query parser
 

 Key: SOLR-5697
 URL: https://issues.apache.org/jira/browse/SOLR-5697
 Project: Solr
  Issue Type: Bug
  Components: query parsers, update
Affects Versions: 4.3.1
Reporter: Dmitry Kan
 Attachments: query_parser_maven_project.tgz, shard.tgz


 The shard with the configuration illustrating the issue is attached. Since 
 the size of the archive exceeds the upload limit, I have dropped the solr.war 
 from the webapps directory. Please add it (Solr 4.3.1).
 Also attached is an example query parser Maven project. The binary has 
 already been deployed into the lib directories of each core.
 Start the shard using startUp_multicore.sh.
 1. curl 'http://localhost:8983/solr/metadata/update?commit=false&debugQuery=on' --data-binary '<delete><query>Title:this_title</query></delete>' -H Content-type:text/xml
 This query produces an exception:
 <?xml version="1.0" encoding="UTF-8"?>
 <response>
 <lst name="responseHeader"><int name="status">400</int><int name="QTime">33</int></lst><lst name="error"><str name="msg">Unknown query parser 'lucene'</str><int name="code">400</int></lst>
 </response>
 2. Change the multicore/metadata/solrconfig.xml and multicore/statements/solrconfig.xml by uncommenting the defType parameters on <requestHandler name="/select">.
 Issue the same query. The result is the same:
 <?xml version="1.0" encoding="UTF-8"?>
 <response>
 <lst name="responseHeader"><int name="status">400</int><int name="QTime">30</int></lst><lst name="error"><str name="msg">Unknown query parser 'lucene'</str><int name="code">400</int></lst>
 </response>
 3. Keep the same config as in 2. and specify the query parser in the local params:
 curl 'http://localhost:8983/solr/metadata/update?commit=false&debugQuery=on' --data-binary '<delete><query>{!qparser1}Title:this_title</query></delete>' -H Content-type:text/xml
 The result:
 <?xml version="1.0" encoding="UTF-8"?>
 <response>
 <lst name="responseHeader"><int name="status">400</int><int name="QTime">3</int></lst><lst name="error"><str name="msg">no field name specified in query and no default specified via 'df' param</str><int name="code">400</int></lst>
 </response>
 The reason is that our query parser misbehaves: it removes colons from 
 input queries, so on the server side we get:
 Modified input query: Title:this_title ---> Titlethis_title
 5593 [qtp2121668094-15] INFO  
 org.apache.solr.update.processor.LogUpdateProcessor  – [metadata] 
 webapp=/solr path=/update params={debugQuery=on&commit=false} {} 0 31
 5594 [qtp2121668094-15] ERROR org.apache.solr.core.SolrCore  – 
 org.apache.solr.common.SolrException: no field name specified in query and no 
 default specified via 'df' param
   at 
 org.apache.solr.parser.SolrQueryParserBase.checkNullField(SolrQueryParserBase.java:924)
   at 
 org.apache.solr.parser.SolrQueryParserBase.getFieldQuery(SolrQueryParserBase.java:944)
   at 
 org.apache.solr.parser.SolrQueryParserBase.handleBareTokenQuery(SolrQueryParserBase.java:765)
   at org.apache.solr.parser.QueryParser.Term(QueryParser.java:300)
   at org.apache.solr.parser.QueryParser.Clause(QueryParser.java:186)
   at org.apache.solr.parser.QueryParser.Query(QueryParser.java:108)
   at org.apache.solr.parser.QueryParser.TopLevelQuery(QueryParser.java:97)
   at 
 org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBase.java:160)
   at 
 org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:72)
   at org.apache.solr.search.QParser.getQuery(QParser.java:142)
   at 
 org.apache.solr.update.DirectUpdateHandler2.getQuery(DirectUpdateHandler2.java:319)
   at 
 org.apache.solr.update.DirectUpdateHandler2.deleteByQuery(DirectUpdateHandler2.java:349)
   at 
 org.apache.solr.update.processor.RunUpdateProcessor.processDelete(RunUpdateProcessorFactory.java:80)
   at 
 org.apache.solr.update.processor.UpdateRequestProcessor.processDelete(UpdateRequestProcessor.java:55)
   at 
 org.apache.solr.update.processor.DistributedUpdateProcessor.doDeleteByQuery(DistributedUpdateProcessor.java:931)
   at 
 org.apache.solr.update.processor.DistributedUpdateProcessor.processDelete(DistributedUpdateProcessor.java:772)
   at 
 org.apache.solr.update.processor.LogUpdateProcessor.processDelete(LogUpdateProcessorFactory.java:121)
   at 
 org.apache.solr.handler.loader.XMLLoader.processDelete(XMLLoader.java:346)
   at 
 org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:277)
   at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
   at 
 

[jira] [Updated] (SOLR-5697) Delete by query does not work properly with customly configured query parser

2014-02-05 Thread Dmitry Kan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Kan updated SOLR-5697:
-

Description: 
The shard with the configuration illustrating the issue is attached. Since the 
size of the archive exceeds the upload limit, I have dropped the solr.war from 
the webapps directory. Please add it (Solr 4.3.1).


Also attached is an example query parser Maven project. The binary has already 
been deployed into the lib directories of each core.

Start the shard using startUp_multicore.sh.


1. curl 'http://localhost:8983/solr/metadata/update?commit=false&debugQuery=on' --data-binary '<delete><query>Title:this_title</query></delete>' -H Content-type:text/xml

This query produces an exception:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">400</int><int name="QTime">33</int></lst><lst name="error"><str name="msg">Unknown query parser 'lucene'</str><int name="code">400</int></lst>
</response>


2. Change the multicore/metadata/solrconfig.xml and multicore/statements/solrconfig.xml by uncommenting the defType parameters on <requestHandler name="/select">.

Issue the same query. The result is the same:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">400</int><int name="QTime">30</int></lst><lst name="error"><str name="msg">Unknown query parser 'lucene'</str><int name="code">400</int></lst>
</response>


3. Keep the same config as in 2. and specify the query parser in the local params:

curl 'http://localhost:8983/solr/metadata/update?commit=false&debugQuery=on' --data-binary '<delete><query>{!qparser1}Title:this_title</query></delete>' -H Content-type:text/xml


The result:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">400</int><int name="QTime">3</int></lst><lst name="error"><str name="msg">no field name specified in query and no default specified via 'df' param</str><int name="code">400</int></lst>
</response>


The reason is that our query parser misbehaves: it removes colons from input 
queries, so on the server side we get:

Modified input query: Title:this_title ---> Titlethis_title
5593 [qtp2121668094-15] INFO  
org.apache.solr.update.processor.LogUpdateProcessor  – [metadata] webapp=/solr 
path=/update params={debugQuery=on&commit=false} {} 0 31
5594 [qtp2121668094-15] ERROR org.apache.solr.core.SolrCore  – 
org.apache.solr.common.SolrException: no field name specified in query and no 
default specified via 'df' param
at 
org.apache.solr.parser.SolrQueryParserBase.checkNullField(SolrQueryParserBase.java:924)
at 
org.apache.solr.parser.SolrQueryParserBase.getFieldQuery(SolrQueryParserBase.java:944)
at 
org.apache.solr.parser.SolrQueryParserBase.handleBareTokenQuery(SolrQueryParserBase.java:765)
at org.apache.solr.parser.QueryParser.Term(QueryParser.java:300)
at org.apache.solr.parser.QueryParser.Clause(QueryParser.java:186)
at org.apache.solr.parser.QueryParser.Query(QueryParser.java:108)
at org.apache.solr.parser.QueryParser.TopLevelQuery(QueryParser.java:97)
at 
org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBase.java:160)
at 
org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:72)
at org.apache.solr.search.QParser.getQuery(QParser.java:142)
at 
org.apache.solr.update.DirectUpdateHandler2.getQuery(DirectUpdateHandler2.java:319)
at 
org.apache.solr.update.DirectUpdateHandler2.deleteByQuery(DirectUpdateHandler2.java:349)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processDelete(RunUpdateProcessorFactory.java:80)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processDelete(UpdateRequestProcessor.java:55)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.doDeleteByQuery(DistributedUpdateProcessor.java:931)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.processDelete(DistributedUpdateProcessor.java:772)
at 
org.apache.solr.update.processor.LogUpdateProcessor.processDelete(LogUpdateProcessorFactory.java:121)
at 
org.apache.solr.handler.loader.XMLLoader.processDelete(XMLLoader.java:346)
at 
org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:277)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
at 
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
at 

[jira] [Created] (LUCENE-5422) Postings lists deduplication

2014-01-30 Thread Dmitry Kan (JIRA)
Dmitry Kan created LUCENE-5422:
--

 Summary: Postings lists deduplication
 Key: LUCENE-5422
 URL: https://issues.apache.org/jira/browse/LUCENE-5422
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/codecs, core/index
Reporter: Dmitry Kan


The context:
http://markmail.org/thread/tywtrjjcfdbzww6f

Robert Muir and I discussed what Robert eventually named "postings lists
deduplication" at the Berlin Buzzwords 2013 conference.

The idea is to allow multiple terms to point to the same postings list to
save space. This can be achieved by a new index codec implementation, but this 
jira is open to other ideas as well.

The application / impact of this is positive for synonyms, exact / inexact
terms, leading wildcard support via storing reversed term etc.

For example, at the moment, when supporting exact (unstemmed) and inexact 
(stemmed) searches, we store both the unstemmed and the stemmed variant of a
word form, and that leads to index bloat. For the same index-size
considerations we had to remove leading-wildcard support, which was
implemented by reversing a token at index and query time.

Comment from Mike McCandless:
Neat idea!

Would this idea allow a single term to point to (the union of) N other
posting lists?  It seems like that's necessary e.g. to handle the
exact/inexact case.

And then, to produce the Docs/AndPositionsEnum you'd need to do the
merge sort across those N posting lists?

Such a thing might also be do-able as a runtime-only wrapper around the
postings API (FieldsProducer), if you could at runtime do the reverse
expansion (e.g. stem -> all of its surface forms).


Comment from Robert Muir:
I think the exact/inexact is trickier (detecting it would be the hard
part), and you are right, another solution might work better.

but for the reverse wildcard and synonyms situation, it seems we could even
detect it on write if we created some hash of the previous term's postings.
If the hash matches for the current term, we know it might be a duplicate
and would have to actually do the costly check that they are the same.

maybe there are better ways to do it, but it might be a fun postingformat
experiment to try.
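Robert's write-time detection idea can be sketched as follows. This is an illustrative toy, not Lucene codec code: postings are reduced to sorted doc-id arrays, and all class and method names here are made up for the example.

```java
import java.util.*;

// Toy sketch of write-time duplicate detection for postings lists:
// hash each term's postings as it is written; on a hash collision,
// do the costly full equality check before sharing the existing list.
public class PostingsDedup {
    // term -> postings list (doc ids, sorted); duplicates share one array
    final Map<String, int[]> termPostings = new LinkedHashMap<>();
    // postings hash -> terms already written with that hash
    final Map<Integer, List<String>> byHash = new HashMap<>();

    void write(String term, int[] postings) {
        int h = Arrays.hashCode(postings);
        for (String prev : byHash.getOrDefault(h, Collections.emptyList())) {
            if (Arrays.equals(termPostings.get(prev), postings)) {
                // costly check passed: point this term at the existing list
                termPostings.put(term, termPostings.get(prev));
                return;
            }
        }
        termPostings.put(term, postings.clone());
        byHash.computeIfAbsent(h, k -> new ArrayList<>()).add(term);
    }

    // true when both terms point at the very same stored list
    boolean shared(String a, String b) {
        return termPostings.get(a) == termPostings.get(b);
    }

    public static void main(String[] args) {
        PostingsDedup d = new PostingsDedup();
        d.write("run", new int[]{1, 4, 9});
        d.write("running", new int[]{1, 4, 9}); // stemmed/unstemmed pair
        d.write("walk", new int[]{2, 3});
        System.out.println(d.shared("run", "running")); // true
        System.out.println(d.shared("run", "walk"));    // false
    }
}
```

In a real postings format the hash would be computed incrementally while streaming the postings to disk, so the full comparison is only paid on a hash match.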





--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5422) Postings lists deduplication

2014-01-30 Thread Dmitry Kan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Kan updated LUCENE-5422:
---

Labels: gsoc2014  (was: )

 Postings lists deduplication
 

 Key: LUCENE-5422
 URL: https://issues.apache.org/jira/browse/LUCENE-5422
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/codecs, core/index
Reporter: Dmitry Kan
  Labels: gsoc2014

 The context:
 http://markmail.org/thread/tywtrjjcfdbzww6f
 Robert Muir and I have discussed what Robert eventually named postings
 lists deduplication at Berlin Buzzwords 2013 conference.
 The idea is to allow multiple terms to point to the same postings list to
 save space. This can be achieved by new index codec implementation, but this 
 jira is open to other ideas as well.
 The application / impact of this is positive for synonyms, exact / inexact
 terms, leading wildcard support via storing reversed term etc.
 For example, at the moment, when supporting exact (unstemmed) and inexact 
 (stemmed)
 searches, we store both unstemmed and stemmed variant of a word form and
 that leads to index bloating. That is why we had to remove the leading
 wildcard support via reversing a token on index and query time because of
 the same index size considerations.
 Comment from Mike McCandless:
 Neat idea!
 Would this idea allow a single term to point to (the union of) N other
 posting lists?  It seems like that's necessary e.g. to handle the
 exact/inexact case.
 And then, to produce the Docs/AndPositionsEnum you'd need to do the
 merge sort across those N posting lists?
 Such a thing might also be do-able as runtime only wrapper around the
 postings API (FieldsProducer), if you could at runtime do the reverse
 expansion (e.g. stem -> all of its surface forms).
 Comment from Robert Muir:
 I think the exact/inexact is trickier (detecting it would be the hard
 part), and you are right, another solution might work better.
 but for the reverse wildcard and synonyms situation, it seems we could even
 detect it on write if we created some hash of the previous terms postings.
 if the hash matches for the current term, we know it might be a duplicate
 and would have to actually do the costly check they are the same.
 maybe there are better ways to do it, but it might be a fun postingformat
 experiment to try.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5394) facet.method=fcs seems to be using threads when it shouldn't

2013-12-12 Thread Dmitry Kan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Kan updated SOLR-5394:
-

Attachment: SOLR-5394_keep_threads_original_value.patch

During debugging with facet.threads=0 I noticed that by the time we reach the 
parseParams method, threads=0, but the method resets it to -1, which breaks the 
later logic. So I added a condition around the threads=-1 reset.

I would be happy if someone could review this little patch and give feedback.
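The shape of the bug and of the patch can be shown schematically. This is a reconstruction for illustration only, not the actual SimpleFacets.parseParams source; the method names and the Integer stand-in for localParams are made up.

```java
// Schematic reconstruction (not the actual SimpleFacets code) of the
// parseParams early return described in this issue, and the patch idea:
// keep the caller-supplied facet.threads value instead of resetting it.
public class FacetThreadsDemo {
    // Buggy version: threads is unconditionally reset to -1, so a
    // caller-supplied facet.threads=0 is lost on the early return.
    static int parseParamsBuggy(Integer localParams, int requestedThreads) {
        int threads = -1;
        if (localParams == null) {
            return threads; // early return: requestedThreads is discarded
        }
        return requestedThreads;
    }

    // Patched version: only fall back to -1 when no value was supplied,
    // so facet.threads=0 (threading off) survives the early return.
    static int parseParamsPatched(Integer localParams, int requestedThreads) {
        int threads = requestedThreads >= 0 ? requestedThreads : -1;
        if (localParams == null) {
            return threads;
        }
        return threads;
    }

    public static void main(String[] args) {
        // facet.threads=0 should disable per-field threading
        System.out.println(parseParamsBuggy(null, 0));   // -1: threads spin up anyway
        System.out.println(parseParamsPatched(null, 0)); // 0: threading stays off
    }
}
```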

 facet.method=fcs seems to be using threads when it shouldn't
 

 Key: SOLR-5394
 URL: https://issues.apache.org/jira/browse/SOLR-5394
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.6
Reporter: Michael McCandless
 Attachments: SOLR-5394_keep_threads_original_value.patch


 I built a wikipedia index, with multiple fields for faceting.
 When I do facet.method=fcs with facet.field=dateFacet and 
 facet.field=userNameFacet, and then kill -QUIT the java process, I see a 
 bunch (46, I think) of facetExecutor-7-thread-N threads had spun up.
 But I thought threads for each field is turned off by default?
 Even if I add facet.threads=0, it still spins up all the threads.
 I think something is wrong in SimpleFacets.parseParams; somehow, that method 
 returns early (because localParams is null), leaving threads=-1, and then the 
 later code that would have set threads to 0 never runs.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5394) facet.method=fcs seems to be using threads when it shouldn't

2013-12-12 Thread Dmitry Kan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Kan updated SOLR-5394:
-

Attachment: (was: SOLR-5394_keep_threads_original_value.patch)

 facet.method=fcs seems to be using threads when it shouldn't
 

 Key: SOLR-5394
 URL: https://issues.apache.org/jira/browse/SOLR-5394
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.6
Reporter: Michael McCandless
 Attachments: SOLR-5394_keep_threads_original_value.patch


 I built a wikipedia index, with multiple fields for faceting.
 When I do facet.method=fcs with facet.field=dateFacet and 
 facet.field=userNameFacet, and then kill -QUIT the java process, I see a 
 bunch (46, I think) of facetExecutor-7-thread-N threads had spun up.
 But I thought threads for each field is turned off by default?
 Even if I add facet.threads=0, it still spins up all the threads.
 I think something is wrong in SimpleFacets.parseParams; somehow, that method 
 returns early (because localParams is null), leaving threads=-1, and then the 
 later code that would have set threads to 0 never runs.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5394) facet.method=fcs seems to be using threads when it shouldn't

2013-12-12 Thread Dmitry Kan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Kan updated SOLR-5394:
-

Attachment: SOLR-5394_keep_threads_original_value.patch

 facet.method=fcs seems to be using threads when it shouldn't
 

 Key: SOLR-5394
 URL: https://issues.apache.org/jira/browse/SOLR-5394
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.6
Reporter: Michael McCandless
 Attachments: SOLR-5394_keep_threads_original_value.patch


 I built a wikipedia index, with multiple fields for faceting.
 When I do facet.method=fcs with facet.field=dateFacet and 
 facet.field=userNameFacet, and then kill -QUIT the java process, I see a 
 bunch (46, I think) of facetExecutor-7-thread-N threads had spun up.
 But I thought threads for each field is turned off by default?
 Even if I add facet.threads=0, it still spins up all the threads.
 I think something is wrong in SimpleFacets.parseParams; somehow, that method 
 returns early (because localParams is null), leaving threads=-1, and then the 
 later code that would have set threads to 0 never runs.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1604) Wildcards, ORs etc inside Phrase Queries

2013-12-11 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845661#comment-13845661
 ] 

Dmitry Kan commented on SOLR-1604:
--

[~rebeccatang] you can define a solr core (even for a single index) and copy 
the complex phrase parser jar into its lib directory.

https://cwiki.apache.org/confluence/display/solr/Solr+Cores+and+solr.xml

HTH

 Wildcards, ORs etc inside Phrase Queries
 

 Key: SOLR-1604
 URL: https://issues.apache.org/jira/browse/SOLR-1604
 Project: Solr
  Issue Type: Improvement
  Components: query parsers, search
Affects Versions: 1.4
Reporter: Ahmet Arslan
Priority: Minor
 Attachments: ASF.LICENSE.NOT.GRANTED--ComplexPhrase.zip, 
 ComplexPhrase-4.2.1.zip, ComplexPhrase.zip, ComplexPhrase.zip, 
 ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, 
 ComplexPhraseQueryParser.java, ComplexPhrase_solr_3.4.zip, 
 SOLR-1604-alternative.patch, SOLR-1604.patch, SOLR-1604.patch


 Solr Plugin for ComplexPhraseQueryParser (LUCENE-1486) which supports 
 wildcards, ORs, ranges, fuzzies inside phrase queries.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1726) Deep Paging and Large Results Improvements

2013-10-15 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13795092#comment-13795092
 ] 

Dmitry Kan commented on SOLR-1726:
--

[~sstults] Thanks for the use case. This leans towards offline use as well, but 
certainly makes sense.
Our current use case is realtime, though, and we are attacking the problem of 
deep paging differently at the moment (on the querying client side).

 Deep Paging and Large Results Improvements
 --

 Key: SOLR-1726
 URL: https://issues.apache.org/jira/browse/SOLR-1726
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 4.6

 Attachments: CommonParams.java, QParser.java, QueryComponent.java, 
 ResponseBuilder.java, SOLR-1726.patch, SOLR-1726.patch, 
 SolrIndexSearcher.java, TopDocsCollector.java, TopScoreDocCollector.java


 There are possibly ways to improve collections of deep paging by passing 
 Solr/Lucene more information about the last page of results seen, thereby 
 saving priority queue operations.   See LUCENE-2215.
 There may also be better options for retrieving large numbers of rows at a 
 time that are worth exploring.  LUCENE-2127.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-5277) Stamp core names on log entries for certain classes

2013-09-26 Thread Dmitry Kan (JIRA)
Dmitry Kan created SOLR-5277:


 Summary: Stamp core names on log entries for certain classes
 Key: SOLR-5277
 URL: https://issues.apache.org/jira/browse/SOLR-5277
 Project: Solr
  Issue Type: Bug
  Components: search, update
Affects Versions: 4.4, 4.3.1, 4.5
Reporter: Dmitry Kan


It is handy that certain Java classes stamp a [coreName] on a log entry. It 
would be useful in a multicore setup if more classes stamped this 
information.

In particular, we came across a situation with commits arriving in quick 
succession to the same multicore shard, and found it hard to figure out 
whether it was the same core or different cores.

The classes in question with log sample output:

o.a.s.c.SolrCore

06:57:53.577 [qtp1640764503-13617] INFO  org.apache.solr.core.SolrCore - 
SolrDeletionPolicy.onCommit: commits:num=2

11:53:19.056 [coreLoadExecutor-3-thread-1] INFO  org.apache.solr.core.SolrCore 
- Soft AutoCommit: if uncommited for 1000ms;



o.a.s.u.UpdateHandler

14:45:24.447 [commitScheduler-9-thread-1] INFO  
org.apache.solr.update.UpdateHandler - start 
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}

06:57:53.591 [qtp1640764503-13617] INFO  org.apache.solr.update.UpdateHandler - 
end_commit_flush



o.a.s.s.SolrIndexSearcher

14:45:24.553 [commitScheduler-7-thread-1] INFO  
org.apache.solr.search.SolrIndexSearcher - Opening Searcher@1067e5a9 main


The original question was posted on #solr and on SO:

http://stackoverflow.com/questions/19026577/how-to-output-solr-core-name-with-log4j
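One way to get per-core stamping without touching every logging call site is an MDC-style thread-local that a log formatter reads. The following is a minimal self-contained sketch using java.util.logging, for illustration only: it is not Solr's actual logging setup, and the class and key names are invented.

```java
import java.util.logging.*;

// Illustrative sketch (not Solr's logging code): stamp the current core
// name onto every log record via a thread-local, the way an MDC works.
public class CoreNameLogging {
    // Each request-handling thread records which core it is serving.
    static final ThreadLocal<String> CURRENT_CORE =
            ThreadLocal.withInitial(() -> "?");

    // Formatter that prefixes each record with [coreName].
    static class CoreFormatter extends Formatter {
        @Override
        public String format(LogRecord r) {
            return "[" + CURRENT_CORE.get() + "] " + r.getLevel() + " "
                    + r.getLoggerName() + " - " + r.getMessage() + "\n";
        }
    }

    // Demonstrates two cores logging on the same thread; returns the output.
    public static String demo() {
        StringBuilder out = new StringBuilder();
        Logger log = Logger.getLogger("org.apache.solr.update.UpdateHandler");
        log.setUseParentHandlers(false); // keep demo output out of the console
        Handler h = new Handler() {
            public void publish(LogRecord r) { out.append(new CoreFormatter().format(r)); }
            public void flush() {}
            public void close() {}
        };
        log.addHandler(h);
        CURRENT_CORE.set("metadata");
        log.info("start commit");
        CURRENT_CORE.set("statements");
        log.info("end_commit_flush");
        log.removeHandler(h);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

With log4j (which Solr uses), the analogous approach would be MDC.put plus a %X conversion in the pattern layout, but that still requires the request-handling code to set the core name, which is exactly what this issue asks for.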


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5200) Add REST support for reading and modifying Solr configuration

2013-08-30 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754452#comment-13754452
 ] 

Dmitry Kan commented on SOLR-5200:
--

One parameter relevant to us is mergeFactor.

 Add REST support for reading and modifying Solr configuration
 -

 Key: SOLR-5200
 URL: https://issues.apache.org/jira/browse/SOLR-5200
 Project: Solr
  Issue Type: New Feature
Reporter: Steve Rowe
Assignee: Steve Rowe

 There should be a REST API to allow full read access to, and write access to 
 some elements of, Solr's per-core and per-node configuration not already 
 covered by the Schema REST API: 
 {{solrconfig.xml}}/{{core.properties}}/{{solrcore.properties}} and 
 {{solr.xml}}/{{solr.properties}} (SOLR-4718 discusses addition of 
 {{solr.properties}}).
 Use cases for runtime configuration modification include scripted setup, 
 troubleshooting, and tuning.
 Tentative rules-of-thumb about configuration items that should not be 
 modifiable at runtime:
 # Startup-only items, e.g. where to start core discovery
 # Items that are deprecated in 4.X and will be removed in 5.0
 # Items that if modified should be followed by a full re-index
 Some issues to consider:
 Persistence: How (and even whether) to handle persistence for configuration 
 modifications via REST API is not clear - e.g. persisting the entire config 
 file or having one or more sidecar config files that get persisted.  The 
 extent of what should be modifiable will likely affect how persistence is 
 implemented.  For example, if the only {{solrconfig.xml}} modifiable items 
 turn out to be plugin configurations, an alternative to 
 full-{{solrconfig.xml}} persistence could be individual plugin registration 
 of runtime config modifiable items, along with per-plugin sidecar config 
 persistence.
 Live reload: Most (if not all) per-core configuration modifications will 
 require core reload, though it will be a live reload, so some things won't 
 be modifiable, e.g. {{dataDir}} and {{IndexWriter}} related settings in 
 {{indexConfig}} - see SOLR-3592.  (Should a full reload be supported to 
 handle changes in these places?)
 Interpolation aka property substitution: I think it would be useful on read 
 access to optionally return raw values in addition to the interpolated 
 values, e.g. {{solr.xml}} {{hostPort}} raw value {{$\{jetty.port:8983}}} vs. 
 interpolated value {{8983}}.   Modification requests will accept raw values - 
 property interpolation will be applied.  At present interpolation is done 
 once, at parsing time, but if property value modification is supported via 
 the REST API, an alternative could be to delay interpolation until values are 
 requested; in this way, property value modification would not trigger 
 re-parsing the affected configuration source.
 Response format: Similarly to the schema REST API, results could be returned 
 in XML, JSON, or any other response writer's output format.
 Transient cores: How should non-loaded transient cores be handled?  Simplest 
 thing would be to load the transient core before handling the request, just 
 like other requests.
 Below I provide an exhaustive list of configuration items in the files in 
 question and indicate which ones I think could be modifiable at runtime.  I 
 don't mean to imply that these must all be made modifiable, or for those that 
 are made modifiable, that they must be made so at once - a piecemeal approach 
 will very likely be more appropriate.
 h2. {{solrconfig.xml}}
 Note that XIncludes and includes via Document Entities won't survive a 
 modification request (assuming persistence is via overwriting the original 
 file).
 ||XPath under {{/config/}}||Should be modifiable via REST 
 API?||Rationale||Description||
 |{{luceneMatchVersion}}|No|Modifying this should be followed by a full 
 re-index|Controls what version of Lucene various components of Solr adhere to|
 |{{lib}}|Yes|Required for adding plugins at runtime|Contained jars available 
 via classloader for {{solrconfig.xml}} and {{schema.xml}}| 
 |{{dataDir}}|No|Not supported by live RELOAD|Holds all index data|
 |{{directoryFactory}}|No|Not supported by live RELOAD|index directory 
 factory|
 |{{codecFactory}}|No|Modifying this should be followed by a full 
 re-index|index codec factory, per-field SchemaCodecFactory by default|
 |{{schemaFactory}}|Partial|Although the class shouldn't be modifiable, it 
 should be possible to modify an already Managed schema's mutability|Managed 
 or Classic (non-mutable) schema factory|
 |{{indexConfig}}|No|{{IndexWriter}}-related settings not supported by live 
 RELOAD|low-level indexing behavior|
 |{{jmx}}|Yes| |Enables JMX if an MBeanServer is found|
 |{{updateHandler@class}}|No| |Defaults to DirectUpdateHandler2|
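The "delay interpolation until values are requested" option described above can be sketched with a tiny resolver that keeps raw values and substitutes `${prop:default}` only on read — a hedged illustration, not Solr's actual implementation:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy lazy property interpolation: raw values like ${jetty.port:8983} are kept
 *  as-is, and substitution happens only when a value is read, so changing a
 *  property never requires re-parsing the config source. Not Solr's code. */
public class LazyInterpolation {

  private static final Pattern PROP = Pattern.compile("\\$\\{([^:}]+)(?::([^}]*))?\\}");

  static String interpolate(String raw, Map<String, String> props) {
    Matcher m = PROP.matcher(raw);
    StringBuffer sb = new StringBuffer();
    while (m.find()) {
      String fallback = m.group(2) == null ? "" : m.group(2);
      m.appendReplacement(sb,
          Matcher.quoteReplacement(props.getOrDefault(m.group(1), fallback)));
    }
    m.appendTail(sb);
    return sb.toString();
  }

  public static void main(String[] args) {
    String raw = "${jetty.port:8983}";                       // stored raw value
    System.out.println(interpolate(raw, Map.of()));          // 8983 (default)
    System.out.println(interpolate(raw, Map.of("jetty.port", "9000"))); // 9000
  }
}
```

With this design, a property-modification request only updates the `props` map; the stored raw config is never re-parsed.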
 

[jira] [Updated] (SOLR-4903) Solr sends all doc ids to all shards in the query counting facets

2013-06-24 Thread Dmitry Kan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Kan updated SOLR-4903:
-

Affects Version/s: 4.3.1

 Solr sends all doc ids to all shards in the query counting facets
 -

 Key: SOLR-4903
 URL: https://issues.apache.org/jira/browse/SOLR-4903
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 3.4, 4.3, 4.3.1
Reporter: Dmitry Kan

 Setup: front end solr and shards.
 Summary: solr frontend sends all doc ids received from QueryComponent to all 
 shards which causes POST request buffer size overflow.
 Symptoms:
 The query is: http://pastebin.com/0DndK1Cs
 I have omitted the shards parameter.
 The router log: http://pastebin.com/FTVH1WF3
 Notice the port of a shard, that is affected. That port changes all the time, 
 even for the same request
 The log entry is prepended with lines:
 SEVERE: org.apache.solr.common.SolrException: Internal Server Error
 Internal Server Error
 (they are not in the pastebin link)
 The shard log: http://pastebin.com/exwCx3LX
 Suggestion: change the data structure in FacetComponent to send only doc ids 
 that belong to a shard and not a concatenation of all doc ids.
 Why is this important: for scaling. Adding more shards will result in 
 overflowing the POST request buffer at some point anyway.




[jira] [Commented] (SOLR-1726) Deep Paging and Large Results Improvements

2013-06-18 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13686964#comment-13686964
 ] 

Dmitry Kan commented on SOLR-1726:
--

{quote}
Scrolling is not intended for real time user requests, it is intended for 
cases like scrolling over large portions of data that exists within 
elasticsearch to reindex it for example.
{quote}

Are there any other applications for this besides re-indexing?

Also, is it known how the scrolling is implemented internally, i.e. is it 
efficient in transferring to the client only what is needed?

 Deep Paging and Large Results Improvements
 --

 Key: SOLR-1726
 URL: https://issues.apache.org/jira/browse/SOLR-1726
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 4.4

 Attachments: CommonParams.java, QParser.java, QueryComponent.java, 
 ResponseBuilder.java, SOLR-1726.patch, SOLR-1726.patch, 
 SolrIndexSearcher.java, TopDocsCollector.java, TopScoreDocCollector.java


 There are possibly ways to improve collections of deep paging by passing 
 Solr/Lucene more information about the last page of results seen, thereby 
 saving priority queue operations.   See LUCENE-2215.
 There may also be better options for retrieving large numbers of rows at a 
 time that are worth exploring.  LUCENE-2127.




[jira] [Commented] (LUCENE-2082) Performance improvement for merging posting lists

2013-06-14 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13683303#comment-13683303
 ] 

Dmitry Kan commented on LUCENE-2082:


hi [~whzz],

Would you be potentially interested in another postings-list idea that came up 
recently?

http://markmail.org/message/6ro7bbez3v3y5mfx#query:+page:1+mid:tywtrjjcfdbzww6f+state:results

It could have quite a high impact on the index size, and it should hopefully be 
relatively easy to start an experiment with it using the Lucene codec 
technology.

Just in case you would get interested.

 Performance improvement for merging posting lists
 -

 Key: LUCENE-2082
 URL: https://issues.apache.org/jira/browse/LUCENE-2082
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Michael Busch
Priority: Minor
  Labels: gsoc2013
 Fix For: 4.4


 A while ago I had an idea about how to improve the merge performance
 for posting lists. This is currently by far the most expensive part of
 segment merging due to all the VInt de-/encoding. Not sure if an idea
 for improving this was already mentioned in the past?
 So the basic idea is to perform a raw copy of as much posting data
 as possible. The reason why this is difficult is that we have to
 remove deleted documents. But often the fraction of deleted docs in a
 segment is rather low (10%?), so it's likely that there are quite
 long consecutive sections without any deletions.
 To find these sections we could use the skip lists. Basically at any
 point during the merge we would find the skip entry before the next
 deleted doc. All entries to this point can be copied without
 de-/encoding of the VInts. Then for the section that has deleted docs
 we perform the normal way of merging to remove the deletes. Then we
 check again with the skip lists if we can raw copy the next section.
 To make this work there are a few different necessary changes:
 1) Currently the multilevel skiplist reader/writer can only deal with 
 fixed-size
 skips (16 on the lowest level). It would be an easy change to allow
 variable-size skips, but then the MultiLevelSkipListReader can't
 return numSkippedDocs anymore, which SegmentTermDocs needs - change 2)
 2) Store the last docID in which a term occurred in the term
 dictionary. This would also be beneficial for other use cases. By
 doing that the SegmentTermDocs#next(), #read() and #skipTo() know when
 the end of the postinglist is reached. Currently they have to track
 the df, which is why after a skip it's important to take the
 numSkippedDocs into account.
 3) Change the merging algorithm according to my description above. It's
 important to create a new skiplist entry at the beginning of every
 block that is copied in raw mode, because its next skip entry's values
 are deltas from the beginning of the block. Also the very first posting, and
 that one only, needs to be decoded/encoded to make sure that the
 payload length is explicitly written (i.e. must not depend on the
 previous length). Also such a skip entry has to be created at the
 beginning of each source segment's posting list. With change 2) we don't
 have to worry about the positions of the skip entries. And having a few
 extra skip entries in merged segments won't hurt much.
 If a segment has no deletions at all this will avoid any
 decoding/encoding of VInts (best case). I think it will also work
 great for segments with a rather low amount of deletions. We should
 probably then have a threshold: if the number of deletes exceeds this
 threshold we should fall back to old style merging.
 I haven't implemented any of this, so there might be complications I
 haven't thought about. Please let me know if you can think of reasons
 why this wouldn't work or if you think more changes are necessary.
 I will probably not have time to work on this soon, but I wanted to
 open this issue to not forget about it :). Anyone should feel free to
 take this!
 Btw: I think the flex-indexing branch would be a great place to try this
 out as a new codec. This would also be good to figure out what APIs
 are needed to make merging fully flexible as well.
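The raw-copy idea in the description can be sketched with a small simulation: fixed-size blocks stand in for skip intervals, blocks without deletions are copied without decode/encode, and only blocks containing deletions take the slow merge path. Names and the block size are illustrative; this is not Lucene's actual merge code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy simulation of skip-list-guided raw copying during a posting-list merge.
 *  Blocks of SKIP docs stand in for skip intervals; this is NOT Lucene's code. */
public class RawCopyMergeSketch {

  static final int SKIP = 4; // docs per skip interval (illustrative)

  /** Merge one segment's postings; rawCopied[0] counts docs copied without re-encoding. */
  static List<int[]> merge(int[] docIds, Set<Integer> deleted, int[] rawCopied) {
    List<int[]> out = new ArrayList<>();
    for (int start = 0; start < docIds.length; start += SKIP) {
      int[] block = Arrays.copyOfRange(docIds, start, Math.min(start + SKIP, docIds.length));
      boolean hasDeletes = false;
      for (int d : block) {
        if (deleted.contains(d)) { hasDeletes = true; break; }
      }
      if (!hasDeletes) {
        out.add(block);                  // fast path: raw copy, no VInt de-/encoding
        rawCopied[0] += block.length;
      } else {
        // slow path: decode, drop deleted docs, re-encode
        int[] kept = Arrays.stream(block).filter(d -> !deleted.contains(d)).toArray();
        if (kept.length > 0) out.add(kept);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    int[] postings = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
    Set<Integer> deleted = new HashSet<>(Arrays.asList(6));
    int[] raw = new int[1];
    List<int[]> merged = merge(postings, deleted, raw);
    System.out.println(raw[0]);        // 8: two of three blocks were raw-copied
    System.out.println(merged.size()); // 3
  }
}
```

With a 10% deletion rate, most skip intervals contain no deletions, so most of the postings take the fast path — which is the intuition behind the threshold suggested in the description.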




[jira] [Created] (SOLR-4903) Solr sends all doc ids to all shards in the query counting facets

2013-06-06 Thread Dmitry Kan (JIRA)
Dmitry Kan created SOLR-4903:


 Summary: Solr sends all doc ids to all shards in the query 
counting facets
 Key: SOLR-4903
 URL: https://issues.apache.org/jira/browse/SOLR-4903
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 3.4
Reporter: Dmitry Kan


Setup: a front-end Solr and shards.

Summary: the Solr frontend sends all doc ids received from QueryComponent to all 
shards, which causes a POST request buffer size overflow.

Symptoms:

The query is: http://pastebin.com/0DndK1Cs
I have omitted the shards parameter.

The router log: http://pastebin.com/FTVH1WF3
Notice the port of the shard that is affected. That port changes all the time, 
even for the same request.
The log entry is prepended with the lines:

SEVERE: org.apache.solr.common.SolrException: Internal Server Error

Internal Server Error

(they are not in the pastebin link)

The shard log: http://pastebin.com/exwCx3LX

Suggestion: change the data structure in FacetComponent to send only the doc ids 
that belong to a shard, not a concatenation of all doc ids.

Why is this important: for scaling. Adding more shards will result in 
overflowing the POST request buffer at some point anyway.
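The suggestion above — send each shard only its own doc ids instead of the full concatenation — can be sketched as a simple partitioning step. `shardOf` and `partition` are hypothetical names, not Solr's FacetComponent API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of the suggested fix: group refinement doc ids per owning shard
 *  instead of broadcasting the concatenation of all ids to every shard.
 *  shardOf/partition are hypothetical names, not Solr's FacetComponent API. */
public class PerShardIds {

  static Map<String, List<Integer>> partition(List<Integer> docIds,
                                              Map<Integer, String> shardOf) {
    Map<String, List<Integer>> perShard = new HashMap<>();
    for (int id : docIds) {
      perShard.computeIfAbsent(shardOf.get(id), s -> new ArrayList<>()).add(id);
    }
    return perShard; // each shard now receives only its own ids
  }

  public static void main(String[] args) {
    Map<Integer, String> owner = Map.of(1, "shard1", 2, "shard2", 3, "shard1");
    Map<String, List<Integer>> perShard = partition(List.of(1, 2, 3), owner);
    System.out.println(perShard.get("shard1")); // [1, 3]
    System.out.println(perShard.get("shard2")); // [2]
  }
}
```

Each shard's POST body then grows with its own share of the ids rather than with the total, so adding shards no longer inflates every request.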






[jira] [Created] (SOLR-4904) Send internal doc ids and index version in distributed faceting to make queries more compact

2013-06-06 Thread Dmitry Kan (JIRA)
Dmitry Kan created SOLR-4904:


 Summary: Send internal doc ids and index version in distributed 
faceting to make queries more compact
 Key: SOLR-4904
 URL: https://issues.apache.org/jira/browse/SOLR-4904
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.3, 3.4
Reporter: Dmitry Kan


This was suggested by [~ab] at the bbuzz conf 2013. It makes a lot of sense and 
works nicely together with fixing the root cause of issue SOLR-4903.

Basically, QueryComponent could send internal Lucene ids along with the index 
version number, so that in subsequent queries to other Solr components, like 
FacetComponent, the internal ids would be sent. The index version is required 
to ensure we deal with the same index.
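A minimal sketch of that idea (hypothetical names, not Solr's actual API): a refinement request carries the compact internal ids plus the index version, and the receiving side uses the ids only when the versions match:

```java
import java.util.Arrays;

/** Sketch of this issue's idea with hypothetical names: ship the compact
 *  internal Lucene doc ids together with the index version, and have the
 *  receiving component use them only when the versions match. */
public class InternalIdRefinement {

  final long indexVersion;
  final int[] internalIds; // compact ints instead of full unique-key strings

  InternalIdRefinement(long indexVersion, int[] internalIds) {
    this.indexVersion = indexVersion;
    this.internalIds = internalIds;
  }

  /** Internal ids are only meaningful against the exact same index version. */
  boolean acceptedBy(long shardIndexVersion) {
    return indexVersion == shardIndexVersion;
  }

  public static void main(String[] args) {
    InternalIdRefinement req = new InternalIdRefinement(42L, new int[]{3, 17, 250});
    System.out.println(req.acceptedBy(42L)); // true
    System.out.println(req.acceptedBy(43L)); // false: index changed, ids are stale
    System.out.println(Arrays.toString(req.internalIds));
  }
}
```

The version check is what makes internal ids safe to ship: if a commit happened between the two phases, the shard can reject the stale ids and fall back to the old behavior.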




[jira] [Updated] (SOLR-4903) Solr sends all doc ids to all shards in the query counting facets

2013-06-06 Thread Dmitry Kan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Kan updated SOLR-4903:
-

Affects Version/s: 4.3

 Solr sends all doc ids to all shards in the query counting facets
 -

 Key: SOLR-4903
 URL: https://issues.apache.org/jira/browse/SOLR-4903
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 3.4, 4.3
Reporter: Dmitry Kan

 Setup: front end solr and shards.
 Summary: solr frontend sends all doc ids received from QueryComponent to all 
 shards which causes POST request buffer size overflow.
 Symptoms:
 The query is: http://pastebin.com/0DndK1Cs
 I have omitted the shards parameter.
 The router log: http://pastebin.com/FTVH1WF3
 Notice the port of a shard, that is affected. That port changes all the time, 
 even for the same request
 The log entry is prepended with lines:
 SEVERE: org.apache.solr.common.SolrException: Internal Server Error
 Internal Server Error
 (they are not in the pastebin link)
 The shard log: http://pastebin.com/exwCx3LX
 Suggestion: change the data structure in FacetComponent to send only doc ids 
 that belong to a shard and not a concatenation of all doc ids.
 Why is this important: for scaling. Adding more shards will result in 
 overflowing the POST request buffer at some point anyway.




[jira] [Commented] (SOLR-1726) Deep Paging and Large Results Improvements

2013-04-29 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644450#comment-13644450
 ] 

Dmitry Kan commented on SOLR-1726:
--

does the deep paging issue apply to facet paging?

 Deep Paging and Large Results Improvements
 --

 Key: SOLR-1726
 URL: https://issues.apache.org/jira/browse/SOLR-1726
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 4.3

 Attachments: CommonParams.java, QParser.java, QueryComponent.java, 
 ResponseBuilder.java, SOLR-1726.patch, SOLR-1726.patch, 
 SolrIndexSearcher.java, TopDocsCollector.java, TopScoreDocCollector.java


 There are possibly ways to improve collections of deep paging by passing 
 Solr/Lucene more information about the last page of results seen, thereby 
 saving priority queue operations.   See LUCENE-2215.
 There may also be better options for retrieving large numbers of rows at a 
 time that are worth exploring.  LUCENE-2127.




[jira] [Commented] (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

2013-02-20 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582207#comment-13582207
 ] 

Dmitry Kan commented on LUCENE-1486:


OK, after some study, here is what we did:

we treat the AND clauses as spanNearQuery objects. So, the

a AND b

becomes %a b%~slop, where the %...%~ operator is an unordered SpanNear query (a 
change to QueryParser.jj was required for this).

When there is a case of NOT clause with nested clauses:

NOT( (a AND b) OR (c AND d) ) = NOT ( %a b%~slop OR %c d%~slop ) ,

we need to handle SpanNearQueries in the addComplexPhraseClause method. In 
order to handle this, we just added to the if statement:

{code}
if (qc instanceof BooleanQuery) {
{code}

the following else-if branch:

{code}
else if (qc instanceof SpanNearQuery) {
    ors.add((SpanQuery) qc);
}
{code}


 Wildcards, ORs etc inside Phrase queries
 

 Key: LUCENE-1486
 URL: https://issues.apache.org/jira/browse/LUCENE-1486
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser
Affects Versions: 2.4
Reporter: Mark Harwood
Priority: Minor
 Fix For: 4.2, 5.0

 Attachments: ComplexPhraseQueryParser.java, 
 junit_complex_phrase_qp_07_21_2009.patch, 
 junit_complex_phrase_qp_07_22_2009.patch, Lucene-1486 non default 
 field.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, 
 LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, 
 TestComplexPhraseQuery.java


 An extension to the default QueryParser that overrides the parsing of 
 PhraseQueries to allow more complex syntax e.g. wildcards in phrase queries.
 The implementation feels a little hacky - this is arguably better handled in 
 QueryParser itself. This works as a proof of concept  for much of the query 
 parser syntax. Examples from the Junit test include:
   checkMatches("\"j*   smyth~\"", "1,2"); // wildcards and fuzzies are OK in phrases
   checkMatches("\"(jo* -john)  smith\"", "2"); // boolean logic works
   checkMatches("\"jo*  smith\"~2", "1,2,3"); // position logic works

   checkBadQuery("\"jo*  id:1 smith\""); // mixing fields in a phrase is bad
   checkBadQuery("\"jo* \"smith\" \""); // phrases inside phrases is bad
   checkBadQuery("\"jo* [sma TO smZ]\" \""); // range queries inside phrases not supported
 Code plus Junit test to follow...




[jira] [Commented] (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

2013-02-18 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13580828#comment-13580828
 ] 

Dmitry Kan commented on LUCENE-1486:


Can someone give me a hand with this parser (even though this JIRA is so old)?

We need the NOT logic to work properly in the boolean sense, that is, the 
following should work correctly:

a AND NOT b
a AND NOT (b OR c)
a AND NOT ((b OR c) AND (d OR e))

Can anybody guide me here? Is it at all possible to accomplish this with the 
original CPQP implementation? I would not be afraid of changing the 
QueryParser.jj lexical specification, if the task requires it.

 Wildcards, ORs etc inside Phrase queries
 

 Key: LUCENE-1486
 URL: https://issues.apache.org/jira/browse/LUCENE-1486
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser
Affects Versions: 2.4
Reporter: Mark Harwood
Priority: Minor
 Fix For: 4.2, 5.0

 Attachments: ComplexPhraseQueryParser.java, 
 junit_complex_phrase_qp_07_21_2009.patch, 
 junit_complex_phrase_qp_07_22_2009.patch, Lucene-1486 non default 
 field.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, 
 LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, 
 TestComplexPhraseQuery.java


 An extension to the default QueryParser that overrides the parsing of 
 PhraseQueries to allow more complex syntax e.g. wildcards in phrase queries.
 The implementation feels a little hacky - this is arguably better handled in 
 QueryParser itself. This works as a proof of concept  for much of the query 
 parser syntax. Examples from the Junit test include:
   checkMatches("\"j*   smyth~\"", "1,2"); // wildcards and fuzzies are OK in phrases
   checkMatches("\"(jo* -john)  smith\"", "2"); // boolean logic works
   checkMatches("\"jo*  smith\"~2", "1,2,3"); // position logic works

   checkBadQuery("\"jo*  id:1 smith\""); // mixing fields in a phrase is bad
   checkBadQuery("\"jo* \"smith\" \""); // phrases inside phrases is bad
   checkBadQuery("\"jo* [sma TO smZ]\" \""); // range queries inside phrases not supported
 Code plus Junit test to follow...




[jira] [Commented] (SOLR-1604) Wildcards, ORs etc inside Phrase Queries

2013-01-18 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557053#comment-13557053
 ] 

Dmitry Kan commented on SOLR-1604:
--

Hello! Great work!

I have two questions:

1) What would it take to incorporate phrase searches into this extended query 
parser?
"a b" c~100
that is, "a b" (a phrase search) is found in that order and exactly side by 
side, <=100 tokens away from c.

2) Does this implementation support Boolean operators, like AND, OR, NOT 
(at least OR and NOT are supported as far as I can see)? Can they be nested?

 Wildcards, ORs etc inside Phrase Queries
 

 Key: SOLR-1604
 URL: https://issues.apache.org/jira/browse/SOLR-1604
 Project: Solr
  Issue Type: Improvement
  Components: query parsers, search
Affects Versions: 1.4
Reporter: Ahmet Arslan
Priority: Minor
 Attachments: ASF.LICENSE.NOT.GRANTED--ComplexPhrase.zip, 
 ComplexPhraseQueryParser.java, ComplexPhrase_solr_3.4.zip, ComplexPhrase.zip, 
 ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, 
 ComplexPhrase.zip, SOLR-1604-alternative.patch, SOLR-1604.patch, 
 SOLR-1604.patch


 Solr Plugin for ComplexPhraseQueryParser (LUCENE-1486) which supports 
 wildcards, ORs, ranges, fuzzies inside phrase queries.




[jira] [Updated] (SOLR-1604) Wildcards, ORs etc inside Phrase Queries

2013-01-16 Thread Dmitry Kan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Kan updated SOLR-1604:
-

Attachment: ComplexPhrase_solr_3.4.zip

This is the ComplexPhrase project based on the version submitted on 21/Jul/11. It 
compiles and runs under Solr 3.4. I have uncommented the tests in 
/org/apache/solr/search/ComplexPhraseQParserPluginTest.java and they passed.

 Wildcards, ORs etc inside Phrase Queries
 

 Key: SOLR-1604
 URL: https://issues.apache.org/jira/browse/SOLR-1604
 Project: Solr
  Issue Type: Improvement
  Components: query parsers, search
Affects Versions: 1.4
Reporter: Ahmet Arslan
Priority: Minor
 Attachments: ASF.LICENSE.NOT.GRANTED--ComplexPhrase.zip, 
 ComplexPhraseQueryParser.java, ComplexPhrase_solr_3.4.zip, ComplexPhrase.zip, 
 ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, 
 ComplexPhrase.zip, SOLR-1604-alternative.patch, SOLR-1604.patch, 
 SOLR-1604.patch


 Solr Plugin for ComplexPhraseQueryParser (LUCENE-1486) which supports 
 wildcards, ORs, ranges, fuzzies inside phrase queries.




[jira] [Commented] (SOLR-3755) shard splitting

2013-01-14 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13553117#comment-13553117
 ] 

Dmitry Kan commented on SOLR-3755:
--

Somewhat related: control over the naming of shards. This could be applicable for 
both hashing-based collections and custom-sharding-based collections: 
shardNames=myshard1,myshard2,myshard3?

Would this suit logical (e.g. date-based) sharding as well? Do you plan to 
support such a sharding type in the current shard splitting implementation? Not 
sure if this helps: we have implemented our own custom date-based sharding 
(splitting and routing) for Solr 3.x and found it to be the most logical way of 
sharding our data (both from the load-balancing and the use-case point of view). 
The routing implementation is done by loading a custom shards config file that 
contains a mapping of date ranges to shards.
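The date-range-to-shards mapping described above can be sketched with a sorted map keyed by range start — an illustration of the custom routing idea, not the actual implementation:

```java
import java.time.LocalDate;
import java.util.Map;
import java.util.TreeMap;

/** Sketch of date-based routing via a range-start -> shard-name map, as a
 *  custom shards config file like the one described could provide. Illustrative only. */
public class DateRangeRouter {

  static String route(LocalDate docDate, TreeMap<LocalDate, String> startToShard) {
    Map.Entry<LocalDate, String> e = startToShard.floorEntry(docDate);
    return e == null ? "default" : e.getValue(); // date precedes all ranges
  }

  public static void main(String[] args) {
    TreeMap<LocalDate, String> ranges = new TreeMap<>();
    ranges.put(LocalDate.of(2012, 1, 1), "shard2012H1");
    ranges.put(LocalDate.of(2012, 7, 1), "shard2012H2");
    System.out.println(route(LocalDate.of(2012, 3, 5), ranges)); // shard2012H1
    System.out.println(route(LocalDate.of(2012, 9, 1), ranges)); // shard2012H2
  }
}
```

Both indexing (pick the target shard for a document) and query routing (pick the shards covering a date filter) can share this one lookup structure.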

 shard splitting
 ---

 Key: SOLR-3755
 URL: https://issues.apache.org/jira/browse/SOLR-3755
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud
Reporter: Yonik Seeley
 Attachments: SOLR-3755.patch, SOLR-3755.patch


 We can currently easily add replicas to handle increases in query volume, but 
 we should also add a way to add additional shards dynamically by splitting 
 existing shards.




[jira] [Commented] (SOLR-1337) Spans and Payloads Query Support

2012-12-19 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13536838#comment-13536838
 ] 

Dmitry Kan commented on SOLR-1337:
--

[Jan Høydahl] at the Lucene query parser level. A new token FUZZY_SLOP_SHARP (the 
name probably isn't the best, but it can be changed) has been introduced in 
QueryParser.jj and the supporting code implemented. The syntax is the same as 
that of the ~ operator, i.e. term1 term2 ... termn #slope.

 Spans and Payloads Query Support
 

 Key: SOLR-1337
 URL: https://issues.apache.org/jira/browse/SOLR-1337
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
 Fix For: 4.1


 It would be really nice to have query side support for: Spans and Payloads.  
 The main ingredient missing at this point is QueryParser support and a output 
 format for the spans and the payload spans.




[jira] [Commented] (SOLR-1337) Spans and Payloads Query Support

2012-12-18 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534860#comment-13534860
 ] 

Dmitry Kan commented on SOLR-1337:
--

[~janhoy]  Jan: we implemented a new operator for Lucene / SOLR 3.4 that does 
exactly what you say, see: 
https://issues.apache.org/jira/browse/LUCENE-3758?focusedCommentId=13207710&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13207710

if you or anyone else needs a patch, just let me know.

 Spans and Payloads Query Support
 

 Key: SOLR-1337
 URL: https://issues.apache.org/jira/browse/SOLR-1337
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
 Fix For: 4.1


 It would be really nice to have query side support for: Spans and Payloads.  
 The main ingredient missing at this point is QueryParser support and a output 
 format for the spans and the payload spans.




[jira] [Commented] (SOLR-3858) Doc-to-shard assignment based on range property on shards

2012-10-29 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486052#comment-13486052
 ] 

Dmitry Kan commented on SOLR-3858:
--

Is there an idea of how the range property should be defined? Something like 
this in solrconfig:

<docIdToShardAssignment>
   <rangeField>FieldName</rangeField> <!-- e.g. a date field -->
   <rangeStart>20121001</rangeStart> <!-- granularity probably should be customizable -->
   <rangeEnd>20121031</rangeEnd>
</docIdToShardAssignment>

?

Does this property (if defined) turn the sharding scheme into logical sharding?
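Purely to illustrate the idea being asked about (hypothetical names, not Solr's actual API or configuration), range-based doc-to-shard assignment could be sketched like this:

```python
# Hypothetical sketch of range-based doc-to-shard routing; the field,
# shard names, and helper are invented for illustration only.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Shard:
    name: str
    range_start: str  # inclusive, e.g. "20121001"
    range_end: str    # inclusive, e.g. "20121031"

def shard_for(doc: dict, route_field: str, shards: list) -> Optional[str]:
    """Return the name of the shard whose range covers the doc's route value."""
    value = doc.get(route_field)
    if value is None:
        return None
    for s in shards:
        if s.range_start <= value <= s.range_end:
            return s.name
    return None  # no shard covers this value

shards = [Shard("shard-oct", "20121001", "20121031"),
          Shard("shard-nov", "20121101", "20121130")]
print(shard_for({"date": "20121015"}, "date", shards))  # shard-oct
```

A scheme like this is effectively logical sharding: the routing decision depends on a document field value rather than a hash of the id.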


 Doc-to-shard assignment based on range property on shards
 ---

 Key: SOLR-3858
 URL: https://issues.apache.org/jira/browse/SOLR-3858
 Project: Solr
  Issue Type: Sub-task
Reporter: Yonik Seeley

 Anything that maps a document id to a shard should consult the ranges defined 
 on the shards (currently indexing and real-time get).





[jira] [Commented] (SOLR-3585) processing updates in multiple threads

2012-08-31 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445749#comment-13445749
 ] 

Dmitry Kan commented on SOLR-3585:
--

Mikhail,

True & thanks for the link. In any case, the test proves that _there is_ a gain, 
even on a non-server machine. I might find a way to run this on a server and 
(possibly) play with SolrJ. In our use case, local streaming is used for larger 
batch (re-)processing and SolrJ for relatively tiny updates.

 processing updates in multiple threads
 --

 Key: SOLR-3585
 URL: https://issues.apache.org/jira/browse/SOLR-3585
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 4.0-ALPHA
Reporter: Mikhail Khludnev
Priority: Minor
 Attachments: multithreadupd.patch, report.tar.gz, SOLR-3585.patch, 
 SOLR-3585.patch


 Hello,
 I'd like to contribute an update processor that forks many threads to 
 concurrently process the stream of commands. It may be beneficial for users 
 who stream many docs through a single request. 





[jira] [Commented] (SOLR-3585) processing updates in multiple threads

2012-08-30 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445246#comment-13445246
 ] 

Dmitry Kan commented on SOLR-3585:
--

Summary:

1/2/4/8 threads

There was a gain with 2 threads; beyond that, increasing the number of threads 
didn't improve indexing speed (again, this may be down to too little data, or a 
slow desktop machine rather than a server)

URL:

http://localhost:8983/solr/update?commit=true&separator=%09&escape=\&update.chain=threads&backing.chain=logrun&stream.file=d:\Projects\information_retrieval\solr\apache-solr-4.0.0-BETA\solr\example\data\book_edition.tsv&stream.contentType=text/csv;charset=utf-8

Intel(R) Core2 Duo CPU T6600 @ 2.20GHz
RAM: 4 GB
OS: Windows 7 64 bit

PC was moderately used during the indexing (Internet surfing mostly)

Solr started with:
java -Xmx512M -Xms512M -jar start.jar

Stats and Log extract:

---
one thread
---

565576 milliseconds (9.43 minutes)
size of data/index: 1.61 GB

30.08.2012 22:34:10 org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [collection1] webapp=/solr path=/update params={backing.chain=logrun&commit=true&stream.contentType=text/csv;charset%3Dutf-8&separator=%09&escape=\&stream.file=d:\Projects\information_retrieval\solr\apache-solr-4.0.0-BETA\solr\example\data\book_edition.tsv&update.chain=threads} {add=[/m/0g9nk5p, /m/0g9rf0q, /m/0gj6_r3, /m/0gj702y, /m/0gk99b7, /m/0g461_s, /m/0g4thbr, /m/0g4vp__, /m/0gkgw7x, /m/0gb390f, ... (3401498 adds)]} 0 565576

---
two threads
---

400085 milliseconds (6.67 minutes)
size of data/index: 916MB

30.08.2012 22:09:16 org.apache.solr.core.SolrDeletionPolicy updateCommits
INFO: newest commit = 1

30.08.2012 22:15:56 org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [collection1] webapp=/solr path=/update 
params={backing.chain=logrun&commit=true&stream.contentType=text/csv;charset%3Dutf-8&separator=%09&escape=\&stream.file=d:\Projects\information_retrieval\solr\apache-solr-4.0.0-BETA\solr\example\data\book_edition.tsv&update.chain=threads}
 {add=[/m/0g9nk5p, /m/0gj6_r3, /m/0gkgw7x, /m/0g9_qhd, /m/0g9_r1t, /m/0g9jxyt, 
/m/0g4wdtq, /m/0d0s9y1, /m/0d9pb_v, /m/0d0tfz7, ... (1838414 adds)]} 0 400085
30.08.2012 22:15:56 org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [collection1] webapp=/solr path=/update 
params={backing.chain=logrun&commit=true&stream.contentType=text/csv;charset%3Dutf-8&separator=%09&escape=\&stream.file=d:\Projects\information_retrieval\solr\apache-solr-4.0.0-BETA\solr\example\data\book_edition.tsv&update.chain=threads}
 {add=[/m/0g9rf0q, /m/0gj702y, /m/0gk99b7, /m/0g461_s, /m/0g4thbr, /m/0g4vp__, 
/m/0gb390f, /m/0gb34pf, /m/0h8fm59, /m/0g99vfk, ... (1563084 adds)]} 0 400085


---
four threads
---

423969 milliseconds (7.07 minutes)
size of data/index: 915 MB

30.08.2012 21:52:03 org.apache.solr.core.SolrDeletionPolicy updateCommits

INFO: [collection1] webapp=/solr path=/update 
params={backing.chain=logrun&commit=true&stream.contentType=text/csv;charset%3Dutf-8&separator=%09&escape=\&stream.file=d:\Projects\information_retrieval\solr\apache-solr-4.0.0-BETA\solr\example\data\book_edition.tsv&update.chain=threads}
 {add=[/m/0g9nk5p, /m/0dgjnsn, /m/0d0s539, /m/0d0t8b3, /m/0d9n2sg, /m/0d0s18j, 
/m/07n7lbm, /m/07n7mh6, /m/07n7mq0, /m/07n7n_d, ... (844367 adds)]} 0 r
30.08.2012 21:59:07 org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [collection1] {add=[/m/0gj702y, /m/0gk99b7, /m/0gkgw7x, /m/0gb390f, 
/m/0g9_qhd, /m/0h2ymt3, /m/0g4wdtq, /m/0d0s9y1, /m/0d0tfz7, /m/0d0tdf1, ... 
(815450 adds)]} 0 423969
30.08.2012 21:59:07 org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [collection1] {add=[/m/0g9rf0q, /m/0g461_s, /m/0g4thbr, /m/0g4vp__, 
/m/0gb34pf, /m/0h8fm59, /m/0g99vfk, /m/0g9_r1t, /m/0g9jxyt, /m/0ghc2b5, ... 
(836534 adds)]} 0 423969
30.08.2012 21:59:07 org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [collection1] webapp=/solr path=/update 
params={backing.chain=logrun&commit=true&stream.contentType=text/csv;charset%3Dutf-8&separator=%09&escape=\&stream.file=d:\Projects\information_retrieval\solr\apache-solr-4.0.0-BETA\solr\example\data\book_edition.tsv&update.chain=threads}
 {add=[/m/0gj6_r3, /m/0d0sfq_, /m/0d9mhx1, /m/07tc6lf, /m/07tc75v, /m/07tc7jq, 
/m/07tc8kz, /m/07tc8wr, /m/07tc_cn, /m/07tc_fl, ... (905147 adds)]} 0 423969

---
eight threads
---

431710 milliseconds (7.20 minutes)
size of data/index: 1.00 GB


30.08.2012 22:47:43 org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [collection1] webapp=/solr path=/update 
params={backing.chain=logrun&commit=true&stream.contentType=text/csv;charset%3Dutf-8&separator=%09&escape=\&stream.file=d:\Projects\information_retrieval\solr\apache-solr-4.0.0-BETA\solr\example\data\book_edition.tsv&update.chain=threads}
 {add=[/m/0gk99b7, /m/0d0vb6s, /m/07t8mw8, 

[jira] [Commented] (SOLR-3585) processing updates in multiple threads

2012-07-09 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409329#comment-13409329
 ] 

Dmitry Kan commented on SOLR-3585:
--

Mikhail, thanks for the stats. They look good to me, and prove that the patch 
should help increase indexing throughput. In about 2.5 weeks I should be able 
to try your patch and tell you the results on my hardware.

 processing updates in multiple threads
 --

 Key: SOLR-3585
 URL: https://issues.apache.org/jira/browse/SOLR-3585
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 4.0
Reporter: Mikhail Khludnev
Priority: Minor
 Attachments: SOLR-3585.patch, multithreadupd.patch, report.tar.gz


 Hello,
 I'd like to contribute an update processor that forks many threads to 
 concurrently process the stream of commands. It may be beneficial for users 
 who stream many docs through a single request. 







[jira] [Commented] (SOLR-3585) processing updates in multiple threads

2012-07-06 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408234#comment-13408234
 ] 

Dmitry Kan commented on SOLR-3585:
--

Mikhail, this sounds interesting to me. Have you already tested this to prove 
that there is a time gain with your approach? Also, did you find any optimal 
parameters, such as the number of threads, so that sensible default values 
could be set?
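To make the idea concrete (a generic sketch, not the actual SOLR-3585 patch): fanning a stream of update commands out over a worker pool might look like this, with the pool size being exactly the kind of tunable parameter asked about above.

```python
# Generic thread-pool fan-out over a stream of update commands; process()
# is a stand-in for real per-document update/indexing work.
from concurrent.futures import ThreadPoolExecutor

def process(doc):
    # Placeholder for the analysis/indexing work done per document.
    return doc["id"].upper()

docs = ({"id": f"doc{i}"} for i in range(8))  # the incoming command stream
with ThreadPoolExecutor(max_workers=4) as pool:  # thread count is the tunable
    results = list(pool.map(process, docs))
print(len(results))
```

The right `max_workers` depends on CPU cores and per-document cost, which is why benchmarking a few pool sizes (as done later in this thread) matters before picking a default.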

 processing updates in multiple threads
 --

 Key: SOLR-3585
 URL: https://issues.apache.org/jira/browse/SOLR-3585
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 4.0
Reporter: Mikhail Khludnev
Priority: Minor
 Attachments: SOLR-3585.patch, multithreadupd.patch


 Hello,
 I'd like to contribute an update processor that forks many threads to 
 concurrently process the stream of commands. It may be beneficial for users 
 who stream many docs through a single request. 







[jira] [Commented] (SOLR-2403) Problem with facet.sort=lex, shards, and facet.mincount

2011-09-23 Thread Dmitry Kan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113375#comment-13113375
 ] 

Dmitry Kan commented on SOLR-2403:
--

Peter: in one of the distributed faceting sessions we found out that zero 
facets can be filtered out by the (undocumented?) facet.zeros parameter. Does 
anything change if you set it to 0 (filtering out zero-count facets)?

 Problem with facet.sort=lex, shards, and facet.mincount
 ---

 Key: SOLR-2403
 URL: https://issues.apache.org/jira/browse/SOLR-2403
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 4.0
 Environment: RHEL5, Ubuntu 10.04
Reporter: Peter Cline

 I tested this on a recent trunk snapshot (2/25); I haven't verified with 3.1 or 
 1.4.1, but can do so and update if necessary.
 Solr is not returning the proper number of facet values when sorting 
 alphabetically, using distributed search, and using a facet.mincount that 
 excludes some of the values in the first facet.limit values.
 Easiest explained by example.  Sorting alphabetically, the first 20 values 
 for my subject_facet field have few documents.  19 facet values have only 1 
 document associated, and 1 has 2 documents.  There are plenty after that with 
 more than 2.
 {code}
 http://localhost:8082/solr/select?q=*:*&facet=true&facet.field=subject_facet&facet.limit=20&facet.sort=lex&facet.mincount=2
 {code}
 comes back with the expected 20 facet values with >= 2 documents associated.
 If I add a shards parameter that points back to itself, the result is 
 different.
 {code}
 http://localhost:8082/solr/select?q=*:*&facet=true&facet.field=subject_facet&facet.limit=20&facet.sort=lex&facet.mincount=2&shards=localhost:8082/solr
 {code}
 comes back with only 1 facet value: the single value in the first 20 that had 
 more than 1 document.  
 It appears to me that mincount is ignored when doing the original query to 
 the shards, then applied afterwards.
 Let me know if you need any more info.  
 Thanks,
 Peter
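The failure mode described above can be simulated in a few lines (a toy model, not Solr's actual faceting code): with lex sort, the shard applies facet.limit before mincount, while a single node filters by mincount first.

```python
# Toy reproduction of the bug: 19 terms with count 1, one with count 2,
# then 30 lexicographically later terms with count 5, mirroring the report.
counts = {f"subj{i:02d}": 1 for i in range(19)}
counts["subj19"] = 2
counts.update({f"zubj{i:02d}": 5 for i in range(30)})

limit, mincount = 20, 2
lex = sorted(counts)

# Non-distributed: mincount is applied while collecting, then the limit.
single = [t for t in lex if counts[t] >= mincount][:limit]

# Distributed (buggy): the shard applies the limit first, ignoring
# mincount, and the coordinator filters afterwards.
distributed = [t for t in lex[:limit] if counts[t] >= mincount]

print(len(single), len(distributed))  # 20 1
```

This matches the symptom in the report: 20 qualifying values without shards, but only 1 once the mincount filter runs after the shard has already truncated to the first 20 lex-sorted terms.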



