[jira] [Commented] (LUCENE-2562) Make Luke a Lucene/Solr Module
[ https://issues.apache.org/jira/browse/LUCENE-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565703#comment-16565703 ] Dmitry Kan commented on LUCENE-2562: [~arafalov] thanks for your input! Can you please elaborate on 'If Luke is supposed to be part of Lucene-only distribution, I guess the discussion is a bit more complicated'? > Make Luke a Lucene/Solr Module > -- > > Key: LUCENE-2562 > URL: https://issues.apache.org/jira/browse/LUCENE-2562 > Project: Lucene - Core > Issue Type: Task > Reporter: Mark Miller > Priority: Major > Labels: gsoc2014 > Attachments: LUCENE-2562-Ivy.patch, LUCENE-2562-Ivy.patch, > LUCENE-2562-Ivy.patch, LUCENE-2562-ivy.patch, LUCENE-2562.patch, > LUCENE-2562.patch, Luke-ALE-1.png, Luke-ALE-2.png, Luke-ALE-3.png, > Luke-ALE-4.png, Luke-ALE-5.png, luke-javafx1.png, luke-javafx2.png, > luke-javafx3.png, luke1.jpg, luke2.jpg, luke3.jpg, lukeALE-documents.png > > Time Spent: 20m > Remaining Estimate: 0h > > see > "RE: Luke - in need of maintainer": > http://markmail.org/message/m4gsto7giltvrpuf > "Web-based Luke": http://markmail.org/message/4xwps7p7ifltme5q > I think it would be great if there was a version of Luke that always worked > with trunk - and it would also be great if it was easier to match Luke jars > with Lucene versions. > While I'd like to get GWT Luke into the mix as well, I think the easiest > starting point is to straight port Luke to another UI toolkit before > abstracting out DTO objects that both GWT Luke and Pivot Luke could share. > I've started slowly converting Luke's use of thinlet to Apache Pivot. I > haven't/don't have a lot of time for this at the moment, but I've plugged > away here and there over the past week or two. There is still a *lot* to do. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2562) Make Luke a Lucene/Solr Module
[ https://issues.apache.org/jira/browse/LUCENE-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551676#comment-16551676 ] Dmitry Kan commented on LUCENE-2562: Hi [~steve_rowe] thanks for your support with filing the ticket. Looking to solve this one way or another. Thanks [~Tomoko Uchida] for your contribution and research so far!
[jira] [Commented] (SOLR-10231) Cursor value always different for last page with sorting by a date based function using NOW
[ https://issues.apache.org/jira/browse/SOLR-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15907528#comment-15907528 ] Dmitry Kan commented on SOLR-10231: --- [~hossman] thanks for the clarification and suggestions. Going to test the fixed timestamp value for the NOW param. In the meantime we fell back to a non-cursor pagination method. Btw, would the same issue exist in 6.x? > Cursor value always different for last page with sorting by a date based > function using NOW > --- > > Key: SOLR-10231 > URL: https://issues.apache.org/jira/browse/SOLR-10231 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SearchComponents - other > Affects Versions: 4.10.2 > Reporter: Dmitry Kan > > Cursor based results fetching is a deal breaker for search performance. > It works extremely well when paging using sort by field(s). > Example that works (Id is unique field in the schema): > Query: > {code} > http://solr-host:8983/solr/documents/select?q=*:*=DocumentId:76581059=AoIGAC5TU1ItNzY1ODEwNTktMQ===DocumentId=UserId+asc%2CId+desc=1 > {code} > Response: > {code} > > > 0 > 4 > > *:* > DocumentId > AoIGAC5TU1ItNzY1ODEwNTktMQ== > DocumentId:76581059 > UserId asc,Id desc > 1 > > > > AoIGAC5TU1ItNzY1ODEwNTktMQ== > > {code} > nextCursorMark equals cursorMark, so we know this is the last page.
> However, sorting by function behaves differently: > Query: > {code} > http://solr-host:8983/solr/documents/select?rows=1=*:*=DocumentId:76581059=AoIFQf9yCCAuU1NSLTc2NTgxMDU5LTE==DocumentId=min(ms(NOW,DynamicDateField_1),ms(NOW,DynamicDateField_12),ms(NOW,DynamicDateField_3),ms(NOW,DynamicDateField_5))%20asc,Id%20desc > {code} > Response: > {code} > > > 0 > 6 > > *:* > DocumentId > AoIFQf9yCCAuU1NSLTc2NTgxMDU5LTE= > DocumentId:76581059 > > min(ms(NOW,DynamicDateField_1),ms(NOW,DynamicDateField_12),ms(NOW,DynamicDateField_3),ms(NOW,DynamicDateField_5)) > asc,Id desc > > 1 > > > > > 76581059 > > > AoIFQf9yFyAuU1NSLTc2NTgxMDU5LTE= > > {code} > nextCursorMark does not equal cursorMark, which suggests there are more > results. This is not true (numFound=1), so the client goes into an infinite > loop. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
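The workaround discussed above — pinning NOW to one fixed timestamp so the date-based sort functions evaluate identically on every page — can be sketched roughly as follows. Parameter names follow Solr's cursor API (cursorMark, sort, NOW in epoch milliseconds); the fetch callable and the page-walking loop are illustrative, not part of Solr.

```python
def make_page_params(q, sort, fixed_now_ms, cursor_mark, rows=100):
    """Build one page of cursor-paging params; NOW is pinned so that
    ms(NOW, field) sorts evaluate identically on every request."""
    return {
        "q": q,
        "rows": rows,
        "sort": sort,            # must end with a tiebreak on the uniqueKey
        "cursorMark": cursor_mark,
        "NOW": fixed_now_ms,     # same value for every page of this walk
    }

def walk_pages(fetch, q, sort, fixed_now_ms):
    """Fetch pages until nextCursorMark stops changing (the last page)."""
    cursor, docs = "*", []
    while True:
        page = fetch(make_page_params(q, sort, fixed_now_ms, cursor))
        docs.extend(page["docs"])
        if page["nextCursorMark"] == cursor:
            return docs
        cursor = page["nextCursorMark"]
```

Without the pinned NOW, each request re-evaluates ms(NOW, ...) at a new wall-clock instant, so the sort values — and hence the computed cursor — shift on every call, which is exactly the non-terminating behaviour this issue reports.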
[jira] [Created] (SOLR-10231) Cursor value always different for last page with sorting by function
Dmitry Kan created SOLR-10231: - Summary: Cursor value always different for last page with sorting by function Key: SOLR-10231 URL: https://issues.apache.org/jira/browse/SOLR-10231 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: SearchComponents - other Affects Versions: 4.10.2 Reporter: Dmitry Kan
[jira] [Commented] (SOLR-4722) Highlighter which generates a list of query term position(s) for each item in a list of documents, or returns null if highlighting is disabled.
[ https://issues.apache.org/jira/browse/SOLR-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487389#comment-14487389 ] Dmitry Kan commented on SOLR-4722: -- Thanks for the great patch. I confirm it works in solr 4.10.3, although recompilation was necessary. Highlighter which generates a list of query term position(s) for each item in a list of documents, or returns null if highlighting is disabled. --- Key: SOLR-4722 URL: https://issues.apache.org/jira/browse/SOLR-4722 Project: Solr Issue Type: New Feature Components: highlighter Affects Versions: 4.3, Trunk Reporter: Tricia Jenkins Priority: Minor Attachments: SOLR-4722.patch, SOLR-4722.patch, solr-positionshighlighter.jar As an alternative to returning snippets, this highlighter provides the (term) position for query matches. One usecase for this is to reconcile the term position from the Solr index with 'word' coordinates provided by an OCR process. In this way we are able to 'highlight' an image, like a page from a book or an article from a newspaper, in the locations that match the user's query. This is based on the FastVectorHighlighter and requires that termVectors, termOffsets and termPositions be stored. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6152) Pre-populating values into search parameters on the query page of solr admin
[ https://issues.apache.org/jira/browse/SOLR-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14147585#comment-14147585 ] Dmitry Kan commented on SOLR-6152: -- I'm ready to work on this, but need some guidance for the feature spec. I.e. what would be the most natural way of configuring pre-populated values? Should it be a UI feature or could it be a special config entry in solrconfig.xml? Thoughts? Pre-populating values into search parameters on the query page of solr admin Key: SOLR-6152 URL: https://issues.apache.org/jira/browse/SOLR-6152 Project: Solr Issue Type: Improvement Components: web gui Affects Versions: 4.3.1 Reporter: Dmitry Kan Attachments: prepoluate_query_parameters_query_page.bmp In some use cases, it is highly desirable to be able to pre-populate the query page of solr admin with specific values. In particular use case of mine, the solr admin user must pass a date range value without which the query would fail. It isn't easy to remember the value format for non-solr experts, so I would like to have a way of hooking that value example into the query page. See the screenshot attached, where I have inserted the fq parameter with date range into the Raw Query Parameters. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6152) Pre-populating values into search parameters on the query page of solr admin
[ https://issues.apache.org/jira/browse/SOLR-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14147625#comment-14147625 ] Dmitry Kan commented on SOLR-6152: -- Ok, I see what you are getting at. I think I like this; it sounds useful. This jira and what you describe may potentially reuse some code, but these two sound like different features to me. I need to take a first stab at this so that there is something material to contemplate. Hoping to get moral support from [~steffkes] too :)
[jira] [Updated] (SOLR-5178) Admin UI - Memory Graph on Dashboard shows NaN for unused Swap
[ https://issues.apache.org/jira/browse/SOLR-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Kan updated SOLR-5178: - Attachment: SOLR-5178.patch a patch for solr 4.6.0. It adds a check for when both free swap and total swap are 0 (dividing one by another will give NaN). Admin UI - Memory Graph on Dashboard shows NaN for unused Swap -- Key: SOLR-5178 URL: https://issues.apache.org/jira/browse/SOLR-5178 Project: Solr Issue Type: Bug Components: web gui Affects Versions: 4.3, 4.4 Reporter: Stefan Matheis (steffkes) Assignee: Stefan Matheis (steffkes) Priority: Minor Fix For: 4.9, 5.0 Attachments: SOLR-5178.patch, screenshot-vladimir.jpeg If the System doesn't use Swap, the displayed memory graph on the dashboard shows {{NaN}} (not a number) because it tries to divide by zero. {code}system:{ name:Linux, version:3.2.0-39-virtual, arch:amd64, systemLoadAverage:3.38, committedVirtualMemorySize:32454287360, freePhysicalMemorySize:912945152, freeSwapSpaceSize:0, processCpuTime:5627465000, totalPhysicalMemorySize:71881908224, totalSwapSpaceSize:0, openFileDescriptorCount:350, maxFileDescriptorCount:4096, uname: Linux ip-xxx-xxx-xxx-xxx 3.2.0-39-virtual #62-Ubuntu SMP Thu Feb 28 00:48:27 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux\n, uptime: 11:24:39 up 4 days, 23:03, 1 user, load average: 3.38, 3.10, 2.95\n }{code} We should add an additional check for that. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
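The patched check lives in the Admin UI's JavaScript, where `freeSwapSpaceSize / totalSwapSpaceSize` evaluates to NaN when both are 0. As an illustration only (hypothetical helper, not the patch itself), the guard amounts to:

```python
def swap_used_pct(free_swap, total_swap):
    """Percentage of swap in use for the dashboard memory graph.
    Guard the zero-swap case: a naive (total - free) / total would
    produce NaN in JavaScript (or raise in Python) when total is 0."""
    if total_swap == 0:
        return 0.0
    return 100.0 * (total_swap - free_swap) / total_swap
```

With the values from the report above (`freeSwapSpaceSize: 0`, `totalSwapSpaceSize: 0`) the guarded version simply reports 0% used instead of NaN.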
[jira] [Commented] (SOLR-3585) processing updates in multiple threads
[ https://issues.apache.org/jira/browse/SOLR-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028325#comment-14028325 ] Dmitry Kan commented on SOLR-3585: -- I would agree with [~dsmiley]. Every good API (and to some extent Solr is an API from the client's view) takes advantage of multi-threading by itself. In this case a client can be as thin as possible and not care about threads. And if the client has enough idle CPUs, sure, it could post in parallel. For example, we run Solr on pretty beefy machines with lots of CPU cores, and most of the time those are idling. Some of our latest findings with soft commits and high posting pressure show that posting may sometimes even fail, and re-posting the failed docs fixes the issue. processing updates in multiple threads -- Key: SOLR-3585 URL: https://issues.apache.org/jira/browse/SOLR-3585 Project: Solr Issue Type: Improvement Components: update Affects Versions: 4.0-ALPHA, 5.0 Reporter: Mikhail Khludnev Attachments: SOLR-3585.patch, SOLR-3585.patch, multithreadupd.patch, report.tar.gz Hello, I'd like to contribute update processor which forks many threads which concurrently process the stream of commands. It may be beneficial for users who streams many docs through single request. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
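The server-side fan-out proposed in this issue — many worker threads concurrently consuming a single request's stream of update commands — can be sketched with a plain queue/worker pool. Names here are hypothetical; Solr's actual update processor chain differs.

```python
import queue
import threading

def process_stream(commands, handler, num_threads=4):
    """Fan a single stream of update commands out to worker threads,
    so a single-threaded client can still keep idle server CPUs busy."""
    q = queue.Queue()
    results, lock = [], threading.Lock()

    def worker():
        while True:
            cmd = q.get()
            if cmd is None:          # poison pill: shut this worker down
                q.task_done()
                return
            out = handler(cmd)
            with lock:               # results list is shared across workers
                results.append(out)
            q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for cmd in commands:
        q.put(cmd)
    for _ in threads:                # one pill per worker
        q.put(None)
    q.join()
    for t in threads:
        t.join()
    return results
```

Note the trade-off raised in the thread: results arrive out of order, so anything order-sensitive (e.g. two updates to the same document id in one stream) needs extra care.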
[jira] [Created] (SOLR-6152) Pre-populating values into search parameters on the query page of solr admin
Dmitry Kan created SOLR-6152: Summary: Pre-populating values into search parameters on the query page of solr admin Key: SOLR-6152 URL: https://issues.apache.org/jira/browse/SOLR-6152 Project: Solr Issue Type: Improvement Components: web gui Affects Versions: 4.3.1 Reporter: Dmitry Kan Attachments: prepoluate_query_parameters_query_page.bmp
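One possible shape for this feature — whether the defaults end up coming from the UI or from solrconfig.xml is exactly the open question in this ticket — is a merge that fills empty query-form fields from configured defaults without clobbering anything the user already typed. The function and field names below are hypothetical:

```python
def prepopulate(form_fields, configured_defaults):
    """Fill empty or missing query-form fields from configured defaults,
    never overwriting a value the user has already entered."""
    merged = dict(form_fields)
    for name, value in configured_defaults.items():
        if not merged.get(name):     # missing or empty string -> use default
            merged[name] = value
    return merged
```

For the use case in the description, the configured default would be the hard-to-remember fq date range, so a non-expert opening the query page sees a valid example already filled in.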
[jira] [Updated] (SOLR-6152) Pre-populating values into search parameters on the query page of solr admin
[ https://issues.apache.org/jira/browse/SOLR-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Kan updated SOLR-6152: - Attachment: prepoluate_query_parameters_query_page.bmp screenshot of query page
[jira] [Updated] (SOLR-4903) Solr sends all doc ids to all shards in the query counting facets
[ https://issues.apache.org/jira/browse/SOLR-4903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Kan updated SOLR-4903: - Labels: patch (was: ) Solr sends all doc ids to all shards in the query counting facets - Key: SOLR-4903 URL: https://issues.apache.org/jira/browse/SOLR-4903 Project: Solr Issue Type: Improvement Components: search Affects Versions: 3.4, 4.3, 4.3.1 Reporter: Dmitry Kan Setup: front end solr and shards. Summary: solr frontend sends all doc ids received from QueryComponent to all shards, which causes POST request buffer size overflow. Symptoms: The query is: http://pastebin.com/0DndK1Cs I have omitted the shards parameter. The router log: http://pastebin.com/FTVH1WF3 Notice the port of the shard that is affected. That port changes all the time, even for the same request. The log entry is prepended with lines: SEVERE: org.apache.solr.common.SolrException: Internal Server Error Internal Server Error (they are not in the pastebin link) The shard log: http://pastebin.com/exwCx3LX Suggestion: change the data structure in FacetComponent to send only doc ids that belong to a shard and not a concatenation of all doc ids. Why is this important: for scaling. Adding more shards will result in overflowing the POST request buffer at some point anyway. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
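The suggestion in the description — send each shard only the ids it owns rather than the concatenation of all ids — amounts to partitioning before the refinement request goes out. A sketch (the routing function is hypothetical; in Solr the frontend would know which shard returned which id from the first phase):

```python
def partition_ids(doc_ids, shards, owner_of):
    """Group doc ids by owning shard so each refinement request carries
    only that shard's ids, keeping POST bodies bounded as shards are added."""
    per_shard = {shard: [] for shard in shards}
    for doc_id in doc_ids:
        per_shard[owner_of(doc_id)].append(doc_id)
    return per_shard
```

With the broadcast approach, every shard's request body grows with the total result set; with partitioning, each body only grows with that shard's share of it.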
[jira] [Updated] (SOLR-4903) Solr sends all doc ids to all shards in the query counting facets
[ https://issues.apache.org/jira/browse/SOLR-4903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Kan updated SOLR-4903: - Labels: (was: patch)
[jira] [Commented] (SOLR-5394) facet.method=fcs seems to be using threads when it shouldn't
[ https://issues.apache.org/jira/browse/SOLR-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13943275#comment-13943275 ] Dmitry Kan commented on SOLR-5394: -- [~mikemccand] can you reproduce the bug with the patch? facet.method=fcs seems to be using threads when it shouldn't Key: SOLR-5394 URL: https://issues.apache.org/jira/browse/SOLR-5394 Project: Solr Issue Type: Bug Affects Versions: 4.6 Reporter: Michael McCandless Attachments: SOLR-5394.patch, SOLR-5394.patch, SOLR-5394_keep_threads_original_value.patch I built a wikipedia index, with multiple fields for faceting. When I do facet.method=fcs with facet.field=dateFacet and facet.field=userNameFacet, and then kill -QUIT the java process, I see a bunch (46, I think) of facetExecutor-7-thread-N threads had spun up. But I thought threads for each field is turned off by default? Even if I add facet.threads=0, it still spins up all the threads. I think something is wrong in SimpleFacets.parseParams; somehow, that method returns early (because localParams) is null, leaving threads=-1, and then the later code that would have set threads to 0 never runs. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5394) facet.method=fcs seems to be using threads when it shouldn't
[ https://issues.apache.org/jira/browse/SOLR-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Kan updated SOLR-5394: - Attachment: SOLR-5394.patch This patch sets the default threads to 1 (single-thread execution), as per Vitaly's suggestion. Fixed the test case with an unspecified threads parameter: the number of threads is expected to be the default (=1). The tests in TestSimpleFacet pass.
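As I read the thread, the bug is that the early return in SimpleFacets.parseParams leaves the thread count at its -1 sentinel instead of resolving it. A hypothetical sketch of the intended resolution (not the actual SimpleFacets code; the exact mapping of negative values is an assumption):

```python
def resolve_facet_threads(param, default=1, max_threads=2**31 - 1):
    """Resolve the facet.threads request parameter into an effective
    thread count: unspecified -> default, 0 -> run on the request thread,
    negative -> effectively unbounded, positive -> that many threads."""
    if param is None:
        return default        # never leave the -1 sentinel unresolved
    n = int(param)
    if n == 0:
        return 0              # direct execution, no executor threads
    if n < 0:
        return max_threads    # let the executor spin up one per field
    return n
```

The symptom Michael reports — dozens of facetExecutor threads despite facet.threads=0 — corresponds to the sentinel slipping through where the `param is None` and `n == 0` branches above should have applied.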
[jira] [Commented] (LUCENE-3758) Allow the ComplexPhraseQueryParser to search order or un-order proximity queries.
[ https://issues.apache.org/jira/browse/LUCENE-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937293#comment-13937293 ] Dmitry Kan commented on LUCENE-3758: [~erickerickson] right, agree, this should be handled in another jira as a local param. We have implemented this as an operator as we allow mixing ordered and unordered clauses in the same query. Allow the ComplexPhraseQueryParser to search order or un-order proximity queries. - Key: LUCENE-3758 URL: https://issues.apache.org/jira/browse/LUCENE-3758 Project: Lucene - Core Issue Type: Improvement Components: core/queryparser Affects Versions: 4.0-ALPHA Reporter: Tomás Fernández Löbbe Assignee: Erick Erickson Priority: Minor Fix For: 4.8, 5.0 Attachments: LUCENE-3758.patch, LUCENE-3758.patch, LUCENE-3758.patch The ComplexPhraseQueryParser use SpanNearQuery, but always set the inOrder value hardcoded to true. This could be configurable. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
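The inOrder flag under discussion is what separates ordered from unordered proximity in SpanNearQuery. As an illustration only (a toy model over single token positions, not Lucene's span implementation):

```python
def span_near_match(positions, slop, in_order):
    """positions: one token position per query term, in query order.
    The terms match if their span fits within `slop` extra positions;
    with in_order=True they must also appear in query order."""
    if in_order and any(a >= b for a, b in zip(positions, positions[1:])):
        return False          # a later query term appeared first in the text
    span = max(positions) - min(positions) + 1
    return span - len(positions) <= slop
```

So "a b"~2 ordered rejects text where b precedes a, while the unordered variant accepts it as long as the two terms are close enough — the distinction the ComplexPhraseQueryParser hardcodes away by always passing inOrder=true.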
[jira] [Commented] (SOLR-4904) Send internal doc ids and index version in distributed faceting to make queries more compact
[ https://issues.apache.org/jira/browse/SOLR-4904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930792#comment-13930792 ] Dmitry Kan commented on SOLR-4904: -- [~kamaci] yes, it is still valid. I would imagine that for some extreme commit policy cases, like soft-committing every second, this might not be a good fit (as the index changes so fast), but for other cases this sounds like a good idea. Send internal doc ids and index version in distributed faceting to make queries more compact Key: SOLR-4904 URL: https://issues.apache.org/jira/browse/SOLR-4904 Project: Solr Issue Type: Improvement Components: search Affects Versions: 3.4, 4.3 Reporter: Dmitry Kan This is suggested by [~ab] at bbuzz conf 2013. This makes a lot of sense and works nicely with fixing the root cause of issue SOLR-4903. Basically QueryComponent could send internal lucene ids along with the index version number so that in subsequent queries to other solr components, like FacetComponent, the internal ids would be sent. The index version is required to ensure we deal with the same index. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
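The version check the description calls for might be sketched as below (all names hypothetical): internal Lucene ids are only meaningful against the exact index version the first phase saw, so the shard must refuse to resolve them if its index has reopened since — which is also why the fast-soft-commit caveat above matters.

```python
class StaleIndexError(Exception):
    """Raised when a refinement request references a superseded index version."""

def refine_on_shard(shard_index_version, request_version, internal_ids, resolve):
    """Serve a refinement request addressed by internal (index-local)
    doc ids, rejecting it outright if the index has changed since the
    first phase handed those ids out."""
    if shard_index_version != request_version:
        raise StaleIndexError("index changed between phases; retry with external ids")
    return [resolve(doc_id) for doc_id in internal_ids]
```

On a StaleIndexError the coordinator would fall back to the current behaviour of sending external ids, so correctness is preserved and only the compactness optimization is lost.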
[jira] [Commented] (LUCENE-5422) Postings lists deduplication
[ https://issues.apache.org/jira/browse/LUCENE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13926081#comment-13926081 ] Dmitry Kan commented on LUCENE-5422: I agree with [~mikemccand] in that the issue should be better scoped. The case with compressing stemmed / non-stemmed terms posting lists is quite tricky and requires more thought. One clear case for this issue is storing reversed term along with its original non-reversed version. Both should point to the same posting list (subject to some after-stemming-hash-check). What do you guys think? Postings lists deduplication Key: LUCENE-5422 URL: https://issues.apache.org/jira/browse/LUCENE-5422 Project: Lucene - Core Issue Type: Improvement Components: core/codecs, core/index Reporter: Dmitry Kan Labels: gsoc2014 The context: http://markmail.org/thread/tywtrjjcfdbzww6f Robert Muir and I have discussed what Robert eventually named postings lists deduplication at Berlin Buzzwords 2013 conference. The idea is to allow multiple terms to point to the same postings list to save space. This can be achieved by new index codec implementation, but this jira is open to other ideas as well. The application / impact of this is positive for synonyms, exact / inexact terms, leading wildcard support via storing reversed term etc. For example, at the moment, when supporting exact (unstemmed) and inexact (stemmed) searches, we store both unstemmed and stemmed variant of a word form and that leads to index bloating. That is why we had to remove the leading wildcard support via reversing a token on index and query time because of the same index size considerations. Comment from Mike McCandless: Neat idea! Would this idea allow a single term to point to (the union of) N other posting lists? It seems like that's necessary e.g. to handle the exact/inexact case. And then, to produce the DocsAndPositionsEnum you'd need to do the merge sort across those N posting lists?
Such a thing might also be do-able as a runtime-only wrapper around the postings API (FieldsProducer), if you could at runtime do the reverse expansion (e.g. stem -> all of its surface forms). Comment from Robert Muir: I think the exact/inexact is trickier (detecting it would be the hard part), and you are right, another solution might work better. but for the reverse wildcard and synonyms situation, it seems we could even detect it on write if we created some hash of the previous term's postings. if the hash matches for the current term, we know it might be a duplicate and would have to actually do the costly check they are the same. maybe there are better ways to do it, but it might be a fun postingformat experiment to try. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
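Robert's write-time idea — hash each term's postings and, on a hash hit, run the costly equality check before sharing one stored list — can be sketched as follows. This is an illustrative in-memory model, not a codec; real postings would be compared via their on-disk encoding.

```python
def dedupe_postings(term_postings):
    """Map each term to a postings list, sharing one stored list among
    terms whose postings are identical (e.g. a term and its reversed
    form indexed for leading-wildcard support).  The hash is only a
    hint; a full comparison confirms equality before sharing."""
    by_hash = {}   # postings hash -> list of distinct stored postings lists
    out = {}
    for term, postings in term_postings.items():
        candidates = by_hash.setdefault(hash(tuple(postings)), [])
        for existing in candidates:
            if existing == postings:   # the costly check from the comment above
                out[term] = existing   # share the already-stored list
                break
        else:
            candidates.append(postings)
            out[term] = postings
    return out
```

Keeping a bucket of candidates per hash handles collisions: two different postings lists that happen to hash alike are stored separately, so the dedup is lossless.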
[jira] [Comment Edited] (LUCENE-5422) Postings lists deduplication
[ https://issues.apache.org/jira/browse/LUCENE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13926081#comment-13926081 ] Dmitry Kan edited comment on LUCENE-5422 at 3/10/14 7:27 PM: - I agree with [~mikemccand] in that the issue should be better scoped. The case with compressing stemmed / non-stemmed terms posting lists is quite tricky and requires more thought. One clear case for this issue is storing reversed term along with its original non-reversed version. Both should point to the same posting list (subject to some after-stemming-hash-check). What do you guys think? was (Author: dmitry_key): I agree with [~mikemccand] in that the issue should be better scoped. The case with compressing stemmed / non-stemmed terms posting lists is quite tricky and requires more thought. One clear case for this issue is storing reversed term along with it is original non-reversed version. Both should point to the same posting list (subject to some after-stemming-hash-check). What do you guys think? Postings lists deduplication Key: LUCENE-5422 URL: https://issues.apache.org/jira/browse/LUCENE-5422 Project: Lucene - Core Issue Type: Improvement Components: core/codecs, core/index Reporter: Dmitry Kan Labels: gsoc2014 The context: http://markmail.org/thread/tywtrjjcfdbzww6f Robert Muir and I have discussed what Robert eventually named postings lists deduplication at Berlin Buzzwords 2013 conference. The idea is to allow multiple terms to point to the same postings list to save space. This can be achieved by new index codec implementation, but this jira is open to other ideas as well. The application / impact of this is positive for synonyms, exact / inexact terms, leading wildcard support via storing reversed term etc. For example, at the moment, when supporting exact (unstemmed) and inexact (stemmed) searches, we store both unstemmed and stemmed variant of a word form and that leads to index bloating. 
That is why we had to remove the leading wildcard support (via reversing a token at index and query time) because of the same index size considerations. Comment from Mike McCandless: Neat idea! Would this idea allow a single term to point to (the union of) N other posting lists? It seems like that's necessary, e.g. to handle the exact/inexact case. And then, to produce the Docs/AndPositionsEnum you'd need to do the merge sort across those N posting lists? Such a thing might also be doable as a runtime-only wrapper around the postings API (FieldsProducer), if you could at runtime do the reverse expansion (e.g. stem -> all of its surface forms). Comment from Robert Muir: I think the exact/inexact case is trickier (detecting it would be the hard part), and you are right, another solution might work better. But for the reverse wildcard and synonyms situation, it seems we could even detect it on write if we created some hash of the previous terms' postings. If the hash matches for the current term, we know it might be a duplicate and would have to actually do the costly check that they are the same. Maybe there are better ways to do it, but it might be a fun posting-format experiment to try. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
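Robert Muir's write-time idea above can be sketched roughly as follows. This is a minimal, self-contained illustration only, not Lucene's PostingsFormat API: the class and method names are hypothetical. Each term's postings list (sorted doc IDs) is hashed on write; when a later term's hash matches an earlier one, the costly full equality check confirms the duplicate before the two terms are made to share one stored list.

```java
import java.util.*;

// Hypothetical sketch of write-time postings deduplication (not Lucene API).
// On a hash match we still do the full comparison, since hashes can collide.
public class PostingsDedupWriter {
    private final Map<Integer, List<int[]>> byHash = new HashMap<>();
    private final Map<String, int[]> termToPostings = new HashMap<>();

    public void addTerm(String term, int[] sortedDocIds) {
        int h = Arrays.hashCode(sortedDocIds);
        for (int[] candidate : byHash.getOrDefault(h, Collections.emptyList())) {
            if (Arrays.equals(candidate, sortedDocIds)) { // costly check, only on hash match
                termToPostings.put(term, candidate);       // share the existing list
                return;
            }
        }
        byHash.computeIfAbsent(h, k -> new ArrayList<>()).add(sortedDocIds);
        termToPostings.put(term, sortedDocIds);
    }

    public int[] postings(String term) { return termToPostings.get(term); }

    public static void main(String[] args) {
        PostingsDedupWriter w = new PostingsDedupWriter();
        w.addTerm("dogs", new int[]{1, 4, 7}); // original term
        w.addTerm("sgod", new int[]{1, 4, 7}); // reversed term, identical postings
        // Both terms now point to the very same stored list instance.
        System.out.println(w.postings("dogs") == w.postings("sgod")); // true
    }
}
```

A real implementation would hash the encoded postings bytes incrementally rather than hold full arrays in memory, but the detect-then-verify shape is the same.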
[jira] [Comment Edited] (LUCENE-5422) Postings lists deduplication
[ https://issues.apache.org/jira/browse/LUCENE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13921155#comment-13921155 ] Dmitry Kan edited comment on LUCENE-5422 at 3/5/14 6:23 PM: [~Vishmi Money] LUCENE-2082 deals with segment merging, which is a _process_ performed on a Lucene index every now and then. This jira deals with the index _structure_ and suggests that compression of the index can be achieved for certain (described) use cases. While these jiras are related, this jira can be considered a standalone project in itself. Perhaps [~otis] could add something?
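Mike McCandless's read-time suggestion in the thread above, one term expanding to the union of N posting lists, amounts to a merge sort of N sorted doc-ID streams. A hypothetical sketch (plain arrays, not Lucene's DocsEnum API):

```java
import java.util.*;

// Hypothetical sketch: the union a single expanded term would enumerate,
// e.g. a stem expanding to all of its surface forms. A min-heap of
// (docId, listIndex, offset) entries advances the smallest head; a doc ID
// present in several lists is emitted once.
public class PostingsUnion {
    public static int[] union(int[]... lists) {
        PriorityQueue<int[]> heap = new PriorityQueue<>((x, y) -> Integer.compare(x[0], y[0]));
        for (int i = 0; i < lists.length; i++)
            if (lists[i].length > 0) heap.add(new int[]{lists[i][0], i, 0});
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] e = heap.poll();
            if (out.isEmpty() || out.get(out.size() - 1) != e[0]) out.add(e[0]); // skip duplicates
            int next = e[2] + 1;
            if (next < lists[e[1]].length) heap.add(new int[]{lists[e[1]][next], e[1], next});
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        // "run" (stemmed) unioned with postings of surface forms "runs", "running":
        System.out.println(Arrays.toString(union(new int[]{2, 5}, new int[]{3, 5}, new int[]{1})));
        // -> [1, 2, 3, 5]
    }
}
```

This is the shape a runtime-only wrapper around FieldsProducer would take; merging positions as well would require the same heap keyed by (doc, position).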
[jira] [Commented] (LUCENE-5422) Postings lists deduplication
[ https://issues.apache.org/jira/browse/LUCENE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13921155#comment-13921155 ] Dmitry Kan commented on LUCENE-5422: [~Vishmi Money] LUCENE-2082 deals with segment merging, which is a _process_ performed on a Lucene index every now and then. This jira deals with the index _structure_ and suggests that compression of the index can be achieved for certain (described) use cases. While these jiras are related, this jira can be considered a standalone project in itself. Perhaps [~otis] could add something?
[jira] [Commented] (SOLR-5697) Delete by query does not work properly with customly configured query parser
[ https://issues.apache.org/jira/browse/SOLR-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900240#comment-13900240 ] Dmitry Kan commented on SOLR-5697: -- Hoss: thanks for looking into this. I can confirm all test cases work fine with Solr 4.7 (solr-4.7-2014-02-12_02-54-24.tgz). I'm guessing there is very little chance this gets backported to Solr 4.3.1? BTW, using the exact same configs didn't produce an NPE for Solr 4.7 (it gets thrown for 4.6.1, however, as you said). Delete by query does not work properly with customly configured query parser Key: SOLR-5697 URL: https://issues.apache.org/jira/browse/SOLR-5697 Project: Solr Issue Type: Bug Components: query parsers, update Affects Versions: 4.3.1 Reporter: Dmitry Kan Fix For: 5.0, 4.7 Attachments: query_parser_maven_project.tgz, shard.tgz The shard with the configuration illustrating the issue is attached. Since the size of the archive exceeds the upload limit, I have dropped the solr.war from the webapps directory. Please add it (Solr 4.3.1). Also attached is an example query parser Maven project. The binary has already been deployed into the lib directory of each core. Start the shard using startUp_multicore.sh.

1. curl 'http://localhost:8983/solr/metadata/update?commit=false&debugQuery=on' --data-binary '<delete><query>Title:this_title</query></delete>' -H 'Content-type:text/xml'

This query produces an exception:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">400</int><int name="QTime">33</int></lst><lst name="error"><str name="msg">Unknown query parser 'lucene'</str><int name="code">400</int></lst>
</response>

2. Change multicore/metadata/solrconfig.xml and multicore/statements/solrconfig.xml by uncommenting the defType parameters on <requestHandler name="/select">. Issue the same query. The result is the same:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">400</int><int name="QTime">30</int></lst><lst name="error"><str name="msg">Unknown query parser 'lucene'</str><int name="code">400</int></lst>
</response>

3. Keep the same config as in 2. and specify the query parser in the local params:

curl 'http://localhost:8983/solr/metadata/update?commit=false&debugQuery=on' --data-binary '<delete><query>{!qparser1}Title:this_title</query></delete>' -H 'Content-type:text/xml'

The result:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">400</int><int name="QTime">3</int></lst><lst name="error"><str name="msg">no field name specified in query and no default specified via 'df' param</str><int name="code">400</int></lst>
</response>

The reason is that our query parser misbehaves in that it removes colons from the input queries => on the server side we get:

Modified input query: Title:this_title -> Titlethis_title
5593 [qtp2121668094-15] INFO org.apache.solr.update.processor.LogUpdateProcessor – [metadata] webapp=/solr path=/update params={debugQuery=on&commit=false} {} 0 31
5594 [qtp2121668094-15] ERROR org.apache.solr.core.SolrCore – org.apache.solr.common.SolrException: no field name specified in query and no default specified via 'df' param
at org.apache.solr.parser.SolrQueryParserBase.checkNullField(SolrQueryParserBase.java:924)
at org.apache.solr.parser.SolrQueryParserBase.getFieldQuery(SolrQueryParserBase.java:944)
at org.apache.solr.parser.SolrQueryParserBase.handleBareTokenQuery(SolrQueryParserBase.java:765)
at org.apache.solr.parser.QueryParser.Term(QueryParser.java:300)
at org.apache.solr.parser.QueryParser.Clause(QueryParser.java:186)
at org.apache.solr.parser.QueryParser.Query(QueryParser.java:108)
at org.apache.solr.parser.QueryParser.TopLevelQuery(QueryParser.java:97)
at org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBase.java:160)
at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:72)
at org.apache.solr.search.QParser.getQuery(QParser.java:142)
at org.apache.solr.update.DirectUpdateHandler2.getQuery(DirectUpdateHandler2.java:319)
at org.apache.solr.update.DirectUpdateHandler2.deleteByQuery(DirectUpdateHandler2.java:349)
at org.apache.solr.update.processor.RunUpdateProcessor.processDelete(RunUpdateProcessorFactory.java:80)
at org.apache.solr.update.processor.UpdateRequestProcessor.processDelete(UpdateRequestProcessor.java:55)
at org.apache.solr.update.processor.DistributedUpdateProcessor.doDeleteByQuery(DistributedUpdateProcessor.java:931)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processDelete(DistributedUpdateProcessor.java:772)
at org.apache.solr.update.processor.LogUpdateProcessor.processDelete(LogUpdateProcessorFactory.java:121)
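The root cause reported above, the custom parser stripping colons, can be illustrated without any Solr dependency. The `splitFieldQuery` helper below is hypothetical: once the colon is removed, the field/term boundary is unrecoverable, so a lucene-style parser sees a bare token and must fall back to the 'df' default field, failing with exactly the "no field name specified" error when no default is configured.

```java
import java.util.Arrays;

// Illustration of the reported bug: stripping the colon from
// "Title:this_title" destroys the field/term boundary, leaving a bare
// token that can only be searched against a default ('df') field.
public class ColonStripDemo {
    /** Returns {field, term}, or {null, input} when no colon is present. */
    public static String[] splitFieldQuery(String q) {
        int i = q.indexOf(':');
        return i < 0 ? new String[]{null, q}
                     : new String[]{q.substring(0, i), q.substring(i + 1)};
    }

    public static void main(String[] args) {
        String original = "Title:this_title";
        String mangled  = original.replace(":", ""); // what the misbehaving parser did
        System.out.println(Arrays.toString(splitFieldQuery(original))); // [Title, this_title]
        System.out.println(Arrays.toString(splitFieldQuery(mangled)));  // [null, Titlethis_title]
    }
}
```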
[jira] [Closed] (SOLR-5697) Delete by query does not work properly with customly configured query parser
[ https://issues.apache.org/jira/browse/SOLR-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Kan closed SOLR-5697. Works as expected with Solr 4.7; see the previous comment.
[jira] [Created] (SOLR-5697) Delete by query does not work properly with customly configured query parser
Dmitry Kan created SOLR-5697: Summary: Delete by query does not work properly with customly configured query parser Key: SOLR-5697 URL: https://issues.apache.org/jira/browse/SOLR-5697 Project: Solr Issue Type: Bug Components: query parsers, update Affects Versions: 4.3.1 Reporter: Dmitry Kan Attachments: query_parser_maven_project.tgz
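For reference, the defType wiring mentioned in step 2 of the report looks roughly like this in solrconfig.xml. The handler and parser names (`/select`, `qparser1`) come from the report; the surrounding config is assumed, so treat this as a sketch rather than the attached shard's exact config:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- route /select queries through the custom parser plugin -->
    <str name="defType">qparser1</str>
  </lst>
</requestHandler>
```

Note that a delete-by-query arrives at /update, not /select, so the /select defaults do not apply to it; that is why step 3 of the report falls back to specifying the parser in local params instead.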
[jira] [Updated] (SOLR-5697) Delete by query does not work properly with customly configured query parser
[ https://issues.apache.org/jira/browse/SOLR-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Kan updated SOLR-5697: - Attachment: query_parser_maven_project.tgz
[jira] [Updated] (SOLR-5697) Delete by query does not work properly with customly configured query parser
[ https://issues.apache.org/jira/browse/SOLR-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Kan updated SOLR-5697: - Description: The shard with the configuration illustrating the issue is attached. Since the size of the archive exceeds the upload limit, I have dropped the solr.war from the webapps directory. Please add it (Solr 4.3.1).
[jira] [Updated] (SOLR-5697) Delete by query does not work properly with customly configured query parser
[ https://issues.apache.org/jira/browse/SOLR-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Kan updated SOLR-5697: - Attachment: shard.tgz (shard with config files, without the solr.war file)

Delete by query does not work properly with customly configured query parser
Key: SOLR-5697
URL: https://issues.apache.org/jira/browse/SOLR-5697
Project: Solr
Issue Type: Bug
Components: query parsers, update
Affects Versions: 4.3.1
Reporter: Dmitry Kan
Attachments: query_parser_maven_project.tgz, shard.tgz

The shard with the configuration illustrating the issue is attached. Since the size of the archive exceeds the upload limit, I have dropped the solr.war from webapps. Please add it back (Solr 4.3.1). Also attached is an example query parser maven project. The binary has already been deployed into the lib directory of each core. Start the shard using startUp_multicore.sh.

1. curl 'http://localhost:8983/solr/metadata/update?commit=false&debugQuery=on' --data-binary '<delete><query>Title:this_title</query></delete>' -H 'Content-type:text/xml'

This query produces an exception:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">400</int><int name="QTime">33</int></lst><lst name="error"><str name="msg">Unknown query parser 'lucene'</str><int name="code">400</int></lst>
</response>

2. Change multicore/metadata/solrconfig.xml and multicore/statements/solrconfig.xml by uncommenting the defType parameter on <requestHandler name="/select">. Issue the same query. The result is the same:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">400</int><int name="QTime">30</int></lst><lst name="error"><str name="msg">Unknown query parser 'lucene'</str><int name="code">400</int></lst>
</response>

3. Keep the same config as in 2. and specify the query parser in the local params:

curl 'http://localhost:8983/solr/metadata/update?commit=false&debugQuery=on' --data-binary '<delete><query>{!qparser1}Title:this_title</query></delete>' -H 'Content-type:text/xml'

The result:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">400</int><int name="QTime">3</int></lst><lst name="error"><str name="msg">no field name specified in query and no default specified via 'df' param</str><int name="code">400</int></lst>
</response>

The reason is that our query parser misbehaves: it removes colons from the input queries, so on the server side we get:

Modified input query: Title:this_title ---> Titlethis_title

5593 [qtp2121668094-15] INFO org.apache.solr.update.processor.LogUpdateProcessor – [metadata] webapp=/solr path=/update params={debugQuery=on&commit=false} {} 0 31
5594 [qtp2121668094-15] ERROR org.apache.solr.core.SolrCore – org.apache.solr.common.SolrException: no field name specified in query and no default specified via 'df' param
at org.apache.solr.parser.SolrQueryParserBase.checkNullField(SolrQueryParserBase.java:924)
at org.apache.solr.parser.SolrQueryParserBase.getFieldQuery(SolrQueryParserBase.java:944)
at org.apache.solr.parser.SolrQueryParserBase.handleBareTokenQuery(SolrQueryParserBase.java:765)
at org.apache.solr.parser.QueryParser.Term(QueryParser.java:300)
at org.apache.solr.parser.QueryParser.Clause(QueryParser.java:186)
at org.apache.solr.parser.QueryParser.Query(QueryParser.java:108)
at org.apache.solr.parser.QueryParser.TopLevelQuery(QueryParser.java:97)
at org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBase.java:160)
at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:72)
at org.apache.solr.search.QParser.getQuery(QParser.java:142)
at org.apache.solr.update.DirectUpdateHandler2.getQuery(DirectUpdateHandler2.java:319)
at org.apache.solr.update.DirectUpdateHandler2.deleteByQuery(DirectUpdateHandler2.java:349)
at org.apache.solr.update.processor.RunUpdateProcessor.processDelete(RunUpdateProcessorFactory.java:80)
at org.apache.solr.update.processor.UpdateRequestProcessor.processDelete(UpdateRequestProcessor.java:55)
at org.apache.solr.update.processor.DistributedUpdateProcessor.doDeleteByQuery(DistributedUpdateProcessor.java:931)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processDelete(DistributedUpdateProcessor.java:772)
at org.apache.solr.update.processor.LogUpdateProcessor.processDelete(LogUpdateProcessorFactory.java:121)
at org.apache.solr.handler.loader.XMLLoader.processDelete(XMLLoader.java:346)
at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:277)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
at
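The third failure mode above can be reproduced in miniature without Solr. This is a hypothetical sketch (the class and method names are illustrative, not Solr code): a parser rewrite step that strips colons turns a fielded delete query into a bare term, which is exactly why the downstream Lucene parser then complains that no field was specified and no 'df' default exists.

```java
public class ColonStrippingDemo {

    // Mimics the misbehaving custom parser described above: removes every ':'.
    static String faultyRewrite(String q) {
        return q.replace(":", "");
    }

    // Mimics the downstream check that triggers "no field name specified
    // in query and no default specified via 'df' param".
    static boolean hasExplicitField(String q) {
        return q.indexOf(':') >= 0;
    }

    public static void main(String[] args) {
        String original = "Title:this_title";
        String rewritten = faultyRewrite(original);
        System.out.println(original + " ---> " + rewritten);
        // The field prefix is gone after the rewrite, so the query is a bare
        // term and field resolution falls back to the (absent) 'df' param.
        assert hasExplicitField(original);
        assert !hasExplicitField(rewritten);
    }
}
```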
[jira] [Created] (LUCENE-5422) Postings lists deduplication
Dmitry Kan created LUCENE-5422: --
Summary: Postings lists deduplication
Key: LUCENE-5422
URL: https://issues.apache.org/jira/browse/LUCENE-5422
Project: Lucene - Core
Issue Type: Improvement
Components: core/codecs, core/index
Reporter: Dmitry Kan

The context: http://markmail.org/thread/tywtrjjcfdbzww6f

Robert Muir and I discussed what Robert eventually named postings lists deduplication at the Berlin Buzzwords 2013 conference. The idea is to allow multiple terms to point to the same postings list to save space. This could be achieved by a new index codec implementation, but this jira is open to other ideas as well. The application / impact of this is positive for synonyms, exact / inexact terms, leading wildcard support via storing reversed terms, etc. For example, at the moment, when supporting exact (unstemmed) and inexact (stemmed) searches, we store both the unstemmed and the stemmed variant of a word form, and that leads to index bloat. For the same index-size reasons we had to remove leading wildcard support via reversing a token at index and query time.

Comment from Mike McCandless: Neat idea! Would this idea allow a single term to point to (the union of) N other postings lists? It seems like that's necessary e.g. to handle the exact/inexact case. And then, to produce the DocsAndPositionsEnum you'd need to do a merge sort across those N postings lists? Such a thing might also be do-able as a runtime-only wrapper around the postings API (FieldsProducer), if you could at runtime do the reverse expansion (e.g. stem -> all of its surface forms).

Comment from Robert Muir: I think the exact/inexact case is trickier (detecting it would be the hard part), and you are right, another solution might work better. But for the reverse wildcard and synonyms situation, it seems we could even detect it on write if we created some hash of the previous term's postings. If the hash matches for the current term, we know it might be a duplicate and would have to actually do the costly check that they are the same. Maybe there are better ways to do it, but it might be a fun postings-format experiment to try.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
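Robert's write-time detection scheme can be sketched with plain collections. This is only a toy model of the idea, not a Lucene PostingsFormat: a hash of each term's postings identifies candidate duplicates, a full equality check guards against hash collisions, and duplicate terms then share one postings array instead of storing two copies.

```java
import java.util.*;

public class PostingsDedupSketch {
    // term -> postings (sorted doc ids); duplicate terms share the same array.
    private final Map<String, int[]> postingsByTerm = new HashMap<>();
    // hash of a postings list -> terms already written with that hash
    private final Map<Integer, List<String>> byHash = new HashMap<>();

    void write(String term, int[] postings) {
        int h = Arrays.hashCode(postings);
        for (String prev : byHash.getOrDefault(h, Collections.emptyList())) {
            // A matching hash only means "might be a duplicate"; confirm
            // with the costly full comparison before sharing.
            if (Arrays.equals(postingsByTerm.get(prev), postings)) {
                postingsByTerm.put(term, postingsByTerm.get(prev)); // share
                return;
            }
        }
        postingsByTerm.put(term, postings);
        byHash.computeIfAbsent(h, k -> new ArrayList<>()).add(term);
    }

    boolean shared(String a, String b) {
        // Same object identity == one stored postings list for both terms.
        return postingsByTerm.get(a) == postingsByTerm.get(b);
    }

    public static void main(String[] args) {
        PostingsDedupSketch w = new PostingsDedupSketch();
        w.write("run", new int[]{1, 5, 9});
        w.write("running", new int[]{1, 5, 9}); // stem/synonym variant
        w.write("walk", new int[]{2, 3});
        assert w.shared("run", "running");
        assert !w.shared("run", "walk");
    }
}
```

In a real codec the "share" step would be a pointer to an already-written term's postings offset rather than a shared array.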
[jira] [Updated] (LUCENE-5422) Postings lists deduplication
[ https://issues.apache.org/jira/browse/LUCENE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Kan updated LUCENE-5422: --- Labels: gsoc2014 (was: )
Postings lists deduplication
Key: LUCENE-5422
URL: https://issues.apache.org/jira/browse/LUCENE-5422
Project: Lucene - Core
Issue Type: Improvement
Components: core/codecs, core/index
Reporter: Dmitry Kan
Labels: gsoc2014
[jira] [Updated] (SOLR-5394) facet.method=fcs seems to be using threads when it shouldn't
[ https://issues.apache.org/jira/browse/SOLR-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Kan updated SOLR-5394: - Attachment: SOLR-5394_keep_threads_original_value.patch

While debugging with facet.threads=0 I noticed that by the time we advance to the parseParams method, threads=0, and this method resets it to -1, which breaks the later logic. So I added a condition around threads=-1. I would be happy if someone could review this little patch and give feedback.

facet.method=fcs seems to be using threads when it shouldn't
Key: SOLR-5394
URL: https://issues.apache.org/jira/browse/SOLR-5394
Project: Solr
Issue Type: Bug
Affects Versions: 4.6
Reporter: Michael McCandless
Attachments: SOLR-5394_keep_threads_original_value.patch

I built a wikipedia index, with multiple fields for faceting. When I do facet.method=fcs with facet.field=dateFacet and facet.field=userNameFacet, and then kill -QUIT the java process, I see that a bunch (46, I think) of facetExecutor-7-thread-N threads had spun up. But I thought threads for each field are turned off by default? Even if I add facet.threads=0, it still spins up all the threads. I think something is wrong in SimpleFacets.parseParams; somehow, that method returns early (because localParams is null), leaving threads=-1, and then the later code that would have set threads to 0 never runs.
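The bug and the patch's guard can be condensed into a toy model. This is a hypothetical reduction of the flow described above, not the actual SimpleFacets code: the early-return path used to clobber an explicitly requested facet.threads=0 with the "unset" marker -1, which later logic treats as "spin up a thread per field".

```java
public class FacetThreadsGuard {
    // As described in the report: the early-return path unconditionally
    // resets threads, discarding an explicit facet.threads=0.
    static int buggyResolve(Integer facetThreads) {
        int threads = (facetThreads == null) ? -1 : facetThreads;
        threads = -1; // unconditional reset on the early-return path
        return threads;
    }

    // The patched behaviour: keep the caller's original value and only
    // fall back to -1 ("unset") when nothing was requested at all.
    static int patchedResolve(Integer facetThreads) {
        if (facetThreads != null) return facetThreads;
        return -1;
    }

    public static void main(String[] args) {
        assert buggyResolve(0) == -1;  // explicit 0 lost -> threads spin up
        assert patchedResolve(0) == 0; // explicit 0 kept -> threading stays off
        assert patchedResolve(null) == -1;
    }
}
```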
[jira] [Updated] (SOLR-5394) facet.method=fcs seems to be using threads when it shouldn't
[ https://issues.apache.org/jira/browse/SOLR-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Kan updated SOLR-5394: - Attachment: (was: SOLR-5394_keep_threads_original_value.patch)
facet.method=fcs seems to be using threads when it shouldn't
Key: SOLR-5394
URL: https://issues.apache.org/jira/browse/SOLR-5394
Project: Solr
Issue Type: Bug
Affects Versions: 4.6
Reporter: Michael McCandless
Attachments: SOLR-5394_keep_threads_original_value.patch
[jira] [Updated] (SOLR-5394) facet.method=fcs seems to be using threads when it shouldn't
[ https://issues.apache.org/jira/browse/SOLR-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Kan updated SOLR-5394: - Attachment: SOLR-5394_keep_threads_original_value.patch
facet.method=fcs seems to be using threads when it shouldn't
Key: SOLR-5394
URL: https://issues.apache.org/jira/browse/SOLR-5394
Project: Solr
Issue Type: Bug
Affects Versions: 4.6
Reporter: Michael McCandless
Attachments: SOLR-5394_keep_threads_original_value.patch
[jira] [Commented] (SOLR-1604) Wildcards, ORs etc inside Phrase Queries
[ https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845661#comment-13845661 ] Dmitry Kan commented on SOLR-1604: -- [~rebeccatang] you can define a Solr core (even for a single index) and copy the ComplexPhrase parser jar into its lib directory. https://cwiki.apache.org/confluence/display/solr/Solr+Cores+and+solr.xml HTH

Wildcards, ORs etc inside Phrase Queries
Key: SOLR-1604
URL: https://issues.apache.org/jira/browse/SOLR-1604
Project: Solr
Issue Type: Improvement
Components: query parsers, search
Affects Versions: 1.4
Reporter: Ahmet Arslan
Priority: Minor
Attachments: ASF.LICENSE.NOT.GRANTED--ComplexPhrase.zip, ComplexPhrase-4.2.1.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhraseQueryParser.java, ComplexPhrase_solr_3.4.zip, SOLR-1604-alternative.patch, SOLR-1604.patch, SOLR-1604.patch

Solr Plugin for ComplexPhraseQueryParser (LUCENE-1486) which supports wildcards, ORs, ranges, fuzzies inside phrase queries.
[jira] [Commented] (SOLR-1726) Deep Paging and Large Results Improvements
[ https://issues.apache.org/jira/browse/SOLR-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795092#comment-13795092 ] Dmitry Kan commented on SOLR-1726: -- [~sstults] Thanks for the use case. This leans towards offline use as well, but certainly makes sense. Our current use case is realtime, though, and we are attacking the problem of deep paging differently at the moment (on the querying client side).

Deep Paging and Large Results Improvements
Key: SOLR-1726
URL: https://issues.apache.org/jira/browse/SOLR-1726
Project: Solr
Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
Fix For: 4.6
Attachments: CommonParams.java, QParser.java, QueryComponent.java, ResponseBuilder.java, SOLR-1726.patch, SOLR-1726.patch, SolrIndexSearcher.java, TopDocsCollector.java, TopScoreDocCollector.java

There are possibly ways to improve collection of deep-paging results by passing Solr/Lucene more information about the last page of results seen, thereby saving priority queue operations. See LUCENE-2215. There may also be better options for retrieving large numbers of rows at a time that are worth exploring. LUCENE-2127.
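The "pass information about the last page seen" idea is the cursor (search-after) approach. Here is a toy model under simplifying assumptions (an in-memory index of {docId, score} pairs, descending score with ascending doc id as tiebreaker, no distributed collection): page N+1 is defined as everything strictly after the last hit of page N, so the collector never has to buffer start+rows entries in a priority queue.

```java
import java.util.*;

public class SearchAfterSketch {
    // Returns up to `rows` hits ranked strictly after `after`
    // (null `after` means "first page"). Hits are int[]{docId, score}.
    static List<int[]> pageAfter(List<int[]> index, int[] after, int rows) {
        List<int[]> sorted = new ArrayList<>(index);
        // Rank: score descending, then doc id ascending as tiebreaker.
        sorted.sort((a, b) -> a[1] != b[1] ? b[1] - a[1] : a[0] - b[0]);
        List<int[]> page = new ArrayList<>();
        for (int[] hit : sorted) {
            boolean pastCursor = after == null
                || hit[1] < after[1]
                || (hit[1] == after[1] && hit[0] > after[0]);
            if (pastCursor) {
                page.add(hit);
                if (page.size() == rows) break;
            }
        }
        return page;
    }

    public static void main(String[] args) {
        List<int[]> index = Arrays.asList(
            new int[]{1, 50}, new int[]{2, 40},
            new int[]{3, 40}, new int[]{4, 10});
        List<int[]> page1 = pageAfter(index, null, 2);
        // The cursor is simply the last hit of the previous page.
        List<int[]> page2 = pageAfter(index, page1.get(1), 2);
        assert page1.get(0)[0] == 1 && page1.get(1)[0] == 2;
        assert page2.get(0)[0] == 3 && page2.get(1)[0] == 4;
    }
}
```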
[jira] [Created] (SOLR-5277) Stamp core names on log entries for certain classes
Dmitry Kan created SOLR-5277:
Summary: Stamp core names on log entries for certain classes
Key: SOLR-5277
URL: https://issues.apache.org/jira/browse/SOLR-5277
Project: Solr
Issue Type: Bug
Components: search, update
Affects Versions: 4.4, 4.3.1, 4.5
Reporter: Dmitry Kan

It is handy that certain Java classes stamp a [coreName] on a log entry. It would be useful in a multicore setup if more classes stamped this information. In particular, we came across a situation with commits arriving in quick succession on the same multicore shard, and found it hard to figure out whether they hit the same core or different cores. The classes in question, with sample log output:

o.a.s.c.SolrCore
06:57:53.577 [qtp1640764503-13617] INFO org.apache.solr.core.SolrCore - SolrDeletionPolicy.onCommit: commits:num=2
11:53:19.056 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.core.SolrCore - Soft AutoCommit: if uncommited for 1000ms;

o.a.s.u.UpdateHandler
14:45:24.447 [commitScheduler-9-thread-1] INFO org.apache.solr.update.UpdateHandler - start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
06:57:53.591 [qtp1640764503-13617] INFO org.apache.solr.update.UpdateHandler - end_commit_flush

o.a.s.s.SolrIndexSearcher
14:45:24.553 [commitScheduler-7-thread-1] INFO org.apache.solr.search.SolrIndexSearcher - Opening Searcher@1067e5a9 main

The original question was posted on #solr and on SO: http://stackoverflow.com/questions/19026577/how-to-output-solr-core-name-with-log4j
--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
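The requested behaviour amounts to carrying a per-request core name into every log line. With SLF4J/log4j this is typically done by putting the core name into the MDC and adding a %X{core} token to the conversion pattern; the self-contained sketch below stands in for that with a ThreadLocal, so the shape of the stamped output is visible without any logging dependency. All names here are illustrative.

```java
public class CoreNameLogSketch {
    // Stand-in for an MDC entry: the core name bound to the current thread.
    static final ThreadLocal<String> CORE = ThreadLocal.withInitial(() -> "?");

    // Prefixes every message with the core name, like "[metadata] ...".
    static String log(String message) {
        return "[" + CORE.get() + "] " + message;
    }

    public static void main(String[] args) {
        CORE.set("metadata");
        assert log("start commit").equals("[metadata] start commit");
        CORE.set("statements");
        assert log("end_commit_flush").equals("[statements] end_commit_flush");
    }
}
```

With a real MDC the stamping happens in the layout, so no call sites need to change, which is the attraction of that route over patching individual classes.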
[jira] [Commented] (SOLR-5200) Add REST support for reading and modifying Solr configuration
[ https://issues.apache.org/jira/browse/SOLR-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754452#comment-13754452 ] Dmitry Kan commented on SOLR-5200: -- One parameter relevant to us is mergeFactor. Add REST support for reading and modifying Solr configuration - Key: SOLR-5200 URL: https://issues.apache.org/jira/browse/SOLR-5200 Project: Solr Issue Type: New Feature Reporter: Steve Rowe Assignee: Steve Rowe There should be a REST API to allow full read access to, and write access to some elements of, Solr's per-core and per-node configuration not already covered by the Schema REST API: {{solrconfig.xml}}/{{core.properties}}/{{solrcore.properties}} and {{solr.xml}}/{{solr.properties}} (SOLR-4718 discusses addition of {{solr.properties}}). Use cases for runtime configuration modification include scripted setup, troubleshooting, and tuning. Tentative rules-of-thumb about configuration items that should not be modifiable at runtime: # Startup-only items, e.g. where to start core discovery # Items that are deprecated in 4.X and will be removed in 5.0 # Items that if modified should be followed by a full re-index Some issues to consider: Persistence: How (and even whether) to handle persistence for configuration modifications via REST API is not clear - e.g. persisting the entire config file or having one or more sidecar config files that get persisted. The extent of what should be modifiable will likely affect how persistence is implemented. For example, if the only {{solrconfig.xml}} modifiable items turn out to be plugin configurations, an alternative to full-{{solrconfig.xml}} persistence could be individual plugin registration of runtime config modifiable items, along with per-plugin sidecar config persistence. Live reload: Most (if not all) per-core configuration modifications will require core reload, though it will be a live reload, so some things won't be modifiable, e.g. 
{{dataDir}} and {{IndexWriter}} related settings in {{indexConfig}} - see SOLR-3592. (Should a full reload be supported to handle changes in these places?) Interpolation aka property substitution: I think it would be useful on read access to optionally return raw values in addition to the interpolated values, e.g. {{solr.xml}} {{hostPort}} raw value {{$\{jetty.port:8983}}} vs. interpolated value {{8983}}. Modification requests will accept raw values - property interpolation will be applied. At present interpolation is done once, at parsing time, but if property value modification is supported via the REST API, an alternative could be to delay interpolation until values are requested; in this way, property value modification would not trigger re-parsing the affected configuration source. Response format: Similarly to the schema REST API, results could be returned in XML, JSON, or any other response writer's output format. Transient cores: How should non-loaded transient cores be handled? Simplest thing would be to load the transient core before handling the request, just like other requests. Below I provide an exhaustive list of configuration items in the files in question and indicate which ones I think could be modifiable at runtime. I don't mean to imply that these must all be made modifiable, or for those that are made modifiable, that they must be made so at once - a piecemeal approach will very likely be more appropriate. h2. {{solrconfig.xml}} Note that XIncludes and includes via Document Entities won't survive a modification request (assuming persistence is via overwriting the original file). 
||XPath under {{/config/}}||Should be modifiable via REST API?||Rationale||Description||
|{{luceneMatchVersion}}|No|Modifying this should be followed by a full re-index|Controls what version of Lucene various components of Solr adhere to|
|{{lib}}|Yes|Required for adding plugins at runtime|Contained jars available via classloader for {{solrconfig.xml}} and {{schema.xml}}|
|{{dataDir}}|No|Not supported by live RELOAD|Holds all index data|
|{{directoryFactory}}|No|Not supported by live RELOAD|index directory factory|
|{{codecFactory}}|No|Modifying this should be followed by a full re-index|index codec factory, per-field SchemaCodecFactory by default|
|{{schemaFactory}}|Partial|Although the class shouldn't be modifiable, it should be possible to modify an already Managed schema's mutability|Managed or Classic (non-mutable) schema factory|
|{{indexConfig}}|No|{{IndexWriter}}-related settings not supported by live RELOAD|low-level indexing behavior|
|{{jmx}}|Yes| |Enables JMX if an MBeanServer is found|
|{{updateHandler@class}}|No| |Defaults to DirectUpdateHandler2|
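The raw-vs-interpolated distinction discussed above (e.g. raw {{${jetty.port:8983}}} vs. interpolated {{8983}}) comes down to resolving ${name:default} references against a property map. A minimal sketch of such a resolver, under the assumption that references use the ${name} and ${name:default} forms only (this is not Solr's actual substitution code):

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InterpolationSketch {
    // Matches ${name} and ${name:default}; group 1 = name, group 2 = default.
    private static final Pattern REF =
        Pattern.compile("\\$\\{([^:}]+)(?::([^}]*))?\\}");

    static String interpolate(String raw, Map<String, String> props) {
        Matcher m = REF.matcher(raw);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = props.get(m.group(1));
            // Fall back to the inline default (or empty) when unset.
            if (value == null) value = (m.group(2) == null) ? "" : m.group(2);
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Unset property: the inline default wins.
        assert interpolate("${jetty.port:8983}", Map.of()).equals("8983");
        // Set property: the property wins over the default.
        assert interpolate("${jetty.port:8983}",
            Map.of("jetty.port", "9090")).equals("9090");
    }
}
```

Delaying this step until values are requested (rather than at parse time) is what would let property modification avoid re-parsing the configuration source.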
[jira] [Updated] (SOLR-4903) Solr sends all doc ids to all shards in the query counting facets
[ https://issues.apache.org/jira/browse/SOLR-4903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Kan updated SOLR-4903: - Affects Version/s: 4.3.1

Solr sends all doc ids to all shards in the query counting facets
Key: SOLR-4903
URL: https://issues.apache.org/jira/browse/SOLR-4903
Project: Solr
Issue Type: Improvement
Components: search
Affects Versions: 3.4, 4.3, 4.3.1
Reporter: Dmitry Kan

Setup: front-end solr and shards.
Summary: the solr front end sends all doc ids received from QueryComponent to all shards, which causes a POST request buffer size overflow.
Symptoms: The query is: http://pastebin.com/0DndK1Cs (I have omitted the shards parameter). The router log: http://pastebin.com/FTVH1WF3 Notice the port of the affected shard; that port changes all the time, even for the same request. The log entry is prepended with the lines: SEVERE: org.apache.solr.common.SolrException: Internal Server Error Internal Server Error (they are not in the pastebin link) The shard log: http://pastebin.com/exwCx3LX
Suggestion: change the data structure in FacetComponent to send each shard only the doc ids that belong to it, rather than a concatenation of all doc ids.
Why is this important: for scaling. Adding more shards will otherwise overflow the POST request buffer at some point anyway.
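The suggested change is a routing problem: group ids by owning shard and send each shard its own group, so per-shard payload stays roughly total/numShards instead of the full concatenation. A toy model, assuming a simple hash-based assignment of ids to shards (real Solr routing is more involved; names here are illustrative):

```java
import java.util.*;

public class PerShardIdsSketch {
    // Partitions doc ids so each shard receives only the ids it hosts.
    static Map<Integer, List<String>> partition(List<String> docIds, int shards) {
        Map<Integer, List<String>> byShard = new HashMap<>();
        for (String id : docIds) {
            // floorMod keeps the shard index non-negative for any hash.
            int shard = Math.floorMod(id.hashCode(), shards);
            byShard.computeIfAbsent(shard, k -> new ArrayList<>()).add(id);
        }
        return byShard;
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> byShard =
            partition(Arrays.asList("a", "b", "c", "d", "e"), 3);
        // Every id goes to exactly one shard...
        int total = byShard.values().stream().mapToInt(List::size).sum();
        assert total == 5;
        // ...and no shard receives the full concatenation of all ids.
        for (List<String> ids : byShard.values()) {
            assert ids.size() < 5;
        }
    }
}
```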
[jira] [Commented] (SOLR-1726) Deep Paging and Large Results Improvements
[ https://issues.apache.org/jira/browse/SOLR-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13686964#comment-13686964 ] Dmitry Kan commented on SOLR-1726:
--
"Scrolling is not intended for real time user requests, it is intended for cases like scrolling over large portions of data that exists within elasticsearch to reindex it for example."
Are there any other applications for this besides re-indexing? Also, is it known how the scrolling is implemented internally, i.e. is it efficient in transferring only what is needed to the client?

> Deep Paging and Large Results Improvements
> ------------------------------------------
>
> Key: SOLR-1726
> URL: https://issues.apache.org/jira/browse/SOLR-1726
> Project: Solr
> Issue Type: Improvement
> Reporter: Grant Ingersoll
> Assignee: Grant Ingersoll
> Priority: Minor
> Fix For: 4.4
> Attachments: CommonParams.java, QParser.java, QueryComponent.java, ResponseBuilder.java, SOLR-1726.patch, SOLR-1726.patch, SolrIndexSearcher.java, TopDocsCollector.java, TopScoreDocCollector.java
>
> There are possibly ways to improve collection of deep-paging results by passing Solr/Lucene more information about the last page of results seen, thereby saving priority queue operations. See LUCENE-2215. There may also be better options, worth exploring, for retrieving large numbers of rows at a time. See LUCENE-2127.
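For context on why cursor-style deep paging is cheaper than offset paging: instead of collecting offset+rows entries and discarding the prefix, only hits ranking after the cursor compete for a rows-sized priority queue. A toy sketch of that idea, with plain integer scores standing in for real hits (this is not the Lucene or Elasticsearch API):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Toy model of cursor ("search after") paging: only hits scoring strictly
// below the cursor compete for a page-sized priority queue, so no
// offset-sized prefix is ever collected and thrown away.
public class SearchAfterSketch {
    static List<Integer> nextPage(int[] scores, int afterScore, int rows) {
        PriorityQueue<Integer> pq = new PriorityQueue<>(); // min-heap, capped at `rows`
        for (int s : scores) {
            if (s >= afterScore) continue; // at or before the cursor: skip cheaply
            pq.offer(s);
            if (pq.size() > rows) pq.poll(); // evict the weakest candidate
        }
        List<Integer> page = new ArrayList<>(pq);
        page.sort(Collections.reverseOrder()); // best first
        return page;
    }
}
```

The queue size stays bounded by the page size no matter how deep the client pages, which is the property both Lucene's search-after approach and Elasticsearch's scrolling exploit.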
[jira] [Commented] (LUCENE-2082) Performance improvement for merging posting lists
[ https://issues.apache.org/jira/browse/LUCENE-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13683303#comment-13683303 ] Dmitry Kan commented on LUCENE-2082:
Hi [~whzz], would you potentially be interested in another postings-list idea that came up recently? http://markmail.org/message/6ro7bbez3v3y5mfx#query:+page:1+mid:tywtrjjcfdbzww6f+state:results It could have quite a high impact on index size, and it should be relatively easy to start an experiment using the Lucene codec technology. Just in case you are interested.

> Performance improvement for merging posting lists
> -------------------------------------------------
>
> Key: LUCENE-2082
> URL: https://issues.apache.org/jira/browse/LUCENE-2082
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/index
> Reporter: Michael Busch
> Priority: Minor
> Labels: gsoc2013
> Fix For: 4.4
>
> A while ago I had an idea about how to improve the merge performance for posting lists. This is currently by far the most expensive part of segment merging, due to all the VInt de-/encoding. Not sure if an idea for improving this was already mentioned in the past?
> The basic idea is to perform a raw copy of as much posting data as possible. The reason this is difficult is that we have to remove deleted documents. But often the fraction of deleted docs in a segment is rather low (10%?), so it's likely that there are quite long consecutive sections without any deletions. To find these sections we could use the skip lists. Basically, at any point during the merge we would find the skip entry before the next deleted doc. All entries up to this point can be copied without de-/encoding of the VInts. Then, for the section that has deleted docs, we perform the normal way of merging to remove the deletes. Then we check again with the skip lists whether we can raw-copy the next section.
> To make this work, a few different changes are necessary:
> 1) Currently the multi-level skip-list reader/writer can only deal with fixed-size skips (16 on the lowest level). It would be an easy change to allow variable-size skips, but then the MultiLevelSkipListReader can't return numSkippedDocs anymore, which SegmentTermDocs needs (see change 2).
> 2) Store the last docID in which a term occurred in the term dictionary. This would also be beneficial for other use cases. By doing that, SegmentTermDocs#next(), #read() and #skipTo() know when the end of the posting list is reached. Currently they have to track the df, which is why after a skip it's important to take numSkippedDocs into account.
> 3) Change the merging algorithm according to the description above. It's important to create a new skip-list entry at the beginning of every block that is copied in raw mode, because the next skip entry's values are deltas from the beginning of the block. Also, the very first posting, and that one only, needs to be decoded/encoded to make sure that the payload length is explicitly written (i.e. it must not depend on the previous length). Such a skip entry also has to be created at the beginning of each source segment's posting list. With change 2) we don't have to worry about the positions of the skip entries, and having a few extra skip entries in merged segments won't hurt much.
> If a segment has no deletions at all, this will avoid any decoding/encoding of VInts (best case). I think it will also work well for segments with a rather low number of deletions. We should probably have a threshold: if the number of deletes exceeds it, we fall back to old-style merging.
> I haven't implemented any of this, so there might be complications I haven't thought about. Please let me know if you can think of reasons why this wouldn't work, or if you think more changes are necessary.
> I will probably not have time to work on this soon, but I wanted to open this issue so as not to forget about it :). Anyone should feel free to take this!
> Btw: I think the flex-indexing branch would be a great place to try this out as a new codec. This would also be good for figuring out what APIs are needed to make merging fully flexible as well.
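The block-skipping part of the proposal can be illustrated with a toy plan computation: walk the postings in skip-interval-sized blocks and mark any run of blocks free of deletions as raw-copyable, falling back to normal decode/re-encode merging only for blocks containing deletes. This is only a sketch of the planning step under that simplified model (the `rawRanges` helper is hypothetical, not Lucene code):

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy model of the proposed merge plan: blocks of `skipInterval` docs with
// no deletions can be copied byte-for-byte; only dirty blocks need the
// usual VInt decode/re-encode path.
public class RawCopyPlan {
    // Returns [start, end) doc ranges that are safe to raw-copy.
    static List<int[]> rawRanges(BitSet deleted, int maxDoc, int skipInterval) {
        List<int[]> ranges = new ArrayList<>();
        int runStart = -1;
        for (int start = 0; start < maxDoc; start += skipInterval) {
            int end = Math.min(start + skipInterval, maxDoc);
            int firstDelete = deleted.nextSetBit(start);
            boolean clean = firstDelete < 0 || firstDelete >= end;
            if (clean) {
                if (runStart < 0) runStart = start; // open a raw-copy run
            } else if (runStart >= 0) {
                ranges.add(new int[]{runStart, start}); // close the run before the dirty block
                runStart = -1;
            }
        }
        if (runStart >= 0) ranges.add(new int[]{runStart, maxDoc});
        return ranges;
    }
}
```

With a low deletion rate, most of the index falls into a few large raw ranges, which is exactly the best case the issue describes.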
[jira] [Created] (SOLR-4903) Solr sends all doc ids to all shards in the query counting facets
Dmitry Kan created SOLR-4903:
Summary: Solr sends all doc ids to all shards in the query counting facets
Key: SOLR-4903
URL: https://issues.apache.org/jira/browse/SOLR-4903
Project: Solr
Issue Type: Improvement
Components: search
Affects Versions: 3.4
Reporter: Dmitry Kan

Setup: front-end Solr and shards.
Summary: the Solr front end sends all doc ids received from QueryComponent to all shards, which causes a POST request buffer size overflow.
Symptoms:
The query is: http://pastebin.com/0DndK1Cs (I have omitted the shards parameter).
The router log: http://pastebin.com/FTVH1WF3
Notice the port of the affected shard; that port changes all the time, even for the same request.
The log entry is prepended with the following lines (they are not in the pastebin link):
SEVERE: org.apache.solr.common.SolrException: Internal Server Error
Internal Server Error
The shard log: http://pastebin.com/exwCx3LX
Suggestion: change the data structure in FacetComponent to send only the doc ids that belong to a shard, not a concatenation of all doc ids.
Why is this important: for scaling. Adding more shards will overflow the POST request buffer at some point anyway.
[jira] [Created] (SOLR-4904) Send internal doc ids and index version in distributed faceting to make queries more compact
Dmitry Kan created SOLR-4904:
Summary: Send internal doc ids and index version in distributed faceting to make queries more compact
Key: SOLR-4904
URL: https://issues.apache.org/jira/browse/SOLR-4904
Project: Solr
Issue Type: Improvement
Components: search
Affects Versions: 4.3, 3.4
Reporter: Dmitry Kan

This was suggested by [~ab] at the bbuzz conference 2013. It makes a lot of sense and works nicely with fixing the root cause of SOLR-4903. Basically, QueryComponent could send the internal Lucene ids along with the index version number, so that in subsequent queries to other Solr components, like FacetComponent, the internal ids would be sent. The index version is required to ensure we deal with the same index.
[jira] [Updated] (SOLR-4903) Solr sends all doc ids to all shards in the query counting facets
[ https://issues.apache.org/jira/browse/SOLR-4903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Kan updated SOLR-4903:
-
Affects Version/s: 4.3

> Solr sends all doc ids to all shards in the query counting facets
> -----------------------------------------------------------------
>
> Key: SOLR-4903
> URL: https://issues.apache.org/jira/browse/SOLR-4903
> Project: Solr
> Issue Type: Improvement
> Components: search
> Affects Versions: 3.4, 4.3
> Reporter: Dmitry Kan
>
> Setup: front-end Solr and shards.
> Summary: the Solr front end sends all doc ids received from QueryComponent to all shards, which causes a POST request buffer size overflow.
> Symptoms:
> The query is: http://pastebin.com/0DndK1Cs (I have omitted the shards parameter).
> The router log: http://pastebin.com/FTVH1WF3
> Notice the port of the affected shard; that port changes all the time, even for the same request.
> The log entry is prepended with the following lines (they are not in the pastebin link):
> SEVERE: org.apache.solr.common.SolrException: Internal Server Error
> Internal Server Error
> The shard log: http://pastebin.com/exwCx3LX
> Suggestion: change the data structure in FacetComponent to send only the doc ids that belong to a shard, not a concatenation of all doc ids.
> Why is this important: for scaling. Adding more shards will overflow the POST request buffer at some point anyway.
[jira] [Commented] (SOLR-1726) Deep Paging and Large Results Improvements
[ https://issues.apache.org/jira/browse/SOLR-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644450#comment-13644450 ] Dmitry Kan commented on SOLR-1726:
--
Does the deep paging issue apply to facet paging?

> Deep Paging and Large Results Improvements
> ------------------------------------------
>
> Key: SOLR-1726
> URL: https://issues.apache.org/jira/browse/SOLR-1726
> Project: Solr
> Issue Type: Improvement
> Reporter: Grant Ingersoll
> Assignee: Grant Ingersoll
> Priority: Minor
> Fix For: 4.3
> Attachments: CommonParams.java, QParser.java, QueryComponent.java, ResponseBuilder.java, SOLR-1726.patch, SOLR-1726.patch, SolrIndexSearcher.java, TopDocsCollector.java, TopScoreDocCollector.java
>
> There are possibly ways to improve collection of deep-paging results by passing Solr/Lucene more information about the last page of results seen, thereby saving priority queue operations. See LUCENE-2215. There may also be better options, worth exploring, for retrieving large numbers of rows at a time. See LUCENE-2127.
[jira] [Commented] (LUCENE-1486) Wildcards, ORs etc inside Phrase queries
[ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13582207#comment-13582207 ] Dmitry Kan commented on LUCENE-1486:
OK, after some study, here is what we did: we treat the AND clauses as SpanNearQuery objects, so a AND b becomes %a b%~slop, where %%~ is an unordered SpanNear operator (a change to QueryParser.jj was required for this). When there is a NOT clause with nested clauses, NOT( (a AND b) OR (c AND d) ) = NOT ( %a b%~slop OR %c d%~slop ), we need to handle SpanNearQueries in the addComplexPhraseClause method. To handle this, we added to the if statement:

[code]
if (qc instanceof BooleanQuery) {
[/code]

the following else-if branch:

[code]
} else if (childQuery instanceof SpanNearQuery) {
  ors.add((SpanQuery) childQuery);
}
[/code]

> Wildcards, ORs etc inside Phrase queries
> ----------------------------------------
>
> Key: LUCENE-1486
> URL: https://issues.apache.org/jira/browse/LUCENE-1486
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/queryparser
> Affects Versions: 2.4
> Reporter: Mark Harwood
> Priority: Minor
> Fix For: 4.2, 5.0
> Attachments: ComplexPhraseQueryParser.java, junit_complex_phrase_qp_07_21_2009.patch, junit_complex_phrase_qp_07_22_2009.patch, Lucene-1486 non default field.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, TestComplexPhraseQuery.java
>
> An extension to the default QueryParser that overrides the parsing of PhraseQueries to allow more complex syntax, e.g. wildcards in phrase queries. The implementation feels a little hacky; this is arguably better handled in QueryParser itself. This works as a proof of concept for much of the query parser syntax.
> Examples from the JUnit test include:
> checkMatches("j* smyth~", 1,2); // wildcards and fuzzies are OK in phrases
> checkMatches("(jo* -john) smith", 2); // boolean logic works
> checkMatches("jo* smith"~2, 1,2,3); // position logic works
> checkBadQuery("jo* id:1 smith"); // mixing fields in a phrase is bad
> checkBadQuery("jo* \"smith\" "); // phrases inside phrases is bad
> checkBadQuery("jo* [sma TO smZ] "); // range queries inside phrases not supported
> Code plus JUnit test to follow...
[jira] [Commented] (LUCENE-1486) Wildcards, ORs etc inside Phrase queries
[ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13580828#comment-13580828 ] Dmitry Kan commented on LUCENE-1486:
Can someone give me a hand with this parser (even though this JIRA is so old)? We need the NOT logic to work properly in the boolean sense; that is, the following should work correctly:
a AND NOT b
a AND NOT (b OR c)
a AND NOT ((b OR c) AND (d OR e))
Can anybody guide me here? Is it at all possible to accomplish this with the original CPQP implementation? I would not be afraid of changing the QueryParser.jj lexical specification, if the task requires it.

> Wildcards, ORs etc inside Phrase queries
> ----------------------------------------
>
> Key: LUCENE-1486
> URL: https://issues.apache.org/jira/browse/LUCENE-1486
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/queryparser
> Affects Versions: 2.4
> Reporter: Mark Harwood
> Priority: Minor
> Fix For: 4.2, 5.0
> Attachments: ComplexPhraseQueryParser.java, junit_complex_phrase_qp_07_21_2009.patch, junit_complex_phrase_qp_07_22_2009.patch, Lucene-1486 non default field.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, TestComplexPhraseQuery.java
>
> An extension to the default QueryParser that overrides the parsing of PhraseQueries to allow more complex syntax, e.g. wildcards in phrase queries. The implementation feels a little hacky; this is arguably better handled in QueryParser itself. This works as a proof of concept for much of the query parser syntax.
> Examples from the JUnit test include:
> checkMatches("j* smyth~", 1,2); // wildcards and fuzzies are OK in phrases
> checkMatches("(jo* -john) smith", 2); // boolean logic works
> checkMatches("jo* smith"~2, 1,2,3); // position logic works
> checkBadQuery("jo* id:1 smith"); // mixing fields in a phrase is bad
> checkBadQuery("jo* \"smith\" "); // phrases inside phrases is bad
> checkBadQuery("jo* [sma TO smZ] "); // range queries inside phrases not supported
> Code plus JUnit test to follow...
[jira] [Commented] (SOLR-1604) Wildcards, ORs etc inside Phrase Queries
[ https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13557053#comment-13557053 ] Dmitry Kan commented on SOLR-1604:
--
Hello! Great work! I have two questions:
1) What would it take to incorporate phrase searches into this extended query parser? E.g. "a b" c~100, that is, a b (a phrase search, found in that order and exactly side by side) at most 100 tokens away from c.
2) Does this implementation support Boolean operators like AND, OR, NOT (at least OR and NOT are supported, as far as I can see)? Can they be nested?

> Wildcards, ORs etc inside Phrase Queries
> ----------------------------------------
>
> Key: SOLR-1604
> URL: https://issues.apache.org/jira/browse/SOLR-1604
> Project: Solr
> Issue Type: Improvement
> Components: query parsers, search
> Affects Versions: 1.4
> Reporter: Ahmet Arslan
> Priority: Minor
> Attachments: ASF.LICENSE.NOT.GRANTED--ComplexPhrase.zip, ComplexPhraseQueryParser.java, ComplexPhrase_solr_3.4.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, SOLR-1604-alternative.patch, SOLR-1604.patch, SOLR-1604.patch
>
> Solr plugin for ComplexPhraseQueryParser (LUCENE-1486), which supports wildcards, ORs, ranges, and fuzzies inside phrase queries.
[jira] [Updated] (SOLR-1604) Wildcards, ORs etc inside Phrase Queries
[ https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Kan updated SOLR-1604:
-
Attachment: ComplexPhrase_solr_3.4.zip

This is the ComplexPhrase project based on the version submitted on 21/Jul/11. It compiles and runs under Solr 3.4. I have uncommented the tests in /org/apache/solr/search/ComplexPhraseQParserPluginTest.java and they passed.

> Wildcards, ORs etc inside Phrase Queries
> ----------------------------------------
>
> Key: SOLR-1604
> URL: https://issues.apache.org/jira/browse/SOLR-1604
> Project: Solr
> Issue Type: Improvement
> Components: query parsers, search
> Affects Versions: 1.4
> Reporter: Ahmet Arslan
> Priority: Minor
> Attachments: ASF.LICENSE.NOT.GRANTED--ComplexPhrase.zip, ComplexPhraseQueryParser.java, ComplexPhrase_solr_3.4.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, SOLR-1604-alternative.patch, SOLR-1604.patch, SOLR-1604.patch
>
> Solr plugin for ComplexPhraseQueryParser (LUCENE-1486), which supports wildcards, ORs, ranges, and fuzzies inside phrase queries.
[jira] [Commented] (SOLR-3755) shard splitting
[ https://issues.apache.org/jira/browse/SOLR-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13553117#comment-13553117 ] Dmitry Kan commented on SOLR-3755:
--
Somewhat related: control over the naming of shards. This could apply both to hashing-based collections and to custom-sharding-based collections. shardNames=myshard1,myshard2,myshard3? Would this suit logical (e.g. date-based) sharding as well? Do you plan to support such a sharding type in the current shard-splitting implementation?
Not sure if this helps: we have implemented our own custom date-based sharding (splitting and routing) for Solr 3.x and found it to be the most logical way of sharding our data (from both the load-balancing and the use-case point of view). The routing implementation is done by loading a custom shards config file that maps date ranges to shards.

> shard splitting
> ---------------
>
> Key: SOLR-3755
> URL: https://issues.apache.org/jira/browse/SOLR-3755
> Project: Solr
> Issue Type: New Feature
> Components: SolrCloud
> Reporter: Yonik Seeley
> Attachments: SOLR-3755.patch, SOLR-3755.patch
>
> We can currently easily add replicas to handle increases in query volume, but we should also add a way to add additional shards dynamically by splitting existing shards.
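The date-range-to-shard mapping described in the comment above can be modeled as a floor lookup in a sorted table: each shard owns dates from its start key up to the next shard's start key. A minimal sketch of that routing idea (class and method names are hypothetical, not Solr API):

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical date-based routing table like the custom sharding described:
// routing a document is a floor lookup on its date key (e.g. yyyyMMdd).
public class DateShardRouter {
    private final TreeMap<Integer, String> rangeStarts = new TreeMap<>();

    void addShard(int fromDateKey, String shardName) {
        rangeStarts.put(fromDateKey, shardName);
    }

    String route(int dateKey) {
        Map.Entry<Integer, String> e = rangeStarts.floorEntry(dateKey);
        return e == null ? null : e.getValue(); // null: date precedes all shards
    }
}
```

Splitting a shard then amounts to inserting a new start key inside an existing range, which is what makes date-based (logical) sharding compose naturally with shard splitting.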
[jira] [Commented] (SOLR-1337) Spans and Payloads Query Support
[ https://issues.apache.org/jira/browse/SOLR-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13536838#comment-13536838 ] Dmitry Kan commented on SOLR-1337:
--
[Jan Høydahl] at the Lucene query parser level. A new token, FUZZY_SLOP_SHARP (the name probably isn't the best, but it can be changed), has been introduced in QueryParser.jj, and the supporting code implemented. The syntax is the same as that of the ~ operator, i.e. term1 term2 ... termn #slope.

> Spans and Payloads Query Support
> --------------------------------
>
> Key: SOLR-1337
> URL: https://issues.apache.org/jira/browse/SOLR-1337
> Project: Solr
> Issue Type: New Feature
> Reporter: Grant Ingersoll
> Assignee: Grant Ingersoll
> Fix For: 4.1
>
> It would be really nice to have query-side support for Spans and Payloads. The main ingredient missing at this point is QueryParser support and an output format for the spans and the payload spans.
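The semantics behind such an unordered proximity operator can be illustrated with a toy check: every term must occur within some window of consecutive tokens. This is only a sketch of the matching semantics (hypothetical helper, not the actual SpanNearQuery implementation, which works on term positions rather than token windows):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy semantics of an unordered proximity match: true if some window of
// `window` consecutive tokens contains every query term, in any order.
public class ProximityCheck {
    static boolean withinWindow(List<String> tokens, Set<String> terms, int window) {
        for (int i = 0; i + window <= tokens.size(); i++) {
            Set<String> seen = new HashSet<>(tokens.subList(i, i + window));
            if (seen.containsAll(terms)) return true;
        }
        return false;
    }
}
```

A real span query evaluates this lazily over postings positions instead of scanning token windows, but the accept/reject behavior is the same.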
[jira] [Commented] (SOLR-1337) Spans and Payloads Query Support
[ https://issues.apache.org/jira/browse/SOLR-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13534860#comment-13534860 ] Dmitry Kan commented on SOLR-1337:
--
[~janhoy] Jan: we implemented a new operator for Lucene/Solr 3.4 that does exactly what you describe, see: https://issues.apache.org/jira/browse/LUCENE-3758?focusedCommentId=13207710page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13207710 If you or anyone else needs a patch, just let me know.

> Spans and Payloads Query Support
> --------------------------------
>
> Key: SOLR-1337
> URL: https://issues.apache.org/jira/browse/SOLR-1337
> Project: Solr
> Issue Type: New Feature
> Reporter: Grant Ingersoll
> Assignee: Grant Ingersoll
> Fix For: 4.1
>
> It would be really nice to have query-side support for Spans and Payloads. The main ingredient missing at this point is QueryParser support and an output format for the spans and the payload spans.
[jira] [Commented] (SOLR-3858) Doc-to-shard assignment based on range property on shards
[ https://issues.apache.org/jira/browse/SOLR-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486052#comment-13486052 ] Dmitry Kan commented on SOLR-3858:
--
Is there an idea of how the range property should be defined? Something like this in solrconfig:

<docIdToShardAssignment>
  <rangeField>FieldName</rangeField> <!-- e.g. a date field -->
  <rangeStart>20121001</rangeStart> <!-- granularity probably should be customizable -->
  <rangeEnd>20121031</rangeEnd>
</docIdToShardAssignment>

? Does this property (if defined) turn the sharding scheme into logical sharding?

> Doc-to-shard assignment based on range property on shards
> ----------------------------------------------------------
>
> Key: SOLR-3858
> URL: https://issues.apache.org/jira/browse/SOLR-3858
> Project: Solr
> Issue Type: Sub-task
> Reporter: Yonik Seeley
>
> Anything that maps a document id to a shard should consult the ranges defined on the shards (currently indexing and real-time get).
[jira] [Commented] (SOLR-3585) processing updates in multiple threads
[ https://issues.apache.org/jira/browse/SOLR-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445749#comment-13445749 ] Dmitry Kan commented on SOLR-3585:
--
Mikhail, true, thanks for the link. In any case, the test proves that _there is_ a gain, even on a non-server horse. I might find a way to run this on a server and (possibly) play with SolrJ. In our use case, local streaming is used for larger batch (re-)processing and SolrJ for relatively tiny updates.

> processing updates in multiple threads
> --------------------------------------
>
> Key: SOLR-3585
> URL: https://issues.apache.org/jira/browse/SOLR-3585
> Project: Solr
> Issue Type: Improvement
> Components: update
> Affects Versions: 4.0-ALPHA
> Reporter: Mikhail Khludnev
> Priority: Minor
> Attachments: multithreadupd.patch, report.tar.gz, SOLR-3585.patch, SOLR-3585.patch
>
> Hello,
> I'd like to contribute an update processor which forks many threads that concurrently process the stream of commands. It may be beneficial for users who stream many docs through a single request.
[jira] [Commented] (SOLR-3585) processing updates in multiple threads
[ https://issues.apache.org/jira/browse/SOLR-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445246#comment-13445246 ] Dmitry Kan commented on SOLR-3585:
--
Summary: 1/2/4/8 threads. There was a gain for 2 threads; after that, increasing the number of threads didn't matter for the indexing speed (again, this may be too little data and too slow a machine compared to a server).

URL: http://localhost:8983/solr/update?commit=trueseparator=%09escape=\update.chain=threadsbacking.chain=logrunstream.file=d:\Projects\information_retrieval\solr\apache-solr-4.0.0-BETA\solr\example\data\book_edition.tsvstream.contentType=text/csv;charset=utf-8

Hardware: Intel(R) Core2 Duo CPU T6600 @ 2.20GHz, RAM: 4 GB, OS: Windows 7 64-bit. The PC was moderately used during the indexing (Internet surfing mostly). Solr started with: java -Xmx512M -Xms512M -jar start.jar

Stats and log extract:

--- one thread ---
565576 milliseconds (9.43 minutes)
size of data/index: 1.61 GB
30.08.2012 22:34:10 org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [collection1] webapp=/solr path=/update params={backing.chain=logruncommit=truestream.contentType=text/csv;charset%3Dutf-8separator=%09escape=\stream.file=d:\Projects\information_retrieval\solr\apache-solr-4.0.0-BETA\solr\example\data\book_edition.tsvupdate.chain=threads} {add=[/m/0g9nk5p, /m/0g9rf0q, /m/0gj6_r3, /m/0gj702y, /m/0gk99b7, /m/0g461_s, /m/0g4thbr, /m/0g4vp__, /m/0gkgw7x, /m/0gb390f, ... (3401498 adds)]} 0 565576

--- two threads ---
400085 milliseconds (6.67 minutes)
size of data/index: 916 MB
30.08.2012 22:09:16 org.apache.solr.core.SolrDeletionPolicy updateCommits
INFO: newest commit = 1
30.08.2012 22:15:56 org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [collection1] webapp=/solr path=/update params={backing.chain=logruncommit=truestream.contentType=text/csv;charset%3Dutf-8separator=%09escape=\stream.file=d:\Projects\information_retrieval\solr\apache-solr-4.0.0-BETA\solr\example\data\book_edition.tsvupdate.chain=threads} {add=[/m/0g9nk5p, /m/0gj6_r3, /m/0gkgw7x, /m/0g9_qhd, /m/0g9_r1t, /m/0g9jxyt, /m/0g4wdtq, /m/0d0s9y1, /m/0d9pb_v, /m/0d0tfz7, ... (1838414 adds)]} 0 400085
30.08.2012 22:15:56 org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [collection1] webapp=/solr path=/update params={backing.chain=logruncommit=truestream.contentType=text/csv;charset%3Dutf-8separator=%09escape=\stream.file=d:\Projects\information_retrieval\solr\apache-solr-4.0.0-BETA\solr\example\data\book_edition.tsvupdate.chain=threads} {add=[/m/0g9rf0q, /m/0gj702y, /m/0gk99b7, /m/0g461_s, /m/0g4thbr, /m/0g4vp__, /m/0gb390f, /m/0gb34pf, /m/0h8fm59, /m/0g99vfk, ... (1563084 adds)]} 0 400085

--- four threads ---
423969 milliseconds (7.07 minutes)
size of data/index: 915 MB
30.08.2012 21:52:03 org.apache.solr.core.SolrDeletionPolicy updateCommits
INFO: [collection1] webapp=/solr path=/update params={backing.chain=logruncommit=truestream.contentType=text/csv;charset%3Dutf-8separator=%09escape=\stream.file=d:\Projects\information_retrieval\solr\apache-solr-4.0.0-BETA\solr\example\data\book_edition.tsvupdate.chain=threads} {add=[/m/0g9nk5p, /m/0dgjnsn, /m/0d0s539, /m/0d0t8b3, /m/0d9n2sg, /m/0d0s18j, /m/07n7lbm, /m/07n7mh6, /m/07n7mq0, /m/07n7n_d, ... (844367 adds)]} 0 r
30.08.2012 21:59:07 org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [collection1] {add=[/m/0gj702y, /m/0gk99b7, /m/0gkgw7x, /m/0gb390f, /m/0g9_qhd, /m/0h2ymt3, /m/0g4wdtq, /m/0d0s9y1, /m/0d0tfz7, /m/0d0tdf1, ... (815450 adds)]} 0 423969
30.08.2012 21:59:07 org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [collection1] {add=[/m/0g9rf0q, /m/0g461_s, /m/0g4thbr, /m/0g4vp__, /m/0gb34pf, /m/0h8fm59, /m/0g99vfk, /m/0g9_r1t, /m/0g9jxyt, /m/0ghc2b5, ... (836534 adds)]} 0 423969
30.08.2012 21:59:07 org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [collection1] webapp=/solr path=/update params={backing.chain=logruncommit=truestream.contentType=text/csv;charset%3Dutf-8separator=%09escape=\stream.file=d:\Projects\information_retrieval\solr\apache-solr-4.0.0-BETA\solr\example\data\book_edition.tsvupdate.chain=threads} {add=[/m/0gj6_r3, /m/0d0sfq_, /m/0d9mhx1, /m/07tc6lf, /m/07tc75v, /m/07tc7jq, /m/07tc8kz, /m/07tc8wr, /m/07tc_cn, /m/07tc_fl, ... (905147 adds)]} 0 423969

--- eight threads ---
431710 milliseconds (7.20 minutes)
size of data/index: 1.00 GB
30.08.2012 22:47:43 org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [collection1] webapp=/solr path=/update params={backing.chain=logruncommit=truestream.contentType=text/csv;charset%3Dutf-8separator=%09escape=\stream.file=d:\Projects\information_retrieval\solr\apache-solr-4.0.0-BETA\solr\example\data\book_edition.tsvupdate.chain=threads} {add=[/m/0gk99b7, /m/0d0vb6s, /m/07t8mw8,
[jira] [Commented] (SOLR-3585) processing updates in multiple threads
[ https://issues.apache.org/jira/browse/SOLR-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409329#comment-13409329 ] Dmitry Kan commented on SOLR-3585:
--
Mikhail, thanks for the stats. They look good to me, and they show that the patch should help increase indexing throughput. In about 2.5 weeks I should be able to try your patch and tell you the results on my hardware.
> processing updates in multiple threads
> --
> Key: SOLR-3585
> URL: https://issues.apache.org/jira/browse/SOLR-3585
> Project: Solr
> Issue Type: Improvement
> Components: update
> Affects Versions: 4.0
> Reporter: Mikhail Khludnev
> Priority: Minor
> Attachments: SOLR-3585.patch, multithreadupd.patch, report.tar.gz
>
> Hello, I'd like to contribute an update processor which forks many threads that concurrently process the stream of commands. It may be beneficial for users who stream many docs through a single request.
--
This message is automatically generated by JIRA.
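The mechanism Mikhail describes (one request stream fanned out to a worker pool) can be sketched with plain JDK concurrency primitives. The names below (ThreadedUpdateSketch, processAdd) are illustrative stand-ins, not Solr's actual UpdateRequestProcessor API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the fan-out idea: a single incoming stream of update
// commands is distributed across a fixed worker pool, so the
// CPU-bound per-document work runs concurrently.
public class ThreadedUpdateSketch {
    private final ExecutorService pool;
    private final AtomicInteger processed = new AtomicInteger();

    ThreadedUpdateSketch(int threads) {
        pool = Executors.newFixedThreadPool(threads);
    }

    // Stand-in for the real per-document work (analysis, index writes).
    private void processAdd(String doc) {
        processed.incrementAndGet();
    }

    // Fan the stream out to the pool, then drain it and report the count.
    int processAll(List<String> docs) {
        for (String doc : docs) {
            pool.execute(() -> processAdd(doc));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed.get();
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) docs.add("doc-" + i);
        System.out.println(new ThreadedUpdateSketch(4).processAll(docs)); // prints 1000
    }
}
```

As Mikhail's benchmarks suggest, the pool size is worth tuning per machine: past the point where the CPU or disk is saturated, extra threads only add coordination overhead.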
[jira] [Commented] (SOLR-3585) processing updates in multiple threads
[ https://issues.apache.org/jira/browse/SOLR-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408234#comment-13408234 ] Dmitry Kan commented on SOLR-3585:
--
Mikhail, this sounds interesting to me. Have you already tested this to show that there is a gain in time with your approach? Also, did you find any optimal parameters, such as the number of threads, so that sensible default values could be set?
> processing updates in multiple threads
> --
> Key: SOLR-3585
> URL: https://issues.apache.org/jira/browse/SOLR-3585
> Project: Solr
> Issue Type: Improvement
> Components: update
> Affects Versions: 4.0
> Reporter: Mikhail Khludnev
> Priority: Minor
> Attachments: SOLR-3585.patch, multithreadupd.patch
>
> Hello, I'd like to contribute an update processor which forks many threads that concurrently process the stream of commands. It may be beneficial for users who stream many docs through a single request.
--
This message is automatically generated by JIRA.
[jira] [Commented] (SOLR-2403) Problem with facet.sort=lex, shards, and facet.mincount
[ https://issues.apache.org/jira/browse/SOLR-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113375#comment-13113375 ] Dmitry Kan commented on SOLR-2403:
--
Peter: in one of the distributed faceting sessions we found out that zero facets can be filtered out by the (undocumented?) facet.zeros parameter. Does anything change if you set it to 0 (filtering out zero facets)?
> Problem with facet.sort=lex, shards, and facet.mincount
> ---
> Key: SOLR-2403
> URL: https://issues.apache.org/jira/browse/SOLR-2403
> Project: Solr
> Issue Type: Bug
> Components: search
> Affects Versions: 4.0
> Environment: RHEL5, Ubuntu 10.04
> Reporter: Peter Cline
>
> I tested this on a recent trunk snapshot (2/25); I haven't verified with 3.1 or 1.4.1. I can if necessary and will update.
> Solr is not returning the proper number of facet values when sorting alphabetically, using distributed search, and using a facet.mincount that excludes some of the values in the first facet.limit values. Easiest explained by example.
> Sorting alphabetically, the first 20 values for my subject_facet field have few documents: 19 facet values have only 1 document associated, and 1 has 2 documents. There are plenty after that that have more than 2.
> {code}
> http://localhost:8082/solr/select?q=*:*&facet=true&facet.field=subject_facet&facet.limit=20&facet.sort=lex&facet.mincount=2
> {code}
> comes back with the expected 20 facet values with >= 2 documents associated.
> If I add a shards parameter that points back to itself, the result is different.
> {code}
> http://localhost:8082/solr/select?q=*:*&facet=true&facet.field=subject_facet&facet.limit=20&facet.sort=lex&facet.mincount=2&shards=localhost:8082/solr
> {code}
> comes back with only 1 facet value: the single value in the first 20 that had more than 1 document.
> It appears to me that mincount is ignored when doing the original query to the shards, then applied afterwards. Let me know if you need any more info.
> Thanks, Peter
--
This message is automatically generated by JIRA.
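Peter's hypothesis at the end (mincount ignored in the per-shard request and only applied afterwards) can be illustrated with a small simulation. This is a sketch of the suspected merge logic, not Solr's actual distributed faceting code:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Simulates the reported bug: in the distributed path, the shard fetch
// applies facet.limit in lex order *before* facet.mincount, so values
// past the first `limit` that would satisfy mincount are lost.
public class FacetMincountSim {
    // counts: term -> docCount, kept lex-ordered by TreeMap
    static List<String> distributedLex(TreeMap<String, Integer> counts, int limit, int mincount) {
        return counts.entrySet().stream()
                .limit(limit)                          // shard applies limit first...
                .filter(e -> e.getValue() >= mincount) // ...mincount only after the merge
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    static List<String> singleNodeLex(TreeMap<String, Integer> counts, int limit, int mincount) {
        return counts.entrySet().stream()
                .filter(e -> e.getValue() >= mincount) // mincount first, as expected
                .limit(limit)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < 19; i++) counts.put(String.format("a%02d", i), 1); // 19 singletons
        counts.put("a99", 2);                                                  // one value with 2 docs
        for (int i = 0; i < 30; i++) counts.put(String.format("b%02d", i), 5); // plenty >= 2 later
        System.out.println(singleNodeLex(counts, 20, 2).size());  // 20, as Peter expects
        System.out.println(distributedLex(counts, 20, 2).size()); // 1, matching the bug report
    }
}
```

With the report's data shape (19 singletons, one 2-doc value, plenty of larger values after them), the order of the two operations fully explains the 20-vs-1 discrepancy.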