Re: DocValued SortableText Field is slower than Non DocValued String Field for Facet
I'm not sure about _performance_, but I'm pretty sure you don't want to be faceting on docValued SortableTextField (and faceting on non-docValued SortableTextField, though I think technically possible, works against uninverted _indexed_ values, so ends up doing something entirely different): https://issues.apache.org/jira/browse/SOLR-13056.

TL;DR: with SortableTextField, bulk faceting happens over docValues (which for SortableTextField contain the full sort value string) and refinement happens against indexed values (which are tokenized). So it can behave very strangely, at least in multi-shard collections. See also: https://issues.apache.org/jira/browse/SOLR-8362

Quick clarification: you say "non Docvalued String Field" ... I'm assuming you're talking about "StrField", not "TextField".

wrt performance difference, I'm willing to bet (though not certain) that you're really simply noticing a discrepancy between docValues and non-docValues faceting -- accordingly, for your use case I'd expect faceting against StrField _with_ docValues to have similar performance to SortableTextField with docValues.

Further possibly-relevant discussion can be found in the following thread: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/202006.mbox/%3CCAF%3DheHFd6GBABzKzDQPTfpYUUQJXxYwue4OC86QOm_AR0X3_ZQ%40mail.gmail.com%3E

On Thu, Jan 28, 2021 at 7:25 PM Jae Joo wrote:
> I am wondering that the performance of facet of DocValued SortableText
> Field is slower than non Docvalued String Field.
>
> Does anyone know why?
>
> Thanks,
>
> Jae
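One way to check the guess above (that the gap comes from docValues versus non-docValues faceting rather than from the field type) is to send the same facet request against each field and compare QTime. The sketch below only builds the request URLs; the collection name and the two field names are hypothetical, and the parameters used (`q`, `rows`, `facet`, `facet.field`, `facet.limit`) are the standard faceting parameters.

```python
from urllib.parse import urlencode


def facet_query(base_url, field, limit=10):
    """Build a select URL that facets on one field and returns no documents."""
    params = {
        "q": "*:*",
        "rows": 0,            # only the facet counts are of interest
        "facet": "true",
        "facet.field": field,
        "facet.limit": limit,
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical collection and field names, for illustration only.
base = "http://localhost:8983/solr/mycollection"
for field in ("title_str_dv", "title_sortable_dv"):
    print(facet_query(base, field))
```

Running each URL several times and comparing the reported QTime would give a rough like-for-like comparison; caches should be taken into account when reading the numbers.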
DocValued SortableText Field is slower than Non DocValued String Field for Facet
I am wondering why faceting on a DocValued SortableText field is slower than faceting on a non-DocValued String field.

Does anyone know why?

Thanks,
Jae
Re: Case insensitive search on String field
In a word, “no”. The string type is intentionally primitive; no analysis or case changing is done at all.

You say “you cannot reindex the data”. Why not? Is it just due to time constraints, or is the original data no longer available?

If all the fields are stored, you can pull the docs from the collection and index them into a new collection. See the REINDEXCOLLECTION command: https://lucene.apache.org/solr/guide/8_1/collections-api.html

Best,
Erick

> On Jul 25, 2020, at 2:22 PM, Anshuman Singh wrote:
>
> Hi,
>
> We missed the fact that case insensitive search doesn't work with
> field type "string". We have 3B docs indexed and we cannot reindex the data.
>
> Now, as schema changes require reindexing, is there any other way to
> achieve case insensitive search on string fields?
>
> Regards,
> Anshuman
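For readers who want to try the REINDEXCOLLECTION route, here is a minimal sketch that builds the Collections API URL. The host, the collection names and the configset name are hypothetical; `name`, `target` and `configName` are parameters of the command as I understand it from the reference guide, so please check the guide for your Solr version before relying on them. As noted above, this only works if all fields are stored.

```python
from urllib.parse import urlencode


def reindex_url(solr_base, source, target, config_name=None):
    """Build a Collections API URL for the REINDEXCOLLECTION command."""
    params = {
        "action": "REINDEXCOLLECTION",
        "name": source,    # existing collection to read from
        "target": target,  # collection that receives the reindexed docs
    }
    if config_name:
        # Configset holding the changed schema (e.g. a lowercasing field type).
        params["configName"] = config_name
    return f"{solr_base}/admin/collections?{urlencode(params)}"


# Hypothetical names, for illustration only.
print(reindex_url("http://localhost:8983/solr", "docs_v1", "docs_v2", "docs_v2_conf"))
```

The target configset would define the field with an analyzed type instead of "string", which is what makes case-insensitive matching possible after the copy.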
Case insensitive search on String field
Hi, We missed the fact that case insensitive search doesn't work with field type "string". We have 3B docs indexed and we cannot reindex the data. Now, as schema changes require reindexing, is there any other way to achieve case insensitive search on string fields? Regards, Anshuman
Re: string field max size
Thanks Erick for this last confirmation. In the end I used the standard "text_ws":

And the field

On Fri, Sep 6, 2019 at 2:54 AM Erick Erickson wrote:
> bq. What I do not understand is what happens to the Analyzers, Tokenizers,
> and Filters in the indexing chain
>
> They are irrelevant. The analysis chain is only executed when
> indexed=true.
>
> Best,
> Erick
>
> > On Sep 5, 2019, at 9:03 AM, Vincenzo D'Amore wrote:
> >
> > What I do not understand is what happens to the Analyzers, Tokenizers,
> > and Filters in the indexing chain

--
Vincenzo D'Amore
Re: string field max size
bq. What I do not understand is what happens to the Analyzers, Tokenizers, and Filters in the indexing chain

They are irrelevant. The analysis chain is only executed when indexed=true.

Best,
Erick

> On Sep 5, 2019, at 9:03 AM, Vincenzo D'Amore wrote:
>
> What I do not understand is what happens to the Analyzers, Tokenizers, and
> Filters in the indexing chain
Re: string field max size
I agree, stored=true and indexed=false should resolve this size issue.

On Thu, 5 Sep 2019 at 21:54, Erick Erickson wrote:
> Use a text field with stored=true and indexed=false? That'll allow you to
> return it...

--
Thanks
Jitendra
Re: string field max size
Thanks Erick for the prompt answer.

What I do not understand is what happens to the Analyzers, Tokenizers, and Filters in the indexing chain. Are they executed or not?

Well, answering my own question: I think not. But then what is the difference between string and text when they are not indexed? Just the way they are stored and retrieved?

On Thu, Sep 5, 2019 at 1:54 PM Erick Erickson wrote:
> Use a text field with stored=true and indexed=false? That'll allow you to
> return it...

--
Vincenzo D'Amore
Re: string field max size
Use a text field with stored=true and indexed=false? That'll allow you to return it...

On Thu, Sep 5, 2019, 07:04 Vincenzo D'Amore wrote:
> Hi all,
>
> sorry for the silly question, I need to store in Solr a string field larger
> than 32k (index="false").
>
> Given that storing field larger than 32k rises an exception:
> "DocValuesField "filterQuery" is too large, must be <= 32766", I thought to
> use predefined type text_ws.
>
> Any suggestions?
>
> Thanks in advance and best regards,
> Vincenzo
>
> --
> Vincenzo D'Amore
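As an illustration of the stored=true / indexed=false suggestion, the sketch below builds a Schema API "add-field" payload. The field name is taken from the error message quoted above; the choice of text_ws follows the original question. Setting docValues to false explicitly is my assumption, made because the reported error mentions a DocValuesField. The payload would be POSTed to the collection's /schema endpoint; the sketch only constructs and prints it.

```python
import json


def stored_only_field(name, field_type="text_ws"):
    """Build a Schema API command for a field that is stored but not indexed."""
    return {
        "add-field": {
            "name": name,
            "type": field_type,
            "stored": True,      # value can still be returned in results
            "indexed": False,    # analysis chain is never run for this field
            "docValues": False,  # assumption: avoids the DocValuesField size limit
        }
    }


payload = stored_only_field("filterQuery")
print(json.dumps(payload, indent=2))
```

With indexed=false the field cannot be searched, only returned, which matches the use case described in the original question.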
string field max size
Hi all,

sorry for the silly question, I need to store in Solr a string field larger than 32k (indexed="false").

Given that storing a field larger than 32k raises an exception:
"DocValuesField "filterQuery" is too large, must be <= 32766", I thought of using the predefined type text_ws.

Any suggestions?

Thanks in advance and best regards,
Vincenzo

--
Vincenzo D'Amore
Range query on multivalued string field results in useless highlighting
Range queries against multivalued string fields produce useless highlighting, even though "hl.highlightMultiTerm":"true"

I have uncovered what I believe is a bug. At the very least it is a difference in behavior between Solr v5.1.0 and v7.5.0 (and v7.7.1).

I have a Field defined in my schema as:

I am using a query containing a Range clause and I am using highlighting to get the list of values that match the range query. All examples below were run from the appropriate Solr Admin Server Query page.

The range query using Solr v5.1.0 produces CORRECT and useful results:

{
  "responseHeader": {
    "status": 0,
    "QTime": 366,
    "params": {
      "q": "ResourceCorrespondent:[A TO B}",
      "hl": "true",
      "indent": "true",
      "hl.preserveMulti": "true",
      "fl": "ResourceCorrespondent,ResourceID",
      "hl.requireFieldMatch": "true",
      "hl.usePhraseHighlighter": "true",
      "hl.fl": "ResourceCorrespondent",
      "wt": "json",
      "hl.highlightMultiTerm": "true",
      "_": "1553275722025"
    }
  },
  "response": {
    "numFound": 999,
    "start": 0,
    "docs": [
      {
        "ResourceCorrespondent": [
          "Stanley, Wendell M.",
          "Avery, Roy"
        ],
        "ResourceID": "CCAAHG"
      },
      {
        "ResourceCorrespondent": [
          "Avery, Roy"
        ],
        "ResourceID": "CCGMDS"
      },
      ... lots more docs, then
    ]
  },
  ... we get to the highlighting portion of the response
  ... this tells me which values of each ResourceCorrespondent field
  ... actually matching the query
  "highlighting": {
    "CCAAHG": {
      "ResourceCorrespondent": [
        "Avery, Roy"
      ]
    },
    "CCGMDS": {
      "ResourceCorrespondent": [
        "Avery, Roy"
      ]
    },
    "BBACKV": {
      "ResourceCorrespondent": [
        "American Institute of Biological Sciences",
        "Albritton, Errett C."
      ]
    },
    ... lots more useful highlight values. Note two matching values
    ... for document BBACKV.
  }

***
*** However, using the exact same parameters with Solr v7.5.0 or v7.7.1, the top portion of the response is basically the same, including the number of documents found

{
  "responseHeader":{
    "status":0,
    "QTime":245,
    "params":{
      "q":"ResourceCorrespondent:[A TO B}",
      "hl":"on",
      "hl.preserveMulti":"true",
      "fl":"ResourceID, ResourceCorrespondent",
      "hl.requireFieldMatch":"true",
      "hl.fl":"ResourceCorrespondent",
      "hightlightMultiTerm":"true",
      "wt":"json",
      "_":"1553105129887",
      "usePhraseHighLighter":"true"}},
  "response":{"numFound":999,"start":0,"docs":[

The documents are in a different order, but that doesn't matter. The problem is with the highlighting, which is effectively empty. I don't know what values in each document actually matched the query:

  "highlighting":{
    "QQBBLX":{},
    "QQBCLN":{},
    "QQBCLM":{},
    ... etc.

*** NOTE: The data is the same for all Solr versions and the Solr indexes were rebuilt for each Solr version.

*** Changing to using "=unified", the highlighting looks like:

  "highlighting":{
    "QQBBLX":{
      "ResourceCorrespondent":[]},
    "QQBCLN":{
      "ResourceCorrespondent":[]},
    "QQBCLM":{
      "ResourceCorrespondent":[]},

*** Closer, but still no useful values

*** NOTE: if I change only the query to be a wildcard query, q="ResourceCorrespondent:A*", the highlighting is correct in both Solr v7.5.0 and v7.7.1:

  "highlighting":{
    "QQBBLX":{
      "ResourceCorrespondent":["American Public Health Association"]},
    "QQBCLN":{
      "ResourceCorrespondent":["Abram, Morris B."]},
    "QQBCLM":{
      "ResourceCorrespondent":["Abram, Morris B."]},
    ... etc.

*** This makes me think there is some problem with a Range query feeding the Highlighter code.

*** No variation of hl specs or other query parameters fixes the problem.

The wildcard query is my current workaround, but there still is a problem with range queries. So there is some incompatibility among:

1) A multivalued string field AND
2) A range query against that field AND
3) Highlighting

The highlight portion of the response is effectively "empty".

I don't know when this issue was first introduced. I have recently been updating from 5.1.0 to 7.5.0 in one big leap. I have attempted to read through the change logs for the intervening versions but I gave up to save my sanity.

--Karl
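To make the range-versus-wildcard comparison above easier to repeat, here is a small sketch that builds both requests with the same highlighting parameters and extracts the non-empty highlight entries from a response. The parameter names are the ones shown in the v5.1.0 request above; the core URL is hypothetical, and nothing here is meant as a fix for the reported behavior.

```python
from urllib.parse import urlencode

FIELD = "ResourceCorrespondent"

# Highlighting parameters as listed in the v5.1.0 request above.
HL_PARAMS = {
    "hl": "true",
    "hl.fl": FIELD,
    "hl.preserveMulti": "true",
    "hl.requireFieldMatch": "true",
    "hl.usePhraseHighlighter": "true",
    "hl.highlightMultiTerm": "true",
    "fl": f"{FIELD},ResourceID",
    "wt": "json",
}


def select_url(core_url, query):
    """Build a select URL for the given query with the shared hl parameters."""
    return f"{core_url}/select?{urlencode({'q': query, **HL_PARAMS})}"


def matching_values(highlighting):
    """Map document id to highlighted values, skipping empty entries."""
    result = {}
    for doc_id, fields in highlighting.items():
        values = fields.get(FIELD, [])
        if values:
            result[doc_id] = values
    return result


core = "http://localhost:8983/solr/mycore"  # hypothetical core URL
print(select_url(core, f"{FIELD}:[A TO B}}"))  # range query from the report
print(select_url(core, f"{FIELD}:A*"))         # wildcard workaround
```

Feeding the "highlighting" section of each response to `matching_values` gives an empty dict for the v7.x range-query output shown above and a populated one for the wildcard output, which makes the difference easy to assert in a regression test.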
Re: Solr filter query on STRING field [Was:Re: solr filter query on text field]
The first one treats the space as the end of the clause, so the second keyword is searched against the default field (id). Try putting the whole value in quotes. Or use the Field Query Parser: https://lucene.apache.org/solr/guide/7_5/other-parsers.html#field-query-parser

Regards,
Alex.

On Wed, Oct 24, 2018, 4:59 AM Marek Tichy, wrote:
> Hi,
>
> I'm having troubles with the filter query on a multiple string field,
> specifically with a space between words. Looking at the histogram and
> values using Solr UI it correctly shows that the indexing stores the
> string "Key case" as it should. However the following filter queries:
>
> fq=sm_field_tags:Key case     //doesn't work
> fq=sm_field_tags:Key+case     //doesn't work
> fq=sm_field_tags:Key*         //does work
> fq=sm_field_tags:Key?case     //does work
>
> Debug shows (for the first case):
> "filter_queries":["sm_field_tags:Key case"],
> "parsed_filter_queries":["sm_field_tags:Key id:case"]
>
> Why does it parse to id: case ? Solr version is 7.4.0
>
> Many thanks
> Marek
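The two options mentioned above (quoting the whole value, or using the Field Query Parser) can be sketched as follows. The helper names are my own; the field name and value come from the question. The escaping in `phrase_fq` only handles backslashes and double quotes, which is enough for this example but is not a complete query escaper.

```python
from urllib.parse import urlencode


def phrase_fq(field, value):
    """Quote the whole value so the space is not read as a clause separator."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def field_parser_fq(field, value):
    """Use the Field Query Parser, which takes the rest of the string as the value."""
    return f"{{!field f={field}}}{value}"


for fq in (phrase_fq("sm_field_tags", "Key case"),
           field_parser_fq("sm_field_tags", "Key case")):
    # urlencode takes care of the space and the special characters in the URL.
    print(fq, "->", urlencode({"fq": fq}))
```

Either form should keep "Key case" together as a single value for the string field instead of letting "case" fall through to the default field.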
Solr filter query on STRING field [Was:Re: solr filter query on text field]
Hi,

I'm having trouble with a filter query on a multivalued string field, specifically with a space between words. Looking at the histogram and values in the Solr UI, it correctly shows that the indexing stores the string "Key case" as it should. However, the following filter queries:

fq=sm_field_tags:Key case     //doesn't work
fq=sm_field_tags:Key+case     //doesn't work
fq=sm_field_tags:Key*         //does work
fq=sm_field_tags:Key?case     //does work

Debug shows (for the first case):
"filter_queries":["sm_field_tags:Key case"],
"parsed_filter_queries":["sm_field_tags:Key id:case"]

Why does it parse to id: case ? Solr version is 7.4.0

Many thanks
Marek

> bq. is there any difference if the fq field is a string field vs text
>
> Absolutely. string fields are not analyzed in any way. They're not
> tokenized. They are case sensitive. Etc. For example take
> My dog.
> as input. A string field will have a _single_ token "My dog.". It will
> not match a search on "my". It will not match a search on "dog". It
> won't even match "my dog." as a phrase since the case is different. It
> won't even match "My dog" because there's no period at the end. It
> will only match "My dog.".
Re: Json object values in solr string field
Thanks Alex/Shawn,

Yes, currently we handle this with custom code that calculates the assets from the response, but with this approach we lose the power of the default stats and facet features.

Also, it's not actually duplicate data: in our current design the data for one account is spread over, say, 2 docs, which we are planning to compress while still being able to use stats and facets. I know it's quite complicated to achieve both at the same time; I'm still thinking about how to solve it.

On Thu, Sep 27, 2018, 11:19 AM Alexandre Rafalovitch wrote:
> If the duplicate data is only indexed, it is not actually duplicated. It is
> only an index entry and the record ids where it shows.
>
> Regards,
> Alex
Re: Json object values in solr string field
If the duplicate data is only indexed, it is not actually duplicated. It is only an index entry plus the record ids where it shows up.

Regards,
Alex

On Thu, Sep 27, 2018, 10:55 AM Balanathagiri Ayyasamypalanivel, <bala.cit...@gmail.com> wrote:
> Hi Alex, thanks, we have that set up already in place, we are thinking to
> optimize more to resign the data to avoid these duplication.
>
> Regards,
> Bala.
Re: Json object values in solr string field
On 9/27/2018 8:53 AM, Balanathagiri Ayyasamypalanivel wrote:
> Thanks Shawn for your prompt response. Actually we have to filter on the
> query time while calculate the score. The challenge here is we should not
> add the asset and put as static field in the index time. The asset needs
> to be calculated while query time with some filters.

Solr doesn't have that ability as far as I am aware. No matter how you slice this, you'll be writing custom code to handle it.

In response to another part of the thread: search engines typically involve a lot of data duplication. It's usually faster to simply duplicate data in multiple documents than to try and normalize the data like a relational database does.

Thanks,
Shawn
Re: Json object values in solr string field
Hi Alex, thanks. We already have that setup in place; we are thinking of optimizing further by redesigning the data to avoid this duplication.

Regards,
Bala.

On Thu, Sep 27, 2018, 10:31 AM Alexandre Rafalovitch wrote:
> Well, my feeling is that you are going in the wrong direction. And that
> maybe you need to focus more on separating your - non solr - storage
> representation and your - solr - search oriented representation.
>
> E.g. if your issue is storage, maybe you can focus on stored=false
> indexed=true approach.
>
> Regards,
> Alex
Re: Json object values in solr string field
Thanks Shawn for your prompt response.

Actually, we have to filter at query time while calculating the score. The challenge is that we cannot add up the assets and store the result as a static field at index time; the total needs to be calculated at query time, with some filters applied.

Regards,
Bala.

On Thu, Sep 27, 2018, 10:35 AM Shawn Heisey wrote:
> On 9/26/2018 12:46 PM, Balanathagiri Ayyasamypalanivel wrote:
> > But only draw back here is we have to parse the json to do the sum of the
> > values, is there any other way to handle this scenario.
>
> Solr cannot do that for you. You could put this in your indexing
> software -- add up the numbers and put the result into a new field in
> your Solr document, so that the information is already in the index when
> you do your query. This could be done with a custom Update Processor (a
> Solr plugin that you would need to write), but if you already have
> custom indexing software, it's probably easier to simply change that
> software than to try and write a plugin.
>
> Thanks,
> Shawn
Re: Json object values in solr string field
On 9/26/2018 12:46 PM, Balanathagiri Ayyasamypalanivel wrote:
> But only draw back here is we have to parse the json to do the sum of the
> values, is there any other way to handle this scenario.

Solr cannot do that for you. You could put this in your indexing software -- add up the numbers and put the result into a new field in your Solr document, so that the information is already in the index when you do your query. This could be done with a custom Update Processor (a Solr plugin that you would need to write), but if you already have custom indexing software, it's probably easier to simply change that software than to try and write a plugin.

Thanks,
Shawn
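A minimal sketch of the "add up the numbers in your indexing software" suggestion above, assuming the asset_s layout from Bala's example (a JSON array of objects whose values are numeric strings). The target field name asset_total_d is hypothetical. Note that a precomputed total is static, so it does not cover totals that depend on query-time filters.

```python
import json


def with_asset_total(doc, source_field="asset_s", total_field="asset_total_d"):
    """Return a copy of the document with the summed asset values added."""
    assets = json.loads(doc[source_field])  # e.g. [{"asset1": "20", "asset2": "30"}]
    total = sum(float(value) for entry in assets for value in entry.values())
    return {**doc, total_field: total}


doc = {"id": "1", "Accts": "Acct1", "asset_s": '[{"asset1": "20", "asset2":"30"}]'}
print(with_asset_total(doc))
```

The enriched document can then be sent to Solr as usual, and the numeric field is available to stats and facet functions without any JSON parsing at query time.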
Re: Json object values in solr string field
Well, my feeling is that you are going in the wrong direction. And that maybe you need to focus more on separating your - non solr - storage representation and your - solr - search oriented representation.

E.g. if your issue is storage, maybe you can focus on a stored=false indexed=true approach.

Regards,
Alex

On Thu, Sep 27, 2018, 10:13 AM Balanathagiri Ayyasamypalanivel, <bala.cit...@gmail.com> wrote:
> Any suggestions?
> Regards,
> Bala.
Re: Json object values in solr string field
Any suggestions?

Regards,
Bala.

On Wed, Sep 26, 2018, 2:46 PM Balanathagiri Ayyasamypalanivel <bala.cit...@gmail.com> wrote:
> Hi,
>
> Thanks for the reply, actually we are planning to optimize the huge volume
> of data.
>
> For example, in our current system we have as below, so we can do facet
> pivot or stats to get the sum of asset_td for each acct, but the data
> growing lot whenever more asset getting added.
>
> Id | Accts | assetid | asset_td
> 1  | Acct1 | asset1  | 20
> 2  | Acct1 | asset2  | 30
> 3  | Acct2 | asset3  | 10
> 4  | Acct3 | asset2  | 10
>
> So we planned to change as
>
> Id | Accts | asset_s
> 1  | Acct1 | [{"asset1": "20", "asset2":"30"}]
> 2  | Acct2 | [{"asset3": "10"}]
> 3  | Acct3 | [{"asset2": "10"}]
>
> But only draw back here is we have to parse the json to do the sum of the
> values, is there any other way to handle this scenario.
>
> Regards,
> Bala.
Re: Json object values in solr string field
Hi,

Thanks for the reply. We are actually planning to optimize a huge volume of data.

For example, our current system stores the data as below, so we can use facet pivot or stats to get the sum of asset_td for each acct, but the data grows a lot whenever more assets get added.

Id | Accts | assetid | asset_td
1  | Acct1 | asset1  | 20
2  | Acct1 | asset2  | 30
3  | Acct2 | asset3  | 10
4  | Acct3 | asset2  | 10

So we planned to change it to:

Id | Accts | asset_s
1  | Acct1 | [{"asset1": "20", "asset2":"30"}]
2  | Acct2 | [{"asset3": "10"}]
3  | Acct3 | [{"asset2": "10"}]

The only drawback is that we have to parse the JSON to sum the values. Is there any other way to handle this scenario?

Regards,
Bala.

On Wed, Sep 26, 2018, 2:25 PM Shawn Heisey wrote:
> On 9/26/2018 12:20 PM, Balanathagiri Ayyasamypalanivel wrote:
> > Currently I am storing json object type of values in string field in
> > solr.
> > Using this field, in the code I am parsing json objects and doing sum of
> > the values under it.
> >
> > In solr, do we have any option in doing it by default when using the json
> > object field values.
>
> Even if you have JSON-formatted strings in Solr, Solr doesn't know
> this. It has no idea that the data is JSON, and won't be able to do
> anything special with the info contained there.
>
> Thanks,
> Shawn
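For the first layout above (one document per account/asset pair), the per-account sum can be requested from Solr directly. The message mentions facet pivot or stats; the sketch below swaps in the JSON Facet API instead, as one possible way to express "sum of asset_td per Accts", and also computes the same totals locally from the four example rows so the expected result is visible. The facet name by_account is arbitrary.

```python
import json
from collections import defaultdict


def sum_by_account_request(account_field="Accts", value_field="asset_td"):
    """Build a JSON Request API body: terms facet on account with a sum per bucket."""
    return {
        "query": "*:*",
        "limit": 0,  # no documents needed, only the facet buckets
        "facet": {
            "by_account": {
                "type": "terms",
                "field": account_field,
                "facet": {"total": f"sum({value_field})"},
            }
        },
    }


def local_totals(rows, account_field="Accts", value_field="asset_td"):
    """Compute the same per-account totals in plain Python, for comparison."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[account_field]] += row[value_field]
    return dict(totals)


rows = [
    {"Id": 1, "Accts": "Acct1", "assetid": "asset1", "asset_td": 20},
    {"Id": 2, "Accts": "Acct1", "assetid": "asset2", "asset_td": 30},
    {"Id": 3, "Accts": "Acct2", "assetid": "asset3", "asset_td": 10},
    {"Id": 4, "Accts": "Acct3", "assetid": "asset2", "asset_td": 10},
]
print(json.dumps(sum_by_account_request(), indent=2))
print(local_totals(rows))
```

This only works while asset_td is a real numeric field; once the values are packed into a JSON string (the second layout), Solr can no longer aggregate them, which is the trade-off discussed in this thread.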
Re: Json object values in solr string field
On 9/26/2018 12:20 PM, Balanathagiri Ayyasamypalanivel wrote:
> Currently I am storing json object type of values in string field in solr.
> Using this field, in the code I am parsing json objects and doing sum of
> the values under it.
>
> In solr, do we have any option in doing it by default when using the json
> object field values.

Even if you have JSON-formatted strings in Solr, Solr doesn't know this. It has no idea that the data is JSON, and won't be able to do anything special with the info contained there.

Thanks,
Shawn
Json object values in solr string field
Hi,

Currently I am storing JSON object values in a string field in Solr. In our code, I parse the JSON objects from this field and sum the values under them.

Does Solr have any option to do this by default for JSON object field values?

Regards,
Bala.
Re: Highlighting is not working with docValues only String field
I have opened JIRA https://issues.apache.org/jira/browse/SOLR-12663

On Sat, Aug 11, 2018 at 8:59 PM Erick Erickson wrote:
> I can see why it wouldn't and also why it could/should. I also wonder about
> SortableTextField, perhaps mention that too.
>
> Seems worth a JIRA to me if there isn't one already

--
With Thanks & Regards
Karthik Ramachandran
Re: Highlighting is not working with docValues only String field
I can see why it wouldn't and also why it could/should. I also wonder about SortableTextField, perhaps mention that too. Seems worth a JIRA to me if there isn't one already.

On Fri, Aug 10, 2018, 19:49 Karthik Ramachandran <kramachand...@commvault.com> wrote:
> We are using Solr 7.2.1, highlighting is not working with docValues only String field.
>
> Should I open a JIRA for this?
>
> Schema:
> id
> required="true"/>
> stored="true"/>
> stored="false"/>
>
> Data:
> [{"id":1,"name":"Testing line 1"},{"id":2,"name":"Testing line 2"},{"id":3,"name":"Testing line 3"}]
>
> Query:
> http://localhost:8983/solr/test/select?q=Testing*=name=true=name,name1
>
> Response:
> {"response":{"numFound":3,"start":0,"docs":[{"id":"1","name":"Testing line 1","name1":"Testing line 1"},{"id":"2","name":"Testing line 2","name1":"Testing line 2"},{"id":"3","name":"Testing line 3","name1":"Testing line 3"}]},"highlighting":{"1":{"name":["Testing line 1"]},"2":{"name":["Testing line 2"]},"3":{"name":["Testing line 3"]}}}
>
> With Thanks & Regards
> Karthik Ramachandran
Highlighting is not working with docValues only String field
We are using Solr 7.2.1, highlighting is not working with docValues only String field.

Should I open a JIRA for this?

Schema: id

Data:
[{"id":1,"name":"Testing line 1"},{"id":2,"name":"Testing line 2"},{"id":3,"name":"Testing line 3"}]

Query:
http://localhost:8983/solr/test/select?q=Testing*=name=true=name,name1

Response:
{"response":{"numFound":3,"start":0,"docs":[{"id":"1","name":"Testing line 1","name1":"Testing line 1"},{"id":"2","name":"Testing line 2","name1":"Testing line 2"},{"id":"3","name":"Testing line 3","name1":"Testing line 3"}]},"highlighting":{"1":{"name":["Testing line 1"]},"2":{"name":["Testing line 2"]},"3":{"name":["Testing line 3"]}}}

With Thanks & Regards
Karthik Ramachandran
Re: truncate string field type
Suppose I want to search for "l(i|a)*on k(i|e)*ng". There is a space between the two words. I want Solr to retrieve only exact matches in which these two words, or their variants, are adjacent. If I use a text field type, each of these words is treated as a separate token, so Solr may bring back other results too. However, we have strict customers who need only exact matches, if any are available, and nothing more! -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: truncate string field type
Are you sure Solr is the right tool for you? Regexp searches are really the last-resort approach in this domain. I suggest that you rethink your actual business case (share it here) so that you can benefit from tokenization, or look at whether other tools are a better fit. As it is, you are using a drill to hammer nails.

Regards, Alex

On Tue, Jul 10, 2018, 2:44 AM Zahra Aminolroaya wrote:
> Thanks Alexandre and Erick. Erick I want to use my regular expression to
> search a field and Solr text field token the document, so the regular
> expression result will not be valid. I want Solr not to token my doc,
> although I will lose some terms using solr string.
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: truncate string field type
Thanks Alexandre and Erick. Erick, I want to search a field with my regular expression, but a Solr text field tokenizes the document, so the regular expression result will not be valid. I want Solr not to tokenize my doc, although I will lose some terms using the Solr string type. -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: truncate string field type
Why do you want to add such long strings to your index in the first place? They are almost useless for search; you want a tokenized field (text_general is a good place to start) if you want to search for words within the string. "The number of bytes limit" is 32K or so, right? What do you want to do with the data going in there? There may be good reasons, but I've seen confusion around strings in the past.

Best, Erick
Re: truncate string field type
Did you look into UpdateRequestProcessors? There is a truncate one there. Regards, Alex On Sun, Jul 8, 2018, 12:44 AM Zahra Aminolroaya, wrote: > I want to truncate my string field type due to its number of bytes limit. I > wrote the following in my schema: > > > > > >prefixLength="32700"/> > > > >prefixLength="32700"/> > > > > However, I found that StrField (string) does not support specifying an > analyzer. Besides, prefixLength in TruncateTokenFilterFactory could not be > more than 1000. > > I want to have the same application of string. Do you think it is > reasonable > to use "text_general" field type with solr.KeywordTokenizerFactory filter > to have the same application? Do I lose any feature? > > If I use text_general, it is not needed to truncate. > > > > > > -- > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html >
truncate string field type
I want to truncate my string field type because of its limit on the number of bytes. I tried to do this in my schema with an analyzer that uses TruncateTokenFilterFactory. However, I found that StrField (string) does not support specifying an analyzer. Besides, prefixLength in TruncateTokenFilterFactory cannot be more than 1000.

I want the same behavior as string. Do you think it is reasonable to use the "text_general" field type with solr.KeywordTokenizerFactory to get the same behavior? Would I lose any features?

If I use text_general, there is no need to truncate.

-- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
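As a rough illustration of the truncation itself, a minimal client-side sketch follows. It assumes the limit is counted in UTF-8 bytes and reuses the 32700 figure from this thread; the function name is hypothetical. It is not the update processor mentioned in the replies, only the same idea applied before a document is sent to Solr.

```python
def truncate_utf8(text, max_bytes=32700):
    """Cut text so that its UTF-8 encoding is at most max_bytes long."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Slicing bytes may split a multi-byte character; errors="ignore"
    # drops the incomplete trailing bytes instead of raising an error.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


print(len(truncate_utf8("a" * 40000).encode("utf-8")))
```

Cutting on bytes rather than characters matters for non-ASCII text, where one character can take several bytes.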
Re: Solr 7.x: Issues with unique()/hll() function on a string field nested in a range facet
I opened https://issues.apache.org/jira/browse/SOLR-11664 to track this. I should be able to look into this shortly if no one else does. -Yonik
Re: Solr 7.x: Issues with unique()/hll() function on a string field nested in a range facet
Thanks for the complete info that allowed me to easily reproduce this! The bug seems to extend beyond hll/unique... I tried min(string_s) and got wonky results as well. -Yonik
Solr 7.x: Issues with unique()/hll() function on a string field nested in a range facet
Hello,

I've encountered 2 issues while trying to apply unique()/hll() function to a string field inside a range facet:

1. Results are incorrect for a single-valued string field.
2. I’m getting ArrayIndexOutOfBoundsException for a multi-valued string field.

How to reproduce:

1. Create a core based on the default configSet.

2. Add several simple documents to the core, like these:

[
  { "id": "14790", "int_i": 2010, "date_dt": "2010-01-01T00:00:00Z", "string_s": "a", "string_ss": ["a", "b"] },
  { "id": "12254", "int_i": 2014, "date_dt": "2014-01-01T00:00:00Z", "string_s": "e", "string_ss": ["b", "c"] },
  { "id": "12937", "int_i": 2008, "date_dt": "2008-01-01T00:00:00Z", "string_s": "c", "string_ss": ["c", "d"] },
  { "id": "10575", "int_i": 2008, "date_dt": "2008-01-01T00:00:00Z", "string_s": "b", "string_ss": ["d", "e"] },
  { "id": "13644", "int_i": 2014, "date_dt": "2014-01-01T00:00:00Z", "string_s": "e", "string_ss": ["e", "a"] },
  { "id": "8405", "int_i": 2014, "date_dt": "2014-01-01T00:00:00Z", "string_s": "d", "string_ss": ["a", "b"] },
  { "id": "6128", "int_i": 2008, "date_dt": "2008-01-01T00:00:00Z", "string_s": "a", "string_ss": ["b", "c"] },
  { "id": "5220", "int_i": 2015, "date_dt": "2015-01-01T00:00:00Z", "string_s": "d", "string_ss": ["c", "d"] },
  { "id": "6850", "int_i": 2012, "date_dt": "2012-01-01T00:00:00Z", "string_s": "b", "string_ss": ["d", "e"] },
  { "id": "5748", "int_i": 2014, "date_dt": "2014-01-01T00:00:00Z", "string_s": "e", "string_ss": ["e", "a"] }
]

3. Try queries like the following for a single-valued string field:

q=*:*=0={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"int_i","gap":1,"missing":false,"start":2008,"end":2016,"type":"range","facet":{"distinct_count":"unique(string_s)"

q=*:*=0={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"date_dt","gap":"%2B1YEAR","missing":false,"start":"2008-01-01T00:00:00Z","end":"2016-01-01T00:00:00Z","type":"range","facet":{"distinct_count":"unique(string_s)"

Distinct counts returned are incorrect in general. For example, for the set of documents above, the response will contain:

{ "val": 2010, "count": 1, "distinct_count": 0 }

and

"between": { "count": 10, "distinct_count": 1 }

(there should be 5 distinct values).

Note, the result depends on the order in which the documents are added.

4. Try queries like the following for a multi-valued string field:

q=*:*=0={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"int_i","gap":1,"missing":false,"start":2008,"end":2016,"type":"range","facet":{"distinct_count":"unique(string_ss)"

q=*:*=0={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"date_dt","gap":"%2B1YEAR","missing":false,"start":"2008-01-01T00:00:00Z","end":"2016-01-01T00:00:00Z","type":"range","facet":{"distinct_count":"unique(string_ss)"

I’m getting ArrayIndexOutOfBoundsException for such queries.

Note, everything looks OK for other field types (I tried single- and multi-valued ints, doubles and dates), or when the enclosing facet is a terms facet, or when there is no enclosing facet at all.

I can reproduce these issues both for Solr 7.0.1 and 7.1.0. Solr 6.x and 5.x, as it seems, do not have such issues.

Is it a bug? Or, maybe, I’ve missed something?

Thanks,
Volodymyr
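To make the expected numbers concrete, the distinct counts can be recomputed outside Solr with a small sketch. It uses only the int_i, string_s and string_ss values from the sample documents in the report. The function name and the simple bucketing (a gap of 1 from 2008 up to, but not including, 2016) are assumptions for illustration and do not reproduce Solr's range-facet implementation.

```python
from collections import defaultdict

# (int_i, string_s, string_ss) copied from the ten sample documents
DOCS = [
    (2010, "a", ["a", "b"]),
    (2014, "e", ["b", "c"]),
    (2008, "c", ["c", "d"]),
    (2008, "b", ["d", "e"]),
    (2014, "e", ["e", "a"]),
    (2014, "d", ["a", "b"]),
    (2008, "a", ["b", "c"]),
    (2015, "d", ["c", "d"]),
    (2012, "b", ["d", "e"]),
    (2014, "e", ["e", "a"]),
]


def expected_unique(docs, multi_valued=False, start=2008, end=2016, gap=1):
    """Return (distinct count per bucket, distinct count over the whole range)."""
    per_bucket = defaultdict(set)
    overall = set()
    for year, single, multi in docs:
        if not start <= year < end:
            continue
        # Lower edge of the bucket that contains this year
        bucket = start + (year - start) // gap * gap
        values = multi if multi_valued else [single]
        per_bucket[bucket].update(values)
        overall.update(values)
    counts = {bucket: len(vals) for bucket, vals in sorted(per_bucket.items())}
    return counts, len(overall)


single_counts, single_total = expected_unique(DOCS)
multi_counts, multi_total = expected_unique(DOCS, multi_valued=True)
print(single_counts, single_total)
print(multi_counts, multi_total)
```

For string_s this gives 1 distinct value in the 2010 bucket and 5 over the whole range, which is the expectation stated in the report, against the 0 and 1 that Solr returned.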
Re: Making a String field case-insensitive
Hi Emir, Thanks for your advice. This works. Regards, Edwin
Re: Making a String field case-insensitive
Hi, You can use KeywordTokenizerFactory and LowerCaseFilterFactory. HTH, Emir

> On 1 Nov 2017, at 09:50, Zheng Lin Edwin Yeo wrote:
>
> Hi,
>
> Would like to find out, what is the best way to lower-case a String index
> in Solr, to make it case insensitive, while preserving the structure of the
> string (ie It should not break into different tokens at space, and should
> not remove any characters or symbols)
>
> I found that solr.StrField does not use lower case filter. But if I change
> it to solr.TextField and uses Standard Tokenizer, the fields get broken up.
>
> Eg:
>
> For this configuration,
>
> positionIncrementGap="100" autoGeneratePhraseQueries="false">
>
> The string "SYStem 500 *" gets broken down into this
>
> system | 500
>
> The system and 500 are separated into 2 tokens, which is not what we want.
> Also, the * is being removed.
>
> We will like to have something like this. This will preserve what it is as
> a string but just lowercase it.
>
> system 500 *
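The effect of the suggested chain (keyword tokenization followed by lowercasing) can be imitated in a few lines. This is a sketch under assumptions, not Solr code: both function names are hypothetical, the word-splitting function is only a rough stand-in for a word-based tokenizer, and the sample value is taken to be SYStem 500 * (reading the surrounding asterisks in the question as bold markers).

```python
import re


def keyword_lowercase(value):
    """Keyword-style analysis: the whole input stays one token, then is lowercased."""
    return [value.lower()]


def word_split_lowercase(value):
    """Rough stand-in for a word-splitting tokenizer plus lowercasing."""
    return [token.lower() for token in re.findall(r"\w+", value)]


sample = "SYStem 500 *"
print(keyword_lowercase(sample))
print(word_split_lowercase(sample))
```

The first function keeps the space and the asterisk inside a single token, which is the behavior asked for; the second splits on word boundaries and loses the symbol, as described in the question.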
Making a String field case-insensitive
Hi, I would like to find out the best way to lowercase a String field in Solr to make it case-insensitive while preserving the structure of the string (i.e. it should not be broken into separate tokens at spaces, and no characters or symbols should be removed).

I found that solr.StrField does not use a lowercase filter. But if I change it to solr.TextField and use the Standard Tokenizer, the field values get broken up.

E.g., with that configuration, the string "SYStem 500 *" gets broken down into this:

system | 500

"system" and "500" are separated into 2 tokens, which is not what we want. Also, the * is removed.

We would like to have something like this, which preserves the value as a single string but lowercases it:

system 500 *
Re: AW: AW: FacetField-Result on String-Field contains value with count 0?
On 1/13/2017 7:36 AM, Sebastian Riemer wrote: > Thanks, that's actually where I come from. But I don't want to exclude values > leading to a count of zero. > > Background to this: A user searched for mediaType "book" which gave him 10 > results. Now some other task/routine whatever changes all those 10 books to > be say 10 ebooks, because the type has been incorrect. The user makes a > refresh, still looking for "book" gets 0 results (which is expected) and > because we rule out facet.fields having count 0, I don't get back the > selected mediaType "book" and thus I cannot select this value in the > select-dropdown-filter for the mediaType. This leads to confusion for the > user, since he has no results, but doesn't see that it's because of he still > has that mediaType-filter set to a value "books" which now actually leads to > 0 results. Some users are always going to be confused in one way or another when something behaves in a way that's contrary to their expectations. If you plan your interface correctly, you can eliminate the biggest sources of confusion ... but there's an applicable saying here: You can never make things idiot-proof. There's always a better idiot. The facet.mincount parameter is the way to deal with this problem, as Bill Bell already mentioned. One of the reasons that facet.mincount exists is to remove terms that have no documents, but still exist in the index. If the q parameter was an actual query instead of "all docs" and the request didn't have facet.mincount, then the facet for that field would still have thirteen entries, many of which might be zero. Thanks, Shawn
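A request that applies facet.mincount can be assembled as in the sketch below. The core URL and field name come from this thread; the function name is hypothetical, and the q, rows and wt values are assumptions, since the parameter names in the quoted URLs did not survive. The sketch only builds the URL and does not contact Solr.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8983/solr/wemi/select"  # core URL from the thread


def facet_url(field, mincount=1):
    """Build a select URL that facets on one field and hides low counts."""
    params = [
        ("q", "*:*"),     # assumed: match all documents
        ("rows", 0),      # assumed: only facet counts are of interest
        ("facet", "on"),
        ("facet.field", field),
        ("facet.mincount", mincount),
        ("wt", "json"),
    ]
    return BASE_URL + "?" + urlencode(params)


print(facet_url("m_mediaType_s"))
```

With mincount set to 1, values whose count is 0 for the current query are left out of the facet list.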
AW: AW: FacetField-Result on String-Field contains value with count 0?
Thanks @Toke, for pointing out these options. I'll have a read about expungeDeletes. It sounds all the more like letting Solr filter out 0-counts is a good idea, and that I should handle my use case outside of Solr. Thanks again, Sebastian
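Handling the use case outside of Solr could look roughly like this: zero counts stay filtered out, and the currently selected filter value is added back by hand so that it still shows up in the dropdown. The function name is hypothetical; the input is the flat value/count list format that appears in the facet_fields response in this thread.

```python
def dropdown_options(flat_counts, selected=None):
    """Turn a flat [value, count, value, count, ...] list into dropdown options."""
    pairs = list(zip(flat_counts[0::2], flat_counts[1::2]))
    # Keep only values that currently match at least one document
    options = [(value, count) for value, count in pairs if count > 0]
    # Re-add the active filter value so the user can still see and clear it
    if selected is not None and all(value != selected for value, _ in options):
        options.append((selected, 0))
    return options


facet = ["2", 25561, "3", 19027, "10", 1966, "11", 1705, "12", 1067,
         "4", 1056, "5", 291, "8", 68, "13", 2, "6", 2, "7", 1, "9", 1, "1", 0]
print(dropdown_options(facet, selected="1"))
```

The same result is obtained if Solr already drops the zero counts, because the selected value is appended whenever it is missing from the list.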
AW: FacetField-Result on String-Field contains value with count 0?
Nice, thank you very much for your explanation!

>> Solr returns all fields as facet result where there was some value at some time as long as the documents are somewhere in the index, even when they're marked as deleted. So there must have been a document with m_mediaType_s=1. Even if all these documents are deleted already, its values still appear in the facet result.

I did not know about that! That makes perfect sense. I am quite sure there was a time when that field contained the value "1". What is more, now that I have rebuilt my index, the value "1" no longer appears in the facet.field result.

I'll think about how to deal with my situation then; maybe it would be better to keep Solr filtering out 0-count facet values and to insert the filter query that leads to 0 results into the select dropdown "manually".
Re: AW: FacetField-Result on String-Field contains value with count 0?
On Fri, 2017-01-13 at 14:19, Sebastian Riemer wrote:
> the second search should have been this:
> http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%221%22=on=*:*=0=0=json
> (or in other words, give me all documents having value "1" for field
> "m_mediaType_s")
>
> Since this search gives zero results, why is it included in the
> facet.fields result-count list?

Qualified guess (I don't know the JSON faceting code in detail): The list of possible facet values is extracted from the DocValues structure in the segment files, without regard to documents marked as deleted. At some point you had one or more documents with m_mediaType_s:1, which were later deleted.

If your index is not too large, you can verify this by optimizing down to 1 segment, which will remove all traces of deleted documents (unless the index is already 1 segment).

If you cannot live with the false terms, committing with expungeDeletes=true should do the trick, although it is likely to make your indexing process a lot heavier.

The reason for this inaccuracy is that it is quite heavy to verify whether a docvalue is referenced by a document: each time one or more documents in a segment are deleted, all references from all documents in that segment would have to be checked to create a correct mapping. As this only affects mincount=0 combined with your use case, where _all_ documents with a certain docvalue are deleted, my guess is that it is seen as too much of an edge case to handle.

-- Toke Eskildsen, Royal Danish Library
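The two maintenance requests mentioned here (a commit with expungeDeletes, and an optimize down to one segment) can be expressed as update URLs, as in the sketch below. The update URL is an assumption derived from the select URL in this thread, and the function names are hypothetical. The sketch only builds the URLs; sending them is left out, since both operations can be expensive on a large index.

```python
from urllib.parse import urlencode

UPDATE_URL = "http://localhost:8983/solr/wemi/update"  # assumed from the select URL


def expunge_deletes_url():
    """Commit and ask Solr to merge away segments that contain deleted documents."""
    return UPDATE_URL + "?" + urlencode({"commit": "true", "expungeDeletes": "true"})


def optimize_url(max_segments=1):
    """Optimize (force merge) down to the given number of segments."""
    return UPDATE_URL + "?" + urlencode({"optimize": "true", "maxSegments": max_segments})


print(expunge_deletes_url())
print(optimize_url())
```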
Re: FacetField-Result on String-Field contains value with count 0?
Then I don't understand your problem. Solr already does exactly what you want.

Maybe the problem is different: I assume you believe there never was a value of "1" in the index, and that this is the source of your confusion. Solr returns as facet results all values that were present at some time, as long as the documents are still somewhere in the index, even when they're marked as deleted. So there must have been a document with m_mediaType_s=1. Even if all these documents are deleted already, its values still appear in the facet result. This holds true until segments get merged so that all deleted documents are pruned. So if you send a forceMerge request, chances are good that "1" won't come up any more.

-Michael

On 13.01.2017 at 15:36, Sebastian Riemer wrote:
> Hi Bill,
>
> Thanks, that's actually where I come from. But I don't want to exclude values
> leading to a count of zero.
>
> Background to this: A user searched for mediaType "book" which gave him 10
> results. Now some other task/routine whatever changes all those 10 books to
> be say 10 ebooks, because the type has been incorrect. The user makes a
> refresh, still looking for "book" gets 0 results (which is expected) and
> because we rule out facet.fields having count 0, I don't get back the
> selected mediaType "book" and thus I cannot select this value in the
> select-dropdown-filter for the mediaType. This leads to confusion for the
> user, since he has no results, but doesn't see that it's because he still
> has that mediaType-filter set to a value "books" which now actually leads to
> 0 results.
>
> -Original Message-
> From: billnb...@gmail.com [mailto:billnb...@gmail.com]
> Sent: Friday, 13 January 2017 15:23
> To: solr-user@lucene.apache.org
> Subject: Re: AW: FacetField-Result on String-Field contains value with count
> 0?
> > Set mincount to 1 > > Bill Bell > Sent from mobile > > >> On Jan 13, 2017, at 7:19 AM, Sebastian Riemer <s.rie...@littera.eu> wrote: >> >> Pardon me, >> the second search should have been this: >> http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%221%22 >> =on=*:*=0=0=json (or in other words, give me all >> documents having value "1" for field "m_mediaType_s") >> >> Since this search gives zero results, why is it included in the facet.fields >> result-count list? >> >> >> >> Hi, >> >> Please help me understand: >> http://localhost:8983/solr/wemi/select?facet.field=m_mediaType_s=on=on=*:*=json >> returns: >> >> "facet_counts":{ >>"facet_queries":{}, >>"facet_fields":{ >> "m_mediaType_s":[ >>"2",25561, >>"3",19027, >>"10",1966, >>"11",1705, >>"12",1067, >>"4",1056, >>"5",291, >>"8",68, >>"13",2, >>"6",2, >>"7",1, >>"9",1, >>"1",0]}, >>"facet_ranges":{}, >>"facet_intervals":{}, >>"facet_heatmaps":{}}} >> >> http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%222%22 >> =on=*:*=0=0=json >> >> >> ? "response":{"numFound":25561,"start":0,"docs":[] >> >> http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%220%22 >> =on=*:*=0=0=json >> >> >> ? "response":{"numFound":0,"start":0,"docs":[] >> >> So why does the search for facet.field even contain the value "1", if it >> does not exist? >> >> And why does it e.g. not contain >> "SomeReallyCrazyOtherValueWhichLikeValue"1"DoesNotExistButLetsIncludeI >> tInTheFacetFieldsResultListAnywaysWithCountZero" : 0 >> >> Best regards, >> Sebastian >> >> Additional info, field m_mediaType_s is a string; >> > stored="true" /> >> > /> >>
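The forceMerge and expungeDeletes suggestions in this thread can both be sent to the update handler as plain HTTP requests. A minimal sketch of building those URLs follows, assuming the core URL from the thread and the standard update parameters optimize, maxSegments, commit and expungeDeletes. The helper name update_url is hypothetical, and nothing is actually sent here.

```python
from urllib.parse import urlencode


def update_url(base, **params):
    """Build a URL for Solr's /update handler (hypothetical helper)."""
    return base.rstrip("/") + "/update?" + urlencode(params)


# Core URL as used in the thread; adjust for your own installation.
base = "http://localhost:8983/solr/wemi"

# Merge down to a single segment, which prunes all deleted documents.
force_merge = update_url(base, optimize="true", maxSegments=1)

# Lighter alternative: commit and only merge away segments with deletions.
expunge = update_url(base, commit="true", expungeDeletes="true")

print(force_merge)
print(expunge)
```

Both operations rewrite segments, so on a large index they are probably better run off-peak than after every update.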
AW: AW: FacetField-Result on String-Field contains value with count 0?
Hi Bill, Thanks, that's actually where I come from. But I don't want to exclude values leading to a count of zero. Background to this: A user searched for mediaType "book" which gave him 10 results. Now some other task/routine whatever changes all those 10 books to be say 10 ebooks, because the type has been incorrect. The user makes a refresh, still looking for "book" gets 0 results (which is expected) and because we rule out facet.fields having count 0, I don't get back the selected mediaType "book" and thus I cannot select this value in the select-dropdown-filter for the mediaType. This leads to confusion for the user, since he has no results, but doesn't see that it's because of he still has that mediaType-filter set to a value "books" which now actually leads to 0 results. -Ursprüngliche Nachricht- Von: billnb...@gmail.com [mailto:billnb...@gmail.com] Gesendet: Freitag, 13. Januar 2017 15:23 An: solr-user@lucene.apache.org Betreff: Re: AW: FacetField-Result on String-Field contains value with count 0? Set mincount to 1 Bill Bell Sent from mobile > On Jan 13, 2017, at 7:19 AM, Sebastian Riemer <s.rie...@littera.eu> wrote: > > Pardon me, > the second search should have been this: > http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%221%22 > =on=*:*=0=0=json (or in other words, give me all > documents having value "1" for field "m_mediaType_s") > > Since this search gives zero results, why is it included in the facet.fields > result-count list? > > > > Hi, > > Please help me understand: > http://localhost:8983/solr/wemi/select?facet.field=m_mediaType_s=on=on=*:*=json > returns: > > "facet_counts":{ >"facet_queries":{}, >"facet_fields":{ > "m_mediaType_s":[ >"2",25561, >"3",19027, >"10",1966, >"11",1705, >"12",1067, >"4",1056, >"5",291, >"8",68, >"13",2, >"6",2, >"7",1, >"9",1, >"1",0]}, >"facet_ranges":{}, >"facet_intervals":{}, >"facet_heatmaps":{}}} > > http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%222%22 > =on=*:*=0=0=json > > > ? 
"response":{"numFound":25561,"start":0,"docs":[] > > http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%220%22 > =on=*:*=0=0=json > > > ? "response":{"numFound":0,"start":0,"docs":[] > > So why does the search for facet.field even contain the value "1", if it does > not exist? > > And why does it e.g. not contain > "SomeReallyCrazyOtherValueWhichLikeValue"1"DoesNotExistButLetsIncludeI > tInTheFacetFieldsResultListAnywaysWithCountZero" : 0 > > Best regards, > Sebastian > > Additional info, field m_mediaType_s is a string; > stored="true" /> > /> >
Re: AW: FacetField-Result on String-Field contains value with count 0?
Set mincount to 1 Bill Bell Sent from mobile > On Jan 13, 2017, at 7:19 AM, Sebastian Riemerwrote: > > Pardon me, > the second search should have been this: > http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%221%22=on=*:*=0=0=json > > (or in other words, give me all documents having value "1" for field > "m_mediaType_s") > > Since this search gives zero results, why is it included in the facet.fields > result-count list? > > > > Hi, > > Please help me understand: > http://localhost:8983/solr/wemi/select?facet.field=m_mediaType_s=on=on=*:*=json > returns: > > "facet_counts":{ >"facet_queries":{}, >"facet_fields":{ > "m_mediaType_s":[ >"2",25561, >"3",19027, >"10",1966, >"11",1705, >"12",1067, >"4",1056, >"5",291, >"8",68, >"13",2, >"6",2, >"7",1, >"9",1, >"1",0]}, >"facet_ranges":{}, >"facet_intervals":{}, >"facet_heatmaps":{}}} > > http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%222%22=on=*:*=0=0=json > > > ? "response":{"numFound":25561,"start":0,"docs":[] > > http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%220%22=on=*:*=0=0=json > > > ? "response":{"numFound":0,"start":0,"docs":[] > > So why does the search for facet.field even contain the value "1", if it does > not exist? > > And why does it e.g. not contain > "SomeReallyCrazyOtherValueWhichLikeValue"1"DoesNotExistButLetsIncludeItInTheFacetFieldsResultListAnywaysWithCountZero" > : 0 > > Best regards, > Sebastian > > Additional info, field m_mediaType_s is a string; > stored="true" /> > >
AW: FacetField-Result on String-Field contains value with count 0?
Pardon me, the second search should have been this: http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%221%22=on=*:*=0=0=json (or in other words, give me all documents having value "1" for field "m_mediaType_s") Since this search gives zero results, why is it included in the facet.fields result-count list? Hi, Please help me understand: http://localhost:8983/solr/wemi/select?facet.field=m_mediaType_s=on=on=*:*=json returns: "facet_counts":{ "facet_queries":{}, "facet_fields":{ "m_mediaType_s":[ "2",25561, "3",19027, "10",1966, "11",1705, "12",1067, "4",1056, "5",291, "8",68, "13",2, "6",2, "7",1, "9",1, "1",0]}, "facet_ranges":{}, "facet_intervals":{}, "facet_heatmaps":{}}} http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%222%22=on=*:*=0=0=json ? "response":{"numFound":25561,"start":0,"docs":[] http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%220%22=on=*:*=0=0=json ? "response":{"numFound":0,"start":0,"docs":[] So why does the search for facet.field even contain the value "1", if it does not exist? And why does it e.g. not contain "SomeReallyCrazyOtherValueWhichLikeValue"1"DoesNotExistButLetsIncludeItInTheFacetFieldsResultListAnywaysWithCountZero" : 0 Best regards, Sebastian Additional info, field m_mediaType_s is a string;
FacetField-Result on String-Field contains value with count 0?
Hi,

Please help me understand:
http://localhost:8983/solr/wemi/select?facet.field=m_mediaType_s=on=on=*:*=json
returns:

"facet_counts":{
  "facet_queries":{},
  "facet_fields":{
    "m_mediaType_s":[
      "2",25561,
      "3",19027,
      "10",1966,
      "11",1705,
      "12",1067,
      "4",1056,
      "5",291,
      "8",68,
      "13",2,
      "6",2,
      "7",1,
      "9",1,
      "1",0]},
  "facet_ranges":{},
  "facet_intervals":{},
  "facet_heatmaps":{}}}

http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%222%22=on=*:*=0=0=json
? "response":{"numFound":25561,"start":0,"docs":[]

http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%220%22=on=*:*=0=0=json
? "response":{"numFound":0,"start":0,"docs":[]

So why does the search for facet.field even contain the value "1", if it does not exist?

And why does it e.g. not contain
"SomeReallyCrazyOtherValueWhichLikeValue"1"DoesNotExistButLetsIncludeItInTheFacetFieldsResultListAnywaysWithCountZero" : 0

Best regards,
Sebastian

Additional info, field m_mediaType_s is a string;
Index Size in String Field vs Text Field
Hi,

I would like to check: will the index size for fields defined as String generally be smaller than for fields defined as a Text field (e.g. one using KeywordTokenizerFactory)? Assume that both contain the same values, and that there are no additional filters after KeywordTokenizerFactory.

I'm using Solr 6.2.0.

Regards,
Edwin
Re: Sub faceting on string field using json facet runs extremely slow
Can somebody confirm whether the Jira issue SOLR-8096 also affects JSON facets? I see that sub-faceting with a terms facet on a string field is running 5x slower than on an integer field for the same number of hits and unique terms.

On 17-May-2016 3:33 pm, "Vijay Tiwary" <vijaykr.tiw...@gmail.com> wrote:

> Below is the request
>
> q=*:*=0=0={
>   "customer_id": {
>     "type": "terms",
>     "limit": -1,
>     "field": "cid_ti",
>     "mincount": 1,
>     "facet": {
>       "contact_s": {
>         "type": "terms",
>         "limit": 1,
>         "field": "contact_s",
>         "mincount": 1
>       }
>     }
>   }
> }=age_td:[25 TO 50]
>
> On 17-May-2016 2:20 pm, "chandan khatri" <chandankhat...@gmail.com> wrote:
>
>> Can you please share the query for sub faceting?
>>
>> On Tue, May 17, 2016 at 2:13 PM, Vijay Tiwary <vijaykr.tiw...@gmail.com>
>> wrote:
>>
>>> Hello all,
>>> I have an index of 8 shards having 1 replica each distributed across an
>>> 8 node solr cloud. Size of index is 300 gb having 30 million documents.
>>> Solr json facet runs extremely slow if I am sub faceting on string field
>>> even if numFound is only around 2 (also I am not returning any rows,
>>> i.e. rows=0).
>>> Is there any way to improve the performance?
>>>
>>> Thanks,
>>> Vijay
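The request quoted above lost its parameter names in the archive. The sketch below shows one way such a nested terms facet could be assembled and URL-encoded, assuming the stripped names were q, rows, json.facet and fq; that mapping is an assumption, not something the thread confirms. The field names cid_ti, contact_s and age_td come from the thread.

```python
import json
from urllib.parse import parse_qs, urlencode

# Nested terms facet as described in the thread: customers, then one
# contact per customer.
facet = {
    "customer_id": {
        "type": "terms",
        "limit": -1,
        "field": "cid_ti",
        "mincount": 1,
        "facet": {
            "contact_s": {
                "type": "terms",
                "limit": 1,
                "field": "contact_s",
                "mincount": 1,
            }
        },
    }
}

# Assumed parameter names; rows=0 because only facet counts are wanted.
params = {
    "q": "*:*",
    "rows": 0,
    "fq": "age_td:[25 TO 50]",
    "json.facet": json.dumps(facet),
}

query_string = urlencode(params)
print(query_string)

# Round-trip check: decoding the query string gives back the same facet.
decoded = parse_qs(query_string)
print(json.loads(decoded["json.facet"][0]) == facet)
```

Building the facet as a dictionary and serializing it avoids hand-written quoting mistakes such as the missing quote in the original request.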
Sub faceting on string field using json facet runs extremely slow
Hello all,

I have an index of 8 shards with 1 replica each, distributed across an 8-node SolrCloud cluster. The index is 300 GB in size and holds 30 million documents. Solr JSON faceting runs extremely slowly when I sub-facet on a string field, even if numFound is only around 2 (I am also not returning any rows, i.e. rows=0). Is there any way to improve the performance?

Thanks,
Vijay
Solr541 Carriage Return Stripped Off In String Field ?
Hello.

I have a question regarding the "string" field type.

[ Symptom ]
When a string value containing a carriage return and line feed (\r\n) is passed to a string field, it is stored. However, when I query that document and look at the value of the field, the carriage return has been stripped off.

[ Question ]
Is this the expected behavior?

[ Environment ]
Apache Solr 5.4.1
Documents added via SolrJ

[ How To Reproduce ]
(1) Download Apache Solr 5.4.1
(2) Create a core, "test"
(3) Prepare two fields, "id" and "field20", and assign the following attributes to them:
    - type="string" indexed="true" stored="true" required="true" multiValued="false"
(4) Start up Solr and, from the Admin GUI, make sure that everything is working with no errors, and confirm that the two defined fields are available.
(5) Write a tiny test program using SolrJ that inserts a document and queries for it. Jar files used:
    - apache-solr-solrj-5.4.0.jar
    - apache-solr-core-5.4.0.jar
    - commons-codec-1.9.jar
    - httpclient-4.5.1.jar
    - commons-io-2.4.jar
    - slf4j-api-1.7.13.jar
    - jcl-over-slf4j-1.7.14.jar
    - slf4j-jdk14-1.7.14.jar
(6) Insert a document where the value of field20 is "ABC\r\nDEF"
(7) When I query that document, from both the Admin GUI and SolrJ, the value retrieved is "ABC\nDEF", where "\r" has been stripped off.
[ Test Code ]

    package solrtest;

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Collection;

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.impl.XMLResponseParser;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.SolrDocumentList;
    import org.apache.solr.common.SolrInputDocument;

    public class SolrTest {
        public static void main(String[] args) throws IOException, SolrServerException {
            String url = "http://localhost:8983/solr/test";
            HttpSolrServer server = new HttpSolrServer(url);
            server.setParser(new XMLResponseParser());

            // Value to index; bytes 3 and 4 are CR (13) and LF (10).
            String mydata = "ABC\r\nDEF";
            byte[] asciiCodes = mydata.getBytes("US-ASCII");
            System.out.println(asciiCodes[3] + " , " + asciiCodes[4]);

            // Insert the document.
            SolrInputDocument mydoc = new SolrInputDocument();
            mydoc.addField("id", "98765", 1.0f);
            mydoc.addField("field20", mydata, 1.0f);
            Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
            docs.add(mydoc);
            server.add(docs);
            server.commit();

            // Query it back.
            SolrQuery myquery = new SolrQuery();
            myquery.setQuery("id:98765");
            QueryResponse rsp = server.query(myquery);
            SolrDocumentList hits = rsp.getResults();

            // Pick up the stored value of field20 from the returned documents.
            String target = "";
            for (SolrDocument hitdoc : hits) {
                for (String fieldname : hitdoc.getFieldNames()) {
                    for (Object cellobj : hitdoc.getFieldValues(fieldname)) {
                        if (fieldname.equals("field20")) {
                            target = cellobj.toString();
                        }
                    }
                }
            }

            // Print the byte values of what came back.
            asciiCodes = target.getBytes("US-ASCII");
            for (int i = 0; i < asciiCodes.length; i++) {
                System.out.print(asciiCodes[i] + " ");
            }
            System.out.println("\r\n");
            server.close();
        }
    }

--
Thank you in advance.
Yuichiro Kosila, Tokyo/Japan
RE: How to convert string field to date
Thanks steve. Workaround 2 is working fine. Thanks again. --sreenivasa kallu -Original Message- From: Steve Rowe [mailto:sar...@gmail.com] Sent: Thursday, January 28, 2016 6:03 PM To: solr-user@lucene.apache.org Subject: Re: How to convert string field to date Try workaround 2, I did and it worked for me. See my comment on the issue: <https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_SOLR-2D8607-3FfocusedCommentId-3D15122751-26page-3Dcom.atlassian.jira.plugin.system.issuetabpanels-3Acomment-2Dtabpanel-23comment-2D15122751=CwIFaQ=19TEyCb-E0do3cLmFgm9ItTXlbGQ5gmhRAlAtE256go=ZV-VnW_JFfcZo8vYJrpehzAvJFfw1xE42YRKpSHHqLg=YvKSGXdvGRaysNwzHzvAmlBnY6iorT9wVevdTbUPjbQ=ryXl7Qzxnej4YdkT8uiP1iNipk3zqQycBuewsOMqFjs= > -- Steve www.lucidworks.com > On Jan 28, 2016, at 6:45 PM, Kallu, Sreenivasa (HQP) > <sreenivasa.ka...@roberthalf.com> wrote: > > Thanks steve for prompt response. > > I tried workaround one. > i.e. 1. Add attr_date via add-dynamic-field instead of add-field > (even though the name has no asterisk) > > I am able to add dynamic field attr_date. But while starting the solr , I am > getting following message. > Could not load conf for core sreenimsg: Dynamic field name 'attr_date' should > have either a leading or a trailing asterisk, and no others. > > So solr looking for either leading * or trailing * in the dynamic field name. > > I can see similar problems in workaround 2. > > Any other suggestions? > > Advanced Thanks. 
> --sreenivasa kallu > > -Original Message- > From: Steve Rowe [mailto:sar...@gmail.com] > Sent: Thursday, January 28, 2016 1:17 PM > To: solr-user@lucene.apache.org > Subject: Re: How to convert string field to date > > Hi Sreenivasa, > > This is a known bug: > https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org > _jira_browse_SOLR-2D8607=CwIFaQ=19TEyCb-E0do3cLmFgm9ItTXlbGQ5gmhRA > lAtE256go=ZV-VnW_JFfcZo8vYJrpehzAvJFfw1xE42YRKpSHHqLg=ZJBCYIV-H5H3 > u5j_Rrhaex68Eb9dgqZmlO6fzKNfr8s=qmQIR8akquwcJ83E7HZgK38lTfSug8QifJEH > 1_ljJkk= > > (though the problem is not just about catch-all fields as the issue > currently indicates - all dynamic fields are affected) > > Two workarounds (neither tested): > > 1. Add attr_date via add-dynamic-field instead of add-field (even though the > name has no asterisk) 2. Remove the attr_* dynamic field, add attr-date, then > add attr_* back; these can be done with a single request. > > I’ll update SOLR_8607 to reflect these things. > > -- > Steve > www.lucidworks.com > >> On Jan 28, 2016, at 3:58 PM, Kallu, Sreenivasa (HQP) >> <sreenivasa.ka...@roberthalf.com> wrote: >> >> Hi, >> I am new to solr. >> >> I am using managed-schema. I am not using schema.xml. I am indexing outlook >> email messages. >> I can see only see three fields ( id,_version_,_text_) defined in >> managed-schema. Remaining fields are handled by following dynamic >> field > stored="true" multiValued="true"/> >> >> I have field name attr_date with type string. I want convert this >> field type to date. Currently date range is not working on this field. >> I tried schema API to add new field attr_date and got following error >> message "Field 'attr_date' already exists". I tried to replace field type >> to date and got following error message "The field 'attr_date' is not >> present in this schema, and so cannot be replaced". >> >> Please help me to convert "attr_date" field type to date. >> >> Advanced Thanks. >> --sreenivasa kallu >> >> >
How to convert string field to date
Hi,

I am new to Solr.

I am using managed-schema, not schema.xml. I am indexing Outlook email messages. I can see only three fields (id, _version_, _text_) defined in managed-schema. The remaining fields are handled by the attr_* dynamic field (stored="true" multiValued="true").

I have a field named attr_date with type string. I want to convert this field's type to date, because date range queries currently do not work on it. I tried the Schema API to add a new field attr_date and got the error message "Field 'attr_date' already exists". I then tried to replace the field's type with date and got the error message "The field 'attr_date' is not present in this schema, and so cannot be replaced".

Please help me convert the "attr_date" field type to date.

Thanks in advance.
--sreenivasa kallu
Re: How to convert string field to date
Hi Sreenivasa,

This is a known bug: https://issues.apache.org/jira/browse/SOLR-8607

(though the problem is not just about catch-all fields as the issue currently indicates - all dynamic fields are affected)

Two workarounds (neither tested):

1. Add attr_date via add-dynamic-field instead of add-field (even though the name has no asterisk)
2. Remove the attr_* dynamic field, add attr_date, then add attr_* back; these can be done with a single request.

I'll update SOLR-8607 to reflect these things.

--
Steve
www.lucidworks.com

> On Jan 28, 2016, at 3:58 PM, Kallu, Sreenivasa (HQP) wrote:
>
> Hi,
> I am new to solr.
>
> I am using managed-schema. I am not using schema.xml. I am indexing outlook
> email messages.
> I can see only see three fields ( id,_version_,_text_) defined in
> managed-schema. Remaining fields are
> handled by following dynamic field
> multiValued="true"/>
>
> I have field name attr_date with type string. I want convert this field type
> to date. Currently date range is not
> working on this field. I tried schema API to add new field attr_date and got
> following error message
> "Field 'attr_date' already exists". I tried to replace field type to date
> and got following error message
> "The field 'attr_date' is not present in this schema, and so cannot be
> replaced".
>
> Please help me to convert "attr_date" field type to date.
>
> Advanced Thanks.
> --sreenivasa kallu
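Workaround 2 can be expressed as one Schema API request body that carries three commands in order. The sketch below only builds that JSON payload. The command names delete-dynamic-field, add-field and add-dynamic-field are Schema API commands, but the type names "date" and "text_general" and the indexed attribute are assumptions, since the original dynamic field definition was stripped from the archive. The helper name retype_field_commands is hypothetical.

```python
import json


def retype_field_commands(field, new_type, dynamic_field):
    """Build the three commands of workaround 2 in execution order.

    Hypothetical helper: removes the dynamic field that shadows 'field',
    adds 'field' explicitly with the new type, then restores the dynamic
    field unchanged.
    """
    return {
        "delete-dynamic-field": {"name": dynamic_field["name"]},
        "add-field": {"name": field, "type": new_type, "stored": True},
        "add-dynamic-field": dynamic_field,
    }


# Assumed definition of the catch-all field; only stored and multiValued
# are visible in the thread, the type and indexed flag are guesses.
attr_dynamic = {
    "name": "attr_*",
    "type": "text_general",
    "indexed": True,
    "stored": True,
    "multiValued": True,
}

payload = json.dumps(
    retype_field_commands("attr_date", "date", attr_dynamic), indent=2)
print(payload)
```

The printed body would be POSTed to the core's /schema endpoint. Documents that were indexed while attr_date was still a string would need to be reindexed before date range queries behave as expected.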
RE: How to convert string field to date
Thanks steve for prompt response. I tried workaround one. i.e. 1. Add attr_date via add-dynamic-field instead of add-field (even though the name has no asterisk) I am able to add dynamic field attr_date. But while starting the solr , I am getting following message. Could not load conf for core sreenimsg: Dynamic field name 'attr_date' should have either a leading or a trailing asterisk, and no others. So solr looking for either leading * or trailing * in the dynamic field name. I can see similar problems in workaround 2. Any other suggestions? Advanced Thanks. --sreenivasa kallu -Original Message- From: Steve Rowe [mailto:sar...@gmail.com] Sent: Thursday, January 28, 2016 1:17 PM To: solr-user@lucene.apache.org Subject: Re: How to convert string field to date Hi Sreenivasa, This is a known bug: https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_SOLR-2D8607=CwIFaQ=19TEyCb-E0do3cLmFgm9ItTXlbGQ5gmhRAlAtE256go=ZV-VnW_JFfcZo8vYJrpehzAvJFfw1xE42YRKpSHHqLg=ZJBCYIV-H5H3u5j_Rrhaex68Eb9dgqZmlO6fzKNfr8s=qmQIR8akquwcJ83E7HZgK38lTfSug8QifJEH1_ljJkk= (though the problem is not just about catch-all fields as the issue currently indicates - all dynamic fields are affected) Two workarounds (neither tested): 1. Add attr_date via add-dynamic-field instead of add-field (even though the name has no asterisk) 2. Remove the attr_* dynamic field, add attr-date, then add attr_* back; these can be done with a single request. I’ll update SOLR_8607 to reflect these things. -- Steve www.lucidworks.com > On Jan 28, 2016, at 3:58 PM, Kallu, Sreenivasa (HQP) > <sreenivasa.ka...@roberthalf.com> wrote: > > Hi, > I am new to solr. > > I am using managed-schema. I am not using schema.xml. I am indexing outlook > email messages. > I can see only see three fields ( id,_version_,_text_) defined in > managed-schema. Remaining fields are handled by following dynamic > field stored="true" multiValued="true"/> > > I have field name attr_date with type string. 
I want convert this > field type to date. Currently date range is not working on this field. > I tried schema API to add new field attr_date and got following error > message "Field 'attr_date' already exists". I tried to replace field type to > date and got following error message "The field 'attr_date' is not present in > this schema, and so cannot be replaced". > > Please help me to convert "attr_date" field type to date. > > Advanced Thanks. > --sreenivasa kallu > >
Re: How to convert string field to date
Try workaround 2, I did and it worked for me. See my comment on the issue: <https://issues.apache.org/jira/browse/SOLR-8607?focusedCommentId=15122751=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15122751> -- Steve www.lucidworks.com > On Jan 28, 2016, at 6:45 PM, Kallu, Sreenivasa (HQP) > <sreenivasa.ka...@roberthalf.com> wrote: > > Thanks steve for prompt response. > > I tried workaround one. > i.e. 1. Add attr_date via add-dynamic-field instead of add-field (even > though the name has no asterisk) > > I am able to add dynamic field attr_date. But while starting the solr , I am > getting following message. > Could not load conf for core sreenimsg: Dynamic field name 'attr_date' should > have either a leading or a trailing asterisk, and no others. > > So solr looking for either leading * or trailing * in the dynamic field name. > > I can see similar problems in workaround 2. > > Any other suggestions? > > Advanced Thanks. > --sreenivasa kallu > > -Original Message- > From: Steve Rowe [mailto:sar...@gmail.com] > Sent: Thursday, January 28, 2016 1:17 PM > To: solr-user@lucene.apache.org > Subject: Re: How to convert string field to date > > Hi Sreenivasa, > > This is a known bug: > https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_SOLR-2D8607=CwIFaQ=19TEyCb-E0do3cLmFgm9ItTXlbGQ5gmhRAlAtE256go=ZV-VnW_JFfcZo8vYJrpehzAvJFfw1xE42YRKpSHHqLg=ZJBCYIV-H5H3u5j_Rrhaex68Eb9dgqZmlO6fzKNfr8s=qmQIR8akquwcJ83E7HZgK38lTfSug8QifJEH1_ljJkk= > > > (though the problem is not just about catch-all fields as the issue currently > indicates - all dynamic fields are affected) > > Two workarounds (neither tested): > > 1. Add attr_date via add-dynamic-field instead of add-field (even though the > name has no asterisk) 2. Remove the attr_* dynamic field, add attr-date, then > add attr_* back; these can be done with a single request. > > I’ll update SOLR_8607 to reflect these things. 
> > -- > Steve > www.lucidworks.com > >> On Jan 28, 2016, at 3:58 PM, Kallu, Sreenivasa (HQP) >> <sreenivasa.ka...@roberthalf.com> wrote: >> >> Hi, >> I am new to solr. >> >> I am using managed-schema. I am not using schema.xml. I am indexing outlook >> email messages. >> I can see only see three fields ( id,_version_,_text_) defined in >> managed-schema. Remaining fields are handled by following dynamic >> field > stored="true" multiValued="true"/> >> >> I have field name attr_date with type string. I want convert this >> field type to date. Currently date range is not working on this field. >> I tried schema API to add new field attr_date and got following error >> message "Field 'attr_date' already exists". I tried to replace field type >> to date and got following error message "The field 'attr_date' is not >> present in this schema, and so cannot be replaced". >> >> Please help me to convert "attr_date" field type to date. >> >> Advanced Thanks. >> --sreenivasa kallu >> >> >
Re: How to perform phonetic matching/query for multivalued string field
That is, use a TextField plus a KeywordTokenizerFactory, rather than a StringField On Wed, Sep 16, 2015, at 09:03 PM, Upayavira wrote: > If you want to analyse a string field, use the KeywordTokenizer - it > just passes the whole field through as a single tokenizer. > > Does that get you there? > > On Wed, Sep 16, 2015, at 08:52 PM, Jie Gao wrote: > > I understand that i can configure "solr.PhoneticFilterFactory" for both > > indexing and query time for "solr.TextField". However, i want to query a > > list of term (indexed and stored) from a field ordered by phonetic > > similarity, which can be easily done by most of relational database. > > > > Term Component allows me to perform exactly matching and regex based > > fuzzy > > matching from multi-valued field. However, the solr string field does not > > allow to customise the default analyser. Is there any other way to > > circumvent the problem? > > > > thanks, > > Jerry > > > > > > > > On 16 September 2015 at 19:55, Upayavira <u...@odoko.co.uk> wrote: > > > > > > > > > > > On Wed, Sep 16, 2015, at 06:37 PM, Jie Gao wrote: > > > > Hi, > > > > > > > > > > > > I want to query a list of terms indexed and stored in multivalued string > > > > field via Term Component. The term component can support exact matching > > > > and > > > > regex based fuzzy matching. However, Is any way i can configure scheme > > > > to > > > > do phonetic matching/query? > > > > > > Phonetic matching is done at index time - that is - you use a > > > PhoneticFilterFactory in your analysis chain, such that you are doing > > > exact match lookups on the phonetic terms. > > > > > > Make sense? > > > > > > Upayavira > > >
Re: How to perform phonetic matching/query for multivalued string field
If you want to analyse a string field, use the KeywordTokenizer - it just passes the whole field value through as a single token.

Does that get you there?

On Wed, Sep 16, 2015, at 08:52 PM, Jie Gao wrote:
> I understand that i can configure "solr.PhoneticFilterFactory" for both
> indexing and query time for "solr.TextField". However, i want to query a
> list of term (indexed and stored) from a field ordered by phonetic
> similarity, which can be easily done by most of relational database.
>
> Term Component allows me to perform exactly matching and regex based fuzzy
> matching from multi-valued field. However, the solr string field does not
> allow to customise the default analyser. Is there any other way to
> circumvent the problem?
>
> thanks,
> Jerry
>
> On 16 September 2015 at 19:55, Upayavira <u...@odoko.co.uk> wrote:
>
> > On Wed, Sep 16, 2015, at 06:37 PM, Jie Gao wrote:
> > > Hi,
> > >
> > > I want to query a list of terms indexed and stored in multivalued string
> > > field via Term Component. The term component can support exact matching
> > > and regex based fuzzy matching. However, Is any way i can configure
> > > schema to do phonetic matching/query?
> >
> > Phonetic matching is done at index time - that is - you use a
> > PhoneticFilterFactory in your analysis chain, such that you are doing
> > exact match lookups on the phonetic terms.
> >
> > Make sense?
> >
> > Upayavira
Re: How to perform phonetic matching/query for multivalued string field
I understand that I can configure "solr.PhoneticFilterFactory" for both index and query time on a "solr.TextField". However, I want to query a list of terms (indexed and stored) from a field, ordered by phonetic similarity, which is easy to do in most relational databases.

The Terms Component allows me to perform exact matching and regex-based fuzzy matching on a multi-valued field. However, the Solr string field does not allow customising the default analyser. Is there any other way to work around the problem?

thanks,
Jerry

On 16 September 2015 at 19:55, Upayavira <u...@odoko.co.uk> wrote:

> On Wed, Sep 16, 2015, at 06:37 PM, Jie Gao wrote:
> > Hi,
> >
> > I want to query a list of terms indexed and stored in multivalued string
> > field via Term Component. The term component can support exact matching
> > and regex based fuzzy matching. However, Is any way i can configure
> > schema to do phonetic matching/query?
>
> Phonetic matching is done at index time - that is - you use a
> PhoneticFilterFactory in your analysis chain, such that you are doing
> exact match lookups on the phonetic terms.
>
> Make sense?
>
> Upayavira
Re: How to perform phonetic matching/query for multivalued string field
On Wed, Sep 16, 2015, at 06:37 PM, Jie Gao wrote: > Hi, > > > I want to query a list of terms indexed and stored in multivalued string > field via Term Component. The term component can support exact matching > and > regex based fuzzy matching. However, Is any way i can configure scheme to > do phonetic matching/query? Phonetic matching is done at index time - that is - you use a PhoneticFilterFactory in your analysis chain, such that you are doing exact match lookups on the phonetic terms. Make sense? Upayavira
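The idea of "exact match lookups on the phonetic terms" can be illustrated outside Solr. The sketch below uses a plain-Python implementation of American Soundex as a stand-in for whatever encoder the PhoneticFilterFactory is configured with; the choice of Soundex, the sample terms and the helper names are all assumptions made for illustration. Each term is reduced to a phonetic key at "index time", and a query is answered by encoding it the same way and doing an exact key lookup.

```python
from collections import defaultdict

# Soundex digit groups for consonants; vowels, h, w and y carry no digit.
_GROUPS = (("1", "bfpv"), ("2", "cgjkqsxz"), ("3", "dt"),
           ("4", "l"), ("5", "mn"), ("6", "r"))
_CODES = {ch: digit for digit, letters in _GROUPS for ch in letters}


def soundex(word):
    """Return the 4-character American Soundex code of a word."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    out = [letters[0].upper()]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            out.append(code)
        prev = code  # a vowel resets prev, so a repeated code counts again
    return ("".join(out) + "000")[:4]


def build_phonetic_index(terms):
    """Index time: map each phonetic key to the original terms."""
    index = defaultdict(set)
    for term in terms:
        index[soundex(term)].add(term)
    return index


def phonetic_matches(index, query):
    """Query time: encode the query and do an exact lookup on the key."""
    return sorted(index.get(soundex(query), ()))


terms = ["Robert", "Rupert", "Rubin", "Smith", "Smyth"]
index = build_phonetic_index(terms)
print(phonetic_matches(index, "Ruperd"))
print(phonetic_matches(index, "Smithe"))
```

Because the lookup returns the original terms stored under a key, a structure like this also gives a list of matched terms rather than only matching documents.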
Re: How to perform phonetic matching/query for multivalued string field
Many thanks for your suggestion. It works well for querying the field with phonetic matching and responses a list of docs tagged with the term. However, is there any way that i can get a list of matched terms ? The phonetic matching seems not work with Term Component (i'm using terms.regex to filter). Jie Gao, Research Assistant, Department of Computer Science, The University of Sheffield, Regent Court, 211 Portobello, S1 4DP, Sheffield, UK On 16 September 2015 at 21:04, Upayavira <u...@odoko.co.uk> wrote: > That is, use a TextField plus a KeywordTokenizerFactory, rather than a > StringField > > On Wed, Sep 16, 2015, at 09:03 PM, Upayavira wrote: > > If you want to analyse a string field, use the KeywordTokenizer - it > > just passes the whole field through as a single tokenizer. > > > > Does that get you there? > > > > On Wed, Sep 16, 2015, at 08:52 PM, Jie Gao wrote: > > > I understand that i can configure "solr.PhoneticFilterFactory" for both > > > indexing and query time for "solr.TextField". However, i want to query > a > > > list of term (indexed and stored) from a field ordered by phonetic > > > similarity, which can be easily done by most of relational database. > > > > > > Term Component allows me to perform exactly matching and regex based > > > fuzzy > > > matching from multi-valued field. However, the solr string field does > not > > > allow to customise the default analyser. Is there any other way to > > > circumvent the problem? > > > > > > thanks, > > > Jerry > > > > > > > > > > > > On 16 September 2015 at 19:55, Upayavira <u...@odoko.co.uk> wrote: > > > > > > > > > > > > > > > On Wed, Sep 16, 2015, at 06:37 PM, Jie Gao wrote: > > > > > Hi, > > > > > > > > > > > > > > > I want to query a list of terms indexed and stored in multivalued > string > > > > > field via Term Component. The term component can support exact > matching > > > > > and > > > > > regex based fuzzy matching. 
However, Is any way i can configure > scheme to > > > > > do phonetic matching/query? > > > > > > > > Phonetic matching is done at index time - that is - you use a > > > > PhoneticFilterFactory in your analysis chain, such that you are doing > > > > exact match lookups on the phonetic terms. > > > > > > > > Make sense? > > > > > > > > Upayavira > > > > >
Re: How to perform phonetic matching/query for multivalued string field
I bet the terms component does not analyse the terms, so you will need to hand in already analysed phonetic terms. You could use the http://localhost:8983/solr/YOUR-CORE/analysis/field URL to have Solr analyse the field for you before passing it back to the term component. Upayavira On Wed, Sep 16, 2015, at 10:03 PM, Jie Gao wrote: > Many thanks for your suggestion. > > It works well for querying the field with phonetic matching and responses > a > list of docs tagged with the term. > > However, is there any way that i can get a list of matched terms ? The > phonetic matching seems not work with Term Component (i'm using > terms.regex > to filter). > > Jie Gao, > Research Assistant, > Department of Computer Science, The University of Sheffield, > Regent Court, 211 Portobello, S1 4DP, Sheffield, UK > > On 16 September 2015 at 21:04, Upayavira <u...@odoko.co.uk> wrote: > > > That is, use a TextField plus a KeywordTokenizerFactory, rather than a > > StringField > > > > On Wed, Sep 16, 2015, at 09:03 PM, Upayavira wrote: > > > If you want to analyse a string field, use the KeywordTokenizer - it > > > just passes the whole field through as a single tokenizer. > > > > > > Does that get you there? > > > > > > On Wed, Sep 16, 2015, at 08:52 PM, Jie Gao wrote: > > > > I understand that i can configure "solr.PhoneticFilterFactory" for both > > > > indexing and query time for "solr.TextField". However, i want to query > > a > > > > list of term (indexed and stored) from a field ordered by phonetic > > > > similarity, which can be easily done by most of relational database. > > > > > > > > Term Component allows me to perform exactly matching and regex based > > > > fuzzy > > > > matching from multi-valued field. However, the solr string field does > > not > > > > allow to customise the default analyser. Is there any other way to > > > > circumvent the problem? 
> > > > > > > > thanks, > > > > Jerry > > > > > > > > > > > > > > > > On 16 September 2015 at 19:55, Upayavira <u...@odoko.co.uk> wrote: > > > > > > > > > > > > > > > > > > > On Wed, Sep 16, 2015, at 06:37 PM, Jie Gao wrote: > > > > > > Hi, > > > > > > > > > > > > > > > > > > I want to query a list of terms indexed and stored in multivalued > > string > > > > > > field via Term Component. The term component can support exact > > matching > > > > > > and > > > > > > regex based fuzzy matching. However, Is any way i can configure > > scheme to > > > > > > do phonetic matching/query? > > > > > > > > > > Phonetic matching is done at index time - that is - you use a > > > > > PhoneticFilterFactory in your analysis chain, such that you are doing > > > > > exact match lookups on the phonetic terms. > > > > > > > > > > Make sense? > > > > > > > > > > Upayavira > > > > > > >
How to perform phonetic matching/query for multivalued string field
Hi, I want to query a list of terms that are indexed and stored in a multivalued string field via the Terms Component. The Terms Component supports exact matching and regex-based fuzzy matching. However, is there any way I can configure the schema to do phonetic matching/querying? Thanks, Jerry
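The suggestion made in the replies in this thread (phonetic matching is an exact lookup on phonetically encoded terms, so the input has to be encoded the same way before it reaches the Terms Component) can be sketched without Solr. The class below is a simplified Soundex written only for illustration. It is an assumption standing in for whichever encoder a PhoneticFilterFactory would be configured with, and `SimpleSoundex` is a hypothetical name, not a Solr or commons-codec class.

```java
// Illustrative only: a simplified Soundex, standing in for the encoder that a
// phonetic filter would apply at index time. This is not Solr code.
final class SimpleSoundex {
    // Soundex digit for each letter a..z; '0' marks letters that carry no code.
    private static final String CODES = "01230120022455012623010202";

    static String encode(String word) {
        String s = word.toUpperCase();
        StringBuilder out = new StringBuilder();
        char last = '0';
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 'A' || c > 'Z') continue;          // skip anything that is not A-Z
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c);                         // keep the first letter as-is
                last = code;
            } else {
                // append a digit only when it differs from the previous code
                if (code != '0' && code != last) out.append(code);
                // H and W do not separate letters that share a code; vowels do
                if (c != 'H' && c != 'W') last = code;
            }
            if (out.length() == 4) break;              // Soundex keys are 4 characters
        }
        while (out.length() > 0 && out.length() < 4) out.append('0');
        return out.toString();
    }
}
```

Two spellings that sound alike produce the same key, so an exact lookup on the encoded form finds both. In a Solr setup the encoded form would be what is indexed, and the user's input would need the same encoding before being handed to the Terms Component.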
Re: SOLRJ Atomic updates of String field
I understand the question now. Atomic update and optimistic concurrency are independent in Solr version 5. I am not sure about version 4.2; if they are combined in that version, a _version_ field needs to be passed in every update. The atomic/partial update will succeed if the version in the request matches that of the indexed doc; otherwise the response will have HTTP error code 409. You can try passing the _version_ of the indexed doc during the update. It would also be good to add a unit test in Solr for partial update, which I currently see is missing. On Wed, Nov 12, 2014 at 1:00 PM, Ahmet Arslan iori...@yahoo.com.invalid wrote: Hi Bbarani, Partial update solrJ example can be found in : http://find.searchhub.org/document/5b1187abfcfad33f Ahmet On Tuesday, November 11, 2014 8:51 PM, bbarani bbar...@gmail.com wrote: I am using the below code to do partial update (in SOLR 4.2) partialUpdate = new HashMap<String, Object>(); partialUpdate.put("set", Object); doc.setField("description", partialUpdate); server.add(docs); server.commit(); I am seeing the below description value with {set =...}, Any idea why this is getting added? <str name="description">{set=The iPhone 6 Plus features a 5.5-inch retina HD display, the A8 chip for faster processing and longer battery life, the M8 motion coprocessor to track speed, distance and elevation, and with an 8MP iSight camera, you can record 1080p HD Video at 60 FPS!}</str>
SOLRJ Atomic updates of String field
I am using the code below to do a partial update (in Solr 4.2): partialUpdate = new HashMap<String, Object>(); partialUpdate.put("set", Object); doc.setField("description", partialUpdate); server.add(docs); server.commit(); The stored description value comes back wrapped in {set=...}. Any idea why this is getting added? <str name="description">{set=The iPhone 6 Plus features a 5.5-inch retina HD display, the A8 chip for faster processing and longer battery life, the M8 motion coprocessor to track speed, distance and elevation, and with an 8MP iSight camera, you can record 1080p HD Video at 60 FPS!}</str>
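For comparison, here is a minimal sketch of what an atomic "set" looks like as a JSON update body: the field value is a single-entry map keyed by the operation name. It uses only the JDK so it can be read on its own. The class and method names (`AtomicUpdateJson`, `setFieldJson`) and the document id in the usage note are hypothetical, and the hand-rolled escaping is a stand-in for a real JSON library.

```java
// Sketch only: builds the JSON body of an atomic "set" update by hand.
final class AtomicUpdateJson {
    // Minimal JSON string escaping: backslash, double quote, control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') sb.append('\\').append(c);
            else if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
            else sb.append(c);
        }
        return sb.append('"').toString();
    }

    // One document, identified by its unique key, with one field wrapped in {"set": ...}.
    static String setFieldJson(String idField, String id, String field, String newValue) {
        return "[{" + quote(idField) + ":" + quote(id) + ","
             + quote(field) + ":{\"set\":" + quote(newValue) + "}}]";
    }
}
```

Calling `AtomicUpdateJson.setFieldJson("id", "1", "description", "HD display")` gives a body in which `description` maps to `{"set":"HD display"}`. If the stored value shows the literal `{set=...}` text, as in the question, the map was presumably indexed as a plain value rather than interpreted as an update instruction, which is worth checking against the server-side configuration.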
Re: SOLRJ Atomic updates of String field
Sorry didn't get what you are trying to achieve and the issue. On Wed, Nov 12, 2014 at 12:20 AM, bbarani bbar...@gmail.com wrote: I am using the below code to do partial update (in SOLR 4.2) partialUpdate = new HashMapString, Object(); partialUpdate.put(set,Object); doc.setField(description, partialUpdate); server.add(docs); server.commit(); I am seeing the below description value with {set =...}, Any idea why this is getting added? str name=description {set=The iPhone 6 Plus features a 5.5-inch retina HD display, the A8 chip for faster processing and longer battery life, the M8 motion coprocessor to track speed, distance and elevation, and with an 8MP iSight camera, you can record 1080p HD Video at 60 FPS!} /str -- View this message in context: http://lucene.472066.n3.nabble.com/SOLRJ-Atomic-updates-of-String-field-tp4168809.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SOLRJ Atomic updates of String field
Hi Bbarani, Partial update solrJ example can be found in : http://find.searchhub.org/document/5b1187abfcfad33f Ahmet On Tuesday, November 11, 2014 8:51 PM, bbarani bbar...@gmail.com wrote: I am using the below code to do partial update (in SOLR 4.2) partialUpdate = new HashMapString, Object(); partialUpdate.put(set,Object); doc.setField(description, partialUpdate); server.add(docs); server.commit(); I am seeing the below description value with {set =...}, Any idea why this is getting added? str name=description {set=The iPhone 6 Plus features a 5.5-inch retina HD display, the A8 chip for faster processing and longer battery life, the M8 motion coprocessor to track speed, distance and elevation, and with an 8MP iSight camera, you can record 1080p HD Video at 60 FPS!} /str -- View this message in context: http://lucene.472066.n3.nabble.com/SOLRJ-Atomic-updates-of-String-field-tp4168809.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Exact match on string field with special characters
I may have provided too much background for my question. What I am trying to do, at its core, is an exact match on a single field. I do this programmatically by reading the field value from the facet query and setting it equal to the field name for a subsequent search. If this is a sample facet query result (Field1 is defined as a string): [Field1:[HI! THIS IS A VALUE FOR \"FIELD1\" (100)] then I need to run a search for that exact value. The problem is the double quotes and backslashes when I try to construct the facet query: String fq = "Field1:" + "\"" + value + "\""; The quotes play havoc with the concatenation, as do backslashes. I was wondering if there is a way to build the search without having to construct it manually in code. The only thing I can come up with is to transform the field data at index time by replacing double quotes and backslashes. I don't strip special chars because I'm using the facet values for display. This problem may be specific to SolrJ. Thanks!
Re: Exact match on string field with special characters
Shoot, I just noticed the error in my original post, which would certainly cause confusion. Instead of query.addFacetField(fq); I meant to write query.setParam("fq", fg); Sorry.
RE: Exact match on string field with special characters
This should do what you want: String fq = "Field1:" + "\"" + org.apache.solr.client.solrj.util.ClientUtils.escapeQueryChars(value) + "\""; -Michael -----Original Message----- From: tedsolr [mailto:tsm...@sciquest.com] Sent: Monday, October 06, 2014 10:49 AM To: solr-user@lucene.apache.org Subject: Re: Exact match on string field with special characters I may have provided too much background story for my question. What I am trying to do at the core here, is an exact match on a single field. I do this programmatically by reading the field value from the facet query and setting it equal to the field name for a subsequent search. if this is a sample facet query result ... (Field1 is defined as a string) [Field1:[HI! THIS IS A VALUE FOR \"FIELD1\" (100)] Then I need to run a search for that exact value. The problem is the double quotes and slashes when I try to construct the facet query ... String fq = "Field1:" + "\"" + value + "\""; The quotes play havoc with the concatenation, as do backslashes. I was wondering if there's a way to build the search without having to manually construct it in code. The only thing I can come up with is to transform the field data at index time by replacing double quotes and backslashes. I don't strip special chars because I'm using the facet values for display. This problem may be specific to SolrJ. Thanks!
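To make the escaping step concrete, here is a small JDK-only sketch of the idea: put a backslash in front of every character the query parser treats as syntax, and in front of whitespace. The character list is modeled on what SolrJ's ClientUtils.escapeQueryChars is understood to escape; that is an assumption, and real code should call the SolrJ method instead. `QueryEscape` and `exactMatchFq` are hypothetical names.

```java
// Sketch only: a local stand-in for query-character escaping.
final class QueryEscape {
    // Characters treated as query syntax (assumed to mirror SolrJ's list).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";

    static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            // prefix syntax characters and whitespace with a backslash
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) sb.append('\\');
            sb.append(c);
        }
        return sb.toString();
    }

    // Builds field:escapedValue, suitable for use as a filter query string.
    static String exactMatchFq(String field, String value) {
        return field + ":" + escape(value);
    }
}
```

With every special character and space escaped, the value should reach a string field as a single term, so the surrounding double quotes in the hand-built version become optional.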
Exact match on string field with special characters
I am trying to do SQL-like aggregation (GROUP BY) with Solr faceting, so I use string fields for faceting to try to get an exact match. However, it seems that to run a facet query I have to surround the value with double quotes. That poses issues when the field value is green bath towels -or- red \cars Those two special characters must be transformed somehow on indexing so I can create the query: (java) ... String fg = fieldName + ":\"" + fieldValue + "\""; query.addFacetField(fq); ... Is there a way to request an exact match search without having to resort to the quotes? I could possibly convert spaces to underscores at index time, but I'd like to avoid munging that data because I'm using the string field for display too! That saves time/searches when aggregating against 10 - 15 fields, which takes a whole lot of facet searches to begin with. Using Solr 4.9
RE: Exact match on string field with special characters
When you call addFacetField, the parameter you pass it should just be the fieldName. The fieldValue shouldn't come into play at all (unless I'm misunderstanding what you're trying to do). If you ever do need to escape a value for a query, you can use org.apache.solr.client.solrj.util.ClientUtils.escapeQueryChars(). -Michael -Original Message- From: tedsolr [mailto:tsm...@sciquest.com] Sent: Wednesday, October 01, 2014 5:33 PM To: solr-user@lucene.apache.org Subject: Exact match on string field with special characters I am trying to do SQL like aggregation (GROUP BY) with solr faceting. So I use string fields for faceting - to try to get an exact match. However, it seems like to run a facet query I have to surround the value with double quotes. That poses issues when the field value is green bath towels -or- red \cars Those two special characters must be transformed somehow on indexing so I can create the query: (java) ... String fg = fieldName + :\ + fieldValue + \; query.addFacetField(fq); ... Is there a way to request an exact match search without having to resort to the quotes? I could possibly convert spaces to underscores at index time, but I'd like to avoid munging that data because I'm using the string field for display too! That saves time/searches when aggregating against 10 - 15 fields which takes a whole lot of facet searches to begin with. Using Solr 4.9 -- View this message in context: http://lucene.472066.n3.nabble.com/Exact-match-on-string-field-with-special-characters-tp4162209.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Exact match on string field with special characters
Hi, raw query parser or term query parser would be handy. https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermQueryParser Ahmet On Thursday, October 2, 2014 12:32 AM, tedsolr tsm...@sciquest.com wrote: I am trying to do SQL like aggregation (GROUP BY) with solr faceting. So I use string fields for faceting - to try to get an exact match. However, it seems like to run a facet query I have to surround the value with double quotes. That poses issues when the field value is green bath towels -or- red \cars Those two special characters must be transformed somehow on indexing so I can create the query: (java) ... String fg = fieldName + :\ + fieldValue + \; query.addFacetField(fq); ... Is there a way to request an exact match search without having to resort to the quotes? I could possibly convert spaces to underscores at index time, but I'd like to avoid munging that data because I'm using the string field for display too! That saves time/searches when aggregating against 10 - 15 fields which takes a whole lot of facet searches to begin with. Using Solr 4.9 -- View this message in context: http://lucene.472066.n3.nabble.com/Exact-match-on-string-field-with-special-characters-tp4162209.html Sent from the Solr - User mailing list archive at Nabble.com.
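As a sketch of the term query parser route: with the `{!term f=...}` local-params prefix, everything after the closing brace is taken as the literal term, so the value needs no quoting or escaping. The helper below only builds that string. `TermFilter` is a hypothetical name, and the sketch assumes the field name itself contains no spaces or braces.

```java
// Sketch only: builds a filter query string for the term query parser.
final class TermFilter {
    static String build(String field, String rawValue) {
        // The raw value is appended untouched; the parser does not interpret it.
        return "{!term f=" + field + "}" + rawValue;
    }
}
```

The resulting string would be passed as an `fq` parameter, for example through SolrQuery's addFilterQuery, with the facet value exactly as it was returned for display.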
How to summarize a String Field ?
Hi, One of my fields, called AMOUNT, is a String, and I want to calculate the sum of this field. I have tried the stats component, but it only returns the stats information without a sum item, as follows: <lst name="AMOUNT"> <str name="min"></str> <str name="max">5000</str> <long name="count">24230</long> <long name="missing">26362</long> <lst name="facets"/> </lst> Is there any way to achieve this? Regards
Re: How to summarize a String Field ?
You cannot do this as far as I know, it must be a numeric field (float/int/tint/tfloat whatever). Best Erick On Thu, Sep 18, 2014 at 12:46 AM, YouPeng Yang yypvsxf19870...@gmail.com wrote: Hi One of my filed called AMOUNT is String,and I want to calculate the sum of the this filed. I have try it with the stats component,it only give out the stats information without sum item just as following: lst name=AMOUNT str name=min/str str name=max5000/str long name=count24230/long long name=missing26362/long lst name=facets/ /lst Is there any ways to achieve this object? Regards
Re: How to summarize a String Field ?
Do a copyField to a numeric field. -- Jack Krupansky -Original Message- From: Erick Erickson Sent: Thursday, September 18, 2014 11:35 AM To: solr-user@lucene.apache.org Subject: Re: How to summarize a String Field ? You cannot do this as far as I know, it must be a numeric field (float/int/tint/tfloat whatever). Best Erick On Thu, Sep 18, 2014 at 12:46 AM, YouPeng Yang yypvsxf19870...@gmail.com wrote: Hi One of my filed called AMOUNT is String,and I want to calculate the sum of the this filed. I have try it with the stats component,it only give out the stats information without sum item just as following: lst name=AMOUNT str name=min/str str name=max5000/str long name=count24230/long long name=missing26362/long lst name=facets/ /lst Is there any ways to achieve this object? Regards
Typecast non stored string field for sorting
Hi friends, I have a field of type string which I created by mistake; it should have been int. It is not stored, just indexed. I want to sort on it numerically, so I want a function that can convert it to an integer or double at query time so that I can then apply the sort. Is that possible? If not, can I create a new field with the values from the non-stored field? Please advise. Thanks and kind regards, Abhishek jain
Re: Typecast non stored string field for sorting
I don't know of any way offhand to do this except to re-index. You can't, for instance, say copy from this indexed field to this other indexed field. Is it possible for you to re-index? Best, Erick On Wed, Apr 23, 2014 at 12:46 PM, abhishek jain abhishek.netj...@gmail.com wrote: Hi friends, I have a field which is string which I created by mistake it should have been int. It is not stored just indexed. I want to numerically sort it, and hence I want a function which can at query convert to integer or double and then I can apply sort. Is it possible? If not then can I create a new field with the value from non stored field? Please advise. Thanks Abhishek -- Thanks and kind Regards, Abhishek jain +91 9971376767
Re: Typecast non stored string field for sorting
I think you can write a custom function query and use it on query time. -- View this message in context: http://lucene.472066.n3.nabble.com/Typecast-non-stored-string-field-for-sorting-tp4132759p4132779.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: highlight feature is not working on string field type- Apache Solr
Hi; If you examine the Solr example folder you can see that the highlighting feature works for a String field. Here is the definition of the cat field, which is of type String: <field name="cat" type="string" indexed="true" stored="true" multiValued="true"/> If you run this from your browser: http://localhost:8983/solr/collection1/select?q=cat:*%20AND%20name:samsung&hl=true&hl.fl=* you will see that highlighting works as expected. All in all, what is your Solr version and the configuration of your search handler? Thanks; Furkan KAMACI 2013/12/5 pyramesh pyrames...@gmail.com Hi ALL, I have recently build small search application using Apache solr. now I am facing an issue. Highlighting text feature is not working on string field type, But it working on text field type. when I search the content on string field type, the results are getting displaying, but not getting highlight. can any one please guide me on the same. Thanks in Advance !!! Regards, Ramesh py
highlight feature is not working on string field type- Apache Solr
Hi all, I have recently built a small search application using Apache Solr and am now facing an issue. The highlighting feature does not work on the string field type, but it works on the text field type. When I search content on a string field, the results are displayed but not highlighted. Can anyone please guide me on this? Thanks in advance! Regards, Ramesh py
Re: String field does not yield partial match result using qf parameter
fieldType string is not tokenized, so your observation is correct. You need to use a fieldType with analysis and tokenization to get the behavior you want. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com 25. juni 2013 kl. 02:35 skrev Mugoma Joseph O. mug...@yengas.com: It looks like partial search works only with copied to field. This works: $ curl http://localhost:8282/solr/links/select?q=text_ngrams:yengaswt=jsonindent=onfl=id,domain,score; On Tue, June 25, 2013 12:39 am, Mugoma Joseph O. wrote: Hello, I am newbie to solr. I am trying out partial search (match). My experience is opposite of http://lucene.472066.n3.nabble.com/string-field-does-not-yield-exact-match-result-using-qf-parameter-td4060096.html When I add 'qf' to to dismax query I get no result unless there's a full match. I am using NGramFilterFactory as follows: fieldType name=text_edgengrams class=solr.TextField analyzer type=index tokenizer class=solr.LowerCaseTokenizerFactory/ filter class=solr.NGramFilterFactory minGramSize=3 maxGramSize=15/ /analyzer analyzer type=query tokenizer class=solr.LowerCaseTokenizerFactory/ /analyzer /fieldType ... field name=text_ngrams type=text_edgengrams indexed=true stored=false multiValued=true / ... field name=domain type=string indexed=true stored=true/ ... copyField source=domain dest=text_ngrams/ If I have yengas.com in indexed I can search for yengas.com but not yengas. However, If I drop 'qf' I can search for yengas. 
Example searches: $ curl http://localhost:8282/solr/links/select?q=domain:yengaswt=jsonindent=onfl=id,domain,score; = response:{numFound:0,start:0,docs:[] $ curl http://localhost:8282/solr/links/select?q=domain:yengas.comwt=jsonindent=onfl=id,domain,score; = response:{numFound:3,start:0,docs:[] $ curl http://localhost:8282/solr/links/select?defType=dismaxq=yengasqf=domain^4pf=domainps=0fl=id,domain,score; = response:{numFound:0,start:0,docs:[] $ curl http://localhost:8282/solr/links/select?defType=dismaxq=yengas.compf=domainps=0fl=id,domain,score; = response:{numFound:3,start:0,docs:[] The partial match fails on dismax and normal query. What could I be missing? Thanks. Mugoma.
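The point about tokenization can be illustrated with a small JDK-only sketch that mimics the analysis chain from the question: lowercase and split on non-letters, then emit n-grams of 3 to 15 characters. It is an approximation written for illustration, not Solr's LowerCaseTokenizer or NGramFilter, and `NGramSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: approximates a lowercase tokenizer followed by an n-gram filter.
final class NGramSketch {
    // Lowercase the text and split it on anything that is not a letter.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Every substring of each token whose length lies between min and max.
    static List<String> ngrams(String text, int min, int max) {
        List<String> grams = new ArrayList<>();
        for (String token : tokens(text)) {
            for (int start = 0; start < token.length(); start++) {
                for (int len = min; len <= max && start + len <= token.length(); len++) {
                    grams.add(token.substring(start, start + len));
                }
            }
        }
        return grams;
    }
}
```

For the value yengas.com the analysed field ends up holding terms such as yen, yeng, yengas and com, so a query for yengas finds a matching term there. The string field holds only the single untouched term yengas.com, which is why a query for yengas against domain returns nothing.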
String field does not yield partial match result using qf parameter
Hello, I am a newbie to Solr. I am trying out partial search (match). My experience is the opposite of http://lucene.472066.n3.nabble.com/string-field-does-not-yield-exact-match-result-using-qf-parameter-td4060096.html When I add 'qf' to a dismax query I get no result unless there's a full match. I am using NGramFilterFactory as follows:
  <fieldType name="text_edgengrams" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.LowerCaseTokenizerFactory"/>
      <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="15"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    </analyzer>
  </fieldType>
  ...
  <field name="text_ngrams" type="text_edgengrams" indexed="true" stored="false" multiValued="true"/>
  ...
  <field name="domain" type="string" indexed="true" stored="true"/>
  ...
  <copyField source="domain" dest="text_ngrams"/>
If I have yengas.com in the index I can search for yengas.com but not yengas. However, if I drop 'qf' I can search for yengas. Example searches:
  $ curl "http://localhost:8282/solr/links/select?q=domain:yengas&wt=json&indent=on&fl=id,domain,score"
  => "response":{"numFound":0,"start":0,"docs":[]
  $ curl "http://localhost:8282/solr/links/select?q=domain:yengas.com&wt=json&indent=on&fl=id,domain,score"
  => "response":{"numFound":3,"start":0,"docs":[]
  $ curl "http://localhost:8282/solr/links/select?defType=dismax&q=yengas&qf=domain^4&pf=domain&ps=0&fl=id,domain,score"
  => "response":{"numFound":0,"start":0,"docs":[]
  $ curl "http://localhost:8282/solr/links/select?defType=dismax&q=yengas.com&pf=domain&ps=0&fl=id,domain,score"
  => "response":{"numFound":3,"start":0,"docs":[]
The partial match fails on both the dismax and the normal query. What could I be missing? Thanks. Mugoma.
Re: String field does not yield partial match result using qf parameter
It looks like partial search works only with copied to field. This works: $ curl http://localhost:8282/solr/links/select?q=text_ngrams:yengaswt=jsonindent=onfl=id,domain,score; On Tue, June 25, 2013 12:39 am, Mugoma Joseph O. wrote: Hello, I am newbie to solr. I am trying out partial search (match). My experience is opposite of http://lucene.472066.n3.nabble.com/string-field-does-not-yield-exact-match-result-using-qf-parameter-td4060096.html When I add 'qf' to to dismax query I get no result unless there's a full match. I am using NGramFilterFactory as follows: fieldType name=text_edgengrams class=solr.TextField analyzer type=index tokenizer class=solr.LowerCaseTokenizerFactory/ filter class=solr.NGramFilterFactory minGramSize=3 maxGramSize=15/ /analyzer analyzer type=query tokenizer class=solr.LowerCaseTokenizerFactory/ /analyzer /fieldType ... field name=text_ngrams type=text_edgengrams indexed=true stored=false multiValued=true / ... field name=domain type=string indexed=true stored=true/ ... copyField source=domain dest=text_ngrams/ If I have yengas.com in indexed I can search for yengas.com but not yengas. However, If I drop 'qf' I can search for yengas. Example searches: $ curl http://localhost:8282/solr/links/select?q=domain:yengaswt=jsonindent=onfl=id,domain,score; = response:{numFound:0,start:0,docs:[] $ curl http://localhost:8282/solr/links/select?q=domain:yengas.comwt=jsonindent=onfl=id,domain,score; = response:{numFound:3,start:0,docs:[] $ curl http://localhost:8282/solr/links/select?defType=dismaxq=yengasqf=domain^4pf=domainps=0fl=id,domain,score; = response:{numFound:0,start:0,docs:[] $ curl http://localhost:8282/solr/links/select?defType=dismaxq=yengas.compf=domainps=0fl=id,domain,score; = response:{numFound:3,start:0,docs:[] The partial match fails on dismax and normal query. What could I be missing? Thanks. Mugoma.
Re: Solr string field stripping new lines line breaks
Dear all, My English is bad, but I will try to explain. I have indexed databases and files; the files include docx, pdf, and txt. After indexing, the text of my indexed PDF documents runs together as one continuous block. I would like the line breaks in the document files to appear as line breaks in the indexed documents as well. My frontend app is SOLARIUM. How can I make line breaks appear in the indexed data? Please assist me with this. Thank you
Re: Solr string field stripping new lines line breaks
First, please start a new thread when you change the topic, doing so makes the threads easier to track. But what is your evidence that line breaks are stripped? The stored data is a verbatim copy of the data that went in to the field, nothing at all is changed. So one of several things is happening 1 they may be being stripped by whatever turns the PDF into a Solr document, SOLARIUM? 2 if you're displaying them in a browser, the line breaks may be there but just being ignored by the browser. You could write a very brief SolrJ program or similar and see the raw output by getting the data directly from your index... Best Erick On Wed, Jun 19, 2013 at 5:50 AM, sodoo first...@yahoo.com wrote: Dears, My english is bad. But I will try to explain. I have indexed databases and files. The files included : docx, pdf, txt. Then I have indexed all of data. But my indexed document pdf files text all of through continued. I try to appear line break text. Document files text line breaks to indexed document also line breaks. My frontend app is SOLARIUM. How can I appear line break the indexed data? Please assist me on this. Thank you -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-string-field-stripping-new-lines-line-breaks-tp3984384p4071595.html Sent from the Solr - User mailing list archive at Nabble.com.
Error indexing string field
I have a field declared as type string, so should it care what's inside the string? Caused by: java.lang.NumberFormatException: For input string: "1835-1910". Thanks -Peri
Re: Error indexing string field
: I have a field declared as type string, so should it care whats inside the string? : : Caused by: java.lang.NumberFormatException: For input string: "1835-1910". You haven't given us any information we can use to help you... schema? high level error that wrapped that NFE? full stack trace of the entire error? data you are indexing? https://wiki.apache.org/solr/UsingMailingLists Best guesses: * you aren't indexing into the field you think you are * there is a copyField from the field you are using into another field you forgot about * you are using an update processor that expects numbers * you are using a DataImportHandler feature that expects numbers -Hoss
Re: string field does not yield exact match result using qf parameter
Hi Jan, my question is about what happens when I tweak the pf and qf parameters and the results change slightly. I do not think that for exact match you need to implement the solution you mentioned in your reply; you can always have a string field and boost that field in your pf parameter to get the exact-match results on top. Thanks
Re: string field does not yield exact match result using qf parameter
Hi, You can try to increase the pf boost for your string field, I don't think you'll have success in having it boosted with pf since it's a string? Check explain output with debugQuery=true and see whether you get a phrase boost. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com 2. mai 2013 kl. 19:16 skrev kirpakaroji kirpakar...@yahoo.com: Hi Jan my question is when I tweak pf and qf parameter and the results change slightly and I do not think for exact match you need to implement the solution that you mentioned in your reply. you can always have string field and in your pf parameter you can boost that field to get the exact match results on top. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/string-field-does-not-yield-exact-match-result-using-qf-parameter-tp4060096p4060492.html Sent from the Solr - User mailing list archive at Nabble.com.
string field does not yield exact match result using qf parameter
I have a question regarding boosting exact-match queries to the top, followed by partial matches, and returning partial matches if there is no exact match. The following two solutions have yielded different results, and it is not clear to me why. This is the schema I have:
  <field name="f1" type="string" indexed="true" stored="true"/>
  <field name="f2" type="text_general" indexed="false" stored="true" multiValued="true"/>
  <field name="f3" type="pt_field" indexed="true" stored="true"/>
  <copyField source="f1" dest="f3"/>
  <uniqueKey>f1</uniqueKey>
  <fieldType name="pt_field" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="0"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="./lang/stopwords_pt.txt" enablePositionIncrements="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="Portuguese"/>
    </analyzer>
  </fieldType>
In my solrconfig.xml I have:
  <str name="df">f1</str>
  <str name="qf">f1^10 f3^1</str>
  <str name="pf">f1^10 f3^1</str>
If I run the query with these parameters from solrconfig.xml, 99% of the time I get the exact match first and then the partial matches. 1% of the time the exact-match result is in the index but does not show up in the results, and I get no partial matches for that query either. But if I make it qf=f3&pf=f1^10 f3^1, the exact-match result is on top 100% of the time. Why am I seeing this behavior? Is there any way to say qf=f1 on the interface and get only exact results if present (in this case f1 is a string, but the q parameter has spaces)? Do I need to use the pf field? I am using the dismax query parser. Thanks
Re: string field does not yield exact match result using qf parameter
Hi,

The pf feature will only kick in for phrases, i.e. multiple tokens. By definition a string is one single token, so it will never kick in for strings. A workaround can be found here: https://github.com/cominvent/exactmatch

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 30 Apr 2013, at 20:52, kirpakaroji kirpakar...@yahoo.com wrote:

> I have a question regarding boosting the exact match queries to top, followed by partial match and if there is no exact match then give me partial match. [...]
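The point about a string being one single token can be pictured with a toy model. This is only a sketch in plain Python, not Solr code: the helper names and the sample value "red wine" are made up for illustration, and the analyzer is reduced to lowercasing plus whitespace splitting (no word delimiter, stop word, or stemming filters).

```python
def string_field_tokens(value):
    """Toy model of a string field: the whole value is kept as one token."""
    return [value]


def whitespace_tokens(value):
    """Toy model of a tokenized text field: one lowercased token per word."""
    return value.lower().split()


def matching_terms(query, tokens):
    """Return the whitespace-separated query terms that equal an indexed token."""
    return [term for term in query.split() if term in tokens]


value = "red wine"   # hypothetical value stored in f1 and copied to f3
query = "red wine"   # user query containing a space

# Each query term is compared with the single token "red wine": nothing matches.
on_string = matching_terms(query, string_field_tokens(value))

# The tokenized copy holds "red" and "wine", so both terms match.
on_text = matching_terms(query, whitespace_tokens(value))

# Only when the query is treated as one term does the string field match exactly.
exact = query in string_field_tokens(value)

print(on_string, on_text, exact)
```

Under this simplified picture, a multi-word query can only hit the string field when the whole query is compared as a single term, which is presumably what the linked workaround arranges with a dedicated field type.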
Re: Interesting issue with special characters in a string field value
Hello Jack,

I'm not sure if this is an option for you, but if you submit and retrieve your documents using only SolrJ, you won't have to worry about escaping them for encoding into a particular document format. SolrJ would handle that for you.

Michael Della Bitta
Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271
www.appinions.com
Where Influence Isn’t a Game

On Sun, Feb 24, 2013 at 12:29 AM, Jack Park jackp...@topicquests.org wrote:

> Ok. I have revisited this issue as deeply as possible using simplistic unit tests, tossing out indexes, and starting fresh. [...]
Re: Interesting issue with special characters in a string field value
I did attempt queries with and without escaping at the admin query browser; it made no difference. I seem to recall that the system did not work without escaping, but it does seem worth disabling escaping and testing again.

Many thanks
Jack

On Sun, Feb 24, 2013 at 1:16 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote:

> Hello Jack, I'm not sure if this is an option for you, but if you submit and retrieve your documents using only SolrJ, you won't have to worry about escaping them for encoding into a particular document format. SolrJ would handle that for you. [...]
Re: Interesting issue with special characters in a string field value
Ok. I have revisited this issue as deeply as possible using simplistic unit tests, tossing out indexes, and starting fresh.

A typical Solr document might have a label, e.g. the string inside the quotes: "Node Type". That would be queried, according to what I've been able to read, as a Phrase Query, which means including the quotes around the text.

When I use the admin query panel with this query:

    label:"Node Type"

a fragment of the full document is returned. It is this:

    <doc>
      <str name="locator">NodeType</str>
      <arr name="label">
        <str>Node Type</str>
      </arr>

In my code using SolrJ, I have printlines just as the escaped query string comes in, and one which shows what the SolrQuery looks like after setting it up to go online. I then show what came back:

    Solr3Client.runQuery- label:"Node Type" 0 10
    Solr3Client.runQuery-1 q=label%3A%22Node+Type%22&start=0&rows=10
    {numFound=1,start=0,docs=[SolrDocument{locator=NodeType, smallIcon=cogwheel.png, subOf=ClassType, details=The TopicQuests typology node type., isPrivate=false, creatorId=SystemUser, label=Node Type, largeIcon=cogwheel.png, lastEditDate=Sat Feb 23 20:43:22 PST 2013, createdDate=Sat Feb 23 20:43:22 PST 2013, _version_=1427826019119661056}]}

What that says is that SolrQuery inserted a + inside the query string, and that it found 1 document, but did not return it.

In the larger picture, I have returned to using XMLResponseParser on the theory that I will now be able to take advantage of partialUpdates on multi-valued fields (List<String>), but I haven't tested that yet. I am not yet escaping such things as < or >, just those things mentioned in the Solr documents as reserved characters.

So, the current update is this: learning about phrase queries, and judicious escaping of reserved characters, seems to be helping. Next up are two issues: more robust testing of escaped characters, and trying to discover the best approach to dealing with characters that must be escaped to get past XML, e.g. '<', '>', and others.

Many thanks
Jack

On Fri, Feb 22, 2013 at 2:44 PM, Jack Park jackp...@topicquests.org wrote:

> Michael,
> I don't think you misunderstood. I will soon give a full response here, but am on the road at the moment.
>
> Many thanks
> Jack
>
> On Friday, February 22, 2013, Michael Della Bitta michael.della.bi...@appinions.com wrote:
>
>> My mistake, I misunderstood the problem. [...]
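A note on the + seen in the printed request: in URL form encoding a space is written as +, so the + in q=label%3A%22Node+Type%22 is most likely just the encoded space of the phrase query rather than something added to the query itself. The sketch below uses only Python's standard library (not SolrJ) to reproduce the same encoding and decode it again; the parameter names mirror the printed request.

```python
from urllib.parse import parse_qs, quote_plus, urlencode

# The phrase query as typed, with quotes around the two-word label.
query = 'label:"Node Type"'

# Form-encode the query alone: ':' -> %3A, '"' -> %22, space -> '+'.
encoded_q = quote_plus(query)
print(encoded_q)  # label%3A%22Node+Type%22

# Encode a whole parameter set the way an HTTP client would send it.
request = urlencode({"q": query, "start": 0, "rows": 10})
print(request)  # q=label%3A%22Node+Type%22&start=0&rows=10

# Decoding on the receiving side restores the original query text.
decoded = parse_qs(request)
print(decoded["q"][0])  # label:"Node Type"
```

The decoded value is identical to the original query, so the server sees the space, not a literal plus sign.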
Interesting issue with special characters in a string field value
I have a multi-valued stored field called details.

I've been deliberately sending it values like

    <Something to use as a source node>

If I fetch a document with that field at the admin query console, using XML, I get:

    <arr name="details">
      <str>Something to use as a source node</str>
    </arr>

If I fetch with JSON, I get:

    "details": [ "" ],

Even more curious, if I use this query at the console:

    details:Something to use as a source node

I get nothing back.

I think I'm having an identity crisis in relation to escaping characters at SolrJ. The values are going up, and when the query is to bring the document back, they come back. But, as individual values, they don't appear to be queryable.

If I actually escape them going up, then the document is full of escaped characters, which can be troublesome when fetching and using.

Any thoughts?

Many thanks
Jack
Re: Interesting issue with special characters in a string field value
Hi Jack,

If you're submitting documents as XML, you're always going to have to escape meaningful XML characters going in. If you ask for them back as XML, you should be prepared to unescape special XML characters as output. Same goes for JSON, etc. There's really no way around this... it's just a fact of life when dealing with document formats.

Michael Della Bitta

On Fri, Feb 22, 2013 at 3:24 PM, Jack Park jackp...@topicquests.org wrote:

> I have a multi-value stored field called details [...]
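The escape-on-the-way-in, unescape-on-the-way-out rule can be tried in isolation. The sketch below uses Python's standard library only (no Solr client), with the sample value from this thread; it shows that XML needs the angle brackets escaped, while JSON carries them unchanged inside a string.

```python
import json
from xml.sax.saxutils import escape, unescape

value = "<Something to use as a source node>"

# XML: '<' and '>' (and '&') must be escaped before the value goes into element text.
xml_text = escape(value)
print(xml_text)  # &lt;Something to use as a source node&gt;

# Reading it back means undoing the escaping; XML parsers normally do this for you.
round_trip = unescape(xml_text)

# JSON: angle brackets are ordinary characters inside a string literal.
json_text = json.dumps({"details": [value]})
parsed = json.loads(json_text)["details"][0]

print(round_trip == value, parsed == value)  # True True
```

Both formats return the original value after a round trip, so the stored data is the same either way; only the wire representation differs.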
Re: Interesting issue with special characters in a string field value
: If you're submitting documents as XML, you're always going to have to
: escape meaningful XML characters going in. If you ask for them back as
: XML, you should be prepared to unescape special XML characters as

That still wouldn't explain the discrepancy he's claiming to see between the json & xml responses (the json containing an empty string).

Jack: please elaborate with specifics about your solr version, field, field type, how you indexed your doc, and what the request urls & raw responses that you get are (ie: don't trust the XML you see in your browser, it may be unescaping escaped sequences in element text to be helpful .. use something like curl)

For example...

BEGIN GOOD EXAMPLE OF SPECIFICS---

I'm using Solr 4.x with the 4.x example schema which has the following field...

    <field name="cat" type="string" indexed="true" stored="true" multiValued="true"/>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" />

I indexed a doc like this...

    $ curl "http://localhost:8983/solr/update?commit=true" -H 'Content-type:application/json' -d '[{"id":"hoss", "cat":"<Something to use as a source node>" } ]'

And this is what i get from the following requests...

    $ curl "http://localhost:8983/solr/select?q=id:hoss&wt=xml&indent=true&omitHeader=true"
    <?xml version="1.0" encoding="UTF-8"?>
    <response>
    <result name="response" numFound="1" start="0">
      <doc>
        <str name="id">hoss</str>
        <arr name="cat">
          <str>&lt;Something to use as a source node&gt;</str>
        </arr>
        <long name="_version_">1427705631375097856</long></doc>
    </result>
    </response>

    $ curl "http://localhost:8983/solr/select?q=id:hoss&wt=json&indent=true&omitHeader=true"
    {
      "response":{"numFound":1,"start":0,"docs":[
          {
            "id":"hoss",
            "cat":["<Something to use as a source node>"],
            "_version_":1427705631375097856}]
      }}

    $ curl 'http://localhost:8983/solr/select?q=cat:%22<Something+to+use+as+a+source+node>%22&wt=json&indent=true&omitHeader=true'
    {
      "response":{"numFound":1,"start":0,"docs":[
          {
            "id":"hoss",
            "cat":["<Something to use as a source node>"],
            "_version_":1427705631375097856}]
      }}

END GOOD EXAMPLE OF SPECIFICS---

: Even more curious, if I use this query at the console:
:
: details:Something to use as a source node
:
: I get nothing back.

Note in my last example above the importance of using quotes (or the {!term} qparser) to query string fields that contain special characters like whitespace -- whitespace is syntactically meaningful to the lucene query parser, it separates clauses of a boolean query.

-Hoss
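The two options mentioned at the end (quotes, or the {!term} qparser) can be sketched as small helpers that build the q parameter. This is an illustration in plain Python, not part of any Solr client library: the function names are hypothetical, and the field and value are the ones from the example above.

```python
from urllib.parse import urlencode


def phrase_query(field, value):
    """Quote the whole value so whitespace does not split it into clauses.

    Inside double quotes only backslash and double quote need escaping.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def term_query(field, value):
    """Use the {!term} query parser, which takes the raw value as a single term."""
    return f"{{!term f={field}}}{value}"


value = "<Something to use as a source node>"

phrase = phrase_query("cat", value)
term = term_query("cat", value)
print(phrase)  # cat:"<Something to use as a source node>"
print(term)    # {!term f=cat}<Something to use as a source node>

# URL-encode the parameters before putting them on the request line.
params = urlencode({"q": phrase, "wt": "json"})
print(params)  # q=cat%3A%22%3CSomething+to+use+as+a+source+node%3E%22&wt=json
```

Either form keeps the whole value together as one clause; the unquoted form would be parsed as several separate clauses, one per word.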
Re: Interesting issue with special characters in a string field value
My mistake, I misunderstood the problem.

Michael Della Bitta

On Fri, Feb 22, 2013 at 3:55 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

> : If you're submitting documents as XML, you're always going to have to
> : escape meaningful XML characters going in. [...]