Re: DocValued SortableText Field is slower than Non DocValued String Field for Facet

2021-01-28 Thread Michael Gibney

I'm not sure about _performance_, but I'm pretty sure you don't want to be
faceting on docValued SortableTextField (and faceting on non-docValued
SortableTextField, though I think technically possible, works against
uninverted _indexed_ values, so ends up doing something entirely different):
https://issues.apache.org/jira/browse/SOLR-13056.

TL;DR: with SortableTextField bulk faceting happens over docValues (which
for SortableTextField contains the full sort value string) and refinement
happens against indexed values (which are tokenized). So it can behave very
strangely, at least in multi-shard collections. See also:
https://issues.apache.org/jira/browse/SOLR-8362

Quick clarification, you say "non Docvalued String Field" ... I'm assuming
you're talking about "StrField", not "TextField".

wrt performance difference, I'm willing to bet (though not certain) that
you're really simply noticing a discrepancy between docValues and
non-docValues faceting -- accordingly, for your use case I'd expect
faceting against StrField _with_ docValues to have similar performance to
SortableTextField with docValues. Further possibly-relevant discussion can
be found in the following thread:

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/202006.mbox/%3CCAF%3DheHFd6GBABzKzDQPTfpYUUQJXxYwue4OC86QOm_AR0X3_ZQ%40mail.gmail.com%3E

On Thu, Jan 28, 2021 at 7:25 PM Jae Joo  wrote:

> I am wondering why faceting on a docValued SortableText field is slower
> than faceting on a non-docValued String field.
>
> Does anyone know why?
>
>
> Thanks,
>
> Jae
>

DocValued SortableText Field is slower than Non DocValued String Field for Facet

2021-01-28 Thread Jae Joo

I am wondering why faceting on a docValued SortableText field is slower
than faceting on a non-docValued String field.

Does anyone know why?


Thanks,

Jae

Re: Case insensitive search on String field

2020-07-25 Thread Erick Erickson

In a word, “no”. The string type is intentionally primitive: no analysis or
case changing is done at all.

You say “you cannot reindex the data”. Why not? Just due to time constraints or 
is the original data no longer available?

If all the fields are stored, you can pull the docs from the collection and
index them into a new collection. See the REINDEXCOLLECTION command:
https://lucene.apache.org/solr/guide/8_1/collections-api.html
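
As a rough sketch of what such a call could look like, the snippet below only
builds the Collections API URL for REINDEXCOLLECTION; it does not send it. The
host and the collection names "docs_v1" and "docs_v2" are hypothetical, and
only the "name" and "target" parameters are shown. Check the reference guide
for your Solr version before relying on it.

```python
from urllib.parse import urlencode


def reindex_url(base_url, source, target):
    """Build a Collections API URL for the REINDEXCOLLECTION command."""
    params = {
        "action": "REINDEXCOLLECTION",
        "name": source,    # existing collection to read documents from
        "target": target,  # collection that receives the reindexed documents
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical host and collection names, for illustration only.
url = reindex_url("http://localhost:8983/solr", "docs_v1", "docs_v2")
print(url)
```

The target collection would need a schema whose field type applies
lowercasing, since the string type itself performs no analysis.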

Best,
Erick

> On Jul 25, 2020, at 2:22 PM, Anshuman Singh  wrote:
> 
> Hi,
> 
> We missed the fact that case insensitive search doesn't work with
> field type "string". We have 3B docs indexed and we cannot reindex the data.
> 
> Now, as schema changes require reindexing, is there any other way to
> achieve case insensitive search on string fields?
> 
> Regards,
> Anshuman

Case insensitive search on String field

2020-07-25 Thread Anshuman Singh

Hi,

We missed the fact that case insensitive search doesn't work with
field type "string". We have 3B docs indexed and we cannot reindex the data.

Now, as schema changes require reindexing, is there any other way to
achieve case insensitive search on string fields?

Regards,
Anshuman

Re: string field max size

2019-09-06 Thread Vincenzo D'Amore

Thanks Erick for this last confirmation. In the end I used the standard
"text_ws" field type:


  

  


And the field



On Fri, Sep 6, 2019 at 2:54 AM Erick Erickson 
wrote:

> bq. What I do not understand is what happens to the Analyzers, Tokenizers,
> and
> Filters in the indexing chain
>
> They are irrelevant. The analysis chain is only executed when
> indexed=true.
>
> Best,
> Erick
>
> > On Sep 5, 2019, at 9:03 AM, Vincenzo D'Amore  wrote:
> >
> > What I do not understand is what happens to the Analyzers, Tokenizers,
> and
> > Filters in the indexing chain
>
>

-- 
Vincenzo D'Amore

Re: string field max size

2019-09-05 Thread Erick Erickson

bq. What I do not understand is what happens to the Analyzers, Tokenizers, and
Filters in the indexing chain

They are irrelevant. The analysis chain is only executed when indexed=true. 

Best,
Erick

> On Sep 5, 2019, at 9:03 AM, Vincenzo D'Amore  wrote:
> 
> What I do not understand is what happens to the Analyzers, Tokenizers, and
> Filters in the indexing chain

Re: string field max size

2019-09-05 Thread Jitendra soni

I agree, stored=true and indexed=false should resolve this size issue.

On Thu, 5 Sep 2019 at 21:54, Erick Erickson  wrote:

> Use a text field with stored=true and indexed=false? That'll allow you to
> return it...
>
> On Thu, Sep 5, 2019, 07:04 Vincenzo D'Amore  wrote:
>
> > Hi all,
> >
> > sorry for the silly question, I need to store in Solr a string field
> larger
> > than 32k (index="false").
> >
> > Given that storing field larger than 32k rises an exception:
> > "DocValuesField "filterQuery" is too large, must be <= 32766", I thought
> to
> > use predefined type text_ws.
> >
> > Any suggestions?
> >
> > Thanks in advance and best regards,
> > Vincenzo
> >
> > --
> > Vincenzo D'Amore
> >
>
-- 
Thanks
Jitendra

Re: string field max size

2019-09-05 Thread Vincenzo D'Amore

Thanks Erick for the prompt answer.
What I do not understand is what happens to the Analyzers, Tokenizers, and
Filters in the indexing chain.
Are they executed or not? Answering my own question, I think not, but then
what is the difference between string and text when they are not indexed?
Just the way they are stored and retrieved?

On Thu, Sep 5, 2019 at 1:54 PM Erick Erickson 
wrote:

> Use a text field with stored=true and indexed=false? That'll allow you to
> return it...
>
> On Thu, Sep 5, 2019, 07:04 Vincenzo D'Amore  wrote:
>
> > Hi all,
> >
> > sorry for the silly question, I need to store in Solr a string field
> larger
> > than 32k (index="false").
> >
> > Given that storing field larger than 32k rises an exception:
> > "DocValuesField "filterQuery" is too large, must be <= 32766", I thought
> to
> > use predefined type text_ws.
> >
> > Any suggestions?
> >
> > Thanks in advance and best regards,
> > Vincenzo
> >
> > --
> > Vincenzo D'Amore
> >
>

-- 
Vincenzo D'Amore

Re: string field max size

2019-09-05 Thread Erick Erickson

Use a text field with stored=true and indexed=false? That'll allow you to
return it...
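
A minimal sketch of that suggestion, assuming the field is added through the
Schema API ("add-field" command, POSTed to /solr/<collection>/schema). The
field name "filterQuery" is taken from the error message in the question;
using text_ws as the type is the poster's own idea. The snippet only builds
the JSON body and sends nothing.

```python
import json


def stored_only_text_field(name, field_type="text_ws"):
    """Build a Schema API 'add-field' command for a stored, non-indexed field."""
    return {
        "add-field": {
            "name": name,
            "type": field_type,
            "stored": True,    # the value can be returned in results
            "indexed": False,  # no analysis chain runs, no terms are written
        }
    }


payload = stored_only_text_field("filterQuery")
body = json.dumps(payload)
print(body)
```

Because the field is neither indexed nor docValued, neither of the two places
where the 32766-byte limit applies should come into play.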

On Thu, Sep 5, 2019, 07:04 Vincenzo D'Amore  wrote:

> Hi all,
>
> sorry for the silly question, I need to store in Solr a string field larger
> than 32k (indexed="false").
>
> Given that storing a field larger than 32k raises an exception:
> "DocValuesField "filterQuery" is too large, must be <= 32766", I thought to
> use the predefined type text_ws.
>
> Any suggestions?
>
> Thanks in advance and best regards,
> Vincenzo
>
> --
> Vincenzo D'Amore
>

string field max size

2019-09-05 Thread Vincenzo D'Amore

Hi all,

sorry for the silly question, I need to store in Solr a string field larger
than 32k (indexed="false").

Given that storing a field larger than 32k raises an exception:
"DocValuesField "filterQuery" is too large, must be <= 32766", I thought to
use the predefined type text_ws.

Any suggestions?

Thanks in advance and best regards,
Vincenzo

-- 
Vincenzo D'Amore

Range query on multivalued string field results in useless highlighting

2019-03-22 Thread Wolf, Karl (NIH/NLM/LHC) [C]

Range queries against multivalued string fields produce useless highlighting,
even though "hl.highlightMultiTerm":"true"

I have uncovered what I believe is a bug. At the very least it is a difference
in behavior between Solr v5.1.0 and v7.5.0 (and v7.7.1).

I have a Field defined in my schema as:




I am using a query containing a Range clause and I am using highlighting to get 
the list of values that match the range query.

All examples below were using the appropriate Solr Admin Server Query page.

The range query using Solr v5.1.0 produces CORRECT and useful results:

{
  "responseHeader": {
    "status": 0,
    "QTime": 366,
    "params": {
      "q": "ResourceCorrespondent:[A TO B}",
      "hl": "true",
      "indent": "true",
      "hl.preserveMulti": "true",
      "fl": "ResourceCorrespondent,ResourceID",
      "hl.requireFieldMatch": "true",
      "hl.usePhraseHighlighter": "true",
      "hl.fl": "ResourceCorrespondent",
      "wt": "json",
      "hl.highlightMultiTerm": "true",
      "_": "1553275722025"
    }
  },
  "response": {
    "numFound": 999,
    "start": 0,
    "docs": [
      {
        "ResourceCorrespondent": [
          "Stanley, Wendell M.",
          "Avery, Roy"
        ],
        "ResourceID": "CCAAHG"
      },
      {
        "ResourceCorrespondent": [
          "Avery, Roy"
        ],
        "ResourceID": "CCGMDS"
      },
... lots more docs, then
    ]
  },
... we get to the highlighting portion of the response
... this tells me which values of each ResourceCorrespondent field
... actually match the query

  "highlighting": {
    "CCAAHG": {
      "ResourceCorrespondent": [
        "Avery, Roy"
      ]
    },
    "CCGMDS": {
      "ResourceCorrespondent": [
        "Avery, Roy"
      ]
    },
    "BBACKV": {
      "ResourceCorrespondent": [
        "American Institute of Biological Sciences",
        "Albritton, Errett C."
      ]
    },
... lots more useful highlight values. Note two matching values
... for document BBACKV.
}

***
***
However, using the exact same parameters with Solr v7.5.0 or v7.7.1, the top
portion of the response is basically the same, including the number of
documents found:
{
  "responseHeader":{
    "status":0,
    "QTime":245,
    "params":{
      "q":"ResourceCorrespondent:[A TO B}",
      "hl":"on",
      "hl.preserveMulti":"true",
      "fl":"ResourceID, ResourceCorrespondent",
      "hl.requireFieldMatch":"true",
      "hl.fl":"ResourceCorrespondent",
      "hightlightMultiTerm":"true",
      "wt":"json",
      "_":"1553105129887",
      "usePhraseHighLighter":"true"}},
  "response":{"numFound":999,"start":0,"docs":[

The documents are in a different order, but that doesn't matter.

The problem is with the highlighting, which is effectively empty. I don't
know what values in each document actually matched the query:

  "highlighting":{
    "QQBBLX":{},
    "QQBCLN":{},
    "QQBCLM":{},
... etc.

*** NOTE: The data is the same for all Solr versions and the Solr indexes
were rebuilt for each Solr version.

***
Changing to "hl.method=unified", the highlighting looks like:

  "highlighting":{
    "QQBBLX":{
      "ResourceCorrespondent":[]},
    "QQBCLN":{
      "ResourceCorrespondent":[]},
    "QQBCLM":{
      "ResourceCorrespondent":[]},

*** Closer but still no useful values

***
NOTE: if I change only the query to a wildcard query,
q="ResourceCorrespondent:A*", the highlighting is correct in both Solr
v7.5.0 and v7.7.1:

  "highlighting":{
    "QQBBLX":{
      "ResourceCorrespondent":["American Public Health Association"]},
    "QQBCLN":{
      "ResourceCorrespondent":["Abram, Morris B."]},
    "QQBCLM":{
      "ResourceCorrespondent":["Abram, Morris B."]},
... etc.

*** This makes me think there is some problem with a Range query feeding the
Highlighter code.

***
No variation of the hl settings or other query parameters fixes the problem.
The wildcard query is my current workaround, but there is still a problem
with range queries.

So there is some incompatibility among:

1) A multivalued string field AND
2) A range query against that field AND
3) Highlighting

The highlight portion of the response is effectively "empty"
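
One detail visible in the echoed 7.x "params" above is that two names appear
as "hightlightMultiTerm" and "usePhraseHighLighter" rather than the
"hl."-prefixed names used in the 5.1.0 request. Whether that has anything to
do with the empty highlighting is not established in this thread. The sketch
below, which only builds request parameters and sends nothing, spells every
highlighting option with its documented "hl." prefix so that the range query
and the wildcard workaround can be compared on equal terms. The helper name is
hypothetical.

```python
from urllib.parse import urlencode


def highlight_params(query, field):
    """Build select-handler parameters with explicitly prefixed hl.* options."""
    return {
        "q": query,
        "fl": f"ResourceID,{field}",
        "hl": "true",
        "hl.fl": field,
        "hl.requireFieldMatch": "true",
        "hl.preserveMulti": "true",
        "hl.highlightMultiTerm": "true",
        "hl.usePhraseHighlighter": "true",
        "wt": "json",
    }


field = "ResourceCorrespondent"
range_params = highlight_params(f"{field}:[A TO B}}", field)
wildcard_params = highlight_params(f"{field}:A*", field)

# Query strings that could be appended to /solr/<collection>/select?
print(urlencode(range_params))
print(urlencode(wildcard_params))
```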

I don't know when this issue was first introduced. I have recently been
updating from 5.1.0 to 7.5.0 in one big leap. I have attempted to read
through the change logs for the intervening versions but I gave up to save
my sanity.

--Karl

Re: Solr filter query on STRING field [Was:Re: solr filter query on text field]

2018-10-24 Thread Alexandre Rafalovitch

The first one treats the space as the end of the clause, so the second
keyword is searched against the default field (id). Try putting the whole
value in quotes. Or use the Field Query Parser:
https://lucene.apache.org/solr/guide/7_5/other-parsers.html#field-query-parser
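
Both suggestions can be sketched as small helpers that build the fq value.
The helper names are hypothetical; the field name and value come from the
question below. The strings still need URL encoding when placed in a request.

```python
def quoted_fq(field, value):
    """Wrap the whole value in double quotes so the space stays in one clause."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def field_parser_fq(field, value):
    """Use the Field Query Parser, which takes the rest of the string as the value."""
    return f"{{!field f={field}}}{value}"


print(quoted_fq("sm_field_tags", "Key case"))
print(field_parser_fq("sm_field_tags", "Key case"))
```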

Regards,
   Alex.

On Wed, Oct 24, 2018, 4:59 AM Marek Tichy,  wrote:

> Hi,
>
> I'm having trouble with a filter query on a multivalued string field,
> specifically with a space between words. Looking at the histogram and
> values using the Solr UI, it correctly shows that the index stores the
> string "Key case" as it should. However, the following filter queries:
>
> fq=sm_field_tags:Key case  //doesn't work
> fq=sm_field_tags:Key+case  //doesn't work
> fq=sm_field_tags:Key* //does work
> fq=sm_field_tags:Key?case //does work
>
>
> Debug shows (for the first case):
> "filter_queries":["sm_field_tags:Key case"],
> "parsed_filter_queries":["sm_field_tags:Key id:case"]
>
> Why does it parse to id: case ? Solr version is 7.4.0
>
> Many thanks
> Marek
>
>
>
>
>
>
>
>
>
> > bq.  is there any difference if the fq field is a string field vs text
> >
> > Absolutely. string fields are not analyzed in any way. They're not
> > tokenized. They are case sensitive. Etc. For example take
> > My dog.
> > as input. A string field will have a _single_ token "My dog.". It will
> > not match a search on "my". It will not match a search on "dog". It
> > won't even match "my dog." as a phrase since the case is different. It
> > won't even match "My dog" because there's no period at the end. It
> > will only match "My dog.".
>
>

Solr filter query on STRING field [Was:Re: solr filter query on text field]

2018-10-24 Thread Marek Tichy

Hi,

I'm having trouble with a filter query on a multivalued string field,
specifically with a space between words. Looking at the histogram and
values using the Solr UI, it correctly shows that the index stores the
string "Key case" as it should. However, the following filter queries:

fq=sm_field_tags:Key case   //doesn't work
fq=sm_field_tags:Key+case   //doesn't work
fq=sm_field_tags:Key*       //does work
fq=sm_field_tags:Key?case   //does work


Debug shows (for the first case):
"filter_queries":["sm_field_tags:Key case"],
"parsed_filter_queries":["sm_field_tags:Key id:case"]

Why does it parse to id: case ? Solr version is 7.4.0

Many thanks
Marek









> bq.  is there any difference if the fq field is a string field vs text
>
> Absolutely. string fields are not analyzed in any way. They're not
> tokenized. They are case sensitive. Etc. For example take
> My dog.
> as input. A string field will have a _single_ token "My dog.". It will
> not match a search on "my". It will not match a search on "dog". It
> won't even match "my dog." as a phrase since the case is different. It
> won't even match "My dog" because there's no period at the end. It
> will only match "My dog.".

Re: Json object values in solr string field

2018-09-27 Thread Balanathagiri Ayyasamypalanivel

Thanks Alex/Shawn,

Yeah, currently we handle this by writing custom code that calculates the
assets from the response, but with this approach we lose the power of the
default stats and facet features.

Also, it's actually not duplicate data, but with our current design the data
resides in 2 docs for one account, which we are planning to compress while
still being able to use stats and facet. I know it's quite complicated to
achieve both at the same time; I am still thinking about how to solve it.

On Thu, Sep 27, 2018, 11:19 AM Alexandre Rafalovitch 
wrote:

> If the duplicate data is only indexed, it is not actually duplicated. It is
> only an index entry and the record ids where it shows.
>
> Regards,
> Alex
>
> On Thu, Sep 27, 2018, 10:55 AM Balanathagiri Ayyasamypalanivel, <
> bala.cit...@gmail.com> wrote:
>
> > Hi Alex, thanks, we already have that setup in place; we are thinking of
> > optimizing further by redesigning the data to avoid this duplication.
> >
> > Regards,
> > Bala.
> >
> > On Thu, Sep 27, 2018, 10:31 AM Alexandre Rafalovitch  >
> > wrote:
> >
> > > Well, my feeling is that you are going in the wrong direction. And that
> > > maybe you need to focus more on separating your - non solr - storage
> > > representation and your - solr - search oriented representation.
> > >
> > > E.g. if your issue is storage, maybe you can focus on stored=false
> > > indexed=true approach.
> > >
> > > Regards,
> > > Alex
> > >
> > > On Thu, Sep 27, 2018, 10:13 AM Balanathagiri Ayyasamypalanivel, <
> > > bala.cit...@gmail.com> wrote:
> > >
> > > > Any suggestions?
> > > > Regards,
> > > > Bala.
> > > >
> > > > On Wed, Sep 26, 2018, 2:46 PM Balanathagiri Ayyasamypalanivel <
> > > > bala.cit...@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Thanks for the reply, actually we are planning to optimize the huge
> > > > volume
> > > > > of data.
> > > > >
> > > > > For example, in our current system we have as below, so we can do
> > facet
> > > > > pivot or stats to get the sum of asset_td for each acct, but the
> data
> > > > > growing lot whenever more asset getting added.
> > > > >
> > > > > Id | Accts| assetid | asset_td
> > > > > 1| Acct1 | asset1 | 20
> > > > > 2| Acct1 | asset2 | 30
> > > > > 3| Acct2 | asset3 | 10
> > > > > 4| Acct3 | asset2 | 10
> > > > >
> > > > > So we planned to change as
> > > > >
> > > > > Id | Accts | asset_s
> > > > > 1  | Acct1 | [{"asset1": "20", "asset2":"30"}]
> > > > > 2  | Acct2 | [{"asset3": "10"}]
> > > > > 3  | Acct3 | [{"asset2": "10"}]
> > > > >
> > > > > But only draw back here is we have to parse the json to do the sum
> of
> > > the
> > > > > values, is there any other way to handle this scenario.
> > > > >
> > > > > Regards,
> > > > > Bala.
> > > > >
> > > > > On Wed, Sep 26, 2018, 2:25 PM Shawn Heisey 
> > > wrote:
> > > > >
> > > > >> On 9/26/2018 12:20 PM, Balanathagiri Ayyasamypalanivel wrote:
> > > > >> > Currently I am storing json object type of values in string
> field
> > in
> > > > >> solr.
> > > > >> > Using this field, in the code I am parsing json objects and
> doing
> > > sum
> > > > of
> > > > >> > the values under it.
> > > > >> >
> > > > >> > In solr, do we have any option in doing it by default when using
> > the
> > > > >> json
> > > > >> > object field values.
> > > > >>
> > > > >> Even if you have JSON-formatted strings in Solr, Solr doesn't know
> > > > >> this.  It has no idea that the data is JSON, and won't be able to
> do
> > > > >> anything special with the info contained there.
> > > > >>
> > > > >> Thanks,
> > > > >> Shawn
> > > > >>
> > > > >>
> > > >
> > >
> >
>

Re: Json object values in solr string field

2018-09-27 Thread Alexandre Rafalovitch

If the duplicate data is only indexed, it is not actually duplicated. It is
only an index entry and the record ids where it shows.

Regards,
Alex

On Thu, Sep 27, 2018, 10:55 AM Balanathagiri Ayyasamypalanivel, <
bala.cit...@gmail.com> wrote:

> Hi Alex, thanks, we already have that setup in place; we are thinking of
> optimizing further by redesigning the data to avoid this duplication.
>
> Regards,
> Bala.
>
> On Thu, Sep 27, 2018, 10:31 AM Alexandre Rafalovitch 
> wrote:
>
> > Well, my feeling is that you are going in the wrong direction. And that
> > maybe you need to focus more on separating your - non solr - storage
> > representation and your - solr - search oriented representation.
> >
> > E.g. if your issue is storage, maybe you can focus on stored=false
> > indexed=true approach.
> >
> > Regards,
> > Alex
> >
> > On Thu, Sep 27, 2018, 10:13 AM Balanathagiri Ayyasamypalanivel, <
> > bala.cit...@gmail.com> wrote:
> >
> > > Any suggestions?
> > > Regards,
> > > Bala.
> > >
> > > On Wed, Sep 26, 2018, 2:46 PM Balanathagiri Ayyasamypalanivel <
> > > bala.cit...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > Thanks for the reply, actually we are planning to optimize the huge
> > > volume
> > > > of data.
> > > >
> > > > For example, in our current system we have as below, so we can do
> facet
> > > > pivot or stats to get the sum of asset_td for each acct, but the data
> > > > growing lot whenever more asset getting added.
> > > >
> > > > Id | Accts| assetid | asset_td
> > > > 1| Acct1 | asset1 | 20
> > > > 2| Acct1 | asset2 | 30
> > > > 3| Acct2 | asset3 | 10
> > > > 4| Acct3 | asset2 | 10
> > > >
> > > > So we planned to change as
> > > >
> > > > Id | Accts | asset_s
> > > > 1  | Acct1 | [{"asset1": "20", "asset2":"30"}]
> > > > 2  | Acct2 | [{"asset3": "10"}]
> > > > 3  | Acct3 | [{"asset2": "10"}]
> > > >
> > > > But only draw back here is we have to parse the json to do the sum of
> > the
> > > > values, is there any other way to handle this scenario.
> > > >
> > > > Regards,
> > > > Bala.
> > > >
> > > > On Wed, Sep 26, 2018, 2:25 PM Shawn Heisey 
> > wrote:
> > > >
> > > >> On 9/26/2018 12:20 PM, Balanathagiri Ayyasamypalanivel wrote:
> > > >> > Currently I am storing json object type of values in string field
> in
> > > >> solr.
> > > >> > Using this field, in the code I am parsing json objects and doing
> > sum
> > > of
> > > >> > the values under it.
> > > >> >
> > > >> > In solr, do we have any option in doing it by default when using
> the
> > > >> json
> > > >> > object field values.
> > > >>
> > > >> Even if you have JSON-formatted strings in Solr, Solr doesn't know
> > > >> this.  It has no idea that the data is JSON, and won't be able to do
> > > >> anything special with the info contained there.
> > > >>
> > > >> Thanks,
> > > >> Shawn
> > > >>
> > > >>
> > >
> >
>

Re: Json object values in solr string field

2018-09-27 Thread Shawn Heisey


On 9/27/2018 8:53 AM, Balanathagiri Ayyasamypalanivel wrote:

Thanks Shawn for your prompt response.
Actually, we have to apply filters at query time while calculating the score.

The challenge here is that we cannot add up the asset values and store the
result as a static field at index time. The total needs to be calculated at
query time with some filters applied.


Solr doesn't have that ability as far as I am aware.  No matter how you 
slice this, you'll be writing custom code to handle it.


In response to another part of the thread: search engines typically 
involve a lot of data duplication.  It's usually faster to simply 
duplicate data in multiple documents than to try and normalize the data 
like a relational database does.


Thanks,
Shawn

Re: Json object values in solr string field

2018-09-27 Thread Balanathagiri Ayyasamypalanivel

Hi Alex, thanks, we already have that setup in place; we are thinking of
optimizing further by redesigning the data to avoid this duplication.

Regards,
Bala.

On Thu, Sep 27, 2018, 10:31 AM Alexandre Rafalovitch 
wrote:

> Well, my feeling is that you are going in the wrong direction. And that
> maybe you need to focus more on separating your - non solr - storage
> representation and your - solr - search oriented representation.
>
> E.g. if your issue is storage, maybe you can focus on stored=false
> indexed=true approach.
>
> Regards,
> Alex
>
> On Thu, Sep 27, 2018, 10:13 AM Balanathagiri Ayyasamypalanivel, <
> bala.cit...@gmail.com> wrote:
>
> > Any suggestions?
> > Regards,
> > Bala.
> >
> > On Wed, Sep 26, 2018, 2:46 PM Balanathagiri Ayyasamypalanivel <
> > bala.cit...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > Thanks for the reply, actually we are planning to optimize the huge
> > volume
> > > of data.
> > >
> > > For example, in our current system we have as below, so we can do facet
> > > pivot or stats to get the sum of asset_td for each acct, but the data
> > > growing lot whenever more asset getting added.
> > >
> > > Id | Accts| assetid | asset_td
> > > 1| Acct1 | asset1 | 20
> > > 2| Acct1 | asset2 | 30
> > > 3| Acct2 | asset3 | 10
> > > 4| Acct3 | asset2 | 10
> > >
> > > So we planned to change as
> > >
> > > Id | Accts | asset_s
> > > 1  | Acct1 | [{"asset1": "20", "asset2":"30"}]
> > > 2  | Acct2 | [{"asset3": "10"}]
> > > 3  | Acct3 | [{"asset2": "10"}]
> > >
> > > But only draw back here is we have to parse the json to do the sum of
> the
> > > values, is there any other way to handle this scenario.
> > >
> > > Regards,
> > > Bala.
> > >
> > > On Wed, Sep 26, 2018, 2:25 PM Shawn Heisey 
> wrote:
> > >
> > >> On 9/26/2018 12:20 PM, Balanathagiri Ayyasamypalanivel wrote:
> > >> > Currently I am storing json object type of values in string field in
> > >> solr.
> > >> > Using this field, in the code I am parsing json objects and doing
> sum
> > of
> > >> > the values under it.
> > >> >
> > >> > In solr, do we have any option in doing it by default when using the
> > >> json
> > >> > object field values.
> > >>
> > >> Even if you have JSON-formatted strings in Solr, Solr doesn't know
> > >> this.  It has no idea that the data is JSON, and won't be able to do
> > >> anything special with the info contained there.
> > >>
> > >> Thanks,
> > >> Shawn
> > >>
> > >>
> >
>

Re: Json object values in solr string field

2018-09-27 Thread Balanathagiri Ayyasamypalanivel

Thanks Shawn for your prompt response.
Actually, we have to apply filters at query time while calculating the score.

The challenge here is that we cannot add up the asset values and store the
result as a static field at index time. The total needs to be calculated at
query time with some filters applied.

Regards,
Bala.

On Thu, Sep 27, 2018, 10:35 AM Shawn Heisey  wrote:

> On 9/26/2018 12:46 PM, Balanathagiri Ayyasamypalanivel wrote:
> > But only draw back here is we have to parse the json to do the sum of the
> > values, is there any other way to handle this scenario.
>
> Solr cannot do that for you.  You could put this in your indexing
> software -- add up the numbers and put the result into a new field in
> your Solr document, so that the information is already in the index when
> you do your query.  This could be done with a custom Update Processor (a
> Solr plugin that you would need to write), but if you already have
> custom indexing software, it's probably easier to simply change that
> software than to try and write a plugin.
>
> Thanks,
> Shawn
>
>

Re: Json object values in solr string field

2018-09-27 Thread Shawn Heisey


On 9/26/2018 12:46 PM, Balanathagiri Ayyasamypalanivel wrote:

The only drawback here is that we have to parse the JSON to sum the values.
Is there any other way to handle this scenario?


Solr cannot do that for you.  You could put this in your indexing 
software -- add up the numbers and put the result into a new field in 
your Solr document, so that the information is already in the index when 
you do your query.  This could be done with a custom Update Processor (a 
Solr plugin that you would need to write), but if you already have 
custom indexing software, it's probably easier to simply change that 
software than to try and write a plugin.
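
A minimal sketch of that indexing-side approach, assuming the JSON layout
shown earlier in the thread (a list of objects mapping asset names to numeric
strings, kept in asset_s). The function name and the target field
"asset_total_d" are hypothetical.

```python
import json


def add_asset_total(doc, source_field="asset_s", total_field="asset_total_d"):
    """Return a copy of doc with the sum of all asset values in a new field."""
    entries = json.loads(doc[source_field])
    total = sum(float(value) for entry in entries for value in entry.values())
    enriched = dict(doc)  # leave the caller's document untouched
    enriched[total_field] = total
    return enriched


doc = {"id": "1", "Accts": "Acct1", "asset_s": '[{"asset1": "20", "asset2":"30"}]'}
print(add_asset_total(doc)["asset_total_d"])
```

This precomputes one fixed total per document, so it does not cover totals
that depend on filters chosen at query time.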


Thanks,
Shawn

Re: Json object values in solr string field

2018-09-27 Thread Alexandre Rafalovitch

Well, my feeling is that you are going in the wrong direction. And that
maybe you need to focus more on separating your - non solr - storage
representation and your - solr - search oriented representation.

E.g. if your issue is storage, maybe you can focus on stored=false
indexed=true approach.

Regards,
Alex

On Thu, Sep 27, 2018, 10:13 AM Balanathagiri Ayyasamypalanivel, <
bala.cit...@gmail.com> wrote:

> Any suggestions?
> Regards,
> Bala.
>
> On Wed, Sep 26, 2018, 2:46 PM Balanathagiri Ayyasamypalanivel <
> bala.cit...@gmail.com> wrote:
>
> > Hi,
> >
> > Thanks for the reply, actually we are planning to optimize the huge
> volume
> > of data.
> >
> > For example, in our current system we have as below, so we can do facet
> > pivot or stats to get the sum of asset_td for each acct, but the data
> > growing lot whenever more asset getting added.
> >
> > Id | Accts| assetid | asset_td
> > 1| Acct1 | asset1 | 20
> > 2| Acct1 | asset2 | 30
> > 3| Acct2 | asset3 | 10
> > 4| Acct3 | asset2 | 10
> >
> > So we planned to change as
> >
> > Id | Accts | asset_s
> > 1  | Acct1 | [{"asset1": "20", "asset2":"30"}]
> > 2  | Acct2 | [{"asset3": "10"}]
> > 3  | Acct3 | [{"asset2": "10"}]
> >
> > But only draw back here is we have to parse the json to do the sum of the
> > values, is there any other way to handle this scenario.
> >
> > Regards,
> > Bala.
> >
> > On Wed, Sep 26, 2018, 2:25 PM Shawn Heisey  wrote:
> >
> >> On 9/26/2018 12:20 PM, Balanathagiri Ayyasamypalanivel wrote:
> >> > Currently I am storing json object type of values in string field in
> >> solr.
> >> > Using this field, in the code I am parsing json objects and doing sum
> of
> >> > the values under it.
> >> >
> >> > In solr, do we have any option in doing it by default when using the
> >> json
> >> > object field values.
> >>
> >> Even if you have JSON-formatted strings in Solr, Solr doesn't know
> >> this.  It has no idea that the data is JSON, and won't be able to do
> >> anything special with the info contained there.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
>

Re: Json object values in solr string field

2018-09-27 Thread Balanathagiri Ayyasamypalanivel

Any suggestions?
Regards,
Bala.

On Wed, Sep 26, 2018, 2:46 PM Balanathagiri Ayyasamypalanivel <
bala.cit...@gmail.com> wrote:

> Hi,
>
> Thanks for the reply, actually we are planning to optimize the huge volume
> of data.
>
> For example, in our current system we have as below, so we can do facet
> pivot or stats to get the sum of asset_td for each acct, but the data
> growing lot whenever more asset getting added.
>
> Id | Accts| assetid | asset_td
> 1| Acct1 | asset1 | 20
> 2| Acct1 | asset2 | 30
> 3| Acct2 | asset3 | 10
> 4| Acct3 | asset2 | 10
>
> So we planned to change as
>
> Id | Accts | asset_s
> 1  | Acct1 | [{"asset1": "20", "asset2":"30"}]
> 2  | Acct2 | [{"asset3": "10"}]
> 3  | Acct3 | [{"asset2": "10"}]
>
> But only draw back here is we have to parse the json to do the sum of the
> values, is there any other way to handle this scenario.
>
> Regards,
> Bala.
>
> On Wed, Sep 26, 2018, 2:25 PM Shawn Heisey  wrote:
>
>> On 9/26/2018 12:20 PM, Balanathagiri Ayyasamypalanivel wrote:
>> > Currently I am storing json object type of values in string field in
>> solr.
>> > Using this field, in the code I am parsing json objects and doing sum of
>> > the values under it.
>> >
>> > In solr, do we have any option in doing it by default when using the
>> json
>> > object field values.
>>
>> Even if you have JSON-formatted strings in Solr, Solr doesn't know
>> this.  It has no idea that the data is JSON, and won't be able to do
>> anything special with the info contained there.
>>
>> Thanks,
>> Shawn
>>
>>

Re: Json object values in solr string field

2018-09-26 Thread Balanathagiri Ayyasamypalanivel

Hi,

Thanks for the reply, actually we are planning to optimize the huge volume
of data.

For example, in our current system the data looks like the table below, so we
can use facet pivot or stats to get the sum of asset_td for each acct, but
the data grows a lot whenever more assets are added.

Id | Accts | assetid | asset_td
1  | Acct1 | asset1  | 20
2  | Acct1 | asset2  | 30
3  | Acct2 | asset3  | 10
4  | Acct3 | asset2  | 10

So we planned to change as

Id | Accts | asset_s
1  | Acct1 | [{"asset1": "20", "asset2":"30"}]
2  | Acct2 | [{"asset3": "10"}]
3  | Acct3 | [{"asset2": "10"}]

The only drawback here is that we have to parse the JSON to sum the values.
Is there any other way to handle this scenario?
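
For comparison, with the original one-document-per-asset layout the
per-account sum can be requested from Solr directly. The sketch below uses
the JSON Facet API (a terms facet with a nested sum aggregation) instead of
the facet pivot or stats component mentioned above, so treat it as one
possible formulation. It only builds the request body, which would be POSTed
to /solr/<collection>/query. The facet labels "accounts" and "total" are
arbitrary.

```python
import json


def sum_per_account_request(account_field="Accts", value_field="asset_td"):
    """Build a JSON request: one bucket per account with the summed value."""
    return {
        "query": "*:*",
        "limit": 0,  # only the facet buckets are of interest, not the docs
        "facet": {
            "accounts": {
                "type": "terms",
                "field": account_field,
                "facet": {"total": f"sum({value_field})"},
            }
        },
    }


request_body = sum_per_account_request()
print(json.dumps(request_body, indent=2))
```

Filters can be added under a "filter" key of the same request, and the sums
are then computed over the filtered documents only.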

Regards,
Bala.

On Wed, Sep 26, 2018, 2:25 PM Shawn Heisey  wrote:

> On 9/26/2018 12:20 PM, Balanathagiri Ayyasamypalanivel wrote:
> > Currently I am storing json object type of values in string field in
> solr.
> > Using this field, in the code I am parsing json objects and doing sum of
> > the values under it.
> >
> > In solr, do we have any option in doing it by default when using the json
> > object field values.
>
> Even if you have JSON-formatted strings in Solr, Solr doesn't know
> this.  It has no idea that the data is JSON, and won't be able to do
> anything special with the info contained there.
>
> Thanks,
> Shawn
>
>

Re: Json object values in solr string field

2018-09-26 Thread Shawn Heisey


On 9/26/2018 12:20 PM, Balanathagiri Ayyasamypalanivel wrote:

Currently I am storing JSON object values in a string field in Solr.
In my code I parse the JSON objects from this field and sum the values
under them.

Does Solr have any built-in option to do this on JSON object field values?


Even if you have JSON-formatted strings in Solr, Solr doesn't know 
this.  It has no idea that the data is JSON, and won't be able to do 
anything special with the info contained there.


Thanks,
Shawn

Json object values in solr string field

2018-09-26 Thread Balanathagiri Ayyasamypalanivel

Hi,
Currently I am storing JSON-object values in a string field in Solr.
In my code I parse these JSON objects and sum the values in them.

Does Solr have any built-in option to do this for fields that hold JSON
objects?

Regards,
Bala.

Re: Highlighting is not working with docValues only String field

2018-08-13 Thread Karthik Ramachandran

I have opened JIRA https://issues.apache.org/jira/browse/SOLR-12663


On Sat, Aug 11, 2018 at 8:59 PM Erick Erickson 
wrote:

> I can see why it wouldn't and also why it could/should. I also wonder about
> SortableTextField, perhaps mention that too.
>
> Seems worth a JIRA to me if there isn't one already
>
> On Fri, Aug 10, 2018, 19:49 Karthik Ramachandran <
> kramachand...@commvault.com> wrote:
>
> > We are using Solr 7.2.1, highlighting is not working with docValues only
> > String field.
> >
> > Should I open a JIRA for this?
> >
> > Schema:
> > 
> >   id
> >   
> >> required="true"/>
> >> stored="true"/>
> >> stored="false"/>
> >   
> > 
> >
> > Data:
> > [{"id":1,"name":"Testing line 1"},{"id":2,"name":"Testing line
> > 2"},{"id":3,"name":"Testing line 3"}]
> >
> > Query:
> >
> >
> http://localhost:8983/solr/test/select?q=Testing*=name=true=name,name1
> >
> > Response:
> > {"response":{"numFound":3,"start":0,"docs":[{"id":"1","name":"Testing
> line
> > 1","name1":"Testing line 1"},{"id":"2","name":"Testing line
> > 2","name1":"Testing line 2"},{"id":"3","name":"Testing line
> > 3","name1":"Testing line 3"}]},"highlighting":{"1":{"name":["Testing
> > line 1"]},"2":{"name":["Testing line
> > 2"]},"3":{"name":["Testing line 3"]}}}
> >
> >
> > With Thanks & Regards
> > Karthik Ramachandran
> > P Please don't print this e-mail unless you really need to
> >
> > ***Legal Disclaimer***
> > "This communication may contain confidential and privileged material for
> > the
> > sole use of the intended recipient. Any unauthorized review, use or
> > distribution
> > by others is strictly prohibited. If you have received the message by
> > mistake,
> > please advise the sender by reply email and delete the message. Thank
> you."
> > **
> >
>


-- 
With Thanks & Regards
Karthik Ramachandran


Re: Highlighting is not working with docValues only String field

2018-08-11 Thread Erick Erickson

I can see why it wouldn't and also why it could/should. I also wonder about
SortableTextField, perhaps mention that too.

Seems worth a JIRA to me if there isn't one already

On Fri, Aug 10, 2018, 19:49 Karthik Ramachandran <
kramachand...@commvault.com> wrote:

> We are using Solr 7.2.1, highlighting is not working with docValues only
> String field.
>
> Should I open a JIRA for this?
>
> Schema:
> 
>   id
>   
>required="true"/>
>stored="true"/>
>stored="false"/>
>   
> 
>
> Data:
> [{"id":1,"name":"Testing line 1"},{"id":2,"name":"Testing line
> 2"},{"id":3,"name":"Testing line 3"}]
>
> Query:
>
> http://localhost:8983/solr/test/select?q=Testing*=name=true=name,name1
>
> Response:
> {"response":{"numFound":3,"start":0,"docs":[{"id":"1","name":"Testing line
> 1","name1":"Testing line 1"},{"id":"2","name":"Testing line
> 2","name1":"Testing line 2"},{"id":"3","name":"Testing line
> 3","name1":"Testing line 3"}]},"highlighting":{"1":{"name":["Testing
> line 1"]},"2":{"name":["Testing line
> 2"]},"3":{"name":["Testing line 3"]}}}
>
>
> With Thanks & Regards
> Karthik Ramachandran
>

Highlighting is not working with docValues only String field

2018-08-10 Thread Karthik Ramachandran

We are using Solr 7.2.1, highlighting is not working with docValues only String 
field.

Should I open a JIRA for this?

Schema:

  id
  
  
  
  
  


Data:
[{"id":1,"name":"Testing line 1"},{"id":2,"name":"Testing line 
2"},{"id":3,"name":"Testing line 3"}]

Query:
http://localhost:8983/solr/test/select?q=Testing*=name=true=name,name1

Response:
{"response":{"numFound":3,"start":0,"docs":[{"id":"1","name":"Testing line 
1","name1":"Testing line 1"},{"id":"2","name":"Testing line 2","name1":"Testing 
line 2"},{"id":"3","name":"Testing line 3","name1":"Testing line 
3"}]},"highlighting":{"1":{"name":["Testing line 
1"]},"2":{"name":["Testing line 2"]},"3":{"name":["Testing 
line 3"]}}}


With Thanks & Regards
Karthik Ramachandran

Re: truncate string field type

2018-07-10 Thread Zahra Aminolroaya

Suppose I want to search for "l(i|a)*on k(i|e)*ng". There is a space between
the two words. I want Solr to retrieve only exact matches in which these two
words, or their variants, are adjacent. If I use a text field type, each of
these words is treated as a separate token, so Solr may bring back other
results too. However, we have strict customers who need only exact matches,
if any are available, and nothing more!
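
To make the difference concrete, here is a small Python sketch (not Solr code)
that contrasts matching the pattern against the whole untokenized value with
matching it against individual tokens. Lucene regular expressions are matched
against a whole term, which re.fullmatch models here; the whitespace split is
only a stand-in for a real tokenizer, and the sample values are made up.

```python
import re

PATTERN = re.compile(r"l(i|a)*on k(i|e)*ng")


def matches_whole_value(value):
    # String-like field: the entire value is a single term.
    return PATTERN.fullmatch(value) is not None


def matches_any_token(value):
    # Text-like field: each whitespace-separated token is its own term,
    # so a pattern containing a space can never match a single term.
    return any(PATTERN.fullmatch(token) for token in value.split())


print(matches_whole_value("lion king"))      # whole value matches
print(matches_whole_value("the lion king"))  # extra text, no exact match
print(matches_any_token("lion king"))        # no single token matches
```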



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: truncate string field type

2018-07-10 Thread Alexandre Rafalovitch

Are you sure Solr is the right tool for you? Regexp searches are really the
last-resort approach in this domain.

I suggest you rethink your actual business case (share it here) so that you
can benefit from tokenization, or check whether other tools are a better fit.

As it is, you are using a drill to hammer nails.

Regards,
Alex

On Tue, Jul 10, 2018, 2:44 AM Zahra Aminolroaya, 
wrote:

> Thanks Alexandre and Erick. Erick, I want to search a field with my regular
> expression, and a Solr text field tokenizes the document, so the regular
> expression result will not be valid. I do not want Solr to tokenize my doc,
> even though I will lose some terms by using the Solr string type.
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>

Re: truncate string field type

2018-07-10 Thread Zahra Aminolroaya

Thanks Alexandre and Erick. Erick, I want to search a field with my regular
expression, and a Solr text field tokenizes the document, so the regular
expression result will not be valid. I do not want Solr to tokenize my doc,
even though I will lose some terms by using the Solr string type.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: truncate string field type

2018-07-08 Thread Erick Erickson

Why do you want to add such long strings to your index in the first
place? They are almost useless for search; you want a tokenized field
(text_general is a good place to start) if you want to search for
words within the string.

"The number of bytes limit" is 32K or so, right? What do you want to
do with the data going in there?

There may be good reasons, but I've seen confusion around strings in the past.

Best,
Erick

On Sat, Jul 7, 2018 at 11:12 PM, Alexandre Rafalovitch
 wrote:
> Did you look into UpdateRequestProcessors?
>
> There is a truncate one there.
>
> Regards,
> Alex
>
> On Sun, Jul 8, 2018, 12:44 AM Zahra Aminolroaya, 
> wrote:
>
>> I want to truncate my string field type due to its number of bytes limit. I
>> wrote the following in my schema:
>>
>>
>> 
>>   
>>   
>>   > prefixLength="32700"/>
>>
>>
>>   
>>   > prefixLength="32700"/>
>>
>> 
>>
>> However, I found that StrField (string) does not support specifying an
>> analyzer. Besides, prefixLength in TruncateTokenFilterFactory could not be
>> more than 1000.
>>
>> I want to have the same application of string. Do you think it is
>> reasonable
>> to use  "text_general" field type with solr.KeywordTokenizerFactory filter
>> to have the same application? Do I lose any feature?
>>
>> If I use text_general, it is not needed to truncate.
>>
>>
>>
>>
>>
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>>

Re: truncate string field type

2018-07-08 Thread Alexandre Rafalovitch

Did you look into UpdateRequestProcessors?

There is a truncate one there.
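
If truncating before the documents reach Solr is an option, the same idea can
be sketched on the client side in Python. This is not the update processor
mentioned above. The 32766-byte default is an assumption (the thread only says
the limit is about 32K), and truncate_utf8 is a hypothetical helper.

```python
def truncate_utf8(value, max_bytes=32766):
    """Cut value so that its UTF-8 encoding fits into max_bytes."""
    encoded = value.encode("utf-8")
    if len(encoded) <= max_bytes:
        return value
    # errors="ignore" drops a multi-byte character that the cut split in half.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


long_value = "a" * 40000
print(len(truncate_utf8(long_value).encode("utf-8")))
```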

Regards,
Alex

On Sun, Jul 8, 2018, 12:44 AM Zahra Aminolroaya, 
wrote:

> I want to truncate my string field type due to its number of bytes limit. I
> wrote the following in my schema:
>
>
> 
>   
>   
>prefixLength="32700"/>
>
>
>   
>prefixLength="32700"/>
>
> 
>
> However, I found that StrField (string) does not support specifying an
> analyzer. Besides, prefixLength in TruncateTokenFilterFactory could not be
> more than 1000.
>
> I want to have the same application of string. Do you think it is
> reasonable
> to use  "text_general" field type with solr.KeywordTokenizerFactory filter
> to have the same application? Do I lose any feature?
>
> If I use text_general, it is not needed to truncate.
>
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>

truncate string field type

2018-07-07 Thread Zahra Aminolroaya

I want to truncate the values of my string field because of its byte-length
limit. I wrote the following in my schema:



  
  
  
   
   
  
  
   


However, I found that StrField (string) does not support specifying an
analyzer. Besides, prefixLength in TruncateTokenFilterFactory cannot be
more than 1000.

I want the same behavior as string. Do you think it is reasonable to use the
"text_general" field type with solr.KeywordTokenizerFactory as the tokenizer
to get that behavior? Would I lose any feature?

If I use text_general, there is no need to truncate.





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Solr 7.x: Issues with unique()/hll() function on a string field nested in a range facet

2017-11-21 Thread Yonik Seeley

I opened https://issues.apache.org/jira/browse/SOLR-11664 to track this.
I should be able to look into this shortly if no one else does.

-Yonik


On Tue, Nov 21, 2017 at 6:02 PM, Yonik Seeley <ysee...@gmail.com> wrote:
> Thanks for the complete info that allowed me to easily reproduce this!
> The bug seems to extend beyond hll/unique... I tried min(string_s) and
> got wonky results as well.
>
> -Yonik
>
>
> On Tue, Nov 21, 2017 at 7:47 AM, Volodymyr Rudniev <vmrudn...@gmail.com> 
> wrote:
>> Hello,
>>
>> I've encountered 2 issues while trying to apply unique()/hll() function to a
>> string field inside a range facet:
>>
>>    1. Results are incorrect for a single-valued string field.
>>    2. I’m getting ArrayIndexOutOfBoundsException for a multi-valued string field.
>>
>>
>> How to reproduce:
>>
>>    1. Create a core based on the default configSet.
>>    2. Add several simple documents to the core, like these:
>>
>> [
>>   {
>> "id": "14790",
>> "int_i": 2010,
>> "date_dt": "2010-01-01T00:00:00Z",
>> "string_s": "a",
>> "string_ss": ["a", "b"]
>>   },
>>   {
>> "id": "12254",
>> "int_i": 2014,
>> "date_dt": "2014-01-01T00:00:00Z",
>> "string_s": "e",
>> "string_ss": ["b", "c"]
>>   },
>>   {
>> "id": "12937",
>> "int_i": 2008,
>> "date_dt": "2008-01-01T00:00:00Z",
>> "string_s": "c",
>> "string_ss": ["c", "d"]
>>   },
>>   {
>> "id": "10575",
>> "int_i": 2008,
>> "date_dt": "2008-01-01T00:00:00Z",
>> "string_s": "b",
>> "string_ss": ["d", "e"]
>>   },
>>   {
>> "id": "13644",
>> "int_i": 2014,
>> "date_dt": "2014-01-01T00:00:00Z",
>> "string_s": "e",
>> "string_ss": ["e", "a"]
>>   },
>>   {
>> "id": "8405",
>> "int_i": 2014,
>> "date_dt": "2014-01-01T00:00:00Z",
>> "string_s": "d",
>> "string_ss": ["a", "b"]
>>   },
>>   {
>> "id": "6128",
>> "int_i": 2008,
>> "date_dt": "2008-01-01T00:00:00Z",
>> "string_s": "a",
>> "string_ss": ["b", "c"]
>>   },
>>   {
>> "id": "5220",
>> "int_i": 2015,
>> "date_dt": "2015-01-01T00:00:00Z",
>> "string_s": "d",
>> "string_ss": ["c", "d"]
>>   },
>>   {
>> "id": "6850",
>> "int_i": 2012,
>> "date_dt": "2012-01-01T00:00:00Z",
>> "string_s": "b",
>> "string_ss": ["d", "e"]
>>   },
>>   {
>> "id": "5748",
>> "int_i": 2014,
>> "date_dt": "2014-01-01T00:00:00Z",
>> "string_s": "e",
>> "string_ss": ["e", "a"]
>>   }
>> ]
>>
>> 3. Try queries like the following for a single-valued string field:
>>
>> q=*:*=0={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"int_i","gap":1,"missing":false,"start":2008,"end":2016,"type":"range","facet":{"distinct_count":"unique(string_s)"
>>
>> q=*:*=0={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"date_dt","gap":"%2B1YEAR","missing":false,"start":"2008-01-01T00:00:00Z","end":"2016-01-01T00:00:00Z","type":"range","facet":{"distinct_count":"unique(string_s)"
>>
>> Distinct counts returned are incorrect in general. For example, for the set
>> of documents above, the response will contain:
>>
>> {
>> "val": 2010,
>> "count": 1,
>> "distinct_count": 0
>> }
>>
>> and
>>
>> "between": {
>> "count": 10,
>> "distinct_count": 1
>> }
>>
>> (there should be 5 distinct values).
>>
>> Note, the result depends on the order in which the documents are added.
>>
>> 4. Try queries like the following for a multi-valued string field:
>>
>> q=*:*=0={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"int_i","gap":1,"missing":false,"start":2008,"end":2016,"type":"range","facet":{"distinct_count":"unique(string_ss)"
>>
>> q=*:*=0={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"date_dt","gap":"%2B1YEAR","missing":false,"start":"2008-01-01T00:00:00Z","end":"2016-01-01T00:00:00Z","type":"range","facet":{"distinct_count":"unique(string_ss)"
>>
>> I’m getting ArrayIndexOutOfBoundsException for such queries.
>>
>> Note, everything looks Ok for other field types (I tried single- and
>> multi-valued ints, doubles and dates) or when the enclosing facet is a terms
>> facet or there is no enclosing facet at all.
>>
>> I can reproduce these issues both for Solr 7.0.1 and 7.1.0. Solr 6.x and
>> 5.x, as it seems, do not have such issues.
>>
>> Is it a bug? Or, may be, I’ve missed something?
>>
>> Thanks,
>>
>> Volodymyr
>>

Re: Solr 7.x: Issues with unique()/hll() function on a string field nested in a range facet

2017-11-21 Thread Yonik Seeley

Thanks for the complete info that allowed me to easily reproduce this!
The bug seems to extend beyond hll/unique... I tried min(string_s) and
got wonky results as well.

-Yonik


On Tue, Nov 21, 2017 at 7:47 AM, Volodymyr Rudniev <vmrudn...@gmail.com> wrote:
> Hello,
>
> I've encountered 2 issues while trying to apply unique()/hll() function to a
> string field inside a range facet:
>
>    1. Results are incorrect for a single-valued string field.
>    2. I’m getting ArrayIndexOutOfBoundsException for a multi-valued string field.
>
>
> How to reproduce:
>
>    1. Create a core based on the default configSet.
>    2. Add several simple documents to the core, like these:
>
> [
>   {
> "id": "14790",
> "int_i": 2010,
> "date_dt": "2010-01-01T00:00:00Z",
> "string_s": "a",
> "string_ss": ["a", "b"]
>   },
>   {
> "id": "12254",
> "int_i": 2014,
> "date_dt": "2014-01-01T00:00:00Z",
> "string_s": "e",
> "string_ss": ["b", "c"]
>   },
>   {
> "id": "12937",
> "int_i": 2008,
> "date_dt": "2008-01-01T00:00:00Z",
> "string_s": "c",
> "string_ss": ["c", "d"]
>   },
>   {
> "id": "10575",
> "int_i": 2008,
> "date_dt": "2008-01-01T00:00:00Z",
> "string_s": "b",
> "string_ss": ["d", "e"]
>   },
>   {
> "id": "13644",
> "int_i": 2014,
> "date_dt": "2014-01-01T00:00:00Z",
> "string_s": "e",
> "string_ss": ["e", "a"]
>   },
>   {
> "id": "8405",
> "int_i": 2014,
> "date_dt": "2014-01-01T00:00:00Z",
>     "string_s": "d",
> "string_ss": ["a", "b"]
>   },
>   {
> "id": "6128",
> "int_i": 2008,
> "date_dt": "2008-01-01T00:00:00Z",
> "string_s": "a",
> "string_ss": ["b", "c"]
>   },
>   {
> "id": "5220",
> "int_i": 2015,
> "date_dt": "2015-01-01T00:00:00Z",
> "string_s": "d",
> "string_ss": ["c", "d"]
>   },
>   {
> "id": "6850",
> "int_i": 2012,
> "date_dt": "2012-01-01T00:00:00Z",
> "string_s": "b",
> "string_ss": ["d", "e"]
>   },
>   {
> "id": "5748",
> "int_i": 2014,
> "date_dt": "2014-01-01T00:00:00Z",
> "string_s": "e",
> "string_ss": ["e", "a"]
>   }
> ]
>
> 3. Try queries like the following for a single-valued string field:
>
> q=*:*=0={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"int_i","gap":1,"missing":false,"start":2008,"end":2016,"type":"range","facet":{"distinct_count":"unique(string_s)"
>
> q=*:*=0={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"date_dt","gap":"%2B1YEAR","missing":false,"start":"2008-01-01T00:00:00Z","end":"2016-01-01T00:00:00Z","type":"range","facet":{"distinct_count":"unique(string_s)"
>
> Distinct counts returned are incorrect in general. For example, for the set
> of documents above, the response will contain:
>
> {
> "val": 2010,
> "count": 1,
> "distinct_count": 0
> }
>
> and
>
> "between": {
> "count": 10,
> "distinct_count": 1
> }
>
> (there should be 5 distinct values).
>
> Note, the result depends on the order in which the documents are added.
>
> 4. Try queries like the following for a multi-valued string field:
>
> q=*:*=0={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"int_i","gap":1,"missing":false,"start":2008,"end":2016,"type":"range","facet":{"distinct_count":"unique(string_ss)"
>
> q=*:*=0={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"date_dt","gap":"%2B1YEAR","missing":false,"start":"2008-01-01T00:00:00Z","end":"2016-01-01T00:00:00Z","type":"range","facet":{"distinct_count":"unique(string_ss)"
>
> I’m getting ArrayIndexOutOfBoundsException for such queries.
>
> Note, everything looks Ok for other field types (I tried single- and
> multi-valued ints, doubles and dates) or when the enclosing facet is a terms
> facet or there is no enclosing facet at all.
>
> I can reproduce these issues both for Solr 7.0.1 and 7.1.0. Solr 6.x and
> 5.x, as it seems, do not have such issues.
>
> Is it a bug? Or, may be, I’ve missed something?
>
> Thanks,
>
> Volodymyr
>

Solr 7.x: Issues with unique()/hll() function on a string field nested in a range facet

2017-11-21 Thread Volodymyr Rudniev

Hello,

I've encountered 2 issues while trying to apply unique()/hll() function to
a string field inside a range facet:

   1. Results are incorrect for a single-valued string field.
   2. I’m getting ArrayIndexOutOfBoundsException for a multi-valued string
   field.


How to reproduce:

   1. Create a core based on the default configSet.
   2. Add several simple documents to the core, like these:

[
  {"id": "14790", "int_i": 2010, "date_dt": "2010-01-01T00:00:00Z", "string_s": "a", "string_ss": ["a", "b"]},
  {"id": "12254", "int_i": 2014, "date_dt": "2014-01-01T00:00:00Z", "string_s": "e", "string_ss": ["b", "c"]},
  {"id": "12937", "int_i": 2008, "date_dt": "2008-01-01T00:00:00Z", "string_s": "c", "string_ss": ["c", "d"]},
  {"id": "10575", "int_i": 2008, "date_dt": "2008-01-01T00:00:00Z", "string_s": "b", "string_ss": ["d", "e"]},
  {"id": "13644", "int_i": 2014, "date_dt": "2014-01-01T00:00:00Z", "string_s": "e", "string_ss": ["e", "a"]},
  {"id": "8405", "int_i": 2014, "date_dt": "2014-01-01T00:00:00Z", "string_s": "d", "string_ss": ["a", "b"]},
  {"id": "6128", "int_i": 2008, "date_dt": "2008-01-01T00:00:00Z", "string_s": "a", "string_ss": ["b", "c"]},
  {"id": "5220", "int_i": 2015, "date_dt": "2015-01-01T00:00:00Z", "string_s": "d", "string_ss": ["c", "d"]},
  {"id": "6850", "int_i": 2012, "date_dt": "2012-01-01T00:00:00Z", "string_s": "b", "string_ss": ["d", "e"]},
  {"id": "5748", "int_i": 2014, "date_dt": "2014-01-01T00:00:00Z", "string_s": "e", "string_ss": ["e", "a"]}
]

3. Try queries like the following for a single-valued string field:

q=*:*=0={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"int_i","gap":1,"missing":false,"start":2008,"end":2016,"type":"range","facet":{"distinct_count":"unique(string_s)"

q=*:*=0={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"date_dt","gap":"%2B1YEAR","missing":false,"start":"2008-01-01T00:00:00Z","end":"2016-01-01T00:00:00Z","type":"range","facet":{"distinct_count":"unique(string_s)"

Distinct counts returned are incorrect in general. For example, for the set
of documents above, the response will contain:

{
"val": 2010,
"count": 1,
"distinct_count": 0
}

and

"between": {
"count": 10,
"distinct_count": 1
}

(there should be 5 distinct values).
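
As a cross-check of the expected numbers, the distinct counts can be
recomputed outside Solr. The Python sketch below copies int_i and string_s
from the ten documents above and groups them by year. It only models what
unique(string_s) should return per bucket, not how Solr computes it.

```python
from collections import defaultdict

# (int_i, string_s) pairs copied from the sample documents, in document order.
docs = [
    (2010, "a"), (2014, "e"), (2008, "c"), (2008, "b"), (2014, "e"),
    (2014, "d"), (2008, "a"), (2015, "d"), (2012, "b"), (2014, "e"),
]

# Range facet with start=2008, end=2016, gap=1: one bucket per year.
per_year = defaultdict(set)
for year, value in docs:
    if 2008 <= year < 2016:
        per_year[year].add(value)

expected = {year: len(values) for year, values in sorted(per_year.items())}
overall = len(set().union(*per_year.values()))

print(expected)  # {2008: 3, 2010: 1, 2012: 1, 2014: 2, 2015: 1}
print(overall)   # 5, the expected distinct_count for "between"
```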

Note, the result depends on the order in which the documents are added.

4. Try queries like the following for a multi-valued string field:

q=*:*=0={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"int_i","gap":1,"missing":false,"start":2008,"end":2016,"type":"range","facet":{"distinct_count":"unique(string_ss)"

q=*:*=0={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"date_dt","gap":"%2B1YEAR","missing":false,"start":"2008-01-01T00:00:00Z","end":"2016-01-01T00:00:00Z","type":"range","facet":{"distinct_count":"unique(string_ss)"

I’m getting ArrayIndexOutOfBoundsException for such queries.

Note, everything looks Ok for other field types (I tried single- and
multi-valued ints, doubles and dates) or when the enclosing facet is a
terms facet or there is no enclosing facet at all.

I can reproduce these issues both for Solr 7.0.1 and 7.1.0. Solr 6.x and
5.x, as it seems, do not have such issues.

Is it a bug? Or, may be, I’ve missed something?

Thanks,

Volodymyr

Re: Making a String field case-insensitive

2017-11-01 Thread Zheng Lin Edwin Yeo

Hi Emir,

Thanks for your advice. This works.

Regards,
Edwin


On 1 November 2017 at 18:08, Emir Arnautović 
wrote:

> Hi,
> You can use KeywordTokenizer and LowerCaseTokenFilterFactory.
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 1 Nov 2017, at 09:50, Zheng Lin Edwin Yeo 
> wrote:
> >
> > Hi,
> >
> > Would like to find out, what is the best way to lower-case a String index
> > in Solr, to make it case insensitive, while preserving the structure of
> the
> > string (ie It should not break into different tokens at space, and should
> > not remove any characters or symbols)
> >
> > I found that solr.StrField does not use lower case filter. But if I
> change
> > it to solr.TextField and uses Standard Tokenizer, the fields get broken
> up.
> >
> > Eg:
> >
> > For this configuration,
> >
> >  > positionIncrementGap="100" autoGeneratePhraseQueries="false">
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> >   
> >
> > The string "*SYStem 500 **" gets broken down into this
> >
> > *system | 500*
> >
> > The system and 500 are separated into 2 tokens, which is not what we
> want.
> > Also, the * is being removed.
> >
> >
> > We will like to have something like this. This will preserve what it is
> as
> > a string but just lowercase it.
> >
> > *system 500 **
>
>

Re: Making a String field case-insensitive

2017-11-01 Thread Emir Arnautović

Hi,
You can use KeywordTokenizer and LowerCaseTokenFilterFactory.
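
The effect of that combination can be approximated in a few lines of Python.
This is only a model of the analysis chain, not Solr code: the keyword step
keeps the whole input as one token, and the regular expression in
standard_like is a rough stand-in for the Standard Tokenizer.

```python
import re


def keyword_lowercase(value):
    # Keyword tokenizer: the whole value is one token; then lowercase it.
    return [value.lower()]


def standard_like(value):
    # Rough approximation: split on non-word characters, then lowercase.
    return re.findall(r"\w+", value.lower())


print(keyword_lowercase("*SYStem 500 **"))  # ['*system 500 **']
print(standard_like("*SYStem 500 **"))      # ['system', '500']
```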

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 1 Nov 2017, at 09:50, Zheng Lin Edwin Yeo  wrote:
> 
> Hi,
> 
> Would like to find out, what is the best way to lower-case a String index
> in Solr, to make it case insensitive, while preserving the structure of the
> string (ie It should not break into different tokens at space, and should
> not remove any characters or symbols)
> 
> I found that solr.StrField does not use lower case filter. But if I change
> it to solr.TextField and uses Standard Tokenizer, the fields get broken up.
> 
> Eg:
> 
> For this configuration,
> 
>  positionIncrementGap="100" autoGeneratePhraseQueries="false">
> 
> 
> 
> 
> 
> 
> 
> 
>   
> 
> The string "*SYStem 500 **" gets broken down into this
> 
> *system | 500*
> 
> The system and 500 are separated into 2 tokens, which is not what we want.
> Also, the * is being removed.
> 
> 
> We will like to have something like this. This will preserve what it is as
> a string but just lowercase it.
> 
> *system 500 **

Making a String field case-insensitive

2017-11-01 Thread Zheng Lin Edwin Yeo

Hi,

I would like to find out the best way to lower-case a String field in Solr
to make it case-insensitive while preserving the structure of the string
(i.e. it should not be broken into separate tokens at spaces, and no
characters or symbols should be removed).

I found that solr.StrField does not support a lower-case filter. But if I
change it to solr.TextField and use the Standard Tokenizer, the field values
get broken up.

Eg:

For this configuration,










   

The string "*SYStem 500 **" gets broken down into this

*system | 500*

The system and 500 are separated into 2 tokens, which is not what we want.
Also, the * is being removed.


We would like to have something like this, which preserves the string as it
is and just lowercases it.

*system 500 **

Re: AW: AW: FacetField-Result on String-Field contains value with count 0?

2017-01-14 Thread Shawn Heisey

On 1/13/2017 7:36 AM, Sebastian Riemer wrote:
> Thanks, that's actually where I come from. But I don't want to exclude values 
> leading to a count of zero.
>
> Background to this: A user searched for mediaType "book", which gave him 10
> results. Now some other task or routine changes all those 10 books to, say,
> 10 ebooks, because the type was incorrect. The user refreshes, still
> looking for "book", and gets 0 results (which is expected). Because we rule
> out facet.fields with count 0, I don't get back the selected mediaType
> "book", and thus I cannot select this value in the select-dropdown filter
> for the mediaType. This confuses the user: he has no results, but doesn't
> see that this is because the mediaType filter is still set to the value
> "book", which now leads to 0 results.

Some users are always going to be confused in one way or another when
something behaves in a way that's contrary to their expectations.  If
you plan your interface correctly, you can eliminate the biggest sources
of confusion ... but there's an applicable saying here:  You can never
make things idiot-proof.  There's always a better idiot.

The facet.mincount parameter is the way to deal with this problem, as
Bill Bell already mentioned.  One of the reasons that facet.mincount
exists is to remove terms that have no documents, but still exist in the
index.
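
A request with that parameter could be assembled as in this Python sketch.
The host, collection and field name are taken from the URLs quoted in this
thread. The parameters q, rows, wt, facet, facet.field and facet.mincount are
standard Solr parameters, but their exact combination here is an assumption.

```python
from urllib.parse import urlencode

params = {
    "q": "*:*",
    "rows": 0,
    "wt": "json",
    "facet": "on",
    "facet.field": "m_mediaType_s",
    "facet.mincount": 1,  # drop terms that no matching document uses
}

url = "http://localhost:8983/solr/wemi/select?" + urlencode(params)
print(url)
```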

If the q parameter was an actual query instead of "all docs" and the
request didn't have facet.mincount, then the facet for that field would
still have thirteen entries, many of which might be zero.

Thanks,
Shawn

AW: AW: FacetField-Result on String-Field contains value with count 0?

2017-01-13 Thread Sebastian Riemer

Thanks @Toke, for pointing out these options. I'll have a read about
expungeDeletes.

That makes it sound even more like a good idea to let Solr filter out the
0-counts and to handle my use case outside of Solr.

Thanks again,
Sebastian

On Fri, 2017-01-13 at 14:19 +, Sebastian Riemer wrote:
> the second search should have been this:
> http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%221%22=on=*:*=0=0=json
> (or in other words, give me all documents having value "1" for field
> "m_mediaType_s")
> 
> Since this search gives zero results, why is it included in the 
> facet.fields result-count list?

Qualified guess (I don't know the JSON faceting code in detail):
The list of possible facet values is extracted from the DocValues structure in 
the segment files, without respect to documents marked as deleted. At some 
point you had one or more documents with m_mediaType_s:1, which were later 
deleted.

If your index is not too large, you can verify this by optimizing down to 1 
segment, which will remove all traces of deleted documents (unless the index is 
already 1 segment).

If you cannot live with the false terms, committing with expungeDeletes=true 
should do the trick, although it is likely to make your indexing process a lot 
heavier.

The reason for this inaccuracy is that it is quite heavy to verify whether a 
docvalue is referenced by a document: Each time one or more documents in a 
segment are deleted, all references from all documents in that segment would 
have to be checked to create a correct mapping.
As this only affects mincount=0 combined with your use case where _all_
documents with a certain docvalue are deleted, my guess is that it is seen as
too much of an edge case to handle.
--
Toke Eskildsen, Royal Danish Library

AW: FacetField-Result on String-Field contains value with count 0?

2017-01-13 Thread Sebastian Riemer

Nice, thank you very much for your explanation!

>> Solr returns as facet results all values that existed at some time, as
>> long as the documents are still somewhere in the index, even when they're
>> marked as deleted. So there must have been a document with
>> m_mediaType_s=1. Even if all these documents are deleted already, its
>> values still appear in the facet result.

I did not know about that! That makes perfect sense. I am quite sure there was
a time when that field contained the value "1". All the more so because, now
that I have rebuilt my index, the value "1" no longer appears in the
facet.field result.

I'll think about how to deal with my situation then. Maybe it would be better
to keep Solr filtering out 0-count facet values and insert the filter query
leading to 0 results into the select-dropdown "manually".
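
That approach could be sketched as follows in Python. The flat
[value, count, value, count, ...] list mirrors the facet_fields format in the
response quoted in this thread (shortened here); dropdown_entries and its
selected argument are hypothetical names.

```python
def dropdown_entries(flat_counts, selected=None):
    """Build (value, count) pairs for the dropdown from a flat facet list."""
    values = flat_counts[0::2]
    counts = flat_counts[1::2]
    # Drop zero-count values, as facet.mincount=1 would do on the Solr side.
    entries = [(v, c) for v, c in zip(values, counts) if c > 0]
    # Re-add the currently selected filter value so the user can still see it.
    if selected is not None and selected not in [v for v, _ in entries]:
        entries.append((selected, 0))
    return entries


flat = ["2", 25561, "3", 19027, "13", 2, "1", 0]
print(dropdown_entries(flat, selected="1"))
```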

-----Original Message-----
From: Michael Kuhlmann [mailto:k...@solr.info]
Sent: Friday, January 13, 2017 15:43
To: solr-user@lucene.apache.org
Subject: Re: FacetField-Result on String-Field contains value with count 0?

Then I don't understand your problem. Solr already does exactly what you want.

Maybe the problem is different: I assume you believe there never was a value
of "1" in the index, and that this is what confuses you.

Solr returns as facet results all values that existed at some time, as long as
the documents are still somewhere in the index, even when they're marked as
deleted. So there must have been a document with m_mediaType_s=1. Even if all
these documents are deleted already, its values still appear in the facet
result.

This holds true until segments get merged so that all deleted documents are 
pruned. So if you send a forceMerge request, chances are good that "1" won't 
come up any more.

-Michael

Am 13.01.2017 um 15:36 schrieb Sebastian Riemer:
> Hi Bill,
>
> Thanks, that's actually where I come from. But I don't want to exclude values 
> leading to a count of zero.
>
> Background to this: A user searched for mediaType "book" which gave him 10 
> results. Now some other task/routine whatever changes all those 10 books to 
> be say 10 ebooks, because the type has been incorrect. The user makes a 
> refresh, still looking for "book" gets 0 results (which is expected) and 
> because we rule out facet.fields having count 0, I don't get back the 
> selected mediaType "book" and thus I cannot select this value in the 
> select-dropdown-filter for the mediaType. This leads to confusion for the 
> user, since he has no results, but doesn't see that it's because of he still 
> has that mediaType-filter set to a value "books" which now actually leads to 
> 0 results.
>
> -Ursprüngliche Nachricht-
> Von: billnb...@gmail.com [mailto:billnb...@gmail.com]
> Gesendet: Freitag, 13. Januar 2017 15:23
> An: solr-user@lucene.apache.org
> Betreff: Re: AW: FacetField-Result on String-Field contains value with count 
> 0?
>
> Set mincount to 1
>
> Bill Bell
> Sent from mobile
>
>
>> On Jan 13, 2017, at 7:19 AM, Sebastian Riemer <s.rie...@littera.eu> wrote:
>>
>> Pardon me,
>> the second search should have been this: 
>> http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%221%22
>> t =on=*:*=0=0=json (or in other words, give me all 
>> documents having value "1" for field "m_mediaType_s")
>>
>> Since this search gives zero results, why is it included in the facet.fields 
>> result-count list?
>>
>> 
>>
>> Hi,
>>
>> Please help me understand: 
>> http://localhost:8983/solr/wemi/select?facet.field=m_mediaType_s=on=on=*:*=json
>>  returns:
>>
>> "facet_counts":{
>>"facet_queries":{},
>>"facet_fields":{
>>  "m_mediaType_s":[
>>"2",25561,
>>"3",19027,
>>"10",1966,
>>"11",1705,
>>"12",1067,
>>"4",1056,
>>"5",291,
>>"8",68,
>>"13",2,
>>"6",2,
>>"7",1,
>>"9",1,
>>"1",0]},
>>"facet_ranges":{},
>>"facet_intervals":{},
>>"facet_heatmaps":{}}}
>>
>> http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%222%22
>> t
>> =on=*:*=0=0=json
>>
>>
>> ?  "response":{"numFound":25561,"start":0,"docs":[]
>>
>> http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%220%22
>> t
>> =on=*:*=0=0=json
>>
>>
>> ?  "response":{"numFound":0,"start":0,"docs":[]
>>
>> So why does the search for facet.field even contain the value "1", if it 
>> does not exist?
>>
>> And why does it e.g. not contain
>> "SomeReallyCrazyOtherValueWhichLikeValue"1"DoesNotExistButLetsInclude
>> I tInTheFacetFieldsResultListAnywaysWithCountZero" : 0
>>
>> Best regards,
>> Sebastian
>>
>> Additional info, field m_mediaType_s is a string;
>> > stored="true" />
>> > />
>>

Re: AW: FacetField-Result on String-Field contains value with count 0?

2017-01-13 Thread Toke Eskildsen

On Fri, 2017-01-13 at 14:19 +, Sebastian Riemer wrote:
> the second search should have been this:
> http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%221%22&indent=on&q=*:*&rows=0&start=0&wt=json
> (or in other words, give me all documents having value "1" for field
> "m_mediaType_s")
> 
> Since this search gives zero results, why is it included in the
> facet.fields result-count list?

Qualified guess (I don't know the JSON faceting code in detail):
The list of possible facet values is extracted from the DocValues
structure in the segment files, without respect to documents marked as
deleted. At some point you had one or more documents with
m_mediaType_s:1, which were later deleted.

If your index is not too large, you can verify this by optimizing down
to 1 segment, which will remove all traces of deleted documents (unless
the index is already 1 segment).

If you cannot live with the false terms, committing with
expungeDeletes=true should do the trick, although it is likely to make
your indexing process a lot heavier.

The reason for this inaccuracy is that it is quite heavy to verify
whether a docvalue is referenced by a document: Each time one or more
documents in a segment are deleted, all references from all documents
in that segment would have to be checked to create a correct mapping.
As this only affects mincount=0 combined with your use case where
_all_ documents with a certain docvalue are deleted, my guess is that
it is seen as too much of an edge case to handle.
-- 
Toke Eskildsen, Royal Danish Library
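
For reference, the two clean-up options mentioned in this thread (a commit with expungeDeletes=true, or an optimize/forceMerge down to one segment) are normally sent as parameters to the update handler. Below is a minimal sketch in plain Java that only builds the request URLs and does not contact Solr. The class name PurgeDeletedDocs is hypothetical, and the core URL is the one from the question.

```java
public class PurgeDeletedDocs {

    // Core URL taken from the thread; adjust for your own installation.
    static final String CORE_URL = "http://localhost:8983/solr/wemi";

    // Commit that also asks Solr to merge away segments containing deleted documents.
    public static String expungeDeletesUrl(String coreUrl) {
        return coreUrl + "/update?commit=true&expungeDeletes=true";
    }

    // Optimize (force merge) down to the given number of segments.
    public static String optimizeUrl(String coreUrl, int maxSegments) {
        return coreUrl + "/update?optimize=true&maxSegments=" + maxSegments;
    }

    public static void main(String[] args) {
        System.out.println(expungeDeletesUrl(CORE_URL));
        System.out.println(optimizeUrl(CORE_URL, 1));
    }
}
```

Both requests rewrite index segments, so, as noted above, they are best reserved for occasional maintenance rather than issued with every commit.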

Re: FacetField-Result on String-Field contains value with count 0?

2017-01-13 Thread Michael Kuhlmann

Then I don't understand your problem. Solr already does exactly what you
want.

Maybe the problem is different: I assume you believe there never was a value
of "1" in the index, and that is what leads to your confusion.

Solr returns all values as facet results where there was some value at
some time, as long as the documents are somewhere in the index, even
when they're marked as deleted. So there must have been a document with
m_mediaType_s=1. Even if all these documents are deleted already, its
values still appear in the facet result.

This holds true until segments get merged so that all deleted documents
are pruned. So if you send a forceMerge request, chances are good that
"1" won't come up any more.

-Michael


AW: AW: FacetField-Result on String-Field contains value with count 0?

2017-01-13 Thread Sebastian Riemer

Hi Bill,

Thanks, that's actually where I'm coming from. But I don't want to exclude 
values leading to a count of zero.

Background: A user searched for mediaType "book", which gave him 10 results. 
Now some other task or routine changes all 10 of those books to, say, ebooks, 
because the type was incorrect. The user refreshes, still looking for "book", 
and gets 0 results (which is expected). Because we rule out facet values with 
count 0, I don't get back the selected mediaType "book" and thus cannot select 
this value in the select-dropdown filter for the mediaType. This confuses the 
user: he has no results, but doesn't see that it's because he still has the 
mediaType filter set to the value "book", which now leads to 0 results.


Re: AW: FacetField-Result on String-Field contains value with count 0?

2017-01-13 Thread billnbell

Set facet.mincount to 1

Bill Bell
Sent from mobile



AW: FacetField-Result on String-Field contains value with count 0?

2017-01-13 Thread Sebastian Riemer

Pardon me, 
the second search should have been this: 
http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%221%22&indent=on&q=*:*&rows=0&start=0&wt=json
 
(or in other words, give me all documents having value "1" for field 
"m_mediaType_s")

Since this search gives zero results, why is it included in the facet.fields 
result-count list?




FacetField-Result on String-Field contains value with count 0?

2017-01-13 Thread Sebastian Riemer

Hi,

Please help me understand: 
http://localhost:8983/solr/wemi/select?facet.field=m_mediaType_s&facet=on&indent=on&q=*:*&wt=json
 returns:

"facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "m_mediaType_s":[
        "2",25561,
        "3",19027,
        "10",1966,
        "11",1705,
        "12",1067,
        "4",1056,
        "5",291,
        "8",68,
        "13",2,
        "6",2,
        "7",1,
        "9",1,
        "1",0]},
    "facet_ranges":{},
    "facet_intervals":{},
    "facet_heatmaps":{}}}

http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%222%22&indent=on&q=*:*&rows=0&start=0&wt=json

=>  "response":{"numFound":25561,"start":0,"docs":[]

http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%220%22&indent=on&q=*:*&rows=0&start=0&wt=json

=>  "response":{"numFound":0,"start":0,"docs":[]

So why does the search for facet.field even contain the value "1", if it does 
not exist?

And why does it e.g. not contain 
"SomeReallyCrazyOtherValueWhichLikeValue"1"DoesNotExistButLetsIncludeItInTheFacetFieldsResultListAnywaysWithCountZero"
 : 0

Best regards,
Sebastian

Additional info, field m_mediaType_s is a string;

Index Size in String Field vs Text Field

2016-09-20 Thread Zheng Lin Edwin Yeo

Hi,

I would like to check: will the index size for fields defined as String
generally be smaller than for fields defined as a Text Field (e.g. using
KeywordTokenizerFactory)?

This assumes that both fields contain the same values and that there are no
additional filters after KeywordTokenizerFactory.

I'm using Solr 6.2.0

Regards,
Edwin

Re: Sub faceting on string field using json facet runs extremly slow

2016-05-19 Thread Vijay Tiwary

Can somebody confirm whether the Jira issue SOLR-8096 also affects JSON facets?
I see that sub-faceting with a terms facet on a string field is running 5x
slower than on an integer field for the same number of hits and unique terms.
On 17-May-2016 3:33 pm, "Vijay Tiwary" <vijaykr.tiw...@gmail.com> wrote:

> Below is the request
>
> q=*:*&rows=0&start=0&json.facet={
>   "customer_id": {
>     "type": "terms",
>     "limit": -1,
>     "field": "cid_ti",
>     "mincount": 1,
>     "facet": {
>       "contact_s": {
>         "type": "terms",
>         "limit": 1,
>         "field": "contact_s",
>         "mincount": 1
>       }
>     }
>   }
> }&fq=age_td:[25 TO 50]
>
> On 17-May-2016 2:20 pm, "chandan khatri" <chandankhat...@gmail.com> wrote:
>
>> Can you please share the query for sub faceting?
>>
>> On Tue, May 17, 2016 at 2:13 PM, Vijay Tiwary <vijaykr.tiw...@gmail.com>
>> wrote:
>>
>>> Hello all,
>>> I have an index of 8 shards with 1 replica each, distributed across an
>>> 8-node SolrCloud cluster. The index is 300 GB in size and holds 30 million
>>> documents. Solr JSON faceting runs extremely slowly when I sub-facet on a
>>> string field, even if numFound is only around 2 (I am also not returning
>>> any rows, i.e. rows=0).
>>> Is there any way to improve the performance?
>>>
>>> Thanks,
>>> Vijay
>>>
>>
>>

Sub faceting on string field using json facet runs extremly slow

2016-05-17 Thread Vijay Tiwary

Hello all,
I have an index of 8 shards with 1 replica each, distributed across an 8-node
SolrCloud cluster. The index is 300 GB in size and holds 30 million documents.
Solr JSON faceting runs extremely slowly when I sub-facet on a string field,
even if numFound is only around 2 (I am also not returning any rows, i.e.
rows=0).
Is there any way to improve the performance?

Thanks,
Vijay

Solr541 Carriage Return Stripped Off In String Field ?

2016-02-02 Thread Kosila Yuichiro

Hello.
I have a question regarding "string" type fields.

[ Symptom ]
When a string value that includes a carriage return and line feed (\r\n)
is passed to a string field, it is stored. However, when I query that
document and look at the value of the field, the carriage return has been
stripped off.

[ Question ]
Is this the expected behavior?

[ Environment ]
Apache Solr 5.4.1
Document added via its SolrJ

[ How To Reproduce ]

(1)  Download Apache Solr 5.4.1
(2)  Create a core , "test"

(3)  Prepare two fields,  "id" and "field20"
 Assign the following attributes to those fields ;
   -  type="string"  indexed="true"  stored="true"  required="true" 
multiValued="false"

(4)  Start up the Solr and from AdminGUI,
make sure that everything is working and no error coming up,
and confirm that the defined two fields are available.

(5)  Make a tiny test program using SolrJ,
 to test a document insert, and to query against it.
 Jar files used ;
- apache-solr-solrj-5.4.0.jar
- apache-solr-core-5.4.0.jar
- commons-codec-1.9.jar
- httpclient-4.5.1.jar
- commons-io-2.4.jar
- slf4j-api-1.7.13.jar
- jcl-over-slf4j-1.7.14.jar
- slf4j-jdk14-1.7.14.jar

(6)  Insert a document where the value of field20 is given as "ABC\r\nDEF"
(7)  When I query that document, from both AdminGUI and SolrJ,
 I see the value retrieved as "ABC\nDEF", where "\r" is stripped off.


[ Test Code ]

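package solrtest ;

import java.io.IOException ;
import java.util.ArrayList ;
import java.util.Collection ;
import java.util.Iterator ;
import java.util.ListIterator ;
import java.util.Map ;

import org.apache.solr.client.solrj.SolrQuery ;
import org.apache.solr.client.solrj.SolrServerException ;
import org.apache.solr.client.solrj.impl.HttpSolrServer ;
import org.apache.solr.client.solrj.impl.XMLResponseParser ;
import org.apache.solr.client.solrj.response.QueryResponse ;
import org.apache.solr.common.SolrDocument ;
import org.apache.solr.common.SolrDocumentList ;
import org.apache.solr.common.SolrInputDocument ;

public class SolrTest {

  public static void main(String[] args) throws IOException, SolrServerException {

    // HttpSolrServer is deprecated in SolrJ 5.x but still available
    String url = "http://localhost:8983/solr/test" ;
    HttpSolrServer server = new HttpSolrServer(url) ;
    // Responses are requested and parsed as XML
    server.setParser(new XMLResponseParser()) ;

    // Print the codes of "\r" and "\n" before indexing (13 , 10)
    String mydata = "ABC\r\nDEF" ;
    byte[] asciiCodes = mydata.getBytes("US-ASCII") ;
    System.out.println (asciiCodes[3] + " , " + asciiCodes[4]) ;

    // Index one document whose field20 contains the CR LF pair
    SolrInputDocument mydoc = new SolrInputDocument() ;
    mydoc.addField ( "id"  , "98765" , 1.0f ) ;
    mydoc.addField ( "field20" , mydata  , 1.0f ) ;

    Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>() ;
    docs.add ( mydoc ) ;
    server.add ( docs ) ;
    server.commit () ;

    // Read the document back
    SolrQuery myquery = new SolrQuery() ;
    myquery.setQuery ( "id:98765" ) ;
    QueryResponse rsp = server.query(myquery) ;
    SolrDocumentList hits = rsp.getResults() ;

    String target = "" ;
    int pos = 0 ;
    while ( pos < hits.getNumFound() ) {

      ListIterator<SolrDocument> docloop = hits.listIterator() ;

      while ( docloop.hasNext() ) {
        pos++ ;

        SolrDocument hitdoc = docloop.next() ;
        Map<String, Collection<Object>> fieldvalues = hitdoc.getFieldValuesMap() ;
        Iterator<String> fieldnames = hitdoc.getFieldNames().iterator() ;

        while ( fieldnames.hasNext() ) {

          String fieldname = fieldnames.next() ;

          Collection<Object> cellvalues = fieldvalues.get(fieldname) ;
          Iterator<Object> valueloop = cellvalues.iterator() ;

          while ( valueloop.hasNext() ) {
            Object cellobj = valueloop.next() ;
            String cellvalue = cellobj.toString() ;

            // Remember the value of field20 for the byte dump below
            if ( fieldname.equals("field20") ) {
              target = cellvalue ;
            }

          }
        }
      }
    }

    // Dump the retrieved value byte by byte to see whether 13 (CR) survived
    asciiCodes = target.getBytes("US-ASCII") ;
    for ( int i=0 ; i < target.length() ; i++ ) {
      System.out.print ( asciiCodes[i] + " " ) ;
    }
    System.out.println ("\r\n") ;

    server.close() ;

  }
}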

--

Thank you in advance.
Yuichiro Kosila , Tokyo/Japan
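
One possible explanation, not confirmed anywhere in this thread, is XML end-of-line handling: the test program above reads responses through XMLResponseParser, and any conforming XML parser normalizes a literal "\r\n" in the document to a single "\n" unless the carriage return is written as the character reference &#13;. The sketch below demonstrates only that parser behavior using the JDK's built-in XML parser. It does not talk to Solr, and the class name XmlLineEndDemo is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class XmlLineEndDemo {

    // Parses a small XML string and returns the text content of its root element.
    public static String parseText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)))
                          .getDocumentElement()
                          .getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Literal CR LF inside the XML text: the parser normalizes it to LF.
        String raw = parseText("<str>ABC\r\nDEF</str>");
        // CR written as a character reference: it survives parsing.
        String escaped = parseText("<str>ABC&#13;\nDEF</str>");

        System.out.println(raw.contains("\r"));      // prints false
        System.out.println(escaped.contains("\r"));  // prints true
    }
}
```

If this is indeed what happens here, comparing the raw response bytes with a non-XML response format (for example wt=json) would show whether the "\r" is lost at index time or only when the XML response is parsed.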

RE: How to convert string field to date

2016-01-29 Thread Kallu, Sreenivasa (HQP)

Thanks steve. Workaround 2 is working fine.

Thanks again.
--sreenivasa kallu


How to convert string field to date

2016-01-28 Thread Kallu, Sreenivasa (HQP)

Hi,
   I am new to Solr.

I am using managed-schema, not schema.xml. I am indexing Outlook email 
messages.
I can see only three fields (id, _version_, _text_) defined in 
managed-schema. The remaining fields are handled by the following dynamic 
field:
<dynamicField name="attr_*" ... stored="true" multiValued="true"/>

I have a field named attr_date with type string. I want to convert this field 
to type date. Currently, date range queries do not work on this field. I tried 
the Schema API to add a new field attr_date and got the following error message:
"Field 'attr_date' already exists".  I tried to replace the field type with 
date and got the following error message:
"The field 'attr_date' is not present in this schema, and so cannot be 
replaced".

Please help me convert the "attr_date" field type to date.

Thanks in advance.
--sreenivasa kallu

Re: How to convert string field to date

2016-01-28 Thread Steve Rowe

Hi Sreenivasa,

This is a known bug: https://issues.apache.org/jira/browse/SOLR-8607

(though the problem is not just about catch-all fields as the issue currently 
indicates - all dynamic fields are affected)

Two workarounds (neither tested):

1. Add attr_date via add-dynamic-field instead of add-field (even though the 
name has no asterisk)
2. Remove the attr_* dynamic field, add attr_date, then add attr_* back; these 
can be done with a single request.

I’ll update SOLR-8607 to reflect these things.
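
Workaround 2 can be expressed as three Schema API commands sent in one POST body (delete-dynamic-field, add-field, add-dynamic-field), which Solr executes in order. The sketch below only builds that JSON body in plain Java. The class name SchemaWorkaround is hypothetical, and the field types used in main ("tdate" for the new field, "text_general" for the attr_* dynamic field) are assumptions, since the thread does not show the actual type names.

```java
public class SchemaWorkaround {

    // Builds the body for one POST to the schema endpoint, e.g. /solr/<core>/schema.
    // Order matters: remove the dynamic field, add the explicit field, restore the dynamic field.
    public static String buildCommands(String dynamicName, String dynamicType,
                                       String fieldName, String fieldType) {
        return "{\n"
            + "  \"delete-dynamic-field\": {\"name\": \"" + dynamicName + "\"},\n"
            + "  \"add-field\": {\"name\": \"" + fieldName + "\", \"type\": \"" + fieldType
            + "\", \"stored\": true},\n"
            + "  \"add-dynamic-field\": {\"name\": \"" + dynamicName + "\", \"type\": \"" + dynamicType
            + "\", \"stored\": true, \"multiValued\": true}\n"
            + "}";
    }

    public static void main(String[] args) {
        // Type names here are assumptions; use the ones defined in your own schema.
        System.out.println(buildCommands("attr_*", "text_general", "attr_date", "tdate"));
    }
}
```

The printed body would be posted with Content-Type application/json. Documents that were indexed while attr_date was still a string need to be reindexed before date range queries work on them.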

--
Steve
www.lucidworks.com


RE: How to convert string field to date

2016-01-28 Thread Kallu, Sreenivasa (HQP)

Thanks, Steve, for the prompt response.

I tried workaround one, i.e.:
1. Add attr_date via add-dynamic-field instead of add-field (even though 
the name has no asterisk)

I am able to add the dynamic field attr_date. But when starting Solr, I get 
the following message:
Could not load conf for core sreenimsg: Dynamic field name 'attr_date' should 
have either a leading or a trailing asterisk, and no others.

So Solr is looking for either a leading * or a trailing * in the dynamic field 
name.

I can see similar problems with workaround 2.

Any other suggestions?

Thanks in advance.
--sreenivasa kallu


Re: How to convert string field to date

2016-01-28 Thread Steve Rowe

Try workaround 2; I did, and it worked for me.  See my comment on the issue: 
<https://issues.apache.org/jira/browse/SOLR-8607?focusedCommentId=15122751&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15122751>

--
Steve
www.lucidworks.com


Re: How to perform phonetic matching/query for multivalued string field

2015-09-16 Thread Upayavira

That is, use a TextField plus a KeywordTokenizerFactory, rather than a
string field (solr.StrField)

On Wed, Sep 16, 2015, at 09:03 PM, Upayavira wrote:
> If you want to analyse a string field, use the KeywordTokenizer - it
> just passes the whole field through as a single token.
> 
> Does that get you there?
> 
> On Wed, Sep 16, 2015, at 08:52 PM, Jie Gao wrote:
> > I understand that i can configure "solr.PhoneticFilterFactory" for both
> > indexing and query time for "solr.TextField". However, i want to query a
> > list of term (indexed and stored) from a field ordered by phonetic
> > similarity, which can be easily done by most of relational database.
> > 
> > Term Component allows me to perform exactly matching and regex based
> > fuzzy
> > matching from multi-valued field. However, the solr string field does not
> > allow to customise the default analyser. Is there any other way to
> > circumvent the problem?
> > 
> > thanks,
> > Jerry
> > 
> > 
> > 
> > On 16 September 2015 at 19:55, Upayavira <u...@odoko.co.uk> wrote:
> > 
> > >
> > >
> > > On Wed, Sep 16, 2015, at 06:37 PM, Jie Gao wrote:
> > > > Hi,
> > > >
> > > >
> > > > I want to query a list of terms indexed and stored in multivalued string
> > > > field via Term Component. The term component can support exact matching
> > > > and
> > > > regex based fuzzy matching. However, Is any way i can configure scheme 
> > > > to
> > > > do phonetic matching/query?
> > >
> > > Phonetic matching is done at index time - that is - you use a
> > > PhoneticFilterFactory in your analysis chain, such that you are doing
> > > exact match lookups on the phonetic terms.
> > >
> > > Make sense?
> > >
> > > Upayavira
> > >

Re: How to perform phonetic matching/query for multivalued string field

2015-09-16 Thread Upayavira

If you want to analyse a string field, use the KeywordTokenizer - it
just passes the whole field through as a single token.

Does that get you there?

On Wed, Sep 16, 2015, at 08:52 PM, Jie Gao wrote:
> I understand that i can configure "solr.PhoneticFilterFactory" for both
> indexing and query time for "solr.TextField". However, i want to query a
> list of term (indexed and stored) from a field ordered by phonetic
> similarity, which can be easily done by most of relational database.
> 
> Term Component allows me to perform exactly matching and regex based
> fuzzy
> matching from multi-valued field. However, the solr string field does not
> allow to customise the default analyser. Is there any other way to
> circumvent the problem?
> 
> thanks,
> Jerry
> 
> 
> 
> On 16 September 2015 at 19:55, Upayavira <u...@odoko.co.uk> wrote:
> 
> >
> >
> > On Wed, Sep 16, 2015, at 06:37 PM, Jie Gao wrote:
> > > Hi,
> > >
> > >
> > > I want to query a list of terms indexed and stored in multivalued string
> > > field via Term Component. The term component can support exact matching
> > > and
> > > regex based fuzzy matching. However, Is any way i can configure scheme to
> > > do phonetic matching/query?
> >
> > Phonetic matching is done at index time - that is - you use a
> > PhoneticFilterFactory in your analysis chain, such that you are doing
> > exact match lookups on the phonetic terms.
> >
> > Make sense?
> >
> > Upayavira
> >

Re: How to perform phonetic matching/query for multivalued string field

2015-09-16 Thread Jie Gao

I understand that I can configure "solr.PhoneticFilterFactory" for both
index and query time on a "solr.TextField". However, I want to query a
list of terms (indexed and stored) from a field, ordered by phonetic
similarity, which is easily done in most relational databases.

The Terms Component lets me do exact matching and regex-based fuzzy
matching against a multi-valued field. However, the Solr string field does
not allow customising the default analyser. Is there any other way to
work around the problem?

thanks,
Jerry

On 16 September 2015 at 19:55, Upayavira <u...@odoko.co.uk> wrote:

>
>
> On Wed, Sep 16, 2015, at 06:37 PM, Jie Gao wrote:
> > Hi,
> >
> >
> > I want to query a list of terms indexed and stored in multivalued string
> > field via Term Component. The term component can support exact matching
> > and
> > regex based fuzzy matching. However, Is any way i can configure scheme to
> > do phonetic matching/query?
>
> Phonetic matching is done at index time - that is - you use a
> PhoneticFilterFactory in your analysis chain, such that you are doing
> exact match lookups on the phonetic terms.
>
> Make sense?
>
> Upayavira
>

Re: How to perform phonetic matching/query for multivalued string field

2015-09-16 Thread Upayavira

On Wed, Sep 16, 2015, at 06:37 PM, Jie Gao wrote:
> Hi,
> 
> 
> I want to query a list of terms indexed and stored in multivalued string
> field via Term Component. The term component can support exact matching
> and
> regex based fuzzy matching. However, Is any way i can configure scheme to
> do phonetic matching/query?

Phonetic matching is done at index time - that is - you use a
PhoneticFilterFactory in your analysis chain, such that you are doing
exact match lookups on the phonetic terms.

Make sense?

Upayavira

Re: How to perform phonetic matching/query for multivalued string field

2015-09-16 Thread Jie Gao

Many thanks for your suggestion.

It works well for querying the field with phonetic matching and returns a
list of docs tagged with the term.

However, is there any way I can get a list of matched terms? The
phonetic matching does not seem to work with the Terms Component (I'm using
terms.regex to filter).

Jie Gao,
Research Assistant,
Department of Computer Science, The University of Sheffield,
Regent Court, 211 Portobello, S1 4DP, Sheffield, UK

On 16 September 2015 at 21:04, Upayavira <u...@odoko.co.uk> wrote:

> That is, use a TextField plus a KeywordTokenizerFactory, rather than a
> StringField
>
> On Wed, Sep 16, 2015, at 09:03 PM, Upayavira wrote:
> > If you want to analyse a string field, use the KeywordTokenizer - it
> > just passes the whole field through as a single tokenizer.
> >
> > Does that get you there?
> >
> > On Wed, Sep 16, 2015, at 08:52 PM, Jie Gao wrote:
> > > I understand that i can configure "solr.PhoneticFilterFactory" for both
> > > indexing and query time for "solr.TextField". However, i want to query
> a
> > > list of term (indexed and stored) from a field ordered by phonetic
> > > similarity, which can be easily done by most of relational database.
> > >
> > > Term Component allows me to perform exactly matching and regex based
> > > fuzzy
> > > matching from multi-valued field. However, the solr string field does
> not
> > > allow to customise the default analyser. Is there any other way to
> > > circumvent the problem?
> > >
> > > thanks,
> > > Jerry
> > >
> > >
> > >
> > > On 16 September 2015 at 19:55, Upayavira <u...@odoko.co.uk> wrote:
> > >
> > > >
> > > >
> > > > On Wed, Sep 16, 2015, at 06:37 PM, Jie Gao wrote:
> > > > > Hi,
> > > > >
> > > > >
> > > > > I want to query a list of terms indexed and stored in multivalued
> string
> > > > > field via Term Component. The term component can support exact
> matching
> > > > > and
> > > > > regex based fuzzy matching. However, Is any way i can configure
> scheme to
> > > > > do phonetic matching/query?
> > > >
> > > > Phonetic matching is done at index time - that is - you use a
> > > > PhoneticFilterFactory in your analysis chain, such that you are doing
> > > > exact match lookups on the phonetic terms.
> > > >
> > > > Make sense?
> > > >
> > > > Upayavira
> > > >
>

Re: How to perform phonetic matching/query for multivalued string field

2015-09-16 Thread Upayavira

I bet the terms component does not analyse the terms, so you will need
to hand in already analysed phonetic terms. You could use the
http://localhost:8983/solr/YOUR-CORE/analysis/field URL to have Solr
analyse the field for you before passing it back to the term component.

Upayavira
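
A minimal sketch of the suggestion above, assuming the core exposes the field analysis handler at /analysis/field and that it accepts the analysis.fieldname and analysis.fieldvalue parameters. The class name, method name, and the field name used in the usage note are hypothetical, and the two-argument URLEncoder.encode overload taking a Charset needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class AnalysisUrlSketch {
    // Builds a request URL for the field analysis handler of one core.
    // coreUrl is assumed to look like "http://localhost:8983/solr/YOUR-CORE"
    // (no trailing slash); fieldName is the analysed (phonetic) field.
    static String build(String coreUrl, String fieldName, String rawValue) {
        return coreUrl + "/analysis/field"
            + "?analysis.fieldname=" + URLEncoder.encode(fieldName, StandardCharsets.UTF_8)
            + "&analysis.fieldvalue=" + URLEncoder.encode(rawValue, StandardCharsets.UTF_8)
            + "&wt=json";
    }
}
```

Calling AnalysisUrlSketch.build(coreUrl, "name_phonetic", "John Smith") yields a URL whose response should list the tokens produced by the index-time chain; those analysed tokens, not the raw input, are what would then be handed to the Terms Component.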

On Wed, Sep 16, 2015, at 10:03 PM, Jie Gao wrote:
> Many thanks for your suggestion.
> 
> It works well for querying the field with phonetic matching and responses
> a
> list of docs tagged with the term.
> 
> However, is there any way that i can get a list of matched terms ? The
> phonetic matching seems not work with Term Component (i'm using
> terms.regex
> to filter).
> 
> Jie Gao,
> Research Assistant,
> Department of Computer Science, The University of Sheffield,
> Regent Court, 211 Portobello, S1 4DP, Sheffield, UK
> 
> On 16 September 2015 at 21:04, Upayavira <u...@odoko.co.uk> wrote:
> 
> > That is, use a TextField plus a KeywordTokenizerFactory, rather than a
> > StringField
> >
> > On Wed, Sep 16, 2015, at 09:03 PM, Upayavira wrote:
> > > If you want to analyse a string field, use the KeywordTokenizer - it
> > > just passes the whole field through as a single tokenizer.
> > >
> > > Does that get you there?
> > >
> > > On Wed, Sep 16, 2015, at 08:52 PM, Jie Gao wrote:
> > > > I understand that i can configure "solr.PhoneticFilterFactory" for both
> > > > indexing and query time for "solr.TextField". However, i want to query
> > a
> > > > list of term (indexed and stored) from a field ordered by phonetic
> > > > similarity, which can be easily done by most of relational database.
> > > >
> > > > Term Component allows me to perform exactly matching and regex based
> > > > fuzzy
> > > > matching from multi-valued field. However, the solr string field does
> > not
> > > > allow to customise the default analyser. Is there any other way to
> > > > circumvent the problem?
> > > >
> > > > thanks,
> > > > Jerry
> > > >
> > > >
> > > >
> > > > On 16 September 2015 at 19:55, Upayavira <u...@odoko.co.uk> wrote:
> > > >
> > > > >
> > > > >
> > > > > On Wed, Sep 16, 2015, at 06:37 PM, Jie Gao wrote:
> > > > > > Hi,
> > > > > >
> > > > > >
> > > > > > I want to query a list of terms indexed and stored in multivalued
> > string
> > > > > > field via Term Component. The term component can support exact
> > matching
> > > > > > and
> > > > > > regex based fuzzy matching. However, Is any way i can configure
> > scheme to
> > > > > > do phonetic matching/query?
> > > > >
> > > > > Phonetic matching is done at index time - that is - you use a
> > > > > PhoneticFilterFactory in your analysis chain, such that you are doing
> > > > > exact match lookups on the phonetic terms.
> > > > >
> > > > > Make sense?
> > > > >
> > > > > Upayavira
> > > > >
> >

How to perform phonetic matching/query for multivalued string field

2015-09-16 Thread Jie Gao

Hi,


I want to query a list of terms indexed and stored in a multivalued string
field via the Terms Component. The Terms Component supports exact matching and
regex-based fuzzy matching. However, is there any way I can configure the
schema to do phonetic matching/query?

Thanks,
Jerry

Re: SOLRJ Atomic updates of String field

2014-11-12 Thread Anurag Sharma

I understand the question now.
Atomic Update and Optimistic Concurrency are independent in Solr version 5.
I'm not sure about version 4.2; if they are combined in that version, a
_version_ field needs to be passed in every update. The atomic/partial
update will succeed if the version in the request matches that of the indexed
doc; otherwise the response will have HTTP error code 409.

You can try by passing the _version_ of indexed doc during update.
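
As a rough illustration of the document shape being discussed, the sketch below builds an atomic "set" update that also carries a _version_ value. Plain java.util maps stand in for SolrJ's SolrInputDocument so the example stays self-contained; the class and method names are hypothetical, and the unique key field is assumed to be called id.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class AtomicUpdateSketch {
    // Returns a map shaped like an atomic-update document:
    // { id, <field>: { "set": newValue }, _version_ }.
    static Map<String, Object> setField(String id, String field, Object newValue, long version) {
        Map<String, Object> modifier = new LinkedHashMap<>();
        modifier.put("set", newValue);        // "set" replaces the current field value

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", id);                    // assumed unique key field
        doc.put(field, modifier);             // the field carries the modifier map, not the value
        doc.put("_version_", version);        // version read from the indexed doc
        return doc;
    }
}
```

Printing such a map gives text of the form {id=1, description={set=new text}, _version_=42}. The {set=...} value reported in this thread has the same shape as Java's Map.toString() output, which may indicate that the map was stored as a plain field value rather than interpreted as an update instruction.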

It's also good to add a unit test in Solr for partial update which
currently I see missing.

On Wed, Nov 12, 2014 at 1:00 PM, Ahmet Arslan iori...@yahoo.com.invalid
wrote:

 Hi Bbarani,

 Partial update solrJ example can be found in :
 http://find.searchhub.org/document/5b1187abfcfad33f

 Ahmet



 On Tuesday, November 11, 2014 8:51 PM, bbarani bbar...@gmail.com wrote:
 I am using the below code to do partial update (in SOLR 4.2)

 partialUpdate = new HashMapString, Object();
 partialUpdate.put(set,Object);
 doc.setField(description, partialUpdate);
 server.add(docs);
 server.commit();

 I am seeing the below description value with {set =...}, Any idea why this
 is getting added?

 str name=description
 {set=The iPhone 6 Plus features a 5.5-inch retina HD display, the A8 chip
 for faster processing and longer battery life, the M8 motion coprocessor to
 track speed, distance and elevation, and with an 8MP iSight camera, you can
 record 1080p HD Video at 60 FPS!}
 /str



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/SOLRJ-Atomic-updates-of-String-field-tp4168809.html
 Sent from the Solr - User mailing list archive at Nabble.com.

SOLRJ Atomic updates of String field

2014-11-11 Thread bbarani

I am using the below code to do partial update (in SOLR 4.2)

// "Object" below stands for the new field value
Map<String, Object> partialUpdate = new HashMap<String, Object>();
partialUpdate.put("set", Object);
doc.setField("description", partialUpdate);
server.add(docs);
server.commit();

I am seeing the below description value with {set =...}, Any idea why this
is getting added?

<str name="description">
{set=The iPhone 6 Plus features a 5.5-inch retina HD display, the A8 chip
for faster processing and longer battery life, the M8 motion coprocessor to
track speed, distance and elevation, and with an 8MP iSight camera, you can
record 1080p HD Video at 60 FPS!}
</str>



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLRJ-Atomic-updates-of-String-field-tp4168809.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: SOLRJ Atomic updates of String field

2014-11-11 Thread Anurag Sharma

Sorry didn't get what you are trying to achieve and the issue.

On Wed, Nov 12, 2014 at 12:20 AM, bbarani bbar...@gmail.com wrote:

 I am using the below code to do partial update (in SOLR 4.2)

 partialUpdate = new HashMapString, Object();
 partialUpdate.put(set,Object);
 doc.setField(description, partialUpdate);
 server.add(docs);
 server.commit();

 I am seeing the below description value with {set =...}, Any idea why this
 is getting added?

 str name=description
 {set=The iPhone 6 Plus features a 5.5-inch retina HD display, the A8 chip
 for faster processing and longer battery life, the M8 motion coprocessor to
 track speed, distance and elevation, and with an 8MP iSight camera, you can
 record 1080p HD Video at 60 FPS!}
 /str



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/SOLRJ-Atomic-updates-of-String-field-tp4168809.html
 Sent from the Solr - User mailing list archive at Nabble.com.

Re: SOLRJ Atomic updates of String field

2014-11-11 Thread Ahmet Arslan

Hi Bbarani,

Partial update solrJ example can be found in : 
http://find.searchhub.org/document/5b1187abfcfad33f

Ahmet



On Tuesday, November 11, 2014 8:51 PM, bbarani bbar...@gmail.com wrote:
I am using the below code to do partial update (in SOLR 4.2)

partialUpdate = new HashMapString, Object();
partialUpdate.put(set,Object);
doc.setField(description, partialUpdate);
server.add(docs);
server.commit();

I am seeing the below description value with {set =...}, Any idea why this
is getting added?

str name=description
{set=The iPhone 6 Plus features a 5.5-inch retina HD display, the A8 chip
for faster processing and longer battery life, the M8 motion coprocessor to
track speed, distance and elevation, and with an 8MP iSight camera, you can
record 1080p HD Video at 60 FPS!}
/str



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLRJ-Atomic-updates-of-String-field-tp4168809.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Exact match on string field with special characters

2014-10-06 Thread tedsolr

I may have provided too much background story for my question. What I am
trying to do at the core here, is an exact match on a single field. I do
this programmatically by reading the field value from the facet query and
setting it equal to the field name for a subsequent search.

if this is a sample facet query result ... (Field1 is defined as a string)
[Field1:[HI! THIS IS A VALUE FOR \FIELD1\ (100)]

Then I need to run a search for that exact value. The problem is the double
quotes and slashes when I try to construct the facet query ...
String fq = "Field1:" + "\"" + value + "\"";

The quotes play havoc with the concatenation, as do backslashes. I was
wondering if there's a way to build the search without having to manually
construct it in code. The only thing I can come up with is to transform the
field data at index time by replacing double quotes and backslashes. I don't
strip special chars because I'm using the facet values for display. This
problem may be specific to SolrJ. Thanks!




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Exact-match-on-string-field-with-special-characters-tp4162209p4162907.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Exact match on string field with special characters

2014-10-06 Thread tedsolr

Shoot I just noticed the error in my original post which would certainly
cause confusion.

Instead of
query.addFacetField(fq); 

I meant to write
query.setParam("fq", fg);

Sorry.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Exact-match-on-string-field-with-special-characters-tp4162209p4162908.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: Exact match on string field with special characters

2014-10-06 Thread Michael Ryan

This should do what you want:

String fq = "Field1:\"" + 
org.apache.solr.client.solrj.util.ClientUtils.escapeQueryChars(value) + "\"";

-Michael
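
For readers without SolrJ on the classpath, here is a minimal stand-alone sketch of the same idea. It is not the ClientUtils implementation: it assumes the value is always wrapped in double quotes, so only the backslash and the double quote are escaped. The class and method names are hypothetical.

```java
class ExactMatchSketch {
    // Builds field:"value", escaping the two characters that would otherwise
    // end or corrupt the quoted phrase: backslash and double quote.
    static String quotedFilter(String field, String value) {
        StringBuilder sb = new StringBuilder(field).append(":\"");
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '\\' || c == '"') {
                sb.append('\\');              // prefix the special character with a backslash
            }
            sb.append(c);
        }
        return sb.append('"').toString();
    }
}
```

The result would be passed as a filter query, for example query.setParam("fq", ExactMatchSketch.quotedFilter("Field1", value)), leaving the indexed data untouched so the facet values can still be used for display.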

-Original Message-
From: tedsolr [mailto:tsm...@sciquest.com] 
Sent: Monday, October 06, 2014 10:49 AM
To: solr-user@lucene.apache.org
Subject: Re: Exact match on string field with special characters

I may have provided too much background story for my question. What I am trying 
to do at the core here, is an exact match on a single field. I do this 
programmatically by reading the field value from the facet query and setting it 
equal to the field name for a subsequent search.

if this is a sample facet query result ... (Field1 is defined as a string) 
[Field1:[HI! THIS IS A VALUE FOR \FIELD1\ (100)]

Then I need to run a search for that exact value. The problem is the double 
quotes and slashes when I try to construct the facet query ...
String fq = Field1: + \ + value + \;

The quotes play havoc with the concatenation, as do backslashes. I was 
wondering if there's a way to build the search without having to manually 
construct it in code. The only thing I can come up with is to transform the 
field data at index time by replacing double quotes and backslashes. I don't 
strip special chars because I'm using the facet values for display. This 
problem may be specific to SolrJ. Thanks!




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Exact-match-on-string-field-with-special-characters-tp4162209p4162907.html
Sent from the Solr - User mailing list archive at Nabble.com.

Exact match on string field with special characters

2014-10-01 Thread tedsolr

I am trying to do SQL like aggregation (GROUP BY) with solr faceting. So I
use string fields for faceting - to try to get an exact match. However, it
seems like to run a facet query I have to surround the value with double
quotes. That poses issues when the field value is

green bath towels

-or-

red \cars

Those two special characters must be transformed somehow on indexing so I
can create the query:
(java)
...
String fg = fieldName + ":\"" + fieldValue + "\"";
query.addFacetField(fq);
...

Is there a way to request an exact match search without having to resort to
the quotes? I could possibly convert spaces to underscores at index time,
but I'd like to avoid munging that data because I'm using the string field
for display too! That saves time/searches when aggregating against 10 - 15
fields which takes a whole lot of facet searches to begin with.

Using Solr 4.9 




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Exact-match-on-string-field-with-special-characters-tp4162209.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: Exact match on string field with special characters

2014-10-01 Thread Michael Ryan

When you call addFacetField, the parameter you pass it should just be the 
fieldName. The fieldValue shouldn't come into play at all (unless I'm 
misunderstanding what you're trying to do).

If you ever do need to escape a value for a query, you can use 
org.apache.solr.client.solrj.util.ClientUtils.escapeQueryChars().

-Michael

-Original Message-
From: tedsolr [mailto:tsm...@sciquest.com] 
Sent: Wednesday, October 01, 2014 5:33 PM
To: solr-user@lucene.apache.org
Subject: Exact match on string field with special characters

I am trying to do SQL like aggregation (GROUP BY) with solr faceting. So I use 
string fields for faceting - to try to get an exact match. However, it seems 
like to run a facet query I have to surround the value with double quotes. That 
poses issues when the field value is

green bath towels

-or-

red \cars

Those two special characters must be transformed somehow on indexing so I can 
create the query:
(java)
...
String fg = fieldName + :\ + fieldValue + \; query.addFacetField(fq); ...

Is there a way to request an exact match search without having to resort to the 
quotes? I could possibly convert spaces to underscores at index time, but I'd 
like to avoid munging that data because I'm using the string field for display 
too! That saves time/searches when aggregating against 10 - 15 fields which 
takes a whole lot of facet searches to begin with.

Using Solr 4.9 




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Exact-match-on-string-field-with-special-characters-tp4162209.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Exact match on string field with special characters

2014-10-01 Thread Ahmet Arslan

Hi,

raw query parser or term query parser would be handy. 

https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermQueryParser

Ahmet
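
A small sketch of what the term query parser suggestion could look like from client code, assuming the {!term f=...} local-params syntax described on the linked page. The class and method names are hypothetical.

```java
class TermParserSketch {
    // Builds a filter such as {!term f=Field1}some raw value.
    // Everything after the closing brace is taken as the term itself,
    // so the facet value is passed through without quoting or escaping.
    static String termFilter(String field, String rawValue) {
        return "{!term f=" + field + "}" + rawValue;
    }
}
```

Because the value is not parsed as query syntax, a facet value returned by Solr can be handed back as-is, for example query.setParam("fq", TermParserSketch.termFilter("Field1", value)).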



On Thursday, October 2, 2014 12:32 AM, tedsolr tsm...@sciquest.com wrote:
I am trying to do SQL like aggregation (GROUP BY) with solr faceting. So I
use string fields for faceting - to try to get an exact match. However, it
seems like to run a facet query I have to surround the value with double
quotes. That poses issues when the field value is

green bath towels

-or-

red \cars

Those two special characters must be transformed somehow on indexing so I
can create the query:
(java)
...
String fg = fieldName + :\ + fieldValue + \;
query.addFacetField(fq);
...

Is there a way to request an exact match search without having to resort to
the quotes? I could possibly convert spaces to underscores at index time,
but I'd like to avoid munging that data because I'm using the string field
for display too! That saves time/searches when aggregating against 10 - 15
fields which takes a whole lot of facet searches to begin with.

Using Solr 4.9 




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Exact-match-on-string-field-with-special-characters-tp4162209.html
Sent from the Solr - User mailing list archive at Nabble.com.

How to summarize a String Field ?

2014-09-18 Thread YouPeng Yang

Hi

   One of my fields, called AMOUNT, is a String, and I want to calculate the
sum of this field.
I have tried the stats component, but it only returns the stats
information without a sum item, as follows:

<lst name="AMOUNT">
  <str name="min"></str>
  <str name="max">5000</str>
  <long name="count">24230</long>
  <long name="missing">26362</long>
  <lst name="facets"/>
</lst>

   Is there any way to achieve this?

Regards

Re: How to summarize a String Field ?

2014-09-18 Thread Erick Erickson

You cannot do this as far as I know, it must be a numeric field
(float/int/tint/tfloat whatever).

Best
Erick

On Thu, Sep 18, 2014 at 12:46 AM, YouPeng Yang
yypvsxf19870...@gmail.com wrote:
 Hi

One of my filed called AMOUNT  is  String,and I want to  calculate the
 sum of the this filed.
 I have try it with the stats component,it only give out the stats
 information without sum item just as following:

 lst name=AMOUNT
  str name=min/str
  str name=max5000/str
  long name=count24230/long
  long name=missing26362/long
   lst name=facets/
 /lst

Is there any ways to achieve this object?

 Regards

Re: How to summarize a String Field ?

2014-09-18 Thread Jack Krupansky

Do a copyField to a numeric field.

-- Jack Krupansky

-Original Message- 
From: Erick Erickson 
Sent: Thursday, September 18, 2014 11:35 AM 
To: solr-user@lucene.apache.org 
Subject: Re: How to summarize a String Field ? 

You cannot do this as far as I know, it must be a numeric field
(float/int/tint/tfloat whatever).

Best
Erick

On Thu, Sep 18, 2014 at 12:46 AM, YouPeng Yang
yypvsxf19870...@gmail.com wrote:

Hi

   One of my filed called AMOUNT  is  String,and I want to  calculate the
sum of the this filed.
I have try it with the stats component,it only give out the stats
information without sum item just as following:

lst name=AMOUNT
 str name=min/str
 str name=max5000/str
 long name=count24230/long
 long name=missing26362/long
  lst name=facets/
/lst

   Is there any ways to achieve this object?

Regards

Typecast non stored string field for sorting

2014-04-23 Thread abhishek jain

Hi friends,
I have a field that I created as a string by mistake; it should have
been an int.
It is not stored, just indexed.

I want to sort it numerically, so I want a function that can convert it to
an integer or double at query time, after which I can apply the sort. Is that
possible?
If not, can I create a new field with the values from the non-stored field?

Please advise.
Thanks
Abhishek

-- 
Thanks and kind Regards,
Abhishek jain
+91 9971376767

Re: Typecast non stored string field for sorting

2014-04-23 Thread Erick Erickson

I don't know of any way offhand to do this except to re-index. You
can't, for instance,  say copy from this indexed field to this other
indexed field.

Is it possible for you to re-index?

Best,
Erick

On Wed, Apr 23, 2014 at 12:46 PM, abhishek jain
abhishek.netj...@gmail.com wrote:
 Hi friends,
 I have a field which is string which I created by mistake it should have
 been int.
 It is not stored just indexed.

 I want to numerically sort it, and hence I want a function which can at
 query convert to integer or double and then I can apply sort. Is it
 possible?
 If not then can I create a new field with the value from non stored field?

 Please advise.
 Thanks
 Abhishek

 --
 Thanks and kind Regards,
 Abhishek jain
 +91 9971376767

Re: Typecast non stored string field for sorting

2014-04-23 Thread bbi123

I think you can write a custom function query and use it on query time.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Typecast-non-stored-string-field-for-sorting-tp4132759p4132779.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: highlight feature is not working on string field type- Apache Solr

2013-12-10 Thread Furkan KAMACI

Hi;

When you examine the Solr example folder you can see that the highlighting
feature works for a String field. Here is the definition of the cat field,
which is of type string:

<field name="cat" type="string" indexed="true" stored="true"
multiValued="true"/>

If you run this from your browser:

http://localhost:8983/solr/collection1/select?q=cat:*%20AND%20name:samsung&hl=true&hl.fl=*

you will see that highlighting works as expected.

All in all, what is your Solr version and configuration of search handler?

Thanks;
Furkan KAMACI




2013/12/5 pyramesh pyrames...@gmail.com

 Hi ALL,

 I have recently build small search application using Apache solr. now I am
 facing an issue.

 Highlighting text feature is not working on string field type, But it
 working on text field type.

 when I search the content on string field type, the results are getting
 displaying, but not getting highlight.

 can any one please guide me on the same. Thanks in Advance !!!

 Regards,
 Ramesh py



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/highlight-feature-is-not-working-on-string-field-type-Apache-Solr-tp4105084.html
 Sent from the Solr - User mailing list archive at Nabble.com.

highlight feature is not working on string field type- Apache Solr

2013-12-05 Thread pyramesh

Hi ALL,

I have recently built a small search application using Apache Solr. Now I am
facing an issue.

The highlighting feature is not working on the string field type, but it
works on the text field type.

When I search content on a string field, the results are
displayed but not highlighted.

Can anyone please guide me on this? Thanks in advance!

Regards,
Ramesh py



--
View this message in context: 
http://lucene.472066.n3.nabble.com/highlight-feature-is-not-working-on-string-field-type-Apache-Solr-tp4105084.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: String field does not yield partial match result using qf parameter

2013-06-25 Thread Jan Høydahl

fieldType string is not tokenized, so your observation is correct. You need 
to use a fieldType with analysis and tokenization to get the behavior you want.
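
To make the difference concrete, the sketch below approximates what the text_edgengrams index chain from the question (a lowercase letter tokenizer followed by an n-gram filter with sizes 3 to 15) would put into the index for "yengas.com". It is a simplified stand-in written with plain Java collections, not the Lucene classes themselves, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class NGramSketch {
    // Splits on non-letters and lowercases, roughly like a lowercase letter tokenizer.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Emits every substring of each token whose length is within [minGram, maxGram].
    static Set<String> indexTerms(String text, int minGram, int maxGram) {
        Set<String> terms = new LinkedHashSet<>();
        for (String token : tokenize(text)) {
            for (int start = 0; start < token.length(); start++) {
                for (int len = minGram; len <= maxGram && start + len <= token.length(); len++) {
                    terms.add(token.substring(start, start + len));
                }
            }
        }
        return terms;
    }
}
```

Under these assumptions NGramSketch.indexTerms("yengas.com", 3, 15) contains "yengas", so a query for yengas can match the analysed field. The string field holds only the single untokenized term "yengas.com", which is why only the full value matches there.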

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

25. juni 2013 kl. 02:35 skrev Mugoma Joseph O. mug...@yengas.com:

 
 It looks like partial search works only with copied to field. This works:
 
 $ curl
 http://localhost:8282/solr/links/select?q=text_ngrams:yengaswt=jsonindent=onfl=id,domain,score;
 
 On Tue, June 25, 2013 12:39 am, Mugoma Joseph O. wrote:
 Hello,
 
 I am newbie to solr.
 
 I am trying out partial search (match). My experience is opposite of
 http://lucene.472066.n3.nabble.com/string-field-does-not-yield-exact-match-result-using-qf-parameter-td4060096.html
 
 When I add 'qf' to to dismax query I get no result unless there's a full
 match.
 
 I am using NGramFilterFactory as follows:
 
 fieldType name=text_edgengrams class=solr.TextField
   analyzer type=index
 tokenizer class=solr.LowerCaseTokenizerFactory/
 filter class=solr.NGramFilterFactory minGramSize=3
 maxGramSize=15/
   /analyzer
   analyzer type=query
 tokenizer class=solr.LowerCaseTokenizerFactory/
   /analyzer
 /fieldType
 
 ...
 
 
 field name=text_ngrams type=text_edgengrams indexed=true
 stored=false multiValued=true /
 
 ...
 
 field name=domain type=string indexed=true stored=true/
 
 ...
 
 copyField source=domain dest=text_ngrams/
 
 
 If I have yengas.com in indexed I can search for yengas.com but not
 yengas. However, If I drop 'qf' I can search for yengas.
 
 
 Example searches:
 
 $ curl
 http://localhost:8282/solr/links/select?q=domain:yengaswt=jsonindent=onfl=id,domain,score;
 = response:{numFound:0,start:0,docs:[]
 
 
 $ curl
 http://localhost:8282/solr/links/select?q=domain:yengas.comwt=jsonindent=onfl=id,domain,score;
 = response:{numFound:3,start:0,docs:[]
 
 $ curl
 http://localhost:8282/solr/links/select?defType=dismaxq=yengasqf=domain^4pf=domainps=0fl=id,domain,score;
 = response:{numFound:0,start:0,docs:[]
 
 
 $ curl
 http://localhost:8282/solr/links/select?defType=dismaxq=yengas.compf=domainps=0fl=id,domain,score;
 = response:{numFound:3,start:0,docs:[]
 
 
 The partial match fails on dismax and normal query.
 
 What could I be missing?
 
 
 Thanks.
 
 Mugoma.

String field does not yield partial match result using qf parameter

2013-06-24 Thread Mugoma Joseph O.

Hello,

I am newbie to solr.

I am trying out partial search (match). My experience is opposite of
http://lucene.472066.n3.nabble.com/string-field-does-not-yield-exact-match-result-using-qf-parameter-td4060096.html

When I add 'qf' to a dismax query I get no result unless there's a full
match.

I am using NGramFilterFactory as follows:

 <fieldType name="text_edgengrams" class="solr.TextField">
   <analyzer type="index">
     <tokenizer class="solr.LowerCaseTokenizerFactory"/>
     <filter class="solr.NGramFilterFactory" minGramSize="3"
             maxGramSize="15"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.LowerCaseTokenizerFactory"/>
   </analyzer>
 </fieldType>

 ...


 <field name="text_ngrams" type="text_edgengrams" indexed="true"
        stored="false" multiValued="true" />

 ...

 <field name="domain" type="string" indexed="true" stored="true"/>

 ...

 <copyField source="domain" dest="text_ngrams"/>


If I have yengas.com indexed, I can search for yengas.com but not
yengas. However, if I drop 'qf' I can search for yengas.


Example searches:

 $ curl "http://localhost:8282/solr/links/select?q=domain:yengas&wt=json&indent=on&fl=id,domain,score"
 => "response":{"numFound":0,"start":0,"docs":[]


 $ curl "http://localhost:8282/solr/links/select?q=domain:yengas.com&wt=json&indent=on&fl=id,domain,score"
 => "response":{"numFound":3,"start":0,"docs":[]

 $ curl "http://localhost:8282/solr/links/select?defType=dismax&q=yengas&qf=domain^4&pf=domain&ps=0&fl=id,domain,score"
 => "response":{"numFound":0,"start":0,"docs":[]


 $ curl "http://localhost:8282/solr/links/select?defType=dismax&q=yengas.com&pf=domain&ps=0&fl=id,domain,score"
 => "response":{"numFound":3,"start":0,"docs":[]


The partial match fails on dismax and normal query.

What could I be missing?


Thanks.

Mugoma.

Re: String field does not yield partial match result using qf parameter

2013-06-24 Thread Mugoma Joseph O.


It looks like partial search works only against the field that is the
copyField destination (text_ngrams), not against the string field itself.
This works:

$ curl "http://localhost:8282/solr/links/select?q=text_ngrams:yengas&wt=json&indent=on&fl=id,domain,score"
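
The difference between the two fields can be sketched outside Solr. The
Python below is only an approximation of the analysis chain in the schema
above (a letter tokenizer that lowercases, followed by n-grams of size 3 to
15); the function names are made up for illustration and this is not Solr's
actual code. It shows why "yengas" is an indexed term of text_ngrams but not
of the string field domain, which indexes the whole value as one term.

```python
import re


def lower_case_tokenize(text):
    """Approximate solr.LowerCaseTokenizerFactory: split on non-letters, lowercase."""
    return [tok.lower() for tok in re.findall(r"[^\W\d_]+", text)]


def ngrams(token, min_size=3, max_size=15):
    """Approximate solr.NGramFilterFactory: every substring of length min..max."""
    return {
        token[i:i + n]
        for n in range(min_size, max_size + 1)
        for i in range(len(token) - n + 1)
    }


def index_terms_text_ngrams(value):
    """Terms indexed for the analyzed text_ngrams field (index-time analyzer)."""
    terms = set()
    for tok in lower_case_tokenize(value):
        terms |= ngrams(tok)
    return terms


def index_terms_string(value):
    """A string field indexes the whole value as a single, unanalyzed term."""
    return {value}


def matches(query, terms, analyzed):
    """True if every query term is among the indexed terms."""
    q_terms = lower_case_tokenize(query) if analyzed else [query]
    return all(t in terms for t in q_terms)


ngram_terms = index_terms_text_ngrams("yengas.com")
string_terms = index_terms_string("yengas.com")

print(matches("yengas", ngram_terms, analyzed=True))       # partial match on text_ngrams
print(matches("yengas", string_terms, analyzed=False))     # partial match on domain
print(matches("yengas.com", string_terms, analyzed=False)) # full match on domain
```

Under these assumptions, a dismax request would need qf to name text_ngrams
(rather than domain) for a partial term such as yengas to match.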

On Tue, June 25, 2013 12:39 am, Mugoma Joseph O. wrote:
 Hello,

 I am newbie to solr.

 I am trying out partial search (match). My experience is opposite of
 http://lucene.472066.n3.nabble.com/string-field-does-not-yield-exact-match-result-using-qf-parameter-td4060096.html

 When I add 'qf' to to dismax query I get no result unless there's a full
 match.

 I am using NGramFilterFactory as follows:

  fieldType name=text_edgengrams class=solr.TextField
analyzer type=index
  tokenizer class=solr.LowerCaseTokenizerFactory/
  filter class=solr.NGramFilterFactory minGramSize=3
 maxGramSize=15/
/analyzer
analyzer type=query
  tokenizer class=solr.LowerCaseTokenizerFactory/
/analyzer
  /fieldType

  ...


  field name=text_ngrams type=text_edgengrams indexed=true
 stored=false multiValued=true /

  ...

  field name=domain type=string indexed=true stored=true/

  ...

  copyField source=domain dest=text_ngrams/


 If I have yengas.com in indexed I can search for yengas.com but not
 yengas. However, If I drop 'qf' I can search for yengas.


 Example searches:

  $ curl
 http://localhost:8282/solr/links/select?q=domain:yengaswt=jsonindent=onfl=id,domain,score;
  = response:{numFound:0,start:0,docs:[]


  $ curl
 http://localhost:8282/solr/links/select?q=domain:yengas.comwt=jsonindent=onfl=id,domain,score;
  = response:{numFound:3,start:0,docs:[]

  $ curl
 http://localhost:8282/solr/links/select?defType=dismaxq=yengasqf=domain^4pf=domainps=0fl=id,domain,score;
  = response:{numFound:0,start:0,docs:[]


  $ curl
 http://localhost:8282/solr/links/select?defType=dismaxq=yengas.compf=domainps=0fl=id,domain,score;
  = response:{numFound:3,start:0,docs:[]


 The partial match fails on dismax and normal query.

 What could I be missing?


 Thanks.

 Mugoma.

Re: Solr string field stripping new lines line breaks

2013-06-19 Thread sodoo

Dears,

My English is bad, but I will try to explain.

I have indexed databases and files. The files include docx, pdf and txt.
All of the data has been indexed.
However, the text of my indexed PDF documents runs together as one continuous block.

I am trying to display the text with its line breaks.
The line breaks in the source files should also appear in the indexed documents.

My frontend app is SOLARIUM.

How can I make the line breaks appear in the indexed data?
Please assist me on this.

Thank you



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-string-field-stripping-new-lines-line-breaks-tp3984384p4071595.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr string field stripping new lines line breaks

2013-06-19 Thread Erick Erickson

First, please start a new thread when you change the topic;
doing so makes the threads easier to track.

But what is your evidence that line breaks are stripped? The
stored data is a verbatim copy of the data that went into the
field; nothing at all is changed. So one of several things is
happening:
1. They may be stripped by whatever turns the PDF into
a Solr document (SOLARIUM?).
2. If you're displaying them in a browser, the line breaks may be
there but just ignored by the browser.

You could write a very brief SolrJ program or similar and see the
raw output by getting the data directly from your index...
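
The same check can be sketched without SolrJ. The Python below is a minimal
illustration, not a tested client: it takes the text of a wt=json response and
prints each stored value with repr(), so that a line break shows up as an
explicit \n instead of being swallowed by a browser. The field name "content"
and the sample response are assumptions; fetching the response from a real
index is left out.

```python
import json


def raw_field_values(response_text, field):
    """Return repr() of each stored value of `field` in a Solr JSON response."""
    docs = json.loads(response_text)["response"]["docs"]
    shown = []
    for doc in docs:
        value = doc.get(field)
        # Multi-valued fields come back as lists, single-valued ones as scalars.
        values = value if isinstance(value, list) else [value]
        shown.extend(repr(v) for v in values if v is not None)
    return shown


# Hypothetical response body; a real one would come from /select?wt=json.
sample = json.dumps({
    "response": {
        "numFound": 1,
        "start": 0,
        "docs": [{"id": "1", "content": "line one\nline two"}],
    }
})

for shown in raw_field_values(sample, "content"):
    print(shown)
```

If the \n sequences are present in this output, the line breaks survived
indexing and the problem lies in how the text is displayed.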

Best
Erick

On Wed, Jun 19, 2013 at 5:50 AM, sodoo first...@yahoo.com wrote:
 Dears,

 My english is bad. But I will try to explain.

 I have indexed databases and files. The files included : docx, pdf, txt.
 Then I have indexed all of data.
 But my indexed document  pdf files text all of through continued.

 I try to appear line break text.
 Document files text line breaks to indexed document also line breaks.

 My frontend app is SOLARIUM.

 How can I appear line break the indexed data?
 Please assist me on this.

 Thank you



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-string-field-stripping-new-lines-line-breaks-tp3984384p4071595.html
 Sent from the Solr - User mailing list archive at Nabble.com.

Error indexing string field

2013-06-11 Thread PeriS

I have a field declared as type string, so should it care what's inside the
string?

Caused by: java.lang.NumberFormatException: For input string: "1835-1910"

Thanks
-Peri

Re: Error indexing string field

2013-06-11 Thread Chris Hostetter


: I have a field declared as type string, so should it care whats inside the 
string?
: 
: Caused by: java.lang.NumberFormatException: For input string: 1835-1910.

you haven't given us any information we can use to help you...

schema? high level error that wrapped that NFE? full stack trace of the 
entire error? data you are indexing?

https://wiki.apache.org/solr/UsingMailingLists

Best guesses:
 * you aren't indexing into the field you think you are
 * there is a copyField from the field you are using into another field
you forgot about
 * you are using an update processor that expects numbers
 * you are using a DataImportHandler feature that expects numbers




-Hoss

Re: string field does not yield exact match result using qf parameter

2013-05-02 Thread kirpakaroji

Hi Jan

My question is why the results change slightly when I tweak the pf and qf
parameters. I do not think you need to implement the solution that you
mentioned in your reply to get exact matches: you can always have a string
field and boost that field in your pf parameter to get the exact match
results on top.

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/string-field-does-not-yield-exact-match-result-using-qf-parameter-tp4060096p4060492.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: string field does not yield exact match result using qf parameter

2013-05-02 Thread Jan Høydahl

Hi,

You can try to increase the pf boost for your string field, but I don't think
you'll have success in having it boosted with pf, since it's a string. Check
the explain output with debugQuery=true and see whether you get a phrase boost.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

2. mai 2013 kl. 19:16 skrev kirpakaroji kirpakar...@yahoo.com:

 Hi Jan
 
 my question is when I tweak pf and qf parameter and the results change
 slightly and I do not think for exact match you need to implement the
 solution that you mentioned in your reply. you can always have string field
 and in your pf parameter you can boost that field to get the exact match
 results on top.
 
 Thanks
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/string-field-does-not-yield-exact-match-result-using-qf-parameter-tp4060096p4060492.html
 Sent from the Solr - User mailing list archive at Nabble.com.

string field does not yield exact match result using qf parameter

2013-04-30 Thread kirpakaroji

  I have a question about boosting exact-match results to the top, followed
by partial matches, and returning only partial matches if there is no exact
match. The following two configurations have yielded different results, and
I am not clear on why.

   This is the schema I have

   <field name="f1" type="string" indexed="true" stored="true" />
   <field name="f2" type="text_general" indexed="false" stored="true"
          multiValued="true"/>
   <field name="f3" type="pt_field" indexed="true" stored="true" />
   <copyField source="f1" dest="f3" />
   <uniqueKey>f1</uniqueKey>

<fieldType name="pt_field" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="1" splitOnCaseChange="0"/>

    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="./lang/stopwords_pt.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.SnowballPorterFilterFactory"
            language="Portuguese"/>
  </analyzer>
</fieldType>

In my solrconfig.xml I have
   <str name="df">f1</str>
   <str name="qf">f1^10 f3^1</str>
   <str name="pf">f1^10 f3^1</str>

Now if I run the query with these parameters from solrconfig.xml, 99% of the
time I get the exact match first and then the partial matches. The other 1%
of the time the exact match is in the index but does not show up in the
results, and no partial matches are returned for that query either.

But if I make it qf=f3&pf=f1^10 f3^1, the exact match result comes out on top
100% of the time.

   Why am I seeing this behavior?

Is there any way to say qf=f1 on the interface and get only exact results if
present? (In this case f1 is a string, but the q parameter has spaces.) Do
I need to use the pf field?
   I am using the dismax query parser.

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/string-field-does-not-yield-exact-match-result-using-qf-parameter-tp4060096.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: string field does not yield exact match result using qf parameter

2013-04-30 Thread Jan Høydahl

Hi,

The pf feature will only kick in for phrases, i.e. multiple tokens. By
definition a string is one single token, so it will never kick in for strings.

A workaround can be found here: https://github.com/cominvent/exactmatch

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

30. apr. 2013 kl. 20:52 skrev kirpakaroji kirpakar...@yahoo.com:

  I have a question regarding boosting the exact match queries to top,
 followed by partial match and if there is no exact match then give me
 partial match. The following 2 solutions have yielded different results, and
 I was not clear on it why
 
   This is the schema I have
 
   field name=f1 type=string indexed=true stored=true /
   field name=f2 type=text_general indexed=false stored=true
 multiValued=true/
   field name=f3 type=pt_field indexed=true stored=true /
   copyField source=f1 dest=f3 /
   uniqueKeyf1/uniqueKey
 
fieldType name=pt_field class=solr.TextField
 positionIncrementGap=100
  analyzer type=index
tokenizer class=solr.WhitespaceTokenizerFactory/
filter class=solr.WordDelimiterFilterFactory
 generateWordParts=1 generateNumberParts=1 catenateWords=1
 catenateNumbers=1 catenateAll=1 splitOnCaseChange=0/
 
filter class=solr.StopFilterFactory ignoreCase=true
 words=./lang/stopwords_pt.txt enablePositionIncrements=true/
filter class=solr.LowerCaseFilterFactory /
filter class=solr.SnowballPorterFilterFactory
 language=Portuguese/
  /analyzer
/fieldType
 
 in my solrconfig.xml I have
   str name=dff1/str
   str name=qff1^10 f3^1/str
   str name=pff1^10 f3^1/str
 
 now if I try to specify the query with these parameters in solrconfig.xml,
 99% of the time exactmatch first and then partial match 1%of the time the
 exact match result is in the index but does not show on the results and does
 not give any partial matches for that query either.
 
But if I make it qf=f3pf=f1^10 f3^1 yields the exactmatch result on top
 100% of the time.
 
   Why I am seeing this behavior.
 
 is there anyway to say qf=f1 on the interface and get only exact results if
 present (in this case though f1 is string but the q parameter has spaces. do
 I need to use pf field
   I am using dismax query parser.
 
 Thanks
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/string-field-does-not-yield-exact-match-result-using-qf-parameter-tp4060096.html
 Sent from the Solr - User mailing list archive at Nabble.com.

Re: Interesting issue with special characters in a string field value

2013-02-24 Thread Michael Della Bitta

Hello Jack,

I'm not sure if this is an option for you, but if you submit and
retrieve your documents using only SolrJ, you won't have to worry
about escaping them for encoding into a particular document format.
SolrJ would handle that for you.

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Sun, Feb 24, 2013 at 12:29 AM, Jack Park jackp...@topicquests.org wrote:
 Ok. I have revisited this issue as deeply as possible using simplistic
 unit tests, tossing out indexes, and starting fresh.

 A typical Solr document might have a label, e.g. the string inside the
 quotes: Node Type.  That would be queried, according to what I've
 been able to read, as a Phrase Query, which means, include the quotes
 around the text.

 When I use the admin query panel with this query:
 label:Node Type
 A fragment of the full document is returned. it is this:

   doc
 str name=locatorNodeType/str
 arr name=label
   strNode Type/str
 /arr

 In my code using SolrJ, I have printlines just as the escaped query
 string comes in, and one which shows what the SolrQuery looks like
 after setting it up to go online. I then show what came back:

 Solr3Client.runQuery- label:Node Type 0 10
 Solr3Client.runQuery-1 q=label%3A%22Node+Type%22start=0rows=10
  {numFound=1,start=0,docs=[SolrDocument{locator=NodeType,
 smallIcon=cogwheel.png, subOf=ClassType, details=The TopicQuests
 typology node type., isPrivate=false, creatorId=SystemUser, label=Node
 Type, largeIcon=cogwheel.png, lastEditDate=Sat Feb 23 20:43:22 PST
 2013, createdDate=Sat Feb 23 20:43:22 PST 2013,
 _version_=1427826019119661056}]}

 What that says is that SolrQuery inserted a + inside the query string,
 and that it found 1 document, but did not return it.

 In the largest picture, I have returned to using XMLResponseParser on
 the theory that I will now be able to take advantage of partialUpdates
 on multi-valued fields (ListString) but haven't tested that yet. I
 am not yet escaping such things as  or  but just escaping those
 things mentioned in the Solr documents which are reserved characters.

 So, the current update is this: learning about phrase queries, and
 judicious escaping of reserved characters seems to be helping. Next up
 entails two issues: more robust testing of escaped characters, and
 trying to discover what is the best approach to dealing with
 characters that must be escaped to get past XML, e.g. '', '', and
 others.

 Many thanks
 Jack


 On Fri, Feb 22, 2013 at 2:44 PM, Jack Park jackp...@topicquests.org wrote:
 Michael,
 I don't think you misunderstood. I will soon give a full response here, but
 am on the road at the moment.

 Many thanks
 Jack


 On Friday, February 22, 2013, Michael Della Bitta
 michael.della.bi...@appinions.com wrote:
 My mistake, I misunderstood the problem.

 Michael Della Bitta

 
 Appinions
 18 East 41st Street, 2nd Floor
 New York, NY 10017-6271

 www.appinions.com

 Where Influence Isn’t a Game


 On Fri, Feb 22, 2013 at 3:55 PM, Chris Hostetter
 hossman_luc...@fucit.org wrote:

 : If you're submitting documents as XML, you're always going to have to
 : escape meaningful XML characters going in. If you ask for them back as
 : XML, you should be prepared to unescape special XML characters as

 that still wouldn't explain the discrepency he's claiming to see between
 the json  xml resmonses (the json containing an empty string

 Jack: please elaborate with specifics about your solr version, field,
 field type, how you indexed your doc, and what the request urls  raw
 responses that you get are (ie: don't trust the XML you see in your
 browser, it may be unescaping escaped sequences in element text to be
 helpful .. use something like curl)

 For example...

 BEGIN GOOD EXAMPLE OF SPECIFICS---

 I'm using Solr 4.x with the 4.x example schema which has the following
 field...

field name=cat type=string indexed=true stored=true
 multiValued=true/
fieldType name=string class=solr.StrField sortMissingLast=true
 /

 I indexed a doc like this...

 $ curl http://localhost:8983/solr/update?commit=true; -H
 'Content-type:application/json' -d '[{id:hoss, cat:Something to use
 as a source node } ]'

 And this is what i get from the following requests...

 $ curl
 http://localhost:8983/solr/select?q=id:hosswt=xmlindent=trueomitHeader=true;
 ?xml version=1.0 encoding=UTF-8?
 response

 result name=response numFound=1 start=0
   doc
 str name=idhoss/str
 arr name=cat
   strlt;Something to use as a source nodegt;/str
 /arr
 long name=_version_1427705631375097856/long/doc
 /result
 /response

 $ curl
 http://localhost:8983/solr/select?q=id:hosswt=jsonindent=trueomitHeader=true;
 {
   response:{numFound:1,start:0,docs:[
   {
 id:hoss,
 cat:[Something to use as a source node],

Re: Interesting issue with special characters in a string field value

2013-02-24 Thread Jack Park

I did run attempt queries with and without escaping at the admin query
browser; made no difference. I seem to recall that the system did not
work without escaping, but it does seem worth blocking escaping and
testing again.

Many thanks
Jack

On Sun, Feb 24, 2013 at 1:16 PM, Michael Della Bitta
michael.della.bi...@appinions.com wrote:
 Hello Jack,

 I'm not sure if this is an option for you, but if you submit and
 retrieve your documents using only SolrJ, you won't have to worry
 about escaping them for encoding into a particular document format.
 SolrJ would handle that for you.

 Michael Della Bitta

 
 Appinions
 18 East 41st Street, 2nd Floor
 New York, NY 10017-6271

 www.appinions.com

 Where Influence Isn’t a Game


 On Sun, Feb 24, 2013 at 12:29 AM, Jack Park jackp...@topicquests.org wrote:
 Ok. I have revisited this issue as deeply as possible using simplistic
 unit tests, tossing out indexes, and starting fresh.

 A typical Solr document might have a label, e.g. the string inside the
 quotes: Node Type.  That would be queried, according to what I've
 been able to read, as a Phrase Query, which means, include the quotes
 around the text.

 When I use the admin query panel with this query:
 label:Node Type
 A fragment of the full document is returned. it is this:

   doc
 str name=locatorNodeType/str
 arr name=label
   strNode Type/str
 /arr

 In my code using SolrJ, I have printlines just as the escaped query
 string comes in, and one which shows what the SolrQuery looks like
 after setting it up to go online. I then show what came back:

 Solr3Client.runQuery- label:Node Type 0 10
 Solr3Client.runQuery-1 q=label%3A%22Node+Type%22start=0rows=10
  {numFound=1,start=0,docs=[SolrDocument{locator=NodeType,
 smallIcon=cogwheel.png, subOf=ClassType, details=The TopicQuests
 typology node type., isPrivate=false, creatorId=SystemUser, label=Node
 Type, largeIcon=cogwheel.png, lastEditDate=Sat Feb 23 20:43:22 PST
 2013, createdDate=Sat Feb 23 20:43:22 PST 2013,
 _version_=1427826019119661056}]}

 What that says is that SolrQuery inserted a + inside the query string,
 and that it found 1 document, but did not return it.

 In the largest picture, I have returned to using XMLResponseParser on
 the theory that I will now be able to take advantage of partialUpdates
 on multi-valued fields (ListString) but haven't tested that yet. I
 am not yet escaping such things as  or  but just escaping those
 things mentioned in the Solr documents which are reserved characters.

 So, the current update is this: learning about phrase queries, and
 judicious escaping of reserved characters seems to be helping. Next up
 entails two issues: more robust testing of escaped characters, and
 trying to discover what is the best approach to dealing with
 characters that must be escaped to get past XML, e.g. '', '', and
 others.

 Many thanks
 Jack


 On Fri, Feb 22, 2013 at 2:44 PM, Jack Park jackp...@topicquests.org wrote:
 Michael,
 I don't think you misunderstood. I will soon give a full response here, but
 am on the road at the moment.

 Many thanks
 Jack


 On Friday, February 22, 2013, Michael Della Bitta
 michael.della.bi...@appinions.com wrote:
 My mistake, I misunderstood the problem.

 Michael Della Bitta

 
 Appinions
 18 East 41st Street, 2nd Floor
 New York, NY 10017-6271

 www.appinions.com

 Where Influence Isn’t a Game


 On Fri, Feb 22, 2013 at 3:55 PM, Chris Hostetter
 hossman_luc...@fucit.org wrote:

 : If you're submitting documents as XML, you're always going to have to
 : escape meaningful XML characters going in. If you ask for them back as
 : XML, you should be prepared to unescape special XML characters as

 that still wouldn't explain the discrepency he's claiming to see between
 the json  xml resmonses (the json containing an empty string

 Jack: please elaborate with specifics about your solr version, field,
 field type, how you indexed your doc, and what the request urls  raw
 responses that you get are (ie: don't trust the XML you see in your
 browser, it may be unescaping escaped sequences in element text to be
 helpful .. use something like curl)

 For example...

 BEGIN GOOD EXAMPLE OF SPECIFICS---

 I'm using Solr 4.x with the 4.x example schema which has the following
 field...

field name=cat type=string indexed=true stored=true
 multiValued=true/
fieldType name=string class=solr.StrField sortMissingLast=true
 /

 I indexed a doc like this...

 $ curl http://localhost:8983/solr/update?commit=true; -H
 'Content-type:application/json' -d '[{id:hoss, cat:Something to 
 use
 as a source node } ]'

 And this is what i get from the following requests...

 $ curl
 http://localhost:8983/solr/select?q=id:hosswt=xmlindent=trueomitHeader=true;
 ?xml version=1.0 encoding=UTF-8?
 response

 result name=response numFound=1 start=0
   doc
 str name=idhoss/str
 arr name=cat

Re: Interesting issue with special characters in a string field value

2013-02-23 Thread Jack Park

Ok. I have revisited this issue as deeply as possible using simplistic
unit tests, tossing out indexes, and starting fresh.

A typical Solr document might have a label, e.g. the string inside the
quotes: "Node Type".  That would be queried, according to what I've
been able to read, as a Phrase Query, which means, include the quotes
around the text.

When I use the admin query panel with this query:
label:"Node Type"
A fragment of the full document is returned. It is this:

  <doc>
    <str name="locator">NodeType</str>
    <arr name="label">
      <str>Node Type</str>
    </arr>

In my code using SolrJ, I have printlines just as the escaped query
string comes in, and one which shows what the SolrQuery looks like
after setting it up to go online. I then show what came back:

Solr3Client.runQuery- label:"Node Type" 0 10
Solr3Client.runQuery-1 q=label%3A%22Node+Type%22&start=0&rows=10
 {numFound=1,start=0,docs=[SolrDocument{locator=NodeType,
smallIcon=cogwheel.png, subOf=ClassType, details=The TopicQuests
typology node type., isPrivate=false, creatorId=SystemUser, label=Node
Type, largeIcon=cogwheel.png, lastEditDate=Sat Feb 23 20:43:22 PST
2013, createdDate=Sat Feb 23 20:43:22 PST 2013,
_version_=1427826019119661056}]}

What that says is that SolrQuery inserted a + inside the query string,
and that it found 1 document, but did not return it.

In the largest picture, I have returned to using XMLResponseParser on
the theory that I will now be able to take advantage of partialUpdates
on multi-valued fields (List<String>) but haven't tested that yet. I
am not yet escaping such things as < or > but just escaping those
things mentioned in the Solr documents which are reserved characters.

So, the current update is this: learning about phrase queries, and
judicious escaping of reserved characters seems to be helping. Next up
entails two issues: more robust testing of escaped characters, and
trying to discover what is the best approach to dealing with
characters that must be escaped to get past XML, e.g. '<', '>', and
others.

Many thanks
Jack


On Fri, Feb 22, 2013 at 2:44 PM, Jack Park jackp...@topicquests.org wrote:
 Michael,
 I don't think you misunderstood. I will soon give a full response here, but
 am on the road at the moment.

 Many thanks
 Jack


 On Friday, February 22, 2013, Michael Della Bitta
 michael.della.bi...@appinions.com wrote:
 My mistake, I misunderstood the problem.

 Michael Della Bitta

 
 Appinions
 18 East 41st Street, 2nd Floor
 New York, NY 10017-6271

 www.appinions.com

 Where Influence Isn’t a Game


 On Fri, Feb 22, 2013 at 3:55 PM, Chris Hostetter
 hossman_luc...@fucit.org wrote:

 : If you're submitting documents as XML, you're always going to have to
 : escape meaningful XML characters going in. If you ask for them back as
 : XML, you should be prepared to unescape special XML characters as

 that still wouldn't explain the discrepency he's claiming to see between
 the json  xml resmonses (the json containing an empty string

 Jack: please elaborate with specifics about your solr version, field,
 field type, how you indexed your doc, and what the request urls  raw
 responses that you get are (ie: don't trust the XML you see in your
 browser, it may be unescaping escaped sequences in element text to be
 helpful .. use something like curl)

 For example...

 BEGIN GOOD EXAMPLE OF SPECIFICS---

 I'm using Solr 4.x with the 4.x example schema which has the following
 field...

field name=cat type=string indexed=true stored=true
 multiValued=true/
fieldType name=string class=solr.StrField sortMissingLast=true
 /

 I indexed a doc like this...

 $ curl http://localhost:8983/solr/update?commit=true; -H
 'Content-type:application/json' -d '[{id:hoss, cat:Something to use
 as a source node } ]'

 And this is what i get from the following requests...

 $ curl
 http://localhost:8983/solr/select?q=id:hosswt=xmlindent=trueomitHeader=true;
 ?xml version=1.0 encoding=UTF-8?
 response

 result name=response numFound=1 start=0
   doc
 str name=idhoss/str
 arr name=cat
   strlt;Something to use as a source nodegt;/str
 /arr
 long name=_version_1427705631375097856/long/doc
 /result
 /response

 $ curl
 http://localhost:8983/solr/select?q=id:hosswt=jsonindent=trueomitHeader=true;
 {
   response:{numFound:1,start:0,docs:[
   {
 id:hoss,
 cat:[Something to use as a source node],
 _version_:1427705631375097856}]
   }}

 $ curl
 http://localhost:8983/solr/select?q=cat:%22Something+to+use+as+a+source+node%22wt=jsonindent=trueomitHeader=true
 {
   response:{numFound:1,start:0,docs:[
   {
 id:hoss,
 cat:[Something to use as a source node],
 _version_:1427705631375097856}]
   }}

 END GOOD EXAMPLE OF SPECIFICS---

 :  Even more curious, if I use this query at the console:
 : 
 :  details:Something to use as a source node
 : 
 :  I get nothing back.

 note in my last example above the importance of

Interesting issue with special characters in a string field value

2013-02-22 Thread Jack Park

I have a multi-value stored field called details

I've been deliberately sending it values like

<Something to use as a source node>

If I fetch a document with that field at the admin query console,
using XML, I get:

<arr name="details">
  <str><Something to use as a source node></str>
</arr>

If I fetch with JSON, I get:
"details": [
  ""
],

Even more curious, if I use this query at the console:

details:<Something to use as a source node>

I get nothing back.
I think I'm having an identity crisis in relation to escaping
characters in SolrJ. The values are going up, and when a query
brings the document back, they come back. But, as individual values,
they don't appear to be queryable. If I actually escape them going
up, then the document is full of escaped characters, which can be
troublesome when fetching and using.

Any thoughts?

Many thanks
Jack

Re: Interesting issue with special characters in a string field value

2013-02-22 Thread Michael Della Bitta

Hi Jack,

If you're submitting documents as XML, you're always going to have to
escape meaningful XML characters going in. If you ask for them back as
XML, you should be prepared to unescape special XML characters as
output. Same goes for JSON, etc. There's really no way around this...
it's just a fact of life when dealing with document formats.
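
As a small illustration of that round trip, here is a Python sketch using only
the standard library. It assumes the value in question contains angle
brackets, which is an assumption about the poster's data rather than something
confirmed in the thread. XML needs the brackets escaped as entities, while
JSON carries them unchanged inside a quoted string.

```python
import json
from xml.sax.saxutils import escape, unescape

# Assumed example value containing characters that are meaningful in XML.
value = "<Something to use as a source node>"

# XML: '<' and '>' (and '&') must become entities inside element text.
xml_text = escape(value)
print(xml_text)

# JSON: angle brackets need no escaping; only quotes, backslashes and
# control characters do.
json_text = json.dumps(value)
print(json_text)

# Decoding either representation gives back the original value.
print(unescape(xml_text) == value)
print(json.loads(json_text) == value)
```

A client library normally performs both directions of this conversion, so the
application should see the same string it sent regardless of response format.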


Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Fri, Feb 22, 2013 at 3:24 PM, Jack Park jackp...@topicquests.org wrote:
 I have a multi-value stored field called details

 I've been deliberately sending it values like

 Something to use as a source node

 If I fetch a document with that field at the admin query console,
 using XML, I get:

  arr name=details
   strSomething to use as a source node/str
 /arr

 If I fetch with JSON, I get:
 details: [
   
 ],

 Even more curious, if I use this query at the console:

 details:Something to use as a source node

 I get nothing back.
 I think I'm having an identity crisis in relation to escaping
 characters at SolrJ. The values are going up, and when the query is to
 bring the document back, they come back. But, as individuals values,
 they don't appear to submit to query. If I actually escape them going
 up, then the document is full of escaped characters, which can be
 troublesome when fetching and using.

 Any thoughts?

 Many thanks
 Jack

Re: Interesting issue with special characters in a string field value

2013-02-22 Thread Chris Hostetter


: If you're submitting documents as XML, you're always going to have to
: escape meaningful XML characters going in. If you ask for them back as
: XML, you should be prepared to unescape special XML characters as

that still wouldn't explain the discrepancy he's claiming to see between
the json & xml responses (the json containing an empty string)

Jack: please elaborate with specifics about your solr version, field,
field type, how you indexed your doc, and what the request urls & raw
responses that you get are (ie: don't trust the XML you see in your
browser, it may be unescaping escaped sequences in element text to be
helpful .. use something like curl)

For example...

BEGIN GOOD EXAMPLE OF SPECIFICS---

I'm using Solr 4.x with the 4.x example schema which has the following 
field...

   <field name="cat" type="string" indexed="true" stored="true"
          multiValued="true"/>
   <fieldType name="string" class="solr.StrField" sortMissingLast="true" />

I indexed a doc like this...

$ curl "http://localhost:8983/solr/update?commit=true" -H
'Content-type:application/json' -d '[{"id":"hoss", "cat":"<Something to use as
a source node>" } ]'

And this is what i get from the following requests...

$ curl "http://localhost:8983/solr/select?q=id:hoss&wt=xml&indent=true&omitHeader=true"

<?xml version="1.0" encoding="UTF-8"?>
<response>

<result name="response" numFound="1" start="0">
  <doc>
    <str name="id">hoss</str>
    <arr name="cat">
      <str>&lt;Something to use as a source node&gt;</str>
    </arr>
    <long name="_version_">1427705631375097856</long></doc>
</result>
</response>

$ curl "http://localhost:8983/solr/select?q=id:hoss&wt=json&indent=true&omitHeader=true"

{
  "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"hoss",
        "cat":["<Something to use as a source node>"],
        "_version_":1427705631375097856}]
  }}

$ curl "http://localhost:8983/solr/select?q=cat:%22<Something+to+use+as+a+source+node>%22&wt=json&indent=true&omitHeader=true"
 
{
  "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"hoss",
        "cat":["<Something to use as a source node>"],
        "_version_":1427705631375097856}]
  }}

END GOOD EXAMPLE OF SPECIFICS---

:  Even more curious, if I use this query at the console:
: 
:  details:Something to use as a source node
: 
:  I get nothing back.

note in my last example above the importance of using quotes (or the 
{!term} qparser) to query string fields that contain special characters 
like whitespace -- whitespace is syntactically meaningful to the lucene 
query parser, it separates clauses of a boolean query.
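
A rough Python sketch of the two options just mentioned (not from the 
original message). The helper names phrase_query and term_query are 
made up for illustration; the field name, value, host and path are the 
ones from the example, and the escaping rule assumed for the quoted form 
is backslash-escaping of embedded quotes and backslashes:

```python
from urllib.parse import urlencode


def phrase_query(field: str, value: str) -> str:
    """Quote the whole value so whitespace does not split it into clauses."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def term_query(field: str, value: str) -> str:
    """Use the {!term} qparser form: {!term f=<field>}<raw value>."""
    return f"{{!term f={field}}}{value}"


value = "<Something to use as a source node>"
base = "http://localhost:8983/solr/select"  # host and path as in the example

for q in (phrase_query("cat", value), term_query("cat", value)):
    # urlencode percent-encodes quotes, braces and angle brackets,
    # and turns spaces into '+'.
    print(base + "?" + urlencode({"q": q, "wt": "json"}))
```

Letting a library build the query string also avoids hand-encoding the 
quotes as %22 in the URL.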


-Hoss

Re: Interesting issue with special characters in a string field value

2013-02-22 Thread Michael Della Bitta

My mistake, I misunderstood the problem.

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Fri, Feb 22, 2013 at 3:55 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:

 : If you're submitting documents as XML, you're always going to have to
 : escape meaningful XML characters going in. If you ask for them back as
 : XML, you should be prepared to unescape special XML characters as

