Re: How to get case-sensitive Terms?

2021-02-17 Thread elivis
Alexandre Rafalovitch wrote
> What about copyField with the target being index only (docValue only?) and
> no lowercase on the target field type?
> 
> Solr is not a database, you are optimising for search. So duplicate,
> multi-process, denormalise, create custom field types, etc.
> 
> Regards,
>Alex
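
For anyone following along, the copyField suggestion above could be sketched as Schema API commands. This is only a minimal sketch: the field names "text" and "text_cs" and the type name "text_case_sensitive" are hypothetical, and the target is assumed to be indexed, since the terms need to be in the index to be listed. The point is that the target type's analyzer has a tokenizer but no lowercase filter.

```python
import json


def case_sensitive_copy_commands(source, dest, type_name="text_case_sensitive"):
    """Build Schema API commands for a copyField target that preserves case.

    All names passed in are illustrative; adapt them to the real schema.
    """
    return {
        "add-field-type": {
            "name": type_name,
            "class": "solr.TextField",
            # Tokenize only; deliberately no LowerCaseFilterFactory here.
            "analyzer": {
                "tokenizer": {"class": "solr.StandardTokenizerFactory"},
            },
        },
        "add-field": {
            "name": dest,
            "type": type_name,
            "indexed": True,   # terms are read from the index
            "stored": False,   # the copy is only needed for its terms
        },
        "add-copy-field": {"source": source, "dest": dest},
    }


if __name__ == "__main__":
    # The JSON body would be POSTed to the collection's /schema endpoint.
    print(json.dumps(case_sensitive_copy_commands("text", "text_cs"), indent=2))
```

Existing documents would still need to be reindexed before the new field has any terms.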

Thank you! 

One more question: when we index data, we populate some other fields as well.
Our data comes from different inputs, so one of those fields is the ID of the
data source that the text came from. When we search, we can restrict the
results to a single data source by adding a filter query (e.g.
fq=image_id:1). However, that doesn't seem to work for a terms query: I
always get the terms from the entire index. Is there a way to filter the
terms?

Thank you again.





--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Meaning of "Index" flag under properties and schema

2021-02-17 Thread Alexandre Rafalovitch
I wonder if looking more directly at the indexes would allow you to
get closer to the problem source.

Have you tried comparing/exploring the indexes with Luke? It is in the
Lucene distribution (not Solr), and there is a small explanation here:
https://mocobeta.medium.com/luke-become-an-apache-lucene-module-as-of-lucene-8-1-7d139c998b2

Regards,
   Alex.

On Wed, 17 Feb 2021 at 16:58, Vivaldi  wrote:
>
> I was getting “illegal argument exception length must be >= 1” when I used 
> significantTerms streaming expression, from this collection and field. I 
> asked about that as a separate question on this list. I will get the whole 
> exception stack trace the next time I am at the customer site.
>
> Why any other field in other collections doesn’t have that flag? We have 
> numerous indexed, non-indexed, docvalues fields in other collections but not 
> that row
>
> Sent from my iPhone
>
> > On 16 Feb 2021, at 20:42, Shawn Heisey  wrote:
> >
> >> On 2/16/2021 9:16 AM, ufuk yılmaz wrote:
> >> I didn’t realise that, sorry. The table is like:
> >> Flags       Indexed  Tokenized  Stored  UnInvertible
> >> Properties  Yes      Yes        Yes     Yes
> >> Schema      Yes      Yes        Yes     Yes
> >> Index       Yes      Yes        Yes     NO
> >> Problematic collection has a Index row under Schema row. No other 
> >> collection has it. I was asking about what the “Index” meant
> >
> > I am not completely sure, but I think that row means the field was found in 
> > the actual Lucene index.
> >
> > In the original message you mentioned "weird exceptions" but didn't include 
> > any information about them.  Can you give us those exceptions, and the 
> > requests that caused them?
> >
> > Thanks,
> > Shawn
>


Re: Atomic Update (nested), Unified Highlighter and Lazy Field Loading => Invalid Index

2021-02-17 Thread David Smiley
I think this is an existing bug, although the issue should refer to
toSolrInputDocument instead of toSolrDoc:
https://issues.apache.org/jira/browse/SOLR-13034
Highlighting isn't involved; you just need to somehow get a document cached
with lazy fields.  In a test I was able to do this simply by doing a query
that only returns the "id" field.  No highlighting.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Wed, Feb 17, 2021 at 10:28 AM David Smiley  wrote:

> Thanks for more details.  I was able to reproduce this locally!  I hacked
> a test to look similar to what you are doing.  BTW it's okay to fill out a
> JIRA imperfectly; they can always be edited :-).  Once I better understand
> the nature of the bug today, I'll file an issue and respond with it here.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Wed, Feb 17, 2021 at 6:36 AM Nussbaum, Ronen 
> wrote:
>
>> Hello David,
>>
>> Thank you for your reply.
>> It was very hard but finally I discovered how to reproduce it. I thought
>> of issuing an issue but wasn't sure about the components and priority.
>> I used the "tech products" configset, with the following changes:
>> 1. Added > name="_nest_path_" class="solr.NestPathField" />
>> 2. Added > stored="true" termVectors="true" termOffsets="true" termPositions="true"
>> required="false" multiValued="true" />
>> Then I inserted one document with a nested child, e.g.
>> {id:"abc_1", utterances:{id:"abc_1-1", text_en:"Solr is great"}}
>>
>> To reproduce:
>> Do a search with surround and unified highlighter:
>>
>> hl.fl=text_en=unified=on=%7B!surround%7Dtext_en%3A4W("solr"%2C"great")
>>
>> Now, try to update the parent e.g. {id:"abc_1", categories_i:{add:1}}
>>
>> Important: it happens only when "id" contains underscore characters! If
>> you'll use "abc-1" it would work.
>>
>> Thanks in advance,
>> Ronen.
>>
>> -Original Message-
>> From: David Smiley 
>> Sent: Sunday, 14 February 2021 19:17
>> To: solr-user 
>> Subject: Re: Atomic Update (nested), Unified Highlighter and Lazy Field
>> Loading => Invalid Index
>>
>> Hello Ronen,
>>
>> Can you please file a JIRA issue?  Some quick searches did not turn
>> anything up.  It would be super helpful to me if you could list a series of
>> steps with Solr out-of-the-box in 8.8 including what data to index and
>> query.  Solr already includes the "tech products" sample data; maybe that
>> can illustrate the problem?  It's not clear if nested schema or nested docs
>> are actually required in your example.  If you share the JIRA issue with
>> me, I'll chase this one down.
>>
>> ~ David Smiley
>> Apache Lucene/Solr Search Developer
>> http://www.linkedin.com/in/davidwsmiley
>>
>>
>> On Sun, Feb 14, 2021 at 11:16 AM Ronen Nussbaum 
>> wrote:
>>
>> > Hi All,
>> >
>> > I discovered a strange behaviour with this combination.
>> > Not only the atomic update fails, the child documents are not properly
>> > indexed, and you can't use highlights on their text fields. Currently
>> > there is no workaround other than reindex.
>> >
>> > Checked on 8.3.0, 8.6.1 and 8.8.0.
>> > 1. Configure nested schema.
>> > 2. enableLazyFieldLoading is true (default).
>> > 3. Run a search with hl.method=unified and hl.fl=<fields>
>> > 4. Trying to do an atomic update on some of the *parents* of
>> > the returned documents from #3.
>> >
>> > You get an error: "TransactionLog doesn't know how to serialize class
>> > org.apache.lucene.document.LazyDocument$LazyField".
>> >
>> > Now trying to run #3 again yields an error message that the text field
>> > is indexed without positions.
>> >
>> > If enableLazyFieldLoading is false or if using the default highlighter
>> > this doesn't happen.
>> >
>> > Ronen.
>> >
>>
>>
>> This electronic message may contain proprietary and confidential
>> information of Verint Systems Inc., its affiliates and/or subsidiaries. The
>> information is intended to be for the use of the individual(s) or
>> entity(ies) named above. If you are not the intended recipient (or
>> authorized to receive this e-mail for the intended recipient), you may not
>> use, copy, disclose or distribute to anyone this message or any information
>> contained in this message. If you have received this electronic message in
>> error, please notify us by replying to this e-mail.
>>
>


Re: Meaning of "Index" flag under properties and schema

2021-02-17 Thread Vivaldi
I was getting “illegal argument exception length must be >= 1” when I used the
significantTerms streaming expression on this collection and field. I asked
about that in a separate question on this list. I will get the whole exception
stack trace the next time I am at the customer site.

Why doesn't any field in the other collections have that flag? We have
numerous indexed, non-indexed, and docValues fields in other collections, but
none of them shows that row.

Sent from my iPhone

> On 16 Feb 2021, at 20:42, Shawn Heisey  wrote:
> 
>> On 2/16/2021 9:16 AM, ufuk yılmaz wrote:
>> I didn’t realise that, sorry. The table is like:
>> Flags       Indexed  Tokenized  Stored  UnInvertible
>> Properties  Yes      Yes        Yes     Yes
>> Schema      Yes      Yes        Yes     Yes
>> Index       Yes      Yes        Yes     NO
>> Problematic collection has a Index row under Schema row. No other collection 
>> has it. I was asking about what the “Index” meant
> 
> I am not completely sure, but I think that row means the field was found in 
> the actual Lucene index.
> 
> In the original message you mentioned "weird exceptions" but didn't include 
> any information about them.  Can you give us those exceptions, and the 
> requests that caused them?
> 
> Thanks,
> Shawn



RE: Is 8.8.x going be stabilized and finalized?

2021-02-17 Thread Subhajit Das
Hi Shawn,

Nice to know that Solr will become a top-level project at Apache.

I asked based on the pattern of the previous three major versions. I was just
hoping that 8.8 would be long-term stable, much like the 7.7.x line.

Thanks for the clarification.

Regards,
Subhajit

From: Shawn Heisey
Sent: 17 February 2021 09:33 AM
To: solr-user@lucene.apache.org
Subject: Re: Is 8.8.x going be stabilized and finalized?

On 2/16/2021 7:57 PM, Subhajit Das wrote:
> I am planning to use 8.8 line-up for production use.
>
> But recently, a lot of people are complaining on 8.7 and 8.8. Also, there is 
> a clearly known issue on 8.8 as well.
>
> Following the trend of earlier versions (5.x, 6.x and 7.x), will 8.8 also 
> be finalized?
> For 5.x, 5.5.x was last version. For 6.x, 6.6.x was last version. For 7.x, 
> 7.7.x was last version. It would match the pattern, it seems.
> And 9.x is already planned and under development.
> And it seems, we require some stability.

All released versions are considered stable.  Sometimes problems are
uncovered after release.  Sometimes BIG problems.  We try our very best
to avoid bugs, but achieving that kind of perfection is nearly
impossible for any software project.

8.8.0 is the most current release.  The 8.8.1 release is underway, but
there's no way I can give you a concrete date.  The announcement MIGHT
come in the next few days, but it's always possible it could get pushed
back.  At this time, the changelog for 8.8.1 has five bugfixes
mentioned.  It should be more stable than 8.8.0, but it's impossible for
me to tell you whether you will have any problems with it.

On the dev list, the project is discussing the start of work on the 9.0
release, but that work has not yet begun.  Even if it started tomorrow,
it would be several weeks, maybe even a few months, before 9.0 is
actually released.  On top of the "normal" headaches involved in any new
major version release, there are some other things going on that might
further delay 9.0 and future 8.x versions:

* Solr is being promoted from a subproject of Lucene to its own
top-level project at Apache.  This involves a LOT of work.  Much of that
work is administrative in nature, which is going to occupy us and take
away from time that we might spend working on the code and new releases.
* The build system for the master branch, which is currently versioned
as 9.0.0-SNAPSHOT, was recently switched from Ant+Ivy to Gradle.  It's
going to take some time to figure out all the fallout from that migration.
* Some of the devs have been involved in an effort to greatly simplify
and rewrite how SolrCloud does internal management of a cluster.  The
intent is much better stability and better performance.  You might have
seen public messages referring to a "reference implementation."  At this
time, it is unclear how much of that work will make it into 9.0 and how
much will be revealed in later releases.  We would like very much to
include at least the first phase in 9.0 if we can.

From what I have seen over the last several years as one of the
developers on this project, it is likely that 8.9 and possibly even 8.10
and 8.11 will be released before we see 9.0.  Releases are NOT made on a
specific schedule, so I cannot tell you which versions you will see or
when they might happen.

I am fully aware that, despite typing quite a lot of text here, I have
provided almost nothing in the way of concrete information that you can
use.  Sorry about that.

Thanks,
Shawn



Re: Is 8.8.x going be stabilized and finalized?

2021-02-17 Thread Timothy Potter
To add to what Shawn said, RCs are made available to anyone
interested in testing them, and that helps us find bugs before release.

RC2 for 8.8.1 is available for testing now; see the dev mailing list for the location.

Please download it and verify it is stable for your use cases and environment.

Tim

On Tue, Feb 16, 2021 at 9:03 PM Shawn Heisey  wrote:
>
> On 2/16/2021 7:57 PM, Subhajit Das wrote:
> > I am planning to use 8.8 line-up for production use.
> >
> > But recently, a lot of people are complaining on 8.7 and 8.8. Also, there 
> > is a clearly known issue on 8.8 as well.
> >
> > Following the trend of earlier versions (5.x, 6.x and 7.x), will 8.8 also 
> > be finalized?
> > For 5.x, 5.5.x was last version. For 6.x, 6.6.x was last version. For 7.x, 
> > 7.7.x was last version. It would match the pattern, it seems.
> > And 9.x is already planned and under development.
> > And it seems, we require some stability.
>
> All released versions are considered stable.  Sometimes problems are
> uncovered after release.  Sometimes BIG problems.  We try our very best
> to avoid bugs, but achieving that kind of perfection is nearly
> impossible for any software project.
>
> 8.8.0 is the most current release.  The 8.8.1 release is underway, but
> there's no way I can give you a concrete date.  The announcement MIGHT
> come in the next few days, but it's always possible it could get pushed
> back.  At this time, the changelog for 8.8.1 has five bugfixes
> mentioned.  It should be more stable than 8.8.0, but it's impossible for
> me to tell you whether you will have any problems with it.
>
> On the dev list, the project is discussing the start of work on the 9.0
> release, but that work has not yet begun.  Even if it started tomorrow,
> it would be several weeks, maybe even a few months, before 9.0 is
> actually released.  On top of the "normal" headaches involved in any new
> major version release, there are some other things going on that might
> further delay 9.0 and future 8.x versions:
>
> * Solr is being promoted from a subproject of Lucene to its own
> top-level project at Apache.  This involves a LOT of work.  Much of that
> work is administrative in nature, which is going to occupy us and take
> away from time that we might spend working on the code and new releases.
> * The build system for the master branch, which is currently versioned
> as 9.0.0-SNAPSHOT, was recently switched from Ant+Ivy to Gradle.  It's
> going to take some time to figure out all the fallout from that migration.
> * Some of the devs have been involved in an effort to greatly simplify
> and rewrite how SolrCloud does internal management of a cluster.  The
> intent is much better stability and better performance.  You might have
> seen public messages referring to a "reference implementation."  At this
> time, it is unclear how much of that work will make it into 9.0 and how
> much will be revealed in later releases.  We would like very much to
> include at least the first phase in 9.0 if we can.
>
> From what I have seen over the last several years as one of the
> developers on this project, it is likely that 8.9 and possibly even 8.10
> and 8.11 will be released before we see 9.0.  Releases are NOT made on a
> specific schedule, so I cannot tell you which versions you will see or
> when they might happen.
>
> I am fully aware that despite typing quite a lot of text here, that I
> provided almost nothing in the way of concrete information that you can
> use.  Sorry about that.
>
> Thanks,
> Shawn


Re: Change field to DocValues

2021-02-17 Thread Mahmoud Almokadem
That's right, I want to avoid a complete reindexing process.
But should I create another field with the docValues property or change the
current field directly?

Can I use streaming expressions to update the whole index or should I
select and update using batches?
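
To make the batch option concrete, here is a minimal sketch of what I mean by "select and update using batches". It assumes a new docValues field is added next to the old one; the names "count_i" and "count_dv" are hypothetical. It only builds the atomic-update documents and groups them into batches; fetching the rows and posting each batch to the update handler is left out. As far as I understand the atomic update documentation, the other fields must be stored or have docValues for this to be safe.

```python
import json
from itertools import islice


def atomic_set_docs(rows, id_field, source_field, target_field):
    """Yield atomic-update docs that copy source_field into target_field."""
    for row in rows:
        yield {id_field: row[id_field], target_field: {"set": row[source_field]}}


def in_batches(iterable, size):
    """Yield lists of at most `size` items until the iterable is exhausted."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # Stand-in for rows fetched from the index (e.g. via cursor-based paging).
    rows = [{"id": str(i), "count_i": i * 10} for i in range(5)]
    docs = atomic_set_docs(rows, "id", "count_i", "count_dv")
    for batch in in_batches(docs, 2):
        # Each JSON array would be one request body for the update handler.
        print(json.dumps(batch))
```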


Thanks,
Mahmoud


On Wed, Feb 17, 2021 at 4:51 PM xiefengchang 
wrote:

> Hi:
> I think you are just trying to avoid complete re-index right?
> why don't you take a look at this:
> https://lucene.apache.org/solr/guide/8_0/updating-parts-of-documents.html
>
> At 2021-02-17 21:14:11, "Mahmoud Almokadem" 
> wrote:
> >Hello,
> >
> >I've an integer field on an index with billions of documents and need to do
> >facets on this field, unfortunately the field doesn't have the docValues
> >property, so the FieldCache will be fired and use much memory.
> >
> >What is the best way to change the field to be docValues supported?
> >
> >Regards,
> >Mahmoud
>


Re: Atomic Update (nested), Unified Highlighter and Lazy Field Loading => Invalid Index

2021-02-17 Thread David Smiley
Thanks for more details.  I was able to reproduce this locally!  I hacked a
test to look similar to what you are doing.  BTW it's okay to fill out a
JIRA imperfectly; they can always be edited :-).  Once I better understand
the nature of the bug today, I'll file an issue and respond with it here.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Wed, Feb 17, 2021 at 6:36 AM Nussbaum, Ronen 
wrote:

> Hello David,
>
> Thank you for your reply.
> It was very hard but finally I discovered how to reproduce it. I thought
> of issuing an issue but wasn't sure about the components and priority.
> I used the "tech products" configset, with the following changes:
> 1. Added  name="_nest_path_" class="solr.NestPathField" />
> 2. Added  termVectors="true" termOffsets="true" termPositions="true" required="false"
> multiValued="true" />
> Then I inserted one document with a nested child, e.g.
> {id:"abc_1", utterances:{id:"abc_1-1", text_en:"Solr is great"}}
>
> To reproduce:
> Do a search with surround and unified highlighter:
>
> hl.fl=text_en=unified=on=%7B!surround%7Dtext_en%3A4W("solr"%2C"great")
>
> Now, try to update the parent e.g. {id:"abc_1", categories_i:{add:1}}
>
> Important: it happens only when "id" contains underscore characters! If
> you'll use "abc-1" it would work.
>
> Thanks in advance,
> Ronen.
>
> -Original Message-
> From: David Smiley 
> Sent: Sunday, 14 February 2021 19:17
> To: solr-user 
> Subject: Re: Atomic Update (nested), Unified Highlighter and Lazy Field
> Loading => Invalid Index
>
> Hello Ronen,
>
> Can you please file a JIRA issue?  Some quick searches did not turn
> anything up.  It would be super helpful to me if you could list a series of
> steps with Solr out-of-the-box in 8.8 including what data to index and
> query.  Solr already includes the "tech products" sample data; maybe that
> can illustrate the problem?  It's not clear if nested schema or nested docs
> are actually required in your example.  If you share the JIRA issue with
> me, I'll chase this one down.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Sun, Feb 14, 2021 at 11:16 AM Ronen Nussbaum  wrote:
>
> > Hi All,
> >
> > I discovered a strange behaviour with this combination.
> > Not only the atomic update fails, the child documents are not properly
> > indexed, and you can't use highlights on their text fields. Currently
> > there is no workaround other than reindex.
> >
> > Checked on 8.3.0, 8.6.1 and 8.8.0.
> > 1. Configure nested schema.
> > 2. enableLazyFieldLoading is true (default).
> > 3. Run a search with hl.method=unified and hl.fl=<fields>
> > 4. Trying to do an atomic update on some of the *parents* of
> > the returned documents from #3.
> >
> > You get an error: "TransactionLog doesn't know how to serialize class
> > org.apache.lucene.document.LazyDocument$LazyField".
> >
> > Now trying to run #3 again yields an error message that the text field
> > is indexed without positions.
> >
> > If enableLazyFieldLoading is false or if using the default highlighter
> > this doesn't happen.
> >
> > Ronen.
> >
>


Re:Change field to DocValues

2021-02-17 Thread xiefengchang
Hi,
I think you are just trying to avoid a complete re-index, right?
Why don't you take a look at this:
https://lucene.apache.org/solr/guide/8_0/updating-parts-of-documents.html

At 2021-02-17 21:14:11, "Mahmoud Almokadem"  wrote:
>Hello,
>
>I've an integer field on an index with billions of documents and need to do
>facets on this field, unfortunately the field doesn't have the docValues
>property, so the FieldCache will be fired and use much memory.
>
>What is the best way to change the field to be docValues supported?
>
>Regards,
>Mahmoud


Re: [SOLVED] UPDATE collection's Rule-based Replica Placement

2021-02-17 Thread mosheB
Thanks Ilan and Aroop for replying.
So not exactly move, but rather *update* the existing set of rules so that
future replica placement will be enforced by them.
I managed to do so using the MODIFYCOLLECTION action:

http://:/solr/admin/collections?action=MODIFYCOLLECTION==shard:*,replica:<2,host:*
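
The URL above seems to have lost its placeholders in the archive, so here is a minimal sketch of how such a request could be assembled. The base URL and the collection name are hypothetical, and passing the rule through a "rule" parameter is my assumption based on the list of modifiable collection attributes; only the action and the rule value come from the line above.

```python
from urllib.parse import urlencode


def modify_collection_rule_url(base_url, collection, rule):
    """Build a Collections API URL that replaces a collection's placement rule."""
    params = {
        "action": "MODIFYCOLLECTION",
        "collection": collection,  # hypothetical collection name
        "rule": rule,              # assumed parameter name for the rule
    }
    # urlencode percent-escapes characters such as ':', ',', '<' and '*'.
    return f"{base_url}/solr/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    print(modify_collection_rule_url(
        "http://localhost:8983", "mycoll", "shard:*,replica:<2,host:*"))
```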





Change field to DocValues

2021-02-17 Thread Mahmoud Almokadem
Hello,

I have an integer field on an index with billions of documents and need to
facet on this field. Unfortunately, the field doesn't have the docValues
property, so the FieldCache will be used and consume a lot of memory.

What is the best way to change the field so that it supports docValues?

Regards,
Mahmoud


RE: Atomic Update (nested), Unified Highlighter and Lazy Field Loading => Invalid Index

2021-02-17 Thread Nussbaum, Ronen
Hello David,

Thank you for your reply.
It was very hard, but I finally discovered how to reproduce it. I thought of 
filing an issue but wasn't sure about the components and priority.
I used the "tech products" configset, with the following changes:
1. Added 
2. Added 
Then I inserted one document with a nested child, e.g.
{id:"abc_1", utterances:{id:"abc_1-1", text_en:"Solr is great"}}

To reproduce:
Do a search with surround and unified highlighter:
hl.fl=text_en=unified=on=%7B!surround%7Dtext_en%3A4W("solr"%2C"great")

Now, try to update the parent e.g. {id:"abc_1", categories_i:{add:1}}

Important: it happens only when "id" contains underscore characters! If you 
use "abc-1" instead, it works.
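
In case the request line above is hard to read in the archive, the two requests could be sketched as follows. The parameter names hl, hl.method and q are my reading of that line (its separators appear to have been lost), so treat them as assumptions; the query text and the update document are taken from the steps above.

```python
import json
from urllib.parse import urlencode


def highlight_query_string():
    """Query string for the surround search with the unified highlighter."""
    params = {
        "q": '{!surround}text_en:4W("solr","great")',
        "hl": "on",
        "hl.method": "unified",
        "hl.fl": "text_en",
    }
    return urlencode(params)


def parent_atomic_update(doc_id):
    """JSON body for the atomic update of the parent document."""
    return json.dumps([{"id": doc_id, "categories_i": {"add": 1}}])


if __name__ == "__main__":
    print(highlight_query_string())         # append to the select handler URL
    print(parent_atomic_update("abc_1"))    # body for the update handler
```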

Thanks in advance,
Ronen.

-Original Message-
From: David Smiley 
Sent: Sunday, 14 February 2021 19:17
To: solr-user 
Subject: Re: Atomic Update (nested), Unified Highlighter and Lazy Field Loading 
=> Invalid Index

Hello Ronen,

Can you please file a JIRA issue?  Some quick searches did not turn anything 
up.  It would be super helpful to me if you could list a series of steps with 
Solr out-of-the-box in 8.8 including what data to index and query.  Solr 
already includes the "tech products" sample data; maybe that can illustrate the 
problem?  It's not clear if nested schema or nested docs are actually required 
in your example.  If you share the JIRA issue with me, I'll chase this one down.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Sun, Feb 14, 2021 at 11:16 AM Ronen Nussbaum  wrote:

> Hi All,
>
> I discovered a strange behaviour with this combination.
> Not only the atomic update fails, the child documents are not properly
> indexed, and you can't use highlights on their text fields. Currently
> there is no workaround other than reindex.
>
> Checked on 8.3.0, 8.6.1 and 8.8.0.
> 1. Configure nested schema.
> 2. enableLazyFieldLoading is true (default).
> 3. Run a search with hl.method=unified and hl.fl=<fields>
> 4. Trying to do an atomic update on some of the *parents* of
> the returned documents from #3.
>
> You get an error: "TransactionLog doesn't know how to serialize class
> org.apache.lucene.document.LazyDocument$LazyField".
>
> Now trying to run #3 again yields an error message that the text field
> is indexed without positions.
>
> If enableLazyFieldLoading is false or if using the default highlighter
> this doesn't happen.
>
> Ronen.
>

