FilterCache size should reduce as index grows?

2017-10-04 Thread S G
Hi,

Here is a discussion we had recently with a fellow Solr user.
It seems reasonable to me and wanted to see if this is an accepted theory.

The bit-vectors in filterCache are as long as the maximum number of
documents in a core. If there are a billion docs per core, every bit-vector
will have a billion bits, making its size roughly 10^9 / 8 bytes ≈ 128 MB.
With such a big cache value per entry, the default filterCache size of 128
entries will grow to 128 x 128 MB = 16 GB, which would not be very good for a
system running below 32 GB of memory.

If such a use-case is anticipated, either the JVM's max memory should be
increased to beyond 40 GB or the filterCache size should be reduced to 32.
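
As a rough sanity check of that arithmetic, here is a minimal sketch using
Lucene's FixedBitSet, which backs dense cached filters (the class name and the
numbers are illustrative only):

import org.apache.lucene.util.FixedBitSet;

public class FilterCacheSizing {
  public static void main(String[] args) {
    int maxDoc = 1_000_000_000;             // one billion docs in the core
    // A dense cached filter stores one bit per document in a long[].
    long bytesPerEntry = (long) FixedBitSet.bits2words(maxDoc) * Long.BYTES;
    long wholeCache = bytesPerEntry * 128;  // default filterCache size
    System.out.printf("per entry: ~%d MB, cache of 128 entries: ~%d GB%n",
        bytesPerEntry / 1_000_000, wholeCache / 1_000_000_000);
  }
}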

Thanks
SG


Re: tipping point for using solrcloud—or not?

2017-10-04 Thread Shawn Heisey

On 9/29/2017 6:34 AM, John Blythe wrote:

complete noob as to solrcloud here. almost-non-noob on solr in general.

we're experiencing growing pains in our data and am thinking through moving
to solrcloud as a result. i'm hoping to find out if it seems like a good
strategy or if we need to get other areas of interest handled first before
introducing new complexities.


SolrCloud's main advantages are in automation, centralization, and 
eliminating single points of failure. Indexing multiple replicas works 
very differently in cloud than in master/slave, a difference that can 
have both advantages and disadvantages.  It is advantageous in *most* 
situations, but master/slave might have an edge in *some* situations.


For most *new* production setups requiring high availability, I would in 
almost every case recommend SolrCloud. Master/slave is a system that 
works, but the master represents a single point of failure.  If the 
master dies, manual reconfiguration of all machines is usually required 
in order to define a new master.  If you're willing to do some tricks 
with DNS, it might be possible to avoid manual Solr reconfiguration, but 
it is not seamless like SolrCloud, which is a true cluster that has no 
masters and no slaves.


I do not use SolrCloud in most of my setups.  This is only because when 
those setups were designed, SolrCloud was a development dream, something 
that was being worked on in a development branch.  SolrCloud did not 
arrive in a released version until 4.0.0-ALPHA.  If I were designing a 
setup from scratch now, I would definitely build it with SolrCloud.



here's a rundown of things:
- we are on a 30g ram aws instance
- we have ~30g tucked away in the ../solr/server/ dir
- our largest core is 6.8g w/ ~25 segments at any given time. this is also
the core that our business directly runs off of, users interact with, etc.
- 5g is for a logs type of dataset that analytics can be built off of to
help inform the primary core above
- 3g are taken up by 3 different third party sources that we use solr to
warehouse and have available for query for the sake of linking items in our
primary core to these cores for data enrichment
- several others take up < 1g each
- and then we have dev- and demo- flavors for some of these

we had been operating on a 16gb machine till a few weeks ago (actually
bumped while at lucene revolution bc i hadn't noticed how much we'd
outgrown the cache size's needs till the week before!). the load when doing
an import or running our heavier operations is much better and doesn't fall
under the weight of the operations like it had been doing.

we have no master/slave replica. all of our data is 'replicated' by the
fact that it exists in mysql. if solr were to go down it'd be a nice big
fire but one we could recover from within a couple hours by simply
reimporting.


If your business model can tolerate a two hour outage, I am envious.  
That is not something that most businesses can tolerate.  Also, many 
setups cannot do a full rebuild in two hours.  Some kind of replication 
is required for a fault tolerant installation.



i'd like to have a more sophisticated set up in place for fault tolerance
than that, of course. i'd also like to see our heavy, many-query based
operations be speedier and better capable of handling multi-threaded runs
at once w/ ease.

is this a matter of getting still more ram on the machine? cpus for faster
processing? splitting up the read/write operations between master/slave?
going full steam into a solrcloud configuration?

one more note. per discussion at the conference i'm combing through our
configs to make sure we trim any fat we can. also wanting to get
optimization scheduled more regularly to help out w segmentation and
garbage heap. not sure how far those two alone will get us, though


The desire to scale an index, either in size or query load, is not by 
itself a reason to switch to SolrCloud.  Scaling is generally easier to 
manage with cloud, because you just fire up another server, and it is 
immediately part of the cloud, ready for whatever collection changes or 
additions you might need, most of which can be done with requests via 
the HTTP API.  Although performance can improve with SolrCloud, it is 
not usually a *significant* improvement, assuming that the distribution 
of data and the number/configuration of servers are similar between 
master/slave and SolrCloud.


If you rearrange the data or upgrade/add server hardware *with* the 
switch to SolrCloud, then any significant performance improvement is 
probably not attributable to SolrCloud, but to the other changes.


If all your homegrown tools are designed around non-cloud setups, you 
might find it very painful to switch.  Some things require different 
HTTP APIs, and the APIs that you might already use could have different 
responses or require slightly different information in the request.


RAM is the resource with the most impact on Solr performance.  CPU is 

Re: Solr test runs: test skipping logic

2017-10-04 Thread Erick Erickson
There are some tests annotated @Nightly or @Weekly, or @Slow, is there
a correlation to those?
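
As a rough illustration (the class and test method here are made up), those
annotations typically sit on the test class like this; @Nightly and @Weekly
tests only run when -Dtests.nightly=true / -Dtests.weekly=true is passed,
while @Slow tests can be excluded with -Dtests.slow=false, so different
property settings give different skip lists:

import org.apache.lucene.util.LuceneTestCase;
import org.apache.lucene.util.LuceneTestCase.Nightly;
import org.apache.lucene.util.LuceneTestCase.Slow;

// Hypothetical test class, only to show where the annotations live.
@Slow
@Nightly
public class MyExpensiveTest extends LuceneTestCase {
  public void testSomethingExpensive() {
    // runs only when nightly tests are enabled (tests.nightly=true)
  }
}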

Best,
Erick

On Wed, Oct 4, 2017 at 8:59 AM, Nawab Zada Asad Iqbal  wrote:
> Hi,
>
> I am seeing that in different test runs (e.g., by executing 'ant test' on
> the root folder in 'lucene-solr') a different subset of tests are skipped.
> Where can I find more about it? I am trying to create parity between test
> successes before and after my changes and this is causing  confusion.
>
>
> Thanks
> Nawab


Re: Maven build error (Was: Jenkins setup for continuous build)

2017-10-04 Thread Steve Rowe
When I run those commands (on Debian Linux 8.9, with Maven v3.0.5 and Oracle 
JDK 1.8.0.77), I get:

-
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 1:19.741s
-

Are you on the master branch?  Have you modified the source? 

--
Steve
www.lucidworks.com

> On Oct 4, 2017, at 8:25 PM, Nawab Zada Asad Iqbal  wrote:
> 
> Hi Steve,
> 
> I did this:
> 
> ant get-maven-poms
>  cd maven-build/
>  mvn -DskipTests install
> 
> On Wed, Oct 4, 2017 at 4:56 PM, Steve Rowe  wrote:
> 
>> Hi Nawab,
>> 
>>> On Oct 4, 2017, at 7:39 PM, Nawab Zada Asad Iqbal 
>> wrote:
>>> 
>>> I am hitting following error with maven build:
>>> Is that expected?
>> 
>> No.  What commands did you use?
>> 
>>> Can someone share me the details about how
>>> https://builds.apache.org/job/Lucene-Solr-Maven-master is configured.
>> 
>> The Jenkins job runs the equivalent of the following:
>> 
>> ant jenkins-maven-nightly -Dm2.repository.id=apache.snapshots.https
>> -Dm2.repository.url=https://repository.apache.org/content/
>> repositories/snapshots
>> -DskipTests=true
>> 
>> This in turn runs the equivalent of the following:
>> 
>> ant get-maven-poms
>> mvn -f maven-build/pom.xml -fae  -Dm2.repository.id=apache.snapshots.https
>> -Dm2.repository.url=https://repository.apache.org/content/
>> repositories/snapshots
>> -DskipTests=true install
>> 
>> Note that tests are not run, and that artifacts are published to the
>> Apache sandbox repository.
>> 
>> --
>> Steve
>> www.lucidworks.com



Re: ERROR ipc.AbstractRpcClient: SASL authentication failed

2017-10-04 Thread Rick Leir

Ascot,

At the risk of ...   Can you disable Kerberos in Hbase? If not, then you
will have to provide valid Kerberos credentials (e.g. via kinit or a keytab)
before the indexer can connect!
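
For what it's worth, a minimal sketch of what "providing credentials" usually
looks like for a Java client in this situation (the principal and keytab path
below are made up, and the same thing can be done interactively with kinit):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosLogin {
  public static void main(String[] args) throws IOException {
    // Tell the Hadoop security layer this is a Kerberos-secured environment.
    Configuration conf = new Configuration();
    conf.set("hadoop.security.authentication", "kerberos");
    UserGroupInformation.setConfiguration(conf);

    // Obtain a TGT from a keytab; principal and path are illustrative only.
    UserGroupInformation.loginUserFromKeytab(
        "hbase-indexer/some.host@EXAMPLE.COM",
        "/etc/security/keytabs/hbase-indexer.keytab");

    System.out.println("Logged in as: " + UserGroupInformation.getLoginUser());
  }
}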


Rick


On 2017-10-04 07:32 PM, Ascot Moss wrote:

Does anyone use hbase-indexer to index a Kerberos-enabled HBase into Solr?

Pls help!

On Wed, Oct 4, 2017 at 10:18 PM, Ascot Moss  wrote:


Hi,

I am trying to use hbase-indexer to index hbase table to Solr,

Solr 6.6
Hbase-Indexer 1.6
Hbase 1.2.5 with Kerberos enabled,


After putting new test rows into the Hbase table, I got the following
error from hbase-indexer thus it cannot write the data to solr :

WARN ipc.AbstractRpcClient: Exception encountered while connecting to the
server : javax.security.sasl.SaslException: GSS initiate failed [Caused
by GSSException: No valid credentials provided (Mechanism level: Failed to
find any Kerberos tgt)]

ERROR ipc.AbstractRpcClient: SASL authentication failed. The most likely
cause is missing or invalid credentials. Consider 'kinit'.
javax.security.sasl.SaslException: GSS initiate failed [Caused by
GSSException: No valid credentials provided (Mechanism level: Failed to
find any Kerberos tgt)]
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChalleng
e(GssKrb5Client.java:211)
at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConn
ect(HBaseSaslRpcClient.java:179)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSa
slConnection(RpcClientImpl.java:609)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$
600(RpcClientImpl.java:154)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(
RpcClientImpl.java:735)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(
RpcClientImpl.java:732)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGro
upInformation.java:1698)
at


Any idea how to resolve this issue?

Regards





Re: Maven build error (Was: Jenkins setup for continuous build)

2017-10-04 Thread Nawab Zada Asad Iqbal
Hi Steve,

I did this:

ant get-maven-poms
  cd maven-build/
  mvn -DskipTests install

On Wed, Oct 4, 2017 at 4:56 PM, Steve Rowe  wrote:

> Hi Nawab,
>
> > On Oct 4, 2017, at 7:39 PM, Nawab Zada Asad Iqbal 
> wrote:
> >
> > I am hitting following error with maven build:
> > Is that expected?
>
> No.  What commands did you use?
>
> > Can someone share me the details about how
> > https://builds.apache.org/job/Lucene-Solr-Maven-master is configured.
>
> The Jenkins job runs the equivalent of the following:
>
> ant jenkins-maven-nightly -Dm2.repository.id=apache.snapshots.https
> -Dm2.repository.url=https://repository.apache.org/content/
> repositories/snapshots
> -DskipTests=true
>
> This in turn runs the equivalent of the following:
>
> ant get-maven-poms
> mvn -f maven-build/pom.xml -fae  -Dm2.repository.id=apache.snapshots.https
> -Dm2.repository.url=https://repository.apache.org/content/
> repositories/snapshots
> -DskipTests=true install
>
> Note that tests are not run, and that artifacts are published to the
> Apache sandbox repository.
>
> --
> Steve
> www.lucidworks.com


Re: Maven build error (Was: Jenkins setup for continuous build)

2017-10-04 Thread Steve Rowe
Hi Nawab,

> On Oct 4, 2017, at 7:39 PM, Nawab Zada Asad Iqbal  wrote:
> 
> I am hitting following error with maven build:
> Is that expected?

No.  What commands did you use?

> Can someone share me the details about how
> https://builds.apache.org/job/Lucene-Solr-Maven-master is configured.

The Jenkins job runs the equivalent of the following:

ant jenkins-maven-nightly -Dm2.repository.id=apache.snapshots.https
-Dm2.repository.url=https://repository.apache.org/content/repositories/snapshots
-DskipTests=true

This in turn runs the equivalent of the following:

ant get-maven-poms
mvn -f maven-build/pom.xml -fae  -Dm2.repository.id=apache.snapshots.https
-Dm2.repository.url=https://repository.apache.org/content/repositories/snapshots
-DskipTests=true install

Note that tests are not run, and that artifacts are published to the Apache 
sandbox repository.

--
Steve
www.lucidworks.com

Re: Jenkins setup for continuous build

2017-10-04 Thread Nawab Zada Asad Iqbal
I looked at
https://builds.apache.org/job/Lucene-Solr-Maven-master/2111/console and
decided to switch to maven. However my maven build (without jenkins) is
failing with this error:

[INFO] Scanning classes for violations...
[ERROR] Forbidden class/interface use: org.bouncycastle.util.Strings
[non-portable or internal runtime class]
[ERROR]   in
org.apache.solr.response.TestCustomDocTransformer$CustomTransformerFactory
(TestCustomDocTransformer.java:78)
[ERROR] Scanned 1290 (and 2112 related) class file(s) for forbidden API
invocations (in 2.74s), 1 error(s).
[INFO]



Is that expected? Can someone share the details about how
https://builds.apache.org/job/Lucene-Solr-Maven-master is configured?




On Wed, Oct 4, 2017 at 9:14 AM, Nawab Zada Asad Iqbal 
wrote:

> Hi,
>
> I have some custom code in solr (which is not of good quality for
> contributing back) so I need to setup my own continuous build solution. I
> tried jenkins and was hoping that ant build (ant clean compile) in Execute
> Shell textbox will work, but I am stuck at this ivy-fail error:
>
> To work around it, I also added another step in the 'Execute Shell' (ant
> ivy-bootstrap), which succeeds but 'ant clean compile' still fails with the
> following error. I guess that I am not alone in doing this so there should
> be some standard work around for this.
>
> ivy-fail:
>  [echo]
>  [echo]  This build requires Ivy and Ivy could not be found in your 
> ant classpath.
>  [echo]
>  [echo]  (Due to classpath issues and the recursive nature of the 
> Lucene/Solr
>  [echo]  build system, a local copy of Ivy can not be used an loaded 
> dynamically
>  [echo]  by the build.xml)
>  [echo]
>  [echo]  You can either manually install a copy of Ivy 2.3.0 in your 
> ant classpath:
>  [echo]http://ant.apache.org/manual/install.html#optionalTasks
>  [echo]
>  [echo]  Or this build file can do it for you by running the Ivy 
> Bootstrap target:
>  [echo]ant ivy-bootstrap
>  [echo]
>  [echo]  Either way you will only have to install Ivy one time.
>  [echo]
>  [echo]  'ant ivy-bootstrap' will install a copy of Ivy into your Ant 
> User Library:
>  [echo]/home/jenkins/.ant/lib
>  [echo]
>  [echo]  If you would prefer, you can have it installed into an 
> alternative
>  [echo]  directory using the 
> "-Divy_install_path=/some/path/you/choose" option,
>  [echo]  but you will have to specify this path every time you build 
> Lucene/Solr
>  [echo]  in the future...
>  [echo]ant ivy-bootstrap -Divy_install_path=/some/path/you/choose
>  [echo]...
>  [echo]ant -lib /some/path/you/choose clean compile
>  [echo]...
>  [echo]ant -lib /some/path/you/choose clean compile
>  [echo]
>  [echo]  If you have already run ivy-bootstrap, and still get this 
> message, please
>  [echo]  try using the "--noconfig" option when running ant, or 
> editing your global
>  [echo]  ant config to allow the user lib to be loaded.  See the wiki 
> for more details:
>  [echo]
> http://wiki.apache.org/lucene-java/DeveloperTips#Problems_with_Ivy.3F
>  [echo]
>
>
>
>


Maven build error (Was: Jenkins setup for continuous build)

2017-10-04 Thread Nawab Zada Asad Iqbal
So, I looked at this setup
https://builds.apache.org/job/Lucene-Solr-Maven-master/2111/console which
is using Maven, so I switched to Maven too.

I am hitting the following error with the Maven build.
Is that expected? Can someone share the details about how
https://builds.apache.org/job/Lucene-Solr-Maven-master is configured?
Thanks.

[INFO] Scanning classes for violations...
[ERROR] Forbidden class/interface use: org.bouncycastle.util.Strings
[non-portable or internal runtime class]
[ERROR]   in
org.apache.solr.response.TestCustomDocTransformer$CustomTransformerFactory
(TestCustomDocTransformer.java:78)
[ERROR] Scanned 1290 (and 2112 related) class file(s) for forbidden API
invocations (in 2.74s), 1 error(s).



On Wed, Oct 4, 2017 at 9:14 AM, Nawab Zada Asad Iqbal 
wrote:

> Hi,
>
> I have some custom code in solr (which is not of good quality for
> contributing back) so I need to setup my own continuous build solution. I
> tried jenkins and was hoping that ant build (ant clean compile) in Execute
> Shell textbox will work, but I am stuck at this ivy-fail error:
>
> To work around it, I also added another step in the 'Execute Shell' (ant
> ivy-bootstrap), which succeeds but 'ant clean compile' still fails with the
> following error. I guess that I am not alone in doing this so there should
> be some standard work around for this.
>
> ivy-fail:
>  [echo]
>  [echo]  This build requires Ivy and Ivy could not be found in your 
> ant classpath.
>  [echo]
>  [echo]  (Due to classpath issues and the recursive nature of the 
> Lucene/Solr
>  [echo]  build system, a local copy of Ivy can not be used an loaded 
> dynamically
>  [echo]  by the build.xml)
>  [echo]
>  [echo]  You can either manually install a copy of Ivy 2.3.0 in your 
> ant classpath:
>  [echo]http://ant.apache.org/manual/install.html#optionalTasks
>  [echo]
>  [echo]  Or this build file can do it for you by running the Ivy 
> Bootstrap target:
>  [echo]ant ivy-bootstrap
>  [echo]
>  [echo]  Either way you will only have to install Ivy one time.
>  [echo]
>  [echo]  'ant ivy-bootstrap' will install a copy of Ivy into your Ant 
> User Library:
>  [echo]/home/jenkins/.ant/lib
>  [echo]
>  [echo]  If you would prefer, you can have it installed into an 
> alternative
>  [echo]  directory using the 
> "-Divy_install_path=/some/path/you/choose" option,
>  [echo]  but you will have to specify this path every time you build 
> Lucene/Solr
>  [echo]  in the future...
>  [echo]ant ivy-bootstrap -Divy_install_path=/some/path/you/choose
>  [echo]...
>  [echo]ant -lib /some/path/you/choose clean compile
>  [echo]...
>  [echo]ant -lib /some/path/you/choose clean compile
>  [echo]
>  [echo]  If you have already run ivy-bootstrap, and still get this 
> message, please
>  [echo]  try using the "--noconfig" option when running ant, or 
> editing your global
>  [echo]  ant config to allow the user lib to be loaded.  See the wiki 
> for more details:
>  [echo]
> http://wiki.apache.org/lucene-java/DeveloperTips#Problems_with_Ivy.3F
>  [echo]
>
>
>
>


Re: ERROR ipc.AbstractRpcClient: SASL authentication failed

2017-10-04 Thread Ascot Moss
Does anyone use hbase-indexer to index a Kerberos-enabled HBase into Solr?

Pls help!

On Wed, Oct 4, 2017 at 10:18 PM, Ascot Moss  wrote:

> Hi,
>
> I am trying to use hbase-indexer to index hbase table to Solr,
>
> Solr 6.6
> Hbase-Indexer 1.6
> Hbase 1.2.5 with Kerberos enabled,
>
>
> After putting new test rows into the Hbase table, I got the following
> error from hbase-indexer thus it cannot write the data to solr :
>
> WARN ipc.AbstractRpcClient: Exception encountered while connecting to the
> server : javax.security.sasl.SaslException: GSS initiate failed [Caused
> by GSSException: No valid credentials provided (Mechanism level: Failed to
> find any Kerberos tgt)]
>
> ERROR ipc.AbstractRpcClient: SASL authentication failed. The most likely
> cause is missing or invalid credentials. Consider 'kinit'.
> javax.security.sasl.SaslException: GSS initiate failed [Caused by
> GSSException: No valid credentials provided (Mechanism level: Failed to
> find any Kerberos tgt)]
> at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChalleng
> e(GssKrb5Client.java:211)
> at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConn
> ect(HBaseSaslRpcClient.java:179)
> at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSa
> slConnection(RpcClientImpl.java:609)
> at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$
> 600(RpcClientImpl.java:154)
> at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(
> RpcClientImpl.java:735)
> at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(
> RpcClientImpl.java:732)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGro
> upInformation.java:1698)
> at
>
>
> Any idea how to resolve this issue?
>
> Regards
>


Re: Solr Spatial Query Problem Hk.

2017-10-04 Thread David Smiley
Hi,

Firstly, if Solr returns an error referencing an exception then you can
look in Solr's logs for the stack trace, which helps debugging problems a
ton (at least for Solr devs).

I suspect that the problem here is that your schema might have a dynamic
field where *coordinates is defined to be a number.  The error suggests
this, at least.

On Wed, Sep 27, 2017 at 6:42 AM Can Ezgi Aydemir 
wrote:

> 1-
> http://localhost:8983/solr/nh/select?fq=geometry.coordinates:%22IsWithin(POLYGON((-80%2029,%20-90%2050,%20-60%2070,%200%200,%20-80%2029)))%20distErrPct=0%22


missing q=*:*


>
> 2-
> http://localhost:8983/solr/nh/select?q={!field%20f=geometry.coordinates}Intersects(POLYGON((-80%2029,%20-90%2050,%20-60%2070,%200%200,%20-80%2029)))
> 
> 3-
> http://localhost:8983/solr/nh/select?q=*:*&fq={!field%20f=geometry.coordinates}Intersects(POLYGON((-80%2029,%20-90%2050,%20-60%2070,%200%200,%20-80%2029)))
> 
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">400</int>
>     <int name="QTime">1</int>
>     <lst name="params">
>       <str name="fq">geometry.coordinates:"IsWithin(POLYGON((-80 29, -90 50,
> -60 70, 0 0, -80 29))) distErrPct=0"</str>
>     </lst>
>   </lst>
>   <lst name="error">
>     <lst name="metadata">
>       <str name="error-class">org.apache.solr.common.SolrException</str>
>       <str name="root-error-class">org.apache.solr.common.SolrException</str>
>     </lst>
>     <str name="msg">Invalid Number: IsWithin(POLYGON((-80 29, -90 50, -60
> 70, 0 0, -80 29))) distErrPct=0</str>
>     <int name="code">400</int>
>   </lst>
> </response>
>
>
>
> Can Ezgi AYDEMİR
> Oracle Database Administrator
>
> İşlem Coğrafi Bilgi Sistemleri Müh. & Eğitim AŞ.
> 2024.Cadde No:14, Beysukent 06800, Ankara, Türkiye
> T : 0 312 233 50 00 .:. F : 0312 235 56 82
> E : cayde...@islem.com.tr .:. W : www.islem.com.tr
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Complexphrase treats wildcards differently than other query parsers

2017-10-04 Thread Bjarke Buur Mortensen
Hi list,

I'm trying to search for the term funktionsnedsättning*
In my analyzer chain I use a MappingCharFilterFactory to change ä to a.
So I would expect that funktionsnedsättning* would translate to
funktionsnedsattning*.

If I use e.g. the lucene query parser, this is indeed what happens:
...debugQuery=on&defType=lucene&q=funktionsneds%C3%A4ttning* gives me
"rawquerystring":"funktionsnedsättning*", "querystring":
"funktionsnedsättning*", "parsedquery":"content_ol:funktionsnedsattning*"
and 15 documents returned.

Trying the same with complexphrase gives me:
...debugQuery=on&defType=complexphrase&q=funktionsneds%C3%A4ttning* gives me
"rawquerystring":"funktionsnedsättning*", "querystring":
"funktionsnedsättning*", "parsedquery":"content_ol:funktionsnedsättning*"
and 0 documents. Notice how ä has not been changed to a.

How can this be? Is complexphrase somehow skipping the analysis chain for
multiterms, even though the components, and in particular
MappingCharFilterFactory, are multi-term aware?

Are there any configuration gotchas that I'm not aware of?

Thanks for the help,
Bjarke Buur Mortensen
Senior Software Engineer, Eluence A/S


Re: length of indexed value

2017-10-04 Thread John Blythe
ah, thanks for the link.

--
John Blythe

On Wed, Oct 4, 2017 at 9:23 AM, Erick Erickson 
wrote:

> Check. The problem is they don't encode the exact length. I _think_
> this patch shows you'd be OK with shorter lengths, but check:
> https://issues.apache.org/jira/browse/LUCENE-7730.
>
> Note it's not the patch that counts here, just look at the table of
> lengths.
>
> Best,
> Erick
>
> On Wed, Oct 4, 2017 at 4:25 AM, John Blythe  wrote:
> > interesting idea.
> >
> > the field in question is one that can have a good deal of stray zeros
> based
> > on distributor skus for a product and bad entries from those entering
> them.
> > part of the matching logic for some operations look for these
> discrepancies
> > by having a simple regex that removes zeroes. so 400010 can match with
> > 40010 (and rightly so). issues come in the form of rare cases where 41
> is a
> > sku by the same distributor or manufacturer and thus can end up being an
> > erroneous match. having a means of looking at the length would help to
> know
> > that going from 6 characters to 2 is too far a leap to be counted as a
> > match.
> >
> > --
> > John Blythe
> >
> > On Wed, Oct 4, 2017 at 6:22 AM, alessandro.benedetti <
> a.benede...@sease.io>
> > wrote:
> >
> >> Are the norms a good approximation for you ?
> >> If you preserve norms at indexing time ( it is a configuration that you
> can
> >> operate in the schema.xml) you can retrieve them with this specific
> >> function
> >> query :
> >>
> >> *norm(field)*
> >> Returns the "norm" stored in the index for the specified field. This is
> the
> >> product of the index time boost and the length normalization factor,
> >> according to the Similarity for the field.
> >> norm(fieldName)
> >>
> >> This will not be the exact length of the field, but it can be a good
> >> approximation though.
> >>
> >> Cheers
> >>
> >>
> >>
> >> -
> >> ---
> >> Alessandro Benedetti
> >> Search Consultant, R&D Software Engineer, Director
> >> Sease Ltd. - www.sease.io
> >> --
> >> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> >>
>
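
As a concrete illustration of the norm(field) approach Alessandro describes
above, a small SolrJ sketch; the collection, field name and URL are made up,
and the field must keep norms (omitNorms="false") for this to return anything
meaningful:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class NormLengthApproximation {
  public static void main(String[] args) throws Exception {
    try (SolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/products").build()) {
      SolrQuery q = new SolrQuery("*:*");
      // Pull the stored norm back as a pseudo-field; it is only an
      // approximation of the field length, as noted in the thread.
      q.setFields("id", "len_approx:norm(sku_text)");
      QueryResponse rsp = client.query(q);
      rsp.getResults().forEach(doc ->
          System.out.println(doc.getFieldValue("id") + " -> " + doc.getFieldValue("len_approx")));
    }
  }
}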


RE: Moving to Point, trouble with IntPoint.newRangeQuery()

2017-10-04 Thread Chris Hostetter


: Ok, it has been resolved. I was lucky to have spotted i was looking at 
: the wrong schema fike! The one the test actually used was not yet 
: updated from Trie to Point!

And boom goes the dynamite.

This is a prime example of where having assumptions in your code (that the 
field type will be an IntPoint field) can bite you in the ass if/when the 
schema changes (or when it was never changed and still uses the old TrieInt). 
But if you use proper delegation to get the SchemaField from the schema, 
and then ask the SchemaField's FieldType to build you a range query, 
everything will work even if/when the schema is edited.
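
A minimal sketch of that delegation, reusing the field name and bounds from
Markus's test below (this assumes it lives inside his custom QParser subclass,
where req is available):

import org.apache.lucene.search.Query;
import org.apache.solr.schema.SchemaField;
import org.apache.solr.search.SyntaxError;

  @Override
  public Query parse() throws SyntaxError {
    // Look the field up in the schema instead of assuming it is an IntPoint field.
    SchemaField sf = req.getSchema().getField("digest1");
    // The FieldType builds the right range query for Trie *or* Point fields,
    // so a later schema edit cannot silently break this parser.
    return sf.getType().getRangeQuery(this, sf, "-1820898630", "-1820898630", true, true);
  }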


: 
: Thanks!
: Markus
: 
:  
:  
: -Original message-
: > From:Markus Jelsma 
: > Sent: Tuesday 3rd October 2017 15:18
: > To: solr-user@lucene.apache.org; Solr-user 
: > Subject: RE: Moving to Point, trouble with IntPoint.newRangeQuery()
: > 
: > Ok, i have stripped down the QParser to demonstrate the problem. This is 
the basic test with only one document in the index:
: > 
: >   public void testPointRange() throws Exception {
: > assertU(adoc("id", "8", "digest1", "-1820898630"));
: > assertU(commit());
: > assertQ(
: > req("q", "{!qdigest field=digest1}", "debug", "true", "indent", 
"true"), 
: > "//result/doc[1]/str[@name='id'][.='8']");
: >   }
: > 
: > The following parse() implementation passes (because it simply uses 
LuceneQParser syntax):
: > 
: >   @Override
: >   public Query parse() throws SyntaxError {
: > QParser luceneQParser = new LuceneQParser("digest1:[-1820898630 TO 
-1820898630]", localParams, params, req);
: > return luceneQParser.parse();
: >   }
: > 
: > But when i switch to a BooleanQuery with just one RangeQuery, it fails:
: > 
: >   @Override
: >   public Query parse() throws SyntaxError {
: > BooleanQuery.Builder builder = new BooleanQuery.Builder();
: > Query pointQuery = IntPoint.newRangeQuery("digest1", -1820898630, 
-1820898630);
: > builder.add(pointQuery, Occur.SHOULD);
: > return builder.build();
: >   }
: > 
: > I might be overlooking things but i really don't see the problem with the 
second parse() impl.
: > 
: > What am i doing wrong?
: > 
: > Many thanks,
: > Markus
: > 
: >  
: >  
: > -Original message-
: > > From:Chris Hostetter 
: > > Sent: Tuesday 26th September 2017 18:52
: > > To: Solr-user 
: > > Subject: Re: Moving to Point, trouble with IntPoint.newRangeQuery()
: > > 
: > > 
: > > : I have a QParser impl. that transforms text input to one or more 
: > > : integers, it makes a BooleanQuery one a field with all integers in 
: > > : OR-more. It used to work by transforming the integer using 
: > > : LegacyNumericUtils.intToPrefixCoded, getting a BytesRef.
: > > : 
: > > : I have now moved it to use IntPoint.newRangeQuery(field, integer, 
: > > : integer), i read (think javadocs) this is the way to go, but i get no 
: > > : matches!
: > > 
: > > As a general point: if you want to do this in a modular way, you should 
: > > fetch the FieldType from the IndexSchema and use the 
: > > FieldType.getRangeQuery(...) method.
: > > 
: > > That said -- at a quick glance, knowing how your schema is defined, i'm 
: > > not sure why/how your IntPoint.newRangeQuery() code would fail.
: > > 
: > > Maybe add a lower level test of the QParser directly and assert some 
: > > explicit properties on the Query objects? (class, etc...)
: > > 
: > > 
: > > 
: > > -Hoss
: > > http://www.lucidworks.com/
: > > 
: > 
: 

-Hoss
http://www.lucidworks.com/


Jenkins setup for continuous build

2017-10-04 Thread Nawab Zada Asad Iqbal
Hi,

I have some custom code in solr (which is not of good quality for
contributing back) so I need to set up my own continuous build solution. I
tried jenkins and was hoping that an ant build (ant clean compile) in the
Execute Shell textbox would work, but I am stuck at this ivy-fail error:

To work around it, I also added another step in the 'Execute Shell' (ant
ivy-bootstrap), which succeeds, but 'ant clean compile' still fails with the
following error. I guess that I am not alone in doing this, so there should
be some standard workaround for this.

ivy-fail:
 [echo]
 [echo]  This build requires Ivy and Ivy could not be found in
your ant classpath.
 [echo]
 [echo]  (Due to classpath issues and the recursive nature of
the Lucene/Solr
 [echo]  build system, a local copy of Ivy can not be used an
loaded dynamically
 [echo]  by the build.xml)
 [echo]
 [echo]  You can either manually install a copy of Ivy 2.3.0
in your ant classpath:
 [echo]http://ant.apache.org/manual/install.html#optionalTasks
 [echo]
 [echo]  Or this build file can do it for you by running the
Ivy Bootstrap target:
 [echo]ant ivy-bootstrap
 [echo]
 [echo]  Either way you will only have to install Ivy one time.
 [echo]
 [echo]  'ant ivy-bootstrap' will install a copy of Ivy into
your Ant User Library:
 [echo]/home/jenkins/.ant/lib
 [echo]
 [echo]  If you would prefer, you can have it installed into
an alternative
 [echo]  directory using the
"-Divy_install_path=/some/path/you/choose" option,
 [echo]  but you will have to specify this path every time you
build Lucene/Solr
 [echo]  in the future...
 [echo]ant ivy-bootstrap -Divy_install_path=/some/path/you/choose
 [echo]...
 [echo]ant -lib /some/path/you/choose clean compile
 [echo]...
 [echo]ant -lib /some/path/you/choose clean compile
 [echo]
 [echo]  If you have already run ivy-bootstrap, and still get
this message, please
 [echo]  try using the "--noconfig" option when running ant,
or editing your global
 [echo]  ant config to allow the user lib to be loaded.  See
the wiki for more details:
 [echo]
http://wiki.apache.org/lucene-java/DeveloperTips#Problems_with_Ivy.3F
 [echo]


Solr test runs: test skipping logic

2017-10-04 Thread Nawab Zada Asad Iqbal
Hi,

I am seeing that in different test runs (e.g., by executing 'ant test' on
the root folder in 'lucene-solr') a different subset of tests are skipped.
Where can I find more about it? I am trying to create parity between test
successes before and after my changes and this is causing  confusion.


Thanks
Nawab


RE: Very high number of deleted docs

2017-10-04 Thread Markus Jelsma
Well, that made a difference! Now we're back at 64 MB per replica.

Thanks,
Markus
 
 
-Original message-
> From:Erick Erickson 
> Sent: Wednesday 4th October 2017 16:19
> To: solr-user 
> Subject: Re: Very high number of deleted docs
> 
> Hmmm, OK,  I stand corrected.
> 
> This is odd, though. I suspect a quirk in the merging algorithm when
> you have a small index..
> 
> Ahh, wait. What happens if you modify the segments per tier parameter
> of TMP? The default is 10, and perhaps because this is such a small
> index you don't have very many like-sized segments to merge after your
> periodic run. Setting segs per tier to a much lower number (like 2)
> might kick in the background merging. It'll make more I/O during
> indexing happen of course.
> 
> Best,
> Erick
> 
> On Wed, Oct 4, 2017 at 7:09 AM, Markus Jelsma
>  wrote:
> > No, that collection never receives a forceMerge nor expungeDeletes. Almost 
> > all (99.999%) documents are overwritten every 90 minutes.
> >
> > A single shard has 16k docs (97k total) but is only 300 MB large. Maybe 
> > that's a problem there.
> >
> > I can simply turn a switch to forceMerge after the periodic update cycle, 
> > but i preferred Lucene to do it for me.
> >
> > Thanks,
> > Markus
> >
> > -Original message-
> >> From:Erick Erickson 
> >> Sent: Wednesday 4th October 2017 14:56
> >> To: solr-user 
> >> Subject: Re: Very high number of deleted docs
> >>
> >> Did you _ever_ do a forceMerge/optimize or expungeDeletes?
> >>
> >> Here's the problem TieredMergePolicy (TMP) has a maximum segment size
> >> it will allow, 5G by default. No segment is even considered for
> >> merging unless it has < 2.5G (or half whatever the default is)
> >> non-deleted docs, the logic being that to merge similar size segments,
> >> each has to be less than half the max size.
> >>
> >> However, optimize/forceMerge and expungeDeletes do not have a limit on
> >> the segment size. So say you optimize at some point and have a 100G
> >> segment. It won't get merged until you have 97.5G worth of deleted
> >> docs.
> >>
> >> More here:
> >> https://issues.apache.org/jira/browse/LUCENE-7976
> >>
> >> Erick
> >>
> >> On Wed, Oct 4, 2017 at 5:47 AM, Markus Jelsma
> >>  wrote:
> >> > Do you mean a periodic forceMerge? That is usually considered a bad 
> >> > habit on this list (i agree). It is just that i am actually very 
> >> > surprised this can happen at all with default settings. This factory, 
> >> > unfortunately does not seem to support settings configured in solrconfig.
> >> >
> >> > Thanks,
> >> > Markus
> >> >
> >> > -Original message-
> >> >> From:Amrit Sarkar 
> >> >> Sent: Wednesday 4th October 2017 14:42
> >> >> To: solr-user@lucene.apache.org
> >> >> Subject: Re: Very high number of deleted docs
> >> >>
> >> >> Hi Markus,
> >> >>
> >> >> Emir already mentioned tuning *reclaimDeletesWeight which *affects 
> >> >> segments
> >> >> about to merge priority. Optimising index time by time, preferably
> >> >> scheduling weekly / fortnight / ..., at low traffic period to never be 
> >> >> in
> >> >> such odd position of 80% deleted docs in total index.
> >> >>
> >> >> Amrit Sarkar
> >> >> Search Engineer
> >> >> Lucidworks, Inc.
> >> >> 415-589-9269
> >> >> www.lucidworks.com
> >> >> Twitter http://twitter.com/lucidworks
> >> >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> >> >>
> >> >> On Wed, Oct 4, 2017 at 6:02 PM, Emir Arnautović <
> >> >> emir.arnauto...@sematext.com> wrote:
> >> >>
> >> >> > Hi Markus,
> >> >> > You can set reclaimDeletesWeight in merge settings to some higher 
> >> >> > value
> >> >> > than default (I think it is 2) to favor segments with deleted docs 
> >> >> > when
> >> >> > merging.
> >> >> >
> >> >> > HTH,
> >> >> > Emir
> >> >> > --
> >> >> > Monitoring - Log Management - Alerting - Anomaly Detection
> >> >> > Solr & Elasticsearch Consulting Support Training - 
> >> >> > http://sematext.com/
> >> >> >
> >> >> >
> >> >> >
> >> >> > > On 4 Oct 2017, at 13:31, Markus Jelsma 
> >> >> > wrote:
> >> >> > >
> >> >> > > Hello,
> >> >> > >
> >> >> > > Using a 6.6.0, i just spotted one of our collections having a core 
> >> >> > > of
> >> >> > which over 80 % of the total number of documents were deleted 
> >> >> > documents.
> >> >> > >
> >> >> > > It has <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory"/>
> >> >> > configured with no non-default settings.
> >> >> > >
> >> >> > > Is this supposed to happen? How can i prevent these kind of numbers?
> >> >> > >
> >> >> > > Thanks,
> >> >> > > Markus
> >> >> >
> >> >> >
> >> >>
> >>
> 
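
For anyone following along, a rough sketch of what those solrconfig.xml merge
settings map to on the Lucene side; TieredMergePolicyFactory forwards values
like <int name="segmentsPerTier">2</int> to these setters via reflection, and
the numbers below are only illustrative:

import org.apache.lucene.index.TieredMergePolicy;

public class MergePolicyTuning {
  public static void main(String[] args) {
    TieredMergePolicy tmp = new TieredMergePolicy();
    // Fewer segments per tier makes merging kick in sooner on a small index.
    tmp.setSegmentsPerTier(2.0);
    tmp.setMaxMergeAtOnce(2);
    // A higher weight makes segments full of deleted docs better merge candidates.
    tmp.setReclaimDeletesWeight(3.0);
    System.out.println(tmp);
  }
}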


RE: Default value from another field?

2017-10-04 Thread jimi.hullegard
Thank you Alexandre! It worked great. :)

And here is how it is configured, if someone else wants to do this, but is too 
busy to read the documentation for these classes:

<updateRequestProcessorChain>
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">source_field</str>
    <str name="dest">target_field</str>
  </processor>
  <processor class="solr.FirstFieldValueUpdateProcessorFactory">
    <str name="fieldName">target_field</str>
  </processor>
  [...]
</updateRequestProcessorChain>

/Jimi

-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
Sent: Tuesday, October 3, 2017 7:02 PM
To: solr-user 
Subject: Re: Default value from another field?

I believe you should be able to use a combination of:
http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/update/processor/CloneFieldUpdateProcessorFactory.html
and
http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/update/processor/FirstFieldValueUpdateProcessorFactory.html

So, if you have no value in target field, you end up with one (copied one). And 
if you did, you end up with one (original one).

This is available in the stock Solr, just need to configure it.

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 3 October 2017 at 12:14,   wrote:
> Hi Emir,
>
> Thanks for the tip about DefaultValueUpdateProcessorFactory. But even though 
> I agree that it most likely isn't too hard to write custom code that does 
> this, the overhead is a bit too much I think considering we now use a vanilla 
> Solr with no custom code deployed. So we would need to setup a new project, 
> and a new deployment procedure, and that is a bit overkill considering that 
> this is a feature that would help me as a developer and administrator only a 
> little bit (ie nice-to-have), and at the same time would not have any impact 
> for any end user (except, possibly negative side effects because of bugs etc).
>
> Regards
> /Jimi
>
>
> -Original Message-
> From: Emir Arnautović [mailto:emir.arnauto...@sematext.com]
> Sent: Tuesday, October 3, 2017 3:28 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Default value from another field?
>
> Hi Jimi,
> I don’t think that you can do it using schema, but you could do it 
> using custom update request processor chain. I quickly scanned to see 
> if there is such processor and could not find one. The closest one is 
> https://lucene.apache.org/solr/6_6_0//solr-core/org/apache/solr/update
> /processor/DefaultValueUpdateProcessorFactory.html 
> It should not be too hard to adjust it to do what you need.
>
> HTH,
> Emir
>
> --
> Monitoring - Log Management - Alerting - Anomaly Detection Solr & 
> Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
>> On 3 Oct 2017, at 14:10, jimi.hulleg...@svensktnaringsliv.se wrote:
>>
>> Hi,
>>
>> Is it possible using some Solr schema magic to make solr get the default 
>> value for a field from another field? Ie, if the value is specified in the 
>> document to be indexed, then that value is used. Otherwise it uses the value 
>> of another field. As far as I understand it, the field property "default" 
>> only takes a static value, not a reference to another field. And the 
>> copyField element doesn't solve this problem either, since it will result in 
>> two values if the field was specified in the document, and I only want a 
>> single value.
>>
>> /Jimi
>


Solr boost function taking precedence over relevance boosting

2017-10-04 Thread ruby
I have a usecase where:
if a document has the search string in its name_property field, then I want
to show that document on top. If multiple documents have the search string in
their name_property field, then I want to sort them by creation date.

Following is my query:
q={!boost+b=recip(ms(NOW,creation_date),3.16e-11,1,1)}(all_poperty_copy_field:(xyz)
OR name_property:xyz^300)

all_poperty_copy_field is the copy field where all property values are
copied over. 
name_property field contains document name. 

Say my search returns 3 docs, two of them created a few years back and one
created today. Then even though the one created today does not have the search
string in name_property, it is boosted to the top by the above query. Is there
a way to make sure that only the documents having the search string in their
name are boosted to the top, and that if multiple documents have this string in
their name_property field, they are sorted by creation date?

Thanks so much in advance.






--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Very high number of deleted docs

2017-10-04 Thread Erick Erickson
Hmmm, OK,  I stand corrected.

This is odd, though. I suspect a quirk in the merging algorithm when
you have a small index..

Ahh, wait. What happens if you modify the segments per tier parameter
of TMP? The default is 10, and perhaps because this is such a small
index you don't have very many like-sized segments to merge after your
periodic run. Setting segs per tier to a much lower number (like 2)
might kick in the background merging. It'll make more I/O during
indexing happen of course.

Best,
Erick

On Wed, Oct 4, 2017 at 7:09 AM, Markus Jelsma
 wrote:
> No, that collection never receives a forceMerge nor expungeDeletes. Almost 
> all (99.999%) documents are overwritten every 90 minutes.
>
> A single shard has 16k docs (97k total) but is only 300 MB large. Maybe 
> that's a problem there.
>
> I can simply turn a switch to forceMerge after the periodic update cycle, but 
> i preferred Lucene to do it for me.
>
> Thanks,
> Markus
>
> -Original message-
>> From:Erick Erickson 
>> Sent: Wednesday 4th October 2017 14:56
>> To: solr-user 
>> Subject: Re: Very high number of deleted docs
>>
>> Did you _ever_ do a forceMerge/optimize or expungeDeletes?
>>
>> Here's the problem TieredMergePolicy (TMP) has a maximum segment size
>> it will allow, 5G by default. No segment is even considered for
>> merging unless it has < 2.5G (or half whatever the default is)
>> non-deleted docs, the logic being that to merge similar size segments,
>> each has to be less than half the max size.
>>
>> However, optimize/forceMerge and expungeDeletes do not have a limit on
>> the segment size. So say you optimize at some point and have a 100G
>> segment. It won't get merged until you have 97.5G worth of deleted
>> docs.
>>
>> More here:
>> https://issues.apache.org/jira/browse/LUCENE-7976
>>
>> Erick
>>
>> On Wed, Oct 4, 2017 at 5:47 AM, Markus Jelsma
>>  wrote:
>> > Do you mean a periodic forceMerge? That is usually considered a bad habit 
>> > on this list (i agree). It is just that i am actually very surprised this 
>> > can happen at all with default settings. This factory, unfortunately does 
>> > not seem to support settings configured in solrconfig.
>> >
>> > Thanks,
>> > Markus
>> >
>> > -Original message-
>> >> From:Amrit Sarkar 
>> >> Sent: Wednesday 4th October 2017 14:42
>> >> To: solr-user@lucene.apache.org
>> >> Subject: Re: Very high number of deleted docs
>> >>
>> >> Hi Markus,
>> >>
>> >> Emir already mentioned tuning *reclaimDeletesWeight which *affects 
>> >> segments
>> >> about to merge priority. Optimising index time by time, preferably
>> >> scheduling weekly / fortnight / ..., at low traffic period to never be in
>> >> such odd position of 80% deleted docs in total index.
>> >>
>> >> Amrit Sarkar
>> >> Search Engineer
>> >> Lucidworks, Inc.
>> >> 415-589-9269
>> >> www.lucidworks.com
>> >> Twitter http://twitter.com/lucidworks
>> >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> >>
>> >> On Wed, Oct 4, 2017 at 6:02 PM, Emir Arnautović <
>> >> emir.arnauto...@sematext.com> wrote:
>> >>
>> >> > Hi Markus,
>> >> > You can set reclaimDeletesWeight in merge settings to some higher value
>> >> > than default (I think it is 2) to favor segments with deleted docs when
>> >> > merging.
>> >> >
>> >> > HTH,
>> >> > Emir
>> >> > --
>> >> > Monitoring - Log Management - Alerting - Anomaly Detection
>> >> > Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>> >> >
>> >> >
>> >> >
>> >> > > On 4 Oct 2017, at 13:31, Markus Jelsma 
>> >> > wrote:
>> >> > >
>> >> > > Hello,
>> >> > >
>> >> > > Using a 6.6.0, i just spotted one of our collections having a core of
>> >> > which over 80 % of the total number of documents were deleted documents.
>> >> > >
>> >> > > It has <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory"/>
>> >> > configured with no non-default settings.
>> >> > >
>> >> > > Is this supposed to happen? How can i prevent these kind of numbers?
>> >> > >
>> >> > > Thanks,
>> >> > > Markus
>> >> >
>> >> >
>> >>
>>


ERROR ipc.AbstractRpcClient: SASL authentication failed

2017-10-04 Thread Ascot Moss
Hi,

I am trying to use hbase-indexer to index an HBase table into Solr:

Solr 6.6
Hbase-Indexer 1.6
Hbase 1.2.5 with Kerberos enabled,


After putting new test rows into the HBase table, I got the following error
from hbase-indexer, so it cannot write the data to Solr:

WARN ipc.AbstractRpcClient: Exception encountered while connecting to the
server : javax.security.sasl.SaslException: GSS initiate failed [Caused by
GSSException: No valid credentials provided (Mechanism level: Failed to
find any Kerberos tgt)]

ERROR ipc.AbstractRpcClient: SASL authentication failed. The most likely
cause is missing or invalid credentials. Consider 'kinit'.
javax.security.sasl.SaslException: GSS initiate failed [Caused by
GSSException: No valid credentials provided (Mechanism level: Failed to
find any Kerberos tgt)]
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(
GssKrb5Client.java:211)
at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(
HBaseSaslRpcClient.java:179)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(
RpcClientImpl.java:609)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.
access$600(RpcClientImpl.java:154)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.
run(RpcClientImpl.java:735)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.
run(RpcClientImpl.java:732)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(
UserGroupInformation.java:1698)
at


Any idea how to resolve this issue?

Regards


RE: Very high number of deleted docs

2017-10-04 Thread Markus Jelsma
No, that collection never receives a forceMerge nor expungeDeletes. Almost all 
(99.999%) documents are overwritten every 90 minutes.

A single shard has 16k docs (97k total) but is only 300 MB large. Maybe that's 
a problem there.

I can simply turn a switch to forceMerge after the periodic update cycle, but i 
preferred Lucene to do it for me.

Thanks,
Markus
 
-Original message-
> From:Erick Erickson 
> Sent: Wednesday 4th October 2017 14:56
> To: solr-user 
> Subject: Re: Very high number of deleted docs
> 
> Did you _ever_ do a forceMerge/optimize or expungeDeletes?
> 
> Here's the problem TieredMergePolicy (TMP) has a maximum segment size
> it will allow, 5G by default. No segment is even considered for
> merging unless it has < 2.5G (or half whatever the default is)
> non-deleted docs, the logic being that to merge similar size segments,
> each has to be less than half the max size.
> 
> However, optimize/forceMerge and expungeDeletes do not have a limit on
> the segment size. So say you optimize at some point and have a 100G
> segment. It won't get merged until you have 97.5G worth of deleted
> docs.
> 
> More here:
> https://issues.apache.org/jira/browse/LUCENE-7976
> 
> Erick
> 
> On Wed, Oct 4, 2017 at 5:47 AM, Markus Jelsma
>  wrote:
> > Do you mean a periodic forceMerge? That is usually considered a bad habit 
> > on this list (i agree). It is just that i am actually very surprised this 
> > can happen at all with default settings. This factory, unfortunately does 
> > not seem to support settings configured in solrconfig.
> >
> > Thanks,
> > Markus
> >
> > -Original message-
> >> From:Amrit Sarkar 
> >> Sent: Wednesday 4th October 2017 14:42
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Very high number of deleted docs
> >>
> >> Hi Markus,
> >>
> >> Emir already mentioned tuning *reclaimDeletesWeight which *affects segments
> >> about to merge priority. Optimising index time by time, preferably
> >> scheduling weekly / fortnight / ..., at low traffic period to never be in
> >> such odd position of 80% deleted docs in total index.
> >>
> >> Amrit Sarkar
> >> Search Engineer
> >> Lucidworks, Inc.
> >> 415-589-9269
> >> www.lucidworks.com
> >> Twitter http://twitter.com/lucidworks
> >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> >>
> >> On Wed, Oct 4, 2017 at 6:02 PM, Emir Arnautović <
> >> emir.arnauto...@sematext.com> wrote:
> >>
> >> > Hi Markus,
> >> > You can set reclaimDeletesWeight in merge settings to some higher value
> >> > than default (I think it is 2) to favor segments with deleted docs when
> >> > merging.
> >> >
> >> > HTH,
> >> > Emir
> >> > --
> >> > Monitoring - Log Management - Alerting - Anomaly Detection
> >> > Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >> >
> >> >
> >> >
> >> > > On 4 Oct 2017, at 13:31, Markus Jelsma 
> >> > wrote:
> >> > >
> >> > > Hello,
> >> > >
> >> > > Using a 6.6.0, i just spotted one of our collections having a core of
> >> > which over 80 % of the total number of documents were deleted documents.
> >> > >
> >> > > It has <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory"/>
> >> > configured with no non-default settings.
> >> > >
> >> > > Is this supposed to happen? How can i prevent these kind of numbers?
> >> > >
> >> > > Thanks,
> >> > > Markus
> >> >
> >> >
> >>
> 


RE: Very high number of deleted docs

2017-10-04 Thread Markus Jelsma
Ah thanks for that! 

-Original message-
> From:Emir Arnautović 
> Sent: Wednesday 4th October 2017 15:03
> To: solr-user@lucene.apache.org
> Subject: Re: Very high number of deleted docs
> 
> Hi Markus,
> It is passed but not explicitly - it uses reflection to pass arguments - take 
> a look at parent factory class.
> 
> When it comes to force merging - you have extreme case - 80% is deleted (my 
> guess frequent updates) and extreme cases require some extreme measures - it 
> can be either periodic force merge or full reindexing + aliases.
> 
> HTH,
> Emir
> 
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> 
> 
> 
> > On 4 Oct 2017, at 14:47, Markus Jelsma  wrote:
> > 
> > Do you mean a periodic forceMerge? That is usually considered a bad habit 
> > on this list (i agree). It is just that i am actually very surprised this 
> > can happen at all with default settings. This factory, unfortunately does 
> > not seem to support settings configured in solrconfig.
> > 
> > Thanks,
> > Markus
> > 
> > -Original message-
> >> From:Amrit Sarkar 
> >> Sent: Wednesday 4th October 2017 14:42
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Very high number of deleted docs
> >> 
> >> Hi Markus,
> >> 
> >> Emir already mentioned tuning *reclaimDeletesWeight which *affects segments
> >> about to merge priority. Optimising index time by time, preferably
> >> scheduling weekly / fortnight / ..., at low traffic period to never be in
> >> such odd position of 80% deleted docs in total index.
> >> 
> >> Amrit Sarkar
> >> Search Engineer
> >> Lucidworks, Inc.
> >> 415-589-9269
> >> www.lucidworks.com
> >> Twitter http://twitter.com/lucidworks
> >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> >> 
> >> On Wed, Oct 4, 2017 at 6:02 PM, Emir Arnautović <
> >> emir.arnauto...@sematext.com> wrote:
> >> 
> >>> Hi Markus,
> >>> You can set reclaimDeletesWeight in merge settings to some higher value
> >>> than default (I think it is 2) to favor segments with deleted docs when
> >>> merging.
> >>> 
> >>> HTH,
> >>> Emir
> >>> --
> >>> Monitoring - Log Management - Alerting - Anomaly Detection
> >>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >>> 
> >>> 
> >>> 
>  On 4 Oct 2017, at 13:31, Markus Jelsma 
> >>> wrote:
>  
>  Hello,
>  
>  Using a 6.6.0, i just spotted one of our collections having a core of
> >>> which over 80 % of the total number of documents were deleted documents.
>  
>  It has <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory"/>
> >>> configured with no non-default settings.
>  
>  Is this supposed to happen? How can i prevent these kind of numbers?
>  
>  Thanks,
>  Markus
> >>> 
> >>> 
> >> 
> 
> 


Re: Solr 5.4.0: Colored Highlight and multi-value field ?

2017-10-04 Thread Erick Erickson
How does it not work for you? Details matter, an example set of values and
the response from Solr are good bits of info for us to have.

On Tue, Oct 3, 2017 at 3:59 PM, Bruno Mannina 
wrote:

> Dear all,
>
>
>
> Is it possible to have a colored highlight in a multi-value field ?
>
>
>
> I succeeded in doing it on a text field but not in a multi-value field; there
> SOLR takes hl.simple.pre / hl.simple.post as the tags.
>
>
>
> Thanks a lot for your help,
>
>
>
> Cordialement, Best Regards
>
> Bruno Mannina
>
> www.matheo-software.com
>
> www.patent-pulse.com
>
> Tél. +33 0 970 738 743
>
> Mob. +33 0 634 421 817
>
>


Re: Very high number of deleted docs

2017-10-04 Thread Erick Erickson
rapid updates aren't the cause of a large percentage of deleted
documents. See the JIRA I referenced for the probable cause:
https://issues.apache.org/jira/browse/LUCENE-7976

If my suspicion is correct you'll see one or more of your segments
occupy way more than 5G. Assuming my suspicion is correct, you have to
either periodically optimize/forceMerge or expungeDeletes regularly.
At that point, though, you might as well optimize/forceMerge.
expungeDeletes would only save you re-writing segments with < 20%
deleted docs (at least I think that's the cutoff).

Or reindex from scratch and never, never, never forceMerge/optimize or
expungeDeletes.

Best,
Erick

On Wed, Oct 4, 2017 at 6:03 AM, Emir Arnautović
 wrote:
> Hi Markus,
> It is passed but not explicitly - it uses reflection to pass arguments - take 
> a look at parent factory class.
>
> When it comes to force merging - you have extreme case - 80% is deleted (my 
> guess frequent updates) and extreme cases require some extreme measures - it 
> can be either periodic force merge or full reindexing + aliases.
>
> HTH,
> Emir
>
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
>> On 4 Oct 2017, at 14:47, Markus Jelsma  wrote:
>>
>> Do you mean a periodic forceMerge? That is usually considered a bad habit on 
>> this list (i agree). It is just that i am actually very surprised this can 
>> happen at all with default settings. This factory, unfortunately does not 
>> seem to support settings configured in solrconfig.
>>
>> Thanks,
>> Markus
>>
>> -Original message-
>>> From:Amrit Sarkar 
>>> Sent: Wednesday 4th October 2017 14:42
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Very high number of deleted docs
>>>
>>> Hi Markus,
>>>
>>> Emir already mentioned tuning *reclaimDeletesWeight which *affects segments
>>> about to merge priority. Optimising index time by time, preferably
>>> scheduling weekly / fortnight / ..., at low traffic period to never be in
>>> such odd position of 80% deleted docs in total index.
>>>
>>> Amrit Sarkar
>>> Search Engineer
>>> Lucidworks, Inc.
>>> 415-589-9269
>>> www.lucidworks.com
>>> Twitter http://twitter.com/lucidworks
>>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>>>
>>> On Wed, Oct 4, 2017 at 6:02 PM, Emir Arnautović <
>>> emir.arnauto...@sematext.com> wrote:
>>>
 Hi Markus,
 You can set reclaimDeletesWeight in merge settings to some higher value
 than default (I think it is 2) to favor segments with deleted docs when
 merging.

 HTH,
 Emir
 --
 Monitoring - Log Management - Alerting - Anomaly Detection
 Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 4 Oct 2017, at 13:31, Markus Jelsma 
 wrote:
>
> Hello,
>
> Using a 6.6.0, i just spotted one of our collections having a core of
 which over 80 % of the total number of documents were deleted documents.
>
> It has <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory"/>
 configured with no non-default settings.
>
> Is this supposed to happen? How can i prevent these kind of numbers?
>
> Thanks,
> Markus


>>>
>


Re: Solr cloud planning

2017-10-04 Thread Erick Erickson
You'll almost certainly have to shard then. First of all, Lucene has a
hard limit of 2^31 docs in a single index, so there's a ~2B limit per shard.
There's no such limit on the number of docs in the collection (i.e. 5
shards each can have 2B docs for 10B docs total in the collection).

But nobody that I know of has that many documents on a shard, although
I've seen 200M-300M docs on a shard give good response time. I've also
seen 20M docs strain a beefy server.

Here's an outline of what it takes to find out:
https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

The idea is to set up a test environment that you strain to the breaking
point with _your_ data/queries/environment. You can do this with just two
machines; from there it's just multiplying...

Best,
Erick

On Wed, Oct 4, 2017 at 6:07 AM, gatanathoa  wrote:
> There is a very large amount of data and there will be a constant addition of
> more data. There will be hundreds of millions if not billions of items.
>
> We have to be able to be constantly indexing items but also allow
> for searching. Sadly there is no way to know the amount of searching that
> will be done, but I was told to expect a fair amount. (I have no idea what
> "a fair amount" means either)
>
> I am not sure that only one shard will be adequate in this setup. The speed
> of the search results is the key here. There is also no way to test this
> prior to implementation.
>
> Is this enough information to be able to provide some guidelines?
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: length of indexed value

2017-10-04 Thread Erick Erickson
Check. The problem is they don't encode the exact length. I _think_
this patch shows you'd be OK with shorter lengths, but check:
https://issues.apache.org/jira/browse/LUCENE-7730.

Note it's not the patch that counts here, just look at the table of lengths.

Best,
Erick

On Wed, Oct 4, 2017 at 4:25 AM, John Blythe  wrote:
> interesting idea.
>
> the field in question is one that can have a good deal of stray zeros based
> on distributor skus for a product and bad entries from those entering them.
> part of the matching logic for some operations looks for these discrepancies
> by having a simple regex that removes zeroes. so 400010 can match with
> 40010 (and rightly so). issues come in the form of rare cases where 41 is a
> sku by the same distributor or manufacturer and thus can end up being an
> erroneous match. having a means of looking at the length would help to know
> that going from 6 characters to 2 is too far a leap to be counted as a
> match.
>
> --
> John Blythe
>
> On Wed, Oct 4, 2017 at 6:22 AM, alessandro.benedetti 
> wrote:
>
>> Are the norms a good approximation for you ?
>> If you preserve norms at indexing time ( it is a configuration that you can
>> operate in the schema.xml) you can retrieve them with this specific
>> function
>> query :
>>
>> *norm(field)*
>> Returns the "norm" stored in the index for the specified field. This is the
>> product of the index time boost and the length normalization factor,
>> according to the Similarity for the field.
>> norm(fieldName)
>>
>> This will not be the exact length of the field, but it can be a good
>> approximation though.
>>
>> Cheers
>>
>>
>>
>> -
>> ---
>> Alessandro Benedetti
>> Search Consultant, R&D Software Engineer, Director
>> Sease Ltd. - www.sease.io
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>>


Re: solrj howto update documents with expungeDeletes

2017-10-04 Thread Emir Arnautović
Hi Bernd,
I guess it is not exposed in Solrj. Maybe for good reason - it is rarely good
to call it. You might be better off setting reclaimDeletesWeight in your merge
config and keeping the number of deleted docs under control that way.

Regards,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 4 Oct 2017, at 14:53, Bernd Fehling  wrote:
> 
> Hi Emir,
> 
> can you point out which commit you are using for expungeDeletes true/false?
> My commit has only
> commit(String collection, boolean waitFlush, boolean waitSearcher, boolean 
> softCommit)
> 
> Or is expungeDeletes true/false a special combination of the boolean 
> parameters?
> 
> Regards, Bernd
> 
> 
> Am 04.10.2017 um 13:27 schrieb Emir Arnautović:
>> Hi Bernd,
>> When it comes to updating, it does not exist because indexed documents are 
>> not updatable - you can add new document with the same id and old one will 
>> be flagged as deleted. No need to delete explicitly.
>> 
>> When it comes to expungeDeletes - that is a flag that can be set when 
>> committing.
>> 
>> HTH,
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>> 
>> 
>> 
>>> On 4 Oct 2017, at 10:38, Bernd Fehling  
>>> wrote:
>>> 
>>> A simple question about solrj (Solr 6.4.2),
>>> 
>>> how to update documents with expungeDeletes true/false?
>>> 
>>> In org.apache.solr.client.solrj.SolrClient there are many add,
>>> commit, delete, optimize, ... but no "update".
>>> 
>>> What is the best way to "update"?
>>> - just "add" the same docid with new content as update?
>>> - first "deleteById" and then "add"?
>>> - anything else...?
>>> 
>>> And how accomplish "expungeDeletes" true/false ?
>>> 
>>> Thanks, Bernd



Re: Solr cloud planning

2017-10-04 Thread gatanathoa
There is a very large amount of data and there will be a constant addition of
more data. There will be hundreds of millions if not billions of items.

We have to be able to be constantly indexing items but also allow
for searching. Sadly there is no way to know the amount of searching that
will be done, but I was told to expect a fair amount. (I have no idea what
"a fair amount" means either)

I am not sure that only one shard will be adequate in this setup. The speed
of the search results is the key here. There is also no way to test this
prior to implementation. 

Is this enough information to be able to provide some guidelines?



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Very high number of deleted docs

2017-10-04 Thread Emir Arnautović
Hi Markus,
It is passed but not explicitly - it uses reflection to pass arguments - take a
look at the parent factory class.
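
In other words, property names that match TieredMergePolicy's setters can be
declared on the factory and are applied via reflection. A rough solrconfig.xml
sketch (goes inside <indexConfig>; the 4.0 is only an illustrative value, the
default is 2.0):

<indexConfig>
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <double name="reclaimDeletesWeight">4.0</double>
  </mergePolicyFactory>
</indexConfig>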

When it comes to force merging - you have an extreme case - 80% is deleted (my
guess: frequent updates) - and extreme cases require some extreme measures - it
can be either periodic force merge or full reindexing + aliases.

HTH,
Emir

--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 4 Oct 2017, at 14:47, Markus Jelsma  wrote:
> 
> Do you mean a periodic forceMerge? That is usually considered a bad habit on 
> this list (i agree). It is just that i am actually very surprised this can 
> happen at all with default settings. This factory, unfortunately does not 
> seem to support settings configured in solrconfig.
> 
> Thanks,
> Markus
> 
> -Original message-
>> From:Amrit Sarkar 
>> Sent: Wednesday 4th October 2017 14:42
>> To: solr-user@lucene.apache.org
>> Subject: Re: Very high number of deleted docs
>> 
>> Hi Markus,
>> 
>> Emir already mentioned tuning *reclaimDeletesWeight which *affects segments
>> about to merge priority. Optimising index time by time, preferably
>> scheduling weekly / fortnight / ..., at low traffic period to never be in
>> such odd position of 80% deleted docs in total index.
>> 
>> Amrit Sarkar
>> Search Engineer
>> Lucidworks, Inc.
>> 415-589-9269
>> www.lucidworks.com
>> Twitter http://twitter.com/lucidworks
>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> 
>> On Wed, Oct 4, 2017 at 6:02 PM, Emir Arnautović <
>> emir.arnauto...@sematext.com> wrote:
>> 
>>> Hi Markus,
>>> You can set reclaimDeletesWeight in merge settings to some higher value
>>> than default (I think it is 2) to favor segments with deleted docs when
>>> merging.
>>> 
>>> HTH,
>>> Emir
>>> --
>>> Monitoring - Log Management - Alerting - Anomaly Detection
>>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>> 
>>> 
>>> 
 On 4 Oct 2017, at 13:31, Markus Jelsma 
>>> wrote:
 
 Hello,
 
 Using a 6.6.0, i just spotted one of our collections having a core of
>>> which over 80 % of the total number of documents were deleted documents.
 
 It has <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory"/>
 configured with no non-default settings.
 
 Is this supposed to happen? How can i prevent these kind of numbers?
 
 Thanks,
 Markus
>>> 
>>> 
>> 



Re: solrj howto update documents with expungeDeletes

2017-10-04 Thread Erick Erickson
Do not use expungeDeletes even if you find a way to call it in the
scenario you're talking about. First of all I think you'll run into
the issue here: https://issues.apache.org/jira/browse/LUCENE-7976

Second, it is a very heavyweight operation. It potentially rewrites
_all_ of your index, and it sounds like all you're really concerned
about is deleting the old document when re-adding it. As Emir says,
the old document is marked as deleted for you. When the segment
containing the deleted document is eventually merged with another
segment as part of normal indexing, the resources associated with it
will be purged. This is automatic too.

Best,
Erick

On Wed, Oct 4, 2017 at 5:53 AM, Bernd Fehling
 wrote:
> Hi Emir,
>
> can you point out which commit you are using for expungeDeletes true/false?
> My commit has only
> commit(String collection, boolean waitFlush, boolean waitSearcher, boolean 
> softCommit)
>
> Or is expungeDeletes true/false a special combination of the boolean 
> parameters?
>
> Regards, Bernd
>
>
> Am 04.10.2017 um 13:27 schrieb Emir Arnautović:
>> Hi Bernd,
>> When it comes to updating, it does not exist because indexed documents are 
>> not updatable - you can add new document with the same id and old one will 
>> be flagged as deleted. No need to delete explicitly.
>>
>> When it comes to expungeDeletes - that is a flag that can be set when 
>> committing.
>>
>> HTH,
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>
>>
>>
>>> On 4 Oct 2017, at 10:38, Bernd Fehling  
>>> wrote:
>>>
>>> A simple question about solrj (Solr 6.4.2),
>>>
>>> how to update documents with expungeDeletes true/false?
>>>
>>> In org.apache.solr.client.solrj.SolrClient there are many add,
>>> commit, delete, optimize, ... but no "update".
>>>
>>> What is the best way to "update"?
>>> - just "add" the same docid with new content as update?
>>> - first "deleteById" and then "add"?
>>> - anything else...?
>>>
>>> And how accomplish "expungeDeletes" true/false ?
>>>
>>> Thanks, Bernd


Re: Very high number of deleted docs

2017-10-04 Thread Erick Erickson
Did you _ever_ do a forceMerge/optimize or expungeDeletes?

Here's the problem: TieredMergePolicy (TMP) has a maximum segment size
it will allow, 5G by default. No segment is even considered for
merging unless it has < 2.5G (or half of whatever the max is set to)
of non-deleted docs, the logic being that to merge similar-size segments,
each has to be less than half the max size.

However, optimize/forceMerge and expungeDeletes do not have a limit on
the segment size. So say you optimize at some point and end up with a 100G
segment. It won't get merged again until it has 97.5G worth of deleted
docs.

More here:
https://issues.apache.org/jira/browse/LUCENE-7976
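
For reference, that ceiling is TieredMergePolicy's maxMergedSegmentMB, and it
can be declared on the factory like any other TMP property (a sketch; 5120 MB
is the default, and raising it does nothing for oversized segments that
already exist):

<indexConfig>
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <double name="maxMergedSegmentMB">5120.0</double>
  </mergePolicyFactory>
</indexConfig>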

Erick

On Wed, Oct 4, 2017 at 5:47 AM, Markus Jelsma
 wrote:
> Do you mean a periodic forceMerge? That is usually considered a bad habit on 
> this list (i agree). It is just that i am actually very surprised this can 
> happen at all with default settings. This factory, unfortunately does not 
> seem to support settings configured in solrconfig.
>
> Thanks,
> Markus
>
> -Original message-
>> From:Amrit Sarkar 
>> Sent: Wednesday 4th October 2017 14:42
>> To: solr-user@lucene.apache.org
>> Subject: Re: Very high number of deleted docs
>>
>> Hi Markus,
>>
>> Emir already mentioned tuning *reclaimDeletesWeight which *affects segments
>> about to merge priority. Optimising index time by time, preferably
>> scheduling weekly / fortnight / ..., at low traffic period to never be in
>> such odd position of 80% deleted docs in total index.
>>
>> Amrit Sarkar
>> Search Engineer
>> Lucidworks, Inc.
>> 415-589-9269
>> www.lucidworks.com
>> Twitter http://twitter.com/lucidworks
>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>>
>> On Wed, Oct 4, 2017 at 6:02 PM, Emir Arnautović <
>> emir.arnauto...@sematext.com> wrote:
>>
>> > Hi Markus,
>> > You can set reclaimDeletesWeight in merge settings to some higher value
>> > than default (I think it is 2) to favor segments with deleted docs when
>> > merging.
>> >
>> > HTH,
>> > Emir
>> > --
>> > Monitoring - Log Management - Alerting - Anomaly Detection
>> > Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>> >
>> >
>> >
>> > > On 4 Oct 2017, at 13:31, Markus Jelsma 
>> > wrote:
>> > >
>> > > Hello,
>> > >
>> > > Using a 6.6.0, i just spotted one of our collections having a core of
>> > which over 80 % of the total number of documents were deleted documents.
>> > >
>> > > It has <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory"/>
>> > configured with no non-default settings.
>> > >
>> > > Is this supposed to happen? How can i prevent these kind of numbers?
>> > >
>> > > Thanks,
>> > > Markus
>> >
>> >
>>


Re: solrj howto update documents with expungeDeletes

2017-10-04 Thread Bernd Fehling
Hi Emir,

can you point out which commit you are using for expungeDeletes true/false?
My commit has only
commit(String collection, boolean waitFlush, boolean waitSearcher, boolean 
softCommit)

Or is expungeDeletes true/false a special combination of the boolean parameters?

Regards, Bernd


Am 04.10.2017 um 13:27 schrieb Emir Arnautović:
> Hi Bernd,
> When it comes to updating, it does not exist because indexed documents are 
> not updatable - you can add new document with the same id and old one will be 
> flagged as deleted. No need to delete explicitly.
> 
> When it comes to expungeDeletes - that is a flag that can be set when 
> committing.
> 
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> 
> 
> 
>> On 4 Oct 2017, at 10:38, Bernd Fehling  
>> wrote:
>>
>> A simple question about solrj (Solr 6.4.2),
>>
>> how to update documents with expungeDeletes true/false?
>>
>> In org.apache.solr.client.solrj.SolrClient there are many add,
>> commit, delete, optimize, ... but no "update".
>>
>> What is the best way to "update"?
>> - just "add" the same docid with new content as update?
>> - first "deleteById" and then "add"?
>> - anything else...?
>>
>> And how accomplish "expungeDeletes" true/false ?
>>
>> Thanks, Bernd


RE: Very high number of deleted docs

2017-10-04 Thread Markus Jelsma
Do you mean a periodic forceMerge? That is usually considered a bad habit on 
this list (i agree). It is just that i am actually very surprised this can 
happen at all with default settings. This factory, unfortunately, does not seem
to support settings configured in solrconfig.

Thanks,
Markus
 
-Original message-
> From:Amrit Sarkar 
> Sent: Wednesday 4th October 2017 14:42
> To: solr-user@lucene.apache.org
> Subject: Re: Very high number of deleted docs
> 
> Hi Markus,
> 
> Emir already mentioned tuning *reclaimDeletesWeight which *affects segments
> about to merge priority. Optimising index time by time, preferably
> scheduling weekly / fortnight / ..., at low traffic period to never be in
> such odd position of 80% deleted docs in total index.
> 
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> 
> On Wed, Oct 4, 2017 at 6:02 PM, Emir Arnautović <
> emir.arnauto...@sematext.com> wrote:
> 
> > Hi Markus,
> > You can set reclaimDeletesWeight in merge settings to some higher value
> > than default (I think it is 2) to favor segments with deleted docs when
> > merging.
> >
> > HTH,
> > Emir
> > --
> > Monitoring - Log Management - Alerting - Anomaly Detection
> > Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >
> >
> >
> > > On 4 Oct 2017, at 13:31, Markus Jelsma 
> > wrote:
> > >
> > > Hello,
> > >
> > > Using a 6.6.0, i just spotted one of our collections having a core of
> > which over 80 % of the total number of documents were deleted documents.
> > >
> > > It has <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory"/>
> > configured with no non-default settings.
> > >
> > > Is this supposed to happen? How can i prevent these kind of numbers?
> > >
> > > Thanks,
> > > Markus
> >
> >
> 


RE: Very high number of deleted docs

2017-10-04 Thread Markus Jelsma
I really doubt that is going to do anything; TieredMergePolicyFactory does not
pass the settings from Solr to TieredMergePolicy.

Thanks,
Markus

 
 
-Original message-
> From:Emir Arnautović 
> Sent: Wednesday 4th October 2017 14:33
> To: solr-user@lucene.apache.org
> Subject: Re: Very high number of deleted docs
> 
> Hi Markus,
> You can set reclaimDeletesWeight in merge settings to some higher value than 
> default (I think it is 2) to favor segments with deleted docs when merging.
> 
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> 
> 
> 
> > On 4 Oct 2017, at 13:31, Markus Jelsma  wrote:
> > 
> > Hello,
> > 
> > Using a 6.6.0, i just spotted one of our collections having a core of which 
> > over 80 % of the total number of documents were deleted documents.
> > 
> > It has <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory"/> configured with no
> > non-default settings.
> > 
> > Is this supposed to happen? How can i prevent these kind of numbers?
> > 
> > Thanks,
> > Markus
> 
> 


Re: Very high number of deleted docs

2017-10-04 Thread Amrit Sarkar
Hi Markus,

Emir already mentioned tuning *reclaimDeletesWeight*, which affects segments'
merge priority. Optimising the index from time to time, preferably
scheduled weekly / fortnightly / ... at a low-traffic period, will keep you
from ever being in such an odd position of 80% deleted docs in the total index.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Wed, Oct 4, 2017 at 6:02 PM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi Markus,
> You can set reclaimDeletesWeight in merge settings to some higher value
> than default (I think it is 2) to favor segments with deleted docs when
> merging.
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 4 Oct 2017, at 13:31, Markus Jelsma 
> wrote:
> >
> > Hello,
> >
> > Using a 6.6.0, i just spotted one of our collections having a core of
> which over 80 % of the total number of documents were deleted documents.
> >
> > It has <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory"/>
> configured with no non-default settings.
> >
> > Is this supposed to happen? How can i prevent these kind of numbers?
> >
> > Thanks,
> > Markus
>
>


Re: Very high number of deleted docs

2017-10-04 Thread Emir Arnautović
Hi Markus,
You can set reclaimDeletesWeight in merge settings to some higher value than 
default (I think it is 2) to favor segments with deleted docs when merging.

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 4 Oct 2017, at 13:31, Markus Jelsma  wrote:
> 
> Hello,
> 
> Using a 6.6.0, i just spotted one of our collections having a core of which 
> over 80 % of the total number of documents were deleted documents.
> 
> It has <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory"/> configured with no
> non-default settings.
> 
> Is this supposed to happen? How can i prevent these kind of numbers?
> 
> Thanks,
> Markus



Very high number of deleted docs

2017-10-04 Thread Markus Jelsma
Hello,

Using a 6.6.0, i just spotted one of our collections having a core of which 
over 80 % of the total number of documents were deleted documents.

It has <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory"/> configured with no
non-default settings.

Is this supposed to happen? How can i prevent these kind of numbers?

Thanks,
Markus


Re: solrj howto update documents with expungeDeletes

2017-10-04 Thread Emir Arnautović
Hi Bernd,
When it comes to updating, it does not exist because indexed documents are not
updatable - you can add a new document with the same id and the old one will be
flagged as deleted. No need to delete explicitly.

When it comes to expungeDeletes - that is a flag that can be set when 
committing.
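
E.g. in SolrJ it is not an argument of SolrClient.commit(), but it can be sent
as a plain request parameter on an UpdateRequest. A minimal sketch (URL,
collection and field values are placeholders); given the warnings elsewhere in
this thread, use it sparingly:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

public class UpdateWithExpungeDeletes {
  public static void main(String[] args) throws Exception {
    SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr").build();

    // "update" = re-add a document with the same uniqueKey; the old version is flagged deleted
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-42");
    doc.addField("title", "new content");

    UpdateRequest req = new UpdateRequest();
    req.add(doc);
    req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);  // waitFlush, waitSearcher
    req.setParam("expungeDeletes", "true");                          // the commit flag
    req.process(client, "my_collection");
    client.close();
  }
}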

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 4 Oct 2017, at 10:38, Bernd Fehling  wrote:
> 
> A simple question about solrj (Solr 6.4.2),
> 
> how to update documents with expungeDeletes true/false?
> 
> In org.apache.solr.client.solrj.SolrClient there are many add,
> commit, delete, optimize, ... but no "update".
> 
> What is the best way to "update"?
> - just "add" the same docid with new content as update?
> - first "deleteById" and then "add"?
> - anything else...?
> 
> And how accomplish "expungeDeletes" true/false ?
> 
> Thanks, Bernd



Re: length of indexed value

2017-10-04 Thread John Blythe
interesting idea.

the field in question is one that can have a good deal of stray zeros based
on distributor skus for a product and bad entries from those entering them.
part of the matching logic for some operations looks for these discrepancies
by having a simple regex that removes zeroes. so 400010 can match with
40010 (and rightly so). issues come in the form of rare cases where 41 is a
sku by the same distributor or manufacturer and thus can end up being an
erroneous match. having a means of looking at the length would help to know
that going from 6 characters to 2 is too far a leap to be counted as a
match.
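
for illustration, that guard might look roughly like the sketch below (the
names and the threshold are invented, and this sits in application code, not
in Solr):

public class SkuMatchGuard {
  // accept a zero-stripped match only if the two skus are of comparable length
  static boolean plausibleMatch(String skuA, String skuB) {
    if (!skuA.replace("0", "").equals(skuB.replace("0", ""))) {
      return false;                                // not a zero-stripped match at all
    }
    int maxLengthGap = 2;                          // "6 characters to 2 is too far a leap"
    return Math.abs(skuA.length() - skuB.length()) <= maxLengthGap;
  }

  public static void main(String[] args) {
    System.out.println(plausibleMatch("400010", "40010")); // true  -- lengths 6 vs 5
    System.out.println(plausibleMatch("400010", "41"));    // false -- lengths 6 vs 2
  }
}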

--
John Blythe

On Wed, Oct 4, 2017 at 6:22 AM, alessandro.benedetti 
wrote:

> Are the norms a good approximation for you ?
> If you preserve norms at indexing time ( it is a configuration that you can
> operate in the schema.xml) you can retrieve them with this specific
> function
> query :
>
> *norm(field)*
> Returns the "norm" stored in the index for the specified field. This is the
> product of the index time boost and the length normalization factor,
> according to the Similarity for the field.
> norm(fieldName)
>
> This will not be the exact length of the field, but it can be a good
> approximation though.
>
> Cheers
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: length of indexed value

2017-10-04 Thread alessandro.benedetti
Are the norms a good approximation for you?
If you preserve norms at indexing time (it is a setting you can control in the
schema.xml) you can retrieve them with this specific function query:

*norm(field)*
Returns the "norm" stored in the index for the specified field. This is the
product of the index time boost and the length normalization factor,
according to the Similarity for the field.
norm(fieldName)

This will not be the exact length of the field, but it can be a good
approximation though.
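
For example, assuming the field keeps norms (omitNorms="false"), you can pull
the value back as a pseudo-field. A rough SolrJ sketch with placeholder URL,
collection and field names:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class NormAsPseudoField {
  public static void main(String[] args) throws Exception {
    SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr").build();
    SolrQuery q = new SolrQuery("*:*");
    // alias the function query in fl; "sku" is a placeholder field with norms enabled
    q.setFields("id", "sku", "len_approx:norm(sku)");
    client.query("my_collection", q).getResults()
          .forEach(d -> System.out.println(d.get("id") + " -> " + d.get("len_approx")));
    client.close();
  }
}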

Cheers



-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Time to Load a Solr Core with Hdfs Directory Factory

2017-10-04 Thread Rick Leir

Shashank,

I had a quick look at:

https://lucene.apache.org/solr/guide/6_6/running-solr-on-hdfs.html

Did you enable the Block Cache and the solr.hdfs.nrtcachingdirectory?
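
For reference, a rough sketch of the solrconfig.xml block those settings live
in (the namenode address and cache sizing are placeholders, not
recommendations):

<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
  <bool name="solr.hdfs.blockcache.enabled">true</bool>
  <int name="solr.hdfs.blockcache.slab.count">1</int>
  <bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
</directoryFactory>
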
cheers -- Rick

On 2017-10-03 09:22 PM, Shashank Pedamallu wrote:

Hi,

I’m trying an experiment in which I’m loading a core of 1.27GB with 5621600
documents on 2 Solr setups. On the first setup, dataDir points to a standard
local path using NRTCachingDirectory. On the second setup, it points to an
HdfsDirectory. As part of loading the core, I see the following log:

2017-10-04 01:07:50.102 UTC INFO  
(searcherExecutor-12-thread-1-processing-x:staging_1gb-core-1) 
[core='x:staging_1gb-core-1'] org.apache.solr.core.SolrCore@2247 
[staging_1gb-core-1] Registered new searcher 
Searcher@10fe9415[staging_1gb-core-1] 
main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_12bk(6.4.2):C2886542)
 Uninverting(_1eu5(6.4.2):C743800) Uninverting(_18kl(6.4.2):c331485) 
Uninverting(_1lt0(6.4.2):c284012) Uninverting(_1xx5(6.4.2):C654477) 
Uninverting(_1qsg(6.4.2):C658237) Uninverting(_1qf4(6.4.2):c24903) 
Uninverting(_1xwv(6.4.2):c16734) Uninverting(_1xwb(6.4.2):c1) 
Uninverting(_1xww(6.4.2):C174) Uninverting(_1xy9(6.4.2):c878) 
Uninverting(_1xxf(6.4.2):c354) Uninverting(_1xxp(6.4.2):c508) 
Uninverting(_1xx6(6.4.2):C150) Uninverting(_1xxz(6.4.2):c545) 
Uninverting(_1xxg(6.4.2):C190) Uninverting(_1xyj(6.4.2):c690) 
Uninverting(_1xyd(6.4.2):C144)))}

This step takes about 132 milliseconds in setup 1 (i.e., with
NRTCachingDirectoryFactory).
The same step takes about 21 minutes on the second setup (i.e., with
HdfsDirectoryFactory).

Does the load time of a Solr core drop so badly on an HDFS file system? Is this
expected?

Thanks,
Shashank





solrj howto update documents with expungeDeletes

2017-10-04 Thread Bernd Fehling
A simple question about solrj (Solr 6.4.2),

how to update documents with expungeDeletes true/false?

In org.apache.solr.client.solrj.SolrClient there are many add,
commit, delete, optimize, ... but no "update".

What is the best way to "update"?
- just "add" the same docid with new content as update?
- first "deleteById" and then "add"?
- anything else...?

And how do I accomplish "expungeDeletes" true/false?

Thanks, Bernd