RE: Search on field returns documents that should not match

2017-09-14 Thread Jagdish Vasani
Hi Chang Wang,

The search is wrong here: q=title_lemma: fit you handgun
It will search title_lemma:fit OR _text:you OR _text:handgun (if stop words
are not excluded).
Here _text is the default search field; you might have a different default
search field.

Try searching like q=title_lemma:(fit you handgun)
That will search for any of the words in the field title_lemma.

Run the query below and you should get the proper result:
http://localhost:8983/solr/bkb4/select?fq=doc_type:document=on=title_lemma:(fit you handgun)=json

Thanks,
Jagdish
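A minimal Java sketch of the fix above: group the terms so that all of them are scoped to title_lemma, and URL-encode the query instead of pasting raw spaces into the URL. The archive stripped the "&name=" parts of the URLs in this thread, so the parameter names q, fq and wt used here are assumptions based on the standard Solr select parameters, and the class and method names are hypothetical. (Quoting the terms instead, as in title_lemma:"fit you handgun", would turn it into a phrase query.)

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class FieldQueryUrl {

    // URL-encode a value as application/x-www-form-urlencoded (space becomes '+').
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Scope every term to one field by grouping: field:(a b c).
    // Without the parentheses, the standard parser applies the field only to
    // the first term; the others go to the default search field.
    public static String groupedFieldQuery(String field, String terms) {
        return field + ":(" + terms.trim() + ")";
    }

    // Build base?k1=v1&k2=v2 with encoded keys and values, in insertion order.
    public static String selectUrl(String base, Map<String, String> params) {
        StringJoiner qs = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            qs.add(enc(e.getKey()) + "=" + enc(e.getValue()));
        }
        return base + "?" + qs;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("fq", "doc_type:document");
        params.put("q", groupedFieldQuery("title_lemma", "fit you handgun"));
        params.put("wt", "json");
        System.out.println(selectUrl("http://localhost:8983/solr/bkb4/select", params));
        // http://localhost:8983/solr/bkb4/select?fq=doc_type%3Adocument&q=title_lemma%3A%28fit+you+handgun%29&wt=json
    }
}
```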
From: Chang Wang [mailto:changwan...@gmail.com]
Sent: Friday, September 15, 2017 8:12 AM
To: solr-user@lucene.apache.org
Subject: Search on field returns documents that should not match

Hello All,


I am confused by the field search behavior of Solr (6.6), and hope someone
can help me understand the results.

For example,
I searched for "fit you handgun" in the field "title_lemma".
http://localhost:8983/solr/bkb4/select?fq=doc_type:document=on=title_lemma:%20fit%20you%20handgun=json

The first returned result is a good one, which contains "fit you handgun" in the
"title_lemma" field, but the 2nd and 3rd results do not contain any of those query
words at all. Why are these documents still returned?

I attach the screen shot.


There is a related question. When I do not index the field "title_lemma", the Solr
GUI still allows me to search on that field and returns results. Why does this
happen? My understanding is that if a field is not indexed, it should not be
searchable at all.

Thank you,
Chang




Search on field returns documents that should not match

2017-09-14 Thread Chang Wang
Hello All,


I am confused by the field search behavior of Solr (6.6), and hope someone
can help me understand the results.

For example,
I searched for "fit you handgun" in the field "title_lemma".
http://localhost:8983/solr/bkb4/select?fq=doc_type:document=on=title_lemma:%20fit%20you%20handgun=json

The first returned result is a good one, which contains "fit you handgun" in the
"title_lemma" field, but the 2nd and 3rd results do not contain any of those
query words at all. Why are these documents still returned?

I attach the screen shot.


There is a related question. When I do not index the field "title_lemma",
the Solr GUI still allows me to search on that field and returns results.
Why does this happen?
My understanding is that if a field is not indexed, it should not be
searchable at all.

Thank you,
Chang


Re: How to remove control characters in stored value at Solr side

2017-09-14 Thread simon
Looks as though the problem is in parsing some malformed XML, based on
what I'm seeing:

...
Caused by: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character
((CTRL-CHAR, code 11))
... (char #11 is a vertical tab).

This should be fixed outside Solr, but if that is not practical, and you
could live with dropping the offending document(s), then you might want to
investigate the TolerantUpdateProcessorFactory (Solr 6.1 or later).

-Simon

On Thu, Sep 14, 2017 at 3:56 PM, arnoldbronley 
wrote:

> Thanks for the information. Here is the full stack trace. I thought about
> handling it on the client side, but the client apps are not under my control
> and I don't have access to them.
>
> org.apache.solr.common.SolrException: Illegal character ((CTRL-CHAR, code
> 11))
>  at [row,col {unknown-source}]: [1,413]
> ...
> Caused by: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character
> ((CTRL-CHAR, code 11))
>  at [row,col {unknown-source}]: [1,413]
> ...


Re: query with @ and *

2017-09-14 Thread Erick Erickson
See: 
https://lucidworks.com/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/

It discusses the general problem of particular filters being able to
cope with wildcards or not. Generally any filter that could
potentially produce more than one output token per input token is
skipped when wildcards are encountered.

Best,
Erick

On Thu, Sep 14, 2017 at 6:26 AM, Susheel Kumar  wrote:
> You may want to use the UAX29URLEmailTokenizerFactory tokenizer in your
> analysis chain.
>
> Thanks,
> Susheel
>
>
> On Thu, Sep 14, 2017 at 8:46 AM, Shawn Heisey  wrote:
>
>> On 9/14/2017 5:06 AM, Mannott, Birgit wrote:
>> > I have a problem when searching on email addresses.
>> > @ seems to be handled as a special character but I can't find anything
>> > about it in the documentation.
>> >
>> > This is my test data
>> > t...@one.com
>> > t...@two.com
>>
>> Chances are that you have analysis defined on this field, and that the
>> analysis includes a tokenizer or tokenizer/filter combination that
>> splits on punctuation.  This means that for both entries, you have
>> three terms.  For the first one, those terms are test, one, and com.
>> For the second one, they are test, two, and com.  The rest of what I'm
>> writing assumes that this is the case.
>>
>> > searching for test* returns both, ok.
>>
>> This matches the term "test" in both entries.
>>
>> > searching for t...@one.com returns the correct one, ok.
>>
>> Query analysis probably splits the same way index analysis does, so the
>> actual search is for all three terms.
>>
>> > searching for test returns both, which I didn't expect but it's ok.
>>
>> In this case, it matches the simple term "test" that's in the index on
>> both documents.
>>
>> > searching for test@one* returns none and that's the problem.
>>
>> When you include wildcards in a query, most query analysis is skipped,
>> so it's looking for the literal text "test@one" followed by any
>> characters.  Because the index analysis removed the @ character and
>> split the things around it into separate terms, this will not match any
>> of the terms in the index.
>>
>> Wildcards, while they do work in many cases, are often not the correct
>> way to do queries.
>>
>> Thanks,
>> Shawn
>>
>>


Re: Two separate instances sharing the same zookeeper cluster

2017-09-14 Thread Mike Drob
When you specify the zk string for a Solr instance, you typically include a
chroot in it. I think the default is /solr, but it doesn't have to be, so
you should be able to run with -z zk1:2181/solr-dev and -z zk1:2181/solr-prod.

https://lucene.apache.org/solr/guide/6_6/setting-up-an-external-zookeeper-ensemble.html#SettingUpanExternalZooKeeperEnsemble-PointSolrattheinstance
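A small Java sketch of the chroot idea Mike describes. One detail worth knowing: in a ZooKeeper connect string the chroot is appended once, after the complete host list, not after each host. The extra hosts zk2 and zk3 and the helper name are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class ZkHostString {

    // Join the ensemble hosts and append an optional chroot once, at the end:
    // "zk1:2181,zk2:2181,zk3:2181/solr-dev".
    public static String zkHost(List<String> hosts, String chroot) {
        String joined = String.join(",", hosts);
        if (chroot == null || chroot.isEmpty()) {
            return joined; // no chroot: everything goes under the ZooKeeper root
        }
        return joined + (chroot.startsWith("/") ? chroot : "/" + chroot);
    }

    public static void main(String[] args) {
        List<String> ensemble = Arrays.asList("zk1:2181", "zk2:2181", "zk3:2181");
        // Staging and production share the ensemble but live under different znodes.
        System.out.println("staging:    -z " + zkHost(ensemble, "/solr-dev"));
        System.out.println("production: -z " + zkHost(ensemble, "/solr-prod"));
    }
}
```

Each cluster would then be started with its own string, e.g. bin/solr start -c -z "zk1:2181,zk2:2181,zk3:2181/solr-dev". Depending on the version, the chroot znode may need to exist before Solr starts.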

On Thu, Sep 14, 2017 at 3:01 PM, James Keeney  wrote:

> I have a staging and a production Solr cluster. I'd like to have them use
> the same ZooKeeper cluster. It seems like it is possible if I can set a
> different directory for the second cluster. I've looked through the
> documentation, though, and I can't quite figure out where to set that up. As
> a result, my staging cluster nodes keep trying to add themselves to the
> production cluster.
>
> Could someone point me in the right direction?
>
> Jim K.
> --
> Jim Keeney
> President, FitterWeb
> E: j...@fitterweb.com
> M: 703-568-5887
>
> *FitterWeb Consulting*
> *Are you lean and agile enough? *
>


Two separate instances sharing the same zookeeper cluster

2017-09-14 Thread James Keeney
I have a staging and a production Solr cluster. I'd like to have them use
the same ZooKeeper cluster. It seems like it is possible if I can set a
different directory for the second cluster. I've looked through the
documentation, though, and I can't quite figure out where to set that up. As
a result, my staging cluster nodes keep trying to add themselves to the
production cluster.

Could someone point me in the right direction?

Jim K.
-- 
Jim Keeney
President, FitterWeb
E: j...@fitterweb.com
M: 703-568-5887

*FitterWeb Consulting*
*Are you lean and agile enough? *


Re: How to remove control characters in stored value at Solr side

2017-09-14 Thread arnoldbronley
Thanks for the information. Here is the full stack trace. I thought about handling it
on the client side, but the client apps are not under my control and I don't have
access to them.

org.apache.solr.common.SolrException: Illegal character ((CTRL-CHAR, code 11))
 at [row,col {unknown-source}]: [1,413]
    at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:179)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:153)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2213)
    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:460)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:303)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
    at org.eclipse.jetty.server.Server.handle(Server.java:518)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
    at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
    at java.lang.Thread.run(Thread.java:748)
Caused by: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character ((CTRL-CHAR, code 11))
 at [row,col {unknown-source}]: [1,413]
    at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:674)
    at com.ctc.wstx.sr.BasicStreamReader.readTextPrimary(BasicStreamReader.java:4576)
    at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2881)
    at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1073)
    at org.apache.solr.handler.loader.XMLLoader.readDoc(XMLLoader.java:397)
    at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:249)
    at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:177)
    ... 32 more



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: How to remove control characters in stored value at Solr side

2017-09-14 Thread simon
@Arnold: are these non-UTF-8 control characters (which is what the Nutch
issue was about) or otherwise legal UTF-8 characters which Solr for some
reason is choking on?

If you could provide a full stack trace, it would be really helpful.


On Thu, Sep 14, 2017 at 2:55 PM, Markus Jelsma 
wrote:

> Hello,
>
> You cannot do this in Solr; you cannot even send non-character code
> points in the first place. For Apache Nutch we solved the problem by
> stripping those non-character code points from Strings before putting them
> in SolrDocument. Check the ticket; you can easily reuse the strip method.
>
> Perhaps it would be a good idea to move the method to SolrDocument or
> somewhere in SolrJ in the first place, so others don't have to bother with
> this problem.
>
> Regards,
> Markus
>
> https://issues.apache.org/jira/browse/NUTCH-1016
>
>
>
> -Original message-
> > From:Arnold Bronley 
> > Sent: Thursday 14th September 2017 19:46
> > To: solr-user@lucene.apache.org
> > Subject: How to remove control characters in stored value at Solr side
> >
> > I know I can apply PatternReplaceFilterFactory to remove control
> > characters from the indexed value. However, is it possible to do a similar
> > thing for the stored value? Because some control characters are included
> > in the indexing request, Solr throws an Illegal Character exception.
> >
>


Re: 2 Solr Instance with One Data Directory

2017-09-14 Thread Shawn Heisey
On 9/14/2017 10:18 AM, Ravi Kumar Taminidi wrote:
> Hi, has anyone tried having 2 Solr instances with one data directory?
>
> I get the error below when I try to point the 2nd Solr instance to the first
> Solr instance's directory.
>
> Any help?
>
> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: 
> Index dir '/engine/data/index/' of core 'engine' is already locked. The most 
> likely cause is another Solr server (or another solr core in this server) 
> also configured to use this directory; other possible causes may be specific 
> to lockType: native

Don't do this.  Lucene (the technology that powers Solr) is specifically
designed to NOT allow this to happen.  It is blocked intentionally.

Solr offers at least two ways to replicate data between servers, where
each one has its own separate index.  The most feature-rich option for
data replication is a fundamental feature of SolrCloud.

Thanks,
Shawn
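To illustrate what Shawn means by "blocked intentionally": with lockType native, Lucene holds an OS-level lock on a write.lock file inside the index directory, so a second writer pointed at the same directory cannot acquire it. The Java sketch below imitates that with plain java.nio file locks; it is a simplified stand-in, not Lucene's lock factory itself, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class IndexDirLockDemo {

    // Try to take an exclusive OS-level lock on <dir>/write.lock.
    // Returns the lock, or null if it is already held elsewhere.
    public static FileLock tryLockIndexDir(Path dir) {
        try {
            FileChannel channel = FileChannel.open(dir.resolve("write.lock"),
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE);
            FileLock lock = null;
            try {
                lock = channel.tryLock(); // null if another process holds it
            } catch (OverlappingFileLockException e) {
                // already held by another channel in this JVM
            } finally {
                if (lock == null) {
                    channel.close();
                }
            }
            return lock;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Simulate two instances pointed at one index directory.
    // Returns {firstAcquired, secondAcquired} and releases any lock taken.
    public static boolean[] lockTwice(Path dir) {
        FileLock first = tryLockIndexDir(dir);
        FileLock second = tryLockIndexDir(dir);
        boolean[] acquired = {first != null, second != null};
        try {
            if (second != null) second.channel().close(); // closing releases the lock
            if (first != null) first.channel().close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return acquired;
    }

    public static void main(String[] args) throws IOException {
        Path indexDir = Files.createTempDirectory("shared-index");
        boolean[] acquired = lockTwice(indexDir);
        System.out.println("instance 1 got the lock: " + acquired[0]); // true
        System.out.println("instance 2 got the lock: " + acquired[1]); // false
    }
}
```

Run as two separate JVMs, the second tryLock would return null instead of throwing; either way the second "instance" is refused, which corresponds to the "already locked" error in Ravi's message.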



RE: How to remove control characters in stored value at Solr side

2017-09-14 Thread Markus Jelsma
Hello,

You cannot do this in Solr; you cannot even send non-character code points in
the first place. For Apache Nutch we solved the problem by stripping those
non-character code points from Strings before putting them in SolrDocument.
Check the ticket; you can easily reuse the strip method.

Perhaps it would be a good idea to move the method to SolrDocument or somewhere 
in SolrJ in the first place, so others don't have to bother with this problem.

Regards,
Markus

https://issues.apache.org/jira/browse/NUTCH-1016
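The NUTCH-1016 method itself is not reproduced here. As a hedged stand-in, the Java sketch below drops every code point that the XML 1.0 Char production does not allow, which covers the vertical tab (char #11) from the stack trace in this thread. Because Solr rejects the request while parsing the XML, this would have to run on the client before the update request is built. The class and method names are hypothetical.

```java
public class XmlCharStripper {

    // XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    public static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Remove every code point XML 1.0 forbids (control chars such as #11,
    // unpaired surrogates, U+FFFE/U+FFFF) and keep everything else.
    public static String stripInvalidXmlChars(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints()
         .filter(XmlCharStripper::isXml10Char)
         .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        String dirty = "foo" + (char) 11 + "bar"; // 11 = vertical tab, as in the stack trace
        System.out.println(stripInvalidXmlChars(dirty)); // foobar
    }
}
```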

 
 
-Original message-
> From:Arnold Bronley 
> Sent: Thursday 14th September 2017 19:46
> To: solr-user@lucene.apache.org
> Subject: How to remove control characters in stored value at Solr side
> 
> I know I can apply PatternReplaceFilterFactory to remove control characters
> from the indexed value. However, is it possible to do a similar thing for the
> stored value? Because some control characters are included in the indexing
> request, Solr throws an Illegal Character exception.
> 


Re: How to remove control characters in stored value at Solr side

2017-09-14 Thread simon
Sounds as though an update request processor will do that, and also
eliminate the need to use the PatternReplaceFilterFactory downstream.

Take a look at the documentation in
https://lucene.apache.org/solr/guide/6_6/update-request-processors.html.
I'm thinking that the RegexReplaceProcessorFactory might work for this.

best

-Simon

On Thu, Sep 14, 2017 at 1:46 PM, Arnold Bronley 
wrote:

> I know I can apply PatternReplaceFilterFactory to remove control characters
> from the indexed value. However, is it possible to do a similar thing for the
> stored value? Because some control characters are included in the indexing
> request, Solr throws an Illegal Character exception.
>


Highlighting in subqueries?

2017-09-14 Thread Peter Matthew Eichman
Hello all,

Is it possible to highlight the results of subqueries?

Thanks,
-Peter

-- 
Peter Eichman
Senior Software Developer
University of Maryland Libraries
peich...@umd.edu


How to remove control characters in stored value at Solr side

2017-09-14 Thread Arnold Bronley
I know I can apply PatternReplaceFilterFactory to remove control characters
from the indexed value. However, is it possible to do a similar thing for the
stored value? Because some control characters are included in the indexing
request, Solr throws an Illegal Character exception.


2 Solr Instance with One Data Directory

2017-09-14 Thread Ravi Kumar Taminidi
Hi, has anyone tried having 2 Solr instances with one data directory?

I get the error below when I try to point the 2nd Solr instance to the first Solr instance's directory.

Any help?

org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: 
Index dir '/engine/data/index/' of core 'engine' is already locked. The most 
likely cause is another Solr server (or another solr core in this server) also 
configured to use this directory; other possible causes may be specific to 
lockType: native

Thanks

Ravi

Getting exception from /solr/admin/metrics

2017-09-14 Thread Shawn Heisey
I had never seen the /solr/admin/metrics endpoint, so I tried to access
it on my dev server.  It threw an exception.  This is the first line:

2017-09-14 14:27:11.629 ERROR (qtp1394336709-480905) [] o.a.s.h.RequestHandlerBase org.apache.lucene.store.AlreadyClosedException: Already closed

The rest of the log information directly pertaining to this issue is
available here for the next month:

https://apaste.info/aXOm

There isn't anything that *I* would consider to be particularly unusual
about the setup.  The version is 6.6.0.  I do have a third-party plugin
installed, and some update processors that I've written myself.  I have
no idea whether those could have caused this problem, but I think it's
unlikely.

The message does not say what index directory was being accessed when
the problem occurred.

I tried to use the branch_6_6 code so I could add the path to the index
directory to the exception message, but with the generated
6.6.2-SNAPSHOT package, I couldn't get Solr to start properly.  Some of
my cores failed to initialize, and the logs are very unclear on what
went wrong.  Switching the service symlink back to 6.6.0 allowed
everything to start correctly.  I may need to check out the 6.6.0 source
tag and modify that instead of trying to upgrade.

Thanks,
Shawn



Re: query with @ and *

2017-09-14 Thread Susheel Kumar
You may want to use the UAX29URLEmailTokenizerFactory tokenizer in your
analysis chain.

Thanks,
Susheel


On Thu, Sep 14, 2017 at 8:46 AM, Shawn Heisey  wrote:

> On 9/14/2017 5:06 AM, Mannott, Birgit wrote:
> > I have a problem when searching on email addresses.
> > @ seems to be handled as a special character but I can't find anything
> > about it in the documentation.
> >
> > This is my test data
> > t...@one.com
> > t...@two.com
>
> Chances are that you have analysis defined on this field, and that the
> analysis includes a tokenizer or tokenizer/filter combination that
> splits on punctuation.  This means that for both entries, you have
> three terms.  For the first one, those terms are test, one, and com.
> For the second one, they are test, two, and com.  The rest of what I'm
> writing assumes that this is the case.
>
> > searching for test* returns both, ok.
>
> This matches the term "test" in both entries.
>
> > searching for t...@one.com returns the correct one, ok.
>
> Query analysis probably splits the same way index analysis does, so the
> actual search is for all three terms.
>
> > searching for test returns both, which I didn't expect but it's ok.
>
> In this case, it matches the simple term "test" that's in the index on
> both documents.
>
> > searching for test@one* returns none and that's the problem.
>
> When you include wildcards in a query, most query analysis is skipped,
> so it's looking for the literal text "test@one" followed by any
> characters.  Because the index analysis removed the @ character and
> split the things around it into separate terms, this will not match any
> of the terms in the index.
>
> Wildcards, while they do work in many cases, are often not the correct
> way to do queries.
>
> Thanks,
> Shawn
>
>


Re: query with @ and *

2017-09-14 Thread Shawn Heisey
On 9/14/2017 5:06 AM, Mannott, Birgit wrote:
> I have a problem when searching on email addresses.
> @ seems to be handled as a special character but I can't find anything about
> it in the documentation.
>
> This is my test data
> t...@one.com
> t...@two.com

Chances are that you have analysis defined on this field, and that the
analysis includes a tokenizer or tokenizer/filter combination that
splits on punctuation.  This means that for both entries, you have
three terms.  For the first one, those terms are test, one, and com.
For the second one, they are test, two, and com.  The rest of what I'm
writing assumes that this is the case.

> searching for test* returns both, ok.

This matches the term "test" in both entries.

> searching for t...@one.com returns the correct one, ok.

Query analysis probably splits the same way index analysis does, so the 
actual search is for all three terms.

> searching for test returns both, which I didn't expect but it's ok.

In this case, it matches the simple term "test" that's in the index on
both documents.

> searching for test@one* returns none and that's the problem.

When you include wildcards in a query, most query analysis is skipped, 
so it's looking for the literal text "test@one" followed by any
characters.  Because the index analysis removed the @ character and
split the things around it into separate terms, this will not match any
of the terms in the index.

Wildcards, while they do work in many cases, are often not the correct
way to do queries.

Thanks,
Shawn
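A toy Java model of Shawn's explanation; it is not Lucene's analysis code. splitOnPunctuation is a hypothetical stand-in for a tokenizer that splits on punctuation, keepWholeTokens is a rough stand-in for a tokenizer that keeps an e-mail address as one token (which is what UAX29URLEmailTokenizerFactory, suggested elsewhere in this thread, does for addresses), and the address is a made-up value. A prefix query is compared against the indexed terms without being tokenized, so "alice@one*" can only match if the index kept the @.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WildcardAnalysisDemo {

    // Hypothetical punctuation-splitting analysis: lowercase, then split on
    // anything that is not a letter or digit.
    public static List<String> splitOnPunctuation(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Rough stand-in for an e-mail-aware tokenizer: lowercase, split only on
    // whitespace, so an address stays one term.
    public static List<String> keepWholeTokens(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // A prefix query (prefix*) is not tokenized: the prefix is only lowercased
    // (like the multiterm-aware filters in Erick's link) and compared to each term.
    public static boolean prefixMatches(List<String> indexedTerms, String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        for (String term : indexedTerms) {
            if (term.startsWith(p)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String address = "alice@one.com"; // hypothetical value
        List<String> split = splitOnPunctuation(address);
        System.out.println(split);                             // [alice, one, com]
        System.out.println(prefixMatches(split, "alice"));     // true
        System.out.println(prefixMatches(split, "alice@one")); // false: no term contains '@'
        List<String> whole = keepWholeTokens(address);
        System.out.println(prefixMatches(whole, "alice@one")); // true
    }
}
```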



Re: query with @ and *

2017-09-14 Thread Atita Arora
Hi,

Can you give us a little information about the query parser you are using in
your handler?

Thanks,
Ati


On Thu, Sep 14, 2017 at 4:36 PM, Mannott, Birgit 
wrote:

> Hi,
>
> I have a problem when searching on email addresses.
> @ seems to be handled as a special character but I can't find anything
> about it in the documentation.
>
> This is my test data
> t...@one.com
> t...@two.com
>
> searching for test* returns both, ok.
> searching for t...@one.com returns the correct one, ok.
> searching for test returns both, which I didn't expect but it's ok.
> searching for test@one* returns none and that's the problem.
>
> Escaping the char @ doesn't change it.
> It seems that every query containing @ and * has no result.
>
> Does anyone have an idea how to change this?
>
> Thanks,
> Birgit
>
>
>
>
>
>


query with @ and *

2017-09-14 Thread Mannott, Birgit
Hi,

I have a problem when searching on email addresses.
@ seems to be handled as a special character but I can't find anything about it
in the documentation.

This is my test data
t...@one.com
t...@two.com

searching for test* returns both, ok.
searching for t...@one.com returns the correct one, ok.
searching for test returns both, which I didn't expect but it's ok.
searching for test@one* returns none and that's the problem.

Escaping the char @ doesn't change it.
It seems that every query containing @ and * has no result.

Does anyone have an idea how to change this?

Thanks,
Birgit







Re: Solr Spatial Index and Data

2017-09-14 Thread Rick Leir

Hi Can Ezgi,
> First of all, I want to use a spatial index for my data, which includes
> polygons and points. But Solr indexed only the first 18 rows; the other rows
> were not indexed.


Do all rows have a unique id field?

Are there errors in the logfile?
cheers -- Rick




Re: SolrJ Java API examples

2017-09-14 Thread Leonardo Perez Pulido
Hi,
This may help:

https://github.com/leoperezpulido/lucene-solr/tree/master/solr/solrj/src/test/org/apache/solr/client/solrj

Regards.

On Thu, Sep 14, 2017 at 4:21 AM, Vishal Srivastava  wrote:

> Hi,
> I'm a beginner at SolrJ and am currently looking to implement and
> integrate it at my current organisation using Java.
>
> After a lot of research, I failed to find any good material/examples for
> SolrJ's Java library that I could use as a reference.
>
> Please suggest some good material.
>
> Thanks a ton.
>
> Vishal Srivastava.
>


SolrJ Java API examples

2017-09-14 Thread Vishal Srivastava
Hi,
I'm a beginner at SolrJ and am currently looking to implement and
integrate it at my current organisation using Java.

After a lot of research, I failed to find any good material/examples for
SolrJ's Java library that I could use as a reference.

Please suggest some good material.

Thanks a ton.

Vishal Srivastava.


Re: Provide suggestion on indexing performance

2017-09-14 Thread Sreenivas.T
I agree with Tom. Doc values and stored fields exist for different
reasons. Doc values are another index structure that gets built for faster
sorting/faceting.

On Wed, Sep 13, 2017 at 11:30 PM Tom Evans  wrote:

> On Tue, Sep 12, 2017 at 4:06 AM, Aman Tandon 
> wrote:
> > Hi,
> >
> > We want to know about the indexing performance in the scenarios below,
> > considering a total of 10 string fields and 10 million documents in
> > total.
> >
> > 1) indexed=true, stored=true
> > 2) indexed=true, docValues=true
> >
> > Which one should we prefer in terms of indexing performance, please share
> > your experience.
> >
> > With regards,
> > Aman Tandon
>
> Your question doesn't make much sense. You turn on stored when you
> need to retrieve the original contents of the fields after searching,
> and you use docvalues to speed up faceting, sorting and grouping.
> Using docvalues to retrieve values during search is more expensive
> than simply using stored values, so if your primary aim is retrieving
> stored values, use stored=true.
>
> Secondly, the only way to answer performance questions for your schema
> and data is to try it out. Generate 10 million docs, store them in a
> file (e.g. as CSV), and then use the post tool to try different schema
> and query options.
>
> Cheers
>
> Tom
>
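A sketch of the test-data step Tom suggests: generate synthetic documents into a CSV file that the post tool can index. The column names (id, f1 to f10), the value vocabulary and the output file name are all hypothetical; adjust them to match the schema under test.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Random;

public class SyntheticCsv {

    // Write a header plus numDocs rows: an id column and numFields string
    // columns named f1..fN. A fixed seed keeps runs reproducible.
    public static void write(Writer out, int numDocs, int numFields, long seed) {
        Random rnd = new Random(seed);
        try {
            StringBuilder header = new StringBuilder("id");
            for (int f = 1; f <= numFields; f++) {
                header.append(",f").append(f);
            }
            out.write(header.append('\n').toString());
            for (int d = 0; d < numDocs; d++) {
                StringBuilder row = new StringBuilder("doc").append(d);
                for (int f = 1; f <= numFields; f++) {
                    // Values from a 1000-word vocabulary, so terms repeat across docs.
                    row.append(",v").append(rnd.nextInt(1000));
                }
                out.write(row.append('\n').toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        int numDocs = args.length > 0 ? Integer.parseInt(args[0]) : 10_000_000;
        try (Writer out = Files.newBufferedWriter(Paths.get("docs.csv"), StandardCharsets.UTF_8)) {
            write(out, numDocs, 10, 42L);
        }
    }
}
```

The file could then be indexed with something like bin/post -c mycollection docs.csv (the collection name is a placeholder), once per schema variant (indexed+stored vs. indexed+docValues), timing each run.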


Solr Spatial Index and Data

2017-09-14 Thread Can Ezgi Aydemir
Hi everyone,



First of all, I want to use a spatial index for my data, which includes polygons and
points. But Solr indexed only the first 18 rows; the other rows were not indexed. I need
sample data that includes polygons and points.



Another problem: I will write spatial queries on this data. These spatial queries
include intersect, neighborhood, in, etc. Could you please help me prepare these
queries?



Thanks for your interest.



Best regards.






Can Ezgi AYDEMİR
Oracle Database Administrator

İşlem Coğrafi Bilgi Sistemleri Müh. & Eğitim AŞ.
2024.Cadde No:14, Beysukent 06800, Ankara, Türkiye
T : 0 312 233 50 00 .:. F : 0312 235 56 82
E : cayde...@islem.com.tr

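A hedged Java sketch of the sample data and filter queries asked about above; it only builds strings. It assumes a hypothetical field named geo of type SpatialRecursivePrefixTreeFieldType, configured with JTS so that polygons are accepted. Under that assumption, shapes are indexed as WKT and filtered with predicates such as Intersects and IsWithin, while a distance ("neighborhood") filter can use {!geofilt}. Note that WKT puts longitude first, whereas geofilt's pt parameter is "lat,lon". The coordinates are made-up values near Ankara.

```java
import java.util.StringJoiner;

public class SpatialQuerySketch {

    // WKT point. WKT order is "x y", i.e. longitude first.
    public static String wktPoint(double lon, double lat) {
        return "POINT(" + lon + " " + lat + ")";
    }

    // WKT polygon from {lon, lat} pairs; closes the ring if the caller did not.
    public static String wktPolygon(double[][] ring) {
        StringJoiner pts = new StringJoiner(", ", "POLYGON((", "))");
        for (double[] p : ring) {
            pts.add(p[0] + " " + p[1]);
        }
        double[] first = ring[0];
        double[] last = ring[ring.length - 1];
        if (first[0] != last[0] || first[1] != last[1]) {
            pts.add(first[0] + " " + first[1]);
        }
        return pts.toString();
    }

    // Filter query such as geo:"Intersects(POLYGON((...)))".
    public static String spatialFilter(String field, String predicate, String wkt) {
        return field + ":\"" + predicate + "(" + wkt + ")\"";
    }

    // Distance filter: documents within distKm kilometres of (lat, lon).
    public static String geofilt(String field, double lat, double lon, double distKm) {
        return "{!geofilt sfield=" + field + " pt=" + lat + "," + lon + " d=" + distKm + "}";
    }

    public static void main(String[] args) {
        String area = wktPolygon(new double[][] {
            {32.0, 39.0}, {33.0, 39.0}, {33.0, 40.0}, {32.0, 40.0}});
        System.out.println("sample point:   " + wktPoint(32.85, 39.93));
        System.out.println("sample polygon: " + area);
        System.out.println("fq=" + spatialFilter("geo", "Intersects", area));
        System.out.println("fq=" + spatialFilter("geo", "IsWithin", area));
        System.out.println("fq=" + geofilt("geo", 39.93, 32.85, 5.0));
    }
}
```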


Re: Freeze Index

2017-09-14 Thread Toke Eskildsen
On Wed, 2017-09-13 at 11:56 -0700, fabigol wrote:
> My problem is that my index freezes several times and I don't know why.
> So I lost all the data in my index.
> I have 14 million documents from a PostgreSQL database. I have only
> one node with 31 GB for my JVM, and my server has 64 GB. My index takes
> 6 GB on the HDD.
>
> Is this a good configuration?

If you look in the admin GUI, you can see how much memory is actually
used by the JVM. My guess is that it is _way_ lower than 31GB. A 6GB
index is quite small and unless you do special processing, you should
be fine with a 2GB JVM or something like that.

One of the symptoms of having too large a memory allocation for the
JVM is occasional long pauses due to garbage collection. However, you
should not lose anything - it is just a pause. Can you describe in more
detail what you mean by freeze and losing data?

- Toke Eskildsen, Royal Danish Library