Re: Potential Slow searching for unified highlighting on Solr 8.8.0/8.8.1

2021-03-04 Thread Ere Maijala

Hi,

Solr uses JIRA for issue tickets. You can find it here: 
https://issues.apache.org/jira/browse/SOLR


I'd suggest filing a new bug issue in the SOLR project (note that 
several other projects also use this JIRA installation). Here's an 
example of an existing highlighter issue for reference: 
https://issues.apache.org/jira/browse/SOLR-14019.


See also some brief documentation:

https://cwiki.apache.org/confluence/display/solr/HowToContribute#HowToContribute-JIRAtips(ourissue/bugtracker)

Regards,
Ere

Flowerday, Matthew J wrote on 1.3.2021 at 14.58:

Hi Ere

Pleased to be of service!

No, I have not filed a JIRA ticket. I am new to interacting with the Solr
community and only beginning to 'find my legs'. I am not too sure what JIRA
is, I am afraid!

Regards

Matthew

Matthew Flowerday | Consultant | ULEAF
Unisys | 01908 774830| matthew.flower...@unisys.com
Address Enigma | Wavendon Business Park | Wavendon | Milton Keynes | MK17
8LX



THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY
MATERIAL and is for use only by the intended recipient. If you received this
in error, please contact the sender and delete the e-mail and its
attachments from all devices.



-----Original Message-----
From: Ere Maijala 
Sent: 01 March 2021 12:53
To: solr-user@lucene.apache.org
Subject: Re: Potential Slow searching for unified highlighting on Solr
8.8.0/8.8.1

EXTERNAL EMAIL - Be cautious of all links and attachments.

Hi,

Whoa, thanks for the heads-up! You may just have saved me from a whole lot
of trouble. Did you file a JIRA ticket already?

Thanks,
Ere

Flowerday, Matthew J wrote on 1.3.2021 at 14.00:

Hi There

I just came across a situation where a unified highlighting search
under solr 8.8.0/8.8.1 can take over 20 mins to run and eventually times out.
I resolved it by a config change – but it can catch you out. Hence this email.

With solr 8.8.0 a new unified highlighting parameter
hl.fragAlignRatio was implemented which, if not set, defaults to 0.5.
This attempts to improve the highlighting so that highlighted text
does not appear right at the left. This works well, but if you have a
search result with numerous occurrences of the word in question within
the record, performance goes right down!

2021-02-27 06:45:03.151 INFO  (qtp762476028-20) [   x:uleaf] o.a.s.c.S.Request [uleaf]  webapp=/solr path=/select params={hl.snippets=2=test=on=100=id,description,specification,score=20=*=10&_=1614405119134} hits=57008 status=0 QTime=1414320

2021-02-27 06:45:03.245 INFO  (qtp762476028-20) [   x:uleaf]
o.a.s.s.HttpSolrCall Unable to write response, client closed
connection or we are shutting down =>
org.eclipse.jetty.io.EofException

at
org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:279)

org.eclipse.jetty.io.EofException: null

at
org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:279)
~[jetty-io-9.4.34.v20201102.jar:9.4.34.v20201102]

at
org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:422)
~[jetty-io-9.4.34.v20201102.jar:9.4.34.v20201102]

at
org.eclipse.jetty.io.WriteFlusher.completeWrite(WriteFlusher.java:378)
~[jetty-io-9.4.34.v20201102.jar:9.4.34.v20201102]

when I set hl.fragAlignRatio=0.25 results came back much quicker

2021-02-27 14:59:57.189 INFO  (qtp1291367132-24) [   x:holmes] o.a.s.c.S.Request [holmes]  webapp=/solr path=/select params={hl.weightMatches=false=on=id,description,specification,score=1=0.25=100=2=test=100=*=unified=9&_=1614430061690} hits=136939 status=0 QTime=87024

And hl.fragAlignRatio=0.1

2021-02-27 15:18:45.542 INFO  (qtp1291367132-19) [   x:holmes] o.a.s.c.S.Request [holmes]  webapp=/solr path=/select params={hl.weightMatches=false=on=id,description,specification,score=1=0.1=100=2=test=100=*=unified=9&_=1614430061690} hits=136939 status=0 QTime=69033

And hl.fragAlignRatio=0.0

2021-02-27 15:20:38.194 INFO  (qtp1291367132-24) [   x:holmes] o.a.s.c.S.Request [holmes]  webapp=/solr path=/select params={hl.weightMatches=false=on=id,description,specification,score=1=0.0=100=2=test=100=*=unified=9&_=1614430061690} hits=136939 status=0 QTime=2841

I left our setting at 0.0 – this is presumably how it was in 7.7.1 (fully
left aligned). I am not too sure as to how many times a word has to
occur in a record for performance to go right down – but if too many
it can have a BIG impact.

I also noticed that setting timeAllowed=9 did not break out of
the query until it finished. Perhaps because the query finished
quickly and what took the time was the highlighting. It might be an
idea to get timeAllowed to also cover any highlighting so that the
query does not run until the jetty timeout is hit. The machine ran one
core at 100% for about 20 mins!

Hope this helps.

Regards

Matthew

*Matthew Flowerday*| Consultant | ULEAF

Unisys | 01908 774830| matthew.flower...@unisys.com
<mailto:matthew.flower...@unisys.com>

Address Enigma | Wavendon Business Park |

RE: Potential Slow searching for unified highlighting on Solr 8.8.0/8.8.1

2021-03-01 Thread Flowerday, Matthew J
Hi Ere

Pleased to be of service!

No, I have not filed a JIRA ticket. I am new to interacting with the Solr
community and only beginning to 'find my legs'. I am not too sure what JIRA
is, I am afraid!

Regards

Matthew

Matthew Flowerday | Consultant | ULEAF
Unisys | 01908 774830| matthew.flower...@unisys.com 
Address Enigma | Wavendon Business Park | Wavendon | Milton Keynes | MK17
8LX




-----Original Message-----
From: Ere Maijala  
Sent: 01 March 2021 12:53
To: solr-user@lucene.apache.org
Subject: Re: Potential Slow searching for unified highlighting on Solr
8.8.0/8.8.1

EXTERNAL EMAIL - Be cautious of all links and attachments.

Hi,

Whoa, thanks for the heads-up! You may just have saved me from a whole lot
of trouble. Did you file a JIRA ticket already?

Thanks,
Ere

Flowerday, Matthew J wrote on 1.3.2021 at 14.00:
> Hi There
>
> I just came across a situation where a unified highlighting search
> under solr 8.8.0/8.8.1 can take over 20 mins to run and eventually times out.
> I resolved it by a config change – but it can catch you out. Hence
> this email.
>
> With solr 8.8.0 a new unified highlighting parameter
> hl.fragAlignRatio was implemented which, if not set, defaults to 0.5.
> This attempts to improve the highlighting so that highlighted text
> does not appear right at the left. This works well, but if you have a
> search result with numerous occurrences of the word in question within
> the record, performance goes right down!
>
> 2021-02-27 06:45:03.151 INFO  (qtp762476028-20) [   x:uleaf] o.a.s.c.S.Request [uleaf]  webapp=/solr path=/select params={hl.snippets=2=test=on=100=id,description,specification,score=20=*=10&_=1614405119134} hits=57008 status=0 QTime=1414320
>
> 2021-02-27 06:45:03.245 INFO  (qtp762476028-20) [   x:uleaf] 
> o.a.s.s.HttpSolrCall Unable to write response, client closed 
> connection or we are shutting down => 
> org.eclipse.jetty.io.EofException
>
>at
> org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:279)
>
> org.eclipse.jetty.io.EofException: null
>
>at
> org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:279)
> ~[jetty-io-9.4.34.v20201102.jar:9.4.34.v20201102]
>
>at
> org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:422)
> ~[jetty-io-9.4.34.v20201102.jar:9.4.34.v20201102]
>
>at
> org.eclipse.jetty.io.WriteFlusher.completeWrite(WriteFlusher.java:378)
> ~[jetty-io-9.4.34.v20201102.jar:9.4.34.v20201102]
>
> when I set hl.fragAlignRatio=0.25 results came back much quicker
>
> 2021-02-27 14:59:57.189 INFO  (qtp1291367132-24) [   x:holmes] o.a.s.c.S.Request [holmes]  webapp=/solr path=/select params={hl.weightMatches=false=on=id,description,specification,score=1=0.25=100=2=test=100=*=unified=9&_=1614430061690} hits=136939 status=0 QTime=87024
>
> And hl.fragAlignRatio=0.1
>
> 2021-02-27 15:18:45.542 INFO  (qtp1291367132-19) [   x:holmes] o.a.s.c.S.Request [holmes]  webapp=/solr path=/select params={hl.weightMatches=false=on=id,description,specification,score=1=0.1=100=2=test=100=*=unified=9&_=1614430061690} hits=136939 status=0 QTime=69033
>
> And hl.fragAlignRatio=0.0
>
> 2021-02-27 15:20:38.194 INFO  (qtp1291367132-24) [   x:holmes] o.a.s.c.S.Request [holmes]  webapp=/solr path=/select params={hl.weightMatches=false=on=id,description,specification,score=1=0.0=100=2=test=100=*=unified=9&_=1614430061690} hits=136939 status=0 QTime=2841
>
> I left our setting at 0.0 – this is presumably how it was in 7.7.1 (fully
> left aligned). I am not too sure as to how many times a word has to
> occur in a record for performance to go right down – but if too many
> it can have a BIG impact.
>
> I also noticed that setting timeAllowed=9 did not break out of
> the query until it finished. Perhaps because the query finished
> quickly and what took the time was the highlighting. It might be an
> idea to get timeAllowed to also cover any highlighting so that the
> query does not run until the jetty timeout is hit. The machine ran one
> core at 100% for about 20 mins!
>
> Hope this helps.
>
> Regards
>
> Matthew
>
> *Matthew Flowerday*| Consultant | ULEAF
>
> Unisys | 01908 774830| matthew.flower...@unisys.com 
> <mailto:matthew.flower...@unisys.com>
>
> Address Enigma | Wavendon Business Park | Wavendon | Milton Keynes |
> MK17 8LX
>

Re: Potential Slow searching for unified highlighting on Solr 8.8.0/8.8.1

2021-03-01 Thread Ere Maijala

Hi,

Whoa, thanks for the heads-up! You may just have saved me from a whole 
lot of trouble. Did you file a JIRA ticket already?


Thanks,
Ere

Flowerday, Matthew J wrote on 1.3.2021 at 14.00:

Hi There

I just came across a situation where a unified highlighting search under 
solr 8.8.0/8.8.1 can take over 20 mins to run and eventually times out. 
I resolved it by a config change – but it can catch you out. Hence this 
email.


With solr 8.8.0 a new unified highlighting parameter hl.fragAlignRatio
was implemented which, if not set, defaults to 0.5. This attempts to
improve the highlighting so that highlighted text does not appear right
at the left. This works well, but if you have a search result with
numerous occurrences of the word in question within the record,
performance goes right down!


2021-02-27 06:45:03.151 INFO  (qtp762476028-20) [   x:uleaf] 
o.a.s.c.S.Request [uleaf]  webapp=/solr path=/select 
params={hl.snippets=2=test=on=100=id,description,specification,score=20=*=10&_=1614405119134} 
hits=57008 status=0 QTime=1414320


2021-02-27 06:45:03.245 INFO  (qtp762476028-20) [   x:uleaf] 
o.a.s.s.HttpSolrCall Unable to write response, client closed connection 
or we are shutting down => org.eclipse.jetty.io.EofException


   at 
org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:279)


org.eclipse.jetty.io.EofException: null

   at 
org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:279) 
~[jetty-io-9.4.34.v20201102.jar:9.4.34.v20201102]


   at 
org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:422) 
~[jetty-io-9.4.34.v20201102.jar:9.4.34.v20201102]


   at 
org.eclipse.jetty.io.WriteFlusher.completeWrite(WriteFlusher.java:378) 
~[jetty-io-9.4.34.v20201102.jar:9.4.34.v20201102]


when I set hl.fragAlignRatio=0.25 results came back much quicker

2021-02-27 14:59:57.189 INFO  (qtp1291367132-24) [   x:holmes] 
o.a.s.c.S.Request [holmes]  webapp=/solr path=/select 
params={hl.weightMatches=false=on=id,description,specification,score=1=0.25=100=2=test=100=*=unified=9&_=1614430061690} 
hits=136939 status=0 QTime=87024


And hl.fragAlignRatio=0.1

2021-02-27 15:18:45.542 INFO  (qtp1291367132-19) [   x:holmes] 
o.a.s.c.S.Request [holmes]  webapp=/solr path=/select 
params={hl.weightMatches=false=on=id,description,specification,score=1=0.1=100=2=test=100=*=unified=9&_=1614430061690} 
hits=136939 status=0 QTime=69033


And hl.fragAlignRatio=0.0

2021-02-27 15:20:38.194 INFO  (qtp1291367132-24) [   x:holmes] 
o.a.s.c.S.Request [holmes]  webapp=/solr path=/select 
params={hl.weightMatches=false=on=id,description,specification,score=1=0.0=100=2=test=100=*=unified=9&_=1614430061690} 
hits=136939 status=0 QTime=2841


I left our setting at 0.0 – this is presumably how it was in 7.7.1 (fully
left aligned). I am not too sure as to how many times a word has to
occur in a record for performance to go right down – but if too many it
can have a BIG impact.


I also noticed that setting timeAllowed=9 did not break out of the
query until it finished. Perhaps because the query finished quickly and
what took the time was the highlighting. It might be an idea to get
timeAllowed to also cover any highlighting so that the query does not
run until the jetty timeout is hit. The machine ran one core at 100%
for about 20 mins!


Hope this helps.

Regards

Matthew

*Matthew Flowerday*| Consultant | ULEAF

Unisys | 01908 774830| matthew.flower...@unisys.com 



Address Enigma | Wavendon Business Park | Wavendon | Milton Keynes | 
MK17 8LX






--
Ere Maijala
Kansalliskirjasto / The National Library of Finland


Potential Slow searching for unified highlighting on Solr 8.8.0/8.8.1

2021-03-01 Thread Flowerday, Matthew J
Hi There

 

I just came across a situation where a unified highlighting search under
solr 8.8.0/8.8.1 can take over 20 mins to run and eventually times out. I
resolved it by a config change - but it can catch you out. Hence this email.

 

With solr 8.8.0 a new unified highlighting parameter hl.fragAlignRatio was
implemented which, if not set, defaults to 0.5. This attempts to improve the
highlighting so that highlighted text does not appear right at the left.
This works well, but if you have a search result with numerous occurrences of
the word in question within the record, performance goes right down!

 

2021-02-27 06:45:03.151 INFO  (qtp762476028-20) [   x:uleaf] o.a.s.c.S.Request [uleaf]  webapp=/solr path=/select params={hl.snippets=2=test=on=100=id,description,specification,score=20=*=10&_=1614405119134} hits=57008 status=0 QTime=1414320

2021-02-27 06:45:03.245 INFO  (qtp762476028-20) [   x:uleaf]
o.a.s.s.HttpSolrCall Unable to write response, client closed connection or
we are shutting down => org.eclipse.jetty.io.EofException

  at
org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:279)

org.eclipse.jetty.io.EofException: null

  at
org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:279)
~[jetty-io-9.4.34.v20201102.jar:9.4.34.v20201102]

  at
org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:422)
~[jetty-io-9.4.34.v20201102.jar:9.4.34.v20201102]

  at
org.eclipse.jetty.io.WriteFlusher.completeWrite(WriteFlusher.java:378)
~[jetty-io-9.4.34.v20201102.jar:9.4.34.v20201102]

 

when I set hl.fragAlignRatio=0.25 results came back much quicker

 

2021-02-27 14:59:57.189 INFO  (qtp1291367132-24) [   x:holmes] o.a.s.c.S.Request [holmes]  webapp=/solr path=/select params={hl.weightMatches=false=on=id,description,specification,score=1=0.25=100=2=test=100=*=unified=9&_=1614430061690} hits=136939 status=0 QTime=87024

 

And hl.fragAlignRatio=0.1

 

2021-02-27 15:18:45.542 INFO  (qtp1291367132-19) [   x:holmes] o.a.s.c.S.Request [holmes]  webapp=/solr path=/select params={hl.weightMatches=false=on=id,description,specification,score=1=0.1=100=2=test=100=*=unified=9&_=1614430061690} hits=136939 status=0 QTime=69033

 

And hl.fragAlignRatio=0.0

 

2021-02-27 15:20:38.194 INFO  (qtp1291367132-24) [   x:holmes] o.a.s.c.S.Request [holmes]  webapp=/solr path=/select params={hl.weightMatches=false=on=id,description,specification,score=1=0.0=100=2=test=100=*=unified=9&_=1614430061690} hits=136939 status=0 QTime=2841

 

I left our setting at 0.0 - this is presumably how it was in 7.7.1 (fully left
aligned). I am not too sure as to how many times a word has to occur in a
record for performance to go right down - but if too many it can have a BIG
impact.
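(For reference, a minimal sketch of what that config change can look like,
assuming the parameter is pinned as a request-handler default in
solrconfig.xml; the handler name and the hl.method default shown are
illustrative, only the hl.fragAlignRatio value comes from this thread:

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <!-- Pin unified-highlighter fragment alignment back to fully left-aligned. -->
      <str name="hl.method">unified</str>
      <str name="hl.fragAlignRatio">0.0</str>
    </lst>
  </requestHandler>

The same parameter can also be passed per request, e.g. by appending
&hl.fragAlignRatio=0.0 to the /select URL.)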

 

I also noticed that setting timeAllowed=9 did not break out of the
query until it finished. Perhaps because the query finished quickly and what
took the time was the highlighting. It might be an idea to get timeAllowed
to also cover any highlighting so that the query does not run until the
jetty timeout is hit. The machine ran one core at 100% for about 20 mins!
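(As a sketch, timeAllowed is passed per request like any other parameter;
the collection name and value below are illustrative:

  curl 'http://localhost:8983/solr/uleaf/select?q=test&hl=on&hl.method=unified&timeAllowed=60000'

When the limit is hit during the search itself, Solr returns whatever partial
results it has and sets partialResults=true in the response header; as noted
above, the highlighting phase is not covered by this limit.)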

 

Hope this helps.

 

Regards

 

Matthew

 

Matthew Flowerday | Consultant | ULEAF

Unisys | 01908 774830|  
matthew.flower...@unisys.com 

Address Enigma | Wavendon Business Park | Wavendon | Milton Keynes | MK17
8LX

 

  

 



Re: Searching for credit card numbers

2020-07-28 Thread Walter Underwood
If you reindex, I’ve become a big fan of adding a date field with an index 
timestamp.
That will allow you to check whether everything has been reindexed.
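(A minimal sketch of such a field, assuming a schema.xml/managed schema with
the stock pdate type; the field name is illustrative:

  <field name="indexed_at" type="pdate" indexed="true" stored="true" default="NOW"/>

Every document then records when it was indexed, and a range query such as
fq=indexed_at:[* TO NOW-7DAYS] would surface documents that have not been
reindexed in the past week.)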

   

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jul 28, 2020, at 2:11 PM, Jörn Franke  wrote:
> 
> A regex search at query time would leave room for attacks (e.g. a regex can
> easily be designed to block the Solr server forever).
> 
> If the field is stored you can also use a cursor to go through all
> entries and reindex the doc based on the field:
> 
> https://lucene.apache.org/solr/guide/8_4/pagination-of-results.html
> 
> This would also imply that you have the other fields stored. Otherwise
> reindex.
> You can do this in parallel to the existing index and once finished simply
> change the alias for the collection (that would be without any downtime for
> the users, but of course you need the corresponding space).
> 
>> On 28.07.2020 at 21:06, lstusr 5u93n4 wrote:
>> 
>> Possible... yes. Agreed that this is the right approach. But if we already
>> have a big index that we're searching through? Any way to "hack it"?
>> 
>>> On Tue, 28 Jul 2020 at 14:55, Walter Underwood 
>>> wrote:
>>> 
>>> I’d do that at index time. Add an update request processor script that
>>> does the regex and adds a field has_credit_card_number:true.
>>> 
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>> 
>>>>> On Jul 28, 2020, at 11:50 AM, lstusr 5u93n4  wrote:
>>>> 
>>>> Let's say I have a text field that's been indexed with the standard
>>>> tokenizer, and I want to match the docs that have credit card numbers in
>>>> them (this is for altruistic purposes, not nefarious ones!). What's the
>>>> best way to build a search that will do this?
>>>> 
>>>> Searching for "   " seems to return inconsistent results.
>>>> 
>>>> Maybe a regex search? "[0-9]{4}?[0-9]{4}?[0-9]{4}?[0-9]{4}" seems like it
>>>> should work, but that's not matching the docs I think it should either...
>>>> 
>>>> Any suggestions?
>>>> 
>>>> Thanks In Advance!
>>> 
>>> 



Re: Searching for credit card numbers

2020-07-28 Thread Jörn Franke
A regex search at query time would leave room for attacks (e.g. a regex can
easily be designed to block the Solr server forever).

If the field is stored you can also use a cursor to go through all
entries and reindex the doc based on the field:

https://lucene.apache.org/solr/guide/8_4/pagination-of-results.html

This would also imply that you have the other fields stored. Otherwise reindex.
You can do this in parallel to the existing index and once finished simply
change the alias for the collection (that would be without any downtime for the
users, but of course you need the corresponding space).
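(A minimal sketch of that cursor loop, with collection and field names
illustrative; cursorMark requires a sort that includes the uniqueKey field:

  # First page: cursorMark=*; the response includes a nextCursorMark token.
  curl 'http://localhost:8983/solr/mycollection/select?q=*:*&rows=500&sort=id+asc&cursorMark=*&fl=id,content'
  # Repeat, passing the returned token back as cursorMark=..., until
  # nextCursorMark stops changing between responses.

)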

> On 28.07.2020 at 21:06, lstusr 5u93n4 wrote:
> 
> Possible... yes. Agreed that this is the right approach. But if we already
> have a big index that we're searching through? Any way to "hack it"?
> 
>> On Tue, 28 Jul 2020 at 14:55, Walter Underwood 
>> wrote:
>> 
>> I’d do that at index time. Add an update request processor script that
>> does the regex and adds a field has_credit_card_number:true.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>>>> On Jul 28, 2020, at 11:50 AM, lstusr 5u93n4  wrote:
>>> 
>>> Let's say I have a text field that's been indexed with the standard
>>> tokenizer, and I want to match the docs that have credit card numbers in
>>> them (this is for altruistic purposes, not nefarious ones!). What's the
>>> best way to build a search that will do this?
>>> 
>>> Searching for "   " seems to return inconsistent results.
>>> 
>>> Maybe a regex search? "[0-9]{4}?[0-9]{4}?[0-9]{4}?[0-9]{4}" seems like it
>>> should work, but that's not matching the docs I think it should either...
>>> 
>>> Any suggestions?
>>> 
>>> Thanks In Advance!
>> 
>> 


Re: Searching for credit card numbers

2020-07-28 Thread lstusr 5u93n4
Possible... yes. Agreed that this is the right approach. But if we already
have a big index that we're searching through? Any way to "hack it"?

On Tue, 28 Jul 2020 at 14:55, Walter Underwood 
wrote:

> I’d do that at index time. Add an update request processor script that
> does the regex and adds a field has_credit_card_number:true.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Jul 28, 2020, at 11:50 AM, lstusr 5u93n4  wrote:
> >
> > Let's say I have a text field that's been indexed with the standard
> > tokenizer, and I want to match the docs that have credit card numbers in
> > them (this is for altruistic purposes, not nefarious ones!). What's the
> > best way to build a search that will do this?
> >
> > Searching for "   " seems to return inconsistent results.
> >
> > Maybe a regex search? "[0-9]{4}?[0-9]{4}?[0-9]{4}?[0-9]{4}" seems like it
> > should work, but that's not matching the docs I think it should either...
> >
> > Any suggestions?
> >
> > Thanks In Advance!
>
>


Re: Searching for credit card numbers

2020-07-28 Thread Walter Underwood
I’d do that at index time. Add an update request processor script that
does the regex and adds a field has_credit_card_number:true.
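(A minimal sketch of such a chain, assuming solr.StatelessScriptUpdateProcessorFactory
with a JavaScript file in the core's conf/ directory; the field, chain, and
script names are illustrative:

  <updateRequestProcessorChain name="detect-ccn">
    <processor class="solr.StatelessScriptUpdateProcessorFactory">
      <str name="script">detect-ccn.js</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

  // detect-ccn.js -- runs for every added document before it is indexed
  function processAdd(cmd) {
    var text = cmd.solrDoc.getFieldValue("content");  // hypothetical text field
    // Rough check: four groups of four digits, optionally separated by space or dash.
    if (text != null && /\b(?:[0-9]{4}[ -]?){3}[0-9]{4}\b/.test(String(text))) {
      cmd.solrDoc.setField("has_credit_card_number", true);
    }
  }

)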

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jul 28, 2020, at 11:50 AM, lstusr 5u93n4  wrote:
> 
> Let's say I have a text field that's been indexed with the standard
> tokenizer, and I want to match the docs that have credit card numbers in
> them (this is for altruistic purposes, not nefarious ones!). What's the
> best way to build a search that will do this?
> 
> Searching for "   " seems to return inconsistent results.
> 
> Maybe a regex search? "[0-9]{4}?[0-9]{4}?[0-9]{4}?[0-9]{4}" seems like it
> should work, but that's not matching the docs I think it should either...
> 
> Any suggestions?
> 
> Thanks In Advance!



Searching for credit card numbers

2020-07-28 Thread lstusr 5u93n4
Let's say I have a text field that's been indexed with the standard
tokenizer, and I want to match the docs that have credit card numbers in
them (this is for altruistic purposes, not nefarious ones!). What's the
best way to build a search that will do this?

Searching for "   " seems to return inconsistent results.

Maybe a regex search? "[0-9]{4}?[0-9]{4}?[0-9]{4}?[0-9]{4}" seems like it
should work, but that's not matching the docs I think it should either...

Any suggestions?

Thanks In Advance!
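(One thing worth knowing here: Solr's regexp query syntax wraps the pattern
in slashes, and the regex is matched against individual indexed terms, not
the raw text. With the standard tokenizer each 4-digit group is its own term,
so a pattern spanning the whole number can only match when the digits were
indexed as a single token. A hedged sketch, field name illustrative:

  # Matches terms that are an unbroken run of 13-16 digits.
  # -g turns off curl's URL globbing so the brackets pass through.
  curl -g 'http://localhost:8983/solr/mycollection/select?q=content:/[0-9]{13,16}/'

)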


Re: Searching document content and mult-valued fields

2020-07-06 Thread Emir Arnautović
Hi Shaun,
If project content is relatively static, you could use nested documents, or
you could play with the join query parser.
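(A minimal sketch of the cross-core join, with core and field names
illustrative; note that fromIndex joins work against a single,
non-distributed core co-located on the same node:

  # Return project records whose attached content documents match the text query.
  # -g turns off curl's URL globbing so the braces pass through.
  curl -g 'http://localhost:8983/solr/projects/select?q={!join+from=project_id+to=id+fromIndex=project_content}content:accessibility'

)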

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 1 Jul 2020, at 18:19, Shaun Campbell  wrote:
> 
> Hi
> 
> Been using Solr on a project now for a couple of years and it is working well.
> It's just a simple index of about 20 - 25 fields and 7,000 project records.
> 
> Now there's a requirement to be able to search on the content of documents
> (web pages, Word, pdf etc) related to those projects.  My initial thought
> was to just create a new index to store the Tika'd content and just search
> on that. However, the requirement is to somehow search through both the
> project records and the content records at the same time and list the main
> project with perhaps some info on the matching content data. I tried to
> explain that you may find matching main project records but no content, and
> vice versa.
> 
> My only solution to this search problem is to either concatenate all the
> document content into one field on the main project record, and add that to
> my dismax search, and use boosting etc or to use a multi-valued field to
> store the content of each project document.  I'm a bit reluctant to do this
> as the application is running well and I'm a bit nervous about a change to
> the schema and the indexing process.  I just wondered what you thought
> about adding a lot of content to an existing schema (single or multivalued
> field) that doesn't normally store big amounts of data.
> 
> Or does anyone know of any way, I can join two searches like this together
> and two separate indexes?
> 
> Thanks
> Shaun



Searching document content and mult-valued fields

2020-07-01 Thread Shaun Campbell
Hi

Been using Solr on a project now for a couple of years and it is working well.
It's just a simple index of about 20 - 25 fields and 7,000 project records.

Now there's a requirement to be able to search on the content of documents
(web pages, Word, pdf etc) related to those projects.  My initial thought
was to just create a new index to store the Tika'd content and just search
on that. However, the requirement is to somehow search through both the
project records and the content records at the same time and list the main
project with perhaps some info on the matching content data. I tried to
explain that you may find matching main project records but no content, and
vice versa.

My only solution to this search problem is to either concatenate all the
document content into one field on the main project record, and add that to
my dismax search, and use boosting etc or to use a multi-valued field to
store the content of each project document.  I'm a bit reluctant to do this
as the application is running well and I'm a bit nervous about a change to
the schema and the indexing process.  I just wondered what you thought
about adding a lot of content to an existing schema (single or multivalued
field) that doesn't normally store big amounts of data.

Or does anyone know of any way, I can join two searches like this together
and two separate indexes?

Thanks
Shaun


Re: HTTP 401 when searching on alias in secured Solr

2020-06-16 Thread Jason Gerlowski
Just wanted to close the loop here: Isabelle filed SOLR-14569 for this
and eventually reported there that the problem seems specific to her
custom configuration, which specifies a seemingly innocuous element
in solrconfig.xml.

See that jira for more detailed explanation (and hopefully a
resolution coming soon).

On Wed, Jun 10, 2020 at 4:01 PM Jan Høydahl  wrote:
>
> Please share your security.json file
>
> Jan Høydahl
>
> > On 10 Jun 2020, at 21:53, Isabelle Giguere wrote:
> >
> > Hi;
> >
> > I'm using Solr 8.5.0.  I have uploaded security.json to Zookeeper.  I can 
> > log in the Solr Admin UI.  I can create collections and aliases, and I can 
> > index documents in Solr.
> >
> > Collections : test1, test2
> > Alias: test (combines test1, test2)
> >
> > Indexed document "solr-word.pdf" in collection test1
> >
> > Searching on a collection works:
> > http://localhost:8983/solr/test1/select?q=*:*&wt=xml
> > 
> >
> > But searching on an alias results in HTTP 401
> > http://localhost:8983/solr/test/select?q=*:*&wt=xml
> >
> > Error from server at null: Expected mime type application/octet-stream but
> > got text/html. HTTP ERROR 401 Authentication failed, Response code: 401
> > URI: /solr/test1_shard1_replica_n1/select
> > STATUS: 401
> > MESSAGE: Authentication failed, Response code: 401
> > SERVLET: default
> >
> > Even if https://issues.apache.org/jira/browse/SOLR-13510 is fixed in Solr 
> > 8.5.0, I did try to start Solr with -Dsolr.http1=true, and I set 
> > "forwardCredentials":true in security.json.
> >
> > Nothing works.  I just cannot use aliases when Solr is secured.
> >
> > Can anyone confirm if this may be a configuration issue, or if this could 
> > possibly be a bug ?
> >
> > Thank you;
> >
> > Isabelle Giguère
> > Computational Linguist & Java Developer
> > Linguiste informaticienne & développeur java
> >
> >


Re: [EXTERNAL] - Re: HTTP 401 when searching on alias in secured Solr

2020-06-15 Thread Isabelle Giguere
Thank you for the input, Aroop.

It is probably a red herring.  I will have to pick the configuration apart 
piece by piece.  Sigh.

It's probably not a node down issue, since I'm only setting up one node.

(Reporting an unrelated error message should probably be considered a bug 
anyways.)

Isabelle Giguère
Computational Linguist & Java Developer
Linguiste informaticienne & développeur java



From: Aroop Ganguly
Sent: 14 June 2020 17:37
To: solr-user@lucene.apache.org
Subject: Re: [EXTERNAL] - Re: HTTP 401 when searching on alias in secured Solr

Isabelle, sometimes 401s are a red herring for other issues unrelated to auth.
We have had issues on 7.7 with an underlying transient replica recovery and/or
leader-down situation where the only message we got back from Solr was a 401.
Please see if you have any down replicas or other issues where certain nodes may
have trouble getting more current information from ZooKeeper.


> On Jun 14, 2020, at 2:13 PM, Isabelle Giguere wrote:
>
> I have created https://issues.apache.org/jira/browse/SOLR-14569
> It includes a patch with the unit test to reproduce the issue, and a 
> simplification of our product-specific configuration, with instructions.
>
> Let's catch up on Jira.
>
> Isabelle Giguère
> Computational Linguist & Java Developer
> Linguiste informaticienne & développeur java
>
>
> 
> From: Jan Høydahl <jan@cominvent.com>
> Sent: 13 June 2020 17:50
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: [EXTERNAL] - Re: HTTP 401 when searching on alias in secured Solr
>
> I did not manage to reproduce. Feel free to open the JIRA and attach the
> failing test. In the issue description, it is great if you manage to describe
> the reproduction steps in a clean way, so anyone can reproduce with a minimal
> necessary config.
>
> Jan
>
>> On 13 Jun 2020, at 00:41, Isabelle Giguere wrote:
>>
>> Hello again;
>>
>> I have managed to reproduce the issue in a unit test.  I should probably add
>> a Jira ticket with a patch for the unit test, on Solr 8.5.0, not master.
>>
>> Meanwhile, for your suggested queries:
>>
>> 1.  Query on the collection:
>>
>> curl -i -u admin:admin http://10.5.106.115:8985/solr/test1/select?q=*:*&wt=xml
>> HTTP/1.1 200 OK
>> Content-Security-Policy: default-src 'none'; base-uri 'none'; connect-src 
>> 'self'; form-action 'self'; font-src 'self'; frame-ancestors 'none'; img-src 
>> 'self'; media-src 'self'; style-src 'self' 'unsafe-inline'; script-src 
>> 'self'; worker-src 'self';
>> X-Content-Type-Options: nosniff
>> X-Frame-Options: SAMEORIGIN
>> X-XSS-Protection: 1; mode=block
>> Content-Type: application/xml; charset=UTF-8
>> Content-Length: 8214
>>
>> 
>> 
>>
>> 
>> true
>> 0
>> 2
>> 
>>   *:*
>> 
>> 
>> 
>> Response contains the Solr document, of course
>>
>>
>> 2. Query on the alias
>>
>> curl -i -u admin:admin http://10.5.106.115:8985/solr/test/select?q=*:*&wt=xml
>> HTTP/1.1 401 Unauthorized
>> Content-Security-Policy: default-src 'none'; base-uri 'none'; connect-src 
>> 'self'; form-action 'self'; font-src 'self'; frame-ancestors 'none'; img-src 
>> 'self'; media-src 'self'; style-src 'se

Re: [EXTERNAL] - Re: HTTP 401 when searching on alias in secured Solr

2020-06-14 Thread Aroop Ganguly
Isabelle, sometimes 401s are a red herring for other issues unrelated to auth.
We have had issues on 7.7 with an underlying transient replica recovery and/or
leader-down situation where the only message we got back from Solr was a 401.
Please see if you have any down replicas or other issues where certain nodes may
have trouble getting more current information from ZooKeeper.


> On Jun 14, 2020, at 2:13 PM, Isabelle Giguere wrote:
> 
> I have created https://issues.apache.org/jira/browse/SOLR-14569
> It includes a patch with the unit test to reproduce the issue, and a 
> simplification of our product-specific configuration, with instructions.
> 
> Let's catch up on Jira.
> 
> Isabelle Giguère
> Computational Linguist & Java Developer
> Linguiste informaticienne & développeur java
> 
> 
> 
> From: Jan Høydahl <jan@cominvent.com>
> Sent: 13 June 2020 17:50
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: [EXTERNAL] - Re: HTTP 401 when searching on alias in secured Solr
> 
> I did not manage to reproduce. Feel free to open the JIRA and attach the
> failing test. In the issue description, it is great if you manage to describe
> the reproduction steps in a clean way, so anyone can reproduce with a minimal
> necessary config.
> 
> Jan
> 
>> On 13 Jun 2020, at 00:41, Isabelle Giguere wrote:
>> 
>> Hello again;
>> 
>> I have managed to reproduce the issue in a unit test.  I should probably add
>> a Jira ticket with a patch for the unit test, on Solr 8.5.0, not master.
>> 
>> Meanwhile, for your suggested queries:
>> 
>> 1.  Query on the collection:
>> 
>> curl -i -u admin:admin http://10.5.106.115:8985/solr/test1/select?q=*:*&wt=xml
>> HTTP/1.1 200 OK
>> Content-Security-Policy: default-src 'none'; base-uri 'none'; connect-src 
>> 'self'; form-action 'self'; font-src 'self'; frame-ancestors 'none'; img-src 
>> 'self'; media-src 'self'; style-src 'self' 'unsafe-inline'; script-src 
>> 'self'; worker-src 'self';
>> X-Content-Type-Options: nosniff
>> X-Frame-Options: SAMEORIGIN
>> X-XSS-Protection: 1; mode=block
>> Content-Type: application/xml; charset=UTF-8
>> Content-Length: 8214
>> 
>> 
>> 
>> 
>> 
>> true
>> 0
>> 2
>> 
>>   *:*
>> 
>> 
>> 
>> Response contains the Solr document, of course
>> 
>> 
>> 2. Query on the alias
>> 
>> curl -i -u admin:admin http://10.5.106.115:8985/solr/test/select?q=*:*&wt=xml
>> HTTP/1.1 401 Unauthorized
>> Content-Security-Policy: default-src 'none'; base-uri 'none'; connect-src 
>> 'self'; form-action 'self'; font-src 'self'; frame-ancestors 'none'; img-src 
>> 'self'; media-src 'self'; style-src 'self' 'unsafe-inline'; script-src 
>> 'self'; worker-src 'self';
>> X-Content-Type-Options: nosniff
>> X-Frame-Options: SAMEORIGIN
>> X-XSS-Protection: 1; mode=block
>> Cache-Control: no-cache, no-store
>> Pragma: no-cache
>> Expires: Sat, 01 Jan 2000 01:00:00 GMT
>> Last-Modified: Fri, 12 Jun 2020 22:30:20 GMT
>> ETag: "172aaa7c1eb"
>> Content-Type: application/xml; charset=UTF-8
>> Content-Length: 1332
>> 
>> 
>> 
>> 
>> 
>> true
>> 401
>> 16
>> 
>>   *:*
>> 
>> 
>> 
>> Error contains the full html HTTP 401 message (with escaped characters, of 
>> course)
>> Gist of it : HTTP ERROR 401 require authentication
>> 
>> Thanks;

Re: [EXTERNAL] - Re: HTTP 401 when searching on alias in secured Solr

2020-06-14 Thread Isabelle Giguere
I have created https://issues.apache.org/jira/browse/SOLR-14569
It includes a patch with the unit test to reproduce the issue, and a 
simplification of our product-specific configuration, with instructions.

Let's catch up on Jira.

Isabelle Giguère
Computational Linguist & Java Developer
Linguiste informaticienne & développeur java



From: Jan Høydahl
Sent: 13 June 2020 17:50
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: [EXTERNAL] - Re: HTTP 401 when searching on alias in secured Solr

I did not manage to reproduce. Feel free to open the JIRA and attach the
failing test. In the issue description, it is great if you manage to describe
the reproduction steps in a clean way, so anyone can reproduce with a minimal
necessary config.

Jan

> On 13 Jun 2020, at 00:41, Isabelle Giguere wrote:
>
> Hello again;
>
> I have managed to reproduce the issue in a unit test.  I should probably add
> a Jira ticket with a patch for the unit test, on Solr 8.5.0, not master.
>
> Meanwhile, for your suggested queries:
>
>  1.  Query on the collection:
>
> curl -i -u admin:admin http://10.5.106.115:8985/solr/test1/select?q=*:*&wt=xml
> HTTP/1.1 200 OK
> Content-Security-Policy: default-src 'none'; base-uri 'none'; connect-src 
> 'self'; form-action 'self'; font-src 'self'; frame-ancestors 'none'; img-src 
> 'self'; media-src 'self'; style-src 'self' 'unsafe-inline'; script-src 
> 'self'; worker-src 'self';
> X-Content-Type-Options: nosniff
> X-Frame-Options: SAMEORIGIN
> X-XSS-Protection: 1; mode=block
> Content-Type: application/xml; charset=UTF-8
> Content-Length: 8214
>
> 
> 
>
> 
>  true
>  0
>  2
>  
>*:*
>  
> 
> 
> Response contains the Solr document, of course
>
>
> 2. Query on the alias
>
> curl -i -u admin:admin http://10.5.106.115:8985/solr/test/select?q=*:*&wt=xml
> HTTP/1.1 401 Unauthorized
> Content-Security-Policy: default-src 'none'; base-uri 'none'; connect-src 
> 'self'; form-action 'self'; font-src 'self'; frame-ancestors 'none'; img-src 
> 'self'; media-src 'self'; style-src 'self' 'unsafe-inline'; script-src 
> 'self'; worker-src 'self';
> X-Content-Type-Options: nosniff
> X-Frame-Options: SAMEORIGIN
> X-XSS-Protection: 1; mode=block
> Cache-Control: no-cache, no-store
> Pragma: no-cache
> Expires: Sat, 01 Jan 2000 01:00:00 GMT
> Last-Modified: Fri, 12 Jun 2020 22:30:20 GMT
> ETag: "172aaa7c1eb"
> Content-Type: application/xml; charset=UTF-8
> Content-Length: 1332
>
> 
> 
>
> 
>  true
>  401
>  16
>  
>*:*
>  
> 
> 
> Error contains the full html HTTP 401 message (with escaped characters, of 
> course)
> Gist of it : HTTP ERROR 401 require authentication
>
> Thanks;
>
>
> Isabelle Giguère
> Computational Linguist & Java Developer
> Linguiste informaticienne & développeur java
>
>
> 
> From: Jan Høydahl
> Sent: 12 June 2020 17:30
> To: solr-user@lucene.apache.org
> Subject: Re: [EXTERNAL] - Re: HTTP 401 when searching on alias in secured Solr
>
> I’d say, try the query with curl and enable http headers
>
> curl -i --user admin:admin http://localhost:8983/solr/mycollection/select?q=*:*
> curl -i --user admin:admin http://localhost:8983/solr/myalias/select?q=*:*
>
> Are you saying that you see a difference between the two? What are the 
> headers?
>
> Jan
>
>> On 12 Jun 2020, at 20:06, Isabelle Giguere wrote:
>>
>> Hi Jan
>>
>> Thank you for your time on this.
>>
>> If I send a /select request directly on the alias (/solr/test/select), the 
>> browser asks for credentials, but the Solr response returns status=401 and 
>> an html error message with "HTTP ERROR 401 require authentication"
>>
>> Obviously, my expectation was that some query results would be returned.
>>
>> Since you can't reproduce the issue, I have to assume it's a configuration 
>> issue.
>>
>> So, if I may, let me provide as much details as I can about my setup.
>>
>> Can anyone see something wrong here, some incompatibility ?
>>
>> Solr 8.5.0
>>
>> solrconfig.xml
>> 7.1.0
>> 
>> 
>> 
>> 
>>   
>>   5
>>   5
>> 

Re: [EXTERNAL] - Re: HTTP 401 when searching on alias in secured Solr

2020-06-13 Thread Jan Høydahl
I did not manage to reproduce. Feel free to open the JIRA and attach the
failing test. In the issue description, it is great if you manage to describe
the reproduction steps in a clean way, so anyone can reproduce with a minimal
necessary config.

Jan

> On 13 Jun 2020, at 00:41, Isabelle Giguere wrote:
> 
> Hello again;
> 
> I have managed to reproduce the issue in a unit test.  I should probably add
> a Jira ticket with a patch for the unit test, on Solr 8.5.0, not master.
> 
> Meanwhile, for your suggested queries:
> 
>  1.  Query on the collection:
> 
> curl -i -u admin:admin http://10.5.106.115:8985/solr/test1/select?q=*:*&wt=xml
> HTTP/1.1 200 OK
> Content-Security-Policy: default-src 'none'; base-uri 'none'; connect-src 
> 'self'; form-action 'self'; font-src 'self'; frame-ancestors 'none'; img-src 
> 'self'; media-src 'self'; style-src 'self' 'unsafe-inline'; script-src 
> 'self'; worker-src 'self';
> X-Content-Type-Options: nosniff
> X-Frame-Options: SAMEORIGIN
> X-XSS-Protection: 1; mode=block
> Content-Type: application/xml; charset=UTF-8
> Content-Length: 8214
> 
> 
> 
> 
> 
>  true
>  0
>  2
>  
>*:*
>  
> 
> 
> Response contains the Solr document, of course
> 
> 
> 2. Query on the alias
> 
> curl -i -u admin:admin http://10.5.106.115:8985/solr/test/select?q=*:*&wt=xml
> HTTP/1.1 401 Unauthorized
> Content-Security-Policy: default-src 'none'; base-uri 'none'; connect-src 
> 'self'; form-action 'self'; font-src 'self'; frame-ancestors 'none'; img-src 
> 'self'; media-src 'self'; style-src 'self' 'unsafe-inline'; script-src 
> 'self'; worker-src 'self';
> X-Content-Type-Options: nosniff
> X-Frame-Options: SAMEORIGIN
> X-XSS-Protection: 1; mode=block
> Cache-Control: no-cache, no-store
> Pragma: no-cache
> Expires: Sat, 01 Jan 2000 01:00:00 GMT
> Last-Modified: Fri, 12 Jun 2020 22:30:20 GMT
> ETag: "172aaa7c1eb"
> Content-Type: application/xml; charset=UTF-8
> Content-Length: 1332
> 
> 
> 
> 
> 
>  true
>  401
>  16
>  
>*:*
>  
> 
> 
> Error contains the full html HTTP 401 message (with escaped characters, of 
> course)
> Gist of it : HTTP ERROR 401 require authentication
> 
> Thanks;
> 
> 
> Isabelle Giguère
> Computational Linguist & Java Developer
> Linguiste informaticienne & développeur java
> 
> 
> 
> From: Jan Høydahl
> Sent: 12 June 2020 17:30
> To: solr-user@lucene.apache.org
> Subject: Re: [EXTERNAL] - Re: HTTP 401 when searching on alias in secured Solr
> 
> I’d say, try the query with curl and enable http headers
> 
> curl -i --user admin:admin http://localhost:8983/solr/mycollection/select?q=*:*
> curl -i --user admin:admin http://localhost:8983/solr/myalias/select?q=*:*
> 
> Are you saying that you see a difference between the two? What are the 
> headers?
> 
> Jan
> 
>> On 12 Jun 2020, at 20:06, Isabelle Giguere wrote:
>> 
>> Hi Jan
>> 
>> Thank you for your time on this.
>> 
>> If I send a /select request directly on the alias (/solr/test/select), the 
>> browser asks for credentials, but the Solr response returns status=401 and 
>> an html error message with "HTTP ERROR 401 require authentication"
>> 
>> Obviously, my expectation was that some query results would be returned.
>> 
>> Since you can't reproduce the issue, I have to assume it's a configuration 
>> issue.
>> 
>> So, if I may, let me provide as much details as I can about my setup.
>> 
>> Can anyone see something wrong here, some incompatibility ?
>> 
>> Solr 8.5.0
>> 
>> solrconfig.xml
>> 7.1.0
>> 
>> 
>> 
>> 
>>   
>>   5
>>   5
>>   5
>>   
>> 
>> schema.xml
>> version=1.6
>> Some warnings on start-up about Trie* fields and deprecated filters (we 
>> should fix that)
>> 
>> security.json in Zookeeper, at the Solr ZK root (provided on this thread)
>> blockUnknown : (true|false) = no change in behavior for me, for this issue
>> forwardCredentials : (true|false) = no change in behavior for me, for this 
>> issue
>> 
>> No SSL
>> 
>> solr.in.sh
>> SOLR_AUTH_TYPE="basic"
>> SOLR_AUTHENTICATION_OPTS="-Dbasicauth=admin:admin"
>> 
>> start command params:
>> solr start -force -c -m 4g -h <host> -p <port> -z <zkHost>:<chroot>
>> 
>> 
>> Am I missing anything ?
>> 
>> Thank you.
>> 
>> 
>>

Re: [EXTERNAL] - Re: HTTP 401 when searching on alias in secured Solr

2020-06-12 Thread Isabelle Giguere
Hello again;

I have managed to reproduce the issue in a unit test.  I should probably add a
Jira ticket with a patch for the unit test, on Solr 8.5.0, not master.

Meanwhile, for your suggested queries:

  1.  Query on the collection:

curl -i -u admin:admin http://10.5.106.115:8985/solr/test1/select?q=*:*&wt=xml
HTTP/1.1 200 OK
Content-Security-Policy: default-src 'none'; base-uri 'none'; connect-src 
'self'; form-action 'self'; font-src 'self'; frame-ancestors 'none'; img-src 
'self'; media-src 'self'; style-src 'self' 'unsafe-inline'; script-src 'self'; 
worker-src 'self';
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
Content-Type: application/xml; charset=UTF-8
Content-Length: 8214





  true
  0
  2
  
*:*
  


Response contains the Solr document, of course


2. Query on the alias

curl -i -u admin:admin http://10.5.106.115:8985/solr/test/select?q=*:*&wt=xml
HTTP/1.1 401 Unauthorized
Content-Security-Policy: default-src 'none'; base-uri 'none'; connect-src 
'self'; form-action 'self'; font-src 'self'; frame-ancestors 'none'; img-src 
'self'; media-src 'self'; style-src 'self' 'unsafe-inline'; script-src 'self'; 
worker-src 'self';
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
Cache-Control: no-cache, no-store
Pragma: no-cache
Expires: Sat, 01 Jan 2000 01:00:00 GMT
Last-Modified: Fri, 12 Jun 2020 22:30:20 GMT
ETag: "172aaa7c1eb"
Content-Type: application/xml; charset=UTF-8
Content-Length: 1332





  true
  401
  16
  
*:*
  


Error contains the full html HTTP 401 message (with escaped characters, of 
course)
Gist of it : HTTP ERROR 401 require authentication

Thanks;


Isabelle Giguère
Computational Linguist & Java Developer
Linguiste informaticienne & développeur java



From: Jan Høydahl
Sent: 12 June 2020 17:30
To: solr-user@lucene.apache.org
Subject: Re: [EXTERNAL] - Re: HTTP 401 when searching on alias in secured Solr

I’d say, try the query with curl and enable http headers

curl -i --user admin:admin http://localhost:8983/solr/mycollection/select?q=*:*
curl -i --user admin:admin http://localhost:8983/solr/myalias/select?q=*:*

Are you saying that you see a difference between the two? What are the headers?

Jan

> On 12 Jun 2020, at 20:06, Isabelle Giguere wrote:
>
> Hi Jan
>
> Thank you for your time on this.
>
> If I send a /select request directly on the alias (/solr/test/select), the 
> browser asks for credentials, but the Solr response returns status=401 and an 
> html error message with "HTTP ERROR 401 require authentication"
>
> Obviously, my expectation was that some query results would be returned.
>
> Since you can't reproduce the issue, I have to assume it's a configuration 
> issue.
>
> So, if I may, let me provide as much details as I can about my setup.
>
> Can anyone see something wrong here, some incompatibility ?
>
> Solr 8.5.0
>
> solrconfig.xml
> 7.1.0
> 
> 
> 
> 
>
>5
>5
>5
>
>
> schema.xml
> version=1.6
> Some warnings on start-up about Trie* fields and deprecated filters (we 
> should fix that)
>
> security.json in Zookeeper, at the Solr ZK root (provided on this thread)
> blockUnknown : (true|false) = no change in behavior for me, for this issue
> forwardCredentials : (true|false) = no change in behavior for me, for this 
> issue
>
> No SSL
>
> solr.in.sh
> SOLR_AUTH_TYPE="basic"
> SOLR_AUTHENTICATION_OPTS="-Dbasicauth=admin:admin"
>
> start command params:
> solr start -force -c -m 4g -h <host> -p <port> -z <zkHost>:<chroot>
>
>
> Am I missing anything ?
>
> Thank you.
>
> 
>
> My investigation so far:
>
> I have set logging levels to TRACE for anything related to HTTP, HTTP2, 
> Authorization, Authentication...
>
> Judging by a comment in 
> org.apache.solr.core.CoreContainer.setupHttpClientForAuthPlugin(Object), I 
> should see some logging from PKIAuthenticationPlugin, no matter what plugin 
> is actually used, and regardless if forwardCredentials is true or false:
> Comment:
> // Always register PKI auth interceptor, which will then delegate the 
> decision of who should secure
> // each request to the configured authentication plugin.
>
> Expected log message from 
> org.apache.solr.security.PKIAuthenticationPlugin.setup(Http2SolrClient) 
> and/or from 
> org.apache.solr.security.PKIAuthenticationPlugin.HttpHeaderClientInterceptor.process(HttpRequest,
>  HttpContext)
>
> When running a request on an alias, I only see the expected log message from 
> /admin requests, never for /select requests.
>
> Of course, if my configura

Re: [EXTERNAL] - Re: HTTP 401 when searching on alias in secured Solr

2020-06-12 Thread Jan Høydahl
I’d say, try the query with curl and enable http headers

curl -i --user admin:admin http://localhost:8983/solr/mycollection/select?q=*:*
curl -i --user admin:admin http://localhost:8983/solr/myalias/select?q=*:*

Are you saying that you see a difference between the two? What are the headers?

Jan

> On 12 Jun 2020, at 20:06, Isabelle Giguere wrote:
> 
> Hi Jan
> 
> Thank you for your time on this.
> 
> If I send a /select request directly on the alias (/solr/test/select), the 
> browser asks for credentials, but the Solr response returns status=401 and an 
> html error message with "HTTP ERROR 401 require authentication"
> 
> Obviously, my expectation was that some query results would be returned.
> 
> Since you can't reproduce the issue, I have to assume it's a configuration 
> issue.
> 
> So, if I may, let me provide as much details as I can about my setup.
> 
> Can anyone see something wrong here, some incompatibility ?
> 
> Solr 8.5.0
> 
> solrconfig.xml
> 7.1.0
> 
> 
> 
> 
>
>5
>5
>5
>
> 
> schema.xml
> version=1.6
> Some warnings on start-up about Trie* fields and deprecated filters (we 
> should fix that)
> 
> security.json in Zookeeper, at the Solr ZK root (provided on this thread)
> blockUnknown : (true|false) = no change in behavior for me, for this issue
> forwardCredentials : (true|false) = no change in behavior for me, for this 
> issue
> 
> No SSL
> 
> solr.in.sh
> SOLR_AUTH_TYPE="basic"
> SOLR_AUTHENTICATION_OPTS="-Dbasicauth=admin:admin"
> 
> start command params:
> solr start -force -c -m 4g -h <host> -p <port> -z <zkHost>:<chroot>
> 
> 
> Am I missing anything ?
> 
> Thank you.
> 
> 
> 
> My investigation so far:
> 
> I have set logging levels to TRACE for anything related to HTTP, HTTP2, 
> Authorization, Authentication...
> 
> Judging by a comment in 
> org.apache.solr.core.CoreContainer.setupHttpClientForAuthPlugin(Object), I 
> should see some logging from PKIAuthenticationPlugin, no matter what plugin 
> is actually used, and regardless if forwardCredentials is true or false:
> Comment:
> // Always register PKI auth interceptor, which will then delegate the 
> decision of who should secure
> // each request to the configured authentication plugin.
> 
> Expected log message from 
> org.apache.solr.security.PKIAuthenticationPlugin.setup(Http2SolrClient) 
> and/or from 
> org.apache.solr.security.PKIAuthenticationPlugin.HttpHeaderClientInterceptor.process(HttpRequest,
>  HttpContext)
> 
> When running a request on an alias, I only see the expected log message from 
> /admin requests, never for /select requests.
> 
> Of course, if my configuration is wrong, then my code and log analysis is 
> useless.
> 
> **
> 
> 
> Isabelle Giguère
> Computational Linguist & Java Developer
> Linguiste informaticienne & développeur java
> 
> 
> 
> From: Jan Høydahl
> Sent: 12 June 2020 06:55
> To: solr-user@lucene.apache.org
> Subject: Re: [EXTERNAL] - Re: HTTP 401 when searching on alias in secured Solr
> 
> Hi
> 
> I tried to reproduce, but I can successfully search both the collection and
> the alias. Both collection and alias prompt for password, and when giving the
> password the search succeeds.
> 
> What was your expectation?
> 
> Jan
> 
>> On 11 Jun 2020, at 16:53, Isabelle Giguere wrote:
>> 
>> Some extra info:
>> Collections have 1 shard, 1 replica.  Only 1 Solr node running.
>> 
>> The HTTP 401 is not intermittent, as reported in SOLR-13421 and SOLR-13510.
>> 
>> Any request to the alias fails.
>> 
>> Thanks;
>> 
>> Isabelle Giguère
>> Computational Linguist & Java Developer
>> Linguiste informaticienne & développeur java
>> 
>> 
>> 
>> From: Isabelle Giguere
>> Sent: 10 June 2020 16:11
>> To: solr-user@lucene.apache.org
>> Subject: Re: [EXTERNAL] - Re: HTTP 401 when searching on alias in secured Solr
>> 
>> Hi Jan;
>> 
>> Thank you for your reply.
>> 
>> This is security.json as seen in Zookeeper.  Credentials are admin / admin
>> 
>> {
>> "authentication":{
>>   "blockUnknown":false,
>>   "realm":"MTM Solr",
>>   "forwardCredentials":true,
>>   "class":"solr.BasicAuthPlugin",
>>   "credentials":{"admin":"0rTOgObKYwzSyPoYuj2su2/90eQCfysF1aasxTx+wrc= 
>> +tCMmpawYYtTsp3JfkG9avb8bKZ

Re: [EXTERNAL] - Re: HTTP 401 when searching on alias in secured Solr

2020-06-12 Thread Isabelle Giguere
Hi Jan

Thank you for your time on this.

If I send a /select request directly on the alias (/solr/test/select), the 
browser asks for credentials, but the Solr response returns status=401 and an 
html error message with "HTTP ERROR 401 require authentication"

Obviously, my expectation was that some query results would be returned.

Since you can't reproduce the issue, I have to assume it's a configuration 
issue.

So, if I may, let me provide as much details as I can about my setup.

Can anyone see something wrong here, some incompatibility ?

Solr 8.5.0

solrconfig.xml
7.1.0





5
5
5


schema.xml
version=1.6
Some warnings on start-up about Trie* fields and deprecated filters (we should 
fix that)

security.json in Zookeeper, at the Solr ZK root (provided on this thread)
blockUnknown : (true|false) = no change in behavior for me, for this issue
forwardCredentials : (true|false) = no change in behavior for me, for this issue

No SSL

solr.in.sh
SOLR_AUTH_TYPE="basic"
SOLR_AUTHENTICATION_OPTS="-Dbasicauth=admin:admin"

start command params:
solr start -force -c -m 4g -h <host> -p <port> -z <zk-host>:<zk-port>/<zk-root>


Am I missing anything ?

Thank you.



My investigation so far:

I have set logging levels to TRACE for anything related to HTTP, HTTP2, 
Authorization, Authentication...
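
(In Solr 8.5 this can be done from the admin UI's Logging screen, or with a 
line like the following in server/resources/log4j2.xml; that this one package 
covers the auth code is my assumption:)

<Logger name="org.apache.solr.security" level="TRACE"/>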

Judging by a comment in 
org.apache.solr.core.CoreContainer.setupHttpClientForAuthPlugin(Object), I 
should see some logging from PKIAuthenticationPlugin, no matter what plugin is 
actually used, and regardless if forwardCredentials is true or false:
Comment:
// Always register PKI auth interceptor, which will then delegate the decision 
of who should secure
// each request to the configured authentication plugin.

Expected log message from 
org.apache.solr.security.PKIAuthenticationPlugin.setup(Http2SolrClient) and/or 
from 
org.apache.solr.security.PKIAuthenticationPlugin.HttpHeaderClientInterceptor.process(HttpRequest,
 HttpContext)

When running a request on an alias, I only see the expected log message from 
/admin requests, never for /select requests.

Of course, if my configuration is wrong, then my code and log analysis is 
useless.
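
For completeness, the 401 can be reproduced outside the browser too. A minimal 
SolrJ sketch, assuming SolrJ 8.5, the collections and alias described earlier, 
and the admin/admin credentials (the class name is made up):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.QueryRequest;

public class AliasAuthCheck {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient client =
                new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
            QueryRequest req = new QueryRequest(new SolrQuery("*:*"));
            req.setBasicAuthCredentials("admin", "admin"); // Basic Auth on this request
            // direct collection: succeeds
            System.out.println("test1: " + req.process(client, "test1").getResults().getNumFound());
            // alias: this is the request that comes back with HTTP 401
            System.out.println("test:  " + req.process(client, "test").getResults().getNumFound());
        }
    }
}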

**


Isabelle Giguère
Computational Linguist & Java Developer
Linguiste informaticienne & développeur java



De : Jan Høydahl 
Envoyé : 12 juin 2020 06:55
À : solr-user@lucene.apache.org 
Objet : Re: [EXTERNAL] - Re: HTTP 401 when searching on alias in secured Solr

Hi

I tried to reproduce, but I can successfully search both the collection and the 
alias. Both collection and alias prompt for a password, and when given the 
password the search succeeds.

What was your expectation?

Jan

> 11. jun. 2020 kl. 16:53 skrev Isabelle Giguere 
> :
>
> Some extra info:
> Collections have 1 shard, 1 replica.  Only 1 Solr node running.
>
> The HTTP 401 is not intermittent, as reported in SOLR-13421 and SOLR-13510.
>
> Any request to the alias fails.
>
> Thanks;
>
> Isabelle Giguère
> Computational Linguist & Java Developer
> Linguiste informaticienne & développeur java
>
>
> 
> De : Isabelle Giguere 
> Envoyé : 10 juin 2020 16:11
> À : solr-user@lucene.apache.org 
> Objet : Re: [EXTERNAL] - Re: HTTP 401 when searching on alias in secured Solr
>
> Hi Jan;
>
> Thank you for your reply.
>
> This is security.json as seen in Zookeeper.  Credentials are admin / admin
>
> {
>  "authentication":{
>"blockUnknown":false,
>"realm":"MTM Solr",
>"forwardCredentials":true,
>"class":"solr.BasicAuthPlugin",
>"credentials":{"admin":"0rTOgObKYwzSyPoYuj2su2/90eQCfysF1aasxTx+wrc= 
> +tCMmpawYYtTsp3JfkG9avb8bKZlm/IGTZirsufYvns="},
>"":{"v":2}},
>  "authorization":{
>"class":"solr.RuleBasedAuthorizationPlugin",
>"permissions":[{
>"name":"all",
>"role":"admin"}],
>"user-role":{"admin":"admin"},
>"":{"v":8}}}
>
> Thanks for feedback
>
> Isabelle Giguère
> Computational Linguist & Java Developer
> Linguiste informaticienne & développeur java
>
>
> 
> De : Jan Høydahl 
> Envoyé : 10 juin 2020 16:01
> À : solr-user@lucene.apache.org 
> Objet : [EXTERNAL] - Re: HTTP 401 when searching on alias in secured Solr
>
> Please share your security.json file
>
> Jan Høydahl
>
>> 10. jun. 2020 kl. 21:53 skrev Isabelle Giguere 
>> :
>>
>> Hi;
>>
>> I'm using Solr 8.5.0.  I have uploaded security.json to Zookeeper.  I can 

Re: [EXTERNAL] - Re: HTTP 401 when searching on alias in secured Solr

2020-06-12 Thread Jan Høydahl
Hi

I tried to reproduce, but I can successfully search both the collection and the 
alias. Both collection and alias prompt for a password, and when given the 
password the search succeeds.

What was your expectation?

Jan

> 11. jun. 2020 kl. 16:53 skrev Isabelle Giguere 
> :
> 
> Some extra info:
> Collections have 1 shard, 1 replica.  Only 1 Solr node running.
> 
> The HTTP 401 is not intermittent, as reported in SOLR-13421 and SOLR-13510.
> 
> Any request to the alias fails.
> 
> Thanks;
> 
> Isabelle Giguère
> Computational Linguist & Java Developer
> Linguiste informaticienne & développeur java
> 
> 
> 
> De : Isabelle Giguere 
> Envoyé : 10 juin 2020 16:11
> À : solr-user@lucene.apache.org 
> Objet : Re: [EXTERNAL] - Re: HTTP 401 when searching on alias in secured Solr
> 
> Hi Jan;
> 
> Thank you for your reply.
> 
> This is security.json as seen in Zookeeper.  Credentials are admin / admin
> 
> {
>  "authentication":{
>"blockUnknown":false,
>"realm":"MTM Solr",
>"forwardCredentials":true,
>"class":"solr.BasicAuthPlugin",
>"credentials":{"admin":"0rTOgObKYwzSyPoYuj2su2/90eQCfysF1aasxTx+wrc= 
> +tCMmpawYYtTsp3JfkG9avb8bKZlm/IGTZirsufYvns="},
>"":{"v":2}},
>  "authorization":{
>"class":"solr.RuleBasedAuthorizationPlugin",
>"permissions":[{
>"name":"all",
>"role":"admin"}],
>"user-role":{"admin":"admin"},
>"":{"v":8}}}
> 
> Thanks for feedback
> 
> Isabelle Giguère
> Computational Linguist & Java Developer
> Linguiste informaticienne & développeur java
> 
> 
> 
> De : Jan Høydahl 
> Envoyé : 10 juin 2020 16:01
> À : solr-user@lucene.apache.org 
> Objet : [EXTERNAL] - Re: HTTP 401 when searching on alias in secured Solr
> 
> Please share your security.json file
> 
> Jan Høydahl
> 
>> 10. jun. 2020 kl. 21:53 skrev Isabelle Giguere 
>> :
>> 
>> Hi;
>> 
>> I'm using Solr 8.5.0.  I have uploaded security.json to Zookeeper.  I can 
>> log in the Solr Admin UI.  I can create collections and aliases, and I can 
>> index documents in Solr.
>> 
>> Collections : test1, test2
>> Alias: test (combines test1, test2)
>> 
>> Indexed document "solr-word.pdf" in collection test1
>> 
>> Searching on a collection works:
>> http://localhost:8983/solr/test1/select?q=*:*&wt=xml
>> 
>> 
>> But searching on an alias results in HTTP 401
>> http://localhost:8983/solr/test/select?q=*:*&wt=xml
>> 
>> Error from server at null: Expected mime type application/octet-stream but
>> got text/html (an HTML 401 error page):
>> Error 401 Authentication failed, Response code: 401
>> HTTP ERROR 401 Authentication failed, Response code: 401
>> URI: /solr/test1_shard1_replica_n1/select
>> STATUS: 401
>> MESSAGE: Authentication failed, Response code: 401
>> SERVLET: default
>> 
>> Even if https://issues.apache.org/jira/browse/SOLR-13510 is fixed in Solr 
>> 8.5.0, I did try to start Solr with -Dsolr.http1=true, 
>> and I set "forwardCredentials":true in security.json.
>> 
>> Nothing works.  I just cannot use aliases when Solr is secured.
>> 
>> Can anyone confirm if this may be a configuration issue, or if this could 
>> possibly be a bug ?
>> 
>> Thank you;
>> 
>> Isabelle Giguère
>> Computational Linguist & Java Developer
>> Linguiste informaticienne & développeur java
>> 
>> 



Re: [EXTERNAL] - Re: HTTP 401 when searching on alias in secured Solr

2020-06-11 Thread Isabelle Giguere
Some extra info:
Collections have 1 shard, 1 replica.  Only 1 Solr node running.

The HTTP 401 is not intermittent, as reported in SOLR-13421 and SOLR-13510.

Any request to the alias fails.

Thanks;

Isabelle Giguère
Computational Linguist & Java Developer
Linguiste informaticienne & développeur java



De : Isabelle Giguere 
Envoyé : 10 juin 2020 16:11
À : solr-user@lucene.apache.org 
Objet : Re: [EXTERNAL] - Re: HTTP 401 when searching on alias in secured Solr

Hi Jan;

Thank you for your reply.

This is security.json as seen in Zookeeper.  Credentials are admin / admin

{
  "authentication":{
"blockUnknown":false,
"realm":"MTM Solr",
"forwardCredentials":true,
"class":"solr.BasicAuthPlugin",
"credentials":{"admin":"0rTOgObKYwzSyPoYuj2su2/90eQCfysF1aasxTx+wrc= 
+tCMmpawYYtTsp3JfkG9avb8bKZlm/IGTZirsufYvns="},
"":{"v":2}},
  "authorization":{
"class":"solr.RuleBasedAuthorizationPlugin",
"permissions":[{
"name":"all",
"role":"admin"}],
"user-role":{"admin":"admin"},
"":{"v":8}}}

Thanks for feedback

Isabelle Giguère
Computational Linguist & Java Developer
Linguiste informaticienne & développeur java



De : Jan Høydahl 
Envoyé : 10 juin 2020 16:01
À : solr-user@lucene.apache.org 
Objet : [EXTERNAL] - Re: HTTP 401 when searching on alias in secured Solr

Please share your security.json file

Jan Høydahl

> 10. jun. 2020 kl. 21:53 skrev Isabelle Giguere 
> :
>
> Hi;
>
> I'm using Solr 8.5.0.  I have uploaded security.json to Zookeeper.  I can log 
> in the Solr Admin UI.  I can create collections and aliases, and I can index 
> documents in Solr.
>
> Collections : test1, test2
> Alias: test (combines test1, test2)
>
> Indexed document "solr-word.pdf" in collection test1
>
> Searching on a collection works:
> http://localhost:8983/solr/test1/select?q=*:*&wt=xml
> 
>
> But searching on an alias results in HTTP 401
> http://localhost:8983/solr/test/select?q=*:*&wt=xml
>
> Error from server at null: Expected mime type application/octet-stream but
> got text/html (an HTML 401 error page):
> Error 401 Authentication failed, Response code: 401
> HTTP ERROR 401 Authentication failed, Response code: 401
> URI: /solr/test1_shard1_replica_n1/select
> STATUS: 401
> MESSAGE: Authentication failed, Response code: 401
> SERVLET: default
>
> Even if https://issues.apache.org/jira/browse/SOLR-13510 is fixed in Solr 
> 8.5.0, I did try to start Solr with -Dsolr.http1=true, and 
> I set "forwardCredentials":true in security.json.
>
> Nothing works.  I just cannot use aliases when Solr is secured.
>
> Can anyone confirm if this may be a configuration issue, or if this could 
> possibly be a bug ?
>
> Thank you;
>
> Isabelle Giguère
> Computational Linguist & Java Developer
> Linguiste informaticienne & développeur java
>
>


Re: [EXTERNAL] - Re: HTTP 401 when searching on alias in secured Solr

2020-06-10 Thread Isabelle Giguere
Hi Jan;

Thank you for your reply.

This is security.json as seen in Zookeeper.  Credentials are admin / admin

{
  "authentication":{
"blockUnknown":false,
"realm":"MTM Solr",
"forwardCredentials":true,
"class":"solr.BasicAuthPlugin",
"credentials":{"admin":"0rTOgObKYwzSyPoYuj2su2/90eQCfysF1aasxTx+wrc= 
+tCMmpawYYtTsp3JfkG9avb8bKZlm/IGTZirsufYvns="},
"":{"v":2}},
  "authorization":{
"class":"solr.RuleBasedAuthorizationPlugin",
"permissions":[{
"name":"all",
"role":"admin"}],
"user-role":{"admin":"admin"},
"":{"v":8}}}

Thanks for feedback

Isabelle Giguère
Computational Linguist & Java Developer
Linguiste informaticienne & développeur java



De : Jan Høydahl 
Envoyé : 10 juin 2020 16:01
À : solr-user@lucene.apache.org 
Objet : [EXTERNAL] - Re: HTTP 401 when searching on alias in secured Solr

Please share your security.json file

Jan Høydahl

> 10. jun. 2020 kl. 21:53 skrev Isabelle Giguere 
> :
>
> Hi;
>
> I'm using Solr 8.5.0.  I have uploaded security.json to Zookeeper.  I can log 
> in the Solr Admin UI.  I can create collections and aliases, and I can index 
> documents in Solr.
>
> Collections : test1, test2
> Alias: test (combines test1, test2)
>
> Indexed document "solr-word.pdf" in collection test1
>
> Searching on a collection works:
> http://localhost:8983/solr/test1/select?q=*:*&wt=xml
> 
>
> But searching on an alias results in HTTP 401
> http://localhost:8983/solr/test/select?q=*:*&wt=xml
>
> Error from server at null: Expected mime type application/octet-stream but
> got text/html (an HTML 401 error page):
> Error 401 Authentication failed, Response code: 401
> HTTP ERROR 401 Authentication failed, Response code: 401
> URI: /solr/test1_shard1_replica_n1/select
> STATUS: 401
> MESSAGE: Authentication failed, Response code: 401
> SERVLET: default
>
> Even if https://issues.apache.org/jira/browse/SOLR-13510 is fixed in Solr 
> 8.5.0, I did try to start Solr with -Dsolr.http1=true, and 
> I set "forwardCredentials":true in security.json.
>
> Nothing works.  I just cannot use aliases when Solr is secured.
>
> Can anyone confirm if this may be a configuration issue, or if this could 
> possibly be a bug ?
>
> Thank you;
>
> Isabelle Giguère
> Computational Linguist & Java Developer
> Linguiste informaticienne & développeur java
>
>


Re: HTTP 401 when searching on alias in secured Solr

2020-06-10 Thread Jan Høydahl
Please share your security.json file

Jan Høydahl

> 10. jun. 2020 kl. 21:53 skrev Isabelle Giguere 
> :
> 
> Hi;
> 
> I'm using Solr 8.5.0.  I have uploaded security.json to Zookeeper.  I can log 
> in the Solr Admin UI.  I can create collections and aliases, and I can index 
> documents in Solr.
> 
> Collections : test1, test2
> Alias: test (combines test1, test2)
> 
> Indexed document "solr-word.pdf" in collection test1
> 
> Searching on a collection works:
> http://localhost:8983/solr/test1/select?q=*:*&wt=xml
> 
> 
> But searching on an alias results in HTTP 401
> http://localhost:8983/solr/test/select?q=*:*&wt=xml
> 
> Error from server at null: Expected mime type application/octet-stream but
> got text/html (an HTML 401 error page):
> Error 401 Authentication failed, Response code: 401
> HTTP ERROR 401 Authentication failed, Response code: 401
> URI: /solr/test1_shard1_replica_n1/select
> STATUS: 401
> MESSAGE: Authentication failed, Response code: 401
> SERVLET: default
> 
> Even if https://issues.apache.org/jira/browse/SOLR-13510 is fixed in Solr 
> 8.5.0, I did try to start Solr with -Dsolr.http1=true, and I set 
> "forwardCredentials":true in security.json.
> 
> Nothing works.  I just cannot use aliases when Solr is secured.
> 
> Can anyone confirm if this may be a configuration issue, or if this could 
> possibly be a bug ?
> 
> Thank you;
> 
> Isabelle Giguère
> Computational Linguist & Java Developer
> Linguiste informaticienne & développeur java
> 
> 


HTTP 401 when searching on alias in secured Solr

2020-06-10 Thread Isabelle Giguere
Hi;

I'm using Solr 8.5.0.  I have uploaded security.json to Zookeeper.  I can log 
in the Solr Admin UI.  I can create collections and aliases, and I can index 
documents in Solr.

Collections : test1, test2
Alias: test (combines test1, test2)

Indexed document "solr-word.pdf" in collection test1

Searching on a collection works:
http://localhost:8983/solr/test1/select?q=*:*&wt=xml


But searching on an alias results in HTTP 401
http://localhost:8983/solr/test/select?q=*:*&wt=xml

Error from server at null: Expected mime type application/octet-stream but got
text/html (an HTML 401 error page):
Error 401 Authentication failed, Response code: 401
HTTP ERROR 401 Authentication failed, Response code: 401
URI: /solr/test1_shard1_replica_n1/select
STATUS: 401
MESSAGE: Authentication failed, Response code: 401
SERVLET: default

Even if https://issues.apache.org/jira/browse/SOLR-13510 is fixed in Solr 
8.5.0, I did try to start Solr with -Dsolr.http1=true, and I set 
"forwardCredentials":true in security.json.

Nothing works.  I just cannot use aliases when Solr is secured.

Can anyone confirm if this may be a configuration issue, or if this could 
possibly be a bug ?

Thank you;

Isabelle Giguère
Computational Linguist & Java Developer
Linguiste informaticienne & développeur java




Re: Searching individual pages in solr

2020-03-24 Thread Erick Erickson
Well, given the structure of an inverted index, how would you have a clue what 
page the hit was on? You could conceivably index enough data with payloads and 
the like, but that’d cause a lot more bloat than just indexing each page.

Using grouping would allow you to show, say, the top three pages from the books 
with the highest score on an individual page basis.
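
A rough SolrJ sketch of such a grouped query, assuming each page-document 
carries a hypothetical book_id field tying it back to its book:

    SolrQuery q = new SolrQuery("text:whale"); // made-up search term
    q.set("group", true);                      // enable result grouping
    q.set("group.field", "book_id");           // one group per book
    q.set("group.limit", 3);                   // top three pages per book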

But there are complications (aren’t there always?). Consider a page with one 
sentence. Indexed as an individual document, it might score quite high even if 
not the best choice. Or any embedded illustrations, what do you do with those? 
Index the caption as part of the text? Ignore the caption? Etc.

I’d certainly start with a doc-per-page. Not quite sure what I’d do with the 
title and such, but that depends on your use-case.

Best,
Erick

> On Mar 24, 2020, at 12:22 PM, Dustin Lebsock  
> wrote:
> 
> Hi!
> 
> I'm looking for some guidance on engineering a solution for searching 
> individual pages of PDF documents. I currently have a SolrCloud setup that 
> uses an external tika server to extract text data from PDFs. I'd like to be 
> able to search individual pages for search results and for the overall 
> documents themselves (such as titles that link to external repo). I'm having 
> trouble coming up with a clean solution.
> 
> I ran across a discussion on stackoverflow about this found here:
> https://stackoverflow.com/a/50160163
> 
> I can't really see the pros and cons of indexing a single document with 
> multiple fields for each page versus indexing each page separately and using 
> group queries. What does the Solr community recommend?
> 
> Thank you for all the help!
> 
> Dustin Lebsock



Searching individual pages in solr

2020-03-24 Thread Dustin Lebsock
Hi!

I'm looking for some guidance on engineering a solution for searching 
individual pages of PDF documents. I currently have a SolrCloud setup that uses 
an external tika server to extract text data from PDFs. I'd like to be able to 
search individual pages for search results and for the overall documents 
themselves (such as titles that link to external repo). I'm having trouble 
coming up with a clean solution.

I ran across a discussion on stackoverflow about this found here:
https://stackoverflow.com/a/50160163

I can't really see the pros and cons of indexing a single document with 
multiple fields for each page versus indexing each page separately and using 
group queries. What does the Solr community recommend?

Thank you for all the help!

Dustin Lebsock


Re: Need help in GeoSpatial Searching into Solr Server

2019-12-23 Thread Erick Erickson
Why are you using a text field for location? You must use the proper field type.

You need to follow the instructions in the “spatial search” section of
the reference guide, here’s the ref guide for Solr 7:

https://lucene.apache.org/solr/guide/7_7/spatial-search.html
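
Once the field has a spatial type (e.g. solr.LatLonPointSpatialField from that 
page), a nearby-points search is just a filter query. A SolrJ sketch, with the 
field name and coordinates made up:

    SolrQuery q = new SolrQuery("*:*");
    // keep only documents whose location is within 10 km of the given point
    q.addFilterQuery("{!geofilt sfield=location pt=28.61,77.21 d=10}");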

Best,
Erick


> On Dec 23, 2019, at 6:53 AM, niraj kumar  wrote:
> 
> I have 100 documents in Solr; the type of the location field is
> *org.apache.solr.schema.TextField.*
> 
> I am unable to run any query to search nearby points with reference to that
> field.
> 
> So could you help with it, or provide a sample program in Java with the
> same kind of implementation?
> 
> 
> Thanks,
> Niraj



Need help in GeoSpatial Searching into Solr Server

2019-12-23 Thread niraj kumar
I have 100 documents in Solr; the type of the location field is
*org.apache.solr.schema.TextField.*

I am unable to run any query to search nearby points with reference to that
field.

So could you help with it, or provide a sample program in Java with the
same kind of implementation?


Thanks,
Niraj


Re: Sometimes searching slow in Solr 6.1.0

2019-12-13 Thread Shawn Heisey

On 12/13/2019 12:29 AM, vishal patel wrote:

We have 2 shards and 2 replicas in our live environment. Total of 26 
collections. We give 64GB of RAM to a single Solr instance.


Are you saying that the machine has 64GB of memory, or that the Java 
heap for Solr is 64GB?


Looking over the list history, it looks like you've started other 
threads about performance problems, and I have advised you on some of 
them.  From what you had said on those threads, I thought you were on a 
much newer version than 6.1.0.


You've been given the following wiki page before.  It would be a good 
idea to read it fully, and come back with any questions you have about it.


https://cwiki.apache.org/confluence/display/solr/SolrPerformanceProblems#Asking_for_help_on_a_memory.2Fperformance_issue


I have faced a slow searching issue in our live environment. In our scenario, 
many update requests come within minutes, like 50,000. At that time 
searching becomes slow.
The query is normal but takes 4 to 5 seconds. When the same query is executed 
again later, it does not take time.
Our solr config details:
* autoCommit is 60000 and autoSoftCommit is 100.


The maxTime values for those are in milliseconds.  So if you have 
maxTime for autoSoftCommit set to 100, that's a tenth of a second. 
There's no way that setting can be healthy.  The autoSoftCommit setting 
should be much longer.  I'd start at two minutes (120000) and adjust 
from there.  If it's maxDocs that's set to 100, that is also way too 
small.  I would recommend only setting maxTime.
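
As a sketch, that starting point would look something like this in 
solrconfig.xml (values in milliseconds; the hard-commit block just restates 
the existing setting):

    <autoCommit>
      <maxTime>60000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>120000</maxTime>
    </autoSoftCommit>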


Thanks,
Shawn


Sometimes searching slow in Solr 6.1.0

2019-12-12 Thread vishal patel
We have 2 shards and 2 replicas in our live environment. Total of 26 
collections. We give 64GB of RAM to a single Solr instance.
I have faced a slow searching issue in our live environment. In our scenario, 
many update requests come within minutes, like 50,000. At that time 
searching becomes slow.
The query is normal but takes 4 to 5 seconds. When the same query is executed 
again later, it does not take time.
Our solr config details:
* autoCommit is 60000 and autoSoftCommit is 100.
* No caching added in config.

Can we handle the update thread priority? Why is searching slow at that time? 
Can we monitor update and search requests for SolrCloud performance?

Regards,
Vishal


Aw: Searching a nested structure. Can not retrieve parents with all corresponding childs

2019-12-09 Thread Marco Ibscher
Hi there,
 
on stackoverflow I got the advice to delete the _nest_path_ field. Without it I 
can use the parent filter without getting the "Parent filter should not be sent 
when the schema is nested" error message. For example:
 
q={!parent which=doc_type:parent}&fl=id,[child parentFilter=doc_type:parent 
childFilter=doc_type:child]&rows=200
 

I am still confused why it does not work with the _nest_path_ field and 
thankful for any advice, but right now I can work with this "solution".
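
(As far as I can tell from the ref guide: when the schema defines _nest_path_, 
the [child] transformer derives the hierarchy from that field on its own, 
which appears to be why an explicit parentFilter is rejected. With _nest_path_ 
kept, the query would presumably reduce to:

q={!parent which=doc_type:parent}&fl=id,[child]&rows=200
)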
 
Best regards
Marco

Gesendet: Mittwoch, 04. Dezember 2019 um 16:42 Uhr
Von: "Marco Ibscher" 
An: solr-user@lucene.apache.org
Betreff: Searching a nested structure. Can not retrieve parents with all 
corresponding childs
Hi there,

I have problems retrieving data in the nested structure in which it is indexed 
in Solr 8.2:

I have a product database with products as the parent element and size/color 
combinations as the child elements. The data is imported with the data import 
handler:

[data-config.xml stripped by the archive]
I can get all childs for a certain parent or all parents for a certain child, 
using the Block Join Query Parser (so the nested structure is working). But I 
cannot retrieve parents with the corresponding childs.

I tried the following query:

q={!parent which="id:1"}&fl=*,[child]&rows=200
It returns the parent document but not the corresponding child documents. I 
don't get any error message. I also checked the log file.

I also tried adding a childFilter or a parentFilter:

q={!parent which=doc_type:parent}&fl=id,[child parentFilter=doc_type:parent 
childFilter=doc_type:child]&rows=200

Using the parentFilter ends with the error message "Parent filter should not be 
sent when the schema is nested". The childFilter does not change the result 
(all parents, no childs).

Important schema fields:

[schema field definitions stripped by the archive]
Can anyone help? I also posted this problem on stackoverflow: 
https://stackoverflow.com/questions/59162038/searching-a-nested-structure-can-not-retirve-parents-with-all-corresponding-chi

Thank you.

Marco


Searching a nested structure. Can not retrieve parents with all corresponding childs

2019-12-04 Thread Marco Ibscher
Hi there,

I have problems retrieving data in the nested structure in which it is indexed 
in Solr 8.2:

I have a product database with products as the parent element and size/color 
combinations as the child elements. The data is imported with the data import 
handler:

[data-config.xml stripped by the archive]
I can get all childs for a certain parent or all parents for a certain child, 
using the Block Join Query Parser (so the nested structure is working). But I 
cannot retrieve parents with the corresponding childs.

I tried the following query:

q={!parent which="id:1"}&fl=*,[child]&rows=200
It returns the parent document but not the corresponding child documents. I 
don't get any error message. I also checked the log file.

I also tried adding a childFilter or a parentFilter:

q={!parent which=doc_type:parent}&fl=id,[child parentFilter=doc_type:parent 
childFilter=doc_type:child]&rows=200

Using the parentFilter ends with the error message "Parent filter should not be 
sent when the schema is nested". The childFilter does not change the result 
(all parents, no childs).

Important schema fields:

[schema field definitions stripped by the archive]

Can anyone help? I also posted this problem on stackoverflow: 
https://stackoverflow.com/questions/59162038/searching-a-nested-structure-can-not-retirve-parents-with-all-corresponding-chi

Thank you.

Marco


RE: Require searching only for file content and not metadata

2019-08-29 Thread Khare, Kushal (MIND)
I have been working on the same and finding out why I am not getting any data 
in TextHandler or Metadata.
For that, I first tried creating just the parser to extract content from the 
documents using the Tika AutoDetectParser. Finally, I found out that I was 
missing a jar. So, this separate plain-text parser worked for me. But now, when 
I try to run the code that I shared with you, it is missing some classes. That's 
probably some jar conflict.

PLAIN TEXT PARSING CODE:

package mind.solr;

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

public class ParsingExample {

    public void parseExample() throws IOException, SAXException, TikaException {
        AutoDetectParser parser = new AutoDetectParser();
        BodyContentHandler handler = new BodyContentHandler();
        Metadata metadata = new Metadata();
        // try (InputStream stream = ParsingExample.class.getResourceAsStream("/TestDocx.docx"))
        try (FileInputStream fin = new FileInputStream("D:\\docs\\TestA3.docx")) {
            parser.parse(fin, handler, metadata);

            String text = handler.toString();
            System.out.println("output :" + text);
        }
    }

    public static void main(String[] args) throws IOException, SAXException, TikaException {
        ParsingExample ps = new ParsingExample();
        ps.parseExample();
        //System.out.println("output :"+out);
    }
}



JARS USED:
solr-solrj-8.0.0.jar
tika-app-1.8.jar

I get the document content in the handler finally.

But now, when I move to my Solr indexing code to run it and accordingly define 
my fields for the extracted content, I get an error. Following is the error 
that I get:

Exception in thread "main" java.lang.NoSuchFieldError: INSTANCE
at org.apache.http.conn.ssl.SSLConnectionSocketFactory.<clinit>(SSLConnectionSocketFactory.java:146)
at org.apache.solr.client.solrj.impl.HttpClientUtil$DefaultSchemaRegistryProvider.getSchemaRegistry(HttpClientUtil.java:235)
at org.apache.solr.client.solrj.impl.HttpClientUtil.createPoolingConnectionManager(HttpClientUtil.java:260)
at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:255)
at org.apache.solr.client.solrj.impl.HttpSolrClient.<init>(HttpSolrClient.java:201)
at org.apache.solr.client.solrj.impl.HttpSolrClient$Builder.build(HttpSolrClient.java:964)
at mind.solr.solrJExtract.<init>(solrJExtract.java:50)
at mind.solr.solrJExtract.main(solrJExtract.java:35)

I found that it's because of some HTTP client jar conflict, but I am unable to 
resolve it.
I request your help on what the issue could be and how it could be 
resolved.
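
(One way to see which jar the conflicting class is actually loaded from is to 
print its code source, e.g.:

System.out.println(org.apache.http.conn.ssl.SSLConnectionSocketFactory.class
        .getProtectionDomain().getCodeSource().getLocation());

If that points at tika-app-1.8.jar rather than a standalone httpclient jar, 
the fat tika-app jar is probably shadowing the httpclient version SolrJ 
expects.)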

Thanks!

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 29 August 2019 16:57
To: solr-user@lucene.apache.org
Subject: Re: Require searching only for file content and not metadata

I already provided feedback, you haven’t evidenced any attempt to follow up on 
it.

Best,
Erick

> On Aug 29, 2019, at 4:54 AM, Khare, Kushal (MIND) 
>  wrote:
>
> Erick,
> I am using the code that I posted yesterday. But, am not getting anything in 
> 'texthandler.toString'. Please check my snippet once and guide. Because, I 
> think I am very close to my requirement yet stuck here. I also debugged my 
> code. It is not going inside doTikaDocuments() & giving Null Pointer 
> Exception.
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: 28 August 2019 16:50
> To: solr-user@lucene.apache.org
> Subject: Re: Require searching only for file content and not metadata
>
> Attachments are aggressively stripped, so you'll have to either 
> post it someplace and provide a link or paste the relevant sections into the 
> e-mail.
>
> You’re not getting any metadata because you’re not adding any metadata
> to the documents with doc.addField(“metadatafield1”,
> value_of_metadata_field1);
>
> The only thing ever in the doc is what you explicitly put there. At this 
> point it’s just “id” and “_text_”.
>
> As for why _text_ isn’t showing up, does the schema have ’stored=“true”’ for 
> the field? And when you query, are you specifying fl=_text_? _text_ is 
> usually a catch-all field in the default schemas with this definition:
>
> <field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true"/>
>
> Since stored=false, well, it’s not stored so can’t be returned. If you’re 
> successfully _searching_ on that field but not getting it back in the “fl” 
> list, this is almost certainly a stored=“false” issue.
>
> As for why you might have gotten all the metadata in this field with the post 
> tool, check that there are no “copyField” directives in the schema that 
> automatically copy other 

Re: Require searching only for file content and not metadata

2019-08-29 Thread Erick Erickson
I already provided feedback, you haven’t evidenced any attempt to follow up on 
it.

Best,
Erick

> On Aug 29, 2019, at 4:54 AM, Khare, Kushal (MIND) 
>  wrote:
> 
> Erick,
> I am using the code that I posted yesterday. But I am not getting anything in 
> 'texthandler.toString'. Please check my snippet once and guide. Because, I 
> think I am very close to my requirement yet stuck here. I also debugged my 
> code. It is not going inside doTikaDocuments() & giving Null Pointer 
> Exception.
> 
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: 28 August 2019 16:50
> To: solr-user@lucene.apache.org
> Subject: Re: Require searching only for file content and not metadata
> 
> Attachments are aggressively stripped, so you'll have to either 
> post it someplace and provide a link or paste the relevant sections into the 
> e-mail.
> 
> You’re not getting any metadata because you’re not adding any metadata to the 
> documents with doc.addField(“metadatafield1”, value_of_metadata_field1);
> 
> The only thing ever in the doc is what you explicitly put there. At this 
> point it’s just “id” and “_text_”.
> 
> As for why _text_ isn’t showing up, does the schema have ’stored=“true”’ for 
> the field? And when you query, are you specifying fl=_text_? _text_ is 
> usually a catch-all field in the default schemas with this definition:
> 
> <field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true"/>
> 
> Since stored=false, well, it’s not stored so can’t be returned. If you’re 
> successfully _searching_ on that field but not getting it back in the “fl” 
> list, this is almost certainly a stored=“false” issue.
> 
> As for why you might have gotten all the metadata in this field with the post 
> tool, check that there are no “copyField” directives in the schema that 
> automatically copy other data into _text_.
> 
> Best,
> Erick
> 
> 
> 
>> On Aug 28, 2019, at 7:03 AM, Khare, Kushal (MIND) 
>>  wrote:
>> 
>> Attaching managed-schema.xml
>> 
>> -----Original Message-
>> From: Khare, Kushal (MIND) [mailto:kushal.kh...@mind-infotech.com]
>> Sent: 28 August 2019 16:30
>> To: solr-user@lucene.apache.org
>> Subject: RE: Require searching only for file content and not metadata
>> 
>> I already tried this example; I am currently working on it. I have 
>> compiled the code, and it is indexing the documents. But it is not adding 
>> anything to the field _text_. Also, it is not giving any metadata.
>> doc.addField("_text_", textHandler.toString()); --> here, 
>> textHandler.toString() is blank for all the 40 documents. All I am getting 
>> is the 'id' & 'version' field.
>> 
>> This is the code that I tried :
>> 
>> package mind.solr;
>> 
>> import org.apache.solr.client.solrj.SolrServerException;
>> import org.apache.solr.client.solrj.impl.HttpSolrClient;
>> import org.apache.solr.client.solrj.impl.XMLResponseParser;
>> import org.apache.solr.client.solrj.response.UpdateResponse;
>> import org.apache.solr.common.SolrInputDocument;
>> import org.apache.tika.metadata.Metadata;
>> import org.apache.tika.parser.AutoDetectParser;
>> import org.apache.tika.parser.ParseContext;
>> import org.apache.tika.sax.BodyContentHandler;
>> import org.xml.sax.ContentHandler;
>> 
>> import java.io.File;
>> import java.io.FileInputStream;
>> import java.io.IOException;
>> import java.io.InputStream;
>> import java.util.ArrayList;
>> import java.util.Collection;
>> 
>> public class solrJExtract {
>> 
>> private HttpSolrClient client;
>> private long start = System.currentTimeMillis();  private
>> AutoDetectParser autoParser;  private int totalTika = 0;  private int
>> totalSql = 0;
>> 
>> @SuppressWarnings("rawtypes")
>> private Collection docList = new ArrayList();
>> 
>> 
>> public static void main(String[] args) {
>>   try {
>>   solrJExtract idxer = new solrJExtract("http://localhost:8983/solr/tika");
>>   idxer.doTikaDocuments(new File("D:\\docs"));
>>   idxer.endIndexing();
>>   } catch (Exception e) {
>> e.printStackTrace();
>>   }
>> }
>> 
>> private  solrJExtract(String url) throws IOException, SolrServerException {
>>   // Create a SolrCloud-aware client to send docs to Solr
>>   // Use something like HttpSolrClient for stand-alone
>> 
>>   client = new HttpSolrClient.Builder("http://localhost:8983/solr/tika")
>>   .withConnectionTimeout(10000)
>>   .withSocketTimeout(60000)
>>   .build();
>> 
>>   // binary parser 

RE: Require searching only for file content and not metadata

2019-08-29 Thread Khare, Kushal (MIND)
Erick,
I am using the code that I posted yesterday. But I am not getting anything in 
'texthandler.toString'. Please check my snippet once and guide. Because, I 
think I am very close to my requirement yet stuck here. I also debugged my 
code. It is not going inside doTikaDocuments() & giving Null Pointer Exception.

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 28 August 2019 16:50
To: solr-user@lucene.apache.org
Subject: Re: Require searching only for file content and not metadata

Attachments are aggressively stripped, so you'll have to either 
post it someplace and provide a link or paste the relevant sections into the 
e-mail.

You’re not getting any metadata because you’re not adding any metadata to the 
documents with doc.addField(“metadatafield1”, value_of_metadata_field1);

The only thing ever in the doc is what you explicitly put there. At this point 
it’s just “id” and “_text_”.

As for why _text_ isn’t showing up, does the schema have ’stored=“true”’ for 
the field? And when you query, are you specifying fl=_text_? _text_ is usually 
a catch-all field in the default schemas with this definition:

<field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true"/>

Since stored=false, well, it's not stored so can't be returned. If you're 
successfully _searching_ on that field but not getting it back in the "fl" 
list, this is almost certainly a stored="false" issue.

As for why you might have gotten all the metadata in this field with the post 
tool, check that there are no "copyField" directives in the schema that 
automatically copy other data into _text_.

Best,
Erick

> On Aug 28, 2019, at 7:03 AM, Khare, Kushal (MIND) wrote:
> 
> Attaching managed-schema.xml
> 
> -Original Message-
> From: Khare, Kushal (MIND) [mailto:kushal.kh...@mind-infotech.com]
> Sent: 28 August 2019 16:30
> To: solr-user@lucene.apache.org
> Subject: RE: Require searching only for file content and not metadata
>
> I already tried this example; I am currently working on it. I have compiled 
> the code, and it is indexing the documents. But it is not adding anything to 
> the field _text_. Also, it is not giving any metadata.
> doc.addField("_text_", textHandler.toString()); --> here, 
> textHandler.toString() is blank for all the 40 documents. All I am getting is 
> the 'id' & 'version' field.
>
> This is the code that I tried :
>
> package mind.solr;
>
> import org.apache.solr.client.solrj.SolrServerException;
> import org.apache.solr.client.solrj.impl.HttpSolrClient;
> import org.apache.solr.client.solrj.impl.XMLResponseParser;
> import org.apache.solr.client.solrj.response.UpdateResponse;
> import org.apache.solr.common.SolrInputDocument;
> import org.apache.tika.metadata.Metadata;
> import org.apache.tika.parser.AutoDetectParser;
> import org.apache.tika.parser.ParseContext;
> import org.apache.tika.sax.BodyContentHandler;
> import org.xml.sax.ContentHandler;
>
> import java.io.File;
> import java.io.FileInputStream;
> import java.io.IOException;
> import java.io.InputStream;
> import java.util.ArrayList;
> import java.util.Collection;
>
> public class solrJExtract {
>
> private HttpSolrClient client;
>  private long start = System.currentTimeMillis();  private
> AutoDetectParser autoParser;  private int totalTika = 0;  private int
> totalSql = 0;
>
>  @SuppressWarnings("rawtypes")
> private Collection docList = new ArrayList();
>
>
> public static void main(String[] args) {
>try {
>solrJExtract idxer = new solrJExtract("http://localhost:8983/solr/tika");
>idxer.doTikaDocuments(new File("D:\\docs"));
>idxer.endIndexing();
>} catch (Exception e) {
>  e.printStackTrace();
>}
>  }
>
>  private  solrJExtract(String url) throws IOException, SolrServerException {
>// Create a SolrCloud-aware client to send docs to Solr
>// Use something like HttpSolrClient for stand-alone
>
>client = new HttpSolrClient.Builder("http://localhost:8983/solr/tika")
>.withConnectionTimeout(10000)
>.withSocketTimeout(60000)
>.build();
>
>// binary parser is used by default for responses
>client.setParser(new XMLResponseParser());
>
>// One of the ways Tika can be used to attempt to parse arbitrary files.
>autoParser = new AutoDetectParser();  }
>
> // Just a convenient place to wrap things up.
>  @SuppressWarnings("unchecked")
> private void endIndexing() throws IOException, SolrServerException {
>if ( docList.size() > 0) { // Are there any documents left over?
>  client.add(docList, 300000); // Commit within 5 minutes
>}
>client.commit(); // Only needs to be done at the end,
>// commitWithin should do the rest.
>// Could even be omitted
>// assuming commitWithin was specified.
>long endTime = System.currentTimeMillis();
>System.out.println("Total Time Taken: " + (endTime - start) +
>" milliseconds to index " + totalSql +
>" SQL rows and " + totalTika + " documents");
>
>  }
>
>  /**
>   * ***Tika processing here
>   */
>  // Recursively traverse the filesystem, parsing everything found.
>  private void doTikaDocuments(File root) throws IOException,
> SolrServerException {
>
>// Simple loop 

RE: Require searching only for file content and not metadata

2019-08-28 Thread Khare, Kushal (MIND)
If I try to add any metadata in a field like this :

doc.addField("meta", metadata.get("dc_creator"));
1. I don't get that field in the results, though it has been created. And, 
following is the definition on the schema:
[the "meta" field definition was stripped by the archive]

2. When I check it in my code for the value using, 
System.out.println(metadata.get("dc_creator")); --> I get 'null'

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 28 August 2019 16:50
To: solr-user@lucene.apache.org
Subject: Re: Require searching only for file content and not metadata

Attachments are aggressively stripped, so you'll have to either 
post it someplace and provide a link or paste the relevant sections into the 
e-mail.

You’re not getting any metadata because you’re not adding any metadata to the 
documents with doc.addField(“metadatafield1”, value_of_metadata_field1);

The only thing ever in the doc is what you explicitly put there. At this point 
it’s just “id” and “_text_”.

As for why _text_ isn’t showing up, does the schema have ’stored=“true”’ for 
the field? And when you query, are you specifying fl=_text_? _text_ is usually 
a catch-all field in the default schemas with this definition:

<field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true"/>

Since stored=false, well, it's not stored so can't be returned. If you're 
successfully _searching_ on that field but not getting it back in the "fl" 
list, this is almost certainly a stored="false" issue.

As for why you might have gotten all the metadata in this field with the post 
tool, check that there are no "copyField" directives in the schema that 
automatically copy other data into _text_.

Best,
Erick

> On Aug 28, 2019, at 7:03 AM, Khare, Kushal (MIND) wrote:
> 
> Attaching managed-schema.xml
> 
> -Original Message-
> From: Khare, Kushal (MIND) [mailto:kushal.kh...@mind-infotech.com]
> Sent: 28 August 2019 16:30
> To: solr-user@lucene.apache.org
> Subject: RE: Require searching only for file content and not metadata
>
> I already tried this example; I am currently working on it. I have compiled 
> the code, and it is indexing the documents. But it is not adding anything to 
> the field _text_. Also, it is not giving any metadata.
> doc.addField("_text_", textHandler.toString()); --> here, 
> textHandler.toString() is blank for all the 40 documents. All I am getting is 
> the 'id' & 'version' field.
>
> This is the code that I tried :
>
> package mind.solr;
>
> import org.apache.solr.client.solrj.SolrServerException;
> import org.apache.solr.client.solrj.impl.HttpSolrClient;
> import org.apache.solr.client.solrj.impl.XMLResponseParser;
> import org.apache.solr.client.solrj.response.UpdateResponse;
> import org.apache.solr.common.SolrInputDocument;
> import org.apache.tika.metadata.Metadata;
> import org.apache.tika.parser.AutoDetectParser;
> import org.apache.tika.parser.ParseContext;
> import org.apache.tika.sax.BodyContentHandler;
> import org.xml.sax.ContentHandler;
>
> import java.io.File;
> import java.io.FileInputStream;
> import java.io.IOException;
> import java.io.InputStream;
> import java.util.ArrayList;
> import java.util.Collection;
>
> public class solrJExtract {
>
> private HttpSolrClient client;
>  private long start = System.currentTimeMillis();  private
> AutoDetectParser autoParser;  private int totalTika = 0;  private int
> totalSql = 0;
>
>  @SuppressWarnings("rawtypes")
> private Collection docList = new ArrayList();
>
>
> public static void main(String[] args) {
>try {
>solrJExtract idxer = new solrJExtract("http://localhost:8983/solr/tika");
>idxer.doTikaDocuments(new File("D:\\docs"));
>idxer.endIndexing();
>} catch (Exception e) {
>  e.printStackTrace();
>}
>  }
>
>  private  solrJExtract(String url) throws IOException, SolrServerException {
>// Create a SolrCloud-aware client to send docs to Solr
>// Use something like HttpSolrClient for stand-alone
>
>client = new HttpSolrClient.Builder("http://localhost:8983/solr/tika")
>.withConnectionTimeout(10000)
>.withSocketTimeout(60000)
>.build();
>
>// binary parser is used by default for responses
>client.setParser(new XMLResponseParser());
>
>// One of the ways Tika can be used to attempt to parse arbitrary files.
>autoParser = new AutoDetectParser();  }
>
> // Just a convenient place to wrap things up.
>  @SuppressWarnings("unchecked")
> private void endIndexing() throws IOException, SolrServerException {
>if ( docList.size() > 0) { // Are there any documents left over?
>  client.add(docList, 300000); // Commit within 5 minutes
>}
>client.commit(); // Only needs to be done at the end,
>// commitWithin should do the rest.
>// Could even be omitted
>// assuming commitWithin was specified.
>long endTime = System.currentTimeMillis();
>System.out.println("Total Time Taken: " + (endTime - start) +
>" milliseconds to index " + totalSql +
>" SQL rows and " + totalTika + " documents");
>
>  }
>
>  /**
>   * ***Tika processing here
>   */
>  // Recursively traverse the filesystem, parsing everything found.
>  private void doTikaDocuments(File root) throws IOExcept

RE: Require searching only for file content and not metadata

2019-08-28 Thread Khare, Kushal (MIND)
Yup ! I have already made stored = true for _text_. I will see to it. No 
worries.

BUT, I really need HELP for the separation of content & metadata. I checked, 
but there isn't any field that is copying the values into the '_text_' field.
The only definition I have for _text_ is:

<field name="_text_" type="text_general" indexed="true" stored="true" multiValued="true"/>

For this: doc.addField("metadatafield1", value_of_metadata_field1);
I added author name, etc. in the code, but am not getting those fields. Also, 
doc.addField("_text_", textHandler.toString()); has a blank value in it.

Please help !
-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 28 August 2019 16:50
To: solr-user@lucene.apache.org
Subject: Re: Require searching only for file content and not metadata

Attachments are aggressively stripped, so you'll have to either 
post it someplace and provide a link or paste the relevant sections into the 
e-mail.

You’re not getting any metadata because you’re not adding any metadata to the 
documents with doc.addField(“metadatafield1”, value_of_metadata_field1);

The only thing ever in the doc is what you explicitly put there. At this point 
it’s just “id” and “_text_”.

As for why _text_ isn’t showing up, does the schema have ’stored=“true”’ for 
the field? And when you query, are you specifying fl=_text_? _text_ is usually 
a catch-all field in the default schemas with this definition:

<field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true"/>

Since stored=false, well, it's not stored so can't be returned. If you're 
successfully _searching_ on that field but not getting it back in the "fl" 
list, this is almost certainly a stored="false" issue.

As for why you might have gotten all the metadata in this field with the post 
tool, check that there are no "copyField" directives in the schema that 
automatically copy other data into _text_.

Best,
Erick

> On Aug 28, 2019, at 7:03 AM, Khare, Kushal (MIND) wrote:
> 
> Attaching managed-schema.xml
> 
> -Original Message-
> From: Khare, Kushal (MIND) [mailto:kushal.kh...@mind-infotech.com]
> Sent: 28 August 2019 16:30
> To: solr-user@lucene.apache.org
> Subject: RE: Require searching only for file content and not metadata
>
> I already tried this example; I am currently working on it. I have compiled 
> the code, and it is indexing the documents. But it is not adding anything to 
> the field _text_. Also, it is not giving any metadata.
> doc.addField("_text_", textHandler.toString()); --> here, 
> textHandler.toString() is blank for all the 40 documents. All I am getting is 
> the 'id' & 'version' field.
>
> This is the code that I tried :
>
> package mind.solr;
>
> import org.apache.solr.client.solrj.SolrServerException;
> import org.apache.solr.client.solrj.impl.HttpSolrClient;
> import org.apache.solr.client.solrj.impl.XMLResponseParser;
> import org.apache.solr.client.solrj.response.UpdateResponse;
> import org.apache.solr.common.SolrInputDocument;
> import org.apache.tika.metadata.Metadata;
> import org.apache.tika.parser.AutoDetectParser;
> import org.apache.tika.parser.ParseContext;
> import org.apache.tika.sax.BodyContentHandler;
> import org.xml.sax.ContentHandler;
>
> import java.io.File;
> import java.io.FileInputStream;
> import java.io.IOException;
> import java.io.InputStream;
> import java.util.ArrayList;
> import java.util.Collection;
>
> public class solrJExtract {
>
> private HttpSolrClient client;
>  private long start = System.currentTimeMillis();  private
> AutoDetectParser autoParser;  private int totalTika = 0;  private int
> totalSql = 0;
>
>  @SuppressWarnings("rawtypes")
> private Collection docList = new ArrayList();
>
>
> public static void main(String[] args) {
>try {
>solrJExtract idxer = new solrJExtract("http://localhost:8983/solr/tika");
>idxer.doTikaDocuments(new File("D:\\docs"));
>idxer.endIndexing();
>} catch (Exception e) {
>  e.printStackTrace();
>}
>  }
>
>  private  solrJExtract(String url) throws IOException, SolrServerException {
>// Create a SolrCloud-aware client to send docs to Solr
>// Use something like HttpSolrClient for stand-alone
>
>client = new HttpSolrClient.Builder("http://localhost:8983/solr/tika")
>.withConnectionTimeout(10000)
>.withSocketTimeout(60000)
>.build();
>
>// binary parser is used by default for responses
>client.setParser(new XMLResponseParser());
>
>// One of the ways Tika can be used to attempt to parse arbitrary files.
>autoParser = new AutoDetectParser();  }
>
> // Just a convenient place to wrap things up.
>  @SuppressWarnings("unchecked")
> private void endIndexing() throws IOException, SolrServerException {
>if ( docList.size() > 0) { // Are there any documents left over?
>  client.add(docList, 300000); // Commit within 5 minutes
>}
>client.commit(); // Only needs to be done at the end,
>// commitWithin should do the rest.
>// Could even be omitted
>// assuming commitWithin was specified.
>long endTime = System.currentTimeMillis();
>System.out.println("Total Time Taken: " + (endTime - start) +
>" milliseconds to index " + totalSql +
>" SQL rows and " + totalTika + " documents");
>
>  }
>
>  /**
>   * ***Tika proc

RE: Require searching only for file content and not metadata

2019-08-28 Thread Khare, Kushal (MIND)
CURRENTLY, I AM GETTING

 "_text_" :
[" \n \n date 2019-06-24T09:52:33Z  \n cp:revision 5  \n Total-Time 1  \n 
extended-properties:AppVersion 15.0000  \n stream_content_type 
application/vnd.openxmlformats-officedocument.presentationml.presentation  \n 
meta:paragraph-count 18  \n meta:word-count 20  \n 
extended-properties:PresentationFormat Widescreen  \n dc:creator Khare, Kushal 
(MIND)  \n extended-properties:Company MIND  \n Word-Count 20  \n 
dcterms:created 2019-06-18T07:25:29Z  \n dcterms:modified 2019-06-24T09:52:33Z  
\n Last-Modified 2019-06-24T09:52:33Z  \n Last-Save-Date 2019-06-24T09:52:33Z  
\n Paragraph-Count 18  \n meta:save-date 2019-06-24T09:52:33Z  \n dc:title 
PowerPoint Presentation  \n Application-Name Microsoft Office PowerPoint  \n 
extended-properties:TotalTime 1  \n modified 2019-06-24T09:52:33Z  \n 
Content-Type 
application/vnd.openxmlformats-officedocument.presentationml.presentation  \n 
Slide-Count 2  \n stream_size 32234  \n X-Parsed-By 
org.apache.tika.parser.DefaultParser  \n X-Parsed-By 
org.apache.tika.parser.microsoft.ooxml.OOXMLParser  \n creator Khare, Kushal 
(MIND)  \n meta:author Khare, Kushal (MIND)  \n meta:creation-date 
2019-06-18T07:25:29Z  \n extended-properties:Application Microsoft Office 
PowerPoint  \n meta:last-author Khare, Kushal (MIND)  \n meta:slide-count 2  \n 
Creation-Date 2019-06-18T07:25:29Z  \n xmpTPg:NPages 2  \n resourceName 
D:\\docs\\DemoOutput.pptx  \n Last-Author Khare, Kushal (MIND)  \n 
Revision-Number 5  \n Application-Version 15.0000  \n Author Khare, Kushal 
(MIND)  \n publisher MIND  \n Presentation-Format Widescreen  \n dc:publisher 
MIND  \n PowerPoint Presentation \n \n  slide-content   \n Hello. This is just 
for Demo!  \n If you find it anywhere, throw it away !\nA.W.A.Y away away away 
away away Away AWAY! \n  \n  \n A.W.A.Y once again !  \n  \n  \n  \n  \n  \n  
\n  \n  \n  \n  \n  \n  \n \n slide-master-content  \n slide-content   \n 
A.W.A.Y \n  \n away \n \n slide-master-content  \n embedded 
/docProps/thumbnail.jpeg"],

WHAT I WANT :

"_text_"  :
["\n  slide-content   \n Hello. This is just for Demo!  \n If you find it 
anywhere, throw it away !\nA.W.A.Y away away away away away Away AWAY! \n  \n  
\n A.W.A.Y once again !  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n \n 
slide-master-content  \n slide-content   \n A.W.A.Y \n  \n away \n \n 
slide-master-content  \n embedded /docProps/thumbnail.jpeg"],

"meta" : ["\n \n date 2019-06-24T09:52:33Z  \n cp:revision 5  \n Total-Time 1  
\n extended-properties:AppVersion 15.0000  \n stream_content_type 
application/vnd.openxmlformats-officedocument.presentationml.presentation  \n 
meta:paragraph-count 18  \n meta:word-count 20  \n 
extended-properties:PresentationFormat Widescreen  \n dc:creator Khare, Kushal 
(MIND)  \n extended-properties:Company MIND  \n Word-Count 20  \n 
dcterms:created 2019-06-18T07:25:29Z  \n dcterms:modified 2019-06-24T09:52:33Z  
\n Last-Modified 2019-06-24T09:52:33Z  \n Last-Save-Date 2019-06-24T09:52:33Z  
\n Paragraph-Count 18  \n meta:save-date 2019-06-24T09:52:33Z  \n dc:title 
PowerPoint Presentation  \n Application-Name Microsoft Office PowerPoint  \n 
extended-properties:TotalTime 1  \n modified 2019-06-24T09:52:33Z  \n 
Content-Type 
application/vnd.openxmlformats-officedocument.presentationml.presentation  \n 
Slide-Count 2  \n stream_size 32234  \n X-Parsed-By 
org.apache.tika.parser.DefaultParser  \n X-Parsed-By 
org.apache.tika.parser.microsoft.ooxml.OOXMLParser  \n creator Khare, Kushal 
(MIND)  \n meta:author Khare, Kushal (MIND)  \n meta:creation-date 
2019-06-18T07:25:29Z  \n extended-properties:Application Microsoft Office 
PowerPoint  \n meta:last-author Khare, Kushal (MIND)  \n meta:slide-count 2  \n 
Creation-Date 2019-06-18T07:25:29Z  \n xmpTPg:NPages 2  \n resourceName 
D:\\docs\\DemoOutput.pptx  \n Last-Author Khare, Kushal (MIND)  \n 
Revision-Number 5  \n Application-Version 15.0000  \n Author Khare, Kushal 
(MIND)  \n publisher MIND  \n Presentation-Format Widescreen  \n dc:publisher 
MIND  \n PowerPoint Presentation \n"]
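
(In code terms, the split above is presumably just two addField calls on the 
SolrInputDocument, using the BodyContentHandler for the body text and the 
Metadata object for everything else:

doc.addField("_text_", textHandler.toString()); // body text only
doc.addField("meta", metadata.toString());      // all Tika metadata as one string

assuming Tika's Metadata.toString() prints the name/value pairs shown above.)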
-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: 28 August 2019 14:18
To: solr-user@lucene.apache.org
Subject: Re: Require searching only for file content and not metadata

On 8/27/2019 7:18 AM, Khare, Kushal (MIND) wrote:
> Basically, what problem I am facing is - I am getting the textual content + 
> other metadata in my _text_ field. But, I want only the textual content 
> written inside the document.
> I tried various Request Handler Update Extract configurations, but none of 
> them worked for me.
> Please help me resolve this as I am badly stuck in this.

Controlling exactly what gets indexed in which fields is likely going to 
require that you write the indexing software yourself -- a program that 
extracts the data you want and sends it to Solr for indexing.

We do not r

Re: Require searching only for file content and not metadata

2019-08-28 Thread Erick Erickson
Attachments are aggressively stripped, so you'll have to either 
post it someplace and provide a link or paste the relevant sections into the 
e-mail.

You’re not getting any metadata because you’re not adding any metadata to the 
documents with 
doc.addField(“metadatafield1”, value_of_metadata_field1);

The only thing ever in the doc is what you explicitly put there. At this point 
it’s just “id” and “_text_”.
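
If you do want all of the Tika metadata, a sketch like this would carry it 
over (the field naming is up to you; note that Tika's keys contain colons, 
e.g. "dc:creator", so sanitize them if your schema doesn't allow that):

for (String name : metadata.names()) {
    for (String value : metadata.getValues(name)) {
        doc.addField("meta_" + name.replace(':', '_'), value);
    }
}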

As for why _text_ isn't showing up, does the schema have 'stored="true"' for 
the field? And when you query, are you specifying fl=_text_? _text_ is usually 
a catch-all field in the default schemas with this definition:

<field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true"/>

Since stored=false, well, it's not stored so can't be returned. If you're 
successfully _searching_ on that field but not getting it back in the "fl" 
list, this is almost certainly a stored="false" issue.

As for why you might have gotten all the metadata in this field with the post 
tool, check that there are no "copyField" directives in the schema that 
automatically copy other data into _text_.

Best,
Erick

> On Aug 28, 2019, at 7:03 AM, Khare, Kushal (MIND) wrote:
> 
> Attaching managed-schema.xml
> 
> -Original Message-
> From: Khare, Kushal (MIND) [mailto:kushal.kh...@mind-infotech.com]
> Sent: 28 August 2019 16:30
> To: solr-user@lucene.apache.org
> Subject: RE: Require searching only for file content and not metadata
> 
> I already tried this example; I am currently working on it. I have compiled 
> the code, and it is indexing the documents. But it is not adding anything to 
> the field _text_. Also, it is not giving any metadata.
> doc.addField("_text_", textHandler.toString()); --> here, 
> textHandler.toString() is blank for all the 40 documents. All I am getting is 
> the 'id' & 'version' field.
> 
> This is the code that I tried :
> 
> package mind.solr;
> 
> import org.apache.solr.client.solrj.SolrServerException;
> import org.apache.solr.client.solrj.impl.HttpSolrClient;
> import org.apache.solr.client.solrj.impl.XMLResponseParser;
> import org.apache.solr.client.solrj.response.UpdateResponse;
> import org.apache.solr.common.SolrInputDocument;
> import org.apache.tika.metadata.Metadata;
> import org.apache.tika.parser.AutoDetectParser;
> import org.apache.tika.parser.ParseContext;
> import org.apache.tika.sax.BodyContentHandler;
> import org.xml.sax.ContentHandler;
> 
> import java.io.File;
> import java.io.FileInputStream;
> import java.io.IOException;
> import java.io.InputStream;
> import java.util.ArrayList;
> import java.util.Collection;
> 
> public class solrJExtract {
> 
> private HttpSolrClient client;
>  private long start = System.currentTimeMillis();
>  private AutoDetectParser autoParser;
>  private int totalTika = 0;
>  private int totalSql = 0;
> 
>  @SuppressWarnings("rawtypes")
> private Collection docList = new ArrayList();
> 
> 
> public static void main(String[] args) {
>try {
>solrJExtract idxer = new solrJExtract("http://localhost:8983/solr/tika");
>idxer.doTikaDocuments(new File("D:\\docs"));
>idxer.endIndexing();
>} catch (Exception e) {
>  e.printStackTrace();
>}
>  }
> 
>  private  solrJExtract(String url) throws IOException, SolrServerException {
>// Create a SolrCloud-aware client to send docs to Solr
>// Use something like HttpSolrClient for stand-alone
> 
>client = new HttpSolrClient.Builder("http://localhost:8983/solr/tika")
>.withConnectionTimeout(10000)
>.withSocketTimeout(60000)
>.build();
> 
>// binary parser is used by default for responses
>client.setParser(new XMLResponseParser());
> 
>// One of the ways Tika can be used to attempt to parse arbitrary files.
>autoParser = new AutoDetectParser();
>  }
> 
> // Just a convenient place to wrap things up.
>  @SuppressWarnings("unchecked")
> private void endIndexing() throws IOException, SolrServerException {
>if ( docList.size() > 0) { // Are there any documents left over?
>  client.add(docList, 300000); // Commit within 5 minutes
>}
>client.commit(); // Only needs to be done at the end,
>// commitWithin should do the rest.
>// Could even be omitted
>// assuming commitWithin was specified.
>long endTime = System.currentTimeMillis();
>System.out.println("Total Time Taken: " + (endTime - start) +
>" milliseconds to index " + totalSql +
>" SQL rows and " + totalTika + " documents");
> 
>  }
> 
>  /**
>   * ***Tika processing here
>   */
>  // Recursively traverse the filesystem, parsing everything found.
>  private void doTikaDocuments(File root) throws IOException, 
> SolrServerException {
> 
>// Simple loop for recursively indexing all the files
>// in the root directory passed in.
>for (File file : root.listFiles()) {
>  if (file.isDirectory()) {
>doTikaDocuments(file);
>continue;
>  }
>  // Get ready to parse the file.
>  ContentHandler textHandler = new BodyContentHandler();
>  Metadata metadata = new Metadata();
>  ParseContext context = new ParseContext();
>  // Tim Allison noted the following, thanks Tim!
>

RE: Require searching only for file content and not metadata

2019-08-28 Thread Khare, Kushal (MIND)
Attaching managed-schema.xml

-Original Message-
From: Khare, Kushal (MIND) [mailto:kushal.kh...@mind-infotech.com]
Sent: 28 August 2019 16:30
To: solr-user@lucene.apache.org
Subject: RE: Require searching only for file content and not metadata

I already tried this example; I am currently working on it. I have compiled 
the code and it is indexing the documents, but it is not adding anything to the 
field _text_. Also, it is not giving any metadata.
doc.addField("_text_", textHandler.toString()); --> here, 
textHandler.toString() is blank for all 40 documents. All I am getting is 
the 'id' & 'version' fields.

This is the code that I tried :

package mind.solr;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.impl.XMLResponseParser;
import org.apache.solr.client.solrj.response.UpdateResponse;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.ContentHandler;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Collection;

public class solrJExtract {

private HttpSolrClient client;
  private long start = System.currentTimeMillis();
  private AutoDetectParser autoParser;
  private int totalTika = 0;
  private int totalSql = 0;

  @SuppressWarnings("rawtypes")
private Collection docList = new ArrayList();


public static void main(String[] args) {
try {
solrJExtract idxer = new solrJExtract("http://localhost:8983/solr/tika");
idxer.doTikaDocuments(new File("D:\\docs"));
idxer.endIndexing();
} catch (Exception e) {
  e.printStackTrace();
}
  }

  private  solrJExtract(String url) throws IOException, SolrServerException {
// Create an HTTP client to send docs to a stand-alone Solr
// (use CloudSolrClient instead for SolrCloud)

client = new HttpSolrClient.Builder(url)  // use the url passed in
.withConnectionTimeout(10000)
.withSocketTimeout(60000)
.build();

// binary parser is used by default for responses
client.setParser(new XMLResponseParser());

// One of the ways Tika can be used to attempt to parse arbitrary files.
autoParser = new AutoDetectParser();
  }

// Just a convenient place to wrap things up.
  @SuppressWarnings("unchecked")
private void endIndexing() throws IOException, SolrServerException {
if ( docList.size() > 0) { // Are there any documents left over?
  client.add(docList, 300000); // Commit within 5 minutes (300,000 ms)
}
client.commit(); // Only needs to be done at the end,
// commitWithin should do the rest.
// Could even be omitted
// assuming commitWithin was specified.
long endTime = System.currentTimeMillis();
System.out.println("Total Time Taken: " + (endTime - start) +
" milliseconds to index " + totalSql +
" SQL rows and " + totalTika + " documents");

  }

  /**
   * ***Tika processing here
   */
  // Recursively traverse the filesystem, parsing everything found.
  private void doTikaDocuments(File root) throws IOException, 
SolrServerException {

// Simple loop for recursively indexing all the files
// in the root directory passed in.
for (File file : root.listFiles()) {
  if (file.isDirectory()) {
doTikaDocuments(file);
continue;
  }
  // Get ready to parse the file.
  ContentHandler textHandler = new BodyContentHandler();
  Metadata metadata = new Metadata();
  ParseContext context = new ParseContext();
  // Tim Allison noted the following, thanks Tim!
  // If you want Tika to parse embedded files (attachments within your .doc 
or any other embedded
  // files), you need to send in the autodetectparser in the parsecontext:
  // context.set(Parser.class, autoParser);

  InputStream input = new FileInputStream(file);

  // Try parsing the file. Note we haven't checked at all to
  // see whether this file is a good candidate.
  try {
autoParser.parse(input, textHandler, metadata, context);
  } catch (Exception e) {
// Needs better logging of what went wrong in order to
// track down "bad" documents.
System.out.println(String.format("File %s failed", 
file.getCanonicalPath()));
e.printStackTrace();
continue;
  }
  // Just to show how much meta-data and what form it's in.
  dumpMetadata(file.getCanonicalPath(), metadata);

  // Index just a couple of the meta-data fields.
  SolrInputDocument doc = new SolrInputDocument();

  doc.addField("id", file.ge

RE: Require searching only for file content and not metadata

2019-08-28 Thread Khare, Kushal (MIND)
  // We can also use the Lucidworks field mapping to
  // accomplish much the same thing.
  String author = metadata.get("Author");

/*
 * if (author != null) { //doc.addField("author", author); }
 */

  doc.addField("_text_", textHandler.toString());
  //doc.addField("meta", metadata.get("Last_Modified"));
  docList.add(doc);
  ++totalTika;

  // Completely arbitrary, just batch up more than one document
  // for throughput!
  if ( docList.size() >= 1000) {
// Commit within 5 minutes.
UpdateResponse resp = client.add(docList, 300000); // commit within 5 minutes
if (resp.getStatus() != 0) {
System.out.println("Some horrible error has occurred, status is: " +
  resp.getStatus());
}
docList.clear();
  }
}
  }

  // Just to show all the metadata that's available.
  private void dumpMetadata(String fileName, Metadata metadata) {
  System.out.println("Dumping metadata for file: " + fileName);
for (String name : metadata.names()) {
  System.out.println(name + ":" + metadata.get(name));
}
System.out.println("x..");
  }
}


Also, I am attaching the solrconfig.xml & managed-schema.xml for my 
collection. Please look at them & suggest where I am going wrong.
I can't even see the _text_ field in the query result, even though its stored 
parameter is true.
Any help would really be appreciated.
Thanks !
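
(A quick check, sketched with the core name "tika" used in the code above: 
once _text_ is declared with stored="true", asking for it explicitly in the 
field list should return it.)

  http://localhost:8983/solr/tika/select?q=_text_:kushal&fl=id,_text_

If _text_ comes back empty even then, the problem is on the extraction side 
(textHandler.toString() returning ""), not on the query side.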

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: 28 August 2019 14:18
To: solr-user@lucene.apache.org
Subject: Re: Require searching only for file content and not metadata

On 8/27/2019 7:18 AM, Khare, Kushal (MIND) wrote:
> Basically, what problem I am facing is - I am getting the textual content + 
> other metadata in my _text_ field. But, I want only the textual content 
> written inside the document.
> I tried various Request Handler Update Extract configurations, but none of 
> them worked for me.
> Please help me resolve this as I am badly stuck in this.

Controlling exactly what gets indexed in which fields is likely going to 
require that you write the indexing software yourself -- a program that 
extracts the data you want and sends it to Solr for indexing.

We do not recommend running the Extracting Request Handler in production
-- Tika is known to crash when given some documents (usually PDF files are the 
problematic ones, but other formats can cause it too), and if it crashes while 
running inside Solr, it will take Solr down with it.

Here is an example program that uses Tika for rich document parsing.  It also 
talks to a database, but that part could be easily removed or modified:

https://lucidworks.com/post/indexing-with-solrj/

Thanks,
Shawn



The information contained in this electronic message and any attachments to 
this message are intended for the exclusive use of the addressee(s) and may 
contain proprietary, confidential or privileged information. If you are not the 
intended recipient, you should not disseminate, distribute or copy this e-mail. 
Please notify the sender immediately and destroy all copies of this message and 
any attachments. WARNING: Computer viruses can be transmitted via email. The 
recipient should check this email and any attachments for the presence of 
viruses. The company accepts no liability for any damage caused by any 
virus/trojan/worms/malicious code transmitted by this email. www.motherson.com


solrconfig.xml
Description: solrconfig.xml


RE: Require searching only for file content and not metadata

2019-08-28 Thread Khare, Kushal (MIND)
Yes, I have already gone through the reference guide. It's all because of the 
guide and documentation that I have reached this stage.
Well, I am indexing rich document formats like .docx, .pptx, .pdf, etc.
The metadata I am talking about is this: currently Solr puts all the data, like 
author, editor and content-type details of the documents, into the _text_ field 
along with the textual content, and what I want is to separate them.
I also tried using ExtractingRequestHandler and understood fmap.content in 
Tika, but still can't reach the desired output.
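
A hedged illustration of the fmap/uprefix route (parameter names are from the 
Solr reference guide; the core name and file name are placeholders): map Tika's 
body content to _text_, prefix every other generated field, and let an ignored 
dynamic field swallow the prefixed metadata.

  curl "http://localhost:8983/solr/tika/update/extract?literal.id=doc1&fmap.content=_text_&uprefix=ignored_&commit=true" \
    -F "myfile=@example.docx"

  <!-- in the schema -->
  <dynamicField name="ignored_*" type="string" indexed="false" stored="false"/>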

-Original Message-
From: Jörn Franke [mailto:jornfra...@gmail.com]
Sent: 28 August 2019 12:55
To: solr-user@lucene.apache.org
Subject: Re: Require searching only for file content and not metadata

You need to provide a bit more detail.  What is your schema? How is the 
document structured? Where do you get the metadata from?

Have you read the Solr reference guide? Have you read a book about Solr?

> Am 28.08.2019 um 08:10 schrieb Khare, Kushal (MIND) 
> :
>
> Could anyone please help me with how to use this approach? I humbly request 
> all the users to please help me get through this.
> Thanks !
>
> -Original Message-
> From: Yogendra Kumar Soni [mailto:yogendra.ku...@dolcera.com]
> Sent: 28 August 2019 04:08
> To: solr-user@lucene.apache.org
> Subject: Re: Require searching only for file content and not metadata
>
> It will be easier to parse the documents and create the content, metadata and 
> other required fields yourself instead of using the default post tool. You 
> will have better control over what goes into which field.
>
>
>> On Tue 27 Aug, 2019, 6:48 PM Khare, Kushal (MIND), < 
>> kushal.kh...@mind-infotech.com> wrote:
>>
>> Basically, what problem I am facing is - I am getting the textual
>> content
>> + other metadata in my _text_ field. But, I want only the textual
>> + content
>> written inside the document.
>> I tried various Request Handler Update Extract configurations, but
>> none of them worked for me.
>> Please help me resolve this as I am badly stuck in this.
>>
>> -Original Message-
>> From: Khare, Kushal (MIND) [mailto:kushal.kh...@mind-infotech.com]
>> Sent: 27 August 2019 12:59
>> To: solr-user@lucene.apache.org; ch...@christopherschultz.net
>> Subject: RE: Require searching only for file content and not metadata
>>
>> Chris,
>> What I have done is, I just created a core, used the POST tool to index
>> the documents from my file system, and then moved to the Solr Admin UI for
>> querying.
>> By 'Metadata' vs 'Content', I mean that I just want the field '_text_'
>> to be searched, instead of all the fields that Solr creates by
>> itself, like author name, last modified, creator, id, etc.
>> I simply want Solr to search only the content inside the document
>> (the body of the document) & not all the fields. For example,
>> if I search for 'Kushal', it should return the document only if it
>> has the word in it as content, not because it has the author name or owner 
>> as Kushal.
>> Hope it's clearer now than before. Please help me with this!
>>
>> Thankyou!
>> Kushal Khare
>>
>> -Original Message-
>> From: Christopher Schultz [mailto:ch...@christopherschultz.net]
>> Sent: 26 August 2019 18:47
>> To: solr-user@lucene.apache.org
>> Subject: Re: Require searching only for file content and not metadata
>>
>> -BEGIN PGP SIGNED MESSAGE-
>> Hash: SHA256
>>
>> Kushal,
>>
>>> On 8/26/19 07:52, Khare, Kushal (MIND) wrote:
>>> This is Kushal Khare, a new addition to the user-list. I started
>>> working with Solr few days ago for implementing it in my project.
>>>
>>> Now, I have the basics done, and reached the query stage.
>>>
>>> My problem is – I need to restrict the solr to search only for the
>>> file content and not the metadata. I have gone through various
>>> articles on the internet, but could not get any help.
>>>
>>> Therefore, I hope I could get some solutions here.
>>
>> How are you querying Solr? Are you querying from a web application?
>> From a thick-client application? Directly from a web browser?
>>
>> What do you consider "metadata" versus "content"? To Solr, everything
>> is the same...
>>
>> - -chris
>> -BEGIN PGP SIGNATURE-
>> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
>>
>> iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAl1j268ACgkQHPApP6U8
>> pFi6GA//VY8SU6H5T3G6fpUqQrVp05E9g7f0oGGVW1eaRY3NjgQzfbwJQmJqg16Y
>> MyUKpp0/P6EpR

Re: Require searching only for file content and not metadata

2019-08-28 Thread Shawn Heisey

On 8/27/2019 7:18 AM, Khare, Kushal (MIND) wrote:

Basically, what problem I am facing is - I am getting the textual content + 
other metadata in my _text_ field. But, I want only the textual content written 
inside the document.
I tried various Request Handler Update Extract configurations, but none of them 
worked for me.
Please help me resolve this as I am badly stuck in this.


Controlling exactly what gets indexed in which fields is likely going to 
require that you write the indexing software yourself -- a program that 
extracts the data you want and sends it to Solr for indexing.


We do not recommend running the Extracting Request Handler in production 
-- Tika is known to crash when given some documents (usually PDF files 
are the problematic ones, but other formats can cause it too), and if it 
crashes while running inside Solr, it will take Solr down with it.


Here is an example program that uses Tika for rich document parsing.  It 
also talks to a database, but that part could be easily removed or modified:


https://lucidworks.com/post/indexing-with-solrj/

Thanks,
Shawn


Re: Require searching only for file content and not metadata

2019-08-28 Thread Jörn Franke
You need to provide a bit more detail.  What is your schema? How is the 
document structured? Where do you get the metadata from?

Have you read the Solr reference guide? Have you read a book about Solr?

> Am 28.08.2019 um 08:10 schrieb Khare, Kushal (MIND) 
> :
> 
> Could anyone please help me with how to use this approach? I humbly request 
> all the users to please help me get through this.
> Thanks !
> 
> -Original Message-
> From: Yogendra Kumar Soni [mailto:yogendra.ku...@dolcera.com]
> Sent: 28 August 2019 04:08
> To: solr-user@lucene.apache.org
> Subject: Re: Require searching only for file content and not metadata
> 
> It will be easier to parse the documents and create the content, metadata and 
> other required fields yourself instead of using the default post tool. You 
> will have better control over what goes into which field.
> 
> 
>> On Tue 27 Aug, 2019, 6:48 PM Khare, Kushal (MIND), < 
>> kushal.kh...@mind-infotech.com> wrote:
>> 
>> Basically, what problem I am facing is - I am getting the textual
>> content
>> + other metadata in my _text_ field. But, I want only the textual
>> + content
>> written inside the document.
>> I tried various Request Handler Update Extract configurations, but
>> none of them worked for me.
>> Please help me resolve this as I am badly stuck in this.
>> 
>> -Original Message-
>> From: Khare, Kushal (MIND) [mailto:kushal.kh...@mind-infotech.com]
>> Sent: 27 August 2019 12:59
>> To: solr-user@lucene.apache.org; ch...@christopherschultz.net
>> Subject: RE: Require searching only for file content and not metadata
>> 
>> Chris,
>> What I have done is, I just created a core, used the POST tool to index
>> the documents from my file system, and then moved to the Solr Admin UI for
>> querying.
>> By 'Metadata' vs 'Content', I mean that I just want the field '_text_'
>> to be searched, instead of all the fields that Solr creates by
>> itself, like author name, last modified, creator, id, etc.
>> I simply want Solr to search only the content inside the document
>> (the body of the document) & not all the fields. For example, if
>> I search for 'Kushal', it should return the document only if it has
>> the word in it as content, not because it has the author name or owner as 
>> Kushal.
>> Hope it's clearer now than before. Please help me with this!
>> 
>> Thankyou!
>> Kushal Khare
>> 
>> -Original Message-
>> From: Christopher Schultz [mailto:ch...@christopherschultz.net]
>> Sent: 26 August 2019 18:47
>> To: solr-user@lucene.apache.org
>> Subject: Re: Require searching only for file content and not metadata
>> 
>> -BEGIN PGP SIGNED MESSAGE-
>> Hash: SHA256
>> 
>> Kushal,
>> 
>>> On 8/26/19 07:52, Khare, Kushal (MIND) wrote:
>>> This is Kushal Khare, a new addition to the user-list. I started
>>> working with Solr few days ago for implementing it in my project.
>>> 
>>> Now, I have the basics done, and reached the query stage.
>>> 
>>> My problem is – I need to restrict the solr to search only for the
>>> file content and not the metadata. I have gone through various
>>> articles on the internet, but could not get any help.
>>> 
>>> Therefore, I hope I could get some solutions here.
>> 
>> How are you querying Solr? Are you querying from a web application?
>> From a thick-client application? Directly from a web browser?
>> 
>> What do you consider "metadata" versus "content"? To Solr, everything
>> is the same...
>> 
>> - -chris
>> -BEGIN PGP SIGNATURE-
>> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
>> 
>> iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAl1j268ACgkQHPApP6U8
>> pFi6GA//VY8SU6H5T3G6fpUqQrVp05E9g7f0oGGVW1eaRY3NjgQzfbwJQmJqg16Y
>> MyUKpp0/P6EpR/dMPmiKBPvLppSqjT1SUNgrFi2btwtBaTibxWXd0WtEqNdinWCo
>> DFyJaPQaIT20IR887SPWrQSYc4oC8aKNAEDAXxlyWDzEgImE23AyCeWs++gJsaKm
>> RphkleBeIKCX6SkRzDFeEzx4VyKBZKcjI+Ks/9z2s9tcGmElxyMDPHYf5VXJQgcz
>> A1D3jPVPqm2OMvThXd2ll4NlnXe2PWV5eYfZQt/6YMwx4jF+rqG66jDXEhTHzDro
>> jmiZVj1VbQ0RlFLqP6OHu2YRj+01a0OtE8l4mWiGSNIrKymp+ycT9E+L0eC9yGIT
>> hLUfo7a3ONfOTTNAbuI/363+2WA1wBxSHm2m3kQT8Ho8ydjd7w/umR1L6/wr+q9B
>> jEZfAHs1TLFXd6lgqLtmIyf6Ya5bloWM+yjwnjfpniOuHCcXTiJi+5GvxLwih8yE
>> 6CQ32kIUuspJ7N5hyiJvM4AcuWWMldDlZaYoHuUwhVbWCCT+Y4X6R1+IZfyXZnvn
>> wFEMD3+3r382M3G0uyh2MJk899l1kSPcX+BtRg3pOqDZh0WR+2xWpTndeiMxsmGj
>> UC1J1PssKUa1P0dMk7wLvgOl0BiiGC+WwgD7ZfHjF7NPL1jPtW8=
>> =LWwW
>> -END PGP SI

RE: Require searching only for file content and not metadata

2019-08-28 Thread Khare, Kushal (MIND)
Could anyone please help me with how to use this approach? I humbly request 
all the users to please help me get through this.
Thanks !

-Original Message-
From: Yogendra Kumar Soni [mailto:yogendra.ku...@dolcera.com]
Sent: 28 August 2019 04:08
To: solr-user@lucene.apache.org
Subject: Re: Require searching only for file content and not metadata

It will be easier to parse the documents and create the content, metadata and 
other required fields yourself instead of using the default post tool. You will 
have better control over what goes into which field.


On Tue 27 Aug, 2019, 6:48 PM Khare, Kushal (MIND), < 
kushal.kh...@mind-infotech.com> wrote:

> Basically, what problem I am facing is - I am getting the textual
> content
> + other metadata in my _text_ field. But, I want only the textual
> + content
> written inside the document.
> I tried various Request Handler Update Extract configurations, but
> none of them worked for me.
> Please help me resolve this as I am badly stuck in this.
>
> -Original Message-
> From: Khare, Kushal (MIND) [mailto:kushal.kh...@mind-infotech.com]
> Sent: 27 August 2019 12:59
> To: solr-user@lucene.apache.org; ch...@christopherschultz.net
> Subject: RE: Require searching only for file content and not metadata
>
> Chris,
> What I have done is, I just created a core, used the POST tool to index
> the documents from my file system, and then moved to the Solr Admin UI for
> querying.
> By 'Metadata' vs 'Content', I mean that I just want the field '_text_'
> to be searched, instead of all the fields that Solr creates by
> itself, like author name, last modified, creator, id, etc.
> I simply want Solr to search only the content inside the document
> (the body of the document) & not all the fields. For example, if
> I search for 'Kushal', it should return the document only if it has
> the word in it as content, not because it has the author name or owner as 
> Kushal.
> Hope it's clearer now than before. Please help me with this!
>
> Thankyou!
> Kushal Khare
>
> -Original Message-
> From: Christopher Schultz [mailto:ch...@christopherschultz.net]
> Sent: 26 August 2019 18:47
> To: solr-user@lucene.apache.org
> Subject: Re: Require searching only for file content and not metadata
>
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
>
> Kushal,
>
> On 8/26/19 07:52, Khare, Kushal (MIND) wrote:
> > This is Kushal Khare, a new addition to the user-list. I started
> > working with Solr few days ago for implementing it in my project.
> >
> > Now, I have the basics done, and reached the query stage.
> >
> > My problem is – I need to restrict the solr to search only for the
> > file content and not the metadata. I have gone through various
> > articles on the internet, but could not get any help.
> >
> > Therefore, I hope I could get some solutions here.
>
> How are you querying Solr? Are you querying from a web application?
> From a thick-client application? Directly from a web browser?
>
> What do you consider "metadata" versus "content"? To Solr, everything
> is the same...
>
> - -chris
> -BEGIN PGP SIGNATURE-
> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
>
> iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAl1j268ACgkQHPApP6U8
> pFi6GA//VY8SU6H5T3G6fpUqQrVp05E9g7f0oGGVW1eaRY3NjgQzfbwJQmJqg16Y
> MyUKpp0/P6EpR/dMPmiKBPvLppSqjT1SUNgrFi2btwtBaTibxWXd0WtEqNdinWCo
> DFyJaPQaIT20IR887SPWrQSYc4oC8aKNAEDAXxlyWDzEgImE23AyCeWs++gJsaKm
> RphkleBeIKCX6SkRzDFeEzx4VyKBZKcjI+Ks/9z2s9tcGmElxyMDPHYf5VXJQgcz
> A1D3jPVPqm2OMvThXd2ll4NlnXe2PWV5eYfZQt/6YMwx4jF+rqG66jDXEhTHzDro
> jmiZVj1VbQ0RlFLqP6OHu2YRj+01a0OtE8l4mWiGSNIrKymp+ycT9E+L0eC9yGIT
> hLUfo7a3ONfOTTNAbuI/363+2WA1wBxSHm2m3kQT8Ho8ydjd7w/umR1L6/wr+q9B
> jEZfAHs1TLFXd6lgqLtmIyf6Ya5bloWM+yjwnjfpniOuHCcXTiJi+5GvxLwih8yE
> 6CQ32kIUuspJ7N5hyiJvM4AcuWWMldDlZaYoHuUwhVbWCCT+Y4X6R1+IZfyXZnvn
> wFEMD3+3r382M3G0uyh2MJk899l1kSPcX+BtRg3pOqDZh0WR+2xWpTndeiMxsmGj
> UC1J1PssKUa1P0dMk7wLvgOl0BiiGC+WwgD7ZfHjF7NPL1jPtW8=
> =LWwW
> -END PGP SIGNATURE-
>

Re: Require searching only for file content and not metadata

2019-08-27 Thread Yogendra Kumar Soni
It will be easier to parse the documents and create the content, metadata and
other required fields yourself instead of using the default post tool. You will
have better control over what goes into which field.
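
A sketch of that approach, reusing the Tika handler names from the code posted 
elsewhere in this thread; only the body text is added, so no metadata ever 
reaches the index:

  ContentHandler textHandler = new BodyContentHandler();
  autoParser.parse(input, textHandler, new Metadata(), new ParseContext());

  SolrInputDocument doc = new SolrInputDocument();
  doc.addField("id", file.getCanonicalPath());
  doc.addField("_text_", textHandler.toString()); // body text only
  docList.add(doc);                               // metadata.get(...) is never indexed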


On Tue 27 Aug, 2019, 6:48 PM Khare, Kushal (MIND), <
kushal.kh...@mind-infotech.com> wrote:

> Basically, what problem I am facing is - I am getting the textual content
> + other metadata in my _text_ field. But, I want only the textual content
> written inside the document.
> I tried various Request Handler Update Extract configurations, but none of
> them worked for me.
> Please help me resolve this as I am badly stuck in this.
>
> -Original Message-
> From: Khare, Kushal (MIND) [mailto:kushal.kh...@mind-infotech.com]
> Sent: 27 August 2019 12:59
> To: solr-user@lucene.apache.org; ch...@christopherschultz.net
> Subject: RE: Require searching only for file content and not metadata
>
> Chris,
> What I have done is, I just created a core, used the POST tool to index the
> documents from my file system, and then moved to the Solr Admin UI for querying.
> By 'Metadata' vs 'Content', I mean that I just want the field '_text_'
> to be searched, instead of all the fields that Solr creates by itself,
> like author name, last modified, creator, id, etc.
> I simply want Solr to search only the content inside the document (the
> body of the document) & not all the fields. For example, if I search
> for 'Kushal', it should return the document only if it has the word in it
> as content, not because it has the author name or owner as Kushal.
> Hope it's clearer now than before. Please help me with this!
>
> Thankyou!
> Kushal Khare
>
> -Original Message-
> From: Christopher Schultz [mailto:ch...@christopherschultz.net]
> Sent: 26 August 2019 18:47
> To: solr-user@lucene.apache.org
> Subject: Re: Require searching only for file content and not metadata
>
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
>
> Kushal,
>
> On 8/26/19 07:52, Khare, Kushal (MIND) wrote:
> > This is Kushal Khare, a new addition to the user-list. I started
> > working with Solr few days ago for implementing it in my project.
> >
> > Now, I have the basics done, and reached the query stage.
> >
> > My problem is – I need to restrict the solr to search only for the
> > file content and not the metadata. I have gone through various
> > articles on the internet, but could not get any help.
> >
> > Therefore, I hope I could get some solutions here.
>
> How are you querying Solr? Are you querying from a web application? From a
> thick-client application? Directly from a web browser?
>
> What do you consider "metadata" versus "content"? To Solr, everything is
> the same...
>
> - -chris
> -BEGIN PGP SIGNATURE-
> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
>
> iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAl1j268ACgkQHPApP6U8
> pFi6GA//VY8SU6H5T3G6fpUqQrVp05E9g7f0oGGVW1eaRY3NjgQzfbwJQmJqg16Y
> MyUKpp0/P6EpR/dMPmiKBPvLppSqjT1SUNgrFi2btwtBaTibxWXd0WtEqNdinWCo
> DFyJaPQaIT20IR887SPWrQSYc4oC8aKNAEDAXxlyWDzEgImE23AyCeWs++gJsaKm
> RphkleBeIKCX6SkRzDFeEzx4VyKBZKcjI+Ks/9z2s9tcGmElxyMDPHYf5VXJQgcz
> A1D3jPVPqm2OMvThXd2ll4NlnXe2PWV5eYfZQt/6YMwx4jF+rqG66jDXEhTHzDro
> jmiZVj1VbQ0RlFLqP6OHu2YRj+01a0OtE8l4mWiGSNIrKymp+ycT9E+L0eC9yGIT
> hLUfo7a3ONfOTTNAbuI/363+2WA1wBxSHm2m3kQT8Ho8ydjd7w/umR1L6/wr+q9B
> jEZfAHs1TLFXd6lgqLtmIyf6Ya5bloWM+yjwnjfpniOuHCcXTiJi+5GvxLwih8yE
> 6CQ32kIUuspJ7N5hyiJvM4AcuWWMldDlZaYoHuUwhVbWCCT+Y4X6R1+IZfyXZnvn
> wFEMD3+3r382M3G0uyh2MJk899l1kSPcX+BtRg3pOqDZh0WR+2xWpTndeiMxsmGj
> UC1J1PssKUa1P0dMk7wLvgOl0BiiGC+WwgD7ZfHjF7NPL1jPtW8=
> =LWwW
> -END PGP SIGNATURE-
>

RE: Require searching only for file content and not metadata

2019-08-27 Thread Khare, Kushal (MIND)
Basically, what problem I am facing is - I am getting the textual content + 
other metadata in my _text_ field. But, I want only the textual content written 
inside the document.
I tried various Request Handler Update Extract configurations, but none of them 
worked for me.
Please help me resolve this as I am badly stuck in this.

-Original Message-
From: Khare, Kushal (MIND) [mailto:kushal.kh...@mind-infotech.com]
Sent: 27 August 2019 12:59
To: solr-user@lucene.apache.org; ch...@christopherschultz.net
Subject: RE: Require searching only for file content and not metadata

Chris,
What I have done is, I just created a core, used the POST tool to index the 
documents from my file system, and then moved to the Solr Admin UI for querying.
By 'Metadata' vs 'Content', I mean that I just want the field '_text_' to be 
searched, instead of all the fields that Solr creates by itself, like author 
name, last modified, creator, id, etc.
I simply want Solr to search only the content inside the document (the body 
of the document) & not all the fields. For example, if I search for 
'Kushal', it should return the document only if it has the word in it as 
content, not because it has the author name or owner as Kushal.
Hope it's clearer now than before. Please help me with this!

Thankyou!
Kushal Khare

-Original Message-
From: Christopher Schultz [mailto:ch...@christopherschultz.net]
Sent: 26 August 2019 18:47
To: solr-user@lucene.apache.org
Subject: Re: Require searching only for file content and not metadata

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Kushal,

On 8/26/19 07:52, Khare, Kushal (MIND) wrote:
> This is Kushal Khare, a new addition to the user-list. I started
> working with Solr few days ago for implementing it in my project.
>
> Now, I have the basics done, and reached the query stage.
>
> My problem is – I need to restrict the solr to search only for the
> file content and not the metadata. I have gone through various
> articles on the internet, but could not get any help.
>
> Therefore, I hope I could get some solutions here.

How are you querying Solr? Are you querying from a web application? From a 
thick-client application? Directly from a web browser?

What do you consider "metadata" versus "content"? To Solr, everything is the 
same...

- -chris
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAl1j268ACgkQHPApP6U8
pFi6GA//VY8SU6H5T3G6fpUqQrVp05E9g7f0oGGVW1eaRY3NjgQzfbwJQmJqg16Y
MyUKpp0/P6EpR/dMPmiKBPvLppSqjT1SUNgrFi2btwtBaTibxWXd0WtEqNdinWCo
DFyJaPQaIT20IR887SPWrQSYc4oC8aKNAEDAXxlyWDzEgImE23AyCeWs++gJsaKm
RphkleBeIKCX6SkRzDFeEzx4VyKBZKcjI+Ks/9z2s9tcGmElxyMDPHYf5VXJQgcz
A1D3jPVPqm2OMvThXd2ll4NlnXe2PWV5eYfZQt/6YMwx4jF+rqG66jDXEhTHzDro
jmiZVj1VbQ0RlFLqP6OHu2YRj+01a0OtE8l4mWiGSNIrKymp+ycT9E+L0eC9yGIT
hLUfo7a3ONfOTTNAbuI/363+2WA1wBxSHm2m3kQT8Ho8ydjd7w/umR1L6/wr+q9B
jEZfAHs1TLFXd6lgqLtmIyf6Ya5bloWM+yjwnjfpniOuHCcXTiJi+5GvxLwih8yE
6CQ32kIUuspJ7N5hyiJvM4AcuWWMldDlZaYoHuUwhVbWCCT+Y4X6R1+IZfyXZnvn
wFEMD3+3r382M3G0uyh2MJk899l1kSPcX+BtRg3pOqDZh0WR+2xWpTndeiMxsmGj
UC1J1PssKUa1P0dMk7wLvgOl0BiiGC+WwgD7ZfHjF7NPL1jPtW8=
=LWwW
-END PGP SIGNATURE-





RE: Require searching only for file content and not metadata

2019-08-27 Thread Khare, Kushal (MIND)
Chris,
What I have done is, I just created a core, used the POST tool to index the 
documents from my file system, and then moved to the Solr Admin UI for querying.
By 'Metadata' vs 'Content', I mean that I just want the field '_text_' to be 
searched, instead of all the fields that Solr creates by itself, like author 
name, last modified, creator, id, etc.
I simply want Solr to search only the content inside the document (the body 
of the document) & not all the fields. For example, if I search for 
'Kushal', it should return the document only if it has the word in it as 
content, not because it has the author name or owner as Kushal.
Hope it's clearer now than before. Please help me with this!

Thankyou!
Kushal Khare

-Original Message-
From: Christopher Schultz [mailto:ch...@christopherschultz.net]
Sent: 26 August 2019 18:47
To: solr-user@lucene.apache.org
Subject: Re: Require searching only for file content and not metadata

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Kushal,

On 8/26/19 07:52, Khare, Kushal (MIND) wrote:
> This is Kushal Khare, a new addition to the user-list. I started
> working with Solr few days ago for implementing it in my project.
>
> Now, I have the basics done, and reached the query stage.
>
> My problem is – I need to restrict the solr to search only for the
> file content and not the metadata. I have gone through various
> articles on the internet, but could not get any help.
>
> Therefore, I hope I could get some solutions here.

How are you querying Solr? Are you querying from a web application? From a 
thick-client application? Directly from a web browser?

What do you consider "metadata" versus "content"? To Solr, everything is the 
same...

- -chris
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAl1j268ACgkQHPApP6U8
pFi6GA//VY8SU6H5T3G6fpUqQrVp05E9g7f0oGGVW1eaRY3NjgQzfbwJQmJqg16Y
MyUKpp0/P6EpR/dMPmiKBPvLppSqjT1SUNgrFi2btwtBaTibxWXd0WtEqNdinWCo
DFyJaPQaIT20IR887SPWrQSYc4oC8aKNAEDAXxlyWDzEgImE23AyCeWs++gJsaKm
RphkleBeIKCX6SkRzDFeEzx4VyKBZKcjI+Ks/9z2s9tcGmElxyMDPHYf5VXJQgcz
A1D3jPVPqm2OMvThXd2ll4NlnXe2PWV5eYfZQt/6YMwx4jF+rqG66jDXEhTHzDro
jmiZVj1VbQ0RlFLqP6OHu2YRj+01a0OtE8l4mWiGSNIrKymp+ycT9E+L0eC9yGIT
hLUfo7a3ONfOTTNAbuI/363+2WA1wBxSHm2m3kQT8Ho8ydjd7w/umR1L6/wr+q9B
jEZfAHs1TLFXd6lgqLtmIyf6Ya5bloWM+yjwnjfpniOuHCcXTiJi+5GvxLwih8yE
6CQ32kIUuspJ7N5hyiJvM4AcuWWMldDlZaYoHuUwhVbWCCT+Y4X6R1+IZfyXZnvn
wFEMD3+3r382M3G0uyh2MJk899l1kSPcX+BtRg3pOqDZh0WR+2xWpTndeiMxsmGj
UC1J1PssKUa1P0dMk7wLvgOl0BiiGC+WwgD7ZfHjF7NPL1jPtW8=
=LWwW
-END PGP SIGNATURE-





Re: Require searching only for file content and not metadata

2019-08-26 Thread Christopher Schultz
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Kushal,

On 8/26/19 07:52, Khare, Kushal (MIND) wrote:
> This is Kushal Khare, a new addition to the user-list. I started 
> working with Solr few days ago for implementing it in my project.
> 
> Now, I have the basics done, and reached the query stage.
> 
> My problem is – I need to restrict the solr to search only for the 
> file content and not the metadata. I have gone through various 
> articles on the internet, but could not get any help.
> 
> Therefore, I hope I could get some solutions here.

How are you querying Solr? Are you querying from a web application? From
a thick-client application? Directly from a web browser?

What do you consider "metadata" versus "content"? To Solr, everything
is the same...

- -chris
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAl1j268ACgkQHPApP6U8
pFi6GA//VY8SU6H5T3G6fpUqQrVp05E9g7f0oGGVW1eaRY3NjgQzfbwJQmJqg16Y
MyUKpp0/P6EpR/dMPmiKBPvLppSqjT1SUNgrFi2btwtBaTibxWXd0WtEqNdinWCo
DFyJaPQaIT20IR887SPWrQSYc4oC8aKNAEDAXxlyWDzEgImE23AyCeWs++gJsaKm
RphkleBeIKCX6SkRzDFeEzx4VyKBZKcjI+Ks/9z2s9tcGmElxyMDPHYf5VXJQgcz
A1D3jPVPqm2OMvThXd2ll4NlnXe2PWV5eYfZQt/6YMwx4jF+rqG66jDXEhTHzDro
jmiZVj1VbQ0RlFLqP6OHu2YRj+01a0OtE8l4mWiGSNIrKymp+ycT9E+L0eC9yGIT
hLUfo7a3ONfOTTNAbuI/363+2WA1wBxSHm2m3kQT8Ho8ydjd7w/umR1L6/wr+q9B
jEZfAHs1TLFXd6lgqLtmIyf6Ya5bloWM+yjwnjfpniOuHCcXTiJi+5GvxLwih8yE
6CQ32kIUuspJ7N5hyiJvM4AcuWWMldDlZaYoHuUwhVbWCCT+Y4X6R1+IZfyXZnvn
wFEMD3+3r382M3G0uyh2MJk899l1kSPcX+BtRg3pOqDZh0WR+2xWpTndeiMxsmGj
UC1J1PssKUa1P0dMk7wLvgOl0BiiGC+WwgD7ZfHjF7NPL1jPtW8=
=LWwW
-END PGP SIGNATURE-


Require searching only for file content and not metadata

2019-08-26 Thread Khare, Kushal (MIND)
Hello Guys!
This is Kushal Khare, a new addition to the user-list. I started working with 
Solr a few days ago to implement it in my project.
Now, I have the basics done, and have reached the query stage.
My problem is – I need to restrict Solr to searching only the file content 
and not the metadata. I have gone through various articles on the internet, but 
could not get any help.
Therefore, I hope I could get some solutions here.
Thanks ! Waiting for some response!







Re: Searching on dates and time

2019-07-05 Thread Erick Erickson


There should be a number of these in the example schemas, although perhaps 
without indexed=“true” in the fieldType...

DateRanges are pretty cool, but this is in the “keep it simple” category: you 
might just be able to use plain pdates with the standard [time TO time] syntax.

Although when I try your example (this is on 8.1), all your examples return the 
document when using DateRange. I indexed the doc with SolrJ just as you did. I 
was trying the queries in the admin UI; without quotes I didn’t need to escape, 
with quotes I did. I don’t know if daterange behavior has changed since 7.3; I 
don’t see any JIRAs on a quick scan that look pertinent.

The range works too if
1> you include the square brackets like all other range fields
2> correctly format the end date, 2019-02-06T12:00 not 2019-02-06:12:00.

BTW, the sweet spot for DateRange is you can, well, index a range, as: 
doc.addField(“dfield", "[2019-02-05T12:04:00Z TO 2019-02-06T12:04:00Z]”);
If all you’re trying to do is index a single point in time, I’d recommend pdate.

Best,
Erick
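
A compact sketch of the two options (type and field names here are 
illustrative, not from Steven's actual schema):

  <!-- a plain point in time: pdate (solr.DatePointField in the default configs) -->
  <field name="CC_FILE_DATETIME" type="pdate" indexed="true" stored="false"/>
  q=CC_FILE_DATETIME:[2019-02-05T12:04:00Z TO 2019-02-06T12:00:00Z]

  <!-- an actual range: DateRangeField -->
  <fieldType name="rdates" class="solr.DateRangeField"/>
  <field name="dfield" type="rdates" indexed="true" stored="false"/>
  doc.addField("dfield", "[2019-02-05T12:04:00Z TO 2019-02-06T12:04:00Z]");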

> On Jul 5, 2019, at 6:19 PM, Steven White  wrote:
> 
> Achieving the use-case is a must.  So if there is an alternative to
> using solr.DateRangeField,
> I'm willing to use it.  What do you mean by "pdate" and what is it?
> 
> I'm reading this link on how to use DateRangeField, yet it is not
> working for me:
> https://lucene.apache.org/solr/guide/6_6/working-with-dates.html  Is the
> issue with my schema the way i set it up?  The way I'm indexing the data?
> Or something else?
> 
> Steven
> 
> On Fri, Jul 5, 2019 at 3:03 PM Erick Erickson 
> wrote:
> 
>> I think what Mikhail is asking is whether your use-case would be satisfied
>> by just indexing a standard pdate rather than a daterange, then querying
>> by
>> 
>> fq=CC_FILE_DATETIME:[some_date/MONTH TO some_maybe_other_full_date].
>> 
>> With regular pdates, you can use “date math” to round to whatever you want
>> on one or both parts of the query.
>> 
>> A note you might be interested in about “fq” clauses and dates in the
>> filter cache:
>> 
>> https://dzone.com/articles/solr-date-math-now-and-filter
>> 
>> Best,
>> Erick
>> 
>>> On Jul 5, 2019, at 11:55 AM, Steven White  wrote:
>>> 
>>> I need both: point in time and range.  In both cases, I need to be able
>> to
>>> search between just 2 years, between year-month to year-month-day-time,
>>> etc.  So getting my schema right, what and how I index right and the
>> search
>>> syntax right are all important.  This is why, in my original post, I
>> shared
>>> my schema, what I'm indexing and search syntax I'm trying to use.  If I
>> got
>>> anything wrong here to get the feature working right, please let me know.
>>> 
>>> Steven.
>>> 
>>> On Fri, Jul 5, 2019 at 2:16 PM Mikhail Khludnev  wrote:
>>> 
 Hold on. Do you need a range or just point in time?
 
 On Fri, Jul 5, 2019 at 6:51 PM Steven White 
>> wrote:
 
> Thanks Mikhail.  I will read those links and switch over to latest
>> Solr.
> 
> Just to be sure, my schema setup and the way I'm indexing the date data
 are
> not the issue, right?
> 
> Steven.
> 
> On Fri, Jul 5, 2019 at 11:05 AM Mikhail Khludnev 
 wrote:
> 
>> Hello,
>> 
>> The indexed daterange value is really narrow, it might not be easy to
> pick
>> per se. I'm in doubts regarding " in queries. At least TO syntax
 expects
> [
>> ]
>> You can start from these baseline cases
>> 
>> 
> 
 
>> https://github.com/apache/lucene-solr/blob/master/solr/core/src/test/org/apache/solr/schema/DateRangeFieldTest.java
>> 
>> and check
>> 
>> 
> 
 
>> https://lucene.apache.org/solr/guide/8_0/working-with-dates.html#date-range-formatting
>> (also see below) for sure.
>> Also, I remember lack of strictness in 7,2.1 see
>> https://issues.apache.org/jira/browse/LUCENE-8640
>> 
>> On Fri, Jul 5, 2019 at 5:11 PM Steven White 
> wrote:
>> 
>>> Hi everyone,
>>> 
>>> I'm using Solr 7.2.1 but can upgrade if I must.
>>> 
>>> I setup my schema like so:
>>> 
>>>   
>>>    indexed="true"
>>> required="true"stored="false"  multiValued="false" />
>>> 
>>> And indexed my data like so:
>>> 
>>>   doc.addField("CC_FILE_DATETIME", "2019-02-05T12:04:00Z");;
>>> 
>>> When I try to search against this field, some search are working,
> others
>>> are not.  Here are examples
>>> 
>>>   I get a hit: CC_FILE_DATETIME:"2019-02-05"
>>>   I don't get a hit: CC_FILE_DATETIME:"2019-02-05T12"
>>>   I don't get a hit: CC_FILE_DATETIME:"2019-02-05T12:04"
>>>   I don't get a hit: CC_FILE_DATETIME:"2019-02-05T12:04:00Z"
>>> 
>>> I'm seeing issues with range search took, like so:
>>> 
>>>   I don't get a hit: CC_FILE_DATETIME:"2019-02-05T12:04 TO
>>> 2019-02-06:12:00"
>>> 
>>> It looks to me that anytime I include the time part, 

Re: Searching on dates and time

2019-07-05 Thread Steven White
Achieving the use-case is a must.  So if there is an alternative to
using solr.DateRangeField,
I'm willing to use it.  What do you mean by "pdate" and what is it?

I'm reading this link on how to use DateRangeField, yet it is not
working for me:
https://lucene.apache.org/solr/guide/6_6/working-with-dates.html  Is the
issue with my schema the way I set it up?  The way I'm indexing the data?
Or something else?

Steven

On Fri, Jul 5, 2019 at 3:03 PM Erick Erickson 
wrote:

> I think what Mikhail is asking is whether your use-case would be satisfied
> by just indexing a standard pdate rather than a daterange, then querying
> by
>
> fq=CC_FILE_DATETIME:[some_date/MONTH TO some_maybe_other_full_date].
>
> With regular pdates, you can use “date math” to round to whatever you want
> on one or both parts of the query.
>
> A note you might be interested in about “fq” clauses and dates in the
> filter cache:
>
> https://dzone.com/articles/solr-date-math-now-and-filter
>
> Best,
> Erick
>
> > On Jul 5, 2019, at 11:55 AM, Steven White  wrote:
> >
> > I need both: point in time and range.  In both cases, I need to be able
> to
> > search between just 2 years, between year-month to year-month-day-time,
> > etc.  So getting my schema right, what and how I index right and the
> search
> > syntax right are all important.  This is why, in my original post, I
> shared
> > my schema, what I'm indexing and search syntax I'm trying to use.  If I
> got
> > anything wrong here to get the feature working right, please let me know.
> >
> > Steven.
> >
> > On Fri, Jul 5, 2019 at 2:16 PM Mikhail Khludnev  wrote:
> >
> >> Hold on. Do you need a range or just point in time?
> >>
> >> On Fri, Jul 5, 2019 at 6:51 PM Steven White 
> wrote:
> >>
> >>> Thanks Mikhail.  I will read those links and switch over to latest
> Solr.
> >>>
> >>> Just to be sure, my schema setup and the way I'm indexing the date data
> >> are
> >>> not the issue, right?
> >>>
> >>> Steven.
> >>>
> >>> On Fri, Jul 5, 2019 at 11:05 AM Mikhail Khludnev 
> >> wrote:
> >>>
>  Hello,
> 
>  The indexed daterange value is really narrow, it might not be easy to
> >>> pick
>  per se. I'm in doubts regarding " in queries. At least TO syntax
> >> expects
> >>> [
>  ]
>  You can start from these baseline cases
> 
> 
> >>>
> >>
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/test/org/apache/solr/schema/DateRangeFieldTest.java
> 
>  and check
> 
> 
> >>>
> >>
> https://lucene.apache.org/solr/guide/8_0/working-with-dates.html#date-range-formatting
>  (also see below) for sure.
>  Also, I remember lack of strictness in 7,2.1 see
>  https://issues.apache.org/jira/browse/LUCENE-8640
> 
>  On Fri, Jul 5, 2019 at 5:11 PM Steven White 
> >>> wrote:
> 
> > Hi everyone,
> >
> > I'm using Solr 7.2.1 but can upgrade if I must.
> >
> > I setup my schema like so:
> >
> >
> > >>> indexed="true"
> > required="true"stored="false"  multiValued="false" />
> >
> > And indexed my data like so:
> >
> >doc.addField("CC_FILE_DATETIME", "2019-02-05T12:04:00Z");;
> >
> > When I try to search against this field, some search are working,
> >>> others
> > are not.  Here are examples
> >
> >I get a hit: CC_FILE_DATETIME:"2019-02-05"
> >I don't get a hit: CC_FILE_DATETIME:"2019-02-05T12"
> >I don't get a hit: CC_FILE_DATETIME:"2019-02-05T12:04"
> >I don't get a hit: CC_FILE_DATETIME:"2019-02-05T12:04:00Z"
> >
> > I'm seeing issues with range search took, like so:
> >
> >I don't get a hit: CC_FILE_DATETIME:"2019-02-05T12:04 TO
> > 2019-02-06:12:00"
> >
> > It looks to me that anytime I include the time part, it won't work
> >> and
>  yes
> > I tried escaping ":" like so "\:" but that didn't help.
> >
> > Can someone guide me through this?
> >
> > Thank you
> >
> > Steven
> >
> 
> 
>  --
>  Sincerely yours
>  Mikhail Khludnev
> 
> >>>
> >>
> >>
> >> --
> >> Sincerely yours
> >> Mikhail Khludnev
> >>
>
>


Re: Searching on dates and time

2019-07-05 Thread Erick Erickson
I think what Mikhail is asking is whether your use-case would be satisfied
by just indexing a standard pdate rather than a daterange, then querying
by 

fq=CC_FILE_DATETIME:[some_date/MONTH TO some_maybe_other_full_date].

With regular pdates, you can use “date math” to round to whatever you want
on one or both parts of the query.

A note you might be interested in about “fq” clauses and dates in the
filter cache:

https://dzone.com/articles/solr-date-math-now-and-filter

Best,
Erick
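
For instance (field name as in the thread, dates illustrative), rounding one or 
both endpoints with date math looks like:

  fq=CC_FILE_DATETIME:[2019-02-01T00:00:00Z/MONTH TO 2019-02-06T12:00:00Z]
  fq=CC_FILE_DATETIME:[NOW/DAY-7DAYS TO NOW/DAY]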

> On Jul 5, 2019, at 11:55 AM, Steven White  wrote:
> 
> I need both: point in time and range.  In both cases, I need to be able to
> search between just 2 years, between year-month to year-month-day-time,
> etc.  So getting my schema right, what and how I index right and the search
> syntax right are all important.  This is why, in my original post, I shared
> my schema, what I'm indexing and search syntax I'm trying to use.  If I got
> anything wrong here to get the feature working right, please let me know.
> 
> Steven.
> 
> On Fri, Jul 5, 2019 at 2:16 PM Mikhail Khludnev  wrote:
> 
>> Hold on. Do you need a range or just point in time?
>> 
>> On Fri, Jul 5, 2019 at 6:51 PM Steven White  wrote:
>> 
>>> Thanks Mikhail.  I will read those links and switch over to latest Solr.
>>> 
>>> Just to be sure, my schema setup and the way I'm indexing the date data
>> are
>>> not the issue, right?
>>> 
>>> Steven.
>>> 
>>> On Fri, Jul 5, 2019 at 11:05 AM Mikhail Khludnev 
>> wrote:
>>> 
 Hello,
 
 The indexed daterange value is really narrow, it might not be easy to
>>> pick
 per se. I'm in doubts regarding " in queries. At least TO syntax
>> expects
>>> [
 ]
 You can start from these baseline cases
 
 
>>> 
>> https://github.com/apache/lucene-solr/blob/master/solr/core/src/test/org/apache/solr/schema/DateRangeFieldTest.java
 
 and check
 
 
>>> 
>> https://lucene.apache.org/solr/guide/8_0/working-with-dates.html#date-range-formatting
 (also see below) for sure.
 Also, I remember lack of strictness in 7,2.1 see
 https://issues.apache.org/jira/browse/LUCENE-8640
 
 On Fri, Jul 5, 2019 at 5:11 PM Steven White 
>>> wrote:
 
> Hi everyone,
> 
> I'm using Solr 7.2.1 but can upgrade if I must.
> 
> I setup my schema like so:
> 
>
>>> indexed="true"
> required="true"stored="false"  multiValued="false" />
> 
> And indexed my data like so:
> 
>doc.addField("CC_FILE_DATETIME", "2019-02-05T12:04:00Z");;
> 
> When I try to search against this field, some search are working,
>>> others
> are not.  Here are examples
> 
>I get a hit: CC_FILE_DATETIME:"2019-02-05"
>I don't get a hit: CC_FILE_DATETIME:"2019-02-05T12"
>I don't get a hit: CC_FILE_DATETIME:"2019-02-05T12:04"
>I don't get a hit: CC_FILE_DATETIME:"2019-02-05T12:04:00Z"
> 
> I'm seeing issues with range search took, like so:
> 
>I don't get a hit: CC_FILE_DATETIME:"2019-02-05T12:04 TO
> 2019-02-06:12:00"
> 
> It looks to me that anytime I include the time part, it won't work
>> and
 yes
> I tried escaping ":" like so "\:" but that didn't help.
> 
> Can someone guide me through this?
> 
> Thank you
> 
> Steven
> 
 
 
 --
 Sincerely yours
 Mikhail Khludnev
 
>>> 
>> 
>> 
>> --
>> Sincerely yours
>> Mikhail Khludnev
>> 



Re: Searching on dates and time

2019-07-05 Thread Steven White
I need both: point in time and range.  In both cases, I need to be able to
search between just two years, between a year-month and a year-month-day-time,
etc.  So getting my schema right, what and how I index, and the search
syntax right are all important.  This is why, in my original post, I shared
my schema, what I'm indexing, and the search syntax I'm trying to use.  If I got
anything wrong here to get the feature working right, please let me know.

Steven.

On Fri, Jul 5, 2019 at 2:16 PM Mikhail Khludnev  wrote:

> Hold on. Do you need a range or just point in time?
>
> On Fri, Jul 5, 2019 at 6:51 PM Steven White  wrote:
>
> > Thanks Mikhail.  I will read those links and switch over to latest Solr.
> >
> > Just to be sure, my schema setup and the way I'm indexing the date data
> are
> > not the issue, right?
> >
> > Steven.
> >
> > On Fri, Jul 5, 2019 at 11:05 AM Mikhail Khludnev 
> wrote:
> >
> > > Hello,
> > >
> > > The indexed daterange value is really narrow, it might not be easy to
> > pick
> > > per se. I'm in doubts regarding " in queries. At least TO syntax
> expects
> > [
> > > ]
> > > You can start from these baseline cases
> > >
> > >
> >
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/test/org/apache/solr/schema/DateRangeFieldTest.java
> > >
> > > and check
> > >
> > >
> >
> https://lucene.apache.org/solr/guide/8_0/working-with-dates.html#date-range-formatting
> > > (also see below) for sure.
> > > Also, I remember lack of strictness in 7,2.1 see
> > > https://issues.apache.org/jira/browse/LUCENE-8640
> > >
> > > On Fri, Jul 5, 2019 at 5:11 PM Steven White 
> > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > I'm using Solr 7.2.1 but can upgrade if I must.
> > > >
> > > > I setup my schema like so:
> > > >
> > > > 
> > > >  > indexed="true"
> > > > required="true"stored="false"  multiValued="false" />
> > > >
> > > > And indexed my data like so:
> > > >
> > > > doc.addField("CC_FILE_DATETIME", "2019-02-05T12:04:00Z");;
> > > >
> > > > When I try to search against this field, some search are working,
> > others
> > > > are not.  Here are examples
> > > >
> > > > I get a hit: CC_FILE_DATETIME:"2019-02-05"
> > > > I don't get a hit: CC_FILE_DATETIME:"2019-02-05T12"
> > > > I don't get a hit: CC_FILE_DATETIME:"2019-02-05T12:04"
> > > > I don't get a hit: CC_FILE_DATETIME:"2019-02-05T12:04:00Z"
> > > >
> > > > I'm seeing issues with range search took, like so:
> > > >
> > > > I don't get a hit: CC_FILE_DATETIME:"2019-02-05T12:04 TO
> > > > 2019-02-06:12:00"
> > > >
> > > > It looks to me that anytime I include the time part, it won't work
> and
> > > yes
> > > > I tried escaping ":" like so "\:" but that didn't help.
> > > >
> > > > Can someone guide me through this?
> > > >
> > > > Thank you
> > > >
> > > > Steven
> > > >
> > >
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
> > >
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


Re: Searching on dates and time

2019-07-05 Thread Mikhail Khludnev
Hold on. Do you need a range or just point in time?

On Fri, Jul 5, 2019 at 6:51 PM Steven White  wrote:

> Thanks Mikhail.  I will read those links and switch over to latest Solr.
>
> Just to be sure, my schema setup and the way I'm indexing the date data are
> not the issue, right?
>
> Steven.
>
> On Fri, Jul 5, 2019 at 11:05 AM Mikhail Khludnev  wrote:
>
> > Hello,
> >
> > The indexed daterange value is really narrow, it might not be easy to
> pick
> > per se. I'm in doubts regarding " in queries. At least TO syntax expects
> [
> > ]
> > You can start from these baseline cases
> >
> >
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/test/org/apache/solr/schema/DateRangeFieldTest.java
> >
> > and check
> >
> >
> https://lucene.apache.org/solr/guide/8_0/working-with-dates.html#date-range-formatting
> > (also see below) for sure.
> > Also, I remember lack of strictness in 7,2.1 see
> > https://issues.apache.org/jira/browse/LUCENE-8640
> >
> > On Fri, Jul 5, 2019 at 5:11 PM Steven White 
> wrote:
> >
> > > Hi everyone,
> > >
> > > I'm using Solr 7.2.1 but can upgrade if I must.
> > >
> > > I setup my schema like so:
> > >
> > > 
> > >  indexed="true"
> > > required="true"stored="false"  multiValued="false" />
> > >
> > > And indexed my data like so:
> > >
> > > doc.addField("CC_FILE_DATETIME", "2019-02-05T12:04:00Z");;
> > >
> > > When I try to search against this field, some search are working,
> others
> > > are not.  Here are examples
> > >
> > > I get a hit: CC_FILE_DATETIME:"2019-02-05"
> > > I don't get a hit: CC_FILE_DATETIME:"2019-02-05T12"
> > > I don't get a hit: CC_FILE_DATETIME:"2019-02-05T12:04"
> > > I don't get a hit: CC_FILE_DATETIME:"2019-02-05T12:04:00Z"
> > >
> > > I'm seeing issues with range search took, like so:
> > >
> > > I don't get a hit: CC_FILE_DATETIME:"2019-02-05T12:04 TO
> > > 2019-02-06:12:00"
> > >
> > > It looks to me that anytime I include the time part, it won't work and
> > yes
> > > I tried escaping ":" like so "\:" but that didn't help.
> > >
> > > Can someone guide me through this?
> > >
> > > Thank you
> > >
> > > Steven
> > >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Searching on dates and time

2019-07-05 Thread Steven White
Thanks Mikhail.  I will read those links and switch over to latest Solr.

Just to be sure, my schema setup and the way I'm indexing the date data are
not the issue, right?

Steven.

On Fri, Jul 5, 2019 at 11:05 AM Mikhail Khludnev  wrote:

> Hello,
>
> The indexed daterange value is really narrow, it might not be easy to pick
> per se. I'm in doubts regarding " in queries. At least TO syntax expects [
> ]
> You can start from these baseline cases
>
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/test/org/apache/solr/schema/DateRangeFieldTest.java
>
> and check
>
> https://lucene.apache.org/solr/guide/8_0/working-with-dates.html#date-range-formatting
> (also see below) for sure.
> Also, I remember a lack of strictness in 7.2.1; see
> https://issues.apache.org/jira/browse/LUCENE-8640
>
> On Fri, Jul 5, 2019 at 5:11 PM Steven White  wrote:
>
> > Hi everyone,
> >
> > I'm using Solr 7.2.1 but can upgrade if I must.
> >
> > I set up my schema like so:
> >
> > <field name="CC_FILE_DATETIME" type="..." indexed="true"
> > required="true" stored="false" multiValued="false" />
> >
> > And indexed my data like so:
> >
> > doc.addField("CC_FILE_DATETIME", "2019-02-05T12:04:00Z");
> >
> > When I try to search against this field, some searches are working, others
> > are not.  Here are examples:
> >
> > I get a hit: CC_FILE_DATETIME:"2019-02-05"
> > I don't get a hit: CC_FILE_DATETIME:"2019-02-05T12"
> > I don't get a hit: CC_FILE_DATETIME:"2019-02-05T12:04"
> > I don't get a hit: CC_FILE_DATETIME:"2019-02-05T12:04:00Z"
> >
> > I'm seeing issues with range search too, like so:
> >
> > I don't get a hit: CC_FILE_DATETIME:"2019-02-05T12:04 TO
> > 2019-02-06:12:00"
> >
> > It looks to me that anytime I include the time part, it won't work and
> yes
> > I tried escaping ":" like so "\:" but that didn't help.
> >
> > Can someone guide me through this?
> >
> > Thank you
> >
> > Steven
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


Re: Searching on dates and time

2019-07-05 Thread Mikhail Khludnev
Hello,

The indexed daterange value is really narrow, so it might not be easy to pick
per se. I'm in doubt regarding " in queries. At least the TO syntax expects [
]
You can start from these baseline cases
https://github.com/apache/lucene-solr/blob/master/solr/core/src/test/org/apache/solr/schema/DateRangeFieldTest.java

and check
https://lucene.apache.org/solr/guide/8_0/working-with-dates.html#date-range-formatting
(also see below) for sure.
Also, I remember a lack of strictness in 7.2.1; see
https://issues.apache.org/jira/browse/LUCENE-8640

On Fri, Jul 5, 2019 at 5:11 PM Steven White  wrote:

> Hi everyone,
>
> I'm using Solr 7.2.1 but can upgrade if I must.
>
> I set up my schema like so:
>
> <field name="CC_FILE_DATETIME" type="..." indexed="true"
> required="true" stored="false" multiValued="false" />
>
> And indexed my data like so:
>
> doc.addField("CC_FILE_DATETIME", "2019-02-05T12:04:00Z");
>
> When I try to search against this field, some searches are working, others
> are not.  Here are examples:
>
> I get a hit: CC_FILE_DATETIME:"2019-02-05"
> I don't get a hit: CC_FILE_DATETIME:"2019-02-05T12"
> I don't get a hit: CC_FILE_DATETIME:"2019-02-05T12:04"
> I don't get a hit: CC_FILE_DATETIME:"2019-02-05T12:04:00Z"
>
> I'm seeing issues with range search too, like so:
>
> I don't get a hit: CC_FILE_DATETIME:"2019-02-05T12:04 TO
> 2019-02-06:12:00"
>
> It looks to me that anytime I include the time part, it won't work and yes
> I tried escaping ":" like so "\:" but that didn't help.
>
> Can someone guide me through this?
>
> Thank you
>
> Steven
>


-- 
Sincerely yours
Mikhail Khludnev

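For reference, here is a minimal SolrJ sketch of the two query shapes discussed
in this thread. It is an illustration rather than code from the thread: the
core name "mycore" and the local URL are assumptions, and it presumes
CC_FILE_DATETIME is backed by a date-capable field type such as
solr.DateRangeField.

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DateQueryExample {
    public static void main(String[] args) throws Exception {
        SolrClient client =
            new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();

        // Point in time: quote the full timestamp so the query parser does
        // not treat the colons inside the value as field separators.
        SolrQuery point = new SolrQuery("CC_FILE_DATETIME:\"2019-02-05T12:04:00Z\"");
        System.out.println(client.query(point).getResults().getNumFound());

        // Range: bracketed [start TO end] with complete timestamps on both ends.
        SolrQuery range = new SolrQuery(
            "CC_FILE_DATETIME:[2019-02-05T12:04:00Z TO 2019-02-06T12:00:00Z]");
        QueryResponse rsp = client.query(range);
        System.out.println(rsp.getResults().getNumFound());

        client.close();
    }
}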

Searching on dates and time

2019-07-05 Thread Steven White
Hi everyone,

I'm using Solr 7.2.1 but can upgrade if I must.

I set up my schema like so:

<field name="CC_FILE_DATETIME" type="..." indexed="true"
required="true" stored="false" multiValued="false" />

And indexed my data like so:

doc.addField("CC_FILE_DATETIME", "2019-02-05T12:04:00Z");

When I try to search against this field, some searches are working, others
are not.  Here are examples:

I get a hit: CC_FILE_DATETIME:"2019-02-05"
I don't get a hit: CC_FILE_DATETIME:"2019-02-05T12"
I don't get a hit: CC_FILE_DATETIME:"2019-02-05T12:04"
I don't get a hit: CC_FILE_DATETIME:"2019-02-05T12:04:00Z"

I'm seeing issues with range search too, like so:

I don't get a hit: CC_FILE_DATETIME:"2019-02-05T12:04 TO
2019-02-06:12:00"

It looks to me that anytime I include the time part, it won't work and yes
I tried escaping ":" like so "\:" but that didn't help.

Can someone guide me through this?

Thank you

Steven

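A matching sketch of the indexing side, again an illustration rather than code
from the thread: the client is a SolrClient as in the previous sketch, and the
explicit commit is only for demonstration (note the single semicolon after
addField).

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.common.SolrInputDocument;

class DateIndexExample {
    static void indexOne(SolrClient client) throws Exception {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("CC_FILE_DATETIME", "2019-02-05T12:04:00Z");
        client.add(doc);
        client.commit();
    }
}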

Re: searching only within a date range

2019-06-07 Thread Mark Fenbers - NOAA Federal
Disregard my previous response.  When I reindexed, something went wrong and
so my Lucene database was empty, which explains the immediate return of 0
results.  I reindexed again (properly) and all is working fine now.  Thanks
for the help.
Mark

On Fri, Jun 7, 2019 at 10:40 AM Erick Erickson 
wrote:

> Yeah, it can be opaque…
>
> My first guess is that you may not have a field “posttime” defined in your
> schema and/or documents. For searching it needs “indexed=true” and for
> faceting/grouping/sorting it should have “docValues=true”. That’s what your
> original facet query was telling you, the field isn’t there. Switching to
> an “fq” clause is consistent with there being no “posttime” field since
> Solr is fine with  docs that don’t have a  particular field. So by
> specifying a date range, any doc without a “posttime” field will be omitted
> from the results.
>
> Or it  just is spelled differently ;)
>
> Some things that might help:
>
> 1> Go to the admin UI and select cores>>your_core, then look at the
> “schema” link. There’s a drop-down that lets you select fields that are
> actually in your index and see  some of the values. My bet: “posttime”
> isn’t in the list. If so, you need to add it and re-index the docs  with a
> posttime field. If there is a “posttime”, select it and look at the upper
> right to see how it’s defined. There are two rows, one for what the schema
> thinks the definition is and one for what is actually in the Lucene  index.
>
> 2> add debug=query to your queries, and run them from the admin UI.
> That’ll give you a _lot_ quicker turn-around as well as some good info
> about how  the query was actually executed.
>
> Best,
> Erick
>
> > On Jun 7, 2019, at 7:23 AM, Mark Fenbers - NOAA Federal
>  wrote:
> >
> > So, instead of addDateRangeFacet(), I used:
> > query.setParam("fq", "posttime:[2010-01-01T00:00:00Z TO
> > 2015-01-01T00:00:00Z]");
> >
> > I didn't get any errors, but the query returned immediately with 0
> > results.  Without this constraint, it searches 13,000 records and takes 1
> to
> > 2 minutes and returns 356 records.  So something is not quite right, and
> > I'm too new at this to understand where I went wrong.
> > Mark
> >
> > On Fri, Jun 7, 2019 at 9:52 AM Andrea Gazzarini 
> > wrote:
> >
> >> Hi Mark, you are using a "range facet" which is a "query-shape" feature,
> >> it doesn't have any constraint on the results (i.e. it doesn't filter at
> >> all).
> >> You need to add a filter query [1] with a date range clause (e.g.
> >> fq=field:[<date1> TO <date2 or *>]).
> >>
> >> Best,
> >> Andrea
> >>
> >> [1]
> >>
> >>
> https://lucene.apache.org/solr/guide/6_6/common-query-parameters.html#CommonQueryParameters-Thefq_FilterQuery_Parameter
> >> [2] https://lucene.apache.org/solr/guide/6_6/working-with-dates.html
> >>
> >> On 07/06/2019 14:02, Mark Fenbers - NOAA Federal wrote:
> >>> Hello!
> >>>
> >>> I have a search setup and it works fine.  I search a text field called
> >>> "logtext" in a database table.  My Java code is like this:
> >>>
> >>> SolrQuery query = new SolrQuery();
> >>> query.setQuery(searchWord);
> >>> query.setParam("df", "logtext");
> >>>
> >>> Then I execute the search... and it works just great.  But now I want
> to
> >>> add a constraint to only search for the "searchWord" within a certain
> >> range
> >>> of time -- given timestamps in the column called "posttime".  So, I
> added
> >>> the code in bold below:
> >>>
> >>> SolrQuery query = new SolrQuery();
> >>> query.setQuery(searchWord);
> >>> *query.setFacet(true);*
> >>> *query.addDateRangeFacet("posttime", new
> Date(System.currentTimeMillis()
> >> -
> >>> 1000L * 86400L * 365L), new Date(System.currentTimeMillis()), "+1DAY");
> >> /*
> >>> from 1 year ago to present */*
> >>> query.setParam("df", "logtext");
> >>>
> >>> But this gives me a complaint: *undefined field: "posttime"* so I
> clearly
> >>> do not understand the arguments needed to addDateRangeFacet().  Can
> >> someone
> >>> help me determine the proper code for doing what I want?
> >>>
> >>> Further, I am puzzled about the "gap" argument [last one in
> >>> addDateRangeFacet()].  What does this do?  I used +1DAY, but I really
> >> have
> >>> no idea the purpose of this.  I haven't found any documentation that
> >>> explains this well.
> >>>
> >>> Mark
> >>>
> >>
> >>
>
>


Re: searching only within a date range

2019-06-07 Thread Mark Fenbers - NOAA Federal
I added "posttime" to the schema first thing this morning, but your message
reminded me that I needed to re-index the table, which I did.  My schema
entry:

<field name="posttime" ... />

But my SQL contains "SELECT posttime as id" and so I tried both "posttime"
and "id" in my setParam() function, namely,
query.setParam("fq", "id:[2007-01-01T00:00:00Z TO 2010-01-01T00:00:00Z]");

So, whether I use "id" (string) or "posttime" (date), my results are an
immediate return of zero results.

I did look in the admin interface and *did* see posttime listed as one of
the index items.  The two rows (Index Analyzer and Query Analyzer) show the
same thing: org.apache.solr.schema.FieldType$DefaultAnalyzer, though I'm
not certain of the implications of this.

I have not attempted your debug=query suggestion just yet...
Mark

On Fri, Jun 7, 2019 at 10:40 AM Erick Erickson 
wrote:

> Yeah, it can be opaque…
>
> My first guess is that you may not have a field “posttime” defined in your
> schema and/or documents. For searching it needs “indexed=true” and for
> faceting/grouping/sorting it should have “docValues=true”. That’s what your
> original facet query was telling you, the field isn’t there. Switching to
> an “fq” clause is consistent with there being no “posttime” field since
> Solr is fine with  docs that don’t have a  particular field. So by
> specifying a date range, any doc without a “posttime” field will be omitted
> from the results.
>
> Or it  just is spelled differently ;)
>
> Some things that might help:
>
> 1> Go to the admin UI and select cores>>your_core, then look at the
> “schema” link. There’s a drop-down that lets you select fields that are
> actually in your index and see  some of the values. My bet: “posttime”
> isn’t in the list. If so, you need to add it and re-index the docs  with a
> posttime field. If there is a “posttime”, select it and look at the upper
> right to see how it’s defined. There are two rows, one for what the schema
> thinks the definition is and one for what is actually in the Lucene  index.
>
> 2> add debug=query to your queries, and run them from the admin UI.
> That’ll give you a _lot_ quicker turn-around as well as some good info
> about how  the query was actually executed.
>
> Best,
> Erick
>
> > On Jun 7, 2019, at 7:23 AM, Mark Fenbers - NOAA Federal
>  wrote:
> >
> > So, instead of addDateRangeFacet(), I used:
> > query.setParam("fq", "posttime:[2010-01-01T00:00:00Z TO
> > 2015-01-01T00:00:00Z]");
> >
> > I didn't get any errors, but the query returned immediately with 0
> > results.  Without this constraint, it searches 13,000 records and takes 1
> to
> > 2 minutes and returns 356 records.  So something is not quite right, and
> > I'm too new at this to understand where I went wrong.
> > Mark
> >
> > On Fri, Jun 7, 2019 at 9:52 AM Andrea Gazzarini 
> > wrote:
> >
> >> Hi Mark, you are using a "range facet" which is a "query-shape" feature,
> >> it doesn't have any constraint on the results (i.e. it doesn't filter at
> >> all).
> >> You need to add a filter query [1] with a date range clause (e.g.
> >> fq=field:[<date1> TO <date2 or *>]).
> >>
> >> Best,
> >> Andrea
> >>
> >> [1]
> >>
> >>
> https://lucene.apache.org/solr/guide/6_6/common-query-parameters.html#CommonQueryParameters-Thefq_FilterQuery_Parameter
> >> [2] https://lucene.apache.org/solr/guide/6_6/working-with-dates.html
> >>
> >> On 07/06/2019 14:02, Mark Fenbers - NOAA Federal wrote:
> >>> Hello!
> >>>
> >>> I have a search setup and it works fine.  I search a text field called
> >>> "logtext" in a database table.  My Java code is like this:
> >>>
> >>> SolrQuery query = new SolrQuery();
> >>> query.setQuery(searchWord);
> >>> query.setParam("df", "logtext");
> >>>
> >>> Then I execute the search... and it works just great.  But now I want
> to
> >>> add a constraint to only search for the "searchWord" within a certain
> >> range
> >>> of time -- given timestamps in the column called "posttime".  So, I
> added
> >>> the code in bold below:
> >>>
> >>> SolrQuery query = new SolrQuery();
> >>> query.setQuery(searchWord);
> >>> *query.setFacet(true);*
> >>> *query.addDateRangeFacet("posttime", new
> Date(System.currentTimeMillis()
> >> -
> >>> 1000L * 86400L * 365L), new Date(System.currentTimeMillis()), "+1DAY");
> >> /*
> >>> from 1 year ago to present */*
> >>> query.setParam("df", "logtext");
> >>>
> >>> But this gives me a complaint: *undefined field: "posttime"* so I
> clearly
> >>> do not understand the arguments needed to addDateRangeFacet().  Can
> >> someone
> >>> help me determine the proper code for doing what I want?
> >>>
> >>> Further, I am puzzled about the "gap" argument [last one in
> >>> addDateRangeFacet()].  What does this do?  I used +1DAY, but I really
> >> have
> >>> no idea the purpose of this.  I haven't found any documentation that
> >>> explains this well.
> >>>
> >>> Mark
> >>>
> >>
> >>
>
>


Re: searching only within a date range

2019-06-07 Thread Erick Erickson
Yeah, it can be opaque…

My first guess is that you may not have a field “posttime” defined in your 
schema and/or documents. For searching it needs “indexed=true” and for 
faceting/grouping/sorting it should have “docValues=true”. That’s what your 
original facet query was telling you, the field isn’t there. Switching to an 
“fq” clause is consistent with there being no “posttime” field since Solr is 
fine with  docs that don’t have a  particular field. So by specifying a date 
range, any doc without a “posttime” field will be omitted from the results.

Or it  just is spelled differently ;)

Some things that might help:

1> Go to the admin UI and select cores>>your_core, then look at the “schema” 
link. There’s a drop-down that lets you select fields that are actually in your 
index and see  some of the values. My bet: “posttime” isn’t in the list. If so, 
you need to add it and re-index the docs  with a posttime field. If there is a 
“posttime”, select it and look at the upper right to see how it’s defined. 
There are two rows, one for what the schema thinks the definition is and one 
for what is actually in the Lucene  index.

> 2> add debug=query to your queries, and run them from the admin UI. That’ll
give you a _lot_ quicker turn-around as well as some good info about how  the 
query was actually executed.

Best,
Erick

> On Jun 7, 2019, at 7:23 AM, Mark Fenbers - NOAA Federal 
>  wrote:
> 
> So, instead of addDateRangeFacet(), I used:
> query.setParam("fq", "posttime:[2010-01-01T00:00:00Z TO
> 2015-01-01T00:00:00Z]");
> 
> I didn't get any errors, but the query returned immediately with 0
> results.  Without this constraint, it searches 13,000 records and takes 1 to
> 2 minutes and returns 356 records.  So something is not quite right, and
> I'm too new at this to understand where I went wrong.
> Mark
> 
> On Fri, Jun 7, 2019 at 9:52 AM Andrea Gazzarini 
> wrote:
> 
>> Hi Mark, you are using a "range facet" which is a "query-shape" feature,
>> it doesn't have any constraint on the results (i.e. it doesn't filter at
>> all).
>> You need to add a filter query [1] with a date range clause (e.g.
>> fq=field:[<date1> TO <date2 or *>]).
>> 
>> Best,
>> Andrea
>> 
>> [1]
>> 
>> https://lucene.apache.org/solr/guide/6_6/common-query-parameters.html#CommonQueryParameters-Thefq_FilterQuery_Parameter
>> [2] https://lucene.apache.org/solr/guide/6_6/working-with-dates.html
>> 
>> On 07/06/2019 14:02, Mark Fenbers - NOAA Federal wrote:
>>> Hello!
>>> 
>>> I have a search setup and it works fine.  I search a text field called
>>> "logtext" in a database table.  My Java code is like this:
>>> 
>>> SolrQuery query = new SolrQuery();
>>> query.setQuery(searchWord);
>>> query.setParam("df", "logtext");
>>> 
>>> Then I execute the search... and it works just great.  But now I want to
>>> add a constraint to only search for the "searchWord" within a certain
>> range
>>> of time -- given timestamps in the column called "posttime".  So, I added
>>> the code in bold below:
>>> 
>>> SolrQuery query = new SolrQuery();
>>> query.setQuery(searchWord);
>>> *query.setFacet(true);*
>>> *query.addDateRangeFacet("posttime", new Date(System.currentTimeMillis()
>> -
>>> 1000L * 86400L * 365L), new Date(System.currentTimeMillis()), "+1DAY");
>> /*
>>> from 1 year ago to present */*
>>> query.setParam("df", "logtext");
>>> 
>>> But this gives me a complaint: *undefined field: "posttime"* so I clearly
>>> do not understand the arguments needed to addDateRangeFacet().  Can
>> someone
>>> help me determine the proper code for doing what I want?
>>> 
>>> Further, I am puzzled about the "gap" argument [last one in
>>> addDateRangeFacet()].  What does this do?  I used +1DAY, but I really
>> have
>>> no idea the purpose of this.  I haven't found any documentation that
>>> explains this well.
>>> 
>>> Mark
>>> 
>> 
>> 


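For reference, a small SolrJ sketch combining Mark's filter query with the
debug=query suggestion above. It is an illustration, not code from the thread;
the field names mirror Mark's setup and searchWord is a placeholder.

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;

class DebugQueryExample {
    static QueryResponse search(SolrClient client, String searchWord) throws Exception {
        SolrQuery query = new SolrQuery(searchWord);
        query.setParam("df", "logtext");
        // Restricts results to the date window; requires an indexed date field.
        query.addFilterQuery("posttime:[2010-01-01T00:00:00Z TO 2015-01-01T00:00:00Z]");
        // Same effect as adding debug=query to the request from the admin UI.
        query.setParam("debug", "query");
        return client.query(query);
    }
}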

Re: searching only within a date range

2019-06-07 Thread Mark Fenbers - NOAA Federal
So, instead of addDateRangeFacet(), I used:
query.setParam("fq", "posttime:[2010-01-01T00:00:00Z TO
2015-01-01T00:00:00Z]");

I didn't get any errors, but the query returned immediately with 0
results.  Without this constraint, it searches 13,000 records and takes 1 to
2 minutes and returns 356 records.  So something is not quite right, and
I'm too new at this to understand where I went wrong.
Mark

On Fri, Jun 7, 2019 at 9:52 AM Andrea Gazzarini 
wrote:

> Hi Mark, you are using a "range facet" which is a "query-shape" feature,
> it doesn't have any constraint on the results (i.e. it doesn't filter at
> all).
> You need to add a filter query [1] with a date range clause (e.g.
> fq=field:[<date1> TO <date2 or *>]).
>
> Best,
> Andrea
>
> [1]
>
> https://lucene.apache.org/solr/guide/6_6/common-query-parameters.html#CommonQueryParameters-Thefq_FilterQuery_Parameter
> [2] https://lucene.apache.org/solr/guide/6_6/working-with-dates.html
>
> On 07/06/2019 14:02, Mark Fenbers - NOAA Federal wrote:
> > Hello!
> >
> > I have a search setup and it works fine.  I search a text field called
> > "logtext" in a database table.  My Java code is like this:
> >
> > SolrQuery query = new SolrQuery();
> > query.setQuery(searchWord);
> > query.setParam("df", "logtext");
> >
> > Then I execute the search... and it works just great.  But now I want to
> > add a constraint to only search for the "searchWord" within a certain
> range
> > of time -- given timestamps in the column called "posttime".  So, I added
> > the code in bold below:
> >
> > SolrQuery query = new SolrQuery();
> > query.setQuery(searchWord);
> > *query.setFacet(true);*
> > *query.addDateRangeFacet("posttime", new Date(System.currentTimeMillis()
> -
> > 1000L * 86400L * 365L), new Date(System.currentTimeMillis()), "+1DAY");
> /*
> > from 1 year ago to present */*
> > query.setParam("df", "logtext");
> >
> > But this gives me a complaint: *undefined field: "posttime"* so I clearly
> > do not understand the arguments needed to addDateRangeFacet().  Can
> someone
> > help me determine the proper code for doing what I want?
> >
> > Further, I am puzzled about the "gap" argument [last one in
> > addDateRangeFacet()].  What does this do?  I used +1DAY, but I really
> have
> > no idea the purpose of this.  I haven't found any documentation that
> > explains this well.
> >
> > Mark
> >
>
>


Re: searching only within a date range

2019-06-07 Thread Andrea Gazzarini
Hi Mark, you are using a "range facet" which is a "query-shape" feature, 
it doesn't have any constraint on the results (i.e. it doesn't filter at 
all).
You need to add a filter query [1] with a date range clause (e.g. 
fq=field:[<date1> TO <date2 or *>]).


Best,
Andrea

[1] 
https://lucene.apache.org/solr/guide/6_6/common-query-parameters.html#CommonQueryParameters-Thefq_FilterQuery_Parameter

[2] https://lucene.apache.org/solr/guide/6_6/working-with-dates.html

On 07/06/2019 14:02, Mark Fenbers - NOAA Federal wrote:

Hello!

I have a search setup and it works fine.  I search a text field called
"logtext" in a database table.  My Java code is like this:

SolrQuery query = new SolrQuery();
query.setQuery(searchWord);
query.setParam("df", "logtext");

Then I execute the search... and it works just great.  But now I want to
add a constraint to only search for the "searchWord" within a certain range
of time -- given timestamps in the column called "posttime".  So, I added
the code in bold below:

SolrQuery query = new SolrQuery();
query.setQuery(searchWord);
*query.setFacet(true);*
*query.addDateRangeFacet("posttime", new Date(System.currentTimeMillis() -
1000L * 86400L * 365L), new Date(System.currentTimeMillis()), "+1DAY"); /*
from 1 year ago to present */*
query.setParam("df", "logtext");

But this gives me a complaint: *undefined field: "posttime"* so I clearly
do not understand the arguments needed to addDateRangeFacet().  Can someone
help me determine the proper code for doing what I want?

Further, I am puzzled about the "gap" argument [last one in
addDateRangeFacet()].  What does this do?  I used +1DAY, but I really have
no idea the purpose of this.  I haven't found any documentation that
explains this well.

Mark



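To make the distinction above concrete, a hedged SolrJ sketch (an illustration,
not code from the thread). The fq clause is what actually narrows the result
set; the facet only reports counts. The last argument of addDateRangeFacet(),
the "gap" Mark asks about, is the bucket width, so "+1DAY" means one count per
day across the faceted range.

import java.util.Date;
import org.apache.solr.client.solrj.SolrQuery;

class DateFilterAndFacetExample {
    static SolrQuery build(String searchWord) {
        SolrQuery query = new SolrQuery(searchWord);
        query.setParam("df", "logtext");
        // Filter: restrict matches to the last year (this changes the results).
        query.addFilterQuery("posttime:[NOW-1YEAR TO NOW]");
        // Facet: bucket the matching docs per day (this only adds counts).
        query.setFacet(true);
        query.addDateRangeFacet("posttime",
                new Date(System.currentTimeMillis() - 1000L * 86400L * 365L),
                new Date(), "+1DAY");
        return query;
    }
}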


searching only within a date range

2019-06-07 Thread Mark Fenbers - NOAA Federal
Hello!

I have a search setup and it works fine.  I search a text field called
"logtext" in a database table.  My Java code is like this:

SolrQuery query = new SolrQuery();
query.setQuery(searchWord);
query.setParam("df", "logtext");

Then I execute the search... and it works just great.  But now I want to
add a constraint to only search for the "searchWord" within a certain range
of time -- given timestamps in the column called "posttime".  So, I added
the code in bold below:

SolrQuery query = new SolrQuery();
query.setQuery(searchWord);
*query.setFacet(true);*
*query.addDateRangeFacet("posttime", new Date(System.currentTimeMillis() -
1000L * 86400L * 365L), new Date(System.currentTimeMillis()), "+1DAY"); /*
from 1 year ago to present */*
query.setParam("df", "logtext");

But this gives me a complaint: *undefined field: "posttime"* so I clearly
do not understand the arguments needed to addDateRangeFacet().  Can someone
help me determine the proper code for doing what I want?

Further, I am puzzled about the "gap" argument [last one in
addDateRangeFacet()].  What does this do?  I used +1DAY, but I really have
no idea the purpose of this.  I haven't found any documentation that
explains this well.

Mark


RE: Issue Searching Data from multiple Databases

2018-11-14 Thread Vadim Ivanov
Hi!
Have you tried naming the entity in the full-import HTTP call,
as in
/dataimport/?command=full-import&entity=Document1&clean=true&commit=true
Is there something sane in the log file after that command?

-- 
Vadim


> -Original Message-
> From: Santosh Kumar S [mailto:santoshkumar.saripa...@infinite.com]
> Sent: Wednesday, November 14, 2018 5:03 PM
> To: solr-user@lucene.apache.org
> Subject: Issue Searching Data from multiple Databases
> 
> I am trying to achieve search by connecting to multiple Databases (in my
case
> trying with 2 different DBs) to index data from multiple DB tables.
> I have tried doing the below as an approach to achieve my goal but in
vain,
> I am able to get only data from the DB 1 when I perform a full-import.
> Steps performed :
> 1.  Added multiple data source in the data-config.xml file as shown below
:
> <dataSource name="..." driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
> url="jdbc:sqlserver://10.10.10.10;databaseName=TestDB1;" user="TestUser"
> password="TestUser$"/>
> <dataSource name="..." driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
> url="jdbc:sqlserver://10.10.10.10;databaseName=TestDB2;" user="TestUser"
> password="TestUser$"/>
> 2. Added multiple entities against each data source, as shown below:
>
> <entity name="..." dataSource="..." transformer="RegexTransformer"
> pk="Id" query="select * from MyTestTable">
> ...
> </entity>
> <entity name="..." dataSource="..." transformer="RegexTransformer"
> pk="EmpId" query="select * from MySampleTable" >
> ...
> </entity>
>
> 3. Added appropriate fields in the managed-schema.xml file as well
> 
> <field name="..." type="..." required="true" multiValued="false" />
> <field name="..." type="..." required="false" multiValued="false" />
> <field name="..." type="..." required="false" multiValued="false" />
> <field name="..." type="..." required="false" multiValued="false" />
> 
> 4. Reloaded the collection for the changes to take effect.
> 5. Performed a full import, and observed that the data did not get
imported
> from DB2.
> 6. Did a search only to find the data from DB1 is getting fetched whereas
> data from DB2 is not at all getting fetched.
> 
> Suggestions/Guidance shall be highly appreciated.
> Please let me know in case you need any further information.
> 
> Note :  I tried connecting 2 different DBs on 2 different servers and also
2
> different DBs on the same server.
> 
> Thank you in advance!!
> 
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


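For reference, the shape of the per-entity import calls being suggested above.
The entity names Document1 and Document2 are hypothetical and must match the
name attributes in data-config.xml; host, port, and core are placeholders:

http://localhost:8983/solr/yourcore/dataimport?command=full-import&entity=Document1&clean=true&commit=true
http://localhost:8983/solr/yourcore/dataimport?command=full-import&entity=Document2&clean=false&commit=true

The clean parameter matters here: a full-import with clean=true first deletes
the existing index, so importing the second entity with clean=true would wipe
the documents just loaded for the first one. That is a common way to end up
with data from only one database.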

Issue Searching Data from multiple Databases

2018-11-14 Thread Santosh Kumar S
I am trying to achieve search by connecting to multiple Databases (in my case
trying with 2 different DBs) to index data from multiple DB tables.
I have tried doing the below as an approach to achieve my goal but in vain,
I am able to get only data from the DB 1 when I perform a full-import.
Steps performed :
1.  Added multiple data sources in the data-config.xml file as shown below:

<dataSource name="..." driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
url="jdbc:sqlserver://10.10.10.10;databaseName=TestDB1;" user="TestUser"
password="TestUser$"/>
<dataSource name="..." driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
url="jdbc:sqlserver://10.10.10.10;databaseName=TestDB2;" user="TestUser"
password="TestUser$"/>

2. Added multiple entities against each data source, as shown below:

<entity name="..." dataSource="..." transformer="RegexTransformer"
pk="Id" query="select * from MyTestTable">
...
</entity>
<entity name="..." dataSource="..." transformer="RegexTransformer"
pk="EmpId" query="select * from MySampleTable" >
...
</entity>

3. Added appropriate fields in the managed-schema.xml file as well

<field name="..." type="..." required="true" multiValued="false" />
<field name="..." type="..." required="false" multiValued="false" />
<field name="..." type="..." required="false" multiValued="false" />
<field name="..." type="..." required="false" multiValued="false" />
4. Reloaded the collection for the changes to take effect.
5. Performed a full import, and observed that the data did not get imported
from DB2.
6. Did a search only to find the data from DB1 is getting fetched whereas
data from DB2 is not at all getting fetched.

Suggestions/Guidance shall be highly appreciated.
Please let me know in case you need any further information.

Note :  I tried connecting 2 different DBs on 2 different servers and also 2
different DBs on the same server.

Thank you in advance!!




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: searching is slow while adding document each time

2018-10-28 Thread Erick Erickson
bq. Do you really think running a profiler on 4.4 will be more
effective than upgrading to 7.x?

No, but it's better than random speculation.
On Sun, Oct 28, 2018 at 9:34 PM Deepak Goel  wrote:
>
> What are your hardware utilisations (cpu, memory, disk, network)?
>
> I think you might have to tune lucene too
>
> On Wed, 26 Sep 2018, 14:33 Mugeesh Husain,  wrote:
>
> > Hi,
> >
> > We are running a 3-node Solr Cloud (4.4) in our production infrastructure. We
> > recently moved our Solr server host from SoftLayer to a DigitalOcean server with
> > the same configuration as production.
> >
> > Now we are facing some slowness in the searcher when we index documents:
> > when
> > we stop indexing, searching is fine, but while adding documents it
> > becomes
> > slow. We index on one Solr server and use the other 2 for serving search requests.
> >
> >
> > I am just wondering why searches become slow while indexing,
> > even though we are using the same configuration as we had in prod?
> >
> > We are pushing 500 documents at a time, and this processing is
> > continuously running (adding & deleting).
> >
> > these are the indexing logs
> >
> > 65497339 [http-apr-8980-exec-45] INFO
> > org.apache.solr.update.processor.LogUpdateProcessor  – [rn0] webapp=/solr
> > path=/update
> > params={distrib.from=
> > http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe
> > }
> > {add=[E4751FCCE977BAC7 (1612655281518411776), 8E712AD1BE76AB63
> > (1612655281527848960), 789AA5D0FB149A37 (1612655281538334720),
> > B4F3AA526506F6B7 (1612655281553014784), A9F29F556F6CD1C8
> > (1612655281566646272), 8D15813305BF7417 (1612655281584472064),
> > DD13CFA12973E85B (1612655281596006400), 3C93BDBA5DFDE3B3
> > (1612655281613832192), 96981A0785BFC9BF (1612655281625366528),
> > D1E52788A466E484 (1612655281636900864)]} 0 9
> > 65497459 [http-apr-8980-exec-22] INFO
> > org.apache.solr.update.processor.LogUpdateProcessor  – [rn0] webapp=/solr
> > path=/update
> > params={distrib.from=
> > http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe
> > }
> > {add=[D8AA2E196967D241 (1612655281649483776), E73420772E3235B7
> > (1612655281666260992), DFDCF1F8325A3EF6 (1612655281680941056),
> > 1B10EF90E7C3695F (1612655281689329664), 51CBD7F59644A718
> > (1612655281699815424), 1D31EF403AF13E04 (1612655281714495488),
> > 68E1DC3A614B7269 (1612655281723932672), F9BF6A3CF89D74FB
> > (1612655281737564160), 419E017E1F360EB6 (1612655281749098496),
> > 50EF977E5E873065 (1612655281759584256)]} 0 9
> > 65497572 [http-apr-8980-exec-40] INFO
> > org.apache.solr.update.processor.LogUpdateProcessor  – [rn0] webapp=/solr
> > path=/update
> > params={distrib.from=
> > http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe
> > }
> > {add=[B63AD0671A5E57B9 (1612655281772167168), 00B8A4CCFABFA1AC
> > (1612655281784750080), 9C89A1516C9166E6 (1612655281798381568),
> > 9322E17ECEAADE66 (1612655281803624448), C6DDB4BF8E94DE6B
> > (1612655281814110208), DAA49178A5E74285 (1612655281830887424),
> > 829C2AE38A3E78E4 (1612655281845567488), 4C7B19756D8E4208
> > (1612655281859198976), BE0F7354DC30164C (1612655281869684736),
> > 59C4A764BB50B13B (1612655281880170496)]} 0 9
> > 65497724 [http-apr-8980-exec-31] INFO
> > org.apache.solr.update.processor.LogUpdateProcessor  – [rn0] webapp=/solr
> > path=/update
> > params={distrib.from=
> > http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe
> > }
> > {add=[1F694F99367D7CE1 (1612655281895899136), 2AEAAF67A6893ABE
> > (1612655281911627776), 81E72DC36C7A9EBC (1612655281926307840),
> > AA71BD9B23548E6D (1612655281939939328), 359E8C4C6EC72AFA
> > (1612655281954619392), 7FEB6C65A3E23311 (1612655281972445184),
> > 9B5ED0BE7AFDD1D0 (1612655281991319552), 99FE8958F6ED8B91
> > (1612655282009145344), 2BDC61DC4038E19F (1612655282023825408),
> > 5131AEC4B87FBFE9 (1612655282037456896)]} 0 10
> >
> >
> >
> >
> > --
> > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> >


Re: searching is slow while adding document each time

2018-10-28 Thread Deepak Goel
What are your hardware utilisations (cpu, memory, disk, network)?

I think you might have to tune lucene too

On Wed, 26 Sep 2018, 14:33 Mugeesh Husain,  wrote:

> Hi,
>
> We are running a 3-node Solr Cloud (4.4) in our production infrastructure. We
> recently moved our Solr server host from SoftLayer to a DigitalOcean server with
> the same configuration as production.
>
> Now we are facing some slowness in the searcher when we index documents:
> when
> we stop indexing, searching is fine, but while adding documents it
> becomes
> slow. We index on one Solr server and use the other 2 for serving search requests.
>
>
> I am just wondering why searches become slow while indexing,
> even though we are using the same configuration as we had in prod?
>
> We are pushing 500 documents at a time, and this processing is
> continuously running (adding & deleting).
>
> these are the indexing logs
>
> 65497339 [http-apr-8980-exec-45] INFO
> org.apache.solr.update.processor.LogUpdateProcessor  – [rn0] webapp=/solr
> path=/update
> params={distrib.from=
> http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe
> }
> {add=[E4751FCCE977BAC7 (1612655281518411776), 8E712AD1BE76AB63
> (1612655281527848960), 789AA5D0FB149A37 (1612655281538334720),
> B4F3AA526506F6B7 (1612655281553014784), A9F29F556F6CD1C8
> (1612655281566646272), 8D15813305BF7417 (1612655281584472064),
> DD13CFA12973E85B (1612655281596006400), 3C93BDBA5DFDE3B3
> (1612655281613832192), 96981A0785BFC9BF (1612655281625366528),
> D1E52788A466E484 (1612655281636900864)]} 0 9
> 65497459 [http-apr-8980-exec-22] INFO
> org.apache.solr.update.processor.LogUpdateProcessor  – [rn0] webapp=/solr
> path=/update
> params={distrib.from=
> http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe
> }
> {add=[D8AA2E196967D241 (1612655281649483776), E73420772E3235B7
> (1612655281666260992), DFDCF1F8325A3EF6 (1612655281680941056),
> 1B10EF90E7C3695F (1612655281689329664), 51CBD7F59644A718
> (1612655281699815424), 1D31EF403AF13E04 (1612655281714495488),
> 68E1DC3A614B7269 (1612655281723932672), F9BF6A3CF89D74FB
> (1612655281737564160), 419E017E1F360EB6 (1612655281749098496),
> 50EF977E5E873065 (1612655281759584256)]} 0 9
> 65497572 [http-apr-8980-exec-40] INFO
> org.apache.solr.update.processor.LogUpdateProcessor  – [rn0] webapp=/solr
> path=/update
> params={distrib.from=
> http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe
> }
> {add=[B63AD0671A5E57B9 (1612655281772167168), 00B8A4CCFABFA1AC
> (1612655281784750080), 9C89A1516C9166E6 (1612655281798381568),
> 9322E17ECEAADE66 (1612655281803624448), C6DDB4BF8E94DE6B
> (1612655281814110208), DAA49178A5E74285 (1612655281830887424),
> 829C2AE38A3E78E4 (1612655281845567488), 4C7B19756D8E4208
> (1612655281859198976), BE0F7354DC30164C (1612655281869684736),
> 59C4A764BB50B13B (1612655281880170496)]} 0 9
> 65497724 [http-apr-8980-exec-31] INFO
> org.apache.solr.update.processor.LogUpdateProcessor  – [rn0] webapp=/solr
> path=/update
> params={distrib.from=
> http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe
> }
> {add=[1F694F99367D7CE1 (1612655281895899136), 2AEAAF67A6893ABE
> (1612655281911627776), 81E72DC36C7A9EBC (1612655281926307840),
> AA71BD9B23548E6D (1612655281939939328), 359E8C4C6EC72AFA
> (1612655281954619392), 7FEB6C65A3E23311 (1612655281972445184),
> 9B5ED0BE7AFDD1D0 (1612655281991319552), 99FE8958F6ED8B91
> (1612655282009145344), 2BDC61DC4038E19F (1612655282023825408),
> 5131AEC4B87FBFE9 (1612655282037456896)]} 0 10
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: searching is slow while adding document each time

2018-10-28 Thread Walter Underwood
>>>>>>>> all the time. Have you tried running "optimize" periodically. Is it
>>>>>>>> something that you can afford to run? If you have a Master-Slave setup
>>>>>>> for
>>>>>>>> Indexer v/s searchers, you can replicate on optimize in the Master,
>>>>>>> thereby
>>>>>>>> removing the optimize load on the searchers, but replicate to the
>>>>>>> searcher
>>>>>>>> periodically. That might help with reducing latency. Optimize merges
>>>>>>>> segments and hence creates a more compact index that is faster to
>>>> search.
>>>>>>>> It may involve some higher latency temporarily right after the
>>>>>>> replication,
>>>>>>>> but will go away soon after in-memory caches are full.
>>>>>>>> 
>>>>>>>> What is the search count/sec you are seeing?
>>>>>>>> 
>>>>>>>> Regards
>>>>>>>> Parag
>>>>>>>> 
>>>>>>>> On Wed, Sep 26, 2018 at 2:02 AM Mugeesh Husain 
>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> We are running a 3-node Solr Cloud (4.4) in our production
>>>> infrastructure.
>>>>>>> We
>>>>>>>>> recently moved our Solr server host from SoftLayer to a DigitalOcean server
>>>>>>> with
>>>>>>>>> the same configuration as production.
>>>>>>>>> 
>>>>>>>>> Now we are facing some slowness in the searcher when we index
>>>> documents:
>>>>>>>>> when
>>>>>>>>> we stop indexing, searching is fine, but while adding documents it
>>>>>>>>> becomes
>>>>>>>>> slow. We index on one Solr server and use the other 2 for serving search
>>>>>>> requests.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> I am just wondering why searches become slow while
>>>>>>> indexing,
>>>>>>>>> even though we are using the same configuration as we had in prod?
>>>>>>>>> 
>>>>>>>>> We are pushing 500 documents at a time, and this processing is
>>>>>>>>> continuously running (adding & deleting).
>>>>>>>>> 
>>>>>>>>> these are the indexing logs
>>>>>>>>> 
>>>>>>>>> 65497339 [http-apr-8980-exec-45] INFO
>>>>>>>>> org.apache.solr.update.processor.LogUpdateProcessor  – [rn0]
>>>>>>> webapp=/solr
>>>>>>>>> path=/update
>>>>>>>>> params={distrib.from=
>>>>>>>>> 
>>>>>>> 
>>>> http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe
>>>>>>>>> }
>>>>>>>>> {add=[E4751FCCE977BAC7 (1612655281518411776), 8E712AD1BE76AB63
>>>>>>>>> (1612655281527848960), 789AA5D0FB149A37 (1612655281538334720),
>>>>>>>>> B4F3AA526506F6B7 (1612655281553014784), A9F29F556F6CD1C8
>>>>>>>>> (1612655281566646272), 8D15813305BF7417 (1612655281584472064),
>>>>>>>>> DD13CFA12973E85B (1612655281596006400), 3C93BDBA5DFDE3B3
>>>>>>>>> (1612655281613832192), 96981A0785BFC9BF (1612655281625366528),
>>>>>>>>> D1E52788A466E484 (1612655281636900864)]} 0 9
>>>>>>>>> 65497459 [http-apr-8980-exec-22] INFO
>>>>>>>>> org.apache.solr.update.processor.LogUpdateProcessor  – [rn0]
>>>>>>> webapp=/solr
>>>>>>>>> path=/update
>>>>>>>>> params={distrib.from=
>>>>>>>>> 
>>>>>>> 
>>>> http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe
>>>>>>>>> }
>>>>>>>>> {add=[D8AA2E196967D241 (1612655281649483776), E73420772E3235B7
>>>>>>>>> (1612655281666260992), DFDCF1F8325A3EF6 (1612655281680941056),
>>>>>>>>> 1B10EF90E7C3695F (1612655281689329664), 51CBD7F59644A718
>>>>>>>>> (1612655281699815424), 1D31EF403AF13E04 (1612655281714495488),
>>>>>>>>> 68E1DC3A614B7269 (1612655281723932672), F9BF6A3CF89D74FB
>>>>>>>>> (1612655281737564160), 419E017E1F360EB6 (1612655281749098496),
>>>>>>>>> 50EF977E5E873065 (1612655281759584256)]} 0 9
>>>>>>>>> 65497572 [http-apr-8980-exec-40] INFO
>>>>>>>>> org.apache.solr.update.processor.LogUpdateProcessor  – [rn0]
>>>>>>> webapp=/solr
>>>>>>>>> path=/update
>>>>>>>>> params={distrib.from=
>>>>>>>>> 
>>>>>>> 
>>>> http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe
>>>>>>>>> }
>>>>>>>>> {add=[B63AD0671A5E57B9 (1612655281772167168), 00B8A4CCFABFA1AC
>>>>>>>>> (1612655281784750080), 9C89A1516C9166E6 (1612655281798381568),
>>>>>>>>> 9322E17ECEAADE66 (1612655281803624448), C6DDB4BF8E94DE6B
>>>>>>>>> (1612655281814110208), DAA49178A5E74285 (1612655281830887424),
>>>>>>>>> 829C2AE38A3E78E4 (1612655281845567488), 4C7B19756D8E4208
>>>>>>>>> (1612655281859198976), BE0F7354DC30164C (1612655281869684736),
>>>>>>>>> 59C4A764BB50B13B (1612655281880170496)]} 0 9
>>>>>>>>> 65497724 [http-apr-8980-exec-31] INFO
>>>>>>>>> org.apache.solr.update.processor.LogUpdateProcessor  – [rn0]
>>>>>>> webapp=/solr
>>>>>>>>> path=/update
>>>>>>>>> params={distrib.from=
>>>>>>>>> 
>>>>>>> 
>>>> http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe
>>>>>>>>> }
>>>>>>>>> {add=[1F694F99367D7CE1 (1612655281895899136), 2AEAAF67A6893ABE
>>>>>>>>> (1612655281911627776), 81E72DC36C7A9EBC (1612655281926307840),
>>>>>>>>> AA71BD9B23548E6D (1612655281939939328), 359E8C4C6EC72AFA
>>>>>>>>> (1612655281954619392), 7FEB6C65A3E23311 (1612655281972445184),
>>>>>>>>> 9B5ED0BE7AFDD1D0 (1612655281991319552), 99FE8958F6ED8B91
>>>>>>>>> (1612655282009145344), 2BDC61DC4038E19F (1612655282023825408),
>>>>>>>>> 5131AEC4B87FBFE9 (1612655282037456896)]} 0 10
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>> 
>>>> 
>> 



Re: searching is slow while adding document each time

2018-10-28 Thread Erick Erickson
Put a profiler on it and see where the hot spots are?
On Sun, Oct 28, 2018 at 8:27 PM Walter Underwood  wrote:
>
> Upgrade, so that indexing isn’t using as much CPU. That leaves more CPU for 
> search.
>
> Make sure you are on a recent release of Java. Run the G1 collector.
>
> If you need more throughput, add more replicas or use instances with more CPUs.
>
> Has the index gotten bigger since the move?
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Oct 28, 2018, at 8:21 PM, Parag Shah  wrote:
> >
> > The original question though is about a performance issue in the Searcher.
> > How would you improve that?
> >
> > On Sun, Oct 28, 2018 at 4:37 PM Walter Underwood 
> > wrote:
> >
> >> The original question is for a three-node Solr Cloud cluster with
> >> continuous updates.
> >> Optimize in this configuration won’t help, it will just cause expensive
> >> merges later.
> >>
> >> I would recommend updating from Solr 4.4; that is a very early release for
> >> Solr Cloud. We saw dramatic speedups in indexing with 6.x. In early
> >> releases, the
> >> replicas actually did more indexing work than the leader.
> >>
> >> wunder
> >> Walter Underwood
> >> wun...@wunderwood.org
> >> http://observer.wunderwood.org/  (my blog)
> >>
> >>> On Oct 28, 2018, at 2:13 PM, Erick Erickson 
> >> wrote:
> >>>
> >>> Well, if you optimize on the master you'll inevitably copy the entire
> >>> index to each of the slaves. Consuming that much network bandwidth can
> >>> be A Bad Thing.
> >>>
> >>> Here's the background for Walter's comment:
> >>>
> >> https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/
> >>>
> >>> Solr 7.5 is much better about this:
> >>>
> >> https://lucidworks.com/2018/06/20/solr-and-optimizing-your-index-take-ii/
> >>>
> >>> Even with the improvements in Solr 7.5, optimize is still a very
> >>> expensive operation and unless you've measured and can _prove_ it's
> >>> beneficial enough to be worth the cost you should avoid it.
> >>>
> >>> Best,
> >>> Erick
> >>> On Sun, Oct 28, 2018 at 1:51 PM Parag Shah 
> >> wrote:
> >>>>
> >>>> What would you do if your performance is degrading?
> >>>>
> >>>> I am not suggesting doing this for a serving index. Only one at the
> >> Master,
> >>>> which ones optimized gets replicated. Am I missing something here?
> >>>>
> >>>> On Sun, Oct 28, 2018 at 11:05 AM Walter Underwood <
> >> wun...@wunderwood.org>
> >>>> wrote:
> >>>>
> >>>>> Do not run optimize (force merge) unless you really understand the
> >>>>> downside.
> >>>>>
> >>>>> If you are continually adding and deleting documents, you really do not
> >>>>> want
> >>>>> to run optimize.
> >>>>>
> >>>>> wunder
> >>>>> Walter Underwood
> >>>>> wun...@wunderwood.org
> >>>>> http://observer.wunderwood.org/  (my blog)
> >>>>>
> >>>>>> On Oct 28, 2018, at 9:24 AM, Parag Shah 
> >> wrote:
> >>>>>>
> >>>>>> Hi Mugeesh,
> >>>>>>
> >>>>>>  Have you tried optimizing indexes to see if performance improves? It
> >>>>> is
> >>>>>> well known that over time as indexing goes on lucene creates more
> >>>>> segments
> >>>>>> which will be  searched over and hence take longer. Merging happens
> >>>>>> constantly but continuous indexing will still introduce smaller
> >> segments
> >>>>>> all the time. Have you tried running "optimize" periodically. Is it
> >>>>>> something that you can afford to run? If you have a Master-Slave setup
> >>>>> for
> >>>>>> Indexer v/s searchers, you can replicate on optimize in the Master,
> >>>>> thereby
> >>>>>> removing the optimize load on the searchers, but replicate to the
> >>>>> searcher
> >>>>>> periodically. That might help with reducing latency. Optimize merges
> >
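
For what it's worth, one concrete way to do what Erick suggests, assuming a JDK
with Flight Recorder available (JDK 11+, or an Oracle JDK with commercial
features unlocked); the pid and file name below are placeholders:

jcmd <solr-pid> JFR.start duration=120s filename=/tmp/solr-under-load.jfr

Record while indexing and querying are both running, then open the file in Java
Mission Control and compare the hot methods and GC pauses with and without the
indexing load.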

Re: searching is slow while adding document each time

2018-10-28 Thread Walter Underwood
Upgrade, so that indexing isn’t using as much CPU. That leaves more CPU for 
search.

Make sure you are on a recent release of Java. Run the G1 collector.

If you need more throughput, add more replicas or use instances with more CPUs.

Has the index gotten bigger since the move?

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 28, 2018, at 8:21 PM, Parag Shah  wrote:
> 
> The original question though is about a performance issue in the Searcher.
> How would you improve that?
> 
> On Sun, Oct 28, 2018 at 4:37 PM Walter Underwood 
> wrote:
> 
>> The original question is for a three-node Solr Cloud cluster with
>> continuous updates.
>> Optimize in this configuration won’t help, it will just cause expensive
>> merges later.
>> 
>> I would recommend updating from Solr 4.4; that is a very early release for
>> Solr Cloud. We saw dramatic speedups in indexing with 6.x. In early
>> releases, the
>> replicas actually did more indexing work than the leader.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On Oct 28, 2018, at 2:13 PM, Erick Erickson 
>> wrote:
>>> 
>>> Well, if you optimize on the master you'll inevitably copy the entire
>>> index to each of the slaves. Consuming that much network bandwidth can
>>> be A Bad Thing.
>>> 
>>> Here's the background for Walter's comment:
>>> 
>> https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/
>>> 
>>> Solr 7.5 is much better about this:
>>> 
>> https://lucidworks.com/2018/06/20/solr-and-optimizing-your-index-take-ii/
>>> 
>>> Even with the improvements in Solr 7.5, optimize is still a very
>>> expensive operation and unless you've measured and can _prove_ it's
>>> beneficial enough to be worth the cost you should avoid it.
>>> 
>>> Best,
>>> Erick
>>> On Sun, Oct 28, 2018 at 1:51 PM Parag Shah 
>> wrote:
>>>> 
>>>> What would you do if your performance is degrading?
>>>> 
>>>> I am not suggesting doing this for a serving index. Only one at the
>> Master,
>>>> which, once optimized, gets replicated. Am I missing something here?
>>>> 
>>>> On Sun, Oct 28, 2018 at 11:05 AM Walter Underwood <
>> wun...@wunderwood.org>
>>>> wrote:
>>>> 
>>>>> Do not run optimize (force merge) unless you really understand the
>>>>> downside.
>>>>> 
>>>>> If you are continually adding and deleting documents, you really do not
>>>>> want
>>>>> to run optimize.
>>>>> 
>>>>> wunder
>>>>> Walter Underwood
>>>>> wun...@wunderwood.org
>>>>> http://observer.wunderwood.org/  (my blog)
>>>>> 
>>>>>> On Oct 28, 2018, at 9:24 AM, Parag Shah 
>> wrote:
>>>>>> 
>>>>>> Hi Mugeesh,
>>>>>> 
>>>>>>  Have you tried optimizing indexes to see if performance improves? It
>>>>> is
>>>>>> well known that over time as indexing goes on lucene creates more
>>>>> segments
>>>>>> which will be  searched over and hence take longer. Merging happens
>>>>>> constantly but continuous indexing will still introduce smaller
>> segments
>>>>>> all the time. Have you tried running "optimize" periodically. Is it
>>>>>> something that you can afford to run? If you have a Master-Slave setup
>>>>> for
>>>>>> Indexer v/s searchers, you can replicate on optimize in the Master,
>>>>> thereby
>>>>>> removing the optimize load on the searchers, but replicate to the
>>>>> searcher
>>>>>> periodically. That might help with reducing latency. Optimize merges
>>>>>> segments and hence creates a more compact index that is faster to
>> search.
>>>>>> It may involve some higher latency temporarily right after the
>>>>> replication,
>>>>>> but will go away soon after in-memory caches are full.
>>>>>> 
>>>>>>  What is the search count/sec you are seeing?
>>>>>> 
>>>>>> Regards
>>>>>> Parag
>>>>>> 
>>>>>> On Wed, Sep 26, 2018 at 2:02 AM Mugeesh Husain 
>>>>> wrote:
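
For reference, a sketch of the G1 suggestion above. On a bin/solr install
(Solr 5 and later) the usual place is the GC_TUNE variable in solr.in.sh; on a
4.x start.jar launch the flag goes directly on the java command line. The
values below are illustrative, not recommendations for any particular heap:

GC_TUNE="-XX:+UseG1GC -XX:MaxGCPauseMillis=250"

java -XX:+UseG1GC -Xms2g -Xmx2g -jar start.jar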

Re: searching is slow while adding document each time

2018-10-28 Thread Parag Shah
The original question though is about a performance issue in the Searcher.
How would you improve that?

On Sun, Oct 28, 2018 at 4:37 PM Walter Underwood 
wrote:

> The original question is for a three-node Solr Cloud cluster with
> continuous updates.
> Optimize in this configuration won’t help, it will just cause expensive
> merges later.
>
> I would recommend updating from Solr 4.4; that is a very early release for
> Solr Cloud. We saw dramatic speedups in indexing with 6.x. In early
> releases, the
> replicas actually did more indexing work than the leader.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Oct 28, 2018, at 2:13 PM, Erick Erickson 
> wrote:
> >
> > Well, if you optimize on the master you'll inevitably copy the entire
> > index to each of the slaves. Consuming that much network bandwidth can
> > be A Bad Thing.
> >
> > Here's the background for Walter's comment:
> >
> https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/
> >
> > Solr 7.5 is much better about this:
> >
> https://lucidworks.com/2018/06/20/solr-and-optimizing-your-index-take-ii/
> >
> > Even with the improvements in Solr 7.5, optimize is still a very
> > expensive operation and unless you've measured and can _prove_ it's
> > beneficial enough to be worth the cost you should avoid it.
> >
> > Best,
> > Erick
> > On Sun, Oct 28, 2018 at 1:51 PM Parag Shah 
> wrote:
> >>
> >> What would you do if your performance is degrading?
> >>
> >> I am not suggesting doing this for a serving index. Only one at the
> Master,
> >> which, once optimized, gets replicated. Am I missing something here?
> >>
> >> On Sun, Oct 28, 2018 at 11:05 AM Walter Underwood <
> wun...@wunderwood.org>
> >> wrote:
> >>
> >>> Do not run optimize (force merge) unless you really understand the
> >>> downside.
> >>>
> >>> If you are continually adding and deleting documents, you really do not
> >>> want
> >>> to run optimize.
> >>>
> >>> wunder
> >>> Walter Underwood
> >>> wun...@wunderwood.org
> >>> http://observer.wunderwood.org/  (my blog)
> >>>
> >>>> On Oct 28, 2018, at 9:24 AM, Parag Shah 
> wrote:
> >>>>
> >>>> Hi Mugeesh,
> >>>>
> >>>>   Have you tried optimizing indexes to see if performance improves? It
> >>> is
> >>>> well known that over time as indexing goes on lucene creates more
> >>> segments
> >>>> which will be  searched over and hence take longer. Merging happens
> >>>> constantly but continuous indexing will still introduce smaller
> segments
> >>>> all the time. Have you tried running "optimize" periodically. Is it
> >>>> something that you can afford to run? If you have a Master-Slave setup
> >>> for
> >>>> Indexer v/s searchers, you can replicate on optimize in the Master,
> >>> thereby
> >>>> removing the optimize load on the searchers, but replicate to the
> >>> searcher
> >>>> periodically. That might help with reducing latency. Optimize merges
> >>>> segments and hence creates a more compact index that is faster to
> search.
> >>>> It may involve some higher latency temporarily right after the
> >>> replication,
> >>>> but will go away soon after in-memory caches are full.
> >>>>
> >>>>   What is the search count/sec you are seeing?
> >>>>
> >>>> Regards
> >>>> Parag
> >>>>
> >>>> On Wed, Sep 26, 2018 at 2:02 AM Mugeesh Husain 
> >>> wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> We are running a 3-node Solr Cloud (4.4) in our production infrastructure.
> >>> We
> >>>>> recently moved our Solr server host from SoftLayer to a DigitalOcean server
> >>> with
> >>>>> the same configuration as production.
> >>>>> same configuration as production.
> >>>>>
> >>>>> Now we are facing some slowness in the searcher when we index
> document,
> >>>>> when
> >>>>> we stop indexing then searches is fine, while adding document then it
> >>>>> become
> >>>>> slow. one of solr server we are indexing other 2 for searching the
> >

Re: searching is slow while adding document each time

2018-10-28 Thread Walter Underwood
The original question is for a three-node Solr Cloud cluster with continuous 
updates.
Optimize in this configuration won’t help, it will just cause expensive merges 
later.

I would recommend updating from Solr 4.4; that is a very early release for
Solr Cloud. We saw dramatic speedups in indexing with 6.x. In early releases, 
the
replicas actually did more indexing work than the leader.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 28, 2018, at 2:13 PM, Erick Erickson  wrote:
> 
> Well, if you optimize on the master you'll inevitably copy the entire
> index to each of the slaves. Consuming that much network bandwidth can
> be A Bad Thing.
> 
> Here's the background for Walter's comment:
> https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/
> 
> Solr 7.5 is much better about this:
> https://lucidworks.com/2018/06/20/solr-and-optimizing-your-index-take-ii/
> 
> Even with the improvements in Solr 7.5, optimize is still a very
> expensive operation and unless you've measured and can _prove_ it's
> beneficial enough to be worth the cost you should avoid it.
> 
> Best,
> Erick
> On Sun, Oct 28, 2018 at 1:51 PM Parag Shah  wrote:
>> 
>> What would you do if your performance is degrading?
>> 
>> I am not suggesting doing this for a serving index. Only one at the Master,
>> which, once optimized, gets replicated. Am I missing something here?
>> 
>> On Sun, Oct 28, 2018 at 11:05 AM Walter Underwood 
>> wrote:
>> 
>>> Do not run optimize (force merge) unless you really understand the
>>> downside.
>>> 
>>> If you are continually adding and deleting documents, you really do not
>>> want
>>> to run optimize.
>>> 
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>> 
>>>> On Oct 28, 2018, at 9:24 AM, Parag Shah  wrote:
>>>> 
>>>> Hi Mugeesh,
>>>> 
>>>>   Have you tried optimizing indexes to see if performance improves? It
>>> is
>>>> well known that over time as indexing goes on lucene creates more
>>> segments
>>>> which will be  searched over and hence take longer. Merging happens
>>>> constantly but continuous indexing will still introduce smaller segments
>>>> all the time. Have you tried running "optimize" periodically. Is it
>>>> something that you can afford to run? If you have a Master-Slave setup
>>> for
>>>> Indexer v/s searchers, you can replicate on optimize in the Master,
>>> thereby
>>>> removing the optimize load on the searchers, but replicate to the
>>> searcher
>>>> periodically. That might help with reducing latency. Optimize merges
>>>> segments and hence creates a more compact index that is faster to search.
>>>> It may involve some higher latency temporarily right after the
>>> replication,
>>>> but will go away soon after in-memory caches are full.
>>>> 
>>>>   What is the search count/sec you are seeing?
>>>> 
>>>> Regards
>>>> Parag
>>>> 
>>>> On Wed, Sep 26, 2018 at 2:02 AM Mugeesh Husain 
>>> wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> We are running a 3-node Solr Cloud (4.4) in our production infrastructure.
>>> We
>>>>> recently moved our Solr server host from SoftLayer to a DigitalOcean server
>>> with
>>>>> the same configuration as production.
>>>>> 
>>>>> Now we are facing some slowness in the searcher when we index documents:
>>>>> when
>>>>> we stop indexing, searching is fine, but while adding documents it
>>>>> becomes
>>>>> slow. We index on one Solr server and use the other 2 for serving search
>>> requests.
>>>>> 
>>>>> 
>>>>> I am just wondering why searches become slow while
>>> indexing,
>>>>> even though we are using the same configuration as we had in prod?
>>>>> 
>>>>> We are pushing 500 documents at a time, and this processing is
>>>>> continuously running (adding & deleting).
>>>>> 
>>>>> these are the indexing logs
>>>>> 
>>>>> 65497339 [http-apr-8980-exec-45] INFO
>>>>> org.apache.solr.update.processor.LogUpdateProcessor  – [rn0]
>>> webapp=/solr
>>>>> path=/update
>

Re: searching is slow while adding document each time

2018-10-28 Thread Erick Erickson
Well, if you optimize on the master you'll inevitably copy the entire
index to each of the slaves. Consuming that much network bandwidth can
be A Bad Thing.

Here's the background for Walter's comment:
https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/

Solr 7.5 is much better about this:
https://lucidworks.com/2018/06/20/solr-and-optimizing-your-index-take-ii/

Even with the improvements in Solr 7.5, optimize is still a very
expensive operation and unless you've measured and can _prove_ it's
beneficial enough to be worth the cost you should avoid it.

Best,
Erick
On Sun, Oct 28, 2018 at 1:51 PM Parag Shah  wrote:
>
> What would you do if your performance is degrading?
>
> I am not suggesting doing this for a serving index. Only one at the Master,
> which, once optimized, gets replicated. Am I missing something here?
>
> On Sun, Oct 28, 2018 at 11:05 AM Walter Underwood 
> wrote:
>
> > Do not run optimize (force merge) unless you really understand the
> > downside.
> >
> > If you are continually adding and deleting documents, you really do not
> > want
> > to run optimize.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> > > On Oct 28, 2018, at 9:24 AM, Parag Shah  wrote:
> > >
> > > Hi Mugeesh,
> > >
> > >Have you tried optimizing indexes to see if performance improves? It
> > is
> > > well known that over time as indexing goes on lucene creates more
> > segments
> > > which will be  searched over and hence take longer. Merging happens
> > > constantly but continuous indexing will still introduce smaller segments
> > > all the time. Have you tried running "optimize" periodically. Is it
> > > something that you can afford to run? If you have a Master-Slave setup
> > for
> > > Indexer v/s searchers, you can replicate on optimize in the Master,
> > thereby
> > > removing the optimize load on the searchers, but replicate to the
> > searcher
> > > periodically. That might help with reducing latency. Optimize merges
> > > segments and hence creates a more compact index that is faster to search.
> > > It may involve some higher latency temporarily right after the
> > replication,
> > > but will go away soon after in-memory caches are full.
> > >
> > >What is the search count/sec you are seeing?
> > >
> > > Regards
> > > Parag
> > >
> > > On Wed, Sep 26, 2018 at 2:02 AM Mugeesh Husain 
> > wrote:
> > >
> > >> Hi,
> > >>
> > >> We are running 3 node solr cloud(4.4) in our production infrastructure,
> > We
> > >> recently moved our SOLR server host softlayer to digital ocean server
> > with
> > >> same configuration as production.
> > >>
> > >> Now we are facing some slowness in the searcher when we index document,
> > >> when
> > >> we stop indexing then searches is fine, while adding document then it
> > >> become
> > >> slow. one of solr server we are indexing other 2 for searching the
> > request.
> > >>
> > >>
> > >> I am just wondering what was the reason searches become slow while
> > indexing
> > >> even we are using same configuration as we had in prod?
> > >>
> > >> at the time we are pushing 500 document at a time, this processing is
> > >> continuously running(adding & deleting)
> > >>
> > >> these are the indexing logs
> > >>
> > >> 65497339 [http-apr-8980-exec-45] INFO
> > >> org.apache.solr.update.processor.LogUpdateProcessor  – [rn0]
> > webapp=/solr
> > >> path=/update
> > >> params={distrib.from=
> > >>
> > http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe
> > >> }
> > >> {add=[E4751FCCE977BAC7 (1612655281518411776), 8E712AD1BE76AB63
> > >> (1612655281527848960), 789AA5D0FB149A37 (1612655281538334720),
> > >> B4F3AA526506F6B7 (1612655281553014784), A9F29F556F6CD1C8
> > >> (1612655281566646272), 8D15813305BF7417 (1612655281584472064),
> > >> DD13CFA12973E85B (1612655281596006400), 3C93BDBA5DFDE3B3
> > >> (1612655281613832192), 96981A0785BFC9BF (1612655281625366528),
> > >> D1E52788A466E484 (1612655281636900864)]} 0 9
> > >> 65497459 [http-apr-8980-exec-22] INFO
> > >> org.apache.solr.update.processor.LogUpdateProcessor  – [rn0]
> > weba

Re: searching is slow while adding document each time

2018-10-28 Thread Parag Shah
What would you do if your performance is degrading?

I am not suggesting doing this for a serving index, only on the Master,
which, once optimized, gets replicated. Am I missing something here?

On Sun, Oct 28, 2018 at 11:05 AM Walter Underwood 
wrote:

> Do not run optimize (force merge) unless you really understand the
> downside.
>
> If you are continually adding and deleting documents, you really do not
> want
> to run optimize.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Oct 28, 2018, at 9:24 AM, Parag Shah  wrote:
> >
> > Hi Mugeesh,
> >
> >Have you tried optimizing indexes to see if performance improves? It
> is
> > well known that over time as indexing goes on lucene creates more
> segments
> > which will be  searched over and hence take longer. Merging happens
> > constantly but continuous indexing will still introduce smaller segments
> > all the time. Have you tried running "optimize" periodically? Is it
> > something that you can afford to run? If you have a Master-Slave setup
> for
> > Indexer v/s searchers, you can replicate on optimize in the Master,
> thereby
> > removing the optimize load on the searchers, but replicate to the
> searcher
> > periodically. That might help with reducing latency. Optimize merges
> > segments and hence creates a more compact index that is faster to search.
> > It may involve some higher latency temporarily right after the
> replication,
> > but will go away soon after in-memory caches are full.
> >
> >What is the search count/sec you are seeing?
> >
> > Regards
> > Parag
> >
> > On Wed, Sep 26, 2018 at 2:02 AM Mugeesh Husain 
> wrote:
> >
> >> Hi,
> >>
> >> We are running 3 node solr cloud(4.4) in our production infrastructure,
> We
> >> recently moved our SOLR server host softlayer to digital ocean server
> with
> >> same configuration as production.
> >>
> >> Now we are facing some slowness in the searcher when we index document,
> >> when
> >> we stop indexing then searches is fine, while adding document then it
> >> become
> >> slow. one of solr server we are indexing other 2 for searching the
> request.
> >>
> >>
> >> I am just wondering what was the reason searches become slow while
> indexing
> >> even we are using same configuration as we had in prod?
> >>
> >> at the time we are pushing 500 document at a time, this processing is
> >> continuously running(adding & deleting)
> >>
> >> these are the indexing logs
> >>
> >> 65497339 [http-apr-8980-exec-45] INFO
> >> org.apache.solr.update.processor.LogUpdateProcessor  – [rn0]
> webapp=/solr
> >> path=/update
> >> params={distrib.from=
> >>
> http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe
> >> }
> >> {add=[E4751FCCE977BAC7 (1612655281518411776), 8E712AD1BE76AB63
> >> (1612655281527848960), 789AA5D0FB149A37 (1612655281538334720),
> >> B4F3AA526506F6B7 (1612655281553014784), A9F29F556F6CD1C8
> >> (1612655281566646272), 8D15813305BF7417 (1612655281584472064),
> >> DD13CFA12973E85B (1612655281596006400), 3C93BDBA5DFDE3B3
> >> (1612655281613832192), 96981A0785BFC9BF (1612655281625366528),
> >> D1E52788A466E484 (1612655281636900864)]} 0 9
> >> 65497459 [http-apr-8980-exec-22] INFO
> >> org.apache.solr.update.processor.LogUpdateProcessor  – [rn0]
> webapp=/solr
> >> path=/update
> >> params={distrib.from=
> >>
> http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe
> >> }
> >> {add=[D8AA2E196967D241 (1612655281649483776), E73420772E3235B7
> >> (1612655281666260992), DFDCF1F8325A3EF6 (1612655281680941056),
> >> 1B10EF90E7C3695F (1612655281689329664), 51CBD7F59644A718
> >> (1612655281699815424), 1D31EF403AF13E04 (1612655281714495488),
> >> 68E1DC3A614B7269 (1612655281723932672), F9BF6A3CF89D74FB
> >> (1612655281737564160), 419E017E1F360EB6 (1612655281749098496),
> >> 50EF977E5E873065 (1612655281759584256)]} 0 9
> >> 65497572 [http-apr-8980-exec-40] INFO
> >> org.apache.solr.update.processor.LogUpdateProcessor  – [rn0]
> webapp=/solr
> >> path=/update
> >> params={distrib.from=
> >>
> http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe
> >> }
> >> {add=[B63AD0671A5E57B9 (1612655281772167168), 00B8A4CCFABFA1AC
> >> (1612655281784750080), 9C89A1516C9166E6 (1612655281798381568),
> >

Re: searching is slow while adding document each time

2018-10-28 Thread Walter Underwood
Do not run optimize (force merge) unless you really understand the downside.

If you are continually adding and deleting documents, you really do not want
to run optimize.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 28, 2018, at 9:24 AM, Parag Shah  wrote:
> 
> Hi Mugeesh,
> 
>Have you tried optimizing indexes to see if performance improves? It is
> well known that over time as indexing goes on lucene creates more segments
> which will be  searched over and hence take longer. Merging happens
> constantly but continuous indexing will still introduce smaller segments
> all the time. Have you tried running "optimize" periodically? Is it
> something that you can afford to run? If you have a Master-Slave setup for
> Indexer v/s searchers, you can replicate on optimize in the Master, thereby
> removing the optimize load on the searchers, but replicate to the searcher
> periodically. That might help with reducing latency. Optimize merges
> segments and hence creates a more compact index that is faster to search.
> It may involve some higher latency temporarily right after the replication,
> but will go away soon after in-memory caches are full.
> 
>What is the search count/sec you are seeing?
> 
> Regards
> Parag
> 
> On Wed, Sep 26, 2018 at 2:02 AM Mugeesh Husain  wrote:
> 
>> Hi,
>> 
>> We are running 3 node solr cloud(4.4) in our production infrastructure, We
>> recently moved our SOLR server host softlayer to digital ocean server with
>> same configuration as production.
>> 
>> Now we are facing some slowness in the searcher when we index document,
>> when
>> we stop indexing then searches is fine, while adding document then it
>> become
>> slow. one of solr server we are indexing other 2 for searching the request.
>> 
>> 
>> I am just wondering what was the reason searches become slow while indexing
>> even we are using same configuration as we had in prod?
>> 
>> at the time we are pushing 500 document at a time, this processing is
>> continuously running(adding & deleting)
>> 
>> these are the indexing logs
>> 
>> 65497339 [http-apr-8980-exec-45] INFO
>> org.apache.solr.update.processor.LogUpdateProcessor  – [rn0] webapp=/solr
>> path=/update
>> params={distrib.from=
>> http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe
>> }
>> {add=[E4751FCCE977BAC7 (1612655281518411776), 8E712AD1BE76AB63
>> (1612655281527848960), 789AA5D0FB149A37 (1612655281538334720),
>> B4F3AA526506F6B7 (1612655281553014784), A9F29F556F6CD1C8
>> (1612655281566646272), 8D15813305BF7417 (1612655281584472064),
>> DD13CFA12973E85B (1612655281596006400), 3C93BDBA5DFDE3B3
>> (1612655281613832192), 96981A0785BFC9BF (1612655281625366528),
>> D1E52788A466E484 (1612655281636900864)]} 0 9
>> 65497459 [http-apr-8980-exec-22] INFO
>> org.apache.solr.update.processor.LogUpdateProcessor  – [rn0] webapp=/solr
>> path=/update
>> params={distrib.from=
>> http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe
>> }
>> {add=[D8AA2E196967D241 (1612655281649483776), E73420772E3235B7
>> (1612655281666260992), DFDCF1F8325A3EF6 (1612655281680941056),
>> 1B10EF90E7C3695F (1612655281689329664), 51CBD7F59644A718
>> (1612655281699815424), 1D31EF403AF13E04 (1612655281714495488),
>> 68E1DC3A614B7269 (1612655281723932672), F9BF6A3CF89D74FB
>> (1612655281737564160), 419E017E1F360EB6 (1612655281749098496),
>> 50EF977E5E873065 (1612655281759584256)]} 0 9
>> 65497572 [http-apr-8980-exec-40] INFO
>> org.apache.solr.update.processor.LogUpdateProcessor  – [rn0] webapp=/solr
>> path=/update
>> params={distrib.from=
>> http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe
>> }
>> {add=[B63AD0671A5E57B9 (1612655281772167168), 00B8A4CCFABFA1AC
>> (1612655281784750080), 9C89A1516C9166E6 (1612655281798381568),
>> 9322E17ECEAADE66 (1612655281803624448), C6DDB4BF8E94DE6B
>> (1612655281814110208), DAA49178A5E74285 (1612655281830887424),
>> 829C2AE38A3E78E4 (1612655281845567488), 4C7B19756D8E4208
>> (1612655281859198976), BE0F7354DC30164C (1612655281869684736),
>> 59C4A764BB50B13B (1612655281880170496)]} 0 9
>> 65497724 [http-apr-8980-exec-31] INFO
>> org.apache.solr.update.processor.LogUpdateProcessor  – [rn0] webapp=/solr
>> path=/update
>> params={distrib.from=
>> http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe
>> }
>> {add=[1F694F99367D7CE1 (1612655281895899136), 2AEAAF67A6893ABE
>> (1612655281911627776), 81E72DC36C7A9EBC (1612655281926307840),
>> AA71BD9B23548E6D (1612655281939939328), 359E8C4C6EC72AFA
>> (1612655281954619392), 7FEB6C65A3E23311 (1612655281972445184),
>> 9B5ED0BE7AFDD1D0 (1612655281991319552), 99FE8958F6ED8B91
>> (1612655282009145344), 2BDC61DC4038E19F (1612655282023825408),
>> 5131AEC4B87FBFE9 (1612655282037456896)]} 0 10
>> 
>> 
>> 
>> 
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>> 



Re: searching is slow while adding document each time

2018-10-28 Thread Parag Shah
Hi Mugeesh,

Have you tried optimizing indexes to see if performance improves? It is
well known that over time, as indexing goes on, Lucene creates more segments,
which will all be searched over and hence take longer. Merging happens
constantly, but continuous indexing will still introduce smaller segments
all the time. Have you tried running "optimize" periodically? Is it
something that you can afford to run? If you have a Master-Slave setup for
indexers v/s searchers, you can replicate on optimize in the Master, thereby
removing the optimize load on the searchers, while replicating to the
searchers periodically. That might help with reducing latency. Optimize merges
segments and hence creates a more compact index that is faster to search.
It may involve some higher latency temporarily right after the replication,
but that will go away soon after the in-memory caches are full.

What is the search count/sec you are seeing?

Regards
Parag
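
For reference, a minimal replicate-on-optimize sketch for the master's
solrconfig.xml (handler name and confFiles here are placeholders, not from
this thread):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">optimize</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

With replicateAfter=optimize, slaves only pull a new index generation after an
optimize completes on the master.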

On Wed, Sep 26, 2018 at 2:02 AM Mugeesh Husain  wrote:

> Hi,
>
> We are running 3 node solr cloud(4.4) in our production infrastructure, We
> recently moved our SOLR server host softlayer to digital ocean server with
> same configuration as production.
>
> Now we are facing some slowness in the searcher when we index document,
> when
> we stop indexing then searches is fine, while adding document then it
> become
> slow. one of solr server we are indexing other 2 for searching the request.
>
>
> I am just wondering what was the reason searches become slow while indexing
> even we are using same configuration as we had in prod?
>
> at the time we are pushing 500 document at a time, this processing is
> continuously running(adding & deleting)
>
> these are the indexing logs
>
> 65497339 [http-apr-8980-exec-45] INFO
> org.apache.solr.update.processor.LogUpdateProcessor  – [rn0] webapp=/solr
> path=/update
> params={distrib.from=
> http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe
> }
> {add=[E4751FCCE977BAC7 (1612655281518411776), 8E712AD1BE76AB63
> (1612655281527848960), 789AA5D0FB149A37 (1612655281538334720),
> B4F3AA526506F6B7 (1612655281553014784), A9F29F556F6CD1C8
> (1612655281566646272), 8D15813305BF7417 (1612655281584472064),
> DD13CFA12973E85B (1612655281596006400), 3C93BDBA5DFDE3B3
> (1612655281613832192), 96981A0785BFC9BF (1612655281625366528),
> D1E52788A466E484 (1612655281636900864)]} 0 9
> 65497459 [http-apr-8980-exec-22] INFO
> org.apache.solr.update.processor.LogUpdateProcessor  – [rn0] webapp=/solr
> path=/update
> params={distrib.from=
> http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe
> }
> {add=[D8AA2E196967D241 (1612655281649483776), E73420772E3235B7
> (1612655281666260992), DFDCF1F8325A3EF6 (1612655281680941056),
> 1B10EF90E7C3695F (1612655281689329664), 51CBD7F59644A718
> (1612655281699815424), 1D31EF403AF13E04 (1612655281714495488),
> 68E1DC3A614B7269 (1612655281723932672), F9BF6A3CF89D74FB
> (1612655281737564160), 419E017E1F360EB6 (1612655281749098496),
> 50EF977E5E873065 (1612655281759584256)]} 0 9
> 65497572 [http-apr-8980-exec-40] INFO
> org.apache.solr.update.processor.LogUpdateProcessor  – [rn0] webapp=/solr
> path=/update
> params={distrib.from=
> http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe
> }
> {add=[B63AD0671A5E57B9 (1612655281772167168), 00B8A4CCFABFA1AC
> (1612655281784750080), 9C89A1516C9166E6 (1612655281798381568),
> 9322E17ECEAADE66 (1612655281803624448), C6DDB4BF8E94DE6B
> (1612655281814110208), DAA49178A5E74285 (1612655281830887424),
> 829C2AE38A3E78E4 (1612655281845567488), 4C7B19756D8E4208
> (1612655281859198976), BE0F7354DC30164C (1612655281869684736),
> 59C4A764BB50B13B (1612655281880170496)]} 0 9
> 65497724 [http-apr-8980-exec-31] INFO
> org.apache.solr.update.processor.LogUpdateProcessor  – [rn0] webapp=/solr
> path=/update
> params={distrib.from=
> http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe
> }
> {add=[1F694F99367D7CE1 (1612655281895899136), 2AEAAF67A6893ABE
> (1612655281911627776), 81E72DC36C7A9EBC (1612655281926307840),
> AA71BD9B23548E6D (1612655281939939328), 359E8C4C6EC72AFA
> (1612655281954619392), 7FEB6C65A3E23311 (1612655281972445184),
> 9B5ED0BE7AFDD1D0 (1612655281991319552), 99FE8958F6ED8B91
> (1612655282009145344), 2BDC61DC4038E19F (1612655282023825408),
> 5131AEC4B87FBFE9 (1612655282037456896)]} 0 10
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


searching is slow while adding document each time

2018-09-26 Thread Mugeesh Husain
Hi,

We are running a 3-node Solr Cloud (4.4) in our production infrastructure. We
recently moved our Solr servers from the SoftLayer host to a DigitalOcean
server with the same configuration as production.

Now we are seeing some slowness in the searchers when we index documents: when
we stop indexing, searching is fine, but while we are adding documents it
becomes slow. We index on one of the Solr servers and use the other two to
serve search requests.

I am just wondering why searches become slow while indexing, even though we
are using the same configuration as we had in prod?

We are pushing 500 documents at a time, and this process runs continuously
(adding & deleting).

These are the indexing logs:

65497339 [http-apr-8980-exec-45] INFO 
org.apache.solr.update.processor.LogUpdateProcessor  – [rn0] webapp=/solr
path=/update
params={distrib.from=http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe}
{add=[E4751FCCE977BAC7 (1612655281518411776), 8E712AD1BE76AB63
(1612655281527848960), 789AA5D0FB149A37 (1612655281538334720),
B4F3AA526506F6B7 (1612655281553014784), A9F29F556F6CD1C8
(1612655281566646272), 8D15813305BF7417 (1612655281584472064),
DD13CFA12973E85B (1612655281596006400), 3C93BDBA5DFDE3B3
(1612655281613832192), 96981A0785BFC9BF (1612655281625366528),
D1E52788A466E484 (1612655281636900864)]} 0 9
65497459 [http-apr-8980-exec-22] INFO 
org.apache.solr.update.processor.LogUpdateProcessor  – [rn0] webapp=/solr
path=/update
params={distrib.from=http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe}
{add=[D8AA2E196967D241 (1612655281649483776), E73420772E3235B7
(1612655281666260992), DFDCF1F8325A3EF6 (1612655281680941056),
1B10EF90E7C3695F (1612655281689329664), 51CBD7F59644A718
(1612655281699815424), 1D31EF403AF13E04 (1612655281714495488),
68E1DC3A614B7269 (1612655281723932672), F9BF6A3CF89D74FB
(1612655281737564160), 419E017E1F360EB6 (1612655281749098496),
50EF977E5E873065 (1612655281759584256)]} 0 9
65497572 [http-apr-8980-exec-40] INFO 
org.apache.solr.update.processor.LogUpdateProcessor  – [rn0] webapp=/solr
path=/update
params={distrib.from=http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe}
{add=[B63AD0671A5E57B9 (1612655281772167168), 00B8A4CCFABFA1AC
(1612655281784750080), 9C89A1516C9166E6 (1612655281798381568),
9322E17ECEAADE66 (1612655281803624448), C6DDB4BF8E94DE6B
(1612655281814110208), DAA49178A5E74285 (1612655281830887424),
829C2AE38A3E78E4 (1612655281845567488), 4C7B19756D8E4208
(1612655281859198976), BE0F7354DC30164C (1612655281869684736),
59C4A764BB50B13B (1612655281880170496)]} 0 9
65497724 [http-apr-8980-exec-31] INFO 
org.apache.solr.update.processor.LogUpdateProcessor  – [rn0] webapp=/solr
path=/update
params={distrib.from=http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe}
{add=[1F694F99367D7CE1 (1612655281895899136), 2AEAAF67A6893ABE
(1612655281911627776), 81E72DC36C7A9EBC (1612655281926307840),
AA71BD9B23548E6D (1612655281939939328), 359E8C4C6EC72AFA
(1612655281954619392), 7FEB6C65A3E23311 (1612655281972445184),
9B5ED0BE7AFDD1D0 (1612655281991319552), 99FE8958F6ED8B91
(1612655282009145344), 2BDC61DC4038E19F (1612655282023825408),
5131AEC4B87FBFE9 (1612655282037456896)]} 0 10




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


RE: Re: Multi word searching is not working getting random search results

2018-09-07 Thread Muddapati, Jagadish
Hi Susheel,

Thanks for your response. If I use the plural it also gives the same results,
and Solr is not finding the two different words that are on the same page. I am
trying to figure out how to pass the query so that a multi-word search finds
the pages containing both words.

Thanks,
Jagadish M.

-Original Message-
From: Susheel Kumar [mailto:susheel2...@gmail.com] 
Sent: Thursday, September 6, 2018 3:01 PM
To: solr-user@lucene.apache.org
Subject: [EXTERNAL] Re: Multi word searching is not working getting random 
search results

How about you search with Intermodal Schedules (plural) & try phrase slop for 
better control on relevancy order

https://lucene.apache.org/solr/guide/6_6/the-extended-dismax-query-parser.html


On Thu, Sep 6, 2018 at 12:10 PM Muddapati, Jagadish < 
jagadish.muddap...@nscorp.com> wrote:

> Label: newbie
> Environment:
> I am currently running solr on Linux platform.
>
> NAME="Red Hat Enterprise Linux Server"
> VERSION="7.5"
>
> openjdk version "1.8.0_181"
>
> AEM version: 6.2
>
> I recently integrate solr to AEM and when i do search for multiple 
> words the search results are getting randomly.
>
> search words: Intermodal schedule
> Results: First solr displaying the search results related to 
> Intermodal and after few pages I am seeing the serch term schedule 
> related pages randomly. I am not getting the results related to multi words 
> on the page.
> For example: I am not seeing the results like [Terminals & Schedules | 
> Intermodal | Shipping Options ... page on starting and getting random 
> results and the  [Terminals & Schedules | Intermodal | Shipping Options ...
> page displaying after the 40 results.
>
> Here is the query on browser URL:
>
> http://test-servername/content/nscorp/en/search-results.html?start=0&q=Intermodal+Schedule
> <
> http://servername/content/nscorp/en/search-results.html?start=0&q=Intermodal+Schedule
> >
>
> I am using solr version 7.4
>
> Thanks,
> Jagadish M.
>
>
>


Re: Multi word searching is not working getting random search results

2018-09-06 Thread Susheel Kumar
How about you search with Intermodal Schedules (plural) & try phrase slop
for better control on relevancy order

https://lucene.apache.org/solr/guide/6_6/the-extended-dismax-query-parser.html
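
For illustration, a minimal edismax request along those lines (the field names
are assumptions, not from the original post):

q=Intermodal Schedules
defType=edismax
qf=title^2 content
pf=title content
ps=2

pf boosts documents where the whole phrase occurs in the listed fields, and
ps=2 allows up to two positions of slop between the phrase terms.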


On Thu, Sep 6, 2018 at 12:10 PM Muddapati, Jagadish <
jagadish.muddap...@nscorp.com> wrote:

> Label: newbie
> Environment:
> I am currently running solr on Linux platform.
>
> NAME="Red Hat Enterprise Linux Server"
> VERSION="7.5"
>
> openjdk version "1.8.0_181"
>
> AEM version: 6.2
>
> I recently integrate solr to AEM and when i do search for multiple words
> the search results are getting randomly.
>
> search words: Intermodal schedule
> Results: First solr displaying the search results related to Intermodal
> and after few pages I am seeing the serch term schedule related pages
> randomly. I am not getting the results related to multi words on the page.
> For example: I am not seeing the results like [Terminals & Schedules |
> Intermodal | Shipping Options ... page on starting and getting random
> results and the  [Terminals & Schedules | Intermodal | Shipping Options ...
> page displaying after the 40 results.
>
> Here is the query on browser URL:
>
> http://test-servername/content/nscorp/en/search-results.html?start=0&q=Intermodal+Schedule
> <
> http://servername/content/nscorp/en/search-results.html?start=0&q=Intermodal+Schedule
> >
>
> I am using solr version 7.4
>
> Thanks,
> Jagadish M.
>
>
>


Multi word searching is not working getting random search results

2018-09-06 Thread Muddapati, Jagadish
Label: newbie
Environment:
I am currently running solr on Linux platform.

NAME="Red Hat Enterprise Linux Server"
VERSION="7.5"

openjdk version "1.8.0_181"

AEM version: 6.2

I recently integrated Solr with AEM, and when I search for multiple words the
search results come back in seemingly random order.

search words: Intermodal schedule
Results: First Solr displays the search results related to Intermodal, and
after a few pages I see pages related to the search term schedule, at random.
I am not getting results for pages that contain both words.
For example: I am not seeing results like the [Terminals & Schedules |
Intermodal | Shipping Options ...] page at the start; instead I get random
results, and the [Terminals & Schedules | Intermodal | Shipping Options ...]
page only appears after the first 40 results.

Here is the query on browser URL:
http://test-servername/content/nscorp/en/search-results.html?start=0&q=Intermodal+Schedule

I am using solr version 7.4

Thanks,
Jagadish M.




RE: Searching by dates

2018-08-16 Thread Markus Jelsma
Hello Christopher,

We have a library whose sole purpose is to extract, parse and validate dates
found in free text, in all major world languages (and many more) and in every
thinkable format/notation. It can also deal with times and timezones (resolving
them back to UTC), different eras (e.g. Buddhist), validate dates (e.g. 2018-1-4)
and figure out which format is correct (yyyy-m-d or yyyy-d-m) if a day name is
found somewhere very close to the date. And it supports month names, including
abbreviated formats (thanks to Locale).

We use it to get the date for an article/web page on our Sitesearch platform, 
and index it to Solr so we can boost recent articles. But some of our customers 
use it together with a Lucene CharFilter to transform it on-the-fly 
(maintaining offsets and positions for highlighting) when indexing or 
searching, or embedded in a QueryParser.

It is a mature project in on-going development since 2010, but not open source, 
so if you are interested contact us off list.

Regards,
Markus

 
 
-Original message-
> From:Shawn Heisey 
> Sent: Thursday 16th August 2018 20:09
> To: solr-user@lucene.apache.org
> Subject: Re: Searching by dates
> 
> On 8/16/2018 9:20 AM, Christopher Schultz wrote:
> > Hmm. I could have sworn the documentation I read in the past (maybe as
> > long as 3-4 months ago) indicated that date+timestamp was necessary.
> > Maybe that was just for the index, while the searches can be partial.
> 
> DateRangeField was introduced four years ago, first available in Solr
> version 5.0.
> 
> https://issues.apache.org/jira/browse/SOLR-6103
> 
> > As for i18n, is there a way to have the query analyzer convert strings
> > like "mm/dd/" into "-mm-dd"?
> 
> Solr doesn't accept dates in mm/dd/yyyy syntax, and can't convert that
> for you.  The ISO standard that *is* accepted is the more logical
> yyyy-mm-dd.  It's generally best if you don't use a freeform text field
> for dates ... provide a full interface for choosing specific dates so
> that user input is predictable.  Probably something like this:
> 
> https://developer.mozilla.org/en-US/docs/Web/HTML/Element/input/date
> 
> Looking at the documentation, I don't see any way to search for just a
> day without the year.  That could be a useful enhancement for
> birthday-related use cases, but I have no idea how hard it would be to
> write.
> 
> Thanks,
> Shawn
> 
> 


Re: Searching by dates

2018-08-16 Thread Shawn Heisey
On 8/16/2018 9:20 AM, Christopher Schultz wrote:
> Hmm. I could have sworn the documentation I read in the past (maybe as
> long as 3-4 months ago) indicated that date+timestamp was necessary.
> Maybe that was just for the index, while the searches can be partial.

DateRangeField was introduced four years ago, first available in Solr
version 5.0.

https://issues.apache.org/jira/browse/SOLR-6103

> As for i18n, is there a way to have the query analyzer convert strings
> like "mm/dd/" into "-mm-dd"?

Solr doesn't accept dates in mm/dd/yyyy syntax, and can't convert that
for you.  The ISO standard that *is* accepted is the more logical
yyyy-mm-dd.  It's generally best if you don't use a freeform text field
for dates ... provide a full interface for choosing specific dates so
that user input is predictable.  Probably something like this:

https://developer.mozilla.org/en-US/docs/Web/HTML/Element/input/date
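
For illustration, a minimal sketch of such an input (field name and limits are
placeholders):

<label>Date of birth:
  <input type="date" name="dob" min="1900-01-01" max="2030-12-31">
</label>

The browser submits the value as yyyy-mm-dd regardless of how the user's
locale displays it, so no server-side reformatting is needed.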

Looking at the documentation, I don't see any way to search for just a
day without the year.  That could be a useful enhancement for
birthday-related use cases, but I have no idea how hard it would be to
write.

Thanks,
Shawn



Re: Searching by dates

2018-08-16 Thread Alexandre Rafalovitch
You could have PatternReplace in your field definition either as a
CharFilter or a TokenFilter. See:
http://www.solr-start.com/info/analyzers/

Regards,
   Alex.
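
For illustration, a minimal sketch of such a rewrite (the pattern blindly
assumes mm/dd/yyyy input; it is an example, not a drop-in solution):

<charFilter class="solr.PatternReplaceCharFilterFactory"
            pattern="(\d{1,2})/(\d{1,2})/(\d{4})"
            replacement="$3-$1-$2"/>

Note that a CharFilter only rewrites text fed into an analyzed field; date
fields have no analysis chain, so for those the conversion has to happen
client-side before the value reaches Solr.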

On 16 August 2018 at 11:20, Christopher Schultz
 wrote:
> Shawn,
>
> On 8/16/18 10:37 AM, Shawn Heisey wrote:
>> On 8/16/2018 7:48 AM, Christopher Schultz wrote:
>>> I haven't actually tried this yet, but from the docs I'm guessing that
>>> I can't search for a DOB using e.g. 2018-08-16 but instead I need to
>>> search using 2018-08-16T00:00:00 plus maybe "Z" at the end for the TZ.
>>>
>>> No user is ever going to do that.
>>
>> If you use the field class called DateRangeField, instead of the trie or
>> point classes, you can get what you're after.
>>
>> It allows both searching and indexing dates as vague as "2018".
>>
>> https://lucene.apache.org/solr/guide/7_4/working-with-dates.html
>
> Hmm. I could have sworn the documentation I read in the past (maybe as
> long as 3-4 months ago) indicated that date+timestamp was necessary.
> Maybe that was just for the index, while the searches can be partial.
>
> As long as users don't have to enter timestamps to search, I think all
> is well in terms of index/search for me.
>
> As for i18n, is there a way to have the query analyzer convert strings
> like "mm/dd/" into "-mm-dd"?
>
> I'm sure we can take the query (before handing-off to Solr), look for
> anything that looks like a date and convert it into ISO-8601 for
> searching, but if Solr already provides a facility to do that, I'd
> rather not complicate my code in order to get it working.
>
>> For an existing index, you will have to change the schema and completely
>> reindex.
>
> That's okay. The index doesn't actually exist, yet :) This is all just
> planning.
>
> Thanks,
> -chris
>


Re: Searching by dates

2018-08-16 Thread Christopher Schultz
Shawn,

On 8/16/18 10:37 AM, Shawn Heisey wrote:
> On 8/16/2018 7:48 AM, Christopher Schultz wrote:
>> I haven't actually tried this yet, but from the docs I'm guessing that
>> I can't search for a DOB using e.g. 2018-08-16 but instead I need to
>> search using 2018-08-16T00:00:00 plus maybe "Z" at the end for the TZ.
>>
>> No user is ever going to do that.
> 
> If you use the field class called DateRangeField, instead of the trie or
> point classes, you can get what you're after.
> 
> It allows both searching and indexing dates as vague as "2018".
> 
> https://lucene.apache.org/solr/guide/7_4/working-with-dates.html

Hmm. I could have sworn the documentation I read in the past (maybe as
long as 3-4 months ago) indicated that date+timestamp was necessary.
Maybe that was just for the index, while the searches can be partial.

As long as users don't have to enter timestamps to search, I think all
is well in terms of index/search for me.

As for i18n, is there a way to have the query analyzer convert strings
like "mm/dd/" into "-mm-dd"?

I'm sure we can take the query (before handing-off to Solr), look for
anything that looks like a date and convert it into ISO-8601 for
searching, but if Solr already provides a facility to do that, I'd
rather not complicate my code in order to get it working.

> For an existing index, you will have to change the schema and completely
> reindex.

That's okay. The index doesn't actually exist, yet :) This is all just
planning.

Thanks,
-chris





Re: Searching by dates

2018-08-16 Thread Alexandre Rafalovitch
However, you probably will still need to convert your dates into
strings as well to match people's search expectation, as the date
fields do not store _english_ month names internally.

So, you will want to have a secondary field that expands 2018-02-31
into "February 2018" (and "Feb 2018"?) including the analysis pipeline
that does lowercasing.

Regards,
   Alex.
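
For illustration, a client-side sketch that derives such a month-name value
before indexing (class name and field handling are assumptions):

import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class MonthNameField {
    public static void main(String[] args) {
        LocalDate dob = LocalDate.parse("2018-02-16");
        // Values for a secondary, text-typed field: "February 2018" / "Feb 2018"
        String full = dob.format(DateTimeFormatter.ofPattern("MMMM yyyy", Locale.ENGLISH));
        String abbr = dob.format(DateTimeFormatter.ofPattern("MMM yyyy", Locale.ENGLISH));
        System.out.println(full + " / " + abbr);
    }
}

Lowercasing can then be left to the field's analysis chain, as noted above.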

On 16 August 2018 at 10:37, Shawn Heisey  wrote:
> On 8/16/2018 7:48 AM, Christopher Schultz wrote:
>> I haven't actually tried this yet, but from the docs I'm guessing that
>> I can't search for a DOB using e.g. 2018-08-16 but instead I need to
>> search using 2018-08-16T00:00:00 plus maybe "Z" at the end for the TZ.
>>
>> No user is ever going to do that.
>
> If you use the field class called DateRangeField, instead of the trie or
> point classes, you can get what you're after.
>
> It allows both searching and indexing dates as vague as "2018".
>
> https://lucene.apache.org/solr/guide/7_4/working-with-dates.html
>
> For an existing index, you will have to change the schema and completely
> reindex.
>
> Thanks,
> Shawn
>


Re: Searching by dates

2018-08-16 Thread Shawn Heisey
On 8/16/2018 7:48 AM, Christopher Schultz wrote:
> I haven't actually tried this yet, but from the docs I'm guessing that
> I can't search for a DOB using e.g. 2018-08-16 but instead I need to
> search using 2018-08-16T00:00:00 plus maybe "Z" at the end for the TZ.
>
> No user is ever going to do that.

If you use the field class called DateRangeField, instead of the trie or
point classes, you can get what you're after.

It allows both searching and indexing dates as vague as "2018".

https://lucene.apache.org/solr/guide/7_4/working-with-dates.html

For an existing index, you will have to change the schema and completely
reindex.

Thanks,
Shawn
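
For reference, a minimal DateRangeField sketch (field and type names are
placeholders):

<fieldType name="drange" class="solr.DateRangeField"/>
<field name="dob" type="drange" indexed="true" stored="true"/>

q=dob:2018-08-16             (matches anywhere within that day)
q=dob:[2018-01 TO 2018-08]   (truncated dates form ranges)

Both indexing and querying accept truncated values like 2018 or 2018-08, which
is what makes a day search without a timestamp possible.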



Searching by dates

2018-08-16 Thread Christopher Schultz
All,

My understanding is that Solr (really Lucene) only handles temporal data
using full timestamps (date+time, always UTC). I have a use-case where
I'd like to store and search for people by their birth dates, so the
timestamp information is not relevant for me.

I haven't actually tried this yet, but from the docs I'm guessing that
I can't search for a DOB using e.g. 2018-08-16 but instead I need to
search using 2018-08-16T00:00:00 plus maybe "Z" at the end for the TZ.

No user is ever going to do that.

I can also offer a separate form-field for "enter your DOB search here"
and then correctly-format it for Solr/Lucene, but then users can't
conveniently search for e.g. "chris schultz 2018-08-16" and have the DOB
match anything useful.

Is there any standard way of handling dates, or any ideas people have
come up with that kind of work for this use-case?

I could always convert dates to unparsed strings (so I don't get
separate tokens like 2018, 08, and 16 in the document), but then I won't
be able to do range queries against the index.

I would definitely want to be able to search for "chris [born in] august
2018" and find any matches.

Any ideas?

Thanks
-chris





Re: Question regarding searching Chinese characters

2018-08-14 Thread Christopher Beer
Hi all,

Thanks for this enlightening thread. As it happens, at Stanford Libraries we’re 
currently working on upgrading from Solr 4 to 7 and we’re looking forward to 
using the new dictionary-based word splitting in the ICUTokenizer.

We have many of the same challenges as Amanda mentioned, and thanks to the 
advice on this thread, we’ve taken a stab at a CharFilter to do the traditional 
-> simplified transformation [1] and it seems to be promising and we've sent it 
out for testing by our subject matter experts for evaluation.

Thanks,
Chris

[1] 
https://github.com/sul-dlss/CJKFilterUtils/blob/master/src/main/java/edu/stanford/lucene/analysis/ICUTransformCharFilter.java

On 2018/07/24 12:54:35, Tomoko Uchida  wrote:
Hi Amanda,>

do all I need to do is modify the settings from smartChinese to the ones>
you posted here>

Yes, the settings I posted should work for you, at least partially.>
If you are happy with the results, it's OK!>
But please take this as a starting point because it's not perfect.>

Or do I need to still do something with the SmartChineseAnalyzer?>

Try the settings, then if you notice something strange and want to know why>
and how to solve it, that may be the time to dive deep into. ;)>

I cannot explain how analyzers works here... but you should start off with>
the Solr documentation.>
https://lucene.apache.org/solr/guide/7_0/understanding-analyzers-tokenizers-and-filters.html>

Regards,>
Tomoko>



2018年7月24日(火) 21:08 Amanda Shuman :>

Hi Tomoko,>

Thanks so much for this explanation - I did not even know this was>
possible! I will try it out but I have one question: do all I need to do is>
modify the settings from smartChinese to the ones you posted here:>

<analyzer>
  <charFilter .../>
  <tokenizer .../>
  <filter class="solr.ICUTransformFilterFactory"
          id="Traditional-Simplified"/>
</analyzer>

Or do I need to still do something with the SmartChineseAnalyzer? I did not>
quite understand this in your first message:>

" I think you need two steps if you want to use HMMChineseTokenizer>
correctly.>

1. transform all traditional characters to simplified ones and save to>
temporary files.>
I do not have clear idea for doing this, but you can create a Java>
program that calls Lucene's ICUTransformFilter>
2. then, index to Solr using SmartChineseAnalyzer.">

My understanding is that with the new settings you posted, I don't need to>
do these steps. Is that correct? Otherwise, I don't really know how to do>
step 1 with the java program>

Thanks!>
Amanda>


-->
Dr. Amanda Shuman>
Post-doc researcher, University of Freiburg, The Maoist Legacy Project>
>
PhD, University of California, Santa Cruz>
http://www.amandashuman.net/>
http://www.prchistoryresources.org/>
Office: +49 (0) 761 203 4925>



AW: indexing two words, searching single word

2018-08-03 Thread Clemens Wyss DEV
+1 ;)

-Ursprüngliche Nachricht-
Von: Susheel Kumar  
Gesendet: Freitag, 3. August 2018 14:40
An: solr-user@lucene.apache.org
Betreff: Re: indexing two words, searching single word

and as you suggested, use stop word before shingles...

On Fri, Aug 3, 2018 at 8:10 AM, Clemens Wyss DEV 
wrote:

> <analyzer type="index">
>   <tokenizer .../>
>   <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
>    outputUnigrams="true" tokenSeparator=""/>
> </analyzer>
>
> seems to "work"
>
> -Ursprüngliche Nachricht-
> Von: Clemens Wyss DEV 
> Gesendet: Freitag, 3. August 2018 13:46
> An: solr-user@lucene.apache.org
> Betreff: AW: indexing two words, searching single word
>
> >Because you probably are not looking for "andthe" kind of tokens
> (unfortunately) I guess I am, as we don't know what people enter...
>
> > a shingle plus regex to remove whitespace
> sounds interesting. How would that filter-chain look like? That would 
> be an type="index"-analyzer?
> I guess we could shingle after stop-word-filtering and I guess
> maxShingleSize="2" would suffice
>
> -Ursprüngliche Nachricht-
> Von: Alexandre Rafalovitch 
> Gesendet: Freitag, 3. August 2018 13:33
> An: solr-user 
> Betreff: Re: indexing two words, searching single word
>
> But what is your generic problem then. Because you probably are not 
> looking for "andthe" kind of tokens.
>
> However a shingle plus regex to remove whitespace can give you "anytwo 
> wordstogether smooshed" tokens in the index.
>
> Regards,
>  Alex
>
>
> On Fri, Aug 3, 2018, 7:19 AM Clemens Wyss DEV, 
> wrote:
>
> > Hi Markus,
> > thanks for the quick answer.
> >
> > "sound stage" was just an example. We are looking for a generic 
> > solution ...
> >
> > Is it "ok" to apply an NGRamFilter for query-analyzing?
> > 
> > 
> > 
> >  > maxGramSize="15" />
> > 
> >
> > I guess (besides the performance impact) this reduces search results 
> > accuracy?
> >
> > -Clemens
> >
> > -Ursprüngliche Nachricht-
> > Von: Markus Jelsma 
> > Gesendet: Freitag, 3. August 2018 12:43
> > An: solr-user@lucene.apache.org
> > Betreff: RE: indexing two words, searching single word
> >
> > Hello,
> >
> > If your case is English you could use synonyms to work around the 
> > problem of the few compound words of the language. However, would 
> > you be dealing with a Germanic compound language, the 
> > HyphenationCompoundWordTokenFilter
> > [1] or DictionaryCompoundWordTokenFilter are a better choice. The 
> > former is much more flexible but has its drawbacks.
> >
> > Regards,
> > Markus
> >
> >
> > https://lucene.apache.org/core/7_4_0/analyzers-common/org/apache/luc
> > en 
> > e/analysis/compound/HyphenationCompoundWordTokenFilterFactory.html
> >
> >
> >
> > -Original message-
> > > From:Clemens Wyss DEV 
> > > Sent: Friday 3rd August 2018 12:22
> > > To: solr-user@lucene.apache.org
> > > Subject: indexing two words, searching single word
> > >
> > > Sounds like a rather simple issue:
> > > if I index "sound stage" and search for "soundstage" I get no hits
> > >
> > > What am I doing wrong
> > > a) when indexing
> > > b) when searching
> > > ?
> > >
> > > Thx in advance
> > > - Clemens
> > >
> >
>


Re: indexing two words, searching single word

2018-08-03 Thread Susheel Kumar
and as you suggested, use stop word before shingles...

On Fri, Aug 3, 2018 at 8:10 AM, Clemens Wyss DEV 
wrote:

> <analyzer type="index">
>   <tokenizer .../>
>   <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
>    outputUnigrams="true" tokenSeparator=""/>
> </analyzer>
>
> seems to "work"
>
> -Ursprüngliche Nachricht-
> Von: Clemens Wyss DEV 
> Gesendet: Freitag, 3. August 2018 13:46
> An: solr-user@lucene.apache.org
> Betreff: AW: indexing two words, searching single word
>
> >Because you probably are not looking for "andthe" kind of tokens
> (unfortunately) I guess I am, as we don't know what people enter...
>
> > a shingle plus regex to remove whitespace
> sounds interesting. How would that filter-chain look like? That would be
> an type="index"-analyzer?
> I guess we could shingle after stop-word-filtering and I guess
> maxShingleSize="2" would suffice
>
> -Ursprüngliche Nachricht-
> Von: Alexandre Rafalovitch 
> Gesendet: Freitag, 3. August 2018 13:33
> An: solr-user 
> Betreff: Re: indexing two words, searching single word
>
> But what is your generic problem then. Because you probably are not
> looking for "andthe" kind of tokens.
>
> However a shingle plus regex to remove whitespace can give you "anytwo
> wordstogether smooshed" tokens in the index.
>
> Regards,
>  Alex
>
>
> On Fri, Aug 3, 2018, 7:19 AM Clemens Wyss DEV, 
> wrote:
>
> > Hi Markus,
> > thanks for the quick answer.
> >
> > "sound stage" was just an example. We are looking for a generic
> > solution ...
> >
> > Is it "ok" to apply an NGRamFilter for query-analyzing?
> > 
> > 
> > 
> >  > maxGramSize="15" />
> > 
> >
> > I guess (besides the performance impact) this reduces search results
> > accuracy?
> >
> > -Clemens
> >
> > -Ursprüngliche Nachricht-
> > Von: Markus Jelsma 
> > Gesendet: Freitag, 3. August 2018 12:43
> > An: solr-user@lucene.apache.org
> > Betreff: RE: indexing two words, searching single word
> >
> > Hello,
> >
> > If your case is English you could use synonyms to work around the
> > problem of the few compound words of the language. However, would you
> > be dealing with a Germanic compound language, the
> > HyphenationCompoundWordTokenFilter
> > [1] or DictionaryCompoundWordTokenFilter are a better choice. The
> > former is much more flexible but has its drawbacks.
> >
> > Regards,
> > Markus
> >
> >
> > https://lucene.apache.org/core/7_4_0/analyzers-common/org/apache/lucen
> > e/analysis/compound/HyphenationCompoundWordTokenFilterFactory.html
> >
> >
> >
> > -Original message-
> > > From:Clemens Wyss DEV 
> > > Sent: Friday 3rd August 2018 12:22
> > > To: solr-user@lucene.apache.org
> > > Subject: indexing two words, searching single word
> > >
> > > Sounds like a rather simple issue:
> > > if I index "sound stage" and search for "soundstage" I get no hits
> > >
> > > What am I doing wrong
> > > a) when indexing
> > > b) when searching
> > > ?
> > >
> > > Thx in advance
> > > - Clemens
> > >
> >
>


AW: indexing two words, searching single word

2018-08-03 Thread Clemens Wyss DEV
<analyzer type="index">
  <tokenizer .../>
  <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
   outputUnigrams="true" tokenSeparator=""/>
</analyzer>

seems to "work"
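
To illustrate: with maxShingleSize="2", outputUnigrams="true" and
tokenSeparator="", index-time analysis of "sound stage" emits roughly

sound, soundstage, stage

so a query for "soundstage" now matches the concatenated shingle token.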

-Ursprüngliche Nachricht-
Von: Clemens Wyss DEV  
Gesendet: Freitag, 3. August 2018 13:46
An: solr-user@lucene.apache.org
Betreff: AW: indexing two words, searching single word

>Because you probably are not looking for "andthe" kind of tokens
(unfortunately) I guess I am, as we don't know what people enter...

> a shingle plus regex to remove whitespace
sounds interesting. How would that filter-chain look like? That would be an 
type="index"-analyzer?
> I guess we could shingle after stop-word-filtering and I guess
maxShingleSize="2" would suffice

-Ursprüngliche Nachricht-
Von: Alexandre Rafalovitch 
Gesendet: Freitag, 3. August 2018 13:33
An: solr-user 
Betreff: Re: indexing two words, searching single word

But what is your generic problem then. Because you probably are not looking for 
"andthe" kind of tokens.

However a shingle plus regex to remove whitespace can give you "anytwo 
wordstogether smooshed" tokens in the index.

Regards,
 Alex


On Fri, Aug 3, 2018, 7:19 AM Clemens Wyss DEV,  wrote:

> Hi Markus,
> thanks for the quick answer.
>
> "sound stage" was just an example. We are looking for a generic 
> solution ...
>
> Is it "ok" to apply an NGRamFilter for query-analyzing?
> 
> 
> 
>  maxGramSize="15" />
> 
>
> I guess (besides the performance impact) this reduces search results 
> accuracy?
>
> -Clemens
>
> -Ursprüngliche Nachricht-
> Von: Markus Jelsma 
> Gesendet: Freitag, 3. August 2018 12:43
> An: solr-user@lucene.apache.org
> Betreff: RE: indexing two words, searching single word
>
> Hello,
>
> If your case is English you could use synonyms to work around the 
> problem of the few compound words of the language. However, would you 
> be dealing with a Germanic compound language, the 
> HyphenationCompoundWordTokenFilter
> [1] or DictionaryCompoundWordTokenFilter are a better choice. The 
> former is much more flexible but has its drawbacks.
>
> Regards,
> Markus
>
>
> https://lucene.apache.org/core/7_4_0/analyzers-common/org/apache/lucen
> e/analysis/compound/HyphenationCompoundWordTokenFilterFactory.html
>
>
>
> -Original message-
> > From:Clemens Wyss DEV 
> > Sent: Friday 3rd August 2018 12:22
> > To: solr-user@lucene.apache.org
> > Subject: indexing two words, searching single word
> >
> > Sounds like a rather simple issue:
> > if I index "sound stage" and search for "soundstage" I get no hits
> >
> > What am I doing wrong
> > a) when indexing
> > b) when searching
> > ?
> >
> > Thx in advance
> > - Clemens
> >
>


AW: indexing two words, searching single word

2018-08-03 Thread Clemens Wyss DEV
>Because you probably are not looking for "andthe" kind of tokens
(unfortunately) I guess I am, as we don't know what people enter...

> a shingle plus regex to remove whitespace
sounds interesting. How would that filter-chain look like? That would be an 
type="index"-analyzer?
I guess we could shingle after stop-word-filtering and I guess
maxShingleSize="2" would suffice

-Ursprüngliche Nachricht-
Von: Alexandre Rafalovitch  
Gesendet: Freitag, 3. August 2018 13:33
An: solr-user 
Betreff: Re: indexing two words, searching single word

But what is your generic problem then. Because you probably are not looking for 
"andthe" kind of tokens.

However a shingle plus regex to remove whitespace can give you "anytwo 
wordstogether smooshed" tokens in the index.

Regards,
 Alex


On Fri, Aug 3, 2018, 7:19 AM Clemens Wyss DEV,  wrote:

> Hi Markus,
> thanks for the quick answer.
>
> "sound stage" was just an example. We are looking for a generic 
> solution ...
>
> Is it "ok" to apply an NGRamFilter for query-analyzing?
> 
> 
> 
>  maxGramSize="15" />
> 
>
> I guess (besides the performance impact) this reduces search results 
> accuracy?
>
> -Clemens
>
> -Ursprüngliche Nachricht-----
> Von: Markus Jelsma 
> Gesendet: Freitag, 3. August 2018 12:43
> An: solr-user@lucene.apache.org
> Betreff: RE: indexing two words, searching single word
>
> Hello,
>
> If your case is English you could use synonyms to work around the 
> problem of the few compound words of the language. However, would you 
> be dealing with a Germanic compound language, the 
> HyphenationCompoundWordTokenFilter
> [1] or DictionaryCompoundWordTokenFilter are a better choice. The 
> former is much more flexible but has its drawbacks.
>
> Regards,
> Markus
>
>
> https://lucene.apache.org/core/7_4_0/analyzers-common/org/apache/lucen
> e/analysis/compound/HyphenationCompoundWordTokenFilterFactory.html
>
>
>
> -Original message-
> > From:Clemens Wyss DEV 
> > Sent: Friday 3rd August 2018 12:22
> > To: solr-user@lucene.apache.org
> > Subject: indexing two words, searching single word
> >
> > Sounds like a rather simple issue:
> > if I index "sound stage" and search for "soundstage" I get no hits
> >
> > What am I doing wrong
> > a) when indexing
> > b) when searching
> > ?
> >
> > Thx in advance
> > - Clemens
> >
>


Re: indexing two words, searching single word

2018-08-03 Thread Alexandre Rafalovitch
But what is your generic problem then. Because you probably are not looking
for "andthe" kind of tokens.

However a shingle plus regex to remove whitespace can give you "anytwo
wordstogether smooshed" tokens in the index.

Regards,
 Alex


On Fri, Aug 3, 2018, 7:19 AM Clemens Wyss DEV,  wrote:

> Hi Markus,
> thanks for the quick answer.
>
> "sound stage" was just an example. We are looking for a generic solution
> ...
>
> Is it "ok" to apply an NGRamFilter for query-analyzing?
> 
> 
> 
>  maxGramSize="15" />
> 
>
> I guess (besides the performance impact) this reduces search results
> accuracy?
>
> -Clemens
>
> -Ursprüngliche Nachricht-
> Von: Markus Jelsma 
> Gesendet: Freitag, 3. August 2018 12:43
> An: solr-user@lucene.apache.org
> Betreff: RE: indexing two words, searching single word
>
> Hello,
>
> If your case is English you could use synonyms to work around the problem
> of the few compound words of the language. However, would you be dealing
> with a Germanic compound language, the HyphenationCompoundWordTokenFilter
> [1] or DictionaryCompoundWordTokenFilter are a better choice. The former is
> much more flexible but has its drawbacks.
>
> Regards,
> Markus
>
>
> https://lucene.apache.org/core/7_4_0/analyzers-common/org/apache/lucene/analysis/compound/HyphenationCompoundWordTokenFilterFactory.html
>
>
>
> -Original message-
> > From:Clemens Wyss DEV 
> > Sent: Friday 3rd August 2018 12:22
> > To: solr-user@lucene.apache.org
> > Subject: indexing two words, searching single word
> >
> > Sounds like a rather simple issue:
> > if I index "sound stage" and search for "soundstage" I get no hits
> >
> > What am I doing wrong
> > a) when indexing
> > b) when searching
> > ?
> >
> > Thx in advance
> > - Clemens
> >
>


AW: indexing two words, searching single word

2018-08-03 Thread Clemens Wyss DEV
Hi Markus,
thanks for the quick answer. 

"sound stage" was just an example. We are looking for a generic solution ...

Is it "ok" to apply an NGRamFilter for query-analyzing?






I guess (besides the performance impact) this reduces search results accuracy?

-Clemens

-Ursprüngliche Nachricht-
Von: Markus Jelsma  
Gesendet: Freitag, 3. August 2018 12:43
An: solr-user@lucene.apache.org
Betreff: RE: indexing two words, searching single word

Hello,

If your case is English you could use synonyms to work around the problem of 
the few compound words of the language. However, would you be dealing with a 
Germanic compound language, the HyphenationCompoundWordTokenFilter [1] or 
DictionaryCompoundWordTokenFilter are a better choice. The former is much more 
flexible but has its drawbacks.

Regards,
Markus

https://lucene.apache.org/core/7_4_0/analyzers-common/org/apache/lucene/analysis/compound/HyphenationCompoundWordTokenFilterFactory.html

 
 
-Original message-
> From:Clemens Wyss DEV 
> Sent: Friday 3rd August 2018 12:22
> To: solr-user@lucene.apache.org
> Subject: indexing two words, searching single word
> 
> Sounds like a rather simple issue:
> if I index "sound stage" and search for "soundstage" I get no hits
> 
> What am I doing wrong 
> a) when indexing
> b) when searching
> ?
> 
> Thx in advance
> - Clemens
> 


RE: indexing two words, searching single word

2018-08-03 Thread Markus Jelsma
Hello,

If your case is English you could use synonyms to work around the problem of 
the few compound words of the language. However, would you be dealing with a 
Germanic compound language, the HyphenationCompoundWordTokenFilter [1] or 
DictionaryCompoundWordTokenFilter are a better choice. The former is much more 
flexible but has its drawbacks.

Regards,
Markus

https://lucene.apache.org/core/7_4_0/analyzers-common/org/apache/lucene/analysis/compound/HyphenationCompoundWordTokenFilterFactory.html
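
For reference, a sketch of the decompounder in an index-time chain (the
hyphenation and dictionary file names are placeholders you would have to
supply):

<filter class="solr.HyphenationCompoundWordTokenFilterFactory"
        hyphenator="hyphenation-de.xml"
        dictionary="compound-dictionary.txt"
        minSubwordSize="3"
        onlyLongestMatch="true"/>

The filter keeps the original compound token and adds its parts, so a compound
like "soundstage" also becomes findable via "sound" and "stage".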

 
 
-Original message-
> From:Clemens Wyss DEV 
> Sent: Friday 3rd August 2018 12:22
> To: solr-user@lucene.apache.org
> Subject: indexing two words, searching single word
> 
> Sounds like a rather simple issue:
> if I index "sound stage" and search for "soundstage" I get no hits
> 
> What am I doing wrong 
> a) when indexing
> b) when searching
> ?
> 
> Thx in advance
> - Clemens
> 


indexing two words, searching single word

2018-08-03 Thread Clemens Wyss DEV
Sounds like a rather simple issue:
if I index "sound stage" and search for "soundstage" I get no hits

What am I doing wrong 
a) when indexing
b) when searching
?

Thx in advance
- Clemens


Re: Alias field names when searching (not for results)

2018-07-24 Thread Chris Hostetter


: >  defType=edismax q=sysadmin name:Mike qf=title text last_name
: > first_name
: 
: Aside: I'm curious about the use of "qf", here. Since I didn't want my
: users to have to specify any particular field to search, I created an
: "all" field and dumped everything into it. It seems like it would be
: better to change that so that I don't have an "all" field at all and
...
: Does that sound like a better approach than packing-together an "all"
: field during indexing?

well -- you may have other reasons why an "all" field is useful, but yeah 
-- when using dismax/edismax the "qf" param is really designed to let you 
search across many diff fields, and to associate query time weights with 
those fields.  see the docs i linked to earlier, but there's also a blog 
post on the scoring implications i wrote a lifetime ago...

https://lucidworks.com/2010/05/23/whats-a-dismax/

: > ...the examples above all show the request params, so "f.last.qf"
: > is a param name, "last_name" is the corresponding param value.
: 
: Awesome. I didn't realize that "f.alias.qf" was the name of the actual
: parameter to send. I was staring at the Solr Dashboard's selection of
: edismax parameters and not seeing anything that seemed correct. That's
: because it's a new parameter! Makes sense, now.

that syntax is an example of a "per field override" where in this case the 
"field" you are overriding doesn't *have* to be a "real" field in the 
index -- it can be an alias and for that alias (when used by your users) 
you are defining the qf to use.  it could in fact be a "real" field name, 
where you override what gets searched ("I'm not going to let them search 
directly against just the last_name, when they try i'm going to *actually* 
search against last_name and full_name" etc...)


-Hoss
http://www.lucidworks.com/


Re: Alias field names when searching (not for results)

2018-07-24 Thread Christopher Schultz

Chris,

On 7/24/18 1:40 PM, Chris Hostetter wrote:
> 
> : So if I want to alias the "first_name" field to "first" and the :
> "last_name" field to "last", then I would ... do what, exactly?
> 
> see the last example here...
> 
> https://lucene.apache.org/solr/guide/7_4/the-extended-dismax-query-par
ser.html#examples-of-edismax-queries
>
>  defType=edismax q=sysadmin name:Mike qf=title text last_name
> first_name

Aside: I'm curious about the use of "qf", here. Since I didn't want my
users to have to specify any particular field to search, I created an
"all" field and dumped everything into it. It seems like it would be
better to change that so that I don't have an "all" field at all and
instead I mention all of the fields I would normally have packed into
the "all" field in the "qf" parameter. That would reduce my index size
and also help with another question I had today (subject: Possible to
define a field so that substring-search is always used?).

Does that sound like a better approach than packing-together an "all"
field during indexing?

> f.name.qf=last_name first_name
> 
> the "f.name.qf" has created an "alias" so that when the "q"
> contains "name:Mike" it searches for "Mike" in both the last_name
> and first_name fields.  if it were "f.name.qf=last_name
> first_name^2" then there would be a boost on matches in the
> first_name field.
> 
> For your usecase you want something like...
> 
> defType=edismax q=sysadmin first:Mike last:Smith qf=title text
> last_name first_name f.first.qf=first_name f.last.qf=last_name
> 
> : I'm using SolrJ as the client.
> 
> ...the examples above all show the request params, so "f.last.qf"
> is a param name, "last_name" is the corrisponding param value.

Awesome. I didn't realize that "f.alias.qf" was the name of the actual
parameter to send. I was staring at the Solr Dashboard's selection of
edismax parameters and not seeing anything that seemed correct. That's
because it's a new parameter! Makes sense, now.

Thanks a bunch,
- -chris


Re: Alias field names when searching (not for results)

2018-07-24 Thread Chris Hostetter


: So if I want to alias the "first_name" field to "first" and the
: "last_name" field to "last", then I would ... do what, exactly?

see the last example here...

https://lucene.apache.org/solr/guide/7_4/the-extended-dismax-query-parser.html#examples-of-edismax-queries

defType=edismax
q=sysadmin name:Mike
qf=title text last_name first_name
f.name.qf=last_name first_name

the "f.name.qf" has created an "alias" so that when the "q" contains 
"name:Mike" it searches for "Mike" in both the last_name and first_name 
fields.  if it were "f.name.qf=last_name first_name^2" then there would be 
a boost on matches in the first_name field.

For your usecase you want something like...

defType=edismax
q=sysadmin first:Mike last:Smith
qf=title text last_name first_name
f.first.qf=first_name
f.last.qf=last_name

: I'm using SolrJ as the client.

...the examples above all show the request params, so "f.last.qf" is a 
param name, "last_name" is the corrisponding param value.



-Hoss
http://www.lucidworks.com/
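
For illustration, the same request expressed with SolrJ (collection name and
client construction are placeholders; a sketch, not a complete program):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

HttpSolrClient solrClient =
    new HttpSolrClient.Builder("http://localhost:8983/solr").build();
SolrQuery query = new SolrQuery("sysadmin first:Mike last:Smith");
query.set("defType", "edismax");
query.set("qf", "title text last_name first_name");
query.set("f.first.qf", "first_name");
query.set("f.last.qf", "last_name");
QueryResponse rsp = solrClient.query("people", query);

Each f.<alias>.qf parameter maps the alias used in q onto one or more real
fields, with optional per-field boosts.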

