Json facet 6.1 Filter based on command query's output

2016-11-15 Thread Jai
Hi

Is it possible to filter results based on the output of a command query in
JSON facet, Solr 6.1?

For example, in the query below I am converting duration_seconds to an
average time in minutes. I want to return only results where the time in
minutes is greater than 30.

json.facet:{files:{type:terms, field:clientName,
facet:{timeinMin:"avg(div(duration_seconds,60))"}}}
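As far as I know, the JSON Facet API in 6.1 has no documented way to filter buckets on a computed aggregate like this avg (you can sort by it), so one workaround is to post-filter the facet response client-side. A minimal Python sketch, using a made-up response shaped like what the query above would return:

```python
# Post-filter JSON facet buckets client-side: keep only clients whose
# average duration, converted to minutes, exceeds a threshold.
# The response dict below is hypothetical, shaped like Solr's JSON Facet output.

def filter_buckets(facet_response, min_minutes=30):
    """Return buckets whose computed timeinMin aggregate exceeds min_minutes."""
    buckets = facet_response["facets"]["files"]["buckets"]
    return [b for b in buckets if b.get("timeinMin", 0) > min_minutes]

response = {
    "facets": {
        "files": {
            "buckets": [
                {"val": "clientA", "count": 10, "timeinMin": 45.2},
                {"val": "clientB", "count": 7, "timeinMin": 12.5},
                {"val": "clientC", "count": 3, "timeinMin": 31.0},
            ]
        }
    }
}

kept = filter_buckets(response)
print([b["val"] for b in kept])  # -> ['clientA', 'clientC']
```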

Kindly help.

thanks and regards
Mrityunjay


Getting the following error while building Solr with ant

2016-11-15 Thread Midas A
io problem while parsing ivy file:
http://repo1.maven.org/maven2/org/apache/ant/ant/1.8.2/ant-1.8.2.pom:


Re: Measuring the entropy of a field

2016-11-15 Thread Joel Bernstein
You may be interested in:
https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/IGainTermsQParserPlugin.java

This iterates through a training set and scores terms in a text field using
Information Gain. You'll see entropy calculations in the implementation.

It was developed as part of this ticket:

https://issues.apache.org/jira/browse/SOLR-9252

This may not be your use case, but it can provide an example of how to plug
an algorithm into Solr.

Also if you can provide details about your use case perhaps we can add the
feature. We are looking to add more useful algorithms.
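For a feel of what the plugin computes, here is a toy Information Gain calculation in Python over a tiny hand-labeled set (this illustrates the formula only, not the plugin's actual code; sets of terms stand in for tokenized documents):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    if not labels:
        return 0.0
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(docs, labels, term):
    """Reduction in label entropy from knowing whether `term` occurs in a doc."""
    with_term = [l for d, l in zip(docs, labels) if term in d]
    without = [l for d, l in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) \
                + (len(without) / n) * entropy(without)
    return entropy(labels) - conditional

docs = [{"cheap", "pills"}, {"meeting", "notes"},
        {"cheap", "deal"}, {"project", "notes"}]
labels = ["spam", "ham", "spam", "ham"]

# "cheap" perfectly predicts the class in this toy set -> IG = 1.0 bit
print(information_gain(docs, labels, "cheap"))
```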



Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Nov 15, 2016 at 10:31 AM, Davis, Daniel (NIH/NLM) [C] <
daniel.da...@nih.gov> wrote:

> Does Lucene/Solr include any tools for measuring the entropy/information
> of a field?   My intuition is that this would only work if the field were a
> single-value field and the analysis identified characters rather than
> tokens. Also, Unicode does throw a wrench in it - I suppose such a
> thing would also need to have a set of expected symbols as by entropy I
> mean against ASCII or Latin-1.
>
> Just curious here - I have no problem to solve, and you guys are expert in
> this sort of thing solved in Java, so if there are other libraries or
> corners of OpenNLP that address this, let me know.  I know more at this
> point about tackling this stuff from Python.
>
> Dan Davis, Systems/Applications Architect (Contractor),
> Office of Computer and Communications Systems,
> National Library of Medicine, NIH
>
>


Re: book for Solr 3.4?

2016-11-15 Thread HelponR
Thank you. Just found one here https://wiki.apache.org/solr/SolrResources

"Apache Solr 3 Enterprise Search Server
 by David Smiley and Eric
Pugh. This is the 2nd edition of the first book, published by Packt.
Essential reading for developers, this book covers nearly every feature up
thru Solr 3.4. "


On Tue, Nov 15, 2016 at 2:15 PM, Deeksha Sharma 
wrote:

> BTW its Apache Solr 4 Cookbook
> 
> From: Deeksha Sharma 
> Sent: Tuesday, November 15, 2016 2:06 PM
> To: solr-user@lucene.apache.org
> Subject: Re: book for Solr 3.4?
>
> Apache solr cookbook will definitely help you get started. This is in
> addition to the Apache Solr official documentation.
>
>
> Thanks
> Deeksha
> 
> From: HelponR 
> Sent: Tuesday, November 15, 2016 2:03 PM
> To: solr-user@lucene.apache.org
> Subject: book for Solr 3.4?
>
> Hello!
>
> Is there a good book for Solr 3.4? The "Solr in Action" is for 4.4.
>
> googling did not help:(
>
> Thanks!
>


Re: book for Solr 3.4?

2016-11-15 Thread Deeksha Sharma
BTW it's Apache Solr 4 Cookbook

From: Deeksha Sharma 
Sent: Tuesday, November 15, 2016 2:06 PM
To: solr-user@lucene.apache.org
Subject: Re: book for Solr 3.4?

Apache solr cookbook will definitely help you get started. This is in addition 
to the Apache Solr official documentation.


Thanks
Deeksha

From: HelponR 
Sent: Tuesday, November 15, 2016 2:03 PM
To: solr-user@lucene.apache.org
Subject: book for Solr 3.4?

Hello!

Is there a good book for Solr 3.4? The "Solr in Action" is for 4.4.

googling did not help:(

Thanks!


Re: book for Solr 3.4?

2016-11-15 Thread Deeksha Sharma
Apache solr cookbook will definitely help you get started. This is in addition 
to the Apache Solr official documentation.


Thanks
Deeksha

From: HelponR 
Sent: Tuesday, November 15, 2016 2:03 PM
To: solr-user@lucene.apache.org
Subject: book for Solr 3.4?

Hello!

Is there a good book for Solr 3.4? The "Solr in Action" is for 4.4.

googling did not help:(

Thanks!


Re: empty strings outputting to numeric field types

2016-11-15 Thread Chris Hostetter

: fields storing dollar values as tdouble. they don't always exist in the
: outputted rows, however, at which point they throw an error and fail at
: indexing because the field is seen as an empty string (the log message: str
: = '').
: 
: for now i've gotten around this by skipping out of any output for that
: field in those cases, but wanted to know what the best method for

Strictly speaking, Solr isn't complaining because you gave it an "empty 
string"; it's complaining because you gave it a string which cannot be 
legally parsed as a double (or int, or float, etc...)

Fixing your client to only send Solr valid numeric values, or no value 
when that's what you want for a given document, is what I would consider 
the most correct solution -- but if you want Solr to ignore strings that 
aren't valid numeric values, that's what things like the 
RemoveBlankFieldUpdateProcessorFactory are for...

https://lucene.apache.org/solr/6_3_0/solr-core/org/apache/solr/update/processor/RemoveBlankFieldUpdateProcessorFactory.html

you can configure things like TrimFieldUpdateProcessorFactory and 
RegexReplaceProcessorFactory to pre-process string values to ignore 
whitespace or non decimal characters, etc... before they make it to the 
RemoveBlankFieldUpdateProcessorFactory.
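Put together, a chain along these lines would trim whitespace and then drop blank values before type parsing (a sketch, not config from this thread; the chain name is made up, and the linked javadocs describe the per-field selector options):

```xml
<!-- Sketch: trim whitespace, then remove values that are left blank,
     before the normal update processing runs. -->
<updateRequestProcessorChain name="ignore-blanks">
  <processor class="solr.TrimFieldUpdateProcessorFactory"/>
  <processor class="solr.RemoveBlankFieldUpdateProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain would then be selected via the update handler config or the update.chain request parameter.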



-Hoss
http://www.lucidworks.com/


book for Solr 3.4?

2016-11-15 Thread HelponR
Hello!

Is there a good book for Solr 3.4? The "Solr in Action" is for 4.4.

googling did not help:(

Thanks!


empty strings outputting to numeric field types

2016-11-15 Thread John Blythe
Hi all.

I'm outputting our data to XML format for Solr to consume. I have several
fields storing dollar values as tdouble. They don't always exist in the
outputted rows, however, at which point they throw an error and fail at
indexing because the field is seen as an empty string (the log message: str
= '').

For now I've gotten around this by skipping any output for that
field in those cases, but wanted to know what the best method for
circumventing this problem is in Solr (if any). I'd tried a default value
to no avail.

thanks for any thoughts-


Re: Issue with empty strings not being indexed/stored?

2016-11-15 Thread Chris Hostetter

You'll have to give us more details on what exactly you are doing to 
reproduce the problem you are seeing, and more details on how exactly 
you upgraded (and what version you upgraded from) ...

https://wiki.apache.org/solr/UsingMailingLists

When i launch 6.3.0 using "bin/solr -e techproducts" I can index a 
document with a blank string value, see the stored field in the result, 
and search on that blank value just fine (see below)

Wild guess: perhaps when you upgraded you also changed the configs you are 
using, and now have RemoveBlankFieldUpdateProcessorFactory in your default 
updateRequestProcessorChain ?


What i tried with 6.3.0 ...

$ bin/solr -e techproducts
$ curl -H "Content-Type: application/json" 
'http://localhost:8983/solr/techproducts/update?commit=true' --data-binary 
'[{"id":"HOSS","blank_s":""}]'
{"responseHeader":{"status":0,"QTime":48}}
$ curl 'http://localhost:8983/solr/techproducts/query?q=id:HOSS'
{
  "responseHeader":{
"status":0,
"QTime":5,
"params":{
  "q":"id:HOSS"}},
  "response":{"numFound":1,"start":0,"docs":[
  {
"id":"HOSS",
"blank_s":"",
"_version_":1551101798705528832}]
  }}
$ curl 'http://localhost:8983/solr/techproducts/query?q=blank_s:%22%22'
{
  "responseHeader":{
"status":0,
"QTime":1,
"params":{
  "q":"blank_s:\"\""}},
  "response":{"numFound":1,"start":0,"docs":[
  {
"id":"HOSS",
"blank_s":"",
"_version_":1551101798705528832}]
  }}







-Hoss
http://www.lucidworks.com/


Re: solr shutdown

2016-11-15 Thread Mark Miller
That is probably partly because of hdfs cache key unmapping. I think I
improved that in some issue at some point.

We really want to wait by default for a long time though - even 10 minutes
or more. If you have tons of SolrCores, each of them has to be torn down,
each of them might commit on close, custom code and resources can be used
and need to be released, and a lot of time can be spent legitimately. Given
these long shutdowns will normally be legitimate and not some hang, I think
we want to be willing to wait a long time. A user that finds this too long
can always kill the process themselves, or lower the wait. But if you lower
the wait, most of the time you will just pay the cost of a non-clean
shutdown, except in exceptional situations.

- Mark

On Fri, Oct 21, 2016 at 12:10 PM Joe Obernberger <
joseph.obernber...@gmail.com> wrote:

> Thanks Shawn - We've had to increase this to 300 seconds when using a
> large cache size with HDFS, and a fairly heavily loaded index routine (3
> million docs per day).  I don't know if that's why it takes a long time
> to shutdown, but it can take a while for solr cloud to shutdown
> gracefully.  If it does not, you end up with write.lock files for some
> (if not all) of the shards, and have to delete them manually before
> restarting.
>
> -Joe
>
>
> On 10/21/2016 9:01 AM, Shawn Heisey wrote:
> > On 10/21/2016 6:56 AM, Hendrik Haddorp wrote:
> >> I'm running solrcloud in foreground mode (-f). Does it make a
> >> difference for Solr if I stop it by pressing ctrl-c, sending it a
> >> SIGTERM or using "solr stop"?
> > All of those should produce the same result in the end -- Solr's
> > shutdown hook will be called and a graceful shutdown will commence.
> >
> > Note that in the case of the "bin/solr stop" command, the default is to
> > only wait five seconds for graceful shutdown before proceeding to a
> > forced kill, which for a typical install, means that forced kills become
> > the norm rather than the exception.  We have an issue to increase the
> > max timeout, but it hasn't been done yet.
> >
> > I strongly recommend anyone going into production should edit the script
> > to increase the timeout.  For the shell script I would do at least 60
> > seconds.  The Windows script just does a pause, not an intelligent wait,
> > so going that high probably isn't advisable on Windows.
> >
> > Thanks,
> > Shawn
> >
>
> --
- Mark
about.me/markrmiller


Re: autoAddReplicas:true not working

2016-11-15 Thread Mark Miller
Look at the Overseer host and see if there are any relevant logs for
autoAddReplicas.

- Mark

On Mon, Oct 24, 2016 at 3:01 PM Chetas Joshi  wrote:

> Hello,
>
> I have the following configuration for the Solr cloud and a Solr collection
> This is Solr on HDFS and Solr version I am using is 5.5.0
>
> No. of hosts: 52 (Solr Cloud)
>
> shard count:   50
> replicationFactor:   1
> MaxShardsPerNode: 1
> autoAddReplicas:   true
>
> Now, one of my shards is down. Although there are two hosts which are
> available in my cloud on which a new replica could be created, it just does
> not create a replica. All 52 hosts are healthy. What could be the reason
> for this?
>
> Thanks,
>
> Chetas.
>
-- 
- Mark
about.me/markrmiller


Re: ClassNotFoundException with Custom ZkACLProvider

2016-11-15 Thread Mark Miller
Could you file a JIRA issue so that this report does not get lost?

- Mark

On Tue, Nov 15, 2016 at 10:49 AM Solr User  wrote:

> For those interested, I ended up bundling the customized ACL provider with
> the solr.war.  I could not stomach looking at the stack trace in the logs.
>
> On Mon, Nov 7, 2016 at 4:47 PM, Solr User  wrote:
>
> > This is mostly just an FYI regarding future work on issues like
> SOLR-8792.
> >
> > I wanted admin update but world read on ZK since I do not have anything
> > sensitive from a read perspective in the Solr data and did not want to
> > force all SolrCloud clients to implement authentication just for read.
> So,
> > I extended DefaultZkACLProvider and implemented a replacement for
> > VMParamsAllAndReadonlyDigestZkACLProvider.
> >
> > My custom code is loaded from the sharedLib in solr.xml.  However, there
> > is a temporary ZK lookup to read solr.xml (and chroot) which is obviously
> > done before loading sharedLib.  Therefore, I am faced with a
> > ClassNotFoundException.  This has no negative effect on the ACL
> > functionality... just the annoying stack trace in the logs.  I do not
> want
> > to package this custom code with the Solr code and do not want to package
> > this along with Solr dependencies in the Jetty lib/ext.
> >
> > So, I am planning to live with the stack trace and just wanted to share
> > this for any future work on the dynamic solr.xml and chroot lookups or in
> > case I am missing some work-around.
> >
> > Thanks!
> >
> >
>
-- 
- Mark
about.me/markrmiller


Re: index and data directories

2016-11-15 Thread Erick Erickson
Oh, and to make matters even more "interesting", for
docValues=true fields there's no need to even store
anything, you can return the fields in the fl list that
are docValues=true, stored=false...

On Tue, Nov 15, 2016 at 1:53 AM, Prateek Jain J
 wrote:
>
> Thanks a lot Erick
>
>
> Regards,
> Prateek Jain
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: 14 November 2016 09:14 PM
> To: solr-user 
> Subject: Re: index and data directories
>
> Theoretically, perhaps. And it's quite true that stored data for fields 
> marked stored=true are just passed through verbatim and compressed on disk 
> while the data associated with indexed=true fields go through an analysis 
> chain and are stored in a much different format. However these different data 
> are simply stored in files with different suffixes in a segment. So you might 
> have _0.fdx, _0.fdt, _0.tim, _0.tvx etc. that together form a single segment.
>
> This is done on a per-segment basis. So certain segment files, namely the 
> *.fdt and *.fdx files, will contain the stored data while other extensions have 
> the indexed data, see: "File naming" here for a somewhat out of date format, 
> but close enough for this discussion:
> https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/codecs/lucene40/package-summary.html.
> And there's no option to store the *.fdt and *.fdx files independently from 
> the rest of the segment files.
>
> This statement: "I mean documents which are to be indexed" really doesn't 
> make sense. You send these things called Solr documents to be indexed, but 
> they are just a set of fields with values handled as their definitions 
> indicate (i.e. respecting stored=true|false, indexed=true false, 
> docValues=true|false. The Solr document sent by SolrJ is simply thrown away 
> after processing into segment files.
>
> If you're sending semi-structured docs (say Word, PDF etc) to be indexed 
> through Tika they are simply transformed into a Solr doc (set of field/value 
> pairs) and the original document is thrown away as well. There's no option to 
> store the original semi-structured doc either.
>
>
> Best,
> Erick
>
> On Mon, Nov 14, 2016 at 12:35 PM, Prateek Jain J 
>  wrote:
>>
>> By data, I mean documents which are to be indexed. Some fields can be 
>> stored="true" but that doesn’t matter.
>>
>> For example: App1 creates an object (AppObj) to be indexed and sends it to 
>> SOLR via solrj. Some of the attributes of this object can be declared to be 
>> used for storage.
>>
>> Now, my understanding is data and indexes generated on data are two separate 
>> things. In my particular example, all fields have stored="true" but only 
>> selected fields have indexed="true". My expectation is, indexes are stored 
>> separately from data because indexes can be generated by different 
>> techniques/algorithms but data/documents remain unchanged. Please correct me 
>> if my understanding is not correct.
>>
>>
>> Regards,
>> Prateek Jain
>>
>> -Original Message-
>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>> Sent: 14 November 2016 07:05 PM
>> To: solr-user 
>> Subject: Re: index and data directories
>>
>> The question is pretty opaque. What do you mean by "data" as opposed to 
>> "indexes"? Are you talking about where Lucene puts stored="true"
>> fields? If not, what do you mean by "data"?
>>
>> If you are talking about where Lucene puts the stored="true" bits then no, 
>> there's no way to segregate that out from the other files that make up a 
>> segment.
>>
>> Best,
>> Erick
>>
>> On Mon, Nov 14, 2016 at 7:58 AM, Prateek Jain J 
>>  wrote:
>>>
>>> Hi Alex,
>>>
>>>  I am unable to get it correctly. Is it possible to store indexes and data 
>>> separately?
>>>
>>>
>>> Regards,
>>> Prateek Jain
>>>
>>> -Original Message-
>>> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
>>> Sent: 14 November 2016 03:53 PM
>>> To: solr-user 
>>> Subject: Re: index and data directories
>>>
>>> solr.xml also has a bunch of properties under the core tag:
>>>
>>>   
>>> 
>>>   
>>> 
>>>   
>>>
>>> You can get the Reference Guide for your specific version here:
>>> http://archive.apache.org/dist/lucene/solr/ref-guide/
>>>
>>> Regards,
>>>Alex.
>>> 
>>> Solr Example reading group is starting November 2016, join us at 
>>> http://j.mp/SolrERG Newsletter and resources for Solr beginners and 
>>> intermediates:
>>> http://www.solr-start.com/
>>>
>>>
>>> On 15 November 2016 at 02:37, Prateek Jain J  
>>> wrote:

 Hi All,

 We are using solr 4.8.1 and would like to know if it is possible to
 store data and indexes in separate directories? I know the following tag
 exists in the solrconfig.xml file:

 
 

Re: 5.5.3: fieldValueCache auto-warming error

2016-11-15 Thread Erick Erickson
Thanks for letting us know. I raised a JIRA but I won't have time to
work on it in the foreseeable future.

Erick

On Tue, Nov 15, 2016 at 6:24 AM, Bram Van Dam  wrote:
> On 11/11/16 18:08, Bram Van Dam wrote:
>> On 10/11/16 17:10, Erick Erickson wrote:
>>> Just facet on the text field yourself ;)
>
> Quick update: you were right. One of the users managed to find a bug in
> our application which enabled them to facet on the text field. It would
> still be nice if Solr wouldn't try to keep caching a broken query (or
> an impossible facet field), but we can work around the issue by fixing
> our own bug.
>
> Thanks!
>
>  - Bram
>


Issue with empty strings not being indexed/stored?

2016-11-15 Thread Michael Joyner

Hello all,

We've been indexing documents with empty strings for some fields.

After our latest round of Solr/SolrJ updates to 6.3.0 we have discovered 
that fields with empty strings are no longer being stored, effectively 
storing documents with those fields as being NULL/NOT-PRESENT instead of 
EMPTY. (Most definitely not the same thing!)


We are using SolrInputDocuments.

Documents indexed before our latest round of updates have the fields 
with empty strings just fine; new documents indexed since the updates don't.


Example field that is in the input document that isn't showing up as 
populated in the query results:


"mesh_s" : {
"boost" : 1.0,
"firstValue" : "",
"name" : "mesh_s",
"value" : "",
"valueCount" : 1,
"values" : [ "" ]
  }

-Mike




Re: ClassNotFoundException with Custom ZkACLProvider

2016-11-15 Thread Solr User
For those interested, I ended up bundling the customized ACL provider with
the solr.war.  I could not stomach looking at the stack trace in the logs.

On Mon, Nov 7, 2016 at 4:47 PM, Solr User  wrote:

> This is mostly just an FYI regarding future work on issues like SOLR-8792.
>
> I wanted admin update but world read on ZK since I do not have anything
> sensitive from a read perspective in the Solr data and did not want to
> force all SolrCloud clients to implement authentication just for read.  So,
> I extended DefaultZkACLProvider and implemented a replacement for
> VMParamsAllAndReadonlyDigestZkACLProvider.
>
> My custom code is loaded from the sharedLib in solr.xml.  However, there
> is a temporary ZK lookup to read solr.xml (and chroot) which is obviously
> done before loading sharedLib.  Therefore, I am faced with a
> ClassNotFoundException.  This has no negative effect on the ACL
> functionality... just the annoying stack trace in the logs.  I do not want
> to package this custom code with the Solr code and do not want to package
> this along with Solr dependencies in the Jetty lib/ext.
>
> So, I am planning to live with the stack trace and just wanted to share
> this for any future work on the dynamic solr.xml and chroot lookups or in
> case I am missing some work-around.
>
> Thanks!
>
>


RE: Multi word synonyms

2016-11-15 Thread Davis, Daniel (NIH/NLM) [C]
Midas,

Apparently I didn't read carefully enough: Ted Sullivan's AutoPhrasingTokenFilter 
takes a configuration file, "autophrases.txt", and only recognizes phrases that 
are in that file. Because of this, it doesn't seem directly applicable to your 
problem of multi-word synonym matching at query time - it won't know what terms 
to clump. Here's Ted Sullivan's earlier post on the token filter - 
https://lucidworks.com/blog/2014/07/02/automatic-phrase-tokenization-improving-lucene-search-precision-by-more-precise-linguistic-analysis/

I would therefore ask your users or their representative about the priority of 
this feature/requirement.

Going on, I think what you could do is to use an NLP toolkit such as OpenNLP, 
StanfordNLP (both Java) or python NLTK to identify noun phrases in your 
text/corpus, and then use those to build autophrases.txt.   You wouldn't need 
to use all of your corpus to get somewhat good accuracy because new noun 
phrases will be rare at some point.   You may need to play with which phrases 
to include, e.g. the size of autophrases.txt depending on how 
AutoPhrasingTokenFilter is implemented and the rate of indexing you need to 
maintain. Depending on your experience, you can do this even if you are new to 
Solr, as you've mentioned.

-Original Message-
From: Davis, Daniel (NIH/NLM) [C] 
Sent: Tuesday, November 15, 2016 10:22 AM
To: solr-user@lucene.apache.org
Subject: RE: Multi word synonyms

I'm not as expert as some on this list, but reading the article suggested, 
https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/,
 what you do is this:

- Have one field that takes text as normal
- Copy that field to another field, whose field type uses the 
AutoPhrasingTokenFilter
- Configure your result handler to query against both fields

You don't know the list of synonyms at query time, but now you have another 
field that contains phrases, not words, and so you can indeed use synonym 
matching at query time against this secondary field.   You can even use the 
multi-word phrases in the copied field to suggest to admin users a list of 
candidate synonyms.
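The underlying trick can be sketched outside Solr: rewrite each known multi-word phrase into a single joined token, so ordinary single-token synonym matching applies afterwards. A toy Python illustration of that clumping step (not the filter's actual implementation; the phrase set stands in for autophrases.txt):

```python
def autophrase(tokens, phrases):
    """Greedily join known multi-word phrases into single underscore tokens.

    `phrases` is a set of tuples of tokens, longest match wins.
    """
    out, i = [], 0
    while i < len(tokens):
        for length in range(len(tokens) - i, 1, -1):  # try longest match first
            candidate = tuple(tokens[i:i + length])
            if candidate in phrases:
                out.append("_".join(candidate))
                i += length
                break
        else:  # no phrase starts here; emit the single token unchanged
            out.append(tokens[i])
            i += 1
    return out

phrases = {("seat", "belts"), ("air", "bag")}
print(autophrase("check the seat belts and air bag".split(), phrases))
# -> ['check', 'the', 'seat_belts', 'and', 'air_bag']
```

Both index-time and query-time analysis would have to apply the same clumping so the joined tokens line up.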

-Original Message-
From: Midas A [mailto:test.mi...@gmail.com]
Sent: Tuesday, November 15, 2016 7:38 AM
To: solr-user@lucene.apache.org
Subject: Re: Multi word synonyms

I am new to Solr. How should I solve this problem?

Can we do something at query time?

On Tue, Nov 15, 2016 at 5:35 PM, Vincenzo D'Amore 
wrote:

> Hi Michael,
>
> an update, reading the article I double checked if at least one of the 
> issues were fixed.
> The good news is that
> https://issues.apache.org/jira/browse/LUCENE-2605
> has
> been closed and is available in 6.2.
>
> On Tue, Nov 15, 2016 at 12:32 PM, Michael Kuhlmann  wrote:
>
> > This is a nice reading though, but that solution depends on the 
> > precondition that you'll already know your synonyms at index time.
> >
> > While having synonyms in the index is mostly the better solution 
> > anyway, it's sometimes not feasible.
> >
> > -Michael
> >
> > Am 15.11.2016 um 12:14 schrieb Vincenzo D'Amore:
> > > Hi Midas,
> > >
> > > I suggest this interesting reading:
> > >
> > > https://lucidworks.com/blog/2014/07/12/solution-for-multi-
> > term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
> > >
> > >
> > >
> > > On Tue, Nov 15, 2016 at 11:00 AM, Michael Kuhlmann 
> > > 
> > wrote:
> > >
> > >> It's not working out of the box, sorry.
> > >>
> > >> We're using this plugin:
> > >> https://github.com/healthonnet/hon-lucene-synonyms#getting-starte
> > >> d
> > >>
> > >> It's working nicely, but can lead to OOME when you add many 
> > >> synonyms with multiple terms. And I'm not sure whether it's still 
> > >> working with Solr 6.0.
> > >>
> > >> -Michael
> > >>
> > >> Am 15.11.2016 um 10:29 schrieb Midas A:
> > >>> - i have to  use multi word synonyms at query time .
> > >>>
> > >>> Please suggest how can i do it .
> > >>> and let me know it whether it would be visible in debug query or 
> > >>> not
> .
> > >>>
> > >>
> > >
> >
> >
>
>
> --
> Vincenzo D'Amore
> email: v.dam...@gmail.com
> skype: free.dev
> mobile: +39 349 8513251
>


RE: Multi word synonyms

2016-11-15 Thread Davis, Daniel (NIH/NLM) [C]
I'm not as expert as some on this list, but reading the article suggested, 
https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/,
 what you do is this:

- Have one field that takes text as normal
- Copy that field to another field, whose field type uses the 
AutoPhrasingTokenFilter
- Configure your result handler to query against both fields

You don't know the list of synonyms at query time, but now you have another 
field that contains phrases, not words, and so you can indeed use synonym 
matching at query time against this secondary field.   You can even use the 
multi-word phrases in the copied field to suggest to admin users a list of 
candidate synonyms.

-Original Message-
From: Midas A [mailto:test.mi...@gmail.com] 
Sent: Tuesday, November 15, 2016 7:38 AM
To: solr-user@lucene.apache.org
Subject: Re: Multi word synonyms

I am new to Solr. How should I solve this problem?

Can we do something at query time?

On Tue, Nov 15, 2016 at 5:35 PM, Vincenzo D'Amore 
wrote:

> Hi Michael,
>
> an update, reading the article I double checked if at least one of the 
> issues were fixed.
> The good news is that 
> https://issues.apache.org/jira/browse/LUCENE-2605
> has
> been closed and is available in 6.2.
>
> On Tue, Nov 15, 2016 at 12:32 PM, Michael Kuhlmann  wrote:
>
> > This is a nice reading though, but that solution depends on the 
> > precondition that you'll already know your synonyms at index time.
> >
> > While having synonyms in the index is mostly the better solution 
> > anyway, it's sometimes not feasible.
> >
> > -Michael
> >
> > Am 15.11.2016 um 12:14 schrieb Vincenzo D'Amore:
> > > Hi Midas,
> > >
> > > I suggest this interesting reading:
> > >
> > > https://lucidworks.com/blog/2014/07/12/solution-for-multi-
> > term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
> > >
> > >
> > >
> > > On Tue, Nov 15, 2016 at 11:00 AM, Michael Kuhlmann 
> > > 
> > wrote:
> > >
> > >> It's not working out of the box, sorry.
> > >>
> > >> We're using this plugin:
> > >> https://github.com/healthonnet/hon-lucene-synonyms#getting-starte
> > >> d
> > >>
> > >> It's working nicely, but can lead to OOME when you add many 
> > >> synonyms with multiple terms. And I'm not sure whether it's still 
> > >> working with Solr 6.0.
> > >>
> > >> -Michael
> > >>
> > >> Am 15.11.2016 um 10:29 schrieb Midas A:
> > >>> - i have to  use multi word synonyms at query time .
> > >>>
> > >>> Please suggest how can i do it .
> > >>> and let me know it whether it would be visible in debug query or 
> > >>> not
> .
> > >>>
> > >>
> > >
> >
> >
>
>
> --
> Vincenzo D'Amore
> email: v.dam...@gmail.com
> skype: free.dev
> mobile: +39 349 8513251
>


Measuring the entropy of a field

2016-11-15 Thread Davis, Daniel (NIH/NLM) [C]
Does Lucene/Solr include any tools for measuring the entropy/information of a 
field?   My intuition is that this would only work if the field were a 
single-value field and the analysis identified characters rather than tokens.   
 Also, Unicode does throw a wrench in it - I suppose such a thing would also 
need to have a set of expected symbols as by entropy I mean against ASCII or 
Latin-1.

Just curious here - I have no problem to solve, and you guys are expert in this 
sort of thing solved in Java, so if there are other libraries or corners of 
OpenNLP that address this, let me know.  I know more at this point about 
tackling this stuff from Python.
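For what it's worth, the character-level measurement described here is a few lines of Python (plain Shannon entropy in bits over the observed character distribution; fixing an expected alphabet such as ASCII only changes the maximum possible value, log2 of the alphabet size, not the formula):

```python
import math
from collections import Counter

def char_entropy(value):
    """Shannon entropy (bits per character) of a single field value."""
    counts = Counter(value)
    n = len(value)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(char_entropy("aaaa"))  # 0.0 -- one symbol, no information
print(char_entropy("abab"))  # 1.0 -- two equally likely symbols
print(char_entropy("abcd"))  # 2.0 -- four equally likely symbols
```

Running this across a field's stored values would give a rough per-field information estimate outside Solr.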

Dan Davis, Systems/Applications Architect (Contractor),
Office of Computer and Communications Systems,
National Library of Medicine, NIH



Re: 5.5.3: fieldValueCache auto-warming error

2016-11-15 Thread Bram Van Dam
On 11/11/16 18:08, Bram Van Dam wrote:
> On 10/11/16 17:10, Erick Erickson wrote:
>> Just facet on the text field yourself ;)

Quick update: you were right. One of the users managed to find a bug in
our application which enabled them to facet on the text field. It would
still be nice if Solr wouldn't try to keep caching a broken query (or
an impossible facet field), but we can work around the issue by fixing
our own bug.

Thanks!

 - Bram



Re: DIH problem with multiple (types of) resources

2016-11-15 Thread Peter Blokland
hi,

On Tue, Nov 15, 2016 at 02:54:49AM +1100, Alexandre Rafalovitch wrote:

>> 
>> 
 
> Attribute names are case sensitive as far as I remember. Try
> 'dataSource' for the second definition.

oh wow... that's sneaky. in the old version the case didn't seem to matter,
but now it certainly does. thx :)

-- 
CUL8R, Peter.

www.desk.nl

Your excuse is: It is a layer 8 problem


Re: how to tell SolrHttpServer client to accept/ignore all certs?

2016-11-15 Thread Shawn Heisey
On 11/14/2016 2:44 PM, Robert Hume wrote:
> I'm using HttpSolrServer (in Solr 3.6) to connect to a Solr web
> service and perform a query.

That's quite old, and if you do find a bug, it won't be fixed in that
version.  If your server is running at least version 3.6 and has configs
that originated with a 3.6 or later example, then you should consider
upgrading to HttpSolrClient in SolrJ 6.x; the server should work properly
with SolrJ 6.x.  If its configurations originated with earlier 1.x or 3.x
versions, then it might not work very well with anything newer without
changes on the server side.

> The certificate at the other end has expired and so connections now
> fail. It will take the IT at the other end too many days to replace
> the cert (this is out of my control). How can I tell the
> HttpSolrServer to ignore bad certs when it does queries to the server?
> NOTE 1: I noticed that I can pass my own Apache HttpClient (we're
> currently using 4.3) into the HttpSolrServer constructor, but
> internally HttpSolrServer seems to do a lot of customizing/configuring
> it's own default HttpClient, so I didn't want to mess with that. 

HttpSolrServer and HttpSolrClient do create their own HttpClient if it's
not passed in, but it's pretty much created with defaults, nothing is
really customized.  That would be the correct way to have the Solr
client ignore certificate validation -- create a custom HttpClient that
does what you need and use it to build your Solr client.  If it's
configured to handle enough simultaneous connections, you can even share
one HttpClient between multiple Solr clients.

Thanks,
Shawn



Re: Multi word synonyms

2016-11-15 Thread Michael Kuhlmann
Wow, that's great news! I didn't notice that.

Am 15.11.2016 um 13:05 schrieb Vincenzo D'Amore:
> Hi Michael,
>
> an update, reading the article I double checked if at least one of the
> issues were fixed.
> The good news is that https://issues.apache.org/jira/browse/LUCENE-2605 has
> been closed and is available in 6.2.
>
> On Tue, Nov 15, 2016 at 12:32 PM, Michael Kuhlmann  wrote:
>
>> This is a nice reading though, but that solution depends on the
>> precondition that you'll already know your synonyms at index time.
>>
>> While having synonyms in the index is mostly the better solution anyway,
>> it's sometimes not feasible.
>>
>> -Michael
>>
>> Am 15.11.2016 um 12:14 schrieb Vincenzo D'Amore:
>>> Hi Midas,
>>>
>>> I suggest this interesting reading:
>>>
>>> https://lucidworks.com/blog/2014/07/12/solution-for-multi-
>> term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
>>>
>>>
>>> On Tue, Nov 15, 2016 at 11:00 AM, Michael Kuhlmann 
>> wrote:
 It's not working out of the box, sorry.

 We're using this plugin:
 https://github.com/healthonnet/hon-lucene-synonyms#getting-started

 It's working nicely, but can lead to OOME when you add many synonyms
 with multiple terms. And I'm not sure whether it's still working with
 Solr 6.0.

 -Michael

 Am 15.11.2016 um 10:29 schrieb Midas A:
> - i have to  use multi word synonyms at query time .
>
> Please suggest how can i do it .
> and let me know it whether it would be visible in debug query or not .
>
>>
>



Re: Multi word synonyms

2016-11-15 Thread Midas A
I am new to Solr. How should I solve this problem?

Can we do something at query time?

On Tue, Nov 15, 2016 at 5:35 PM, Vincenzo D'Amore 
wrote:

> Hi Michael,
>
> an update, reading the article I double checked if at least one of the
> issues were fixed.
> The good news is that https://issues.apache.org/jira/browse/LUCENE-2605
> has
> been closed and is available in 6.2.
>
> On Tue, Nov 15, 2016 at 12:32 PM, Michael Kuhlmann  wrote:
>
> > This is a nice reading though, but that solution depends on the
> > precondition that you'll already know your synonyms at index time.
> >
> > While having synonyms in the index is mostly the better solution anyway,
> > it's sometimes not feasible.
> >
> > -Michael
> >
> > Am 15.11.2016 um 12:14 schrieb Vincenzo D'Amore:
> > > Hi Midas,
> > >
> > > I suggest this interesting reading:
> > >
> > > https://lucidworks.com/blog/2014/07/12/solution-for-multi-
> > term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
> > >
> > >
> > >
> > > On Tue, Nov 15, 2016 at 11:00 AM, Michael Kuhlmann 
> > wrote:
> > >
> > >> It's not working out of the box, sorry.
> > >>
> > >> We're using this plugin:
> > >> https://github.com/healthonnet/hon-lucene-synonyms#getting-started
> > >>
> > >> It's working nicely, but can lead to OOME when you add many synonyms
> > >> with multiple terms. And I'm not sure whether it's still working with
> > >> Solr 6.0.
> > >>
> > >> -Michael
> > >>
> > >> Am 15.11.2016 um 10:29 schrieb Midas A:
> > >>> - i have to  use multi word synonyms at query time .
> > >>>
> > >>> Please suggest how can i do it .
> > >>> and let me know it whether it would be visible in debug query or not
> .
> > >>>
> > >>
> > >
> >
> >
>
>
> --
> Vincenzo D'Amore
> email: v.dam...@gmail.com
> skype: free.dev
> mobile: +39 349 8513251
>


Re: Multi word synonyms

2016-11-15 Thread Vincenzo D'Amore
Hi Michael,

an update, reading the article I double checked if at least one of the
issues were fixed.
The good news is that https://issues.apache.org/jira/browse/LUCENE-2605 has
been closed and is available in 6.2.

On Tue, Nov 15, 2016 at 12:32 PM, Michael Kuhlmann  wrote:

> This is a nice reading though, but that solution depends on the
> precondition that you'll already know your synonyms at index time.
>
> While having synonyms in the index is mostly the better solution anyway,
> it's sometimes not feasible.
>
> -Michael
>
> Am 15.11.2016 um 12:14 schrieb Vincenzo D'Amore:
> > Hi Midas,
> >
> > I suggest this interesting reading:
> >
> > https://lucidworks.com/blog/2014/07/12/solution-for-multi-
> term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
> >
> >
> >
> > On Tue, Nov 15, 2016 at 11:00 AM, Michael Kuhlmann 
> wrote:
> >
> >> It's not working out of the box, sorry.
> >>
> >> We're using this plugin:
> >> https://github.com/healthonnet/hon-lucene-synonyms#getting-started
> >>
> >> It's working nicely, but can lead to OOME when you add many synonyms
> >> with multiple terms. And I'm not sure whether it's still working with
> >> Solr 6.0.
> >>
> >> -Michael
> >>
> >> Am 15.11.2016 um 10:29 schrieb Midas A:
> >>> - i have to  use multi word synonyms at query time .
> >>>
> >>> Please suggest how can i do it .
> >>> and let me know it whether it would be visible in debug query or not .
> >>>
> >>
> >
>
>


-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251


Re: Multi word synonyms

2016-11-15 Thread Michael Kuhlmann
This is a nice reading though, but that solution depends on the
precondition that you'll already know your synonyms at index time.

While having synonyms in the index is mostly the better solution anyway,
it's sometimes not feasible.

-Michael

Am 15.11.2016 um 12:14 schrieb Vincenzo D'Amore:
> Hi Midas,
>
> I suggest this interesting reading:
>
> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
>
>
>
> On Tue, Nov 15, 2016 at 11:00 AM, Michael Kuhlmann  wrote:
>
>> It's not working out of the box, sorry.
>>
>> We're using this plugin:
>> https://github.com/healthonnet/hon-lucene-synonyms#getting-started
>>
>> It's working nicely, but can lead to OOME when you add many synonyms
>> with multiple terms. And I'm not sure whether it's still working with
>> Solr 6.0.
>>
>> -Michael
>>
>> Am 15.11.2016 um 10:29 schrieb Midas A:
>>> - i have to  use multi word synonyms at query time .
>>>
>>> Please suggest how can i do it .
>>> and let me know it whether it would be visible in debug query or not .
>>>
>>
>



Re: Multi word synonyms

2016-11-15 Thread Vincenzo D'Amore
Hi Midas,

I suggest this interesting reading:

https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/



On Tue, Nov 15, 2016 at 11:00 AM, Michael Kuhlmann  wrote:

> It's not working out of the box, sorry.
>
> We're using this plugin:
> https://github.com/healthonnet/hon-lucene-synonyms#getting-started
>
> It's working nicely, but can lead to OOME when you add many synonyms
> with multiple terms. And I'm not sure whether it's still working with
> Solr 6.0.
>
> -Michael
>
> Am 15.11.2016 um 10:29 schrieb Midas A:
> > - i have to  use multi word synonyms at query time .
> >
> > Please suggest how can i do it .
> > and let me know it whether it would be visible in debug query or not .
> >
>
>


-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251


Re: Filtering a field when some of the documents don't have the value

2016-11-15 Thread Gintautas Sulskus
Thanks Erick, it works exactly as required!

Gintas

On Mon, Nov 14, 2016 at 7:02 PM, Erick Erickson 
wrote:

> You want something like:
> q=name:X&fq=population:[10 TO *] OR (*:* -population:*)
>
> Best,
> Erick
>
> On Mon, Nov 14, 2016 at 10:29 AM, Gintautas Sulskus
>  wrote:
> > Hi,
> >
> > I have an index with two fields "name" and "population". Some of the
> > documents have the "population" field empty.
> >
> > I would like to search for a value X in field "name" with the following
> > condition:
> > 1. if the field is empty - return results for
> > name:X
> > 2. else set the minimum value for the "population" field to 10:
> >  name:X AND population: [10 TO *]
> > The population field should not influence the score.
> >
> > Could you please help me out with the query construction?
> > I have tried conditional statements with exists(), but it seems it does
> not
> > suit the case.
> >
> > Thanks,
> > Gin
>
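For readers landing on this thread: the filter Erick suggests amounts to a request along these lines (collection and field names are just the examples from the question):

```
q=name:X
fq=population:[10 TO *] OR (*:* -population:*)
```

The `*:* -population:*` clause matches documents that have no value in the population field at all, so they pass the filter unchanged, while `population:[10 TO *]` enforces the minimum for documents that do have one. Because the condition is an fq rather than part of q, it does not influence the score.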


Re: Multi word synonyms

2016-11-15 Thread Michael Kuhlmann
It's not working out of the box, sorry.

We're using this plugin:
https://github.com/healthonnet/hon-lucene-synonyms#getting-started

It's working nicely, but can lead to OOME when you add many synonyms
with multiple terms. And I'm not sure whether it's still working with
Solr 6.0.

-Michael

Am 15.11.2016 um 10:29 schrieb Midas A:
> - i have to  use multi word synonyms at query time .
>
> Please suggest how can i do it .
> and let me know it whether it would be visible in debug query or not .
>



RE: index and data directories

2016-11-15 Thread Prateek Jain J

Thanks a lot Erick


Regards,
Prateek Jain

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: 14 November 2016 09:14 PM
To: solr-user 
Subject: Re: index and data directories

Theoretically, perhaps. And it's quite true that stored data for fields marked 
stored=true are just passed through verbatim and compressed on disk while the 
data associated with indexed=true fields go through an analysis chain and are 
stored in a much different format. However these different data are simply 
stored in files with different suffixes in a segment. So you might have _0.fdx, 
_0.fdt, _0.tim, _0.tvx etc. that together form a single segment.

This is done on a per-segment basis. So certain segment files, namely the *.fdt 
and *.fdx file will contain the stored data while other extensions have the 
indexed data, see: "File naming" here for a somewhat out of date format, but 
close enough for this discussion:
https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/codecs/lucene40/package-summary.html.
And there's no option to store the *.fdt and *.fdx files independently from the 
rest of the segment files.

This statement: "I mean documents which are to be indexed" really doesn't make 
sense. You send these things called Solr documents to be indexed, but they are 
just a set of fields with values handled as their definitions indicate (i.e. 
respecting stored=true|false, indexed=true|false, docValues=true|false). The 
Solr document sent by SolrJ is simply thrown away after processing into segment 
files.

If you're sending semi-structured docs (say Word, PDF etc) to be indexed 
through Tika they are simply transformed into a Solr doc (set of field/value 
pairs) and the original document is thrown away as well. There's no option to 
store the original semi-structured doc either.


Best,
Erick

On Mon, Nov 14, 2016 at 12:35 PM, Prateek Jain J  
wrote:
>
> By data, I mean documents which are to be indexed. Some fields can be 
> stored="true" but that doesn’t matter.
>
> For example: App1 creates an object (AppObj) to be indexed and sends it to 
> SOLR via solrj. Some of the attributes of this object can be declared to be 
> used for storage.
>
> Now, my understanding is data and indexes generated on data are two separate 
> things. In my particular example, all fields have stored="true" but only 
> selected fields have indexed="true". My expectation is, indexes are stored 
> separately from data because indexes can be generated by different 
> techniques/algorithms but data/documents remain unchanged. Please correct me 
> if my understanding is not correct.
>
>
> Regards,
> Prateek Jain
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: 14 November 2016 07:05 PM
> To: solr-user 
> Subject: Re: index and data directories
>
> The question is pretty opaque. What do you mean by "data" as opposed to 
> "indexes"? Are you talking about where Lucene puts stored="true"
> fields? If not, what do you mean by "data"?
>
> If you are talking about where Lucene puts the stored="true" bits the no, 
> there's no way to segregate that our from the other files that make up a 
> segment.
>
> Best,
> Erick
>
> On Mon, Nov 14, 2016 at 7:58 AM, Prateek Jain J  
> wrote:
>>
>> Hi Alex,
>>
>>  I am unable to get it correctly. Is it possible to store indexes and data 
>> separately?
>>
>>
>> Regards,
>> Prateek Jain
>>
>> -Original Message-
>> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
>> Sent: 14 November 2016 03:53 PM
>> To: solr-user 
>> Subject: Re: index and data directories
>>
solr.xml also has a bunch of properties under the core tag:

  <cores>
    <core name="core1" instanceDir="core1" dataDir="/path/to/data"/>
  </cores>
>>
>> You can get the Reference Guide for your specific version here:
>> http://archive.apache.org/dist/lucene/solr/ref-guide/
>>
>> Regards,
>>Alex.
>> 
>> Solr Example reading group is starting November 2016, join us at 
>> http://j.mp/SolrERG Newsletter and resources for Solr beginners and 
>> intermediates:
>> http://www.solr-start.com/
>>
>>
>> On 15 November 2016 at 02:37, Prateek Jain J  
>> wrote:
>>>
>>> Hi All,
>>>
>>> We are using solr 4.8.1 and would like to know if it is possible to 
>>> store data and indexes in separate directories? I know following tag 
>>> exist in solrconfig.xml file
>>>
>>> <dataDir>C:/del-it/solr/cm_events_nbi/data</dataDir>
>>>
>>>
>>>
>>> Regards,
>>> Prateek Jain


Multi word synonyms

2016-11-15 Thread Midas A
- I have to use multi word synonyms at query time.

Please suggest how I can do it,
and let me know whether it would be visible in the debug query or not.
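For readers landing on this thread: the index-time approach recommended in the replies (and in the Lucidworks article linked there) boils down to a field type along these lines — a sketch only; the field type name, filter list, and synonym entries are illustrative, not taken from any specific schema:

```xml
<!-- synonyms.txt, one multi-word mapping per line, e.g.:
     sea biscuit, seabiscuit -->
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Multi-word synonyms are safe at index time because the filter
         sees the whole token stream, not a whitespace-split query. -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The trade-off discussed in the thread: this requires knowing the synonyms before indexing, and changing synonyms.txt means reindexing; query-time multi-word synonyms need a plugin such as hon-lucene-synonyms on releases before the LUCENE-2605 fix.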


Re: Solr shards: very sensitive to swap space usage !?

2016-11-15 Thread Toke Eskildsen
On Mon, 2016-11-14 at 16:29 -0800, Chetas Joshi wrote:
> Hi Toke, can you explain exactly what you mean by "the aggressive IO
> for the memory mapping caused the kernel to start swapping parts of
> the JVM heap to get better caching of storage data"?

I am not sure what you are asking for. I'll try adding more details:


Our machine(s) which ran into the swap problem had 256GB of physical
memory, with some 50GB+ free for caching, but handled multi terabytes
of index. So the free memory for memory mapping (aka disk cache) was
around 1% of the index size. With 25 active shards on the machine, each
search request resulted in a lot of IO to map memory from index data to
physical memory.

Solr JVMs on the machine did not do a lot of garbage collection. Partly
because of low query rate, partly because of some internal hacks.

So we had a machine with very heavy memory mapping and not-too-active
JVM heaps.

The principle behind swap is to store infrequently used memory on
slower storage. My guess is that the kernel decided that freeing more
memory for mapping, by pushing relatively stale parts of the JVM heaps
onto swap, would result in overall better performance.

- Toke Eskildsen, State and University Library, Denmark