RE: Facet ignoring repeated word

2016-05-01 Thread G, Rajesh
Hi Erick/ Ahmet,

Thanks for your suggestion. Can we add a query to a TermsComponent request? I
need the word counts in comments for one question id, not for all documents.
When I include the query q=questionid=123, I still see counts for everything:

http://localhost:8182/solr/dev/terms?terms.fl=comments&terms=true&terms.limit=1000&q=questionid=123

StatsComponent does not support text fields; it returns:

Field type
textcloud_en{class=org.apache.solr.schema.TextField,analyzer=org.apache.solr.analysis.TokenizerChain,args={positionIncrementGap=100, class=solr.TextField}} is not currently supported
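For context, TermsComponent enumerates raw index terms and ignores the q parameter entirely, which would explain why the questionid query above changes nothing. A hedged alternative, sketched below with Python's stdlib (endpoint, core name and field names copied from the question, not verified against this schema), is a facet request restricted by a filter query, since facet counts do honour q and fq:

```python
from urllib.parse import urlencode

# Field and filter names are taken from the question above; adjust as needed.
params = {
    "q": "*:*",
    "fq": "questionid:123",   # facet counts are restricted by this filter
    "rows": 0,                # only the facet counts are wanted
    "facet": "true",
    "facet.field": "comments",
    "facet.limit": 1000,
}
url = "http://localhost:8182/solr/dev/select?" + urlencode(params)
print(url)
```

The trade-off discussed elsewhere in this thread still applies: facets count documents, not occurrences.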

Thanks
Rajesh



CEB India Private Limited. Registration No: U741040HR2004PTC035324. Registered 
office: 6th Floor, Tower B, DLF Building No.10 DLF Cyber City, Gurgaon, 
Haryana-122002, India.

This e-mail and/or its attachments are intended only for the use of the 
addressee(s) and may contain confidential and legally privileged information 
belonging to CEB and/or its subsidiaries, including CEB subsidiaries that offer 
SHL Talent Measurement products and services. If you have received this e-mail 
in error, please notify the sender immediately and destroy all copies of this 
email and its attachments. The publication, copying, in whole or in part, or 
use or dissemination in any other way of this e-mail and attachments by anyone 
other than the intended person(s) is prohibited.

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Friday, April 29, 2016 9:16 PM
To: solr-user ; Ahmet Arslan 
Subject: Re: Facet ignoring repeated word

That's the way faceting is designed to work: it counts the _documents_ that
satisfy your query in which a term appears. If a word appears multiple times
in a doc, it'll only be counted once.

For the general use-case it'd be unsettling for a user to see a facet count of 
500, then click on it and discover that the number of docs in the corpus was 
really 345 or something.

Ahmet's hints might help, but I'd ask whether counting words multiple times
really satisfies the use case.

Best,
Erick
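If within-document occurrence counts really are what the word cloud needs, one hedged option is to compute them client-side from the stored text instead of from facets. A minimal sketch (naive tokenizer; a faithful version would mirror the field's analyzer, with lowercasing, stemming and stopwords):

```python
from collections import Counter
import re

def term_counts(text: str) -> Counter:
    """Count every occurrence of every token, unlike a facet,
    which counts each document at most once per term."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

counts = term_counts("It seems that the harder I work, the more work I get")
print(counts["work"])  # → 2: both occurrences are counted
```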

On Fri, Apr 29, 2016 at 7:10 AM, Ahmet Arslan  wrote:
> Hi,
>
> Depending on your requirements; StatsComponent, TermsComponent, 
> LukeRequestHandler can also be used.
>
>
> https://cwiki.apache.org/confluence/display/solr/The+Terms+Component
> https://wiki.apache.org/solr/LukeRequestHandler
> https://cwiki.apache.org/confluence/display/solr/The+Stats+Component
> Ahmet
>
>
>
> On Friday, April 29, 2016 11:56 AM, "G, Rajesh"  wrote:
> Hi,
>
> I am trying to implement a word cloud using Solr. The problem I have is
> that a Solr facet query ignores repeated words in a document, e.g.:
>
> I have indexed the text :
> It seems that the harder I work, the more work I get for the same 
> compensation and reward. The more work I take on gets absorbed into my 
> "normal" workload and I'm not recognized for working harder than my peers, 
> which makes me not want to work to my potential. I am very underwhelmed by 
> the evaluation process and bonus structure. I don't believe the current 
> structure rewards strong performers. I am confident that the company could 
> not hire someone with my talent to replace me if I left, but I don't think 
> the company realizes that.
>
> The indexed content contains the word "my" and its count is 3, but when I
> run the query
> http://localhost:8182/solr/dev/select?facet=true&facet.field=comments&rows=0&indent=on&q=questionid:3956&wt=json
> the count for the word "my" is 1 and not 3. Can you please help?
>
> Also, please suggest if there is a better way to implement a word cloud in
> Solr other than using facets?
>
> "facet_fields":{
>   "comments":[
> "absorbed",1,
> "am",1,
> "believe",1,
> "bonus",1,
> "company",1,
> "compensation",1,
> "confident",1,
> "could",1,
> "current",1,
> "don't",1,
> "evaluation",1,
> "get",1,
> "gets",1,
> "harder",1,
> "hire",1,
> "i",1,
> "i'm",1,
> "left",1,
> "makes",1,
> "me",1,
> "more",1,
> "my",1,
> "normal",1,
> "peers",1,
> "performers",1,
> "potential",1,
> "process",1,
> "realizes",1,
> "recognized",1,
> "replace",1,
> "reward",1,
> "rewards",1,
> "same",1,
> "seems",1,
> "someone",1,
> "strong",1,
> "structure",1,
> "take",1,
> "talent",1,
> "than",1,
> "think",1,
> "u

What does "Max Doc" mean in the Admin interface?

2016-05-01 Thread Bastien Latard - MDPI AG

Hi All,

Everything is in the title...


Can this value be modified?
Or is it because of my environment?

Also, what does "Heap Memory Usage: -1" mean?

Kind regards,
Bastien Latard
Web engineer
--
MDPI AG
Postfach, CH-4005 Basel, Switzerland
Office: Klybeckstrasse 64, CH-4057
Tel. +41 61 683 77 35
Fax: +41 61 302 89 18
E-mail:
lat...@mdpi.com
http://www.mdpi.com/



OOM script executed

2016-05-01 Thread Bastien Latard - MDPI AG

Hi Guys,

The OOM killer script has run several times since I upgraded to Solr 6.0:

$ cat solr_oom_killer-8983-2016-04-29_15_16_51.log
Running OOM killer script for process 26044 for Solr on port 8983

Does it mean that I need to increase my Java heap?
Or should I do anything else?

Here are some further logs:
$ cat solr_gc_log_20160502_0730:
}
{Heap before GC invocations=1674 (full 91):
 par new generation   total 1747648K, used 1747135K 
[0x0005c000, 0x00064000, 0x00064000)
  eden space 1398144K, 100% used [0x0005c000, 
0x00061556, 0x00061556)
  from space 349504K,  99% used [0x00061556, 
0x00062aa2fc30, 0x00062aab)
  to   space 349504K,   0% used [0x00062aab, 
0x00062aab, 0x00064000)
 concurrent mark-sweep generation total 6291456K, used 6291455K 
[0x00064000, 0x0007c000, 0x0007c000)
 Metaspace   used 39845K, capacity 40346K, committed 40704K, 
reserved 1085440K
  class spaceused 4142K, capacity 4273K, committed 4368K, reserved 
1048576K
2016-04-29T21:15:41.970+0200: 20356.359: [Full GC (Allocation Failure) 
2016-04-29T21:15:41.970+0200: 20356.359: [CMS: 
6291455K->6291456K(6291456K), 12.5694653 secs] 
8038591K->8038590K(8039104K), [Metaspace: 39845K->39845K(1085440K)], 
12.5695497 secs] [Times: user=12.57 sys=0.00, real=12.57 secs]
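Reading the log excerpt above: the CMS old generation is completely full (6291455K used of 6291456K) and the Full GC takes the whole heap from 8038591K to 8038590K, i.e. it reclaims essentially nothing, which is exactly the situation the OOM killer script reacts to. So a larger heap (or a smaller index/cache footprint) is the usual next step; SOLR_HEAP in the solr.in.* script is where the heap size is normally set. A quick sanity check on the numbers copied from the log:

```python
# Figures copied verbatim from the GC log above (KiB).
old_used, old_total = 6291455, 6291456      # CMS old generation
heap_before, heap_after = 8038591, 8038590  # whole heap around the Full GC

occupancy = old_used / old_total
reclaimed = heap_before - heap_after
print(f"old gen occupancy: {occupancy:.4%}, Full GC reclaimed {reclaimed}K")
```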



Kind regards,
Bastien



RE: solr sql & streaming

2016-05-01 Thread Chaushu, Shani
I tried 2 examples:

curl -id 'expr=search(collections_test, q="*:*",fl=id,name,inStock, sort="id 
asc")' http://localhost:8983/solr/collections_test/stream

curl http://localhost:8983/solr/collections_test/stream -d 
'expr=reduce(search(collections_test, q="*:*",fl=id,name,inStock, sort="id 
asc") , by="id",group(sord="id asc",n="2"))'
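Two hedged observations about the expressions above, guesses rather than confirmed diagnoses: the field list is unquoted, so the commas in fl=id,name,inStock may terminate the parameter early (the documented form is fl="id,name,inStock"), and the second expression passes sord= where sort= was presumably intended. It can also help to build and URL-encode the expression programmatically so the quoting survives the shell; a stdlib sketch:

```python
from urllib.parse import urlencode

# Collection and field names reused from the curl commands above.
expr = ('search(collections_test, q="*:*", '
        'fl="id,name,inStock", sort="id asc")')  # note the quoted fl
body = urlencode({"expr": expr})
print(body)
# POST this body to http://localhost:8983/solr/collections_test/stream
```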

-Original Message-
From: Joel Bernstein [mailto:joels...@gmail.com] 
Sent: Monday, May 02, 2016 05:28
To: solr-user@lucene.apache.org
Subject: Re: solr sql & streaming

It appears that you are not formatting the streaming expression properly.
Can you post your entire http request?

Joel Bernstein
http://joelsolr.blogspot.com/

On Sun, May 1, 2016 at 2:01 PM, Chaushu, Shani 
wrote:

> Yes I'm running in solr cloud mode.
> I managed to make the query work with sql queries, but when I'm trying 
> to run it with stream request, I get an error When I try to run 
> expr=search:
>
> "Unable to construct instance of
> org.apache.solr.client.solrj.io.stream.CloudSolrStream
>
> When I try to run expr=reduce:
> org.apache.solr.client.solrj.io.stream.ReducerStream
>
>
> Any thoughts?
>
>
> -Original Message-
> From: Emir Arnautovic [mailto:emir.arnauto...@sematext.com]
> Sent: Thursday, April 28, 2016 15:32
> To: solr-user@lucene.apache.org
> Subject: Re: solr sql & streaming
>
> Hi Shani,
> Are you running in SolrCloud mode? Here is blog post you can follow:
> https://sematext.com/blog/2016/04/18/solr-6-solrcloud-sql-support/
>
> Thanks,
> Emir
>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management 
> Solr & Elasticsearch Support * http://sematext.com/
>
>
>
> On 28.04.2016 13:45, Chaushu, Shani wrote:
> > Hi,
> > I installed solr 6 and try to run /sql and /stream request follow to
> this wiki
> https://cwiki.apache.org/confluence/display/solr/Parallel+SQL+Interfac
> e
> > I saw in changes list that it doesn't need request handler
> configuration, but when I try to acces I get the following message:
> > 
> > 
> > 
> > Error 404 Not Found
> > 
> > HTTP ERROR 404
> > Problem accessing /solr/collection_test/sql. Reason:
> > Not Found
> > 
> > 
> >
> > My request was
> >
> > curl --data-urlencode 'stmt=SELECT author, count(*) FROM 
> > collection_test
> GROUP BY author ORDER BY count(*) desc'
> http://localhost:8983/solr/collection_test/sql?aggregationMode=facet
> >
> >
> >
> >
> >
> >
> > 
> > -
> > Intel Electronics Ltd.
> >
> > This e-mail and any attachments may contain confidential material 
> > for the sole use of the intended recipient(s). Any review or 
> > distribution by others is strictly prohibited. If you are not the 
> > intended recipient, please contact the sender and delete all copies.
> >


Using updateRequest Processor with DIH

2016-05-01 Thread Jay Potharaju
Hi,
I was wondering if it is possible to use Update Request Processor with DIH.
I would like to update an index_time field whenever documents are
added/updated in the collection.
I know that I could easily pass a timestamp to update the field
in my collection, but I was trying to do it using a request processor.

I tried the following but got an error. Any recommendations on how to use
this correctly?



[solrconfig.xml snippet stripped by the list archive; surviving fragments:
field "index_time", DIH config "data-config.xml", chain "update_indextime"]



Error:
Error from server at unknown UpdateRequestProcessorChain: update_indextime
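For reference, the configuration fragments above suggest a setup along the following lines. This is a hedged reconstruction, not the poster's actual solrconfig.xml; only the names index_time, data-config.xml and update_indextime come from the message, and the rest assumes the stock TimestampUpdateProcessorFactory and DIH handler:

```xml
<!-- Hypothetical reconstruction; adjust class names and paths as needed. -->
<updateRequestProcessorChain name="update_indextime">
  <processor class="solr.TimestampUpdateProcessorFactory">
    <str name="fieldName">index_time</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
    <str name="update.chain">update_indextime</str>
  </lst>
</requestHandler>
```

An "unknown UpdateRequestProcessorChain" error usually means the chain definition is not visible to the core handling the update; a typo in the name, the chain declared in a different core's solrconfig.xml, or the core not being reloaded after the edit are all common causes.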

-- 
Thanks
Jay


Re: Streaming expression for suggester

2016-05-01 Thread Joel Bernstein
Sure, take a look at RandomStream. You can copy its basic structure but have
it work with the suggester. The link below shows the test cases as well:

https://github.com/apache/lucene-solr/commit/7b5f12e622f10206f3ab3bf9f79b9727c73c6def

Joel Bernstein
http://joelsolr.blogspot.com/

On Sun, May 1, 2016 at 2:45 PM, Pranaya Behera 
wrote:

> Hi Joel,
> If you could point me in the right direction, I would like to
> take a shot.
>
>
> On Sunday 01 May 2016 10:38 PM, Joel Bernstein wrote:
>
>> This is the type of thing that Streaming Expressions does well, but there
>> isn't one yet for the suggester. Feel free to add a SuggestStream jira
>> ticket, it should be very easy to add.
>>
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Sat, Apr 30, 2016 at 6:30 AM, Pranaya Behera 
>> wrote:
>>
>> Hi,
>>>   I have two collections lets name them as A and B. I want to
>>> suggester
>>> to work on both the collection while searching on the front-end
>>> application.
>>> In collection A I have 4 different fields. I want to use all of them for
>>> the suggester. Shall I copy them to a new field of combined of the 4
>>> fields
>>> and use it on the spellcheck component and then use that field for the
>>> suggester?
>>> In collection B I have only 1 field.
>>>
>>> When user searches something in the front-end application, I would like
>>> to
>>> show results from the both collections. Is streaming expression would be
>>> a
>>> viable option here ? If so, how ? I couldn't find any related document
>>> for
>>> the suggester streaming expression. If not, then how would I approach
>>> this ?
>>>
>>>
>


Re: solr sql & streaming

2016-05-01 Thread Joel Bernstein
It appears that you are not formatting the streaming expression properly.
Can you post your entire http request?

Joel Bernstein
http://joelsolr.blogspot.com/

On Sun, May 1, 2016 at 2:01 PM, Chaushu, Shani 
wrote:

> Yes I'm running in solr cloud mode.
> I managed to make the query work with sql queries, but when I'm trying to
> run it with stream request, I get an error
> When I try to run expr=search:
>
> "Unable to construct instance of
> org.apache.solr.client.solrj.io.stream.CloudSolrStream
>
> When I try to run expr=reduce:
> org.apache.solr.client.solrj.io.stream.ReducerStream
>
>
> Any thoughts?
>
>
> -Original Message-
> From: Emir Arnautovic [mailto:emir.arnauto...@sematext.com]
> Sent: Thursday, April 28, 2016 15:32
> To: solr-user@lucene.apache.org
> Subject: Re: solr sql & streaming
>
> Hi Shani,
> Are you running in SolrCloud mode? Here is blog post you can follow:
> https://sematext.com/blog/2016/04/18/solr-6-solrcloud-sql-support/
>
> Thanks,
> Emir
>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>
>
> On 28.04.2016 13:45, Chaushu, Shani wrote:
> > Hi,
> > I installed solr 6 and try to run /sql and /stream request follow to
> this wiki
> https://cwiki.apache.org/confluence/display/solr/Parallel+SQL+Interface
> > I saw in changes list that it doesn't need request handler
> configuration, but when I try to acces I get the following message:
> > 
> > 
> > 
> > Error 404 Not Found
> > 
> > HTTP ERROR 404
> > Problem accessing /solr/collection_test/sql. Reason:
> > Not Found
> > 
> > 
> >
> > My request was
> >
> > curl --data-urlencode 'stmt=SELECT author, count(*) FROM collection_test
> GROUP BY author ORDER BY count(*) desc'
> http://localhost:8983/solr/collection_test/sql?aggregationMode=facet
> >
> >
> >
> >
> >
> >
>
>


Re: Phrases and edismax

2016-05-01 Thread Mark Robinson
Thanks very much, Erick, for checking in detail.
Yes, I found the first term being left out in pf.
Because of that, I had some cases where a couple of unwanted records came into
the results with higher priority than the normal ones. When I checked, they
matched from the 2nd term onwards.

As suggested, I will raise a JIRA.

Thanks!
Mark
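For anyone reproducing this, the two requests Erick compared can be built side by side so that the only difference (field-qualifying q) is explicit; a hedged stdlib sketch using a placeholder name field:

```python
from urllib.parse import urlencode

common = {"defType": "edismax", "qf": "name", "pf": "name",
          "debugQuery": "true", "rows": 0}

ok  = urlencode({"q": "(erick men truck)", **common})       # pf phrase keeps all terms
bad = urlencode({"q": "name:(erick men truck)", **common})  # pf phrase drops "erick"
# Append either string to .../solr/<core>/select? and compare parsedquery
# in the debug output.
```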

On Sat, Apr 30, 2016 at 1:20 PM, Erick Erickson 
wrote:

> Looks like a bug in edismax to me when you field-qualify
> the terms.
>
> As an aside, there's no need to specify the field when you only
> want it to go against the fields defined in "qf" and "pf" etc. And,
> that's a work-around for this particular case. But still:
>
> So here's what I get on 5x:
> q=(erick men truck)&defType=edismax&qf=name&pf=name
> correctly returns:
> "+((name:erick) (name:men) (name:truck)) (name:"erick men truck")",
>
> But,
> q=name:(erick men truck)&defType=edismax&qf=name&pf=name
> incorrectly returns:
> "+(name:erick name:men name:truck) (name:"men truck")",
>
> And this:
> q=name:(erick men truck)&defType=edismax&qf=name&pf=features
> incorrectly gives this.
>
> "+(name:erick name:men name:truck) (features:"men truck")",
>
> Confusingly, the terms (with "erick" left out, strike 1)
> goes against the pf field even though it's fully qualified against the
> name field. Not entirely sure whether this is intended or not frankly.
>
> Please go ahead and raise a JIRA.
>
> Best,
> Erick
>
> On Fri, Apr 29, 2016 at 7:55 AM, Mark Robinson 
> wrote:
> > Hi,
> >
> > q=productType:(two piece bathtub white)
> > &defType=edismax&pf=productType^20.0&qf=productType^15.0
> >
> > In the debug section this is what I see:-
> > 
> > (+(productType:two productType:piec productType:bathtub
> productType:white)
> > DisjunctionMaxQuery((productType:"piec bathtub white"^20.0)))/no_coord
> > 
> >
> > My question is related to the "pf" (phrases) section of edismax.
> > As shown in the debug section why is the phrase taken as "piec bathtub
> > white". Why is the first word "two" not considered in the phrase fields
> > section.
> > I am looking for queries with the words "two piece bathtub white" being
> > together to be boosted and not "piece bathtub white" only to be boosted.
> >
> > Could some one help me understand what I am missing?
> >
> > Thanks!
> > Mark
>


Re: Streaming expression for suggester

2016-05-01 Thread Pranaya Behera

Hi Joel,
If you could point me in the right direction, I would like
to take a shot.


On Sunday 01 May 2016 10:38 PM, Joel Bernstein wrote:

This is the type of thing that Streaming Expressions does well, but there
isn't one yet for the suggester. Feel free to add a SuggestStream jira
ticket, it should be very easy to add.


Joel Bernstein
http://joelsolr.blogspot.com/

On Sat, Apr 30, 2016 at 6:30 AM, Pranaya Behera 
wrote:


Hi,
  I have two collections; let's name them A and B. I want the suggester
to work on both collections while searching from the front-end application.
In collection A I have 4 different fields, and I want to use all of them for
the suggester. Shall I copy them into a new field combining the 4 fields
and use it on the spellcheck component, and then use that field for the
suggester?
In collection B I have only 1 field.

When a user searches something in the front-end application, I would like to
show results from both collections. Is a streaming expression a viable option
here? If so, how? I couldn't find any related documentation for a suggester
streaming expression. If not, then how should I approach this?





RE: solr sql & streaming

2016-05-01 Thread Chaushu, Shani
Yes, I'm running in SolrCloud mode.
I managed to make the query work with SQL queries, but when I try to run it
with a stream request, I get an error.
When I try to run expr=search:

"Unable to construct instance of 
org.apache.solr.client.solrj.io.stream.CloudSolrStream

When I try to run expr=reduce:
org.apache.solr.client.solrj.io.stream.ReducerStream


Any thoughts?


-Original Message-
From: Emir Arnautovic [mailto:emir.arnauto...@sematext.com] 
Sent: Thursday, April 28, 2016 15:32
To: solr-user@lucene.apache.org
Subject: Re: solr sql & streaming

Hi Shani,
Are you running in SolrCloud mode? Here is blog post you can follow: 
https://sematext.com/blog/2016/04/18/solr-6-solrcloud-sql-support/

Thanks,
Emir

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & 
Elasticsearch Support * http://sematext.com/



On 28.04.2016 13:45, Chaushu, Shani wrote:
> Hi,
> I installed Solr 6 and tried to run /sql and /stream requests following this
> wiki: https://cwiki.apache.org/confluence/display/solr/Parallel+SQL+Interface
> I saw in the changes list that it doesn't need request handler configuration,
> but when I try to access it I get the following message:
> 
> 
> 
> Error 404 Not Found
> 
> HTTP ERROR 404
> Problem accessing /solr/collection_test/sql. Reason:
> Not Found
> 
> 
>
> My request was
>
> curl --data-urlencode 'stmt=SELECT author, count(*) FROM collection_test 
> GROUP BY author ORDER BY count(*) desc' 
> http://localhost:8983/solr/collection_test/sql?aggregationMode=facet
>
>
>
>
>
>



Re: Solr 5.2.1 on Java 8 GC

2016-05-01 Thread Shawn Heisey
On 4/28/2016 9:43 AM, Nick Vasilyev wrote:
> I forgot to mention that the index is approximately 50 million docs split
> across 4 shards (replication factor 2) on 2 solr replicas.

Later in the thread, Jeff Wartes mentioned my wiki page for GC tuning.

https://wiki.apache.org/solr/ShawnHeisey#GC_Tuning_for_Solr

Oracle seems to be putting most of their recent work into the G1
collector.  They want to make it the default collector for Java 9.  I
don't know if this is going to happen, or if they will delay that change
until Java 10 ... but I think that the default *is* going to eventually
be G1.

Lucene strongly recommends against EVER using G1, but I have not noticed
any problems with it on Solr.  I have not seen any specific *reason*
that G1 is not recommended, at least with a 64-bit JVM.  I can only
remember seeing issues in Jira with G1 when the JVM was 32-bit -- which
has its own limitations, so I don't recommend it anyway.  If somebody
can point me to specific *OPEN* issues showing current problems with G1
on a 64-bit JVM, I will have an easier time believing the Lucene
assertion that it's a bad idea.  I have searched Jira and can't find
anything relevant.

The best GC results I've seen in testing Solr have been with the G1
collector.  I haven't done any testing for quite a while, and almost all
of the testing that I've done has been with 4.x versions, not 5.x.

Out of the box, Solr 5.0 and later uses GC tuning with the CMS collector
that looks a lot like the CMS config that I came up with.  This works
pretty well, but if the heap size gets big enough, especially 32GB or
larger, I suspect that it will start to have problems with GC performance.

You could give the settings I listed under "Current experiments" a try. 
You would do this by editing your solr.in.* script to comment out the
current GC tuning parameters and substituting the new set of
parameters.  This is a G1 config, and as I already mentioned, Lucene
recommends NEVER using G1.

For your specific situation, I think you should try setting the max heap
to 31GB instead of 32GB, so the pointer sizes are cut in half, without
making a huge difference in the total amount of memory available.
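Concretely, an experiment along those lines would be expressed in the solr.in.* script roughly as below. The flag set is illustrative only (loosely based on commonly published G1 settings, not a copy of the wiki page's "Current experiments" list), and the thread's caveat stands: Lucene's official guidance is to avoid G1.

```shell
# solr.in.sh -- hypothetical G1 experiment; comment out the stock CMS GC_TUNE first
SOLR_HEAP="31g"   # just under 32g so compressed oops still apply
GC_TUNE="-XX:+UseG1GC \
  -XX:+ParallelRefProcEnabled \
  -XX:MaxGCPauseMillis=250"
```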

The chewiebug version of GCViewer is what I use to take the GC logfile
and see how Solr did with particular GC settings.  At my request, they
have incorporated some really nice statistical metrics into GCViewer,
but it's not available in the released versions -- you'll have to
compile it yourself until 1.35 comes out.

https://github.com/chewiebug/GCViewer/issues/139

I have also had some good luck with jHiccup.  That tool will inform you
of pauses that happen for *any* reason, not just because of GC.  My
experience is that GC is the only *major* cause of pauses.

Thanks,
Shawn



Re: Streaming expression for suggester

2016-05-01 Thread Joel Bernstein
This is the type of thing that Streaming Expressions does well, but there
isn't one yet for the suggester. Feel free to add a SuggestStream JIRA
ticket; it should be very easy to add.


Joel Bernstein
http://joelsolr.blogspot.com/

On Sat, Apr 30, 2016 at 6:30 AM, Pranaya Behera 
wrote:

> Hi,
>  I have two collections lets name them as A and B. I want to suggester
> to work on both the collection while searching on the front-end application.
> In collection A I have 4 different fields. I want to use all of them for
> the suggester. Shall I copy them to a new field of combined of the 4 fields
> and use it on the spellcheck component and then use that field for the
> suggester?
> In collection B I have only 1 field.
>
> When user searches something in the front-end application, I would like to
> show results from the both collections. Is streaming expression would be a
> viable option here ? If so, how ? I couldn't find any related document for
> the suggester streaming expression. If not, then how would I approach this ?
>


Re: Solr 5.2.1 on Java 8 GC

2016-05-01 Thread Nick Vasilyev
How do you log GC frequency and time to compare it with other GC
configurations?

Also, do you tweak parameters automatically or is there a set of
configuration that get tested?

Lastly, I was under the impression that G1 is not recommended because of some
issues with Lucene, so I haven't tried it. Are you guys seeing any significant
performance benefits with it on Java 8? Any issues?
On May 1, 2016 12:57 PM, "Bram Van Dam"  wrote:

> On 30/04/16 17:34, Davis, Daniel (NIH/NLM) [C] wrote:
> > Bram, on the subject of brute force - if your script is "clever" and
> uses binary first search, I'd love to adapt it to my environment.  I am
> trying to build a truly multi-tenant Solr because each of our indexes is
> tiny, but all together they will eventually be big, and so I'll have to
> repeat this experiment, many, many times.
>
> Sorry to disappoint, the script is very dumb, and it doesn't just
> start/stop Solr, it installs our application suite, picks a GC profile
> at random, indexes a boatload of data and then runs a bunch of query tests.
>
> Three pointers I can give you:
>
> 1) beware of JVM versions, especially when using the G1 collector, it
> behaves horribly on older JVMs but rather nicely on newer versions.
>
> 2) At the very least you'll want to test the G1 and CMS collectors.
>
> 3) One large index vs many small indexes: the behaviour is very
> different. Depending on how many indexes you have, it might be worth to
> run each one in a different JVM. Of course that's not practical if you
> have thousands of indexes.
>
>  - Bram
>
>


Re: Parallel SQL Interface returns "java.lang.NullPointerException" after reloading collection

2016-05-01 Thread Joel Bernstein
Can you post your stack trace? I suspect this has to do with how the
Streaming API is interacting with SolrCloud. We can probably also create a
jira ticket for this.

Joel Bernstein
http://joelsolr.blogspot.com/

On Sun, May 1, 2016 at 4:02 AM, Ryan Yacyshyn 
wrote:

> Hi all,
>
> I'm exploring with parallel SQL queries and found something strange after
> reloading the collection: the same query will return a
> java.lang.NullPointerException error. Here are my steps on a fresh install
> of Solr 6.0.0.
>
> *Start Solr in cloud mode with example*
> bin/solr -e cloud -noprompt
>
> *Index some data*
> bin/post -c gettingstarted example/exampledocs/*.xml
>
> *Send query, which works*
> curl --data-urlencode 'stmt=select id,name from gettingstarted where
> inStock = true limit 2' http://localhost:8983/solr/gettingstarted/sql
>
> *Reload the collection*
> curl '
>
> http://localhost:8983/solr/admin/collections?action=RELOAD&name=gettingstarted
> '
>
> After reloading, running the exact query above will return the null pointer
> exception error. Any idea why?
>
> If I stop all Solr severs and restart, then it's fine.
>
> *java -version*
> java version "1.8.0_25"
> Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
> Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
>
> Thanks,
> Ryan
>


Re: Solr 5.2.1 on Java 8 GC

2016-05-01 Thread Bram Van Dam
On 30/04/16 17:34, Davis, Daniel (NIH/NLM) [C] wrote:
> Bram, on the subject of brute force - if your script is "clever" and uses 
> binary first search, I'd love to adapt it to my environment.  I am trying to 
> build a truly multi-tenant Solr because each of our indexes is tiny, but all 
> together they will eventually be big, and so I'll have to repeat this 
> experiment, many, many times.

Sorry to disappoint, the script is very dumb, and it doesn't just
start/stop Solr, it installs our application suite, picks a GC profile
at random, indexes a boatload of data and then runs a bunch of query tests.

Three pointers I can give you:

1) beware of JVM versions, especially when using the G1 collector, it
behaves horribly on older JVMs but rather nicely on newer versions.

2) At the very least you'll want to test the G1 and CMS collectors.

3) One large index vs many small indexes: the behaviour is very
different. Depending on how many indexes you have, it might be worth to
run each one in a different JVM. Of course that's not practical if you
have thousands of indexes.

 - Bram



Searching for term sequence including blank character using regex

2016-05-01 Thread Ali Nazemian
Dear Solr Users/Developers,
Hi,

I was wondering what the correct query syntax is for searching for a sequence
of terms with a blank character in the middle of the sequence, using the fq
parameter. For example, suppose I want to search for all documents containing
the sequence "hello world" using fq. I am not sure why
fq=content:/.*hello world.*/ does not work for a tokenized field in this
situation, while fq=content:/.*hello.*/ does work for the same field. Is there
an fq query syntax for such a search requirement?
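One hedged explanation: Lucene regular-expression queries are matched against individual indexed terms, and a tokenized field never contains a term with a space in it, so /.*hello world.*/ cannot match anything there, while /.*hello.*/ can. The usual way to express term adjacency is a phrase query; a stdlib sketch of the encoded filter:

```python
from urllib.parse import urlencode

# Phrase query: "hello" immediately followed by "world", as analyzed
# by the field's tokenizer -- no regex needed for adjacency.
fq = urlencode({"fq": 'content:"hello world"'})
print(fq)  # → fq=content%3A%22hello+world%22
```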

Best regards.


-- 
A.Nazemian


Re: Many to Many Mapping with Solr

2016-05-01 Thread Sandeep Mestry
Thanks Alexandre. I too am of the opinion not to use Solr in an RDBMS way,
but I am concerned about updates to the indexes. We're expecting around 500
writes per second to the database, which will generate >500 updates to the
index per second. If the entities are denormalised this will have an impact
on performance, hence I was inclined to design it like the db.

Joel,
I will explain it in a bit more detail what my use cases are, all of these
should be driven by search engine:

1) user logs in and the system should display all recordings for that user
2) user adds a recording, the system is updated with the additional
recording
3) user removes a recording, the system is updated with the recording
removed.
4) when the user searches for a recording, the system should only display
matches in his recordings. Every user-recording mapping has additional
properties which are also searchable attributes.

Here we are talking about 2M users and 500M recordings, currently driven by
a database of ~60-80GB.

I am going to do a small poc for these use cases, and I will go with
denormalised entities with search requirements as my main focus. However, if
you have anything more to add, do let me know. I will be grateful.

Many Thanks,
Sandeep
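For the poc, the denormalised route usually means one Solr document per user-recording mapping row, which keeps the mapping's own attributes searchable and makes add/remove a single-document operation. A hedged sketch with invented field names:

```python
# One document per user-recording pair; every field name here is illustrative.
doc = {
    "id": "user42_rec1001",       # composite key: one doc per mapping row
    "user_id": "user42",
    "recording_id": "rec1001",
    "recording_title": "Quarterly review call",
    "mapping_type": "owner",      # searchable attributes of the mapping itself
    "mapping_number": 7,
}
# fq=user_id:user42 then scopes every search to that user's recordings;
# adding or removing a mapping is a single add/delete by id.
```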


On 29 April 2016 at 14:54, Joel Bernstein  wrote:

> We really still need to know more about your use case. In particular what
> types of questions will you be asking of the data? It's useful to do this
> in plain english without mapping to any specific implementation.
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Fri, Apr 29, 2016 at 9:43 AM, Alexandre Rafalovitch  >
> wrote:
>
> > You do not structure Solr to represent your database. You structure it
> > to represent what you will search.
> >
> > In your case, it sounds like you want to return 'user-records', in
> > which case you will index the related information all together. Yes,
> > you will possibly need to recreate the multiple documents when you
> > update one record (or one user). And yes, you will have the same
> > information multiple times. But you can used index-only values or
> > docvalues to reduce storage and duplication.
> >
> > You may also want to have Solr return only the relevant IDs from the
> > search and you recreate the m-to-m object structure from the database.
> > Then, you don't need to store much at all, just index.
> >
> > Basically, don't think about your database as much when deciding Solr
> > structure. It does not map one-to-one.
> >
> > Regards,
> >Alex.
> > 
> > Newsletter and resources for Solr beginners and intermediates:
> > http://www.solr-start.com/
> >
> >
> > On 29 April 2016 at 20:48, Sandeep Mestry  wrote:
> > > Hi All,
> > >
> > > Hope the day is going on well for you.
> > >
> > > This question has been asked before, but I couldn't find answer to my
> > > specific request. I have many to many relationship and the mapping
> table
> > > has additional columns. Whats the best way I can model this into solr
> > > entity?
> > >
> > > For example: a user has many recordings and a recording belongs to many
> > > users. But each user-recording has additional feature like type, number
> > etc.
> > > I'd like to fetch recordings for the user. If the user adds/ updates/
> > > deletes a recording then that should be reflected in the search.
> > >
> > > I have 2 options:
> > > 1) to create user entity, recording entity and user_recording entity
> > > - this is good but it's like treating solr like rdbms which i mostly
> > avoid..
> > >
> > > 2) user entity containing all the recordings information and each
> > recording
> > > containing user information
> > > - this has impact on index size but the fetch and manipulation will be
> > > faster.
> > >
> > > Any guidance will be good..
> > >
> > > Thanks,
> > > Sandeep
> >
>


Parallel SQL Interface returns "java.lang.NullPointerException" after reloading collection

2016-05-01 Thread Ryan Yacyshyn
Hi all,

I'm exploring parallel SQL queries and found something strange after
reloading the collection: the same query will return a
java.lang.NullPointerException error. Here are my steps on a fresh install
of Solr 6.0.0.

*Start Solr in cloud mode with example*
bin/solr -e cloud -noprompt

*Index some data*
bin/post -c gettingstarted example/exampledocs/*.xml

*Send query, which works*
curl --data-urlencode 'stmt=select id,name from gettingstarted where
inStock = true limit 2' http://localhost:8983/solr/gettingstarted/sql

*Reload the collection*
curl '
http://localhost:8983/solr/admin/collections?action=RELOAD&name=gettingstarted
'

After reloading, running the exact query above will return the null pointer
exception error. Any idea why?

If I stop all Solr servers and restart, then it's fine.

*java -version*
java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)

Thanks,
Ryan