Re: Solr comparison between two columns

2016-05-02 Thread Alexandre Rafalovitch
Function queries, if you want to do this on the fly:
https://cwiki.apache.org/confluence/display/solr/Function+Queries
'if' and 'sub' most likely

However, if you are going to do this often for the same fields, you
may be better off calculating this during indexing using
UpdateRequestProcessors. I don't think we have one for math directly
(full list here:
http://www.solr-start.com/info/update-request-processors/) , but you
could use a Scripting one and implement it that way.
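
For example (untested, and assuming both fields are numeric, since function
queries can't compare string fields), something along these lines returns 1
when the two values match and 0 otherwise; the host and collection name are
placeholders:

# pseudo-field 'same' is 1 when o_is_follow == o_follow_id, else 0
http://localhost:8983/solr/yourcollection/select?q=*:*&fl=id,same:map(abs(sub(o_is_follow,o_follow_id)),0,0,1,0)

map(x,0,0,1,0) maps a difference of exactly 0 to 1 and everything else to 0;
I used 'map' instead of 'if' here, and the same function works in
fq={!frange l=1 u=1}... if you only want the matching documents back.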

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 3 May 2016 at 15:14, kavurupavan  wrote:
> Compare two columns in Solr: if the two fields are equal, display true, else
> false.
>
> CASE WHEN o_is_follow = o_follow_id THEN 'TRUE' ELSE 'FALSE'. Please help
> me. Thanks in advance.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-comparsion-between-two-columns-tp4274155.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Solr comparison between two columns

2016-05-02 Thread kavurupavan
Compare two columns in Solr: if the two fields are equal, display true, else
false.

CASE WHEN o_is_follow = o_follow_id THEN 'TRUE' ELSE 'FALSE'. Please help
me. Thanks in advance.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-comparsion-between-two-columns-tp4274155.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Results of facet differs with change in facet.limit.

2016-05-02 Thread Modassar Ather
Hi,

Kindly share your inputs on this issue.

Thanks,
Modassar

On Mon, May 2, 2016 at 3:53 PM, Modassar Ather 
wrote:

> Hi,
>
> I have a field f which is defined as follows on solr 5.x. It is 12 shard
> cluster with no replica.
>
> <field name="f" ... stored="false" indexed="false" docValues="true"/>
>
> When I facet on this field with different facet.limit I get different
> facet count.
>
> E.g.
> Query : text_field:term&facet.field=f&facet.limit=100
> Result :
> 1225
> 1082
> 1076
>
> Query : text_field:term&facet.field=f&facet.limit=200
> 1366
> 1321
> 1315
>
> I am noticing fewer documents in the facets, whereas the numFound during search
> is higher. Please refer to the following query for details.
>
> Query : text_field:term&facet.field=f
> Result :
> 1225
> 1082
> 1076
>
> Query : text_field:term AND f:val1
> Result: numFound=1366
>
> Kindly help me understand this behavior or let me know if it is an issue.
>
> Thanks,
> Modassar
>


Re: Phrases and edismax

2016-05-02 Thread Erick Erickson
Mark:

KYLIN-1644? This should be SOLR-. I suspect you entered the JIRA
in the wrong Apache project.


Erick

On Mon, May 2, 2016 at 8:05 PM, Mark Robinson  wrote:
> Hi Eric,
>
> I have raised a JIRA:-   *KYLIN-1644*   with the problem mentioned.
>
> Thanks!
> Mark.
>
> On Sun, May 1, 2016 at 5:25 PM, Mark Robinson 
> wrote:
>
>> Thanks much Eric for checking in detail.
>> Yes I found the first term being left out in pf.
>> Because of that I had some cases where a couple of unwanted records came
>> in the results with higher priority than the normal ones. When I checked
>> they matched from the 2nd term onwards.
>>
>> As suggested I wud raise a  JIRA.
>>
>> Thanks!
>> Mark
>>
>> On Sat, Apr 30, 2016 at 1:20 PM, Erick Erickson 
>> wrote:
>>
>>> Looks like a bug in edismax to me when you field-qualify
>>> the terms.
>>>
>>> As an aside, there's no need to specify the field when you only
>>> want it to go against the fields defined in "qf" and "pf" etc. And,
>>> that's a work-around for this particular case. But still:
>>>
>>> So here's what I get on 5x:
>>> q=(erick men truck)&defType=edismax&qf=name&pf=name
>>> correctly returns:
>>> "+((name:erick) (name:men) (name:truck)) (name:"erick men truck")",
>>>
>>> But,
>>> q=name:(erick men truck)&defType=edismax&qf=name&pf=name
>>> incorrectly returns:
>>> "+(name:erick name:men name:truck) (name:"men truck")",
>>>
>>> And this:
>>> q=name:(erick men truck)&defType=edismax&qf=name&pf=features
>>> incorrectly gives this.
>>>
>>> "+(name:erick name:men name:truck) (features:"men truck")",
>>>
>>> Confusingly, the terms (with "erick" left out, strike 1)
>>> goes against the pf field even though it's fully qualified against the
>>> name field. Not entirely sure whether this is intended or not frankly.
>>>
>>> Please go ahead and raise a JIRA.
>>>
>>> Best,
>>> Erick
>>>
>>> On Fri, Apr 29, 2016 at 7:55 AM, Mark Robinson 
>>> wrote:
>>> > Hi,
>>> >
>>> > q=productType:(two piece bathtub white)
>>> > &defType=edismax&pf=productType^20.0&qf=productType^15.0
>>> >
>>> > In the debug section this is what I see:-
>>> > 
>>> > (+(productType:two productType:piec productType:bathtub
>>> productType:white)
>>> > DisjunctionMaxQuery((productType:"piec bathtub white"^20.0)))/no_coord
>>> > 
>>> >
>>> > My question is related to the "pf" (phrases) section of edismax.
>>> > As shown in the debug section why is the phrase taken as "piec bathtub
>>> > white". Why is the first word "two" not considered in the phrase fields
>>> > section.
>>> > I am looking for queries with the words "two piece bathtub white" being
>>> > together to be boosted and not "piece bathtub white" only to be boosted.
>>> >
>>> > Could some one help me understand what I am missing?
>>> >
>>> > Thanks!
>>> > Mark
>>>
>>
>>


Re: Phrases and edismax

2016-05-02 Thread Mark Robinson
Hi Eric,

I have raised a JIRA:-   *KYLIN-1644*   with the problem mentioned.

Thanks!
Mark.

On Sun, May 1, 2016 at 5:25 PM, Mark Robinson 
wrote:

> Thanks much Eric for checking in detail.
> Yes I found the first term being left out in pf.
> Because of that I had some cases where a couple of unwanted records came
> in the results with higher priority than the normal ones. When I checked
> they matched from the 2nd term onwards.
>
> As suggested I wud raise a  JIRA.
>
> Thanks!
> Mark
>
> On Sat, Apr 30, 2016 at 1:20 PM, Erick Erickson 
> wrote:
>
>> Looks like a bug in edismax to me when you field-qualify
>> the terms.
>>
>> As an aside, there's no need to specify the field when you only
>> want it to go against the fields defined in "qf" and "pf" etc. And,
>> that's a work-around for this particular case. But still:
>>
>> So here's what I get on 5x:
>> q=(erick men truck)&defType=edismax&qf=name&pf=name
>> correctly returns:
>> "+((name:erick) (name:men) (name:truck)) (name:"erick men truck")",
>>
>> But,
>> q=name:(erick men truck)&defType=edismax&qf=name&pf=name
>> incorrectly returns:
>> "+(name:erick name:men name:truck) (name:"men truck")",
>>
>> And this:
>> q=name:(erick men truck)&defType=edismax&qf=name&pf=features
>> incorrectly gives this.
>>
>> "+(name:erick name:men name:truck) (features:"men truck")",
>>
>> Confusingly, the terms (with "erick" left out, strike 1)
>> goes against the pf field even though it's fully qualified against the
>> name field. Not entirely sure whether this is intended or not frankly.
>>
>> Please go ahead and raise a JIRA.
>>
>> Best,
>> Erick
>>
>> On Fri, Apr 29, 2016 at 7:55 AM, Mark Robinson 
>> wrote:
>> > Hi,
>> >
>> > q=productType:(two piece bathtub white)
>> > &defType=edismax&pf=productType^20.0&qf=productType^15.0
>> >
>> > In the debug section this is what I see:-
>> > 
>> > (+(productType:two productType:piec productType:bathtub
>> productType:white)
>> > DisjunctionMaxQuery((productType:"piec bathtub white"^20.0)))/no_coord
>> > 
>> >
>> > My question is related to the "pf" (phrases) section of edismax.
>> > As shown in the debug section why is the phrase taken as "piec bathtub
>> > white". Why is the first word "two" not considered in the phrase fields
>> > section.
>> > I am looking for queries with the words "two piece bathtub white" being
>> > together to be boosted and not "piece bathtub white" only to be boosted.
>> >
>> > Could some one help me understand what I am missing?
>> >
>> > Thanks!
>> > Mark
>>
>
>


Re: Parallel SQL Interface returns "java.lang.NullPointerException" after reloading collection

2016-05-02 Thread Joel Bernstein
Looks like the loop below is throwing a Null pointer. I suspect the
collection has not yet come back online. In theory this should be self
healing and when the collection comes back online it should start working
again. If not then that would be a bug.

for(String col : clusterState.getCollections()) {


Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, May 2, 2016 at 10:06 PM, Ryan Yacyshyn 
wrote:

> Yes stack trace can be found here:
>
> http://pastie.org/10821638
>
>
>
> On Mon, 2 May 2016 at 01:05 Joel Bernstein  wrote:
>
> > Can you post your stack trace? I suspect this has to do with how the
> > Streaming API is interacting with SolrCloud. We can probably also create
> a
> > jira ticket for this.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Sun, May 1, 2016 at 4:02 AM, Ryan Yacyshyn 
> > wrote:
> >
> > > Hi all,
> > >
> > > I'm exploring with parallel SQL queries and found something strange
> after
> > > reloading the collection: the same query will return a
> > > java.lang.NullPointerException error. Here are my steps on a fresh
> > install
> > > of Solr 6.0.0.
> > >
> > > *Start Solr in cloud mode with example*
> > > bin/solr -e cloud -noprompt
> > >
> > > *Index some data*
> > > bin/post -c gettingstarted example/exampledocs/*.xml
> > >
> > > *Send query, which works*
> > > curl --data-urlencode 'stmt=select id,name from gettingstarted where
> > > inStock = true limit 2' http://localhost:8983/solr/gettingstarted/sql
> > >
> > > *Reload the collection*
> > > curl '
> > >
> > >
> >
> http://localhost:8983/solr/admin/collections?action=RELOAD&name=gettingstarted
> > > '
> > >
> > > After reloading, running the exact query above will return the null
> > pointer
> > > exception error. Any idea why?
> > >
> > > If I stop all Solr servers and restart, then it's fine.
> > >
> > > *java -version*
> > > java version "1.8.0_25"
> > > Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
> > > Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
> > >
> > > Thanks,
> > > Ryan
> > >
> >
>


Re: Parallel SQL Interface returns "java.lang.NullPointerException" after reloading collection

2016-05-02 Thread Ryan Yacyshyn
Yes stack trace can be found here:

http://pastie.org/10821638



On Mon, 2 May 2016 at 01:05 Joel Bernstein  wrote:

> Can you post your stack trace? I suspect this has to do with how the
> Streaming API is interacting with SolrCloud. We can probably also create a
> jira ticket for this.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Sun, May 1, 2016 at 4:02 AM, Ryan Yacyshyn 
> wrote:
>
> > Hi all,
> >
> > I'm exploring with parallel SQL queries and found something strange after
> > reloading the collection: the same query will return a
> > java.lang.NullPointerException error. Here are my steps on a fresh
> install
> > of Solr 6.0.0.
> >
> > *Start Solr in cloud mode with example*
> > bin/solr -e cloud -noprompt
> >
> > *Index some data*
> > bin/post -c gettingstarted example/exampledocs/*.xml
> >
> > *Send query, which works*
> > curl --data-urlencode 'stmt=select id,name from gettingstarted where
> > inStock = true limit 2' http://localhost:8983/solr/gettingstarted/sql
> >
> > *Reload the collection*
> > curl '
> >
> >
> http://localhost:8983/solr/admin/collections?action=RELOAD&name=gettingstarted
> > '
> >
> > After reloading, running the exact query above will return the null
> pointer
> > exception error. Any idea why?
> >
> > If I stop all Solr servers and restart, then it's fine.
> >
> > *java -version*
> > java version "1.8.0_25"
> > Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
> > Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
> >
> > Thanks,
> > Ryan
> >
>


Re: Facet ignoring repeated word

2016-05-02 Thread Ahmet Arslan
Hi,

StatsComponent does not respect the query parameter. However you can feed a 
function query (e.g., termfreq) to it.
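
For instance (an untested sketch; 'work' just stands in for whichever term
you want to count, and this needs a reasonably recent Solr, 5.1 or later if
I remember correctly), the sum statistic over termfreq gives the total number
of occurrences across the documents of one question:

# restrict the doc set with fq, then sum the per-document term frequency
http://localhost:8182/solr/dev/select?q=*:*&fq=questionid:123&rows=0&stats=true&stats.field={!func}termfreq(comments,'work')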

Instead consider using TermVectors or MLT's interesting terms.


https://cwiki.apache.org/confluence/display/solr/The+Term+Vector+Component
https://cwiki.apache.org/confluence/display/solr/MoreLikeThis

Ahmet


On Monday, May 2, 2016 9:31 AM, "G, Rajesh"  wrote:
Hi Erick/ Ahmet,

Thanks for your suggestion. Can we have a query in TermsComponent like the one below? I need 
the word count of comments for one question id, not all of them. When I include the query 
q=questionid=123 I still see counts for all

http://localhost:8182/solr/dev/terms?terms.fl=comments=true=1000=questionid=123

StatsComponent does not support text fields:

Field type 
textcloud_en{class=org.apache.solr.schema.TextField,analyzer=org.apache.solr.analysis.TokenizerChain,args={positionIncrementGap=100,
 class=solr.TextField}} is not currently supported


Thanks
Rajesh





-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Friday, April 29, 2016 9:16 PM
To: solr-user ; Ahmet Arslan 
Subject: Re: Facet ignoring repeated word

That's the way faceting is designed to work. It counts the _documents_ that a 
term appears in that satisfy your query, if a word appears multiple times in a 
doc, it'll only count it once.

For the general use-case it'd be unsettling for a user to see a facet count of 
500, then click on it and discover that the number of docs in the corpus was 
really 345 or something.

Ahmet's hints might help, but I'd really ask if counting words multiple times 
really satisfies the use case.

Best,
Erick

On Fri, Apr 29, 2016 at 7:10 AM, Ahmet Arslan  wrote:
> Hi,
>
> Depending on your requirements; StatsComponent, TermsComponent, 
> LukeRequestHandler can also be used.
>
>
> https://cwiki.apache.org/confluence/display/solr/The+Terms+Component
> https://wiki.apache.org/solr/LukeRequestHandler
> https://cwiki.apache.org/confluence/display/solr/The+Stats+Component
> Ahmet
>
>
>
> On Friday, April 29, 2016 11:56 AM, "G, Rajesh"  wrote:
> Hi,
>
> I am trying to implement a word cloud using Solr.  The problem I have is that
> Solr facet queries ignore repeated words in a document, e.g.:
>
> I have indexed the text :
> It seems that the harder I work, the more work I get for the same 
> compensation and reward. The more work I take on gets absorbed into my 
> "normal" workload and I'm not recognized for working harder than my peers, 
> which makes me not want to work to my potential. I am very underwhelmed by 
> the evaluation process and bonus structure. I don't believe the current 
> structure rewards strong performers. I am confident that the company could 
> not hire someone with my talent to replace me if I left, but I don't think 
> the company realizes that.
>
> The indexed content has the word 'my' and its count is 3, but when I run the 
> query 
> http://localhost:8182/solr/dev/select?facet=true=comments=0=on=questionid:3956=json
>  the count of the word 'my' is 1 and not 3. Can you please help?
>
> Also please suggest If there is a better way to implement word cloud in Solr 
> other than using facet?
>
> "facet_fields":{
>   "comments":[
> "absorbed",1,
> "am",1,
> "believe",1,
> "bonus",1,
> "company",1,
> "compensation",1,
> "confident",1,
> "could",1,
> "current",1,
> "don't",1,
> "evaluation",1,
> "get",1,
> "gets",1,
> "harder",1,
> "hire",1,
> "i",1,
> "i'm",1,
> "left",1,
> "makes",1,
> "me",1,
> "more",1,
> "my",1,
> "normal",1,
> 

Re[4]: Block Join faceting on intermediate levels with JSON Facet API (might be related to block join rollups & SOLR-8998)

2016-05-02 Thread Alisa Z .
 >>You could add a "level2_comment_id" field to the level 2 comments and
>>its children, and then use unique() on that.

OK, I see, I missed the children... Thank you for pointing out. 

I have introduced that "unique sub-branch identifying" field and propagated it 
down the subbranch (the data is here: 
https://github.com/alisa-ipn/solr_nesting/blob/master/data/example-data-solr-for-faceting.json).
 Also changed the corresponding part of the post. 

And it actually works. Yet it requires a lot of effort to make Json API 
faceting handle faceting by intermediate levels.  

Making those "unique sub-branch identifying" fields dynamically appear the same 
way as the "_root_" field does would make Solr friendlier to use for nested data 
like email chains and social media data... 

Thanks,
Alisa 

>Friday, April 22, 2016, 13:47 -04:00 from Yonik Seeley :
>
>On Fri, Apr 22, 2016 at 12:26 PM, Alisa Z. < prol...@mail.ru > wrote:
>>  Hi Yonik,
>>
>> Thanks a lot for your response.
>>
>> I have discussed this with Mikhail Khludnev already and tried this 
>> suggestion. Here's what I've got:
>>
>>
>>
>> sentiment: positive
>> author: Bob
>> text: Great post about Solr
>> 2.blog-posts.comments-id: 10735-23004   //this is a 
>> new field, field name is different on each level for each type, values are 
>> unique
>> date: 2015-04-10T11:30:00Z
>> path: 2.blog-posts.comments
>> id: 10735-23004
>> Query:
>> curl http://localhost:8985/solr/solr_nesting_unique/query -d 
>> 'q=path:2.blog-posts.comments=0&
>> json.facet={
>>   filter_by_child_type :{
>> type:query,
>> q:"path:*comments*keywords",
>> domain: { blockChildren : "path:2.blog-posts.comments" },
>> facet:{
>>   top_entity_text : {
>> type: terms,
>> field: text,
>> limit: 10,
>> sort: "counts_by_comments desc",
>> facet: {
>>counts_by_comments: "unique (2.blog-posts.comments-id )"  
>>   // changed
>>  }'
>
>
>Something is wrong if you are getting 0 counts.
>Lets try taking it piece-by-piece:
>
>Step 1:  q=path:2.blog-posts.comments
>This finds level 2 documents
>
>Step 2:  domain: { blockChildren : "path:2.blog-posts.comments" }
>This first maps to  all of the children (level 3 and level4)
>
>Step 3:  q:"path:*comments*keywords"
>This selects a subset of level3 and level4 documents with keywords
>(Note, in the future this should be doable as an additional filter in
>the domain spec, w/o an additional sub-facet level)
>
>Step 4:
>Facet on the text field of those level3 and level4 keyword docs. For
>each bucket, also find the unique number of values in the
>"2.blog-posts.comments-id" field on those documents.
>
>"Without seeing what you indexed, my guess is that the issue is that
>the "2.blog-posts.comments-id" field does not actually exist on those
>level3 and level4 docs being faceted.  The JSON Facet API doesn't
>propagate field values up/down the nested stack yet.  That's what
>https://issues.apache.org/jira/browse/SOLR-8998 is mostly about.
>
>-Yonik
>
>
>>
>> Response:
>>
>> "response":{"numFound":3,"start":0,"docs":[]
>>   },
>>   "facets":{
>> "count":3,
>> "filter_by_child_type":{
>>   "count":9,
>>   "top_entity_text":{
>> "buckets":[{
>> "val":"Elasticsearch",
>> "count":2,
>> "counts_by_comments":0},
>>   {
>> "val":"Solr",
>> "count":5,
>> "counts_by_comments":0},
>>   {
>> "val":"Solr 5.5",
>> "count":1,
>> "counts_by_comments":0},
>>   {
>> "val":"feature",
>> "count":1,
>> "counts_by_comments":0}]
>>
>> So unless I messed something up... or the field name does not look 
>> "canonical" (but it was fast to generate and  it is accepted in a normal 
>> query
>>  http://localhost:8985/solr/solr_nesting_unique/query?q=2.blog-posts.body-id 
>> :* )
>>
>> So I think that it's just a JSON facet API limitation...
>>
>> Best,
>> --Alisa
>>
>>
>>>Friday, April 22, 2016, 9:55 -04:00 from Yonik Seeley < ysee...@gmail.com >:
>>>
>>>Hi Alisa,
>>>This was a bit too hard for me to grok on a first pass... then I saw
>>>your related blog post which includes the actual sample data and makes
>>>it more clear.
>>>
>>> More comments inline:
>>>
>>>On Wed, Apr 20, 2016 at 2:29 PM, Alisa Z. <  prol...@mail.ru > wrote:
  Hi all,

 I have been stretching some SOLR's capabilities for nested documents 
 handling and I've come up with the following issue...

 Let's say I have the following structure:

 {
 "blog-posts":{  //level 1
 "leaf-fields":[
 "date",
 "author"],
 "title":{   //level 2
 "leaf-fields":[ "text"],
 "keywords":{//level 3
 "leaf-fields":[
 "text",

Re: OOM script executed

2016-05-02 Thread Tomás Fernández Löbbe
You could, but before that I'd try to see what's using your memory and see
if you can decrease that. Maybe identify why you are running OOM now and
not with your previous Solr version (assuming you weren't, and that you are
running with the same JVM settings). A bigger heap usually means more work
to the GC and less memory available for the OS cache.
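
If it helps, the standard JDK tools give a quick first impression of where
the heap is going (the pid below is just a placeholder for the live Solr
process):

# top heap consumers by class
jmap -histo:live <solr-pid> | head -25
# heap occupancy and GC activity, sampled every 5 seconds
jstat -gcutil <solr-pid> 5s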

Tomás

On Sun, May 1, 2016 at 11:20 PM, Bastien Latard - MDPI AG <
lat...@mdpi.com.invalid> wrote:

> Hi Guys,
>
> I have had the OOM script executed several times since I upgraded to Solr 6.0:
>
> $ cat solr_oom_killer-8983-2016-04-29_15_16_51.log
> Running OOM killer script for process 26044 for Solr on port 8983
>
> Does it mean that I need to increase my JAVA Heap?
> Or should I do anything else?
>
> Here are some further logs:
> $ cat solr_gc_log_20160502_0730:
> }
> {Heap before GC invocations=1674 (full 91):
>  par new generation   total 1747648K, used 1747135K [0x0005c000,
> 0x00064000, 0x00064000)
>   eden space 1398144K, 100% used [0x0005c000, 0x00061556,
> 0x00061556)
>   from space 349504K,  99% used [0x00061556, 0x00062aa2fc30,
> 0x00062aab)
>   to   space 349504K,   0% used [0x00062aab, 0x00062aab,
> 0x00064000)
>  concurrent mark-sweep generation total 6291456K, used 6291455K
> [0x00064000, 0x0007c000, 0x0007c000)
>  Metaspace   used 39845K, capacity 40346K, committed 40704K, reserved
> 1085440K
>   class spaceused 4142K, capacity 4273K, committed 4368K, reserved
> 1048576K
> 2016-04-29T21:15:41.970+0200: 20356.359: [Full GC (Allocation Failure)
> 2016-04-29T21:15:41.970+0200: 20356.359: [CMS:
> 6291455K->6291456K(6291456K), 12.5694653 secs]
> 8038591K->8038590K(8039104K), [Metaspace: 39845K->39845K(1085440K)],
> 12.5695497 secs] [Times: user=12.57 sys=0.00, real=12.57 secs]
>
>
> Kind regards,
> Bastien
>
>


Re: What does the "Max Doc" mean in Admin interface?

2016-05-02 Thread Tomás Fernández Löbbe
"Max Docs" is confusing. It's not really the maximum number of docs you
can have, it's just the total amount of docs in your index INCLUDING
DELETED DOCS that haven't been cleared by a merge.
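
If you want to see the breakdown for a core, the Luke request handler
reports it (the core name below is a placeholder): numDocs is the live
count, maxDoc includes deletions, and deletedDocs should be the difference.

http://localhost:8983/solr/<core>/admin/luke?numTerms=0&wt=json
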
"Heap Memory Usage" is currently broken. See
https://issues.apache.org/jira/browse/SOLR-7475

On Sun, May 1, 2016 at 11:25 PM, Bastien Latard - MDPI AG <
lat...@mdpi.com.invalid> wrote:

> Hi All,
>
> Everything is in the title...
>
>
> Can this value be modified?
> Or is it because of my environment?
>
> Also, what does "Heap Memory Usage: -1" mean?
>
> Kind regards,
> Bastien Latard
> Web engineer
> --
> MDPI AG
> Postfach, CH-4005 Basel, Switzerland
> Office: Klybeckstrasse 64, CH-4057
> Tel. +41 61 683 77 35
> Fax: +41 61 302 89 18
> E-mail: latard@mdpi.com  http://www.mdpi.com/
>
>


Re: Tuning solr for large index with rapid writes

2016-05-02 Thread Stephen Lewis
Thanks for the good suggestions on read traffic. I have been simulating
reads through parsing our elb logs and replaying them from a fleet of test
servers acting as frontends using Siege.
We are hoping to tune mostly based on exact use case, and so this seems the
most effective route. I see why for the average user experience, 0-hit
queries would provide some better data. Our plan is to start with exact
user patterns and then branch and refine our metrics from there.
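
For what it's worth, the replay itself is nothing fancy; roughly something
like the line below, where urls.txt holds one query URL per line pulled from
the ELB logs, and the concurrency and duration numbers here are made up:

siege -f urls.txt -c 25 -t 15M --internet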

For writes, I am using an index rebuild which we have written. We use this
for building anew or refreshing an existing index in case of changes to our
data model, document structure, schema, etc... It was actually turning on
this rebuild to our main cluster that started edging us toward the
performance limits on writes.

Since I last wrote, we discovered we were garbage-collection limited in our
current cluster. We noticed that when doing writes, especially the large
volume of writes our background rebuild was using, we generally do okay,
but eventually the GC would do a deep pass and we'd see 504 gateway
timeouts. We updated with the settings from Shawn Heisey's
page, and we have only seen
timeouts a couple of times since then (these don't kill the rebuild, they
simply get retried later). I see from you here and on another thread right
now that gc seems to be an area of active discussion.

Best,
Stephen

On Mon, May 2, 2016 at 9:20 AM, Erick Erickson 
wrote:

> Bram:
>
> That works. I try to monitor the number of 0-hit
> queries when I generate a test set on the theory that
> those are _usually_ groups of random terms I've
> selected that aren't a good model. So it's often
> a sequence like "generate my list, see which
> ones give 0 results and remove them". Rinse,
> repeat.
>
> Like you said, imperfect but _loads_ better than
> trying to create them without real user queries
> as guidance...
>
> Best,
> Erick
>
> On Sat, Apr 30, 2016 at 4:19 AM, Bram Van Dam 
> wrote:
> >> If I'm reading this right, you have 420M docs on a single shard?
> >> Yep, you were reading it right.
> >
> > As Erick mentioned, it's hard to give concrete sizing advice, but we've
> > found 120M to be the magic number. When a shard contains more than 120M
> > documents, performance goes down rapidly & GC pauses grow a lot longer.
> > Up until 250M things remain acceptable. But then performance starts to
> > drop very quickly after that.
> >
> >  - Bram
> >
>



-- 
Stephen

(206)753-9320
stephen-lewis.net


Re: Tuning solr for large index with rapid writes

2016-05-02 Thread Erick Erickson
Bram:

That works. I try to monitor the number of 0-hit
queries when I generate a test set on the theory that
those are _usually_ groups of random terms I've
selected that aren't a good model. So it's often
a sequence like "generate my list, see which
ones give 0 results and remove them". Rinse,
repeat.

Like you said, imperfect but _loads_ better than
trying to create them without real user queries
as guidance...

Best,
Erick

On Sat, Apr 30, 2016 at 4:19 AM, Bram Van Dam  wrote:
>> If I'm reading this right, you have 420M docs on a single shard?
>> Yep, you were reading it right.
>
> As Erick mentioned, it's hard to give concrete sizing advice, but we've
> found 120M to be the magic number. When a shard contains more than 120M
> documents, performance goes down rapidly & GC pauses grow a lot longer.
> Up until 250M things remain acceptable. But then performance starts to
> drop very quickly after that.
>
>  - Bram
>


RE: Using updateRequest Processor with DIH

2016-05-02 Thread Davis, Daniel (NIH/NLM) [C]
I don't know whether that works; but you can use the ScriptTransformer with DIH 
to achieve similar results.
I've only used JavaScript (Rhino) scripts, but they worked for me.   

More recently, I've found that most of my transformations can be accomplished 
with the TemplateTransformer.

-Original Message-
From: Jay Potharaju [mailto:jspothar...@gmail.com] 
Sent: Monday, May 02, 2016 1:39 AM
To: solr-user@lucene.apache.org
Subject: Using updateRequest Processor with DIH

Hi,
I was wondering if it is possible to use Update Request Processor with DIH.
I would like to update an index_time field whenever documents are added/updated 
in the collection.
I know that I could easily pass a time stamp which would update the field in my 
collection but I was trying to do it using Request processor.

I tried the following but got an error. Any recommendations on how to use this 
correctly?



index_time 

 
  data-config.xml
update_indextime



Error:
Error from server at unknown UpdateRequestProcessorChain: update_indextime

--
Thanks
Jay


Re: BlockJoinFacetComponent on solr 4.10

2016-05-02 Thread Mikhail Khludnev
Hello,

It needs to be backported.

On Mon, May 2, 2016 at 11:59 AM, tkg_cangkul  wrote:

> Hi, I want to ask a question about using BlockJoinFacetComponent
>
> in Solr 4. How can I use that library on Solr 4.10.4?
> I want to install casebox with Solr 4.10.4 but I have this error.
> [image: error]
>
> When I check solr-core-4.10.4.jar there is no BlockJoinFacetComponent
> class. I found it in solr-core-6.0.0.jar. Is there any other way for me
> to use it on Solr 4?
> Please help
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Sort group.query

2016-05-02 Thread Xam Uo
Hi all,

Is there a way to define a label on group.query (for example something like
group.query={!label=AGroup}={!label=BGroup} and then  sort on
this label?

Thank you,

Regards,

Xam


Re: Streaming expression for suggester

2016-05-02 Thread Pranaya Behera

I can't return other fields in the response if I use SuggestComponent?

On Monday 02 May 2016 08:13 AM, Joel Bernstein wrote:

Sure take a look at the RandomStream. You can copy the basic structure of
it but have it work with the suggester. The link below shows the test cases
as well:

https://github.com/apache/lucene-solr/commit/7b5f12e622f10206f3ab3bf9f79b9727c73c6def

Joel Bernstein
http://joelsolr.blogspot.com/

On Sun, May 1, 2016 at 2:45 PM, Pranaya Behera 
wrote:


Hi Joel,
 If you could point me in the right direction I would like to
take a shot.


On Sunday 01 May 2016 10:38 PM, Joel Bernstein wrote:


This is the type of thing that Streaming Expressions does well, but there
isn't one yet for the suggester. Feel free to add a SuggestStream jira
ticket, it should be very easy to add.


Joel Bernstein
http://joelsolr.blogspot.com/

On Sat, Apr 30, 2016 at 6:30 AM, Pranaya Behera 
wrote:

Hi,

   I have two collections, let's name them A and B. I want the
suggester
to work on both collections while searching from the front-end
application.
In collection A I have 4 different fields. I want to use all of them for
the suggester. Shall I copy them to a new field combining the 4
fields
and use it in the spellcheck component and then use that field for the
suggester?
In collection B I have only 1 field.

When a user searches something in the front-end application, I would like
to
show results from both collections. Would a streaming expression be
a
viable option here? If so, how? I couldn't find any related documentation
for
a suggester streaming expression. If not, then how would I approach
this?






Re: solr sql & streaming

2016-05-02 Thread Joel Bernstein
Great! Feel free to continue posting questions like this to the list as they
come up for streaming and SQL.

Also branch_6x now has improved error handling for streaming & SQL. These
changes will be in Solr 6.1.

In Solr 6.0 the root cause to certain errors was not being propagated all
the way back to the client. In branch_6x this is fixed. You can still see
the root cause in the logs in both Solr 6.0 and 6.1.

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, May 2, 2016 at 9:05 AM, Chaushu, Shani 
wrote:

> It worked!
> thanks
>
> -Original Message-
> From: Joel Bernstein [mailto:joels...@gmail.com]
> Sent: Monday, May 02, 2016 14:39
> To: solr-user@lucene.apache.org
> Subject: Re: solr sql & streaming
>
> Try putting quotes around the fl parameter.
>
> search(collections_test,
>  q="*:*",
>  fl="id,name,inStock",
>  sort="id asc")
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Mon, May 2, 2016 at 2:09 AM, Chaushu, Shani 
> wrote:
>
> > I tried 2 examples:
> >
> > curl -id 'expr=search(collections_test, q="*:*",fl=id,name,inStock,
> > sort="id asc")' http://localhost:8983/solr/collections_test/stream
> >
> > curl http://localhost:8983/solr/collections_test/stream -d
> > 'expr=reduce(search(collections_test, q="*:*",fl=id,name,inStock,
> > sort="id
> > asc") , by="id",group(sord="id asc",n="2"))'
> >
> > -Original Message-
> > From: Joel Bernstein [mailto:joels...@gmail.com]
> > Sent: Monday, May 02, 2016 05:28
> > To: solr-user@lucene.apache.org
> > Subject: Re: solr sql & streaming
> >
> > It appears that you are not formatting the streaming expression properly.
> > Can you post your entire http request?
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Sun, May 1, 2016 at 2:01 PM, Chaushu, Shani
> > 
> > wrote:
> >
> > > Yes I'm running in solr cloud mode.
> > > I managed to make the query work with sql queries, but when I'm
> > > trying to run it with stream request, I get an error When I try to
> > > run
> > > expr=search:
> > >
> > > "Unable to construct instance of
> > > org.apache.solr.client.solrj.io.stream.CloudSolrStream
> > >
> > > When I try to run expr=reduce:
> > > org.apache.solr.client.solrj.io.stream.ReducerStream
> > >
> > >
> > > Any thoughts?
> > >
> > >
> > > -Original Message-
> > > From: Emir Arnautovic [mailto:emir.arnauto...@sematext.com]
> > > Sent: Thursday, April 28, 2016 15:32
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: solr sql & streaming
> > >
> > > Hi Shani,
> > > Are you running in SolrCloud mode? Here is blog post you can follow:
> > > https://sematext.com/blog/2016/04/18/solr-6-solrcloud-sql-support/
> > >
> > > Thanks,
> > > Emir
> > >
> > > --
> > > Monitoring * Alerting * Anomaly Detection * Centralized Log
> > > Management Solr & Elasticsearch Support * http://sematext.com/
> > >
> > >
> > >
> > > On 28.04.2016 13:45, Chaushu, Shani wrote:
> > > > Hi,
> > > > I installed solr 6 and try to run /sql and /stream request follow
> > > > to
> > > this wiki
> > > https://cwiki.apache.org/confluence/display/solr/Parallel+SQL+Interf
> > > ac
> > > e
> > > > I saw in changes list that it doesn't need request handler
> > > configuration, but when I try to access I get the following message:
> > > > Error 404 Not Found
> > > > HTTP ERROR 404
> > > > Problem accessing /solr/collection_test/sql. Reason:
> > > > Not Found
> > > >
> > > > My request was
> > > >
> > > > curl --data-urlencode 'stmt=SELECT author, count(*) FROM
> > > > collection_test
> > > GROUP BY author ORDER BY count(*) desc'
> > > http://localhost:8983/solr/collection_test/sql?aggregationMode=facet
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >

RE: solr sql & streaming

2016-05-02 Thread Chaushu, Shani
It worked!
thanks

-Original Message-
From: Joel Bernstein [mailto:joels...@gmail.com] 
Sent: Monday, May 02, 2016 14:39
To: solr-user@lucene.apache.org
Subject: Re: solr sql & streaming

Try putting quotes around the fl parameter.

search(collections_test,
 q="*:*",
 fl="id,name,inStock",
 sort="id asc")

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, May 2, 2016 at 2:09 AM, Chaushu, Shani 
wrote:

> I tried 2 examples:
>
> curl -id 'expr=search(collections_test, q="*:*",fl=id,name,inStock, 
> sort="id asc")' http://localhost:8983/solr/collections_test/stream
>
> curl http://localhost:8983/solr/collections_test/stream -d 
> 'expr=reduce(search(collections_test, q="*:*",fl=id,name,inStock, 
> sort="id
> asc") , by="id",group(sord="id asc",n="2"))'
>
> -Original Message-
> From: Joel Bernstein [mailto:joels...@gmail.com]
> Sent: Monday, May 02, 2016 05:28
> To: solr-user@lucene.apache.org
> Subject: Re: solr sql & streaming
>
> It appears that you are not formatting the streaming expression properly.
> Can you post your entire http request?
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Sun, May 1, 2016 at 2:01 PM, Chaushu, Shani 
> 
> wrote:
>
> > Yes I'm running in solr cloud mode.
> > I managed to make the query work with sql queries, but when I'm 
> > trying to run it with stream request, I get an error When I try to 
> > run
> > expr=search:
> >
> > "Unable to construct instance of
> > org.apache.solr.client.solrj.io.stream.CloudSolrStream
> >
> > When I try to run expr=reduce:
> > org.apache.solr.client.solrj.io.stream.ReducerStream
> >
> >
> > Any thoughts?
> >
> >
> > -Original Message-
> > From: Emir Arnautovic [mailto:emir.arnauto...@sematext.com]
> > Sent: Thursday, April 28, 2016 15:32
> > To: solr-user@lucene.apache.org
> > Subject: Re: solr sql & streaming
> >
> > Hi Shani,
> > Are you running in SolrCloud mode? Here is blog post you can follow:
> > https://sematext.com/blog/2016/04/18/solr-6-solrcloud-sql-support/
> >
> > Thanks,
> > Emir
> >
> > --
> > Monitoring * Alerting * Anomaly Detection * Centralized Log 
> > Management Solr & Elasticsearch Support * http://sematext.com/
> >
> >
> >
> > On 28.04.2016 13:45, Chaushu, Shani wrote:
> > > Hi,
> > > I installed solr 6 and try to run /sql and /stream request follow 
> > > to
> > this wiki
> > https://cwiki.apache.org/confluence/display/solr/Parallel+SQL+Interf
> > ac
> > e
> > > I saw in changes list that it doesn't need request handler
> > configuration, but when I try to access I get the following message:
> > > Error 404 Not Found
> > > HTTP ERROR 404
> > > Problem accessing /solr/collection_test/sql. Reason:
> > > Not Found
> > >
> > > My request was
> > >
> > > curl --data-urlencode 'stmt=SELECT author, count(*) FROM 
> > > collection_test
> > GROUP BY author ORDER BY count(*) desc'
> > http://localhost:8983/solr/collection_test/sql?aggregationMode=facet
> > >
> > >
> > >
> > >
> > >
> > >


Include and exclude feature with multi valued fields

2016-05-02 Thread Anil
HI,

I have created a document with multi-valued fields.

E.g.:
An issue is impacting multiple customers, products, versions etc.

In my issue document, I have created customers, products, versions as multi-
valued fields.

How can I find all issues that are impacting Google (customer) but not
Facebook (customer)?

Google and Facebook can both be part of a single issue document.

Please let me know if you have any questions. Thanks.

Regards,
Anil


Re: query logging using query rest api

2016-05-02 Thread Vincenzo D'Amore
Hi Matteo, take a look at:

https://cwiki.apache.org/confluence/display/solr/Configuring+Logging

And also the comments:

https://cwiki.apache.org/confluence/display/solr/Configuring+Logging#comment-thread-51808825

I have not tried but it seems interesting.


On Mon, May 2, 2016 at 11:45 AM, Vincenzo D'Amore 
wrote:

> Hi Matteo,
>
> trying a few of queries, using POST and GET method, you can see query
> params in logs appears only when you pass them in query string along with
> HTTP GET method.
>
> curl 'localhost:8983/solr/test/query?q=*:*'
>
> The reason of this behaviour (I think) is in the big difference between
> the size of parameters (number and length) usually used in HTTP GET (short)
> and HTTP POST (big).
>
> Let me see if we can change such log behaviour, but I suppose it could be
> quite dangerous...
>
>
>
>
>
>
> On Mon, May 2, 2016 at 11:21 AM, Matteo Grolla 
> wrote:
>
>> Hi Vincenzo,
>>  you're right -XGET shouldn't be there but curl is smart enough to
>> ignore it so nothing changes eliminating it, tested.
>>
>> 2016-04-28 11:28 GMT+02:00 Vincenzo D'Amore :
>>
>> > Hi Matteo,
>> >
>> > there is a problem in your curl test: as far as I know you cannot use
>> GET
>> > HTTP method ( -XGET ) and pass parameters in POST (-d).
>> >
>> > Try to remove the -XGET parameter.
>> >
>> > On Thu, Apr 28, 2016 at 11:18 AM, Matteo Grolla <
>> matteo.gro...@gmail.com>
>> > wrote:
>> >
>> > > Hi,
>> > > I'm experimenting the query rest api with solr 5.4 and I'm
>> noticing
>> > > that query parameters are not logged in solr.log.
>> > > Here are query and log line
>> > >
>> > > curl -XGET 'localhost:8983/solr/test/query' -d '{"query":"*:*"}'
>> > >
>> > > 2016-04-28 09:16:54.008 INFO  (qtp668849042-17) [   x:test]
>> > > o.a.s.c.S.Request [test] webapp=/solr path=/query params={} hits=32
>> > > status=0 QTime=46
>> > >
>> > > Why this behaviour? How can I log request parameters for those
>> queries?
>> > > Thanks
>> > >
>> > > Matteo
>> > >
>> >
>> >
>> >
>> > --
>> > Vincenzo D'Amore
>> > email: v.dam...@gmail.com
>> > skype: free.dev
>> > mobile: +39 349 8513251
>> >
>>
>
>
>
> --
> Vincenzo D'Amore
> email: v.dam...@gmail.com
> skype: free.dev
> mobile: +39 349 8513251
>



-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251


Re: solr sql & streaming

2016-05-02 Thread Joel Bernstein
Try putting quotes around the fl parameter.

search(collections_test,
 q="*:*",
 fl="id,name,inStock",
 sort="id asc")

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, May 2, 2016 at 2:09 AM, Chaushu, Shani 
wrote:

> I tried 2 examples:
>
> curl -id 'expr=search(collections_test, q="*:*",fl=id,name,inStock,
> sort="id asc")' http://localhost:8983/solr/collections_test/stream
>
> curl http://localhost:8983/solr/collections_test/stream -d
> 'expr=reduce(search(collections_test, q="*:*",fl=id,name,inStock, sort="id
> asc") , by="id",group(sord="id asc",n="2"))'
>
> -Original Message-
> From: Joel Bernstein [mailto:joels...@gmail.com]
> Sent: Monday, May 02, 2016 05:28
> To: solr-user@lucene.apache.org
> Subject: Re: solr sql & streaming
>
> It appears that you are not formatting the streaming expression properly.
> Can you post your entire http request?
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Sun, May 1, 2016 at 2:01 PM, Chaushu, Shani 
> wrote:
>
> > Yes I'm running in solr cloud mode.
> > I managed to make the query work with sql queries, but when I'm trying
> > to run it with stream request, I get an error When I try to run
> > expr=search:
> >
> > "Unable to construct instance of
> > org.apache.solr.client.solrj.io.stream.CloudSolrStream
> >
> > When I try to run expr=reduce:
> > org.apache.solr.client.solrj.io.stream.ReducerStream
> >
> >
> > Any thoughts?
> >
> >
> > -Original Message-
> > From: Emir Arnautovic [mailto:emir.arnauto...@sematext.com]
> > Sent: Thursday, April 28, 2016 15:32
> > To: solr-user@lucene.apache.org
> > Subject: Re: solr sql & streaming
> >
> > Hi Shani,
> > Are you running in SolrCloud mode? Here is blog post you can follow:
> > https://sematext.com/blog/2016/04/18/solr-6-solrcloud-sql-support/
> >
> > Thanks,
> > Emir
> >
> > --
> > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > Solr & Elasticsearch Support * http://sematext.com/
> >
> >
> >
> > On 28.04.2016 13:45, Chaushu, Shani wrote:
> > > Hi,
> > > I installed solr 6 and try to run /sql and /stream request follow to
> > this wiki
> > https://cwiki.apache.org/confluence/display/solr/Parallel+SQL+Interfac
> > e
> > > I saw in changes list that it doesn't need request handler
> > configuration, but when I try to access I get the following message:
> > > 
> > > 
> > > 
> > > Error 404 Not Found
> > > 
> > > HTTP ERROR 404
> > > Problem accessing /solr/collection_test/sql. Reason:
> > > Not Found
> > > 
> > > 
> > >
> > > My request was
> > >
> > > curl --data-urlencode 'stmt=SELECT author, count(*) FROM
> > > collection_test
> > GROUP BY author ORDER BY count(*) desc'
> > http://localhost:8983/solr/collection_test/sql?aggregationMode=facet
> > >
> > >
> > >
> > >
> > >
> > >
> > > 
>


Bringing Old Collections Up Again

2016-05-02 Thread Salman Ansari
Hi,

I am hosting Zookeeper ensemble and Solr servers on Microsoft cloud
(Azure). From time to time machines are forced to restart to install
updates. Recently, this happened again and it caused Zookeeper ensemble and
Solr instances to go down. When the machines came back up again, I tried
the following:

1) Started Zookeeper on all machines using the following command
zkServer.cmd (on all three machines)

2) Started Solr on two of those machines using

solr.cmd start -c -p 8983 -h [server1_name] -z
"[server1_ip]:2181,[server2_name]:2181,[server3_name]:2181"
solr.cmd start -c -p 8983 -h [server2_name] -z
"[server2_ip]:2181,[server1_name]:2181,[server3_name]:2181"
solr.cmd start -c -p 7574 -h [server1_name] -z
"[server1_ip]:2181,[server2_name]:2181,[server3_name]:2181"
solr.cmd start -c -p 7574 -h [server2_name] -z
"[server2_ip]:2181,[server1_name]:2181,[server3_name]:2181"

After several trials, it did start Solr on both machines but *none of the
previous collections came back normally.* When I look at the admin page, it
shows errors as follows

*[Collection_name]_shard2_replica2:*
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
Index locked for write for core '[Collection_name]_shard2_replica2'. Solr
now longer supports forceful unlocking via 'unlockOnStartup'. Please verify
locks manually!

So probably I am doing something wrong or there is a different way of
bringing old collections up.

Appreciate your comments/feedback regarding this.

Regards,
Salman


EmbeddedSolrServer Loading Core Containers Solr 4.3.1

2016-05-02 Thread SRINI SOLR
Hi Team -
I am using Solr 4.3.1.

We are using this EmbeddedSolrServer to load Core Containers in one of our
Java applications.

This is set up as a cron job every 1 hour to load the new data onto the
containers.

Otherwise, the new data does not get loaded into the containers if we
access them from the Java application, even after re-indexing.

Please help here to resolve the issue ...?


Results of facet differs with change in facet.limit.

2016-05-02 Thread Modassar Ather
Hi,

I have a field f which is defined as follows on solr 5.x. It is 12 shard
cluster with no replica.

<field name="f" ... stored="false" indexed="false" docValues="true"/>

When I facet on this field with different facet.limit I get different facet
count.

E.g.
Query : text_field:term&facet.field=f&facet.limit=100
Result :
1225
1082
1076

Query : text_field:term&facet.field=f&facet.limit=200
1366
1321
1315

I am noticing fewer documents in the facets, whereas the numFound during search
is higher. Please refer to the following query for details.

Query : text_field:term&facet.field=f
Result :
1225
1082
1076

Query : text_field:term AND f:val1
Result: numFound=1366

Kindly help me understand this behavior or let me know if it is an issue.

Thanks,
Modassar


Re: query logging using query rest api

2016-05-02 Thread Vincenzo D'Amore
Hi Matteo,

trying a few queries using the POST and GET methods, you can see that query
params appear in the logs only when you pass them in the query string with the
HTTP GET method.

curl 'localhost:8983/solr/test/query?q=*:*'

The reason for this behaviour (I think) is the big difference between the
size of parameters (number and length) usually used in HTTP GET (short) and
HTTP POST (big).

Let me see if we can change such log behaviour, but I suppose it could be
quite dangerous...






On Mon, May 2, 2016 at 11:21 AM, Matteo Grolla 
wrote:

> Hi Vincenzo,
>  you're right -XGET shouldn't be there but curl is smart enough to
> ignore it so nothing changes eliminating it, tested.
>
> 2016-04-28 11:28 GMT+02:00 Vincenzo D'Amore :
>
> > Hi Matteo,
> >
> > there is a problem in your curl test: as far as I know you cannot use GET
> > HTTP method ( -XGET ) and pass parameters in POST (-d).
> >
> > Try to remove the -XGET parameter.
> >
> > On Thu, Apr 28, 2016 at 11:18 AM, Matteo Grolla  >
> > wrote:
> >
> > > Hi,
> > > I'm experimenting the query rest api with solr 5.4 and I'm noticing
> > > that query parameters are not logged in solr.log.
> > > Here are query and log line
> > >
> > > curl -XGET 'localhost:8983/solr/test/query' -d '{"query":"*:*"}'
> > >
> > > 2016-04-28 09:16:54.008 INFO  (qtp668849042-17) [   x:test]
> > > o.a.s.c.S.Request [test] webapp=/solr path=/query params={} hits=32
> > > status=0 QTime=46
> > >
> > > Why this behaviour? How can I log request parameters for those queries?
> > > Thanks
> > >
> > > Matteo
> > >
> >
> >
> >
> > --
> > Vincenzo D'Amore
> > email: v.dam...@gmail.com
> > skype: free.dev
> > mobile: +39 349 8513251
> >
>



-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251


Re: Problem in Issuing a Command to Upload Configuration

2016-05-02 Thread Salman Ansari
Well, that just happened! Solr and Zookeeper machines faced a forced
restart to install Windows Updates. This caused Zookeeper ensemble and Solr
instances to go down. When the machines came back up again, I tried the
following:

1) Started Zookeeper on all machines using the following command
zkServer.cmd (on all three machines)

2) Started Solr on two of those machines using

solr.cmd start -c -p 8983 -h [server1_name] -z
"[server1_ip]:2181,[server2_name]:2181,[server3_name]:2181"
solr.cmd start -c -p 8983 -h [server2_name] -z
"[server2_ip]:2181,[server1_name]:2181,[server3_name]:2181"
solr.cmd start -c -p 7574 -h [server1_name] -z
"[server1_ip]:2181,[server2_name]:2181,[server3_name]:2181"
solr.cmd start -c -p 7574 -h [server2_name] -z
"[server2_ip]:2181,[server1_name]:2181,[server3_name]:2181"

After several trials, it did start Solr on both machines, but *none of
the previous collections came back normally.* When I look at the admin
page, it shows errors as follows

*[Collection_name]_shard2_replica2:*
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
Index locked for write for core '[Collection_name]_shard2_replica2'. Solr
now longer supports forceful unlocking via 'unlockOnStartup'. Please verify
locks manually!

So probably I am doing something wrong or the simple scenario is not
straightforward to recover from.

Your comment/feedback is appreciated.

Regards,
Salman



On Thu, Apr 7, 2016 at 3:56 PM, Shawn Heisey  wrote:

> On 4/7/2016 5:40 AM, Salman Ansari wrote:
> > Any comments regarding the issue I mentioned above "the proper procedure
> of
> > bringing old collections up after a restart of zookeeper ensemble and
> Solr
> > instances"?
>
> What precisely do you mean by "old collections"?  The simplest
> interpretation of that is that you are trying to restart your servers
> and have everything you already had in the cloud work properly.  An
> alternate interpretation, which might be just as valid, is that you have
> some collections on some old servers that you want to incorporate into a
> new cloud.
>
> If it's the simple scenario: shut down solr, shut down zookeeper, start
> zookeeper, start solr.  If it's the other scenario, that is not quite so
> simple.
>
> Thanks,
> Shawn
>
>


Re: query logging using query rest api

2016-05-02 Thread Matteo Grolla
Hi Vincenzo,
 you're right -XGET shouldn't be there but curl is smart enough to
ignore it so nothing changes eliminating it, tested.

2016-04-28 11:28 GMT+02:00 Vincenzo D'Amore :

> Hi Matteo,
>
> there is a problem in your curl test: as far as I know you cannot use GET
> HTTP method ( -XGET ) and pass parameters in POST (-d).
>
> Try to remove the -XGET parameter.
>
> On Thu, Apr 28, 2016 at 11:18 AM, Matteo Grolla 
> wrote:
>
> > Hi,
> > I'm experimenting the query rest api with solr 5.4 and I'm noticing
> > that query parameters are not logged in solr.log.
> > Here are query and log line
> >
> > curl -XGET 'localhost:8983/solr/test/query' -d '{"query":"*:*"}'
> >
> > 2016-04-28 09:16:54.008 INFO  (qtp668849042-17) [   x:test]
> > o.a.s.c.S.Request [test] webapp=/solr path=/query params={} hits=32
> > status=0 QTime=46
> >
> > Why this behaviour? How can I log request parameters for those queries?
> > Thanks
> >
> > Matteo
> >
>
>
>
> --
> Vincenzo D'Amore
> email: v.dam...@gmail.com
> skype: free.dev
> mobile: +39 349 8513251
>


Re: bf calculation

2016-05-02 Thread Georg Sorst
Hi Jan,

have you tried Solr's debug output? I.e. add
"...&debug=true&debugQuery=true" to your query. This should
answer your question.
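
For example (the collection, query and qf below are made up; the bf is the
one from your mail):

curl 'http://localhost:8983/solr/products/select?q=shoes&defType=edismax&qf=name&bf=field(productranking)&debugQuery=true'

The "explain" section of the response shows how the 0.7954515 is built up,
including the FunctionQuery contribution from productranking.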

Best,
Georg

Jan Verweij - Reeleez  wrote on Mon., May 2, 2016 at
09:47:

> Hi,
> I'm trying to understand the exact calculation that takes place when using
> edismax and the bf parameter.
> When searching I get a product returned with a score of 0.625
> Now, I have a field called productranking with a value of 0.5 for this
> specific
> product. If I add bf=field(productranking) to the request the score
> becomes 0.7954515
> How is this calculated?
> Cheers,
> Jan Verweij

-- 
*Georg M. Sorst I CTO*
FINDOLOGIC GmbH



Jakob-Haringer-Str. 5a | 5020 Salzburg I T.: +43 662 456708
E.: g.so...@findologic.com
www.findologic.com  Follow us on: XING
facebook
 Twitter

See you at the *Shopware Community Day in Ahaus on 20.05.2016!* Arrange an
appointment here!
See you at *dmexco in Cologne on 14.09. and 15.09.2016!* Arrange an
appointment here!
vereinbaren!


BlockJoinFacetComponent on solr 4.10

2016-05-02 Thread tkg_cangkul

Hi, I want to ask a question about using


   BlockJoinFacetComponent

in Solr 4. How can I use that library on Solr 4.10.4?
I want to install casebox with Solr 4.10.4 but I have this error.

[image: error]

When I check solr-core-4.10.4.jar there is no BlockJoinFacetComponent 
class. I found it in solr-core-6.0.0.jar. Is there any other way for 
me to use it on Solr 4?

Please help


BlockJoinFacetComponent on solr 4.10

2016-05-02 Thread tkg_cangkul

Hi, I want to ask a question about using


   BlockJoinFacetComponent

in Solr 4. How can I use that library on Solr 4.10.4?
I want to install casebox with Solr 4.10.4 but I have this error.

[image: error]

When I check solr-core-4.10.4.jar there is no BlockJoinFacetComponent 
class. I found it in solr-core-6.0.0.jar. Is there any way for me to 
use it on Solr 4?

Please help


Re: Using updateRequest Processor with DIH

2016-05-02 Thread Alexandre Rafalovitch
You just need to setup the full chain, not a single processor.

Regards,
Alex
On 2 May 2016 3:39 pm, "Jay Potharaju"  wrote:

> Hi,
> I was wondering if it is possible to use Update Request Processor with DIH.
> I would like to update an index_time field whenever documents are
> added/updated in the collection.
> I know that I could easily pass a time stamp which would update the field
> in my collection but I was trying to do it using Request processor.
>
> I tried the following but got an error. Any recommendations on how to use
> this correctly?
>
>
> [solrconfig.xml snippet; the XML tags were stripped by the mail archive. It
> defined something named "update_indextime" containing the field index_time,
> plus a request handler pointing at data-config.xml with update.chain set to
> update_indextime.]
>
> Error:
> Error from server at unknown UpdateRequestProcessorChain: update_indextime
>
> --
> Thanks
> Jay
>


bf calculation

2016-05-02 Thread Jan Verweij - Reeleez

Hi,
I'm trying to understand the exact calculation that takes place when using
edismax and the bf parameter.
When searching I get a product returned with a score of 0.625
Now, I have a field called productranking with a value of 0.5 for this specific
product. If I add &bf=field(productranking) to the request the score becomes 
0.7954515
How is this calculated?
Cheers,
Jan Verweij

RE: Facet ignoring repeated word

2016-05-02 Thread G, Rajesh
Hi Erick/ Ahmet,

Thanks for your suggestion. Can we add a query to a TermsComponent request, like 
the one below? I need the word counts of the comments for one question id, not 
for all of them. When I include the query q=questionid=123 I still see the counts 
for all documents:

http://localhost:8182/solr/dev/terms?terms.fl=comments&terms=true&terms.limit=1000&q=questionid=123

StatsComponent does not support text fields:

Field type textcloud_en{class=org.apache.solr.schema.TextField,analyzer=org.apache.solr.analysis.TokenizerChain,args={positionIncrementGap=100, class=solr.TextField}} is not currently supported

Thanks
Rajesh




-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Friday, April 29, 2016 9:16 PM
To: solr-user ; Ahmet Arslan 
Subject: Re: Facet ignoring repeated word

That's the way faceting is designed to work. It counts the _documents_ that a 
term appears in that satisfy your query; if a word appears multiple times in a 
doc, it'll only count it once.

For the general use-case it'd be unsettling for a user to see a facet count of 
500, then click on it and discover that the number of docs in the corpus was 
really 345 or something.

Ahmet's hints might help, but I'd really ask if counting words multiple times 
really satisfies the use case.

Best,
Erick

On Fri, Apr 29, 2016 at 7:10 AM, Ahmet Arslan  wrote:
> Hi,
>
> Depending on your requirements; StatsComponent, TermsComponent, 
> LukeRequestHandler can also be used.
>
>
> https://cwiki.apache.org/confluence/display/solr/The+Terms+Component
> https://wiki.apache.org/solr/LukeRequestHandler
> https://cwiki.apache.org/confluence/display/solr/The+Stats+Component
> Ahmet
>
>
>
> On Friday, April 29, 2016 11:56 AM, "G, Rajesh"  wrote:
> Hi,
>
> I am trying to implement a word cloud using Solr. The problem I have is that 
> the Solr facet query ignores repeated words in a document, e.g.
>
> I have indexed the text :
> It seems that the harder I work, the more work I get for the same 
> compensation and reward. The more work I take on gets absorbed into my 
> "normal" workload and I'm not recognized for working harder than my peers, 
> which makes me not want to work to my potential. I am very underwhelmed by 
> the evaluation process and bonus structure. I don't believe the current 
> structure rewards strong performers. I am confident that the company could 
> not hire someone with my talent to replace me if I left, but I don't think 
> the company realizes that.
>
> The indexed content has the word "my" with a count of 3, but when I run the 
> query 
> http://localhost:8182/solr/dev/select?facet=true&facet.field=comments&rows=0&indent=on&q=questionid:3956&wt=json
> the count of the word "my" is 1 and not 3. Can you please help?
>
> Also please suggest If there is a better way to implement word cloud in Solr 
> other than using facet?
>
> "facet_fields":{
>   "comments":[
> "absorbed",1,
> "am",1,
> "believe",1,
> "bonus",1,
> "company",1,
> "compensation",1,
> "confident",1,
> "could",1,
> "current",1,
> "don't",1,
> "evaluation",1,
> "get",1,
> "gets",1,
> "harder",1,
> "hire",1,
> "i",1,
> "i'm",1,
> "left",1,
> "makes",1,
> "me",1,
> "more",1,
> "my",1,
> "normal",1,
> "peers",1,
> "performers",1,
> "potential",1,
> "process",1,
> "realizes",1,
> "recognized",1,
> "replace",1,
> "reward",1,
> "rewards",1,
> "same",1,
> "seems",1,
> "someone",1,
> "strong",1,
> "structure",1,
> "take",1,
> "talent",1,
> "than",1,
> "think",1,
> 

What does the "Max Doc" means in Admin interface?

2016-05-02 Thread Bastien Latard - MDPI AG

Hi All,

Everything is in the title...


Can this value be modified?
Or is it because of my environment?

Also, what does "Heap Memory Usage: -1" mean?

Kind regards,
Bastien Latard
Web engineer
--
MDPI AG
Postfach, CH-4005 Basel, Switzerland
Office: Klybeckstrasse 64, CH-4057
Tel. +41 61 683 77 35
Fax: +41 61 302 89 18
E-mail:
lat...@mdpi.com
http://www.mdpi.com/



OOM script executed

2016-05-02 Thread Bastien Latard - MDPI AG

Hi Guys,

The OOM killer script has run several times since I upgraded to Solr 6.0:

$ cat solr_oom_killer-8983-2016-04-29_15_16_51.log
Running OOM killer script for process 26044 for Solr on port 8983

Does it mean that I need to increase my Java heap?
Or should I do something else?

Here are some further logs:
$ cat solr_gc_log_20160502_0730:
}
{Heap before GC invocations=1674 (full 91):
 par new generation   total 1747648K, used 1747135K 
[0x0005c000, 0x00064000, 0x00064000)
  eden space 1398144K, 100% used [0x0005c000, 
0x00061556, 0x00061556)
  from space 349504K,  99% used [0x00061556, 
0x00062aa2fc30, 0x00062aab)
  to   space 349504K,   0% used [0x00062aab, 
0x00062aab, 0x00064000)
 concurrent mark-sweep generation total 6291456K, used 6291455K 
[0x00064000, 0x0007c000, 0x0007c000)
 Metaspace   used 39845K, capacity 40346K, committed 40704K, 
reserved 1085440K
  class spaceused 4142K, capacity 4273K, committed 4368K, reserved 
1048576K
2016-04-29T21:15:41.970+0200: 20356.359: [Full GC (Allocation Failure) 
2016-04-29T21:15:41.970+0200: 20356.359: [CMS: 
6291455K->6291456K(6291456K), 12.5694653 secs] 
8038591K->8038590K(8039104K), [Metaspace: 39845K->39845K(1085440K)], 
12.5695497 secs] [Times: user=12.57 sys=0.00, real=12.57 secs]
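The old generation in this log is completely full (6291455K of 6291456K), so the
JVM really did run out of heap. A common first step, offered only as a sketch and
not a guaranteed fix, is to start Solr with a heap larger than the roughly 8 GB
this log implies, either on the command line or permanently in solr.in.sh:

  # one-off, when starting by hand (12g is an arbitrary example value)
  bin/solr start -m 12g

  # or permanently, in solr.in.sh (solr.in.cmd on Windows)
  SOLR_HEAP="12g"

Whether more heap is the right answer depends on what is actually using the
memory (caches, faceting, large result windows), so it is worth checking that
before simply raising the limit.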



Kind regards,
Bastien



RE: solr sql & streaming

2016-05-02 Thread Chaushu, Shani
I tried 2 examples:

curl -id 'expr=search(collections_test, q="*:*",fl=id,name,inStock, sort="id 
asc")' http://localhost:8983/solr/collections_test/stream

curl http://localhost:8983/solr/collections_test/stream -d 
'expr=reduce(search(collections_test, q="*:*",fl=id,name,inStock, sort="id 
asc") , by="id",group(sord="id asc",n="2"))'

-Original Message-
From: Joel Bernstein [mailto:joels...@gmail.com] 
Sent: Monday, May 02, 2016 05:28
To: solr-user@lucene.apache.org
Subject: Re: solr sql & streaming

It appears that you are not formatting the streaming expression properly.
Can you post your entire http request?

Joel Bernstein
http://joelsolr.blogspot.com/

On Sun, May 1, 2016 at 2:01 PM, Chaushu, Shani 
wrote:

> Yes I'm running in solr cloud mode.
> I managed to make the query work with SQL queries, but when I try to run 
> it with a stream request I get an error. When I try to run 
> expr=search:
>
> "Unable to construct instance of
> org.apache.solr.client.solrj.io.stream.CloudSolrStream
>
> When I try to run expr=reduce:
> org.apache.solr.client.solrj.io.stream.ReducerStream
>
>
> Any thoughts?
>
>
> -Original Message-
> From: Emir Arnautovic [mailto:emir.arnauto...@sematext.com]
> Sent: Thursday, April 28, 2016 15:32
> To: solr-user@lucene.apache.org
> Subject: Re: solr sql & streaming
>
> Hi Shani,
> Are you running in SolrCloud mode? Here is blog post you can follow:
> https://sematext.com/blog/2016/04/18/solr-6-solrcloud-sql-support/
>
> Thanks,
> Emir
>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management 
> Solr & Elasticsearch Support * http://sematext.com/
>
>
>
> On 28.04.2016 13:45, Chaushu, Shani wrote:
> > Hi,
> > I installed solr 6 and try to run /sql and /stream request follow to
> this wiki
> https://cwiki.apache.org/confluence/display/solr/Parallel+SQL+Interfac
> e
> > I saw in changes list that it doesn't need request handler
> configuration, but when I try to acces I get the following message:
> > 
> > 
> > 
> > Error 404 Not Found
> > 
> > HTTP ERROR 404
> > Problem accessing /solr/collection_test/sql. Reason:
> > Not Found
> > 
> > 
> >
> > My request was
> >
> > curl --data-urlencode 'stmt=SELECT author, count(*) FROM 
> > collection_test
> GROUP BY author ORDER BY count(*) desc'
> http://localhost:8983/solr/collection_test/sql?aggregationMode=facet
> >
> >
> >
> >
> >
> >
> > 
> >
>
>