Re: Error casting to PointField

2018-09-11 Thread Erick Erickson
People usually just use a string field in place of longs etc.
On Tue, Sep 11, 2018 at 9:15 PM Zahra Aminolroaya
 wrote:
>
> Thanks Erick. We used to use TrieLongField for our unique id and in the
> document it is said that all Trie* fieldtypes are casting to
> *pointfieldtypes. What would be the alternative solution?
>
>
>
> Best,
>
> Zahra
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
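
For illustration, a minimal sketch of what Erick's suggestion could look like
in the schema -- the field name and type definition below are generic, not
taken from this thread:

  <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
  <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
  <uniqueKey>id</uniqueKey>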


RE: 6.x to 7.x differences

2018-09-11 Thread Preeti Bhat
Hi John,

Please check the solrQueryParser option; it was removed in version 7.4, so you
will need to provide the default operator (AND) in solrconfig.xml or pass the
q.op option while querying to solve this problem. By default Solr applies an
"OR" operation, leading to too many results.

Old Way: In managed-schema or schema.xml

New Way: in solrconfig.xml
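The XML snippets above were stripped when the message was archived. A minimal
sketch of what the two configurations typically look like -- the element names
below are the standard Solr ones, reconstructed rather than recovered from the
original mail:

Old way, in managed-schema or schema.xml:

  <solrQueryParser defaultOperator="AND"/>

New way, in solrconfig.xml, as a default on the search request handler:

  <lst name="defaults">
    <str name="q.op">AND</str>
  </lst>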

Thanks and Regards,
Preeti Bhat

-Original Message-
From: John Blythe [mailto:johnbly...@gmail.com]
Sent: Wednesday, September 12, 2018 8:02 AM
To: solr-user@lucene.apache.org
Subject: 6.x to 7.x differences

hi, all.

we recently migrated to cloud. part of that migration jumped us from 6.1 to 7.4.

one example query between our old solr instance and our new cloud instance 
produces 42 results and 19k results.

the analyzer is the same aside from WordDelimiterFilterFactory moving over to 
the graph variation of it and the lucene parser moving from 6.1 to 7.4 
obviously.

i've used the analysis tool in solr admin to try to determine the difference 
between the two. i'm seeing the same output between index and query results, yet 
when actually running the queries i have that huge divergence of results.

i'm left scratching my head at this point. i'm guessing it's from the lucene 
parser? hoping to get some clarity from you guys!

thanks!

--
John Blythe

NOTICE TO RECIPIENTS: This communication may contain confidential and/or 
privileged information. If you are not the intended recipient (or have received 
this communication in error) please notify the sender and 
it-supp...@shoregrp.com immediately, and destroy this communication. Any 
unauthorized copying, disclosure or distribution of the material in this 
communication is strictly forbidden. Any views or opinions presented in this 
email are solely those of the author and do not necessarily represent those of 
the company. Finally, the recipient should check this email and any attachments 
for the presence of viruses. The company accepts no liability for any damage 
caused by any virus transmitted by this email.




Re: Error casting to PointField

2018-09-11 Thread Zahra Aminolroaya
Thanks Erick. We used to use TrieLongField for our unique id and in the
document it is said that all Trie* fieldtypes are casting to
*pointfieldtypes. What would be the alternative solution?



Best,

Zahra



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Data Import Handler with Solr Source behind Load Balancer

2018-09-11 Thread Zimmermann, Thomas
We have a Solr v7 Instance sourcing data from a Data Import Handler with a Solr 
data source running Solr v4. When it hits a single server in that instance 
directly, all documents are read and written correctly to the v7. When we hit 
the load balancer DNS entry, the resulting data import handler json states that 
it read all the documents and skipped none, and all looks fine, but the result 
set is missing ~20% of the documents in the v7 core. This has happened multiple 
times in multiple environments.

Any thoughts on whether this might be a bug in the underlying DIH code? I'll 
also pass it along to the server admins on our side for input.


Re: Docker and Solr Indexing

2018-09-11 Thread Shawn Heisey

On 9/11/2018 9:20 PM, solrnoobie wrote:

So what we did is we upgraded the instances to 16 gigs and we rarely
encounter this now.

So what we did was to increase the batch size to 500 instead of 50 and it
worked for our test data. But when we tried 1000 batch size, the invalid
content type error returned. Can you guys shed some light on why this is
happening? I don't think that a thousand per batch is too much (although we
have documents with many fields and child documents) so I am not really sure
what's causing this aside from a docker container restart.


At no point in this thread have you shared the actual error messages.  
Without those and the exact version of Solr, it's difficult to help 
you.  Saying that you got a "content type error" doesn't mean anything.  
We need to see the actual error, complete with all stacktrace data.  The 
best information will be found in the logfile -- solr.log.


Solr (as packaged by this project) is not designed to restart itself 
automatically.  If the JVM encounters an OutOfMemoryError exception and 
the platform is NOT Windows, then Solr is designed to kill itself ... 
but it will NOT automatically restart without outside intervention or a 
change to its startup scripts.  This is done because program operation 
is completely unpredictable when OOME hits, so the best course of action 
is to self-terminate and let the admin fix the problem that caused the OOME.
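
For reference, the stock start script on non-Windows platforms typically wires 
this up with a JVM option along these lines (the port and paths here are 
illustrative):

  -XX:OnOutOfMemoryError="/opt/solr/bin/oom_solr.sh 8983 /var/solr/logs"

The oom_solr.sh script logs the event and kills the Solr process.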


The publicly available Solr docker container is NOT an official product 
of this project.  It is third-party, so problems specific to the docker 
container may need to be handled by the project that created it.  If the 
docker container is set up to automatically restart Solr when it dies, I 
would consider that to be a bug. About the only reason that Solr will 
ever die is the OOME self-termination that I already described ... and 
since the OOME is likely to occur again after restart, it's usually 
better for the software to stay offline until the admin fixes the problem.


Thanks,
Shawn



Re: error render solr data spatial from geoserver

2018-09-11 Thread Zheng Lin Edwin Yeo
Hi,

Which version of Solr are you using?
And are your different shards on the same machine or on different machines?

Regards,
Edwin

On Tue, 4 Sep 2018 at 18:04, tkg_cangkul  wrote:

> Hi, i want to try rendering solr spatial data from a geoserver layer.
> when i try to render it from a single-shard solr collection, it works
> normally.
> but when i try to render it from a multi-shard solr collection, i've found
> the error message below on my geoserver. Pls help
>
>
>


Re: parent/child rows in solr

2018-09-11 Thread John Smith
On Tue, Sep 11, 2018 at 11:05 PM Walter Underwood 
wrote:

> Have you tried modeling it with multivalued fields?
>
>
That's an interesting idea, but I don't think that would work. We would
lose the concept of "rows". So let's say child1 has col "a" and col "b",
both are turned into multi-value fields in the solr index. Normally in sql
we can query for a specific value in col "a", and then see what the
associated value in col "b" would be, but we can't do that if we stuff the
col values in multi-value; we can no longer see which value from col "a"
corresponds to which value in col "b". I'm probably explaining that poorly,
but I just don't see how that would work.
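
A tiny illustration of the cross-matching problem described above, with
invented field names. Two child rows, (a=red, b=10) and (a=blue, b=20),
flattened into one parent doc with multivalued fields:

  { "id":"parent1", "child_a":["red","blue"], "child_b":[10,20] }

A query like child_a:red AND child_b:20 now matches this document even though
no single child row had that combination -- the pairing between the "a" and
"b" values is lost.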


Re: Docker and Solr Indexing

2018-09-11 Thread solrnoobie
Thank you all for the kind and timely reply.

So what we did is we upgraded the instances to 16 gigs and we rarely
encounter this now.

So what we did was to increase the batch size to 500 instead of 50 and it
worked for our test data. But when we tried 1000 batch size, the invalid
content type error returned. Can you guys shed some light on why this is
happening? I don't think that a thousand per batch is too much (although we
have documents with many fields and child documents) so I am not really sure
what's causing this aside from a docker container restart.

Thanks!



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: parent/child rows in solr

2018-09-11 Thread John Smith
On Tue, Sep 11, 2018 at 11:00 PM Shawn Heisey  wrote:

> On 9/11/2018 8:35 PM, John Smith wrote:
> > The problem is that the math isn't a simple case of adding up all the row
> > counts. These are "left outer join"s. In sql, it would be this query:
>
> I think we'll just have to conclude that I do not understand what you
> are doing.  I have no idea what "left outer join" even means, how it's
> different than a join that's NOT "left outer".
>
> I will say this:  Solr is not very efficient at joins, and there are a
> bunch of caveats involved.  It's usually better to go with a flat
> document space for a search engine.
>
> Thanks,
> Shawn
>
>
A "left outer join" in sql is a join such that if there is no match in the
child table for a given header id, then the child cells are returned as
"null" values, instead of the header row being removed from the result set
(which is what happens in "inner join" or standard sql join).
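
A minimal illustration of that definition, with made-up rows:

  -- header: (id=abc), (id=def)
  -- child1: (hid=abc, a=1)
  SELECT h.id, c1.a
  FROM header h
  LEFT OUTER JOIN child1 c1 ON c1.hid = h.id;

  -- result:
  --   abc | 1
  --   def | NULL    (header row kept, child columns come back null)

An inner join would have dropped the "def" row entirely.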

A good rundown on the various sql joins:
https://stackoverflow.com/questions/38549/what-is-the-difference-between-inner-join-and-outer-join


Re: parent/child rows in solr

2018-09-11 Thread Walter Underwood
Have you tried modeling it with multivalued fields?

Also, why do you think Solr is a good solution? What is the problem?

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Sep 11, 2018, at 7:35 PM, John Smith  wrote:
> 
> On Tue, Sep 11, 2018 at 9:32 PM Shawn Heisey  wrote:
> 
>> On 9/11/2018 7:07 PM, John Smith wrote:
>>> header:  223,580
>>> 
>>> child1:  124,978
>>> child2:  254,045
>>> child3:  127,917
>>> child4:1,009,030
>>> child5:  225,311
>>> child6:  381,561
>>> child7:  438,315
>>> child8:   18,850
>>> 
>>> 
>>> Trying to index that into solr with a flatfile schema, blows up into
>>> 5,475,316,072 rows. Yes, 5.5 billion rows. I calculated that by running a
>> 
>> I think you're not getting what I'm suggesting.  Or maybe there's an
>> aspect of your data that I'm not understanding.
>> 
>> If we add up all those numbers for the child docs, there are 2.5 million
>> of them.  So you would have 2.5 million docs in Solr.  I have created
>> Solr indexes far larger than this, and I do not consider my work to be
>> "big data".  Solr can handle 2.5 million docs easily, as long as the
>> hardware resources are sufficient.
>> 
>> Where the data duplication will come in is in additional fields in those
>> 2.5 million docs.  Each one will contain some (or maybe all) of the data
>> that WOULD have been in the parent document.  The amount of data
>> balloons, but the number of documents (rows) doesn't.
>> 
>> That kind of arrangement is usually enough to accomplish whatever is
>> needed.  I cannot assume that it will work for your use case, but it
>> does work for most.
>> 
>> Thanks,
>> Shawn
>> 
>> 
> The problem is that the math isn't a simple case of adding up all the row
> counts. These are "left outer join"s. In sql, it would be this query:
> 
> select * from header h
> left outer join child1 c1 on c1.hid = h.id
> left outer join child2 c2 on c2.hid = h.id
> ...
> left outer join child8 c8 on c8.hid = h.id
> 
> 
> If there are 10 rows in child1 linked to 1 header with id "abc", and 10
> rows in child2 linked to that same header, then we end up with 10 * 10 rows
> in solr, not 20. Considering there are 8 child tables in this example,
> there is simply an explosion of data.
> 
> I can't describe it much better than that (abstractly), though perhaps I
> could put together a simple example with live data. Suffice it to say, in
> my example row counts above, that is all "live data" in a relatively small
> database of ours, the row counts are real, and the final row count of 5.5
> billion was calculated inside sql using that query above:
> 
> select count(*) from (
>select id from header h
>left outer join child1 c1 on c1.hid = h.id
>left outer join child2 c2 on c2.hid = h.id
>...
>left outer join child8 c8 on c8.hid = h.id
> ) tmp;



Re: parent/child rows in solr

2018-09-11 Thread Shawn Heisey

On 9/11/2018 8:35 PM, John Smith wrote:

The problem is that the math isn't a simple case of adding up all the row
counts. These are "left outer join"s. In sql, it would be this query:


I think we'll just have to conclude that I do not understand what you 
are doing.  I have no idea what "left outer join" even means, how it's 
different than a join that's NOT "left outer".


I will say this:  Solr is not very efficient at joins, and there are a 
bunch of caveats involved.  It's usually better to go with a flat 
document space for a search engine.


Thanks,
Shawn



Re: LIBLINEAR model lacks weight(s) when training for SolrFeatures in LTR

2018-09-11 Thread Zheng Lin Edwin Yeo
I have found that it is due to insufficient training data related to that
feature.
After I added more entries related to that feature to the training data,
the issue did not occur.

Regards,
Edwin

On Tue, 28 Aug 2018 at 15:56, Zheng Lin Edwin Yeo 
wrote:

> Hi,
>
> I am using Solr 7.4.0, and using LIBLINEAR to do the training for the LTR
> model based on this example:
> https://github.com/bloomberg/lucene-solr/blob/master-ltr/solr/contrib/ltr/example/README.md
>
> However, I found that when I want to train for a Solr filter query with
> the class SolrFeature, I get the following error saying that the model
> lacks weight(s):
>
> Exception: Status: 400 Bad Request
> Response: {
>   "responseHeader":{
> "status":400,
> "QTime":1},
>   "error":{
> "metadata":[
>   "error-class","org.apache.solr.common.SolrException",
>   "root-error-class","org.apache.solr.ltr.model.ModelException"],
> "msg":"org.apache.solr.ltr.model.ModelException: Model myModel lacks
> weight(s) for [category]",
>
> This is how I define it in my feature JSON file:
>
>   {
> "store" : "myFeatures",
> "name" : "category",
> "class" : "org.apache.solr.ltr.feature.SolrFeature",
> "params" : {
> "fq": ["{!terms f=category}book"]
> }
>   }
>
> What could be the reason that causes this, and how can we resolve this
> issue?
>
> Regards,
> Edwin
>


Implementing NeuralNetworkModel RankNet in Solr LTR

2018-09-11 Thread Zheng Lin Edwin Yeo
Hi,

I am working on implementing Solr LTR in Solr 7.4.0 by using the
NeuralNetworkModel for feature selection and model training, and I have
found this site, which uses RankNet:
https://github.com/airalcorn2/Solr-LTR#RankNet

Has anyone tried on this before? And what is the format of the training
data that this model requires?

Regards,
Edwin


Re: parent/child rows in solr

2018-09-11 Thread John Smith
On Tue, Sep 11, 2018 at 9:32 PM Shawn Heisey  wrote:

> On 9/11/2018 7:07 PM, John Smith wrote:
> > header:  223,580
> >
> > child1:  124,978
> > child2:  254,045
> > child3:  127,917
> > child4:1,009,030
> > child5:  225,311
> > child6:  381,561
> > child7:  438,315
> > child8:   18,850
> >
> >
> > Trying to index that into solr with a flatfile schema, blows up into
> > 5,475,316,072 rows. Yes, 5.5 billion rows. I calculated that by running a
>
> I think you're not getting what I'm suggesting.  Or maybe there's an
> aspect of your data that I'm not understanding.
>
> If we add up all those numbers for the child docs, there are 2.5 million
> of them.  So you would have 2.5 million docs in Solr.  I have created
> Solr indexes far larger than this, and I do not consider my work to be
> "big data".  Solr can handle 2.5 million docs easily, as long as the
> hardware resources are sufficient.
>
> Where the data duplication will come in is in additional fields in those
> 2.5 million docs.  Each one will contain some (or maybe all) of the data
> that WOULD have been in the parent document.  The amount of data
> balloons, but the number of documents (rows) doesn't.
>
> That kind of arrangement is usually enough to accomplish whatever is
> needed.  I cannot assume that it will work for your use case, but it
> does work for most.
>
> Thanks,
> Shawn
>
>
The problem is that the math isn't a simple case of adding up all the row
counts. These are "left outer join"s. In sql, it would be this query:

select * from header h
left outer join child1 c1 on c1.hid = h.id
left outer join child2 c2 on c2.hid = h.id
...
left outer join child8 c8 on c8.hid = h.id


If there are 10 rows in child1 linked to 1 header with id "abc", and 10
rows in child2 linked to that same header, then we end up with 10 * 10 rows
in solr, not 20. Considering there are 8 child tables in this example,
there is simply an explosion of data.

I can't describe it much better than that (abstractly), though perhaps I
could put together a simple example with live data. Suffice it to say, in
my example row counts above, that is all "live data" in a relatively small
database of ours, the row counts are real, and the final row count of 5.5
billion was calculated inside sql using that query above:

select count(*) from (
select id from header h
left outer join child1 c1 on c1.hid = h.id
left outer join child2 c2 on c2.hid = h.id
...
left outer join child8 c8 on c8.hid = h.id
) tmp;


6.x to 7.x differences

2018-09-11 Thread John Blythe
hi, all.

we recently migrated to cloud. part of that migration jumped us from 6.1 to
7.4.

one example query between our old solr instance and our new cloud instance
produces 42 results and 19k results.

the analyzer is the same aside from WordDelimiterFilterFactory moving over
to the graph variation of it and the lucene parser moving from 6.1 to 7.4
obviously.

i've used the analysis tool in solr admin to try to determine the
difference between the two. i'm seeing the same output between index and
query results, yet when actually running the queries i have that huge
divergence of results.

i'm left scratching my head at this point. i'm guessing it's from the
lucene parser? hoping to get some clarity from you guys!

thanks!

--
John Blythe


Re: parent/child rows in solr

2018-09-11 Thread Shawn Heisey

On 9/11/2018 7:07 PM, John Smith wrote:

header:  223,580

child1:  124,978
child2:  254,045
child3:  127,917
child4:1,009,030
child5:  225,311
child6:  381,561
child7:  438,315
child8:   18,850


Trying to index that into solr with a flatfile schema, blows up into
5,475,316,072 rows. Yes, 5.5 billion rows. I calculated that by running a


I think you're not getting what I'm suggesting.  Or maybe there's an 
aspect of your data that I'm not understanding.


If we add up all those numbers for the child docs, there are 2.5 million 
of them.  So you would have 2.5 million docs in Solr.  I have created 
Solr indexes far larger than this, and I do not consider my work to be 
"big data".  Solr can handle 2.5 million docs easily, as long as the 
hardware resources are sufficient.


Where the data duplication will come in is in additional fields in those 
2.5 million docs.  Each one will contain some (or maybe all) of the data 
that WOULD have been in the parent document.  The amount of data 
balloons, but the number of documents (rows) doesn't.
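
For illustration, a sketch of that denormalized shape -- one Solr document per 
child row, each repeating whatever parent fields it needs (field names and 
values are invented for the example):

  { "id":"child1-7391",
    "doc_type":"child1",
    "header_id":"abc",
    "header_date":"2018-09-11",
    "header_status":"open",
    "a":"red",
    "b":10 }

With this layout the index holds roughly 2.5 million such documents, one per 
child row, rather than the 5.5 billion rows of the joined result set.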


That kind of arrangement is usually enough to accomplish whatever is 
needed.  I cannot assume that it will work for your use case, but it 
does work for most.


Thanks,
Shawn



Re: parent/child rows in solr

2018-09-11 Thread John Smith
>
> On 9/7/2018 7:44 PM, John Smith wrote:
> > Thanks Shawn, for your comments. The reason why I don't want to go flat
> > file structure, is due to all the wasted/duplicated data. If a department
> > has 100 employees, then it's very wasteful in terms of disk space to
> repeat
> > the header data over and over again, 100 times. In this example there is
> > only a few doc types, but my real-life data is much larger, and the
> problem
> > is a "scaling" problem; with just a little bit of data, no problem in
> > duplicating header fields, but with massive amounts of data it's a large
> > problem.
>
> If your goal is data storage, then you are completely correct.  All that
> data duplication is something to avoid for a data storage situation.
> Normalizing your data so it's relational makes perfect sense, because
> most database software is designed to efficiently deal with those
> relationships.
>
> Solr is not designed as a data storage platform, and does not handle
> those relationships efficiently.  Solr's design goals are all about
> *search*.  It often gets touted as filling a NoSQL role ... but it's not
> something I would personally use as a primary data repository.  Search
> is a space where data duplication is expected and completely normal.
> This is something that people often have a hard time accepting.
>
>
I'm not actually trying to use solr as a data storage platform; all our
data is stored in an sql database, we are using solr strictly for the
search features, not storage features.

Here is a good example from a test I ran today. I have a header table, and
8 child tables which link directly to the header table. The children link
only to 1 header row, and they do not link to other children. So a 1:many
between header and each child. Some row counts:

header:  223,580

child1:  124,978
child2:  254,045
child3:  127,917
child4:1,009,030
child5:  225,311
child6:  381,561
child7:  438,315
child8:   18,850


Trying to index that into solr with a flatfile schema, blows up into
5,475,316,072 rows. Yes, 5.5 billion rows. I calculated that by running a
left outer join between header and each child and getting a row count in
the database. That's not going to scale, at all, considering the small size
of the source input tables. Some of our indexes would require 50 million
header rows alone, never mind the child tables.

So solr has no way of indexing something like this? I can't believe I would
be the first person to run into this issue, I have a feeling I'm missing
something obvious somewhere.


Re: any way to post json document to a MoreLikeThisHandler?

2018-09-11 Thread Alexandre Rafalovitch
Hmm.

I guess the issue is that the handler is the one doing parsing, so the
input document can be in XML or JSON or CSV. And MLT as a handler is then a
competing end point.

So you actually want to use it later in a pipeline but with a document
constructed on the fly and not stored.

This may not exist right now. Though maybe some combination of
DumpRequestHandler and MLT as a search component could do the trick?

I would be curious to know if it can be made to work out of the box.
Otherwise, patches are welcome. But they should not expect just JSON
input format.

Regards,
Alex

On Tue, Sep 11, 2018, 4:57 PM Matt Work Coarr, 
wrote:

> Thanks Alex.  Yes, I've been using the MoreLikeThisHandler, but that takes
> a block of text as input posted to the request, not the structured json
> that corresponds to the fields.
>
> On Tue, Sep 11, 2018 at 10:14 AM Alexandre Rafalovitch  >
> wrote:
>
> > There are three ways to trigger MLT:
> > https://lucene.apache.org/solr/guide/7_4/morelikethis.html
> >
> > MoreLikeThisHandler allows to supply text externally. Unfortunately, I
> > can't find the specific example demonstrating it, so not sure if it
> > just a blob of text or a document.
> >
> > Regards,
> >Alex.
> >
> > On 11 September 2018 at 09:55, Matt Work Coarr  >
> > wrote:
> > > Hello,
> > >
> > > Using a MoreLikeThisHandler, I was hoping to be able to pass in in the
> > post
> > > body a json document (the same format as a document indexed in my core,
> > but
> > > the document in the request is not and should not be added to the
> core).
> > >
> > > I'm thinking it would handle an incoming document similar to how the
> > > /update handler can split up a json document into the set of fields
> > defined
> > > in the schema (or auto created fields).
> > >
> > > For instance, my input document would look like this:
> > >
> > > {
> > >   "id": 1234,
> > >   "field1": "blah blah blah",
> > >   "field2": "foo bar",
> > >   "field3": 112233
> > > }
> > >
> > > And then I want to be able to use the MoreLikeThis query parameters to
> > > determine which fields are used in the MLT comparison.
> > >
> > > Thanks,
> > > Matt
> >
>


Re: any way to post json document to a MoreLikeThisHandler?

2018-09-11 Thread Matt Work Coarr
Thanks Alex.  Yes, I've been using the MoreLikeThisHandler, but that takes
a block of text as input posted to the request, not the structured json
that corresponds to the fields.
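
For reference, a hedged sketch of that usage -- the core name, fields and
parameter values below are placeholders, not taken from this thread. The
handler reads the posted text from the request body, and mlt.fl picks the
fields used for the comparison:

  curl 'http://localhost:8983/solr/mycore/mlt?mlt.fl=field1,field2&mlt.mintf=1&mlt.mindf=1&fl=id,score' \
       -H 'Content-Type: text/plain' \
       --data-binary 'blah blah blah foo bar'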

On Tue, Sep 11, 2018 at 10:14 AM Alexandre Rafalovitch 
wrote:

> There are three ways to trigger MLT:
> https://lucene.apache.org/solr/guide/7_4/morelikethis.html
>
> MoreLikeThisHandler allows to supply text externally. Unfortunately, I
> can't find the specific example demonstrating it, so not sure if it
> just a blob of text or a document.
>
> Regards,
>Alex.
>
> On 11 September 2018 at 09:55, Matt Work Coarr 
> wrote:
> > Hello,
> >
> > Using a MoreLikeThisHandler, I was hoping to be able to pass in in the
> post
> > body a json document (the same format as a document indexed in my core,
> but
> > the document in the request is not and should not be added to the core).
> >
> > I'm thinking it would handle an incoming document similar to how the
> > /update handler can split up a json document into the set of fields
> defined
> > in the schema (or auto created fields).
> >
> > For instance, my input document would look like this:
> >
> > {
> >   "id": 1234,
> >   "field1": "blah blah blah",
> >   "field2": "foo bar",
> >   "field3": 112233
> > }
> >
> > And then I want to be able to use the MoreLikeThis query parameters to
> > determine which fields are used in the MLT comparison.
> >
> > Thanks,
> > Matt
>


Re: group.limit>1 and sorting is not working as expected

2018-09-11 Thread Erick Erickson
OK, you just don't want to group, as Shawn says. The group.main=true
option just flattens the list, but Solr still does all the work of
grouping and returns the docs in group order. So instead of getting
value=1 [
   doc31
   doc64
 ]
value=2 [
   doc 98
   doc 6
  ]

you get

doc31
doc64
doc98
doc6

Best,
Erick
On Tue, Sep 11, 2018 at 11:48 AM Shawn Heisey  wrote:
>
> On 9/11/2018 12:00 PM, Venkateswarlu Bommineni wrote:
> > What i am expecting is (it might be silly) if i put group.main=true and
> > sort by price then the results are:
> >
> >{
> >  "priceValueGLP_usd_double":32015.0,
> >  "sapRank_int":446},
> >{
> >  "priceValueGLP_usd_double":32015.0,
> >  "sapRank_int":446},
> >{
> >  "priceValueGLP_usd_double":*31000.0*,
> >  "sapRank_int":445},
> >{
> >  "priceValueGLP_usd_double":*30670.0*,
> >  "sapRank_int":446},
> >{
> >  "priceValueGLP_usd_double":29040.0,
> >  "sapRank_int":436},
> >{
> >  "priceValueGLP_usd_double":27775.0,
> >  "sapRank_int":436},
>
> It sounds like you don't want grouping at all.  That seems to be a
> result list sorted by price.  If you group by rank, then all of the
> results for a specific rank will be together, and the response you
> indicated above where the docs with rank 446 are not all together will
> be impossible.  If you remove the grouping, then you can get a simple
> result sorted by price.
>
> Thanks,
> Shawn
>


Re: Solr RSIZE memory overusage

2018-09-11 Thread Erick Erickson
bq. We're using NRTCachingDirectoryFactory

Which uses MMapDirectory under the covers.

The file handle counts will vary. During merging,
files are held open and while segments are merged
so new and old segments are open. Once merged,
the files in the old segment will be deleted so some
variance is expected.

What are your ulimits set at? I'm wondering if
there's some weird reporting going on, which would
be true if, say, your file handle limit were lower
than what's being reported.
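
Two quick ways to check, assuming the Solr PID 5514 seen in the lsof
output earlier in the thread:

  cat /proc/5514/limits   # per-process limits, including "Max open files"
  ulimit -a               # limits for the current shell/user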

Best,
Erick
On Tue, Sep 11, 2018 at 12:24 PM Boris Pasko  wrote:
>
> On Tue, 2018-09-11 at 12:43 -0600, Shawn Heisey wrote:
> > On 9/11/2018 12:14 PM, Boris Pasko wrote:
> > >
> > > >
> > > > Run top, press shift-M to sort by memory usage, then grab a
> > > atop: http://oi68.tinypic.com/10pokkk.jpg
> > > top: http://oi63.tinypic.com/msbpfp.jpg
> > Looking at the second one:
> >
> > The SHR value is showing 90GB.
> >
> Thanks! Thats  a good catch.
> >
> >
> > Looks like the amount of memory in the machine is too large for "top"
> > to
> 128Gb
>
>
>
> > It does look like you've got in the neighborhood of 600GB of index
> > data.
> 600-700Gb per node, correct.
>
> >  Only 113GB of that data is cached.  For some use cases, this will
> > be plenty of cached data for good performance. For others, it won't
> > be
> > anywhere near enough.  For *perfect* performance, there will be
> > enough
> > memory for ALL of the index data to fit into memory.
> >
> > Thanks,
> > Shawn
> >
>
>


Re: Update partial document

2018-09-11 Thread Vincenzo D'Amore
Hi Mikhail, Shawn,

thanks for your prompt answer.
The problem is that the indexed documents have dozens of fields, and usually
they are different for each document.

For example, document id 1 has a few generic fields like title and description
and all the attributes like attr_1224, attr_4343, attr_4454, attr_5345, and
so on (dozens).
Document id 2, like the former, has its generic fields and attr_435,
attr_165, attr_986, attr_12, and so on (dozens).

In other words, for each document I have to update, I cannot know the
list of attr_# fields that I have to remove.

In the update request there is only the list of new fields/values that I
have to substitute in the document, and yes, this list can be different from
the original document.




On Tue, Sep 11, 2018 at 7:42 PM Shawn Heisey  wrote:

> On 9/11/2018 10:23 AM, Vincenzo D'Amore wrote:
> > I suppose to be able to remove attr_1 and add attr_3 with one atomic
> update.
> >
> > Like this:
> >
> > curl -X POST -H 'Content-Type: application/json' '
> >
> http://localhost:8983/solr/gettingstarted/update?versions=true=true
> '
> > --data-binary '
> >   [
> >  {
> >"id" : "aaa" ,
> >"attr_" : [ "set" : null ],
> >"attr_3" : [ "set" : "x" ]
> >  }
> > ]'
>
> This would probably have worked if you had used "attr_1" instead of
> "attr_".  There is no field named "attr_" in your document, so that line
> does nothing.  Fields in atomic updates must be fully specified. I am
> not aware of any kind of wildcard support.
>
> > But as result I only have a new attr_3 field (the field attr_1 is still
> > there)
> >
> >   {
> >  "id":"aaa",
> >  "value_i":10,
> >  "attr_1":["a"],
> >  "attr_3":["x"]
> >   }
> >
> > So it seem that, for this particular case, I have first to read the
> > document and then I can update it.
> >
> > Do you think there are other options?
> > Can I use the StatelessScriptUpdateProcessorFactory ?
> > Should I write my own UpdateProcessor ?
>
> Thanks,
> Shawn
>
>

-- 
Vincenzo D'Amore


Re: Solr RSIZE memory overusage

2018-09-11 Thread Boris Pasko
On Tue, 2018-09-11 at 12:43 -0600, Shawn Heisey wrote:
> On 9/11/2018 12:14 PM, Boris Pasko wrote:
> >
> > >
> > > Run top, press shift-M to sort by memory usage, then grab a
> > atop: http://oi68.tinypic.com/10pokkk.jpg
> > top: http://oi63.tinypic.com/msbpfp.jpg
> Looking at the second one:
>
> The SHR value is showing 90GB.
>
Thanks! Thats  a good catch.
>
>
> Looks like the amount of memory in the machine is too large for "top"
> to
128Gb



> It does look like you've got in the neighborhood of 600GB of index
> data.
600-700Gb per node, correct.

>  Only 113GB of that data is cached.  For some use cases, this will
> be plenty of cached data for good performance. For others, it won't
> be
> anywhere near enough.  For *perfect* performance, there will be
> enough
> memory for ALL of the index data to fit into memory.
>
> Thanks,
> Shawn
>




Re: Docker and Solr Indexing

2018-09-11 Thread Jan Høydahl
You have not shed any light on what the reason for the container restart was, 
and there is too little information about your setup and Solr usage to guess 
what goes on. Whether 4Gb is sufficient or not depends on how much data and 
queries you plan for each shard to handle, how much heap you give to Solr out 
of those 4G and many other factors.

Jan

> On 11 Sep 2018, at 08:05, solrnoobie wrote:
> 
> So we have a dockerized aws environment with the solr docker container having
> only 4 gigs for max ram.
> 
> Our problem is whenever we index, the container containing the leader shard
> will restart after around 2 or less minutes of index time (batch is 50 docs
> per batch with 3 threads in our app thread pool). Because of the container
> restart, indexing will fail because solrJ will throw an invalid content type
> exception because of the quick container restart.
> 
> What can possible casue the issues above?
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: group.limit>1 and sorting is not working as expected

2018-09-11 Thread Shawn Heisey

On 9/11/2018 12:00 PM, Venkateswarlu Bommineni wrote:

What i am expecting is (it might be silly) if i put group.main=true and
sort by price then the results are:

   {
 "priceValueGLP_usd_double":32015.0,
 "sapRank_int":446},
   {
 "priceValueGLP_usd_double":32015.0,
 "sapRank_int":446},
   {
 "priceValueGLP_usd_double":*31000.0*,
 "sapRank_int":445},
   {
 "priceValueGLP_usd_double":*30670.0*,
 "sapRank_int":446},
   {
 "priceValueGLP_usd_double":29040.0,
 "sapRank_int":436},
   {
 "priceValueGLP_usd_double":27775.0,
 "sapRank_int":436},


It sounds like you don't want grouping at all.  That seems to be a 
result list sorted by price.  If you group by rank, then all of the 
results for a specific rank will be together, and the response you 
indicated above where the docs with rank 446 are not all together will 
be impossible.  If you remove the grouping, then you can get a simple 
result sorted by price.
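
A minimal sketch of that ungrouped request -- the field name is taken from 
Venkat's expected output, everything else is omitted:

  q=*:*&sort=priceValueGLP_usd_double desc

Dropping the group, group.field, group.limit and group.main parameters 
entirely gives one flat list in global price order.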


Thanks,
Shawn



Re: Solr RSIZE memory overusage

2018-09-11 Thread Shawn Heisey

On 9/11/2018 12:14 PM, Boris Pasko wrote:

Run top, press shift-M to sort by memory usage, then grab a

atop: http://oi68.tinypic.com/10pokkk.jpg
top: http://oi63.tinypic.com/msbpfp.jpg


Looking at the second one:

The SHR value is showing 90GB.

Your Java process is in actuality only using in the ballpark of 9GB 
memory -- the difference between RES and SHR.  I have no idea why the 
SHR value goes so high sometimes, but it does.  This is a strange memory 
reporting anomaly encountered when using Java software.  The problem 
might be in the OS, or it might be in Java ... but there IS a reporting 
problem.  It does seem related to MMap, though.  You said you're using 
NRTCachingDirectoryFactory ... which *does* use MMap for all index file 
access.


Looks like the amount of memory in the machine is too large for "top" to 
display the numbers correctly.  See the "+" sign for total memory, 
buff/cache/ and avail Mem.


If we switch over to atop (a program that I did not know about) for a 
moment, you'll see that there is 113GB used by the disk cache.  So 
having 101GB (the RSIZE value for the Java process) is simply not 
possible.  That number is being reported incorrectly.


It does look like you've got in the neighborhood of 600GB of index 
data.  Only 113GB of that data is cached.  For some use cases, this will 
be plenty of cached data for good performance. For others, it won't be 
anywhere near enough.  For *perfect* performance, there will be enough 
memory for ALL of the index data to fit into memory.


Thanks,
Shawn



Solr RSIZE memory overusage

2018-09-11 Thread Boris Pasko
Hi. We're running Solr 6.6.1 (SolrCloud, 3 clusters). Recently I noticed
it became significantly slower to respond and did some basic checks on
servers. There is little IO, a bit of CPU usage (110% user, 3090% idle),
but one thing is very strange - the resident memory usage of the Solr.

Despite the -Xms=8G and -Xmx=8G and despite that Solr UI shows only 4Gb
heap used, the top and atop shows RSIZE=100Gb+ used by Solr.

We are not using direct memory. We have not increased the allowed java
direct memory.

Further digging shows that solr has 130K+ open files:

$ sudo lsof | grep 5514 | grep REG | wc -l
132104

What is even more worrisome is that some files are literally open a
thousand times, and even old tlog files are still kept open:

$ sudo lsof | grep 5514 | grep REG | grep tlog | head
java  5514 solr  142u  REG  252,0  2918073  71565947  /var/db/solr/data/regulatory_shard1_replica1/data/tlog/tlog.0038952
java  5514 solr  147u  REG  252,0     2794  71565408  /var/db/solr/data/jobs_shard1_replica1/data/tlog/tlog.3144700
java  5514 solr  153u  REG  252,0     2869  71565602  /var/db/solr/data/jobs_shard1_replica1/data/tlog/tlog.3144697
java  5514 solr  160u  REG  252,0     2869  71565823  /var/db/solr/data/jobs_shard1_replica1/data/tlog/tlog.3144699
java  5514 solr  161u  REG  252,0   428385  71566321  /var/db/solr/data/WebResource_shard1_replica2/data/tlog/tlog.00075888387
java  5514 solr  162u  REG  252,0    11518  71567726  /var/db/solr/data/RSS_shard2_replica1/data/tlog/tlog.0055215
java  5514 solr  163u  REG  252,0     1676  71566426  /var/db/solr/data/jobs_shard2_replica2/data/tlog/tlog.3143993
java  5514 solr  176u  REG  252,0     1199  71565773  /var/db/solr/data/jobs_shard1_replica1/data/tlog/tlog.3144710
java  5514 solr  179u  REG  252,0     1769  71565833  /var/db/solr/data/jobs_shard2_replica2/data/tlog/tlog.3143992
java  5514 solr  180u  REG  252,0  2006034  71565631  /var/db/solr/data/WebResource_shard3_replica1/data/tlog/tlog.00075897473


$ sudo lsof | grep 5514 | grep REG | grep tlog | grep /var/db/solr/data/jobs_shard1_replica1/data/tlog/tlog.3144700 | wc -l
98

So this old tlog file, which (as I understand) is supposed to be closed, is
still open 98 times.

I wonder if that is how Solr is supposed to work (I really doubt it).

Boris.




Re: Solr RSIZE memory overusage

2018-09-11 Thread Boris Pasko
>Run top, press shift-M to sort by memory usage, then grab a
atop: http://oi68.tinypic.com/10pokkk.jpg
top: http://oi63.tinypic.com/msbpfp.jpg




Re: Solr RSIZE memory overusage

2018-09-11 Thread Boris Pasko
On Tue, 2018-09-11 at 10:26 -0700, Erick Erickson wrote:
> The memory usage is probably MMapDirectory, see:
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.ht
> ml,
> that's not a problem I'd guess.
We're using NRTCachingDirectoryFactory

> The open file handles... and ones being open many times.
> 1> how many _total_ segment files do you have in all the replicas in
> the JVM?
I counted 542 segments total across all shards (using UI)


> 2> Do you have any custom code? It's very easy to open a searcher and
> _not_ close it in custom code, which will keep file handles open.
We do use ICU plugins which come with solr as contributions, but no
custom code.

>
> 3> Are you using CDCR? And if so, are you buffering? That might keep
> tlog files open.
Nope, we're not using CDCR

>
> 4> If you shut down the JVM and restart Solr, do the index files
> (segment files and/or TLOGs) disappear? Not sure you can run this
> experiment on your system if it's prod, but it'd be some information
> to go on, indicating "something" is not closing searchers or the
> like.
I have test env which (having less load than production) uses
RSIZE=35Gb. I can try it right now.

/var/db/solr/data$ find . -type f | wc -l
1664
/var/db/solr/data$ sudo service solr restart
/var/db/solr/data$ sleep 120
/var/db/solr/data$ find . -type f | wc -l
1586

So..hm. Some  segments were merged I guess... The production file count
is not much higher though:
/var/db/solr/data$ find . -type f | wc -l
3272
>
> Best,
> Erick
> On Tue, Sep 11, 2018 at 10:11 AM Boris Pasko 
> wrote:
> >
> >
> >
> > >
> > > Same picture on Solr 6.6.2, tested on various Oracle JVMs ranging
> > > from 1.8.0_171 to 1.8.0_171
> > From 1.8.0_171 to 1.8.0_181
> >
> >
> > –
> > The information contained in this message and any attachments may
> > be confidential and/or restricted and protected from disclosure. If
> > the reader of this message is not the intended recipient,
> > disclosure, copying, use, or distribution of the information
> > included in this message is prohibited - please destroy all
> > electronic and paper copies and notify the sender immediately.




Re: group.limit>1 and sorting is not working as expected

2018-09-11 Thread Venkateswarlu Bommineni
Erik and Shawn,

Sorry for the confusion.

Yes, Solr is sorting inside the grouped results, but not across all the results.
Example: we got 6 records in the solr response. if i do *sort=price desc*,
it is sorting inside each group.

*But the requirement is to sort across all the results.*

*Current results :*

"matches":14640,
  "groups":[{
  "groupValue":446,
  "doclist":{"numFound":4,"start":0,"docs":[
  {
"price":32015.0,
"rank":446},
  {
"price":32015.0,
"rank":446},
  {
"price":*30670.0,*
"rank":446}]
  }},
{
  "groupValue":436,
  "doclist":{"numFound":4,"start":0,"docs":[
  {
"price":*31000.0,*
"rank":436},
  {
"price":29040.0,
"rank":436},
  {
"price":27775.0,
"rank":436}]
  }},
{


What i am expecting is (it might be silly) if i put group.main=true and
sort by price then the results are:

  {
"priceValueGLP_usd_double":32015.0,
"sapRank_int":446},
  {
"priceValueGLP_usd_double":32015.0,
"sapRank_int":446},
  {
"priceValueGLP_usd_double":*31000.0*,
"sapRank_int":445},
  {
"priceValueGLP_usd_double":*30670.0*,
"sapRank_int":446},
  {
"priceValueGLP_usd_double":29040.0,
"sapRank_int":436},
  {
"priceValueGLP_usd_double":27775.0,
"sapRank_int":436},


Sorry if it did not make sense. Please suggest if there is any way we can
achieve that.



Thanks,
Venkat.





On Tue, Sep 11, 2018 at 11:37 AM Erick Erickson 
wrote:

> How this all works will be much clearer if you don't use "group.main=true"
>
> But you still haven't _shown_ us what you _expect_.
>
> In the second query, Solr is doing exactly what you're telling it to.
> Return groups of up to three of the lowest-priced docs in each group,
> ordering the groups by the lowest-priced doc appearing in the group.
>
> What I'm guessing you want to do is specify sort=rank asc=price
> asc
>
> Best,
> Erick
> On Tue, Sep 11, 2018 at 10:23 AM Shawn Heisey  wrote:
> >
> > On 9/11/2018 10:14 AM, Venkateswarlu Bommineni wrote:
> > > Please find the resonse and query when grouping and sorting by rank :
> >
> > I see no evidence of grouping happening in either of those responses.
> > They look like standard responses do when grouping is not enabled.
> >
> > Here's an example of a grouped result from the techproducts example that
> > ships with Solr.  The "response" section is gone and has been replaced
> > by a "grouped" section:
> >
> > |{ "responseHeader":{ "status":0, "QTime":0, "params":{ "q":"*:*",
> > "fl":"id,manu,price", "group.limit":"3", "group.field":"manu",
> > "_":"1536686115965", "group":"true"}}, "grouped":{ "manu":{
> > "matches":32, "groups":[{ "groupValue":null,
> > "doclist":{"numFound":12,"start":0,"docs":[ { "id":"GB18030TEST",
> > "price":0.0}, { "id":"adata"}, { "id":"apple"}] }}, {
> > "groupValue":"Samsung Electronics Co. Ltd.",
> > "doclist":{"numFound":1,"start":0,"docs":[ { "id":"SP2514N",
> > "manu":"Samsung Electronics Co. Ltd.", "price":92.0}] }}, {
> > "groupValue":"Maxtor Corp.", "doclist":{"numFound":1,"start":0,"docs":[
> > { "id":"6H500F0", "manu":"Maxtor Corp.", "price":350.0}] }}, {
> > "groupValue":"Belkin", "doclist":{"numFound":2,"start":0,"docs":[ {
> > "id":"F8V7067-APL-KIT", "manu":"Belkin", "price":19.95}, { "id":"IW-02",
> > "manu":"Belkin", "price":11.5}] }}, { "groupValue":"Apple Computer
> > Inc.", "doclist":{"numFound":1,"start":0,"docs":[ { "id":"MA147LL/A",
> > "manu":"Apple Computer Inc.", "price":399.0}] }}, {
> > "groupValue":"Corsair Microsystems Inc.",
> > "doclist":{"numFound":2,"start":0,"docs":[ { "id":"TWINX2048-3200PRO",
> > "manu":"Corsair Microsystems Inc.", "price":185.0}, { "id":"VS1GB400C3",
> > "manu":"Corsair Microsystems Inc.", "price":74.99}] }}, {
> > "groupValue":"A-DATA Technology Inc.",
> > "doclist":{"numFound":1,"start":0,"docs":[ { "id":"VDBDB1A16",
> > "manu":"A-DATA Technology Inc."}] }}, { "groupValue":"Bank of America",
> > "doclist":{"numFound":1,"start":0,"docs":[ { "id":"USD", "manu":"Bank of
> > America"}] }}, { "groupValue":"European Union",
> > "doclist":{"numFound":1,"start":0,"docs":[ { "id":"EUR",
> > "manu":"European Union"}] }}, { "groupValue":"U.K.",
> > "doclist":{"numFound":1,"start":0,"docs":[ { "id":"GBP", "manu":"U.K."}]
> > }}]}}} ||Thanks,||Shawn|
> >
> > 
>


Re: Update partial document

2018-09-11 Thread Shawn Heisey

On 9/11/2018 10:23 AM, Vincenzo D'Amore wrote:

I suppose to be able to remove attr_1 and add attr_3 with one atomic update.

Like this:

curl -X POST -H 'Content-Type: application/json' '
http://localhost:8983/solr/gettingstarted/update?versions=true=true'
--data-binary '
  [
 {
   "id" : "aaa" ,
   "attr_" : [ "set" : null ],
   "attr_3" : [ "set" : "x" ]
 }
]'


This would probably have worked if you had used "attr_1" instead of 
"attr_".  There is no field named "attr_" in your document, so that line 
does nothing.  Fields in atomic updates must be fully specified. I am 
not aware of any kind of wildcard support.
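
For reference, a sketch of the same atomic update with the field name fully
spelled out -- the URL and values follow the original example, and each
operation uses the standard JSON-object form:

  curl -X POST -H 'Content-Type: application/json' \
    'http://localhost:8983/solr/gettingstarted/update?commit=true' \
    --data-binary '[
      {
        "id"     : "aaa",
        "attr_1" : { "set" : null },
        "attr_3" : { "set" : "x" }
      }
    ]'

This removes attr_1 and sets attr_3 in a single request.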



But as result I only have a new attr_3 field (the field attr_1 is still
there)

  {
 "id":"aaa",
 "value_i":10,
 "attr_1":["a"],
 "attr_3":["x"]
  }

So it seem that, for this particular case, I have first to read the
document and then I can update it.

Do you think there are other options?
Can I use the StatelessScriptUpdateProcessorFactory ?
Should I write my own UpdateProcessor ?


Thanks,
Shawn



Re: Solr RSIZE memory overusage

2018-09-11 Thread Shawn Heisey

On 9/11/2018 11:07 AM, Boris Pasko wrote:

Hi. We're running Solr 6.6.1 (SolrCloud, 3 nodes). Recently I noticed
it became significantly slower to respond and did some basic checks on
servers. There is little IO, a bit of CPU usage (110% user, 3090%
idle),
but one thing is very strange - the resident memory usage of the Solr.

Despite the -Xms=8G and -Xmx=8G and despite that Solr UI shows only
4Gb heap used, the top and atop shows RSIZE=100Gb+ used by Solr.


Can you share the "top" screen so we can see specifically how it looks?

Run top, press shift-M to sort by memory usage, then grab a screenshot.  
Solr is usually one of the big memory consumers, so it should be at or 
near the top of the list.  Use a file sharing website to share the 
screenshot.


https://wiki.apache.org/solr/SolrPerformanceProblems#Process_listing_on_POSIX_operating_systems

Thanks,
Shawn



Re: group.limit>1 and sorting is not working as expected

2018-09-11 Thread Erick Erickson
How this all works will be much clearer if you don't use "group.main=true"

But you still haven't _shown_ us what you _expect_.

In the second query, Solr is doing exactly what you're telling it to.
Return groups of up to three of the lowest-priced docs in each group,
ordering the groups by the lowest-priced doc appearing in the group.

What I'm guessing you want to do is specify sort=rank asc=price asc

Best,
Erick
On Tue, Sep 11, 2018 at 10:23 AM Shawn Heisey  wrote:
>
> On 9/11/2018 10:14 AM, Venkateswarlu Bommineni wrote:
> > Please find the resonse and query when grouping and sorting by rank :
>
> I see no evidence of grouping happening in either of those responses.
> They look like standard responses do when grouping is not enabled.
>
> Here's an example of a grouped result from the techproducts example that
> ships with Solr.  The "response" section is gone and has been replaced
> by a "grouped" section:
>
> |{ "responseHeader":{ "status":0, "QTime":0, "params":{ "q":"*:*",
> "fl":"id,manu,price", "group.limit":"3", "group.field":"manu",
> "_":"1536686115965", "group":"true"}}, "grouped":{ "manu":{
> "matches":32, "groups":[{ "groupValue":null,
> "doclist":{"numFound":12,"start":0,"docs":[ { "id":"GB18030TEST",
> "price":0.0}, { "id":"adata"}, { "id":"apple"}] }}, {
> "groupValue":"Samsung Electronics Co. Ltd.",
> "doclist":{"numFound":1,"start":0,"docs":[ { "id":"SP2514N",
> "manu":"Samsung Electronics Co. Ltd.", "price":92.0}] }}, {
> "groupValue":"Maxtor Corp.", "doclist":{"numFound":1,"start":0,"docs":[
> { "id":"6H500F0", "manu":"Maxtor Corp.", "price":350.0}] }}, {
> "groupValue":"Belkin", "doclist":{"numFound":2,"start":0,"docs":[ {
> "id":"F8V7067-APL-KIT", "manu":"Belkin", "price":19.95}, { "id":"IW-02",
> "manu":"Belkin", "price":11.5}] }}, { "groupValue":"Apple Computer
> Inc.", "doclist":{"numFound":1,"start":0,"docs":[ { "id":"MA147LL/A",
> "manu":"Apple Computer Inc.", "price":399.0}] }}, {
> "groupValue":"Corsair Microsystems Inc.",
> "doclist":{"numFound":2,"start":0,"docs":[ { "id":"TWINX2048-3200PRO",
> "manu":"Corsair Microsystems Inc.", "price":185.0}, { "id":"VS1GB400C3",
> "manu":"Corsair Microsystems Inc.", "price":74.99}] }}, {
> "groupValue":"A-DATA Technology Inc.",
> "doclist":{"numFound":1,"start":0,"docs":[ { "id":"VDBDB1A16",
> "manu":"A-DATA Technology Inc."}] }}, { "groupValue":"Bank of America",
> "doclist":{"numFound":1,"start":0,"docs":[ { "id":"USD", "manu":"Bank of
> America"}] }}, { "groupValue":"European Union",
> "doclist":{"numFound":1,"start":0,"docs":[ { "id":"EUR",
> "manu":"European Union"}] }}, { "groupValue":"U.K.",
> "doclist":{"numFound":1,"start":0,"docs":[ { "id":"GBP", "manu":"U.K."}]
> }}]}}} ||Thanks,||Shawn|
>
> 


Re: Solr RSIZE memory overusage

2018-09-11 Thread Erick Erickson
The memory usage is probably MMapDirectory, see:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html,
that's not a problem I'd guess.

The open file handles... and ones being open many times.
1> how many _total_ segment files do you have in all the replicas in
the JVM? A single segment consists of 0_.fdt, 0_.fdx, 0_.tim etc...,
the next segment _1.fdt, _1.fdx... So the total number of files will
be much greater than the number of segments. If this total is in the
100K range, it's kind of normal, maybe. Or something.

2> Do you have any custom code? It's very easy to open a searcher and
_not_ close it in custom code, which will keep file handles open.

3> Are you using CDCR? And if so, are you buffering? That might keep
tlog files open.

4> If you shut down the JVM and restart Solr, do the index files
(segment files and/or TLOGs) disappear? Not sure you can run this
experiment on your system if it's prod, but it'd be some information
to go on, indicating "something" is not closing searchers or the like.
I'm not asking about file handles here, but rather do the files
disappear off disk?

Best,
Erick
On Tue, Sep 11, 2018 at 10:11 AM Boris Pasko  wrote:
>
>
> > Same picture on Solr 6.6.2, tested on various Oracle JVMs ranging
> > from 1.8.0_171 to 1.8.0_171
>
> From 1.8.0_171 to 1.8.0_181
>
>


Re: group.limit>1 and sorting is not working as expected

2018-09-11 Thread Shawn Heisey

On 9/11/2018 10:14 AM, Venkateswarlu Bommineni wrote:

Please find the resonse and query when grouping and sorting by rank :


I see no evidence of grouping happening in either of those responses.  
They look like standard responses do when grouping is not enabled.


Here's an example of a grouped result from the techproducts example that 
ships with Solr.  The "response" section is gone and has been replaced 
by a "grouped" section:


|{ "responseHeader":{ "status":0, "QTime":0, "params":{ "q":"*:*", 
"fl":"id,manu,price", "group.limit":"3", "group.field":"manu", 
"_":"1536686115965", "group":"true"}}, "grouped":{ "manu":{ 
"matches":32, "groups":[{ "groupValue":null, 
"doclist":{"numFound":12,"start":0,"docs":[ { "id":"GB18030TEST", 
"price":0.0}, { "id":"adata"}, { "id":"apple"}] }}, { 
"groupValue":"Samsung Electronics Co. Ltd.", 
"doclist":{"numFound":1,"start":0,"docs":[ { "id":"SP2514N", 
"manu":"Samsung Electronics Co. Ltd.", "price":92.0}] }}, { 
"groupValue":"Maxtor Corp.", "doclist":{"numFound":1,"start":0,"docs":[ 
{ "id":"6H500F0", "manu":"Maxtor Corp.", "price":350.0}] }}, { 
"groupValue":"Belkin", "doclist":{"numFound":2,"start":0,"docs":[ { 
"id":"F8V7067-APL-KIT", "manu":"Belkin", "price":19.95}, { "id":"IW-02", 
"manu":"Belkin", "price":11.5}] }}, { "groupValue":"Apple Computer 
Inc.", "doclist":{"numFound":1,"start":0,"docs":[ { "id":"MA147LL/A", 
"manu":"Apple Computer Inc.", "price":399.0}] }}, { 
"groupValue":"Corsair Microsystems Inc.", 
"doclist":{"numFound":2,"start":0,"docs":[ { "id":"TWINX2048-3200PRO", 
"manu":"Corsair Microsystems Inc.", "price":185.0}, { "id":"VS1GB400C3", 
"manu":"Corsair Microsystems Inc.", "price":74.99}] }}, { 
"groupValue":"A-DATA Technology Inc.", 
"doclist":{"numFound":1,"start":0,"docs":[ { "id":"VDBDB1A16", 
"manu":"A-DATA Technology Inc."}] }}, { "groupValue":"Bank of America", 
"doclist":{"numFound":1,"start":0,"docs":[ { "id":"USD", "manu":"Bank of 
America"}] }}, { "groupValue":"European Union", 
"doclist":{"numFound":1,"start":0,"docs":[ { "id":"EUR", 
"manu":"European Union"}] }}, { "groupValue":"U.K.", 
"doclist":{"numFound":1,"start":0,"docs":[ { "id":"GBP", "manu":"U.K."}] 
}}]}}} ||Thanks,||Shawn|





Re: Solr RSIZE memory overusage

2018-09-11 Thread Boris Pasko

> Same picture on Solr 6.6.2, tested on various Oracle JVMs ranging
> from 1.8.0_171 to 1.8.0_171

From 1.8.0_171 to 1.8.0_181




Re: Error while creating a new solr core

2018-09-11 Thread Christopher Schultz

Shalvak,

On 9/11/18 01:51, Shalvak Mittal (UST, ) wrote:
> I have recently installed solr 7.2.1 in my ubuntu 16.04 system.
> While creating a new core, the solr logging shows an error saying
> 
> 
> " Caused by: org.apache.solr.common.SolrException: fips module was
> not loaded."
> 
> 
> I have downloaded the necessary jar files like cryptoj.jar and
> copied them in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/ but
> the error still persists.
> 
> I have also updated the java.security file with 
> security.provider.x=com.rsa.jsafe.provider.JsafeJCE

Does JsafeJCE provide a FIPS-compliant JSSE back-end? If so, it looks
like it's not configured properly.

Does Solr work as expected when you are using the built-in JSSE (Sun)
provider?

> Can you please suggest a solution to the FIPS module problem. Are 
> there any files I am missing while creating the solr core?
You'll have to talk to your security module vendor about fixing this
issue... it's got nothing to do with Solr.

-chris


Solr RSIZE memory overusage

2018-09-11 Thread Boris Pasko
Hi. We're running Solr 6.6.1 (SolrCloud, 3 nodes). Recently I noticed it
became significantly slower to respond and did some basic checks on the
servers. There is little IO and only a bit of CPU usage (110% user, 3090%
idle), but one thing is very strange - the resident memory usage of Solr.

Despite -Xms=8G and -Xmx=8G, and despite the Solr UI showing only 4Gb of
heap used, top and atop show an RSIZE of 100Gb+ for the Solr process.

We are not using direct memory, and we have not increased the JVM's
allowed direct memory.

Further digging shows that Solr has 130K+ open files:

$ sudo lsof | grep 5514 | grep REG | wc -l
132104

What is even more worrisome is that some files are literally open
thousands of times, and even old tlog files are still kept open:
$ sudo lsof | grep 5514 | grep REG | grep tlog | head
java   5514 solr  142u  REG  252,0  2918073  71565947 /var/db/solr/data/regulatory_shard1_replica1/data/tlog/tlog.00000038952
java   5514 solr  147u  REG  252,0     2794  71565408 /var/db/solr/data/jobs_shard1_replica1/data/tlog/tlog.3144700
java   5514 solr  153u  REG  252,0     2869  71565602 /var/db/solr/data/jobs_shard1_replica1/data/tlog/tlog.3144697
java   5514 solr  160u  REG  252,0     2869  71565823 /var/db/solr/data/jobs_shard1_replica1/data/tlog/tlog.3144699
java   5514 solr  161u  REG  252,0   428385  71566321 /var/db/solr/data/WebResource_shard1_replica2/data/tlog/tlog.00075888387
java   5514 solr  162u  REG  252,0    11518  71567726 /var/db/solr/data/RSS_shard2_replica1/data/tlog/tlog.0055215
java   5514 solr  163u  REG  252,0     1676  71566426 /var/db/solr/data/jobs_shard2_replica2/data/tlog/tlog.3143993
java   5514 solr  176u  REG  252,0     1199  71565773 /var/db/solr/data/jobs_shard1_replica1/data/tlog/tlog.3144710
java   5514 solr  179u  REG  252,0     1769  71565833 /var/db/solr/data/jobs_shard2_replica2/data/tlog/tlog.3143992
java   5514 solr  180u  REG  252,0  2006034  71565631 /var/db/solr/data/WebResource_shard3_replica1/data/tlog/tlog.00075897473


$ sudo lsof | grep 5514 | grep REG | grep tlog | grep /var/db/solr/data/jobs_shard1_replica1/data/tlog/tlog.3144700 | wc -l
98

So this old tlog file, which (as I understand it) is supposed to be closed,
is still open 98 times.

I wonder if that is how Solr is supposed to work (I really doubt it).

Same picture on Solr 6.6.2, tested on various Oracle JVMs ranging
from 1.8.0_171 to 1.8.0_171

Boris.


–
The information contained in this message and any attachments may be 
confidential and/or restricted and protected from disclosure. If the reader of 
this message is not the intended recipient, disclosure, copying, use, or 
distribution of the information included in this message is prohibited - please 
destroy all electronic and paper copies and notify the sender immediately.


Re: local "q.op=AND" ignored for edismax query

2018-09-11 Thread Shawn Heisey

On 9/10/2018 5:45 PM, dshih wrote:

Based on what you said, is my query supposed to work as is if I set
luceneMatchVersion=7.1.0?  It does not appear to.


It does look like a luceneMatchVersion check was added to the change in 
SOLR-11501, so I would expect that to work.  Setting luceneMatchVersion 
will only affect the parts of Lucene and Solr code that have been 
specifically altered to pay attention to it.



Also, my understanding is using the local param makes the AND apply only to
the following search terms provided to the "q" query string.  If I add a
q.op=AND as a separate URL parameter, wouldn't that operator also apply for
everything else in the query operation?


Yes.  But since in most cases localparams have to be at the very 
beginning of the search string anyway, and cannot be in the middle like 
you have them, this is not much of a limitation.
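
For illustration, a hedged sketch (collection and field names here are made
up) of passing the operator as a regular request parameter rather than as a
localparam embedded in q:

http://localhost:8983/solr/mycollection/select?defType=edismax&q.op=AND&qf=title%20body&q=solr%20relevance%20tuning

Here q.op=AND applies to the whole query operation, which in many cases is
what is wanted anyway.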


Thanks,
Shawn



Re: Error while creating a new solr core

2018-09-11 Thread Shawn Heisey

On 9/10/2018 11:51 PM, Shalvak Mittal (UST, ) wrote:

I have recently installed solr 7.2.1 in my ubuntu 16.04 system. While creating 
a new core, the solr logging shows an error saying

" Caused by: org.apache.solr.common.SolrException: fips module was not loaded."


I have never heard of a module for Solr called "fips".  It certainly 
isn't one that comes with the Solr package. If you are dealing with a
module named fips, you would need to talk to whoever created it for help.


If you are customizing Jetty, you won't find any help here for that.  
Beyond a very few simple changes, we don't know all that much about 
Jetty.  It is used to provide network services for Solr, but this is 
done with a config that isn't changed much from Jetty defaults.



I have downloaded the necessary jar files like cryptoj.jar and copied them in 
/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/ but the error still persists.


This is a directory location that is meaningless for Solr. You would 
need to talk to someone at OpenJDK about this directory, since it's theirs.


Thanks,
Shawn



Re: Nutch 1.15 Indexing

2018-09-11 Thread Shawn Heisey

On 9/11/2018 12:22 AM, Bineesh wrote:

Need help on Nutch 1.15 indexing issues. We are using Nutch 1.15 and Solr
7.3.1 in our setup

1 : Is there a way i can mention multiple collections in the Nutch 1.15
indexwriters.xml file for the same   ?

I see collection works fine if i hardcoded the collection name in
indexer_solr_1

> 2 : I need to crawl multiple sites and index them in multiple collections.
> How can I achieve this, as I cannot hard-code the collection name in
> indexwriters.xml for writer id="indexer_solr_1 every time.


This is a Solr mailing list, and you are asking for help on Nutch, which 
is a completely separate project.  I've got no idea how Nutch works, 
even the parts that are meant to interface to Solr.


You're going to need to ask for help on the nutch mailing list.

Thanks,
Shawn



Re: Update partial document

2018-09-11 Thread Mikhail Khludnev
Hello, Vincenzo.

What about adding 1 into  "attr_" : [ "set" : null ], ?
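
For illustration, an untested sketch of that suggestion (note the "1" in
attr_1 and the JSON-object syntax for the atomic operations; "set": null
removes the field):

curl -X POST -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/gettingstarted/update?commit=true' \
  --data-binary '[
    { "id": "aaa",
      "attr_1": { "set": null },
      "attr_3": { "set": "x" } }
  ]'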

On Tue, Sep 11, 2018 at 7:23 PM Vincenzo D'Amore  wrote:

> Hi Solr gurus :)
>
> I have a delicious question (that I'm struggling with), really hope that
> someone can help me.
>
> There is a document with many fields but I have to modify only few of them.
>
> I thought to use atomic update but it seems that I cannot replace an entire
> list of dynamic fields.
>
> Here I try to explain my problem, for example using the schemaless
> configuration, I have a dynamic field:
>
>  stored="true" multiValued="true"/>
>
> And then I have a document :
>
>  {
> "id":"aaa",
> "value_i":10,
> "attr_1":["a"]
>  }
>
> I suppose to be able to remove attr_1 and add attr_3 with one atomic
> update.
>
> Like this:
>
> curl -X POST -H 'Content-Type: application/json' '
> http://localhost:8983/solr/gettingstarted/update?versions=true=true
> '
> --data-binary '
>  [
> {
>   "id" : "aaa" ,
>   "attr_" : [ "set" : null ],
>   "attr_3" : [ "set" : "x" ]
> }
> ]'
>
> But as result I only have a new attr_3 field (the field attr_1 is still
> there)
>
>  {
> "id":"aaa",
> "value_i":10,
> "attr_1":["a"],
> "attr_3":["x"]
>  }
>
> So it seems that, for this particular case, I first have to read the
> document and only then can I update it.
>
> Do you think there are other options?
> Can I use the StatelessScriptUpdateProcessorFactory ?
> Should I write my own UpdateProcessor ?
>
> Thanks in advance for your time.
> Vincenzo
>
> --
> Vincenzo D'Amore
>


-- 
Sincerely yours
Mikhail Khludnev


Update partial document

2018-09-11 Thread Vincenzo D'Amore
Hi Solr gurus :)

I have a delicious question (that I'm struggling with), really hope that
someone can help me.

There is a document with many fields but I have to modify only few of them.

I thought to use atomic update but it seems that I cannot replace an entire
list of dynamic fields.

Here I try to explain my problem, for example using the schemaless
configuration, I have a dynamic field:



And then I have a document :

 {
"id":"aaa",
"value_i":10,
"attr_1":["a"]
 }

I suppose to be able to remove attr_1 and add attr_3 with one atomic update.

Like this:

curl -X POST -H 'Content-Type: application/json' '
http://localhost:8983/solr/gettingstarted/update?versions=true=true'
--data-binary '
 [
{
  "id" : "aaa" ,
  "attr_" : [ "set" : null ],
  "attr_3" : [ "set" : "x" ]
}
]'

But as result I only have a new attr_3 field (the field attr_1 is still
there)

 {
"id":"aaa",
"value_i":10,
"attr_1":["a"],
"attr_3":["x"]
 }

So it seems that, for this particular case, I first have to read the
document and only then can I update it.

Do you think there are other options?
Can I use the StatelessScriptUpdateProcessorFactory ?
Should I write my own UpdateProcessor ?

Thanks in advance for your time.
Vincenzo

-- 
Vincenzo D'Amore


Re: group.limit>1 and sorting is not working as expected

2018-09-11 Thread Venkateswarlu Bommineni
Please find the response and query when grouping and sorting by rank:
http://localhost:8983/solr/master_shaneco_Product_flip/select?indent=on&q=rank:[1%20TO%20*]&wt=json&group=true&group.main=true&group.field=rank&rows=50&fl=code_String,price,rank&sort=rank+asc&group.limit=3
{
  "responseHeader":{
"status":0,
"QTime":5,
"params":{
  "q":"rank:[1 TO *]",
  "group.main":"true",
  "indent":"on",
  "fl":"code_String,price,rank",
  "group.limit":"3",
  "sort":"rank asc",
  "rows":"50",
  "wt":"json",
  "group.field":"rank",
  "group":"true"}},
  "response":{"numFound":14640,"start":0,"docs":[
  {
"price":4120.0,
"rank":1},
  {
"price":4210.0,
"rank":1},
  {
"price":4185.0,
"rank":1},
  {
"price":4225.0,
"rank":2},
  {
"price":4270.0,
"rank":2},
  {
"price":4270.0,
"rank":2},
  {
"price":2230.0,
"rank":3},
  {
"price":2110.0,
"rank":3},
  {
"price":2110.0,
"rank":3},
  {
"price":1910.0,
"rank":4},
  {
"price":2175.0,
"rank":4},
  {
"price":2045.0,
"rank":4},
  {
"price":4830.0,
"rank":5},
  {
"price":4845.0,
"rank":5},
  {
"price":4905.0,
"rank":5},
  {
"price":4180.0,
"rank":6},
  {
"price":4485.0,
"rank":6},
  {
"price":4530.0,
"rank":6},
  {
"price":1340.0,
"rank":7},
  {
"price":3535.0,
"rank":7},
  {
"price":1360.0,
"rank":7},
  {
"price":1275.0,
"rank":8},
  {
"price":1165.0,
"rank":8},
  {
"price":1215.0,
"rank":8},
  {
"price":1080.0,
"rank":9},
  {
"price":1075.0,
"rank":9},
  {
"price":1030.0,
"rank":9},
  {
"price":3310.0,
"rank":10},
  {
"price":4030.0,
"rank":10},
  {
"price":2625.0,
"rank":10},
  {
"price":4140.0,
"rank":11},
  {
"price":4140.0,
"rank":11},
  {
"price":3915.0,
"rank":11},
  {
"price":1670.0,
"rank":12},
  {
"price":1610.0,
"rank":12},
  {
"price":1670.0,
"rank":12},
  {
"price":1530.0,
"rank":13},
  {
"price":1530.0,
"rank":13},
  {
"price":1530.0,
"rank":13},
  {
"price":3945.0,
"rank":14},
  {
"price":3955.0,
"rank":14},
  {
"price":4160.0,
"rank":14},
  {
"price":2045.0,
"rank":15},
  {
"price":2545.0,
"rank":15},
  {
"price":2615.0,
"rank":15},
  {
"price":720.0,
"rank":16},
  {
"price":630.0,
"rank":16},
  {
"price":505.0,
"rank":16},
  {
"price":2835.0,
"rank":17},
  {
"price":2835.0,
"rank":17}]
  }}

The requirement is to sort those grouped results by price. Below
is the query I am trying, but it is not working.
http://localhost:8983/solr/master_shaneco_Product_flip/select?indent=on&q=rank:[1%20TO%20*]&wt=json&group=true&group.main=true&group.field=rank&rows=50&fl=code_String,price,rank&sort=price+asc&group.limit=3

{
  "responseHeader":{
"status":0,
"QTime":4,
"params":{
  "q":"rank:[1 TO *]",
  "group.main":"true",
  "indent":"on",
  "fl":"code_String,price,rank",
  "group.limit":"3",
  "sort":"price asc",
  "rows":"50",
  "wt":"json",
  "group.field":"rank",
  "group":"true"}},
  "response":{"numFound":14640,"start":0,"docs":[
  {
"price":10.0,
"rank":1422},
  {
"price":10.0,
"rank":1422},
  {
"price":25.0,
"rank":1533},
  {
"price":25.0,
"rank":1533},
  {
"price":35.0,
"rank":1533},
  {
"price":25.0,
"rank":1766},
  {
"price":25.0,
"rank":1766},
  {
"price":35.0,
"rank":1868},
  {
"price":35.0,
"rank":1868},
  {
"price":40.0,
"rank":1868},
  {
"price":40.0,
"rank":1779},
  {
"price":40.0,
"rank":1779},
  {
"price":45.0,
"rank":1759},
  {
"price":45.0,
"rank":1759},
  {
"price":60.0,
"rank":1759},
  {
"price":55.0,
"rank":2267},
  {
"price":55.0,
"rank":2267},
  {
"price":60.0,
"rank":1272},
  {
"price":60.0,
"rank":1272},
  {
"price":65.0,
"rank":1272},
  {
"price":60.0,
"rank":2356},
  {
"price":60.0,
"rank":2356},
  {
"price":60.0,

Re: Docker and Solr Indexing

2018-09-11 Thread Walter Underwood
4 Gb is very small for Solr.

Solr is not designed for Dockerized, fail-often use.

We use a LOT of Docker ECS, but all of our Solr servers are on EC2
instances. That’s about sixty instances in several clusters.

We run an 8 Gb heap for all our Solr instances. Instances in our biggest
cluster (in terms of index size and doc count) are c4.8xlarge, with 36 vCPU
and 60 Gb of RAM.
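
As a hedged sketch only (assuming the official docker-solr image, which
honors the SOLR_HEAP environment variable), giving the container a heap plus
plenty of headroom for the OS page cache might look like:

docker run -d --name solr --memory=16g -e SOLR_HEAP=8g -p 8983:8983 solr:7.4

The container memory limit should sit well above the heap; otherwise the
container can be killed by the kernel OOM killer, which would match the
restart behaviour described below.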

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Sep 10, 2018, at 11:05 PM, solrnoobie  wrote:
> 
> So we have a dockerized aws environment with the solr docker container having
> only 4 gigs for max ram.
> 
> Our problem is whenever we index, the container containing the leader shard
> will restart after around 2 or less minutes of index time (batch is 50 docs
> per batch with 3 threads in our app thread pool). Because of the container
> restart, indexing will fail because solrJ will throw an invalid content type
> exception because of the quick container restart.
> 
> What can possibly cause the issues above?
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: group.limit>1 and sorting is not working as expected

2018-09-11 Thread Erick Erickson
This is still confusing:

bq. But the requirement is to sort on all the results we show to the customer.

What does grouping have to do with that statement? Would it be served
by just _not_ grouping at all? If not, why not?

Please provide a small set of example documents and what you want to
show in the two cases.

Best,
Erick
On Mon, Sep 10, 2018 at 6:52 PM Venkateswarlu Bommineni
 wrote:
>
> Hello Erik,
>
> Sorry for the confusion. Here is the scenario.
>
> We have 2 fields rank,price for each product. multiple products may have
> same rank but different prices.
>
> So on products listing page, by default we will group on rank and show 3
> products for each group and sort based on Rank.
>
> But Customers can sort on price too. If they do sorting on price then
> sorting is happening inside group instead of all the records.
>
> But the requirement is to sort on all the results we show to the customer.
>
>
>  Query for grouping :
> https://localhost/solr/master_shaneco_Product_flip/select?indent=on=rank:[1%20TO%20*]=json=true=3=true=rank=50=rank+desc
>
>  Query when customer click on sort by price:
>
> https://localhost/solr/master_shaneco_Product_flip/select?indent=on=rank:[1%20TO%20*]=json=true=3=true=rank=50=price+desc
>
>
> Thanks,
> Venkat.
>
> On Mon, Sep 10, 2018 at 5:34 PM Erick Erickson 
> wrote:
>
> > bq. I just wanted to know if there is any attribute which says sort on all
> > the
> > document list instead of relative to group results.
> >
> > I really don't know what you want here. "sort on all the document list"
> > seems
> > like just sorting without grouping.
> >
> > From that problem statement I don't see what grouping has to do with
> > anything.
> >
> > Perhaps if you gave a concrete example of documents and the return you
> > expect we could understand your use-case better.
> >
> > Best,
> > Erick
> > On Mon, Sep 10, 2018 at 3:03 PM Venkateswarlu Bommineni
> >  wrote:
> > >
> > > Thanks for the reply Shawn.
> > >
> > > I have tried multiple combination of group.sort and sort but non of them
> > > worked.
> > >
> > > I just wanted to know if there is any attribute which says sort on all
> > the
> > > document list instead of relative to group results.
> > >
> > > Can you please help if you have any idea or work around ?
> > >
> > > Thanks,
> > > Venakt.
> > >
> > > On Sat, Sep 8, 2018 at 8:40 PM Shawn Heisey  wrote:
> > >
> > > > On 9/8/2018 8:34 PM, Venkateswarlu Bommineni wrote:
> > > > > Query ;
> > > > > https://
> > > >
> > /solr/default/select?fq=rank_int:[1%20TO%20*]=on=json=true=true=
> > > > >
> > > >
> > rank_int=3=*=code_string,sapRank_int,price=price+desc
> > > > >
> > > > > I am grouping on field *rank_int with group limit>3 and doing the
> > sorting
> > > > > on price. sorting is happening inside the group not on whole
> > records.*
> > > >
> > > > You need to find the right combination of sort and group.sort
> > parameters.
> > > >
> > > >
> > > >
> > https://lucene.apache.org/solr/guide/6_6/result-grouping.html#ResultGrouping-RequestParameters
> > > >
> > > > Thanks,
> > > > Shawn
> > > >
> > > >
> >
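
For reference, a hedged and untested sketch: with result grouping, "sort"
orders the groups relative to each other (based on each group's best
document) while "group.sort" orders documents inside each group, so one
combination worth trying for the price case is setting both:

http://localhost:8983/solr/master_shaneco_Product_flip/select?q=rank:[1%20TO%20*]&group=true&group.field=rank&group.limit=3&group.main=true&sort=price+asc&group.sort=price+asc&fl=code_String,price,rank&rows=50

Whether that matches "sort on all the results we show to the customer"
depends on the data; if it does not, simply not grouping for the
price-sorted view (as Erick asks above) may be the simpler route.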


Re: 504 timeout

2018-09-11 Thread John Blythe
ah, great thought. didn't even think of that. we already have a couple
ngram-based fields. will send over to the stakeholder who was attempting
this.

thanks!

--
John Blythe


On Sun, Sep 9, 2018 at 11:31 PM Erick Erickson 
wrote:

> First of all, wildcards are evil. Be sure that the reason people are
> using wildcards wouldn't be better served by proper tokenizing,
> perhaps something like stemming etc.
>
> Assuming that wildcards must be handled though, there are two main
> strategies:
> 1> if you want to use leading wildcards, look at
> ReverseWildcardFilterFactory. For something like abc* (trailing
> wildcard), conceptually Lucene has to construct a big OR query of
> every term that starts with "abc". That's not hard and is also pretty
> fast, just jump to the first term that starts with "abc" and gather
> all of them (they're sorted lexicaly) until you get to the first term
> starting with "abd".
>
> _Leading_ wildcards are a whole 'nother story. *abc means that each
> and every distinct term in the field must be enumerated. The first
> term could be abc and the last term in the field zzzabc.
> There's no way to tell without checking every one.
> ReverseWildcardFilterFactory handles indexing the term, well, reversed
> so in the above example not only would the term abc be indexed,
> but also cba. Now both leading and trailing wildcards are
> automagically made into trailing wildcards.
>
> 2> If you must allow leading and trailing wildcards on the same term
> *abc*, consider ngramming, bigrams are usually sufficient. So aaabcde
> is indexed as aa, aa, ab, bc, cd, de and searching for *abc* becomes
> searching for "ab bc".
>
> Both of these make the index larger, but usually by surprisingly
> little. People will also index these variants in separate fields upon
> occasion, it depends on the use-cases needed to support. Ngramming for
> instance would find "ab" in the above (no wildcards)
>
> Best,
> Erick
> On Sun, Sep 9, 2018 at 1:40 PM John Blythe  wrote:
> >
> > hi all. we just migrated to cloud on friday night (woohoo!). everything
> is
> > looking good (great!) overall. we did, however, just run into a hiccup.
> > running a query like this got us a 504 gateway time-out error:
> >
> > **some* *foo* *bar* *query**
> >
> > it was about 6 partials with encapsulating wildcards that someone was
> > running that gave the error. doing 4 or 5 of them worked fine, but upon
> > adding the last one or two it went caput. all operations have been
> zippier
> > since the migration before doing some of those wildcard queries which
> took
> > time (if they worked at all). is this something related directly w our
> > server configuration or is there some solr/cloud config'ing that we could
> > work on that would allow better response to these sorts of queries
> (though
> > it'd be at a cost, i'd imagine!).
> >
> > thanks for any insight!
> >
> > best,
> >
> > --
> > John Blythe
>
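
For reference, a hedged schema sketch of the two approaches described above
(type names, tokenizers and parameter values are illustrative only):

<!-- leading/trailing wildcards via reversed terms at index time -->
<fieldType name="text_rev" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
            maxPosAsterisk="2" maxPosQuestion="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- double-ended wildcards approximated with bigrams -->
<fieldType name="text_bigram" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="2"/>
  </analyzer>
</fieldType>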


Speakers needed for Apache DC Roadshow

2018-09-11 Thread Rich Bowen
We need your help to make the Apache Washington DC Roadshow on Dec 4th a 
success.


What do we need most? Speakers!

We're bringing a unique DC flavor to this event by mixing Open Source 
Software with talks about Apache projects as well as OSS CyberSecurity, 
OSS in Government, and OSS Career advice.


Please take a look at: http://www.apachecon.com/usroadshow18/

(Note: You are receiving this message because you are subscribed to one 
or more mailing lists at The Apache Software Foundation.)


Rich, for the ApacheCon Planners

--
rbo...@apache.org
http://apachecon.com
@ApacheCon


Re: Error casting to PointField

2018-09-11 Thread Erick Erickson
point-based fields cannot be used for the uniqueKey field, see:
https://issues.apache.org/jira/browse/SOLR-10829

This should be documented better in the ref guide
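
Since a Points-based type cannot back the uniqueKey, a hedged schema sketch
(field and type names are illustrative only) is to keep the key on a
non-Points type while the other numeric fields move to *PointField types:

<fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true"/>
<fieldType name="plong" class="solr.LongPointField" docValues="true"/>

<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="price_l" type="plong" indexed="true" stored="true"/>

<uniqueKey>id</uniqueKey>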
On Tue, Sep 11, 2018 at 5:53 AM Zahra Aminolroaya
 wrote:
>
> We read that in Solr 7, Trie* fields are deprecated, so we decided to change
> all of our Trie* fields to *pointtype Fields.
>
> Our unique key field type is long, and we changed our long field type
> something like below;
>
>  indexed="false"/>
>
> We get the error uniqueKey field can not be configured to use a Points based
> FieldType.
>
>
> I think it is a bug. If Lucene decides to deprecate the Trie* field types, it
> should also think of these kinds of errors.
>
>
> What is the solution?
>
> Best,
> Zahra
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: any way to post json document to a MoreLikeThisHandler?

2018-09-11 Thread Alexandre Rafalovitch
There are three ways to trigger MLT:
https://lucene.apache.org/solr/guide/7_4/morelikethis.html

MoreLikeThisHandler allows you to supply text externally. Unfortunately, I
can't find the specific example demonstrating it, so I'm not sure whether it
takes just a blob of text or a whole document.
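
As an untested sketch only (it assumes a /mlt handler backed by
MoreLikeThisHandler is registered, and that stream.body is enabled via
enableStreamBody in the requestDispatcher), supplying external text might
look like:

curl 'http://localhost:8983/solr/mycore/mlt?mlt.fl=field1,field2&mlt.mintf=1&mlt.mindf=1' \
  --data-urlencode 'stream.body=blah blah blah foo bar'

Note this appears to take plain text rather than a structured JSON document,
so per-field control over the posted content would still need the fields
concatenated, or a custom handler.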

Regards,
   Alex.

On 11 September 2018 at 09:55, Matt Work Coarr  wrote:
> Hello,
>
> Using a MoreLikeThisHandler, I was hoping to be able to pass in the post
> body a json document (the same format as a document indexed in my core, but
> the document in the request is not and should not be added to the core).
>
> I'm thinking it would handle an incoming document similar to how the
> /update handler can split up a json document into the set of fields defined
> in the schema (or auto created fields).
>
> For instance, my input document would look like this:
>
> {
>   "id": 1234,
>   "field1": "blah blah blah",
>   "field2": "foo bar",
>   "field3": 112233
> }
>
> And then I want to be able to use the MoreLikeThis query parameters to
> determine which fields are used in the MLT comparison.
>
> Thanks,
> Matt


any way to post json document to a MoreLikeThisHandler?

2018-09-11 Thread Matt Work Coarr
Hello,

Using a MoreLikeThisHandler, I was hoping to be able to pass in the post
body a json document (the same format as a document indexed in my core, but
the document in the request is not and should not be added to the core).

I'm thinking it would handle an incoming document similar to how the
/update handler can split up a json document into the set of fields defined
in the schema (or auto created fields).

For instance, my input document would look like this:

{
  "id": 1234,
  "field1": "blah blah blah",
  "field2": "foo bar",
  "field3": 112233
}

And then I want to be able to use the MoreLikeThis query parameters to
determine which fields are used in the MLT comparison.

Thanks,
Matt


Error casting to PointField

2018-09-11 Thread Zahra Aminolroaya
We read that in Solr 7, Trie* fields are deprecated, so we decided to change
all of our Trie* fields to *pointtype Fields. 

Our unique key field type is long, and we changed our long field type
something like below;



We get the error uniqueKey field can not be configured to use a Points based
FieldType.


I think it is a bug. If Lucene decides to deprecate the Trie* field types, it
should also think of these kinds of errors.


What is the solution?

Best,
Zahra




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Error while creating a new solr core

2018-09-11 Thread Shalvak Mittal (UST, )
Hi,


I have recently installed solr 7.2.1 in my ubuntu 16.04 system. While creating 
a new core, the solr logging shows an error saying


" Caused by: org.apache.solr.common.SolrException: fips module was not loaded."


I have downloaded the necessary jar files like cryptoj.jar and copied them in 
/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/ but the error still persists.

I have also updated the java.security file with
security.provider.x=com.rsa.jsafe.provider.JsafeJCE



Can you please suggest a solution to the FIPS module problem. Are there any 
files I am missing while creating the solr core?


Thank You,

Shalvak Mittal




Nutch 1.15 Indexing

2018-09-11 Thread Bineesh
Hi Team,

Need help on Nutch 1.15 indexing issues. We are using Nutch 1.15 and Solr
7.3.1 in our setup

1 : Is there a way i can mention multiple collections in the Nutch 1.15
indexwriters.xml file for the same   ?

I see collection works fine if i hardcoded the collection name in
indexer_solr_1

2 : I need to crawl multiple sites and index them in multiple collections.
How can I achieve this, as I cannot hard-code the collection name in
indexwriters.xml for writer id="indexer_solr_1 every time.

Please suggest



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Potential bug? maxConnectionsPerHost on requestHandler configuration

2018-09-11 Thread Greg Roodt
This is expected behaviour. The shardHandlerFactory element is configured
in solr.xml, not solrconfig.xml. See:
https://lucene.apache.org/solr/guide/7_4/format-of-solr-xml.html
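
For illustration, a hedged sketch of a global shard handler configuration in
solr.xml (the parameter values here are made up, not a recommendation):

<solr>
  <shardHandlerFactory name="shardHandlerFactory"
                       class="HttpShardHandlerFactory">
    <int name="socketTimeout">600000</int>
    <int name="connTimeout">60000</int>
    <int name="maxConnectionsPerHost">20</int>
  </shardHandlerFactory>
  <!-- solrcloud and logging sections as usual -->
</solr>

Since it lives in solr.xml, pushing the change to ZooKeeper and restarting
the nodes (as you observed) is what makes it take effect; a core RELOAD is
not enough.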





On Tue, 11 Sep 2018 at 11:55, Ash Ramesh  wrote:

> Hi,
>
> I tried setting up a bespoke ShardHandlerFactory configuration for each
> request handler in solrconfig.xml. However when I stepped through the code
> in debug mode (via IntelliJ) I could see that the ShardHandler created and
> used in the searcher still didn't reflect the values in solrconfig (even
> after a core RELOAD).
>
> I did find that it did reflect changes to the ShardHandlerFactory in
> solr.xml when I changed it, pushed to ZK and restarted Solr.
>
> Is this expected or am I going about this the wrong way.
>
> Example RequestHandler syntax:
>
>  ="search_defaults">
> 
>  class="HttpShardHandlerFactory">
> 6
> 1000
> 99
> 
> 
>
> We are trying to understand why our machines have their CPUs stalling at
> 60-80% all the time. We suspect it's because of the maxConnections, but ran
> into this issue first.
>
> Best Regards,
>
> Ash
>
> --


Docker and Solr Indexing

2018-09-11 Thread solrnoobie
So we have a dockerized aws environment with the solr docker container having
only 4 gigs for max ram.

Our problem is whenever we index, the container containing the leader shard
will restart after around 2 or less minutes of index time (batch is 50 docs
per batch with 3 threads in our app thread pool). Because of the container
restart, indexing will fail because solrJ will throw an invalid content type
exception because of the quick container restart.

What can possibly cause the issues above?



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html