Re: Different number of replicas for different shards

2019-06-30 Thread Nawab Zada Asad Iqbal
@Erick

Actually, I thought about it further and realized what you were saying. I am
hoping to rely on the murmur3 hash of the routing key to find the destination
shard.
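
For reference, the ADDREPLICA command discussed below accepts a _route_ value
directly, so the owning shard does not have to be computed by hand from the
hash. A minimal, hedged sketch (the host and collection name are placeholders
reused from elsewhere in these threads, not part of this message):

import requests

SOLR = "http://localhost:8983/solr"

def add_replica_for_route(collection, route_key, node=None):
    # ADDREPLICA lets Solr resolve the owning shard from a routing key
    # via the _route_ parameter, instead of naming the shard explicitly.
    params = {
        "action": "ADDREPLICA",
        "collection": collection,
        "_route_": route_key,          # e.g. "tenant42!" for composite-id routing
        "wt": "json",
    }
    if node:
        params["node"] = node          # optionally pin the new replica to a node
    resp = requests.get(f"{SOLR}/admin/collections", params=params)
    resp.raise_for_status()
    return resp.json()

print(add_replica_for_route("filesearch", "tenant42!"))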



On Sun, Jun 30, 2019 at 3:32 AM Nawab Zada Asad Iqbal 
wrote:

> Hi Erick,
>
> I plan to use the composite-id routing. And I can use the same routing
> part of the key to determine the shard number in ADDREPLICA command (using
> the route parameter). I think this solution will work for me.
>
>
> Thanks
> Nawab
>
>
>
> On Sat, Jun 29, 2019 at 8:55 AM Erick Erickson 
> wrote:
>
>> What’s your basis for thinking that some shard will get more queries?
>> Unless you’re using implicit routing, you really have no control over
>> either where docs end up or, thus, where queries go.
>>
>> If you do somehow know some shards get more queries, one strategy is to
>> simply have more _replicas_ for those shards with the ADDREPLICA
>> collections API command.
>>
>>
>> > On Jun 29, 2019, at 7:00 AM, Shawn Heisey  wrote:
>> >
>> > On 6/29/2019 12:23 AM, Nawab Zada Asad Iqbal wrote:
>> >> Is it possible to specify a different number of replicas for different
>> >> shards? I.e., if I expect some shard to get more queries, I can add more
>> >> replicas to that shard alone, instead of adding replicas for all the
>> >> shards.
>> >
>> > On initial collection creation, I don't think that is possible -- the
>> number of replicas requested will apply to every shard.  But you can add
>> and remove replicas on shards after collection creation, so this is
>> achievable.
>> >
>> > Thanks,
>> > Shawn
>>
>>


Re: Different number of replicas for different shards

2019-06-30 Thread Nawab Zada Asad Iqbal
Hi Erick,

I plan to use the composite-id routing. And I can use the same routing part
of the key to determine the shard number in ADDREPLICA command (using the
route parameter). I think this solution will work for me.


Thanks
Nawab



On Sat, Jun 29, 2019 at 8:55 AM Erick Erickson 
wrote:

> What’s your basis for thinking that some shard will get more queries?
> Unless you’re using implicit routing, you really have no control over
> either where docs end up or, thus, where queries go.
>
> If you do somehow know some shards get more queries, one strategy is to
> simply have more _replicas_ for those shards with the ADDREPLICA
> collections API command.
>
>
> > On Jun 29, 2019, at 7:00 AM, Shawn Heisey  wrote:
> >
> > On 6/29/2019 12:23 AM, Nawab Zada Asad Iqbal wrote:
> >> Is it possible to specify a different number of replicas for different
> >> shards? I.e., if I expect some shard to get more queries, I can add more
> >> replicas to that shard alone, instead of adding replicas for all the
> >> shards.
> >
> > On initial collection creation, I don't think that is possible -- the
> number of replicas requested will apply to every shard.  But you can add
> and remove replicas on shards after collection creation, so this is
> achievable.
> >
> > Thanks,
> > Shawn
>
>


Different number of replicas for different shards

2019-06-29 Thread Nawab Zada Asad Iqbal
Hi,

Is it possible to specify a different number of replicas for different
shards? I.e., if I expect some shard to get more queries, I can add more
replicas to that shard alone, instead of adding replicas for all the
shards.

Thanks
Nawab


Re: Solr metric for finding number of merges in progress

2018-09-07 Thread Nawab Zada Asad Iqbal
Actually, I found this after posting (still, if someone has more to offer,
please reply):

https://lucene.apache.org/solr/guide/7_0/metrics-reporting.html#index-merge-metrics
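
For a quick look without the admin UI, the same information can be pulled from
the Metrics API. A hedged sketch (the host is a placeholder; exact metric
names vary by Solr version, and some of the merge metrics are only reported
when they are enabled in solrconfig.xml, per the page above):

import json
import requests

SOLR = "http://localhost:8983/solr"

# Ask the Metrics API for just the merge-related metrics of every core.
resp = requests.get(f"{SOLR}/admin/metrics",
                    params={"group": "core", "prefix": "INDEX.merge", "wt": "json"})
resp.raise_for_status()
print(json.dumps(resp.json().get("metrics"), indent=2))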

On Fri, Sep 7, 2018 at 12:13 PM Nawab Zada Asad Iqbal 
wrote:

> Hi,
>
> Does Solr expose a metric to find the number of segment merges which are
> in progress?
>
>
> Thanks
> Nawab
>


Solr metric for finding number of merges in progress

2018-09-07 Thread Nawab Zada Asad Iqbal
Hi,

Does Solr expose a metric to find the number of segment merges which are in
progress?


Thanks
Nawab


Re: Internal details on how ADDREPLICA executes?

2018-08-02 Thread Nawab Zada Asad Iqbal
Thanks Erick!

On Thu, Aug 2, 2018 at 1:26 PM, Erick Erickson 
wrote:

> Oh my, I see confusion on the horizon. _Which_ autoaddreplica?
>
> Pre 7x and autoscaling, autoAddReplica is all about HDFS and
> spinning up a new Solr instance that points to an existing index
> so there's no impact on the leader (other than normal peer sync
> if you're actively indexing).
>
> Autoscaling, on the other hand (7x) spins up a new replica and
> it goes through sync before it's ready to serve queries. This is
> really just an ADDREPLICA and you can test the impacts on
> your particular installation by issuing that collections API
> command.
>
> Best,
> Erick
>
> On Thu, Aug 2, 2018 at 12:20 PM, Nawab Zada Asad Iqbal 
> wrote:
> > Hi,
> >
> > I am considering using SolrCloud and enabling autoAddReplica.
> > I am curious how long it takes for SolrCloud to set up the replica.
> > Before the new replica can start to serve queries, it needs to copy all
> > the documents from the current leader and also index whatever new traffic
> > is arriving. How is this state tracked? I assume SolrCloud knows about this
> > transient state between a new replica's creation and its being ready to
> > serve queries, and handles it appropriately.
> >
> > Do I need to consider any performance issues after executing addReplica
> > (i.e., will it affect the current leader's indexing or query response time
> > while older segment files are being copied to the new replica)?
> >
> >
> > Thanks
> > Nawab
>


Internal details on how ADDREPLICA executes?

2018-08-02 Thread Nawab Zada Asad Iqbal
Hi,

I am considering using SolrCloud and enabling autoAddReplica.
I am curious how long it takes for SolrCloud to set up the replica.
Before the new replica can start to serve queries, it needs to copy all the
documents from the current leader and also index whatever new traffic is
arriving. How is this state tracked? I assume SolrCloud knows about this
transient state between a new replica's creation and its being ready to serve
queries, and handles it appropriately.

Do I need to consider any performance issues after executing addReplica
(i.e., will it affect the current leader's indexing or query response time
while older segment files are being copied to the new replica)?
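
The transient state asked about here is visible through the Collections API:
each replica reports a state (down, recovering, active) in CLUSTERSTATUS while
it syncs from the leader. A hedged sketch of polling for it (the host and
collection name are placeholders):

import time
import requests

SOLR = "http://localhost:8983/solr"

def replica_states(collection):
    # CLUSTERSTATUS reports every replica's state: down, recovering, or active.
    resp = requests.get(f"{SOLR}/admin/collections",
                        params={"action": "CLUSTERSTATUS",
                                "collection": collection, "wt": "json"})
    resp.raise_for_status()
    shards = resp.json()["cluster"]["collections"][collection]["shards"]
    return {f"{s}/{r}": replica["state"]
            for s, shard in shards.items()
            for r, replica in shard["replicas"].items()}

def wait_until_all_active(collection, timeout_s=600, poll_s=5):
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        states = replica_states(collection)
        print(states)
        if all(state == "active" for state in states.values()):
            return True
        # new replicas stay "recovering" while they sync from the leader
        time.sleep(poll_s)
    return False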


Thanks
Nawab


Re: SolrCloud: Different replicationFactor for different shards in same collection

2018-07-31 Thread Nawab Zada Asad Iqbal
Thanks Erick


This is for the future. I am exploring using a custom sharding scheme (which
will require modifications to the Solr code) together with the benefits of
SolrCloud.



Thanks
Nawab



On Tue, Jul 31, 2018 at 4:51 PM, Erick Erickson 
wrote:

> Sure, just use the Collections API ADDREPLICA command to add as many
> replicas for specific shards as you want. There's no way to specify
> that at creation time though.
>
> Some of the new autoscaling can do this automatically I believe.
>
> I have to ask what it is about your collection that makes this true. If
> you're using the default composite id routing having one shard get
> substantially more queries than the others is unexpected.
>
> If you're using implicit routing then it's perfectly understandable.
>
> Best,
> Erick
>
> On Tue, Jul 31, 2018 at 4:12 PM, Nawab Zada Asad Iqbal 
> wrote:
> > Hi,
> >
> > I am looking at Solr 7.x and couldn't find an answer in the documentation.
> > Is it possible to specify a different replicationFactor for different
> > shards in the same collection? E.g., if a certain shard is receiving more
> > queries than the rest of the collection, I would like to add more replicas
> > for it to help with the query load.
> >
> >
> >
> > Thanks
> > Nawab
>


SolrCloud: Different replicationFactor for different shards in same collection

2018-07-31 Thread Nawab Zada Asad Iqbal
Hi,

I am looking at Solr 7.x and couldn't find an answer in the documentation.
Is it possible to specify a different replicationFactor for different shards
in the same collection? E.g., if a certain shard is receiving more queries
than the rest of the collection, I would like to add more replicas for it to
help with the query load.
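
Per the reply quoted in the thread above, the collection is created with a
uniform replicationFactor and individual shards are then grown with
ADDREPLICA. A hedged sketch (host, collection name, shard names, and counts
are all placeholders):

import requests

SOLR = "http://localhost:8983/solr"

def collections_api(**params):
    params["wt"] = "json"
    resp = requests.get(f"{SOLR}/admin/collections", params=params)
    resp.raise_for_status()
    return resp.json()

# Every shard starts with two replicas...
collections_api(action="CREATE", name="filesearch",
                numShards=4, replicationFactor=2)

# ...and the shard expected to take more query load gets two extra ones.
for _ in range(2):
    collections_api(action="ADDREPLICA", collection="filesearch", shard="shard2")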



Thanks
Nawab


Re: Mysterious Solr crash

2018-06-04 Thread Nawab Zada Asad Iqbal
I am using 7.0.1 without SolrCloud (sorry for missing that detail earlier).

I totally agree with you, Shawn. A crash after an OOM is not graceful like
this one; it usually ends with an incomplete log line.

I tried to copy only the log lines which seemed related to this issue; I will
see if I find any other meaningful parts and copy them here.

On Sun, Jun 3, 2018 at 6:07 PM, Shawn Heisey  wrote:

> On 6/3/2018 7:52 AM, Nawab Zada Asad Iqbal wrote:
>
>> I am running a batch indexing job and the Solr core mysteriously shut down
>> without any particular error. How can I investigate this? I am focusing on
>> the line which mentions "Shutting down CoreContainer instance".
>>
>> There are errors soon after that, but they seem to be caused by the core
>> shutting down and not the other way round, although I could be wrong.
>>
>
> Crashes in Solr are extremely rare.  If it actually does happen, it would
> probably be caused by severe issues with Java or the operating system, or
> hardware problems like failing memory chips.
>
> If you've got a recent version of Solr and it's running on an OS that's
> NOT Windows, then an OutOfMemoryError is likely going to result in Solr
> being killed.  But if that happens, all logging stops -- Solr will not log
> anything about the CoreContainer shutting down.
>
> The most likely cause here is that something triggered the shutdown hook
> in Solr, which caused an orderly shutdown of the CoreContainer, so you got
> that log message.  In a nutshell, Solr shut down because something external
> explicitly told it to shut down.
>
> Can you share the entire solr.log file?  I can't guarantee that the file
> will have anything in it that explains what went wrong, but we can look.
>
> Thanks,
> Shawn
>
>


Mysterious Solr crash

2018-06-03 Thread Nawab Zada Asad Iqbal
Good morning

I am running a batch indexing job and the Solr core mysteriously shut down
without any particular error. How can I investigate this? I am focusing on
the line which mentions "Shutting down CoreContainer instance".

There are errors soon after that, but they seem to be caused by the core
shutting down and not the other way round, although I could be wrong.


Thanks
Nawab


Jun 03, 2018 03:02:02 AM INFO  (qtp761960786-1049) [   x:filesearch]
o.a.s.u.p.LogUpdateProcessorFactory [filesearch]  webapp=/solr path=/update
params={commit=false}{add=[file_30594113079, file_221594113074,
file_92594113074, file_93594113076, file_94594113078, file_155594113071,
file_285594113073, file_286594113075, file_13694113071, file_16694113077]}
0 201
Jun 03, 2018 03:02:02 AM INFO  (qtp761960786-911) [   x:filesearch]
o.a.s.u.p.LogUpdateProcessorFactory [filesearch]  webapp=/solr path=/update
params={commit=false}{add=[file_98928108921, file_113038108926,
file_114038108928, file_46038108921, file_48038108925, folder_48038108925,
file_100138108926, file_291138108921, file_292138108923, file_33138108921]}
0 113
Jun 03, 2018 03:02:02 AM INFO  (Thread-0) [   ] o.a.s.c.CoreContainer
Shutting down CoreContainer instance=1021436681
Jun 03, 2018 03:02:02 AM INFO  (Thread-0) [   ] o.a.s.m.SolrMetricManager
Closing metric reporters for registry=solr.node, tag=null
Jun 03, 2018 03:02:02 AM INFO  (Thread-0) [   ] o.a.s.m.SolrMetricManager
Closing metric reporters for registry=solr.jvm, tag=null
Jun 03, 2018 03:02:02 AM INFO  (Thread-0) [   ] o.a.s.m.SolrMetricManager
Closing metric reporters for registry=solr.jetty, tag=null
Jun 03, 2018 03:02:02 AM INFO  (qtp761960786-851) [   x:filesearch]
o.a.s.u.p.LogUpdateProcessorFactory [filesearch]  webapp=/solr path=/update
params={commit=false}{add=[file_34202892465, file_35202892467,
file_226202892462]} 0 75
Jun 03, 2018 03:02:02 AM INFO  (qtp761960786-1124) [   x:filesearch]
o.a.s.u.p.LogUpdateProcessorFactory [filesearch]  webapp=/solr path=/update
params={commit=false}{add=[folder_31457908173, file_32457908175,
folder_32457908175, file_33457908177, file_34457908179, file_94457908170,
folder_20557908177, file_19557908175]} 0 115
Jun 03, 2018 03:02:02 AM ERROR (qtp761960786-851) [   x:filesearch]
o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Early EOF
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:190)
at
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:177)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2484)
at
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:720)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:526)


Query logs when query reached the solr server

2018-05-31 Thread Nawab Zada Asad Iqbal
Hi,

1. Is there a way to log a query at the moment it first reaches the Solr
server?
2. In recent Solr versions, there is a NOW value in the query log. Is it
correct to assume that this is the time when the query arrived on that server?



Thanks
Nawab


Solr admin Segments page legend

2018-05-17 Thread Nawab Zada Asad Iqbal
Hi,

Solr has a nice segments visualization at [core_name]/segments, but I am
wondering what the colors mean.

Gray is probably deleted documents, but I couldn't guess the significance of
the pink color:



Thanks
Nawab


Re: Solr 7.3 debug/explain with boost applied

2018-04-24 Thread Nawab Zada Asad Iqbal
I didn't know you could add boosts like that (=2). Are you boosting on
a field or a document by using that syntax?

On Sun, Apr 22, 2018 at 10:51 PM, Ryan Yacyshyn 
wrote:

> Hi all,
>
> When viewing the explain under debug=true in Solr 7.3.0 using
> the edismax query parser with a boost, I only see the "boost" part of the
> explain. Without applying a boost I see the full explain. Is this the
> expected behaviour?
>
> Here's how to check using the techproducts example..
>
> bin/solr -e techproducts
>
> ```
> http://localhost:8983/solr/techproducts/select?q={!
> edismax}samsung=name=true
> ```
>
> returns:
>
> ```
> "debug": {
> "rawquerystring": "{!edismax}samsung",
> "querystring": "{!edismax}samsung",
> "parsedquery": "+DisjunctionMaxQuery((name:samsung))",
> "parsedquery_toString": "+(name:samsung)",
> "explain": {
>   "SP2514N": "\n2.3669035 = weight(name:samsung in 1)
> [SchemaSimilarity], result of:\n  2.3669035 = score(doc=1,freq=1.0 =
> termFreq=1.0\n), product of:\n2.6855774 = idf, computed as log(1 +
> (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:\n  1.0 = docFreq\n
> 21.0 = docCount\n0.8813388 = tfNorm, computed as (freq * (k1 + 1))
> / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n  1.0
> = termFreq=1.0\n  1.2 = parameter k1\n  0.75 = parameter b\n
> 7.5238094 = avgFieldLength\n  10.0 = fieldLength\n"
> },
> "QParser": "ExtendedDismaxQParser",
> ...
> ```
>
> If I just add =2 to this, I get this explain back:
>
> ```
> "debug": {
> "rawquerystring": "{!edismax}samsung",
> "querystring": "{!edismax}samsung",
> "parsedquery": "FunctionScoreQuery(FunctionScoreQuery(+(name:samsung),
> scored by boost(const(2",
> "parsedquery_toString": "FunctionScoreQuery(+(name:samsung), scored by
> boost(const(2)))",
> "explain": {
>   "SP2514N": "\n4.733807 = product of:\n  1.0 = boost\n  4.733807 =
> boost(const(2))\n"
> },
> "QParser": "ExtendedDismaxQParser",
> ...
> ```
>
> Is this normal? I was expecting to see more like the first example, with
> the addition of the boost applied.
>
> Thanks,
> Ryan
>


Re: Solr document routing using composite key

2018-04-07 Thread Nawab Zada Asad Iqbal
Thanks Shawn and Erick.

This is what I also ended up finding: as the number of buckets increased, I
noticed the issue.

Zheng: I am using Solr 7, but this was only an experiment on the hash, i.e.,
what distribution I should expect from it (as the above gist shows). I didn't
actually index into Solr 7, but I would expect it to do something like the
above if I had actually indexed into Solr with these partitions and ids.
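
A hedged sketch of the same experiment with a larger sample, which illustrates
the point in the quoted replies below that the hash only balances out
statistically. It uses the mmh3 package as a stand-in for murmur3 and plain
modulo bucketing, not Solr's exact CompositeIdRouter hash-range assignment:

from collections import Counter
import mmh3   # pip install mmh3

NUM_SHARDS = 117

def bucket(doc_id: str) -> int:
    # 32-bit murmur3, masked to unsigned, then a plain modulo into shards
    return (mmh3.hash(doc_id) & 0xffffffff) % NUM_SHARDS

for total_docs in (117, 117_000):
    counts = Counter(bucket(f"doc_{i}") for i in range(total_docs))
    empty = NUM_SHARDS - len(counts)
    print(f"{total_docs:>7} docs: min={min(counts.values())}, "
          f"max={max(counts.values())}, empty shards={empty}")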





On Fri, Mar 16, 2018 at 9:24 AM, Erick Erickson <erickerick...@gmail.com>
wrote:

> What Shawn said. 117 shards and 116 docs tells you absolutely nothing
> useful. I've never seen the number of docs on various shards be off by
> more than 2-3% when enough docs are indexed to be statistically valid.
>
> Best,
> Erick
>
> On Fri, Mar 16, 2018 at 5:34 AM, Shawn Heisey <apa...@elyograg.org> wrote:
> > On 3/6/2018 11:53 AM, Nawab Zada Asad Iqbal wrote:
> >>
> >> I have 117 shards and i tried to use document ids from zero to 116. I
> find
> >> that the distribution is very uneven, e.g., the largest bucket receives
> >> total 5 documents; and around 38 shards will be empty.  Is it expected?
> >
> >
> > With such a small data set, this fits what I would expect.
> >
> > Choosing buckets by hashing (which is what compositeId does) is not
> perfect,
> > but if you send it thousands or millions of documents, it will be
> > *generally* balanced.
> >
> > Thanks,
> > Shawn
> >
>


Solr Metrics mismatch the query logs

2018-03-28 Thread Nawab Zada Asad Iqbal
Hi,


I gather Solr metrics from the following URL for my request handler.

[solr-host]/solr/core1/admin/mbeans?stats=true=json



Specifically, there is a requests value in the stats map for each handler.
I use this value to calculate the number of requests which arrived between
two successive polls of the stats.

What I want to know is:

How is this number updated? At the start of the query handler's execution or
near the end? I tried to grep the code but couldn't find the place.

How does this number relate to the queryStartTime and queryEndTime which
appear in the INFO log for each query?

I have noticed that sometimes there is a big spike in the numbers coming
from the metrics API (requests count), while the count of the queries in
the logs doesn't jump as drastically.


PS: I am using Solr 4, so the metrics API URL may not work for recent
versions.
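
A hedged sketch of the polling described above. The host, core name, and the
"QUERYHANDLER"/"/select" keys are placeholders (newer Solr versions report
handlers under a different category); json.nl=map is requested so the
NamedList comes back as a plain JSON object:

import time
import requests

SOLR = "http://localhost:8983/solr"

def handler_requests(core, category="QUERYHANDLER", handler="/select"):
    # json.nl=map turns Solr's NamedLists into plain JSON objects
    resp = requests.get(f"{SOLR}/{core}/admin/mbeans",
                        params={"stats": "true", "wt": "json", "json.nl": "map"})
    resp.raise_for_status()
    return resp.json()["solr-mbeans"][category][handler]["stats"]["requests"]

prev = handler_requests("core1")
time.sleep(60)                       # polling interval
curr = handler_requests("core1")
print("requests arrived in the last interval:", curr - prev)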


Thanks!
Nawab


Solr document routing using composite key

2018-03-06 Thread Nawab Zada Asad Iqbal
Hi Solr community,


I have been thinking of using a composite key for my next project iteration,
and I tried it today to see how it distributes the documents.

Here is a gist of my code:
https://gist.github.com/niqbal/3e293e2bcb800d6912a250d914c9d478

I have 117 shards and I tried document ids from zero to 116. I find that the
distribution is very uneven; e.g., the largest bucket receives a total of 5
documents, and around 38 shards end up empty. Is that expected?

In the following result, the first value is the shard number and the second
is the list of document ids it received.

List(98:List(29)
, 34:List(36)
, 8:List(54)
, 73:List(31)
, 19:List(77)
, 23:List(59)
, 62:List(86)
, 77:List(105)
, 11:List(11)
, 104:List(23)
, 44:List(4)
, 37:List(0)
, 61:List(71)
, 107:List(37)
, 46:List(34)
, 99:List(19)
, 24:List(32)
, 94:List(90)
, 103:List(106)
, 72:List(97)
, 59:List(2)
, 76:List(6)
, 54:List(20)
, 65:List(3)
, 71:List(26)
, 108:List(17)
, 106:List(57)
, 17:List(108)
, 25:List(13)
, 60:List(56)
, 102:List(87)
, 69:List(60)
, 64:List(53)
, 53:List(85)
, 42:List(35)
, 115:List(82)
, 0:List(28)
, 20:List(27)
, 81:List(39)
, 101:List(92)
, 30:List(16)
, 41:List(63)
, 3:List(10)
, 91:List(21)
, 85:List(18)
, 28:List(8)
, 113:List(76, 95)
, 51:List(47, 102)
, 78:List(30, 67)
, 4:List(52, 84)
, 110:List(112, 116)
, 9:List(1, 40)
, 50:List(22, 101)
, 13:List(72, 83)
, 35:List(73, 100)
, 16:List(48, 64)
, 112:List(69, 103)
, 10:List(14, 66)
, 87:List(68, 104)
, 57:List(49, 114)
, 36:List(79, 99)
, 1:List(24, 70)
, 96:List(5, 98)
, 95:List(45, 89)
, 75:List(9, 91)
, 70:List(62, 78)
, 2:List(74, 75)
, 114:List(81, 88)
, 74:List(7, 115)
, 52:List(46, 111)
, 55:List(12, 50, 113)
, 47:List(43, 44, 96)
, 92:List(25, 33, 58)
, 39:List(15, 41, 61, 107)
, 21:List(38, 51, 55, 93, 110)
, 27:List(42, 65, 80, 94, 109)
)


Re: Are flushed (but not committed yet) segments mutable?

2018-02-10 Thread Nawab Zada Asad Iqbal
Thanks Erick!

On Sat, Feb 10, 2018 at 11:37 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> "I also see some segment merges before the hard-commit executes, which
> make me think that flush converts the in-memory data-structures into
> Lucene"
>
> That's my understanding. Essentially each flush creates a new segment
> that gets merged sometime.
>
> "How is a flushed-but-not-committed segment different from a committed
> segment?"
>
> In a nutshell, it hasn't been added to the "segments_n" file, which
> contains a list of all of the segments as of the last commit point.
> Segments added for whatever reason since the last hard commit aren't
> added to that file. So say Solr is killed before committing. When it
> restarts it sees the segments_n file that contains the old "picture"
> of the index. If tlogs are around, then Solr replays the documents
> since that point.
>
> Best,
> Erick
>
> On Fri, Feb 9, 2018 at 8:07 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
> wrote:
> > Hi,
> >
> > When a segment is flushed to disk because it is exceeding available
> memory,
> > is it still updated when new documents are added? I also read somewhere
> that
> > a segment is not committed even if it is flushed. How is a
> > flushed-but-not-committed segment different from a committed segment?
> >
> > For example, my hard-commit is scheduled for every 30 seconds, but many
> > segments are flushed during this interval. Are they flushed as in-memory
> > data structures (which will keep them optimal for updates) or are they
> > immutable?
> >
> > I also see some segment merges before the hard-commit executes, which
> make
> > me think that flush converts the in-memory data-structures into Lucene
> > segment.
> >
> > Thanks
> > Nawab
>


Are flushed (but not committed yet) segments mutable?

2018-02-09 Thread Nawab Zada Asad Iqbal
Hi,

When a segment is flushed to disk because it exceeds the available memory,
is it still updated when new documents are added? I also read somewhere that
a segment is not committed even if it is flushed. How is a
flushed-but-not-committed segment different from a committed segment?

For example, my hard commit is scheduled for every 30 seconds, but many
segments are flushed during this interval. Are they flushed as in-memory
data structures (which would keep them efficient to update) or are they
immutable?

I also see some segment merges before the hard commit executes, which makes
me think that a flush converts the in-memory data structures into Lucene
segments.

Thanks
Nawab


Using 'learning to rank' with user specific features

2018-01-23 Thread Nawab Zada Asad Iqbal
Hi,

I am going through the learning-to-rank examples in Solr 7. In the examples,
the features are part of the searched document. Can I use Solr's
learning-to-rank system if my features are user-specific? E.g., if searching
for products, I want to rank some products higher if they have been used by
the current user's friends.

Initially, I was thinking of tracking a list of 'friend products' for each
user so that, after the query, I re-rank the results if any of the resulting
items is also in the 'friend products' list. This list is generated outside
the Solr server. Can I use Solr 7's re-ranking functionality with this list?

One alternative is to extract features (e.g., category, price buckets, etc.)
of the products and save them in the Solr documents, and then also deduce the
features of the user's 'friend products' instead of just keeping the raw
product list. However, in both cases the searching user has their own
specific feature values.
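
One hedged sketch of the re-ranking idea: Solr's learning-to-rank plugin
accepts "external feature information" (efi.*) on the rerank query, so the
externally computed friend-products list can be passed in per request. The
host, collection, field, and model names below are placeholders, and it
assumes an LTR model whose features read ${friend_products} has already been
uploaded:

import requests

SOLR = "http://localhost:8983/solr"

def search_with_friend_boost(user_query, friend_product_ids):
    efi_value = " ".join(friend_product_ids)
    params = {
        "q": user_query,
        "defType": "edismax",
        "qf": "name description",
        # Re-rank the top 100 results with an LTR model; the per-user
        # friend-products list is passed as external feature information.
        "rq": f"{{!ltr model=friendModel reRankDocs=100 "
              f"efi.friend_products='{efi_value}'}}",
        "fl": "id,name,score",
        "wt": "json",
    }
    resp = requests.get(f"{SOLR}/products/select", params=params)
    resp.raise_for_status()
    return resp.json()["response"]["docs"]

print(search_with_friend_boost("camera", ["prod_123", "prod_456"]))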



Thanks
Nawab


Substitute for SmartChineseWordTokenFilterFactory in Solr7

2018-01-23 Thread Nawab Zada Asad Iqbal
Hi,

I used to use SmartChineseSentenceTokenizerFactory and
SmartChineseWordTokenFilterFactory in Solr 4 for analyzing Chinese text. In
Solr 7, I found that SmartChineseSentenceTokenizerFactory has been replaced
by HMMChineseTokenizerFactory, but I cannot locate the Chinese word token
filter. Is that not needed anymore?



Thanks
Nawab


Re: trivia question: why q=*:* doesn't return same result as q.alt=*:*

2018-01-17 Thread Nawab Zada Asad Iqbal
Chris / Hoss

Thanks for the detailed explanation. Erick Erickson's explanation made sense
to me, but it didn't explain why the fields are different for 'hello' vs
'*:*'.

I had never paid much attention to the parser part of query handling and had
so far focused only on the field definitions. I had to re-read parts of this
thread to understand the whole picture.

I had dropped an apparently unnecessary question here, but this thread has
provided a lot of necessary learning.


Thanks
Nawab

On Fri, Jan 12, 2018 at 10:38 AM, Chris Hostetter 
wrote:

>
> : defType=dismax does NOT do anything special with *:* other than treat it
> ...
> : > As Chris explained, this is special:
> ...
>
> I'm interpreting your followup question differently than Erick & Erik
> did. I'm going to assume both E & E misunderstood your question, and I'm
> going to assume you completely understood my response to your original
> question.
>
> I'm going to assume that a way to reword/expand your followup question is
> something like this...
>
> "I understand now that defType=dismax doesn't support special syntax like
> '*:*' and treats that 3 input as just another 3 character string to search
> against the qf & pf fields -- but now what i don't understand is why are
> list of fields in the debug query output is different for 'q=*:*' compared
> to something like 'q=hello'"
>
> (If i have not understood your followup question correctly, please
> clarify)
>
> Let's look at those outputs you mentioned...
>
> : >> http://localhost:8983/solr/filesearch/select?fq=id:1193;
> : >> q=*:*=true
> : >>
> : >>
> : >>   - parsedquery: "+DisjunctionMaxQuery((user_email:*:* |
> user_name:*:* |
> : >>   tags:*:* | (name_shingle_zh-cn:, , name_shingle_zh-cn:, ,) |
> : >> id:*:*)~0.01)
> : >>   DisjunctionMaxQuery(((name_shingle_zh-cn:", , , ,"~100)^100.0 |
> : >>   tags:*:*)~0.01)",
> ...
> : >> e.g. following query uses the my expected set of pf and qf.
> ...
> : >> http://localhost:8983/solr/filesearch/select?fq=id:1193;
> : >> q=hello=true
> : >>
> : >>
> : >>
> : >>   - parsedquery: "+DisjunctionMaxQuery(((name_token:hello)^60.0 |
> : >>   user_email:hello | (name_combined:hello)^10.0 |
> (name_zh-cn:hello)^10.0
> : >> |
> : >>   name_shingle:hello | comments:hello | user_name:hello |
> : >> description:hello |
> : >>   file_content_zh-cn:hello | file_content_de:hello | tags:hello |
> : >>   file_content_it:hell | file_content_fr:hello | file_content_es:hell
> |
> : >>   file_content_en:hello | id:hello)~0.01)
> : >> DisjunctionMaxQuery((description:hello
> : >>   | (name_shingle:hello)^100.0 | comments:hello | tags:hello)~0.01)",
>
>
> The answer has to do with the list of qf & pf fields you have configured
> -- you didn't provide us with concrete specifics of what qf/pf you
> have configured in your requestHandler -- but you did mention in your
> second example that "following query uses the my expected set of pf and
> qf"
>
> By comparing the 2 examples at a glance, It appears that the fields in the
> first example (q=*:* ... again, searching for the literal 3 character
> string '*:*') are (mostly) a subset of the fields you "expected" (from the
> 2nd example)
>
> I'm fairly certain that what's happening here is that in both examples the
> literal string input is being given to the analyzer for all of your fields
> -- but in the case of the (literal) string '*:*' many of the analyzers are
> producing no terms at all -- i.e., they are completely stripping out
> punctuation -- so they don't appear in the final query.
>
> IIUC it looks like one other oddity here is that the reverse also
> seems to be true in some cases -- I suspect that
> although "name_shingle_zh-cn" doesn't appear in your 2nd example, it
> probably *is* in your pf param, but whatever analyzer you have configured
> for it produces no tokens for the latin characters "hello" but does
> produce tokens for the pure-punctuation characters "*:*"
>
>
> (If I'm correct about your question but wrong about your qf/pf, then
> please provide us with a lot more details -- notably your full
> schema/solrconfig used when executing those queries.)
>
>
> -Hoss
> http://www.lucidworks.com/
>


Re: trivia question: why q=*:* doesn't return same result as q.alt=*:*

2018-01-06 Thread Nawab Zada Asad Iqbal
Thanks everyone, that was a very informative thread.

One more curiosity: why is a different set of fields being used depending on
the query string?


http://localhost:8983/solr/filesearch/select?fq=id:1193;
q=*:*=true


   - parsedquery: "+DisjunctionMaxQuery((user_email:*:* | user_name:*:* |
   tags:*:* | (name_shingle_zh-cn:, , name_shingle_zh-cn:, ,) | id:*:*)~0.01)
   DisjunctionMaxQuery(((name_shingle_zh-cn:", , , ,"~100)^100.0 |
   tags:*:*)~0.01)",



I find it perplexing, as the default values for qf and pf are very different
from the above, so I am not sure where these fields are coming from (although
they are all valid fields).
E.g., the following query uses my expected set of pf and qf.

http://localhost:8983/solr/filesearch/select?fq=id:1193;
q=hello=true



   - parsedquery: "+DisjunctionMaxQuery(((name_token:hello)^60.0 |
   user_email:hello | (name_combined:hello)^10.0 | (name_zh-cn:hello)^10.0 |
   name_shingle:hello | comments:hello | user_name:hello | description:hello |
   file_content_zh-cn:hello | file_content_de:hello | tags:hello |
   file_content_it:hell | file_content_fr:hello | file_content_es:hell |
   file_content_en:hello | id:hello)~0.01)
DisjunctionMaxQuery((description:hello
   | (name_shingle:hello)^100.0 | comments:hello | tags:hello)~0.01)",


On Sat, Jan 6, 2018 at 12:05 PM, Chris Hostetter 
wrote:

>
> : Yes, i am using dismax. But dismax allows *:* for q.alt ,which also seems
> : like inconsistency.
>
> dismax is a *parser* that affects how a single query string is parsed.
>
> when you use defType=dismax, that only changes how the "q" param is
> parsed -- not any other query string params, like "fq" or "facet.query"
> (or "q.alt")
>
> when you have a request like "defType=dismax==*:*" what you are
> saying, and what solr is doing, is...
>
> * YOU: hey solr, use dismax as the default parser for the q param
> * SEARCHHANDLER: ok, if the "q" param does not use local params to
> override the parser, i will use dismax
> * SEARCHHANDLER: hey dismax qparser, go parse the string ""
> * DISMAXQP: that string is empty, so instead we should use q.alt
> * SEARCHHANDLER: ok, i will parse the q.alt param and use that query in
> place of the empty q param
> * SEARCHHANDLER: hey lucene qparser, the string "*:*" does not use local
> params to override the parser, please parse it
> * LUCENEQP: the string "*:*" is a MatchAllDocsQuery
> * SEARCHHANDLER: cool, i'll use that as my main query
>
>
>
> -Hoss
> http://www.lucidworks.com/
>


Re: CommonGramsFilter

2018-01-05 Thread Nawab Zada Asad Iqbal
Actually, I have found that it is *not* mandatory to use phrase search with
CommonGramsFilter.

PS: I had some other (unnecessary) code change which was causing the above
behavior.

On Thu, Jan 4, 2018 at 6:56 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
wrote:

> After some debugging, it  seems that the search works if the query is
> phrase search (i.e, enclosed in quotes)
>
> http://localhost:8983/solr/filesearch/select?q=%22not%
> 20to%20or%20be%22=true
>
> This works both in case of sow=true or false.
>
> Is it mandatory to use phrase search to properly pass the stopwords to the
> CommonGramsFilter?
>
>
>
>
>
> On Thu, Jan 4, 2018 at 6:08 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I am looking at this documentation and wondering if it would be better to
>> optionally skip indexing of original stopwords.
>>
>> https://lucene.apache.org/solr/guide/6_6/filter-descriptions
>> .html#FilterDescriptions-CommonGramsFilter
>>
>> http://localhost:8983/solr/filesearch/select?q=not%20to%20or
>> %20be=true
>>
>>
>>- parsedquery: "+(-DisjunctionMaxQuery((commongram_field2:to)~0.01)
>>DisjunctionMaxQuery((commongram_field2:be)~0.01))~1",
>>
>>
>>
>> Other parameters are:
>>
>>
>>- params: {
>>   - mm: " 1<-0% ",
>>   - q.alt: "*:*",
>>   - ps: "100",
>>   - echoParams: "all",
>>   - sort: "score desc",
>>   - rows: "35",
>>   - version: "2.2",
>>   - q: "not to or be",
>>   - tie: "0.01",
>>   - defType: "edismax",
>>   - qf: "commongram_field2",
>>   - sow: "false",
>>   - wt: "json",
>>   - debugQuery: "true"
>>   }
>>
>>
>> And it doesn't match my document, which has following fields:
>>
>>
>>- id: "9191",
>>- commongram_field2: "not to or be",
>>
>>
>>
>> Commongram is defined as:
>>
>> > stored="true" omitPositions="false"/>
>>
>> > positionIncrementGap="100">
>>   
>> 
>> 
>> > generateWordParts="1" generateNumberParts="1" catenateWords="0"
>> catenateNumbers="0" catenateAll="0" preserveOriginal="0"
>> splitOnCaseChange="1" splitOnNumerics="1" stemEnglishPossessive="0"/>
>> 
>> > pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
>> 
>> 
>> > words="stopwords.txt" ignoreCase="true"/>
>> 
>> > maxTokenCount="1" consumeAllTokens="false"/>
>> 
>>   
>>   
>> 
>> 
>> > generateWordParts="1" generateNumberParts="1" catenateWords="0"
>> catenateNumbers="0" catenateAll="0" preserveOriginal="0"
>> splitOnCaseChange="1" splitOnNumerics="1" stemEnglishPossessive="0"/>
>> > pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
>> 
>> 
>> > words="stopwords.txt" ignoreCase="true"/>
>> 
>>   
>> 
>>
>>
>> I am not sure what I am missing. I have also set sow=false so that the
>> whole query string is sent to field's analysis chain instead of sending
>> word by word. But that didn't seem to help.
>>
>> Thanks
>> Nawab
>>
>
>


Re: trivia question: why q=*:* doesn't return same result as q.alt=*:*

2018-01-05 Thread Nawab Zada Asad Iqbal
Hi Erik Hatcher,

Yes, I am using dismax. But dismax allows *:* for q.alt, which also seems
like an inconsistency.

On Thu, Jan 4, 2018 at 5:53 PM, Erik Hatcher <erik.hatc...@gmail.com> wrote:

> defType=???  Probably dismax.  It doesn’t do *:* like edismax or lucene.
>
> > On Jan 4, 2018, at 20:39, Nawab Zada Asad Iqbal <khi...@gmail.com>
> wrote:
> >
> > Thanks Erik
> > Here is the output,
> >
> > http://localhost:8983/solr/filesearch/select?fq=id:1193;
> q.alt=*:*=true
> >
> >
> >   - parsedquery: "+MatchAllDocsQuery(*:*)",
> >
> >
> >
> > http://localhost:8983/solr/filesearch/select?fq=id:1193;
> q=*:*=true
> >
> >
> >   - parsedquery: "+DisjunctionMaxQuery((user_email:*:* | user_name:*:* |
> >   tags:*:* | (name_shingle_zh-cn:, , name_shingle_zh-cn:, ,) |
> id:*:*)~0.01)
> >   DisjunctionMaxQuery(((name_shingle_zh-cn:", , , ,"~100)^100.0 |
> >   tags:*:*)~0.01)",
> >
> >
> >
> > I find it perplexing as the default values for qf and pf are very
> different
> > from above so I am not sure where these fields are coming from (although
> > they are all valid fields)
> > e.g. following query uses the my expected set of pf and qf.
> >
> > http://localhost:8983/solr/filesearch/select?fq=id:1193;
> q=hello=true
> >
> >
> >
> >   - parsedquery: "+DisjunctionMaxQuery(((name_token:hello)^60.0 |
> >   user_email:hello | (name_combined:hello)^10.0 |
> (name_zh-cn:hello)^10.0 |
> >   name_shingle:hello | comments:hello | user_name:hello |
> description:hello |
> >   file_content_zh-cn:hello | file_content_de:hello | tags:hello |
> >   file_content_it:hell | file_content_fr:hello | file_content_es:hell |
> >   file_content_en:hello | id:hello)~0.01)
> >   DisjunctionMaxQuery((description:hello | (name_shingle:hello)^100.0 |
> >   comments:hello | tags:hello)~0.01)",
> >
> >
> >
> >
> >
> > On Thu, Jan 4, 2018 at 5:22 PM, Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> >> Hmm, seems odd. What happens when you attach =query? I'm curious
> how
> >> the parsed queries differ.
> >>
> >>> On Jan 4, 2018 15:14, "Nawab Zada Asad Iqbal" <khi...@gmail.com>
> wrote:
> >>>
> >>> Hi,
> >>>
> >>> In my SearchHandler solrconfig, i have q.alt=*:* . This allows me to
> run
> >>> queries which only have `fq` filters and no `q`.
> >>>
> >>> If I remove q.alt from the solrconfig and specify `q=*:*` in the query
> >>> parameters, it does not give any results. I also tried `q=*` but of no
> >>> avail.
> >>>
> >>> Is there some good reason for this behavior? Since I already know a
> work
> >>> around, this question is only for my curiosity.
> >>>
> >>>
> >>> Thanks
> >>> Nawab
> >>>
> >>
>


Re: CommonGramsFilter

2018-01-04 Thread Nawab Zada Asad Iqbal
After some debugging, it seems that the search works if the query is a
phrase search (i.e., enclosed in quotes):

http://localhost:8983/solr/filesearch/select?q=%22not%20to%20or%20be%22=true

This works in both the sow=true and sow=false cases.

Is it mandatory to use phrase search to properly pass the stopwords to the
CommonGramsFilter?





On Thu, Jan 4, 2018 at 6:08 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
wrote:

> Hi,
>
> I am looking at this documentation and wondering if it would be better to
> optionally skip indexing of original stopwords.
>
> https://lucene.apache.org/solr/guide/6_6/filter-descriptions
> .html#FilterDescriptions-CommonGramsFilter
>
> http://localhost:8983/solr/filesearch/select?q=not%20to%
> 20or%20be=true
>
>
>- parsedquery: "+(-DisjunctionMaxQuery((commongram_field2:to)~0.01)
>DisjunctionMaxQuery((commongram_field2:be)~0.01))~1",
>
>
>
> Other parameters are:
>
>
>- params: {
>   - mm: " 1<-0% ",
>   - q.alt: "*:*",
>   - ps: "100",
>   - echoParams: "all",
>   - sort: "score desc",
>   - rows: "35",
>   - version: "2.2",
>   - q: "not to or be",
>   - tie: "0.01",
>   - defType: "edismax",
>   - qf: "commongram_field2",
>   - sow: "false",
>   - wt: "json",
>   - debugQuery: "true"
>   }
>
>
> And it doesn't match my document, which has following fields:
>
>
>- id: "9191",
>- commongram_field2: "not to or be",
>
>
>
> Commongram is defined as:
>
>  stored="true" omitPositions="false"/>
>
>  positionIncrementGap="100">
>   
> 
> 
>  generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" preserveOriginal="0"
> splitOnCaseChange="1" splitOnNumerics="1" stemEnglishPossessive="0"/>
> 
>  pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
> 
> 
>  words="stopwords.txt" ignoreCase="true"/>
> 
>  maxTokenCount="1" consumeAllTokens="false"/>
> 
>   
>   
> 
> 
>  generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" preserveOriginal="0"
> splitOnCaseChange="1" splitOnNumerics="1" stemEnglishPossessive="0"/>
>  pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
> 
> 
>  words="stopwords.txt" ignoreCase="true"/>
> 
>   
> 
>
>
> I am not sure what I am missing. I have also set sow=false so that the
> whole query string is sent to the field's analysis chain instead of being
> sent word by word. But that didn't seem to help.
>
> Thanks
> Nawab
>


CommonGramsFilter

2018-01-04 Thread Nawab Zada Asad Iqbal
Hi,

I am looking at this documentation and wondering if it would be better to
optionally skip indexing of original stopwords.

https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html#
FilterDescriptions-CommonGramsFilter

http://localhost:8983/solr/filesearch/select?q=not%20to%20or%20be=true


   - parsedquery: "+(-DisjunctionMaxQuery((commongram_field2:to)~0.01)
   DisjunctionMaxQuery((commongram_field2:be)~0.01))~1",



Other parameters are:


   - params: {
  - mm: " 1<-0% ",
  - q.alt: "*:*",
  - ps: "100",
  - echoParams: "all",
  - sort: "score desc",
  - rows: "35",
  - version: "2.2",
  - q: "not to or be",
  - tie: "0.01",
  - defType: "edismax",
  - qf: "commongram_field2",
  - sow: "false",
  - wt: "json",
  - debugQuery: "true"
  }


And it doesn't match my document, which has following fields:


   - id: "9191",
   - commongram_field2: "not to or be",



Commongram is defined as:

[the field and fieldType XML was stripped by the mail archive; a partial copy
survives in the quoted replies above]

I am not sure what I am missing. I have also set sow=false so that the whole
query string is sent to the field's analysis chain instead of being sent word
by word. But that didn't seem to help.

Thanks
Nawab


Re: trivia question: why q=*:* doesn't return same result as q.alt=*:*

2018-01-04 Thread Nawab Zada Asad Iqbal
Thanks Erik
Here is the output,

http://localhost:8983/solr/filesearch/select?fq=id:1193=*:*=true


   - parsedquery: "+MatchAllDocsQuery(*:*)",



http://localhost:8983/solr/filesearch/select?fq=id:1193=*:*=true


   - parsedquery: "+DisjunctionMaxQuery((user_email:*:* | user_name:*:* |
   tags:*:* | (name_shingle_zh-cn:, , name_shingle_zh-cn:, ,) | id:*:*)~0.01)
   DisjunctionMaxQuery(((name_shingle_zh-cn:", , , ,"~100)^100.0 |
   tags:*:*)~0.01)",



I find it perplexing, as the default values for qf and pf are very different
from the above, so I am not sure where these fields are coming from (although
they are all valid fields).
E.g., the following query uses my expected set of pf and qf.

http://localhost:8983/solr/filesearch/select?fq=id:1193=hello=true



   - parsedquery: "+DisjunctionMaxQuery(((name_token:hello)^60.0 |
   user_email:hello | (name_combined:hello)^10.0 | (name_zh-cn:hello)^10.0 |
   name_shingle:hello | comments:hello | user_name:hello | description:hello |
   file_content_zh-cn:hello | file_content_de:hello | tags:hello |
   file_content_it:hell | file_content_fr:hello | file_content_es:hell |
   file_content_en:hello | id:hello)~0.01)
   DisjunctionMaxQuery((description:hello | (name_shingle:hello)^100.0 |
   comments:hello | tags:hello)~0.01)",





On Thu, Jan 4, 2018 at 5:22 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> Hmm, seems odd. What happens when you attach =query? I'm curious how
> the parsed queries differ.
>
> On Jan 4, 2018 15:14, "Nawab Zada Asad Iqbal" <khi...@gmail.com> wrote:
>
> > Hi,
> >
> > In my SearchHandler solrconfig, i have q.alt=*:* . This allows me to run
> > queries which only have `fq` filters and no `q`.
> >
> > If I remove q.alt from the solrconfig and specify `q=*:*` in the query
> > parameters, it does not give any results. I also tried `q=*` but of no
> > avail.
> >
> > Is there some good reason for this behavior? Since I already know a work
> > around, this question is only for my curiosity.
> >
> >
> > Thanks
> > Nawab
> >
>


trivia question: why q=*:* doesn't return same result as q.alt=*:*

2018-01-04 Thread Nawab Zada Asad Iqbal
Hi,

In my SearchHandler solrconfig, I have q.alt=*:*. This allows me to run
queries which only have `fq` filters and no `q`.

If I remove q.alt from the solrconfig and specify `q=*:*` in the query
parameters, it does not give any results. I also tried `q=*`, but to no
avail.

Is there some good reason for this behavior? Since I already know a
workaround, this question is only for my curiosity.


Thanks
Nawab


Re: Small Tokenization issue

2018-01-03 Thread Nawab Zada Asad Iqbal
Thanks Emir, Erick.

What I want to do is remove empty tokens after WordDelimiterGraphFilter. Is
there any option in WordDelimiterGraphFilter to not generate empty tokens?

This index field is intended to be used for strange strings, e.g. part
numbers such as P/N HSC0424PP.
The benefit of removing the empty tokens is that if someone unintentionally
puts a space around the '/' (in the above example), this field is still able
to match.

In the previous Solr version, ShingleFilter used to work fine in the case of
empty positions and would make shingles across the empty space. Although it
is possible that I have learned to rely on a bug.
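
Emir's suggested char filter (quoted below) boils down to collapsing any
whitespace around a hyphen before tokenization; a quick, hedged way to
sanity-check that pattern outside Solr:

import re

for s in ("abc - def", "abc- def", "abc-def", "no hyphen here"):
    print(repr(s), "->", repr(re.sub(r"\s*-\s*", " ", s)))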






On Wed, Jan 3, 2018 at 12:23 PM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi Nawab,
> The reason why you do not get the shingle is that there is an empty token:
> after the tokenizer you have 3 tokens, ‘abc’, ‘-’ and ‘def’, so the tokens
> that you are interested in are not next to each other and cannot form a
> shingle.
> What you can do is apply a char filter before tokenization to remove ‘-’,
> something like:
>
>   pattern=“\s*-\s*” replacement=“ ”/>
>
> Regards,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 3 Jan 2018, at 21:04, Nawab Zada Asad Iqbal <khi...@gmail.com> wrote:
> >
> > Hi,
> >
> > So, I have a string for indexing:
> >
> > abc - def (notice the space on either side of hyphen)
> >
> > which is being processed with this filter-list:-
> >
> >
> > > positionIncrementGap="100">
> >  
> > > class="org.apache.lucene.analysis.icu.ICUNormalizer2CharFilterFactory"
> > name="nfkc" mode="compose"/>
> >
> > > generateWordParts="1" generateNumberParts="1" catenateWords="0"
> > catenateNumbers="0" catenateAll="0" preserveOriginal="0"
> > splitOnCaseChange="1" splitOnNumerics="1" stemEnglishPossessive="0"/>
> >
> > > pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
> >
> >
> > > outputUnigrams="false" fillerToken=""/>
> >
> > > maxTokenCount="1" consumeAllTokens="false"/>
> >
> >  
> >
> >
> > I get two shingle tokens at the end:
> >
> > "abc" "def"
> >
> > I want to get "abc def" . What can I tweak to get this?
> >
> >
> > Thanks
> > Nawab
>
>


Small Tokenization issue

2018-01-03 Thread Nawab Zada Asad Iqbal
Hi,

So, I have a string for indexing:

abc - def (notice the space on either side of hyphen)

which is being processed with this filter list:

[the analyzer XML was stripped by the mail archive; a partial copy survives
in the quoted reply above]

I get two shingle tokens at the end:

"abc" "def"

I want to get "abc def" . What can I tweak to get this?


Thanks
Nawab


Re: fq: OR operator (sometimes) not working

2017-12-27 Thread Nawab Zada Asad Iqbal
Thanks Erick for pushing me in the right direction.

So, sow=false -- but I think that is the default behavior, so I didn't expect
it to cause any strange outcome. However, the reason folder_id is being
treated differently than the others is the schema definition: folder_id is a
long, while file_type is defined as a keyword field (the keyword tokenizer
doesn't seem to split on spaces).
So I guess my solution is to:
1) set sow=true at query time, or
2) write the query differently for file_type, or
3) modify the definition of the file_type field in the schema.

I guess (2) is a safe option, as sketched below.
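
A hedged sketch of option (2), sending the filter with an explicit OR and
letting the client handle URL encoding (host, collection, and ids are
placeholders taken from the examples in this thread):

import requests

SOLR = "http://localhost:8983/solr"

params = [
    ("q", "*:*"),                                     # placeholder main query
    ("fq", "id:file_258470818866"),
    ("fq", "{!q.op=OR}folder_id:(23329074268 12033480380) user_id:(642129292)"),
    ("fq", "file_type:(jpg OR jpeg)"),   # explicit OR, so q.op/sow no longer matter
    ("wt", "json"),
    ("debug", "query"),
]
resp = requests.get(f"{SOLR}/filesearch/select", params=params)
resp.raise_for_status()
print(resp.json()["debug"]["parsed_filter_queries"])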

Thanks
Nawab

On Wed, Dec 27, 2017 at 5:17 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> OK, that's definitely weird. A separate fq clause like
> fq={!q.op=OR}file_type:(jpg jpeg)
>
> should _not_ parse into:
> file_type:jpg jpeg
>
> Hmmm, any possibility that Split On Whitespace is somehow being set
> (SOW) to false? Why in the world it would only show up like this is a
> mystery, just askin'.
>
> It's probably worth building it up a bit and playing around with
> reordering the fq clauses just to see if it's some weird interaction
> there. That's not a cure, but data to add to an (eventual I'd expect)
> JIRA.
>
> For instance, if you move the folder_id part after the file_type, is
> it different? If you remove that bit all together, does the problem
> still persist? What about the user_id part of the middle clause? Does
> removing that make a difference?
>
> If you do raise a JIRA, you need to include:
> 1> the raw query. Please don't edit at all (security policies allowing).
> 2> the debug=query output (full)
> 3> your request handler from solrconfig.xml
> 4> your field definitions and associated types.
>
> Because it worked for me just fine in the simple case, so some
> non-obvious combination of things is causing this. Since neither of us
> know _what_, include everything ;)..
>
> Best,
> Erick
>
> On Wed, Dec 27, 2017 at 4:45 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
> wrote:
> > Thanks Erik. Yes some similar queries are also working for me.
> >
> > "file_type:(jpg%20OR%20jpeg)" and "{!q.op=OR}file_type:(jpg OR jpeg)" are
> > translated into  the following which is correct.
> >
> >- "file_type:jpg file_type:jpeg"
> >
> > While "{!q.op=OR}file_type:(jpg jpeg)" is translated into file_type:jpg
> > jpeg
> >
> >
> > Here is the complete list of my filter queries. You can see that the
> second
> > query is translated very differently from the third. Though i am not sure
> > if the second query is also correctly parsed or not.
> >
> >
> >
> >- filter_queries: [
> >   - "id:file_258470818866",
> >   - "{!q.op=OR}folder_id:(23329074268 12033480380 36928119693
> >   25894325891 25982100517 25895234569 25894295930 39367823449
> 40634891514
> >   41056556633 42045264481 41307354636 14370419636 14370432839
> 24723808252
> >   24723839431) user_id:(642129292)",
> >   - "{!q.op=OR}file_type:(jpg jpeg)"
> >   ],
> >- parsed_filter_queries: [
> >   - "id:file_258470818866",
> >   - "(IndexOrDocValuesQuery(folder_id:[23329074268 TO 23329074268])
> >   IndexOrDocValuesQuery(folder_id:[12033480380 TO 12033480380])
> >   IndexOrDocValuesQuery(folder_id:[36928119693 TO 36928119693])
> >   IndexOrDocValuesQuery(folder_id:[25894325891 TO 25894325891])
> >   IndexOrDocValuesQuery(folder_id:[25982100517 TO 25982100517])
> >   IndexOrDocValuesQuery(folder_id:[25895234569 TO 25895234569])
> >   IndexOrDocValuesQuery(folder_id:[25894295930 TO 25894295930])
> >   IndexOrDocValuesQuery(folder_id:[39367823449 TO 39367823449])
> >   IndexOrDocValuesQuery(folder_id:[40634891514 TO 40634891514])
> >   IndexOrDocValuesQuery(folder_id:[41056556633 TO 41056556633])
> >   IndexOrDocValuesQuery(folder_id:[42045264481 TO 42045264481])
> >   IndexOrDocValuesQuery(folder_id:[41307354636 TO 41307354636])
> >   IndexOrDocValuesQuery(folder_id:[14370419636 TO 14370419636])
> >   IndexOrDocValuesQuery(folder_id:[14370432839 TO 14370432839])
> >   IndexOrDocValuesQuery(folder_id:[24723808252 TO 24723808252])
> >   IndexOrDocValuesQuery(folder_id:[24723839431 TO 24723839431]))
> >   IndexOrDocValuesQuery(user_id:[642129292 TO 642129292])",
> >   - "file_type:jpg jpeg"
> >   ]
> >
> >
> >
> >
> >
> > On Wed, Dec 27, 2017 at 4:27 PM, Erick Erickson <erickerick...@gmail.com
> >
> &g

Re: fq: OR operator (sometimes) not working

2017-12-27 Thread Nawab Zada Asad Iqbal
Thanks Erick. Yes, some similar queries are also working for me.

"file_type:(jpg%20OR%20jpeg)" and "{!q.op=OR}file_type:(jpg OR jpeg)" are
translated into the following, which is correct:

   - "file_type:jpg file_type:jpeg"

While "{!q.op=OR}file_type:(jpg jpeg)" is translated into file_type:jpg
jpeg


Here is the complete list of my filter queries. You can see that the second
query is translated very differently from the third, though I am not sure
whether the second query is also correctly parsed.



   - filter_queries: [
  - "id:file_258470818866",
  - "{!q.op=OR}folder_id:(23329074268 12033480380 36928119693
  25894325891 25982100517 25895234569 25894295930 39367823449 40634891514
  41056556633 42045264481 41307354636 14370419636 14370432839 24723808252
  24723839431) user_id:(642129292)",
  - "{!q.op=OR}file_type:(jpg jpeg)"
  ],
   - parsed_filter_queries: [
  - "id:file_258470818866",
  - "(IndexOrDocValuesQuery(folder_id:[23329074268 TO 23329074268])
  IndexOrDocValuesQuery(folder_id:[12033480380 TO 12033480380])
  IndexOrDocValuesQuery(folder_id:[36928119693 TO 36928119693])
  IndexOrDocValuesQuery(folder_id:[25894325891 TO 25894325891])
  IndexOrDocValuesQuery(folder_id:[25982100517 TO 25982100517])
  IndexOrDocValuesQuery(folder_id:[25895234569 TO 25895234569])
  IndexOrDocValuesQuery(folder_id:[25894295930 TO 25894295930])
  IndexOrDocValuesQuery(folder_id:[39367823449 TO 39367823449])
  IndexOrDocValuesQuery(folder_id:[40634891514 TO 40634891514])
  IndexOrDocValuesQuery(folder_id:[41056556633 TO 41056556633])
  IndexOrDocValuesQuery(folder_id:[42045264481 TO 42045264481])
  IndexOrDocValuesQuery(folder_id:[41307354636 TO 41307354636])
  IndexOrDocValuesQuery(folder_id:[14370419636 TO 14370419636])
  IndexOrDocValuesQuery(folder_id:[14370432839 TO 14370432839])
  IndexOrDocValuesQuery(folder_id:[24723808252 TO 24723808252])
  IndexOrDocValuesQuery(folder_id:[24723839431 TO 24723839431]))
  IndexOrDocValuesQuery(user_id:[642129292 TO 642129292])",
  - "file_type:jpg jpeg"
  ]





On Wed, Dec 27, 2017 at 4:27 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> 1> similar queries work for me just fine with the techproducts exapmle
> 2> that's not what I wanted, you just reiterated the _input_.
> I asked for the results when adding =query to the string so you
> can see the parsed query.
> You should see something similar to:
>
> "parsed_filter_queries":["file_type:jpg file_type:jpeg"]}
>
> Best,
> Erick
>
> On Wed, Dec 27, 2017 at 3:59 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
> wrote:
> > 1. input: fq={!q.op=OR}file_type:(jpg%20jpeg)  (fails, no results)
> >
> >- fq: [
> >   - "id:file_258470818866",
> >   - "{!q.op=OR}file_type:(jpg jpeg)"
> >   ],
> >
> >
> >
> >
> > 2. input: fq={!q.op=OR}file_type:(jpg%20OR%20jpeg) (This works)
> >
> >
> >- fq: [
> >   - "id:file_258470818866",
> >   - "{!q.op=OR}file_type:(jpg OR jpeg)"
> >   ],
> >
> >
> > 3. input: =file_type:(jpg%20OR%20jpeg) (This also works)
> >
> >
> >- fq: [
> >   - "id:file_258470818866",
> >   - "file_type:(jpg OR jpeg)"
> >   ],
> >
> >
> >
> > PS: I am using 7.0.0 (including almost all the updates from 7.0.1).
> >
> > Regards
> > Nawab
> > On Wed, Dec 27, 2017 at 3:54 PM, Erick Erickson <erickerick...@gmail.com
> >
> > wrote:
> >
> >> What does adding =query show in the two cases?
> >>
> >> Best,
> >> Erick
> >>
> >> On Wed, Dec 27, 2017 at 3:40 PM, Nawab Zada Asad Iqbal <
> khi...@gmail.com>
> >> wrote:
> >> > Hi,
> >> >
> >> > Are the following two queries equal:
> >> >
> >> > In my understanding, I can specify the arguments the operator once in
> the
> >> > {} local parameter syntax (example 1) or I can interleave OR between
> >> > different clauses  (example 2). But I am getting my result in the
> second
> >> > case only. What am I doing wrong?
> >> >
> >> > This was working fine in Solr 4 but not in Solr 7.
> >> >
> >> >
> >> > 1:
> >> > .../solr/filesearch/select?fq=id:258470818866={!q.op=OR}
> >> file_type:(jpg%20jpeg)
> >> > --> Returns nothing.
> >> >
> >> >
> >> > 2:
> >> > .../solr/filesearch/select?fq=id:258470818866={!q.op=OR}
> >> file_type:(jpg%20OR%20jpeg)
> >> > --> This returns the required document.
> >> >
> >> >
> >> >
> >> > Thanks
> >> > Nawab
> >>
>


Re: fq: OR operator (sometimes) not working

2017-12-27 Thread Nawab Zada Asad Iqbal
1. input: fq={!q.op=OR}file_type:(jpg%20jpeg)  (fails, no results)

   - fq: [
  - "id:file_258470818866",
  - "{!q.op=OR}file_type:(jpg jpeg)"
  ],




2. input: fq={!q.op=OR}file_type:(jpg%20OR%20jpeg) (This works)


   - fq: [
  - "id:file_258470818866",
  - "{!q.op=OR}file_type:(jpg OR jpeg)"
  ],


3. input: =file_type:(jpg%20OR%20jpeg) (This also works)


   - fq: [
  - "id:file_258470818866",
  - "file_type:(jpg OR jpeg)"
  ],



PS: I am using 7.0.0 (including almost all the updates from 7.0.1).

Regards
Nawab
On Wed, Dec 27, 2017 at 3:54 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> What does adding =query show in the two cases?
>
> Best,
> Erick
>
> On Wed, Dec 27, 2017 at 3:40 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
> wrote:
> > Hi,
> >
> > Are the following two queries equal:
> >
> > In my understanding, I can specify the arguments the operator once in the
> > {} local parameter syntax (example 1) or I can interleave OR between
> > different clauses  (example 2). But I am getting my result in the second
> > case only. What am I doing wrong?
> >
> > This was working fine in Solr 4 but not in Solr 7.
> >
> >
> > 1:
> > .../solr/filesearch/select?fq=id:258470818866={!q.op=OR}
> file_type:(jpg%20jpeg)
> > --> Returns nothing.
> >
> >
> > 2:
> > .../solr/filesearch/select?fq=id:258470818866={!q.op=OR}
> file_type:(jpg%20OR%20jpeg)
> > --> This returns the required document.
> >
> >
> >
> > Thanks
> > Nawab
>


fq: OR operator (sometimes) not working

2017-12-27 Thread Nawab Zada Asad Iqbal
Hi,

Are the following two queries equivalent?

In my understanding, I can specify the operator for all the clauses once in
the {} local parameter syntax (example 1), or I can interleave OR between the
different clauses (example 2). But I am getting my result in the second case
only. What am I doing wrong?

This was working fine in Solr 4 but not in Solr 7.


1:
.../solr/filesearch/select?fq=id:258470818866={!q.op=OR}file_type:(jpg%20jpeg)
--> Returns nothing.


2:
.../solr/filesearch/select?fq=id:258470818866={!q.op=OR}file_type:(jpg%20OR%20jpeg)
--> This returns the required document.



Thanks
Nawab


Re: Do i need to reindex after changing similarity setting

2017-11-30 Thread Nawab Zada Asad Iqbal
This JIRA also sheds some light. There is a discussion of encoding norms
during indexing, and the contributor eventually comments that the "norms"
encoded by different similarities are compatible with each other.

On Thu, Nov 30, 2017 at 5:12 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
wrote:

> Hi Walter,
>
> I read the following line in reference docs, what does it mean by as long
> as the global similarity allows it:
>
> "
>
> A field type may optionally specify a <similarity> that will be used
> when scoring documents that refer to fields with this type, as long as the
> "global" similarity for the collection allows it.
> "
>
> On Wed, Nov 22, 2017 at 9:11 AM, Nawab Zada Asad Iqbal <khi...@gmail.com>
> wrote:
>
>> Thanks Walter
>>
>> On Mon, Nov 20, 2017 at 4:59 PM Walter Underwood <wun...@wunderwood.org>
>> wrote:
>>
>>> Similarity is query time.
>>>
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>>
>>>
>>> > On Nov 20, 2017, at 4:57 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
>>> wrote:
>>> >
>>> > Hi,
>>> >
>>> > I want to switch to Classic similarity instead of BM25 (default in
>>> solr7).
>>> > Do I need to reindex all cores after this? Or is it only a query time
>>> > setting?
>>> >
>>> >
>>> > Thanks
>>> > Nawab
>>>
>>>
>


Re: Do i need to reindex after changing similarity setting

2017-11-30 Thread Nawab Zada Asad Iqbal
Hi Walter,

I read the following line in the reference docs; what does it mean by "as long
as the 'global' similarity for the collection allows it"?

"

A field type may optionally specify a <similarity> that will be used when
scoring documents that refer to fields with this type, as long as the
"global" similarity for the collection allows it.
"

On Wed, Nov 22, 2017 at 9:11 AM, Nawab Zada Asad Iqbal <khi...@gmail.com>
wrote:

> Thanks Walter
>
> On Mon, Nov 20, 2017 at 4:59 PM Walter Underwood <wun...@wunderwood.org>
> wrote:
>
>> Similarity is query time.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>>
>>
>> > On Nov 20, 2017, at 4:57 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
>> wrote:
>> >
>> > Hi,
>> >
>> > I want to switch to Classic similarity instead of BM25 (default in
>> solr7).
>> > Do I need to reindex all cores after this? Or is it only a query time
>> > setting?
>> >
>> >
>> > Thanks
>> > Nawab
>>
>>


Re: Do i need to reindex after changing similarity setting

2017-11-22 Thread Nawab Zada Asad Iqbal
Thanks Walter

On Mon, Nov 20, 2017 at 4:59 PM Walter Underwood <wun...@wunderwood.org>
wrote:

> Similarity is query time.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Nov 20, 2017, at 4:57 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
> wrote:
> >
> > Hi,
> >
> > I want to switch to Classic similarity instead of BM25 (default in
> solr7).
> > Do I need to reindex all cores after this? Or is it only a query time
> > setting?
> >
> >
> > Thanks
> > Nawab
>
>


Re: Solr7: Very High number of threads on aggregator node

2017-11-22 Thread Nawab Zada Asad Iqbal
Rick

Your suspicion is correct. I mostly reused my config from solr4, except
where it was deprecated or obsolete, in which case I switched to the newer
configs. Having said that, I couldn't find any new query-related settings
which could impact us, since most of our queries don't use fancy new features.

I couldn't find a decent way to copy long xml here, so I created this
stackoverflow thread:

https://stackoverflow.com/questions/47439503/solr-7-0-1-aggregator-node-spinning-many-threads


Thanks!
Nawab


On Mon, Nov 20, 2017 at 3:10 PM, Rick Leir  wrote:

> Nawab
> Why it would be good to share the solrconfigs: I had a suspicion that you
> might be using the same solrconfig for version 7 and 4.5. That is unlikely
> to work well. But I could be way off base.
> Rick
> --
> Sorry for being brief. Alternate email is rickleir at yahoo dot com
>


Do i need to reindex after changing similarity setting

2017-11-20 Thread Nawab Zada Asad Iqbal
Hi,

I want to switch to Classic similarity instead of BM25 (default in solr7).
Do I need to reindex all cores after this? Or is it only a query time
setting?


Thanks
Nawab


Re: Solr7: Very High number of threads on aggregator node

2017-11-20 Thread Nawab Zada Asad Iqbal
@rick
I see many indexing configs, but I don't see any config related to query
handling (i.e., number of threads etc.) in solrconfig. What would be the
relevant part for this area? In Jetty, the threadpool max is set to 10000.

@Toke:
I have a webserver which uses Solr for querying, which I guess is pretty
typical. At times there are 50 users sending queries in a given second.
Sometimes the queries take a few seconds to finish (i.e., if the max across
all shards is 5 seconds due to any local reason, even if the median is
sub-second, the aggregator query will take 5 seconds). This can cause some
query load to build up on the aggregator node. That is all fine and
understandable. Now, the load and the test client are identical for both
solr4.5 and solr7, so what can be causing the solr7 aggregator to spin up more
threads? I also agree that 4000 threads is not useful, so the solution is not
to increase the thread limit for the process; rather, it is somewhere else.



Thanks
Nawab




On Sat, Nov 18, 2017 at 10:22 AM, Rick Leir <rl...@leirtech.com> wrote:

> Nawab
> You probably need to share the relevant config to get an answer to this.
> Cheers -- Rick
>
> On November 17, 2017 2:19:03 PM EST, Nawab Zada Asad Iqbal <
> khi...@gmail.com> wrote:
> >Hi,
> >
> >I have a sharded solr7 cluster and I am using an aggregator node (which
> >has
> >no data/index of its own) to distribute queries and aggregate results
> >from
> >the shards. I am puzzled that when I use solr7 on the aggregator node,
> >then
> >number of threads shoots up to 32000 on that host and then the process
> >reaches its memory limits. However, when i use solr4 on the aggregator,
> >then it all seems to work fine. The peak number of threads during my
> >testing were around 4000 or so. The test load is same in both cases,
> >except
> >that it doesn't finish in case of solr7 (due to the memory / thread
> >issue).
> >The memory settings and Jetty threadpool setting (max=10000) are also
> >consistent in both servers (solr 4 and solr 7).
> >
> >
> >Has anyone else been in similar circumstances?
> >
> >
> >Thanks
> >Nawab
>
> --
> Sorry for being brief. Alternate email is rickleir at yahoo dot com


Solr7: Very High number of threads on aggregator node

2017-11-17 Thread Nawab Zada Asad Iqbal
Hi,

I have a sharded solr7 cluster and I am using an aggregator node (which has
no data/index of its own) to distribute queries and aggregate results from
the shards. I am puzzled that when I use solr7 on the aggregator node, the
number of threads shoots up to 32000 on that host and the process then
reaches its memory limits. However, when I use solr4 on the aggregator,
it all seems to work fine. The peak number of threads during my
testing was around 4000 or so. The test load is the same in both cases, except
that it doesn't finish in the case of solr7 (due to the memory / thread
issue).
The memory settings and the Jetty threadpool setting (max=10000) are also
consistent in both servers (solr 4 and solr 7).


Has anyone else been in similar circumstances?


Thanks
Nawab


Re: Solr7: Bad query throughput around commit time

2017-11-12 Thread Nawab Zada Asad Iqbal
Thanks Erick. Yes, I see the 'sawtooth' pattern. I will try your suggestion,
but I am wondering why the queries were performant with solr4 without
DocValues. Have some defaults changed?
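
(Concretely, the change I understand from your suggestion is along these lines
in schema.xml, for every field we sort/facet/group on -- the field name is
only illustrative:)

<field name="category" type="string" indexed="true" stored="true" docValues="true"/>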

---



On Sat, Nov 11, 2017 at 8:28 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> Nawab:
>
> bq: Cache hit ratios are all in 80+% (even when i decreased the
> filterCache to 128)
>
> This suggests that you use a relatively small handful of fq clauses,
> which is perfectly fine. Having 450M docs and a cache size of 1024 is
> _really_ scary! You had a potential for a 57G (yes, gigabyte)
> filterCache. Fortunately you apparently don't use enough different fq
> clauses to fill it up, or they match very few documents. I cheated a
> little, if the result set is small the individual doc IDs are stored
> rather than a bitset 450M bits wide Your
> admin>>core>>plugins/stats>>filterCache should show you how many
> evictions there are which is another interesting stat.
>
> As it is, you're filterCache might use up 7G or so. Hefty but you have
> lots of RAM.
>
> *
> bq:  Document cache hitratio is really bad,
>
> This is often the case. Getting documents really means, here, getting
> the _stored_ values. The point of the documentCache is to keep entries
> in a cache for the various elements of a single request to use. To
> name just 2
> > you get the stored values for the "fl" list
> > you highlight.
>
> These are separate, and each accesses the stored values. Problem is,
> "accessing the stored values" means
> 1> reading the document from disk
> 2> decompressing a 16K block minimum.
>
> I'm skipping the fact that returning docValues doesn't need the stored
> data, but you get the idea.
>
> Anyway, not having to read/decompress for both the"fl" list and
> highlighting is what the documentCache is about. That's where the
> recommendation "size it as (max # of users) * (max rows)"
> recommendation comes in (if you can afford the memory certainly).
>
> Some users have situations where the documentCache hit ratio is much
> better, but I'd be surprised if any core with 450M docs even got
> close.
>
> *
> bq: That supported the hypothesis that the query throughput decreases
> after opening a new searcher and **not** after committing the index
>
> Are you saying that you have something of a sawtooth pattern? I.e.
> queries are slow "for a while" after opening a new searcher but then
> improve until the next commit? This is usually an autowarm problem, so
> you might address it with a more precise autowarm. Look particularly
> for anything that sorts/groups/facets. Any such fields should have
> docValues=true set. Unfortunately this will require a complete
> re-index. Don't be frightened by the fact that enabling docValues will
> cause your index size on disk to grow. Paradoxically that will
> actually _lower_ the size of the JVM heap requirements. Essentially
> the additional size on disk is the serialized structure that would
> have to be built in the JVM. Since it is pre-built at index time, it
> can be MMapped and use OS memory space and not JVM.
>
> *
> 450M docs and 800G index size is quite large and a prime candidate for
> sharding FWIW.
>
> Best,
> Erick
>
>
>
>
> On Sat, Nov 11, 2017 at 4:52 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
> wrote:
> > ~248 gb
> >
> > Nawab
> >
> >
> > On Sat, Nov 11, 2017 at 2:41 PM Kevin Risden <kris...@apache.org> wrote:
> >
> >> > One machine runs with a 3TB drive, running 3 solr processes (each with
> >> one core as described above).
> >>
> >> How much total memory on the machine?
> >>
> >> Kevin Risden
> >>
> >> On Sat, Nov 11, 2017 at 1:08 PM, Nawab Zada Asad Iqbal <
> khi...@gmail.com>
> >> wrote:
> >>
> >> > Thanks for a quick and detailed response, Erick!
> >> >
> >> > Unfortunately i don't have a proof, but our servers with solr 4.5 are
> >> > running really nicely with the above config. I had assumed that same
> or
> >> > similar settings will also perform well with Solr 7, but that
> assumption
> >> > didn't hold. As, a lot has changed in 3 major releases.
> >> > I have tweaked the cache values as you suggested but increasing or
> >> > decreasing doesn't seem to do any noticeable improvement.
> >> >
> >> > At the moment, my one core has 800GB index, ~450 Million documents,
> 48 G
> >> > Xmx. GC pauses haven't been an issue though.  One machine run

Re: Solr7: Bad query throughput around commit time

2017-11-11 Thread Nawab Zada Asad Iqbal
~248 gb

Nawab


On Sat, Nov 11, 2017 at 2:41 PM Kevin Risden <kris...@apache.org> wrote:

> > One machine runs with a 3TB drive, running 3 solr processes (each with
> one core as described above).
>
> How much total memory on the machine?
>
> Kevin Risden
>
> On Sat, Nov 11, 2017 at 1:08 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
> wrote:
>
> > Thanks for a quick and detailed response, Erick!
> >
> > Unfortunately i don't have a proof, but our servers with solr 4.5 are
> > running really nicely with the above config. I had assumed that same  or
> > similar settings will also perform well with Solr 7, but that assumption
> > didn't hold. As, a lot has changed in 3 major releases.
> > I have tweaked the cache values as you suggested but increasing or
> > decreasing doesn't seem to do any noticeable improvement.
> >
> > At the moment, my one core has 800GB index, ~450 Million documents, 48 G
> > Xmx. GC pauses haven't been an issue though.  One machine runs with a 3TB
> > drive, running 3 solr processes (each with one core as described
> above).  I
> > agree that it is a very atypical system so i should probably try
> different
> > parameters with a fresh eye to find the solution.
> >
> >
> > I tried with autocommits (commit with opensearcher=false very half
> minute ;
> > and softcommit every 5 minutes). That supported the hypothesis that the
> > query throughput decreases after opening a new searcher and **not** after
> > committing the index . Cache hit ratios are all in 80+% (even when i
> > decreased the filterCache to 128, so i will keep it at this lower value).
> > Document cache hitratio is really bad, it drops to around 40% after
> > newSearcher. But i guess that is expected, since it cannot be warmed up
> > anyway.
> >
> >
> > Thanks
> > Nawab
> >
> >
> >
> > On Thu, Nov 9, 2017 at 9:11 PM, Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> > > What evidence to you have that the changes you've made to your configs
> > > are useful? There's lots of things in here that are suspect:
> > >
> > >   1
> > >
> > > First, this is useless unless you are forceMerging/optimizing. Which
> > > you shouldn't be doing under most circumstances. And you're going to
> > > be rewriting a lot of data every time See:
> > >
> > > https://lucidworks.com/2017/10/13/segment-merging-deleted-
> > > documents-optimize-may-bad/
> > >
> > > filterCache size of size="10240" is far in excess of what we usually
> > > recommend. Each entry can be up to maxDoc/8 and you have 10K of them.
> > > Why did you choose this? On the theory that "more is better?" If
> > > you're using NOW then you may not be using the filterCache well, see:
> > >
> > > https://lucidworks.com/2012/02/23/date-math-now-and-filter-queries/
> > >
> > > autowarmCount="1024"
> > >
> > > Every time you commit you're firing off 1024 queries which is going to
> > > spike the CPU a lot. Again, this is super-excessive. I usually start
> > > with 16 or so.
> > >
> > > Why are you committing from a cron job? Why not just set your
> > > autocommit settings and forget about it? That's what they're for.
> > >
> > > Your queryResultCache is likewise kind of large, but it takes up much
> > > less space than the filterCache per entry so it's probably OK. I'd
> > > still shrink it and set the autowarm to 16 or so to start, unless
> > > you're seeing a pretty high hit ratio, which is pretty unusual but
> > > does happen.
> > >
> > > 48G of memory is just asking for long GC pauses. How many docs do you
> > > have in each core anyway? If you're really using this much heap, then
> > > it'd be good to see what you can do to shrink in. Enabling docValues
> > > for all fields you facet, sort or group on will help that a lot if you
> > > haven't already.
> > >
> > > How much memory on your entire machine? And how much is used by _all_
> > > the JVMs you running on a particular machine? MMapDirectory needs as
> > > much OS memory space as it can get, see:
> > >
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
> > >
> > > Lately we've seen some structures that consume memory until a commit
> > > happens (either soft or hard). I'd shrink my autocommit down to 60
> > > seconds or even less (openSearcher=false).
> > >
>

Re: Solr7: Bad query throughput around commit time

2017-11-11 Thread Nawab Zada Asad Iqbal
Thanks for a quick and detailed response, Erick!

Unfortunately I don't have proof, but our servers with solr 4.5 are
running really nicely with the above config. I had assumed that the same or
similar settings would also perform well with Solr 7, but that assumption
didn't hold. After all, a lot has changed in 3 major releases.
I have tweaked the cache values as you suggested, but increasing or
decreasing them doesn't seem to make any noticeable improvement.

At the moment, my one core has an 800GB index, ~450 million documents, and 48G
Xmx. GC pauses haven't been an issue though. One machine runs with a 3TB
drive, running 3 solr processes (each with one core as described above). I
agree that it is a very atypical system, so I should probably try different
parameters with a fresh eye to find the solution.


I tried with autocommits (hard commit with openSearcher=false every half
minute, and softcommit every 5 minutes). That supported the hypothesis that
the query throughput decreases after opening a new searcher and **not** after
committing the index. Cache hit ratios are all in the 80+% range (even when I
decreased the filterCache to 128, so I will keep it at this lower value).
The document cache hit ratio is really bad; it drops to around 40% after a
newSearcher. But I guess that is expected, since it cannot be warmed up
anyway.
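
For reference, the autocommit settings described above correspond roughly to
this updateHandler block (a sketch of what I am trying, not the exact
production config):

  <autoCommit>
    <maxTime>30000</maxTime>            <!-- hard commit every 30 seconds -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>300000</maxTime>           <!-- new searcher every 5 minutes -->
  </autoSoftCommit>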


Thanks
Nawab



On Thu, Nov 9, 2017 at 9:11 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> What evidence to you have that the changes you've made to your configs
> are useful? There's lots of things in here that are suspect:
>
>   1
>
> First, this is useless unless you are forceMerging/optimizing. Which
> you shouldn't be doing under most circumstances. And you're going to
> be rewriting a lot of data every time See:
>
> https://lucidworks.com/2017/10/13/segment-merging-deleted-
> documents-optimize-may-bad/
>
> filterCache size of size="10240" is far in excess of what we usually
> recommend. Each entry can be up to maxDoc/8 and you have 10K of them.
> Why did you choose this? On the theory that "more is better?" If
> you're using NOW then you may not be using the filterCache well, see:
>
> https://lucidworks.com/2012/02/23/date-math-now-and-filter-queries/
>
> autowarmCount="1024"
>
> Every time you commit you're firing off 1024 queries which is going to
> spike the CPU a lot. Again, this is super-excessive. I usually start
> with 16 or so.
>
> Why are you committing from a cron job? Why not just set your
> autocommit settings and forget about it? That's what they're for.
>
> Your queryResultCache is likewise kind of large, but it takes up much
> less space than the filterCache per entry so it's probably OK. I'd
> still shrink it and set the autowarm to 16 or so to start, unless
> you're seeing a pretty high hit ratio, which is pretty unusual but
> does happen.
>
> 48G of memory is just asking for long GC pauses. How many docs do you
> have in each core anyway? If you're really using this much heap, then
> it'd be good to see what you can do to shrink in. Enabling docValues
> for all fields you facet, sort or group on will help that a lot if you
> haven't already.
>
> How much memory on your entire machine? And how much is used by _all_
> the JVMs you running on a particular machine? MMapDirectory needs as
> much OS memory space as it can get, see:
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> Lately we've seen some structures that consume memory until a commit
> happens (either soft or hard). I'd shrink my autocommit down to 60
> seconds or even less (openSearcher=false).
>
> In short, I'd go back mostly to the default settings and build _up_ as
> you can demonstrate improvements. You've changed enough things here
> that untangling which one is the culprit will be hard. You want the
> JVM to have as little memory as possible, unfortunately that's
> something you figure out by experimentation.
>
> Best,
> Erick
>
> On Thu, Nov 9, 2017 at 8:42 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
> wrote:
> > Hi,
> >
> > I am committing every 5 minutes using a periodic cron job  "curl
> > http://localhost:8984/solr/core1/update?commit=true". Besides this, my
> app
> > doesn't do any soft or hard commits. With Solr 7 upgrade, I am noticing
> > that query throughput plummets every 5 minutes - probably when the commit
> > happens.
> > What can I do to improve this? I didn't use to happen like this in
> solr4.5.
> > (i.e., i used to get a stable query throughput of  50-60 queries per
> > second. Now there are spikes to 60 qps interleaved by drops to almost
> > **0**).  Between those 5 minutes, I am able to achieve high throughput,
> > hence I guess that issue is related to indexing or merg

Solr7: Bad query throughput around commit time

2017-11-09 Thread Nawab Zada Asad Iqbal
Hi,

I am committing every 5 minutes using a periodic cron job: "curl
http://localhost:8984/solr/core1/update?commit=true". Besides this, my app
doesn't do any soft or hard commits. With the Solr 7 upgrade, I am noticing
that query throughput plummets every 5 minutes - probably when the commit
happens.
What can I do to improve this? It didn't use to happen like this in solr4.5
(i.e., I used to get a stable query throughput of 50-60 queries per
second; now there are spikes to 60 qps interleaved by drops to almost
**0**). Between commits, I am able to achieve high throughput,
hence I guess the issue is related to indexing or merging, and not the query
flow.

I have 48G allotted to each solr process, and it seems that only ~50% is
being used at any time; similarly, CPU is not spiking beyond 50% either.
There is frequent merging (every 5 minutes), but I am not sure if that is
the cause of the slowdown.

Here are my merge and cache settings:

Thanks
Nawab


  5
  5
  10
  16
  
  5
  1











false

2


  
  


  
  



Re: update document stuck on: java.net.SocketInputStream.socketRead0

2017-11-03 Thread Nawab Zada Asad Iqbal
Hi,

I added some very liberal connection and socket timeouts to the request
config, and I now see a lot of SocketTimeoutException and some
ConnectTimeoutException:

RequestConfig requestConfig = RequestConfig.custom()
.setConnectionRequestTimeout(10*60*1000)
.setConnectTimeout(60*1000)
.setSocketTimeout(3*60*1000)
.build();
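
(For completeness, that config is attached to the client roughly like this,
using the HttpClient 4.3+ builder API (org.apache.http.impl.client); the pool
sizes are illustrative, not what I actually run:)

CloseableHttpClient httpClient = HttpClients.custom()
    .setDefaultRequestConfig(requestConfig)   // the RequestConfig built above
    .setMaxConnPerRoute(100)
    .setMaxConnTotal(200)
    .build();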

I am totally lost on what needs to be fixed here, but it is blocking a lot
of connections for a very long time, and my throughput has dropped to
almost half (compared to Solr 4 and its Jetty).
Jetty 9 doesn't support bio.SocketConnector, and the following snippet from
Solr4 (probably Jetty 8) shows that Solr was performing better with
SocketConnector instead of nio.SelectChannelConnector. I am wondering if
this gives some clue to my problem. How should I configure Jetty 9
(for non-blocking i/o, since SocketConnector is not supported) to at least
improve my performance?

PS: I have also posted this question here:
https://stackoverflow.com/questions/47098816/solr-jetty-9-webserver-sending-a-ton-of-socket-timeouts


Thanks
Nawab


On Thu, Oct 26, 2017 at 7:03 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
wrote:

> Hi,
>
> After Solr 7 upgrade, I am realizing that my '/update' request is
> sometimes getting stuck on this:-
>
>  - java.net.SocketInputStream.socketRead0(java.io.FileDescriptor, byte[],
> int, int, int) @bci=0 (Compiled frame; information may be imprecise)
>  - java.net.SocketInputStream.read(byte[], int, int, int) @bci=87,
> line=152 (Compiled frame)
>  - java.net.SocketInputStream.read(byte[], int, int) @bci=11, line=122
> (Compiled frame)
>  - org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer()
> @bci=71, line=166 (Compiled frame)
>  - org.apache.http.impl.io.SocketInputBuffer.fillBuffer() @bci=1, line=90
> (Compiled frame)
>  - org.apache.http.impl.io.AbstractSessionInputBuffer.
> readLine(org.apache.http.util.CharArrayBuffer) @bci=137, line=281
> (Compiled frame)
>  - org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(
> org.apache.http.io.SessionInputBuffer) @bci=16, line=92 (Compiled frame)
>  - org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(
> org.apache.http.io.SessionInputBuffer) @bci=2, line=62 (Compiled frame)
>  - org.apache.http.impl.io.AbstractMessageParser.parse() @bci=38,
> line=254 (Compiled frame)
>  - org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader()
> @bci=8, line=289 (Compiled frame)
>  - org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader()
> @bci=1, line=252 (Compiled frame)
>  - 
> org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader()
> @bci=6, line=191 (Compiled frame)
>  - org.apache.http.protocol.HttpRequestExecutor.
> doReceiveResponse(org.apache.http.HttpRequest, 
> org.apache.http.HttpClientConnection,
> org.apache.http.protocol.HttpContext) @bci=62, line=300 (Compiled frame)
>  - 
> org.apache.http.protocol.HttpRequestExecutor.execute(org.apache.http.HttpRequest,
> org.apache.http.HttpClientConnection, org.apache.http.protocol.HttpContext)
> @bci=60, line=127 (Compiled frame)
>  - org.apache.http.impl.client.DefaultRequestDirector.
> tryExecute(org.apache.http.impl.client.RoutedRequest,
> org.apache.http.protocol.HttpContext) @bci=198, line=715 (Compiled frame)
>  - org.apache.http.impl.client.DefaultRequestDirector.
> execute(org.apache.http.HttpHost, org.apache.http.HttpRequest,
> org.apache.http.protocol.HttpContext) @bci=574, line=520 (Compiled frame)
>  - 
> org.apache.http.impl.client.AbstractHttpClient.execute(org.apache.http.HttpHost,
> org.apache.http.HttpRequest, org.apache.http.protocol.HttpContext)
> @bci=344, line=906 (Compiled frame)
>  - org.apache.http.impl.client.AbstractHttpClient.execute(
> org.apache.http.client.methods.HttpUriRequest, 
> org.apache.http.protocol.HttpContext)
> @bci=21, line=805 (Compiled frame)
>  - org.apache.http.impl.client.AbstractHttpClient.execute(
> org.apache.http.client.methods.HttpUriRequest) @bci=6, line=784 (Compiled
> frame)
>
>
> It seems that I am hitting this issue: https://stackoverflow.com/
> questions/28785085/how-to-prevent-hangs-on-socketinputstream-socketread0-
> in-java
> Although, I will fix my timeout settings in client, I am curious what has
> changed in Solr7 (i am upgrading from solr 4), which would cause this?
>
>
> Thanks
> Nawab
>


Re: SOLR-11504: Provide a config to restrict number of indexing threads

2017-11-01 Thread Nawab Zada Asad Iqbal
Well, the reason I want to control the number of indexing threads is to
restrict the number of "segments" being created at one time in RAM. One
indexing thread in Lucene corresponds to one segment being written. I need
fine control over the number of segments. With fewer threads than that, I
will not be fully utilizing my write capacity. On the other hand, if I have
more threads, then I will end up with a lot more small segments, which I
will need to flush frequently and then merge, and that will cause a
different kind of problem.

Your suggestion would require me and other such Solr users to create a tight
coupling between the clients and the Solr servers. My client is not SolrJ
based. In a scenario where I am connecting and indexing to Solr remotely, I
want more requests to be waiting on the Solr side so that they start
writing as soon as an indexing thread is available, rather than waiting on my
client side - on the other side of the wire.

Thanks
Nawab

On Wed, Nov 1, 2017 at 7:11 AM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 10/31/2017 4:57 PM, Nawab Zada Asad Iqbal wrote:
>
>> I hit this issue https://issues.apache.org/jira/browse/SOLR-11504 while
>> migrating to solr6 and locally working around it in Lucene code. I am
>> thinking to fix it properly and hopefully patch back to Solr. Since,
>> Lucene
>> code does not want to keep any such config, I am thinking to use a
>> counting
>> semaphore in Solr code before calling IndexWriter.addDocument(s) or
>> IndexWriter.updateDocument(s).
>>
>
> There's a fairly simple way to control the number of indexing threads that
> doesn't require ANY changes to Solr:  Don't start as many threads/processes
> on your indexing client(s).  If you control the number of simultaneous
> requests sent to Solr, then Solr won't start as many indexing threads.
> That kind of control over your indexing system is something that's always
> preferable to have.
>
> Thanks,
> Shawn
>


SOLR-11504: Provide a config to restrict number of indexing threads

2017-10-31 Thread Nawab Zada Asad Iqbal
Hi,

I hit this issue https://issues.apache.org/jira/browse/SOLR-11504 while
migrating to solr6 and have been working around it locally in Lucene code. I
am thinking of fixing it properly and hopefully contributing the patch back
to Solr. Since Lucene code does not want to keep any such config, I am
thinking of using a counting semaphore in Solr code before calling
IndexWriter.addDocument(s) or IndexWriter.updateDocument(s).


IndexWriter.addDocument(s) and updateDocument(s) are used in
DirectUpdateHandler2 and FileBasedSpellChecker.java. Since normal
document indexing goes through DirectUpdateHandler2, I am thinking of only
throttling the number of indexing threads in this class. Does this make
sense?
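
To make the idea concrete, this is the kind of throttle I have in mind (a
rough sketch only; "maxIndexingThreads" is a made-up config knob and
"doNormalAdd" is just a placeholder for the existing DirectUpdateHandler2
code path, not an actual method name):

private final java.util.concurrent.Semaphore indexingPermits =
    new java.util.concurrent.Semaphore(maxIndexingThreads, true);

@Override
public int addDoc(AddUpdateCommand cmd) throws IOException {
  indexingPermits.acquireUninterruptibly();  // surplus update threads queue here
  try {
    return doNormalAdd(cmd);  // existing path down to IndexWriter.addDocument(s)/updateDocument(s)
  } finally {
    indexingPermits.release();
  }
}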

Can anyone mentor me for this and review my change?


Thanks
Nawab


update document stuck on: java.net.SocketInputStream.socketRead0

2017-10-26 Thread Nawab Zada Asad Iqbal
Hi,

After Solr 7 upgrade, I am realizing that my '/update' request is sometimes
getting stuck on this:-

 - java.net.SocketInputStream.socketRead0(java.io.FileDescriptor, byte[],
int, int, int) @bci=0 (Compiled frame; information may be imprecise)
 - java.net.SocketInputStream.read(byte[], int, int, int) @bci=87, line=152
(Compiled frame)
 - java.net.SocketInputStream.read(byte[], int, int) @bci=11, line=122
(Compiled frame)
 - org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer() @bci=71,
line=166 (Compiled frame)
 - org.apache.http.impl.io.SocketInputBuffer.fillBuffer() @bci=1, line=90
(Compiled frame)
 -
org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(org.apache.http.util.CharArrayBuffer)
@bci=137, line=281 (Compiled frame)
 -
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(org.apache.http.io.SessionInputBuffer)
@bci=16, line=92 (Compiled frame)
 -
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(org.apache.http.io.SessionInputBuffer)
@bci=2, line=62 (Compiled frame)
 - org.apache.http.impl.io.AbstractMessageParser.parse() @bci=38, line=254
(Compiled frame)
 -
org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader()
@bci=8, line=289 (Compiled frame)
 -
org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader()
@bci=1, line=252 (Compiled frame)
 -
org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader()
@bci=6, line=191 (Compiled frame)
 -
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(org.apache.http.HttpRequest,
org.apache.http.HttpClientConnection, org.apache.http.protocol.HttpContext)
@bci=62, line=300 (Compiled frame)
 -
org.apache.http.protocol.HttpRequestExecutor.execute(org.apache.http.HttpRequest,
org.apache.http.HttpClientConnection, org.apache.http.protocol.HttpContext)
@bci=60, line=127 (Compiled frame)
 -
org.apache.http.impl.client.DefaultRequestDirector.tryExecute(org.apache.http.impl.client.RoutedRequest,
org.apache.http.protocol.HttpContext) @bci=198, line=715 (Compiled frame)
 -
org.apache.http.impl.client.DefaultRequestDirector.execute(org.apache.http.HttpHost,
org.apache.http.HttpRequest, org.apache.http.protocol.HttpContext)
@bci=574, line=520 (Compiled frame)
 -
org.apache.http.impl.client.AbstractHttpClient.execute(org.apache.http.HttpHost,
org.apache.http.HttpRequest, org.apache.http.protocol.HttpContext)
@bci=344, line=906 (Compiled frame)
 -
org.apache.http.impl.client.AbstractHttpClient.execute(org.apache.http.client.methods.HttpUriRequest,
org.apache.http.protocol.HttpContext) @bci=21, line=805 (Compiled frame)
 -
org.apache.http.impl.client.AbstractHttpClient.execute(org.apache.http.client.methods.HttpUriRequest)
@bci=6, line=784 (Compiled frame)


It seems that I am hitting this issue:
https://stackoverflow.com/questions/28785085/how-to-prevent-hangs-on-socketinputstream-socketread0-in-java
Although, I will fix my timeout settings in client, I am curious what has
changed in Solr7 (i am upgrading from solr 4), which would cause this?


Thanks
Nawab


Measuring time spent in analysis and writing to index

2017-10-19 Thread Nawab Zada Asad Iqbal
Hi,

I want to analyze the time spent in the different stages of an add/update
document request. E.g., I want to compare time spent in analysis vs. writing
to the Lucene index. Does Solr provide any such thing? I have looked at
[core/admin/mbeans?stats=true&wt=json&indent=true], which provides overall
stats, but I am interested in a breakdown for each index request.

Thanks
Nawab


Re: 3 color jvm memory usage bar

2017-10-19 Thread Nawab Zada Asad Iqbal
Thanks Erik

I see three colors in the JVM usage bar: dark gray, light gray, and white
(left to right). Only one dark and one light color would have made sense to me
(as I could interpret them as used vs. available memory), but there is light
gray between the dark gray and white parts.


Thanks
Nawab

On Thu, Oct 19, 2017 at 8:09 AM, Erick Erickson <erickerick...@gmail.com>
wrote:

> Nawab:
>
> Images are stripped aggressively by the Apache mail servers, your
> attachment didn't come through. You'll have to put it somewhere and
> provide a link.
>
> Generally the lighter color in each bar is the available resource and the
> darker shade is used.
>
> Best,
> Erick
>
> On Thu, Oct 19, 2017 at 7:27 AM, Nawab Zada Asad Iqbal <khi...@gmail.com>
> wrote:
> > Good morning,
> >
> >
> > What do the 3 colors mean in this bar on Solr dashboard page? (please see
> > attached) :
> >
> >
> > Regards
> > Nawab
>


3 color jvm memory usage bar

2017-10-19 Thread Nawab Zada Asad Iqbal
Good morning,


What do the 3 colors mean in this bar on Solr dashboard page? (please see
attached) :


Regards
Nawab


Re: [EXTERNAL] Re: OOM during indexing with 24G heap - Solr 6.5.1

2017-10-17 Thread Nawab Zada Asad Iqbal
Randy

That is one issue; I don't know if it fixes everything for you or not.
However, Lucene doesn't put a limit on the number of incoming requests, and
after https://issues.apache.org/jira/browse/LUCENE-6659, Solr has no way
(that I know of, at least) to limit indexing threads. So if you have a ton of
parallel updates reaching the Solr webserver, it can cause a performance
problem.
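
(If your indexing client happens to be SolrJ-based, one way to cap the
concurrency on the client side is the concurrent update client -- a sketch;
the URL, queue size and thread count below are illustrative:)

ConcurrentUpdateSolrClient client =
    new ConcurrentUpdateSolrClient.Builder("http://solr-host:8983/solr/collection1")
        .withQueueSize(1000)     // docs buffered client-side before add() blocks
        .withThreadCount(12)     // max concurrent update connections to Solr
        .build();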




On Tue, Oct 17, 2017 at 10:52 AM, Randy Fradin 
wrote:

> I've been trying to understand DocumentsWriterFlushControl.java to figure
> this one out. I don't really have a firm grasp of it but I'm starting to
> suspect that blocked flushes in aggregate can take up to (ramBufferSizeMB *
> maximum # of concurrent update requests * # of cores) of heap space and
> that I need to limit how many concurrent update requests are sent to the
> same Solr node at the same time to something much lower than my current
> 240. I don't know this for sure.. it is mostly a guess based on the fact
> that one of the DocumentsWriter instances in my heap dump has just under
> 240 items in the blockedFlushes list and each of those is retaining up to
> 57MB of heap space (which is less than ramBufferSizeMB=100 but in the
> ballpark).
>
> Can anyone shed light on whether I'm going down the right path here?
>
>
> On Mon, Oct 16, 2017 at 5:34 PM David M Giannone 
> wrote:
>
> >
> >
> >
> >
> > Sent via the Samsung Galaxy S® 6, an AT 4G LTE smartphone
> >
> >
> >  Original message 
> > From: Randy Fradin 
> > Date: 10/16/17 7:38 PM (GMT-05:00)
> > To: solr-user@lucene.apache.org
> > Subject: [EXTERNAL] Re: OOM during indexing with 24G heap - Solr 6.5.1
> >
> > Each shard has around 4.2 million documents which are around 40GB on
> disk.
> > Two nodes have 3 shard replicas each and the third has 2 shard replicas.
> >
> > The text of the exception is: java.lang.OutOfMemoryError: Java heap space
> > And the heap dump is a full 24GB indicating the full heap space was being
> > used.
> >
> > Here is the solrconfig as output by the config request handler:
> >
> > {
> >   "responseHeader":{
> > "status":0,
> > "QTime":0},
> >   "config":{
> > "znodeVersion":0,
> > "luceneMatchVersion":"org.apache.lucene.util.Version:6.5.1",
> > "updateHandler":{
> >   "indexWriter":{"closeWaitsForMerges":true},
> >   "commitWithin":{"softCommit":true},
> >   "autoCommit":{
> > "maxDocs":5,
> > "maxTime":30,
> > "openSearcher":false},
> >   "autoSoftCommit":{
> > "maxDocs":-1,
> > "maxTime":3}},
> > "query":{
> >   "useFilterForSortedQuery":false,
> >   "queryResultWindowSize":1,
> >   "queryResultMaxDocsCached":2147483647 <(214)%20748-3647>,
> >   "enableLazyFieldLoading":false,
> >   "maxBooleanClauses":1024,
> >   "":{
> > "size":"1",
> > "showItems":"-1",
> > "initialSize":"10",
> > "name":"fieldValueCache"}},
> > "jmx":{
> >   "agentId":null,
> >   "serviceUrl":null,
> >   "rootName":null},
> > "requestHandler":{
> >   "/select":{
> > "name":"/select",
> > "defaults":{
> >   "rows":10,
> >   "echoParams":"explicit"},
> > "class":"solr.SearchHandler"},
> >   "/update":{
> > "useParams":"_UPDATE",
> > "class":"solr.UpdateRequestHandler",
> > "name":"/update"},
> >   "/update/json":{
> > "useParams":"_UPDATE_JSON",
> > "class":"solr.UpdateRequestHandler",
> > "invariants":{"update.contentType":"application/json"},
> > "name":"/update/json"},
> >   "/update/csv":{
> > "useParams":"_UPDATE_CSV",
> > "class":"solr.UpdateRequestHandler",
> > "invariants":{"update.contentType":"application/csv"},
> > "name":"/update/csv"},
> >   "/update/json/docs":{
> > "useParams":"_UPDATE_JSON_DOCS",
> > "class":"solr.UpdateRequestHandler",
> > "invariants":{
> >   "update.contentType":"application/json",
> >   "json.command":"false"},
> > "name":"/update/json/docs"},
> >   "update":{
> > "class":"solr.UpdateRequestHandlerApi",
> > "useParams":"_UPDATE_JSON_DOCS",
> > "name":"update"},
> >   "/config":{
> > "useParams":"_CONFIG",
> > "class":"solr.SolrConfigHandler",
> > "name":"/config"},
> >   "/schema":{
> > "class":"solr.SchemaHandler",
> > "useParams":"_SCHEMA",
> > "name":"/schema"},
> >   "/replication":{
> > "class":"solr.ReplicationHandler",
> > "useParams":"_REPLICATION",
> > "name":"/replication"},
> >   "/get":{
> > "class":"solr.RealTimeGetHandler",
> > "useParams":"_GET",
> > "defaults":{
> >   "omitHeader":true,
> >   "wt":"json",
> >   "indent":true},
> > "name":"/get"},
> >   

Re: solr 7.0: What causes the segment to flush

2017-10-17 Thread Nawab Zada Asad Iqbal
I take yesterday's comment back. I assumed that the file being written
was a finished segment; however, after letting Solr run for the night, I see
that the segment is flushed at the expected size: 1945MB (so the file which I
observed was still open for writing).
Now, I have two other questions:

1. Is there a way to not write to disk continuously and only write the file
when the segment is flushed?

2. With 6.5 I had ramBufferSizeMB=20G and limited the threadCount to 12
(since LUCENE-6659
<https://issues.apache.org/jira/plugins/servlet/mobile#issue/LUCENE-6659>,
there is no configuration for indexing thread count, so I did a local
workaround to limit the number of threads in code); I had very good write
throughput. But with 7.0, I am getting comparable throughput only with an
indexing thread count > 50. What could be wrong?


Thanks @Erick, I checked the commit settings, both soft and hard commits
are off.




On Tue, Oct 17, 2017 at 3:47 AM, Amrit Sarkar <sarkaramr...@gmail.com>
wrote:

> >
> > In 7.0, i am finding that the file is written to disk very early on
> > and it is being updated every second or so. Had something changed in 7.0
> > which is causing it?  I tried something similar with solr 6.5 and i was
> > able to get almost a GB size files on disk.
>
>
> Interesting observation, Nawab, with ramBufferSizeMB=20G, you are getting
> 20GB segments on 6.5 or less? a GB?
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>
> On Tue, Oct 17, 2017 at 12:48 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I have  tuned  (or tried to tune) my settings to only flush the segment
> > when it has reached its maximum size. At the moment,I am using my
> > application with only a couple of threads (i have limited to one thread
> for
> > analyzing this scenario) and my ramBufferSizeMB=20000 (i.e. ~20GB). With
> > this, I assumed that my file sizes on the disk will be at in the order of
> > GB; and no segments will be flushed until the segment's in memory size is
> > 2GB. In 7.0, i am finding that the file is written to disk very early on
> > and it is being updated every second or so. Had something changed in 7.0
> > which is causing it?  I tried something similar with solr 6.5 and i was
> > able to get almost a GB size files on disk.
> >
> > How can I control it to not write to disk until the segment has reached
> its
> > maximum permitted size (1945 MB?) ? My write traffic is 'new only' (i.e.,
> > it doesn't delete any document) , however I also found following
> infostream
> > logs, which incorrectly say 'delete=true':
> >
> > Oct 16, 2017 10:18:29 PM INFO  (qtp761960786-887) [   x:filesearch]
> > o.a.s.c.S.Request [filesearch]  webapp=/solr path=/update
> > params={commit=false} status=0 QTime=21
> > Oct 16, 2017 10:18:29 PM INFO  (qtp761960786-889) [   x:filesearch]
> > o.a.s.u.LoggingInfoStream [DW][qtp761960786-889]: anyChanges?
> > numDocsInRam=4434 deletes=true hasTickets:false
> pendingChangesInFullFlush:
> > false
> > Oct 16, 2017 10:18:29 PM INFO  (qtp761960786-889) [   x:filesearch]
> > o.a.s.u.LoggingInfoStream [IW][qtp761960786-889]: nrtIsCurrent:
> infoVersion
> > matches: false; DW changes: true; BD changes: false
> > Oct 16, 2017 10:18:29 PM INFO  (qtp761960786-889) [   x:filesearch]
> > o.a.s.c.S.Request [filesearch]  webapp=/solr path=/admin/luke
> > params={show=index=0=json} status=0 QTime=0
> >
> >
> >
> > Thanks
> > Nawab
> >
>


solr 7.0: What causes the segment to flush

2017-10-17 Thread Nawab Zada Asad Iqbal
Hi,

I have tuned (or tried to tune) my settings to only flush the segment
when it has reached its maximum size. At the moment, I am using my
application with only a couple of threads (I have limited it to one thread
for analyzing this scenario) and my ramBufferSizeMB=20000 (i.e. ~20GB). With
this, I assumed that my file sizes on disk would be on the order of
GBs, and that no segments would be flushed until a segment's in-memory size
reached 2GB. In 7.0, I am finding that the file is written to disk very early
on and is updated every second or so. Has something changed in 7.0
which is causing this? I tried something similar with solr 6.5 and I was
able to get files of almost a GB in size on disk.

How can I control it so that nothing is written to disk until the segment has
reached its maximum permitted size (1945 MB?)? My write traffic is 'new only'
(i.e., it doesn't delete any document); however, I also found the following
infostream logs, which incorrectly say 'delete=true':

Oct 16, 2017 10:18:29 PM INFO  (qtp761960786-887) [   x:filesearch]
o.a.s.c.S.Request [filesearch]  webapp=/solr path=/update
params={commit=false} status=0 QTime=21
Oct 16, 2017 10:18:29 PM INFO  (qtp761960786-889) [   x:filesearch]
o.a.s.u.LoggingInfoStream [DW][qtp761960786-889]: anyChanges?
numDocsInRam=4434 deletes=true hasTickets:false pendingChangesInFullFlush:
false
Oct 16, 2017 10:18:29 PM INFO  (qtp761960786-889) [   x:filesearch]
o.a.s.u.LoggingInfoStream [IW][qtp761960786-889]: nrtIsCurrent: infoVersion
matches: false; DW changes: true; BD changes: false
Oct 16, 2017 10:18:29 PM INFO  (qtp761960786-889) [   x:filesearch]
o.a.s.c.S.Request [filesearch]  webapp=/solr path=/admin/luke
params={show=index=0=json} status=0 QTime=0
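
(For reference, the flush-related part of the indexConfig I am describing is
essentially this -- a sketch, merge settings omitted; -1 for maxBufferedDocs
just means flushing is driven by RAM usage only:)

<indexConfig>
  <ramBufferSizeMB>20000</ramBufferSizeMB>
  <maxBufferedDocs>-1</maxBufferedDocs>
</indexConfig>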



Thanks
Nawab


Re: Solr test runs: test skipping logic

2017-10-06 Thread Nawab Zada Asad Iqbal
Thanks Chris,

That very likely is the reason. I had noticed the seed and realized that it
controls the random input generation for the tests, to make
failures reproducible. However, I didn't consider that it can also cause
tests to be skipped.

Thanks!
Nawab


On Thu, Oct 5, 2017 at 3:13 PM, Chris Hostetter 
wrote:

>
> : I am seeing that in different test runs (e.g., by executing 'ant test' on
> : the root folder in 'lucene-solr') a different subset of tests are
> skipped.
> : Where can I find more about it? I am trying to create parity between test
> : successes before and after my changes and this is causing  confusion.
>
> The test randomization logic creates an arbitrary "master seed" that is
> assigned by ant.  This master seed is
> then used to generate some randomized default properties for the the
> forked JVMs (default timezones, default Locale, default charset, etc...)
>
> Each test class run in a forked JVM then gets it's own Random seed
> (generated fro mthe master seed as well) which the solr test-framework
> uses to randomize some more things (that are specific to the solr
> test-framework.
>
> In some cases, tests have @Assume of assumeThat(...) logic in if we know
> that certain tests are completely incompatible with certain randomized
> aspects of the environemnt -- for example: some tests won't bothe to run
> if the randomized Locale uses "tr" because of external third-party
> dependencies that break with this Locale (due to upercase/lowercase
> behavior).
>
> This is most likeley the reason you are seeing a diff "set" of tests run
> on diff times.  But if you want true parity between test runs, use the
> same master seed -- which is printed at the begining of every "ant
> test" run, as well as any time a test fails, and can be overridden on the
> ant command line for future runs.
>
> run "ant test-help" for the specifics.
>
>
> -Hoss
> http://www.lucidworks.com/
>


Re: Maven build error (Was: Jenkins setup for continuous build)

2017-10-04 Thread Nawab Zada Asad Iqbal
Hi Steve,

I did this:

ant get-maven-poms
  cd maven-build/
  mvn -DskipTests install

On Wed, Oct 4, 2017 at 4:56 PM, Steve Rowe <sar...@gmail.com> wrote:

> Hi Nawab,
>
> > On Oct 4, 2017, at 7:39 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
> wrote:
> >
> > I am hitting following error with maven build:
> > Is that expected?
>
> No.  What commands did you use?
>
> > Can someone share me the details about how
> > https://builds.apache.org/job/Lucene-Solr-Maven-master is configured.
>
> The Jenkins job runs the equivalent of the following:
>
> ant jenkins-maven-nightly -Dm2.repository.id=apache.snapshots.https
> -Dm2.repository.url=https://repository.apache.org/content/
> repositories/snapshots
> -DskipTests=true
>
> This in turn runs the equivalent of the following:
>
> ant get-maven-poms
> mvn -f maven-build/pom.xml -fae  -Dm2.repository.id=apache.snapshots.https
> -Dm2.repository.url=https://repository.apache.org/content/
> repositories/snapshots
> -DskipTests=true install
>
> Note that tests are not run, and that artifacts are published to the
> Apache sandbox repository.
>
> --
> Steve
> www.lucidworks.com


Re: Jenkins setup for continuous build

2017-10-04 Thread Nawab Zada Asad Iqbal
I looked at
https://builds.apache.org/job/Lucene-Solr-Maven-master/2111/console and
decided to switch to maven. However my maven build (without jenkins) is
failing with this error:

[INFO] Scanning classes for violations...
[ERROR] Forbidden class/interface use: org.bouncycastle.util.Strings
[non-portable or internal runtime class]
[ERROR]   in
org.apache.solr.response.TestCustomDocTransformer$CustomTransformerFactory
(TestCustomDocTransformer.java:78)
[ERROR] Scanned 1290 (and 2112 related) class file(s) for forbidden API
invocations (in 2.74s), 1 error(s).
[INFO]



Is that expected? Can someone share the details of how
https://builds.apache.org/job/Lucene-Solr-Maven-master is configured?




On Wed, Oct 4, 2017 at 9:14 AM, Nawab Zada Asad Iqbal <khi...@gmail.com>
wrote:

> Hi,
>
> I have some custom code in solr (which is not of good quality for
> contributing back) so I need to setup my own continuous build solution. I
> tried jenkins and was hoping that ant build (ant clean compile) in Execute
> Shell textbox will work, but I am stuck at this ivy-fail error:
>
> To work around it, I also added another step in the 'Execute Shell' (ant
> ivy-bootstrap), which succeeds but 'ant clean compile' still fails with the
> following error. I guess that I am not alone in doing this so there should
> be some standard work around for this.
>
> ivy-fail:
>  [echo]
>  [echo]  This build requires Ivy and Ivy could not be found in your 
> ant classpath.
>  [echo]
>  [echo]  (Due to classpath issues and the recursive nature of the 
> Lucene/Solr
>  [echo]  build system, a local copy of Ivy can not be used an loaded 
> dynamically
>  [echo]  by the build.xml)
>  [echo]
>  [echo]  You can either manually install a copy of Ivy 2.3.0 in your 
> ant classpath:
>  [echo]http://ant.apache.org/manual/install.html#optionalTasks
>  [echo]
>  [echo]  Or this build file can do it for you by running the Ivy 
> Bootstrap target:
>  [echo]ant ivy-bootstrap
>  [echo]
>  [echo]  Either way you will only have to install Ivy one time.
>  [echo]
>  [echo]  'ant ivy-bootstrap' will install a copy of Ivy into your Ant 
> User Library:
>  [echo]/home/jenkins/.ant/lib
>  [echo]
>  [echo]  If you would prefer, you can have it installed into an 
> alternative
>  [echo]  directory using the 
> "-Divy_install_path=/some/path/you/choose" option,
>  [echo]  but you will have to specify this path every time you build 
> Lucene/Solr
>  [echo]  in the future...
>  [echo]ant ivy-bootstrap -Divy_install_path=/some/path/you/choose
>  [echo]...
>  [echo]ant -lib /some/path/you/choose clean compile
>  [echo]...
>  [echo]ant -lib /some/path/you/choose clean compile
>  [echo]
>  [echo]  If you have already run ivy-bootstrap, and still get this 
> message, please
>  [echo]  try using the "--noconfig" option when running ant, or 
> editing your global
>  [echo]  ant config to allow the user lib to be loaded.  See the wiki 
> for more details:
>  [echo]
> http://wiki.apache.org/lucene-java/DeveloperTips#Problems_with_Ivy.3F
>  [echo]
>
>
>
>


Maven build error (Was: Jenkins setup for continuous build)

2017-10-04 Thread Nawab Zada Asad Iqbal
So, I looked at this setup,
https://builds.apache.org/job/Lucene-Solr-Maven-master/2111/console, which
is using Maven, and I switched to Maven too.

I am hitting the following error with the Maven build.
Is that expected? Can someone share the details of how
https://builds.apache.org/job/Lucene-Solr-Maven-master is configured?
Thanks.

[INFO] Scanning classes for violations...
[ERROR] Forbidden class/interface use: org.bouncycastle.util.Strings
[non-portable or internal runtime class]
[ERROR]   in
org.apache.solr.response.TestCustomDocTransformer$CustomTransformerFactory
(TestCustomDocTransformer.java:78)
[ERROR] Scanned 1290 (and 2112 related) class file(s) for forbidden API
invocations (in 2.74s), 1 error(s).



On Wed, Oct 4, 2017 at 9:14 AM, Nawab Zada Asad Iqbal <khi...@gmail.com>
wrote:

> Hi,
>
> I have some custom code in solr (which is not of good quality for
> contributing back) so I need to setup my own continuous build solution. I
> tried jenkins and was hoping that ant build (ant clean compile) in Execute
> Shell textbox will work, but I am stuck at this ivy-fail error:
>
> To work around it, I also added another step in the 'Execute Shell' (ant
> ivy-bootstrap), which succeeds but 'ant clean compile' still fails with the
> following error. I guess that I am not alone in doing this so there should
> be some standard work around for this.
>
> ivy-fail:
>  [echo]
>  [echo]  This build requires Ivy and Ivy could not be found in your 
> ant classpath.
>  [echo]
>  [echo]  (Due to classpath issues and the recursive nature of the 
> Lucene/Solr
>  [echo]  build system, a local copy of Ivy can not be used an loaded 
> dynamically
>  [echo]  by the build.xml)
>  [echo]
>  [echo]  You can either manually install a copy of Ivy 2.3.0 in your 
> ant classpath:
>  [echo]http://ant.apache.org/manual/install.html#optionalTasks
>  [echo]
>  [echo]  Or this build file can do it for you by running the Ivy 
> Bootstrap target:
>  [echo]ant ivy-bootstrap
>  [echo]
>  [echo]  Either way you will only have to install Ivy one time.
>  [echo]
>  [echo]  'ant ivy-bootstrap' will install a copy of Ivy into your Ant 
> User Library:
>  [echo]/home/jenkins/.ant/lib
>  [echo]
>  [echo]  If you would prefer, you can have it installed into an 
> alternative
>  [echo]  directory using the 
> "-Divy_install_path=/some/path/you/choose" option,
>  [echo]  but you will have to specify this path every time you build 
> Lucene/Solr
>  [echo]  in the future...
>  [echo]ant ivy-bootstrap -Divy_install_path=/some/path/you/choose
>  [echo]...
>  [echo]ant -lib /some/path/you/choose clean compile
>  [echo]...
>  [echo]ant -lib /some/path/you/choose clean compile
>  [echo]
>  [echo]  If you have already run ivy-bootstrap, and still get this 
> message, please
>  [echo]  try using the "--noconfig" option when running ant, or 
> editing your global
>  [echo]  ant config to allow the user lib to be loaded.  See the wiki 
> for more details:
>  [echo]
> http://wiki.apache.org/lucene-java/DeveloperTips#Problems_with_Ivy.3F
>  [echo]
>
>
>
>


Jenkins setup for continuous build

2017-10-04 Thread Nawab Zada Asad Iqbal
Hi,

I have some custom code in Solr (which is not of good enough quality to
contribute back), so I need to set up my own continuous build solution. I
tried Jenkins and was hoping that an ant build (ant clean compile) in the
Execute Shell textbox would work, but I am stuck at the ivy-fail error shown
below.

To work around it, I also added another step in the 'Execute Shell' (ant
ivy-bootstrap), which succeeds, but 'ant clean compile' still fails with the
following error. I guess that I am not alone in doing this, so there should
be some standard workaround for it.

ivy-fail:
 [echo]
 [echo]  This build requires Ivy and Ivy could not be found in
your ant classpath.
 [echo]
 [echo]  (Due to classpath issues and the recursive nature of
the Lucene/Solr
 [echo]  build system, a local copy of Ivy can not be used an
loaded dynamically
 [echo]  by the build.xml)
 [echo]
 [echo]  You can either manually install a copy of Ivy 2.3.0
in your ant classpath:
 [echo]http://ant.apache.org/manual/install.html#optionalTasks
 [echo]
 [echo]  Or this build file can do it for you by running the
Ivy Bootstrap target:
 [echo]ant ivy-bootstrap
 [echo]
 [echo]  Either way you will only have to install Ivy one time.
 [echo]
 [echo]  'ant ivy-bootstrap' will install a copy of Ivy into
your Ant User Library:
 [echo]/home/jenkins/.ant/lib
 [echo]
 [echo]  If you would prefer, you can have it installed into
an alternative
 [echo]  directory using the
"-Divy_install_path=/some/path/you/choose" option,
 [echo]  but you will have to specify this path every time you
build Lucene/Solr
 [echo]  in the future...
 [echo]ant ivy-bootstrap -Divy_install_path=/some/path/you/choose
 [echo]...
 [echo]ant -lib /some/path/you/choose clean compile
 [echo]...
 [echo]ant -lib /some/path/you/choose clean compile
 [echo]
 [echo]  If you have already run ivy-bootstrap, and still get
this message, please
 [echo]  try using the "--noconfig" option when running ant,
or editing your global
 [echo]  ant config to allow the user lib to be loaded.  See
the wiki for more details:
 [echo]
http://wiki.apache.org/lucene-java/DeveloperTips#Problems_with_Ivy.3F
 [echo]
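
For the record, the sequence I am now trying in the Jenkins 'Execute Shell'
step, following the hints in the message above (the install path is
illustrative):

ant ivy-bootstrap -Divy_install_path=$WORKSPACE/ivy-lib
ant -lib $WORKSPACE/ivy-lib clean compile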


Solr test runs: test skipping logic

2017-10-04 Thread Nawab Zada Asad Iqbal
Hi,

I am seeing that in different test runs (e.g., by executing 'ant test' on
the root folder of 'lucene-solr') a different subset of tests is skipped.
Where can I find more about this? I am trying to establish parity between
test results before and after my changes, and this is causing confusion.


Thanks
Nawab


solr 7.0: mbeans stats missing for many keys

2017-10-01 Thread Nawab Zada Asad Iqbal
hi,

When upgrading from solr4.5 to solr 7.0, I noticed that many key names in
solr-mbeans (
http://localhost:8983/solr/tcore/admin/mbeans?stats=true&wt=json) have
changed. Mostly, the "parent key" name is now prepended to the stats keyname,
e.g., instead of cumulative_evictions, I now have to look for
CACHE.searcher.filterCache.cumulative_evictions. However, there are some
keys for which I couldn't find any substitute; see the following snippets
from 4.5 and 7.0 comparing the dismax sections:

Is there a known issue for the keys which were present in 4.5 but are missing
in 7.0? Or do I need to specify some parameter in the admin/mbeans URL which
was not needed in older versions?

Solr 4.5:

  "dismax":{
"class":"org.apache.solr.handler.component.SearchHandler",
"version":"4.5-SNAPSHOT",
"description":"Search using components:
query,facet,mlt,highlight,stats,debug,",
"src":"$URL:
https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_5/solr/core/src/java/org/apache/solr/handler/component/SearchHandler.java
$",
"stats":{
  "handlerStart":1506588718591,
  "requests":282407,
  "errors":0,
  "timeouts":0,
  "totalTime":1.5715063257002E7,
  "avgRequestsPerSecond":1.7463307695871293,
  "5minRateReqsPerSecond":5.180978453822239E-46,
  "15minRateReqsPerSecond":1.1079266845452606E-14,
  "avgTimePerRequest":55.64686164649601,
  "medianRequestTime":6.3148865,
  "75thPcRequestTime":32.59734625,
  "95thPcRequestTime":190.0439033998,
  "99thPcRequestTime":726.895605740004,
  "999thPcRequestTime":2974.306584901}},




Solr 7:

  "dismax":{
"class":"org.apache.solr.handler.component.SearchHandler",
"description":"Search using components:
query,facet,facet_module,mlt,highlight,stats,expand,terms,debug,",
"stats":{
  "QUERY.dismax.errors.count":9,
  "QUERY.dismax.timeouts.count":0,
  "QUERY.dismax.totalTime":1985804713824,
  "QUERY.dismax.clientErrors.count":9,
  "QUERY.dismax.handlerStart":1506749061230,
  "QUERY.dismax.serverErrors.count":0,
  "QUERY.dismax.requestTimes.meanRate":38.920121769557625,
  "QUERY.dismax.requests":52147}},



Thanks

Nawab


Re: how to recover from OpenSearcher called on closed core

2017-09-28 Thread Nawab Zada Asad Iqbal
Hi

Are you upgrading from an earlier version? If not, I am curious why you are
not trying SolrCloud instead of Master/Slave.
Is there any other error before this one in the logs? Did the core close
after a crash?



Regards
Nawab

On Thu, Sep 28, 2017 at 2:57 AM, rubi.hali  wrote:

> Hi
>
> we are using Solr 6.1.0 version. We have done a Master/Slave Setup where in
> Slaves we have enabled replication polling after 300 seconds
>
> But after every replication poll, we are getting an error : Index Fetch
> Failed: opening NewSearcher called on closed core.
>
> We have enabled  softcommit after 30 ms and hardcommit with 25000 docs
> and 6 secs
> In slaves we have kept opensearcher true in case of hardcommit.
>
> we are really not sure if this issue has anything to do with our commit
> strategy.
>
> Please let me know if there is any possible explanation for why this is
> happening.
>
> From logs analysis , I observerd Caching Directory Factory is closing the
> core and after that Replication Handler starts throwing this exception.
>
> Does this exception will have any impact on memory consumption on slaves??
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Solr Beginner!!

2017-09-28 Thread Nawab Zada Asad Iqbal
Hi Jaya

Text extraction is a step before you put data into Solr. Say you have PDF
or DOC type documents: you extract the text (minus unnecessary formatting
details etc.) and store it in Solr. Later you can query it as you described.
I have not worked in the extraction area, but look at this for an idea:
https://lucene.apache.org/solr/guide/6_6/uploading-data-with-solr-cell-using-apache-tika.html

`Tika will automatically attempt to determine the input document type
(Word, PDF, HTML) and extract the content appropriately. If you like, you
can explicitly specify a MIME type for Tika with the stream.type parameter.`
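
If it helps, here is a rough SolrJ sketch of pushing one such memo through
the extracting handler (the core name 'memos', the file name, and the target
field are assumptions on my part; the /update/extract handler must be
enabled in solrconfig.xml):

import java.io.File;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class ExtractMemo {
  public static void main(String[] args) throws Exception {
    // Host and core name are assumptions; adjust for your setup.
    HttpSolrClient client = new HttpSolrClient.Builder(
        "http://localhost:8983/solr/memos").build();

    ContentStreamUpdateRequest req =
        new ContentStreamUpdateRequest("/update/extract");
    // Tika detects the type itself; the MIME type here is only a hint.
    req.addFile(new File("memo.pdf"), "application/pdf");
    // Give the document an id and map Tika's extracted "content" field
    // to a searchable field in the schema (field name is hypothetical).
    req.setParam("literal.id", "memo-1");
    req.setParam("fmap.content", "memo_text");
    req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);

    client.request(req);
    client.close();
  }
}

After that, text like "Outstanding Amount: 1000" can be queried (and
highlighted) against the memo_text field like any other stored field.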



Regards
Nawab


On Thu, Sep 28, 2017 at 6:56 AM, Johnson, Jaya 
wrote:

> Hi:
> I am trying to ingest a few memos - they do not have any standard format
> (json, xml etc etc) but just plain text however the memos all follow some
> template. What I would like to od post ingestion is to extract keywords and
> some values around it. So say for instance if the text contains the key
> word Outstanding Amount: 1000.  I would like to search for Outstanding
> Amount ( I can do that using the query interface) how to I extract the
> entire string Outstanding Amount +3or4 words from Solr.
>
> I am really new to solr so any documentation etc would be super helpful.
> Is Solr the right tool for this use case also
>
> Thanks.
> -
>
> Moody's monitors email communications through its networks for regulatory
> compliance purposes and to protect its customers, employees and business
> and where allowed to do so by applicable law. The information contained in
> this e-mail message, and any attachment thereto, is confidential and may
> not be disclosed without our express permission. If you are not the
> intended recipient or an employee or agent responsible for delivering this
> message to the intended recipient, you are hereby notified that you have
> received this message in error and that any review, dissemination,
> distribution or copying of this message, or any attachment thereto, in
> whole or in part, is strictly prohibited. If you have received this message
> in error, please immediately notify us by telephone, fax or e-mail and
> delete the message and all of its attachments. Every effort is made to keep
> our network free from viruses. You should, however, review this e-mail
> message, as well as any attachment thereto, for viruses. We take no
> responsibility and have no liability for any computer virus which may be
> transferred via this e-mail message.
>
> -
>


Re: Solr cloud most stable version

2017-09-28 Thread Nawab Zada Asad Iqbal
Hi Lars

This doesn't really answer whether 6.6.1 is the most stable release or not,
but there has been a recent security fix, so definitely go to 6.6.1.

Copied the details below:



CVE-2017-9803: Security vulnerability in kerberos delegation token
functionality

Severity: Important

Vendor:
The Apache Software Foundation

Versions Affected:
Apache Solr 6.2.0 to 6.6.0

Description:

Solr's Kerberos plugin can be configured to use delegation tokens,
which allows an application to reuse the authentication of an end-user
or another application.
There are two issues with this functionality (when using
SecurityAwareZkACLProvider type of ACL provider e.g.
SaslZkACLProvider),

Firstly, access to the security configuration can be leaked to users
other than the solr super user. Secondly, malicious users can exploit
this leaked configuration for privilege escalation to further
expose/modify private data and/or disrupt operations in the Solr
cluster.

The vulnerability is fixed from Solr 6.6.1 onwards.

Mitigation:
6.x users should upgrade to 6.6.1

Credit:
This issue was discovered by Hrishikesh Gadre of Cloudera Inc.

References:
https://issues.apache.org/jira/browse/SOLR-11184
https://wiki.apache.org/solr/SolrSecurity

On Thu, Sep 28, 2017 at 6:24 AM, Lars Karlsson <
lars.karlsson.st...@gmail.com> wrote:

> Hi, wanted to check if anyone can help guide with most stable version
> between
>
> 6.3 and 6.6.1
>
> Which should I choose ?
>
> And, are there any performance tests that one can look at for each release?
>
> Regards
> Lars
>


Re: solr 7.0: possible analysis error: startOffset must be non-negative

2017-09-27 Thread Nawab Zada Asad Iqbal
So, it seems like the two WordDelimiterGraphFilterFactory entries in the
analyzer chain (with different config in each) were causing the error. I am
still not sure how the schema ended up in this state, or whether there is
any benefit to having two lines, but removing one of them fixed my error.


Thanks
Nawab

On Wed, Sep 27, 2017 at 3:12 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
wrote:

> Hi,
>
> I upgraded to solr 7 today and i am seeing tonnes of following errors for
> various fields.
>
> o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException:
> Exception writing document id file_3881549 to the index; possible
> analysis error: startOffset must be non-negative, and endOffset must be >=
> startOffset, and offsets must not go backwards 
> startOffset=6,endOffset=8,lastStartOffset=9
> for field 'name_combined'
>
> We don't have a lot of custom code for analysis at indexing time, so my
> suspicion is on the schema definition, can someone suggest how should I
> start debugging this?
>
>  stored="true" omitPositions="false"/>
>   
> 
> 
>  generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" preserveOriginal="1"
> splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="1"/>
>  generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" preserveOriginal="1"
> splitOnCaseChange="1" splitOnNumerics="1" stemEnglishPossessive="1"/>
>  pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
>  words="stopwords.txt"/>
> 
> 
> 
> 
>  maxTokenCount="1" consumeAllTokens="false"/>
> 
>   
>
>
>  stored="false" multiValued="true" omitPositions="true"/>
>   
> 
> 
>  generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" preserveOriginal="1"
> splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="1"/>
>  generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" preserveOriginal="1"
> splitOnCaseChange="1" splitOnNumerics="1" stemEnglishPossessive="1"/>
>  pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
>  words="stopwords.txt"/>
> 
> 
>  maxGramSize="255"/>
> 
>  maxTokenCount="1" consumeAllTokens="false"/>
> 
>   
>
>
> Thanks
> nawab
>
>


solr 7.0: possible analysis error: startOffset must be non-negative

2017-09-27 Thread Nawab Zada Asad Iqbal
Hi,

I upgraded to solr 7 today and i am seeing tonnes of following errors for
various fields.

o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Exception
writing document id file_3881549 to the index; possible analysis error:
startOffset must be non-negative, and endOffset must be >= startOffset, and
offsets must not go backwards startOffset=6,endOffset=8,lastStartOffset=9
for field 'name_combined'

We don't have a lot of custom code for analysis at indexing time, so my
suspicion is on the schema definition. Can someone suggest how I should
start debugging this?


(The field and fieldType definitions pasted here were stripped by the
mailing-list archive; a partially readable copy is quoted in the reply
above.)


Thanks
nawab


Re: When will be solr 7.1 released?

2017-09-26 Thread Nawab Zada Asad Iqbal
Thanks Steve,

I was trying to look at the page without logging in and was seeing only a
handful of bugs/fixes; probably because most of those bugs/tasks do not have
a public security level.


Thanks
Nawab

On Tue, Sep 26, 2017 at 5:34 PM, Steve Rowe <sar...@gmail.com> wrote:

> Hi Nawab,
>
> > On Sep 26, 2017, at 8:04 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
> wrote:
> >
> > Thanks ,  another question(s):
> >
> > why is this released marked 'unreleased' ?
> > https://issues.apache.org/jira/projects/SOLR/versions/12335718
>
> The 7.0 release manager hasn’t gotten around to marking it as released;
> note that there is an item to do this in the ReleaseToDo: <
> https://wiki.apache.org/lucene-java/ReleaseTodo#Update_JIRA>
>
> > how is it different from :
> > https://issues.apache.org/jira/projects/SOLR/versions/12341601 (i guess
> > this is duplicate and will not be used)
>
> Yes, the 7.0.0 version is a duplicate of the 7.0 version.  I switched
> issues with 7.0.0 to 7.0 and removed the 7.0.0 version in JIRA.
>
> > I was expecting to see https://issues.apache.org/jira/browse/SOLR-11297
> in
> > https://issues.apache.org/jira/projects/SOLR/versions/12341052 but
> couldn't
> > locate it.
>
> I see SOLR-11297 in the JIRA-generated "release note”: <
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?
> projectId=12310230=12341052>.  (Note that Lucene and Solr
> maintain their own release notes: solr/CHANGES.txt and lucene/CHANGES.txt.)
>
> --
> Steve
> www.lucidworks.com
>
>


Re: When will be solr 7.1 released?

2017-09-26 Thread Nawab Zada Asad Iqbal
Thanks, another couple of questions:

Why is this release marked 'unreleased'?
https://issues.apache.org/jira/projects/SOLR/versions/12335718
How is it different from
https://issues.apache.org/jira/projects/SOLR/versions/12341601 ? (I guess
that one is a duplicate and will not be used.)

I was expecting to see https://issues.apache.org/jira/browse/SOLR-11297 in
https://issues.apache.org/jira/projects/SOLR/versions/12341052 but couldn't
locate it.



Regards
Nawab

On Tue, Sep 26, 2017 at 11:17 AM, Erick Erickson <erickerick...@gmail.com>
wrote:

> Not quite. Ongoing development always occurs on the *.x branch.
>
>  When the release manager (RM) decides to cut a release, they set a
> label on the *.x branch. So in this case, when Anshum volunteered to
> create 7.0, he picked a time and set the branch_7_0 label pointing at
> the then-7x branch.
>
> Thereafter, further development went on the branch_7x code line, with
> some selected important improvements being backported to branch_7_0.
>
> One further point is let's say a critical must-fix problem is
> discovered in 7.0. Fixes will be committed on branch_7_0 and branch_7x
> and any point releases (i.e. 7.0.1) will be cut from branch_7_0.
>
> There's actually one more step since development usually occurs on
> master, this is the complete process:
> > do development on "master" (the future 8.0)
> > commit
> > merge with branch_7x
> > commit
> > if it's a super-critical bug merge with branch_7_0
>
> Best,
> Erick
>
>
> On Tue, Sep 26, 2017 at 11:02 AM, Nawab Zada Asad Iqbal
> <khi...@gmail.com> wrote:
> > Thanks Yonik and Erick.
> >
> > That is helpful.
> > I am slightly confused about the branch name conventions. I expected 7x
> to
> > be named as branch_7_0 , am i misunderstanding something? Similar to
> > branch_6_6 (for 6.6.x onwards) .
> >
> > Regards
> > Nawab
> >
> > On Tue, Sep 26, 2017 at 8:59 AM, Yonik Seeley <ysee...@gmail.com> wrote:
> >
> >> One can also use a nightly snapshot build to try out the latest stuff:
> >> 7.x: https://builds.apache.org/job/Solr-Artifacts-7.x/
> >> lastSuccessfulBuild/artifact/solr/package/
> >> 8.0: https://builds.apache.org/job/Solr-Artifacts-master/
> >> lastSuccessfulBuild/artifact/solr/package/
> >>
> >> -Yonik
> >>
> >>
> >> On Tue, Sep 26, 2017 at 11:50 AM, Erick Erickson
> >> <erickerick...@gmail.com> wrote:
> >> > There's nothing preventing you from getting/compiling the latest Solr
> >> > 7x (what will be 7.1) for your own use. There's information here:
> >> > https://wiki.apache.org/solr/HowToContribute
> >> >
> >> > Basically, you get the code from Git (instructions provided at the
> >> > link above) and execute the "ant package" command from the solr
> >> > directory. After things churn for a while you should have the tgz and
> >> > zip files just as though you have downloaded them from the Apache
> >> > Wiki. You need Java 1.8 JDK and ant installed, and the first time you
> >> > try to compile you may see instructions to execute an ant target that
> >> > downloads ivy.
> >> >
> >> > One note, there was a comment recently that you may have to get
> >> > ivy-2.4.0.jar to have the "ant package" complete successfully.
> >> >
> >> > Best,
> >> > Erick
> >> >
> >> > On Tue, Sep 26, 2017 at 8:38 AM, Steve Rowe <sar...@gmail.com> wrote:
> >> >> Hi Nawab,
> >> >>
> >> >> Committership is a prerequisite for the Lucene/Solr release manager
> >> role.
> >> >>
> >> >> Some info here about the release process: <https://wiki.apache.org/
> >> lucene-java/ReleaseTodo>
> >> >>
> >> >> --
> >> >> Steve
> >> >> www.lucidworks.com
> >> >>
> >> >>> On Sep 26, 2017, at 11:28 AM, Nawab Zada Asad Iqbal <
> khi...@gmail.com>
> >> wrote:
> >> >>>
> >> >>> Where can I learn more about this process? I am not a committer but
> I
> >> am
> >> >>> wondering if I know enough to do it.
> >> >>>
> >> >>>
> >> >>> Thanks
> >> >>> Nawab
> >> >>>
> >> >>>
> >> >>> On Mon, Sep 25, 2017 at 9:23 PM, Erick Erickson <
> >> erickerick...@gmail.com>
> >> >>> wrote:
> >> >>>
> >> >>>> In a word "no". Basically whenever a committer feels like there are
> >> >>>> enough changes to warrant spinning a new version, they volunteer.
> >> >>>> Nobody has stepped up to do that yet, although I expect it to be in
> >> >>>> the next 2-3 months, but that's only a guess.
> >> >>>>
> >> >>>> Best,
> >> >>>> Erick
> >> >>>>
> >> >>>> On Mon, Sep 25, 2017 at 5:21 PM, Nawab Zada Asad Iqbal <
> >> khi...@gmail.com>
> >> >>>> wrote:
> >> >>>>> Hi,
> >> >>>>>
> >> >>>>> How are the release dates decided for new versions, are they
> known in
> >> >>>>> advance?
> >> >>>>>
> >> >>>>> Thanks
> >> >>>>> Nawab
> >> >>>>
> >> >>
> >>
>


Re: When will be solr 7.1 released?

2017-09-26 Thread Nawab Zada Asad Iqbal
Thanks Yonik and Erick.

That is helpful.
I am slightly confused about the branch naming conventions. I expected 7x to
be named branch_7_0; am I misunderstanding something? Similar to
branch_6_6 (for 6.6.x onwards).

Regards
Nawab

On Tue, Sep 26, 2017 at 8:59 AM, Yonik Seeley <ysee...@gmail.com> wrote:

> One can also use a nightly snapshot build to try out the latest stuff:
> 7.x: https://builds.apache.org/job/Solr-Artifacts-7.x/
> lastSuccessfulBuild/artifact/solr/package/
> 8.0: https://builds.apache.org/job/Solr-Artifacts-master/
> lastSuccessfulBuild/artifact/solr/package/
>
> -Yonik
>
>
> On Tue, Sep 26, 2017 at 11:50 AM, Erick Erickson
> <erickerick...@gmail.com> wrote:
> > There's nothing preventing you from getting/compiling the latest Solr
> > 7x (what will be 7.1) for your own use. There's information here:
> > https://wiki.apache.org/solr/HowToContribute
> >
> > Basically, you get the code from Git (instructions provided at the
> > link above) and execute the "ant package" command from the solr
> > directory. After things churn for a while you should have the tgz and
> > zip files just as though you have downloaded them from the Apache
> > Wiki. You need Java 1.8 JDK and ant installed, and the first time you
> > try to compile you may see instructions to execute an ant target that
> > downloads ivy.
> >
> > One note, there was a comment recently that you may have to get
> > ivy-2.4.0.jar to have the "ant package" complete successfully.
> >
> > Best,
> > Erick
> >
> > On Tue, Sep 26, 2017 at 8:38 AM, Steve Rowe <sar...@gmail.com> wrote:
> >> Hi Nawab,
> >>
> >> Committership is a prerequisite for the Lucene/Solr release manager
> role.
> >>
> >> Some info here about the release process: <https://wiki.apache.org/
> lucene-java/ReleaseTodo>
> >>
> >> --
> >> Steve
> >> www.lucidworks.com
> >>
> >>> On Sep 26, 2017, at 11:28 AM, Nawab Zada Asad Iqbal <khi...@gmail.com>
> wrote:
> >>>
> >>> Where can I learn more about this process? I am not a committer but I
> am
> >>> wondering if I know enough to do it.
> >>>
> >>>
> >>> Thanks
> >>> Nawab
> >>>
> >>>
> >>> On Mon, Sep 25, 2017 at 9:23 PM, Erick Erickson <
> erickerick...@gmail.com>
> >>> wrote:
> >>>
> >>>> In a word "no". Basically whenever a committer feels like there are
> >>>> enough changes to warrant spinning a new version, they volunteer.
> >>>> Nobody has stepped up to do that yet, although I expect it to be in
> >>>> the next 2-3 months, but that's only a guess.
> >>>>
> >>>> Best,
> >>>> Erick
> >>>>
> >>>> On Mon, Sep 25, 2017 at 5:21 PM, Nawab Zada Asad Iqbal <
> khi...@gmail.com>
> >>>> wrote:
> >>>>> Hi,
> >>>>>
> >>>>> How are the release dates decided for new versions, are they known in
> >>>>> advance?
> >>>>>
> >>>>> Thanks
> >>>>> Nawab
> >>>>
> >>
>


Re: When will be solr 7.1 released?

2017-09-26 Thread Nawab Zada Asad Iqbal
Where can I learn more about this process? I am not a committer but I am
wondering if I know enough to do it.


Thanks
Nawab


On Mon, Sep 25, 2017 at 9:23 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> In a word "no". Basically whenever a committer feels like there are
> enough changes to warrant spinning a new version, they volunteer.
> Nobody has stepped up to do that yet, although I expect it to be in
> the next 2-3 months, but that's only a guess.
>
> Best,
> Erick
>
> On Mon, Sep 25, 2017 at 5:21 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
> wrote:
> > Hi,
> >
> > How are the release dates decided for new versions, are they known in
> > advance?
> >
> > Thanks
> > Nawab
>


When will be solr 7.1 released?

2017-09-25 Thread Nawab Zada Asad Iqbal
Hi,

How are the release dates decided for new versions, are they known in
advance?

Thanks
Nawab


Re: solr 6.6.1: Lock held by this virtual machine

2017-08-26 Thread Nawab Zada Asad Iqbal
Hi Erick,

I spent some more time on this and found that if I modify 'core.properties'
to contain the following values (my core.properties file is otherwise empty
and is only being used for shard discovery), then the Solr server works fine.

loadOnStartup=false
transient=false

The fact is that shards are being loaded more than once at startup time.
There is one possible cause (which I couldn't confirm): if some ping request
or query arrives while the shard is loading and the transient cache hasn't
been initialized yet, will Solr try to load the core again? What if the
shard is already being loaded (due to loadOnStartup) but is not in the cache
yet? Can that cause the problem which I am seeing? My test machine constantly
gets ping traffic from an HAProxy (which I don't have control of), so I
cannot test this hypothesis.

However, on another machine with an identical setup (except without HAProxy
traffic), I was able to start and use this Solr version (6.6.1) without any
problem (indexing hundreds of GB and running queries for many hours).



Thanks
Nawab




On Fri, Aug 25, 2017 at 3:38 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
wrote:

> Thanks Erik
> I expected that but it is really not the case . I have only one core per
> solr installation . Though i run 3 solr processes on each host.
>
> If you see the thread names they are :coreLoadExecutor and qtp761960786-31
> . If it was the case of two core pointing to one index (though it does not
> look like one based on my verification), then I expect to see two threads
> of
> coreLoadExecutor trying to load the core twice.
> Does the thread name prefix give any hint ?
>
>
> Nawab
>
>
> On Fri, Aug 25, 2017 at 1:55 PM Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> In that case you probably have two different cores pointing to the
>> _same_ data directory. Examine your core.properties files and see if
>> any dataDir variables are set....
>>
>> Best,
>> Erick
>>
>> On Fri, Aug 25, 2017 at 1:12 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
>> wrote:
>> > Ok, after looking at the logs for some more time, i found that there are
>> > more than one threads trying to load the core at startup time. This
>> doesn't
>> > make sense to me, is it configurable? Is there any reason why this is
>> even
>> > an option?
>> >
>> >
>> > Aug 25, 2017 12:04:37 PM INFO  (main) [   ] o.e.j.s.Server
>> > jetty-9.3.14.v20161028
>> > Aug 25, 2017 12:04:38 PM INFO  (main) [   ] o.a.s.s.SolrDispatchFilter
>> > ___  _   Welcome to Apache Solr™ version 6.6.1-SNAPSHOT
>> > 1a390a91b5b658150478e6fc3c43381bedd3c6d3 - niqbal - 2017-08-09 10:31:27
>> > Aug 25, 2017 12:04:38 PM INFO  (main) [   ] o.a.s.s.SolrDispatchFilter /
>> > __| ___| |_ _   Starting in standalone mode on port 8984
>> > Aug 25, 2017 12:04:38 PM INFO  (main) [   ] o.a.s.s.SolrDispatchFilter
>> \__
>> > \/ _ \ | '_|  Install dir: /local/bin/solr6/latest_solr
>> > Aug 25, 2017 12:04:38 PM INFO  (main) [   ] o.a.s.s.SolrDispatchFilter
>> > |___/\___/_|_|Start time: 2017-08-25T20:04:38.231Z
>> > Aug 25, 2017 12:04:38 PM INFO  (main) [   ] o.a.s.s.StartupLoggingUtils
>> > Property solr.log.muteconsole given. Muting ConsoleAppender named
>> CONSOLE
>> > Aug 25, 2017 12:04:38 PM INFO  (main) [   ] o.a.s.c.SolrResourceLoader
>> > Using system property solr.solr.home: /local/etc/solr/shard1
>> > Aug 25, 2017 12:04:38 PM INFO  (main) [   ] o.a.s.c.SolrXmlConfig
>> Loading
>> > container configuration from /local/etc/solr/shard1/solr.xml
>> > Aug 25, 2017 12:04:38 PM INFO  (main) [   ] o.a.s.u.UpdateShardHandler
>> > Creating UpdateShardHandler HTTP client with params:
>> > socketTimeout=60=6=true
>> > Aug 25, 2017 12:04:38 PM INFO  (main) [   ]
>> o.a.s.c.CorePropertiesLocator
>> > Found 1 core definitions underneath /local/etc/solr/shard1
>> > Aug 25, 2017 12:04:38 PM INFO  (main) [   ]
>> o.a.s.c.CorePropertiesLocator
>> > Cores are: [mysearch]
>> > Aug 25, 2017 12:04:38 PM INFO  (coreLoadExecutor-6-thread-1) [
>> > x:mysearch] o.a.s.u.SolrIndexConfig IndexWriter infoStream solr logging
>> is
>> > enabled
>> > Aug 25, 2017 12:04:38 PM INFO  (coreLoadExecutor-6-thread-1) [
>> > x:mysearch] o.a.s.c.SolrConfig Using Lucene MatchVersion: 6.5.2
>> > Aug 25, 2017 12:04:38 PM INFO  (qtp761960786-29) [   ]
>> > o.a.s.c.TransientSolrCoreCacheDefault Allocating transient cache for
>> > 2147483647 <(214)%20748-3647> transient cores
>> > Aug 25, 2017 12:04:38 PM INFO  (main) [   ] o.e.j.s.Server Start

Re: solr 6.6.1: Lock held by this virtual machine

2017-08-25 Thread Nawab Zada Asad Iqbal
Thanks Erick,
I expected that, but it is really not the case. I have only one core per
Solr installation, though I run 3 Solr processes on each host.

If you look at the thread names, they are coreLoadExecutor and
qtp761960786-31. If this were the case of two cores pointing to one index
(though it does not look like one based on my verification), then I would
expect to see two coreLoadExecutor threads trying to load the core twice.
Does the thread name prefix give any hint?


Nawab


On Fri, Aug 25, 2017 at 1:55 PM Erick Erickson <erickerick...@gmail.com>
wrote:

> In that case you probably have two different cores pointing to the
> _same_ data directory. Examine your core.properties files and see if
> any dataDir variables are set
>
> Best,
> Erick
>
> On Fri, Aug 25, 2017 at 1:12 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
> wrote:
> > Ok, after looking at the logs for some more time, i found that there are
> > more than one threads trying to load the core at startup time. This
> doesn't
> > make sense to me, is it configurable? Is there any reason why this is
> even
> > an option?
> >
> >
> > Aug 25, 2017 12:04:37 PM INFO  (main) [   ] o.e.j.s.Server
> > jetty-9.3.14.v20161028
> > Aug 25, 2017 12:04:38 PM INFO  (main) [   ] o.a.s.s.SolrDispatchFilter
> > ___  _   Welcome to Apache Solr™ version 6.6.1-SNAPSHOT
> > 1a390a91b5b658150478e6fc3c43381bedd3c6d3 - niqbal - 2017-08-09 10:31:27
> > Aug 25, 2017 12:04:38 PM INFO  (main) [   ] o.a.s.s.SolrDispatchFilter /
> > __| ___| |_ _   Starting in standalone mode on port 8984
> > Aug 25, 2017 12:04:38 PM INFO  (main) [   ] o.a.s.s.SolrDispatchFilter
> \__
> > \/ _ \ | '_|  Install dir: /local/bin/solr6/latest_solr
> > Aug 25, 2017 12:04:38 PM INFO  (main) [   ] o.a.s.s.SolrDispatchFilter
> > |___/\___/_|_|Start time: 2017-08-25T20:04:38.231Z
> > Aug 25, 2017 12:04:38 PM INFO  (main) [   ] o.a.s.s.StartupLoggingUtils
> > Property solr.log.muteconsole given. Muting ConsoleAppender named CONSOLE
> > Aug 25, 2017 12:04:38 PM INFO  (main) [   ] o.a.s.c.SolrResourceLoader
> > Using system property solr.solr.home: /local/etc/solr/shard1
> > Aug 25, 2017 12:04:38 PM INFO  (main) [   ] o.a.s.c.SolrXmlConfig Loading
> > container configuration from /local/etc/solr/shard1/solr.xml
> > Aug 25, 2017 12:04:38 PM INFO  (main) [   ] o.a.s.u.UpdateShardHandler
> > Creating UpdateShardHandler HTTP client with params:
> > socketTimeout=60=6=true
> > Aug 25, 2017 12:04:38 PM INFO  (main) [   ] o.a.s.c.CorePropertiesLocator
> > Found 1 core definitions underneath /local/etc/solr/shard1
> > Aug 25, 2017 12:04:38 PM INFO  (main) [   ] o.a.s.c.CorePropertiesLocator
> > Cores are: [mysearch]
> > Aug 25, 2017 12:04:38 PM INFO  (coreLoadExecutor-6-thread-1) [
> > x:mysearch] o.a.s.u.SolrIndexConfig IndexWriter infoStream solr logging
> is
> > enabled
> > Aug 25, 2017 12:04:38 PM INFO  (coreLoadExecutor-6-thread-1) [
> > x:mysearch] o.a.s.c.SolrConfig Using Lucene MatchVersion: 6.5.2
> > Aug 25, 2017 12:04:38 PM INFO  (qtp761960786-29) [   ]
> > o.a.s.c.TransientSolrCoreCacheDefault Allocating transient cache for
> > 2147483647 transient cores
> > Aug 25, 2017 12:04:38 PM INFO  (main) [   ] o.e.j.s.Server Started
> @1225ms
> > Aug 25, 2017 12:04:38 PM INFO  (coreLoadExecutor-6-thread-1) [
> > x:mysearch] o.a.s.s.IndexSchema [mysearch] Schema name=local
> > Aug 25, 2017 12:04:38 PM WARN  (coreLoadExecutor-6-thread-1) [
> > x:mysearch] o.a.s.s.IndexSchema Field name_token is not multivalued and
> > destination for multiple copyFields (7)
> > Aug 25, 2017 12:04:38 PM WARN  (coreLoadExecutor-6-thread-1) [
> > x:mysearch] o.a.s.s.IndexSchema Field name_shingle is not multivalued and
> > destination for multiple copyFields (5)
> > Aug 25, 2017 12:04:38 PM WARN  (coreLoadExecutor-6-thread-1) [
> > x:mysearch] o.a.s.s.IndexSchema Field name_sort is not multivalued and
> > destination for multiple copyFields (7)
> > Aug 25, 2017 12:04:38 PM INFO  (coreLoadExecutor-6-thread-1) [
> > x:mysearch] o.a.s.s.IndexSchema Loaded schema local/1.5 with uniqueid
> field
> > id
> > Aug 25, 2017 12:04:39 PM INFO  (coreLoadExecutor-6-thread-1) [
> > x:mysearch] o.a.s.c.CoreContainer Creating SolrCore 'mysearch' using
> > configuration from instancedir /local/etc/solr/shard1/mysearch,
> trusted=true
> > Aug 25, 2017 12:04:39 PM INFO  (coreLoadExecutor-6-thread-1) [
> > x:mysearch] o.a.s.c.SolrCore solr.RecoveryStrategy.Builder
> > Aug 25, 2017 12:04:39 PM INFO  (coreLoadExecutor-6-thread-1) [
> > x:mysearch] o.a.s.c.SolrCore [[mysearch] ] Opening new SolrCore at
> > [/local/e

Re: solr 6.6.1: Lock held by this virtual machine

2017-08-25 Thread Nawab Zada Asad Iqbal
:04:39 PM INFO  (qtp761960786-31) [   x:mysearch]
o.a.s.c.SolrCore [[mysearch] ] Opening new SolrCore at
[/local/etc/solr/shard1/mysearch],
dataDir=[/local/etc/solr/shard1/mysearch/data/]
Aug 25, 2017 12:04:39 PM INFO  (qtp761960786-31) [   x:mysearch]
o.a.s.r.XSLTResponseWriter xsltCacheLifetimeSeconds=5
Aug 25, 2017 12:04:39 PM INFO  (coreLoadExecutor-6-thread-1) [
x:mysearch] o.a.s.u.CommitTracker Hard AutoCommit: disabled
Aug 25, 2017 12:04:39 PM INFO  (qtp761960786-31) [   x:mysearch]
o.a.s.u.CommitTracker Hard AutoCommit: disabled
Aug 25, 2017 12:04:39 PM INFO  (coreLoadExecutor-6-thread-1) [
x:mysearch] o.a.s.u.CommitTracker Soft AutoCommit: disabled
Aug 25, 2017 12:04:39 PM INFO  (qtp761960786-31) [   x:mysearch]
o.a.s.u.CommitTracker Soft AutoCommit: disabled
Aug 25, 2017 12:04:39 PM INFO  (qtp761960786-31) [   x:mysearch]
o.a.s.c.SolrCore [mysearch]  CLOSING SolrCore
org.apache.solr.core.SolrCore@3f24b393
Aug 25, 2017 12:04:39 PM INFO  (qtp761960786-31) [   x:mysearch]
o.a.s.m.SolrMetricManager Closing metric reporters for: solr.core.mysearch
Aug 25, 2017 12:04:39 PM ERROR (qtp761960786-31) [   x:mysearch]
o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException: Unable to
create core [mysearch]
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:935)
at
org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:1331)
at org.apache.solr.servlet.HttpSolrCall.init(HttpSolrCall.java:268)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:483)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)


On Fri, Aug 25, 2017 at 10:47 AM, Nawab Zada Asad Iqbal <khi...@gmail.com>
wrote:

> Hi,
>
> I am getting this error. i have deleted the file and restarted the server,
> but this error doesn't go away.
>
> What should I do to fix it?
>
>
>
> Caused by: org.apache.solr.common.SolrException: Error opening new
> searcher
> at org.apache.solr.core.SolrCore.(SolrCore.java:977)
> at org.apache.solr.core.SolrCore.(SolrCore.java:830)
> at org.apache.solr.core.CoreContainer.create(
> CoreContainer.java:920)
> ... 32 more
> Caused by: org.apache.solr.common.SolrException: Error opening new
> searcher
> at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:
> 2069)
> at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:2189)
> at org.apache.solr.core.SolrCore.initSearcher(SolrCore.java:1071)
> at org.apache.solr.core.SolrCore.(SolrCore.java:949)
> ... 34 more
> Caused by: org.apache.lucene.store.LockObtainFailedException: Lock held
> by this virtual machine: /local/var/solr/shard1/
> mysearch/data/index/write.lock
> at org.apache.lucene.store.NativeFSLockFactory.obtainFSLock(
> NativeFSLockFactory.java:127)
> at org.apache.lucene.store.FSLockFactory.obtainLock(
> FSLockFactory.java:41)
> at org.apache.lucene.store.BaseDirectory.obtainLock(
> BaseDirectory.java:45)
> at org.apache.lucene.index.IndexWriter.(
> IndexWriter.java:800)
> at org.apache.solr.update.SolrIndexWriter.(
> SolrIndexWriter.java:118)
> at org.apache.solr.update.SolrIndexWriter.create(
> SolrIndexWriter.java:93)
> at org.apache.solr.update.DefaultSolrCoreState.
> createMainIndexWriter(DefaultSolrCoreState.java:248)
> at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(
> DefaultSolrCoreState.java:122)
> at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:
> 2030)
> ... 37 more
>
>
>
> Thanks
> Nawab
>


solr 6.6.1: Lock held by this virtual machine

2017-08-25 Thread Nawab Zada Asad Iqbal
Hi,

I am getting this error. I have deleted the write.lock file and restarted
the server, but the error doesn't go away.

What should I do to fix it?



Caused by: org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.(SolrCore.java:977)
at org.apache.solr.core.SolrCore.(SolrCore.java:830)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:920)
... 32 more
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2069)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:2189)
at org.apache.solr.core.SolrCore.initSearcher(SolrCore.java:1071)
at org.apache.solr.core.SolrCore.(SolrCore.java:949)
... 34 more
Caused by: org.apache.lucene.store.LockObtainFailedException: Lock held by
this virtual machine: /local/var/solr/shard1/mysearch/data/index/write.lock
at
org.apache.lucene.store.NativeFSLockFactory.obtainFSLock(NativeFSLockFactory.java:127)
at
org.apache.lucene.store.FSLockFactory.obtainLock(FSLockFactory.java:41)
at
org.apache.lucene.store.BaseDirectory.obtainLock(BaseDirectory.java:45)
at org.apache.lucene.index.IndexWriter.(IndexWriter.java:800)
at
org.apache.solr.update.SolrIndexWriter.(SolrIndexWriter.java:118)
at
org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:93)
at
org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:248)
at
org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:122)
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2030)
... 37 more



Thanks
Nawab


Re: Request Highlighting only for the final set of rows

2017-08-18 Thread Nawab Zada Asad Iqbal
Actually, part of me is thinking that there are valid use cases for having
fl and hl.fl with different values: e.g., receive fields like name in
"clean" form via fl, and receive both name and address in highlighted
(HTML-formatted) form by listing them in hl.fl.
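
A rough SolrJ sketch of what I mean (core and field names are just
placeholders from my own setup, not a recommendation):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class HighlightVsFlExample {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client = new HttpSolrClient.Builder(
        "http://localhost:8984/solr/filesearch").build();

    SolrQuery q = new SolrQuery("nawab");
    q.setRows(200);
    // "Clean" copies of the fields come back via fl ...
    q.setFields("id", "score", "name");
    // ... while hl.fl asks separately for the tagged highlight fragments.
    q.setHighlight(true);
    q.set("hl.fl", "name,address");

    QueryResponse rsp = client.query(q);
    System.out.println(rsp.getResults().getNumFound());
    System.out.println(rsp.getHighlighting());

    client.close();
  }
}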


On Fri, Aug 18, 2017 at 10:57 AM, Nawab Zada Asad Iqbal <khi...@gmail.com>
wrote:

> Actually, i realize that it is an incorrect use on my part to pass only
> id+score in fl and specify more fields in the hl.fl fields. This was
> somehow supported in older versions but the new behavior is actually a
> performance improvement for the scenario when user is asking for only ids.
>
>
> Nawab
>
> On Fri, Aug 18, 2017 at 8:33 AM, Nawab Zada Asad Iqbal <khi...@gmail.com>
> wrote:
>
>> Thanks Erick for the pointing to better option. I will explore that.
>> After your email, I found that if i have specified 'fl=*' in the query then
>> it is doing the right thing (a 2 pass process). However, my queries had
>> 'fl=id+score' (or sometimes fl=id=score), in both of these cases I found
>> that the shards are asked for highlighting all the results on the first
>> request (and there is no second request).
>>
>> The fl=* query is (in my sample case) finishing in 100 msec while same
>> query with fl=id+score finishes in 1200 msec.
>>
>> Here are the two queries;
>>
>> http://solrdev.test.net:8984/solr/filesearch/select?=on
>> l=*=200=200=nawab=solrdev.test.net:8984/
>> solr/filesearch,solrdev.test.net:8985/solr/filesearch,solrd
>> ev.test.net:8986/solr/filesearch=json
>>
>>
>> http://solrdev.test.net:8984/solr/filesearch/select?=on
>> l=id=score=200=200=nawab=solrdev.test
>> .net:8984/solr/filesearch,solrdev.test.net:8985/solr/filesea
>> rch,solrdev.test.net:8986/solr/filesearch=json
>>
>>
>> Thanks
>> Nawab
>>
>>
>>
>>
>> On Fri, Aug 18, 2017 at 7:23 AM, Erick Erickson <erickerick...@gmail.com>
>> wrote:
>>
>>> I don't think you're reading it correctly. First of all, if you're
>>> going to do be doing deep paging you should be using cusorMark, see:
>>> https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results.
>>>
>>> Second, it's a two-pass process if you don't use cursormark. The first
>>> pass gets the candidate docs from each shard. But all it returns is
>>> the ID and sort criteria. Then the aggregator node gets the _true_ top
>>> N after sorting all the lists from each shard and issues a second
>>> request for _only_ those docs that have made the top N from each sub
>>> shard, and those should be the only ones highlighted.
>>>
>>> Do you have any evidence to the contrary that they're all being
>>> highlighted? Or are you misinterpreting the log message for the first
>>> pass?
>>>
>>> Best,
>>> Erick
>>>
>>> On Thu, Aug 17, 2017 at 5:43 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
>>> wrote:
>>> > Hi,
>>> >
>>> > In a multi-node solr installation (without SolrCloud), during a paging
>>> > scenario (e.g., start=1000, rows=200), the primary node asks for 1200
>>> rows
>>> > from each shard. If highlighting is ON, then the primary node is
>>> asking for
>>> > highlighting all the 1200 results from each shard, which doesn't scale
>>> > well. Is there a way to break the shard query in two steps e.g. ask
>>> for the
>>> > 1200 rows and after sorting the 1200 responses from each shard and
>>> finding
>>> > final rows to return (1001 to 1200) , issue another query to shards for
>>> > asking highlighted response for the relevant docs?
>>> >
>>> >
>>> >
>>> > Thanks
>>> > Nawab
>>>
>>
>>
>


Re: Request Highlighting only for the final set of rows

2017-08-18 Thread Nawab Zada Asad Iqbal
Actually, I realize that it is incorrect use on my part to pass only
id+score in fl and specify more fields in hl.fl. This was somehow supported
in older versions, but the new behavior is actually a performance improvement
for the scenario where the user is asking for only ids.


Nawab

On Fri, Aug 18, 2017 at 8:33 AM, Nawab Zada Asad Iqbal <khi...@gmail.com>
wrote:

> Thanks Erick for the pointing to better option. I will explore that. After
> your email, I found that if i have specified 'fl=*' in the query then it is
> doing the right thing (a 2 pass process). However, my queries had
> 'fl=id+score' (or sometimes fl=id=score), in both of these cases I found
> that the shards are asked for highlighting all the results on the first
> request (and there is no second request).
>
> The fl=* query is (in my sample case) finishing in 100 msec while same
> query with fl=id+score finishes in 1200 msec.
>
> Here are the two queries;
>
> http://solrdev.test.net:8984/solr/filesearch/select?=on;
> fl=*=200=200=nawab=solrdev.test.net:
> 8984/solr/filesearch,solrdev.test.net:8985/solr/filesearch,
> solrdev.test.net:8986/solr/filesearch=json
>
>
> http://solrdev.test.net:8984/solr/filesearch/select?=on;
> fl=id=score=200=200=nawab=solrdev.
> test.net:8984/solr/filesearch,solrdev.test.net:8985/solr/
> filesearch,solrdev.test.net:8986/solr/filesearch=json
>
>
> Thanks
> Nawab
>
>
>
>
> On Fri, Aug 18, 2017 at 7:23 AM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> I don't think you're reading it correctly. First of all, if you're
>> going to do be doing deep paging you should be using cusorMark, see:
>> https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results.
>>
>> Second, it's a two-pass process if you don't use cursormark. The first
>> pass gets the candidate docs from each shard. But all it returns is
>> the ID and sort criteria. Then the aggregator node gets the _true_ top
>> N after sorting all the lists from each shard and issues a second
>> request for _only_ those docs that have made the top N from each sub
>> shard, and those should be the only ones highlighted.
>>
>> Do you have any evidence to the contrary that they're all being
>> highlighted? Or are you misinterpreting the log message for the first
>> pass?
>>
>> Best,
>> Erick
>>
>> On Thu, Aug 17, 2017 at 5:43 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
>> wrote:
>> > Hi,
>> >
>> > In a multi-node solr installation (without SolrCloud), during a paging
>> > scenario (e.g., start=1000, rows=200), the primary node asks for 1200
>> rows
>> > from each shard. If highlighting is ON, then the primary node is asking
>> for
>> > highlighting all the 1200 results from each shard, which doesn't scale
>> > well. Is there a way to break the shard query in two steps e.g. ask for
>> the
>> > 1200 rows and after sorting the 1200 responses from each shard and
>> finding
>> > final rows to return (1001 to 1200) , issue another query to shards for
>> > asking highlighted response for the relevant docs?
>> >
>> >
>> >
>> > Thanks
>> > Nawab
>>
>
>


Re: Request Highlighting only for the final set of rows

2017-08-18 Thread Nawab Zada Asad Iqbal
Thanks Erick for pointing to the better option. I will explore that. After
your email, I found that if I specify 'fl=*' in the query then it is doing
the right thing (a two-pass process). However, my queries had 'fl=id+score'
(or sometimes fl=id=score); in both of these cases I found that the shards
are asked to highlight all the results on the first request (and there is no
second request).
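
For the record, here is a rough SolrJ sketch of the cursorMark approach
Erick pointed to, instead of large start values (assuming the uniqueKey
field is 'id'; the sort has to end on it):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.CursorMarkParams;

public class CursorPaging {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client = new HttpSolrClient.Builder(
        "http://localhost:8984/solr/filesearch").build();

    SolrQuery q = new SolrQuery("nawab");
    q.setRows(200);
    // cursorMark requires a sort that ends on the uniqueKey field.
    q.setSort("score", SolrQuery.ORDER.desc);
    q.addSort("id", SolrQuery.ORDER.asc);

    String cursor = CursorMarkParams.CURSOR_MARK_START;  // "*"
    while (true) {
      q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
      QueryResponse rsp = client.query(q);
      // process rsp.getResults() here ...
      String next = rsp.getNextCursorMark();
      if (cursor.equals(next)) {
        break;  // no more pages
      }
      cursor = next;
    }
    client.close();
  }
}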

The fl=* query is (in my sample case) finishing in 100 msec while same
query with fl=id+score finishes in 1200 msec.

Here are the two queries;

http://solrdev.test.net:8984/solr/filesearch/select?=on=*=200=200=nawab=solrdev.test.net:8984/solr/filesearch,solrdev.test.net:8985/solr/filesearch,solrdev.test.net:8986/solr/filesearch=json


http://solrdev.test.net:8984/solr/filesearch/select?=on=id=score=200=200=nawab=solrdev.test.net:8984/solr/filesearch,solrdev.test.net:8985/solr/filesearch,solrdev.test.net:8986/solr/filesearch=json


Thanks
Nawab




On Fri, Aug 18, 2017 at 7:23 AM, Erick Erickson <erickerick...@gmail.com>
wrote:

> I don't think you're reading it correctly. First of all, if you're
> going to do be doing deep paging you should be using cusorMark, see:
> https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results.
>
> Second, it's a two-pass process if you don't use cursormark. The first
> pass gets the candidate docs from each shard. But all it returns is
> the ID and sort criteria. Then the aggregator node gets the _true_ top
> N after sorting all the lists from each shard and issues a second
> request for _only_ those docs that have made the top N from each sub
> shard, and those should be the only ones highlighted.
>
> Do you have any evidence to the contrary that they're all being
> highlighted? Or are you misinterpreting the log message for the first
> pass?
>
> Best,
> Erick
>
> On Thu, Aug 17, 2017 at 5:43 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
> wrote:
> > Hi,
> >
> > In a multi-node solr installation (without SolrCloud), during a paging
> > scenario (e.g., start=1000, rows=200), the primary node asks for 1200
> rows
> > from each shard. If highlighting is ON, then the primary node is asking
> for
> > highlighting all the 1200 results from each shard, which doesn't scale
> > well. Is there a way to break the shard query in two steps e.g. ask for
> the
> > 1200 rows and after sorting the 1200 responses from each shard and
> finding
> > final rows to return (1001 to 1200) , issue another query to shards for
> > asking highlighted response for the relevant docs?
> >
> >
> >
> > Thanks
> > Nawab
>


Request Highlighting only for the final set of rows

2017-08-17 Thread Nawab Zada Asad Iqbal
Hi,

In a multi-node Solr installation (without SolrCloud), during a paging
scenario (e.g., start=1000, rows=200), the primary node asks for 1200 rows
from each shard. If highlighting is ON, the primary node asks for
highlighting of all 1200 results from each shard, which doesn't scale well.
Is there a way to break the shard query into two steps, e.g., ask for the
1200 rows, and after sorting the responses from each shard and finding the
final rows to return (1001 to 1200), issue another query to the shards
asking for the highlighted response only for the relevant docs?



Thanks
Nawab


Re: solr asks more and more rows from shards

2017-08-17 Thread Nawab Zada Asad Iqbal
So, it turned out that I was not paying attention to the start parameter. I
found that the 'primary' node asks for a very large number of rows from the
shard nodes when start is a large value.

On Thu, Aug 17, 2017 at 11:28 AM, Nawab Zada Asad Iqbal <khi...@gmail.com>
wrote:

> Hi solr community
>
> I am having performance issues after solr6 upgrade. I have multiple nodes
> in the cluster and direct queries to one of them with `shards=[list of
> hosts]` which takes care of submitting queries to all the shards and
> aggregating the results. All the original queries have rows=200.
>
> I have found that many logs with `distrib=false` have large rows value.
> e.g., values like: 1200, 2200, 3200, 4200, ... .
>
> What is triggering it?  What am I doing wrong to cause this behavior.
>
>
> Thanks in advance
> Nawab
>


Re: Unable to write response, client closed connection or we are shutting down

2017-08-17 Thread Nawab Zada Asad Iqbal
Hi Shawn;

Double thanks for answering my whole thread.

Regarding the page fault thing, that seems to be a concern because this
setup is identical for both Solr 4 and Solr 6, although I cannot find a good
way to debug it yet.

I found some strange behavior today: my primary Solr node (which handles
queries with the 'shards' parameter) is asking for a very large number of
rows from the shard nodes. (I sent this in a different email so that I don't
jumble together different questions in the same thread.)


Thanks
Nawab


On Thu, Aug 17, 2017 at 11:32 AM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 8/12/2017 11:48 AM, Nawab Zada Asad Iqbal wrote:
> > I am executing a query performance test against my solr 6.6 setup and I
> > noticed following exception every now and then. What do I need to do?
> >
> > Aug 11, 2017 08:40:07 AM INFO  (qtp761960786-250) [   x:filesearch]
> > o.a.s.s.HttpSolrCall Unable to write response, client closed connection
> or
> > we are shutting down
> > org.eclipse.jetty.io.EofException
>
> 
>
> > Caused by: java.io.IOException: Broken pipe
>
> 
>
> > Apart from that, I also noticed that the query response time is longer
> than
> > I expected, while the memory utilization stays <= 35%. I thought that
> > somewhere I have set maxThreads (Jetty) to a very low number, however I
> am
> > falling back on default which is 1 (so that shouldn't be a problem).
>
> The EofException and "broken pipe" messages are typical when the client
> closes the TCP connection before Solr finishes processing the request
> and sends a response.  When Solr finally finishes working and has a
> response, the web container where Solr is running tries to send the
> response back, but finds that the connection is gone, and logs the kind
> of exception you are seeing.
>
> Very likely what has happened is that the program sending the queries
> has a very low socket timeout (or total request timeout) configured on
> the http connection, and that the requests are taking longer than that
> timeout to execute, so the query software closes the connection.
>
> Later in the thread you mentioned maxConnections.  Some software might
> decide to kill existing connections when that limit is exceeded, so more
> connections can be opened.  That's something you'd need to discuss with
> whoever wrote the software.
>
> Also later in the thread you mentioned "page faults" ... without a lot
> of specific detail, we're not going to have any idea what you mean by
> that.  I can tell you that if you're looking at operating system memory
> counters, page faults are a completely normal part of OS operation.  By
> itself, that number won't mean anything.
>
> Long query times can be caused by many things.  One of the most common
> is not having enough memory left over for the operating system to
> effectively cache your index ... but this is not the only thing that can
> cause problems.
>
> Thanks,
> Shawn
>
>


solr asks more and more rows from shards

2017-08-17 Thread Nawab Zada Asad Iqbal
Hi solr community

I am having performance issues after the Solr 6 upgrade. I have multiple
nodes in the cluster and direct queries to one of them with `shards=[list of
hosts]`, which takes care of submitting queries to all the shards and
aggregating the results. All the original queries have rows=200.

I have found that many log entries with `distrib=false` have a large rows
value, e.g., 1200, 2200, 3200, 4200, ... .

What is triggering this? What am I doing wrong to cause this behavior?


Thanks in advance
Nawab


Re: Solr query help

2017-08-17 Thread Nawab Zada Asad Iqbal
Hi Krishna

I haven't used date range queries myself, but if Solr only supports a
particular date format, you can write a thin client for queries which
converts the user's date to Solr's format and then queries Solr.
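
A minimal Java sketch of such a thin client (the field name 'name' and the
yyyy/MM/dd input format are taken from your example; everything else is an
assumption):

import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class DateRangeQueryBuilder {
  private static final DateTimeFormatter IN =
      DateTimeFormatter.ofPattern("yyyy/MM/dd");

  // Convert "2017/03/15" into Solr's ISO form "2017-03-15T00:00:00Z".
  static String toSolrDate(String raw) {
    LocalDate d = LocalDate.parse(raw, IN);
    return d + "T00:00:00Z";
  }

  // Build the fq string from the user's yyyy/MM/dd input.
  static String rangeFq(String field, String from, String to) {
    return field + ":[" + toSolrDate(from) + " TO " + toSolrDate(to) + "]";
  }

  public static void main(String[] args) {
    // Prints: name:[2017-03-15T00:00:00Z TO 2017-05-15T00:00:00Z]
    System.out.println(rangeFq("name", "2017/03/15", "2017/05/15"));
  }
}

The returned string can then be sent to Solr as the fq parameter.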

Nawab

On Thu, Aug 17, 2017 at 7:36 AM, chiru s  wrote:

> Hello guys
>
> I am working on Apache solr and I am stuck with a use case.
>
>
> The input data will be in the documents like 2017/03/15 in 1st document,
>
> 2017/04/15 in 2nd doc,
>
> 2017/05/15 in 3rd doc,
>
> 2017/06/15 in 4th doc so on
>
> But while fetching the data it should fetch like 03/15/2017 for the first
> doc and so on.
>
> My requirement is like this ..
>
>
> The data is like above and when I do an fq with name:[2017/03/15 TO
> 2017/05/15] it fetches me the 1st three documents.. but the need the data
> as 03/15/2017 instead of 2017/03/15.
>
>
> I tried solr.pattetnReplaceCharFilterFactory but it doesn't seem working..
>
> Can you please help on the above.
>
>
> Thanks in advance
>
>
> Krishna...
>


Re: Unable to write response, client closed connection or we are shutting down

2017-08-16 Thread Nawab Zada Asad Iqbal
So, I tried a few things and it seems like there are more page faults after
the Solr 6 upgrade. Even when there is no update or query activity (except
the periodic commit), the page faults are a little higher than they used to
be.


Any suggestions in this area?

Thanks
Nawab

On Tue, Aug 15, 2017 at 4:09 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
wrote:

> Hi Rick
>
> My software is not very sophisticated. I have picked some queries from
> production logs, which I am replaying against this solr installation. It is
> not a SolrCloud but i specify "shards="  in the query to gather results
> from all shards.
>
> I found some values to tweak e.g.
> 1500
> 15
>
> After doing this, the "Unable to write response, client closed connection
> or we are shutting down" error is mostly gone.
>
> However, the query perf is bad. The server is not using all the assigned
> memory. However CPU usage is reaching 80%.
> The query response time is 50+ times worse (e.g., 1400 msec vs 20 msec for
> 75th percentile) .
> What can I do to use more memory and hopefully alleviate some of this bad
> performance?
>
> My cache settings are identical to older setup.
>
>
> Thanks
> Nawab
>
>
>
>
>
> On Mon, Aug 14, 2017 at 9:01 AM, Rick Leir <rl...@leirtech.com> wrote:
>
>> Nawab
>> What test software do you use? What else is happening when the exception
>> occurs?
>> Cheers -- Rick
>>
>> On August 12, 2017 1:48:19 PM EDT, Nawab Zada Asad Iqbal <
>> khi...@gmail.com> wrote:
>> >Hi,
>> >
>> >I am executing a query performance test against my solr 6.6 setup and I
>> >noticed following exception every now and then. What do I need to do?
>> >
>> >Aug 11, 2017 08:40:07 AM INFO  (qtp761960786-250) [   x:filesearch]
>> >o.a.s.s.HttpSolrCall Unable to write response, client closed connection
>> >or
>> >we are shutting down
>> >org.eclipse.jetty.io.EofException
>> >at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:199)
>> >at org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:420)
>> >at org.eclipse.jetty.io.WriteFlusher.completeWrite(
>> >WriteFlusher.java:375)
>> >at org.eclipse.jetty.io.SelectChannelEndPoint$3.run(
>> >SelectChannelEndPoint.java:107)
>> >at org.eclipse.jetty.io.SelectChannelEndPoint.onSelected(
>> >SelectChannelEndPoint.java:193)
>> >at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.
>> >processSelected(ManagedSelector.java:283)
>> >at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.produce(
>> >ManagedSelector.java:181)
>> >at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.
>> >executeProduceConsume(ExecuteProduceConsume.java:249)
>> >at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.
>> >produceConsume(ExecuteProduceConsume.java:148)
>> >   at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(
>> >ExecuteProduceConsume.java:136)
>> >at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(
>> >QueuedThreadPool.java:671)
>> >at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(
>> >QueuedThreadPool.java:589)
>> >at java.lang.Thread.run(Thread.java:748)
>> >Caused by: java.io.IOException: Broken pipe
>> >at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>> >at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>> >at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>> >at sun.nio.ch.IOUtil.write(IOUtil.java:51)
>> >at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
>> >at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:177)
>> >
>> >
>> >
>> >Apart from that, I also noticed that the query response time is longer
>> >than
>> >I expected, while the memory utilization stays <= 35%. I thought that
>> >somewhere I have set maxThreads (Jetty) to a very low number, however I
>> >am
>> >falling back on default which is 1 (so that shouldn't be a
>> >problem).
>> >
>> >
>> >Thanks
>> >Nawab
>>
>> --
>> Sorry for being brief. Alternate email is rickleir at yahoo dot com
>
>
>


Breaking down QTime for debugging performance

2017-08-15 Thread Nawab Zada Asad Iqbal
Hi all

For a given Solr host and shard, is there any way to get a breakdown of
QTime to see where the time is being spent?


Thanks
Nawab


Re: Unable to write response, client closed connection or we are shutting down

2017-08-15 Thread Nawab Zada Asad Iqbal
Hi Rick

My software is not very sophisticated. I have picked some queries from
production logs, which I am replaying against this Solr installation. It is
not SolrCloud, but I specify "shards=" in the query to gather results from
all shards.

I found some values to tweak e.g.
1500
15

After doing this, the "Unable to write response, client closed connection
or we are shutting down" error is mostly gone.

However, the query performance is bad. The server is not using all the
assigned memory, yet CPU usage is reaching 80%.
The query response time is 50+ times worse (e.g., 1400 msec vs 20 msec at
the 75th percentile).
What can I do to use more memory and hopefully alleviate some of this bad
performance?

My cache settings are identical to the older setup.


Thanks
Nawab





On Mon, Aug 14, 2017 at 9:01 AM, Rick Leir <rl...@leirtech.com> wrote:

> Nawab
> What test software do you use? What else is happening when the exception
> occurs?
> Cheers -- Rick
>
> On August 12, 2017 1:48:19 PM EDT, Nawab Zada Asad Iqbal <khi...@gmail.com>
> wrote:
> >Hi,
> >
> >I am executing a query performance test against my solr 6.6 setup and I
> >noticed following exception every now and then. What do I need to do?
> >
> >Aug 11, 2017 08:40:07 AM INFO  (qtp761960786-250) [   x:filesearch]
> >o.a.s.s.HttpSolrCall Unable to write response, client closed connection
> >or
> >we are shutting down
> >org.eclipse.jetty.io.EofException
> >at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:199)
> >at org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:420)
> >at org.eclipse.jetty.io.WriteFlusher.completeWrite(
> >WriteFlusher.java:375)
> >at org.eclipse.jetty.io.SelectChannelEndPoint$3.run(
> >SelectChannelEndPoint.java:107)
> >at org.eclipse.jetty.io.SelectChannelEndPoint.onSelected(
> >SelectChannelEndPoint.java:193)
> >at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.
> >processSelected(ManagedSelector.java:283)
> >at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.produce(
> >ManagedSelector.java:181)
> >at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.
> >executeProduceConsume(ExecuteProduceConsume.java:249)
> >at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.
> >produceConsume(ExecuteProduceConsume.java:148)
> >   at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(
> >ExecuteProduceConsume.java:136)
> >at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(
> >QueuedThreadPool.java:671)
> >at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(
> >QueuedThreadPool.java:589)
> >at java.lang.Thread.run(Thread.java:748)
> >Caused by: java.io.IOException: Broken pipe
> >at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
> >at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
> >at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
> >at sun.nio.ch.IOUtil.write(IOUtil.java:51)
> >at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
> >at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:177)
> >
> >
> >
> >Apart from that, I also noticed that the query response time is longer
> >than
> >I expected, while the memory utilization stays <= 35%. I thought that
> >somewhere I have set maxThreads (Jetty) to a very low number, however I
> >am
> >falling back on default which is 1 (so that shouldn't be a
> >problem).
> >
> >
> >Thanks
> >Nawab
>
> --
> Sorry for being brief. Alternate email is rickleir at yahoo dot com


Unable to write response, client closed connection or we are shutting down

2017-08-12 Thread Nawab Zada Asad Iqbal
Hi,

I am executing a query performance test against my Solr 6.6 setup and I
noticed the following exception every now and then. What do I need to do?

Aug 11, 2017 08:40:07 AM INFO  (qtp761960786-250) [   x:filesearch]
o.a.s.s.HttpSolrCall Unable to write response, client closed connection or
we are shutting down
org.eclipse.jetty.io.EofException
at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:199)
at org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:420)
at org.eclipse.jetty.io.WriteFlusher.completeWrite(
WriteFlusher.java:375)
at org.eclipse.jetty.io.SelectChannelEndPoint$3.run(
SelectChannelEndPoint.java:107)
at org.eclipse.jetty.io.SelectChannelEndPoint.onSelected(
SelectChannelEndPoint.java:193)
at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.
processSelected(ManagedSelector.java:283)
at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.produce(
ManagedSelector.java:181)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.
executeProduceConsume(ExecuteProduceConsume.java:249)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.
produceConsume(ExecuteProduceConsume.java:148)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(
ExecuteProduceConsume.java:136)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(
QueuedThreadPool.java:671)
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(
QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:51)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:177)



Apart from that, I also noticed that the query response time is longer than
I expected, while the memory utilization stays <= 35%. I thought that
somewhere I have set maxThreads (Jetty) to a very low number, however I am
falling back on default which is 1 (so that shouldn't be a problem).


Thanks
Nawab
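
(For reference, I believe the request-handling thread pool is configured in
server/etc/jetty.xml of the stock install; from memory -- so treat this as
an approximation to verify against your own copy -- the relevant setting in
6.x looks something like:

  <Set name="maxThreads" type="int">
    <Property name="solr.jetty.threads.max" default="10000"/>
  </Set>

so unless I changed that file, the cap should be well above anything my
test generates, and the EofException is more likely the client side closing
the connection on a slow response.)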


which class is: o.a.s.c.S.Request

2017-08-10 Thread Nawab Zada Asad Iqbal
Hi

I see logs from this class 'o.a.s.c.S.Request', and I am able to tune this
logger by going to the logging webpage (Solr -> Request), but I cannot find
the full class name in the code. What should I put in the log properties
file to disable this log?


Thanks
Nawab
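
(For reference, my assumption is that the abbreviation expands to
org.apache.solr.core.SolrCore.Request -- SolrCore builds its request logger
from the class name plus ".Request" -- so with the stock log4j 1.2 setup a
line like the following in log4j.properties should quiet it; can someone
confirm?

  log4j.logger.org.apache.solr.core.SolrCore.Request=WARN
)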


Solr 6.6: Configure number of indexing threads

2017-08-06 Thread Nawab Zada Asad Iqbal
Hi

I have switched between the Solr and Lucene user lists while debugging this
issue (details in the following thread). My current hypothesis is that since
a large number of indexing threads are being created (the maxIndexingThreads
config is now obsolete), each output segment is really small. Reference:
https://issues.apache.org/jira/plugins/servlet/mobile#issue/LUCENE-6659


Is there any config in Solr 6.6 to control this?
If not, why was that config considered useless?

Thanks
Nawab
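
One workaround I am considering is throttling on the client side, since the
number of indexing threads now seems to follow the number of concurrent
update requests reaching Solr. A minimal SolrJ sketch of what I mean
(assuming SolrJ 6.x, my core name "filesearch", and the default port) -- in
case someone can confirm this is the right direction:

  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;
  import org.apache.solr.common.SolrInputDocument;

  public class ThrottledIndexer {
    public static void main(String[] args) throws Exception {
      // Queue size and thread count bound how many update requests
      // (and therefore indexing threads on the Solr side) run at once.
      SolrClient client = new ConcurrentUpdateSolrClient.Builder(
              "http://localhost:8983/solr/filesearch")
          .withQueueSize(100)   // buffered docs before the caller blocks
          .withThreadCount(8)   // concurrent update connections to Solr
          .build();

      for (int i = 0; i < 1000; i++) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-" + i);
        client.add(doc);        // queued; sent by the 8 runner threads
      }
      client.commit();
      client.close();
    }
  }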

-- Forwarded message -
From: Nawab Zada Asad Iqbal <khi...@gmail.com>
Date: Sun, Aug 6, 2017 at 8:25 AM
Subject: Re: Understanding flush and DocumentsWriterPerThread
To: <java-u...@lucene.apache.org>


I think I am hitting this problem. Since maxIndexingThreads is not used
anymore, I see 330+ indexing threads (in the attached log: "334 in-use
non-flushing threads states").

The bugfix recommends using custom code to control concurrency in
IndexWriter; how can I configure that using Solr 6.6?


On Sat, Aug 5, 2017 at 12:59 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
wrote:

> Hi,
>
> I am debugging a bulk indexing performance issue while upgrading to 6.6
> from 4.5.0. I have commits disabled while indexing a total of 85G of data
> over 7 hours. At the end of it, I want some 30 or so big segments, but I
> am getting 3000 segments.
> I deleted the index and enabled infostream logging ; i have attached the
> log when first segment is flushed. Here are few questions:
>
> 1. When a segment is flushed, is it permanent, or can more documents
> be written to it (besides the merge scenario)?
> 2. It seems that 330+ threads are writing in parallel. Will each one of
> them become one segment when written to disk? In that case, I should
> probably decrease concurrency?
> 3. One possibility is to delay flushing; the flush is getting triggered at
> 1MB, probably coming from 1 ;
> however, the segment which is flushed is only 115MB. Is this limit for the
> combined size of all in-memory segments? In that case, is it ok to
> increase it further to use more of my heap (48GB)?
> 4. How can I decrease the concurrency? Maybe the solution is to use fewer
> in-memory segments?
>
> In a previous run, there were 110k files in the index folder after I
> stopped indexing. Before committing, I noticed that the file count
> continued to decrease every few minutes, until it reached 27k or so. (I
> committed after it stabilized.)
>
>
> My Indexconfig is this:
>
>   
> 1000
> 1
> 10
> false
> 1
>class="org.apache.solr.index.TieredMergePolicyFactory">
>   5
>  3000
>   10
>   16
>   
>   20
>   1
> 
>   class="org.apache.lucene.index.ConcurrentMergeScheduler">
>10
>10
>  
> ${solr.lock.type:native}
> true
> 
>   1
>   0
> 
> true
> false
>   
>
>
> Thanks
> Nawab
>
>
>


Commit takes very long with NoSuchFileException

2017-08-03 Thread Nawab Zada Asad Iqbal
Hi,

I have a host with 3 solr processes running, each with one shard only;
there are no replicas. I am reindexing some 100 GB of data per solr (or per
shard since each solr has one shard).

After about 3 hours, I manually committed once. I was able to get through
40 GB in each shard, and the commit response came within 2 minutes.

After another 3 hours, I stopped the indexing client and manually committed
again. Two shards returned within a few minutes (total size now 85GB+).
However, the 3rd shard (or Solr process) was stuck for almost two hours
(until I stopped the server) and had started to throw the following
exception:
This shard also has a lot more files in the data/index folder. 68k vs 31k & in
the other two shards. I am not sure if that has an impact. FWIW, the file
descriptor limit on this host is 65k.
I am relying on the default concurrent merge scheduler (which probably opens
4 threads on each Solr server).

java.nio.file.NoSuchFileException:
/box/var/solr/shard1/filesearch/data/index/segments_2
at
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at
sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
at
sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
at
sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
at java.nio.file.Files.readAttributes(Files.java:1737)
at java.nio.file.Files.size(Files.java:2332)
at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:243)
at
org.apache.solr.handler.admin.LukeRequestHandler.getFileLength(LukeRequestHandler.java:615)
at
org.apache.solr.handler.admin.LukeRequestHandler.getIndexInfo(LukeRequestHandler.java:588)
at
org.apache.solr.handler.admin.LukeRequestHandler.handleRequestBody(LukeRequestHandler.java:138)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:748)



After stopping the server, I noticed this stacktrace:- (It seems that one
indexing request was still stuck in the system 

Re: logging support in Lucene code

2017-07-31 Thread Nawab Zada Asad Iqbal
Thanks Shawn for the detailed context.
I saw a Logger (java.util.logging) in one class in the Lucene folder, hence
I thought that logging is now properly supported. Since I am using Solr
(and indirectly Lucene), I will use whatever Solr is using.

Not depending on any concrete logger is good for Lucene, as it is included
in other projects too.


Regards
Nawab

On Fri, Jul 28, 2017 at 6:57 AM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 7/27/2017 10:57 AM, Nawab Zada Asad Iqbal wrote:
> > I see a lot of discussion on this topic from almost 10 years ago: e.g.,
> > https://issues.apache.org/jira/browse/LUCENE-1482
> >
> > For 4.5, I relied on 'System.out.println' for writing information for
> > debugging in production.
> >
> > In 6.6, I notice that some classes in Lucene are instantiating a Logger,
> > should I use Logger instead? I tried to log with it, but I don't see any
> > output in logs.
>
> You're asking about this on a Solr list, not a Lucene list.  I am not
> subscribed to the main Lucene user list, so I do not know if you have
> also sent this question to that list.
>
> Solr uses slf4j for logging.  Many of its dependencies have chosen other
> logging frameworks.
>
> https://www.slf4j.org/
>
> With slf4j, you can utilize just about any supported logging
> implementation to do the actual end logging.  The end implementation
> chosen by the Solr project for version 4.3 and later is log4j 1.x.
>
> It is my understanding that Lucene's core module has zero dependencies
> -- it's pure Java.  That would include any external logging
> implementation.  I do not know if the core module even uses
> java.util.logging ... a quick grep for "Logger" suggests that there are
> no loggers in use in the core module at all, but it's possible that I
> have not scanned for the correct text.  I did notice that
> TestIndexWriter uses a PrintStream for logging, and Shalin's reply has
> reminded me about the infoStream feature.
>
> Looking at the source code, it does appear that some of the other Lucene
> modules do use a logger. Some of them appear to use the logger built
> into java, others seem to use one of the third-party implementations
> like slf4j.  Some of the dependent jars pulled in for non-core Lucene
> modules depend on various logging implementations.
>
> Logging frameworks can be the center of a religious flamewar.  Opinions
> run strong.  IMHO, if you are writing your own code, the best option is
> slf4j, bound to whatever end logging implementation you are most
> comfortable using.  You can install slf4j jars to intercept logging sent
> to the other common logging implementations and direct those through
> slf4j so they end up in the same place as everything else.
>
> Note if you want to use log4j2 as your end logging destination with
> slf4j: log4j2 comes with jars implementing the slf4j classes, so you're
> probably going to want to use those.
>
> Thanks,
> Shawn
>
>
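
(For my own notes, a minimal slf4j usage sketch -- assuming only the slf4j
API on the classpath, which the Solr webapp already provides:

  import org.slf4j.Logger;
  import org.slf4j.LoggerFactory;

  public class MyCustomComponent {
    // Bound at runtime to whatever backend is on the classpath
    // (log4j 1.2 in a stock Solr 6.x install).
    private static final Logger log =
        LoggerFactory.getLogger(MyCustomComponent.class);

    public void doWork(String item) {
      // Parameterized messages avoid string building when the level is off.
      log.info("processing item {}", item);
    }
  }

Messages logged this way should land in solr.log next to Solr's own output.)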


Re: logging support in Lucene code

2017-07-31 Thread Nawab Zada Asad Iqbal
Thanks Shalin.
I actually had that config set to true, so it seems that I may not be
exercising the right scenario to hit that log line.

On Fri, Jul 28, 2017 at 1:47 AM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Lucene does not use a logger framework. But if you are using Solr then you
> can route the infoStream logging to Solr's log files by setting an option
> in the solrconfig.xml. See
> http://lucene.apache.org/solr/guide/6_6/indexconfig-in-solrconfig.html#
> IndexConfiginSolrConfig-OtherIndexingSettings
>
> On Fri, Jul 28, 2017 at 11:13 AM, Nawab Zada Asad Iqbal <khi...@gmail.com>
> wrote:
>
> > Any doughnut for me ?
> >
> >
> > Regards
> > Nawab
> >
> > On Thu, Jul 27, 2017 at 9:57 AM Nawab Zada Asad Iqbal <khi...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I see a lot of discussion on this topic from almost 10 years ago: e.g.,
> > > https://issues.apache.org/jira/browse/LUCENE-1482
> > >
> > > For 4.5, I relied on 'System.out.println' for writing information for
> > > debugging in production.
> > >
> > > In 6.6, I notice that some classes in Lucene are instantiating a
> Logger,
> > > should I use Logger instead? I tried to log with it, but I don't see
> any
> > > output in logs.
> > >
> > >
> > > Regards
> > > Nawab
> > >
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


master replication: what are 'master (searching)' and 'master (replicable)' fields

2017-07-31 Thread Nawab Zada Asad Iqbal
Hi,

From the Solr console, I see 'master (searching)' and 'master
(replicable)' fields on the `host/solr/#/core_1/replication` page, and I am
wondering how they affect me given that I don't have any replicas.

If I don't have any replicas, does enabling or disabling replication have
any impact on performance?

Also, when I am indexing new documents, I see the `master (searching)` size
growing even when I disable replication from this UI. What does that
imply?


Regards
Nawab


Re: logging support in Lucene code

2017-07-27 Thread Nawab Zada Asad Iqbal
Any doughnut for me ?


Regards
Nawab

On Thu, Jul 27, 2017 at 9:57 AM Nawab Zada Asad Iqbal <khi...@gmail.com>
wrote:

> Hi,
>
> I see a lot of discussion on this topic from almost 10 years ago: e.g.,
> https://issues.apache.org/jira/browse/LUCENE-1482
>
> For 4.5, I relied on 'System.out.println' for writing information for
> debugging in production.
>
> In 6.6, I notice that some classes in Lucene are instantiating a Logger,
> should I use Logger instead? I tried to log with it, but I don't see any
> output in logs.
>
>
> Regards
> Nawab
>


Re: Unable to create core [collection] Caused by: null

2017-07-27 Thread Nawab Zada Asad Iqbal
Lucas may be hitting this issue:
https://stackoverflow.com/questions/4659151/recurring-exception-without-a-stack-trace-how-to-reset


Could you try running your server with the JVM flag
-XX:-OmitStackTraceInFastThrow ?



Nawab
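
(If it helps, two ways to pass that flag, assuming the stock bin/solr
scripts are used to start the server:

  bin/solr start -a "-XX:-OmitStackTraceInFastThrow"

or, permanently, in solr.in.sh:

  SOLR_OPTS="$SOLR_OPTS -XX:-OmitStackTraceInFastThrow"

With fast-throw trace elision disabled, the underlying exception's stack
trace should show up in the log on the next occurrence, if that is what is
being hidden here.)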

On Wed, Jul 26, 2017 at 11:42 AM, Anshum Gupta  wrote:

> Hi Lucas,
>
> It would be super useful if you provided more information with the
> question. A few things you might want to include are - version of Solr, how
> did you start it, stack trace from the log etc.
>
>
> -Anshum
>
>
>
> > On Jul 25, 2017, at 4:21 PM, Lucas Pelegrino 
> wrote:
> >
> > Hey guys.
> >
> > Trying to make solr work here, but I'm getting this error from this
> command:
> >
> > $ ./solr create -c products -d /Users/lucaswxp/reduza-solr/
> products/conf/
> >
> > Error CREATEing SolrCore 'products': Unable to create core [products]
> > Caused by: null
> >
> > I'm posting my solrconf.xml, schema.xml and data-config.xml here:
> > https://pastebin.com/fnYK9pSJ
> >
> > The debug from log solr: https://pastebin.com/kVLMvBwZ
> >
> > Not sure what to do, the error isn't very descriptive.
>
>


logging support in Lucene code

2017-07-27 Thread Nawab Zada Asad Iqbal
Hi,

I see a lot of discussion on this topic from almost 10 years ago: e.g.,
https://issues.apache.org/jira/browse/LUCENE-1482

For 4.5, I relied on 'System.out.println' for writing information for
debugging in production.

In 6.6, I notice that some classes in Lucene are instantiating a Logger;
should I use a Logger instead? I tried to log with one, but I don't see any
output in the logs.


Regards
Nawab


Re: How to use javacc with QueryParser.jj

2017-07-24 Thread Nawab Zada Asad Iqbal
I guess I finally found the answer here:

http://codegouge.blogspot.com/2014/01/modifying-solr-queryparser.html

"
If you're doing development in Solr trunk and want to adjust the
QueryParser, take a look at the JavaCC grammar file
<https://javacc.java.net/doc/javaccgrm.html> at
lucene/solr/core/src/java/org/apache/solr/parser/QueryParser.jj.  This
isn't a tutorial about JavaCC - there are plenty of those out there
<http://cs.lmu.edu/%7Eray/notes/javacc/>.

Once your changes are complete, you'll need to generate the underlying
classes again.  ant builds from lucene/ or lucene/solr/ don't accomplish
this.  So to do this, run 'ant javacc' from lucene/solr/core/.

That's it.
"

On Mon, Jul 24, 2017 at 7:31 AM, Nawab Zada Asad Iqbal <khi...@gmail.com>
wrote:

> [Subject changed for reposting]
>
> Good morning,
>
> If I want to change something in the lucene-solr/solr/core/src/java
> /org/apache/solr/parser/QueryParser.jj, what is the workflow to generate
> the new Java code?
>
>
> Thanks
> Nawab
>
> On Fri, Jul 21, 2017 at 7:33 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
> wrote:
>
>> ok,  I see there is an `ant javacc` target in some folders, e.g.
>>
>> 1) lucene-solr/solr/build/solr/src-export/solr/core
>> 2) lucene-solr/lucene/queryparser
>>
>> Both of them use different parser files. I am interested in the
>> QueryParser at path:
>> lucene-solr/solr/core/src/java/org/apache/solr/parser/QueryParser.jj
>>
>> this apparently is getting dropped at:  lucene-solr/solr/build/solr/sr
>> c-export/solr/core/src/java/org/apache/solr/parser/QueryParser.jj
>>
>> However, I am not sure what target drops it!
>>
>>
>> Nawab
>>
>>
>>
>>
>> On Fri, Jul 21, 2017 at 7:12 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I know that we can make changes in the language by editing
>>> QueryParser.jj, however, how does it get generated into java code? Is there
>>> any ant target?
>>> 'compile' doesn't seem to generate java code for my changes (e.g.,
>>> adding lower case logical operators).
>>>
>>>
>>> Regards
>>> Nawab
>>>
>>
>>
>


How to use javacc with QueryParser.jj

2017-07-24 Thread Nawab Zada Asad Iqbal
[Subject changed for reposting]

Good morning,

If I want to change something in
lucene-solr/solr/core/src/java/org/apache/solr/parser/QueryParser.jj, what
is the workflow to generate the new Java code?


Thanks
Nawab

On Fri, Jul 21, 2017 at 7:33 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
wrote:

> ok,  I see there is an `ant javacc` target in some folders, e.g.
>
> 1) lucene-solr/solr/build/solr/src-export/solr/core
> 2) lucene-solr/lucene/queryparser
>
> Both of them use different parser files. I am interested in the
> QueryParser at path:
> lucene-solr/solr/core/src/java/org/apache/solr/parser/QueryParser.jj
>
> this apparently is getting dropped at:  lucene-solr/solr/build/solr/sr
> c-export/solr/core/src/java/org/apache/solr/parser/QueryParser.jj
>
> However, I am not sure what target drops it!
>
>
> Nawab
>
>
>
>
> On Fri, Jul 21, 2017 at 7:12 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I know that we can make changes in the language by editing
>> QueryParser.jj, however, how does it get generated into java code? Is there
>> any ant target?
>> 'compile' doesn't seem to generate java code for my changes (e.g., adding
>> lower case logical operators).
>>
>>
>> Regards
>> Nawab
>>
>
>

