Re: How Json facet API works with domains and facet functions?

2015-12-11 Thread Yonik Seeley
If you search on the parents and want to match child documents, I
think you want {!child} and not {!parent} in your queries or filters.

fq={!child of=...}date_query_on_parents
fq=child_prop:X

For this specific example, you don't even need the block-join support
in facets since the base domain (query+filters) will already be the
child docs you want to facet over.

-Yonik


On Fri, Dec 11, 2015 at 11:46 AM, Yago Riveiro  wrote:
> Hi,
>
> How does the JSON facet API work with domains and facet functions?
>
> I tried to google for some info but didn't find anything useful.
>
> How can I do a query that finds all parents that match a clause (a date) and
> calculates the avg price of all the children that have property X?
>
> Following Yonik's blog example I tried something like this:
>
> http://localhost:8983/solr/query?q={!parent
> which="parent_type:ecommerce"}date:2015-12-11T00:00:00Z&json.facet={x:'avg(price)',
> domain: { blockChildren : "parent_type:ecommerce"}}
>
> but it doesn't work.
>
>
>
> -
> Best regards
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-Json-facet-API-works-with-domains-and-facet-functions-tp4244907.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Facet count mismatch between solr simple facet and Json facet API.

2015-11-27 Thread Yonik Seeley
Are you using SolrCloud / distributed search?

https://issues.apache.org/jira/browse/SOLR-7452

-Yonik


On Fri, Nov 27, 2015 at 10:11 AM, Vishnu Mishra  wrote:
> Hi
>
> I am using solr 5.3.1 in my application. I have an indexed field, defined as
> given below:
>
> <field name="Title" ... multiValued="true" docValues="true" />
>
> And I am then using the solr JSON facet API for faceting. But it seems that the
> JSON facet API produces lower and incorrect result counts compared to the
> simple solr facet. The JSON facet request which I am doing is as below:
>
> json.facet={
> TitleFacet: {
> type: terms,
> field: Title,
> offset: 0,
> limit: 100,
> mincount: 1,
> sort: {
> count: desc
> }
> }
> }
>
> gives, for example, a count of 63. And then the equivalent simple facet query
> given below
>
> facet=true&facet.field=Title&facet.limit=100&facet.mincount=1&facet.offset=0
>
> gives a count of 65.
>
>
> Is there any issue with Solr JSON facets, or am I doing something wrong? Can
> anybody help me?
>
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Facet-count-mismatch-between-solr-simple-facet-and-Json-facet-API-tp4242461.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: [Faceting] Exact Value Faceting VS ID Faceting

2015-11-26 Thread Yonik Seeley
On Thu, Nov 26, 2015 at 3:32 AM, Toke Eskildsen  
wrote:
> If we had a hashing method String->long and guaranteed that there would
> be no collisions (or we accepted the occasional faulty result), then we
> could avoid the segment->global map as well as the centralized term
> server. To my knowledge, this has not yet been attempted.

I've thought about that before, but another problem with that approach
is how to map back to the actual term value (a string->long won't be
reversible).  A naive approach would also index the hash and then
also store the original string values in docvalues.  Hence after you
find the top K hashes, you can look up a document with that hash to
find a docid containing it, and then use the string docvalues to look
it up (or store it as a payload).  That's a lot of overhead.

-Yonik


Re: JSON facets and excluded queries

2015-11-25 Thread Yonik Seeley
Here's a little tutorial on multi-select faceting w/ the JSON Facet API:
http://yonik.com/multi-select-faceting/

-Yonik


On Tue, Nov 24, 2015 at 12:56 PM, Aigner, Max  wrote:
> I'm currently evaluating Solr 5.3.1 for performance improvements with 
> faceting.
> However, I'm unable to get the 'exclude-tagged-filters' feature to work. A 
> lot of the queries I'm doing are in the format
>
> ...?q=category:123&fq={!tag=fqCol}color:green&facet=true&facet.field={!key=price_all
>  ex=fqCol}price&facet.field={!key=price_nogreen}price...
>
> I couldn't find a way to make this work with JSON facets, the 'ex=' local 
> param doesn't seem to have a corresponding new parameter in JSON facets.
> Am I just missing something or is there a new recommended way for calculating 
> facets over a subset of filters?
>
> Thanks!
>


Re: JSON facets and excluded queries

2015-11-25 Thread Yonik Seeley
On Wed, Nov 25, 2015 at 2:15 PM, Aigner, Max  wrote:
> Thanks, this is great :=))
>
> I hadn't seen the domain:{excludeTags:...} syntax yet and it doesn't seem to 
> be working on 5.3.1 so I'm assuming this is work slated for 5.4 or 6. Did I 
> get that right?

Hmmm, the "domain" keyword was added for 5.3 along with block join
faceting: http://yonik.com/solr-nested-objects/
That's when I switched "excludeTags" to also be under the "domain" keyword.

Let me try it out...

-Yonik


Re: JSON facets and excluded queries

2015-11-25 Thread Yonik Seeley
OK, just fixed this in https://issues.apache.org/jira/browse/SOLR-8341
and that domain syntax will work in 5.4
I'll update my blog on multi-select faceting to note that.

-Yonik

On Wed, Nov 25, 2015 at 2:37 PM, Yonik Seeley <ysee...@gmail.com> wrote:
> On Wed, Nov 25, 2015 at 2:29 PM, Yonik Seeley <ysee...@gmail.com> wrote:
>> On Wed, Nov 25, 2015 at 2:15 PM, Aigner, Max <max.aig...@nordstrom.com> 
>> wrote:
>>> Thanks, this is great :=))
>>>
>>> I hadn't seen the domain:{excludeTags:...} syntax yet and it doesn't seem 
>>> to be working on 5.3.1 so I'm assuming this is work slated for 5.4 or 6. 
>>> Did I get that right?
>>
>> Hmmm, the "domain" keyword was added for 5.3 along with block join
>> faceting: http://yonik.com/solr-nested-objects/
>> That's when I switched "excludeTags" to also be under the "domain" keyword.
>>
>> Let me try it out...
>
> Ah, I messed up that migration...
> OK, for now, instead of
>   domain:{excludeTags:foo}
> just use
>   excludeTags:foo
> and it should work.
>
> -Yonik


Re: JSON facets and excluded queries

2015-11-25 Thread Yonik Seeley
On Wed, Nov 25, 2015 at 2:29 PM, Yonik Seeley <ysee...@gmail.com> wrote:
> On Wed, Nov 25, 2015 at 2:15 PM, Aigner, Max <max.aig...@nordstrom.com> wrote:
>> Thanks, this is great :=))
>>
>> I hadn't seen the domain:{excludeTags:...} syntax yet and it doesn't seem to 
>> be working on 5.3.1 so I'm assuming this is work slated for 5.4 or 6. Did I 
>> get that right?
>
> Hmmm, the "domain" keyword was added for 5.3 along with block join
> faceting: http://yonik.com/solr-nested-objects/
> That's when I switched "excludeTags" to also be under the "domain" keyword.
>
> Let me try it out...

Ah, I messed up that migration...
OK, for now, instead of
  domain:{excludeTags:foo}
just use
  excludeTags:foo
and it should work.
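
For reference, Max's original example might translate to something like this once the domain syntax works in 5.4 (a sketch; the tag and field names are taken from his request, and faceting on price as a terms facet is an assumption about his schema):

q=category:123
fq={!tag=fqCol}color:green
json.facet={
  price_all     : { type:terms, field:price, domain:{excludeTags:fqCol} },
  price_nogreen : { type:terms, field:price }
}

On 5.3.1, per the workaround above, excludeTags:fqCol would sit directly in the facet instead of under domain.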

-Yonik


Re: Json Facet api on nested doc

2015-11-22 Thread Yonik Seeley
On Sun, Nov 22, 2015 at 3:10 PM, Mikhail Khludnev
 wrote:
> Hello,
>
> I also played with json.facet, but couldn't achieve the desired result either.
>
> Yonik, Alessandro,
> Do you think it's a new feature or it can be achieved with the current
> implementation?

Not sure if I'm misunderstanding the example, but it looks straight-forward.

terms facet on parent documents, with sub-facet on child documents.
I just committed a test for this, and it worked fine.  See
TestJsonFacets.testBlockJoin()
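
As a sketch of that shape, using the book/review fields from http://yonik.com/solr-nested-objects/ (genre_s as the parent field is an assumption), with a base query matching the parent documents:

q=type_s:book
json.facet={
  genres : {
    type: terms,
    field: genre_s,
    facet: {
      top_reviewers : {
        type: terms,
        field: author_s,
        domain: { blockChildren : "type_s:book" }
      }
    }
  }
}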

Can we see an example of a parent document being indexed (i.e. along
with its child documents)?

-Yonik


Re: Json facet api NullPointerException

2015-11-12 Thread Yonik Seeley
Thanks for the report Yago,
What version is this?

-Yonik


On Thu, Nov 12, 2015 at 10:53 AM, Yago Riveiro  wrote:
> Hi,
>
> I'm hitting this NullPointerException using the json facet API.
>
> Same query using Facet component is working.
>
> Json facet query:
>
> curl -s http://node1:8983/solr/metrics/query -d
> 'q=datetime:[2015-10-01T00:00:00Z TO
> 2015-10-04T23:59:59Z]&rows=0&json.facet={
> urls: {
> type: terms,
> field: url,
> limit: -1,
> sort: index,
> numBuckets: true
> }}'
>
> Facet component query:
>
> http://node1:8983/solr/metrics/query?q=datetime:[2015-10-01T00:00:00Z%20TO%202015-10-04T23:59:59Z]&facet=true&facet.field=url&facet.mincount=1&facet.limit=-1&rows=0&wt=json&indent=1&facet.sort=index
>
> Total elements returned: 1971203
> Total unique elements returned: 307570
>
> Json facet api response:
>
> 2015-11-12 15:29:53.130 ERROR (qtp1510067370-34151) [c:metrics:shard1
> r:core_node5 x:metrics_shard1_replica2] o.a.s.s.SolrDispatchFilter
> null:java.lang.NullPointerException
> at
> org.apache.solr.search.facet.FacetFieldProcessorFCBase$1.lessThan(FacetField.java:573)
> at
> org.apache.solr.search.facet.FacetFieldProcessorFCBase$1.lessThan(FacetField.java:570)
> at
> org.apache.lucene.util.PriorityQueue.upHeap(PriorityQueue.java:258)
> at org.apache.lucene.util.PriorityQueue.add(PriorityQueue.java:135)
> at
> org.apache.solr.search.facet.FacetFieldProcessorFCBase.findTopSlots(FacetField.java:603)
> at
> org.apache.solr.search.facet.FacetFieldProcessorFCBase.getFieldCacheCounts(FacetField.java:547)
> at
> org.apache.solr.search.facet.FacetFieldProcessorFCBase.process(FacetField.java:512)
> at
> org.apache.solr.search.facet.FacetProcessor.processSubs(FacetProcessor.java:222)
> at
> org.apache.solr.search.facet.FacetProcessor.fillBucket(FacetProcessor.java:313)
> at
> org.apache.solr.search.facet.FacetQueryProcessor.process(FacetQuery.java:57)
> at
> org.apache.solr.search.facet.FacetModule.process(FacetModule.java:87)
> at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:277)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)
> at
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
> at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
> at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
> at org.eclipse.jetty.server.Server.handle(Server.java:499)
> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
> at
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
> at
> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
> at java.lang.Thread.run(Thread.java:745)
>
>
>
> -
> Best regards
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Json-facet-api-NullPointerException-tp4239900.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Json facet api NullPointerException

2015-11-12 Thread Yonik Seeley
On Thu, Nov 12, 2015 at 11:48 AM, Yago Riveiro  wrote:
> In my query I have
> sort: index,
>
> And should be
>
> sort:{index:desc|asc}
>
> I think that the json parser should raise a “json parsing error” ...


Yeah, either that or "index" should be synonymous with "index asc".
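
With the sort spelled out explicitly, Yago's facet would look something like this (a sketch; same field and options as his original):

json.facet={
  urls: {
    type: terms,
    field: url,
    limit: -1,
    sort: { index: asc },
    numBuckets: true
  }
}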

-Yonik


Re: Parent/Child (Nested Document) Faceting

2015-11-11 Thread Yonik Seeley
On Wed, Nov 11, 2015 at 12:34 PM, Alessandro Benedetti
 wrote:
> Anyway, everything seems possible to me through the (I love it, can't stop
> repeating it) JSON Facet approach.

Thanks, the positive feedback definitely gives me motivation to keep
improving it!

-Yonik


Re: Parent/Child (Nested Document) Faceting

2015-11-11 Thread Yonik Seeley
On Mon, Nov 9, 2015 at 2:37 PM, Mikhail Khludnev
 wrote:
> Yonik,
>
> I wonder is there a plan or a vision for something like
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-reverse-nested-aggregation.html
> under JSON facets?

Hmmm, I couldn't quite grok that complicated command syntax... but the
description seems straight-forward enough:

"The following aggregations will return the top commenters' username
that have commented and per top commenter the top tags of the issues
the user has commented on:"

So if I translate that into "books" and "reviews" that I use here:
http://yonik.com/solr-nested-objects/

it sounds like we start with a set of book objects, then map to the
child domain to facet on comments, then map back to the parent domain
to facet on books again.

From that blog, this is the command that finds top review authors:

json.facet={
  top_reviewers : {
type: terms,
field: author_s,
domain: { blockChildren : "type_s:book" }
  }
}

Now we just need to add a sub-facet that switches back to the parent
domain to facet on something there (like genre... equiv to "tags" in
the ES example):

json.facet={
  top_reviewers : {
type: terms,
field: author_s,
domain: { blockChildren : "type_s:book" },

facet : {
  type:terms,
  field:genre,
  domain:{blockParent:"type_s:book"}
}

  }
}



While there is certainly more work to be done with joins /
block-joins, it seems like we can already do that specific example at
least.

-Yonik


Re: Costs/benefits of DocValues

2015-11-09 Thread Yonik Seeley
On Mon, Nov 9, 2015 at 12:06 PM, Alexandre Rafalovitch
 wrote:
> Thank you Yonik.
>
> So I would probably advise then to "keep your indexed=true" and think
> about _adding_ docValues when there is a memory pressure or when there
> is clear performance issue for the ...specific... uses.
>
> But if we are keeping the indexed=true, then docValues=true will STILL
> use at least as much memory however efficient docValues are
> themselves, right? Or will something that is normally loaded and use
> memory will stay unloaded in this combination scenario?

Think about it this way: for something like sorting, we need a column
for fast docid->value lookup.
Enabling docValues means building this column at index time.  At
search time, it gets memory mapped, just like most other parts of the
index.  The required memory is off-heap... the OS needs to keep the
file in its buffer cache for good performance.
If docValues aren't enabled, this means that we need to build the
column on-the-fly on-heap (i.e. FieldCache entry is built from
un-inverting the indexed values).

An indexed field by itself only takes up disk space, just like
docValues.  Of course for searches to be fast, off-heap RAM (in the
form of OS buffer cache / disk cache) is still needed.

-Yonik


Re: Costs/benefits of DocValues

2015-11-09 Thread Yonik Seeley
On Mon, Nov 9, 2015 at 10:55 AM, Demian Katz  wrote:
> I understand that by adding "docValues=true" to some of my fields, I can 
> improve sorting/faceting performance.

I don't think this is true in the general sense.
docValues are built at index-time, so what you will save is initial
un-inversion time (i.e. the first time a field is used after a new
searcher is opened).
After that point, docValues may be slightly slower.

The other advantage of docValues is memory use... much/most of it is
essentially "off-heap", being memory-mapped from disk.  This cuts down
on memory issues and helps reduce longer GC pauses.

docValues are good in general, and I think we should default to them
more for Solr 6, but they are not better in all ways.

> However, I have a couple of questions:
>
>
> 1.)Will Solr always take proper advantage of docValues when it is turned 
> on

Yes.

> , or will I gain greater performance by turning off stored/indexed in
> situations where only docValues are necessary (e.g. a sort-only field)?
>
> 2.)Will adding docValues to a field introduce significant performance 
> penalties for non-docValues uses of that field, beyond the obvious fact that 
> the additional data will consume more disk and memory?

No, it's a separate part of the index.

-Yonik


> I'm asking this question because the existing schema has some multi-purpose 
> fields, and I'm trying to determine whether I should just add 
> "docValues=true" wherever it might help, or if I need to take a more 
> thoughtful approach and potentially split some fields with copyFields, etc. 
> This is particularly significant because my schema makes use of some dynamic 
> field suffixes, and I'm not sure if I need to add new suffixes to 
> differentiate docValues/non-docValues fields, or if it's okay to turn on 
> docValues across the board "just in case."
>
> Apologies if these questions have already been answered - I couldn't find a 
> totally clear answer in the places I searched.
>
> Thanks!
>
> - Demian


Re: Costs/benefits of DocValues

2015-11-09 Thread Yonik Seeley
On Mon, Nov 9, 2015 at 11:19 AM, Alexandre Rafalovitch
<arafa...@gmail.com> wrote:
> I thought docValues were per segment, so the price of un-inversion was
> effectively paid on each commit for all the segments, as opposed to
> just the updated one.

Both the field cache (i.e. uninverting indexed values) and docValues
are mostly per-segment (I say mostly because some uses still require
building a global ord map).

But even when things are mostly per-segment, you hit major segment
merges and the cost of un-inversion (when you aren't using docValues)
is non-trivial.

> I admit I also find the story around docValues to be very confusing at
> the moment. Especially on the interplay with "indexed=false".

You still need "indexed=true" for efficient filters on the field.
Hence if you're faceting on a field and want to use docValues, you
probably want to keep the "indexed=true" on the field as well.
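
For example, such a field might be declared like this (a sketch; the field name and type are placeholders):

<field name="genre" type="string" indexed="true" stored="false" docValues="true"/>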

-Yonik


> It would
> make a VERY good article to have this clarified somehow by people in
> the know.
>
> Regards,
>Alex.
> 
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
>
> On 9 November 2015 at 11:04, Yonik Seeley <ysee...@gmail.com> wrote:
>> On Mon, Nov 9, 2015 at 10:55 AM, Demian Katz <demian.k...@villanova.edu> 
>> wrote:
>>> I understand that by adding "docValues=true" to some of my fields, I can 
>>> improve sorting/faceting performance.
>>
>> I don't think this is true in the general sense.
>> docValues are built at index-time, so what you will save is initial
>> un-inversion time (i.e. the first time a field is used after a new
>> searcher is opened).
>> After that point, docValues may be slightly slower.
>>
>> The other advantage of docValues is memory use... much/most of it is
>> essentially "off-heap", being memory-mapped from disk.  This cuts down
>> on memory issues and helps reduce longer GC pauses.
>>
>> docValues are good in general, and I think we should default to them
>> more for Solr 6, but they are not better in all ways.
>>
>>> However, I have a couple of questions:
>>>
>>>
>>> 1.)Will Solr always take proper advantage of docValues when it is 
>>> turned on
>>
>> Yes.
>>
>>> , or will I gain greater performance by turning off stored/indexed in
>>> situations where only docValues are necessary (e.g. a sort-only field)?
>>>
>>> 2.)Will adding docValues to a field introduce significant performance 
>>> penalties for non-docValues uses of that field, beyond the obvious fact 
>>> that the additional data will consume more disk and memory?
>>
>> No, it's a separate part of the index.
>>
>> -Yonik
>>
>>
>>> I'm asking this question because the existing schema has some multi-purpose 
>>> fields, and I'm trying to determine whether I should just add 
>>> "docValues=true" wherever it might help, or if I need to take a more 
>>> thoughtful approach and potentially split some fields with copyFields, etc. 
>>> This is particularly significant because my schema makes use of some 
>>> dynamic field suffixes, and I'm not sure if I need to add new suffixes to 
>>> differentiate docValues/non-docValues fields, or if it's okay to turn on 
>>> docValues across the board "just in case."
>>>
>>> Apologies if these questions have already been answered - I couldn't find a 
>>> totally clear answer in the places I searched.
>>>
>>> Thanks!
>>>
>>> - Demian


Re: child document faceting returning empty buckets

2015-11-09 Thread Yonik Seeley
On Mon, Nov 9, 2015 at 7:30 PM, Yangrui Guo  wrote:
> Just solved the problem by changing blockChildren:"content_type:children"
> to blockParent:"content_type:children".

Unless you're dealing with multiple levels, you may be using the wrong
content_type value.
That query should always define the full set of parents for both
blockChildren and blockParent.
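
In other words, assuming parents are marked by some query like content_type:parent (hypothetical here), both directions reference that same parent filter:

domain: { blockChildren : "content_type:parent" }   (parent domain -> its children)
domain: { blockParent   : "content_type:parent" }   (child domain -> back to the parents)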

-Yonik


Re: MatchAllDocsQuery is much slower in solr5.3.1 compare to solr4.7

2015-11-06 Thread Yonik Seeley
On Fri, Nov 6, 2015 at 3:12 PM, Jack Krupansky  wrote:
> Just to be clear, I was suggesting that the filter query (fq) was slow

That's a possibility.  Filters were actually removed in Lucene, so
it's a very different code path now.

In 4.10, filters were first class, and SolrIndexSearcher used methods like:
search(query, pf.filter, collector);
And BitSet based filters were pushed down to the leaves of a query
(which the filter generated from MatchAllDocsQuery would have been).

At some point, those were changed to use FilteredQuery instead.  But I
think at some point prior Lucene converted a Filter to a
FilteredQuery, so that change in Solr may not have mattered at that
point.

Then in LUCENE-6583, Filters were removed and the code in
SolrIndexSearcher was changed to use a BooleanQuery:
    if (pf.filter != null) {
      Query query = new BooleanQuery.Builder()
          .add(main, Occur.MUST)
          .add(pf.filter, Occur.FILTER)
          .build();
      search(query, collector);

So... lots of changes over time, no idea which (if any) is the cause.

-Yonik


Re: Is it impossible to update an index that is undergoing an optimize?

2015-11-06 Thread Yonik Seeley
On Wed, Nov 4, 2015 at 3:36 PM, Shawn Heisey  wrote:
> The specific index update that fails during the optimize is the SolrJ
> deleteByQuery call.

deleteByQuery may be the outlier here... we have to jump through extra
hoops internally because we don't know which documents it will affect.
Normal adds and deletes should proceed in parallel though.

-Yonik


Re: MatchAllDocsQuery is much slower in solr5.3.1 compare to solr4.7

2015-11-06 Thread Yonik Seeley
On Fri, Nov 6, 2015 at 9:30 PM, wei  wrote:
> in solr 5.3.1, there is actually a boost, and the score is product of boost
> & queryNorm.

Hmmm, well, it's worth putting on the list of stuff to investigate.
Boosting was also changed in lucene.

What happens if you try this multiple times in a row?

rows=2&fl=id&q={!cache=false}*:*&fq=categoryIdsPath:1001

(basically just add {!cache=false} as a prefix to the main query.)

This would allow hotspot time to compile methods, and ensure that the
filter query was cached, and do a better job of isolating the
"filtered match-all-docs" part of the execution.

-Yonik


Re: MatchAllDocsQuery is much slower in solr5.3.1 compare to solr4.7

2015-11-06 Thread Yonik Seeley
On Fri, Nov 6, 2015 at 9:56 PM, wei  wrote:
> Good point! I tried that: on solr5 the query time is around 100-110ms, and
> on solr4 it is around 60-63ms (very consistent). Solr5 is slower.

When it's something easy, there comes a point when it makes sense to
stop asking more questions and just try it yourself...
I just did this, and can confirm what you're seeing.   For me, 5.3.1
is about 5x slower than 4.10 for this particular query.
Thanks for your persistence / patience in reporting this.  Could you
open a JIRA issue for it?

-Yonik


Re: Is it impossible to update an index that is undergoing an optimize?

2015-11-06 Thread Yonik Seeley
On Fri, Nov 6, 2015 at 10:20 PM, Shawn Heisey  wrote:
>  Is there a decent API for getting uniqueKey?

Not off the top of my head.
I deeply regret making it configurable and not just using "id" ;-)

-Yonik


Re: how to efficiently get sum of an int field

2015-11-05 Thread Yonik Seeley
You can also try the new JSON Facet API if you are on a recent version of Solr.

json.facet={x:"sum(myfield)"}

http://yonik.com/solr-facet-functions/

-Yonik


On Thu, Nov 5, 2015 at 1:14 PM, Renee Sun  wrote:
> Hi -
> I have been using stats to get the sum of a field data (int) like:
>
> stats=true&stats.field=my_field_name&rows=0
>
> It works fine, but when the index has hundreds of millions of messages on
> sharded indices, it takes a long time.
>
> I noticed that 'stats' gives out more information than I need (just the sum);
> I suspect the min/max/mean etc. are the ones that cost the time.
>
> Is there a simple way I can get just the sum without the other things, and run
> it in a manner that is faster and less stressful to the solr server?
>
> Thanks
> Renee
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/how-to-efficiently-get-sum-of-an-int-field-tp4238464.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to efficiently get sum of an int field

2015-11-05 Thread Yonik Seeley
On Thu, Nov 5, 2015 at 4:55 PM, Renee Sun  wrote:
> Also Yonik, out of curiosity... when I run stats on a large msg set (such as
> 200 million msgs), it tends to use a lot of memory; this should be expected,
> correct?

With the stats component, yeah.

> if I were able to use !sum=true to only get the sum, a clever algorithm should
> be able to tell that only the sum is required and avoid the memory overhead;
> is that implemented so?

I think so,  but I'm not an expert on the stats component.  I looked
at it when I wanted to implement the new JSON Facet API and decided we
were probably better off starting fresh and re-architecting some
things for better performance.


-Yonik


Re: Is it impossible to update an index that is undergoing an optimize?

2015-11-04 Thread Yonik Seeley
On Wed, Nov 4, 2015 at 3:06 PM, Shawn Heisey  wrote:
> I had understood that since 4.0, Solr (Lucene) can continue to update an
> index even while that index is optimizing.

Yes, that should be the case.

> I have discovered in the logs of my SolrJ index maintenance program that
> this does not appear to actually be true.

Hmmm, perhaps some other resource is getting exhausted, like the number
of background merges hitting the limit?

-Yonik


Re: SolrJ stalls/hangs on client.add(); and doesn't return

2015-10-30 Thread Yonik Seeley
On Thu, Oct 29, 2015 at 5:28 PM, Erick Erickson  wrote:
> Try making batches of 1,000 docs and sending them through instead.

The other thing about ConcurrentUpdateSolrClient is that it will
create batches itself while streaming.
For example, if you call add a number of times very quickly, those
will all be put in the same update request as they are being streamed
(you get the benefits of batching without the latency it would
normally come with.)

So I guess I'd advise to not batch yourself unless it makes more sense
for your document processing for other reasons.

-Yonik


Re: missing in json facet does not work for stream?

2015-10-23 Thread Yonik Seeley
On Fri, Oct 23, 2015 at 5:55 AM, hao jin  wrote:
> Hi
> I found that when the method of json facet is set to stream, the "missing" is
> not added to the result.
> Is it by design or a known issue?

You found an undocumented feature (method=stream) ;-)
That facet method doesn't have adequate testing yet, so I haven't
publicized / documented it.
Support for things like "missing" may be some of the stuff still TBD.

-Yonik


Re: missing in json facet does not work for stream?

2015-10-23 Thread Yonik Seeley
On Fri, Oct 23, 2015 at 10:24 AM, Shalin Shekhar Mangar
<shalinman...@gmail.com> wrote:
> Now I am curious, what does it do!

It's basically like facet.method=enum, but it truly streams
(calculates each facet bucket on-the-fly and writes it to the
response).
Since it is streaming, it only supports sorting by term index order.

Although if there is need/demand, we could also do a lightweight
ordering over the buckets first (ordering by count or other facet
function) and then still stream, creating the buckets and any
sub-facets on the fly.
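
As a sketch, on the url facet from the NullPointerException thread above it would look like this (method is the undocumented parameter mentioned here, so treat it as experimental):

json.facet={
  urls: { type:terms, field:url, method:stream, sort:"index asc", limit:-1 }
}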

-Yonik



> On Fri, Oct 23, 2015 at 7:40 PM, Yonik Seeley <ysee...@gmail.com> wrote:
>> On Fri, Oct 23, 2015 at 5:55 AM, hao jin <hao@oracle.com> wrote:
>>> Hi
>>> I found when the method of json facet is set to stream, the "missing" is not
>>> added to the result.
>>> Is it designed or a known issue?
>>
>> You found an undocumented feature (method=stream) ;-)
>> That facet method doesn't have adequate testing yet, so I haven't
>> publicized / documented it.
>> Support for things like "missing" may be some of the stuff still TBD.
>>
>> -Yonik
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.


Re: Solr cross core join special condition

2015-10-13 Thread Yonik Seeley
On Wed, Oct 7, 2015 at 9:42 AM, Ryan Josal  wrote:
> I developed a join transformer plugin that did that (although it didn't
> flatten the results like that).  The one thing that was painful about it is
> that the TextResponseWriter has references to both the IndexSchema and
> SolrReturnFields objects for the primary core.  So when you add a
> SolrDocument from another core it returned the wrong fields.

We've made some progress on this front in trunk:

* SOLR-7957: internal/expert - ResultContext was significantly changed
and expanded
  to allow for multiple full query results (DocLists) per Solr request.
  TransformContext was rendered redundant and was removed. (yonik)

So ResultContext now has its own searcher, ReturnFields, etc.

-Yonik


Re: faceting is unusable slow since upgrade to 5.3.0

2015-09-25 Thread Yonik Seeley
On Fri, Sep 25, 2015 at 6:33 AM, Uwe Reh <r...@hebis.uni-frankfurt.de> wrote:
> On 25.09.2015 at 05:16, Yonik Seeley wrote:
>>
>> I did some performance benchmarks and opened an issue.  It's bad.
>> https://issues.apache.org/jira/browse/SOLR-8096
>
>
> Hi Yonik,
> thanks a lot for your investigation.
> Using the JSON Facet API is fast and seems to be a usable workaround for new
> applications, but not really as a quick patch to our production environment.

Single-valued fields were likely also impacted (but probably not to
the extent that multi-valued fields were).
Are you faceting on any of those?

> What's your assessment of Bill's question? Is there a chance to get the
> fieldValueCache back?

Unclear.  If you look at
https://issues.apache.org/jira/browse/SOLR-8096
You see
"I was always in favour of removing those top-level facetting
algorithms. So they still have my strong +1."
Which means that it could be veto'd

-Yonik


Re: faceting is unusable slow since upgrade to 5.3.0

2015-09-25 Thread Yonik Seeley
On Fri, Sep 25, 2015 at 5:07 AM, Alessandro Benedetti
 wrote:
>> There is an undocumented "method" parameter - I need to enable that to
>> allow switching between the docvalues approach and the UnInvertedField
>> approach.
>
> Only to clarify, please correct me Yonik if my understanding is wrong or
> outdated :
> To calculate facets, without going into the algorithm details there are 2
> approaches available :
> Term Enum (good for a limited number of unique values for your field) and FC
> (FieldCache), good for a lot of unique values, but not for big fields.
>
> For the FC approach,
>  - storing the DocValues for the field would transparently use them (with
> the known benefit, at the cost of disk space for the docValues data
> structures)
>  - without the DocValues, the algorithm will un-invert the index at
> runtime, using the field cache to store the results

Yeah, that's right so far.
We should add a switch though for the method of uninversion...
UnInvertedField (for indexes that change less frequently) vs DocValues
(i.e. if you didn't index with DocValues, UnInvertedReader will
uninvert to an in-memory structure that looks like DocValues).

> So, from your quote, Term Enum will not be supported by JSON Faceting?

We can, it just hasn't been a priority yet.

Anyway, I'm going to step away from email and
https://issues.apache.org/jira/browse/SOLR-8096 for a couple of days.
I need to go focus on putting some slides together for
Strata/HadoopWorld next week. I'll be talking about the new facet
module / json facets there.

-Yonik


Re: faceting is unusable slow since upgrade to 5.3.0

2015-09-24 Thread Yonik Seeley
On Mon, Sep 21, 2015 at 8:09 AM, Uwe Reh  wrote:
> our bibliographic index (~20M entries) runs fine with Solr 4.10.3.
> With Solr 5.3, faceted searching is consistently, incredibly slow (~ 20 seconds)
[...]
>
> The 'fieldValueCache' seems to be unused (no inserts nor lookups) in Solr
> 5.3. In Solr 4.10 the 'fieldValueCache' is in heavy use with a
> cumulative_hitratio of 1.


Indeed.  Use of the fieldValueCache (UnInvertedField) was secretly
removed as part of LUCENE-5666, causing these performance regressions.

This code had been evolved over years to be very fast for specific use
cases.  No one facet algorithm is going to be optimal for everyone, so
it's important we have multiple.  But use of the UnInvertedField was
removed without any notification or discussion whatsoever (and
obviously no benchmarking), and was only discovered later by Solr devs
in SOLR-7190 that it was essentially dead code.


When I brought back my "JSON Facet API" work to Solr (which was based
on 4.10.x) it came with a heavily modified version of UnInvertedField
that is available via the JSON Facet API.  It might currently work
better for your usecase.

On your normal (non-docValues) index, you can try something like the
following to see what the performance would be:

$ curl http://yxz/solr/hebis/query -d 'q=darwin&
json.facet={
  authors : { type:terms, field:author_facet, limit:30 },
  material_access : { type:terms, field:material_access, limit:30 },
  material_brief : { type:terms, field:material_brief, limit:30 },
  rvk : { type:terms, field:rvk_facet, limit:30 },
  lang : { type:terms, field:language, limit:30 },
  dept : { type:terms, field:department_3, limit:30 }
}'

There were other changes in LUCENE-5666 that will probably slow down
faceting on the single valued fields as well (so this may still be a
little slower than 4.10.x), but hopefully it would be more
competitive.

-Yonik


Re: Different ports for search and upload request

2015-09-24 Thread Yonik Seeley
On Thu, Sep 24, 2015 at 5:00 PM, Siddhartha Singh Sandhu
 wrote:
> Hey,
>
> Thank you for your reply.
>
> The use case would be that I can concurrently load data into my index via
> one port and then make that (*data) available (NRT search) to users through
> another high-availability search endpoint without the fear of my requests
> clogging one port.

Not yet, but it's in development.
It won't require a different port either... different endpoints will
be able to have different request queues and thread pools.

-Yonik


Re: faceting is unusable slow since upgrade to 5.3.0

2015-09-24 Thread Yonik Seeley
On Thu, Sep 24, 2015 at 9:58 AM, Yonik Seeley <ysee...@gmail.com> wrote:
> Indeed.  Use of the fieldValueCache (UnInvertedField) was secretly
> removed as part of LUCENE-5666, causing these performance regressions.

I did some performance benchmarks and opened an issue.  It's bad.
https://issues.apache.org/jira/browse/SOLR-8096

-Yonik


Re: faceting is unusable slow since upgrade to 5.3.0

2015-09-24 Thread Yonik Seeley
On Thu, Sep 24, 2015 at 10:16 AM, Alessandro Benedetti
<benedetti.ale...@gmail.com> wrote:
> Yonik, I am really excited about the Json faceting module.
> I find it really interesting.
> Are there any pros/cons in using it, or is it definitely the "approach of
> the future"?

Thanks!

The cons to the new stuff is that it doesn't yet have everything the
old stuff has.  But it does already have new stuff that the old stuff
doesn't have (like sorting by any statistic and rudimentary block-join
integration).

And yes, I do see it as "the future", a platform for integrating the
disparate features that have been developed for solr over time, but
don't always work that well together:
 - search
 - statistics
 - grouping
 - joins


> I saw your benchmarks and seems impressive.
>
> I have not read the whole topic in detail, just briefly, but is JSON
> faceting using different faceting algorithms from the standard ones
> (Enum and FC)?

I wouldn't say different fundamental algorithms yet... (compared to
4.10) but different code (to support some of the new features) and in
some places more optimized.

> I can not find the algorithm parameter to be passed in the Json facets.

There is an undocumented "method" parameter - I need to enable that to
allow switching between the docvalues approach and the UnInvertedField
approach.

-Yonik


> Are they using a complete different approach ?
> Is the algorithm used expressed anywhere ?
> This could give very good insights on when to use them.
>
> Cheers
>
> 2015-09-24 14:58 GMT+01:00 Yonik Seeley <ysee...@gmail.com>:
>
>> On Mon, Sep 21, 2015 at 8:09 AM, Uwe Reh <r...@hebis.uni-frankfurt.de>
>> wrote:
>> > our bibliographic index (~20M entries) runs fine with Solr 4.10.3
>> > With Solr 5.3 faceted searching is constantly incredibly slow (~ 20
>> seconds)
>> [...]
>> >
>> > The 'fieldValueCache' seems to be unused (no inserts nor lookups) in Solr
>> > 5.3. In Solr 4.10 the 'fieldValueCache' is in heavy use with a
>> > cumulative_hitratio of 1.
>>
>>
>> Indeed.  Use of the fieldValueCache (UnInvertedField) was secretly
>> removed as part of LUCENE-5666, causing these performance regressions.
>>
>> This code had been evolved over years to be very fast for specific use
>> cases.  No one facet algorithm is going to be optimal for everyone, so
>> it's important we have multiple.  But use of the UnInvertedField was
>> removed without any notification or discussion whatsoever (and
>> obviously no benchmarking), and was only discovered later by Solr devs
>> in SOLR-7190 that it was essentially dead code.
>>
>>
>> When I brought back my "JSON Facet API" work to Solr (which was based
>> on 4.10.x) it came with a heavily modified version of UnInvertedField
>> that is available via the JSON Facet API.  It might currently work
>> better for your usecase.
>>
>> On your normal (non-docValues) index, you can try something like the
>> following to see what the performance would be:
>>
>> $ curl http://yxz/solr/hebis/query -d 'q=darwin&
>> json.facet={
>>   authors : { type:terms, field:author_facet, limit:30 },
>>   material_access : { type:terms, field:material_access, limit:30 },
>>   material_brief : { type:terms, field:material_brief, limit:30 },
>>   rvk : { type:terms, field:rvk_facet, limit:30 },
>>   lang : { type:terms, field:language, limit:30 },
>>   dept : { type:terms, field:department_3, limit:30 }
>> }'
>>
>> There were other changes in LUCENE-5666 that will probably slow down
>> faceting on the single valued fields as well (so this may still be a
>> little slower than 4.10.x), but hopefully it would be more
>> competitive.
>>
>> -Yonik
>>
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card - http://about.me/alessandro_benedetti
> Blog - http://alexbenedetti.blogspot.co.uk
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England


Re: Google didn't help on this one!

2015-09-15 Thread Yonik Seeley
On Tue, Sep 15, 2015 at 11:08 AM, Mark Fenbers  wrote:
> I'm working with the spellcheck component of Solr for the first time.  I'm
> using SolrJ, and when I submit my query, I get a Solr Exception:  "Expected
> mime type octet/stream but got text/html."
>
> What in the world is this telling me??

You're probably hitting an endpoint on Solr that doesn't exist and
getting an HTML 404 error page rather than the response (which would
be in binary by default).

An easy way to see what SolrJ is sending is to kill your solr server, then do

nc -l 8983

And then run your SolrJ program to see what it sends... if it looks OK,
then try sending the request from curl to Solr.

-Yonik


Re: Issue while adding Long.MAX_VALUE to a TrieLong field

2015-09-10 Thread Yonik Seeley
On Thu, Sep 10, 2015 at 5:43 PM, Pushkar Raste  wrote:

Did you see my previous response to you today?
http://markmail.org/message/wt6db4ocqmty5a42

Try querying a different way, like from the command line using curl,
or from your browser, but not through the solr admin.

[...]
> My test case shows that MAX Value Solr can store without losing precision
> is  18014398509481982. This is equivalent to '2 ^53 - 1'  (Not really sure
> if this computation really means something).

53 happens to be the effective number of mantissa bits in a 64 bit
double precision floating point ;-)
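
Spelled out: doubles can represent every integer exactly only up to 2^53; between 2^53 and 2^54 only even integers survive, which is consistent with the even maximum measured above:

2^53     =  9007199254740992
2^54 - 2 = 18014398509481982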

-Yonik


Re: Issue while adding Long.MAX_VALUE to a TrieLong field

2015-09-10 Thread Yonik Seeley
On Thu, Sep 10, 2015 at 2:21 PM, Pushkar Raste  wrote:
> Hi,
> I am trying to following add document (value for price.long is
> Long.MAX_VALUE)
>
> <doc>
>   <field name="...">411</field>
>   <field name="...">one</field>
>   <field name="price.long">9223372036854775807</field>
> </doc>
>
> However, upon querying my collection, the value I get back for "price.long" is
> 9223372036854776000

The value probably isn't actually rounded in solr, but in the client.
If you are looking at this from the admin console, then it's the
javascript there that is unfortunately rounding the displayed value.

http://stackoverflow.com/questions/1379934/large-numbers-erroneously-rounded-in-javascript

https://issues.apache.org/jira/browse/SOLR-6364

We should really fix the admin somehow... this has bitten quite a few people.

-Yonik


Re: Cached fq decreases performance

2015-09-04 Thread Yonik Seeley
>> This is part of a bigger issue we should work at doing better at for
>> Solr 6: debugability / supportability.
>> For a specific request, what took up the memory, what cache misses or
>> cache instantiations were there, how much request-specific memory was
>> allocated, how much shared memory was needed to satisfy the request,
>> etc.

Oh, and if we have the ability to *tell* when a request is going to
allocate a big chunk of memory,
then we should also be able to either prevent it from happening or
terminate the request shortly after.

So one could say, only allow this request to:
- cause 500MB more of shared memory to be allocated (like field cache)
- only allow it to use 5GB of shared memory total (so successive
queries don't keep upping the total amount allocated)
- only allow 100MB of request-specific memory to be allocated

-Yonik


Re: Cached fq decreases performance

2015-09-04 Thread Yonik Seeley
On Fri, Sep 4, 2015 at 10:18 AM, Alexandre Rafalovitch
 wrote:
> Yonik,
>
> Is this all visible on query debug level?

Nope, unfortunately not.

This is part of a bigger issue we should work at doing better at for
Solr 6: debugability / supportability.
For a specific request, what took up the memory, what cache misses or
cache instantiations were there, how much request-specific memory was
allocated, how much shared memory was needed to satisfy the request,
etc.

-Yonik


Re: Cached fq decreases performance

2015-09-04 Thread Yonik Seeley
On Thu, Sep 3, 2015 at 4:45 PM, Jeff Wartes  wrote:
>
> I have a query like:
>
> q=...&fq=enabled:true
>
> For purposes of this conversation, "fq=enabled:true" is set for every query, 
> I never open a new searcher, and this is the only fq I ever use, so the 
> filter cache size is 1, and the hit ratio is 1.
> The fq=enabled:true clause matches about 15% of my documents. I have some 20M 
> documents per shard, in a 5.3 solrcloud cluster.
>
> Under these circumstances, this alternate version of the query averages about 
> 1/3 faster, consumes less CPU, and generates less garbage:
>
> q=... +enabled:true
>
> So it appears I have a case where using the cached fq result is more 
> expensive than just putting the same restriction in the query.
> Does someone have a clear mental model of how “q” and “fq” interact?

Lucene seems to always be changing its execution model, so it can be
difficult to keep up.  What version of Solr are you using?
Lucene also changed how filters work, so now a filter is
incorporated with the query like so:

query = new BooleanQuery.Builder()
.add(query, Occur.MUST)
.add(pf.filter, Occur.FILTER)
.build();

It may be that term queries are no longer worth caching... if this is
the case, we could automatically not cache them.

It also may be the structure of the query that is making the
difference.  Solr is creating

(complicated stuff) +(filter(enabled:true))

If you added +enabled:true directly to an existing boolean query, that
may be more efficient for lucene to process (flatter structure).

If you haven't already, could you try putting parens around your
(complicated stuff) to see if it makes any difference?
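
I.e., something like this, with "(complicated stuff)" standing in for the real clauses (a sketch):

q=(complicated stuff) +enabled:true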

-Yonik


Re: Difference between Legacy Facets and JSON Facets

2015-09-03 Thread Yonik Seeley
On Wed, Sep 2, 2015 at 2:44 PM, Toke Eskildsen  wrote:
> When incrementing counters for String faceting, segment ordinal -> index 
> ordinal mapping takes place. Legacy facets has a mechanism where temporary 
> segment-specific counters are used. These are updated directly with the 
> segment ordinals and the mapping to global ordinals is performed after the 
> counting.

Good point Toke,
That optimization won't work when there's other things to calculate
(or sort by), but I can detect the "counts-only" case and use it then.

-Yonik


Re: Difference between Legacy Facets and JSON Facets

2015-09-02 Thread Yonik Seeley
On Wed, Sep 2, 2015 at 1:19 AM, Zheng Lin Edwin Yeo
<edwinye...@gmail.com> wrote:
> The type of field is text_general.

What are some typical values for this "content" field (i.e. how many
different words does the content field contain for each document)?

-Yonik

> I found that the problem mainly happens in the content field of the
> collections with rich text documents.
> It works fine for other files, and also collections indexed with CSV
> documents, even if the fieldType is text_general.
>
> Regards,
> Edwin
>
>
> On 2 September 2015 at 12:12, Yonik Seeley <ysee...@gmail.com> wrote:
>
>> On Tue, Sep 1, 2015 at 11:51 PM, Zheng Lin Edwin Yeo
>> <edwinye...@gmail.com> wrote:
>> > No, I've tested it several times after committing it.
>>
>> Hmmm, well something is really wrong for this orders of magnitude
>> difference.  I've never seen anything like that and we should
>> definitely try to get to the bottom of it.
>> What is the type of the field?
>>
>> -Yonik
>>


Re: Difference between Legacy Facets and JSON Facets

2015-09-01 Thread Yonik Seeley
That's pretty strange...
There can be caching differences.  Is this the first time the request
is executed after a commit?
What does executing it again show?

-Yonik

On Tue, Sep 1, 2015 at 9:47 PM, Zheng Lin Edwin Yeo
<edwinye...@gmail.com> wrote:
> Hi Yonik,
>
> Thanks for pointing out the difference.
>
> I've made the modification and tried the command below for JSON Facet,
> but it is still having a QTime of 410, as compared to the Legacy Facet
> QTime of 22:
> http://localhost:8983/solr/collection1/select?q=paint&json.facet={f:{field:content}}&rows=0
>
> Is this the same as the Legacy Facet query of
> http://localhost:8983/solr/collection1/select?q=paint&facet=true&facet.field=content&rows=0
> ?
>
>
> Regards,
> Edwin
>
>
> On 1 September 2015 at 23:24, Yonik Seeley <ysee...@gmail.com> wrote:
>
>> They aren't doing the same thing...
>>
>> The first URL is doing a straight facet on the content field.
>> The second URL is doing a facet on the content field and asking for an
>> additional statistic for each bucket.
>>
>> -Yonik
>>
>>
>> On Tue, Sep 1, 2015 at 11:08 AM, Zheng Lin Edwin Yeo
>> <edwinye...@gmail.com> wrote:
>> > I've tried the following commands and I found that the Legacy Faceting is
>> > actually much faster than JSON Faceting. Not sure why this is so, when the
>> > document from this link http://yonik.com/solr-count-distinct/ states that
>> > JSON Facets has a much lower request latency.
>> >
>> > (For Legacy Facet) - QTime: 22
>> >
>> > -
>> > http://localhost:8983/solr/collection1/select?q=paint&facet=true&facet.field=content&rows=0
>> >
>> > (For JSON Facet) - QTime: 1128
>> >
>> > -
>> > http://localhost:8983/solr/collection1/select?q=paint&json.facet={f:{type:terms,field:content,facet:{stat1:"hll(id)"}}}&rows=0
>> >
>> >
>> > Is there any problem with my URL for the JSON Facet?
>> >
>> >
>> > Regards,
>> >
>> > Edwin
>> >
>> >
>> >
>> > On 1 September 2015 at 16:51, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
>> > wrote:
>> >
>> >> Hi,
>> >>
>> >> I'm using Solr 5.2.1, and I would like to find out, what is the
>> difference
>> >> between Legacy Facets and JSON Facets in Solr? I was told that JSON
>> Facets
>> >> has a much lesser Request Latency, but I couldn't find any major
>> difference
>> >> in speed. Or must we have a larger index in order to have any
>> significant
>> >> difference?
>> >>
>> >> Is there any significant advantage to use JSON Faceting command instead
>> of
>> >> Legacy Faceting command?
>> >>
>> >> Regards,
>> >> Edwin
>> >>
>>


Re: Difference between Legacy Facets and JSON Facets

2015-09-01 Thread Yonik Seeley
On Tue, Sep 1, 2015 at 11:51 PM, Zheng Lin Edwin Yeo
 wrote:
> No, I've tested it several times after committing it.

Hmmm, well, something is really wrong to cause this orders-of-magnitude
difference.  I've never seen anything like that and we should
definitely try to get to the bottom of it.
What is the type of the field?

-Yonik


Re: Difference between Legacy Facets and JSON Facets

2015-09-01 Thread Yonik Seeley
They aren't doing the same thing...

The first URL is doing a straight facet on the content field.
The second URL is doing a facet on the content field and asking for an
additional statistic for each bucket.
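
Concretely, the counts-only equivalent of the legacy request would be something like this (a sketch using the same content field):

json.facet={ f : { type:terms, field:content } }

whereas the original JSON request additionally computes a distinct-count statistic per bucket:

json.facet={ f : { type:terms, field:content, facet:{ stat1:"hll(id)" } } }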

-Yonik


On Tue, Sep 1, 2015 at 11:08 AM, Zheng Lin Edwin Yeo
 wrote:
> I've tried the following commands and I found that the Legacy Faceting is
> actually much faster than JSON Faceting. Not sure why this is so, when the
> document from this link http://yonik.com/solr-count-distinct/ states that
> JSON Facets has a much lower request latency.
>
> (For Legacy Facet) - QTime: 22
>
> -
> http://localhost:8983/solr/collection1/select?q=paint&facet=true&facet.field=content&rows=0
>
> (For JSON Facet) - QTime: 1128
>
> -
> http://localhost:8983/solr/collection1/select?q=paint&json.facet={f:{type:terms,field:content,facet:{stat1:"hll(id)"}}}&rows=0
>
>
> Is there any problem with my URL for the JSON Facet?
>
>
> Regards,
>
> Edwin
>
>
>
> On 1 September 2015 at 16:51, Zheng Lin Edwin Yeo 
> wrote:
>
>> Hi,
>>
>> I'm using Solr 5.2.1, and I would like to find out what the difference is
>> between Legacy Facets and JSON Facets in Solr. I was told that JSON Facets
>> has a much lower request latency, but I couldn't find any major difference
>> in speed. Or must we have a larger index in order to have any significant
>> difference?
>>
>> Is there any significant advantage to use JSON Faceting command instead of
>> Legacy Faceting command?
>>
>> Regards,
>> Edwin
>>


Re: Lucene/Solr 5.0 and custom FieldCahe implementation

2015-08-27 Thread Yonik Seeley
UnInvertingReader makes indexed fields look like docvalues fields.
The caching itself is still done in FieldCache/FieldCacheImpl
but you could perhaps wrap what is cached there to either screen out
stuff or construct a new entry based on the user.

-Yonik


On Thu, Aug 27, 2015 at 12:55 PM, Jamie Johnson jej2...@gmail.com wrote:
 I think a custom UnInvertingReader would work as I could skip the process
 of putting things in the cache.  Right now in Solr 4.x though I am caching
 as before, but including the user's authorities in the key of the cache so
 we're not rebuilding the UnInvertedField on every request.  Where in 5.x is
 the object actually cached?  Will this be possible in 5.x?

 On Thu, Aug 27, 2015 at 12:32 PM, Yonik Seeley ysee...@gmail.com wrote:

 The FieldCache has become implementation rather than interface, so I
 don't think you're going to see plugins at that level (it's all
 package protected now).

 One could either subclass or re-implement UnInvertingReader though.

 -Yonik


 On Thu, Aug 27, 2015 at 12:09 PM, Jamie Johnson jej2...@gmail.com wrote:
  Also in this vein I think that Lucene should support factories for the
  cache creation as described @
  https://issues.apache.org/jira/browse/LUCENE-2394.  I'm not endorsing
 the
  patch that is provided (I haven't even looked at it) just the concept in
  general.
 
  On Thu, Aug 27, 2015 at 12:01 PM, Jamie Johnson jej2...@gmail.com
  wrote:

   That makes sense, then I could extend the SolrIndexSearcher by creating a
   different factory class that did whatever magic I needed.  If you create a
   Jira ticket for this please link it here so I can track it!  Again thanks
 
  On Thu, Aug 27, 2015 at 11:59 AM, Tomás Fernández Löbbe 
  tomasflo...@gmail.com wrote:
 
   I don't think there is a way to do this now. Maybe we should separate the
   logic of creating the SolrIndexSearcher to a factory. Moving this logic
   away from SolrCore is already a win, plus it will make it easier to unit
   test and extend for advanced use cases.
 
  Tomás
 
  On Wed, Aug 26, 2015 at 8:10 PM, Jamie Johnson jej2...@gmail.com
 wrote:
 
   Sorry to poke this again but I'm not following the last comment of how I
   could go about extending the solr index searcher and have the extension
   used.  Is there an example of this?  Again thanks

   Jamie
   On Aug 25, 2015 7:18 AM, Jamie Johnson jej2...@gmail.com wrote:

    I had seen this as well, if I over wrote this by extending
    SolrIndexSearcher how do I have my extension used?  I didn't see a way
    that could be plugged in.
    On Aug 25, 2015 7:15 AM, Mikhail Khludnev mkhlud...@griddynamics.com
    wrote:

     On Tue, Aug 25, 2015 at 2:03 PM, Jamie Johnson jej2...@gmail.com
     wrote:

      Thanks Mikhail.  If I'm reading the SimpleFacets class correctly, it
      delegates to DocValuesFacets when facet method is FC, what used to be
      FieldCache I believe.  DocValuesFacets either uses DocValues or
      builds them using the UninvertingReader.

     Ah.. got it. Thanks for reminding me of these details. It seems like
     even docValues=true doesn't help with your custom implementation.

      I am not seeing a clean extension point to add a custom
      UninvertingReader to Solr, would the only way be to copy the
      FacetComponent and SimpleFacets and modify as needed?

     Sadly, yes. There is no proper extension point. Also, consider
     overriding SolrIndexSearcher.wrapReader(SolrCore, DirectoryReader)
     where the particular UninvertingReader is created; there you can pass
     your own one, which refers to the custom FieldCache.

      On Aug 25, 2015 12:42 AM, Mikhail Khludnev
      mkhlud...@griddynamics.com wrote:

       Hello Jamie,
       I don't understand how it could choose DocValuesFacets (it occurs
       on docValues=true) field, but then switches to
       UninvertingReader/FieldCache which means docValues=false. If you
       can provide more details it would be great.
       Beside of that, I suppose you can only implement and inject your
       own UninvertingReader, I don't think there is an extension point
       for this.  It's too specific a requirement.

       On Tue, Aug 25, 2015 at 3:50 AM, Jamie Johnson jej2...@gmail.com
       wrote:

        as mentioned in a previous email I have a need to provide security
        controls at the term level.  I know that Lucene/Solr doesn't
        support this so I had baked something onto a 4.x baseline that was
        sufficient for my use cases.  I am now looking to move that
        implementation to 5.x and am running into an issue around
        faceting.  Previously we were able to provide a custom cache
        implementation that would create separate cache entries given a
        particular set of security controls, but in Solr 5 some faceting
        is delegated to DocValuesFacets which ...

Re: Lucene/Solr 5.0 and custom FieldCahe implementation

2015-08-27 Thread Yonik Seeley
On Thu, Aug 27, 2015 at 11:59 AM, Tomás Fernández Löbbe
tomasflo...@gmail.com wrote:
 I don't think there is a way to do this now. Maybe we should separate the
 logic of creating the SolrIndexSearcher to a factory.

That should probably be extended down to where lucene creates
searchers as well (delete-by-query).
Right now there's this hacky DeleteByQueryWrapper to handle wrapping
with UnInvertingReader.

-Yonik


Re: Lucene/Solr 5.0 and custom FieldCache implementation

2015-08-27 Thread Yonik Seeley
The FieldCache has become implementation rather than interface, so I
don't think you're going to see plugins at that level (it's all
package protected now).

One could either subclass or re-implement UnInvertingReader though.

-Yonik


On Thu, Aug 27, 2015 at 12:09 PM, Jamie Johnson jej2...@gmail.com wrote:
 Also in this vein I think that Lucene should support factories for the
 cache creation as described @
 https://issues.apache.org/jira/browse/LUCENE-2394.  I'm not endorsing the
 patch that is provided (I haven't even looked at it) just the concept in
 general.

 On Thu, Aug 27, 2015 at 12:01 PM, Jamie Johnson jej2...@gmail.com wrote:

 That makes sense, then I could extend the SolrIndexSearcher by creating a
 different factory class that did whatever magic I needed.  If you create a
 Jira ticket for this please link it here so I can track it!  Again thanks

 On Thu, Aug 27, 2015 at 11:59 AM, Tomás Fernández Löbbe 
 tomasflo...@gmail.com wrote:

 I don't think there is a way to do this now. Maybe we should separate the
 logic of creating the SolrIndexSearcher to a factory. Moving this logic
 away from SolrCore is already a win, plus it will make it easier to unit
 test and extend for advanced use cases.

 Tomás

 On Wed, Aug 26, 2015 at 8:10 PM, Jamie Johnson jej2...@gmail.com wrote:

  Sorry to poke this again but I'm not following the last comment of how I
  could go about extending the solr index searcher and have the extension
  used.  Is there an example of this?  Again thanks
 
  Jamie
  On Aug 25, 2015 7:18 AM, Jamie Johnson jej2...@gmail.com wrote:
 
   I had seen this as well, if I over wrote this by extending
   SolrIndexSearcher how do I have my extension used?  I didn't see a way
  that
   could be plugged in.
   On Aug 25, 2015 7:15 AM, Mikhail Khludnev 
 mkhlud...@griddynamics.com
   wrote:
  
   On Tue, Aug 25, 2015 at 2:03 PM, Jamie Johnson jej2...@gmail.com
  wrote:
  
Thanks Mikhail.  If I'm reading the SimpleFacets class correctly,
it delegates to DocValuesFacets when facet method is FC (what used to
be FieldCache, I believe).  DocValuesFacets either uses DocValues or
builds them using the UninvertingReader.
   
  
Ah.. got it. Thanks for the reminder about these details. It seems like even
docValues=true doesn't help with your custom implementation.
  
  
   
I am not seeing a clean extension point to add a custom
   UninvertingReader
to Solr, would the only way be to copy the FacetComponent and
   SimpleFacets
and modify as needed?
   
   Sadly, yes. There is no proper extension point. Also, consider
  overriding
   SolrIndexSearcher.wrapReader(SolrCore, DirectoryReader) where the
   particular UninvertingReader is created, there you can pass the own
 one,
   which refers to custom FieldCache.
  
  
On Aug 25, 2015 12:42 AM, Mikhail Khludnev 
   mkhlud...@griddynamics.com
wrote:
   
 Hello Jamie,
 I don't understand how it could choose DocValuesFacets (it
 occurs on
 docValues=true) field, but then switches to
   UninvertingReader/FieldCache
 which means docValues=false. If you can provide more details it
  would
   be
 great.
 Beside of that, I suppose you can only implement and inject your
 own
 UninvertingReader, I don't think there is an extension point for
  this.
It's
 too specific requirement.

 On Tue, Aug 25, 2015 at 3:50 AM, Jamie Johnson 
 jej2...@gmail.com
wrote:

  as mentioned in a previous email I have a need to provide
 security
 controls
  at the term level.  I know that Lucene/Solr doesn't support
 this
  so
   I
had
  baked something onto a 4.x baseline that was sufficient for my
 use
cases.
  I am now looking to move that implementation to 5.x and am
 running
   into
 an
  issue around faceting.  Previously we were able to provide a
  custom
cache
  implementation that would create separate cache entries given a
 particular
  set of security controls, but in Solr 5 some faceting is
 delegated
   to
  DocValuesFacets which delegates to UninvertingReader in my case
  (we
   are
 not
  storing DocValues).  The issue I am running into is that before
  5.x
   I
had
  the ability to influence the FieldCache that was used at the
 Solr
   level
 to
  also include a security token into the key so each cache entry
 was
scoped
  to a particular level.  With the current implementation the
   FieldCache
  seems to be an internal detail that I can't influence in
 any way.
  Is
this
  correct?  I had noticed this Jira ticket
  https://issues.apache.org/jira/browse/LUCENE-5427, is there
 any
movement
  on
  this?  Is there another way to influence the information that
 is
  put
into
  these caches?  As always thanks in advance for any suggestions.
 
  -Jamie
 



 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

Re: StrDocValues

2015-08-27 Thread Yonik Seeley
On Thu, Aug 27, 2015 at 2:43 PM, Erick Erickson erickerick...@gmail.com wrote:
 Right, when scoring, any document that scores 0 is removed from the
 results

Just to clarify, I think Jamie removed 0 scoring documents himself.

Solr has never done this itself.  Lucene used to a long time ago and
then stopped IIRC.

-Yonik


Re: StrDocValues

2015-08-26 Thread Yonik Seeley
On Wed, Aug 26, 2015 at 6:20 PM, Jamie Johnson jej2...@gmail.com wrote:
 I don't see it explicitly mentioned, but does the boost only get applied to
 the final documents/score that matched the provided query or is it called
 for each field that matched?  I'm assuming only once per document that
 matched the main query, is that right?

Correct.

-Yonik


Re: Search opening hours

2015-08-25 Thread Yonik Seeley
On Tue, Aug 25, 2015 at 5:02 PM, O. Klein kl...@octoweb.nl wrote:
 I'm trying to find the best way to search for stores that are open NOW.

It's probably not the *best* way, but assuming it's currently 4:10pm,
you could do

+open:[* TO 1610] +close:[1610 TO *]

And to account for days of the week, have different fields for each day:
openM, closeM, openT, closeT, etc.  Not super elegant, but it seems to
get the job done.

-Yonik
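
A minimal SolrJ sketch of this scheme, for illustration (the core name
"stores" and the exact per-day field names openM/closeM are assumptions,
not from the thread):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class OpenNow {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/stores");
    // 4:10pm on a Monday, encoded as HHmm in the per-day numeric fields
    SolrQuery q = new SolrQuery("+openM:[* TO 1610] +closeM:[1610 TO *]");
    QueryResponse rsp = client.query(q);
    System.out.println("stores open now: " + rsp.getResults().getNumFound());
    client.close();
  }
}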


Re: Performance improvements

2015-08-24 Thread Yonik Seeley
On Mon, Aug 24, 2015 at 6:33 PM, naga sharathrayapati
sharathrayap...@gmail.com wrote:
 In order to improve the query time of a nested faceting query (json facet
 api), I have used 'docValues' in the schema, optimized the index, and
 increased cache sizes (no evictions).

 I still cannot bring the query time to less than 1 sec.

 is there anything that i can do that can improve the performance?

What does the json facet request look like?

If you haven't already, you could try 5.3 as well... it could help if
you're calculating any facet functions that you aren't sorting by.

http://yonik.com/download/

-Yonik


Re: caches with faceting

2015-08-21 Thread Yonik Seeley
On Thu, Aug 20, 2015 at 3:46 PM, Kiran Sai Veerubhotla
sai.sq...@gmail.com wrote:
 i have used the json facet api and noticed that it's relying heavily on the filter
 cache.

Yes.  The root domain (the set of documents that match the base query
and filters) is cached in the filter cache.
For sub-facets, the set of documents that matches a particular bucket
also utilizes the filter cache.

 the index is optimized and all my fields have docValues='true', the
 number of documents is 2.6 million, and we are always faceting on almost
 all the documents with 'fq'

 the size of documentCache and queryResultCache is very minimal, 10? is
 that ok? i understand that documentCache stores the documents that are
 fetched from disk (segments merged) and the size is set to 2000

If your document size is large at all, you could probably reduce the
size of the doc cache with little impact.

 fieldCache is always zero. is it because of docValues?

Right.

 ver 5.2.1

Version 5.3 is out now.  The official latest version link hasn't
been changed yet, but I maintain a list of download links for
different versions here:
http://yonik.com/download/

-Yonik


Re: Disable caching

2015-08-19 Thread Yonik Seeley
On Tue, Aug 18, 2015 at 10:58 PM, Jamie Johnson jej2...@gmail.com wrote:
 Hmm...so I think I have things setup correctly, I have a custom
 QParserPlugin building a custom query that wraps the query built from the
 base parser and stores the user who is executing the query.  I've added the
 username to the hashCode and equals checks so I think everything is setup
 properly.  I ran a quick test and it definitely looks like my items are
 being cached now per user, which is really great.

 The issue that I'm running into now is that the FieldValueCache doesn't take
 into account the query, so the FieldValueCache is built for user a and then
 reused for user b, which is an issue for me.  In short I'm back to my
 NoOpCache for FieldValues.  It's great that I'm in a better spot for the
 others, but is there anything that can be done with FieldValues to take
 into account the requesting user?

I guess a cache implementation that gets the user through a thread
local and either wraps the original key with an object containing the
user, or delegates to a per-user cache underneath.

-Yonik
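
A rough sketch of that thread-local idea in plain Java (all class names
here are hypothetical, and the delegating SolrCache implementation itself
is omitted; the point is just that the user rides along in the key's
hashCode/equals):

import java.util.Objects;

// Thread-local holder that a request filter would populate per request.
public class UserContext {
  private static final ThreadLocal<String> USER = new ThreadLocal<>();
  public static void set(String user) { USER.set(user); }
  public static String get() { return USER.get(); }
}

// Wraps the original cache key so entries are scoped to the current user.
class UserScopedKey {
  final String user;
  final Object key;
  UserScopedKey(Object key) {
    this.user = UserContext.get();
    this.key = key;
  }
  @Override public int hashCode() {
    return 31 * Objects.hashCode(user) + Objects.hashCode(key);
  }
  @Override public boolean equals(Object o) {
    if (!(o instanceof UserScopedKey)) return false;
    UserScopedKey other = (UserScopedKey) o;
    return Objects.equals(user, other.user) && Objects.equals(key, other.key);
  }
}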


Re: Cache

2015-08-19 Thread Yonik Seeley
On Wed, Aug 19, 2015 at 8:00 PM, Nagasharath sharathrayap...@gmail.com wrote:
 Trying to evaluate the performance of queries with and without cache

Yeah, so to try and see how much a specific type of query costs, you can use
{!cache=false}

But I've seen some people trying to benchmark the performance of the
*system* with caching disabled, and that's not really a valid way to
go about it.

-Yonik



 On 18-Aug-2015, at 11:30 am, Yonik Seeley ysee...@gmail.com wrote:

 On Tue, Aug 18, 2015 at 12:23 PM, naga sharathrayapati
 sharathrayap...@gmail.com wrote:
 Is it possible to clear the cache through query?

 I need this for performance valuation.

 No, but you can prevent a query from being cached:
 q={!cache=false}my query

 What are you trying to test the performance of exactly?
 If you think queries will be highly unique, the best way of testing is
 to make your test queries highly unique (for example, adding a random
 number in the mix) so that the hit rate on the query cache won't be
 unrealistically high.

 -Yonik
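
A tiny sketch of the "random number in the mix" idea (the id field is
assumed to exist, as it does in most schemas; the extra clause matches
nothing but makes every generated query a distinct cache key):

import java.util.Random;

public class UniqueQueries {
  private static final Random RAND = new Random();

  // appends a no-op clause so each benchmark query is unique to the caches
  public static String next(String baseQuery) {
    return baseQuery + " OR id:nomatch_" + RAND.nextInt(Integer.MAX_VALUE);
  }
}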


Re: Disable caching

2015-08-18 Thread Yonik Seeley
On Tue, Aug 18, 2015 at 9:51 PM, Jamie Johnson jej2...@gmail.com wrote:
 Thanks, I'll try to delve into this.  We are currently using the parent
 query parser, within we could use {!secure} I think.  Ultimately I would
 want the solr qparser to actually do the work of parsing and I'd just wrap
 that.

Right... look at something like BoostQParserPlugin
it should be trivial to wrap any other type of query.

baseParser = subQuery(localParams.get(QueryParsing.V), null);
Query q = baseParser.getQuery();

q={!secure}my_normal_query
OR
q={!secure v=$qq}&qq=my_normal_query
OR
q={!secure}{!parent ...}
OR
q={!secure v=$qq}&qq={!parent ...}

-Yonik


  Are there any examples that I could look at for this?  It's not
 clear to me what to do in the qparser once I have the user auths though.
 Again thanks, this is really good stuff.
 On Aug 18, 2015 8:54 PM, Yonik Seeley ysee...@gmail.com wrote:

 On Tue, Aug 18, 2015 at 8:38 PM, Jamie Johnson jej2...@gmail.com wrote:
  I really like this idea in concept.  My query would literally be just a
  wrapper at that point, what would be the appropriate place to do this?

 It depends on how much you are trying to make everything transparent
 (that there is security) or not.

 First approach is explicitly changing the query types (you obviously
 need to make sure that only trusted code can run queries against solr
 for this method):
 q=foo:bar&fq=inStock:true
 q={!secure id=user}foo:bar&fq={!secure id=user}inStock:true
   you could even make the {!secure} qparser look for global security
 params so you don't need to repeat them.
 q={!secure}foo:bar&fq={!secure}inStock:true&security_id=user

 Second approach would prob involve a search component, probably that
 runs after the query component, that would handle wrapping any queries
 or filters in the prepare() phase.  This would be slightly more
 difficult since it would require ensuring that none of the solr code /
 features you use re-grab the q or fq parameters and re-parse without
 the opportunity for you to wrap them again.

  What would I need to do to the query to make it behave with the cache.

 Probably not much... record the credentials in the wrapper and use in
 the hashCode / equals.

 -Yonik


  Again thanks for the idea, I think this could be a simple way to use the
  caches.
 
  On Tue, Aug 18, 2015 at 8:31 PM, Yonik Seeley ysee...@gmail.com wrote:
 
  On Tue, Aug 18, 2015 at 8:19 PM, Jamie Johnson jej2...@gmail.com
 wrote:
   when you say a security filter, are you asking if I can express my
  security
   constraint as a query?  If that is the case then the answer is no.  At
  this
   point I have a requirement to secure Terms (a nightmare I know).
 
  Heh - ok, I figured as much.
 
  So... you could also wrap the main query and any filter queries in a
  custom security query that would contain the user, and thus still be
  able to use filter and query caches unmodified. I know... that's only
  a small part of the problem though.
 
  -Yonik
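
To make the wrapping concrete, a rough sketch of such a plugin (the class
names are hypothetical, modeled on BoostQParserPlugin; real term-level
security enforcement is a much bigger job and is not shown -- this only
scopes the caches per user):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Query;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;
import org.apache.solr.search.QueryParsing;
import org.apache.solr.search.SyntaxError;

public class SecureQParserPlugin extends QParserPlugin {
  @Override
  public void init(NamedList args) {}

  @Override
  public QParser createParser(String qstr, SolrParams localParams,
                              SolrParams params, SolrQueryRequest req) {
    return new QParser(qstr, localParams, params, req) {
      @Override
      public Query parse() throws SyntaxError {
        // delegate the real parsing to whatever parser handles "v"
        QParser baseParser = subQuery(localParams.get(QueryParsing.V), null);
        String user = params.get("security_id"); // however credentials arrive
        return new SecureQuery(baseParser.getQuery(), user);
      }
    };
  }
}

// Folds the user into hashCode/equals so filter/queryResult cache entries
// are per-user; dissolves into the wrapped query at rewrite time, so
// matching itself is unchanged.
class SecureQuery extends Query {
  private final Query in;
  private final String user;

  SecureQuery(Query in, String user) { this.in = in; this.user = user; }

  @Override public Query rewrite(IndexReader reader) throws IOException { return in; }
  @Override public String toString(String field) {
    return "secure(" + user + "," + in.toString(field) + ")";
  }
  @Override public int hashCode() {
    return 31 * in.hashCode() + (user == null ? 0 : user.hashCode());
  }
  @Override public boolean equals(Object o) {
    if (!(o instanceof SecureQuery)) return false;
    SecureQuery other = (SecureQuery) o;
    return in.equals(other.in)
        && (user == null ? other.user == null : user.equals(other.user));
  }
}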
 



Re: Disable caching

2015-08-18 Thread Yonik Seeley
On Tue, Aug 18, 2015 at 7:11 PM, Jamie Johnson jej2...@gmail.com wrote:
 Yes, my use case is security.  Basically I am executing queries with
 certain auths and when they are executed multiple times with differing
 auths I'm getting cached results.

If it's just simple stuff like top N docs returned, can't you just use
a security filter?

The queryResult cache uses both the main query and a list of filters
(and the sort order) for the cache key.

-Yonik


Re: Disable caching

2015-08-18 Thread Yonik Seeley
On Tue, Aug 18, 2015 at 8:19 PM, Jamie Johnson jej2...@gmail.com wrote:
 when you say a security filter, are you asking if I can express my security
 constraint as a query?  If that is the case then the answer is no.  At this
 point I have a requirement to secure Terms (a nightmare I know).

Heh - ok, I figured as much.

So... you could also wrap the main query and any filter queries in a
custom security query that would contain the user, and thus still be
able to use filter and query caches unmodified. I know... that's only
a small part of the problem though.

-Yonik


Re: Disable caching

2015-08-18 Thread Yonik Seeley
You can comment out (some) of the caches.

There are some caches like field caches that are more at the lucene
level and can't be disabled.

Can I ask what you are trying to prevent from being cached and why?
Different caches are for different things, so it would seem to be an
odd usecase to disable them all.  Security?

-Yonik


On Tue, Aug 18, 2015 at 6:52 PM, Jamie Johnson jej2...@gmail.com wrote:
 I see that if Solr is in realtime mode, caching is disabled within the
 SolrIndexSearcher that is created in SolrCore, but is there anyway to
 disable caching without being in realtime mode?  Currently I'm implementing
 a NoOp cache that implements SolrCache but returns null for everything and
 doesn't return anything on the get requests, but it would be nice to not
 need to do this by being able to disable caching in general.  Is this
 possible?

 -Jamie


Re: Disable caching

2015-08-18 Thread Yonik Seeley
On Tue, Aug 18, 2015 at 8:38 PM, Jamie Johnson jej2...@gmail.com wrote:
 I really like this idea in concept.  My query would literally be just a
 wrapper at that point, what would be the appropriate place to do this?

It depends on how much you are trying to make everything transparent
(that there is security) or not.

First approach is explicitly changing the query types (you obviously
need to make sure that only trusted code can run queries against solr
for this method):
q=foo:bar&fq=inStock:true
q={!secure id=user}foo:bar&fq={!secure id=user}inStock:true
  you could even make the {!secure} qparser look for global security
params so you don't need to repeat them.
q={!secure}foo:bar&fq={!secure}inStock:true&security_id=user

Second approach would prob involve a search component, probably that
runs after the query component, that would handle wrapping any queries
or filters in the prepare() phase.  This would be slightly more
difficult since it would require ensuring that none of the solr code /
features you use re-grab the q or fq parameters and re-parse without
the opportunity for you to wrap them again.

 What would I need to do to the query to make it behave with the cache.

Probably not much... record the credentials in the wrapper and use in
the hashCode / equals.

-Yonik


 Again thanks for the idea, I think this could be a simple way to use the
 caches.

 On Tue, Aug 18, 2015 at 8:31 PM, Yonik Seeley ysee...@gmail.com wrote:

 On Tue, Aug 18, 2015 at 8:19 PM, Jamie Johnson jej2...@gmail.com wrote:
  when you say a security filter, are you asking if I can express my
 security
  constraint as a query?  If that is the case then the answer is no.  At
 this
  point I have a requirement to secure Terms (a nightmare I know).

 Heh - ok, I figured as much.

 So... you could also wrap the main query and any filter queries in a
 custom security query that would contain the user, and thus still be
 able to use filter and query caches unmodified. I know... that's only
 a small part of the problem though.

 -Yonik



Re: Cache

2015-08-18 Thread Yonik Seeley
On Tue, Aug 18, 2015 at 12:23 PM, naga sharathrayapati
sharathrayap...@gmail.com wrote:
 Is it possible to clear the cache through query?

 I need this for performance valuation.

No, but you can prevent a query from being cached:
q={!cache=false}my query

What are you trying to test the performance of exactly?
If you think queries will be highly unique, the best way of testing is
to make your test queries highly unique (for example, adding a random
number in the mix) so that the hit rate on the query cache won't be
unrealistically high.

-Yonik


Re: Solr Caching (documentCache) not working

2015-08-17 Thread Yonik Seeley
On Mon, Aug 17, 2015 at 4:36 PM, Daniel Collins danwcoll...@gmail.com wrote:
 we had to turn off
 ALL the Solr caches (warming is useless at that kind of frequency)

Warming and caching are related, but different.  Caching still
normally makes sense without warming, and Solr is generally written
with the assumption that caches are present.

-Yonik


Re: SolrCloud Shard Order Hash Keys

2015-08-17 Thread Yonik Seeley
On Mon, Aug 17, 2015 at 8:00 PM, Sathiya N Sundararajan
ausat...@gmail.com wrote:
 Folks:

 Question regarding SolrCloud Shard Number (Ex: shardx) and associated hash
 ranges. We are in the process of identifying the best strategy to merge
 shards that belong to collections that are chronologically older which sees
 very low volume of searches compared to the collections with most recent
 data.

 What we ran into is that oftentimes we find that shard numbers and hash
 ranges don't necessarily correlate:

 shard1: 80000000-aaaaaaa9
 shard2: aaaaaaaa-d5555554
 shard3: d5555555-ffffffff ( holds the last range )
 shard4: 0-2aaaaaa9 ( holds the starting range )
 shard5: 2aaaaaaa-55555554
 shard6: 55555555-7fffffff


It's not really clear what you mean by correlate... but I think
there are 2 different points to make:
1) This is the hex representation of a signed integer, so 80000000 is
the start of the complete hash range, and 7fffffff is the end.
2) The numbers in shard1, shard2, etc, names are meaningless... just
names like shard_foo and shard_bar.  They do not need to be ordered in
any way with respect to each other.

-Yonik

 same goes for 'core_nodex', which does not follow order, nor does it
 correlate with shardx. Meaning core_node1 does not contain the keys
 starting from 0, nor does it map to shard1.

 {shard1=
   {range=80000000-aaaaaaa9,
    core_node5={core=post_NW_201508_shard1_replica1}},
  shard2=
   {range=aaaaaaaa-d5555554,
    core_node6={core=post_NW_201508_shard2_replica1}},
  shard3=
   {range=d5555555-ffffffff,
    core_node2={core=post_NW_201508_shard3_replica1}},
  shard4=
   {range=0-2aaaaaa9,
    core_node3={core=post_NW_201508_shard4_replica1}},
  shard5=
   {range=2aaaaaaa-55555554,
    core_node4={core=post_NW_201508_shard5_replica1}},
  shard6=
   {range=55555555-7fffffff,
    core_node1={core=post_NW_201508_shard6_replica1}}}


 Why would this be a concern ?

1. Let's say we merge the indexes of adjacent shards (to reduce the
number of shards in the collection). In this case it will be merging
core_node3 "0-2aaaaaa9" and core_node4 "2aaaaaaa-55555554". What would the
index directory of the new core_node be? core_node?
2. When we copy this data over to the cluster after recreating the
collection with a reduced number of shards, how would the cluster infer the
hash range from the index data, and how does it reconcile with the metadata
about the shards in the local filesystem of the cluster nodes?
3. How should we approach this problem to guarantee Solr picks up the
right key order from the merged indexes?



 *Solr 4.4*
 *HDFS for Index Storage*
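
To make point (1) above concrete, a small self-contained sketch that
reproduces the six ranges (the proportional rounding here matches the
ranges in this thread, though Solr's router code may differ in details):

public class ShardRanges {
  public static void main(String[] args) {
    final int shards = 6;
    final long min = Integer.MIN_VALUE; // 0x80000000, start of the hash space
    final long span = 1L << 32;         // 2^32 possible hash values
    for (int i = 0; i < shards; i++) {
      long start = min + span * i / shards;
      long end   = min + span * (i + 1) / shards - 1;
      // prints 80000000-aaaaaaa9, aaaaaaaa-d5555554, ... 55555555-7fffffff
      System.out.printf("shard%d: %08x-%08x%n", i + 1, (int) start, (int) end);
    }
  }
}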


Re: SOLR to pivot on date range query

2015-08-17 Thread Yonik Seeley
The JSON Facet API can embed any type of facet within any other type:
http://yonik.com/json-facet-api/

json.facet={
  dates : {
    type : range,
    field : entryDate,
    start : 2001-...,  // use full solr date format
    end : 2015...,
    gap : "+1MONTH",
    facet : {
      entryTypes : {
        type : terms,
        field : entryType
      }
    }
  }
}

-Yonik


On Mon, Aug 17, 2015 at 3:16 PM, Lewin Joy (TMS) lewin_...@toyota.com wrote:
 Hi,

 I have data that is coming in every day. I need to query the index for a time 
 range and give the facet counts ordered by different months.
 For this, I just have a solr date field, entryDate which captures the time.

 How do I make this query? I need the results like below.

 Jan-2015 (2000)
 entryType=Sales(750)
 entryType=Complaints(200)
 entryType=Feedback(450)
 Feb-2015(3200)
 entryType=Sales(1000)
 entryType=Complaints(250)
 entryType=Feedback(600)
 Mar-2015(2800)
 entryType=Sales(980)
 entryType=Complaints(220)
 entryType=Feedback(400)


 I tried Range queries on 'entryDate' field to order the result facets by 
 month.
 But, I am not able to pivot on the 'entryType' field to bring the counts of 
 sales, complaints and feedback type records by month.

 For now, I am creating another field at index time to have the value for 
 MONTH-YEAR derived from the 'entryDate' field.
 But for older records, it becomes a hassle. Is there a way I can handle this 
 at query time?
 Or is there a better way to handle this situation?

 Please let me know. Any thoughts / suggestions are valuable.

 Thanks,
 Lewin
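
If it helps, a minimal SolrJ sketch sending the facet above (the
collection name, date window, and sub-facet label are assumptions;
json.facet travels as an ordinary request parameter):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class MonthPivot {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/entries");
    SolrQuery q = new SolrQuery("*:*");
    q.setRows(0); // we only want the facet counts
    q.set("json.facet",
        "{dates:{type:range, field:entryDate,"
      + " start:'2015-01-01T00:00:00Z', end:'2015-04-01T00:00:00Z', gap:'+1MONTH',"
      + " facet:{types:{type:terms, field:entryType}}}}");
    System.out.println(client.query(q).getResponse().get("facets"));
    client.close();
  }
}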



Re: docValues

2015-08-09 Thread Yonik Seeley
Interesting... what type of field was this? (string or numeric? single
or multi-valued?)

Without docValues, the first request would be slow (due to building
the in-memory field cache entry), but after that it should be fast.

-Yonik


On Sun, Aug 9, 2015 at 11:31 AM, Nagasharath sharathrayap...@gmail.com wrote:
 I have tested with docValues and without docValues on the test indexes with a
 json nested faceting query.

 I noticed a performance boost with docValues. The response time with cached
 items and without cached items is good.

 I have noticed that the response time on the cached items of the index
 without docValues is not always constant (28 ms, 78 ms, 94 ms), whereas with
 docValues it is always constant (always 20 ms).

 Decided to go with docValues.

 On 08-Aug-2015, at 10:44 pm, Erick Erickson erickerick...@gmail.com wrote:

 Have you seen: https://cwiki.apache.org/confluence/display/solr/DocValues?

 What kind of speedup? How often are you committing? Is there a speed 
 difference
 after a while or on the first few queries?

 Details matter a lot for questions like this.

 Best,
 Erick

 On Sat, Aug 8, 2015 at 6:22 PM, Nagasharath sharathrayap...@gmail.com 
 wrote:
 Good

 Sent from my iPhone

 On 08-Aug-2015, at 8:12 pm, Aman Tandon amantandon...@gmail.com wrote:

 Hi,


 I am seeing a significant difference in the query time after using 
 docValue

 what kind of difference, is it good or bad?

 With Regards
 Aman Tandon

 On Sat, Aug 8, 2015 at 11:38 PM, Nagasharath sharathrayap...@gmail.com
 wrote:

 I am seeing a significant difference in the query time after using
 docValue.

 I am curious to know what's happening with 'docValue' included in the
 schema

 On 07-Aug-2015, at 4:31 pm, Shawn Heisey apa...@elyograg.org wrote:

 On 8/7/2015 11:47 AM, naga sharathrayapati wrote:
 JVM-Memory has gone up from 3% to 17.1%

 In my experience, a healthy Java application (after the heap size has
 stabilized) will have a heap utilization graph where the low points are
 between 50 and 75 percent.  If the low points in heap utilization are
 consistently below 25 percent, you would be better off reducing the heap
 size and allowing the OS to use that memory instead.

 If you want to track heap utilization, JVM-Memory in the Solr dashboard
 is a very poor tool.  Use tools like visualvm or jconsole.

 https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

 I need to add what I said about very low heap utilization to that wiki
 page.

 Thanks,
 Shawn



Re: SOLR Exception with SOLR Cloud 5.1 setup on Linux

2015-07-28 Thread Yonik Seeley
On Tue, Jul 28, 2015 at 6:54 PM, Shawn Heisey apa...@elyograg.org wrote:
 To get out of the hole you're in now, either build a new collection with
 the actual shard count that you want so it's correctly set up, or edit
 the clusterstate in zookeeper to change the hash range (change 80000000
 to ffffffff)

Actually, if you want a range that covers the entire 32 bit hash
space, it would be
80000000-7fffffff  (hex representations of signed integers).

-Yonik


Re: serious JSON Facet bug

2015-07-24 Thread Yonik Seeley
On Fri, Jul 24, 2015 at 8:03 PM, Nagasharath sharathrayap...@gmail.com wrote:
 Is there a jira logged for this issue?

* SOLR-7781: JSON Facet API: Terms facet on string/text fields with
sub-facets caused
  a bug that resulted in filter cache lookup misses as well as the filter cache
  exceeding its configured size. (yonik)

https://issues.apache.org/jira/browse/SOLR-7781

-Yonik


Re: serious JSON Facet bug

2015-07-23 Thread Yonik Seeley
On Thu, Jul 23, 2015 at 5:00 PM, Harry Yoo hyunat...@gmail.com wrote:
 Is there a way to patch? I am using 5.2.1 and using json facet in production.

First you should see if your queries tickle the bug...
check the size of the filter cache from the admin screen (under
plugins, filterCache)
and see if it's current size is larger than the configured maximum.

-Yonik


 On Jul 16, 2015, at 1:43 PM, Yonik Seeley ysee...@gmail.com wrote:

 To anyone using the JSON Facet API in released Solr versions:
 I discovered a serious memory leak while doing performance benchmarks
 (see http://yonik.com/facet_performance/ for some of the early results).

 Assuming you're in the evaluation / development phase of your project,
 I'd recommend using a recent developer snapshot for now:
 https://builds.apache.org/job/Solr-Artifacts-5.x/lastSuccessfulBuild/artifact/solr/package/

 The fix (and performance improvements) will also be in the next Solr
 release (5.3) of course.

 -Yonik



Re: java.lang.IllegalStateException: Too many values for UnInvertedField faceting on field content

2015-07-21 Thread Yonik Seeley
On Tue, Jul 21, 2015 at 3:09 AM, Ali Nazemian alinazem...@gmail.com wrote:
 Dear Erick,
 I found another thing: I checked the number of unique terms for this
 field using the schema browser, and it reported 1683404 terms! Does that
 exceed the maximum number of unique terms for the fcs facet method?

The real limit is not simple since the data is not stored in a simple
way (it's compressed).

 I read
 somewhere it should be more than 16m. Is that true?!

More like 16MB of delta-coded terms per block of documents (the index
is split up into 256 blocks for this purpose)

See DocTermOrds.java if you want more details than that.

-Yonik


serious JSON Facet bug

2015-07-16 Thread Yonik Seeley
To anyone using the JSON Facet API in released Solr versions:
I discovered a serious memory leak while doing performance benchmarks
(see http://yonik.com/facet_performance/ for some of the early results).

Assuming you're in the evaluation / development phase of your project,
I'd recommend using a recent developer snapshot for now:
https://builds.apache.org/job/Solr-Artifacts-5.x/lastSuccessfulBuild/artifact/solr/package/

The fix (and performance improvements) will also be in the next Solr
release (5.3) of course.

-Yonik


Re: FieldCache error for multivalued fields in json facets.

2015-07-13 Thread Yonik Seeley
On Mon, Jul 13, 2015 at 1:55 AM, Iana Bondarska yana2...@gmail.com wrote:
 Hi,
 I'm using the json query api for solr 5.2. When querying for metrics on
 multivalued fields, I get the error:
 can not use FieldCache on multivalued field: sales.

 I've found in the solr wiki that to avoid using the fieldcache I should set
 the facet.method parameter to enum.
 Now my question is how can I add the facet.method=enum parameter to the query?
 My original query looks like this:
 {limit:0,offset:0,facet:{facet:{facet:{mechanicnumbers_sum:sum(sales)},limit:0,field:brand,type:terms}}}

sum(field) is currently only implemented for single-valued numeric fields.
Can you make the sales field single-valued, or do you actually need
multiple values per document?

-Yonik


Re: Too many Soft commits and opening searchers realtime

2015-07-08 Thread Yonik Seeley
A realtime searcher is necessary for internal bookkeeping / uses if a
normal searcher isn't opened on a commit.
This searcher doesn't have caches and hence doesn't carry the weight
that a normal searcher would.  It's also invisible to clients (it
doesn't change the view of the index for normal searches).

Your hard autocommit at 8 minutes with openSearcher=false will trigger
a realtime searcher to open every 8 minutes along with the hard
commit.

-Yonik


On Tue, Jul 7, 2015 at 5:29 PM, Summer Shire shiresum...@gmail.com wrote:
 HI All,

 Can someone help me understand the following behavior.
 I have the following maxTimes on hard and soft commits

 yet I see a lot of Opening Searchers in the log
 org.apache.solr.search.SolrIndexSearcher- Opening Searcher@1656a258[main] 
 realtime
 also I see a soft commit happening almost every 30 secs
 org.apache.solr.update.UpdateHandler - start 
 commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
 <autoCommit>
   <maxTime>480000</maxTime>
   <openSearcher>false</openSearcher>
 </autoCommit>

 <autoSoftCommit>
   <maxTime>180000</maxTime>
 </autoSoftCommit>
 I tried disabling softCommit by setting maxTime to -1.
 On startup solrCore recognized it and logged "Soft AutoCommit: disabled"
 but I could still see softCommit=true
 org.apache.solr.update.UpdateHandler - start 
 commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
 <autoSoftCommit>
   <maxTime>-1</maxTime>
 </autoSoftCommit>

 Thanks,
 Summer


Re: Difference in WordDelimiterFilter behavior between 4.7.2 and 4.9.1

2015-07-08 Thread Yonik Seeley
On Wed, Jul 8, 2015 at 6:50 PM, Shawn Heisey apa...@elyograg.org wrote:
 After the fix (with luceneMatchVersion at 4.9), both aaa and bbb end
 up at position 2.

Yikes, that's definitely wrong.

-Yonik


Re: Tlog replay

2015-07-08 Thread Yonik Seeley
On Wed, Jul 8, 2015 at 12:31 PM, Summer Shire shiresum...@gmail.com wrote:
 Thanks Alessandro !

 Any idea on why I couldn't curl the solr core and pass the flag param ?

These flags are for internal use only.  Solr sets them, the client doesn't.

-Yonik


Re: Distributed queries hang in a non-SolrCloud environment, Solr 4.10.4

2015-07-06 Thread Yonik Seeley
Are you running with the stock Jetty-based server or did you configure
your own servlet container / config?  Any plugins / extensions to
Solr?

It would be odd for FastLRUCache to be involved - I don't think that
code has changed between 4.8.1 and 4.10.4

-Yonik


On Thu, Jul 2, 2015 at 3:36 PM, Ronald Wood rw...@smarsh.com wrote:

 We are running into an issue when doing distributed queries on Solr 4.10.4. 
 We do not use SolrCloud but instead keep track of shards that need to be 
 searched based on date ranges.

 We have been running distributed queries without incident for several years 
 now, but we only recently upgraded to 4.10.4 from 4.8.1.

 The query is relatively simple and involves 4 shards, including the 
 aggregator itself.

 For a while the server that is acting as the aggregator for the distributed 
 query handles the requests fine, but after an indefinite amount of usage (in 
 the range of 2-4 hours) it starts hanging on all distributed queries while 
 serving non-distributed versions  (no shards list is included) of the same 
 query quickly (9 ms).

 CPU, Heap and System Memory Usage do not seem unusual compared to other 
 servers.

 I had initially suspected that distributed searches combined with faceting 
 might be part of the issue, since I had seen some long-running threads that 
 seemed to spend a long time in the FastLRUCache when getting facets for a 
 single field. However, in the latest case of blocked queries, I am not seeing 
 that.

 We have two slaves that replicate from a master, and we saw the issue 
 recur after a while of client usage, ruling out a hardware issue.

 Does anyone have any suggestions for potential avenues of attack for getting 
 to the bottom of this? Or are there any known issues that could be implicated 
 in this?

 - Ronald S. Wood


Re: fq versus q

2015-06-24 Thread Yonik Seeley
Why is cache=false set for the filter?
Grouping uses a 2 pass algorithm by default, so that means that the
filter will need to be generated twice (I think) if caching is turned
off.

Also, when you try to use the fq version, what are you using for the
main query?

-Yonik


On Wed, Jun 24, 2015 at 7:28 AM, Esther Goldbraich
estherg...@il.ibm.com wrote:
 Hi,

 We are comparing the performance of fq versus q for queries that are
 actually filters and should not be cached.
 In part of queries we see strange behavior where q performs 5-10x better
 than fq. The question is why?

 An example1:
 q=maildate:{DATE1 TO DATE2} COMPARED TO fq={!cache=false}maildate:{DATE1
 TO DATE2}
 sort=maildate_sort* desc
 rows=50
 start=0
 group=true
 group.query=some query (without dates)
 group.query=*:*
 group.sort=maildate_sort desc
 additional fqs

 Schema:
 <field name="maildate" stored="true" indexed="true" type="tdate"/>
 <field name="maildate_sort" stored="false" indexed="false" type="tdate"
 docValues="true"/>

 Thank you,
 Esther
 -
 Esther Goldbraich
 Social Technologies  Analytics - IBM Haifa Research Lab
 Phone: +972-4-8281059


Re: Multivalued fields order of storing is guaranteed ?

2015-06-17 Thread Yonik Seeley
On Wed, Jun 17, 2015 at 6:44 AM, Alok Bhandari
alokomprakashbhand...@gmail.com wrote:
 Is it guaranteed that stored multivalued fields maintain the order of insertion?

Yes.

-Yonik
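
A quick SolrJ round trip illustrating the guarantee (the core name and
the tags_ss field are assumptions; the field must be stored and
multiValued in the schema):

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class MultiValueOrder {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/demo");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "mv-order-test");
    for (String v : new String[]{"first", "second", "third"}) {
      doc.addField("tags_ss", v); // values are sent in insertion order
    }
    client.add(doc);
    client.commit();
    // stored values come back in the same order they were added
    System.out.println(client.getById("mv-order-test").getFieldValues("tags_ss"));
    client.close();
  }
}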


Re: Parent/Child (Nested Document) Faceting

2015-06-15 Thread Yonik Seeley
On Mon, Jun 15, 2015 at 10:24 AM, Alessandro Benedetti
benedetti.ale...@gmail.com wrote:
 So why, in both cases, do we express the parent type?

 ( Note that regardless
 of which direction we are mapping (parents to children or children to
 parents) we provide a query that defines the complete set of parents in the
  index. In these examples, the parent filter is "type_s:book". )

 Is this necessary for implementation reasons? Is it there to prevent the use
 of other parent-children relations? Why don't we specify the children type?

That's just an implementation detail that separates the parents from
the children.
It's the way the original block join queries worked, so I just kept that part.
One could easily pass children and then assume that anything that
isn't marked as a child is a parent, but it would be different code to
implement that.

In the future, we may index more information by default when you index
nested child docs that would normally make specifying the parent
filter optional.

 Example:
 *Parent types* : book
 *Children types* : user_review, official_review

 Assuming we have in the same index this data.
 Given that, how can we distinguish between the 2 types of children?

Today, it's up to whoever is indexing the nested documents to add type
information (like type:book, type:review).  There are no requirements
on how this is done.  In the example above, you could have
type:user_review and type:official_review or you could keep the type
as review and add an additional isOfficial:true/false to
distinguish.

Then if we're mapping from children to parents, it's the
responsibility of the base query and filters (or however the facet
domain got created) to limit to one child type if they want.

For example, a filter might be fq=isOfficial:true if you are only
querying official reviews

*But* for mapping from parents to children, you've quickly identified
a weakness that is on my TODO list ;-)
Currently all children (and grandchildren, if multi-leveled) will be
selected when using blockChildren.  We need a filter (or filters) to
apply after the transition to screen out only those children we are
interested in *before* we calculate the facets.  That would include
type info (in the case of multiple child types), but wouldn't be
limited to that.

-Yonik


Parent/Child (Nested Document) Faceting

2015-06-13 Thread Yonik Seeley
Hey Folks, I'd love some feedback on the interface for nested document
faceting (or rather switching facet domains to/from parent/child).

See the bottom of this blog:
http://yonik.com/solr-nested-objects/

Issue #1: How to specify that one should change domains before faceting?

I originally started out with a new facet type (like query facet, but
switches domains).
So if you started out querying a child of type book, you would first
do a blockParent facet to map the domain to parents, and then put
the actual facet you wanted as a sub-facet.

q=book_review:xx  /* query some child-doc of book */
json.facet=
  {  // NOTE: this was my first pass... not the current interface
books : {
  type: blockParent,
      parentFilter : 'type:book',
  facet : {
authors : {
  type : terms,
  field : author
}
 }
  }
}

Although having a separate facet type to map domains is logically very
clean, it does introduce an additional level of indentation which may
not be desired.

So then I thought about including domain switching operations under a
domain directive in the facet itself:

json.facet=
{  // current form of a domain-switching facet
  authors : {
type: terms,
field: author,
    domain : {blockParent:'type:book'}
  }
}

I envision some other future options for domain, including the
ability to reset the domain with another query (ignoring your parent
domain), or adding additional filters to the domain before faceting,
or normal (non-block) joins.

Issue #2: Naming

I avoided toParent and toChild because people could be confused that
it would work on any sort of parent/child relationship (i.e. other
than nested documents).

I used blockParent and blockChildren because I was thinking about
block join.
One alternative that might be better could be nested (i.e. nestedParent).

Pluralization:
I picked the singular for blockParent and plural for blockChildren
since a single block has one parent and multiple children.  But you
could think about it in other ways since we're mapping a set of
documents at a time (i.e. both could be pluralized).

Options:
nestedParent, nestedChildren   // current option
nestedParents, nestedChildren // both plural
nestedChild, nestedParent// both singular

Feedback appreciated!

-Yonik
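
For reference, a minimal SolrJ sketch posting the current-form example
above (collection and field names are assumptions, and the syntax is the
proposed one, so it may still change):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class NestedDomainFacet {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/books");
    SolrQuery q = new SolrQuery("book_review:xx"); // base query matches child docs
    // map the domain to the parents, then facet on the parents' author field
    q.set("json.facet",
        "{authors:{type:terms, field:author, domain:{blockParent:'type_s:book'}}}");
    System.out.println(client.query(q).getResponse().get("facets"));
    client.close();
  }
}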


Re: Division with Stats Component when Grouping in Solr

2015-06-13 Thread Yonik Seeley
On Fri, Jun 12, 2015 at 10:30 AM, kingofhypocrites
kingofhypocri...@gmail.com wrote:
 I am migrating a database from SQL Server to Cassandra. Currently I have a
 setup as follows:

 - Log data in Cassandra
 - Summarize data in Spark and put into Cassandra summary tables
 - Query data in Solr

 Everything fits beautifully until I need to do stats on groups. I am hoping
 to get this to work with Solr so I can stick to one database, but I am not
 sure it's possible.

 If I had it in SQL Server, I could do it like so:
 SELECT
 site_id,
 keyword,
 SUM(visits) as visits,
 CONVERT(DECIMAL(13, 3), SUM(bounces)) / SUM(visits) as bounce_rate,
 SUM(pageviews) as pageviews,
 CONVERT(DECIMAL(13, 3), SUM(pageviews)) / SUM(visits) as
 avg_pages_per_visit
 FROM
 report_all_keywords_daily
 WHERE
 site_id = 55 AND date_key >= '20150606' AND date_key <= '20150608'
 GROUP BY
 site_id, keyword
 ORDER BY visits DESC

This is the closest we can get with the JSON Facet API today:

json.facet={
  sites: {
type : terms,
field : site_id,
    sort : "visits desc",
facet : {
  visits : sum(visits),
  bounces : sum(bounces),
  pageviews : sum(pageviews)
}
  }
}

That doesn't take into account keyword when sorting the buckets.
You could nest a keyword facet inside a site facet and thus calculate
the stats for the top N keywords per site:

json.facet={
  sites: {
    type : terms,
    field : site_id,
    facet : {
      keywords: {
        type : terms,
        field : keyword,
        sort : "visits desc",
        facet : {
          visits : sum(visits),
          bounces : sum(bounces),
          pageviews : sum(pageviews)
        }
      }
    }
  }
}

More info here:  http://yonik.com/json-facet-api/

-Yonik


Lucene/Solr Revolution 2015 Voting

2015-06-11 Thread Yonik Seeley
Hey Folks,

If you're interested in going to Lucene/Solr Revolution this year in Austin,
please vote for the sessions you would like to see!

https://lucenerevolution.uservoice.com/

-Yonik


Re: Multivalued OR query with equal score/rankings when any one value matches

2015-05-23 Thread Yonik Seeley
On Sat, May 23, 2015 at 1:29 PM, Troy Collinsworth
troycollinswo...@gmail.com wrote:
 While trying to query a multivalued String field for multiple values, when
 any one value matches the score is higher for the lower value and lower for
 the higher. I swapped the value order and it had no effect, so it isn't
 positional. I want the score to be the same irrespective of the value
 matched. I also still want the score highest when both values match.

It's a bit cumbersome, but you can make each clause a constant score query.
http://yonik.com/solr/query-syntax/#ConstantScoreQuery

userIds:890^=1 userIds:931^=1
or I think the following should work as well:
userIds:(890^=1 931^=1)

-Yonik


Re: Confused about whether Real-time Gets must be sent to leader?

2015-05-21 Thread Yonik Seeley
On Thu, May 21, 2015 at 3:15 PM, Timothy Potter thelabd...@gmail.com wrote:
 I'm seeing that RTG requests get routed to any active replica of the
 shard hosting the doc requested by /get ... I was thinking only the
 leader should handle that request since there's a brief window of time
 where the latest update may not be on the replica (albeit usually very
 brief) and the latest update is definitely on the leader.

There are different levels of consistency.
You are guaranteed that after an update completes, a RTG will retrieve
that version of the update (or later).
The fact that a replica gets the update after the leader is not
material to this guarantee since the update has not yet completed.

What can happen is that if you are doing multiple RTG requests, you
can see a later version of a document, then see a previous version
(because you're hitting different shards).  This will only be an issue
in certain types of use-cases.  Optimistic concurrency, for example,
will *not* be bothered by this phenomenon.

In the past, we've talked about an option to route search requests to
the leader.  But really, any type of server affinity would work to
ensure a monotonic view of a document's history.  Off the top of my
head, I'm not really sure what types of apps require it, but I'd be
interested in hearing about them.

-Yonik
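
As an aside, a sketch of the optimistic concurrency pattern mentioned
above (collection and field names are assumptions): read the current
_version_ and send it back with the update; Solr rejects the write with
a conflict if another writer got there first.

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class OptimisticUpdate {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/demo");
    SolrDocument current = client.getById("doc1"); // real-time get
    SolrInputDocument update = new SolrInputDocument();
    update.addField("id", "doc1");
    // a positive _version_ means: only write if the doc still has this version
    update.addField("_version_", current.getFieldValue("_version_"));
    update.addField("popularity", 42);
    client.add(update); // throws (HTTP 409 conflict) if the version is stale
    client.commit();
    client.close();
  }
}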


Re: please confirm: pseudo join queries can only be performed on fields of exactly the same type

2015-05-18 Thread Yonik Seeley
They should not have to be *exactly* the same type... just compatible
types such that the indexed tokens match.
When you used the keywordTokenizer, was there other analysis such as
lowercasing going on?

-Yonik


On Mon, May 18, 2015 at 10:26 AM, Matteo Grolla matteo.gro...@gmail.com wrote:
 Hi,
 I tried performing a join query
 {!join from=fA to=fB}
 where fA was string and fB was text  using keywordTokenizer

 it doesn't work, but it does if the fields are both string or both
 text.

 If you confirm this is the correct behavior I'll update the wiki

 thanks



Re: Solr 5.1 json facets: buckets are empty for TrieIntField

2015-05-15 Thread Yonik Seeley
That was previously found and fixed - can you try a recent nightly build?
https://builds.apache.org/job/Solr-Artifacts-5.x/lastSuccessfulBuild/artifact/solr/package/
-Yonik


On Fri, May 15, 2015 at 4:04 AM, Andrii Berezhynskyi
andrii.berezhyns...@home24.de wrote:
 I have a strange issue of facet buckets being empty for tint fields.
 I have the following schema:

 <fieldType name="tint" class="solr.TrieIntField" sortMissingLast="true"
 omitNorms="true"/>

 <fieldType name="string" class="solr.StrField" sortMissingLast="true"
 omitNorms="true"/>

 <fieldType name="tfloat" class="solr.TrieFloatField" sortMissingLast="true"
 omitNorms="true"/>

 ...

 <field name="price" type="tint" indexed="true" stored="false"
 multiValued="false"/>

 <field name="width" type="tfloat" indexed="true" stored="false"
 multiValued="true"/>

 <dynamicField name="*" type="string" indexed="true" stored="false"
 multiValued="true"/>

 Then I just import:

 [{
 sku: TEST_FACET,
 color: yellow,
 width: 100.23,
 price: 1200
 }]

 when I do the following faceting request:

 json.facet={

 colors:{terms: {field:color}},

 width:{terms: {field:width}},

 price:{terms: {field:price}}

 }

 I get empty buckets for price (tint):

 "facets": {
   "count": 1,
   "colors": { "buckets": [{ "val": "yellow", "count": 1 }] },
   "width": { "buckets": [{ "val": 100.23, "count": 1 }] },
   "price": { "buckets": [] }
 }

 Is somebody else able to reproduce this issue?

 Best regards,
 Andrii


Re: Real-Time get and Dynamic Fields: possible bug.

2015-05-14 Thread Yonik Seeley
Are the _facet fields the target of a copyField in the schema?
Realtime get either gets the values from the transaction log (and if
you didn't send it the values, they won't be there) or gets them from
the index to try and reconstruct what was sent in.

It's generally not recommended to have copyField targets stored, or
have a mix of explicitly set values and copyField values in the same
field.

-Yonik

On Thu, May 14, 2015 at 7:17 AM, Luis Cappa Banda luisca...@gmail.com wrote:
 Hi there,

 I have the following dynamicFields definition in my schema.xml:


 !-- I18n DynamicFields --

 dynamicField name=i18n* type=string indexed=true stored=true / !--
 DynamicFields used typically for faceting issues by copying values from
 other existing fields-- dynamicField name=*_facet type=string indexed=
 true stored=true multiValued=true /


 I've seen that when fetching documents with /select?q=id:whateverId, the
 results returned include both i18n* and *_facet fields filled. However,
 when using the real-time request handler (/get?ids=whateverIds) the results
 fetched include only i18n* dynamic fields, but *_facet ones are not
 included.

 I have the impression that in the /get RequestHandler the server-side regular
 expression used when parsing fields and field values to return documents
 with existing dynamic fields is wrong. From the client side, I've
 checked that the class DocField.java, which parses SolrDocuments into beans,
 uses the following matcher:

 } else if (annotation.value().indexOf('*') >= 0) {
   // dynamic fields are annotated as @Field("categories_*")
   // if the field was annotated as a dynamic field, convert the name into a pattern
   // the wildcard (*) is supposed to be either a prefix or a suffix, hence the use of replaceFirst
   name = annotation.value().replaceFirst("\\*", "\\.*");
   dynamicFieldNamePatternMatcher = Pattern.compile("^" + name + "$");
 } else {
   name = annotation.value();
 }

 So maybe a similar behavior on the server side is wrong. That's the only
 explanation I can find for why, when using /select, all fields are returned
 but when using /get those that match the *_facet regexp are not.

 If you can confirm that this is a bug (because maybe it is the expected
 behavior, but after some years using Solr I think it is not) I can create
 the JIRA issue and debug it more deeply to apply a patch, with the aim of
 helping.


 Regards,


 --
 - Luis Cappa


Re: Real-Time get and Dynamic Fields: possible bug.

2015-05-14 Thread Yonik Seeley
On Thu, May 14, 2015 at 12:49 PM, Luis Cappa Banda luisca...@gmail.com wrote:
 If you don't mark an indexed, 'facetable' field as stored, I was
 expecting not to be able to return its values, so faceting would make no sense.

Faceting does not use or retrieve stored field values.  The labels
faceting returns are from the indexed values.

"If you want the value returned, it needs to be stored" only applies
to fields in the main document list (the fields that are retrieved for
the top ranked documents).

-Yonik


Re: Real-Time get and Dynamic Fields: possible bug.

2015-05-14 Thread Yonik Seeley
On Thu, May 14, 2015 at 10:47 AM, Luis Cappa Banda luisca...@gmail.com wrote:
 Hi Yonik,

 Yes, they are the targets of copyFields in the schema.xml. These target
 fields are supposed to be used for some specific searchable (thus, tokenized)
 fields that in the future are candidates to be faceted to return some
 stats. For example, imagine that you have a field storing a directory path
 and you want to search by it. Also, you may want to facet by the whole
 directory path value (not just its terms). That's why I'm storing both
 field values: the searchable, tokenized one and the string 'facet candidate'
 one.

OK, but you don't need to *store* the values in _facet, right?
-Yonik


Re: JSON Facet Analytics API in Solr 5.1

2015-05-09 Thread Yonik Seeley
curl -g 
http://localhost:8983/solr/techproducts/query?q=*:*&json.facet={cats:{terms:{field:cat,sort:'count+asc'}}}

Using curl with everything in the URL is definitely trickier.
Everything needs to be URL escaped.  If it's not, curl will often
silently do nothing.
For example, when I had sort:'count asc', the command above would do
nothing.  When I remembered to URL encode the space as a +, it
started working.

It's definitely easier to use -d with curl...

curl "http://localhost:8983/solr/techproducts/query" -d
'q=*:*&json.facet={cats:{terms:{field:cat,sort:"count asc"}}}'

That also allows you to format it nicer for reading as well:

curl "http://localhost:8983/solr/techproducts/query" -d 'q=*:*&json.facet=
{cats:{terms:{
  field:cat,
  sort:"count asc"
}}}'

-Yonik


On Thu, May 7, 2015 at 5:32 PM, Frank li fudon...@gmail.com wrote:
 This one does not have a problem, but how do I include sort in this facet
 query? Basically, I want to write a solr query which can sort the facet
 count ascending. Something like
 http://localhost:8983/solr/demo/query?q=apple&json.facet={field=price sort='count asc'}
 http://localhost:8983/solr/demo/query?q=apple&json.facet=%7Bx:%27avg%28price%29%27%7D

 I really appreciate your help.

 Frank


 On Thu, May 7, 2015 at 2:24 PM, Yonik Seeley ysee...@gmail.com wrote:

 On Thu, May 7, 2015 at 4:47 PM, Frank li fudon...@gmail.com wrote:
  Hi Yonik,
 
  I am reading your blog. It is helpful. One question for you, for
 the following
  example,
 
  curl http://localhost:8983/solr/query -d 'q=*:*&rows=0&
   json.facet={
 categories:{
   type : terms,
   field : cat,
   sort : { x : desc},
   facet:{
 x : avg(price),
 y : sum(price)
   }
 }
   }
  '
 
 
  If I want to write it in the format of this:
 
 http://localhost:8983/solr/query?q=apple&json.facet={x:'avg(campaign_ult_defendant_cnt_is)'}
 ,
  how do I do?

 What problems do you encounter when you try that?

 If you try that URL with curl, be aware that curly braces {} are
 special globbing characters in curl.  Turn them off with the -g
 option:

 curl -g 
 http://localhost:8983/solr/demo/query?q=apple&json.facet={x:'avg(price)'}

 -Yonik
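
As an aside, if the shell escaping gets painful, SolrJ encodes parameters
for you (the collection name here is an assumption):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class FacetSortExample {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/demo");
    SolrQuery q = new SolrQuery("apple");
    // spaces and braces are fine here; no URL-encoding needed
    q.set("json.facet", "{prices:{type:terms, field:price, sort:'count asc'}}");
    System.out.println(client.query(q).getResponse().get("facets"));
    client.close();
  }
}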



Re: JSON Facet Analytics API in Solr 5.1

2015-05-07 Thread Yonik Seeley
On Thu, May 7, 2015 at 4:47 PM, Frank li fudon...@gmail.com wrote:
 Hi Yonik,

 I am reading your blog. It is helpful. One question for you, for the following
 example,

 curl http://localhost:8983/solr/query -d 'q=*:*&rows=0&
  json.facet={
categories:{
  type : terms,
  field : cat,
  sort : { x : desc},
  facet:{
x : avg(price),
y : sum(price)
  }
}
  }
 '


 If I want to write it in the format of this:
 http://localhost:8983/solr/query?q=apple&json.facet={x:'avg(campaign_ult_defendant_cnt_is)'},
 how do I do?

What problems do you encounter when you try that?

If you try that URL with curl, be aware that curly braces {} are
special globbing characters in curl.  Turn them off with the -g
option:

curl -g 
http://localhost:8983/solr/demo/query?q=apple&json.facet={x:'avg(price)'}

-Yonik


Re: A defect in Schema API with Add a New Copy Field Rule?

2015-05-06 Thread Yonik Seeley
On Wed, May 6, 2015 at 8:10 PM, Steve Rowe sar...@gmail.com wrote:
 It’s by design that you can copyField the same source/dest multiple times - 
 according to Yonik (not sure where this was discussed), this capability has 
 been used in the past to effectively boost terms in the source field.

Yep, used to be relatively common.
Perhaps the API could be cleaner though if we supported that by
passing an optional numTimes or numCopies?  Seems like a sane
delete / overwrite options would thus be easier?

-Yonik


Re: blocked in org.apache.solr.core.SolrCore.getSearcher(...) ?

2015-05-03 Thread Yonik Seeley
On Sun, May 3, 2015 at 12:30 PM, Clemens Wyss DEV clemens...@mysign.ch wrote:
 No load by/on any other thread.

Can we get a full thread dump (of all the threads) during this time?

This line:
org.apache.solr.core.SolrCore.getSearcher(boolean, boolean,
java.util.concurrent.Future[], boolean) line: 1646
Suggests there is another thread opening a searcher, and this thread
is simply waiting for it to finish.

-Yonik


Re: blocked in org.apache.solr.core.SolrCore.getSearcher(...) ?

2015-05-03 Thread Yonik Seeley
What are the other threads doing during this time?
-Yonik

On Sun, May 3, 2015 at 4:00 AM, Clemens Wyss DEV clemens...@mysign.ch wrote:
 Context: Solr 5.1, EmbeddedSolrServer(-mode)

 I have a rather big index/core (1G). I was able to initially index this core 
 and could then search within it. Now when I restart my app I am no longer able 
 to search.
  getSearcher seems to hang... :

 java.lang.Object.wait(long) line: not available [native method]
 java.lang.Object.wait() line: 502
 org.apache.solr.core.SolrCore.getSearcher(boolean, boolean, 
 java.util.concurrent.Future[], boolean) line: 1646
 org.apache.solr.core.SolrCore.getSearcher(boolean, boolean, 
 java.util.concurrent.Future[]) line: 1442
 org.apache.solr.core.SolrCore.getSearcher() line: 1377
 org.apache.solr.servlet.SolrRequestParsers$1(org.apache.solr.request.SolrQueryRequestBase).getSearcher()
  line: 111
 org.apache.solr.handler.component.QueryComponent.process(org.apache.solr.handler.component.ResponseBuilder)
  line: 304
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(org.apache.solr.request.SolrQueryRequest,
  org.apache.solr.response.SolrQueryResponse) line: 222
 org.apache.solr.handler.component.SearchHandler(org.apache.solr.handler.RequestHandlerBase).handleRequest(org.apache.solr.request.SolrQueryRequest,
  org.apache.solr.response.SolrQueryResponse) line: 143
 org.apache.solr.core.SolrCore.execute(org.apache.solr.request.SolrRequestHandler,
  org.apache.solr.request.SolrQueryRequest, 
 org.apache.solr.response.SolrQueryResponse) line: 1984
 org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(org.apache.solr.client.solrj.SolrRequest,
  java.lang.String) line: 177
 org.apache.solr.client.solrj.request.QueryRequest(org.apache.solr.client.solrj.SolrRequest).process(org.apache.solr.client.solrj.SolrClient,
  java.lang.String) line: 135
 org.apache.solr.client.solrj.embedded.EmbeddedSolrServer(org.apache.solr.client.solrj.SolrClient).query(java.lang.String,
  org.apache.solr.common.params.SolrParams) line: 943
 org.apache.solr.client.solrj.embedded.EmbeddedSolrServer(org.apache.solr.client.solrj.SolrClient).query(org.apache.solr.common.params.SolrParams)
  line: 958
 ...

 I am not seeing any interesting solr/lucene-log messages.

 What's possibly going wrong? Memory? Or does warming/starting up a searcher 
 take that long ... more than 15 minutes?

 Thx
 Clemens


Re: AW: blocked in org.apache.solr.core.SolrCore.getSearcher(...) ?

2015-05-03 Thread Yonik Seeley
https://issues.apache.org/jira/browse/SOLR-6679

If you don't use the suggest component, the easiest fix is to comment it out.
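
A sketch of the comment-out, assuming a solrconfig.xml that declares the
suggester roughly like this (component and spellchecker names are
hypothetical -- match them to whatever your config declares, and remove
any requestHandler reference to the component as well):

<!--
<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
  </lst>
</searchComponent>
-->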

-Yonik


On Sun, May 3, 2015 at 1:11 PM, Clemens Wyss DEV clemens...@mysign.ch wrote:
 I guess it's the searcherExecutor-7-thread-1 (30) which seems to be loading
 (updating?) the suggestions:

 org.apache.lucene.analysis.standard.StandardTokenizerImpl.getNextToken(StandardTokenizerImpl.java:764)
 org.apache.lucene.analysis.standard.StandardTokenizer.incrementToken(StandardTokenizer.java:150)
 org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:45)
 org.apache.lucene.analysis.shingle.ShingleFilter.getNextToken(ShingleFilter.java:390)
 org.apache.lucene.analysis.shingle.ShingleFilter.shiftInputWindow(ShingleFilter.java:467)
 org.apache.lucene.analysis.shingle.ShingleFilter.incrementToken(ShingleFilter.java:308)
 org.apache.lucene.analysis.ngram.EdgeNGramTokenFilter.incrementToken(EdgeNGramTokenFilter.java:85)
 org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:612)
 org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:344)
 org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:300)
 org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:232)
 org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:458)
 org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1350)
 org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1138)
 org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.add(AnalyzingInfixSuggester.java:381)
 org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.build(AnalyzingInfixSuggester.java:310)
 org.apache.lucene.search.suggest.Lookup.build(Lookup.java:193)
 org.apache.solr.spelling.suggest.Suggester.build(Suggester.java:161)
 org.apache.solr.spelling.suggest.Suggester.reload(Suggester.java:193)
 org.apache.solr.handler.component.SpellCheckComponent$SpellCheckerListener.newSearcher(SpellCheckComponent.java:739)
 org.apache.solr.core.SolrCore$5.call(SolrCore.java:1751)
 java.util.concurrent.FutureTask.run(Unknown Source)
 java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
 java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 java.lang.Thread.run(Unknown Source)



 -Original Message-
 From: Clemens Wyss DEV [mailto:clemens...@mysign.ch]
 Sent: Sunday, May 3, 2015 19:08
 To: solr-user@lucene.apache.org
 Subject: AW: AW: blocked in org.apache.solr.core.SolrCore.getSearcher(...) ?

 Hope this is „readable“:

 qtp787867107-59 (59)

   *   sun.management.ThreadImpl.getThreadInfo1(Native Method)
   *   sun.management.ThreadImpl.getThreadInfo(Unknown Source)
   *   org.apache.solr.handler.admin.ThreadDumpHandler.handleRequestBody(ThreadDumpHandler.java:69)
   *   org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
   *   org.apache.solr.handler.admin.InfoHandler.handleRequestBody(InfoHandler.java:85)
   *   org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
   *   org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:783)
   *   org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:282)
   *   org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
   *   org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
   *   org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
   *   org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
   *   org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
   *   org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
   *   org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
   *   org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
   *   org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
   *   org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
   *   org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
   *   org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
   *   org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
   *   org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
   *   org.eclipse.jetty.server.Server.handle(Server.java:368)
   *   org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)

Re: Bad contentType for search handler :text/xml; charset=UTF-8

2015-04-22 Thread Yonik Seeley
On Wed, Apr 22, 2015 at 11:00 AM, didier deshommes dfdes...@gmail.com wrote:
 curl "http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=foundation" -H "Content-type:application/json"

You're telling Solr the body encoding is JSON, but then you don't send any body.
We could catch that error earlier perhaps, but it still looks like an error?
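
For a body-less GET like this, the fix is simply to drop the header,
e.g. (a sketch of the same request):

curl "http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=foundation"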

-Yonik


Re: CDATA response is coming with &lt; instead of <

2015-04-21 Thread Yonik Seeley
On Tue, Apr 21, 2015 at 9:46 AM, mesenthil1
senthilkumar.arumu...@viacomcontractor.com wrote:
 We are using DIH for indexing XML files. As part of the xml we have xml
 enclosed with CDATA. It is getting indexed but in response the CDATA content
 is coming as decoded terms instead of symbols.

Your problem is ambiguous since we can't tell what is data, and what
is markup (transfer syntax).

If you were to index this same data using JSON, what would you pass?
Is it this:
<Images><imageuri>...
Or is it this?
<![CDATA[<Images><imageuri>...

If it's the former, you're already set - it's working that way now.
If it's the latter, then if you index that in XML you will need to
escape it like any other XML value.  Otherwise the XML parser will
remove the CDATA stuff before it gets to the indexing part of Solr.
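
A sketch of that escaping in an XML update message ("body" and the id
value are hypothetical; the goal is to index the literal CDATA text as
field content):

<add><doc>
  <field name="id">1</field>
  <field name="body">&lt;![CDATA[&lt;Images&gt;&lt;imageuri&gt;...]]&gt;</field>
</doc></add>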

-Yonik


Re: JSON Facet Analytics API in Solr 5.1

2015-04-18 Thread Yonik Seeley
Another minor benefit of the flatter structure is that the smart
merging of multiple JSON parameters works a little better in
conjunction with facets.

For example, if you already had a top_genre facet, you could insert
a top_author facet more easily:

json.facet.top_genre.facet.top_author={type:terms, field:author, limit:5}

(For anyone who doesn't know what smart merging is,  see
http://yonik.com/solr-json-request-api/ )
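
A rough end-to-end sketch (hypothetical "genre" and "author" fields and
a default collection assumed) -- the two json.facet parameters below get
merged into a single nested facet request:

curl http://localhost:8983/solr/query \
  --data-urlencode 'q=*:*' \
  --data-urlencode 'json.facet={top_genre:{type:terms, field:genre, limit:3}}' \
  --data-urlencode 'json.facet.top_genre.facet.top_author={type:terms, field:author, limit:5}'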

-Yonik


On Sat, Apr 18, 2015 at 11:36 AM, Yonik Seeley ysee...@gmail.com wrote:
 Thank you everyone for the feedback!

 I've implemented and committed the flatter structure:
 https://issues.apache.org/jira/browse/SOLR-7422
 So either form can now be used (and I'll be switching to the flatter
 method for examples when it actually reduces the levels).

 For those who want to try it out, I just made a 5.2-dev snapshot:
 https://github.com/yonik/lucene-solr/releases

 -Yonik


Re: JSON Facet Analytics API in Solr 5.1

2015-04-17 Thread Yonik Seeley
Does anyone have any thoughts on the current general structure of JSON facets?
The current general form of a facet command is:

facet_name : { facet_type : facet_args }

For example:

top_authors : { terms : {
  field : author,
  limit : 5,
}}

One alternative I considered in the past is having the type in the args:

top_authors : {
  type : terms,
  field : author,
  limit : 5
}

It's a flatter structure... probably better in some ways, but worse in
other ways.
Thoughts / preferences?

-Yonik


On Tue, Apr 14, 2015 at 4:30 PM, Yonik Seeley ysee...@gmail.com wrote:
 Folks, there's a new JSON Facet API in the just released Solr 5.1
 (actually, a new facet module under the covers too).

 It's marked as experimental so we have time to change the API based on
 your feedback.  So let us know what you like, what you would change,
 what's missing, or any other ideas you may have!

 I've just started the documentation for the reference guide (on our
 confluence wiki), so for now the best doc is on my blog:

 http://yonik.com/json-facet-api/
 http://yonik.com/solr-facet-functions/
 http://yonik.com/solr-subfacets/
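
 To give a quick taste, a minimal sketch against the techproducts
 example (assuming its /query handler and "cat" field):

 curl http://localhost:8983/solr/techproducts/query -d 'q=*:*&json.facet={categories:{type:terms,field:cat}}'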

 I'll also be hanging out more on the #solr-dev IRC channel on freenode
 if you want to hit me up there about any development ideas.

 -Yonik


Re: 5.1 'unique' facet function / calcDistinct

2015-04-16 Thread Yonik Seeley
Thanks for the feedback Levan!
Could you open a JIRA issue for unique() on numeric/date fields?
We don't yet have explicit numeric support for unique() and I think
some changes in Lucene 5 broke treating these fields as strings (i.e.
the ability to retrieve ords).
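
For part II, a possible sketch using the per-statistic local params
(added in the 5.x line, if I recall correctly; myIntField is a
placeholder) -- this asks for only the distinct count, without the
distinctValues list:

stats=true&stats.field={!countDistinct=true}myIntField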

-Yonik


On Thu, Apr 16, 2015 at 7:46 AM, levanDev levandev9...@gmail.com wrote:
 Hello,

 We are looking at a couple of options for using Solr to dynamically calculate
 unique values per field. In testing out Solr 5.1, I've been using the
 unique() facet function:

 http://yonik.com/solr-facet-functions/

 Overall, loving the JSON Facet API, especially the sub-faceting thus far.

 Here's my two part question:

 I. When I use the unique aggregation function on a string field
 (uniqueValues:'unique(myStringField)'), it works as expected, returns the
 number of unique values. However when I pass in an int -- or date -- field
 (uniqueValues:'unique(myIntField)') the resulting count is 0. The cause
 might be something else, but if it can be replicated by another user, it would
 be great to discuss the unique function further -- in our current use-case,
 we have a field where under 20 unique values are present but the values are
 ints.

 II. Is there a way to use the stats.calcdistinct functionality and only
 return the countDistinct portion of the response and not the full list of
 distinct values -- as provided in the distinctValues portion of the
 response? In a field with high cardinality the response size becomes too
 large.

 If there is no such option, could someone point me in the right direction
 for implementing a custom solution?

 Thank you for your time,
 Levan


Re: Using synonyms API

2015-04-15 Thread Yonik Seeley
I just tried this quickly on trunk and it still works.

/opt/code/lusolr_trunk$ curl
http://localhost:8983/solr/techproducts/schema/analysis/synonyms/english

{
  "responseHeader":{
    "status":0,
    "QTime":234},
  "synonymMappings":{
    "initArgs":{
      "ignoreCase":true,
      "format":"solr"},
    "initializedOn":"2015-04-14T19:39:55.157Z",
    "managedMap":{
      "GB":["GiB",
        "Gigabyte"],
      "TV":["Television"],
      "happy":["glad",
        "joyful"]}}}


Verify that your URL has the correct port number (your example below
doesn't), and that "default-collection" is actually the name of your
default collection (and not "collection1", which is the default for the
4.x series).

-Yonik


On Wed, Apr 15, 2015 at 11:11 AM, Mike Thomsen mikerthom...@gmail.com wrote:
 We recently upgraded from 4.5.0 to 4.10.4. I tried getting a list of our
 synonyms like this:

 http://localhost/solr/default-collection/schema/analysis/synonyms/english

 I got a not found error. I found this page on new features in 4.8

 http://yonik.com/solr-4-8-features/

 Do we have to do something like this with our schema to even get the
 synonyms API working?

 <!-- A text type for English text where stopwords and synonyms are managed
 using the REST API -->
 <fieldType name="managed_en" class="solr.TextField"
 positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.ManagedStopFilterFactory" managed="english" />
     <filter class="solr.ManagedSynonymFilterFactory" managed="english" />
   </analyzer>
 </fieldType>

 I wanted to ask before changing our schema.

 Thanks,

 Mike

