Re: Indexing books, chapters and pages

2016-03-01 Thread Alexandre Rafalovitch
Here is an - untested - possible approach. I might be missing something by combining these things in too many layers, but. 1) Have chapter as parent documents and pages as children within that. Block index them together. 2) On pages, include page text (probably not stored) as one field. Also

Re: Pull request protocol question

2016-03-01 Thread Jan Høydahl
Hi, Yes, the GitHub repo changed when we switched from svn to git, and you did the right thing. Please see http://lucene.apache.org/solr/news.html#8-february-2016-apache-lucenesolr-development-moves-to-git -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com > 1. mar.

Re: both way synonyms with ManagedSynonymFilterFactory

2016-03-01 Thread Jan Høydahl
Thanks for reporting! -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com > On 1 Mar 2016, at 13:31, Bjørn Hjelle wrote: > > Thanks a lot for following up on this and creating the patch! > > On Thu, Feb 25, 2016 at 2:49 PM, Jan Høydahl

Re: ExtendedDisMax configuration nowhere to be found

2016-03-01 Thread Jan Høydahl
We have a huge backlog of stale wiki.apache.org pages which should really just point to the refGuide. I replaced the eDisMax and DisMax pages with a simple link to the ref guide, since they do not provide any added value. -- Jan Høydahl, search solution architect Cominvent AS -

Re: SolrCloud - Strategy for recovering cluster states

2016-03-01 Thread Jeff Wartes
I’ve been running SolrCloud clusters in various versions for a few years here, and I can only think of two or three cases where the ZK-stored cluster state was broken in a way that I had to manually intervene by hand-editing the contents of ZK. I think I’ve seen Solr fixes go by for those

Re: SolrCloud - Strategy for recovering cluster states

2016-03-01 Thread danny teichthal
Hi, Just summarizing my questions if the long mail is a little intimidating: 1. Is there a best practice/automated tool for overcoming problems in cluster state coming from zookeeper disconnections? 2. Creating a collection via core admin is discouraged, is it true also for core.properties

Re: understand scoring

2016-03-01 Thread shamik
Doug, do we have a date for the hard copy launch?

Re: understand scoring

2016-03-01 Thread Doug Turnbull
Supposedly Late April, early May. But don't hold me to it until I see copy edits :) Of course looks like now you can read at least the full ebook in MEAP form. -Doug On Tue, Mar 1, 2016 at 2:57 PM, shamik wrote: > Doug, do we've a date for the hard copy launch? > > > > -- >

Solr sort and facet of nested doc fields

2016-03-01 Thread Jhon Smith
I am looking for a Solr solution for this model: Product (common fields) ->SKU (color, size) and STORE(store_name) <-(price)-> SKU. The listing contains only products, but other facets (store names, colors) and sorting (by min price) should work as well. I can have 3 types of docs: products, skus and

Pull request protocol question

2016-03-01 Thread Demian Katz
Hello, A few weeks ago, I submitted a pull request to Solr in association with a JIRA ticket, and it was eventually merged. More recently, I had an almost-trivial change I wanted to share, but on GitHub, my Solr fork appeared to have changed upstreams. Was the whole Solr repo moved and

Re: understand scoring

2016-03-01 Thread Doug Turnbull
Your screenshot doesn't seem to carry over. We don't have permission to access files in your personal gmail. But I might suggest pasting your Solr URL into Splainer: http://splainer.io. It's a tool we use to explain Solr results. I might further suggest this handy book :-p

understand scoring

2016-03-01 Thread michael solomon
Hi all, I'm struggling to understand Solr scoring and can't understand why I get these results: [image: Inline image 1] (If you don't see the pic:

Re: SolrJ 5.5 won't work with any of my servers

2016-03-01 Thread Shawn Heisey
On 3/1/2016 9:30 AM, Shai Erera wrote: > Ah ok, in my case even 5.4.1 didn't work with binary request writer, so > probably we don't face the same issue. If I set the writer to binary on 5.4.1, it fails too. My intent when I wrote the program was to use the binary writer, but apparently I didn't

Re: Indexing books, chapters and pages

2016-03-01 Thread Jack Krupansky
The chapter seems like the optimal unit for initial searches - just combine the page text with a line break between them or index as a multivalued field and set the position increment gap to be 1 so that phrases work. You could have a separate collection for pages, with each page as a Solr

Re: SolrJ 5.5 won't work with any of my servers

2016-03-01 Thread Shai Erera
Ah ok, in my case even 5.4.1 didn't work with binary request writer, so probably we don't face the same issue. Shai On Tue, Mar 1, 2016, 17:07 Shawn Heisey wrote: > On 2/29/2016 9:14 PM, Shai Erera wrote: > > Shawn, not sure if it's the same case as yours, but I've hit

Re: Indexing books, chapters and pages

2016-03-01 Thread Emir Arnautovic
Hi, Off the top of my head - this probably does not solve the problem completely, but may trigger brainstorming: Index chapters and include page break tokens. Use highlighting to return matches and make sure the fragment size is large enough to get the page break token. In such a scenario you should use slop

Re: Indexing books, chapters and pages

2016-03-01 Thread Walter Underwood
You could index both pages and chapters, with a type field. You could index by chapter with the page number as a payload for each token. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Mar 1, 2016, at 5:50 AM, Zaccheo Bagnati

RE: Solr regex documenation

2016-03-01 Thread Markus Jelsma
Just keep in mind the regex operates on tokenized and filtered tokens if you use solr.TextField. But on verbatim input in case of StringField. Markus -Original message- > From:Anil > Sent: Tuesday 1st March 2016 16:28 > To: solr-user@lucene.apache.org > Subject:
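
The distinction Markus describes can be illustrated with a small simulation. This is only a sketch: `toy_analyze` is a hypothetical stand-in for a TextField analysis chain (lowercasing, whitespace splitting and a crude plural-stripping "stemmer"), not one of Solr's analyzers, and Python's `re.fullmatch` is used to mimic the fact that a regex query has to match a whole indexed term.

```python
import re

def toy_analyze(text):
    """Hypothetical TextField-like chain: lowercase, split on whitespace,
    then strip a trailing 's' as a very crude stemmer."""
    tokens = text.lower().split()
    return [t[:-1] if t.endswith("s") and len(t) > 1 else t for t in tokens]

def regex_matches(pattern, terms):
    """Return the indexed terms that the pattern matches in full."""
    return [t for t in terms if re.fullmatch(pattern, t)]

value = "Solr Networks"
text_terms = toy_analyze(value)   # tokenized and filtered terms
string_terms = [value]            # StringField-like: one verbatim term

# The stemmed index no longer contains a term ending in "works".
stemmed_hits = regex_matches(r"[a-z]+works", text_terms)
# Matching against the stem does find the term.
stem_hits = regex_matches(r"[a-z]+work", text_terms)
# On the verbatim value the pattern has to cover the whole original string.
verbatim_hits = regex_matches(r"Solr N[a-z]+works", string_terms)
```

With a real field type the indexed terms depend entirely on the configured analyzers, so inspecting the analysis output for the field is the reliable way to see what a regex will run against.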

Re: Solr regex documenation

2016-03-01 Thread Anil
Regex is working, Markus. I need to investigate this particular pattern. Thanks for your responses. On 29 February 2016 at 19:16, Markus Jelsma wrote: > Hmm, if you have some stemming algorithm on that field, [a-z]+works is > never going to work but [a-z]+work should.

Re: SolrJ 5.5 won't work with any of my servers

2016-03-01 Thread Shawn Heisey
On 2/29/2016 9:14 PM, Shai Erera wrote: > Shawn, not sure if it's the same case as yours, but I've hit NPEs upgrading > to 5.5 too. In my case though, SolrJ talks to a proxy servlet before the > request gets routed to Solr, and that servlet didn't handle binary content > stream well. > > I had to

Fwd: Standard highlighting doesn't work for Block Join

2016-03-01 Thread michael solomon
Hi, I have Solr 5.4.1 and I'm trying to use the Block Join Query Parser to search in children and return the parent. I want to apply highlighting to the children, but it returns nothing. My q parameter: "q={!parent which="is_parent:true"} normal_text:(account)" highlight parameters: "hl=true=normal_text=="

Re: Indexing books, chapters and pages

2016-03-01 Thread Zaccheo Bagnati
Thank you, Jack for your answer. There are 2 reasons: 1. the requirement is to show in the result list both books and chapters grouped, so I would have to execute the query grouping by book, retrieve first, let's say, 10 books (sorted by relevance) and then for each book repeat the query grouping

Re: Indexing books, chapters and pages

2016-03-01 Thread Jack Krupansky
Any reason not to use the simplest structure - each page is one Solr document with a book field, a chapter field, and a page text field? You can then use grouping to group results by book (title text) or even chapter (title text and/or number). Maybe initially group by book and then if the user
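
A minimal sketch of the page-per-document layout Jack suggests, assuming hypothetical field names (`book`, `chapter`, `page`, `page_text`) and an ID scheme invented for the example. The `group`, `group.field` and `group.limit` parameters are Solr's result grouping parameters; everything else is illustrative.

```python
from urllib.parse import urlencode, parse_qs

def page_doc(book, chapter, page, text):
    """Build one Solr document per page (field names are assumptions)."""
    return {
        "id": f"{book}/{chapter}/{page}",
        "book": book,
        "chapter": chapter,
        "page": page,
        "page_text": text,
    }

def grouped_query(user_query, group_field="book", per_group=3):
    """Build the query string for a search grouped by book (or chapter)."""
    params = {
        "q": f"page_text:({user_query})",
        "group": "true",
        "group.field": group_field,
        "group.limit": per_group,   # pages returned per group
        "wt": "json",
    }
    return urlencode(params)

doc = page_doc("book1", "ch2", 17, "Opening an account ...")
query_string = grouped_query("account")
decoded = parse_qs(query_string)   # decoded again only to inspect the result
```

Passing `group_field="chapter"` gives the second-level grouping mentioned above, at the cost of a second request.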

Re: Indexing books, chapters and pages

2016-03-01 Thread Zaccheo Bagnati
Original data is quite well structured: it comes in XML with chapters and tags to mark the original page breaks on the paper version. In this way we have the possibility to restructure it almost as we want before creating the Solr index. On Tue, 1 Mar 2016 at 14:04, Jack Krupansky <

Re: Indexing books, chapters and pages

2016-03-01 Thread Jack Krupansky
To start, what is the form of your input data - is it already divided into chapters and pages? Or... are you starting with raw PDF files? -- Jack Krupansky On Tue, Mar 1, 2016 at 6:56 AM, Zaccheo Bagnati wrote: > Hi all, > I'm searching for ideas on how to define schema

Re: both way synonyms with ManagedSynonymFilterFactory

2016-03-01 Thread Bjørn Hjelle
Thanks a lot for following up on this and creating the patch! On Thu, Feb 25, 2016 at 2:49 PM, Jan Høydahl wrote: > Created https://issues.apache.org/jira/browse/SOLR-8737 to handle this > > -- > Jan Høydahl, search solution architect > Cominvent AS - www.cominvent.com >

behavior of ScriptTransformer in DIH has changed

2016-03-01 Thread Bernd Fehling
Just in case someone uses ScriptTransformer in DIH extensively and is thinking about going from Java7 to Java8, some behavior has changed due to the change from Mozilla Rhino (Java7) to Oracle Nashorn (Java8). Took me a while to figure out why my DIH crashed after changing to Java8. A good help is

Re: Indexing books, chapters and pages

2016-03-01 Thread Zaccheo Bagnati
That's fine. But how could I obtain, for example, a list of the pages containing a match? On Tue, 1 Mar 2016 at 13:01, Binoy Dalal wrote: > Here's one idea. > Index each chapter as a parent document and then have individual pages to > be the child

Re: Indexing books, chapters and pages

2016-03-01 Thread Binoy Dalal
Here's one idea. Index each chapter as a parent document and then have individual pages to be the child documents. That way for a match in any chapter, you also get the individual pages as documents for presentation. On Tue, 1 Mar 2016, 17:26 Zaccheo Bagnati, wrote: > Hi
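
The parent/child layout described above could be assembled along these lines. This is a sketch, not a tested setup: `_childDocuments_` is the key Solr's JSON update format uses for nested documents, while the ID scheme and the field names (`is_parent` and `normal_text` are borrowed from the block join thread elsewhere in this listing, `book_id`, `chapter_title` and `page_no` are made up) are assumptions for illustration.

```python
import json

def chapter_block(book_id, chapter_no, title, pages):
    """Build one chapter (parent) with its pages (children) as a single block.

    `pages` is a list of (page_no, text) tuples.
    """
    chapter_id = f"{book_id}-ch{chapter_no}"
    return {
        "id": chapter_id,
        "is_parent": True,          # marker field used to identify parents
        "book_id": book_id,
        "chapter_title": title,
        "_childDocuments_": [
            {"id": f"{chapter_id}-p{page_no}", "page_no": page_no, "normal_text": text}
            for page_no, text in pages
        ],
    }

block = chapter_block("book1", 2, "Accounts", [(17, "first page"), (18, "second page")])
payload = json.dumps([block])       # body for a JSON update request

# Block join query returning chapters whose pages match the term.
parent_query = '{!parent which="is_parent:true"}normal_text:(account)'
```

Parent and children have to be sent together in one block like this, and re-sent together whenever any of them changes.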

Indexing books, chapters and pages

2016-03-01 Thread Zaccheo Bagnati
Hi all, I'm searching for ideas on how to define schema and how to perform queries in this use case: we have to index books, each book is split into chapters and chapters are split into pages (pages represent original page cutting in printed version). We should show the result grouped by books and

[ISSUE] backup on a recovering index should fail

2016-03-01 Thread Gerald Reinhart
Hi, In short: backup on a recovering index should fail. We are using the backup command "http:// ... /replication?command=backup=/tmp" against one server of the cluster. Most of the time there is no issue with this command. But in some particular cases, the server can be in
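
For reference, a request of the kind quoted above can be assembled as follows. This is a sketch under assumptions: the quoted URL is elided and partly garbled in this listing, so the use of the replication handler's `location` parameter for the target directory is a guess at what was meant, and the host, port and core name are placeholders.

```python
from urllib.parse import urlencode

def backup_url(base_url, core, location):
    """Build a replication-handler backup request URL.

    `location` is assumed to be the backup target directory on the server.
    """
    query = urlencode({"command": "backup", "location": location})
    return f"{base_url}/{core}/replication?{query}"

# Placeholder host and core name, not taken from the original message.
url = backup_url("http://localhost:8983/solr", "mycore", "/tmp")
```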