Re: Welcome Mayya Sharipova as Lucene/Solr committer
Woohoo - awesome news! Congrats, Mayya! Trey Grainger Founder, Searchkernel <https://searchkernel.com> On Mon, Jun 8, 2020 at 12:58 PM jim ferenczi wrote: > Hi all, > > Please join me in welcoming Mayya Sharipova as the latest Lucene/Solr > committer. > Mayya, it's tradition for you to introduce yourself with a brief bio. > > Congratulations and Welcome! > > Jim >
Re: Welcome Eric Pugh as a Lucene/Solr committer
Congratulations, Eric! On Mon, Apr 6, 2020 at 8:21 AM Jan Høydahl wrote: > Hi all, > > Please join me in welcoming Eric Pugh as the latest Lucene/Solr committer! > > Eric has been part of the Solr community for over a decade, as a code > contributor, book author, company founder, blogger and mailing list > contributor! We look forward to his future contributions! > > Congratulations and welcome! It is a tradition to introduce yourself with > a brief bio, Eric. > > Jan Høydahl > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > >
Re: Welcome Alessandro Benedetti as a Lucene/Solr committer
Congrats, Alessandro! -Trey On Thu, Mar 19, 2020 at 10:36 AM Ignacio Vera wrote: > Welcome Alessandro! > > On Thu, Mar 19, 2020 at 3:21 PM Namgyu Kim wrote: > >> Congrats and welcome, Alessandro! :D >> >> On Thu, Mar 19, 2020 at 11:10 PM Michael Sokolov >> wrote: >> >>> Welcome Alessandro! >>> >>> On Wed, Mar 18, 2020 at 3:25 PM Alessandro Benedetti < >>> a.benede...@sease.io> wrote: >>> Thanks everyone for the warm welcome! I already know most of you, but for all the others, here's my brief bio :) I am Italian (possibly the only other Italian in addition to Tommaso) and I have been living in the UK for the last 7 years; I am currently based in London. I started working with Apache Solr back in 2010 (and a few months later with Apache Lucene); my first project was a search API that translated the Verity query language to Lucene syntax. At the time I was a junior software engineer with a background in Information Retrieval research at Roma3 University. Since then I have explored a lot of different use cases for Apache Lucene/Solr, and I have spent more and more time studying and working with the internals, across various companies and positions. My favourite projects in my career have been the design and implementation of a semantic search engine called Sensify (when I was working in a small and cohesive R&D team at Zaizi, with Spanish friends and colleagues from Seville), the Apache Solr Learning To Rank plugin from Bloomberg (and integrations/applications), and the Rated Ranking Evaluator project (an open source library for search quality evaluation that we contributed back to the community). In 2016 I founded my own company, Sease, where we try to build a bridge between academia and industry through open source software in the domain of Information Retrieval. As David mentioned, my main areas of contribution in Apache Lucene/Solr have been the More Like This, the Learning To Rank plugin, synonym expansion, and the Suggester component.
I have a lot of ideas on my to-do list, so stay tuned; we'll have a lot to discuss and innovate! It is a pleasure to join this group and I am sure we'll do great things together :) Cheers -- Alessandro Benedetti Search Consultant, R&D Software Engineer, Director www.sease.io On Wed, 18 Mar 2020 at 13:00, David Smiley wrote: > Hi all, > > Please join me in welcoming Alessandro Benedetti as the latest > Lucene/Solr committer! > > Alessandro has been contributing to Lucene and Solr in areas such as > More Like This, synonym boosting, and Suggesters for > years. Furthermore, he's been a help to many users on the solr-user > mailing > list and has helped others through his blog posts and presentations about > search. We look forward to his future contributions. > > Congratulations and welcome! It is a tradition to introduce yourself > with a brief bio, Alessandro. > > ~ David Smiley > Apache Lucene/Solr Search Developer > http://www.linkedin.com/in/davidwsmiley >
[PSA] Activate 2019 Call for Speakers ends May 8
Hi everyone, I wanted to do a quick PSA for anyone who may have missed the announcement last month to let you know the call for speakers is currently open through *Wednesday, May 8th*, for Activate 2019 (the Search and AI Conference), focused on the Apache Solr ecosystem and the intersection of Search and AI: https://lucidworks.com/2019/04/02/activate-2019-call-for-speakers/ The Activate Conference will be held September 9-12 in Washington, D.C. The conference, rebranded last year from "Lucene/Solr Revolution", is expected to grow considerably this year, and I'd like to encourage all of you working on advancements in the Lucene/Solr project or working on solving interesting problems in this space to consider submitting a talk if you haven't already. There are tracks dedicated to Solr Development, AI-powered Search, Search Development at Scale, and numerous other related topics - including tracks for key use cases like digital commerce - that I expect most on this list will find appealing. If you're interested in presenting (your conference registration fee will be covered if accepted), please submit a talk here: https://activate-conf.com/speakers/ Just wanted to make sure everyone in the development and user community here was aware of the conference and didn't miss the opportunity to submit a talk by Wednesday if interested. All the best, Trey Grainger Chief Algorithms Officer @ Lucidworks https://www.linkedin.com/in/treygrainger/
Re: Congratulations to the new Lucene/Solr PMC chair, Cassandra Targett
Congratulations, Cassandra! On Wed, Jan 2, 2019 at 9:31 AM Joel Bernstein wrote: > Congratulations Cassandra! > > Joel Bernstein > http://joelsolr.blogspot.com/ > > > On Wed, Jan 2, 2019 at 8:39 AM Tommaso Teofili > wrote: > >> Congrats Cassandra! >> On Wed, Jan 2, 2019 at 12:43 PM Shalin Shekhar Mangar >> wrote: >> > >> > Congratulations Cassandra! >> > >> > On Mon, Dec 31, 2018 at 1:08 PM Adrien Grand wrote: >> >> >> >> Every year, the Lucene PMC rotates the Lucene PMC chair and Apache >> >> Vice President position. >> >> >> >> This year we have nominated and elected Cassandra Targett as the >> >> chair, a decision that the board approved in its December 2018 >> >> meeting. >> >> >> >> Congratulations, Cassandra! >> >> >> >> -- >> >> Adrien >> > >> > >> > -- >> > Regards, >> > Shalin Shekhar Mangar.
[jira] [Commented] (SOLR-9418) Statistical Phrase Identifier
[ https://issues.apache.org/jira/browse/SOLR-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16604608#comment-16604608 ] Trey Grainger commented on SOLR-9418: - I uploaded an updated patch today for this issue, contributing the CareerBuilder version that the initial patch for this issue was loosely based upon (thanks for the contribution, CareerBuilder!). I've had several people ask about this feature recently, and others have proposed some alternative implementations as well, so I'm posting this as a reference implementation for future development. > Statistical Phrase Identifier > - > > Key: SOLR-9418 > URL: https://issues.apache.org/jira/browse/SOLR-9418 > Project: Solr > Issue Type: New Feature > Security Level: Public (Default Security Level. Issues are Public) > Reporter: Akash Mehta > Priority: Major > Attachments: SOLR-9418.patch, SOLR-9418.zip > > > h2. *Summary:* > The Statistical Phrase Identifier is a Solr contribution that takes in a > string of text and then leverages a language model (an Apache Lucene/Solr > inverted index) to predict how the inputted text should be divided into > phrases. The intended purpose of this tool is to parse short-text queries > into phrases prior to executing a keyword search (as opposed to parsing out each > keyword as a single term). > It is being generously donated to the Solr project by CareerBuilder, with the > original source code and a quickly demo-able version located here: > [https://github.com/careerbuilder/statistical-phrase-identifier] > h2. 
*Purpose:* > Assume you're building a job search engine, and one of your users searches > for the following: > _machine learning research and development Portland, OR software engineer > AND hadoop, java_ > Most search engines will natively parse this query into the following boolean > representation: > _(machine AND learning AND research AND development AND Portland) OR > (software AND engineer AND hadoop AND java)_ > While this query may still yield relevant results, it is clear that the > intent of the user wasn't understood very well at all. By leveraging the > Statistical Phrase Identifier on this string prior to query parsing, you can > instead expect the following parsing: > _{machine learning} \{and} \{research and development} \{Portland, OR} > \{software engineer} \{AND} \{hadoop,} \{java}_ > It is then possible to modify all the multi-word phrases prior to executing > the search: > _"machine learning" and "research and development" "Portland, OR" "software > engineer" AND hadoop, java_ > Of course, you could do your own query parsing to specifically handle the > boolean syntax, but the following would eventually be interpreted correctly > by Apache Solr and most other search engines: > _"machine learning" AND "research and development" AND "Portland, OR" AND > "software engineer" AND hadoop AND java_ > h2. *History:* > This project was originally implemented by the search team at CareerBuilder > in the summer of 2015 for use as part of their semantic search system. In the > summer of 2016, Akash Mehta implemented a much simpler version as a proof of > concept based upon publicly available information about the CareerBuilder > implementation (the first attached patch). 
In July of 2018, CareerBuilder > open sourced their original version > (https://github.com/careerbuilder/statistical-phrase-identifier) > and agreed to also donate the code to the Apache Software Foundation as a > Solr contribution. A Solr patch with the CareerBuilder version was added to > this issue on September 5th, 2018, and community feedback and contributions > are encouraged. > This issue was originally titled the "Probabilistic Query Parser", but the > name has now been updated to "Statistical Phrase Identifier" to avoid > ambiguity with Solr's query parsers (per some of the feedback on this issue), > as the implementation is actually just a mechanism for identifying phrases > statistically from a string and is NOT a Solr query parser. > h2. *Example usage:* > h3. (See contrib readme or configuration files in the patch for full > configuration details) > h3. *{{Request:}}* > {code:java} > http://localhost:8983/solr/spi/parse?q=darth vader obi wan kenobi anakin > skywalker toad x men magneto professor xavier{code} > h3. *{{Response:}}* > {
[jira] [Updated] (SOLR-9418) Statistical Phrase Identifier
[ https://issues.apache.org/jira/browse/SOLR-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trey Grainger updated SOLR-9418: Description: h2. *Summary:* The Statistical Phrase Identifier is a Solr contribution that takes in a string of text and then leverages a language model (an Apache Lucene/Solr inverted index) to predict how the inputted text should be divided into phrases. The intended purpose of this tool is to parse short-text queries into phrases prior to executing a keyword search (as opposed to parsing out each keyword as a single term). It is being generously donated to the Solr project by CareerBuilder, with the original source code and a quickly demo-able version located here: [https://github.com/careerbuilder/statistical-phrase-identifier] h2. *Purpose:* Assume you're building a job search engine, and one of your users searches for the following: _machine learning research and development Portland, OR software engineer AND hadoop, java_ Most search engines will natively parse this query into the following boolean representation: _(machine AND learning AND research AND development AND Portland) OR (software AND engineer AND hadoop AND java)_ While this query may still yield relevant results, it is clear that the intent of the user wasn't understood very well at all. 
By leveraging the Statistical Phrase Identifier on this string prior to query parsing, you can instead expect the following parsing: _{machine learning} \{and} \{research and development} \{Portland, OR} \{software engineer} \{AND} \{hadoop,} \{java}_ It is then possible to modify all the multi-word phrases prior to executing the search: _"machine learning" and "research and development" "Portland, OR" "software engineer" AND hadoop, java_ Of course, you could do your own query parsing to specifically handle the boolean syntax, but the following would eventually be interpreted correctly by Apache Solr and most other search engines: _"machine learning" AND "research and development" AND "Portland, OR" AND "software engineer" AND hadoop AND java_ h2. *History:* This project was originally implemented by the search team at CareerBuilder in the summer of 2015 for use as part of their semantic search system. In the summer of 2016, Akash Mehta implemented a much simpler version as a proof of concept based upon publicly available information about the CareerBuilder implementation (the first attached patch). In July of 2018, CareerBuilder open sourced their original version (https://github.com/careerbuilder/statistical-phrase-identifier) and agreed to also donate the code to the Apache Software Foundation as a Solr contribution. A Solr patch with the CareerBuilder version was added to this issue on September 5th, 2018, and community feedback and contributions are encouraged. This issue was originally titled the "Probabilistic Query Parser", but the name has now been updated to "Statistical Phrase Identifier" to avoid ambiguity with Solr's query parsers (per some of the feedback on this issue), as the implementation is actually just a mechanism for identifying phrases statistically from a string and is NOT a Solr query parser. h2. *Example usage:* h3. 
(See contrib readme or configuration files in the patch for full configuration details) h3. *{{Request:}}* {code:java} http://localhost:8983/solr/spi/parse?q=darth vader obi wan kenobi anakin skywalker toad x men magneto professor xavier{code} h3. *{{Response:}}*
{code:java}
{
  "responseHeader": { "status": 0, "QTime": 25 },
  "top_parsed_query": "{darth vader} {obi wan kenobi} {anakin skywalker} {toad} {x men} {magneto} {professor xavier}",
  "top_parsed_phrases": [
    "darth vader", "obi wan kenobi", "anakin skywalker",
    "toad", "x-men", "magneto", "professor xavier"],
  "potential_parsings": [{
    "parsed_phrases": ["darth vader", "obi wan kenobi", "anakin skywalker",
      "toad", "x-men", "magneto", "professor xavier"],
    "parsed_query": "{darth vader} {obi wan kenobi} {anakin skywalker} {toad} {x-men} {magneto} {professor xavier}",
    "score": 0.0}]
}
{code}
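As the Purpose section describes, a client would typically quote the returned multi-word phrases before running the real keyword search. A minimal sketch of that step (the helper name is hypothetical; the phrase list mirrors the example response above):

```python
def quote_phrases(phrases):
    """Wrap multi-word phrases in quotes so the downstream query parser
    treats each one as a unit; single terms are left bare."""
    return " ".join(f'"{p}"' if " " in p else p for p in phrases)

# phrases as returned in "top_parsed_phrases" above
phrases = ["darth vader", "obi wan kenobi", "anakin skywalker",
           "toad", "x-men", "magneto", "professor xavier"]
print(quote_phrases(phrases))
# → "darth vader" "obi wan kenobi" "anakin skywalker" toad x-men magneto "professor xavier"
```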
[jira] [Updated] (SOLR-9418) Statistical Phrase Identifier
[ https://issues.apache.org/jira/browse/SOLR-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trey Grainger updated SOLR-9418: Attachment: SOLR-9418.patch > Statistical Phrase Identifier > - > > Key: SOLR-9418 > URL: https://issues.apache.org/jira/browse/SOLR-9418 > Project: Solr > Issue Type: New Feature > Security Level: Public (Default Security Level. Issues are Public) > Reporter: Akash Mehta > Priority: Major > Attachments: SOLR-9418.patch, SOLR-9418.zip > > > The Statistical Phrase Identifier is a Solr contribution that takes in a > string of text and then leverages a language model (an Apache Lucene/Solr > inverted index) to predict how the inputted text should be divided into > phrases. The intended purpose of this tool is to parse short-text queries > into phrases prior to executing a keyword search (as opposed to parsing out each > keyword as a single term). > History > This project was originally implemented at CareerBuilder in the summer of > 2015 for use as part of their semantic search system. In 2018 > > The main aim of this requestHandler is to get the best parsing for a given > query. This basically means recognizing different phrases within the query. > We need some kind of training data to generate these phrases. The way this > project works is: > 1.) Generate all possible parsings for the given query. > 2.) For each possible parsing, a naive-Bayes-like score is calculated. > 3.) The main scoring is done by going through all the documents in the > training set and finding the probability of a bunch of words occurring together > as a phrase, as compared to them occurring randomly in the same document. Then > the score is normalized. Higher importance is given to the title field > than to the content field, which is configurable. > 4.) Finally, after scoring each of the possible parsings, the one with the > highest score is returned. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-9418) Statistical Phrase Identifier
[ https://issues.apache.org/jira/browse/SOLR-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trey Grainger updated SOLR-9418: Description: The Statistical Phrase Identifier is a Solr contribution that takes in a string of text and then leverages a language model (an Apache Lucene/Solr inverted index) to predict how the inputted text should be divided into phrases. The intended purpose of this tool is to parse short-text queries into phrases prior to executing a keyword search (as opposed to parsing out each keyword as a single term). History This project was originally implemented at CareerBuilder in the summer of 2015 for use as part of their semantic search system. In 2018 The main aim of this requestHandler is to get the best parsing for a given query. This basically means recognizing different phrases within the query. We need some kind of training data to generate these phrases. The way this project works is: 1.) Generate all possible parsings for the given query. 2.) For each possible parsing, a naive-Bayes-like score is calculated. 3.) The main scoring is done by going through all the documents in the training set and finding the probability of a bunch of words occurring together as a phrase, as compared to them occurring randomly in the same document. Then the score is normalized. Higher importance is given to the title field than to the content field, which is configurable. 4.) Finally, after scoring each of the possible parsings, the one with the highest score is returned. was: The main aim of this requestHandler is to get the best parsing for a given query. This basically means recognizing different phrases within the query. We need some kind of training data to generate these phrases. The way this project works is: 1.) Generate all possible parsings for the given query. 2.) For each possible parsing, a naive-Bayes-like score is calculated. 
3.) The main scoring is done by going through all the documents in the training set and finding the probability of a bunch of words occurring together as a phrase, as compared to them occurring randomly in the same document. Then the score is normalized. Higher importance is given to the title field than to the content field, which is configurable. 4.) Finally, after scoring each of the possible parsings, the one with the highest score is returned. > Statistical Phrase Identifier > - > > Key: SOLR-9418 > URL: https://issues.apache.org/jira/browse/SOLR-9418 > Project: Solr > Issue Type: New Feature > Security Level: Public (Default Security Level. Issues are Public) > Reporter: Akash Mehta > Priority: Major > Attachments: SOLR-9418.zip > > > The Statistical Phrase Identifier is a Solr contribution that takes in a > string of text and then leverages a language model (an Apache Lucene/Solr > inverted index) to predict how the inputted text should be divided into > phrases. The intended purpose of this tool is to parse short-text queries > into phrases prior to executing a keyword search (as opposed to parsing out each > keyword as a single term). > History > This project was originally implemented at CareerBuilder in the summer of > 2015 for use as part of their semantic search system. In 2018 > > The main aim of this requestHandler is to get the best parsing for a given > query. This basically means recognizing different phrases within the query. > We need some kind of training data to generate these phrases. The way this > project works is: > 1.) Generate all possible parsings for the given query. > 2.) For each possible parsing, a naive-Bayes-like score is calculated. > 3.) The main scoring is done by going through all the documents in the > training set and finding the probability of a bunch of words occurring together > as a phrase, as compared to them occurring randomly in the same document. Then > the score is normalized. Higher importance is given to the title field > than to the content field, which is configurable. > 4.) Finally, after scoring each of the possible parsings, the one with the > highest score is returned.
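The four steps described above can be sketched in a few lines. This is a toy illustration only: the `phrase_prob` table stands in for the corpus statistics the contrib derives from the inverted index (including the configurable title/content field weighting and normalization, which are omitted here), and the function names are invented:

```python
import math
from itertools import combinations

def all_parsings(tokens):
    """Step 1: enumerate every way to split the token sequence
    into contiguous phrases."""
    n = len(tokens)
    parsings = []
    for k in range(n):  # k = number of internal split points
        for splits in combinations(range(1, n), k):
            bounds = [0, *splits, n]
            parsings.append([" ".join(tokens[b:e])
                             for b, e in zip(bounds, bounds[1:])])
    return parsings

def best_parsing(tokens, phrase_prob):
    """Steps 2-4: score each parsing with a naive-Bayes-style sum of
    log-probabilities and return the highest-scoring segmentation.
    Unseen phrases get a tiny smoothing probability."""
    def score(parsing):
        return sum(math.log(phrase_prob.get(p, 1e-9)) for p in parsing)
    return max(all_parsings(tokens), key=score)

probs = {"darth vader": 0.5, "obi wan kenobi": 0.4, "toad": 0.3}
print(best_parsing(["darth", "vader", "toad"], probs))
# → ['darth vader', 'toad']
```

Note that step 1 is exponential in the number of tokens, which is tolerable only because the input is a short query string.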
[jira] [Updated] (SOLR-9418) Statistical Phrase Identifier
[ https://issues.apache.org/jira/browse/SOLR-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trey Grainger updated SOLR-9418: Summary: Statistical Phrase Identifier (was: Probabilistic-Query-Parser RequestHandler) > Statistical Phrase Identifier > - > > Key: SOLR-9418 > URL: https://issues.apache.org/jira/browse/SOLR-9418 > Project: Solr > Issue Type: New Feature > Security Level: Public (Default Security Level. Issues are Public) > Reporter: Akash Mehta > Priority: Major > Attachments: SOLR-9418.zip > > > The main aim of this requestHandler is to get the best parsing for a given > query. This basically means recognizing different phrases within the query. > We need some kind of training data to generate these phrases. The way this > project works is: > 1.) Generate all possible parsings for the given query. > 2.) For each possible parsing, a naive-Bayes-like score is calculated. > 3.) The main scoring is done by going through all the documents in the > training set and finding the probability of a bunch of words occurring together > as a phrase, as compared to them occurring randomly in the same document. Then > the score is normalized. Higher importance is given to the title field > than to the content field, which is configurable. > 4.) Finally, after scoring each of the possible parsings, the one with the > highest score is returned.
Re: Multiple Query-Time Analyzers in Solr
Doug - see https://issues.apache.org/jira/browse/SOLR-6492. I implemented
something previously that accomplishes the stated goal (it's part of Chapter
14 of *Solr in Action* <http://solrinaction.com>). Specifically, it is a text
field that allows you to dynamically change the analyzer(s) at index time (on
a per-document basis) or at query time (on a per-term basis) while using the
same actual field in the index.

One interesting note - you can actually choose *multiple* analyzers per field
for the same document or query (you're not restricted to one, as in your
proposed example). For example, if you wanted to index or query text in
multiple languages at the same time on the same text, you could specify the
analyzer for each language and it would run your text (independently) through
them all prior to indexing or as part of the query construction.

The syntax isn't elegant (it feels a bit ugly since you can switch analyzers
per-term - but therein also lies tremendous flexibility), but it works. It
currently requires you to pass in the analyzers you want to use either in the
content of your field (index time) or as part of your query, which means no
schema changes are necessary other than using a special field type for the
dynamic analyzer behavior. Something like the schema changes you proposed
would make it easier to use in most cases, though.

I've unfortunately done an awful job of keeping the JIRA moving along toward
getting it committed (busy schedule), but it's something you can take a look
at. Would be happy to collaborate with you if you're thinking about doing
work in this area.

All the best,

Trey Grainger
Co-Author, *Solr in Action*
SVP of Engineering @ Lucidworks

On Thu, Nov 23, 2017 at 11:03 AM, Doug Turnbull <dturnb...@opensourceconnections.com> wrote:

> An alternate solution could be to create a fieldType that was a
> "FacadeTextField" that searches a real TextField field with a different
> query-time analyzer.
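The per-term analyzer selection described above can be sketched conceptually like this. To be clear, this is not the SOLR-6492 syntax (the patch defines its own); the analyzer names, the plain-function stand-ins for Solr analysis chains, and the fallback mechanism are invented for illustration:

```python
# Map of named "analyzers"; each turns raw text into tokens. In Solr these
# would be full analysis chains; simple functions stand in for them here.
ANALYZERS = {
    "standard": lambda text: text.lower().split(),
    "keyword":  lambda text: [text],  # the whole input as a single token
}

def analyze_terms(terms_with_analyzers, default="standard"):
    """Analyze each (term, analyzer-name-or-None) pair, falling back to a
    default analyzer, mirroring the per-term switching described in the mail."""
    out = []
    for term, name in terms_with_analyzers:
        out.extend(ANALYZERS[name or default](term))
    return out

# The first clause uses the default analyzer; the second pins "keyword".
tokens = analyze_terms([("Action Movies", None), ("New York", "keyword")])
```

The point of the sketch is that the analyzer choice travels with each term of the query rather than being fixed per field, which is what allows multiple analyzers against the same indexed field.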
I.e. it would not have a physical representation in the index, but just
provide a handle to a "field" that is searched with a different query-time
analyzer.

For example, actor_nosyn is really a facade for searching "actor" with a
different analyzer:

[fieldType XML example stripped by the mailing-list archive]

This would allow edismax and other query parsers to remain unchanged when
searching, i.e.:

q=action movies&qf=actor actor_nosyn title text&defType=edismax

On Thu, Nov 23, 2017 at 10:50 AM Doug Turnbull <dturnbull@opensourceconnections.com> wrote:

>> I wonder if there's been any thought by the community to refactoring
>> fieldTypes to allow multiple query-time analyzers per indexed field?
>> Currently, to get different query-time analysis behavior you have to
>> duplicate a field. This is unfortunate duplication if, for example, I want
>> to search a field with query-time synonyms on/off. For higher-scale search
>> cases, allowing multiple query-time analyzers against a single index field
>> can be invaluable. It's one reason I created the Match Query Parser
>> (https://github.com/o19s/match-query-parser) and a major feature of
>> hon-lucene-synonyms (https://github.com/healthonnet/hon-lucene-synonyms).
>>
>> What I would propose is the ability to place multiple analyzers under a
>> field type. For example:
>>
>> [fieldType XML example stripped by the mailing-list archive]
>>
>> Notice how one query-time analyzer is "default" (and including only one
>> would make it the default).
>>
>> This would require allowing query parsers to pass the analyzer to use at
>> query time. I would propose introducing a syntax for configuring query
>> behavior per-field in edismax. Omitting this would continue to use the
>> default behavior/analyzer.
>> For example, one could query title and text as usual:
>>
>> q=action movies&qf=actor title text&defType=edismax
>>
>> I would propose introducing a syntax whereby qf could refer to a kind of
>> pseudo-field, configurable with a syntax similar to per-field facet
>> settings.
>>
>> For example, below "actor_nosyn" and "actor_syn" actually search the same
>> physical field, but are configured with different analyzers:
>>
>> q=action movies&qf=actor_syn actor_nosyn^10 title text&defType=edismax
>> &f.actor_nosyn.field=actor&f.actor_nosyn.analyzer=without_synonyms
>> &f.actor_syn.field=actor&f.actor_syn.analyzer=with_synonyms
>>
>> Indeed, I would propose extending this syntax to control some of the
>> query-specific properties that currently are tied to the fieldType, such as
>>
>> q=action movies&qf=actor_syn actor_nosyn^10 title &g
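The schema side of Doug's proposal (the XML examples were stripped by the mailing-list archive) might look roughly like the following sketch. The `name` and `default` attributes on the query analyzers are guesses reconstructed from the prose, not a committed Solr syntax; the tokenizer and filter factory classes are stock Solr ones:

```xml
<!-- Hypothetical sketch: a fieldType with multiple *named* query-time
     analyzers. One query analyzer is marked as the default; a query parser
     could then select "with_synonyms" or "without_synonyms" per pseudo-field. -->
<fieldType name="text_general" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query" name="with_synonyms" default="true">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query" name="without_synonyms">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The design win is that both "views" of the field share one set of postings, so toggling synonyms (or other query-time behavior) costs no extra index space, unlike today's duplicate-field workaround.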
[jira] [Updated] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON
[ https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Trey Grainger updated SOLR-10494:
---------------------------------
    Attachment: SOLR-10494.branch_7x.patch

Here's the most up-to-date patch against branch_7x.

> Switch Solr's Default Response Type from XML to JSON
> ----------------------------------------------------
>
>                 Key: SOLR-10494
>                 URL: https://issues.apache.org/jira/browse/SOLR-10494
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: Response Writers
>    Affects Versions: 7.0
>            Reporter: Trey Grainger
>            Priority: Blocker
>             Fix For: 7.0
>
>         Attachments: SOLR-10494, SOLR-10494, SOLR-10494.branch_7x.patch,
> SOLR-10494.patch, SOLR-10494-withdocs.patch, SOLR-10494-withdocs.patch
>
> Solr's default response format is still XML, despite the fact that Solr has
> supported the JSON response format for over a decade, developer mindshare has
> clearly shifted toward JSON over the years, and most modern/competing systems
> also use JSON format now by default.
> In fact, Solr's admin UI even explicitly adds wt=json to the request (by
> default in the UI) to override the default of wt=xml, so Solr's Admin UI
> effectively has a different default than the API.
> We have now introduced things like the JSON faceting API, and the new more
> modern /V2 apis assume JSON for the areas of Solr they cover, so clearly
> we're moving in the direction of JSON anyway.
> I'd like to propose that we switch the default response writer to JSON
> (wt=json) instead of XML for Solr 7.0, as this seems to me like the right
> direction and a good time to make this change with the next major version.
> Based upon feedback from the Lucene Dev's mailing list, we want to: > 1) Change the default response writer type to "wt=json" and also change to > "indent=on" by default > 2) Make no changes on the update handler side; it already works as desired > (it returns the response in the same content-type as the request unless the > "wt" is passed in explicitly). > 3) Keep the /query request handler around since people have already used it > for years to do JSON queries > 4) Add a commented-out "wt=xml" to the solrconfig.xml as a reminder for folks > on how to change (back) the response format. > The default format change, plus the addition of "indent=on" are back compat > changes, so we need to make sure we doc those clearly in the CHANGES.txt. > There will also need to be significant adjustments to the Solr Ref Guide, > Tutorial, etc. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
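Point 4) of the plan above would amount to something like the following in the example solrconfig.xml. This is a sketch based on the stock /select handler (whose echoParams and rows defaults are real), with the commented-out override added; the exact comment wording is invented:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <!-- Responses now default to wt=json with indent=on; uncomment the
         following line to restore the pre-7.0 XML response format. -->
    <!-- <str name="wt">xml</str> -->
  </lst>
</requestHandler>
```

Keeping the override visible but commented out gives upgrading users an obvious one-line path back to the old behavior without having to consult the docs.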
[jira] [Comment Edited] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON
[ https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16096408#comment-16096408 ]

Trey Grainger edited comment on SOLR-10494 at 7/21/17 3:57 PM:
---------------------------------------------------------------

Hi [~janhoy]. I picked it up a few times, but was developing against master
and kept running into stability issues with other tests every time I pulled.
I finally switched over to just developing on the 7.x branch instead to
prevent those stability issues. I have an updated patch which fixes some
(now) merge conflicts with the default configset changes, and all tests
appear to be passing except the
TestHierarchicalDocBuilder.testThreeLevelHierarchy one. I still haven't been
able to dig deep enough to understand what is affecting that one.

I DO know that the issue is related to indentation. If I go into the test and
override it to "indent=off" then it succeeds, but I have no idea why
indentation being on is causing the failure. Also, doing that in the test is
probably just masking another underlying problem, which may not even be
test-related, so I really need to understand exactly where things are
breaking down to know if it's a test problem or an actual functionality
problem somewhere. At any rate, I'll post my updated patch here shortly. I'm
a little tight on time this next week, so hopefully I can enlist someone else
to assist on my end later today, as well.

was (Author: solrtrey):
Hi [~janhoy]. I picked it up a few times, but was developing against master
and kept running into stability issues with other tests every time I pulled.
I finally switched over to just developing on the 7.x branch instead to
prevent those issues. I have an updated patch which fixes some (now) merge
conflicts with the default configset changes, and all tests appear to be
passing except the TestHierarchicalDocBuilder.testThreeLevelHierarchy one. I
still haven't been able to dig deep enough to understand what is affecting
that one.
I DO know that the issue is related to indentation. If I go into the test and
override it to "indent=off" then it succeeds, but I have no idea why
indentation being on is causing the failure. Also, doing that in the test is
probably just masking another underlying problem, which may not even be
test-related, so I really need to understand exactly where things are
breaking down to know if it's a test problem or an actual functionality
problem somewhere. At any rate, I'll post my updated patch here shortly. I'm
a little tight on time this next week, so hopefully I can enlist someone else
to assist on my end later today, as well.
[jira] [Commented] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON
[ https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16096408#comment-16096408 ]

Trey Grainger commented on SOLR-10494:
--------------------------------------

Hi [~janhoy]. I picked it up a few times, but was developing against master
and kept running into stability issues with other tests every time I pulled.
I finally switched over to just developing on the 7.x branch instead to
prevent those issues. I have an updated patch which fixes some (now) merge
conflicts with the default configset changes, and all tests appear to be
passing except the TestHierarchicalDocBuilder.testThreeLevelHierarchy one. I
still haven't been able to dig deep enough to understand what is affecting
that one.

I DO know that the issue is related to indentation. If I go into the test and
override it to "indent=off" then it succeeds, but I have no idea why
indentation being on is causing the failure. Also, doing that in the test is
probably just masking another underlying problem, which may not even be
test-related, so I really need to understand exactly where things are
breaking down to know if it's a test problem or an actual functionality
problem somewhere. At any rate, I'll post my updated patch here shortly. I'm
a little tight on time this next week, so hopefully I can enlist someone else
to assist on my end later today, as well.
[jira] [Comment Edited] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON
[ https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16064287#comment-16064287 ]

Trey Grainger edited comment on SOLR-10494 at 6/27/17 5:23 AM:
---------------------------------------------------------------

Ok, I think I'm nearly done. This patch ([^SOLR-10494.patch]) includes
removing all the extraneous "wt=json" and "indent=on" references, adding a
commented-out version of "wt=xml" to the example solrconfig.xml files, unit
test updates, some additional updates to the tutorials and docs (also
incorporating [~ctargett]'s), and updating the admin UI (query section) to
handle the new defaults.

The only issue I'm running into is that, for some reason I haven't figured
out yet, turning "indent" on has broken some of the parent/child relationship
tests (i.e. TestHierarchicalDocBuilder.testThreeLevelHierarchy,
SolrExampleTests.testChildDocTransformer). It initially appears to be some
XML parsing issue with the extra whitespace, which would be odd, but I
haven't dug in yet. Once I figure those out, I'll update the patch, and then
I think this will be ready for review.

was (Author: solrtrey):
Ok, I think I'm nearly done. This patch includes removing all the extraneous
"wt=json" and "indent=on" references, adding a commented-out version of
"wt=xml" to the example solrconfig.xml files, unit test updates, some
additional updates to the tutorials and docs (also incorporating
[~ctargett]'s), and updating the admin UI (query section) to handle the new
defaults.

The only issue I'm running into is that, for some reason I haven't figured
out yet, turning "indent" on has broken some of the parent/child relationship
tests (i.e. TestHierarchicalDocBuilder.testThreeLevelHierarchy,
SolrExampleTests.testChildDocTransformer). It initially appears to be some
XML parsing issue with the extra whitespace, which would be odd, but I
haven't dug in yet. Once I figure those out, I'll update the patch, and then
I think this will be ready for review.
[jira] [Updated] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON
[ https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Trey Grainger updated SOLR-10494:
---------------------------------
    Attachment: SOLR-10494.patch

Ok, I think I'm nearly done. This patch includes removing all the extraneous
"wt=json" and "indent=on" references, adding a commented-out version of
"wt=xml" to the example solrconfig.xml files, unit test updates, some
additional updates to the tutorials and docs (also incorporating
[~ctargett]'s), and updating the admin UI (query section) to handle the new
defaults.

The only issue I'm running into is that, for some reason I haven't figured
out yet, turning "indent" on has broken some of the parent/child relationship
tests (i.e. TestHierarchicalDocBuilder.testThreeLevelHierarchy,
SolrExampleTests.testChildDocTransformer). It initially appears to be some
XML parsing issue with the extra whitespace, which would be odd, but I
haven't dug in yet. Once I figure those out, I'll update the patch, and then
I think this will be ready for review.
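Since the wt/indent default change is back-incompatible for clients that parse responses, pinning the response format explicitly per request is the simple workaround. A small illustration; the helper function and the collection name are invented, but `wt` and `indent` are Solr's standard common query parameters:

```python
from urllib.parse import urlencode

def solr_select_url(base, q, **params):
    """Build a /select URL; passing wt='xml' pins the pre-7.0 response format
    instead of relying on the server-side default."""
    query = {"q": q, **params}
    return f"{base}/select?{urlencode(query)}"

# Omitting wt now yields indented JSON; pinning wt=xml restores old behavior.
url = solr_select_url("http://localhost:8983/solr/techproducts", "*:*", wt="xml")
```

Clients that always send `wt` explicitly are unaffected by the default change in either direction, which is why the issue only calls out docs and CHANGES.txt rather than client-library changes.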
Re: Release planning for 7.0
Anshum, I'll be working on what I hope is a final patch for SOLR-10494
(change default response format from XML to JSON) today. I expect to have it
uploaded in the late evening US time. It will still need to be reviewed and
(if acceptable) committed. It feels to me like the kind of change that should
only be made in a major release due to back-compat concerns. If this can make
it in after the branch is created, then no problem, but otherwise it might be
worth waiting another day before branching. Up to you.

-Trey

On Sat, Jun 24, 2017 at 4:52 PM, Anshum Gupta wrote:

> I'll create the 7x, and 7.0 branches *tomorrow*.
>
> Ishan, do you mean you would be able to close it by Tuesday? You would
> have to commit to both 7.0, and 7.x, in addition to master, but I think
> that should be ok.
>
> We also have SOLR-10803 open at this moment and we'd need to come to a
> decision on that as well in order to move forward with 7.0.
>
> P.S: If there are any objections to this plan, kindly let me know.
>
> -Anshum
>
> On Fri, Jun 23, 2017 at 5:03 AM Ishan Chattopadhyaya <ichattopadhy...@gmail.com> wrote:
>
>> Hi Anshum,
>>
>> > I will send out an email a day before cutting the branch, as well as
>> once the branch is in place.
>>
>> I'm right now on travel, and unable to finish SOLR-10574 until Monday
>> (possibly Tuesday).
>>
>> Regards,
>> Ishan
>>
>> On Tue, Jun 20, 2017 at 5:08 PM, Anshum Gupta wrote:
>>
>>> From my understanding, there's not really a 'plan' but some intention to
>>> release a 6.7 at some time if enough people need it, right? In that case I
>>> wouldn't hold back anything for a 6x line release and cut the 7x, and 7.0
>>> branches around, but not before, the coming weekend. I will send out an
>>> email a day before cutting the branch, as well as once the branch is in
>>> place.
>>>
>>> If anyone has any objections to that, do let me know.
>>>
>>> Once that happens, we'd have a feature freeze on the 7.0 branch but we
>>> can take our time to iron out the bugs.
>>> >>> @Alan: Thanks for informing. I'll make sure that LUCENE-7877 is >>> committed before I cut the branch. I have added the right fixVersion to the >>> issue. >>> >>> -Anshum >>> >>> >>> >>> On Mon, Jun 19, 2017 at 8:33 AM Erick Erickson >>> wrote: >>> Anshum: I'm one of the people that expect a 6.7 release, but it's more along the lines of setting expectations than having features I really want to get in to the 6x code line. We nearly always have "just a few things" that someone would like to put in, and/or a bug fix or two that surfaces. I expect people to back-port stuff they consider easy/beneficial to 6.x for "a while" as 7.0 solidifies, at their discretion of course. Think of my position as giving people a target for tidying up 6.x rather than a concrete plan ;). Just seems to always happen. And if there is no 6.7, that's OK too. Additions to master-2 usually pretty swiftly stop as the hassle of merging any change into 3 code lines causes people to pick what goes into master-2 more carefully ;) Erick On Mon, Jun 19, 2017 at 8:03 AM, Alan Woodward wrote: > I’d like to get https://issues.apache.org/jira/browse/LUCENE-7877 in for 7.0 > - should be able to commit in the next couple of days. > > Alan Woodward > www.flax.co.uk > > > On 19 Jun 2017, at 15:45, Anshum Gupta wrote: > > Hi everyone, > > Here's the update about 7.0 release: > > There are still unresolved blockers for 7.0. 
> Solr (12):
> https://issues.apache.org/jira/browse/SOLR-6630?jql=project%20%3D%20Solr%20AND%20fixVersion%20%3D%20%22master%20(7.0)%22%20and%20resolution%20%3D%20Unresolved%20and%20priority%20%3D%20Blocker
>
> Lucene (None):
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20%22Lucene%20-%20Core%22%20AND%20fixVersion%20%3D%20%22master%20(7.0)%22%20AND%20resolution%20%3D%20Unresolved%20AND%20priority%20%3D%20Blocker
>
> Here are the ones that are unassigned:
> https://issues.apache.org/jira/browse/SOLR-6630
> https://issues.apache.org/jira/browse/SOLR-10887
> https://issues.apache.org/jira/browse/SOLR-10803
> https://issues.apache.org/jira/browse/SOLR-10756
> https://issues.apache.org/jira/browse/SOLR-10710
> https://issues.apache.org/jira/browse/SOLR-9321
> https://issues.apache.org/jira/browse/SOLR-8256
>
> The ones that are already assigned, I'd request you to update the JIRA so
> we can track it better.
>
> In addition, I am about to create another one as I wasn't able to extend
> SolrClient easily without a code duplication on master.
>
> This brings us to - 'when can we cut the branch'. I can
[jira] [Commented] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON
[ https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062350#comment-16062350 ]

Trey Grainger commented on SOLR-10494:
--------------------------------------

bq. Also should we mark this as a blocker for 7.0 to change it? - [~varunthacker]

I just updated it to be a blocker, Varun. I'm working on what should be the
final patch today. Hopefully this can be reviewed and make it in for 7.0.
[jira] [Updated] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON
[ https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Trey Grainger updated SOLR-10494:
---------------------------------
    Priority: Blocker  (was: Minor)

> Switch Solr's Default Response Type from XML to JSON
> ----------------------------------------------------
>
>                 Key: SOLR-10494
>                 URL: https://issues.apache.org/jira/browse/SOLR-10494
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public (Default Security Level. Issues are Public)
>    Affects Versions: master (7.0)
>            Reporter: Trey Grainger
>            Priority: Blocker
>             Fix For: master (7.0)
>
>         Attachments: SOLR-10494, SOLR-10494, SOLR-10494-withdocs.patch, SOLR-10494-withdocs.patch
>
> Solr's default response format is still XML, despite the fact that Solr has supported the JSON response format for over a decade, developer mindshare has clearly shifted toward JSON over the years, and most modern/competing systems also use JSON format now by default.
> In fact, Solr's admin UI even explicitly adds wt=json to the request (by default in the UI) to override the default of wt=xml, so Solr's Admin UI effectively has a different default than the API.
> We have now introduced things like the JSON faceting API, and the new, more modern /V2 APIs assume JSON for the areas of Solr they cover, so clearly we're moving in the direction of JSON anyway.
> I'd like to propose that we switch the default response writer to JSON (wt=json) instead of XML for Solr 7.0, as this seems to me like the right direction and a good time to make this change with the next major version.
> Based upon feedback from the Lucene Dev's mailing list, we want to:
> 1) Change the default response writer type to "wt=json" and also change to "indent=on" by default
> 2) Make no changes on the update handler side; it already works as desired (it returns the response in the same content-type as the request unless "wt" is passed in explicitly).
> 3) Keep the /query request handler around, since people have already used it for years to do JSON queries
> 4) Add a commented-out "wt=xml" to the solrconfig.xml as a reminder for folks on how to change (back) the response format.
> The default format change, plus the addition of "indent=on", are back-compat changes, so we need to make sure we doc those clearly in CHANGES.txt. There will also need to be significant adjustments to the Solr Ref Guide, Tutorial, etc.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
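For point 4 above, the commented-out reminder would presumably sit in the search handler's defaults in solrconfig.xml. A sketch of what that might look like (the /select handler shown abridged; this is an illustration, not the actual patch):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="rows">10</str>
    <!-- Uncomment to restore the pre-7.0 XML default response format: -->
    <!-- <str name="wt">xml</str> -->
    <!-- <str name="indent">off</str> -->
  </lst>
</requestHandler>
```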
[jira] [Commented] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON
[ https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061521#comment-16061521 ]

Trey Grainger commented on SOLR-10494:
--------------------------------------
Thanks, [~ctargett]! I'm building off your patch and making the final changes. I've been a bit slammed this week and am unavailable to work on this for the next 24-36 hours, but I expect to have the next (hopefully final, or close to it) patch posted sometime on Sunday (in the U.S.).
[jira] [Commented] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON
[ https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056131#comment-16056131 ]

Trey Grainger commented on SOLR-10494:
--------------------------------------
Yes, I'll address all of the code/config changes above. I'll get the patch updated to include the indent=on change first (fixing unit tests now... more of them broke than I was expecting due to indentation), and then do the cleanup of the configs, admin UI, and READMEs as a follow-on patch. Once those are in, I can take a look at the ref guide, website, and quickstart, though I'm afraid I may need some help to pull all of those off in any reasonable timeframe for 7.0, as I'd expect there to be a lot of changes required there.
[jira] [Updated] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON
[ https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Trey Grainger updated SOLR-10494:
---------------------------------
    Attachment: SOLR-10494

New patch fixing a precommit error. The earlier comment about unclosed resources was apparently pre-existing (those are warnings, not errors); I only noticed them because of an unrelated error, so I'm going to ignore those. Working on indent=on by default for the next patch.
[jira] [Updated] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON
[ https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Trey Grainger updated SOLR-10494:
---------------------------------
    Attachment: SOLR-10494

Initial patch, with all unit tests broken by the change now passing. I haven't changed to indent=on by default yet, or removed the explicit setting of json in various places, as I've been trying to change one variable at a time to minimize complications. For some reason, switching to json by default has caused ant precommit to complain about resource leaks in about 60 places. I'm not sure what is causing these at the moment, but I want to address that first before adding any additional changes to the patch.
[jira] [Commented] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON
[ https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042701#comment-16042701 ]

Trey Grainger commented on SOLR-10494:
--------------------------------------
Question: I'm making indent=on the default. Any objections if I make indent=on the default for all TextResponseWriters, or do I need to limit the change to only the "wt=json" (now default writer) case? The writers impacted, from what I can tell, are:

GEOJSONWriter
JSONWriter
XMLWriter
SchemaXMLWriter
PHPWriter
PythonWriter
RubyWriter

It's a little complicated because most of these (geojson, php, python, ruby) actually inherit from the JSONWriter, so if I need to leave indent=off on those then I have to go in and set it explicitly on them, since their base class will now have indent on by default. Unless anyone objects, I'm just going to set indent=on by default on all of these. Please let me know if anyone disagrees.
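The inheritance wrinkle described in that comment can be illustrated with a toy sketch (plain Python with made-up class names, not Solr's actual writer hierarchy): once the base writer's default flips to indented output, every subclass silently inherits it unless it overrides the default explicitly.

```python
class JSONWriter:
    # Hypothetical stand-in for the base writer: indent now defaults to on.
    default_indent = True

    def writes_indented(self, params):
        # An explicit indent request parameter always wins;
        # otherwise fall back to the class-level default.
        if "indent" in params:
            return params["indent"] == "on"
        return self.default_indent


class PythonWriter(JSONWriter):
    # Inherits default_indent=True automatically from the base class...
    pass


class RubyWriter(JSONWriter):
    # ...so keeping the old un-indented behavior would require an
    # explicit per-subclass override like this.
    default_indent = False


print(PythonWriter().writes_indented({}))              # inherits the new default
print(RubyWriter().writes_indented({}))                # explicitly opted out
print(RubyWriter().writes_indented({"indent": "on"}))  # request param still wins
```

This is why leaving indent=off on only some of the derived writers would mean touching each of them individually, and why flipping the default everywhere is the simpler change.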
[jira] [Commented] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON
[ https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042630#comment-16042630 ]

Trey Grainger commented on SOLR-10494:
--------------------------------------
Started working on this two weeks ago and then got busy. The actual changes were super quick, but after I made them it was taking over 2 hours to run the unit tests, with lots of failures and several test suites timing out. Just got back to this today and have pretty much everything diagnosed, and I'm working on fixes. In short, SolrTestCaseJ4 has XPath checking hard-coded into its design, so I now need to pass in wt=xml explicitly there, and there are a handful of test suites (i.e. replication/backup/restore and hdfs) that explicitly check for XML strings and loop forever until they get those strings back (hence the timeouts). I'm making changes to explicitly request XML for those tests that expect it, and will hopefully get a patch posted today.
[jira] [Commented] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON
[ https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013015#comment-16013015 ]

Trey Grainger commented on SOLR-10494:
--------------------------------------
Hi [~janhoy]. Sorry - I missed your first message last week. Sure - I should be able to get a patch posted this weekend.

-Trey
Re: Change Default Response Format (wt) to JSON in Solr 7.0?
Thanks for the great feedback, everyone.

Since the update handler currently already smart-defaults the response type the way Yonik is describing, based on the incoming content type (whether you specify the content-type header or not), it seems that we won't need to make any changes there.

I just summarized everyone's feedback into action items and submitted a JIRA (SOLR-10494 <https://issues.apache.org/jira/browse/SOLR-10494>) for further tracking. If you have further comments or if I missed anything, feel free to reply there.

Thanks,

Trey Grainger
Co-author, Solr in Action
SVP of Engineering @ Lucidworks

On Fri, Apr 14, 2017 at 11:35 PM, David Smiley <david.w.smi...@gmail.com> wrote:

> It's a neat idea to have the response format smart-defaulted based on the POST content-type. +1 to that!
>
> On Fri, Apr 14, 2017 at 11:24 PM Yonik Seeley <ysee...@gmail.com> wrote:
>
>> Just a reminder that we have had indented JSON query responses by default at the "/query" endpoint for years. That doesn't cover other handlers though.
>> Readability/aesthetics of our docs/examples is where the biggest deficiency lies - lots of XML examples that could have been JSON for a long time now. Hopefully this change would prevent new docs from being written that use XML output format.
>>
>> Other thoughts:
>> - The /query endpoint should remain, no need to break everyone who has been using it
>> - I assume sending XML to the existing update handler should perhaps continue to return an XML response?
>> - I assume that it's desirable to have indentation by default... but this is also a slight back-compat change/break for people who currently specify JSON and expect it un-indented (for some response types, the difference could be large, like 2x). If we go this way, we need to add that to the CHANGES as well.
>>
>> -Yonik
>>
>> On Fri, Apr 14, 2017 at 2:53 PM, Trey Grainger <solrt...@gmail.com> wrote:
>> > Just wanted to throw this out there for discussion. Solr's default query response format is still XML, despite the fact that Solr has supported the JSON response format for over a decade, developer mindshare has clearly shifted toward JSON over the years, and most modern/competing systems also use JSON format now by default.
>> >
>> > In fact, Solr's admin UI even explicitly adds wt=json to the request (by default in the UI) to override the default of wt=xml, so Solr's Admin UI effectively has a different default than the API.
>> >
>> > We have now introduced things like the JSON faceting API, and the new, more modern /V2 APIs assume JSON for the areas of Solr they cover, so clearly we're moving in the direction of JSON anyway.
>> >
>> > I'd like to propose that we switch the default response writer to JSON (wt=json) instead of XML for Solr 7.0, as this seems to me like the right direction and a good time to make this change with the next major version.
>> >
>> > Before I create a JIRA and submit a patch, though, I wanted to check here to make sure there were no strong objections to changing the default.
>> >
>> > -Trey Grainger
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
> --
> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> LinkedIn: http://linkedin.com/in/davidwsmiley | Book: http://www.solrenterprisesearchserver.com
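The update-handler behavior discussed above (the response mirrors the request's content type unless wt is passed explicitly) amounts to roughly the following selection logic. This is a simplified sketch with hypothetical names, not Solr's actual code:

```python
# Mapping from request Content-Type to response writer name (illustrative subset).
CONTENT_TYPE_TO_WT = {
    "application/json": "json",
    "text/xml": "xml",
    "application/xml": "xml",
}


def pick_response_writer(params, content_type, default_wt="json"):
    # 1) An explicit wt parameter always wins.
    if "wt" in params:
        return params["wt"]
    # 2) Otherwise mirror the request's content type (e.g. XML in -> XML out),
    #    ignoring any charset suffix like "; charset=UTF-8".
    base_type = content_type.split(";")[0].strip().lower() if content_type else ""
    if base_type in CONTENT_TYPE_TO_WT:
        return CONTENT_TYPE_TO_WT[base_type]
    # 3) Fall back to the system-wide default (json as of the 7.0 proposal).
    return default_wt


print(pick_response_writer({}, "text/xml"))              # xml
print(pick_response_writer({"wt": "json"}, "text/xml"))  # json
print(pick_response_writer({}, None))                    # json
```

Under this scheme, existing clients posting XML updates keep getting XML acknowledgements without any change on their side, which is why no update-handler changes were needed.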
[jira] [Created] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON
Trey Grainger created SOLR-10494:
------------------------------------
             Summary: Switch Solr's Default Response Type from XML to JSON
                 Key: SOLR-10494
                 URL: https://issues.apache.org/jira/browse/SOLR-10494
             Project: Solr
          Issue Type: Improvement
      Security Level: Public (Default Security Level. Issues are Public)
    Affects Versions: master (7.0)
            Reporter: Trey Grainger
            Priority: Minor
             Fix For: master (7.0)

Solr's default response format is still XML, despite the fact that Solr has supported the JSON response format for over a decade, developer mindshare has clearly shifted toward JSON over the years, and most modern/competing systems also use JSON format now by default.

In fact, Solr's admin UI even explicitly adds wt=json to the request (by default in the UI) to override the default of wt=xml, so Solr's Admin UI effectively has a different default than the API.

We have now introduced things like the JSON faceting API, and the new, more modern /V2 APIs assume JSON for the areas of Solr they cover, so clearly we're moving in the direction of JSON anyway.

I'd like to propose that we switch the default response writer to JSON (wt=json) instead of XML for Solr 7.0, as this seems to me like the right direction and a good time to make this change with the next major version.

Based upon feedback from the Lucene Dev's mailing list, we want to:
1) Change the default response writer type to "wt=json" and also change to "indent=on" by default
2) Make no changes on the update handler side; it already works as desired (it returns the response in the same content-type as the request unless "wt" is passed in explicitly).
3) Keep the /query request handler around, since people have already used it for years to do JSON queries
4) Add a commented-out "wt=xml" to the solrconfig.xml as a reminder for folks on how to change (back) the response format.

The default format change, plus the addition of "indent=on", are back-compat changes, so we need to make sure we doc those clearly in CHANGES.txt. There will also need to be significant adjustments to the Solr Ref Guide, Tutorial, etc.
Change Default Response Format (wt) to JSON in Solr 7.0?
Just wanted to throw this out there for discussion. Solr's default query response format is still XML, despite the fact that Solr has supported the JSON response format for over a decade, developer mindshare has clearly shifted toward JSON over the years, and most modern/competing systems also use JSON format now by default.

In fact, Solr's admin UI even explicitly adds wt=json to the request (by default in the UI) to override the default of wt=xml, so Solr's Admin UI effectively has a different default than the API.

We have now introduced things like the JSON faceting API, and the new, more modern /V2 APIs assume JSON for the areas of Solr they cover, so clearly we're moving in the direction of JSON anyway.

I'd like to propose that we switch the default response writer to JSON (wt=json) instead of XML for Solr 7.0, as this seems to me like the right direction and a good time to make this change with the next major version.

Before I create a JIRA and submit a patch, though, I wanted to check here to make sure there were no strong objections to changing the default.

-Trey Grainger
[jira] [Commented] (SOLR-9529) Dates Dynamic Field Inconsistently Defined in Schemas
[ https://issues.apache.org/jira/browse/SOLR-9529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15500294#comment-15500294 ] Trey Grainger commented on SOLR-9529: - Hmm... things were more inconsistent than I thought. There were two fundamental kinds of inconsistencies: 1) Inconsistencies within a single schema. --This is what I described in the issue description regarding "*_dts" being handled incorrectly. I submitted a pull request to fix this in the three places we actually define both singular and plural field types: solr/example/files/conf/managed-schema solr/server/solr/configsets/basic_configs/conf/managed-schema solr/server/solr/configsets/data_driven_schema_configs/conf/managed-schema 2) Inconsistencies across different schemas While the three schemas listed above all separate out single valued and multiValued dynamic fields into different singular and plural field types, every other schema that ships with Solr only defines a single field type (string, boolean, etc.) and uses the dynamic field definition to determine whether the dynamic field should be single or multivalued. This works fine, of course, but is just inconsistent depending upon which schema file you actually end up using. Interestingly, the tech products example (solr/server/solr/configsets/sample_techproducts_configs/conf/managed-schema), which sits at the same level as the basic_configs and the data_driven_schema_configs, for some reason handles these definitions differently, only defining one field type for both single and multivalued fields (for all types). 
The following places do the same thing: solr/core/src/test-files/solr/collection1/conf/schema-distrib-interval-faceting.xml solr/core/src/test-files/solr/collection1/conf/schema-docValuesFaceting.xml solr/core/src/test-files/solr/collection1/conf/schema-docValuesJoin.xml solr/core/src/test-files/solr/collection1/conf/schema-non-stored-docvalues.xml solr/core/src/test-files/solr/collection1/conf/schema_latest.xml solr/example/example-DIH/solr/db/conf/managed-schema solr/example/example-DIH/solr/mail/conf/managed-schema solr/example/example-DIH/solr/rss/conf/managed-schema solr/example/example-DIH/solr/solr/conf/managed-schema solr/example/example-DIH/solr/tika/conf/managed-schema So while my pull request fixes #1 so that all schemas are consistent with themselves, we still have inconsistency across the various schemas that ship with Solr in terms of what we name the field types for multivalued dynamic fields. If we are going to make these consistent, which way should we go - have a single field type for all single and multivalued fields (and define multivalued=true on the dynamic field definition instead), or separate out plural versions of the field type (booleans, strings, etc.) for multivalued fields? > Dates Dynamic Field Inconsistently Defined in Schemas > - > > Key: SOLR-9529 > URL: https://issues.apache.org/jira/browse/SOLR-9529 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Trey Grainger >Priority: Minor > > There is a nice convention across all of the schemas that ship with Solr to > include field types for single valued fields (i.e. "string" -> "*_s", > "boolean" -> "*_b") and separate field types for multivalued fields (i.e. > "strings" -> "*_ss", "booleans" -> "*_bs"). 
> Those definitions all follow the pattern (using "string" as an example):
>
> <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
> <fieldType name="strings" class="solr.StrField" sortMissingLast="true" multiValued="true"/>
>
> For some reason, however, the "date" field type doesn't follow this pattern,
> and is instead defined (inconsistently) as follows:
>
> <fieldType name="date" class="solr.TrieDateField" precisionStep="0"/>
> <fieldType name="dates" class="solr.TrieDateField" multiValued="true" precisionStep="0"/>
>
> <dynamicField name="*_dts" type="date" indexed="true" multiValued="true" stored="true"/>
>
> Note specifically that the "*_dts" field should instead be referencing the
> "dates" type and not the "date" type, and that subsequently the
> multiValued="true" setting would become unnecessary on the "*_dts"
> dynamicField definition.
>
> I'll get a patch posted for this. Note that nothing is functionally broken,
> it's just inconsistent and could be confusing for someone looking through the
> schema or seeing their multivalued dates getting indexed into the field type
> defined for single valued dates.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-9529) Dates Dynamic Field Inconsistently Defined in Schemas
Trey Grainger created SOLR-9529:
-----------------------------------

             Summary: Dates Dynamic Field Inconsistently Defined in Schemas
                 Key: SOLR-9529
                 URL: https://issues.apache.org/jira/browse/SOLR-9529
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Trey Grainger
            Priority: Minor

There is a nice convention across all of the schemas that ship with Solr to include field types for single valued fields (i.e. "string" -> "*_s", "boolean" -> "*_b") and separate field types for multivalued fields (i.e. "strings" -> "*_ss", "booleans" -> "*_bs"). Those definitions all follow the pattern (using "string" as an example):

<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<fieldType name="strings" class="solr.StrField" sortMissingLast="true" multiValued="true"/>

For some reason, however, the "date" field type doesn't follow this pattern, and is instead defined (inconsistently) as follows:

<fieldType name="date" class="solr.TrieDateField" precisionStep="0"/>
<fieldType name="dates" class="solr.TrieDateField" multiValued="true" precisionStep="0"/>

<dynamicField name="*_dts" type="date" indexed="true" multiValued="true" stored="true"/>

Note specifically that the "*_dts" field should instead be referencing the "dates" type and not the "date" type, and that subsequently the multiValued="true" setting would become unnecessary on the "*_dts" dynamicField definition.

I'll get a patch posted for this. Note that nothing is functionally broken, it's just inconsistent and could be confusing for someone looking through the schema or seeing their multivalued dates getting indexed into the field type defined for single valued dates.
[jira] [Updated] (SOLR-9480) Graph Traversal for Significantly Related Terms (Semantic Knowledge Graph)
[ https://issues.apache.org/jira/browse/SOLR-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Trey Grainger updated SOLR-9480:
--------------------------------
    Attachment: SOLR-9480.patch

Initial patch to get the ball rolling here. The feature should now work as described in the reference links in the description. The only real changes are an update from Solr 5.1.0 to master, and cleanup of most of the precommit issues.

Still plenty of work to do, particularly in reworking some of the multi-threading code to follow Solr conventions, reducing the number of files for helper classes, and eventually getting this working correctly in distributed mode (it was originally built for use cases involving a single Solr core as a "representative model"). It would also be good to make a getting-started tutorial with example data so it's easier to get started with the feature and do something interesting out of the box. Will continue working on those items as I'm able. Feedback welcome.

> Graph Traversal for Significantly Related Terms (Semantic Knowledge Graph)
> --
>
> Key: SOLR-9480
> URL: https://issues.apache.org/jira/browse/SOLR-9480
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Trey Grainger
> Attachments: SOLR-9480.patch
>
> This issue is to track the contribution of the Semantic Knowledge Graph Solr
> Plugin (request handler), which exposes a graph-like interface for
> discovering and traversing significant relationships between entities within
> an inverted index.
> This data model has been described in the following research paper: [The > Semantic Knowledge Graph: A compact, auto-generated model for real-time > traversal and ranking of any relationship within a > domain|https://arxiv.org/abs/1609.00464], as well as in presentations I gave > in October 2015 at [Lucene/Solr > Revolution|http://www.slideshare.net/treygrainger/leveraging-lucenesolr-as-a-knowledge-graph-and-intent-engine] > and November 2015 at the [Bay Area Search > Meetup|http://www.treygrainger.com/posts/presentations/searching-on-intent-knowledge-graphs-personalization-and-contextual-disambiguation/]. > The source code for this project is currently available at > [https://github.com/careerbuilder/semantic-knowledge-graph], and the folks at > CareerBuilder (where this was built) have given me the go-ahead to now > contribute this back to the Apache Solr Project, as well. > Check out the Github repository, research paper, or presentations for a more > detailed description of this contribution. Initial patch coming soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-9480) Graph Traversal for Significantly Related Terms (Semantic Knowledge Graph)
Trey Grainger created SOLR-9480: --- Summary: Graph Traversal for Significantly Related Terms (Semantic Knowledge Graph) Key: SOLR-9480 URL: https://issues.apache.org/jira/browse/SOLR-9480 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Reporter: Trey Grainger This issue is to track the contribution of the Semantic Knowledge Graph Solr Plugin (request handler), which exposes a graph-like interface for discovering and traversing significant relationships between entities within an inverted index. This data model has been described in the following research paper: [The Semantic Knowledge Graph: A compact, auto-generated model for real-time traversal and ranking of any relationship within a domain|https://arxiv.org/abs/1609.00464], as well as in presentations I gave in October 2015 at [Lucene/Solr Revolution|http://www.slideshare.net/treygrainger/leveraging-lucenesolr-as-a-knowledge-graph-and-intent-engine] and November 2015 at the [Bay Area Search Meetup|http://www.treygrainger.com/posts/presentations/searching-on-intent-knowledge-graphs-personalization-and-contextual-disambiguation/]. The source code for this project is currently available at [https://github.com/careerbuilder/semantic-knowledge-graph], and the folks at CareerBuilder (where this was built) have given me the go-ahead to now contribute this back to the Apache Solr Project, as well. Check out the Github repository, research paper, or presentations for a more detailed description of this contribution. Initial patch coming soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6492) Solr field type that supports multiple, dynamic analyzers
[ https://issues.apache.org/jira/browse/SOLR-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15343254#comment-15343254 ]

Trey Grainger commented on SOLR-6492:
-------------------------------------

Hi [~krantiparisa] and [~dannytei1]. Apologies for the long lapse without a response on this issue. I won't get into the reasons here (combination of personal and professional commitments), but I just wanted to say that I expect to pick this issue back up in the near future and continue work on this patch.

In the meantime, I have added an ASL 2.0 license to the current code (from Solr in Action) so that folks can feel free to use what's there now:
https://github.com/treygrainger/solr-in-action/tree/master/src/main/java/sia/ch14

I'll turn what's there now into a patch, update it to Solr trunk, and keep iterating on it until the folks commenting on this issue are satisfied with the design and capabilities. Stay tuned...

> Solr field type that supports multiple, dynamic analyzers
> -
>
> Key: SOLR-6492
> URL: https://issues.apache.org/jira/browse/SOLR-6492
> Project: Solr
> Issue Type: New Feature
> Components: Schema and Analysis
> Reporter: Trey Grainger
> Fix For: 5.0
>
> A common request - particularly for multilingual search - is to be able to
> support one or more dynamically-selected analyzers for a field. For example,
> someone may have a "content" field and pass in a document in Greek (using an
> Analyzer with Tokenizer/Filters for Greek), a separate document in English
> (using an English Analyzer), and possibly even a field with mixed-language
> content in Greek and English. This latter case could pass the content
> separately through both an analyzer defined for Greek and another Analyzer
> defined for English, stacking or concatenating the token streams based upon
> the use-case.
> There are some distinct advantages in terms of index size and query > performance which can be obtained by stacking terms from multiple analyzers > in the same field instead of duplicating content in separate fields and > searching across multiple fields. > Other non-multilingual use cases may include things like switching to a > different analyzer for the same field to remove a feature (i.e. turning > on/off query-time synonyms against the same field on a per-query basis). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
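The "stacking" idea quoted above can be illustrated outside of Lucene. The sketch below is not the actual implementation - it fakes per-language analyzers as simple string functions - but shows the core trick: terms produced by different analyzers for the same source word share a token position (the way a synonym filter stacks terms), so one field can hold both analyses without duplicating content across fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class StackedTokensDemo {

    // A token is a term at an integer position in the field.
    record Token(String term, int position) {}

    // Run each "analyzer" over the text and stack its output at shared positions.
    static List<Token> stack(String text, List<Function<String, String>> analyzers) {
        List<Token> out = new ArrayList<>();
        String[] words = text.split("\\s+");
        for (int pos = 0; pos < words.length; pos++) {
            for (Function<String, String> analyzer : analyzers) {
                // Same position for every analyzer's term => "stacked" terms,
                // equivalent to a position increment of 0 in Lucene.
                out.add(new Token(analyzer.apply(words[pos]), pos));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Fake "English" analyzer: lowercase + crude -ing stemming stub.
        Function<String, String> english = w -> w.toLowerCase().replaceAll("ing$", "");
        // Fake second analyzer: lowercase only.
        Function<String, String> plain = w -> w.toLowerCase();
        System.out.println(stack("Searching Text", List.of(english, plain)));
    }
}
```

A query analyzed by either analyzer then matches at the same positions, which is what makes single-field stacking cheaper than searching across parallel per-language fields.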
[jira] [Commented] (SOLR-9241) Rebalance API for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15343228#comment-15343228 ]

Trey Grainger commented on SOLR-9241:
-------------------------------------

I'm also very excited to see this patch. For the next evolution of Solr's scalability (and ultimately auto-scaling), these are exactly the kinds of core capabilities we need for seamlessly scaling up/down, resharding, and redistributing shards and replicas across a cluster.

The smart merge looks interesting - it seems to be effectively a way to index into a larger number of shards (for indexing throughput) while merging them into a smaller number of shards for searching, enabling indexing and searching to be scaled and resourced independently. This obviously won't work well with Near-Realtime Searching, but I'd be curious to hear more explanation about how this works in practice for SolrCloud clusters that don't need NRT search.

Agreed with Joel's comments about the update to trunk vs. 4.6.1. One thing that seems to have been added since 4.6.1 that probably overlaps with this patch is the Replica Placement Strategies (SOLR-6220) vs. the Allocation Strategies implemented here. The rest of the patch seems like all new objects that don't overlap much with the current code base. It would be interesting to know how much has changed between 4.6.1 and 6.1 collections/SolrCloud-wise that would create conflicts with this patch. Am obviously hoping not too much... Either way, very excited about the contribution and about the potential for getting these capabilities integrated into Solr.
> Rebalance API for SolrCloud > --- > > Key: SOLR-9241 > URL: https://issues.apache.org/jira/browse/SOLR-9241 > Project: Solr > Issue Type: New Feature > Components: SolrCloud >Affects Versions: 4.6.1 > Environment: Ubuntu, Mac OsX >Reporter: Nitin Sharma > Labels: Cluster, SolrCloud > Fix For: 4.6.1 > > Attachments: rebalance.diff > > Original Estimate: 2,016h > Remaining Estimate: 2,016h > > This is the v1 of the patch for Solrcloud Rebalance api (as described in > http://engineering.bloomreach.com/solrcloud-rebalance-api/) , built at > Bloomreach by Nitin Sharma and Suruchi Shah. The goal of the API is to > provide a zero downtime mechanism to perform data manipulation and efficient > core allocation in solrcloud. This API was envisioned to be the base layer > that enables Solrcloud to be an auto scaling platform. (and work in unison > with other complementing monitoring and scaling features). > Patch Status: > === > The patch is work in progress and incremental. We have done a few rounds of > code clean up. We wanted to get the patch going first to get initial feed > back. We will continue to work on making it more open source friendly and > easily testable. > Deployment Status: > > The platform is deployed in production at bloomreach and has been battle > tested for large scale load. (millions of documents and hundreds of > collections). > Internals: > = > The internals of the API and performance : > http://engineering.bloomreach.com/solrcloud-rebalance-api/ > It is built on top of the admin collections API as an action (with various > flavors). At a high level, the rebalance api provides 2 constructs: > Scaling Strategy: Decides how to move the data. Every flavor has multiple > options which can be reviewed in the api spec. > Re-distribute - Move around data in the cluster based on capacity/allocation. > Auto Shard - Dynamically shard a collection to any size. > Smart Merge - Distributed Mode - Helps merging data from a larger shard setup > into smaller one. 
(the source should be divisible by destination) > Scale up - Add replicas on the fly > Scale Down - Remove replicas on the fly > Allocation Strategy: Decides where to put the data. (Nodes with least > cores, Nodes that do not have this collection etc). Custom implementations > can be built on top as well. One other example is Availability Zone aware. > Distribute data such that every replica is placed on different availability > zone to support HA. > Detailed API Spec: > > https://github.com/bloomreach/solrcloud-rebalance-api > Contributors: > = > Nitin Sharma > Suruchi Shah > Questions/Comments: > = > You can reach me at nitin.sha...@bloomreach.com -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Porting LTR plugin for Solr-5.5.0
Ahmet,

They included a 5x patch on the JIRA here:
https://issues.apache.org/jira/secure/attachment/12782146/SOLR-8542-branch_5x.patch
(it's one of the files attached to the Jira). The JIRA has two patches included on it, one for master (approximately 6.0), and the one I just linked to for the 5x branch.

Assuming you checkout branch_5x (which should be approximately the same code as 5.5.0), then I would assume the Bloomberg patch would work. I've personally also back-ported it to 5.4.1, which required a fair number of changes related to iterator changes on Scorers, but wasn't too much trouble.

Hopefully the patch above gives you what you need.

Trey Grainger
SVP of Engineering @ Lucidworks
Co-author, Solr in Action

On Wed, Apr 6, 2016 at 6:21 PM, Shawn Heisey <apa...@elyograg.org> wrote:
> On 4/6/2016 1:55 PM, Ahmet Anil Pala wrote:
> > Hi Michael, thanks for answering and sorry for the late reply.
> >
> > By 5.x branch build, do you mean this
> > -> https://github.com/bloomberg/lucene-solr/tree/branch_5x
> >
> > It seems that LTR is not merged onto it and I doubt it is mergable
> > without changes as this is exactly what I have tried and failed. As
> > for your pull request SOLR-8542, it is supposed to be merged to the
> > master branch which is already the version 7.0.0. Is the LTR plugin
> > originally developed only compatible with Solr-6x and later or am I
> > missing something here?
>
> I think that something like this will not make it into 5.x. With the
> 6.0.0 release just around the corner (probably a matter of days), 5.x
> goes into maintenance mode, and 4.x basically goes dormant. Maintenance
> mode basically means that no significant changes will happen, especially
> changes that might affect stability. A strict interpretation of this is
> "no new features", and this is typically the stance that most committers
> adopt for the previous major version. It also means that only *major*
> bugs will be fixed, especially as time goes on.
> > You've indicated that you couldn't merge it cleanly to branch_5x. If > the change requires significant work just to merge, and the > functionality is already present in a later release, it's probably not > going to happen. > > There are some important bugfixes that have already been committed to > 5x, which should result in a 5.5.1 release in the near future, and it is > entirely possible that somebody will volunteer to release 5.6.0, > although the current changelog for 5.6.0 only has two issues, and > neither is particularly interesting for most users. > > If somebody wants to volunteer to do the work, then what I've said might > not apply at all. > > Thanks, > Shawn > > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > >
[jira] [Updated] (SOLR-8626) [ANGULAR] 404 error when clicking nodes in cloud graph view
[ https://issues.apache.org/jira/browse/SOLR-8626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trey Grainger updated SOLR-8626: Attachment: SOLR-8626.patch Attached a patch which fixes this issue. The issue existed in both the flat graph view and the radial view. Additionally, when one was in the radial view and clicked on the link for a node, it would switch back to flat graph view when navigating to the other node, so fixed that so that it preserves the user's current view type on the URL when navigating between node. > [ANGULAR] 404 error when clicking nodes in cloud graph view > --- > > Key: SOLR-8626 > URL: https://issues.apache.org/jira/browse/SOLR-8626 > Project: Solr > Issue Type: Bug > Components: UI >Reporter: Jan Høydahl >Assignee: Upayavira > Attachments: SOLR-8626.patch > > > h3. Reproduce: > # {{bin/solr start -c}} > # {{bin/solr create -c mycoll}} > # Goto http://localhost:8983/solr/#/~cloud > # Click a collection name in the graph -> 404 error. URL: > {{/solr/mycoll/#/~cloud}} > # Click a shard name in the graph -> 404 error. URL: {{/solr/shard1/#/~cloud}} > Only verified in Trunk, but probably exists in 5.4 as well -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [jira] [Commented] (SOLR-4905) Allow fromIndex parameter to JoinQParserPlugin to refer to a single-sharded collection that has a replica on all nodes
Just to add another voice to the discussion, I have the exact same use case described by Paul and Mikhail that I'm working through a Proof of Concept for right now. I'd very much like to see the "single shard collection with a replica on all nodes" restriction removed. On Thu, Nov 12, 2015, 3:29 PM Mikhail Khludnev (JIRA)wrote: > > [ > https://issues.apache.org/jira/browse/SOLR-4905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002865#comment-15002865 > ] > > Mikhail Khludnev commented on SOLR-4905: > > > [~p...@search-solutions.net] would you mind to raise a separate jira for > this? > > > Allow fromIndex parameter to JoinQParserPlugin to refer to a > single-sharded collection that has a replica on all nodes > > > -- > > > > Key: SOLR-4905 > > URL: https://issues.apache.org/jira/browse/SOLR-4905 > > Project: Solr > > Issue Type: Improvement > > Components: SolrCloud > >Reporter: Philip K. Warren > >Assignee: Timothy Potter > > Fix For: 5.1, Trunk > > > > Attachments: SOLR-4905.patch, SOLR-4905.patch, patch.txt > > > > > > Using a non-SolrCloud setup, it is possible to perform cross core joins ( > http://wiki.apache.org/solr/Join). When testing with SolrCloud, however, > neither the collection name, alias name (we have created aliases to > SolrCloud collections), or the automatically generated core name (i.e. > _shard1_replica1) work as the fromIndex parameter for a > cross-core join. > > > > -- > This message was sent by Atlassian JIRA > (v6.3.4#6332) > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > >
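For reference, the kind of query being discussed uses Solr's join query parser with the fromIndex local param (field and core names below are hypothetical):

```
q={!join from=inner_id to=outer_id fromIndex=othercore}category:electronics
```

The restriction under discussion is that in SolrCloud, fromIndex can only point at a single-shard collection with a replica co-located on every node serving the query.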
[jira] [Commented] (SOLR-6492) Solr field type that supports multiple, dynamic analyzers
[ https://issues.apache.org/jira/browse/SOLR-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14346292#comment-14346292 ]

Trey Grainger commented on SOLR-6492:
-------------------------------------

Hi Kranti,

The design is almost exactly as you described when you said "have analysis chains defined in schema.xml and these chains could be reused between multiple fields and on each field there should be a way to conditionally choose the analysis chain". Specifically, each analysis chain is just defined as a FieldType, like you would define any analysis chain you were going to assign to a field.

What I hadn't considered yet, however, was having the update processor choose the analyzers based upon a value in another field. I had previously only been considering the case where a user would either:
1) Use an automatic language identifier update processor, or
2) Pass the language in directly in the content of the field, i.e.:
<field name="my_field">en,es|document content here</field>

Having the ability to specify the key for the analyzers in a different field would probably be more user friendly, and this would be trivial to implement, so I can look to add it. Something like this:

<field name="my_field">document content here</field>
<field name="language">en</field>
<field name="language">es</field>

Is that what you were hoping for?

Solr field type that supports multiple, dynamic analyzers
-
Key: SOLR-6492
URL: https://issues.apache.org/jira/browse/SOLR-6492
Project: Solr
Issue Type: New Feature
Components: Schema and Analysis
Reporter: Trey Grainger
Fix For: 5.0

A common request - particularly for multilingual search - is to be able to support one or more dynamically-selected analyzers for a field. For example, someone may have a "content" field and pass in a document in Greek (using an Analyzer with Tokenizer/Filters for Greek), a separate document in English (using an English Analyzer), and possibly even a field with mixed-language content in Greek and English.
This latter case could pass the content separately through both an analyzer defined for Greek and another Analyzer defined for English, stacking or concatenating the token streams based upon the use-case. There are some distinct advantages in terms of index size and query performance which can be obtained by stacking terms from multiple analyzers in the same field instead of duplicating content in separate fields and searching across multiple fields. Other non-multilingual use cases may include things like switching to a different analyzer for the same field to remove a feature (i.e. turning on/off query-time synonyms against the same field on a per-query basis). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6492) Solr field type that supports multiple, dynamic analyzers
[ https://issues.apache.org/jira/browse/SOLR-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14190715#comment-14190715 ]

Trey Grainger commented on SOLR-6492:
-------------------------------------

Hi Sharon,

Your question was which code will parse "df=someMultiTextField|en,de" and decide which analysis chain to use. In short, since FieldTypes have access to the schema but Analyzers and Tokenizers don't, I'm creating a new FieldType which passes the schema into a new Analyzer, which can then pass the schema into the new Tokenizer. When the Tokenizer is used, the fieldname (string) and value (reader) are passed in, so it is possible to pull the metadata ("|en,de") off of either of these and dynamically choose an analyzer from the schema at that time. I've done this work already for pulling data out of the field content (so I know that works), but pulling the metadata from the fieldname is still pending (I'm hoping to work on it this weekend).

If you want to see what I've done thus far, you can look on github at MultiTextField, MultiTextFieldAnalyzer, and MultiTextFieldTokenizer:
https://github.com/treygrainger/solr-in-action/blob/master/src/main/java/sia/ch14/MultiTextField.java
https://github.com/treygrainger/solr-in-action/blob/master/src/main/java/sia/ch14/MultiTextFieldAnalyzer.java
https://github.com/treygrainger/solr-in-action/blob/master/src/main/java/sia/ch14/MultiTextFieldTokenizer.java

I have some questions / feedback on your proposed solution... I'm hopping on a plane now but will post them later tonight.
Thanks, Trey Grainger Co-author, Solr in Action Director of Engineering, Search Analytics @CareerBuilder On Thu, Oct 30, 2014 at 7:32 AM, Sharon Krisher (JIRA) j...@apache.org Solr field type that supports multiple, dynamic analyzers - Key: SOLR-6492 URL: https://issues.apache.org/jira/browse/SOLR-6492 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Trey Grainger Fix For: 5.0 A common request - particularly for multilingual search - is to be able to support one or more dynamically-selected analyzers for a field. For example, someone may have a content field and pass in a document in Greek (using an Analyzer with Tokenizer/Filters for German), a separate document in English (using an English Analyzer), and possibly even a field with mixed-language content in Greek and English. This latter case could pass the content separately through both an analyzer defined for Greek and another Analyzer defined for English, stacking or concatenating the token streams based upon the use-case. There are some distinct advantages in terms of index size and query performance which can be obtained by stacking terms from multiple analyzers in the same field instead of duplicating content in separate fields and searching across multiple fields. Other non-multilingual use cases may include things like switching to a different analyzer for the same field to remove a feature (i.e. turning on/off query-time synonyms against the same field on a per-query basis). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
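The "en,de|" prefix convention discussed in this thread can be sketched as a standalone helper. This is hypothetical illustration code, not the actual MultiTextFieldTokenizer (which does this inside a Lucene Tokenizer): it just shows the parsing step of splitting the analyzer keys from the field name or value.

```java
import java.util.Arrays;
import java.util.List;

public class LanguagePrefixDemo {

    // Holds the analyzer keys and the remaining content of a prefixed value.
    public record ParsedValue(List<String> languages, String content) {}

    /**
     * Splits a value like "en,de|some text" into its language keys and content.
     * An empty language list means "no prefix: use the default analyzer".
     */
    public static ParsedValue parse(String raw) {
        int bar = raw.indexOf('|');
        if (bar < 0) {
            return new ParsedValue(List.of(), raw);  // no prefix present
        }
        List<String> langs = Arrays.asList(raw.substring(0, bar).split(","));
        return new ParsedValue(langs, raw.substring(bar + 1));
    }

    public static void main(String[] args) {
        System.out.println(parse("en,de|some text"));
        System.out.println(parse("plain text, no prefix"));
    }
}
```

The same split applies whether the keys ride on the field value ("en,de|some text") or on the field name ("someMultiTextField|en,de"); each resolved key would then be mapped to a field type's analysis chain from the schema.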
[jira] [Commented] (SOLR-6492) Solr field type that supports multiple, dynamic analyzers
[ https://issues.apache.org/jira/browse/SOLR-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191418#comment-14191418 ]

Trey Grainger commented on SOLR-6492:
-------------------------------------

Hi Sharon,

In terms of your suggestion, I do think that using local params to pass in the language could be a more user-friendly solution than requiring them to put the params on the field name: i.e. q={!langs=en|de}hello world&df=text vs. q=hello world&df=text|en,de, though the syntax may get a bit weird if you want to specify different languages for different fields. For example, if using the edismax query parser, you would need to do something like q={!langs=text1:en,de|text2:en,zh}hello world&qf=text1 text2 vs. just q=hello world&qf=text1|en,de text2|en,zh. For the most simple use-case (every field uses the same language), or for the use-case where you don't know what fields the user is querying on up-front, I think the local params syntax would be preferred for end-users.

There is a big down-side to doing this, however: it requires you to implement a qparser to parse this data and put it somewhere that the Analyzer can see. This means that your multi-lingual field would only be searchable with your custom query parser (whereas if the determination of the language is passed in as part of the field name or content as I described, it should work seamlessly with all of the query parsers, since the data gets passed through all the way to the Analyzer).

Your solution with the ThreadLocal storage of the data is interesting... I'm not positive whether it will work or not (i.e. does the analyzer always run on the same thread as the incoming request for both queries and indexing, and will that also continue to be the case into the future)?
I know that threads are at least re-used across requests and that the TokenStreamComponents for analyzers are re-used in a threadlocal pool, but that just means you'd have to be very careful about not caching or reusing languages across requests, not that it couldn't work. Also, just out of curiosity, how do you plan to pass the languages in at index time? The Analyzer/Tokenizers only accept the fieldname (string) and the field content (reader) as parameters, so passing in additional parameters through a threadlocal seems like a bit of a hack that violates the design there (though arguably that design is too restrictive and should change). I'd be curious if anyone else thinks this would work... Thanks, Trey Grainger Co-author, Solr in Action Director of Engineering, Search Analytics @CareerBuilder Solr field type that supports multiple, dynamic analyzers - Key: SOLR-6492 URL: https://issues.apache.org/jira/browse/SOLR-6492 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Trey Grainger Fix For: 5.0 A common request - particularly for multilingual search - is to be able to support one or more dynamically-selected analyzers for a field. For example, someone may have a content field and pass in a document in Greek (using an Analyzer with Tokenizer/Filters for German), a separate document in English (using an English Analyzer), and possibly even a field with mixed-language content in Greek and English. This latter case could pass the content separately through both an analyzer defined for Greek and another Analyzer defined for English, stacking or concatenating the token streams based upon the use-case. There are some distinct advantages in terms of index size and query performance which can be obtained by stacking terms from multiple analyzers in the same field instead of duplicating content in separate fields and searching across multiple fields. 
Other non-multilingual use cases may include things like switching to a different analyzer for the same field to remove a feature (i.e. turning on/off query-time synonyms against the same field on a per-query basis).
[jira] [Created] (SOLR-6492) Solr field type that supports multiple, dynamic analyzers
Trey Grainger created SOLR-6492: --- Summary: Solr field type that supports multiple, dynamic analyzers Key: SOLR-6492 URL: https://issues.apache.org/jira/browse/SOLR-6492 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Trey Grainger Fix For: 4.11 A common request - particularly for multilingual search - is to be able to support one or more dynamically-selected analyzers for a field. For example, someone may have a content field and pass in a document in Greek (using an Analyzer with Tokenizer/Filters for German), a separate document in English (using an English Analyzer), and possibly even a field with mixed-language content in Greek and English. This latter case could pass the content separately through both an analyzer defined for Greek and another Analyzer defined for English, stacking or concatenating the token streams based upon the use-case. There are some distinct advantages in terms of index size and query performance which can be obtained by stacking terms from multiple analyzers in the same field instead of duplicating content in separate fields and searching across multiple fields. Other non-multilingual use cases may include things like switching to a different analyzer for the same field to remove a feature (i.e. turning on/off query-time synonyms against the same field on a per-query basis). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6492) Solr field type that supports multiple, dynamic analyzers
[ https://issues.apache.org/jira/browse/SOLR-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126258#comment-14126258 ] Trey Grainger commented on SOLR-6492: - I previously implemented this field type when writing chapter 14 of _Solr in Action_, but I would like to make some improvements and then submit the code back to Solr to (hopefully) be committed. The current code from _Solr in Action_ can be found here: [https://github.com/treygrainger/solr-in-action/tree/first-edition/src/main/java/sia/ch14]

To use the current version, you would do the following:

1) Add the following to schema.xml:

<fieldType name="multiText" class="sia.ch14.MultiTextField" sortMissingLast="true" defaultFieldType="text_general" fieldMappings="en:text_english, es:text_spanish, fr:text_french, de:text_german"/>

<field name="someMultiTextField" type="multiText" indexed="true" multiValued="true" />

*note that text_spanish, text_english, text_french, and text_german refer to field types which are defined elsewhere in the schema.xml

2) Index a document with a field containing multilingual text using syntax like one of the following:

<field name="someMultiTextField">some text</field> **
<field name="someMultiTextField">en|some text</field>
<field name="someMultiTextField">es|some more text</field>
<field name="someMultiTextField">de,fr|some other text</field>

**uses the default analyzer

3) Submit a query specifying which language(s) you want to query in:

/select?q=someMultiTextField:en,de|keyword_goes_here

-- Improvements to be made before the patch is finalized:

1) Make it possible to specify the field type mappings in the field name instead of the field value:

<field name="someMultiTextField|de,fr">some other text</field>

/select?q=a bunch of keywords here&df=someMultiTextField|en,de

This makes querying easier, because the languages can be detected prior to parsing of the query, which prevents prefixes from having to be substituted on each query term (which is cost-prohibitive for most applications because it effectively means pre-parsing the query before it goes to Solr).

2) Enable support for switching between stacking token streams from each analyzer (a good default, because it mostly respects position increments across languages and minimizes duplicate tokens in the index) and concatenating token streams.

3) Possibly add the ability to switch analyzers in the middle of input text:

<field name="someMultiTextField">de,fr|some other el|text</field>

4) Extensive unit testing
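The "langs|content" prefix convention described above can be illustrated with a small, self-contained sketch. This is not the actual sia.ch14 code; the class name, method, and '|' handling below are a hypothetical reconstruction of the parsing step a MultiTextField would need before delegating to per-language analyzers.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: splits an optional "lang1,lang2|content" prefix
// from a field value, mirroring the syntax shown in the comment above.
public class LangPrefixParser {
    public static final char SEPARATOR = '|';

    /** Holds the language keys and the remaining text to analyze. */
    public static class Parsed {
        public final List<String> langs;
        public final String content;
        Parsed(List<String> langs, String content) {
            this.langs = langs;
            this.content = content;
        }
    }

    /** Splits "en,de|some text" into (["en", "de"], "some text"). */
    public static Parsed parse(String raw) {
        int sep = raw.indexOf(SEPARATOR);
        if (sep < 0) {
            // No prefix: empty language list, i.e. use the default analyzer.
            return new Parsed(Collections.emptyList(), raw);
        }
        List<String> langs = Arrays.asList(raw.substring(0, sep).split(","));
        return new Parsed(langs, raw.substring(sep + 1));
    }
}
```

A value with no '|' separator falls through with an empty language list, matching the documented behavior of routing to the default analyzer.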
[jira] [Comment Edited] (SOLR-6492) Solr field type that supports multiple, dynamic analyzers
[ https://issues.apache.org/jira/browse/SOLR-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126258#comment-14126258 ] Trey Grainger edited comment on SOLR-6492 at 9/8/14 11:55 PM: -- I previously implemented this field type when writing chapter 14 of _Solr in Action_, but I would like to make some improvements and then submit the code back to Solr to (hopefully) be committed. The current code from _Solr in Action_ can be found here: [https://github.com/treygrainger/solr-in-action/tree/first-edition/src/main/java/sia/ch14]

To use the current version, you would do the following:

1) Add the following to schema.xml:

<fieldType name="multiText" class="sia.ch14.MultiTextField" sortMissingLast="true" defaultFieldType="text_general" fieldMappings="en:text_english, es:text_spanish, fr:text_french, de:text_german"/>

<field name="someMultiTextField" type="multiText" indexed="true" multiValued="true" />

*note that text_spanish, text_english, text_french, and text_german refer to field types which are defined elsewhere in the schema.xml

2) Index a document with a field containing multilingual text using syntax like one of the following:

<field name="someMultiTextField">some text</field> **
<field name="someMultiTextField">en|some text</field>
<field name="someMultiTextField">es|some more text</field>
<field name="someMultiTextField">de,fr|some other text</field>

**uses the default analyzer

3) Submit a query specifying which language(s) you want to query in:

/select?q=someMultiTextField:en,de|keyword_goes_here

-- Improvements to be made before the patch is finalized:

1) Make it possible to specify the field type mappings in the field name instead of the field value:

<field name="someMultiTextField|de,fr">some other text</field>

/select?q=a bunch of keywords here&df=someMultiTextField|en,de

This makes querying easier, because the languages can be detected prior to parsing of the query, which prevents prefixes from having to be substituted on each query term (which is cost-prohibitive for most applications because it effectively means pre-parsing the query before it goes to Solr).

2) Enable support for switching between stacking token streams from each analyzer (a good default, because it mostly respects position increments across languages and minimizes duplicate tokens in the index) and concatenating token streams.

3) Possibly add the ability to switch analyzers in the middle of input text:

<field name="someMultiTextField">de,fr|some other el|text</field>

4) Extensive unit testing
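To make the stacking-vs-concatenation distinction concrete, here is a plain-Java sketch (no Lucene dependency) of the two merge strategies. Tokens are simplified to (term, position) pairs and the two streams are naively aligned by token index; real TokenStream handling with position increments is subtler, so treat this purely as an illustration of why stacking keeps positions aligned across languages while concatenation shifts the second stream.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: contrasts "stacking" (both analyzers' tokens share
// positions) with "concatenating" (second stream appended after the first).
public class TokenMerge {
    public static class Token {
        public final String term;
        public final int position;
        public Token(String term, int position) {
            this.term = term;
            this.position = position;
        }
    }

    /** Stack: token i of each stream shares position i (a duplicate term
     *  at the same position is emitted only once). */
    public static List<Token> stack(List<String> a, List<String> b) {
        List<Token> out = new ArrayList<>();
        int n = Math.max(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            if (i < a.size()) out.add(new Token(a.get(i), i));
            if (i < b.size() && (i >= a.size() || !a.get(i).equals(b.get(i))))
                out.add(new Token(b.get(i), i)); // same position: increment 0
        }
        return out;
    }

    /** Concatenate: stream b starts after the last position of stream a. */
    public static List<Token> concat(List<String> a, List<String> b) {
        List<Token> out = new ArrayList<>();
        for (int i = 0; i < a.size(); i++) out.add(new Token(a.get(i), i));
        for (int i = 0; i < b.size(); i++) out.add(new Token(b.get(i), a.size() + i));
        return out;
    }
}
```

Stacking "haus kaufen" (German analysis) with "house buy" (English analysis) yields four tokens at positions 0, 0, 1, 1, whereas concatenation yields positions 0, 1, 2, 3.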
[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting
[ https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980597#comment-13980597 ] Trey Grainger commented on SOLR-2894: - [~markrmil...@gmail.com] said: We should get this in to get more feedback. Wish I had some time to tackle it, but I won't in the near term. Is there a committer who has interest in this issue and would be willing to look it over for (hopefully) getting it pushed into trunk? It's the top-voted and most-watched issue in Solr right now, so there's clearly a lot of community interest. Thanks! Implement distributed pivot faceting Key: SOLR-2894 URL: https://issues.apache.org/jira/browse/SOLR-2894 Project: Solr Issue Type: Improvement Reporter: Erik Hatcher Fix For: 4.9, 5.0 Attachments: SOLR-2894-reworked.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, dateToObject.patch Following up on SOLR-792, pivot faceting currently only supports undistributed mode. Distributed pivot faceting needs to be implemented.
[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting
[ https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980662#comment-13980662 ] Trey Grainger commented on SOLR-2894: - Hi [~otis], I appreciate your interest here. That's correct: no previously working behavior was changed, and there are two things added with this patch: 1) distributed support, and 2) support for single-level pivot facets (this previously threw an exception but is now supported: facet.pivot=aSingleFieldName). For context on #2: we found no good reason to disallow a single-level pivot facet (it functions like a field facet but with the pivot facet output format); it made implementing distributed pivot faceting easier, since a single level could be considered when refining; and there was work in some downstream issues like SOLR-3583 (adding percentiles and other stats to pivot facets) which was dependent upon being able to easily alternate between any number of facet levels for analytics purposes, so we just added the support for a single level. This also makes it easier to build analytics tools without having to arbitrarily alternate between field facets and pivot facets and their corresponding output formats based upon the number of levels. The end result is that no previously working capabilities have been modified, but distributed support for any number of pivot levels has been added.
[jira] [Comment Edited] (SOLR-2894) Implement distributed pivot faceting
[ https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980662#comment-13980662 ] Trey Grainger edited comment on SOLR-2894 at 4/25/14 3:54 AM: -- Hi Otis, I appreciate your interest here. That's correct: no previously working behavior was changed, and there are two things added with this patch: 1) distributed support, and 2) support for single-level pivot facets (this previously threw an exception but is now supported: facet.pivot=aSingleFieldName). For context on #2: we found no good reason to disallow a single-level pivot facet (it functions like a field facet but with the pivot facet output format); it made implementing distributed pivot faceting easier, since a single level could be considered when refining; and there was work in some downstream issues like SOLR-3583 (adding percentiles and other stats to pivot facets) which was dependent upon being able to easily alternate between any number of facet levels for analytics purposes, so we just added the support for a single level. This also makes it easier to build analytics tools without having to arbitrarily alternate between field facets and pivot facets and their corresponding output formats based upon the number of levels. The end result is that no previously working capabilities have been modified, but distributed support for any number of pivot levels has been added, which should make this safe to commit to trunk.
[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting
[ https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973094#comment-13973094 ] Trey Grainger commented on SOLR-2894: - After nearly 2 years of on-and-off development, I think this patch is finally ready to be committed. Brett's most recent patch includes significant performance improvements as well as fixes to all of the reported issues and edge cases mentioned by the others currently using this patch. We have just finished a large spike of work to get this ready for commit, so I'd love to get it pushed in soon unless there are any objections. [~ehatcher], do you have any time to review this for suitability to be committed (since you are the reporter)? If there is anything additional that needs to be changed, I'll happily sign us up (either myself or someone on my team at CareerBuilder) to do it if that will help.
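The heart of the distributed support discussed in this thread is a merge step on the coordinating node: each shard returns per-value counts for a pivot level, and the coordinator sums them before recursing into sub-pivots and refining values a shard did not report. The sketch below shows only that summing step over a simplified nested-map model; the patch's actual NamedList-based representation and refinement logic are more involved, and the names here are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: sums one pivot level's facet counts across shards.
public class PivotMerge {
    /** Merges per-value counts from each shard's response for one level.
     *  (Real distributed pivot faceting must also refine values that some
     *  shards omitted from their top-N lists before counts are final.) */
    public static Map<String, Long> mergeLevel(List<Map<String, Long>> shardCounts) {
        Map<String, Long> merged = new LinkedHashMap<>();
        for (Map<String, Long> shard : shardCounts) {
            for (Map.Entry<String, Long> e : shard.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return merged;
    }
}
```

Because counts are additive, the same merge applies recursively at every pivot level, which is why supporting a single level (as in this patch) composes naturally into arbitrary pivot depths.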
Re: Welcome Tim Potter as Lucene/Solr committer
Congrats, Tim! Very, very awesome. Trey Grainger Co-author, Solr in Action Director of Engineering, Search Analytics @ CareerBuilder On Tue, Apr 8, 2014 at 10:11 AM, Timothy Potter thelabd...@gmail.com wrote: This is awesome! Thank you, what an honor to be working with such an amazing group of engineers. bio: I work at LucidWorks focusing most of my time on Solr. Most recently, I've been focused on testing / hardening SolrCloud in a large-scale cluster to support 100's of collections and billions of docs. I'm working on SOLR-5495 and 5468 and hope to contribute more to the unit/integration tests for SolrCloud in the coming months. I've also worked with Steve Rowe on the RestManager stuff coming in 4.8 (SOLR-5653). Prior to LucidWorks, I was an architect on the Big Data team at Dachis Group, where I focused on large-scale machine learning, text mining, and social network analysis problems. At Dachis Group, I designed and operated a 36-node SolrCloud cluster (~900M docs) running in AWS. I dabble in dev-ops. Lastly, I'm the co-author of Solr in Action with Trey. https://www.linkedin.com/in/thelabdude Cheers, Tim On Mon, Apr 7, 2014 at 10:40 PM, Steve Rowe sar...@gmail.com wrote: I'm pleased to announce that Tim Potter has accepted the PMC's invitation to become a committer. Tim, it's tradition that you introduce yourself with a brief bio. Once your account has been created - could take a few days - you'll be able to add yourself to the committers section of the Who We Are page on the website: http://lucene.apache.org/whoweare.html (use the ASF CMS bookmarklet at the bottom of the page here: https://cms.apache.org/#bookmark - more info here http://www.apache.org/dev/cms.html). Check out the ASF dev page - lots of useful links: http://www.apache.org/dev/. Congratulations and welcome! Steve
[jira] [Commented] (SOLR-5856) Add new Solr book to the Solr homepage
[ https://issues.apache.org/jira/browse/SOLR-5856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939836#comment-13939836 ] Trey Grainger commented on SOLR-5856: - Hi Steve - thanks so much for getting this committed so quickly! Everything looks great, except that the 4-book layout in the slideshow doesn't render well for me in Chrome on either Windows or a Mac (the fourth book wraps to the next line). IE, Firefox, and Safari all looked good, though. https://www.dropbox.com/s/hkcz8xzxtgfvexw/4Books.png I'd guess other Chrome users are likely seeing the same thing. Add new Solr book to the Solr homepage -- Key: SOLR-5856 URL: https://issues.apache.org/jira/browse/SOLR-5856 Project: Solr Issue Type: Improvement Components: documentation Affects Versions: 4.7 Environment: https://lucene.apache.org/solr/ Reporter: Trey Grainger Assignee: Steve Rowe Priority: Minor Fix For: 4.8 Attachments: SOLR-5856.patch, SOLR-5856.patch, book_sia.png A new Solr book (Solr in Action) has just been published by Manning Publications (release date 3/15). I am providing the patch to update the website pages corresponding to the slideshow on https://lucene.apache.org/solr/ and https://lucene.apache.org/solr/books.html . The patch has updates to html/text files and there is a binary image file as well.
Re: [jira] [Commented] (SOLR-5856) Add new Solr book to the Solr homepage
A few minutes after I sent my e-mail, the page started rendering correctly in Chrome on both my Windows and Mac computers, so either this was just fixed or there was some temporary weirdness (perhaps on my end). At any rate, it looks good for me now. Thanks!
[jira] [Commented] (SOLR-5856) Add new Solr book to the Solr homepage
[ https://issues.apache.org/jira/browse/SOLR-5856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13933162#comment-13933162 ] Trey Grainger commented on SOLR-5856: - Hi Alexandre, I agree with you. It looks like there are two Solr 3.x books, and the older one has already been cut from the rotating slideshow. At this point, I think the other 3.x book is going to have to be bumped. The good news is that those authors are working on a 4.x refresh that should be released in a few months, so they'll likely be back up there soon. Of course, all of the books are still on the books page, just not in the Latest books published about Apache Solr list in the header slideshow. The patch I included bumps the 3.x book and inserts Solr in Action.
[jira] [Commented] (SOLR-5856) Add new Solr book to the Solr homepage
[ https://issues.apache.org/jira/browse/SOLR-5856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934381#comment-13934381 ] Trey Grainger commented on SOLR-5856: - That makes sense... I agree that it is probably a better user experience to link to the books page. I'll update all of the slideshow links to point to the books page and resubmit the patch shortly.
[jira] [Updated] (SOLR-5856) Add new Solr book to the Solr homepage
[ https://issues.apache.org/jira/browse/SOLR-5856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trey Grainger updated SOLR-5856: Attachment: SOLR-5856.patch This updated patch modifies the slideshow to link to the books.html page instead of going directly to the publisher's page (as requested by Hoss and Uwe). To make the site more consistent (since we're now making more than just the change to add Solr in Action), I also made the image for each book on the books.html page clickable as a link to the publisher's page, to increase the likelihood of a click-through. One of the books already did this, but it was missing on the others, and the cover image is one of the things visitors are most likely to click on when trying to get a book.
[jira] [Commented] (SOLR-5856) Add new Solr book to the Solr homepage
[ https://issues.apache.org/jira/browse/SOLR-5856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934528#comment-13934528 ] Trey Grainger commented on SOLR-5856: - @Alexandre, Yeah, making the homepage links go to a secondary books page probably will detract from both SEO and sales, but it's a better user experience for those visiting the Solr homepage, no? One silver lining is that it makes the books page more prominent: the recent book pictures on the homepage still link over to the books page, making it easier to find and compare each of the different books. @Hossman, thanks for tentatively signing up to commit this. If you see anything else that needs changing, please let me know and I'd be happy to put together another patch.
[jira] [Created] (SOLR-5856) Add new Solr book to the Solr homepage
Trey Grainger created SOLR-5856: --- Summary: Add new Solr book to the Solr homepage Key: SOLR-5856 URL: https://issues.apache.org/jira/browse/SOLR-5856 Project: Solr Issue Type: Improvement Components: documentation Affects Versions: 4.7 Environment: https://lucene.apache.org/solr/ Reporter: Trey Grainger Priority: Minor Fix For: 4.7 A new Solr book (Solr in Action) has just been published by Manning Publications (release date 3/15). I am providing the patch to update the website pages corresponding to the slideshow on https://lucene.apache.org/solr/ and https://lucene.apache.org/solr/books.html . The patch has updates to html/text files and there is a binary image file as well.
[jira] [Updated] (SOLR-5856) Add new Solr book to the Solr homepage
[ https://issues.apache.org/jira/browse/SOLR-5856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trey Grainger updated SOLR-5856: Attachment: SOLR-5856.patch Patch attached. Uploading the image separately.
[jira] [Updated] (SOLR-5856) Add new Solr book to the Solr homepage
[ https://issues.apache.org/jira/browse/SOLR-5856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Trey Grainger updated SOLR-5856:
--------------------------------
    Attachment: book_sia.png
Re: Stats vs Analytics
Just to add more discussion to the mix, we're also building/using this at CareerBuilder:

Percentiles for facets, pivot facets, and distributed pivot facets
https://issues.apache.org/jira/browse/SOLR-3583

It is an extension to (distributed pivot) faceting that allows stats to be collected within the faceting component. We built it with the following needs in mind:
1) Supports pivot faceting (stats at each level)
2) Supports distributed statistical operations

If you look at slide 41 of this presentation, you'll get a really good feel for what this patch does:
http://www.slideshare.net/treygrainger/building-a-real-time-big-data-analytics-platform-with-solr

The primary focus initially was on calculating percentiles of numerical values in a distributed way (using bucketing similar to range faceting), but we are also in the process of adding distributed sum. Other distributable calculations are possible; we just haven't needed them yet, so we haven't added them.

-Trey

On Tue, Feb 11, 2014 at 2:24 PM, Steve Molloy smol...@opentext.com wrote:

> Trying to make sense of all the issues around this and not sure which way to go. Both the Stats and Analytics components are missing some features I would need. Stats cannot limit or order facets, for instance, and I'd like to see pivot support. On the other hand, Analytics doesn't support distribution at all, which is a must in my case. So, I guess what I'm trying to ask is whether I should look at extending Stats or Analytics? Which way is the community going for future releases? (I would share any extension, but that would be useless if done on the wrong component.)
>
> Thanks,
> Steve
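The bucketed-percentile idea described above can be sketched as follows. This is a hypothetical illustration (not the SOLR-3583 code): each shard buckets its raw field values into a histogram, the coordinator merges the histograms additively, and the percentile is read off the merged counts, so only bucket counts ever cross the wire.

```python
from collections import Counter

def shard_histogram(values, bucket_size):
    """Bucket raw field values on one shard, similar in spirit to range faceting."""
    return Counter(v // bucket_size for v in values)

def merge_histograms(histograms):
    """Coordinator-side merge: bucket counts are simply additive across shards."""
    merged = Counter()
    for h in histograms:
        merged.update(h)
    return merged

def approx_percentile(merged, bucket_size, pct):
    """Walk buckets in order until pct percent of the total count is covered."""
    total = sum(merged.values())
    target = total * pct / 100.0
    seen = 0
    for bucket in sorted(merged):
        seen += merged[bucket]
        if seen >= target:
            # Report the bucket's upper bound as the estimate.
            return (bucket + 1) * bucket_size
    return None

# Two "shards" of numeric field values, bucketed in units of 10.
h1 = shard_histogram([5, 12, 17, 23], bucket_size=10)
h2 = shard_histogram([8, 31, 44, 49], bucket_size=10)
merged = merge_histograms([h1, h2])
print(approx_percentile(merged, 10, 50))  # 20
```

The accuracy/space trade-off is controlled entirely by the bucket size, which is why this style of estimate distributes cheaply.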
[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting
[ https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894547#comment-13894547 ]

Trey Grainger commented on SOLR-2894:
--------------------------------------

FYI, the last distributed pivot facet patch works functionally, but it uses some sub-optimal data structures and does some unnecessary duplicate processing of values. As a result, we found that for certain worst-case scenarios (i.e. data is not randomly distributed across Solr cores and requires significant refinement), pivot facets with multiple levels could take over a minute to aggregate and process results. This was using a dataset of several hundred million documents and dozens of pivot facets across 120 Solr cores distributed over 20 servers, so it is a more extreme use case than most will encounter. Nevertheless, we've refactored the code and data structures and brought the processing time from over a minute down to less than a second using the above configuration. We plan to post the patch within the next week.

Implement distributed pivot faceting
------------------------------------

                 Key: SOLR-2894
                 URL: https://issues.apache.org/jira/browse/SOLR-2894
             Project: Solr
          Issue Type: Improvement
            Reporter: Erik Hatcher
             Fix For: 4.7
         Attachments: SOLR-2894-reworked.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch

Following up on SOLR-792, pivot faceting currently only supports undistributed mode. Distributed pivot faceting needs to be implemented.
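The data-structure change described above (replacing repeated scans of value lists with hashed lookups when aggregating shard responses) can be sketched roughly like this. The shape of the pivot tree here is hypothetical and is not the SOLR-2894 patch itself:

```python
def merge_pivots(shard_pivots):
    """Merge per-shard pivot facet trees.

    Each shard contributes a nested dict of the form:
      {field_value: {"count": int, "pivot": {child_value: {...}}}}
    Dict lookups keep each merge level O(number of values) instead of
    rescanning a list of entries for every incoming value.
    """
    merged = {}
    for pivot in shard_pivots:
        _merge_level(merged, pivot)
    return merged

def _merge_level(into, level):
    for value, node in level.items():
        slot = into.setdefault(value, {"count": 0, "pivot": {}})
        slot["count"] += node["count"]
        _merge_level(slot["pivot"], node.get("pivot", {}))

# Two shards reporting counts for a country -> state pivot.
shard1 = {"us": {"count": 3, "pivot": {"ga": {"count": 2, "pivot": {}}}}}
shard2 = {"us": {"count": 4, "pivot": {"ga": {"count": 1, "pivot": {}},
                                       "ny": {"count": 3, "pivot": {}}}}}
merged = merge_pivots([shard1, shard2])
print(merged["us"]["count"])                 # 7
print(merged["us"]["pivot"]["ga"]["count"])  # 3
```

With skewed (non-random) document distribution, the coordinator additionally has to refine: re-query shards for values they did not report, which is the expensive step the comment above is describing.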
[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting
[ https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895413#comment-13895413 ]

Trey Grainger commented on SOLR-2894:
--------------------------------------

Thanks, Yonik. I worked on the architecture and design, but it's really been a team effort by several of us at CB. Chris worked on the initial patch, Andrew hardened it, and Brett (who will post the next version) focused on the soon-to-be-posted performance optimizations. We're deploying the new version to production right now to sanity-check it before posting the patch, but I think the upcoming version will finally be ready for review for committing.
[jira] [Commented] (SOLR-5027) Field Collapsing PostFilter
[ https://issues.apache.org/jira/browse/SOLR-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839665#comment-13839665 ]

Trey Grainger commented on SOLR-5027:
--------------------------------------

Interesting. I've been playing around with the Collapsing QParser and, because of the reason Gabe mentioned, I can think of very few use cases for it in its current implementation. Specifically, because there is no way to break a tie between multiple documents with the same value (the way sorting does), a search that is sorted by score desc, modifieddt desc (newer documents break the tie) is not possible... it just collapses based upon the first document in the index with the duplicate score. Many of my use cases are even trickier... something like sort by displaypriority desc, score desc, modifieddt desc.

Just brainstorming here, but if sorting documents before collapsing is not possible (due to where in the code stack the collapsing occurs), then it might be possible to just implement a sort function (ValueSource) that gives an ordinal score to each document based upon the position it would occupy within all documents. If I understand what you mean when you say group-head selection is based upon the min/max of the function, then this would effectively allow collapsing on sorted values, because the sort function would return higher values for documents which would sort higher. In that case, the sort function (which could read in the current sort parameter from the search request) could even be the default used by collapsing, since that is probably what users are expecting to happen (this is consistent with how grouping works, for example). Thoughts?
Field Collapsing PostFilter
---------------------------

                 Key: SOLR-5027
                 URL: https://issues.apache.org/jira/browse/SOLR-5027
             Project: Solr
          Issue Type: New Feature
          Components: search
    Affects Versions: 5.0
            Reporter: Joel Bernstein
            Assignee: Joel Bernstein
            Priority: Minor
             Fix For: 4.6, 5.0
         Attachments: SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch

This ticket introduces the *CollapsingQParserPlugin*.

The *CollapsingQParserPlugin* is a PostFilter that performs field collapsing. This is a high-performance alternative to standard Solr field collapsing (with *ngroups*) when the number of distinct groups in the result set is high. For example, in one performance test, a search with 10 million full results and 1 million collapsed groups:

Standard grouping with ngroups: 17 seconds.
CollapsingQParserPlugin: 300 milliseconds.

Sample syntax:

Collapse based on the highest scoring document:
{code}
fq={!collapse field=field_name}
{code}

Collapse based on the min value of a numeric field:
{code}
fq={!collapse field=field_name min=field_name}
{code}

Collapse based on the max value of a numeric field:
{code}
fq={!collapse field=field_name max=field_name}
{code}

Collapse with a null policy:
{code}
fq={!collapse field=field_name nullPolicy=null_policy}
{code}

There are three null policies:
ignore: removes docs with a null value in the collapse field (default).
expand: treats each doc with a null value in the collapse field as a separate group.
collapse: collapses all docs with a null value into a single group using either highest score, or min/max.

The CollapsingQParserPlugin also fully supports the QueryElevationComponent.

*Note:* The July 16 patch also includes an ExpandComponent that expands the collapsed groups for the current search result page. This functionality will be moved to its own ticket.
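The tie-breaking behavior discussed in the comment above can be sketched outside of Solr. This is a hypothetical illustration of collapsing with a composite sort key (not the CollapsingQParserPlugin implementation): the group head is the document that would sort first under score desc, modifieddt desc, so equal scores are broken by recency instead of index order.

```python
def collapse(docs, field, sort_keys):
    """Keep one doc per collapse-field value, choosing the group head by a
    composite sort. `sort_keys` is a list of (doc_key, reverse) pairs, e.g.
    [("score", True), ("modifieddt", True)] for score desc, modifieddt desc.
    """
    def rank(doc):
        # Negate numeric keys marked reverse so "larger is better" sorts first.
        return tuple((-doc[k] if rev else doc[k]) for k, rev in sort_keys)

    heads = {}
    for doc in docs:
        group = doc[field]
        if group not in heads or rank(doc) < rank(heads[group]):
            heads[group] = doc
    # Return the group heads in the same sort order.
    return sorted(heads.values(), key=rank)

docs = [
    {"id": 1, "group": "a", "score": 1.0, "modifieddt": 100},
    {"id": 2, "group": "a", "score": 1.0, "modifieddt": 200},  # tied score, newer
    {"id": 3, "group": "b", "score": 2.0, "modifieddt": 50},
]
result = collapse(docs, "group", [("score", True), ("modifieddt", True)])
print([d["id"] for d in result])  # [3, 2]
```

Note that doc 2 wins group "a" despite doc 1 appearing first, which is exactly the tie-break that collapsing on raw score alone cannot express.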
[jira] [Commented] (SOLR-5027) Field Collapsing PostFilter
[ https://issues.apache.org/jira/browse/SOLR-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839754#comment-13839754 ]

Trey Grainger commented on SOLR-5027:
--------------------------------------

Thinking about this more, it's probably going to be hard to implement an efficient sort ValueSource, as it would probably have to loop through all docs in the index during construction and sort them, caching the sort order for all docs so that it is available later when the value for each document is asked for separately. It would probably work functionally, but it seems like there's got to be a better way in the Collapse QParser itself...
[jira] [Created] (SOLR-5524) Exception when using Query Function inside Scale Function
Trey Grainger created SOLR-5524:
-----------------------------------

             Summary: Exception when using Query Function inside Scale Function
                 Key: SOLR-5524
                 URL: https://issues.apache.org/jira/browse/SOLR-5524
             Project: Solr
          Issue Type: Bug
    Affects Versions: 4.6
            Reporter: Trey Grainger
            Priority: Minor
             Fix For: 4.7

If you try to use the query function inside the scale function, it throws the following exception:

org.apache.lucene.search.BooleanQuery$BooleanWeight cannot be cast to org.apache.lucene.queries.function.valuesource.ScaleFloatFunction$ScaleInfo

Here is an example request that invokes this:
http://localhost:8983/solr/collection1/select?q=*:*&fl=scale(query($x),0,5)&x=hello
[jira] [Commented] (SOLR-5524) Exception when using Query Function inside Scale Function
[ https://issues.apache.org/jira/browse/SOLR-5524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837319#comment-13837319 ]

Trey Grainger commented on SOLR-5524:
--------------------------------------

I just debugged the code and uncovered the problem. There is a Map (called context) that is passed through to each ValueSource to store intermediate state, and both the query and scale functions are passing the ValueSource for the query function in as the KEY to this Map (as opposed to using some composite key that makes sense in the current context). Essentially, these lines are overwriting each other:

Inside ScaleFloatFunction:
context.put(this.source, scaleInfo); // this.source refers to the QueryValueSource, and scaleInfo refers to a ScaleInfo object

Inside QueryValueSource:
context.put(this, w); // this refers to the same QueryValueSource from above, and w refers to a Weight object

As such, when the ScaleFloatFunction later goes to read the ScaleInfo from the context Map, it unexpectedly pulls the Weight object out instead, and thus the invalid cast exception occurs. The no-op multiplication works because it puts a different ValueSource between the query and the ScaleFloatFunction, such that this.source (in ScaleFloatFunction) != this (in QueryValueSource).
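The collision described above can be reproduced in miniature. This is a Python stand-in with hypothetical class names (not the Lucene code), using id() for object identity in place of Java's map keys: the buggy variant keys the scale info by the inner source, the same key the inner source uses for its Weight, so the second put clobbers the first.

```python
class QuerySource:
    def create_weight(self, context):
        context[id(self)] = "weight"  # QueryValueSource: context.put(this, w)

class ScaleSource:
    def __init__(self, source):
        self.source = source

    def create_weight_buggy(self, context):
        # Bug: keys the scale info by the *inner* source -- the same key the
        # inner source uses for its Weight, so one entry overwrites the other.
        context[id(self.source)] = "scaleInfo"
        self.source.create_weight(context)

    def create_weight_fixed(self, context):
        # Fix (in the spirit of SOLR-5524.patch): key the scale info by *this*.
        context[id(self)] = "scaleInfo"
        self.source.create_weight(context)

inner = QuerySource()
ctx_buggy = {}
ScaleSource(inner).create_weight_buggy(ctx_buggy)
print(ctx_buggy[id(inner)])  # weight -- the scaleInfo entry was overwritten

ctx = {}
scale = ScaleSource(inner)
scale.create_weight_fixed(ctx)
print(ctx[id(scale)], ctx[id(inner)])  # scaleInfo weight
```

In the buggy case, reading the "scaleInfo" slot back yields the Weight, which is exactly the BooleanWeight-cannot-be-cast-to-ScaleInfo exception reported in the ticket.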
[jira] [Updated] (SOLR-5524) Exception when using Query Function inside Scale Function
[ https://issues.apache.org/jira/browse/SOLR-5524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Trey Grainger updated SOLR-5524:
--------------------------------
    Attachment: SOLR-5524.patch

Simple patch. It just changes the ScaleFloatFunction to use itself as the key instead of the ValueSource it is using internally (its first parameter). This seems consistent with how other ValueSources (such as the QueryValueSource) work, and it fixes the issue at hand.
[jira] [Commented] (SOLR-4478) Allow cores to specify a named config set in non-SolrCloud mode
[ https://issues.apache.org/jira/browse/SOLR-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13787277#comment-13787277 ]

Trey Grainger commented on SOLR-4478:
--------------------------------------

(moving this from my previous e-mail to the solr-dev mailing list)

There are two use cases that appear broken with the new core auto-discovery mechanism:

1) *The Core Admin Handler's CREATE command no longer works to create brand new cores* (unless you have logged on to the box and created the core's directory structure manually, which largely defeats the purpose of the CREATE command). With the old solr.xml format, we could spin up as many cores as we wanted dynamically with the following command:

http://localhost:8983/solr/admin/cores?action=CREATE&name=newCore1&instanceDir=collection1&dataDir=newCore1/data
...
http://localhost:8983/solr/admin/cores?action=CREATE&name=newCoreN&instanceDir=collection1&dataDir=newCoreN/data

In the new core discovery mode, this exception is now thrown:

Error CREATEing SolrCore 'newCore1': Could not create a new core in solr/collection1/ as another core is already defined there

The exception is being intentionally thrown in CorePropertiesLocator.java because a core.properties file already exists in solr/collection1 (and only one can exist per directory).

2) *Having a shared configuration directory (instanceDir) across many cores no longer works.* Every core has to have its own conf/ directory, and this doesn't seem to be overridable any longer. Previously, it was possible to have many cores share the same instanceDir (and just override their dataDir, for obvious reasons). Now, it is necessary to copy and paste identical config files for each Solr core.

I don't know if there's already a current roadmap for fixing this. I saw https://issues.apache.org/jira/browse/SOLR-4478, which suggested replacing instanceDir with the ability to specify a named configSet.
This solves problem 2, but not problem 1 (since you still can't have multiple core.properties files in the same folder). Based on Erick's comments in the JIRA ticket, it also sounds like this ticket is dead at the moment. There is definitely a need to have a shared config directory - whether that is through a configSet or an explicit indexDir doesn't matter to me. There's also a need to be able to dynamically create Solr cores from external systems. I currently can't upgrade to core auto-discovery because it doesn't allow dynamic core creation. Does anyone have some thoughts on how best to get these features working again under core auto-discovery? Adding instanceDir to core.properties seems like an easy solution, but there must be a desire not to do that or it would probably have already been done. I'm happy to contribute some time to resolving this if there is an agreed-upon path forward.

Allow cores to specify a named config set in non-SolrCloud mode
---------------------------------------------------------------

                 Key: SOLR-4478
                 URL: https://issues.apache.org/jira/browse/SOLR-4478
             Project: Solr
          Issue Type: Improvement
    Affects Versions: 4.2, 5.0
            Reporter: Erick Erickson
         Attachments: SOLR-4478.patch, SOLR-4478.patch

Part of moving forward to the new way, after SOLR-4196 etc... I propose an additional parameter specified on the core node in solr.xml or as a parameter in the discovery mode core.properties file, call it configSet, where the value provided is a path to a directory, either absolute or relative. Really, this is as though you copied the conf directory somewhere to be used by more than one core.

Straw-man: There will be a directory solr_home/configsets which will be the default. If the configSet parameter is, say, myconf, then I'd expect a directory named myconf to exist in solr_home/configsets, which would look something like:
solr_home/configsets/myconf/schema.xml
solr_home/configsets/myconf/solrconfig.xml
solr_home/configsets/myconf/stopwords.txt
solr_home/configsets/myconf/velocity
solr_home/configsets/myconf/velocity/query.vm
etc.

If multiple cores used the same configSet, schema, solrconfig etc.
would all be shared (i.e. shareSchema=true would be assumed). I don't see a good use case for _not_ sharing schemas, so I don't propose to allow this to be turned off. Hmmm, what if shareSchema is explicitly set to false in the solr.xml or properties file? I'd guess it should be honored, but maybe log a warning?

Mostly I'm putting this up for comments. I know that there are already thoughts about how this all should work floating around, so before I start any work on this I thought I'd at least get an idea of whether this is the way people are thinking about going. configSet can be either a relative or absolute path; if relative, it's assumed to be relative to solr_home.

Thoughts?
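The straw-man layout above amounts to a small resolution rule. Here is a minimal sketch of that rule, assuming the proposed solr_home/configsets default (the function name and the solrconfig.xml existence check are illustrative, not Solr's actual code):

```python
import os
import tempfile

def resolve_configset(solr_home, configset):
    """Resolve a configSet value to a conf directory.

    Relative names are looked up under solr_home/configsets (the proposed
    default location); absolute paths are used as-is. Raises if the
    directory is missing or holds no solrconfig.xml.
    """
    if os.path.isabs(configset):
        conf_dir = configset
    else:
        conf_dir = os.path.join(solr_home, "configsets", configset)
    if not os.path.isdir(conf_dir):
        raise FileNotFoundError("no such configset: %s" % conf_dir)
    if not os.path.isfile(os.path.join(conf_dir, "solrconfig.xml")):
        raise FileNotFoundError("configset has no solrconfig.xml: %s" % conf_dir)
    return conf_dir

# Demo with a throwaway solr_home holding one shared configset.
home = tempfile.mkdtemp()
os.makedirs(os.path.join(home, "configsets", "myconf"))
open(os.path.join(home, "configsets", "myconf", "solrconfig.xml"), "w").close()
print(resolve_configset(home, "myconf").endswith(os.path.join("configsets", "myconf")))  # True
```

Because resolution is by name, any number of cores can point their configSet at the same directory, which is the sharing behavior the ticket is after.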
[jira] [Commented] (SOLR-4478) Allow cores to specify a named config set in non-SolrCloud mode
[ https://issues.apache.org/jira/browse/SOLR-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13787278#comment-13787278 ]

Trey Grainger commented on SOLR-4478:
--------------------------------------

(Erick's response to my post)

Right, let's move this discussion to SOLR-4779. There's some history here. Sharing named config sets got a bit wrapped up in sharing the underlying solrconfig object. The latter has been taken off the table, but we should discuss fixing Trey's issues up. Here's what the thinking was: there would be a directory like solr_home/configs/configset1, solr_home/configs/configset2, etc. Then a new parameter for core.properties or create or whatever, like configset=configset1, that would be smart enough to look in solr_home/configs for an entire conf directory named configset1.

Trey: Does that work for your case? If so, please add your comments to 4779 and we can take it from there. FWIW, I don't think this is especially hard, but time is always at a premium.
[jira] [Comment Edited] (SOLR-4478) Allow cores to specify a named config set in non-SolrCloud mode
[ https://issues.apache.org/jira/browse/SOLR-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13787278#comment-13787278 ]

Trey Grainger edited comment on SOLR-4478 at 10/5/13 5:47 PM:
--------------------------------------------------------------

(Erick's response to my post)

Right, let's move this discussion to SOLR-4779. There's some history here. Sharing named config sets got a bit wrapped up in sharing the underlying solrconfig object. The latter has been taken off the table, but we should discuss fixing Trey's issues up. Here's what the thinking was: there would be a directory like solr_home/configs/configset1, solr_home/configs/configset2, etc. Then a new parameter for core.properties or create or whatever, like configset=configset1, that would be smart enough to look in solr_home/configs for an entire conf directory named configset1.

Trey: Does that work for your case? If so, please add your comments to 4779 and we can take it from there. FWIW, I don't think this is especially hard, but time is always at a premium.
[jira] [Comment Edited] (SOLR-4478) Allow cores to specify a named config set in non-SolrCloud mode
[ https://issues.apache.org/jira/browse/SOLR-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13787282#comment-13787282 ]

Trey Grainger edited comment on SOLR-4478 at 10/5/13 5:50 PM:
--------------------------------------------------------------

Hi Erick,

Yes, that resolves the harder of the two problems. The other issue is that since a dedicated folder is now required per core (to hold the core.properties file), the core _CREATE_ command now also needs to be able to create the folder for the new core if it doesn't exist. Something like:

http://localhost:8983/solr/admin/cores?action=CREATE&name=newCore&*coreDir=cores/newCore*&configset=sharedconfig

Alternatively, _instanceDir_ could continue to serve that function (instead of being deprecated):

http://localhost:8983/solr/admin/cores?action=CREATE&name=newCore&*instanceDir=cores/newCore*&configset=sharedconfig

I think the combination of adding configSet and adding the ability for the CREATE command to actually create the new folder to hold core.properties should handle the use case.
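The straw-man directory layout described above can be scaffolded with a few lines of Python. This is an illustrative sketch of the proposed `solr_home/configsets/<name>/` convention only; the helper name and file list are examples, not part of any Solr API.

```python
from pathlib import Path
import tempfile

def scaffold_configset(solr_home, name,
                       files=("schema.xml", "solrconfig.xml", "stopwords.txt")):
    # Lay out solr_home/configsets/<name>/ as in the straw-man proposal:
    # shared conf files live once under configsets/ and would be referenced
    # by many cores via the configSet property.
    conf = Path(solr_home) / "configsets" / name
    conf.mkdir(parents=True, exist_ok=True)
    for f in files:
        (conf / f).touch()
    return conf

solr_home = tempfile.mkdtemp()       # stand-in for a real solr_home
conf = scaffold_configset(solr_home, "myconf")
```

Under this layout, every core pointing at configSet=myconf would share the same schema.xml and solrconfig.xml, which is why the proposal assumes shareSchema=true.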
Roadmap for fixing features broken by core autodiscovery
There are two use-cases that appear broken with the new core auto-discovery mechanism: *1) The Core Admin Handler's CREATE command no longer works to create brand new cores* (unless you have logged on to the box and created the core's directory structure manually, which largely defeats the purpose of the CREATE command). With the old solr.xml format, we could spin up as many cores as we wanted dynamically with the following commands: http://localhost:8983/solr/admin/cores?action=CREATE&name=newCore1&instanceDir=collection1&dataDir=newCore1/data ... http://localhost:8983/solr/admin/cores?action=CREATE&name=newCoreN&instanceDir=collection1&dataDir=newCoreN/data In the new core discovery mode, this exception is now thrown: Error CREATEing SolrCore 'newCore1': Could not create a new core in solr/collection1/ as another core is already defined there The exception is being intentionally thrown in CorePropertiesLocator.java because a core.properties file already exists in solr/collection1 (and only one can exist per directory). *2) Having a shared configuration directory (instanceDir) across many cores no longer works*. Every core has to have its own conf/ directory, and this doesn't seem to be overridable any longer. Previously, it was possible to have many cores share the same instanceDir (and just override their dataDir for obvious reasons). Now, it is necessary to copy and paste identical config files for each Solr core. I don't know if there's already a current roadmap for fixing this. I saw https://issues.apache.org/jira/browse/SOLR-4478, which suggested replacing instanceDir with the ability to specify a named configSet. This solves problem 2, but not problem 1 (since you still can't have multiple core.properties files in the same folder). Based on Erick's comments in the JIRA ticket, it also sounds like that ticket is dead at the moment.
There is definitely a need to have a shared config directory - whether that is through a configSet or an explicit instanceDir doesn't matter to me. There's also a need to be able to dynamically create Solr cores from external systems. I currently can't upgrade to core auto-discovery because it doesn't allow dynamic core creation. Does anyone have thoughts on how to best get these features working again under core auto-discovery? Adding instanceDir to core.properties seems like an easy solution, but there must be a desire not to do that or it would probably have already been done. I'm happy to contribute some time to resolving this if there is an agreed-upon path forward. Thanks, -Trey
[jira] [Created] (SOLR-5052) eDisMax Field Aliasing behaving oddly when invalid field is present
Trey Grainger created SOLR-5052: --- Summary: eDisMax Field Aliasing behaving oddly when invalid field is present Key: SOLR-5052 URL: https://issues.apache.org/jira/browse/SOLR-5052 Project: Solr Issue Type: Bug Components: query parsers Affects Versions: 4.3.1 Environment: AWS / Ubuntu Reporter: Trey Grainger Priority: Minor Fix For: 4.5 Field Aliasing for the eDisMax query parser behaves in a very odd manner if an invalid field is specified in any of the aliases. Essentially, instead of throwing an exception on an invalid alias, it breaks all of the other aliased fields such that they will only handle the first term correctly. Take the following example: /select?defType=edismax&f.who.qf=personLastName_t^30 personFirstName_t^10&f.what.qf=itemName_t companyName_t^5&f.where.qf=cityName_t^10 INVALIDFIELDNAME^20 countryName_t^35 postalCodeName_t^30&q=who:(trey grainger) what:(solr) where:(atlanta, ga)&debugQuery=true&df=text The terms trey, solr and atlanta correctly search across the aliased fields, but the terms grainger and ga are incorrectly being searched across the default field (text).
Here is the parsed query from the debug output: <lst name="debug"> <str name="rawquerystring">who:(trey grainger) what:(solr) where:(decatur, ga)</str> <str name="querystring">who:(trey grainger) what:(solr) where:(decatur, ga)</str> <str name="parsedquery">(+(DisjunctionMaxQuery((personFirstName_t:trey^10.0 | personLastName_t:trey^30.0)) DisjunctionMaxQuery((text:grainger)) DisjunctionMaxQuery((itemName_t:solr | companyName_t:solr^5.0)) DisjunctionMaxQuery((postalCodeName_t:decatur^30.0 | countryName_t:decatur^35.0 | cityName_t:decatur^10.0)) DisjunctionMaxQuery((text:ga))))/no_coord</str> <str name="parsedquery_toString">+((personFirstName_t:trey^10.0 | personLastName_t:trey^30.0) (text:grainger) (itemName_t:solr | companyName_t:solr^5.0) (postalCodeName_t:decatur^30.0 | countryName_t:decatur^35.0 | cityName_t:decatur^10.0) (text:ga))</str> </lst> I think the presence of an invalid field in a qf parameter should throw an exception (or throw the invalid field away in that alias), but it shouldn't break the aliases for the other fields. For the record, if there are no invalid fields in any of the aliases, all of the aliases work. If there is one invalid field in any of the aliases, all of the aliases act oddly like this.
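The per-field aliasing convention used in the bug report - `f.<alias>.qf` defining a pseudo-field that expands to a weighted list of real fields - can be sketched as a request built programmatically. The field names are the ones from the report above; the exact parameter encoding here is illustrative.

```python
from urllib.parse import urlencode

# eDisMax per-field aliasing: each f.<alias>.qf maps a pseudo-field
# (who, what, where) onto boosted real fields. Including an invalid
# field name in any qf list is what triggers the reported bug.
params = {
    "defType": "edismax",
    "df": "text",
    "f.who.qf": "personLastName_t^30 personFirstName_t^10",
    "f.what.qf": "itemName_t companyName_t^5",
    "f.where.qf": "cityName_t^10 countryName_t^35 postalCodeName_t^30",
    "q": "who:(trey grainger) what:(solr) where:(atlanta, ga)",
    "debugQuery": "true",
}
query_string = urlencode(params)
```

Appending `query_string` to `/select?` reproduces the report's request (minus the deliberately invalid field), with parameters properly `&`-separated and URL-encoded.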
[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting
[ https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13709360#comment-13709360 ] Trey Grainger commented on SOLR-2894: - @[~Otis], we have this patch live in production for several use cases (as a pre-requisite for SOLR-3583, which we've also worked on at CareerBuilder), but the currently known issues which would prevent this from being committed include: 1) Tags and Excludes are not being respected beyond the first level 2) The facet.limit=-1 issue (not returning all values) 3) The lack of support for datetimes We need #1, and Andrew is currently working on a project to fix this. He's also looking to fix #3 and find a reasonably scalable solution to #2. I'm not sure when the Solr 4.4 vote is going to be, but it'll probably be a few more weeks until this patch is all wrapped up. Meanwhile, if anyone else finds any issues with the patch, please let us know so they can be looked into. Thanks! Implement distributed pivot faceting Key: SOLR-2894 URL: https://issues.apache.org/jira/browse/SOLR-2894 Project: Solr Issue Type: Improvement Reporter: Erik Hatcher Fix For: 4.4 Attachments: SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894-reworked.patch Following up on SOLR-792, pivot faceting currently only supports undistributed mode. Distributed pivot faceting needs to be implemented.
Re: [ANNOUNCE] Solr wiki editing change
Please add TreyGrainger to the contributors group. Thanks! -Trey On Sun, Mar 24, 2013 at 11:18 PM, Steve Rowe sar...@gmail.com wrote: The wiki at http://wiki.apache.org/solr/ has come under attack by spammers more frequently of late, so the PMC has decided to lock it down in an attempt to reduce the work involved in tracking and removing spam. From now on, only people who appear on http://wiki.apache.org/solr/ContributorsGroup will be able to create/modify/delete wiki pages. Please request either on the solr-u...@lucene.apache.org or on dev@lucene.apache.org to have your wiki username added to the ContributorsGroup page - this is a one-time step. Steve
[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting
[ https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407459#comment-13407459 ] Trey Grainger commented on SOLR-2894: - Hi Erik, Sorry, I missed your original message asking me if I could test out the latest patch - I'd be happy to help. I just tried both your patch and the April 25th patch against the Solr 4.0 Alpha revision and neither applied cleanly. I'll see if I can find some time on Sunday to try to get a revision sorted out which will work with the current version. I think there are some changes in the April 24th patch which may need to be re-applied if your changes were based upon the earlier patch. I'll know more once I've had a chance to dig in later this weekend. Thanks, -Trey Implement distributed pivot faceting Key: SOLR-2894 URL: https://issues.apache.org/jira/browse/SOLR-2894 Project: Solr Issue Type: Improvement Reporter: Erik Hatcher Assignee: Erik Hatcher Fix For: 4.0 Attachments: SOLR-2894.patch, SOLR-2894.patch, distributed_pivot.patch, distributed_pivot.patch Following up on SOLR-792, pivot faceting currently only supports undistributed mode. Distributed pivot faceting needs to be implemented.
[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting
[ https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294795#comment-13294795 ] Trey Grainger commented on SOLR-2894: - For what it's worth, we're actively using the April 25th version of this patch in production at CareerBuilder (with an older version of trunk) with no issues. Implement distributed pivot faceting Key: SOLR-2894 URL: https://issues.apache.org/jira/browse/SOLR-2894 Project: Solr Issue Type: Improvement Reporter: Erik Hatcher Assignee: Erik Hatcher Fix For: 4.0 Attachments: SOLR-2894.patch, distributed_pivot.patch, distributed_pivot.patch Following up on SOLR-792, pivot faceting currently only supports undistributed mode. Distributed pivot faceting needs to be implemented.
[jira] [Commented] (SOLR-2614) stats with pivot
[ https://issues.apache.org/jira/browse/SOLR-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293249#comment-13293249 ] Trey Grainger commented on SOLR-2614: - Hi Terrance, We (at CareerBuilder) recently built a patch which could serve as a good starting point for this. We built the ability to calculate Percentiles (i.e. 25th, 50th, etc.) and Averages using multi-level (distributed) Pivot Facets. It works well enough for our use cases, and I'm sure the other stats types mentioned could be added in. It is dependent upon the distributed pivot faceting patch (SOLR-2894), which seems to be working well but has yet to be committed. I'll see if we can get the patch posted either as part of this JIRA or separately in the next day or so, which could save you some time in implementing the other types. -Trey Grainger CareerBuilder stats with pivot Key: SOLR-2614 URL: https://issues.apache.org/jira/browse/SOLR-2614 Project: Solr Issue Type: Improvement Components: SearchComponents - other Affects Versions: 4.0 Reporter: pengyao Priority: Critical Fix For: 4.1 Is it possible to get stats (like Stats Component: min, max, sum, count, missing, sumOfSquares, mean and stddev) from numeric fields inside hierarchical facets (with more than one level, like Pivot)? I would like to query: ...?q=*:*&version=2.2&start=0&rows=0&stats=true&stats.field=numeric_field1&stats.field=numeric_field2&stats.pivot=field_x,field_y,field_z and get min, max, sum, count, etc. from numeric_field1 and numeric_field2 for all combinations of field_x, field_y and field_z (hierarchical values). Using stats.facet I get just one field at one level, and using facet.pivot I get just counts, but no stats. Looping in the client application to do all combinations of facet values would be too slow because there are a lot of combinations. Thanks a lot! This is very important, because count-only values are sometimes not enough.
please add stats with pivot in solr 4.0 thanks a lot
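The request pengyao asks for above - StatsComponent metrics computed per pivot bucket - can be sketched as a parameter list. Note `stats.pivot` is the syntax proposed in this issue, not a parameter that existed in the Solr releases being discussed; repeated `stats.field` entries require a list of tuples rather than a dict.

```python
from urllib.parse import urlencode

# Proposed stats-with-pivot request: two stats.field parameters plus a
# (hypothetical) stats.pivot over a three-level field hierarchy.
params = [
    ("q", "*:*"),
    ("rows", "0"),
    ("stats", "true"),
    ("stats.field", "numeric_field1"),
    ("stats.field", "numeric_field2"),
    ("stats.pivot", "field_x,field_y,field_z"),
]
qs = urlencode(params)
```

The result is the properly `&`-separated query string that the issue description garbled.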
[jira] Commented: (SOLR-1837) Reconstruct a Document (stored fields, indexed fields, payloads)
[ https://issues.apache.org/jira/browse/SOLR-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847936#action_12847936 ] Trey Grainger commented on SOLR-1837: - Re: bugs in Luke that result in missing terms - I recently fixed one such bug, and indeed it was located in the DocReconstructor - if you are aware of others then please report them using the Luke issue tracker. I just pulled down the most recent Luke code, and it does look like that recent fix was made to cover the bug I saw. Unfortunately, the fix results in a null ref for me on my index. I'll open an issue, as it looks like all that's needed is an extra null check. Re: Document reconstruction is a very IO-intensive operation, so I would advise against using it on a production system, and also it produces inexact results (because analysis is usually a lossy operation). I hear you about it being IO-intensive. There are also other admin tools in Solr which do similarly intensive operations (the schema browser, for example, which generates a list of all fields and a distribution of terms within those fields). The intent of the tool is for one-off debugging, not for any kind of automated querying, but I'll try to do some tests to see to what degree this tool is affecting our current production systems (I have not seen any noticeable effect thus far). Also, regarding the process being lossy: in this case, that is kind of the point of the tool (in my use) - to see what has actually been put into the index vs. what was in the document sent to the engine.
For example, if I index a field with the text Wi-fi hotspots are a life-saver with payloads on parts of speech, as well as stemming, I want to be able to see something like: wi [1] / fi [1] | wifi [1] / hotspot [1] / are [2] / a [3] / life [1] / saver [1] | lifesaver [1] With no payloads, this would simply be wi / fi | wifi / hotspots | hotspot / are / a / life / saver | lifesaver So I had initially named the tool the Solr Document Reconstructor, after the name you gave to the tool in Luke. Based on your comments, I think it might be less confusing for me to call it something like Document Inspector, since it is not truly reconstructing the original document. I'll try to get what I have pushed up today so you can check it out if you want. Thanks for your great work on that tool! Reconstruct a Document (stored fields, indexed fields, payloads) Key: SOLR-1837 URL: https://issues.apache.org/jira/browse/SOLR-1837 Project: Solr Issue Type: New Feature Components: Schema and Analysis, web gui Affects Versions: 1.5 Environment: All Reporter: Trey Grainger Priority: Minor Fix For: 1.5 Original Estimate: 168h Remaining Estimate: 168h One Solr feature I've been sorely in need of is the ability to inspect an index for any particular document. While the analysis page is good when you have specific content and a specific field/type you want to test the analysis process for, once a document is indexed it is not currently possible to easily see what is actually sitting in the index. One can use the Lucene Index Browser (Luke), but this has several limitations (gui only, doesn't understand the solr schema, doesn't display many non-text fields in human readable format, doesn't show payloads, some bugs lead to missing terms, exposes features dangerous to use in a production Solr environment, slow or difficult to check from a remote location, etc.).
The document reconstruction feature of Luke provides the base for what can become a much more powerful tool when coupled with Solr's understanding of a schema, however. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
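The lossy reconstruction discussed above can be illustrated with a toy sketch: given per-term position postings (the only information the inverted index retains), rebuild a position-ordered token stream, joining stacked tokens (stems, synonyms) with "|". This is an illustration of why reconstruction is inexact, not Luke's or Solr's actual implementation, and the postings dict is a hypothetical analyzer output.

```python
from collections import defaultdict

def reconstruct(postings):
    """Rebuild an approximate token stream from term -> positions postings.

    Only analyzed terms are recoverable, so tokens stacked at the same
    position come back joined with '|' and the original text is lost.
    """
    by_pos = defaultdict(list)
    for term, positions in postings.items():
        for pos in positions:
            by_pos[pos].append(term)
    return " / ".join("|".join(sorted(by_pos[p])) for p in sorted(by_pos))

# Hypothetical postings after analyzing "Wi-fi hotspots are a life-saver"
# with word-delimiter splitting and stemming/concatenation.
postings = {
    "wi": [0], "fi": [1], "wifi": [1],
    "hotspot": [2], "are": [3], "a": [4],
    "life": [5], "saver": [6], "lifesaver": [6],
}
stream = reconstruct(postings)
```

Running this yields a stream like the one in the comment above: the hyphenated originals are gone, and only the stacked analyzed terms remain.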
[jira] Updated: (SOLR-1837) Reconstruct a Document (stored fields, indexed fields, payloads)
[ https://issues.apache.org/jira/browse/SOLR-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trey Grainger updated SOLR-1837: Attachment: SOLR-1837.patch Here's what I have thus far. The only bug I currently know about is that Solr multi-valued fields (i.e. <field name="x">value1</field><field name="x">value2</field>) currently display as concatenated together instead of as an array of separate fields in the stored fields view. I've referred to the tool in the admin interface as the Document Inspector instead of Document Reconstructor to prevent confusion over lost/changed/added terms due to index-time analysis. Any feedback appreciated. Reconstruct a Document (stored fields, indexed fields, payloads) Key: SOLR-1837 URL: https://issues.apache.org/jira/browse/SOLR-1837 Project: Solr Issue Type: New Feature Components: Schema and Analysis, web gui Affects Versions: 1.5 Environment: All Reporter: Trey Grainger Priority: Minor Fix For: 1.5 Attachments: SOLR-1837.patch Original Estimate: 168h Remaining Estimate: 168h One Solr feature I've been sorely in need of is the ability to inspect an index for any particular document. While the analysis page is good when you have specific content and a specific field/type you want to test the analysis process for, once a document is indexed it is not currently possible to easily see what is actually sitting in the index. One can use the Lucene Index Browser (Luke), but this has several limitations (gui only, doesn't understand the solr schema, doesn't display many non-text fields in human readable format, doesn't show payloads, some bugs lead to missing terms, exposes features dangerous to use in a production Solr environment, slow or difficult to check from a remote location, etc.). The document reconstruction feature of Luke provides the base for what can become a much more powerful tool when coupled with Solr's understanding of a schema, however.
[jira] Created: (SOLR-1837) Reconstruct a Document (stored fields, indexed fields, payloads)
Reconstruct a Document (stored fields, indexed fields, payloads) Key: SOLR-1837 URL: https://issues.apache.org/jira/browse/SOLR-1837 Project: Solr Issue Type: New Feature Components: Schema and Analysis, web gui Affects Versions: 1.5 Environment: All Reporter: Trey Grainger Priority: Minor Fix For: 1.5 One Solr feature I've been sorely in need of is the ability to inspect an index for any particular document. While the analysis page is good when you have specific content and a specific field/type you want to test the analysis process for, once a document is indexed it is not currently possible to easily see what is actually sitting in the index. One can use the Lucene Index Browser (Luke), but this has several limitations (gui only, doesn't understand the solr schema, doesn't display many non-text fields in human readable format, doesn't show payloads, some bugs lead to missing terms, exposes features dangerous to use in a production Solr environment, slow or difficult to check from a remote location, etc.). The document reconstruction feature of Luke provides the base for what can become a much more powerful tool when coupled with Solr's understanding of a schema, however.
[jira] Updated: (SOLR-1837) Reconstruct a Document (stored fields, indexed fields, payloads)
[ https://issues.apache.org/jira/browse/SOLR-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Trey Grainger updated SOLR-1837:
Remaining Estimate: 168h (was: 120h)
Original Estimate: 168h (was: 120h)

Reconstruct a Document (stored fields, indexed fields, payloads)

Key: SOLR-1837
URL: https://issues.apache.org/jira/browse/SOLR-1837
Project: Solr
Issue Type: New Feature
Components: Schema and Analysis, web gui
Affects Versions: 1.5
Environment: All
Reporter: Trey Grainger
Priority: Minor
Fix For: 1.5
Original Estimate: 168h
Remaining Estimate: 168h
[jira] Commented: (SOLR-1837) Reconstruct a Document (stored fields, indexed fields, payloads)
[ https://issues.apache.org/jira/browse/SOLR-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847866#action_12847866 ]

Trey Grainger commented on SOLR-1837:

I've been working on implementing the document reconstruction feature over the past week and have created an additional admin page which exposes it. The functionality is essentially a reworking of the Lucene document reconstruction functionality in Luke, but with improvements to handle the problems listed in the JIRA issue description above. I'll be pushing up a patch soon and look forward to any additional recommendations after others have had a chance to try it out.

Reconstruct a Document (stored fields, indexed fields, payloads)

Key: SOLR-1837
URL: https://issues.apache.org/jira/browse/SOLR-1837
Project: Solr
Issue Type: New Feature
Components: Schema and Analysis, web gui
Affects Versions: 1.5
Environment: All
Reporter: Trey Grainger
Priority: Minor
Fix For: 1.5
Original Estimate: 168h
Remaining Estimate: 168h
[jira] Issue Comment Edited: (SOLR-422) one double quote or two double quotes only break search
[ https://issues.apache.org/jira/browse/SOLR-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745245#action_12745245 ]

Trey Grainger edited comment on SOLR-422 at 8/19/09 5:03 PM:

This issue is in the same ballpark as SOLR-874. Both concern bad parsing of fringe cases by the DisMax handler.

was (Author: tgrainger):
This issue is in the same ballpark as SOLR-878. Both concern bad parsing of fringe cases by the DisMax handler.

one double quote or two double quotes only break search
---

Key: SOLR-422
URL: https://issues.apache.org/jira/browse/SOLR-422
Project: Solr
Issue Type: Bug
Components: search
Reporter: Doug Daniels
Priority: Minor

Using Dismax, searching for either one double quote character (q=") or two double quote characters with no text between them (q="") throws an exception. Not sure whether this is also the case for other request handlers.
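The fringe cases under discussion are unbalanced or empty phrase quotes reaching the DisMax parser. One common way to handle them is a pre-parse fix-up that balances stray quotes and drops empty phrases before parsing. A hypothetical Python sketch of that idea (not the actual DisMax change, and the function name is illustrative):

```python
# Sketch of a quote-sanitizing pre-processor for a DisMax-style parser:
# balance stray double quotes and drop empty phrases so inputs like
# q=" and q="" degrade gracefully instead of throwing an exception.

def sanitize_quotes(q):
    if q.count('"') % 2 == 1:
        # Unbalanced: drop the last quote rather than fail the query.
        idx = q.rindex('"')
        q = q[:idx] + q[idx + 1:]
    # Collapse empty phrases (""), which match nothing but break parsing.
    q = q.replace('""', '')
    return q.strip()

# sanitize_quotes('"')                  -> ''
# sanitize_quotes('""')                 -> ''
# sanitize_quotes('solr "exact phrase') -> 'solr exact phrase'
```

Treating malformed user input leniently (rather than surfacing a parser exception) matches DisMax's stated goal of handling raw queries typed by end users.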
[jira] Updated: (SOLR-422) one double quote or two double quotes only break search
[ https://issues.apache.org/jira/browse/SOLR-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Trey Grainger updated SOLR-422:
---
Comment: was deleted
(was: This issue is in the same ballpark as SOLR-874. Both concern bad parsing of fringe cases by the DisMax handler.)

one double quote or two double quotes only break search
---

Key: SOLR-422
URL: https://issues.apache.org/jira/browse/SOLR-422
Project: Solr
Issue Type: Bug
Components: search
Reporter: Doug Daniels
Priority: Minor
[jira] Updated: (SOLR-422) one double quote or two double quotes only break search
[ https://issues.apache.org/jira/browse/SOLR-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Trey Grainger updated SOLR-422:
---
Comment: was deleted
(was: These issues both concern reworking of the Dismax parser to handle fringe cases and should be dealt with together.)

one double quote or two double quotes only break search
---

Key: SOLR-422
URL: https://issues.apache.org/jira/browse/SOLR-422
Project: Solr
Issue Type: Bug
Components: search
Reporter: Doug Daniels
Priority: Minor