Re: Welcome Mayya Sharipova as Lucene/Solr committer

2020-06-09 Thread Trey Grainger
Woohoo - awesome news! Congrats, Mayya!

Trey Grainger
Founder, Searchkernel <https://searchkernel.com>

On Mon, Jun 8, 2020 at 12:58 PM jim ferenczi  wrote:

> Hi all,
>
> Please join me in welcoming Mayya Sharipova as the latest Lucene/Solr
> committer.
> Mayya, it's tradition for you to introduce yourself with a brief bio.
>
> Congratulations and Welcome!
>
> Jim
>


Re: Welcome Eric Pugh as a Lucene/Solr committer

2020-04-06 Thread Trey Grainger
Congratulations, Eric!

On Mon, Apr 6, 2020 at 8:21 AM Jan Høydahl  wrote:

> Hi all,
>
> Please join me in welcoming Eric Pugh as the latest Lucene/Solr committer!
>
> Eric has been part of the Solr community for over a decade, as a code
> contributor, book author, company founder, blogger and mailing list
> contributor! We look forward to his future contributions!
>
> Congratulations and welcome! It is a tradition to introduce yourself with
> a brief bio, Eric.
>
> Jan Høydahl
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


Re: Welcome Alessandro Benedetti as a Lucene/Solr committer

2020-03-19 Thread Trey Grainger
Congrats, Alessandro!

-Trey

On Thu, Mar 19, 2020 at 10:36 AM Ignacio Vera  wrote:

> Welcome Alessandro!
>
> On Thu, Mar 19, 2020 at 3:21 PM Namgyu Kim  wrote:
>
>> Congrats and welcome! Alessandro :D
>>
>> On Thu, Mar 19, 2020 at 11:10 PM Michael Sokolov 
>> wrote:
>>
>>> Welcome Alessandro!
>>>
>>> On Wed, Mar 18, 2020 at 3:25 PM Alessandro Benedetti <
>>> a.benede...@sease.io> wrote:
>>>
 Thanks everyone for the warm welcome!
 I already know most of you but for all the others here's my brief bio :)

 I am Italian (possibly the only other Italian in addition to Tommaso),
 and I have been living in the UK for the last 7 years;
 I am currently based in London.
 I started working with Apache Solr back in 2010 (and a few months later
 with Apache Lucene). My first project was a search API that translated the
 Verity query language to Lucene syntax; at the time I was a junior software
 engineer with a background in Information Retrieval research at Roma Tre
 University.
 Since then I have explored a lot of different use cases for Apache
 Lucene/Solr and I spent more and more time studying and working with the
 internals, across various companies and positions.
 My favourite projects in my career have been the design and
 implementation of a Semantic Search engine called Sensify (when I was
 working in a small and cohesive R&D team at Zaizi, with Spanish friends and
 colleagues from Seville), the Apache Solr Learning To Rank plugin from
 Bloomberg (and its integrations/applications), and the Rated Ranking Evaluator
 project (an Open Source library for Search Quality Evaluation that we
 contributed back to the community).
 In 2016 I founded my own company, Sease, where we try to build a bridge
 between academia and industry through Open Source software in the
 domain of Information Retrieval.

 As David mentioned my main areas of contribution in Apache Lucene/Solr
 have been the More Like This, the Learning To Rank plugin, Synonyms
 expansion and the Suggester component.
 I have a lot of ideas on my to-do list, so stay tuned; we'll have a lot
 to discuss and innovate!

 It is a pleasure to join this group and I am sure we'll do great things
 together :)

 Cheers


 --
 Alessandro Benedetti
 Search Consultant, R&D Software Engineer, Director
 www.sease.io


 On Wed, 18 Mar 2020 at 13:00, David Smiley 
 wrote:

> Hi all,
>
> Please join me in welcoming Alessandro Benedetti as the latest
> Lucene/Solr committer!
>
> Alessandro has been contributing to Lucene and Solr for years, in areas such
> as More Like This, synonym boosting, and suggesters. Furthermore, he's been a
> help to many users on the solr-user mailing list and has helped others through
> his blog posts and presentations about search. We look forward to his future
> contributions.
>
> Congratulations and welcome!  It is a tradition to introduce yourself
> with a brief bio, Alessandro.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>



[PSA] Activate 2019 Call for Speakers ends May 8

2019-05-04 Thread Trey Grainger
Hi everyone,

I wanted to share a quick PSA for anyone who may have missed last month's
announcement: the call for speakers is currently open through *Wednesday,
May 8th*, for Activate 2019 (the Search and AI Conference), focused on the
Apache Solr ecosystem and the intersection of Search and AI:
https://lucidworks.com/2019/04/02/activate-2019-call-for-speakers/

The Activate Conference will be held September 9-12 in Washington, D.C.

The conference, rebranded last year from "Lucene/Solr Revolution", is
expected to grow considerably this year, and I'd like to encourage all of
you working on advancements in the Lucene/Solr project or working on
solving interesting problems in this space to consider submitting a talk if
you haven't already. There are tracks dedicated to Solr Development,
AI-powered Search, Search Development at Scale, and numerous other related
topics - including tracks for key use cases like digital commerce - that I
expect most on this list will find appealing.

If you're interested in presenting (your conference registration fee will
be covered if accepted), please submit a talk here:
https://activate-conf.com/speakers/

Just wanted to make sure everyone in the development and user community
here was aware of the conference and didn't miss the opportunity to submit
a talk by Wednesday if interested.

All the best,

Trey Grainger
Chief Algorithms Officer @ Lucidworks
https://www.linkedin.com/in/treygrainger/


Re: Congratulations to the new Lucene/Solr PMC chair, Cassandra Targett

2019-01-02 Thread Trey Grainger
Congratulations, Cassandra!

On Wed, Jan 2, 2019 at 9:31 AM Joel Bernstein  wrote:

> Congratulations Cassandra!
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Wed, Jan 2, 2019 at 8:39 AM Tommaso Teofili 
> wrote:
>
>> Congrats Cassandra!
>> On Wed, Jan 2, 2019 at 12:43 PM Shalin Shekhar Mangar
>>  wrote:
>> >
>> > Congratulations Cassandra!
>> >
>> > On Mon, Dec 31, 2018 at 1:08 PM Adrien Grand  wrote:
>> >>
>> >> Every year, the Lucene PMC rotates the Lucene PMC chair and Apache
>> >> Vice President position.
>> >>
>> >> This year we have nominated and elected Cassandra Targett as the
>> >> chair, a decision that the board approved in its December 2018
>> >> meeting.
>> >>
>> >> Congratulations, Cassandra!
>> >>
>> >> --
>> >> Adrien
>> >>
>> >>
>> >
>> >
>> > --
>> > Regards,
>> > Shalin Shekhar Mangar.
>>
>>
>>


[jira] [Commented] (SOLR-9418) Statistical Phrase Identifier

2018-09-05 Thread Trey Grainger (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604608#comment-16604608
 ] 

Trey Grainger commented on SOLR-9418:
-

I uploaded an updated patch today for this issue, contributing the 
CareerBuilder version that the initial patch for this issue was loosely based 
upon (thanks for the contribution, CareerBuilder!). I've had several people ask 
about this feature recently, and others have proposed some alternative 
implementations as well.

I'm posting this as a reference implementation for future development.

> Statistical Phrase Identifier
> -
>
> Key: SOLR-9418
> URL: https://issues.apache.org/jira/browse/SOLR-9418
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Akash Mehta
>Priority: Major
> Attachments: SOLR-9418.patch, SOLR-9418.zip
>
>
> h2. *Summary:*
> The Statistical Phrase Identifier is a Solr contribution that takes in a 
> string of text and then leverages a language model (an Apache Lucene/Solr 
> inverted index) to predict how the input text should be divided into 
> phrases. The intended purpose of this tool is to parse short-text queries 
> into phrases prior to executing a keyword search (as opposed to parsing out 
> each keyword as a single term).
> It is being generously donated to the Solr project by CareerBuilder, with the 
> original source code and a quickly demo-able version located here: 
> [https://github.com/careerbuilder/statistical-phrase-identifier]
> h2. *Purpose:*
> Assume you're building a job search engine, and one of your users searches 
> for the following:
>  _machine learning research and development Portland, OR software engineer 
> AND hadoop, java_
> Most search engines will natively parse this query into the following boolean 
> representation:
>  _(machine AND learning AND research AND development AND Portland) OR 
> (software AND engineer AND hadoop AND java)_
> While this query may still yield relevant results, it is clear that the 
> intent of the user wasn't understood very well at all. By leveraging the 
> Statistical Phrase Identifier on this string prior to query parsing, you can 
> instead expect the following parsing:
> _{machine learning} \{and} \{research and development} \{Portland, OR} 
> \{software engineer} \{AND} \{hadoop,} \{java}_
> It is then possible to modify all the multi-word phrases prior to executing 
> the search:
>  _"machine learning" and "research and development" "Portland, OR" "software 
> engineer" AND hadoop, java_
> Of course, you could do your own query parsing to specifically handle the 
> boolean syntax, but the following would eventually be interpreted correctly 
> by Apache Solr and most other search engines:
>  _"machine learning" AND "research and development" AND "Portland, OR" AND 
> "software engineer" AND hadoop AND java_ 
> h2. *History:*
> This project was originally implemented by the search team at CareerBuilder 
> in the summer of 2015 for use as part of their semantic search system. In the 
> summer of 2016, Akash Mehta implemented a much simpler version as a proof of 
> concept based upon publicly available information about the CareerBuilder 
> implementation (the first attached patch). In July of 2018, CareerBuilder 
> open sourced their original version 
> ([https://github.com/careerbuilder/statistical-phrase-identifier]), and 
> agreed to also donate the code to the Apache Software Foundation as a Solr 
> contribution. A Solr patch with the CareerBuilder version was added to this 
> issue on September 5th, 2018, and community feedback and contributions are 
> encouraged.
> This issue was originally titled the "Probabilistic Query Parser", but the 
> name has now been updated to "Statistical Phrase Identifier" to avoid 
> ambiguity with Solr's query parsers (per some of the feedback on this issue), 
> as the implementation is actually just a mechanism for identifying phrases 
> statistically from a string and is NOT a Solr query parser. 
> h2. *Example usage:*
> h3. (See contrib readme or configuration files in the patch for full 
> configuration details)
> h3. *{{Request:}}*
> {code:java}
> http://localhost:8983/solr/spi/parse?q=darth vader obi wan kenobi anakin 
> skywalker toad x men magneto professor xavier{code}
> h3. *{{Response:}}* 
> {

[jira] [Updated] (SOLR-9418) Statistical Phrase Identifier

2018-09-05 Thread Trey Grainger (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trey Grainger updated SOLR-9418:

Description: 
h2. *Summary:*

The Statistical Phrase Identifier is a Solr contribution that takes in a string 
of text and then leverages a language model (an Apache Lucene/Solr inverted 
index) to predict how the input text should be divided into phrases. The 
intended purpose of this tool is to parse short-text queries into phrases prior 
to executing a keyword search (as opposed to parsing out each keyword as a 
single term).

It is being generously donated to the Solr project by CareerBuilder, with the 
original source code and a quickly demo-able version located here: 
[https://github.com/careerbuilder/statistical-phrase-identifier]
h2. *Purpose:*

Assume you're building a job search engine, and one of your users searches for 
the following:
 _machine learning research and development Portland, OR software engineer AND 
hadoop, java_

Most search engines will natively parse this query into the following boolean 
representation:
 _(machine AND learning AND research AND development AND Portland) OR (software 
AND engineer AND hadoop AND java)_

While this query may still yield relevant results, it is clear that the intent 
of the user wasn't understood very well at all. By leveraging the Statistical 
Phrase Identifier on this string prior to query parsing, you can instead expect 
the following parsing:

_{machine learning} \{and} \{research and development} \{Portland, OR} 
\{software engineer} \{AND} \{hadoop,} \{java}_

It is then possible to modify all the multi-word phrases prior to executing the 
search:
 _"machine learning" and "research and development" "Portland, OR" "software 
engineer" AND hadoop, java_

Of course, you could do your own query parsing to specifically handle the 
boolean syntax, but the following would eventually be interpreted correctly by 
Apache Solr and most other search engines:
 _"machine learning" AND "research and development" AND "Portland, OR" AND 
"software engineer" AND hadoop AND java_ 
h2. *History:*

This project was originally implemented by the search team at CareerBuilder in 
the summer of 2015 for use as part of their semantic search system. In the 
summer of 2016, Akash Mehta implemented a much simpler version as a proof of 
concept based upon publicly available information about the CareerBuilder 
implementation (the first attached patch). In July of 2018, CareerBuilder open 
sourced their original version 
([https://github.com/careerbuilder/statistical-phrase-identifier]), and agreed 
to also donate the code to the Apache Software Foundation as a Solr 
contribution. A Solr patch with the CareerBuilder version was added to this 
issue on September 5th, 2018, and community feedback and contributions are 
encouraged.

This issue was originally titled the "Probabilistic Query Parser", but the name 
has now been updated to "Statistical Phrase Identifier" to avoid ambiguity with 
Solr's query parsers (per some of the feedback on this issue), as the 
implementation is actually just a mechanism for identifying phrases 
statistically from a string and is NOT a Solr query parser. 
h2. *Example usage:*
h3. (See contrib readme or configuration files in the patch for full 
configuration details)
h3. *{{Request:}}*
{code:java}
http://localhost:8983/solr/spi/parse?q=darth vader obi wan kenobi anakin 
skywalker toad x men magneto professor xavier{code}
h3. *{{Response:}}* 
{code:java}
{
  "responseHeader":{
    "status":0,
    "QTime":25},
    "top_parsed_query":"{darth vader} {obi wan kenobi} {anakin skywalker} 
{toad} {x men} {magneto} {professor xavier}",
    "top_parsed_phrases":[
      "darth vader",
      "obi wan kenobi",
      "anakin skywalker",
      "toad",
      "x-men",
      "magneto",
      "professor xavier"],
      "potential_parsings":[{
      "parsed_phrases":["darth vader",
      "obi wan kenobi",
      "anakin skywalker",
      "toad",
      "x-men",
      "magneto",
      "professor xavier"],
      "parsed_query":"{darth vader} {obi wan kenobi} {anakin skywalker} {toad} 
{x-men} {magneto} {professor xavier}",
    "score":0.0}]}{code}
 

 

  was:
h2. *Summary:*

The Statistical Phrase Identifier is a Solr contribution that takes in a string 
of text and then leverages a language model (an Apache Lucene/Solr inverted 
index) to predict how the inputted text should be divided into 

[jira] [Updated] (SOLR-9418) Statistical Phrase Identifier

2018-09-05 Thread Trey Grainger (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trey Grainger updated SOLR-9418:

Attachment: SOLR-9418.patch

> Statistical Phrase Identifier
> -
>
> Key: SOLR-9418
> URL: https://issues.apache.org/jira/browse/SOLR-9418
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Akash Mehta
>Priority: Major
> Attachments: SOLR-9418.patch, SOLR-9418.zip
>
>
> The Statistical Phrase Identifier is a Solr contribution that takes in a 
> string of text and then leverages a language model (an Apache Lucene/Solr 
> inverted index) to predict how the input text should be divided into 
> phrases. The intended purpose of this tool is to parse short-text queries 
> into phrases prior to executing a keyword search (as opposed to parsing out 
> each keyword as a single term).
> History
> This project was originally implemented at CareerBuilder in the summer of 
> 2015 for use as part of their semantic search system. In 2018
>  
> The main aim of this requestHandler is to get the best parsing for a given 
> query. This basically means recognizing different phrases within the query. 
> We need some kind of training data to generate these phrases. The way this 
> project works is:
>  1.) Generate all possible parsings for the given query.
>  2.) For each possible parsing, a naive-Bayes-like score is calculated.
>  3.) The main scoring is done by going through all the documents in the 
> training set and finding the probability of a group of words occurring 
> together as a phrase, as compared to them occurring randomly in the same 
> document. The score is then normalized. Higher importance is given to the 
> title field than to the content field, which is configurable.
>  4.) Finally, after scoring each of the possible parsings, the one with the 
> highest score is returned.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Updated] (SOLR-9418) Statistical Phrase Identifier

2018-09-05 Thread Trey Grainger (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trey Grainger updated SOLR-9418:

Description: 
The Statistical Phrase Identifier is a Solr contribution that takes in a string 
of text and then leverages a language model (an Apache Lucene/Solr inverted 
index) to predict how the input text should be divided into phrases. The 
intended purpose of this tool is to parse short-text queries into phrases prior 
to executing a keyword search (as opposed to parsing out each keyword as a 
single term).

History

This project was originally implemented at CareerBuilder in the summer of 2015 
for use as part of their semantic search system. In 2018

 

The main aim of this requestHandler is to get the best parsing for a given 
query, which basically means recognizing the distinct phrases within the query. 
Some kind of training data is needed to generate these phrases. The way this 
project works is:
 1.) Generate all possible parsings for the given query.
 2.) For each possible parsing, a naive-Bayes-like score is calculated.
 3.) The main scoring is done by going through all the documents in the 
training set and finding the probability of a group of words occurring together 
as a phrase, as compared to them occurring randomly in the same document. The 
score is then normalized. Higher importance is given to the title field than to 
the content field, which is configurable.
 4.) Finally, after scoring each of the possible parsings, the one with the 
highest score is returned.
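The four steps above can be sketched in Python. This is a minimal sketch, not the contributed implementation: `phrase_df` and `cooccur_df` are assumed callbacks standing in for the statistics the Lucene/Solr language-model index would provide (how many documents contain the words adjacently as a phrase vs. anywhere in the same document), and the configurable title-field boost is omitted for brevity:

```python
from itertools import product

def segmentations(tokens):
    # Step 1: yield every way to split the token list into contiguous phrases;
    # each of the len(tokens)-1 gaps between tokens is either a break or not.
    for cuts in product([False, True], repeat=len(tokens) - 1):
        parsing, start = [], 0
        for i, cut in enumerate(cuts, start=1):
            if cut:
                parsing.append(tokens[start:i])
                start = i
        parsing.append(tokens[start:])
        yield parsing

def score_parsing(parsing, phrase_df, cooccur_df, n_docs):
    # Steps 2-3: a naive-Bayes-like score. Each phrase contributes the ratio
    # of documents where its words appear adjacently (as a phrase) to
    # documents where they merely co-occur, with crude 1/n_docs smoothing
    # for word runs never seen together; phrase scores multiply.
    score = 1.0
    for phrase in parsing:
        together = cooccur_df(phrase)
        score *= phrase_df(phrase) / together if together else 1.0 / n_docs
    return score

def best_parsing(tokens, phrase_df, cooccur_df, n_docs):
    # Step 4: return the highest-scoring parsing.
    return max(segmentations(tokens),
               key=lambda p: score_parsing(p, phrase_df, cooccur_df, n_docs))
```

Under these assumptions, a query like `["darth", "vader"]` would be kept as a single phrase whenever those two words appear adjacently in the index far more often than they appear apart.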

  was:
The main aim of this requestHandler is to get the best parsing for a given 
query, which basically means recognizing the distinct phrases within the query. 
Some kind of training data is needed to generate these phrases. The way this 
project works is:
1.) Generate all possible parsings for the given query.
2.) For each possible parsing, a naive-Bayes-like score is calculated.
3.) The main scoring is done by going through all the documents in the training 
set and finding the probability of a group of words occurring together as a 
phrase, as compared to them occurring randomly in the same document. The score 
is then normalized. Higher importance is given to the title field than to the 
content field, which is configurable.
4.) Finally, after scoring each of the possible parsings, the one with the 
highest score is returned.


> Statistical Phrase Identifier
> -
>
> Key: SOLR-9418
> URL: https://issues.apache.org/jira/browse/SOLR-9418
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Akash Mehta
>Priority: Major
> Attachments: SOLR-9418.zip
>
>
> The Statistical Phrase Identifier is a Solr contribution that takes in a 
> string of text and then leverages a language model (an Apache Lucene/Solr 
> inverted index) to predict how the input text should be divided into 
> phrases. The intended purpose of this tool is to parse short-text queries 
> into phrases prior to executing a keyword search (as opposed to parsing out 
> each keyword as a single term).
> History
> This project was originally implemented at CareerBuilder in the summer of 
> 2015 for use as part of their semantic search system. In 2018
>  
> The main aim of this requestHandler is to get the best parsing for a given 
> query. This basically means recognizing different phrases within the query. 
> We need some kind of training data to generate these phrases. The way this 
> project works is:
>  1.) Generate all possible parsings for the given query.
>  2.) For each possible parsing, a naive-Bayes-like score is calculated.
>  3.) The main scoring is done by going through all the documents in the 
> training set and finding the probability of a group of words occurring 
> together as a phrase, as compared to them occurring randomly in the same 
> document. The score is then normalized. Higher importance is given to the 
> title field than to the content field, which is configurable.
>  4.) Finally, after scoring each of the possible parsings, the one with the 
> highest score is returned.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9418) Statistical Phrase Identifier

2018-09-05 Thread Trey Grainger (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trey Grainger updated SOLR-9418:

Summary: Statistical Phrase Identifier  (was: Probabilistic-Query-Parser 
RequestHandler)

> Statistical Phrase Identifier
> -
>
> Key: SOLR-9418
> URL: https://issues.apache.org/jira/browse/SOLR-9418
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Akash Mehta
>Priority: Major
> Attachments: SOLR-9418.zip
>
>
> The main aim of this requestHandler is to get the best parsing for a given 
> query. This basically means recognizing different phrases within the query. 
> We need some kind of training data to generate these phrases. The way this 
> project works is:
> 1.) Generate all possible parsings for the given query
> 2.) For each possible parsing, a naive-Bayes-like score is calculated.
> 3.) The main scoring is done by going through all the documents in the 
> training set and finding the probability of a group of words occurring together 
> as a phrase, as compared to them occurring independently in the same document. Then 
> the score is normalized. Higher importance is given to the title field 
> compared to the content field, which is configurable.
> 4.) Finally, after scoring each of the possible parsings, the one with the 
> highest score is returned.






Re: Multiple Query-Time Analyzers in Solr

2017-11-23 Thread Trey Grainger
Doug - see https://issues.apache.org/jira/browse/SOLR-6492.

I implemented something previously that accomplishes the stated goal (it's
part of Chapter 14 of *Solr in Action* <http://solrinaction.com>).
Specifically, it is a text field that allows you to dynamically change the
analyzer(s) at index time (on a per-document basis) or at query time (on a
per-term basis) while using the same actual field in the index.

One interesting note - you can actually choose *multiple* analyzers per
field for the same document or query (you're not restricted to one, as in
your proposed example). For example, if you wanted to index or query text
in multiple languages at the same time on the same text, you could specify
the analyzer for each language and it would run your text (independently)
through them all prior to indexing or as part of the query construction.

The syntax isn't elegant (it feels a bit ugly since you can switch analyzers
per-term - but therein also lies tremendous flexibility), but it works. It
currently requires you to pass in the analyzers you want to use either in
the content of your field (index-time) or as part of your query, which means
no schema changes are necessary other than using a special field type for
the dynamic analyzer behavior. Something like the schema changes you
proposed would make it easier to use in most cases, though.

I've unfortunately done an awful job of keeping the JIRA moving along
toward getting it committed (busy schedule), but it's something you can
take a look at. I would be happy to collaborate with you if you're thinking
about doing work in this area.
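The per-segment analyzer-switching idea can be sketched with a toy example. Everything here is a hypothetical stand-in — the `{analyzer}` marker syntax, the registry, and the token functions are illustrations only; the real implementation in SOLR-6492 wires this to Lucene Analyzers.

```python
import re

# Hypothetical registry mapping analyzer names to token functions; the
# SOLR-6492 patch uses actual Lucene Analyzers instead of lambdas.
ANALYZERS = {
    "en_text": lambda text: text.lower().split(),   # lowercase + whitespace split
    "keyword": lambda text: [text.strip()],         # whole value as one token
}

def analyze_multi(field_value):
    """Run each {analyzer}-prefixed segment through its named analyzer and
    merge the token streams, mimicking per-segment analyzer switching."""
    tokens = []
    # Segments look like: {en_text}quick brown fox{keyword}Quick Brown Fox
    for name, segment in re.findall(r"\{(\w+)\}([^{]*)", field_value):
        tokens.extend(ANALYZERS[name](segment))
    return tokens
```

For instance, `analyze_multi("{en_text}Quick Fox{keyword}Quick Fox")` analyzes the same text twice with different analyzers, producing both the lowercased terms and the verbatim keyword token in one stream.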

All the best,

Trey Grainger
Co-Author, *Solr in Action*
SVP of Engineering @ Lucidworks

On Thu, Nov 23, 2017 at 11:03 AM, Doug Turnbull <
dturnb...@opensourceconnections.com> wrote:

> An alternate solution could be to create a fieldType that was a
> "FacadeTextField" that searches a real TextField field with a different
> query time analyzer. IE it would not have a physical representation in the
> index, but just provide a handle to a "field" that is searched with a
> different query time analyzer.
>
> For example, actor_nosyn is really a facade for searching "actor" with a
> different analyzer
>
> [two XML snippets - the field definitions and the corresponding fieldType
> declarations - were stripped by the mail archiver]
>
> This would allow edismax and other query parsers to remain unchanged when
> searching, i.e.:
>
> q=action movies&qf=actor actor_nosyn title text&defType=edismax
>
>
>
> On Thu, Nov 23, 2017 at 10:50 AM Doug Turnbull <dturnbull@
> opensourceconnections.com> wrote:
>
>> I wonder if there's been any thought by the community to refactoring
>> fieldTypes to allow multiple query-time analyzers per indexed field?
>> Currently, to get different query-time analysis behavior you have to
>> duplicate a field. This is unfortunate duplication if, for example, I want
>> to search a field with query time synonyms on/off. For higher scale search
>> cases, allowing multiple query time analyzers against a single index field
>> can be invaluable. It's one reason I created the Match Query Parser (
>> https://github.com/o19s/match-query-parser) and a major feature of
>> hon-lucene-synonyms (https://github.com/healthonnet/hon-lucene-synonyms )
>>
>> What I would propose is the ability to place multiple analyzers under a
>> field type. For example:
>> [fieldType XML example with multiple query-time analyzers stripped by the
>> mail archiver]
>> 
>>
>> Notice how one query-time analyzer is "default" (and including only one
>> would make it the default)
>>
>> This would require allowing query parsers pass the analyzer to use at
>> query time. I would propose introduce a syntax for configuring query
>> behavior per-field in edismax. Omitting this would continue to use the
>> default behavior/analyzer.
>>
>> For example, one could query title and text as usual:
>>
>> q=action movies&qf=actor title text&defType=edismax
>>
>> I would propose introducing a syntax whereby qf could refer to a kind of
>> pseudo field, configurable with a syntax similar to per-field facet settings
>>
>> For example, below "actor_nosyn" and "actor_syn" actually search the same
>> physical field, but are configured with different analyzers
>>
>> q=action movies&qf=actor_syn actor_nosyn^10 title text&defType=edismax&
>> f.actor_nosyn.field=actor&f.actor_nosyn.analyzer=without_synonyms&
>> f.actor_syn.field=actor&f.actor_syn.analyzer=with_synonyms
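Doug's proposed per-pseudo-field parameters (a hypothetical syntax that was never implemented in Solr; the parameter names here follow his sketch) could be assembled client-side like:

```python
from urllib.parse import urlencode

# Hypothetical parameters from the proposal above: actor_syn / actor_nosyn
# are pseudo-fields over the single physical "actor" field, each bound to a
# different query-time analyzer.
params = {
    "q": "action movies",
    "defType": "edismax",
    "qf": "actor_syn actor_nosyn^10 title text",
    "f.actor_nosyn.field": "actor",
    "f.actor_nosyn.analyzer": "without_synonyms",
    "f.actor_syn.field": "actor",
    "f.actor_syn.analyzer": "with_synonyms",
}
query_string = urlencode(params)  # URL-encodes spaces and the ^ boost
```

Building the request programmatically avoids hand-escaping the boost (`^`) and space characters that the mailing-list examples show in raw form.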
>>
>> Indeed, I would propose extending this syntax to control some of the
>> query-specific properties that currently are tied to the fieldType, such as
>>
>> q=action movies=actor_syn actor_nosyn^10 title
&g

[jira] [Updated] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON

2017-07-21 Thread Trey Grainger (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trey Grainger updated SOLR-10494:
-
Attachment: SOLR-10494.branch_7x.patch

Here's the most up-to-date patch against branch_7x.

> Switch Solr's Default Response Type from XML to JSON
> 
>
> Key: SOLR-10494
> URL: https://issues.apache.org/jira/browse/SOLR-10494
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Response Writers
>Affects Versions: 7.0
>    Reporter: Trey Grainger
>Priority: Blocker
> Fix For: 7.0
>
> Attachments: SOLR-10494, SOLR-10494, SOLR-10494.branch_7x.patch, 
> SOLR-10494.patch, SOLR-10494-withdocs.patch, SOLR-10494-withdocs.patch
>
>
> Solr's default response format is still XML, despite the fact that Solr has 
> supported the JSON response format for over a decade, developer mindshare has 
> clearly shifted toward JSON over the years, and most modern/competing systems 
> also use JSON format now by default.
> In fact, Solr's admin UI even explicitly adds wt=json to the request (by 
> default in the UI) to override the default of wt=xml, so Solr's Admin UI 
> effectively has a different default than the API.
> We have now introduced things like the JSON faceting API, and the new more 
> modern /V2 apis assume JSON for the areas of Solr they cover, so clearly 
> we're moving in the direction of JSON anyway.
> I'd like to propose that we switch the default response writer to JSON (wt=json) 
> instead of XML for Solr 7.0, as this seems to me like the right direction and 
> a good time to make this change with the next major version.
> Based upon feedback from the Lucene Dev's mailing list, we want to:
> 1) Change the default response writer type to "wt=json" and also change to 
> "indent=on" by default
> 2) Make no changes on the update handler side; it already works as desired 
> (it returns the response in the same content-type as the request unless the 
> "wt" is passed in explicitly).
> 3) Keep the /query request handler around since people have already used it 
> for years to do JSON queries
> 4) Add a commented-out "wt=xml" to the solrconfig.xml as a reminder for folks 
> on how to change (back) the response format.
> The default format change, plus the addition of "indent=on" are back compat 
> changes, so we need to make sure we doc those clearly in the CHANGES.txt. 
> There will also need to be significant adjustments to the Solr Ref Guide, 
> Tutorial, etc.
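As a quick illustration of the proposed default: with the 7.0 change, omitting "wt" yields indented JSON, and an explicit wt=xml restores the pre-7.0 behavior. The host, port, and core name below are assumptions for a local test instance.

```python
from urllib.parse import urlencode

# Assumed local Solr instance and core name ("techproducts").
base = "http://localhost:8983/solr/techproducts/select?"

# No "wt" parameter: the 7.0 defaults (wt=json, indent=on) apply.
json_url = base + urlencode({"q": "*:*"})

# Explicitly opting back out to the old XML default.
xml_url = base + urlencode({"q": "*:*", "wt": "xml"})
```

The same opt-out can also be made server-wide by uncommenting the `wt=xml` default that this change adds (commented out) to solrconfig.xml.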






[jira] [Comment Edited] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON

2017-07-21 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16096408#comment-16096408
 ] 

Trey Grainger edited comment on SOLR-10494 at 7/21/17 3:57 PM:
---

Hi [~janhoy].

I picked it up a few times, but was developing against master and kept running 
into stability issues with other tests every time I pulled. I finally switched 
over to just developing on the 7.x branch instead to prevent those stability 
issues. I have an updated patch which fixes some (now) merge conflicts with the 
default configset changes, and all tests appear to be passing except the 
TestHierarchicalDocBuilder.testThreeLevelHierarchy one. I still haven't been 
able to dig deep enough to understand what is affecting that one.

I DO know that the issue is related to indentation. If I go into the test and 
override it to "indent=off" then it succeeds, but I have no idea why indentation 
being on is causing the failure. Also, doing that in the test is probably just 
masking another underlying problem, which may not even be test related, so I 
really need to understand exactly where things are breaking down to know if 
it's a test problem or an actual functionality problem somewhere.

At any rate, I'll post my updated patch here shortly. I'm a little tight on 
time this next week, so hopefully I can enlist someone else to assist on my end 
later today, as well.


was (Author: solrtrey):
Hi [~janhoy].

I picked it up a few times, but was developing against master and kept running 
into stability issues with other tests every time I pulled. I finally switched 
over to just developing on the 7.x branch instead to prevent those issues. I 
have an updated patch which fixes some (now) merge conflicts with the default 
configset changes, and all tests appear to be passing except the 
TestHierarchicalDocBuilder.testThreeLevelHierarchy one. I still haven't been 
able to dig deep enough to understand what is affecting that one.

I DO know that the issue is related to indentation. If I go into the test and 
override it to "indent=off" then it succeeds, but I have no idea why indentation 
being on is causing the failure. Also, doing that in the test is probably just 
masking another underlying problem, which may not even be test related, so I 
really need to understand exactly where things are breaking down to know if 
it's a test problem or an actual functionality problem somewhere.

At any rate, I'll post my updated patch here shortly. I'm a little tight on 
time this next week, so hopefully I can enlist someone else to assist on my end 
later today, as well.

> Switch Solr's Default Response Type from XML to JSON
> 
>
> Key: SOLR-10494
> URL: https://issues.apache.org/jira/browse/SOLR-10494
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Response Writers
>    Affects Versions: 7.0
>Reporter: Trey Grainger
>Priority: Blocker
> Fix For: 7.0
>
> Attachments: SOLR-10494, SOLR-10494, SOLR-10494.patch, 
> SOLR-10494-withdocs.patch, SOLR-10494-withdocs.patch
>
>
> Solr's default response format is still XML, despite the fact that Solr has 
> supported the JSON response format for over a decade, developer mindshare has 
> clearly shifted toward JSON over the years, and most modern/competing systems 
> also use JSON format now by default.
> In fact, Solr's admin UI even explicitly adds wt=json to the request (by 
> default in the UI) to override the default of wt=xml, so Solr's Admin UI 
> effectively has a different default than the API.
> We have now introduced things like the JSON faceting API, and the new more 
> modern /V2 apis assume JSON for the areas of Solr they cover, so clearly 
> we're moving in the direction of JSON anyway.
> I'd like to propose that we switch the default response writer to JSON (wt=json) 
> instead of XML for Solr 7.0, as this seems to me like the right direction and 
> a good time to make this change with the next major version.
> Based upon feedback from the Lucene Dev's mailing list, we want to:
> 1) Change the default response writer type to "wt=json" and also change to 
> "indent=on" by default
> 2) Make no changes on the update handler side; it already works as desired 
> (it returns the response in the same content-type as the request unless the 
> "wt" is passed in explicitly).
> 3) Keep the /query request handler around since people have already used it 
> for years to do JSON queries
> 4) Add a commented-out "wt=xml" to the solrconfig.xml as a reminder for folks 
> on how to chang

[jira] [Commented] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON

2017-07-21 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16096408#comment-16096408
 ] 

Trey Grainger commented on SOLR-10494:
--

Hi [~janhoy].

I picked it up a few times, but was developing against master and kept running 
into stability issues with other tests every time I pulled. I finally switched 
over to just developing on the 7.x branch instead to prevent those issues. I 
have an updated patch which fixes some (now) merge conflicts with the default 
configset changes, and all tests appear to be passing except the 
TestHierarchicalDocBuilder.testThreeLevelHierarchy one. I still haven't been 
able to dig deep enough to understand what is affecting that one.

I DO know that the issue is related to indentation. If I go into the test and 
override it to "indent=off" then it succeeds, but I have no idea why indentation 
being on is causing the failure. Also, doing that in the test is probably just 
masking another underlying problem, which may not even be test related, so I 
really need to understand exactly where things are breaking down to know if 
it's a test problem or an actual functionality problem somewhere.

At any rate, I'll post my updated patch here shortly. I'm a little tight on 
time this next week, so hopefully I can enlist someone else to assist on my end 
later today, as well.

> Switch Solr's Default Response Type from XML to JSON
> 
>
> Key: SOLR-10494
> URL: https://issues.apache.org/jira/browse/SOLR-10494
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Response Writers
>Affects Versions: 7.0
>Reporter: Trey Grainger
>Priority: Blocker
> Fix For: 7.0
>
> Attachments: SOLR-10494, SOLR-10494, SOLR-10494.patch, 
> SOLR-10494-withdocs.patch, SOLR-10494-withdocs.patch
>
>
> Solr's default response format is still XML, despite the fact that Solr has 
> supported the JSON response format for over a decade, developer mindshare has 
> clearly shifted toward JSON over the years, and most modern/competing systems 
> also use JSON format now by default.
> In fact, Solr's admin UI even explicitly adds wt=json to the request (by 
> default in the UI) to override the default of wt=xml, so Solr's Admin UI 
> effectively has a different default than the API.
> We have now introduced things like the JSON faceting API, and the new more 
> modern /V2 apis assume JSON for the areas of Solr they cover, so clearly 
> we're moving in the direction of JSON anyway.
> I'd like to propose that we switch the default response writer to JSON (wt=json) 
> instead of XML for Solr 7.0, as this seems to me like the right direction and 
> a good time to make this change with the next major version.
> Based upon feedback from the Lucene Dev's mailing list, we want to:
> 1) Change the default response writer type to "wt=json" and also change to 
> "indent=on" by default
> 2) Make no changes on the update handler side; it already works as desired 
> (it returns the response in the same content-type as the request unless the 
> "wt" is passed in explicitly).
> 3) Keep the /query request handler around since people have already used it 
> for years to do JSON queries
> 4) Add a commented-out "wt=xml" to the solrconfig.xml as a reminder for folks 
> on how to change (back) the response format.
> The default format change, plus the addition of "indent=on" are back compat 
> changes, so we need to make sure we doc those clearly in the CHANGES.txt. 
> There will also need to be significant adjustments to the Solr Ref Guide, 
> Tutorial, etc.






[jira] [Comment Edited] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON

2017-06-26 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16064287#comment-16064287
 ] 

Trey Grainger edited comment on SOLR-10494 at 6/27/17 5:23 AM:
---

Ok, I think I'm nearly done. This patch ([^SOLR-10494.patch]) includes removing 
all the extraneous "wt=json" and "indent=on" references, adding a commented out 
version of "wt=xml" to the example solrconfig.xml's, unit test updates, some 
additional updates to the tutorials and docs (also incorporating 
[~ctargett]'s), and updating the admin UI (query section) to handle the new 
defaults.

The only issue I'm running into is that for some reason I haven't figured out 
yet, turning "indent" on has broken some of the parent/child relationship tests 
(i.e. TestHierarchicalDocBuilder.testThreeLevelHierarchy, 
SolrExampleTests.testChildDocTransformer). It initially appears to be some xml 
parsing issue with the extra whitespace, which would be odd, but I 
haven't dug in yet.  Once I figure those out, I'll update the patch, and then I 
think this will be ready for review.


was (Author: solrtrey):
Ok, I think I'm nearly done. This patch includes removing all the extraneous 
"wt=json" and "indent=on" references, adding a commented out version of 
"wt=xml" to the example solrconfig.xml's, unit test updates, some additional 
updates to the tutorials and docs (also incorporating [~ctargett]'s), and 
updating the admin UI (query section) to handle the new defaults.

The only issue I'm running into is that for some reason I haven't figured out 
yet, turning "indent" on has broken some of the parent/child relationship tests 
(i.e. TestHierarchicalDocBuilder.testThreeLevelHierarchy, 
SolrExampleTests.testChildDocTransformer). It initially appears to be some xml 
parsing issue with the extra whitespace, which would be odd, but I 
haven't dug in yet.  Once I figure those out, I'll update the patch, and then I 
think this will be ready for review.

> Switch Solr's Default Response Type from XML to JSON
> 
>
> Key: SOLR-10494
> URL: https://issues.apache.org/jira/browse/SOLR-10494
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (7.0)
>Reporter: Trey Grainger
>Priority: Blocker
> Fix For: master (7.0)
>
> Attachments: SOLR-10494, SOLR-10494, SOLR-10494.patch, 
> SOLR-10494-withdocs.patch, SOLR-10494-withdocs.patch
>
>
> Solr's default response format is still XML, despite the fact that Solr has 
> supported the JSON response format for over a decade, developer mindshare has 
> clearly shifted toward JSON over the years, and most modern/competing systems 
> also use JSON format now by default.
> In fact, Solr's admin UI even explicitly adds wt=json to the request (by 
> default in the UI) to override the default of wt=xml, so Solr's Admin UI 
> effectively has a different default than the API.
> We have now introduced things like the JSON faceting API, and the new more 
> modern /V2 apis assume JSON for the areas of Solr they cover, so clearly 
> we're moving in the direction of JSON anyway.
> I'd like to propose that we switch the default response writer to JSON (wt=json) 
> instead of XML for Solr 7.0, as this seems to me like the right direction and 
> a good time to make this change with the next major version.
> Based upon feedback from the Lucene Dev's mailing list, we want to:
> 1) Change the default response writer type to "wt=json" and also change to 
> "indent=on" by default
> 2) Make no changes on the update handler side; it already works as desired 
> (it returns the response in the same content-type as the request unless the 
> "wt" is passed in explicitly).
> 3) Keep the /query request handler around since people have already used it 
> for years to do JSON queries
> 4) Add a commented-out "wt=xml" to the solrconfig.xml as a reminder for folks 
> on how to change (back) the response format.
> The default format change, plus the addition of "indent=on" are back compat 
> changes, so we need to make sure we doc those clearly in the CHANGES.txt. 
> There will also need to be significant adjustments to the Solr Ref Guide, 
> Tutorial, etc.






[jira] [Updated] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON

2017-06-26 Thread Trey Grainger (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trey Grainger updated SOLR-10494:
-
Attachment: SOLR-10494.patch

Ok, I think I'm nearly done. This patch includes removing all the extraneous 
"wt=json" and "indent=on" references, adding a commented out version of 
"wt=xml" to the example solrconfig.xml's, unit test updates, some additional 
updates to the tutorials and docs (also incorporating [~ctargett]'s), and 
updating the admin UI (query section) to handle the new defaults.

The only issue I'm running into is that for some reason I haven't figured out 
yet, turning "indent" on has broken some of the parent/child relationship tests 
(i.e. TestHierarchicalDocBuilder.testThreeLevelHierarchy, 
SolrExampleTests.testChildDocTransformer). It initially appears to be some xml 
parsing issue with the extra whitespace, which would be odd, but I 
haven't dug in yet.  Once I figure those out, I'll update the patch, and then I 
think this will be ready for review.

> Switch Solr's Default Response Type from XML to JSON
> 
>
> Key: SOLR-10494
> URL: https://issues.apache.org/jira/browse/SOLR-10494
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (7.0)
>Reporter: Trey Grainger
>Priority: Blocker
> Fix For: master (7.0)
>
> Attachments: SOLR-10494, SOLR-10494, SOLR-10494.patch, 
> SOLR-10494-withdocs.patch, SOLR-10494-withdocs.patch
>
>
> Solr's default response format is still XML, despite the fact that Solr has 
> supported the JSON response format for over a decade, developer mindshare has 
> clearly shifted toward JSON over the years, and most modern/competing systems 
> also use JSON format now by default.
> In fact, Solr's admin UI even explicitly adds wt=json to the request (by 
> default in the UI) to override the default of wt=xml, so Solr's Admin UI 
> effectively has a different default than the API.
> We have now introduced things like the JSON faceting API, and the new more 
> modern /V2 apis assume JSON for the areas of Solr they cover, so clearly 
> we're moving in the direction of JSON anyway.
> I'd like to propose that we switch the default response writer to JSON (wt=json) 
> instead of XML for Solr 7.0, as this seems to me like the right direction and 
> a good time to make this change with the next major version.
> Based upon feedback from the Lucene Dev's mailing list, we want to:
> 1) Change the default response writer type to "wt=json" and also change to 
> "indent=on" by default
> 2) Make no changes on the update handler side; it already works as desired 
> (it returns the response in the same content-type as the request unless the 
> "wt" is passed in explicitly).
> 3) Keep the /query request handler around since people have already used it 
> for years to do JSON queries
> 4) Add a commented-out "wt=xml" to the solrconfig.xml as a reminder for folks 
> on how to change (back) the response format.
> The default format change, plus the addition of "indent=on" are back compat 
> changes, so we need to make sure we doc those clearly in the CHANGES.txt. 
> There will also need to be significant adjustments to the Solr Ref Guide, 
> Tutorial, etc.






Re: Release planning for 7.0

2017-06-25 Thread Trey Grainger
Anshum,

I'll be working on what I hope is a final patch for SOLR-10494 (Change
default response format from xml to json) today. I expect to have it
uploaded in the late evening US time. It will still need to be reviewed and
(if acceptable) committed. It feels to me like the kind of change that
should only be made in a major release due to back-compat concerns.

If this can make it in after the branch is created, then no problem, but
otherwise it might be worth waiting another day before branching. Up to you.

-Trey

On Sat, Jun 24, 2017 at 4:52 PM, Anshum Gupta 
wrote:

> I'll create the 7x, and 7.0 branches *tomorrow*.
>
> Ishan, do you mean you would be able to close it by Tuesday? You would
> have to commit to both 7.0, and 7.x, in addition to master, but I think
> that should be ok.
>
> We also have SOLR-10803 open at this moment and we'd need to come to a
> decision on that as well in order to move forward with 7.0.
>
> P.S: If there are any objections to this plan, kindly let me know.
>
> -Anshum
>
> On Fri, Jun 23, 2017 at 5:03 AM Ishan Chattopadhyaya <
> ichattopadhy...@gmail.com> wrote:
>
>> Hi Anshum,
>>
>>
>> > I will send out an email a day before cutting the branch, as well as
>> once the branch is in place.
>> I'm right now on travel, and unable to finish SOLR-10574 until Monday
>> (possibly Tuesday).
>> Regards,
>> Ishan
>>
>> On Tue, Jun 20, 2017 at 5:08 PM, Anshum Gupta 
>> wrote:
>>
>>> From my understanding, there's not really a 'plan' but some intention to
>>> release a 6.7 at some time if enough people need it, right? In that case I
> wouldn't hold back anything for a 6x line release, and would cut the 7x and 7.0
> branches around then, but not before the coming weekend. I will send out an
>>> email a day before cutting the branch, as well as once the branch is in
>>> place.
>>>
>>> If anyone has any objections to that, do let me know.
>>>
>>> Once that happens, we'd have a feature freeze on the 7.0 branch but we
>>> can take our time to iron out the bugs.
>>>
>>> @Alan: Thanks for informing. I'll make sure that LUCENE-7877 is
>>> committed before I cut the branch. I have added the right fixVersion to the
>>> issue.
>>>
>>> -Anshum
>>>
>>>
>>>
>>> On Mon, Jun 19, 2017 at 8:33 AM Erick Erickson 
>>> wrote:
>>>
 Anshum:

 I'm one of the people that expect a 6.7 release, but it's more along
 the lines of setting expectations than having features I really want
 to get in to the 6x code line. We nearly always have "just a few
 things" that someone would like to put in, and/or a bug fix or two
 that surfaces.

 I expect people to back-port stuff they consider easy/beneficial to
 6.x for "a while" as 7.0 solidifies, at their discretion of course.
 Think of my position as giving people a target for tidying up 6.x
 rather than a concrete plan ;). Just seems to always happen.

 And if there is no 6.7, that's OK too. Additions to master-2 usually
 pretty swiftly stop as the hassle of merging any change into 3 code
 lines causes people to pick what goes into master-2 more carefully ;)

 Erick

 On Mon, Jun 19, 2017 at 8:03 AM, Alan Woodward  wrote:
 > I’d like to get https://issues.apache.org/jira/browse/LUCENE-7877 in
 for 7.0
 > - should be able to commit in the next couple of days.
 >
 > Alan Woodward
 > www.flax.co.uk
 >
 >
 > On 19 Jun 2017, at 15:45, Anshum Gupta 
 wrote:
 >
 > Hi everyone,
 >
 > Here's the update about 7.0 release:
 >
 > There are still  unresolved blockers for 7.0.
 > Solr (12):
 > https://issues.apache.org/jira/browse/SOLR-6630?jql=
 project%20%3D%20Solr%20AND%20fixVersion%20%3D%20%
 22master%20(7.0)%22%20and%20resolution%20%3D%
 20Unresolved%20and%20priority%20%3D%20Blocker
 >
 > Lucene (None):
 > https://issues.apache.org/jira/issues/?jql=project%20%
 3D%20%22Lucene%20-%20Core%22%20AND%20fixVersion%20%3D%20%
 22master%20(7.0)%22%20AND%20resolution%20%3D%
 20Unresolved%20AND%20priority%20%3D%20Blocker
 >
 > Here are the ones that are unassigned:
 > https://issues.apache.org/jira/browse/SOLR-6630
 > https://issues.apache.org/jira/browse/SOLR-10887
 > https://issues.apache.org/jira/browse/SOLR-10803
 > https://issues.apache.org/jira/browse/SOLR-10756
 > https://issues.apache.org/jira/browse/SOLR-10710
 > https://issues.apache.org/jira/browse/SOLR-9321
 > https://issues.apache.org/jira/browse/SOLR-8256
 >
 > The ones that are already assigned, I'd request you to update the
 JIRA so we
 > can track it better.
 >
 > In addition, I am about to create another one as I wasn’t able to
 extend
 > SolrClient easily without a code duplication on master.
 >
 > This brings us to - 'when can we cut the branch'. I can 

[jira] [Commented] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON

2017-06-25 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062350#comment-16062350
 ] 

Trey Grainger commented on SOLR-10494:
--

bq. Also should we mark this as a blocker for 7.0 to change it? - 
[~varunthacker]

I just updated it to be a blocker, Varun. I'm working on what should be the 
final patch today. Hopefully this can be reviewed and make it in for 7.0.

> Switch Solr's Default Response Type from XML to JSON
> 
>
> Key: SOLR-10494
> URL: https://issues.apache.org/jira/browse/SOLR-10494
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (7.0)
>    Reporter: Trey Grainger
>Priority: Blocker
> Fix For: master (7.0)
>
> Attachments: SOLR-10494, SOLR-10494, SOLR-10494-withdocs.patch, 
> SOLR-10494-withdocs.patch
>
>
> Solr's default response format is still XML, despite the fact that Solr has 
> supported the JSON response format for over a decade, developer mindshare has 
> clearly shifted toward JSON over the years, and most modern/competing systems 
> also use JSON format now by default.
> In fact, Solr's admin UI even explicitly adds wt=json to the request (by 
> default in the UI) to override the default of wt=xml, so Solr's Admin UI 
> effectively has a different default than the API.
> We have now introduced things like the JSON faceting API, and the new more 
> modern /V2 apis assume JSON for the areas of Solr they cover, so clearly 
> we're moving in the direction of JSON anyway.
> I'd like to propose that we switch the default response writer to JSON (wt=json) 
> instead of XML for Solr 7.0, as this seems to me like the right direction and 
> a good time to make this change with the next major version.
> Based upon feedback from the Lucene Dev's mailing list, we want to:
> 1) Change the default response writer type to "wt=json" and also change to 
> "indent=on" by default
> 2) Make no changes on the update handler side; it already works as desired 
> (it returns the response in the same content-type as the request unless the 
> "wt" is passed in explicitly).
> 3) Keep the /query request handler around since people have already used it 
> for years to do JSON queries
> 4) Add a commented-out "wt=xml" to the solrconfig.xml as a reminder for folks 
> on how to change (back) the response format.
> The default format change, plus the addition of "indent=on" are back compat 
> changes, so we need to make sure we doc those clearly in the CHANGES.txt. 
> There will also need to be significant adjustments to the Solr Ref Guide, 
> Tutorial, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON

2017-06-25 Thread Trey Grainger (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trey Grainger updated SOLR-10494:
-
Priority: Blocker  (was: Minor)

> Switch Solr's Default Response Type from XML to JSON
> 
>
> Key: SOLR-10494
> URL: https://issues.apache.org/jira/browse/SOLR-10494
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (7.0)
>    Reporter: Trey Grainger
>Priority: Blocker
> Fix For: master (7.0)
>
> Attachments: SOLR-10494, SOLR-10494, SOLR-10494-withdocs.patch, 
> SOLR-10494-withdocs.patch
>
>



[jira] [Commented] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON

2017-06-23 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061521#comment-16061521
 ] 

Trey Grainger commented on SOLR-10494:
--

Thanks, [~ctargett]! I'm building off your patch and making final changes. I've 
been a bit slammed this week and am unavailable to work on this for the next 
24-36 hours, but I expect to have the next (hopefully final, or close to it) 
patch pushed sometime on Sunday (in the U.S.).

> Switch Solr's Default Response Type from XML to JSON
> 
>
> Key: SOLR-10494
> URL: https://issues.apache.org/jira/browse/SOLR-10494
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (7.0)
>    Reporter: Trey Grainger
>Priority: Minor
> Fix For: master (7.0)
>
> Attachments: SOLR-10494, SOLR-10494, SOLR-10494-withdocs.patch, 
> SOLR-10494-withdocs.patch
>
>



[jira] [Commented] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON

2017-06-20 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056131#comment-16056131
 ] 

Trey Grainger commented on SOLR-10494:
--

Yes, I'll address all of the code/config changes above. I'll get the patch 
updated to include the indent=on change first (fixing unit tests now... more 
broke than I was expecting due to indentation) and then do the cleanup of the 
configs, admin UI, and READMEs as a follow-on patch.

Once those are in, I can take a look at the ref guide, website, and quickstart, 
though I'm afraid I may need some help to pull all of those off in any 
reasonable timeframe for 7.0, as I'd expect there to be a lot of changes 
required there.

> Switch Solr's Default Response Type from XML to JSON
> 
>
> Key: SOLR-10494
> URL: https://issues.apache.org/jira/browse/SOLR-10494
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (7.0)
>    Reporter: Trey Grainger
>Priority: Minor
> Fix For: master (7.0)
>
> Attachments: SOLR-10494, SOLR-10494
>
>



[jira] [Updated] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON

2017-06-11 Thread Trey Grainger (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trey Grainger updated SOLR-10494:
-
Attachment: SOLR-10494

New patch fixing a precommit error. The earlier comment about unclosed 
resources was apparently about pre-existing issues (those are warnings, not 
errors); I only noticed them because of an unrelated error, so I'm going to 
ignore them. Working on indent=on by default for the next patch.

> Switch Solr's Default Response Type from XML to JSON
> 
>
> Key: SOLR-10494
> URL: https://issues.apache.org/jira/browse/SOLR-10494
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (7.0)
>    Reporter: Trey Grainger
>Priority: Minor
> Fix For: master (7.0)
>
> Attachments: SOLR-10494, SOLR-10494
>
>



[jira] [Updated] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON

2017-06-11 Thread Trey Grainger (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trey Grainger updated SOLR-10494:
-
Attachment: SOLR-10494

Initial patch, with all unit tests broken by the change now passing. I haven't 
yet changed indent=on to be the default or removed the explicit setting of json 
in various places, though, as I've been trying to change one variable at a time 
to minimize complications.

For some reason, switching to json by default has caused ant precommit to 
complain about resource leaks in about 60 places. I'm not sure what is causing 
these at the moment, but I want to address that first before adding any 
additional changes to the patch.

> Switch Solr's Default Response Type from XML to JSON
> 
>
> Key: SOLR-10494
> URL: https://issues.apache.org/jira/browse/SOLR-10494
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (7.0)
>    Reporter: Trey Grainger
>Priority: Minor
> Fix For: master (7.0)
>
> Attachments: SOLR-10494
>
>



[jira] [Commented] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON

2017-06-08 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042701#comment-16042701
 ] 

Trey Grainger commented on SOLR-10494:
--

Question: I'm making indent=on the default. Any objections if I make indent=on 
the default for all TextResponseWriters, or do I need to limit the change to 
only the "wt=json" (now the default writer) case?

The writers impacted from what I can tell are:
GEOJSONWriter
JSONWriter
XMLWriter
SchemaXMLWriter
PHPWriter
PythonWriter
RubyWriter

It's a little complicated because most of these (geojson, php, python, ruby) 
actually inherit from the JSONWriter, so if I need to leave indent=off on 
those, I have to go in and set it explicitly on them, since their base class 
will now have indent on by default.

Unless anyone objects, I'm just going to set indent=on by default on all of 
these. Please let me know if anyone disagrees.
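For anyone following along, here is a minimal sketch of the inheritance concern described above. This is illustrative Python, not Solr's actual writer classes; the class and attribute names are assumptions:

```python
# Illustrative sketch (not Solr source): if the base JSON writer turns
# indentation on by default, every subclass inherits that default unless
# it explicitly opts out.

class JSONWriter:
    # indent now defaults to on in the base class
    default_indent = True

class GeoJSONWriter(JSONWriter):
    pass  # inherits indent=on automatically

class PHPWriter(JSONWriter):
    # to keep the old behavior, a subclass must now override explicitly
    default_indent = False

print(JSONWriter.default_indent)     # True
print(GeoJSONWriter.default_indent)  # True
print(PHPWriter.default_indent)      # False
```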

> Switch Solr's Default Response Type from XML to JSON
> 
>
> Key: SOLR-10494
> URL: https://issues.apache.org/jira/browse/SOLR-10494
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (7.0)
>Reporter: Trey Grainger
>Priority: Minor
> Fix For: master (7.0)
>
>



[jira] [Commented] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON

2017-06-08 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042630#comment-16042630
 ] 

Trey Grainger commented on SOLR-10494:
--

I started working on this two weeks ago and then got busy. The actual changes 
were super quick, but after I made them the unit tests were taking over 2 hours 
to run, with lots of failures and several test suites timing out.

Just got back to this today; I have pretty much everything diagnosed and am 
working on fixes. In short, SolrTestCaseJ4 has XPath checking hard-coded in its 
design, so I now need to pass in wt=xml explicitly there, and there are a 
handful of test suites (e.g. replication/backup/restore and hdfs) that 
explicitly check for XML strings and loop forever until they get those strings 
back (hence the timeouts).

I'm making changes to explicitly request XML for those tests that expect it, 
and will hopefully get a patch posted today.

> Switch Solr's Default Response Type from XML to JSON
> 
>
> Key: SOLR-10494
> URL: https://issues.apache.org/jira/browse/SOLR-10494
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (7.0)
>    Reporter: Trey Grainger
>Priority: Minor
> Fix For: master (7.0)
>
>



[jira] [Commented] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON

2017-05-16 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013015#comment-16013015
 ] 

Trey Grainger commented on SOLR-10494:
--

Hi [~janhoy]. Sorry - I missed your first message last week. Sure - I should be 
able to get a patch posted this weekend.

-Trey

> Switch Solr's Default Response Type from XML to JSON
> 
>
> Key: SOLR-10494
> URL: https://issues.apache.org/jira/browse/SOLR-10494
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (7.0)
>    Reporter: Trey Grainger
>Priority: Minor
> Fix For: master (7.0)
>
>



Re: Change Default Response Format (wt) to JSON in Solr 7.0?

2017-04-14 Thread Trey Grainger
Thanks for the great feedback, everyone.

Since the update handler already smart-defaults the response type based on the 
incoming content type, as Yonik described (whether or not you specify the 
Content-Type header explicitly), it seems we won't need to make any changes 
there.
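That negotiation behavior can be sketched roughly as follows. This is hypothetical Python for illustration only, not Solr's implementation; the function and mapping names are assumptions:

```python
# Illustrative sketch of the update handler's content negotiation: an
# explicit "wt" parameter wins; otherwise the response format mirrors the
# request's Content-Type; otherwise fall back to the proposed json default.

DEFAULT_WT = "json"

CONTENT_TYPE_TO_WT = {
    "application/json": "json",
    "application/xml": "xml",
    "text/xml": "xml",
}

def resolve_response_writer(params, content_type=None):
    """Return the response writer name a request would resolve to."""
    if "wt" in params:  # explicit wt always wins
        return params["wt"]
    if content_type:
        base = content_type.split(";")[0].strip().lower()
        if base in CONTENT_TYPE_TO_WT:  # mirror the request format
            return CONTENT_TYPE_TO_WT[base]
    return DEFAULT_WT  # proposed new default

print(resolve_response_writer({}, "text/xml; charset=utf-8"))  # xml
print(resolve_response_writer({"wt": "xml"}, "application/json"))  # xml
print(resolve_response_writer({}))  # json
```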

I just summarized everyone's feedback into action items and submitted a
JIRA (SOLR-10494 <https://issues.apache.org/jira/browse/SOLR-10494>) for
further tracking. If you have further comments or if I missed anything,
feel free to reply there.

Thanks,

Trey Grainger
Co-author, Solr in Action
SVP of Engineering @ Lucidworks

On Fri, Apr 14, 2017 at 11:35 PM, David Smiley <david.w.smi...@gmail.com>
wrote:

> It's a neat idea to have the response format smart-defaulted based on the
> POST content-type.  +1 to that!
>
> On Fri, Apr 14, 2017 at 11:24 PM Yonik Seeley <ysee...@gmail.com> wrote:
>
>> Just a reminder that we have had indented JSON query responses by
>> default at the "/query" endpoint for years. That doesn't cover other
>> handlers though.
>> Readability/aesthetics of our docs/examples is where the biggest
>> deficiency lies - lots of XML examples that could have been JSON for a
>> long time now.  Hopefully this change would prevent new docs from
>> being written that use XML output format.
>>
>> Other thoughts:
>> - The /query endpoint should remain, no need to break everyone who has
>> been using it
>> - I assume sending XML to the existing update handler should perhaps
>> continue to return an XML response?
>> - I assume that it's desirable to have indentation by default... but
>> this is also a slight back compat change/break for people that
>> currently specify JSON and expect it un-indented (for some response
>> types, the difference could be large, like 2x).  If we go this way, we
>> need to add that to the CHANGES as well.
>>
>> -Yonik
>>
>>
>> On Fri, Apr 14, 2017 at 2:53 PM, Trey Grainger <solrt...@gmail.com>
>> wrote:
>> > Just wanted to throw this out there for discussion. Solr's default query
>> > response format is still XML, despite the fact that Solr has supported the
>> > JSON response format for over a decade, developer mindshare has clearly
>> > shifted toward JSON over the years, and most modern/competing systems also
>> > use JSON format now by default.
>> >
>> > In fact, Solr's admin UI even explicitly adds wt=json to the request (by
>> > default in the UI) to override the default of wt=xml, so Solr's Admin UI
>> > effectively has a different default than the API.
>> >
>> > We have now introduced things like the JSON faceting API, and the new more
>> > modern /V2 apis assume JSON for the areas of Solr they cover, so clearly
>> > we're moving in the direction of JSON anyway.
>> >
>> > I'd like to propose that we switch the default response writer to JSON
>> > (wt=json) instead of XML for Solr 7.0, as this seems to me like the right
>> > direction and a good time to make this change with the next major version.
>> >
>> > Before I create a JIRA and submit a patch, though, I wanted to check here
>> > to make sure there were no strong objections to changing the default.
>> >
>> > -Trey Grainger
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>> --
> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> http://www.solrenterprisesearchserver.com
>


[jira] [Created] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON

2017-04-14 Thread Trey Grainger (JIRA)
Trey Grainger created SOLR-10494:


 Summary: Switch Solr's Default Response Type from XML to JSON
 Key: SOLR-10494
 URL: https://issues.apache.org/jira/browse/SOLR-10494
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Affects Versions: master (7.0)
Reporter: Trey Grainger
Priority: Minor
 Fix For: master (7.0)


Solr's default response format is still XML, despite the fact that Solr has 
supported the JSON response format for over a decade, developer mindshare has 
clearly shifted toward JSON over the years, and most modern/competing systems 
also use JSON format now by default.

In fact, Solr's admin UI even explicitly adds wt=json to the request (by 
default in the UI) to override the default of wt=xml, so Solr's Admin UI 
effectively has a different default than the API.

We have now introduced things like the JSON faceting API, and the new more 
modern /V2 apis assume JSON for the areas of Solr they cover, so clearly we're 
moving in the direction of JSON anyway.

I'd like to propose that we switch the default response writer to JSON (wt=json) 
instead of XML for Solr 7.0, as this seems to me like the right direction and a 
good time to make this change with the next major version.

Based upon feedback from the Lucene Dev's mailing list, we want to:
1) Change the default response writer type to "wt=json" and also change to 
"indent=on" by default
2) Make no changes on the update handler side; it already works as desired (it 
returns the response in the same content-type as the request unless the "wt" is 
passed in explicitly).
3) Keep the /query request handler around since people have already used it for 
years to do JSON queries
4) Add a commented-out "wt=xml" to the solrconfig.xml as a reminder for folks 
on how to change (back) the response format.

The default format change, plus the addition of "indent=on" are back compat 
changes, so we need to make sure we doc those clearly in the CHANGES.txt. There 
will also need to be significant adjustments to the Solr Ref Guide, Tutorial, 
etc.
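For anyone who wants to keep the old behavior after such a switch, the default can be pinned per request handler in solrconfig.xml. The snippet below is only a sketch of that idea (not the committed change), using the standard /select handler:

```xml
<!-- Sketch: pin the response format for /select back to XML -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="wt">xml</str>
    <str name="indent">off</str>
  </lst>
</requestHandler>
```

Per-request parameters (e.g. appending wt=json to the URL) still take precedence over these handler defaults.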



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Change Default Response Format (wt) to JSON in Solr 7.0?

2017-04-14 Thread Trey Grainger
Just wanted to throw this out there for discussion. Solr's default query
response format is still XML, despite the fact that Solr has supported the
JSON response format for over a decade, developer mindshare has clearly
shifted toward JSON over the years, and most modern/competing systems also
use JSON format now by default.

In fact, Solr's admin UI even explicitly adds wt=json to the request (by
default in the UI) to override the default of wt=xml, so Solr's Admin UI
effectively has a different default than the API.

We have now introduced things like the JSON faceting API, and the new more
modern /V2 apis assume JSON for the areas of Solr they cover, so clearly
we're moving in the direction of JSON anyway.

I'd like to propose that we switch the default response writer to JSON
(wt=json) instead of XML for Solr 7.0, as this seems to me like the right
direction and a good time to make this change with the next major version.

Before I create a JIRA and submit a patch, though, I wanted to check here to
make sure there were no strong objections to changing the default.

-Trey Grainger


[jira] [Commented] (SOLR-9529) Dates Dynamic Field Inconsistently Defined in Schemas

2016-09-17 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15500294#comment-15500294
 ] 

Trey Grainger commented on SOLR-9529:
-

Hmm... things were more inconsistent than I thought. There were two fundamental 
kinds of inconsistencies:
1) Inconsistencies within a single schema.
--This is what I described in the issue description regarding "*_dts" being 
handled incorrectly. I submitted a pull request to fix this in the three places 
we actually define both singular and plural field types: 
solr/example/files/conf/managed-schema
solr/server/solr/configsets/basic_configs/conf/managed-schema
solr/server/solr/configsets/data_driven_schema_configs/conf/managed-schema

2) Inconsistencies across different schemas
While the three schemas listed above all separate out single valued and 
multiValued dynamic fields into different singular and plural field types, 
every other schema that ships with Solr only defines a single field type 
(string, boolean, etc.) and uses the dynamic field definition to determine 
whether the dynamic field should be single or multivalued. This works fine, of 
course, but is just inconsistent depending upon which schema file you actually 
end up using. 

Interestingly, the tech products example 
(solr/server/solr/configsets/sample_techproducts_configs/conf/managed-schema), 
which sits at the same level as the basic_configs and the 
data_driven_schema_configs, for some reason handles these definitions 
differently, only defining one field type for both single and multivalued 
fields (for all types). The following places do the same thing:
 
solr/core/src/test-files/solr/collection1/conf/schema-distrib-interval-faceting.xml
 solr/core/src/test-files/solr/collection1/conf/schema-docValuesFaceting.xml
 solr/core/src/test-files/solr/collection1/conf/schema-docValuesJoin.xml
 
solr/core/src/test-files/solr/collection1/conf/schema-non-stored-docvalues.xml
 solr/core/src/test-files/solr/collection1/conf/schema_latest.xml
 solr/example/example-DIH/solr/db/conf/managed-schema
 solr/example/example-DIH/solr/mail/conf/managed-schema
 solr/example/example-DIH/solr/rss/conf/managed-schema
 solr/example/example-DIH/solr/solr/conf/managed-schema
 solr/example/example-DIH/solr/tika/conf/managed-schema

So while my pull request fixes #1 so that all schemas are consistent with 
themselves, we still have inconsistency across the various schemas that ship 
with Solr in terms of what we name the field types for multivalued dynamic 
fields. If we are going to make these consistent, which way should we go - have 
a single field type for all single and multivalued fields (and define 
multivalued=true on the dynamic field definition instead), or separate out 
plural versions of the field type (booleans, strings, etc.) for multivalued 
fields?
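To make the two options concrete, here is a sketch of what each convention looks like for "string" (attribute lists abbreviated, not copied verbatim from any shipped schema):

```xml
<!-- Option A: separate singular/plural field types -->
<fieldType name="string"  class="solr.StrField"/>
<fieldType name="strings" class="solr.StrField" multiValued="true"/>
<dynamicField name="*_s"  type="string"  indexed="true" stored="true"/>
<dynamicField name="*_ss" type="strings" indexed="true" stored="true"/>

<!-- Option B: one field type; multiValued set on the dynamicField -->
<fieldType name="string" class="solr.StrField"/>
<dynamicField name="*_s"  type="string" indexed="true" stored="true"/>
<dynamicField name="*_ss" type="string" indexed="true" stored="true" multiValued="true"/>
```

Option A keeps the multiValued-ness visible in the type name; Option B keeps the type list shorter.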

> Dates Dynamic Field Inconsistently Defined in Schemas
> -
>
> Key: SOLR-9529
> URL: https://issues.apache.org/jira/browse/SOLR-9529
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Trey Grainger
>Priority: Minor
>
> There is a nice convention across all of the schemas that ship with Solr to 
> include field types for single valued fields (i.e. "string" -> "*_s", 
> "boolean" -> "*_b") and separate field types for multivalued fields (i.e. 
> "strings" -> "*_ss", "booleans" -> "*_bs"). Those definitions all follow the 
> pattern (using "string" as an example):
> 
> <fieldType name="string" class="solr.StrField" .../>
> <fieldType name="strings" class="solr.StrField" multiValued="true" .../>
> 
> For some reason, however, the "date" field type doesn't follow this pattern, 
> and is instead defined (inconsistently) as follows:
> <fieldType name="date" class="solr.TrieDateField" ... precisionStep="0"/>
> <fieldType name="dates" class="solr.TrieDateField" ... multiValued="true" precisionStep="0"/>
> 
> <dynamicField name="*_dt" type="date" indexed="true" stored="true"/>
> <dynamicField name="*_dts" type="date" multiValued="true" indexed="true" stored="true"/>
> Note specifically that the "*_dts" field should instead be referencing the 
> "dates" type and not the "date" type, and that subsequently the 
> multiValued="true" setting would become unnecessary on the "*_dts" 
> dynamicField definition.
> I'll get a patch posted for this. Note that nothing is functionally broken, 
> it's just inconsistent and could be confusing for someone looking through the 
> schema or seeing their multivalued dates getting indexed into the field type 
> defined for single valued dates.






[jira] [Created] (SOLR-9529) Dates Dynamic Field Inconsistently Defined in Schemas

2016-09-17 Thread Trey Grainger (JIRA)
Trey Grainger created SOLR-9529:
---

 Summary: Dates Dynamic Field Inconsistently Defined in Schemas
 Key: SOLR-9529
 URL: https://issues.apache.org/jira/browse/SOLR-9529
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Trey Grainger
Priority: Minor


There is a nice convention across all of the schemas that ship with Solr to 
include field types for single valued fields (i.e. "string" -> "*_s", "boolean" 
-> "*_b") and separate field types for multivalued fields (i.e. "strings" -> 
"*_ss", "booleans" -> "*_bs"). Those definitions all follow the pattern (using 
"string" as an example):

<fieldType name="string" class="solr.StrField" .../>
<fieldType name="strings" class="solr.StrField" multiValued="true" .../>

For some reason, however, the "date" field type doesn't follow this pattern, 
and is instead defined (inconsistently) as follows:
<fieldType name="date" class="solr.TrieDateField" ... precisionStep="0"/>
<fieldType name="dates" class="solr.TrieDateField" ... multiValued="true" precisionStep="0"/>

<dynamicField name="*_dt" type="date" indexed="true" stored="true"/>
<dynamicField name="*_dts" type="date" multiValued="true" indexed="true" stored="true"/>
Note specifically that the "*_dts" field should instead be referencing the 
"dates" type and not the "date" type, and that subsequently the 
multiValued="true" setting would become unnecessary on the "*_dts" dynamicField 
definition.

I'll get a patch posted for this. Note that nothing is functionally broken, 
it's just inconsistent and could be confusing for someone looking through the 
schema or seeing their multivalued dates getting indexed into the field type 
defined for single valued dates.






[jira] [Updated] (SOLR-9480) Graph Traversal for Significantly Related Terms (Semantic Knowledge Graph)

2016-09-17 Thread Trey Grainger (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trey Grainger updated SOLR-9480:

Attachment: SOLR-9480.patch

Initial patch to get the ball rolling here. Feature should now work as 
described in reference links in the description. Only real changes are an 
update from Solr 5.1.0 to master, and cleanup of most of the precommit issues.

Still plenty of work to do, particularly in reworking some of the 
multi-threading code to follow Solr conventions, reducing the number of files 
for helper classes, and eventually getting this working correctly in 
distributed mode (was originally built for use cases involving a single Solr 
core as a "representative model"). Would also be good to make a getting started 
tutorial with example data so its easier get started with the feature and do 
something interesting out of the box.

Will continue working on those items as I'm able. Feedback welcome.

> Graph Traversal for Significantly Related Terms (Semantic Knowledge Graph)
> --
>
> Key: SOLR-9480
> URL: https://issues.apache.org/jira/browse/SOLR-9480
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Trey Grainger
> Attachments: SOLR-9480.patch
>
>
> This issue is to track the contribution of the Semantic Knowledge Graph Solr 
> Plugin (request handler), which exposes a graph-like interface for 
> discovering and traversing significant relationships between entities within 
> an inverted index.
> This data model has been described in the following research paper: [The 
> Semantic Knowledge Graph: A compact, auto-generated model for real-time 
> traversal and ranking of any relationship within a 
> domain|https://arxiv.org/abs/1609.00464], as well as in presentations I gave 
> in October 2015 at [Lucene/Solr 
> Revolution|http://www.slideshare.net/treygrainger/leveraging-lucenesolr-as-a-knowledge-graph-and-intent-engine]
>  and November 2015 at the [Bay Area Search 
> Meetup|http://www.treygrainger.com/posts/presentations/searching-on-intent-knowledge-graphs-personalization-and-contextual-disambiguation/].
> The source code for this project is currently available at 
> [https://github.com/careerbuilder/semantic-knowledge-graph], and the folks at 
> CareerBuilder (where this was built) have given me the go-ahead to now 
> contribute this back to the Apache Solr Project, as well.
> Check out the Github repository, research paper, or presentations for a more 
> detailed description of this contribution. Initial patch coming soon.






[jira] [Created] (SOLR-9480) Graph Traversal for Significantly Related Terms (Semantic Knowledge Graph)

2016-09-05 Thread Trey Grainger (JIRA)
Trey Grainger created SOLR-9480:
---

 Summary: Graph Traversal for Significantly Related Terms (Semantic 
Knowledge Graph)
 Key: SOLR-9480
 URL: https://issues.apache.org/jira/browse/SOLR-9480
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Trey Grainger


This issue is to track the contribution of the Semantic Knowledge Graph Solr 
Plugin (request handler), which exposes a graph-like interface for discovering 
and traversing significant relationships between entities within an inverted 
index.

This data model has been described in the following research paper: [The 
Semantic Knowledge Graph: A compact, auto-generated model for real-time 
traversal and ranking of any relationship within a 
domain|https://arxiv.org/abs/1609.00464], as well as in presentations I gave in 
October 2015 at [Lucene/Solr 
Revolution|http://www.slideshare.net/treygrainger/leveraging-lucenesolr-as-a-knowledge-graph-and-intent-engine]
 and November 2015 at the [Bay Area Search 
Meetup|http://www.treygrainger.com/posts/presentations/searching-on-intent-knowledge-graphs-personalization-and-contextual-disambiguation/].

The source code for this project is currently available at 
[https://github.com/careerbuilder/semantic-knowledge-graph], and the folks at 
CareerBuilder (where this was built) have given me the go-ahead to now 
contribute this back to the Apache Solr Project, as well.

Check out the Github repository, research paper, or presentations for a more 
detailed description of this contribution. Initial patch coming soon.






[jira] [Commented] (SOLR-6492) Solr field type that supports multiple, dynamic analyzers

2016-06-21 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15343254#comment-15343254
 ] 

Trey Grainger commented on SOLR-6492:
-

Hi [~krantiparisa] and [~dannytei1]. Apologies for the long lapse without a 
response on this issue. I won't get into the reasons here (combination of 
personal and professional commitments), but I just wanted to say that I expect 
to pick this issue back up in the near future and continue work on this patch.

In the meantime, I have added an ASL 2.0 license to the current code (from Solr 
in Action) so that folks can feel free to use what's there now: 
https://github.com/treygrainger/solr-in-action/tree/master/src/main/java/sia/ch14

I'll turn what's there now into a patch, update it to Solr trunk, and keep 
iterating on it until the folks commenting on this issue are satisfied with the 
design and capabilities. Stay tuned...

> Solr field type that supports multiple, dynamic analyzers
> -
>
> Key: SOLR-6492
> URL: https://issues.apache.org/jira/browse/SOLR-6492
> Project: Solr
>  Issue Type: New Feature
>  Components: Schema and Analysis
>Reporter: Trey Grainger
> Fix For: 5.0
>
>
> A common request - particularly for multilingual search - is to be able to 
> support one or more dynamically-selected analyzers for a field. For example, 
> someone may have a "content" field and pass in a document in Greek (using an 
> Analyzer with Tokenizer/Filters for Greek), a separate document in English 
> (using an English Analyzer), and possibly even a field with mixed-language 
> content in Greek and English. This latter case could pass the content 
> separately through both an analyzer defined for Greek and another Analyzer 
> defined for English, stacking or concatenating the token streams based upon 
> the use-case.
> There are some distinct advantages in terms of index size and query 
> performance which can be obtained by stacking terms from multiple analyzers 
> in the same field instead of duplicating content in separate fields and 
> searching across multiple fields. 
> Other non-multilingual use cases may include things like switching to a 
> different analyzer for the same field to remove a feature (i.e. turning 
> on/off query-time synonyms against the same field on a per-query basis).






[jira] [Commented] (SOLR-9241) Rebalance API for SolrCloud

2016-06-21 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15343228#comment-15343228
 ] 

Trey Grainger commented on SOLR-9241:
-

I'm also very excited to see this patch. For the next evolution of Solr's 
scalability (and ultimately auto-scaling), these are exactly the kinds of core 
capabilities we need for seamlessly scaling up/down, resharding, and 
redistributing shards and replicas across a cluster. 

The smart merge looks interesting - seems like effectively a way to index into 
a larger number of shards (for indexing throughput) while merging them into a 
smaller number of shards for searching, enabling scaling of indexing and 
searching resources independently. This obviously won't work well with 
Near-Realtime Searching, but I'd be curious to hear more explanation about how 
this works in practice for SolrCloud clusters that don't need NRT search.

Agreed with Joel's comments about the update to trunk vs. 4.6.1. One thing that 
seems to have been added since 4.6.1 that probably overlaps with this patch is 
the Replica Placement Strategies (SOLR-6220) vs. the Allocation Strategies 
implemented here.

The rest of the patch seems like all new objects that don't overlap much with 
the current code base. Would be interesting to know how much has changed 
between 4.6.1 to 6.1 collections/SolrCloud-wise that would create conflicts 
with this patch. Am obviously hoping not too much...

Either way, very excited about the contribution and about the potential for 
getting these capabilities integrated into Solr.

> Rebalance API for SolrCloud
> ---
>
> Key: SOLR-9241
> URL: https://issues.apache.org/jira/browse/SOLR-9241
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Affects Versions: 4.6.1
> Environment: Ubuntu, Mac OsX
>Reporter: Nitin Sharma
>  Labels: Cluster, SolrCloud
> Fix For: 4.6.1
>
> Attachments: rebalance.diff
>
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>
> This is the v1 of the patch for Solrcloud Rebalance api (as described in 
> http://engineering.bloomreach.com/solrcloud-rebalance-api/) , built at 
> Bloomreach by Nitin Sharma and Suruchi Shah. The goal of the API  is to 
> provide a zero downtime mechanism to perform data manipulation and  efficient 
> core allocation in solrcloud. This API was envisioned to be the base layer 
> that enables Solrcloud to be an auto scaling platform. (and work in unison 
> with other complementing monitoring and scaling features).
> Patch Status:
> ===
> The patch is work in progress and incremental. We have done a few rounds of 
> code clean up. We wanted to get the patch going first to get initial 
> feedback.  We will continue to work on making it more open source friendly and 
> easily testable.
>  Deployment Status:
> 
> The platform is deployed in production at bloomreach and has been battle 
> tested for large scale load. (millions of documents and hundreds of 
> collections).
>  Internals:
> =
> The internals of the API and performance : 
> http://engineering.bloomreach.com/solrcloud-rebalance-api/
> It is built on top of the admin collections API as an action (with various 
> flavors). At a high level, the rebalance api provides 2 constructs:
> Scaling Strategy:  Decides how to move the data.  Every flavor has multiple 
> options which can be reviewed in the api spec.
> Re-distribute  - Move around data in the cluster based on capacity/allocation.
> Auto Shard  - Dynamically shard a collection to any size.
> Smart Merge - Distributed Mode - Helps merging data from a larger shard setup 
> into smaller one.  (the source should be divisible by destination)
> Scale up -  Add replicas on the fly
> Scale Down - Remove replicas on the fly
> Allocation Strategy:  Decides where to put the data.  (Nodes with least 
> cores, Nodes that do not have this collection etc). Custom implementations 
> can be built on top as well. One other example is Availability Zone aware. 
> Distribute data such that every replica is placed on different availability 
> zone to support HA.
>  Detailed API Spec:
> 
>   https://github.com/bloomreach/solrcloud-rebalance-api
>  Contributors:
> =
>   Nitin Sharma
>   Suruchi Shah
>  Questions/Comments:
> =
>   You can reach me at nitin.sha...@bloomreach.com






Re: Porting LTR plugin for Solr-5.5.0

2016-04-06 Thread Trey Grainger
Ahmet,

They included a 5x patch on the JIRA here:
https://issues.apache.org/jira/secure/attachment/12782146/SOLR-8542-branch_5x.patch
(it's one of the files attached to the Jira). The JIRA has two patches included
on it, one for master (approximately 6.0), and the one I just linked to for
the 5x branch.

Assuming you check out branch_5x (which should be approximately the same
code as 5.5.0), then I would assume the Bloomberg patch would work.  I've
personally also back-ported it to 5.4.1, which required a fair number of
changes related to iterator changes on Scorers, but wasn't too much trouble.

Hopefully the patch above gives you what you need.

Trey Grainger
SVP of Engineering @ Lucidworks
Co-author, Solr in Action


On Wed, Apr 6, 2016 at 6:21 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 4/6/2016 1:55 PM, Ahmet Anil Pala wrote:
> > Hi Michael, thanks for answering and sorry for the late reply.
> >
> > By 5.x branch build, do you mean this
> > -> https://github.com/bloomberg/lucene-solr/tree/branch_5x
> >
> > It seems that LTR is not merged onto it and I doubt it is mergable
> > without changes as this is exactly what I have tried and failed. As
> > for your pull request SOLR-8542, it is supposed to be merged to the
> > master branch which is already the version 7.0.0. Is the LTR plugin
> > originally developed only compatible with Solr-6x and later or am I
> > missing something here?
>
> I think that something like this will not make it into 5.x.  With the
> 6.0.0 release just around the corner (probably a matter of days), 5.x
> goes into maintenance mode, and 4.x basically goes dormant.  Maintenance
> mode basically means that no significant changes will happen, especially
> changes that might affect stability.  A strict interpretation of this is
> "no new features", and this is typically the stance that most committers
> adopt for the previous major version.  It also means that only *major*
> bugs will be fixed, especially as time goes on.
>
> You've indicated that you couldn't merge it cleanly to branch_5x.  If
> the change requires significant work just to merge, and the
> functionality is already present in a later release, it's probably not
> going to happen.
>
> There are some important bugfixes that have already been committed to
> 5x, which should result in a 5.5.1 release in the near future, and it is
> entirely possible that somebody will volunteer to release 5.6.0,
> although the current changelog for 5.6.0 only has two issues, and
> neither is particularly interesting for most users.
>
> If somebody wants to volunteer to do the work, then what I've said might
> not apply at all.
>
> Thanks,
> Shawn
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


[jira] [Updated] (SOLR-8626) [ANGULAR] 404 error when clicking nodes in cloud graph view

2016-03-19 Thread Trey Grainger (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trey Grainger updated SOLR-8626:

Attachment: SOLR-8626.patch

Attached a patch which fixes this issue. The issue existed in both the flat 
graph view and the radial view. Additionally, when one was in the radial view 
and clicked on the link for a node, it would switch back to flat graph view 
when navigating to the other node, so fixed that so that it preserves the 
user's current view type on the URL when navigating between node.

> [ANGULAR] 404 error when clicking nodes in cloud graph view
> ---
>
> Key: SOLR-8626
> URL: https://issues.apache.org/jira/browse/SOLR-8626
> Project: Solr
>  Issue Type: Bug
>  Components: UI
>Reporter: Jan Høydahl
>Assignee: Upayavira
> Attachments: SOLR-8626.patch
>
>
> h3. Reproduce:
> # {{bin/solr start -c}}
> # {{bin/solr create -c mycoll}}
> # Goto http://localhost:8983/solr/#/~cloud
> # Click a collection name in the graph -> 404 error. URL: 
> {{/solr/mycoll/#/~cloud}}
> # Click a shard name in the graph -> 404 error. URL: {{/solr/shard1/#/~cloud}}
> Only verified in Trunk, but probably exists in 5.4 as well






Re: [jira] [Commented] (SOLR-4905) Allow fromIndex parameter to JoinQParserPlugin to refer to a single-sharded collection that has a replica on all nodes

2015-11-12 Thread Trey Grainger
Just to add another voice to the discussion, I have the exact same use case
described by Paul and Mikhail, and I'm working through a proof of concept
for it right now. I'd very much like to see the "single shard collection with
a replica on all nodes" restriction removed.
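For context, the cross-core join under discussion uses the {!join} local-params syntax; the index, field, and query names below are purely illustrative:

```text
q={!join fromIndex=users from=id to=owner_id}department:engineering
```

In SolrCloud, fromIndex currently has to name a single-shard collection with a replica co-located on every node — the restriction discussed above.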

On Thu, Nov 12, 2015, 3:29 PM Mikhail Khludnev (JIRA) 
wrote:

>
> [
> https://issues.apache.org/jira/browse/SOLR-4905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002865#comment-15002865
> ]
>
> Mikhail Khludnev commented on SOLR-4905:
> 
>
> [~p...@search-solutions.net] would you mind to raise a separate jira for
> this?
>
> > Allow fromIndex parameter to JoinQParserPlugin to refer to a
> single-sharded collection that has a replica on all nodes
> >
> --
> >
> > Key: SOLR-4905
> > URL: https://issues.apache.org/jira/browse/SOLR-4905
> > Project: Solr
> >  Issue Type: Improvement
> >  Components: SolrCloud
> >Reporter: Philip K. Warren
> >Assignee: Timothy Potter
> > Fix For: 5.1, Trunk
> >
> > Attachments: SOLR-4905.patch, SOLR-4905.patch, patch.txt
> >
> >
> > Using a non-SolrCloud setup, it is possible to perform cross core joins (
> http://wiki.apache.org/solr/Join). When testing with SolrCloud, however,
> neither the collection name, alias name (we have created aliases to
> SolrCloud collections), or the automatically generated core name (i.e.
> <collection>_shard1_replica1) work as the fromIndex parameter for a
> cross-core join.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


[jira] [Commented] (SOLR-6492) Solr field type that supports multiple, dynamic analyzers

2015-03-03 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14346292#comment-14346292
 ] 

Trey Grainger commented on SOLR-6492:
-

Hi Kranti,

The design is almost exactly as you described: analysis chains are defined in 
schema.xml, these chains can be reused between multiple fields, and on each 
field there is a way to conditionally choose the analysis chain. Specifically, 
each analysis chain is just defined as a FieldType, like you would define any 
analysis chain you were going to assign to a field.

What I hadn't considered yet, however, was having the update processor choose 
the analyzers based upon a value in another field.  I had previously 
only been considering the case where a user would either:
1) Use an automatic language identifier update processor, or
2) Pass the language directly in the content of the field (i.e. 
<field name="my_field">en,es|document content here</field>). 

Having the ability to specify the key for the analyzers in a different field 
would probably be more user friendly, and this would be trivial to implement, 
so I can look to add it. Something like this:
<field name="my_field">document content here</field>
<field name="language">en</field>
<field name="language">es</field>

Is that what you were hoping for?
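As a rough illustration of the field-content convention in option 2 above — the real implementation is a Java Tokenizer; this Python sketch only shows the prefix-parsing logic, and the function name is made up:

```python
def split_language_prefix(value, default_langs=("en",)):
    """Split an optional 'lang1,lang2|content' prefix off field content.

    Returns (languages, content). Without a prefix, the defaults apply.
    """
    head, sep, tail = value.partition("|")
    codes = [c.strip() for c in head.split(",")]
    # Treat the text before '|' as a language list only if every token is a
    # two-letter code, so ordinary content containing '|' passes through.
    if sep and all(len(c) == 2 and c.isalpha() for c in codes):
        return tuple(codes), tail
    return tuple(default_langs), value

langs, content = split_language_prefix("en,es|document content here")
# langs == ("en", "es"), content == "document content here"
```

A tokenizer built this way can then look up the analysis chain registered for each returned language code before analyzing the remaining content.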

 Solr field type that supports multiple, dynamic analyzers
 -

 Key: SOLR-6492
 URL: https://issues.apache.org/jira/browse/SOLR-6492
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Trey Grainger
 Fix For: 5.0


 A common request - particularly for multilingual search - is to be able to 
 support one or more dynamically-selected analyzers for a field. For example, 
 someone may have a content field and pass in a document in Greek (using an 
 Analyzer with Tokenizer/Filters for Greek), a separate document in English 
 (using an English Analyzer), and possibly even a field with mixed-language 
 content in Greek and English. This latter case could pass the content 
 separately through both an analyzer defined for Greek and another Analyzer 
 defined for English, stacking or concatenating the token streams based upon 
 the use-case.
 There are some distinct advantages in terms of index size and query 
 performance which can be obtained by stacking terms from multiple analyzers 
 in the same field instead of duplicating content in separate fields and 
 searching across multiple fields. 
 Other non-multilingual use cases may include things like switching to a 
 different analyzer for the same field to remove a feature (i.e. turning 
 on/off query-time synonyms against the same field on a per-query basis).






[jira] [Commented] (SOLR-6492) Solr field type that supports multiple, dynamic analyzers

2014-10-30 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14190715#comment-14190715
 ] 

Trey Grainger commented on SOLR-6492:
-

Hi Sharon,

Your question was which code will parse "df=someMultiTextField|en,de"
and decide which analysis chain to use. In short, since FieldTypes have
access to the schema but Analyzers and Tokenizers don't, I'm creating a new
FieldType which passes the schema into a new Analyzer, which can then pass
the schema into the new Tokenizer. When the Tokenizer is used, the
fieldname (string) and value (reader) are passed in, so it is possible to
pull the metadata ("|en,de") off of either of these and dynamically choose
a new analysis chain (analyzer) from the schema at that time.

I've done this work already for pulling data out of the field content (so I
know that works), but pulling the metadata from the fieldname is still
pending (I'm hoping to work on it this weekend). If you want to see what
I've done thus far, you can look on GitHub at MultiTextField,
MultiTextFieldAnalyzer, and MultiTextFieldTokenizer:
https://github.com/treygrainger/solr-in-action/blob/master/src/main/java/sia/ch14/MultiTextField.java
https://github.com/treygrainger/solr-in-action/blob/master/src/main/java/sia/ch14/MultiTextFieldAnalyzer.java
https://github.com/treygrainger/solr-in-action/blob/master/src/main/java/sia/ch14/MultiTextFieldTokenizer.java

I have some questions / feedback on your proposed solution... I'm hopping
on a plane now but will post them later tonight.

Thanks,

Trey Grainger
Co-author, Solr in Action
Director of Engineering, Search & Analytics @CareerBuilder


On Thu, Oct 30, 2014 at 7:32 AM, Sharon Krisher (JIRA) j...@apache.org



 Solr field type that supports multiple, dynamic analyzers
 -

 Key: SOLR-6492
 URL: https://issues.apache.org/jira/browse/SOLR-6492
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Trey Grainger
 Fix For: 5.0


 A common request - particularly for multilingual search - is to be able to 
 support one or more dynamically-selected analyzers for a field. For example, 
 someone may have a content field and pass in a document in Greek (using an 
 Analyzer with Tokenizer/Filters for Greek), a separate document in English 
 (using an English Analyzer), and possibly even a field with mixed-language 
 content in Greek and English. This latter case could pass the content 
 separately through both an analyzer defined for Greek and another Analyzer 
 defined for English, stacking or concatenating the token streams based upon 
 the use-case.
 There are some distinct advantages in terms of index size and query 
 performance which can be obtained by stacking terms from multiple analyzers 
 in the same field instead of duplicating content in separate fields and 
 searching across multiple fields. 
 Other non-multilingual use cases may include things like switching to a 
 different analyzer for the same field to remove a feature (i.e. turning 
 on/off query-time synonyms against the same field on a per-query basis).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6492) Solr field type that supports multiple, dynamic analyzers

2014-10-30 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191418#comment-14191418
 ] 

Trey Grainger commented on SOLR-6492:
-

Hi Sharon,

In terms of your suggestion, I do think that using local params to pass in
the language could be a more user-friendly solution than requiring them to
put the params on the field name: i.e. q={!langs=en|de}hello world&df=text
vs. q=hello world&df=text|en,de, though the syntax may get a bit weird if
you want to specify different languages for different fields. For example,
if using the edismax query parser, you would need to do something like
q={!langs=text1:en,de|text2:en,zh}hello world&qf=text1 text2 vs. just
q=hello world&qf=text1|en,de text2|en,zh.

For the most simple use-case (every field uses the same language), or for
the use-case where you don't know what fields the user is querying on
up-front, I think the local params syntax would be preferred for end-users.
There is a big down-side to doing this, however: it requires you to
implement a qparser to parse this data and put it somewhere that the
Analyzer can see. This means that your multi-lingual field would only be
searchable with your custom query parser (whereas if the determination of
the language is passed in as part of the field name or content as I
described, it should work seamlessly with all of the query parsers, since
the data gets passed through all the way to the Analyzer).

Your solution with the ThreadLocal storage of the data is interesting...
I'm not positive whether it will work or not (i.e. does the analyzer always
run on the same thread as the incoming request for both queries and
indexing, and will that also continue to be the case into the future)? I
know that threads are at least re-used across requests and that the
TokenStreamComponents for analyzers are re-used in a threadlocal pool, but
that just means you'd have to be very careful about not caching or reusing
languages across requests, not that it couldn't work. Also, just out of
curiosity, how do you plan to pass the languages in at index time?

The Analyzer/Tokenizers only accept the fieldname (string) and the field
content (reader) as parameters, so passing in additional parameters through
a threadlocal seems like a bit of a hack that violates the design there
(though arguably that design is too restrictive and should change). I'd be
curious if anyone else thinks this would work...
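To make the field-name approach concrete, here is a minimal, Lucene-free sketch (the class and method names are hypothetical, not from the SOLR-6492 patch) of how language codes appended to a field name could be split off using only the fieldName string the Analyzer already receives, with no custom query parser or ThreadLocal needed:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: splits "someField|en,de" into the base field name
// and the list of language codes, falling back to a default language when
// none are specified. A multi-text field's analyzer could call this with
// the fieldName it is handed, then delegate to per-language sub-analyzers.
public class FieldLanguageParser {

    public static final class Result {
        public final String fieldName;
        public final List<String> languages;
        Result(String fieldName, List<String> languages) {
            this.fieldName = fieldName;
            this.languages = languages;
        }
    }

    public static Result parse(String rawFieldName, String defaultLang) {
        int sep = rawFieldName.indexOf('|');
        if (sep < 0) {
            // No explicit languages: fall back to the default analyzer's language.
            return new Result(rawFieldName, List.of(defaultLang));
        }
        String base = rawFieldName.substring(0, sep);
        List<String> langs =
            Arrays.asList(rawFieldName.substring(sep + 1).split(","));
        return new Result(base, langs);
    }

    public static void main(String[] args) {
        Result r = parse("someMultiTextField|en,de", "en");
        System.out.println(r.fieldName + " -> " + r.languages);
        // prints: someMultiTextField -> [en, de]
    }
}
```

Because everything is derived from the fieldName alone, this works with any query parser that passes field names through unchanged.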

Thanks,

Trey Grainger
Co-author, Solr in Action
Director of Engineering, Search & Analytics @CareerBuilder








[jira] [Created] (SOLR-6492) Solr field type that supports multiple, dynamic analyzers

2014-09-08 Thread Trey Grainger (JIRA)
Trey Grainger created SOLR-6492:
---

 Summary: Solr field type that supports multiple, dynamic analyzers
 Key: SOLR-6492
 URL: https://issues.apache.org/jira/browse/SOLR-6492
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Trey Grainger
 Fix For: 4.11


A common request - particularly for multilingual search - is to be able to 
support one or more dynamically-selected analyzers for a field. For example, 
someone may have a content field and pass in a document in Greek (using an 
Analyzer with Tokenizer/Filters for Greek), a separate document in English 
(using an English Analyzer), and possibly even a field with mixed-language 
content in Greek and English. This latter case could pass the content 
separately through both an analyzer defined for Greek and another Analyzer 
defined for English, stacking or concatenating the token streams based upon the 
use-case.

There are some distinct advantages in terms of index size and query performance 
which can be obtained by stacking terms from multiple analyzers in the same 
field instead of duplicating content in separate fields and searching across 
multiple fields. 

Other non-multilingual use cases may include things like switching to a 
different analyzer for the same field to remove a feature (i.e. turning on/off 
query-time synonyms against the same field on a per-query basis).






[jira] [Commented] (SOLR-6492) Solr field type that supports multiple, dynamic analyzers

2014-09-08 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126258#comment-14126258
 ] 

Trey Grainger commented on SOLR-6492:
-

I previously implemented this field type when writing chapter 14 of _Solr in 
Action_, but I would like to make some improvements and then submit the code 
back to Solr to (hopefully) be committed. The current code from _Solr in 
Action_ can be found here:
[https://github.com/treygrainger/solr-in-action/tree/first-edition/src/main/java/sia/ch14]

To use the current version, you would do the following:
1) Add the following to schema.xml:
  <fieldType name="multiText"
             class="sia.ch14.MultiTextField" sortMissingLast="true"
             defaultFieldType="text_general"
             fieldMappings="en:text_english,
                            es:text_spanish,
                            fr:text_french,
                            de:text_german"/> *

  <field name="someMultiTextField" type="multiText" indexed="true"
         multiValued="true" />

  *note that text_spanish, text_english, text_french, and text_german 
refer to field types which are defined elsewhere in the schema.xml:

2) Index a document with a field containing multilingual text using syntax like 
one of the following:
  <field name="someMultiTextField">some text</field> **
  <field name="someMultiTextField">en|some text</field>
  <field name="someMultiTextField">es|some more text</field>
  <field name="someMultiTextField">de,fr|some other text</field>

  **uses the default analyzer

3) submit a query specifying which language you want to query in:
  /select?q=someMultiTextField:en,de|keyword_goes_here

--

Improvements to be made before the patch is finalized:
1) Make it possible to specify the field type mappings in the field name 
instead of the field value:
  <field name="someMultiTextField">de,fr|some other text</field>
  /select?q=a bunch of keywords here&df=someMultiTextField|en,de

This makes querying easier, because the languages can be detected prior to 
parsing of the query, which prevents prefixes from having to be substituted on 
each query term (which is cost-prohibitive for most because it effectively 
means pre-parsing the query before it goes to Solr).

2) Enable support for switching between stacking token streams from each 
analyzer (good default because it mostly respects position increments across 
languages and minimizes duplicate tokens in the index) and concatenating token 
streams.

3) Possibly add the ability to switch analyzers in the middle of input text:
<field name="someMultiTextField">de,fr|some other el|text</field>

4) Extensive unit testing
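For intuition on improvement #2, here is a toy sketch (plain Java, not the patch itself; the Token class and both methods are invented for illustration) of the difference between stacking and concatenating two analyzers' outputs, with tokens modeled as (term, position) pairs instead of real Lucene TokenStreams:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of stacking vs. concatenating token streams.
public class TokenStacking {

    static final class Token {
        final String term;
        final int position;
        Token(String term, int position) { this.term = term; this.position = position; }
        @Override public String toString() { return term + "@" + position; }
    }

    // Stacking: tokens from the second stream keep their original positions,
    // so translations of the same word share a position (like index-time
    // synonyms). This preserves position increments across languages and
    // avoids inflating document length.
    static List<Token> stack(List<Token> a, List<Token> b) {
        List<Token> out = new ArrayList<>(a);
        out.addAll(b);
        return out;
    }

    // Concatenating: the second stream is shifted past the end of the first,
    // so each analyzer's tokens occupy their own distinct position range.
    static List<Token> concatenate(List<Token> a, List<Token> b) {
        int shift = 0;
        for (Token t : a) shift = Math.max(shift, t.position + 1);
        List<Token> out = new ArrayList<>(a);
        for (Token t : b) out.add(new Token(t.term, t.position + shift));
        return out;
    }

    public static void main(String[] args) {
        List<Token> en = List.of(new Token("run", 0), new Token("fast", 1));
        List<Token> de = List.of(new Token("lauf", 0), new Token("schnell", 1));
        System.out.println(stack(en, de));        // [run@0, fast@1, lauf@0, schnell@1]
        System.out.println(concatenate(en, de));  // [run@0, fast@1, lauf@2, schnell@3]
    }
}
```

In real Lucene terms the stacked variant would emit the second analyzer's tokens with a position increment of 0 relative to the first analyzer's tokens at the same slot.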

 Solr field type that supports multiple, dynamic analyzers
 -

 Key: SOLR-6492
 URL: https://issues.apache.org/jira/browse/SOLR-6492
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Trey Grainger
 Fix For: 4.11





[jira] [Comment Edited] (SOLR-6492) Solr field type that supports multiple, dynamic analyzers

2014-09-08 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126258#comment-14126258
 ] 

Trey Grainger edited comment on SOLR-6492 at 9/8/14 11:55 PM:
--

I previously implemented this field type when writing chapter 14 of _Solr in 
Action_, but I would like to make some improvements and then submit the code 
back to Solr to (hopefully) be committed. The current code from _Solr in 
Action_ can be found here:
[https://github.com/treygrainger/solr-in-action/tree/first-edition/src/main/java/sia/ch14]

To use the current version, you would do the following:
1) Add the following to schema.xml:
  <fieldType name="multiText"
             class="sia.ch14.MultiTextField" sortMissingLast="true"
             defaultFieldType="text_general"
             fieldMappings="en:text_english,
                            es:text_spanish,
                            fr:text_french,
                            de:text_german"/> *

  <field name="someMultiTextField" type="multiText" indexed="true"
         multiValued="true" />

  *note that text_spanish, text_english, text_french, and text_german 
refer to field types which are defined elsewhere in the schema.xml:

2) Index a document with a field containing multilingual text using syntax like 
one of the following:
  <field name="someMultiTextField">some text</field> **
  <field name="someMultiTextField">en|some text</field>
  <field name="someMultiTextField">es|some more text</field>
  <field name="someMultiTextField">de,fr|some other text</field>

  **uses the default analyzer

3) submit a query specifying which language you want to query in:
  /select?q=someMultiTextField:en,de|keyword_goes_here

--

Improvements to be made before the patch is finalized:
1) Make it possible to specify the field type mappings in the field name 
instead of the field value:
  <field name="someMultiTextField|de,fr">some other text</field>
  /select?q=a bunch of keywords here&df=someMultiTextField|en,de

This makes querying easier, because the languages can be detected prior to 
parsing of the query, which prevents prefixes from having to be substituted on 
each query term (which is cost-prohibitive for most because it effectively 
means pre-parsing the query before it goes to Solr).

2) Enable support for switching between stacking token streams from each 
analyzer (good default because it mostly respects position increments across 
languages and minimizes duplicate tokens in the index) and concatenating token 
streams.

3) Possibly add the ability to switch analyzers in the middle of input text:
<field name="someMultiTextField">de,fr|some other el|text</field>

4) Extensive unit testing


was (Author: solrtrey):
I previously implemented this field type when writing chapter 14 of _Solr in 
Action_, but I would like to make some improvements and then submit the code 
back to Solr to (hopefully) be committed. The current code from _Solr in 
Action_ can be found here:
[https://github.com/treygrainger/solr-in-action/tree/first-edition/src/main/java/sia/ch14]

To use the current version, you would do the following:
1) Add the following to schema.xml:
  <fieldType name="multiText"
             class="sia.ch14.MultiTextField" sortMissingLast="true"
             defaultFieldType="text_general"
             fieldMappings="en:text_english,
                            es:text_spanish,
                            fr:text_french,
                            de:text_german"/> *

  <field name="someMultiTextField" type="multiText" indexed="true"
         multiValued="true" />

  *note that text_spanish, text_english, text_french, and text_german 
refer to field types which are defined elsewhere in the schema.xml:

2) Index a document with a field containing multilingual text using syntax like 
one of the following:
  <field name="someMultiTextField">some text</field> **
  <field name="someMultiTextField">en|some text</field>
  <field name="someMultiTextField">es|some more text</field>
  <field name="someMultiTextField">de,fr|some other text</field>

  **uses the default analyzer

3) submit a query specifying which language you want to query in:
  /select?q=someMultiTextField:en,de|keyword_goes_here

--

Improvements to be made before the patch is finalized:
1) Make it possible to specify the field type mappings in the field name 
instead of the field value:
  <field name="someMultiTextField">de,fr|some other text</field>
  /select?q=a bunch of keywords here&df=someMultiTextField|en,de

This makes querying easier, because the languages can be detected prior to 
parsing of the query, which prevents prefixes from having to be substituted on 
each query term (which is cost-prohibitive for most because it effectively 
means pre-parsing the query before it goes to Solr).

2) Enable support for switching between stacking token streams from each 
analyzer (good default because it mostly respects position increments across 
languages and minimizes duplicate tokens in the index

[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting

2014-04-24 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13980597#comment-13980597
 ] 

Trey Grainger commented on SOLR-2894:
-

[~markrmil...@gmail.com] said:
We should get this in to get more feedback. Wish I had some time to tackle 
it, but I won't in the near term. 

Is there a committer who has interest in this issue and would be willing to 
look over it for (hopefully) getting it pushed into trunk?  It's the top voted 
for and the top watched issue in Solr right now, so there's clearly a lot of 
community interest. Thanks!

 Implement distributed pivot faceting
 

 Key: SOLR-2894
 URL: https://issues.apache.org/jira/browse/SOLR-2894
 Project: Solr
  Issue Type: Improvement
Reporter: Erik Hatcher
 Fix For: 4.9, 5.0

 Attachments: SOLR-2894-reworked.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, dateToObject.patch


 Following up on SOLR-792, pivot faceting currently only supports 
 undistributed mode.  Distributed pivot faceting needs to be implemented.






[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting

2014-04-24 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13980662#comment-13980662
 ] 

Trey Grainger commented on SOLR-2894:
-

Hi [~otis], I appreciate your interest here. That's correct: no previously 
working behavior was changed, and there are two things added with this patch: 
1) distributed support, and 2) support for single-level pivot facets (this 
previously threw an exception but is now supported: 
facet.pivot=aSingleFieldName).

For context on #2, we found no good reason to disallow a single-level pivot 
facet (functions like a field facet but with the pivot facet output format), 
it made implementing distributed pivot faceting easier since a single level 
could be considered when refining, and there was work in some downstream issues 
like SOLR-3583 (adding percentiles and other stats to pivot facets) which was 
dependent upon being able to easily alternate between any number of facet 
levels for analytics purposes, so we just added the support for a single level. 
This also makes it easier to build analytics tools without having to 
arbitrarily alternate between field facets and pivot facets and their 
corresponding output formats based upon the number of levels.

The end result is that no previously working capabilities have been modified, 
but distributed support for any number of pivot levels has been added.




[jira] [Comment Edited] (SOLR-2894) Implement distributed pivot faceting

2014-04-24 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13980662#comment-13980662
 ] 

Trey Grainger edited comment on SOLR-2894 at 4/25/14 3:54 AM:
--

Hi Otis, I appreciate your interest here. That's correct: no previously working 
behavior was changed, and there are two things added with this patch: 1) 
distributed support, and 2) support for single-level pivot facets (this 
previously threw an exception but is now supported: 
facet.pivot=aSingleFieldName).

For context on #2, we found no good reason to disallow a single-level pivot 
facet (functions like a field facet but with the pivot facet output format), it 
made implementing distributed pivot faceting easier since a single level could 
be considered when refining, and there was work in some downstream issues like 
SOLR-3583 (adding percentiles and other stats to pivot facets) which was 
dependent upon being able to easily alternate between any number of facet 
levels for analytics purposes, so we just added the support for a single level. 
This also makes it easier to build analytics tools without having to 
arbitrarily alternate between field facets and pivot facets and their 
corresponding output formats based upon the number of levels.

The end result is that no previously working capabilities have been modified, 
but distributed support for any number of pivot levels has been added, which 
should make this safe to commit to trunk.


was (Author: solrtrey):
Hi [~otis], I appreciate your interest here. That's correct: no previously 
working behavior was changed, and there are two things added with this patch: 
1) distributed support, and 2) support for a single-level pivot facets (this 
previously threw an exception but is now supported: 
facet.pivot=aSingleFieldName).

For context on #2, we found no good reason to disallow a single-level pivot 
facet (functions like to a field facet but with the pivot facet output format), 
it made implementing distributed pivot faceting easier since a single level 
could be considered when refining, and there was work in some downstream issues 
like SOLR-3583 (adding percentiles and other stats to pivot facets) which was 
dependent upon being able to easily alternate between any number of facet 
levels for analytics purposes, so we just added the support for a single level. 
This also makes it easier to build analytics tools without having to 
arbitrarily alternate between field facets and pivot facets and their 
corresponding output formats based upon the number of levels.

The end result is that no previously working capabilities have been modified, 
but distributed support for any number of pivot levels has been added.




[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting

2014-04-17 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973094#comment-13973094
 ] 

Trey Grainger commented on SOLR-2894:
-

After nearly 2 years of on-and-off development, I think this patch is finally 
ready to be committed. Brett's most recent patch includes significant 
performance improvements as well as fixes to all of the reported issues and 
edge cases mentioned by the others currently using this patch. We have just 
finished a large spike of work to get this ready for commit, so I'd love to get 
it pushed in soon unless there are any objections.

[~ehatcher], do you have any time to review this for suitability to be 
committed (since you are the reporter)? If there is anything additional that 
needs to be changed, I'll happily sign us up (either myself or someone on my 
team at CareerBuilder) to do it if it will help.




Re: Welcome Tim Potter as Lucene/Solr committer

2014-04-08 Thread Trey Grainger
Congrats, Tim! Very, very awesome.

Trey Grainger
Co-author, Solr in Action
Director of Engineering, Search & Analytics @ CareerBuilder


On Tue, Apr 8, 2014 at 10:11 AM, Timothy Potter thelabd...@gmail.com wrote:

 This is awesome! Thank you, what an honor to be working with such an
 amazing group of engineers.

 bio: I work at LucidWorks focusing most of my time on Solr. Most
 recently, I've been focused on testing / hardening SolrCloud in a
 large-scale cluster to support 100's of collections and billions of
 docs. I'm working on SOLR-5495 and 5468 and hope to contribute more to
 the unit/integration tests for SolrCloud in the coming months. I've
 also worked with Steve Rowe on the RestManager stuff coming in 4.8
 (SOLR-5653).

 Prior to LucidWorks, I was an architect on the Big Data team at Dachis
 Group, where I focused on large-scale machine learning, text mining,
 and social network analysis problems. At Dachis Group, I designed and
 operated a 36-node SolrCloud cluster (~900M docs) running in AWS. I
 dabble in dev-ops. Lastly, I'm the co-author of Solr in Action with
 Trey. https://www.linkedin.com/in/thelabdude

 Cheers,
 Tim

 On Mon, Apr 7, 2014 at 10:40 PM, Steve Rowe sar...@gmail.com wrote:
  I'm pleased to announce that Tim Potter has accepted the PMC's
 invitation to become a committer.
 
  Tim, it's tradition that you introduce yourself with a brief bio.
 
  Once your account has been created - could take a few days - you'll be
 able to add yourself to the committers section of the Who We Are page on
 the website: http://lucene.apache.org/whoweare.html (use the ASF CMS
 bookmarklet at the bottom of the page here: 
 https://cms.apache.org/#bookmark - more info here 
 http://www.apache.org/dev/cms.html).
 
  Check out the ASF dev page - lots of useful links: 
 http://www.apache.org/dev/.
 
  Congratulations and welcome!
 
  Steve

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




[jira] [Commented] (SOLR-5856) Add new Solr book to the Solr homepage

2014-03-18 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13939836#comment-13939836
 ] 

Trey Grainger commented on SOLR-5856:
-

Hi Steve - thanks so much for getting this committed so quickly! Everything
looks great, except for the 4 book layout in the slideshow doesn't render
well for me in Chrome on either Windows or a Mac (the fourth book wraps to
the next line). IE, Firefox, and Safari all looked good, though.
https://www.dropbox.com/s/hkcz8xzxtgfvexw/4Books.png

I'd guess other Chrome users are likely seeing the same thing.






 Add new Solr book to the Solr homepage
 --

 Key: SOLR-5856
 URL: https://issues.apache.org/jira/browse/SOLR-5856
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Affects Versions: 4.7
 Environment: https://lucene.apache.org/solr/
Reporter: Trey Grainger
Assignee: Steve Rowe
Priority: Minor
 Fix For: 4.8

 Attachments: SOLR-5856.patch, SOLR-5856.patch, book_sia.png


 A new Solr book (Solr in Action) has just been published by Manning 
 publications (release date 3/15). I am providing the patch to update the 
 website pages corresponding to the slideshow on 
 https://lucene.apache.org/solr/ and https://lucene.apache.org/solr/books.html 
 . The patch has updates to html/text files and there is a binary image file 
 as well.






Re: [jira] [Commented] (SOLR-5856) Add new Solr book to the Solr homepage

2014-03-18 Thread Trey Grainger
A few minutes after I sent my e-mail the page started rendering correctly
in Chrome on both my windows and mac computers, so either this was just
fixed or there was some temporary weirdness (perhaps on my end). At any
rate, it looks good for me now. Thanks!


On Tue, Mar 18, 2014 at 5:36 PM, Trey Grainger (JIRA) j...@apache.org wrote:


 [
 https://issues.apache.org/jira/browse/SOLR-5856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13939836#comment-13939836]

 Trey Grainger commented on SOLR-5856:
 -

 Hi Steve - thanks so much for getting this committed so quickly! Everything
 looks great, except for the 4 book layout in the slideshow doesn't render
 well for me in Chrome on either Windows or a Mac (the fourth book wraps to
 the next line). IE, Firefox, and Safari all looked good, though.
 https://www.dropbox.com/s/hkcz8xzxtgfvexw/4Books.png

 I'd guess other Chrome users are likely seeing the same thing.










[jira] [Commented] (SOLR-5856) Add new Solr book to the Solr homepage

2014-03-13 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933162#comment-13933162
 ] 

Trey Grainger commented on SOLR-5856:
-

Hi Alexandre,

I agree with you. It looks like there are two Solr 3.x books, and the older one 
has already been previously cut from the rotating slideshow. At this point, I 
think the other 3.x book is going to have to be bumped. The good news is that 
those authors are working on a 4.x refresh that should be released in a few 
months, so they'll likely be back up there soon.

Of course, all of the books are still on the books page, just not in the 
Latest books published about Apache Solr list in the header slideshow.

The patch I included bumps the 3.x book and inserts Solr in Action.

 Add new Solr book to the Solr homepage
 --

 Key: SOLR-5856
 URL: https://issues.apache.org/jira/browse/SOLR-5856
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Affects Versions: 4.7
 Environment: https://lucene.apache.org/solr/
Reporter: Trey Grainger
Priority: Minor
 Fix For: 4.7

 Attachments: SOLR-5856.patch, book_sia.png


 A new Solr book (Solr in Action) has just been published by Manning 
 publications (release date 3/15). I am providing the patch to update the 
 website pages corresponding to the slideshow on 
 https://lucene.apache.org/solr/ and https://lucene.apache.org/solr/books.html 
 . The patch has updates to html/text files and there is a binary image file 
 as well.






[jira] [Commented] (SOLR-5856) Add new Solr book to the Solr homepage

2014-03-13 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934381#comment-13934381
 ] 

Trey Grainger commented on SOLR-5856:
-

That makes sense... I agree that it is probably a better user experience to link 
to the books page. I'll update all of the slideshow links to point to the books 
page and resubmit the patch shortly.

 Add new Solr book to the Solr homepage
 --

 Key: SOLR-5856
 URL: https://issues.apache.org/jira/browse/SOLR-5856
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Affects Versions: 4.7
 Environment: https://lucene.apache.org/solr/
Reporter: Trey Grainger
Priority: Minor
 Fix For: 4.7

 Attachments: SOLR-5856.patch, book_sia.png


 A new Solr book (Solr in Action) has just been published by Manning 
 publications (release date 3/15). I am providing the patch to update the 
 website pages corresponding to the slideshow on 
 https://lucene.apache.org/solr/ and https://lucene.apache.org/solr/books.html 
 . The patch has updates to html/text files and there is a binary image file 
 as well.






[jira] [Updated] (SOLR-5856) Add new Solr book to the Solr homepage

2014-03-13 Thread Trey Grainger (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trey Grainger updated SOLR-5856:


Attachment: SOLR-5856.patch

This updated patch modifies the slideshow to link to the books.html page as 
opposed to going directly to the Publisher's page (as requested by Hoss and 
Uwe).

In order to make the site more consistent (since we're now making more than 
just the change to add Solr in Action), I also made the images for each of the 
books on the books.html page clickable as links to the publisher's pages in 
order to increase the likelihood of a click-through. One of the books already 
did this, but it was missing on the others, and the book image is one of the 
things visitors are probably most likely to click on to try to get the book.

 Add new Solr book to the Solr homepage
 --

 Key: SOLR-5856
 URL: https://issues.apache.org/jira/browse/SOLR-5856
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Affects Versions: 4.7
 Environment: https://lucene.apache.org/solr/
Reporter: Trey Grainger
Priority: Minor
 Fix For: 4.7

 Attachments: SOLR-5856.patch, SOLR-5856.patch, book_sia.png


 A new Solr book (Solr in Action) has just been published by Manning 
 publications (release date 3/15). I am providing the patch to update the 
 website pages corresponding to the slideshow on 
 https://lucene.apache.org/solr/ and https://lucene.apache.org/solr/books.html 
 . The patch has updates to html/text files and there is a binary image file 
 as well.






[jira] [Commented] (SOLR-5856) Add new Solr book to the Solr homepage

2014-03-13 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934528#comment-13934528
 ] 

Trey Grainger commented on SOLR-5856:
-

@Alexandre,
Yeah, making the homepage links go to a secondary books page probably will 
detract from both SEO and sales, but it's a better user experience for those 
visiting the Solr homepage, no? One silver lining is that it makes the books 
page more prominent: the recent book pictures on the homepage still link over 
to the books page, making it easier to find and compare each of the different 
books.

@Hossman
Thanks for tentatively signing up to commit this. If you see anything else that 
needs changing, please let me know and I'd be happy to put together another 
patch.

 Add new Solr book to the Solr homepage
 --

 Key: SOLR-5856
 URL: https://issues.apache.org/jira/browse/SOLR-5856
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Affects Versions: 4.7
 Environment: https://lucene.apache.org/solr/
Reporter: Trey Grainger
Priority: Minor
 Fix For: 4.7

 Attachments: SOLR-5856.patch, SOLR-5856.patch, book_sia.png


 A new Solr book (Solr in Action) has just been published by Manning 
 publications (release date 3/15). I am providing the patch to update the 
 website pages corresponding to the slideshow on 
 https://lucene.apache.org/solr/ and https://lucene.apache.org/solr/books.html 
 . The patch has updates to html/text files and there is a binary image file 
 as well.






[jira] [Created] (SOLR-5856) Add new Solr book to the Solr homepage

2014-03-12 Thread Trey Grainger (JIRA)
Trey Grainger created SOLR-5856:
---

 Summary: Add new Solr book to the Solr homepage
 Key: SOLR-5856
 URL: https://issues.apache.org/jira/browse/SOLR-5856
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Affects Versions: 4.7
 Environment: https://lucene.apache.org/solr/
Reporter: Trey Grainger
Priority: Minor
 Fix For: 4.7


A new Solr book (Solr in Action) has just been published by Manning 
publications (release date 3/15). I am providing the patch to update the 
website pages corresponding to the slideshow on https://lucene.apache.org/solr/ 
and https://lucene.apache.org/solr/books.html . The patch has updates to 
html/text files and there is a binary image file as well.






[jira] [Updated] (SOLR-5856) Add new Solr book to the Solr homepage

2014-03-12 Thread Trey Grainger (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trey Grainger updated SOLR-5856:


Attachment: SOLR-5856.patch

Patch attached. Uploading the image separately.

 Add new Solr book to the Solr homepage
 --

 Key: SOLR-5856
 URL: https://issues.apache.org/jira/browse/SOLR-5856
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Affects Versions: 4.7
 Environment: https://lucene.apache.org/solr/
Reporter: Trey Grainger
Priority: Minor
 Fix For: 4.7

 Attachments: SOLR-5856.patch


 A new Solr book (Solr in Action) has just been published by Manning 
 publications (release date 3/15). I am providing the patch to update the 
 website pages corresponding to the slideshow on 
 https://lucene.apache.org/solr/ and https://lucene.apache.org/solr/books.html 
 . The patch has updates to html/text files and there is a binary image file 
 as well.






[jira] [Updated] (SOLR-5856) Add new Solr book to the Solr homepage

2014-03-12 Thread Trey Grainger (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trey Grainger updated SOLR-5856:


Attachment: book_sia.png

 Add new Solr book to the Solr homepage
 --

 Key: SOLR-5856
 URL: https://issues.apache.org/jira/browse/SOLR-5856
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Affects Versions: 4.7
 Environment: https://lucene.apache.org/solr/
Reporter: Trey Grainger
Priority: Minor
 Fix For: 4.7

 Attachments: SOLR-5856.patch, book_sia.png


 A new Solr book (Solr in Action) has just been published by Manning 
 publications (release date 3/15). I am providing the patch to update the 
 website pages corresponding to the slideshow on 
 https://lucene.apache.org/solr/ and https://lucene.apache.org/solr/books.html 
 . The patch has updates to html/text files and there is a binary image file 
 as well.






Re: Stats vs Analytics

2014-02-11 Thread Trey Grainger
Just to add more discussion to the mix, we're also building/using this at
CareerBuilder:
Percentiles for facets, pivot facets, and distributed pivot facets
https://issues.apache.org/jira/browse/SOLR-3583

It is an extension to (distributed pivot) faceting that allows stats to be
collected within the faceting component. We built it with the following
needs:
1) Supports pivot faceting (stats at each level)
2) Supports distributed statistical operations

If you look at slide 41 of this presentation, you'll get a really good feel
for what this patch does:
http://www.slideshare.net/treygrainger/building-a-real-time-big-data-analytics-platform-with-solr

The primary focus initially was on calculating percentiles of numerical
values in a distributed way (using bucketing similar to range faceting),
but we are also in the process of adding distributed sum. Other
distributable calculations are possible; we just haven't needed them yet, so
we haven't added them.
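To make the bucketing idea concrete, here is a toy sketch of distributed percentile estimation over merged histograms (illustrative only -- this is not the SOLR-3583 code, and it assumes the value range and bucket count are agreed upon across shards up front):

```java
import java.util.Arrays;
import java.util.List;

public class BucketPercentile {
    // Each shard builds a fixed-width histogram over an agreed value range;
    // counts merge by simple addition, which is what makes this distributable.
    static long[] shardHistogram(double[] values, double min, double max, int buckets) {
        long[] counts = new long[buckets];
        double width = (max - min) / buckets;
        for (double v : values) {
            int b = (int) ((v - min) / width);
            if (b == buckets) b--;            // clamp v == max into the last bucket
            counts[b]++;
        }
        return counts;
    }

    // Merge per-shard histograms and walk the cumulative counts to estimate
    // the requested percentile (returns the upper edge of the target bucket).
    static double percentile(List<long[]> shards, double min, double max, double pct) {
        int buckets = shards.get(0).length;
        long[] merged = new long[buckets];
        long total = 0;
        for (long[] h : shards)
            for (int i = 0; i < buckets; i++) { merged[i] += h[i]; total += h[i]; }
        long target = (long) Math.ceil(pct / 100.0 * total), seen = 0;
        double width = (max - min) / buckets;
        for (int i = 0; i < buckets; i++) {
            seen += merged[i];
            if (seen >= target) return min + (i + 1) * width;
        }
        return max;
    }

    public static void main(String[] args) {
        long[] a = shardHistogram(new double[]{1, 2, 3, 4, 5}, 0, 10, 10);
        long[] b = shardHistogram(new double[]{6, 7, 8, 9, 10}, 0, 10, 10);
        System.out.println(percentile(Arrays.asList(a, b), 0, 10, 50)); // estimated median
    }
}
```

Accuracy is bounded by the bucket width, which is why range-faceting-style buckets keep the statistic cheap to merge across many cores.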

-Trey


On Tue, Feb 11, 2014 at 2:24 PM, Steve Molloy smol...@opentext.com wrote:

 Trying to make sense of all issues around this and not sure which way to
 go. Both Stats and Analytics component are missing some features I would
 need. Stats cannot limit or order facets for instance, and I'd like to see
 pivot support. On the other end Analytics doesn't support distribution at
 all, which is a must in my case.

 So, I guess what I'm trying to ask is whether I should look at extending
 Stats or Analytics? Which way is the community going for future releases?
 (Would share any extension, but that would be useless if done on the wrong
 component).

 Thanks,
 Steve

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting

2014-02-07 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894547#comment-13894547
 ] 

Trey Grainger commented on SOLR-2894:
-

FYI, the last distributed pivot facet patch functionally works, but there are 
some sub-optimal data structures being used and some unnecessary duplicate 
processing of values.  As a result, we found that for certain worst-case 
scenarios (i.e. data is not randomly distributed across Solr cores and requires 
significant refinement) pivot facets with multiple levels could take over a 
minute to aggregate and process results. This was using a dataset of several 
hundred million documents and dozens of pivot facets across 120 Solr cores 
distributed over 20 servers, so it is a more extreme use-case than most will 
encounter.

Nevertheless, we've refactored the code and data structures and brought the 
processing time from over a minute down to less than a second using the above 
configuration. We plan to post the patch within the next week.

 Implement distributed pivot faceting
 

 Key: SOLR-2894
 URL: https://issues.apache.org/jira/browse/SOLR-2894
 Project: Solr
  Issue Type: Improvement
Reporter: Erik Hatcher
 Fix For: 4.7

 Attachments: SOLR-2894-reworked.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch


 Following up on SOLR-792, pivot faceting currently only supports 
 undistributed mode.  Distributed pivot faceting needs to be implemented.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting

2014-02-07 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895413#comment-13895413
 ] 

Trey Grainger commented on SOLR-2894:
-

Thanks, Yonik. I worked on the architecture and design, but it's really been a 
team effort by several of us at CB. Chris worked with the initial patch, Andrew 
hardened it, and Brett (who will post the next version) focused on the 
soon-to-be-posted performance optimizations. We're deploying the new version to 
production right now to sanity check it before posting the patch, but I think 
the upcoming version will finally be ready for review for committing.

 Implement distributed pivot faceting
 

 Key: SOLR-2894
 URL: https://issues.apache.org/jira/browse/SOLR-2894
 Project: Solr
  Issue Type: Improvement
Reporter: Erik Hatcher
 Fix For: 4.7

 Attachments: SOLR-2894-reworked.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch


 Following up on SOLR-792, pivot faceting currently only supports 
 undistributed mode.  Distributed pivot faceting needs to be implemented.






[jira] [Commented] (SOLR-5027) Field Collapsing PostFilter

2013-12-04 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13839665#comment-13839665
 ] 

Trey Grainger commented on SOLR-5027:
-

Interesting.  I've been playing around with the Collapsing QParser and, because 
of the reason Gabe mentioned, I can think of very few use cases for it in its 
current implementation.  Specifically, because there is no way to break a tie 
between multiple documents with the same value (the way sorting does), a search 
that is sorted by score desc, modifieddt desc (newer documents break the tie) 
is not possible... it just collapses based upon the first document in the index 
with the duplicate score.  Many of my use cases are even trickier... something 
like sort by displaypriority desc, score desc, modifieddt desc.

Just brainstorming here, but if sorting documents before collapsing is not 
possible (due to where in the code stack the collapsing occurs), then it might 
be possible to just implement a sort function (ValueSource) that gave an 
ordinal score to each document based upon the position it would occur within 
all documents.  If I understand what you mean when you say group head 
selection based upon the min/max of the function, then this would effectively 
allow collapsing sorted values, because the sort function would return higher 
values for documents which would sort higher.  In that case, the sort function 
(which could read in the current sort parameter from the search request) could 
even be the default used by collapsing, since that is probably what users are 
expecting to happen (this is consistent with how grouping works, for example).

Thoughts?
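To sketch that brainstorm concretely (illustrative only -- the Doc record and field names below are made up to mirror the displaypriority/modifieddt example, this is not Solr code): precompute each document's rank under the request's sort, then group-head selection by min of that rank picks the same doc a full sort would have ranked first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SortOrdinalSketch {
    // Hypothetical doc: id plus the two sort fields from the example
    // ("displaypriority desc, modifieddt desc").
    record Doc(int id, int displayPriority, long modifiedDt) {}

    // Assign each doc an ordinal matching its position under the sort, so
    // that min(ordinal) within a collapse group selects the sort's top doc.
    static Map<Integer, Integer> sortOrdinals(List<Doc> docs) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingInt(Doc::displayPriority).reversed()
                .thenComparing(Comparator.comparingLong(Doc::modifiedDt).reversed()));
        Map<Integer, Integer> ordinal = new HashMap<>();
        for (int rank = 0; rank < sorted.size(); rank++)
            ordinal.put(sorted.get(rank).id(), rank);
        return ordinal;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(new Doc(1, 5, 100L), new Doc(2, 5, 200L), new Doc(3, 9, 50L));
        System.out.println(sortOrdinals(docs)); // doc 3 ranks first, then 2, then 1
    }
}
```

The cost concern raised in the follow-up comment is visible here: building the ordinals requires sorting every document up front before any per-doc value can be served.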

 Field Collapsing PostFilter
 ---

 Key: SOLR-5027
 URL: https://issues.apache.org/jira/browse/SOLR-5027
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 5.0
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 4.6, 5.0

 Attachments: SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, 
 SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, 
 SOLR-5027.patch, SOLR-5027.patch


 This ticket introduces the *CollapsingQParserPlugin* 
 The *CollapsingQParserPlugin* is a PostFilter that performs field collapsing. 
 This is a high performance alternative to standard Solr field collapsing 
 (with *ngroups*) when the number of distinct groups in the result set is high.
 For example in one performance test, a search with 10 million full results 
 and 1 million collapsed groups:
 Standard grouping with ngroups : 17 seconds.
 CollapsingQParserPlugin: 300 milli-seconds.
 Sample syntax:
 Collapse based on the highest scoring document:
 {code}
 fq={!collapse field=field_name}
 {code}
 Collapse based on the min value of a numeric field:
 {code}
 fq={!collapse field=field_name min=field_name}
 {code}
 Collapse based on the max value of a numeric field:
 {code}
 fq={!collapse field=field_name max=field_name}
 {code}
 Collapse with a null policy:
 {code}
 fq={!collapse field=field_name nullPolicy=null_policy}
 {code}
 There are three null policies:
 ignore : removes docs with a null value in the collapse field (default).
 expand : treats each doc with a null value in the collapse field as a 
 separate group.
 collapse : collapses all docs with a null value into a single group using 
 either highest score, or min/max.
 The CollapsingQParserPlugin also fully supports the QueryElevationComponent.
 *Note:* The July 16 patch also includes an ExpandComponent that expands the 
 collapsed groups for the current search result page. This functionality will 
 be moved to its own ticket.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5027) Field Collapsing PostFilter

2013-12-04 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13839754#comment-13839754
 ] 

Trey Grainger commented on SOLR-5027:
-

Thinking about this more, it's probably going to be hard to implement an 
efficient sort ValueSource, as it would probably have to loop through all 
docs in the index during construction and sort them, caching the sort order for 
all docs so that it is available later when the value for each document is 
asked for separately.

It would probably functionally work, but it seems like there's got to be a 
better way in the Collapse QParser itself...

 Field Collapsing PostFilter
 ---

 Key: SOLR-5027
 URL: https://issues.apache.org/jira/browse/SOLR-5027
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 5.0
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 4.6, 5.0

 Attachments: SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, 
 SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, 
 SOLR-5027.patch, SOLR-5027.patch


 This ticket introduces the *CollapsingQParserPlugin* 
 The *CollapsingQParserPlugin* is a PostFilter that performs field collapsing. 
 This is a high performance alternative to standard Solr field collapsing 
 (with *ngroups*) when the number of distinct groups in the result set is high.
 For example in one performance test, a search with 10 million full results 
 and 1 million collapsed groups:
 Standard grouping with ngroups : 17 seconds.
 CollapsingQParserPlugin: 300 milli-seconds.
 Sample syntax:
 Collapse based on the highest scoring document:
 {code}
 fq={!collapse field=field_name}
 {code}
 Collapse based on the min value of a numeric field:
 {code}
 fq={!collapse field=field_name min=field_name}
 {code}
 Collapse based on the max value of a numeric field:
 {code}
 fq={!collapse field=field_name max=field_name}
 {code}
 Collapse with a null policy:
 {code}
 fq={!collapse field=field_name nullPolicy=null_policy}
 {code}
 There are three null policies:
 ignore : removes docs with a null value in the collapse field (default).
 expand : treats each doc with a null value in the collapse field as a 
 separate group.
 collapse : collapses all docs with a null value into a single group using 
 either highest score, or min/max.
 The CollapsingQParserPlugin also fully supports the QueryElevationComponent.
 *Note:* The July 16 patch also includes an ExpandComponent that expands the 
 collapsed groups for the current search result page. This functionality will 
 be moved to its own ticket.






[jira] [Created] (SOLR-5524) Exception when using Query Function inside Scale Function

2013-12-02 Thread Trey Grainger (JIRA)
Trey Grainger created SOLR-5524:
---

 Summary: Exception when using Query Function inside Scale Function
 Key: SOLR-5524
 URL: https://issues.apache.org/jira/browse/SOLR-5524
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.6
Reporter: Trey Grainger
Priority: Minor
 Fix For: 4.7


If you try to use the query function inside the scale function, it throws the 
following exception:
org.apache.lucene.search.BooleanQuery$BooleanWeight cannot be cast to
org.apache.lucene.queries.function.valuesource.ScaleFloatFunction$ScaleInfo

Here is an example request that invokes this:
http://localhost:8983/solr/collection1/select?q=*:*&fl=scale(query($x),0,5)&x=hello






[jira] [Commented] (SOLR-5524) Exception when using Query Function inside Scale Function

2013-12-02 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837319#comment-13837319
 ] 

Trey Grainger commented on SOLR-5524:
-

I just debugged the code and uncovered the problem.  There is a Map (called 
context) that is passed through to each value source to store intermediate 
state, and both the query and scale functions are passing the ValueSource for 
the query function in as the KEY to this Map (as opposed to using some 
composite key that makes sense in the current context).  Essentially, these 
lines are overwriting each other:

Inside ScaleFloatFunction: context.put(this.source, scaleInfo);  //this.source 
refers to the QueryValueSource, and the scaleInfo refers to a ScaleInfo object
Inside QueryValueSource: context.put(this, w); //this refers to the same 
QueryValueSource from above, and the w refers to a Weight object

As such, when the ScaleFloatFunction later goes to read the ScaleInfo from the 
context Map, it unexpectedly pulls the Weight object out instead, and thus the 
invalid cast exception occurs.  The NoOp multiplication works because it puts 
a different ValueSource between the query and the ScaleFloatFunction such 
that this.source (in ScaleFloatFunction) != this (in QueryValueSource).
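A toy sketch of the collision described above (a plain HashMap and placeholder strings standing in for the real context Map, ScaleInfo, and Weight objects -- this is not the actual Lucene code):

```java
import java.util.HashMap;
import java.util.Map;

public class ContextKeyCollision {
    // Both functions key their context state on the SAME object
    // (the shared QueryValueSource stand-in), so the second put wins.
    static Object buggyLookup() {
        Map<Object, Object> context = new HashMap<>();
        Object queryValueSource = new Object();

        // ScaleFloatFunction stores its ScaleInfo keyed on this.source...
        context.put(queryValueSource, "ScaleInfo(min=0,max=5)");
        // ...then QueryValueSource stores its Weight keyed on `this`, which
        // is the same object, silently overwriting the ScaleInfo entry.
        context.put(queryValueSource, "BooleanWeight");

        // ScaleFloatFunction now reads back a Weight where it expected a
        // ScaleInfo -- the cast exception reported in this issue.
        return context.get(queryValueSource);
    }

    public static void main(String[] args) {
        System.out.println(buggyLookup()); // the Weight, not the ScaleInfo
    }
}
```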

 Exception when using Query Function inside Scale Function
 -

 Key: SOLR-5524
 URL: https://issues.apache.org/jira/browse/SOLR-5524
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.6
Reporter: Trey Grainger
Priority: Minor
 Fix For: 4.7


 If you try to use the query function inside the scale function, it throws the 
 following exception:
 org.apache.lucene.search.BooleanQuery$BooleanWeight cannot be cast to
 org.apache.lucene.queries.function.valuesource.ScaleFloatFunction$ScaleInfo
 Here is an example request that invokes this:
 http://localhost:8983/solr/collection1/select?q=*:*&fl=scale(query($x),0,5)&x=hello






[jira] [Updated] (SOLR-5524) Exception when using Query Function inside Scale Function

2013-12-02 Thread Trey Grainger (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trey Grainger updated SOLR-5524:


Attachment: SOLR-5524.patch

Simple patch.  Just changing the ScaleFloatFunction to use itself as the key 
instead of the ValueSource it is using internally (its first parameter).  This 
seems consistent with how other ValueSources (such as the QueryValueSource) 
work, and it fixes the issue at hand.
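For illustration, a toy sketch of the keying convention the patch adopts (placeholder objects, not the actual Lucene code): each function keys its context entry on itself, so the two entries can no longer collide.

```java
import java.util.HashMap;
import java.util.Map;

public class ContextKeyFix {
    // Patched convention: ScaleFloatFunction keys its state on ITSELF,
    // matching what QueryValueSource already does with `this`.
    static Object fixedLookup() {
        Map<Object, Object> context = new HashMap<>();
        Object queryValueSource = new Object();
        Object scaleFloatFunction = new Object();

        context.put(scaleFloatFunction, "ScaleInfo(min=0,max=5)"); // keyed on the scale fn
        context.put(queryValueSource, "BooleanWeight");            // keyed on the query source

        return context.get(scaleFloatFunction); // the ScaleInfo survives
    }

    public static void main(String[] args) {
        System.out.println(fixedLookup());
    }
}
```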

 Exception when using Query Function inside Scale Function
 -

 Key: SOLR-5524
 URL: https://issues.apache.org/jira/browse/SOLR-5524
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.6
Reporter: Trey Grainger
Priority: Minor
 Fix For: 4.7

 Attachments: SOLR-5524.patch


 If you try to use the query function inside the scale function, it throws the 
 following exception:
 org.apache.lucene.search.BooleanQuery$BooleanWeight cannot be cast to
 org.apache.lucene.queries.function.valuesource.ScaleFloatFunction$ScaleInfo
 Here is an example request that invokes this:
 http://localhost:8983/solr/collection1/select?q=*:*&fl=scale(query($x),0,5)&x=hello






[jira] [Commented] (SOLR-4478) Allow cores to specify a named config set in non-SolrCloud mode

2013-10-05 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787277#comment-13787277
 ] 

Trey Grainger commented on SOLR-4478:
-

(moving this from my previous e-mail to the solr-dev mailing list)

There are two use-cases that appear broken with the new core auto-discovery 
mechanism:

1) *The Core Admin Handler's CREATE command no longer works to create brand new 
cores* 
(unless you have logged onto the box and created the core's directory structure 
manually, which largely defeats the purpose of the CREATE command).  With the 
old Solr.xml format, we could spin up as many cores as we wanted dynamically 
with the following command:
http://localhost:8983/solr/admin/cores?action=CREATE&name=newCore1&instanceDir=collection1&dataDir=newCore1/data
...
http://localhost:8983/solr/admin/cores?action=CREATE&name=newCoreN&instanceDir=collection1&dataDir=newCoreN/data

In the new core discovery mode, this exception is now thrown:
Error CREATEing SolrCore 'newCore1': Could not create a new core in 
solr/collection1/ as another core is already defined there

The exception is being intentionally thrown in CorePropertiesLocator.java 
because a core.properties file already exists in solr/collection1 (and only one 
can exist per directory).


2) *Having a shared configuration directory (instanceDir) across many cores no 
longer works.*  
Every core has to have its own conf/ directory, and this doesn't seem to be 
overridable any longer.  Previously, it was possible to have many cores share 
the same instanceDir (and just override their dataDir for obvious reasons).  
Now, it is necessary to copy and paste identical config files for each Solr 
core.


I don't know if there's already a current roadmap for fixing this.  I saw 
https://issues.apache.org/jira/browse/SOLR-4478, which suggested replacing 
instanceDir with the ability to specify a named configSet.  This solves problem 
2, but not problem 1 (since you still can't have multiple core.properties files 
in the same folder).  Based on Erick's comments in the JIRA ticket, it also 
sounds like this ticket is dead at the moment.

There is definitely a need to have a shared config directory - whether that is 
through a configSet or an explicit instanceDir doesn't matter to me.  There's also 
a need to be able to dynamically create Solr cores from external systems.  I 
currently can't upgrade to core auto discovery because it doesn't allow dynamic 
core creation.  Does anyone have some thoughts on how to best get these 
features working again under core autodiscovery?  Adding instanceDir to 
core.properties seems like an easy solution, but there must be a desire not to 
do that or it would probably have already been done.

I'm happy to contribute some time to resolving this if there is agreed upon 
path forward.

 Allow cores to specify a named config set in non-SolrCloud mode
 ---

 Key: SOLR-4478
 URL: https://issues.apache.org/jira/browse/SOLR-4478
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.2, 5.0
Reporter: Erick Erickson
 Attachments: SOLR-4478.patch, SOLR-4478.patch


 Part of moving forward to the new way, after SOLR-4196 etc... I propose an 
 additional parameter specified on the core node in solr.xml or as a 
 parameter in the discovery mode core.properties file, call it configSet, 
 where the value provided is a path to a directory, either absolute or 
 relative. Really, this is as though you copied the conf directory somewhere 
 to be used by more than one core.
 Straw-man: There will be a directory solr_home/configsets which will be the 
 default. If the configSet parameter is, say, myconf, then I'd expect a 
 directory named myconf to exist in solr_home/configsets, which would look 
 something like
 solr_home/configsets/myconf/schema.xml
   solrconfig.xml
   stopwords.txt
   velocity
   velocity/query.vm
 etc.
 If multiple cores used the same configSet, schema, solrconfig etc. would all 
 be shared (i.e. shareSchema=true would be assumed). I don't see a good 
 use-case for _not_ sharing schemas, so I don't propose to allow this to be 
 turned off. Hmmm, what if shareSchema is explicitly set to false in the 
 solr.xml or properties file? I'd guess it should be honored but maybe log a 
 warning?
 Mostly I'm putting this up for comments. I know that there are already 
 thoughts about how this all should work floating around, so before I start 
 any work on this I thought I'd at least get an idea of whether this is the 
 way people are thinking about going.
 Configset can be either a relative or absolute path, if relative it's assumed 
 to be relative to solr_home.
 Thoughts?

[jira] [Commented] (SOLR-4478) Allow cores to specify a named config set in non-SolrCloud mode

2013-10-05 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787278#comment-13787278
 ] 

Trey Grainger commented on SOLR-4478:
-

(Erick's response to my post)

Right, let's move this discussion to SOLR-4779. There's some history
here. Sharing named config sets got a bit wrapped up in sharing the
underlying solrconfig object. This latter has been taken off the
table, but we should discuss fixing Trey's issues up. Here's what the
thinking was:
There would be a directory like solr_home/configs/configset1,
solr_home/configs/configset2, etc. Then a new parameter for
core.properties or create or whatever like configset=configset1 that
would be smart enough to look in solr_home/configs for an entire
conf directory named configset1.

Trey:
Does that work for your case? If so, please add your comments to 4779
and we can take it from there. FWIW, I don't think this is especially
hard, but time is always at a premium.

 Allow cores to specify a named config set in non-SolrCloud mode
 ---

 Key: SOLR-4478
 URL: https://issues.apache.org/jira/browse/SOLR-4478
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.2, 5.0
Reporter: Erick Erickson
 Attachments: SOLR-4478.patch, SOLR-4478.patch


 Part of moving forward to the new way, after SOLR-4196 etc... I propose an 
 additional parameter specified on the core node in solr.xml or as a 
 parameter in the discovery mode core.properties file, call it configSet, 
 where the value provided is a path to a directory, either absolute or 
 relative. Really, this is as though you copied the conf directory somewhere 
 to be used by more than one core.
 Straw-man: There will be a directory solr_home/configsets which will be the 
 default. If the configSet parameter is, say, myconf, then I'd expect a 
 directory named myconf to exist in solr_home/configsets, which would look 
 something like
 solr_home/configsets/myconf/schema.xml
   solrconfig.xml
   stopwords.txt
   velocity
   velocity/query.vm
 etc.
 If multiple cores used the same configSet, schema, solrconfig etc. would all 
 be shared (i.e. shareSchema=true would be assumed). I don't see a good 
 use-case for _not_ sharing schemas, so I don't propose to allow this to be 
 turned off. Hmmm, what if shareSchema is explicitly set to false in the 
 solr.xml or properties file? I'd guess it should be honored but maybe log a 
 warning?
 Mostly I'm putting this up for comments. I know that there are already 
 thoughts about how this all should work floating around, so before I start 
 any work on this I thought I'd at least get an idea of whether this is the 
 way people are thinking about going.
 Configset can be either a relative or absolute path, if relative it's assumed 
 to be relative to solr_home.
 Thoughts?
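As a concrete sketch of the straw-man above (paths are illustrative, and the configSet key is the proposal under discussion here, not an existing property):

```properties
# solr_home/cores/core1/core.properties -- illustrative path
name=core1
dataDir=data
# A relative value would be resolved against solr_home/configsets/
configSet=myconf
```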



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-4478) Allow cores to specify a named config set in non-SolrCloud mode

2013-10-05 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787278#comment-13787278
 ] 

Trey Grainger edited comment on SOLR-4478 at 10/5/13 5:47 PM:
--

(Erick's response to my post)

Right, let's move this discussion to SOLR-4779. There's some history
here. Sharing named config sets got a bit wrapped up in sharing the
underlying solrconfig object. This latter has been taken off the
table, but we should discuss fixing Trey's issues up. Here's what the
thinking was:
There would be a directory like solr_home/configs/configset1,
solr_home/configs/configset2, etc. Then a new parameter for
core.properties or create or whatever like configset=configset1 that
would be smart enough to look in solr_home/configs for an entire
conf directory named configset1.

Trey:
Does that work for your case? If so, please add your comments to 4779
and we can take it from there. FWIW, I don't think this is especially
hard, but time is always at a premium.


was (Author: solrtrey):
(Eric's response to my post)

Right, let's move this discussion to SOLR-4779. There's some history
here. Sharing named config sets got a bit wrapped up in sharing the
underlying solrconfig object. This latter has been taken off the
table, but we should discuss fixing Trey's issues up. Here's what the
thinking was:
There would be a directory like solr_home/configs/configset1,
solr_home/configs/configset2, etc. Then a new parameter for
core.properties or create or whatever like configset=configset1 that
would be smart enough to look in solr_home/configs for an entire
conf directory named configset1.

Trey:
Does that work for your case? If so, please add your comments to 4779
and we can take it from there. FWIW, I don't think this is especially
hard, but time is always at a premium.







[jira] [Comment Edited] (SOLR-4478) Allow cores to specify a named config set in non-SolrCloud mode

2013-10-05 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787282#comment-13787282
 ] 

Trey Grainger edited comment on SOLR-4478 at 10/5/13 5:50 PM:
--

Hi Erick,

Yes, that resolves the harder of the two problems.  The other issue is that 
since a dedicated folder is now required per-core (to hold the core.properties 
file), the core _CREATE_ command now also needs to be able to create the folder 
for the new core if it doesn't exist.  Something like:
http://localhost:8983/solr/admin/cores?action=CREATE&name=newCore&
*coreDir=cores/newCore* &configset=sharedconfig

Alternatively, _instanceDir_ could continue to serve that function (instead of 
being deprecated):
http://localhost:8983/solr/admin/cores?action=CREATE&name=newCore&
*instanceDir=cores/newCore* &configset=sharedconfig

I think the combination of adding configSet and adding the ability for the 
CREATE command to actually create the new folder to hold core.properties should 
handle the use case.


was (Author: solrtrey):
Hi Erick,

Yes, that resolves the hardest of the two problems.  The other issue is that 
since a dedicated folder is now required per-core (to hold the core.properties 
file), the core _CREATE_ command needs to now also be able to create the folder 
for the new core if it doesn't exist.  Something like:
http://localhost:8983/solr/admin/cores?action=CREATE&name=newCore&*coreDir=cores/newCore*&configset=sharedconfig

Alternatively, _instanceDir_ could continue to serve that function (instead of 
being deprecated):
http://localhost:8983/solr/admin/cores?action=CREATE&name=newCore&*instanceDir=cores/newCore*&configset=sharedconfig

I think the combination of adding configSet and adding the ability for the 
CREATE command to actually create the new folder to hold core.properties should 
handle the use case.







[jira] [Commented] (SOLR-4478) Allow cores to specify a named config set in non-SolrCloud mode

2013-10-05 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787282#comment-13787282
 ] 

Trey Grainger commented on SOLR-4478:
-

Hi Erick,

Yes, that resolves the harder of the two problems.  The other issue is that 
since a dedicated folder is now required per-core (to hold the core.properties 
file), the core _CREATE_ command now also needs to be able to create the folder 
for the new core if it doesn't exist.  Something like:
http://localhost:8983/solr/admin/cores?action=CREATE&name=newCore&*coreDir=cores/newCore*&configset=sharedconfig

Alternatively, _instanceDir_ could continue to serve that function (instead of 
being deprecated):
http://localhost:8983/solr/admin/cores?action=CREATE&name=newCore&*instanceDir=cores/newCore*&configset=sharedconfig

I think the combination of adding configSet and adding the ability for the 
CREATE command to actually create the new folder to hold core.properties should 
handle the use case.
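Restated as a runnable sketch: building the proposed call programmatically keeps the ampersands intact. Note that coreDir and configset are the parameters proposed in this thread, not a shipped API, so this only illustrates the shape of the request.

```python
from urllib.parse import urlencode

# Sketch only: "coreDir" and "configset" are the parameters proposed in this
# thread; Solr is not guaranteed to accept them as-is.
params = {
    "action": "CREATE",
    "name": "newCore",
    "coreDir": "cores/newCore",
    "configset": "sharedconfig",
}
# urlencode joins the pairs with "&" and percent-encodes reserved characters.
url = "http://localhost:8983/solr/admin/cores?" + urlencode(params)
print(url)
```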







Roadmap for fixing features broken by core autodiscovery

2013-10-04 Thread Trey Grainger
There are two use-cases that appear broken with the new core auto-discovery
mechanism:

*1) The Core Admin Handler's CREATE command no longer works to create brand
new cores*
(unless you have logged on the box and created the core's directory
structure manually, which largely defeats the purpose of the CREATE
command).  With the old Solr.xml format, we could spin up as many cores as
we wanted to dynamically with the following command:
http://localhost:8983/solr/admin/cores?action=CREATE&name=newCore1&
instanceDir=collection1&dataDir=newCore1/data
...
http://localhost:8983/solr/admin/cores?action=CREATE&name=newCoreN&
instanceDir=collection1&dataDir=newCoreN/data

In the new core discovery mode, this exception is now thrown:
Error CREATEing SolrCore 'newCore1': Could not create a new core in
solr/collection1/ as another core is already defined there

The exception is being intentionally thrown in CorePropertiesLocator.java
because a core.properties file already exists in solr/collection1 (and only
one can exist per directory).


*2) Having a shared configuration directory (instanceDir) across many cores
no longer works*.
Every core has to have its own conf/ directory, and this doesn't seem to
be overridable any longer.  Previously, it was possible to have many cores
share the same instanceDir (and just override their dataDir for obvious
reasons).  Now, it is necessary to copy and paste identical config files
for each Solr core.


I don't know if there's already a current roadmap for fixing this.  I saw
https://issues.apache.org/jira/browse/SOLR-4478, which suggested replacing
instanceDir with the ability to specify a named configSet.  This solves
problem 2, but not problem 1 (since you still can't have multiple
core.properties files in the same folder).  Based on Erick's comments in
the JIRA ticket, it also sounds like this ticket is dead at the moment.

There is definitely a need to have a shared config directory - whether that
is through a configSet or an explicit instanceDir doesn't matter to me.
 There's also a need to be able to dynamically create Solr cores from
external systems.  I currently can't upgrade to core auto discovery because
it doesn't allow dynamic core creation.  Does anyone have some thoughts on
how to best get these features working again under core autodiscovery?
 Adding instanceDir to core.properties seems like an easy solution, but
there must be a desire not to do that or it would probably have already
been done.
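Until CREATE can make the directory itself, the only way through appears to be pre-creating the per-core folder and its core.properties by hand before calling the CoreAdmin API. A minimal sketch of that workaround (paths and property values are illustrative):

```python
from pathlib import Path
import tempfile

# Stand-in for the real solr_home; in practice this is the running Solr's home.
solr_home = Path(tempfile.mkdtemp())

# Pre-create the per-core directory and core.properties by hand, since
# auto-discovery refuses to CREATE into a location without doing this.
core_dir = solr_home / "newCore1"
core_dir.mkdir(parents=True)
(core_dir / "core.properties").write_text(
    "name=newCore1\n"
    "dataDir=data\n"
)
print((core_dir / "core.properties").read_text())
```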

I'm happy to contribute some time to resolving this if there is an agreed-upon
path forward.


Thanks,

-Trey


[jira] [Created] (SOLR-5052) eDisMax Field Aliasing behaving oddly when invalid field is present

2013-07-20 Thread Trey Grainger (JIRA)
Trey Grainger created SOLR-5052:
---

 Summary: eDisMax Field Aliasing behaving oddly when invalid field 
is present
 Key: SOLR-5052
 URL: https://issues.apache.org/jira/browse/SOLR-5052
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 4.3.1
 Environment: AWS / Ubuntu
Reporter: Trey Grainger
Priority: Minor
 Fix For: 4.5


Field Aliasing for the eDisMax query parser behaves in a very odd manner if an 
invalid field is specified in any of the aliases.  Essentially, instead of 
throwing an exception on an invalid alias, it breaks all of the other aliased 
fields such that they will only handle the first term correctly.  Take the 
following example:

/select?defType=edismax&f.who.qf=personLastName_t^30 
personFirstName_t^10&f.what.qf=itemName_t 
companyName_t^5&f.where.qf=cityName_t^10 INVALIDFIELDNAME^20 countryName_t^35 
postalCodeName_t^30&q=who:(trey grainger) what:(solr) where:(atlanta, 
ga)&debugQuery=true&df=text

The terms "trey", "solr", and "atlanta" correctly search across the aliased 
fields, but the terms "grainger" and "ga" are incorrectly being searched across 
the default field (text).  Here is the parsed query from the debug output:

<lst name="debug">
<str name="rawquerystring">
who:(trey grainger) what:(solr) where:(decatur, ga)
</str>
<str name="querystring">
who:(trey grainger) what:(solr) where:(decatur, ga)
</str>
<str name="parsedquery">
(+(DisjunctionMaxQuery((personFirstName_t:trey^10.0 | 
personLastName_t:trey^30.0)) DisjunctionMaxQuery((text:grainger)) 
DisjunctionMaxQuery((itemName_t:solr | companyName_t:solr^5.0)) 
DisjunctionMaxQuery((postalCodeName_t:decatur^30.0 | countryName_t:decatur^35.0 
| cityName_t:decatur^10.0)) DisjunctionMaxQuery((text:ga))))/no_coord
</str>
<str name="parsedquery_toString">
+((personFirstName_t:trey^10.0 | personLastName_t:trey^30.0) (text:grainger) 
(itemName_t:solr | companyName_t:solr^5.0) (postalCodeName_t:decatur^30.0 | 
countryName_t:decatur^35.0 | cityName_t:decatur^10.0) (text:ga))
</str>
</lst>

I think the presence of an invalid field in a qf parameter should throw an 
exception (or throw the invalid field away in that alias), but it shouldn't 
break the aliases for other fields.  

For the record, if there are no invalid fields in any of the aliases, all of 
the aliases work.  If there is one invalid field in any of the aliases, all of 
the aliases act oddly like this.
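For comparison, the same request with the invalid field removed can be built programmatically; field names and boosts below are copied from the report, and this sketch only shows the properly delimited query string, not a fix for the parser.

```python
from urllib.parse import urlencode

# The same eDisMax aliasing request, minus INVALIDFIELDNAME in f.where.qf.
params = {
    "defType": "edismax",
    "f.who.qf": "personLastName_t^30 personFirstName_t^10",
    "f.what.qf": "itemName_t companyName_t^5",
    "f.where.qf": "cityName_t^10 countryName_t^35 postalCodeName_t^30",
    "q": "who:(trey grainger) what:(solr) where:(atlanta, ga)",
    "debugQuery": "true",
    "df": "text",
}
query_string = urlencode(params)
print(query_string)
```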

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting

2013-07-15 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13709360#comment-13709360
 ] 

Trey Grainger commented on SOLR-2894:
-

@[~Otis], we have this patch live in production for several use cases (as a 
pre-requisite for SOLR-3583, which we've also worked on @CareerBuilder), but 
the currently known issues which would prevent this from being committed 
include:
1) Tags and Excludes are not being respected beyond the first level
2) The facet.limit=-1 issue (not returning all values)
3) The lack of support for datetimes

We need #1, and Andrew is currently working on a project to fix it.  He's also 
looking to fix #3 and find a reasonably scalable solution to #2.  I'm not sure 
when the Solr 4.4 vote is going to be, but it'll probably be a few more weeks 
until this patch is all wrapped up.

Meanwhile, if anyone else finds any issues with the patch, please let us know 
so they can be looked into.  Thanks!
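Known issue #1 above (tags and excludes) can be exercised with a request along these lines. The tag/ex localparams are standard Solr faceting syntax; whether the exclusion is honored beyond the first pivot level is exactly what the patch still gets wrong, and the field names here are hypothetical:

```python
from urllib.parse import urlencode

# A filter tagged "st" is excluded from the pivot facet; per the comment
# above, the exclusion is only respected at the first pivot level.
params = {
    "q": "*:*",
    "fq": "{!tag=st}state:GA",
    "facet": "true",
    "facet.pivot": "{!ex=st}state,city",
}
query_string = urlencode(params)
print(query_string)
```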

 Implement distributed pivot faceting
 

 Key: SOLR-2894
 URL: https://issues.apache.org/jira/browse/SOLR-2894
 Project: Solr
  Issue Type: Improvement
Reporter: Erik Hatcher
 Fix For: 4.4

 Attachments: SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894-reworked.patch


 Following up on SOLR-792, pivot faceting currently only supports 
 undistributed mode.  Distributed pivot faceting needs to be implemented.




Re: [ANNOUNCE] Solr wiki editing change

2013-03-30 Thread Trey Grainger
Please add TreyGrainger to the the contributors group.  Thanks!

-Trey


On Sun, Mar 24, 2013 at 11:18 PM, Steve Rowe sar...@gmail.com wrote:

 The wiki at http://wiki.apache.org/solr/ has come under attack by
 spammers more frequently of late, so the PMC has decided to lock it down in
 an attempt to reduce the work involved in tracking and removing spam.

 From now on, only people who appear on
 http://wiki.apache.org/solr/ContributorsGroup will be able to
 create/modify/delete wiki pages.

 Please request either on the solr-u...@lucene.apache.org or on
 dev@lucene.apache.org to have your wiki username added to the
 ContributorsGroup page - this is a one-time step.

 Steve




[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting

2012-07-05 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407459#comment-13407459
 ] 

Trey Grainger commented on SOLR-2894:
-

Hi Erik,

Sorry, I missed your original message asking me if I could test out the latest 
patch - I'd be happy to help.  I just tried both your patch and the April 25th 
patch against the Solr 4.0 Alpha revision and neither applied immediately.  
I'll see if I can find some time on Sunday to try to get a revision sorted out 
which will work with the current version.

I think there are some changes in the April 24th patch which may need to be 
re-applied if your changes were based upon the earlier patch.  I'll know more 
once I've had a chance to dig in later this weekend.

Thanks,

-Trey

 Implement distributed pivot faceting
 

 Key: SOLR-2894
 URL: https://issues.apache.org/jira/browse/SOLR-2894
 Project: Solr
  Issue Type: Improvement
Reporter: Erik Hatcher
Assignee: Erik Hatcher
 Fix For: 4.0

 Attachments: SOLR-2894.patch, SOLR-2894.patch, 
 distributed_pivot.patch, distributed_pivot.patch


 Following up on SOLR-792, pivot faceting currently only supports 
 undistributed mode.  Distributed pivot faceting needs to be implemented.




[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting

2012-06-13 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294795#comment-13294795
 ] 

Trey Grainger commented on SOLR-2894:
-

For what it's worth, we're actively using the April 25th version of this patch 
in production at CareerBuilder (with an older version of trunk) with no issues.

 Implement distributed pivot faceting
 

 Key: SOLR-2894
 URL: https://issues.apache.org/jira/browse/SOLR-2894
 Project: Solr
  Issue Type: Improvement
Reporter: Erik Hatcher
Assignee: Erik Hatcher
 Fix For: 4.0

 Attachments: SOLR-2894.patch, distributed_pivot.patch, 
 distributed_pivot.patch


 Following up on SOLR-792, pivot faceting currently only supports 
 undistributed mode.  Distributed pivot faceting needs to be implemented.




[jira] [Commented] (SOLR-2614) stats with pivot

2012-06-11 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293249#comment-13293249
 ] 

Trey Grainger commented on SOLR-2614:
-

Hi Terrance,

We (at CareerBuilder) built a patch recently which could serve as a good 
starting point for this.  We built an ability to calculate Percentiles (i.e. 
25th, 50th, etc.) and Averages using multi-level (distributed) Pivot Facets.  
It works well enough for our use cases, and I'm sure the stats types mentioned 
could be added in.

It is dependent upon the distributed pivot faceting patch (SOLR-2894), which 
seems to be working well but has yet to be committed.

I'll see if we can get the patch posted either as part of this JIRA or 
separately in the next day or so, which could save you some time in 
implementing the other types.

-Trey Grainger
CareerBuilder

 stats with pivot
 

 Key: SOLR-2614
 URL: https://issues.apache.org/jira/browse/SOLR-2614
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 4.0
Reporter: pengyao
Priority: Critical
 Fix For: 4.1


  Is it possible to get stats (like Stats Component: min, max, sum, count,
 missing, sumOfSquares, mean and stddev) from numeric fields inside
 hierarchical facets (with more than one level, like Pivot)?
  I would like to query:
 ...?q=*:*&version=2.2&start=0&rows=0&stats=true&stats.field=numeric_field1&stats.field=numeric_field2&stats.pivot=field_x,field_y,field_z
  and get min, max, sum, count, etc. from numeric_field1 and
 numeric_field2 from all combinations of field_x, field_y and field_z
 (hierarchical values).
  Using stats.facet I get just one field at one level and using
 facet.pivot I get just counts, but no stats.
  Looping in the client application to do all combinations of facet values
 will be too slow because there are a lot of combinations.
  Thanks a lot!
 This is very important, because count values alone are sometimes not enough.
 Please add stats with pivot in Solr 4.0.
 Thanks a lot




[jira] [Comment Edited] (SOLR-2614) stats with pivot

2012-06-11 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293249#comment-13293249
 ] 

Trey Grainger edited comment on SOLR-2614 at 6/12/12 1:23 AM:
--

Hi Terrance,

We (at CareerBuilder) built a patch recently which could serve as a good 
starting point for this.  We build an ability to calculate Percentiles (i.e. 
25th, 50th, etc.) and Averages using multi-level (distributed) Pivot Facets.  
It works well enough for our use cases, and I'm sure the stats types mentioned 
could be added in.

It is dependent upon the distributed pivot faceting patch (SOLR-2894), which 
seem to be working well but has yet to be committed.

I'll see if we can get the patch posted either as part of this JIRA or 
separately in the next day or so, which could save you some time in 
implementing the other types.

-Trey Grainger
CareerBuilder

  was (Author: solrtrey):
Hi Terrance,

We (at CareerBuilder) recently built a patch recently which could serve as a 
good starting point for this.  We build an ability to calculate Percentiles 
(i.e. 25th, 50th, etc.) and Averages using multi-level (distributed) Pivot 
Facets.  It works well enough for our use cases, and I'm sure the stats types 
mentioned could be added in.

It is dependent upon the distributed pivot faceting patch (SOLR-2894), which 
seem to be working well but has yet to be committed.

I'll see if we can get the patch posted either as part of this JIRA or 
separately in the next day or so, which could save you some time in 
implementing the other types.

-Trey Grainger
CareerBuilder
  




[jira] [Comment Edited] (SOLR-2614) stats with pivot

2012-06-11 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293249#comment-13293249
 ] 

Trey Grainger edited comment on SOLR-2614 at 6/12/12 1:23 AM:
--

Hi Terrance,

We (at CareerBuilder) built a patch recently which could serve as a good 
starting point for this.  We built an ability to calculate Percentiles (i.e. 
25th, 50th, etc.) and Averages using multi-level (distributed) Pivot Facets.  
It works well enough for our use cases, and I'm sure the stats types mentioned 
could be added in.

It is dependent upon the distributed pivot faceting patch (SOLR-2894), which 
seems to be working well but has yet to be committed.

I'll see if we can get the patch posted either as part of this JIRA or 
separately in the next day or so, which could save you some time in 
implementing the other types.

-Trey Grainger
CareerBuilder

  was (Author: solrtrey):
Hi Terrance,

We (at CareerBuilder) built a patch recently which could serve as a good 
starting point for this.  We build an ability to calculate Percentiles (i.e. 
25th, 50th, etc.) and Averages using multi-level (distributed) Pivot Facets.  
It works well enough for our use cases, and I'm sure the stats types mentioned 
could be added in.

It is dependent upon the distributed pivot faceting patch (SOLR-2894), which 
seem to be working well but has yet to be committed.

I'll see if we can get the patch posted either as part of this JIRA or 
separately in the next day or so, which could save you some time in 
implementing the other types.

-Trey Grainger
CareerBuilder
  
 stats with pivot
 

 Key: SOLR-2614
 URL: https://issues.apache.org/jira/browse/SOLR-2614
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 4.0
Reporter: pengyao
Priority: Critical
 Fix For: 4.1


  Is it possible to get stats (like Stats Component: min, max, sum, count,
 missing, sumOfSquares, mean and stddev) from numeric fields inside
 hierarchical facets (with more than one level, like Pivot)?
  I would like to query:
 ...?q=*:*&version=2.2&start=0&rows=0&stats=true&stats.field=numeric_field1&stats.field=numeric_field2&stats.pivot=field_x,field_y,field_z
  and get min, max, sum, count, etc. from numeric_field1 and
 numeric_field2 from all combinations of field_x, field_y and field_z
 (hierarchical values).
  Using stats.facet I get just one field at one level and using
 facet.pivot I get just counts, but no stats.
  Looping in client application to do all combinations of facets values
 will be to slow because there is a lot of combinations.
  Thanks a lot!
 this is very important, because counts alone are sometimes not enough.
 please add stats with pivot in Solr 4.0.
 thanks a lot
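To illustrate what such a stats.pivot response could contain, here is a small, purely illustrative Python sketch (hypothetical field names and documents, not the CareerBuilder patch or Solr's implementation) that computes per-pivot-combination stats, including a nearest-rank percentile:

```python
from collections import defaultdict

# Toy documents standing in for an indexed collection; field_x / field_y are
# the pivot fields and "price" is the numeric field to compute stats over.
docs = [
    {"field_x": "a", "field_y": "p", "price": 10.0},
    {"field_x": "a", "field_y": "p", "price": 30.0},
    {"field_x": "a", "field_y": "q", "price": 5.0},
    {"field_x": "b", "field_y": "p", "price": 7.0},
]

def percentile(sorted_vals, p):
    """Nearest-rank percentile of a non-empty, already-sorted list
    (one of several common percentile definitions)."""
    rank = max(1, -(-p * len(sorted_vals) // 100))  # ceil(p/100 * n), 1-based
    return sorted_vals[rank - 1]

def pivot_stats(docs, pivot_fields, stats_field):
    """min/max/sum/count/mean/median of stats_field for every combination of
    pivot_fields, i.e. the shape a stats-with-pivot response might take."""
    groups = defaultdict(list)
    for doc in docs:
        groups[tuple(doc[f] for f in pivot_fields)].append(doc[stats_field])
    return {
        key: {
            "min": min(vals),
            "max": max(vals),
            "sum": sum(vals),
            "count": len(vals),
            "mean": sum(vals) / len(vals),
            "p50": percentile(sorted(vals), 50),
        }
        for key, vals in groups.items()
    }

stats = pivot_stats(docs, ["field_x", "field_y"], "price")
print(stats[("a", "p")])
```

A real implementation would of course aggregate inside the engine (and merge partial stats across shards) rather than looping over documents client-side.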




[jira] [Comment Edited] (SOLR-2614) stats with pivot

2012-06-11 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293249#comment-13293249
 ] 

Trey Grainger edited comment on SOLR-2614 at 6/12/12 1:24 AM:
--

Hi Terrance,

We (at CareerBuilder) built a patch recently which could serve as a good 
starting point for this.  We built the ability to calculate Percentiles (i.e. 
25th, 50th, etc.) and Averages using multi-level (distributed) Pivot Facets.  
It works well enough for our use cases, and I'm sure the other stats types 
mentioned could be added in.

It is dependent upon the distributed pivot faceting patch (SOLR-2894), which 
seems to be working well but has yet to be committed.

I'll see if we can get the patch posted either as part of this JIRA or 
separately in the next day or so, which could save you some time in 
implementing the other types.

-Trey Grainger
CareerBuilder

  was (Author: solrtrey):
Hi Terrance,

We (at CareerBuilder) built a patch recently which could serve as a good 
starting point for this.  We built an ability to calculate Percentiles (i.e. 
25th, 50th, etc.) and Averages using multi-level (distributed) Pivot Facets.  
It works well enough for our use cases, and I'm sure the stats types mentioned 
could be added in.

It is dependent upon the distributed pivot faceting patch (SOLR-2894), which 
seem to be working well but has yet to be committed.

I'll see if we can get the patch posted either as part of this JIRA or 
separately in the next day or so, which could save you some time in 
implementing the other types.

-Trey Grainger
CareerBuilder
  
 stats with pivot
 

 Key: SOLR-2614
 URL: https://issues.apache.org/jira/browse/SOLR-2614
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 4.0
Reporter: pengyao
Priority: Critical
 Fix For: 4.1


  Is it possible to get stats (like Stats Component: min, max, sum, count,
 missing, sumOfSquares, mean and stddev) from numeric fields inside
 hierarchical facets (with more than one level, like Pivot)?
  I would like to query:
 ...?q=*:*&version=2.2&start=0&rows=0&stats=true&stats.field=numeric_field1&stats.field=numeric_field2&stats.pivot=field_x,field_y,field_z
  and get min, max, sum, count, etc. from numeric_field1 and
 numeric_field2 from all combinations of field_x, field_y and field_z
 (hierarchical values).
  Using stats.facet I get just one field at one level and using
 facet.pivot I get just counts, but no stats.
  Looping in client application to do all combinations of facets values
 will be to slow because there is a lot of combinations.
  Thanks a lot!
 this is very important, because counts alone are sometimes not enough.
 please add stats with pivot in Solr 4.0.
 thanks a lot




[jira] Commented: (SOLR-1837) Reconstruct a Document (stored fields, indexed fields, payloads)

2010-03-21 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847936#action_12847936
 ] 

Trey Grainger commented on SOLR-1837:
-

Re: bugs in Luke that result in missing terms - I recently fixed one such bug, 
and indeed it was located in the DocReconstructor - if you are aware of others 
then please report them using the Luke issue tracker.

I just pulled down the most recent Luke code, and it does look like that 
recent fix was made to cover the bug I saw.  Unfortunately, the fix results in 
a null ref for me on my index.  I'll open an issue, as it looks like all that's 
needed is an extra null check.

Re: Document reconstruction is a very IO-intensive operation, so I would advise 
against using it on a production system, and also it produces inexact results 
(because analysis is usually a lossy operation).

I hear you about it being IO-intensive.  There are also other admin tools in 
Solr which do similarly intensive operations (the schema browser, for example, 
which generates a list of all fields and a distribution of terms within those 
fields).  The intent of the tool is for one-off debugging, not for any kind of 
automated querying, but I'll try to do some tests to see to what degree this 
tool is affecting our current production systems (I have not seen any 
noticeable effect thus far).

Also, regarding the process being lossy: in this case, that is kind of the 
point of the tool (in my use) - to see what has actually been put into the 
index vs what was in the document sent to the engine.  For example, if I index 
a field with the text "Wi-fi hotspots are a life-saver" (with payloads on parts 
of speech, as well as stemming), I want to be able to see something like:
wi [1] / fi [1] | wifi [1] / hotspot [1] / are [2] / a [3] / life [1] / saver 
[1] | lifesaver [1]

With no payloads, this would simply be
wi / fi | wifi / hotspots | hotspot / are / a / life / saver | lifesaver
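To make the lossy round trip concrete, here is a toy sketch (not Luke's or Solr's actual analysis chain) of recording positioned terms at index time and then reconstructing a readable view, where terms sharing a position are shown as alternatives:

```python
import re

def analyze(text):
    """Toy index-time analysis: lowercase, split hyphenated words (also
    emitting the concatenated form as a same-position synonym), and strip a
    trailing 's' as naive stemming.  Returns (position, term) pairs, as an
    inverted index would record them."""
    tokens, pos = [], 0
    for word in text.lower().split():
        word = re.sub(r"[^\w-]", "", word)
        if "-" in word:
            first, second = word.split("-", 1)
            tokens.append((pos, first))
            pos += 1
            tokens.append((pos, second))
            tokens.append((pos, first + second))  # synonym at the same position
        else:
            tokens.append((pos, word.rstrip("s") or word))
        pos += 1
    return tokens

def reconstruct(tokens):
    """Rebuild a readable view from positioned terms; terms that share a
    position are rendered as alternatives separated by '|'."""
    by_pos = {}
    for pos, term in tokens:
        by_pos.setdefault(pos, []).append(term)
    return " ".join("|".join(terms) for _, terms in sorted(by_pos.items()))

print(reconstruct(analyze("Wi-fi hotspots are a life-saver")))
# wi fi|wifi hotspot are a life saver|lifesaver
```

The reconstructed view shows the analyzed terms, not the original text - exactly the kind of index-side inspection the tool is for.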

So I had initially named the tool the Solr Document Reconstructor, after the 
name you gave to the tool in Luke.  Based on your comments, I think it might be 
less confusing for me to call it something like Document Inspector, since it 
is not truly reconstructing the original document.

I'll try to get what I have pushed up today so you can check it out if you 
want.  Thanks for your great work on that tool!

 Reconstruct a Document (stored fields, indexed fields, payloads)
 

 Key: SOLR-1837
 URL: https://issues.apache.org/jira/browse/SOLR-1837
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis, web gui
Affects Versions: 1.5
 Environment: All
Reporter: Trey Grainger
Priority: Minor
 Fix For: 1.5

   Original Estimate: 168h
  Remaining Estimate: 168h

 One Solr feature I've been sorely in need of is the ability to inspect an 
 index for any particular document.  While the analysis page is good when you 
 have specific content and a specific field/type you want to test the 
 analysis process for, once a document is indexed it is not currently possible 
 to easily see what is actually sitting in the index.
 One can use the Lucene Index Browser (Luke), but this has several limitations 
 (gui only, doesn't understand solr schema, doesn't display many non-text 
 fields in human readable format, doesn't show payloads, some bugs lead to 
 missing terms, exposes features dangerous to use in a production Solr 
 environment, slow or difficult to check from a remote location, etc.).  The 
 document reconstruction feature of Luke provides the base for what can become 
 a much more powerful tool when coupled with Solr's understanding of a schema, 
 however.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1837) Reconstruct a Document (stored fields, indexed fields, payloads)

2010-03-21 Thread Trey Grainger (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trey Grainger updated SOLR-1837:


Attachment: SOLR-1837.patch

Here's what I have thus far.  The only bug I currently know about is that Solr 
multi-valued fields (i.e. <field name="x">value1</field><field 
name="x">value2</field>) currently display as concatenated together instead of 
as an array of separate fields in the stored fields view.

I've referred to the tool in the admin interface as the "Document Inspector" 
instead of "Document Reconstructor" to prevent confusion over 
lost/changed/added terms due to index-time analysis.

Any feedback appreciated.

 Reconstruct a Document (stored fields, indexed fields, payloads)
 

 Key: SOLR-1837
 URL: https://issues.apache.org/jira/browse/SOLR-1837
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis, web gui
Affects Versions: 1.5
 Environment: All
Reporter: Trey Grainger
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1837.patch

   Original Estimate: 168h
  Remaining Estimate: 168h

 One Solr feature I've been sorely in need of is the ability to inspect an 
 index for any particular document.  While the analysis page is good when you 
 have specific content and a specific field/type you want to test the 
 analysis process for, once a document is indexed it is not currently possible 
 to easily see what is actually sitting in the index.
 One can use the Lucene Index Browser (Luke), but this has several limitations 
 (gui only, doesn't understand solr schema, doesn't display many non-text 
 fields in human readable format, doesn't show payloads, some bugs lead to 
 missing terms, exposes features dangerous to use in a production Solr 
 environment, slow or difficult to check from a remote location, etc.).  The 
 document reconstruction feature of Luke provides the base for what can become 
 a much more powerful tool when coupled with Solr's understanding of a schema, 
 however.




[jira] Created: (SOLR-1837) Reconstruct a Document (stored fields, indexed fields, payloads)

2010-03-20 Thread Trey Grainger (JIRA)
Reconstruct a Document (stored fields, indexed fields, payloads)


 Key: SOLR-1837
 URL: https://issues.apache.org/jira/browse/SOLR-1837
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis, web gui
Affects Versions: 1.5
 Environment: All
Reporter: Trey Grainger
Priority: Minor
 Fix For: 1.5


One Solr feature I've been sorely in need of is the ability to inspect an index 
for any particular document.  While the analysis page is good when you have 
specific content and a specific field/type you want to test the analysis 
process for, once a document is indexed it is not currently possible to easily 
see what is actually sitting in the index.

One can use the Lucene Index Browser (Luke), but this has several limitations 
(gui only, doesn't understand solr schema, doesn't display many non-text fields 
in human readable format, doesn't show payloads, some bugs lead to missing 
terms, exposes features dangerous to use in a production Solr environment, slow 
or difficult to check from a remote location, etc.).  The document 
reconstruction feature of Luke provides the base for what can become a much 
more powerful tool when coupled with Solr's understanding of a schema, however.




[jira] Updated: (SOLR-1837) Reconstruct a Document (stored fields, indexed fields, payloads)

2010-03-20 Thread Trey Grainger (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trey Grainger updated SOLR-1837:


Remaining Estimate: 168h  (was: 120h)
 Original Estimate: 168h  (was: 120h)

 Reconstruct a Document (stored fields, indexed fields, payloads)
 

 Key: SOLR-1837
 URL: https://issues.apache.org/jira/browse/SOLR-1837
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis, web gui
Affects Versions: 1.5
 Environment: All
Reporter: Trey Grainger
Priority: Minor
 Fix For: 1.5

   Original Estimate: 168h
  Remaining Estimate: 168h

 One Solr feature I've been sorely in need of is the ability to inspect an 
 index for any particular document.  While the analysis page is good when you 
 have specific content and a specific field/type you want to test the 
 analysis process for, once a document is indexed it is not currently possible 
 to easily see what is actually sitting in the index.
 One can use the Lucene Index Browser (Luke), but this has several limitations 
 (gui only, doesn't understand solr schema, doesn't display many non-text 
 fields in human readable format, doesn't show payloads, some bugs lead to 
 missing terms, exposes features dangerous to use in a production Solr 
 environment, slow or difficult to check from a remote location, etc.).  The 
 document reconstruction feature of Luke provides the base for what can become 
 a much more powerful tool when coupled with Solr's understanding of a schema, 
 however.




[jira] Commented: (SOLR-1837) Reconstruct a Document (stored fields, indexed fields, payloads)

2010-03-20 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847866#action_12847866
 ] 

Trey Grainger commented on SOLR-1837:
-

I've been working on implementing the document reconstruction feature over the 
past week and have created an additional admin page which exposes it.  The 
functionality is essentially a reworking of the Lucene document reconstruction 
functionality in Luke, but with improvements to handle the problems listed in 
the JIRA issue description above.

I'll be pushing up a patch soon and will look forward to any additional 
recommendations after others have had a chance to try it out.

 Reconstruct a Document (stored fields, indexed fields, payloads)
 

 Key: SOLR-1837
 URL: https://issues.apache.org/jira/browse/SOLR-1837
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis, web gui
Affects Versions: 1.5
 Environment: All
Reporter: Trey Grainger
Priority: Minor
 Fix For: 1.5

   Original Estimate: 168h
  Remaining Estimate: 168h

 One Solr feature I've been sorely in need of is the ability to inspect an 
 index for any particular document.  While the analysis page is good when you 
 have specific content and a specific field/type you want to test the 
 analysis process for, once a document is indexed it is not currently possible 
 to easily see what is actually sitting in the index.
 One can use the Lucene Index Browser (Luke), but this has several limitations 
 (gui only, doesn't understand solr schema, doesn't display many non-text 
 fields in human readable format, doesn't show payloads, some bugs lead to 
 missing terms, exposes features dangerous to use in a production Solr 
 environment, slow or difficult to check from a remote location, etc.).  The 
 document reconstruction feature of Luke provides the base for what can become 
 a much more powerful tool when coupled with Solr's understanding of a schema, 
 however.




[jira] Issue Comment Edited: (SOLR-422) one double quote or two double quotes only break search

2009-08-19 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745245#action_12745245
 ] 

Trey Grainger edited comment on SOLR-422 at 8/19/09 5:03 PM:
-

This issue is in the same ballpark as SOLR-874.  Both concern bad parsing of 
fringe cases by the DisMax handler.

  was (Author: tgrainger):
This issue is in the same ballpark as SOLR-878.  Both concern bad parsing 
of fringe cases by the DisMax handler.
  
 one double quote or two double quotes only break search
 ---

 Key: SOLR-422
 URL: https://issues.apache.org/jira/browse/SOLR-422
 Project: Solr
  Issue Type: Bug
  Components: search
Reporter: Doug Daniels
Priority: Minor

 Using Dismax, searching for either one double quote character:
   q="
 or two double quote characters with no text between them:
   q=""
 throws an exception.  Not sure whether this is also the case for other 
 request handlers.
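Until the parser handles these fringe cases, a hypothetical client-side workaround is to balance the quotes before the string reaches DisMax; this sketch assumes nothing about Solr's APIs:

```python
def balance_quotes(q):
    """Drop empty quoted phrases and any trailing unmatched double quote so
    fringe inputs like a lone '"' or an empty '""' never reach the parser."""
    q = q.replace('""', '')           # an empty phrase contributes nothing
    if q.count('"') % 2 == 1:         # odd quote count: remove the last quote
        idx = q.rfind('"')
        q = q[:idx] + q[idx + 1:]
    return q.strip()

balance_quotes('"')                    # returns ''
balance_quotes('""')                   # returns ''
balance_quotes('solr "faceted search') # returns 'solr faceted search'
balance_quotes('"exact phrase"')       # returned unchanged
```

A sanitized empty string can then be mapped to a match-all (or no-op) query instead of being sent to the handler.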




[jira] Updated: (SOLR-422) one double quote or two double quotes only break search

2009-08-19 Thread Trey Grainger (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trey Grainger updated SOLR-422:
---

Comment: was deleted

(was: This issue is in the same ballpark as SOLR-874.  Both concern bad parsing 
of fringe cases by the DisMax handler.)

 one double quote or two double quotes only break search
 ---

 Key: SOLR-422
 URL: https://issues.apache.org/jira/browse/SOLR-422
 Project: Solr
  Issue Type: Bug
  Components: search
Reporter: Doug Daniels
Priority: Minor

 Using Dismax, searching for either one double quote character:
   q="
 or two double quote characters with no text between them:
   q=""
 throws an exception.  Not sure whether this is also the case for other 
 request handlers.




[jira] Updated: (SOLR-422) one double quote or two double quotes only break search

2009-08-19 Thread Trey Grainger (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trey Grainger updated SOLR-422:
---

Comment: was deleted

(was: These issues both concern reworking of the Dismax parser to handle fringe 
cases and should be dealt with together.)

 one double quote or two double quotes only break search
 ---

 Key: SOLR-422
 URL: https://issues.apache.org/jira/browse/SOLR-422
 Project: Solr
  Issue Type: Bug
  Components: search
Reporter: Doug Daniels
Priority: Minor

 Using Dismax, searching for either one double quote character:
   q="
 or two double quote characters with no text between them:
   q=""
 throws an exception.  Not sure whether this is also the case for other 
 request handlers.
