[jira] [Commented] (SOLR-9418) Statistical Phrase Identifier

2018-09-05 Thread Trey Grainger (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604608#comment-16604608
 ] 

Trey Grainger commented on SOLR-9418:
-

I uploaded an updated patch today for this issue, contributing the 
CareerBuilder version that the initial patch for this issue was loosely based 
upon (thanks for the contribution, CareerBuilder!). I've had several people ask 
about this feature recently, and others have proposed some alternative 
implementations of it as well.

Getting this posted as a reference implementation for future development.

> Statistical Phrase Identifier
> -
>
> Key: SOLR-9418
> URL: https://issues.apache.org/jira/browse/SOLR-9418
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Akash Mehta
>Priority: Major
> Attachments: SOLR-9418.patch, SOLR-9418.zip
>
>
> h2. *Summary:*
> The Statistical Phrase Identifier is a Solr contribution that takes in a 
> string of text and then leverages a language model (an Apache Lucene/Solr 
> inverted index) to predict how the inputted text should be divided into 
> phrases. The intended purpose of this tool is to parse short-text queries 
> into phrases prior to executing a keyword search (as opposed to parsing out 
> each keyword as a single term).
> It is being generously donated to the Solr project by CareerBuilder, with the 
> original source code and a quickly demo-able version located here:  
> [https://github.com/careerbuilder/statistical-phrase-identifier]
> h2. *Purpose:*
> Assume you're building a job search engine, and one of your users searches 
> for the following:
>  _machine learning research and development Portland, OR software engineer 
> AND hadoop, java_
> Most search engines will natively parse this query into the following boolean 
> representation:
>  _(machine AND learning AND research AND development AND Portland) OR 
> (software AND engineer AND hadoop AND java)_
> While this query may still yield relevant results, it is clear that the 
> intent of the user wasn't understood very well at all. By leveraging the 
> Statistical Phrase Identifier on this string prior to query parsing, you can 
> instead expect the following parsing:
> _{machine learning} \{and} \{research and development} \{Portland, OR} 
> \{software engineer} \{AND} \{hadoop,} \{java}_
> It is then possible to modify all the multi-word phrases prior to executing 
> the search:
>  _"machine learning" and "research and development" "Portland, OR" "software 
> engineer" AND hadoop, java_
> Of course, you could do your own query parsing to specifically handle the 
> boolean syntax, but the following would eventually be interpreted correctly 
> by Apache Solr and most other search engines:
>  _"machine learning" AND "research and development" AND "Portland, OR" AND 
> "software engineer" AND hadoop AND java_ 
> h2. *History:*
> This project was originally implemented by the search team at CareerBuilder 
> in the summer of 2015 for use as part of their semantic search system. In the 
> summer of 2016, Akash Mehta implemented a much simpler version as a proof of 
> concept based upon publicly available information about the CareerBuilder 
> implementation (the first attached patch).  In July of 2018, CareerBuilder 
> open sourced their original version 
> ([https://github.com/careerbuilder/statistical-phrase-identifier]), 
> and agreed to also donate the code to the Apache Software Foundation as a 
> Solr contribution. A Solr patch with the CareerBuilder version was added to 
> this issue on September 5th, 2018, and community feedback and contributions 
> are encouraged.
> This issue was originally titled the "Probabilistic Query Parser", but the 
> name has now been updated to "Statistical Phrase Identifier" to avoid 
> ambiguity with Solr's query parsers (per some of the feedback on this issue), 
> as the implementation is actually just a mechanism for identifying phrases 
> statistically from a string and is NOT a Solr query parser. 
> h2. *Example usage:*
> h3. (See contrib readme or configuration files in the patch for full 
> configuration details)
> h3. *{{Request:}}*
> {code:java}
> http://localhost:8983/solr/spi/parse?q=darth vader obi wan kenobi anakin 
> skywalker toad x men magneto professor xavier{code}
> h3. *{{Response:}}* 
> {code:java}
> {
>   "responseHeader":{
>     "status":0,
>     "QTime":25},
>     "top_parsed_query":"{darth vader} {obi wan kenobi} {anakin skywalker} 
> {toad} {x men} {magneto} {professor xavier}",
>     "top_parsed_phrases":[
>       "darth vader",
>       "obi wan kenobi",
>       "anakin skywalker",
>       "toad",
>       

[jira] [Updated] (SOLR-9418) Statistical Phrase Identifier

2018-09-05 Thread Trey Grainger (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trey Grainger updated SOLR-9418:

Description: 
h2. *Summary:*

The Statistical Phrase Identifier is a Solr contribution that takes in a string 
of text and then leverages a language model (an Apache Lucene/Solr inverted 
index) to predict how the inputted text should be divided into phrases. The 
intended purpose of this tool is to parse short-text queries into phrases prior 
to executing a keyword search (as opposed to parsing out each keyword as a 
single term).

It is being generously donated to the Solr project by CareerBuilder, with the 
original source code and a quickly demo-able version located here:  
[https://github.com/careerbuilder/statistical-phrase-identifier]
h2. *Purpose:*

Assume you're building a job search engine, and one of your users searches for 
the following:
 _machine learning research and development Portland, OR software engineer AND 
hadoop, java_

Most search engines will natively parse this query into the following boolean 
representation:
 _(machine AND learning AND research AND development AND Portland) OR (software 
AND engineer AND hadoop AND java)_

While this query may still yield relevant results, it is clear that the intent 
of the user wasn't understood very well at all. By leveraging the Statistical 
Phrase Identifier on this string prior to query parsing, you can instead expect 
the following parsing:

_{machine learning} \{and} \{research and development} \{Portland, OR} 
\{software engineer} \{AND} \{hadoop,} \{java}_

It is then possible to modify all the multi-word phrases prior to executing the 
search:
 _"machine learning" and "research and development" "Portland, OR" "software 
engineer" AND hadoop, java_

Of course, you could do your own query parsing to specifically handle the 
boolean syntax, but the following would eventually be interpreted correctly by 
Apache Solr and most other search engines:
 _"machine learning" AND "research and development" AND "Portland, OR" AND 
"software engineer" AND hadoop AND java_ 
h2. *History:*

This project was originally implemented by the search team at CareerBuilder in 
the summer of 2015 for use as part of their semantic search system. In the 
summer of 2016, Akash Mehta implemented a much simpler version as a proof of 
concept based upon publicly available information about the CareerBuilder 
implementation (the first attached patch).  In July of 2018, CareerBuilder open 
sourced their original version 
([https://github.com/careerbuilder/statistical-phrase-identifier]), 
and agreed to also donate the code to the Apache Software Foundation as a Solr 
contribution. A Solr patch with the CareerBuilder version was added to this 
issue on September 5th, 2018, and community feedback and contributions are 
encouraged.

This issue was originally titled the "Probabilistic Query Parser", but the name 
has now been updated to "Statistical Phrase Identifier" to avoid ambiguity with 
Solr's query parsers (per some of the feedback on this issue), as the 
implementation is actually just a mechanism for identifying phrases 
statistically from a string and is NOT a Solr query parser. 
h2. *Example usage:*
h3. (See contrib readme or configuration files in the patch for full 
configuration details)
h3. *{{Request:}}*
{code:java}
http://localhost:8983/solr/spi/parse?q=darth vader obi wan kenobi anakin 
skywalker toad x men magneto professor xavier{code}
h3. *{{Response:}}* 
{code:java}
{
  "responseHeader":{
    "status":0,
    "QTime":25},
    "top_parsed_query":"{darth vader} {obi wan kenobi} {anakin skywalker} 
{toad} {x men} {magneto} {professor xavier}",
    "top_parsed_phrases":[
      "darth vader",
      "obi wan kenobi",
      "anakin skywalker",
      "toad",
      "x-men",
      "magneto",
      "professor xavier"],
      "potential_parsings":[{
      "parsed_phrases":["darth vader",
      "obi wan kenobi",
      "anakin skywalker",
      "toad",
      "x-men",
      "magneto",
      "professor xavier"],
      "parsed_query":"{darth vader} {obi wan kenobi} {anakin skywalker} {toad} 
{x-men} {magneto} {professor xavier}",
    "score":0.0}]}{code}
 

 

  was:
h2. *Summary:*

The Statistical Phrase Identifier is a Solr contribution that takes in a string 
of text and then leverages a language model (an Apache Lucene/Solr inverted 
index) to predict how the inputted text should be divided into phrases. The 
intended purpose of this tool is to parse short-text queries into phrases prior 
to executing a keyword search (as opposed parsing out each keyword as a single 
term).

It is being generously donated to the Solr project by CareerBuilder, with the 
original source code and a quickly demo-able version located here:  

[jira] [Updated] (SOLR-9418) Statistical Phrase Identifier

2018-09-05 Thread Trey Grainger (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trey Grainger updated SOLR-9418:

Description: 
h2. *Summary:*

The Statistical Phrase Identifier is a Solr contribution that takes in a string 
of text and then leverages a language model (an Apache Lucene/Solr inverted 
index) to predict how the inputted text should be divided into phrases. The 
intended purpose of this tool is to parse short-text queries into phrases prior 
to executing a keyword search (as opposed to parsing out each keyword as a 
single term).

It is being generously donated to the Solr project by CareerBuilder, with the 
original source code and a quickly demo-able version located here:  
[https://github.com/careerbuilder/statistical-phrase-identifier]
h2. *Purpose:*

Assume you're building a job search engine, and one of your users searches for 
the following:
 _machine learning research and development Portland, OR software engineer AND 
hadoop, java_

Most search engines will natively parse this query into the following boolean 
representation:
 _(machine AND learning AND research AND development AND Portland) OR (software 
AND engineer AND hadoop AND java)_

While this query may still yield relevant results, it is clear that the intent 
of the user wasn't understood very well at all. By leveraging the Statistical 
Phrase Identifier on this string prior to query parsing, you can instead expect 
the following parsing:



_{machine learning} \{and} \{research and development} \{Portland, OR} 
\{software engineer} \{AND} \{hadoop,} \{java}_

It is then possible to modify all the multi-word phrases prior to executing the 
search:
 _"machine learning" and "research and development" "Portland, OR" "software 
engineer" AND hadoop, java_

Of course, you could do your own query parsing to specifically handle the 
boolean syntax, but the following would eventually be interpreted correctly by 
Apache Solr and most other search engines:
 _"machine learning" AND "research and development" AND "Portland, OR" AND 
"software engineer" AND hadoop AND java_ 
h2. *History:*

This project was originally implemented by the search team at CareerBuilder in 
the summer of 2015 for use as part of their semantic search system. In the 
summer of 2016, Akash Mehta implemented a much simpler version as a proof of 
concept based upon publicly available information about the CareerBuilder 
implementation (the first attached patch).  In July of 2018, CareerBuilder open 
sourced their original version 
([https://github.com/careerbuilder/statistical-phrase-identifier]), 
and agreed to also donate the code to the Apache Software Foundation as a Solr 
contribution. A Solr patch with the CareerBuilder version was added to this 
issue on September 5th, 2018, and community feedback and contributions are 
encouraged.

This issue was originally titled the "Probabilistic Query Parser", but the name 
has now been updated to "Statistical Phrase Identifier" to avoid ambiguity with 
Solr's query parsers (per some of the feedback on this issue), as the 
implementation is actually just a mechanism for identifying phrases 
statistically from a string and is NOT a Solr query parser. 
h2. *Example usage:*
h3. (See contrib readme or configuration files in the patch for full 
configuration details)
h3. *{{Request:}}*
{code:java}
http://localhost:8983/solr/spi/parse?q=darth vader obi wan kenobi anakin 
skywalker toad x men magneto professor xavier{code}
h3. *{{Response:}}* 
{code:java}
{
  "responseHeader":{
    "status":0,
    "QTime":25},
    "top_parsed_query":"{darth vader} {obi wan kenobi} {anakin skywalker} 
{toad} {x men} {magneto} {professor xavier}",
    "top_parsed_phrases":[
      "darth vader",
      "obi wan kenobi",
      "anakin skywalker",
      "toad",
      "x-men",
      "magneto",
      "professor xavier"],
      "potential_parsings":[{
      "parsed_phrases":["darth vader",
      "obi wan kenobi",
      "anakin skywalker",
      "toad",
      "x-men",
      "magneto",
      "professor xavier"],
      "parsed_query":"{darth vader} {obi wan kenobi} {anakin skywalker} {toad} 
{x-men} {magneto} {professor xavier}",
    "score":0.0}]}{code}
 

 

  was:
h2. *Summary:*

The Statistical Phrase Identifier is a Solr contribution that takes in a string 
of text and then leverages a language model (an Apache Lucene/Solr inverted 
index) to predict how the inputted text should be divided into phrases. The 
intended purpose of this tool is to parse short-text queries into phrases prior 
to executing a keyword search (as opposed parsing out each keyword as a single 
term).

It is being generously donated to the Solr project by CareerBuilder, with the 
original source code and a quickly demo-able version located here:  

[jira] [Updated] (SOLR-9418) Statistical Phrase Identifier

2018-09-05 Thread Trey Grainger (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trey Grainger updated SOLR-9418:

Description: 
h2. *Summary:*

The Statistical Phrase Identifier is a Solr contribution that takes in a string 
of text and then leverages a language model (an Apache Lucene/Solr inverted 
index) to predict how the inputted text should be divided into phrases. The 
intended purpose of this tool is to parse short-text queries into phrases prior 
to executing a keyword search (as opposed to parsing out each keyword as a 
single term).

It is being generously donated to the Solr project by CareerBuilder, with the 
original source code and a quickly demo-able version located here:  
[https://github.com/careerbuilder/statistical-phrase-identifier]
h2. *Purpose:*

Assume you're building a job search engine, and one of your users searches for 
the following:
_machine learning research and development Portland, OR software engineer AND 
hadoop, java_

Most search engines will natively parse this query into the following boolean 
representation:
_(machine AND learning AND research AND development AND Portland) OR (software 
AND engineer AND hadoop AND java)_

While this query may still yield relevant results, it is clear that the intent 
of the user wasn't understood very well at all. By leveraging the Statistical 
Phrase Identifier on this string prior to query parsing, you can instead expect 
the following parsing:
_{machine learning} \{and} \{research and development} \{Portland, OR} 
\{software engineer} \{AND} \{hadoop,} \{java}_

It is then possible to modify all the multi-word phrases prior to executing the 
search:
_"machine learning" and "research and development" "Portland, OR" "software 
engineer" AND hadoop, java_

Of course, you could do your own query parsing to specifically handle the 
boolean syntax, but the following would eventually be interpreted correctly by 
Apache Solr and most other search engines:
_"machine learning" AND "research and development" AND "Portland, OR" AND 
"software engineer" AND hadoop AND java_

 
h2. *History:*

This project was originally implemented by the search team at CareerBuilder in 
the summer of 2015 for use as part of their semantic search system. In the 
summer of 2016, Akash Mehta implemented a much simpler version as a proof of 
concept based upon publicly available information about the CareerBuilder 
implementation (the first attached patch).  In July of 2018, CareerBuilder open 
sourced their original version 
([https://github.com/careerbuilder/statistical-phrase-identifier]), 
and agreed to also donate the code to the Apache Software Foundation as a Solr 
contribution. A Solr patch with the CareerBuilder version was added to this 
issue on September 5th, 2018, and community feedback and contributions are 
encouraged.

This issue was originally titled the "Probabilistic Query Parser", but the name 
has now been updated to "Statistical Phrase Identifier" to avoid ambiguity with 
Solr's query parsers (per some of the feedback on this issue), as the 
implementation is actually just a mechanism for identifying phrases 
statistically from a string and is NOT a Solr query parser.

 
h2. *Example usage:*
h3. (See contrib readme or configuration files in the patch for full 
configuration details)
h3. *{{Request:}}*
{code:java}
http://localhost:8983/solr/spi/parse?q=darth vader obi wan kenobi anakin 
skywalker toad x men magneto professor xavier{code}
 
h3. *{{Response:}}*

 
{code:java}
{
  "responseHeader":{
    "status":0,
    "QTime":25},
    "top_parsed_query":"{darth vader} {obi wan kenobi} {anakin skywalker} 
{toad} {x men} {magneto} {professor xavier}",
    "top_parsed_phrases":[
      "darth vader",
      "obi wan kenobi",
      "anakin skywalker",
      "toad",
      "x-men",
      "magneto",
      "professor xavier"],
      "potential_parsings":[{
      "parsed_phrases":["darth vader",
      "obi wan kenobi",
      "anakin skywalker",
      "toad",
      "x-men",
      "magneto",
      "professor xavier"],
      "parsed_query":"{darth vader} {obi wan kenobi} {anakin skywalker} {toad} 
{x-men} {magneto} {professor xavier}",
    "score":0.0}]}{code}
 

 

  was:
The Statistical Phrase Identifier is a Solr contribution that takes in a string 
of text and then leverages a language model (an Apache Lucene/Solr inverted 
index) to predict how the inputted text should be divided into phrases. The 
intended purpose of this tool is to parse short-text queries into phrases prior 
to executing a keyword search (as opposed to parsing out each keyword as a 
single term).

History

This project was originally implemented at CareerBuilder in the summer of 2015 
for use as part of their semantic search system. In 2018

 

The main aim of this requestHandler 

[jira] [Updated] (SOLR-9418) Statistical Phrase Identifier

2018-09-05 Thread Trey Grainger (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trey Grainger updated SOLR-9418:

Attachment: SOLR-9418.patch

> Statistical Phrase Identifier
> -
>
> Key: SOLR-9418
> URL: https://issues.apache.org/jira/browse/SOLR-9418
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Akash Mehta
>Priority: Major
> Attachments: SOLR-9418.patch, SOLR-9418.zip
>
>
> The Statistical Phrase Identifier is a Solr contribution that takes in a 
> string of text and then leverages a language model (an Apache Lucene/Solr 
> inverted index) to predict how the inputted text should be divided into 
> phrases. The intended purpose of this tool is to parse short-text queries 
> into phrases prior to executing a keyword search (as opposed to parsing out 
> each keyword as a single term).
> History
> This project was originally implemented at CareerBuilder in the summer of 
> 2015 for use as part of their semantic search system. In 2018
>  
> The main aim of this requestHandler is to get the best parsing for a given 
> query. This basically means recognizing different phrases within the query. 
> We need some kind of training data to generate these phrases. The way this 
> project works is:
>  1.) Generate all possible parsings for the given query.
>  2.) For each possible parsing, a naive-Bayes-like score is calculated.
>  3.) The main scoring is done by going through all the documents in the 
> training set and finding the probability of a group of words occurring together 
> as a phrase, as compared to them occurring randomly in the same document. The 
> score is then normalized. Higher importance is given to the title field than 
> to the content field, which is configurable.
>  4.) Finally, after scoring each of the possible parsings, the one with the 
> highest score is returned.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9418) Statistical Phrase Identifier

2018-09-05 Thread Trey Grainger (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trey Grainger updated SOLR-9418:

Description: 
The Statistical Phrase Identifier is a Solr contribution that takes in a string 
of text and then leverages a language model (an Apache Lucene/Solr inverted 
index) to predict how the inputted text should be divided into phrases. The 
intended purpose of this tool is to parse short-text queries into phrases prior 
to executing a keyword search (as opposed to parsing out each keyword as a 
single term).

History

This project was originally implemented at CareerBuilder in the summer of 2015 
for use as part of their semantic search system. In 2018

 

The main aim of this requestHandler is to get the best parsing for a given 
query. This basically means recognizing different phrases within the query. We 
need some kind of training data to generate these phrases. The way this project 
works is:
 1.) Generate all possible parsings for the given query.
 2.) For each possible parsing, a naive-Bayes-like score is calculated.
 3.) The main scoring is done by going through all the documents in the training 
set and finding the probability of a group of words occurring together as a 
phrase, as compared to them occurring randomly in the same document. The score 
is then normalized. Higher importance is given to the title field than to the 
content field, which is configurable.
 4.) Finally, after scoring each of the possible parsings, the one with the 
highest score is returned (see the sketch below).
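
As a rough, hypothetical sketch of the co-occurrence scoring described in step 
3 (the class, method, and parameter names below are illustrative only and do 
not come from the attached patch; the document-frequency lookups are assumed to 
be backed by the Lucene/Solr index):
{code:java}
import java.util.List;

public class PhraseScoringSketch {

    /** Hypothetical accessor over the language model (the inverted index). */
    public interface PhraseStats {
        long docFreqAsPhrase(String phrase);   // docs containing the words as an exact phrase
        long docFreqAllTerms(String phrase);   // docs containing all of the words anywhere
    }

    /**
     * Scores one candidate parsing. For each multi-word phrase, the ratio of
     * "appears as a phrase" to "words merely co-occur in the same document"
     * is computed per field, the title field is weighted more heavily than the
     * content field (configurable), and the per-phrase values are multiplied
     * together naive-Bayes style.
     */
    public static double scoreParsing(List<String> phrases,
                                      PhraseStats titleStats,
                                      PhraseStats contentStats,
                                      double titleWeight) {
        double score = 1.0;
        for (String phrase : phrases) {
            if (!phrase.trim().contains(" ")) {
                continue; // single terms contribute no phrase evidence
            }
            double title = likelihood(phrase, titleStats);
            double content = likelihood(phrase, contentStats);
            score *= (titleWeight * title + content) / (titleWeight + 1.0);
        }
        return score;
    }

    // Normalized phrase likelihood in [0, 1] for a single field.
    private static double likelihood(String phrase, PhraseStats stats) {
        long asPhrase = stats.docFreqAsPhrase(phrase);
        long coOccurring = stats.docFreqAllTerms(phrase);
        return coOccurring == 0 ? 0.0 : (double) asPhrase / (double) coOccurring;
    }
}
{code}
Step 4 then amounts to running something like this over every candidate parsing 
from step 1 and keeping the highest-scoring one.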

  was:
The main aim of this requestHandler is to get the best parsing for a given 
query. This basically means recognizing different phrases within the query. We 
need some kind of training data to generate these phrases. The way this project 
works is:
1.) Generate all possible parsings for the given query.
2.) For each possible parsing, a naive-Bayes-like score is calculated.
3.) The main scoring is done by going through all the documents in the training 
set and finding the probability of a group of words occurring together as a 
phrase, as compared to them occurring randomly in the same document. The score 
is then normalized. Higher importance is given to the title field than to the 
content field, which is configurable.
4.) Finally, after scoring each of the possible parsings, the one with the 
highest score is returned.


> Statistical Phrase Identifier
> -
>
> Key: SOLR-9418
> URL: https://issues.apache.org/jira/browse/SOLR-9418
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Akash Mehta
>Priority: Major
> Attachments: SOLR-9418.zip
>
>
> The Statistical Phrase Identifier is a Solr contribution that takes in a 
> string of text and then leverages a language model (an Apache Lucene/Solr 
> inverted index) to predict how the inputted text should be divided into 
> phrases. The intended purpose of this tool is to parse short-text queries 
> into phrases prior to executing a keyword search (as opposed to parsing out 
> each keyword as a single term).
> History
> This project was originally implemented at CareerBuilder in the summer of 
> 2015 for use as part of their semantic search system. In 2018
>  
> The main aim of this requestHandler is to get the best parsing for a given 
> query. This basically means recognizing different phrases within the query. 
> We need some kind of training data to generate these phrases. The way this 
> project works is:
>  1.) Generate all possible parsings for the given query.
>  2.) For each possible parsing, a naive-Bayes-like score is calculated.
>  3.) The main scoring is done by going through all the documents in the 
> training set and finding the probability of a group of words occurring together 
> as a phrase, as compared to them occurring randomly in the same document. The 
> score is then normalized. Higher importance is given to the title field than 
> to the content field, which is configurable.
>  4.) Finally, after scoring each of the possible parsings, the one with the 
> highest score is returned.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9418) Statistical Phrase Identifier

2018-09-05 Thread Trey Grainger (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trey Grainger updated SOLR-9418:

Summary: Statistical Phrase Identifier  (was: Probabilistic-Query-Parser 
RequestHandler)

> Statistical Phrase Identifier
> -
>
> Key: SOLR-9418
> URL: https://issues.apache.org/jira/browse/SOLR-9418
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Akash Mehta
>Priority: Major
> Attachments: SOLR-9418.zip
>
>
> The main aim of this requestHandler is to get the best parsing for a given 
> query. This basically means recognizing different phrases within the query. 
> We need some kind of training data to generate these phrases. The way this 
> project works is:
> 1.) Generate all possible parsings for the given query.
> 2.) For each possible parsing, a naive-Bayes-like score is calculated.
> 3.) The main scoring is done by going through all the documents in the 
> training set and finding the probability of a group of words occurring together 
> as a phrase, as compared to them occurring randomly in the same document. The 
> score is then normalized. Higher importance is given to the title field than 
> to the content field, which is configurable.
> 4.) Finally, after scoring each of the possible parsings, the one with the 
> highest score is returned.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON

2017-07-21 Thread Trey Grainger (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trey Grainger updated SOLR-10494:
-
Attachment: SOLR-10494.branch_7x.patch

Here's the most up-to-date patch against branch_7x.

> Switch Solr's Default Response Type from XML to JSON
> 
>
> Key: SOLR-10494
> URL: https://issues.apache.org/jira/browse/SOLR-10494
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Response Writers
>Affects Versions: 7.0
>Reporter: Trey Grainger
>Priority: Blocker
> Fix For: 7.0
>
> Attachments: SOLR-10494, SOLR-10494, SOLR-10494.branch_7x.patch, 
> SOLR-10494.patch, SOLR-10494-withdocs.patch, SOLR-10494-withdocs.patch
>
>
> Solr's default response format is still XML, despite the fact that Solr has 
> supported the JSON response format for over a decade, developer mindshare has 
> clearly shifted toward JSON over the years, and most modern/competing systems 
> also use JSON format now by default.
> In fact, Solr's admin UI even explicitly adds wt=json to the request (by 
> default in the UI) to override the default of wt=xml, so Solr's Admin UI 
> effectively has a different default than the API.
> We have now introduced things like the JSON faceting API, and the new more 
> modern /V2 apis assume JSON for the areas of Solr they cover, so clearly 
> we're moving in the direction of JSON anyway.
> I'd like to propose that we switch the default response writer to JSON (wt=json) 
> instead of XML for Solr 7.0, as this seems to me like the right direction and 
> a good time to make this change with the next major version.
> Based upon feedback from the Lucene Dev's mailing list, we want to:
> 1) Change the default response writer type to "wt=json" and also change to 
> "indent=on" by default
> 2) Make no changes on the update handler side; it already works as desired 
> (it returns the response in the same content-type as the request unless the 
> "wt" is passed in explicitly).
> 3) Keep the /query request handler around since people have already used it 
> for years to do JSON queries
> 4) Add a commented-out "wt=xml" to the solrconfig.xml as a reminder for folks 
> on how to change (back) the response format.
> The default format change, plus the addition of "indent=on" are back compat 
> changes, so we need to make sure we doc those clearly in the CHANGES.txt. 
> There will also need to be significant adjustments to the Solr Ref Guide, 
> Tutorial, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON

2017-07-21 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16096408#comment-16096408
 ] 

Trey Grainger edited comment on SOLR-10494 at 7/21/17 3:57 PM:
---

Hi [~janhoy].

I picked it up a few times, but was developing against master and kept running 
into stability issues with other tests every time I pulled. I finally switched 
over to just developing on the 7.x branch instead to prevent those stability 
issues. I have an updated patch which fixes some (now) merge conflicts with the 
default configset changes, and all tests appear to be passing except the 
TestHierarchicalDocBuilder.testThreeLevelHierarchy one. I still haven't been 
able to dig deep enough to understand what is affecting that one.

I DO know that the issue is related to indentation. If I go into the test and 
override it to "indent=off" then it succeeds, but I have no idea why indentation 
being on is causing the failure. Also, doing that in the test is probably just 
masking another underlying problem, which may not even be test related, so I 
really need to understand exactly where things are breaking down to know if 
it's a test problem or an actual functionality problem somewhere.

At any rate, I'll post my updated patch here shortly. I'm a little tight on 
time this next week, so hopefully I can enlist someone else to assist on my end 
later today, as well.


was (Author: solrtrey):
Hi [~janhoy].

I picked it up a few times, but was developing against master and kept running 
into stability issues with other tests every time I pulled. I finally switched 
over to just developing on the 7.x branch instead to prevent those issues. I 
have an updated patch which fixes some (now) merge conflicts with the default 
configset changes, and all tests appear to be passing except the 
TestHierarchicalDocBuilder.testThreeLevelHierarchy one. I still haven't been 
able to dig deep enough to understand what is affecting that one.

I DO know that the issue is related to indentation. If I go into the test and 
override it to "indent=off" then it succeeds, but I have no idea why indentation 
being on is causing the failure. Also, doing that in the test is probably just 
masking another underlying problem, which may not even be test related, so I 
really need to understand exactly where things are breaking down to know if 
it's a test problem or an actual functionality problem somewhere.

At any rate, I'll post my updated patch here shortly. I'm a little tight on 
time this next week, so hopefully I can enlist someone else to assist on my end 
later today, as well.

> Switch Solr's Default Response Type from XML to JSON
> 
>
> Key: SOLR-10494
> URL: https://issues.apache.org/jira/browse/SOLR-10494
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Response Writers
>Affects Versions: 7.0
>Reporter: Trey Grainger
>Priority: Blocker
> Fix For: 7.0
>
> Attachments: SOLR-10494, SOLR-10494, SOLR-10494.patch, 
> SOLR-10494-withdocs.patch, SOLR-10494-withdocs.patch
>
>
> Solr's default response format is still XML, despite the fact that Solr has 
> supported the JSON response format for over a decade, developer mindshare has 
> clearly shifted toward JSON over the years, and most modern/competing systems 
> also use JSON format now by default.
> In fact, Solr's admin UI even explicitly adds wt=json to the request (by 
> default in the UI) to override the default of wt=xml, so Solr's Admin UI 
> effectively has a different default than the API.
> We have now introduced things like the JSON faceting API, and the new more 
> modern /V2 apis assume JSON for the areas of Solr they cover, so clearly 
> we're moving in the direction of JSON anyway.
> I'd like to propose that we switch the default response writer to JSON (wt=json) 
> instead of XML for Solr 7.0, as this seems to me like the right direction and 
> a good time to make this change with the next major version.
> Based upon feedback from the Lucene Dev's mailing list, we want to:
> 1) Change the default response writer type to "wt=json" and also change to 
> "indent=on" by default
> 2) Make no changes on the update handler side; it already works as desired 
> (it returns the response in the same content-type as the request unless the 
> "wt" is passed in explicitly).
> 3) Keep the /query request handler around since people have already used it 
> for years to do JSON queries
> 4) Add a commented-out "wt=xml" to the solrconfig.xml as a reminder for folks 
> on how to change (back) the response format.
> The default format change, plus the addition of "indent=on" are back compat 
> changes, so we need to make sure we doc those clearly in the 

[jira] [Commented] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON

2017-07-21 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16096408#comment-16096408
 ] 

Trey Grainger commented on SOLR-10494:
--

Hi [~janhoy].

I picked it up a few times, but was developing against master and kept running 
into stability issues with other tests every time I pulled. I finally switched 
over to just developing on the 7.x branch instead to prevent those issues. I 
have an updated patch which fixes some (now) merge conflicts with the default 
configset changes, and all tests appear to be passing except the 
TestHierarchicalDocBuilder.testThreeLevelHierarchy one. I still haven't been 
able to dig deep enough to understand what is affecting that one.

I DO know that the issue is related to indentation. If I go into the test and 
override it to "indent=off" then it succeeds, but I have no idea why indentation 
being on is causing the failure. Also, doing that in the test is probably just 
masking another underlying problem, which may not even be test related, so I 
really need to understand exactly where things are breaking down to know if 
it's a test problem or an actual functionality problem somewhere.

At any rate, I'll post my updated patch here shortly. I'm a little tight on 
time this next week, so hopefully I can enlist someone else to assist on my end 
later today, as well.

> Switch Solr's Default Response Type from XML to JSON
> 
>
> Key: SOLR-10494
> URL: https://issues.apache.org/jira/browse/SOLR-10494
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Response Writers
>Affects Versions: 7.0
>Reporter: Trey Grainger
>Priority: Blocker
> Fix For: 7.0
>
> Attachments: SOLR-10494, SOLR-10494, SOLR-10494.patch, 
> SOLR-10494-withdocs.patch, SOLR-10494-withdocs.patch
>
>
> Solr's default response format is still XML, despite the fact that Solr has 
> supported the JSON response format for over a decade, developer mindshare has 
> clearly shifted toward JSON over the years, and most modern/competing systems 
> also use JSON format now by default.
> In fact, Solr's admin UI even explicitly adds wt=json to the request (by 
> default in the UI) to override the default of wt=xml, so Solr's Admin UI 
> effectively has a different default than the API.
> We have now introduced things like the JSON faceting API, and the new more 
> modern /V2 apis assume JSON for the areas of Solr they cover, so clearly 
> we're moving in the direction of JSON anyway.
> I'd like to propose that we switch the default response writer to JSON (wt=json) 
> instead of XML for Solr 7.0, as this seems to me like the right direction and 
> a good time to make this change with the next major version.
> Based upon feedback from the Lucene Dev's mailing list, we want to:
> 1) Change the default response writer type to "wt=json" and also change to 
> "indent=on" by default
> 2) Make no changes on the update handler side; it already works as desired 
> (it returns the response in the same content-type as the request unless the 
> "wt" is passed in explicitly).
> 3) Keep the /query request handler around since people have already used it 
> for years to do JSON queries
> 4) Add a commented-out "wt=xml" to the solrconfig.xml as a reminder for folks 
> on how to change (back) the response format.
> The default format change, plus the addition of "indent=on" are back compat 
> changes, so we need to make sure we doc those clearly in the CHANGES.txt. 
> There will also need to be significant adjustments to the Solr Ref Guide, 
> Tutorial, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON

2017-06-26 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16064287#comment-16064287
 ] 

Trey Grainger edited comment on SOLR-10494 at 6/27/17 5:23 AM:
---

Ok, I think I'm nearly done. This patch ([^SOLR-10494.patch]) includes removing 
all the extraneous "wt=json" and "indent=on" references, adding a commented out 
version of "wt=xml" to the example solrconfig.xml's, unit test updates, some 
additional updates to the tutorials and docs (also incorporating 
[~ctargett]'s), and updating the admin UI (query section) to handle the new 
defaults.

The only issue I'm running into is that, for some reason I haven't figured out 
yet, turning "indent" on has broken some of the parent/child relationship tests 
(i.e. TestHierarchicalDocBuilder.testThreeLevelHierarchy, 
SolrExampleTests.testChildDocTransformer). It initially appears to be some XML 
parsing issue with the extra whitespace, which would be odd, but I haven't dug 
in yet. Once I figure those out, I'll update the patch, and then I 
think this will be ready for review.


was (Author: solrtrey):
Ok, I think I'm nearly done. This patch includes removing all the extraneous 
"wt=json" and "indent=on" references, adding a commented out version of 
"wt=xml" to the example solrconfig.xml's, unit test updates, some additional 
updates to the tutorials and docs (also incorporating [~ctargett]'s), and 
updating the admin UI (query section) to handle the new defaults.

The only issue I'm running into is that, for some reason I haven't figured out 
yet, turning "indent" on has broken some of the parent/child relationship tests 
(i.e. TestHierarchicalDocBuilder.testThreeLevelHierarchy, 
SolrExampleTests.testChildDocTransformer). It initially appears to be some XML 
parsing issue with the extra whitespace, which would be odd, but I haven't dug 
in yet. Once I figure those out, I'll update the patch, and then I 
think this will be ready for review.

> Switch Solr's Default Response Type from XML to JSON
> 
>
> Key: SOLR-10494
> URL: https://issues.apache.org/jira/browse/SOLR-10494
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (7.0)
>Reporter: Trey Grainger
>Priority: Blocker
> Fix For: master (7.0)
>
> Attachments: SOLR-10494, SOLR-10494, SOLR-10494.patch, 
> SOLR-10494-withdocs.patch, SOLR-10494-withdocs.patch
>
>
> Solr's default response format is still XML, despite the fact that Solr has 
> supported the JSON response format for over a decade, developer mindshare has 
> clearly shifted toward JSON over the years, and most modern/competing systems 
> also use JSON format now by default.
> In fact, Solr's admin UI even explicitly adds wt=json to the request (by 
> default in the UI) to override the default of wt=xml, so Solr's Admin UI 
> effectively has a different default than the API.
> We have now introduced things like the JSON faceting API, and the new more 
> modern /V2 apis assume JSON for the areas of Solr they cover, so clearly 
> we're moving in the direction of JSON anyway.
> I'd like to propose that we switch the default response writer to JSON (wt=json) 
> instead of XML for Solr 7.0, as this seems to me like the right direction and 
> a good time to make this change with the next major version.
> Based upon feedback from the Lucene Dev's mailing list, we want to:
> 1) Change the default response writer type to "wt=json" and also change to 
> "indent=on" by default
> 2) Make no changes on the update handler side; it already works as desired 
> (it returns the response in the same content-type as the request unless the 
> "wt" is passed in explicitly).
> 3) Keep the /query request handler around since people have already used it 
> for years to do JSON queries
> 4) Add a commented-out "wt=xml" to the solrconfig.xml as a reminder for folks 
> on how to change (back) the response format.
> The default format change, plus the addition of "indent=on" are back compat 
> changes, so we need to make sure we doc those clearly in the CHANGES.txt. 
> There will also need to be significant adjustments to the Solr Ref Guide, 
> Tutorial, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON

2017-06-26 Thread Trey Grainger (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trey Grainger updated SOLR-10494:
-
Attachment: SOLR-10494.patch

Ok, I think I'm nearly done. This patch includes removing all the extraneous 
"wt=json" and "indent=on" references, adding a commented out version of 
"wt=xml" to the example solrconfig.xml's, unit test updates, some additional 
updates to the tutorials and docs (also incorporating [~ctargett]'s), and 
updating the admin UI (query section) to handle the new defaults.

The only issue I'm running into is that, for some reason I haven't figured out 
yet, turning "indent" on has broken some of the parent/child relationship tests 
(i.e. TestHierarchicalDocBuilder.testThreeLevelHierarchy, 
SolrExampleTests.testChildDocTransformer). It initially appears to be some XML 
parsing issue with the extra whitespace, which would be odd, but I haven't dug 
in yet. Once I figure those out, I'll update the patch, and then I 
think this will be ready for review.

> Switch Solr's Default Response Type from XML to JSON
> 
>
> Key: SOLR-10494
> URL: https://issues.apache.org/jira/browse/SOLR-10494
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (7.0)
>Reporter: Trey Grainger
>Priority: Blocker
> Fix For: master (7.0)
>
> Attachments: SOLR-10494, SOLR-10494, SOLR-10494.patch, 
> SOLR-10494-withdocs.patch, SOLR-10494-withdocs.patch
>
>
> Solr's default response format is still XML, despite the fact that Solr has 
> supported the JSON response format for over a decade, developer mindshare has 
> clearly shifted toward JSON over the years, and most modern/competing systems 
> also use JSON format now by default.
> In fact, Solr's admin UI even explicitly adds wt=json to the request (by 
> default in the UI) to override the default of wt=xml, so Solr's Admin UI 
> effectively has a different default than the API.
> We have now introduced things like the JSON faceting API, and the new more 
> modern /V2 apis assume JSON for the areas of Solr they cover, so clearly 
> we're moving in the direction of JSON anyway.
> I'd like to propose that we switch the default response writer to JSON (wt=json) 
> instead of XML for Solr 7.0, as this seems to me like the right direction and 
> a good time to make this change with the next major version.
> Based upon feedback from the Lucene Dev's mailing list, we want to:
> 1) Change the default response writer type to "wt=json" and also change to 
> "indent=on" by default
> 2) Make no changes on the update handler side; it already works as desired 
> (it returns the response in the same content-type as the request unless the 
> "wt" is passed in explicitly).
> 3) Keep the /query request handler around since people have already used it 
> for years to do JSON queries
> 4) Add a commented-out "wt=xml" to the solrconfig.xml as a reminder for folks 
> on how to change (back) the response format.
> The default format change, plus the addition of "indent=on" are back compat 
> changes, so we need to make sure we doc those clearly in the CHANGES.txt. 
> There will also need to be significant adjustments to the Solr Ref Guide, 
> Tutorial, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON

2017-06-25 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062350#comment-16062350
 ] 

Trey Grainger commented on SOLR-10494:
--

bq. Also should we mark this as a blocker for 7.0 to change it? - 
[~varunthacker]

I just updated it to be a blocker, Varun. I'm working on what should be the 
final patch today. Hopefully this can be reviewed and make it in for 7.0.

> Switch Solr's Default Response Type from XML to JSON
> 
>
> Key: SOLR-10494
> URL: https://issues.apache.org/jira/browse/SOLR-10494
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (7.0)
>Reporter: Trey Grainger
>Priority: Blocker
> Fix For: master (7.0)
>
> Attachments: SOLR-10494, SOLR-10494, SOLR-10494-withdocs.patch, 
> SOLR-10494-withdocs.patch
>
>
> Solr's default response format is still XML, despite the fact that Solr has 
> supported the JSON response format for over a decade, developer mindshare has 
> clearly shifted toward JSON over the years, and most modern/competing systems 
> also use JSON format now by default.
> In fact, Solr's admin UI even explicitly adds wt=json to the request (by 
> default in the UI) to override the default of wt=xml, so Solr's Admin UI 
> effectively has a different default than the API.
> We have now introduced things like the JSON faceting API, and the new more 
> modern /V2 apis assume JSON for the areas of Solr they cover, so clearly 
> we're moving in the direction of JSON anyway.
> I'd like to propose that we switch the default response writer to JSON (wt=json) 
> instead of XML for Solr 7.0, as this seems to me like the right direction and 
> a good time to make this change with the next major version.
> Based upon feedback from the Lucene Dev's mailing list, we want to:
> 1) Change the default response writer type to "wt=json" and also change to 
> "indent=on" by default
> 2) Make no changes on the update handler side; it already works as desired 
> (it returns the response in the same content-type as the request unless the 
> "wt" is passed in explicitly).
> 3) Keep the /query request handler around since people have already used it 
> for years to do JSON queries
> 4) Add a commented-out "wt=xml" to the solrconfig.xml as a reminder for folks 
> on how to change (back) the response format.
> The default format change, plus the addition of "indent=on" are back compat 
> changes, so we need to make sure we doc those clearly in the CHANGES.txt. 
> There will also need to be significant adjustments to the Solr Ref Guide, 
> Tutorial, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON

2017-06-25 Thread Trey Grainger (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trey Grainger updated SOLR-10494:
-
Priority: Blocker  (was: Minor)

> Switch Solr's Default Response Type from XML to JSON
> 
>
> Key: SOLR-10494
> URL: https://issues.apache.org/jira/browse/SOLR-10494
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (7.0)
>Reporter: Trey Grainger
>Priority: Blocker
> Fix For: master (7.0)
>
> Attachments: SOLR-10494, SOLR-10494, SOLR-10494-withdocs.patch, 
> SOLR-10494-withdocs.patch
>
>
> Solr's default response format is still XML, despite the fact that Solr has 
> supported the JSON response format for over a decade, developer mindshare has 
> clearly shifted toward JSON over the years, and most modern/competing systems 
> also use JSON format now by default.
> In fact, Solr's admin UI even explicitly adds wt=json to the request (by 
> default in the UI) to override the default of wt=xml, so Solr's Admin UI 
> effectively has a different default than the API.
> We have now introduced things like the JSON faceting API, and the new more 
> modern /V2 apis assume JSON for the areas of Solr they cover, so clearly 
> we're moving in the direction of JSON anyway.
> I'd like to propose that we switch the default response writer to JSON (wt=json) 
> instead of XML for Solr 7.0, as this seems to me like the right direction and 
> a good time to make this change with the next major version.
> Based upon feedback from the Lucene Dev's mailing list, we want to:
> 1) Change the default response writer type to "wt=json" and also change to 
> "indent=on" by default
> 2) Make no changes on the update handler side; it already works as desired 
> (it returns the response in the same content-type as the request unless the 
> "wt" is passed in explicitly).
> 3) Keep the /query request handler around since people have already used it 
> for years to do JSON queries
> 4) Add a commented-out "wt=xml" to the solrconfig.xml as a reminder for folks 
> on how to change (back) the response format.
> The default format change, plus the addition of "indent=on" are back compat 
> changes, so we need to make sure we doc those clearly in the CHANGES.txt. 
> There will also need to be significant adjustments to the Solr Ref Guide, 
> Tutorial, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON

2017-06-23 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061521#comment-16061521
 ] 

Trey Grainger commented on SOLR-10494:
--

Thanks, [~ctargett]! I'm building off your patch and making final changes. Been 
a bit slammed this week and am unavailable to work on this for the next 24-36 
hours, but I expect to have the next (hopefully final, or close to it) patch 
pushed sometime on Sunday (in the U.S.).

> Switch Solr's Default Response Type from XML to JSON
> 
>
> Key: SOLR-10494
> URL: https://issues.apache.org/jira/browse/SOLR-10494
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (7.0)
>Reporter: Trey Grainger
>Priority: Minor
> Fix For: master (7.0)
>
> Attachments: SOLR-10494, SOLR-10494, SOLR-10494-withdocs.patch, 
> SOLR-10494-withdocs.patch
>
>
> Solr's default response format is still XML, despite the fact that Solr has 
> supported the JSON response format for over a decade, developer mindshare has 
> clearly shifted toward JSON over the years, and most modern/competing systems 
> also use JSON format now by default.
> In fact, Solr's admin UI even explicitly adds wt=json to the request (by 
> default in the UI) to override the default of wt=xml, so Solr's Admin UI 
> effectively has a different default than the API.
> We have now introduced things like the JSON faceting API, and the new more 
> modern /V2 apis assume JSON for the areas of Solr they cover, so clearly 
> we're moving in the direction of JSON anyway.
> I'd like to propose that we switch the default response writer to JSON (wt=json) 
> instead of XML for Solr 7.0, as this seems to me like the right direction and 
> a good time to make this change with the next major version.
> Based upon feedback from the Lucene Dev's mailing list, we want to:
> 1) Change the default response writer type to "wt=json" and also change to 
> "indent=on" by default
> 2) Make no changes on the update handler side; it already works as desired 
> (it returns the response in the same content-type as the request unless the 
> "wt" is passed in explicitly).
> 3) Keep the /query request handler around since people have already used it 
> for years to do JSON queries
> 4) Add a commented-out "wt=xml" to the solrconfig.xml as a reminder for folks 
> on how to change (back) the response format.
> The default format change, plus the addition of "indent=on" are back compat 
> changes, so we need to make sure we doc those clearly in the CHANGES.txt. 
> There will also need to be significant adjustments to the Solr Ref Guide, 
> Tutorial, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON

2017-06-20 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056131#comment-16056131
 ] 

Trey Grainger commented on SOLR-10494:
--

Yes, I'll address all of the code/config changes above. I'll get the patch 
updated to include the indent=on change first (fixing unit tests now... more of 
them broke than I was expecting due to indentation) and then do the cleanup of 
the configs, admin UI, and READMEs as a follow-on patch.

Once those are in, I can take a look at the ref guide, website, and quickstart, 
though I'm afraid I may need some help pulling all of those off in any 
reasonable timeframe for 7.0, as I'd expect there to be a lot of changes 
required there.

> Switch Solr's Default Response Type from XML to JSON
> 
>
> Key: SOLR-10494
> URL: https://issues.apache.org/jira/browse/SOLR-10494
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (7.0)
>Reporter: Trey Grainger
>Priority: Minor
> Fix For: master (7.0)
>
> Attachments: SOLR-10494, SOLR-10494
>
>
> Solr's default response format is still XML, despite the fact that Solr has 
> supported the JSON response format for over a decade, developer mindshare has 
> clearly shifted toward JSON over the years, and most modern/competing systems 
> also use JSON format now by default.
> In fact, Solr's admin UI even explicitly adds wt=json to the request (by 
> default in the UI) to override the default of wt=xml, so Solr's Admin UI 
> effectively has a different default than the API.
> We have now introduced things like the JSON faceting API, and the new more 
> modern /V2 apis assume JSON for the areas of Solr they cover, so clearly 
> we're moving in the direction of JSON anyway.
> I'd like to propose that we switch the default response writer to JSON (wt=json) 
> instead of XML for Solr 7.0, as this seems to me like the right direction and 
> a good time to make this change with the next major version.
> Based upon feedback from the Lucene Dev's mailing list, we want to:
> 1) Change the default response writer type to "wt=json" and also change to 
> "indent=on" by default
> 2) Make no changes on the update handler side; it already works as desired 
> (it returns the response in the same content-type as the request unless the 
> "wt" is passed in explicitly).
> 3) Keep the /query request handler around since people have already used it 
> for years to do JSON queries
> 4) Add a commented-out "wt=xml" to the solrconfig.xml as a reminder for folks 
> on how to change (back) the response format.
> The default format change, plus the addition of "indent=on" are back compat 
> changes, so we need to make sure we doc those clearly in the CHANGES.txt. 
> There will also need to be significant adjustments to the Solr Ref Guide, 
> Tutorial, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON

2017-06-11 Thread Trey Grainger (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trey Grainger updated SOLR-10494:
-
Attachment: SOLR-10494

New patch fixing a precommit error. The earlier comment about unclosed resources 
apparently referred to pre-existing issues (those are warnings, not errors); I 
only noticed them because of an unrelated error, so I'm going to ignore those. 
Working on indent=on by default for the next patch.

> Switch Solr's Default Response Type from XML to JSON
> 
>
> Key: SOLR-10494
> URL: https://issues.apache.org/jira/browse/SOLR-10494
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (7.0)
>Reporter: Trey Grainger
>Priority: Minor
> Fix For: master (7.0)
>
> Attachments: SOLR-10494, SOLR-10494
>
>
> Solr's default response format is still XML, despite the fact that Solr has 
> supported the JSON response format for over a decade, developer mindshare has 
> clearly shifted toward JSON over the years, and most modern/competing systems 
> also use JSON format now by default.
> In fact, Solr's admin UI even explicitly adds wt=json to the request (by 
> default in the UI) to override the default of wt=xml, so Solr's Admin UI 
> effectively has a different default than the API.
> We have now introduced things like the JSON faceting API, and the new more 
> modern /V2 apis assume JSON for the areas of Solr they cover, so clearly 
> we're moving in the direction of JSON anyway.
> I'd like to propose that we switch the default response writer to JSON (wt=json) 
> instead of XML for Solr 7.0, as this seems to me like the right direction and 
> a good time to make this change with the next major version.
> Based upon feedback from the Lucene Dev's mailing list, we want to:
> 1) Change the default response writer type to "wt=json" and also change to 
> "indent=on" by default
> 2) Make no changes on the update handler side; it already works as desired 
> (it returns the response in the same content-type as the request unless the 
> "wt" is passed in explicitly).
> 3) Keep the /query request handler around since people have already used it 
> for years to do JSON queries
> 4) Add a commented-out "wt=xml" to the solrconfig.xml as a reminder for folks 
> on how to change (back) the response format.
> The default format change, plus the addition of "indent=on" are back compat 
> changes, so we need to make sure we doc those clearly in the CHANGES.txt. 
> There will also need to be significant adjustments to the Solr Ref Guide, 
> Tutorial, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON

2017-06-11 Thread Trey Grainger (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trey Grainger updated SOLR-10494:
-
Attachment: SOLR-10494

Initial patch, with all unit tests broken by the change now passing. I haven't 
changed to indent=on by default or removed the explicit setting of json in 
various places yet, though, as I've been trying to change one variable at a 
time to minimize complications.

For some reason, switching to json by default has caused ant precommit to 
complain about resource leaks in about 60 places. I'm not sure what is causing 
these at the moment, but I want to address that before adding any additional 
changes to the patch.

> Switch Solr's Default Response Type from XML to JSON
> 
>
> Key: SOLR-10494
> URL: https://issues.apache.org/jira/browse/SOLR-10494
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (7.0)
>Reporter: Trey Grainger
>Priority: Minor
> Fix For: master (7.0)
>
> Attachments: SOLR-10494
>
>
> Solr's default response format is still XML, despite the fact that Solr has 
> supported the JSON response format for over a decade, developer mindshare has 
> clearly shifted toward JSON over the years, and most modern/competing systems 
> also use JSON format now by default.
> In fact, Solr's admin UI even explicitly adds wt=json to the request (by 
> default in the UI) to override the default of wt=xml, so Solr's Admin UI 
> effectively has a different default than the API.
> We have now introduced things like the JSON faceting API, and the new more 
> modern /V2 apis assume JSON for the areas of Solr they cover, so clearly 
> we're moving in the direction of JSON anyway.
> I'd like to propose that we switch the default response writer to JSON (wt=json) 
> instead of XML for Solr 7.0, as this seems to me like the right direction and 
> a good time to make this change with the next major version.
> Based upon feedback from the Lucene Dev's mailing list, we want to:
> 1) Change the default response writer type to "wt=json" and also change to 
> "indent=on" by default
> 2) Make no changes on the update handler side; it already works as desired 
> (it returns the response in the same content-type as the request unless the 
> "wt" is passed in explicitly).
> 3) Keep the /query request handler around since people have already used it 
> for years to do JSON queries
> 4) Add a commented-out "wt=xml" to the solrconfig.xml as a reminder for folks 
> on how to change (back) the response format.
> The default format change, plus the addition of "indent=on" are back compat 
> changes, so we need to make sure we doc those clearly in the CHANGES.txt. 
> There will also need to be significant adjustments to the Solr Ref Guide, 
> Tutorial, etc.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON

2017-06-08 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042701#comment-16042701
 ] 

Trey Grainger commented on SOLR-10494:
--

Question: I'm making indent=on the default. Any objections to making indent=on 
the default for all TextResponseWriters, or do I need to limit the change to 
only the "wt=json" (now the default writer) case?

The writers impacted from what I can tell are:
GEOJSONWriter
JSONWriter
XMLWriter
SchemaXMLWriter
PHPWriter
PythonWriter
RubyWriter

It's a little complicated because most of these (geojson, php, python, ruby) 
actually inherit from the JSONWriter, so if I need to leave indent=off on those, 
I'll have to go in and set it explicitly on each of them, since their base class 
will now have indent on by default.

Unless anyone objects, I'm just going to set indent=on by default on all of 
these. Please let me know if anyone disagrees.
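
For anyone who wants the old behavior, the change would still be overridable per
request (indent=off, and wt=xml for the format) or per request handler in
solrconfig.xml. A minimal sketch of the latter; the handler name and parameter
placement here are illustrative only, not part of the patch:

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <!-- per-handler override of the proposed new defaults -->
      <str name="wt">xml</str>
      <str name="indent">off</str>
    </lst>
  </requestHandler>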

> Switch Solr's Default Response Type from XML to JSON
> 
>
> Key: SOLR-10494
> URL: https://issues.apache.org/jira/browse/SOLR-10494
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (7.0)
>Reporter: Trey Grainger
>Priority: Minor
> Fix For: master (7.0)
>
>
> Solr's default response format is still XML, despite the fact that Solr has 
> supported the JSON response format for over a decade, developer mindshare has 
> clearly shifted toward JSON over the years, and most modern/competing systems 
> also use JSON format now by default.
> In fact, Solr's admin UI even explicitly adds wt=json to the request (by 
> default in the UI) to override the default of wt=xml, so Solr's Admin UI 
> effectively has a different default than the API.
> We have now introduced things like the JSON faceting API, and the new more 
> modern /V2 apis assume JSON for the areas of Solr they cover, so clearly 
> we're moving in the direction of JSON anyway.
> I'd like to propose that we switch the default response writer to JSON (wt=json) 
> instead of XML for Solr 7.0, as this seems to me like the right direction and 
> a good time to make this change with the next major version.
> Based upon feedback from the Lucene Dev's mailing list, we want to:
> 1) Change the default response writer type to "wt=json" and also change to 
> "indent=on" by default
> 2) Make no changes on the update handler side; it already works as desired 
> (it returns the response in the same content-type as the request unless the 
> "wt" is passed in explicitly).
> 3) Keep the /query request handler around since people have already used it 
> for years to do JSON queries
> 4) Add a commented-out "wt=xml" to the solrconfig.xml as a reminder for folks 
> on how to change (back) the response format.
> The default format change, plus the addition of "indent=on" are back compat 
> changes, so we need to make sure we doc those clearly in the CHANGES.txt. 
> There will also need to be significant adjustments to the Solr Ref Guide, 
> Tutorial, etc.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON

2017-06-08 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042630#comment-16042630
 ] 

Trey Grainger commented on SOLR-10494:
--

Started working on this two weeks ago and then got busy. The actual changes 
were super quick, but after I made them it was taking over 2 hours to run the 
unit tests with lots of failures and several test suites timing out.

Just got back to this today; I have pretty much everything diagnosed and am 
working on fixes. In short, SolrTestCaseJ4 has XPath checking hard-coded into 
its design, so I now need to pass in wt=xml explicitly there, and there are a 
handful of test suites (e.g. replication/backup/restore and hdfs) that 
explicitly check for XML strings and loop forever until they get those strings 
back (hence the timeouts).

I'm making changes to explicitly request XML right now for those tests where 
they are expecting it and will get a patch posted hopefully today.

> Switch Solr's Default Response Type from XML to JSON
> 
>
> Key: SOLR-10494
> URL: https://issues.apache.org/jira/browse/SOLR-10494
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (7.0)
>Reporter: Trey Grainger
>Priority: Minor
> Fix For: master (7.0)
>
>
> Solr's default response format is still XML, despite the fact that Solr has 
> supported the JSON response format for over a decade, developer mindshare has 
> clearly shifted toward JSON over the years, and most modern/competing systems 
> also use JSON format now by default.
> In fact, Solr's admin UI even explicitly adds wt=json to the request (by 
> default in the UI) to override the default of wt=xml, so Solr's Admin UI 
> effectively has a different default than the API.
> We have now introduced things like the JSON faceting API, and the new more 
> modern /V2 apis assume JSON for the areas of Solr they cover, so clearly 
> we're moving in the direction of JSON anyway.
> I'd like to propose that we switch the default response writer to JSON (wt=json) 
> instead of XML for Solr 7.0, as this seems to me like the right direction and 
> a good time to make this change with the next major version.
> Based upon feedback from the Lucene Dev's mailing list, we want to:
> 1) Change the default response writer type to "wt=json" and also change to 
> "indent=on" by default
> 2) Make no changes on the update handler side; it already works as desired 
> (it returns the response in the same content-type as the request unless the 
> "wt" is passed in explicitly).
> 3) Keep the /query request handler around since people have already used it 
> for years to do JSON queries
> 4) Add a commented-out "wt=xml" to the solrconfig.xml as a reminder for folks 
> on how to change (back) the response format.
> The default format change, plus the addition of "indent=on" are back compat 
> changes, so we need to make sure we doc those clearly in the CHANGES.txt. 
> There will also need to be significant adjustments to the Solr Ref Guide, 
> Tutorial, etc.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON

2017-05-16 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013015#comment-16013015
 ] 

Trey Grainger commented on SOLR-10494:
--

Hi [~janhoy]. Sorry - I missed your first message last week. Sure - I should be 
able to get a patch posted this weekend.

-Trey

> Switch Solr's Default Response Type from XML to JSON
> 
>
> Key: SOLR-10494
> URL: https://issues.apache.org/jira/browse/SOLR-10494
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (7.0)
>Reporter: Trey Grainger
>Priority: Minor
> Fix For: master (7.0)
>
>
> Solr's default response format is still XML, despite the fact that Solr has 
> supported the JSON response format for over a decade, developer mindshare has 
> clearly shifted toward JSON over the years, and most modern/competing systems 
> also use JSON format now by default.
> In fact, Solr's admin UI even explicitly adds wt=json to the request (by 
> default in the UI) to override the default of wt=xml, so Solr's Admin UI 
> effectively has a different default than the API.
> We have now introduced things like the JSON faceting API, and the new more 
> modern /V2 apis assume JSON for the areas of Solr they cover, so clearly 
> we're moving in the direction of JSON anyway.
> I'd like to propose that we switch the default response writer to JSON (wt=json) 
> instead of XML for Solr 7.0, as this seems to me like the right direction and 
> a good time to make this change with the next major version.
> Based upon feedback from the Lucene Dev's mailing list, we want to:
> 1) Change the default response writer type to "wt=json" and also change to 
> "indent=on" by default
> 2) Make no changes on the update handler side; it already works as desired 
> (it returns the response in the same content-type as the request unless the 
> "wt" is passed in explicitly).
> 3) Keep the /query request handler around since people have already used it 
> for years to do JSON queries
> 4) Add a commented-out "wt=xml" to the solrconfig.xml as a reminder for folks 
> on how to change (back) the response format.
> The default format change, plus the addition of "indent=on" are back compat 
> changes, so we need to make sure we doc those clearly in the CHANGES.txt. 
> There will also need to be significant adjustments to the Solr Ref Guide, 
> Tutorial, etc.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-10494) Switch Solr's Default Response Type from XML to JSON

2017-04-14 Thread Trey Grainger (JIRA)
Trey Grainger created SOLR-10494:


 Summary: Switch Solr's Default Response Type from XML to JSON
 Key: SOLR-10494
 URL: https://issues.apache.org/jira/browse/SOLR-10494
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Affects Versions: master (7.0)
Reporter: Trey Grainger
Priority: Minor
 Fix For: master (7.0)


Solr's default response format is still XML, despite the fact that Solr has 
supported the JSON response format for over a decade, developer mindshare has 
clearly shifted toward JSON over the years, and most modern/competing systems 
also use JSON format now by default.

In fact, Solr's admin UI even explicitly adds wt=json to the request (by 
default in the UI) to override the default of wt=xml, so Solr's Admin UI 
effectively has a different default than the API.

We have now introduced things like the JSON faceting API, and the new more 
modern /V2 apis assume JSON for the areas of Solr they cover, so clearly we're 
moving in the direction of JSON anyway.

I'd like to propose that we switch the default response writer to JSON (wt=json) 
instead of XML for Solr 7.0, as this seems to me like the right direction and a 
good time to make this change with the next major version.

Based upon feedback from the Lucene Dev's mailing list, we want to:
1) Change the default response writer type to "wt=json" and also change to 
"indent=on" by default
2) Make no changes on the update handler side; it already works as desired (it 
returns the response in the same content-type as the request unless the "wt" is 
passed in explicitly).
3) Keep the /query request handler around since people have already used it for 
years to do JSON queries
4) Add a commented-out "wt=xml" to the solrconfig.xml as a reminder for folks 
on how to change (back) the response format.

The default format change, plus the addition of "indent=on" are back compat 
changes, so we need to make sure we doc those clearly in the CHANGES.txt. There 
will also need to be significant adjustments to the Solr Ref Guide, Tutorial, 
etc.
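
To make items 1 and 4 above concrete, the relevant solrconfig.xml defaults could
end up looking roughly like the following sketch (handler name and exact
placement are illustrative, not final):

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <!-- wt now defaults to json and indent to on; uncomment the following
           line to switch responses back to XML -->
      <!-- <str name="wt">xml</str> -->
    </lst>
  </requestHandler>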



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9529) Dates Dynamic Field Inconsistently Defined in Schemas

2016-09-17 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15500294#comment-15500294
 ] 

Trey Grainger commented on SOLR-9529:
-

Hmm... things were more inconsistent than I thought. There were two fundamental 
kinds of inconsistencies:
1) Inconsistencies within a single schema.
This is what I described in the issue description regarding "*_dts" being 
handled incorrectly. I submitted a pull request to fix this in the three places 
where we actually define both singular and plural field types: 
solr/example/files/conf/managed-schema
solr/server/solr/configsets/basic_configs/conf/managed-schema
solr/server/solr/configsets/data_driven_schema_configs/conf/managed-schema

2) Inconsistencies across different schemas
While the three schemas listed above all separate out single valued and 
multiValued dynamic fields into different singular and plural field types, 
every other schema that ships with Solr only defines a single field type 
(string, boolean, etc.) and uses the dynamic field definition to determine 
whether the dynamic field should be single or multivalued. This works fine, of 
course, but is just inconsistent depending upon which schema file you actually 
end up using. 

Interestingly, the tech products example 
(solr/server/solr/configsets/sample_techproducts_configs/conf/managed-schema), 
which sits at the same level as the basic_configs and the 
data_driven_schema_configs, for some reason handles these definitions 
differently, only defining one field type for both single and multivalued 
fields (for all types). The following places do the same thing:
 
solr/core/src/test-files/solr/collection1/conf/schema-distrib-interval-faceting.xml
 solr/core/src/test-files/solr/collection1/conf/schema-docValuesFaceting.xml
 solr/core/src/test-files/solr/collection1/conf/schema-docValuesJoin.xml
 
solr/core/src/test-files/solr/collection1/conf/schema-non-stored-docvalues.xml
 solr/core/src/test-files/solr/collection1/conf/schema_latest.xml
 solr/example/example-DIH/solr/db/conf/managed-schema
 solr/example/example-DIH/solr/mail/conf/managed-schema
 solr/example/example-DIH/solr/rss/conf/managed-schema
 solr/example/example-DIH/solr/solr/conf/managed-schema
 solr/example/example-DIH/solr/tika/conf/managed-schema

So while my pull request fixes #1 so that all schemas are consistent with 
themselves, we still have inconsistency across the various schemas that ship 
with Solr in terms of what we name the field types for multivalued dynamic 
fields. If we are going to make these consistent, which way should we go: have 
a single field type for all single and multivalued fields (and define 
multiValued="true" on the dynamic field definition instead), or separate out 
plural versions of the field type (booleans, strings, etc.) for multivalued 
fields?
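
To illustrate, the two conventions look roughly like this for "string" (attribute
lists trimmed; not copied verbatim from any shipped schema):

Option A - one field type, with multiValued set on the dynamicField:
  <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
  <dynamicField name="*_s"  type="string" indexed="true" stored="true"/>
  <dynamicField name="*_ss" type="string" indexed="true" stored="true" multiValued="true"/>

Option B - separate singular and plural field types:
  <fieldType name="string"  class="solr.StrField" sortMissingLast="true"/>
  <fieldType name="strings" class="solr.StrField" sortMissingLast="true" multiValued="true"/>
  <dynamicField name="*_s"  type="string"  indexed="true" stored="true"/>
  <dynamicField name="*_ss" type="strings" indexed="true" stored="true"/>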

> Dates Dynamic Field Inconsistently Defined in Schemas
> -
>
> Key: SOLR-9529
> URL: https://issues.apache.org/jira/browse/SOLR-9529
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Trey Grainger
>Priority: Minor
>
> There is a nice convention across all of the schemas that ship with Solr to 
> include field types for single valued fields (i.e. "string" -> "*_s", 
> "boolean" -> "*_b") and separate field types for multivalued fields (i.e. 
> "strings" -> "*_ss", "booleans" -> "*_bs"). Those definitions all follow the 
> pattern (using "string" as an example):
> 
> [XML fieldType/dynamicField definitions stripped by the mail archive]
> 
> 
> For some reason, however, the "date" field type doesn't follow this pattern, 
> and is instead defined (inconsistently) as follows:
> [XML definitions for the "date"/"dates" field types and the "*_dt"/"*_dts" 
> dynamicFields stripped by the mail archive]
> Note specifically that the "*_dts" field should instead be referencing the 
> "dates" type and not the "date" type, and that consequently the 
> multiValued="true" setting would become unnecessary on the "*_dts" 
> dynamicField definition.
> I'll get a patch posted for this. Note that nothing is functionally broken; 
> it's just inconsistent and could be confusing for someone looking through the 
> schema or seeing their multivalued dates getting indexed into the field type 
> defined for single-valued dates.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-9529) Dates Dynamic Field Inconsistently Defined in Schemas

2016-09-17 Thread Trey Grainger (JIRA)
Trey Grainger created SOLR-9529:
---

 Summary: Dates Dynamic Field Inconsistently Defined in Schemas
 Key: SOLR-9529
 URL: https://issues.apache.org/jira/browse/SOLR-9529
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Trey Grainger
Priority: Minor


There is a nice convention across all of the schemas that ship with Solr to 
include field types for single valued fields (i.e. "string" -> "*_s", "boolean" 
-> "*_b") and separate field types for multivalued fields (i.e. "strings" -> 
"*_ss", "booleans" -> "*_bs"). Those definitions all follow the pattern (using 
"string" as an example):






For some reason, however, the "date" field type doesn't follow this pattern, 
and is instead defined (inconsistently) as follows:

[XML definitions for the "date"/"dates" field types and the "*_dt"/"*_dts" 
dynamicFields stripped by the mail archive]

Note specifically that the "*_dts" field should instead be referencing the 
"dates" type and not the "date" type, and that consequently the 
multiValued="true" setting would become unnecessary on the "*_dts" dynamicField 
definition.

I'll get a patch posted for this. Note that nothing is functionally broken; 
it's just inconsistent and could be confusing for someone looking through the 
schema or seeing their multivalued dates getting indexed into the field type 
defined for single-valued dates.
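
An approximate reconstruction of the definitions being described (the XML above
was stripped by the mail archive; attributes are illustrative, not verbatim):

  <fieldType name="date"  class="solr.TrieDateField" precisionStep="0"/>
  <fieldType name="dates" class="solr.TrieDateField" multiValued="true" precisionStep="0"/>
  <dynamicField name="*_dt"  type="date" indexed="true" stored="true"/>
  <dynamicField name="*_dts" type="date" multiValued="true" indexed="true" stored="true"/>

The proposed fix is simply to have the plural dynamic field reference the plural
type:

  <dynamicField name="*_dts" type="dates" indexed="true" stored="true"/>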



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9480) Graph Traversal for Significantly Related Terms (Semantic Knowledge Graph)

2016-09-17 Thread Trey Grainger (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trey Grainger updated SOLR-9480:

Attachment: SOLR-9480.patch

Initial patch to get the ball rolling here. The feature should now work as 
described in the reference links in the description. The only real changes are 
an update from Solr 5.1.0 to master and cleanup of most of the precommit issues.

Still plenty of work to do, particularly in reworking some of the 
multi-threading code to follow Solr conventions, reducing the number of files 
for helper classes, and eventually getting this working correctly in 
distributed mode (it was originally built for use cases involving a single Solr 
core as a "representative model"). It would also be good to put together a 
getting-started tutorial with example data so it's easier to get started with 
the feature and do something interesting out of the box.

Will continue working on those items as I'm able. Feedback welcome.

> Graph Traversal for Significantly Related Terms (Semantic Knowledge Graph)
> --
>
> Key: SOLR-9480
> URL: https://issues.apache.org/jira/browse/SOLR-9480
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Trey Grainger
> Attachments: SOLR-9480.patch
>
>
> This issue is to track the contribution of the Semantic Knowledge Graph Solr 
> Plugin (request handler), which exposes a graph-like interface for 
> discovering and traversing significant relationships between entities within 
> an inverted index.
> This data model has been described in the following research paper: [The 
> Semantic Knowledge Graph: A compact, auto-generated model for real-time 
> traversal and ranking of any relationship within a 
> domain|https://arxiv.org/abs/1609.00464], as well as in presentations I gave 
> in October 2015 at [Lucene/Solr 
> Revolution|http://www.slideshare.net/treygrainger/leveraging-lucenesolr-as-a-knowledge-graph-and-intent-engine]
>  and November 2015 at the [Bay Area Search 
> Meetup|http://www.treygrainger.com/posts/presentations/searching-on-intent-knowledge-graphs-personalization-and-contextual-disambiguation/].
> The source code for this project is currently available at 
> [https://github.com/careerbuilder/semantic-knowledge-graph], and the folks at 
> CareerBuilder (where this was built) have given me the go-ahead to now 
> contribute this back to the Apache Solr Project, as well.
> Check out the Github repository, research paper, or presentations for a more 
> detailed description of this contribution. Initial patch coming soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-9480) Graph Traversal for Significantly Related Terms (Semantic Knowledge Graph)

2016-09-05 Thread Trey Grainger (JIRA)
Trey Grainger created SOLR-9480:
---

 Summary: Graph Traversal for Significantly Related Terms (Semantic 
Knowledge Graph)
 Key: SOLR-9480
 URL: https://issues.apache.org/jira/browse/SOLR-9480
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Trey Grainger


This issue is to track the contribution of the Semantic Knowledge Graph Solr 
Plugin (request handler), which exposes a graph-like interface for discovering 
and traversing significant relationships between entities within an inverted 
index.

This data model has been described in the following research paper: [The 
Semantic Knowledge Graph: A compact, auto-generated model for real-time 
traversal and ranking of any relationship within a 
domain|https://arxiv.org/abs/1609.00464], as well as in presentations I gave in 
October 2015 at [Lucene/Solr 
Revolution|http://www.slideshare.net/treygrainger/leveraging-lucenesolr-as-a-knowledge-graph-and-intent-engine]
 and November 2015 at the [Bay Area Search 
Meetup|http://www.treygrainger.com/posts/presentations/searching-on-intent-knowledge-graphs-personalization-and-contextual-disambiguation/].

The source code for this project is currently available at 
[https://github.com/careerbuilder/semantic-knowledge-graph], and the folks at 
CareerBuilder (where this was built) have given me the go-ahead to now 
contribute this back to the Apache Solr Project, as well.

Check out the Github repository, research paper, or presentations for a more 
detailed description of this contribution. Initial patch coming soon.
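
Since this is packaged as a Solr request handler plugin, wiring it into a core
would presumably look something like the solrconfig.xml registration below; the
handler path and class name are hypothetical placeholders (see the GitHub
repository above for the real ones):

  <!-- handler path and class are hypothetical placeholders -->
  <requestHandler name="/skg" class="org.example.skg.SemanticKnowledgeGraphHandler"/>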



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6492) Solr field type that supports multiple, dynamic analyzers

2016-06-21 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15343254#comment-15343254
 ] 

Trey Grainger commented on SOLR-6492:
-

Hi [~krantiparisa] and [~dannytei1]. Apologies for the long lapse without a 
response on this issue. I won't get into the reasons here (combination of 
personal and professional commitments), but I just wanted to say that I expect 
to pick this issue back up in the near future and continue work on this patch.

In the meantime, I have added an ASL 2.0 license to the current code (from Solr 
in Action) so that folks can feel free to use what's there now: 
https://github.com/treygrainger/solr-in-action/tree/master/src/main/java/sia/ch14

I'll turn what's there now into a patch, update it to Solr trunk, and keep 
iterating on it until the folks commenting on this issue are satisfied with the 
design and capabilities. Stay tuned...

> Solr field type that supports multiple, dynamic analyzers
> -
>
> Key: SOLR-6492
> URL: https://issues.apache.org/jira/browse/SOLR-6492
> Project: Solr
>  Issue Type: New Feature
>  Components: Schema and Analysis
>Reporter: Trey Grainger
> Fix For: 5.0
>
>
> A common request - particularly for multilingual search - is to be able to 
> support one or more dynamically-selected analyzers for a field. For example, 
> someone may have a "content" field and pass in a document in Greek (using an 
> Analyzer with Tokenizer/Filters for Greek), a separate document in English 
> (using an English Analyzer), and possibly even a field with mixed-language 
> content in Greek and English. This latter case could pass the content 
> separately through both an analyzer defined for Greek and another Analyzer 
> defined for English, stacking or concatenating the token streams based upon 
> the use-case.
> There are some distinct advantages in terms of index size and query 
> performance which can be obtained by stacking terms from multiple analyzers 
> in the same field instead of duplicating content in separate fields and 
> searching across multiple fields. 
> Other non-multilingual use cases may include things like switching to a 
> different analyzer for the same field to remove a feature (i.e. turning 
> on/off query-time synonyms against the same field on a per-query basis).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9241) Rebalance API for SolrCloud

2016-06-21 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15343228#comment-15343228
 ] 

Trey Grainger commented on SOLR-9241:
-

I'm also very excited to see this patch. For the next evolution of Solr's 
scalability (and ultimately auto-scaling), these are exactly the kinds of core 
capabilities we need for seamlessly scaling up/down, resharding, and 
redistributing shards and replicas across a cluster. 

The smart merge looks interesting - it seems like effectively a way to index 
into a larger number of shards (for indexing throughput) while merging them 
into a smaller number of shards for searching, enabling indexing and searching 
resources to be scaled independently.
Near-Realtime Searching, but I'd be curious to hear more explanation about how 
this works in practice for SolrCloud clusters that don't need NRT search.

Agreed with Joel's comments about the update to trunk vs. 4.6.1. One thing that 
seems to have been added since 4.6.1 that probably overlaps with this patch is 
the Replica Placement Strategies (SOLR-6220) vs. the Allocation Strategies 
implemented here.

The rest of the patch seems to be all new objects that don't overlap much with 
the current code base. It would be interesting to know how much has changed 
between 4.6.1 and 6.1 collections/SolrCloud-wise that would create conflicts 
with this patch. I'm obviously hoping not too much...

Either way, very excited about the contribution and about the potential for 
getting these capabilities integrated into Solr.

> Rebalance API for SolrCloud
> ---
>
> Key: SOLR-9241
> URL: https://issues.apache.org/jira/browse/SOLR-9241
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Affects Versions: 4.6.1
> Environment: Ubuntu, Mac OsX
>Reporter: Nitin Sharma
>  Labels: Cluster, SolrCloud
> Fix For: 4.6.1
>
> Attachments: rebalance.diff
>
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>
> This is the v1 of the patch for Solrcloud Rebalance api (as described in 
> http://engineering.bloomreach.com/solrcloud-rebalance-api/) , built at 
> Bloomreach by Nitin Sharma and Suruchi Shah. The goal of the API  is to 
> provide a zero downtime mechanism to perform data manipulation and  efficient 
> core allocation in solrcloud. This API was envisioned to be the base layer 
> that enables Solrcloud to be an auto scaling platform. (and work in unison 
> with other complementing monitoring and scaling features).
> Patch Status:
> ===
> The patch is work in progress and incremental. We have done a few rounds of 
> code clean up. We wanted to get the patch going first to get initial feed 
> back.  We will continue to work on making it more open source friendly and 
> easily testable.
>  Deployment Status:
> 
> The platform is deployed in production at bloomreach and has been battle 
> tested for large scale load. (millions of documents and hundreds of 
> collections).
>  Internals:
> =
> The internals of the API and performance : 
> http://engineering.bloomreach.com/solrcloud-rebalance-api/
> It is built on top of the admin collections API as an action (with various 
> flavors). At a high level, the rebalance api provides 2 constructs:
> Scaling Strategy:  Decides how to move the data.  Every flavor has multiple 
> options which can be reviewed in the api spec.
> Re-distribute  - Move around data in the cluster based on capacity/allocation.
> Auto Shard  - Dynamically shard a collection to any size.
> Smart Merge - Distributed Mode - Helps merging data from a larger shard setup 
> into smaller one.  (the source should be divisible by destination)
> Scale up -  Add replicas on the fly
> Scale Down - Remove replicas on the fly
> Allocation Strategy:  Decides where to put the data.  (Nodes with least 
> cores, Nodes that do not have this collection etc). Custom implementations 
> can be built on top as well. One other example is Availability Zone aware. 
> Distribute data such that every replica is placed on different availability 
> zone to support HA.
>  Detailed API Spec:
> 
>   https://github.com/bloomreach/solrcloud-rebalance-api
>  Contributors:
> =
>   Nitin Sharma
>   Suruchi Shah
>  Questions/Comments:
> =
>   You can reach me at nitin.sha...@bloomreach.com



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-8626) [ANGULAR] 404 error when clicking nodes in cloud graph view

2016-03-19 Thread Trey Grainger (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trey Grainger updated SOLR-8626:

Attachment: SOLR-8626.patch

Attached a patch which fixes this issue. The issue existed in both the flat 
graph view and the radial view. Additionally, when one was in the radial view 
and clicked on the link for a node, it would switch back to the flat graph view 
when navigating to the other node, so I fixed that as well so that it preserves 
the user's current view type on the URL when navigating between nodes.

> [ANGULAR] 404 error when clicking nodes in cloud graph view
> ---
>
> Key: SOLR-8626
> URL: https://issues.apache.org/jira/browse/SOLR-8626
> Project: Solr
>  Issue Type: Bug
>  Components: UI
>Reporter: Jan Høydahl
>Assignee: Upayavira
> Attachments: SOLR-8626.patch
>
>
> h3. Reproduce:
> # {{bin/solr start -c}}
> # {{bin/solr create -c mycoll}}
> # Goto http://localhost:8983/solr/#/~cloud
> # Click a collection name in the graph -> 404 error. URL: 
> {{/solr/mycoll/#/~cloud}}
> # Click a shard name in the graph -> 404 error. URL: {{/solr/shard1/#/~cloud}}
> Only verified in Trunk, but probably exists in 5.4 as well



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6492) Solr field type that supports multiple, dynamic analyzers

2015-03-03 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14346292#comment-14346292
 ] 

Trey Grainger commented on SOLR-6492:
-

Hi Kranti,

The design is almost exactly as you described: analysis chains are defined in 
schema.xml, these chains can be reused between multiple fields, and on each 
field there is a way to conditionally choose the analysis chain. Specifically, 
each analysis chain is just defined as a FieldType, like you would define any 
analysis chain you were going to assign to a field.

What I hadn't considered yet, however, was having the update processor choose 
the analyzers based upon a value in another field. I had previously only been 
considering the case where a user would either:
1) Use an automatic language identifier update processor, or
2) Pass the language in directly in the content of the field (i.e. 
<field name="my_field">en,es|document content here</field>).

Having the ability to specify the key for the analyzers in a different field 
would probably be more user-friendly, and this would be trivial to implement, 
so I can look to add it. Something like this:
<field name="my_field">document content here</field>
<field name="language">en</field>
<field name="language">es</field>

Is that what you were hoping for?

 Solr field type that supports multiple, dynamic analyzers
 -

 Key: SOLR-6492
 URL: https://issues.apache.org/jira/browse/SOLR-6492
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Trey Grainger
 Fix For: 5.0


 A common request - particularly for multilingual search - is to be able to 
 support one or more dynamically-selected analyzers for a field. For example, 
 someone may have a content field and pass in a document in Greek (using an 
 Analyzer with Tokenizer/Filters for Greek), a separate document in English 
 (using an English Analyzer), and possibly even a field with mixed-language 
 content in Greek and English. This latter case could pass the content 
 separately through both an analyzer defined for Greek and another Analyzer 
 defined for English, stacking or concatenating the token streams based upon 
 the use-case.
 There are some distinct advantages in terms of index size and query 
 performance which can be obtained by stacking terms from multiple analyzers 
 in the same field instead of duplicating content in separate fields and 
 searching across multiple fields. 
 Other non-multilingual use cases may include things like switching to a 
 different analyzer for the same field to remove a feature (i.e. turning 
 on/off query-time synonyms against the same field on a per-query basis).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6492) Solr field type that supports multiple, dynamic analyzers

2014-10-30 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14190715#comment-14190715
 ] 

Trey Grainger commented on SOLR-6492:
-

Hi Sharon,

Your question was which code will parse df=someMultiTextField|en,de
and decide which analysis chain to use. In short, since FieldTypes have
access to the schema but Analyzers and Tokenizers don't, I'm creating a new
FieldType which passes the schema into a new Analyzer, which can then pass
the schema into the new Tokenizer. When the Tokenizer is used, the
fieldname (string) and value (reader) are passed in, so it is possible to
pull the metadata (|en,de) off of either of these and dynamically choose
a new analysis chain analyzer from the schema at that time.

I've done this work already for pulling data out of the field content (so I
know that works), but pulling the metadata from the fieldname is still
pending (I'm hoping to work on it this weekend). If you want to see what
I've done thus far, you can look on GitHub at MultiTextField,
MultiTextFieldAnalyzer, and MultiTextFieldTokenizer:
https://github.com/treygrainger/solr-in-action/blob/master/src/main/java/sia/ch14/MultiTextField.java
https://github.com/treygrainger/solr-in-action/blob/master/src/main/java/sia/ch14/MultiTextFieldAnalyzer.java
https://github.com/treygrainger/solr-in-action/blob/master/src/main/java/sia/ch14/MultiTextFieldTokenizer.java

I have some questions / feedback on your proposed solution... I'm hopping
on a plane now but will post them later tonight.

Thanks,

Trey Grainger
Co-author, Solr in Action
Director of Engineering, Search & Analytics @CareerBuilder


On Thu, Oct 30, 2014 at 7:32 AM, Sharon Krisher (JIRA) j...@apache.org



 Solr field type that supports multiple, dynamic analyzers
 -

 Key: SOLR-6492
 URL: https://issues.apache.org/jira/browse/SOLR-6492
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Trey Grainger
 Fix For: 5.0


 A common request - particularly for multilingual search - is to be able to 
 support one or more dynamically-selected analyzers for a field. For example, 
 someone may have a content field and pass in a document in Greek (using an 
 Analyzer with Tokenizer/Filters for Greek), a separate document in English 
 (using an English Analyzer), and possibly even a field with mixed-language 
 content in Greek and English. This latter case could pass the content 
 separately through both an analyzer defined for Greek and another Analyzer 
 defined for English, stacking or concatenating the token streams based upon 
 the use-case.
 There are some distinct advantages in terms of index size and query 
 performance which can be obtained by stacking terms from multiple analyzers 
 in the same field instead of duplicating content in separate fields and 
 searching across multiple fields. 
 Other non-multilingual use cases may include things like switching to a 
 different analyzer for the same field to remove a feature (i.e. turning 
 on/off query-time synonyms against the same field on a per-query basis).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6492) Solr field type that supports multiple, dynamic analyzers

2014-10-30 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191418#comment-14191418
 ] 

Trey Grainger commented on SOLR-6492:
-

Hi Sharon,

In terms of your suggestion, I do think that using local params to pass in
the language could be a more user-friendly solution than requiring them to
put the params on the field name: i.e. q={!langs=en|de}hello world&df=text
vs. q=hello world&df=text|en,de, though the syntax may get a bit weird if
you want to specify different languages for different fields. For example,
if using the edismax query parser, you would need to do something like
q={!langs=text1:en,de|text2:en,zh}hello world&qf=text1 text2 vs. just
q=hello world&qf=text1|en,de text2|en,zh.

For the most simple use-case (every field uses the same language), or for
the use-case where you don't know what fields the user is querying on
up-front, I think the local params syntax would be preferred for end-users.
There is a big down-side to doing this, however: it requires you to
implement a qparser to parse this data and put it somewhere that the
Analyzer can see. This means that your multi-lingual field would only be
searchable with your custom query parser (whereas if the determination of
the language is passed in as part of the field name or content as I
described, it should work seamlessly with all of the query parsers, since
the data gets passed through all the way to the Analyzer).

Your solution with the ThreadLocal storage of the data is interesting...
I'm not positive whether it will work or not (i.e. does the analyzer always
run on the same thread as the incoming request for both queries and
indexing, and will that also continue to be the case into the future)? I
know that threads are at least re-used across requests and that the
TokenStreamComponents for analyzers are re-used in a threadlocal pool, but
that just means you'd have to be very careful about not caching or reusing
languages across requests, not that it couldn't work. Also, just out of
curiosity, how do you plan to pass the languages in at index time?

The Analyzer/Tokenizers only accept the fieldname (string) and the field
content (reader) as parameters, so passing in additional parameters through
a threadlocal seems like a bit of a hack that violates the design there
(though arguably that design is too restrictive and should change). I'd be
curious if anyone else thinks this would work...

Thanks,

Trey Grainger
Co-author, Solr in Action
Director of Engineering, Search & Analytics @CareerBuilder





 Solr field type that supports multiple, dynamic analyzers
 -

 Key: SOLR-6492
 URL: https://issues.apache.org/jira/browse/SOLR-6492
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Trey Grainger
 Fix For: 5.0


 A common request - particularly for multilingual search - is to be able to 
 support one or more dynamically-selected analyzers for a field. For example, 
 someone may have a content field and pass in a document in Greek (using an 
 Analyzer with Tokenizer/Filters for Greek), a separate document in English 
 (using an English Analyzer), and possibly even a field with mixed-language 
 content in Greek and English. This latter case could pass the content 
 separately through both an analyzer defined for Greek and another Analyzer 
 defined for English, stacking or concatenating the token streams based upon 
 the use-case.
 There are some distinct advantages in terms of index size and query 
 performance which can be obtained by stacking terms from multiple analyzers 
 in the same field instead of duplicating content in separate fields and 
 searching across multiple fields. 
 Other non-multilingual use cases may include things like switching to a 
 different analyzer for the same field to remove a feature (i.e. turning 
 on/off query-time synonyms against the same field on a per-query basis).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-6492) Solr field type that supports multiple, dynamic analyzers

2014-09-08 Thread Trey Grainger (JIRA)
Trey Grainger created SOLR-6492:
---

 Summary: Solr field type that supports multiple, dynamic analyzers
 Key: SOLR-6492
 URL: https://issues.apache.org/jira/browse/SOLR-6492
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Trey Grainger
 Fix For: 4.11


A common request - particularly for multilingual search - is to be able to 
support one or more dynamically-selected analyzers for a field. For example, 
someone may have a content field and pass in a document in Greek (using an 
Analyzer with Tokenizer/Filters for Greek), a separate document in English 
(using an English Analyzer), and possibly even a field with mixed-language 
content in Greek and English. This latter case could pass the content 
separately through both an analyzer defined for Greek and another Analyzer 
defined for English, stacking or concatenating the token streams based upon the 
use-case.

There are some distinct advantages in terms of index size and query performance 
which can be obtained by stacking terms from multiple analyzers in the same 
field instead of duplicating content in separate fields and searching across 
multiple fields. 

Other non-multilingual use cases may include things like switching to a 
different analyzer for the same field to remove a feature (i.e. turning on/off 
query-time synonyms against the same field on a per-query basis).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6492) Solr field type that supports multiple, dynamic analyzers

2014-09-08 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126258#comment-14126258
 ] 

Trey Grainger commented on SOLR-6492:
-

I previously implemented this field type when writing chapter 14 of _Solr in 
Action_, but I would like to make some improvements and then submit the code 
back to Solr to (hopefully) be committed. The current code from _Solr in 
Action_ can be found here:
[https://github.com/treygrainger/solr-in-action/tree/first-edition/src/main/java/sia/ch14]

To use the current version, you would do the following:
1) Add the following to schema.xml:
  <fieldType name="multiText"
             class="sia.ch14.MultiTextField" sortMissingLast="true"
             defaultFieldType="text_general"
             fieldMappings="en:text_english,
                            es:text_spanish,
                            fr:text_french,
                            de:text_german"/> *

  <field name="someMultiTextField" type="multiText" indexed="true" 
         multiValued="true" />

  *note that "text_spanish", "text_english", "text_french", and "text_german" 
refer to field types which are defined elsewhere in the schema.xml.

2) Index a document with a field containing multilingual text using syntax like 
one of the following:
  <field name="someMultiTextField">some text</field> **
  <field name="someMultiTextField">en|some text</field>
  <field name="someMultiTextField">es|some more text</field>
  <field name="someMultiTextField">de,fr|some other text</field>

  **uses the default analyzer

3) Submit a query specifying which language you want to query in:
  /select?q=someMultiTextField:en,de|keyword_goes_here

--

Improvements to be made before the patch is finalized:
1) Make it possible to specify the field type mappings in the field name 
instead of the field value:
  <field name="someMultiTextField">de,fr|some other text</field>
  /select?q=a bunch of keywords here&df=someMultiTextField|en,de

This makes querying easier, because the languages can be detected prior to 
parsing of the query, which prevents prefixes from having to be substituted on 
each query term (which is cost-prohibitive for most because it effectively 
means pre-parsing the query before it goes to Solr).

2) Enable support for switching between stacking token streams from each 
analyzer (good default because it mostly respects position increments across 
languages and minimizes duplicate tokens in the index) and concatenating token 
streams.

3) Possibly add the ability to switch analyzers in the middle of input text:
<field name="someMultiTextField">de,fr|some other el|text</field>

4) Extensive unit testing

 Solr field type that supports multiple, dynamic analyzers
 -

 Key: SOLR-6492
 URL: https://issues.apache.org/jira/browse/SOLR-6492
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Trey Grainger
 Fix For: 4.11


 A common request - particularly for multilingual search - is to be able to 
 support one or more dynamically-selected analyzers for a field. For example, 
 someone may have a content field and pass in a document in Greek (using an 
 Analyzer with Tokenizer/Filters for Greek), a separate document in English 
 (using an English Analyzer), and possibly even a field with mixed-language 
 content in Greek and English. This latter case could pass the content 
 separately through both an analyzer defined for Greek and another Analyzer 
 defined for English, stacking or concatenating the token streams based upon 
 the use-case.
 There are some distinct advantages in terms of index size and query 
 performance which can be obtained by stacking terms from multiple analyzers 
 in the same field instead of duplicating content in separate fields and 
 searching across multiple fields. 
 Other non-multilingual use cases may include things like switching to a 
 different analyzer for the same field to remove a feature (i.e. turning 
 on/off query-time synonyms against the same field on a per-query basis).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-6492) Solr field type that supports multiple, dynamic analyzers

2014-09-08 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126258#comment-14126258
 ] 

Trey Grainger edited comment on SOLR-6492 at 9/8/14 11:55 PM:
--

I previously implemented this field type when writing chapter 14 of _Solr in 
Action_, but I would like to make some improvements and then submit the code 
back to Solr to (hopefully) be committed. The current code from _Solr in 
Action_ can be found here:
[https://github.com/treygrainger/solr-in-action/tree/first-edition/src/main/java/sia/ch14]

To use the current version, you would do the following:
1) Add the following to schema.xml:
  <fieldType name="multiText"
             class="sia.ch14.MultiTextField" sortMissingLast="true"
             defaultFieldType="text_general"
             fieldMappings="en:text_english,
                            es:text_spanish,
                            fr:text_french,
                            de:text_german"/> *

  <field name="someMultiTextField" type="multiText" indexed="true"
         multiValued="true" />

  *note that text_spanish, text_english, text_french, and text_german 
refer to field types which are defined elsewhere in the schema.xml.

2) Index a document with a field containing multilingual text using syntax like 
one of the following:
  <field name="someMultiTextField">some text</field> **
  <field name="someMultiTextField">en|some text</field>
  <field name="someMultiTextField">es|some more text</field>
  <field name="someMultiTextField">de,fr|some other text</field>

  **uses the default analyzer

3) submit a query specifying which language you want to query in:
  /select?q=someMultiTextField:en,de|keyword_goes_here

--

Improvements to be made before the patch is finalized:
1) Make it possible to specify the field type mappings in the field name 
instead of the field value:
  <field name="someMultiTextField|de,fr">some other text</field>
  /select?q=a bunch of keywords here&df=someMultiTextField|en,de

This makes querying easier, because the languages can be detected prior to 
parsing of the query, which prevents prefixes from having to be substituted on 
each query term (which is cost-prohibitive for most use cases because it 
effectively means pre-parsing the query before it goes to Solr).

2) Enable support for switching between stacking token streams from each 
analyzer (good default because it mostly respects position increments across 
languages and minimizes duplicate tokens in the index) and concatenating token 
streams.

3) Possibly add the ability to switch analyzers in the middle of input text:
<field name="someMultiTextField">de,fr|some other el|text</field>

4) Extensive unit testing


was (Author: solrtrey):
I previously implemented this field type when writing chapter 14 of _Solr in 
Action_, but I would like to make some improvements and then submit the code 
back to Solr to (hopefully) be committed. The current code from _Solr in 
Action_ can be found here:
[https://github.com/treygrainger/solr-in-action/tree/first-edition/src/main/java/sia/ch14]

To use the current version, you would do the following:
1) Add the following to schema.xml:
  <fieldType name="multiText"
             class="sia.ch14.MultiTextField" sortMissingLast="true"
             defaultFieldType="text_general"
             fieldMappings="en:text_english,
                            es:text_spanish,
                            fr:text_french,
                            de:text_german"/> *

  <field name="someMultiTextField" type="multiText" indexed="true"
         multiValued="true" />

  *note that text_spanish, text_english, text_french, and text_german 
refer to field types which are defined elsewhere in the schema.xml.

2) Index a document with a field containing multilingual text using syntax like 
one of the following:
  <field name="someMultiTextField">some text</field> **
  <field name="someMultiTextField">en|some text</field>
  <field name="someMultiTextField">es|some more text</field>
  <field name="someMultiTextField">de,fr|some other text</field>

  **uses the default analyzer

3) submit a query specifying which language you want to query in:
  /select?q=someMultiTextField:en,de|keyword_goes_here

--

Improvements to be made before the patch is finalized:
1) Make it possible to specify the field type mappings in the field name 
instead of the field value:
  <field name="someMultiTextField">de,fr|some other text</field>
  /select?q=a bunch of keywords here&df=someMultiTextField|en,de

This makes querying easier, because the languages can be detected prior to 
parsing of the query, which prevents prefixes from having to be substituted on 
each query term (which is cost-prohibitive for most use cases because it 
effectively means pre-parsing the query before it goes to Solr).

2) Enable support for switching between stacking token streams from each 
analyzer (good default because it mostly respects position increments across 
languages and minimizes duplicate tokens in the index) and 

[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting

2014-04-24 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980597#comment-13980597
 ] 

Trey Grainger commented on SOLR-2894:
-

[~markrmil...@gmail.com] said:
We should get this in to get more feedback. Wish I had some time to tackle 
it, but I won't in the near term. 

Is there a committer who has interest in this issue and would be willing to 
look it over for (hopefully) getting it pushed into trunk?  It's the top-voted 
and top-watched issue in Solr right now, so there's clearly a lot of 
community interest. Thanks!

 Implement distributed pivot faceting
 

 Key: SOLR-2894
 URL: https://issues.apache.org/jira/browse/SOLR-2894
 Project: Solr
  Issue Type: Improvement
Reporter: Erik Hatcher
 Fix For: 4.9, 5.0

 Attachments: SOLR-2894-reworked.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, dateToObject.patch


 Following up on SOLR-792, pivot faceting currently only supports 
 undistributed mode.  Distributed pivot faceting needs to be implemented.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting

2014-04-24 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980662#comment-13980662
 ] 

Trey Grainger commented on SOLR-2894:
-

Hi [~otis], I appreciate your interest here. That's correct: no previously 
working behavior was changed, and there are two things added with this patch: 
1) distributed support, and 2) support for single-level pivot facets (this 
previously threw an exception but is now supported: 
facet.pivot=aSingleFieldName).

For context on #2, we found no good reason to disallow a single-level pivot 
facet (functions like a field facet but with the pivot facet output format), 
it made implementing distributed pivot faceting easier since a single level 
could be considered when refining, and there was work in some downstream issues 
like SOLR-3583 (adding percentiles and other stats to pivot facets) which was 
dependent upon being able to easily alternate between any number of facet 
levels for analytics purposes, so we just added the support for a single level. 
This also makes it easier to build analytics tools without having to 
arbitrarily alternate between field facets and pivot facets and their 
corresponding output formats based upon the number of levels.

The end result is that no previously working capabilities have been modified, 
but distributed support for any number of pivot levels has been added.
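
For anyone skimming this issue, the single-level syntax being discussed is simply 
(field names here are placeholders):

  facet=true&facet.pivot=category

versus a multi-level pivot such as facet.pivot=category,manufacturer; with this 
patch both forms work in distributed mode and return the pivot facet output format.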

 Implement distributed pivot faceting
 

 Key: SOLR-2894
 URL: https://issues.apache.org/jira/browse/SOLR-2894
 Project: Solr
  Issue Type: Improvement
Reporter: Erik Hatcher
 Fix For: 4.9, 5.0

 Attachments: SOLR-2894-reworked.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, dateToObject.patch


 Following up on SOLR-792, pivot faceting currently only supports 
 undistributed mode.  Distributed pivot faceting needs to be implemented.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-2894) Implement distributed pivot faceting

2014-04-24 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980662#comment-13980662
 ] 

Trey Grainger edited comment on SOLR-2894 at 4/25/14 3:54 AM:
--

Hi Otis, I appreciate your interest here. That's correct: no previously working 
behavior was changed, and there are two things added with this patch: 1) 
distributed support, and 2) support for single-level pivot facets (this 
previously threw an exception but is now supported: 
facet.pivot=aSingleFieldName).

For context on #2, we found no good reason to disallow a single-level pivot 
facet (functions like a field facet but with the pivot facet output format), it 
made implementing distributed pivot faceting easier since a single level could 
be considered when refining, and there was work in some downstream issues like 
SOLR-3583 (adding percentiles and other stats to pivot facets) which was 
dependent upon being able to easily alternate between any number of facet 
levels for analytics purposes, so we just added the support for a single level. 
This also makes it easier to build analytics tools without having to 
arbitrarily alternate between field facets and pivot facets and their 
corresponding output formats based upon the number of levels.

The end result is that no previously working capabilities have been modified, 
but distributed support for any number of pivot levels has been added, which 
should make this safe to commit to trunk.


was (Author: solrtrey):
Hi [~otis], I appreciate your interest here. That's correct: no previously 
working behavior was changed, and there are two things added with this patch: 
1) distributed support, and 2) support for single-level pivot facets (this 
previously threw an exception but is now supported: 
facet.pivot=aSingleFieldName).

For context on #2, we found no good reason to disallow a single-level pivot 
facet (functions like to a field facet but with the pivot facet output format), 
it made implementing distributed pivot faceting easier since a single level 
could be considered when refining, and there was work in some downstream issues 
like SOLR-3583 (adding percentiles and other stats to pivot facets) which was 
dependent upon being able to easily alternate between any number of facet 
levels for analytics purposes, so we just added the support for a single level. 
This also makes it easier to build analytics tools without having to 
arbitrarily alternate between field facets and pivot facets and their 
corresponding output formats based upon the number of levels.

The end result is that no previously working capabilities have been modified, 
but distributed support for any number of pivot levels has been added.

 Implement distributed pivot faceting
 

 Key: SOLR-2894
 URL: https://issues.apache.org/jira/browse/SOLR-2894
 Project: Solr
  Issue Type: Improvement
Reporter: Erik Hatcher
 Fix For: 4.9, 5.0

 Attachments: SOLR-2894-reworked.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, dateToObject.patch


 Following up on SOLR-792, pivot faceting currently only supports 
 undistributed mode.  Distributed pivot faceting needs to be implemented.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting

2014-04-17 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973094#comment-13973094
 ] 

Trey Grainger commented on SOLR-2894:
-

After nearly 2 years of on-and-off development, I think this patch is finally 
ready to be committed. Brett's most recent patch includes significant 
performance improvements as well as fixes to all of the reported issues and 
edge cases mentioned by the others currently using this patch. We have just 
finished a large spike of work to get this ready for commit, so I'd love to get 
it pushed in soon unless there are any objections.

[~ehatcher], do you have any time to review this for suitability to be 
committed (since you are the reporter)? If there is anything additional that 
needs to be changed, I'll happily sign us up (either myself or someone on my 
team at CareerBuilder) to do it if that will help.

 Implement distributed pivot faceting
 

 Key: SOLR-2894
 URL: https://issues.apache.org/jira/browse/SOLR-2894
 Project: Solr
  Issue Type: Improvement
Reporter: Erik Hatcher
 Fix For: 4.9, 5.0

 Attachments: SOLR-2894-reworked.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, dateToObject.patch


 Following up on SOLR-792, pivot faceting currently only supports 
 undistributed mode.  Distributed pivot faceting needs to be implemented.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5856) Add new Solr book to the Solr homepage

2014-03-18 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939836#comment-13939836
 ] 

Trey Grainger commented on SOLR-5856:
-

Hi Steve - thanks so much for getting this committed so quickly! Everything
looks great, except that the 4-book layout in the slideshow doesn't render
well for me in Chrome on either Windows or Mac (the fourth book wraps to
the next line). IE, Firefox, and Safari all looked good, though.
https://www.dropbox.com/s/hkcz8xzxtgfvexw/4Books.png

I'd guess other Chrome users are likely seeing the same thing.






 Add new Solr book to the Solr homepage
 --

 Key: SOLR-5856
 URL: https://issues.apache.org/jira/browse/SOLR-5856
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Affects Versions: 4.7
 Environment: https://lucene.apache.org/solr/
Reporter: Trey Grainger
Assignee: Steve Rowe
Priority: Minor
 Fix For: 4.8

 Attachments: SOLR-5856.patch, SOLR-5856.patch, book_sia.png


 A new Solr book (Solr in Action) has just been published by Manning 
 publications (release date 3/15). I am providing the patch to update the 
 website pages corresponding to the slideshow on 
 https://lucene.apache.org/solr/ and https://lucene.apache.org/solr/books.html 
 . The patch has updates to html/text files and there is a binary image file 
 as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5856) Add new Solr book to the Solr homepage

2014-03-13 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13933162#comment-13933162
 ] 

Trey Grainger commented on SOLR-5856:
-

Hi Alexandre,

I agree with you. It looks like there are two Solr 3.x books, and the older one 
has already been cut from the rotating slideshow. At this point, I 
think the other 3.x book is going to have to be bumped. The good news is that 
those authors are working on a 4.x refresh that should be released in a few 
months, so they'll likely be back up there soon.

Of course, all of the books are still on the books page, just not in the 
"Latest books published about Apache Solr" list in the header slideshow.

The patch I included bumps the 3.x book and inserts Solr in Action.

 Add new Solr book to the Solr homepage
 --

 Key: SOLR-5856
 URL: https://issues.apache.org/jira/browse/SOLR-5856
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Affects Versions: 4.7
 Environment: https://lucene.apache.org/solr/
Reporter: Trey Grainger
Priority: Minor
 Fix For: 4.7

 Attachments: SOLR-5856.patch, book_sia.png


 A new Solr book (Solr in Action) has just been published by Manning 
 publications (release date 3/15). I am providing the patch to update the 
 website pages corresponding to the slideshow on 
 https://lucene.apache.org/solr/ and https://lucene.apache.org/solr/books.html 
 . The patch has updates to html/text files and there is a binary image file 
 as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5856) Add new Solr book to the Solr homepage

2014-03-13 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934381#comment-13934381
 ] 

Trey Grainger commented on SOLR-5856:
-

That makes sense... I agree that it is probably a better user experience to link 
to the books page. I'll update all of the slideshow links to point to the books 
page and resubmit the patch shortly.

 Add new Solr book to the Solr homepage
 --

 Key: SOLR-5856
 URL: https://issues.apache.org/jira/browse/SOLR-5856
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Affects Versions: 4.7
 Environment: https://lucene.apache.org/solr/
Reporter: Trey Grainger
Priority: Minor
 Fix For: 4.7

 Attachments: SOLR-5856.patch, book_sia.png


 A new Solr book (Solr in Action) has just been published by Manning 
 publications (release date 3/15). I am providing the patch to update the 
 website pages corresponding to the slideshow on 
 https://lucene.apache.org/solr/ and https://lucene.apache.org/solr/books.html 
 . The patch has updates to html/text files and there is a binary image file 
 as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5856) Add new Solr book to the Solr homepage

2014-03-13 Thread Trey Grainger (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trey Grainger updated SOLR-5856:


Attachment: SOLR-5856.patch

This updated patch modifies the slideshow to link to the books.html page as 
opposed to going directly to the Publisher's page (as requested by Hoss and 
Uwe).

In order to make the site more consistent (since we're now making more than 
just the change to add Solr in Action), I also made the images for each of the 
books on the books.html page clickable as links to the publisher's page 
in order to increase the likelihood of a click-through. One of the books 
already did this, but it was missing on the others, and this is one of the 
things visitors are probably most likely to click on to try to get the book.

 Add new Solr book to the Solr homepage
 --

 Key: SOLR-5856
 URL: https://issues.apache.org/jira/browse/SOLR-5856
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Affects Versions: 4.7
 Environment: https://lucene.apache.org/solr/
Reporter: Trey Grainger
Priority: Minor
 Fix For: 4.7

 Attachments: SOLR-5856.patch, SOLR-5856.patch, book_sia.png


 A new Solr book (Solr in Action) has just been published by Manning 
 publications (release date 3/15). I am providing the patch to update the 
 website pages corresponding to the slideshow on 
 https://lucene.apache.org/solr/ and https://lucene.apache.org/solr/books.html 
 . The patch has updates to html/text files and there is a binary image file 
 as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5856) Add new Solr book to the Solr homepage

2014-03-13 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934528#comment-13934528
 ] 

Trey Grainger commented on SOLR-5856:
-

@Alexandre,
Yeah, making the homepage links go to a secondary books page probably will 
detract from both SEO and sales, but it's a better user experience for those 
visiting the Solr homepage, no? One silver lining is that it makes the books 
page more prominent by still having the recent book pictures on the homepage 
linking over to the books page, making it easier to find and compare each of the 
different books.

@Hossman
Thanks for tentatively signing up to commit this. If you see anything else that 
needs changing, please let me know and I'd be happy to put together another 
patch.

 Add new Solr book to the Solr homepage
 --

 Key: SOLR-5856
 URL: https://issues.apache.org/jira/browse/SOLR-5856
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Affects Versions: 4.7
 Environment: https://lucene.apache.org/solr/
Reporter: Trey Grainger
Priority: Minor
 Fix For: 4.7

 Attachments: SOLR-5856.patch, SOLR-5856.patch, book_sia.png


 A new Solr book (Solr in Action) has just been published by Manning 
 publications (release date 3/15). I am providing the patch to update the 
 website pages corresponding to the slideshow on 
 https://lucene.apache.org/solr/ and https://lucene.apache.org/solr/books.html 
 . The patch has updates to html/text files and there is a binary image file 
 as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-5856) Add new Solr book to the Solr homepage

2014-03-12 Thread Trey Grainger (JIRA)
Trey Grainger created SOLR-5856:
---

 Summary: Add new Solr book to the Solr homepage
 Key: SOLR-5856
 URL: https://issues.apache.org/jira/browse/SOLR-5856
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Affects Versions: 4.7
 Environment: https://lucene.apache.org/solr/
Reporter: Trey Grainger
Priority: Minor
 Fix For: 4.7


A new Solr book (Solr in Action) has just been published by Manning 
publications (release date 3/15). I am providing the patch to update the 
website pages corresponding to the slideshow on https://lucene.apache.org/solr/ 
and https://lucene.apache.org/solr/books.html . The patch has updates to 
html/text files and there is a binary image file as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5856) Add new Solr book to the Solr homepage

2014-03-12 Thread Trey Grainger (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trey Grainger updated SOLR-5856:


Attachment: SOLR-5856.patch

Patch attached. Uploading the image separately.

 Add new Solr book to the Solr homepage
 --

 Key: SOLR-5856
 URL: https://issues.apache.org/jira/browse/SOLR-5856
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Affects Versions: 4.7
 Environment: https://lucene.apache.org/solr/
Reporter: Trey Grainger
Priority: Minor
 Fix For: 4.7

 Attachments: SOLR-5856.patch


 A new Solr book (Solr in Action) has just been published by Manning 
 publications (release date 3/15). I am providing the patch to update the 
 website pages corresponding to the slideshow on 
 https://lucene.apache.org/solr/ and https://lucene.apache.org/solr/books.html 
 . The patch has updates to html/text files and there is a binary image file 
 as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5856) Add new Solr book to the Solr homepage

2014-03-12 Thread Trey Grainger (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trey Grainger updated SOLR-5856:


Attachment: book_sia.png

 Add new Solr book to the Solr homepage
 --

 Key: SOLR-5856
 URL: https://issues.apache.org/jira/browse/SOLR-5856
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Affects Versions: 4.7
 Environment: https://lucene.apache.org/solr/
Reporter: Trey Grainger
Priority: Minor
 Fix For: 4.7

 Attachments: SOLR-5856.patch, book_sia.png


 A new Solr book (Solr in Action) has just been published by Manning 
 publications (release date 3/15). I am providing the patch to update the 
 website pages corresponding to the slideshow on 
 https://lucene.apache.org/solr/ and https://lucene.apache.org/solr/books.html 
 . The patch has updates to html/text files and there is a binary image file 
 as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting

2014-02-07 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894547#comment-13894547
 ] 

Trey Grainger commented on SOLR-2894:
-

FYI, the last distributed pivot facet patch functionally works, but there are 
some sub-optimal data structures being used and some unnecessary duplicate 
processing of values.  As a result, we found that for certain worst-case 
scenarios (i.e. data is not randomly distributed across Solr cores and requires 
significant refinement) pivot facets with multiple levels could take over a 
minute to aggregate and process results. This was using a dataset of several 
hundred million documents and dozens of pivot facets across 120 Solr cores 
distributed over 20 servers, so it is a more extreme use-case than most will 
encounter.

Nevertheless, we've refactored the code and data structures and brought the 
processing time from over a minute down to less than a second using the above 
configuration. We plan to post the patch within the next week.

 Implement distributed pivot faceting
 

 Key: SOLR-2894
 URL: https://issues.apache.org/jira/browse/SOLR-2894
 Project: Solr
  Issue Type: Improvement
Reporter: Erik Hatcher
 Fix For: 4.7

 Attachments: SOLR-2894-reworked.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch


 Following up on SOLR-792, pivot faceting currently only supports 
 undistributed mode.  Distributed pivot faceting needs to be implemented.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting

2014-02-07 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895413#comment-13895413
 ] 

Trey Grainger commented on SOLR-2894:
-

Thanks, Yonik. I worked on the architecture and design, but it's really been a 
team effort by several of us at CB. Chris worked with the initial patch, Andrew 
hardened it, and Brett (who will post the next version) focused on the 
soon-to-be-posted performance optimizations. We're deploying the new version to 
production right now to sanity check it before posting the patch, but I think 
the upcoming version will finally be ready for review for committing.

 Implement distributed pivot faceting
 

 Key: SOLR-2894
 URL: https://issues.apache.org/jira/browse/SOLR-2894
 Project: Solr
  Issue Type: Improvement
Reporter: Erik Hatcher
 Fix For: 4.7

 Attachments: SOLR-2894-reworked.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch


 Following up on SOLR-792, pivot faceting currently only supports 
 undistributed mode.  Distributed pivot faceting needs to be implemented.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5027) Field Collapsing PostFilter

2013-12-04 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13839665#comment-13839665
 ] 

Trey Grainger commented on SOLR-5027:
-

Interesting.  I've been playing around with the Collapsing QParser and, because 
of the reason Gabe mentioned, I can think of very few use cases for it in its 
current implementation.  Specifically, because there is no way to break a tie 
between multiple documents with the same value (the way sorting does), a search 
that is sorted by score desc, modifieddt desc (newer documents break the tie) 
is not possible... it just collapses based upon the first document in the index 
with the duplicate score.  Many of my use cases are even trickier... something 
like sort by displaypriority desc, score desc, modifieddt desc.
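
(To illustrate, the kind of request I have in mind is roughly:

  q=*:*&sort=displaypriority desc,score desc,modifieddt desc&fq={!collapse field=groupId}

where groupId, displaypriority, and modifieddt are placeholder field names; today 
the collapsed group head is chosen independently of that sort.)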

Just brainstorming here, but if sorting documents before collapsing is not 
possible (due to where in the code stack the collapsing occurs), then it might 
be possible to just implement a sort function (ValueSource) that gave an 
ordinal score to each document based upon the position it would occur within 
all documents.  If I understand what you mean when you say "group head" 
selection based upon the min/max of the function, then this would effectively 
allow collapsing on sorted values, because the sort function would return higher 
values for documents which would sort higher.  In that case, the sort function 
(which could read in the current sort parameter from the search request) could 
even be the default used by collapsing, since that is probably what users are 
expecting to happen (this is consistent with how grouping works, for example).

Thoughts?

 Field Collapsing PostFilter
 ---

 Key: SOLR-5027
 URL: https://issues.apache.org/jira/browse/SOLR-5027
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 5.0
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 4.6, 5.0

 Attachments: SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, 
 SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, 
 SOLR-5027.patch, SOLR-5027.patch


 This ticket introduces the *CollapsingQParserPlugin* 
 The *CollapsingQParserPlugin* is a PostFilter that performs field collapsing. 
 This is a high performance alternative to standard Solr field collapsing 
 (with *ngroups*) when the number of distinct groups in the result set is high.
 For example in one performance test, a search with 10 million full results 
 and 1 million collapsed groups:
 Standard grouping with ngroups : 17 seconds.
 CollapsingQParserPlugin: 300 milli-seconds.
 Sample syntax:
 Collapse based on the highest scoring document:
 {code}
 fq={!collapse field=field_name}
 {code}
 Collapse based on the min value of a numeric field:
 {code}
 fq={!collapse field=field_name min=field_name}
 {code}
 Collapse based on the max value of a numeric field:
 {code}
 fq={!collapse field=field_name max=field_name}
 {code}
 Collapse with a null policy:
 {code}
 fq={!collapse field=field_name nullPolicy=null_policy}
 {code}
 There are three null policies:
 ignore : removes docs with a null value in the collapse field (default).
 expand : treats each doc with a null value in the collapse field as a 
 separate group.
 collapse : collapses all docs with a null value into a single group using 
 either highest score, or min/max.
 The CollapsingQParserPlugin also fully supports the QueryElevationComponent.
 *Note:*  The July 16 patch also includes an ExpandComponent that expands the 
 collapsed groups for the current search result page. This functionality will 
 be moved to its own ticket.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5027) Field Collapsing PostFilter

2013-12-04 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13839754#comment-13839754
 ] 

Trey Grainger commented on SOLR-5027:
-

Thinking about this more, it's probably going to be hard to implement an 
efficient sort ValueSource, as it would probably have to loop through all 
docs in the index during construction and sort them, caching the sort order for 
all docs so that it is available later when the value for each document is 
asked for separately.

It would probably functionally work, but it seems like there's got to be a 
better way in the Collapse QParser itself...

 Field Collapsing PostFilter
 ---

 Key: SOLR-5027
 URL: https://issues.apache.org/jira/browse/SOLR-5027
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 5.0
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 4.6, 5.0

 Attachments: SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, 
 SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, 
 SOLR-5027.patch, SOLR-5027.patch


 This ticket introduces the *CollapsingQParserPlugin* 
 The *CollapsingQParserPlugin* is a PostFilter that performs field collapsing. 
 This is a high performance alternative to standard Solr field collapsing 
 (with *ngroups*) when the number of distinct groups in the result set is high.
 For example in one performance test, a search with 10 million full results 
 and 1 million collapsed groups:
 Standard grouping with ngroups : 17 seconds.
 CollapsingQParserPlugin: 300 milli-seconds.
 Sample syntax:
 Collapse based on the highest scoring document:
 {code}
 fq={!collapse field=field_name}
 {code}
 Collapse based on the min value of a numeric field:
 {code}
 fq={!collapse field=field_name min=field_name}
 {code}
 Collapse based on the max value of a numeric field:
 {code}
 fq={!collapse field=field_name max=field_name}
 {code}
 Collapse with a null policy:
 {code}
 fq={!collapse field=field_name nullPolicy=null_policy}
 {code}
 There are three null policies:
 ignore : removes docs with a null value in the collapse field (default).
 expand : treats each doc with a null value in the collapse field as a 
 separate group.
 collapse : collapses all docs with a null value into a single group using 
 either highest score, or min/max.
 The CollapsingQParserPlugin also fully supports the QueryElevationComponent.
 *Note:*  The July 16 patch also includes an ExpandComponent that expands the 
 collapsed groups for the current search result page. This functionality will 
 be moved to its own ticket.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-5524) Exception when using Query Function inside Scale Function

2013-12-02 Thread Trey Grainger (JIRA)
Trey Grainger created SOLR-5524:
---

 Summary: Exception when using Query Function inside Scale Function
 Key: SOLR-5524
 URL: https://issues.apache.org/jira/browse/SOLR-5524
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.6
Reporter: Trey Grainger
Priority: Minor
 Fix For: 4.7


If you try to use the query function inside the scale function, it throws the 
following exception:
org.apache.lucene.search.BooleanQuery$BooleanWeight cannot be cast to
org.apache.lucene.queries.function.valuesource.ScaleFloatFunction$ScaleInfo

Here is an example request that invokes this:
http://localhost:8983/solr/collection1/select?q=*:*&fl=scale(query($x),0,5)&x=hello



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5524) Exception when using Query Function inside Scale Function

2013-12-02 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837319#comment-13837319
 ] 

Trey Grainger commented on SOLR-5524:
-

I just debugged the code and uncovered the problem.  There is a Map (called 
context) that is passed through to each value source to store intermediate 
state, and both the query and scale functions are passing the ValueSource for 
the query function in as the KEY to this Map (as opposed to using some 
composite key that makes sense in the current context).  Essentially, these 
lines are overwriting each other:

Inside ScaleFloatFunction: context.put(this.source, scaleInfo);  //this.source 
refers to the QueryValueSource, and the scaleInfo refers to a ScaleInfo object
Inside QueryValueSource: context.put(this, w); //this refers to the same 
QueryValueSource from above, and the w refers to a Weight object

As such, when the ScaleFloatFunction later goes to read the ScaleInfo from the 
context Map, it unexpectedly pulls the Weight object out instead and thus the 
invalid cast exception occurs.  The NoOp multiplication works because it puts 
a different ValueSource between the query and the ScaleFloatFunction such 
that this.source (in ScaleFloatFunction) != this (in QueryValueSource).
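
Here is a stripped-down sketch of the collision - not the actual Solr source, just 
modeling the two put calls quoted above with stand-in objects:

{code}
import java.util.HashMap;
import java.util.Map;

public class ContextCollisionSketch {
  public static void main(String[] args) {
    Map<Object, Object> context = new HashMap<>();

    Object queryValueSource = new Object(); // stands in for the QueryValueSource instance
    Object scaleInfo = "ScaleInfo";         // what ScaleFloatFunction wants to store
    Object weight = "Weight";               // what QueryValueSource stores

    context.put(queryValueSource, scaleInfo); // ScaleFloatFunction: context.put(this.source, scaleInfo)
    context.put(queryValueSource, weight);    // QueryValueSource:   context.put(this, w) -- same key!

    // ScaleFloatFunction later reads its ScaleInfo back but gets the Weight instead,
    // which is what surfaces as the ClassCastException reported in this issue.
    System.out.println(context.get(queryValueSource)); // prints "Weight", not "ScaleInfo"
  }
}
{code}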

 Exception when using Query Function inside Scale Function
 -

 Key: SOLR-5524
 URL: https://issues.apache.org/jira/browse/SOLR-5524
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.6
Reporter: Trey Grainger
Priority: Minor
 Fix For: 4.7


 If you try to use the query function inside the scale function, it throws the 
 following exception:
 org.apache.lucene.search.BooleanQuery$BooleanWeight cannot be cast to
 org.apache.lucene.queries.function.valuesource.ScaleFloatFunction$ScaleInfo
 Here is an example request that invokes this:
 http://localhost:8983/solr/collection1/select?q=*:*&fl=scale(query($x),0,5)&x=hello



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5524) Exception when using Query Function inside Scale Function

2013-12-02 Thread Trey Grainger (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trey Grainger updated SOLR-5524:


Attachment: SOLR-5524.patch

Simple patch.  Just changing the ScaleFloatFunction to use itself as 
the key instead of the ValueSource it wraps internally (its first 
parameter).  This seems consistent with how other ValueSources (such as the 
QueryValueSource) work, and it fixes the issue at hand.
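
In other words, the patch boils down to something like the following one-line change 
(paraphrased, not the literal diff):

{code}
// before: keyed by the wrapped source, which QueryValueSource also uses as its own key
context.put(this.source, scaleInfo);
// after: keyed by the ScaleFloatFunction instance itself, so the two entries no longer collide
context.put(this, scaleInfo);
{code}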

 Exception when using Query Function inside Scale Function
 -

 Key: SOLR-5524
 URL: https://issues.apache.org/jira/browse/SOLR-5524
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.6
Reporter: Trey Grainger
Priority: Minor
 Fix For: 4.7

 Attachments: SOLR-5524.patch


 If you try to use the query function inside the scale function, it throws the 
 following exception:
 org.apache.lucene.search.BooleanQuery$BooleanWeight cannot be cast to
 org.apache.lucene.queries.function.valuesource.ScaleFloatFunction$ScaleInfo
 Here is an example request that invokes this:
 http://localhost:8983/solr/collection1/select?q=*:*&fl=scale(query($x),0,5)&x=hello



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4478) Allow cores to specify a named config set in non-SolrCloud mode

2013-10-05 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787277#comment-13787277
 ] 

Trey Grainger commented on SOLR-4478:
-

(moving this from my previous e-mail to the solr-dev mailing list)

There are two use-cases that appear broken with the new core auto-discovery 
mechanism:

1) *The Core Admin Handler's CREATE command no longer works to create brand new 
cores* 
(unless you have logged on the box and created the core's directory structure 
manually, which largely defeats the purpose of the CREATE command).  With the 
old Solr.xml format, we could spin up as many cores as we wanted to dynamically 
with the following command:
http://localhost:8983/solr/admin/cores?action=CREATE&name=newCore1&instanceDir=collection1&dataDir=newCore1/data
...
http://localhost:8983/solr/admin/cores?action=CREATE&name=newCoreN&instanceDir=collection1&dataDir=newCoreN/data

In the new core discovery mode, this exception is now thrown:
Error CREATEing SolrCore 'newCore1': Could not create a new core in 
solr/collection1/ as another core is already defined there

The exception is being intentionally thrown in CorePropertiesLocator.java 
because a core.properties file already exists in solr/collection1 (and only one 
can exist per directory).


2) *Having a shared configuration directory (instanceDir) across many cores no 
longer works.*  
Every core has to have its own conf/ directory, and this doesn't seem to be 
overridable any longer.  Previously, it was possible to have many cores share 
the same instanceDir (and just override their dataDir for obvious reasons).  
Now, it is necessary to copy and paste identical config files for each Solr 
core. (An example of the expected per-core layout is sketched below.)
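
For reference, core discovery expects a layout roughly like the following, with exactly 
one core.properties per core directory (the paths and property values here are just an 
example, not a required naming scheme):

  solr/collection1/core.properties   (name=collection1)
  solr/collection1/conf/...
  solr/newCore1/core.properties      (name=newCore1, dataDir=data)

which is why the CREATE calls above fail when pointed at an instanceDir that already 
holds a core.properties file.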


I don't know if there's already a current roadmap for fixing this.  I saw 
https://issues.apache.org/jira/browse/SOLR-4478, which suggested replacing 
instanceDir with the ability to specify a named configSet.  This solves problem 
2, but not problem 1 (since you still can't have multiple core.properties files 
in the same folder).  Based on Erick's comments in the JIRA ticket, it also 
sounds like this ticket is dead at the moment.

There is definitely a need to have a shared config directory - whether that is 
through a configSet or an explicit instanceDir doesn't matter to me.  There's also 
a need to be able to dynamically create Solr cores from external systems.  I 
currently can't upgrade to core auto-discovery because it doesn't allow dynamic 
core creation.  Does anyone have some thoughts on how to best get these 
features working again under core auto-discovery?  Adding instanceDir to 
core.properties seems like an easy solution, but there must be a desire not to 
do that or it would probably have already been done.

I'm happy to contribute some time to resolving this if there is an agreed-upon 
path forward.

 Allow cores to specify a named config set in non-SolrCloud mode
 ---

 Key: SOLR-4478
 URL: https://issues.apache.org/jira/browse/SOLR-4478
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.2, 5.0
Reporter: Erick Erickson
 Attachments: SOLR-4478.patch, SOLR-4478.patch


 Part of moving forward to the new way, after SOLR-4196 etc... I propose an 
 additional parameter specified on the core node in solr.xml or as a 
 parameter in the discovery mode core.properties file, call it configSet, 
 where the value provided is a path to a directory, either absolute or 
 relative. Really, this is as though you copied the conf directory somewhere 
 to be used by more than one core.
 Straw-man: There will be a directory solr_home/configsets which will be the 
 default. If the configSet parameter is, say, myconf, then I'd expect a 
 directory named myconf to exist in solr_home/configsets, which would look 
 something like
 solr_home/configsets/myconf/schema.xml
   solrconfig.xml
   stopwords.txt
   velocity
   velocity/query.vm
 etc.
 If multiple cores used the same configSet, schema, solrconfig etc. would all 
 be shared (i.e. shareSchema=true would be assumed). I don't see a good 
 use-case for _not_ sharing schemas, so I don't propose to allow this to be 
 turned off. Hmmm, what if shareSchema is explicitly set to false in the 
 solr.xml or properties file? I'd guess it should be honored but maybe log a 
 warning?
 Mostly I'm putting this up for comments. I know that there are already 
 thoughts about how this all should work floating around, so before I start 
 any work on this I thought I'd at least get an idea of whether this is the 
 way people are thinking about going.
 Configset can be either a relative or absolute path, if relative it's assumed 
 to be relative to solr_home.
 Thoughts?



--
This 

[jira] [Commented] (SOLR-4478) Allow cores to specify a named config set in non-SolrCloud mode

2013-10-05 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787278#comment-13787278
 ] 

Trey Grainger commented on SOLR-4478:
-

(Erick's response to my post)

Right, let's move this discussion to SOLR-4779. There's some history
here. Sharing named config sets got a bit wrapped up in sharing the
underlying solrconfig object. This latter has been taken off the
table, but we should discuss fixing Trey's issues up. Here's what the
thinking was:
There would be a directory like solr_home/configs/configset1,
solr_home/configs/configset2, etc. Then a new parameter for
core.properties or create or whatever like configset=configset1 that
would be smart enough to look in solr_home/configs for an entire
conf directory named configset1.

Trey:
Does that work for your case? If so, please add your comments to 4779
and we can take it from there. FWIW, I don't think this is especially
hard, but time is always at a premium.

 Allow cores to specify a named config set in non-SolrCloud mode
 ---

 Key: SOLR-4478
 URL: https://issues.apache.org/jira/browse/SOLR-4478
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.2, 5.0
Reporter: Erick Erickson
 Attachments: SOLR-4478.patch, SOLR-4478.patch


 Part of moving forward to the new way, after SOLR-4196 etc... I propose an 
 additional parameter specified on the core node in solr.xml or as a 
 parameter in the discovery mode core.properties file, call it configSet, 
 where the value provided is a path to a directory, either absolute or 
 relative. Really, this is as though you copied the conf directory somewhere 
 to be used by more than one core.
 Straw-man: There will be a directory solr_home/configsets which will be the 
 default. If the configSet parameter is, say, myconf, then I'd expect a 
 directory named myconf to exist in solr_home/configsets, which would look 
 something like
 solr_home/configsets/myconf/schema.xml
   solrconfig.xml
   stopwords.txt
   velocity
   velocity/query.vm
 etc.
 If multiple cores used the same configSet, schema, solrconfig etc. would all 
 be shared (i.e. shareSchema=true would be assumed). I don't see a good 
 use-case for _not_ sharing schemas, so I don't propose to allow this to be 
 turned off. Hmmm, what if shareSchema is explicitly set to false in the 
 solr.xml or properties file? I'd guess it should be honored but maybe log a 
 warning?
 Mostly I'm putting this up for comments. I know that there are already 
 thoughts about how this all should work floating around, so before I start 
 any work on this I thought I'd at least get an idea of whether this is the 
 way people are thinking about going.
 Configset can be either a relative or absolute path, if relative it's assumed 
 to be relative to solr_home.
 Thoughts?



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-4478) Allow cores to specify a named config set in non-SolrCloud mode

2013-10-05 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787278#comment-13787278
 ] 

Trey Grainger edited comment on SOLR-4478 at 10/5/13 5:47 PM:
--

(Erick's response to my post)

Right, let's move this discussion to SOLR-4779. There's some history
here. Sharing named config sets got a bit wrapped up in sharing the
underlying solrconfig object. This latter has been taken off the
table, but we should discuss fixing Trey's issues up. Here's what the
thinking was:
There would be a directory like solr_home/configs/configset1,
solr_home/configs/configset2, etc. Then a new parameter for
core.properties or create or whatever like configset=configset1 that
would be smart enough to look in solr_home/configs for an entire
conf directory named configset1.

Trey:
Does that work for your case? If so, please add your comments to 4779
and we can take it from there. FWIW, I don't think this is especially
hard, but time is always at a premium.


was (Author: solrtrey):
(Eric's response to my post)

Right, let's move this discussion to SOLR-4779. There's some history
here. Sharing named config sets got a bit wrapped up in sharing the
underlying solrconfig object. This latter has been taken off the
table, but we should discuss fixing Trey's issues up. Here's what the
thinking was:
There would be a directory like solr_home/configs/configset1,
solr_home/configs/configset2, etc. Then a new parameter for
core.properties or create or whatever like configset=configset1 that
would be smart enough to look in solr_home/configs for an entire
conf directory named configset1.

Trey:
Does that work for your case? If so, please add your comments to 4779
and we can take it from there. FWIW, I don't think this is especially
hard, but time is always at a premium.

 Allow cores to specify a named config set in non-SolrCloud mode
 ---

 Key: SOLR-4478
 URL: https://issues.apache.org/jira/browse/SOLR-4478
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.2, 5.0
Reporter: Erick Erickson
 Attachments: SOLR-4478.patch, SOLR-4478.patch


 Part of moving forward to the new way, after SOLR-4196 etc... I propose an 
 additional parameter specified on the core node in solr.xml or as a 
 parameter in the discovery mode core.properties file, call it configSet, 
 where the value provided is a path to a directory, either absolute or 
 relative. Really, this is as though you copied the conf directory somewhere 
 to be used by more than one core.
 Straw-man: There will be a directory solr_home/configsets which will be the 
 default. If the configSet parameter is, say, myconf, then I'd expect a 
 directory named myconf to exist in solr_home/configsets, which would look 
 something like
 solr_home/configsets/myconf/schema.xml
   solrconfig.xml
   stopwords.txt
   velocity
   velocity/query.vm
 etc.
 If multiple cores used the same configSet, schema, solrconfig etc. would all 
 be shared (i.e. shareSchema=true would be assumed). I don't see a good 
 use-case for _not_ sharing schemas, so I don't propose to allow this to be 
 turned off. Hmmm, what if shareSchema is explicitly set to false in the 
 solr.xml or properties file? I'd guess it should be honored but maybe log a 
 warning?
 Mostly I'm putting this up for comments. I know that there are already 
 thoughts about how this all should work floating around, so before I start 
 any work on this I thought I'd at least get an idea of whether this is the 
 way people are thinking about going.
 Configset can be either a relative or absolute path, if relative it's assumed 
 to be relative to solr_home.
 Thoughts?



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-4478) Allow cores to specify a named config set in non-SolrCloud mode

2013-10-05 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787282#comment-13787282
 ] 

Trey Grainger edited comment on SOLR-4478 at 10/5/13 5:50 PM:
--

Hi Erick,

Yes, that resolves the harder of the two problems.  The other issue is that 
since a dedicated folder is now required per-core (to hold the core.properties 
file), the core _CREATE_ command now also needs to be able to create the folder 
for the new core if it doesn't exist.  Something like:
http://localhost:8983/solr/admin/cores?action=CREATE&name=newCore& 
*coreDir=cores/newCore* &configset=sharedconfig

Alternatively, _instanceDir_ could continue to serve that function (instead of 
being deprecated):
http://localhost:8983/solr/admin/cores?action=CREATE&name=newCore& 
*instanceDir=cores/newCore* &configset=sharedconfig

I think the combination of adding configSet and adding the ability for the 
CREATE command to actually create the new folder to hold core.properties should 
handle the use case.


was (Author: solrtrey):
Hi Erick,

Yes, that resolves the hardest of the two problems.  The other issue is that 
since a dedicated folder is now required per-core (to hold the core.properties 
file), the core _CREATE_ command needs to now also be able to create the folder 
for the new core if it doesn't exist.  Something like:
http://localhost:8983/solr/admin/cores?action=CREATE&name=newCore&*coreDir=cores/newCore*&configset=sharedconfig

Alternatively, _instanceDir_ could continue to serve that function (instead of 
being deprecated):
http://localhost:8983/solr/admin/cores?action=CREATE&name=newCore&*instanceDir=cores/newCore*&configset=sharedconfig

I think the combination of adding configSet and adding the ability for the 
CREATE command to actually create the new folder to hold core.properties should 
handle the use case.

 Allow cores to specify a named config set in non-SolrCloud mode
 ---

 Key: SOLR-4478
 URL: https://issues.apache.org/jira/browse/SOLR-4478
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.2, 5.0
Reporter: Erick Erickson
 Attachments: SOLR-4478.patch, SOLR-4478.patch


 Part of moving forward to the new way, after SOLR-4196 etc... I propose an 
 additional parameter specified on the core node in solr.xml or as a 
 parameter in the discovery mode core.properties file, call it configSet, 
 where the value provided is a path to a directory, either absolute or 
 relative. Really, this is as though you copied the conf directory somewhere 
 to be used by more than one core.
 Straw-man: There will be a directory solr_home/configsets which will be the 
 default. If the configSet parameter is, say, myconf, then I'd expect a 
 directory named myconf to exist in solr_home/configsets, which would look 
 something like
 solr_home/configsets/myconf/schema.xml
   solrconfig.xml
   stopwords.txt
   velocity
   velocity/query.vm
 etc.
 If multiple cores used the same configSet, schema, solrconfig etc. would all 
 be shared (i.e. shareSchema=true would be assumed). I don't see a good 
 use-case for _not_ sharing schemas, so I don't propose to allow this to be 
 turned off. Hmmm, what if shareSchema is explicitly set to false in the 
 solr.xml or properties file? I'd guess it should be honored but maybe log a 
 warning?
 Mostly I'm putting this up for comments. I know that there are already 
 thoughts about how this all should work floating around, so before I start 
 any work on this I thought I'd at least get an idea of whether this is the 
 way people are thinking about going.
 Configset can be either a relative or absolute path; if relative, it's assumed 
 to be relative to solr_home.
 Thoughts?



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4478) Allow cores to specify a named config set in non-SolrCloud mode

2013-10-05 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13787282#comment-13787282
 ] 

Trey Grainger commented on SOLR-4478:
-

Hi Erick,

Yes, that resolves the hardest of the two problems.  The other issue is that 
since a dedicated folder is now required per-core (to hold the core.properties 
file), the core _CREATE_ command needs to now also be able to create the folder 
for the new core if it doesn't exist.  Something like:
http://localhost:8983/solr/admin/cores?action=CREATE&name=newCore&*coreDir=cores/newCore*&configset=sharedconfig

Alternatively, _instanceDir_ could continue to serve that function (instead of 
being deprecated):
http://localhost:8983/solr/admin/cores?action=CREATE&name=newCore&*instanceDir=cores/newCore*&configset=sharedconfig

I think the combination of adding configSet and adding the ability for the 
CREATE command to actually create the new folder to hold core.properties should 
handle the use case.

 Allow cores to specify a named config set in non-SolrCloud mode
 ---

 Key: SOLR-4478
 URL: https://issues.apache.org/jira/browse/SOLR-4478
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.2, 5.0
Reporter: Erick Erickson
 Attachments: SOLR-4478.patch, SOLR-4478.patch


 Part of moving forward to the new way, after SOLR-4196 etc... I propose an 
 additional parameter specified on the core node in solr.xml or as a 
 parameter in the discovery mode core.properties file, call it configSet, 
 where the value provided is a path to a directory, either absolute or 
 relative. Really, this is as though you copied the conf directory somewhere 
 to be used by more than one core.
 Straw-man: There will be a directory solr_home/configsets which will be the 
 default. If the configSet parameter is, say, myconf, then I'd expect a 
 directory named myconf to exist in solr_home/configsets, which would look 
 something like
 solr_home/configsets/myconf/schema.xml
   solrconfig.xml
   stopwords.txt
   velocity
   velocity/query.vm
 etc.
 If multiple cores used the same configSet, schema, solrconfig etc. would all 
 be shared (i.e. shareSchema=true would be assumed). I don't see a good 
 use-case for _not_ sharing schemas, so I don't propose to allow this to be 
 turned off. Hmmm, what if shareSchema is explicitly set to false in the 
 solr.xml or properties file? I'd guess it should be honored but maybe log a 
 warning?
 Mostly I'm putting this up for comments. I know that there are already 
 thoughts about how this all should work floating around, so before I start 
 any work on this I thought I'd at least get an idea of whether this is the 
 way people are thinking about going.
 Configset can be either a relative or absolute path; if relative, it's assumed 
 to be relative to solr_home.
 Thoughts?



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-5052) eDisMax Field Aliasing behaving oddly when invalid field is present

2013-07-20 Thread Trey Grainger (JIRA)
Trey Grainger created SOLR-5052:
---

 Summary: eDisMax Field Aliasing behaving oddly when invalid field 
is present
 Key: SOLR-5052
 URL: https://issues.apache.org/jira/browse/SOLR-5052
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 4.3.1
 Environment: AWS / Ubuntu
Reporter: Trey Grainger
Priority: Minor
 Fix For: 4.5


Field Aliasing for the eDisMax query parser behaves in a very odd manner if an 
invalid field is specified in any of the aliases.  Essentially, instead of 
throwing an exception on an invalid alias, it breaks all of the other aliased 
fields such that they will only handle the first term correctly.  Take the 
following example:

/select?defType=edismax&f.who.qf=personLastName_t^30 
personFirstName_t^10&f.what.qf=itemName_t 
companyName_t^5&f.where.qf=cityName_t^10 INVALIDFIELDNAME^20 countryName_t^35 
postalCodeName_t^30&q=who:(trey grainger) what:(solr) where:(atlanta, 
ga)&debugQuery=true&df=text

The terms "trey", "solr", and "atlanta" correctly search across the aliased 
fields, but the terms "grainger" and "ga" are incorrectly being searched across 
the default field (text).  Here is the parsed query from the debug output:

<lst name="debug">
<str name="rawquerystring">
who:(trey grainger) what:(solr) where:(decatur, ga)
</str>
<str name="querystring">
who:(trey grainger) what:(solr) where:(decatur, ga)
</str>
<str name="parsedquery">
(+(DisjunctionMaxQuery((personFirstName_t:trey^10.0 | 
personLastName_t:trey^30.0)) DisjunctionMaxQuery((text:grainger)) 
DisjunctionMaxQuery((itemName_t:solr | companyName_t:solr^5.0)) 
DisjunctionMaxQuery((postalCodeName_t:decatur^30.0 | countryName_t:decatur^35.0 
| cityName_t:decatur^10.0)) DisjunctionMaxQuery((text:ga))))/no_coord
</str>
<str name="parsedquery_toString">
+((personFirstName_t:trey^10.0 | personLastName_t:trey^30.0) (text:grainger) 
(itemName_t:solr | companyName_t:solr^5.0) (postalCodeName_t:decatur^30.0 | 
countryName_t:decatur^35.0 | cityName_t:decatur^10.0) (text:ga))
</str>

I think the presence of an invalid field in a qf parameter should throw an 
exception (or throw the invalid field away in that alias), but it shouldn't 
break the aliases for other fields.  

For the record, if there are no invalid fields in any of the aliases, all of 
the aliases work.  If there is one invalid field in any of the aliases, all of 
the aliases act oddly like this.
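
For comparison, a correct parse (if the invalid field were simply ignored 
rather than breaking the other aliases) would presumably look something like 
the following hand-written sketch, which is illustrative only and not actual 
Solr output:

(+(DisjunctionMaxQuery((personFirstName_t:trey^10.0 | personLastName_t:trey^30.0)) 
DisjunctionMaxQuery((personFirstName_t:grainger^10.0 | personLastName_t:grainger^30.0)) 
DisjunctionMaxQuery((itemName_t:solr | companyName_t:solr^5.0)) 
DisjunctionMaxQuery((postalCodeName_t:decatur^30.0 | countryName_t:decatur^35.0 | cityName_t:decatur^10.0)) 
DisjunctionMaxQuery((postalCodeName_t:ga^30.0 | countryName_t:ga^35.0 | cityName_t:ga^10.0))))/no_coord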

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting

2013-07-15 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709360#comment-13709360
 ] 

Trey Grainger commented on SOLR-2894:
-

@[~Otis], we have this patch live in production for several use cases (as a 
pre-requisite for SOLR-3583, which we've also worked on @CareerBuilder), but 
the currently known issues which would prevent this from being committed 
include:
1) Tags and Excludes are not being respected beyond the first level
2) The facet.limit=-1 issue (not returning all values)
3) The lack of support for datetimes

We need #1, and Andrew is currently working on a project to fix it.  He's also 
looking to fix #3 and find a reasonably scalable solution to #2.  I'm not sure 
when the Solr 4.4 vote is going to be, but it'll probably be a few more weeks 
until this patch is all wrapped up.

Meanwhile, if anyone else finds any issues with the patch, please let us know 
so they can be looked into.  Thanks!
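
For anyone following along, issue #1 refers to the standard tag/exclude 
local-params syntax applied to pivots, e.g. (field names purely illustrative):

q=*:*&fq={!tag=brandTag}brand:acme&facet=true&facet.pivot={!ex=brandTag}category,brand

With the current patch, the exclusion is only honored at the first pivot level.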

 Implement distributed pivot faceting
 

 Key: SOLR-2894
 URL: https://issues.apache.org/jira/browse/SOLR-2894
 Project: Solr
  Issue Type: Improvement
Reporter: Erik Hatcher
 Fix For: 4.4

 Attachments: SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894-reworked.patch


 Following up on SOLR-792, pivot faceting currently only supports 
 undistributed mode.  Distributed pivot faceting needs to be implemented.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting

2012-07-05 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13407459#comment-13407459
 ] 

Trey Grainger commented on SOLR-2894:
-

Hi Erik,

Sorry, I missed your original message asking me if I could test out the latest 
patch - I'd be happy to help.  I just tried both your patch and the April 25th 
patch against the Solr 4.0 Alpha revision and neither applied immediately.  
I'll see if I can find some time on Sunday to try to get a revision sorted out 
which will work with the current version.

I think there are some changes in the April 24th patch which may need to be 
re-applied if your changes were based upon the earlier patch.  I'll know more 
once I've had a chance to dig in later this weekend.
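
For reference, the usual way to try such a patch against a trunk checkout is 
roughly the following (assuming the patch was generated from the checkout root 
so that -p0 applies; paths and flags may need adjusting):

svn checkout https://svn.apache.org/repos/asf/lucene/dev/trunk lucene-solr-trunk
cd lucene-solr-trunk
patch -p0 --dry-run < SOLR-2894.patch
patch -p0 < SOLR-2894.patch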

Thanks,

-Trey

 Implement distributed pivot faceting
 

 Key: SOLR-2894
 URL: https://issues.apache.org/jira/browse/SOLR-2894
 Project: Solr
  Issue Type: Improvement
Reporter: Erik Hatcher
Assignee: Erik Hatcher
 Fix For: 4.0

 Attachments: SOLR-2894.patch, SOLR-2894.patch, 
 distributed_pivot.patch, distributed_pivot.patch


 Following up on SOLR-792, pivot faceting currently only supports 
 undistributed mode.  Distributed pivot faceting needs to be implemented.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting

2012-06-13 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13294795#comment-13294795
 ] 

Trey Grainger commented on SOLR-2894:
-

For what it's worth, we're actively using the April 25th version of this patch 
in production at CareerBuilder (with an older version of trunk) with no issues.

 Implement distributed pivot faceting
 

 Key: SOLR-2894
 URL: https://issues.apache.org/jira/browse/SOLR-2894
 Project: Solr
  Issue Type: Improvement
Reporter: Erik Hatcher
Assignee: Erik Hatcher
 Fix For: 4.0

 Attachments: SOLR-2894.patch, distributed_pivot.patch, 
 distributed_pivot.patch


 Following up on SOLR-792, pivot faceting currently only supports 
 undistributed mode.  Distributed pivot faceting needs to be implemented.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2614) stats with pivot

2012-06-11 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293249#comment-13293249
 ] 

Trey Grainger commented on SOLR-2614:
-

Hi Terrance,

We (at CareerBuilder) recently built a patch recently which could serve as a 
good starting point for this.  We build an ability to calculate Percentiles 
(i.e. 25th, 50th, etc.) and Averages using multi-level (distributed) Pivot 
Facets.  It works well enough for our use cases, and I'm sure the stats types 
mentioned could be added in.

It is dependent upon the distributed pivot faceting patch (SOLR-2894), which 
seem to be working well but has yet to be committed.

I'll see if we can get the patch posted either as part of this JIRA or 
separately in the next day or so, which could save you some time in 
implementing the other types.

-Trey Grainger
CareerBuilder

 stats with pivot
 

 Key: SOLR-2614
 URL: https://issues.apache.org/jira/browse/SOLR-2614
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 4.0
Reporter: pengyao
Priority: Critical
 Fix For: 4.1


  Is it possible to get stats (like Stats Component: min ,max, sum, count,
 missing, sumOfSquares, mean and stddev) from numeric fields inside
 hierarchical facets (with more than one level, like Pivot)?
  I would like to query:
 ...?q=*:*&version=2.2&start=0&rows=0&stats=true&stats.field=numeric_field1&stats.field=numeric_field2&stats.pivot=field_x,field_y,field_z
  and get min, max, sum, count, etc. from numeric_field1 and
 numeric_field2 from all combinations of field_x, field_y and field_z
 (hierarchical values).
  Using stats.facet I get just one field at one level and using
 facet.pivot I get just counts, but no stats.
  Looping in the client application to do all combinations of facet values
 would be too slow because there are a lot of combinations.
  Thanks a lot!
 This is very important, because counts alone are sometimes not enough.
 Please add stats with pivot in Solr 4.0.
 Thanks a lot.
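
For illustration only, the kind of combined output being asked for might look 
roughly like the following (a purely hypothetical shape with made-up numbers, 
not what any existing component or patch emits):

field_x=A > field_y=B > field_z=C  (count=120)
    numeric_field1: min=2.0  max=98.0  sum=4836.0  mean=40.3
    numeric_field2: min=0.1  max=9.9   sum=310.2   mean=2.6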

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-2614) stats with pivot

2012-06-11 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293249#comment-13293249
 ] 

Trey Grainger edited comment on SOLR-2614 at 6/12/12 1:23 AM:
--

Hi Terrance,

We (at CareerBuilder) built a patch recently which could serve as a good 
starting point for this.  We build an ability to calculate Percentiles (i.e. 
25th, 50th, etc.) and Averages using multi-level (distributed) Pivot Facets.  
It works well enough for our use cases, and I'm sure the stats types mentioned 
could be added in.

It is dependent upon the distributed pivot faceting patch (SOLR-2894), which 
seem to be working well but has yet to be committed.

I'll see if we can get the patch posted either as part of this JIRA or 
separately in the next day or so, which could save you some time in 
implementing the other types.

-Trey Grainger
CareerBuilder

  was (Author: solrtrey):
Hi Terrance,

We (at CareerBuilder) recently built a patch recently which could serve as a 
good starting point for this.  We build an ability to calculate Percentiles 
(i.e. 25th, 50th, etc.) and Averages using multi-level (distributed) Pivot 
Facets.  It works well enough for our use cases, and I'm sure the stats types 
mentioned could be added in.

It is dependent upon the distributed pivot faceting patch (SOLR-2894), which 
seem to be working well but has yet to be committed.

I'll see if we can get the patch posted either as part of this JIRA or 
separately in the next day or so, which could save you some time in 
implementing the other types.

-Trey Grainger
CareerBuilder
  
 stats with pivot
 

 Key: SOLR-2614
 URL: https://issues.apache.org/jira/browse/SOLR-2614
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 4.0
Reporter: pengyao
Priority: Critical
 Fix For: 4.1


  Is it possible to get stats (like Stats Component: min ,max, sum, count,
 missing, sumOfSquares, mean and stddev) from numeric fields inside
 hierarchical facets (with more than one level, like Pivot)?
  I would like to query:
 ...?q=*:*&version=2.2&start=0&rows=0&stats=true&stats.field=numeric_field1&stats.field=numeric_field2&stats.pivot=field_x,field_y,field_z
  and get min, max, sum, count, etc. from numeric_field1 and
 numeric_field2 from all combinations of field_x, field_y and field_z
 (hierarchical values).
  Using stats.facet I get just one field at one level and using
 facet.pivot I get just counts, but no stats.
  Looping in the client application to do all combinations of facet values
 would be too slow because there are a lot of combinations.
  Thanks a lot!
 This is very important, because counts alone are sometimes not enough.
 Please add stats with pivot in Solr 4.0.
 Thanks a lot.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-2614) stats with pivot

2012-06-11 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293249#comment-13293249
 ] 

Trey Grainger edited comment on SOLR-2614 at 6/12/12 1:23 AM:
--

Hi Terrance,

We (at CareerBuilder) built a patch recently which could serve as a good 
starting point for this.  We built an ability to calculate Percentiles (i.e. 
25th, 50th, etc.) and Averages using multi-level (distributed) Pivot Facets.  
It works well enough for our use cases, and I'm sure the stats types mentioned 
could be added in.

It is dependent upon the distributed pivot faceting patch (SOLR-2894), which 
seem to be working well but has yet to be committed.

I'll see if we can get the patch posted either as part of this JIRA or 
separately in the next day or so, which could save you some time in 
implementing the other types.

-Trey Grainger
CareerBuilder

  was (Author: solrtrey):
Hi Terrance,

We (at CareerBuilder) built a patch recently which could serve as a good 
starting point for this.  We build an ability to calculate Percentiles (i.e. 
25th, 50th, etc.) and Averages using multi-level (distributed) Pivot Facets.  
It works well enough for our use cases, and I'm sure the stats types mentioned 
could be added in.

It is dependent upon the distributed pivot faceting patch (SOLR-2894), which 
seem to be working well but has yet to be committed.

I'll see if we can get the patch posted either as part of this JIRA or 
separately in the next day or so, which could save you some time in 
implementing the other types.

-Trey Grainger
CareerBuilder
  
 stats with pivot
 

 Key: SOLR-2614
 URL: https://issues.apache.org/jira/browse/SOLR-2614
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 4.0
Reporter: pengyao
Priority: Critical
 Fix For: 4.1


  Is it possible to get stats (like Stats Component: min ,max, sum, count,
 missing, sumOfSquares, mean and stddev) from numeric fields inside
 hierarchical facets (with more than one level, like Pivot)?
  I would like to query:
 ...?q=*:*&version=2.2&start=0&rows=0&stats=true&stats.field=numeric_field1&stats.field=numeric_field2&stats.pivot=field_x,field_y,field_z
  and get min, max, sum, count, etc. from numeric_field1 and
 numeric_field2 from all combinations of field_x, field_y and field_z
 (hierarchical values).
  Using stats.facet I get just one field at one level and using
 facet.pivot I get just counts, but no stats.
  Looping in the client application to do all combinations of facet values
 would be too slow because there are a lot of combinations.
  Thanks a lot!
 This is very important, because counts alone are sometimes not enough.
 Please add stats with pivot in Solr 4.0.
 Thanks a lot.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-2614) stats with pivot

2012-06-11 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293249#comment-13293249
 ] 

Trey Grainger edited comment on SOLR-2614 at 6/12/12 1:24 AM:
--

Hi Terrance,

We (at CareerBuilder) built a patch recently which could serve as a good 
starting point for this.  We built an ability to calculate Percentiles (i.e. 
25th, 50th, etc.) and Averages using multi-level (distributed) Pivot Facets.  
It works well enough for our use cases, and I'm sure the other stats types 
mentioned could be added in.

It is dependent upon the distributed pivot faceting patch (SOLR-2894), which 
seem to be working well but has yet to be committed.

I'll see if we can get the patch posted either as part of this JIRA or 
separately in the next day or so, which could save you some time in 
implementing the other types.

-Trey Grainger
CareerBuilder

  was (Author: solrtrey):
Hi Terrance,

We (at CareerBuilder) built a patch recently which could serve as a good 
starting point for this.  We built an ability to calculate Percentiles (i.e. 
25th, 50th, etc.) and Averages using multi-level (distributed) Pivot Facets.  
It works well enough for our use cases, and I'm sure the stats types mentioned 
could be added in.

It is dependent upon the distributed pivot faceting patch (SOLR-2894), which 
seem to be working well but has yet to be committed.

I'll see if we can get the patch posted either as part of this JIRA or 
separately in the next day or so, which could save you some time in 
implementing the other types.

-Trey Grainger
CareerBuilder
  
 stats with pivot
 

 Key: SOLR-2614
 URL: https://issues.apache.org/jira/browse/SOLR-2614
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 4.0
Reporter: pengyao
Priority: Critical
 Fix For: 4.1


  Is it possible to get stats (like Stats Component: min ,max, sum, count,
 missing, sumOfSquares, mean and stddev) from numeric fields inside
 hierarchical facets (with more than one level, like Pivot)?
  I would like to query:
 ...?q=*:*&version=2.2&start=0&rows=0&stats=true&stats.field=numeric_field1&stats.field=numeric_field2&stats.pivot=field_x,field_y,field_z
  and get min, max, sum, count, etc. from numeric_field1 and
 numeric_field2 from all combinations of field_x, field_y and field_z
 (hierarchical values).
  Using stats.facet I get just one field at one level and using
 facet.pivot I get just counts, but no stats.
  Looping in the client application to do all combinations of facet values
 would be too slow because there are a lot of combinations.
  Thanks a lot!
 This is very important, because counts alone are sometimes not enough.
 Please add stats with pivot in Solr 4.0.
 Thanks a lot.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1837) Reconstruct a Document (stored fields, indexed fields, payloads)

2010-03-21 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847936#action_12847936
 ] 

Trey Grainger commented on SOLR-1837:
-

Re: bugs in Luke that result in missing terms - I recently fixed one such bug, 
and indeed it was located in the DocReconstructor - if you are aware of others 
then please report them using the Luke issue tracker.

I just pulled down the most recent Luke code, and it does look like that 
recent fix was made to cover the bug I saw.  Unfortunately, the fix results in 
a null ref for me on my index.  I'll open an issue, as it looks like all that's 
needed is an extra null check.

Re: Document reconstruction is a very IO-intensive operation, so I would advise 
against using it on a production system, and also it produces inexact results 
(because analysis is usually a lossy operation).

I hear you about it being IO-intensive.  There are also other admin tools in Solr 
which perform similarly intensive operations (the schema browser, for example, which 
generates a list of all fields and a distribution of terms within those 
fields).  The intent of the tool is for one-off debugging, not for any kind of 
automated querying, but I'll try to do some tests to see to what degree this tool 
is affecting our current production systems (I have not seen any noticeable 
effect thus far).

Also, regarding the process being lossy: in this case, that is kind of the 
point of the tool (in my use) - to see what has actually been put into the 
index vs. what was in the document sent to the engine.  For example, if I index 
a field with the text "Wi-fi hotspots are a life-saver" with payloads on parts 
of speech, as well as stemming, I want to be able to see something like:
wi [1] / fi [1] | wifi [1] / hotspot [1] / are [2] / a [3] / life [1] / saver 
[1] | lifesaver [1]

With no payloads, this would simply be
wi / fi | wifi / hotspots | hotspot / are / a / life / saver | lifesaver
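
For context, an index-time analysis chain along these lines could produce that 
sort of token stream (a hedged sketch only; the part-of-speech payload filter 
is hypothetical and not part of this patch):

<fieldType name="text_payloads" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- splits "Wi-fi" into wi/fi and also emits the catenated token wifi -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" catenateWords="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- stems "hotspots" to "hotspot", etc. -->
    <filter class="solr.PorterStemFilterFactory"/>
    <!-- a custom filter attaching part-of-speech payloads would go here (hypothetical) -->
  </analyzer>
</fieldType>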

So I had initially named the tool the "Solr Document Reconstructor", after the 
name you gave to the tool in Luke.  Based on your comments, I think it might be 
less confusing for me to call it something like "Document Inspector", since it 
is not truly reconstructing the original document.

I'll try to get what I have pushed up today so you can check it out if you 
want.  Thanks for your great work on that tool!

 Reconstruct a Document (stored fields, indexed fields, payloads)
 

 Key: SOLR-1837
 URL: https://issues.apache.org/jira/browse/SOLR-1837
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis, web gui
Affects Versions: 1.5
 Environment: All
Reporter: Trey Grainger
Priority: Minor
 Fix For: 1.5

   Original Estimate: 168h
  Remaining Estimate: 168h

 One Solr feature I've been sorely in need of is the ability to inspect an 
 index for any particular document.  While the analysis page is good when you 
 have specific content and a specific field/type you want to test the 
 analysis process for, once a document is indexed it is not currently possible 
 to easily see what is actually sitting in the index.
 One can use the Lucene Index Browser (Luke), but this has several limitations 
 (gui only, doesn't understand solr schema, doesn't display many non-text 
 fields in human readable format, doesn't show payloads, some bugs lead to 
 missing terms, exposes features dangerous to use in a production Solr 
 environment, slow or difficult to check from a remote location, etc.).  The 
 document reconstruction feature of Luke provides the base for what can become 
 a much more powerful tool when coupled with Solr's understanding of a schema, 
 however.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1837) Reconstruct a Document (stored fields, indexed fields, payloads)

2010-03-21 Thread Trey Grainger (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trey Grainger updated SOLR-1837:


Attachment: SOLR-1837.patch

Here's what I have thus far.  The only bug I currently know about is that Solr 
multi-valued fields (i.e. <field name="x">value1</field><field 
name="x">value2</field>) currently display as concatenated together instead of 
as an array of separate fields in the stored fields view.
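
To make the symptom concrete, a hypothetical two-valued field currently renders 
in the stored-fields view roughly as follows (illustrative only):

current:   x: value1value2
intended:  x: value1
           x: value2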

I've referred to the tool in the admin interface as the "Document Inspector" 
instead of "Document Reconstructor" to prevent confusion over 
lost/changed/added terms due to index-time analysis.

Any feedback appreciated.

 Reconstruct a Document (stored fields, indexed fields, payloads)
 

 Key: SOLR-1837
 URL: https://issues.apache.org/jira/browse/SOLR-1837
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis, web gui
Affects Versions: 1.5
 Environment: All
Reporter: Trey Grainger
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1837.patch

   Original Estimate: 168h
  Remaining Estimate: 168h

 One Solr feature I've been sorely in need of is the ability to inspect an 
 index for any particular document.  While the analysis page is good when you 
 have specific content and a specific field/type you want to test the 
 analysis process for, once a document is indexed it is not currently possible 
 to easily see what is actually sitting in the index.
 One can use the Lucene Index Browser (Luke), but this has several limitations 
 (gui only, doesn't understand solr schema, doesn't display many non-text 
 fields in human readable format, doesn't show payloads, some bugs lead to 
 missing terms, exposes features dangerous to use in a production Solr 
 environment, slow or difficult to check from a remote location, etc.).  The 
 document reconstruction feature of Luke provides the base for what can become 
 a much more powerful tool when coupled with Solr's understanding of a schema, 
 however.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1837) Reconstruct a Document (stored fields, indexed fields, payloads)

2010-03-20 Thread Trey Grainger (JIRA)
Reconstruct a Document (stored fields, indexed fields, payloads)


 Key: SOLR-1837
 URL: https://issues.apache.org/jira/browse/SOLR-1837
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis, web gui
Affects Versions: 1.5
 Environment: All
Reporter: Trey Grainger
Priority: Minor
 Fix For: 1.5


One Solr feature I've been sorely in need of is the ability to inspect an index 
for any particular document.  While the analysis page is good when you have 
specific content and a specific field/type you want to test the analysis 
process for, once a document is indexed it is not currently possible to easily 
see what is actually sitting in the index.

One can use the Lucene Index Browser (Luke), but this has several limitations 
(gui only, doesn't understand solr schema, doesn't display many non-text fields 
in human readable format, doesn't show payloads, some bugs lead to missing 
terms, exposes features dangerous to use in a production Solr environment, slow 
or difficult to check from a remote location, etc.).  The document 
reconstruction feature of Luke provides the base for what can become a much 
more powerful tool when coupled with Solr's understanding of a schema, however.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1837) Reconstruct a Document (stored fields, indexed fields, payloads)

2010-03-20 Thread Trey Grainger (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trey Grainger updated SOLR-1837:


Remaining Estimate: 168h  (was: 120h)
 Original Estimate: 168h  (was: 120h)

 Reconstruct a Document (stored fields, indexed fields, payloads)
 

 Key: SOLR-1837
 URL: https://issues.apache.org/jira/browse/SOLR-1837
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis, web gui
Affects Versions: 1.5
 Environment: All
Reporter: Trey Grainger
Priority: Minor
 Fix For: 1.5

   Original Estimate: 168h
  Remaining Estimate: 168h

 One Solr feature I've been sorely in need of is the ability to inspect an 
 index for any particular document.  While the analysis page is good when you 
 have specific content and a specific field/type you want to test the 
 analysis process for, once a document is indexed it is not currently possible 
 to easily see what is actually sitting in the index.
 One can use the Lucene Index Browser (Luke), but this has several limitations 
 (gui only, doesn't understand solr schema, doesn't display many non-text 
 fields in human readable format, doesn't show payloads, some bugs lead to 
 missing terms, exposes features dangerous to use in a production Solr 
 environment, slow or difficult to check from a remote location, etc.).  The 
 document reconstruction feature of Luke provides the base for what can become 
 a much more powerful tool when coupled with Solr's understanding of a schema, 
 however.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1837) Reconstruct a Document (stored fields, indexed fields, payloads)

2010-03-20 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847866#action_12847866
 ] 

Trey Grainger commented on SOLR-1837:
-

I've been working on implementing the document reconstruction feature over the 
past week and have created an additional admin page which exposes it.  The 
functionality is essentially a reworking of the Lucene document reconstruction 
functionality in Luke, but with improvements to handle the problems listed in 
the JIRA issue description above.

I'll be pushing up a patch soon and will look forward to any additional 
recommendations after others have had a chance to try it out.

 Reconstruct a Document (stored fields, indexed fields, payloads)
 

 Key: SOLR-1837
 URL: https://issues.apache.org/jira/browse/SOLR-1837
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis, web gui
Affects Versions: 1.5
 Environment: All
Reporter: Trey Grainger
Priority: Minor
 Fix For: 1.5

   Original Estimate: 168h
  Remaining Estimate: 168h

 One Solr feature I've been sorely in need of is the ability to inspect an 
 index for any particular document.  While the analysis page is good when you 
 have specific content and a specific field/type you want to test the 
 analysis process for, once a document is indexed it is not currently possible 
 to easily see what is actually sitting in the index.
 One can use the Lucene Index Browser (Luke), but this has several limitations 
 (gui only, doesn't understand solr schema, doesn't display many non-text 
 fields in human readable format, doesn't show payloads, some bugs lead to 
 missing terms, exposes features dangerous to use in a production Solr 
 environment, slow or difficult to check from a remote location, etc.).  The 
 document reconstruction feature of Luke provides the base for what can become 
 a much more powerful tool when coupled with Solr's understanding of a schema, 
 however.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (SOLR-422) one double quote or two double quotes only break search

2009-08-19 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12745245#action_12745245
 ] 

Trey Grainger edited comment on SOLR-422 at 8/19/09 5:03 PM:
-

This issue is in the same ballpark as SOLR-874.  Both concern bad parsing of 
fringe cases by the DisMax handler.

  was (Author: tgrainger):
This issue is in the same ballpark as SOLR-878.  Both concern bad parsing 
of fringe cases by the DisMax handler.
  
 one double quote or two double quotes only break search
 ---

 Key: SOLR-422
 URL: https://issues.apache.org/jira/browse/SOLR-422
 Project: Solr
  Issue Type: Bug
  Components: search
Reporter: Doug Daniels
Priority: Minor

 Using Dismax, searching for either one double quote character:
   q="
 or two double quote characters with no text between them:
   q=""
 throws an exception.  Not sure whether this is also the case for other 
 request handlers.
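
For anyone wanting to reproduce this, requests along the following lines 
(URL-encoded quotes, assuming the example server on port 8983 and a dismax 
handler) should trigger the exception:

http://localhost:8983/solr/select?defType=dismax&q=%22
http://localhost:8983/solr/select?defType=dismax&q=%22%22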

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-422) one double quote or two double quotes only break search

2009-08-19 Thread Trey Grainger (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trey Grainger updated SOLR-422:
---

Comment: was deleted

(was: This issue is in the same ballpark as SOLR-874.  Both concern bad parsing 
of fringe cases by the DisMax handler.)

 one double quote or two double quotes only break search
 ---

 Key: SOLR-422
 URL: https://issues.apache.org/jira/browse/SOLR-422
 Project: Solr
  Issue Type: Bug
  Components: search
Reporter: Doug Daniels
Priority: Minor

 Using Dismax, searching for either one double quote character:
   q="
 or two double quote characters with no text between them:
   q=""
 throws an exception.  Not sure whether this is also the case for other 
 request handlers.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-422) one double quote or two double quotes only break search

2009-08-19 Thread Trey Grainger (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trey Grainger updated SOLR-422:
---

Comment: was deleted

(was: These issues both concern reworking of the Dismax parser to handle fringe 
cases and should be dealt with together.)

 one double quote or two double quotes only break search
 ---

 Key: SOLR-422
 URL: https://issues.apache.org/jira/browse/SOLR-422
 Project: Solr
  Issue Type: Bug
  Components: search
Reporter: Doug Daniels
Priority: Minor

 Using Dismax, searching for either one double quote character:
   q="
 or two double quote characters with no text between them:
   q=""
 throws an exception.  Not sure whether this is also the case for other 
 request handlers.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.