[VOTE] Release PyLucene 4.3.0-1
It looks like the time has finally come for a PyLucene 4.x release! The PyLucene 4.3.0-1 release, tracking the recent release of Apache Lucene 4.3.0, is ready.

A release candidate is available from:
http://people.apache.org/~vajda/staging_area/

A list of changes in this release can be seen at:
http://svn.apache.org/repos/asf/lucene/pylucene/branches/pylucene_4_3/CHANGES

PyLucene 4.3.0 is built with JCC 2.16, included in these release artifacts:
http://svn.apache.org/repos/asf/lucene/pylucene/trunk/jcc/CHANGES

A list of Lucene Java changes can be seen at:
http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_3_0/lucene/CHANGES.txt

Please vote to release these artifacts as PyLucene 4.3.0-1.

Thanks!

Andi..

ps: the KEYS file for PyLucene release signing is at:
http://svn.apache.org/repos/asf/lucene/pylucene/dist/KEYS
http://people.apache.org/~vajda/staging_area/KEYS

pps: here is my +1
[jira] [Updated] (LUCENE-949) AnalyzingQueryParser can't work with leading wildcards.
[ https://issues.apache.org/jira/browse/LUCENE-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Rowe updated LUCENE-949:
------------------------------
    Attachment: LUCENE-949.patch

Hi [~talli...@mitre.org],

Sorry it took so long. I've attached a patch based on your patch, with some fixes:

* Removed tabs.
* Restored the license header and class javadoc to {{AnalyzingQueryParser.java}} (your patch removed them for some reason?).
* Converted all code indentation to 2 spaces per level (you had a lot of 3-spaces-per-level indentation).
* Converted the {{wildcardPattern}} to allow anything to be escaped, not just backslashes and the wildcard chars '?' and '*'. Also removed the optional backslashes from group 2 (the actual wildcards): when iterating over wildcardPattern matches, your patch would throw away any number of real wildcards following an escaped wildcard. I added a test for this.
* When multiple output tokens are produced (and there should only be one), now reporting all of them in the exception message instead of just the first two.
* Removed all references to "chunklet" in favor of "output token"; this non-standard terminology made the code harder to read.
* Changed descriptions of multiple output tokens to not necessarily be the result of splitting (e.g. synonyms).
* In {{analyzeSingleChunk()}}, moved exception throwing to the source of the problems.

I also added a {{CHANGES.txt}} entry.

Tim, let me know if you think my changes are okay; if so, I think it's ready to commit.

AnalyzingQueryParser can't work with leading wildcards.
---
                Key: LUCENE-949
                URL: https://issues.apache.org/jira/browse/LUCENE-949
            Project: Lucene - Core
         Issue Type: Bug
         Components: core/queryparser
   Affects Versions: 2.2
           Reporter: Stefan Klein
        Attachments: LUCENE-949.patch, LUCENE-949.patch, LUCENE-949.patch

The getWildcardQuery method in AnalyzingQueryParser.java needs the following changes to accept leading wildcards:

protected Query getWildcardQuery(String field, String termStr) throws ParseException {
  String useTermStr = termStr;
  String leadingWildcard = null;
  if ("*".equals(field)) {
    if ("*".equals(useTermStr)) return new MatchAllDocsQuery();
  }
  boolean hasLeadingWildcard = (useTermStr.startsWith("*") || useTermStr.startsWith("?")) ? true : false;
  if (!getAllowLeadingWildcard() && hasLeadingWildcard)
    throw new ParseException("'*' or '?' not allowed as first character in WildcardQuery");
  if (getLowercaseExpandedTerms()) {
    useTermStr = useTermStr.toLowerCase();
  }
  if (hasLeadingWildcard) {
    leadingWildcard = useTermStr.substring(0, 1);
    useTermStr = useTermStr.substring(1);
  }
  List tlist = new ArrayList();
  List wlist = new ArrayList();
  /*
   * somewhat a hack: find/store wildcard chars in order to put them back
   * after analyzing
   */
  boolean isWithinToken = (!useTermStr.startsWith("?") && !useTermStr.startsWith("*"));
  isWithinToken = true;
  StringBuffer tmpBuffer = new StringBuffer();
  char[] chars = useTermStr.toCharArray();
  for (int i = 0; i < useTermStr.length(); i++) {
    if (chars[i] == '?' || chars[i] == '*') {
      if (isWithinToken) {
        tlist.add(tmpBuffer.toString());
        tmpBuffer.setLength(0);
      }
      isWithinToken = false;
    } else {
      if (!isWithinToken) {
        wlist.add(tmpBuffer.toString());
        tmpBuffer.setLength(0);
      }
      isWithinToken = true;
    }
    tmpBuffer.append(chars[i]);
  }
  if (isWithinToken) {
    tlist.add(tmpBuffer.toString());
  } else {
    wlist.add(tmpBuffer.toString());
  }
  // get Analyzer from superclass and tokenize the term
  TokenStream source = getAnalyzer().tokenStream(field, new
[jira] [Commented] (LUCENE-949) AnalyzingQueryParser can't work with leading wildcards.
[ https://issues.apache.org/jira/browse/LUCENE-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649560#comment-13649560 ]

Steve Rowe commented on LUCENE-949:
-----------------------------------
One other change I forgot to mention, Tim: I substituted MockAnalyzer where you used StandardAnalyzer in the test code. This allowed me to remove the analyzers-common dependency you introduced (and also the memory dependency, which didn't seem to be used for anything in your patch).

AnalyzingQueryParser can't work with leading wildcards.
---
                Key: LUCENE-949
                URL: https://issues.apache.org/jira/browse/LUCENE-949
            Project: Lucene - Core
         Issue Type: Bug
         Components: core/queryparser
   Affects Versions: 2.2
           Reporter: Stefan Klein
        Attachments: LUCENE-949.patch, LUCENE-949.patch, LUCENE-949.patch

The getWildcardQuery method in AnalyzingQueryParser.java needs the following changes to accept leading wildcards:

protected Query getWildcardQuery(String field, String termStr) throws ParseException {
  String useTermStr = termStr;
  String leadingWildcard = null;
  if ("*".equals(field)) {
    if ("*".equals(useTermStr)) return new MatchAllDocsQuery();
  }
  boolean hasLeadingWildcard = (useTermStr.startsWith("*") || useTermStr.startsWith("?")) ? true : false;
  if (!getAllowLeadingWildcard() && hasLeadingWildcard)
    throw new ParseException("'*' or '?' not allowed as first character in WildcardQuery");
  if (getLowercaseExpandedTerms()) {
    useTermStr = useTermStr.toLowerCase();
  }
  if (hasLeadingWildcard) {
    leadingWildcard = useTermStr.substring(0, 1);
    useTermStr = useTermStr.substring(1);
  }
  List tlist = new ArrayList();
  List wlist = new ArrayList();
  /*
   * somewhat a hack: find/store wildcard chars in order to put them back
   * after analyzing
   */
  boolean isWithinToken = (!useTermStr.startsWith("?") && !useTermStr.startsWith("*"));
  isWithinToken = true;
  StringBuffer tmpBuffer = new StringBuffer();
  char[] chars = useTermStr.toCharArray();
  for (int i = 0; i < useTermStr.length(); i++) {
    if (chars[i] == '?' || chars[i] == '*') {
      if (isWithinToken) {
        tlist.add(tmpBuffer.toString());
        tmpBuffer.setLength(0);
      }
      isWithinToken = false;
    } else {
      if (!isWithinToken) {
        wlist.add(tmpBuffer.toString());
        tmpBuffer.setLength(0);
      }
      isWithinToken = true;
    }
    tmpBuffer.append(chars[i]);
  }
  if (isWithinToken) {
    tlist.add(tmpBuffer.toString());
  } else {
    wlist.add(tmpBuffer.toString());
  }
  // get Analyzer from superclass and tokenize the term
  TokenStream source = getAnalyzer().tokenStream(field, new StringReader(useTermStr));
  org.apache.lucene.analysis.Token t;
  int countTokens = 0;
  while (true) {
    try {
      t = source.next();
    } catch (IOException e) {
      t = null;
    }
    if (t == null) {
      break;
    }
    if (!"".equals(t.termText())) {
      try {
        tlist.set(countTokens++, t.termText());
      } catch (IndexOutOfBoundsException ioobe) {
        countTokens = -1;
      }
    }
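The token/wildcard chunking used in the snippet above can be sketched as a standalone helper. This is an illustrative reconstruction of the technique only, not Lucene's actual API: the class and method names are invented, and escaping of wildcard characters is deliberately ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: split a wildcard term into alternating runs of
// analyzable text and wildcard characters ('?' and '*'), so the
// text runs can be analyzed and the wildcards put back afterwards.
public class WildcardChunker {
    public static List<String> chunks(String term) {
        List<String> out = new ArrayList<>();
        StringBuilder buf = new StringBuilder();
        boolean inWildcard = false;
        for (char c : term.toCharArray()) {
            boolean isWild = (c == '?' || c == '*');
            // Flush the buffer whenever we cross a text/wildcard boundary.
            if (buf.length() > 0 && isWild != inWildcard) {
                out.add(buf.toString());
                buf.setLength(0);
            }
            inWildcard = isWild;
            buf.append(c);
        }
        if (buf.length() > 0) out.add(buf.toString());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(chunks("foo*bar?")); // [foo, *, bar, ?]
        System.out.println(chunks("*lead"));    // [*, lead]
    }
}
```

A leading-wildcard term such as `*lead` simply yields a wildcard run first, which is why the patch above strips and remembers the leading wildcard before analysis.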
[jira] [Created] (SOLR-4788) Multiple Entities DIH: dataimporter.[entityName].last_index_time is empty
chakming wong created SOLR-4788:
-----------------------------------
             Summary: Multiple Entities DIH: dataimporter.[entityName].last_index_time is empty
                 Key: SOLR-4788
                 URL: https://issues.apache.org/jira/browse/SOLR-4788
             Project: Solr
          Issue Type: Bug
    Affects Versions: 4.2
            Reporter: chakming wong

<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
    <dataSource name="source1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://*:*/*" user="*" password="*"/>
    <document name="strings">
        <entity name="entity1" pk="id" dataSource="source1"
                query="SELECT * FROM table_a"
                deltaQuery="SELECT table_a_id FROM table_b WHERE last_modified > '${dataimporter.entity1.last_index_time}'"
                deltaImportQuery="SELECT * FROM table_a WHERE id = '${dataimporter.entity1.id}'"
                transformer="TemplateTransformer">
            <field ... />
        </entity>
    </document>
</dataConfig>

In the above setup, dataimporter.entity1.last_index_time is an empty string.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4788) Multiple Entities DIH: dataimporter.[entityName].last_index_time is empty
[ https://issues.apache.org/jira/browse/SOLR-4788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chakming wong updated SOLR-4788:
--------------------------------
    Description:
{code}
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
    <dataSource name="source1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://*:*/*" user="*" password="*"/>
    <document name="strings">
        <entity name="entity1" pk="id" dataSource="source1"
                query="SELECT * FROM table_a"
                deltaQuery="SELECT table_a_id FROM table_b WHERE last_modified > '${dataimporter.entity1.last_index_time}'"
                deltaImportQuery="SELECT * FROM table_a WHERE id = '${dataimporter.entity1.id}'"
                transformer="TemplateTransformer">
            <field ... />
        </entity>
    </document>
</dataConfig>
{code}
In the above setup, dataimporter.entity1.last_index_time is an empty string.
[jira] [Updated] (SOLR-4788) Multiple Entities DIH: dataimporter.[entityName].last_index_time is empty
[ https://issues.apache.org/jira/browse/SOLR-4788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chakming wong updated SOLR-4788:
--------------------------------
    Description:
{code:title=solrconfig.xml|borderStyle=solid}
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
    <dataSource name="source1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://*:*/*" user="*" password="*"/>
    <document name="strings">
        <entity name="entity1" pk="id" dataSource="source1"
                query="SELECT * FROM table_a"
                deltaQuery="SELECT table_a_id FROM table_b WHERE last_modified > '${dataimporter.entity1.last_index_time}'"
                deltaImportQuery="SELECT * FROM table_a WHERE id = '${dataimporter.entity1.id}'"
                transformer="TemplateTransformer">
            <field ... />
        </entity>
    </document>
</dataConfig>
{code}
In the above setup, dataimporter.entity1.last_index_time is an empty string.
[jira] [Updated] (SOLR-4788) Multiple Entities DIH: dataimporter.[entityName].last_index_time is empty
[ https://issues.apache.org/jira/browse/SOLR-4788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chakming wong updated SOLR-4788:
--------------------------------
    Description:
{code:title=solrconfig.xml|borderStyle=solid}
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
    <dataSource name="source1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://*:*/*" user="*" password="*"/>
    <document name="strings">
        <entity name="entity1" pk="id" dataSource="source1"
                query="SELECT * FROM table_a"
                deltaQuery="SELECT table_a_id FROM table_b WHERE last_modified > '${dataimporter.entity1.last_index_time}'"
                deltaImportQuery="SELECT * FROM table_a WHERE id = '${dataimporter.entity1.id}'"
                transformer="TemplateTransformer">
            <field ... />
        </entity>
    </document>
</dataConfig>
{code}
In the above setup, *dataimporter.entity1.last_index_time* is an *empty string*.
[jira] [Updated] (SOLR-4788) Multiple Entities DIH: dataimporter.[entityName].last_index_time is empty
[ https://issues.apache.org/jira/browse/SOLR-4788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chakming wong updated SOLR-4788:
--------------------------------
    Description:
{code:title=solrconfig.xml|borderStyle=solid}
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
    <dataSource name="source1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://*:*/*" user="*" password="*"/>
    <document name="strings">
        <entity name="entity1" pk="id" dataSource="source1"
                query="SELECT * FROM table_a"
                deltaQuery="SELECT table_a_id FROM table_b WHERE last_modified > '${dataimporter.entity1.last_index_time}'"
                deltaImportQuery="SELECT * FROM table_a WHERE id = '${dataimporter.entity1.id}'"
                transformer="TemplateTransformer">
            <field ...> ... </field>
        </entity>
        <entity name="entity2"> ... </entity>
        <entity name="entity3"> ... </entity>
    </document>
</dataConfig>
{code}
In the above setup, *dataimporter.entity1.last_index_time* is an *empty string*.
[jira] [Updated] (SOLR-4788) Multiple Entities DIH: dataimporter.[entityName].last_index_time is empty
[ https://issues.apache.org/jira/browse/SOLR-4788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chakming wong updated SOLR-4788:
--------------------------------
    Description:
{code:title=solrconfig.xml|borderStyle=solid}
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
    <dataSource name="source1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://*:*/*" user="*" password="*"/>
    <document name="strings">
        <entity name="entity1" pk="id" dataSource="source1"
                query="SELECT * FROM table_a"
                deltaQuery="SELECT table_a_id FROM table_b WHERE last_modified > '${dataimporter.entity1.last_index_time}'"
                deltaImportQuery="SELECT * FROM table_a WHERE id = '${dataimporter.entity1.id}'"
                transformer="TemplateTransformer">
            <field ...> ... </field>
        </entity>
        <entity name="entity2"> ... </entity>
        <entity name="entity3"> ... </entity>
    </document>
</dataConfig>
{code}
In the above setup, *dataimporter.entity1.last_index_time* is an *empty string*.
[jira] [Updated] (SOLR-4788) Multiple Entities DIH: dataimporter.[entityName].last_index_time is empty
[ https://issues.apache.org/jira/browse/SOLR-4788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chakming wong updated SOLR-4788:
--------------------------------
    Description:
{code:title=solrconfig.xml|borderStyle=solid}
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
    <dataSource name="source1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://*:*/*" user="*" password="*"/>
    <document>
        <entity name="entity1" pk="id" dataSource="source1"
                query="SELECT * FROM table_a"
                deltaQuery="SELECT table_a_id FROM table_b WHERE last_modified > '${dataimporter.entity1.last_index_time}'"
                deltaImportQuery="SELECT * FROM table_a WHERE id = '${dataimporter.entity1.id}'"
                transformer="TemplateTransformer">
            <field ...> ... </field>
        </entity>
        <entity name="entity2"> ... </entity>
        <entity name="entity3"> ... </entity>
    </document>
</dataConfig>
{code}
In the above setup, *dataimporter.entity1.last_index_time* is an *empty string*.
[jira] [Updated] (SOLR-4788) Multiple Entities DIH delta import: dataimporter.[entityName].last_index_time is empty
[ https://issues.apache.org/jira/browse/SOLR-4788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chakming wong updated SOLR-4788:
--------------------------------
    Summary: Multiple Entities DIH delta import: dataimporter.[entityName].last_index_time is empty (was: Multiple Entities DIH: dataimporter.[entityName].last_index_time is empty)
[jira] [Updated] (SOLR-4788) Multiple Entities DIH delta import: dataimporter.[entityName].last_index_time is empty
[ https://issues.apache.org/jira/browse/SOLR-4788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chakming wong updated SOLR-4788:
--------------------------------
    Description:
{code:title=conf/dataimport.properties|borderStyle=solid}
entity1.last_index_time=2013-05-06 03\:02\:06
last_index_time=2013-05-06 03\:05\:22
entity2.last_index_time=2013-05-06 03\:03\:14
entity3.last_index_time=2013-05-06 03\:05\:22
{code}
{code:title=conf/solrconfig.xml|borderStyle=solid}
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
    <dataSource name="source1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://*:*/*" user="*" password="*"/>
    <document name="strings">
        <entity name="entity1" pk="id" dataSource="source1"
                query="SELECT * FROM table_a"
                deltaQuery="SELECT table_a_id FROM table_b WHERE last_modified > '${dataimporter.entity1.last_index_time}'"
                deltaImportQuery="SELECT * FROM table_a WHERE id = '${dataimporter.entity1.id}'"
                transformer="TemplateTransformer">
            <field ...> ... </field>
        </entity>
        <entity name="entity2"> ... </entity>
        <entity name="entity3"> ... </entity>
    </document>
</dataConfig>
{code}
In the above setup, *dataimporter.entity1.last_index_time* is an *empty string*.
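The properties file above shows what the reporter expects to happen: a variable like dataimporter.entity1.last_index_time should resolve to the entity1.last_index_time key, falling back to the global last_index_time. The sketch below illustrates that lookup with java.util.Properties; it is an assumption about the intended behavior, not Solr's actual DataImportHandler code, and the class and method names are invented.

```java
import java.io.StringReader;
import java.util.Properties;

// Illustrative lookup of a per-entity last_index_time from a
// dataimport.properties-style file, with fallback to the global key.
public class LastIndexTimeLookup {
    public static String resolve(Properties props, String entity) {
        // Prefer the entity-scoped key, e.g. "entity1.last_index_time".
        String v = props.getProperty(entity + ".last_index_time");
        return (v != null) ? v : props.getProperty("last_index_time");
    }

    public static void main(String[] args) throws Exception {
        Properties p = new Properties();
        // "\:" is the properties-file escape for ':' in values.
        p.load(new StringReader(
            "entity1.last_index_time=2013-05-06 03\\:02\\:06\n" +
            "last_index_time=2013-05-06 03\\:05\\:22\n"));
        System.out.println(resolve(p, "entity1")); // 2013-05-06 03:02:06
        System.out.println(resolve(p, "entity9")); // falls back to global
    }
}
```

The bug report amounts to the entity-scoped resolution step returning an empty string instead of the stored per-entity timestamp.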
[jira] [Updated] (SOLR-4788) Multiple Entities DIH delta import: dataimporter.[entityName].last_index_time is empty
[ https://issues.apache.org/jira/browse/SOLR-4788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chakming wong updated SOLR-4788:
--------------------------------
    Description:
{code:title=conf/dataimport.properties|borderStyle=solid}
entity1.last_index_time=2013-05-06 03\:02\:06
last_index_time=2013-05-06 03\:05\:22
entity2.last_index_time=2013-05-06 03\:03\:14
entity3.last_index_time=2013-05-06 03\:05\:22
{code}
{code:title=solrconfig.xml|borderStyle=solid}
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
    <dataSource name="source1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://*:*/*" user="*" password="*"/>
    <document name="strings">
        <entity name="entity1" pk="id" dataSource="source1"
                query="SELECT * FROM table_a"
                deltaQuery="SELECT table_a_id FROM table_b WHERE last_modified > '${dataimporter.entity1.last_index_time}'"
                deltaImportQuery="SELECT * FROM table_a WHERE id = '${dataimporter.entity1.id}'"
                transformer="TemplateTransformer">
            <field ...> ... </field>
        </entity>
        <entity name="entity2"> ... </entity>
        <entity name="entity3"> ... </entity>
    </document>
</dataConfig>
{code}
In the above setup, *dataimporter.entity1.last_index_time* is an *empty string*.
[jira] [Updated] (SOLR-4788) Multiple Entities DIH delta import: dataimporter.[entityName].last_index_time is empty
[ https://issues.apache.org/jira/browse/SOLR-4788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chakming wong updated SOLR-4788:
---
Description:

{code:title=conf/dataimport.properties|borderStyle=solid}
entity1.last_index_time=2013-05-06 03\:02\:06
last_index_time=2013-05-06 03\:05\:22
entity2.last_index_time=2013-05-06 03\:03\:14
entity3.last_index_time=2013-05-06 03\:05\:22
{code}

{code:title=conf/solrconfig.xml|borderStyle=solid}
<?xml version="1.0" encoding="UTF-8"?>
...
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dihconfig.xml</str>
  </lst>
</requestHandler>
...
{code}

{code:title=conf/dihconfig.xml|borderStyle=solid}
<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
  <dataSource name="source1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://*:*/*" user="*" password="*"/>
  <document name="strings">
    <entity name="entity1" pk="id" dataSource="source1"
            query="SELECT * FROM table_a"
            deltaQuery="SELECT table_a_id FROM table_b WHERE last_modified &gt; '${dataimporter.entity1.last_index_time}'"
            deltaImportQuery="SELECT * FROM table_a WHERE id = '${dataimporter.entity1.id}'"
            transformer="TemplateTransformer">
      <field ... ... ... />
    </entity>
    <entity name="entity2"> ... ... </entity>
    <entity name="entity3"> ... ... </entity>
  </document>
</dataConfig>
{code}

In the above setup, *dataimporter.entity1.last_index_time* is an *empty string*.

was:

{code:title=conf/dataimport.properties|borderStyle=solid}
entity1.last_index_time=2013-05-06 03\:02\:06
last_index_time=2013-05-06 03\:05\:22
entity2.last_index_time=2013-05-06 03\:03\:14
entity3.last_index_time=2013-05-06 03\:05\:22
{code}

{code:title=conf/solrconfig.xml|borderStyle=solid}
<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
  <dataSource name="source1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://*:*/*" user="*" password="*"/>
  <document name="strings">
    <entity name="entity1" pk="id" dataSource="source1"
            query="SELECT * FROM table_a"
            deltaQuery="SELECT table_a_id FROM table_b WHERE last_modified &gt; '${dataimporter.entity1.last_index_time}'"
            deltaImportQuery="SELECT * FROM table_a WHERE id = '${dataimporter.entity1.id}'"
            transformer="TemplateTransformer">
      <field ... ... ... />
    </entity>
    <entity name="entity2"> ... ... </entity>
    <entity name="entity3"> ... ... </entity>
  </document>
</dataConfig>
{code}

In the above setup, *dataimporter.entity1.last_index_time* is an *empty string*.
[jira] [Updated] (SOLR-4788) Multiple Entities DIH delta import: dataimporter.[entityName].last_index_time is empty
[ https://issues.apache.org/jira/browse/SOLR-4788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chakming wong updated SOLR-4788:
---
Description:

{code:title=conf/dataimport.properties|borderStyle=solid}
entity1.last_index_time=2013-05-06 03\:02\:06
last_index_time=2013-05-06 03\:05\:22
entity2.last_index_time=2013-05-06 03\:03\:14
entity3.last_index_time=2013-05-06 03\:05\:22
{code}

{code:title=conf/solrconfig.xml|borderStyle=solid}
<?xml version="1.0" encoding="UTF-8"?>
...
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dihconfig.xml</str>
  </lst>
</requestHandler>
...
{code}

{code:title=conf/dihconfig.xml|borderStyle=solid}
<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
  <dataSource name="source1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://*:*/*" user="*" password="*"/>
  <document name="strings">
    <entity name="entity1" pk="id" dataSource="source1"
            query="SELECT * FROM table_a"
            deltaQuery="SELECT table_a_id FROM table_b WHERE last_modified &gt; '${dataimporter.entity1.last_index_time}'"
            deltaImportQuery="SELECT * FROM table_a WHERE id = '${dataimporter.entity1.id}'"
            transformer="TemplateTransformer">
      <field ... ... ... />
    </entity>
    <entity name="entity2"> ... ... </entity>
    <entity name="entity3"> ... ... </entity>
  </document>
</dataConfig>
{code}

In the above setup, *dataimporter.entity1.last_index_time* is an *empty string*, which causes the SQL query to fail.

was:

{code:title=conf/dataimport.properties|borderStyle=solid}
entity1.last_index_time=2013-05-06 03\:02\:06
last_index_time=2013-05-06 03\:05\:22
entity2.last_index_time=2013-05-06 03\:03\:14
entity3.last_index_time=2013-05-06 03\:05\:22
{code}

{code:title=conf/solrconfig.xml|borderStyle=solid}
<?xml version="1.0" encoding="UTF-8"?>
...
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dihconfig.xml</str>
  </lst>
</requestHandler>
...
{code}

{code:title=conf/dihconfig.xml|borderStyle=solid}
<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
  <dataSource name="source1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://*:*/*" user="*" password="*"/>
  <document name="strings">
    <entity name="entity1" pk="id" dataSource="source1"
            query="SELECT * FROM table_a"
            deltaQuery="SELECT table_a_id FROM table_b WHERE last_modified &gt; '${dataimporter.entity1.last_index_time}'"
            deltaImportQuery="SELECT * FROM table_a WHERE id = '${dataimporter.entity1.id}'"
            transformer="TemplateTransformer">
      <field ... ... ... />
    </entity>
    <entity name="entity2"> ... ... </entity>
    <entity name="entity3"> ... ... </entity>
  </document>
</dataConfig>
{code}

In the above setup, *dataimporter.entity1.last_index_time* is an *empty string*.
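[Editor's note] The escaped timestamps in conf/dataimport.properties above are standard Java properties syntax, and the reported symptom is a `${dataimporter.entity1.last_index_time}` placeholder that resolves to an empty string. A minimal sketch of the lookup (the `lookup` helper and its fallback to the global `last_index_time` key are hypothetical, not DIH's actual resolution code):

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class DihPropsSketch {
    // The same content as conf/dataimport.properties above ("\:" escapes a colon).
    static final String PROPS =
        "entity1.last_index_time=2013-05-06 03\\:02\\:06\n" +
        "last_index_time=2013-05-06 03\\:05\\:22\n";

    // Hypothetical helper: entity-scoped lookup with a fallback to the
    // global key, returning "" when neither key is present -- the empty
    // string that then ends up inside the deltaQuery.
    static String lookup(String entity) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(PROPS)); // Properties.load() unescapes "\:" to ":"
        } catch (IOException e) {
            throw new RuntimeException(e);   // cannot happen for an in-memory reader
        }
        String v = p.getProperty(entity + ".last_index_time");
        if (v == null) v = p.getProperty("last_index_time");
        return v == null ? "" : v;
    }

    public static void main(String[] args) {
        System.out.println(lookup("entity1")); // entity-scoped value, colons restored
        System.out.println(lookup("entity9")); // no such entity: global fallback
    }
}
```

If the per-entity key is never written or never read back under the exact `entityName.last_index_time` spelling, the placeholder substitution sees nothing, which matches the empty-string behavior reported in this issue.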
[jira] [Created] (LUCENE-4982) Make MockIndexOutputWrapper check disk full on copyBytes
Shai Erera created LUCENE-4982:
--
Summary: Make MockIndexOutputWrapper check disk full on copyBytes
Key: LUCENE-4982
URL: https://issues.apache.org/jira/browse/LUCENE-4982
Project: Lucene - Core
Issue Type: Improvement
Components: general/test
Reporter: Shai Erera
Assignee: Shai Erera

While working on the consistency test for Replicator (LUCENE-4975), I noticed that I don't trip disk-full exceptions, and tracked it down to MockIndexOutputWrapper.copyBytes not doing these checks like writeBytes does. I'd like to add this check.
[jira] [Updated] (LUCENE-4982) Make MockIndexOutputWrapper check disk full on copyBytes
[ https://issues.apache.org/jira/browse/LUCENE-4982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-4982:
---
Component/s: (was: general/test)
             modules/test-framework
[jira] [Updated] (LUCENE-4982) Make MockIndexOutputWrapper check disk full on copyBytes
[ https://issues.apache.org/jira/browse/LUCENE-4982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-4982:
---
Attachment: LUCENE-4982.patch

Patch adds a test to TestMockDirWrapper and factors out a checkDiskFull method in MockIndexOutputWrapper. The signature is a bit ugly, but that's needed because checkDiskFull copies the remaining bytes, and writeBytes copies from an array while copyBytes copies from a DataInput. I don't think it's the end of the world, but if anyone has an idea how to do it better... I ran core tests and they passed (actually, only 3 tests under core set dir.maxSize).
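[Editor's note] A rough sketch of the idea behind this patch (names and structure are hypothetical, not the actual Lucene code): route both the array-based and stream-based write paths through one shared disk-full guard, so that a copyBytes-style call can no longer bypass the check.

```java
import java.io.IOException;

public class DiskFullSketch {
    private final long maxSize;  // simulated disk capacity in bytes
    private long written;        // bytes written so far

    public DiskFullSketch(long maxSize) { this.maxSize = maxSize; }

    // Shared guard: every write path calls this before any bytes land.
    private void checkDiskFull(long len) throws IOException {
        if (written + len > maxSize) {
            throw new IOException("fake disk full at " + written + " bytes");
        }
    }

    public void writeBytes(byte[] b, int off, int len) throws IOException {
        checkDiskFull(len);
        written += len; // a real wrapper would also forward to the delegate output
    }

    // Before the fix described in this issue, a copy path like this could
    // skip the check; here it goes through the same guard.
    public void copyBytes(long numBytes) throws IOException {
        checkDiskFull(numBytes);
        written += numBytes;
    }

    public long bytesWritten() { return written; }
}
```

The awkwardness Shai mentions comes from the two call sites holding their data differently (an array versus a DataInput); the shared guard only needs the length, which is what makes factoring it out possible.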
[jira] [Commented] (SOLR-3177) Excluding tagged filter in StatsComponent
[ https://issues.apache.org/jira/browse/SOLR-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13649604#comment-13649604 ] Alexander Buhr commented on SOLR-3177:
--
Is this going to be released at some point?

Excluding tagged filter in StatsComponent
-
Key: SOLR-3177
URL: https://issues.apache.org/jira/browse/SOLR-3177
Project: Solr
Issue Type: Improvement
Components: SearchComponents - other
Affects Versions: 3.5, 3.6, 4.0-ALPHA, 4.1
Reporter: Mathias H.
Priority: Minor
Labels: localparams, stats, statscomponent
Attachments: SOLR-3177.patch

It would be useful to exclude the effects of some fq params from the set of documents used to compute stats -- similar to how you can exclude tagged filters when generating facet counts: https://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters

So that it's possible to do something like this:

http://localhost:8983/solr/select?fq={!tag=priceFilter}price:[1 TO 20]&q=*:*&stats=true&stats.field={!ex=priceFilter}price

If you want to create a price slider this is very useful, because then you can filter on the price ([1 TO 20]) and nevertheless get the lower and upper bounds of the unfiltered price (min=0, max=100):

{noformat}
|-[---]--|
$0 $1 $20 $100
{noformat}
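[Editor's note] The tag/exclude request in the comment above can be assembled programmatically. A small sketch that builds the same query string; the `{!tag=...}`/`{!ex=...}` local params and the `stats`/`stats.field` parameter names come from the issue text, while the builder class itself is hypothetical:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class StatsUrlSketch {
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e); // UTF-8 is always supported
        }
    }

    // Filter on price via a tagged fq, but compute stats on the *unfiltered*
    // price field by excluding that tag in stats.field.
    static String priceSliderQuery() {
        return "q=" + enc("*:*")
             + "&fq=" + enc("{!tag=priceFilter}price:[1 TO 20]")
             + "&stats=true"
             + "&stats.field=" + enc("{!ex=priceFilter}price");
    }
}
```

Appending the result to a Solr select URL reproduces the price-slider request shown above, with the local params percent-encoded for safety.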
[jira] [Commented] (LUCENE-4975) Add Replication module to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13649613#comment-13649613 ] Adrien Grand commented on LUCENE-4975:
--
+1 to commit too. Looking at the code, there seem to be specialized implementations for faceting because of the need to replicate the taxonomy indexes too, so I was wondering whether this facet-specific code should be under lucene/facets rather than lucene/replicator, so that lucene/replicator doesn't need to depend on all modules that have specific replication needs. (I'm not sure what the best option is yet; this can be addressed afterwards.)

Add Replication module to Lucene
-
Key: LUCENE-4975
URL: https://issues.apache.org/jira/browse/LUCENE-4975
Project: Lucene - Core
Issue Type: New Feature
Reporter: Shai Erera
Assignee: Shai Erera
Attachments: LUCENE-4975.patch, LUCENE-4975.patch, LUCENE-4975.patch, LUCENE-4975.patch

I wrote a replication module which I think will be useful to Lucene users who want to replicate their indexes for e.g. high availability, taking hot backups etc. I will upload a patch soon where I'll describe in general how it works.
[jira] [Commented] (LUCENE-4975) Add Replication module to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13649631#comment-13649631 ] Shai Erera commented on LUCENE-4975:
--
I've been wondering about that too, but chose to keep the facet replication code under replicator for a few reasons:

* A Revision contains files from multiple sources, and the taxonomy index is partly responsible for that. And ReplicationClient respects that -- so I guess it's not entirely true that the Replicator is unaware of the taxonomy (even though it would still work if I pulled the taxonomy stuff out of it).
* I think it makes less sense to require lucene-replicator.jar for every faceted search app which makes use of lucene-facet.jar. The key reason is that the replicator requires a few additional jars such as httpclient, httpcore, jetty and servlet-api. Requiring lucene-facet.jar seems less painful to me than requiring every faceted search app out there to include all these jars even if it doesn't want to do replication.
* I like to keep things local to the module. There are many similarities between IndexAndTaxoRevision and IndexRevision (likewise for their handlers and tests). Therefore, whenever I made a change to one, I knew I should go make a similar change to the other.

All in all, I guess arguments can be made both ways, but for now I prefer to keep things local to the replicator module. Even in the future, I would imagine that if we added support for replicating suggester files, it would make sense to put a dependency between the replicator and the suggester, rather than the other way around.
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13649633#comment-13649633 ] SooMyung Lee commented on LUCENE-4956:
--
[~cm] I'm sorry that I didn't reply to your comment over the weekend! I see that [~steve_rowe] solved your problem. Am I right?

[~steve_rowe] I checked the method. isNounPart() is no longer necessary. Spaces should be inserted between phrases in a Korean sentence, but many people are confused about where to insert them. The isNounPart() method examines whether spaces should be inserted at a specific position, but only when a noun existing in the dictionary precedes it. After testing, I found that the method is superfluous. I'm sorry I didn't correct the source code before contributing.

the korean analyzer that has a korean morphological analyzer and dictionaries
-
Key: LUCENE-4956
URL: https://issues.apache.org/jira/browse/LUCENE-4956
Project: Lucene - Core
Issue Type: New Feature
Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
Assignee: Christian Moen
Labels: newbie
Attachments: kr.analyzer.4x.tar

The Korean language has specific characteristics. When developing a search service with Lucene/Solr in Korean, there are some problems in searching and indexing. The Korean analyzer solves these problems with a Korean morphological analyzer. It consists of a Korean morphological analyzer, dictionaries, a Korean tokenizer and a Korean filter. The Korean analyzer is made for Lucene and Solr. If you develop a search service with Lucene in Korean, it is the best idea to choose the Korean analyzer.
[jira] [Updated] (LUCENE-4975) Add Replication module to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-4975:
---
Attachment: LUCENE-4975.patch

bq. maybe also call MDW.setRandomIOExceptionRateOnOpen

Thanks Mike! I added that and a slew of problems surfaced, most of them in the test, but I improved the handlers' implementation to clean up after themselves if e.g. a copy or sync to the handlerDir failed. While this wasn't a bug, it leaves the target index directory clean.

There's one nocommit which bugs me though -- I had to add dir.setPreventDoubleWrite(false) because when the handler fails during copying of, say, _2.fdt to the index dir, the file is deleted from the indexDir and the client re-attempts to upgrade. At this point, MDW complains that _2.fdt was already written to, even though I deleted it. Adding this setPreventDoubleWrite was the only way I could make MDW happy, but I don't like it, since I do want to catch errors in the handler/client if they e.g. attempt to copy over an existing file. Maybe we can make MDW respond somehow to delete()? I know that has bad implications of its own, e.g. code which deletes and then accidentally recreates files with older names ... any ideas?
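[Editor's note] A toy sketch of the behavior Shai is asking about (the class and its bookkeeping are hypothetical, not MockDirectoryWrapper's actual code): double-write tracking that "responds to delete()", so re-creating a file that was genuinely deleted is allowed, while overwriting a live file still trips the check.

```java
import java.util.HashSet;
import java.util.Set;

public class DoubleWriteSketch {
    // Names of files that have been created and not deleted since.
    private final Set<String> live = new HashSet<>();

    public void createOutput(String name) {
        if (!live.add(name)) {
            throw new IllegalStateException("file \"" + name + "\" was already written");
        }
    }

    // Deleting removes the name from the live set, so a later re-create of
    // the same name is legal. This is exactly the trade-off raised above:
    // it also hides accidental delete-then-recreate bugs.
    public void deleteFile(String name) {
        live.remove(name);
    }
}
```

Under this scheme the replication-handler retry (delete _2.fdt, then copy it again) would pass without setPreventDoubleWrite(false), at the cost of no longer catching code that recreates a deleted file by mistake.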
[jira] [Commented] (LUCENE-4975) Add Replication module to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13649635#comment-13649635 ] Adrien Grand commented on LUCENE-4975:
--
Good points, you convinced me. :-)
[jira] [Commented] (LUCENE-4980) Can't use DrillSideways with both RangeFacetRequest and non-RangeFacetRequest
[ https://issues.apache.org/jira/browse/LUCENE-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13649638#comment-13649638 ] Shai Erera commented on LUCENE-4980:
--
I was confused by the name MultiFacetsAccumulator, as I thought it takes something like a Map<FacetRequest,FacetsAccumulator>, but I see that it only distinguishes RangeAccumulator from others. So I'm worried that someone will get confused by the name and use it incorrectly. I don't have a better name in mind though ... RangeAndRegularFacetsAccumulator? What if RangeAccumulator did that under the covers? I.e., instead of rejecting non-RangeFacetRequests, it created an FA over all such requests? Multi is quite simple though, so I like it ... maybe FacetAccumulatorRangeWrapper? I think as long as we keep the word Range in the name, it's less likely users will get confused.

Minor comments about the class: (a) can you rename 'a' and 'ra'? (b) why do you need to hold onto fspOrig? Is it because FA.searchParams isn't final?

Can't use DrillSideways with both RangeFacetRequest and non-RangeFacetRequest
-
Key: LUCENE-4980
URL: https://issues.apache.org/jira/browse/LUCENE-4980
Project: Lucene - Core
Issue Type: Bug
Components: modules/facet
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: 5.0, 4.4
Attachments: LUCENE-4980.patch

I tried to combine these two and there were several issues:

* It's ... really tricky to manage the two different FacetAccumulators across the N FacetCollectors that DrillSideways creates ... to fix this I added a new MultiFacetsAccumulator that switches for you.
* There was still one place in DS/DDQ that wasn't properly handling a non-Term drill-down.
* There was a bug in the collector method for DrillSideways whereby if a given segment had no hits, it was skipped, which is incorrect because it must still be visited to tally up the sideways counts.
* Separately, I noticed that DrillSideways was doing too much work: it would count up drill-down counts *and* drill-sideways counts against the same dim (but then discard the drill-down counts in the end).
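[Editor's note] The dispatch idea described above (and the role of the saved original params that the fspOrig question touches on) can be shown with a toy sketch. All types here are hypothetical stand-ins, not Lucene's API: partition requests into "range" and "non-range" groups, accumulate each group separately, then re-collate the results back into the caller's original request order.

```java
import java.util.ArrayList;
import java.util.List;

public class MultiAccumulatorSketch {
    interface Request { boolean isRange(); String name(); }

    static Request req(final String name, final boolean range) {
        return new Request() {
            public boolean isRange() { return range; }
            public String name() { return name; }
        };
    }

    static List<String> accumulate(List<Request> original) {
        // Phase 1: route each request to its own "accumulator"; each group's
        // results come back in group order, not request order.
        List<String> rangeResults = new ArrayList<>();
        List<String> stdResults = new ArrayList<>();
        for (Request r : original) {
            if (r.isRange()) rangeResults.add("range:" + r.name());
            else stdResults.add("std:" + r.name());
        }
        // Phase 2: re-collate into the original request order -- this is why
        // the original request list has to be kept around.
        List<String> out = new ArrayList<>();
        int ri = 0, si = 0;
        for (Request r : original) {
            out.add(r.isRange() ? rangeResults.get(ri++) : stdResults.get(si++));
        }
        return out;
    }
}
```

Without phase 2, a caller that interleaved range and non-range requests would get results grouped by accumulator rather than in the order it asked for them.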
[jira] [Commented] (LUCENE-4975) Add Replication module to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13649643#comment-13649643 ] Robert Muir commented on LUCENE-4975:
--
{quote}
Even in the future, I would imagine that if we added support for replicating a suggester files, then it would make sense to put a dependency between replicator and suggester, rather than the other way around.
{quote}

Wait: how does this make sense?! It should be the other way around: if the suggester has a sidecar, it needs special logic for replication. It does not need faceting.
[jira] [Commented] (LUCENE-4975) Add Replication module to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13649650#comment-13649650 ] Shai Erera commented on LUCENE-4975:
--
As I said, arguments can be made both ways ... I don't know what the best way is here. I can see your point, but I don't feel good about having facet depend on replicator. I see the Replicator as a higher-level service that, besides providing the replication framework, also comes pre-built for replicating Lucene stuff. I don't mind seeing it grow to accommodate other Revision types in the future. For example, IndexAndTaxonomyRevision is just an example of replicating multiple indexes together. It can easily be duplicated to replicate a few indexes at once, e.g. a MultiIndexRevision. Where would that object be? It cannot be in core, so why should IndexAndTaxo be in facet?
[jira] [Commented] (LUCENE-4982) Make MockIndexOutputWrapper check disk full on copyBytes
[ https://issues.apache.org/jira/browse/LUCENE-4982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13649654#comment-13649654 ] Michael McCandless commented on LUCENE-4982:
--
+1, good catch. Who tests the tester!
[jira] [Commented] (LUCENE-4975) Add Replication module to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13649655#comment-13649655 ] Adrien Grand commented on LUCENE-4975:
--
Then maybe we could have sub-modules for specific replication strategies? lucene/replicator would only know how to handle raw indexes, while lucene/replicator/facets or lucene/replicator/suggest would implement custom logic? This way lucene/facet wouldn't need to pull all lucene/replicator transitive dependencies, and lucene/replicator wouldn't depend on any lucene module but lucene/core.
[jira] [Commented] (LUCENE-4975) Add Replication module to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649658#comment-13649658 ] Robert Muir commented on LUCENE-4975: - I still haven't had a chance to look at the patch, but it sounds like some work needs to be done here to prevent DLL hell. Having replicator depend upon all sidecar modules is a no-go. It sounds like an interface is missing.
[jira] [Commented] (LUCENE-4980) Can't use DrillSideways with both RangeFacetRequest and non-RangeFacetRequest
[ https://issues.apache.org/jira/browse/LUCENE-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649661#comment-13649661 ] Michael McCandless commented on LUCENE-4980: bq. What if RangeAccumulator did that under the covers? Well ... I have a TODO to also support SortedSetDocValuesAccumulator, so I'm not quite sure what to name it / where to put it. Another option here is to commit this class only under src/test ... it's technically only needed right now by the test case to expose the bugs ... but then I'm using the class in the Jira search app, because I need to use DrillSideways with range and non-range facets, and without it things get very messy. So we need to fix something here, but we can do it in a separate issue after fixing these bugs. bq. Minor comments about the class: (a) can you rename 'a' and 'ra'? Will do ... bq. (b) why do you need to hold onto fspOrig? Is it because FA.searchParams isn't final? I need fspOrig in accumulator() to un-collate the wrapped ListFacetResult back into the same order as the original requests ... Can't use DrillSideways with both RangeFacetRequest and non-RangeFacetRequest - Key: LUCENE-4980 URL: https://issues.apache.org/jira/browse/LUCENE-4980 Project: Lucene - Core Issue Type: Bug Components: modules/facet Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.4 Attachments: LUCENE-4980.patch I tried to combine these two and there were several issues: * It's ... really tricky to manage the two different FacetAccumulators across the N FacetCollectors that DrillSideways creates ... to fix this I added a new MultiFacetsAccumulator that switches for you. * There was still one place in DS/DDQ that wasn't properly handling a non-Term drill-down. * There was a bug in the collector method for DrillSideways whereby if a given segment had no hits, it was skipped, which is incorrect because it must still be visited to tally up the sideways counts.
* Separately I noticed that DrillSideways was doing too much work: it would count up drill-down counts *and* drill-sideways counts against the same dim (but then discard the drill-down counts in the end).
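The re-collation step discussed above (using the original request list, fspOrig, to restore result order after requests have been split across two accumulators) can be sketched in plain Java. This is a standalone illustration with hypothetical request strings, not the Lucene class:

```java
import java.util.*;

// Standalone sketch of the un-collation problem described above (not the
// actual MultiFacetsAccumulator): requests are split by type across two
// accumulators, each of which returns results in its own sub-order, so the
// merged results must be put back in the order of the original request list.
public class MultiAccumulatorSketch {

  static List<String> accumulate(List<String> requests) {
    // Hypothetical split: "range:" requests go to one accumulator,
    // everything else to another; each preserves its own relative order.
    List<String> rangeResults = new ArrayList<>();
    List<String> otherResults = new ArrayList<>();
    for (String req : requests) {
      if (req.startsWith("range:")) rangeResults.add("result(" + req + ")");
      else otherResults.add("result(" + req + ")");
    }
    // Re-collate into the original request order.
    List<String> merged = new ArrayList<>();
    int r = 0, o = 0;
    for (String req : requests) {
      merged.add(req.startsWith("range:") ? rangeResults.get(r++) : otherResults.get(o++));
    }
    return merged;
  }

  public static void main(String[] args) {
    List<String> reqs = Arrays.asList("range:price", "dim:author", "range:date");
    System.out.println(accumulate(reqs));
  }
}
```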
[jira] [Updated] (SOLR-4785) New MaxScoreQParserPlugin
[ https://issues.apache.org/jira/browse/SOLR-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-4785: -- Attachment: SOLR-4785.patch First patch with tests and support for the tie parameter. New MaxScoreQParserPlugin - Key: SOLR-4785 URL: https://issues.apache.org/jira/browse/SOLR-4785 Project: Solr Issue Type: New Feature Components: query parsers Reporter: Jan Høydahl Assignee: Jan Høydahl Priority: Minor Fix For: 5.0, 4.4 Attachments: SOLR-4785.patch A customer wants to contribute back this component. It is a QParser which behaves exactly like the lucene parser (it extends it), but returns the max score from the clauses, i.e. max(c1,c2,c3...) instead of the default, which is sum(c1,c2,c3...). It does this by wrapping all SHOULD clauses in a DisjunctionMaxQuery with tie=1.0. Any MUST or PROHIBITED clauses are passed through as-is. Non-boolean queries, e.g. NumericRange, fall through to the lucene parser. To use, add to solrconfig.xml: {code:xml} <queryParser name="maxscore" class="solr.MaxScoreQParserPlugin"/> {code} Then use it in a query: {noformat} q=A AND B AND {!maxscore v=$max}&max=C OR (D AND E) {noformat} This will return the score of A+B+max(C,sum(D,E)).
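The scoring arithmetic described above can be illustrated with a standalone Java sketch, with plain doubles standing in for clause scores. This is only an illustration of the max-vs-sum combination, not the Solr plugin itself:

```java
import java.util.Arrays;

// Standalone sketch of the scoring arithmetic described above (not the
// actual plugin): a default BooleanQuery sums clause scores, while the
// plugin's DisjunctionMaxQuery with tie=1.0 takes the maximum of the
// SHOULD clause scores instead.
public class MaxScoreSketch {

  static double sum(double... clauseScores) {
    return Arrays.stream(clauseScores).sum();
  }

  static double max(double... clauseScores) {
    return Arrays.stream(clauseScores).max().orElse(0.0);
  }

  public static void main(String[] args) {
    double a = 1.0, b = 2.0, c = 5.0, d = 1.5, e = 2.5;
    // q=A AND B AND {!maxscore v=$max} with max=C OR (D AND E):
    // MUST clauses A and B are passed through and summed as usual,
    // while the {!maxscore} sub-query contributes max(C, D+E).
    double score = a + b + max(c, sum(d, e));
    System.out.println(score); // 1 + 2 + max(5, 4) = 8.0
  }
}
```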
[jira] [Commented] (LUCENE-949) AnalyzingQueryParser can't work with leading wildcards.
[ https://issues.apache.org/jira/browse/LUCENE-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649670#comment-13649670 ] Tim Allison commented on LUCENE-949: Steve, no problem on the delay. Thank you for your help! The changes sound great. Thank you. AnalyzingQueryParser can't work with leading wildcards. --- Key: LUCENE-949 URL: https://issues.apache.org/jira/browse/LUCENE-949 Project: Lucene - Core Issue Type: Bug Components: core/queryparser Affects Versions: 2.2 Reporter: Stefan Klein Attachments: LUCENE-949.patch, LUCENE-949.patch, LUCENE-949.patch The getWildcardQuery method in AnalyzingQueryParser.java needs the following changes to accept leading wildcards:

protected Query getWildcardQuery(String field, String termStr) throws ParseException {
  String useTermStr = termStr;
  String leadingWildcard = null;
  if ("*".equals(field)) {
    if ("*".equals(useTermStr)) return new MatchAllDocsQuery();
  }
  boolean hasLeadingWildcard = (useTermStr.startsWith("*") || useTermStr.startsWith("?")) ? true : false;
  if (!getAllowLeadingWildcard() && hasLeadingWildcard)
    throw new ParseException("'*' or '?' not allowed as first character in WildcardQuery");
  if (getLowercaseExpandedTerms()) {
    useTermStr = useTermStr.toLowerCase();
  }
  if (hasLeadingWildcard) {
    leadingWildcard = useTermStr.substring(0, 1);
    useTermStr = useTermStr.substring(1);
  }
  List tlist = new ArrayList();
  List wlist = new ArrayList();
  /*
   * somewhat a hack: find/store wildcard chars in order to put them back
   * after analyzing
   */
  boolean isWithinToken = (!useTermStr.startsWith("?") && !useTermStr.startsWith("*"));
  isWithinToken = true;
  StringBuffer tmpBuffer = new StringBuffer();
  char[] chars = useTermStr.toCharArray();
  for (int i = 0; i < useTermStr.length(); i++) {
    if (chars[i] == '?' || chars[i] == '*') {
      if (isWithinToken) {
        tlist.add(tmpBuffer.toString());
        tmpBuffer.setLength(0);
      }
      isWithinToken = false;
    } else {
      if (!isWithinToken) {
        wlist.add(tmpBuffer.toString());
        tmpBuffer.setLength(0);
      }
      isWithinToken = true;
    }
    tmpBuffer.append(chars[i]);
  }
  if (isWithinToken) {
    tlist.add(tmpBuffer.toString());
  } else {
    wlist.add(tmpBuffer.toString());
  }
  // get Analyzer from superclass and tokenize the term
  TokenStream source = getAnalyzer().tokenStream(field, new StringReader(useTermStr));
  org.apache.lucene.analysis.Token t;
  int countTokens = 0;
  while (true) {
    try {
      t = source.next();
    } catch (IOException e) {
      t = null;
    }
    if (t == null) {
      break;
    }
    if (!"".equals(t.termText())) {
      try {
        tlist.set(countTokens++, t.termText());
      } catch (IndexOutOfBoundsException ioobe) {
        countTokens = -1;
      }
    }
  }
  try {
    source.close();
  } catch (IOException e) {
    // ignore
[jira] [Commented] (LUCENE-4975) Add Replication module to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649674#comment-13649674 ] Shai Erera commented on LUCENE-4975: Ok, so there are 3 options I see: (1) have Replicator depend on Facet (and in the future on other modules), (2) have Facet depend on Replicator, and (3) move Revision and ReplicationHandler (interfaces) someplace else, core or a new module we call 'commons', and have Replicator and Facet depend on it. Tests will still need to depend on replicator, though, since they need ReplicationClient. BTW, the jetty dependencies are test-only, but I don't know how to make ivy resolve the dependencies just for tests. The only things replicator depends on are servlet-api, for ReplicationService, and httpclient, for ReplicationClient. I think these need to remain in the module ... If we made Facet depend on Replicator (I'm not totally against it), would that require you to have lucene-replicator.jar on the classpath, even if you don't use replication? If not, then perhaps this dependency isn't so bad ... it's just a compile-time dependency. Tests will still need to depend on replicator at runtime, but that's ok I think.
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649692#comment-13649692 ] Joel Bernstein commented on SOLR-4787: -- Thanks David! Yeah, agreed the BSearch class is not ideal. I'll have a look at the SorterTemplate and get the integers sorted in place. Join Contrib Key: SOLR-4787 URL: https://issues.apache.org/jira/browse/SOLR-4787 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.2.1 Reporter: Joel Bernstein Priority: Minor Fix For: 4.2.1 Attachments: SOLR-4787.patch This contrib provides a place where different join implementations can be contributed to Solr. It currently includes 2 join implementations. The initial patch was generated from the Solr 4.2.1 tag. Because of changes in the FieldCache API, this patch will only build with Solr 4.2 or above. *PostFilterJoinQParserPlugin aka pjoin* The pjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar in functionality to the JoinQParserPlugin, but the implementation differs in a couple of important ways. The first is that the pjoin is designed to work with integer join keys only. So, in order to use pjoin, integer join keys must be included in both the "to" and "from" cores. The second difference is that the pjoin builds memory structures that are used to quickly connect the join keys. It also uses a custom SolrCache named "join" to hold intermediate DocSets which are needed to build the join memory structures. So, the pjoin will need more memory than the JoinQParserPlugin to perform the join. The main advantage of the pjoin is that it can scale to join millions of keys between cores. Because it's a PostFilter, it only needs to join records that match the main query. The syntax of the pjoin is the same as the JoinQParserPlugin's, except that the plugin is referenced by the string "pjoin" rather than "join". fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1 The example filter query above will search the fromCore (collection2) for user:customer1. This query will generate a list of values from the "from" field that will be used to filter the main query. Only records from the main query where the "to" field is present in the "from" list will be included in the results. The solrconfig.xml in the main query core must contain the reference to the pjoin: <queryParser name="pjoin" class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/> And the join contrib jars must be registered in the solrconfig.xml: <lib dir="../../../dist/" regex="solr-joins-\d.*\.jar" /> The solrconfig.xml in the from core must have the "join" SolrCache configured: <cache name="join" class="solr.LRUCache" size="4096" initialSize="1024" /> *JoinValueSourceParserPlugin aka vjoin* The second implementation is the JoinValueSourceParserPlugin aka vjoin. This implements a ValueSource function query that can return values from a second core based on join keys. This allows relevance data to be stored in a separate core and then joined in the main query. The vjoin is called using the vjoin function query. For example: bf=vjoin(fromCore, fromKey, fromVal, toKey) This example shows vjoin being called by the edismax boost function parameter. It will return the fromVal from the fromCore. The fromKey and toKey are used to link the records from the main query to the records in the fromCore. As with the pjoin, both the fromKey and toKey must be integers. Also like the pjoin, the "join" SolrCache is used to hold the join memory structures. To configure the vjoin you must register the ValueSource plugin in the solrconfig.xml as follows: <valueSourceParser name="vjoin" class="org.apache.solr.joins.JoinValueSourceParserPlugin" />
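The pjoin's filtering semantics described above can be sketched in plain Java: the search on the "from" core yields a set of integer join keys, and a document from the main query survives the post filter only if its "to" key is in that set. This is a standalone illustration with made-up doc/key pairs, not the contrib code:

```java
import java.util.*;

// Standalone sketch of the pjoin's filtering semantics (not the contrib
// implementation): keys collected from the "from" core act as a membership
// filter over the "to" keys of documents matching the main query.
public class PjoinSketch {

  // Each doc is modeled as {docId, toKey}; keep it only if its toKey
  // appears in the set of keys produced by the from-core query.
  static List<int[]> filterByJoinKeys(List<int[]> mainQueryDocs, Set<Integer> fromKeys) {
    List<int[]> kept = new ArrayList<>();
    for (int[] doc : mainQueryDocs) {
      if (fromKeys.contains(doc[1])) {
        kept.add(doc);
      }
    }
    return kept;
  }

  // Hypothetical demo: keys 7 and 42 come back from the from core.
  static int demoKeptCount() {
    Set<Integer> fromKeys = new HashSet<>(Arrays.asList(7, 42));
    List<int[]> mainDocs = Arrays.asList(new int[]{0, 7}, new int[]{1, 13}, new int[]{2, 42});
    return filterByJoinKeys(mainDocs, fromKeys).size();
  }

  public static void main(String[] args) {
    System.out.println(demoKeptCount()); // 2 of the 3 main-query docs survive
  }
}
```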
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649695#comment-13649695 ] Adrien Grand commented on SOLR-4787: Hi Joel. {{SorterTemplate}} has just been refactored into {{org.apache.lucene.util.Sorter}} (LUCENE-4946). You can have a look at Passage.sort() (https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/highlighter/src/java/org/apache/lucene/search/postingshighlight/Passage.java) to see how to use it to sort parallel arrays.
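The parallel-array sorting technique referenced above can be sketched in plain Java: comparisons read only the key array, but every swap is applied to both arrays, keeping them aligned without allocating intermediate objects. This is a standalone illustration (selection sort for brevity), not Lucene's Sorter itself:

```java
import java.util.Arrays;

// Standalone sketch of in-place parallel-array sorting, the technique
// Lucene's Sorter/SorterTemplate enables by abstracting compare(i, j)
// and swap(i, j): any sort algorithm can drive the two callbacks, and
// swap keeps every parallel array in sync.
public class ParallelArraySort {

  static void sortParallel(int[] keys, int[] values) {
    // Selection sort for clarity; the real Sorter supplies faster algorithms.
    for (int i = 0; i < keys.length; i++) {
      int min = i;
      for (int j = i + 1; j < keys.length; j++) {
        if (keys[j] < keys[min]) min = j;
      }
      swap(keys, i, min);
      swap(values, i, min);  // mirror the swap so values stay aligned with keys
    }
  }

  private static void swap(int[] a, int i, int j) {
    int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
  }

  public static void main(String[] args) {
    int[] keys = {3, 1, 2};
    int[] values = {30, 10, 20};
    sortParallel(keys, values);
    System.out.println(Arrays.toString(keys) + " " + Arrays.toString(values));
  }
}
```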
[jira] [Commented] (LUCENE-4982) Make MockIndexOutputWrapper check disk full on copyBytes
[ https://issues.apache.org/jira/browse/LUCENE-4982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649702#comment-13649702 ] Robert Muir commented on LUCENE-4982: - It's not clear to me whether, with the patch, we will double-count against disk full if copyBytes calls writeBytes behind the scenes... Maybe we can make the test have a max size of 2 bytes and copyBytes twice to it, just so this is obvious?
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649704#comment-13649704 ] Joel Bernstein commented on SOLR-4787: -- Hi Adrien, thanks for the information. I'll take a look at the Sorter today.
[jira] [Commented] (LUCENE-4982) Make MockIndexOutputWrapper check disk full on copyBytes
[ https://issues.apache.org/jira/browse/LUCENE-4982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649711#comment-13649711 ] Shai Erera commented on LUCENE-4982: I can modify the test, sure. But the problem is that copyBytes doesn't call writeBytes, otherwise I would have tripped it. I.e., we call delegate.copyBytes, which internally may call *its* writeBytes, but not MockIO.writeBytes.
[jira] [Updated] (LUCENE-4982) Make MockIndexOutputWrapper check disk full on copyBytes
[ https://issues.apache.org/jira/browse/LUCENE-4982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-4982: --- Attachment: LUCENE-4982.patch I modified the test to set maxSize=2 and then write 2 bytes in two calls. The first should succeed and the second fail. However, even the first fails, and now I don't know if it's a bug in the test or in MockIO.checkDiskFull(). The latter (a copy of the original code) does {{freeSpace <= len}} -- is this ok? I mean, if I have room for 2 bytes and the caller asks to write 2 bytes, should we really fail on diskFull?
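The boundary question raised above can be shown with a standalone sketch (not the real MockIndexOutputWrapper): a check that fails when the remaining space merely equals the write length rejects a write that exactly fits, while a strict comparison only rejects a genuine overflow:

```java
// Standalone sketch of the off-by-one question discussed above (not the
// actual MockIndexOutputWrapper code): with maxSize=2 and a 2-byte write,
// a "less than or equal" check trips disk-full even though the bytes fit,
// while a strict "less than" check lets the exactly-fitting write through.
public class DiskFullSketch {

  static boolean failsWithLessOrEqual(long freeSpace, long len) {
    return freeSpace <= len;   // trips disk-full even when len fits exactly
  }

  static boolean failsWithStrictLess(long freeSpace, long len) {
    return freeSpace < len;    // trips disk-full only on a real overflow
  }

  public static void main(String[] args) {
    // maxSize=2, nothing written yet, caller writes 2 bytes:
    System.out.println(failsWithLessOrEqual(2, 2)); // true  -> first write already fails
    System.out.println(failsWithStrictLess(2, 2));  // false -> first write succeeds
  }
}
```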
[ANNOUNCE] Apache Lucene 4.3 released
May 2013, Apache Lucene™ 4.3 available

The Lucene PMC is pleased to announce the release of Apache Lucene 4.3. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform. This release contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below. The release is available for immediate download at: http://lucene.apache.org/core/mirrors-core-latest-redir.html See the CHANGES.txt file included with the release for a full list of details.

Lucene 4.3 Release Highlights:

* Significant performance improvements for minShouldMatch BooleanQuery due to skipping, resulting in up to 4000% faster queries.
* A new SortingAtomicReader which allows sorting an index by a sort criterion (e.g. a numeric DocValues field), as well as SortingMergePolicy which sorts documents before segments are merged.
* DocIdSetIterator and Scorer now have a cost API that provides an upper bound on the number of documents the iterator might match. This API allows optimizations during query execution, such as in how filters are applied.
* Analyzing/FuzzySuggester now allow recording an arbitrary byte[] as a payload. The suggesters also use an ending offset to determine whether the last token was finished or not, so that a query "i" will no longer suggest "Isla de Muerta", for example.
* The Lucene spatial module can now search for indexed shapes by Within, Contains, and Disjoint relationships, in addition to the typical Intersects.
* PostingsHighlighter now allows custom passage scores and per-field BreakIterators, and has been detached from TopDocs. Additionally, subclasses can override where string values for highlighting are pulled from, as an alternative to stored fields.
* A new SearcherTaxonomyManager manages near-real-time reopens of both IndexSearcher and TaxonomyReader (for faceting).
* Added a new facet method to the facet module to compute facet counts using SortedSetDocValuesField, without a separate taxonomy index.
* The DrillSideways class, for computing sideways facet counts, is now more flexible: it allows more than one FacetRequest per dimension and now allows drilling down on dimensions that do not have a facet request.
* Various bugfixes and optimizations since the 4.2.1 release.

Please read CHANGES.txt for a full list of new features. Please report any feedback to the mailing lists (http://lucene.apache.org/core/discussion.html). Note: The Apache Software Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also goes for Maven access. Happy searching, Lucene/Solr developers
[jira] [Commented] (LUCENE-4975) Add Replication module to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649727#comment-13649727 ] Adrien Grand commented on LUCENE-4975: -- bq. Then maybe we could have sub-modules for specific replication strategies? To make my point a little clearer, I was suggesting something pretty much like the analysis module: analyzers that require additional dependencies (such as icu or morfologik) are in their own sub-module so that you don't need to pull in the ICU or Morfologik JARs if you just want to use LetterTokenizer (which is in lucene/analysis/common). Likewise, we could have the interface and the logic to replicate simple (no sidecar data) indexes in lucene/replicator/common, and have sub-modules for facets (lucene/replicator/facet) or suggesters (lucene/replicator/suggesters). This may look like overkill, but at least it would help us keep dependencies clean between modules.
[ANNOUNCE] Apache Solr 4.3 released
May 2013, Apache Solr™ 4.3 available The Lucene PMC is pleased to announce the release of Apache Solr 4.3. Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing fault tolerant distributed search and indexing, and powers the search and navigation features of many of the world's largest internet sites. Solr 4.3 is available for immediate download at: http://lucene.apache.org/solr/mirrors-solr-latest-redir.html See the CHANGES.txt file included with the release for a full list of details. Solr 4.3.0 Release Highlights: * Tired of maintaining core information in solr.xml? Now you can configure Solr to automatically find cores by walking an arbitrary directory. * Shard Splitting: You can now split SolrCloud shards to expand your cluster as you grow. * The read side of the schema REST API has been improved and expanded upon: all schema information is now available, and the full live schema can now be returned in JSON or XML. Groundwork is included for the upcoming write side of the schema REST API. * Spatial queries can now search for indexed shapes by IsWithin, Contains and IsDisjointTo relationships, in addition to the typical Intersects. * Faceting now supports local parameters for faceting on the same field with different options. * Significant performance improvements for minShouldMatch (mm) queries due to skipping, resulting in up to 4000% faster queries. * Various new highlighting configuration parameters. * A new solr.xml format that is closer to that of solrconfig.xml. The example still uses the old format, but 4.4 will ship with the new format. * Lucene 4.3.0 bug fixes and optimizations. Solr 4.3.0 also includes many other new features as well as numerous optimizations and bugfixes.
Please report any feedback to the mailing lists (http://lucene.apache.org/solr/discussion.html) Happy searching, Lucene/Solr developers
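The core auto-discovery highlight above can be illustrated with a small config sketch. This is a hedged illustration of the discovery-based layout being worked on for the 4.x line, not the final format — the property names below are assumptions and should be checked against CHANGES.txt and the wiki:

```
# core.properties — a hypothetical marker file dropped anywhere under the
# directory Solr walks at startup; its presence marks this directory as a core
name=collection1
loadOnStartup=true
transient=false
```

With discovery enabled, the directory containing this file plays the role the `<core>` entry in the old-style solr.xml used to play.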
[jira] [Commented] (LUCENE-4975) Add Replication module to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649731#comment-13649731 ] Shai Erera commented on LUCENE-4975: I think that's not a bad idea! replicator/common will include the interfaces (Revision and ReplicationHandler) + the framework impl and also IndexRevision/Handler. replicator/facet will include the taxonomy parts and depend on replicator/common and facet. I can also move the facet related code under oal.replicator.facet and then suppress the Lucene3x codec for just these tests. If others agree, I'll make the changes (mostly build.xml changes).
[jira] [Commented] (SOLR-4662) Finalize what we're going to do with solr.xml, auto-discovery, config sets.
[ https://issues.apache.org/jira/browse/SOLR-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649750#comment-13649750 ] Jan Høydahl commented on SOLR-4662: --- Where did the sharedLib stuff go in the new solr.xml? Will it work with {{<str name="sharedLib">lib</str>}}? This should be documented in XML comments. Finalize what we're going to do with solr.xml, auto-discovery, config sets. --- Key: SOLR-4662 URL: https://issues.apache.org/jira/browse/SOLR-4662 Project: Solr Issue Type: Improvement Affects Versions: 4.3, 5.0 Reporter: Erick Erickson Assignee: Mark Miller Priority: Blocker Fix For: 4.3, 5.0 Attachments: SOLR-4662.patch, SOLR-4662.patch, SOLR-4662.patch, SOLR-4662.patch, SOLR-4662.patch, SOLR-4662.patch Spinoff from SOLR-4615, breaking it out here so we can address the changes in pieces.
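For reference, here is a sketch of how sharedLib might look in the new-style solr.xml, assuming the new format keeps the solrconfig.xml-like name/value nodes described in the 4.3 release notes. This is an illustration of the question being asked, not a confirmed answer — the placement of the element is an assumption:

```xml
<solr>
  <!-- hypothetical placement: a lib directory shared by all cores -->
  <str name="sharedLib">lib</str>
  <solrcloud>
    <int name="hostPort">${jetty.port:8983}</int>
  </solrcloud>
</solr>
```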
RE: [ANNOUNCE] Apache Lucene 4.3 released
Congratulations! - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Simon Willnauer [mailto:simon.willna...@gmail.com] Sent: Monday, May 06, 2013 3:08 PM To: dev@lucene.apache.org; java-user; gene...@lucene.apache.org; annou...@apache.org Subject: [ANNOUNCE] Apache Lucene 4.3 released May 2013, Apache Lucene™ 4.3 available The Lucene PMC is pleased to announce the release of Apache Lucene 4.3. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform. This release contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below. The release is available for immediate download at: http://lucene.apache.org/core/mirrors-core-latest-redir.html See the CHANGES.txt file included with the release for a full list of details. Lucene 4.3 Release Highlights: * Significant performance improvements for minShouldMatch BooleanQuery due to skipping, resulting in up to 4000% faster queries. * A new SortingAtomicReader which allows sorting an index based on a sort criterion (e.g. a numeric DocValues field), as well as SortingMergePolicy which sorts documents before segments are merged. * DocIdSetIterator and Scorer now have a cost API that provides an upper bound on the number of documents the iterator might match. This API allows optimizations during query execution, for example in how filters are applied. * Analyzing/FuzzySuggester now allow recording an arbitrary byte[] as a payload. The suggesters also use an ending offset to determine whether the last token was finished or not, so that a query "i" will no longer suggest "Isla de Muerta", for example. * The Lucene Spatial Module can now search for indexed shapes by Within, Contains, and Disjoint relationships, in addition to the typical Intersects.
* PostingsHighlighter now allows custom passage scores and per-field BreakIterators, and has been detached from TopDocs. Additionally, subclasses can override where string values for highlighting are pulled from, as an alternative to stored fields. * New SearcherTaxonomyManager manages near-real-time reopens of both IndexSearcher and TaxonomyReader (for faceting). * Added a new facet method to the facet module to compute facet counts using SortedSetDocValuesField, without a separate taxonomy index. * The DrillSideways class, for computing sideways facet counts, is now more flexible: it allows more than one FacetRequest per dimension and now allows drilling down on dimensions that do not have a facet request. * Various bugfixes and optimizations since the 4.2.1 release. Please read CHANGES.txt for a full list of new features. Please report any feedback to the mailing lists (http://lucene.apache.org/core/discussion.html) Happy searching, Lucene/Solr developers
Re: VOTE: solr no longer webapp
* Shouldn't we be able to plug and play the underlying HTTP layer technology? * Shouldn't we be able to try embedded Jetty and its nice integration with Guice + Restlet? Check out using Netty? +1
[jira] [Updated] (LUCENE-4982) Make MockIndexOutputWrapper check disk full on copyBytes
[ https://issues.apache.org/jira/browse/LUCENE-4982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-4982: --- Attachment: LUCENE-4982.patch I changed the check to {{freeSpace < len}}, but then the test failed to trip disk-full the second time, unless I call out.flush() in between. Debugging tells me that RAMOutputStream sets RAMFile.length only on flush(), therefore even if I attempt to write a 2K byte[] (with maxSize=2), the test doesn't fail. Seems like getRecomputedActualSizeInBytes is not very useful: if the Dir is not a RAMDir, it just calls sizeInBytes(), which computes the size from the file system, and if it is, then RAMFile.length isn't up to date, leading to an incorrect (0) size being computed unless some files were already flushed. But getRecomputed cannot flush the streams either in that case ... So I think I'll leave the test like that. A real test that wants to trip on disk-full will usually involve indexing, hence files will be flushed and recomputed will return some number - not really the actual number of bytes used, but some number. Make MockIndexOutputWrapper check disk full on copyBytes Key: LUCENE-4982 URL: https://issues.apache.org/jira/browse/LUCENE-4982 Project: Lucene - Core Issue Type: Improvement Components: modules/test-framework Reporter: Shai Erera Assignee: Shai Erera Attachments: LUCENE-4982.patch, LUCENE-4982.patch, LUCENE-4982.patch While working on the consistency test for Replicator (LUCENE-4975), I noticed that I don't trip disk-full exceptions and tracked it down to MockIndexOutputWrapper.copyBytes not doing these checks like writeBytes does. I'd like to add this check.
[jira] [Comment Edited] (LUCENE-4982) Make MockIndexOutputWrapper check disk full on copyBytes
[ https://issues.apache.org/jira/browse/LUCENE-4982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649784#comment-13649784 ] Shai Erera edited comment on LUCENE-4982 at 5/6/13 3:13 PM: I changed the check to {{freeSpace < len}}, but then the test failed to trip disk-full the second time, unless I call out.flush() in between. Debugging tells me that RAMOutputStream sets RAMFile.length only on flush(), therefore even if I attempt to write a 2K byte[] (with maxSize=2), the test doesn't fail. Seems like getRecomputedActualSizeInBytes is not very accurate. It only returns the size of the flushed files (even for FSDir). This may be ok, dunno. It just felt wrong for RAMDirectory, since there is no real buffering happening. Anyway, I guess we'll have to live with that. Disk-full is a best effort anyway, so in this test I'll just call flush(). In real tests that want to trip disk-full, indexing usually happens, therefore files get flushed and the size measure is closer.
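The flush-dependent behavior Shai describes can be reproduced in miniature with plain Java. This is a hedged, standalone sketch — none of these classes are the real Lucene test-framework classes; they only model the described mechanism (a buffered output whose recorded file length is advanced only on flush, so a {{freeSpace < len}} check misses still-buffered bytes):

```java
import java.util.ArrayList;
import java.util.List;

public class DiskFullSketch {

    // stands in for RAMFile: its length is only published when the stream flushes
    static final class FakeFile { long length = 0; }

    // stands in for RAMOutputStream: writes are buffered, flush() publishes length
    static final class FakeOutput {
        private final FakeFile file;
        private final List<byte[]> buffered = new ArrayList<>();
        FakeOutput(FakeFile file) { this.file = file; }
        void writeBytes(byte[] b) { buffered.add(b); }
        void flush() {
            for (byte[] b : buffered) file.length += b.length;
            buffered.clear();
        }
    }

    // stands in for the mock-directory check: it can only see recorded (flushed) sizes
    static boolean tripsDiskFull(long maxSize, long recordedBytes, int len) {
        long freeSpace = maxSize - recordedBytes;
        return freeSpace < len;
    }

    public static void main(String[] args) {
        FakeFile file = new FakeFile();
        FakeOutput out = new FakeOutput(file);
        out.writeBytes(new byte[2048]);  // 2K written, but only buffered

        // maxSize=2: the 2048 buffered bytes are invisible to the check,
        // so a subsequent 1-byte write does not trip disk-full
        System.out.println("trips before flush: " + tripsDiskFull(2, file.length, 1));

        out.flush();  // now file.length == 2048

        // after flushing, recorded usage exceeds maxSize and the check trips
        System.out.println("trips after flush: " + tripsDiskFull(2, file.length, 1));
    }
}
```

This matches the comment's observation that inserting an out.flush() between writes is what lets the second write trip disk-full.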
[jira] [Updated] (LUCENE-4981) Deprecate PositionFilter
[ https://issues.apache.org/jira/browse/LUCENE-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand updated LUCENE-4981: - Attachment: LUCENE-4981.patch Here is the patch for 4.x. The patch for trunk is simpler, as PositionFilter and PositionFilterFactory would simply be removed. Deprecate PositionFilter Key: LUCENE-4981 URL: https://issues.apache.org/jira/browse/LUCENE-4981 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Attachments: LUCENE-4981.patch According to the documentation (http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PositionFilterFactory), PositionFilter is mainly useful to make query parsers generate boolean queries instead of phrase queries, although this problem can be solved at the query parsing level instead of the analysis level (e.g. using QueryParser.setAutoGeneratePhraseQueries). So given that PositionFilter corrupts token graphs (see TestRandomChains), I propose to deprecate it.
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649833#comment-13649833 ] Christian Moen commented on LUCENE-4956: bq. I think we're ready for the incubator-general vote. [~cm], do you agree? +1 the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: kr.analyzer.4x.tar The Korean language has specific characteristics. When developing a search service with Lucene/Solr in Korean, there are some problems in searching and indexing. The Korean analyzer solves these problems with a Korean morphological analyzer. It consists of a Korean morphological analyzer, dictionaries, a Korean tokenizer and a Korean filter. The Korean analyzer is made for Lucene and Solr. If you develop a search service with Lucene in Korean, the best choice is the Korean analyzer.
[jira] [Commented] (SOLR-3240) add spellcheck 'approximate collation count' mode
[ https://issues.apache.org/jira/browse/SOLR-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649856#comment-13649856 ] Commit Tag Bot commented on SOLR-3240: -- [trunk commit] jdyer http://svn.apache.org/viewvc?view=revision&revision=1479638 SOLR-3240: add spellcheck.collateMaxCollectDocs for estimating collation hit-counts. add spellcheck 'approximate collation count' mode - Key: SOLR-3240 URL: https://issues.apache.org/jira/browse/SOLR-3240 Project: Solr Issue Type: Improvement Components: spellchecker Reporter: Robert Muir Attachments: SOLR-3240.patch, SOLR-3240.patch, SOLR-3240.patch SpellCheck's Collation in Solr is a way to ensure spellcheck/suggestions will actually net results (taking into account context like filtering). In order to do this (from my understanding), it generates candidate queries, executes them, and saves the total hit count: collation.setHits(hits). For a large index it seems this might be doing too much work; in particular, I'm interested in ensuring this feature can work fast enough/well for autosuggesters. So I think we should offer an 'approximate' mode that uses an early-terminating Collector, collect()ing only N docs (e.g. n=1), and approximates the result count based on docid space. I'm not sure what needs to happen on the Solr side (possibly support for custom collectors?), but I think this could help and should possibly be the default.
[jira] [Assigned] (SOLR-3240) add spellcheck 'approximate collation count' mode
[ https://issues.apache.org/jira/browse/SOLR-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer reassigned SOLR-3240: Assignee: James Dyer
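The extrapolation idea behind this issue — collect only the first few hits, then estimate the total from how far into the docid space collection reached, assuming hits are roughly uniform — can be sketched in a few lines of plain Java. The names and the exact formula below are illustrative, not the actual Solr implementation:

```java
public class CollationEstimate {

    // estimate total hits from an early-terminated collection:
    // `collected` hits were seen, the last at `lastCollectedDocId`,
    // out of `maxDoc` documents in the index
    static long estimateHits(int collected, int lastCollectedDocId, int maxDoc) {
        if (collected == 0) return 0;
        // fraction of the docid space scanned when collection stopped
        double scanned = (lastCollectedDocId + 1) / (double) maxDoc;
        return Math.round(collected / scanned);
    }

    public static void main(String[] args) {
        // e.g. early termination after 10 hits, the 10th at docid 999,
        // in a 1,000,000-document index
        System.out.println(estimateHits(10, 999, 1_000_000));
    }
}
```

The estimate trades accuracy for speed: with n=1 it costs almost nothing but is noisy, which is the tradeoff the spellcheck.collateMaxCollectDocs parameter exposes.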
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649857#comment-13649857 ] Jack Krupansky commented on LUCENE-4956: I am not really familiar with the incubator-general vote. From looking at the legal clearance page, it sounds like the vote is simply about accepting the donation, as opposed to voting that the branch is ready to commit to trunk - correct? I did a Jira search and found no previous references to an incubator-general vote; from a Google search I got the impression it was more related to podlings than to simple code module contributions.
A couple of high level Solr issues
A solr-user list discussion led to some general thoughts about Solr that I think need some further discussion. I'm ready to open an issue, I just thought it might be better to define a direction first. Some of the option/attribute names in config files aren't self-descriptive, or have become not quite correct due to Solr's evolution. One example is instanceDir but there are probably others. I'm not sure that instanceDir was ever a good name. It probably made complete sense when multicore first came into being, as it was meant to replace multiple instances of Solr. Coming up with a better name is a little tricky. A simple and currently relevant replacement would be coreDir ... but if you're using SolrCloud, Jack Krupansky has put forth some names that might be better: replicaDir, shardReplicaDir, or even the wordy but extremely accurate collectionShardReplicaDir. In recent years, Solr has evolved from multicore-capable to multicore in even the simple example. If we have a similar migration so that all Solr installations are SolrCloud installations, then having replicas instead of cores (replacing instanceDir with replicaDir) might be the right way to go. Will we ever have a larger abstraction than collections? If we think that could ever happen, we should probably think of a name for it. I think that we need to start a general overhaul of various identifiers in config files and APIs, planning ahead to accommodate future (in)sanity. Because it could be extremely disruptive to ongoing development, that probably needs to happen in a branch. Thanks, Shawn
Re: Including JTS in an Apache project
Thanks Dave for the detailed response. As I understand, Spatial4j is a separate project that acts as a plugin to Solr and Lucene. However, it is still licensed under the Apache license. Does including it as-is inside Lucene or Solr break the Apache license, or do you have it as a separate project for another reason? I took a look at Spatial4j and it looks very nice. I like the idea that you define your own interface and use JTS as another implementation for that interface. However, I don't think I will be able to use it in Pig for one reason. As per their website, JTS conforms with the OGC standard for SQL [http://www.opengeospatial.org/standards]. It is important to follow the OGC for the addition I'm proposing to Pig as it makes it more acceptable in the GIS community which I'm targeting. I talked with people from different industrial and research organizations and they all said that they can only use it if it conforms with OGC standards, just as JTS and PostGIS [http://postgis.net/] do. What I can do for now is to make this extension a separate open source project under the Apache license. However, as far as I understand, I cannot merge this extension with Apache Pig unless we resolve the license issue. Thanks Ahmed Best regards, Ahmed Eldawy On Sun, May 5, 2013 at 11:38 PM, David Smiley (@MITRE.org) dsmi...@mitre.org wrote: Hi Ahmed, I faced your conundrum with JTS early last year. As you know, the Apache Software Foundation doesn't like its projects depending on GPL and even LGPL licensed libraries. The ASF does not have clear, unambiguous language on how its projects can depend on them in a limited sense. Different PMCs (projects) have different standards. I've heard of one project (CXF?) that uses Java reflection to use an LGPL library. I think another downloads the LGPL library as part of the build, and then the code has a compile-time dependency (I could be mistaken).
If memory serves, in both cases the dependency fit an optional role and not a core purpose of the software. The Lucene PMC in particular didn't formally vote to my knowledge, but there was a time when it was clear to me that such approaches were not acceptable. The approach that the Lucene spatial developers took (me, Ryan, Chris) was to create a non-ASF project called Spatial4j that is ASL licensed. Spatial4j *optionally* depends on JTS -- it's only for advanced shapes (namely polygons) and for WKT parsing. https://github.com/spatial4j/spatial4j BTW, WKT parsing will be handled by Spatial4j itself in the near future, without JTS. Spatial4j is not a subset of JTS; it critically has things JTS doesn't, like a native circle (not a polygon approximation) and the concept of the world being a sphere instead of flat ;-) That's right: JTS, as critical as it is in the world of open-source spatial, doesn't have any geodetic calculations, just Euclidean. Spatial4j adds dateline wrap support to JTS shapes so you can represent Fiji for example, but not yet Antarctica (no pole wrap). So I encourage the Apache Pig project to take a look at using Spatial4j instead of directly using JTS, for the same reasons that the Lucene project uses it. If you ultimately decide not to, then please let me know why, as I see Spatial4j being an excellent fit for ASF projects in particular because of the licensing issue. So your statement "Apache Solr *uses* JTS" is incorrect. No it doesn't, and nor does Lucene; not at all. Instead, those projects use Spatial4j, which has an abstraction (Shape), and it has an implementation of that abstraction that depends on JTS. It also has implementations that don't depend on JTS. p.s. Last week I did a long presentation on Spatial in Lucene/Solr/Spatial4j and I'd be happy to share the slides with you. The organizers will post them, but they haven't yet.
~ David Smiley Ahmed El-dawy wrote Hi all, I saw that Apache Solr uses JTS (Java Topology Suite) [ http://www.vividsolutions.com/jts/JTSHome.htm] for supporting a spatial data type [http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4]. Using JTS in an Apache project is not a straightforward thing, as JTS is licensed under the LGPL, which has some compatibility issues when included in an Apache project. Now, I need to do something very similar in another Apache project (Pig [http://pig.apache.org/]) and I'm faced with the licensing issue. I'm asking for your advice on the best way to use JTS without breaking the license. Does referring to JTS classes from the code of an Apache project, without actually including the classes, violate the license? Do we have to load the classes dynamically (using Class#forName), or is there another way to do it? Thanks in advance Best regards, Ahmed Eldawy - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
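The reflection-based approach mentioned in this thread can be sketched in plain Java: probe for the LGPL library's class at runtime and degrade gracefully, instead of taking a compile-time dependency. The JTS class name below is real, but whether this pattern satisfies ASF policy is exactly the open question being discussed — treat this as an illustration, not guidance:

```java
public class OptionalJts {

    // returns true only if the named class can be loaded from the classpath
    static boolean classPresent(String name) {
        try {
            Class.forName(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // enable advanced shapes only when the user has put the JTS jar
        // on the classpath themselves
        if (classPresent("com.vividsolutions.jts.geom.Geometry")) {
            System.out.println("JTS on classpath: advanced shapes enabled");
        } else {
            System.out.println("JTS not on classpath: falling back to basic shapes");
        }
    }
}
```

This is essentially how Spatial4j's optional JTS dependency behaves from a user's point of view: the core works without the jar, and the JTS-backed implementation only activates when the jar is present.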
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649868#comment-13649868 ] Robert Muir commented on LUCENE-4956: - Jack, that's correct. It is a vote for IP clearance. For example, Simon called an IP clearance vote on the incubator list for Kuromoji before we integrated it into Lucene.
[jira] [Commented] (SOLR-3240) add spellcheck 'approximate collation count' mode
[ https://issues.apache.org/jira/browse/SOLR-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649873#comment-13649873 ] Commit Tag Bot commented on SOLR-3240: -- [branch_4x commit] jdyer http://svn.apache.org/viewvc?view=revision&revision=1479644 SOLR-3240: add spellcheck.collateMaxCollectDocs for estimating collation hit-counts.
[jira] [Commented] (SOLR-3240) add spellcheck 'approximate collation count' mode
[ https://issues.apache.org/jira/browse/SOLR-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649874#comment-13649874 ] Robert Muir commented on SOLR-3240: --- Thanks for taking care of this, James: nice work.
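The "approximate this result count based on docid space" idea from the issue description can be sketched as follows. This is a hypothetical helper, not Solr's actual SpellCheckCollator code: it assumes collection stopped early after collecting a fixed number of hits, and extrapolates the total from how far into the docid space the last hit fell.

```java
// Sketch of docid-space extrapolation for an early-terminated collection.
// Hypothetical class/method names; the real feature is controlled by the
// spellcheck.collateMaxCollectDocs parameter inside Solr.
public class CollationHitEstimate {

    // collected: hits gathered before early termination
    // lastDocId: docid of the last collected hit (0-based)
    // maxDoc:    total number of documents in the index
    static long estimateHits(int collected, int lastDocId, int maxDoc) {
        if (collected == 0) {
            return 0; // nothing collected: no evidence of any matches
        }
        // hit density over the scanned prefix of the docid space,
        // extrapolated across the whole index
        return Math.round((double) collected * maxDoc / (lastDocId + 1));
    }

    public static void main(String[] args) {
        // 1 hit collected, termination at docid 99, 1,000,000-doc index:
        // roughly 1 hit per 100 docs, so ~10000 estimated matches
        System.out.println(estimateHits(1, 99, 1_000_000));
    }
}
```

The estimate is exact only if matches are uniformly distributed over docids; skewed indexes will over- or under-estimate, which is the trade-off the "approximate" mode accepts for speed.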
Re: A couple of high level Solr issues
And multicore is one of my examples of what should go away as a legacy term. It should be simply multiple collections, independent of whether it is single-node and single-shard, regardless of whether cloud/distrib is involved. Actually, multicore appears to be two distinct use cases: 1. Multiple collections. 2. Replication. -- Jack Krupansky

-Original Message- From: Shawn Heisey Sent: Monday, May 06, 2013 12:58 PM To: dev@lucene.apache.org Subject: A couple of high level Solr issues

A solr-user list discussion led to some general thoughts about Solr that I think need some further discussion. I'm ready to open an issue, I just thought it might be better to define a direction first.

Some of the option/attribute names in config files aren't self-descriptive, or have become not quite correct due to Solr's evolution. One example is instanceDir, but there are probably others. I'm not sure that instanceDir was ever a good name. It probably made complete sense when multicore first came into being, as it was meant to replace multiple instances of Solr.

Coming up with a better name is a little tricky. A simple and currently relevant replacement would be coreDir ... but if you're using SolrCloud, Jack Krupansky has put forth some names that might be better: replicaDir, shardReplicaDir, or even the wordy but extremely accurate collectionShardReplicaDir.

In recent years, Solr has evolved from multicore-capable to multicore in even the simple example. If we have a similar migration so that all Solr installations are SolrCloud installations, then having replicas instead of cores (replacing instanceDir with replicaDir) might be the right way to go.

Will we ever have a larger abstraction than collections? If we think that could ever happen, we should probably think of a name for it. I think that we need to start a general overhaul of various identifiers in config files and APIs, planning ahead to accommodate future (in)sanity.
Because it could be extremely disruptive to ongoing development, that probably needs to happen in a branch. Thanks, Shawn
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649879#comment-13649879 ] Steve Rowe commented on LUCENE-4956: Hi Jack, From [http://incubator.apache.org/ip-clearance/], which is (quoting from that page): {quote} Intellectual property clearance One of the Incubator's roles is to ensure that proper attention is paid to intellectual property. From time to time, an external codebase is brought into the ASF that is not a separate incubating project but still represents a substantial contribution that was not developed within the ASF's source control system and on our public mailing lists. This is a short form of the Incubation checklist, designed to allow code to be imported with alacrity while still providing for oversight. [...] Once a PMC directly checks in a filled-out short form, the Incubator PMC will need to approve the paperwork, after which point the receiving PMC is free to import the code. {quote} The short form referred to above is an XML template, which I've completed for this code base, and which is at some (apparently regular?) interval converted to HTML (this is also linked from the above-linked IP clearance page as Korean Analyzer): [http://incubator.apache.org/ip-clearance/lucene-korean-analyzer.html]
[jira] [Commented] (SOLR-3240) add spellcheck 'approximate collation count' mode
[ https://issues.apache.org/jira/browse/SOLR-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649882#comment-13649882 ] Commit Tag Bot commented on SOLR-3240: -- [trunk commit] jdyer http://svn.apache.org/viewvc?view=revision&revision=1479645 SOLR-3240: add spellcheck.collateMaxCollectDocs (removing dead code).
[jira] [Commented] (SOLR-3240) add spellcheck 'approximate collation count' mode
[ https://issues.apache.org/jira/browse/SOLR-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649884#comment-13649884 ] Commit Tag Bot commented on SOLR-3240: -- [branch_4x commit] jdyer http://svn.apache.org/viewvc?view=revision&revision=1479647 SOLR-3240: add spellcheck.collateMaxCollectDocs (removing dead code).
Re: A couple of high level Solr issues
On 5/6/2013 10:58 AM, Shawn Heisey wrote: A solr-user list discussion led to some general thoughts about Solr that I think need some further discussion. I'm ready to open an issue, I just thought it might be better to define a direction first. This started out as an email about two issues, but in the end I decided to only put one of them in this email, so the subject is wrong!
Re: [ANNOUNCE] Apache Lucene 4.3 released
Great! Well done... On Mon, May 6, 2013 at 10:03 AM, Uwe Schindler u...@thetaphi.de wrote: Congratulations! - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Simon Willnauer [mailto:simon.willna...@gmail.com] Sent: Monday, May 06, 2013 3:08 PM To: dev@lucene.apache.org; java-user; gene...@lucene.apache.org; annou...@apache.org Subject: [ANNOUNCE] Apache Lucene 4.3 released

May 2013, Apache Lucene™ 4.3 available. The Lucene PMC is pleased to announce the release of Apache Lucene 4.3. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform. This release contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below. The release is available for immediate download at: http://lucene.apache.org/core/mirrors-core-latest-redir.html See the CHANGES.txt file included with the release for a full list of details.

Lucene 4.3 Release Highlights:

* Significant performance improvements for minShouldMatch BooleanQuery due to skipping, resulting in up to 4000% faster queries.

* A new SortingAtomicReader which allows sorting an index based on a sort criterion (e.g. a numeric DocValues field), as well as SortingMergePolicy which sorts documents before segments are merged.

* DocIdSetIterator and Scorer now have a cost API that provides an upper bound on the number of documents the iterator might match. This API allows optimizations during query execution, such as how filters are applied.

* Analyzing/FuzzySuggester now allow recording an arbitrary byte[] as a payload. The suggesters also use an ending offset to determine whether the last token was finished or not, so that a query "i" will no longer suggest "Isla de Muerta", for example.

* The Lucene spatial module can now search for indexed shapes by Within, Contains, and Disjoint relationships, in addition to the typical Intersects.

* PostingsHighlighter now allows custom passage scores and per-field BreakIterators, and has been detached from TopDocs. Additionally, subclasses can override where string values for highlighting are pulled from, as an alternative to stored fields.

* The new SearcherTaxonomyManager manages near-real-time reopens of both IndexSearcher and TaxonomyReader (for faceting).

* Added a new facet method to the facet module to compute facet counts using SortedSetDocValuesField, without a separate taxonomy index.

* The DrillSideways class, for computing sideways facet counts, is now more flexible: it allows more than one FacetRequest per dimension and now allows drilling down on dimensions that do not have a facet request.

* Various bug fixes and optimizations since the 4.2.1 release.

Please read CHANGES.txt for a full list of new features. Please report any feedback to the mailing lists (http://lucene.apache.org/core/discussion.html). Note: The Apache Software Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also goes for Maven access. Happy searching, Lucene/Solr developers
[jira] [Commented] (SOLR-4773) New discovery mode needs to ensure that instanceDir is correct
[ https://issues.apache.org/jira/browse/SOLR-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649960#comment-13649960 ] Andy Fowler commented on SOLR-4773: --- I'm thinking this is the cause of a bug I'm seeing in the 4.3.0 release. To reproduce using the multicore example: * echo "<solr/>" > multicore/solr.xml to put it into core discovery mode * place a core.properties file in the core0/ and core1/ directories, just with loadOnStartup and transient properties defined. * start the example: `java -Dsolr.solr.home=multicore -jar start.jar` You should receive a "More than one core points to data dir 'multicore/data/'" failure on startup. Setting a relative path in each core.properties file doesn't work; it only works when I provide a discrete dataDir for each core. New discovery mode needs to ensure that instanceDir is correct -- Key: SOLR-4773 URL: https://issues.apache.org/jira/browse/SOLR-4773 Project: Solr Issue Type: Bug Components: Schema and Analysis Affects Versions: 5.0, 4.4 Reporter: Erick Erickson Assignee: Mark Miller Fix For: 5.0, 4.4 Attachments: SOLR-4773.patch, SOLR-4773.patch Doing a fresh checkout of 4.x (trunk too, I think) and firing up the example fails because we can't find solrconfig. The construction of the instanceDir in SolrCoreDiscoverer constructs a path with an extra solr (e.g. solr/solr/core). I'll attach a patch shortly.
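For reference, a minimal core.properties along the lines of the reproduction above. The layout and the property names other than loadOnStartup and transient are illustrative (the dataDir line is the per-core workaround the comment describes), not an exact copy of the 4.3 example files:

```properties
# multicore/core0/core.properties  (hypothetical layout)
name=core0
loadOnStartup=true
transient=false
# workaround for the 4.3.0 bug above: give each core its own data dir
dataDir=data
```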
[jira] [Resolved] (SOLR-3240) add spellcheck 'approximate collation count' mode
[ https://issues.apache.org/jira/browse/SOLR-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer resolved SOLR-3240. -- Resolution: Fixed Fix Version/s: 4.4, 5.0
[jira] [Updated] (LUCENE-3917) Port pruning module to trunk apis
[ https://issues.apache.org/jira/browse/LUCENE-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Bowyer updated LUCENE-3917: Attachment: LUCENE-3917-Initial-port-of-index-pruning.patch Recently at $DAYJOB the horror that is high-frequency terms in OR searches came to bite us, and as a result I have an interest in pruning again. As such I made an attempt to forward-port the existing pruning package directly to Lucene 4.0. This is largely a mechanical port; I have not put any real thought into it, so it's probably terrible. This does not pass its unit test, and the code is a mess internally. I am going to try to get the unit test working and then loop back on making the code more Lucene 4.x friendly. One question that occurs from this is how AtomicReaders are handled: do we want to prune per segment with global stats, prune based on segment stats, or just do the terrible thing and work with a SlowCompositeReader? I also think, given the work that went on with LUCENE-4752, it might be possible to do the pruning in a similar fashion to the sorting merge, such that we do a pruning merge. Port pruning module to trunk apis - Key: LUCENE-3917 URL: https://issues.apache.org/jira/browse/LUCENE-3917 Project: Lucene - Core Issue Type: Task Components: modules/other Affects Versions: 4.0-ALPHA Reporter: Robert Muir Fix For: 4.3 Attachments: LUCENE-3917-Initial-port-of-index-pruning.patch Pruning module was added in LUCENE-1812, but we need to port this to trunk (4.0).
[JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 37891 - Failure!
Build: builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64-test-only/37891/ No tests ran. Build Log: [...truncated 119 lines...]
[jira] [Commented] (LUCENE-4982) Make MockIndexOutputWrapper check disk full on copyBytes
[ https://issues.apache.org/jira/browse/LUCENE-4982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650009#comment-13650009 ] Shai Erera commented on LUCENE-4982: I thought about this some more and I realize that getComputedActualSizeInBytes works as expected. checkDiskFull should only trip if the Directory size has reached the limit, and it cannot tell how many bytes are pending in a buffer. The test would fail not only w/ RAMDirectory, but also with any Directory which buffers writes (which I believe all our directories do), and therefore flush() is important for the test. So to summarize the changes in this issue: * Added checkDiskFull to MockIOWrapper so it can trip in writeBytes and copyBytes. * Changed checkDiskFull to do {{freeSpace < len}} because {{freeSpace == len}} is still valid. * Added a test. I plan to commit this tomorrow. Make MockIndexOutputWrapper check disk full on copyBytes Key: LUCENE-4982 URL: https://issues.apache.org/jira/browse/LUCENE-4982 Project: Lucene - Core Issue Type: Improvement Components: modules/test-framework Reporter: Shai Erera Assignee: Shai Erera Attachments: LUCENE-4982.patch, LUCENE-4982.patch, LUCENE-4982.patch While working on the consistency test for Replicator (LUCENE-4975), I noticed that I don't trip disk-full exceptions, and tracked it down to MockIndexOutputWrapper.copyBytes not doing these checks like writeBytes. I'd like to add this check.
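The {{freeSpace < len}} vs. {{freeSpace == len}} point above can be sketched in isolation. This is a simplified stand-in, not the actual MockIndexOutputWrapper code (the real class tracks sizes through MockDirectoryWrapper); it just shows why a write that exactly fills the remaining space must be allowed:

```java
import java.io.IOException;

// Simplified sketch of a disk-full check; hypothetical class, not
// Lucene's test-framework implementation.
public class DiskFullCheck {
    long maxSize;   // simulated disk-full threshold in bytes; 0 = unlimited
    long usedBytes; // bytes already written

    void checkDiskFull(long len) throws IOException {
        long freeSpace = maxSize - usedBytes;
        // freeSpace < len trips; freeSpace == len is still valid, because
        // the write exactly fills the remaining space without exceeding it.
        if (maxSize != 0 && freeSpace < len) {
            throw new IOException("fake disk full at " + usedBytes + " bytes");
        }
        usedBytes += len;
    }

    public static void main(String[] args) throws IOException {
        DiskFullCheck d = new DiskFullCheck();
        d.maxSize = 10;
        d.checkDiskFull(10); // exactly fills the disk: allowed
        try {
            d.checkDiskFull(1); // one byte over: trips disk-full
        } catch (IOException expected) {
            System.out.println("tripped: " + expected.getMessage());
        }
    }
}
```

With the stricter {{freeSpace == len}} comparison removed, a copyBytes of exactly the remaining space no longer throws spuriously, which matches the summary in the comment.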
[jira] [Created] (SOLR-4789) CoreAdminHandler should write core.properties files in discovery mode
Andy Fowler created SOLR-4789: - Summary: CoreAdminHandler should write core.properties files in discovery mode Key: SOLR-4789 URL: https://issues.apache.org/jira/browse/SOLR-4789 Project: Solr Issue Type: New Feature Components: Schema and Analysis Affects Versions: 4.3 Reporter: Andy Fowler When using the new core discovery method, cores created via CoreAdminHandler are never persisted, since they should be writing files to $INSTANCEDIR/core.properties. CoreAdminHandler should probably write core.properties files.
[jira] [Assigned] (SOLR-4789) CoreAdminHandler should write core.properties files in discovery mode
[ https://issues.apache.org/jira/browse/SOLR-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Heisey reassigned SOLR-4789: -- Assignee: Erick Erickson Erick requested assignment via #solr irc channel.
[jira] [Comment Edited] (SOLR-4789) CoreAdminHandler should write core.properties files in discovery mode
[ https://issues.apache.org/jira/browse/SOLR-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650023#comment-13650023 ] Shawn Heisey edited comment on SOLR-4789 at 5/6/13 7:34 PM: Erick requested assignment via #lucene-dev irc channel. was (Author: elyograg): Erick requested assignment via #solr irc channel.
[jira] [Commented] (SOLR-4789) CoreAdminHandler should write core.properties files in discovery mode
[ https://issues.apache.org/jira/browse/SOLR-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650029#comment-13650029 ] Mark Miller commented on SOLR-4789: --- Interesting - I thought there was code that created this file when trying to read it the first time and not finding it. Still a lot of tests to add for this new code path I think - I've made it the default now so that devs can start running into these problems faster.
[jira] [Commented] (SOLR-4773) New discovery mode needs to ensure that instanceDir is correct
[ https://issues.apache.org/jira/browse/SOLR-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650032#comment-13650032 ] Mark Miller commented on SOLR-4773: --- bq. You should receive a "More than one core points to data dir 'multicore/data/'" failure on startup. Setting a relative path in each core.properties file doesn't work; it only works when I provide discrete dataDir for each core. I've ripped all that data dir checking out for 4.4.
[jira] [Commented] (SOLR-4773) New discovery mode needs to ensure that instanceDir is correct
[ https://issues.apache.org/jira/browse/SOLR-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650033#comment-13650033 ] Andy Fowler commented on SOLR-4773: --- Confirmed by compiling branch_4x that this fixes the bug I noticed in the 4.3.0 release. To future travelers, this means that each core.properties file needs a discrete dataDir property in 4.3.0.
[jira] [Resolved] (SOLR-4583) Change the examples to use solr.properties and auto-discover cores rather than solr.xml
[ https://issues.apache.org/jira/browse/SOLR-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson resolved SOLR-4583. -- Resolution: Fixed Fix Version/s: 5.0, 4.3 Actually, rather than solr.properties it's the new-style solr.xml. But Mark Miller fixed it. Change the examples to use solr.properties and auto-discover cores rather than solr.xml --- Key: SOLR-4583 URL: https://issues.apache.org/jira/browse/SOLR-4583 Project: Solr Issue Type: Improvement Affects Versions: 4.3, 5.0 Reporter: Erick Erickson Assignee: Erick Erickson Fix For: 4.3, 5.0 If we're going to move forward with obsoleting solr.xml and auto-discovering cores, we need to have as many people using this as possible. I'd like to change the examples to NOT use solr.xml so that this bus leaves the station. solr.xml will still work as it does today, but before we make the cut-over we need enough mileage on it to be confident.
[jira] [Updated] (SOLR-4583) Change the examples to use new-style solr.xml and auto-discover cores rather than old-style solr.xml that defined cores
[ https://issues.apache.org/jira/browse/SOLR-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-4583: - Summary: Change the examples to use new-style solr.xml and auto-discover cores rather than old-style solr.xml that defined cores (was: Change the examples to use solr.properties and auto-discover cores rather than solr.xml)
[jira] [Commented] (SOLR-4773) New discovery mode needs to ensure that instanceDir is correct
[ https://issues.apache.org/jira/browse/SOLR-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13650042#comment-13650042 ] Erick Erickson commented on SOLR-4773: -- bq: I've ripped all that data dir checking out for 4.4. Does that include the checking for cores with the same name? It seems like that makes it easier for people to shoot themselves in the foot without giving them _any_ clues about what went wrong. And core discovery makes that pretty easy to do: just copy the core.properties file around and forget to change the name parameter, or an absolute path to the dataDir. New discovery mode needs to ensure that instanceDir is correct -- Key: SOLR-4773 URL: https://issues.apache.org/jira/browse/SOLR-4773 Project: Solr Issue Type: Bug Components: Schema and Analysis Affects Versions: 5.0, 4.4 Reporter: Erick Erickson Assignee: Mark Miller Fix For: 5.0, 4.4 Attachments: SOLR-4773.patch, SOLR-4773.patch Doing a fresh checkout of 4.x (trunk too, I think) and firing up the example fails because we can't find solrconfig. The construction of the instanceDir in SolrCoreDiscoverer constructs a path with an extra solr (e.g. solr/solr/core). I'll attach a patch shortly.
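For readers following along: the foot-gun Erick describes comes from the per-core properties file used by discovery mode. A sketch of such a file is below; the property names reflect the discovery-mode design discussed in this thread, and the values are purely illustrative (copying this file to another core directory without changing name is exactly the duplicate-name scenario at issue).

```
# Illustrative core.properties placed in a core's instance directory for
# discovery mode. If "name" is omitted it is taken from the directory name;
# copy-pasting this file between directories without editing "name" (or an
# absolute dataDir) silently creates two cores fighting over one identity.
name=collection2
dataDir=data
```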
[jira] [Commented] (SOLR-4773) New discovery mode needs to ensure that instanceDir is correct
[ https://issues.apache.org/jira/browse/SOLR-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13650058#comment-13650058 ] Mark Miller commented on SOLR-4773: --- bq. Does that include the checking for cores with the same name? Yes, all this checking and how it was done just further complicates the code, and we want to get away from pre-configuration as a way to create collections anyhow. We should just keep this simple: a core should fail to be created in the core container if there is an existing core with the same name, that's it. I feel all the transient and other recent changes to CoreContainer are really starting to significantly complicate what was already a design that needed some love, so I'm trying to simplify as much as possible so we can more easily refactor down the line.
[jira] [Updated] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-4787: - Attachment: SOLR-4787.patch Changed the BSearch class to use the SorterTemplate rather than Collections.sort, for much more efficient in-place sorting. SorterTemplate builds with Solr 4.2.1; will need to get this working with trunk as well, using the new Sorter class. Also found a major bug in my original logic for how segment-level readers were being used between the join cores and fixed that as well. Join Contrib Key: SOLR-4787 URL: https://issues.apache.org/jira/browse/SOLR-4787 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.2.1 Reporter: Joel Bernstein Priority: Minor Fix For: 4.2.1 Attachments: SOLR-4787.patch, SOLR-4787.patch This contrib provides a place where different join implementations can be contributed to Solr. This contrib currently includes 2 join implementations. The initial patch was generated from the Solr 4.2.1 tag. Because of changes in the FieldCache API this patch will only build with Solr 4.2 or above. *PostFilterJoinQParserPlugin aka pjoin* The pjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar in functionality to the JoinQParserPlugin but the implementation differs in a couple of important ways. The first way is that the pjoin is designed to work with integer join keys only. So, in order to use pjoin, integer join keys must be included in both the to and from core. The second difference is that the pjoin builds memory structures that are used to quickly connect the join keys. It also uses a custom SolrCache named join to hold intermediate DocSets which are needed to build the join memory structures. So, the pjoin will need more memory than the JoinQParserPlugin to perform the join. The main advantage of the pjoin is that it can scale to join millions of keys between cores.
Because it's a PostFilter, it only needs to join records that match the main query. The syntax of the pjoin is the same as the JoinQParserPlugin except that the plugin is referenced by the string pjoin rather than join. fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1 The example filter query above will search the fromCore (collection2) for user:customer1. This query will generate a list of values from the from field that will be used to filter the main query. Only records from the main query, where the to field is present in the from list, will be included in the results. The solrconfig.xml in the main query core must contain the reference to the pjoin: <queryParser name="pjoin" class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/> And the join contrib jars must be registered in the solrconfig.xml: <lib dir="../../../dist/" regex="solr-joins-\d.*\.jar" /> The solrconfig.xml in the from core must have the join SolrCache configured: <cache name="join" class="solr.LRUCache" size="4096" initialSize="1024" /> *JoinValueSourceParserPlugin aka vjoin* The second implementation is the JoinValueSourceParserPlugin aka vjoin. This implements a ValueSource function query that can return values from a second core based on join keys. This allows relevance data to be stored in a separate core and then joined in the main query. The vjoin is called using the vjoin function query. For example: bf=vjoin(fromCore, fromKey, fromVal, toKey) This example shows vjoin being called by the edismax boost function parameter. This example will return the fromVal from the fromCore. The fromKey and toKey are used to link the records from the main query to the records in the fromCore. As with the pjoin, both the fromKey and toKey must be integers. Also like the pjoin, the join SolrCache is used to hold the join memory structures.
To configure the vjoin you must register the ValueSource plugin in the solrconfig.xml as follows: <valueSourceParser name="vjoin" class="org.apache.solr.joins.JoinValueSourceParserPlugin" />
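For anyone skimming the description above, the core pjoin idea (not the patch's actual code; class and method names here are invented for illustration) boils down to: run the query against the from core, collect the integer join keys it matches into a set, then post-filter the main query so only documents whose to key is in that set survive.

```java
import java.util.*;

// Hypothetical sketch of the pjoin concept described above: an integer-keyed
// post-filter join between two cores. Not Solr API; names are illustrative.
public class PJoinSketch {
    // mainResults: each doc modeled as {docId, joinKey}
    public static List<int[]> postFilterJoin(Set<Integer> fromCoreKeys, List<int[]> mainResults) {
        List<int[]> joined = new ArrayList<>();
        for (int[] doc : mainResults) {
            if (fromCoreKeys.contains(doc[1])) { // keep only docs whose key matched in the from core
                joined.add(doc);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // keys matched by user:customer1 in the from core
        Set<Integer> fromKeys = new HashSet<>(Arrays.asList(7, 42));
        // main-query docs as {docId, joinKey}
        List<int[]> main = Arrays.asList(new int[]{1, 7}, new int[]{2, 9}, new int[]{3, 42});
        for (int[] d : postFilterJoin(fromKeys, main)) {
            System.out.println(d[0]); // prints 1 then 3
        }
    }
}
```

Because the filter only runs over documents already matched by the main query (the PostFilter property), the set lookup cost scales with result size, not index size.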
[jira] [Comment Edited] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13650063#comment-13650063 ] Joel Bernstein edited comment on SOLR-4787 at 5/6/13 8:16 PM: -- Changed the BSearch class to use the SorterTemplate rather than Collections.sort, for much more efficient in-place sorting. SorterTemplate builds with Solr 4.2.1; will need to get this working with trunk as well, using the new Sorter class. Thanks David and Adrien for tips on this. Also found a major bug in my original logic for how segment-level readers were being used between the join cores and fixed that as well. was (Author: joel.bernstein): Changed the BSearch class to use the SorterTemplate rather than Collections.sort. Much more efficient in-place sorting. SorterTemplate builds with Solr 4.2.1. Will need to get this working with trunk as well using the new Sorter class. Found major bug in my original logic for how segment level readers were being used between the join cores and fixed that as well.
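On the SorterTemplate point in Joel's comment above: the win over Collections.sort is that two parallel primitive arrays (join keys and their companion values) can be sorted in place by key, swapping both arrays together, with no boxing into wrapper objects and no temporary list. The sketch below is not Lucene's SorterTemplate API, just a minimal illustration of that in-place parallel-array pattern using a plain quicksort.

```java
// Illustrative in-place sort of parallel int[] arrays by key, the pattern
// that SorterTemplate enables (hypothetical code, not the Lucene class).
public class ParallelSortSketch {
    // Quicksort keys[lo..hi] in place, carrying values[] along with each swap.
    static void sort(int[] keys, int[] values, int lo, int hi) {
        if (lo >= hi) return;
        int pivot = keys[(lo + hi) >>> 1], i = lo, j = hi;
        while (i <= j) {
            while (keys[i] < pivot) i++;
            while (keys[j] > pivot) j--;
            if (i <= j) swap(keys, values, i++, j--);
        }
        sort(keys, values, lo, j);
        sort(keys, values, i, hi);
    }

    static void swap(int[] k, int[] v, int a, int b) {
        int t = k[a]; k[a] = k[b]; k[b] = t;
        t = v[a]; v[a] = v[b]; v[b] = t;
    }

    public static void main(String[] args) {
        int[] keys = {42, 7, 19};
        int[] docs = {0, 1, 2};
        sort(keys, docs, 0, keys.length - 1);
        System.out.println(java.util.Arrays.toString(keys)); // [7, 19, 42]
        System.out.println(java.util.Arrays.toString(docs)); // [1, 2, 0]
    }
}
```

With Collections.sort you would instead have to box every (key, value) pair into an object, which is exactly the overhead the comment is avoiding.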
Re: [JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 37891 - Failure!
JVM crash. On Mon, May 6, 2013 at 2:52 PM, buil...@flonkings.com wrote: Build: builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64-test-only/37891/ No tests ran. Build Log: [...truncated 119 lines...]
[jira] [Commented] (SOLR-4773) New discovery mode needs to ensure that instanceDir is correct
[ https://issues.apache.org/jira/browse/SOLR-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13650098#comment-13650098 ] Erick Erickson commented on SOLR-4773: -- Makes sense; I wasn't altogether happy with the complexification. But we're leaving the user high and dry when tracking down errors. Take 4.x, just copy collection1 to collection2, and fire up Solr. No warnings in the log. No errors in the log. But you can't get to collection2; you get a 404 error. And any index mods are done in the collection2 directory. Admittedly the configuration is foo'd and Solr is doing exactly what the defined behavior is (identically named cores: last one wins). But how the hell is someone supposed to track that down? Especially with lots of cores? They don't get a single clue in the place we always say to look, the Solr log. I see where there are tests for creating a core with the same name as an existing core via the core admin handler, but I don't see at a glance any coverage for this scenario.
[jira] [Commented] (SOLR-4773) New discovery mode needs to ensure that instanceDir is correct
[ https://issues.apache.org/jira/browse/SOLR-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13650104#comment-13650104 ] Andy Fowler commented on SOLR-4773: --- Just to throw in my $0.02 as an app developer and Solr consumer with far less knowledge of the rest of the world's use cases: if I accidentally put Solr into a state where two cores were sharing a dataDir, I would really want some sort of strong warning, or just an absolute failure. I really like the way that cores are moving to being just a simple directory on the FS, rather than a block in a monolithic XML file. But if the cores are moving toward more backing by directory + properties file, it seems like accidentally sharing a dataDir could be a really bad thing.
[jira] [Commented] (SOLR-4773) New discovery mode needs to ensure that instanceDir is correct
[ https://issues.apache.org/jira/browse/SOLR-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13650117#comment-13650117 ] Mark Miller commented on SOLR-4773: --- You should get an error, as I said - we just shouldn't be trying to detect it that way. CoreContainer should throw an exception when a core is added with an existing name.
[jira] [Created] (SOLR-4790) When defining a core with the same name (discovery mode or not), CoreContainer should throw an error
Erick Erickson created SOLR-4790: Summary: When defining a core with the same name (discovery mode or not), CoreContainer should throw an error Key: SOLR-4790 URL: https://issues.apache.org/jira/browse/SOLR-4790 Project: Solr Issue Type: Bug Affects Versions: 4.3, 5.0 Reporter: Erick Erickson Assignee: Erick Erickson When you define a core with the same name as another core (discovery mode definitely, old-style xml probably), the last one wins, which means it's very hard to track down what caused the problem. What's worse, the last-encountered core replaces the first one, leading to cores that change an unexpected index.
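The fail-fast behavior this issue asks for can be sketched in a few lines: instead of silently letting the last same-named core win, registration rejects a duplicate name immediately. CoreRegistry below is an illustrative stand-in, not Solr's actual CoreContainer.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of SOLR-4790's proposal: a container that throws on
// duplicate core names rather than letting the last registration win.
public class CoreRegistry {
    private final Map<String, Object> cores = new ConcurrentHashMap<>();

    public void register(String name, Object core) {
        // putIfAbsent is atomic, so two concurrent registrations of the
        // same name cannot both succeed.
        if (cores.putIfAbsent(name, core) != null) {
            throw new IllegalStateException("Core with name '" + name + "' already exists");
        }
    }

    public static void main(String[] args) {
        CoreRegistry registry = new CoreRegistry();
        registry.register("collection1", new Object());
        try {
            registry.register("collection1", new Object()); // duplicate: fails fast
        } catch (IllegalStateException expected) {
            System.out.println(expected.getMessage());
        }
    }
}
```

The point of failing at registration time is that the error surfaces in the one place users are told to look (the log at startup), rather than as a mysterious 404 later.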
[jira] [Commented] (SOLR-4773) New discovery mode needs to ensure that instanceDir is correct
[ https://issues.apache.org/jira/browse/SOLR-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13650123#comment-13650123 ] Erick Erickson commented on SOLR-4773: -- New JIRA for same-named cores, see SOLR-4790.
[jira] [Commented] (LUCENE-4981) Deprecate PositionFilter
[ https://issues.apache.org/jira/browse/LUCENE-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13650137#comment-13650137 ] Steve Rowe commented on LUCENE-4981: Adrien, can you hold off committing for a little bit? I'm not sure if QueryParser.setAutoGeneratePhraseQueries is sufficient for all cases that the PositionFilter hack addresses - I want to do some investigation. Deprecate PositionFilter Key: LUCENE-4981 URL: https://issues.apache.org/jira/browse/LUCENE-4981 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Attachments: LUCENE-4981.patch According to the documentation (http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PositionFilterFactory), PositionFilter is mainly useful to make query parsers generate boolean queries instead of phrase queries although this problem can be solved at query parsing level instead of analysis level (eg. using QueryParser.setAutoGeneratePhraseQueries). So given that PositionFilter corrupts token graphs (see TestRandomChains), I propose to deprecate it.
[jira] [Commented] (LUCENE-4981) Deprecate PositionFilter
[ https://issues.apache.org/jira/browse/LUCENE-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13650160#comment-13650160 ] Adrien Grand commented on LUCENE-4981: -- Sure, I can wait. (Even when committed, the old behavior will still be available by using luceneMatchVersion=4.3.) I would like to start marking all our broken components (the offenders in TestRandomChains) as deprecated so that people start thinking about ways to solve their problems without them, stop getting highlighting bugs, and can eventually smoothly upgrade to 5.0 when we release it. I already started deprecating/fixing some tokenizers / token filters for 4.4 (LUCENE-4955 and LUCENE-4963) and would like to get as many of them fixed as possible for the next release.
[jira] [Updated] (SOLR-4790) When defining a core with the same name (discovery mode or not), CoreContainer should throw an error
[ https://issues.apache.org/jira/browse/SOLR-4790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-4790: - Issue Type: Improvement (was: Bug)
[jira] [Commented] (LUCENE-4981) Deprecate PositionFilter
[ https://issues.apache.org/jira/browse/LUCENE-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13650164#comment-13650164 ] Steve Rowe commented on LUCENE-4981: Thanks for working on fixing the broken stuff. In addition to use cases, I want to investigate the exact nature of the brokenness PositionFilter introduces - maybe it's fixable? I'll re-enable it in TestRandomChains and iterate until it breaks.
[jira] [Commented] (LUCENE-4981) Deprecate PositionFilter
[ https://issues.apache.org/jira/browse/LUCENE-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13650166#comment-13650166 ] Robert Muir commented on LUCENE-4981: - I'm not sure it's fixable: by definition it corrupts the structure because you lose all posincs. So synonyms no longer become synonyms, holes disappear, or whatever. And this doesn't even factor in posLength...
[jira] [Commented] (LUCENE-4981) Deprecate PositionFilter
[ https://issues.apache.org/jira/browse/LUCENE-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13650176#comment-13650176 ] Steve Rowe commented on LUCENE-4981: The comment in TestRandomChains says: {code:java} // TODO: corrumpts graphs (offset consistency check): PositionFilter.class, {code} which is what made me wonder about the nature of the brokenness: why are offsets a problem? I agree, Robert, PositionFilter corrupts by design. And if we do end up keeping it, position length should be addressed (it's not now), maybe by always setting it to 1.
[jira] [Commented] (LUCENE-4981) Deprecate PositionFilter
[ https://issues.apache.org/jira/browse/LUCENE-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13650177#comment-13650177 ] Adrien Grand commented on LUCENE-4981: -- bq. why are offsets a problem? There are invariants that need to be maintained by token filters: all tokens that start at the same position must have the same start offset and all tokens that end at the same position (start position + position length) must have the same end offset (see ValidatingFilter). By arbitrarily changing position increments, PositionFilter breaks these invariants.
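To make Adrien's invariant concrete without pulling in Lucene classes, here is a self-contained sketch: tokens are modeled as (positionIncrement, startOffset, endOffset) triples, and the check verifies that every token landing on the same position shares a start offset. Zeroing position increments, which is what PositionFilter does, stacks tokens with different offsets onto one position and trips the check. The class and method names are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the ValidatingFilter invariant discussed above: tokens that
// start at the same position must have the same start offset.
public class OffsetInvariantSketch {
    // Each token is {positionIncrement, startOffset, endOffset}.
    static boolean startOffsetsConsistent(int[][] tokens) {
        Map<Integer, Integer> startByPos = new HashMap<>();
        int pos = -1;
        for (int[] t : tokens) {
            pos += t[0]; // advance by the position increment
            Integer seen = startByPos.putIfAbsent(pos, t[1]);
            if (seen != null && seen != t[1]) {
                return false; // same position, conflicting start offsets
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "quick brown" with normal increments: graph is consistent.
        int[][] normal = {{1, 0, 5}, {1, 6, 11}};
        // Force the second increment to 0 (PositionFilter-style): both
        // tokens now occupy position 0 with different start offsets.
        int[][] flattened = {{1, 0, 5}, {0, 6, 11}};
        System.out.println(startOffsetsConsistent(normal));    // true
        System.out.println(startOffsetsConsistent(flattened)); // false
    }
}
```

This is also why a follow-on filter like shingle can then produce even stranger output: once the invariant is gone, offsets derived from the flattened positions no longer line up.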
[jira] [Commented] (LUCENE-4981) Deprecate PositionFilter
[ https://issues.apache.org/jira/browse/LUCENE-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13650191#comment-13650191 ] Robert Muir commented on LUCENE-4981: - {quote} which is what made me wonder what about the nature of brokenness: why are offsets a problem? {quote} I think Adrien describes it correctly: afaik it doesn't do anything super-evil like make start offsets go backwards or anything, but it breaks those invariants Adrien describes, which can cause a follow-on filter (e.g. shingle) to cause further craziness, e.g. things going backwards or endOffset < startOffset or other problems.
[jira] [Updated] (SOLR-4790) When defining a core with the same name (discovery mode or not), CoreContainer should throw an error
[ https://issues.apache.org/jira/browse/SOLR-4790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-4790:
-
Attachment: SOLR-4790.patch

Gaah. Maintaining all the backwards-compatibility junk is a pain; this is SO much simpler than what was in there before. It only works for discovery mode; I'll take a quick look at what it would take to deal with old-style in a second. If it's too complicated I'll pass on it, since old-style is going to end-of-life. Anyway, preliminary patch; I'm running the test suite now and have yet to look it over, but is this along the lines you [~markrmil...@gmail.com] had in mind?

When defining a core with the same name (discovery mode or not), CoreContainer should throw an error
Key: SOLR-4790
URL: https://issues.apache.org/jira/browse/SOLR-4790
Project: Solr
Issue Type: Improvement
Affects Versions: 4.3, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
Attachments: SOLR-4790.patch

When you define a core with the same name as another core (discovery mode definitely, old-style xml probably), the last one wins, which means it's very hard to track down what caused the problem. What's worse, the last-encountered core replaces the first one, leading to cores that change an unexpected index.
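[Editor's note: the fail-fast behavior this issue asks for can be sketched in a few lines. CoreRegistry and register below are hypothetical names for illustration only, not Solr's actual CoreContainer API.]

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of "fail on duplicate core name" semantics: the first definition
// wins and a second definition with the same name is a hard error, instead
// of silently replacing the first core (the last-one-wins behavior the
// issue describes).
public class CoreRegistry {
    private final Map<String, Object> cores = new ConcurrentHashMap<>();

    public void register(String name, Object core) {
        // putIfAbsent makes the duplicate check atomic with the insert,
        // so two concurrent registrations of the same name can't both succeed.
        if (cores.putIfAbsent(name, core) != null) {
            throw new IllegalStateException("Core '" + name
                    + "' is already defined; duplicate core definitions are not allowed");
        }
    }
}
```

Throwing here surfaces the misconfiguration at startup, where it is easy to diagnose, rather than letting the second core quietly take over the first one's name and write to an unexpected index.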
[jira] [Commented] (LUCENE-4981) Deprecate PositionFilter
[ https://issues.apache.org/jira/browse/LUCENE-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650193#comment-13650193 ]

Steve Rowe commented on LUCENE-4981:

Thanks for the pointer Adrien, I'll take a look at ValidatingFilter. It might be possible, by creating new positions, to enable offset consistency in PositionFilter. Not sure it's worth the effort though.
[jira] [Commented] (LUCENE-4981) Deprecate PositionFilter
[ https://issues.apache.org/jira/browse/LUCENE-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650195#comment-13650195 ]

Robert Muir commented on LUCENE-4981:
-

The ValidatingFilter should be the same logic as BaseTokenStreamTestCase:196. I think it's in a separate filter because then it's applied at each stage of the analysis in TestRandomChains, so if there is a bug in a complex analysis chain we know the culprit.
[jira] [Commented] (SOLR-4790) When defining a core with the same name (discovery mode or not), CoreContainer should throw an error
[ https://issues.apache.org/jira/browse/SOLR-4790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650202#comment-13650202 ]

Mark Miller commented on SOLR-4790:
---

In this case, rather than a new back-compat break, this is really an improvement. In the past, the only legit reason to do this was to reload a core - but now that's a broken way to reload - you must use the reload method. So failing is much better than what we do now IMO.