Re: [VOTE] Release PyLucene 3.6.0

2012-05-06 Thread Thomas Koch
After fixing the Ivy issue, I was able to build JCC and PyLucene on
Win7-32 with Python 2.7 and Java 1.6:
 Installed c:\python27\lib\site-packages\jcc-2.13-py2.7-win32.egg 

The initial build of PyLucene failed with:
pylucene-3.6.0-1\lucene-java-3.6.0\lucene\common-build.xml:526: The
following error occurred while executing this line:
pylucene-3.6.0-1\lucene-java-3.6.0\lucene\common-build.xml:298: Ivy is not
available

After running the ivy-bootstrap the build went fine.

cd pylucene-3.6.0-1\lucene-java-3.6.0\lucene
ant ivy-bootstrap
Buildfile: pylucene-3.6.0-1\lucene-java-3.6.0\lucene\build.xml

ivy-bootstrap:
[mkdir] Created dir: C:\Users\Thomas Koch\.ant\lib
 [echo] installing ivy 2.2.0 to C:\Users\Thomas Koch\.ant\lib
  [get] Getting:
http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.2.0/ivy-2.2.0.jar
  [get] To: C:\Users\Thomas Koch\.ant\lib\ivy-2.2.0.jar
BUILD SUCCESSFUL
Total time: 1 second

Note: running ant ivy-bootstrap a second time seems to detect the already
installed version and does not download it again:

ivy-bootstrap:
 [echo] installing ivy 2.2.0 to C:\Users\Thomas Koch\.ant\lib
  [get] Getting:
http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.2.0/ivy-2.2.0.jar
  [get] To: C:\Users\Thomas Koch\.ant\lib\ivy-2.2.0.jar
  [get] Not modified - so not downloaded
BUILD SUCCESSFUL
Total time: 1 second


So I'd say either adding ant ivy-bootstrap to the make all target (of
PyLucene) or simply adding a make ivy target (and some hint in the docs)
could help here.
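
The suggestion above could be sketched as a small pre-build check. This is
only an illustration and not part of the PyLucene Makefile: the helper names
ivy_installed and ensure_ivy are hypothetical, the default directory mirrors
the .ant/lib location shown in the log above, and ant is assumed to be on
the PATH.

```python
import glob
import os
import subprocess


def ivy_installed(ant_lib_dir):
    """Return True if an ivy-*.jar file is present in the given directory."""
    return bool(glob.glob(os.path.join(ant_lib_dir, "ivy-*.jar")))


def ensure_ivy(lucene_dir, ant_lib_dir=None):
    """Run 'ant ivy-bootstrap' in lucene_dir unless an Ivy jar is already there.

    Returns True if the bootstrap target was run, False if it was skipped.
    """
    if ant_lib_dir is None:
        # Same location the ivy-bootstrap target reported in the log above.
        ant_lib_dir = os.path.join(os.path.expanduser("~"), ".ant", "lib")
    if ivy_installed(ant_lib_dir):
        return False
    # Assumption: 'ant' is on the PATH (on Windows the launcher is ant.bat,
    # which may need shell=True).
    subprocess.check_call(["ant", "ivy-bootstrap"], cwd=lucene_dir)
    return True
```

As noted further down in this message, a jar in the user's .ant/lib may not
be picked up under every Ant configuration, so a check like this would be a
convenience rather than a guarantee.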

Alternatively, Ivy should be clearly marked as a required component for
building PyLucene, as the BUILD readme of Java Lucene states:
 ...Set up your development environment 
 (JDK 1.5 or greater, Ant 1.7.1+, Ivy 2.2.0)

However, it seems that the ivy.jar in the user's local ant dir may not be
sufficient, depending on the global Ant config. I understand the Java Lucene
developers decided to fall back to a wiki page and some error details in the
build process (and to declare Ivy as required, see above). Cf. the Lucene
JIRA issue where the problem is discussed:
https://issues.apache.org/jira/browse/LUCENE-3946

Finally: my ant-1.8 does not recognize the mentioned ant --noconfig
option, but only supports
  -nouserlib     Run ant without using the jar files from
                 ${user.home}/.ant/lib
  -noclasspath   Run ant without using CLASSPATH

It's somewhat ironic that bringing in a very powerful dependency manager
(Apache Ivy website) results in another dependency issue...

Regards,
Thomas

-----Original Message-----
From: Andi Vajda [mailto:va...@apache.org] 
Sent: Saturday, May 5, 2012 23:53
To: pylucene-dev@lucene.apache.org
Subject: Re: [VOTE] Release PyLucene 3.6.0

 Please vote to release these artifacts as PyLucene 3.6.0-1.
 
 Lucene fails to compile because I don't have ivy installed and the 
 Makefile doesn't call ivy-bootstrap automatically.

Right. The Lucene error should be clear enough, but adding a make target
could help. How should that target be communicated, though?

Andi..




[JENKINS] Lucene-Solr-tests-only-trunk - Build # 13815 - Still Failing

2012-05-06 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/13815/

All tests passed

Build Log (for compile errors):
[...truncated 24077 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[JENKINS] Lucene-Solr-tests-only-trunk - Build # 13816 - Still Failing

2012-05-06 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/13816/

All tests passed

Build Log (for compile errors):
[...truncated 24057 lines...]




[JENKINS] Lucene-Solr-tests-only-trunk - Build # 13817 - Still Failing

2012-05-06 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/13817/

All tests passed

Build Log (for compile errors):
[...truncated 24079 lines...]




[JENKINS] Lucene-Solr-tests-only-trunk - Build # 13818 - Still Failing

2012-05-06 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/13818/

All tests passed

Build Log (for compile errors):
[...truncated 24054 lines...]




[JENKINS] Lucene-Solr-tests-only-trunk - Build # 13819 - Still Failing

2012-05-06 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/13819/

All tests passed

Build Log (for compile errors):
[...truncated 24057 lines...]




[JENKINS] Lucene-Solr-tests-only-trunk - Build # 13820 - Still Failing

2012-05-06 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/13820/

All tests passed

Build Log (for compile errors):
[...truncated 24095 lines...]




Re: [JENKINS] Lucene-Solr-tests-only-trunk - Build # 13820 - Still Failing

2012-05-06 Thread Michael McCandless
I committed a fix...

Mike McCandless

http://blog.mikemccandless.com


On Sun, May 6, 2012 at 5:37 AM, Apache Jenkins Server
jenk...@builds.apache.org wrote:
 Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/13820/

 All tests passed

 Build Log (for compile errors):
 [...truncated 24095 lines...]







[JENKINS] Lucene-Solr-tests-only-trunk - Build # 13821 - Still Failing

2012-05-06 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/13821/

All tests passed

Build Log (for compile errors):
[...truncated 24140 lines...]




[jira] [Commented] (SOLR-139) Support updateable/modifiable documents

2012-05-06 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269191#comment-13269191
 ] 

Andrzej Bialecki  commented on SOLR-139:


David, please see LUCENE-3837 for a low-level partial update of inverted
fields without re-indexing other fields. That is very much a work in
progress, and it's more complex. This issue provides a shortcut for the
"retrieve stored fields, modify, delete original doc, add modified doc"
sequence that users would otherwise have to execute manually.
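
A minimal sketch of that manual sequence, using a toy in-memory index rather
than any real Solr or Lucene API. ToyIndex and update_fields are hypothetical
names introduced only for illustration.

```python
class ToyIndex:
    """In-memory stand-in for an index that keeps stored fields by unique id."""

    def __init__(self):
        self._docs = {}

    def add(self, doc):
        self._docs[doc["id"]] = dict(doc)

    def delete(self, doc_id):
        self._docs.pop(doc_id, None)

    def get_stored(self, doc_id):
        # Return a copy so callers cannot mutate the index in place.
        return dict(self._docs[doc_id])


def update_fields(index, doc_id, changes):
    """Retrieve stored fields, modify, delete original doc, add modified doc."""
    doc = index.get_stored(doc_id)   # 1. retrieve stored fields
    doc.update(changes)              # 2. modify
    index.delete(doc_id)             # 3. delete original doc
    index.add(doc)                   # 4. add modified doc
    return doc
```

In this model only stored fields survive the round trip, which matches the
restriction to stored fields mentioned in the issue description.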

 Support updateable/modifiable documents
 ---

 Key: SOLR-139
 URL: https://issues.apache.org/jira/browse/SOLR-139
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Ryan McKinley
 Attachments: Eriks-ModifiableDocument.patch, 
 Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, 
 Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, 
 Eriks-ModifiableDocument.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, 
 SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, 
 SOLR-139-XmlUpdater.patch, SOLR-139.patch, 
 SOLR-269+139-ModifiableDocumentUpdateProcessor.patch, getStoredFields.patch, 
 getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, 
 getStoredFields.patch


 It would be nice to be able to update some fields on a document without 
 having to insert the entire document.
 Given the way lucene is structured, (for now) one can only modify stored 
 fields.
 While we are at it, we can support incrementing an existing value - I think 
 this only makes sense for numbers.
 for background, see:
 http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira






[jira] [Commented] (LUCENE-4032) don't write offsetlength every skip

2012-05-06 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269196#comment-13269196
 ] 

Michael McCandless commented on LUCENE-4032:


+1

 don't write offsetlength every skip
 ---

 Key: LUCENE-4032
 URL: https://issues.apache.org/jira/browse/LUCENE-4032
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Robert Muir
 Attachments: LUCENE-4032.patch


 We currently write this every skip, but we should try to avoid this (like 
 payloads).
 This reduces skip data on my test corpus: .frq goes from 52354303 to 50896066
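
To put the reported .frq figures in proportion, the relative saving can be
worked out directly. The two numbers are taken from the description above;
everything else is plain arithmetic.

```python
before = 52354303  # .frq size before the patch, as reported above
after = 50896066   # .frq size after the patch, as reported above

saved = before - after
percent = 100.0 * saved / before
print("saved %d (%.2f%%)" % (saved, percent))
```

That is a reduction of a little under 3% of the .frq file on that corpus.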




[jira] [Commented] (LUCENE-3830) MappingCharFilter could be improved by switching to an FST.

2012-05-06 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269210#comment-13269210
 ] 

Robert Muir commented on LUCENE-3830:
-

patch looks good: I guess the bulk read in RollingCharBuffer should help
other things like Kuromoji that use it too?!

 MappingCharFilter could be improved by switching to an FST.
 ---

 Key: LUCENE-3830
 URL: https://issues.apache.org/jira/browse/LUCENE-3830
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Dawid Weiss
Assignee: Michael McCandless
Priority: Minor
  Labels: gsoc2012, lucene-gsoc-12
 Fix For: 4.0

 Attachments: LUCENE-3830.patch, LUCENE-3830.patch, LUCENE-3830.patch, 
 PerfTestMappingCharFilter.java


 MappingCharFilter stores an overly complex tree-like structure for matching 
 input patterns. The input is a union of fixed strings mapped to a set of 
 fixed strings; an fst matcher would be ideal here and provide both memory and 
 speed improvement I bet.
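
The matching problem described above (a set of fixed input strings, each
mapped to a fixed output string) can be illustrated with a plain trie and
longest-match replacement. This is a sketch only: a trie stands in for the
FST here, it does not share suffixes the way an FST does, and it is not the
code from the attached patches. build_trie and apply_mappings are
hypothetical names.

```python
def build_trie(mappings):
    """Build a nested-dict trie; the key None marks the end of a pattern."""
    root = {}
    for pattern, replacement in mappings.items():
        node = root
        for ch in pattern:
            node = node.setdefault(ch, {})
        node[None] = replacement
    return root


def apply_mappings(text, trie):
    """Replace the longest matching pattern at each position, left to right."""
    out = []
    i = 0
    while i < len(text):
        node = trie
        match_end = None
        replacement = None
        j = i
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if None in node:
                # Remember the longest pattern seen so far from position i.
                match_end, replacement = j, node[None]
        if match_end is None:
            out.append(text[i])   # no pattern starts here; copy the char
            i += 1
        else:
            out.append(replacement)
            i = match_end
    return "".join(out)
```

For example, with the mappings a, ab and abc going to 1, 2 and 3, the input
"abcab" is rewritten by taking the longest pattern at each position.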




[jira] [Commented] (LUCENE-3830) MappingCharFilter could be improved by switching to an FST.

2012-05-06 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269211#comment-13269211
 ] 

Michael McCandless commented on LUCENE-3830:


bq.  I guess the bulk read in RollingCharBuffer should help other things like 
Kuromoji that use it too?!

I haven't tested but I think it should help!




[jira] [Resolved] (LUCENE-3830) MappingCharFilter could be improved by switching to an FST.

2012-05-06 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3830.


Resolution: Fixed




[jira] [Updated] (LUCENE-3948) Experiment with placing poms outside of src

2012-05-06 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe updated LUCENE-3948:


Attachment: LUCENE-3948.patch

Patch, brought up to date with the {{modules/}} -> {{lucene/}} move.  Also:
added info to {{dev-tools/maven/README.maven}}, and modified {{svn:ignore}}
properties to ignore the top-level {{maven-build/}} and to stop ignoring
{{pom.xml}} files.

I'll commit this tomorrow if there are no objections.

 Experiment with placing poms outside of src
 ---

 Key: LUCENE-3948
 URL: https://issues.apache.org/jira/browse/LUCENE-3948
 Project: Lucene - Java
  Issue Type: Improvement
  Components: general/build
Reporter: Chris Male
Priority: Minor
 Attachments: LUCENE-3948.patch, LUCENE-3948.patch, LUCENE-3948.patch, 
 LUCENE-3948.patch, LUCENE-3948.patch


 Recent work in LUCENE-3944 has changed how our generated pom.xml files are 
 handled during release preparation, placing them in build/ instead.  However 
 get-maven-poms still places the poms inside src/ so you can use them to drive 
 a build.  What I think would be ideal is if we could unify the release 
 handling of the poms and the normal building handling, so that the poms can 
 sit outside of src and serve both purposes.  
 Some time ago I investigated how the ANT project handles its own Maven 
 integration and it has its poms sitting in their own directory.  They then 
 reference the actual src locations inside the poms.  This works for ANT but 
 with a warning since some of their tests don't work due to how the Maven 
 surefire plugin works, so they skip their tests.
 I have done some quick testing of my own and this process does seem to work 
 for our poms and tests.  I now want to take this to a full scale POC and see 
 if it works fully.




[jira] [Assigned] (LUCENE-3948) Experiment with placing poms outside of src

2012-05-06 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe reassigned LUCENE-3948:
---

Assignee: Steven Rowe




[jira] [Resolved] (LUCENE-4032) don't write offsetlength every skip

2012-05-06 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-4032.
-

   Resolution: Fixed
Fix Version/s: 4.0




[jira] [Updated] (SOLR-3439) Add content field to example schema to make SolrCell easier to use out of the box

2012-05-06 Thread Jack Krupansky (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jack Krupansky updated SOLR-3439:
-

Attachment: Lincoln-Gettysburg-Address.pdf
Lincoln-Gettysburg-Address.docx

Test documents for SolrCell. Both have a bunch of metadata fields defined. The 
PDF was generated from the Word doc.

We can consider them for inclusion in exampledocs, but for now they are posted 
here for reference and for anybody wanting to test this issue.

 Add content field to example schema to make SolrCell easier to use out of 
 the box
 ---

 Key: SOLR-3439
 URL: https://issues.apache.org/jira/browse/SOLR-3439
 Project: Solr
  Issue Type: Improvement
  Components: contrib - Solr Cell (Tika extraction), Schema and 
 Analysis
Reporter: Jack Krupansky
Priority: Minor
 Fix For: 4.0

 Attachments: Lincoln-Gettysburg-Address.docx, 
 Lincoln-Gettysburg-Address.pdf


 Currently, SolrCell is configured to map Tika content (the main body of a 
 document) to the text field which is the indexed-only (not stored) 
 catch-all for default queries. That searches fine, but doesn't show the 
 document content in the results, sometimes leading users to think that 
 something is wrong. Sure, the user can easily add the field (and this is 
 documented), but it would be a better user experience to have such a basic 
 feature work right out of the box without any config editing and without the 
 need for the user to read the fine print in the documentation.
 I propose that we add the content field to the example schema in the 
 section of fields already defined to support SolrCell metadata. It would be 
 stored and indexed.
 I further propose that a copyField be added for the title, description, 
 (and maybe a couple of others) and content fields to add them to the text 
 field for searching. Again, trying to improve the out of the box user 
 experience. It also simplifies testing - less setup.




[jira] [Commented] (SOLR-3439) Add content field to example schema to make SolrCell easier to use out of the box

2012-05-06 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269238#comment-13269238
 ] 

Yonik Seeley commented on SOLR-3439:


I agree with adding a stored content field, but I don't think we should add
any more copyFields.
Performance is one of the biggest out-of-the-box factors on which people base
their decision, so we shouldn't make the example schema/config slower.




[jira] [Commented] (SOLR-3439) Add content field to example schema to make SolrCell easier to use out of the box

2012-05-06 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269244#comment-13269244
 ] 

Jack Krupansky commented on SOLR-3439:
--

We could have the copyFields default to being commented out, but then the 
content would not be searched by default. Or we could not index the content 
field, but then it can't be searched by itself.

For non-SolrCell applications, will copyField of the empty content field be a 
significant performance drag?

Or is it only the apps that use SolrCell where there are concerns about the 
copyField impact?

I agree that performance should be a consideration, but I suspect that these 
couple of copyFields (I'll post the preliminary patch as soon as the tests 
finish running) are small potatoes in the overall performance picture.





[jira] [Commented] (SOLR-3439) Add content field to example schema to make SolrCell easier to use out of the box

2012-05-06 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269245#comment-13269245
 ] 

Yonik Seeley commented on SOLR-3439:


bq. For non-SolrCell applications, will copyField of the empty content field 
be a significant performance drag?

No, but if it's used, it can be a big performance drag (indexing content 
twice).  I'm not sure how important it is to be searched by default... i.e. 
with edismax, someone would just need to add content to the qf parameter.
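
As an illustration of the edismax route mentioned above, the request
parameters might be assembled like this. The base URL is an assumption (a
local example Solr instance), the helper names are hypothetical, and the
sketch presumes that the content field is indexed.

```python
from urllib.parse import urlencode


def edismax_params(user_query, fields):
    """Build request parameters that search several fields via edismax."""
    return [
        ("q", user_query),
        ("defType", "edismax"),
        ("qf", " ".join(fields)),  # qf takes a space-separated field list
    ]


def select_url(base_url, params):
    """Append the encoded parameters to the /select handler of base_url."""
    return base_url + "/select?" + urlencode(params)
```

With the fields text and content, the qf parameter lists both, so no extra
copyField is needed for content to be searched.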




[jira] [Commented] (SOLR-3439) Add content field to example schema to make SolrCell easier to use out of the box

2012-05-06 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269247#comment-13269247
 ] 

Jack Krupansky commented on SOLR-3439:
--

Right, so if it is the double indexing that is a serious concern, maybe having 
content stored but not indexed is a reasonable compromise. It would be 
searchable due to the copyField but not double-indexed. This would still give a 
reasonably friendly out of the box experience (default search works and 
content is returned), and obviously they can hand-tune for more specific 
control.

But if content is stored but not indexed, the user can't simply add content 
to qf - they need to make it indexed, which is what my preliminary patch does.
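
The trade-off discussed here can be modelled with a toy index in which each
field is stored, indexed, or both, and a copyField copies the incoming value
to another field. This is a simplified model written for this thread, not
Solr code; ToySchemaIndex and its whitespace tokenization are assumptions
made for illustration.

```python
class ToySchemaIndex:
    """Toy model of stored vs. indexed fields plus copyField; not Solr code."""

    def __init__(self, fields, copy_fields):
        # fields: name -> {"stored": bool, "indexed": bool}
        # copy_fields: list of (source, dest) pairs
        self.fields = fields
        self.copy_fields = copy_fields
        self.stored = {}    # doc id -> {field: value}
        self.inverted = {}  # (field, term) -> set of doc ids

    def add(self, doc_id, doc):
        values = {name: [value] for name, value in doc.items()}
        # The copy works on the incoming value, whether or not the source
        # field itself is indexed or stored.
        for source, dest in self.copy_fields:
            if source in doc:
                values.setdefault(dest, []).append(doc[source])
        self.stored[doc_id] = {}
        for name, texts in values.items():
            spec = self.fields[name]
            if spec["stored"]:
                self.stored[doc_id][name] = " ".join(texts)
            if spec["indexed"]:
                for text in texts:
                    for term in text.lower().split():
                        self.inverted.setdefault((name, term), set()).add(doc_id)

    def search(self, field, term):
        return sorted(self.inverted.get((field, term.lower()), set()))
```

With content stored but not indexed and a copy from content to text, a
search on text finds the document and the content is returned, while a
search on content itself finds nothing. That is the compromise described
above, and also the reason adding content to qf would not work in that setup.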


 Add content field to example schema to make SolrCell easier to use out of 
 the box
 ---

 Key: SOLR-3439
 URL: https://issues.apache.org/jira/browse/SOLR-3439
 Project: Solr
  Issue Type: Improvement
  Components: contrib - Solr Cell (Tika extraction), Schema and 
 Analysis
Reporter: Jack Krupansky
Priority: Minor
 Fix For: 4.0

 Attachments: Lincoln-Gettysburg-Address.docx, 
 Lincoln-Gettysburg-Address.pdf


 Currently, SolrCell is configured to map Tika content (the main body of a 
 document) to the text field which is the indexed-only (not stored) 
 catch-all for default queries. That searches fine, but doesn't show the 
 document content in the results, sometimes leading users to think that 
 something is wrong. Sure, the user can easily add the field (and this is 
 documented), but it would be a better user experience to have such a basic 
 feature work right out of the box without any config editing and without the 
 need for the user to read the fine print in the documentation.
 I propose that we add the content field to the example schema in the 
 section of fields already defined to support SolrCell metadata. It would be 
 stored and indexed.
 I further propose that a copyField be added for the title, description, 
 (and maybe a couple of others) and content fields to add them to the text 
 field for searching. Again, trying to improve the out of the box user 
 experience. It also simplifies testing - less setup.




[jira] [Updated] (SOLR-3439) Add content field to example schema to make SolrCell easier to use out of the box

2012-05-06 Thread Jack Krupansky (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jack Krupansky updated SOLR-3439:
-

Attachment: SOLR-3439.patch

Preliminary patch. content is both stored and indexed, with multiple copy 
fields.

 Add content field to example schema to make SolrCell easier to use out of 
 the box
 ---

 Key: SOLR-3439
 URL: https://issues.apache.org/jira/browse/SOLR-3439
 Project: Solr
  Issue Type: Improvement
  Components: contrib - Solr Cell (Tika extraction), Schema and 
 Analysis
Reporter: Jack Krupansky
Priority: Minor
 Fix For: 4.0

 Attachments: Lincoln-Gettysburg-Address.docx, 
 Lincoln-Gettysburg-Address.pdf, SOLR-3439.patch


 Currently, SolrCell is configured to map Tika content (the main body of a 
 document) to the text field which is the indexed-only (not stored) 
 catch-all for default queries. That searches fine, but doesn't show the 
 document content in the results, sometimes leading users to think that 
 something is wrong. Sure, the user can easily add the field (and this is 
 documented), but it would be a better user experience to have such a basic 
 feature work right out of the box without any config editing and without the 
 need for the user to read the fine print in the documentation.
 I propose that we add the content field to the example schema in the 
 section of fields already defined to support SolrCell metadata. It would be 
 stored and indexed.
 I further propose that a copyField be added for the title, description, 
 (and maybe a couple of others) and content fields to add them to the text 
 field for searching. Again, trying to improve the out of the box user 
 experience. It also simplifies testing - less setup.




[jira] [Updated] (LUCENE-4034) improve functionquery tests, fix some minor bugs

2012-05-06 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4034:


Attachment: LUCENE-4034.patch

 improve functionquery tests, fix some minor bugs
 

 Key: LUCENE-4034
 URL: https://issues.apache.org/jira/browse/LUCENE-4034
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Robert Muir
 Attachments: LUCENE-4034.patch


 Currently functionqueries have basically no simple low-level tests.
 Found a few minor problems:
 * fix -1 summation (in case some, but not all segments are preflex): 
 TotalTermFreq/SumTotalTermFreq
 * fix omitTF case (due to LUCENE-2929, docsenum will return null if you ask 
 for freqs but the field is omitTF).
 * fix some indexedField vs field mixups
 * fix QueryUtils searcher-wrapping to also set the similarity the same as it 
 was on the original searcher.
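
 To illustrate the first item, here is a minimal sketch of summing a
 per-segment statistic when -1 marks a segment that cannot report it. The
 class and method names are hypothetical and this is not the code from the
 patch; it only assumes the convention that -1 means "not available".

```java
// Hypothetical helper, not part of Lucene: sums a per-segment statistic
// where -1 means "this segment cannot report the value".
final class SegmentStatSum {
    static final long UNKNOWN = -1L;

    // Returns the total across segments, or UNKNOWN if any segment
    // reports UNKNOWN. Naively adding the -1 values would instead
    // produce a total that is silently too small.
    static long sum(long[] perSegment) {
        long total = 0L;
        for (long value : perSegment) {
            if (value == UNKNOWN) {
                return UNKNOWN;
            }
            total += value;
        }
        return total;
    }
}
```

 Propagating the "unknown" marker keeps a partially upgraded index from
 reporting a plausible-looking but wrong total.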




[jira] [Created] (LUCENE-4034) improve functionquery tests, fix some minor bugs

2012-05-06 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-4034:
---

 Summary: improve functionquery tests, fix some minor bugs
 Key: LUCENE-4034
 URL: https://issues.apache.org/jira/browse/LUCENE-4034
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Robert Muir
 Attachments: LUCENE-4034.patch

Currently functionqueries have basically no simple low-level tests.

Found a few minor problems:
* fix -1 summation (in case some, but not all segments are preflex): 
TotalTermFreq/SumTotalTermFreq
* fix omitTF case (due to LUCENE-2929, docsenum will return null if you ask for 
freqs but the field is omitTF).
* fix some indexedField vs field mixups
* fix QueryUtils searcher-wrapping to also set the similarity the same as it 
was on the original searcher.





[jira] [Updated] (LUCENE-3505) BooleanScorer2.freq() doesnt work unless you call score() first.

2012-05-06 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3505:


Attachment: LUCENE-3505.patch

Patch brought up to trunk. Still doesn't have any tests.

 BooleanScorer2.freq() doesnt work unless you call score() first.
 

 Key: LUCENE-3505
 URL: https://issues.apache.org/jira/browse/LUCENE-3505
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-3505.patch, LUCENE-3505.patch


 It's 0; the freq() is then calculated as a side effect of score()... we should 
 at least document this or throw UOE for freq() instead.
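
 A minimal sketch of the "throw UOE" option, using a toy class rather than
 the real BooleanScorer2. All names here are hypothetical; the only thing
 taken from the issue is that the frequency is computed as a side effect of
 score().

```java
// Hypothetical toy scorer, not Lucene's BooleanScorer2: it only models
// a freq() value that is computed as a side effect of score().
final class ToyScorer {
    private final int[] matchingClauseFreqs;
    private int freq = -1; // stays -1 until score() has run

    ToyScorer(int[] matchingClauseFreqs) {
        this.matchingClauseFreqs = matchingClauseFreqs.clone();
    }

    // Computes the score and, as a side effect, caches the frequency.
    float score() {
        int sum = 0;
        for (int f : matchingClauseFreqs) {
            sum += f;
        }
        freq = sum;
        return (float) sum;
    }

    // Fails loudly instead of returning a misleading 0 when score()
    // has not been called yet.
    int freq() {
        if (freq < 0) {
            throw new UnsupportedOperationException(
                "freq() is only valid after score()");
        }
        return freq;
    }
}
```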




[jira] [Created] (LUCENE-4035) Collation via docvalues

2012-05-06 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-4035:
---

 Summary: Collation via docvalues
 Key: LUCENE-4035
 URL: https://issues.apache.org/jira/browse/LUCENE-4035
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Robert Muir


Currently collated sort is via an Analyzer into an indexedfield, which is 
uninverted in the fieldcache.

Instead we could support this with docvalues, and take advantage of future 
improvements like LUCENE-3729.
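
The general idea can be sketched with the JDK's java.text.Collator alone:
compute a binary collation key per value once, then sort by comparing the
key bytes. This is only an illustration under that assumption. The class
name is hypothetical and nothing here is the docvalues API or the code from
the patch.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Sketch only: sorts strings by precomputed collation key bytes.
final class CollationKeySort {

    // Unsigned lexicographic comparison of two byte arrays.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    static String[] sortByKey(String[] values, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        // Compute each key once, the way an index could store it per document.
        Map<String, byte[]> keys = new HashMap<>();
        for (String v : values) {
            keys.put(v, collator.getCollationKey(v).toByteArray());
        }
        String[] sorted = values.clone();
        Arrays.sort(sorted, (x, y) -> compareBytes(keys.get(x), keys.get(y)));
        return sorted;
    }
}
```

Once the key bytes are stored, sorting needs no Collator at query time, only
a byte comparison.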




[jira] [Updated] (LUCENE-4035) Collation via docvalues

2012-05-06 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4035:


Attachment: LUCENE-4035.patch

just a quick prototype patch... not happy about how the sort apis work with 
this (see LUCENE-4033)

 Collation via docvalues
 ---

 Key: LUCENE-4035
 URL: https://issues.apache.org/jira/browse/LUCENE-4035
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Robert Muir
 Attachments: LUCENE-4035.patch


 Currently collated sort is via an Analyzer into an indexedfield, which is 
 uninverted in the fieldcache.
 Instead we could support this with docvalues, and take advantage of future 
 improvements like LUCENE-3729.




Re: Annotation for run this test, but don't fail build if it fails ?

2012-05-06 Thread Dawid Weiss
So, I started thinking about it -- I can implement something that will
report failures (much like we do right now), but it's quite tricky to fit
it into the reporting system and continuous integration system. Here's
why -- if a test doesn't fail then its output (sysout/syserr) is not
currently printed (to provide a cleaner view of what's been executed).
Verbose log is on disk but it'd have to be scanned by hand (and copied
as a build artifact). Yet another problem is that jenkins wouldn't
_fail_ on such pseudo-failures because the set of JUnit statuses is
not extensible (it'd be something like FAILED+IGNORE) so we'd need to
either go with IGNORED, ASSUMPTION_IGNORED or SUCCESS, none of which
are a good match, really. ASSUMPTION_IGNORED status is probably most
convenient here because of how it can be technically propagated back
to JUnit.

Any ideas? Hoss -- how do you envision monitoring of these tests? Manually?

Dawid

 If we could leave these tests running on every build, then we could at least
 monitor the relative frequency of the failures -- ie: last week testFoo
 failed in 10% of the builds, this week it fails in every build, so somebody
 definitely broke something or last week testFoo failed in 10% of the
 builds, and after my attempted hardening it only fails in 5% of the builds
 so i may be on to something.

 what do folks think?

 -Hoss




[jira] [Resolved] (LUCENE-4034) improve functionquery tests, fix some minor bugs

2012-05-06 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-4034.
-

   Resolution: Fixed
Fix Version/s: 4.0

I committed this... if anyone has any comments (especially regarding omitTF, it 
seemed to me e.g. tf() should be consistent with what termquery does), let me 
know.

 improve functionquery tests, fix some minor bugs
 

 Key: LUCENE-4034
 URL: https://issues.apache.org/jira/browse/LUCENE-4034
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Robert Muir
 Fix For: 4.0

 Attachments: LUCENE-4034.patch


 Currently functionqueries have basically no simple low-level tests.
 Found a few minor problems:
 * fix -1 summation (in case some, but not all segments are preflex): 
 TotalTermFreq/SumTotalTermFreq
 * fix omitTF case (due to LUCENE-2929, docsenum will return null if you ask 
 for freqs but the field is omitTF).
 * fix some indexedField vs field mixups
 * fix QueryUtils searcher-wrapping to also set the similarity the same as it 
 was on the original searcher.




Re: Annotation for run this test, but don't fail build if it fails ?

2012-05-06 Thread Yonik Seeley
On Sun, May 6, 2012 at 2:39 PM, Dawid Weiss
dawid.we...@cs.put.poznan.pl wrote:
 Any ideas? Hoss -- how do you envision monitoring of these tests? Manually?

If the tests are run many times a day, it would be great to get a
daily report of the percent of time the tests pass.  Then if it goes
from 5% to 50%, we can go uh-oh...

The crux of the problem remains that (for solr devs) it's still much
more useful to have a test fail intermittently than to disable and not
run the test at all.

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference.
Boston May 7-10




Re: Annotation for run this test, but don't fail build if it fails ?

2012-05-06 Thread Dawid Weiss
 If the tests are run many times a day, it would be great to get a
 daily report of the percent of time the tests pass.  Then if it goes
 from 5% to 50%, we can go uh-oh...

Yeah, well... but this is beyond the runner as it aggregates over time
-- it looks like a jenkins plugin that would analyze test run logs and
provide such statistics. I also admit I've never seen anything like
this -- a suite of tests with an allowed failure ratio over time and a
threshold that would trigger a warning...
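
A rough sketch of what such a check could compute, assuming nothing more
than one pass/fail flag per build for a given test. The class, the method
names and the threshold parameter are all hypothetical; no Jenkins or JUnit
API is involved.

```java
// Hypothetical sketch: compares the failure ratio of a test across two
// periods and warns when the ratio rises by more than a threshold.
final class FailureRateMonitor {

    // Fraction of builds in which the test failed (0.0 for no builds).
    static double failureRate(boolean[] passedPerBuild) {
        if (passedPerBuild.length == 0) {
            return 0.0;
        }
        int failures = 0;
        for (boolean passed : passedPerBuild) {
            if (!passed) {
                failures++;
            }
        }
        return (double) failures / passedPerBuild.length;
    }

    // True if the failure rate rose by more than maxIncrease
    // (an absolute difference, e.g. 0.2 for 20 percentage points).
    static boolean shouldWarn(boolean[] previous, boolean[] current,
                              double maxIncrease) {
        return failureRate(current) - failureRate(previous) > maxIncrease;
    }
}
```

The aggregation over time is the part that sits outside the test runner;
the runner would only need to record one flag per test per build.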

 The crux of the problem remains that (for solr devs) it's still much
 more useful to have a test fail intermittently than to disable and not
 run the test at all.

These are weird tests if they allow for a (predictable?) failure from
time to time. I don't say it's a bad concept, but I think unit tests
may not be a good framework for handling this.

Dawid




Re: Annotation for run this test, but don't fail build if it fails ?

2012-05-06 Thread Yonik Seeley
On Sun, May 6, 2012 at 3:38 PM, Dawid Weiss
dawid.we...@cs.put.poznan.pl wrote:
 I also admit I've never seen anything like
 this -- a suite of tests with an allowed failure ratio over time and a
 threshold that would trigger a warning...

Not so much an allowed failure rate... more of "it fails sometimes
and no one has had the time to try to get it to pass a greater
percentage of the time."
And even when people put effort into getting it to pass more often, it's
still not 100%.

As those tests exist now, there are a few choices
a) turn them off (this is bad because it seriously decreases coverage)
b) somehow deal with the intermittent failures

Given that we're not running on a realtime system, the fact that many
higher level tests have timing and scheduling dependencies means that
we will never achieve a 100% pass rate on such tests.

 These are weird tests if they allow for a (predictable?) failure from
 time to time. I don't say it's a bad concept, but I think unit tests
 may not be a good framework for handling this.

Yeah, these aren't really unit tests.  Should we try to move them
somewhere else?  Or run them separately and email the results to a
different list?

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference.
Boston May 7-10




Jira doesn't know that 3.6 is released

2012-05-06 Thread Jack Krupansky
I was going to file a Jira on an issue with Solr 3.6, but I noticed that Jira 
still thinks that 3.6 is “Unreleased”.

-- Jack Krupansky

[jira] [Created] (SOLR-3441) Make ElisionFilterFactory MultiTermAware

2012-05-06 Thread Jack Krupansky (JIRA)
Jack Krupansky created SOLR-3441:


 Summary: Make ElisionFilterFactory MultiTermAware
 Key: SOLR-3441
 URL: https://issues.apache.org/jira/browse/SOLR-3441
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 3.6
Reporter: Jack Krupansky
Priority: Minor


The ElisionFilterFactory (which removes l' from l'avion) is not MultiTermAware 
- which includes release 3.6. I wanted to use a wildcard such as: (l'aub*).

Seems simple enough to address. I'll attach a patch.
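
To make the mismatch concrete: at index time l'avion is indexed as avion,
so a wildcard prefix that still carries the l' will presumably find no
terms unless the same stripping is applied to it. Below is a minimal sketch
of that stripping in plain Java. The class, the method and the article list
are assumptions for illustration; this is not Lucene's ElisionFilter or the
attached patch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: strips a leading elided article such as "l'"
// from a single term, leaving any wildcard characters untouched.
final class ElisionSketch {
    // Assumed article list, chosen only for this example.
    private static final Set<String> ARTICLES =
        new HashSet<>(Arrays.asList("l", "d", "j", "m", "n", "s", "t", "qu"));

    static String stripElision(String term) {
        int apostrophe = term.indexOf('\'');
        if (apostrophe > 0) {
            String prefix = term.substring(0, apostrophe).toLowerCase(Locale.ROOT);
            if (ARTICLES.contains(prefix)) {
                return term.substring(apostrophe + 1);
            }
        }
        return term;
    }
}
```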





[jira] [Updated] (SOLR-3441) Make ElisionFilterFactory MultiTermAware

2012-05-06 Thread Jack Krupansky (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jack Krupansky updated SOLR-3441:
-

Attachment: SOLR-3441.patch

Preliminary patch.

 Make ElisionFilterFactory MultiTermAware
 

 Key: SOLR-3441
 URL: https://issues.apache.org/jira/browse/SOLR-3441
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 3.6
Reporter: Jack Krupansky
Priority: Minor
 Attachments: SOLR-3441.patch


 The ElisionFilterFactory (which removes l' from l'avion) is not 
 MultiTermAware - which includes release 3.6. I wanted to use a wildcard such 
 as: (l'aub*).
 Seems simple enough to address. I'll attach a patch.




[jira] [Commented] (SOLR-3439) Add content field to example schema to make SolrCell easier to use out of the box

2012-05-06 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13269305#comment-13269305
 ] 

Jan Høydahl commented on SOLR-3439:
---

Really, the copyField thing in today's example schema is an *anti pattern* since 
we teach people to duplicate all their content while most people would be 
better off using DisMax. I have had several customers who build their whole 
search on the model from example schema and then get into performance problems 
due to the 2x index increase.

How would you feel if we instead get rid of *all* the copyFields and configure 
the default handler with defType=edismax&qf=name,features,manu,content 
Then we can leave a copyField section commented out in the schema with an 
explanation of what use cases it is good for.

 Add content field to example schema to make SolrCell easier to use out of 
 the box
 ---

 Key: SOLR-3439
 URL: https://issues.apache.org/jira/browse/SOLR-3439
 Project: Solr
  Issue Type: Improvement
  Components: contrib - Solr Cell (Tika extraction), Schema and 
 Analysis
Reporter: Jack Krupansky
Priority: Minor
 Fix For: 4.0

 Attachments: Lincoln-Gettysburg-Address.docx, 
 Lincoln-Gettysburg-Address.pdf, SOLR-3439.patch


 Currently, SolrCell is configured to map Tika content (the main body of a 
 document) to the text field which is the indexed-only (not stored) 
 catch-all for default queries. That searches fine, but doesn't show the 
 document content in the results, sometimes leading users to think that 
 something is wrong. Sure, the user can easily add the field (and this is 
 documented), but it would be a better user experience to have such a basic 
 feature work right out of the box without any config editing and without the 
 need for the user to read the fine print in the documentation.
 I propose that we add the content field to the example schema in the 
 section of fields already defined to support SolrCell metadata. It would be 
 stored and indexed.
 I further propose that a copyField be added for the title, description, 
 (and maybe a couple of others) and content fields to add them to the text 
 field for searching. Again, trying to improve the out of the box user 
 experience. It also simplifies testing - less setup.




[jira] [Resolved] (LUCENE-4024) FuzzyQuery should never do edit distance > 2

2012-05-06 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-4024.
-

Resolution: Fixed

 FuzzyQuery should never do edit distance > 2
 

 Key: LUCENE-4024
 URL: https://issues.apache.org/jira/browse/LUCENE-4024
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-4024.patch


 Edit distance 1 and 2 are now very very fast compared to 3.x (100X-200X 
 faster) ... but edit distance 3 will fall back to the super-slow "scan all 
 terms" in 3.x, which is not graceful degradation.
 Not sure how to fix it ... maybe we have a SlowFuzzyQuery?  And FuzzyQuery 
 throws exc if you try to ask it to be slow?  Or, we add boolean (off by 
 default) that you must turn on to allow slow one..?
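
 The "throws exc" option could look roughly like the guard below. This is
 a hypothetical sketch, not the committed change: the class name, the
 constant and the exception type are assumptions made for illustration.

```java
// Hypothetical guard, not the actual FuzzyQuery code: rejects edit
// distances that would require the slow scan over all terms.
final class FuzzyEditsGuard {
    static final int MAX_FAST_EDITS = 2;

    // Returns maxEdits unchanged if it lies in [0, MAX_FAST_EDITS],
    // otherwise throws IllegalArgumentException.
    static int checkMaxEdits(int maxEdits) {
        if (maxEdits < 0 || maxEdits > MAX_FAST_EDITS) {
            throw new IllegalArgumentException(
                "maxEdits must be between 0 and " + MAX_FAST_EDITS
                    + ", got " + maxEdits);
        }
        return maxEdits;
    }
}
```

 Failing fast at construction time makes the slow path an explicit opt-in
 (for example through a separate slow query class) rather than a silent
 fallback.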




Re: Why does ValueSource still implement 'Serializable' if Java serialization is out in Lucene 4.0?

2012-05-06 Thread Robert Muir
sounds like a relic. care to toss up a patch?

On Sun, May 6, 2012 at 7:11 PM, Benson Margulies bimargul...@gmail.com wrote:
 ?





-- 
lucidimagination.com




Anyone else interested in json serialization of queries

2012-05-06 Thread Benson Margulies
I'm pecking away at my idea of providing query serialization via
Jackson without adding annotations or setters to the Query classes. If
anyone else likes this idea well enough to pitch in, please let me
know and I'll unleash you on the github repo.




[jira] [Created] (LUCENE-4036) HaversineConstFunction ignores one of its two values, is this on purpose?

2012-05-06 Thread Benson Margulies (JIRA)
Benson Margulies created LUCENE-4036:


 Summary: HaversineConstFunction ignores one of its two values, is 
this on purpose?
 Key: LUCENE-4036
 URL: https://issues.apache.org/jira/browse/LUCENE-4036
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/other
Affects Versions: 4.0
Reporter: Benson Margulies


org.apache.solr.search.function.distance.HaversineConstFunction.parser.new 
ValueSourceParser() {...}.parse(FunctionQParser)

has an unused variable warning for 'vs2', and uses vs1 to initialize mv2. Maybe 
vs2 should just be deleted?
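
For context on why the second value would normally matter, here is a sketch
of the standard haversine great-circle distance, which takes two points.
Whether the Solr code intentionally ignores vs2 is exactly the open question
above. The class name and the degree-based signature are assumptions, not
Solr's HaversineConstFunction.

```java
// Sketch of the standard haversine great-circle distance.
final class HaversineSketch {
    // Distance between (lat1, lon1) and (lat2, lon2), given in degrees,
    // on a sphere of the given radius. Passing the same point twice
    // always yields 0.
    static double distance(double lat1, double lon1,
                           double lat2, double lon2, double radius) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double sinHalfDPhi = Math.sin((phi2 - phi1) / 2);
        double sinHalfDLambda = Math.sin(Math.toRadians(lon2 - lon1) / 2);
        double a = sinHalfDPhi * sinHalfDPhi
            + Math.cos(phi1) * Math.cos(phi2) * sinHalfDLambda * sinHalfDLambda;
        // Clamp to guard against rounding slightly above 1.0.
        return 2 * radius * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }
}
```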




[jira] [Created] (LUCENE-4037) ValueSource still implements Serializable

2012-05-06 Thread Benson Margulies (JIRA)
Benson Margulies created LUCENE-4037:


 Summary: ValueSource still implements Serializable
 Key: LUCENE-4037
 URL: https://issues.apache.org/jira/browse/LUCENE-4037
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/other
Affects Versions: 4.0
Reporter: Benson Margulies
Priority: Minor
 Attachments: LUCENE-4037.patch

4.0 eliminates the use of Serializable. Here's a leftover.




[jira] [Updated] (LUCENE-4037) ValueSource still implements Serializable

2012-05-06 Thread Benson Margulies (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benson Margulies updated LUCENE-4037:
-

Attachment: LUCENE-4037.patch

 ValueSource still implements Serializable
 -

 Key: LUCENE-4037
 URL: https://issues.apache.org/jira/browse/LUCENE-4037
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/other
Affects Versions: 4.0
Reporter: Benson Margulies
Priority: Minor
 Attachments: LUCENE-4037.patch


 4.0 eliminates the use of Serializable. Here's a leftover.




Re: Why does ValueSource still implement 'Serializable' if Java serialization is out in Lucene 4.0?

2012-05-06 Thread Benson Margulies
Done.

On Sun, May 6, 2012 at 7:13 PM, Robert Muir rcm...@gmail.com wrote:
 sounds like a relic. care to toss up a patch?

 On Sun, May 6, 2012 at 7:11 PM, Benson Margulies bimargul...@gmail.com 
 wrote:
 ?





 --
 lucidimagination.com




[jira] [Updated] (LUCENE-4037) ValueSource still implements Serializable

2012-05-06 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4037:


Attachment: LUCENE-4037.patch

Thanks Benson! I searched around and found some other cruft... some old 
readResolve()'s etc... here's the patch.

I'll commit soon.

 ValueSource still implements Serializable
 -

 Key: LUCENE-4037
 URL: https://issues.apache.org/jira/browse/LUCENE-4037
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/other
Affects Versions: 4.0
Reporter: Benson Margulies
Priority: Minor
 Attachments: LUCENE-4037.patch, LUCENE-4037.patch


 4.0 eliminates the use of Serializable. Here's a leftover.




[jira] [Resolved] (LUCENE-4037) ValueSource still implements Serializable

2012-05-06 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-4037.
-

   Resolution: Fixed
Fix Version/s: 4.0

 ValueSource still implements Serializable
 -

 Key: LUCENE-4037
 URL: https://issues.apache.org/jira/browse/LUCENE-4037
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/other
Affects Versions: 4.0
Reporter: Benson Margulies
Priority: Minor
 Fix For: 4.0

 Attachments: LUCENE-4037.patch, LUCENE-4037.patch


 4.0 eliminates the use of Serializable. Here's a leftover.




[jira] [Commented] (SOLR-3296) Explore alternatives to Commons CSV

2012-05-06 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13269324#comment-13269324
 ] 

Chris Male commented on SOLR-3296:
--

After some research (thanks Steven), it seems the likely cause of the failure 
is that their repositories time out after some period if they aren't synced to 
the central repository.  Because I submitted the bundle on a Friday, it perhaps 
didn't get looked into until too late.  

So I've resubmitted the bundle (on a Monday now), fingers crossed.

 Explore alternatives to Commons CSV
 ---

 Key: SOLR-3296
 URL: https://issues.apache.org/jira/browse/SOLR-3296
 Project: Solr
  Issue Type: Improvement
  Components: Build
Reporter: Chris Male
 Attachments: SOLR-3295-CSV-tests.patch, SOLR-3296_noggit.patch, 
 pom.xml, pom.xml


 In LUCENE-3930 we're implementing some less than ideal solutions to make 
 available the unreleased version of commons-csv.  We could remove these 
 solutions if we didn't rely on this lib.  So I think we should explore 
 alternatives. 
 I think [opencsv|http://opencsv.sourceforge.net/] is an alternative to 
 consider, I've used it in many commercial projects.  Bizarrely Commons-CSV's 
 website says that Opencsv uses a BSD license, but this isn't the case, 
 OpenCSV uses ASL2.




[jira] [Commented] (SOLR-3439) Add content field to example schema to make SolrCell easier to use out of the box

2012-05-06 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269326#comment-13269326
 ] 

Jack Krupansky commented on SOLR-3439:
--

Using copyField is implicitly a judgment that a query of the merged 
fields is significantly better than a dismax query of the separate fields. 
But is that really the case?

It is also common to boost individual document components, such as the 
title, differently, which a single merged field cannot do.

That said, I am a little reluctant to change the overall pattern/approach 
simply to add one field. Maybe the pattern change should be a separate issue.


 Add content field to example schema to make SolrCell easier to use out of 
 the box
 ---

 Key: SOLR-3439
 URL: https://issues.apache.org/jira/browse/SOLR-3439
 Project: Solr
  Issue Type: Improvement
  Components: contrib - Solr Cell (Tika extraction), Schema and 
 Analysis
Reporter: Jack Krupansky
Priority: Minor
 Fix For: 4.0

 Attachments: Lincoln-Gettysburg-Address.docx, 
 Lincoln-Gettysburg-Address.pdf, SOLR-3439.patch


 Currently, SolrCell is configured to map Tika content (the main body of a 
 document) to the text field, which is the indexed-only (not stored) 
 catch-all for default queries. That searches fine, but doesn't show the 
 document content in the results, sometimes leading users to think that 
 something is wrong. Sure, the user can easily add the field (and this is 
 documented), but it would be a better user experience to have such a basic 
 feature work right out of the box without any config editing and without the 
 need for the user to read the fine print in the documentation.
 I propose that we add the content field to the example schema in the 
 section of fields already defined to support SolrCell metadata. It would be 
 stored and indexed.
 I further propose that copyField directives be added to copy the title, 
 description, and content fields (and maybe a couple of others) into the text 
 field for searching. Again, this is about improving the out-of-the-box user 
 experience. It also simplifies testing: less setup.
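
 As a rough sketch only, the proposal above might look something like this in 
 the example schema.xml. The text_general field type and the multiValued 
 setting are assumptions made for illustration; they are not taken from the 
 attached SOLR-3439.patch.

```xml
<!-- Hypothetical sketch: store and index the main body text extracted by Tika,
     so it can be both searched and returned in results. -->
<field name="content" type="text_general" indexed="true" stored="true" multiValued="true"/>

<!-- Copy the SolrCell fields into the catch-all "text" field used for default queries. -->
<copyField source="title" dest="text"/>
<copyField source="description" dest="text"/>
<copyField source="content" dest="text"/>
```

 With a fragment along these lines, content would be returned in results while 
 the text field would remain indexed-only.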




[jira] [Commented] (LUCENE-3948) Experiment with placing poms outside of src

2012-05-06 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269330#comment-13269330
 ] 

Chris Male commented on LUCENE-3948:


+1 to committing

 Experiment with placing poms outside of src
 ---

 Key: LUCENE-3948
 URL: https://issues.apache.org/jira/browse/LUCENE-3948
 Project: Lucene - Java
  Issue Type: Improvement
  Components: general/build
Reporter: Chris Male
Assignee: Steven Rowe
Priority: Minor
 Attachments: LUCENE-3948.patch, LUCENE-3948.patch, LUCENE-3948.patch, 
 LUCENE-3948.patch, LUCENE-3948.patch


 Recent work in LUCENE-3944 has changed how our generated pom.xml files are 
 handled during release preparation, placing them in build/ instead.  However 
 get-maven-poms still places the poms inside src/ so you can use them to drive 
 a build.  What I think would be ideal is if we could unify the release 
 handling of the poms and the normal building handling, so that the poms can 
 sit outside of src and serve both purposes.  
 Some time ago I investigated how the ANT project handles its own Maven 
 integration and it has its poms sitting in their own directory.  They then 
 reference the actual src locations inside the poms.  This works for ANT, but 
 with a caveat: some of their tests don't work due to how the Maven 
 surefire plugin works, so they skip their tests.
 I have done some quick testing of my own and this process does seem to work 
 for our poms and tests.  I now want to take this to a full-scale POC and see 
 if it works fully.
