[jira] [Updated] (LUCENE-3261) Faceting module userguide

2011-06-29 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-3261:
---

Attachment: facet-userguide.pdf

Attaching the userguide from LUCENE-3079.

> Faceting module userguide
> -
>
> Key: LUCENE-3261
> URL: https://issues.apache.org/jira/browse/LUCENE-3261
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Shai Erera
>Assignee: Shai Erera
>Priority: Minor
> Attachments: facet-userguide.pdf
>
>
> In LUCENE-3079 I've uploaded a userguide for the faceting module. I'd like to 
> discuss where the best place is to include it. We include it with the 
> code (in our SVN), so that it's always attached to some branch (or in other 
> words, a release). That way we can have a version of it per release that 
> reflects API changes.
> This document is like the file format document, or any other document we put 
> under site-versioned. So we have two possible places:
> * facet/docs
> * site/userguides
> Unlike the site, whose PDFs are built automatically by Forrest, we cannot 
> convert ODT to PDF with Forrest, so putting it there is a challenge. What we do 
> today (in our SVN) is that whoever updates the userguide creates a PDF too, 
> which is easy from OpenOffice.
> I'll upload the file later when I'm in front of it.




[jira] [Commented] (LUCENE-3241) Remove Lucene core's FunctionQuery impls

2011-06-29 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057635#comment-13057635
 ] 

Chris Male commented on LUCENE-3241:


I will re-evaluate the tests and port what I can.

> Remove Lucene core's FunctionQuery impls
> 
>
> Key: LUCENE-3241
> URL: https://issues.apache.org/jira/browse/LUCENE-3241
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Reporter: Chris Male
>Assignee: Chris Male
> Fix For: 4.0
>
> Attachments: LUCENE-3241.patch
>
>
> As part of the consolidation of FunctionQuerys, we want to remove Lucene 
> core's impls.  Included in this work, we will make sure that all the 
> functionality provided by the core impls is also provided by the new module.  
> Any tests will be ported across too, to increase the test coverage.




[jira] [Issue Comment Edited] (LUCENE-3079) Faceting module

2011-06-29 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057599#comment-13057599
 ] 

Shai Erera edited comment on LUCENE-3079 at 6/30/11 5:55 AM:
-

I opened LUCENE-3261 and LUCENE-3262 to track userguide + benchmark issues. 
Will update more as we go along.

bq. I think we should close this issue

+1.

We can now say Lucene has a faceting module! Perhaps we should advertise it on 
the user-list?

Great job at porting to trunk, Robert!

  was (Author: shaie):
I opened LUCENE-3261 and LUCENE-3260 to track userguide + benchmark issues. 
Will update more as we go along.

bq. I think we should close this issue

+1.

We can now say Lucene has a faceting module! Perhaps we should advertise it on 
the user-list?

Great job at porting to trunk, Robert!
  
> Faceting module
> ---
>
> Key: LUCENE-3079
> URL: https://issues.apache.org/jira/browse/LUCENE-3079
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Michael McCandless
>Assignee: Shai Erera
> Fix For: 3.4, 4.0
>
> Attachments: LUCENE-3079-dev-tools.patch, LUCENE-3079.patch, 
> LUCENE-3079.patch, LUCENE-3079.patch, LUCENE-3079.patch, 
> LUCENE-3079_4x.patch, LUCENE-3079_4x_broken.patch, TestPerformanceHack.java, 
> facet-userguide.pdf
>
>
> Faceting is a hugely important feature, available in Solr today but
> not [easily] usable by Lucene-only apps.
> We should fix this, by creating a shared faceting module.
> Ideally, we factor out Solr's faceting impl, and maybe poach/merge
> from other impls (eg Bobo browse).
> Hoss describes some important challenges we'll face in doing this
> (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here:
> {noformat}
> To look at "faceting" as a concrete example, there are big reasons 
> faceting works so well in Solr: Solr has total control over the 
> index, knows exactly when the index has changed so it can rebuild caches, has a 
> strict schema so it can make sense of field types and 
> pick faceting algos accordingly, has a multi-phase distributed search 
> approach to get exact counts efficiently across multiple shards, etc...
> (and there are still a lot of additional enhancements and improvements 
> that can be made to take even more advantage of the knowledge Solr has because 
> it "owns" the index, that no one has had time to tackle)
> {noformat}
> This is a great list of the things we face in refactoring.  It's also
> important because, if Solr needed to be so deeply intertwined with
> caching, schema, etc., other apps that want to facet will have the
> same "needs" and so we really have to address them in creating the
> shared module.
> I think we should get a basic faceting module started, but should not
> cut Solr over at first.  We should iterate on the module, fold in
> improvements, etc., and then, once we can fully verify that cutting
> over doesn't hurt Solr (ie lose functionality or performance) we can
> later cutover.




[jira] [Updated] (SOLR-2625) NPE in Term Vector Components

2011-06-29 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated SOLR-2625:
--

Affects Version/s: 3.3

> NPE in Term Vector Components
> -
>
> Key: SOLR-2625
> URL: https://issues.apache.org/jira/browse/SOLR-2625
> Project: Solr
>  Issue Type: Bug
>  Components: SearchComponents - other
>Affects Versions: 3.2, 3.3
>Reporter: Daniel Erenrich
>Assignee: Simon Willnauer
>Priority: Minor
> Fix For: 3.4
>
> Attachments: SOLR-2625.patch, SOLR-2625.patch
>
>
> It seems this bug was first noted here 
> http://lucene.472066.n3.nabble.com/NullPointerException-with-TermVectorComponent-td504361.html
> It still is present in the current version.
> tv.tf_idf=true -> NPE and tv.all=true
> The query: tv.tf_idf=true&q=user:39699693
> The error: 
> HTTP ERROR 500
> Problem accessing /solr/select/tvrh/. Reason:
> null
> java.lang.NullPointerException
>   at 
> org.apache.solr.handler.component.TermVectorComponent$TVMapper.getDocFreq(TermVectorComponent.java:337)
>   at 
> org.apache.solr.handler.component.TermVectorComponent$TVMapper.map(TermVectorComponent.java:330)
>   at 
> org.apache.lucene.index.TermVectorsReader.readTermVector(TermVectorsReader.java:513)
>   at 
> org.apache.lucene.index.TermVectorsReader.readTermVectors(TermVectorsReader.java:396)
>   at 
> org.apache.lucene.index.TermVectorsReader.get(TermVectorsReader.java:373)
>   at 
> org.apache.lucene.index.SegmentReader.getTermFreqVector(SegmentReader.java:786)
>   at 
> org.apache.lucene.index.DirectoryReader.getTermFreqVector(DirectoryReader.java:525)
>   at 
> org.apache.solr.search.SolrIndexReader.getTermFreqVector(SolrIndexReader.java:245)
>   at 
> org.apache.solr.handler.component.TermVectorComponent.process(TermVectorComponent.java:225)
>   at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>   at 
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>   at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>   at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>   at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>   at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>   at 
> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>   at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>   at org.mortbay.jetty.Server.handle(Server.java:326)
>   at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>   at 
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>   at 
> org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
>   at 
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> It works just fine if I do: tv.all=true&q=user:39699693




[jira] [Resolved] (SOLR-2625) NPE in Term Vector Components

2011-06-29 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved SOLR-2625.
---

   Resolution: Fixed
Fix Version/s: 3.4

> NPE in Term Vector Components
> -
>
> Key: SOLR-2625
> URL: https://issues.apache.org/jira/browse/SOLR-2625
> Project: Solr
>  Issue Type: Bug
>  Components: SearchComponents - other
>Affects Versions: 3.2
>Reporter: Daniel Erenrich
>Assignee: Simon Willnauer
>Priority: Minor
> Fix For: 3.4
>
> Attachments: SOLR-2625.patch, SOLR-2625.patch
>
>
> It seems this bug was first noted here 
> http://lucene.472066.n3.nabble.com/NullPointerException-with-TermVectorComponent-td504361.html
> It still is present in the current version.
> tv.tf_idf=true -> NPE and tv.all=true
> The query: tv.tf_idf=true&q=user:39699693
> The error: 
> HTTP ERROR 500
> Problem accessing /solr/select/tvrh/. Reason:
> null
> java.lang.NullPointerException
>   at 
> org.apache.solr.handler.component.TermVectorComponent$TVMapper.getDocFreq(TermVectorComponent.java:337)
>   at 
> org.apache.solr.handler.component.TermVectorComponent$TVMapper.map(TermVectorComponent.java:330)
>   at 
> org.apache.lucene.index.TermVectorsReader.readTermVector(TermVectorsReader.java:513)
>   at 
> org.apache.lucene.index.TermVectorsReader.readTermVectors(TermVectorsReader.java:396)
>   at 
> org.apache.lucene.index.TermVectorsReader.get(TermVectorsReader.java:373)
>   at 
> org.apache.lucene.index.SegmentReader.getTermFreqVector(SegmentReader.java:786)
>   at 
> org.apache.lucene.index.DirectoryReader.getTermFreqVector(DirectoryReader.java:525)
>   at 
> org.apache.solr.search.SolrIndexReader.getTermFreqVector(SolrIndexReader.java:245)
>   at 
> org.apache.solr.handler.component.TermVectorComponent.process(TermVectorComponent.java:225)
>   at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>   at 
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>   at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>   at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>   at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>   at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>   at 
> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>   at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>   at org.mortbay.jetty.Server.handle(Server.java:326)
>   at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>   at 
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>   at 
> org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
>   at 
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> It works just fine if I do: tv.all=true&q=user:39699693




[jira] [Commented] (LUCENE-3241) Remove Lucene core's FunctionQuery impls

2011-06-29 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057624#comment-13057624
 ] 

Robert Muir commented on LUCENE-3241:
-

+1 to nuke and remove the duplication, and add stuff to migrate.txt (like the 
analyzers) saying such and such has moved here.

I'm confused about the tests... I see more tests in lucene-core under the 
function package than in the queries module? But I didn't look hard... I just want 
to make sure we don't lose anything here.


> Remove Lucene core's FunctionQuery impls
> 
>
> Key: LUCENE-3241
> URL: https://issues.apache.org/jira/browse/LUCENE-3241
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Reporter: Chris Male
>Assignee: Chris Male
> Fix For: 4.0
>
> Attachments: LUCENE-3241.patch
>
>
> As part of the consolidation of FunctionQuerys, we want to remove Lucene 
> core's impls.  Included in this work, we will make sure that all the 
> functionality provided by the core impls is also provided by the new module.  
> Any tests will be ported across too, to increase the test coverage.




[jira] [Updated] (SOLR-2625) NPE in Term Vector Components

2011-06-29 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated SOLR-2625:
--

Attachment: SOLR-2625.patch

Final patch. I added a randomized test to trunk, merged it to 3.x, and fixed the 
NPE in 3.x.
Trunk doesn't have this problem since the problematic code was removed earlier.

I will commit in a bit.

Thanks, Daniel.

> NPE in Term Vector Components
> -
>
> Key: SOLR-2625
> URL: https://issues.apache.org/jira/browse/SOLR-2625
> Project: Solr
>  Issue Type: Bug
>  Components: SearchComponents - other
>Affects Versions: 3.2
>Reporter: Daniel Erenrich
>Assignee: Simon Willnauer
>Priority: Minor
> Attachments: SOLR-2625.patch, SOLR-2625.patch
>
>
> It seems this bug was first noted here 
> http://lucene.472066.n3.nabble.com/NullPointerException-with-TermVectorComponent-td504361.html
> It still is present in the current version.
> tv.tf_idf=true -> NPE and tv.all=true
> The query: tv.tf_idf=true&q=user:39699693
> The error: 
> HTTP ERROR 500
> Problem accessing /solr/select/tvrh/. Reason:
> null
> java.lang.NullPointerException
>   at 
> org.apache.solr.handler.component.TermVectorComponent$TVMapper.getDocFreq(TermVectorComponent.java:337)
>   at 
> org.apache.solr.handler.component.TermVectorComponent$TVMapper.map(TermVectorComponent.java:330)
>   at 
> org.apache.lucene.index.TermVectorsReader.readTermVector(TermVectorsReader.java:513)
>   at 
> org.apache.lucene.index.TermVectorsReader.readTermVectors(TermVectorsReader.java:396)
>   at 
> org.apache.lucene.index.TermVectorsReader.get(TermVectorsReader.java:373)
>   at 
> org.apache.lucene.index.SegmentReader.getTermFreqVector(SegmentReader.java:786)
>   at 
> org.apache.lucene.index.DirectoryReader.getTermFreqVector(DirectoryReader.java:525)
>   at 
> org.apache.solr.search.SolrIndexReader.getTermFreqVector(SolrIndexReader.java:245)
>   at 
> org.apache.solr.handler.component.TermVectorComponent.process(TermVectorComponent.java:225)
>   at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>   at 
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>   at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>   at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>   at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>   at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>   at 
> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>   at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>   at org.mortbay.jetty.Server.handle(Server.java:326)
>   at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>   at 
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>   at 
> org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
>   at 
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> It works just fine if I do: tv.all=true&q=user:39699693




[jira] [Assigned] (LUCENE-3241) Remove Lucene core's FunctionQuery impls

2011-06-29 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male reassigned LUCENE-3241:
--

Assignee: Chris Male

> Remove Lucene core's FunctionQuery impls
> 
>
> Key: LUCENE-3241
> URL: https://issues.apache.org/jira/browse/LUCENE-3241
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Reporter: Chris Male
>Assignee: Chris Male
> Fix For: 4.0
>
> Attachments: LUCENE-3241.patch
>
>
> As part of the consolidation of FunctionQuerys, we want to remove Lucene 
> core's impls.  Included in this work, we will make sure that all the 
> functionality provided by the core impls is also provided by the new module.  
> Any tests will be ported across too, to increase the test coverage.




[jira] [Commented] (LUCENE-3241) Remove Lucene core's FunctionQuery impls

2011-06-29 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057620#comment-13057620
 ] 

Chris Male commented on LUCENE-3241:


Command for patch:

{code}
svn move 
lucene/src/java/org/apache/lucene/search/function/NumericIndexDocValueSource.java
 modules/queries/src/java/org/apache/lucene/queries/function/valuesource/
{code}

> Remove Lucene core's FunctionQuery impls
> 
>
> Key: LUCENE-3241
> URL: https://issues.apache.org/jira/browse/LUCENE-3241
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Reporter: Chris Male
> Fix For: 4.0
>
> Attachments: LUCENE-3241.patch
>
>
> As part of the consolidation of FunctionQuerys, we want to remove Lucene 
> core's impls.  Included in this work, we will make sure that all the 
> functionality provided by the core impls is also provided by the new module.  
> Any tests will be ported across too, to increase the test coverage.




[jira] [Updated] (LUCENE-3241) Remove Lucene core's FunctionQuery impls

2011-06-29 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-3241:
---

Attachment: LUCENE-3241.patch

Patch that deprecates the contents of org.apache.lucene.search.function.

I've gone down this road, instead of straight-out nuking, since they're in 
core.  If people don't feel this is necessary, I'll happily remove them.

The tests for the package do not add anything, so I haven't moved them.

The patch also ports NumericIndexDocValueSource to the Queries module.

> Remove Lucene core's FunctionQuery impls
> 
>
> Key: LUCENE-3241
> URL: https://issues.apache.org/jira/browse/LUCENE-3241
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Reporter: Chris Male
> Fix For: 4.0
>
> Attachments: LUCENE-3241.patch
>
>
> As part of the consolidation of FunctionQuerys, we want to remove Lucene 
> core's impls.  Included in this work, we will make sure that all the 
> functionality provided by the core impls is also provided by the new module.  
> Any tests will be ported across too, to increase the test coverage.




[jira] [Updated] (LUCENE-3264) crank up faceting module tests

2011-06-29 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3264:


Attachment: LUCENE-3264.patch

> crank up faceting module tests
> --
>
> Key: LUCENE-3264
> URL: https://issues.apache.org/jira/browse/LUCENE-3264
> Project: Lucene - Java
>  Issue Type: Test
>  Components: modules/facet
>Reporter: Robert Muir
>Assignee: Robert Muir
> Fix For: 3.4, 4.0
>
> Attachments: LUCENE-3264.patch
>
>
> The faceting module has a large set of good tests.
> Let's switch them over to use all of our test infra (RandomIndexWriter, random 
> IndexWriterConfig, MockAnalyzer, newDirectory, ...).
> I don't want to address multipliers, atLeast(), etc. on this issue; I think 
> we should follow up with that on a separate issue that also looks at speed 
> and making sure the nightly build is exhaustive.
> For now, let's just get the coverage in; it will be good to do before any 
> refactoring.




[jira] [Created] (LUCENE-3264) crank up faceting module tests

2011-06-29 Thread Robert Muir (JIRA)
crank up faceting module tests
--

 Key: LUCENE-3264
 URL: https://issues.apache.org/jira/browse/LUCENE-3264
 Project: Lucene - Java
  Issue Type: Test
  Components: modules/facet
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 3.4, 4.0


The faceting module has a large set of good tests.

Let's switch them over to use all of our test infra (RandomIndexWriter, random 
IndexWriterConfig, MockAnalyzer, newDirectory, ...).
I don't want to address multipliers, atLeast(), etc. on this issue; I think we 
should follow up with that on a separate issue that also looks at speed and 
making sure the nightly build is exhaustive.

For now, let's just get the coverage in; it will be good to do before any 
refactoring.
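
For illustration only (this is not the attached patch): a minimal sketch of the setup 
pattern a converted test could use, assuming a LuceneTestCase subclass and the test-infra 
helpers named above; the class name TestFacetsWithRandomInfra is made up.

{code}
import org.apache.lucene.analysis.MockAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.RandomIndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.LuceneTestCase;

public class TestFacetsWithRandomInfra extends LuceneTestCase {

  public void testRandomizedSetup() throws Exception {
    Directory dir = newDirectory();  // randomized Directory implementation
    RandomIndexWriter writer = new RandomIndexWriter(random, dir,
        newIndexWriterConfig(TEST_VERSION_CURRENT, new MockAnalyzer(random))); // random IW config
    writer.addDocument(new Document()); // facet-specific fields would be added here
    IndexReader reader = writer.getReader(); // reader over the randomized writer
    // ... run the existing facet assertions against the reader ...
    reader.close();
    writer.close();
    dir.close();
  }
}
{code}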





[jira] [Updated] (LUCENE-3242) Rename Lucene common-build.xml project name

2011-06-29 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-3242:
---

Attachment: LUCENE-3242.patch

Patch which changes the common-build.xml (Lucene's) project name to common-lucene. 
I also updated common.dir and common.build.dir to common-lucene.dir and 
common-lucene.build.dir.

If someone could give this a shakedown, that'd be great.


> Rename Lucene common-build.xml project name
> ---
>
> Key: LUCENE-3242
> URL: https://issues.apache.org/jira/browse/LUCENE-3242
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: general/build
>Reporter: Chris Male
>Priority: Minor
> Attachments: LUCENE-3242.patch
>
>
> While adding the new common module, I ran into a name collision with Lucene's 
> common-build.xml project name.  I've since renamed the common module's 
> project name to be clearer, but I think we should rename common-build's 
> as well.  Solr's common-build.xml uses common-solr, so let's rename Lucene's 
> to common-lucene.




[jira] [Created] (LUCENE-3263) Create Build-A-Module process

2011-06-29 Thread Chris Male (JIRA)
Create Build-A-Module process
-

 Key: LUCENE-3263
 URL: https://issues.apache.org/jira/browse/LUCENE-3263
 Project: Lucene - Java
  Issue Type: New Feature
  Components: general/build
Reporter: Chris Male
Priority: Minor


Over the last few weeks, we've had a number of modules made.  This process 
seems only likely to continue with the potential for modules within modules as 
well.

When creating a module, there is usually a consistent series of steps that need 
to be done.  For me these are:

- Create module directory
- Add LICENSE.txt and NOTICE.txt
- Create build.xml with dependencies on other modules (if there are any)
- Update parent build.xml (in case of modules)
- Add java and test directories to dev-tools/eclipse/dot.classpath 
- Create module directory in dev-tools/idea
- Add .iml to dev-tools/idea/path/to/module
- Add module to dev-tools/idea/.idea/modules.xml
- Add module to dev-tools/idea/.idea/workspace.xml
- Create module directory in dev-tools/maven
- Add pom.xml.template to dev-tools/maven/path/to/module

I think we can create a script which, given some basic information, can 
complete the majority of the above tasks.  Of course, if the module requires 
some custom build targets or dependencies, then human involvement will be 
required afterwards.  But at the very least, it'll reduce the effort required 
to make a new module and lower the risk of a step being missed (which has 
happened to me a few times).

We can also use this as a chance to build in any verification of the 
configurations, so people can feel more comfortable using them. 




[jira] [Updated] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2011-06-29 Thread Mike Sokolov (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Sokolov updated LUCENE-2878:
-

Attachment: PosHighlighter.patch

Attaching a patch with a simple highlighter  using the positions() iterators.  
Includes a PositionTreeIterator for pulling out leaf positions.  The names are 
getting a bit long: I almost wrote PositionIntervalIteratorTree?  Maybe a 
PositionIntervalIterator could just be a Positions?  The variables are all 
called positions... 


> Allow Scorer to expose positions and payloads aka. nuke spans 
> --
>
> Key: LUCENE-2878
> URL: https://issues.apache.org/jira/browse/LUCENE-2878
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: Bulk Postings branch
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
>  Labels: gsoc2011, lucene-gsoc-11, mentor
> Attachments: LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
> LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, 
> PosHighlighter.patch
>
>
> Currently we have two somewhat separate types of queries: the ones which can 
> make use of positions (mainly spans) and payloads (spans), and the rest. Yet Span*Query 
> doesn't really do scoring comparable to what other queries do, and at the end 
> of the day they duplicate a lot of code all over Lucene. Span*Queries are 
> also limited to other Span*Query instances, such that you cannot use a 
> TermQuery or a BooleanQuery with SpanNear or anything like that. 
> Besides the Span*Query limitation, other queries lack a quite interesting 
> feature: they cannot score based on term proximity, since scorers don't 
> expose any positional information. All those problems bugged me for a while 
> now, so I started working on that using the bulkpostings API. I would have done 
> that first cut on trunk, but TermScorer there works on BlockReaders that do not 
> expose positions, while the one in this branch does. I started adding a new 
> Positions class which users can pull from a scorer; to prevent unnecessary 
> positions enums I added ScorerContext#needsPositions and eventually 
> Scorer#needsPayloads to create the corresponding enum on demand. Yet, 
> currently only TermQuery / TermScorer implements this API and others simply 
> return null instead. 
> To show that the API really works and our BulkPostings work fine with 
> positions too, I cut over TermSpanQuery to use a TermScorer under the hood and 
> nuked TermSpans entirely. A nice side effect of this was that the Position 
> BulkReading implementation got some exercise, which now :) all works with 
> positions, while Payloads for bulk reading are kind of experimental in the 
> patch and only work with the Standard codec. 
> So all spans now work on top of TermScorer (I truly hate spans since today), 
> including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
> to implement the other codecs yet since I want to get feedback on the API and 
> on this first cut before I go on with it. I will upload the corresponding 
> patch in a minute. 
> I also had to cut over SpanQuery.getSpans(IR) to 
> SpanQuery.getSpans(AtomicReaderContext), which I should probably do on trunk 
> first, but after that pain today I need a break first :).
> The patch passes all core tests 
> (org.apache.lucene.search.highlight.HighlighterTest still fails, but I didn't 
> look into the MemoryIndex BulkPostings API yet).




[jira] [Commented] (LUCENE-3079) Faceting module

2011-06-29 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057599#comment-13057599
 ] 

Shai Erera commented on LUCENE-3079:


I opened LUCENE-3261 and LUCENE-3260 to track userguide + benchmark issues. 
Will update more as we go along.

bq. I think we should close this issue

+1.

We can now say Lucene has a faceting module! Perhaps we should advertise it on 
the user-list?

Great job at porting to trunk, Robert!

> Faceting module
> ---
>
> Key: LUCENE-3079
> URL: https://issues.apache.org/jira/browse/LUCENE-3079
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Michael McCandless
>Assignee: Shai Erera
> Fix For: 3.4, 4.0
>
> Attachments: LUCENE-3079-dev-tools.patch, LUCENE-3079.patch, 
> LUCENE-3079.patch, LUCENE-3079.patch, LUCENE-3079.patch, 
> LUCENE-3079_4x.patch, LUCENE-3079_4x_broken.patch, TestPerformanceHack.java, 
> facet-userguide.pdf
>
>
> Faceting is a hugely important feature, available in Solr today but
> not [easily] usable by Lucene-only apps.
> We should fix this, by creating a shared faceting module.
> Ideally, we factor out Solr's faceting impl, and maybe poach/merge
> from other impls (eg Bobo browse).
> Hoss describes some important challenges we'll face in doing this
> (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here:
> {noformat}
> To look at "faceting" as a concrete example, there are big reasons 
> faceting works so well in Solr: Solr has total control over the 
> index, knows exactly when the index has changed so it can rebuild caches, has a 
> strict schema so it can make sense of field types and 
> pick faceting algos accordingly, has a multi-phase distributed search 
> approach to get exact counts efficiently across multiple shards, etc...
> (and there are still a lot of additional enhancements and improvements 
> that can be made to take even more advantage of the knowledge Solr has because 
> it "owns" the index, that no one has had time to tackle)
> {noformat}
> This is a great list of the things we face in refactoring.  It's also
> important because, if Solr needed to be so deeply intertwined with
> caching, schema, etc., other apps that want to facet will have the
> same "needs" and so we really have to address them in creating the
> shared module.
> I think we should get a basic faceting module started, but should not
> cut Solr over at first.  We should iterate on the module, fold in
> improvements, etc., and then, once we can fully verify that cutting
> over doesn't hurt Solr (ie lose functionality or performance) we can
> later cutover.




[jira] [Created] (LUCENE-3262) Facet benchmarking

2011-06-29 Thread Shai Erera (JIRA)
Facet benchmarking
--

 Key: LUCENE-3262
 URL: https://issues.apache.org/jira/browse/LUCENE-3262
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/benchmark, modules/facet
Reporter: Shai Erera


A spin-off from LUCENE-3079. We should define a few benchmarks for faceting 
scenarios, so we can evaluate the new faceting module as well as any 
improvement we'd like to consider in the future (such as cutting over to 
docvalues, implementing FST-based caches, etc.).

Toke attached a preliminary test case to LUCENE-3079, so I'll attach it here as 
a starting point.

We've also done some preliminary work on extending Benchmark for faceting, so 
I'll attach it here as well.

We should perhaps create a Wiki page where we clearly describe the benchmark 
scenarios, then include results of 'default settings' and 'optimized settings', 
or something like that.




[jira] [Created] (LUCENE-3261) Faceting module userguide

2011-06-29 Thread Shai Erera (JIRA)
Faceting module userguide
-

 Key: LUCENE-3261
 URL: https://issues.apache.org/jira/browse/LUCENE-3261
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/facet
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor


In LUCENE-3079 I've uploaded a userguide for the faceting module. I'd like to 
discuss where the best place is to include it. We include it with the 
code (in our SVN), so that it's always attached to some branch (or in other 
words, a release). That way we can have a version of it per release that reflects 
API changes.

This document is like the file format document, or any other document we put 
under site-versioned. So we have two possible places:
* facet/docs
* site/userguides

Unlike the site, whose PDFs are built automatically by Forrest, we cannot 
convert ODT to PDF with Forrest, so putting it there is a challenge. What we do 
today (in our SVN) is that whoever updates the userguide creates a PDF too, which 
is easy from OpenOffice.

I'll upload the file later when I'm in front of it.




[jira] [Commented] (LUCENE-3260) need a test that uses termsenum.seekExact() (which returns true), then calls next()

2011-06-29 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057596#comment-13057596
 ] 

Shai Erera commented on LUCENE-3260:


Patch looks good, Mike.

One minor comment: atLeast(200) means we'll always run at least 200 iterations. 
Did you do that only to capture the bug? Robert and Simon have done a great 
job at speeding up tests, so perhaps we should use a lower value here, like 10?
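
For illustration only (not the attached patch): a rough sketch of the kind of 
seekExact-then-next check being discussed, using the lower atLeast() value suggested 
above. It assumes a LuceneTestCase subclass, a TermsEnum obtained elsewhere, and a 
sorted list of terms known to exist in the index; the helper name 
checkSeekExactThenNext is made up.

{code}
import java.util.List;
import java.util.Random;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

// Inside a LuceneTestCase subclass: after a successful seekExact(), the enum
// should be positioned on the target, and next() should return the following term.
private void checkSeekExactThenNext(Random random, TermsEnum termsEnum,
                                    List<BytesRef> sortedTerms) throws Exception {
  int iters = atLeast(10); // lower bound as suggested above
  for (int i = 0; i < iters; i++) {
    int idx = random.nextInt(sortedTerms.size());
    BytesRef target = sortedTerms.get(idx);
    assertTrue(termsEnum.seekExact(target));   // the term is known to exist
    assertEquals(target, termsEnum.term());    // enum is positioned on the target
    if (idx + 1 < sortedTerms.size()) {
      assertEquals(sortedTerms.get(idx + 1), termsEnum.next()); // next() advances correctly
    } else {
      assertNull(termsEnum.next());            // we seeked to the last term
    }
  }
}
{code}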

> need a test that uses termsenum.seekExact() (which returns true), then calls 
> next()
> ---
>
> Key: LUCENE-3260
> URL: https://issues.apache.org/jira/browse/LUCENE-3260
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
>Assignee: Michael McCandless
> Attachments: LUCENE-3260.patch
>
>
> I tried to do some seekExact (where the result must exist) then next()ing in 
> the faceting module,
> and it seems like there could be a bug here.
> I think we should add a test that mixes seekExact/seekCeil/next like this, to 
> ensure that
> if seekExact returns true, the enum is properly positioned.




[jira] [Commented] (LUCENE-3259) need to clarify/change D&Penum api for hasPayload/getPayload

2011-06-29 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057595#comment-13057595
 ] 

Robert Muir commented on LUCENE-3259:
-

{quote}
If D&PEnum says getPayload() returns null if there is no payload, then why do 
you say it's not defined? I don't mind if we change the contract to 
hasPayload() first, then getPayload().
{quote}

Let me rephrase what I mean: currently, if you call getPayload() and there is 
no payload, it does not actually always return null :) So it's "defined" but 
does not work as defined.

The only safe thing at the moment to do if you are not sure if there is a 
payload, is to check hasPayload() first, and if this returns false, do not mess 
with getPayload().

If you are sure there is a payload, you don't need to do anything with 
hasPayload().
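
To make the current contract concrete, here is a minimal sketch of the safe access 
pattern described above, assuming the trunk DocsAndPositionsEnum API; the method name 
and the freq parameter are illustrative only.

{code}
import java.io.IOException;
import org.apache.lucene.index.DocsAndPositionsEnum;
import org.apache.lucene.util.BytesRef;

// Defensive consumption pattern: only touch getPayload() after hasPayload()
// has confirmed a payload exists for the current position.
void consumePositions(DocsAndPositionsEnum dpEnum, int freq) throws IOException {
  for (int i = 0; i < freq; i++) {
    int position = dpEnum.nextPosition();
    if (dpEnum.hasPayload()) {
      BytesRef payload = dpEnum.getPayload(); // safe: a payload is known to exist
      // ... use position and payload ...
    } else {
      // no payload at this position; do not call getPayload()
    }
  }
}
{code}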


> need to clarify/change D&Penum api for hasPayload/getPayload
> 
>
> Key: LUCENE-3259
> URL: https://issues.apache.org/jira/browse/LUCENE-3259
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
>
> We encountered this bug while integrating the faceting module:
> * D&PEnum says getPayload() will return null if there is no payload.
> * however, in some cases this is not what happens.
> * things do work (with no exceptions), if you always check hasPayload() first.
> The easiest fix could be to correct the javadocs, and say that you should 
> always check hasPayload() first... otherwise getPayload() is not defined.




[jira] [Commented] (LUCENE-3259) need to clarify/change D&Penum api for hasPayload/getPayload

2011-06-29 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057593#comment-13057593
 ] 

Shai Erera commented on LUCENE-3259:


If D&PEnum says getPayload() returns null if there is no payload, then why do 
you say it's not defined? I don't mind if we change the contract to 
hasPayload() first, then getPayload().

But if we want to follow, e.g. DocIdSetIterator, where you call nextDoc() and 
get the doc ID back, without calling next() followed by docID(), then I think 
getPayload() should be enough here too. Especially for cases where we know a 
payload was written.

What do you think?

> need to clarify/change D&Penum api for hasPayload/getPayload
> 
>
> Key: LUCENE-3259
> URL: https://issues.apache.org/jira/browse/LUCENE-3259
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
>
> We encountered this bug while integrating the faceting module:
> * D&PEnum says getPayload() will return null if there is no payload.
> * however, in some cases this is not what happens.
> * things do work (with no exceptions), if you always check hasPayload() first.
> The easiest fix could be to correct the javadocs, and say that you should 
> always check hasPayload() first... otherwise getPayload() is not defined.




[jira] [Resolved] (LUCENE-3258) File leak when IOException occurs during index optimization.

2011-06-29 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera resolved LUCENE-3258.


   Resolution: Fixed
Fix Version/s: 3.3

Already fixed in 3.3

> File leak when IOException occurs during index optimization.
> 
>
> Key: LUCENE-3258
> URL: https://issues.apache.org/jira/browse/LUCENE-3258
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 3.0.3
> Environment: SUSE Linux 11, Java 6
>Reporter: Nick Kirsch
> Fix For: 3.3
>
>
> I am not sure if this issue requires a fix due to the nature of its 
> occurrence, or if it exists in other versions of Lucene.
> I am using Lucene Java 3.0.3 on a SUSE Linux machine with Java 6 and have 
> noticed that a number of file handles are not being released from 
> my Java application. There are IOExceptions in my log about the disk being full, 
> which causes a merge and the optimization to fail. The index is not corrupt 
> upon encountering the IOException. I am using CFS for my index format, so 3X 
> my largest index size during optimization certainly consumes all of my 
> available disk. 
> I realize that I need to add more disk space to my machine, but I 
> investigated how to clean up the leaking file handles. After failing to find 
> a misuse of Lucene's IndexWriter in the code I have wrapping Lucene, I did a 
> quick search for close() being invoked in the Lucene Java source code. I 
> found a number of source files that attempt to close more than one object 
> within the same close() method. I think a try/catch should be put around each 
> of these close() attempts to avoid skipping subsequent closes (see the sketch 
> after this list). The catch may ignore the caught exception to avoid masking 
> the original exception, as is done in SimpleFSDirectory.close().
> Locations in the Lucene Java source where I suggest a try/catch should be used:
> - org.apache.lucene.index.FormatPostingFieldsWriter.finish()
> - org.apache.lucene.index.TermInfosWriter.close()
> - org.apache.lucene.index.SegmentTermPositions.close()
> - org.apache.lucene.index.SegmentMergeInfo.close()
> - org.apache.lucene.index.SegmentMerger.mergeTerms() (The finally block)
> - org.apache.lucene.index.DirectoryReader.close()
> - org.apache.lucene.index.FieldsReader.close()
> - org.apache.lucene.index.MultiLevelSkipListReader.close()
> - org.apache.lucene.index.MultipleTermPositions.close()
> - org.apache.lucene.index.SegmentMergeQueue.close()
> - org.apache.lucene.index.SegmentMergeDocs.close()
> - org.apache.lucene.index.TermInfosReader.close()
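
For illustration, a generic sketch of the close-each-resource-independently pattern 
suggested above; this is not Lucene source, and the class and field names are made up.

{code}
import java.io.Closeable;
import java.io.IOException;

// Close every resource even if an earlier close() fails, and surface the first
// failure instead of masking it with a later one.
class MultiResourceHolder implements Closeable {
  private final Closeable[] resources;

  MultiResourceHolder(Closeable... resources) {
    this.resources = resources;
  }

  @Override
  public void close() throws IOException {
    IOException firstException = null;
    for (Closeable resource : resources) {
      try {
        if (resource != null) {
          resource.close();
        }
      } catch (IOException e) {
        if (firstException == null) {
          firstException = e; // remember the first failure, keep closing the rest
        }
      }
    }
    if (firstException != null) {
      throw firstException;
    }
  }
}
{code}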




[jira] [Reopened] (LUCENE-3258) File leak when IOException occurs during index optimization.

2011-06-29 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera reopened LUCENE-3258:



Reopening to change resolution

> File leak when IOException occurs during index optimization.
> 
>
> Key: LUCENE-3258
> URL: https://issues.apache.org/jira/browse/LUCENE-3258
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 3.0.3
> Environment: SUSE Linux 11, Java 6
>Reporter: Nick Kirsch
> Fix For: 3.3
>
>
> I am not sure if this issue requires a fix due to the nature of its 
> occurrence, or if it exists in other versions of Lucene.
> I am using Lucene Java 3.0.3 on a SUSE Linux machine with Java 6 and have 
> noticed that a number of file handles are not being released from 
> my Java application. There are IOExceptions in my log about the disk being full, 
> which causes a merge and the optimization to fail. The index is not corrupt 
> upon encountering the IOException. I am using CFS for my index format, so 3X 
> my largest index size during optimization certainly consumes all of my 
> available disk. 
> I realize that I need to add more disk space to my machine, but I 
> investigated how to clean up the leaking file handles. After failing to find 
> a misuse of Lucene's IndexWriter in the code I have wrapping Lucene, I did a 
> quick search for close() being invoked in the Lucene Java source code. I 
> found a number of source files that attempt to close more than one object 
> within the same close() method. I think a try/catch should be put around each 
> of these close() attempts to avoid skipping subsequent closes. The catch 
> may ignore the caught exception to avoid masking 
> the original exception, as is done in SimpleFSDirectory.close().
> Locations in the Lucene Java source where I suggest a try/catch should be used:
> - org.apache.lucene.index.FormatPostingFieldsWriter.finish()
> - org.apache.lucene.index.TermInfosWriter.close()
> - org.apache.lucene.index.SegmentTermPositions.close()
> - org.apache.lucene.index.SegmentMergeInfo.close()
> - org.apache.lucene.index.SegmentMerger.mergeTerms() (The finally block)
> - org.apache.lucene.index.DirectoryReader.close()
> - org.apache.lucene.index.FieldsReader.close()
> - org.apache.lucene.index.MultiLevelSkipListReader.close()
> - org.apache.lucene.index.MultipleTermPositions.close()
> - org.apache.lucene.index.SegmentMergeQueue.close()
> - org.apache.lucene.index.SegmentMergeDocs.close()
> - org.apache.lucene.index.TermInfosReader.close()




[jira] [Resolved] (LUCENE-3256) Consolidate CustomScoreQuery, ValueSourceQuery and BoostedQuery

2011-06-29 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male resolved LUCENE-3256.


Resolution: Fixed
  Assignee: Chris Male

Committed revision 1141366.

> Consolidate CustomScoreQuery, ValueSourceQuery and BoostedQuery 
> 
>
> Key: LUCENE-3256
> URL: https://issues.apache.org/jira/browse/LUCENE-3256
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Reporter: Chris Male
>Assignee: Chris Male
> Fix For: 4.0
>
> Attachments: LUCENE-3256.patch, LUCENE-3256.patch, LUCENE-3256.patch
>
>
> Lucene's CustomScoreQuery and Solr's BoostedQuery do essentially the same 
> thing: they boost the scores of Documents by the value from a ValueSource.  
> BoostedQuery does this in a direct fashion, by accepting a ValueSource. 
> CustomScoreQuery on the other hand, accepts a series of ValueSourceQuerys.  
> ValueSourceQuery seems to do exactly the same thing as FunctionQuery.
> With Lucene's ValueSource being deprecated / removed, we need to resolve 
> these dependencies and simplify the code.
> Therefore I recommend we do the following things:
> - Move CustomScoreQuery (and CustomScoreProvider) to the new Queries module 
> and change it over to use FunctionQuerys instead of ValueSourceQuerys.  
> - Deprecate Solr's BoostedQuery in favour of the new CustomScoreQuery.  CSQ 
> provides a lot of support for customizing the scoring process.
> - Move and consolidate all tests of CSQ and BoostedQuery, to the Queries 
> module and have them test CSQ instead.
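
For context, a hedged usage sketch of the consolidated approach this issue describes: 
boosting a text query by a per-document numeric value via CustomScoreQuery plus 
FunctionQuery. The package locations and the IntFieldSource value source are assumed 
from the proposal above and may differ from the committed code; the class and field 
names are made up.

{code}
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.CustomScoreQuery;
import org.apache.lucene.queries.function.FunctionQuery;
import org.apache.lucene.queries.function.valuesource.IntFieldSource;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class BoostByFieldExample {
  // Boost a text query's scores by a per-document numeric field value,
  // the shared use case of CustomScoreQuery and BoostedQuery.
  public static Query boostByPopularity() {
    Query textQuery = new TermQuery(new Term("body", "lucene"));
    FunctionQuery popularity = new FunctionQuery(new IntFieldSource("popularity"));
    return new CustomScoreQuery(textQuery, popularity);
  }
}
{code}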




[jira] [Commented] (LUCENE-3256) Consolidate CustomScoreQuery, ValueSourceQuery and BoostedQuery

2011-06-29 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057565#comment-13057565
 ] 

Chris Male commented on LUCENE-3256:


Command for patch is:

{code}
svn move 
lucene/src/java/org/apache/lucene/search/function/CustomScoreQuery.java 
modules/queries/src/java/org/apache/lucene/queries/
svn move 
lucene/src/java/org/apache/lucene/search/function/CustomScoreProvider.java 
modules/queries/src/java/org/apache/lucene/queries/
svn move solr/src/java/org/apache/solr/search/function/BoostedQuery.java 
modules/queries/src/java/org/apache/lucene/queries/function/
svn --parents mkdir modules/queries/src/test/org/apache/lucene/queries/function
svn move 
lucene/src/test/org/apache/lucene/search/function/TestCustomScoreQuery.java 
modules/queries/src/test/org/apache/lucene/queries/
{code}

> Consolidate CustomScoreQuery, ValueSourceQuery and BoostedQuery 
> 
>
> Key: LUCENE-3256
> URL: https://issues.apache.org/jira/browse/LUCENE-3256
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Reporter: Chris Male
> Fix For: 4.0
>
> Attachments: LUCENE-3256.patch, LUCENE-3256.patch, LUCENE-3256.patch
>
>
> Lucene's CustomScoreQuery and Solr's BoostedQuery do essentially the same 
> thing: they boost the scores of Documents by the value from a ValueSource.  
> BoostedQuery does this in a direct fashion, by accepting a ValueSource. 
> CustomScoreQuery on the other hand, accepts a series of ValueSourceQuerys.  
> ValueSourceQuery seems to do exactly the same thing as FunctionQuery.
> With Lucene's ValueSource being deprecated / removed, we need to resolve 
> these dependencies and simplify the code.
> Therefore I recommend we do the following things:
> - Move CustomScoreQuery (and CustomScoreProvider) to the new Queries module 
> and change it over to use FunctionQuerys instead of ValueSourceQuerys.  
> - Deprecate Solr's BoostedQuery in favour of the new CustomScoreQuery.  CSQ 
> provides a lot of support for customizing the scoring process.
> - Move and consolidate all tests of CSQ and BoostedQuery, to the Queries 
> module and have them test CSQ instead.




[jira] [Updated] (LUCENE-3256) Consolidate CustomScoreQuery, ValueSourceQuery and BoostedQuery

2011-06-29 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-3256:
---

Attachment: LUCENE-3256.patch

New patch which incorporates Yonik's thoughts.  BoostedQuery is now a first-class 
construct and has been pushed to the module.  CustomScoreQuery now accepts arbitrary 
Querys as scorers.

Everything is jigged around a little again, since CSQ isn't specifically tied 
to FunctionQuery (except on the test level).

Command is coming up.

This one is ready to go.

> Consolidate CustomScoreQuery, ValueSourceQuery and BoostedQuery 
> 
>
> Key: LUCENE-3256
> URL: https://issues.apache.org/jira/browse/LUCENE-3256
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Reporter: Chris Male
> Fix For: 4.0
>
> Attachments: LUCENE-3256.patch, LUCENE-3256.patch, LUCENE-3256.patch
>
>
> Lucene's CustomScoreQuery and Solr's BoostedQuery do essentially the same 
> thing: they boost the scores of Documents by the value from a ValueSource.  
> BoostedQuery does this in a direct fashion, by accepting a ValueSource. 
> CustomScoreQuery on the other hand, accepts a series of ValueSourceQuerys.  
> ValueSourceQuery seems to do exactly the same thing as FunctionQuery.
> With Lucene's ValueSource being deprecated / removed, we need to resolve 
> these dependencies and simplify the code.
> Therefore I recommend we do the following things:
> - Move CustomScoreQuery (and CustomScoreProvider) to the new Queries module 
> and change it over to use FunctionQuerys instead of ValueSourceQuerys.  
> - Deprecate Solr's BoostedQuery in favour of the new CustomScoreQuery.  CSQ 
> provides a lot of support for customizing the scoring process.
> - Move and consolidate all tests of CSQ and BoostedQuery, to the Queries 
> module and have them test CSQ instead.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-29 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057538#comment-13057538
 ] 

Michael McCandless commented on LUCENE-2454:


bq. Do you think there are any efficiencies to be gained on the document retrieval 
side of things if you know that the documents commonly being retrieved are 
physically nearby

Good question!  I think OS level caching should mostly solve this?

> Nested Document query support
> -
>
> Key: LUCENE-2454
> URL: https://issues.apache.org/jira/browse/LUCENE-2454
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: core/search
>Affects Versions: 3.0.2
>Reporter: Mark Harwood
>Assignee: Mark Harwood
>Priority: Minor
> Attachments: LUCENE-2454.patch, LUCENE-2454.patch, 
> LuceneNestedDocumentSupport.zip
>
>
> A facility for querying nested documents in a Lucene index as outlined in 
> http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-3171) BlockJoinQuery/Collector

2011-06-29 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3171.


   Resolution: Fixed
Fix Version/s: (was: 3.3)
   3.4

> BlockJoinQuery/Collector
> 
>
> Key: LUCENE-3171
> URL: https://issues.apache.org/jira/browse/LUCENE-3171
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/other
>Reporter: Michael McCandless
> Fix For: 3.4, 4.0
>
> Attachments: LUCENE-3171.patch, LUCENE-3171.patch, LUCENE-3171.patch
>
>
> I created a single-pass Query + Collector to implement nested docs.
> The approach is similar to LUCENE-2454, in that the app must index
> documents in "join order", as a block (IW.add/updateDocuments), with
> the parent doc at the end of the block, except that this impl is one
> pass.
> Once you join at indexing time, you can take any query that matches
> child docs and join it up to the parent docID space, using
> BlockJoinQuery.  You then use BlockJoinCollector, which sorts parent
> docs by provided Sort, to gather results, grouped by parent; this
> collector finds any BlockJoinQuerys (using Scorer.visitScorers) and
> retains the child docs corresponding to each collected parent doc.
> After searching is done, you retrieve the TopGroups from a provided
> BlockJoinQuery.
> Like LUCENE-2454, this is less general than the arbitrary joins in
> Solr (SOLR-2272) or parent/child from ElasticSearch
> (https://github.com/elasticsearch/elasticsearch/issues/553), since you
> must do the join at indexing time as a doc block, but it should be
> able to handle nested joins as well as joins to multiple tables,
> though I don't yet have test cases for these.
> I put this in a new Join module (modules/join); I think as we
> refactor join impls we should put them here.
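
For readers new to the doc-block approach, a minimal sketch of both sides follows. IndexWriter.addDocuments() is the indexing-time hook; the BlockJoinQuery constructor shown (child query, parent filter, score mode) is assumed from the description above rather than copied from the final API:

{code}
// Index one block: child docs first, the parent doc last, in a single call,
// which keeps the block contiguous in the segment (assumes an open IndexWriter 'writer').
List<Document> block = new ArrayList<Document>();
for (String sku : new String[] {"sku1", "sku2"}) {
  Document child = new Document();
  child.add(new Field("sku", sku, Field.Store.YES, Field.Index.NOT_ANALYZED));
  block.add(child);
}
Document parent = new Document();
parent.add(new Field("type", "parent", Field.Store.YES, Field.Index.NOT_ANALYZED));
block.add(parent);
writer.addDocuments(block);

// Query side: join a child-level query up to the parent docID space
// (constructor and ScoreMode are assumed here).
Filter parentsFilter = new CachingWrapperFilter(
    new QueryWrapperFilter(new TermQuery(new Term("type", "parent"))));
Query childQuery = new TermQuery(new Term("sku", "sku1"));
Query joinQuery = new BlockJoinQuery(childQuery, parentsFilter, BlockJoinQuery.ScoreMode.Avg);
{code}
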

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-2454) Nested Document query support

2011-06-29 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2454.


Resolution: Duplicate

Duplicate of LUCENE-3171.

> Nested Document query support
> -
>
> Key: LUCENE-2454
> URL: https://issues.apache.org/jira/browse/LUCENE-2454
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: core/search
>Affects Versions: 3.0.2
>Reporter: Mark Harwood
>Assignee: Mark Harwood
>Priority: Minor
> Attachments: LUCENE-2454.patch, LUCENE-2454.patch, 
> LuceneNestedDocumentSupport.zip
>
>
> A facility for querying nested documents in a Lucene index as outlined in 
> http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3260) need a test that uses termsenum.seekExact() (which returns true), then calls next()

2011-06-29 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3260:
---

Attachment: LUCENE-3260.patch

Patch, w/ new test showing the issue in MTE when you next() after seekExact(), 
and w/ the fix for MTE.

I also removed unnecessary seek calls from LuceneTaxonomyWriter.

> need a test that uses termsenum.seekExact() (which returns true), then calls 
> next()
> ---
>
> Key: LUCENE-3260
> URL: https://issues.apache.org/jira/browse/LUCENE-3260
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
>Assignee: Michael McCandless
> Attachments: LUCENE-3260.patch
>
>
> I tried to do some seekExact (where the result must exist) then next()ing in 
> the faceting module,
> and it seems like there could be a bug here.
> I think we should add a test that mixes seekExact/seekCeil/next like this, to 
> ensure that
> if seekExact returns true, that the enum is properly positioned.
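
Roughly, the interaction such a test needs to pin down looks like the sketch below (the seekExact signature and the Terms/TermsEnum accessors are assumed; the point is only that after a successful seekExact the enum must be positioned on the seeked term, so that next() returns the term that follows it):

{code}
// Sketch of the seekExact-then-next pattern the test should exercise.
TermsEnum te = MultiFields.getTerms(reader, "field").iterator(); // accessors assumed
if (te.seekExact(new BytesRef("beer"), false)) {                 // assumed signature (term, useCache)
  assert te.term().bytesEquals(new BytesRef("beer"));            // positioned on the seeked term
  BytesRef following = te.next();                                // must be the term after "beer"
}
{code}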

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?

2011-06-29 Thread Troy Howard
I pretty much agree with Rory.

And as others have said, this issue has been discussed many times. What is
most important about the fact that it has been discussed many times is that
it has not been resolved, even though it has been discussed so many times.

That means that both the developer community that contributes to the
project and the user community that uses the library have an interest in
*both*. I think we have enough interest and support from the community to
develop both of these at the same time.

Some key points:
- Being a useful index/search library is the goal of any implementation of
Lucene. Being useful is more important than being identical to one another.
Don't forget that Java Lucene has bugs, design problems, and may not always
be the best implementation of Lucene.
- Unit tests should validate the code's "correctness" in terms of
functionality/bugs
- The library can contain multiple APIs for the same tasks. Fluent? LINQ?
Just Like Java? Just like pylucene? All of the above?
- Implementation details between .NET and Java are *very* significant and
often account for a lot of the bugs that are Lucene.Net only. Our attempt to
be a "line-by-line" port is what is introducing bugs, not the the other way
around
- The only reason we are having this discussion is because C# and Java are
very similar languages. If this was an F# port or a VB.NET port, we wouldn't
even be discussing this. Instead we'd say "make it work the way that makes
the most sense in {{insert language here}}".


That said, DIGY has a very good point. Continued development on the library
is the most important part of the project's goals. A dead project helps no
one. If the current active contributors are writing a line-by-line port,
then that's what it will be. If they are writing a complete re-write, then
that is what it will be. Some might find it easier to write line-by-line,
but others might find that task daunting. The opposite is also true. It
depends on the person, how much time they have, and what they consider
"easy" or "manageable" or "worth doing".

As always, if you want the code base to be something specific, submit a
patch for that, and it will be. If not, then you need to convince someone
else to write that patch. And just so it's clear, *anyone* can write and
submit a patch and be a contributor, not just the project committers.

Thanks,
Troy

On Wed, Jun 29, 2011 at 3:06 PM, Rory Plaire  wrote:

> For what it's worth, I've participated in a number of projects which have
> been "ported" from Java to .Net with varying levels of "translation" into
> the native style and functionalty of the .Net framework. The largest are
> NTS, a JTS port and NHibernate, a Java Hibernate port. My experience is
> that
> a line-by-line port isn't as valuable as people would imagine.
>
> Even if we discount the reality that a line-by-line port is really
> unachievable due to various differences between the frameworks, keeping
> even
> identical code in sync will always take some work: full automation on this
> large of a project is infeasible. During manual effort, therefore, making
> readable changes to the code is really not that much more work.
>
> For update maintenance, porting over code from recent versions of both
> projects to the .Net versions, and ".Nettifying" that code is little
> trouble. Since both projects use source control, it's easy to see the
> changes and translate them.
>
> When it comes to debugging issues, in NTS or NHibernate, I go to the Java
> sources, and even if the classes were largely rewritten to take advantage
> of
> IEnumerable or generics or structures, running unit tests, tracing the
> code,
> and seeing the output of each has always been straightforward.
>
> Since I'm using .Net, I'd want the Lucene.Net project to be more .Net than
> a
> line-by-line port of Java, in order to take advantage of the Framework as
> well as provide a better code base for .Net developers to maintain. If
> large
> .Net projects ported from Java do this, and have found considerable
> success,
> it is, in my view, a well-proven practice and shouldn't be avoided due to
> uncertainty of how the resulting code should work. Ultimately, that is what
> unit tests are for, anyway.
>


[jira] [Assigned] (LUCENE-3260) need a test that uses termsenum.seekExact() (which returns true), then calls next()

2011-06-29 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned LUCENE-3260:
--

Assignee: Michael McCandless

> need a test that uses termsenum.seekExact() (which returns true), then calls 
> next()
> ---
>
> Key: LUCENE-3260
> URL: https://issues.apache.org/jira/browse/LUCENE-3260
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
>Assignee: Michael McCandless
>
> I tried to do some seekExact (where the result must exist) then next()ing in 
> the faceting module,
> and it seems like there could be a bug here.
> I think we should add a test that mixes seekExact/seekCeil/next like this, to 
> ensure that
> if seekExact returns true, that the enum is properly positioned.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: calling a Python function from Java?

2011-06-29 Thread Bill Janssen
Andi Vajda  wrote:

> 
> On Jun 29, 2011, at 22:06, Bill Janssen  wrote:
> 
> > Andi Vajda  wrote:
> > 
> >> Put everything into a class and call all the python stuff from there. 
> > 
> > I'd like to make the method on the Java class be static, so I'd like
> > that method to create an instance and call a protected or
> > package-private method that is implemented by the Python class.  But
> > JCC doesn't seem to wrap non-public or static methods...?
> 
> Jcc wraps all public methods whose signature contains only classes or
> types in the set of classes to be wrapped, including static ones.

OK, I'll try the static method, then.  What do you think about wrapping
protected methods of classes marked as Python-extensible?

Bill
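
In plain Java, the shape being discussed is roughly the sketch below (class and method names are made up for illustration). Since JCC only wraps public methods, the hook the Python subclass overrides is kept public here, and the static entry point takes the (possibly Python-implemented) instance and delegates to it:

{code}
// Hypothetical sketch of a Python-extensible class with a static entry point.
public class PythonCallback {

    // Overridden from the Python side; kept public so JCC wraps it.
    public String handle(String input) {
        return input; // default behaviour when no Python override is installed
    }

    // Public static method, also wrapped by JCC, callable from other Java code.
    public static String invoke(PythonCallback impl, String input) {
        return impl.handle(input);
    }
}
{code}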


[jira] [Issue Comment Edited] (LUCENE-3256) Consolidate CustomScoreQuery, ValueSourceQuery and BoostedQuery

2011-06-29 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057515#comment-13057515
 ] 

Chris Male edited comment on LUCENE-3256 at 6/29/11 10:15 PM:
--

{quote}
I'm not sure if we should change the implementation of BoostedQuery to use 
CustomScoreQuery. It's going to be slower as it goes through more levels of 
indirection. The edismax parser creates BoostedQuery instances (as does the 
boost qparser), so this is going to be a heavily used implementation and should 
be optimized. Having a specific BoostedQuery is even nicer for debugging 
purposes where the toString is simpler and more specific.
{quote}

Very valid point.  I will make BoostedQuery a 1st class construct.

{quote}
Actually, looking closer at CustomScoreQuery, I don't even see why it's not 
more generic... why does it require ValueSourceQueries and not just combine the 
scores of arbitrary queries? It already just operates on scorers and doesn't 
seem to use value sources at all.
{quote}

For the life of me I'm sure it used to do just that.  I'll open an issue to 
make the change.

Thanks for your review Yonik!

  was (Author: cmale):
{quote}
I'm not sure if we should change the implementation of BoostedQuery to use 
CustomScoreQuery. It's going to be slower as it goes through more levels of 
indirection. The edismax parser creates BoostedQuery instances (as does the 
boost qparser), so this is going to be a heavily used implementation and should 
be optimized. Having a specific BoostedQuery is even nicer for debugging 
purposes where the toString is simpler and more specific.
{quote}

Very valid point.  I will make BoostedQuery a 1st class construct.

{code}
Actually, looking closer at CustomScoreQuery, I don't even see why it's not 
more generic... why does it require ValueSourceQueries and not just combine the 
scores of arbitrary queries? It already just operates on scorers and doesn't 
seem to use value sources at all.
{code}

For the life of me I'm sure it used to do just that.  I'll open an issue to 
make the change.

Thanks for your review Yonik!
  
> Consolidate CustomScoreQuery, ValueSourceQuery and BoostedQuery 
> 
>
> Key: LUCENE-3256
> URL: https://issues.apache.org/jira/browse/LUCENE-3256
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Reporter: Chris Male
> Fix For: 4.0
>
> Attachments: LUCENE-3256.patch, LUCENE-3256.patch
>
>
> Lucene's CustomScoreQuery and Solr's BoostedQuery do essentially the same 
> thing: they boost the scores of Documents by the value from a ValueSource.  
> BoostedQuery does this in a direct fashion, by accepting a ValueSource. 
> CustomScoreQuery on the other hand, accepts a series of ValueSourceQuerys.  
> ValueSourceQuery seems to do exactly the same thing as FunctionQuery.
> With Lucene's ValueSource being deprecated / removed, we need to resolve 
> these dependencies and simplify the code.
> Therefore I recommend we do the following things:
> - Move CustomScoreQuery (and CustomScoreProvider) to the new Queries module 
> and change it over to use FunctionQuerys instead of ValueSourceQuerys.  
> - Deprecate Solr's BoostedQuery in favour of the new CustomScoreQuery.  CSQ 
> provides a lot of support for customizing the scoring process.
> - Move and consolidate all tests of CSQ and BoostedQuery, to the Queries 
> module and have them test CSQ instead.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3256) Consolidate CustomScoreQuery, ValueSourceQuery and BoostedQuery

2011-06-29 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057515#comment-13057515
 ] 

Chris Male commented on LUCENE-3256:


{quote}
I'm not sure if we should change the implementation of BoostedQuery to use 
CustomScoreQuery. It's going to be slower as it goes through more levels of 
indirection. The edismax parser creates BoostedQuery instances (as does the 
boost qparser), so this is going to be a heavily used implementation and should 
be optimized. Having a specific BoostedQuery is even nicer for debugging 
purposes where the toString is simpler and more specific.
{quote}

Very valid point.  I will make BoostedQuery a 1st class construct.

{code}
Actually, looking closer at CustomScoreQuery, I don't even see why it's not 
more generic... why does it require ValueSourceQueries and not just combine the 
scores of arbitrary queries? It already just operates on scorers and doesn't 
seem to use value sources at all.
{code}

For the life of me I'm sure it used to do just that.  I'll open an issue to 
make the change.

Thanks for your review Yonik!

> Consolidate CustomScoreQuery, ValueSourceQuery and BoostedQuery 
> 
>
> Key: LUCENE-3256
> URL: https://issues.apache.org/jira/browse/LUCENE-3256
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Reporter: Chris Male
> Fix For: 4.0
>
> Attachments: LUCENE-3256.patch, LUCENE-3256.patch
>
>
> Lucene's CustomScoreQuery and Solr's BoostedQuery do essentially the same 
> thing: they boost the scores of Documents by the value from a ValueSource.  
> BoostedQuery does this in a direct fashion, by accepting a ValueSource. 
> CustomScoreQuery on the other hand, accepts a series of ValueSourceQuerys.  
> ValueSourceQuery seems to do exactly the same thing as FunctionQuery.
> With Lucene's ValueSource being deprecated / removed, we need to resolve 
> these dependencies and simplify the code.
> Therefore I recommend we do the following things:
> - Move CustomScoreQuery (and CustomScoreProvider) to the new Queries module 
> and change it over to use FunctionQuerys instead of ValueSourceQuerys.  
> - Deprecate Solr's BoostedQuery in favour of the new CustomScoreQuery.  CSQ 
> provides a lot of support for customizing the scoring process.
> - Move and consolidate all tests of CSQ and BoostedQuery, to the Queries 
> module and have them test CSQ instead.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?

2011-06-29 Thread Digy
> I do not know if too much emphasis should be placed on "user" vs.
"contributor".  
I am sorry for this misunderstanding.
What I tried to say with "contributor" (not "committer") was the "people who
work on Lucene.Net source code", not the ones who just consume it.

DIGY

-Original Message-
From: Granroth, Neal V. [mailto:neal.granr...@thermofisher.com] 
Sent: Wednesday, June 29, 2011 11:23 PM
To: lucene-net-...@lucene.apache.org
Subject: RE: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?

I do not know if too much emphasis should be placed on "user" vs.
"contributor".  The project needs to also consider those of us who use
Lucene.NET source releases only.
It is much easier to locally patch/fix the source when I can compare it
directly to Lucene core.

- Neal
 

-Original Message-
From: Digy [mailto:digyd...@gmail.com] 
Sent: Wednesday, June 29, 2011 2:58 PM
To: lucene-net-...@lucene.apache.org
Subject: RE: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?

As a Lucene.Net user I wouldn't care whether it is a line-by-line port or not.

But as a contributor, I would prefer a parallel code base that makes life
easier for manual ports of new releases (until this process is automated).

PS: I presume no one thinks of functional or index-level incompatibility.

DIGY

-Original Message-
From: Granroth, Neal V. [mailto:neal.granr...@thermofisher.com] 
Sent: Wednesday, June 29, 2011 10:47 PM
To: lucene-net-u...@lucene.apache.org
Cc: lucene-net-...@incubator.apache.org
Subject: RE: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?

This has been discussed many times.
Lucene.NET is not valid and the code cannot be trusted if it is not a
line-by-line port.  It ceases to be Lucene.

- Neal

-Original Message-
From: Scott Lombard [mailto:lombardena...@gmail.com] 
Sent: Wednesday, June 29, 2011 1:58 PM
To: lucene-net-...@lucene.apache.org; lucene-net-u...@lucene.apache.org
Subject: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?

 

After the large community response about moving the code base from .Net 2.0
to .Net 4.0, I am trying to figure out what the need is for a line-by-line
port.  Starting with Digy's excellent work on the conversion to generics, a
priority of the 2.9.4g release, the two packages would not be
interchangeable.  So faster turnaround from a Java release won't matter to
non line-by-line users; they will have to wait until the updates are made to
the non line-by-line code base.

 

My question is: is there really a user base for the line-by-line port?  Anyone
have a comment?

 

Scott

 

  

 



RE: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?

2011-06-29 Thread Granroth, Neal V.
I do not know if too much emphasis should be placed on "user" vs. 
"contributor".  The project needs to also consider those of us who use 
Lucene.NET source releases only.
It is much easier to locally patch/fix the source when I can compare it 
directly to Lucene core.

- Neal
 

-Original Message-
From: Digy [mailto:digyd...@gmail.com] 
Sent: Wednesday, June 29, 2011 2:58 PM
To: lucene-net-...@lucene.apache.org
Subject: RE: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?

As a Lucene.Net user I wouldn't care whether it is a line-by-line port or not.

But as a contributor, I would prefer a parallel code base that makes life
easier for manual ports of new releases (until this process is automated).

PS: I presume no one thinks of functional or index-level incompatibility.

DIGY

-Original Message-
From: Granroth, Neal V. [mailto:neal.granr...@thermofisher.com] 
Sent: Wednesday, June 29, 2011 10:47 PM
To: lucene-net-u...@lucene.apache.org
Cc: lucene-net-...@incubator.apache.org
Subject: RE: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?

This has been discussed many times.
Lucene.NET is not valid and the code cannot be trusted if it is not a
line-by-line port.  It ceases to be Lucene.

- Neal

-Original Message-
From: Scott Lombard [mailto:lombardena...@gmail.com] 
Sent: Wednesday, June 29, 2011 1:58 PM
To: lucene-net-...@lucene.apache.org; lucene-net-u...@lucene.apache.org
Subject: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?

 

After the large community response about moving the code base from .Net 2.0
to .Net 4.0, I am trying to figure out what the need is for a line-by-line
port.  Starting with Digy's excellent work on the conversion to generics, a
priority of the 2.9.4g release, the two packages would not be
interchangeable.  So faster turnaround from a Java release won't matter to
non line-by-line users; they will have to wait until the updates are made to
the non line-by-line code base.

 

My question is: is there really a user base for the line-by-line port?  Anyone
have a comment?

 

Scott

 

  

 



RE: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?

2011-06-29 Thread Granroth, Neal V.
Others have done a much better, more thorough job of explaining the issues in 
previous discussions.  It would be best to re-read those.

One way to understand it: if Lucene.NET cannot be compared to the reference 
source code (Lucene core, "Java Lucene"), then it becomes nearly impossible to 
validate that Lucene.NET is functioning correctly, that bug fixes made in 
Lucene core have been implemented in Lucene.NET, etc.  The same goes for unit 
tests: if they cannot be compared with the ones from Lucene core line-by-line, 
then there is no way to know that they perform the intended tests and run 
correctly.


- Neal
 

-Original Message-
From: Wyatt Barnett [mailto:wyatt.barn...@gmail.com] 
Sent: Wednesday, June 29, 2011 2:57 PM
To: lucene-net-...@lucene.apache.org
Subject: Re: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?

Those are pretty strong words -- I'd really like to know why I
shouldn't trust anything but a line-by-line port. Can you explain a
bit?

On Wed, Jun 29, 2011 at 3:47 PM, Granroth, Neal V.
 wrote:
> This has been discussed many times.
> Lucene.NET is not valid and the code cannot be trusted if it is not a 
> line-by-line port.  It ceases to be Lucene.
>
> - Neal
>
> -Original Message-
> From: Scott Lombard [mailto:lombardena...@gmail.com]
> Sent: Wednesday, June 29, 2011 1:58 PM
> To: lucene-net-...@lucene.apache.org; lucene-net-u...@lucene.apache.org
> Subject: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?
>
>
>
> After the large community response about moving the code base from .Net 2.0
> to .Net 4.0, I am trying to figure out what the need is for a line-by-line
> port.  Starting with Digy's excellent work on the conversion to generics, a
> priority of the 2.9.4g release, the two packages would not be
> interchangeable.  So faster turnaround from a Java release won't matter to
> non line-by-line users; they will have to wait until the updates are made to
> the non line-by-line code base.
>
>
>
> My question is: is there really a user base for the line-by-line port?  Anyone
> have a comment?
>
>
>
> Scott
>
>
>
>
>
>
>
>


[jira] [Created] (LUCENE-3260) need a test that uses termsenum.seekExact() (which returns true), then calls next()

2011-06-29 Thread Robert Muir (JIRA)
need a test that uses termsenum.seekExact() (which returns true), then calls 
next()
---

 Key: LUCENE-3260
 URL: https://issues.apache.org/jira/browse/LUCENE-3260
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir


I tried to do some seekExact (where the result must exist) then next()ing in 
the faceting module,
and it seems like there could be a bug here.

I think we should add a test that mixes seekExact/seekCeil/next like this, to 
ensure that
if seekExact returns true, that the enum is properly positioned.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3259) need to clarify/change D&Penum api for hasPayload/getPayload

2011-06-29 Thread Robert Muir (JIRA)
need to clarify/change D&Penum api for hasPayload/getPayload


 Key: LUCENE-3259
 URL: https://issues.apache.org/jira/browse/LUCENE-3259
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir


We encountered this bug while integrating the faceting module:
* D&PEnum says getPayload() will return null if there is no payload.
* however, in some cases this is not what happens.
* things do work (with no exceptions), if you always check hasPayload() first.

The easiest fix could be to correct the javadocs, and say that you should 
always check hasPayload() first... otherwise getPayload() is not defined.
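
Until the javadocs are clarified, the safe calling pattern is the one the faceting code now follows, sketched below against the method names used above (always gate getPayload() behind hasPayload()):

{code}
// dpEnum: a DocsAndPositionsEnum already positioned on a document.
// Never call getPayload() without checking hasPayload() first.
int freq = dpEnum.freq();
for (int i = 0; i < freq; i++) {
  int position = dpEnum.nextPosition();
  if (dpEnum.hasPayload()) {
    BytesRef payload = dpEnum.getPayload();
    // ... use payload ...
  }
}
{code}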


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

2011-06-29 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057486#comment-13057486
 ] 

Simon Willnauer commented on SOLR-2462:
---

bq. We just ran into this bug when we upgraded to 3.2
3.3, which has a fix for this, should be released in the next two days. So maybe 
just check the mailing list for the release mail tomorrow or the day after!

simon

> Using spellcheck.collate can result in extremely high memory usage
> --
>
> Key: SOLR-2462
> URL: https://issues.apache.org/jira/browse/SOLR-2462
> Project: Solr
>  Issue Type: Bug
>  Components: spellchecker
>Affects Versions: 3.1
>Reporter: James Dyer
>Assignee: Robert Muir
>Priority: Critical
> Fix For: 3.3, 4.0
>
> Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
> SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
> SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch
>
>
> When using "spellcheck.collate", class SpellPossibilityIterator creates a 
> ranked list of *every* possible correction combination.  But if returning 
> several corrections per term, and if several words are misspelled, the 
> existing algorithm uses a huge amount of memory.
> This bug was introduced with SOLR-2010.  However, it is triggered anytime 
> "spellcheck.collate" is used.  It is not necessary to use any features that 
> were added with SOLR-2010.
> We were in Production with Solr for 1 1/2 days and this bug started taking 
> our Solr servers down with "infinite" GC loops.  It was pretty easy for this 
> to happen as occasionally a user will accidentally paste the URL into the 
> Search box on our app.  This URL results in a search with ~12 misspelled 
> words.  We have "spellcheck.count" set to 15. 
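
To put rough numbers on the blow-up described above: if each of the ~12 misspelled terms gets the full 15 suggestions, the naive space of collation candidates is on the order of 15^12, roughly 1.3 * 10^14 combinations, which is why memory use explodes.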

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [VOTE] release 3.3 (take two)

2011-06-29 Thread Robert Muir
It looks like the vote has passed... I'll work on getting these artifacts out.

Thanks everyone for voting.

On Tue, Jun 28, 2011 at 11:08 AM, Yonik Seeley
 wrote:
> +1
>
> -Yonik
> http://www.lucidimagination.com
>
> On Sun, Jun 26, 2011 at 11:12 AM, Robert Muir  wrote:
>> Artifacts here:
>>
>> http://s.apache.org/lusolr330rc1
>>
>> working release notes here:
>>
>> http://wiki.apache.org/lucene-java/ReleaseNote33
>> http://wiki.apache.org/solr/ReleaseNote33
>>
>> To see the changes between the previous release candidate (rc0):
>> svn diff -r 1139028:1139775
>> https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_3
>>
>> Here is my +1
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org




[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

2011-06-29 Thread Mitsu Hadeishi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057483#comment-13057483
 ] 

Mitsu Hadeishi commented on SOLR-2462:
--

We just ran into this bug when we upgraded to 3.2, and suddenly SOLR was 
blowing up as soon as we built the spellcheck dictionary. I attempted to apply 
the patch to the 3.2 source code tgz file downloadable from 
http://www.apache.org/dyn/closer.cgi/lucene/solr, but it didn't apply cleanly. 
I manually applied the patch, as we're using the released version of 3.2.

> Using spellcheck.collate can result in extremely high memory usage
> --
>
> Key: SOLR-2462
> URL: https://issues.apache.org/jira/browse/SOLR-2462
> Project: Solr
>  Issue Type: Bug
>  Components: spellchecker
>Affects Versions: 3.1
>Reporter: James Dyer
>Assignee: Robert Muir
>Priority: Critical
> Fix For: 3.3, 4.0
>
> Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
> SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
> SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch
>
>
> When using "spellcheck.collate", class SpellPossibilityIterator creates a 
> ranked list of *every* possible correction combination.  But if returning 
> several corrections per term, and if several words are misspelled, the 
> existing algorithm uses a huge amount of memory.
> This bug was introduced with SOLR-2010.  However, it is triggered anytime 
> "spellcheck.collate" is used.  It is not necessary to use any features that 
> were added with SOLR-2010.
> We were in Production with Solr for 1 1/2 days and this bug started taking 
> our Solr servers down with "infinite" GC loops.  It was pretty easy for this 
> to happen as occasionally a user will accidentally paste the URL into the 
> Search box on our app.  This URL results in a search with ~12 misspelled 
> words.  We have "spellcheck.count" set to 15. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?

2011-06-29 Thread Digy
Hi Scott,
Please avoid crossposting (as I am doing now). When I reply to your eMail, it
goes to only one of the lists and the thread is split into two.
It may be good for announcements but not for discussions.

DIGY

-Original Message-
From: Scott Lombard [mailto:lombardena...@gmail.com] 
Sent: Wednesday, June 29, 2011 9:58 PM
To: lucene-net-...@lucene.apache.org; lucene-net-u...@lucene.apache.org
Subject: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?

 

After the large community response about moving the code base from .Net 2.0
to .Net 4.0, I am trying to figure out what the need is for a line-by-line
port.  Starting with Digy's excellent work on the conversion to generics, a
priority of the 2.9.4g release, the two packages would not be
interchangeable.  So faster turnaround from a Java release won't matter to
non line-by-line users; they will have to wait until the updates are made to
the non line-by-line code base.

 

My question is: is there really a user base for the line-by-line port?  Anyone
have a comment?

 

Scott

 

  

 




RE: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?

2011-06-29 Thread Scott Lombard
When I look at the goals of Lucene.Net I am trying to understand what is
more important to Lucene.Net users, .NET functionality or a line-for-line
port.

.NET and Java are close but not the same.  In the past, when given the choice
between a better .NET way or staying with the Java implementation, the project
chose to keep the Java implementation.  If users don't care that it is a
line-for-line port, then contributors will have more freedom to use a better
.NET way, while keeping functionality and index compatibility.

As contributors we can figure out how to get from the Java code to Lucene.Net.
This will probably be an automated tool, but the source that the tool
outputs wouldn't need to be highly polished or even compile.  The primary
purpose would be to simplify the process of getting from Java to .NET for a
release.


Scott


> -Original Message-
> From: Michael Herndon [mailto:mhern...@wickedsoftware.net]
> Sent: Wednesday, June 29, 2011 4:17 PM
> To: lucene-net-...@lucene.apache.org
> Subject: Re: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?
> 
> For the sake of continued conversation, Scott could you define what you
> mean
> by line-by-line port vs non-line-by-line port since technically you're the
> thread starter?
> 
> 
> 
> 
> 
> 
> 
> On Wed, Jun 29, 2011 at 3:58 PM, Digy  wrote:
> 
> > As a Lucene.Net user I wouldn't care whether it is a line-by-line port or
> > not.
> >
> > But as a contributor, I would prefer a parallel code base that makes life
> > easier for manual ports of new releases (until this process is automated).
> >
> > PS: I presume no one thinks of functional or index-level
> incompatibility.
> >
> > DIGY
> >
> > -Original Message-
> > From: Granroth, Neal V. [mailto:neal.granr...@thermofisher.com]
> > Sent: Wednesday, June 29, 2011 10:47 PM
> > To: lucene-net-u...@lucene.apache.org
> > Cc: lucene-net-...@incubator.apache.org
> > Subject: RE: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?
> >
> > This has been discussed many times.
> > Lucene.NET is not valid and the code cannot be trusted if it is not a
> > line-by-line port.  It ceases to be Lucene.
> >
> > - Neal
> >
> > -Original Message-
> > From: Scott Lombard [mailto:lombardena...@gmail.com]
> > Sent: Wednesday, June 29, 2011 1:58 PM
> > To: lucene-net-...@lucene.apache.org; lucene-net-u...@lucene.apache.org
> > Subject: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?
> >
> >
> >
> > After the large community response about moving the code base from .Net
> 2.0
> > to Net 4.0 I am trying to figure out what is the need for a line-by-line
> > port.  Starting with Digy's excellent work on the conversion to generics
> a
> > priority of the 2.9.4g release is the 2 packages would not be
> > interchangeable.  So faster turnaround from a java release won't matter
> to
> > non line-by-line users they will have to wait until the updates are made
> to
> > the non line-by-line code base.
> >
> >
> >
> > My question is: is there really a user base for the line-by-line port?
> Anyone
> > have a comment?
> >
> >
> >
> > Scott
> >
> >
> >
> >
> >
> >
> >
> >



[jira] [Updated] (SOLR-2625) NPE in Term Vector Components

2011-06-29 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated SOLR-2625:
--

Attachment: SOLR-2625.patch

phew, this is not the nicest part of the lucene / solr codebase. Here is a 
patch that triggers the bug in a testcase, an easy fix and some more cleanups 
along the way. 

I will look closer into this tomorrow


> NPE in Term Vector Components
> -
>
> Key: SOLR-2625
> URL: https://issues.apache.org/jira/browse/SOLR-2625
> Project: Solr
>  Issue Type: Bug
>  Components: SearchComponents - other
>Affects Versions: 3.2
>Reporter: Daniel Erenrich
>Assignee: Simon Willnauer
>Priority: Minor
> Attachments: SOLR-2625.patch
>
>
> It seems this bug was first noted here 
> http://lucene.472066.n3.nabble.com/NullPointerException-with-TermVectorComponent-td504361.html
> It still is present in the current version.
> tv.tf_idf=true -> NPE and tv.all=true
> The query: tv.tf_idf=true&q=user:39699693
> The error: 
> HTTP ERROR 500
> Problem accessing /solr/select/tvrh/. Reason:
> null
> java.lang.NullPointerException
>   at 
> org.apache.solr.handler.component.TermVectorComponent$TVMapper.getDocFreq(TermVectorComponent.java:337)
>   at 
> org.apache.solr.handler.component.TermVectorComponent$TVMapper.map(TermVectorComponent.java:330)
>   at 
> org.apache.lucene.index.TermVectorsReader.readTermVector(TermVectorsReader.java:513)
>   at 
> org.apache.lucene.index.TermVectorsReader.readTermVectors(TermVectorsReader.java:396)
>   at 
> org.apache.lucene.index.TermVectorsReader.get(TermVectorsReader.java:373)
>   at 
> org.apache.lucene.index.SegmentReader.getTermFreqVector(SegmentReader.java:786)
>   at 
> org.apache.lucene.index.DirectoryReader.getTermFreqVector(DirectoryReader.java:525)
>   at 
> org.apache.solr.search.SolrIndexReader.getTermFreqVector(SolrIndexReader.java:245)
>   at 
> org.apache.solr.handler.component.TermVectorComponent.process(TermVectorComponent.java:225)
>   at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>   at 
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>   at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>   at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>   at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>   at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>   at 
> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>   at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>   at org.mortbay.jetty.Server.handle(Server.java:326)
>   at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>   at 
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>   at 
> org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
>   at 
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> It works just fine if I do: tv.all=true&q=user:39699693

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-2625) NPE in Term Vector Components

2011-06-29 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer reassigned SOLR-2625:
-

Assignee: Simon Willnauer

> NPE in Term Vector Components
> -
>
> Key: SOLR-2625
> URL: https://issues.apache.org/jira/browse/SOLR-2625
> Project: Solr
>  Issue Type: Bug
>  Components: SearchComponents - other
>Affects Versions: 3.2
>Reporter: Daniel Erenrich
>Assignee: Simon Willnauer
>Priority: Minor
> Attachments: SOLR-2625.patch
>
>
> It seems this bug was first noted here 
> http://lucene.472066.n3.nabble.com/NullPointerException-with-TermVectorComponent-td504361.html
> It still is present in the current version.
> tv.tf_idf=true -> NPE and tv.all=true
> The query: tv.tf_idf=true&q=user:39699693
> The error: 
> HTTP ERROR 500
> Problem accessing /solr/select/tvrh/. Reason:
> null
> java.lang.NullPointerException
>   at 
> org.apache.solr.handler.component.TermVectorComponent$TVMapper.getDocFreq(TermVectorComponent.java:337)
>   at 
> org.apache.solr.handler.component.TermVectorComponent$TVMapper.map(TermVectorComponent.java:330)
>   at 
> org.apache.lucene.index.TermVectorsReader.readTermVector(TermVectorsReader.java:513)
>   at 
> org.apache.lucene.index.TermVectorsReader.readTermVectors(TermVectorsReader.java:396)
>   at 
> org.apache.lucene.index.TermVectorsReader.get(TermVectorsReader.java:373)
>   at 
> org.apache.lucene.index.SegmentReader.getTermFreqVector(SegmentReader.java:786)
>   at 
> org.apache.lucene.index.DirectoryReader.getTermFreqVector(DirectoryReader.java:525)
>   at 
> org.apache.solr.search.SolrIndexReader.getTermFreqVector(SolrIndexReader.java:245)
>   at 
> org.apache.solr.handler.component.TermVectorComponent.process(TermVectorComponent.java:225)
>   at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>   at 
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>   at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>   at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>   at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>   at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>   at 
> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>   at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>   at org.mortbay.jetty.Server.handle(Server.java:326)
>   at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>   at 
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>   at 
> org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
>   at 
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> It works just fine if I do: tv.all=true&q=user:39699693

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3079) Faceting module

2011-06-29 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057468#comment-13057468
 ] 

Robert Muir commented on LUCENE-3079:
-

Committed revision 1141246.

I think we should close this issue soon, and open followup issues?
Maybe just start with a separate issue for the documentation guide?

> Faceting module
> ---
>
> Key: LUCENE-3079
> URL: https://issues.apache.org/jira/browse/LUCENE-3079
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Michael McCandless
>Assignee: Shai Erera
> Fix For: 3.4, 4.0
>
> Attachments: LUCENE-3079-dev-tools.patch, LUCENE-3079.patch, 
> LUCENE-3079.patch, LUCENE-3079.patch, LUCENE-3079.patch, 
> LUCENE-3079_4x.patch, LUCENE-3079_4x_broken.patch, TestPerformanceHack.java, 
> facet-userguide.pdf
>
>
> Faceting is a hugely important feature, available in Solr today but
> not [easily] usable by Lucene-only apps.
> We should fix this, by creating a shared faceting module.
> Ideally, we factor out Solr's faceting impl, and maybe poach/merge
> from other impls (eg Bobo browse).
> Hoss describes some important challenges we'll face in doing this
> (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here:
> {noformat}
> To look at "faceting" as a concrete example, there are big the reasons 
> faceting works so well in Solr: Solr has total control over the 
> index, knows exactly when the index has changed to rebuild caches, has a 
> strict schema so it can make sense of field types and 
> pick faceting algos accordingly, has multi-phase distributed search 
> approach to get exact counts efficiently across multiple shards, etc...
> (and there are still a lot of additional enhancements and improvements 
> that can be made to take even more advantage of knowledge solr has because 
> it "owns" the index that we no one has had time to tackle)
> {noformat}
> This is a great list of the things we face in refactoring.  It's also
> important because, if Solr needed to be so deeply intertwined with
> caching, schema, etc., other apps that want to facet will have the
> same "needs" and so we really have to address them in creating the
> shared module.
> I think we should get a basic faceting module started, but should not
> cut Solr over at first.  We should iterate on the module, fold in
> improvements, etc., and then, once we can fully verify that cutting
> over doesn't hurt Solr (ie lose functionality or performance) we can
> later cutover.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: [VOTE] Drop Java 5 support for trunk (Lucene 4.0)

2011-06-29 Thread Uwe Schindler
DONE. Now let's update build.xmls!

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: Simon Willnauer [mailto:simon.willna...@googlemail.com]
> Sent: Wednesday, June 29, 2011 10:37 PM
> To: dev@lucene.apache.org
> Subject: Re: [VOTE] Drop Java 5 support for trunk (Lucene 4.0)
> 
> I think this vote has passed. Let's move to 1.6!
> Uwe can you update Jenkins?
> 
> Simon
> 
> On Tue, Jun 28, 2011 at 11:30 PM, DM Smith 
> wrote:
> > +1 from old-stick-in-the-mud, whose vote does not count :)
> >
> > BTW, today Apple released Java 1.5.0_30. So while Oracle has not
> > supplied security updates or bug fixes to 1.5 since Nov 2009, except
> > to premier customers, Apple is still actively supporting it for OS X 10.5,
> Leopard.
> >
> > On 06/28/2011 10:48 AM, Robert Muir wrote:
> >>
> >> +1!
> >>
> >> but, who are you and what have you done with our generics policeman?!
> >> :)
> >>
> >> On Mon, Jun 27, 2011 at 1:45 PM, Uwe
> Schindler  wrote:
> >>>
> >>> My +1 for trunk :-)
> >>>
> >>> I will change hudson scripts once this vote passes!
> >>>
> >>> -
> >>> Uwe Schindler
> >>> H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de
> >>> eMail: u...@thetaphi.de
> >>>
> >>>
>  -Original Message-
>  From: Simon Willnauer [mailto:simon.willna...@googlemail.com]
>  Sent: Monday, June 27, 2011 7:38 PM
>  To: dev@lucene.apache.org
>  Subject: [VOTE] Drop Java 5 support for trunk (Lucene 4.0)
> 
>  This issue has been discussed on various occasions and lately on
>  LUCENE-3239 (https://issues.apache.org/jira/browse/LUCENE-3239)
> 
>  The main reasons for this have been discussed on the issue but let me
>  put
>  them out here too:
> 
>  - Lack of testing on Jenkins with Java 5
>  - Java 5 end of life was reached a long time ago, so Java 5 is totally
>  unmaintained which means for us that bugs have to either be hacked
>  around, tests disabled, warnings placed, but some things simply cannot
>  be
>  fixed... we cannot actually "support" something that is no longer
>  maintained:
>  we do find JRE bugs
>  (http://wiki.apache.org/lucene-java/SunJavaBugs) and its important
> that
>  bugs actually get fixed: cannot do everything with hacks.\
>  - due to Java 5 we get legitimate performance hits like 20% slower grouping
>  speed.
> 
>  For reference please read through the issue mentioned above.
> 
>  A lot of the committers seem to be on the same page here to drop Java
>  5 support so I am calling out an official vote.
> 
>  all Lucene 3.x releases will remain with Java 5 support this vote is for
>  trunk
>  only.
> 
> 
>  Here is my +1
> 
>  Simon
> 
>  -
>  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
> additional
>  commands, e-mail: dev-h...@lucene.apache.org
> >>>
> >>>
> >>> -
> >>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> >>> For additional commands, e-mail: dev-h...@lucene.apache.org
> >>>
> >>>
> >> -
> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: dev-h...@lucene.apache.org
> >>
> >
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: dev-h...@lucene.apache.org
> >
> >
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: setuptools not really setuptools on Ubuntu

2011-06-29 Thread Andi Vajda

On Jun 29, 2011, at 22:17, Christian Heimes  wrote:

> Am 29.06.2011 18:13, schrieb Andi Vajda:
>> Sigh. The setuptools story is getting worse. I wonder how the 'distribute' 
>> project is doing... It's the solution I used for the Python 3.1 jcc port I 
>> did last summer. In particular, I wonder if they integrated my patch, for 
>> that issue 43 I filed like four years ago.
>> 
>> Do you know if there is a way to detect this special version of setuptools ?
>> If so, I could ensure the patch is applied if still needed.
> 
> 'They' for setuptools is really just P.J. Eby. There hasn't been any
> serious development on setuptools in the past few years. Luckily Tarek
> has forked setuptools and started his work on distribute and distutils2.
> He is a very active developer and IMHO open to new ideas. Have you
> talked to him about the requirements for JCC? I'm sure he is going to
> integrate your patch soonish.

That would be great. I think I filed the equivalent of setuptools issue 43 on 
distribute a year or two ago.

> There isn't a reason to support vanilla
> setuptools anymore once the patch is part of distribute. distribute is
> fully backward compatible with setuptools.

Yep, that would be perfect. 

Andi..

> 
> See http://pypi.python.org/pypi/distribute#about-the-fork for some
> background information.
> 
> Christian
> 


Re: calling a Python function from Java?

2011-06-29 Thread Andi Vajda

On Jun 29, 2011, at 22:06, Bill Janssen  wrote:

> Andi Vajda  wrote:
> 
>> Put everything into a class and call all the python stuff from there. 
> 
> I'd like to make the method on the Java class be static, so I'd like
> that method to create an instance and call a protected or
> package-private method that is implemented by the Python class.  But
> JCC doesn't seem to wrap non-public or static methods...?

Jcc wraps all public methods whose signature contains only classes or types in 
the set of classes to be wrapped, including static ones. 
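
For illustration, a minimal hypothetical example of such a class (names are made up, not from JCC's docs): a public method with a String-only signature, static or not, falls under that rule, while a package-private method does not.

public class Escaper {
    // public static method whose signature uses only wrapped types: exposed by JCC
    public static String escapeBackslashes(String raw) {
        return raw.replace("\\", "\\\\");
    }

    // package-private method: not public, so not wrapped
    String internalHelper(String raw) {
        return raw.trim();
    }
}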

Andi..

> 
> Bill


[jira] [Updated] (LUCENE-3079) Faceting module

2011-06-29 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3079:


Attachment: LUCENE-3079.patch

updated patch: all tests pass.

I changed the nocommits to TODO (Facet):'s, and added verbiage giving the reason for
each one.

we also have two TODOs for two bugs (The MTE.seekExact and SepCodec hasPayload 
bug) that we should fix, but currently we have workarounds in place (when we 
fix these bugs we can then remove the workarounds).

I'll svn move to modules, and doublecheck things like javadoc warnings, and 
commit later today.

> Faceting module
> ---
>
> Key: LUCENE-3079
> URL: https://issues.apache.org/jira/browse/LUCENE-3079
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Michael McCandless
>Assignee: Shai Erera
> Fix For: 3.4, 4.0
>
> Attachments: LUCENE-3079-dev-tools.patch, LUCENE-3079.patch, 
> LUCENE-3079.patch, LUCENE-3079.patch, LUCENE-3079.patch, 
> LUCENE-3079_4x.patch, LUCENE-3079_4x_broken.patch, TestPerformanceHack.java, 
> facet-userguide.pdf
>
>
> Faceting is a hugely important feature, available in Solr today but
> not [easily] usable by Lucene-only apps.
> We should fix this, by creating a shared faceting module.
> Ideally, we factor out Solr's faceting impl, and maybe poach/merge
> from other impls (eg Bobo browse).
> Hoss describes some important challenges we'll face in doing this
> (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here:
> {noformat}
> To look at "faceting" as a concrete example, there are big the reasons 
> faceting works so well in Solr: Solr has total control over the 
> index, knows exactly when the index has changed to rebuild caches, has a 
> strict schema so it can make sense of field types and 
> pick faceting algos accordingly, has multi-phase distributed search 
> approach to get exact counts efficiently across multiple shards, etc...
> (and there are still a lot of additional enhancements and improvements 
> that can be made to take even more advantage of knowledge solr has because 
> it "owns" the index that we no one has had time to tackle)
> {noformat}
> This is a great list of the things we face in refactoring.  It's also
> important because, if Solr needed to be so deeply intertwined with
> caching, schema, etc., other apps that want to facet will have the
> same "needs" and so we really have to address them in creating the
> shared module.
> I think we should get a basic faceting module started, but should not
> cut Solr over at first.  We should iterate on the module, fold in
> improvements, etc., and then, once we can fully verify that cutting
> over doesn't hurt Solr (ie lose functionality or performance) we can
> later cutover.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [VOTE] Drop Java 5 support for trunk (Lucene 4.0)

2011-06-29 Thread Simon Willnauer
I think this vote has passed. Let's move to 1.6!
Uwe can you update Jenkins?

Simon

On Tue, Jun 28, 2011 at 11:30 PM, DM Smith  wrote:
> +1 from old-stick-in-the-mud, whose vote does not count :)
>
> BTW, today Apple released Java 1.5.0_30. So while Oracle has not supplied
> security updates or bug fixes to 1.5 since Nov 2009, except to premier
> customers, Apple is still actively supporting it for OS X 10.5, Leopard.
>
> On 06/28/2011 10:48 AM, Robert Muir wrote:
>>
>> +1!
>>
>> but, who are you and what have you done with our generics policeman?! :)
>>
>> On Mon, Jun 27, 2011 at 1:45 PM, Uwe Schindler  wrote:
>>>
>>> My +1 for trunk :-)
>>>
>>> I will change hudson scripts once this vote passes!
>>>
>>> -
>>> Uwe Schindler
>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>> http://www.thetaphi.de
>>> eMail: u...@thetaphi.de
>>>
>>>
 -Original Message-
 From: Simon Willnauer [mailto:simon.willna...@googlemail.com]
 Sent: Monday, June 27, 2011 7:38 PM
 To: dev@lucene.apache.org
 Subject: [VOTE] Drop Java 5 support for trunk (Lucene 4.0)

 This issue has been discussed on various occasions and lately on
 LUCENE-3239 (https://issues.apache.org/jira/browse/LUCENE-3239)

 The main reasons for this have been discussed on the issue but let me
 put
 them out here too:

 - Lack of testing on Jenkins with Java 5
 - Java 5 reached end of life a long time ago, so Java 5 is totally
 unmaintained. For us that means bugs have to either be hacked
 around, tests disabled, or warnings placed, but some things simply cannot be
 fixed... we cannot actually "support" something that is no longer
 maintained: we do find JRE bugs
 (http://wiki.apache.org/lucene-java/SunJavaBugs) and it's important that
 bugs actually get fixed: we cannot do everything with hacks.
 - due to Java 5 we get legitimate performance hits like 20% slower grouping
 speed.

 For reference please read through the issue mentioned above.

 A lot of the committers seem to be on the same page here to drop Java
 5 support so I am calling out an official vote.

 all Lucene 3.x releases will remain with Java 5 support this vote is for
 trunk
 only.


 Here is my +1

 Simon

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
 commands, e-mail: dev-h...@lucene.apache.org
>>>
>>>
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>
>>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[Lucene.Net] [jira] [Closed] (LUCENENET-428) How to make the results display the original tokens first and then the ones with synonyms?

2011-06-29 Thread Digy (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENENET-428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Digy closed LUCENENET-428.
--

Resolution: Invalid

Please post questions to the mailing list, not in JIRA

> How to make the results display the original tokens first and then the ones 
> with synonyms?
> -
>
> Key: LUCENENET-428
> URL: https://issues.apache.org/jira/browse/LUCENENET-428
> Project: Lucene.Net
>  Issue Type: Wish
>  Components: Lucene.Net Core
>Affects Versions: Lucene.Net 2.9.4
> Environment: .net 4.0
>Reporter: Vladimir
>
> How to make the results display the original tokens first and then the ones 
> with synonyms?
> My Analyzer(part) :
> public override TokenStream TokenStream(string fieldName, TextReader reader)
> {
> TokenStream result = new StandardTokenizer(reader);
> result = new LowerCaseFilter(result);
>   result = new StopFilter(result, stoptable);
> result = new SynonymFilter(result, synonymEngine); 
> result = new ExtendedRussianStemFilter(result, charset);
> return result;
> }
> My SynonymFilter :
> internal class SynonymFilter : TokenFilter
> {
> private readonly ISynonymEngine engine;
> private readonly Queue<Token> synonymTokenQueue
> = new Queue<Token>();
> public SynonymFilter(TokenStream tokenStream, ISynonymEngine engine) 
> : base(tokenStream)
> {
> this.engine = engine;
> }
> public override Token Next()
> {
> if (synonymTokenQueue.Count > 0)
> {
> return synonymTokenQueue.Dequeue();
> }
> 
> Token t = input.Next();
> 
> if (t == null)
> return null;
> if (t.Type() == "")
> return t;
> 
> IEnumerable<string> synonyms = engine.GetSynonyms(t.TermText());
> 
> if (synonyms == null)
> {
> return t;
> }
> 
> foreach (string syn in synonyms)
> {
> if (!t.TermText().Equals(syn))
> {
> var synToken = new Token(syn, t.StartOffset(),
>  t.EndOffset(), "");
> 
> synToken.SetPositionIncrement(0);
> synonymTokenQueue.Enqueue(synToken);
> }
> }
> return t;
> }
> }
> Thanks!

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (LUCENE-3258) File leak when IOException occurs during index optimization.

2011-06-29 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057448#comment-13057448
 ] 

Uwe Schindler commented on LUCENE-3258:
---

I don't think "won't" fix is the correct "resolution". It's "fixed in 3.3", 
right?

> File leak when IOException occurs during index optimization.
> 
>
> Key: LUCENE-3258
> URL: https://issues.apache.org/jira/browse/LUCENE-3258
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 3.0.3
> Environment: SUSE Linux 11, Java 6
>Reporter: Nick Kirsch
>
> I am not sure if this issue requires a fix due to the nature of its 
> occurrence, or if it exists in other versions of Lucene.
> I am using Lucene Java 3.0.3 on a SUSE Linux machine with Java 6 and have 
> noticed there are a number of file handles that are not being released from 
> my java application. There are IOExceptions in my log regarding disk full, 
> which causes a merge and the optimization to fail. The index is not corrupt 
> upon encountering the IOException. I am using CFS for my index format, so 3X 
> my largest index size during optimization certainly consumes all of my 
> available disk. 
> I realize that I need to add more disk space to my machine, but I 
> investigated how to clean up the leaking file handles. After failing to find 
> a misuse of Lucene's IndexWriter in the code I have wrapping Lucene, I did a 
> quick search for close() being invoked in the Lucene Java source code. I 
> found a number of source files that attempt to close more than one object 
> within the same close() method. I think a try/catch should be put around each 
> of these close() attempts to avoid skipping subsequent closes. The catch 
> may be able to ignore a caught exception to avoid masking the original 
> exception like done in SimpleFSDirectory.close().
> Locations in Lucene Java source where I suggest a try/catch should be used:
> - org.apache.lucene.index.FormatPostingFieldsWriter.finish()
> - org.apache.lucene.index.TermInfosWriter.close()
> - org.apache.lucene.index.SegmentTermPositions.close()
> - org.apache.lucene.index.SegmentMergeInfo.close()
> - org.apache.lucene.index.SegmentMerger.mergeTerms() (The finally block)
> - org.apache.lucene.index.DirectoryReader.close()
> - org.apache.lucene.index.FieldsReader.close()
> - org.apache.lucene.index.MultiLevelSkipListReader.close()
> - org.apache.lucene.index.MultipleTermPositions.close()
> - org.apache.lucene.index.SegmentMergeQueue.close()
> - org.apache.lucene.index.SegmentMergeDocs.close()
> - org.apache.lucene.index.TermInfosReader.close()

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: setuptools not really setuptools on Ubuntu

2011-06-29 Thread Christian Heimes
Am 29.06.2011 18:13, schrieb Andi Vajda:
> Sigh. The setuptools story is getting worse. I wonder how the 'distribute' 
> project is doing... It's the solution I used for the Python 3.1 jcc port I 
> did last summer. In particular, I wonder if they integrated my patch, for 
> that issue 43 I filed like four years ago.
> 
> Do you know if there is a way to detect this special version of setuptools ?
> If so, I could ensure the patch is applied if still needed.

'They' for setuptools is really just P.J. Eby. There hasn't been any
serious development on setuptools in the past few years. Luckily Tarek
has forked setuptools and started his work on distribute and distutils2.
He is a very active developer and IMHO open to new ideas. Have you
talked to him about the requirements for JCC? I'm sure he is going to
integrate your patch soonish. There isn't a reason to support vanilla
setuptools anymore once the patch is part of distribute. distribute is
fully backward compatible with setuptools.

See http://pypi.python.org/pypi/distribute#about-the-fork for some
background information.

Christian




Re: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?

2011-06-29 Thread Michael Herndon
For the sake of continued conversation, Scott could you define what you mean
by a line-by-line port vs a non-line-by-line port, since technically you're the
thread starter?







On Wed, Jun 29, 2011 at 3:58 PM, Digy  wrote:

> As a Lucene.Net user I wouldn't care whether it is a line-by-line port or
> not.
>
> But as a contributor, I would prefer parallel code that makes life
> easier for manual ports of new releases (until this process is automated).
>
> PS: I presume no one thinks of functional or index-level incompatibility.
>
> DIGY
>
> -Original Message-
> From: Granroth, Neal V. [mailto:neal.granr...@thermofisher.com]
> Sent: Wednesday, June 29, 2011 10:47 PM
> To: lucene-net-u...@lucene.apache.org
> Cc: lucene-net-...@incubator.apache.org
> Subject: RE: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?
>
> This has been discussed many times.
> Lucene.NET is not valid, the code cannot be trusted, if it is not a
> line-by-line port.  It ceases to be Lucene.
>
> - Neal
>
> -Original Message-
> From: Scott Lombard [mailto:lombardena...@gmail.com]
> Sent: Wednesday, June 29, 2011 1:58 PM
> To: lucene-net-...@lucene.apache.org; lucene-net-u...@lucene.apache.org
> Subject: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?
>
>
>
> After the large community response about moving the code base from .Net 2.0
> to .Net 4.0, I am trying to figure out what the need is for a line-by-line
> port.  Starting with Digy's excellent work on the conversion to generics, a
> priority of the 2.9.4g release is that the 2 packages would not be
> interchangeable.  So faster turnaround from a Java release won't matter to
> non line-by-line users; they will have to wait until the updates are made to
> the non line-by-line code base.
>
>
>
> My question is: is there really a user base for the line-by-line port?  Anyone
> have a comment?
>
>
>
> Scott
>
>
>
>
>
>
>
>


[jira] [Commented] (LUCENE-3079) Faceting module

2011-06-29 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057445#comment-13057445
 ] 

Shai Erera commented on LUCENE-3079:


Thanks guys for doing this port so quickly. The patch looks good. I suggest 
that we change the 'nocommit' to TODO (Facet): and commit it (under modules/). 
Then we can iterate on the TODOs and resolve them one by one, in followup 
issues. Makes sense?

Robert, would you like to do the honors? :)

> Faceting module
> ---
>
> Key: LUCENE-3079
> URL: https://issues.apache.org/jira/browse/LUCENE-3079
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Michael McCandless
>Assignee: Shai Erera
> Fix For: 3.4, 4.0
>
> Attachments: LUCENE-3079-dev-tools.patch, LUCENE-3079.patch, 
> LUCENE-3079.patch, LUCENE-3079.patch, LUCENE-3079_4x.patch, 
> LUCENE-3079_4x_broken.patch, TestPerformanceHack.java, facet-userguide.pdf
>
>
> Faceting is a hugely important feature, available in Solr today but
> not [easily] usable by Lucene-only apps.
> We should fix this, by creating a shared faceting module.
> Ideally, we factor out Solr's faceting impl, and maybe poach/merge
> from other impls (eg Bobo browse).
> Hoss describes some important challenges we'll face in doing this
> (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here:
> {noformat}
> To look at "faceting" as a concrete example, there are big the reasons 
> faceting works so well in Solr: Solr has total control over the 
> index, knows exactly when the index has changed to rebuild caches, has a 
> strict schema so it can make sense of field types and 
> pick faceting algos accordingly, has multi-phase distributed search 
> approach to get exact counts efficiently across multiple shards, etc...
> (and there are still a lot of additional enhancements and improvements 
> that can be made to take even more advantage of knowledge solr has because 
> it "owns" the index that we no one has had time to tackle)
> {noformat}
> This is a great list of the things we face in refactoring.  It's also
> important because, if Solr needed to be so deeply intertwined with
> caching, schema, etc., other apps that want to facet will have the
> same "needs" and so we really have to address them in creating the
> shared module.
> I think we should get a basic faceting module started, but should not
> cut Solr over at first.  We should iterate on the module, fold in
> improvements, etc., and then, once we can fully verify that cutting
> over doesn't hurt Solr (ie lose functionality or performance) we can
> later cutover.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: calling a Python function from Java?

2011-06-29 Thread Bill Janssen
Andi Vajda  wrote:

> Put everything into a class and call all the python stuff from there. 

I'd like to make the method on the Java class be static, so I'd like
that method to create an instance and call a protected or
package-private method that is implemented by the Python class.  But
JCC doesn't seem to wrap non-public or static methods...?

Bill


RE: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?

2011-06-29 Thread Digy
As a Lucene.Net user I wouldn't care whether it is a line-by-line port or not.

But as a contributor, I would prefer parallel code that makes life
easier for manual ports of new releases (until this process is automated).

PS: I presume no one thinks of functional or index-level incompatibility.

DIGY

-Original Message-
From: Granroth, Neal V. [mailto:neal.granr...@thermofisher.com] 
Sent: Wednesday, June 29, 2011 10:47 PM
To: lucene-net-u...@lucene.apache.org
Cc: lucene-net-...@incubator.apache.org
Subject: RE: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?

This has been discussed many times.
Lucene.NET is not valid, the code cannot be trusted, if it is not a
line-by-line port.  It ceases to be Lucene.

- Neal

-Original Message-
From: Scott Lombard [mailto:lombardena...@gmail.com] 
Sent: Wednesday, June 29, 2011 1:58 PM
To: lucene-net-...@lucene.apache.org; lucene-net-u...@lucene.apache.org
Subject: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?

 

After the large community response about moving the code base from .Net 2.0
to .Net 4.0, I am trying to figure out what the need is for a line-by-line
port.  Starting with Digy's excellent work on the conversion to generics, a
priority of the 2.9.4g release is that the 2 packages would not be
interchangeable.  So faster turnaround from a Java release won't matter to
non line-by-line users; they will have to wait until the updates are made to
the non line-by-line code base.

 

My question is: is there really a user base for the line-by-line port?  Anyone
have a comment?

 

Scott

 

  

 



Re: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?

2011-06-29 Thread Wyatt Barnett
Those are pretty strong words -- I'd really like to know why I
shouldn't trust anything but a line-by-line port. Can you explain a
bit?

On Wed, Jun 29, 2011 at 3:47 PM, Granroth, Neal V.
 wrote:
> This has been discussed many times.
> Lucene.NET is not valid, the code cannot be trusted, if it is not a 
> line-by-line port.  It ceases to be Lucene.
>
> - Neal
>
> -Original Message-
> From: Scott Lombard [mailto:lombardena...@gmail.com]
> Sent: Wednesday, June 29, 2011 1:58 PM
> To: lucene-net-...@lucene.apache.org; lucene-net-u...@lucene.apache.org
> Subject: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?
>
>
>
> After the large community response about moving the code base from .Net 2.0
> to .Net 4.0, I am trying to figure out what the need is for a line-by-line
> port.  Starting with Digy's excellent work on the conversion to generics, a
> priority of the 2.9.4g release is that the 2 packages would not be
> interchangeable.  So faster turnaround from a Java release won't matter to
> non line-by-line users; they will have to wait until the updates are made to
> the non line-by-line code base.
>
>
>
> My question is: is there really a user base for the line-by-line port?  Anyone
> have a comment?
>
>
>
> Scott
>
>
>
>
>
>
>
>


RE: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?

2011-06-29 Thread Granroth, Neal V.
This has been discussed many times.
Lucene.NET is not valid, the code cannot be trusted, if it is not a 
line-by-line port.  It ceases to be Lucene.

- Neal

-Original Message-
From: Scott Lombard [mailto:lombardena...@gmail.com] 
Sent: Wednesday, June 29, 2011 1:58 PM
To: lucene-net-...@lucene.apache.org; lucene-net-u...@lucene.apache.org
Subject: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?

 

After the large community response about moving the code base from .Net 2.0
to .Net 4.0, I am trying to figure out what the need is for a line-by-line
port.  Starting with Digy's excellent work on the conversion to generics, a
priority of the 2.9.4g release is that the 2 packages would not be
interchangeable.  So faster turnaround from a Java release won't matter to
non line-by-line users; they will have to wait until the updates are made to
the non line-by-line code base.

 

My question is: is there really a user base for the line-by-line port?  Anyone
have a comment?

 

Scott

 

  

 



[jira] [Commented] (LUCENE-3079) Faceting module

2011-06-29 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057428#comment-13057428
 ] 

Michael McCandless commented on LUCENE-3079:


bq. the previous fail is somehow a bug in memorycodec (the seed randomly 
selected it)

I just committed a fix for this; it was because .getPayload() in MemoryCodec 
was (incorrectly) assuming caller did not change the .bytes of the returned 
BytesRef between calls.
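
As a general illustration of the pitfall (a hypothetical helper, not the actual MemoryCodec fix): if a returned BytesRef may be reused or have its array swapped between calls, copy out the bytes you need rather than keeping a reference to its array.

import org.apache.lucene.util.BytesRef;

final class BytesRefCopy {
  // Copy the current contents out of a (possibly reused) BytesRef so that later
  // changes to its 'bytes' array cannot corrupt what was saved.
  static byte[] saveBytes(BytesRef ref) {
    byte[] copy = new byte[ref.length];
    System.arraycopy(ref.bytes, ref.offset, copy, 0, ref.length);
    return copy;
  }
}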

> Faceting module
> ---
>
> Key: LUCENE-3079
> URL: https://issues.apache.org/jira/browse/LUCENE-3079
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Michael McCandless
>Assignee: Shai Erera
> Fix For: 3.4, 4.0
>
> Attachments: LUCENE-3079-dev-tools.patch, LUCENE-3079.patch, 
> LUCENE-3079.patch, LUCENE-3079.patch, LUCENE-3079_4x.patch, 
> LUCENE-3079_4x_broken.patch, TestPerformanceHack.java, facet-userguide.pdf
>
>
> Faceting is a hugely important feature, available in Solr today but
> not [easily] usable by Lucene-only apps.
> We should fix this, by creating a shared faceting module.
> Ideally, we factor out Solr's faceting impl, and maybe poach/merge
> from other impls (eg Bobo browse).
> Hoss describes some important challenges we'll face in doing this
> (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here:
> {noformat}
> To look at "faceting" as a concrete example, there are big the reasons 
> faceting works so well in Solr: Solr has total control over the 
> index, knows exactly when the index has changed to rebuild caches, has a 
> strict schema so it can make sense of field types and 
> pick faceting algos accordingly, has multi-phase distributed search 
> approach to get exact counts efficiently across multiple shards, etc...
> (and there are still a lot of additional enhancements and improvements 
> that can be made to take even more advantage of knowledge solr has because 
> it "owns" the index that we no one has had time to tackle)
> {noformat}
> This is a great list of the things we face in refactoring.  It's also
> important because, if Solr needed to be so deeply intertwined with
> caching, schema, etc., other apps that want to facet will have the
> same "needs" and so we really have to address them in creating the
> shared module.
> I think we should get a basic faceting module started, but should not
> cut Solr over at first.  We should iterate on the module, fold in
> improvements, etc., and then, once we can fully verify that cutting
> over doesn't hurt Solr (ie lose functionality or performance) we can
> later cutover.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2565) Prevent IW#close and cut over to IW#commit

2011-06-29 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057425#comment-13057425
 ] 

Mark Miller commented on SOLR-2565:
---

Okay - looks like this applies and tests pass.

> Prevent IW#close and cut over to IW#commit
> --
>
> Key: SOLR-2565
> URL: https://issues.apache.org/jira/browse/SOLR-2565
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 4.0
>Reporter: Simon Willnauer
> Fix For: 4.0
>
> Attachments: SOLR-2565.patch
>
>
> Spinoff from SOLR-2193. We already have a branch to work on this issue here 
> https://svn.apache.org/repos/asf/lucene/dev/branches/solr2193 
> The main goal here is to prevent solr from closing the IW and use IW#commit 
> instead. AFAIK the main issues here are:
> The update handler needs an overhaul.
> A few goals I think we might want to look at:
> 1. Expose the SolrIndexWriter in the api or add the proper abstractions to 
> get done what we now do with special casing:
> 2. Stop closing the IndexWriter and start using commit (still lazy IW init 
> though).
> 3. Drop iwAccess, iwCommit locks and sync mostly at the Lucene level.
> 4. Address the current issues we face because multiple original/'reloaded' 
> cores can have a different IndexWriter on the same index.
> Eventually this is a preparation for NRT support in Solr which I will create 
> a followup issue for.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3258) File leak when IOException occurs during index optimization.

2011-06-29 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057407#comment-13057407
 ] 

Robert Muir commented on LUCENE-3258:
-

just to follow up, the changes here didn't make it in until Lucene 3.3.0.
This isn't yet released, but should be out any time soon (like within days).

you can try out the release candidate here: http://s.apache.org/lusolr330rc1

furthermore, if you want you can use lucene's test-framework jar in your own 
tests to help you track down any file leaks in your own application, by 
wrapping your directory with MockDirectoryWrapper, or by extending 
LuceneTestCase and using newDirectory() and newFSDirectory().
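
For example, a minimal sketch of such a test (hypothetical code, assuming the 3.x 
test-framework APIs of the time): extending LuceneTestCase and using newDirectory() 
gives you a MockDirectoryWrapper that fails on close() if any index files were left open.

import org.apache.lucene.analysis.MockAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.LuceneTestCase;

public class TestNoFileLeaks extends LuceneTestCase {
  public void testCloseReleasesAllFiles() throws Exception {
    Directory dir = newDirectory();   // wrapped in MockDirectoryWrapper by the test framework
    IndexWriter writer = new IndexWriter(dir,
        newIndexWriterConfig(TEST_VERSION_CURRENT, new MockAnalyzer(random)));
    writer.addDocument(new Document());
    writer.close();
    dir.close();   // throws here if a file handle was leaked
  }
}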

> File leak when IOException occurs during index optimization.
> 
>
> Key: LUCENE-3258
> URL: https://issues.apache.org/jira/browse/LUCENE-3258
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 3.0.3
> Environment: SUSE Linux 11, Java 6
>Reporter: Nick Kirsch
>
> I am not sure if this issue requires a fix due to the nature of its 
> occurrence, or if it exists in other versions of Lucene.
> I am using Lucene Java 3.0.3 on a SUSE Linux machine with Java 6 and have 
> noticed there are a number of file handles that are not being released from 
> my java application. There are IOExceptions in my log regarding disk full, 
> which causes a merge and the optimization to fail. The index is not corrupt 
> upon encountering the IOException. I am using CFS for my index format, so 3X 
> my largest index size during optimization certainly consumes all of my 
> available disk. 
> I realize that I need to add more disk space to my machine, but I 
> investigated how to clean up the leaking file handles. After failing to find 
> a misuse of Lucene's IndexWriter in the code I have wrapping Lucene, I did a 
> quick search for close() being invoked in the Lucene Java source code. I 
> found a number of source files that attempt to close more than one object 
> within the same close() method. I think a try/catch should be put around each 
> of these close() attempts to avoid skipping subsequent closes. The catch 
> may be able to ignore a caught exception to avoid masking the original 
> exception like done in SimpleFSDirectory.close().
> Locations in Lucene Java source where I suggest a try/catch should be used:
> - org.apache.lucene.index.FormatPostingFieldsWriter.finish()
> - org.apache.lucene.index.TermInfosWriter.close()
> - org.apache.lucene.index.SegmentTermPositions.close()
> - org.apache.lucene.index.SegmentMergeInfo.close()
> - org.apache.lucene.index.SegmentMerger.mergeTerms() (The finally block)
> - org.apache.lucene.index.DirectoryReader.close()
> - org.apache.lucene.index.FieldsReader.close()
> - org.apache.lucene.index.MultiLevelSkipListReader.close()
> - org.apache.lucene.index.MultipleTermPositions.close()
> - org.apache.lucene.index.SegmentMergeQueue.close()
> - org.apache.lucene.index.SegmentMergeDocs.close()
> - org.apache.lucene.index.TermInfosReader.close()

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-3025) TestIndexWriterExceptions fails on windows (2)

2011-06-29 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-3025.
-

Resolution: Duplicate

fixed in LUCENE-3147

> TestIndexWriterExceptions fails on windows (2)
> --
>
> Key: LUCENE-3025
> URL: https://issues.apache.org/jira/browse/LUCENE-3025
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
>
> Note: this is a different problem than LUCENE-2991 (I disabled the assert for 
> that problem).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-2991) TestIndexWriterExceptions fails on windows

2011-06-29 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-2991.
-

Resolution: Duplicate

fixed in LUCENE-3147

> TestIndexWriterExceptions fails on windows
> --
>
> Key: LUCENE-2991
> URL: https://issues.apache.org/jira/browse/LUCENE-2991
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
>
> because of a leftover segments file (presumably it cannot be deleted because it's 
> open).
> The real bug is that this doesn't fail on linux too, so there is a problem 
> with
> the tests framework where mockdirectorywrapper doesn't properly simulate 
> windows.
> i've disabled the assertion to unbreak the build for now.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-3258) File leak when IOException occurs during index optimization.

2011-06-29 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera resolved LUCENE-3258.


   Resolution: Won't Fix
Fix Version/s: (was: 3.0.3)

These issues were fixed in LUCENE-3147 and have been released w/ Lucene 3.2.0. 
I don't think we should backport those fixes to the 3.0.x branch, nor do we 
have the test-framework in place there to test them.

> File leak when IOException occurs during index optimization.
> 
>
> Key: LUCENE-3258
> URL: https://issues.apache.org/jira/browse/LUCENE-3258
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 3.0.3
> Environment: SUSE Linux 11, Java 6
>Reporter: Nick Kirsch
>
> I am not sure if this issue requires a fix due to the nature of its 
> occurrence, or if it exists in other versions of Lucene.
> I am using Lucene Java 3.0.3 on a SUSE Linux machine with Java 6 and have 
> noticed there are a number of file handles that are not being released from 
> my java application. There are IOExceptions in my log regarding disk full, 
> which causes a merge and the optimization to fail. The index is not corrupt 
> upon encountering the IOException. I am using CFS for my index format, so 3X 
> my largest index size during optimization certainly consumes all of my 
> available disk. 
> I realize that I need to add more disk space to my machine, but I 
> investigated how to clean up the leaking file handles. After failing to find 
> a misuse of Lucene's IndexWriter in the code I have wrapping Lucene, I did a 
> quick search for close() being invoked in the Lucene Java source code. I 
> found a number of source files that attempt to close more than one object 
> within the same close() method. I think a try/catch should be put around each 
> of these close() attempts to avoid skipping subsequent closes. The catch 
> may be able to ignore a caught exception to avoid masking the original 
> exception like done in SimpleFSDirectory.close().
> Locations in Lucene Java source where I suggest a try/catch should be used:
> - org.apache.lucene.index.FormatPostingFieldsWriter.finish()
> - org.apache.lucene.index.TermInfosWriter.close()
> - org.apache.lucene.index.SegmentTermPositions.close()
> - org.apache.lucene.index.SegmentMergeInfo.close()
> - org.apache.lucene.index.SegmentMerger.mergeTerms() (The finally block)
> - org.apache.lucene.index.DirectoryReader.close()
> - org.apache.lucene.index.FieldsReader.close()
> - org.apache.lucene.index.MultiLevelSkipListReader.close()
> - org.apache.lucene.index.MultipleTermPositions.close()
> - org.apache.lucene.index.SegmentMergeQueue.close()
> - org.apache.lucene.index.SegmentMergeDocs.close()
> - org.apache.lucene.index.TermInfosReader.close()

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: setuptools not really setuptools on Ubuntu

2011-06-29 Thread Andi Vajda
Thank you for the details. 

Andi..

On Jun 29, 2011, at 19:37, Bill Janssen  wrote:

> Bill Janssen  wrote:
> 
>> Andi Vajda  wrote:
>> 
>>> Sigh. The setuptools story is getting worse.
>> 
>>> I wonder how the
>>> 'distribute' project is doing... It's the solution I used for the
>>> Python 3.1 jcc port I did last summer. In particular, I wonder if they
>>> integrated my patch, for that issue 43 I filed like four years ago.
>> 
>> The way forward is "packaging" 
>> (which I believe is also "distutils2").  This is the derivation of
>> "distribute".
> 
> Just watched the PyCon talk on this:  "packaging" is the Python 3.3+ name,
> "distutils2" is the Python 2 name.  Same codebase and APIs, as much as
> possible.
> 
>> See http://guide.python-distribute.org/_images/state_of_packaging.jpg,
>> in 
>> http://guide.python-distribute.org/introduction.html#current-state-of-packaging.
>> 
>> ``So basically, I have forked Distutils and renamed its package into
>> Distutils2. The project is located in http://hg.python.org/distutils2
>> and the goal is to put it back into the standard library as soon as it
>> reaches a state where it starts to be used by the community. Distutils
>> will just die slowly, probably pulling Setuptools and Distribute with
>> it.''
>> 
>> ``The Distribute project is still important because it can help us
>> releasing bug fixes or Python 3 support things today.''
>> 
>> ``Distutils2 will be 2.4 to 3.2 compatible and will get back from
>> Distribute the good bits and implement the PEPs that were accepted
>> lately PEP 345 and PEP 386.''
>> 
>>> Do you know if there is a way to detect this special version of
>>> setuptools?
>> 
>> No, sorry.
>> 
>> Bill
>> 
>>> If so, I could ensure the patch is applied if still needed.
>>> 
>>> Andi..
>>> 
>>> 
 
 Bill


Re: calling a Python function from Java?

2011-06-29 Thread Andi Vajda
Put everything into a class and call all the python stuff from there. 

Andi..

On Jun 29, 2011, at 18:18, Bill Janssen  wrote:

> I'm building a Java wrapper for the Python regex module, and I'd like to
> be able to call the module function "escape" from Java.  It takes a
> string and returns a string.  But I don't see how I can do that given
> the current PythonVM?  Some trick with module instantiation, perhaps?
> Or does the API need to be expanded to make this work?
> 
> Bill


[jira] [Created] (LUCENE-3258) File leak when IOException occurs during index optimization.

2011-06-29 Thread Nick Kirsch (JIRA)
File leak when IOException occurs during index optimization.


 Key: LUCENE-3258
 URL: https://issues.apache.org/jira/browse/LUCENE-3258
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 3.0.3
 Environment: SUSE Linux 11, Java 6
Reporter: Nick Kirsch
 Fix For: 3.0.3


I am not sure if this issue requires a fix due to the nature of its occurrence, 
or if it exists in other versions of Lucene.

I am using Lucene Java 3.0.3 on a SUSE Linux machine with Java 6 and have 
noticed there are a number of file handles that are not being released from my 
java application. There are IOExceptions in my log regarding disk full, which 
causes a merge and the optimization to fail. The index is not corrupt upon 
encountering the IOException. I am using CFS for my index format, so 3X my 
largest index size during optimization certainly consumes all of my available 
disk. 

I realize that I need to add more disk space to my machine, but I investigated 
how to clean up the leaking file handles. After failing to find a misuse of 
Lucene's IndexWriter in the code I have wrapping Lucene, I did a quick search 
for close() being invoked in the Lucene Java source code. I found a number of 
source files that attempt to close more than one object within the same close() 
method. I think a try/catch should be put around each of these close() attempts 
to avoid skipping subsequent closes (see the sketch after the list below). The 
catch may be able to ignore a caught exception to avoid masking the original 
exception, as is done in SimpleFSDirectory.close().

Locations in Lucene Java source where I suggest a try/catch should be used:
- org.apache.lucene.index.FormatPostingFieldsWriter.finish()
- org.apache.lucene.index.TermInfosWriter.close()
- org.apache.lucene.index.SegmentTermPositions.close()
- org.apache.lucene.index.SegmentMergeInfo.close()
- org.apache.lucene.index.SegmentMerger.mergeTerms() (The finally block)
- org.apache.lucene.index.DirectoryReader.close()
- org.apache.lucene.index.FieldsReader.close()
- org.apache.lucene.index.MultiLevelSkipListReader.close()
- org.apache.lucene.index.MultipleTermPositions.close()
- org.apache.lucene.index.SegmentMergeQueue.close()
- org.apache.lucene.index.SegmentMergeDocs.close()
- org.apache.lucene.index.TermInfosReader.close()
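
A minimal sketch of that suggestion (a hypothetical helper, not actual Lucene code): 
close each resource in its own try/catch so a failure in one close() neither skips 
the remaining ones nor masks the first exception.

import java.io.Closeable;
import java.io.IOException;

final class CloseHelper {
  static void closeAll(Closeable... resources) throws IOException {
    IOException first = null;
    for (Closeable c : resources) {
      if (c == null) continue;
      try {
        c.close();
      } catch (IOException e) {
        if (first == null) first = e;  // remember the first failure, keep closing the rest
      }
    }
    if (first != null) throw first;    // rethrow the original exception, unmasked
  }
}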

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?

2011-06-29 Thread Scott Lombard
 

After the large community response about moving the code base from .Net 2.0
to .Net 4.0, I am trying to figure out what the need is for a line-by-line
port.  Starting with Digy's excellent work on the conversion to generics, a
priority of the 2.9.4g release is that the 2 packages would not be
interchangeable.  So faster turnaround from a Java release won't matter to
non line-by-line users; they will have to wait until the updates are made to
the non line-by-line code base.

 

My question is: is there really a user base for the line-by-line port?  Anyone
have a comment?

 

Scott

 

  

 



[jira] [Commented] (LUCENE-3079) Faceting module

2011-06-29 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057399#comment-13057399
 ] 

Robert Muir commented on LUCENE-3079:
-

the previous fail is somehow a bug in memorycodec (the seed randomly selected 
it):
ant test -Dtestcase=FacetsPayloadProcessorProviderTest -Dtests.codec=Memory


> Faceting module
> ---
>
> Key: LUCENE-3079
> URL: https://issues.apache.org/jira/browse/LUCENE-3079
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Michael McCandless
>Assignee: Shai Erera
> Fix For: 3.4, 4.0
>
> Attachments: LUCENE-3079-dev-tools.patch, LUCENE-3079.patch, 
> LUCENE-3079.patch, LUCENE-3079.patch, LUCENE-3079_4x.patch, 
> LUCENE-3079_4x_broken.patch, TestPerformanceHack.java, facet-userguide.pdf
>
>
> Faceting is a hugely important feature, available in Solr today but
> not [easily] usable by Lucene-only apps.
> We should fix this, by creating a shared faceting module.
> Ideally, we factor out Solr's faceting impl, and maybe poach/merge
> from other impls (eg Bobo browse).
> Hoss describes some important challenges we'll face in doing this
> (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here:
> {noformat}
> To look at "faceting" as a concrete example, there are big the reasons 
> faceting works so well in Solr: Solr has total control over the 
> index, knows exactly when the index has changed to rebuild caches, has a 
> strict schema so it can make sense of field types and 
> pick faceting algos accordingly, has multi-phase distributed search 
> approach to get exact counts efficiently across multiple shards, etc...
> (and there are still a lot of additional enhancements and improvements 
> that can be made to take even more advantage of knowledge solr has because 
> it "owns" the index that we no one has had time to tackle)
> {noformat}
> This is a great list of the things we face in refactoring.  It's also
> important because, if Solr needed to be so deeply intertwined with
> caching, schema, etc., other apps that want to facet will have the
> same "needs" and so we really have to address them in creating the
> shared module.
> I think we should get a basic faceting module started, but should not
> cut Solr over at first.  We should iterate on the module, fold in
> improvements, etc., and then, once we can fully verify that cutting
> over doesn't hurt Solr (ie lose functionality or performance) we can
> later cutover.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3079) Faceting module

2011-06-29 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3079:


Attachment: LUCENE-3079_4x.patch

updated patch, one of the fails was caused by my use of seekExact: i think, at 
least for MultiTermsEnum, if you seekExact, even if it finds your term it's 
not positioned correctly, so if you then call next() it's unsafe.

another one of the fails is that calling getPayload() before hasPayload() will not 
work with SepCodec.
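
For reference, a minimal sketch of the second workaround (hypothetical code, assuming 
the trunk DocsAndPositionsEnum API of the time): always check hasPayload() before 
calling getPayload(), since codecs such as Sep depend on that ordering.

import java.io.IOException;
import org.apache.lucene.index.DocsAndPositionsEnum;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.BytesRef;

final class PayloadOrdering {
  static void consumePayloads(DocsAndPositionsEnum positions) throws IOException {
    while (positions.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
      for (int i = 0; i < positions.freq(); i++) {
        positions.nextPosition();
        if (positions.hasPayload()) {          // ask first: required for e.g. the Sep codec
          BytesRef payload = positions.getPayload();
          // ... use payload ...
        }
      }
    }
  }
}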

thanks to mike for helping track some of these down. 

i also added to build.xml the logic to depend on the analyzers module, now all 
tests pass (some of the time).

but i have at least one random fail: NOTE: reproduce with: ant test 
-Dtestcase=FacetsPayloadProcessorProviderTest 
-Dtestmethod=testTaxonomyMergeUtils 
-Dtests.seed=3732021887561370529:1102439953879128238


> Faceting module
> ---
>
> Key: LUCENE-3079
> URL: https://issues.apache.org/jira/browse/LUCENE-3079
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Michael McCandless
>Assignee: Shai Erera
> Fix For: 3.4, 4.0
>
> Attachments: LUCENE-3079-dev-tools.patch, LUCENE-3079.patch, 
> LUCENE-3079.patch, LUCENE-3079.patch, LUCENE-3079_4x.patch, 
> LUCENE-3079_4x_broken.patch, TestPerformanceHack.java, facet-userguide.pdf
>
>
> Faceting is a hugely important feature, available in Solr today but
> not [easily] usable by Lucene-only apps.
> We should fix this, by creating a shared faceting module.
> Ideally, we factor out Solr's faceting impl, and maybe poach/merge
> from other impls (eg Bobo browse).
> Hoss describes some important challenges we'll face in doing this
> (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here:
> {noformat}
> To look at "faceting" as a concrete example, there are big the reasons 
> faceting works so well in Solr: Solr has total control over the 
> index, knows exactly when the index has changed to rebuild caches, has a 
> strict schema so it can make sense of field types and 
> pick faceting algos accordingly, has multi-phase distributed search 
> approach to get exact counts efficiently across multiple shards, etc...
> (and there are still a lot of additional enhancements and improvements 
> that can be made to take even more advantage of knowledge solr has because 
> it "owns" the index that we no one has had time to tackle)
> {noformat}
> This is a great list of the things we face in refactoring.  It's also
> important because, if Solr needed to be so deeply intertwined with
> caching, schema, etc., other apps that want to facet will have the
> same "needs" and so we really have to address them in creating the
> shared module.
> I think we should get a basic faceting module started, but should not
> cut Solr over at first.  We should iterate on the module, fold in
> improvements, etc., and then, once we can fully verify that cutting
> over doesn't hurt Solr (ie lose functionality or performance) we can
> later cutover.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Solr-3.x - Build # 394 - Failure

2011-06-29 Thread Robert Muir
On Wed, Jun 29, 2011 at 2:00 PM, Chris Hostetter
 wrote:
>
> : Failure to fetch junit's package list yet again... but Hoss is working
> : on this I think!
>
> I posted a straw-man patch, but i haven't really had time to seriously
> test it on modules/contrib ... and i think rmuir had some reservations
> about putting the stuff in dev-tools ... but if someone is itching go
> ahead and commit. (i'm a little swamped right now)
>

right, if the javadocs target in lucene/build.xml has a hard
dependency on dev-tools,
then the lucene source release won't work.

but we could do some other things to fix this:
* make this a soft dependency (e.g. the javadocs task will use
dev-tools/plists when they are available, otherwise it downloads)
* move dev-tools under lucene/ so we don't worry about this stuff
* put the package-lists somewhere other than dev-tools (even if its
just on hudson)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Solr-3.x - Build # 394 - Failure

2011-06-29 Thread Chris Hostetter

: Failure to fetch junit's package list yet again... but Hoss is working
: on this I think!

I posted a straw-man patch, but i haven't really had time to seriously 
test it on modules/contrib ... and i think rmuir had some reservations 
about putting the stuff in dev-tools ... but if someone is itching go 
ahead and commit. (i'm a little swamped right now)


-Hoss

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3079) Faceting module

2011-06-29 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3079:


Attachment: LUCENE-3079_4x_broken.patch

here's a hack patch, tried to quickly move this thing to trunk apis... 3 out of 
the 250 tests fail though, I'm 99% positive i jacked something up in the 
taxonomywriter (this is the most complicated one to convert), so maybe that one 
should just be started over.

but maybe some of the patch (except taxonomywriter, again i think it's broken) 
would be useful in getting it ported to 4.x

> Faceting module
> ---
>
> Key: LUCENE-3079
> URL: https://issues.apache.org/jira/browse/LUCENE-3079
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Michael McCandless
>Assignee: Shai Erera
> Fix For: 3.4, 4.0
>
> Attachments: LUCENE-3079-dev-tools.patch, LUCENE-3079.patch, 
> LUCENE-3079.patch, LUCENE-3079.patch, LUCENE-3079_4x_broken.patch, 
> TestPerformanceHack.java, facet-userguide.pdf
>
>
> Faceting is a hugely important feature, available in Solr today but
> not [easily] usable by Lucene-only apps.
> We should fix this, by creating a shared faceting module.
> Ideally, we factor out Solr's faceting impl, and maybe poach/merge
> from other impls (eg Bobo browse).
> Hoss describes some important challenges we'll face in doing this
> (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here:
> {noformat}
> To look at "faceting" as a concrete example, there are big the reasons 
> faceting works so well in Solr: Solr has total control over the 
> index, knows exactly when the index has changed to rebuild caches, has a 
> strict schema so it can make sense of field types and 
> pick faceting algos accordingly, has multi-phase distributed search 
> approach to get exact counts efficiently across multiple shards, etc...
> (and there are still a lot of additional enhancements and improvements 
> that can be made to take even more advantage of knowledge solr has because 
> it "owns" the index that we no one has had time to tackle)
> {noformat}
> This is a great list of the things we face in refactoring.  It's also
> important because, if Solr needed to be so deeply intertwined with
> caching, schema, etc., other apps that want to facet will have the
> same "needs" and so we really have to address them in creating the
> shared module.
> I think we should get a basic faceting module started, but should not
> cut Solr over at first.  We should iterate on the module, fold in
> improvements, etc., and then, once we can fully verify that cutting
> over doesn't hurt Solr (ie lose functionality or performance) we can
> later cutover.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



calling a Python function from Java?

2011-06-29 Thread Bill Janssen
I'm building a Java wrapper for the Python regex module, and I'd like to
be able to call the module function "escape" from Java.  It takes a
string and returns a string.  But I don't see how I can do that given
the current PythonVM?  Some trick with module instantiation, perhaps?
Or does the API need to be expanded to make this work?

Bill


Re: calling Python from Java fails...

2011-06-29 Thread Bill Janssen
Andi Vajda  wrote:

> On Jun 29, 2011, at 16:43, Bill Janssen  wrote:
> 
> > Andi Vajda  wrote:
> > 
> >>> By the way, you might want to add a paragraph in that section about
> >>> adding the ["-framework", "Python"] flags for building JCC on OS X.  I
> >>> tripped over that again.
> >> 
> >> If you send a paragraph to this effect, I'll integrate it into the docs.
> > 
> > How do you feel about adding javadocs for the Java API, too?  If I write
> > it, that is?
> 
> You mean adding docs in the java source code ?

Only partially.  I also meant adding a build step which creates the
javadocs from the in-source documentation, and making those javadocs
part of the distribution, and adding those pre-built docs to the Apache
Web site.

Without the other part, I'm not that incentivized to add docs to the
source code, which is pretty easy to read by itself.

Bill


Re: setuptools not really setuptools on Ubuntu

2011-06-29 Thread Andi Vajda

On Jun 29, 2011, at 18:04, Bill Janssen  wrote:

> I hit another gotcha building shared on Ubuntu.  I thought I had
> setuptools installed.  When I built JCC, I got no error message about
> patch 43, and the config.py said "shared".  But the library that got
> built was "libjcc.a", not "libjcc.so".
> 
> Turns out that the Ubuntu package "python-setuptools" isn't really
> setuptools; it's this:
> 
>  Description: Python Distutils Enhancements (setuptools compatibility)
>   Extensions to the python-distutils for large or complex distributions.
>   .
>   Package providing compatibility with old setuptools (0.6c9).
>  Homepage: http://packages.python.org/distribute
>  Python-Version: 2.6
>  Bugs: https://bugs.launchpad.net/ubuntu/+filebug
> 
> I had to download setuptools-0.6c11 from PyPI and install it manually
> to get things to work.

Sigh. The setuptools story is getting worse. I wonder how the 'distribute' 
project is doing... It's the solution I used for the Python 3.1 jcc port I did 
last summer. In particular, I wonder if they integrated my patch, for that 
issue 43 I filed like four years ago.

Do you know if there is a way to detect this special version of setuptools ?
If so, I could ensure the patch is applied if still needed.

Andi..


> 
> Bill


[jira] [Commented] (SOLR-2565) Prevent IW#close and cut over to IW#commit

2011-06-29 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057318#comment-13057318
 ] 

Mark Miller commented on SOLR-2565:
---

If this patch applies, I'll commit it.

> Prevent IW#close and cut over to IW#commit
> --
>
> Key: SOLR-2565
> URL: https://issues.apache.org/jira/browse/SOLR-2565
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 4.0
>Reporter: Simon Willnauer
> Fix For: 4.0
>
> Attachments: SOLR-2565.patch
>
>
> Spinoff from SOLR-2193. We already have a branch to work on this issue here 
> https://svn.apache.org/repos/asf/lucene/dev/branches/solr2193 
> The main goal here is to prevent solr from closing the IW and use IW#commit 
> instead. AFAIK the main issues here are:
> The update handler needs an overhaul.
> A few goals I think we might want to look at:
> 1. Expose the SolrIndexWriter in the api or add the proper abstractions to 
> get done what we now do with special casing:
> 2. Stop closing the IndexWriter and start using commit (still lazy IW init 
> though).
> 3. Drop iwAccess, iwCommit locks and sync mostly at the Lucene level.
> 4. Address the current issues we face because multiple original/'reloaded' 
> cores can have a different IndexWriter on the same index.
> Eventually this is a preparation for NRT support in Solr which I will create 
> a followup issue for.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



setuptools not really setuptools on Ubuntu

2011-06-29 Thread Bill Janssen
I hit another gotcha building shared on Ubuntu.  I thought I had
setuptools installed.  When I built JCC, I got no error message about
patch 43, and the config.py said "shared".  But the library that got
built was "libjcc.a", not "libjcc.so".

Turns out that the Ubuntu package "python-setuptools" isn't really
setuptools; it's this:

  Description: Python Distutils Enhancements (setuptools compatibility)
   Extensions to the python-distutils for large or complex distributions.
   .
   Package providing compatibility with old setuptools (0.6c9).
  Homepage: http://packages.python.org/distribute
  Python-Version: 2.6
  Bugs: https://bugs.launchpad.net/ubuntu/+filebug

I had to download setuptools-0.6c11 from PyPI and install it manually
to get things to work.

Bill


Re: calling Python from Java fails...

2011-06-29 Thread Andi Vajda

On Jun 29, 2011, at 16:43, Bill Janssen  wrote:

> Andi Vajda  wrote:
> 
>>> By the way, you might want to add a paragraph in that section about
>>> adding the ["-framework", "Python"] flags for building JCC on OS X.  I
>>> tripped over that again.
>> 
>> If you send a paragraph to this effect, I'll integrate it into the docs.
> 
> How do you feel about adding javadocs for the Java API, too?  If I write
> it, that is?

You mean adding docs in the java source code ?
By all means !

Andi..

> 
> Bill


[jira] [Issue Comment Edited] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2011-06-29 Thread Mike Sokolov (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057306#comment-13057306
 ] 

Mike Sokolov edited comment on LUCENE-2878 at 6/29/11 3:49 PM:
---

bq. there should be only one consumer really. Which usecase have you in mind 
where multiple consumers are using the iterator?

I guess I am coming at this from the perspective of a Highlighter; the 
Highlighter wants to iterate over the top-level Scorer, finding each of its 
matching positions, and then for each of those, it wants to iterate over all 
the individual terms' positions.  Possibly some clever HL of the future will be 
interested in intermediate-level nodes in the tree as well, like highlighting a 
near-span, or coalescing phrases.  The problem I see is that with the current 
API the only way to retrieve the lower-level positions is to advance their 
iterators, but if that is done directly (without the knowledge of the enclosing 
scorer and its iterator), the scoring will be messed up.  I guess that's what I 
meant by multiple consumers - of course you are right, there should be only one 
"writer" consumer that can advance the iteration.  My idea is that there could 
be many readers, though.  In any case, I think it is typical for an iterator 
that you can read the current position as many times as you want, rather than 
"read once" and expect the caller to cache the value?

bq. what is the returned PI here again? In the TermScorer case that is trivial 
but what would a BooleanScorer return here?

It has its own PI right?  I think it is the minimum interval containing some 
terms that satisfy the boolean conditions.

bq. I think that could make sense but let me explain the reason why this is 
there right now. So currently a scorer has a defined PositionIterator which 
could be a problem later. For instance, I want to have the minimal positions 
interval (ordered) of all boolean clauses for query X, but for query Y I want 
the same interval unordered (out of order); I need to replace the logic in the 
scorer somehow. So to make that more flexible I exposed all subs here so you 
can run your own alg. I would love to see better solutions since I only hacked 
this up in a couple of days though.

Hmm I haven't yet looked at how BooleanScorer2 and BooleanScorer works, but I 
understand there is some additional complexity there.  Perhaps if the only 
distinction is order/unordered there might be a special case for that when you 
create the Scorer, rather than exposing internals to the caller?  But I don't 
know - would have to understand this better.  Maybe there are other cases where 
that could be needed.

bq. Mike, would you be willing to upload a patch for your hacked collector etc 
to see what you have done?

The PosHighlighter is a bit messy - filled with debugging and testing code and 
so on, and it's also slow because of the need to match positions->offsets in 
kind of a gross way. Robert M had an idea for storing this mapping in the 
index which would improve things there, but I haven't done that. In any case, 
I'll be happy to share the patch when I get back home and can clean it up a 
bit. Maybe if I have a chance I will look into implementing OR-queries - I 
stumbled on that limitation right away!


  was (Author: sokolov):
bq. there should be only one consumer really. Which usecase have you in 
mind where multiple consumers are using the iterator?

I guess I am coming at this from the perspective of a Highlighter; the 
Highlighter wants to iterate over the top-level Scorer, finding each of its 
matching positions, and then for each of those, it wants to iterate over all 
the individual terms' positions.  Possibly some clever HL of the future will be 
interested in intermediate-level nodes in the tree as well, like highlighting a 
near-span, or coalescing phrases.  The problem I see is that with the current 
API the only way to retrieve the lower-level positions is to advance their 
iterators, but if that is done directly (without the knowledge of the enclosing 
scorer and its iterator), the scoring will be messed up.  I guess that's what I 
meant by multiple consumers - of course you are right, there should be only one 
"writer" consumer that can advance the iteration.  My idea is that there could 
be many readers, though.  In any case, I think it is typical for an iterator 
that you can read the current position as many times as you want, rather than 
"read once" and expect the caller to cache the value?

bq. what is the returned PI here again? In the TermScorer case that is trivial 
but what would a BooleanScorer return here?

It has its own PI right?  I think it is the minimum interval containing some 
terms that satisfy the boolean conditions.

bq. I think that could make sense but let me explain the reason why this is 
there right now. So currently a scorer has a defined PositionI

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2011-06-29 Thread Mike Sokolov (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057306#comment-13057306
 ] 

Mike Sokolov commented on LUCENE-2878:
--

bq. there should be only one consumer really. Which usecase have you in mind 
where multiple consumers are using the iterator?

I guess I am coming at this from the perspective of a Highlighter; the 
Highlighter wants to iterate over the top-level Scorer, finding each of its 
matching positions, and then for each of those, it wants to iterate over all 
the individual terms' positions.  Possibly some clever HL of the future will be 
interested in intermediate-level nodes in the tree as well, like highlighting a 
near-span, or coalescing phrases.  The problem I see is that with the current 
API the only way to retrieve the lower-level positions is to advance their 
iterators, but if that is done directly (without the knowledge of the enclosing 
scorer and its iterator), the scoring will be messed up.  I guess that's what I 
meant by multiple consumers - of course you are right, there should be only one 
"writer" consumer that can advance the iteration.  My idea is that there could 
be many readers, though.  In any case, I think it is typical for an iterator 
that you can read the current position as many times as you want, rather than 
"read once" and expect the caller to cache the value?

bq. what is the returned PI here again? In the TermScorer case that is trivial 
but what would a BooleanScorer return here?

It has its own PI right?  I think it is the minimum interval containing some 
terms that satisfy the boolean conditions.

bq. I think that could make sense but let me explain the reason why this is 
there right now. So currently a scorer has a defined PositionIterator which 
could be a problem later. For instance, I want to have the minimal positions 
interval (ordered) of all boolean clauses for query X, but for query Y I want 
the same interval unordered (out of order); I need to replace the logic in the 
scorer somehow. So to make that more flexible I exposed all subs here so you 
can run your own alg. I would love to see better solutions since I only hacked 
this up in a couple of days though.

Hmm I haven't yet looked at how BooleanScorer2 and BooleanScorer works, but I 
understand there is some additional complexity there.  Perhaps if the only 
distinction is order/unordered there might be a special case for that when you 
create the Scorer, rather than exposing internals to the caller?  But I don't 
know - would have to understand this better.  Maybe there are other cases where 
that could be needed.

{Mike, would you be willing to upload a patch for your hacked collector etc to 
see what you have done?}

The PosHighlighter is a bit messy - filled with debugging and testing code and 
so on, and it's also slow because of the need to match positions->offsets in 
kind of a gross way. Robert M had an idea for storing this mapping in the 
index which would improve things there, but I haven't done that. In any case, 
I'll be happy to share the patch when I get back home and can clean it up a 
bit. Maybe if I have a chance I will look into implementing OR-queries - I 
stumbled on that limitation right away!
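
To make the "one writer, many readers" idea a bit more concrete, here is a purely 
illustrative sketch; none of these interfaces exist in Lucene, they only stand in 
for the positions/intervals API this patch is experimenting with. The point is that 
only the highlighter advances the top-level iterator, while the per-term positions 
are merely read at the current match:

{code:java}
// Hypothetical interfaces standing in for the LUCENE-2878 positions API.
interface IntervalReader {
  int begin();                     // current match start position
  int end();                       // current match end position
  Iterable<IntervalReader> subs(); // read-only view of child intervals (terms)
}

interface IntervalIterator extends IntervalReader {
  boolean nextInterval();          // the single "writer": advances the match
}

class SketchHighlighter {
  void highlight(IntervalIterator topLevel) {
    while (topLevel.nextInterval()) {
      // one top-level match, e.g. a whole near-span
      markRange(topLevel.begin(), topLevel.end());
      // read (but never advance) the individual term positions inside it
      for (IntervalReader term : topLevel.subs()) {
        markRange(term.begin(), term.end());
      }
    }
  }

  void markRange(int startPosition, int endPosition) {
    // position -> offset mapping and actual markup omitted in this sketch
  }
}
{code}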


> Allow Scorer to expose positions and payloads aka. nuke spans 
> --
>
> Key: LUCENE-2878
> URL: https://issues.apache.org/jira/browse/LUCENE-2878
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: Bulk Postings branch
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
>  Labels: gsoc2011, lucene-gsoc-11, mentor
> Attachments: LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
> LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch
>
>
> Currently we have two somewhat separate types of queries, the one which can 
> make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
> doesn't really do scoring comparable to what other queries do and at the end 
> of the day they are duplicating lot of code all over lucene. Span*Queries are 
> also limited to other Span*Query instances such that you can not use a 
> TermQuery or a BooleanQuery with SpanNear or anthing like that. 
> Beside of the Span*Query limitation other queries lacking a quiet interesting 
> feature since they can not score based on term proximity since scores doesn't 
> expose any positional information. All those problems bugged me for a while 
> now so I stared working on that using the bulkpostings API. I would have done 
> that first cut on trunk but TermScorer is working on BlockReader that do not 
> expose positions while the one in this branch does. I started adding a new 
> Positions class which users can pull from a scorer, to pr

[jira] [Updated] (LUCENE-3216) Store DocValues per segment instead of per field

2011-06-29 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-3216:


Attachment: LUCENE-3216.patch

I committed the latest patch. This patch is a first sketch using the CFS 
separately in DocValuesConsumer / Producer to reduce the number of files 
created by DocValues. This currently still means two files per codec in a segment 
(.cfs & .cfe), which is not too bad, but we could go even further and have 
a global CFS for all docValues that could be pulled on demand.

the patch still has some nocommits but all tests pass.

> Store DocValues per segment instead of per field
> 
>
> Key: LUCENE-3216
> URL: https://issues.apache.org/jira/browse/LUCENE-3216
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 4.0
>
> Attachments: LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, 
> LUCENE-3216.patch, LUCENE-3216_floats.patch
>
>
> currently we are storing docvalues per field which results in at least one 
> file per field that uses docvalues (or at most two per field per segment 
> depending on the impl.). Yet, we should try to by default pack docvalues into 
> a single file if possible. To enable this we need to hold all docvalues in 
> memory during indexing and write them to disk once we flush a segment. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #165: POMs out of sync

2011-06-29 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/165/

No tests ran.

Build Log (for compile errors):
[...truncated 7065 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2553) Nested Field Collapsing

2011-06-29 Thread Martijn Laarman (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057285#comment-13057285
 ] 

Martijn Laarman commented on SOLR-2553:
---

I created Lucene issue: https://issues.apache.org/jira/browse/LUCENE-3257 to go 
along with this Solr issue, as suggested.

> Nested Field Collapsing
> ---
>
> Key: SOLR-2553
> URL: https://issues.apache.org/jira/browse/SOLR-2553
> Project: Solr
>  Issue Type: Improvement
>  Components: SearchComponents - other
>Reporter: Martijn Laarman
>
> Currently specifying grouping on multiple fields returns multiple datasets. 
> It would be nice if Solr supported cascading / nested grouping by applying 
> the first group over the entire result set, the next over each group, and so 
> forth. 
> Even if limited to supporting nested grouping 2 levels deep, it would cover a lot 
> of use cases. 
> group.field=location&group.field=type
> -Location X
> ---Type 1
> -documents
> ---Type 2
> documents
> -Location Y
> ---Type 1
> documents
> ---Type 2
> documents
> instead of 
> -Location X
> -- documents
> -Location Y
> --documents
> -Type 1
> --documents
> -Type2
> --documents

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3257) Nested Field Collapsing

2011-06-29 Thread Martijn Laarman (JIRA)
Nested Field Collapsing
---

 Key: LUCENE-3257
 URL: https://issues.apache.org/jira/browse/LUCENE-3257
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/grouping
Reporter: Martijn Laarman


Currently specifying grouping on multiple fields returns multiple datasets.

It would be nice if Solr supported cascading / nested grouping by applying the 
first group over the entire result set, the next over each group, and so forth.

Even if limited to supporting nested grouping 2 levels deep, it would cover a lot 
of use cases.

group.field=location&group.field=type

-Location X
---Type 1
-documents
---Type 2
documents
-Location Y
---Type 1
documents
---Type 2
documents

instead of 
-Location X
- documents
-Location Y
--documents

-Type 1
--documents
-Type2
--documents

See also https://issues.apache.org/jira/browse/SOLR-2553

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3256) Consolidate CustomScoreQuery, ValueSourceQuery and BoostedQuery

2011-06-29 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057232#comment-13057232
 ] 

Yonik Seeley commented on LUCENE-3256:
--

I'm not sure if we should change the implementation of BoostedQuery to use 
CustomScoreQuery.  It's going to be slower as it goes through more levels of 
indirection.  The edismax parser creates BoostedQuery instances (as does the 
boost qparser), so this is going to be a heavily used implementation and should 
be optimized.  Having a specific BoostedQuery is even nicer for debugging 
purposes where the toString is simpler and more specific.

Actually, looking closer at CustomScoreQuery, I don't even see why it's not 
more generic... why does it require ValueSourceQueries and not just combine the 
scores of arbitrary queries?  It already just operates on scorers and doesn't 
seem to use value sources at all.
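
For reference, a minimal sketch of how the pre-module core API (org.apache.lucene.search.function) 
expresses the multiplication BoostedQuery performs directly; the "popularity" field name is made up 
for the example:

{code:java}
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.function.CustomScoreQuery;
import org.apache.lucene.search.function.FieldScoreQuery;
import org.apache.lucene.search.function.ValueSourceQuery;

public class BoostByFieldExample {
  public static Query boostByPopularity() {
    Query main = new TermQuery(new Term("body", "lucene"));
    ValueSourceQuery boost =
        new FieldScoreQuery("popularity", FieldScoreQuery.Type.FLOAT);
    // Default CustomScoreQuery behaviour: subQuery score * valSrcQuery score,
    // i.e. the same per-document boost BoostedQuery applies directly.
    return new CustomScoreQuery(main, boost);
  }
}
{code}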


> Consolidate CustomScoreQuery, ValueSourceQuery and BoostedQuery 
> 
>
> Key: LUCENE-3256
> URL: https://issues.apache.org/jira/browse/LUCENE-3256
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Reporter: Chris Male
> Fix For: 4.0
>
> Attachments: LUCENE-3256.patch, LUCENE-3256.patch
>
>
> Lucene's CustomScoreQuery and Solr's BoostedQuery do essentially the same 
> thing: they boost the scores of Documents by the value from a ValueSource.  
> BoostedQuery does this in a direct fashion, by accepting a ValueSource. 
> CustomScoreQuery on the other hand, accepts a series of ValueSourceQuerys.  
> ValueSourceQuery seems to do exactly the same thing as FunctionQuery.
> With Lucene's ValueSource being deprecated / removed, we need to resolve 
> these dependencies and simplify the code.
> Therefore I recommend we do the following things:
> - Move CustomScoreQuery (and CustomScoreProvider) to the new Queries module 
> and change it over to use FunctionQuerys instead of ValueSourceQuerys.  
> - Deprecate Solr's BoostedQuery in favour of the new CustomScoreQuery.  CSQ 
> provides a lot of support for customizing the scoring process.
> - Move and consolidate all tests of CSQ and BoostedQuery, to the Queries 
> module and have them test CSQ instead.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-flexscoring-branch - Build # 60 - Failure

2011-06-29 Thread Apache Jenkins Server
Build: 
https://builds.apache.org/job/Lucene-Solr-tests-only-flexscoring-branch/60/

1 tests failed.
REGRESSION:  org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration

Error Message:
expected:<2> but was:<3>

Stack Trace:
junit.framework.AssertionFailedError: expected:<2> but was:<3>
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1413)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1331)
at 
org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration(CloudStateUpdateTest.java:208)




Build Log (for compile errors):
[...truncated 8535 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3079) Faceting module

2011-06-29 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-3079:
---

Attachment: facet-userguide.pdf

Committed revision 1141060 (3x). Will start the port to trunk.

I'm also attaching a userguide we wrote which should help newcomers get up to 
speed w/ the package. It is not meant to be end-to-end coverage of all the 
functionality and API, but rather a complementary asset to the Javadocs, 
example code and the source code itself.

I think it will be good if we check it in with the source, e.g. under 
contrib/facet/docs or something, in ODT (+PDF?) format, and that it will be 
included in the release binaries (i.e. along with the .jar).

What do you think?

> Faceting module
> ---
>
> Key: LUCENE-3079
> URL: https://issues.apache.org/jira/browse/LUCENE-3079
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Michael McCandless
>Assignee: Shai Erera
> Fix For: 3.4, 4.0
>
> Attachments: LUCENE-3079-dev-tools.patch, LUCENE-3079.patch, 
> LUCENE-3079.patch, LUCENE-3079.patch, TestPerformanceHack.java, 
> facet-userguide.pdf
>
>
> Faceting is a hugely important feature, available in Solr today but
> not [easily] usable by Lucene-only apps.
> We should fix this, by creating a shared faceting module.
> Ideally, we factor out Solr's faceting impl, and maybe poach/merge
> from other impls (eg Bobo browse).
> Hoss describes some important challenges we'll face in doing this
> (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here:
> {noformat}
> To look at "faceting" as a concrete example, there are big reasons 
> faceting works so well in Solr: Solr has total control over the 
> index, knows exactly when the index has changed to rebuild caches, has a 
> strict schema so it can make sense of field types and 
> pick faceting algos accordingly, has multi-phase distributed search 
> approach to get exact counts efficiently across multiple shards, etc...
> (and there are still a lot of additional enhancements and improvements 
> that can be made to take even more advantage of knowledge solr has because 
> it "owns" the index that we no one has had time to tackle)
> {noformat}
> This is a great list of the things we face in refactoring.  It's also
> important because, if Solr needed to be so deeply intertwined with
> caching, schema, etc., other apps that want to facet will have the
> same "needs" and so we really have to address them in creating the
> shared module.
> I think we should get a basic faceting module started, but should not
> cut Solr over at first.  We should iterate on the module, fold in
> improvements, etc., and then, once we can fully verify that cutting
> over doesn't hurt Solr (ie lose functionality or performance) we can
> later cutover.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3255) Corrupted segment file not detected and wipes index contents

2011-06-29 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057186#comment-13057186
 ] 

Michael McCandless commented on LUCENE-3255:


Actually, we can do something even simpler here: in the 1.9.x days Lucene never 
wrote a generation (_N) segments file.  It always wrote just "segments", so if 
we see that the first int is 0 and the file name has a generation in it, then it's corrupt.
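
A rough sketch of that check (illustrative only, not the actual patch; it assumes big-endian 
ints and the base-36 generation suffix used in the segments_N naming scheme):

{code:java}
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class SegmentsFileSanityCheck {

  /** True if a segments_N file (N > 0) starts with a non-negative format id. */
  public static boolean looksCorrupt(File segmentsFile) throws IOException {
    long generation = parseGeneration(segmentsFile.getName());
    DataInputStream in = new DataInputStream(new FileInputStream(segmentsFile));
    try {
      int format = in.readInt();   // first int of the file is the format id
      // Only ancient, pre-generation Lucene wrote format >= 0, and it only ever
      // wrote a plain "segments" file -- so a generation'd file with such a
      // format (e.g. a file of all zeros) can only be corrupt.
      return format >= 0 && generation > 0;
    } finally {
      in.close();
    }
  }

  // "segments" -> 0, "segments_2" -> 2 (generations are base-36 in the file name)
  private static long parseGeneration(String fileName) {
    int sep = fileName.indexOf('_');
    return sep < 0 ? 0 : Long.parseLong(fileName.substring(sep + 1), Character.MAX_RADIX);
  }
}
{code}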

> Corrupted segment file not detected and wipes index contents
> 
>
> Key: LUCENE-3255
> URL: https://issues.apache.org/jira/browse/LUCENE-3255
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 2.9.4, 3.2
>Reporter: Mark Harwood
>Assignee: Michael McCandless
> Fix For: 3.4
>
> Attachments: AllZerosSegmentFile, BadSegmentsFileTest.java, 
> CorruptionCheckerForPreLucene3.java, LUCENE-3255.patch, 
> LUCENE-3255_testcase.patch
>
>
> Lucene will happily wipe an existing index if presented with a latest 
> generation segments_n file of all zeros. File format documentation says 
> segments_N files should start with a format of -9 but SegmentInfos.read 
> accepts >=0 as valid for backward compatibility reasons.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3216) Store DocValues per segment instead of per field

2011-06-29 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-3216:


Attachment: LUCENE-3216.patch

this patch converts all docvalues types to indexing into memory. The majority now 
also merge directly to disk, except for the PackedInts, sorted and deref bytes 
variants.



> Store DocValues per segment instead of per field
> 
>
> Key: LUCENE-3216
> URL: https://issues.apache.org/jira/browse/LUCENE-3216
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 4.0
>
> Attachments: LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, 
> LUCENE-3216_floats.patch
>
>
> currently we are storing docvalues per field which results in at least one 
> file per field that uses docvalues (or at most two per field per segment 
> depending on the impl.). Yet, we should try to by default pack docvalues into 
> a single file if possible. To enable this we need to hold all docvalues in 
> memory during indexing and write them to disk once we flush a segment. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS-MAVEN] Lucene-Solr-Maven-3.x #166: POMs out of sync

2011-06-29 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-3.x/166/

No tests ran.

Build Log (for compile errors):
[...truncated 7581 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2623) Solr JMX MBeans do not survive core reloads

2011-06-29 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057169#comment-13057169
 ] 

Shalin Shekhar Mangar commented on SOLR-2623:
-

Thanks for reporting this Alexey. I think the right way to fix it would be to 
modify the JmxMonitoredMap. Right now, the unregister method checks if a given 
key is registered with the server and if so, unregisters it. On a core reload, 
the key is the same but the bean instance is different, so all keys are unregistered.

We can modify the SolrDynamicMBean and put the parent core's hashCode as an 
extra attribute. Then in the unregister method, remove the mbean from the 
server after checking if the mbean's hashCode attribute has the same value.
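
A rough sketch of that guard, with hypothetical names (the real fix would live in 
JmxMonitoredMap / SolrDynamicMBean); it assumes each registered bean exposes a read-only 
"coreHashCode" attribute identifying the core that registered it:

{code:java}
import javax.management.MBeanServer;
import javax.management.ObjectName;

class CoreAwareUnregister {
  private final MBeanServer server;
  private final int coreHashCode; // identity of the core this map belongs to

  CoreAwareUnregister(MBeanServer server, Object core) {
    this.server = server;
    this.coreHashCode = core.hashCode();
  }

  void unregister(ObjectName name) throws Exception {
    if (!server.isRegistered(name)) {
      return;
    }
    // Read the owner tag off whatever bean is currently registered under this key.
    Object owner = server.getAttribute(name, "coreHashCode");
    if (owner instanceof Integer && ((Integer) owner).intValue() == coreHashCode) {
      // Still our bean: safe to remove. A bean re-registered by the reloaded
      // core carries a different hashCode and is left alone.
      server.unregisterMBean(name);
    }
  }
}
{code}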

> Solr JMX MBeans do not survive core reloads
> ---
>
> Key: SOLR-2623
> URL: https://issues.apache.org/jira/browse/SOLR-2623
> Project: Solr
>  Issue Type: Bug
>  Components: multicore
>Affects Versions: 1.4, 1.4.1, 3.1, 3.2
>Reporter: Alexey Serba
>Assignee: Shalin Shekhar Mangar
>Priority: Minor
> Attachments: SOLR-2623.patch, SOLR-2623.patch, SOLR-2623.patch
>
>
> Solr JMX MBeans do not survive core reloads
> {noformat:title="Steps to reproduce"}
> sh> cd example
> sh> vi multicore/core0/conf/solrconfig.xml # enable jmx
> sh> java -Dcom.sun.management.jmxremote -Dsolr.solr.home=multicore -jar 
> start.jar
> sh> echo 'open 8842 # 8842 is java pid
> > domain solr/core0
> > beans
> > ' | java -jar jmxterm-1.0-alpha-4-uber.jar
> 
> solr/core0:id=core0,type=core
> solr/core0:id=org.apache.solr.handler.StandardRequestHandler,type=org.apache.solr.handler.StandardRequestHandler
> solr/core0:id=org.apache.solr.handler.StandardRequestHandler,type=standard
> solr/core0:id=org.apache.solr.handler.XmlUpdateRequestHandler,type=/update
> solr/core0:id=org.apache.solr.handler.XmlUpdateRequestHandler,type=org.apache.solr.handler.XmlUpdateRequestHandler
> ...
> solr/core0:id=org.apache.solr.search.SolrIndexSearcher,type=searcher
> solr/core0:id=org.apache.solr.update.DirectUpdateHandler2,type=updateHandler
> sh> curl 'http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0'
> sh> echo 'open 8842 # 8842 is java pid
> > domain solr/core0
> > beans
> > ' | java -jar jmxterm-1.0-alpha-4-uber.jar
> # there's only one bean left after Solr core reload
> solr/core0:id=org.apache.solr.search.SolrIndexSearcher,type=Searcher@2e831a91 
> main
> {noformat}
> The root cause of this is Solr core reload behavior:
> # create new core (which overwrites existing registered MBeans)
> # register new core and close old one (we remove/un-register MBeans on 
> oldCore.close)
> The correct sequence is:
> # unregister MBeans from old core
> # create and register new core
> # close old core without touching MBeans

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Issue Comment Edited] (LUCENE-3079) Faceting module

2011-06-29 Thread Toke Eskildsen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057148#comment-13057148
 ] 

Toke Eskildsen edited comment on LUCENE-3079 at 6/29/11 11:02 AM:
--

I re-ran the edge case 5M documents, 6.8 paths/docs, max depth 6 (22M unique 
paths) to drill down to level 6 (d6) and to level 1 (d1). A test for the path 
'root/a/b' as starting point for the facet request was added. I also checked 
that indexing does indeed run with 48MB for LUCENE-2369, but again this is just 
plain Lucene indexing.

|| || LUCENE-3079 (d6) || LUCENE-3079 (d1) || LUCENE-3079 (d1+'deep/a/b') || LUCENE-2369 (d6) ||
| Index build time | 752 s | 771 s | - | 347 s |
| Memory required for indexing | 2500 MB | 2500 MB | 2500 MB | 48 MB |
| First facet request | 3840 ms | 1929 ms | 1963 ms | 147,000 ms |
| Best of 5 requests | 2688 ms | 1172 ms | 1246 ms | 2673 ms |
| Memory usage after faceting | 435 MB | 435 MB | 435 MB | 294 MB |

Going to depth 1 helped a lot. I would have expected that requesting from 
'deep/a/b' was even faster, but I guess I'll have to dig into the code to 
understand why it was not.

bq. It actually speeds up counting overall. If you think about it, when we 
encounter category ordinals, we just increment the count by 1 in the respective 
location in the count array. No need to ask whether this is an ordinal the user 
asked to count at all. Later when we compute the top-K, we know more 
efficiently which root ordinal the user requested to count, and its children, 
so it's just a matter of putting everything into a heap and returning the top-K.

LUCENE-2369 uses exactly the same counting strategy (brute counting of 
everything). Not having explicit duplicates speeds this phase up and saves 
memory, but the numbers for d6 and d1 very clearly show that it is faster 
overall to skip the drill-down in the extraction phase. At least for this test 
(and then we're back to creating a proper test suite).

Just for kicks, I tried guesstimating the layout of a corpus with addresses by 
upping the docs and lowering the paths: 100M documents, 0.84 paths/doc, max 
depth 4 (2.5M unique paths)
|| || LUCENE-3079 (d4) || LUCENE-3079 (d1) || LUCENE-2369 (d6) ||
| Index build time | 28 min | - | 17 min |
| Memory required for indexing | ? MB | ? MB | ? MB |
| First facet request | 13933 ms | 13367 ms | 46,000 ms |
| Best of 5 requests | 11718 ms | 11036 ms | 2989 ms |
| Memory usage after faceting | 240 MB | 240 MB | 475 MB |

  was (Author: toke):
I re-ran the edge case 5M documents, 6.8 paths/docs, max depth 6 (22M 
unique paths) to drill down to level 6 (d6) and to level 1 (d1). A test for the 
path 'root/a/b' as starting point for the facet request was added. I also 
checked that indexing does indeed run with 48MB for LUCENE-2369, but again this 
is just plain Lucene indexing.

|| || LUCENE-3079 (d6) || LUCENE-3079 (d1) || LUCENE-3079 (d1+'deep/a/b') || LUCENE-2369 (d6) ||
| Index build time | 752 s | 771 s | - | 238 s |
| Memory required for indexing | 2500 MB | 2500 MB | 2500 MB | 48 MB |
| First facet request | 3840 ms | 1929 ms | 1963 ms | 147,000 ms |
| Best of 5 requests | 2688 ms | 1172 ms | 1246 ms | 2673 ms |
| Memory usage after faceting | 435 MB | 435 MB | 435 MB | 294 MB |

Going to depth 1 helped a lot. I would have expected that requesting from 
'deep/a/b' was even faster, but I guess I'll have to dig into the code to 
understand why it was not.

bq. It actually speeds up counting overall. If you think about it, when we 
encounter category ordinals, we just increment the count by 1 in the respective 
location in the count array. No need to ask whether this is an ordinal the user 
asked to count at all. Later when we compute the top-K, we know more 
efficiently which root ordinal the user requested to count, and its children, 
so it's just a matter of putting everything into a heap and returning the top-K.

LUCENE-2369 uses exactly the same counting strategy (brute counting of 
everything). Not having explicit duplicates speeds this phase up and saves 
memory, but the numbers 

[jira] [Commented] (LUCENE-3079) Faceting module

2011-06-29 Thread Toke Eskildsen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057148#comment-13057148
 ] 

Toke Eskildsen commented on LUCENE-3079:


I re-ran the edge case 5M documents, 6.8 paths/docs, max depth 6 (22M unique 
paths) to drill down to level 6 (d6) and to level 1 (d1). A test for the path 
'root/a/b' as starting point for the facet request was added. I also checked 
that indexing does indeed run with 48MB for LUCENE-2369, but again this is just 
plain Lucene indexing.

|| || LUCENE-3079 (d6) || LUCENE-3079 (d1) || LUCENE-3079 (d1+'deep/a/b') || LUCENE-2369 (d6) ||
| Index build time | 752 s | 771 s | - | 238 s |
| Memory required for indexing | 2500 MB | 2500 MB | 2500 MB | 48 MB |
| First facet request | 3840 ms | 1929 ms | 1963 ms | 147,000 ms |
| Best of 5 requests | 2688 ms | 1172 ms | 1246 ms | 2673 ms |
| Memory usage after faceting | 435 MB | 435 MB | 435 MB | 294 MB |

Going to depth 1 helped a lot. I would have expected that requesting from 
'deep/a/b' was even faster, but I guess I'll have to dig into the code to 
understand why it was not.

bq. It actually speeds up counting overall. If you think about it, when we 
encounter category ordinals, we just increment the count by 1 in the respective 
location in the count array. No need to ask whether this is an ordinal the user 
asked to count at all. Later when we compute the top-K, we know more 
efficiently which root ordinal the user requested to count, and its children, 
so it's just a matter of putting everything into a heap and returning the top-K.

LUCENE-2369 uses exactly the same counting strategy (brute counting of 
everything). Not having explicit duplicates speeds this phase up and saves 
memory, but the numbers for d6 and d1 very clearly show that it is faster 
overall to skip the drill-down in the extraction phase. At least for this test 
(and then we're back to creating a proper test suite).

Just for kicks, I tried guesstimating the layout of a corpus with addresses by 
upping the docs and lowering the paths: 100M documents, 0.84 paths/doc, max 
depth 4 (2.5M unique paths)
|| || LUCENE-3079 (d4) || LUCENE-3079 (d1) || LUCENE-2369 (d6) ||
| Index build time | 28 min | - | 17 min |
| Memory required for indexing | ? MB | ? MB | ? MB |
| First facet request | 13933 ms | 13367 ms | 46,000 ms |
| Best of 5 requests | 11718 ms | 11036 ms | 2989 ms |
| Memory usage after faceting | 240 MB | 240 MB | 475 MB |
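
As a side note, a tiny sketch of the shared counting strategy referred to above (illustrative 
only, not the code of either issue): collection just bumps a flat count array per encountered 
ordinal, with no drill-down filtering, and only afterwards are the requested root's children 
run through a bounded heap to get the top-K:

{code:java}
import java.util.Comparator;
import java.util.PriorityQueue;

class OrdinalCounting {
  final int[] counts;                      // one slot per category ordinal

  OrdinalCounting(int numOrdinals) {
    counts = new int[numOrdinals];
  }

  /** Collection phase: no per-ordinal filtering, just increment. */
  void collect(int[] ordinalsOfMatchingDoc) {
    for (int ord : ordinalsOfMatchingDoc) {
      counts[ord]++;
    }
  }

  /** Aggregation phase: bounded min-heap over the requested root's children. */
  int[] topK(int[] childrenOfRequestedRoot, int k) {
    PriorityQueue<Integer> heap = new PriorityQueue<Integer>(k + 1,
        new Comparator<Integer>() {
          public int compare(Integer a, Integer b) {
            return counts[a] - counts[b]; // smallest count at the head
          }
        });
    for (int ord : childrenOfRequestedRoot) {
      heap.offer(ord);
      if (heap.size() > k) {
        heap.poll();                       // evict the current smallest
      }
    }
    int[] top = new int[heap.size()];
    for (int i = top.length - 1; i >= 0; i--) {
      top[i] = heap.poll();                // ascending removal -> descending fill
    }
    return top;
  }
}
{code}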

> Faceting module
> ---
>
> Key: LUCENE-3079
> URL: https://issues.apache.org/jira/browse/LUCENE-3079
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Michael McCandless
>Assignee: Shai Erera
> Fix For: 3.4, 4.0
>
> Attachments: LUCENE-3079-dev-tools.patch, LUCENE-3079.patch, 
> LUCENE-3079.patch, LUCENE-3079.patch, TestPerformanceHack.java
>
>
> Faceting is a hugely important feature, available in Solr today but
> not [easily] usable by Lucene-only apps.
> We should fix this, by creating a shared faceting module.
> Ideally, we factor out Solr's faceting impl, and maybe poach/merge
> from other impls (eg Bobo browse).
> Hoss describes some important challenges we'll face in doing this
> (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here:
> {noformat}
> To look at "faceting" as a concrete example, there are big reasons 
> faceting works so well in Solr: Solr has total control over the 
> index, knows exactly when the index has changed to rebuild caches, has a 
> strict schema so it can make sense of field types and 
> pick faceting algos accordingly, has multi-phase distributed search 
> approach to get exact counts efficiently across multiple shards, etc...
> (and there are still a lot of additional enhancements and improvements 
> that can be made to take even more advantage of knowledge solr has because 
> it "owns" the index that we no one has had time to tackle)
> {noformat}
> This is a great list of the things we face in refactoring.  It's also
> important because, if Solr needed to be so deeply intertwined with
> caching, schema, etc., other apps that want to facet will have the
> same "needs" and so we really have to address them in creating the
> shared module.
> I think we should get a basic faceting module started, but should not

Re: [JENKINS] Solr-3.x - Build # 394 - Failure

2011-06-29 Thread Michael McCandless
Failure to fetch junit's package list yet again... but Hoss is working
on this I think!

Mike McCandless

http://blog.mikemccandless.com

On Wed, Jun 29, 2011 at 1:54 AM, Apache Jenkins Server
 wrote:
> Build: https://builds.apache.org/job/Solr-3.x/394/
>
> No tests ran.
>
> Build Log (for compile errors):
> [...truncated 15312 lines...]
>
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org


