Re: Problems building JCC

2011-07-06 Thread Andi Vajda
 Hi Petrus,

On Jul 4, 2011, at 23:15, Petrus Hyvönen petrus.hyvo...@gmail.com wrote:

 This is likely another FAQ, but:
 
 I've moved to a Windows 7 machine (64-bit) and am trying to compile JCC with the
 mingw32 compiler; the JDK and JRE are installed. I'm getting a "libjcc.a - No such
 file or directory" error. javac is available at the command prompt.

I don't use this compiler so I don't have an answer. Maybe someone else on this 
list?
In any case, if you solve the problem, please post the solution so that the 
next 64-bit mingw user facing this can find it here. 

Thanks !

Andi.

 
 Building with:
 
 python setup.py build --compiler=mingw32
 
 Any help highly appreciated.
 /Petrus
 
 
 writing build\temp.win32-2.6\Release\jcc\sources\jcc.def
 C:\Program Files (x86)\pythonxy\mingw\bin\g++.exe -mno-cygwin -mdll -static
   --entry _DllMain@12 -Wl,--out-implib,build\lib.win32-2.6\jcc\jcc.lib
   --output-lib build\temp.win32-2.6\Release\jcc\sources\libjcc.a
   --def build\temp.win32-2.6\Release\jcc\sources\jcc.def
   -s build\temp.win32-2.6\Release\jcc\sources\jcc.o
   build\temp.win32-2.6\Release\jcc\sources\jccenv.o
   -LC:\Python26\libs -LC:\Python26\PCbuild -lpython26 -lmsvcr90
   -o build\lib.win32-2.6\jcc.dll
   -LC:\Program Files (x86)\Java\jdk1.6.0_26/lib -ljvm
   -Wl,-S -Wl,--out-implib,jcc\jcc.lib
 g++: build\temp.win32-2.6\Release\jcc\sources\libjcc.a: No such file or directory
 error: command 'g++' failed with exit status 1


Re: Problems building JCC

2011-07-06 Thread Bill Janssen
Petrus,

 g++: build\temp.win32-2.6\Release\jcc\sources\libjcc.a: No such file or directory

If you could find a mail tool which doesn't wrap the lines of the log
you're trying to send, that would help to debug this.  But it looks to
me as if something is not creating that directory before trying to
write into it.

I build regularly on Win XP with mingw.  I have a Win7 machine; I'll try
to build on it this afternoon, and let you know what I see.  I attach my
build script; this is a piece of a larger /bin/sh script.  I believe I've
used it successfully on Win7, too.

The missing piece might be my jcc-2.9-mingw-PATCH; I include a copy of
it below, as well.  But it looks to me as if Andi has already merged it
into the sources.

Bill

echo -- jcc --
export PATH=$PATH:${javahome}/jre/bin/client
echo PATH is $PATH
cd ../pylucene-3.0.*/jcc
# note that this patch still works for 3.0.1/3.0.2
patch -p0 < ${patchesdir}/jcc-2.9-mingw-PATCH
export JCC_ARGSEP=";"
export JCC_JDK=$WINSTYLEJAVAHOME
export JCC_CFLAGS="-fno-strict-aliasing;-Wno-write-strings"
export JCC_LFLAGS="-L${WINSTYLEJAVAHOME}\\lib;-ljvm"
export JCC_INCLUDES="${WINSTYLEJAVAHOME}\\include;${WINSTYLEJAVAHOME}\\include\\win32"
export JCC_JAVAC="${WINSTYLEJAVAHOME}\\bin\\javac.exe"
${python} setup.py build --compiler=mingw32 install \
    --single-version-externally-managed --root /c/ --prefix=${distdir}
if [ -f jcc/jcc.lib ]; then
  cp -p jcc/jcc.lib ${sitepackages}/jcc/jcc.lib
fi
# for 3.0.2 compiled with MinGW GCC 4.x and --shared, we also need two
# GCC libraries
if [ -f /mingw/bin/libstdc++-6.dll ]; then
  install -m 555 /mingw/bin/libstdc++-6.dll ${distdir}/bin/
  echo copied libstdc++-6.dll
fi
if [ -f /mingw/bin/libgcc_s_dw2-1.dll ]; then
  install -m 555 /mingw/bin/libgcc_s_dw2-1.dll ${distdir}/bin/
  echo copied libgcc_s_dw2-1.dll
fi
cd ..


--- jcc-2.9-mingw-PATCH

*** setup.py	2009-10-28 15:24:16.0 -0700
--- setup.py	2010-03-29 22:08:56.0 -0700
***************
*** 262,268 ****
      elif platform == 'win32':
          jcclib = 'jcc%s.lib' %(debug and '_d' or '')
          kwds["extra_link_args"] = \
!             lflags + ["/IMPLIB:%s" %(os.path.join('jcc', jcclib))]
          package_data.append(jcclib)
      else:
          kwds["extra_link_args"] = lflags
--- 262,268 ----
      elif platform == 'win32':
          jcclib = 'jcc%s.lib' %(debug and '_d' or '')
          kwds["extra_link_args"] = \
!             lflags + ["-Wl,--out-implib,%s" %(os.path.join('jcc', jcclib))]
          package_data.append(jcclib)
      else:
          kwds["extra_link_args"] = lflags


[jira] [Updated] (LUCENE-3233) HuperDuperSynonymsFilter™

2011-07-06 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3233:


Attachment: LUCENE-3233.patch

Fixed some bugs and added some tests, but there is a problem: I started to add a 
little benchmark and hit this on my largish synonyms file:
{noformat}
java.lang.IllegalStateException: max arc size is too large (445)
{noformat}

Just run TestFSTSynonymFilterFactory and you will see it. I enabled some prints 
and it doesn't look like anything totally stupid is going on... giving up for 
the night :)

 HuperDuperSynonymsFilter™
 -

 Key: LUCENE-3233
 URL: https://issues.apache.org/jira/browse/LUCENE-3233
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
 Attachments: LUCENE-3223.patch, LUCENE-3233.patch, LUCENE-3233.patch, 
 LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, 
 LUCENE-3233.patch


 The current synonymsfilter uses a lot of ram and cpu, especially at build 
 time.
 I think yesterday I heard about huge synonyms files three times.
 So, I think we should use an FST-based structure, sharing the inputs and 
 outputs.
 And we should be more efficient with the tokenStream api, e.g. using 
 save/restoreState instead of cloneAttributes()
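
As a rough illustration of the save/restoreState idea, here is a minimal sketch (the filter name and its emit-every-token-twice behaviour are hypothetical, purely to show buffering via captureState()/restoreState() instead of cloneAttributes()):

{code:java}
// Sketch only: a toy filter that emits each token twice, to illustrate buffering a
// token with captureState()/restoreState() rather than cloning the AttributeSource.
import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.util.AttributeSource;

final class ReplayingFilter extends TokenFilter {
  private AttributeSource.State saved; // cheap copy of the current token's attribute values

  ReplayingFilter(TokenStream in) { super(in); }

  @Override
  public boolean incrementToken() throws IOException {
    if (saved != null) {        // replay the buffered token
      restoreState(saved);
      saved = null;
      return true;
    }
    if (!input.incrementToken()) {
      return false;
    }
    saved = captureState();     // remember this token so it can be emitted again
    return true;
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    saved = null;
  }
}
{code}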

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3233) HuperDuperSynonymsFilter™

2011-07-06 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3233:


Attachment: synonyms.zip

Attaching the synonyms.txt test file I was using; it's derived from WordNet.

 HuperDuperSynonymsFilter™
 -

 Key: LUCENE-3233
 URL: https://issues.apache.org/jira/browse/LUCENE-3233
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
 Attachments: LUCENE-3223.patch, LUCENE-3233.patch, LUCENE-3233.patch, 
 LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, 
 LUCENE-3233.patch, synonyms.zip


 The current synonymsfilter uses a lot of ram and cpu, especially at build 
 time.
 I think yesterday I heard about huge synonyms files three times.
 So, I think we should use an FST-based structure, sharing the inputs and 
 outputs.
 And we should be more efficient with the tokenStream api, e.g. using 
 save/restoreState instead of cloneAttributes()

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2795) Genericize DirectIOLinuxDir - UnixDir

2011-07-06 Thread Varun Thacker (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060331#comment-13060331
 ] 

Varun Thacker commented on LUCENE-2795:
---

{quote}
Hey Varun - saw you asked for someone with a mac to run some code for you in 
IRC but you popped off before I saw - what do you need? Just apply the patch 
and run the tests?
{quote} 

This patch applies to the LUCENE2793 branch. Otherwise, in the file 
lucene/contrib/misc/src/java/org/apache/lucene/store/NativePosixUtil.cpp, add this 
line after line 117, inside the if: {code}fcntl(fd, F_NOCACHE, 1);{code}

Then run {code}ant build-native-unix{code} from the /contrib/misc folder to check 
that it compiles successfully. 

Thanks.

 Genericize DirectIOLinuxDir - UnixDir
 --

 Key: LUCENE-2795
 URL: https://issues.apache.org/jira/browse/LUCENE-2795
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2795.patch


 Today DirectIOLinuxDir is tricky/dangerous to use, because you only want to 
 use it for IndexWriter and not IndexReader (searching).  It's a trap.
 But, once we do LUCENE-2793, we can make it fully general purpose because 
 then a single native Dir impl can be used.
 I'd also like to make it generic to other Unices, if we can, so that it 
 becomes UnixDirectory.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-3.x - Build # 9364 - Failure

2011-07-06 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/9364/

1 tests failed.
REGRESSION:  
org.apache.lucene.index.TestIndexWriterOnDiskFull.testAddDocumentOnDiskFull

Error Message:
CFS has no entries

Stack Trace:
java.lang.IllegalStateException: CFS has no entries
at 
org.apache.lucene.store.CompoundFileWriter.close(CompoundFileWriter.java:139)
at 
org.apache.lucene.store.CompoundFileDirectory.close(CompoundFileDirectory.java:181)
at 
org.apache.lucene.store.DefaultCompoundFileDirectory.close(DefaultCompoundFileDirectory.java:58)
at 
org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java:139)
at 
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4252)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3863)
at 
org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:37)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2715)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2710)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2706)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3513)
at 
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2064)
at 
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2031)
at 
org.apache.lucene.index.TestIndexWriterOnDiskFull.addDoc(TestIndexWriterOnDiskFull.java:539)
at 
org.apache.lucene.index.TestIndexWriterOnDiskFull.testAddDocumentOnDiskFull(TestIndexWriterOnDiskFull.java:74)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1195)




Build Log (for compile errors):
[...truncated 10589 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked

2011-07-06 Thread Stefan Matheis (steffkes) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060342#comment-13060342
 ] 

Stefan Matheis (steffkes) commented on SOLR-2399:
-

Mark,

changing the paths should be really easy, like Ryan said. So, should we use 
something other than {{request.getContextPath()}}? Maybe combined with a 
conditional?
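
For illustration, a minimal sketch of getContextPath() combined with a conditional (the variable names and the "/solr" fallback are assumptions, not committed behaviour):

{code:java}
// Sketch only: getContextPath() returns "" for a root deployment, so guard it
// before building admin URLs from it. The fallback value is hypothetical.
String contextPath = request.getContextPath();   // e.g. "" or "/solr"
String adminBase = (contextPath == null || contextPath.length() == 0)
    ? "/solr"
    : contextPath;
String pingUrl = adminBase + "/admin/ping";
{code}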

bq. but it complains about the javascript variable class in script.js (L969) 
when i repackage.
[that's already 
changed|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/src/webapp/web/js/script.js?r1=1138322r2=1138323;]
 : Or did you use another Version?

Stefan

 Solr Admin Interface, reworked
 --

 Key: SOLR-2399
 URL: https://issues.apache.org/jira/browse/SOLR-2399
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Reporter: Stefan Matheis (steffkes)
Assignee: Ryan McKinley
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, 
 SOLR-2399-110606.patch, SOLR-2399-110622.patch, SOLR-2399-110702.patch, 
 SOLR-2399-admin-interface.patch, SOLR-2399-analysis-stopwords.patch, 
 SOLR-2399-fluid-width.patch, SOLR-2399-sorting-fields.patch, 
 SOLR-2399-wip-notice.patch, SOLR-2399.patch


 *The idea was to create a new, fresh (and hopefully clean) Solr Admin 
 Interface.* [Based on this 
 [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]]
 *Features:*
 * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png]
 * [Query-Form|http://files.mathe.is/solr-admin/02_query.png]
 * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png]
 * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, 
 SOLR-2400)
 * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png]
 * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482)
 * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png]
 * [Replication|http://files.mathe.is/solr-admin/10_replication.png]
 * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png]
 * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459)
 ** Stub (using static data)
 Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI
 I've quickly created a Github-Repository (Just for me, to keep track of the 
 changes)
 » https://github.com/steffkes/solr-admin

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked

2011-07-06 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060435#comment-13060435
 ] 

Erik Hatcher commented on SOLR-2399:


Perhaps we should close this issue and open new ones since this one is getting 
incredibly long in comments and it's already been committed.

But... one issue I have is that the schema/config views don't take advantage of 
my browser's ability to render XML as a collapsible/expandable tree structure.  
It's surely nice as it is now for browsers that don't do XML like this... so 
maybe we leave it like it is but also provide a direct link to the show file 
request handler for those files like the old-school admin links do.  Thoughts?

 Solr Admin Interface, reworked
 --

 Key: SOLR-2399
 URL: https://issues.apache.org/jira/browse/SOLR-2399
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Reporter: Stefan Matheis (steffkes)
Assignee: Ryan McKinley
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, 
 SOLR-2399-110606.patch, SOLR-2399-110622.patch, SOLR-2399-110702.patch, 
 SOLR-2399-admin-interface.patch, SOLR-2399-analysis-stopwords.patch, 
 SOLR-2399-fluid-width.patch, SOLR-2399-sorting-fields.patch, 
 SOLR-2399-wip-notice.patch, SOLR-2399.patch


 *The idea was to create a new, fresh (and hopefully clean) Solr Admin 
 Interface.* [Based on this 
 [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]]
 *Features:*
 * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png]
 * [Query-Form|http://files.mathe.is/solr-admin/02_query.png]
 * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png]
 * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, 
 SOLR-2400)
 * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png]
 * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482)
 * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png]
 * [Replication|http://files.mathe.is/solr-admin/10_replication.png]
 * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png]
 * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459)
 ** Stub (using static data)
 Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI
 I've quickly created a Github-Repository (Just for me, to keep track of the 
 changes)
 » https://github.com/steffkes/solr-admin

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked

2011-07-06 Thread Stefan Matheis (steffkes) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060441#comment-13060441
 ] 

Stefan Matheis (steffkes) commented on SOLR-2399:
-

bq. ... so maybe we leave it like it is but also provide a direct link to the 
show file request handler for those files like the old-school admin links do. 
Thoughts?
Either that, yes -- or we add tabs to the relevant views: the current one first, 
and an additional one with a raw view?

 Solr Admin Interface, reworked
 --

 Key: SOLR-2399
 URL: https://issues.apache.org/jira/browse/SOLR-2399
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Reporter: Stefan Matheis (steffkes)
Assignee: Ryan McKinley
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, 
 SOLR-2399-110606.patch, SOLR-2399-110622.patch, SOLR-2399-110702.patch, 
 SOLR-2399-admin-interface.patch, SOLR-2399-analysis-stopwords.patch, 
 SOLR-2399-fluid-width.patch, SOLR-2399-sorting-fields.patch, 
 SOLR-2399-wip-notice.patch, SOLR-2399.patch


 *The idea was to create a new, fresh (and hopefully clean) Solr Admin 
 Interface.* [Based on this 
 [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]]
 *Features:*
 * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png]
 * [Query-Form|http://files.mathe.is/solr-admin/02_query.png]
 * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png]
 * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, 
 SOLR-2400)
 * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png]
 * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482)
 * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png]
 * [Replication|http://files.mathe.is/solr-admin/10_replication.png]
 * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png]
 * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459)
 ** Stub (using static data)
 Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI
 I've quickly created a Github-Repository (Just for me, to keep track of the 
 changes)
 » https://github.com/steffkes/solr-admin

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: failonjavadocwarning to false for ant generate-maven-artifacts

2011-07-06 Thread Eric Charles

Hi Steven,

Thanks for the explanation.
I should have seen that from the Ant targets!

Diverging from the thread title, and probably already discussed... but 
why are the POMs not committed and maintained in SVN?


(Even if 'mvn install' works fine, I still have issues importing the 
module hierarchy into Eclipse due to e.g. the src/test-framework module. 
Having the POMs in trunk would allow fixing this once and for all.)


Thx.

On 04/07/11 20:07, Steven A Rowe wrote:

Hi Eric,

'ant get-maven-poms' will generate the pom.xml files for you.

'ant generate-maven-artifacts' has to generate the javadoc for each module, and 
javadoc generation fails on warnings.  When the javadoc tool fails to download 
the package list from Oracle, which seems to happen often, the resulting 
warning fails the build.

Steve

-Original Message-
From: Eric Charles [mailto:eric.char...@u-mangate.com]
Sent: Monday, July 04, 2011 5:07 AM
To: dev@lucene.apache.org
Subject: failonjavadocwarning to false for ant generate-maven-artifacts

Hi,
In the current trunk, I had to set failonjavadocwarning to false to successfully 
generate the POMs (via ant generate-maven-artifacts).

(Invoking ant javadoc in the lucene folder also fails.)

I was simply looking for the pom.xml generation, but much more was done.

I'm not worried about that (just wanted to share it).
Thx.
--
Eric

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional 
commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org




--
Eric

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-2638) A CoreContainer Plugin interface to create Container level Services

2011-07-06 Thread Noble Paul (JIRA)
A CoreContainer Plugin interface to create Container level Services
---

 Key: SOLR-2638
 URL: https://issues.apache.org/jira/browse/SOLR-2638
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Reporter: Noble Paul
Assignee: Noble Paul


It can help register services such as ZooKeeper.

interface

{code:java}
public abstract class ContainerPlugin {
  /**Called before initializing any core.
   * @param container
   * @param attrs
   */
  public abstract void init(CoreContainer container, Map<String,String> attrs);

  /**Callback after all cores are initialized
   */
  public void postInit(){}

  /**Callback after each core is created, but before registration
   * @param core
   */
  public void onCoreCreate(SolrCore core){}

  /**Callback for server shutdown
   */
  public void shutdown(){}

}
{code}

It may be specified in solr.xml as
{code:xml}
<solr>

  <plugin name="zk" class="solr.ZookeeperService" param1="val1" param2="val2"
          zkClientTimeout="8000"/>

  <cores adminPath="/admin/cores" defaultCoreName="collection1"
         host="127.0.0.1" hostPort="${hostPort:8983}" hostContext="solr">
    <core name="collection1" shard="${shard:}"
          collection="${collection:collection1}" config="${solrconfig:solrconfig.xml}"
          instanceDir="."/>
  </cores>
</solr>
{code}
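
For illustration, a minimal sketch of a concrete plugin against the proposed interface (the class name and logging are hypothetical; only the callbacks come from the ContainerPlugin sketch above):

{code:java}
// Hypothetical example built on the proposed ContainerPlugin interface: it just
// logs the lifecycle callbacks and keeps the attributes declared in solr.xml.
import java.util.Map;
import java.util.logging.Logger;

import org.apache.solr.core.CoreContainer;
import org.apache.solr.core.SolrCore;

public class LoggingContainerPlugin extends ContainerPlugin {
  private static final Logger log =
      Logger.getLogger(LoggingContainerPlugin.class.getName());
  private Map<String,String> attrs;

  @Override
  public void init(CoreContainer container, Map<String,String> attrs) {
    this.attrs = attrs;             // e.g. param1/param2/zkClientTimeout from <plugin/>
    log.info("plugin init: " + attrs);
  }

  @Override
  public void postInit() {
    log.info("all cores initialized");
  }

  @Override
  public void onCoreCreate(SolrCore core) {
    log.info("core created: " + core.getName());
  }

  @Override
  public void shutdown() {
    log.info("container shutting down");
  }
}
{code}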



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3167) Make lucene/solr a OSGI bundle through Ant

2011-07-06 Thread Luca Stancapiano (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060469#comment-13060469
 ] 

Luca Stancapiano commented on LUCENE-3167:
--

Some OSGI information is inside the pom.xml. So if we want to automate creating 
the OSGI attributes in the MANIFEST.MF, we have to read the pom.xml.template 
files.

This is the OSGI information to add to the MANIFEST:

Bundle-License: http://www.apache.org/licenses/LICENSE-2.0.txt
(project.licenses.license.url in the parent pom.xml.template)
Bundle-SymbolicName: org.apache.lucene.misc
(project.groupId+project.artifactId in the pom.xml.template)
Bundle-Name: Lucene Miscellaneous  (project.name attribute in the 
pom.xml.template)
Bundle-Vendor: The Apache Software Foundation   (from the parent 
pom.xml.template)  
Bundle-Version: 4.0-SNAPSHOT   ($version variable from ant)
Bundle-Description: Miscellaneous Lucene extensions 
(project.description from pom.xml.template)  
Bundle-DocURL: http://www.apache.org/
(project.documentation.url in the parent pom.xml.template)

Otherwise we would have to duplicate the information. Which is the better approach?


 Make lucene/solr a OSGI bundle through Ant
 --

 Key: LUCENE-3167
 URL: https://issues.apache.org/jira/browse/LUCENE-3167
 Project: Lucene - Java
  Issue Type: New Feature
 Environment: bndtools
Reporter: Luca Stancapiano

 We need to make a bundle through Ant, so the binary can be published and there 
 is no more need to download the sources. Currently, to get an OSGI bundle we 
 need to use the Maven tooling and build the sources. Here is the reference for 
 the creation of the OSGI bundle through Maven:
 https://issues.apache.org/jira/browse/LUCENE-1344
 Bndtools could be used inside Ant

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3233) HuperDuperSynonymsFilter™

2011-07-06 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060471#comment-13060471
 ] 

Michael McCandless commented on LUCENE-3233:


bq. java.lang.IllegalStateException: max arc size is too large (445)

Ahh -- to fix this we have to call Builder.setAllowArrayArcs(false), ie, 
disable the array arcs in the FST (and thus the binary-search lookup for finding 
arcs!).  I had to do this also for MemoryCodec, since postings encoded as 
output per arc can be more than 256 bytes, in general.

This will hurt perf, ie, the arc lookup cannot use a binary search; it's 
because of a silly limitation in the FST representation: we use a single 
byte to hold the max size of all arcs, so that if any arc is > 256 bytes we are 
unable to encode it as an array.  We could fix this (eg, use vInt); however, 
arcs with such widely varying sizes (due to widely varying outputs on each arc) 
will be very wasteful of space, because all arcs will use up a fixed number of 
bytes when represented as an array.

For now I think we should just call the above method, and then test the 
resulting perf.
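
Roughly, that looks like this (a sketch only; the two-argument Builder constructor and ByteSequenceOutputs here are assumptions for illustration, the relevant call is setAllowArrayArcs(false)):

{code:java}
// Sketch only; the constructor form and outputs are assumed for illustration.
Builder<BytesRef> builder =
    new Builder<BytesRef>(FST.INPUT_TYPE.BYTE4, ByteSequenceOutputs.getSingleton());
builder.setAllowArrayArcs(false);  // no fixed-width arc arrays, so >256-byte arcs are OK
// ... add(input, output) calls, then builder.finish() as usual
{code}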

 HuperDuperSynonymsFilter™
 -

 Key: LUCENE-3233
 URL: https://issues.apache.org/jira/browse/LUCENE-3233
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
 Attachments: LUCENE-3223.patch, LUCENE-3233.patch, LUCENE-3233.patch, 
 LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, 
 LUCENE-3233.patch, synonyms.zip


 The current synonymsfilter uses a lot of ram and cpu, especially at build 
 time.
 I think yesterday I heard about huge synonyms files three times.
 So, I think we should use an FST-based structure, sharing the inputs and 
 outputs.
 And we should be more efficient with the tokenStream api, e.g. using 
 save/restoreState instead of cloneAttributes()

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3167) Make lucene/solr a OSGI bundle through Ant

2011-07-06 Thread Gunnar Wagenknecht (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060472#comment-13060472
 ] 

Gunnar Wagenknecht commented on LUCENE-3167:


The approach I implemented in my patch (attached to LUCENE-1344) used template 
files for BND. I wonder if both Ant and Maven could use those files.

 Make lucene/solr a OSGI bundle through Ant
 --

 Key: LUCENE-3167
 URL: https://issues.apache.org/jira/browse/LUCENE-3167
 Project: Lucene - Java
  Issue Type: New Feature
 Environment: bndtools
Reporter: Luca Stancapiano

 We need to make a bundle through Ant, so the binary can be published and there 
 is no more need to download the sources. Currently, to get an OSGI bundle we 
 need to use the Maven tooling and build the sources. Here is the reference for 
 the creation of the OSGI bundle through Maven:
 https://issues.apache.org/jira/browse/LUCENE-1344
 Bndtools could be used inside Ant

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2638) A CoreContainer Plugin interface to create Container level Services

2011-07-06 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-2638:
-

Attachment: SOLR-2638.patch

First cut

 A CoreContainer Plugin interface to create Container level Services
 ---

 Key: SOLR-2638
 URL: https://issues.apache.org/jira/browse/SOLR-2638
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Reporter: Noble Paul
Assignee: Noble Paul
 Attachments: SOLR-2638.patch


 It can help register services such as Zookeeper .
 interface
 {code:java}
 public abstract class ContainerPlugin {
   /**Called before initializing any core.
* @param container
* @param attrs
*/
   public abstract void init(CoreContainer container, Map<String,String> attrs);
   /**Callback after all cores are initialized
*/
   public void postInit(){}
   /**Callback after each core is created, but before registration
* @param core
*/
   public void onCoreCreate(SolrCore core){}
   /**Callback for server shutdown
*/
   public void shutdown(){}
 }
 {code}
 It may be specified in solr.xml as
 {code:xml}
 <solr>
   <plugin name="zk" class="solr.ZookeeperService" param1="val1" param2="val2"
           zkClientTimeout="8000"/>
   <cores adminPath="/admin/cores" defaultCoreName="collection1"
          host="127.0.0.1" hostPort="${hostPort:8983}" hostContext="solr">
     <core name="collection1" shard="${shard:}"
           collection="${collection:collection1}" config="${solrconfig:solrconfig.xml}"
           instanceDir="."/>
   </cores>
 </solr>
 {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3216) Store DocValues per segment instead of per field

2011-07-06 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-3216:


Attachment: LUCENE-3216.patch

Here is a new patch that moves the DocValues configuration to setters.

I also added a randomizeCodec(Codec) method to LuceneTestCase that sets the CFS 
flag at random. 



 Store DocValues per segment instead of per field
 

 Key: LUCENE-3216
 URL: https://issues.apache.org/jira/browse/LUCENE-3216
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, 
 LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, 
 LUCENE-3216_floats.patch


 Currently we are storing docvalues per field, which results in at least one 
 file per field that uses docvalues (or at most two per field per segment, 
 depending on the impl.). Yet, we should try, by default, to pack docvalues into 
 a single file if possible. To enable this we need to hold all docvalues in 
 memory during indexing and write them to disk once we flush a segment. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3167) Make lucene/solr a OSGI bundle through Ant

2011-07-06 Thread Luca Stancapiano (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060522#comment-13060522
 ] 

Luca Stancapiano commented on LUCENE-3167:
--

It depends on the structure of your templates. Can you tell me how the tree of 
the template files is organized?

 Make lucene/solr a OSGI bundle through Ant
 --

 Key: LUCENE-3167
 URL: https://issues.apache.org/jira/browse/LUCENE-3167
 Project: Lucene - Java
  Issue Type: New Feature
 Environment: bndtools
Reporter: Luca Stancapiano

 We need to make a bundle through Ant, so the binary can be published and there 
 is no more need to download the sources. Currently, to get an OSGI bundle we 
 need to use the Maven tooling and build the sources. Here is the reference for 
 the creation of the OSGI bundle through Maven:
 https://issues.apache.org/jira/browse/LUCENE-1344
 Bndtools could be used inside Ant

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3279) Allow CFS be empty

2011-07-06 Thread Simon Willnauer (JIRA)
Allow CFS be empty
--

 Key: LUCENE-3279
 URL: https://issues.apache.org/jira/browse/LUCENE-3279
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Affects Versions: 3.4, 4.0
Reporter: Simon Willnauer
 Fix For: 3.4, 4.0


Since we changed CFS semantics slightly, closing a CFS directory on an error can 
lead to an exception. Yet, an empty CFS is still a valid CFS, so for consistency 
we should allow a CFS to be empty.
Here is an example:

{noformat}
1 tests failed.
REGRESSION:  
org.apache.lucene.index.TestIndexWriterOnDiskFull.testAddDocumentOnDiskFull

Error Message:
CFS has no entries

Stack Trace:
java.lang.IllegalStateException: CFS has no entries
   at 
org.apache.lucene.store.CompoundFileWriter.close(CompoundFileWriter.java:139)
   at 
org.apache.lucene.store.CompoundFileDirectory.close(CompoundFileDirectory.java:181)
   at 
org.apache.lucene.store.DefaultCompoundFileDirectory.close(DefaultCompoundFileDirectory.java:58)
   at 
org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java:139)
   at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4252)
   at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3863)
   at 
org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:37)
   at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2715)
   at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2710)
   at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2706)
   at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3513)
   at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2064)
   at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2031)
   at 
org.apache.lucene.index.TestIndexWriterOnDiskFull.addDoc(TestIndexWriterOnDiskFull.java:539)
   at 
org.apache.lucene.index.TestIndexWriterOnDiskFull.testAddDocumentOnDiskFull(TestIndexWriterOnDiskFull.java:74)
   at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
   at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1195)
{noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: failonjavadocwarning to false for ant generate-maven-artifacts

2011-07-06 Thread Steven A Rowe
Eric,

On 7/6/2011 at 5:35 AM, Eric Charles wrote:
 Diverging from the thread title and probably already discussed... 
 but why pom are not committed and maintained in SVN?

The POMs *are* committed to SVN, under dev-tools/maven/, as pom.xml.template 
files, which have their version filled in when they are copied over to where 
they can be used and renamed to pom.xml.  The Maven configuration is a 
non-official build, and maintaining the POMs outside of the main source tree is 
one way in which this fact is conveyed to users.

 Even if 'mvn install' works fine, I still have issues importing the
 modules hierarchy in eclipse due to e.g. the src/test-framework module.

Do you know about the eclipse configuration available via 'ant eclipse' ?

Steve

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-3279) Allow CFS be empty

2011-07-06 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer reassigned LUCENE-3279:
---

Assignee: Simon Willnauer

 Allow CFS be empty
 --

 Key: LUCENE-3279
 URL: https://issues.apache.org/jira/browse/LUCENE-3279
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Affects Versions: 3.4, 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3279.patch


 Since we changed CFS semantics slightly, closing a CFS directory on an error 
 can lead to an exception. Yet, an empty CFS is still a valid CFS, so for 
 consistency we should allow a CFS to be empty.
 Here is an example:
 {noformat}
 1 tests failed.
 REGRESSION:  
 org.apache.lucene.index.TestIndexWriterOnDiskFull.testAddDocumentOnDiskFull
 Error Message:
 CFS has no entries
 Stack Trace:
 java.lang.IllegalStateException: CFS has no entries
at 
 org.apache.lucene.store.CompoundFileWriter.close(CompoundFileWriter.java:139)
at 
 org.apache.lucene.store.CompoundFileDirectory.close(CompoundFileDirectory.java:181)
at 
 org.apache.lucene.store.DefaultCompoundFileDirectory.close(DefaultCompoundFileDirectory.java:58)
at 
 org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java:139)
at 
 org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4252)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3863)
at 
 org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:37)
at 
 org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2715)
at 
 org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2710)
at 
 org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2706)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3513)
at 
 org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2064)
at 
 org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2031)
at 
 org.apache.lucene.index.TestIndexWriterOnDiskFull.addDoc(TestIndexWriterOnDiskFull.java:539)
at 
 org.apache.lucene.index.TestIndexWriterOnDiskFull.testAddDocumentOnDiskFull(TestIndexWriterOnDiskFull.java:74)
at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1195)
 {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3279) Allow CFS be empty

2011-07-06 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-3279:


Attachment: LUCENE-3279.patch

Here is a patch.

 Allow CFS be empty
 --

 Key: LUCENE-3279
 URL: https://issues.apache.org/jira/browse/LUCENE-3279
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Affects Versions: 3.4, 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3279.patch


 Since we changed CFS semantics slightly, closing a CFS directory on an error 
 can lead to an exception. Yet, an empty CFS is still a valid CFS, so for 
 consistency we should allow a CFS to be empty.
 Here is an example:
 {noformat}
 1 tests failed.
 REGRESSION:  
 org.apache.lucene.index.TestIndexWriterOnDiskFull.testAddDocumentOnDiskFull
 Error Message:
 CFS has no entries
 Stack Trace:
 java.lang.IllegalStateException: CFS has no entries
at 
 org.apache.lucene.store.CompoundFileWriter.close(CompoundFileWriter.java:139)
at 
 org.apache.lucene.store.CompoundFileDirectory.close(CompoundFileDirectory.java:181)
at 
 org.apache.lucene.store.DefaultCompoundFileDirectory.close(DefaultCompoundFileDirectory.java:58)
at 
 org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java:139)
at 
 org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4252)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3863)
at 
 org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:37)
at 
 org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2715)
at 
 org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2710)
at 
 org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2706)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3513)
at 
 org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2064)
at 
 org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2031)
at 
 org.apache.lucene.index.TestIndexWriterOnDiskFull.addDoc(TestIndexWriterOnDiskFull.java:539)
at 
 org.apache.lucene.index.TestIndexWriterOnDiskFull.testAddDocumentOnDiskFull(TestIndexWriterOnDiskFull.java:74)
at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1195)
 {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: failonjavadocwarning to false for ant generate-maven-artifacts

2011-07-06 Thread Eric Charles

Hi Steven,

On 06/07/11 14:17, Steven A Rowe wrote:

Eric,

On 7/6/2011 at 5:35 AM, Eric Charles wrote:

Diverging from the thread title and probably already discussed...
but why pom are not committed and maintained in SVN?


The POMs *are* committed to SVN, under dev-tools/maven/, as pom.xml.template files, 
which have their version filled in when they are copied over to where they 
can be used and renamed to pom.xml.  The Maven configuration is a non-official build, 
and maintaining the POMs outside of the main source tree is one way in which this 
fact is conveyed to users.



Yes, I just saw this.
It's just that they are not committed in the standard place, and they need to be 
generated before they can be used.



Even if 'mvn install' works fine, I still have issues importing the
modules hierarchy in eclipse due to e.g. the src/test-framework module.


Do you know about the eclipse configuration available via 'ant eclipse' ?



I've given 'ant eclipse' a try, and yes, it creates one Eclipse project with 
many src folders (lucene, contrib, modules, solr...).


Before coming to the mailing list, I looked across the Lucene website and 
wiki for the information.
Now I see it's on 
http://wiki.apache.org/solr/HowToContribute#Development_Environment_Tips 
and in the SVN README.txt. Sorry for the noise.


I'm more used to working with m2eclipse, which allows me to directly import 
modules as separate Eclipse projects, with snapshots resolved from the 
Maven repo. But that's just a developer habit, and I'm fine working with 
the generated Eclipse project. Nevertheless, I would have preferred 
having the final POMs committed, to be sure they import fine in Eclipse 
(right now, I'm stuck on the src/test-framework module).



Steve

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org




--
Eric

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-07-06 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060537#comment-13060537
 ] 

Simon Willnauer commented on LUCENE-2793:
-

bq. Made the necessary changes and hopefully addressed all the nocommits.
Varun, I still see lots of nocommits here. It would be good if you could address 
them this week. You don't need to solve them, but discuss them here with us. You 
can do that in a patch, adding your comments to the parts where you are not 
sure how to resolve them.

I would like to commit the patches this week so we can merge to trunk soonish.

Simon

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793-nrt.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.
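
As a rough sketch of that proposal (all names and values below are hypothetical, not the committed API):

{code:java}
// Hypothetical sketch of an IOContext: a read buffer size plus access-pattern
// hints, to be passed into Directory.createOutput/openInput.
public final class IOContext {
  public enum Hint { DEFAULT, SEQUENTIAL, DIRECT }  // merging would ask for SEQUENTIAL/DIRECT

  public final int readBufferSize;
  public final Hint hint;

  public IOContext(int readBufferSize, Hint hint) {
    this.readBufferSize = readBufferSize;
    this.hint = hint;
  }

  // Illustrative defaults: searching keeps a small buffer, merging a larger one.
  public static final IOContext READ  = new IOContext(1024, Hint.DEFAULT);
  public static final IOContext MERGE = new IOContext(4096, Hint.SEQUENTIAL);
}
{code}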

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: revisit naming for grouping/join?

2011-07-06 Thread Michael McCandless
On Tue, Jul 5, 2011 at 5:44 PM, Mike Sokolov soko...@ifactory.com wrote:

 : Maybe modules/nested? modules/nesteddocs?

        modules/subdocs
        modules/nesteddocs
        modules/nested

 None of them scream this is the perfect name to me, but none of them
 scream dear lord this is a terrible idea either.

 Instinct says All other factors being equal, pick the shortest name

 : Hmm... sub feels like it undersells, ie emphasizes under or
 : inferior to and de-emphasizes the strong cooperation w/ the parent.


 How about modules/superdoc?

 It wouldn't undersell, at least :)

I agree it's no longer under selling :)

But I like this even less than sub!  First, I think it has the same
problems that sub has, since it's just the symmetric counterpart: it's too
un-equal, ie it implies one side is superior to and above the other, when in
fact the joined cases (XML search, product SKUs, nested docs, etc.) are really
symmetric.  The nested parts of the doc are just as valid a part of
the document as the non-nested part.

Second, I don't like the super-ness of super (ie, in the sense of
supercalifragilisticexpialidocious or superman or superwoman) -- it's
too generic, ie, like best or awesome.

Mike McCandless

http://blog.mikemccandless.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-07-06 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060538#comment-13060538
 ] 

Michael McCandless commented on LUCENE-2793:


+1

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793-nrt.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: revisit naming for grouping/join?

2011-07-06 Thread Mike Sokolov



On 07/06/2011 08:47 AM, Michael McCandless wrote:

How about modules/superdoc?

It wouldn't undersell, at least :)
 

I agree it's no longer under selling :)

But I like this even less than sub!  First, I think it has the same
problems that sub has since it's just symmetric: it's too un-equal, ie
implies one side is superior and above the other side,
I basically agree, although I think there is an asymmetry in that this 
is a many-to-one relation?  The main improvement this name makes is the 
removal of the plural in the other options (doc vs docs).  And it's 
shorter than huperduperdoc :)  But otoh nothing I've seen here really 
captures all that much about index-time vs query-time join, which seems 
to be the main distinction (and why you can't just call it join)?  If 
you're still in the market for names, here are a few: StructureJoin, 
IntrinsicJoin, TreeJoin; Branch? Just brainstorming loosely.  Frankly 
Nest* seems good enough.


-Sokolov

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Two words Terms

2011-07-06 Thread DM Smith
You'll get more responses if you ask on the user's list. This list is for the 
development of the Lucene library, not for user application of the library.
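
That said, for what it's worth, one minimal way to get multi-word terms (a sketch only, assuming Lucene 3.x, hypothetical field names, and KeywordAnalyzer for the multi-word field; the users list is still the right place for follow-up) is:

    // Sketch: keep multi-word units in their own field analyzed with KeywordAnalyzer,
    // so each value becomes a single term. Classes come from the org.apache.lucene
    // analysis, document, index, search and util packages.
    PerFieldAnalyzerWrapper analyzer =
        new PerFieldAnalyzerWrapper(new StandardAnalyzer(Version.LUCENE_33));
    analyzer.addAnalyzer("phrase", new KeywordAnalyzer());  // pass this analyzer to the IndexWriter

    Document doc = new Document();
    doc.add(new Field("body", "I have problems using the operating system windows 7",
                      Field.Store.YES, Field.Index.ANALYZED));
    doc.add(new Field("phrase", "operating system", Field.Store.NO, Field.Index.ANALYZED));
    doc.add(new Field("phrase", "windows 7", Field.Store.NO, Field.Index.ANALYZED));

    // A term may contain a space, so a plain TermQuery matches the whole unit:
    Query q = new TermQuery(new Term("phrase", "operating system"));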

On Jul 5, 2011, at 6:42 PM, jcardona7508 wrote:

 Hi everybody, I have a question: I need to create documents with two-word 
 terms. For example, the content of the document is:
 "I have problems using the operating system windows 7", where the terms must be:
 Term1: I
 Term2: have
 Term3: problems
 Term4: using
 Term5: the
 Term6: operating system
 Term7: windows 7
 
 Terms 6 and 7 must be two words, "operating system" and "windows 7", 
 because in the program they make sense together, not "operating", "system", 
 "windows", "7".
 Can I create terms with 2 words, like Term6: operating system?
 What can I do?
 Thanks
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Two-words-Terms-tp3142833p3142833.html
 Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



CloudStateUpdateTest too many close

2011-07-06 Thread Yonik Seeley
I just noticed that CloudStateUpdateTest consistently  generates the
following log message:

SEVERE: Too many close [count:-1] on
org.apache.solr.core.SolrCore@5dedb45. Please report this exception to
solr-u...@lucene.apache.org

-Yonik
http://www.lucidimagination.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2638) A CoreContainer Plugin interface to create Container level Services

2011-07-06 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060573#comment-13060573
 ] 

Mark Miller commented on SOLR-2638:
---

Cool - it would be nice to abstract some of this out of CoreContainer - it could 
use some slimming.

 A CoreContainer Plugin interface to create Container level Services
 ---

 Key: SOLR-2638
 URL: https://issues.apache.org/jira/browse/SOLR-2638
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Reporter: Noble Paul
Assignee: Noble Paul
 Attachments: SOLR-2638.patch


 It can help register services such as Zookeeper .
 interface
 {code:java}
 public abstract class ContainerPlugin {
   /**Called before initializing any core.
* @param container
* @param attrs
*/
   public abstract void init(CoreContainer container, Map<String,String> attrs);
   /**Callback after all cores are initialized
*/
   public void postInit(){}
   /**Callback after each core is created, but before registration
* @param core
*/
   public void onCoreCreate(SolrCore core){}
   /**Callback for server shutdown
*/
   public void shutdown(){}
 }
 {code}
 It may be specified in solr.xml as
 {code:xml}
 <solr>
   <plugin name="zk" class="solr.ZookeeperService" param1="val1" param2="val2"
           zkClientTimeout="8000"/>
   <cores adminPath="/admin/cores" defaultCoreName="collection1"
          host="127.0.0.1" hostPort="${hostPort:8983}" hostContext="solr">
     <core name="collection1" shard="${shard:}"
           collection="${collection:collection1}" config="${solrconfig:solrconfig.xml}"
           instanceDir="."/>
   </cores>
 </solr>
 {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3233) HuperDuperSynonymsFilter™

2011-07-06 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060578#comment-13060578
 ] 

Michael McCandless commented on LUCENE-3233:


Actually, maybe a better general fix for FST would be for it to dynamically 
decide whether to make an array based on how many bytes will be wasted (in 
addition to the number of arcs / depth of the node).  This way we could leave 
array arcs turned on always, and FST would pick the right times to use them.  If 
we stick to only 1 byte for the number of bytes per arc, the FST could simply not 
use the array when an arc is > 256 bytes.

 HuperDuperSynonymsFilter™
 -

 Key: LUCENE-3233
 URL: https://issues.apache.org/jira/browse/LUCENE-3233
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
 Attachments: LUCENE-3223.patch, LUCENE-3233.patch, LUCENE-3233.patch, 
 LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, 
 LUCENE-3233.patch, synonyms.zip


 The current synonymsfilter uses a lot of ram and cpu, especially at build 
 time.
 I think yesterday I heard about huge synonyms files three times.
 So, I think we should use an FST-based structure, sharing the inputs and 
 outputs.
 And we should be more efficient with the tokenStream api, e.g. using 
 save/restoreState instead of cloneAttributes()

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: CloudStateUpdateTest too many close

2011-07-06 Thread Mark Miller

On Jul 6, 2011, at 9:36 AM, Yonik Seeley wrote:

 I just noticed that CloudStateUpdateTest consistently  generates the
 following log message:
 
 SEVERE: Too many close [count:-1] on
 org.apache.solr.core.SolrCore@5dedb45. Please report this exception to
 solr-u...@lucene.apache.org
 
 -Yonik
 http://www.lucidimagination.com
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 


I'll fix the test cleanup.

- Mark Miller
lucidimagination.com









-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: CloudStateUpdateTest too many close

2011-07-06 Thread Mark Miller

 
 
 I'll fix the test cleanup.
 

Or you will beat me to it - conflicts!



- Mark





-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-07-06 Thread Varun Thacker (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Thacker updated LUCENE-2793:
--

Attachment: LUCENE-2793.patch

I removed all the remaining nocommits, as I think all of them have been 
addressed.

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793-nrt.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.
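 A minimal sketch of what such an IOContext could look like (the field names and the context enum here are illustrative only):
 {code:java}
 public final class IOContext {
   public enum Context { MERGE, READ, FLUSH }   // illustrative set of contexts

   public final Context context;
   public final int readBufferSize;   // larger for merges, smaller for searches
   public final boolean direct;       // bypass the OS buffer cache
   public final boolean sequential;   // access-pattern hint

   public IOContext(Context context, int readBufferSize,
                    boolean direct, boolean sequential) {
     this.context = context;
     this.readBufferSize = readBufferSize;
     this.direct = direct;
     this.sequential = sequential;
   }
 }
 {code}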

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-3246) Invert IR.getDelDocs -> IR.getLiveDocs

2011-07-06 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3246.


Resolution: Fixed

Next I'll work on LUCENE-1536...

 Invert IR.getDelDocs -> IR.getLiveDocs
 --

 Key: LUCENE-3246
 URL: https://issues.apache.org/jira/browse/LUCENE-3246
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-3246-IndexSplitters.patch, LUCENE-3246.patch, 
 LUCENE-3246.patch


 Spinoff from LUCENE-1536, where we need to fix the low level filtering
 we do for deleted docs to match Filters (ie, a set bit means the doc
 is accepted) so that filters can be pushed all the way down to the
 enums when possible/appropriate.
 This change also inverts the meaning of the first arg to
 TermsEnum.docs/AndPositions (renamed from skipDocs to liveDocs).
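 A tiny illustration of the new convention, where a set bit means the document is live (accepted):
 {code:java}
 // Iterate only live (non-deleted) documents under the inverted convention.
 // liveDocs == null means the reader has no deletions.
 Bits liveDocs = reader.getLiveDocs();   // accessor name per this issue's title
 for (int docID = 0; docID < reader.maxDoc(); docID++) {
   if (liveDocs == null || liveDocs.get(docID)) {
     // doc is accepted; process it
   }
 }
 {code}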

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3246) Invert IR.getDelDocs -> IR.getLiveDocs

2011-07-06 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060583#comment-13060583
 ] 

Michael McCandless commented on LUCENE-3246:


This commit changed the index format (the *.del), but the change is fully 
back-compat even with trunk indices.

 Invert IR.getDelDocs -> IR.getLiveDocs
 --

 Key: LUCENE-3246
 URL: https://issues.apache.org/jira/browse/LUCENE-3246
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-3246-IndexSplitters.patch, LUCENE-3246.patch, 
 LUCENE-3246.patch


 Spinoff from LUCENE-1536, where we need to fix the low level filtering
 we do for deleted docs to match Filters (ie, a set bit means the doc
 is accepted) so that filters can be pushed all the way down to the
 enums when possible/appropriate.
 This change also inverts the meaning of the first arg to
 TermsEnum.docs/AndPositions (renamed from skipDocs to liveDocs).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: CloudStateUpdateTest too many close

2011-07-06 Thread Yonik Seeley
On Wed, Jul 6, 2011 at 9:51 AM, Mark Miller markrmil...@gmail.com wrote:



 I'll fix the test cleanup.


 Or you will beat me to it - conflicts!

Heh, I should have just looked at the test first... it was easier than
I thought.

-Yonik
http://www.lucidimagination.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [Lucene.Net] [jira] [Commented] (LUCENENET-431) Spatial.Net Cartesian won't find docs in radius in certain cases

2011-07-06 Thread Matt Warren
I've been looking at this with Olle over on the RavenDB mailing list.

Just to add that this patch
https://issues.apache.org/jira/secure/attachment/12420781/LUCENE-1930.patch
solves
the issue also. It's from this issue
https://issues.apache.org/jira/browse/LUCENE-1930

But it's more complicated than the fix you propose. As far as I can tell it
uses a completely different method of projecting locations, but I don't
really know much about how it works other than that.

On 6 July 2011 14:33, Digy (JIRA) j...@apache.org wrote:


[
 https://issues.apache.org/jira/browse/LUCENENET-431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060571#comment-13060571]

 Digy commented on LUCENENET-431:
 

 Hi Olle,

 {code}
static double TransformLat(double lat)
{
    var PI = 3.14159265358979323846;
    return (Math.Atan(Math.Exp((lat * 180 / 20037508.34) / 180 * PI)) / PI * 360 - 90) * 10;
}

static double TransformLon(double lon)
{
    return (lon * 180 / 20037508.34) * 10;
}

private double _lat = TransformLat(55.6880508001);
private double _lng = TransformLon(13.5871808352); // This passes: 13.6271808352

private void AddData(IndexWriter writer)
{
    AddPoint(writer, "Within radius", TransformLat(55.6880508001), TransformLon(13.5717346673));
    AddPoint(writer, "Within radius", TransformLat(55.6821978456), TransformLon(13.6076183965));
    AddPoint(writer, "Within radius", TransformLat(55.673251569), TransformLon(13.5946697607));
    AddPoint(writer, "Close but not in radius", TransformLat(55.8634157297), TransformLon(13.5497731987));
    AddPoint(writer, "Faar away", TransformLat(40.7137578228), TransformLon(-74.0126901936));

    writer.Commit();
    writer.Close();
}
 {code}

 When I change your code as above, it seems to work (according to the above
 functions, your 4th point should be 11 miles away).

 If this works for all your cases, we can think of a patch for Spatial.Net.
 (Don't ask what these two functions do, since I found them somewhere in the
 OpenLayers project :) )
 Maybe someone can explain these projection issues (if this really is the
 case).

 DIGY



  Spatial.Net Cartesian won't find docs in radius in certain cases
  
 
  Key: LUCENENET-431
  URL: https://issues.apache.org/jira/browse/LUCENENET-431
  Project: Lucene.Net
   Issue Type: Bug
   Components: Lucene.Net Contrib
 Affects Versions: Lucene.Net 2.9.4
  Environment: Windows 7 x64
 Reporter: Olle Jacobsen
   Labels: spatialsearch
 
  To replicate, change Lucene.Net.Contrib.Spatial.Test.TestCartesian to the
  following, which should return 3 results.
  Line
  42: private double _lat = 55.6880508001;
  43: private double _lng = 13.5871808352; // This passes: 13.6271808352
  73: AddPoint(writer, "Within radius", 55.6880508001, 13.5717346673);
  74: AddPoint(writer, "Within radius", 55.6821978456, 13.6076183965);
  75: AddPoint(writer, "Within radius", 55.673251569, 13.5946697607);
  76: AddPoint(writer, "Close but not in radius", 55.8634157297, 13.5497731987);
  77: AddPoint(writer, "Faar away", 40.7137578228, -74.0126901936);
  130: const double miles = 5.0;
  156: Console.WriteLine("Distances should be 3 " + distances.Count);
  157: Console.WriteLine("Results should be 3 " + results);
  159: Assert.AreEqual(3, distances.Count); // fixed a store of only needed distances
  160: Assert.AreEqual(3, results);

 --
 This message is automatically generated by JIRA.
 For more information on JIRA, see: http://www.atlassian.com/software/jira





[jira] [Created] (LUCENE-3280) Add new bit set impl for caching filters

2011-07-06 Thread Michael McCandless (JIRA)
Add new bit set impl for caching filters


 Key: LUCENE-3280
 URL: https://issues.apache.org/jira/browse/LUCENE-3280
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.4, 4.0
 Attachments: LUCENE-3280.patch

I think OpenBitSet is trying to satisfy too many audiences, and it's
confusing/error-prone as a result.  It has int/long variants of many
methods.  Some methods require in-bound access, others don't; of those
others, some methods auto-grow the bits, some don't.  OpenBitSet
doesn't always know its numBits.

I'd like to factor out a more focused bit set impl whose primary
target usage is a cached Lucene Filter, ie a bit set indexed by docID
(int, not long) whose size is known and fixed up front (backed by
final long[]) and is always accessed in-bounds.
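
Roughly along these lines (a sketch only; the class name and exact API are
assumptions):

{code:java}
public final class FixedDocIdBitSet {
  private final long[] bits;   // backing words, allocated once
  private final int numBits;   // size is known and fixed up front

  public FixedDocIdBitSet(int numBits) {
    this.numBits = numBits;
    this.bits = new long[(numBits + 63) >>> 6];
  }

  public void set(int docID) {
    assert docID >= 0 && docID < numBits;   // always accessed in-bounds
    bits[docID >> 6] |= 1L << (docID & 63);
  }

  public boolean get(int docID) {
    assert docID >= 0 && docID < numBits;
    return (bits[docID >> 6] & (1L << (docID & 63))) != 0;
  }

  public int length() {
    return numBits;
  }
}
{code}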


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3280) Add new bit set impl for caching filters

2011-07-06 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3280:
---

Attachment: LUCENE-3280.patch

Initial patch w/ some nocommits still but tests pass...

 Add new bit set impl for caching filters
 

 Key: LUCENE-3280
 URL: https://issues.apache.org/jira/browse/LUCENE-3280
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3280.patch


 I think OpenBitSet is trying to satisfy too many audiences, and it's
 confusing/error-prone as a result.  It has int/long variants of many
 methods.  Some methods require in-bound access, others don't; of those
 others, some methods auto-grow the bits, some don't.  OpenBitSet
 doesn't always know its numBits.
 I'd like to factor out a more focused bit set impl whose primary
 target usage is a cached Lucene Filter, ie a bit set indexed by docID
 (int, not long) whose size is known and fixed up front (backed by
 final long[]) and is always accessed in-bounds.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3233) HuperDuperSynonymsFilter™

2011-07-06 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060613#comment-13060613
 ] 

Robert Muir commented on LUCENE-3233:
-

Thanks Mike, I will set the option for now; we can address any potential perf 
hit in a number of different ways here (besides modifying FST itself).


 HuperDuperSynonymsFilter™
 -

 Key: LUCENE-3233
 URL: https://issues.apache.org/jira/browse/LUCENE-3233
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
 Attachments: LUCENE-3223.patch, LUCENE-3233.patch, LUCENE-3233.patch, 
 LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, 
 LUCENE-3233.patch, synonyms.zip


 The current synonymsfilter uses a lot of ram and cpu, especially at build 
 time.
 I think yesterday I heard about huge synonyms files three times.
 So, I think we should use an FST-based structure, sharing the inputs and 
 outputs.
 And we should be more efficient with the tokenStream api, e.g. using 
 save/restoreState instead of cloneAttributes()

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-07-06 Thread Varun Thacker (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Thacker updated LUCENE-2793:
--

Attachment: LUCENE-2793.patch

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793-nrt.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3280) Add new bit set impl for caching filters

2011-07-06 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060619#comment-13060619
 ] 

Yonik Seeley commented on LUCENE-3280:
--

I think FastBitSet should still have 
{code}
  /** Expert: returns the long[] storing the bits */
  public long[] getBits() { return bits; }
{code}

The whole reason I had to create OpenBitSet in the first place was that you 
couldn't do anything custom fast (on a word-for-word basis) because the bits 
were locked away from you.
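
For example, direct access to the backing words allows word-at-a-time
operations like this (illustrative helper, not part of the patch):

{code:java}
// Count the docs two cached filters have in common, 64 docs per iteration.
static long intersectionCount(long[] a, long[] b) {
  long count = 0;
  final int len = Math.min(a.length, b.length);
  for (int i = 0; i < len; i++) {
    count += Long.bitCount(a[i] & b[i]);
  }
  return count;
}
{code}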

 Add new bit set impl for caching filters
 

 Key: LUCENE-3280
 URL: https://issues.apache.org/jira/browse/LUCENE-3280
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3280.patch


 I think OpenBitSet is trying to satisfy too many audiences, and it's
 confusing/error-prone as a result.  It has int/long variants of many
 methods.  Some methods require in-bound access, others don't; of those
 others, some methods auto-grow the bits, some don't.  OpenBitSet
 doesn't always know its numBits.
 I'd like to factor out a more focused bit set impl whose primary
 target usage is a cached Lucene Filter, ie a bit set indexed by docID
 (int, not long) whose size is known and fixed up front (backed by
 final long[]) and is always accessed in-bounds.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-07-06 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-2793:


Attachment: LUCENE-2793.patch

I took Varun's patch and cleaned a couple of things up. I think this is ready; 
if nobody objects I will go ahead and commit this to the branch, merge up with 
trunk and upload a new patch to integrate this into trunk.

Once this is on trunk we can follow up with native stuff etc.

Thoughts?

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793-nrt.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3280) Add new bit set impl for caching filters

2011-07-06 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060668#comment-13060668
 ] 

Michael McCandless commented on LUCENE-3280:


OK I'll add getBits().

 Add new bit set impl for caching filters
 

 Key: LUCENE-3280
 URL: https://issues.apache.org/jira/browse/LUCENE-3280
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3280.patch


 I think OpenBitSet is trying to satisfy too many audiences, and it's
 confusing/error-prone as a result.  It has int/long variants of many
 methods.  Some methods require in-bound access, others don't; of those
 others, some methods auto-grow the bits, some don't.  OpenBitSet
 doesn't always know its numBits.
 I'd like to factor out a more focused bit set impl whose primary
 target usage is a cached Lucene Filter, ie a bit set indexed by docID
 (int, not long) whose size is known and fixed up front (backed by
 final long[]) and is always accessed in-bounds.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-07-06 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-2793:


Attachment: LUCENE-2793.patch

s/4069/4096

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793-nrt.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3220) Implement various ranking models as Similarities

2011-07-06 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060680#comment-13060680
 ] 

Robert Muir commented on LUCENE-3220:
-

Hi David: I had some ideas on stats to simplify some of these sims:
# I think we can use an easier way to compute average document length: 
sumTotalTermFreq() / maxDoc(). This way the average is 'exact' and not skewed 
by index-time boosts, smallfloat quantization, or anything like that.
# To support pivoted unique normalization like lnu.ltc, I think we can solve 
this in a similar way: add sumDocFreq(), which is just a single long, and 
divide it by maxDoc. This gives us the avg # of unique terms per doc. I think 
Terrier might have a similar stat (#postings or #pointers or something)?

So I think this could make for nice simplifications, especially for switching 
norms completely over to docvalues: we should be able to do #1 immediately 
right now and change the way we compute avgdoclen for e.g. BM25 and DFR.

Then, in a separate issue, I could revert this norm summation stuff to make the 
docvalues integration simpler, and open a new issue for sumDocFreq().
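
In code, the two averages are then just (using the stats named above; treat as a sketch):

{code:java}
// Average field length, exact because it comes from raw term-frequency sums.
double avgDocLength = sumTotalTermFreq / (double) maxDoc;

// Average number of unique terms per doc, for pivoted unique normalization (lnu.ltc).
double avgUniqueTerms = sumDocFreq / (double) maxDoc;
{code}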


 Implement various ranking models as Similarities
 

 Key: LUCENE-3220
 URL: https://issues.apache.org/jira/browse/LUCENE-3220
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/search
Affects Versions: flexscoring branch
Reporter: David Mark Nemeskey
Assignee: David Mark Nemeskey
  Labels: gsoc
 Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
 LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
 LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch

   Original Estimate: 336h
  Remaining Estimate: 336h

 With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
 can finally work on implementing the standard ranking models. Currently DFR, 
 BM25 and LM are on the menu.
 TODO:
  * {{EasyStats}}: contains all statistics that might be relevant for a 
 ranking algorithm
  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
 DocScorers and as much implementation detail as possible
  * _BM25_: the current mock implementation might be OK
  * _LM_
  * _DFR_
 Done:

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked

2011-07-06 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060682#comment-13060682
 ] 

Hoss Man commented on SOLR-2399:


bq. I also repackaged it... the only thing you really need to change is in 
index.jsp

Hmmm... is that really necessary?

SolrDispatchFilter already has the notion of a "path-prefix" setting that can 
be specified in the web.xml and defaults to null. It uses that when proxying to 
build up the correct URLs for things like the per-core admin pages and whatnot 
anytime it proxies a request to the JSPs.

couldn't we just make SolrDispatchFilter add the pathPrefix to the 
HttpServletRequest as an attribute, and then no one would ever need to modify 
the index.jsp ... it could just derive all the paths from 
request.getContextPath() and request.getAttribute("solr-path-prefix").

right?
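
Something like the following, roughly (the attribute name is only a suggestion):

{code:java}
// In SolrDispatchFilter.doFilter(), before handing the request to the JSPs:
if (pathPrefix != null) {
  request.setAttribute("solr-path-prefix", pathPrefix);
}

// Then index.jsp could build its base path without being edited per deployment:
// String prefix = (String) request.getAttribute("solr-path-prefix");
// String base = request.getContextPath() + (prefix == null ? "" : prefix);
{code}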


 Solr Admin Interface, reworked
 --

 Key: SOLR-2399
 URL: https://issues.apache.org/jira/browse/SOLR-2399
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Reporter: Stefan Matheis (steffkes)
Assignee: Ryan McKinley
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, 
 SOLR-2399-110606.patch, SOLR-2399-110622.patch, SOLR-2399-110702.patch, 
 SOLR-2399-admin-interface.patch, SOLR-2399-analysis-stopwords.patch, 
 SOLR-2399-fluid-width.patch, SOLR-2399-sorting-fields.patch, 
 SOLR-2399-wip-notice.patch, SOLR-2399.patch


 *The idea was to create a new, fresh (and hopefully clean) Solr Admin 
 Interface.* [Based on this 
 [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]]
 *Features:*
 * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png]
 * [Query-Form|http://files.mathe.is/solr-admin/02_query.png]
 * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png]
 * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, 
 SOLR-2400)
 * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png]
 * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482)
 * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png]
 * [Replication|http://files.mathe.is/solr-admin/10_replication.png]
 * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png]
 * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459)
 ** Stub (using static data)
 Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI
 I've quickly created a Github-Repository (Just for me, to keep track of the 
 changes)
 » https://github.com/steffkes/solr-admin

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3233) HuperDuperSynonymsFilter™

2011-07-06 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060686#comment-13060686
 ] 

Robert Muir commented on LUCENE-3233:
-

I ran some quick numbers, using the syn file example here, just best of 3 runs:

||Impl||Build time||RAM usage||
|SynonymFilterFactory|6619 ms|207.92 MB|
|FSTSynonymFilterFactory|463 ms|3.51 MB|

I modified the builder slightly to build the FST more efficiently for this, 
will upload the updated patch.

So i think the build-time and RAM consumption are really improved, the next 
thing is to benchmark the runtime perf.

 HuperDuperSynonymsFilter™
 -

 Key: LUCENE-3233
 URL: https://issues.apache.org/jira/browse/LUCENE-3233
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
 Attachments: LUCENE-3223.patch, LUCENE-3233.patch, LUCENE-3233.patch, 
 LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, 
 LUCENE-3233.patch, synonyms.zip


 The current synonymsfilter uses a lot of ram and cpu, especially at build 
 time.
 I think yesterday I heard about huge synonyms files three times.
 So, I think we should use an FST-based structure, sharing the inputs and 
 outputs.
 And we should be more efficient with the tokenStream api, e.g. using 
 save/restoreState instead of cloneAttributes()

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3281) OpenBitSet should report the configured capacity/size

2011-07-06 Thread Robert Ragno (JIRA)
OpenBitSet should report the configured capacity/size
-

 Key: LUCENE-3281
 URL: https://issues.apache.org/jira/browse/LUCENE-3281
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/other
Affects Versions: 3.2, 3.1, 3.0.3, 3.0.2, 3.0.1, 3.0
Reporter: Robert Ragno
Priority: Minor


OpenBitSet rounds up the capacity() to the next multiple of 64 from what was 
specified. However, this is particularly damaging with the new asserts, which 
trigger when anything above the specified capacity is used as an index. The 
simple fix is to return numBits for capacity().

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3233) HuperDuperSynonymsFilter™

2011-07-06 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3233:


Attachment: LUCENE-3233.patch

 HuperDuperSynonymsFilter™
 -

 Key: LUCENE-3233
 URL: https://issues.apache.org/jira/browse/LUCENE-3233
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
 Attachments: LUCENE-3223.patch, LUCENE-3233.patch, LUCENE-3233.patch, 
 LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, 
 LUCENE-3233.patch, LUCENE-3233.patch, synonyms.zip


 The current synonymsfilter uses a lot of ram and cpu, especially at build 
 time.
 I think yesterday I heard about huge synonyms files three times.
 So, I think we should use an FST-based structure, sharing the inputs and 
 outputs.
 And we should be more efficient with the tokenStream api, e.g. using 
 save/restoreState instead of cloneAttributes()

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2535) REGRESSION: in Solr 3.x and trunk the admin/file handler fails to show directory listings

2011-07-06 Thread Peter Wolanin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Wolanin updated SOLR-2535:


Summary: REGRESSION: in Solr 3.x and trunk the admin/file handler fails to 
show directory listings  (was: In Solr 3.2 and trunk the admin/file handler 
fails to show directory listings)

 REGRESSION: in Solr 3.x and trunk the admin/file handler fails to show 
 directory listings
 -

 Key: SOLR-2535
 URL: https://issues.apache.org/jira/browse/SOLR-2535
 Project: Solr
  Issue Type: Bug
  Components: SearchComponents - other
Affects Versions: 3.1, 3.2, 4.0
 Environment: java 1.6, jetty
Reporter: Peter Wolanin
 Fix For: 3.4, 4.0

 Attachments: SOLR-2535.patch, 
 SOLR-2535_fix_admin_file_handler_for_directory_listings.patch


 In Solr 1.4.1, going to the path solr/admin/file I see an XML-formatted 
 listing of the conf directory, like:
 {noformat}
 <response>
 <lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
 <lst name="files">
   <lst name="elevate.xml"><long name="size">1274</long><date name="modified">2011-03-06T20:42:54Z</date></lst>
   ...
 </lst>
 </response>
 {noformat}
 I can list the xslt sub-dir using solr/admin/files?file=/xslt
 In Solr 3.1.0, both of these fail with a 500 error:
 {noformat}
 HTTP ERROR 500
 Problem accessing /solr/admin/file/. Reason:
 did not find a CONTENT object
 java.io.IOException: did not find a CONTENT object
 {noformat}
 Looking at the code in class ShowFileRequestHandler, it seems like 3.1.0 
 should still handle directory listings if no file name is given, or if the 
 file is a directory, so I am filing this as a bug.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3233) HuperDuperSynonymsFilter™

2011-07-06 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060698#comment-13060698
 ] 

David Smiley commented on LUCENE-3233:
--

Wow that's striking; nice work guys.  FSTs are definitely one of those killer 
pieces of technology in Lucene.

The difference in build time is surprising to me.  Any theory why 
SynonymFilterFactory takes so much more time to build?

 HuperDuperSynonymsFilter™
 -

 Key: LUCENE-3233
 URL: https://issues.apache.org/jira/browse/LUCENE-3233
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
 Attachments: LUCENE-3223.patch, LUCENE-3233.patch, LUCENE-3233.patch, 
 LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, 
 LUCENE-3233.patch, LUCENE-3233.patch, synonyms.zip


 The current synonymsfilter uses a lot of ram and cpu, especially at build 
 time.
 I think yesterday I heard about huge synonyms files three times.
 So, I think we should use an FST-based structure, sharing the inputs and 
 outputs.
 And we should be more efficient with the tokenStream api, e.g. using 
 save/restoreState instead of cloneAttributes()

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3233) HuperDuperSynonymsFilter™

2011-07-06 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060705#comment-13060705
 ] 

Robert Muir commented on LUCENE-3233:
-

{quote}
The difference in build time is surprising to me. Any theory why 
SynonymFilterFactory takes so much more time to build?
{quote}

Yes, it's the n^2 portion where you have a synonym entry like this: a, b, c, d
In reality this is creating entries like this:
a -> a
a -> b
a -> c
a -> d
b -> a
b -> b
...

In the current impl, this is done using some inefficient datastructures (like 
nested CharArrayMaps with Token), as well as calling merge().

In the FST impl, we don't use any nested structures (instead input and output 
entries are just phrases), and we explicitly deduplicate both inputs and 
outputs during construction; the FST output is just a List<Integer> basically 
pointing to ords in the deduplicated BytesRefHash.

So during construction, when you add() it's just a hashmap lookup on the input 
phrase, a BytesRefHash get/put on the UTF16toUTF8WithHash to get the output 
ord, and an append to an ArrayList.

This code isn't really optimized right now and we can definitely speed it up 
even more in the future. But the main thing right now is to ensure the filter 
performance is good.
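
A stripped-down sketch of that dedup-on-add path, with plain collections standing in for the BytesRefHash (names are illustrative):

{code:java}
Map<String,Integer> outputOrds = new HashMap<String,Integer>();
List<List<Integer>> outputsByInput = new ArrayList<List<Integer>>();

// Return a stable ordinal for an output phrase, reusing it for duplicates.
int ordFor(String outputPhrase) {
  Integer ord = outputOrds.get(outputPhrase);
  if (ord == null) {
    ord = outputOrds.size();
    outputOrds.put(outputPhrase, ord);
  }
  return ord;
}

// Adding a mapping is then just an append of the output ord for that input.
void add(int inputId, String outputPhrase) {
  while (outputsByInput.size() <= inputId) {
    outputsByInput.add(new ArrayList<Integer>());
  }
  outputsByInput.get(inputId).add(ordFor(outputPhrase));
}
{code}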


 HuperDuperSynonymsFilter™
 -

 Key: LUCENE-3233
 URL: https://issues.apache.org/jira/browse/LUCENE-3233
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
 Attachments: LUCENE-3223.patch, LUCENE-3233.patch, LUCENE-3233.patch, 
 LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, 
 LUCENE-3233.patch, LUCENE-3233.patch, synonyms.zip


 The current synonymsfilter uses a lot of ram and cpu, especially at build 
 time.
 I think yesterday I heard about huge synonyms files three times.
 So, I think we should use an FST-based structure, sharing the inputs and 
 outputs.
 And we should be more efficient with the tokenStream api, e.g. using 
 save/restoreState instead of cloneAttributes()

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-1768) NumericRange support for new query parser

2011-07-06 Thread Adriano Crestani (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060707#comment-13060707
 ] 

Adriano Crestani commented on LUCENE-1768:
--

{quote}
For numeric fields, half open ranges are important, as it supports queries like 
price  2.00 Dollar. Wasn't there not also an issue open to support other 
syntax for numerics like  and  operators?
{quote}

Yes, there is, I just do not recall the JIRA number now. Maybe Vinicius could try 
to implement it as well to fill out his task list in case he finishes his tasks 
ahead of schedule, since it is also related to numeric queries. I am just not 
sure how complex the task would be; I know the big change for this is in 
the syntax parser, which will require knowing how to change the JavaCC files.

 NumericRange support for new query parser
 -

 Key: LUCENE-1768
 URL: https://issues.apache.org/jira/browse/LUCENE-1768
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/queryparser
Affects Versions: 2.9
Reporter: Uwe Schindler
Assignee: Adriano Crestani
  Labels: contrib, gsoc, gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: week1.patch, week2.patch, week3.patch, week4.patch, 
 week5-6.patch


 It would be good to specify some type of schema for the query parser in 
 future, to automatically create NumericRangeQuery for different numeric 
 types? It would then be possible to index a numeric value 
 (double,float,long,int) using NumericField and then the query parser knows, 
 which type of field this is and so it correctly creates a NumericRangeQuery 
 for strings like [1.567..*] or (1.787..19.5].
 There is currently no way to extract if a field is numeric from the index, so 
 the user will have to configure the FieldConfig objects in the ConfigHandler. 
 But if this is done, it will not be that difficult to implement the rest.
 The only difference between the current handling of RangeQuery is then the 
 instantiation of the correct Query type and conversion of the entered numeric 
 values (simple Number.valueOf(...) cast of the user entered numbers). 
 Everything else is identical; NumericRangeQuery also supports the MTQ 
 rewrite modes (as it is a MTQ).
 Another thing is a change in Date semantics. There are some strange flags in 
 the current parser that tells it how to handle dates.
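 For reference, a range like (1.787..19.5] on a double field would need the parser to build something like this (the field name is just an example):
 {code:java}
 Query q = NumericRangeQuery.newDoubleRange("price",
     1.787,   // lower bound
     19.5,    // upper bound
     false,   // lower bound exclusive, matching "("
     true);   // upper bound inclusive, matching "]"
 {code}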

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3281) OpenBitSet should report the configured capacity/size

2011-07-06 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060711#comment-13060711
 ] 

Yonik Seeley commented on LUCENE-3281:
--

See LUCENE-3280, it looks like Lucene will be switching to FastBitSet for most 
things?
OpenBitSet is meant to be expert level and not impose any additional overhead 
(like keeping track of the largest bit that has been set).
But yeah, the new asserts do make things a little odd w.r.t. capacity()... how 
about the following:
{code}
   /** Returns the current capacity in bits (1 greater than the index of the last bit) */
-  public long capacity() { return bits.length << 6; }
+  public long capacity() {
+    long cap = bits.length << 6;
+    assert cap >= numBits && numBits >= 0;
+    return cap;
+  }
{code}



 OpenBitSet should report the configured capacity/size
 -

 Key: LUCENE-3281
 URL: https://issues.apache.org/jira/browse/LUCENE-3281
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/other
Affects Versions: 3.0, 3.0.1, 3.0.2, 3.0.3, 3.1, 3.2
Reporter: Robert Ragno
Priority: Minor
   Original Estimate: 2m
  Remaining Estimate: 2m

 OpenBitSet rounds up the capacity() to the next multiple of 64 from what was 
 specified. However, this is particularly damaging with the new asserts, which 
 trigger when anything above the specified capacity is used as an index. The 
 simple fix is to return numBits for capacity().

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[Lucene.Net] [jira] [Commented] (LUCENENET-431) Spatial.Net Cartesian won't find docs in radius in certain cases

2011-07-06 Thread Digy (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060709#comment-13060709
 ] 

Digy commented on LUCENENET-431:


Thanks Olle and Matt,

I committed the LUCENE-1930 patch to the 2.9.4g branch (+ added Olle's test 
case).

(Another divergence from lucene.java, since this patch is still waiting to be 
applied there.)

DIGY

 Spatial.Net Cartesian won't find docs in radius in certain cases
 

 Key: LUCENENET-431
 URL: https://issues.apache.org/jira/browse/LUCENENET-431
 Project: Lucene.Net
  Issue Type: Bug
  Components: Lucene.Net Contrib
Affects Versions: Lucene.Net 2.9.4
 Environment: Windows 7 x64
Reporter: Olle Jacobsen
  Labels: spatialsearch

 To replicate, change Lucene.Net.Contrib.Spatial.Test.TestCartesian to the 
 following, which should return 3 results.
 Line
 42: private double _lat = 55.6880508001;
 43: private double _lng = 13.5871808352; // This passes: 13.6271808352
 73: AddPoint(writer, "Within radius", 55.6880508001, 13.5717346673);
 74: AddPoint(writer, "Within radius", 55.6821978456, 13.6076183965);
 75: AddPoint(writer, "Within radius", 55.673251569, 13.5946697607);
 76: AddPoint(writer, "Close but not in radius", 55.8634157297, 13.5497731987);
 77: AddPoint(writer, "Faar away", 40.7137578228, -74.0126901936);
 130: const double miles = 5.0;
 156: Console.WriteLine("Distances should be 3 " + distances.Count);
 157: Console.WriteLine("Results should be 3 " + results);
 159: Assert.AreEqual(3, distances.Count); // fixed a store of only needed distances
 160: Assert.AreEqual(3, results);

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (LUCENE-3281) OpenBitSet should report the configured capacity/size

2011-07-06 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060714#comment-13060714
 ] 

Uwe Schindler commented on LUCENE-3281:
---

Also to return this number, size() is the right method (at least in trunk).

 OpenBitSet should report the configured capacity/size
 -

 Key: LUCENE-3281
 URL: https://issues.apache.org/jira/browse/LUCENE-3281
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/other
Affects Versions: 3.0, 3.0.1, 3.0.2, 3.0.3, 3.1, 3.2
Reporter: Robert Ragno
Priority: Minor
   Original Estimate: 2m
  Remaining Estimate: 2m

 OpenBitSet rounds up the capacity() to the next multiple of 64 from what was 
 specified. However, this is particularly damaging with the new asserts, which 
 trigger when anything above the specified capacity is used as an index. The 
 simple fix is to return numBits for capacity().

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[Lucene.Net] [jira] [Resolved] (LUCENENET-431) Spatial.Net Cartesian won't find docs in radius in certain cases

2011-07-06 Thread Digy (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENENET-431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Digy resolved LUCENENET-431.


   Resolution: Fixed
Fix Version/s: Lucene.Net 2.9.4g
 Assignee: Digy

 Spatial.Net Cartesian won't find docs in radius in certain cases
 

 Key: LUCENENET-431
 URL: https://issues.apache.org/jira/browse/LUCENENET-431
 Project: Lucene.Net
  Issue Type: Bug
  Components: Lucene.Net Contrib
Affects Versions: Lucene.Net 2.9.4
 Environment: Windows 7 x64
Reporter: Olle Jacobsen
Assignee: Digy
  Labels: spatialsearch
 Fix For: Lucene.Net 2.9.4g


 To replicate, change Lucene.Net.Contrib.Spatial.Test.TestCartesian to the 
 following, which should return 3 results.
 Line
 42: private double _lat = 55.6880508001;
 43: private double _lng = 13.5871808352; // This passes: 13.6271808352
 73: AddPoint(writer, "Within radius", 55.6880508001, 13.5717346673);
 74: AddPoint(writer, "Within radius", 55.6821978456, 13.6076183965);
 75: AddPoint(writer, "Within radius", 55.673251569, 13.5946697607);
 76: AddPoint(writer, "Close but not in radius", 55.8634157297, 13.5497731987);
 77: AddPoint(writer, "Faar away", 40.7137578228, -74.0126901936);
 130: const double miles = 5.0;
 156: Console.WriteLine("Distances should be 3 " + distances.Count);
 157: Console.WriteLine("Results should be 3 " + results);
 159: Assert.AreEqual(3, distances.Count); // fixed a store of only needed distances
 160: Assert.AreEqual(3, results);

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile

2011-07-06 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060717#comment-13060717
 ] 

Shawn Heisey commented on SOLR-1972:


A little further info on rollingAvgRequestsPerSecond ... I have noticed that it 
is always different from AvgRequestsPerSecond, even when requests and 
rollingRequests are the same.  I would expect different numbers when requests 
and rollingRequests diverge, but not when they are the same.

I did take a look at the code, but have to admit that I haven't wrapped my 
brain around it enough to figure out what the problem might be.


 Need additional query stats in admin interface - median, 95th and 99th 
 percentile
 -

 Key: SOLR-1972
 URL: https://issues.apache.org/jira/browse/SOLR-1972
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.4
Reporter: Shawn Heisey
Priority: Minor
 Attachments: SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch


 I would like to see more detailed query statistics from the admin GUI.  This 
 is what you can get now:
 requests : 809
 errors : 0
 timeouts : 0
 totalTime : 70053
 avgTimePerRequest : 86.59209
 avgRequestsPerSecond : 0.8148785 
 I'd like to see more data on the time per request - median, 95th percentile, 
 99th percentile, and any other statistical function that makes sense to 
 include.  In my environment, the first bunch of queries after startup tend to 
 take several seconds each.  I find that the average value tends to be useless 
 until it has several thousand queries under its belt and the caches are 
 thoroughly warmed.  The statistical functions I have mentioned would quickly 
 eliminate the influence of those initial slow queries.
 The system will have to store individual data about each query.  I don't know 
 if this is something Solr does already.  It would be nice to have a 
 configurable count of how many of the most recent data points are kept, to 
 control the amount of memory the feature uses.  The default value could be 
 something like 1024 or 4096.
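
 One possible shape for this, as a sketch (class and method names here are made up):
 {code:java}
 import java.util.Arrays;

 public class RecentRequestTimes {
   private final long[] times;   // ring buffer of the most recent request times, in ms
   private int count, next;

   public RecentRequestTimes(int size) {   // e.g. 1024 or 4096, configurable
     times = new long[size];
   }

   public synchronized void record(long elapsedMs) {
     times[next] = elapsedMs;
     next = (next + 1) % times.length;
     if (count < times.length) count++;
   }

   /** pct in (0, 100], e.g. 50 for median, 95, 99 */
   public synchronized long percentile(double pct) {
     if (count == 0) return 0;
     long[] copy = Arrays.copyOf(times, count);
     Arrays.sort(copy);
     int idx = (int) Math.ceil(pct / 100.0 * count) - 1;
     return copy[Math.max(idx, 0)];
   }
 }
 {code}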

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3281) OpenBitSet should report the configured capacity/size

2011-07-06 Thread Robert Ragno (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060722#comment-13060722
 ] 

Robert Ragno commented on LUCENE-3281:
--

Well, size() and capacity() are currently the same. But all that is needed is 
actually:

   /** Returns the current capacity in bits (1 greater than the index of the last bit) */
-  public long capacity() { return bits.length << 6; }
+  public long capacity() { return numBits; }

That will have the same effect. You throw away the first value for cap in the 
above, after all. Checking for numBits to be non-negative should be done in the 
constructor, if added, and maybe with a documented exception instead of an 
assert.

 OpenBitSet should report the configured capacity/size
 -

 Key: LUCENE-3281
 URL: https://issues.apache.org/jira/browse/LUCENE-3281
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/other
Affects Versions: 3.0, 3.0.1, 3.0.2, 3.0.3, 3.1, 3.2
Reporter: Robert Ragno
Priority: Minor
   Original Estimate: 2m
  Remaining Estimate: 2m

 OpenBitSet rounds up the capacity() to the next multiple of 64 from what was 
 specified. However, this is particularly damaging with the new asserts, which 
 trigger when anything above the specified capacity is used as an index. The 
 simple fix is to return numBits for capacity().

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3233) HuperDuperSynonymsFilter™

2011-07-06 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3233:


Attachment: LUCENE-3233.patch

Here is a patch with a little microbenchmark... so we have some tuning to do. 

The benchmark analyzes a short string a million times that doesn't match any 
synonyms (actually the Solr default):

||impl||ms||
|SynonymsFilter|1692|
|FST with array arcs|2794|
|FST with no array arcs|8823|

So, disabling the array arcs is a pretty big hit here. But we could try 
other options to speed up this common case, e.g. with Daciuk we could build a 
CharRunAutomaton of the K-prefixes of the synonyms; this would be really fast 
at rejecting terms that don't match any syns.

Or we could explicitly put our BytesRef output in a byte[], and use long 
pointers as outputs.

Or we could speed up FST! But I think it's interesting to see how important this 
parameter is.

 HuperDuperSynonymsFilter™
 -

 Key: LUCENE-3233
 URL: https://issues.apache.org/jira/browse/LUCENE-3233
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
 Attachments: LUCENE-3223.patch, LUCENE-3233.patch, LUCENE-3233.patch, 
 LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, 
 LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, synonyms.zip


 The current synonymsfilter uses a lot of ram and cpu, especially at build 
 time.
 I think yesterday I heard about huge synonyms files three times.
 So, I think we should use an FST-based structure, sharing the inputs and 
 outputs.
 And we should be more efficient with the tokenStream api, e.g. using 
 save/restoreState instead of cloneAttributes()

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3281) OpenBitSet should report the configured capacity/size

2011-07-06 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060757#comment-13060757
 ] 

Michael McCandless commented on LUCENE-3281:


I think the challenge here is that numBits is currently not maintained unless 
assertions are enabled (eg, see expandingWordNum), so we can't just always 
return numBits from capacity()...

Maybe we should just always maintain numBits (ie, even when asserts are off)?  
Then capacity() could return numBits.

 OpenBitSet should report the configured capacity/size
 -

 Key: LUCENE-3281
 URL: https://issues.apache.org/jira/browse/LUCENE-3281
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/other
Affects Versions: 3.0, 3.0.1, 3.0.2, 3.0.3, 3.1, 3.2
Reporter: Robert Ragno
Priority: Minor
   Original Estimate: 2m
  Remaining Estimate: 2m

 OpenBitSet rounds up the capacity() to the next multiple of 64 from what was 
 specified. However, this is particularly damaging with the new asserts, which 
 trigger when anything above the specified capacity is used as an index. The 
 simple fix is to return numBits for capacity().

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: revisit naming for grouping/join?

2011-07-06 Thread Chris Hostetter

: Also... I think we are over-thinking the name ;)  We can't convey
: *everything* in this name; as long as the name makes it clear that
: you'll want to consider this / read its javadocs whenever doing
: something with nested docs, I think that's sufficient.  I think
: NestedQueryWrapper (maybe NestedDocsQuery) and NestedDocsCollector are
: good enough, at least better than the functional-driven names they now
: have...

Yeah, that's fair ... I'm not in love with NestedDocsQuery and 
NestedDocsCollector, but I agree they are better than what we have now.

: Honestly at this point I'm tempted to just stick with what we have
: (the functionally driven names, instead of the dominant use case
: driven name).
: 
: At its heart, this query is performing a join (well, finishing the
: join that was done during indexing), and despite our efforts to more
: descriptively capture the dominant use case, I don't think we're
: succeeding.  We are basically struggling to find ways to explain what
: a join does, into these class names.

I really think it's a bad idea to use "Join" in the name ... I understand 
that to you this is a join, but as you say it's really just finishing a 
join that was already done at index time -- for most users "join" is 
going to have the connotation of a SQL join, where you don't have to 
normalize the data in advance (i.e.: build the index with all the docs you 
want to join in a block), and we shouldn't use it unless we are talking 
about a truly generic query-time join -- particularly if we are going to 
use examples in the doc that seem like the kind of thing you would do with 
a query-time join in SQL.

I know you feel like "nested" (or "subdocs" or "parent") undersells the 
*possible* use cases of this feature, but the thing to remember is that 
even in the use cases where the real-life data isn't something you might 
think of as being organized in a "nested" or "hierarchical" model, in 
order to use this feature the user must map their source data model to a 
Lucene Document model that *does* capture a hierarchy relationship so they 
can index their data in the appropriate way.  X and Y may not be in a 
hierarchy, but if you want to join them like this, then the Document for X 
and the Document for Y must be thought of as being in a hierarchy and 
indexed in lock step with each other.

"Block" just doesn't feel like it really conveys this ... but anything 
along the "Nested", "Parent", "Subdoc" line of terminology would at least 
give some point of reference to the idea that the *Document* model in 
Lucene needs to be organized in this way -- and I think it's really 
important that the name make that clear.

-Hoss

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3233) HuperDuperSynonymsFilter™

2011-07-06 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060764#comment-13060764
 ] 

Michael McCandless commented on LUCENE-3233:


Wow, it's very important to allow arcs to be encoded as arrays (for the binary 
search on lookup).  I think we should just fix FST... I'll think about it.  
MemoryCodec would also get big gains here.

 HuperDuperSynonymsFilter™
 -

 Key: LUCENE-3233
 URL: https://issues.apache.org/jira/browse/LUCENE-3233
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
 Attachments: LUCENE-3223.patch, LUCENE-3233.patch, LUCENE-3233.patch, 
 LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, 
 LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, synonyms.zip


 The current synonymsfilter uses a lot of ram and cpu, especially at build 
 time.
 I think yesterday I heard about huge synonyms files three times.
 So, I think we should use an FST-based structure, sharing the inputs and 
 outputs.
 And we should be more efficient with the tokenStream api, e.g. using 
 save/restoreState instead of cloneAttributes()

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: revisit naming for grouping/join?

2011-07-06 Thread Steven A Rowe
From my external POV on this debate, it seems as though the main point of 
contention is naming the nature of the relation between documents.

Instead of doing that, a name that says there is some form of relation, but 
leaves its nature open, might work: something like "docrelation"?  
(Avoiding the "related documents" IR concept would be important here.)

Steve

 -Original Message-
 From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
 Sent: Wednesday, July 06, 2011 2:59 PM
 To: dev@lucene.apache.org
 Subject: Re: revisit naming for grouping/join?
 
 
 : Also... I think we are over-thinking the name ;)  We can't convey
 : *everything* in this name; as long as the name makes it clear that
 : you'll want to consider this / read its javadocs whenever doing
 : something with nested docs, I think that's sufficient.  I think
 : NestedQueryWrapper (maybe NestedDocsQuery) and NestedDocsCollector are
 : good enough, at least better than the functional-driven names they now
 : have...
 
 Yeah, that's fair ... I'm not in love with NestedDocsQuery and
 NestedDocsCollector, but I agree they are better than what we have now.
 
 : Honestly at this point I'm tempted to just stick with what we have
 : (the functionally driven names, instead of the dominant use case
 : driven name).
 :
 : At its heart, this query is performing a join (well, finishing the
 : join that was done during indexing), and despite our efforts to more
 : descriptively capture the dominant use case, I don't think we're
 : succeeding.  We are basically struggling to find ways to explain what
 : a join does, into these class names.
 
 I really think it's a bad idea to use "Join" in the name ... I understand
 that to you this is a join, but as you say it's really just finishing a
 join that was already done at index time -- for most users "join" is
 going to have the connotation of a SQL join, where you don't have to
 normalize the data in advance (i.e.: build the index with all the docs you
 want to join in a block), and we shouldn't use it unless we are talking
 about a truly generic query-time join -- particularly if we are going to
 use examples in the doc that seem like the kind of thing you would do
 with
 a query-time join in SQL.
 
 I know you feel like "nested" (or "subdocs" or "parent") undersells the
 *possible* use cases of this feature, but the thing to remember is that
 even in the use cases where the real-life data isn't something you might
 think of as being organized in a "nested" or "hierarchical" model, in
 order to use this feature the user must map their source data model to a
 Lucene Document model that *does* capture a hierarchy relationship so
 they
 can index their data in the appropriate way.  X and Y may not be in a
 hierarchy, but if you want to join them like this, then the Document for
 X
 and the Document for Y must be thought of as being in a hierarchy and
 indexed in lock step with each other.
 
 "Block" just doesn't feel like it really conveys this ... but anything
 along the "Nested", "Parent", "Subdoc" line of terminology would at
 least
 give some point of reference to the idea that the *Document* model in
 Lucene needs to be organized in this way -- and I think it's really
 important that the name make that clear.
 
 -Hoss
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3233) HuperDuperSynonymsFilter™

2011-07-06 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060781#comment-13060781
 ] 

Robert Muir commented on LUCENE-3233:
-

I agree, this would be the best solution. Maybe we should just open a separate 
issue for that?

We can let this one be for now until that is resolved; we can even continue 
working on other parts of it.

 HuperDuperSynonymsFilter™
 -

 Key: LUCENE-3233
 URL: https://issues.apache.org/jira/browse/LUCENE-3233
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
 Attachments: LUCENE-3223.patch, LUCENE-3233.patch, LUCENE-3233.patch, 
 LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, 
 LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, synonyms.zip


 The current synonymsfilter uses a lot of ram and cpu, especially at build 
 time.
 I think yesterday I heard about huge synonyms files three times.
 So, I think we should use an FST-based structure, sharing the inputs and 
 outputs.
 And we should be more efficient with the tokenStream api, e.g. using 
 save/restoreState instead of cloneAttributes()

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3281) OpenBitSet should report the configured capacity/size

2011-07-06 Thread Robert Ragno (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060786#comment-13060786
 ] 

Robert Ragno commented on LUCENE-3281:
--

Ah, good point. It seems cleaner to maintain it (which is straightforward). The 
other sensible alternative would be to make the asserts all refer to the 
up-rounded capacity. However, it seems reasonable and consistent to have an OBS 
report the capacity it was constructed with.

I suppose there is room to split capacity() and size(), but that might confuse 
existing uses.

Incidentally, if it were open to behavioral changes, I would find it more 
convenient if the asserts were replaced with the assumption that the vector is 
infinite and filled with zeros. That seems more consistent with the set 
operations, the union operation, and so on. (And it is not as if anyone can 
properly be relying on the current asserts for control flow.)

 OpenBitSet should report the configured capacity/size
 -

 Key: LUCENE-3281
 URL: https://issues.apache.org/jira/browse/LUCENE-3281
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/other
Affects Versions: 3.0, 3.0.1, 3.0.2, 3.0.3, 3.1, 3.2
Reporter: Robert Ragno
Priority: Minor
   Original Estimate: 2m
  Remaining Estimate: 2m

 OpenBitSet rounds up the capacity() to the next multiple of 64 from what was 
 specified. However, this is particularly damaging with the new asserts, which 
 trigger when anything above the specified capacity is used as an index. The 
 simple fix is to return numBits for capacity().

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-2535) REGRESSION: in Solr 3.x and trunk the admin/file handler fails to show directory listings

2011-07-06 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson reassigned SOLR-2535:


Assignee: Erick Erickson

 REGRESSION: in Solr 3.x and trunk the admin/file handler fails to show 
 directory listings
 -

 Key: SOLR-2535
 URL: https://issues.apache.org/jira/browse/SOLR-2535
 Project: Solr
  Issue Type: Bug
  Components: SearchComponents - other
Affects Versions: 3.1, 3.2, 4.0
 Environment: java 1.6, jetty
Reporter: Peter Wolanin
Assignee: Erick Erickson
 Fix For: 3.4, 4.0

 Attachments: SOLR-2535.patch, 
 SOLR-2535_fix_admin_file_handler_for_directory_listings.patch


 In Solr 1.4.1, going to the path solr/admin/file I see an XML-formatted 
 listing of the conf directory, like:
 {noformat}
 <response>
 <lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
 <lst name="files">
   <lst name="elevate.xml"><long name="size">1274</long><date name="modified">2011-03-06T20:42:54Z</date></lst>
   ...
 </lst>
 </response>
 {noformat}
 I can list the xslt sub-dir using solr/admin/files?file=/xslt
 In Solr 3.1.0, both of these fail with a 500 error:
 {noformat}
 HTTP ERROR 500
 Problem accessing /solr/admin/file/. Reason:
 did not find a CONTENT object
 java.io.IOException: did not find a CONTENT object
 {noformat}
 Looking at the code in class ShowFileRequestHandler, it seems like 3.1.0 
 should still handle directory listings if no file name is given, or if the 
 file is a directory, so I am filing this as a bug.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-1280) Fields used update processor

2011-07-06 Thread Erik Hatcher (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hatcher updated SOLR-1280:
---

Attachment: FieldsUsedUpdateProcessorFactory.java

Updated version that allows configuration of the "fields used" field name and a 
field-name regex for matching.

 Fields used update processor
 

 Key: SOLR-1280
 URL: https://issues.apache.org/jira/browse/SOLR-1280
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Erik Hatcher
Priority: Trivial
 Attachments: FieldsUsedUpdateProcessorFactory.java, 
 FieldsUsedUpdateProcessorFactory.java


 When dealing with highly heterogeneous documents with different fields per 
 document, it can be very useful to know what fields are present on the result 
 documents from a search.  For example, this could be used to determine which 
 fields make the best facets for a given query.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1280) Fields used update processor

2011-07-06 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060826#comment-13060826
 ] 

Erik Hatcher commented on SOLR-1280:


In this update the config can be something like this:

{code}
<updateRequestProcessorChain name="fields_used" default="true">
  <processor class="solr.processor.FieldsUsedUpdateProcessorFactory">
    <str name="fieldsUsedFieldName">attribute_fields</str>
    <str name="fieldNameRegex">.*_attribute</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
{code}

Regex was chosen to allow flexibility in matching field names for inclusion, 
but perhaps a better (more easily comprehended and configured) approach would 
be a comma-separated list of field names that could contain a * for globbing; 
that should be about all the flexibility needed here.
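
Purely as an illustration of that alternative (the class and method names below 
are hypothetical, not part of the attached factory), matching a field name 
against a comma-separated glob list might look like:

{code}
import java.util.regex.Pattern;

class GlobFieldMatcher {
  /** True if fieldName matches any comma-separated glob, where '*' matches any run of characters. */
  static boolean matchesGlobs(String globList, String fieldName) {
    for (String glob : globList.split(",")) {
      // Quote the whole glob, then re-open the quoting around each '*' so it becomes '.*'
      String regex = Pattern.quote(glob.trim()).replace("*", "\\E.*\\Q");
      if (fieldName.matches(regex)) {
        return true;
      }
    }
    return false;
  }
}
{code}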

 Fields used update processor
 

 Key: SOLR-1280
 URL: https://issues.apache.org/jira/browse/SOLR-1280
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Erik Hatcher
Priority: Trivial
 Attachments: FieldsUsedUpdateProcessorFactory.java, 
 FieldsUsedUpdateProcessorFactory.java


 When dealing with highly heterogeneous documents with different fields per 
 document, it can be very useful to know what fields are present on the result 
 documents from a search.  For example, this could be used to determine which 
 fields make the best facets for a given query.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3179) OpenBitSet.prevSetBit()

2011-07-06 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060829#comment-13060829
 ] 

Uwe Schindler commented on LUCENE-3179:
---

Committed long versions and additional tests: rev 1143558 (trunk), rev 1143560 
(3.x).

I did not commit the cutover to Long.numberOfLeadingZeros, because it was not 
performance tested. Also, for this use case, on machines without intrinsics the 
JDK-provided methods are slower (see the comments in BitUtil.ntz): in most 
cases the bits are shifted away (in nextSetBit), so the faster approach is to 
invert the algorithm when calculating ntz.
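
For reference, a sketch of how Long.numberOfLeadingZeros fits into a 
prevSetBit-style backwards scan; this is an illustration only (assuming the 
usual 64-bits-per-long word layout), not the committed LUCENE-3179 code:

{code}
// Illustration only: scan backwards from 'index' through 64-bit words, using
// Long.numberOfLeadingZeros to locate the highest set bit in the first
// non-zero word.
static int prevSetBit(long[] words, int index) {
  int i = index >> 6;                                      // word holding 'index'
  long word = words[i] & (-1L >>> (63 - (index & 0x3f)));  // keep only bits <= index
  while (true) {
    if (word != 0) {
      return (i << 6) + 63 - Long.numberOfLeadingZeros(word);
    }
    if (i == 0) {
      return -1;                                           // no set bit at or before index
    }
    word = words[--i];
  }
}
{code}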

 OpenBitSet.prevSetBit()
 ---

 Key: LUCENE-3179
 URL: https://issues.apache.org/jira/browse/LUCENE-3179
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Paul Elschot
Assignee: Paul Elschot
Priority: Minor
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3179-fix.patch, LUCENE-3179-fix.patch, 
 LUCENE-3179-long-ntz.patch, LUCENE-3179-long-ntz.patch, LUCENE-3179.patch, 
 LUCENE-3179.patch, LUCENE-3179.patch, TestBitUtil.java, TestOpenBitSet.patch


 Find a previous set bit in an OpenBitSet.
 Useful for parent testing in nested document query execution LUCENE-2454 .

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-2308) Separately specify a field's type

2011-07-06 Thread Nikola Tankovic (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikola Tankovic updated LUCENE-2308:


Attachment: LUCENE-2308-6.patch

Minor fixes and more tests cut over.

 Separately specify a field's type
 -

 Key: LUCENE-2308
 URL: https://issues.apache.org/jira/browse/LUCENE-2308
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: LUCENE-2308-2.patch, LUCENE-2308-3.patch, 
 LUCENE-2308-4.patch, LUCENE-2308-4.patch, LUCENE-2308-5.patch, 
 LUCENE-2308-6.patch, LUCENE-2308.patch, LUCENE-2308.patch


 This came up from discussions on IRC.  I'm summarizing here...
 Today when you make a Field to add to a document you can set things:
 indexed or not, stored or not, analyzed or not, details like omitTfAP,
 omitNorms, index term vectors (separately controlling
 offsets/positions), etc.
 I think we should factor these out into a new class (FieldType?).
 Then you could re-use this FieldType instance across multiple fields.
 The Field instance would still hold the actual value.
 We could then do per-field analyzers by adding a setAnalyzer on the
 FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise
 for per-field codecs (with flex), where we now have
 PerFieldCodecWrapper).
 This would NOT be a schema!  It's just refactoring what we already
 specify today.  EG it's not serialized into the index.
 This has been discussed before, and I know Michael Busch opened a more
 ambitious (I think?) issue.  I think this is a good first baby step.  We could
 consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold
 off on that for starters...

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3233) HuperDuperSynonymsFilter™

2011-07-06 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3233:
---

Attachment: LUCENE-3233.patch

New patch, including some optimizations to FST (which we can commit under a 
separate issue): array arcs can now be any size, and I re-use the BytesReader 
inner class that's created for parsing arcs.

 HuperDuperSynonymsFilter™
 -

 Key: LUCENE-3233
 URL: https://issues.apache.org/jira/browse/LUCENE-3233
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
 Attachments: LUCENE-3223.patch, LUCENE-3233.patch, LUCENE-3233.patch, 
 LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, 
 LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, 
 synonyms.zip


 The current synonymsfilter uses a lot of ram and cpu, especially at build 
 time.
 I think yesterday I heard about huge synonyms files three times.
 So, I think we should use an FST-based structure, sharing the inputs and 
 outputs.
 And we should be more efficient with the tokenStream api, e.g. using 
 save/restoreState instead of cloneAttributes()

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2637) Solrj support for Field Collapsing / Grouping query results parsing

2011-07-06 Thread Tao Cheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Cheng updated SOLR-2637:


Attachment: SOLR-2637.patch

1. Fixed 4-space tabs to 2 spaces.
2. Added doc comments.
3. Added extraction of ngroups when group.ngroups=true.

 Solrj support for Field Collapsing / Grouping query results parsing
 ---

 Key: SOLR-2637
 URL: https://issues.apache.org/jira/browse/SOLR-2637
 Project: Solr
  Issue Type: New Feature
  Components: clients - java
Affects Versions: 4.0
Reporter: Tao Cheng
Priority: Minor
  Labels: features
 Fix For: 4.0

 Attachments: SOLR-2637.patch, SOLR-2637.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 Patch ready for Field Collapsing query results parsing.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Getting patches (with tests!) committed

2011-07-06 Thread Erick Erickson
In the past I've had to ping the dev list with an "include patch XYZ, 
please" message.

But I've just assigned it to myself; I'll see if I can get it committed. 
I'm new enough at the process that I need the practice.

Best,
Erick

On Wed, Jul 6, 2011 at 1:51 PM, Smiley, David W. dsmi...@mitre.org wrote:
 How do committers recommend that patch contributors (like me) get their 
 patches committed?  At the moment I'm thinking of this one:
 https://issues.apache.org/jira/browse/SOLR-2535
 This is a regression bug. I found the bug, I added a patch which fixes the 
 bug and tested that it was fixed.  The tests are actually new tests that 
 tested code that wasn't tested before.  I put the fix version in JIRA as 
 3.3 at the time I did this, because it was ready to go.  Well 3.3 came and 
 went, and the version got bumped to 3.4.  There are no processes in place for 
 committers to recognize completed patches.  I think that's a problem.  It's 
 very discouraging, as the contributor.  I think prior to a release and 
 ideally at other occasions, issues assigned to the next release number should 
 actually be examined.    Granted there are ~250 of them on the Solr side: 
 https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=truejqlQuery=project+%3D+SOLR+AND+resolution+%3D+Unresolved+AND+fixVersion+%3D+12316683+ORDER+BY+priority+DESC
  And some initial triage could separate the wheat from the chaff.

 ~ David Smiley
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3233) HuperDuperSynonymsFilter™

2011-07-06 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060905#comment-13060905
 ] 

Robert Muir commented on LUCENE-3233:
-

bq. New patch, including some optimizing to FST (which we can commit under a 
separate issue)

Works! I don't think we need to open a new issue; I didn't think you would come 
back with a patch in just two hours!

I'll play with the patch some now and see what I can do with it.

 HuperDuperSynonymsFilter™
 -

 Key: LUCENE-3233
 URL: https://issues.apache.org/jira/browse/LUCENE-3233
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
 Attachments: LUCENE-3223.patch, LUCENE-3233.patch, LUCENE-3233.patch, 
 LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, 
 LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, 
 synonyms.zip


 The current synonymsfilter uses a lot of ram and cpu, especially at build 
 time.
 I think yesterday I heard about huge synonyms files three times.
 So, I think we should use an FST-based structure, sharing the inputs and 
 outputs.
 And we should be more efficient with the tokenStream api, e.g. using 
 save/restoreState instead of cloneAttributes()

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile

2011-07-06 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated SOLR-1972:
---

Attachment: SOLR-1972.patch

Hi Shawn,

I fixed the patch so that rolling statistics are now consistent with 
non-rolling statistics for the first requests. Average requests per second may 
sometimes be a little different, but ensuring that rolling and non-rolling 
statistics have exactly the same value would require more synchronization, 
which is not an option in my opinion.

Please let me know if you still get negative values for 
rollingAvgRequestsPerSecond with this patch.

I hope this patch is the right one!
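
Not from the patch, but to make the discussion concrete: the kind of structure 
the issue asks for (a configurable count of the most recent data points) is 
typically a bounded window of request times from which percentiles are computed 
on demand. All names below are hypothetical:

{code}
import java.util.Arrays;

class RecentRequestTimes {
  private final long[] times;   // ring buffer of the most recent request times (ms)
  private long seen;            // total number of requests recorded

  RecentRequestTimes(int windowSize) {
    times = new long[windowSize];
  }

  synchronized void record(long elapsedMs) {
    times[(int) (seen++ % times.length)] = elapsedMs;
  }

  /** p in (0, 100]; e.g. 95 for the 95th percentile of the current window. */
  synchronized long percentile(double p) {
    int n = (int) Math.min(seen, times.length);
    if (n == 0) {
      return 0;
    }
    long[] window = Arrays.copyOf(times, n);
    Arrays.sort(window);
    return window[(int) Math.ceil(p / 100.0 * n) - 1];
  }
}
{code}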

 Need additional query stats in admin interface - median, 95th and 99th 
 percentile
 -

 Key: SOLR-1972
 URL: https://issues.apache.org/jira/browse/SOLR-1972
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.4
Reporter: Shawn Heisey
Priority: Minor
 Attachments: SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, 
 SOLR-1972.patch


 I would like to see more detailed query statistics from the admin GUI.  This 
 is what you can get now:
 requests : 809
 errors : 0
 timeouts : 0
 totalTime : 70053
 avgTimePerRequest : 86.59209
 avgRequestsPerSecond : 0.8148785 
 I'd like to see more data on the time per request - median, 95th percentile, 
 99th percentile, and any other statistical function that makes sense to 
 include.  In my environment, the first bunch of queries after startup tend to 
 take several seconds each.  I find that the average value tends to be useless 
 until it has several thousand queries under its belt and the caches are 
 thoroughly warmed.  The statistical functions I have mentioned would quickly 
 eliminate the influence of those initial slow queries.
 The system will have to store individual data about each query.  I don't know 
 if this is something Solr does already.  It would be nice to have a 
 configurable count of how many of the most recent data points are kept, to 
 control the amount of memory the feature uses.  The default value could be 
 something like 1024 or 4096.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-3278) Rename contrib/queryparser project to queryparser-contrib

2011-07-06 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male resolved LUCENE-3278.


   Resolution: Fixed
Fix Version/s: 4.0
 Assignee: Chris Male

Committed revision 1143615.

 Rename contrib/queryparser project to queryparser-contrib
 -

 Key: LUCENE-3278
 URL: https://issues.apache.org/jira/browse/LUCENE-3278
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: modules/queryparser
Reporter: Chris Male
Assignee: Chris Male
 Fix For: 4.0

 Attachments: LUCENE-3278.patch


 Much like with contrib/queries, we should differentiate the 
 contrib/queryparser from the queryparser module.  No directory structure 
 changes will be made, just ant and maven.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3282) BlockJoinQuery: Allow to add a custom child collector, and customize the parent bitset extraction

2011-07-06 Thread Shay Banon (JIRA)
BlockJoinQuery: Allow to add a custom child collector, and customize the parent 
bitset extraction
-

 Key: LUCENE-3282
 URL: https://issues.apache.org/jira/browse/LUCENE-3282
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Affects Versions: 3.4, 4.0
Reporter: Shay Banon


It would be nice to allow adding a custom child collector to the BlockJoinQuery, 
to be called on every matching doc (so we can do things with it, like counts 
and such). Also, allow extending BlockJoinQuery with custom code that converts 
the filter bitset to an OpenBitSet.
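
To illustrate what such a hook might look like (this interface does not exist 
in Lucene; all names here are hypothetical):

{code}
// Hypothetical sketch of the proposed hook: a callback invoked for every
// matching child doc, e.g. to count matching children per parent.
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

interface ChildDocCollector {
  void collectChild(int childDoc, int parentDoc) throws IOException;
}

// Example use: count matching children for each parent doc.
class ChildCountCollector implements ChildDocCollector {
  final Map<Integer, Integer> countsByParent = new HashMap<Integer, Integer>();

  public void collectChild(int childDoc, int parentDoc) {
    Integer count = countsByParent.get(parentDoc);
    countsByParent.put(parentDoc, count == null ? 1 : count + 1);
  }
}
{code}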

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3282) BlockJoinQuery: Allow to add a custom child collector, and customize the parent bitset extraction

2011-07-06 Thread Shay Banon (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shay Banon updated LUCENE-3282:
---

Attachment: LUCENE-3282.patch

 BlockJoinQuery: Allow to add a custom child collector, and customize the 
 parent bitset extraction
 -

 Key: LUCENE-3282
 URL: https://issues.apache.org/jira/browse/LUCENE-3282
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Affects Versions: 3.4, 4.0
Reporter: Shay Banon
 Attachments: LUCENE-3282.patch


 It would be nice to allow adding a custom child collector to the 
 BlockJoinQuery, to be called on every matching doc (so we can do things with 
 it, like counts and such). Also, allow extending BlockJoinQuery with custom 
 code that converts the filter bitset to an OpenBitSet.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3233) HuperDuperSynonymsFilter™

2011-07-06 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3233:


Attachment: LUCENE-3233.patch

Updated patch: this tableizes the first FST arcs for latin-1.

Precomputing this tiny table speeds up this filter a ton (~3000ms -> ~2000ms), 
and I think it's a cheap, easy win for the terms index too.
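
A rough, generic illustration of the tableizing idea (not the actual FST 
internals; the types here are stand-ins for illustration):

{code}
// Precompute the root node's outgoing arcs for labels 0..255 in a flat array
// so the first transition is a direct index instead of a binary search.
import java.util.Arrays;
import java.util.Map;

class RootArcCache {
  private final int[] table = new int[256];     // label -> target node id, -1 if absent
  private final Map<Integer, Integer> allArcs;  // stand-in for the general arc lookup

  RootArcCache(Map<Integer, Integer> rootArcs) {
    Arrays.fill(table, -1);
    allArcs = rootArcs;
    for (Map.Entry<Integer, Integer> e : rootArcs.entrySet()) {
      int label = e.getKey();
      if (label >= 0 && label < table.length) {
        table[label] = e.getValue();
      }
    }
  }

  /** First transition from the root: one array read for latin-1 labels. */
  int firstTarget(int label) {
    if (label >= 0 && label < table.length) {
      return table[label];
    }
    Integer target = allArcs.get(label);        // everything else: general lookup
    return target == null ? -1 : target;
  }
}
{code}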

 HuperDuperSynonymsFilter™
 -

 Key: LUCENE-3233
 URL: https://issues.apache.org/jira/browse/LUCENE-3233
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
 Attachments: LUCENE-3223.patch, LUCENE-3233.patch, LUCENE-3233.patch, 
 LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, 
 LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, 
 LUCENE-3233.patch, synonyms.zip


 The current synonymsfilter uses a lot of ram and cpu, especially at build 
 time.
 I think yesterday I heard about huge synonyms files three times.
 So, I think we should use an FST-based structure, sharing the inputs and 
 outputs.
 And we should be more efficient with the tokenStream api, e.g. using 
 save/restoreState instead of cloneAttributes()

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3283) Move core QueryParsers to queryparser module

2011-07-06 Thread Chris Male (JIRA)
Move core QueryParsers to queryparser module


 Key: LUCENE-3283
 URL: https://issues.apache.org/jira/browse/LUCENE-3283
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/queryparser, modules/queryparser
Reporter: Chris Male


Move the contents of lucene/src/java/org/apache/lucene/queryParser to the 
queryparser module.

To differentiate these parsers from the others, they are going to be placed in a 
'classic' package.  We'll rename QueryParser to ClassicQueryParser as well.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3284) Move contribs/modules away from QueryParser dependency

2011-07-06 Thread Chris Male (JIRA)
Move contribs/modules away from QueryParser dependency
--

 Key: LUCENE-3284
 URL: https://issues.apache.org/jira/browse/LUCENE-3284
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/queryparser, modules/queryparser
Reporter: Chris Male


Some contribs and modules depend on the core QueryParser just for simplicity in 
their tests.  We should apply the same process as I did to the core tests, and 
move them away from using the QueryParser where possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3285) Move QueryParsers from contrib/queryparser to queryparser module

2011-07-06 Thread Chris Male (JIRA)
Move QueryParsers from contrib/queryparser to queryparser module


 Key: LUCENE-3285
 URL: https://issues.apache.org/jira/browse/LUCENE-3285
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: modules/queryparser
Reporter: Chris Male


Each of the QueryParsers will be ported across.

Those which use the flexible parsing framework will be placed under the 
'flexible' package.  The StandardQueryParser will be renamed to 
FlexibleQueryParser, and surround.QueryParser will be renamed to 
SurroundQueryParser.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3286) Move XML QueryParser to queryparser module

2011-07-06 Thread Chris Male (JIRA)
Move XML QueryParser to queryparser module
--

 Key: LUCENE-3286
 URL: https://issues.apache.org/jira/browse/LUCENE-3286
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: modules/queryparser
Reporter: Chris Male


The XML QueryParser will be ported across to queryparser module.

As part of this work I want to reconsider the need for its demo, which has many 
addition dependencies.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3286) Move XML QueryParser to queryparser module

2011-07-06 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-3286:
---

Description: 
The XML QueryParser will be ported across to queryparser module.

As part of this work I want to reconsider the need for its demo, which has many 
additional dependencies.

  was:
The XML QueryParser will be ported across to queryparser module.

As part of this work I want to reconsider the need for its demo, which has many 
addition dependencies.


 Move XML QueryParser to queryparser module
 --

 Key: LUCENE-3286
 URL: https://issues.apache.org/jira/browse/LUCENE-3286
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: modules/queryparser
Reporter: Chris Male

 The XML QueryParser will be ported across to queryparser module.
 As part of this work I want to reconsider the need for its demo, which has 
 many additional dependencies.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3233) HuperDuperSynonymsFilter™

2011-07-06 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061001#comment-13061001
 ] 

Robert Muir commented on LUCENE-3233:
-

Benchmark with the synonyms.zip attached to this issue (so that we are actually 
matching synonyms): in this case I only analyzed the text 100,000 times, as it's 
a lot more output. I also checked that they emit the same stuff:

||Impl||ms||
|SynonymsFilter|112527|
|FST|22872|

So, that's 5x faster, probably due to avoiding the expensive cloning.

I think we are fine on performance.
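
For context, the benchmark loop has roughly this shape (an illustration, not 
the harness actually used here; WhitespaceAnalyzer stands in for the real 
synonym chain):

{code}
import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.util.Version;

public class AnalyzeLoop {
  public static void main(String[] args) throws Exception {
    WhitespaceAnalyzer analyzer = new WhitespaceAnalyzer(Version.LUCENE_33);
    String text = "some text to push through the analysis chain";
    long start = System.nanoTime();
    for (int i = 0; i < 100000; i++) {          // "analyzed the text 100,000 times"
      TokenStream ts = analyzer.tokenStream("field", new StringReader(text));
      ts.reset();
      while (ts.incrementToken()) {
        // consume tokens; a real benchmark would also read the term attribute
      }
      ts.end();
      ts.close();
    }
    System.out.println(((System.nanoTime() - start) / 1000000) + " ms");
  }
}
{code}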


 HuperDuperSynonymsFilter™
 -

 Key: LUCENE-3233
 URL: https://issues.apache.org/jira/browse/LUCENE-3233
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
 Attachments: LUCENE-3223.patch, LUCENE-3233.patch, LUCENE-3233.patch, 
 LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, 
 LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, 
 LUCENE-3233.patch, synonyms.zip


 The current synonymsfilter uses a lot of ram and cpu, especially at build 
 time.
 I think yesterday I heard about huge synonyms files three times.
 So, I think we should use an FST-based structure, sharing the inputs and 
 outputs.
 And we should be more efficient with the tokenStream api, e.g. using 
 save/restoreState instead of cloneAttributes()

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3286) Move XML QueryParser to queryparser module

2011-07-06 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061002#comment-13061002
 ] 

Robert Muir commented on LUCENE-3286:
-

We had an idea to expand the demo module to cover more than just the basics 
(e.g. including examples and such).

Maybe we can put it there?

 Move XML QueryParser to queryparser module
 --

 Key: LUCENE-3286
 URL: https://issues.apache.org/jira/browse/LUCENE-3286
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: modules/queryparser
Reporter: Chris Male

 The XML QueryParser will be ported across to queryparser module.
 As part of this work I want to reconsider the need for its demo, which has 
 many additional dependencies.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3286) Move XML QueryParser to queryparser module

2011-07-06 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061005#comment-13061005
 ] 

Chris Male commented on LUCENE-3286:


Sounds like a great location for it.

 Move XML QueryParser to queryparser module
 --

 Key: LUCENE-3286
 URL: https://issues.apache.org/jira/browse/LUCENE-3286
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: modules/queryparser
Reporter: Chris Male

 The XML QueryParser will be ported across to queryparser module.
 As part of this work I want to reconsider the need for its demo, which has 
 many additional dependencies.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3286) Move XML QueryParser to queryparser module

2011-07-06 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-3286:
---

Description: 
The XML QueryParser will be ported across to queryparser module.

As part of this work, we'll move the QP's demo into the demo module.

  was:
The XML QueryParser will be ported across to queryparser module.

As part of this work I want to reconsider the need for its demo, which has many 
additional dependencies.


 Move XML QueryParser to queryparser module
 --

 Key: LUCENE-3286
 URL: https://issues.apache.org/jira/browse/LUCENE-3286
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: modules/queryparser
Reporter: Chris Male

 The XML QueryParser will be ported across to queryparser module.
 As part of this work, we'll move the QP's demo into the demo module.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-2639) default fl in solrconfig.xml does not recognize 'score' as a field

2011-07-06 Thread Tao Cheng (JIRA)
default fl in solrconfig.xml does not recognize 'score' as a field


 Key: SOLR-2639
 URL: https://issues.apache.org/jira/browse/SOLR-2639
 Project: Solr
  Issue Type: Bug
  Components: SearchComponents - other
Affects Versions: 4.0
 Environment: 4.0 trunk
Reporter: Tao Cheng
Priority: Minor


Got the following exception when querying Solr without explicitly specifying 
fl. In my solrconfig.xml, I set the default value of fl to some fields, 
including 'score'.

type Status report
message undefined field score
description The request sent by the client was syntactically incorrect 
(undefined field score).

I didn't have this error in my March trunk build. Note that when I set fl=, 
all fields except 'score' are returned. In the March trunk build, fl= still 
makes Solr return all the default fields specified in solrconfig.xml.
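
For reference, the kind of configuration being described is a defaults block on 
the request handler, along these lines (field names are examples, not from the 
reporter's config):

{code}
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="fl">id,name,score</str>
  </lst>
</requestHandler>
{code}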

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3284) Move contribs/modules away from QueryParser dependency

2011-07-06 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-3284:
---

Attachment: LUCENE-3284.patch

Patch which removes the QP dependency from the queries module and from 
highlighter.  

The only other module which depends on QP in its tests is analysis-common, which 
combines some complex analysis with some reasonably complex queries.  
Consequently I will leave this till later when we can examine the QP needs for 
the analysis module in general.

 Move contribs/modules away from QueryParser dependency
 --

 Key: LUCENE-3284
 URL: https://issues.apache.org/jira/browse/LUCENE-3284
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/queryparser, modules/queryparser
Reporter: Chris Male
 Attachments: LUCENE-3284.patch


 Some contribs and modules depend on the core QueryParser just for simplicity 
 in their tests.  We should apply the same process as I did to the core tests, 
 and move them away from using the QueryParser where possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Putting search-lucene.com back on l.a.o/solr

2011-07-06 Thread Otis Gospodnetic
Hi,

I just noticed that over on http://lucene.apache.org/solr/ we are back to Lucid 
Find being the only search provider.  5 months ago we added search-lucene.com 
there, but now it's gone.  Google Analytics shows that search-lucene.com was 
removed from there on June 4.  This is when Lucene 3.2 was released, so I 
suspect the site was somehow rebuilt and published without it.

Aha, I see, it looks like https://issues.apache.org/jira/browse/LUCENE-2660 was 
applied to trunk only and not branch_3x, and the site was built from 3x branch.

As I'm about to go on vacation, I don't want to mess up the site by 
reforresting it (did it locally and it looks good, but it's past 1 AM here) and 
publishing it, so I'll just commit stuff in Solr's src/site after applying the 
patch from LUCENE-2660:

branch_3x/solr/src/site$ svn st
?   LUCENE-2660-solr.patch
M   src/documentation/skins/lucene/css/screen.css
M   src/documentation/skins/lucene/xslt/html/site-to-xhtml.xsl

It would be great if somebody could publish this.

Thanks,
Otis

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org