[JENKINS] Lucene-trunk - Build # 1602 - Failure
Build: https://builds.apache.org/job/Lucene-trunk/1602/ No tests ran. Build Log (for compile errors): [...truncated 8987 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2341) explore morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053079#comment-13053079 ] Dawid Weiss commented on LUCENE-2341: - bq. Dawid, do you think it's reasonable to optimize further and use directly a list returned by IStemmer.lookup (instead of copying with addAll) ? My concern is that (at least in current DictionaryLookup implementation) that list seems to be shared by distinct invocations of the lookup method, which would make the use of a specific IStemmer not applicable in thread-safe code. IStemmer implementations are not thread safe anyway, so there is no problem in reusing that list. In fact, the returned WordData objects are reused internally as well, so you can't store them either (this is done to avoid GC overhead). So yes: I missed that, but you'll need to ensure IStemmer instances are not shared. This can be done in various ways (thread local, etc), but I think the simplest way to do it would be to instantiate PolishStemmer at the MorfologikFilter level. This is cheap (the dictionary is loaded once anyway). You can then create two constructors in the analyzer -- one with PolishStemmer.DICTIONARY and one with the default (I'd suggest MORFOLOGIK). Exposing IStemmer constructor will do more harm than good -- thinking ahead is good, but in this case I don't think there'll be this many people interested in subclassing IStemmer (if anything, they'll plug into Lucene's infrastructure directly). A simple test case spawning 5 or 10 threads in a parallel executor and crunching stems on the same analyzer would also be nice to ensure we have everything correct wrt multithreading, but it's not that crucial if you don't have the time to write it. Thanks! explore morfologik integration -- Key: LUCENE-2341 URL: https://issues.apache.org/jira/browse/LUCENE-2341 Project: Lucene - Java Issue Type: New Feature Components: modules/analysis Reporter: Robert Muir Assignee: Dawid Weiss Attachments: LUCENE-2341.diff, LUCENE-2341.diff, morfologik-stemming-1.5.0.jar Dawid Weiss mentioned on LUCENE-2298 that there is another Polish stemmer available: http://sourceforge.net/projects/morfologik/ This works differently than LUCENE-2298, and ideally would be another option for users. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
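For reference, a minimal sketch of the multithreaded check Dawid suggests above (several threads crunching stems on the same analyzer and comparing results). This is not code from the patch; the analyzer instance, field name, and sample text are placeholders, and the helper works with any Analyzer under test.
{code}
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class StemStressCheck {
  /** Runs the same analysis on N threads and verifies every thread sees identical tokens. */
  public static void assertThreadSafe(final Analyzer analyzer, final String text, int threads)
      throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<List<String>>> results = new ArrayList<Future<List<String>>>();
      for (int i = 0; i < threads; i++) {
        results.add(pool.submit(new Callable<List<String>>() {
          public List<String> call() throws Exception {
            List<String> terms = new ArrayList<String>();
            TokenStream ts = analyzer.tokenStream("field", new StringReader(text));
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
              terms.add(term.toString());
            }
            ts.end();
            ts.close();
            return terms;
          }
        }));
      }
      // if the filter keeps its own IStemmer (not shared), all threads must agree
      List<String> expected = results.get(0).get();
      for (Future<List<String>> f : results) {
        if (!expected.equals(f.get())) {
          throw new AssertionError("thread-safety problem, got: " + f.get());
        }
      }
    } finally {
      pool.shutdown();
    }
  }
}
{code}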
[jira] [Updated] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Matheis (steffkes) updated SOLR-2399: Attachment: SOLR-2399-110622.patch Okay, there we go: {quote}On the 'java-properties' page, is the UI assuming ':' is the path separator? Can this use the value of path.separator to split?{quote} Yes Yes - Done [[commit|https://github.com/steffkes/solr-admin/commit/abb57cacb4a8aa11e406da32ecfa0e2b3caf07be]] bq. Should the Ping query append a random number so that it avoids HTTP cache? Good Idea! - Done [[commit|https://github.com/steffkes/solr-admin/commit/61f24c2b08e5b8ca847d197374abf1b3fbd0595a]] bq. Something for the wishlist... on the threads page, it would be great to have a button to expand (and collapse?) all the stack traces. It's hard to figure out which thread is doing what just from the title. I've added a Button at the Top and the Bottom of the Table to show/hide all of them w/ one click [[commit|https://github.com/steffkes/solr-admin/commit/26378c34ecebe34ce6e80292d8fb02acacb69ead]] Attached Patch contains all git-changes since our last SVN-Commit. Could you also include those images Ryan? They will not go into the SVN-Diff because of their binary type :/ * https://github.com/steffkes/solr-admin/raw/master/img/ico/toolbox.png * https://github.com/steffkes/solr-admin/raw/master/img/ico/zone.png * https://github.com/steffkes/solr-admin/raw/master/img/ico/system-monitor--exclamation.png Thanks! :) Solr Admin Interface, reworked -- Key: SOLR-2399 URL: https://issues.apache.org/jira/browse/SOLR-2399 Project: Solr Issue Type: Improvement Components: web gui Reporter: Stefan Matheis (steffkes) Assignee: Ryan McKinley Priority: Minor Fix For: 4.0 Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, SOLR-2399-110606.patch, SOLR-2399-110622.patch, SOLR-2399-admin-interface.patch, SOLR-2399-analysis-stopwords.patch, SOLR-2399-fluid-width.patch, SOLR-2399-sorting-fields.patch, SOLR-2399-wip-notice.patch, SOLR-2399.patch *The idea was to create a new, fresh (and hopefully clean) Solr Admin Interface.* [Based on this [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]] *Features:* * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png] * [Query-Form|http://files.mathe.is/solr-admin/02_query.png] * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png] * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, SOLR-2400) * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png] * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482) * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png] * [Replication|http://files.mathe.is/solr-admin/10_replication.png] * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png] * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459) ** Stub (using static data) Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI I've quickly created a Github-Repository (Just for me, to keep track of the changes) » https://github.com/steffkes/solr-admin -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
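As an aside, the path-separator point discussed above boils down to splitting on the JVM's path.separator property (';' on Windows, ':' elsewhere) rather than hard-coding ':'. The admin UI itself is JavaScript, so this is only an illustration of the idea in plain Java, not code from the patch; the class and method names are made up.
{code}
import java.util.regex.Pattern;

public class PathSplitExample {
  /** Splits a path-like value on the platform's path separator instead of assuming ':'. */
  public static String[] splitPathLikeProperty(String value) {
    String sep = System.getProperty("path.separator"); // ";" on Windows, ":" on Unix
    return value.split(Pattern.quote(sep));
  }

  public static void main(String[] args) {
    for (String entry : splitPathLikeProperty(System.getProperty("java.class.path"))) {
      System.out.println(entry);
    }
  }
}
{code}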
[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053080#comment-13053080 ] Bill Bell commented on SOLR-2242: - Simon, I made all those changes except for the termsList one. I think it is useful to have the count based on terms. See attachment. Get distinct count of names for a facet field - Key: SOLR-2242 URL: https://issues.apache.org/jira/browse/SOLR-2242 Project: Solr Issue Type: New Feature Components: Response Writers Affects Versions: 4.0 Reporter: Bill Bell Assignee: Simon Willnauer Priority: Minor Fix For: 4.0 Attachments: SOLR-2242.patch, SOLR-2242.shard.patch, SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch When returning facet.field=<name of field> you will get a list of matches for distinct values. This is normal behavior. This patch tells you how many distinct values you have (# of rows). Use with limit=-1 and mincount=1. The feature is called namedistinct. Here is an example: http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=2&facet.limit=-1&facet.field=price http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=0&facet.limit=-1&facet.field=price http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=1&facet.limit=-1&facet.field=price This currently only works on facet.field.
{code}
<lst name="facet_fields">
  <lst name="price">
    <int name="numFacetTerms">14</int>
    <int name="0.0">3</int><int name="11.5">1</int><int name="19.95">1</int><int name="74.99">1</int><int name="92.0">1</int><int name="179.99">1</int><int name="185.0">1</int><int name="279.95">1</int><int name="329.95">1</int><int name="350.0">1</int><int name="399.0">1</int><int name="479.95">1</int><int name="649.99">1</int><int name="2199.0">1</int>
  </lst>
</lst>
{code}
Several people use this to get the group.field count (the # of groups). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
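For readers following along, a minimal SolrJ sketch of the first example request above, assuming the patch's facet.numFacetTerms parameter is applied to the server. The server URL and field name are illustrative; CommonsHttpSolrServer is used as the SolrJ client class of that era.
{code}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class NamedDistinctExample {
  public static void main(String[] args) throws Exception {
    CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrQuery q = new SolrQuery("*:*");
    q.setFacet(true);
    q.addFacetField("price");
    q.set("facet.mincount", "1");
    q.set("facet.limit", "-1");
    q.set("facet.numFacetTerms", "1"); // per the issue: 1 = return only the distinct-value count
    QueryResponse rsp = server.query(q);
    // the numFacetTerms entry appears under facet_counts/facet_fields/price
    System.out.println(rsp.getResponse().get("facet_counts"));
  }
}
{code}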
[jira] [Updated] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bill Bell updated SOLR-2242: Attachment: SOLR-2242.shard.patch New patch ready for commit? Get distinct count of names for a facet field - Key: SOLR-2242 URL: https://issues.apache.org/jira/browse/SOLR-2242 Project: Solr Issue Type: New Feature Components: Response Writers Affects Versions: 4.0 Reporter: Bill Bell Assignee: Simon Willnauer Priority: Minor Fix For: 4.0 Attachments: SOLR-2242.patch, SOLR-2242.shard.patch, SOLR-2242.shard.patch, SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch When returning facet.field=<name of field> you will get a list of matches for distinct values. This is normal behavior. This patch tells you how many distinct values you have (# of rows). Use with limit=-1 and mincount=1. The feature is called namedistinct. Here is an example: http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=2&facet.limit=-1&facet.field=price http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=0&facet.limit=-1&facet.field=price http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=1&facet.limit=-1&facet.field=price This currently only works on facet.field.
{code}
<lst name="facet_fields">
  <lst name="price">
    <int name="numFacetTerms">14</int>
    <int name="0.0">3</int><int name="11.5">1</int><int name="19.95">1</int><int name="74.99">1</int><int name="92.0">1</int><int name="179.99">1</int><int name="185.0">1</int><int name="279.95">1</int><int name="329.95">1</int><int name="350.0">1</int><int name="399.0">1</int><int name="479.95">1</int><int name="649.99">1</int><int name="2199.0">1</int>
  </lst>
</lst>
{code}
Several people use this to get the group.field count (the # of groups). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-2612) Add testpackage and testpackageroot conditions to clustering and analysis-extras build files
[ https://issues.apache.org/jira/browse/SOLR-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar resolved SOLR-2612. - Resolution: Fixed Committed revision 1138319 on trunk and revision 1138320 on branch_3x. Add testpackage and testpackageroot conditions to clustering and analysis-extras build files Key: SOLR-2612 URL: https://issues.apache.org/jira/browse/SOLR-2612 Project: Solr Issue Type: Task Components: Build Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Priority: Trivial Fix For: 3.3, 4.0 Attachments: SOLR-2612.patch Clustering and analysis-extras are the only two build files which do not have testpackage and testpackageroot exclusions wired into the build file. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2382) DIH Cache Improvements
[ https://issues.apache.org/jira/browse/SOLR-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053090#comment-13053090 ] Noble Paul commented on SOLR-2382: -- The patch does not apply on trunk DIH Cache Improvements -- Key: SOLR-2382 URL: https://issues.apache.org/jira/browse/SOLR-2382 Project: Solr Issue Type: New Feature Components: contrib - DataImportHandler Reporter: James Dyer Priority: Minor Attachments: SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch Functionality: 1. Provide a pluggable caching framework for DIH so that users can choose a cache implementation that best suits their data and application. 2. Provide a means to temporarily cache a child Entity's data without needing to create a special cached implementation of the Entity Processor (such as CachedSqlEntityProcessor). 3. Provide a means to write the final (root entity) DIH output to a cache rather than to Solr. Then provide a way for a subsequent DIH call to use the cache as an Entity input. Also provide the ability to do delta updates on such persistent caches. 4. Provide the ability to partition data across multiple caches that can then be fed back into DIH and indexed either to varying Solr Shards, or to the same Core in parallel. Use Cases: 1. We needed a flexible scalable way to temporarily cache child-entity data prior to joining to parent entities. - Using SqlEntityProcessor with Child Entities can cause an n+1 select problem. - CachedSqlEntityProcessor only supports an in-memory HashMap as a Caching mechanism and does not scale. - There is no way to cache non-SQL inputs (ex: flat files, xml, etc). 2. We needed the ability to gather data from long-running entities by a process that runs separate from our main indexing process. 3. We wanted the ability to do a delta import of only the entities that changed. - Lucene/Solr requires entire documents to be re-indexed, even if only a few fields changed. - Our data comes from 50+ complex sql queries and/or flat files. - We do not want to incur overhead re-gathering all of this data if only 1 entity's data changed. - Persistent DIH caches solve this problem. 4. We want the ability to index several documents in parallel (using 1.4.1, which did not have the threads parameter). 5. In the future, we may need to use Shards, creating a need to easily partition our source data into Shards. Implementation Details: 1. De-couple EntityProcessorBase from caching. - Created a new interface, DIHCache two implementations: - SortedMapBackedCache - An in-memory cache, used as default with CachedSqlEntityProcessor (now deprecated). - BerkleyBackedCache - A disk-backed cache, dependent on bdb-je, tested with je-4.1.6.jar - NOTE: the existing Lucene Contrib db project uses je-3.3.93.jar. I believe this may be incompatible due to Generic Usage. - NOTE: I did not modify the ant script to automatically get this jar, so to use or evaluate this patch, download bdb-je from http://www.oracle.com/technetwork/database/berkeleydb/downloads/index.html 2. Allow Entity Processors to take a cacheImpl parameter to cause the entity data to be cached (see EntityProcessorBase DIHCacheProperties). 3. Partially De-couple SolrWriter from DocBuilder - Created a new interface DIHWriter, two implementations: - SolrWriter (refactored) - DIHCacheWriter (allows DIH to write ultimately to a Cache). 4. Create a new Entity Processor, DIHCacheProcessor, which reads a persistent Cache as DIH Entity Input. 5. 
Support a partition parameter with both DIHCacheWriter and DIHCacheProcessor to allow for easy partitioning of source entity data. 6. Change the semantics of entity.destroy() - Previously, it was being called on each iteration of DocBuilder.buildDocument(). - Now it does one-time cleanup tasks (like closing or deleting a disk-backed cache) once the entity processor is completed. - The only out-of-the-box entity processor that previously implemented destroy() was LineEntityProcessor, so this is not a very invasive change. General Notes: We are near completion in converting our search functionality from a legacy search engine to Solr. However, I found that DIH did not support caching to the level of our prior product's data import utility. In order to get our data into Solr, I created these caching enhancements. Because I believe this has broad application, and because we would like this feature to be supported by the Community, I have front-ported this, enhanced, to Trunk. I have also added unit tests and verified that all existing test cases
[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053097#comment-13053097 ] Ryan McKinley commented on SOLR-2399: - Thanks Stefan. Added in #1138323 Solr Admin Interface, reworked -- Key: SOLR-2399 URL: https://issues.apache.org/jira/browse/SOLR-2399 Project: Solr Issue Type: Improvement Components: web gui Reporter: Stefan Matheis (steffkes) Assignee: Ryan McKinley Priority: Minor Fix For: 4.0 Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, SOLR-2399-110606.patch, SOLR-2399-110622.patch, SOLR-2399-admin-interface.patch, SOLR-2399-analysis-stopwords.patch, SOLR-2399-fluid-width.patch, SOLR-2399-sorting-fields.patch, SOLR-2399-wip-notice.patch, SOLR-2399.patch *The idea was to create a new, fresh (and hopefully clean) Solr Admin Interface.* [Based on this [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]] *Features:* * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png] * [Query-Form|http://files.mathe.is/solr-admin/02_query.png] * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png] * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, SOLR-2400) * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png] * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482) * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png] * [Replication|http://files.mathe.is/solr-admin/10_replication.png] * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png] * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459) ** Stub (using static data) Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI I've quickly created a Github-Repository (Just for me, to keep track of the changes) » https://github.com/steffkes/solr-admin -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-trunk - Build # 1602 - Failure
Second attempt at fixing the javadoc. Passes for me now. On Wed, Jun 22, 2011 at 6:29 PM, Apache Jenkins Server jenk...@builds.apache.org wrote: Build: https://builds.apache.org/job/Lucene-trunk/1602/ No tests ran. Build Log (for compile errors): [...truncated 8987 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- Chris Male | Software Developer | JTeam BV.| www.jteam.nl
[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053110#comment-13053110 ] Ryan McKinley commented on SOLR-2399: - in #1138328, I added a min-width value -- this should keep things from looking ridiculous when it gets really small Solr Admin Interface, reworked -- Key: SOLR-2399 URL: https://issues.apache.org/jira/browse/SOLR-2399 Project: Solr Issue Type: Improvement Components: web gui Reporter: Stefan Matheis (steffkes) Assignee: Ryan McKinley Priority: Minor Fix For: 4.0 Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, SOLR-2399-110606.patch, SOLR-2399-110622.patch, SOLR-2399-admin-interface.patch, SOLR-2399-analysis-stopwords.patch, SOLR-2399-fluid-width.patch, SOLR-2399-sorting-fields.patch, SOLR-2399-wip-notice.patch, SOLR-2399.patch *The idea was to create a new, fresh (and hopefully clean) Solr Admin Interface.* [Based on this [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]] *Features:* * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png] * [Query-Form|http://files.mathe.is/solr-admin/02_query.png] * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png] * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, SOLR-2400) * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png] * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482) * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png] * [Replication|http://files.mathe.is/solr-admin/10_replication.png] * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png] * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459) ** Stub (using static data) Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI I've quickly created a Github-Repository (Just for me, to keep track of the changes) » https://github.com/steffkes/solr-admin -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053113#comment-13053113 ] Simon Willnauer commented on SOLR-2242: --- bq. New patch ready for commit? Bill, I still see lots of whitespace / indentation problems in that latest patch. Anyway I looked at it and I wonder if we could restructure this a little: we could first check if termList != null and do all the cases there, and if termList == null we get the term counts limit; that would remove all the redundant getTermCountsLimit / getListedTermCounts calls. The termList != null case seems very easy and straightforward:
{code}
if (termList != null) {
  NamedList<Integer> counts = getListedTermCounts(facetValue, termList);
  switch (numFacetTerms) {
    case COUNTS:
      final NamedList<Integer> resCount = new NamedList<Integer>();
      counts = resCount;
    case COUNTS_AND_VALUES:
      counts.add("numFacetTerms", counts.size());
      break;
  }
  res.add(key, counts);
} else {
  ...
{code}
Yet, it's hard to refactor this without a single test (note, there might be a bug). I would be really happy to see a test-case for this that tests all the variations. Regarding the constants, I think the default case should be a constant too. If you use NamedList can you make sure you put the right generic to it if possible, otherwise my IDE goes wild and adds warnings all over the place. In your case NamedList<Integer> works fine. simon Get distinct count of names for a facet field - Key: SOLR-2242 URL: https://issues.apache.org/jira/browse/SOLR-2242 Project: Solr Issue Type: New Feature Components: Response Writers Affects Versions: 4.0 Reporter: Bill Bell Assignee: Simon Willnauer Priority: Minor Fix For: 4.0 Attachments: SOLR-2242.patch, SOLR-2242.shard.patch, SOLR-2242.shard.patch, SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch When returning facet.field=<name of field> you will get a list of matches for distinct values. This is normal behavior. This patch tells you how many distinct values you have (# of rows). Use with limit=-1 and mincount=1. The feature is called namedistinct. Here is an example: http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=2&facet.limit=-1&facet.field=price http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=0&facet.limit=-1&facet.field=price http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=1&facet.limit=-1&facet.field=price This currently only works on facet.field.
{code}
<lst name="facet_fields">
  <lst name="price">
    <int name="numFacetTerms">14</int>
    <int name="0.0">3</int><int name="11.5">1</int><int name="19.95">1</int><int name="74.99">1</int><int name="92.0">1</int><int name="179.99">1</int><int name="185.0">1</int><int name="279.95">1</int><int name="329.95">1</int><int name="350.0">1</int><int name="399.0">1</int><int name="479.95">1</int><int name="649.99">1</int><int name="2199.0">1</int>
  </lst>
</lst>
{code}
Several people use this to get the group.field count (the # of groups). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
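A rough sketch of the kind of test Simon asks for, based only on the parameter and response format described in this issue. The schema fields (id, price) and the xpath are assumptions derived from the example output above; this is not code from any of the attached patches.
{code}
import org.apache.solr.SolrTestCaseJ4;
import org.junit.BeforeClass;
import org.junit.Test;

public class TestNumFacetTerms extends SolrTestCaseJ4 {
  @BeforeClass
  public static void beforeTests() throws Exception {
    initCore("solrconfig.xml", "schema.xml");
  }

  @Test
  public void testDistinctValueCount() {
    assertU(adoc("id", "1", "price", "10.0"));
    assertU(adoc("id", "2", "price", "10.0"));
    assertU(adoc("id", "3", "price", "20.0"));
    assertU(commit());

    // numFacetTerms=1: only the number of distinct facet values should be returned
    assertQ(req("q", "*:*", "facet", "true", "facet.field", "price",
                "facet.mincount", "1", "facet.limit", "-1",
                "facet.numFacetTerms", "1"),
        "//lst[@name='facet_fields']/lst[@name='price']/int[@name='numFacetTerms'][.='2']");
  }
}
{code}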
[jira] [Commented] (LUCENE-3226) rename SegmentInfos.FORMAT_3_1 and improve description in CheckIndex
[ https://issues.apache.org/jira/browse/LUCENE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053115#comment-13053115 ] Robert Muir commented on LUCENE-3226: - {quote} Also, in LUCENE-2921 I plan to get rid of all those ridiculous constant names and track the index version at the segment level only. It will be easier, IMO, to have an easy to understand constant name when it comes to supporting an older index (or removing support for it). Perhaps it's only me, but when I read those format constant names, I only did that when removing support for older indexes. Other than that, they are not very interesting ... What Hoss reported about CheckIndex is the real problem we should handle here. SegmentInfo prints in its toString the code version which created it, which is better than seeing -9 IMO, and that should be 3.1 or 3.2. If it's a 3.2.0 newly created index, you shouldn't see 3.1 reported from SegmentInfos.toString. Perhaps CheckIndex needs to be fixed to refer to Constants.LUCENE_MAIN_VERSION instead? Robert, shall we reopen the issue to discuss? {quote} We can reopen... but the issue will always exist here, LUCENE-2921 can't solve this particular case since it's the segments file... rename SegmentInfos.FORMAT_3_1 and improve description in CheckIndex Key: LUCENE-3226 URL: https://issues.apache.org/jira/browse/LUCENE-3226 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.1, 3.2 Reporter: Hoss Man Assignee: Robert Muir Fix For: 3.3, 4.0 Attachments: LUCENE-3226.patch A 3.2 user recently asked if something was wrong because CheckIndex was reporting his (newly built) index version as... {noformat} Segments file=segments_or numSegments=1 version=FORMAT_3_1 [Lucene 3.1] {noformat} It seems like there are two very confusing pieces of information here... 1) the variable name of SegmentInfos.FORMAT_3_1 seems like a poor choice. All other FORMAT_* constants in SegmentInfos are descriptive of the actual change made, and not specific to the version when they were introduced. 2) whatever the name of the FORMAT_* variable, CheckIndex is labeling it "Lucene 3.1", which is misleading since that format is always used in 3.2 (and probably 3.3, etc...). I suggest: a) rename FORMAT_3_1 to something like FORMAT_SEGMENT_RECORDS_VERSION b) change CheckIndex so that the label for the newest format always ends with "and later" (ie: "Lucene 3.1 and later") so when we release versions w/o a format change we don't have to remember to manually list them in CheckIndex. When we *do* make format changes and update CheckIndex, "and later" can be replaced with "to X.Y" and the new format can be added -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2606) Solr sort no longer works on field names with some punctuation in them
[ https://issues.apache.org/jira/browse/SOLR-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053121#comment-13053121 ] Jan Høydahl commented on SOLR-2606: --- Perhaps a test-class producing randomized (legal) field names could be of use for this and other tests? Solr sort no longer works on field names with some punctuation in them -- Key: SOLR-2606 URL: https://issues.apache.org/jira/browse/SOLR-2606 Project: Solr Issue Type: Bug Components: search Affects Versions: 3.1, 3.2 Environment: Linux Reporter: Mitsu Hadeishi We just upgraded from Solr 1.4 to 3.2. For the most part the upgrade went fine, however we discovered that sorting on field names with dashes in them is no longer working properly. For example, the following query used to work: http://[our solr server]/select/?q=computer&sort=static-need-binary+asc and now it gives this error: HTTP Status 400 - undefined field static type Status report message undefined field static description The request sent by the client was syntactically incorrect (undefined field static). It appears the parser for sorting has been changed so that it now tokenizes differently, and assumes field names cannot have dashes in them. However, field names clearly can have dashes in them. The exact same query which worked fine for us in 1.4 is now breaking in 3.2. Changing the sort field to use a field name that doesn't have a dash in it works just fine. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3226) rename SegmentInfos.FORMAT_3_1 and improve description in CheckIndex
[ https://issues.apache.org/jira/browse/LUCENE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053123#comment-13053123 ] Shai Erera commented on LUCENE-3226: How does renaming a constant solve the CheckIndex issue? I commented on the constant name, and I think it should reflect the code version it applies to, not the feature. Because if e.g. in the same version you add two features, incrementally, you wouldn't change the format number twice, right? And then the constant name becomes meaningless again, or too complicated. It happened to me a while ago (can't remember the exact feature though, perhaps it was in TermInfos). I mentioned LUCENE-2921 only because I intended to name the constants exactly that (X_Y). I see you've already reverted the changes you made. I think that the changes to CheckIndex could remain though, adding the "3.1+" to the string? rename SegmentInfos.FORMAT_3_1 and improve description in CheckIndex Key: LUCENE-3226 URL: https://issues.apache.org/jira/browse/LUCENE-3226 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.1, 3.2 Reporter: Hoss Man Fix For: 3.3, 4.0 Attachments: LUCENE-3226.patch A 3.2 user recently asked if something was wrong because CheckIndex was reporting his (newly built) index version as... {noformat} Segments file=segments_or numSegments=1 version=FORMAT_3_1 [Lucene 3.1] {noformat} It seems like there are two very confusing pieces of information here... 1) the variable name of SegmentInfos.FORMAT_3_1 seems like a poor choice. All other FORMAT_* constants in SegmentInfos are descriptive of the actual change made, and not specific to the version when they were introduced. 2) whatever the name of the FORMAT_* variable, CheckIndex is labeling it "Lucene 3.1", which is misleading since that format is always used in 3.2 (and probably 3.3, etc...). I suggest: a) rename FORMAT_3_1 to something like FORMAT_SEGMENT_RECORDS_VERSION b) change CheckIndex so that the label for the newest format always ends with "and later" (ie: "Lucene 3.1 and later") so when we release versions w/o a format change we don't have to remember to manually list them in CheckIndex. When we *do* make format changes and update CheckIndex, "and later" can be replaced with "to X.Y" and the new format can be added -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8982 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/8982/ All tests passed Build Log (for compile errors): [...truncated 16696 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Solr-trunk - Build # 1540 - Failure
Build: https://builds.apache.org/job/Solr-trunk/1540/ All tests passed Build Log (for compile errors): [...truncated 17830 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-3218: Attachment: LUCENE-3218_3x.patch Here is a patch against 3.x. I had to change one test in lucene/backwards and remove some tests from there which used the CFW / CFR. A review would be good here! Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format, which is basically unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice side effect for Lucene indexing itself but, more importantly, for DocValues and LUCENE-3216 we could transparently pack per-field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2454) Nested Document query support
[ https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053142#comment-13053142 ] Mark Harwood commented on LUCENE-2454: -- bq. Could that work for your use case? Sounds like it, that's great :) Do you think there any efficiencies to be gained on the document retrieve side of things if you know that the documents commonly being retrieved are physically nearby i.e. an app will often retrieve a parent's fields and then those from child docs which are required to be physically located adjacent to the parent's data. Would existing lower-level caching in Directory or the OS mean there's already a good chance of finding child data in cached blocks or could a change to file structures and/or doc retrieve APIs radically boost parent-plus-child retrieve performance? Nested Document query support - Key: LUCENE-2454 URL: https://issues.apache.org/jira/browse/LUCENE-2454 Project: Lucene - Java Issue Type: New Feature Components: core/search Affects Versions: 3.0.2 Reporter: Mark Harwood Assignee: Mark Harwood Priority: Minor Attachments: LUCENE-2454.patch, LUCENE-2454.patch, LuceneNestedDocumentSupport.zip A facility for querying nested documents in a Lucene index as outlined in http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-flexscoring-branch - Build # 17 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-flexscoring-branch/17/ All tests passed Build Log (for compile errors): [...truncated 12028 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8979 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8979/ All tests passed Build Log (for compile errors): [...truncated 11584 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8983 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/8983/ All tests passed Build Log (for compile errors): [...truncated 16184 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8980 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8980/ All tests passed Build Log (for compile errors): [...truncated 12238 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3218: Attachment: LUCENE-3218_tests.patch Hi Simon, currently this attached patch fails... not sure why yet. But I think we should resolve this test issue before backporting. Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format, which is basically unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice side effect for Lucene indexing itself but, more importantly, for DocValues and LUCENE-3216 we could transparently pack per-field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-1431) CommComponent abstracted
[ https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul resolved SOLR-1431. -- Resolution: Fixed I have committed it to trunk. We may need more iterations to clean it up CommComponent abstracted Key: SOLR-1431 URL: https://issues.apache.org/jira/browse/SOLR-1431 Project: Solr Issue Type: Improvement Components: search Affects Versions: 4.0 Reporter: Jason Rutherglen Assignee: Noble Paul Fix For: 4.0 Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch We'll abstract CommComponent in this issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8984 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/8984/ All tests passed Build Log (for compile errors): [...truncated 15320 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8981 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8981/ All tests passed Build Log (for compile errors): [...truncated 12732 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-3218: Attachment: LUCENE-3218_test_fix.patch Thank you Robert. While this has actually been tested already (since it's in the base class), it's now cleaner. The test failure came from RAMDirectory simply overwriting existing files. I added an explicit check for it. Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format, which is basically unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice side effect for Lucene indexing itself but, more importantly, for DocValues and LUCENE-3216 we could transparently pack per-field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
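To make the "explicit check" concrete, here is a hypothetical illustration of the kind of overwrite guard being described: fail loudly instead of silently replacing a file that was already written. This is not the actual MockDirectoryWrapper change from the attached patch; the class, field names, and the simplified createOutput signature are all assumptions.
{code}
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IndexOutput;

/** Simplified delegating helper; only createOutput is shown, not a full Directory subclass. */
class OverwriteCheckingDirectory {
  private final Directory delegate;
  private final Set<String> created = new HashSet<String>();

  OverwriteCheckingDirectory(Directory delegate) {
    this.delegate = delegate;
  }

  synchronized IndexOutput createOutput(String name) throws IOException {
    // refuse to silently overwrite a file that already exists or was already created here
    if (created.contains(name) || delegate.fileExists(name)) {
      throw new IOException("file \"" + name + "\" was already written to");
    }
    created.add(name);
    return delegate.createOutput(name);
  }
}
{code}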
[jira] [Updated] (SOLR-2610) Add an option to delete index through CoreAdmin UNLOAD action
[ https://issues.apache.org/jira/browse/SOLR-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-2610: Attachment: SOLR-2610-branch3x.patch Patch for branch 3x Add an option to delete index through CoreAdmin UNLOAD action - Key: SOLR-2610 URL: https://issues.apache.org/jira/browse/SOLR-2610 Project: Solr Issue Type: Improvement Components: multicore Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 3.3, 4.0 Attachments: SOLR-2610-branch3x.patch, SOLR-2610.patch Right now, one can unload a Solr Core but the index files are left behind and consume disk space. We should have an option to delete the index when unloading a core. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053181#comment-13053181 ] Robert Muir commented on LUCENE-3218: - Thanks Simon, I feel better now that we get our open-files-for-write tracking back. Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format, which is basically unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice side effect for Lucene indexing itself but, more importantly, for DocValues and LUCENE-3216 we could transparently pack per-field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8985 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/8985/ All tests passed Build Log (for compile errors): [...truncated 16738 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-2610) Add an option to delete index through CoreAdmin UNLOAD action
[ https://issues.apache.org/jira/browse/SOLR-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar resolved SOLR-2610. - Resolution: Fixed Committed revision 1138405 on trunk and 1138407 on branch_3x. Add an option to delete index through CoreAdmin UNLOAD action - Key: SOLR-2610 URL: https://issues.apache.org/jira/browse/SOLR-2610 Project: Solr Issue Type: Improvement Components: multicore Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 3.3, 4.0 Attachments: SOLR-2610-branch3x.patch, SOLR-2610.patch Right now, one can unload a Solr Core but the index files are left behind and consume disk space. We should have an option to delete the index when unloading a core. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS-MAVEN] Lucene-Solr-Maven-3.x #159: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-3.x/159/ No tests ran. Build Log (for compile errors): [...truncated 7519 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3228) build should allow you (especially hudson) to refer to a local javadocs installation instead of downloading
build should allow you (especially hudson) to refer to a local javadocs installation instead of downloading --- Key: LUCENE-3228 URL: https://issues.apache.org/jira/browse/LUCENE-3228 Project: Lucene - Java Issue Type: Task Reporter: Robert Muir Currently, we fail on all javadocs warnings. However, you get a warning if it cannot download the package-list from sun.com So I think we should allow you optionally set a sysprop using linkoffline. Then we would get much less hudson fake failures I feel like Mike opened an issue for this already but I cannot find it. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (LUCENE-3228) build should allow you (especially hudson) to refer to a local javadocs installation instead of downloading
[ https://issues.apache.org/jira/browse/LUCENE-3228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir reassigned LUCENE-3228: --- Assignee: Robert Muir build should allow you (especially hudson) to refer to a local javadocs installation instead of downloading --- Key: LUCENE-3228 URL: https://issues.apache.org/jira/browse/LUCENE-3228 Project: Lucene - Java Issue Type: Task Reporter: Robert Muir Assignee: Robert Muir Currently, we fail on all javadocs warnings. However, you get a warning if it cannot download the package-list from sun.com So I think we should allow you optionally set a sysprop using linkoffline. Then we would get much less hudson fake failures I feel like Mike opened an issue for this already but I cannot find it. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3228) build should allow you (especially hudson) to refer to a local javadocs installation instead of downloading
[ https://issues.apache.org/jira/browse/LUCENE-3228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053194#comment-13053194 ] Robert Muir commented on LUCENE-3228: - as a start, i installed the two freebsd ports for java doc on hudson into /usr/local/share/doc/jdk1.5 and jdk1.6 I'll see if i can add the hooks to the build scripts now build should allow you (especially hudson) to refer to a local javadocs installation instead of downloading --- Key: LUCENE-3228 URL: https://issues.apache.org/jira/browse/LUCENE-3228 Project: Lucene - Java Issue Type: Task Reporter: Robert Muir Assignee: Robert Muir Currently, we fail on all javadocs warnings. However, you get a warning if it cannot download the package-list from sun.com So I think we should allow you optionally set a sysprop using linkoffline. Then we would get much less hudson fake failures I feel like Mike opened an issue for this already but I cannot find it. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8986 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/8986/ All tests passed Build Log (for compile errors): [...truncated 16135 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3228) build should allow you (especially hudson) to refer to a local javadocs installation instead of downloading
[ https://issues.apache.org/jira/browse/LUCENE-3228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053198#comment-13053198 ] Robert Muir commented on LUCENE-3228: - As a partial solution, I setup the 30 minute builds to just directly override javadoc.link (and javadoc.link.java for Solr) for our 30 minute builds... we don't care about the actual javadoc artifacts or where the links actually point to, only that there are no warnings. This is in r1138418 build should allow you (especially hudson) to refer to a local javadocs installation instead of downloading --- Key: LUCENE-3228 URL: https://issues.apache.org/jira/browse/LUCENE-3228 Project: Lucene - Java Issue Type: Task Reporter: Robert Muir Assignee: Robert Muir Currently, we fail on all javadocs warnings. However, you get a warning if it cannot download the package-list from sun.com So I think we should allow you optionally set a sysprop using linkoffline. Then we would get much less hudson fake failures I feel like Mike opened an issue for this already but I cannot find it. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3228) build should allow you (especially hudson) to refer to a local javadocs installation instead of downloading
[ https://issues.apache.org/jira/browse/LUCENE-3228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053200#comment-13053200 ] Robert Muir commented on LUCENE-3228: - I noticed also that solr uses an online link for junit javadocs... we should download this one and do the same, too. I'll look at this if the link for the sun javadocs takes for the 30 minute builds. build should allow you (especially hudson) to refer to a local javadocs installation instead of downloading --- Key: LUCENE-3228 URL: https://issues.apache.org/jira/browse/LUCENE-3228 Project: Lucene - Java Issue Type: Task Reporter: Robert Muir Assignee: Robert Muir Currently, we fail on all javadocs warnings. However, you get a warning if it cannot download the package-list from sun.com So I think we should allow you optionally set a sysprop using linkoffline. Then we would get much less hudson fake failures I feel like Mike opened an issue for this already but I cannot find it. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8983 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8983/ All tests passed Build Log (for compile errors): [...truncated 13538 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3229) Overlapped SpanNearQuery
Overlapped SpanNearQuery --- Key: LUCENE-3229 URL: https://issues.apache.org/jira/browse/LUCENE-3229 Project: Lucene - Java Issue Type: Bug Components: core/search Affects Versions: 3.1 Environment: Windows XP, Java 1.6 Reporter: ludovic Boutros Priority: Minor While using Span queries I think I've found a little bug. With a document like this (from the TestNearSpansOrdered unit test): w1 w2 w3 w4 w5 If I try to search for this span query: spanNear([spanNear([field:w3, field:w5], 1, true), field:w4], 0, true) the above document is returned, and I think it should not be, because 'w4' is not after 'w5'. The 2 spans are not ordered, because there is an overlap. I will add a test patch in the TestNearSpansOrdered unit test. I will add a patch to solve this issue too. Basically it modifies the two docSpansOrdered functions to make sure that the spans do not overlap. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
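For clarity, the reported query written out against the span query API, reconstructed from its toString() above; this is not code from the attached patch, and the field/term names come from the TestNearSpansOrdered data.
{code}
import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

public class OverlapQueryExample {
  public static SpanNearQuery build() {
    // inner ordered span: w3 ... w5 with slop 1
    SpanNearQuery inner = new SpanNearQuery(new SpanQuery[] {
        new SpanTermQuery(new Term("field", "w3")),
        new SpanTermQuery(new Term("field", "w5")) }, 1, true);
    // outer ordered span with slop 0: the reporter argues this should NOT match
    // "w1 w2 w3 w4 w5", because w4 overlaps the inner span (w3..w5) instead of following it
    return new SpanNearQuery(new SpanQuery[] {
        inner,
        new SpanTermQuery(new Term("field", "w4")) }, 0, true);
  }
}
{code}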
[jira] [Updated] (LUCENE-3229) Overlapped SpanNearQuery
[ https://issues.apache.org/jira/browse/LUCENE-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ludovic Boutros updated LUCENE-3229: Attachment: SpanOverlapTestUnit.diff Added the unit test. Overlapped SpanNearQuery --- Key: LUCENE-3229 URL: https://issues.apache.org/jira/browse/LUCENE-3229 Project: Lucene - Java Issue Type: Bug Components: core/search Affects Versions: 3.1 Environment: Windows XP, Java 1.6 Reporter: ludovic Boutros Priority: Minor Attachments: SpanOverlapTestUnit.diff While using Span queries I think I've found a little bug. With a document like this (from the TestNearSpansOrdered unit test): w1 w2 w3 w4 w5 If I try to search for this span query: spanNear([spanNear([field:w3, field:w5], 1, true), field:w4], 0, true) the above document is returned, and I think it should not be, because 'w4' is not after 'w5'. The 2 spans are not ordered, because there is an overlap. I will add a test patch in the TestNearSpansOrdered unit test. I will add a patch to solve this issue too. Basically it modifies the two docSpansOrdered functions to make sure that the spans do not overlap. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3229) Overlaped SpanNearQuery
[ https://issues.apache.org/jira/browse/LUCENE-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ludovic Boutros updated LUCENE-3229: Attachment: SpanOverlap.diff add a Patch. Overlaped SpanNearQuery --- Key: LUCENE-3229 URL: https://issues.apache.org/jira/browse/LUCENE-3229 Project: Lucene - Java Issue Type: Bug Components: core/search Affects Versions: 3.1 Environment: Windows XP, Java 1.6 Reporter: ludovic Boutros Priority: Minor Attachments: SpanOverlap.diff, SpanOverlapTestUnit.diff While using Span queries I think I've found a little bug. With a document like this (from the TestNearSpansOrdered unit test) : w1 w2 w3 w4 w5 If I try to search for this span query : spanNear([spanNear([field:w3, field:w5], 1, true), field:w4], 0, true) the above document is returned and I think it should not because 'w4' is not after 'w5'. The 2 spans are not ordered, because there is an overlap. I will add a test patch in the TestNearSpansOrdered unit test. I will add a patch to solve this issue too. Basicaly it modifies the two docSpansOrdered functions to make sure that the spans does not overlap. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
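For readers following the patch discussion above, a minimal sketch of the kind of ordering-plus-overlap check being described; the method name and signature here are illustrative assumptions, not the actual contents of SpanOverlap.diff:
{code}
// Illustrative sketch only: a span pair counts as "ordered" only if span 1 both
// starts before span 2 and ends at or before span 2's start (no overlap allowed).
// The real patch modifies the two docSpansOrdered(...) helpers; names here are assumptions.
static boolean spansOrderedNonOverlapping(int start1, int end1, int start2, int end2) {
  return start1 < start2 && end1 <= start2;
}
{code}
With such a check, the example above is rejected: the inner span covering w3..w5 ends after w4 begins, so the two spans overlap and the document no longer matches.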
[jira] [Commented] (SOLR-1298) FunctionQuery results as pseudo-fields
[ https://issues.apache.org/jira/browse/SOLR-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053212#comment-13053212 ] Koji Sekiguchi commented on SOLR-1298: -- Hi, I'm using the solr example data on trunk. If I post q=ipod&fl=score,price, Solr returns score and price as expected. But if I post q=ipod&fl=score,log(price), Solr returns score, the value of log(price) and all the rest of the fields. FunctionQuery results as pseudo-fields -- Key: SOLR-1298 URL: https://issues.apache.org/jira/browse/SOLR-1298 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Assignee: Yonik Seeley Priority: Minor Fix For: 4.0 Attachments: SOLR-1298-FieldValues.patch, SOLR-1298.patch It would be helpful if the results of FunctionQueries could be added as fields to a document. A couple of options here: 1. Run the FunctionQuery as part of the relevance score and add that piece to the document 2. Run the function (not really a query) during Document/Field retrieval -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8984 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8984/ 1 tests failed. REGRESSION: org.apache.lucene.index.TestIndexWriterWithThreads.testIOExceptionDuringAbortWithThreadsOnlyOnce Error Message: MockDirectoryWrapper: cannot close: there are still open files: {} Stack Trace: java.lang.RuntimeException: MockDirectoryWrapper: cannot close: there are still open files: {} at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:473) at org.apache.lucene.index.TestIndexWriterWithThreads._testMultipleThreadsFailure(TestIndexWriterWithThreads.java:279) at org.apache.lucene.index.TestIndexWriterWithThreads.testIOExceptionDuringAbortWithThreadsOnlyOnce(TestIndexWriterWithThreads.java:366) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1425) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1343) Build Log (for compile errors): [...truncated 3264 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-1979) Create LanguageIdentifierUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-1979: -- Attachment: SOLR-1979.patch New version. Example of accepted params:
{code:xml}
<processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
  <defaults>
    <str name="langid">true</str>
    <str name="langid.fl">title,subject,text,keywords</str>
    <str name="langid.langField">language_s</str>
    <str name="langid.langsField">languages</str>
    <str name="langid.overwrite">false</str>
    <float name="langid.threshold">0.5</float>
    <str name="langid.whitelist">no,en,es,dk</str>
    <str name="langid.map">true</str>
    <str name="langid.map.fl">title,text</str>
    <bool name="langid.map.overwrite">false</bool>
    <bool name="langid.map.keepOrig">false</bool>
    <bool name="langid.map.individual">false</bool>
    <str name="langid.map.individual.fl"></str>
    <str name="langid.fallbackFields">meta_content_language,lang</str>
    <str name="langid.fallback">en</str>
  </defaults>
</processor>
{code}
The only mandatory parameter is langid.fl. To enable field name mapping, set langid.map=true. It will then map field names for all fields in langid.fl. If the set of fields to map is different from langid.fl, supply langid.map.fl. Those fields will then be renamed with a language suffix equal to the language detected from the langid.fl fields. If you require detecting languages separately for each field, supply langid.map.individual=true. The supplied fields will then be renamed according to the detected language on an individual basis. If the set of fields to detect individually is different from the already supplied langid.fl or langid.map.fl, supply langid.map.individual.fl. The fields listed in langid.map.individual.fl will then be detected individually, while the rest of the mapping fields will be mapped according to the global document language. Create LanguageIdentifierUpdateProcessor Key: SOLR-1979 URL: https://issues.apache.org/jira/browse/SOLR-1979 Project: Solr Issue Type: New Feature Components: update Reporter: Jan Høydahl Assignee: Jan Høydahl Priority: Minor Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch We need the ability to detect the language of some random text in order to act upon it, such as indexing the content into language-aware fields. Another use case is to be able to filter/facet on language on random unstructured content. To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The processor is configurable like this:
{code:xml}
<processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
  <str name="inputFields">name,subject</str>
  <str name="outputField">language_s</str>
  <str name="idField">id</str>
  <str name="fallback">en</str>
</processor>
{code}
It will then read the text from the inputFields name and subject, perform language identification and output the ISO code for the detected language in the outputField. If no language was detected, the fallback language is used. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
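To make the mapping behaviour described above concrete, a small before/after illustration; the field values and the detected language are made up for the example, and the suffix convention follows the description above (mapped fields get the detected ISO code appended):
{noformat}
Input document (langid.fl=title,subject,text,keywords; langid.map.fl=title,text):
  title = "Hvor mange land finnes det?"
  text  = "Dette er bare et eksempel ..."

After the processor, with document language detected as "no":
  language_s = no            (langid.langField)
  title_no   = "Hvor mange land finnes det?"
  text_no    = "Dette er bare et eksempel ..."
{noformat}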
[jira] [Updated] (SOLR-1979) Create LanguageIdentifierUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-1979: -- Description: Language identification from document fields, and mapping of field names to language-specific fields based on detected language. Wrap the Tika LanguageIdentifier in an UpdateProcessor. was: We need the ability to detect language of some random text in order to act upon it, such as indexing the content into language aware fields. Another usecase is to be able to filter/facet on language on random unstructured content. To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The processor is configurable like this: {code:xml} processor class=org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory str name=inputFieldsname,subject/str str name=outputFieldlanguage_s/str str name=idFieldid/str str name=fallbacken/str /processor {code} It will then read the text from inputFields name and subject, perform language identification and output the ISO code for the detected language in the outputField. If no language was detected, fallback language is used. Create LanguageIdentifierUpdateProcessor Key: SOLR-1979 URL: https://issues.apache.org/jira/browse/SOLR-1979 Project: Solr Issue Type: New Feature Components: update Reporter: Jan Høydahl Assignee: Jan Høydahl Priority: Minor Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch Language identification from document fields, and mapping of field names to language-specific fields based on detected language. Wrap the Tika LanguageIdentifier in an UpdateProcessor. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1979) Create LanguageIdentifierUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053227#comment-13053227 ] Jan Høydahl commented on SOLR-1979: --- One question regarding the JUnit test: I now use {code} assertU(commit()); {code} How can I add update request params to this commit? To select another update chain from different tests, I'd like to add update params on the fly, e.g.: {code} assertU(commit(), update.chain=langid2); {code} Create LanguageIdentifierUpdateProcessor Key: SOLR-1979 URL: https://issues.apache.org/jira/browse/SOLR-1979 Project: Solr Issue Type: New Feature Components: update Reporter: Jan Høydahl Assignee: Jan Høydahl Priority: Minor Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch Language identification from document fields, and mapping of field names to language-specific fields based on detected language. Wrap the Tika LanguageIdentifier in an UpdateProcessor. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3226) rename SegmentInfos.FORMAT_3_1 and improve description in CheckIndex
[ https://issues.apache.org/jira/browse/LUCENE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053237#comment-13053237 ] Shai Erera commented on LUCENE-3226: how about printing the oldest and newest segment version? rename SegmentInfos.FORMAT_3_1 and improve description in CheckIndex Key: LUCENE-3226 URL: https://issues.apache.org/jira/browse/LUCENE-3226 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.1, 3.2 Reporter: Hoss Man Fix For: 3.3, 4.0 Attachments: LUCENE-3226.patch A 3.2 user recently asked if something was wrong because CheckIndex was reporting his (newly built) index version as... {noformat} Segments file=segments_or numSegments=1 version=FORMAT_3_1 [Lucene 3.1] {noformat} It seems like there are two very confusing pieces of information here... 1) the variable name SegmentInfos.FORMAT_3_1 seems like a poor choice. All other FORMAT_* constants in SegmentInfos are descriptive of the actual change made, and not specific to the version when they were introduced. 2) whatever the name of the FORMAT_* variable, CheckIndex is labeling it "Lucene 3.1", which is misleading since that format is always used in 3.2 (and probably 3.3, etc...). I suggest: a) rename FORMAT_3_1 to something like FORMAT_SEGMENT_RECORDS_VERSION b) change CheckIndex so that the label for the newest format always ends with "and later" (ie: "Lucene 3.1 and later") so when we release versions w/o a format change we don't have to remember to manually list them in CheckIndex. When we *do* make format changes and update CheckIndex, "and later" can be replaced with "to X.Y" and the new format can be added. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-Solr-tests-only-trunk - Build # 8984 - Still Failing
I just committed a fix for this simon On Wed, Jun 22, 2011 at 2:51 PM, Apache Jenkins Server jenk...@builds.apache.org wrote: Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8984/ 1 tests failed. REGRESSION: org.apache.lucene.index.TestIndexWriterWithThreads.testIOExceptionDuringAbortWithThreadsOnlyOnce Error Message: MockDirectoryWrapper: cannot close: there are still open files: {} Stack Trace: java.lang.RuntimeException: MockDirectoryWrapper: cannot close: there are still open files: {} at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:473) at org.apache.lucene.index.TestIndexWriterWithThreads._testMultipleThreadsFailure(TestIndexWriterWithThreads.java:279) at org.apache.lucene.index.TestIndexWriterWithThreads.testIOExceptionDuringAbortWithThreadsOnlyOnce(TestIndexWriterWithThreads.java:366) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1425) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1343) Build Log (for compile errors): [...truncated 3264 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3226) rename SegmentInfos.FORMAT_3_1 and improve description in CheckIndex
[ https://issues.apache.org/jira/browse/LUCENE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053241#comment-13053241 ] Robert Muir commented on LUCENE-3226: - This would be good (as we can compute it from the segments file), but we just have to think about how to display the case where this is null: we know it's <= 3.0 in this case... but we don't know any more than that? Still, we should do it, especially in 4.x when most indexes being checkIndexed will have this filled out (except 3.0 indexes). rename SegmentInfos.FORMAT_3_1 and improve description in CheckIndex Key: LUCENE-3226 URL: https://issues.apache.org/jira/browse/LUCENE-3226 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.1, 3.2 Reporter: Hoss Man Fix For: 3.3, 4.0 Attachments: LUCENE-3226.patch A 3.2 user recently asked if something was wrong because CheckIndex was reporting his (newly built) index version as... {noformat} Segments file=segments_or numSegments=1 version=FORMAT_3_1 [Lucene 3.1] {noformat} It seems like there are two very confusing pieces of information here... 1) the variable name SegmentInfos.FORMAT_3_1 seems like a poor choice. All other FORMAT_* constants in SegmentInfos are descriptive of the actual change made, and not specific to the version when they were introduced. 2) whatever the name of the FORMAT_* variable, CheckIndex is labeling it "Lucene 3.1", which is misleading since that format is always used in 3.2 (and probably 3.3, etc...). I suggest: a) rename FORMAT_3_1 to something like FORMAT_SEGMENT_RECORDS_VERSION b) change CheckIndex so that the label for the newest format always ends with "and later" (ie: "Lucene 3.1 and later") so when we release versions w/o a format change we don't have to remember to manually list them in CheckIndex. When we *do* make format changes and update CheckIndex, "and later" can be replaced with "to X.Y" and the new format can be added. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-3218. - Resolution: Fixed backported to 3.x - thanks guys Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format, which is basically unnecessary for some of the files. We can at any time write at least one file directly into the CFS, which can save a reasonable amount of IO. For instance, stored fields could be written directly during indexing, and during a Codec Flush one of the written files can be appended directly. This optimization is a nice side effect for Lucene indexing itself, but more importantly, for DocValues and LUCENE-3216 we could transparently pack per-field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3080) cutover highlighter to BytesRef
[ https://issues.apache.org/jira/browse/LUCENE-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053245#comment-13053245 ] Mike Sokolov commented on LUCENE-3080: -- There could be a good reason though for using byte-offsets in highlighting. I have in mind an optimization that would pull in text from an external file or other source, enabling highlighting without stored fields. For best performance the snippet should be pulled from the external source using random access to storage, but this requires byte offsets. I think this might be a big win for large field values. This could only be done if the highlighter doesn't need to perform any text manipulation itself, so it's not really appropriate for Highlighter, as Robert said, but in the case of FVH it might be possible to implement. I'm looking at this, but wondering before I get too deep in if anyone can comment on the feasibility of using byte offsets - I'm unclear on what they get used for other than highlighting: would it cause problems to have a CharFilter that returns corrected offsets such that char positions in the analyzed text are translated into byte positions in the source text? cutover highlighter to BytesRef --- Key: LUCENE-3080 URL: https://issues.apache.org/jira/browse/LUCENE-3080 Project: Lucene - Java Issue Type: Improvement Components: modules/highlighter Reporter: Michael McCandless Highlighter still uses char[] terms (consumes tokens from the analyzer as char[] not as BytesRef), which is causing problems for merging SOLR-2497 to trunk. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
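A rough sketch of the random-access retrieval Mike describes above, assuming byte offsets are available and the original text lives in an external UTF-8 file; the file path and method name are illustrative only, not part of any patch:
{code}
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Pull a snippet straight from the external source using byte offsets,
// without requiring the field to be stored in the index.
static String readSnippet(String path, long startByte, int lengthBytes) throws IOException {
  try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
    byte[] buf = new byte[lengthBytes];
    raf.seek(startByte);   // random access: no need to scan from the start of the file
    raf.readFully(buf);
    return new String(buf, StandardCharsets.UTF_8);
  }
}
{code}
This only works if the indexed offsets are byte offsets into the source, which is exactly the question raised above about whether a CharFilter could be made to report them.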
[jira] [Commented] (LUCENE-3226) rename SegmentInfos.FORMAT_3_1 and improve description in CheckIndex
[ https://issues.apache.org/jira/browse/LUCENE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053248#comment-13053248 ] Shai Erera commented on LUCENE-3226: We can print pre-3.1. But, if somebody opened a 3.0 / 2.x index w/ 3.1+ and all segments were 'touched' by the 3.1+ code, then their version would be 3.0 or 2.x (i.e., not null). So it could be that someone opens two indexes, and CheckIndex reports oldVersion=pre-3.1 for one and oldVersion=2.x for the other. I think it's acceptable though. rename SegmentInfos.FORMAT_3_1 and improve description in CheckIndex Key: LUCENE-3226 URL: https://issues.apache.org/jira/browse/LUCENE-3226 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.1, 3.2 Reporter: Hoss Man Fix For: 3.3, 4.0 Attachments: LUCENE-3226.patch A 3.2 user recently asked if something was wrong because CheckIndex was reporting his (newly built) index version as... {noformat} Segments file=segments_or numSegments=1 version=FORMAT_3_1 [Lucene 3.1] {noformat} It seems like there are two very confusing pieces of information here... 1) the variable name of SegmentInfos.FORMAT_3_1 seems like poor choice. All other FORMAT_* constants in SegmentInfos are descriptive of the actual change made, and not specific to the version when they were introduced. 2) whatever the name of the FORMAT_* variable, CheckIndex is labeling it Lucene 3.1, which is missleading since that format is alwasy used in 3.2 (and probably 3.3, etc...). I suggest: a) rename FORMAT_3_1 to something like FORMAT_SEGMENT_RECORDS_VERSION b) change CheckIndex so that the label for the newest format always ends with and later (ie: Lucene 3.1 and later) so when we release versions w/o a format change we don't have to remember to manual list them in CheckIndex. when we *do* make format changes and update CheckIndex and later can be replaced with to X.Y and the new format can be added -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3080) cutover highlighter to BytesRef
[ https://issues.apache.org/jira/browse/LUCENE-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053254#comment-13053254 ] Robert Muir commented on LUCENE-3080: - Mike, its an interesting idea, as I think the offsets are intended to be opaque to the app (so you should be able to use byte offsets if you want). There are some problems though, especially tokenfilters that muck with offsets: NGramTokenFilter, WordDelimiterFilter, ... In general there are assumptions here that offsets are utf16. cutover highlighter to BytesRef --- Key: LUCENE-3080 URL: https://issues.apache.org/jira/browse/LUCENE-3080 Project: Lucene - Java Issue Type: Improvement Components: modules/highlighter Reporter: Michael McCandless Highlighter still uses char[] terms (consumes tokens from the analyzer as char[] not as BytesRef), which is causing problems for merging SOLR-2497 to trunk. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3080) cutover highlighter to BytesRef
[ https://issues.apache.org/jira/browse/LUCENE-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053275#comment-13053275 ] Mike Sokolov commented on LUCENE-3080: -- It might be a bit more complicated? Looks like WordDelimiterFilter, in generatePart and concatenate, eg, performs computation with the offsets. So it would either need to know the units of the offsets it was passed, or be given more than just a correctOffset() method: rather it seems to require something like addCharsToOffset (offset, charOffsetIncr), where charOffsetIncr is a number of chars, but offset is in some unspecified unit. cutover highlighter to BytesRef --- Key: LUCENE-3080 URL: https://issues.apache.org/jira/browse/LUCENE-3080 Project: Lucene - Java Issue Type: Improvement Components: modules/highlighter Reporter: Michael McCandless Highlighter still uses char[] terms (consumes tokens from the analyzer as char[] not as BytesRef), which is causing problems for merging SOLR-2497 to trunk. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3216) Store DocValues per segment instead of per field
[ https://issues.apache.org/jira/browse/LUCENE-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-3216: Attachment: LUCENE-3216_floats.patch here is a first patch that converts the floats impl to buffer values in ram during indexing but writes values directly during merge. all tests pass I plan to commit this soon too. Rather go small iterations here instead of a large patch. Store DocValues per segment instead of per field Key: LUCENE-3216 URL: https://issues.apache.org/jira/browse/LUCENE-3216 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3216_floats.patch currently we are storing docvalues per field which results in at least one file per field that uses docvalues (or at most two per field per segment depending on the impl.). Yet, we should try to by default pack docvalues into a single file if possible. To enable this we need to hold all docvalues in memory during indexing and write them to disk once we flush a segment. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3230) Make FSDirectory.fsync() public and static
Make FSDirectory.fsync() public and static -- Key: LUCENE-3230 URL: https://issues.apache.org/jira/browse/LUCENE-3230 Project: Lucene - Java Issue Type: New Feature Components: core/store Reporter: Shai Erera Assignee: Shai Erera Priority: Minor Fix For: 3.3, 4.0 I find FSDirectory.fsync() (today a protected instance method) very useful as a utility to sync() files. I'd like to create an FSDirectory.sync() utility which contains the exact same impl as FSDir.fsync(), and have the latter call it. We can make it part of IOUtils too, as it's a completely standalone utility. I would get rid of FSDir.fsync() if it weren't protected (as if encouraging people to override it). I doubt anyone really overrides it (our core Directories don't). Also, while reviewing the code, I noticed that if an IOE occurs, the code sleeps for 5 msec. If an InterruptedException occurs then, it immediately throws ThreadIE, completely ignoring the fact that it slept due to the IOE. Shouldn't we at least pass IOE.getMessage() on to ThreadIE? The patch is trivial, so I'd like to get some feedback before I post it. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
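For context, the behavior being discussed is essentially a retry loop around FileDescriptor.sync(); a simplified sketch of what a standalone IOUtils-style helper could look like (written from memory of the general pattern, not the actual FSDirectory code or the proposed patch):
{code}
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;

// Simplified sketch of a standalone fsync utility: retry a few times on IOException,
// syncing the file descriptor so the OS flushes the file to stable storage.
static void fsync(File file) throws IOException {
  IOException lastIOE = null;
  for (int retry = 0; retry < 5; retry++) {
    try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
      raf.getFD().sync();
      return;
    } catch (FileNotFoundException fnfe) {
      throw fnfe;                 // nothing to sync
    } catch (IOException ioe) {
      lastIOE = ioe;
      try {
        Thread.sleep(5);          // the 5 msec pause mentioned above
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        // keep the original IOException's message, as suggested in the issue
        throw new RuntimeException("interrupted while syncing: " + ioe.getMessage(), ie);
      }
    }
  }
  throw lastIOE;
}
{code}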
[jira] [Commented] (LUCENE-3080) cutover highlighter to BytesRef
[ https://issues.apache.org/jira/browse/LUCENE-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053281#comment-13053281 ] Robert Muir commented on LUCENE-3080: - yes: in general I think it would be problematic, especially since most tests use only all-ascii data. Another problem on this issue is that if you want to use bytes, but with the Tokenizer-analysis-chain, it only takes Reader, so you cannot assume anything about the original bytes or encoding (e.g. that its UTF-8 for example). cutover highlighter to BytesRef --- Key: LUCENE-3080 URL: https://issues.apache.org/jira/browse/LUCENE-3080 Project: Lucene - Java Issue Type: Improvement Components: modules/highlighter Reporter: Michael McCandless Highlighter still uses char[] terms (consumes tokens from the analyzer as char[] not as BytesRef), which is causing problems for merging SOLR-2497 to trunk. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
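As a starting point for the kind of prototype suggested above, a toy byte-level splitter that works purely on bytes (splitting on the space byte) without going through Reader at all; this is deliberately independent of Lucene's attribute API and is only meant to show the shape of an all-ascii, byte-based chain:
{code}
import java.util.ArrayList;
import java.util.List;

// Toy byte-level "tokenizer": splits an ASCII byte[] on the space byte (0x20) and
// records each token as a (start, end) byte range into the original array,
// so the ranges double as byte offsets a highlighter could use.
static List<int[]> splitOnSpaceByte(byte[] text) {
  List<int[]> ranges = new ArrayList<int[]>();
  int start = 0;
  for (int i = 0; i <= text.length; i++) {
    if (i == text.length || text[i] == 0x20) {
      if (i > start) {
        ranges.add(new int[] { start, i });
      }
      start = i + 1;
    }
  }
  return ranges;
}
{code}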
[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #156: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/156/ No tests ran. Build Log (for compile errors): [...truncated 7007 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3229) Overlaped SpanNearQuery
[ https://issues.apache.org/jira/browse/LUCENE-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053286#comment-13053286 ] ludovic Boutros commented on LUCENE-3229: - testSpanNearUnOrdered unit test does not work anymore. The unordered SpanNear class uses the ordering function of the ordered SpanNear class. Perhaps, it should use its own ordering function witch allows the span overlaps. I will check. Overlaped SpanNearQuery --- Key: LUCENE-3229 URL: https://issues.apache.org/jira/browse/LUCENE-3229 Project: Lucene - Java Issue Type: Bug Components: core/search Affects Versions: 3.1 Environment: Windows XP, Java 1.6 Reporter: ludovic Boutros Priority: Minor Attachments: SpanOverlap.diff, SpanOverlapTestUnit.diff While using Span queries I think I've found a little bug. With a document like this (from the TestNearSpansOrdered unit test) : w1 w2 w3 w4 w5 If I try to search for this span query : spanNear([spanNear([field:w3, field:w5], 1, true), field:w4], 0, true) the above document is returned and I think it should not because 'w4' is not after 'w5'. The 2 spans are not ordered, because there is an overlap. I will add a test patch in the TestNearSpansOrdered unit test. I will add a patch to solve this issue too. Basicaly it modifies the two docSpansOrdered functions to make sure that the spans does not overlap. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-2614) stats with pivot
stats with pivot Key: SOLR-2614 URL: https://issues.apache.org/jira/browse/SOLR-2614 Project: Solr Issue Type: Improvement Components: Schema and Analysis, SearchComponents - other Affects Versions: 4.0 Reporter: pengyao Fix For: 4.0 Is it possible to get stats (like the Stats Component: min, max, sum, count, missing, sumOfSquares, mean and stddev) from numeric fields inside hierarchical facets (with more than one level, like Pivot)? I would like to query: ...?q=*:*&version=2.2&start=0&rows=0&stats=true&stats.field=numeric_field1&stats.field=numeric_field2&stats.pivot=field_x,field_y,field_z and get min, max, sum, count, etc. from numeric_field1 and numeric_field2 for all combinations of field_x, field_y and field_z (hierarchical values). Using stats.facet I get just one field at one level, and using facet.pivot I get just counts, but no stats. Looping in the client application to do all combinations of facet values would be too slow because there are a lot of combinations. Thanks a lot! -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2614) stats with pivot
[ https://issues.apache.org/jira/browse/SOLR-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pengyao updated SOLR-2614: -- Component/s: (was: Schema and Analysis) Priority: Critical (was: Major) Description: Is it possible to get stats (like Stats Component: min ,max, sum, count, missing, sumOfSquares, mean and stddev) from numeric fields inside hierarchical facets (with more than one level, like Pivot)? I would like to query: ...?q=*:*version=2.2start=0rows=0stats=truestats.field=numeric_field1stats.field=numeric_field2stats.pivot=field_x,field_y,field_z and get min, max, sum, count, etc. from numeric_field1 and numeric_field2 from all combinations of field_x, field_y and field_z (hierarchical values). Using stats.facet I get just one field at one level and using facet.pivot I get just counts, but no stats. Looping in client application to do all combinations of facets values will be to slow because there is a lot of combinations. Thanks a lot! this is very import,because only counts value,it's no use for sometimes. please add stats with pivot in solr 4.0 thanks a lot was: Is it possible to get stats (like Stats Component: min ,max, sum, count, missing, sumOfSquares, mean and stddev) from numeric fields inside hierarchical facets (with more than one level, like Pivot)? I would like to query: ...?q=*:*version=2.2start=0rows=0stats=truestats.field=numeric_field1stats.field=numeric_field2stats.pivot=field_x,field_y,field_z and get min, max, sum, count, etc. from numeric_field1 and numeric_field2 from all combinations of field_x, field_y and field_z (hierarchical values). Using stats.facet I get just one field at one level and using facet.pivot I get just counts, but no stats. Looping in client application to do all combinations of facets values will be to slow because there is a lot of combinations. Thanks a lot! stats with pivot Key: SOLR-2614 URL: https://issues.apache.org/jira/browse/SOLR-2614 Project: Solr Issue Type: Improvement Components: SearchComponents - other Affects Versions: 4.0 Reporter: pengyao Priority: Critical Fix For: 4.0 Is it possible to get stats (like Stats Component: min ,max, sum, count, missing, sumOfSquares, mean and stddev) from numeric fields inside hierarchical facets (with more than one level, like Pivot)? I would like to query: ...?q=*:*version=2.2start=0rows=0stats=truestats.field=numeric_field1stats.field=numeric_field2stats.pivot=field_x,field_y,field_z and get min, max, sum, count, etc. from numeric_field1 and numeric_field2 from all combinations of field_x, field_y and field_z (hierarchical values). Using stats.facet I get just one field at one level and using facet.pivot I get just counts, but no stats. Looping in client application to do all combinations of facets values will be to slow because there is a lot of combinations. Thanks a lot! this is very import,because only counts value,it's no use for sometimes. please add stats with pivot in solr 4.0 thanks a lot -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3080) cutover highlighter to BytesRef
[ https://issues.apache.org/jira/browse/LUCENE-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053300#comment-13053300 ] Mike Sokolov commented on LUCENE-3080: -- Yeah I knew that at some point, but stuffed it away as something to think about later :) There really is no way to pass byte streams into the analysis chain. Maybe providing a character encoding to the filter could enable it to compute the needed byte offsets. cutover highlighter to BytesRef --- Key: LUCENE-3080 URL: https://issues.apache.org/jira/browse/LUCENE-3080 Project: Lucene - Java Issue Type: Improvement Components: modules/highlighter Reporter: Michael McCandless Highlighter still uses char[] terms (consumes tokens from the analyzer as char[] not as BytesRef), which is causing problems for merging SOLR-2497 to trunk. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch EasySimilarity added. Lots of questions and nocommit in the code. Implement various ranking models as Similarities Key: LUCENE-3220 URL: https://issues.apache.org/jira/browse/LUCENE-3220 Project: Lucene - Java Issue Type: Sub-task Components: core/search Affects Versions: flexscoring branch Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey Labels: gsoc Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch Original Estimate: 336h Remaining Estimate: 336h With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. TODO: * {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current mock implementation might be OK * _LM_ * _DFR_ Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3229) Overlaped SpanNearQuery
[ https://issues.apache.org/jira/browse/LUCENE-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ludovic Boutros updated LUCENE-3229: Attachment: SpanOverlap2.diff add a patch for the SpanNearUnOrdered class. Everything should be ok now. Overlaped SpanNearQuery --- Key: LUCENE-3229 URL: https://issues.apache.org/jira/browse/LUCENE-3229 Project: Lucene - Java Issue Type: Bug Components: core/search Affects Versions: 3.1 Environment: Windows XP, Java 1.6 Reporter: ludovic Boutros Priority: Minor Attachments: SpanOverlap.diff, SpanOverlap2.diff, SpanOverlapTestUnit.diff While using Span queries I think I've found a little bug. With a document like this (from the TestNearSpansOrdered unit test) : w1 w2 w3 w4 w5 If I try to search for this span query : spanNear([spanNear([field:w3, field:w5], 1, true), field:w4], 0, true) the above document is returned and I think it should not because 'w4' is not after 'w5'. The 2 spans are not ordered, because there is an overlap. I will add a test patch in the TestNearSpansOrdered unit test. I will add a patch to solve this issue too. Basicaly it modifies the two docSpansOrdered functions to make sure that the spans does not overlap. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Issue Comment Edited] (LUCENE-3229) Overlaped SpanNearQuery
[ https://issues.apache.org/jira/browse/LUCENE-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053286#comment-13053286 ] ludovic Boutros edited comment on LUCENE-3229 at 6/22/11 3:32 PM: -- testSpanNearUnOrdered unit test does not work anymore. The unordered SpanNear class uses the ordering function of the ordered SpanNear class. Perhaps, it should use its own ordering function which allows the span overlaps. I will check. was (Author: lboutros): testSpanNearUnOrdered unit test does not work anymore. The unordered SpanNear class uses the ordering function of the ordered SpanNear class. Perhaps, it should use its own ordering function witch allows the span overlaps. I will check. Overlaped SpanNearQuery --- Key: LUCENE-3229 URL: https://issues.apache.org/jira/browse/LUCENE-3229 Project: Lucene - Java Issue Type: Bug Components: core/search Affects Versions: 3.1 Environment: Windows XP, Java 1.6 Reporter: ludovic Boutros Priority: Minor Attachments: SpanOverlap.diff, SpanOverlap2.diff, SpanOverlapTestUnit.diff While using Span queries I think I've found a little bug. With a document like this (from the TestNearSpansOrdered unit test) : w1 w2 w3 w4 w5 If I try to search for this span query : spanNear([spanNear([field:w3, field:w5], 1, true), field:w4], 0, true) the above document is returned and I think it should not because 'w4' is not after 'w5'. The 2 spans are not ordered, because there is an overlap. I will add a test patch in the TestNearSpansOrdered unit test. I will add a patch to solve this issue too. Basicaly it modifies the two docSpansOrdered functions to make sure that the spans does not overlap. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2614) stats with pivot
[ https://issues.apache.org/jira/browse/SOLR-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053310#comment-13053310 ] Ryan McKinley commented on SOLR-2614: - not currently. patches welcome! stats with pivot Key: SOLR-2614 URL: https://issues.apache.org/jira/browse/SOLR-2614 Project: Solr Issue Type: Improvement Components: SearchComponents - other Affects Versions: 4.0 Reporter: pengyao Priority: Critical Fix For: 4.0 Is it possible to get stats (like Stats Component: min ,max, sum, count, missing, sumOfSquares, mean and stddev) from numeric fields inside hierarchical facets (with more than one level, like Pivot)? I would like to query: ...?q=*:*version=2.2start=0rows=0stats=truestats.field=numeric_field1stats.field=numeric_field2stats.pivot=field_x,field_y,field_z and get min, max, sum, count, etc. from numeric_field1 and numeric_field2 from all combinations of field_x, field_y and field_z (hierarchical values). Using stats.facet I get just one field at one level and using facet.pivot I get just counts, but no stats. Looping in client application to do all combinations of facets values will be to slow because there is a lot of combinations. Thanks a lot! this is very import,because only counts value,it's no use for sometimes. please add stats with pivot in solr 4.0 thanks a lot -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2382) DIH Cache Improvements
[ https://issues.apache.org/jira/browse/SOLR-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053313#comment-13053313 ] James Dyer commented on SOLR-2382: -- Noble, I just updated to the latest and re-applied this patch and it worked for me. If you can give me specifics I'll try to dig more to see what might be going wrong. Also, in case you're not on the very latest, there were some very recent commits from about a week ago that broke the previous versions of this patch (r1135954 r1136789). This newest patch will only work on code from after those commits. DIH Cache Improvements -- Key: SOLR-2382 URL: https://issues.apache.org/jira/browse/SOLR-2382 Project: Solr Issue Type: New Feature Components: contrib - DataImportHandler Reporter: James Dyer Priority: Minor Attachments: SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch Functionality: 1. Provide a pluggable caching framework for DIH so that users can choose a cache implementation that best suits their data and application. 2. Provide a means to temporarily cache a child Entity's data without needing to create a special cached implementation of the Entity Processor (such as CachedSqlEntityProcessor). 3. Provide a means to write the final (root entity) DIH output to a cache rather than to Solr. Then provide a way for a subsequent DIH call to use the cache as an Entity input. Also provide the ability to do delta updates on such persistent caches. 4. Provide the ability to partition data across multiple caches that can then be fed back into DIH and indexed either to varying Solr Shards, or to the same Core in parallel. Use Cases: 1. We needed a flexible scalable way to temporarily cache child-entity data prior to joining to parent entities. - Using SqlEntityProcessor with Child Entities can cause an n+1 select problem. - CachedSqlEntityProcessor only supports an in-memory HashMap as a Caching mechanism and does not scale. - There is no way to cache non-SQL inputs (ex: flat files, xml, etc). 2. We needed the ability to gather data from long-running entities by a process that runs separate from our main indexing process. 3. We wanted the ability to do a delta import of only the entities that changed. - Lucene/Solr requires entire documents to be re-indexed, even if only a few fields changed. - Our data comes from 50+ complex sql queries and/or flat files. - We do not want to incur overhead re-gathering all of this data if only 1 entity's data changed. - Persistent DIH caches solve this problem. 4. We want the ability to index several documents in parallel (using 1.4.1, which did not have the threads parameter). 5. In the future, we may need to use Shards, creating a need to easily partition our source data into Shards. Implementation Details: 1. De-couple EntityProcessorBase from caching. - Created a new interface, DIHCache two implementations: - SortedMapBackedCache - An in-memory cache, used as default with CachedSqlEntityProcessor (now deprecated). - BerkleyBackedCache - A disk-backed cache, dependent on bdb-je, tested with je-4.1.6.jar - NOTE: the existing Lucene Contrib db project uses je-3.3.93.jar. I believe this may be incompatible due to Generic Usage. - NOTE: I did not modify the ant script to automatically get this jar, so to use or evaluate this patch, download bdb-je from http://www.oracle.com/technetwork/database/berkeleydb/downloads/index.html 2. 
Allow Entity Processors to take a cacheImpl parameter to cause the entity data to be cached (see EntityProcessorBase DIHCacheProperties). 3. Partially De-couple SolrWriter from DocBuilder - Created a new interface DIHWriter, two implementations: - SolrWriter (refactored) - DIHCacheWriter (allows DIH to write ultimately to a Cache). 4. Create a new Entity Processor, DIHCacheProcessor, which reads a persistent Cache as DIH Entity Input. 5. Support a partition parameter with both DIHCacheWriter and DIHCacheProcessor to allow for easy partitioning of source entity data. 6. Change the semantics of entity.destroy() - Previously, it was being called on each iteration of DocBuilder.buildDocument(). - Now it is does one-time cleanup tasks (like closing or deleting a disk-backed cache) once the entity processor is completed. - The only out-of-the-box entity processor that previously implemented destroy() was LineEntitiyProcessor, so this is not a very invasive change. General Notes: We are near completion in converting our search functionality from a legacy search engine to Solr. However, I found that DIH did not support caching
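To give a feel for the pluggable caching framework described above, a hypothetical sketch of what such a DIH cache contract might look like; the method names and signatures here are guesses for illustration and are not taken from the SOLR-2382 patch:
{code}
import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch of a pluggable DIH cache contract (names are illustrative only).
// Implementations could be an in-memory sorted map or a disk-backed store such as
// Berkeley DB JE, along the lines of the patch notes above.
interface DIHCacheSketch {
  void open(Map<String, Object> initProps);          // create or attach to the cache
  void add(Map<String, Object> row);                 // cache one entity row
  Iterator<Map<String, Object>> lookup(Object key);  // rows matching a join key
  void flush();                                      // persist pending writes, if any
  void close();                                      // per-run cleanup
  void destroy();                                    // one-time cleanup, e.g. delete a disk-backed cache
}
{code}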
[jira] [Commented] (LUCENE-3080) cutover highlighter to BytesRef
[ https://issues.apache.org/jira/browse/LUCENE-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053319#comment-13053319 ] Robert Muir commented on LUCENE-3080: - Well, personally i am hesitant to introduce any encodings or bytes into our current analysis chain, because its unnecessary complexity that will introduce bugs (at the moment, its the users responsibility to create the appropriate Reader etc). Furthermore, not all character sets can be 'corrected' with a linear conversion like this: for example some actually order the text in a different direction, and things like that... there are many quirks to non-unicode character sets. Maybe as a start, it would be useful to prototype some simple experiments with a binary analysis chain and hackup a highlighter to work with them? This way we would have an understanding of what the potential performance gain is. Here's some example code for a dead simple binary analysis chain that only uses bytes the whole way through, you could take these ideas and prototype one with just all ascii-terms and split on the space byte and such: http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/src/test/org/apache/lucene/index/TestBinaryTerms.java http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/src/test/org/apache/lucene/index/BinaryTokenStream.java cutover highlighter to BytesRef --- Key: LUCENE-3080 URL: https://issues.apache.org/jira/browse/LUCENE-3080 Project: Lucene - Java Issue Type: Improvement Components: modules/highlighter Reporter: Michael McCandless Highlighter still uses char[] terms (consumes tokens from the analyzer as char[] not as BytesRef), which is causing problems for merging SOLR-2497 to trunk. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-2615) Have LogUpdateProcessor log each command (add, delete, ...) at debug/FINE level
Have LogUpdateProcessor log each command (add, delete, ...) at debug/FINE level --- Key: SOLR-2615 URL: https://issues.apache.org/jira/browse/SOLR-2615 Project: Solr Issue Type: Improvement Components: update Reporter: David Smiley Priority: Minor Fix For: 3.3 It would be great if the LogUpdateProcessor logged each command (add, delete, ...) at debug (FINE) level. Presently it only logs a summary of 8 commands and it does so at the very end. The attached patch implements this. * I moved the LogUpdateProcessor ahead of RunUpdateProcessor so that the debug-level log happens before Solr does anything with it. It should not affect the ordering of the existing summary log, which happens at finish(). * I changed UpdateRequestProcessor's static log variable to be an instance variable that uses the current class name. I think this makes much more sense since I want to be able to alter logging levels for a specific processor without doing it for all of them. This change did require me to tweak the factory's detection of the log level which avoids creating the LogUpdateProcessor. * There was an NPE bug in AddUpdateCommand.getPrintableId() in the event there is no schema unique field. I fixed that. You may notice I use SLF4J's nifty log.debug("message blah {} blah", var) syntax, which is both performant and concise, as there's no point in guarding the debug message with an isDebugEnabled() since debug() will internally check this anyway and there is no string concatenation if debug isn't enabled. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
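The parameterized SLF4J style mentioned at the end is worth spelling out; a small, generic example (not code from the patch) showing why an isDebugEnabled() guard is unnecessary for simple messages:
{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class LoggingStyleExample {
  // instance (per-class) logger, in the spirit of the change described above
  private final Logger log = LoggerFactory.getLogger(getClass());

  void logAdd(String printableId) {
    // No string concatenation happens unless DEBUG is enabled; debug() checks the
    // level internally, so an explicit isDebugEnabled() guard adds nothing here.
    log.debug("add {}", printableId);

    // The guard is only useful when building the argument itself is expensive:
    if (log.isDebugEnabled()) {
      log.debug("add {}", expensiveToString(printableId));
    }
  }

  private String expensiveToString(String id) {
    return "[" + id + "]";   // stand-in for a costly computation
  }
}
{code}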
[jira] [Updated] (SOLR-2615) Have LogUpdateProcessor log each command (add, delete, ...) at debug/FINE level
[ https://issues.apache.org/jira/browse/SOLR-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated SOLR-2615: --- Attachment: SOLR-2615_LogUpdateProcessor_debug_logging.patch Have LogUpdateProcessor log each command (add, delete, ...) at debug/FINE level --- Key: SOLR-2615 URL: https://issues.apache.org/jira/browse/SOLR-2615 Project: Solr Issue Type: Improvement Components: update Reporter: David Smiley Priority: Minor Fix For: 3.3 Attachments: SOLR-2615_LogUpdateProcessor_debug_logging.patch It would be great if the LogUpdateProcessor logged each command (add, delete, ...) at debug (Fine) level. Presently it only logs a summary of 8 commands and it does so at the very end. The attached patch implements this. * I moved the LogUpdateProcessor ahead of RunUpdateProcessor so that the debug level log happens before Solr does anything with it. It should not affect the ordering of the existing summary log which happens at finish(). * I changed UpdateRequestProcessor's static log variable to be an instance variable that uses the current class name. I think this makes much more sense since I want to be able to alter logging levels for a specific processor without doing it for all of them. This change did require me to tweak the factory's detection of the log level which avoids creating the LogUpdateProcessor. * There was an NPE bug in AddUpdateCommand.getPrintableId() in the event there is no schema unique field. I fixed that. You may notice I use SLF4J's nifty log.debug(message blah {} blah, var) syntax, which is both performant and concise as there's no point in guarding the debug message with an isDebugEnabled() since debug() will internally check this any way and there is no string concatenation if debug isn't enabled. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2616) Include jdk14 logging configuration file
[ https://issues.apache.org/jira/browse/SOLR-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated SOLR-2616: --- Attachment: SOLR-2616_jdk14logging_setup.patch Include jdk14 logging configuration file Key: SOLR-2616 URL: https://issues.apache.org/jira/browse/SOLR-2616 Project: Solr Issue Type: Improvement Reporter: David Smiley Priority: Minor Fix For: 3.3 Attachments: SOLR-2616_jdk14logging_setup.patch The /example/ Jetty Solr configuration should include a basic logging configuration file. Looking at this wiki page: http://wiki.apache.org/solr/LoggingInDefaultJettySetup I am creating this patch. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053329#comment-13053329 ] Robert Muir commented on LUCENE-3220: - Just took a look, a few things that might help: * yes the maxdoc does not reflect deletions, but neither does things like totalTermFreq or docFreq either... so its best to not worry about deletions in the scoring and to be consistent and use the stats (e.g. maxDoc, not numDocs) that do not take deletions into account. * for the computeStats(TermContext... termContexts) its wierd to sum the DF across the different terms in the case? But i don't honestly have any suggestions here... maybe in this case we should make a EasyPhraseStats that computes the EasyStats for each term, so its not hiding anything or limiting anyone? and you could then do an instanceof check and have a different method like scorePhrase() that it forwards to in case its an EasyPhraseStats? In general i'm not sure how other ranking systems tend to handle this case, the phrase estimation for IDF in lucene's formula is done by summing the IDFs Implement various ranking models as Similarities Key: LUCENE-3220 URL: https://issues.apache.org/jira/browse/LUCENE-3220 Project: Lucene - Java Issue Type: Sub-task Components: core/search Affects Versions: flexscoring branch Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey Labels: gsoc Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch Original Estimate: 336h Remaining Estimate: 336h With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. TODO: * {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current mock implementation might be OK * _LM_ * _DFR_ Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
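For reference, the summing Robert mentions at the end, Lucene's classic way of estimating a phrase's IDF, is just the sum of the per-term IDF values; a small arithmetic sketch using the familiar DefaultSimilarity-style formula, shown for illustration and not taken from the flexscoring branch:
{code}
// Classic-style phrase IDF estimate: sum the per-term IDF values, where
// idf(term) = 1 + ln(numDocs / (docFreq + 1)), as in Lucene's DefaultSimilarity.
static float phraseIdf(long numDocs, long[] docFreqs) {
  float sum = 0f;
  for (long df : docFreqs) {
    sum += 1f + (float) Math.log((double) numDocs / (df + 1));
  }
  return sum;
}
{code}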
[jira] [Updated] (LUCENE-3079) Faceting module
[ https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Trcek updated LUCENE-3079: - Attachment: LUCENE-3079.patch Facetiing module Key: LUCENE-3079 URL: https://issues.apache.org/jira/browse/LUCENE-3079 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Attachments: LUCENE-3079.patch Faceting is a hugely important feature, available in Solr today but not [easily] usable by Lucene-only apps. We should fix this, by creating a shared faceting module. Ideally, we factor out Solr's faceting impl, and maybe poach/merge from other impls (eg Bobo browse). Hoss describes some important challenges we'll face in doing this (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here: {noformat} To look at faceting as a concrete example, there are big the reasons faceting works so well in Solr: Solr has total control over the index, knows exactly when the index has changed to rebuild caches, has a strict schema so it can make sense of field types and pick faceting algos accordingly, has multi-phase distributed search approach to get exact counts efficiently across multiple shards, etc... (and there are still a lot of additional enhancements and improvements that can be made to take even more advantage of knowledge solr has because it owns the index that we no one has had time to tackle) {noformat} This is a great list of the things we face in refactoring. It's also important because, if Solr needed to be so deeply intertwined with caching, schema, etc., other apps that want to facet will have the same needs and so we really have to address them in creating the shared module. I think we should get a basic faceting module started, but should not cut Solr over at first. We should iterate on the module, fold in improvements, etc., and then, once we can fully verify that cutting over doesn't hurt Solr (ie lose functionality or performance) we can later cutover. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3079) Faceting module
[ https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053361#comment-13053361 ] Stefan Trcek commented on LUCENE-3079: -- This patch was generated by git and tested to apply with patch -p0 -i LUCENE-3079.patch --dry-run Be patient if anything went wrong. Review starting points may be - FacetSearcherTest.testSimpleFacetWithIndexSearcher() or - FacetSearcher.facetCollectSearch() Functions.java may be dismissed in favor of Guava. If you are willing to keep it I'll strip it down to the required parts. -- The implementation relies on field cache only, no index scheme, no cached filters etc. It supports - single valued facets (Facet.java) - multi valued facets (Facet.MultiValued.java) - facet filters (see FacetSearcher.java) - evaluation of facet values that would dismiss due to other facet filters (Yonik says Solr calls this multi-select faceting). (realized by FacetSearcher.fillFacetsForGuiMode()) Let me explain the last point: For the user a facet query (color==green) AND (shape==circle OR shape==square) may look like Facet color [ ] (3) red [x] (5) green [ ] (7) blue Facet shape [x] (9) circle [ ] (4) line [x] (2) square The red/blue/line facet values will display even though the corresponding documents are not in the result set. Also there is support for filtered facet values with zero results, so users understand why they do not get results. Facetiing module Key: LUCENE-3079 URL: https://issues.apache.org/jira/browse/LUCENE-3079 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Attachments: LUCENE-3079.patch Faceting is a hugely important feature, available in Solr today but not [easily] usable by Lucene-only apps. We should fix this, by creating a shared faceting module. Ideally, we factor out Solr's faceting impl, and maybe poach/merge from other impls (eg Bobo browse). Hoss describes some important challenges we'll face in doing this (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here: {noformat} To look at faceting as a concrete example, there are big the reasons faceting works so well in Solr: Solr has total control over the index, knows exactly when the index has changed to rebuild caches, has a strict schema so it can make sense of field types and pick faceting algos accordingly, has multi-phase distributed search approach to get exact counts efficiently across multiple shards, etc... (and there are still a lot of additional enhancements and improvements that can be made to take even more advantage of knowledge solr has because it owns the index that we no one has had time to tackle) {noformat} This is a great list of the things we face in refactoring. It's also important because, if Solr needed to be so deeply intertwined with caching, schema, etc., other apps that want to facet will have the same needs and so we really have to address them in creating the shared module. I think we should get a basic faceting module started, but should not cut Solr over at first. We should iterate on the module, fold in improvements, etc., and then, once we can fully verify that cutting over doesn't hurt Solr (ie lose functionality or performance) we can later cutover. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
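The multi-select counting Stefan describes amounts to this rule: when counting values for one facet, apply every selected filter except that facet's own, so filtered-out values (red, blue, line) still show counts. A small self-contained sketch of the rule follows; nothing here is from the attached patch, which works against Lucene's field cache rather than plain maps:

{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class MultiSelectFacetSketch {
  /**
   * Counts values of facetField over the documents, honouring every selected
   * filter except the one on facetField itself ("multi-select" behaviour).
   */
  Map<String, Integer> countFacet(String facetField,
                                  Map<String, Set<String>> selected,
                                  List<Map<String, String>> docs) {
    Map<String, Integer> counts = new HashMap<String, Integer>();
    for (Map<String, String> doc : docs) {
      boolean passesOtherFilters = true;
      for (Map.Entry<String, Set<String>> filter : selected.entrySet()) {
        if (filter.getKey().equals(facetField)) {
          continue;                     // exclude this facet's own filter
        }
        if (!filter.getValue().isEmpty()
            && !filter.getValue().contains(doc.get(filter.getKey()))) {
          passesOtherFilters = false;
          break;
        }
      }
      if (passesOtherFilters) {
        String value = doc.get(facetField);
        Integer old = counts.get(value);
        counts.put(value, old == null ? 1 : old + 1);
      }
    }
    return counts;
  }
}
{code}

With color=green selected, the shape facet is counted against only the color filter, which is why a value like line still gets a count even though the user has not ticked it.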
[jira] [Updated] (LUCENE-3231) Add fixed size DocValues int variants expose Arrays where possible
[ https://issues.apache.org/jira/browse/LUCENE-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-3231: Attachment: LUCENE-3231.patch Here is a super rough patch with nocommits (and even missing nocommits) showing the idea. This is heavy work in progress, though. Add fixed size DocValues int variants expose Arrays where possible Key: LUCENE-3231 URL: https://issues.apache.org/jira/browse/LUCENE-3231 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3231.patch Currently we only have a variable bit-packed ints implementation. For flexible scoring or loading field caches it is desirable to have fixed int implementations for 8, 16, 32 and 64 bits. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
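Not from the attached patch, just a sketch of why fixed widths are attractive: a fixed 8/16/32/64-bit store can expose its backing array directly (handy for field-cache-style access), which variable bit-packing cannot do. Names and shapes below are made up for illustration:

{code:java}
// Illustrative only; not the patch's API.
interface IntValuesSketch {
  long get(int docID);
}

class FixedInt32ValuesSketch implements IntValuesSketch {
  private final int[] values;          // exactly 32 bits per document

  FixedInt32ValuesSketch(int maxDoc) {
    this.values = new int[maxDoc];
  }

  void set(int docID, int value) {
    values[docID] = value;
  }

  public long get(int docID) {
    return values[docID];
  }

  /** Direct access to the backing array, avoiding a method call per document. */
  public int[] getArray() {
    return values;
  }
}
{code}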
[jira] [Updated] (SOLR-2382) DIH Cache Improvements
[ https://issues.apache.org/jira/browse/SOLR-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer updated SOLR-2382: - Attachment: SOLR-2382.patch Just found a little bug in SortedMapBackedCache. This patch version includes a fix for it. DIH Cache Improvements -- Key: SOLR-2382 URL: https://issues.apache.org/jira/browse/SOLR-2382 Project: Solr Issue Type: New Feature Components: contrib - DataImportHandler Reporter: James Dyer Priority: Minor Attachments: SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch Functionality: 1. Provide a pluggable caching framework for DIH so that users can choose a cache implementation that best suits their data and application. 2. Provide a means to temporarily cache a child Entity's data without needing to create a special cached implementation of the Entity Processor (such as CachedSqlEntityProcessor). 3. Provide a means to write the final (root entity) DIH output to a cache rather than to Solr. Then provide a way for a subsequent DIH call to use the cache as an Entity input. Also provide the ability to do delta updates on such persistent caches. 4. Provide the ability to partition data across multiple caches that can then be fed back into DIH and indexed either to varying Solr Shards, or to the same Core in parallel. Use Cases: 1. We needed a flexible scalable way to temporarily cache child-entity data prior to joining to parent entities. - Using SqlEntityProcessor with Child Entities can cause an n+1 select problem. - CachedSqlEntityProcessor only supports an in-memory HashMap as a Caching mechanism and does not scale. - There is no way to cache non-SQL inputs (ex: flat files, xml, etc). 2. We needed the ability to gather data from long-running entities by a process that runs separate from our main indexing process. 3. We wanted the ability to do a delta import of only the entities that changed. - Lucene/Solr requires entire documents to be re-indexed, even if only a few fields changed. - Our data comes from 50+ complex sql queries and/or flat files. - We do not want to incur overhead re-gathering all of this data if only 1 entity's data changed. - Persistent DIH caches solve this problem. 4. We want the ability to index several documents in parallel (using 1.4.1, which did not have the threads parameter). 5. In the future, we may need to use Shards, creating a need to easily partition our source data into Shards. Implementation Details: 1. De-couple EntityProcessorBase from caching. - Created a new interface, DIHCache two implementations: - SortedMapBackedCache - An in-memory cache, used as default with CachedSqlEntityProcessor (now deprecated). - BerkleyBackedCache - A disk-backed cache, dependent on bdb-je, tested with je-4.1.6.jar - NOTE: the existing Lucene Contrib db project uses je-3.3.93.jar. I believe this may be incompatible due to Generic Usage. - NOTE: I did not modify the ant script to automatically get this jar, so to use or evaluate this patch, download bdb-je from http://www.oracle.com/technetwork/database/berkeleydb/downloads/index.html 2. Allow Entity Processors to take a cacheImpl parameter to cause the entity data to be cached (see EntityProcessorBase DIHCacheProperties). 3. Partially De-couple SolrWriter from DocBuilder - Created a new interface DIHWriter, two implementations: - SolrWriter (refactored) - DIHCacheWriter (allows DIH to write ultimately to a Cache). 4. 
Create a new Entity Processor, DIHCacheProcessor, which reads a persistent Cache as DIH Entity Input. 5. Support a partition parameter with both DIHCacheWriter and DIHCacheProcessor to allow for easy partitioning of source entity data. 6. Change the semantics of entity.destroy() - Previously, it was being called on each iteration of DocBuilder.buildDocument(). - Now it does one-time cleanup tasks (like closing or deleting a disk-backed cache) once the entity processor is completed. - The only out-of-the-box entity processor that previously implemented destroy() was LineEntityProcessor, so this is not a very invasive change. General Notes: We are near completion in converting our search functionality from a legacy search engine to Solr. However, I found that DIH did not support caching to the level of our prior product's data import utility. In order to get our data into Solr, I created these caching enhancements. Because I believe this has broad application, and because we would like this feature to be supported by the Community, I have front-ported this, enhanced, to Trunk. I have also added
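For orientation, the pluggable cache described above might look roughly like the following. This is only a guess at the shape of the DIHCache interface based on the description; the attached patch is authoritative and will differ in detail:

{code:java}
import java.util.Iterator;
import java.util.Map;

// Guessed shape only; the real DIHCache interface lives in the attached patch.
interface DIHCacheSketch {
  void open(Map<String, Object> initProps);          // e.g. cache name, key field, storage location
  void add(Map<String, Object> row);                 // cache one entity row
  Iterator<Map<String, Object>> lookup(Object key);  // rows matching a join key (child-entity case)
  Iterator<Map<String, Object>> iterator();          // full scan, for reading a cache back in as entity input
  void delete(Object key);                           // needed for delta updates on persistent caches
  void flush();
  void close();                                      // release handles, e.g. a disk-backed store
  void destroy();                                    // one-time cleanup, matching the new entity.destroy() semantics
}
{code}

Per the description, an entity would opt in by naming an implementation (SortedMapBackedCache or BerkleyBackedCache) via the cacheImpl parameter rather than by switching to a special cached entity processor.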
[jira] [Updated] (SOLR-2382) DIH Cache Improvements
[ https://issues.apache.org/jira/browse/SOLR-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer updated SOLR-2382: - Attachment: SOLR-2382.patch Sorry...that last patch included some unrelated code. This one is correct. DIH Cache Improvements -- Key: SOLR-2382 URL: https://issues.apache.org/jira/browse/SOLR-2382 Project: Solr Issue Type: New Feature Components: contrib - DataImportHandler Reporter: James Dyer Priority: Minor Attachments: SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch Functionality: 1. Provide a pluggable caching framework for DIH so that users can choose a cache implementation that best suits their data and application. 2. Provide a means to temporarily cache a child Entity's data without needing to create a special cached implementation of the Entity Processor (such as CachedSqlEntityProcessor). 3. Provide a means to write the final (root entity) DIH output to a cache rather than to Solr. Then provide a way for a subsequent DIH call to use the cache as an Entity input. Also provide the ability to do delta updates on such persistent caches. 4. Provide the ability to partition data across multiple caches that can then be fed back into DIH and indexed either to varying Solr Shards, or to the same Core in parallel. Use Cases: 1. We needed a flexible scalable way to temporarily cache child-entity data prior to joining to parent entities. - Using SqlEntityProcessor with Child Entities can cause an n+1 select problem. - CachedSqlEntityProcessor only supports an in-memory HashMap as a Caching mechanism and does not scale. - There is no way to cache non-SQL inputs (ex: flat files, xml, etc). 2. We needed the ability to gather data from long-running entities by a process that runs separate from our main indexing process. 3. We wanted the ability to do a delta import of only the entities that changed. - Lucene/Solr requires entire documents to be re-indexed, even if only a few fields changed. - Our data comes from 50+ complex sql queries and/or flat files. - We do not want to incur overhead re-gathering all of this data if only 1 entity's data changed. - Persistent DIH caches solve this problem. 4. We want the ability to index several documents in parallel (using 1.4.1, which did not have the threads parameter). 5. In the future, we may need to use Shards, creating a need to easily partition our source data into Shards. Implementation Details: 1. De-couple EntityProcessorBase from caching. - Created a new interface, DIHCache two implementations: - SortedMapBackedCache - An in-memory cache, used as default with CachedSqlEntityProcessor (now deprecated). - BerkleyBackedCache - A disk-backed cache, dependent on bdb-je, tested with je-4.1.6.jar - NOTE: the existing Lucene Contrib db project uses je-3.3.93.jar. I believe this may be incompatible due to Generic Usage. - NOTE: I did not modify the ant script to automatically get this jar, so to use or evaluate this patch, download bdb-je from http://www.oracle.com/technetwork/database/berkeleydb/downloads/index.html 2. Allow Entity Processors to take a cacheImpl parameter to cause the entity data to be cached (see EntityProcessorBase DIHCacheProperties). 3. Partially De-couple SolrWriter from DocBuilder - Created a new interface DIHWriter, two implementations: - SolrWriter (refactored) - DIHCacheWriter (allows DIH to write ultimately to a Cache). 4. 
Create a new Entity Processor, DIHCacheProcessor, which reads a persistent Cache as DIH Entity Input. 5. Support a partition parameter with both DIHCacheWriter and DIHCacheProcessor to allow for easy partitioning of source entity data. 6. Change the semantics of entity.destroy() - Previously, it was being called on each iteration of DocBuilder.buildDocument(). - Now it is does one-time cleanup tasks (like closing or deleting a disk-backed cache) once the entity processor is completed. - The only out-of-the-box entity processor that previously implemented destroy() was LineEntitiyProcessor, so this is not a very invasive change. General Notes: We are near completion in converting our search functionality from a legacy search engine to Solr. However, I found that DIH did not support caching to the level of our prior product's data import utility. In order to get our data into Solr, I created these caching enhancements. Because I believe this has broad application, and because we would like this feature to be supported by the Community, I have front-ported this, enhanced, to Trunk. I have also added unit tests and
[jira] [Updated] (SOLR-2382) DIH Cache Improvements
[ https://issues.apache.org/jira/browse/SOLR-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer updated SOLR-2382: - Attachment: (was: SOLR-2382.patch) DIH Cache Improvements -- Key: SOLR-2382 URL: https://issues.apache.org/jira/browse/SOLR-2382 Project: Solr Issue Type: New Feature Components: contrib - DataImportHandler Reporter: James Dyer Priority: Minor Attachments: SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch Functionality: 1. Provide a pluggable caching framework for DIH so that users can choose a cache implementation that best suits their data and application. 2. Provide a means to temporarily cache a child Entity's data without needing to create a special cached implementation of the Entity Processor (such as CachedSqlEntityProcessor). 3. Provide a means to write the final (root entity) DIH output to a cache rather than to Solr. Then provide a way for a subsequent DIH call to use the cache as an Entity input. Also provide the ability to do delta updates on such persistent caches. 4. Provide the ability to partition data across multiple caches that can then be fed back into DIH and indexed either to varying Solr Shards, or to the same Core in parallel. Use Cases: 1. We needed a flexible scalable way to temporarily cache child-entity data prior to joining to parent entities. - Using SqlEntityProcessor with Child Entities can cause an n+1 select problem. - CachedSqlEntityProcessor only supports an in-memory HashMap as a Caching mechanism and does not scale. - There is no way to cache non-SQL inputs (ex: flat files, xml, etc). 2. We needed the ability to gather data from long-running entities by a process that runs separate from our main indexing process. 3. We wanted the ability to do a delta import of only the entities that changed. - Lucene/Solr requires entire documents to be re-indexed, even if only a few fields changed. - Our data comes from 50+ complex sql queries and/or flat files. - We do not want to incur overhead re-gathering all of this data if only 1 entity's data changed. - Persistent DIH caches solve this problem. 4. We want the ability to index several documents in parallel (using 1.4.1, which did not have the threads parameter). 5. In the future, we may need to use Shards, creating a need to easily partition our source data into Shards. Implementation Details: 1. De-couple EntityProcessorBase from caching. - Created a new interface, DIHCache two implementations: - SortedMapBackedCache - An in-memory cache, used as default with CachedSqlEntityProcessor (now deprecated). - BerkleyBackedCache - A disk-backed cache, dependent on bdb-je, tested with je-4.1.6.jar - NOTE: the existing Lucene Contrib db project uses je-3.3.93.jar. I believe this may be incompatible due to Generic Usage. - NOTE: I did not modify the ant script to automatically get this jar, so to use or evaluate this patch, download bdb-je from http://www.oracle.com/technetwork/database/berkeleydb/downloads/index.html 2. Allow Entity Processors to take a cacheImpl parameter to cause the entity data to be cached (see EntityProcessorBase DIHCacheProperties). 3. Partially De-couple SolrWriter from DocBuilder - Created a new interface DIHWriter, two implementations: - SolrWriter (refactored) - DIHCacheWriter (allows DIH to write ultimately to a Cache). 4. Create a new Entity Processor, DIHCacheProcessor, which reads a persistent Cache as DIH Entity Input. 5. 
Support a partition parameter with both DIHCacheWriter and DIHCacheProcessor to allow for easy partitioning of source entity data. 6. Change the semantics of entity.destroy() - Previously, it was being called on each iteration of DocBuilder.buildDocument(). - Now it is does one-time cleanup tasks (like closing or deleting a disk-backed cache) once the entity processor is completed. - The only out-of-the-box entity processor that previously implemented destroy() was LineEntitiyProcessor, so this is not a very invasive change. General Notes: We are near completion in converting our search functionality from a legacy search engine to Solr. However, I found that DIH did not support caching to the level of our prior product's data import utility. In order to get our data into Solr, I created these caching enhancements. Because I believe this has broad application, and because we would like this feature to be supported by the Community, I have front-ported this, enhanced, to Trunk. I have also added unit tests and verified that all existing test cases pass. I believe this patch
[jira] [Commented] (LUCENE-3080) cutover highlighter to BytesRef
[ https://issues.apache.org/jira/browse/LUCENE-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053432#comment-13053432 ] Mike Sokolov commented on LUCENE-3080: -- I agree it's necessary to prove there is some point to all this - I'm working on getting some numbers. At the moment I'm just assuming ASCII encoding, but I'll take a look at the binary stuff too - thanks. cutover highlighter to BytesRef --- Key: LUCENE-3080 URL: https://issues.apache.org/jira/browse/LUCENE-3080 Project: Lucene - Java Issue Type: Improvement Components: modules/highlighter Reporter: Michael McCandless Highlighter still uses char[] terms (consumes tokens from the analyzer as char[] not as BytesRef), which is causing problems for merging SOLR-2497 to trunk. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Thacker updated LUCENE-2793: -- Attachment: LUCENE-2793.patch I am not sure whether the MergeInfo used in SegmentMerger#mergeFields I have kept most of the nocommits there even after correcting it for reference. In MockDirectoryWrapper#crash() to randomize IOContext I have used either a READONCE or DEFAULT or Merge context. Is this the correct way to go? In LuceneTeseCase#newDirectory(), MockDirectoryWrapper#createOutput(), MockDirectoryWrapper#openInput() will randomizing the context here help? Directory createOutput and openInput should take an IOContext - Key: LUCENE-2793 URL: https://issues.apache.org/jira/browse/LUCENE-2793 Project: Lucene - Java Issue Type: Improvement Components: core/store Reporter: Michael McCandless Assignee: Varun Thacker Labels: gsoc2011, lucene-gsoc-11, mentor Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch Today for merging we pass down a larger readBufferSize than for searching because we get better performance. I think we should generalize this to a class (IOContext), which would hold the buffer size, but then could hold other flags like DIRECT (bypass OS's buffer cache), SEQUENTIAL, etc. Then, we can make the DirectIOLinuxDirectory fully usable because we would only use DIRECT/SEQUENTIAL during merging. This will require fixing how IW pools readers, so that a reader opened for merging is not then used for searching, and vice/versa. Really, it's only all the open file handles that need to be different -- we could in theory share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
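For readers following along, the core of the change under discussion is that Directory.createOutput and Directory.openInput gain a context argument describing the intended use. The sketch below only illustrates that shape with made-up fields; it is not the patch's actual IOContext/MergeInfo definition:

{code:java}
// Illustrative shape only; not the patch's actual classes.
class IOContextSketch {
  enum Usage { DEFAULT, READ, READONCE, MERGE, FLUSH }

  final Usage usage;
  final int readBufferSize;

  IOContextSketch(Usage usage, int readBufferSize) {
    this.usage = usage;
    this.readBufferSize = readBufferSize;
  }
}

// A Directory implementation could then pick, say, a larger buffer or
// direct/sequential I/O hints when usage == MERGE, and a small buffer for READONCE:
//   IndexInput openInput(String name, IOContextSketch context) throws IOException;
//   IndexOutput createOutput(String name, IOContextSketch context) throws IOException;
{code}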
[jira] [Commented] (SOLR-2586) example work & logs directories needed?
[ https://issues.apache.org/jira/browse/SOLR-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053468#comment-13053468 ] David Smiley commented on SOLR-2586: So if work is needed (to avoid rare error conditions if a temp directory is used), that still leaves the question of logs. The only thing approaching use of this directory is some commented-out configuration in jetty.xml. So as it stands, it really isn't used. I think if someone uncomments that part of jetty.xml, then they can very well make the logs directory. What I'm after here is a little bit of simplification for new users. I certainly don't get any heartburn over these directories, but if someone new sees logs and never sees anything go there, they might think something is wrong. And removing it is one less directory. I say this after updating my Solr book, walking the users through the directory layout in the 1st chapter. No big deal, but simplification/clarity is good. example work & logs directories needed? --- Key: SOLR-2586 URL: https://issues.apache.org/jira/browse/SOLR-2586 Project: Solr Issue Type: Improvement Components: Build Reporter: David Smiley Priority: Minor Firstly, what prompted this issue was me wanting to use a git Solr mirror but finding that git's lack of empty-directory support made the example ant task fail. This task requires examples/work to be in place so that it can delete its contents. Fixing this was a simple matter of adding: {code:xml} <mkdir dir="${example}/work"/> <!-- in case not there --> {code} right before the delete task. But then it occurred to me, why even have a work directory since Jetty will apparently use a temp directory instead. -- try for yourself (stdout snippet): bq. 2011-06-11 00:51:26.177:INFO::Extract file:/SmileyDev/Search/lucene-solr/solr/example/webapps/solr.war to /var/folders/zo/zoQJvqc9E0076p0THiri+k+++TI/-Tmp-/Jetty_0_0_0_0_8983_solr.war__solr__k1kf17/webapp On my Mac, this same directory was used for multiple runs, so somehow Jetty or the VM figures out how to reuse it. Since this example setup isn't a *real* installation -- it's just for demonstration, arguably it should not contain what it doesn't need. Likewise, perhaps the empty example/logs directory should be deleted. It's not used by default anyway. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2586) example work & logs directories needed?
[ https://issues.apache.org/jira/browse/SOLR-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053500#comment-13053500 ] Robert Muir commented on SOLR-2586: --- is this issue really about the git problem or about making things simpler? If you want to make things simpler, you would be mentioning things like: * move example-dih to contrib/dih * remove mapping-ISOLatin1Accent.txt, we have the foldToAscii and its confusing to have both * ... But i see you only targeting empty directories, which cause little confusion at all. example work logs directories needed? --- Key: SOLR-2586 URL: https://issues.apache.org/jira/browse/SOLR-2586 Project: Solr Issue Type: Improvement Components: Build Reporter: David Smiley Priority: Minor Firstly, what prompted this issue was me wanting to use a git solr mirror but finding that git's lack of empty-directory support made the example ant task fail. This task requires examples/work to be in place so that it can delete its contents. Fixing this was a simple matter of adding: {code:xml} mkdir dir=${example}/work /!-- in case not there -- {code} Right before the delete task. But then it occurred to me, why even have a work directory since Jetty will apparently use a temp directory instead. -- try for yourself (stdout snippet): bq. 2011-06-11 00:51:26.177:INFO::Extract file:/SmileyDev/Search/lucene-solr/solr/example/webapps/solr.war to /var/folders/zo/zoQJvqc9E0076p0THiri+k+++TI/-Tmp-/Jetty_0_0_0_0_8983_solr.war__solr__k1kf17/webapp On my Mac, this same directory was used for multiple runs, so somehow Jetty or the VM figures out how to reuse it. Since this example setup isn't a *real* installation -- it's just for demonstration, arguably it should not contain what it doesn't need. Likewise, perhaps the empty example/logs directory should be deleted. It's not used by default any way. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[Lucene.Net] [jira] [Commented] (LUCENENET-426) Mark BaseFragmentsBuilder methods as virtual
[ https://issues.apache.org/jira/browse/LUCENENET-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053511#comment-13053511 ] Itamar Syn-Hershko commented on LUCENENET-426: -- Apparently that was not enough. I hit a need to override this one too: protected Field[] GetFields(IndexReader reader, int docId, String fieldName) Perhaps it'd make sense to make all protected virtual? In Java you can override anything that is not final, so that would be compatible with the original version. Mark BaseFragmentsBuilder methods as virtual Key: LUCENENET-426 URL: https://issues.apache.org/jira/browse/LUCENENET-426 Project: Lucene.Net Issue Type: Improvement Components: Lucene.Net Contrib Affects Versions: Lucene.Net 2.9.2, Lucene.Net 2.9.4, Lucene.Net 3.x, Lucene.Net 2.9.4g Reporter: Itamar Syn-Hershko Priority: Minor Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Attachments: fvh.patch Without marking methods in BaseFragmentsBuilder as virtual, it is meaningless to have FragmentsBuilder deriving from a class named Base, since most of its functionality cannot be overridden. Attached is a patch for marking the important methods virtual. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (SOLR-2586) example work & logs directories needed?
[ https://issues.apache.org/jira/browse/SOLR-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053518#comment-13053518 ] Robert Muir commented on SOLR-2586: --- by the way, if you want to solve the git problem, upload a patch that adds a gitignore file or .keep_me hidden file or whatever, I'll even commit it, and I'm the biggest git-hater there is. then, you could fix your git problem, and separately we could deal with simplifying the example. example work logs directories needed? --- Key: SOLR-2586 URL: https://issues.apache.org/jira/browse/SOLR-2586 Project: Solr Issue Type: Improvement Components: Build Reporter: David Smiley Priority: Minor Firstly, what prompted this issue was me wanting to use a git solr mirror but finding that git's lack of empty-directory support made the example ant task fail. This task requires examples/work to be in place so that it can delete its contents. Fixing this was a simple matter of adding: {code:xml} mkdir dir=${example}/work /!-- in case not there -- {code} Right before the delete task. But then it occurred to me, why even have a work directory since Jetty will apparently use a temp directory instead. -- try for yourself (stdout snippet): bq. 2011-06-11 00:51:26.177:INFO::Extract file:/SmileyDev/Search/lucene-solr/solr/example/webapps/solr.war to /var/folders/zo/zoQJvqc9E0076p0THiri+k+++TI/-Tmp-/Jetty_0_0_0_0_8983_solr.war__solr__k1kf17/webapp On my Mac, this same directory was used for multiple runs, so somehow Jetty or the VM figures out how to reuse it. Since this example setup isn't a *real* installation -- it's just for demonstration, arguably it should not contain what it doesn't need. Likewise, perhaps the empty example/logs directory should be deleted. It's not used by default any way. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-3.x - Build # 416 - Failure
Build: https://builds.apache.org/job/Lucene-3.x/416/ No tests ran. Build Log (for compile errors): [...truncated 10795 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (LUCENE-2883) Consolidate Solr & Lucene FunctionQuery into modules
[ https://issues.apache.org/jira/browse/LUCENE-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Male reassigned LUCENE-2883: -- Assignee: Chris Male Consolidate Solr Lucene FunctionQuery into modules - Key: LUCENE-2883 URL: https://issues.apache.org/jira/browse/LUCENE-2883 Project: Lucene - Java Issue Type: Task Components: core/search Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Chris Male Labels: gsoc2011, lucene-gsoc-11, mentor Fix For: 4.0 Attachments: LUCENE-2883.patch Spin-off from the [dev list | http://www.mail-archive.com/dev@lucene.apache.org/msg13261.html] -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3232) Move MutableValues to Queries Module
Move MutableValues to Queries Module Key: LUCENE-3232 URL: https://issues.apache.org/jira/browse/LUCENE-3232 Project: Lucene - Java Issue Type: Sub-task Reporter: Chris Male Solr makes use of the MutableValue* series of classes to improve performance of grouping by FunctionQuery (I think). As such they are used in ValueSource implementations. Consequently we need to move these classes in order to move the ValueSources. I'll also use this issue to establish the Queries module where the FunctionQueries will lie. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
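As background, the MutableValue* pattern referred to above is roughly the following: a reusable holder that grouping and function-query code can fill per document, avoiding a fresh object allocation per hit. This is a simplified sketch, not the actual Solr source:

{code:java}
// Simplified sketch of the MutableValue idea; the real classes have more
// operations (duplicate, equalsSameType, compareSameType, hashCode, ...).
abstract class MutableValueSketch {
  boolean exists = true;

  abstract void copy(MutableValueSketch source);
  abstract Object toObject();
}

class MutableValueIntSketch extends MutableValueSketch {
  int value;

  void copy(MutableValueSketch source) {
    MutableValueIntSketch s = (MutableValueIntSketch) source;
    value = s.value;
    exists = s.exists;
  }

  Object toObject() {
    return exists ? Integer.valueOf(value) : null;
  }
}
{code}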
[jira] [Commented] (LUCENE-2883) Consolidate Solr & Lucene FunctionQuery into modules
[ https://issues.apache.org/jira/browse/LUCENE-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053566#comment-13053566 ] Chris Male commented on LUCENE-2883: Rather than doing all the work in this issue, I'm going to spin off a few subtasks and resolve this one by one. Consolidate Solr Lucene FunctionQuery into modules - Key: LUCENE-2883 URL: https://issues.apache.org/jira/browse/LUCENE-2883 Project: Lucene - Java Issue Type: Task Components: core/search Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Chris Male Labels: gsoc2011, lucene-gsoc-11, mentor Fix For: 4.0 Attachments: LUCENE-2883.patch Spin-off from the [dev list | http://www.mail-archive.com/dev@lucene.apache.org/msg13261.html] -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3079) Faceting module
[ https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053580#comment-13053580 ] Yonik Seeley commented on LUCENE-3079: -- bq. if Solr needed to be so deeply intertwined with caching, schema, etc., other apps that want to facet will have the same needs Sort of an aside, but not really specific applications are much easier. A lot more indirection is required in Solr and a schema is needed for pretty much everything. Without the schema, a client would specify sort=foo desc and Solr would have no idea how to do that. A specific application just does it because they have the knowledge of what all the fields are. It's why people have gotten along just fine without a schema in Lucene thus far. If you're building another Solr... yes, you need something like a schema. Facetiing module Key: LUCENE-3079 URL: https://issues.apache.org/jira/browse/LUCENE-3079 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Attachments: LUCENE-3079.patch Faceting is a hugely important feature, available in Solr today but not [easily] usable by Lucene-only apps. We should fix this, by creating a shared faceting module. Ideally, we factor out Solr's faceting impl, and maybe poach/merge from other impls (eg Bobo browse). Hoss describes some important challenges we'll face in doing this (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here: {noformat} To look at faceting as a concrete example, there are big the reasons faceting works so well in Solr: Solr has total control over the index, knows exactly when the index has changed to rebuild caches, has a strict schema so it can make sense of field types and pick faceting algos accordingly, has multi-phase distributed search approach to get exact counts efficiently across multiple shards, etc... (and there are still a lot of additional enhancements and improvements that can be made to take even more advantage of knowledge solr has because it owns the index that we no one has had time to tackle) {noformat} This is a great list of the things we face in refactoring. It's also important because, if Solr needed to be so deeply intertwined with caching, schema, etc., other apps that want to facet will have the same needs and so we really have to address them in creating the shared module. I think we should get a basic faceting module started, but should not cut Solr over at first. We should iterate on the module, fold in improvements, etc., and then, once we can fully verify that cutting over doesn't hurt Solr (ie lose functionality or performance) we can later cutover. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
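Yonik's point in code form: a Lucene-only application already knows its field types, so it states the sort type itself instead of deriving it from a schema. A sketch against the 3.x-era API (minor details vary by version):

{code:java}
import java.io.IOException;

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopDocs;

class ExplicitSortExample {
  // The application knows "foo" holds ints, so it says so directly; Solr has to
  // consult its schema to turn a request like sort=foo desc into the same call.
  TopDocs searchSortedByFoo(IndexSearcher searcher) throws IOException {
    return searcher.search(new MatchAllDocsQuery(), 10,
        new Sort(new SortField("foo", SortField.INT, true)));   // true = descending
  }
}
{code}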
[jira] [Commented] (LUCENE-3079) Faceting module
[ https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053581#comment-13053581 ] Jason Rutherglen commented on LUCENE-3079: -- Schemas should probably be a module that makes use of embedding the field types per-segment, this is something the faceting module could/should use. I think is what LUCENE-2308 is aiming for? Though I thought there was another Jira issue created by Simon for this as well. Facetiing module Key: LUCENE-3079 URL: https://issues.apache.org/jira/browse/LUCENE-3079 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Attachments: LUCENE-3079.patch Faceting is a hugely important feature, available in Solr today but not [easily] usable by Lucene-only apps. We should fix this, by creating a shared faceting module. Ideally, we factor out Solr's faceting impl, and maybe poach/merge from other impls (eg Bobo browse). Hoss describes some important challenges we'll face in doing this (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here: {noformat} To look at faceting as a concrete example, there are big the reasons faceting works so well in Solr: Solr has total control over the index, knows exactly when the index has changed to rebuild caches, has a strict schema so it can make sense of field types and pick faceting algos accordingly, has multi-phase distributed search approach to get exact counts efficiently across multiple shards, etc... (and there are still a lot of additional enhancements and improvements that can be made to take even more advantage of knowledge solr has because it owns the index that we no one has had time to tackle) {noformat} This is a great list of the things we face in refactoring. It's also important because, if Solr needed to be so deeply intertwined with caching, schema, etc., other apps that want to facet will have the same needs and so we really have to address them in creating the shared module. I think we should get a basic faceting module started, but should not cut Solr over at first. We should iterate on the module, fold in improvements, etc., and then, once we can fully verify that cutting over doesn't hurt Solr (ie lose functionality or performance) we can later cutover. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3079) Faceting module
[ https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053587#comment-13053587 ] Chris Male commented on LUCENE-3079: I don't think any Facet module needs to be concerned with Schemas. Instead the module can expose an API which asks for the information it needs to make the best choices. Solr can then provide that information based on its Schema, pure Lucene users can do it however they want. Facetiing module Key: LUCENE-3079 URL: https://issues.apache.org/jira/browse/LUCENE-3079 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Attachments: LUCENE-3079.patch Faceting is a hugely important feature, available in Solr today but not [easily] usable by Lucene-only apps. We should fix this, by creating a shared faceting module. Ideally, we factor out Solr's faceting impl, and maybe poach/merge from other impls (eg Bobo browse). Hoss describes some important challenges we'll face in doing this (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here: {noformat} To look at faceting as a concrete example, there are big the reasons faceting works so well in Solr: Solr has total control over the index, knows exactly when the index has changed to rebuild caches, has a strict schema so it can make sense of field types and pick faceting algos accordingly, has multi-phase distributed search approach to get exact counts efficiently across multiple shards, etc... (and there are still a lot of additional enhancements and improvements that can be made to take even more advantage of knowledge solr has because it owns the index that we no one has had time to tackle) {noformat} This is a great list of the things we face in refactoring. It's also important because, if Solr needed to be so deeply intertwined with caching, schema, etc., other apps that want to facet will have the same needs and so we really have to address them in creating the shared module. I think we should get a basic faceting module started, but should not cut Solr over at first. We should iterate on the module, fold in improvements, etc., and then, once we can fully verify that cutting over doesn't hurt Solr (ie lose functionality or performance) we can later cutover. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3079) Faceting module
[ https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053591#comment-13053591 ] Jason Rutherglen commented on LUCENE-3079: -- bq. I don't think any Facet module needs to be concerned with Schemas Right, they should be field type aware. Facetiing module Key: LUCENE-3079 URL: https://issues.apache.org/jira/browse/LUCENE-3079 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Attachments: LUCENE-3079.patch Faceting is a hugely important feature, available in Solr today but not [easily] usable by Lucene-only apps. We should fix this, by creating a shared faceting module. Ideally, we factor out Solr's faceting impl, and maybe poach/merge from other impls (eg Bobo browse). Hoss describes some important challenges we'll face in doing this (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here: {noformat} To look at faceting as a concrete example, there are big the reasons faceting works so well in Solr: Solr has total control over the index, knows exactly when the index has changed to rebuild caches, has a strict schema so it can make sense of field types and pick faceting algos accordingly, has multi-phase distributed search approach to get exact counts efficiently across multiple shards, etc... (and there are still a lot of additional enhancements and improvements that can be made to take even more advantage of knowledge solr has because it owns the index that we no one has had time to tackle) {noformat} This is a great list of the things we face in refactoring. It's also important because, if Solr needed to be so deeply intertwined with caching, schema, etc., other apps that want to facet will have the same needs and so we really have to address them in creating the shared module. I think we should get a basic faceting module started, but should not cut Solr over at first. We should iterate on the module, fold in improvements, etc., and then, once we can fully verify that cutting over doesn't hurt Solr (ie lose functionality or performance) we can later cutover. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3232) Move MutableValues to Queries Module
[ https://issues.apache.org/jira/browse/LUCENE-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053593#comment-13053593 ] Chris Male commented on LUCENE-3232: Code to execute before patch: {code} svn mkdir --parents modules/queries/src/java/org/apache/lucene/queries/function svn move solr/src/java/org/apache/solr/search/MutableValue.java modules/queries/src/java/org/apache/lucene/queries/function/MutableValue.java svn move solr/src/java/org/apache/solr/search/MutableValueFloat.java modules/queries/src/java/org/apache/lucene/queries/function/MutableValueFloat.java svn move solr/src/java/org/apache/solr/search/MutableValueBool.java modules/queries/src/java/org/apache/lucene/queries/function/MutableValueBool.java svn move solr/src/java/org/apache/solr/search/MutableValueDate.java modules/queries/src/java/org/apache/lucene/queries/function/MutableValueDate.java svn move solr/src/java/org/apache/solr/search/MutableValueDouble.java modules/queries/src/java/org/apache/lucene/queries/function/MutableValueDouble.java svn move solr/src/java/org/apache/solr/search/MutableValueInt.java modules/queries/src/java/org/apache/lucene/queries/function/MutableValueInt.java svn move solr/src/java/org/apache/solr/search/MutableValueLong.java modules/queries/src/java/org/apache/lucene/queries/function/MutableValueLong.java svn move solr/src/java/org/apache/solr/search/MutableValueStr.java modules/queries/src/java/org/apache/lucene/queries/function/MutableValueStr.java {code} Move MutableValues to Queries Module Key: LUCENE-3232 URL: https://issues.apache.org/jira/browse/LUCENE-3232 Project: Lucene - Java Issue Type: Sub-task Components: core/search Reporter: Chris Male Fix For: 4.0 Solr makes use of the MutableValue* series of classes to improve performance of grouping by FunctionQuery (I think). As such they are used in ValueSource implementations. Consequently we need to move these classes in order to move the ValueSources. I'll also use this issue to establish the Queries module where the FunctionQueries will lie. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3232) Move MutableValues to Queries Module
[ https://issues.apache.org/jira/browse/LUCENE-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Male updated LUCENE-3232: --- Attachment: LUCENE-3232.patch Patch that establishes the Queries module and moves the MutableValue classes. Includes intellij, eclipse and maven work. Everything compiles and tests pass. It'd be great if someone could review. I'll commit in a few days. Move MutableValues to Queries Module Key: LUCENE-3232 URL: https://issues.apache.org/jira/browse/LUCENE-3232 Project: Lucene - Java Issue Type: Sub-task Components: core/search Reporter: Chris Male Fix For: 4.0 Attachments: LUCENE-3232.patch Solr makes use of the MutableValue* series of classes to improve performance of grouping by FunctionQuery (I think). As such they are used in ValueSource implementations. Consequently we need to move these classes in order to move the ValueSources. I'll also use this issue to establish the Queries module where the FunctionQueries will lie. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2615) Have LogUpdateProcessor log each command (add, delete, ...) at debug/FINE level
[ https://issues.apache.org/jira/browse/SOLR-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053597#comment-13053597 ] Yonik Seeley commented on SOLR-2615: bq. You may notice I use SLF4J's nifty log.debug(message blah {} blah, var) syntax, which is both performant and concise as there's no point in guarding the debug message with an isDebugEnabled() since debug() will internally check this any way and there is no string concatenation if debug isn't enabled. I think there is still a point to caching isDebugEnabled() though. The implementation most likely involves checking volatile variables, and can involve checking a hierarchy of loggers. I assume the cost may be different for different logging implementations too. Better to just cache if you can and not worry about it. Have LogUpdateProcessor log each command (add, delete, ...) at debug/FINE level --- Key: SOLR-2615 URL: https://issues.apache.org/jira/browse/SOLR-2615 Project: Solr Issue Type: Improvement Components: update Reporter: David Smiley Priority: Minor Fix For: 3.3 Attachments: SOLR-2615_LogUpdateProcessor_debug_logging.patch It would be great if the LogUpdateProcessor logged each command (add, delete, ...) at debug (Fine) level. Presently it only logs a summary of 8 commands and it does so at the very end. The attached patch implements this. * I moved the LogUpdateProcessor ahead of RunUpdateProcessor so that the debug level log happens before Solr does anything with it. It should not affect the ordering of the existing summary log which happens at finish(). * I changed UpdateRequestProcessor's static log variable to be an instance variable that uses the current class name. I think this makes much more sense since I want to be able to alter logging levels for a specific processor without doing it for all of them. This change did require me to tweak the factory's detection of the log level which avoids creating the LogUpdateProcessor. * There was an NPE bug in AddUpdateCommand.getPrintableId() in the event there is no schema unique field. I fixed that. You may notice I use SLF4J's nifty log.debug(message blah {} blah, var) syntax, which is both performant and concise as there's no point in guarding the debug message with an isDebugEnabled() since debug() will internally check this any way and there is no string concatenation if debug isn't enabled. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
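Caching the flag as Yonik suggests is a one-liner; a minimal sketch (class, field, and method names made up, not from the patch):

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class CachedDebugFlagExample {
  private static final Logger log = LoggerFactory.getLogger(CachedDebugFlagExample.class);

  // Read the level once per processor instance instead of letting every
  // log.debug() call re-check volatile state or walk the logger hierarchy.
  private final boolean debugEnabled = log.isDebugEnabled();

  void logDelete(String id) {
    if (debugEnabled) {
      log.debug("delete {}", id);
    }
  }
}
{code}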
[jira] [Commented] (LUCENE-3232) Move MutableValues to Queries Module
[ https://issues.apache.org/jira/browse/LUCENE-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053599#comment-13053599 ] Yonik Seeley commented on LUCENE-3232: -- These are useful beyond function queries... perhaps they should not be in the function module? Move MutableValues to Queries Module Key: LUCENE-3232 URL: https://issues.apache.org/jira/browse/LUCENE-3232 Project: Lucene - Java Issue Type: Sub-task Components: core/search Reporter: Chris Male Fix For: 4.0 Attachments: LUCENE-3232.patch Solr makes use of the MutableValue* series of classes to improve performance of grouping by FunctionQuery (I think). As such they are used in ValueSource implementations. Consequently we need to move these classes in order to move the ValueSources. I'll also use this issue to establish the Queries module where the FunctionQueries will lie. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3232) Move MutableValues to Queries Module
[ https://issues.apache.org/jira/browse/LUCENE-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053600#comment-13053600 ] Chris Male commented on LUCENE-3232: I've debated this backwards and forwards. Do they have a use case out of function queries at the moment? If so then yeah I'll happily put them somewhere else. Otherwise I'll cross that bridge at the time. Move MutableValues to Queries Module Key: LUCENE-3232 URL: https://issues.apache.org/jira/browse/LUCENE-3232 Project: Lucene - Java Issue Type: Sub-task Components: core/search Reporter: Chris Male Fix For: 4.0 Attachments: LUCENE-3232.patch Solr makes use of the MutableValue* series of classes to improve performance of grouping by FunctionQuery (I think). As such they are used in ValueSource implementations. Consequently we need to move these classes in order to move the ValueSources. I'll also use this issue to establish the Queries module where the FunctionQueries will lie. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3232) Move MutableValues to Queries Module
[ https://issues.apache.org/jira/browse/LUCENE-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053601#comment-13053601 ] Chris Male commented on LUCENE-3232: Actually scrap that question, I'll put them somewhere else immediately. Move MutableValues to Queries Module Key: LUCENE-3232 URL: https://issues.apache.org/jira/browse/LUCENE-3232 Project: Lucene - Java Issue Type: Sub-task Components: core/search Reporter: Chris Male Fix For: 4.0 Attachments: LUCENE-3232.patch Solr makes use of the MutableValue* series of classes to improve performance of grouping by FunctionQuery (I think). As such they are used in ValueSource implementations. Consequently we need to move these classes in order to move the ValueSources. I'll also use this issue to establish the Queries module where the FunctionQueries will lie. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3232) Move MutableValues to Common Module
[ https://issues.apache.org/jira/browse/LUCENE-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Male updated LUCENE-3232: --- Summary: Move MutableValues to Common Module (was: Move MutableValues to Queries Module) Move MutableValues to Common Module --- Key: LUCENE-3232 URL: https://issues.apache.org/jira/browse/LUCENE-3232 Project: Lucene - Java Issue Type: Sub-task Components: core/search Reporter: Chris Male Fix For: 4.0 Attachments: LUCENE-3232.patch Solr makes use of the MutableValue* series of classes to improve performance of grouping by FunctionQuery (I think). As such they are used in ValueSource implementations. Consequently we need to move these classes in order to move the ValueSources. I'll also use this issue to establish the Queries module where the FunctionQueries will lie. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org