[jira] [Commented] (SOLR-4196) Untangle XML-specific nature of Config and Container classes

2013-01-22 Thread Andy Fowler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560418#comment-13560418
 ] 

Andy Fowler commented on SOLR-4196:
---

I noticed a possible bug in the Solr 4.1 release: if there are unloaded transient 
cores in solr.xml and a new core is created via the admin handler, the records of 
those cores in solr.xml are removed on persist. From some poking around at a few 
different issues, it seems that you're aware of this? Really excited about your 
work on LRU core management, Erick!

> Untangle XML-specific nature of Config and Container classes
> 
>
> Key: SOLR-4196
> URL: https://issues.apache.org/jira/browse/SOLR-4196
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Minor
> Fix For: 4.2, 5.0
>
> Attachments: SOLR-4196.patch, SOLR-4196.patch, SOLR-4196.patch, 
> SOLR-4196.patch, SOLR-4196.patch, SOLR-4196.patch
>
>
> sub-task for SOLR-4083. If we're going to try to obsolete solr.xml, we need 
> to pull all of the specific XML processing out of Config and Container. 
> Currently, we refer to xpaths all over the place. This JIRA is about 
> providing a thunking layer to isolate the XML-esque nature of solr.xml and 
> allow a simple properties file to be used instead which will lead, 
> eventually, to solr.xml going away.
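> 
> The idea above can be made concrete with a small sketch: instead of XPath 
> lookups into solr.xml, container settings would come from a flat properties 
> file behind a thunking layer. The key names below are purely illustrative; 
> they are not the format the patch actually defines.
> {code}
import java.io.StringReader;
import java.util.Properties;

public class SolrPropertiesSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical solr.properties replacing <cores><core .../></cores>
        // XPath lookups; key names are invented for illustration only.
        String contents = String.join("\n",
                "cores.collection1.instanceDir=collection1",
                "cores.collection1.transient=false");
        Properties props = new Properties();
        props.load(new StringReader(contents));

        // A thunking layer would answer the same questions Config answers
        // today, without callers knowing whether XML or properties back it.
        System.out.println(props.getProperty("cores.collection1.instanceDir")); // collection1
        System.out.println(Boolean.parseBoolean(
                props.getProperty("cores.collection1.transient")));             // false
    }
}
> {code}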

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-4330) group.sort is ignored when using truncate and ex/tag local params

2013-01-22 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved SOLR-4330.
--

   Resolution: Fixed
Fix Version/s: 3.6.3
   5.0
   4.2

> group.sort is ignored when using truncate and ex/tag local params
> -
>
> Key: SOLR-4330
> URL: https://issues.apache.org/jira/browse/SOLR-4330
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 3.6, 4.0, 4.1, 5.0
>Reporter: Koji Sekiguchi
>Assignee: Koji Sekiguchi
>Priority: Trivial
> Fix For: 4.2, 5.0, 3.6.3
>
> Attachments: SOLR-4330.patch, SOLR-4330.patch
>
>
> In parseParams method of SimpleFacets, as group sort is not set after 
> creating grouping object, member variable groupSort is always null. Because 
> of it, AbstractAllGroupHeadsCollector with default sort (new Sort()) is 
> created all the time.
> {code}
> public AbstractAllGroupHeadsCollector createAllGroupCollector() throws 
> IOException {
>   Sort sortWithinGroup = groupSort != null ? groupSort : new Sort();
>   return TermAllGroupHeadsCollector.create(groupBy, sortWithinGroup);
> }
> {code}
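> 
> A minimal standalone illustration of the failure mode: the null check in 
> createAllGroupCollector() masks the fact that groupSort is never assigned 
> after parsing params, so the default sort always wins. The Sort stand-in 
> below is a toy class, not Lucene's org.apache.lucene.search.Sort.
> {code}
public class GroupSortFallback {
    /** Toy stand-in for Lucene's Sort; RELEVANCE mirrors `new Sort()`. */
    static final class Sort {
        final String spec;
        Sort(String spec) { this.spec = spec; }
        static final Sort RELEVANCE = new Sort("score desc");
    }

    // Mirrors the fallback shown above: if groupSort was never assigned
    // (the bug), every collector silently gets the relevance sort.
    static Sort sortWithinGroup(Sort groupSort) {
        return groupSort != null ? groupSort : Sort.RELEVANCE;
    }

    public static void main(String[] args) {
        System.out.println(sortWithinGroup(null).spec);                  // score desc
        System.out.println(sortWithinGroup(new Sort("price asc")).spec); // price asc
    }
}
> {code}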




[jira] [Comment Edited] (SOLR-4336) 4.1 no longer treats blank request params the same way as 4.0

2013-01-22 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560361#comment-13560361
 ] 

Mark Miller edited comment on SOLR-4336 at 1/23/13 3:21 AM:


This is probably my fault - I think I made some changes for solr.xml that may 
have affected this. Just did not notice because it did not trip any tests most 
likely. This was so that you could set properties like 
{noformat}${property:}{noformat} and get the default value by not setting 
that property.

This is just a guess at the moment - the memory is still loose - but a 
direction to look.

  was (Author: markrmil...@gmail.com):
This is probably my fault - I think I made some changes for solr.xml that 
may have affected this. Just did not notice because it did not trip any tests 
most likely. This was so that you could set properties like ${property:} and 
get the default value by not setting that property.

This is just a guess at the moment - the memory is still loose - but a 
direction to look.
  
> 4.1 no longer treats blank request params the same way as 4.0
> -
>
> Key: SOLR-4336
> URL: https://issues.apache.org/jira/browse/SOLR-4336
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.1
>Reporter: Hoss Man
>
> I haven't figured out where/why this changed, but IRC user trmnlr pointed out 
> to me that a query like the following example works fine in Solr 4.0 and 
> results in the default value for "start", but produces a 
> 'NumberFormatException: For input string: ""' in Solr 4.1...
> http://localhost:8983/solr/select?q=*:*&start=&rows=10
> ...the code related to parsing the "start" param hasn't changed between 4.0 
> and 4.1, suggesting that this is a more general regression in behavior in 
> something lower level that probably affects all SolrParams -- anywhere 
> in the past that users might have safely specified an empty string value and 
> still gotten Solr's "default" value will likely now behave differently.




[jira] [Commented] (SOLR-4336) 4.1 no longer treats blank request params the same way as 4.0

2013-01-22 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560361#comment-13560361
 ] 

Mark Miller commented on SOLR-4336:
---

This is probably my fault - I think I made some changes for solr.xml that may 
have affected this. Just did not notice because it did not trip any tests most 
likely. This was so that you could set properties like ${property:} and get the 
default value by not setting that property.

This is just a guess at the moment - the memory is still loose - but a 
direction to look.
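
The ${property:} syntax mentioned above can be sketched as follows; this is an 
illustrative substitution routine, not Solr's actual implementation. An unset 
property falls back to whatever follows the colon, which may be the empty string.
{noformat}
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PropertySubstitution {
    // Expands ${name:default} tokens: if `name` is not in props, the text
    // after ':' (possibly empty) is substituted instead.
    static String resolve(String value, Map<String, String> props) {
        Matcher m = Pattern.compile("\\$\\{([^:}]+)(?::([^}]*))?\\}").matcher(value);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String def = m.group(2) == null ? "" : m.group(2);
            m.appendReplacement(sb, Matcher.quoteReplacement(
                    props.getOrDefault(m.group(1), def)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(resolve("${hostPort:8983}", Map.of()));        // 8983
        // ${hostPort:} with the property unset yields the empty string:
        System.out.println(resolve("${hostPort:}", Map.of()).isEmpty());  // true
    }
}
{noformat}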

> 4.1 no longer treats blank request params the same way as 4.0
> -
>
> Key: SOLR-4336
> URL: https://issues.apache.org/jira/browse/SOLR-4336
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.1
>Reporter: Hoss Man
>
> I haven't figured out where/why this changed, but IRC user trmnlr pointed out 
> to me that a query like the following example works fine in Solr 4.0 and 
> results in the default value for "start", but produces a 
> 'NumberFormatException: For input string: ""' in Solr 4.1...
> http://localhost:8983/solr/select?q=*:*&start=&rows=10
> ...the code related to parsing the "start" param hasn't changed between 4.0 
> and 4.1, suggesting that this is a more general regression in behavior in 
> something lower level that probably affects all SolrParams -- anywhere 
> in the past that users might have safely specified an empty string value and 
> still gotten Solr's "default" value will likely now behave differently.




[jira] [Created] (SOLR-4336) 4.1 no longer treats blank request params the same way as 4.0

2013-01-22 Thread Hoss Man (JIRA)
Hoss Man created SOLR-4336:
--

 Summary: 4.1 no longer treats blank request params the same way as 
4.0
 Key: SOLR-4336
 URL: https://issues.apache.org/jira/browse/SOLR-4336
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.1
Reporter: Hoss Man


I haven't figured out where/why this changed, but IRC user trmnlr pointed out 
to me that a query like the following example works fine in Solr 4.0 and 
results in the default value for "start", but produces a 
'NumberFormatException: For input string: ""' in Solr 4.1...

http://localhost:8983/solr/select?q=*:*&start=&rows=10

...the code related to parsing the "start" param hasn't changed between 4.0 and 
4.1, suggesting that this is a more general regression in behavior in something 
lower level that probably affects all SolrParams -- anywhere in the past that 
users might have safely specified an empty string value and still gotten Solr's 
"default" value will likely now behave differently.
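
A sketch of the 4.0-style behavior described above (not Solr's actual SolrParams 
code): a blank value is treated like a missing parameter and falls back to the 
default, whereas passing "" straight to Integer.parseInt is exactly what 
produces the NumberFormatException seen in 4.1.
{noformat}
public class BlankParamParsing {
    // Tolerant integer param lookup: null or empty means "use the default".
    static int getInt(String raw, int defaultValue) {
        if (raw == null || raw.isEmpty()) {
            return defaultValue;
        }
        return Integer.parseInt(raw); // "" here would throw NumberFormatException
    }

    public static void main(String[] args) {
        // start= (empty) behaves like an unset parameter:
        System.out.println(getInt("", 0));   // 0
        System.out.println(getInt("5", 0));  // 5
    }
}
{noformat}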




[jira] [Updated] (SOLR-4196) Untangle XML-specific nature of Config and Container classes

2013-01-22 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-4196:
-

Attachment: SOLR-4196.patch

Latest rev; includes persistence for solr.properties and individual core files.

Need to add persistence for transient cores that are not in memory when persist 
is called; that'll be coming soon.

> Untangle XML-specific nature of Config and Container classes
> 
>
> Key: SOLR-4196
> URL: https://issues.apache.org/jira/browse/SOLR-4196
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Minor
> Fix For: 4.2, 5.0
>
> Attachments: SOLR-4196.patch, SOLR-4196.patch, SOLR-4196.patch, 
> SOLR-4196.patch, SOLR-4196.patch, SOLR-4196.patch
>
>
> sub-task for SOLR-4083. If we're going to try to obsolete solr.xml, we need 
> to pull all of the specific XML processing out of Config and Container. 
> Currently, we refer to xpaths all over the place. This JIRA is about 
> providing a thunking layer to isolate the XML-esque nature of solr.xml and 
> allow a simple properties file to be used instead which will lead, 
> eventually, to solr.xml going away.




[jira] [Resolved] (SOLR-4334) null:org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.response.transform.EditorialMarkerFactory'

2013-01-22 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-4334.
--

Resolution: Invalid

Please raise this kind of issue on the users' list; if the consensus there 
is that you are seeing a bug, then open a JIRA.

Best
Erick

> null:org.apache.solr.common.SolrException: Error loading class 
> 'org.apache.solr.response.transform.EditorialMarkerFactory'
> --
>
> Key: SOLR-4334
> URL: https://issues.apache.org/jira/browse/SOLR-4334
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.0
> Environment: solr 4.0 final on tomcat 7.0.34
>Reporter: David Morana
> Fix For: 4.0
>
>
> Hi,
> I can't seem to get the doc transformer to work.
> I turned on the elevated query component and I tried to turn on the doc 
> transformer in the solrconfig.xml file to get [elevated] to appear in the 
> results.
> I tried both of these separately and I got the same error and the core would 
> not load:
> {noformat}
>  class="org.apache.solr.response.transform.EditorialMarkerFactory" />
>  class="org.apache.solr.response.transform.EditorialMarkerFactory" />
> {noformat}
> Please advise...




[jira] [Commented] (LUCENE-4609) Write a PackedIntsEncoder/Decoder for facets

2013-01-22 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560129#comment-13560129
 ] 

Michael McCandless commented on LUCENE-4609:


Ugh!  My DV total bytes numbers were too high: luceneutil also indexes the
title field as DV.  So ignore past byte sizes ... here are the [correct,
I hope!] byte sizes for the NO_PARENTS case, full 6.6M-doc Wikipedia en
index: DV (index) 151208 KB, int[] (in RAM): 305889 KB.  And
NO_PARENTS perf (base = trunk, comp = int[] collector):

{noformat}
Task                 QPS base  StdDev   QPS comp  StdDev      Pct diff
Wildcard                74.70  (3.3%)      74.32  (1.9%)   -0.5% (  -5% -    4%)
PKLookup               245.87  (1.8%)     244.80  (2.0%)   -0.4% (  -4% -    3%)
HighPhrase              15.68  (5.7%)      15.72  (6.4%)    0.2% ( -11% -   12%)
Respell                111.09  (3.5%)     111.33  (3.7%)    0.2% (  -6% -    7%)
AndHighLow              97.90  (1.6%)      98.16  (1.4%)    0.3% (  -2% -    3%)
LowSpanNear              7.62  (3.8%)       7.67  (3.5%)    0.7% (  -6% -    8%)
Prefix3                 45.94  (5.6%)      46.34  (2.7%)    0.9% (  -6% -    9%)
IntNRQ                  18.04  (8.2%)      18.20  (4.6%)    0.9% ( -11% -   14%)
LowSloppyPhrase         17.77  (2.9%)      17.94  (4.8%)    1.0% (  -6% -    8%)
Fuzzy2                  41.36  (2.4%)      42.68  (2.3%)    3.2% (  -1% -    8%)
LowPhrase               16.94  (2.4%)      17.65  (3.5%)    4.1% (  -1% -   10%)
HighSpanNear             2.98  (2.8%)       3.14  (2.1%)    5.3% (   0% -   10%)
AndHighMed              49.18  (1.0%)      51.97  (0.7%)    5.7% (   3% -    7%)
HighSloppyPhrase         0.90  (6.7%)       0.97 (12.6%)    6.8% ( -11% -   27%)
MedSloppyPhrase         18.54  (1.8%)      19.91  (3.0%)    7.4% (   2% -   12%)
MedSpanNear             19.86  (1.6%)      21.36  (2.0%)    7.5% (   3% -   11%)
MedPhrase               55.57  (2.2%)      60.31  (2.3%)    8.5% (   3% -   13%)
Fuzzy1                  33.38  (1.4%)      37.19  (1.9%)   11.4% (   8% -   14%)
AndHighHigh             12.58  (1.2%)      14.66  (0.9%)   16.6% (  14% -   18%)
LowTerm                 40.41  (1.2%)      47.14  (1.4%)   16.6% (  13% -   19%)
MedTerm                 23.00  (1.4%)      27.14  (3.0%)   18.0% (  13% -   22%)
OrHighMed                7.50  (2.2%)      10.16  (2.3%)   35.6% (  30% -   40%)
OrHighLow                7.55  (2.0%)      10.30  (2.8%)   36.3% (  30% -   41%)
HighTerm                 7.92  (1.9%)      10.98  (2.8%)   38.6% (  33% -   44%)
OrHighHigh               4.30  (2.7%)       6.39  (3.0%)   48.6% (  41% -   55%)
{noformat}


> Write a PackedIntsEncoder/Decoder for facets
> 
>
> Key: LUCENE-4609
> URL: https://issues.apache.org/jira/browse/LUCENE-4609
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/facet
>Reporter: Shai Erera
>Priority: Minor
> Attachments: LUCENE-4609.patch, LUCENE-4609.patch, LUCENE-4609.patch, 
> LUCENE-4609.patch
>
>
> Today the facets API lets you write IntEncoder/Decoder to encode/decode the 
> category ordinals. We have several such encoders, including VInt (default), 
> and block encoders.
> It would be interesting to implement and benchmark a 
> PackedIntsEncoder/Decoder, with potentially two variants: (1) receives 
> bitsPerValue up front, when you e.g. know that you have a small taxonomy and 
> the max value you can see and (2) one that decides for each doc on the 
> optimal bitsPerValue, writes it as a header in the byte[] or something.




[jira] [Updated] (SOLR-4317) SolrTestCaseJ4: Can't avoid "collection1" convention

2013-01-22 Thread Tricia Jenkins (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tricia Jenkins updated SOLR-4317:
-

Attachment: (was: SOLR-4317.patch)

> SolrTestCaseJ4: Can't avoid "collection1" convention
> 
>
> Key: SOLR-4317
> URL: https://issues.apache.org/jira/browse/SOLR-4317
> Project: Solr
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 4.0
>Reporter: Tricia Jenkins
>Priority: Minor
> Fix For: 4.2, 5.0
>
> Attachments: SOLR-4317.patch, SOLR-4317.patch
>
>
> There is still an issue after the SOLR-3826 patch was applied for 4.0 
> [https://issues.apache.org/jira/browse/SOLR-3826] in September 2012.  When 
> TestHarness is called from SolrTestCaseJ4, the only available constructors 
> ignore coreName, set coreName = null, and initialize the default 
> 'collection1.'




[jira] [Issue Comment Deleted] (SOLR-4317) SolrTestCaseJ4: Can't avoid "collection1" convention

2013-01-22 Thread Tricia Jenkins (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tricia Jenkins updated SOLR-4317:
-

Comment: was deleted

(was: Missing a file :()

> SolrTestCaseJ4: Can't avoid "collection1" convention
> 
>
> Key: SOLR-4317
> URL: https://issues.apache.org/jira/browse/SOLR-4317
> Project: Solr
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 4.0
>Reporter: Tricia Jenkins
>Priority: Minor
> Fix For: 4.2, 5.0
>
> Attachments: SOLR-4317.patch, SOLR-4317.patch
>
>
> There is still an issue after the SOLR-3826 patch was applied for 4.0 
> [https://issues.apache.org/jira/browse/SOLR-3826] in September 2012.  When 
> TestHarness is called from SolrTestCaseJ4, the only available constructors 
> ignore coreName, set coreName = null, and initialize the default 
> 'collection1.'




[jira] [Updated] (SOLR-4317) SolrTestCaseJ4: Can't avoid "collection1" convention

2013-01-22 Thread Tricia Jenkins (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tricia Jenkins updated SOLR-4317:
-

Attachment: SOLR-4317.patch

Missing a file :(

> SolrTestCaseJ4: Can't avoid "collection1" convention
> 
>
> Key: SOLR-4317
> URL: https://issues.apache.org/jira/browse/SOLR-4317
> Project: Solr
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 4.0
>Reporter: Tricia Jenkins
>Priority: Minor
> Fix For: 4.2, 5.0
>
> Attachments: SOLR-4317.patch, SOLR-4317.patch, SOLR-4317.patch
>
>
> There is still an issue after the SOLR-3826 patch was applied for 4.0 
> [https://issues.apache.org/jira/browse/SOLR-3826] in September 2012.  When 
> TestHarness is called from SolrTestCaseJ4, the only available constructors 
> ignore coreName, set coreName = null, and initialize the default 
> 'collection1.'




[jira] [Commented] (LUCENE-4707) Track file reference kept by readers that are opened through the writer

2013-01-22 Thread Jessica Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560062#comment-13560062
 ] 

Jessica Cheng commented on LUCENE-4707:
---

Hi Michael,

Thanks for the quick response!

Our use is atypical--we implemented Directory on top of Cassandra, so we don't 
have the OS-level file protection that a normal user would have. We basically see 
an exception when the reader tries to load a file from Cassandra and it's not 
there anymore because it's been deleted.

I suppose that most "abnormal" uses are unsupported, but this seems like a 
simple/small change (admittedly without understanding deeper Lucene 
complexity), so maybe it can be considered? Alternatively, if there's any way to 
expose the deleter's incRef and decRef through the API (with a strong "use it at 
your own risk" caveat), we can also manage the reader refs ourselves.

Thanks!

> Track file reference kept by readers that are opened through the writer
> ---
>
> Key: LUCENE-4707
> URL: https://issues.apache.org/jira/browse/LUCENE-4707
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 4.0
> Environment: Mac OS X 10.8.2 and Linux 2.6.32
>Reporter: Jessica Cheng
>
> We ran into a bug where files (mostly CFS) that are still referred to by our 
> NRT reader/searcher are deleted by IndexFileDeleter. As far as I can see from 
> the verbose logging and reading the code, it seems that the problem is the 
> creation and merging of these CFS files between hard commits. The files 
> referred to by hard commits are incRef’ed at commit checkpoints, so these 
> files won’t be deleted until they are decRef’ed when the commit is deleted 
> according to the DeletionPolicy (good). However, intermediate files that are 
> created and merged between the hard commits only have refs through the 
> regular checkpoints, so as soon as a new checkpoint no longer includes those 
> files, they are immediately deleted by the deleter. See the abridged verbose 
> log lines that illustrate this behavior:
> IW 11 [Mon Jan 21 17:30:35 PST 2013; commitScheduler]: create compound file 
> _8.cfs
> IFD 7 [Mon Jan 21 17:23:41 PST 2013; commitScheduler]: now checkpoint 
> "_0(4.0.0.2):C3 _1(4.0.0.2):C7 _2(4.0.0.2):C16 _3(4.0.0.2):C21 _4(4.0.0.2):C5 
> _5(4.0.0.2):C5 _6(4.0.0.2):C5 _7(4.0.0.2):C7 _8(4.0.0.2):c6" [9 segments ; 
> isCommit = false]
> IFD 7 [Mon Jan 21 17:23:41 PST 2013; commitScheduler]:   IncRef "_8.cfs": 
> pre-incr count is 0
> IFD 7 [Mon Jan 21 17:23:42 PST 2013; commitScheduler]: now checkpoint 
> "_0(4.0.0.2):C3 _1(4.0.0.2):C7 _2(4.0.0.2):C16 _3(4.0.0.2):C21 _4(4.0.0.2):C5 
> _5(4.0.0.2):C5 _6(4.0.0.2):C5 _7(4.0.0.2):C7 _8(4.0.0.2):c6 _9(4.0.0.2):c6" 
> [10 segments ; isCommit = false]
> IFD 7 [Mon Jan 21 17:23:42 PST 2013; commitScheduler]:   IncRef "_8.cfs": 
> pre-incr count is 1
> IFD 7 [Mon Jan 21 17:23:42 PST 2013; commitScheduler]:   DecRef "_8.cfs": 
> pre-decr count is 2
> IFD 7 [Mon Jan 21 17:23:42 PST 2013; Lucene Merge Thread #0]: now checkpoint 
> "_b(4.0.0.2):C81" [1 segments ; isCommit = false]
> IFD 7 [Mon Jan 21 17:23:42 PST 2013; Lucene Merge Thread #0]:   DecRef 
> "_8.cfs": pre-decr count is 1
> IFD 7 [Mon Jan 21 17:23:42 PST 2013; Lucene Merge Thread #0]: delete "_8.cfs"
> With this behavior, it seems no matter how frequently we refresh the reader 
> (unless we do it at every read), we’d run into the race where the reader 
> still holds a reference to the file that’s just been deleted by the deleter. 
> My proposal is to count the file reference handed out to the NRT 
> reader/searcher when writer.getReader(boolean) is called and decRef the files 
> only when the said reader is closed.
> Please take a look and evaluate if my observations are correct and if the 
> proposal makes sense. Thanks!
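> 
> The proposal in the last paragraph can be sketched as plain reference 
> counting (illustrative only; Lucene's IndexFileDeleter is considerably more 
> involved): the NRT reader takes its own ref when the writer hands out files, 
> so a checkpoint decRef alone can no longer drop the count to zero.
> {code}
import java.util.HashMap;
import java.util.Map;

public class FileRefCounts {
    private final Map<String, Integer> refCounts = new HashMap<>();

    void incRef(String file) {
        refCounts.merge(file, 1, Integer::sum);
    }

    /** Returns true when the count drops to zero, i.e. the file may be deleted. */
    boolean decRef(String file) {
        Integer count = refCounts.computeIfPresent(file, (f, c) -> c - 1);
        if (count != null && count == 0) {
            refCounts.remove(file);
            return true; // no checkpoint or open reader references it any more
        }
        return false;
    }

    public static void main(String[] args) {
        FileRefCounts deleter = new FileRefCounts();
        deleter.incRef("_8.cfs"); // regular checkpoint
        deleter.incRef("_8.cfs"); // NRT reader handed out by the writer
        System.out.println(deleter.decRef("_8.cfs")); // false: reader still holds it
        System.out.println(deleter.decRef("_8.cfs")); // true: safe to delete now
    }
}
> {code}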




[jira] [Commented] (SOLR-4317) SolrTestCaseJ4: Can't avoid "collection1" convention

2013-01-22 Thread Tricia Jenkins (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560050#comment-13560050
 ] 

Tricia Jenkins commented on SOLR-4317:
--

There is still a problem: jars from a sharedLib directory defined in 
solr.xml, outside of the coreName's dataDir, are not available to the coreName's 
TestHarness, despite being logged as added to the classpath:
1200 T11 oasc.SolrResourceLoader.replaceClassLoader Adding 
'file:/C:/Development/workspace/peel-solr/src/solr.home/lib/my-solrplugins.jar' 
to classloader

I'm getting: SEVERE Full Import failed:java.lang.RuntimeException: 
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to load 
EntityProcessor implementation for entity:properties Processing Document # 1
Caused by: java.lang.ClassNotFoundException: Unable to load MyEntityProcessor 
or org.apache.solr.handler.dataimport.PropertiesEntityProcessor

The workaround is to not use sharedLib.

> SolrTestCaseJ4: Can't avoid "collection1" convention
> 
>
> Key: SOLR-4317
> URL: https://issues.apache.org/jira/browse/SOLR-4317
> Project: Solr
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 4.0
>Reporter: Tricia Jenkins
>Priority: Minor
> Fix For: 4.2, 5.0
>
> Attachments: SOLR-4317.patch, SOLR-4317.patch
>
>
> There is still an issue after the SOLR-3826 patch was applied for 4.0 
> [https://issues.apache.org/jira/browse/SOLR-3826] in September 2012.  When 
> TestHarness is called from SolrTestCaseJ4, the only available constructors 
> ignore coreName, set coreName = null, and initialize the default 
> 'collection1.'




[jira] [Commented] (LUCENE-4609) Write a PackedIntsEncoder/Decoder for facets

2013-01-22 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560048#comment-13560048
 ] 

Michael McCandless commented on LUCENE-4609:


The above results were 1M index; here's the full wikipedia en (6.6M docs) 
results:
{noformat}
Task                 QPS base  StdDev   QPS comp  StdDev      Pct diff
HighSpanNear             2.91  (2.1%)       2.90  (2.4%)   -0.6% (  -5% -    4%)
Prefix3                 46.35  (4.0%)      46.07  (3.9%)   -0.6% (  -8% -    7%)
PKLookup               240.11  (1.4%)     238.95  (1.9%)   -0.5% (  -3% -    2%)
Wildcard                73.79  (2.2%)      73.48  (2.3%)   -0.4% (  -4% -    4%)
IntNRQ                  18.05  (6.1%)      18.01  (5.9%)   -0.2% ( -11% -   12%)
Respell                 96.78  (3.1%)      98.09  (3.3%)    1.3% (  -4% -    7%)
LowSloppyPhrase         17.63  (4.4%)      17.91  (3.8%)    1.6% (  -6% -   10%)
AndHighLow             108.80  (2.8%)     110.58  (4.2%)    1.6% (  -5% -    8%)
LowSpanNear              7.53  (4.8%)       7.67  (5.6%)    1.8% (  -8% -   12%)
HighSloppyPhrase         0.87 (10.1%)       0.90  (9.6%)    3.2% ( -14% -   25%)
Fuzzy2                  42.22  (2.5%)      43.90  (2.7%)    4.0% (  -1% -    9%)
HighPhrase              15.32  (7.5%)      15.93  (5.4%)    4.0% (  -8% -   18%)
LowPhrase               17.09  (4.3%)      18.10  (2.9%)    5.9% (  -1% -   13%)
AndHighMed              52.60  (1.4%)      55.90  (2.1%)    6.3% (   2% -    9%)
MedSpanNear             20.09  (2.0%)      21.44  (1.8%)    6.7% (   2% -   10%)
MedSloppyPhrase         18.69  (3.0%)      20.00  (2.7%)    7.0% (   1% -   13%)
Fuzzy1                  33.68  (2.0%)      37.26  (2.2%)   10.6% (   6% -   15%)
MedPhrase               57.00  (2.9%)      63.56  (3.3%)   11.5% (   5% -   18%)
MedTerm                 19.22  (1.2%)      21.70  (1.1%)   12.9% (  10% -   15%)
LowTerm                 41.98  (1.2%)      48.26  (1.8%)   15.0% (  11% -   18%)
AndHighHigh             12.09  (1.0%)      13.98  (1.2%)   15.7% (  13% -   18%)
HighTerm                 7.11  (2.1%)       9.11  (2.0%)   28.1% (  23% -   32%)
OrHighMed                6.67  (2.4%)       8.55  (2.1%)   28.2% (  23% -   33%)
OrHighLow                6.76  (2.1%)       8.70  (2.3%)   28.6% (  23% -   33%)
OrHighHigh               3.84  (2.5%)       5.33  (2.7%)   38.7% (  32% -   45%)
{noformat}

On-disk size of _dv* is 464768 KB and in memory int[] is 669428 KB (44% more).

Next I'll try NO_PARENTS ord policy...

> Write a PackedIntsEncoder/Decoder for facets
> 
>
> Key: LUCENE-4609
> URL: https://issues.apache.org/jira/browse/LUCENE-4609
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/facet
>Reporter: Shai Erera
>Priority: Minor
> Attachments: LUCENE-4609.patch, LUCENE-4609.patch, LUCENE-4609.patch, 
> LUCENE-4609.patch
>
>
> Today the facets API lets you write IntEncoder/Decoder to encode/decode the 
> category ordinals. We have several such encoders, including VInt (default), 
> and block encoders.
> It would be interesting to implement and benchmark a 
> PackedIntsEncoder/Decoder, with potentially two variants: (1) receives 
> bitsPerValue up front, when you e.g. know that you have a small taxonomy and 
> the max value you can see and (2) one that decides for each doc on the 
> optimal bitsPerValue, writes it as a header in the byte[] or something.




[jira] [Commented] (LUCENE-4550) For extremely wide shapes (> 180 degrees) distErrPct is not used correctly

2013-01-22 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560046#comment-13560046
 ] 

Commit Tag Bot commented on LUCENE-4550:


[branch_4x commit] David Wayne Smiley
http://svn.apache.org/viewvc?view=revision&revision=1437185

LUCENE-4550: fix SpatialArgs.calcDistanceFromErrPct


> For extremely wide shapes (> 180 degrees) distErrPct is not used correctly
> --
>
> Key: LUCENE-4550
> URL: https://issues.apache.org/jira/browse/LUCENE-4550
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/spatial
>Affects Versions: 4.0
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
> Fix For: 4.2, 5.0
>
> Attachments: LUCENE-4550__fix_SpatialArgs_calcDistanceFromErrPct.patch
>
>
> When a shape is given to a PrefixTreeStrategy (index or query time), it needs 
> to know how many levels down the prefix tree to go for a target precision 
> (distErrPct).  distErrPct is basically a fraction of the radius of the shape, 
> defaulting to 2.5% (0.025).
> If the shape presented is extremely wide, > 180 degrees, then the internal 
> calculations in SpatialArgs.calcDistanceFromErrPct(...) will wrongly measure 
> the shape's size as having width < 180 degrees, yielding *more* accuracy than 
> intended.  Given that this happens for unrealistic shape sizes and results in 
> more accuracy, I am flagging this as "minor", but a bug nonetheless.  Indeed, 
> this was discovered as a result of someone using lucene-spatial incorrectly, 
> not for an actual shape they have.  But in the extreme \[erroneous\] case 
> they had, they had 566k terms (!) generated, when it should have been ~1k 
> tops. 
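> 
> A toy version of the calculation under discussion (the real logic lives in 
> SpatialArgs.calcDistanceFromErrPct and also handles degrees-to-distance 
> conversion): the allowed error is distErrPct times an approximate shape 
> radius, so under-measuring a >180-degree width shrinks that radius and 
> yields unintended extra precision.
> {code}
public class DistErrSketch {
    // Toy approximation: treat half the larger bounding-box dimension (in
    // degrees) as the shape's "radius" and take distErrPct of it.
    static double distErr(double widthDeg, double heightDeg, double distErrPct) {
        return distErrPct * Math.max(widthDeg, heightDeg) / 2.0;
    }

    public static void main(String[] args) {
        // Correctly measured 270-degree-wide shape:
        System.out.println(distErr(270, 90, 0.025)); // 3.375
        // The bug: a width > 180 wrongly wraps down (270 -> 90), so the
        // computed error shrinks and far more cells/terms are generated.
        System.out.println(distErr(90, 90, 0.025));  // 1.125
    }
}
> {code}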




[jira] [Created] (LUCENE-4708) Make LZ4 hash tables reusable

2013-01-22 Thread Adrien Grand (JIRA)
Adrien Grand created LUCENE-4708:


 Summary: Make LZ4 hash tables reusable
 Key: LUCENE-4708
 URL: https://issues.apache.org/jira/browse/LUCENE-4708
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor


Currently LZ4 compressors instantiate their own hash table for every byte 
sequence they need to compress. These can be large (256KB for LZ4 HC) so we 
should try to reuse them across calls.
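A minimal plain-Java sketch of the proposed reuse (hypothetical class and sizes, not the actual Lucene LZ4 code): hold the hash table in a field, allocate it once, and clear it per call instead of reallocating a large array for every compressed sequence.

```java
import java.util.Arrays;

// Sketch: reuse one hash table across compress() calls instead of
// allocating a fresh array for every byte sequence.
public class ReusableHashTableCompressor {
    private static final int HASH_TABLE_SIZE = 1 << 14; // illustrative size

    // Allocated lazily on first use, then reused by every later call.
    private int[] hashTable;

    public void compress(byte[] input) {
        if (hashTable == null) {
            hashTable = new int[HASH_TABLE_SIZE];
        } else {
            // Cheaper than `new int[...]`: no fresh allocation, no GC pressure.
            Arrays.fill(hashTable, 0);
        }
        // ... real LZ4 match finding would probe hashTable here ...
    }

    /** Demonstrates that the same array instance survives across calls. */
    public boolean tableReused() {
        compress(new byte[]{1});
        int[] first = hashTable;
        compress(new byte[]{2});
        return first == hashTable;
    }

    public static void main(String[] args) {
        System.out.println(new ReusableHashTableCompressor().tableReused()); // prints "true"
    }
}
```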




[jira] [Resolved] (LUCENE-4550) For extremely wide shapes (> 180 degrees) distErrPct is not used correctly

2013-01-22 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved LUCENE-4550.
--

   Resolution: Fixed
Fix Version/s: 5.0
   4.2

> For extremely wide shapes (> 180 degrees) distErrPct is not used correctly
> --
>
> Key: LUCENE-4550
> URL: https://issues.apache.org/jira/browse/LUCENE-4550
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/spatial
>Affects Versions: 4.0
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
> Fix For: 4.2, 5.0
>
> Attachments: LUCENE-4550__fix_SpatialArgs_calcDistanceFromErrPct.patch
>
>
> When a shape is given to a PrefixTreeStrategy (index or query time), it needs 
> to know how many levels down the prefix tree to go for a target precision 
> (distErrPct).  distErrPct is basically a fraction of the radius of the shape, 
> defaulting to 2.5% (0.0025).
> If the shape presented is extremely wide, > 180 degrees, then the internal 
> calculations in SpatialArgs.calcDistanceFromErrPct(...) will wrongly measure 
> the shape's size as having width < 180 degrees, yielding *more* accuracy than 
> intended.  Given that this happens for unrealistic shape sizes and results in 
> more accuracy, I am flagging this as "minor", but a bug nonetheless.  Indeed, 
> this was discovered as a result of someone using lucene-spatial incorrectly, 
> not for an actual shape they have.  But in the extreme \[erroneous\] case 
> they had, they had 566k terms (!) generated, when it should have been ~1k 
> tops. 




[jira] [Commented] (LUCENE-4550) For extremely wide shapes (> 180 degrees) distErrPct is not used correctly

2013-01-22 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560045#comment-13560045
 ] 

Commit Tag Bot commented on LUCENE-4550:


[trunk commit] David Wayne Smiley
http://svn.apache.org/viewvc?view=revision&revision=1437182

LUCENE-4550: fix SpatialArgs.calcDistanceFromErrPct


> For extremely wide shapes (> 180 degrees) distErrPct is not used correctly
> --
>
> Key: LUCENE-4550
> URL: https://issues.apache.org/jira/browse/LUCENE-4550
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/spatial
>Affects Versions: 4.0
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
> Attachments: LUCENE-4550__fix_SpatialArgs_calcDistanceFromErrPct.patch
>
>
> When a shape is given to a PrefixTreeStrategy (index or query time), it needs 
> to know how many levels down the prefix tree to go for a target precision 
> (distErrPct).  distErrPct is basically a fraction of the radius of the shape, 
> defaulting to 2.5% (0.0025).
> If the shape presented is extremely wide, > 180 degrees, then the internal 
> calculations in SpatialArgs.calcDistanceFromErrPct(...) will wrongly measure 
> the shape's size as having width < 180 degrees, yielding *more* accuracy than 
> intended.  Given that this happens for unrealistic shape sizes and results in 
> more accuracy, I am flagging this as "minor", but a bug nonetheless.  Indeed, 
> this was discovered as a result of someone using lucene-spatial incorrectly, 
> not for an actual shape they have.  But in the extreme \[erroneous\] case 
> they had, they had 566k terms (!) generated, when it should have been ~1k 
> tops. 




[jira] [Commented] (LUCENE-4707) Track file reference kept by readers that are opened through the writer

2013-01-22 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560019#comment-13560019
 ] 

Michael McCandless commented on LUCENE-4707:


This is normal/expected behavior: the readers will hold open the files they 
need, and even though the writer has deleted them, the reader can continue to 
use them. Once the reader is closed the files will be deleted "for real".

Or, is there some end problem/exception you're seeing?

> Track file reference kept by readers that are opened through the writer
> ---
>
> Key: LUCENE-4707
> URL: https://issues.apache.org/jira/browse/LUCENE-4707
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 4.0
> Environment: Mac OS X 10.8.2 and Linux 2.6.32
>Reporter: Jessica Cheng
>
> We ran into a bug where files (mostly CFS) that are still referred to by our 
> NRT reader/searcher are deleted by IndexFileDeleter. As far as I can see from 
> the verbose logging and reading the code, it seems that the problem is the 
> creation and merging of these CFS files between hard commits. The files 
> referred to by hard commits are incRef’ed at commit checkpoints, so these 
> files won’t be deleted until they are decRef’ed when the commit is deleted 
> according to the DeletionPolicy (good). However, intermediate files that are 
> created and merged between the hard commits only have refs through the 
> regular checkpoints, so as soon as a new checkpoint no longer includes those 
> files, they are immediately deleted by the deleter. See the abridged verbose 
> log lines that illustrate this behavior:
> IW 11 [Mon Jan 21 17:30:35 PST 2013; commitScheduler]: create compound file 
> _8.cfs
> IFD 7 [Mon Jan 21 17:23:41 PST 2013; commitScheduler]: now checkpoint 
> "_0(4.0.0.2):C3_1(4.0.0.2):C7 _2(4.0.0.2):C16 _3(4.0.0.2):C21 _4(4.0.0.2):C5 
> _5(4.0.0.2):C5_6(4.0.0.2):C5 _7(4.0.0.2):C7 _8(4.0.0.2):c6" [9 segments ; 
> isCommit = false]
> IFD 7 [Mon Jan 21 17:23:41 PST 2013; commitScheduler]:   IncRef "_8.cfs": 
> pre-incr count is 0
> IFD 7 [Mon Jan 21 17:23:42 PST 2013; commitScheduler]: now checkpoint 
> "_0(4.0.0.2):C3_1(4.0.0.2):C7 _2(4.0.0.2):C16 _3(4.0.0.2):C21 _4(4.0.0.2):C5 
> _5(4.0.0.2):C5 _6(4.0.0.2):C5 _7(4.0.0.2):C7 _8(4.0.0.2):c6 _9(4.0.0.2):c6" 
> [10 segments ; isCommit = false]
> IFD 7 [Mon Jan 21 17:23:42 PST 2013; commitScheduler]:   IncRef "_8.cfs": 
> pre-incr count is 1
> IFD 7 [Mon Jan 21 17:23:42 PST 2013; commitScheduler]:   DecRef "_8.cfs": 
> pre-decr count is 2
> IFD 7 [Mon Jan 21 17:23:42 PST 2013; Lucene Merge Thread #0]: now checkpoint 
> "_b(4.0.0.2):C81" [1 segments ; isCommit = false]
> IFD 7 [Mon Jan 21 17:23:42 PST 2013; Lucene Merge Thread #0]:   DecRef 
> "_8.cfs": pre-decr count is 1
> IFD 7 [Mon Jan 21 17:23:42 PST 2013; Lucene Merge Thread #0]: delete "_8.cfs"
> With this behavior, it seems no matter how frequently we refresh the reader 
> (unless we do it at every read), we’d run into the race where the reader 
> still holds a reference to the file that’s just been deleted by the deleter. 
> My proposal is to count the file reference handed out to the NRT 
> reader/searcher when writer.getReader(boolean) is called and decRef the files 
> only when the said reader is closed.
> Please take a look and evaluate if my observations are correct and if the 
> proposal makes sense. Thanks!




[jira] [Updated] (SOLR-4317) SolrTestCaseJ4: Can't avoid "collection1" convention

2013-01-22 Thread Tricia Jenkins (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tricia Jenkins updated SOLR-4317:
-

Description: There is still an issue after the SOLR-3826 patch was applied 
for 4.0 [https://issues.apache.org/jira/browse/SOLR-3826] in September 2012.  
When TestHarness is called from SolrTestCaseJ4 the only available constructors 
ignore coreName, set coreName = null, and initialize the default 'collection1.' 
 (was: I think that there is still an issue after the SOLR-3826 patch was 
applied for 4.0 [https://issues.apache.org/jira/browse/SOLR-3826] in September 
2012.  This line is missing:

Index: solr/test-framework/src/java/org/apache/solr/SolrTestCaseJ4.java
===
--- solr/test-framework/src/java/org/apache/solr/SolrTestCaseJ4.java
(revision 1435375)
+++ solr/test-framework/src/java/org/apache/solr/SolrTestCaseJ4.java
(working copy)
@@ -384,9 +384,9 @@
   public static void createCore() {
 assertNotNull(testSolrHome);
 solrConfig = TestHarness.createConfig(testSolrHome, coreName, 
getSolrConfigFile());
-h = new TestHarness( dataDir.getAbsolutePath(),
+h = new TestHarness( coreName, new Initializer( coreName, 
dataDir.getAbsolutePath(),
 solrConfig,
-getSchemaFile());
+getSchemaFile() ) );
 lrf = h.getRequestFactory
 ("standard",0,20,CommonParams.VERSION,"2.2");
   }


TestHarness( String dataDirectory,SolrConfig solrConfig, IndexSchema 
indexSchema) sets coreName to null and opens the default core: collection1.  I 
would expect that coreName is carried all the way through the test.)

> SolrTestCaseJ4: Can't avoid "collection1" convention
> 
>
> Key: SOLR-4317
> URL: https://issues.apache.org/jira/browse/SOLR-4317
> Project: Solr
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 4.0
>Reporter: Tricia Jenkins
>Priority: Minor
> Fix For: 4.2, 5.0
>
> Attachments: SOLR-4317.patch, SOLR-4317.patch
>
>
> There is still an issue after the SOLR-3826 patch was applied for 4.0 
> [https://issues.apache.org/jira/browse/SOLR-3826] in September 2012.  When 
> TestHarness is called from SolrTestCaseJ4 the only available constructors 
> ignore coreName, set coreName = null, and initialize the default 
> 'collection1.'




[jira] [Commented] (LUCENE-4609) Write a PackedIntsEncoder/Decoder for facets

2013-01-22 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560014#comment-13560014
 ] 

Michael McCandless commented on LUCENE-4609:


{quote}
That's right. But if we'll see net gains, it doesn't mean anything about how it 
will perform on small set of integers.
That's why I think this test has nothing to do w/ the Encoder/Decoder.
{quote}

Ahh, I see: you're right.

If this prototype collector is faster, it's not clear how we'd "productize" it.  
Maybe as multi-valued DV (int[] per doc), which could then use a big packed ints 
array under the hood or something ...

> Write a PackedIntsEncoder/Decoder for facets
> 
>
> Key: LUCENE-4609
> URL: https://issues.apache.org/jira/browse/LUCENE-4609
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/facet
>Reporter: Shai Erera
>Priority: Minor
> Attachments: LUCENE-4609.patch, LUCENE-4609.patch, LUCENE-4609.patch, 
> LUCENE-4609.patch
>
>
> Today the facets API lets you write IntEncoder/Decoder to encode/decode the 
> category ordinals. We have several such encoders, including VInt (default), 
> and block encoders.
> It would be interesting to implement and benchmark a 
> PackedIntsEncoder/Decoder, with potentially two variants: (1) receives 
> bitsPerValue up front, when you e.g. know that you have a small taxonomy and 
> the max value you can see and (2) one that decides for each doc on the 
> optimal bitsPerValue, writes it as a header in the byte[] or something.
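Variant (2) above can be sketched in plain Java (hypothetical format, not the Lucene PackedInts API): each document's byte[] starts with a small header recording bitsPerValue chosen for that doc, followed by the ordinals packed at that width.

```java
import java.util.Arrays;

// Sketch of variant (2) from the description: per-document packed encoding
// where the first byte records bitsPerValue for that doc's ordinals.
// Assumes at most 255 ordinals per doc for the one-byte count header.
public class PerDocPackedSketch {
    static int bitsRequired(int maxValue) {
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(maxValue));
    }

    /** Encodes: [bitsPerValue][count][packed values, MSB-first]. */
    static byte[] encode(int[] ords) {
        int max = 0;
        for (int o : ords) max = Math.max(max, o);
        int bpv = bitsRequired(max);
        long totalBits = (long) bpv * ords.length;
        byte[] out = new byte[2 + (int) ((totalBits + 7) / 8)];
        out[0] = (byte) bpv;
        out[1] = (byte) ords.length;
        long bitPos = 0;
        for (int o : ords) {
            for (int b = bpv - 1; b >= 0; b--, bitPos++) {
                if (((o >>> b) & 1) != 0) {
                    out[2 + (int) (bitPos >>> 3)] |= (byte) (0x80 >>> (int) (bitPos & 7));
                }
            }
        }
        return out;
    }

    static int[] decode(byte[] data) {
        int bpv = data[0], count = data[1] & 0xFF;
        int[] ords = new int[count];
        long bitPos = 0;
        for (int i = 0; i < count; i++) {
            int v = 0;
            for (int b = 0; b < bpv; b++, bitPos++) {
                int bit = (data[2 + (int) (bitPos >>> 3)] >>> (7 - (int) (bitPos & 7))) & 1;
                v = (v << 1) | bit;
            }
            ords[i] = v;
        }
        return ords;
    }

    public static void main(String[] args) {
        int[] ords = {3, 7, 1, 5}; // max is 7, so 3 bits per value suffice
        System.out.println(Arrays.toString(decode(encode(ords)))); // prints "[3, 7, 1, 5]"
    }
}
```

Variant (1) would simply fix bpv globally from the known taxonomy size instead of computing it per document.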




[jira] [Updated] (LUCENE-4609) Write a PackedIntsEncoder/Decoder for facets

2013-01-22 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-4609:
---

Attachment: LUCENE-4609.patch

New prototype collector, this time using simple int[] instead of PackedInts.

Trunk (base) vs prototype collector (comp):
{noformat}
                Task    QPS base      StdDev    QPS comp      StdDev      Pct diff
              IntNRQ      114.81      (6.2%)      112.35      (8.4%)   -2.1% ( -15% -   13%)
             Prefix3      176.77      (4.7%)      173.10      (7.4%)   -2.1% ( -13% -   10%)
            Wildcard      254.90      (3.2%)      250.81      (3.3%)   -1.6% (  -7% -    5%)
          AndHighLow      371.35      (2.6%)      366.23      (2.3%)   -1.4% (  -6% -    3%)
            PKLookup      302.90      (1.7%)      299.45      (1.7%)   -1.1% (  -4% -    2%)
             Respell      143.44      (3.1%)      143.18      (3.4%)   -0.2% (  -6% -    6%)
              Fuzzy2       86.16      (2.0%)       88.32      (3.1%)    2.5% (  -2% -    7%)
     LowSloppyPhrase       67.41      (1.8%)       69.45      (2.9%)    3.0% (  -1% -    7%)
         LowSpanNear       37.85      (2.6%)       39.38      (3.0%)    4.0% (  -1% -    9%)
        HighSpanNear       10.19      (2.6%)       10.62      (3.2%)    4.2% (  -1% -   10%)
             MedTerm      111.19      (1.4%)      117.18      (1.6%)    5.4% (   2% -    8%)
              Fuzzy1       83.60      (2.5%)       88.65      (2.8%)    6.0% (   0% -   11%)
          AndHighMed      171.63      (1.4%)      182.81      (2.0%)    6.5% (   3% -   10%)
         MedSpanNear       64.59      (2.0%)       69.13      (2.1%)    7.0% (   2% -   11%)
           LowPhrase       57.89      (5.3%)       63.54      (4.5%)    9.8% (   0% -   20%)
          HighPhrase       37.97     (11.0%)       41.79      (8.3%)   10.1% (  -8% -   32%)
     MedSloppyPhrase       63.51      (2.0%)       70.31      (3.2%)   10.7% (   5% -   16%)
             LowTerm      145.85      (1.5%)      169.28      (1.6%)   16.1% (  12% -   19%)
    HighSloppyPhrase        2.97      (8.4%)        3.47     (12.4%)   16.6% (  -3% -   40%)
         AndHighHigh       46.49      (1.0%)       54.30      (1.2%)   16.8% (  14% -   19%)
           MedPhrase      101.99      (4.1%)      128.31      (4.7%)   25.8% (  16% -   36%)
           OrHighMed       24.97      (1.7%)       35.04      (3.6%)   40.3% (  34% -   46%)
            HighTerm       26.22      (1.2%)       37.55      (3.6%)   43.2% (  38% -   48%)
           OrHighLow       24.31      (1.5%)       34.89      (3.8%)   43.5% (  37% -   49%)
          OrHighHigh       17.72      (1.4%)       26.44      (4.5%)   49.3% (  42% -   55%)
{noformat}

So this is at least good news ... it means if we can speed up decode there are 
gains to be had ... but RAM usage is now 105231 KB (hmm not THAT much larger 
than 63880 KB ... interesting).

> Write a PackedIntsEncoder/Decoder for facets
> 
>
> Key: LUCENE-4609
> URL: https://issues.apache.org/jira/browse/LUCENE-4609
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/facet
>Reporter: Shai Erera
>Priority: Minor
> Attachments: LUCENE-4609.patch, LUCENE-4609.patch, LUCENE-4609.patch, 
> LUCENE-4609.patch
>
>
> Today the facets API lets you write IntEncoder/Decoder to encode/decode the 
> category ordinals. We have several such encoders, including VInt (default), 
> and block encoders.
> It would be interesting to implement and benchmark a 
> PackedIntsEncoder/Decoder, with potentially two variants: (1) receives 
> bitsPerValue up front, when you e.g. know that you have a small taxonomy and 
> the max value you can see and (2) one that decides for each doc on the 
> optimal bitsPerValue, writes it as a header in the byte[] or something.




[jira] [Created] (SOLR-4335) Solrj UpdateRequest can send illegal XML to Solr

2013-01-22 Thread Max Hansmire (JIRA)
Max Hansmire created SOLR-4335:
--

 Summary: Solrj UpdateRequest can send illegal XML to Solr
 Key: SOLR-4335
 URL: https://issues.apache.org/jira/browse/SOLR-4335
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 3.4
Reporter: Max Hansmire


If you include illegal XML characters like U+ in a document in an 
UpdateRequest, they cause an error on the server.

{noformat}
java.lang.RuntimeException: [was class java.io.CharConversionException] Invalid 
UTF-8 character 0x at char #1940, byte #127)
{noformat}

Other illegal XML characters are replaced by the code in 
org.apache.solr.common.util.XML. For instance U+ is replaced with "#0;". 
SolrJ should be consistent in how it handles illegal XML characters.

From the source code it looks like this issue affects the most recent versions 
of Solr, but I did not attempt to reproduce it on 4.0.
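One consistent client-side approach would be to filter input against the XML 1.0 legal character ranges before serializing. A sketch with a hypothetical helper (not the actual o.a.s.common.util.XML code):

```java
// Sketch of consistent client-side handling (hypothetical helper): strip
// characters that XML 1.0 forbids before building the update request,
// instead of letting some slip through to the server.
public class XmlCharSanitizer {
    /** True for code points allowed by the XML 1.0 "Char" production. */
    static boolean isLegalXmlChar(int c) {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD)
            || (c >= 0x10000 && c <= 0x10FFFF);
    }

    static String stripIllegalXmlChars(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (isLegalXmlChar(cp)) {
                sb.appendCodePoint(cp);  // keep only legal characters
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String dirty = "ok\u0000bad\u0008too"; // NUL and backspace are illegal in XML 1.0
        System.out.println(stripIllegalXmlChars(dirty)); // prints "okbadtoo"
    }
}
```

Whether to strip, replace, or reject such characters is a policy choice; the point of the issue is that SolrJ should apply one policy uniformly.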




[jira] [Commented] (LUCENE-4609) Write a PackedIntsEncoder/Decoder for facets

2013-01-22 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1356#comment-1356
 ] 

Shai Erera commented on LUCENE-4609:


bq. and if we don't see net gains with it then I don't think we should pursue 
packed ints encoder/decoder

That's right. But even if we see net gains, it doesn't tell us anything about 
how it will perform on a small set of integers.
That's why I think this test has nothing to do with the Encoder/Decoder.

But I don't mind if this experiment is done here anyway.

> Write a PackedIntsEncoder/Decoder for facets
> 
>
> Key: LUCENE-4609
> URL: https://issues.apache.org/jira/browse/LUCENE-4609
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/facet
>Reporter: Shai Erera
>Priority: Minor
> Attachments: LUCENE-4609.patch, LUCENE-4609.patch, LUCENE-4609.patch
>
>
> Today the facets API lets you write IntEncoder/Decoder to encode/decode the 
> category ordinals. We have several such encoders, including VInt (default), 
> and block encoders.
> It would be interesting to implement and benchmark a 
> PackedIntsEncoder/Decoder, with potentially two variants: (1) receives 
> bitsPerValue up front, when you e.g. know that you have a small taxonomy and 
> the max value you can see and (2) one that decides for each doc on the 
> optimal bitsPerValue, writes it as a header in the byte[] or something.




[jira] [Updated] (SOLR-4317) SolrTestCaseJ4: Can't avoid "collection1" convention

2013-01-22 Thread Tricia Jenkins (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tricia Jenkins updated SOLR-4317:
-

Attachment: SOLR-4317.patch

Added a test which uses the multicore example.  Also further modified TestHarness 
and SolrTestCaseJ4 to use coreName when creating the core.  Didn't make any 
changes to the ant scripts, so the test isn't called during a normal build.

> SolrTestCaseJ4: Can't avoid "collection1" convention
> 
>
> Key: SOLR-4317
> URL: https://issues.apache.org/jira/browse/SOLR-4317
> Project: Solr
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 4.0
>Reporter: Tricia Jenkins
>Priority: Minor
> Fix For: 4.2, 5.0
>
> Attachments: SOLR-4317.patch, SOLR-4317.patch
>
>
> I think that there is still an issue after the SOLR-3826 patch was applied 
> for 4.0 [https://issues.apache.org/jira/browse/SOLR-3826] in September 2012.  
> This line is missing:
> Index: solr/test-framework/src/java/org/apache/solr/SolrTestCaseJ4.java
> ===
> --- solr/test-framework/src/java/org/apache/solr/SolrTestCaseJ4.java
> (revision 1435375)
> +++ solr/test-framework/src/java/org/apache/solr/SolrTestCaseJ4.java
> (working copy)
> @@ -384,9 +384,9 @@
>public static void createCore() {
>  assertNotNull(testSolrHome);
>  solrConfig = TestHarness.createConfig(testSolrHome, coreName, 
> getSolrConfigFile());
> -h = new TestHarness( dataDir.getAbsolutePath(),
> +h = new TestHarness( coreName, new Initializer( coreName, 
> dataDir.getAbsolutePath(),
>  solrConfig,
> -getSchemaFile());
> +getSchemaFile() ) );
>  lrf = h.getRequestFactory
>  ("standard",0,20,CommonParams.VERSION,"2.2");
>}
> TestHarness( String dataDirectory,SolrConfig solrConfig, IndexSchema 
> indexSchema) sets coreName to null and opens the default core: collection1.  
> I would expect that coreName is carried all the way through the test.




Re: Win7 64bit, jvm 7 and MMAP OOM

2013-01-22 Thread Mark Miller
Thanks!

- Mark

On Jan 22, 2013, at 2:38 PM, eksdev  wrote:

> Just to share some experience if someone hits the same problem.  
> 
> We had huge problems on Win7 64bit, JVM 64bit 1.7.0_07,  (a few days old 
> trunk version, 5.0, default codec) solr under tomcat thread queue limited to 
> 20 . NRTCaching and MMAP have the same problems (no updates , just search). 
> Unmap hack was on
> 
> Problem:
> After hitting it for 15 minutes with 10 search threads, gc() did not manage  
> to catch-up and free enough memory and the server repeatably spiralled to OOM 
> (giving more max heap did not help as well). Server is running on 8 cores, 
> and 10 client threads are normally not an issue.
> 
> Observations:
> - Tweaking jvm memory and gc() options did not help at all.  
> - exactly the  same configuration and tests on 3 linux  flavours  had 
> absolutely no problems. 
> - Win using FSDirectory works slower, but stable 
> - When "OOM" spiralling happens, major culprits are, by occupied memory and 
> Noo of instances: 
> o.a.l.util.WeakIdentityMap$IdentityWeakReference   
> java.util.concurrent.ConcurrentHashMap$HashEntry
> java.util.concurrent.ConcurrentHashMap$HashEntry[]
> - if pausing search requests for really long time (5 to 10 minutes!) these 
> references get eventually released.
> --
> 
> 
> I know java+MMAP on win platforms has problems (slowly releasing mapped 
> regions), but I did not expect it is that bad, to the point of being useless. 
>   
> It is not an itch currently, all our production is on linux, but if someone 
> has an idea how to work around it,  we would be glad to try it.  





[jira] [Created] (LUCENE-4707) Track file reference kept by readers that are opened through the writer

2013-01-22 Thread Jessica Cheng (JIRA)
Jessica Cheng created LUCENE-4707:
-

 Summary: Track file reference kept by readers that are opened 
through the writer
 Key: LUCENE-4707
 URL: https://issues.apache.org/jira/browse/LUCENE-4707
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0
 Environment: Mac OS X 10.8.2 and Linux 2.6.32
Reporter: Jessica Cheng


We ran into a bug where files (mostly CFS) that are still referred to by our 
NRT reader/searcher are deleted by IndexFileDeleter. As far as I can see from 
the verbose logging and reading the code, it seems that the problem is the 
creation and merging of these CFS files between hard commits. The files 
referred to by hard commits are incRef’ed at commit checkpoints, so these files 
won’t be deleted until they are decRef’ed when the commit is deleted according 
to the DeletionPolicy (good). However, intermediate files that are created and 
merged between the hard commits only have refs through the regular checkpoints, 
so as soon as a new checkpoint no longer includes those files, they are 
immediately deleted by the deleter. See the abridged verbose log lines that 
illustrate this behavior:

IW 11 [Mon Jan 21 17:30:35 PST 2013; commitScheduler]: create compound file 
_8.cfs

IFD 7 [Mon Jan 21 17:23:41 PST 2013; commitScheduler]: now checkpoint 
"_0(4.0.0.2):C3 _1(4.0.0.2):C7 _2(4.0.0.2):C16 _3(4.0.0.2):C21 _4(4.0.0.2):C5 
_5(4.0.0.2):C5 _6(4.0.0.2):C5 _7(4.0.0.2):C7 _8(4.0.0.2):c6" [9 segments ; 
isCommit = false]

IFD 7 [Mon Jan 21 17:23:41 PST 2013; commitScheduler]:   IncRef "_8.cfs": 
pre-incr count is 0

IFD 7 [Mon Jan 21 17:23:42 PST 2013; commitScheduler]: now checkpoint 
"_0(4.0.0.2):C3 _1(4.0.0.2):C7 _2(4.0.0.2):C16 _3(4.0.0.2):C21 _4(4.0.0.2):C5 
_5(4.0.0.2):C5 _6(4.0.0.2):C5 _7(4.0.0.2):C7 _8(4.0.0.2):c6 _9(4.0.0.2):c6" [10 
segments ; isCommit = false]

IFD 7 [Mon Jan 21 17:23:42 PST 2013; commitScheduler]:   IncRef "_8.cfs": 
pre-incr count is 1

IFD 7 [Mon Jan 21 17:23:42 PST 2013; commitScheduler]:   DecRef "_8.cfs": 
pre-decr count is 2

IFD 7 [Mon Jan 21 17:23:42 PST 2013; Lucene Merge Thread #0]: now checkpoint 
"_b(4.0.0.2):C81" [1 segments ; isCommit = false]

IFD 7 [Mon Jan 21 17:23:42 PST 2013; Lucene Merge Thread #0]:   DecRef 
"_8.cfs": pre-decr count is 1

IFD 7 [Mon Jan 21 17:23:42 PST 2013; Lucene Merge Thread #0]: delete "_8.cfs"

With this behavior, it seems no matter how frequently we refresh the reader 
(unless we do it at every read), we’d run into the race where the reader still 
holds a reference to the file that’s just been deleted by the deleter. My 
proposal is to count the file reference handed out to the NRT reader/searcher 
when writer.getReader(boolean) is called and decRef the files only when the 
said reader is closed.

Please take a look and evaluate if my observations are correct and if the 
proposal makes sense. Thanks!
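The proposed scheme reduces to plain reference counting. A minimal sketch (hypothetical class, not the actual IndexFileDeleter code) of how an extra ref per open NRT reader keeps a file alive after a checkpoint drops it, assuming incRef/decRef calls stay balanced:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the proposed fix: files referenced by an open NRT
// reader get their own ref, so a checkpoint that drops the file no longer
// deletes it out from under the reader.
public class RefCountSketch {
    public final Map<String, Integer> refCounts = new HashMap<>();
    public final List<String> deleted = new ArrayList<>();

    public void incRef(String file) {
        refCounts.merge(file, 1, Integer::sum);
    }

    public void decRef(String file) {
        int count = refCounts.merge(file, -1, Integer::sum);
        if (count == 0) {
            refCounts.remove(file);
            deleted.add(file); // delete "for real" only at zero refs
        }
    }

    public static void main(String[] args) {
        RefCountSketch deleter = new RefCountSketch();
        deleter.incRef("_8.cfs");          // checkpoint references the file
        deleter.incRef("_8.cfs");          // NRT reader opened: extra ref
        deleter.decRef("_8.cfs");          // next checkpoint drops the file
        System.out.println(deleter.deleted.isEmpty()); // prints "true": reader still holds it
        deleter.decRef("_8.cfs");          // reader closed
        System.out.println(deleter.deleted);           // prints "[_8.cfs]"
    }
}
```

Without the reader's extra incRef, the third step would drive the count to zero and delete "_8.cfs" while the reader still needs it, which is exactly the race in the log above.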




[jira] [Commented] (LUCENE-4609) Write a PackedIntsEncoder/Decoder for facets

2013-01-22 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559943#comment-13559943
 ] 

Michael McCandless commented on LUCENE-4609:


On the DV 2.0 branch the on-disk size is 55288 KB (~16% smaller): cool!

> Write a PackedIntsEncoder/Decoder for facets
> 
>
> Key: LUCENE-4609
> URL: https://issues.apache.org/jira/browse/LUCENE-4609
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/facet
>Reporter: Shai Erera
>Priority: Minor
> Attachments: LUCENE-4609.patch, LUCENE-4609.patch, LUCENE-4609.patch
>
>
> Today the facets API lets you write IntEncoder/Decoder to encode/decode the 
> category ordinals. We have several such encoders, including VInt (default), 
> and block encoders.
> It would be interesting to implement and benchmark a 
> PackedIntsEncoder/Decoder, with potentially two variants: (1) receives 
> bitsPerValue up front, when you e.g. know that you have a small taxonomy and 
> the max value you can see and (2) one that decides for each doc on the 
> optimal bitsPerValue, writes it as a header in the byte[] or something.




Win7 64bit, jvm 7 and MMAP OOM

2013-01-22 Thread eksdev
Just to share some experience in case someone hits the same problem.

We had huge problems on Win7 64bit, 64bit JVM 1.7.0_07 (a few-days-old trunk 
version, 5.0, default codec), Solr under Tomcat with the thread queue limited to 
20. NRTCaching and MMAP show the same problems (no updates, just search). The 
unmap hack was on.

Problem:
After hitting it for 15 minutes with 10 search threads, gc() did not manage to 
catch up and free enough memory, and the server repeatedly spiralled to OOM 
(giving it more max heap did not help either). The server runs on 8 cores, and 
10 client threads are normally not an issue.

Observations:
- Tweaking JVM memory and gc() options did not help at all.
- Exactly the same configuration and tests on 3 Linux flavours had absolutely 
no problems.
- Windows using FSDirectory works more slowly, but is stable.
- When the OOM spiralling happens, the major culprits, by occupied memory and 
number of instances, are:
o.a.l.util.WeakIdentityMap$IdentityWeakReference
java.util.concurrent.ConcurrentHashMap$HashEntry
java.util.concurrent.ConcurrentHashMap$HashEntry[]
- If search requests are paused for a really long time (5 to 10 minutes!), 
these references do eventually get released.
--

I know Java+MMAP on Windows platforms has problems (slowly releasing mapped 
regions), but I did not expect it to be this bad, to the point of being useless.
It is not an itch currently, as all our production is on Linux, but if someone 
has an idea how to work around it, we would be glad to try it.



[jira] [Commented] (LUCENE-4609) Write a PackedIntsEncoder/Decoder for facets

2013-01-22 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559913#comment-13559913
 ] 

Michael McCandless commented on LUCENE-4609:


bq. Perhaps this should belong to a different issue?

I think this is the right issue to explore whether packed ints can be 
smaller/faster for facets?

Ie I think we should iterate on this prototype/specialized collector, and if we 
don't see net gains with it then I don't think we should pursue packed ints 
encoder/decoder.  This serves as the litmus test.
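For reference while iterating, a minimal standalone sketch of variant (1) from the issue description: a fixed bitsPerValue known up front, with ordinals packed back-to-back into a long[] so n values cost n*bitsPerValue bits instead of a minimum of 8 bits per value with VInt. This is a hypothetical class, not the Lucene PackedInts API.

```java
public class FixedPackedInts {
    final int bitsPerValue;
    final long[] blocks;

    FixedPackedInts(int[] values, int bitsPerValue) {
        this.bitsPerValue = bitsPerValue;
        this.blocks = new long[((values.length * bitsPerValue) + 63) / 64];
        for (int i = 0; i < values.length; i++) {
            long bitPos = (long) i * bitsPerValue;
            int block = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            blocks[block] |= (long) values[i] << shift;
            if (shift + bitsPerValue > 64) {       // value straddles two longs
                blocks[block + 1] |= (long) values[i] >>> (64 - shift);
            }
        }
    }

    int get(int i) {
        long bitPos = (long) i * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long v = blocks[block] >>> shift;
        if (shift + bitsPerValue > 64) {           // pick up the spilled bits
            v |= blocks[block + 1] << (64 - shift);
        }
        return (int) (v & ((1L << bitsPerValue) - 1));
    }

    public static void main(String[] args) {
        int[] ords = {3, 17, 250, 1, 1023, 512, 999};
        FixedPackedInts p = new FixedPackedInts(ords, 10); // max ord < 2^10
        for (int i = 0; i < ords.length; i++) {
            System.out.println(p.get(i));
        }
    }
}
```

The per-value cost here is fixed regardless of magnitude, which is exactly the trade-off to benchmark against dgap+VInt, whose cost shrinks when gaps are small.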

> Write a PackedIntsEncoder/Decoder for facets
> 
>
> Key: LUCENE-4609
> URL: https://issues.apache.org/jira/browse/LUCENE-4609
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/facet
>Reporter: Shai Erera
>Priority: Minor
> Attachments: LUCENE-4609.patch, LUCENE-4609.patch, LUCENE-4609.patch
>
>
> Today the facets API lets you write IntEncoder/Decoder to encode/decode the 
> category ordinals. We have several such encoders, including VInt (default), 
> and block encoders.
> It would be interesting to implement and benchmark a 
> PackedIntsEncoder/Decoder, with potentially two variants: (1) receives 
> bitsPerValue up front, when you e.g. know that you have a small taxonomy and 
> the max value you can see and (2) one that decides for each doc on the 
> optimal bitsPerValue, writes it as a header in the byte[] or something.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4609) Write a PackedIntsEncoder/Decoder for facets

2013-01-22 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559912#comment-13559912
 ] 

Michael McCandless commented on LUCENE-4609:


Also, the on-disk size of the vInt(dGap) encoded ords is 63880 KB while the 
in-RAM packed ints size was 74501 KB.  Maybe if we block-coded the packed ints 
parts we'd get better compression ...

> Write a PackedIntsEncoder/Decoder for facets
> 
>
> Key: LUCENE-4609
> URL: https://issues.apache.org/jira/browse/LUCENE-4609
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/facet
>Reporter: Shai Erera
>Priority: Minor
> Attachments: LUCENE-4609.patch, LUCENE-4609.patch, LUCENE-4609.patch
>
>
> Today the facets API lets you write IntEncoder/Decoder to encode/decode the 
> category ordinals. We have several such encoders, including VInt (default), 
> and block encoders.
> It would be interesting to implement and benchmark a 
> PackedIntsEncoder/Decoder, with potentially two variants: (1) receives 
> bitsPerValue up front, when you e.g. know that you have a small taxonomy and 
> the max value you can see and (2) one that decides for each doc on the 
> optimal bitsPerValue, writes it as a header in the byte[] or something.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: svn commit: r1437080 - /lucene/cms/trunk/content/core/corenews.mdtext

2013-01-22 Thread Robert Muir
no prob... thanks for getting 4.1 out there!

On Tue, Jan 22, 2013 at 2:01 PM, Steve Rowe  wrote:
> Wow, thanks Robert - those missing features were in there originally, but one 
> of my edits accidentally deleted them, and I didn't notice…
>
> Steve
>
> On Jan 22, 2013, at 1:25 PM, rm...@apache.org wrote:
>
>> Author: rmuir
>> Date: Tue Jan 22 18:25:20 2013
>> New Revision: 1437080
>>
>> URL: http://svn.apache.org/viewvc?rev=1437080&view=rev
>> Log:
>> add missing features
>>
>> Modified:
>>lucene/cms/trunk/content/core/corenews.mdtext
>>
>> Modified: lucene/cms/trunk/content/core/corenews.mdtext
>> URL: 
>> http://svn.apache.org/viewvc/lucene/cms/trunk/content/core/corenews.mdtext?rev=1437080&r1=1437079&r2=1437080&view=diff
>> ==
>> --- lucene/cms/trunk/content/core/corenews.mdtext (original)
>> +++ lucene/cms/trunk/content/core/corenews.mdtext Tue Jan 22 18:25:20 2013
>> @@ -18,6 +18,21 @@ release for a full list of details.
>>
>> ### Lucene 4.1 Release Highlights:
>>
>> +- Lucene 4.1 has a new default codec (Lucene41Codec) based on the
>> +  previously-experimental "Block" indexing format for improved
>> +  performance, but also incorporating the functionality of "Appending"
>> +  and "Pulsing".
>> +
>> +- The default codec incorporates the optimization of Pulsing: terms
>> +  that appear in only one document (such as primary key/id fields) just
>> +  store the document id in the term dictionary instead of a pointer to
>> +  this document id in a separate file.
>> +
>> +- The default codec incorporates an efficient compressed stored fields
>> +  implementation that compresses chunks of documents together with LZ4.
>> +  (see
>> +  
>> )
>> +
>> - Lucene no longer seeks when writing files (all fields are written in
>>   an append-only way). This means it works by default with append-only
>>   streams, hdfs, etc.
>>
>>
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4550) For extremely wide shapes (> 180 degrees) distErrPct is not used correctly

2013-01-22 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559889#comment-13559889
 ] 

David Smiley commented on LUCENE-4550:
--

CHANGES.txt entry will be as follows:

* LUCENE-4550: Shapes wider than 180 degrees would use too much accuracy for 
the PrefixTree based SpatialStrategy.  For a pathological case of nearly 360 
degrees and barely any height, it would generate so many indexed terms (> 500k) 
that it could even cause an OutOfMemoryError. Fixed. (David Smiley)

> For extremely wide shapes (> 180 degrees) distErrPct is not used correctly
> --
>
> Key: LUCENE-4550
> URL: https://issues.apache.org/jira/browse/LUCENE-4550
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/spatial
>Affects Versions: 4.0
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
> Attachments: LUCENE-4550__fix_SpatialArgs_calcDistanceFromErrPct.patch
>
>
> When a shape is given to a PrefixTreeStrategy (index or query time), it needs 
> to know how many levels down the prefix tree to go for a target precision 
> (distErrPct).  distErrPct is basically a fraction of the radius of the shape, 
> defaulting to 2.5% (0.0025).
> If the shape presented is extremely wide, > 180 degrees, then the internal 
> calculations in SpatialArgs.calcDistanceFromErrPct(...) will wrongly measure 
> the shape's size as having width < 180 degrees, yielding *more* accuracy than 
> intended.  Given that this happens for unrealistic shape sizes and results in 
> more accuracy, I am flagging this as "minor", but a bug nonetheless.  Indeed, 
> this was discovered as a result of someone using lucene-spatial incorrectly, 
> not for an actual shape they have.  But in the extreme \[erroneous\] case 
> they had, they had 566k terms (!) generated, when it should have been ~1k 
> tops. 
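The accuracy calculation at issue can be sketched roughly as follows (hedged: the method name comes from this issue, but the body is a simplification, not the actual lucene-spatial code). The point is that distErrPct is a fraction of the shape's radius, so if a >180-degree width is wrongly wrapped into a smaller value, the radius shrinks and the allowed error becomes far tighter than intended.

```java
public class DistErrSketch {
    // Simplified stand-in for SpatialArgs.calcDistanceFromErrPct(...):
    // radius approximated as half the bounding-box diagonal, in degrees.
    // Using the raw width (possibly > 180) keeps very wide shapes coarse.
    static double calcDistanceFromErrPct(double widthDeg, double heightDeg,
                                         double distErrPct) {
        double radius =
            0.5 * Math.sqrt(widthDeg * widthDeg + heightDeg * heightDeg);
        return radius * distErrPct;
    }

    public static void main(String[] args) {
        double narrow = calcDistanceFromErrPct(10, 10, 0.0025);
        double wide = calcDistanceFromErrPct(350, 1, 0.0025);
        // A wider shape should tolerate a larger absolute error, so the
        // prefix tree stops at fewer levels and generates far fewer terms.
        System.out.println(wide > narrow);
    }
}
```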

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: svn commit: r1437080 - /lucene/cms/trunk/content/core/corenews.mdtext

2013-01-22 Thread Steve Rowe
Wow, thanks Robert - those missing features were in there originally, but one 
of my edits accidentally deleted them, and I didn't notice…

Steve

On Jan 22, 2013, at 1:25 PM, rm...@apache.org wrote:

> Author: rmuir
> Date: Tue Jan 22 18:25:20 2013
> New Revision: 1437080
> 
> URL: http://svn.apache.org/viewvc?rev=1437080&view=rev
> Log:
> add missing features
> 
> Modified:
>lucene/cms/trunk/content/core/corenews.mdtext
> 
> Modified: lucene/cms/trunk/content/core/corenews.mdtext
> URL: 
> http://svn.apache.org/viewvc/lucene/cms/trunk/content/core/corenews.mdtext?rev=1437080&r1=1437079&r2=1437080&view=diff
> ==
> --- lucene/cms/trunk/content/core/corenews.mdtext (original)
> +++ lucene/cms/trunk/content/core/corenews.mdtext Tue Jan 22 18:25:20 2013
> @@ -18,6 +18,21 @@ release for a full list of details.
> 
> ### Lucene 4.1 Release Highlights:
> 
> +- Lucene 4.1 has a new default codec (Lucene41Codec) based on the
> +  previously-experimental "Block" indexing format for improved
> +  performance, but also incorporating the functionality of "Appending"
> +  and "Pulsing".
> +
> +- The default codec incorporates the optimization of Pulsing: terms
> +  that appear in only one document (such as primary key/id fields) just
> +  store the document id in the term dictionary instead of a pointer to
> +  this document id in a separate file.
> +
> +- The default codec incorporates an efficient compressed stored fields
> +  implementation that compresses chunks of documents together with LZ4.
> +  (see
> +  
> )
> +
> - Lucene no longer seeks when writing files (all fields are written in
>   an append-only way). This means it works by default with append-only
>   streams, hdfs, etc.
> 
> 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4550) For extremely wide shapes (> 180 degrees) distErrPct is not used correctly

2013-01-22 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-4550:
-

Attachment: LUCENE-4550__fix_SpatialArgs_calcDistanceFromErrPct.patch

Bug fixed.  I'll commit today.

> For extremely wide shapes (> 180 degrees) distErrPct is not used correctly
> --
>
> Key: LUCENE-4550
> URL: https://issues.apache.org/jira/browse/LUCENE-4550
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/spatial
>Affects Versions: 4.0
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
> Attachments: LUCENE-4550__fix_SpatialArgs_calcDistanceFromErrPct.patch
>
>
> When a shape is given to a PrefixTreeStrategy (index or query time), it needs 
> to know how many levels down the prefix tree to go for a target precision 
> (distErrPct).  distErrPct is basically a fraction of the radius of the shape, 
> defaulting to 2.5% (0.0025).
> If the shape presented is extremely wide, > 180 degrees, then the internal 
> calculations in SpatialArgs.calcDistanceFromErrPct(...) will wrongly measure 
> the shape's size as having width < 180 degrees, yielding *more* accuracy than 
> intended.  Given that this happens for unrealistic shape sizes and results in 
> more accuracy, I am flagging this as "minor", but a bug nonetheless.  Indeed, 
> this was discovered as a result of someone using lucene-spatial incorrectly, 
> not for an actual shape they have.  But in the extreme \[erroneous\] case 
> they had, they had 566k terms (!) generated, when it should have been ~1k 
> tops. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-4550) For extremely wide shapes (> 180 degrees) distErrPct is not used correctly

2013-01-22 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley reassigned LUCENE-4550:


Assignee: David Smiley

> For extremely wide shapes (> 180 degrees) distErrPct is not used correctly
> --
>
> Key: LUCENE-4550
> URL: https://issues.apache.org/jira/browse/LUCENE-4550
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/spatial
>Affects Versions: 4.0
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
> Attachments: LUCENE-4550__fix_SpatialArgs_calcDistanceFromErrPct.patch
>
>
> When a shape is given to a PrefixTreeStrategy (index or query time), it needs 
> to know how many levels down the prefix tree to go for a target precision 
> (distErrPct).  distErrPct is basically a fraction of the radius of the shape, 
> defaulting to 2.5% (0.0025).
> If the shape presented is extremely wide, > 180 degrees, then the internal 
> calculations in SpatialArgs.calcDistanceFromErrPct(...) will wrongly measure 
> the shape's size as having width < 180 degrees, yielding *more* accuracy than 
> intended.  Given that this happens for unrealistic shape sizes and results in 
> more accuracy, I am flagging this as "minor", but a bug nonetheless.  Indeed, 
> this was discovered as a result of someone using lucene-spatial incorrectly, 
> not for an actual shape they have.  But in the extreme \[erroneous\] case 
> they had, they had 566k terms (!) generated, when it should have been ~1k 
> tops. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-4334) null:org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.response.transform.EditorialMarkerFactory'

2013-01-22 Thread David Morana (JIRA)
David Morana created SOLR-4334:
--

 Summary: null:org.apache.solr.common.SolrException: Error loading 
class 'org.apache.solr.response.transform.EditorialMarkerFactory'
 Key: SOLR-4334
 URL: https://issues.apache.org/jira/browse/SOLR-4334
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.0
 Environment: solr 4.0 final on tomcat 7.0.34
Reporter: David Morana
 Fix For: 4.0


Hi,
I can't seem to get the doc transformer to work.
I turned on the elevated query component and tried to turn on the doc 
transformer in the solrconfig.xml file to get [elevated] to appear in the results.
I tried both of these separately; I got the same error and the core would 
not load:
{noformat}


{noformat}

Please advise...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Fixing query-time multi-word synonym issue

2013-01-22 Thread Mikhail Khludnev
FWIW, multi-word synonyms are a side benefit of the query parsing approach
implemented by my team.
Here is how it looks:
https://docs.google.com/a/griddynamics.com/presentation/pub?id=1oifLFI0MiA3ZyXZWisHJVRK13P8cki5yCABvABPObKw&start=false&loop=false&delayms=3000#slide=id.g1006de00_2_34
The frequent typo "fee people" has been substituted with the correct brand.
The five previous slides depict the approach; the main idea is to get rid of
good old Boolean retrieval and introduce our own notion of matching.
I can share more details if you wish.


On Tue, Jan 22, 2013 at 7:17 PM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Hello,
>
> I'm looking for some guidance around solving the infamous index-time vs.
> query-time multi-word synonym problem.  Looking for help with understanding
> the pieces and effort involved, and also being on a lookout for any
> potential "man, it will take you forever, you'll have to do major Lucene
> surgery" type of warnings.
>
> I never looked deeply into this problem and my understanding is that
> multi-word synonyms don't work at query-time because QueryParser(?) simply
> breaks queries on spaces and thus makes it impossible for
> SynonymTokenFilter (?) to "see" the non-broken-up token sequence and do
> synonym expansion.
>
> I think this is also documented on the Wiki.
> Are there other pieces involved that I didn't mention, but should have?
>
> The following are 3 different efforts I found:
> https://issues.apache.org/jira/browse/LUCENE-4499
> http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/
> http://www.ub.uni-bielefeld.de/~befehl/base/solr/eurovoc.html
>
> Plus Jack's proposal:
> http://search-lucene.com/m/Zkj0k15dDGP1
>
> Does any of the above approaches sound like the right one, or at least in
> the right direction, and stands the chance of being accepted?
>
> Thanks,
> Otis
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics
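The failure mode Otis describes can be reduced to a toy illustration (plain Java, not Lucene's actual QueryParser or SynonymFilter): when the query is split on whitespace *before* synonym lookup, a multi-word key can never match, whereas lookup over the whole phrase succeeds.

```java
import java.util.HashMap;
import java.util.Map;

public class MultiWordSynonymDemo {
    static final Map<String, String> SYNONYMS = new HashMap<>();
    static { SYNONYMS.put("big apple", "nyc"); }

    // What a whitespace-first parser effectively does: per-token lookup.
    // The filter only ever sees "big" and "apple", never "big apple".
    static String perTokenExpand(String query) {
        StringBuilder out = new StringBuilder();
        for (String tok : query.split("\\s+")) {
            if (out.length() > 0) out.append(' ');
            out.append(SYNONYMS.getOrDefault(tok, tok));
        }
        return out.toString();
    }

    // What is needed: lookup over the unbroken token sequence first.
    static String phraseExpand(String query) {
        return SYNONYMS.getOrDefault(query, query);
    }

    public static void main(String[] args) {
        System.out.println(perTokenExpand("big apple")); // synonym missed
        System.out.println(phraseExpand("big apple"));   // synonym applied
    }
}
```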


 


Re: semantic indexing using ontology

2013-01-22 Thread Gora Mohanty
On 19 January 2013 02:54, amirou  wrote:
> HI everybody,
>
> I want to develop a java system which indexes a set of documents represented
> by an ontology. Is this could be done with lucene.
>
> I yes what are the plugins which I have to use.
> Thank you very much.

Have you tried searching Google for "Solr RDF",
which turns up several possibilities, e.g.,
http://siren.sindice.com/ ?

Depending on your needs, you could also use one of
the several open-source RDF data stores, or even
a graph database like Neo4J.

Regards,
Gora

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3251) dynamically add field to schema

2013-01-22 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559839#comment-13559839
 ] 

Yonik Seeley commented on SOLR-3251:


Thinking a little further about this, building a new schema when it changes 
(i.e. making schema effectively immutable), might be a good idea too.
For performance reasons, we'd want to share/reuse objects across the different 
schema instances of course.
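The immutable-schema idea with shared objects can be sketched as copy-on-write (hypothetical classes, not Solr's IndexSchema API): a mutation builds a new schema instance by shallow-copying the field map, while the unchanged field objects themselves are shared between old and new instances.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class ImmutableSchemaDemo {
    static final class Field {
        final String name;
        Field(String name) { this.name = name; }
    }

    static final class Schema {
        private final Map<String, Field> fields;
        Schema(Map<String, Field> fields) {
            this.fields = Collections.unmodifiableMap(fields);
        }
        // Copy-on-write: shallow-copies the map, shares Field objects.
        Schema withField(Field f) {
            Map<String, Field> copy = new HashMap<>(fields);
            copy.put(f.name, f);
            return new Schema(copy);
        }
        Field get(String name) { return fields.get(name); }
        boolean has(String name) { return fields.containsKey(name); }
    }

    public static void main(String[] args) {
        Schema v1 = new Schema(new HashMap<>()).withField(new Field("id"));
        Schema v2 = v1.withField(new Field("title"));

        System.out.println(v1.has("title"));              // old schema unchanged
        System.out.println(v2.has("title"));              // new schema sees field
        System.out.println(v1.get("id") == v2.get("id")); // same shared object
    }
}
```

In-flight requests keep using the schema instance they started with, so no locking is needed on the read path.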

> dynamically add field to schema
> ---
>
> Key: SOLR-3251
> URL: https://issues.apache.org/jira/browse/SOLR-3251
> Project: Solr
>  Issue Type: New Feature
>Reporter: Yonik Seeley
> Attachments: SOLR-3251.patch
>
>
> One related piece of functionality needed for SOLR-3250 is the ability to 
> dynamically add a field to the schema.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-4706) Tool to recover data from .fdt files

2013-01-22 Thread Otis Gospodnetic (JIRA)
Otis Gospodnetic created LUCENE-4706:


 Summary: Tool to recover data from .fdt files
 Key: LUCENE-4706
 URL: https://issues.apache.org/jira/browse/LUCENE-4706
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Otis Gospodnetic
Priority: Minor


Somebody posted this on the ML.  Untested: http://pastebin.com/nmF0j4np

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-4333) edismax query with escaped colon ignores AND and OR

2013-01-22 Thread Robert J. van der Boon (JIRA)
Robert J. van der Boon created SOLR-4333:


 Summary: edismax query with escaped colon ignores AND and OR
 Key: SOLR-4333
 URL: https://issues.apache.org/jira/browse/SOLR-4333
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 4.0
 Environment: tomcat 7.34
java 7u11
windows 2008R2
Reporter: Robert J. van der Boon


When I use the edismax query handler with qf=samenvatting and have a query of 
the form
{noformat}
a\:b AND cde
{noformat}
then the parsedquery ends up as:
{noformat}
(+(DisjunctionMaxQuery((samenvatting:"a b")) 
DisjunctionMaxQuery((samenvatting:and)) 
DisjunctionMaxQuery((samenvatting:cde))))/no_coord
{noformat}
note that the AND operator is ignored, and a search for the word AND is 
performed.
As far as I've seen it doesn't matter if the part before the \: is a real field 
or not.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4609) Write a PackedIntsEncoder/Decoder for facets

2013-01-22 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559833#comment-13559833
 ] 

Shai Erera commented on LUCENE-4609:


Perhaps this should belong to a different issue? I mean, this one is focused on 
a PackedInts encoder/decoder (or any other encoder/decoder that's better than 
VInt).

Separately, it's interesting that this performs worse than DV Source + 
decoding. I mean, the results don't factor in reading and populating the cache, 
right? The test is already "hot" when it's measured?

I must say that I'm not entirely surprised ... having recently looked at 
PackedInts API, it doesn't look so optimized (working w/ DataInput for 
example), where the dgap+vint that we have in CountingFC is *very* optimized. I 
think that what we need is a custom cache, which encodes things similar to 
PackedInts ... or maybe as you say, the bulk read methods would do better.

But we should explore that on another issue?

> Write a PackedIntsEncoder/Decoder for facets
> 
>
> Key: LUCENE-4609
> URL: https://issues.apache.org/jira/browse/LUCENE-4609
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/facet
>Reporter: Shai Erera
>Priority: Minor
> Attachments: LUCENE-4609.patch, LUCENE-4609.patch, LUCENE-4609.patch
>
>
> Today the facets API lets you write IntEncoder/Decoder to encode/decode the 
> category ordinals. We have several such encoders, including VInt (default), 
> and block encoders.
> It would be interesting to implement and benchmark a 
> PackedIntsEncoder/Decoder, with potentially two variants: (1) receives 
> bitsPerValue up front, when you e.g. know that you have a small taxonomy and 
> the max value you can see and (2) one that decides for each doc on the 
> optimal bitsPerValue, writes it as a header in the byte[] or something.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



semantic indexing using ontology

2013-01-22 Thread amirou
Hi everybody,

I want to develop a Java system which indexes a set of documents represented
by an ontology. Can this be done with Lucene?

If yes, what plugins do I have to use?
Thank you very much.


Amir.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/semantic-indexing-using-ontology-tp4034671.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-4331) queryelevationcomponent init error; he reference to entity "objID" must end with the ';' delimiter.

2013-01-22 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic resolved SOLR-4331.


Resolution: Invalid

> queryelevationcomponent init error; he reference to entity "objID" must end 
> with the ';' delimiter. 
> 
>
> Key: SOLR-4331
> URL: https://issues.apache.org/jira/browse/SOLR-4331
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.0
> Environment: solr 4.0 final on tomcat 7.0.34
>Reporter: David Morana
> Fix For: 4.0
>
>
> Hi, 
> I put this in elevate.xml just to test it out.
> {code:xml}
> 
>  
>id="https://opentextdev/cs/llisapi.dll?func=ll&objID=577575&objAction=download";
>  />
>  
> 
> {code}
> I have the urls to opentext documents as the uniquekey (id)
> this is what I get:
> {noformat}
> 16:25:48 SEVERE Config Exception during parsing file: 
> elevate.xml:org.xml.sax.SAXParseException; systemId: solrres:/elevate.xml; 
> lineNumber: 28; columnNumber: 77; The reference to entity "objID" must end 
> with the ';' delimiter.
> 16:25:48 SEVERE SolrCore java.lang.NullPointerException
> 16:25:48 SEVERE CoreContainer Unable to create core: Lisa
> 16:25:48 SEVERE CoreContainer null:org.apache.solr.common.SolrException: 
> Error initializing QueryElevationComponent. 
> {noformat}
> It seems to not like the Opentext objID in the URL. 
> How can I fix this?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4331) queryelevationcomponent init error; he reference to entity "objID" must end with the ';' delimiter.

2013-01-22 Thread David Morana (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559792#comment-13559792
 ] 

David Morana commented on SOLR-4331:


So, now that elevate is working, do I just add it to my update chain so that it 
always fires with queries?



> queryelevationcomponent init error; he reference to entity "objID" must end 
> with the ';' delimiter. 
> 
>
> Key: SOLR-4331
> URL: https://issues.apache.org/jira/browse/SOLR-4331
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.0
> Environment: solr 4.0 final on tomcat 7.0.34
>Reporter: David Morana
> Fix For: 4.0
>
>
> Hi, 
> I put this in elevate.xml just to test it out.
> {code:xml}
> 
>  
>id="https://opentextdev/cs/llisapi.dll?func=ll&objID=577575&objAction=download";
>  />
>  
> 
> {code}
> I have the urls to opentext documents as the uniquekey (id)
> this is what I get:
> {noformat}
> 16:25:48 SEVERE Config Exception during parsing file: 
> elevate.xml:org.xml.sax.SAXParseException; systemId: solrres:/elevate.xml; 
> lineNumber: 28; columnNumber: 77; The reference to entity "objID" must end 
> with the ';' delimiter.
> 16:25:48 SEVERE SolrCore java.lang.NullPointerException
> 16:25:48 SEVERE CoreContainer Unable to create core: Lisa
> 16:25:48 SEVERE CoreContainer null:org.apache.solr.common.SolrException: 
> Error initializing QueryElevationComponent. 
> {noformat}
> It seems to not like the Opentext objID in the URL. 
> How can I fix this?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4321) SolrCloud collection CREATE will put multiple master shards on a single node (and no shards on some)

2013-01-22 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-4321:
--

Fix Version/s: 5.0
   4.2

> SolrCloud collection CREATE will put multiple master shards on a single node 
> (and no shards on some)
> 
>
> Key: SOLR-4321
> URL: https://issues.apache.org/jira/browse/SOLR-4321
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.1
>Reporter: Brett Hoerner
>Assignee: Mark Miller
> Fix For: 4.2, 5.0
>
> Attachments: SOLR-4321.patch, SOLR-4321.patch, SOLR-4321.patch
>
>
> This is best described by a photo of my SolrCloud admin: 
> http://i.imgur.com/hW4xnxy.png
> I have a 4 node cluster, with shard count of 2 and replication factor of 2.
> After running something like,
>   
> http://localhost:8983/solr/admin/collections?action=CREATE&name=15724&numShards=2&replicationFactor=2
> The shards seem to be completely randomly placed, which is fine, but some 
> nodes receive more of the share than others (some even receiving none).
> For example, 15724 has given node "backfill-1d" 2 "parts", 
> 15724_shard1_replica2 and 15724_shard2_replica1 but has given "backfill-2d" 
> nothing at all.
> This creates unnecessary load on some nodes, no? Is this something I am 
> supposed to handle myself (I looked, but don't see how).
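The even-placement behavior the reporter expects can be sketched with a round-robin assignment (a hedged illustration of the idea, not the actual patch): replicas are dealt out across nodes in order, so no node receives a second core before every node has one, whenever there are at least as many nodes as cores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReplicaPlacement {
    // Deal shardN_replicaM core names out to nodes round-robin.
    static Map<String, List<String>> assign(List<String> nodes,
                                            int numShards,
                                            int replicationFactor) {
        Map<String, List<String>> perNode = new LinkedHashMap<>();
        for (String n : nodes) perNode.put(n, new ArrayList<>());
        int i = 0;
        for (int s = 1; s <= numShards; s++) {
            for (int r = 1; r <= replicationFactor; r++) {
                String node = nodes.get(i++ % nodes.size());
                perNode.get(node).add("shard" + s + "_replica" + r);
            }
        }
        return perNode;
    }

    public static void main(String[] args) {
        // 4 nodes, numShards=2, replicationFactor=2: 4 cores, one per node.
        Map<String, List<String>> m =
            assign(Arrays.asList("n1", "n2", "n3", "n4"), 2, 2);
        m.forEach((n, cores) -> System.out.println(n + " -> " + cores));
    }
}
```

A real implementation would also need to avoid placing two replicas of the same shard on one node when nodes are scarcer than cores; the sketch only shows the even-spread part.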

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4321) SolrCloud collection CREATE will put multiple master shards on a single node (and no shards on some)

2013-01-22 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559784#comment-13559784
 ] 

Mark Miller commented on SOLR-4321:
---

Thanks guys - I've added a test. Will commit soon.

> SolrCloud collection CREATE will put multiple master shards on a single node 
> (and no shards on some)
> 
>
> Key: SOLR-4321
> URL: https://issues.apache.org/jira/browse/SOLR-4321
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.1
>Reporter: Brett Hoerner
>Assignee: Mark Miller
> Attachments: SOLR-4321.patch, SOLR-4321.patch, SOLR-4321.patch
>
>
> This is best described by a photo of my SolrCloud admin: 
> http://i.imgur.com/hW4xnxy.png
> I have a 4 node cluster, with shard count of 2 and replication factor of 2.
> After running something like,
>   
> http://localhost:8983/solr/admin/collections?action=CREATE&name=15724&numShards=2&replicationFactor=2
> The shards seem to be completely randomly placed, which is fine, but some 
> nodes receive more of the share than others (some even receiving none).
> For example, 15724 has given node "backfill-1d" 2 "parts", 
> 15724_shard1_replica2 and 15724_shard2_replica1 but has given "backfill-2d" 
> nothing at all.
> This creates unnecessary load on some nodes, no? Is this something I am 
> supposed to handle myself (I looked, but don't see how).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4321) SolrCloud collection CREATE will put multiple master shards on a single node (and no shards on some)

2013-01-22 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-4321:
--

Attachment: SOLR-4321.patch

> SolrCloud collection CREATE will put multiple master shards on a single node 
> (and no shards on some)
> 
>
> Key: SOLR-4321
> URL: https://issues.apache.org/jira/browse/SOLR-4321
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.1
>Reporter: Brett Hoerner
>Assignee: Mark Miller
> Attachments: SOLR-4321.patch, SOLR-4321.patch, SOLR-4321.patch
>
>
> This is best described by a photo of my SolrCloud admin: 
> http://i.imgur.com/hW4xnxy.png
> I have a 4 node cluster, with shard count of 2 and replication factor of 2.
> After running something like,
>   
> http://localhost:8983/solr/admin/collections?action=CREATE&name=15724&numShards=2&replicationFactor=2
> The shards seem to be completely randomly placed, which is fine, but some 
> nodes receive more of the shards than others (some even receiving none).
> For example, 15724 has given node "backfill-1d" 2 "parts", 
> 15724_shard1_replica2 and 15724_shard2_replica1 but has given "backfill-2d" 
> nothing at all.
> This creates unnecessary load on some nodes, no? Is this something I am 
> supposed to handle myself (I looked, but don't see how).




[jira] [Created] (SOLR-4332) Adding documents to SolrCloud collection broken when a node doesn't have a core for the collection

2013-01-22 Thread Eric Falcao (JIRA)
Eric Falcao created SOLR-4332:
-

 Summary: Adding documents to SolrCloud collection broken when a 
node doesn't have a core for the collection
 Key: SOLR-4332
 URL: https://issues.apache.org/jira/browse/SOLR-4332
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.1
Reporter: Eric Falcao
Priority: Critical


In SOLR-4321, it's documented that creating a collection via API results in 
some nodes having more than one core, while other nodes have zero cores.

Not sure if this is desired behavior, but when a node doesn't know about a 
core, it throws a 404 on select/update.

Reproduction:
-Create a 2 node SolrCloud cluster
-Create a new collection with numShards=1. 50% of your cluster will have a core 
for that collection.
-Do an update or select against the node that doesn't have the core. 404

Like I said, not sure if this is desired behavior, but I would expect a cluster 
of nodes to be able to forward requests appropriately to nodes that have a core 
for the collection.




[jira] [Commented] (SOLR-4331) queryelevationcomponent init error; The reference to entity "objID" must end with the ';' delimiter.

2013-01-22 Thread David Morana (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559775#comment-13559775
 ] 

David Morana commented on SOLR-4331:


Hi Otis,
 That seems to have done the trick! I encoded both ampersands and solr didn't 
throw any errors!
 You're awesome, never change!
Thanks,
David



> queryelevationcomponent init error; The reference to entity "objID" must end 
> with the ';' delimiter. 
> 
>
> Key: SOLR-4331
> URL: https://issues.apache.org/jira/browse/SOLR-4331
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.0
> Environment: solr 4.0 final on tomcat 7.0.34
>Reporter: David Morana
> Fix For: 4.0
>
>
> Hi, 
> I put this in elevate.xml just to test it out.
> {code:xml}
> <elevate>
>  <query text="...">
>   <doc 
> id="https://opentextdev/cs/llisapi.dll?func=ll&objID=577575&objAction=download" 
>   />
>  </query>
> </elevate>
> {code}
> I have the urls to opentext documents as the uniquekey (id)
> this is what I get:
> {noformat}
> 16:25:48 SEVERE Config Exception during parsing file: 
> elevate.xml:org.xml.sax.SAXParseException; systemId: solrres:/elevate.xml; 
> lineNumber: 28; columnNumber: 77; The reference to entity "objID" must end 
> with the ';' delimiter.
> 16:25:48 SEVERE SolrCore java.lang.NullPointerException
> 16:25:48 SEVERE CoreContainer Unable to create core: Lisa
> 16:25:48 SEVERE CoreContainer null:org.apache.solr.common.SolrException: 
> Error initializing QueryElevationComponent. 
> {noformat}
> It seems to not like the Opentext objID in the URL. 
> How can I fix this?




[jira] [Updated] (LUCENE-2453) Make Index Output Buffer Size Configurable

2013-01-22 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-2453:


Affects Version/s: (was: 3.0.1)
   4.0
   4.1
Fix Version/s: 5.0
   4.2

> Make Index Output Buffer Size Configurable
> --
>
> Key: LUCENE-2453
> URL: https://issues.apache.org/jira/browse/LUCENE-2453
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/store
>Affects Versions: 4.0, 4.1
>Reporter: Karthick Sankarachary
>Assignee: Simon Willnauer
> Fix For: 4.2, 5.0
>
> Attachments: LUCENE-2453.patch, LUCENE-2453.patch
>
>
> Currently, the buffered index input class allows sub-classes and users 
> thereof to specify a size for the input buffer, which by default is 1024 
> bytes. In practice, this option is leveraged by the simple file and compound 
> segment index input sub-classes. 
> By the same token, it would be nice if the buffered index output class could 
> open up its buffer size for users to configure. In particular, this would 
> allow sub-classes thereof to align the output buffer size, which by default 
> is 16384 bytes, to that of the underlying directory's data unit. For example, 
> a network-based directory might want to buffer data in multiples of its 
> maximum transmission unit. To use an existing use-case, the file system-based 
> directory could potentially choose to align its output buffer size to the 
> operating system's file block size.
> The proposed change to the buffered index output class involves defining a 
> one-arg constructor that takes a user-defined buffer size, and a default 
> constructor that uses the currently defined buffer size.
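The proposed API shape amounts to the pattern below. This is a Python sketch of the design only; Lucene's BufferedIndexOutput is a Java class, and the class and field names here are illustrative:

```python
DEFAULT_BUFFER_SIZE = 16384  # keep the existing default, in bytes

class BufferedOutput:
    """Sketch of the proposal: a one-arg constructor taking a
    user-defined buffer size, plus a default constructor that
    falls back to the current default."""
    def __init__(self, buffer_size: int = DEFAULT_BUFFER_SIZE):
        if buffer_size <= 0:
            raise ValueError("buffer size must be positive")
        self.buffer = bytearray(buffer_size)

# A network-backed directory could align the buffer to its MTU:
mtu_aligned = BufferedOutput(buffer_size=1500 * 4)
```

Callers that don't care keep the old behavior; subclasses that know their directory's data unit pass an explicit size.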




[jira] [Updated] (LUCENE-2453) Make Index Output Buffer Size Configurable

2013-01-22 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-2453:


Attachment: LUCENE-2453.patch

bringing this up-to-date I think this is pretty useful for downstream apps 
though. I plan to commit this soon too.

> Make Index Output Buffer Size Configurable
> --
>
> Key: LUCENE-2453
> URL: https://issues.apache.org/jira/browse/LUCENE-2453
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/store
>Affects Versions: 4.0, 4.1
>Reporter: Karthick Sankarachary
>Assignee: Simon Willnauer
> Fix For: 4.2, 5.0
>
> Attachments: LUCENE-2453.patch, LUCENE-2453.patch
>
>
> Currently, the buffered index input class allows sub-classes and users 
> thereof to specify a size for the input buffer, which by default is 1024 
> bytes. In practice, this option is leveraged by the simple file and compound 
> segment index input sub-classes. 
> By the same token, it would be nice if the buffered index output class could 
> open up its buffer size for users to configure. In particular, this would 
> allow sub-classes thereof to align the output buffer size, which by default 
> is 16384 bytes, to that of the underlying directory's data unit. For example, 
> a network-based directory might want to buffer data in multiples of its 
> maximum transmission unit. To use an existing use-case, the file system-based 
> directory could potentially choose to align its output buffer size to the 
> operating system's file block size.
> The proposed change to the buffered index output class involves defining a 
> one-arg constructor that takes a user-defined buffer size, and a default 
> constructor that uses the currently defined buffer size.




[jira] [Assigned] (LUCENE-2453) Make Index Output Buffer Size Configurable

2013-01-22 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer reassigned LUCENE-2453:
---

Assignee: Simon Willnauer

> Make Index Output Buffer Size Configurable
> --
>
> Key: LUCENE-2453
> URL: https://issues.apache.org/jira/browse/LUCENE-2453
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/store
>Affects Versions: 3.0.1
>Reporter: Karthick Sankarachary
>Assignee: Simon Willnauer
> Attachments: LUCENE-2453.patch, LUCENE-2453.patch
>
>
> Currently, the buffered index input class allows sub-classes and users 
> thereof to specify a size for the input buffer, which by default is 1024 
> bytes. In practice, this option is leveraged by the simple file and compound 
> segment index input sub-classes. 
> By the same token, it would be nice if the buffered index output class could 
> open up its buffer size for users to configure. In particular, this would 
> allow sub-classes thereof to align the output buffer size, which by default 
> is 16384 bytes, to that of the underlying directory's data unit. For example, 
> a network-based directory might want to buffer data in multiples of its 
> maximum transmission unit. To use an existing use-case, the file system-based 
> directory could potentially choose to align its output buffer size to the 
> operating system's file block size.
> The proposed change to the buffered index output class involves defining a 
> one-arg constructor that takes a user-defined buffer size, and a default 
> constructor that uses the currently defined buffer size.




[jira] [Commented] (SOLR-4331) queryelevationcomponent init error; The reference to entity "objID" must end with the ';' delimiter.

2013-01-22 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559767#comment-13559767
 ] 

Otis Gospodnetic commented on SOLR-4331:


objID is fine.  It is the "&" character that is not liked over there, because 
it is used for XML entities (I think that is the correct term).  For example 
"& g t ;" (without spaces, had to insert them to avoid JIRA messing with them) 
would render a "greater than" char in the browser, "& l t ;" would be "less 
than", and "& a m p ;" would be "ampersand".


> queryelevationcomponent init error; The reference to entity "objID" must end 
> with the ';' delimiter. 
> 
>
> Key: SOLR-4331
> URL: https://issues.apache.org/jira/browse/SOLR-4331
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.0
> Environment: solr 4.0 final on tomcat 7.0.34
>Reporter: David Morana
> Fix For: 4.0
>
>
> Hi, 
> I put this in elevate.xml just to test it out.
> {code:xml}
> 
>  
>id="https://opentextdev/cs/llisapi.dll?func=ll&objID=577575&objAction=download";
>  />
>  
> 
> {code}
> I have the urls to opentext documents as the uniquekey (id)
> this is what I get:
> {noformat}
> 16:25:48 SEVERE Config Exception during parsing file: 
> elevate.xml:org.xml.sax.SAXParseException; systemId: solrres:/elevate.xml; 
> lineNumber: 28; columnNumber: 77; The reference to entity "objID" must end 
> with the ';' delimiter.
> 16:25:48 SEVERE SolrCore java.lang.NullPointerException
> 16:25:48 SEVERE CoreContainer Unable to create core: Lisa
> 16:25:48 SEVERE CoreContainer null:org.apache.solr.common.SolrException: 
> Error initializing QueryElevationComponent. 
> {noformat}
> It seems to not like the Opentext objID in the URL. 
> How can I fix this?




[ANNOUNCE] Apache Solr 4.1 released

2013-01-22 Thread Steve Rowe
January 2013, Apache Solr™ 4.1 available

The Lucene PMC is pleased to announce the release of Apache Solr 4.1.

Solr is the popular, blazing fast, open source NoSQL search platform
from the Apache Lucene project. Its major features include powerful
full-text search, hit highlighting, faceted search, dynamic
clustering, database integration, rich document (e.g., Word, PDF)
handling, and geospatial search.  Solr is highly scalable, providing
fault tolerant distributed search and indexing, and powers the search
and navigation features of many of the world's largest internet sites.

Solr 4.1 is available for immediate download at:
   http://lucene.apache.org/solr/mirrors-solr-latest-redir.html

Note: starting with Solr 4.1, the "apache-" prefix has been removed
from all artifact and distribution filenames.

See the CHANGES.txt file included with the release for a full list of
details.

Solr 4.1 Release Highlights:

SolrCloud enhancements (see http://wiki.apache.org/solr/SolrCloud):
* Simple multi-tenancy through enhanced document routing:
  - The "compositeId" router is the default for collections with hash
based routing (i.e. when numShards=N is specified on collection
creation).
  - Documents with ids sharing the same domain/prefix, e.g.
'customerB!', will be routed to the same shard, allowing for
efficient querying.  At query time, one can specify a "shard.keys"
parameter that lists the domains, e.g. 'shard.keys=customerB!', and
controls what shards the query is routed to.
  - Collections that do not specify numShards at collection creation
time use custom sharding and default to the "implicit" router.
Document updates received by a shard will be indexed to that shard,
unless a "_shard_" parameter or document field names a different
shard.
* Short circuiting for distributed search if a request only needs
  to query a single shard.
* Allow creating more than one shard per instance with the Collection
  API.
* Allow access to the collections API through CloudSolrServer without
  referencing an existing collection.
* Collection API: Support for specifying a list of Solr addresses to
  spread a new collection across.
* New and improved auto host detection strategy.
* Numerous bug fixes and general hardening - it's recommended that all
  Solr 4.0 SolrCloud users upgrade to 4.1.
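
The compositeId routing above can be sketched with a toy router (an illustration only; Solr's real implementation uses a different hash and different internals): everything before the '!' determines the shard, so ids sharing a prefix co-locate.

```python
import hashlib

def shard_for(doc_id: str, num_shards: int) -> int:
    """Toy compositeId router: hash only the part before '!',
    so all ids sharing a prefix land on the same shard."""
    prefix = doc_id.split("!", 1)[0]
    digest = hashlib.md5(prefix.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards

# Ids with the same 'customerB!' prefix always co-locate:
assert shard_for("customerB!doc1", 4) == shard_for("customerB!doc99", 4)
```

This is also why a query with shard.keys=customerB! only needs to touch one shard: the router can recompute the same prefix hash at query time.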

New features:
* The majority of Solr's features, including replication, now work with
  custom Directory and DirectoryFactory implementations.
* Indexed term offsets, specifiable via a 'storeOffsetsWithPositions'
  flag on field definitions in the schema.  Useful for highlighters.
* Solr QParsers may now be directly invoked in the lucene query syntax
  via localParams and without the _query_ magic field hack.
  Example: foo AND {!term f=myfield v=$qq}
* Solr now parses request parameters (from URL or sent with POST using
  content-type application/x-www-form-urlencoded) in its dispatcher code.
  It no longer relies on special configuration settings in Tomcat or other
  web containers to enable UTF-8 encoding, which is mandatory for correct
  Solr behaviour. Solr now works out of the box with e.g. Tomcat, JBoss,...
* Directory IO rate limiting based on the IO context.
* Distributed search support for MoreLikeThis.
* Multi-core: On-demand core loading and LRU-based core unloading after
  reaching a user-specified maximum number.
* The new Solr 4 spatial fields now work with the {!geofilt} and {!bbox}
  query parsers. The score local-param works too.
* Extra statistics to RequestHandlers - 5 & 15-minute reqs/sec rolling
  averages; median, 75th, 95th, 99th, 99.9th percentile request times.
* PostingsHighlighter support (see
  http://blog.mikemccandless.com/2012/12/a-new-lucene-highlighter-is-born.html) 

Admin UI improvements:
* Internet Explorer is now supported
* Enhanced readability of XML query response display in Query UI
* Many improvements to DataImportHandler UI
* Core creation and deletion now updates the main/left list of cores
* Admin Cores UI now redirects to newly created core details
* Deleted documents are calculated/displayed
* Allow multiple Items to stay open on Plugins-Page 

Storage improvements (thanks to the new Lucene 4.1 codec):
* Faster search, in particular for rare terms such as primary key/id
  fields.
* Stored fields are compressed. (See
  
http://blog.jpountz.net/post/33247161884/efficient-compressed-stored-fields-with-lucene)

DataImportHandler contrib module backwards-compatibility breaks:
* These default to the "root" Locale, rather than the JVM default locale 
  as before.
  - NumberFormatTransformer & DateFormatTransformer
  - "formatDate" evaluator 
  - "dataimport.properties" file "last_index_time" property
* These default to UTF-8 encoding, rather than the JVM default encoding 
  as before.
  - FileDataSource & FieldReaderDataSource  
* These may require code changes to custom plug-ins
  - The EvaluatorBag class was eliminated and its public/protected methods 
  

[ANNOUNCE] Apache Lucene 4.1 released

2013-01-22 Thread Steve Rowe
January 2013, Apache Lucene™ 4.1 available

The Lucene PMC is pleased to announce the release of Apache Lucene 4.1.

Apache Lucene is a high-performance, full-featured text search engine
library written entirely in Java. It is a technology suitable for nearly
any application that requires full-text search, especially cross-platform.

This release contains numerous bug fixes, optimizations, and
improvements, some of which are highlighted below. The release
is available for immediate download at:
   http://lucene.apache.org/core/mirrors-core-latest-redir.html

See the CHANGES.txt file included with the release for a full list of
details.

Lucene 4.1 Release Highlights:

 * Lucene 4.1 has a new default codec (Lucene41Codec) based on the
   previously-experimental "Block" indexing format for improved
   performance, but also incorporating the functionality of "Appending"
   and "Pulsing".

 * The default codec incorporates the optimization of Pulsing: terms
   that appear in only one document (such as primary key/id fields) just
   store the document id in the term dictionary instead of a pointer to
   this document id in a separate file.

 * The default codec incorporates an efficient compressed stored fields
   implementation that compresses chunks of documents together with LZ4.
   (see
   
http://blog.jpountz.net/post/33247161884/efficient-compressed-stored-fields-with-lucene)

 * Lucene no longer seeks when writing files (all fields are written in
   an append-only way). This means it works by default with append-only
   streams, hdfs, etc.

 * New suggest implementations: AnalyzingSuggester, where the underlying
   form (computed from a lucene Analyzer) used for suggestions is
   separate from the returned text (see
   http://blog.mikemccandless.com/2012/09/lucenes-new-analyzing-suggester.html),
   and FuzzySuggester, which additionally allows for inexact matching on
   the input.

 * Near-realtime support was added to the facet module.
  (see http://shaierera.blogspot.com/2012/11/lucene-facets-part-1.html)

 * New Highlighter (postingshighlighter) added to the highlighter
   module.  (see
   http://blog.mikemccandless.com/2012/12/a-new-lucene-highlighter-is-born.html)

 * Added FilterStrategy to FilteredQuery for more flexibility in
   filtered query execution.

 * Added CommonTermsQuery to speed up queries with very highly frequent
   terms.  Term frequencies are efficiently detected at query time - no
   index time preparation required.

 * Several bugfixes and optimizations since the 4.0 release.

Please read CHANGES.txt for a full list of new features.

Please report any feedback to the mailing lists
(http://lucene.apache.org/core/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring network
for distributing releases.  It is possible that the mirror you are using
may not have replicated the release yet.  If that is the case, please
try another mirror.  This also goes for Maven access.

Happy searching,
Lucene/Solr developers



[jira] [Updated] (LUCENE-4609) Write a PackedIntsEncoder/Decoder for facets

2013-01-22 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-4609:
---

Attachment: LUCENE-4609.patch

Here's another attempt (totally prototype / not committable) at using 
PackedInts to hold the ords ...

It's hacked up: it visits all byte[] from DocValues in the index and converts 
to in-RAM PackedInts arrays, and then does all facet counting from those arrays.

But, the performance is sort of 'meh':

{noformat}
Task                  QPS base  StdDev    QPS comp  StdDev       Pct diff
MedTerm                 109.40  (1.5%)      102.06  (1.5%)    -6.7% (  -9% -   -3%)
AndHighLow              374.95  (3.0%)      361.19  (2.6%)    -3.7% (  -8% -    1%)
AndHighMed              172.57  (1.5%)      169.35  (1.1%)    -1.9% (  -4% -    0%)
Prefix3                 177.54  (6.2%)      174.26  (8.0%)    -1.8% ( -15% -   13%)
IntNRQ                  116.07  (7.5%)      113.97  (9.3%)    -1.8% ( -17% -   16%)
Fuzzy2                   86.19  (2.4%)       85.16  (2.8%)    -1.2% (  -6% -    4%)
AndHighHigh              46.76  (1.4%)       46.36  (1.1%)    -0.8% (  -3% -    1%)
LowTerm                 146.56  (1.8%)      145.58  (1.4%)    -0.7% (  -3% -    2%)
HighTerm                 26.35  (2.0%)       26.20  (2.1%)    -0.6% (  -4% -    3%)
MedSpanNear              64.98  (2.3%)       64.62  (2.8%)    -0.5% (  -5% -    4%)
LowSloppyPhrase          67.07  (2.3%)       66.80  (3.6%)    -0.4% (  -6% -    5%)
OrHighMed                25.18  (1.6%)       25.10  (2.1%)    -0.3% (  -3% -    3%)
Wildcard                256.33  (3.1%)      255.56  (3.5%)    -0.3% (  -6% -    6%)
PKLookup                305.42  (2.3%)      304.72  (2.1%)    -0.2% (  -4% -    4%)
OrHighLow                24.59  (1.3%)       24.54  (2.2%)    -0.2% (  -3% -    3%)
Fuzzy1                   81.38  (3.0%)       81.60  (2.7%)     0.3% (  -5% -    6%)
Respell                 141.17  (3.8%)      141.87  (3.9%)     0.5% (  -6% -    8%)
LowSpanNear              38.34  (3.2%)       38.78  (3.0%)     1.1% (  -4% -    7%)
MedSloppyPhrase          63.80  (2.1%)       64.53  (3.5%)     1.1% (  -4% -    6%)
HighSpanNear             10.20  (2.8%)       10.32  (3.1%)     1.2% (  -4% -    7%)
MedPhrase               103.16  (4.5%)      104.72  (2.1%)     1.5% (  -4% -    8%)
OrHighHigh               17.81  (1.5%)       18.18  (2.7%)     2.1% (  -2% -    6%)
LowPhrase                58.77  (5.5%)       60.49  (3.0%)     2.9% (  -5% -   12%)
HighPhrase               38.68 (10.0%)       40.46  (5.6%)     4.6% ( -10% -   22%)
HighSloppyPhrase          2.97  (7.9%)        3.22 (12.6%)     8.3% ( -11% -   31%)

{noformat}

Maybe if I used the bulk read PackedInts APIs instead it would be better...

> Write a PackedIntsEncoder/Decoder for facets
> 
>
> Key: LUCENE-4609
> URL: https://issues.apache.org/jira/browse/LUCENE-4609
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/facet
>Reporter: Shai Erera
>Priority: Minor
> Attachments: LUCENE-4609.patch, LUCENE-4609.patch, LUCENE-4609.patch
>
>
> Today the facets API lets you write IntEncoder/Decoder to encode/decode the 
> category ordinals. We have several such encoders, including VInt (default), 
> and block encoders.
> It would be interesting to implement and benchmark a 
> PackedIntsEncoder/Decoder, with potentially two variants: (1) receives 
> bitsPerValue up front, when you e.g. know that you have a small taxonomy and 
> the max value you can see and (2) one that decides for each doc on the 
> optimal bitsPerValue, writes it as a header in the byte[] or something.
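
Variant (1) above, a fixed bitsPerValue known up front, can be illustrated with a toy packer (pure illustration of the bit layout, not the Lucene PackedInts API):

```python
def pack(values, bits_per_value):
    """Pack small non-negative ints into one big integer,
    bits_per_value bits each, lowest index in the lowest bits."""
    packed = 0
    for i, v in enumerate(values):
        assert 0 <= v < (1 << bits_per_value), "value exceeds bitsPerValue"
        packed |= v << (i * bits_per_value)
    return packed

def unpack(packed, bits_per_value, count):
    """Recover the original values by masking out each slot."""
    mask = (1 << bits_per_value) - 1
    return [(packed >> (i * bits_per_value)) & mask for i in range(count)]

ords = [3, 1, 4, 1, 5]  # category ordinals for one document
assert unpack(pack(ords, 3), 3, len(ords)) == ords
```

Variant (2) would simply prepend the chosen bits_per_value as a per-document header before the packed payload.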




[jira] [Commented] (LUCENE-4612) ant nightly-smoke leaves a dirty checkout

2013-01-22 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559730#comment-13559730
 ] 

Commit Tag Bot commented on LUCENE-4612:


[branch_4x commit] Michael McCandless
http://svn.apache.org/viewvc?view=revision&revision=1437009

LUCENE-4612: tell python not to write bytecode to the filesystem


> ant nightly-smoke leaves a dirty checkout
> -
>
> Key: LUCENE-4612
> URL: https://issues.apache.org/jira/browse/LUCENE-4612
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
>Assignee: Michael McCandless
> Fix For: 4.2, 5.0
>
>
> ?   dev-tools/scripts/__pycache__
> Can we not leave this around?




[jira] [Commented] (LUCENE-4612) ant nightly-smoke leaves a dirty checkout

2013-01-22 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559728#comment-13559728
 ] 

Commit Tag Bot commented on LUCENE-4612:


[trunk commit] Michael McCandless
http://svn.apache.org/viewvc?view=revision&revision=1437007

LUCENE-4612: tell python not to write bytecode to the filesystem


> ant nightly-smoke leaves a dirty checkout
> -
>
> Key: LUCENE-4612
> URL: https://issues.apache.org/jira/browse/LUCENE-4612
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
>Assignee: Michael McCandless
> Fix For: 4.2, 5.0
>
>
> ?   dev-tools/scripts/__pycache__
> Can we not leave this around?




[jira] [Commented] (SOLR-1530) Open IndexSearcher lazily

2013-01-22 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559726#comment-13559726
 ] 

Erick Erickson commented on SOLR-1530:
--

I'm going to claim that this is handled by openSearcher=false in the new (4.x) 
world and close it in a couple of days unless people object.

> Open IndexSearcher lazily
> -
>
> Key: SOLR-1530
> URL: https://issues.apache.org/jira/browse/SOLR-1530
> Project: Solr
>  Issue Type: Improvement
>  Components: multicore
>Reporter: Shalin Shekhar Mangar
>Assignee: Erick Erickson
> Fix For: 4.2, 5.0
>
>
> Make it possible to open a searcher lazily - If indexing is being done 
> continuously but searches are done infrequently we should avoid the overhead 
> of re-opening searchers on every commit. There are also use-cases where a 
> Solr core is needed to be loaded only for writes.




[jira] [Resolved] (LUCENE-4612) ant nightly-smoke leaves a dirty checkout

2013-01-22 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-4612.


   Resolution: Fixed
Fix Version/s: 5.0
   4.2

> ant nightly-smoke leaves a dirty checkout
> -
>
> Key: LUCENE-4612
> URL: https://issues.apache.org/jira/browse/LUCENE-4612
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
>Assignee: Michael McCandless
> Fix For: 4.2, 5.0
>
>
> ?   dev-tools/scripts/__pycache__
> Can we not leave this around?




[jira] [Assigned] (SOLR-1530) Open IndexSearcher lazily

2013-01-22 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson reassigned SOLR-1530:


Assignee: Erick Erickson

> Open IndexSearcher lazily
> -
>
> Key: SOLR-1530
> URL: https://issues.apache.org/jira/browse/SOLR-1530
> Project: Solr
>  Issue Type: Improvement
>  Components: multicore
>Reporter: Shalin Shekhar Mangar
>Assignee: Erick Erickson
> Fix For: 4.2, 5.0
>
>
> Make it possible to open a searcher lazily - If indexing is being done 
> continuously but searches are done infrequently we should avoid the overhead 
> of re-opening searchers on every commit. There are also use-cases where a 
> Solr core is needed to be loaded only for writes.




[jira] [Assigned] (LUCENE-4612) ant nightly-smoke leaves a dirty checkout

2013-01-22 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned LUCENE-4612:
--

Assignee: Michael McCandless

> ant nightly-smoke leaves a dirty checkout
> -
>
> Key: LUCENE-4612
> URL: https://issues.apache.org/jira/browse/LUCENE-4612
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
>Assignee: Michael McCandless
>
> ?   dev-tools/scripts/__pycache__
> Can we not leave this around?




[jira] [Commented] (LUCENE-4612) ant nightly-smoke leaves a dirty checkout

2013-01-22 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559711#comment-13559711
 ] 

Michael McCandless commented on LUCENE-4612:


-B seems to work ... I'll pass this to all places where we invoke Python.

> ant nightly-smoke leaves a dirty checkout
> -
>
> Key: LUCENE-4612
> URL: https://issues.apache.org/jira/browse/LUCENE-4612
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
>Assignee: Michael McCandless
>
> ?   dev-tools/scripts/__pycache__
> Can we not leave this around?




Fixing query-time multi-word synonym issue

2013-01-22 Thread Otis Gospodnetic
Hello,

I'm looking for some guidance around solving the infamous index-time vs.
query-time multi-word synonym problem.  Looking for help with understanding
the pieces and effort involved, and also being on the lookout for any
potential "man, it will take you forever, you'll have to do major Lucene
surgery" type of warnings.

I never looked deeply into this problem and my understanding is that
multi-word synonyms don't work at query-time because QueryParser(?) simply
breaks queries on spaces and thus makes it impossible for
SynonymTokenFilter (?) to "see" the non-broken-up token sequence and do
synonym expansion.

I think this is also documented on the Wiki.
Are there other pieces involved that I didn't mention, but should have?

The following are 3 different efforts I found:
https://issues.apache.org/jira/browse/LUCENE-4499
http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/
http://www.ub.uni-bielefeld.de/~befehl/base/solr/eurovoc.html

Plus Jack's proposal:
http://search-lucene.com/m/Zkj0k15dDGP1

Does any of the above approaches sound like the right one, or at least in
the right direction, and stands the chance of being accepted?

Thanks,
Otis
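
The mechanics Otis describes can be shown in a tiny self-contained sketch 
(plain Java, deliberately not Lucene's API): once the query parser has split 
the input on whitespace, a synonym entry keyed on a multi-word phrase can 
never match, because the filter only ever sees single tokens. The names and 
the toy synonym map below are illustrative, not anything from Lucene or Solr.

```java
import java.util.Map;

// Toy illustration (not Lucene code) of the query-time multi-word synonym
// problem: a synonym keyed on the whole phrase never fires once the query
// has been pre-split on whitespace.
public class MultiWordSynonymDemo {
    static final Map<String, String> SYNONYMS = Map.of("big apple", "new york");

    // Mimics the query parser: split on whitespace first, then look up
    // each resulting single token in the synonym map.
    static boolean expandsAfterSplitting(String query) {
        for (String token : query.split("\\s+")) {
            if (SYNONYMS.containsKey(token)) return true;
        }
        return false;
    }

    // What query-time expansion would need: lookup over the unbroken
    // token sequence before any splitting happens.
    static boolean expandsOnWholePhrase(String query) {
        return SYNONYMS.containsKey(query);
    }

    public static void main(String[] args) {
        // "big" and "apple" individually miss the "big apple" entry.
        System.out.println(expandsAfterSplitting("big apple")); // false
        System.out.println(expandsOnWholePhrase("big apple"));  // true
    }
}
```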


[jira] [Commented] (LUCENE-4612) ant nightly-smoke leaves a dirty checkout

2013-01-22 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559704#comment-13559704
 ] 

Michael McCandless commented on LUCENE-4612:


I *think* we can just pass -B when we run Python?  
http://stackoverflow.com/questions/154443/how-to-avoid-pyc-files
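
For reference, both the -B flag and the PYTHONDONTWRITEBYTECODE environment 
variable suppress bytecode (.pyc/__pycache__) writing; sys.dont_write_bytecode 
reflects the setting at runtime:

```shell
# Two equivalent ways to keep CPython from writing .pyc/__pycache__ files
# when a build invokes Python, as discussed in LUCENE-4612:
python3 -B -c 'import sys; print(sys.dont_write_bytecode)'                   # True
PYTHONDONTWRITEBYTECODE=1 python3 -c 'import sys; print(sys.dont_write_bytecode)'  # True
```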

> ant nightly-smoke leaves a dirty checkout
> -
>
> Key: LUCENE-4612
> URL: https://issues.apache.org/jira/browse/LUCENE-4612
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
>
> ?   dev-tools/scripts/__pycache__
> Can we not leave this around?




[jira] [Commented] (SOLR-4326) Solr exceptions indicate missing Tika jars in example

2013-01-22 Thread Shinichiro Abe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559695#comment-13559695
 ] 

Shinichiro Abe commented on SOLR-4326:
--

Maybe this issue is related to SOLR-4209.

> Solr exceptions indicate missing Tika jars in example
> -
>
> Key: SOLR-4326
> URL: https://issues.apache.org/jira/browse/SOLR-4326
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - Solr Cell (Tika extraction)
>Affects Versions: 4.0
> Environment: Built from source from lucene_solr_4_0_0 tag
>Reporter: Karl Wright
>
> The Solr example does not process all kinds of extraction properly.  The 
> cause seems to be missing Tika jars.  For example, .cfg files cause the 
> following exception:
> {code}
> SEVERE: null:java.lang.RuntimeException: java.lang.NoClassDefFoundError: 
> org/apache/tika/parser/asm/XHTMLClassVisitor
> at 
> org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:469)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:297)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
> at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
> at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
> at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
> at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
> at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
> at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
> at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
> at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
> at org.eclipse.jetty.server.Server.handle(Server.java:351)
> at 
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
> at 
> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
> at 
> org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900)
> at 
> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:954)
> at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:952)
> at 
> org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:230)
> at 
> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
> at 
> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
> at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
> at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.lang.NoClassDefFoundError: 
> org/apache/tika/parser/asm/XHTMLClassVisitor
> at org.apache.tika.parser.asm.ClassParser.parse(ClassParser.java:51)
> at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> at 
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> at 
> org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
> at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> at 
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1699)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
> ... 25 more
> {code}


[jira] [Commented] (SOLR-2968) Hunspell very high memory use when loading dictionary

2013-01-22 Thread Max Schmidt (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559680#comment-13559680
 ] 

Max Schmidt commented on SOLR-2968:
---

Dictionaries with the same file location should be shared across all fields and 
all indexes. This would minimize the problem if you're using multiple indexes. 

Currently I can't use Solr because I have 10 indexes with 5 fields each, and a 
DictionaryCompoundWordTokenFilterFactory is assigned to each field, so the 
dictionary will be loaded 50 times. This is too much for my RAM.
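
The sharing proposed here can be sketched with a cache keyed by the dictionary 
file path; the class and method names below are illustrative, not Solr's actual 
APIs. computeIfAbsent guarantees at most one load per distinct path, so 10 
indexes times 5 fields still costs a single load rather than 50.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of sharing dictionaries by file location: every factory
// instance asks the cache, and only the first request per path pays the load.
public class DictionaryCache {
    static final AtomicInteger loads = new AtomicInteger();
    static final Map<String, Object> CACHE = new ConcurrentHashMap<>();

    static Object dictionaryFor(String path) {
        // computeIfAbsent performs the load at most once per distinct key,
        // even when multiple factories initialize concurrently.
        return CACHE.computeIfAbsent(path, p -> {
            loads.incrementAndGet();
            return new Object(); // stand-in for the parsed dictionary
        });
    }

    public static void main(String[] args) {
        for (int index = 0; index < 10; index++)       // 10 indexes
            for (int field = 0; field < 5; field++)    // 5 fields each
                dictionaryFor("/dicts/polish.dic");
        System.out.println(loads.get()); // 1 load instead of 50
    }
}
```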

> Hunspell very high memory use when loading dictionary
> -
>
> Key: SOLR-2968
> URL: https://issues.apache.org/jira/browse/SOLR-2968
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 3.5
>Reporter: Maciej Lisiewski
>Priority: Minor
> Attachments: patch.txt
>
>
> Hunspell stemmer requires gigantic (for the task) amounts of memory to load 
> dictionary/rules files. 
> For example loading a 4.5 MB polish dictionary (with empty index!) will cause 
> whole core to crash with various out of memory errors unless you set max heap 
> size close to 2GB or more.
> By comparison Stempel using the same dictionary file works just fine with 1/8 
> of that (and possibly lower values as well).
> Sample error log entries:
> http://pastebin.com/fSrdd5W1
> http://pastebin.com/Lmi0re7Z




[jira] [Commented] (SOLR-4328) Simultaneous multiple connections to Solr example sometimes fail with various IOExceptions

2013-01-22 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559616#comment-13559616
 ] 

Karl Wright commented on SOLR-4328:
---

The problem also shows up when Solr is run on Tomcat6.  This rules out Jetty as 
the source of the issue.


> Simultaneous multiple connections to Solr example sometimes fail with various 
> IOExceptions
> --
>
> Key: SOLR-4328
> URL: https://issues.apache.org/jira/browse/SOLR-4328
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.0, 3.6.2
> Environment: ManifoldCF, Solr connector, SolrJ, and Solr 4.0 or 3.6 
> on Mac OSX or Ubuntu, all localhost connections
>Reporter: Karl Wright
>
> In ManifoldCF, we've been seeing problems with SolrJ connections throwing 
> java.net.SocketException's.  See CONNECTORS-616 for details as to exactly 
> what varieties of this exception are thrown, but "broken pipe" is the most 
> common.  This occurs on multiple Unix variants as stated.  (We also 
> occasionally see exceptions on Windows, but they are much less frequent and 
> are different variants than on Unix.)
> The exceptions seem to occur during the time an initial connection is getting 
> established, and seems to occur randomly when multiple connections are 
> getting established all at the same time.  Wire logging shows that only the 
> first few headers are sent before the connection is broken.  Solr itself does 
> log any error.  A retry is usually sufficient to have the transaction succeed.
> The Solr Connector in ManifoldCF has recently been upgraded to rely on SolrJ, 
> which could be a complicating factor.  However, I have repeatedly audited 
> both the Solr Connection code and the SolrJ code for best practices, and 
> while I found a couple of problems, nothing seems to be of the sort that 
> could cause a broken pipe.  For that to happen, the socket must be closed 
> either on the client end or on the server end, and there appears to be no 
> mechanism for that happening on the client end, since multiple threads would 
> have to be working with the same socket for that to be a possibility.
> It is also true that in ManifoldCF we disable the automatic retries that are 
> normally enabled for HttpComponents HttpClient.  These automatic retries 
> likely mask this problem should it be occurring in other situations.
> Places where there could potentially be a bug, in order of likelihood:
> (1) Jetty.  Nobody I am aware of has seen this on Tomcat yet.  But I also 
> don't know if anyone has tried it.
> (2) Solr servlet.  If it is possible for a servlet implementation to cause 
> the connection to drop without any exception being generated, this would be 
> something that should be researched.
> (3) HttpComponents/HttpClient.  If there is a client-side issue, it would 
> have to be because an httpclient instance was closing sockets from other 
> instances.




[jira] [Resolved] (LUCENE-4599) Compressed term vectors

2013-01-22 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-4599.
--

Resolution: Fixed

> Compressed term vectors
> ---
>
> Key: LUCENE-4599
> URL: https://issues.apache.org/jira/browse/LUCENE-4599
> Project: Lucene - Core
>  Issue Type: Task
>  Components: core/codecs, core/termvectors
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Fix For: 4.2
>
> Attachments: 4599-dataimport-fail.log, 4599-zookeer-fail.log, 
> CompressingTVF_ingest_rate.png, highlightNoStop.tasks, 
> Lucene40TVF_ingest_rate.png, LUCENE-4599.patch, LUCENE-4599.patch, 
> LUCENE-4599.patch, solr.patch
>
>
> We should have codec-compressed term vectors similarly to what we have with 
> stored fields.




[jira] [Resolved] (LUCENE-4705) FilteredQuery always uses default FilterStrategy if the filtered query is rewritten

2013-01-22 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-4705.
-

   Resolution: Fixed
 Assignee: Simon Willnauer
Lucene Fields: New,Patch Available  (was: New)

> FilteredQuery always uses default FilterStrategy if the filtered query is 
> rewritten
> ---
>
> Key: LUCENE-4705
> URL: https://issues.apache.org/jira/browse/LUCENE-4705
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.1
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 4.2, 5.0
>
> Attachments: LUCENE-4705.patch, LUCENE-4705.patch
>
>
> the rewrite method doesn't pass on the filterstrategy in FilteredQuery and we 
> don't have a test for it. grrr




[jira] [Commented] (LUCENE-4705) FilteredQuery always uses default FilterStrategy if the filtered query is rewritten

2013-01-22 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559589#comment-13559589
 ] 

Commit Tag Bot commented on LUCENE-4705:


[branch_4x commit] Simon Willnauer
http://svn.apache.org/viewvc?view=revision&revision=1436868

LUCENE-4705: Pass on FilterStrategy in FilteredQuery if the filtered query is 
rewritten


> FilteredQuery always uses default FilterStrategy if the filtered query is 
> rewritten
> ---
>
> Key: LUCENE-4705
> URL: https://issues.apache.org/jira/browse/LUCENE-4705
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.1
>Reporter: Simon Willnauer
> Fix For: 4.2, 5.0
>
> Attachments: LUCENE-4705.patch, LUCENE-4705.patch
>
>
> the rewrite method doesn't pass on the filterstrategy in FilteredQuery and we 
> don't have a test for it. grrr




[jira] [Updated] (LUCENE-4705) FilteredQuery always uses default FilterStrategy if the filtered query is rewritten

2013-01-22 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-4705:


Environment: (was: the rewrite method doesn't pass on the 
filterstrategy in FilteredQuery and we don't have a test for it. grrr)

> FilteredQuery always uses default FilterStrategy if the filtered query is 
> rewritten
> ---
>
> Key: LUCENE-4705
> URL: https://issues.apache.org/jira/browse/LUCENE-4705
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.1
>Reporter: Simon Willnauer
> Fix For: 4.2, 5.0
>
> Attachments: LUCENE-4705.patch, LUCENE-4705.patch
>
>





[jira] [Commented] (LUCENE-4705) FilteredQuery always uses default FilterStrategy if the filtered query is rewritten

2013-01-22 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559580#comment-13559580
 ] 

Commit Tag Bot commented on LUCENE-4705:


[trunk commit] Simon Willnauer
http://svn.apache.org/viewvc?view=revision&revision=1436859

LUCENE-4705: Pass on FilterStrategy in FilteredQuery if the filtered query is 
rewritten


> FilteredQuery always uses default FilterStrategy if the filtered query is 
> rewritten
> ---
>
> Key: LUCENE-4705
> URL: https://issues.apache.org/jira/browse/LUCENE-4705
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.1
> Environment: the rewrite method doesn't pass on the filterstrategy in 
> FilteredQuery and we don't have a test for it. grrr
>Reporter: Simon Willnauer
> Fix For: 4.2, 5.0
>
> Attachments: LUCENE-4705.patch, LUCENE-4705.patch
>
>





[jira] [Updated] (LUCENE-4705) FilteredQuery always uses default FilterStrategy if the filtered query is rewritten

2013-01-22 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-4705:


Description: the rewrite method doesn't pass on the filterstrategy in 
FilteredQuery and we don't have a test for it. grrr

> FilteredQuery always uses default FilterStrategy if the filtered query is 
> rewritten
> ---
>
> Key: LUCENE-4705
> URL: https://issues.apache.org/jira/browse/LUCENE-4705
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.1
>Reporter: Simon Willnauer
> Fix For: 4.2, 5.0
>
> Attachments: LUCENE-4705.patch, LUCENE-4705.patch
>
>
> the rewrite method doesn't pass on the filterstrategy in FilteredQuery and we 
> don't have a test for it. grrr




[jira] [Updated] (LUCENE-4705) FilteredQuery always uses default FilterStrategy if the filtered query is rewritten

2013-01-22 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-4705:


Attachment: LUCENE-4705.patch

missed the changes entry. 

> FilteredQuery always uses default FilterStrategy if the filtered query is 
> rewritten
> ---
>
> Key: LUCENE-4705
> URL: https://issues.apache.org/jira/browse/LUCENE-4705
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.1
> Environment: the rewrite method doesn't pass on the filterstrategy in 
> FilteredQuery and we don't have a test for it. grrr
>Reporter: Simon Willnauer
> Fix For: 4.2, 5.0
>
> Attachments: LUCENE-4705.patch, LUCENE-4705.patch
>
>





[jira] [Updated] (LUCENE-4705) FilteredQuery always uses default FilterStrategy if the filtered query is rewritten

2013-01-22 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-4705:


Attachment: LUCENE-4705.patch

here is a patch

> FilteredQuery always uses default FilterStrategy if the filtered query is 
> rewritten
> ---
>
> Key: LUCENE-4705
> URL: https://issues.apache.org/jira/browse/LUCENE-4705
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.1
> Environment: the rewrite method doesn't pass on the filterstrategy in 
> FilteredQuery and we don't have a test for it. grrr
>Reporter: Simon Willnauer
> Fix For: 4.2, 5.0
>
> Attachments: LUCENE-4705.patch
>
>





[jira] [Created] (LUCENE-4705) FilteredQuery always uses default FilterStrategy if the filtered query is rewritten

2013-01-22 Thread Simon Willnauer (JIRA)
Simon Willnauer created LUCENE-4705:
---

 Summary: FilteredQuery always uses default FilterStrategy if the 
filtered query is rewritten
 Key: LUCENE-4705
 URL: https://issues.apache.org/jira/browse/LUCENE-4705
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Affects Versions: 4.1
 Environment: the rewrite method doesn't pass on the filterstrategy in 
FilteredQuery and we don't have a test for it. grrr
Reporter: Simon Willnauer
 Fix For: 4.2, 5.0
 Attachments: LUCENE-4705.patch






[jira] [Commented] (SOLR-4330) group.sort is ignored when using truncate and ex/tag local params

2013-01-22 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559561#comment-13559561
 ] 

Commit Tag Bot commented on SOLR-4330:
--

[branch_4x commit] Koji Sekiguchi
http://svn.apache.org/viewvc?view=revision&revision=1436839

SOLR-4330: group.sort is ignored when using group.truncate and ex/tag local 
params together


> group.sort is ignored when using truncate and ex/tag local params
> -
>
> Key: SOLR-4330
> URL: https://issues.apache.org/jira/browse/SOLR-4330
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 3.6, 4.0, 4.1, 5.0
>Reporter: Koji Sekiguchi
>Assignee: Koji Sekiguchi
>Priority: Trivial
> Attachments: SOLR-4330.patch, SOLR-4330.patch
>
>
> In the parseParams method of SimpleFacets, group sort is not set after 
> creating the grouping object, so the member variable groupSort is always 
> null. As a result, an AbstractAllGroupHeadsCollector with the default sort 
> (new Sort()) is created every time.
> {code}
> public AbstractAllGroupHeadsCollector createAllGroupCollector() throws 
> IOException {
>   Sort sortWithinGroup = groupSort != null ? groupSort : new Sort();
>   return TermAllGroupHeadsCollector.create(groupBy, sortWithinGroup);
> }
> {code}




[jira] [Commented] (SOLR-4330) group.sort is ignored when using truncate and ex/tag local params

2013-01-22 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559550#comment-13559550
 ] 

Commit Tag Bot commented on SOLR-4330:
--

[trunk commit] Koji Sekiguchi
http://svn.apache.org/viewvc?view=revision&revision=1436837

SOLR-4330: group.sort is ignored when using group.truncate and ex/tag local 
params together


> group.sort is ignored when using truncate and ex/tag local params
> -
>
> Key: SOLR-4330
> URL: https://issues.apache.org/jira/browse/SOLR-4330
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 3.6, 4.0, 4.1, 5.0
>Reporter: Koji Sekiguchi
>Assignee: Koji Sekiguchi
>Priority: Trivial
> Attachments: SOLR-4330.patch, SOLR-4330.patch
>
>
> In the parseParams method of SimpleFacets, group sort is not set after 
> creating the grouping object, so the member variable groupSort is always 
> null. As a result, an AbstractAllGroupHeadsCollector with the default sort 
> (new Sort()) is created every time.
> {code}
> public AbstractAllGroupHeadsCollector createAllGroupCollector() throws 
> IOException {
>   Sort sortWithinGroup = groupSort != null ? groupSort : new Sort();
>   return TermAllGroupHeadsCollector.create(groupBy, sortWithinGroup);
> }
> {code}




[jira] [Commented] (LUCENE-4599) Compressed term vectors

2013-01-22 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559540#comment-13559540
 ] 

Markus Jelsma commented on LUCENE-4599:
---

Alright. Looking forward to that. Thanks Adrien!

> Compressed term vectors
> ---
>
> Key: LUCENE-4599
> URL: https://issues.apache.org/jira/browse/LUCENE-4599
> Project: Lucene - Core
>  Issue Type: Task
>  Components: core/codecs, core/termvectors
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Fix For: 4.2
>
> Attachments: 4599-dataimport-fail.log, 4599-zookeer-fail.log, 
> CompressingTVF_ingest_rate.png, highlightNoStop.tasks, 
> Lucene40TVF_ingest_rate.png, LUCENE-4599.patch, LUCENE-4599.patch, 
> LUCENE-4599.patch, solr.patch
>
>
> We should have codec-compressed term vectors similarly to what we have with 
> stored fields.




[jira] [Commented] (LUCENE-4599) Compressed term vectors

2013-01-22 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559530#comment-13559530
 ] 

Adrien Grand commented on LUCENE-4599:
--

Not yet. I'm leaving some time for the Jenkins instances to find bugs (for 
example, one of them found a little bug last night that Robert had to fix) and 
for people to criticize/fix/improve the format.

> Compressed term vectors
> ---
>
> Key: LUCENE-4599
> URL: https://issues.apache.org/jira/browse/LUCENE-4599
> Project: Lucene - Core
>  Issue Type: Task
>  Components: core/codecs, core/termvectors
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Fix For: 4.2
>
> Attachments: 4599-dataimport-fail.log, 4599-zookeer-fail.log, 
> CompressingTVF_ingest_rate.png, highlightNoStop.tasks, 
> Lucene40TVF_ingest_rate.png, LUCENE-4599.patch, LUCENE-4599.patch, 
> LUCENE-4599.patch, solr.patch
>
>
> We should have codec-compressed term vectors similarly to what we have with 
> stored fields.




[jira] [Commented] (LUCENE-4599) Compressed term vectors

2013-01-22 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559528#comment-13559528
 ] 

Markus Jelsma commented on LUCENE-4599:
---

Alright. Do you already have an issue filed for making it the default in trunk, 
so I can watch? We use recent Solr/Lucene trunk checkouts and are interested in 
how this affects things.

> Compressed term vectors
> ---
>
> Key: LUCENE-4599
> URL: https://issues.apache.org/jira/browse/LUCENE-4599
> Project: Lucene - Core
>  Issue Type: Task
>  Components: core/codecs, core/termvectors
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Fix For: 4.2
>
> Attachments: 4599-dataimport-fail.log, 4599-zookeer-fail.log, 
> CompressingTVF_ingest_rate.png, highlightNoStop.tasks, 
> Lucene40TVF_ingest_rate.png, LUCENE-4599.patch, LUCENE-4599.patch, 
> LUCENE-4599.patch, solr.patch
>
>
> We should have codec-compressed term vectors similarly to what we have with 
> stored fields.




[jira] [Commented] (LUCENE-4599) Compressed term vectors

2013-01-22 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559527#comment-13559527
 ] 

Adrien Grand commented on LUCENE-4599:
--

Unfortunately it is too late for Lucene 4.1, and this new format still requires a 
lot of testing, but I plan to propose making it the default term vectors format 
for Lucene 4.2. So yes, Lucene 4.2 might compress both stored fields and term 
vectors.





[jira] [Commented] (LUCENE-4599) Compressed term vectors

2013-01-22 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559524#comment-13559524
 ] 

Markus Jelsma commented on LUCENE-4599:
---

Great reduction! Is this going to be enabled in the default Lucene 4.1 codec 
that already compresses stored fields?





[jira] [Commented] (LUCENE-4043) Add scoring support for query time join

2013-01-22 Thread David vandendriessche (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559492#comment-13559492
 ] 

David vandendriessche commented on LUCENE-4043:
---

It was suggested that I extend the Query class and return a hash myself.

My query class contains:

@Override
public int hashCode() {
    return q.toString().hashCode();
}

It seems to work now.
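A minimal self-contained sketch of the pattern the comment describes: a wrapper that derives hashCode() from its wrapped query's string form, with a matching equals() so the Object contract holds. WrappedQuery and the field q are hypothetical stand-ins, not Lucene API; in real code the wrapped value would be a Lucene Query.

```java
import java.util.Objects;

// Hypothetical illustration (not Lucene API): equal queries must yield
// equal hash codes, or anything keyed on the query (e.g. a result cache)
// will miss or collide unexpectedly.
class WrappedQuery {
    // Stands in for the wrapped Lucene query's toString() representation.
    private final String q;

    WrappedQuery(String q) {
        this.q = q;
    }

    @Override
    public int hashCode() {
        // Same approach as the comment: hash the query's string form.
        return q.hashCode();
    }

    @Override
    public boolean equals(Object other) {
        // hashCode() and equals() must agree: both compare via q.
        if (this == other) return true;
        if (!(other instanceof WrappedQuery)) return false;
        return Objects.equals(q, ((WrappedQuery) other).q);
    }

    public static void main(String[] args) {
        WrappedQuery a = new WrappedQuery("title:lucene");
        WrappedQuery b = new WrappedQuery("title:lucene");
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode());
    }
}
```

Note that overriding only hashCode() without equals() (as in the snippet above) breaks the Object contract if the base class compares by identity.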

> Add scoring support for query time join
> ---
>
> Key: LUCENE-4043
> URL: https://issues.apache.org/jira/browse/LUCENE-4043
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/join
>Reporter: Martijn van Groningen
> Fix For: 4.0-ALPHA
>
> Attachments: LUCENE-4043.patch, LUCENE-4043.patch, LUCENE-4043.patch, 
> LUCENE-4043.patch
>
>
> Have similar scoring for query time joining just like the index time block 
> join (with the score mode).




[jira] [Commented] (LUCENE-4043) Add scoring support for query time join

2013-01-22 Thread David vandendriessche (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559480#comment-13559480
 ] 

David vandendriessche commented on LUCENE-4043:
---

Isn't it possible to pass the fromQuery hash through to the hash that the join 
query returns?





[jira] [Issue Comment Deleted] (LUCENE-4043) Add scoring support for query time join

2013-01-22 Thread David vandendriessche (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David vandendriessche updated LUCENE-4043:
--

Comment: was deleted

(was: Isn't it possible to pass the fromQueryhash to the hash that the join 
query returns?)

