ImportError: cannot import name Library while installing PyLucene

2013-10-11 Thread SangHee Kim
Hi everyone!

I have spent five hours trying to fix this problem, without success. While installing
PyLucene following http://lucene.apache.org/pylucene/install.html , I ran into
an error like the following.

sanghee-m:jcc sanghee$ python setup.py build
found JAVAFRAMEWORKS = /System/Library/Frameworks/JavaVM.framework
Traceback (most recent call last):
  File "setup.py", line 398, in <module>
    main('--debug' in sys.argv)
  File "setup.py", line 306, in main
    from setuptools import Library
ImportError: cannot import name Library
sanghee-m:jcc sanghee$ python setup.py build --debug
found JAVAFRAMEWORKS = /System/Library/Frameworks/JavaVM.framework
Traceback (most recent call last):
  File "setup.py", line 398, in <module>
    main('--debug' in sys.argv)
  File "setup.py", line 306, in main
    from setuptools import Library
ImportError: cannot import name Library
sanghee-m:jcc sanghee$

I can't find where Library is defined either. How can I solve this problem? Can you
let me know where I should look when I get this kind of error?

I use setuptools 1.1.6 and pylucene-4.4.0-1.

-- 
SangHee Kim
http://goo.gl/LnpDX


Re: [VOTE] Release PyLucene 4.5.0-1

2013-10-11 Thread Andi Vajda


On Fri, 11 Oct 2013, Steve Rowe wrote:

I really have no idea where to start looking to figure out what's 
happening - I'm not a big python user - any ideas?


Would it be useful to package up my make'd directory and send it to you?


I don't know yet. Do you know which version of setuptools you have
installed?

I'm currently battling an issue with the third-generation setuptools,
v 1.1.6.
If you don't have something like 0.6-something or 0.7 installed, please try 
that (for lack of any better ideas, sorry).


Andi..



Steve

On Oct 10, 2013, at 7:34 PM, Andi Vajda va...@apache.org wrote:


On Thu, 10 Oct 2013, Steve Rowe wrote:


Meant to send to the mailing lists:

Begin forwarded message:


From: Steve Rowe sar...@gmail.com
Subject: Re: [VOTE] Release PyLucene 4.5.0-1
Date: October 10, 2013 3:18:50 AM EDT
To: Andi Vajda va...@apache.org

Andi,

I thought I'd run 'make' and 'sudo make install' in two steps, so I checked, 
and bash 'history' agreed:

586  vi Makefile
587  make
588  sudo make install

I tried again, first rm -rf'ing the unpacked distribution, then unpacking 
(skipping the jcc 'make' and 'sudo make install' this time), editing the 
Makefile, then running 'make', then 'sudo make install', and I got the same 
error - I suppose this is the most salient line:


The problem is that I can't even reproduce the error.
You're not the first one to report it but it usually goes away :-(
Stuck.

Andi..



-
No local packages or download links found for lucene==4.5.0
-

Steve

On Oct 10, 2013, at 2:59 AM, Andi Vajda va...@apache.org wrote:

Hi Steve,

On Thu, 10 Oct 2013, Steve Rowe wrote:


After make'ing and installing jcc (no setup.py changes required); uncommenting 
the first Mac OS X 10.6 section in Makefile (I have OS X 10.8.5, with stock 
Python 2.7.2 and Oracle Java 1.7.0_25); and finally make'ing pylucene: 'sudo 
make install' fails - here's the tail end of the output:

-
writing build/bdist.macosx-10.8-x86_64/egg/EGG-INFO/native_libs.txt
creating dist
creating 'dist/lucene-4.5.0-py2.7-macosx-10.8-x86_64.egg' and adding 
'build/bdist.macosx-10.8-x86_64/egg' to it
removing 'build/bdist.macosx-10.8-x86_64/egg' (and everything under it)
Processing lucene-4.5.0-py2.7-macosx-10.8-x86_64.egg
creating 
/Library/Python/2.7/site-packages/lucene-4.5.0-py2.7-macosx-10.8-x86_64.egg
Extracting lucene-4.5.0-py2.7-macosx-10.8-x86_64.egg to 
/Library/Python/2.7/site-packages
Removing lucene 4.4.0 from easy-install.pth file
Adding lucene 4.5.0 to easy-install.pth file

Installed 
/Library/Python/2.7/site-packages/lucene-4.5.0-py2.7-macosx-10.8-x86_64.egg
Processing dependencies for lucene==4.5.0
Searching for lucene==4.5.0
Reading http://pypi.python.org/simple/lucene/
Couldn't find index page for 'lucene' (maybe misspelled?)
Scanning index of all packages (this may take a while)
Reading http://pypi.python.org/simple/
No local packages or download links found for lucene==4.5.0
error: Could not find suitable distribution for 
Requirement.parse('lucene==4.5.0')


This error has been a problem for a while.
You need to make, then make install, in two steps.
Otherwise, when running 'make install' in pylucene from a clean tree, this
error seems to happen. I don't know of a fix.

Andi..


make: *** [install] Error 1
-

I've included the entire 'sudo make install' output here: 
https://paste.apache.org/8gAF

Steve

On Oct 8, 2013, at 1:00 AM, Andi Vajda va...@apache.org wrote:



The PyLucene 4.5.0-1 release tracking the recent release of Apache Lucene 4.5.0 
is ready.

A release candidate is available from:
http://people.apache.org/~vajda/staging_area/

A list of changes in this release can be seen at:
http://svn.apache.org/repos/asf/lucene/pylucene/branches/pylucene_4_5/CHANGES

PyLucene 4.5.0 is built with JCC 2.17 included in these release artifacts:
http://svn.apache.org/repos/asf/lucene/pylucene/trunk/jcc/CHANGES

A list of Lucene Java changes can be seen at:
http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_5_0/lucene/CHANGES.txt

Please vote to release these artifacts as PyLucene 4.5.0-1.

Thanks !

Andi..

ps: the KEYS file for PyLucene release signing is at:
http://svn.apache.org/repos/asf/lucene/pylucene/dist/KEYS
http://people.apache.org/~vajda/staging_area/KEYS

pps: here is my +1


Re: [VOTE] Release PyLucene 4.5.0-1

2013-10-11 Thread Steve Rowe
Andi,

I really have no idea where to start looking to figure out what's happening - 
I'm not a big python user - any ideas?

Would it be useful to package up my make'd directory and send it to you?

Steve

On Oct 10, 2013, at 7:34 PM, Andi Vajda va...@apache.org wrote:

 On Thu, 10 Oct 2013, Steve Rowe wrote:
 
 Meant to send to the mailing lists:
 
 Begin forwarded message:
 
 From: Steve Rowe sar...@gmail.com
 Subject: Re: [VOTE] Release PyLucene 4.5.0-1
 Date: October 10, 2013 3:18:50 AM EDT
 To: Andi Vajda va...@apache.org
 
 Andi,
 
 I thought I'd run 'make' and 'sudo make install' in two steps, so I 
 checked, and bash 'history' agreed:
 
 586  vi Makefile
 587  make
 588  sudo make install
 
 I tried again, first rm -rf'ing the unpacked distribution, then unpacking, 
 (skipping the jcc 'make' and 'sudo make install' this time), editing the 
 Makefile, then running 'make', then 'sudo make install', and I got the same 
 error - I suppose this is the most salient line:
 
 The problem is that I can't even reproduce the error.
 You're not the first one to report it but it usually goes away :-(
 Stuck.
 
 Andi..
 
 
 -
 No local packages or download links found for lucene==4.5.0
 -
 
 Steve
 
 On Oct 10, 2013, at 2:59 AM, Andi Vajda va...@apache.org wrote:
 Hi Steve,
 
 On Thu, 10 Oct 2013, Steve Rowe wrote:
 
 After make'ing and installing jcc (no setup.py changes required); 
 uncommenting the first Mac OS X 10.6 section in Makefile (I have OS X 
 10.8.5, with stock Python 2.7.2 and Oracle Java 1.7.0_25); and finally 
 make'ing pylucene: 'sudo make install' fails - here's the tail end of the 
 output:
 
 -
 writing build/bdist.macosx-10.8-x86_64/egg/EGG-INFO/native_libs.txt
 creating dist
 creating 'dist/lucene-4.5.0-py2.7-macosx-10.8-x86_64.egg' and adding 
 'build/bdist.macosx-10.8-x86_64/egg' to it
 removing 'build/bdist.macosx-10.8-x86_64/egg' (and everything under it)
 Processing lucene-4.5.0-py2.7-macosx-10.8-x86_64.egg
 creating 
 /Library/Python/2.7/site-packages/lucene-4.5.0-py2.7-macosx-10.8-x86_64.egg
 Extracting lucene-4.5.0-py2.7-macosx-10.8-x86_64.egg to 
 /Library/Python/2.7/site-packages
 Removing lucene 4.4.0 from easy-install.pth file
 Adding lucene 4.5.0 to easy-install.pth file
 
 Installed 
 /Library/Python/2.7/site-packages/lucene-4.5.0-py2.7-macosx-10.8-x86_64.egg
 Processing dependencies for lucene==4.5.0
 Searching for lucene==4.5.0
 Reading http://pypi.python.org/simple/lucene/
 Couldn't find index page for 'lucene' (maybe misspelled?)
 Scanning index of all packages (this may take a while)
 Reading http://pypi.python.org/simple/
 No local packages or download links found for lucene==4.5.0
 error: Could not find suitable distribution for 
 Requirement.parse('lucene==4.5.0')
 
 This error has been a problem for a while.
 You need to make, then make install, in two steps.
 Otherwise, when running 'make install' in pylucene from a clean tree, this
 error seems to happen. I don't know of a fix.
 
 Andi..
 
 make: *** [install] Error 1
 -
 
 I've included the entire 'sudo make install' output here: 
 https://paste.apache.org/8gAF
 
 Steve
 
 On Oct 8, 2013, at 1:00 AM, Andi Vajda va...@apache.org wrote:
 
 
 The PyLucene 4.5.0-1 release tracking the recent release of Apache 
 Lucene 4.5.0 is ready.
 
 A release candidate is available from:
 http://people.apache.org/~vajda/staging_area/
 
 A list of changes in this release can be seen at:
 http://svn.apache.org/repos/asf/lucene/pylucene/branches/pylucene_4_5/CHANGES
 
 PyLucene 4.5.0 is built with JCC 2.17 included in these release 
 artifacts:
 http://svn.apache.org/repos/asf/lucene/pylucene/trunk/jcc/CHANGES
 
 A list of Lucene Java changes can be seen at:
 http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_5_0/lucene/CHANGES.txt
 
 Please vote to release these artifacts as PyLucene 4.5.0-1.
 
 Thanks !
 
 Andi..
 
 ps: the KEYS file for PyLucene release signing is at:
 http://svn.apache.org/repos/asf/lucene/pylucene/dist/KEYS
 http://people.apache.org/~vajda/staging_area/KEYS
 
 pps: here is my +1
 
 
 
 
 



Re: ImportError: cannot import name Library while installing PyLucene

2013-10-11 Thread Andi Vajda


On Fri, 11 Oct 2013, SangHee Kim wrote:


Hi everyone!

I have spent five hours trying to fix this problem, without success. While installing
PyLucene following http://lucene.apache.org/pylucene/install.html , I ran into
an error like the following.

sanghee-m:jcc sanghee$ python setup.py build
found JAVAFRAMEWORKS = /System/Library/Frameworks/JavaVM.framework
Traceback (most recent call last):
  File "setup.py", line 398, in <module>
    main('--debug' in sys.argv)
  File "setup.py", line 306, in main
    from setuptools import Library
ImportError: cannot import name Library


Indeed, there are two problems with setuptools 1.1.6, apparently:
  1. the Library class is only accessible via setuptools.extension
  2. the logic in setuptools.command.build_ext patching in darwin-specific
 options for building a shared library into _CONFIG_VARS is broken

I added code to JCC's setup.py to work around both issues.
This is checked into rev 1531420 in pylucene's trunk.
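A minimal sketch of the kind of fallback import this workaround implies (the actual code added to JCC's setup.py in rev 1531420 may differ):

```python
# Hedged sketch: setuptools 1.1.6 only exposes the Library class via
# setuptools.extension, so fall back to that import path when the
# top-level import fails.
try:
    from setuptools import Library          # older setuptools releases
except ImportError:
    from setuptools.extension import Library  # setuptools 1.1.6 location
```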

Please refresh your copy of JCC from pylucene's trunk (still called 2.17), 
rebuild and reinstall it, and try your pylucene 4.4.0 build again.

Please, let me know if this solves the problem for you as well.

Andi..


[jira] [Commented] (LUCENE-5269) TestRandomChains failure

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792375#comment-13792375
 ] 

ASF subversion and git services commented on LUCENE-5269:
-

Commit 1531202 from [~rcmuir] in branch 'dev/branches/lucene_solr_4_5'
[ https://svn.apache.org/r1531202 ]

LUCENE-5269: Fix NGramTokenFilter length filtering

 TestRandomChains failure
 

 Key: LUCENE-5269
 URL: https://issues.apache.org/jira/browse/LUCENE-5269
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Fix For: 4.5.1, 4.6, 5.0

 Attachments: LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269.patch, 
 LUCENE-5269_test.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch


 One of EdgeNGramTokenizer, ShingleFilter, NGramTokenFilter is buggy, or 
 possibly only the combination of them conspiring together.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-5269) TestRandomChains failure

2013-10-11 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-5269.
-

Resolution: Fixed

 TestRandomChains failure
 

 Key: LUCENE-5269
 URL: https://issues.apache.org/jira/browse/LUCENE-5269
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Fix For: 4.5.1, 4.6, 5.0

 Attachments: LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269.patch, 
 LUCENE-5269_test.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch


 One of EdgeNGramTokenizer, ShingleFilter, NGramTokenFilter is buggy, or 
 possibly only the combination of them conspiring together.






[jira] [Commented] (LUCENE-5277) Modify FixedBitSet copy constructor to take numBits to allow grow/shrink the new bitset

2013-10-11 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792396#comment-13792396
 ] 

Uwe Schindler commented on LUCENE-5277:
---

Is there any issue that will use the new ctor? As the current ctor is unused, 
why not simply remove it and leave adding the new one to an issue that really 
needs it?

 Modify FixedBitSet copy constructor to take numBits to allow grow/shrink the 
 new bitset
 ---

 Key: LUCENE-5277
 URL: https://issues.apache.org/jira/browse/LUCENE-5277
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Attachments: LUCENE-5277.patch


 FixedBitSet copy constructor is redundant the way it is now -- one can call 
 FBS.clone() to achieve that (and indeed, no code in Lucene calls this ctor). 
 I think it will be useful to add a numBits parameter to that method to allow 
 growing/shrinking the new bitset, while copying all relevant bits from the 
 passed one.
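 The proposed semantics could be sketched as follows (hypothetical Python, using a plain list of booleans in place of FixedBitSet's packed long words):

```python
def copy_resize(bits, num_bits):
    """Hypothetical sketch: copy all bits that fit from `bits` into a
    new bitset of num_bits bits; any extra bits start out cleared."""
    new = [False] * num_bits
    for i, b in enumerate(bits[:num_bits]):
        new[i] = b
    return new

# Growing keeps all source bits; shrinking truncates them.
grown = copy_resize([True, False, True], 5)
shrunk = copy_resize([True, False, True], 2)
```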






[jira] [Commented] (LUCENE-5269) TestRandomChains failure

2013-10-11 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792402#comment-13792402
 ] 

Uwe Schindler commented on LUCENE-5269:
---

This is so crazy! Why did we never hit this combination before?

Thanks for fixing, although I don't really see the CodePointLengthFilter as a 
bug fix; it is more of a new feature! Maybe explicitly add it as a new feature 
to CHANGES.txt?

 TestRandomChains failure
 

 Key: LUCENE-5269
 URL: https://issues.apache.org/jira/browse/LUCENE-5269
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Fix For: 4.5.1, 4.6, 5.0

 Attachments: LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269.patch, 
 LUCENE-5269_test.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch


 One of EdgeNGramTokenizer, ShingleFilter, NGramTokenFilter is buggy, or 
 possibly only the combination of them conspiring together.






[jira] [Commented] (LUCENE-5269) TestRandomChains failure

2013-10-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792424#comment-13792424
 ] 

Robert Muir commented on LUCENE-5269:
-

I didn't want new features mixed with bugfixes, really :(

But in my opinion this was the simplest way to solve the problem: to just add a 
filter like this and have NGramTokenFilter use it instead of LengthFilter.

I think it would be weird to see new features in a 4.5.1?

 TestRandomChains failure
 

 Key: LUCENE-5269
 URL: https://issues.apache.org/jira/browse/LUCENE-5269
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Fix For: 4.5.1, 4.6, 5.0

 Attachments: LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269.patch, 
 LUCENE-5269_test.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch


 One of EdgeNGramTokenizer, ShingleFilter, NGramTokenFilter is buggy, or 
 possibly only the combination of them conspiring together.






[jira] [Commented] (LUCENE-5269) TestRandomChains failure

2013-10-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792429#comment-13792429
 ] 

Robert Muir commented on LUCENE-5269:
-

{quote}
This is so crazy! Why did we never hit this combination before?
{quote}

This combination is especially good at finding the bug, here's why:
{code}
Tokenizer tokenizer = new EdgeNGramTokenizer(TEST_VERSION_CURRENT, reader, 2, 94);
TokenStream stream = new ShingleFilter(tokenizer, 5);
stream = new NGramTokenFilter(TEST_VERSION_CURRENT, stream, 55, 83);
{code}

The edge n-gram tokenizer has min=2, max=94; it's basically brute-forcing every 
token size. Then the shingles make tons of tokens with positionIncrement=0. 
That makes it easy for the previously buggy NGramTokenFilter, whose length 
filter expected code points, to misclassify tokens and emit an initial token 
with posInc=0:

{code}
if ((curPos + curGramSize) <= curCodePointCount) {
...
  posIncAtt.setPositionIncrement(curPosInc);
{code}

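For illustration (not from the issue itself), the char-count versus code-point-count mismatch behind the bug is easy to demonstrate: a supplementary character occupies two UTF-16 code units but only one code point, so a filter counting UTF-16 units disagrees with one counting code points.

```python
# Illustrative only: 'a' + MUSICAL SYMBOL G CLEF (U+1D11E, outside the
# BMP, encoded as a surrogate pair in UTF-16) + 'b'.
token = "a\U0001D11Eb"

code_points = len(token)                           # Python counts code points
utf16_units = len(token.encode("utf-16-le")) // 2  # analogue of Java's String.length()
```

Here `code_points` is 3 while `utf16_units` is 4, which is exactly the kind of disagreement a length filter must be careful about.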

 TestRandomChains failure
 

 Key: LUCENE-5269
 URL: https://issues.apache.org/jira/browse/LUCENE-5269
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Fix For: 4.5.1, 4.6, 5.0

 Attachments: LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269.patch, 
 LUCENE-5269_test.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch


 One of EdgeNGramTokenizer, ShingleFilter, NGramTokenFilter is buggy, or 
 possibly only the combination of them conspiring together.






[jira] [Updated] (SOLR-5338) Split shards by a route key

2013-10-11 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-5338:


Description: 
Provide a way to split a shard using a route key such that all documents of the 
specified route key end up in a single dedicated sub-shard.

Example:
Assume that collection1, shard1 has hash range [0, 20]. Also that route key 
'A!' has hash range [12,15]. Then invoking:
{code}
/admin/collections?action=SPLIT&collection=collection1&split.key=A!
{code}
should produce three sub-shards with hash range [0,11], [12,15] and [16,20].

Specifying the source shard is not required here because the route key is 
enough to figure it out. Route keys spanning more than one shard will not be 
supported.

Note that the sub-shard with the hash range of the route key may also contain 
documents for other route keys whose hashes collide.
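The range arithmetic above can be sketched in a few lines (hypothetical helper, not Solr's actual CompositeIdRouter code):

```python
def partition_range_by_key(shard_range, key_range):
    """Hypothetical sketch: split shard_range into up to three contiguous
    sub-ranges so that key_range gets a dedicated sub-range."""
    lo, hi = shard_range
    klo, khi = key_range
    parts = []
    if lo < klo:
        parts.append((lo, klo - 1))   # hashes below the route key's range
    parts.append((klo, khi))          # the route key's dedicated range
    if khi < hi:
        parts.append((khi + 1, hi))   # hashes above the route key's range
    return parts

# The example from the description: [0, 20] split around [12, 15]
# -> [(0, 11), (12, 15), (16, 20)]
sub_shards = partition_range_by_key((0, 20), (12, 15))
```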



  was:
Provide a way to split a shard using a route key such that all documents of the 
specified route key end up in a single dedicated sub-shard.

Example:
Assume that collection1, shard1 has hash range [0, 20]. Also that route key 
'A!' has hash range [12,15]. Then invoking:
{code}
/admin/collections?action=SPLIT&collection=collection1&split.key=A!
{code}
should produce three sub-shards with hash range [0,11], [12,15] and [16,20]. 
Then the sub-shard dedicated to documents for route key 'A!' can be scaled 
separately.

Specifying the source shard is not required here because the route key is 
enough to figure it out.




 Split shards by a route key
 ---

 Key: SOLR-5338
 URL: https://issues.apache.org/jira/browse/SOLR-5338
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
 Fix For: 4.6, 5.0


 Provide a way to split a shard using a route key such that all documents of 
 the specified route key end up in a single dedicated sub-shard.
 Example:
 Assume that collection1, shard1 has hash range [0, 20]. Also that route key 
 'A!' has hash range [12,15]. Then invoking:
 {code}
 /admin/collections?action=SPLIT&collection=collection1&split.key=A!
 {code}
 should produce three sub-shards with hash range [0,11], [12,15] and [16,20].
 Specifying the source shard is not required here because the route key is 
 enough to figure it out. Route keys spanning more than one shard will not be 
 supported.
 Note that the sub-shard with the hash range of the route key may also contain 
 documents for other route keys whose hashes collide.






[jira] [Updated] (SOLR-5310) Add a collection admin command to remove a replica

2013-10-11 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-5310:
-

Attachment: SOLR-5310.patch

 Add a collection admin command to remove a replica
 --

 Key: SOLR-5310
 URL: https://issues.apache.org/jira/browse/SOLR-5310
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul
 Attachments: SOLR-5310.patch

   Original Estimate: 72h
  Remaining Estimate: 72h

 The only way a replica can be removed is by unloading the core. There is no 
 way to remove a replica that is down, so the clusterstate will have 
 unreferenced nodes if a few nodes go down over time.
 We need a cluster admin command to clean that up, e.g.: 
 /admin/collections?action=DELETEREPLICA&collection=coll1&shard=shard1&replica=core_node3
 The system would first see if the replica is active. If yes, a core UNLOAD 
 command is fired, which takes care of deleting the replica from the 
 clusterstate as well.
 If the state is inactive, then the core or node may be down; in that case 
 the entry is removed from the cluster state directly.
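 The decision logic described above could be sketched as (hypothetical names, not the actual Solr implementation):

```python
def delete_replica(clusterstate, collection, shard, replica, unload_core):
    """Hypothetical sketch of the DELETEREPLICA flow described above."""
    state = clusterstate[collection][shard].get(replica)
    if state == "active":
        # Active replica: fire a core UNLOAD, which also takes care of
        # removing the replica from the clusterstate.
        unload_core(replica)
    # Down/inactive replica (or after UNLOAD): drop the stale entry.
    clusterstate[collection][shard].pop(replica, None)
```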






[jira] [Updated] (LUCENE-5260) Make older Suggesters more accepting of TermFreqPayloadIterator

2013-10-11 Thread Areek Zillur (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Areek Zillur updated LUCENE-5260:
-

Attachment: LUCENE-5260.patch

Uploaded patch:
  - changed the input of lookup.build to take TermFreqPayloadIterator instead 
of TermFreqIterator
  - made all suggesters compatible with TermFreqPayloadIterator (but error if 
a payload is present and cannot be used)
  - nuked all implementations of TermFreq and made them work with 
TermFreqPayload instead (except for SortedTermFreqIteratorWrapper)
  - got rid of all the references to termFreqIter

Still todo:
  - actually nuke TermFreqIterator
  - change the names of the implementations to reflect that they are 
implementations of TermFreqPayloadIter
  - add tests to ensure that all the implementations work with payload
  - support payloads in SortedTermFreqIteratorWrapper
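The "error if a payload is present but cannot be used" behavior might look like this (hypothetical Python sketch, not the actual Lucene code):

```python
class LegacySuggester:
    """Hypothetical sketch: a suggester that accepts a
    (term, freq, payload) iterator but rejects payloads it cannot use."""
    supports_payloads = False

    def __init__(self):
        self.entries = {}

    def build(self, term_freq_payload_iter):
        for term, freq, payload in term_freq_payload_iter:
            if payload is not None and not self.supports_payloads:
                raise ValueError("payloads are not supported by this suggester")
            self.entries[term] = freq
```

This keeps a single iterator interface for all suggesters while failing loudly instead of silently dropping payloads.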

 Make older Suggesters more accepting of TermFreqPayloadIterator
 ---

 Key: LUCENE-5260
 URL: https://issues.apache.org/jira/browse/LUCENE-5260
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Reporter: Areek Zillur
 Attachments: LUCENE-5260.patch


 As discussed in https://issues.apache.org/jira/browse/LUCENE-5251, it would 
 be nice to make the older suggesters accepting of TermFreqPayloadIterator and 
 throw an exception if payload is found (if it cannot be used). 
 This will also allow us to nuke most of the other interfaces for 
 BytesRefIterator. 






[jira] [Commented] (LUCENE-5269) TestRandomChains failure

2013-10-11 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792461#comment-13792461
 ] 

Uwe Schindler commented on LUCENE-5269:
---

bq. I didn't want new features mixed with bugfixes really 

I agree! But now we have the new feature, so I just asked to add it as a 
separate entry in CHANGES.txt under New Features; just the new filter, nothing 
more.

 TestRandomChains failure
 

 Key: LUCENE-5269
 URL: https://issues.apache.org/jira/browse/LUCENE-5269
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Fix For: 4.5.1, 4.6, 5.0

 Attachments: LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269.patch, 
 LUCENE-5269_test.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch


 One of EdgeNGramTokenizer, ShingleFilter, NGramTokenFilter is buggy, or 
 possibly only the combination of them conspiring together.






[jira] [Commented] (SOLR-5290) Warming up using search logs.

2013-10-11 Thread Minoru Osuka (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792471#comment-13792471
 ] 

Minoru Osuka commented on SOLR-5290:


The patch includes test code.

 Warming up using search logs.
 -

 Key: SOLR-5290
 URL: https://issues.apache.org/jira/browse/SOLR-5290
 Project: Solr
  Issue Type: Wish
  Components: search
Affects Versions: 4.4
Reporter: Minoru Osuka
Priority: Minor
 Attachments: SOLR-5290.patch


 It is possible to warm up the cache automatically in the newSearcher event, 
 but it is impossible to do so in the firstSearcher event because there is no 
 old searcher.
 We describe queries in solrconfig.xml if we need caching in the firstSearcher 
 event, like this:
 {code:xml}
 <listener event="firstSearcher" class="solr.QuerySenderListener">
   <arr name="queries">
     <lst>
       <str name="q">static firstSearcher warming in solrconfig.xml</str>
     </lst>
   </arr>
 </listener>
 {code}
 This setting is very static. I want to issue queries dynamically in the 
 firstSearcher event when Solr restarts, so I paid attention to past search 
 logs. If there are past search logs, it is possible to warm up the cache 
 automatically in the firstSearcher event, like the autowarming of the cache 
 in the newSearcher event.
 I have created a QueryLogSenderListener which extends QuerySenderListener.
 Sample definition in solrconfig.xml:
  - directory : the Solr log directory. (Required)
  - regex : the regular expression used to parse log lines. (Required)
  - encoding : the Solr log encoding. (Default: UTF-8)
  - count : the number of log entries to process. (Default: 100)
  - paths : the request handler names to process.
  - exclude_params : request parameters to exclude.
 {code:xml}
 !-- Warming up using search logs.
   --
 listener event=firstSearcher class=solr.QueryLogSenderListener
   arr name=queries
 lst
   str name=qstatic firstSearcher warming in solrconfig.xml/str
 /lst
   /arr
   str name=directorylogs/str
   str name=encodingUTF-8/str
   str 
 name=regex![CDATA[^(?level[\w]+)\s+\-\s+(?timestamp[\d\-\s\.:]+);\s+(?class[\w\.\_\$]+);\s+\[(?core.+)\]\s+webapp=(?webapp.+)\s+path=(?path.+)\s+params=\{(?params.*)\}\s+hits=(?hits\d+)\s+status=(?status\d+)\s+QTime=(?qtime\d+).*]]/str
   arr name=paths
 str/select/str
   /arr
   int name=count100/int
   arr name=exclude_params
 strindent/str
 str_/str
   /arr
 /listener
 {code}
 I'd like to propose this feature.
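 A Python analogue of the log-parsing step above may make the idea concrete. Java's named groups `(?<name>...)` become `(?P<name>...)` in Python; the field names follow the proposal, and the sample log line is invented for illustration:

```python
import re

# Hypothetical Python analogue of the proposal's log regex; the sample
# line below is invented, not taken from a real Solr log.
LOG_RE = re.compile(
    r"^(?P<level>\w+)\s+-\s+(?P<timestamp>[\d\-\s\.:]+);\s+(?P<class>[\w\.\_\$]+);"
    r"\s+\[(?P<core>.+)\]\s+webapp=(?P<webapp>\S+)\s+path=(?P<path>\S+)"
    r"\s+params=\{(?P<params>.*)\}\s+hits=(?P<hits>\d+)\s+status=(?P<status>\d+)"
    r"\s+QTime=(?P<qtime>\d+)"
)

line = ("INFO - 2013-10-11 12:00:00.123; org.apache.solr.core.SolrCore; "
        "[collection1] webapp=/solr path=/select params={q=*:*} "
        "hits=42 status=0 QTime=3")
m = LOG_RE.match(line)
# m.group("path") and m.group("params") give what the listener would replay
```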






[jira] [Created] (SOLR-5339) solr-core-4.4's ip is not right when the os is centos 5.6 sometimes

2013-10-11 Thread dejie Chang (JIRA)
dejie Chang created SOLR-5339:
-

 Summary: solr-core-4.4's ip is not right when the os is centos 5.6 
sometimes 
 Key: SOLR-5339
 URL: https://issues.apache.org/jira/browse/SOLR-5339
 Project: Solr
  Issue Type: Bug
  Components: contrib - Clustering
Affects Versions: 4.4
 Environment: centos 5.6
Reporter: dejie Chang
Priority: Critical


when I install the solr-cloud on the centos5.6 . t






[jira] [Updated] (SOLR-5338) Split shards by a route key

2013-10-11 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-5338:


Attachment: SOLR-5338.patch

Changes:
* Introduces two new methods in CompositeIdRouter 
{code}
public List<Range> partitionRangeByKey(String key, Range range)
{code}
and
{code}
public Range routeKeyHashRange(String routeKey)
{code}
* The collection split action accepts a new parameter 'split.key'
* The parent slice is found and its range is partitioned according to split.key
* We re-use the logic introduced in SOLR-5300 to do the actual splitting. 

 Split shards by a route key
 ---

 Key: SOLR-5338
 URL: https://issues.apache.org/jira/browse/SOLR-5338
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
 Fix For: 4.6, 5.0

 Attachments: SOLR-5338.patch


 Provide a way to split a shard using a route key such that all documents of 
 the specified route key end up in a single dedicated sub-shard.
 Example:
 Assume that collection1, shard1 has hash range [0, 20]. Also that route key 
 'A!' has hash range [12,15]. Then invoking:
 {code}
 /admin/collections?action=SPLIT&collection=collection1&split.key=A!
 {code}
 should produce three sub-shards with hash range [0,11], [12,15] and [16,20].
 Specifying the source shard is not required here because the route key is 
 enough to figure it out. Route keys spanning more than one shard will not be 
 supported.
 Note that the sub-shard with the hash range of the route key may also contain 
 documents for other route keys whose hashes collide.






[jira] [Updated] (SOLR-5339) solr-core-4.4's ip is not right when the os is centos 5.6 sometimes

2013-10-11 Thread dejie Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dejie Chang updated SOLR-5339:
--

Description: When I install SolrCloud on CentOS 5.6, it is strange that 
sometimes the IP displayed on http://192.168.10.54:8081/solr/#/~cloud is not 
correct: it shows 202.106.199.36, but my actual IP is 192.168.10.54. On 
Windows it is correct. I found it is caused by hostaddress = 
InetAddress.getLocalHost().getHostAddress(); in ZkController.java. Sometimes 
this method does not return the correct IP and should not be trusted, so I 
think on Linux we should not use this method.  (was: when I install the 
solr-cloud on the centos5.6 . t)

 solr-core-4.4's ip is not right when the os is centos 5.6 sometimes 
 

 Key: SOLR-5339
 URL: https://issues.apache.org/jira/browse/SOLR-5339
 Project: Solr
  Issue Type: Bug
  Components: contrib - Clustering
Affects Versions: 4.4
 Environment: centos 5.6
Reporter: dejie Chang
Priority: Critical

 When I install solr-cloud on CentOS 5.6, it is strange that sometimes the IP 
 displayed on http://192.168.10.54:8081/solr/#/~cloud is not correct: it shows 
 202.106.199.36, but my actual IP is 192.168.10.54. On Windows it is right. I 
 found it is because of 
 hostaddress = InetAddress.getLocalHost().getHostAddress(); in ZkController.java. 
 Sometimes that method does not return the correct IP and should not be 
 trusted, so I think on Linux we should not use this method. 
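A common workaround for the behaviour described is to scan the network interfaces for a site-local address instead of trusting the hostname lookup (a minimal sketch, not ZkController's actual code; the helper name is invented):

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.Collections;

public class HostAddressSketch {
    // InetAddress.getLocalHost() resolves the machine's hostname, so a stale
    // /etc/hosts entry or an external DNS answer can yield an address that is
    // not bound to any local interface -- the symptom in this issue.
    // Enumerating the interfaces avoids the name lookup entirely.
    static String siteLocalAddress() {
        try {
            for (NetworkInterface nic : Collections.list(NetworkInterface.getNetworkInterfaces())) {
                for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                    if (addr.isSiteLocalAddress()) {
                        return addr.getHostAddress();   // e.g. a 192.168.x.x address
                    }
                }
            }
        } catch (SocketException ignored) {
            // fall through to the hostname-based lookup below
        }
        try {
            return InetAddress.getLocalHost().getHostAddress();
        } catch (UnknownHostException e) {
            return "127.0.0.1";
        }
    }

    public static void main(String[] args) {
        System.out.println(siteLocalAddress());
    }
}
```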






[jira] [Commented] (SOLR-5320) Multi level compositeId router

2013-10-11 Thread Anshum Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792478#comment-13792478
 ] 

Anshum Gupta commented on SOLR-5320:


A 3 level composite id routing to begin with is what I think would be good.
I'd use 8 bits each from the first 2 components of the key and 16 bits from the 
last component.
Functionally, this should work on similar lines as the current 2-level 
composite id routing.
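As a toy illustration of the 8/8/16 bit allocation proposed above (this is not Solr's CompositeIdRouter, which uses MurmurHash3; String.hashCode() stands in for the real hash function):

```java
public class CompositeHashSketch {
    // Compose a 32-bit routing hash from a three-part key: the top 8 bits
    // come from component 1, the next 8 from component 2, and the low 16
    // from component 3, so documents sharing a key prefix land in a
    // contiguous slice of the hash ring.
    static int compose(String c1, String c2, String c3) {
        int h1 = c1.hashCode();           // stand-in for the real hash
        int h2 = c2.hashCode();
        int h3 = c3.hashCode();
        return (h1 & 0xFF000000)          // bits 31-24: component 1
             | ((h2 >>> 8) & 0x00FF0000)  // bits 23-16: component 2
             | (h3 >>> 16);               // bits 15-0:  component 3
    }
}
```

With this layout, a shard key like myapp!dummyuser! translates into a range query over the 16-bit prefix shared by all of that user's documents, and myapp! into the 8-bit prefix shared by all users of the app.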

 Multi level compositeId router
 --

 Key: SOLR-5320
 URL: https://issues.apache.org/jira/browse/SOLR-5320
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud
Reporter: Anshum Gupta
   Original Estimate: 336h
  Remaining Estimate: 336h

 This would enable multi level routing as compared to the 2 level routing 
 available as of now. On the usage bit, here's an example:
 Document Id: myapp!dummyuser!doc
 myapp!dummyuser! can be used as the shardkey for searching content for 
 dummyuser.
 myapp! can be used for searching across all users of myapp.
 I am looking at either a 3 (or 4) level routing. The 32 bit hash would then 
 be composed of 8X4 components from each part (in the case of 4 levels).






[jira] [Updated] (SOLR-5308) Split all documents of a route key into another collection

2013-10-11 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-5308:


Attachment: (was: SOLR-5308.patch)

 Split all documents of a route key into another collection
 --

 Key: SOLR-5308
 URL: https://issues.apache.org/jira/browse/SOLR-5308
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
 Fix For: 4.6, 5.0


 Enable SolrCloud users to split out a set of documents from a source 
 collection into another collection.
 This will be useful in multi-tenant environments. This feature will make it 
 possible to split a tenant out of a collection and put them into their own 
 collection which can be scaled separately.






[JENKINS] Lucene-Solr-NightlyTests-trunk - Build # 407 - Failure

2013-10-11 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/407/

1 tests failed.
REGRESSION:  org.apache.lucene.index.Test2BPostings.test

Error Message:
Java heap space

Stack Trace:
java.lang.OutOfMemoryError: Java heap space
at 
__randomizedtesting.SeedInfo.seed([D8D3920C725BF71C:5087ADD6DCA79AE4]:0)
at 
org.apache.lucene.store.BufferedIndexOutput.<init>(BufferedIndexOutput.java:50)
at 
org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:365)
at 
org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:280)
at 
org.apache.lucene.store.NRTCachingDirectory.createOutput(NRTCachingDirectory.java:206)
at 
org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:478)
at 
org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:44)
at 
org.apache.lucene.store.CompoundFileWriter.close(CompoundFileWriter.java:149)
at 
org.apache.lucene.store.CompoundFileDirectory.close(CompoundFileDirectory.java:171)
at 
org.apache.lucene.util.IOUtils.closeWhileHandlingException(IOUtils.java:80)
at 
org.apache.lucene.index.IndexWriter.createCompoundFile(IndexWriter.java:4408)
at 
org.apache.lucene.index.DocumentsWriterPerThread.sealFlushedSegment(DocumentsWriterPerThread.java:535)
at 
org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:502)
at 
org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:506)
at 
org.apache.lucene.index.DocumentsWriter.postUpdate(DocumentsWriter.java:378)
at 
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:470)
at 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1523)
at 
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1193)
at 
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1174)
at org.apache.lucene.index.Test2BPostings.test(Test2BPostings.java:76)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)




Build Log:
[...truncated 655 lines...]
   [junit4] Suite: org.apache.lucene.index.Test2BPostings
   [junit4]   2 NOTE: download the large Jenkins line-docs file by running 
'ant get-jenkins-line-docs' in the lucene directory.
   [junit4]   2 NOTE: reproduce with: ant test  -Dtestcase=Test2BPostings 
-Dtests.method=test -Dtests.seed=D8D3920C725BF71C -Dtests.multiplier=2 
-Dtests.nightly=true -Dtests.slow=true 
-Dtests.linedocsfile=/home/hudson/lucene-data/enwiki.random.lines.txt 
-Dtests.locale=en_IN -Dtests.timezone=America/Puerto_Rico 
-Dtests.file.encoding=US-ASCII
   [junit4] ERROR   408s J0 | Test2BPostings.test <<<
   [junit4] Throwable #1: java.lang.OutOfMemoryError: Java heap space
   [junit4]at 
__randomizedtesting.SeedInfo.seed([D8D3920C725BF71C:5087ADD6DCA79AE4]:0)
   [junit4]at 
org.apache.lucene.store.BufferedIndexOutput.<init>(BufferedIndexOutput.java:50)
   [junit4]at 
org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:365)
   [junit4]at 
org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:280)
   [junit4]at 
org.apache.lucene.store.NRTCachingDirectory.createOutput(NRTCachingDirectory.java:206)
   [junit4]at 
org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:478)
   [junit4]at 
org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:44)
   [junit4]at 
org.apache.lucene.store.CompoundFileWriter.close(CompoundFileWriter.java:149)
   [junit4]at 

[jira] [Updated] (SOLR-5310) Add a collection admin command to remove a replica

2013-10-11 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-5310:
-

Attachment: SOLR-5310-1.patch

The testcases still fail occasionally

 Add a collection admin command to remove a replica
 --

 Key: SOLR-5310
 URL: https://issues.apache.org/jira/browse/SOLR-5310
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul
 Attachments: SOLR-5310-1.patch, SOLR-5310.patch

   Original Estimate: 72h
  Remaining Estimate: 72h

 The only way a replica can be removed is by unloading the core. There is no 
 way to remove a replica that is down, so the clusterstate will have 
 unreferenced nodes if a few nodes go down over time.
 We need a cluster admin command to clean that up, e.g.:
 /admin/collections?action=DELETEREPLICA&collection=coll1&shard=shard1&replica=core_node3
 The system would first see if the replica is active. If yes, a core UNLOAD 
 command is fired, which would take care of deleting the replica from the 
 clusterstate as well.
 If the state is inactive, then the core or node may be down; in that case 
 the entry is removed from cluster state. 
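The decision flow in the description can be sketched as follows (invented names, not the actual Overseer code; replica state is modeled as a plain string map):

```java
import java.util.Map;

public class DeleteReplicaSketch {
    // An active replica is removed by firing a core UNLOAD against it, which
    // also deletes its clusterstate entry; a replica that is down (or absent)
    // can only be scrubbed out of the clusterstate directly.
    static String deleteReplica(String collection, String shard, String replica,
                                Map<String, String> stateByReplica) {
        String state = stateByReplica.getOrDefault(replica, "down");
        if ("active".equals(state)) {
            return "UNLOAD " + replica;
        }
        return "remove " + collection + "/" + shard + "/" + replica + " from clusterstate";
    }
}
```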






[jira] [Updated] (SOLR-5310) Add a collection admin command to remove a replica

2013-10-11 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-5310:
-

Attachment: (was: SOLR-5310-1.patch)

 Add a collection admin command to remove a replica
 --

 Key: SOLR-5310
 URL: https://issues.apache.org/jira/browse/SOLR-5310
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul
 Attachments: SOLR-5310.patch, SOLR-5310.patch

   Original Estimate: 72h
  Remaining Estimate: 72h

 The only way a replica can be removed is by unloading the core. There is no 
 way to remove a replica that is down, so the clusterstate will have 
 unreferenced nodes if a few nodes go down over time.
 We need a cluster admin command to clean that up, e.g.:
 /admin/collections?action=DELETEREPLICA&collection=coll1&shard=shard1&replica=core_node3
 The system would first see if the replica is active. If yes, a core UNLOAD 
 command is fired, which would take care of deleting the replica from the 
 clusterstate as well.
 If the state is inactive, then the core or node may be down; in that case 
 the entry is removed from cluster state. 






[jira] [Updated] (SOLR-5310) Add a collection admin command to remove a replica

2013-10-11 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-5310:
-

Attachment: SOLR-5310.patch

 Add a collection admin command to remove a replica
 --

 Key: SOLR-5310
 URL: https://issues.apache.org/jira/browse/SOLR-5310
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul
 Attachments: SOLR-5310.patch, SOLR-5310.patch

   Original Estimate: 72h
  Remaining Estimate: 72h

 The only way a replica can be removed is by unloading the core. There is no 
 way to remove a replica that is down, so the clusterstate will have 
 unreferenced nodes if a few nodes go down over time.
 We need a cluster admin command to clean that up, e.g.:
 /admin/collections?action=DELETEREPLICA&collection=coll1&shard=shard1&replica=core_node3
 The system would first see if the replica is active. If yes, a core UNLOAD 
 command is fired, which would take care of deleting the replica from the 
 clusterstate as well.
 If the state is inactive, then the core or node may be down; in that case 
 the entry is removed from cluster state. 






[JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 62737 - Failure!

2013-10-11 Thread builder
Build: builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64-test-only/62737/

No tests ran.

Build Log:
[...truncated 61 lines...]



Re: [JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 62737 - Failure!

2013-10-11 Thread Simon Willnauer
ok maybe updating the JDK would be a good idea :)



On Fri, Oct 11, 2013 at 2:46 PM,  buil...@flonkings.com wrote:
 Build: builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64-test-only/62737/

 No tests ran.

 Build Log:
 [...truncated 61 lines...]




RE: [JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 62737 - Failure!

2013-10-11 Thread Uwe Schindler
Hihi,

FYI: I have a compilation unit here (non-Lucene) that also segfaults on JDK 
7.0u25, if you don't do ant clean before. If there are already existing class 
files and only modified ones are recompiled it always segfaults. Reproducible, 
but I have no idea what causes this. :-)

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: simon.willna...@gmail.com [mailto:simon.willna...@gmail.com] On
 Behalf Of Simon Willnauer
 Sent: Friday, October 11, 2013 2:50 PM
 Cc: dev@lucene.apache.org
 Subject: Re: [JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 62737
 - Failure!
 
 ok maybe updating the JDK would be a good idea :)
 
 
 
 On Fri, Oct 11, 2013 at 2:46 PM,  buil...@flonkings.com wrote:
  Build:
  builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64-test-only/62737/
 
  No tests ran.
 
  Build Log:
  [...truncated 61 lines...]
 



[jira] [Commented] (SOLR-5338) Split shards by a route key

2013-10-11 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792600#comment-13792600
 ] 

Shalin Shekhar Mangar commented on SOLR-5338:
-

[~ysee...@gmail.com] - Would you mind reviewing the new CompositeIdRouter 
methods?

 Split shards by a route key
 ---

 Key: SOLR-5338
 URL: https://issues.apache.org/jira/browse/SOLR-5338
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
 Fix For: 4.6, 5.0

 Attachments: SOLR-5338.patch


 Provide a way to split a shard using a route key such that all documents of 
 the specified route key end up in a single dedicated sub-shard.
 Example:
 Assume that collection1, shard1 has hash range [0, 20]. Also that route key 
 'A!' has hash range [12,15]. Then invoking:
 {code}
 /admin/collections?action=SPLIT&collection=collection1&split.key=A!
 {code}
 should produce three sub-shards with hash range [0,11], [12,15] and [16,20].
 Specifying the source shard is not required here because the route key is 
 enough to figure it out. Route keys spanning more than one shard will not be 
 supported.
 Note that the sub-shard with the hash range of the route key may also contain 
 documents for other route keys whose hashes collide.






[jira] [Updated] (LUCENE-5252) add NGramSynonymTokenizer

2013-10-11 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated LUCENE-5252:
---

Attachment: LUCENE-5252_4x.patch

Fix a bug regarding ignoreCase in the attached patch.

 add NGramSynonymTokenizer
 -

 Key: LUCENE-5252
 URL: https://issues.apache.org/jira/browse/LUCENE-5252
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Reporter: Koji Sekiguchi
Priority: Minor
 Attachments: LUCENE-5252_4x.patch, LUCENE-5252_4x.patch, 
 LUCENE-5252_4x.patch


 I'd like to propose that we have another n-gram tokenizer which can process 
 synonyms. That is NGramSynonymTokenizer. Note that in this ticket, the gram 
 size is fixed, i.e. minGramSize = maxGramSize.
 Today, I think we have the following problems when using SynonymFilter with 
 NGramTokenizer. 
 For purposes of illustration, we have a synonym setting "ABC, DEFG" w/ 
 expand=true and N = 2 (2-gram).
 # There is no consensus (I think :-) how we assign offsets to generated 
 synonym tokens DE, EF and FG when expanding source token AB and BC.
 # If the query pattern looks like ABCY, it cannot be matched even if there is 
 a document …ABCY… in the index when autoGeneratePhraseQueries is set to true, 
 because there is no CY token (but GY is there) in the index.
 NGramSynonymTokenizer can solve these problems by providing the following 
 methods.
 * NGramSynonymTokenizer reads synonym settings (synonyms.txt) and it doesn't 
 tokenize registered words. e.g.
 ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer||
 |ABC|AB/DE/BC/EF/FG|ABC/DEFG|
 * Before and after the registered words, NGramSynonymTokenizer generates 
 *extra* tokens w/ posInc=0. e.g.
 ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer||
 |XYZABC123|XY/YZ/ZA/AB/DE/BC/EF/C1/FG/12/23|XY/YZ/Z/ABC/DEFG/1/12/23|
 In the above sample, Z and 1 are the extra tokens.
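My reading of the proposed token streams, as a toy sketch with a single registered word and the gram size fixed at 2 (synonym expansion is omitted, and all names are invented, so this is not the patch's code):

```java
import java.util.ArrayList;
import java.util.List;

public class NGramSynonymSketch {
    // Registered words are emitted whole, plain text around them is
    // 2-grammed, and short "extra" edge grams (posInc=0 in the proposal)
    // are added next to each registered word so that phrase queries
    // spanning the boundary can still match.
    static List<String> tokenize(String text, String registered) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int hit = text.indexOf(registered, i);
            if (hit < 0) hit = text.length();
            String plain = text.substring(i, hit);
            boolean afterWord = i > 0;                 // a registered word precedes this segment
            boolean beforeWord = hit < text.length();  // a registered word follows it
            emitPlain(out, plain, afterWord, beforeWord);
            if (hit < text.length()) {
                out.add(registered);                   // registered word stays untokenized
                i = hit + registered.length();
            } else {
                i = hit;
            }
        }
        return out;
    }

    private static void emitPlain(List<String> out, String s, boolean afterWord, boolean beforeWord) {
        if (s.isEmpty()) return;
        if (afterWord) out.add(s.substring(0, 1));     // extra leading 1-gram
        for (int j = 0; j + 2 <= s.length(); j++) out.add(s.substring(j, j + 2));
        if (beforeWord && s.length() > 1) out.add(s.substring(s.length() - 1)); // extra trailing 1-gram
    }

    public static void main(String[] args) {
        System.out.println(tokenize("XYZABC123", "ABC"));
        // prints [XY, YZ, Z, ABC, 1, 12, 23] -- the XY/YZ/Z/ABC/1/12/23
        // stream from the table above, minus the DEFG expansion
    }
}
```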






[jira] [Commented] (SOLR-5325) zk connection loss causes overseer leader loss

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792661#comment-13792661
 ] 

ASF subversion and git services commented on SOLR-5325:
---

Commit 1531313 from [~markrmil...@gmail.com] in branch 'dev/trunk'
[ https://svn.apache.org/r1531313 ]

SOLR-5325: ZooKeeper connection loss can cause the Overseer to stop processing 
commands.

 zk connection loss causes overseer leader loss
 --

 Key: SOLR-5325
 URL: https://issues.apache.org/jira/browse/SOLR-5325
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.3, 4.4, 4.5
Reporter: Christine Poerschke
Assignee: Mark Miller
 Fix For: 4.5.1, 4.6, 5.0

 Attachments: SOLR-5325.patch, SOLR-5325.patch, SOLR-5325.patch


 The problem we saw was that when the solr overseer leader experienced 
 temporary zk connectivity problems it stopped processing overseer queue 
 events.
 This first happened when quorum within the external zk ensemble was lost due 
 to too many zookeepers being stopped (similar to SOLR-5199). The second time 
 it happened when there was a sufficient number of zookeepers but they were 
 holding zookeeper leadership elections and thus refused connections (the 
 elections were taking several seconds, we were using the default 
 zookeeper.cnxTimeout=5s value and it was hit for one ensemble member).






[jira] [Commented] (SOLR-5325) zk connection loss causes overseer leader loss

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792663#comment-13792663
 ] 

ASF subversion and git services commented on SOLR-5325:
---

Commit 1531315 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1531315 ]

SOLR-5325: ZooKeeper connection loss can cause the Overseer to stop processing 
commands.

 zk connection loss causes overseer leader loss
 --

 Key: SOLR-5325
 URL: https://issues.apache.org/jira/browse/SOLR-5325
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.3, 4.4, 4.5
Reporter: Christine Poerschke
Assignee: Mark Miller
 Fix For: 4.5.1, 4.6, 5.0

 Attachments: SOLR-5325.patch, SOLR-5325.patch, SOLR-5325.patch


 The problem we saw was that when the solr overseer leader experienced 
 temporary zk connectivity problems it stopped processing overseer queue 
 events.
 This first happened when quorum within the external zk ensemble was lost due 
 to too many zookeepers being stopped (similar to SOLR-5199). The second time 
 it happened when there was a sufficient number of zookeepers but they were 
 holding zookeeper leadership elections and thus refused connections (the 
 elections were taking several seconds, we were using the default 
 zookeeper.cnxTimeout=5s value and it was hit for one ensemble member).






[jira] [Commented] (LUCENE-5260) Make older Suggesters more accepting of TermFreqPayloadIterator

2013-10-11 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792662#comment-13792662
 ] 

Michael McCandless commented on LUCENE-5260:


Thanks Areek, patch looks great!  I like the hasPayloads() up-front
introspection.

In UnsortedTermFreqIteratorWrapper.payload(), why do we set currentOrd
as a side effect?  Shouldn't next() already do that?  Maybe, we should
instead assert currentOrd == ords[curPos]?  Also, can we break that
sneaky currentOrd assignment in next into its own line before?


 Make older Suggesters more accepting of TermFreqPayloadIterator
 ---

 Key: LUCENE-5260
 URL: https://issues.apache.org/jira/browse/LUCENE-5260
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Reporter: Areek Zillur
 Attachments: LUCENE-5260.patch


 As discussed in https://issues.apache.org/jira/browse/LUCENE-5251, it would 
 be nice to make the older suggesters accepting of TermFreqPayloadIterator and 
 throw an exception if payload is found (if it cannot be used). 
 This will also allow us to nuke most of the other interfaces for 
 BytesRefIterator. 






[jira] [Commented] (SOLR-5325) zk connection loss causes overseer leader loss

2013-10-11 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792671#comment-13792671
 ] 

Mark Miller commented on SOLR-5325:
---

Add some more testing that I thought would catch it, but it has not yet on my 
system. Still poking around a bit.

Anyway, I've committed the fix.

 zk connection loss causes overseer leader loss
 --

 Key: SOLR-5325
 URL: https://issues.apache.org/jira/browse/SOLR-5325
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.3, 4.4, 4.5
Reporter: Christine Poerschke
Assignee: Mark Miller
 Fix For: 4.5.1, 4.6, 5.0

 Attachments: SOLR-5325.patch, SOLR-5325.patch, SOLR-5325.patch


 The problem we saw was that when the solr overseer leader experienced 
 temporary zk connectivity problems it stopped processing overseer queue 
 events.
 This first happened when quorum within the external zk ensemble was lost due 
 to too many zookeepers being stopped (similar to SOLR-5199). The second time 
 it happened when there was a sufficient number of zookeepers but they were 
 holding zookeeper leadership elections and thus refused connections (the 
 elections were taking several seconds, we were using the default 
 zookeeper.cnxTimeout=5s value and it was hit for one ensemble member).






[jira] [Comment Edited] (SOLR-5325) zk connection loss causes overseer leader loss

2013-10-11 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792671#comment-13792671
 ] 

Mark Miller edited comment on SOLR-5325 at 10/11/13 2:50 PM:
-

Added some more testing that I thought would catch it, but it has not yet on my 
system. Still poking around a bit.

Anyway, I've committed the fix.


was (Author: markrmil...@gmail.com):
Add some more testing that I thought would catch it, but it has not yet on my 
system. Still poking around a bit.

Anyway, I've committed the fix.

 zk connection loss causes overseer leader loss
 --

 Key: SOLR-5325
 URL: https://issues.apache.org/jira/browse/SOLR-5325
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.3, 4.4, 4.5
Reporter: Christine Poerschke
Assignee: Mark Miller
 Fix For: 4.5.1, 4.6, 5.0

 Attachments: SOLR-5325.patch, SOLR-5325.patch, SOLR-5325.patch


 The problem we saw was that when the solr overseer leader experienced 
 temporary zk connectivity problems it stopped processing overseer queue 
 events.
 This first happened when quorum within the external zk ensemble was lost due 
 to too many zookeepers being stopped (similar to SOLR-5199). The second time 
 it happened when there was a sufficient number of zookeepers but they were 
 holding zookeeper leadership elections and thus refused connections (the 
 elections were taking several seconds, we were using the default 
 zookeeper.cnxTimeout=5s value and it was hit for one ensemble member).






[jira] [Updated] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight child fields with parent fields

2013-10-11 Thread Nik Everett (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nik Everett updated LUCENE-5274:


Attachment: LUCENE-5274-4.patch

Reworked to remove the dependency on the query parser and most of the analyzer 
dependency, and to fix errors with phrases. It'll still need to lose the rest 
of the analyzer dependency and gain more test cases, in addition to addressing 
any other concerns raised in the review. 

 Teach fast FastVectorHighlighter to highlight child fields with parent 
 fields
 ---

 Key: LUCENE-5274
 URL: https://issues.apache.org/jira/browse/LUCENE-5274
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/other
Reporter: Nik Everett
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-5274-4.patch, LUCENE-5274.patch


 I've been messing around with the FastVectorHighlighter and it looks like I 
 can teach it to highlight matches on child fields.  Like this query:
 foo:scissors foo_exact:running
 would highlight foo like this:
 <em>running</em> with <em>scissors</em>
 Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy 
 of foo with a different analyzer and its own WITH_POSITIONS_OFFSETS.
 This would make queries that perform weighted matches against different 
 analyzers much more convenient to highlight.
 I have working code and test cases but they are hacked into Elasticsearch.  
 I'd love to Lucene-ify if you'll take them.
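Since both fields analyze the same stored text, match offsets from either field can be applied to the one stored string; a toy sketch of that merge step (invented names, not the FastVectorHighlighter API, and it assumes non-overlapping matches sorted by start offset):

```java
import java.util.List;

public class ChildFieldHighlightSketch {
    // A matched span, as character offsets into the parent field's stored text.
    record Match(int start, int end) {}

    // Wrap each matched span in <em> tags, regardless of whether the match
    // came from the parent field or from an analyzed child field copy.
    static String highlight(String stored, List<Match> matches) {
        StringBuilder sb = new StringBuilder();
        int pos = 0;
        for (Match m : matches) {
            sb.append(stored, pos, m.start())
              .append("<em>").append(stored, m.start(), m.end()).append("</em>");
            pos = m.end();
        }
        return sb.append(stored.substring(pos)).toString();
    }

    public static void main(String[] args) {
        String stored = "running with scissors";
        // one match contributed by foo_exact ("running"), one by foo ("scissors")
        System.out.println(highlight(stored, List.of(new Match(0, 7), new Match(13, 21))));
        // prints: <em>running</em> with <em>scissors</em>
    }
}
```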






[jira] [Commented] (SOLR-5199) Restarting zookeeper makes the overseer stop processing queue events

2013-10-11 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792679#comment-13792679
 ] 

Mark Miller commented on SOLR-5199:
---

Hey Jessica - if we can confirm this is the same issue as SOLR-5325, we can 
close this as a duplicate.

 Restarting zookeeper makes the overseer stop processing queue events
 

 Key: SOLR-5199
 URL: https://issues.apache.org/jira/browse/SOLR-5199
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.4
Reporter: Jessica Cheng
Assignee: Mark Miller
  Labels: overseer, zookeeper
 Fix For: 4.5.1, 4.6, 5.0

 Attachments: 5199-log


  Taking the external zookeeper down (I'm just testing, so I only have one 
  external zookeeper instance running) and then bringing it back up seems to 
  have caused the overseer to stop processing queue events.
  I tried to issue the delete collection command (curl 
  'http://localhost:7574/solr/admin/collections?action=DELETE&name=c1') and 
  each time it just timed out. Looking at the zookeeper data, I see
 ... 
 /overseer
collection-queue-work
  qn-02
  qn-04
  qn-06
 ...
 and the qn-xxx are not being processed.
 Attached please find the log from the overseer (according to 
 /overseer_elect/leader).






[jira] [Commented] (SOLR-5325) zk connection loss causes overseer leader loss

2013-10-11 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792684#comment-13792684
 ] 

Mark Miller commented on SOLR-5325:
---

I'm still kind of surprised this would happen - we should be retrying on 
connection loss up to an expiration, at which point we would no longer be the 
leader. Perhaps the length of retrying is a little short or something, and 
perhaps that is part of why it is more difficult for me to reproduce in a test.
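The retry-until-expiry behaviour being discussed can be sketched like this (illustrative only, not Solr's actual retry code; a generic Exception stands in for ZooKeeper's ConnectionLossException, and the names are invented):

```java
import java.util.concurrent.Callable;

public class ZkRetrySketch {
    // On connection loss, keep retrying the ZK operation until either it
    // succeeds or the session timeout has elapsed -- after that the session
    // is presumed expired, this node is no longer the leader, and it should
    // give up rather than keep acting as the overseer.
    static <T> T retryUntilExpired(Callable<T> op, long sessionTimeoutMs,
                                   long retryIntervalMs) throws Exception {
        long deadline = System.nanoTime() + sessionTimeoutMs * 1_000_000L;
        while (true) {
            try {
                return op.call();
            } catch (Exception connectionLoss) {
                if (System.nanoTime() >= deadline) {
                    throw connectionLoss;   // session presumed expired: stop retrying
                }
                Thread.sleep(retryIntervalMs);
            }
        }
    }
}
```

If the retry window is shorter than the actual session timeout, there is a gap where the session is still alive but the operation has already been abandoned, which matches the "length of retrying can be a little short" suspicion above.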

 zk connection loss causes overseer leader loss
 --

 Key: SOLR-5325
 URL: https://issues.apache.org/jira/browse/SOLR-5325
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.3, 4.4, 4.5
Reporter: Christine Poerschke
Assignee: Mark Miller
 Fix For: 4.5.1, 4.6, 5.0

 Attachments: SOLR-5325.patch, SOLR-5325.patch, SOLR-5325.patch


 The problem we saw was that when the solr overseer leader experienced 
 temporary zk connectivity problems it stopped processing overseer queue 
 events.
 This first happened when quorum within the external zk ensemble was lost due 
 to too many zookeepers being stopped (similar to SOLR-5199). The second time 
 it happened when there was a sufficient number of zookeepers but they were 
 holding zookeeper leadership elections and thus refused connections (the 
 elections were taking several seconds, we were using the default 
 zookeeper.cnxTimeout=5s value and it was hit for one ensemble member).






[jira] [Commented] (SOLR-5325) zk connection loss causes overseer leader loss

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792688#comment-13792688
 ] 

ASF subversion and git services commented on SOLR-5325:
---

Commit 1531323 from [~markrmil...@gmail.com] in branch 'dev/trunk'
[ https://svn.apache.org/r1531323 ]

SOLR-5325: raise retry padding a bit

 zk connection loss causes overseer leader loss
 --

 Key: SOLR-5325
 URL: https://issues.apache.org/jira/browse/SOLR-5325
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.3, 4.4, 4.5
Reporter: Christine Poerschke
Assignee: Mark Miller
 Fix For: 4.5.1, 4.6, 5.0

 Attachments: SOLR-5325.patch, SOLR-5325.patch, SOLR-5325.patch


 The problem we saw was that when the solr overseer leader experienced 
 temporary zk connectivity problems it stopped processing overseer queue 
 events.
 This first happened when quorum within the external zk ensemble was lost due 
 to too many zookeepers being stopped (similar to SOLR-5199). The second time 
 it happened when there was a sufficient number of zookeepers but they were 
 holding zookeeper leadership elections and thus refused connections (the 
 elections were taking several seconds, we were using the default 
 zookeeper.cnxTimeout=5s value and it was hit for one ensemble member).






[jira] [Commented] (SOLR-5325) zk connection loss causes overseer leader loss

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792689#comment-13792689
 ] 

ASF subversion and git services commented on SOLR-5325:
---

Commit 1531324 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1531324 ]

SOLR-5325: raise retry padding a bit

 zk connection loss causes overseer leader loss
 --

 Key: SOLR-5325
 URL: https://issues.apache.org/jira/browse/SOLR-5325
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.3, 4.4, 4.5
Reporter: Christine Poerschke
Assignee: Mark Miller
 Fix For: 4.5.1, 4.6, 5.0

 Attachments: SOLR-5325.patch, SOLR-5325.patch, SOLR-5325.patch


 The problem we saw was that when the solr overseer leader experienced 
 temporary zk connectivity problems it stopped processing overseer queue 
 events.
 This first happened when quorum within the external zk ensemble was lost due 
 to too many zookeepers being stopped (similar to SOLR-5199). The second time 
 it happened when there was a sufficient number of zookeepers but they were 
 holding zookeeper leadership elections and thus refused connections (the 
 elections were taking several seconds, we were using the default 
 zookeeper.cnxTimeout=5s value and it was hit for one ensemble member).






[jira] [Commented] (SOLR-5308) Split all documents of a route key into another collection

2013-10-11 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792692#comment-13792692
 ] 

Shalin Shekhar Mangar commented on SOLR-5308:
-

For splitting a single source shard into a single target collection/shard by a 
route key such as:
{code}
/admin/collections?action=migrate&collection=collection1&split.key=A!&shard=shardX&target.collection=collection2&target.shard=shardY
{code}
A rough strategy could be to:
# Create new core X on source
# Create new core Y on target
# Ask target core to buffer updates
# Start forwarding updates for route key received by source shard to target 
collection
# Split source shard to a new core X
# Ask Y to replicate fully from X
# Core Admin merge Y to target core
# Ask target core to replay buffer updates
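
Since the parameter set is still being designed in this issue, the sketch below only assembles the example migrate request above with proper URL encoding (Python, hypothetical helper):

```python
from urllib.parse import urlencode

def migrate_url(base, collection, split_key, shard,
                target_collection, target_shard):
    # Assemble a Collections API 'migrate' request like the example above.
    params = {
        "action": "migrate",
        "collection": collection,
        "split.key": split_key,  # e.g. "A!" selects all docs routed by key A
        "shard": shard,
        "target.collection": target_collection,
        "target.shard": target_shard,
    }
    # urlencode percent-encodes '!' as %21; Solr decodes it server-side.
    return base + "/admin/collections?" + urlencode(params)

url = migrate_url("http://localhost:8983/solr", "collection1", "A!",
                  "shardX", "collection2", "shardY")
print(url)
```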


 Split all documents of a route key into another collection
 --

 Key: SOLR-5308
 URL: https://issues.apache.org/jira/browse/SOLR-5308
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
 Fix For: 4.6, 5.0


 Enable SolrCloud users to split out a set of documents from a source 
 collection into another collection.
 This will be useful in multi-tenant environments. This feature will make it 
 possible to split a tenant out of a collection and put them into their own 
 collection which can be scaled separately.






[jira] [Commented] (SOLR-5325) zk connection loss causes overseer leader loss

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792695#comment-13792695
 ] 

ASF subversion and git services commented on SOLR-5325:
---

Commit 1531327 from [~markrmil...@gmail.com] in branch 
'dev/branches/lucene_solr_4_5'
[ https://svn.apache.org/r1531327 ]

SOLR-5325: raise retry padding a bit

 zk connection loss causes overseer leader loss
 --

 Key: SOLR-5325
 URL: https://issues.apache.org/jira/browse/SOLR-5325
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.3, 4.4, 4.5
Reporter: Christine Poerschke
Assignee: Mark Miller
 Fix For: 4.5.1, 4.6, 5.0

 Attachments: SOLR-5325.patch, SOLR-5325.patch, SOLR-5325.patch


 The problem we saw was that when the solr overseer leader experienced 
 temporary zk connectivity problems it stopped processing overseer queue 
 events.
 This first happened when quorum within the external zk ensemble was lost due 
 to too many zookeepers being stopped (similar to SOLR-5199). The second time 
 it happened when there was a sufficient number of zookeepers but they were 
 holding zookeeper leadership elections and thus refused connections (the 
 elections were taking several seconds, we were using the default 
 zookeeper.cnxTimeout=5s value and it was hit for one ensemble member).






[jira] [Commented] (SOLR-5325) zk connection loss causes overseer leader loss

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792694#comment-13792694
 ] 

ASF subversion and git services commented on SOLR-5325:
---

Commit 1531325 from [~markrmil...@gmail.com] in branch 
'dev/branches/lucene_solr_4_5'
[ https://svn.apache.org/r1531325 ]

SOLR-5325: ZooKeeper connection loss can cause the Overseer to stop processing 
commands.

 zk connection loss causes overseer leader loss
 --

 Key: SOLR-5325
 URL: https://issues.apache.org/jira/browse/SOLR-5325
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.3, 4.4, 4.5
Reporter: Christine Poerschke
Assignee: Mark Miller
 Fix For: 4.5.1, 4.6, 5.0

 Attachments: SOLR-5325.patch, SOLR-5325.patch, SOLR-5325.patch


 The problem we saw was that when the solr overseer leader experienced 
 temporary zk connectivity problems it stopped processing overseer queue 
 events.
 This first happened when quorum within the external zk ensemble was lost due 
 to too many zookeepers being stopped (similar to SOLR-5199). The second time 
 it happened when there was a sufficient number of zookeepers but they were 
 holding zookeeper leadership elections and thus refused connections (the 
 elections were taking several seconds, we were using the default 
 zookeeper.cnxTimeout=5s value and it was hit for one ensemble member).






[jira] [Commented] (SOLR-4824) Fuzzy / Faceting results are changed after ingestion of documents past a certain number

2013-10-11 Thread Lakshmi Venkataswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792742#comment-13792742
 ] 

Lakshmi Venkataswamy commented on SOLR-4824:


I have tested version 4.5.0 and observed the same behavior, so we are staying 
with 3.6 in production for now.
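
For background, the fuzzy query `cc:worde~1` in this report matches terms within Levenshtein edit distance 1 of `worde`. A minimal distance check (illustrative only; Lucene 4.x actually uses a Levenshtein automaton, not this DP table) can be sketched as:

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance (insert/delete/substitute).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]

# Terms the fuzzy query worde~1 would match (edit distance <= 1):
for term in ["worde", "words", "word", "wordy", "world"]:
    print(term, levenshtein("worde", term) <= 1)
```

A correct index should return the same matching-term set before and after more documents are ingested, which is why the drop from 362803 to 1338 hits points at a bug rather than expected behavior.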

 Fuzzy / Faceting results are changed after ingestion of documents past a 
 certain number 
 

 Key: SOLR-4824
 URL: https://issues.apache.org/jira/browse/SOLR-4824
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.2, 4.3
 Environment: Ubuntu 12.04 LTS 12.04.2 
 jre1.7.0_17
 jboss-as-7.1.1.Final
Reporter: Lakshmi Venkataswamy

 In upgrading from SOLR 3.6 to 4.2/4.3 and comparing results on fuzzy queries, 
 I found that after a certain number of documents were ingested the fuzzy 
 query had drastically lower number of results.  We have approximately 18,000 
 documents per day and after ingesting approximately 40 days of documents, the 
 next incremental day of documents results in a lower number of results of a 
 fuzzy search.
 The query:
 http://10.100.1.xx:8080/solr/corex/select?q=cc:worde~1&facet=on&facet.field=date&fl=date&facet.sort
 produces the following result before the threshold is crossed
 <response><lst name="responseHeader">
 <int name="status">0</int><int name="QTime">2349</int><lst name="params">
 <str name="facet">on</str><str name="fl">date</str><str name="facet.sort"/>
 <str name="q">cc:worde~1</str><str name="facet.field">date</str></lst></lst>
 <result name="response" numFound="362803" start="0"></result>
 <lst name="facet_counts"><lst name="facet_queries"/><lst name="facet_fields"><lst name="date">
 <int name="2012-12-31">2866</int>
 <int name="2013-01-01">11372</int>
 <int name="2013-01-02">11514</int>
 <int name="2013-01-03">12015</int>
 <int name="2013-01-04">11746</int>
 <int name="2013-01-05">10853</int>
 <int name="2013-01-06">11053</int>
 <int name="2013-01-07">11815</int>
 <int name="2013-01-08">11427</int>
 <int name="2013-01-09">11475</int>
 <int name="2013-01-10">11461</int>
 <int name="2013-01-11">12058</int>
 <int name="2013-01-12">11335</int>
 <int name="2013-01-13">12039</int>
 <int name="2013-01-14">12064</int>
 <int name="2013-01-15">12234</int>
 <int name="2013-01-16">12545</int>
 <int name="2013-01-17">11766</int>
 <int name="2013-01-18">12197</int>
 <int name="2013-01-19">11414</int>
 <int name="2013-01-20">11633</int>
 <int name="2013-01-21">12863</int>
 <int name="2013-01-22">12378</int>
 <int name="2013-01-23">11947</int>
 <int name="2013-01-24">11822</int>
 <int name="2013-01-25">11882</int>
 <int name="2013-01-26">10474</int>
 <int name="2013-01-27">11051</int>
 <int name="2013-01-28">11776</int>
 <int name="2013-01-29">11957</int>
 <int name="2013-01-30">11260</int>
 <int name="2013-01-31">8511</int>
 </lst></lst><lst name="facet_dates"/><lst name="facet_ranges"/></lst></response>
 Once the threshold of 40 days of ingested documents is crossed, the results 
 drop as shown below for the same query
 <response><lst name="responseHeader">
 <int name="status">0</int><int name="QTime">2</int><lst name="params">
 <str name="facet">on</str><str name="fl">date</str><str name="facet.sort"/>
 <str name="q">cc:worde~1</str><str name="facet.field">date</str></lst></lst>
 <result name="response" numFound="1338" start="0"></result>
 <lst name="facet_counts"><lst name="facet_queries"/><lst name="facet_fields"><lst name="date">
 <int name="2012-12-31">0</int>
 <int name="2013-01-01">41</int>
 <int name="2013-01-02">21</int>
 <int name="2013-01-03">24</int>
 <int name="2013-01-04">19</int>
 <int name="2013-01-05">9</int>
 <int name="2013-01-06">11</int>
 <int name="2013-01-07">17</int>
 <int name="2013-01-08">14</int>
 <int name="2013-01-09">24</int>
 <int name="2013-01-10">43</int>
 <int name="2013-01-11">14</int>
 <int name="2013-01-12">52</int>
 <int name="2013-01-13">57</int>
 <int name="2013-01-14">25</int>
 <int name="2013-01-15">17</int>
 <int name="2013-01-16">34</int>
 <int name="2013-01-17">11</int>
 <int name="2013-01-18">16</int>
 <int name="2013-01-19">121</int>
 <int name="2013-01-20">33</int>
 <int name="2013-01-21">26</int>
 <int name="2013-01-22">59</int>
 <int name="2013-01-23">27</int>
 <int name="2013-01-24">10</int>
 <int name="2013-01-25">9</int>
 <int name="2013-01-26">6</int>
 <int name="2013-01-27">16</int>
 <int name="2013-01-28">11</int>
 <int name="2013-01-29">15</int>
 <int name="2013-01-30">21</int>
 <int name="2013-01-31">109</int>
 <int name="2013-02-01">11</int>
 <int name="2013-02-02">7</int>
 <int name="2013-02-03">10</int>
 <int name="2013-02-04">8</int>
 <int name="2013-02-05">13</int>
 <int name="2013-02-06">75</int>
 <int name="2013-02-07">77</int>
 <int name="2013-02-08">31</int>
 <int name="2013-02-09">35</int>
 <int name="2013-02-10">22</int>
 <int name="2013-02-11">18</int>
 <int name="2013-02-12">11</int>
 <int name="2013-02-13">68</int>
 <int name="2013-02-14">40</int>
 </lst></lst><lst name="facet_dates"/><lst name="facet_ranges"/></lst></response>
 I have also tested this with different months of data and have seen the same 
 issue around the same number of documents.






[jira] [Commented] (SOLR-5199) Restarting zookeeper makes the overseer stop processing queue events

2013-10-11 Thread Jessica Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792778#comment-13792778
 ] 

Jessica Cheng commented on SOLR-5199:
-

Sorry, I only saw this once and I didn't have time to investigate, so I don't 
know what the cause is. SOLR-5325 definitely sounds similar so I'll close this 
issue now. Thanks!

 Restarting zookeeper makes the overseer stop processing queue events
 

 Key: SOLR-5199
 URL: https://issues.apache.org/jira/browse/SOLR-5199
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.4
Reporter: Jessica Cheng
Assignee: Mark Miller
  Labels: overseer, zookeeper
 Fix For: 4.5.1, 4.6, 5.0

 Attachments: 5199-log


 Taking the external zookeeper down (I'm just testing, so I only have one 
 external zookeeper instance running) and then bringing it back up seems to 
 have caused the overseer to stop processing queue event.
 I tried to issue the delete collection command (curl 
 'http://localhost:7574/solr/admin/collections?action=DELETE&name=c1') and 
 each time it just timed out. Looking at the zookeeper data, I see
 ... 
 /overseer
collection-queue-work
  qn-02
  qn-04
  qn-06
 ...
 and the qn-xxx are not being processed.
 Attached please find the log from the overseer (according to 
 /overseer_elect/leader).






[jira] [Closed] (SOLR-5199) Restarting zookeeper makes the overseer stop processing queue events

2013-10-11 Thread Jessica Cheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jessica Cheng closed SOLR-5199.
---

Resolution: Duplicate

 Restarting zookeeper makes the overseer stop processing queue events
 

 Key: SOLR-5199
 URL: https://issues.apache.org/jira/browse/SOLR-5199
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.4
Reporter: Jessica Cheng
Assignee: Mark Miller
  Labels: overseer, zookeeper
 Fix For: 4.5.1, 4.6, 5.0

 Attachments: 5199-log


 Taking the external zookeeper down (I'm just testing, so I only have one 
 external zookeeper instance running) and then bringing it back up seems to 
 have caused the overseer to stop processing queue event.
 I tried to issue the delete collection command (curl 
 'http://localhost:7574/solr/admin/collections?action=DELETE&name=c1') and 
 each time it just timed out. Looking at the zookeeper data, I see
 ... 
 /overseer
collection-queue-work
  qn-02
  qn-04
  qn-06
 ...
 and the qn-xxx are not being processed.
 Attached please find the log from the overseer (according to 
 /overseer_elect/leader).






[jira] [Commented] (LUCENE-5273) Binary artifacts in Lucene and Solr convenience binary distributions accompanying a release, including on Maven Central, should be identical across all distributions

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792800#comment-13792800
 ] 

ASF subversion and git services commented on LUCENE-5273:
-

Commit 1531354 from [~steve_rowe] in branch 'dev/trunk'
[ https://svn.apache.org/r1531354 ]

LUCENE-5273: Binary artifacts in Lucene and Solr convenience binary 
distributions accompanying a release, including on Maven Central, should be 
identical across all distributions.

 Binary artifacts in Lucene and Solr convenience binary distributions 
 accompanying a release, including on Maven Central, should be identical 
 across all distributions
 -

 Key: LUCENE-5273
 URL: https://issues.apache.org/jira/browse/LUCENE-5273
 Project: Lucene - Core
  Issue Type: Bug
  Components: general/build
Reporter: Steve Rowe
Assignee: Steve Rowe
 Fix For: 4.6

 Attachments: LUCENE-5273.patch


 As mentioned in various issues (e.g. LUCENE-3655, LUCENE-3885, SOLR-4766), we 
 release multiple versions of the same artifact: binary Maven artifacts are 
 not identical to the ones in the Lucene and Solr binary distributions, and 
 the Lucene jars in the Solr binary distribution, including within the war, 
 are not identical to the ones in the Lucene binary distribution.  This is bad.
 It's (probably always?) not horribly bad, since the differences all appear to 
 be caused by the build re-creating manifests and re-building jars and the 
 Solr war from their constituents at various points in the release build 
 process; as a result, manifest timestamp attributes, as well as archive 
 metadata (at least constituent timestamps, maybe other things?), differ each 
 time a jar is rebuilt.
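
One way to verify that two copies of an artifact are byte-identical is to compare content digests (a minimal sketch with made-up byte strings; real verification would hash the jars from the binary distribution and from Maven Central):

```python
import hashlib

def sha256_of(data: bytes) -> str:
    # Content digest; byte-identical artifacts have identical digests.
    return hashlib.sha256(data).hexdigest()

# Stand-ins for the same jar built twice: a single differing byte
# (a regenerated manifest timestamp, say) changes the digest.
a = b"jar-bytes ... Created-At: 2013-10-10"
b = b"jar-bytes ... Created-At: 2013-10-11"
print(sha256_of(a) == sha256_of(b))  # False
```

Building each jar once and copying it into every distribution, rather than rebuilding it per distribution, keeps the digests equal by construction.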






[jira] [Commented] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config

2013-10-11 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792824#comment-13792824
 ] 

Mark Miller commented on SOLR-5323:
---

I also think this was a mistake - I don't know that we need another solr.home 
type thing to address it though.

The root of the issue is that the clustering is not really lazy loading 
clustering - and the current policy is to lazy load the contrib modules - and 
that is because of the component. I think Erik is on to the right path with 
lazy SearchComponents. I think that if the only request handlers that refer to 
a search component are lazy, they should probably also init lazily. I have not 
looked into how hard that is to do, but it seems like the correct fix to bring 
clustering in line with the other contribs. I also think the whole enabled flag 
we had is no good.
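
The lazy-initialization idea described above can be sketched generically (a hypothetical wrapper, not Solr's actual SearchComponent API): construction, and therefore classloading of the contrib jars, is deferred until the first request actually needs the component.

```python
class LazyComponent:
    # Wrap a component so its (possibly classloader-heavy) construction
    # is deferred until something actually uses it.
    def __init__(self, factory):
        self._factory = factory
        self._instance = None

    def get(self):
        if self._instance is None:            # first use: build it now
            self._instance = self._factory()
        return self._instance

inits = []
def make_clustering():
    inits.append(1)                           # record that init ran
    return "clustering-component"

lazy = LazyComponent(make_clustering)
print(len(inits))   # 0 - nothing initialized at startup
print(lazy.get())   # first access triggers the factory
print(len(inits))   # 1 - initialized exactly once
```

With this shape, a missing contrib jar only fails at first use instead of at startup, which is the behavior the other contribs already have.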

 Solr requires -Dsolr.clustering.enabled=false when pointing at example config
 -

 Key: SOLR-5323
 URL: https://issues.apache.org/jira/browse/SOLR-5323
 Project: Solr
  Issue Type: Bug
  Components: contrib - Clustering
Affects Versions: 4.5
 Environment: vanilla mac
Reporter: John Berryman
Assignee: Dawid Weiss
 Fix For: 4.6, 5.0


 my typical use of Solr is something like this: 
 {code}
 cd SOLR_HOME/example
 cp -r solr /myProjectDir/solr_home
 java -jar -Dsolr.solr.home=/myProjectDir/solr_home  start.jar
 {code}
 But in solr 4.5.0 this fails to start successfully. I get an error:
 {code}
 org.apache.solr.common.SolrException: Error loading class 
 'solr.clustering.ClusteringComponent'
 {code}
 The reason is because solr.clustering.enabled defaults to true now. I don't 
 know why this might be the case.
 you can get around it with 
 {code}
 java -jar -Dsolr.solr.home=/myProjectDir/solr_home 
 -Dsolr.clustering.enabled=false start.jar
 {code}
 SOLR-4708 is when this became an issue.






[jira] [Commented] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config

2013-10-11 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792834#comment-13792834
 ] 

Dawid Weiss commented on SOLR-5323:
---

I can revert to lazy-loading, not a problem. But this isn't solving the 
relative paths issue at all. Like I mentioned there were several times when I 
had to pass an example preconfigured solr configuration to somebody -- this 
always required that person to put the content of the example under a specific 
directory in Solr distribution, otherwise things wouldn't work because of 
relative paths. It was a pain to explain why this step is needed and to 
enforce... I ended up just copying the required JARs into the example. This 
seems wrong somehow -- if it's a solr distribution then there should be a way 
to reference contribs in a way that allows people to have their stuff in any 
folder hierarchy?

What do you think?

 Solr requires -Dsolr.clustering.enabled=false when pointing at example config
 -

 Key: SOLR-5323
 URL: https://issues.apache.org/jira/browse/SOLR-5323
 Project: Solr
  Issue Type: Bug
  Components: contrib - Clustering
Affects Versions: 4.5
 Environment: vanilla mac
Reporter: John Berryman
Assignee: Dawid Weiss
 Fix For: 4.6, 5.0


 my typical use of Solr is something like this: 
 {code}
 cd SOLR_HOME/example
 cp -r solr /myProjectDir/solr_home
 java -jar -Dsolr.solr.home=/myProjectDir/solr_home  start.jar
 {code}
 But in solr 4.5.0 this fails to start successfully. I get an error:
 {code}
 org.apache.solr.common.SolrException: Error loading class 
 'solr.clustering.ClusteringComponent'
 {code}
 The reason is because solr.clustering.enabled defaults to true now. I don't 
 know why this might be the case.
 you can get around it with 
 {code}
 java -jar -Dsolr.solr.home=/myProjectDir/solr_home 
 -Dsolr.clustering.enabled=false start.jar
 {code}
 SOLR-4708 is when this became an issue.






[jira] [Commented] (LUCENE-5269) TestRandomChains failure

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792845#comment-13792845
 ] 

ASF subversion and git services commented on LUCENE-5269:
-

Commit 1531368 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1531368 ]

LUCENE-5269: satisfy the policeman

 TestRandomChains failure
 

 Key: LUCENE-5269
 URL: https://issues.apache.org/jira/browse/LUCENE-5269
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Fix For: 4.5.1, 4.6, 5.0

 Attachments: LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269.patch, 
 LUCENE-5269_test.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch


 One of EdgeNGramTokenizer, ShingleFilter, NGramTokenFilter is buggy, or 
 possibly only the combination of them conspiring together.






[jira] [Commented] (LUCENE-5269) TestRandomChains failure

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792846#comment-13792846
 ] 

ASF subversion and git services commented on LUCENE-5269:
-

Commit 1531369 from [~rcmuir] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1531369 ]

LUCENE-5269: satisfy the policeman

 TestRandomChains failure
 

 Key: LUCENE-5269
 URL: https://issues.apache.org/jira/browse/LUCENE-5269
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Fix For: 4.5.1, 4.6, 5.0

 Attachments: LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269.patch, 
 LUCENE-5269_test.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch


 One of EdgeNGramTokenizer, ShingleFilter, NGramTokenFilter is buggy, or 
 possibly only the combination of them conspiring together.






[jira] [Commented] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config

2013-10-11 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792850#comment-13792850
 ] 

Mark Miller commented on SOLR-5323:
---

I just think anything with the relative paths is a separate issue.

You can use any hierarchy - you just have to change those paths. I'm all for 
that being improved somehow, but the issue here seems to be:

Solr contrib modules are lazy loaded so that if you don't use them, you can 
delete any of them from the dist package layout and things still work. Or you 
can not delete them and if you try and use them, things work. Clustering now 
violates that. It's not really clustering's fault; it seems to be more a 
limitation of the search component.


 Solr requires -Dsolr.clustering.enabled=false when pointing at example config
 -

 Key: SOLR-5323
 URL: https://issues.apache.org/jira/browse/SOLR-5323
 Project: Solr
  Issue Type: Bug
  Components: contrib - Clustering
Affects Versions: 4.5
 Environment: vanilla mac
Reporter: John Berryman
Assignee: Dawid Weiss
 Fix For: 4.6, 5.0


 my typical use of Solr is something like this: 
 {code}
 cd SOLR_HOME/example
 cp -r solr /myProjectDir/solr_home
 java -jar -Dsolr.solr.home=/myProjectDir/solr_home  start.jar
 {code}
 But in solr 4.5.0 this fails to start successfully. I get an error:
 {code}
 org.apache.solr.common.SolrException: Error loading class 
 'solr.clustering.ClusteringComponent'
 {code}
 The reason is because solr.clustering.enabled defaults to true now. I don't 
 know why this might be the case.
 you can get around it with 
 {code}
 java -jar -Dsolr.solr.home=/myProjectDir/solr_home 
 -Dsolr.clustering.enabled=false start.jar
 {code}
 SOLR-4708 is when this became an issue.






[jira] [Commented] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config

2013-10-11 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792859#comment-13792859
 ] 

Dawid Weiss commented on SOLR-5323:
---

Ok, I will revert the changes from SOLR-4708.

 Solr requires -Dsolr.clustering.enabled=false when pointing at example config
 -

 Key: SOLR-5323
 URL: https://issues.apache.org/jira/browse/SOLR-5323
 Project: Solr
  Issue Type: Bug
  Components: contrib - Clustering
Affects Versions: 4.5
 Environment: vanilla mac
Reporter: John Berryman
Assignee: Dawid Weiss
 Fix For: 4.6, 5.0


 my typical use of Solr is something like this: 
 {code}
 cd SOLR_HOME/example
 cp -r solr /myProjectDir/solr_home
 java -jar -Dsolr.solr.home=/myProjectDir/solr_home  start.jar
 {code}
 But in solr 4.5.0 this fails to start successfully. I get an error:
 {code}
 org.apache.solr.common.SolrException: Error loading class 
 'solr.clustering.ClusteringComponent'
 {code}
 The reason is because solr.clustering.enabled defaults to true now. I don't 
 know why this might be the case.
 you can get around it with 
 {code}
 java -jar -Dsolr.solr.home=/myProjectDir/solr_home 
 -Dsolr.clustering.enabled=false start.jar
 {code}
 SOLR-4708 is when this became an issue.






[jira] [Commented] (LUCENE-5275) Fix AttributeSource.toString()

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792884#comment-13792884
 ] 

ASF subversion and git services commented on LUCENE-5275:
-

Commit 1531376 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1531376 ]

LUCENE-5275: Change AttributeSource.toString to display the current state of 
attributes

 Fix AttributeSource.toString()
 --

 Key: LUCENE-5275
 URL: https://issues.apache.org/jira/browse/LUCENE-5275
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-5275.patch, LUCENE-5275.patch


 It's currently just Object.toString, e.g.:
 org.apache.lucene.analysis.en.PorterStemFilter@8a32165c
 But I think we should make it more useful, to end users trying to see what 
 their chain is doing, and to make SOPs easier when debugging:
 {code}
 EnglishAnalyzer analyzer = new EnglishAnalyzer(TEST_VERSION_CURRENT);
 try (TokenStream ts = analyzer.tokenStream("body", "Its 2013, let's fix this already!")) {
   ts.reset();
   while (ts.incrementToken()) {
     System.out.println(ts.toString());
   }
   ts.end();
 }
 {code}
 Proposed output:
 {noformat}
 PorterStemFilter@8a32165c term=it,bytes=[69 
 74],startOffset=0,endOffset=3,positionIncrement=1,type=ALPHANUM,keyword=false
 PorterStemFilter@987b9eea term=2013,bytes=[32 30 31 
 33],startOffset=4,endOffset=8,positionIncrement=1,type=NUM,keyword=false
 PorterStemFilter@6b5dbd1f term=let,bytes=[6c 65 
 74],startOffset=10,endOffset=15,positionIncrement=1,type=ALPHANUM,keyword=false
 PorterStemFilter@45cbde1b term=fix,bytes=[66 69 
 78],startOffset=16,endOffset=19,positionIncrement=1,type=ALPHANUM,keyword=false
 PorterStemFilter@bcd8f627 term=alreadi,bytes=[61 6c 72 65 61 64 
 69],startOffset=25,endOffset=32,positionIncrement=2,type=ALPHANUM,keyword=false
 {noformat}






[jira] [Updated] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config

2013-10-11 Thread Dawid Weiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated SOLR-5323:
--

Attachment: SOLR-5323.patch

Patch reverting (portions) of SOLR-4708.

 Solr requires -Dsolr.clustering.enabled=false when pointing at example config
 -

 Key: SOLR-5323
 URL: https://issues.apache.org/jira/browse/SOLR-5323
 Project: Solr
  Issue Type: Bug
  Components: contrib - Clustering
Affects Versions: 4.5
 Environment: vanilla mac
Reporter: John Berryman
Assignee: Dawid Weiss
 Fix For: 4.6, 5.0

 Attachments: SOLR-5323.patch


 my typical use of Solr is something like this: 
 {code}
 cd SOLR_HOME/example
 cp -r solr /myProjectDir/solr_home
 java -jar -Dsolr.solr.home=/myProjectDir/solr_home  start.jar
 {code}
 But in solr 4.5.0 this fails to start successfully. I get an error:
 {code}
 org.apache.solr.common.SolrException: Error loading class 
 'solr.clustering.ClusteringComponent'
 {code}
 The reason is that solr.clustering.enabled now defaults to true. I don't 
 know why this might be the case.
 You can get around it with: 
 {code}
 java -jar -Dsolr.solr.home=/myProjectDir/solr_home 
 -Dsolr.clustering.enabled=false start.jar
 {code}
 SOLR-4708 is when this became an issue.






[jira] [Updated] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config

2013-10-11 Thread Dawid Weiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated SOLR-5323:
--

Fix Version/s: 4.5.1

 Solr requires -Dsolr.clustering.enabled=false when pointing at example config
 -

 Key: SOLR-5323
 URL: https://issues.apache.org/jira/browse/SOLR-5323
 Project: Solr
  Issue Type: Bug
  Components: contrib - Clustering
Affects Versions: 4.5
 Environment: vanilla mac
Reporter: John Berryman
Assignee: Dawid Weiss
 Fix For: 4.5.1, 4.6, 5.0

 Attachments: SOLR-5323.patch


 my typical use of Solr is something like this: 
 {code}
 cd SOLR_HOME/example
 cp -r solr /myProjectDir/solr_home
 java -jar -Dsolr.solr.home=/myProjectDir/solr_home  start.jar
 {code}
 But in solr 4.5.0 this fails to start successfully. I get an error:
 {code}
 org.apache.solr.common.SolrException: Error loading class 
 'solr.clustering.ClusteringComponent'
 {code}
 The reason is that solr.clustering.enabled now defaults to true. I don't 
 know why this might be the case.
 You can get around it with: 
 {code}
 java -jar -Dsolr.solr.home=/myProjectDir/solr_home 
 -Dsolr.clustering.enabled=false start.jar
 {code}
 SOLR-4708 is when this became an issue.






[jira] [Commented] (SOLR-4708) Enable ClusteringComponent by default

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792895#comment-13792895
 ] 

ASF subversion and git services commented on SOLR-4708:
---

Commit 1531377 from [~dawidweiss] in branch 'dev/trunk'
[ https://svn.apache.org/r1531377 ]

SOLR-5323: Disable ClusteringComponent by default in collection1 example. The 
solr.clustering.enabled system property needs to be set to 'true' to enable the 
clustering contrib (reverts SOLR-4708). (Dawid Weiss)

 Enable ClusteringComponent by default
 -

 Key: SOLR-4708
 URL: https://issues.apache.org/jira/browse/SOLR-4708
 Project: Solr
  Issue Type: Task
Reporter: Erik Hatcher
Assignee: Dawid Weiss
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-4708.patch, SOLR-4708.patch


 In the past, the ClusteringComponent used to rely on 3rd party JARs not 
 available from a Solr distro.  This is no longer the case, but the /browse UI 
 and other references still had the clustering component disabled in the 
 example with an awkward system property way to enable it.  Let's remove all 
 of that unnecessary stuff and just enable it as it works out of the box now.






[jira] [Commented] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792894#comment-13792894
 ] 

ASF subversion and git services commented on SOLR-5323:
---

Commit 1531377 from [~dawidweiss] in branch 'dev/trunk'
[ https://svn.apache.org/r1531377 ]

SOLR-5323: Disable ClusteringComponent by default in collection1 example. The 
solr.clustering.enabled system property needs to be set to 'true' to enable the 
clustering contrib (reverts SOLR-4708). (Dawid Weiss)

 Solr requires -Dsolr.clustering.enabled=false when pointing at example config
 -

 Key: SOLR-5323
 URL: https://issues.apache.org/jira/browse/SOLR-5323
 Project: Solr
  Issue Type: Bug
  Components: contrib - Clustering
Affects Versions: 4.5
 Environment: vanilla mac
Reporter: John Berryman
Assignee: Dawid Weiss
 Fix For: 4.5.1, 4.6, 5.0

 Attachments: SOLR-5323.patch


 my typical use of Solr is something like this: 
 {code}
 cd SOLR_HOME/example
 cp -r solr /myProjectDir/solr_home
 java -jar -Dsolr.solr.home=/myProjectDir/solr_home  start.jar
 {code}
 But in solr 4.5.0 this fails to start successfully. I get an error:
 {code}
 org.apache.solr.common.SolrException: Error loading class 
 'solr.clustering.ClusteringComponent'
 {code}
 The reason is that solr.clustering.enabled now defaults to true. I don't 
 know why this might be the case.
 You can get around it with: 
 {code}
 java -jar -Dsolr.solr.home=/myProjectDir/solr_home 
 -Dsolr.clustering.enabled=false start.jar
 {code}
 SOLR-4708 is when this became an issue.






[jira] [Commented] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792897#comment-13792897
 ] 

ASF subversion and git services commented on SOLR-5323:
---

Commit 1531378 from [~dawidweiss] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1531378 ]

SOLR-5323: Disable ClusteringComponent by default in collection1 example. The 
solr.clustering.enabled system property needs to be set to 'true' to enable the 
clustering contrib (reverts SOLR-4708). (Dawid Weiss)

 Solr requires -Dsolr.clustering.enabled=false when pointing at example config
 -

 Key: SOLR-5323
 URL: https://issues.apache.org/jira/browse/SOLR-5323
 Project: Solr
  Issue Type: Bug
  Components: contrib - Clustering
Affects Versions: 4.5
 Environment: vanilla mac
Reporter: John Berryman
Assignee: Dawid Weiss
 Fix For: 4.5.1, 4.6, 5.0

 Attachments: SOLR-5323.patch


 my typical use of Solr is something like this: 
 {code}
 cd SOLR_HOME/example
 cp -r solr /myProjectDir/solr_home
 java -jar -Dsolr.solr.home=/myProjectDir/solr_home  start.jar
 {code}
 But in solr 4.5.0 this fails to start successfully. I get an error:
 {code}
 org.apache.solr.common.SolrException: Error loading class 
 'solr.clustering.ClusteringComponent'
 {code}
 The reason is that solr.clustering.enabled now defaults to true. I don't 
 know why this might be the case.
 You can get around it with: 
 {code}
 java -jar -Dsolr.solr.home=/myProjectDir/solr_home 
 -Dsolr.clustering.enabled=false start.jar
 {code}
 SOLR-4708 is when this became an issue.






[jira] [Updated] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight child fields with parent fields

2013-10-11 Thread Nik Everett (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nik Everett updated LUCENE-5274:


Attachment: (was: LUCENE-5274.patch)

 Teach fast FastVectorHighlighter to highlight child fields with parent 
 fields
 ---

 Key: LUCENE-5274
 URL: https://issues.apache.org/jira/browse/LUCENE-5274
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/other
Reporter: Nik Everett
Assignee: Adrien Grand
Priority: Minor

 I've been messing around with the FastVectorHighlighter and it looks like I 
 can teach it to highlight matches on child fields.  Like this query:
 foo:scissors foo_exact:running
 would highlight foo like this:
 <em>running</em> with <em>scissors</em>
 Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy 
 of foo with a different analyzer and its own WITH_POSITIONS_OFFSETS.
 This would make queries that perform weighted matches against different 
 analyzers much more convenient to highlight.
 I have working code and test cases but they are hacked into Elasticsearch.  
 I'd love to Lucene-ify if you'll take them.
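The <em> markup in the example above can be produced by any fragment builder that wraps hit terms. A self-contained toy version (this is not the FastVectorHighlighter API; the class and method names below are invented for illustration):

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy fragment highlighter: wraps every hit term in <em> tags, the same
// markup FastVectorHighlighter emits by default for matched terms.
public class EmHighlighter {
    static String highlight(String text, Set<String> hitTerms) {
        StringBuilder out = new StringBuilder();
        Matcher m = Pattern.compile("\\w+").matcher(text);
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());
            String word = m.group();
            // Case-insensitive match against the query's hit terms.
            if (hitTerms.contains(word.toLowerCase())) {
                out.append("<em>").append(word).append("</em>");
            } else {
                out.append(word);
            }
            last = m.end();
        }
        out.append(text.substring(last));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("running with scissors",
                Set.of("running", "scissors")));
        // prints: <em>running</em> with <em>scissors</em>
    }
}
```

The real highlighter works from term vector positions and offsets rather than re-tokenizing the stored text, but the output markup is the same.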






[jira] [Commented] (SOLR-4708) Enable ClusteringComponent by default

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792898#comment-13792898
 ] 

ASF subversion and git services commented on SOLR-4708:
---

Commit 1531378 from [~dawidweiss] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1531378 ]

SOLR-5323: Disable ClusteringComponent by default in collection1 example. The 
solr.clustering.enabled system property needs to be set to 'true' to enable the 
clustering contrib (reverts SOLR-4708). (Dawid Weiss)

 Enable ClusteringComponent by default
 -

 Key: SOLR-4708
 URL: https://issues.apache.org/jira/browse/SOLR-4708
 Project: Solr
  Issue Type: Task
Reporter: Erik Hatcher
Assignee: Dawid Weiss
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-4708.patch, SOLR-4708.patch


 In the past, the ClusteringComponent used to rely on 3rd party JARs not 
 available from a Solr distro.  This is no longer the case, but the /browse UI 
 and other references still had the clustering component disabled in the 
 example with an awkward system property way to enable it.  Let's remove all 
 of that unnecessary stuff and just enable it as it works out of the box now.






[jira] [Updated] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight child fields with parent fields

2013-10-11 Thread Nik Everett (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nik Everett updated LUCENE-5274:


Attachment: (was: LUCENE-5274-4.patch)

 Teach fast FastVectorHighlighter to highlight child fields with parent 
 fields
 ---

 Key: LUCENE-5274
 URL: https://issues.apache.org/jira/browse/LUCENE-5274
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/other
Reporter: Nik Everett
Assignee: Adrien Grand
Priority: Minor

 I've been messing around with the FastVectorHighlighter and it looks like I 
 can teach it to highlight matches on child fields.  Like this query:
 foo:scissors foo_exact:running
 would highlight foo like this:
 <em>running</em> with <em>scissors</em>
 Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy 
 of foo with a different analyzer and its own WITH_POSITIONS_OFFSETS.
 This would make queries that perform weighted matches against different 
 analyzers much more convenient to highlight.
 I have working code and test cases but they are hacked into Elasticsearch.  
 I'd love to Lucene-ify if you'll take them.






[jira] [Updated] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight child fields with parent fields

2013-10-11 Thread Nik Everett (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nik Everett updated LUCENE-5274:


Attachment: LUCENE-5274.patch

New version of the patch.  This one works a lot better with phrases and even 
works on fields that have the same source but different tokenizers.

It still makes highlighting depend on the analysis module to pick up 
PerFieldAnalyzerWrapper.

I think all the new code this adds to FieldPhraseList deserves a unit test on 
its own but I'm not in the frame of mind to write one at the moment so I'll 
have to come back to it later.

 Teach fast FastVectorHighlighter to highlight child fields with parent 
 fields
 ---

 Key: LUCENE-5274
 URL: https://issues.apache.org/jira/browse/LUCENE-5274
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/other
Reporter: Nik Everett
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-5274.patch


 I've been messing around with the FastVectorHighlighter and it looks like I 
 can teach it to highlight matches on child fields.  Like this query:
 foo:scissors foo_exact:running
 would highlight foo like this:
 <em>running</em> with <em>scissors</em>
 Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy 
 of foo with a different analyzer and its own WITH_POSITIONS_OFFSETS.
 This would make queries that perform weighted matches against different 
 analyzers much more convenient to highlight.
 I have working code and test cases but they are hacked into Elasticsearch.  
 I'd love to Lucene-ify if you'll take them.






[jira] [Commented] (SOLR-4708) Enable ClusteringComponent by default

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792903#comment-13792903
 ] 

ASF subversion and git services commented on SOLR-4708:
---

Commit 1531380 from [~dawidweiss] in branch 'dev/branches/lucene_solr_4_5'
[ https://svn.apache.org/r1531380 ]

SOLR-5323: Disable ClusteringComponent by default in collection1 example. The 
solr.clustering.enabled system property needs to be set to 'true' to enable the 
clustering contrib (reverts SOLR-4708). (Dawid Weiss)

 Enable ClusteringComponent by default
 -

 Key: SOLR-4708
 URL: https://issues.apache.org/jira/browse/SOLR-4708
 Project: Solr
  Issue Type: Task
Reporter: Erik Hatcher
Assignee: Dawid Weiss
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-4708.patch, SOLR-4708.patch


 In the past, the ClusteringComponent used to rely on 3rd party JARs not 
 available from a Solr distro.  This is no longer the case, but the /browse UI 
 and other references still had the clustering component disabled in the 
 example with an awkward system property way to enable it.  Let's remove all 
 of that unnecessary stuff and just enable it as it works out of the box now.






[jira] [Commented] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792902#comment-13792902
 ] 

ASF subversion and git services commented on SOLR-5323:
---

Commit 1531380 from [~dawidweiss] in branch 'dev/branches/lucene_solr_4_5'
[ https://svn.apache.org/r1531380 ]

SOLR-5323: Disable ClusteringComponent by default in collection1 example. The 
solr.clustering.enabled system property needs to be set to 'true' to enable the 
clustering contrib (reverts SOLR-4708). (Dawid Weiss)

 Solr requires -Dsolr.clustering.enabled=false when pointing at example config
 -

 Key: SOLR-5323
 URL: https://issues.apache.org/jira/browse/SOLR-5323
 Project: Solr
  Issue Type: Bug
  Components: contrib - Clustering
Affects Versions: 4.5
 Environment: vanilla mac
Reporter: John Berryman
Assignee: Dawid Weiss
 Fix For: 4.5.1, 4.6, 5.0

 Attachments: SOLR-5323.patch


 my typical use of Solr is something like this: 
 {code}
 cd SOLR_HOME/example
 cp -r solr /myProjectDir/solr_home
 java -jar -Dsolr.solr.home=/myProjectDir/solr_home  start.jar
 {code}
 But in solr 4.5.0 this fails to start successfully. I get an error:
 {code}
 org.apache.solr.common.SolrException: Error loading class 
 'solr.clustering.ClusteringComponent'
 {code}
 The reason is that solr.clustering.enabled now defaults to true. I don't 
 know why this might be the case.
 You can get around it with: 
 {code}
 java -jar -Dsolr.solr.home=/myProjectDir/solr_home 
 -Dsolr.clustering.enabled=false start.jar
 {code}
 SOLR-4708 is when this became an issue.






[jira] [Resolved] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config

2013-10-11 Thread Dawid Weiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss resolved SOLR-5323.
---

Resolution: Fixed

Applied to branch_4x, lucene_solr_4_5 and trunk.

 Solr requires -Dsolr.clustering.enabled=false when pointing at example config
 -

 Key: SOLR-5323
 URL: https://issues.apache.org/jira/browse/SOLR-5323
 Project: Solr
  Issue Type: Bug
  Components: contrib - Clustering
Affects Versions: 4.5
 Environment: vanilla mac
Reporter: John Berryman
Assignee: Dawid Weiss
 Fix For: 4.5.1, 4.6, 5.0

 Attachments: SOLR-5323.patch


 my typical use of Solr is something like this: 
 {code}
 cd SOLR_HOME/example
 cp -r solr /myProjectDir/solr_home
 java -jar -Dsolr.solr.home=/myProjectDir/solr_home  start.jar
 {code}
 But in solr 4.5.0 this fails to start successfully. I get an error:
 {code}
 org.apache.solr.common.SolrException: Error loading class 
 'solr.clustering.ClusteringComponent'
 {code}
 The reason is that solr.clustering.enabled now defaults to true. I don't 
 know why this might be the case.
 You can get around it with: 
 {code}
 java -jar -Dsolr.solr.home=/myProjectDir/solr_home 
 -Dsolr.clustering.enabled=false start.jar
 {code}
 SOLR-4708 is when this became an issue.






[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight child fields with parent fields

2013-10-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792907#comment-13792907
 ] 

Robert Muir commented on LUCENE-5274:
-

Why would a highlighter improvement require mocktokenizer changes?

 Teach fast FastVectorHighlighter to highlight child fields with parent 
 fields
 ---

 Key: LUCENE-5274
 URL: https://issues.apache.org/jira/browse/LUCENE-5274
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/other
Reporter: Nik Everett
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-5274.patch


 I've been messing around with the FastVectorHighlighter and it looks like I 
 can teach it to highlight matches on child fields.  Like this query:
 foo:scissors foo_exact:running
 would highlight foo like this:
 <em>running</em> with <em>scissors</em>
 Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy 
 of foo with a different analyzer and its own WITH_POSITIONS_OFFSETS.
 This would make queries that perform weighted matches against different 
 analyzers much more convenient to highlight.
 I have working code and test cases but they are hacked into Elasticsearch.  
 I'd love to Lucene-ify if you'll take them.






[jira] [Commented] (LUCENE-5275) Fix AttributeSource.toString()

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792912#comment-13792912
 ] 

ASF subversion and git services commented on LUCENE-5275:
-

Commit 1531381 from [~rcmuir] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1531381 ]

LUCENE-5275: Change AttributeSource.toString to display the current state of 
attributes

 Fix AttributeSource.toString()
 --

 Key: LUCENE-5275
 URL: https://issues.apache.org/jira/browse/LUCENE-5275
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-5275.patch, LUCENE-5275.patch


 It's currently just Object.toString, e.g.:
 org.apache.lucene.analysis.en.PorterStemFilter@8a32165c
 But I think we should make it more useful, to end users trying to see what 
 their chain is doing, and to make SOPs easier when debugging:
 {code}
 EnglishAnalyzer analyzer = new EnglishAnalyzer(TEST_VERSION_CURRENT);
 try (TokenStream ts = analyzer.tokenStream("body", "Its 2013, let's fix this already!")) {
   ts.reset();
   while (ts.incrementToken()) {
     System.out.println(ts.toString());
   }
   ts.end();
 }
 {code}
 Proposed output:
 {noformat}
 PorterStemFilter@8a32165c term=it,bytes=[69 74],startOffset=0,endOffset=3,positionIncrement=1,type=<ALPHANUM>,keyword=false
 PorterStemFilter@987b9eea term=2013,bytes=[32 30 31 33],startOffset=4,endOffset=8,positionIncrement=1,type=<NUM>,keyword=false
 PorterStemFilter@6b5dbd1f term=let,bytes=[6c 65 74],startOffset=10,endOffset=15,positionIncrement=1,type=<ALPHANUM>,keyword=false
 PorterStemFilter@45cbde1b term=fix,bytes=[66 69 78],startOffset=16,endOffset=19,positionIncrement=1,type=<ALPHANUM>,keyword=false
 PorterStemFilter@bcd8f627 term=alreadi,bytes=[61 6c 72 65 61 64 69],startOffset=25,endOffset=32,positionIncrement=2,type=<ALPHANUM>,keyword=false
 {noformat}






[jira] [Commented] (SOLR-4816) Add document routing to CloudSolrServer

2013-10-11 Thread Jessica Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792911#comment-13792911
 ] 

Jessica Cheng commented on SOLR-4816:
-

I think the latest patch:

-if (request instanceof IsUpdateRequest && updatesToLeaders) {
+if (request instanceof IsUpdateRequest) {

removed the effect of the updatesToLeaders variable. Looking at 
http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_4_5/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrServer.java?view=markup
 it's not used anywhere to make a decision anymore.

 Add document routing to CloudSolrServer
 ---

 Key: SOLR-4816
 URL: https://issues.apache.org/jira/browse/SOLR-4816
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Joel Bernstein
Assignee: Mark Miller
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: RequestTask-removal.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816-sriesenberg.patch


 This issue adds the following enhancements to CloudSolrServer's update logic:
 1) Document routing: Updates are routed directly to the correct shard leader 
 eliminating document routing at the server.
 2) Optional parallel update execution: Updates for each shard are executed in 
 a separate thread so parallel indexing can occur across the cluster.
 These enhancements should allow for near linear scalability on indexing 
 throughput.
 Usage:
 CloudSolrServer cloudClient = new CloudSolrServer(zkAddress);
 cloudClient.setParallelUpdates(true); 
 SolrInputDocument doc1 = new SolrInputDocument();
 doc1.addField("id", "0");
 doc1.addField("a_t", "hello1");
 SolrInputDocument doc2 = new SolrInputDocument();
 doc2.addField("id", "2");
 doc2.addField("a_t", "hello2");
 UpdateRequest request = new UpdateRequest();
 request.add(doc1);
 request.add(doc2);
 request.setAction(AbstractUpdateRequest.ACTION.OPTIMIZE, false, false);
 NamedList response = cloudClient.request(request); // Returns a backwards 
 compatible condensed response.
 //To get more detailed response down cast to RouteResponse:
 CloudSolrServer.RouteResponse rr = (CloudSolrServer.RouteResponse)response;






[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight child fields with parent fields

2013-10-11 Thread Nik Everett (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792913#comment-13792913
 ] 

Nik Everett commented on LUCENE-5274:
-

Hey, forgot to mention that.  MockTokenizer seems to throw away the character 
after the end of each token even if that character is the valid start to the 
next token.  This comes up because I wanted to tokenize strings in a simplistic 
way to test that the highlighter can handle different tokenizers and it just 
wasn't working right.  So I fixed MockTokenizer but I did it in a pretty 
brutal way.  I'm happy to move the change to another bug and improve it but 
testing the highlighter change without it is a bit painful.
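The failure mode described above, the character after each token being dropped even when it is a valid start of the next token, can be reproduced with a toy tokenizer. This is a standalone sketch of the off-by-one, not MockTokenizer's actual code:

```java
import java.util.ArrayList;
import java.util.List;

public class TokenizeSketch {
    // Emit letter runs capped at maxLen. The buggy variant always skips the
    // character after a token ends, even when the token ended only because it
    // hit maxLen and the next character is a valid token start.
    static List<String> tokenize(String s, int maxLen, boolean buggy) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            if (!Character.isLetter(s.charAt(i))) { i++; continue; }
            StringBuilder tok = new StringBuilder();
            while (i < s.length() && tok.length() < maxLen
                    && Character.isLetter(s.charAt(i))) {
                tok.append(s.charAt(i++));
            }
            tokens.add(tok.toString());
            if (buggy) i++; // unconditionally drops the following character
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("abcdef", 3, true));  // [abc, ef]
        System.out.println(tokenize("abcdef", 3, false)); // [abc, def]
    }
}
```

The buggy variant silently loses 'd' because it treats the character that ended the token as a delimiter without checking it first.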

 Teach fast FastVectorHighlighter to highlight child fields with parent 
 fields
 ---

 Key: LUCENE-5274
 URL: https://issues.apache.org/jira/browse/LUCENE-5274
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/other
Reporter: Nik Everett
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-5274.patch


 I've been messing around with the FastVectorHighlighter and it looks like I 
 can teach it to highlight matches on child fields.  Like this query:
 foo:scissors foo_exact:running
 would highlight foo like this:
 emrunning/em with emscissors/em
 Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy 
 of foo a different analyzer and its own WITH_POSITIONS_OFFSETS.
 This would make queries that perform weighted matches against different 
 analyzers much more convenient to highlight.
 I have working code and test cases but they are hacked into Elasticsearch.  
 I'd love to Lucene-ify if you'll take them.






[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight child fields with parent fields

2013-10-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792921#comment-13792921
 ] 

Robert Muir commented on LUCENE-5274:
-

if you suspect there is a bug in mocktokenizer, please open a separate issue 
for that. mocktokenizer is used by like, thousands of tests :)

 Teach fast FastVectorHighlighter to highlight child fields with parent 
 fields
 ---

 Key: LUCENE-5274
 URL: https://issues.apache.org/jira/browse/LUCENE-5274
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/other
Reporter: Nik Everett
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-5274.patch


 I've been messing around with the FastVectorHighlighter and it looks like I 
 can teach it to highlight matches on child fields.  Like this query:
 foo:scissors foo_exact:running
 would highlight foo like this:
 <em>running</em> with <em>scissors</em>
 Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy 
 of foo with a different analyzer and its own WITH_POSITIONS_OFFSETS.
 This would make queries that perform weighted matches against different 
 analyzers much more convenient to highlight.
 I have working code and test cases but they are hacked into Elasticsearch.  
 I'd love to Lucene-ify if you'll take them.






[jira] [Resolved] (LUCENE-5275) Fix AttributeSource.toString()

2013-10-11 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-5275.
-

   Resolution: Fixed
Fix Version/s: 5.0
   4.6

 Fix AttributeSource.toString()
 --

 Key: LUCENE-5275
 URL: https://issues.apache.org/jira/browse/LUCENE-5275
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Fix For: 4.6, 5.0

 Attachments: LUCENE-5275.patch, LUCENE-5275.patch


 It's currently just Object.toString, e.g.:
 org.apache.lucene.analysis.en.PorterStemFilter@8a32165c
 But I think we should make it more useful, to end users trying to see what 
 their chain is doing, and to make SOPs easier when debugging:
 {code}
 EnglishAnalyzer analyzer = new EnglishAnalyzer(TEST_VERSION_CURRENT);
 try (TokenStream ts = analyzer.tokenStream("body", "Its 2013, let's fix this already!")) {
   ts.reset();
   while (ts.incrementToken()) {
     System.out.println(ts.toString());
   }
   ts.end();
 }
 {code}
 Proposed output:
 {noformat}
 PorterStemFilter@8a32165c term=it,bytes=[69 74],startOffset=0,endOffset=3,positionIncrement=1,type=<ALPHANUM>,keyword=false
 PorterStemFilter@987b9eea term=2013,bytes=[32 30 31 33],startOffset=4,endOffset=8,positionIncrement=1,type=<NUM>,keyword=false
 PorterStemFilter@6b5dbd1f term=let,bytes=[6c 65 74],startOffset=10,endOffset=15,positionIncrement=1,type=<ALPHANUM>,keyword=false
 PorterStemFilter@45cbde1b term=fix,bytes=[66 69 78],startOffset=16,endOffset=19,positionIncrement=1,type=<ALPHANUM>,keyword=false
 PorterStemFilter@bcd8f627 term=alreadi,bytes=[61 6c 72 65 61 64 69],startOffset=25,endOffset=32,positionIncrement=2,type=<ALPHANUM>,keyword=false
 {noformat}






[jira] [Resolved] (SOLR-4073) Overseer will miss operations in some cases for OverseerCollectionProcessor

2013-10-11 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller resolved SOLR-4073.
---

   Resolution: Duplicate
Fix Version/s: (was: 4.6)

 Overseer will miss  operations in some cases for OverseerCollectionProcessor
 

 Key: SOLR-4073
 URL: https://issues.apache.org/jira/browse/SOLR-4073
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0, 4.1, 4.2
 Environment: Solr cloud
Reporter: Raintung Li
Assignee: Mark Miller
 Attachments: patch-4073

   Original Estimate: 168h
  Remaining Estimate: 168h

 An overseer disconnects from ZooKeeper, but its overseer thread still handles 
 request (A) from the DistributedQueue. When the thread reconnects to 
 ZooKeeper, it tries to remove the top request via workQueue.remove().
 Meanwhile, another server takes over the overseer role because the old 
 overseer disconnected. It starts an overseer thread, handles request (A) 
 again, removes (A) from the queue, and then tries to get the new top request 
 (B) but fails: in the meantime the old, reconnected overseer has removed the 
 top request from the queue, which by now is (B). The new overseer never 
 processes (B) because it was deleted by the old overseer, so request (B)'s 
 operations are lost.
 A better approach: distributedQueue.peek could return the ID of the request 
 to be removed, so the handler can call workQueue.remove(ID) instead of 
 removing whatever is at the top.
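The remove-by-ID fix suggested above can be sketched in plain Java. The Request type and Deque here are illustrative stand-ins, not Solr's DistributedQueue API; the point is only that removing "whatever is at the head" is racy when two overseers share a queue, while removing a specific ID is not.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

public class QueueRemoveById {
    record Request(String id, String op) {}

    // Unsafe under the race described above: a concurrent consumer may have
    // already removed A, so poll() silently discards the *next* request B.
    static Request removeHead(Deque<Request> queue) {
        return queue.poll();
    }

    // Safe: only removes the request we actually processed, or nothing.
    static boolean removeById(Deque<Request> queue, String id) {
        for (Iterator<Request> it = queue.iterator(); it.hasNext(); ) {
            if (it.next().id().equals(id)) {
                it.remove();
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Deque<Request> queue = new ArrayDeque<>();
        queue.add(new Request("A", "createcollection"));
        queue.add(new Request("B", "splitshard"));

        // New overseer processes A and removes it by id.
        removeById(queue, "A");
        // Old, reconnected overseer tries to "finish" its removal of A.
        // With removeById nothing happens; removeHead would have dropped B.
        boolean removedAgain = removeById(queue, "A");
        System.out.println(removedAgain + " head=" + queue.peek().op());
        // prints: false head=splitshard
    }
}
```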






[jira] [Updated] (LUCENE-5265) Make BlockPackedWriter constructor take an acceptable overhead ratio

2013-10-11 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-5265:
-

Attachment: LUCENE-5265.patch

Here is a patch.

 Make BlockPackedWriter constructor take an acceptable overhead ratio
 

 Key: LUCENE-5265
 URL: https://issues.apache.org/jira/browse/LUCENE-5265
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Priority: Minor
 Attachments: LUCENE-5265.patch


 Follow-up of http://search-lucene.com/m/SjmSW1CZYuZ1
 MemoryDocValuesFormat takes an acceptable overhead ratio but it is only used 
 when doing table compression. It should be used for all compression methods, 
 especially DELTA_COMPRESSED whose encoding is based on BlockPackedWriter.






[jira] [Commented] (LUCENE-5266) Optimization of the direct PackedInts readers

2013-10-11 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792948#comment-13792948
 ] 

Adrien Grand commented on LUCENE-5266:
--

bq. The only caveat is the encoding would need to ensure there is always an 
extra 2 bytes at the end.

There are some places (codecs) where I encode many short sequences 
consecutively so I care about not wasting extra bytes but if this proves to 
help performance, I think it shouldn't be too hard to add the ability to 
have extra bytes at the end of the stream (I'm thinking about adding a new 
PackedInts.Format to the enum but there might be other options).
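The "extra bytes at the end of the stream" idea can be sketched in plain Java. This is an illustrative bit-unpacking scheme, not Lucene's actual PackedInts format: because the encoder leaves a few spare bytes after the packed data, the reader can always load a fixed-width word unconditionally instead of branching on how many bytes remain near the end.

```java
public class PaddedPackedReader {
    // Read value i from a big-endian packed stream, bitsPerValue <= 24.
    static int get(byte[] packed, int bitsPerValue, int i) {
        int startBit = i * bitsPerValue;
        int byteIdx = startBit >>> 3;
        int bitOffset = startBit & 7;
        // Unconditionally load 4 bytes; safe only because the encoder padded
        // the array, even when the value ends in the last real data byte.
        int word = ((packed[byteIdx] & 0xFF) << 24)
                 | ((packed[byteIdx + 1] & 0xFF) << 16)
                 | ((packed[byteIdx + 2] & 0xFF) << 8)
                 |  (packed[byteIdx + 3] & 0xFF);
        return (word >>> (32 - bitOffset - bitsPerValue)) & ((1 << bitsPerValue) - 1);
    }

    public static void main(String[] args) {
        // Three 5-bit values 22, 13, 19 -> bitstream 10110 01101 10011,
        // packed into two data bytes plus padding for the 4-byte loads.
        byte[] packed = { (byte) 0b10110011, (byte) 0b01100110, 0, 0, 0 };
        System.out.println(get(packed, 5, 0) + " " + get(packed, 5, 1) + " " + get(packed, 5, 2));
        // prints: 22 13 19
    }
}
```

The number of padding bytes depends on the load width (here a 4-byte load needs up to 3 spare bytes); the 2-byte figure in the thread corresponds to narrower loads.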

 Optimization of the direct PackedInts readers
 -

 Key: LUCENE-5266
 URL: https://issues.apache.org/jira/browse/LUCENE-5266
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-5266.patch, LUCENE-5266.patch


 Given that the initial focus for PackedInts readers was more on in-memory 
 readers (for storing stuff like the mapping from old to new doc IDs at 
 merging time), I never spent time trying to optimize the direct readers 
 although it could be beneficial now that they are used for disk-based doc 
 values.






[jira] [Created] (SOLR-5340) Add support for named snapshots

2013-10-11 Thread Mike Schrag (JIRA)
Mike Schrag created SOLR-5340:
-

 Summary: Add support for named snapshots
 Key: SOLR-5340
 URL: https://issues.apache.org/jira/browse/SOLR-5340
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Affects Versions: 4.5
Reporter: Mike Schrag


It would be really nice if Solr supported named snapshots. Right now if you 
snapshot a SolrCloud cluster, every node potentially records a slightly 
different timestamp. Correlating those back together to effectively restore the 
entire cluster to a consistent snapshot is pretty tedious.






[jira] [Created] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token

2013-10-11 Thread Nik Everett (JIRA)
Nik Everett created LUCENE-5278:
---

 Summary: MockTokenizer throws away the character right after a 
token even if it is a valid start to a new token
 Key: LUCENE-5278
 URL: https://issues.apache.org/jira/browse/LUCENE-5278
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Nik Everett
Priority: Trivial


MockTokenizer throws away the character right after a token even if it is a 
valid start to a new token.  You won't see this unless you build a tokenizer 
that can recognize every character, like with new RegExp(".") or RegExp("...").

Changing this behaviour seems to break a number of tests.






[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight child fields with parent fields

2013-10-11 Thread Nik Everett (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792974#comment-13792974
 ] 

Nik Everett commented on LUCENE-5274:
-

Filed LUCENE-5278.

 Teach fast FastVectorHighlighter to highlight child fields with parent 
 fields
 ---

 Key: LUCENE-5274
 URL: https://issues.apache.org/jira/browse/LUCENE-5274
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/other
Reporter: Nik Everett
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-5274.patch


 I've been messing around with the FastVectorHighlighter and it looks like I 
 can teach it to highlight matches on child fields.  Like this query:
 foo:scissors foo_exact:running
 would highlight foo like this:
 <em>running</em> with <em>scissors</em>
 Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy 
 of foo with a different analyzer and its own WITH_POSITIONS_OFFSETS.
 This would make queries that perform weighted matches against different 
 analyzers much more convenient to highlight.
 I have working code and test cases but they are hacked into Elasticsearch.  
 I'd love to Lucene-ify if you'll take them.






[jira] [Updated] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token

2013-10-11 Thread Nik Everett (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nik Everett updated LUCENE-5278:


Attachment: LUCENE-5278.patch

This patch fixes the behaviour from my perspective but breaks a bunch of 
other tests.

 MockTokenizer throws away the character right after a token even if it is a 
 valid start to a new token
 --

 Key: LUCENE-5278
 URL: https://issues.apache.org/jira/browse/LUCENE-5278
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Nik Everett
Priority: Trivial
 Attachments: LUCENE-5278.patch


 MockTokenizer throws away the character right after a token even if it is a 
 valid start to a new token.  You won't see this unless you build a tokenizer 
 that can recognize every character, like with new RegExp(".") or RegExp("...").
 Changing this behaviour seems to break a number of tests.






[jira] [Commented] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token

2013-10-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792993#comment-13792993
 ] 

Robert Muir commented on LUCENE-5278:
-

I think I understand what you want: it makes sense. The only reason it's the way 
it is today is because this thing historically came from CharTokenizer (see 
isTokenChar?).

But it would be better if you could e.g. make a pattern like ([A-Z][a-z]+) and 
have it actually break "FooBar" into "Foo", "Bar" rather than throwing out "Bar" 
altogether.

I'll dig into this!
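The desired behavior can be sketched with a toy tokenizer in plain Java (this is not MockTokenizer itself, just the pattern under discussion): when a character cannot extend the current token, it is retried as the start of the next token instead of being discarded.

```java
import java.util.ArrayList;
import java.util.List;

public class PushbackTokenizer {
    // Pattern analogous to ([A-Z][a-z]+): an uppercase letter then lowercase.
    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            if (Character.isUpperCase(s.charAt(i))) {
                int start = i++;
                while (i < s.length() && Character.isLowerCase(s.charAt(i))) i++;
                if (i - start >= 2) tokens.add(s.substring(start, i));
                // i now points at the rejecting character; the outer loop
                // retries it as a token start rather than skipping past it.
            } else {
                i++; // genuinely unmatchable character: safe to discard
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("FooBar"));  // prints: [Foo, Bar]
    }
}
```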

 MockTokenizer throws away the character right after a token even if it is a 
 valid start to a new token
 --

 Key: LUCENE-5278
 URL: https://issues.apache.org/jira/browse/LUCENE-5278
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Nik Everett
Priority: Trivial
 Attachments: LUCENE-5278.patch


 MockTokenizer throws away the character right after a token even if it is a 
 valid start to a new token.  You won't see this unless you build a tokenizer 
 that can recognize every character, like with new RegExp(".") or RegExp("...").
 Changing this behaviour seems to break a number of tests.






[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight child fields with parent fields

2013-10-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793000#comment-13793000
 ] 

Robert Muir commented on LUCENE-5274:
-

Thanks Nik: I can help with that one!

Another question: about the MergedIterator :)

I can see the possible use case here, but I think it deserves some discussion 
first (versus just making it public).
This thing has limitations (it's currently only used by IndexWriter for 
buffered deletes; it's basically like a MultiTerms over an Iterator). For example, 
each iterator it consumes should not have duplicate values according to its 
compareTo(): it's not clear to me that WeightedPhraseInfo behaves this way:
* what if you have a synonym of "dog" sitting on top of "cat" with the same 
boost factor... it's a duplicate according to that compareTo, but the text is 
different.
* what if the synonym is just "dog" with posinc=0 stacked on top of itself 
(which is totally valid to do)...

Perhaps highlighting can make use of it, but it's unclear to me that it's really 
following the contract. Furthermore the class in question (WeightedPhraseInfo) 
is public, and adding Comparable to it looks like it will create a situation 
where it's inconsistent with equals()... I think this is a little dangerous.

If it turns out we can reuse it: great! But i think rather than just slapping 
public on it, we should move it to .util, ensure it has good javadocs and unit 
tests, and investigate what exactly happens when these contracts are violated: 
e.g. can we make an exception happen rather than just broken behavior in a way 
that won't hurt performance and so on?
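For illustration, a MergedIterator-style k-way merge over sorted iterators might look like the following plain-Java sketch (not Lucene's class). Note how elements that compare equal across inputs collapse to a single output element, which is exactly the behavior that would bite stacked synonyms with equal sort keys.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class MergeSorted {
    static List<String> merge(List<Iterator<String>> inputs) {
        // Each heap entry tracks the current head of one input iterator.
        class Head { String value; Iterator<String> src;
            Head(String v, Iterator<String> s) { value = v; src = s; } }
        PriorityQueue<Head> pq =
            new PriorityQueue<>(Comparator.comparing((Head h) -> h.value));
        for (Iterator<String> it : inputs) {
            if (it.hasNext()) pq.add(new Head(it.next(), it));
        }
        List<String> out = new ArrayList<>();
        String last = null;
        while (!pq.isEmpty()) {
            Head h = pq.poll();
            if (!h.value.equals(last)) out.add(h.value); // equal values collapse
            last = h.value;
            if (h.src.hasNext()) pq.add(new Head(h.src.next(), h.src));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> a = List.of("cat", "dog");
        List<String> b = List.of("dog", "fox");  // "dog" duplicated across inputs
        System.out.println(merge(List.of(a.iterator(), b.iterator())));
        // prints: [cat, dog, fox]
    }
}
```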



 Teach fast FastVectorHighlighter to highlight child fields with parent 
 fields
 ---

 Key: LUCENE-5274
 URL: https://issues.apache.org/jira/browse/LUCENE-5274
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/other
Reporter: Nik Everett
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-5274.patch


 I've been messing around with the FastVectorHighlighter and it looks like I 
 can teach it to highlight matches on child fields.  Like this query:
 foo:scissors foo_exact:running
 would highlight foo like this:
 <em>running</em> with <em>scissors</em>
 Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy 
 of foo with a different analyzer and its own WITH_POSITIONS_OFFSETS.
 This would make queries that perform weighted matches against different 
 analyzers much more convenient to highlight.
 I have working code and test cases but they are hacked into Elasticsearch.  
 I'd love to Lucene-ify if you'll take them.






[jira] [Assigned] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token

2013-10-11 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir reassigned LUCENE-5278:
---

Assignee: Robert Muir

 MockTokenizer throws away the character right after a token even if it is a 
 valid start to a new token
 --

 Key: LUCENE-5278
 URL: https://issues.apache.org/jira/browse/LUCENE-5278
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Nik Everett
Assignee: Robert Muir
Priority: Trivial
 Attachments: LUCENE-5278.patch


 MockTokenizer throws away the character right after a token even if it is a 
 valid start to a new token.  You won't see this unless you build a tokenizer 
 that can recognize every character, like with new RegExp(".") or RegExp("...").
 Changing this behaviour seems to break a number of tests.






[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight child fields with parent fields

2013-10-11 Thread Nik Everett (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793018#comment-13793018
 ] 

Nik Everett commented on LUCENE-5274:
-

{quote}
I can see the possible use case here, but I think it deserves some discussion 
first (versus just making it public).
{quote}
Sure!  I'm more used to Guava's tools so I think I was lulled into a false 
sense of recognition.  No chance of updating to a modern version of Guava? :)

{quote}
This thing has limitations (its currently only used by indexwriter for 
buffereddeletes, its basically like a MultiTerms over an Iterator). For example 
each iterator it consumes should not have duplicate values according to its 
compareTo(): its not clear to me this WeightedPhraseInfo behaves this way
{quote}
Yikes!  I didn't catch that but now that you point it out it is right there in 
the docs and I should have.  WeightedPhraseInfo doesn't behave that way and 

{quote}
Furthermore the class in question (WeightedPhraseInfo) is public, and adding 
Comparable to it looks like it will create a situation where its inconsistent 
with equals()... I think this is a little dangerous.
{quote}
I agree on the inconsistency with equals.  I can either fix 
that or use a Comparator for sorting both WeightedPhraseInfo and Toffs.  That'd 
require a MergeSorter that can take one but 
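The Comparator-based alternative avoids the consistent-with-equals trap entirely, because the ordering lives outside the class. A minimal sketch with hypothetical names (Phrase and boost are illustrative, not Lucene's WeightedPhraseInfo):

```java
import java.util.Arrays;
import java.util.Comparator;

public class SortByComparator {
    record Phrase(String text, float boost) {}

    public static void main(String[] args) {
        Phrase[] phrases = {
            new Phrase("dog", 2f), new Phrase("cat", 2f), new Phrase("fix", 1f)
        };
        // Order by boost, then text, for this sort only; equals()/hashCode()
        // of Phrase stay untouched, so no consistency contract is violated.
        Arrays.sort(phrases, Comparator.comparingDouble(Phrase::boost)
                                       .thenComparing(Phrase::text));
        System.out.println(Arrays.toString(phrases));
    }
}
```

Tie-breaking on a second key (here the text) also sidesteps the "equal boost, different text" duplicate problem described earlier in the thread.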

{quote}
If it turns out we can reuse it: great! But i think rather than just slapping 
public on it, we should move it to .util, ensure it has good javadocs and unit 
tests, and investigate what exactly happens when these contracts are violated: 
e.g. can we make an exception happen rather than just broken behavior in a way 
that won't hurt performance and so on?
{quote}
Makes sense to me.

 Teach fast FastVectorHighlighter to highlight child fields with parent 
 fields
 ---

 Key: LUCENE-5274
 URL: https://issues.apache.org/jira/browse/LUCENE-5274
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/other
Reporter: Nik Everett
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-5274.patch


 I've been messing around with the FastVectorHighlighter and it looks like I 
 can teach it to highlight matches on child fields.  Like this query:
 foo:scissors foo_exact:running
 would highlight foo like this:
 <em>running</em> with <em>scissors</em>
 Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy 
 of foo with a different analyzer and its own WITH_POSITIONS_OFFSETS.
 This would make queries that perform weighted matches against different 
 analyzers much more convenient to highlight.
 I have working code and test cases but they are hacked into Elasticsearch.  
 I'd love to Lucene-ify if you'll take them.






[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight child fields with parent fields

2013-10-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793029#comment-13793029
 ] 

Robert Muir commented on LUCENE-5274:
-

{quote}
Sure! I'm more used to Guava's tools so I think I was lulled into a false 
sense of recognition. No chance of updating to a modern version of Guava?
{quote}

There is no Lucene dependency on Guava. I don't think we should introduce one, 
and it wouldn't solve the issues I mentioned anyway (e.g. Comparable 
inconsistent with equals and such). It would only add 2.1MB of bloated, 
unnecessary syntactic sugar (sorry, that's just my opinion; I think it's 
useless).

We should keep our third-party dependencies minimal and necessary so that any 
app using Lucene can choose for itself what version of this stuff (if any) it 
wants to use. If we rely upon unnecessary stuff, it hurts end users by 
forcing them to use compatible versions.


 Teach fast FastVectorHighlighter to highlight child fields with parent 
 fields
 ---

 Key: LUCENE-5274
 URL: https://issues.apache.org/jira/browse/LUCENE-5274
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/other
Reporter: Nik Everett
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-5274.patch


 I've been messing around with the FastVectorHighlighter and it looks like I 
 can teach it to highlight matches on child fields.  Like this query:
 foo:scissors foo_exact:running
 would highlight foo like this:
 <em>running</em> with <em>scissors</em>
 Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy 
 of foo with a different analyzer and its own WITH_POSITIONS_OFFSETS.
 This would make queries that perform weighted matches against different 
 analyzers much more convenient to highlight.
 I have working code and test cases but they are hacked into Elasticsearch.  
 I'd love to Lucene-ify if you'll take them.






[jira] [Updated] (SOLR-5027) Field Collapsing PostFilter

2013-10-11 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-5027:
-

Attachment: SOLR-5027.patch

 Field Collapsing PostFilter
 ---

 Key: SOLR-5027
 URL: https://issues.apache.org/jira/browse/SOLR-5027
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 5.0
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 4.6, 5.0

 Attachments: SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, 
 SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, 
 SOLR-5027.patch, SOLR-5027.patch


 This ticket introduces the *CollapsingQParserPlugin* 
 The *CollapsingQParserPlugin* is a PostFilter that performs field collapsing. 
 This is a high performance alternative to standard Solr field collapsing 
 (with *ngroups*) when the number of distinct groups in the result set is high.
 For example in one performance test, a search with 10 million full results 
 and 1 million collapsed groups:
 Standard grouping with ngroups : 17 seconds.
 CollapsingQParserPlugin: 300 milli-seconds.
 Sample syntax:
 Collapse based on the highest scoring document:
 {code}
 fq={!collapse field=field_name}
 {code}
 Collapse based on the min value of a numeric field:
 {code}
 fq={!collapse field=field_name min=field_name}
 {code}
 Collapse based on the max value of a numeric field:
 {code}
 fq={!collapse field=field_name max=field_name}
 {code}
 Collapse with a null policy:
 {code}
 fq={!collapse field=field_name nullPolicy=null_policy}
 {code}
 There are three null policies:
 ignore : removes docs with a null value in the collapse field (default).
 expand : treats each doc with a null value in the collapse field as a 
 separate group.
 collapse : collapses all docs with a null value into a single group using 
 either highest score, or min/max.
 The CollapsingQParserPlugin also fully supports the QueryElevationComponent.
 *Note:* The July 16 patch also includes an ExpandComponent that expands the 
 collapsed groups for the current search result page. This functionality will 
 be moved to its own ticket.
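The three null policies can be illustrated with a small in-memory sketch in plain Java (the Doc record and collapse logic here are illustrative, not Solr's implementation):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NullPolicyDemo {
    record Doc(String id, String group, float score) {}

    static List<Doc> collapse(List<Doc> docs, String nullPolicy) {
        Map<String, Doc> best = new LinkedHashMap<>();
        List<Doc> result = new ArrayList<>();
        for (Doc d : docs) {
            String key = d.group();
            if (key == null) {
                switch (nullPolicy) {
                    case "ignore" -> { continue; }                 // drop doc
                    case "expand" -> { result.add(d); continue; }  // own group
                    case "collapse" -> key = "<null>";             // one group
                }
            }
            // Keep the highest-scoring doc per group.
            best.merge(key, d, (a, b) -> a.score() >= b.score() ? a : b);
        }
        result.addAll(best.values());
        return result;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
            new Doc("1", "g1", 1f), new Doc("2", "g1", 3f),
            new Doc("3", null, 2f), new Doc("4", null, 5f));
        System.out.println(collapse(docs, "ignore").size());   // prints: 1
        System.out.println(collapse(docs, "expand").size());   // prints: 3
        System.out.println(collapse(docs, "collapse").size()); // prints: 2
    }
}
```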






[jira] [Commented] (SOLR-5027) Field Collapsing PostFilter

2013-10-11 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793035#comment-13793035
 ] 

Joel Bernstein commented on SOLR-5027:
--

Patch that passes precommit for trunk

 Field Collapsing PostFilter
 ---

 Key: SOLR-5027
 URL: https://issues.apache.org/jira/browse/SOLR-5027
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 5.0
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 4.6, 5.0

 Attachments: SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, 
 SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, 
 SOLR-5027.patch, SOLR-5027.patch


 This ticket introduces the *CollapsingQParserPlugin* 
 The *CollapsingQParserPlugin* is a PostFilter that performs field collapsing. 
 This is a high performance alternative to standard Solr field collapsing 
 (with *ngroups*) when the number of distinct groups in the result set is high.
 For example in one performance test, a search with 10 million full results 
 and 1 million collapsed groups:
 Standard grouping with ngroups : 17 seconds.
 CollapsingQParserPlugin: 300 milli-seconds.
 Sample syntax:
 Collapse based on the highest scoring document:
 {code}
 fq={!collapse field=field_name}
 {code}
 Collapse based on the min value of a numeric field:
 {code}
 fq={!collapse field=field_name min=field_name}
 {code}
 Collapse based on the max value of a numeric field:
 {code}
 fq={!collapse field=field_name max=field_name}
 {code}
 Collapse with a null policy:
 {code}
 fq={!collapse field=field_name nullPolicy=null_policy}
 {code}
 There are three null policies:
 ignore : removes docs with a null value in the collapse field (default).
 expand : treats each doc with a null value in the collapse field as a 
 separate group.
 collapse : collapses all docs with a null value into a single group using 
 either highest score, or min/max.
 The CollapsingQParserPlugin also fully supports the QueryElevationComponent.
 *Note:* The July 16 patch also includes an ExpandComponent that expands the 
 collapsed groups for the current search result page. This functionality will 
 be moved to its own ticket.






[jira] [Commented] (LUCENE-5212) java 7u40 causes sigsegv and corrupt term vectors

2013-10-11 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793037#comment-13793037
 ] 

Bill Bell commented on LUCENE-5212:
---

It appears this happens on 7u40 64-bit too. See 
https://bugs.openjdk.java.net/browse/JDK-8024830

Am I reading this wrong?

Start failing around hs24-b21:

   [junit4] # SIGSEGV (0xb) at pc=0xfd7ff91d9f7d, pid=23810, tid=343
   [junit4] #
   [junit4] # JRE version: Java(TM) SE Runtime Environment (8.0-b54)
   [junit4] # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.0-b21 mixed mode 
solaris-amd64 )
   [junit4] # Problematic frame:
   [junit4] # J 
org.apache.lucene.codecs.compressing.CompressingTermVectorsReader.get(I)Lorg/apache/lucene/index/Fields;
   [junit4] #

Note, first 7u40 build b01 has hs24-b24.

Next, I will try to find changeset.



 java 7u40 causes sigsegv and corrupt term vectors
 -

 Key: LUCENE-5212
 URL: https://issues.apache.org/jira/browse/LUCENE-5212
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: crashFaster2.0.patch, crashFaster.patch, 
 hs_err_pid32714.log, jenkins.txt









[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight child fields with parent fields

2013-10-11 Thread Nik Everett (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793038#comment-13793038
 ] 

Nik Everett commented on LUCENE-5274:
-

{quote}
There is no Lucene dependency on Guava. I don't think we should introduce one, 
and it wouldn't solve the issues I mentioned anyway (e.g. Comparable 
inconsistent with equals and such). It would only add 2.1MB of bloated, 
unnecessary syntactic sugar (sorry, that's just my opinion; I think it's 
useless).

We should keep our third-party dependencies minimal and necessary so that any 
app using Lucene can choose for itself what version of this stuff (if any) it 
wants to use. If we rely upon unnecessary stuff, it hurts end users by 
forcing them to use compatible versions.
{quote}
I figured that was the reasoning and I don't intend to argue with it.  In this 
case it would provide a method to merge sorted iterators just like 
MergedIterator only without the caveats around duplication but I'm happy to 
work around it.  Guava certainly wouldn't fix my forgetting equals and hashcode.

 Teach fast FastVectorHighlighter to highlight child fields with parent 
 fields
 ---

 Key: LUCENE-5274
 URL: https://issues.apache.org/jira/browse/LUCENE-5274
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/other
Reporter: Nik Everett
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-5274.patch


 I've been messing around with the FastVectorHighlighter and it looks like I 
 can teach it to highlight matches on child fields.  Like this query:
 foo:scissors foo_exact:running
 would highlight foo like this:
 <em>running</em> with <em>scissors</em>
 Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy 
 of foo with a different analyzer and its own WITH_POSITIONS_OFFSETS.
 This would make queries that perform weighted matches against different 
 analyzers much more convenient to highlight.
 I have working code and test cases but they are hacked into Elasticsearch.  
 I'd love to Lucene-ify if you'll take them.






[jira] [Created] (LUCENE-5279) Don't use recursion in DisjunctionSumScorer.countMatches

2013-10-11 Thread Michael McCandless (JIRA)
Michael McCandless created LUCENE-5279:
--

 Summary: Don't use recursion in DisjunctionSumScorer.countMatches
 Key: LUCENE-5279
 URL: https://issues.apache.org/jira/browse/LUCENE-5279
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless


I noticed the TODO in there, to not use recursion, so I fixed it to just use a 
private queue ...
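The general transformation being described (a recursive walk over child scorers replaced by an explicit stack) can be sketched like this; the Node tree below is a stand-in, not Lucene's Scorer API:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class IterativeCountMatches {
    record Node(int doc, Node[] children) {}

    // Count leaves positioned on `doc` without recursion: an explicit
    // Deque replaces the call stack of the recursive version.
    static int countMatches(Node root, int doc) {
        int count = 0;
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if (n.children().length == 0) {
                if (n.doc() == doc) count++;
            } else {
                for (Node child : n.children()) stack.push(child);
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Node[] none = {};
        Node root = new Node(-1, new Node[] {
            new Node(7, none), new Node(9, none),
            new Node(-1, new Node[] { new Node(7, none) })
        });
        System.out.println(countMatches(root, 7));  // prints: 2
    }
}
```

As the benchmark below suggests, avoiding recursion is not automatically faster; the explicit queue/stack adds its own bookkeeping cost.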






[jira] [Updated] (LUCENE-5279) Don't use recursion in DisjunctionSumScorer.countMatches

2013-10-11 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-5279:
---

Attachment: LUCENE-5279.patch

Patch.

However, it seems to be slower, testing on the full English Wikipedia:

{noformat}
Report after iter 10:
Task               QPS base   StdDev   QPS comp   StdDev          Pct diff
OrHighLow             14.44   (7.7%)      12.48   (4.7%)  -13.6% ( -24% -  -1%)
OrHighHigh             5.56   (6.2%)       4.86   (4.4%)  -12.6% ( -21% -  -2%)
OrHighMed             18.62   (6.7%)      16.29   (4.4%)  -12.5% ( -22% -  -1%)
AndHighLow           398.09   (1.6%)     390.34   (2.3%)   -1.9% (  -5% -   1%)
OrNotHighLow         374.60   (1.7%)     369.61   (1.7%)   -1.3% (  -4% -   2%)
Fuzzy1                67.10   (2.1%)      66.41   (2.2%)   -1.0% (  -5% -   3%)
OrNotHighMed          51.68   (1.7%)      51.37   (1.5%)   -0.6% (  -3% -   2%)
Fuzzy2                46.73   (2.8%)      46.45   (2.6%)   -0.6% (  -5% -   4%)
OrHighNotLow          20.05   (3.5%)      19.96   (5.0%)   -0.5% (  -8% -   8%)
OrHighNotMed          27.15   (3.2%)      27.05   (4.8%)   -0.3% (  -8% -   7%)
OrNotHighHigh          7.72   (3.2%)       7.70   (4.7%)   -0.3% (  -7% -   7%)
OrHighNotHigh          9.81   (3.0%)       9.79   (4.5%)   -0.1% (  -7% -   7%)
LowSloppyPhrase       43.83   (1.9%)      43.89   (2.1%)    0.2% (  -3% -   4%)
IntNRQ                 3.49   (4.5%)       3.50   (4.1%)    0.2% (  -8% -   9%)
Prefix3               70.74   (2.7%)      71.01   (2.4%)    0.4% (  -4% -   5%)
HighTerm              65.33   (3.0%)      65.62  (13.5%)    0.4% ( -15% -  17%)
MedSloppyPhrase        3.47   (3.5%)       3.49   (4.7%)    0.6% (  -7% -   9%)
LowPhrase             13.06   (1.5%)      13.14   (2.0%)    0.6% (  -2% -   4%)
Wildcard              16.71   (2.9%)      16.82   (2.2%)    0.7% (  -4% -   5%)
MedTerm              100.90   (2.5%)     101.71  (10.4%)    0.8% ( -11% -  14%)
LowTerm              311.85   (1.4%)     314.53   (6.4%)    0.9% (  -6% -   8%)
HighSpanNear           8.06   (5.1%)       8.13   (5.9%)    0.9% (  -9% -  12%)
Respell               48.00   (2.3%)      48.45   (2.8%)    0.9% (  -4% -   6%)
HighSloppyPhrase       3.40   (4.1%)       3.43   (6.6%)    1.0% (  -9% -  12%)
AndHighMed            34.14   (1.6%)      34.52   (1.7%)    1.1% (  -2% -   4%)
AndHighHigh           28.15   (1.7%)      28.48   (1.7%)    1.2% (  -2% -   4%)
MedSpanNear           30.62   (2.8%)      31.07   (3.2%)    1.5% (  -4% -   7%)
LowSpanNear           10.30   (2.6%)      10.48   (2.9%)    1.7% (  -3% -   7%)
MedPhrase            195.60   (5.1%)     201.44   (6.6%)    3.0% (  -8% -  15%)
HighPhrase             4.17   (5.6%)       4.34   (6.9%)    4.0% (  -8% -  17%)
{noformat}

So ... I don't plan on pursuing it any further, but wanted to open the issue in 
case anybody wants to try ...

 Don't use recursion in DisjunctionSumScorer.countMatches
 

 Key: LUCENE-5279
 URL: https://issues.apache.org/jira/browse/LUCENE-5279
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
 Attachments: LUCENE-5279.patch


 I noticed the TODO in there, to not use recursion, so I fixed it to just use 
 a private queue ...






[jira] [Updated] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token

2013-10-11 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5278:


Attachment: LUCENE-5278.patch

Nice patch Nik!

I think this is ready: I tweaked variable names and rearranged stuff (e.g. I 
use -1 instead of Integer so we aren't boxing, and a few other things).

I also added some unit tests.

The main issues why tests were failing with your original patch:
* reset() needed to clear the buffer variables.
* the state machine needed a particular extra check when emitting a token: 
e.g. if you make a regex of "..", but you send it "abcde", the tokens should be 
"ab", "cd", but not "e". So when we end on a partial match, we have to check 
that we are in an accept state.
* term-limit-exceeded is a special case (versus the last character being in a 
reject state)
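The accept-state point above can be illustrated with a small standalone sketch. This uses java.util.regex rather than the MockTokenizer/automaton machinery, and the class and method names are invented for the example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only (not Lucene's MockTokenizer): emit successive maximal
// matches of a token pattern; a trailing partial match that never reaches
// an accept state (e.g. the lone "e" left over from ".." on "abcde") is
// not emitted as a token.
public class GreedyTokens {
    static List<String> tokenize(String pattern, String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile(pattern).matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (m.lookingAt()) {     // match anchored at pos: accept state reached
                tokens.add(m.group());
                pos = m.end();
            } else {
                pos++;               // reject: skip one char and retry
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("..", "abcde")); // [ab, cd] -- "e" is rejected
    }
}
```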

 MockTokenizer throws away the character right after a token even if it is a 
 valid start to a new token
 --

 Key: LUCENE-5278
 URL: https://issues.apache.org/jira/browse/LUCENE-5278
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Nik Everett
Assignee: Robert Muir
Priority: Trivial
 Attachments: LUCENE-5278.patch, LUCENE-5278.patch


 MockTokenizer throws away the character right after a token even if it is a 
 valid start to a new token.  You won't see this unless you build a tokenizer 
 that can recognize every character, like with new RegExp(".") or RegExp("...").
 Changing this behaviour seems to break a number of tests.






[jira] [Updated] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token

2013-10-11 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5278:


Attachment: LUCENE-5278.patch

added a few more tests to TestMockAnalyzer so all these crazy corner cases are 
found there, rather than while debugging other tests :)

 MockTokenizer throws away the character right after a token even if it is a 
 valid start to a new token
 --

 Key: LUCENE-5278
 URL: https://issues.apache.org/jira/browse/LUCENE-5278
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Nik Everett
Assignee: Robert Muir
Priority: Trivial
 Attachments: LUCENE-5278.patch, LUCENE-5278.patch, LUCENE-5278.patch


 MockTokenizer throws away the character right after a token even if it is a 
 valid start to a new token.  You won't see this unless you build a tokenizer 
 that can recognize every character, like with new RegExp(".") or RegExp("...").
 Changing this behaviour seems to break a number of tests.






[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight child fields with parent fields

2013-10-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793204#comment-13793204
 ] 

Robert Muir commented on LUCENE-5274:
-

Yeah I guess for me, it's not a caveat at all, but a feature :)

We need to iterate a sorted union for stuff in the index like terms and fields, 
so they appear as if they exist only once.
The Guava one isn't doing a union operation but simply maintaining 
compareTo() order...
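A minimal sketch of that distinction (illustrative only, not Lucene's term/field iteration): a sorted union emits each value once, where a plain sorted merge would preserve duplicates across inputs.

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch: merge two sorted lists into a sorted *union*, collapsing
// values that appear in both inputs so each value "exists only once".
public class SortedUnion {
    static List<String> union(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() || j < b.size()) {
            String next;
            if (j >= b.size() || (i < a.size() && a.get(i).compareTo(b.get(j)) <= 0)) {
                next = a.get(i++);
            } else {
                next = b.get(j++);
            }
            if (out.isEmpty() || !out.get(out.size() - 1).equals(next)) {
                out.add(next);   // skip duplicates; a plain merge would keep them
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // a plain sorted merge would yield [bar, baz, baz, foo]
        System.out.println(union(List.of("bar", "baz"), List.of("baz", "foo")));
    }
}
```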


 Teach fast FastVectorHighlighter to highlight child fields with parent 
 fields
 ---

 Key: LUCENE-5274
 URL: https://issues.apache.org/jira/browse/LUCENE-5274
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/other
Reporter: Nik Everett
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-5274.patch


 I've been messing around with the FastVectorHighlighter and it looks like I 
 can teach it to highlight matches on child fields.  Like this query:
 foo:scissors foo_exact:running
 would highlight foo like this:
 <em>running</em> with <em>scissors</em>
 Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy 
 of foo with a different analyzer and its own WITH_POSITIONS_OFFSETS.
 This would make queries that perform weighted matches against different 
 analyzers much more convenient to highlight.
 I have working code and test cases but they are hacked into Elasticsearch.  
 I'd love to Lucene-ify if you'll take them.






[jira] [Commented] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793205#comment-13793205
 ] 

ASF subversion and git services commented on LUCENE-5278:
-

Commit 1531479 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1531479 ]

LUCENE-5278: remove CharTokenizer brain-damage from MockTokenizer so it works 
better with custom regular expressions

 MockTokenizer throws away the character right after a token even if it is a 
 valid start to a new token
 --

 Key: LUCENE-5278
 URL: https://issues.apache.org/jira/browse/LUCENE-5278
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Nik Everett
Assignee: Robert Muir
Priority: Trivial
 Attachments: LUCENE-5278.patch, LUCENE-5278.patch, LUCENE-5278.patch


 MockTokenizer throws away the character right after a token even if it is a 
 valid start to a new token.  You won't see this unless you build a tokenizer 
 that can recognize every character, like with new RegExp(".") or RegExp("...").
 Changing this behaviour seems to break a number of tests.






[jira] [Commented] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token

2013-10-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793206#comment-13793206
 ] 

Robert Muir commented on LUCENE-5278:
-

I committed this to trunk: I did a lot of testing locally but I want to let 
Jenkins have its way with it for a few hours before backporting to branch_4x.

 MockTokenizer throws away the character right after a token even if it is a 
 valid start to a new token
 --

 Key: LUCENE-5278
 URL: https://issues.apache.org/jira/browse/LUCENE-5278
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Nik Everett
Assignee: Robert Muir
Priority: Trivial
 Attachments: LUCENE-5278.patch, LUCENE-5278.patch, LUCENE-5278.patch


 MockTokenizer throws away the character right after a token even if it is a 
 valid start to a new token.  You won't see this unless you build a tokenizer 
 that can recognize every character, like with new RegExp(".") or RegExp("...").
 Changing this behaviour seems to break a number of tests.






help in getting sort to work on an indexed binary field

2013-10-11 Thread Jessica Cheng
Hi,

We added a custom field type to allow an indexed binary field type that
supports search (exact match), prefix search, and sort as unsigned bytes
lexicographical compare. For sort, BytesRef's UTF8SortedAsUnicodeComparator
accomplishes what we want, and even though the name of the comparator
mentions UTF8, it doesn't actually assume so and just does byte-level
operation, so it's good. However, when we do this across different nodes,
we run into an issue where in QueryComponent.doFieldSortValues:

  // Must do the same conversion when sorting by a
  // String field in Lucene, which returns the terms
  // data as BytesRef:
  if (val instanceof BytesRef) {
UnicodeUtil.UTF8toUTF16((BytesRef)val, spare);
field.setStringValue(spare.toString());
val = ft.toObject(field);
  }

UnicodeUtil.UTF8toUTF16 is called on our byte array, which isn't actually
UTF8. I did a hack where I specified our own field comparator to be
ByteBuffer based to get around that instanceof check, but then the field
value gets transformed into BYTEARR in JavaBinCodec, and when it's
unmarshalled, it gets turned into byte[]. Then, in QueryComponent.mergeIds,
a ShardFieldSortedHitQueue is constructed with
ShardDoc.getCachedComparator, which decides to give me comparatorNatural in
the else of the TODO for CUSTOM, which barfs because byte[] are not
Comparable...

Any advice is appreciated!

Thanks,
Jessica
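The root of the problem described above can be reproduced in isolation: arbitrary binary data is generally not valid UTF-8, so decoding it to a String (which is effectively what UTF8toUTF16 does) and re-encoding does not round-trip. A small plain-Java sketch, with no Solr involved and the class name invented for the example:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch of the failure mode: invalid UTF-8 sequences collapse to the
// replacement character U+FFFD on decode, so the original bytes are lost.
public class Utf8RoundTrip {
    static boolean roundTrips(byte[] raw) {
        String asText = new String(raw, StandardCharsets.UTF_8);
        return Arrays.equals(raw, asText.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(roundTrips(new byte[] {0x41, 0x42}));         // true: "AB" is valid UTF-8
        System.out.println(roundTrips(new byte[] {(byte) 0xFF, 0x00})); // false: 0xFF is never valid UTF-8
    }
}
```

This is why the instanceof-BytesRef branch is only safe for fields whose bytes really are UTF-8 encoded text.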


[jira] [Commented] (SOLR-5330) PerSegmentSingleValuedFaceting overwrites facet values

2013-10-11 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793213#comment-13793213
 ] 

Yonik Seeley commented on SOLR-5330:


So I instrumented the faceting code like so:
{code}
seg.tempBR = seg.tenum.next();
if (seg.tempBR.bytes == val.bytes) {
  System.err.println("##SHARING DETECTED: val.offset=" + val.offset +
      " val.length=" + val.length + " new.offset=" + seg.tempBR.offset +
      " new.length=" + seg.tempBR.length);
  if (val.offset == seg.tempBR.offset) {
    System.err.println("!!SHARING USING SAME OFFSET");
  }
}
{code}

And it detects tons of sharing (the returned bytesref still pointing to the 
same byte[]) of course... but the thing is, it never generates an invalid 
result.  Calling next() on the term enum never changes the bytes that were 
previously pointed to... it simply points to a different part of the same byte 
array.  I can never detect a case where the original bytes are changed, thus 
invalidating the shallow copy.


 PerSegmentSingleValuedFaceting overwrites facet values
 --

 Key: SOLR-5330
 URL: https://issues.apache.org/jira/browse/SOLR-5330
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.2.1
Reporter: Michael Froh
Assignee: Yonik Seeley
 Attachments: solr-5330.patch


 I recently tried enabling facet.method=fcs for one of my indexes and found a 
 significant performance improvement (with a large index, many facet values, 
 and near-realtime updates). Unfortunately, the results were also wrong. 
 Specifically, some facet values were being partially overwritten by other 
 facet values. (That is, if I expected facet values like "abcdef" and "123", I 
 would get a value like "123def".)
 Debugging through the code, it looks like the problem was in 
 PerSegmentSingleValuedFaceting, specifically in the getFacetCounts method, 
 when BytesRef val is shallow-copied from the temporary per-segment BytesRef. 
 The byte array assigned to val is shared with the byte array for seg.tempBR, 
 and is overwritten a few lines down by the call to seg.tenum.next().
 I managed to fix it locally by replacing the shallow copy with a deep copy.
 While I encountered this problem on Solr 4.2.1, I see that the code is 
 identical in 4.5. Unless the behavior of TermsEnum.next() has changed, I 
 believe this bug still exists.
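The hazard in the description can be reproduced with a minimal BytesRef-like slice. This is an illustrative stand-in, not Lucene's BytesRef: a shallow copy keeps a pointer into a shared byte[], so when the producer later rewrites those bytes the copy silently changes too, while a deep copy snapshots the slice.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Minimal stand-in for the shallow-copy bug: SliceCopy is a (bytes, offset,
// length) view, like BytesRef. Reusing the backing buffer corrupts shallow
// copies; a deep copy owns its bytes and is immune.
public class SliceCopy {
    final byte[] bytes; final int offset, length;
    SliceCopy(byte[] bytes, int offset, int length) {
        this.bytes = bytes; this.offset = offset; this.length = length;
    }
    String utf8() { return new String(bytes, offset, length, StandardCharsets.UTF_8); }
    static SliceCopy deepCopy(SliceCopy s) {
        return new SliceCopy(Arrays.copyOfRange(s.bytes, s.offset, s.offset + s.length), 0, s.length);
    }

    public static void main(String[] args) {
        byte[] shared = "abcdef".getBytes(StandardCharsets.UTF_8);
        SliceCopy shallow = new SliceCopy(shared, 0, 6);     // points into the shared buffer
        SliceCopy deep = deepCopy(shallow);                  // owns its bytes
        System.arraycopy("123".getBytes(StandardCharsets.UTF_8), 0, shared, 0, 3);
        System.out.println(shallow.utf8());                  // "123def" -- the corruption from the report
        System.out.println(deep.utf8());                     // "abcdef"
    }
}
```

This matches the report's symptom exactly: expecting "abcdef" and "123" but reading back "123def".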






[jira] [Comment Edited] (SOLR-5330) PerSegmentSingleValuedFaceting overwrites facet values

2013-10-11 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793213#comment-13793213
 ] 

Yonik Seeley edited comment on SOLR-5330 at 10/12/13 2:30 AM:
--

So I instrumented the faceting code like so:
{code}
seg.tempBR = seg.tenum.next();
if (seg.tempBR.bytes == val.bytes) {
  System.err.println("##SHARING DETECTED: val.offset=" + val.offset +
      " val.length=" + val.length + " new.offset=" + seg.tempBR.offset +
      " new.length=" + seg.tempBR.length);
  if (val.offset == seg.tempBR.offset) {
    System.err.println("!!SHARING USING SAME OFFSET");
  }
}
{code}

And it detects tons of sharing (the returned bytesref still pointing to the 
same byte[]) of course... but the thing is, it never generates an invalid 
result.  Calling next() on the term enum never changes the bytes that were 
previously pointed to... it simply points to a different part of the same byte 
array.  I can never detect a case where the original bytes are changed, thus 
invalidating the shallow copy.

Example output:
{code}
##SHARING DETECTED: val.offset=1 val.length=4 new.offset=6 new.length=4
{code}


was (Author: ysee...@gmail.com):
So I instrumented the faceting code like so:
{code}
seg.tempBR = seg.tenum.next();
if (seg.tempBR.bytes == val.bytes) {
  System.err.println("##SHARING DETECTED: val.offset=" + val.offset +
      " val.length=" + val.length + " new.offset=" + seg.tempBR.offset +
      " new.length=" + seg.tempBR.length);
  if (val.offset == seg.tempBR.offset) {
    System.err.println("!!SHARING USING SAME OFFSET");
  }
}
{code}

And it detects tons of sharing (the returned bytesref still pointing to the 
same byte[]) of course... but the thing is, it never generates an invalid 
result.  Calling next() on the term enum never changes the bytes that were 
previously pointed to... it simply points to a different part of the same byte 
array.  I can never detect a case where the original bytes are changed, thus 
invalidating the shallow copy.


 PerSegmentSingleValuedFaceting overwrites facet values
 --

 Key: SOLR-5330
 URL: https://issues.apache.org/jira/browse/SOLR-5330
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.2.1
Reporter: Michael Froh
Assignee: Yonik Seeley
 Attachments: solr-5330.patch


 I recently tried enabling facet.method=fcs for one of my indexes and found a 
 significant performance improvement (with a large index, many facet values, 
 and near-realtime updates). Unfortunately, the results were also wrong. 
 Specifically, some facet values were being partially overwritten by other 
 facet values. (That is, if I expected facet values like "abcdef" and "123", I 
 would get a value like "123def".)
 Debugging through the code, it looks like the problem was in 
 PerSegmentSingleValuedFaceting, specifically in the getFacetCounts method, 
 when BytesRef val is shallow-copied from the temporary per-segment BytesRef. 
 The byte array assigned to val is shared with the byte array for seg.tempBR, 
 and is overwritten a few lines down by the call to seg.tenum.next().
 I managed to fix it locally by replacing the shallow copy with a deep copy.
 While I encountered this problem on Solr 4.2.1, I see that the code is 
 identical in 4.5. Unless the behavior of TermsEnum.next() has changed, I 
 believe this bug still exists.






[jira] [Commented] (LUCENE-5277) Modify FixedBitSet copy constructor to take numBits to allow grow/shrink the new bitset

2013-10-11 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793217#comment-13793217
 ] 

Shai Erera commented on LUCENE-5277:


I thought of that ... it started in LUCENE-5248 where I want to keep a growable 
bitset alongside the docs/values arrays to mark whether a document has an 
updated value or not (following Rob's idea). When I implemented that using 
OpenBitSet, I discovered the bug and opened LUCENE-5272. As I worked on fixing 
the bug, I realized OBS has other issues as well and thought that perhaps I can 
use FixedBitSet, only grow it by copying its array. This is doable even without 
the ctor, since I can call getBits() and do it like that:

{code}
FixedBitSet newBits = new FixedBitSet(17); // new capacity
System.arraycopy(oldBits.getBits(), 0, newBits.getBits(), 0, 
oldBits.getBits().length);
{code}

I then noticed there is a ctor already in FixedBitSet which copies another FBS, 
so I thought just to improve it. It seems more intuitive to do that than to let 
users figure out they can grow a FixedBitSet like the above?
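For readers outside the thread, the grow-by-copy idea can be sketched with a bare long[]. This assumes FixedBitSet's layout of 64 bits per long word but does not use the actual Lucene class; the names here are invented for the example:

```java
import java.util.Arrays;

// Plain-long[] sketch of growing a fixed bitset: allocate the larger word
// array and copy the old words, so every previously set bit keeps its index.
public class GrowBits {
    static long[] grow(long[] oldWords, int newNumBits) {
        long[] newWords = new long[(newNumBits + 63) >> 6];   // ceil(numBits / 64)
        System.arraycopy(oldWords, 0, newWords, 0, Math.min(oldWords.length, newWords.length));
        return newWords;
    }
    static boolean get(long[] words, int bit) {
        return (words[bit >> 6] & (1L << (bit & 63))) != 0;
    }

    public static void main(String[] args) {
        long[] bits = new long[1];          // capacity: 64 bits
        bits[0] |= 1L << 13;                // set bit 13
        long[] grown = grow(bits, 1000);    // capacity: 1000 bits
        System.out.println(get(grown, 13)); // true: the set bit survives the grow
    }
}
```

Shrinking works the same way via the Math.min on the copy length, which is what the proposed numBits parameter on the copy constructor would encapsulate.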

 Modify FixedBitSet copy constructor to take numBits to allow grow/shrink the 
 new bitset
 ---

 Key: LUCENE-5277
 URL: https://issues.apache.org/jira/browse/LUCENE-5277
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Attachments: LUCENE-5277.patch


 FixedBitSet copy constructor is redundant the way it is now -- one can call 
 FBS.clone() to achieve that (and indeed, no code in Lucene calls this ctor). 
 I think it will be useful to add a numBits parameter to that method to allow 
 growing/shrinking the new bitset, while copying all relevant bits from the 
 passed one.






[jira] [Commented] (LUCENE-5248) Improve the data structure used in ReaderAndLiveDocs to hold the updates

2013-10-11 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793221#comment-13793221
 ] 

Shai Erera commented on LUCENE-5248:


bq. Do we have test coverage of updating with null (deleting the update from 
the document)?

We have TestNDVUpdates.testUnsetValue and testUnsetAllValues, though we don't 
have a test which unsets a value while a document is merging. We have tests 
that cover updating a value (no unsetting) while it is merging, I guess I can 
modify them to unset as well, but will then need to improve the test to use 
docsWithField. I'll look into it.

bq. So if there are two terms in a row with the same field (which does not 
exist) won't we hit NPE?

Good catch! You're right, I had another {{if (termsEnum == null) continue}} but 
I removed it since I thought the above if takes care of that. I added a unit 
test which reproduces it, and the fix. Will commit on LUCENE-5189.

 Improve the data structure used in ReaderAndLiveDocs to hold the updates
 

 Key: LUCENE-5248
 URL: https://issues.apache.org/jira/browse/LUCENE-5248
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Attachments: LUCENE-5248.patch, LUCENE-5248.patch, LUCENE-5248.patch, 
 LUCENE-5248.patch


 Currently ReaderAndLiveDocs holds the updates in two structures:
 +Map&lt;String,Map&lt;Integer,Long&gt;&gt;+
 Holds a mapping from each field, to all docs that were updated and their 
 values. This structure is updated when applyDeletes is called, and needs to 
 satisfy several requirements:
 # Un-ordered writes: if a field f is updated by two terms, termA and termB, 
 in that order, and termA affects doc=100 and termB doc=2, then the updates 
 are applied in that order, meaning we cannot rely on updates coming in order.
 # Same document may be updated multiple times, either by same term (e.g. 
 several calls to IW.updateNDV) or by different terms. Last update wins.
 # Sequential read: when writing the updates to the Directory 
 (fieldsConsumer), we iterate on the docs in-order and for each one check if 
 it's updated and if not, pull its value from the current DV.
 # A single update may affect several million documents, therefore need to be 
 efficient w.r.t. memory consumption.
 +Map&lt;Integer,Map&lt;String,Long&gt;&gt;+
 Holds a mapping from a document, to all the fields that it was updated in and 
 the updated value for each field. This is used by IW.commitMergedDeletes to 
 apply the updates that came in while the segment was merging. The 
 requirements this structure needs to satisfy are:
 # Access in doc order: this is how commitMergedDeletes works.
 # One-pass: we visit a document once (currently) and so if we can, it's 
 better if we know all the fields in which it was updated. The updates are 
 applied to the merged ReaderAndLiveDocs (where they are stored in the first 
 structure mentioned above).
 Comments with proposals will follow next.
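The first structure's requirements (unordered writes, last-update-wins per (field, doc) pair, sequential read with fallback to the current DV) can be sketched as follows. This is illustrative only, not the actual ReaderAndLiveDocs code; the class and method names are invented:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a field -> (doc -> value) update map where writes may arrive in
// any term order and the last update for a (field, doc) pair wins.
public class FieldUpdates {
    final Map<String, Map<Integer, Long>> updates = new HashMap<>();

    void update(String field, int doc, Long value) {   // value may be null = unset
        updates.computeIfAbsent(field, f -> new HashMap<>()).put(doc, value);
    }

    // Sequential read: fall back to the current DV value when the doc
    // has no recorded update for this field.
    Long resolve(String field, int doc, long currentValue) {
        Map<Integer, Long> docs = updates.get(field);
        return (docs != null && docs.containsKey(doc)) ? docs.get(doc) : currentValue;
    }

    public static void main(String[] args) {
        FieldUpdates u = new FieldUpdates();
        u.update("f", 100, 1L);   // termA
        u.update("f", 2, 1L);     // termB, unordered relative to termA
        u.update("f", 100, 7L);   // same doc updated again: last wins
        System.out.println(u.resolve("f", 100, 0L)); // 7
        System.out.println(u.resolve("f", 50, 5L));  // 5: untouched doc keeps current value
    }
}
```

The memory-efficiency requirement (millions of affected docs) is what a plain HashMap does not address; that is exactly the gap the proposals in this issue aim to fill.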






[jira] [Commented] (LUCENE-5248) Improve the data structure used in ReaderAndLiveDocs to hold the updates

2013-10-11 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793224#comment-13793224
 ] 

Shai Erera commented on LUCENE-5248:


bq. I added a unit test which reproduces it, and the fix. Will commit on 
LUCENE-5189.

Sorry, it's a bug introduced in this patch, so I'll fix it here.

 Improve the data structure used in ReaderAndLiveDocs to hold the updates
 

 Key: LUCENE-5248
 URL: https://issues.apache.org/jira/browse/LUCENE-5248
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Attachments: LUCENE-5248.patch, LUCENE-5248.patch, LUCENE-5248.patch, 
 LUCENE-5248.patch


 Currently ReaderAndLiveDocs holds the updates in two structures:
 +Map&lt;String,Map&lt;Integer,Long&gt;&gt;+
 Holds a mapping from each field, to all docs that were updated and their 
 values. This structure is updated when applyDeletes is called, and needs to 
 satisfy several requirements:
 # Un-ordered writes: if a field f is updated by two terms, termA and termB, 
 in that order, and termA affects doc=100 and termB doc=2, then the updates 
 are applied in that order, meaning we cannot rely on updates coming in order.
 # Same document may be updated multiple times, either by same term (e.g. 
 several calls to IW.updateNDV) or by different terms. Last update wins.
 # Sequential read: when writing the updates to the Directory 
 (fieldsConsumer), we iterate on the docs in-order and for each one check if 
 it's updated and if not, pull its value from the current DV.
 # A single update may affect several million documents, therefore need to be 
 efficient w.r.t. memory consumption.
 +Map&lt;Integer,Map&lt;String,Long&gt;&gt;+
 Holds a mapping from a document, to all the fields that it was updated in and 
 the updated value for each field. This is used by IW.commitMergedDeletes to 
 apply the updates that came in while the segment was merging. The 
 requirements this structure needs to satisfy are:
 # Access in doc order: this is how commitMergedDeletes works.
 # One-pass: we visit a document once (currently) and so if we can, it's 
 better if we know all the fields in which it was updated. The updates are 
 applied to the merged ReaderAndLiveDocs (where they are stored in the first 
 structure mentioned above).
 Comments with proposals will follow next.





