Re: PyLucene 3.6 build on MacOS and PyLucene website

2012-08-23 Thread Thomas Koch

  If anyone wants to help, here is link with a video on how to make
  edits and submit them
  (http://www.apache.org/dev/cmsref.html#non-committer)
 
 Contributions would be much appreciated.
 
 Andi..
 
I want to have a look at the process and see how I can help, but can't
promise anything right now. Won't have time until next week to give it a
try, but will provide feedback then anyway.

Regards
Thomas




Re: PyLucene 3.6 build on MacOS and PyLucene website

2012-08-23 Thread Andi Vajda

On Aug 23, 2012, at 8:07, Robert Muir rcm...@gmail.com wrote:

 On Thu, Aug 23, 2012 at 3:05 AM, Thomas Koch k...@orbiteam.de wrote:
 
 
 I want to have a look at the process and see how I can help, but can't
 promise anything right now. Won't have time until next week to give it a
 try, but will provide feedback then anyway.
 
 
 OK, thanks. I think I addressed most of your concerns yesterday. I
 tried to fix the formatting so the various code samples, instructions,
 etc. look correct.
 When reviewing, it would be good to look for any crazy formatting, or
 cases where __init__() shows up as init() because _'s are treated as
 markdown characters (I tried to find all of these and escape them).
 Also check whether the formatting is appropriate (e.g. italics versus
 bold or whatever), and that there are no huge headings or anything like that.
 I'm sure there are some little problems, but IMO it's looking a lot more 
 readable.

Also, the jcc part of the site doesn't seem to be inheriting any of the CSS 
from the rest of the site.

Andi..

 
 -- 
 lucidworks.com
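The `__init__()` problem described above comes from Markdown treating `_` as an emphasis marker. A minimal, hypothetical sketch of the kind of escaping pass that could catch these automatically (this is illustrative only, not part of the actual site tooling):

```python
def escape_markdown_underscores(text: str) -> str:
    """Backslash-escape underscores so identifiers such as __init__()
    survive Markdown rendering instead of turning into a bolded init()."""
    return text.replace("_", r"\_")
```

For example, `escape_markdown_underscores("__init__()")` yields `\_\_init\_\_()`, which Markdown renders literally.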


Re: PyLucene 3.6 build on MacOS and PyLucene website

2012-08-23 Thread Andi Vajda


On Thu, 23 Aug 2012, Robert Muir wrote:


On Thu, Aug 23, 2012 at 11:45 AM, Andi Vajda va...@apache.org wrote:



Also, the jcc part of the site doesn't seem to be inheriting any of the CSS 
from the rest of the site.



Do you have an example URL where this is a problem? Maybe it's a
browser refresh issue?

I see the correct CSS layout for all 3 files under jcc/
http://lucene.apache.org/pylucene/jcc/
http://lucene.apache.org/pylucene/jcc/features.html
http://lucene.apache.org/pylucene/jcc/install.html


This is simply amazing!
Last time I looked at the site, when integrating patches from Petrus a few 
weeks ago, it still looked really bad.


This is looking so much better already. Whoever did this (Robert?), thank 
you so much. And, yes, it does look like the jcc part is looking just as 
good as the rest.


A big thank you, this is quite a boost!

I can now retract my 'crying shame' comment about the site. A bunch of work is 
still needed, but there is enough critical mass there to get the formatting 
correct by proceeding by example. Before, it was an 'oh, where do I even 
start?' kind of mess.


Andi..


Re: PyLucene 3.6 build on MacOS and PyLucene website

2012-08-23 Thread Andi Vajda


On Thu, 23 Aug 2012, Robert Muir wrote:


On Thu, Aug 23, 2012 at 12:51 PM, Andi Vajda va...@apache.org wrote:


I can now retract my 'crying shame' comment about the site. A bunch of work is
still needed, but there is enough critical mass there to get the formatting
correct by proceeding by example. Before, it was an 'oh, where do I even
start?' kind of mess.



A few things I noticed (in general I didn't want to change
content or anything like that, but just fix formatting and links so
things are readable and work):

* does pylucene have a logo?

It currently uses the Lucene logo. On the other hand, if we want to
keep the current logo we could at least apply the shadowed Apache
feather to make it nicer: compare the Lucene logo at
http://lucene.apache.org/ with http://lucene.apache.org/pylucene


No, it doesn't have a logo. I think using Lucene's logo is fine.
If people disagree and want to contribute a logo, by all means!


* does pylucene want the slides functionality that is on
http://lucene.apache.org/core/ and http://lucene.apache.org/solr ? If
so, we just need a list of some 'text snippets/slogans' to rotate
through; we could probably reuse the Lucene core images.


I don't think it's a must, but if people want to contribute text snippets and 
set that up, it sure looks nice.



* should the pylucene download button appear under the jcc/ pages? I
wasn't sure if it should, but that's easy to fix.


Yes, but it should then point at the http://pypi.python.org/pypi/JCC/2.13 
URL, which might be a bit touchy with regard to Apache's rules.



* the pylucene and jcc 'Features' pages are really 'Documentation'; this
is a little confusing. I think the link should be renamed
'Documentation', and maybe there should be a separate features list like
http://lucene.apache.org/core/features.html ?


Agreed. Their current name came over from the 'forrest' content.


* the application of italics/bold is somewhat arbitrary: I did this
because previously these had <code>...</code>, which causes a big blue
blockquoted section with line breaks, useful for code examples
but not for references to functions or class names. If there is some
defined scheme (Java class names monospace, functions italics, command
lines and arguments bold, or whatever), it could probably be made more
consistent.


Looks great to me. I originally wanted all code to use a fixed-width font 
and respect the indentation of the source file, thus the code blocks. This 
worked fine with Forrest, but markdown thinks differently. As long as whatever 
replaces the fixed-width font for code is used consistently and respects 
indentation so that code samples are legible, I'm happy with the change.



* maybe in addition to the download 'button' there should also be a
simple page with the information about how to retrieve older versions
from the apache archives etc. (similar to
http://lucene.apache.org/core/downloads.html)


+1 !

Andi..


[jira] [Commented] (LUCENE-4322) Can we make oal.util.packed.BulkOperation* smaller?

2012-08-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440246#comment-13440246
 ] 

Michael McCandless commented on LUCENE-4322:


Actually the size increase wasn't as bad as I thought ... Lucene core JAR is 
2.3 MB in 4.0 Beta and now it's 2.7 MB.

So I agree the immediate problem (can't compile in some envs) is fixed ... so 
making these smaller isn't really important.

Still, if we have pointless code (int[] values with bpv > 32) we should remove 
it.

And Dawid's idea sounds compelling if it could make things faster!

 Can we make oal.util.packed.BulkOperation* smaller?
 ---

 Key: LUCENE-4322
 URL: https://issues.apache.org/jira/browse/LUCENE-4322
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless
 Fix For: 5.0, 4.0


 These source files add up to a lot of source code ... it caused problems when 
 compiling under Maven and IntelliJ.
 I committed a change to make separate files, but in aggregate this is still 
 a lot ...
 E.g., maybe we don't need to specialize encode?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-Tests-4.x-java7 - Build # 334 - Failure

2012-08-23 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Tests-4.x-java7/334/

All tests passed

Build Log:
[...truncated 7963 lines...]
[junit4:junit4] ERROR: JVM J0 ended with an exception, command line: 
/usr/local/openjdk7/jre/bin/java -XX:+UseG1GC -Dtests.prefix=tests 
-Dtests.seed=AAA2B5B80FBC4CA0 -Xmx512M -Dtests.iters= -Dtests.verbose=false 
-Dtests.infostream=false 
-Dtests.lockdir=/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-4.x-java7/lucene/build
 -Dtests.codec=random -Dtests.postingsformat=random -Dtests.locale=random 
-Dtests.timezone=random -Dtests.directory=random 
-Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=4.0 
-Dtests.cleanthreads=perClass 
-Djava.util.logging.config.file=/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-4.x-java7/solr/testlogging.properties
 -Dtests.nightly=false -Dtests.weekly=false -Dtests.slow=true 
-Dtests.asserts.gracious=false -Dtests.multiplier=3 -DtempDir=. 
-Dlucene.version=4.0-SNAPSHOT -Djetty.testMode=1 -Djetty.insecurerandom=1 
-Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory 
-Dfile.encoding=UTF-8 -classpath 

[jira] [Updated] (LUCENE-4323) Add max cfs segment size to LogMergePolicy and TieredMergePolicy

2012-08-23 Thread Alexey Lef (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Lef updated LUCENE-4323:
---

Attachment: (was: LUCENE-4323.patch)

 Add max cfs segment size to LogMergePolicy and TieredMergePolicy
 

 Key: LUCENE-4323
 URL: https://issues.apache.org/jira/browse/LUCENE-4323
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0-BETA
Reporter: Alexey Lef
Priority: Minor
 Attachments: LUCENE-4323.patch


 Our application is managing thousands of indexes ranging from a few KB to a 
 few GB in size. To keep the number of files under control and at the same 
 time avoid the overhead of the compound file format for large segments, we would 
 like to keep only small segments as CFS. The meaning of "small" here is in 
 absolute byte-size terms, not as a percentage of the overall index. It is OK, 
 and in fact desirable, to have the entire index as CFS as long as it is below 
 the threshold.
 The attached patch adds a new configuration option, maxCFSSegmentSize, which 
 sets an absolute limit on the compound file segment size, in addition to the 
 existing noCFSRatio; i.e. the lesser of the two will be used. The default is 
 to allow any size (Long.MAX_VALUE) so that the default behavior is exactly as 
 it was before.
 The patch is for the trunk as of Aug 23, 2012.
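The "lesser of the two limits" rule described in the issue can be sketched as follows. This is a hypothetical Python illustration of the decision logic only, not Lucene's actual code; the parameter names simply follow the issue text:

```python
LONG_MAX = 2**63 - 1  # Java's Long.MAX_VALUE: the default, i.e. no absolute cap

def use_compound_file(segment_bytes, total_index_bytes,
                      no_cfs_ratio=0.1, max_cfs_segment_size=LONG_MAX):
    """Write a segment as a compound file (CFS) only if it is below BOTH
    the relative limit (noCFSRatio * total index size) and the proposed
    absolute limit (maxCFSSegmentSize): the lesser of the two applies."""
    limit = min(no_cfs_ratio * total_index_bytes, max_cfs_segment_size)
    return segment_bytes <= limit
```

With `max_cfs_segment_size` left at its default, the check reduces to the pre-existing noCFSRatio behavior, which is why the patch is backward compatible.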




Re: [JENKINS] Lucene-Solr-Tests-4.x-java7 - Build # 334 - Failure

2012-08-23 Thread Dawid Weiss
This has to be a JVM bug; as far as I can tell it always occurs with G1GC
but not much else.

Dawid

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4322) Can we make oal.util.packed.BulkOperation* smaller?

2012-08-23 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440282#comment-13440282
 ] 

Dawid Weiss commented on LUCENE-4322:
-

It shouldn't make anything slower, really. There are several reasons -- loop 
unrolling at JIT time is one, but then there are also JIT codegen limits (methods 
that are too big won't ever be JIT-compiled), CPU cache considerations (JITted 
code will be larger than a loop), etc.




[jira] [Commented] (SOLR-3288) audit tutorial before 4.0 release

2012-08-23 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440288#comment-13440288
 ] 

Erik Hatcher commented on SOLR-3288:


Fix README example reference to configuration.  It's now under collection1/

 audit tutorial before 4.0 release
 -

 Key: SOLR-3288
 URL: https://issues.apache.org/jira/browse/SOLR-3288
 Project: Solr
  Issue Type: Task
Reporter: Hoss Man
Assignee: Hoss Man
Priority: Blocker
 Fix For: 4.0


 Prior to the 4.0 release, audit the tutorial and verify...
 * command line output looks reasonable
 * analysis examples/discussion matches field types used
 * links to admin UI are correct for new UI.




[jira] [Commented] (LUCENE-4322) Can we make oal.util.packed.BulkOperation* smaller?

2012-08-23 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440290#comment-13440290
 ] 

Robert Muir commented on LUCENE-4322:
-

{quote}
Actually the size increase wasn't as bad as I thought ... Lucene core JAR is 
2.3 MB in 4.0 Beta and now it's 2.7 MB.
{quote}

This is surprising; it used to be like 1 MB. It's scary to me that it's 2.7 MB even 
though we pulled out large generated code like the queryparser and 
StandardTokenizer. I think we need to investigate what happened here.




[jira] [Commented] (LUCENE-4322) Can we make oal.util.packed.BulkOperation* smaller?

2012-08-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440293#comment-13440293
 ] 

Michael McCandless commented on LUCENE-4322:


bq.  I think we need to investigate what happened here.

+1

In 3.6.1 it's 1.5M.  For the longest time we were under 1M!  That was 
impressive :)




[jira] [Commented] (LUCENE-4322) Can we make oal.util.packed.BulkOperation* smaller?

2012-08-23 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440297#comment-13440297
 ] 

Robert Muir commented on LUCENE-4322:
-

{quote}
In 3.6.1 it's 1.5M.
{quote}

A lot of that is because it supports several grammars 
(StandardTokenizer/UAX29URLEmailTokenizer/ClassicTokenizer) and still has 
the queryparser.

All of this was removed from core; 4.0 should be 1MB.

If we have megabytes of generated specialized code, we should remove all of 
it and replace it with a simple loop. Then each optimization should be 
re-introduced one by one based on its space/time tradeoff.

We certainly don't need optimizations for bits per value > anything like 4 or 5, 
I think. 32 is outlandish; just use an int[].
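The "simple loop" baseline suggested above can be sketched as a generic, unspecialized packed-ints codec: value i occupies bits [i*bpv, (i+1)*bpv) across an array of 64-bit blocks, and one loop handles every bits-per-value instead of per-bpv specialized classes. This is an illustrative Python sketch of the technique, not the actual oal.util.packed Java code:

```python
def encode(values, bpv):
    """Pack `values` (each < 2**bpv) into a list of 64-bit blocks."""
    blocks = [0] * ((len(values) * bpv + 63) // 64)
    for i, v in enumerate(values):
        bit = i * bpv
        word, shift = bit // 64, bit % 64
        blocks[word] |= (v << shift) & ((1 << 64) - 1)
        if shift + bpv > 64:  # value straddles two 64-bit blocks
            blocks[word + 1] |= v >> (64 - shift)
    return blocks

def decode(blocks, bpv, count):
    """Unpack `count` bpv-bit values from the 64-bit blocks."""
    mask = (1 << bpv) - 1
    out = []
    for i in range(count):
        bit = i * bpv
        word, shift = bit // 64, bit % 64
        value = blocks[word] >> shift
        if shift + bpv > 64:  # pull in the high bits from the next block
            value |= blocks[word + 1] << (64 - shift)
        out.append(value & mask)
    return out
```

The specialized BulkOperation classes under discussion exist precisely to avoid the per-value branching and shifting that this generic loop performs.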




[jira] [Commented] (LUCENE-4323) Add max cfs segment size to LogMergePolicy and TieredMergePolicy

2012-08-23 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440306#comment-13440306
 ] 

Uwe Schindler commented on LUCENE-4323:
---

Thanks. What do the other committers think about this?




[jira] [Assigned] (LUCENE-4323) Add max cfs segment size to LogMergePolicy and TieredMergePolicy

2012-08-23 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reassigned LUCENE-4323:
-

Assignee: Uwe Schindler




[jira] [Created] (LUCENE-4324) extend checkJavaDocs.py to methods,constants,fields

2012-08-23 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-4324:
---

 Summary: extend checkJavaDocs.py to methods,constants,fields
 Key: LUCENE-4324
 URL: https://issues.apache.org/jira/browse/LUCENE-4324
 Project: Lucene - Core
  Issue Type: New Feature
  Components: general/build
Reporter: Robert Muir


We have a large number of classes in the source code; it's nice that we have 
checkJavaDocs.py to ensure packages and classes have some human-level 
description.

But I think we need it for methods etc. too. (It is also part of our 
contribution/style guidelines: 
http://wiki.apache.org/lucene-java/HowToContribute#Making_Changes)

The reason is that, as with classes and packages, once we can enforce this in the 
build, people will quickly add forgotten documentation soon after their commit, 
while it's fresh in their mind.

Otherwise, it's likely to never happen.
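A rough sketch of what a method-level documentation check could look like. This is purely illustrative: the real checkJavaDocs.py works differently, and a robust check would need an actual parser (or the doclet API) rather than regexes:

```python
import re

# Match public/protected method declarations like "public int foo(".
METHOD = re.compile(r'^\s*(?:public|protected)\s+[\w<>\[\]]+\s+(\w+)\s*\(')

def undocumented_methods(java_source):
    """Return names of public/protected methods not immediately
    preceded by a javadoc comment (a line ending in '*/')."""
    missing = []
    prev_doc = False
    for line in java_source.splitlines():
        stripped = line.strip()
        m = METHOD.match(line)
        if m and not prev_doc:
            missing.append(m.group(1))
        if stripped.endswith('*/'):
            # a javadoc block just closed; it documents whatever comes next
            prev_doc = True
        elif stripped:
            prev_doc = False
    return missing
```

Wired into the build, a pass like this would fail on any newly committed method that lacks a description, which is exactly the enforcement effect described above.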




[jira] [Commented] (LUCENE-4322) Can we make oal.util.packed.BulkOperation* smaller?

2012-08-23 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440335#comment-13440335
 ] 

Adrien Grand commented on LUCENE-4322:
--

bq. All of this was removed in core. 4.0 should be 1MB.

Even when removing the whole oal.util.packed package, the JAR size is still 
2.1MB.

bq. We certainly don't need optimizations for bits per value > anything like 4 
or 5, I think. 32 is outlandish, just use an int[].

These classes are not only used to store large int arrays in memory but also to 
perform encoding/decoding of short sequences, such as in BlockPF. If we want 
BlockPF to remain fast, 5 is probably too low. Mike tested BlockPF with an 
unspecialized decoder and it showed a large performance loss: 
https://issues.apache.org/jira/browse/LUCENE-3892?focusedCommentId=13431491&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13431491




[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module

2012-08-23 Thread alexey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440345#comment-13440345
 ] 

alexey commented on LUCENE-2899:


Yes, please, it would be awesome if someone could make this last effort and 
commit this issue. Many thanks!

 Add OpenNLP Analysis capabilities as a module
 -

 Key: LUCENE-2899
 URL: https://issues.apache.org/jira/browse/LUCENE-2899
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, 
 LUCENE-2899.patch, LUCENE-2899.patch, opennlp_trunk.patch


 Now that OpenNLP is an ASF project and has a nice license, it would be nice 
 to have a submodule (under analysis) that exposed capabilities for it. Drew 
 Farris, Tom Morton and I have code that does:
 * Sentence Detection as a Tokenizer (could also be a TokenFilter, although it 
 would have to change slightly to buffer tokens)
 * NamedEntity recognition as a TokenFilter
 We are also planning a Tokenizer/TokenFilter that can put parts of speech as 
 either payloads (PartOfSpeechAttribute?) on a token or at the same position.
 I'd propose it go under:
 modules/analysis/opennlp




[jira] [Commented] (LUCENE-4322) Can we make oal.util.packed.BulkOperation* smaller?

2012-08-23 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440346#comment-13440346
 ] 

Robert Muir commented on LUCENE-4322:
-

{quote}
Even when removing the whole oal.util.packed package, the JAR size is still 
2.1MB.
{quote}

Right, I don't mean to complain about the packed package or single it out 
(though I have concerns about the massive specialization); I was pointing out 
the larger issue of bloat. There are definitely other problems too.

{quote}
These classes are not only used to store large int arrays in memory but also to 
perform encoding/decoding of short sequences, such as in BlockPF. If we want 
BlockPF to remain fast, 5 is probably too low. Mike tested BlockPF with an 
unspecialized decoder and it showed a large performance loss: 
https://issues.apache.org/jira/browse/LUCENE-3892?focusedCommentId=13431491&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13431491
{quote}

But I don't think an unspecialized decoder is necessarily a fair comparison. I 
think we could optimize the low bpv values that we would find in freqs/positions 
and then have an unspecialized fallback or whatever.

I have concerns that specializing every bpv just means that nothing even 
gets JIT-compiled, which actually makes things worse.





CHANGES.txt/MIGRATE.txt issue: Evolution of FieldSelector

2012-08-23 Thread Jack Krupansky
Lucene 3.6/3.6.1 has a FieldSelector class that no longer exists in 4.0, but 
there is no mention of that fact in either CHANGES.txt or MIGRATE.txt, nor any 
advice on how to migrate uses of that class.


I read through LUCENE-3309 and its patch, but I don't see any of the 
referenced classes in 4.0.


See:
https://issues.apache.org/jira/browse/LUCENE-3309

So, what's the story?

I'm not personally using it, but I did see a recent list message suggesting 
that someone use it with 3.6, so I was wondering how that would translate to 
4.0.


Thanks.

-- Jack Krupansky 



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: PyLucene 3.6 build on MacOS and PyLucene website

2012-08-23 Thread Robert Muir
On Thu, Aug 23, 2012 at 3:05 AM, Thomas Koch k...@orbiteam.de wrote:


 I want to have a look at the process and see how I can help, but can't
 promise anything right now. Won't have time until next week to give it a
 try, but will provide feedback then anyway.


OK, thanks. I think I addressed most of your concerns yesterday. I
tried to fix the formatting so the various code samples, instructions,
etc. look correct.
When reviewing, it would be good to look for any crazy formatting, or
cases where __init__() shows up as init() because _'s are treated as
markdown characters (I tried to find all of these and escape them).
Also check whether the formatting is appropriate (e.g. italics versus
bold or whatever), and that there are no huge headings or anything like that.
I'm sure there are some little problems, but IMO it's looking a lot more readable.

-- 
lucidworks.com


[jira] [Updated] (SOLR-3304) Add Solr support for the new Lucene spatial module

2012-08-23 Thread Andy Fowler (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Fowler updated SOLR-3304:
--

Attachment: SOLR-3304-strategy-getter-fixed.patch

Attached is one more update to David Smiley's patch which resolves the NPE I 
was getting when trying to query on a geo field, before a document had been 
added (i.e. after restarting solr with an already-created index).

Instead of assuming that the spatialStrategy had been instantiated during 
createFields, the same logic is used at query-time.

It applies cleanly to branch_4x and all tests pass for me. Thanks for your work 
on this, David!
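The lazy, query-time initialization pattern described here can be sketched like this (plain Python with invented names, not the actual Solr code):

```python
class SpatialFieldType:
    """Sketch: create the per-field strategy on first use, whether that
    first use is indexing or querying, so querying works even if no
    document was ever added (e.g. after restarting with an existing index)."""

    def __init__(self, name):
        self.name = name
        self._strategy = None

    def _make_strategy(self):
        return {"field": self.name}  # stand-in for a real SpatialStrategy

    def get_strategy(self):
        # Shared lazy getter: both paths below go through it.
        if self._strategy is None:
            self._strategy = self._make_strategy()
        return self._strategy

    def create_fields(self, value):
        return (self.get_strategy(), value)

    def get_query(self, q):
        # Previously this assumed create_fields() had already run (NPE);
        # now it uses the same lazy getter.
        return (self.get_strategy(), q)
```

The design point is simply that query-time code must not depend on index-time code having run first in the same process lifetime.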

 Add Solr support for the new Lucene spatial module
 --

 Key: SOLR-3304
 URL: https://issues.apache.org/jira/browse/SOLR-3304
 Project: Solr
  Issue Type: New Feature
Affects Versions: 4.0-ALPHA
Reporter: Bill Bell
Assignee: David Smiley
  Labels: spatial
 Attachments: SOLR-3304_Solr_fields_for_Lucene_spatial_module 
 (fieldName in Strategy) - indexableFields.patch, 
 SOLR-3304_Solr_fields_for_Lucene_spatial_module (fieldName in 
 Strategy).patch, SOLR-3304_Solr_fields_for_Lucene_spatial_module.patch, 
 SOLR-3304_Solr_fields_for_Lucene_spatial_module.patch, 
 SOLR-3304-strategy-getter-fixed.patch


 Get the Solr spatial module integrated with the lucene spatial module.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4323) Add max cfs segment size to LogMergePolicy and TieredMergePolicy

2012-08-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440370#comment-13440370
 ] 

Michael McCandless commented on LUCENE-4323:


+1

 Add max cfs segment size to LogMergePolicy and TieredMergePolicy
 

 Key: LUCENE-4323
 URL: https://issues.apache.org/jira/browse/LUCENE-4323
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0-BETA
Reporter: Alexey Lef
Assignee: Uwe Schindler
Priority: Minor
 Attachments: LUCENE-4323.patch


 Our application is managing thousands of indexes ranging from a few KB to a 
 few GB in size. To keep the number of files under control and at the same 
 time avoid the overhead of compound file format for large segments, we would 
 like to keep only small segments as CFS. The meaning of "small" here is in 
 absolute byte size terms, not as a percentage of the overall index. It is ok 
 and in fact desirable to have the entire index as CFS as long as it is below 
 the threshold.
 The attached patch adds a new configuration option maxCFSSegmentSize which 
 sets the absolute limit on the compound file segment size, in addition to the 
 existing noCFSRatio, i.e. the lesser of the two will be used. The default is 
 to allow any size (Long.MAX_VALUE) so that the default behavior is exactly as 
 it was before.
 The patch is for the trunk as of Aug 23, 2012.
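The interaction of the two limits can be modeled in a few lines. This is a simplified sketch of the behavior described above (the lesser of the two thresholds wins), with invented names, not the actual patch:

```python
LONG_MAX = 2**63 - 1  # Java Long.MAX_VALUE; the "no absolute limit" default

def use_compound_file(segment_bytes, total_index_bytes,
                      no_cfs_ratio=0.1, max_cfs_segment_bytes=LONG_MAX):
    """A segment is written as CFS only if it passes BOTH limits:
    the absolute byte cap and the ratio against the whole index."""
    if segment_bytes > max_cfs_segment_bytes:
        return False  # over the absolute cap, regardless of ratio
    return segment_bytes <= no_cfs_ratio * total_index_bytes
```

With the default `max_cfs_segment_bytes` of `LONG_MAX`, only the ratio applies, so the pre-patch behavior is unchanged.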

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4324) extend checkJavaDocs.py to methods,constants,fields

2012-08-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440371#comment-13440371
 ] 

Michael McCandless commented on LUCENE-4324:


+1

 extend checkJavaDocs.py to methods,constants,fields
 ---

 Key: LUCENE-4324
 URL: https://issues.apache.org/jira/browse/LUCENE-4324
 Project: Lucene - Core
  Issue Type: New Feature
  Components: general/build
Reporter: Robert Muir

 We have a large number of classes in the source code; it's nice that we have 
 checkJavaDocs.py to ensure packages and classes have some human-level 
 description.
 But I think we need it for methods etc. too. (It is also part of our 
 contribution/style guidelines: 
 http://wiki.apache.org/lucene-java/HowToContribute#Making_Changes)
 The reason is that, like classes and packages, once we can enforce this in the 
 build, people will quickly add forgotten documentation soon after their 
 commit, when it's fresh in their mind.
 Otherwise, it's likely to never happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4324) extend checkJavaDocs.py to methods,constants,fields

2012-08-23 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440375#comment-13440375
 ] 

Uwe Schindler commented on LUCENE-4324:
---

But inheriting docs from abstract base is allowed?

 extend checkJavaDocs.py to methods,constants,fields
 ---

 Key: LUCENE-4324
 URL: https://issues.apache.org/jira/browse/LUCENE-4324
 Project: Lucene - Core
  Issue Type: New Feature
  Components: general/build
Reporter: Robert Muir

 We have a large number of classes in the source code; it's nice that we have 
 checkJavaDocs.py to ensure packages and classes have some human-level 
 description.
 But I think we need it for methods etc. too. (It is also part of our 
 contribution/style guidelines: 
 http://wiki.apache.org/lucene-java/HowToContribute#Making_Changes)
 The reason is that, like classes and packages, once we can enforce this in the 
 build, people will quickly add forgotten documentation soon after their 
 commit, when it's fresh in their mind.
 Otherwise, it's likely to never happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4324) extend checkJavaDocs.py to methods,constants,fields

2012-08-23 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440376#comment-13440376
 ] 

Robert Muir commented on LUCENE-4324:
-

These parsers parse the actual .html files, so if it's unchanged it wouldn't be 
in that table of methods.

They basically look at the table of methods for empty descriptions.
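The "empty description in the method table" check can be sketched roughly like this. This is a toy version over simplified javadoc-style HTML, not the actual checkJavaDocs.py code:

```python
import re

# Toy "Method Summary" rows: a name cell followed by a description cell.
page = """
<tr><td><a href="#foo()">foo()</a></td><td>Does the foo thing.</td></tr>
<tr><td><a href="#bar()">bar()</a></td><td></td></tr>
"""

def methods_missing_docs(html):
    """Return the method names whose description cell is empty."""
    missing = []
    for name, desc in re.findall(r"<tr><td>(.*?)</td><td>(.*?)</td></tr>", html):
        text = re.sub(r"<[^>]+>", "", name)  # strip the anchor tag
        if not desc.strip():
            missing.append(text)
    return missing

print(methods_missing_docs(page))  # -> ['bar()']
```

Inherited-unchanged methods never appear in the generated table, so (as noted above) a check like this naturally skips them.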

 extend checkJavaDocs.py to methods,constants,fields
 ---

 Key: LUCENE-4324
 URL: https://issues.apache.org/jira/browse/LUCENE-4324
 Project: Lucene - Core
  Issue Type: New Feature
  Components: general/build
Reporter: Robert Muir

 We have a large amount of classes in the source code, its nice that we have 
 checkJavaDocs.py to ensure packages and classes have some human-level 
 description.
 But I think we need it for methods etc too. (it is also part of our 
 contribution/style guidelines: 
 http://wiki.apache.org/lucene-java/HowToContribute#Making_Changes)
 The reason is that like classes and packages, once we can enforce this in the 
 build, people will quickly add forgotten documentation soon after their 
 commit when its fresh in their mind.
 Otherwise, its likely to never happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Solr 4.0 Beta Documentation issues: Is it mandatory with 4.0 to run at least one core? /example/solr/README.txt needs updating

2012-08-23 Thread Tom Burton-West
Hello,

The CoreAdmin wiki page (http://wiki.apache.org/solr/CoreAdmin) implies
that setting up at least one core is not mandatory, and neither is
solr.xml. However, when trying to migrate from 3.6 to 4.0 beta, I got a
message in the admin console: "There are no SolrCores running — for the
current functionality we require at least one SolrCore, sorry :)"

Here are a few questions that probably need to be cleared up in the
documentation.
1) Is running at least one core required, or is the message above referring
to some admin console functionality that won't work without at least one
core? If running at least one core is required, perhaps this also needs
to go in the release notes/CHANGES.

2) The README.txt file in example/solr/README.txt needs revision. Is
example/solr the Solr Home directory? If so, what is the relationship
between Solr Home and subdirectories for different cores? Do only lib
files go in example/solr/lib, or in example/solr/collection1/lib?
Which files/directories are shared by cores, and which need to be in
separate core directories?
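For reference, the 4.0 example ships a minimal single-core solr.xml along these lines (illustrative only; check the actual file in your tree for the exact attributes):

```xml
<solr persistent="false">
  <!-- defaultCoreName lets requests omit the core name in URLs -->
  <cores adminPath="/admin/cores" defaultCoreName="collection1">
    <core name="collection1" instanceDir="collection1" />
  </cores>
</solr>
```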

I'll be happy to add these as a comment to SOLR-3288 if that is
appropriate. Please let me know.

Thanks for all your work on Solr 4!

Tom


[jira] [Commented] (LUCENE-4323) Add max cfs segment size to LogMergePolicy and TieredMergePolicy

2012-08-23 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440408#comment-13440408
 ] 

Steven Rowe commented on LUCENE-4323:
-

+1

(Too bad the useCompoundFile() and the related configuration getters & setters 
have to be implemented in both LogMergePolicy and TieredMergePolicy, since 
MergePolicy, their shared superclass, doesn't have a concrete implementation.)

 Add max cfs segment size to LogMergePolicy and TieredMergePolicy
 

 Key: LUCENE-4323
 URL: https://issues.apache.org/jira/browse/LUCENE-4323
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0-BETA
Reporter: Alexey Lef
Assignee: Uwe Schindler
Priority: Minor
 Attachments: LUCENE-4323.patch


 Our application is managing thousands of indexes ranging from a few KB to a 
 few GB in size. To keep the number of files under control and at the same 
 time avoid the overhead of compound file format for large segments, we would 
 like to keep only small segments as CFS. The meaning of "small" here is in 
 absolute byte size terms, not as a percentage of the overall index. It is ok 
 and in fact desirable to have the entire index as CFS as long as it is below 
 the threshold.
 The attached patch adds a new configuration option maxCFSSegmentSize which 
 sets the absolute limit on the compound file segment size, in addition to the 
 existing noCFSRatio, i.e. the lesser of the two will be used. The default is 
 to allow any size (Long.MAX_VALUE) so that the default behavior is exactly as 
 it was before.
 The patch is for the trunk as of Aug 23, 2012.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4323) Add max cfs segment size to LogMergePolicy and TieredMergePolicy

2012-08-23 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-4323:
--

Attachment: LUCENE-4323.patch

Updated patch:
The setter had a documentation and overflow problem. I also fixed the other 
setters in Tiered to behave correctly when Double.POSITIVE_INFINITY is passed. 
Finally I added tests for the setters.

I will commit this later!
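The overflow issue described can be sketched in plain Python (hypothetical names; the real setters are Java and convert an MB setting to an absolute byte count):

```python
LONG_MAX = 2**63 - 1  # Java Long.MAX_VALUE, the "no limit" sentinel

def mb_to_bytes_clamped(v_mb):
    """Convert a megabyte setting to bytes, clamping so that very large
    values (including Double.POSITIVE_INFINITY) map to 'no limit'
    instead of overflowing when cast to a long."""
    if v_mb < 0:
        raise ValueError("size must be non-negative")
    v = v_mb * 1024 * 1024
    return LONG_MAX if v > LONG_MAX else int(v)
```

Without the clamp, casting an infinite or oversized double straight to a long would silently wrap or saturate incorrectly, which is the class of bug the updated patch guards against.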

 Add max cfs segment size to LogMergePolicy and TieredMergePolicy
 

 Key: LUCENE-4323
 URL: https://issues.apache.org/jira/browse/LUCENE-4323
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0-BETA
Reporter: Alexey Lef
Assignee: Uwe Schindler
Priority: Minor
 Attachments: LUCENE-4323.patch, LUCENE-4323.patch


 Our application is managing thousands of indexes ranging from a few KB to a 
 few GB in size. To keep the number of files under control and at the same 
 time avoid the overhead of compound file format for large segments, we would 
 like to keep only small segments as CFS. The meaning of "small" here is in 
 absolute byte size terms, not as a percentage of the overall index. It is ok 
 and in fact desirable to have the entire index as CFS as long as it is below 
 the threshold.
 The attached patch adds a new configuration option maxCFSSegmentSize which 
 sets the absolute limit on the compound file segment size, in addition to the 
 existing noCFSRatio, i.e. the lesser of the two will be used. The default is 
 to allow any size (Long.MAX_VALUE) so that the default behavior is exactly as 
 it was before.
 The patch is for the trunk as of Aug 23, 2012.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: PyLucene 3.6 build on MacOS and PyLucene website

2012-08-23 Thread Robert Muir
On Thu, Aug 23, 2012 at 12:51 PM, Andi Vajda va...@apache.org wrote:

 I can now retract my 'crying shame' comment on the site. A bunch of work is
 still needed, but there is enough critical mass there to get the formatting
 correct, proceeding by example. Before, it was an "oh, where do I even start"
 kind of mess.


A few things I noticed (in general I didn't want to try to change
content or anything like that, but just fix formatting and links so
things are readable and work):

* does pylucene have a logo?

It currently uses the Lucene logo. On the other hand, if we want to
keep the current logo we could at least apply the shadow'd apache
feather to make it nicer: compare the lucene logo at
http://lucene.apache.org/ with http://lucene.apache.org/pylucene

* does pylucene want the slides functionality that is on
http://lucene.apache.org/core/ and http://lucene.apache.org/solr ? if
so we just need a list of some 'text snippets/slogans' to rotate
through, could probably reuse the lucene core images.

* should the pylucene download button appear under jcc/ pages? I
wasn't sure if it should, but that's easy to fix.

* pylucene and jcc "Features" pages are really "Documentation"; this
is a little confusing. I think the link should be renamed
"Documentation" and maybe have a separate features list like
http://lucene.apache.org/core/features.html ?

* the application of italics/bold is somewhat arbitrary: I did this
because previously these had <code>...</code>, which causes a big blue
blockquoted section with line breaks, useful really for code examples
but not for references to functions or class names. If there is some
defined scheme (java class names monospace, functions italics, command
lines and arguments bold, or whatever), it could probably be made more
consistent.

* maybe in addition to the download 'button' there should also be a
simple page with the information about how to retrieve older versions
from the apache archives etc. (similar to
http://lucene.apache.org/core/downloads.html)


-- 
lucidworks.com


[jira] [Commented] (LUCENE-4309) Not obvious how to find mailing lists

2012-08-23 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440478#comment-13440478
 ] 

Robert Muir commented on LUCENE-4309:
-

FYI: I pushed the current patch as a start, and moved "License" under the "About" 
section.


 Not obvious how to find mailing lists
 -

 Key: LUCENE-4309
 URL: https://issues.apache.org/jira/browse/LUCENE-4309
 Project: Lucene - Core
  Issue Type: Bug
  Components: general/website
Reporter: Sebb
 Attachments: LUCENE-4309.patch


 The website hides mailing lists under the heading "Discussion", which is not 
 at all obvious.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-765) Index package level javadocs needs content

2012-08-23 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440484#comment-13440484
 ] 

Robert Muir commented on LUCENE-765:


I added a start to this for 4.0 with some basic explanations of postings APIs 
and index statistics.

It would be great to add more. 

 Index package level javadocs needs content
 --

 Key: LUCENE-765
 URL: https://issues.apache.org/jira/browse/LUCENE-765
 Project: Lucene - Core
  Issue Type: Wish
  Components: general/javadocs
Reporter: Grant Ingersoll
Priority: Minor
  Labels: newdev

 The org.apache.lucene.index package level javadocs are sorely lacking.  They 
 should be updated to give a summary of the important classes, how indexing 
 works, etc.  Maybe give an overview of how the different writers coordinate.  
 Links to file formats, information on the posting algorithm, etc. would be 
 helpful.
 See the search package javadocs as a sample of the kind of info that could go 
 here.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-3753) Core admin and solr.xml documentation for 4.0 needs to be updated for 4.0 changes

2012-08-23 Thread Tom Burton-West (JIRA)
Tom Burton-West created SOLR-3753:
-

 Summary: Core admin and solr.xml documentation for 4.0 needs to be 
updated for 4.0 changes
 Key: SOLR-3753
 URL: https://issues.apache.org/jira/browse/SOLR-3753
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.0-BETA
Reporter: Tom Burton-West


The existing documentation on Solr Cores needs to be updated to reflect changes 
in Solr 4.0

If having at least one solr core declared is mandatory for Solr 4.0, that needs 
to be stated in the release notes, in the example solr.xml file, and on the 
wiki page for CoreAdmin. http://wiki.apache.org/solr/CoreAdmin.

In the absence of a solr.xml file, current 4.0 behavior is to use defaults 
declared in CoreContainer.java. This needs to be documented, probably in 
solr.xml and/or on the CoreAdmin page. (See line 94 of CoreAdmin.java where 
the default name "collection1" is declared.) Without this documentation, users 
can get confused about where the "collection1" core name is coming from (I'm 
one of them).

The solr.xml file states that paths are "relative to the installation 
directory". This needs to be clarified. In addition, it appears that currently 
relative paths specified using "." or ".." are interpreted as string literals. 
If that is not a bug, then this behavior needs to be documented. If it is a 
bug, please let me know and I'll open another issue.
 
The example/solr/README.txt needs to clarify which files need to be in Solr 
Home, and which files are mandatory or optional in the directories containing 
configuration files (and data files) for Solr cores.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-3755) Add examples to javadocs of Analyzer (4.0)/ReusableAnalyzerBase(3.6)

2012-08-23 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-3755.
-

   Resolution: Fixed
Fix Version/s: (was: 4.1)
   4.0
   5.0

 Add examples to javadocs of Analyzer (4.0)/ReusableAnalyzerBase(3.6)
 

 Key: LUCENE-3755
 URL: https://issues.apache.org/jira/browse/LUCENE-3755
 Project: Lucene - Core
  Issue Type: Task
  Components: general/javadocs
Reporter: Robert Muir
  Labels: newdev
 Fix For: 5.0, 4.0


 This stuff is great, it makes it easy to define analyzers:
 {code}
 Analyzer analyzer = new Analyzer() {
 @Override
 protected TokenStreamComponents createComponents(String fieldName, Reader 
 reader) {
   Tokenizer source = new FooTokenizer(reader);
   TokenStream filter = new FooFilter(source);
   filter = new BarFilter(filter);
   return new TokenStreamComponents(source, filter);
 }
   };
 {code}
 But, we should add basic examples to the javadocs I think (we can backport to 
 ReusableAnalyzerBase).
 Also it would be nice to throw in an example that adds a CharFilter.
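As a rough analogy only (plain Python, not the Lucene API; all names invented), the CharFilter / Tokenizer / TokenFilter chain that the Java snippet above builds works like this:

```python
import re

def html_strip_char_filter(text):
    # CharFilter stage: rewrite the raw character stream before tokenizing
    # (stand-in for something like HTMLStripCharFilter).
    return re.sub(r"<[^>]+>", " ", text)

def whitespace_tokenizer(text):
    # Tokenizer (the "source" of the TokenStreamComponents)
    return text.split()

def lowercase_filter(tokens):
    # TokenFilter wrapping the stream
    return [t.lower() for t in tokens]

def analyze(text):
    text = html_strip_char_filter(text)
    tokens = whitespace_tokenizer(text)
    return lowercase_filter(tokens)

print(analyze("Foo <b>Bar</b>"))  # -> ['foo', 'bar']
```

In real Lucene the CharFilter is hooked in by overriding Analyzer's reader-wrapping hook rather than called inline like this, but the data flow is the same.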

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3721) Multiple concurrent recoveries of same shard?

2012-08-23 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440531#comment-13440531
 ] 

Mark Miller commented on SOLR-3721:
---

I've been reviewing this code, and so far there is only one way I can see that 
would seem to allow multiple recoveries at once: if there is an interrupt when 
cancelRecovery is doing a join, it seems another recovery could be started and 
briefly overlap. Since we already call close on the recovery thread, the 
overlap would be brief at best. An interrupt like this should be somewhat rare 
- but interrupts do happen on jetty shutdown. I'm going to guess this is not 
what you were seeing, but I'll plug it.

 Multiple concurrent recoveries of same shard?
 -

 Key: SOLR-3721
 URL: https://issues.apache.org/jira/browse/SOLR-3721
 Project: Solr
  Issue Type: Bug
  Components: multicore, SolrCloud
Affects Versions: 4.0
 Environment: Using our own Solr release based on Apache revision 
 1355667 from the 4.x branch. Our changes to the Solr version are our solutions 
 to TLT-3178 etc., and should have no effect on this issue.
Reporter: Per Steffensen
  Labels: concurrency, multicore, recovery, solrcloud
 Fix For: 4.0

 Attachments: recovery_in_progress.png, recovery_start_finish.log


 We run a performance/endurance test on a 7 Solr instance SolrCloud setup and 
 eventually Solrs lose ZK connections and go into recovery. BTW the recovery 
 often does not ever succeed, but we are looking into that. While doing that I 
 noticed that, according to logs, multiple recoveries are in progress at the 
 same time for the same shard. That cannot be intended and I can certainly 
 imagine that it will cause some problems.
 Is it just the logs that are wrong, did I make a mistake, or is this a 
 real bug?
 See the attached grep from the logs, grepping only on "Finished recovery" and 
 "Starting recovery" lines.
 {code}
 grep -B 1 "Finished recovery\|Starting recovery" solr9.log solr8.log 
 solr7.log solr6.log solr5.log solr4.log solr3.log solr2.log solr1.log 
 solr0.log > recovery_start_finish.log
 {code}
 It can be hard to get an overview of the log, but I have generated a graph 
 showing (based alone on "Starting recovery" and "Finished recovery" logs) how 
 many recoveries are in progress at any time for the different shards. See 
 attached recovery_in_progress.png. The graph is also a little hard to get an 
 overview of (due to the many shards) but it is clear that for several shards 
 there are multiple recoveries going on at the same time, and that several 
 recoveries never succeed.
 Regards, Per Steffensen

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-3754) Dataimport fails with java.lang.NoSuchMethodError

2012-08-23 Thread Magnar Martinsen (JIRA)
Magnar Martinsen created SOLR-3754:
--

 Summary: Dataimport fails with java.lang.NoSuchMethodError
 Key: SOLR-3754
 URL: https://issues.apache.org/jira/browse/SOLR-3754
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 4.0-BETA
 Environment: Red Hat Enterprise Linux Server release 6.3 (Santiago)
jboss-as-7.1.1.Final
Java(TM) SE Runtime Environment (build 1.6.0_33-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.8-b03, mixed mode)

Reporter: Magnar Martinsen
Priority: Critical
 Fix For: 4.0-ALPHA


While running dataimport with JdbcDataSource and SQLEntityProcessor.
This worked with Solr 4.0.0-ALPHA; the bug was apparently introduced in 4.0.0-BETA. 
Here is the exception from the full-import command:

21:00:28,982 ERROR [org.apache.solr.handler.dataimport.JdbcDataSource] 
(Thread-70) Ignoring Error when closing connection: java.sql.SQLException: 
Streaming result set com.mysql.jdbc.RowDataDynamic@30163b85 is still active. No 
statements may be issued when any streaming result sets are open and in use on 
a given connection. Ensure that you have called .close() on any active 
streaming result sets before attempting more queries.
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:934)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:931)
at 
com.mysql.jdbc.MysqlIO.checkForOutstandingStreamingData(MysqlIO.java:2747)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1911)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2163)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2618)
at 
com.mysql.jdbc.ConnectionImpl.rollbackNoChecks(ConnectionImpl.java:4833)
at com.mysql.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:4719)
at com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4328)
at com.mysql.jdbc.ConnectionImpl.close(ConnectionImpl.java:1556)
at 
org.apache.solr.handler.dataimport.JdbcDataSource.closeConnection(JdbcDataSource.java:400)
at 
org.apache.solr.handler.dataimport.JdbcDataSource.close(JdbcDataSource.java:391)
at 
org.apache.solr.handler.dataimport.DocBuilder.closeEntityProcessorWrappers(DocBuilder.java:291)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:280)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:382)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:448)
at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:429)

21:00:28,991 ERROR [org.apache.solr.handler.dataimport.DataImporter] 
(Thread-70) Full Import failed:java.lang.RuntimeException: 
java.lang.RuntimeException: 
org.apache.solr.handler.dataimport.DataImportHandlerException: 
java.lang.NoSuchMethodError: 
org.apache.lucene.analysis.charfilter.HTMLStripCharFilter.<init>(Ljava/io/Reader;)V
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:273)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:382)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:448)
at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:429)
Caused by: java.lang.RuntimeException: 
org.apache.solr.handler.dataimport.DataImportHandlerException: 
java.lang.NoSuchMethodError: 
org.apache.lucene.analysis.charfilter.HTMLStripCharFilter.<init>(Ljava/io/Reader;)V
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:413)
at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:326)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:234)
... 3 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: 
java.lang.NoSuchMethodError: 
org.apache.lucene.analysis.charfilter.HTMLStripCharFilter.<init>(Ljava/io/Reader;)V
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:542)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:411)
... 5 more
Caused by: java.lang.NoSuchMethodError: 
org.apache.lucene.analysis.charfilter.HTMLStripCharFilter.<init>(Ljava/io/Reader;)V
at 
org.apache.solr.handler.dataimport.HTMLStripTransformer.stripHTML(HTMLStripTransformer.java:75)
at 
org.apache.solr.handler.dataimport.HTMLStripTransformer.transformRow(HTMLStripTransformer.java:63)


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (LUCENE-4325) review experimental classes

2012-08-23 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-4325:
---

 Summary: review experimental classes
 Key: LUCENE-4325
 URL: https://issues.apache.org/jira/browse/LUCENE-4325
 Project: Lucene - Core
  Issue Type: Task
Reporter: Robert Muir


This can frighten people away from APIs (maybe as much as @deprecated).

We should be careful what we tag as experimental: things like Collector, which 
have been around for an awful long time, imo should not be; it's ok if they are 
expert-ish. If we need to make a change we take that on a case-by-case basis.

We also don't need this tag on pkg-private things.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4325) review experimental classes

2012-08-23 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4325:


Attachment: LUCENE-4325_start.patch

Start of a patch; I went through analyzers and core.

 review experimental classes
 ---

 Key: LUCENE-4325
 URL: https://issues.apache.org/jira/browse/LUCENE-4325
 Project: Lucene - Core
  Issue Type: Task
Reporter: Robert Muir
 Attachments: LUCENE-4325_start.patch


 This can frighten people away from APIs (maybe as much as @deprecated).
 We should be careful what we tag as experimental: things like Collector, which 
 have been around for an awful long time, imo should not be; it's ok if they 
 are expert-ish. If we need to make a change we take that on a case-by-case 
 basis.
 We also don't need this tag on pkg-private things.




[jira] [Updated] (SOLR-3589) Edismax parser does not honor mm parameter if analyzer splits a token

2012-08-23 Thread Tom Burton-West (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom Burton-West updated SOLR-3589:
--

Affects Version/s: 4.0-BETA

 Edismax parser does not honor mm parameter if analyzer splits a token
 -

 Key: SOLR-3589
 URL: https://issues.apache.org/jira/browse/SOLR-3589
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.6, 4.0-BETA
Reporter: Tom Burton-West

 With edismax mm set to 100%, if one of the tokens is split into two tokens by 
 the analyzer chain (i.e. fire-fly => fire fly), the mm parameter is 
 ignored and the equivalent of an OR query for fire OR fly is produced.
 This is particularly a problem for languages that do not use white space to 
 separate words, such as Chinese or Japanese.
 See these messages for more discussion:
 http://lucene.472066.n3.nabble.com/edismax-parser-ignores-mm-parameter-when-tokenizer-splits-tokens-hypenated-words-WDF-splitting-etc-tc3991911.html
 http://lucene.472066.n3.nabble.com/edismax-parser-ignores-mm-parameter-when-tokenizer-splits-tokens-i-e-CJK-tc3991438.html
 http://lucene.472066.n3.nabble.com/Why-won-t-dismax-create-multiple-DisjunctionMaxQueries-when-autoGeneratePhraseQueries-is-false-tc3992109.html
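The expected mm semantics described above can be sketched as a toy model (this is illustrative Python, not Solr's edismax code; the analyzer and helper names are hypothetical):

```python
import math

def analyze(term):
    # Toy analyzer: lowercase and split on hyphens, so "fire-fly" -> ["fire", "fly"],
    # standing in for WordDelimiterFilter or CJK tokenization.
    return term.lower().split("-")

def satisfies_mm(doc_terms, user_clauses, mm=1.0):
    """True if the doc matches at least ceil(mm * #clauses) of the
    user-entered clauses. A clause the analyzer splits into several
    tokens still counts as ONE clause, matched if any of its tokens hits;
    the reported bug instead degrades the whole query to a plain OR."""
    doc = set(doc_terms)
    matched = sum(1 for clause in user_clauses
                  if any(tok in doc for tok in analyze(clause)))
    return matched >= math.ceil(mm * len(user_clauses))

# With mm=100%, a doc containing only "fire" and "fly" must NOT match the
# query "fire-fly glow", even though both split tokens hit.
print(satisfies_mm(["fire", "fly"], ["fire-fly", "glow"]))   # False
print(satisfies_mm(["fire", "glow"], ["fire-fly", "glow"]))  # True
```

The key design point is that mm is counted against the user's clauses, not against the post-analysis token stream.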




[jira] [Commented] (SOLR-3589) Edismax parser does not honor mm parameter if analyzer splits a token

2012-08-23 Thread Tom Burton-West (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13440583#comment-13440583
 ] 

Tom Burton-West commented on SOLR-3589:
---

Just repeated the tests in Solr 4.0-BETA and the bug behaves the same.

 Edismax parser does not honor mm parameter if analyzer splits a token
 -

 Key: SOLR-3589
 URL: https://issues.apache.org/jira/browse/SOLR-3589
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.6, 4.0-BETA
Reporter: Tom Burton-West

 With edismax mm set to 100%, if one of the tokens is split into two tokens by 
 the analyzer chain (i.e. fire-fly => fire fly), the mm parameter is 
 ignored and the equivalent of an OR query for fire OR fly is produced.
 This is particularly a problem for languages that do not use white space to 
 separate words, such as Chinese or Japanese.
 See these messages for more discussion:
 http://lucene.472066.n3.nabble.com/edismax-parser-ignores-mm-parameter-when-tokenizer-splits-tokens-hypenated-words-WDF-splitting-etc-tc3991911.html
 http://lucene.472066.n3.nabble.com/edismax-parser-ignores-mm-parameter-when-tokenizer-splits-tokens-i-e-CJK-tc3991438.html
 http://lucene.472066.n3.nabble.com/Why-won-t-dismax-create-multiple-DisjunctionMaxQueries-when-autoGeneratePhraseQueries-is-false-tc3992109.html




[jira] [Commented] (SOLR-3754) Dataimport fails with java.lang.NoSuchMethodError

2012-08-23 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13440661#comment-13440661
 ] 

Uwe Schindler commented on SOLR-3754:
-

Hi,
This error cannot happen, as compilation of Solr and Lucene ensures that the 
method signatures match. The obvious problem is that you have a mismatch of 
different Solr/Lucene WAR and JAR files in your installation. The signature of 
HtmlStripCharFilter changed between alpha and beta. Please check that all JAR 
and WAR files in your installation match the official distribution.
It looks like you are running Solr 4 alpha with DataImport handler from Solr 4 
beta (based on signature analysis from error message).

 Dataimport fails with java.lang.NoSuchMethodError
 -

 Key: SOLR-3754
 URL: https://issues.apache.org/jira/browse/SOLR-3754
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 4.0-BETA
 Environment: Red Hat Enterprise Linux Server release 6.3 (Santiago)
 jboss-as-7.1.1.Final
 Java(TM) SE Runtime Environment (build 1.6.0_33-b03)
 Java HotSpot(TM) 64-Bit Server VM (build 20.8-b03, mixed mode)
Reporter: Magnar Martinsen
Priority: Critical
 Fix For: 4.0-ALPHA


 While running dataimport with JdbcDataSource and SQLEntityProcessor.
 This worked with Solr-4.0.0-ALPHA. The bug was apparently introduced in 4.0.0-BETA. 
 Here is the exception from the full-import command:
 21:00:28,982 ERROR [org.apache.solr.handler.dataimport.JdbcDataSource] 
 (Thread-70) Ignoring Error when closing connection: java.sql.SQLException: 
 Streaming result set com.mysql.jdbc.RowDataDynamic@30163b85 is still active. 
 No statements may be issued when any streaming result sets are open and in 
 use on a given connection. Ensure that you have called .close() on any active 
 streaming result sets before attempting more queries.
 at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:934)
 at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:931)
 at 
 com.mysql.jdbc.MysqlIO.checkForOutstandingStreamingData(MysqlIO.java:2747)
 at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1911)
 at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2163)
 at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2618)
 at 
 com.mysql.jdbc.ConnectionImpl.rollbackNoChecks(ConnectionImpl.java:4833)
 at com.mysql.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:4719)
 at com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4328)
 at com.mysql.jdbc.ConnectionImpl.close(ConnectionImpl.java:1556)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource.closeConnection(JdbcDataSource.java:400)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource.close(JdbcDataSource.java:391)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.closeEntityProcessorWrappers(DocBuilder.java:291)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:280)
 at 
 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:382)
 at 
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:448)
 at 
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:429)
 21:00:28,991 ERROR [org.apache.solr.handler.dataimport.DataImporter] 
 (Thread-70) Full Import failed:java.lang.RuntimeException: 
 java.lang.RuntimeException: 
 org.apache.solr.handler.dataimport.DataImportHandlerException: 
 java.lang.NoSuchMethodError: 
 org.apache.lucene.analysis.charfilter.HTMLStripCharFilter.init(Ljava/io/Reader;)V
 at 
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:273)
 at 
 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:382)
 at 
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:448)
 at 
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:429)
 Caused by: java.lang.RuntimeException: 
 org.apache.solr.handler.dataimport.DataImportHandlerException: 
 java.lang.NoSuchMethodError: 
 org.apache.lucene.analysis.charfilter.HTMLStripCharFilter.init(Ljava/io/Reader;)V
 at 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:413)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:326)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:234)
 ... 3 more
 Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: 
 java.lang.NoSuchMethodError: 
 org.apache.lucene.analysis.charfilter.HTMLStripCharFilter.init(Ljava/io/Reader;)V
 at 
 

[jira] [Closed] (SOLR-3754) Dataimport fails with java.lang.NoSuchMethodError

2012-08-23 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler closed SOLR-3754.
---

Resolution: Invalid

 Dataimport fails with java.lang.NoSuchMethodError
 -

 Key: SOLR-3754
 URL: https://issues.apache.org/jira/browse/SOLR-3754
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 4.0-BETA
 Environment: Red Hat Enterprise Linux Server release 6.3 (Santiago)
 jboss-as-7.1.1.Final
 Java(TM) SE Runtime Environment (build 1.6.0_33-b03)
 Java HotSpot(TM) 64-Bit Server VM (build 20.8-b03, mixed mode)
Reporter: Magnar Martinsen
Priority: Critical
 Fix For: 4.0-ALPHA


 While running dataimport with JdbcDataSource and SQLEntityProcessor.
 This worked with Solr-4.0.0-ALPHA. The bug was apparently introduced in 4.0.0-BETA. 
 Here is the exception from the full-import command:
 21:00:28,982 ERROR [org.apache.solr.handler.dataimport.JdbcDataSource] 
 (Thread-70) Ignoring Error when closing connection: java.sql.SQLException: 
 Streaming result set com.mysql.jdbc.RowDataDynamic@30163b85 is still active. 
 No statements may be issued when any streaming result sets are open and in 
 use on a given connection. Ensure that you have called .close() on any active 
 streaming result sets before attempting more queries.
 at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:934)
 at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:931)
 at 
 com.mysql.jdbc.MysqlIO.checkForOutstandingStreamingData(MysqlIO.java:2747)
 at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1911)
 at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2163)
 at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2618)
 at 
 com.mysql.jdbc.ConnectionImpl.rollbackNoChecks(ConnectionImpl.java:4833)
 at com.mysql.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:4719)
 at com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4328)
 at com.mysql.jdbc.ConnectionImpl.close(ConnectionImpl.java:1556)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource.closeConnection(JdbcDataSource.java:400)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource.close(JdbcDataSource.java:391)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.closeEntityProcessorWrappers(DocBuilder.java:291)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:280)
 at 
 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:382)
 at 
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:448)
 at 
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:429)
 21:00:28,991 ERROR [org.apache.solr.handler.dataimport.DataImporter] 
 (Thread-70) Full Import failed:java.lang.RuntimeException: 
 java.lang.RuntimeException: 
 org.apache.solr.handler.dataimport.DataImportHandlerException: 
 java.lang.NoSuchMethodError: 
 org.apache.lucene.analysis.charfilter.HTMLStripCharFilter.init(Ljava/io/Reader;)V
 at 
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:273)
 at 
 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:382)
 at 
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:448)
 at 
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:429)
 Caused by: java.lang.RuntimeException: 
 org.apache.solr.handler.dataimport.DataImportHandlerException: 
 java.lang.NoSuchMethodError: 
 org.apache.lucene.analysis.charfilter.HTMLStripCharFilter.init(Ljava/io/Reader;)V
 at 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:413)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:326)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:234)
 ... 3 more
 Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: 
 java.lang.NoSuchMethodError: 
 org.apache.lucene.analysis.charfilter.HTMLStripCharFilter.init(Ljava/io/Reader;)V
 at 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:542)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:411)
 ... 5 more
 Caused by: java.lang.NoSuchMethodError: 
 org.apache.lucene.analysis.charfilter.HTMLStripCharFilter.init(Ljava/io/Reader;)V
 at 
 org.apache.solr.handler.dataimport.HTMLStripTransformer.stripHTML(HTMLStripTransformer.java:75)
 at 
 org.apache.solr.handler.dataimport.HTMLStripTransformer.transformRow(HTMLStripTransformer.java:63)


[jira] [Updated] (SOLR-3589) Edismax parser does not honor mm parameter if analyzer splits a token

2012-08-23 Thread Tom Burton-West (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom Burton-West updated SOLR-3589:
--

Attachment: testSolr3589.xml.gz

File is gzipped. Unix line endings. Put document in solr/example/exampledocs.  
Queries listed in file.

 Edismax parser does not honor mm parameter if analyzer splits a token
 -

 Key: SOLR-3589
 URL: https://issues.apache.org/jira/browse/SOLR-3589
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.6, 4.0-BETA
Reporter: Tom Burton-West
 Attachments: testSolr3589.xml.gz


 With edismax mm set to 100%, if one of the tokens is split into two tokens by 
 the analyzer chain (i.e. fire-fly => fire fly), the mm parameter is 
 ignored and the equivalent of an OR query for fire OR fly is produced.
 This is particularly a problem for languages that do not use white space to 
 separate words, such as Chinese or Japanese.
 See these messages for more discussion:
 http://lucene.472066.n3.nabble.com/edismax-parser-ignores-mm-parameter-when-tokenizer-splits-tokens-hypenated-words-WDF-splitting-etc-tc3991911.html
 http://lucene.472066.n3.nabble.com/edismax-parser-ignores-mm-parameter-when-tokenizer-splits-tokens-i-e-CJK-tc3991438.html
 http://lucene.472066.n3.nabble.com/Why-won-t-dismax-create-multiple-DisjunctionMaxQueries-when-autoGeneratePhraseQueries-is-false-tc3992109.html




[jira] [Commented] (SOLR-3589) Edismax parser does not honor mm parameter if analyzer splits a token

2012-08-23 Thread Tom Burton-West (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13440669#comment-13440669
 ] 

Tom Burton-West commented on SOLR-3589:
---

I'm not at the point where I understand the test cases for Edismax well enough to 
write unit tests. If someone can point me to an example unit test somewhere 
that I could use to model a test, please do.
In the meantime, attached is a file which can be put in the Solr exampledocs 
directory and indexed. Sample queries demonstrating the problem with English 
hyphenated words and with CJK are included.

 Edismax parser does not honor mm parameter if analyzer splits a token
 -

 Key: SOLR-3589
 URL: https://issues.apache.org/jira/browse/SOLR-3589
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.6, 4.0-BETA
Reporter: Tom Burton-West
 Attachments: testSolr3589.xml.gz


 With edismax mm set to 100%, if one of the tokens is split into two tokens by 
 the analyzer chain (i.e. fire-fly => fire fly), the mm parameter is 
 ignored and the equivalent of an OR query for fire OR fly is produced.
 This is particularly a problem for languages that do not use white space to 
 separate words, such as Chinese or Japanese.
 See these messages for more discussion:
 http://lucene.472066.n3.nabble.com/edismax-parser-ignores-mm-parameter-when-tokenizer-splits-tokens-hypenated-words-WDF-splitting-etc-tc3991911.html
 http://lucene.472066.n3.nabble.com/edismax-parser-ignores-mm-parameter-when-tokenizer-splits-tokens-i-e-CJK-tc3991438.html
 http://lucene.472066.n3.nabble.com/Why-won-t-dismax-create-multiple-DisjunctionMaxQueries-when-autoGeneratePhraseQueries-is-false-tc3992109.html




[jira] [Updated] (SOLR-3589) Edismax parser does not honor mm parameter if analyzer splits a token

2012-08-23 Thread Tom Burton-West (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom Burton-West updated SOLR-3589:
--

Attachment: testSolr3589.xml.gz

See above note

 Edismax parser does not honor mm parameter if analyzer splits a token
 -

 Key: SOLR-3589
 URL: https://issues.apache.org/jira/browse/SOLR-3589
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.6, 4.0-BETA
Reporter: Tom Burton-West
 Attachments: testSolr3589.xml.gz, testSolr3589.xml.gz


 With edismax mm set to 100%, if one of the tokens is split into two tokens by 
 the analyzer chain (i.e. fire-fly => fire fly), the mm parameter is 
 ignored and the equivalent of an OR query for fire OR fly is produced.
 This is particularly a problem for languages that do not use white space to 
 separate words, such as Chinese or Japanese.
 See these messages for more discussion:
 http://lucene.472066.n3.nabble.com/edismax-parser-ignores-mm-parameter-when-tokenizer-splits-tokens-hypenated-words-WDF-splitting-etc-tc3991911.html
 http://lucene.472066.n3.nabble.com/edismax-parser-ignores-mm-parameter-when-tokenizer-splits-tokens-i-e-CJK-tc3991438.html
 http://lucene.472066.n3.nabble.com/Why-won-t-dismax-create-multiple-DisjunctionMaxQueries-when-autoGeneratePhraseQueries-is-false-tc3992109.html




[jira] [Created] (SOLR-3755) shard splitting

2012-08-23 Thread Yonik Seeley (JIRA)
Yonik Seeley created SOLR-3755:
--

 Summary: shard splitting
 Key: SOLR-3755
 URL: https://issues.apache.org/jira/browse/SOLR-3755
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud
Reporter: Yonik Seeley


We can currently easily add replicas to handle increases in query volume, but 
we should also add a way to add additional shards dynamically by splitting 
existing shards.




[jira] [Commented] (SOLR-3755) shard splitting

2012-08-23 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13440698#comment-13440698
 ] 

Yonik Seeley commented on SOLR-3755:


We need to associate hash ranges with shards and allow overlapping shards (e.g. 
1-10 overlapping with 1-5 and 6-10).

General Strategy for splitting w/ no service interruptions:
 - Bring up 2 new cores on the same node, covering the new hash ranges
 - Both cores should go into recovery mode (i.e. leader should start
forwarding updates)
 - leader does a hard commit and splits the index
 - Smaller indexes are installed on the new cores
 - Overseer should create new replicas for new shards
 - Mark old shard as “retired” – some mechanism to shut it down (after there is 
an acceptable amount of coverage of the new shards via replicas)

Future: allow splitting even with “custom” shards
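The hash-range bookkeeping behind the steps above can be sketched as follows (a toy model in Python, not SolrCloud's actual routing code; the hash function is a stand-in):

```python
import zlib

def doc_hash(doc_id):
    # Stand-in routing hash; SolrCloud's real hash function may differ.
    return zlib.crc32(doc_id.encode("utf-8")) & 0xFFFFFFFF

def split_range(lo, hi):
    """Split the inclusive hash range [lo, hi] into two covering halves,
    e.g. the ranges for the two new cores brought up on the leader's node."""
    mid = (lo + hi) // 2
    return (lo, mid), (mid + 1, hi)

def owners(h, ranges):
    """All shards whose range covers hash h. While old and new shards
    overlap, a doc is covered by the retiring shard and by exactly one
    of the new sub-shards."""
    return [r for r in ranges if r[0] <= h <= r[1]]
```

The invariant the split must preserve is that the two new ranges partition the old one, so every document keeps exactly one owner once the old shard is retired.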

 shard splitting
 ---

 Key: SOLR-3755
 URL: https://issues.apache.org/jira/browse/SOLR-3755
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud
Reporter: Yonik Seeley

 We can currently easily add replicas to handle increases in query volume, but 
 we should also add a way to add additional shards dynamically by splitting 
 existing shards.




[jira] [Created] (SOLR-3756) If we are elected the leader of a shard, but we fail to publish this for any reason, we should clean up and re trigger a leader election.

2012-08-23 Thread Mark Miller (JIRA)
Mark Miller created SOLR-3756:
-

 Summary: If we are elected the leader of a shard, but we fail to 
publish this for any reason, we should clean up and re trigger a leader 
election.
 Key: SOLR-3756
 URL: https://issues.apache.org/jira/browse/SOLR-3756
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
 Fix For: 4.0, 5.0







[jira] [Commented] (SOLR-3756) If we are elected the leader of a shard, but we fail to publish this for any reason, we should clean up and re trigger a leader election.

2012-08-23 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13440756#comment-13440756
 ] 

Mark Miller commented on SOLR-3756:
---

More defensive than anything - if you cannot publish, that should mean you 
cannot talk to ZooKeeper - which should mean the loss of the ephemeral node and 
a new leader election anyway.

 If we are elected the leader of a shard, but we fail to publish this for any 
 reason, we should clean up and re trigger a leader election.
 -

 Key: SOLR-3756
 URL: https://issues.apache.org/jira/browse/SOLR-3756
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
 Fix For: 4.0, 5.0







[jira] [Updated] (SOLR-3756) If we are elected the leader of a shard, but we fail to publish this for any reason, we should clean up and re trigger a leader election.

2012-08-23 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-3756:
--

Priority: Minor  (was: Major)

 If we are elected the leader of a shard, but we fail to publish this for any 
 reason, we should clean up and re trigger a leader election.
 -

 Key: SOLR-3756
 URL: https://issues.apache.org/jira/browse/SOLR-3756
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Priority: Minor
 Fix For: 4.0, 5.0







[jira] [Resolved] (LUCENE-4323) Add max cfs segment size to LogMergePolicy and TieredMergePolicy

2012-08-23 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-4323.
---

   Resolution: Fixed
Fix Version/s: 4.0
   5.0

Committed trunk revision: 1376766
Committed 4.x revision: 1376767

Thanks Alexey!

 Add max cfs segment size to LogMergePolicy and TieredMergePolicy
 

 Key: LUCENE-4323
 URL: https://issues.apache.org/jira/browse/LUCENE-4323
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0-BETA
Reporter: Alexey Lef
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 5.0, 4.0

 Attachments: LUCENE-4323.patch, LUCENE-4323.patch


 Our application is managing thousands of indexes ranging from a few KB to a 
 few GB in size. To keep the number of files under control and at the same 
 time avoid the overhead of compound file format for large segments, we would 
 like to keep only small segments as CFS. The meaning of small here is in 
 absolute byte size terms, not as a percentage of the overall index. It is ok 
 and in fact desirable to have the entire index as CFS as long as it is below 
 the threshold.
 The attached patch adds a new configuration option maxCFSSegmentSize which 
 sets the absolute limit on the compound file segment size, in addition to the 
 existing noCFSRatio, i.e. the lesser of the two will be used. The default is 
 to allow any size (Long.MAX_VALUE) so that the default behavior is exactly as 
 it was before.
 The patch is for the trunk as of Aug 23, 2012.
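The "lesser of the two" rule can be sketched like this (hypothetical Python with made-up defaults, just to make the decision explicit; the real option lives on the Java merge policies):

```python
def use_compound_file(segment_bytes, total_index_bytes,
                      no_cfs_ratio=0.1, max_cfs_segment_bytes=float("inf")):
    """A segment is written as CFS only if it passes BOTH thresholds:
    the existing relative one (noCFSRatio) and the new absolute cap
    (maxCFSSegmentSize); i.e. the lesser of the two wins. The default
    cap of infinity preserves the previous behavior exactly."""
    if segment_bytes > max_cfs_segment_bytes:   # absolute byte cap
        return False
    return segment_bytes <= no_cfs_ratio * total_index_bytes
```

So a 50 MB segment in a 100 MB index stays non-compound under a 10 MB cap even if the ratio alone would have allowed CFS.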




[jira] [Commented] (SOLR-3288) audit tutorial before 4.0 release

2012-08-23 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13440798#comment-13440798
 ] 

Erik Hatcher commented on SOLR-3288:


bq. Fix README example reference to configuration. It's now under collection1/

I didn't see anything in the README that needed fixing, but I did update a 
conf/ reference in tutorial.html.

 audit tutorial before 4.0 release
 -

 Key: SOLR-3288
 URL: https://issues.apache.org/jira/browse/SOLR-3288
 Project: Solr
  Issue Type: Task
Reporter: Hoss Man
Assignee: Hoss Man
Priority: Blocker
 Fix For: 4.0


 Prior to the 4.0 release, audit the tutorial and verify...
 * command line output looks reasonable
 * analysis examples/discussion matches field types used
 * links to admin UI are correct for new UI.




[jira] [Updated] (SOLR-3425) CloudSolrServer can't create cores when using the zkHost based constructor

2012-08-23 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-3425:
--

Fix Version/s: 5.0

 CloudSolrServer can't create cores when using the zkHost based constructor
 --

 Key: SOLR-3425
 URL: https://issues.apache.org/jira/browse/SOLR-3425
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Tommaso Teofili
Assignee: Mark Miller
Priority: Minor
 Fix For: 4.0, 5.0

 Attachments: SOLR-3425-test.patch


 When programmatically creating cores with a running SolrCloud instance, the 
 CloudSolrServer uses the slices' node information to feed the underlying 
 LBHttpSolrServer, so it fails to create cores as there aren't any slices for 
 any new collection (it's still to be created).
 This happens when using the CloudSolrServer constructor which takes the ZK 
 host as only parameter while it can be avoided by using the constructor which 
 also takes the list of Solr URLs and the underlying LBHttpSolrServer is 
 actually used for making the core creation request.
 However it'd be good to use the ZK host live nodes information to 
 automatically issue a core creation command on one of the underlying Solr 
 hosts without having to specify the full list of URLs beforehand.
 The scenario is when one wants to create a collection with N shards so the 
 client sends N core creation requests for the same collection thus the 
 SolrCloud stuff should just take care of choosing the host where to issue the 
 core creation request and update the cluster state.
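The host-choosing part of that suggestion could look roughly like this (a hypothetical sketch, not SolrJ code; node names are made up):

```python
import itertools

def assign_core_creations(live_nodes, num_shards):
    """Round-robin the N core-creation requests for a new collection over
    the live nodes reported by ZooKeeper, so the caller never has to pass
    an explicit Solr URL list to the client."""
    cycle = itertools.cycle(sorted(live_nodes))
    return [next(cycle) for _ in range(num_shards)]
```

Sorting first just makes the assignment deterministic; a real implementation would presumably also weigh current load and update the cluster state after each creation.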




[jira] [Assigned] (SOLR-3425) CloudSolrServer can't create cores when using the zkHost based constructor

2012-08-23 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller reassigned SOLR-3425:
-

Assignee: Mark Miller

 CloudSolrServer can't create cores when using the zkHost based constructor
 --

 Key: SOLR-3425
 URL: https://issues.apache.org/jira/browse/SOLR-3425
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Tommaso Teofili
Assignee: Mark Miller
Priority: Minor
 Fix For: 4.0, 5.0

 Attachments: SOLR-3425-test.patch


 When programmatically creating cores with a running SolrCloud instance, the 
 CloudSolrServer uses the slices' node information to feed the underlying 
 LBHttpSolrServer, so it fails to create cores as there aren't any slices for 
 any new collection (it's still to be created).
 This happens when using the CloudSolrServer constructor which takes the ZK 
 host as only parameter while it can be avoided by using the constructor which 
 also takes the list of Solr URLs and the underlying LBHttpSolrServer is 
 actually used for making the core creation request.
 However it'd be good to use the ZK host live nodes information to 
 automatically issue a core creation command on one of the underlying Solr 
 hosts without having to specify the full list of URLs beforehand.
 The scenario is when one wants to create a collection with N shards so the 
 client sends N core creation requests for the same collection thus the 
 SolrCloud stuff should just take care of choosing the host where to issue the 
 core creation request and update the cluster state.







[jira] [Updated] (SOLR-3645) /terms should become 4.x distrib compatible or default to distrib=false

2012-08-23 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-3645:
--

Fix Version/s: 5.0
   4.0
 Assignee: Mark Miller

 /terms should become 4.x distrib compatible or default to distrib=false
 ---

 Key: SOLR-3645
 URL: https://issues.apache.org/jira/browse/SOLR-3645
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 4.0-ALPHA
 Environment: SolrCloud, RHEL 5.4
Reporter: Nick Cotton
Assignee: Mark Miller
Priority: Minor
  Labels: feature
 Fix For: 4.0, 5.0


 In a SolrCloud configuration, /terms does not return any terms when issued as 
 follows:
 http://hostname:8983/solr/terms?terms.fl=name&terms=true&terms.limit=-1&isShard=true&terms.sort=index&terms.prefix=s
 but does return reasonable results when distrib is turned off, like so:
 http://hostname:8983/solr/terms?terms.fl=name&terms=true&distrib=false&terms.limit=-1&isShard=true&terms.sort=index&terms.prefix=s
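For reference, the working variant differs only by the distrib=false parameter; a small sketch reconstructing both request URLs (hostname and field name as in the report):

```python
from urllib.parse import urlencode

# Reconstruct the two /terms request URLs from the report; the only
# difference is distrib=false, which disables the distributed code path.
base = "http://hostname:8983/solr/terms"
common = {
    "terms.fl": "name",
    "terms": "true",
    "terms.limit": "-1",
    "isShard": "true",
    "terms.sort": "index",
    "terms.prefix": "s",
}
distributed_url = "%s?%s" % (base, urlencode(common))
local_url = "%s?%s" % (base, urlencode(dict(common, distrib="false")))
print(distributed_url)  # returns no terms in SolrCloud, per the report
print(local_url)        # returns reasonable results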

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira






[jira] [Commented] (SOLR-3538) Unloading a SolrCore object and specifying delete does not fully delete all Solr parts

2012-08-23 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13440887#comment-13440887
 ] 

Mark Miller commented on SOLR-3538:
---

I don't think delete=true is a valid command?

There is deleteIndex and I recently added deleteDataDir and deleteInstanceDir 
as other options.
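Assuming the options named above, a full removal would look something like this (core name and host are hypothetical):

```python
from urllib.parse import urlencode

# Sketch of an UNLOAD request using the deleteInstanceDir option mentioned
# above, which should remove the index, data dir, and the instance dir
# itself. Core name and host are made up for illustration.
def unload_url(base_url, core, delete_instance_dir=True):
    params = {"action": "UNLOAD", "core": core}
    if delete_instance_dir:
        params["deleteInstanceDir"] = "true"
    return "%s/admin/cores?%s" % (base_url, urlencode(params))

print(unload_url("http://localhost:8983/solr", "mycore"))
```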

 Unloading a SolrCore object and specifying delete does not fully delete all 
 Solr parts
 --

 Key: SOLR-3538
 URL: https://issues.apache.org/jira/browse/SOLR-3538
 Project: Solr
  Issue Type: Bug
  Components: multicore
Affects Versions: 4.0-ALPHA
 Environment: Windows
Reporter: Andre' Hazelwood
Priority: Minor

 If I issue an action=UNLOAD&delete=true request for a specific Solr Core on 
 the CoreAdminHandler, all files are removed except files located in the tlog 
 directory under the core.  We are trying to manage our cores from an outside 
 system, so having the core not actually get deleted is a pain.
 I would expect all files as well as the Core directory to be removed if the 
 delete parameter is specified.







[jira] [Comment Edited] (SOLR-3589) Edismax parser does not honor mm parameter if analyzer splits a token

2012-08-23 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13437426#comment-13437426
 ] 

Lance Norskog edited comment on SOLR-3589 at 8/24/12 1:57 PM:
--

[
See [SOLR-3636], it's the same problem space but with synonym expansion. If 
"Monkeyhouse" expands to "monkey house", then a dismax or edismax query finds 
words with either ("monkey" OR "house"). Must-match defaults to 100%, so you 
would expect this to mean "monkey AND house".

This seems to be a multi-part problem.
] retracted as per below. Yes, synonyms are another box'o'fun. 

  was (Author: lancenorskog):
See [SOLR-3636], it's the same problem space but with synonym expansion. If 
Monkeyhouse expands to monkey house, then a dismax or edismax query finds 
words with either (monkey OR house). Must-match defaults to 100% so you 
would expect this to mean monkey AND house.

This seems to be a multi-part problem. 
  
 Edismax parser does not honor mm parameter if analyzer splits a token
 -

 Key: SOLR-3589
 URL: https://issues.apache.org/jira/browse/SOLR-3589
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.6, 4.0-BETA
Reporter: Tom Burton-West
 Attachments: testSolr3589.xml.gz, testSolr3589.xml.gz


 With edismax mm set to 100%, if one of the tokens is split into two tokens by 
 the analyzer chain (i.e. "fire-fly" = "fire fly"), the mm parameter is 
 ignored and the equivalent of an OR query for "fire OR fly" is produced.
 This is particularly a problem for languages that do not use white space to 
 separate words, such as Chinese or Japanese.
 See these messages for more discussion:
 http://lucene.472066.n3.nabble.com/edismax-parser-ignores-mm-parameter-when-tokenizer-splits-tokens-hypenated-words-WDF-splitting-etc-tc3991911.html
 http://lucene.472066.n3.nabble.com/edismax-parser-ignores-mm-parameter-when-tokenizer-splits-tokens-i-e-CJK-tc3991438.html
 http://lucene.472066.n3.nabble.com/Why-won-t-dismax-create-multiple-DisjunctionMaxQueries-when-autoGeneratePhraseQueries-is-false-tc3992109.html
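A toy model (not Solr code) of the mismatch reported above: the analyzer splits one user token into two, and the resulting clauses end up purely optional instead of all required:

```python
# Toy illustration of the reported behavior; analyze() is a stand-in for an
# analyzer chain that splits hyphenated words, the way a
# WordDelimiterFilter-style component would.
def analyze(token):
    return token.split("-")

query = "fire-fly"
tokens = [t for tok in query.split() for t in analyze(tok)]
expected = " AND ".join(tokens)  # what mm=100% suggests should be required
reported = " OR ".join(tokens)   # what the parser actually produces per the report
print(tokens)    # ['fire', 'fly']
print(expected)  # fire AND fly
print(reported)  # fire OR fly
```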







Build failed in Jenkins: Lucene-trunk-Linux-Java7-64-test-only #3206

2012-08-23 Thread builder
See builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64-test-only/3206/

--
[...truncated 966 lines...]
[junit4:junit4] Completed on J0 in 0.77s, 3 tests
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.index.TestMultiFields
[junit4:junit4] Completed on J7 in 1.39s, 3 tests
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.index.TestForTooMuchCloning
[junit4:junit4] Completed on J1 in 1.87s, 1 test
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.index.TestDuelingCodecs
[junit4:junit4] Completed on J6 in 21.64s, 1 test
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.search.TestBooleanQuery
[junit4:junit4] Completed on J4 in 0.79s, 5 tests
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.search.TestTermRangeQuery
[junit4:junit4] Completed on J2 in 0.48s, 7 tests
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.index.TestPayloads
[junit4:junit4] Completed on J5 in 0.73s, 7 tests
[junit4:junit4] 
[junit4:junit4] Suite: 
org.apache.lucene.search.spans.TestSpanMultiTermQueryWrapper
[junit4:junit4] Completed on J6 in 0.27s, 4 tests
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.search.TestWildcardRandom
[junit4:junit4] Completed on J4 in 0.47s, 1 test
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.index.TestCheckIndex
[junit4:junit4] Completed on J2 in 0.32s, 3 tests
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.search.TestWildcard
[junit4:junit4] Completed on J7 in 0.61s, 8 tests
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.index.TestDocCount
[junit4:junit4] Completed on J6 in 0.17s, 1 test
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.util.TestSmallFloat
[junit4:junit4] Completed on J2 in 0.21s, 2 tests
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.document.TestDocument
[junit4:junit4] Completed on J7 in 0.18s, 8 tests
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.util.TestIdentityHashSet
[junit4:junit4] Completed on J1 in 0.89s, 1 test
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.index.TestIndexFileDeleter
[junit4:junit4] Completed on J6 in 0.15s, 1 test
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.index.TestSumDocFreq
[junit4:junit4] Completed on J4 in 0.32s, 1 test
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.search.TestFuzzyQuery
[junit4:junit4] Completed on J0 in 1.12s, 6 tests
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.search.spans.TestNearSpansOrdered
[junit4:junit4] Completed on J5 in 0.53s, 10 tests
[junit4:junit4] 
[junit4:junit4] Suite: 
org.apache.lucene.search.TestSimpleExplanationsOfNonMatches
[junit4:junit4] Completed on J3 in 1.30s, 69 tests
[junit4:junit4] 
[junit4:junit4] Suite: 
org.apache.lucene.util.junitcompat.TestBeforeAfterOverrides
[junit4:junit4] Completed on J2 in 0.14s, 2 tests
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.util.TestSetOnce
[junit4:junit4] Completed on J7 in 0.14s, 4 tests
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.search.TestFilteredSearch
[junit4:junit4] Completed on J1 in 0.09s, 1 test
[junit4:junit4] 
[junit4:junit4] Suite: 
org.apache.lucene.util.junitcompat.TestSetupTeardownChaining
[junit4:junit4] Completed on J6 in 0.10s, 2 tests
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.search.TestCachingWrapperFilter
[junit4:junit4] Completed on J4 in 0.18s, 5 tests
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.index.TestFilterAtomicReader
[junit4:junit4] Completed on J5 in 0.15s, 2 tests
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.index.Test2BPostings
[junit4:junit4] IGNOR/A 0.13s J0 | Test2BPostings.test
[junit4:junit4] Assumption #1: 'nightly' test group is disabled (@Nightly)
[junit4:junit4] Completed on J0 in 0.15s, 1 test, 1 skipped
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.util.junitcompat.TestSeedFromUncaught
[junit4:junit4] Completed on J2 in 0.07s, 1 test
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.search.TestFieldValueFilter
[junit4:junit4] Completed on J3 in 0.26s, 2 tests
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.codecs.pulsing.TestPulsingReuse
[junit4:junit4] Completed on J1 in 0.07s, 2 tests
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.search.TestBooleanScorer
[junit4:junit4] Completed on J7 in 0.15s, 3 tests
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.document.TestDateTools
[junit4:junit4] Completed on J6 in 0.13s, 5 tests
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.store.TestDirectory
[junit4:junit4] IGNOR/A 0.01s J4 | TestDirectory.testThreadSafety
[junit4:junit4] Assumption #1: 'nightly' test group is disabled (@Nightly)
[junit4:junit4] Completed on J4 in 0.15s, 8 tests, 1 skipped
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.util.junitcompat.TestJUnitRuleOrder
[junit4:junit4] Completed on J5 in 0.04s, 1 test
[junit4:junit4] 
[junit4:junit4] Suite: 

Jenkins build is back to normal : Lucene-trunk-Linux-Java7-64-test-only #3207

2012-08-23 Thread builder
See builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64-test-only/3207/





Re: Federated Search - jira issues

2012-08-23 Thread David Smiley (@MITRE.org)
Hi Jacek,

This strikes me as a project that would exist separate from Solr because it
would have little to do with it.  In federated search, you are searching
multiple search platforms of ideally any flavor.  Why base the federated
search platform on any one of these (e.g. Solr)?  I think there is little of
Solr internally that would be re-used if you were federating to other
platforms and not searching Lucene/Solr itself.

~ David Smiley



-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Federated-Search-jira-issues-tp4002796p4003023.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.




[jira] [Commented] (SOLR-3619) Rename 'example' dir to 'server' and pull examples into an 'examples' directory

2012-08-23 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13440958#comment-13440958
 ] 

David Smiley commented on SOLR-3619:


+1 Hoss.  The main points I like here that I've been hoping for are:
* The rename of example to server
* Integrating or at least organizing the various configurations.
* An even better new project template.

 Rename 'example' dir to 'server' and pull examples into an 'examples' 
 directory
 ---

 Key: SOLR-3619
 URL: https://issues.apache.org/jira/browse/SOLR-3619
 Project: Solr
  Issue Type: Improvement
Reporter: Mark Miller
 Fix For: 4.0

 Attachments: server-name-layout.png, SOLR-3619.patch






