[jira] [Commented] (LUCENE-4145) "Unhandled exception" from test framework (in json parsing of test output files?)
[ https://issues.apache.org/jira/browse/LUCENE-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295490#comment-13295490 ] Dawid Weiss commented on LUCENE-4145: - This is weird, I'll look into it. > "Unhandled exception" from test framework (in json parsing of test output > files?) > - > > Key: LUCENE-4145 > URL: https://issues.apache.org/jira/browse/LUCENE-4145 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Hoss Man >Assignee: Dawid Weiss > > Working on SOLR-3267 i got a weird exception printed to the junit output... > {noformat} >[junit4] Unhandled exception in thread: Thread[pumper-events,5,main] >[junit4] > com.carrotsearch.ant.tasks.junit4.dependencies.com.google.gson.JsonParseException: > No such reference: id#org.apache.solr.search.TestSort[3] > ... > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-3546) Add index page to Admin UI
Lance Norskog created SOLR-3546: --- Summary: Add index page to Admin UI Key: SOLR-3546 URL: https://issues.apache.org/jira/browse/SOLR-3546 Project: Solr Issue Type: New Feature Components: web gui Reporter: Lance Norskog Priority: Minor It would be great to index a file by uploading it. In designing schemas and testing features I often make one or two test documents. It would be great to upload these directly from the UI. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4146) -Dtests.iters combined with -Dtestmethod never fails?
[ https://issues.apache.org/jira/browse/LUCENE-4146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295489#comment-13295489 ] Dawid Weiss commented on LUCENE-4146: - Also, completing the above answer -- this issue also affects things like "re-running" a test from Eclipse and other IDEs. If you run your suite with -Dtests.iters=5 you'll get a tree of tests that executed, with their "unique" names that include a seed. If you click on a given test and re-run it, Eclipse will try to filter execution to that particular test (that name), and if the seed is random (and not fixed) the chances of such a test occurring again are nearly zero, so you'll get an empty result (no executed test). I've tried a number of workarounds/hacks but none of them worked well. This is really the best of what I've tried. > -Dtests.iters combined with -Dtestmethod never fails? > - > > Key: LUCENE-4146 > URL: https://issues.apache.org/jira/browse/LUCENE-4146 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Hoss Man > Attachments: LUCENE-4146.fail.patch, > TEST-org.apache.lucene.TestSearch.iters-no-fail.xml, > TEST-org.apache.lucene.TestSearch.no-iters-fail.xml > > > a test that is hardcoded to fail will report success if you run it with > -Dtests.iters
[jira] [Commented] (LUCENE-4146) -Dtests.iters combined with -Dtestmethod never fails?
[ https://issues.apache.org/jira/browse/LUCENE-4146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295485#comment-13295485 ] Dawid Weiss commented on LUCENE-4146: - This isn't a bug, Hoss. This is an unfortunate API shortcoming of JUnit that I had to accommodate somehow. So what happens is that: 1) no two JUnit tests can have the same "description" (which in realistic terms means no two JUnit tests can have an identical method name); this confuses the hell out of all IDE clients and other clients (like ant, maven, etc.). 2) because of the above (and wanting to have separate tests for repetitions), repeated test names are created so that they contain a sequential number and a seed (to make them unique). 3) because of the above, a method filter no longer works because that exact string doesn't match the generated pseudo-method name. A workaround is to add globs around the method name, as in: {noformat} ant test -Dtests.iters=2 -Dtestcase=TestSearch -Dtestmethod=*testFailure* {noformat} Yeah, I realize this sucks but I have no better ideas for the moment (that would work with the existing JUnit infrastructure). > -Dtests.iters combined with -Dtestmethod never fails? > - > > Key: LUCENE-4146 > URL: https://issues.apache.org/jira/browse/LUCENE-4146 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Hoss Man > Attachments: LUCENE-4146.fail.patch, > TEST-org.apache.lucene.TestSearch.iters-no-fail.xml, > TEST-org.apache.lucene.TestSearch.no-iters-fail.xml > > > a test that is hardcoded to fail will report success if you run it with > -Dtests.iters
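The naming scheme and the glob workaround described above can be sketched in isolation. This is a hedged illustration, not the actual randomizedtesting code: the exact name format and the minimal glob matcher are assumptions chosen to show why an exact method-name filter misses seed-suffixed repetition names while a glob still matches.

```java
public class RepeatedTestNames {
    // Build a unique per-repetition name from a base method name, a
    // sequence number, and a seed -- conceptually what the framework does.
    static String repetitionName(String method, int iter, String seed) {
        return String.format("%s {#%d seed=[%s]}", method, iter, seed);
    }

    // Minimal glob matcher: '*' matches any run of characters.
    // (Toy version; does not escape other regex metacharacters.)
    static boolean globMatches(String glob, String name) {
        return name.matches(glob.replace("*", ".*"));
    }

    public static void main(String[] args) {
        String generated = repetitionName("testFailure", 1, "E9EE2618BEEE855E");
        System.out.println(generated);
        // An exact filter no longer matches, but a glob filter does:
        System.out.println(generated.equals("testFailure"));          // false
        System.out.println(globMatches("*testFailure*", generated)); // true
    }
}
```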
[jira] [Commented] (SOLR-3534) dismax and edismax should default to "df" when "qf" is absent.
[ https://issues.apache.org/jira/browse/SOLR-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295473#comment-13295473 ] David Smiley commented on SOLR-3534: Whoops -- that commit (#1350466) was mis-commented as SOLR-3304. > dismax and edismax should default to "df" when "qf" is absent. > -- > > Key: SOLR-3534 > URL: https://issues.apache.org/jira/browse/SOLR-3534 > Project: Solr > Issue Type: Improvement > Components: query parsers >Affects Versions: 4.0 >Reporter: David Smiley >Assignee: David Smiley >Priority: Minor > Attachments: > SOLR-3534_dismax_and_edismax_should_default_to_df_if_qf_is_absent.patch, > SOLR-3534_dismax_and_edismax_should_default_to_df_if_qf_is_absent.patch > > > The dismax and edismax query parsers should default to "df" when the "qf" > parameter is absent. They only use the defaultSearchField in schema.xml as a > fallback now.
[jira] [Updated] (SOLR-3534) dismax and edismax should default to "df" when "qf" is absent.
[ https://issues.apache.org/jira/browse/SOLR-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated SOLR-3534: --- Attachment: SOLR-3534_dismax_and_edismax_should_default_to_df_if_qf_is_absent.patch This is an updated patch. Instead of SolrPluginUtils, I chose QueryParsing, which already has a similar method for q.op. And like q.op, I made the second argument the string that the caller resolves. Some callers don't have convenient params to provide. The fact that some don't led me to start more refactorings to QParser, which I decided to withdraw so as not to make this issue do too much at once. I already committed test modifications so that this patch will pass. (I jumped the gun, perhaps, but no matter.) You should see this change in the subversion tab in JIRA. > dismax and edismax should default to "df" when "qf" is absent. > -- > > Key: SOLR-3534 > URL: https://issues.apache.org/jira/browse/SOLR-3534 > Project: Solr > Issue Type: Improvement > Components: query parsers >Affects Versions: 4.0 >Reporter: David Smiley >Assignee: David Smiley >Priority: Minor > Attachments: > SOLR-3534_dismax_and_edismax_should_default_to_df_if_qf_is_absent.patch, > SOLR-3534_dismax_and_edismax_should_default_to_df_if_qf_is_absent.patch > > > The dismax and edismax query parsers should default to "df" when the "qf" > parameter is absent. They only use the defaultSearchField in schema.xml as a > fallback now.
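The fallback order the issue asks for can be sketched in plain Java. This is a hedged illustration of the resolution logic only: method and parameter names are hypothetical, not Solr's actual QueryParsing API -- prefer the "df" request parameter, else fall back to the schema's defaultSearchField.

```java
import java.util.Map;

public class DefaultFieldResolution {
    // Resolve the default field: "df" param if present, else the schema
    // default. Names here are illustrative, not Solr's real API.
    static String resolveDefaultField(Map<String, String> params, String schemaDefault) {
        String df = params.get("df");
        return df != null ? df : schemaDefault;
    }

    public static void main(String[] args) {
        System.out.println(resolveDefaultField(Map.of("df", "text"), "content")); // text
        System.out.println(resolveDefaultField(Map.of(), "content"));             // content
    }
}
```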
[jira] [Updated] (SOLR-3522) "literal" function can not be parsed
[ https://issues.apache.org/jira/browse/SOLR-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-3522: --- Attachment: SOLR-3522.patch Patch that should fix the problem ... except that the test still fails in a way that suggests StringDistanceFunction isn't implementing equals properly (two FunctionQueries parsed from identical input don't compare as equal), so now I need to go down that rabbit hole. (I may just have a stupid mistake in the test I'm not seeing at the moment.) > "literal" function can not be parsed > > > Key: SOLR-3522 > URL: https://issues.apache.org/jira/browse/SOLR-3522 > Project: Solr > Issue Type: Bug >Reporter: Hoss Man >Assignee: Hoss Man > Fix For: 4.0, 3.6.1 > > Attachments: SOLR-3522.patch > > > attempting to use the "literal" function in the fl param causes a parse > error... > Example queries with functions that work fine... > {noformat} > http://localhost:8983/solr/collection1/select?q=*:*&fl=foo:sum%284,5%29 > http://localhost:8983/solr/collection1/select?fl=score&q={!func}strdist%28%22foo%22,%22fo%22,edit%29 > {noformat} > Examples using the literal function that fail... > {noformat} > http://localhost:8983/solr/collection1/select?q=*:*&fl=foo:literal%28%22foo%22%29 > http://localhost:8983/solr/collection1/select?fl=score&q={!func}strdist%28%22foo%22,literal%28%22fo%22%29,edit%29 > {noformat}
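The equals problem mentioned above comes down to value-based equality: two objects parsed from identical input only compare equal if equals()/hashCode() compare fields rather than identity. A minimal sketch -- the class below is a stand-in for a function like strdist(), not Lucene's actual StringDistanceFunction:

```java
import java.util.Objects;

public class ValueEquality {
    // Toy stand-in for a parsed function value source with three arguments.
    static final class StrDist {
        final String a, b, measure;
        StrDist(String a, String b, String measure) { this.a = a; this.b = b; this.measure = measure; }
        // Value-based equals: compare the fields, not object identity.
        @Override public boolean equals(Object o) {
            if (!(o instanceof StrDist)) return false;
            StrDist other = (StrDist) o;
            return a.equals(other.a) && b.equals(other.b) && measure.equals(other.measure);
        }
        // Keep hashCode consistent with equals, as the contract requires.
        @Override public int hashCode() { return Objects.hash(a, b, measure); }
    }

    public static void main(String[] args) {
        // Two instances "parsed" from the same input now compare equal:
        System.out.println(new StrDist("foo", "fo", "edit")
                .equals(new StrDist("foo", "fo", "edit"))); // true
    }
}
```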
[jira] [Updated] (LUCENE-4132) IndexWriterConfig live settings
[ https://issues.apache.org/jira/browse/LUCENE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-4132: --- Attachment: LUCENE-4132.patch Good catch, Mike! It went away in the last changes. I re-added testReuse, asserting that e.g. the MP instances returned from LiveIWC are not the same. > IndexWriterConfig live settings > --- > > Key: LUCENE-4132 > URL: https://issues.apache.org/jira/browse/LUCENE-4132 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Shai Erera >Assignee: Shai Erera >Priority: Minor > Fix For: 4.0, 5.0 > > Attachments: LUCENE-4132.patch, LUCENE-4132.patch, LUCENE-4132.patch, > LUCENE-4132.patch, LUCENE-4132.patch, LUCENE-4132.patch > > > A while ago there was a discussion about making some IW settings "live" and I > remember that RAM buffer size was one of them. Judging from IW code, I see > that RAM buffer can be changed "live" as IW never caches it. > However, I don't remember which other settings were decided to be "live" and > I don't see any documentation in IW nor IWC for that. IW.getConfig mentions: > {code} > * NOTE: some settings may be changed on the > * returned {@link IndexWriterConfig}, and will take > * effect in the current IndexWriter instance. See the > * javadocs for the specific setters in {@link > * IndexWriterConfig} for details. > {code} > But there's no text on e.g. IWC.setRAMBuffer mentioning that. > I think that it'd be good if we make it easier for users to tell which of the > settings are "live" ones. There are a few possible ways to do it: > * Introduce a custom @live.setting tag on the relevant IWC.set methods, and > add special text for them in build.xml > ** Or, drop the tag and just document it clearly. 
> * Separate IWC into two interfaces, LiveConfig and OneTimeConfig (name > proposals are welcome!), have IWC impl both, and introduce another > IW.getLiveConfig which will return that interface, thereby clearly letting > the user know which of the settings are "live". > It'd be good if IWC itself could only expose setXYZ methods for the "live" > settings, though. So perhaps, off the top of my head, we can do something like > this: > * Introduce a Config object, which is essentially what IWC is today, and pass > it to IW. > * IW will create a different object, IWC, from that Config, and IW.getConfig > will return IWC. > * IWC itself will only have setXYZ methods for the "live" settings. > It adds another object, but user code doesn't change - it still creates a > Config object when initializing IW, and needs to handle a different type if it > ever calls IW.getConfig. > Maybe that's not such a bad idea?
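The two-object idea floated above can be sketched as a full Config used at construction time plus a narrower "live" view returned afterwards, so callers of getConfig only see setters that take effect on a running writer. All names below are illustrative assumptions; this is not Lucene's actual IndexWriterConfig API.

```java
public class LiveConfigSketch {
    // Only the settings that may change after construction live here.
    interface LiveConfig {
        LiveConfig setRAMBufferSizeMB(double mb);
        double getRAMBufferSizeMB();
    }

    // The full construction-time config implements the live view too.
    static final class Config implements LiveConfig {
        private double ramBufferMB = 16.0;
        public Config setRAMBufferSizeMB(double mb) { this.ramBufferMB = mb; return this; }
        public double getRAMBufferSizeMB() { return ramBufferMB; }
    }

    static final class Writer {
        private final Config config;
        Writer(Config c) { this.config = c; }
        // Callers get the narrow interface: only "live" setters are visible.
        LiveConfig getConfig() { return config; }
    }

    public static void main(String[] args) {
        Writer w = new Writer(new Config());
        w.getConfig().setRAMBufferSizeMB(64.0); // takes effect "live"
        System.out.println(w.getConfig().getRAMBufferSizeMB()); // 64.0
    }
}
```

User code still builds a Config and passes it to the writer; only the type returned by getConfig narrows.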
[jira] [Updated] (SOLR-3522) "literal" function can not be parsed
[ https://issues.apache.org/jira/browse/SOLR-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-3522: --- Fix Version/s: 3.6.1 Assignee: Hoss Man Summary: "literal" function can not be parsed (was: "literal" function can not be parsed in 4x/trunk) Looking into this, it seems that the literal function is completely broken in 3.6 as well -- raw literals work, just not {{literal("foo")}} or {{literal($foo)}}. The problem seems to be a simple mistake of calling "fp.getString()" (which returns the entire input string) instead of using fp.parseArg() ... I'll work on a test & fix. > "literal" function can not be parsed > > > Key: SOLR-3522 > URL: https://issues.apache.org/jira/browse/SOLR-3522 > Project: Solr > Issue Type: Bug >Reporter: Hoss Man >Assignee: Hoss Man > Fix For: 4.0, 3.6.1 > > > attempting to use the "literal" function in the fl param causes a parse > error... > Example queries with functions that work fine... > {noformat} > http://localhost:8983/solr/collection1/select?q=*:*&fl=foo:sum%284,5%29 > http://localhost:8983/solr/collection1/select?fl=score&q={!func}strdist%28%22foo%22,%22fo%22,edit%29 > {noformat} > Examples using the literal function that fail... > {noformat} > http://localhost:8983/solr/collection1/select?q=*:*&fl=foo:literal%28%22foo%22%29 > http://localhost:8983/solr/collection1/select?fl=score&q={!func}strdist%28%22foo%22,literal%28%22fo%22%29,edit%29 > {noformat}
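The class of bug described above -- returning the whole input instead of parsing the next argument -- can be shown with a toy parser. This is a hedged sketch, not Solr's FunctionQParser; the class and method names merely mirror the ones mentioned in the comment.

```java
public class ArgParsing {
    // Toy parser state: the full input and a cursor into it.
    static final class Parser {
        final String input;
        int pos;
        Parser(String input, int pos) { this.input = input; this.pos = pos; }
        // Returns the ENTIRE input -- the wrong call for reading one argument.
        String getString() { return input; }
        // Parses the next quoted argument starting at the cursor -- the right call.
        String parseArg() {
            int start = input.indexOf('"', pos) + 1;
            int end = input.indexOf('"', start);
            pos = end + 1;
            return input.substring(start, end);
        }
    }

    public static void main(String[] args) {
        // Cursor positioned just after "literal(":
        Parser p = new Parser("literal(\"foo\")", "literal(".length());
        System.out.println(p.getString()); // literal("foo")  <- whole input, not an argument
        System.out.println(p.parseArg());  // foo             <- the actual argument
    }
}
```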
[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting
[ https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295426#comment-13295426 ] Chris Russell commented on SOLR-2894: - Erik, I can't get your patch to apply cleanly to solr 1350445 $ patch -p0 -i SOLR-2894.patch patching file solr/core/src/test/org/apache/solr/handler/component/DistributedFacetPivotTest.java patching file solr/core/src/java/org/apache/solr/handler/component/EntryCountComparator.java patching file solr/core/src/java/org/apache/solr/handler/component/PivotNamedListCountComparator.java patching file solr/core/src/java/org/apache/solr/handler/component/PivotFacetHelper.java Hunk #2 FAILED at 103. 1 out of 2 hunks FAILED -- saving rejects to file solr/core/src/java/org/apache/solr/handler/component/PivotFacetHelper.java.rej patching file solr/core/src/java/org/apache/solr/handler/component/FacetComponent.java Hunk #11 FAILED at 799. 1 out of 17 hunks FAILED -- saving rejects to file solr/core/src/java/org/apache/solr/handler/component/FacetComponent.java.rej patching file solr/core/src/java/org/apache/solr/util/PivotListEntry.java patching file solr/solrj/src/java/org/apache/solr/common/params/FacetParams.java patching file solr/test-framework/src/java/org/apache/solr/BaseDistributedSearchTestCase.java > Implement distributed pivot faceting > > > Key: SOLR-2894 > URL: https://issues.apache.org/jira/browse/SOLR-2894 > Project: Solr > Issue Type: Improvement >Reporter: Erik Hatcher >Assignee: Erik Hatcher > Fix For: 4.0 > > Attachments: SOLR-2894.patch, SOLR-2894.patch, > distributed_pivot.patch, distributed_pivot.patch > > > Following up on SOLR-792, pivot faceting currently only supports > undistributed mode. Distributed pivot faceting needs to be implemented. -- This message is automatically generated by JIRA. 
[jira] [Resolved] (SOLR-3267) TestSort failures (reproducible)
[ https://issues.apache.org/jira/browse/SOLR-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man resolved SOLR-3267. Resolution: Fixed I think my thought process when putting the "numberOfOddities" check in that test was that we shouldn't fail if a randomly generated string just happened to wind up being a valid function, or all control/blank characters (or "score", or "_docid_") ... but that if it happened more than a non-trivial number of times, that was odd and should cause a failure so someone would look at the test. Looking at some of the problematic seeds, I realized that the common situation with oddities was: * randomly generated strings that were all whitespace and/or control characters * randomly generated strings that were valid quote sequences (which means they can be treated as a (literal) function). So I changed it as follows... * removed all the "oddity" checking * added a loop in the event that a random string is all whitespace, but made it fail hard if 37 attempts all produce strings that are entirely whitespace (rather than an "infinite" loop) * improved the "munging" of the random strings to ensure they aren't valid functions (or literal quoted strings) * made the test fail hard if any string produced parses as a function or query instead of a field name. Committed revision 1350444. - trunk Committed revision 1350445. - 4x > TestSort failures (reproducible) > > > Key: SOLR-3267 > URL: https://issues.apache.org/jira/browse/SOLR-3267 > Project: Solr > Issue Type: Bug >Reporter: Dawid Weiss >Assignee: Hoss Man > Fix For: 4.0 > > > {noformat} > Over 0.2% oddities in test: 14/6386 have func/query parsing semenatics gotten > broader? > {noformat} > Huh? 
Steps to reproduce: > {noformat} > ant test -Dtestcase=TestSort -Dtestmethod=testRandomFieldNameSorts > -Dtests.seed=-3e789c8564f08cbd:515c61b079794ea7:-6347ac0df7ad45c0 > -Dargs="-Dfile.encoding=UTF-8" > [junit] Testcase: > testRandomFieldNameSorts(org.apache.solr.search.TestSort):FAILED > [junit] Over 0.2% oddities in test: 14/6386 have func/query parsing > semenatics gotten broader? > [junit] junit.framework.AssertionFailedError: Over 0.2% oddities in test: > 14/6386 have func/query parsing semenatics gotten broader? > [junit] at org.junit.Assert.fail(Assert.java:93) > [junit] at org.junit.Assert.assertTrue(Assert.java:43) > [junit] at > org.apache.solr.search.TestSort.testRandomFieldNameSorts(TestSort.java:145) > [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > [junit] at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > [junit] at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > [junit] at java.lang.reflect.Method.invoke(Method.java:597) > [junit] at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) > [junit] at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) > [junit] at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) > [junit] at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) > [junit] at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) > [junit] at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30) > [junit] at > org.apache.lucene.util.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:63) > [junit] at > org.apache.lucene.util.LuceneTestCase$SubclassSetupTeardownRule$1.evaluate(LuceneTestCase.java:739) > [junit] at > org.apache.lucene.util.LuceneTestCase$InternalSetupTeardownRule$1.evaluate(LuceneTestCase.java:655) > [junit] at > 
org.apache.lucene.util.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:69) > [junit] at > org.apache.lucene.util.LuceneTestCase$TestResultInterceptorRule$1.evaluate(LuceneTestCase.java:566) > [junit] at > org.apache.lucene.util.UncaughtExceptionsRule$1.evaluate(UncaughtExceptionsRule.java:75) > [junit] at > org.apache.lucene.util.LuceneTestCase$RememberThreadRule$1.evaluate(LuceneTestCase.java:628) > [junit] at org.junit.rules.RunRules.evaluate(RunRules.java:18) > [junit] at > org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) > [junit] at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) > [junit] at > org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:164) > [junit] at > org.apache.lucene.util.LuceneTestCaseRunner.runChild(Lu
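The retry strategy in the commit described above -- keep drawing random strings, but fail hard after a bounded number of all-whitespace draws rather than looping forever -- can be sketched like this. The limit of 37 mirrors the comment; the string generator is a stand-in, not the test's actual one.

```java
import java.util.Random;

public class NonWhitespaceFieldName {
    // Draw random strings until one is not all whitespace; fail hard after
    // 37 consecutive all-whitespace draws instead of spinning forever.
    static String randomNonWhitespaceString(Random r) {
        for (int attempt = 0; attempt < 37; attempt++) {
            String s = randomString(r);
            if (!s.trim().isEmpty()) return s;
        }
        throw new AssertionError("37 attempts in a row produced all-whitespace strings");
    }

    // Illustrative generator: 1-8 printable ASCII characters.
    static String randomString(Random r) {
        StringBuilder sb = new StringBuilder();
        int len = 1 + r.nextInt(8);
        for (int i = 0; i < len; i++) sb.append((char) (32 + r.nextInt(95)));
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = randomNonWhitespaceString(new Random(42));
        System.out.println(s.trim().isEmpty()); // false
    }
}
```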
[jira] [Resolved] (SOLR-3542) Highlighter: Integration of LUCENE-4133 (Part of LUCENE-3440)
[ https://issues.apache.org/jira/browse/SOLR-3542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi resolved SOLR-3542. -- Resolution: Fixed Fix Version/s: 5.0 Committed in trunk and 4x. I also set WeightedFragListBuilder as the default in the example solrconfig.xml. Many thanks, Sebastian! > Highlighter: Integration of LUCENE-4133 (Part of LUCENE-3440) > - > > Key: SOLR-3542 > URL: https://issues.apache.org/jira/browse/SOLR-3542 > Project: Solr > Issue Type: Improvement > Components: highlighter >Affects Versions: 4.0 >Reporter: Sebastian Lutze >Assignee: Koji Sekiguchi >Priority: Minor > Labels: FastVectorHighlighter, highlight, patch > Fix For: 4.0, 5.0 > > Attachments: SOLR-3542.patch > > > This patch integrates a weight-based approach for sorting highlighted > fragments. > See LUCENE-4133 (Part of LUCENE-3440). > This patch contains: > - Introduction of class WeightedFragListBuilder, an implementation of > SolrFragListBuilder > - Updated example configuration
[jira] [Updated] (LUCENE-4146) -Dtests.iters combined with -Dtestmethod never fails?
[ https://issues.apache.org/jira/browse/LUCENE-4146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated LUCENE-4146: - Attachment: TEST-org.apache.lucene.TestSearch.no-iters-fail.xml TEST-org.apache.lucene.TestSearch.iters-no-fail.xml LUCENE-4146.fail.patch trivial patch adding a test that is guaranteed to fail. When run simply, it fails as expected... {noformat} hossman@bester:~/lucene/4x_dev/lucene/core$ ant test -Dtestcase=TestSearch -Dtestmethod=testFailure Buildfile: /home/hossman/lucene/4x_dev/lucene/core/build.xml ... test: [junit4] says ¡Hola! Master seed: E9EE2618BEEE855E [junit4] Executing 1 suite with 1 JVM. [junit4] Suite: org.apache.lucene.TestSearch [junit4] FAILURE 0.14s | TestSearch.testFailure [junit4]> Throwable #1: java.lang.AssertionError: This statement is false [junit4]>at __randomizedtesting.SeedInfo.seed([E9EE2618BEEE855E:8153D5F484DEE7F1]:0) [junit4]>at org.junit.Assert.fail(Assert.java:93) [junit4]>at org.junit.Assert.assertTrue(Assert.java:43) [junit4]>at org.apache.lucene.TestSearch.testFailure(TestSearch.java:39) ... [junit4]> [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestSearch -Dtests.method=testFailure -Dtests.seed=E9EE2618BEEE855E -Dtests.locale=es_PA -Dtests.timezone=Pacific/Chatham -Dargs="-Dfile.encoding=UTF-8" [junit4] 2> [junit4]> (@AfterClass output) [junit4] 2> NOTE: test params are: codec=Lucene40: {}, sim=RandomSimilarityProvider(queryNorm=false,coord=false): {}, locale=es_PA, timezone=Pacific/Chatham [junit4] 2> NOTE: Linux 2.6.31-23-generic amd64/Sun Microsystems Inc. 1.6.0_24 (64-bit)/cpus=2,threads=1,free=105287320,total=124125184 [junit4] 2> NOTE: All tests run in this JVM: [TestSearch] [junit4] 2> [junit4] Completed in 0.37s, 1 test, 1 failure <<< FAILURES! [junit4] [junit4] JVM J0: 0.53 .. 1.50 = 0.97s [junit4] Execution time total: 1.55 sec. 
[junit4] Tests summary: 1 suite, 1 test, 1 failure BUILD FAILED /home/hossman/lucene/4x_dev/lucene/common-build.xml:1019: The following error occurred while executing this line: /home/hossman/lucene/4x_dev/lucene/common-build.xml:745: There were test failures: 1 suite, 1 test, 1 failure Total time: 5 seconds hossman@bester:~/lucene/4x_dev/lucene/core$ cp ../build/core/test/TEST-org.apache.lucene.TestSearch.xml ~/tmp/TEST-org.apache.lucene.TestSearch.no-iters-fail.xml {noformat} However, when using -Dtests.iters, the test "passes" - but there's no obvious record that it even ran... {noformat} hossman@bester:~/lucene/4x_dev/lucene/core$ ant test -Dtests.iters=2 -Dtestcase=TestSearch -Dtestmethod=testFailure Buildfile: /home/hossman/lucene/4x_dev/lucene/core/build.xml ... test: [junit4] says cześć. Master seed: 9BA05DE6F296F7C4 [junit4] Executing 1 suite with 1 JVM. [junit4] Suite: org.apache.lucene.TestSearch [junit4] Completed in 0.07s, 0 tests [junit4] [junit4] JVM J0: 0.73 .. 1.45 = 0.71s [junit4] Execution time total: 1.47 sec. [junit4] Tests summary: 1 suite, 0 tests [echo] 5 slowest tests: [tophints] 0.15s | org.apache.lucene.TestSearch BUILD SUCCESSFUL Total time: 5 seconds hossman@bester:~/lucene/4x_dev/lucene/core$ cp ../build/core/test/TEST-org.apache.lucene.TestSearch.xml ~/tmp/TEST-org.apache.lucene.TestSearch.iters-no-fail.xml {noformat} (note in the XML file that it says no tests were run) > -Dtests.iters combined with -Dtestmethod never fails? > - > > Key: LUCENE-4146 > URL: https://issues.apache.org/jira/browse/LUCENE-4146 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Hoss Man > Attachments: LUCENE-4146.fail.patch, > TEST-org.apache.lucene.TestSearch.iters-no-fail.xml, > TEST-org.apache.lucene.TestSearch.no-iters-fail.xml > > > a test that is hardcoded to fail will report success if you run it with > -Dtests.iters
[jira] [Created] (LUCENE-4146) -Dtests.iters combined with -Dtestmethod never fails?
Hoss Man created LUCENE-4146: Summary: -Dtests.iters combined with -Dtestmethod never fails? Key: LUCENE-4146 URL: https://issues.apache.org/jira/browse/LUCENE-4146 Project: Lucene - Java Issue Type: Improvement Reporter: Hoss Man A test that is hardcoded to fail will report success if you run it with -Dtests.iters
[JENKINS] Lucene-Solr-4.x-Windows-Java6-64 - Build # 76 - Failure!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Windows-Java6-64/76/ 1 tests failed. REGRESSION: org.apache.solr.spelling.suggest.SuggesterTSTTest.testReload Error Message: Exception during query Stack Trace: java.lang.RuntimeException: Exception during query at __randomizedtesting.SeedInfo.seed([847D645062E1B0E6:438D1C53A8A248F4]:0) at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:459) at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:426) at org.apache.solr.spelling.suggest.SuggesterTest.testReload(SuggesterTest.java:91) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38) at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53) at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605) at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551) Caused by: java.lang.RuntimeException: REQUEST FAILED: xpath=//lst[@name='spellcheck']/lst[@name='suggestions']/lst[@name='ac']/int[@name='numFound'][.='2'] xml response was: 04 request was:q=ac&spellcheck.count=2&qt=/suggest_tst&spellcheck.onlyMorePopular=true at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:452) ... 39 more Build Log: [...truncated 10016 lines...] [junit4] 2>at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.eval
Re: Corrupt index
I can confirm 2.9.4 had autoCommit, but it is gone in 3.0.3 already, so Lucene.Net doesn't have autoCommit. So I don't have autoCommit set to true, but I can clearly see a segments_1 file there along with the other files. If that helps, it always keeps the name segments_1 at 32 bytes; it never changes. And again, if I kill the process and try to open the index with Luke 3.3, the index folder is being wiped out. Not sure what to make of all that. On Fri, Jun 15, 2012 at 3:21 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > Hmm, OK: in 2.9.4 / 3.0.x, if you open IW on a new directory, it will > make a zero-segment commit. This was changed/fixed in 3.1 with > LUCENE-2386. > > In 2.9.x (not 3.0.x) there is still an autoCommit parameter, > defaulting to false, but if you set it to true then IndexWriter will > periodically commit. > > Seeing segment files created and merge is definitely expected, but > it's not expected to see segments_N files unless you pass > autoCommit=true. > > Mike McCandless > > http://blog.mikemccandless.com > > On Thu, Jun 14, 2012 at 8:14 PM, Itamar Syn-Hershko > wrote: > > Not what I'm seeing. I actually see a lot of segments created and merged > > while it operates. Expected? > > > > Reminding you, this is 2.9.4 / 3.0.3 > > > > On Fri, Jun 15, 2012 at 3:10 AM, Michael McCandless > > wrote: > >> > >> Right: Lucene never autocommits anymore ... > >> > >> If you create a new index, add a bunch of docs, and things crash > >> before you have a chance to commit, then there is no index (not even a > >> 0 doc one) in that directory. > >> > >> Mike McCandless > >> > >> http://blog.mikemccandless.com > >> > >> On Thu, Jun 14, 2012 at 1:41 PM, Itamar Syn-Hershko > > >> wrote: > >> > I'm quite certain this shouldn't happen also when Commit wasn't > called. > >> > > >> > Mike, can you comment on that? 
> >> > > >> > On Thu, Jun 14, 2012 at 8:03 PM, Christopher Currens > >> > wrote: > >> >> > >> >> Well, the only thing I see is that there is no place where > >> >> writer.Commit() > >> >> is called in the delegate assigned to corpusReader.OnDocument. I > know > >> >> that > >> >> lucene is very transactional, and at least in 3.x, the writer will > >> >> never > >> >> auto commit to the index. You can write millions of documents, but > if > >> >> commit is never called, those documents aren't actually part of the > >> >> index. > >> >> Committing isn't a cheap operation, so you definitely don't want to > do > >> >> it > >> >> on every document. > >> >> > >> >> You can test it yourself with this (naive) solution. Right below the > >> >> writer.SetUseCompoundFile(false) line, add "int numDocsAdded = 0;". > At > >> >> the > >> >> end of the corpusReader.OnDocument delegate add: > >> >> > >> >> // Example only. I wouldn't suggest committing this often > >> >> if(++numDocsAdded % 5 == 0) > >> >> { > >> >>writer.Commit(); > >> >> } > >> >> > >> >> I had the application crash for real on this file: > >> >> > >> >> > >> >> > http://dumps.wikimedia.org/gawiktionary/20120613/gawiktionary-20120613-pages-meta-history.xml.bz2 > , > >> >> about 20% into the operation. Without the commit, the index is > empty. > >> >> Add > >> >> it in, and I get 755 files in the index after it crashes. > >> >> > >> >> > >> >> Thanks, > >> >> Christopher > >> >> > >> >> On Wed, Jun 13, 2012 at 6:13 PM, Itamar Syn-Hershko > >> >> wrote: > >> >> > >> >> > >> >> > Yes, reproduced in first try. See attached program - I referenced > it > >> >> > to > >> >> > current trunk. > >> >> > > >> >> > > >> >> > On Thu, Jun 14, 2012 at 3:54 AM, Itamar Syn-Hershko > >> >> > wrote: > >> >> > > >> >> >> Christopher, > >> >> >> > >> >> >> I used the IndexBuilder app from here > >> >> >> https://github.com/synhershko/Talks/tree/master/LuceneNeatThings > >> >> >> with a > >> >> >> 8.5GB wikipedia dump. 
> >> >> >> > >> >> >> After running for 2.5 days I had to forcefully close it (infinite > >> >> >> loop > >> >> >> in > >> >> >> the wiki-markdown parser at 92%, go figure), and the 40-something > GB > >> >> >> index > >> >> >> I had by then was unusable. I then was able to reproduce this > >> >> >> > >> >> >> Please note I now added a few safe-guards you might want to remove > >> >> >> to > >> >> >> make sure the app really crashes on process kill. > >> >> >> > >> >> >> I'll try to come up with a better way to reproduce this - > hopefully > >> >> >> Mike > >> >> >> will be able to suggest better ways than manual process kill... > >> >> >> > >> >> >> On Thu, Jun 14, 2012 at 1:41 AM, Christopher Currens < > >> >> >> currens.ch...@gmail.com> wrote: > >> >> >> > >> >> >>> Mike, The codebase for lucene.net should be almost identical to > >> >> >>> java's > >> >> >>> 3.0.3 release, and LUCENE-1044 is included in that. > >> >> >>> > >> >> >>> Itamar, are you committing the index regularly? I only ask > because > >> >> >>> I > >> >> >>> can't > >> >> >>> reproduce it myself by forcibly terminating the process while > it
[jira] [Updated] (LUCENE-4145) "Unhandled exception" from test framework (in json parsing of test output files?)
[ https://issues.apache.org/jira/browse/LUCENE-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated LUCENE-4145: - Summary: "Unhandled exception" from test framework (in json parsing of test output files?) (was: "Unhandled exception" from test framework?) FWIW: i can reproduce this fairly trivially ... let me know if you want me to capture anything in particular. > "Unhandled exception" from test framework (in json parsing of test output > files?) > - > > Key: LUCENE-4145 > URL: https://issues.apache.org/jira/browse/LUCENE-4145 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Hoss Man >Assignee: Dawid Weiss > > Working on SOLR-3267 i got a weird exception printed to the junit output... > {noformat} >[junit4] Unhandled exception in thread: Thread[pumper-events,5,main] >[junit4] > com.carrotsearch.ant.tasks.junit4.dependencies.com.google.gson.JsonParseException: > No such reference: id#org.apache.solr.search.TestSort[3] > ... > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4145) "Unhandled exception" from test framework?
[ https://issues.apache.org/jira/browse/LUCENE-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295404#comment-13295404 ] Hoss Man commented on LUCENE-4145: -- Execution was... {noformat} hossman@bester:~/lucene/dev/solr/core$ ant test -Dtests.iters=10 -Dtestcase=TestSort -Dtestmethod=testRandomFieldNameSorts ... validate: common.test: [junit4] says aloha! Master seed: D6A9197BD551566E [junit4] Executing 1 suite with 1 JVM. [junit4] Unhandled exception in thread: Thread[pumper-events,5,main] [junit4] com.carrotsearch.ant.tasks.junit4.dependencies.com.google.gson.JsonParseException: No such reference: id#org.apache.solr.search.TestSort[3] [junit4] at com.carrotsearch.ant.tasks.junit4.events.json.JsonDescriptionAdapter.deserialize(JsonDescriptionAdapter.java:90) [junit4] at com.carrotsearch.ant.tasks.junit4.events.json.JsonDescriptionAdapter.deserialize(JsonDescriptionAdapter.java:15) [junit4] at com.carrotsearch.ant.tasks.junit4.dependencies.com.google.gson.JsonDeserializerExceptionWrapper.deserialize(JsonDeserializerExceptionWrapper.java:51) [junit4] at com.carrotsearch.ant.tasks.junit4.dependencies.com.google.gson.GsonToMiniGsonTypeAdapterFactory$3.read(GsonToMiniGsonTypeAdapterFactory.java:85) [junit4] at com.carrotsearch.ant.tasks.junit4.dependencies.com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$1.read(ReflectiveTypeAdapterFactory.java:86) [junit4] at com.carrotsearch.ant.tasks.junit4.dependencies.com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:170) [junit4] at com.carrotsearch.ant.tasks.junit4.dependencies.com.google.gson.Gson.fromJson(Gson.java:720) [junit4] at com.carrotsearch.ant.tasks.junit4.events.Deserializer.deserialize(Deserializer.java:31) [junit4] at com.carrotsearch.ant.tasks.junit4.LocalSlaveStreamHandler.pumpEvents(LocalSlaveStreamHandler.java:100) [junit4] at 
com.carrotsearch.ant.tasks.junit4.LocalSlaveStreamHandler$1.run(LocalSlaveStreamHandler.java:73) [junit4] at java.lang.Thread.run(Thread.java:662) ... {noformat} ...and the (ant) process is still running, but no files in solr/build/solr-core/test have been modified in over 20 minutes. > "Unhandled exception" from test framework? > -- > > Key: LUCENE-4145 > URL: https://issues.apache.org/jira/browse/LUCENE-4145 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Hoss Man >Assignee: Dawid Weiss > > Working on SOLR-3267 i got a weird exception printed to the junit output... > {noformat} >[junit4] Unhandled exception in thread: Thread[pumper-events,5,main] >[junit4] > com.carrotsearch.ant.tasks.junit4.dependencies.com.google.gson.JsonParseException: > No such reference: id#org.apache.solr.search.TestSort[3] > ... > {noformat}
[jira] [Commented] (SOLR-3542) Highlighter: Integration of LUCENE-4133 (Part of LUCENE-3440)
[ https://issues.apache.org/jira/browse/SOLR-3542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295403#comment-13295403 ] Koji Sekiguchi commented on SOLR-3542: -- Patch looks good! Will commit soon. > Highlighter: Integration of LUCENE-4133 (Part of LUCENE-3440) > - > > Key: SOLR-3542 > URL: https://issues.apache.org/jira/browse/SOLR-3542 > Project: Solr > Issue Type: Improvement > Components: highlighter >Affects Versions: 4.0 >Reporter: Sebastian Lutze >Assignee: Koji Sekiguchi >Priority: Minor > Labels: FastVectorHighlighter, highlight, patch > Fix For: 4.0 > > Attachments: SOLR-3542.patch > > > This patch integrates a weight-based approach for sorting highlighted > fragments. > See LUCENE-4133 (Part of LUCENE-3440). > This patch contains: > - Introduction of class WeightedFragListBuilder, an implementation of > SolrFragListBuilder > - Updated example-configuration
[jira] [Created] (LUCENE-4145) "Unhandled exception" from test framework?
Hoss Man created LUCENE-4145: Summary: "Unhandled exception" from test framework? Key: LUCENE-4145 URL: https://issues.apache.org/jira/browse/LUCENE-4145 Project: Lucene - Java Issue Type: Improvement Reporter: Hoss Man Assignee: Dawid Weiss Working on SOLR-3267 i got a weird exception printed to the junit output... {noformat} [junit4] Unhandled exception in thread: Thread[pumper-events,5,main] [junit4] com.carrotsearch.ant.tasks.junit4.dependencies.com.google.gson.JsonParseException: No such reference: id#org.apache.solr.search.TestSort[3] ... {noformat}
Re: Corrupt index
Hmm, OK: in 2.9.4 / 3.0.x, if you open IW on a new directory, it will make a zero-segment commit. This was changed/fixed in 3.1 with LUCENE-2386. In 2.9.x (not 3.0.x) there is still an autoCommit parameter, defaulting to false, but if you set it to true then IndexWriter will periodically commit. Seeing segment files created and merge is definitely expected, but it's not expected to see segments_N files unless you pass autoCommit=true. Mike McCandless http://blog.mikemccandless.com On Thu, Jun 14, 2012 at 8:14 PM, Itamar Syn-Hershko wrote: > Not what I'm seeing. I actually see a lot of segments created and merged > while it operates. Expected? > > Reminding you, this is 2.9.4 / 3.0.3 > > On Fri, Jun 15, 2012 at 3:10 AM, Michael McCandless > wrote: >> >> Right: Lucene never autocommits anymore ... >> >> If you create a new index, add a bunch of docs, and things crash >> before you have a chance to commit, then there is no index (not even a >> 0 doc one) in that directory. >> >> Mike McCandless >> >> http://blog.mikemccandless.com >> >> On Thu, Jun 14, 2012 at 1:41 PM, Itamar Syn-Hershko >> wrote: >> > I'm quite certain this shouldn't happen also when Commit wasn't called. >> > >> > Mike, can you comment on that? >> > >> > On Thu, Jun 14, 2012 at 8:03 PM, Christopher Currens >> > wrote: >> >> >> >> Well, the only thing I see is that there is no place where >> >> writer.Commit() >> >> is called in the delegate assigned to corpusReader.OnDocument. I know >> >> that >> >> lucene is very transactional, and at least in 3.x, the writer will >> >> never >> >> auto commit to the index. You can write millions of documents, but if >> >> commit is never called, those documents aren't actually part of the >> >> index. >> >> Committing isn't a cheap operation, so you definitely don't want to do >> >> it >> >> on every document. >> >> >> >> You can test it yourself with this (naive) solution. Right below the >> >> writer.SetUseCompoundFile(false) line, add "int numDocsAdded = 0;". 
At >> >> the >> >> end of the corpusReader.OnDocument delegate add: >> >> >> >> // Example only. I wouldn't suggest committing this often >> >> if(++numDocsAdded % 5 == 0) >> >> { >> >> writer.Commit(); >> >> } >> >> >> >> I had the application crash for real on this file: >> >> >> >> >> >> http://dumps.wikimedia.org/gawiktionary/20120613/gawiktionary-20120613-pages-meta-history.xml.bz2, >> >> about 20% into the operation. Without the commit, the index is empty. >> >> Add >> >> it in, and I get 755 files in the index after it crashes. >> >> >> >> >> >> Thanks, >> >> Christopher >> >> >> >> On Wed, Jun 13, 2012 at 6:13 PM, Itamar Syn-Hershko >> >> wrote: >> >> >> >> >> >> > Yes, reproduced in first try. See attached program - I referenced it >> >> > to >> >> > current trunk. >> >> > >> >> > >> >> > On Thu, Jun 14, 2012 at 3:54 AM, Itamar Syn-Hershko >> >> > wrote: >> >> > >> >> >> Christopher, >> >> >> >> >> >> I used the IndexBuilder app from here >> >> >> https://github.com/synhershko/Talks/tree/master/LuceneNeatThings >> >> >> with a >> >> >> 8.5GB wikipedia dump. >> >> >> >> >> >> After running for 2.5 days I had to forcefully close it (infinite >> >> >> loop >> >> >> in >> >> >> the wiki-markdown parser at 92%, go figure), and the 40-something GB >> >> >> index >> >> >> I had by then was unusable. I then was able to reproduce this >> >> >> >> >> >> Please note I now added a few safe-guards you might want to remove >> >> >> to >> >> >> make sure the app really crashes on process kill. >> >> >> >> >> >> I'll try to come up with a better way to reproduce this - hopefully >> >> >> Mike >> >> >> will be able to suggest better ways than manual process kill... >> >> >> >> >> >> On Thu, Jun 14, 2012 at 1:41 AM, Christopher Currens < >> >> >> currens.ch...@gmail.com> wrote: >> >> >> >> >> >>> Mike, The codebase for lucene.net should be almost identical to >> >> >>> java's >> >> >>> 3.0.3 release, and LUCENE-1044 is included in that. 
>> >> >>> >> >> >>> Itamar, are you committing the index regularly? I only ask because >> >> >>> I >> >> >>> can't >> >> >>> reproduce it myself by forcibly terminating the process while it's >> >> >>> indexing. I've tried both 3.0.3 and 2.9.4. If I don't commit at >> >> >>> all >> >> >>> and >> >> >>> terminate the process (even with a 10,000 4K documents created), >> >> >>> there >> >> >>> will >> >> >>> be no documents in the index when I open it in luke, which I >> >> >>> expect. >> >> >>> If >> >> >>> I >> >> >>> commit at 10,000 documents, and terminate it a few thousand after >> >> >>> that, >> >> >>> the >> >> >>> index has the first ten thousand that were committed. I've even >> >> >>> terminated >> >> >>> it *while* a second commit was taking place, and it still had all >> >> >>> of >> >> >>> the >> >> >>> documents I expected. >> >> >>> >> >> >>> It may be that I'm not trying to reproducing it correctly. Do you >> >> >>> have a >> >> >>> minimal amount of code that can reproduce it? >>
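The pattern Christopher describes in this thread — count documents as they are added and call commit every N documents, since anything added but never committed is not part of the index after a crash — can be sketched standalone. The `IndexWriterStub` class below is a hypothetical stand-in used only so the example runs without Lucene; it is not Lucene's real `IndexWriter` API. Only the counting/commit cadence mirrors the thread:

```java
// Standalone sketch of the commit-every-N pattern discussed above.
// IndexWriterStub is a made-up stand-in: like Lucene's IndexWriter in 3.x,
// added documents only become durable/visible once commit() is called.
import java.util.ArrayList;
import java.util.List;

public class PeriodicCommit {
    static class IndexWriterStub {
        private final List<String> pending = new ArrayList<>();
        private final List<String> committed = new ArrayList<>();
        void addDocument(String doc) { pending.add(doc); }
        void commit() { committed.addAll(pending); pending.clear(); }
        int committedCount() { return committed.size(); }
    }

    public static void main(String[] args) {
        IndexWriterStub writer = new IndexWriterStub();
        int commitEvery = 1000;   // tune: commits are expensive, so not per-document
        int numDocsAdded = 0;
        for (int i = 0; i < 3500; i++) {
            writer.addDocument("doc-" + i);
            if (++numDocsAdded % commitEvery == 0) {
                writer.commit();  // make the last batch durable
            }
        }
        // If the process were killed here, only committed batches survive:
        System.out.println(writer.committedCount()); // prints 3000, not 3500
    }
}
```

The trade-off is exactly the one discussed in the thread: committing per document is prohibitively slow, while never committing means a crash leaves an empty (or stale) index, so a batch interval is chosen in between.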
[jira] [Assigned] (SOLR-3267) TestSort failures (reproducible)
[ https://issues.apache.org/jira/browse/SOLR-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man reassigned SOLR-3267: -- Assignee: Hoss Man > TestSort failures (reproducible) > > > Key: SOLR-3267 > URL: https://issues.apache.org/jira/browse/SOLR-3267 > Project: Solr > Issue Type: Bug >Reporter: Dawid Weiss >Assignee: Hoss Man > Fix For: 4.0 > > > {noformat} > Over 0.2% oddities in test: 14/6386 have func/query parsing semenatics gotten > broader? > {noformat} > Huh? Steps to reproduce: > {noformat} > ant test -Dtestcase=TestSort -Dtestmethod=testRandomFieldNameSorts > -Dtests.seed=-3e789c8564f08cbd:515c61b079794ea7:-6347ac0df7ad45c0 > -Dargs="-Dfile.encoding=UTF-8" > [junit] Testcase: > testRandomFieldNameSorts(org.apache.solr.search.TestSort):FAILED > [junit] Over 0.2% oddities in test: 14/6386 have func/query parsing > semenatics gotten broader? > [junit] junit.framework.AssertionFailedError: Over 0.2% oddities in test: > 14/6386 have func/query parsing semenatics gotten broader? 
> [junit] at org.junit.Assert.fail(Assert.java:93) > [junit] at org.junit.Assert.assertTrue(Assert.java:43) > [junit] at > org.apache.solr.search.TestSort.testRandomFieldNameSorts(TestSort.java:145) > [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > [junit] at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > [junit] at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > [junit] at java.lang.reflect.Method.invoke(Method.java:597) > [junit] at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) > [junit] at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) > [junit] at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) > [junit] at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) > [junit] at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) > [junit] at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30) > [junit] at > org.apache.lucene.util.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:63) > [junit] at > org.apache.lucene.util.LuceneTestCase$SubclassSetupTeardownRule$1.evaluate(LuceneTestCase.java:739) > [junit] at > org.apache.lucene.util.LuceneTestCase$InternalSetupTeardownRule$1.evaluate(LuceneTestCase.java:655) > [junit] at > org.apache.lucene.util.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:69) > [junit] at > org.apache.lucene.util.LuceneTestCase$TestResultInterceptorRule$1.evaluate(LuceneTestCase.java:566) > [junit] at > org.apache.lucene.util.UncaughtExceptionsRule$1.evaluate(UncaughtExceptionsRule.java:75) > [junit] at > org.apache.lucene.util.LuceneTestCase$RememberThreadRule$1.evaluate(LuceneTestCase.java:628) > [junit] at org.junit.rules.RunRules.evaluate(RunRules.java:18) > [junit] at > 
org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) > [junit] at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) > [junit] at > org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:164) > [junit] at > org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57) > [junit] at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) > [junit] at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) > [junit] at > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) > [junit] at > org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) > [junit] at > org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) > [junit] at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) > [junit] at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30) > [junit] at > org.apache.lucene.util.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:63) > [junit] at > org.apache.lucene.util.UncaughtExceptionsRule$1.evaluate(UncaughtExceptionsRule.java:75) > [junit] at > org.apache.lucene.util.StoreClassNameRule$1.evaluate(StoreClassNameRule.java:38) > [junit] at > org.apache.lucene.util.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:69) > [junit] at org.junit.rules.RunRules.evaluate(RunRules.java:18) > [junit] at org.junit.runners.ParentR
Re: Corrupt index
Not what I'm seeing. I actually see a lot of segments created and merged while it operates. Expected? Reminding you, this is 2.9.4 / 3.0.3 On Fri, Jun 15, 2012 at 3:10 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > Right: Lucene never autocommits anymore ... > > If you create a new index, add a bunch of docs, and things crash > before you have a chance to commit, then there is no index (not even a > 0 doc one) in that directory. > > Mike McCandless > > http://blog.mikemccandless.com > > On Thu, Jun 14, 2012 at 1:41 PM, Itamar Syn-Hershko > wrote: > > I'm quite certain this shouldn't happen also when Commit wasn't called. > > > > Mike, can you comment on that? > > > > On Thu, Jun 14, 2012 at 8:03 PM, Christopher Currens > > wrote: > >> > >> Well, the only thing I see is that there is no place where > writer.Commit() > >> is called in the delegate assigned to corpusReader.OnDocument. I know > >> that > >> lucene is very transactional, and at least in 3.x, the writer will never > >> auto commit to the index. You can write millions of documents, but if > >> commit is never called, those documents aren't actually part of the > index. > >> Committing isn't a cheap operation, so you definitely don't want to do > it > >> on every document. > >> > >> You can test it yourself with this (naive) solution. Right below the > >> writer.SetUseCompoundFile(false) line, add "int numDocsAdded = 0;". At > >> the > >> end of the corpusReader.OnDocument delegate add: > >> > >> // Example only. I wouldn't suggest committing this often > >> if(++numDocsAdded % 5 == 0) > >> { > >>writer.Commit(); > >> } > >> > >> I had the application crash for real on this file: > >> > >> > http://dumps.wikimedia.org/gawiktionary/20120613/gawiktionary-20120613-pages-meta-history.xml.bz2 > , > >> about 20% into the operation. Without the commit, the index is empty. > >> Add > >> it in, and I get 755 files in the index after it crashes. 
> >> > >> > >> Thanks, > >> Christopher > >> > >> On Wed, Jun 13, 2012 at 6:13 PM, Itamar Syn-Hershko > >> wrote: > >> > >> > >> > Yes, reproduced in first try. See attached program - I referenced it > to > >> > current trunk. > >> > > >> > > >> > On Thu, Jun 14, 2012 at 3:54 AM, Itamar Syn-Hershko > >> > wrote: > >> > > >> >> Christopher, > >> >> > >> >> I used the IndexBuilder app from here > >> >> https://github.com/synhershko/Talks/tree/master/LuceneNeatThingswith a > >> >> 8.5GB wikipedia dump. > >> >> > >> >> After running for 2.5 days I had to forcefully close it (infinite > loop > >> >> in > >> >> the wiki-markdown parser at 92%, go figure), and the 40-something GB > >> >> index > >> >> I had by then was unusable. I then was able to reproduce this > >> >> > >> >> Please note I now added a few safe-guards you might want to remove to > >> >> make sure the app really crashes on process kill. > >> >> > >> >> I'll try to come up with a better way to reproduce this - hopefully > >> >> Mike > >> >> will be able to suggest better ways than manual process kill... > >> >> > >> >> On Thu, Jun 14, 2012 at 1:41 AM, Christopher Currens < > >> >> currens.ch...@gmail.com> wrote: > >> >> > >> >>> Mike, The codebase for lucene.net should be almost identical to > java's > >> >>> 3.0.3 release, and LUCENE-1044 is included in that. > >> >>> > >> >>> Itamar, are you committing the index regularly? I only ask because > I > >> >>> can't > >> >>> reproduce it myself by forcibly terminating the process while it's > >> >>> indexing. I've tried both 3.0.3 and 2.9.4. If I don't commit at > all > >> >>> and > >> >>> terminate the process (even with a 10,000 4K documents created), > there > >> >>> will > >> >>> be no documents in the index when I open it in luke, which I expect. > >> >>> If > >> >>> I > >> >>> commit at 10,000 documents, and terminate it a few thousand after > >> >>> that, > >> >>> the > >> >>> index has the first ten thousand that were committed. 
I've even > >> >>> terminated > >> >>> it *while* a second commit was taking place, and it still had all of > >> >>> the > >> >>> documents I expected. > >> >>> > >> >>> It may be that I'm not trying to reproducing it correctly. Do you > >> >>> have a > >> >>> minimal amount of code that can reproduce it? > >> >>> > >> >>> > >> >>> Thanks, > >> >>> Christopher > >> >>> > >> >>> On Wed, Jun 13, 2012 at 9:31 AM, Michael McCandless < > >> >>> luc...@mikemccandless.com> wrote: > >> >>> > >> >>> > Hi Itamar, > >> >>> > > >> >>> > One quick question: does Lucene.Net include the fixes done for > >> >>> > LUCENE-1044 (to fsync files on commit)? Those are very important > >> >>> > for > >> >>> > an index to be intact after OS/JVM crash or power loss. > >> >>> > > >> >>> > More responses below: > >> >>> > > >> >>> > On Tue, Jun 12, 2012 at 8:20 PM, Itamar Syn-Hershko < > >> >>> ita...@code972.com> > >> >>> > wrote: > >> >>> > > >> >>> > > I'm a Lucene.Net committer, and there is a chance we have a bug > in > >> >>> our > >> >>> > > FSDirectory impleme
Re: Corrupt index
Right: Lucene never autocommits anymore ... If you create a new index, add a bunch of docs, and things crash before you have a chance to commit, then there is no index (not even a 0 doc one) in that directory. Mike McCandless http://blog.mikemccandless.com On Thu, Jun 14, 2012 at 1:41 PM, Itamar Syn-Hershko wrote: > I'm quite certain this shouldn't happen also when Commit wasn't called. > > Mike, can you comment on that? > > On Thu, Jun 14, 2012 at 8:03 PM, Christopher Currens > wrote: >> >> Well, the only thing I see is that there is no place where writer.Commit() >> is called in the delegate assigned to corpusReader.OnDocument. I know >> that >> lucene is very transactional, and at least in 3.x, the writer will never >> auto commit to the index. You can write millions of documents, but if >> commit is never called, those documents aren't actually part of the index. >> Committing isn't a cheap operation, so you definitely don't want to do it >> on every document. >> >> You can test it yourself with this (naive) solution. Right below the >> writer.SetUseCompoundFile(false) line, add "int numDocsAdded = 0;". At >> the >> end of the corpusReader.OnDocument delegate add: >> >> // Example only. I wouldn't suggest committing this often >> if(++numDocsAdded % 5 == 0) >> { >> writer.Commit(); >> } >> >> I had the application crash for real on this file: >> >> http://dumps.wikimedia.org/gawiktionary/20120613/gawiktionary-20120613-pages-meta-history.xml.bz2, >> about 20% into the operation. Without the commit, the index is empty. >> Add >> it in, and I get 755 files in the index after it crashes. >> >> >> Thanks, >> Christopher >> >> On Wed, Jun 13, 2012 at 6:13 PM, Itamar Syn-Hershko >> wrote: >> >> >> > Yes, reproduced in first try. See attached program - I referenced it to >> > current trunk. 
>> > >> > >> > On Thu, Jun 14, 2012 at 3:54 AM, Itamar Syn-Hershko >> > wrote: >> > >> >> Christopher, >> >> >> >> I used the IndexBuilder app from here >> >> https://github.com/synhershko/Talks/tree/master/LuceneNeatThings with a >> >> 8.5GB wikipedia dump. >> >> >> >> After running for 2.5 days I had to forcefully close it (infinite loop >> >> in >> >> the wiki-markdown parser at 92%, go figure), and the 40-something GB >> >> index >> >> I had by then was unusable. I then was able to reproduce this >> >> >> >> Please note I now added a few safe-guards you might want to remove to >> >> make sure the app really crashes on process kill. >> >> >> >> I'll try to come up with a better way to reproduce this - hopefully >> >> Mike >> >> will be able to suggest better ways than manual process kill... >> >> >> >> On Thu, Jun 14, 2012 at 1:41 AM, Christopher Currens < >> >> currens.ch...@gmail.com> wrote: >> >> >> >>> Mike, The codebase for lucene.net should be almost identical to java's >> >>> 3.0.3 release, and LUCENE-1044 is included in that. >> >>> >> >>> Itamar, are you committing the index regularly? I only ask because I >> >>> can't >> >>> reproduce it myself by forcibly terminating the process while it's >> >>> indexing. I've tried both 3.0.3 and 2.9.4. If I don't commit at all >> >>> and >> >>> terminate the process (even with a 10,000 4K documents created), there >> >>> will >> >>> be no documents in the index when I open it in luke, which I expect. >> >>> If >> >>> I >> >>> commit at 10,000 documents, and terminate it a few thousand after >> >>> that, >> >>> the >> >>> index has the first ten thousand that were committed. I've even >> >>> terminated >> >>> it *while* a second commit was taking place, and it still had all of >> >>> the >> >>> documents I expected. >> >>> >> >>> It may be that I'm not trying to reproducing it correctly. Do you >> >>> have a >> >>> minimal amount of code that can reproduce it? 
>> >>> >> >>> >> >>> Thanks, >> >>> Christopher >> >>> >> >>> On Wed, Jun 13, 2012 at 9:31 AM, Michael McCandless < >> >>> luc...@mikemccandless.com> wrote: >> >>> >> >>> > Hi Itamar, >> >>> > >> >>> > One quick question: does Lucene.Net include the fixes done for >> >>> > LUCENE-1044 (to fsync files on commit)? Those are very important >> >>> > for >> >>> > an index to be intact after OS/JVM crash or power loss. >> >>> > >> >>> > More responses below: >> >>> > >> >>> > On Tue, Jun 12, 2012 at 8:20 PM, Itamar Syn-Hershko < >> >>> ita...@code972.com> >> >>> > wrote: >> >>> > >> >>> > > I'm a Lucene.Net committer, and there is a chance we have a bug in >> >>> our >> >>> > > FSDirectory implementation that causes indexes to get corrupted >> >>> > > when >> >>> > > indexing is cut while the IW is still open. As it roots from some >> >>> > > retroactive fixes you made, I'd appreciate your feedback. >> >>> > > >> >>> > > Correct me if I'm wrong, but by design Lucene should be able to >> >>> recover >> >>> > > rather quickly from power failures or app crashes. Since existing >> >>> segment >> >>> > > files are read only, only new segments that are still being >> >>> > > written >> >>> can >> >>> > get
[jira] [Commented] (LUCENE-4062) More fine-grained control over the packed integer implementation that is chosen
[ https://issues.apache.org/jira/browse/LUCENE-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295395#comment-13295395 ] Michael McCandless commented on LUCENE-4062: Very cool graphs! Somehow you should turn them into a blog post :) > More fine-grained control over the packed integer implementation that is > chosen > --- > > Key: LUCENE-4062 > URL: https://issues.apache.org/jira/browse/LUCENE-4062 > Project: Lucene - Java > Issue Type: Improvement > Components: core/other >Reporter: Adrien Grand >Assignee: Adrien Grand >Priority: Minor > Labels: performance > Fix For: 4.0 > > Attachments: LUCENE-4062-2.patch, LUCENE-4062.patch, > LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, > LUCENE-4062.patch, LUCENE-4062.patch > > > In order to save space, Lucene has two main PackedInts.Mutable implementations, > one that is very fast and is based on a byte/short/integer/long array > (Direct*) and another one which packs bits in a memory-efficient manner > (Packed*). > The packed implementation tends to be much slower than the direct one, which > discourages some Lucene components from using it. On the other hand, if you store > 21-bit integers in a Direct32, this is a space loss of (32-21)/32=35%. > If you are willing to trade some space for speed, you could store 3 of these 21-bit > integers in a long, resulting in an overhead of 1/3 bit per value. One > advantage of this approach is that you never need to read more than one block > to read or write a value, so this can be significantly faster than Packed32 > and Packed64, which always need to read/write two blocks in order to avoid > costly branches. > I ran some tests, and for 1000 21-bit values, this implementation takes > less than 2% more space and has 44% faster writes and 30% faster reads. The > 12-bit version (5 values per block) has the same performance improvement and > a 6% memory overhead compared to the packed implementation. 
> In order to select the best implementation for a given integer size, I wrote > the {{PackedInts.getMutable(valueCount, bitsPerValue, > acceptableOverheadPerValue)}} method. This method selects the fastest > implementation that has less than {{acceptableOverheadPerValue}} wasted bits > per value. For example, if you accept an overhead of 20% > ({{acceptableOverheadPerValue = 0.2f * bitsPerValue}}), which is pretty > reasonable, here is what implementations would be selected: > * 1: Packed64SingleBlock1 > * 2: Packed64SingleBlock2 > * 3: Packed64SingleBlock3 > * 4: Packed64SingleBlock4 > * 5: Packed64SingleBlock5 > * 6: Packed64SingleBlock6 > * 7: Direct8 > * 8: Direct8 > * 9: Packed64SingleBlock9 > * 10: Packed64SingleBlock10 > * 11: Packed64SingleBlock12 > * 12: Packed64SingleBlock12 > * 13: Packed64 > * 14: Direct16 > * 15: Direct16 > * 16: Direct16 > * 17: Packed64 > * 18: Packed64SingleBlock21 > * 19: Packed64SingleBlock21 > * 20: Packed64SingleBlock21 > * 21: Packed64SingleBlock21 > * 22: Packed64 > * 23: Packed64 > * 24: Packed64 > * 25: Packed64 > * 26: Packed64 > * 27: Direct32 > * 28: Direct32 > * 29: Direct32 > * 30: Direct32 > * 31: Direct32 > * 32: Direct32 > * 33: Packed64 > * 34: Packed64 > * 35: Packed64 > * 36: Packed64 > * 37: Packed64 > * 38: Packed64 > * 39: Packed64 > * 40: Packed64 > * 41: Packed64 > * 42: Packed64 > * 43: Packed64 > * 44: Packed64 > * 45: Packed64 > * 46: Packed64 > * 47: Packed64 > * 48: Packed64 > * 49: Packed64 > * 50: Packed64 > * 51: Packed64 > * 52: Packed64 > * 53: Packed64 > * 54: Direct64 > * 55: Direct64 > * 56: Direct64 > * 57: Direct64 > * 58: Direct64 > * 59: Direct64 > * 60: Direct64 > * 61: Direct64 > * 62: Direct64 > Under 32 bits per value, only 13, 17 and 22-26 bits per value would still > choose the slower Packed64 implementation. Allowing a 50% overhead would > prevent the packed implementation from being selected for bits per value under 32. 
> Allowing an overhead of 32 bits per value would make sure that a Direct* > implementation is always selected. > Next steps would be to: > * make lucene components use this {{getMutable}} method and let users decide > what trade-off better suits them, > * write a Packed32SingleBlock implementation if necessary (I didn't do it > because I have no 32-bits computer to test the performance improvements). > I think this would allow more fine-grained control over the speed/space > trade-off, what do you think? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jsp
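The single-block idea described in this issue can be sketched in a few lines. The following is an illustrative sketch only (class and method names are invented, not Lucene's real Packed64SingleBlock implementation): fit as many b-bit values as possible into each 64-bit long, so every read or write touches exactly one block, at the cost of a few wasted bits per block.

```java
// Illustrative sketch of the "single block" packing trade-off from this
// issue: e.g. 3 values of 21 bits per long, wasting 64 - 3*21 = 1 bit per
// block (1/3 bit per value). Hypothetical names, not Lucene's actual API.
public class SingleBlockSketch {
    private final long[] blocks;
    private final int bitsPerValue;
    private final int valuesPerBlock; // e.g. 3 for 21-bit values
    private final long mask;

    public SingleBlockSketch(int valueCount, int bitsPerValue) {
        this.bitsPerValue = bitsPerValue;
        this.valuesPerBlock = 64 / bitsPerValue;
        this.mask = (1L << bitsPerValue) - 1;
        this.blocks = new long[(valueCount + valuesPerBlock - 1) / valuesPerBlock];
    }

    public void set(int index, long value) {
        int b = index / valuesPerBlock;
        int shift = (index % valuesPerBlock) * bitsPerValue;
        // clear the slot, then OR the new value in -- one block touched
        blocks[b] = (blocks[b] & ~(mask << shift)) | ((value & mask) << shift);
    }

    public long get(int index) {
        int shift = (index % valuesPerBlock) * bitsPerValue;
        return (blocks[index / valuesPerBlock] >>> shift) & mask;
    }

    // wasted bits per value: 64/valuesPerBlock - bitsPerValue
    public double overheadBitsPerValue() {
        return 64.0 / valuesPerBlock - bitsPerValue;
    }

    public static void main(String[] args) {
        SingleBlockSketch s = new SingleBlockSketch(1000, 21);
        s.set(0, 12345);
        s.set(1, (1L << 21) - 1);
        System.out.println(s.get(1) + " overhead=" + s.overheadBitsPerValue());
    }
}
```

For 21-bit values this reproduces the overhead quoted above (1/3 bit per value), while a get or set never crosses a block boundary, which is exactly why the issue reports it being faster than Packed64.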
[jira] [Commented] (LUCENE-4132) IndexWriterConfig live settings
[ https://issues.apache.org/jira/browse/LUCENE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295394#comment-13295394 ] Michael McCandless commented on LUCENE-4132: Hmm we are no longer cloning the IWC passed into IW? Maybe we shouldn't remove testReuse? > IndexWriterConfig live settings > --- > > Key: LUCENE-4132 > URL: https://issues.apache.org/jira/browse/LUCENE-4132 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Shai Erera >Assignee: Shai Erera >Priority: Minor > Fix For: 4.0, 5.0 > > Attachments: LUCENE-4132.patch, LUCENE-4132.patch, LUCENE-4132.patch, > LUCENE-4132.patch, LUCENE-4132.patch > > > A while ago there was a discussion about making some IW settings "live" and I > remember that RAM buffer size was one of them. Judging from IW code, I see > that RAM buffer can be changed "live" as IW never caches it. > However, I don't remember which other settings were decided to be "live" and > I don't see any documentation in IW nor IWC for that. IW.getConfig mentions: > {code} > * NOTE: some settings may be changed on the > * returned {@link IndexWriterConfig}, and will take > * effect in the current IndexWriter instance. See the > * javadocs for the specific setters in {@link > * IndexWriterConfig} for details. > {code} > But there's no text on e.g. IWC.setRAMBuffer mentioning that. > I think that it'd be good if we make it easier for users to tell which of the > settings are "live" ones. There are few possible ways to do it: > * Introduce a custom @live.setting tag on the relevant IWC.set methods, and > add special text for them in build.xml > ** Or, drop the tag and just document it clearly. > * Separate IWC to two interfaces, LiveConfig and OneTimeConfig (name > proposals are welcome !), have IWC impl both, and introduce another > IW.getLiveConfig which will return that interface, thereby clearly letting > the user know which of the settings are "live". 
> It'd be good if IWC itself could only expose setXYZ methods for the "live" > settings though. So perhaps, off the top of my head, we can do something like > this: > * Introduce a Config object, which is essentially what IWC is today, and pass > it to IW. > * IW will create a different object, IWC, from that Config, and IW.getConfig > will return IWC. > * IWC itself will only have setXYZ methods for the "live" settings. > It adds another object, but user code doesn't change - it still creates a > Config object when initializing IW, and needs to handle a different type only if it > ever calls IW.getConfig. > Maybe that's not such a bad idea? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
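The split being proposed could look roughly like this; all names here are invented for illustration and do not match Lucene's actual IndexWriterConfig API. The point is that the type system itself documents which settings are "live": only the live subset is reachable from the running writer.

```java
// Hypothetical sketch of the Config / live-config split discussed above.
// The full Config is consumed once at construction; afterwards the writer
// exposes only the "live" subset, so callers cannot touch one-time settings.
public class LiveConfigSketch {
    interface LiveSettings {
        void setRAMBufferSizeMB(double mb);
        double getRAMBufferSizeMB();
    }

    static class Config implements LiveSettings {
        private double ramBufferMB = 16.0;   // live: writer reads it on each use
        private int mergeFactor = 10;        // one-time: fixed after construction
        public void setRAMBufferSizeMB(double mb) { ramBufferMB = mb; }
        public double getRAMBufferSizeMB() { return ramBufferMB; }
        public void setMergeFactor(int mf) { mergeFactor = mf; }
        public int getMergeFactor() { return mergeFactor; }
    }

    static class Writer {
        private final Config config; // not cloned in this sketch
        Writer(Config c) { config = c; }
        // callers only see the live settings; setMergeFactor is unreachable here
        LiveSettings getLiveConfig() { return config; }
    }

    public static void main(String[] args) {
        Writer w = new Writer(new Config());
        w.getLiveConfig().setRAMBufferSizeMB(64.0); // takes effect "live"
        System.out.println(w.getLiveConfig().getRAMBufferSizeMB());
    }
}
```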
[jira] [Commented] (SOLR-2592) Pluggable shard lookup mechanism for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295390#comment-13295390 ] Michael Garski commented on SOLR-2592: -- The reason for requiring the unique id to be hashable is that the distributed real-time get component needs it to retrieve a document based on only the unique id, and that component in turn is required for SolrCloud. Unit tests that exercise the patch thoroughly are still needed and I will be diving into them later this week, so please keep that in mind if you are using this outside of a test environment. > Pluggable shard lookup mechanism for SolrCloud > -- > > Key: SOLR-2592 > URL: https://issues.apache.org/jira/browse/SOLR-2592 > Project: Solr > Issue Type: New Feature > Components: SolrCloud >Affects Versions: 4.0 >Reporter: Noble Paul > Attachments: dbq_fix.patch, pluggable_sharding.patch, > pluggable_sharding_V2.patch > > > If the data in a cloud can be partitioned on some criteria (say range, hash, > attribute value etc.), it will be easy to narrow down the search to a smaller > subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
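The comment above explains why a hashable unique id matters: any node can then map an id to its shard with no lookup table, which is what lets distributed real-time get fetch a document from nothing but its id. A minimal sketch of that kind of router (illustrative only, not the patch's actual plugin interface):

```java
// Minimal sketch of hash-based shard lookup: the shard for a document is a
// pure function of its unique id, so a real-time get can route on id alone.
// Hypothetical code, not the SOLR-2592 patch's API.
public class ShardRouterSketch {
    public static int shardFor(String uniqueId, int numShards) {
        // floorMod keeps the result in [0, numShards) even for negative hashes
        return Math.floorMod(uniqueId.hashCode(), numShards);
    }

    public static void main(String[] args) {
        System.out.println(shardFor("doc-42", 4));
    }
}
```

Because the mapping is deterministic, indexing and real-time get agree on the target shard without any coordination, which is the property the pluggable mechanism has to preserve.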
[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting
[ https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295389#comment-13295389 ] Chris Russell commented on SOLR-2894: - Erik, what revision of solr did you apply the patch to? Did you not encounter the issues I encountered? > Implement distributed pivot faceting > > > Key: SOLR-2894 > URL: https://issues.apache.org/jira/browse/SOLR-2894 > Project: Solr > Issue Type: Improvement >Reporter: Erik Hatcher >Assignee: Erik Hatcher > Fix For: 4.0 > > Attachments: SOLR-2894.patch, SOLR-2894.patch, > distributed_pivot.patch, distributed_pivot.patch > > > Following up on SOLR-792, pivot faceting currently only supports > undistributed mode. Distributed pivot faceting needs to be implemented. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-4.x-Linux-Java7-64 - Build # 100 - Failure!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Linux-Java7-64/100/ 5 tests failed. REGRESSION: org.apache.solr.handler.dataimport.TestSqlEntityProcessorDelta2.testCompositePk_FullImport Error Message: Exception during query Stack Trace: java.lang.RuntimeException: Exception during query at __randomizedtesting.SeedInfo.seed([3A718873E85EA769:6418C6443206F7BF]:0) at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:459) at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:426) at org.apache.solr.handler.dataimport.TestSqlEntityProcessorDelta2.add1document(TestSqlEntityProcessorDelta2.java:85) at org.apache.solr.handler.dataimport.TestSqlEntityProcessorDelta2.testCompositePk_FullImport(TestSqlEntityProcessorDelta2.java:93) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38) at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53) at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36) at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551) Caused by: java.lang.RuntimeException: REQUEST FAILED: xpath=//*[@numFound='1'] xml response was: 010*:* OR add1documentstandard202.2 request was:start=0&q=*:*+OR+add1document&qt=standard&rows=20&version=2.2 at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:452) ... 40 mor
Re: Welcome Adrien Grand as a new Lucene/Solr committer
Welcome to the team, Adrien! -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 7 June 2012, at 20:11, Michael McCandless wrote: > I'm pleased to announce that Adrien Grand has joined our ranks as a > committer. > > He has been contributing various patches to Lucene/Solr, recently to > Lucene's packed ints implementation, giving a nice performance gain in > some cases. For example check out > http://people.apache.org/~mikemccand/lucenebench/TermTitleSort.html > (look for annotation U). > > Adrien, it's tradition that you introduce yourself with a brief bio. > > As soon as your SVN access is set up, you should then be able to add > yourself to the committers list on the website as well. > > Congratulations! > > Mike McCandless > > http://blog.mikemccandless.com > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-1259) scale() function doesn't work in multisegment indexes
[ https://issues.apache.org/jira/browse/SOLR-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man resolved SOLR-1259. Resolution: Fixed Fix Version/s: (was: 4.0) 3.1 Assignee: Yonik Seeley The core bug was evidently fixed long ago, but the issue was left open for future improvements. Those improvements are now tracked in SOLR-3545 > scale() function doesn't work in multisegment indexes > - > > Key: SOLR-1259 > URL: https://issues.apache.org/jira/browse/SOLR-1259 > Project: Solr > Issue Type: Bug >Affects Versions: 1.4 >Reporter: Hoss Man >Assignee: Yonik Seeley > Fix For: 3.1 > > Attachments: SOLR-1259.patch > > > per yonik's comments in an email... > bq. Darn... another SOLR- related issue. scale() will now only scale > per-segment. > ...we either need to fix, or document prior to releasing 1.4 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-3545) make scale function more efficient in multi-segment indexes
Hoss Man created SOLR-3545: -- Summary: make scale function more efficient in multi-segment indexes Key: SOLR-3545 URL: https://issues.apache.org/jira/browse/SOLR-3545 Project: Solr Issue Type: Improvement Reporter: Hoss Man Assignee: Yonik Seeley offshoot of SOLR-1259 where yonik said... bq. ... handle the situation the same as ord()... via top() to pop back to the top level reader. This isn't so bad since scale() was never really production quality anyway, since it doesn't cache the min and max, recomputing them each time. bq. Committed, and moving the remainder of the work (per-segment fieldcache usage, caching min+max) ... [to future] -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
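The inefficiency tracked here is easy to see in a sketch: the scaled value of any document depends on the min and max across ALL segments, so they must be computed over the whole index, and recomputing them on every request is the wasted work that caching would remove. Illustrative code only (not Solr's ScaleFloatFunction):

```java
// Sketch of why scale() wants a cached top-level min/max: one pass over all
// segments yields (min, max), and that pair -- not anything per-segment --
// is what every scaled value depends on. Hypothetical code, not Solr's.
public class ScaleSketch {
    // map v from [min, max] onto [target0, target1]
    static float scale(float v, float min, float max, float target0, float target1) {
        if (max == min) return target0; // degenerate range: everything maps low
        return (v - min) / (max - min) * (target1 - target0) + target0;
    }

    // one pass over all segments; this is the result worth caching
    static float[] minMax(float[][] segments) {
        float min = Float.POSITIVE_INFINITY, max = Float.NEGATIVE_INFINITY;
        for (float[] seg : segments) {
            for (float v : seg) {
                min = Math.min(min, v);
                max = Math.max(max, v);
            }
        }
        return new float[] { min, max };
    }

    public static void main(String[] args) {
        float[][] segs = { { 2f, 10f }, { 4f, 8f } };
        float[] mm = minMax(segs);
        System.out.println(scale(6f, mm[0], mm[1], 0f, 1f));
    }
}
```

Scaling per segment (the SOLR-1259 bug) would use each segment's own min/max and give inconsistent values across segments; computing the global pair once and caching it is the improvement this issue asks for.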
[jira] [Assigned] (SOLR-3041) Solrs using SolrCloud feature for having shared config in ZK, might not all start successfully when started for the first time simultaneously
[ https://issues.apache.org/jira/browse/SOLR-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man reassigned SOLR-3041: -- Assignee: Mark Miller Could you please assess & triage this for 4.0? > Solrs using SolrCloud feature for having shared config in ZK, might not all > start successfully when started for the first time simultaneously > - > > Key: SOLR-3041 > URL: https://issues.apache.org/jira/browse/SOLR-3041 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 4.0 > Environment: Exact version: > https://builds.apache.org/job/Solr-trunk/1718/artifact/artifacts/apache-solr-4.0-2011-12-28_08-33-55.tgz >Reporter: Per Steffensen >Assignee: Mark Miller > Fix For: 4.0 > > Original Estimate: 96h > Remaining Estimate: 96h > > Starting Solr like this > java -DzkHost= -Dbootstrap_confdir=./myproject/conf > -Dcollection.configName=myproject_conf -Dsolr.solr.home=./myproject -jar > start.jar > When not already there (starting Solr for the first time) the content of > ./myproject/conf will be copied by Solr into ZK. That process does not work > very well in parallel, so if the content is not there and I start several > Solrs simultaneously, one or more of them might not start successfully. > I see exceptions like the ones shown below, and the Solrs throwing them will > not work correctly afterwards. > I know that there could be different workarounds, like making sure to always > start one Solr and wait for a while before starting the rest of them, but I > think we should really be more robust in these cases. 
> Regards, Per Steffensen > exception example 1 (the znode causing the problem can be different than > /configs/myproject_conf/protwords.txt) > org.apache.solr.common.cloud.ZooKeeperException: > at > org.apache.solr.core.CoreContainer.initZooKeeper(CoreContainer.java:193) > at org.apache.solr.core.CoreContainer.load(CoreContainer.java:337) > at org.apache.solr.core.CoreContainer.load(CoreContainer.java:294) > at > org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:240) > at > org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:93) > at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97) > at > org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) > at > org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713) > at org.mortbay.jetty.servlet.Context.startContext(Context.java:140) > at > org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282) > at > org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518) > at > org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499) > at > org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) > at > org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152) > at > org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156) > at > org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) > at > org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152) > at > org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) > at > org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130) > at org.mortbay.jetty.Server.doStart(Server.java:224) > at > org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) > at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.mortbay.start.Main.invokeMain(Main.java:194) > at org.mortbay.start.Main.start(Main.java:534) > at org.mortbay.start.Main.start(Main.java:441) > at org.mortbay.start.Main.main(Main.java:119) > Caused by: org.apache.zookeeper.KeeperException$NodeExistsException: > KeeperErrorCode = NodeExists for /configs/myproject_conf/protwords.txt > at org.apache.zookeeper.KeeperException.create(KeeperException.java:110) > at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) > at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637) > at > org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java
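The stack trace above ends in a NodeExistsException, which suggests one robustness fix in the spirit of this report: make the config bootstrap idempotent, treating "node already exists" as success, since a concurrent starter has simply won the race. The sketch below simulates ZooKeeper's create() with a map; it is not the actual SolrZkClient code, and all names are invented.

```java
import java.util.concurrent.ConcurrentHashMap;

// Simulated sketch of an idempotent config upload. ConcurrentHashMap stands
// in for ZooKeeper: putIfAbsent models create() throwing NodeExistsException
// when the znode is already there. Treating "already created" as success
// makes simultaneous first starts safe.
public class ZkBootstrapSketch {
    static final ConcurrentHashMap<String, byte[]> fakeZk = new ConcurrentHashMap<>();

    static void ensureConfigNode(String path, byte[] data) {
        // A non-null return means another Solr already uploaded this node;
        // that is not an error for shared, identical config -- just proceed.
        fakeZk.putIfAbsent(path, data);
    }

    public static void main(String[] args) {
        // two "simultaneous" starters upload the same file; neither fails
        ensureConfigNode("/configs/myproject_conf/protwords.txt", new byte[] { 1 });
        ensureConfigNode("/configs/myproject_conf/protwords.txt", new byte[] { 1 });
        System.out.println(fakeZk.size());
    }
}
```

With the real client the equivalent is catching NodeExistsException around create() and continuing, which removes the need for the "start one Solr first" workaround mentioned in the report.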
[jira] [Updated] (SOLR-3313) Rename "Query Type" to "Request Handler" in SolrJ APIs
[ https://issues.apache.org/jira/browse/SOLR-3313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-3313: --- Component/s: (was: web gui) clients - java Summary: Rename "Query Type" to "Request Handler" in SolrJ APIs (was: Rename "Query Type" to "Request Handler" in API and UI ) The Admin UI was already updated to reflect this in another issue, so clarifying scope of summary to be specific about SolrJ. > Rename "Query Type" to "Request Handler" in SolrJ APIs > -- > > Key: SOLR-3313 > URL: https://issues.apache.org/jira/browse/SOLR-3313 > Project: Solr > Issue Type: Improvement > Components: clients - java >Reporter: David Smiley > Fix For: 4.0 > > > Nobody should speak of "query types" any more; it's "request handlers". I > understand we want to retain the "qt" parameter as such but I think we should > change the names of it wherever we can find it. We can leave some older API > methods in place as deprecated. > As an example, in SolrJ I have to call solrQuery.setQueryType("/blah") > instead of setRequestHandler() -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
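The "leave some older API methods in place as deprecated" part of this issue is the standard Java shim pattern; a minimal sketch with an invented class name (not the real SolrQuery source):

```java
// Sketch of the proposed rename with a deprecated shim: old callers keep
// compiling (with a warning) while new code uses the clearer name.
// Hypothetical class, for illustration of the deprecation pattern only.
public class QuerySketch {
    private String requestHandler = "/select";

    /** @deprecated use {@link #setRequestHandler(String)} instead */
    @Deprecated
    public void setQueryType(String qt) {
        setRequestHandler(qt); // delegate so behavior cannot drift
    }

    public void setRequestHandler(String handler) {
        this.requestHandler = handler;
    }

    public String getRequestHandler() {
        return requestHandler;
    }

    public static void main(String[] args) {
        QuerySketch q = new QuerySketch();
        q.setQueryType("/blah"); // old name still works via the new one
        System.out.println(q.getRequestHandler());
    }
}
```

Delegating the deprecated method to the new one (rather than duplicating the body) keeps the two names permanently in sync, which is what makes this kind of rename safe to ship.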
[jira] [Updated] (SOLR-725) CoreContainer/CoreDescriptor/SolrCore cleansing
[ https://issues.apache.org/jira/browse/SOLR-725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-725: -- Fix Version/s: (was: 4.1) Removing fix version since this issue hasn't gotten much attention lately and doesn't appear to be a priority for anyone at the moment. As always: if someone wants to take on this work they are welcome to do so at any time and the target release can be revisited > CoreContainer/CoreDescriptor/SolrCore cleansing > --- > > Key: SOLR-725 > URL: https://issues.apache.org/jira/browse/SOLR-725 > Project: Solr > Issue Type: Improvement >Affects Versions: 1.3 >Reporter: Henri Biestro > Attachments: solr-725.patch, solr-725.patch, solr-725.patch, > solr-725.patch > > > These 3 classes and the name vs alias handling are somewhat confusing. > The recent SOLR-647 & SOLR-716 have created a bit of flux. > This issue attempts to clarify the model and the list of operations. > h3. CoreDescriptor: describes the parameters of a SolrCore > h4. Definitions > * has one name > ** The CoreDescriptor name may represent multiple aliases; in that > case, first alias is the SolrCore name > * has one instance directory location > * has one config & schema name > h4. Operations > The class is only a parameter passing facility > h3. SolrCore: manages a Lucene index > h4. Definitions > * has one unique *name* (in the CoreContainer) > **the *name* is used in JMX to identify the core > * has one current set of *aliases* > **the name is the first alias > h4. Name & alias operations > * *get name/aliases*: obvious > * *alias*: adds an alias to this SolrCore > * *unalias*: removes an alias from this SolrCore > * *name*: sets the SolrCore name > **potentially impacts JMX registration > * *rename*: picks a new name from the SolrCore aliases > **triggered when alias name is already in use > h3. CoreContainer: manages all relations between cores & descriptors > h4. 
Definitions > * has a set of aliases (each of them pointing to one core) > **ensure alias uniqueness. > h4. SolrCore instance operations > * *load*: makes a SolrCore available for requests > **creates a SolrCore > **registers all SolrCore aliases in the aliases set > **(load = create + register) > * *unload*: removes a core identified by one of its aliases > **stops handling the Lucene index > **all SolrCore aliases are removed > * *reload*: recreate the core identified by one of its aliases > * *create*: create a core from a CoreDescriptor > **readies up the Lucene index > * *register*: registers all aliases of a SolrCore > > h4. SolrCore alias operations > * *swap*: swaps 2 aliases > **method: swap > * *alias*: creates 1 alias for a core, potentially unaliasing a > previously used alias > **The SolrCore name being an alias, this operation might trigger > a SolrCore rename > * *unalias*: removes 1 alias for a core > **The SolrCore name being an alias, this operation might trigger > a SolrCore rename > * *rename*: renames a core > h3. CoreAdminHandler: handles CoreContainer operations > * *load*/*create*: CoreContainer load > * *unload*: CoreContainer unload > * *reload*: CoreContainer reload > * *swap*: CoreContainer swap > * *alias*: CoreContainer alias > * *unalias*: CoreContainer unalias > * *rename*: CoreContainer rename > * *persist*: CoreContainer persist, writes the solr.xml > * *status*: returns the status of all/one SolrCore -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
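The alias-uniqueness and swap operations in the model above reduce to a small invariant over a single map. A sketch with invented names (not Solr's actual CoreContainer):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the CoreContainer alias model described in this issue: one map
// enforces alias uniqueness (each alias points at exactly one core), and
// swap() is just two map writes. Hypothetical, not Solr's real API.
public class AliasSketch {
    static class Core {
        final String id;
        Core(String id) { this.id = id; }
    }

    private final Map<String, Core> aliases = new HashMap<>();

    // alias: point name at core; if the name was in use, the previous core
    // silently loses that alias (the "potentially unaliasing" case above)
    void alias(String name, Core core) {
        aliases.put(name, core);
    }

    // swap: exchange the cores behind two aliases
    void swap(String a, String b) {
        Core ca = aliases.get(a), cb = aliases.get(b);
        aliases.put(a, cb);
        aliases.put(b, ca);
    }

    Core lookup(String name) {
        return aliases.get(name);
    }

    public static void main(String[] args) {
        AliasSketch c = new AliasSketch();
        c.alias("live", new Core("core1"));
        c.alias("staging", new Core("core2"));
        c.swap("live", "staging"); // atomic cutover in the real container
        System.out.println(c.lookup("live").id);
    }
}
```

Keeping all aliases in one container-level map is what makes the uniqueness guarantee cheap to enforce; the subtle cases the issue lists (rename when a core's first alias is removed) are policy layered on top of this same map.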
[jira] [Updated] (SOLR-731) CoreDescriptor.getCoreContainer should not be public
[ https://issues.apache.org/jira/browse/SOLR-731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-731: -- Fix Version/s: (was: 4.0) Removing fix version since this issue hasn't gotten much attention lately and doesn't appear to be a priority for anyone for 4.0. As always: if someone wants to take on this work they are welcome to do so at any time and the target release can be revisited In particular: I note that SolrCore.getCoreDescriptor and CoreDescriptor.getCoreContainer both seem to be fairly widely used now throughout the code base, so it's not clear to me that the intent/belief stated in this issue is still valid. > CoreDescriptor.getCoreContainer should not be public > > > Key: SOLR-731 > URL: https://issues.apache.org/jira/browse/SOLR-731 > Project: Solr > Issue Type: Bug >Affects Versions: 1.3 >Reporter: Henri Biestro > Attachments: solr-731.patch > > > For the very same reasons that CoreDescriptor.getCoreProperties did not need > to be public (aka SOLR-724) > It also means the CoreDescriptor ctor should not need a CoreContainer > The CoreDescriptor is only meant to be describing a "to-be created SolrCore". > However, we need access to the CoreContainer from the SolrCore now that we > are guaranteed the CoreContainer always exists. > This is also a natural consequence of SOLR-647 now that the CoreContainer is > not a map of CoreDescriptor but a map of SolrCore. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-4082) Implement explain in ToParentBlockJoinQuery$BlockJoinWeight
[ https://issues.apache.org/jira/browse/LUCENE-4082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martijn van Groningen resolved LUCENE-4082. --- Resolution: Fixed Committed to branch4x and trunk. > Implement explain in ToParentBlockJoinQuery$BlockJoinWeight > --- > > Key: LUCENE-4082 > URL: https://issues.apache.org/jira/browse/LUCENE-4082 > Project: Lucene - Java > Issue Type: Improvement > Components: modules/join >Affects Versions: 3.4, 3.5, 3.6 >Reporter: Christoph Kaser >Assignee: Martijn van Groningen >Priority: Minor > Fix For: 4.0, 5.0 > > Attachments: LUCENE-4082.patch, LUCENE-4082.patch > > > At the moment, ToParentBlockJoinQuery$BlockJoinWeight.explain throws an > UnsupportedOperationException. It would be useful if it could instead return > the score of parent document, even if the explanation on how that score was > calculated is missing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Corrupt index
> If this is the case, 2328 probably made its way to Lucene.Net since we are > using the released sources for porting, and we now need to apply 3418 in > the current version. Itamar: I confirmed that 2328 is in the latest code. Thanks, Troy On Wed, Jun 13, 2012 at 5:45 PM, Itamar Syn-Hershko wrote: > Mike, > > On Wed, Jun 13, 2012 at 7:31 PM, Michael McCandless < > luc...@mikemccandless.com> wrote: > >> Hi Itamar, >> >> One quick question: does Lucene.Net include the fixes done for >> LUCENE-1044 (to fsync files on commit)? Those are very important for >> an index to be intact after OS/JVM crash or power loss. >> > > Definitely, as Christopher noted we are about to release a 3.0.3 compatible > version, which is a line-by-line port of the Java version. > > >> You shouldn't even have to run CheckIndex ... because (as of >> LUCENE-1044) we now fsync all segment files before writing the new >> segments_N file, and then removing old segments_N files (and any >> segments that are no longer referenced). >> >> You do have to remove the write.lock if you aren't using >> NativeFSLockFactory (but this has been the default lock impl for a >> while now). >> > > Somewhat unrelated to this thread, but what should I expect to see? From > time to time we do see write.lock present after an app-crash or power > failure. Also, what are the steps that are expected to be performed in such > cases? > > >> >> > Last week I have been playing with rather large indexes and crashed my >> app >> > while it was indexing. I wasn't able to open the index, and Luke was even >> > kind enough to wipe the index folder clean even though I opened it in >> > read-only mode. I re-ran this, and after another crash running CheckIndex >> > revealed nothing - the index was detected to be an empty one. I am not >> > entirely sure what could be the cause for this, but I suspect it has >> > been corrupted by the crash. >> >> Had no commit completed (no segments file written)? 
>> >> If you don't fsync then all sorts of crazy things are possible... >> > > Ok, so we do have fsync since LUCENE-1044 is present, and there were > segments present from previous commits. Any idea what went wrong? > > >> > I've been looking at these: >> > >> > >> https://issues.apache.org/jira/browse/LUCENE-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel >> > >> https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel >> >> (And LUCENE-1044 before that ... it was LUCENE-1044 that >> LUCENE-2328 broke...). >> > > So 2328 broke 1044, and this was fixed only in 3.4, right? so 2328 made it > to a 3.0.x release while the fix for it (3418) was only released in 3.4. Am > I right? > > If this is the case, 2328 probably made its way to Lucene.Net since we are > using the released sources for porting, and we now need to apply 3418 in > the current version. > > Does it make sense to just port FSDirectory from 3.4 to 3.0.3? or were > there API or other changes that will make our life miserable if we do that? > > >> >> > And it seems like this is what I was experiencing. Mike and Mark will >> > probably be able to tell if this is what they saw or not, but as far as I >> > can tell this is not an expected behavior of a Lucene index. >> >> Definitely not expected behavior: assuming nothing is flipping bits, >> then on OS/JVM crash or power loss your index should be fine, just >> reverted to the last successful commit. >> > > What I suspected. Will try to reproduce reliably - any recommendations? not > really feeling like reinventing the wheel here... > > MockDirectoryWrapper wasn't ported yet as it appears to exist only in 3.4, > and as you said it won't really help here anyway > > >> >> > What I'm looking for at the moment is some advice on what FSDirectory >> > implementation to use to make sure no corruption can happen. 
The 3.4 >> version >> > (which is where LUCENE-3418 was committed to) seems to handle a lot of >> > things the 3.0 doesn't, but on the other hand LUCENE-3418 was >> introduced by >> > changes made to the 3.0 codebase. >> >> Hopefully it's just that you are missing fsync! >> >> > Also, is there any test in the suite checking for those scenarios? >> >> Our test framework has a sneaky MockDirectoryWrapper that, after a >> test finishes, goes and corrupts any unsync'd files and then verifies >> the index is still OK... it's good because it'll catch any times we >> are missing calls to sync, but, it's not low level enough such that if >> FSDir is failing to actually call fsync (that was the bug in >> LUCENE-3418) then it won't catch that... >> >> Mike McCandless >> >> http://blog.mikemccandless.com >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: dev-h...@lucene.apache.org >> >> >> - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
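The fsync behavior the thread turns on (added in LUCENE-1044, accidentally dropped by LUCENE-2328, restored in LUCENE-3418) boils down to forcing file contents to stable storage before a commit becomes visible. A plain-Java sketch of that primitive, not Lucene's FSDirectory code:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SyncDemo {
    // Write bytes to a file and force them to stable storage. Without the
    // force (fsync) call, an OS crash or power loss can silently lose the
    // write even though it appeared to succeed. Illustrative only.
    public static void writeAndSync(Path path, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(path,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            ch.write(ByteBuffer.wrap(data));
            // force(true) asks the OS to flush both file data and metadata
            ch.force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("segments_", ".demo");
        writeAndSync(p, "commit data".getBytes());
        System.out.println(new String(Files.readAllBytes(p))); // commit data
        Files.delete(p);
    }
}
```

This mirrors the commit ordering Mike describes: sync the segment files first, only then write the new segments_N file, so a crash can only ever lose the uncommitted tail.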
[jira] [Updated] (LUCENE-4082) Implement explain in ToParentBlockJoinQuery$BlockJoinWeight
[ https://issues.apache.org/jira/browse/LUCENE-4082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martijn van Groningen updated LUCENE-4082: -- Attachment: LUCENE-4082.patch Included explain in the random test. > Implement explain in ToParentBlockJoinQuery$BlockJoinWeight > --- > > Key: LUCENE-4082 > URL: https://issues.apache.org/jira/browse/LUCENE-4082 > Project: Lucene - Java > Issue Type: Improvement > Components: modules/join >Affects Versions: 3.4, 3.5, 3.6 >Reporter: Christoph Kaser >Assignee: Martijn van Groningen >Priority: Minor > Fix For: 4.0, 5.0 > > Attachments: LUCENE-4082.patch, LUCENE-4082.patch > > > At the moment, ToParentBlockJoinQuery$BlockJoinWeight.explain throws an > UnsupportedOperationException. It would be useful if it could instead return > the score of the parent document, even if the explanation of how that score was > calculated is missing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4082) Implement explain in ToParentBlockJoinQuery$BlockJoinWeight
[ https://issues.apache.org/jira/browse/LUCENE-4082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martijn van Groningen updated LUCENE-4082: -- Fix Version/s: 5.0 4.0 Assignee: Martijn van Groningen > Implement explain in ToParentBlockJoinQuery$BlockJoinWeight > --- > > Key: LUCENE-4082 > URL: https://issues.apache.org/jira/browse/LUCENE-4082 > Project: Lucene - Java > Issue Type: Improvement > Components: modules/join >Affects Versions: 3.4, 3.5, 3.6 >Reporter: Christoph Kaser >Assignee: Martijn van Groningen >Priority: Minor > Fix For: 4.0, 5.0 > > Attachments: LUCENE-4082.patch, LUCENE-4082.patch > > > At the moment, ToParentBlockJoinQuery$BlockJoinWeight.explain throws an > UnsupportedOperationException. It would be useful if it could instead return > the score of the parent document, even if the explanation of how that score was > calculated is missing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3534) dismax and edismax should default to "df" when "qf" is absent.
[ https://issues.apache.org/jira/browse/SOLR-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295310#comment-13295310 ] Hoss Man commented on SOLR-3534: The point of the test is to assert that DismaxQParser can function correctly with nothing but a "q" param and a schema specifying a defaultSearchField. If that's covered by another test you're adding (or that already exists) then great, we don't need it. > dismax and edismax should default to "df" when "qf" is absent. > -- > > Key: SOLR-3534 > URL: https://issues.apache.org/jira/browse/SOLR-3534 > Project: Solr > Issue Type: Improvement > Components: query parsers >Affects Versions: 4.0 >Reporter: David Smiley >Assignee: David Smiley >Priority: Minor > Attachments: > SOLR-3534_dismax_and_edismax_should_default_to_df_if_qf_is_absent.patch > > > The dismax and edismax query parsers should default to "df" when the "qf" > parameter is absent. They only use the defaultSearchField in schema.xml as a > fallback now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3544) Under heavy load json response is cut at some arbitrary position
[ https://issues.apache.org/jira/browse/SOLR-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295307#comment-13295307 ] Hoss Man commented on SOLR-3544: can you provide some more details please... 1) what servlet container are you using? 2) how big (in bytes) are the responses when they work? how big are they when "cut off"? 3) does the "cut off" always happen on/around a specific piece of markup? (ie: when closing a list or an object) or in the middle of arbitrary string values? is it possible there are certain byte sequences that always occur just before/at/after the cutoff happens? 4) your blog post mentions... bq. Unfortunately, there was no indication of any malfunction in Solr except for the “Broken Pipe” notification that the client has closed the connection. ...where are you seeing this? in packet sniffing tool? in the solr logs? ... what exactly is the full message? > Under heavy load json response is cut at some arbitrary position > > > Key: SOLR-3544 > URL: https://issues.apache.org/jira/browse/SOLR-3544 > Project: Solr > Issue Type: Bug > Components: search >Affects Versions: 3.1 > Environment: Linux version 2.6.32-5-amd64 (Debian 2.6.32-38) > (b...@decadent.org.uk) (gcc version 4.3.5 (Debian 4.3.5-4) ) >Reporter: Dušan Omerčević > > We query solr for 30K documents using json as the response format. Normally > this works perfectly fine. But when the machine comes under heavy load (all > cores utilized) the response got interrupted at arbitrary position. We > circumvented the problem by switching to xml response format. > I've written the full description here: > http://restreaming.wordpress.com/2012/06/14/the-curious-case-of-solr-malfunction/ -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3534) dismax and edismax should default to "df" when "qf" is absent.
[ https://issues.apache.org/jira/browse/SOLR-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295306#comment-13295306 ] David Smiley commented on SOLR-3534: TestExtendedDismaxParser line 126 already tests that defaultSearchField is consulted. In this patch I added another test above it to ensure that "df" is consulted. > dismax and edismax should default to "df" when "qf" is absent. > -- > > Key: SOLR-3534 > URL: https://issues.apache.org/jira/browse/SOLR-3534 > Project: Solr > Issue Type: Improvement > Components: query parsers >Affects Versions: 4.0 >Reporter: David Smiley >Assignee: David Smiley >Priority: Minor > Attachments: > SOLR-3534_dismax_and_edismax_should_default_to_df_if_qf_is_absent.patch > > > The dismax and edismax query parsers should default to "df" when the "qf" > parameter is absent. They only use the defaultSearchField in schema.xml as a > fallback now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3534) dismax and edismax should default to "df" when "qf" is absent.
[ https://issues.apache.org/jira/browse/SOLR-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295303#comment-13295303 ] Hoss Man commented on SOLR-3534: bq. I'll presume that you don't mean "<defaultSearchField/>" literally, you mean "<defaultSearchField>text</defaultSearchField>". yes, sorry .. i was using "<defaultSearchField/>" as shorthand for <defaultSearchField>...something...</defaultSearchField>, that was bad on my part and totally confusing. bq. So are you effectively saying that schema-minimal.xml should add a defaultSearchField to it? No, i'm saying that as long as "<defaultSearchField/>" is legal and supported configuration, then this specific test (of "dismaxNoDefaults") should use a schema that has a "<defaultSearchField/>" in it since that's the point of the test. schema-minimal.xml should certainly not have a "<defaultSearchField/>" added, since that would no longer truly be a minimal schema.xml > dismax and edismax should default to "df" when "qf" is absent. > -- > > Key: SOLR-3534 > URL: https://issues.apache.org/jira/browse/SOLR-3534 > Project: Solr > Issue Type: Improvement > Components: query parsers >Affects Versions: 4.0 >Reporter: David Smiley >Assignee: David Smiley >Priority: Minor > Attachments: > SOLR-3534_dismax_and_edismax_should_default_to_df_if_qf_is_absent.patch > > > The dismax and edismax query parsers should default to "df" when the "qf" > parameter is absent. They only use the defaultSearchField in schema.xml as a > fallback now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2724) Deprecate defaultSearchField and defaultOperator defined in schema.xml
[ https://issues.apache.org/jira/browse/SOLR-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295299#comment-13295299 ] Hoss Man commented on SOLR-2724: David: looking back at the mailing list ~ 28/Mar/12 it's not clear what exactly was the problem that required reverting at the time ... were the test failures even caused by this specific issue, or something else that you committed right around the same time? Given that we've already created the 4x branch and started pushing towards Alpha, i would at least move forward with making sure trunk & 4x are at parity with 3.6 in terms of the changes to the example and the log/error messages. Depending on what the issue was with the tests we can figure out how we want to move forward from there. bq. I take issue with Yonik's comment "we're not really gaining anything with this change". ... I don't think defaultSearchField & defaultOperator have a need to exist, let alone exist in schema.xml, thereby creating unnecessary complexity in understanding the product – albeit in a small way. I think the question is "if we stop promoting them in the example, and start encouraging an alternative instead, what is gained by actually removing the support in the code for existing users who already have them in the config and upgrade?" It's one thing to say in CHANGES.txt "we've removed feature X because doing so allowed us (add feature|fix bug) Y, so if you used X in the past now you have to use Z instead" but there is no "Y" in this case (that i see) ... we're just telling people "we've removed X because we think Z is better, so if you used X in the past now you have to use Z instead". You may feel it's a complexity for new users to understand why these things are in schema.xml -- which is fine, but isn't removing them from the example schema.xml enough to address that? What is the value gained in removing the ability to use it for existing users who already understand it? 
This is the crux of my suggestion way, way, WAY back in this issue about why i didn't/don't think there was a strong motivation to remove support completely in 4x - an opinion echoed by Yonik & Erick. As evidenced by recent mailing list comments from folks like Bernd & Rohit, there is already clear confusion for existing users just by removing these from the example -- let alone removing support for it from the code. > Deprecate defaultSearchField and defaultOperator defined in schema.xml > -- > > Key: SOLR-2724 > URL: https://issues.apache.org/jira/browse/SOLR-2724 > Project: Solr > Issue Type: Improvement > Components: Schema and Analysis, search >Reporter: David Smiley >Assignee: David Smiley >Priority: Minor > Fix For: 3.6, 4.0 > > Attachments: > SOLR-2724_deprecateDefaultSearchField_and_defaultOperator.patch > > Original Estimate: 2h > Remaining Estimate: 2h > > I've always been surprised to see the <defaultSearchField> and <solrQueryParser defaultOperator> elements defined in the schema.xml file since > the first time I saw them. They just seem out of place to me since they are > more query parser related than schema related. But not only are they > misplaced, I feel they shouldn't exist. For query parsers, we already have a > "df" parameter that works just fine, and explicit field references. And the > default lucene query operator should stay at OR -- if a particular query > wants different behavior then use q.op or simply use "OR". > Seems like something better placed in solrconfig.xml than in the > schema. > In my opinion, defaultSearchField and defaultOperator configuration elements > should be deprecated in Solr 3.x and removed in Solr 4. And these settings should move to solrconfig.xml. I am willing to do it, provided there is > consensus on it of course. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
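The migration this thread argues over can be shown concretely. A sketch of the before and after, assuming a field named "text" (the field name is illustrative; the element and parameter names are standard Solr configuration):

```xml
<!-- Before (schema.xml): the element this issue deprecates -->
<defaultSearchField>text</defaultSearchField>

<!-- After (solrconfig.xml): a per-handler "df" default instead -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="df">text</str>
  </lst>
</requestHandler>
```

The solrconfig.xml form keeps query-parser defaults with the rest of the query configuration, which is the "out of place" complaint in the issue description; the debate above is only about whether the old schema.xml form should also stop working for existing users.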
[jira] [Created] (SOLR-3544) Under heavy load json response is cut at some arbitrary position
Dušan Omerčević created SOLR-3544: - Summary: Under heavy load json response is cut at some arbitrary position Key: SOLR-3544 URL: https://issues.apache.org/jira/browse/SOLR-3544 Project: Solr Issue Type: Bug Components: search Affects Versions: 3.1 Environment: Linux version 2.6.32-5-amd64 (Debian 2.6.32-38) (b...@decadent.org.uk) (gcc version 4.3.5 (Debian 4.3.5-4) ) Reporter: Dušan Omerčević We query solr for 30K documents using json as the response format. Normally this works perfectly fine. But when the machine comes under heavy load (all cores utilized) the response gets interrupted at an arbitrary position. We circumvented the problem by switching to the xml response format. I've written the full description here: http://restreaming.wordpress.com/2012/06/14/the-curious-case-of-solr-malfunction/ -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3534) dismax and edismax should default to "df" when "qf" is absent.
[ https://issues.apache.org/jira/browse/SOLR-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295296#comment-13295296 ] David Smiley commented on SOLR-3534: Hoss, I like your suggestion of refactoring this to SolrPluginUtils (not *Tools which doesn't exist). And also I realized that SolrParams.get() takes a 2nd arg for the default which can be s.getDefaultSearchFieldName(), simplifying this even more. bq. As Bernd noted, that test was written at a time when the schema.xml used by the test had a <defaultSearchField/> declared – that was/is the entire point of the test: that the Dismax(Handler|QParser) could work with a "<defaultSearchField/>" and a "q" and no other params specified. As long as "<defaultSearchField/>" is legal (even if it's deprecated and not mentioned in the example schema.xml) a test like that should exist somewhere shouldn't it? (if/when "<defaultSearchField/>" is no longer legal, then certainly change the test to add a "df" param and assert that it fails if one isn't specified) I'm confused by this, especially since you "+1"'ed on throwing an exception. I'll presume that you don't mean "<defaultSearchField/>" literally, you mean "<defaultSearchField>text</defaultSearchField>". So are you effectively saying that schema-minimal.xml should add a defaultSearchField to it? > dismax and edismax should default to "df" when "qf" is absent. > -- > > Key: SOLR-3534 > URL: https://issues.apache.org/jira/browse/SOLR-3534 > Project: Solr > Issue Type: Improvement > Components: query parsers >Affects Versions: 4.0 >Reporter: David Smiley >Assignee: David Smiley >Priority: Minor > Attachments: > SOLR-3534_dismax_and_edismax_should_default_to_df_if_qf_is_absent.patch > > > The dismax and edismax query parsers should default to "df" when the "qf" > parameter is absent. They only use the defaultSearchField in schema.xml as a > fallback now. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3729) Allow using FST to hold terms data in DocValues.BYTES_*_SORTED
[ https://issues.apache.org/jira/browse/LUCENE-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-3729: Attachment: LUCENE-3729.patch here is a first cut at using FST to hold terms in sorted DocValues. This patch holds all data in an FST and currently doesn't support a direct source, i.e. all FSTs are loaded into memory even during merging. All tests (except BWcompat -- which is good!) pass. I think we can have this as a first step but not as the default? > Allow using FST to hold terms data in DocValues.BYTES_*_SORTED > -- > > Key: LUCENE-3729 > URL: https://issues.apache.org/jira/browse/LUCENE-3729 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Labels: gsoc2012, lucene-gsoc-11 > Attachments: LUCENE-3729.patch, LUCENE-3729.patch, LUCENE-3729.patch, > LUCENE-3729.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (LUCENE-4132) IndexWriterConfig live settings
[ https://issues.apache.org/jira/browse/LUCENE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera reassigned LUCENE-4132: -- Assignee: Shai Erera > IndexWriterConfig live settings > --- > > Key: LUCENE-4132 > URL: https://issues.apache.org/jira/browse/LUCENE-4132 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Shai Erera >Assignee: Shai Erera >Priority: Minor > Fix For: 4.0, 5.0 > > Attachments: LUCENE-4132.patch, LUCENE-4132.patch, LUCENE-4132.patch, > LUCENE-4132.patch, LUCENE-4132.patch > > > A while ago there was a discussion about making some IW settings "live" and I > remember that RAM buffer size was one of them. Judging from IW code, I see > that RAM buffer can be changed "live" as IW never caches it. > However, I don't remember which other settings were decided to be "live" and > I don't see any documentation in IW nor IWC for that. IW.getConfig mentions: > {code} > * NOTE: some settings may be changed on the > * returned {@link IndexWriterConfig}, and will take > * effect in the current IndexWriter instance. See the > * javadocs for the specific setters in {@link > * IndexWriterConfig} for details. > {code} > But there's no text on e.g. IWC.setRAMBuffer mentioning that. > I think that it'd be good if we make it easier for users to tell which of the > settings are "live" ones. There are a few possible ways to do it: > * Introduce a custom @live.setting tag on the relevant IWC.set methods, and > add special text for them in build.xml > ** Or, drop the tag and just document it clearly. > * Separate IWC into two interfaces, LiveConfig and OneTimeConfig (name > proposals are welcome !), have IWC impl both, and introduce another > IW.getLiveConfig which will return that interface, thereby clearly letting > the user know which of the settings are "live". > It'd be good if IWC itself could only expose setXYZ methods for the "live" > settings though. 
So perhaps, off the top of my head, we can do something like > this: > * Introduce a Config object, which is essentially what IWC is today, and pass > it to IW. > * IW will create a different object, IWC from that Config and IW.getConfig > will return IWC. > * IWC itself will only have setXYZ methods for the "live" settings. > It adds another object, but user code doesn't change - it still creates a > Config object when initializing IW, and needs to handle a different type if it > ever calls IW.getConfig. > Maybe that's not such a bad idea? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4132) IndexWriterConfig live settings
[ https://issues.apache.org/jira/browse/LUCENE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295269#comment-13295269 ] Shai Erera commented on LUCENE-4132: Thanks Robert. I'll wait until Sunday and commit it. > IndexWriterConfig live settings > --- > > Key: LUCENE-4132 > URL: https://issues.apache.org/jira/browse/LUCENE-4132 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Shai Erera >Priority: Minor > Fix For: 4.0, 5.0 > > Attachments: LUCENE-4132.patch, LUCENE-4132.patch, LUCENE-4132.patch, > LUCENE-4132.patch, LUCENE-4132.patch > > > A while ago there was a discussion about making some IW settings "live" and I > remember that RAM buffer size was one of them. Judging from IW code, I see > that RAM buffer can be changed "live" as IW never caches it. > However, I don't remember which other settings were decided to be "live" and > I don't see any documentation in IW nor IWC for that. IW.getConfig mentions: > {code} > * NOTE: some settings may be changed on the > * returned {@link IndexWriterConfig}, and will take > * effect in the current IndexWriter instance. See the > * javadocs for the specific setters in {@link > * IndexWriterConfig} for details. > {code} > But there's no text on e.g. IWC.setRAMBuffer mentioning that. > I think that it'd be good if we make it easier for users to tell which of the > settings are "live" ones. There are a few possible ways to do it: > * Introduce a custom @live.setting tag on the relevant IWC.set methods, and > add special text for them in build.xml > ** Or, drop the tag and just document it clearly. > * Separate IWC into two interfaces, LiveConfig and OneTimeConfig (name > proposals are welcome !), have IWC impl both, and introduce another > IW.getLiveConfig which will return that interface, thereby clearly letting > the user know which of the settings are "live". > It'd be good if IWC itself could only expose setXYZ methods for the "live" > settings though. 
So perhaps, off the top of my head, we can do something like > this: > * Introduce a Config object, which is essentially what IWC is today, and pass > it to IW. > * IW will create a different object, IWC from that Config and IW.getConfig > will return IWC. > * IWC itself will only have setXYZ methods for the "live" settings. > It adds another object, but user code doesn't change - it still creates a > Config object when initializing IW, and needs to handle a different type if it > ever calls IW.getConfig. > Maybe that's not such a bad idea? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
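The Config/IWC split sketched in the issue description can be modeled in plain Java. Everything below (Config, LiveConfig, Writer) is a hypothetical illustration of the proposal, not Lucene's actual API:

```java
// Plain-Java sketch of the proposed split: a one-time Config captures all
// settings at writer creation; the object returned by getConfig() exposes
// setters ONLY for "live" settings, so the type system tells users which
// changes take effect. Names are hypothetical, not Lucene's API.
public class LiveConfigDemo {
    // One-time configuration, handed to the writer at construction.
    public static class Config {
        final int maxThreadStates;   // not live: fixed once the writer exists
        double ramBufferMB;          // live: may be changed afterwards
        public Config(int maxThreadStates, double ramBufferMB) {
            this.maxThreadStates = maxThreadStates;
            this.ramBufferMB = ramBufferMB;
        }
    }

    // The "live" view: only live settings get setters here.
    public static class LiveConfig {
        private final Config config;
        LiveConfig(Config config) { this.config = config; }
        public void setRAMBufferMB(double mb) { config.ramBufferMB = mb; }
        public double getRAMBufferMB() { return config.ramBufferMB; }
        public int getMaxThreadStates() { return config.maxThreadStates; }
        // deliberately no setter for maxThreadStates
    }

    // Stand-in for IndexWriter: built from Config, hands back the live view.
    public static class Writer {
        private final LiveConfig live;
        public Writer(Config config) { this.live = new LiveConfig(config); }
        public LiveConfig getConfig() { return live; }
    }

    public static void main(String[] args) {
        Writer w = new Writer(new Config(8, 16.0));
        w.getConfig().setRAMBufferMB(64.0); // takes effect immediately
        System.out.println(w.getConfig().getRAMBufferMB()); // 64.0
    }
}
```

As the comment notes, user code barely changes: it still builds one config object up front, and only code that calls getConfig() sees the narrower live type.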
[jira] [Commented] (SOLR-3534) dismax and edismax should default to "df" when "qf" is absent.
[ https://issues.apache.org/jira/browse/SOLR-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295268#comment-13295268 ] Hoss Man commented on SOLR-3534: bq. dismax&edismax should look at 'df' before falling back to defaultSearchField +1 ... i thought it already did that, but i guess not. If we are "deprecating/discouraging" and instructing people to use "df" instead, then we should absolutely make 100% certain any code path we ship that currently consults <defaultSearchField/> checks "df" first. (if/when the code paths that consult <defaultSearchField/> are removed, they should still consult "df") bq. dismax&edismax should throw an exception if neither 'qf', 'df', nor defaultSearchField are specified, because these two query parsers are fairly useless without them. +1 .. (although i suppose edismax could still be usable if every clause is fully qualified with a fieldname/alias and fail only when a clause that requires a default is encountered ... just like the LuceneQParser) bq. I ran all tests before committing and found the MinimalSchemaTest failed related to the "dismaxNoDefaults" request handler in the test solrconfig.xml which was added in SOLR-1776. The problem is throwing an exception if there's no 'qf', 'df', or default search field. I disagree with that test – it is erroneous/misleading to use dismax without specifying a default via any of those 3 mechanisms. I am inclined to delete the "dismaxNoDefaults" test request handler (assuming there are no other ramifications). I want to get input from Hoss who put it there so I'll wait. As Bernd noted, that test was written at a time when the schema.xml used by the test had a <defaultSearchField/> declared -- that was/is the entire point of the test: that the Dismax(Handler|QParser) could work with a "<defaultSearchField/>" and a "q" and no other params specified. As long as "<defaultSearchField/>" is legal (even if it's deprecated and not mentioned in the example schema.xml) a test like that should exist somewhere, shouldn't it? 
(if/when "" is no longer legal, then certainly change the test to add a "df" param and assert that it fails if one isn't specified) -- The current patch looks like a great start to me ... but i would suggest refactoring this core little bit into it's own method in SolrPluginTools and replacing every use of getDefaultSearchFieldName in the code base with it (and add a link to it from getDefaultSearchFieldName javadocs encouraging people to use it instead)... {code} /** * returns the effective default field based on the params or * hardcoded schema default. may be null if either exists specified. * @see CommonParams#DF * @see IndexSchema#getDefaultSearchFieldName */ public static String getDefaultField(final IndexSchema s, final SolrParams p) { String df = p.get(CommonParams.DF); if (df == null) { df = s.getDefaultSearchFieldName(); } return df; } {code} > dismax and edismax should default to "df" when "qf" is absent. > -- > > Key: SOLR-3534 > URL: https://issues.apache.org/jira/browse/SOLR-3534 > Project: Solr > Issue Type: Improvement > Components: query parsers >Affects Versions: 4.0 >Reporter: David Smiley >Assignee: David Smiley >Priority: Minor > Attachments: > SOLR-3534_dismax_and_edismax_should_default_to_df_if_qf_is_absent.patch > > > The dismax and edismax query parsers should default to "df" when the "qf" > parameter is absent. They only use the defaultSearchField in schema.xml as a > fallback now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting
[ https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295261#comment-13295261 ] Erik Hatcher commented on SOLR-2894: Trey - would you be in a position to test out the latest patch? I built my latest one by starting with the March 5, 2012 SOLR-2894.patch file. > Implement distributed pivot faceting > > > Key: SOLR-2894 > URL: https://issues.apache.org/jira/browse/SOLR-2894 > Project: Solr > Issue Type: Improvement >Reporter: Erik Hatcher >Assignee: Erik Hatcher > Fix For: 4.0 > > Attachments: SOLR-2894.patch, SOLR-2894.patch, > distributed_pivot.patch, distributed_pivot.patch > > > Following up on SOLR-792, pivot faceting currently only supports > undistributed mode. Distributed pivot faceting needs to be implemented.
[jira] [Updated] (SOLR-2894) Implement distributed pivot faceting
[ https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Hatcher updated SOLR-2894: --- Attachment: SOLR-2894.patch Patch updated to 4x branch. Simon, just for you, I removed NamedListHelper as well :) (folded its one method into PivotFacetHelper) Tests pass.
[jira] [Commented] (LUCENE-4132) IndexWriterConfig live settings
[ https://issues.apache.org/jira/browse/LUCENE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295220#comment-13295220 ] Robert Muir commented on LUCENE-4132: - thanks, +1

> IndexWriterConfig live settings
> ---
>
> Key: LUCENE-4132
> URL: https://issues.apache.org/jira/browse/LUCENE-4132
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Shai Erera
> Priority: Minor
> Fix For: 4.0, 5.0
>
> Attachments: LUCENE-4132.patch, LUCENE-4132.patch, LUCENE-4132.patch, LUCENE-4132.patch, LUCENE-4132.patch
>
>
> A while ago there was a discussion about making some IW settings "live" and I remember that RAM buffer size was one of them. Judging from IW code, I see that RAM buffer can be changed "live" as IW never caches it.
> However, I don't remember which other settings were decided to be "live" and I don't see any documentation in IW nor IWC for that. IW.getConfig mentions:
> {code}
> * NOTE: some settings may be changed on the
> * returned {@link IndexWriterConfig}, and will take
> * effect in the current IndexWriter instance. See the
> * javadocs for the specific setters in {@link
> * IndexWriterConfig} for details.
> {code}
> But there's no text on e.g. IWC.setRAMBuffer mentioning that.
> I think that it'd be good if we made it easier for users to tell which of the settings are "live" ones. There are a few possible ways to do it:
> * Introduce a custom @live.setting tag on the relevant IWC.set methods, and add special text for them in build.xml
> ** Or, drop the tag and just document it clearly.
> * Separate IWC into two interfaces, LiveConfig and OneTimeConfig (name proposals are welcome !), have IWC impl both, and introduce another IW.getLiveConfig which will return that interface, thereby clearly letting the user know which of the settings are "live".
> It'd be good if IWC itself could only expose setXYZ methods for the "live" settings though.
So perhaps, off the top of my head, we can do something like
> this:
> * Introduce a Config object, which is essentially what IWC is today, and pass it to IW.
> * IW will create a different object, IWC, from that Config, and IW.getConfig will return IWC.
> * IWC itself will only have setXYZ methods for the "live" settings.
> It adds another object, but user code doesn't change - it still creates a Config object when initializing IW, and needs to handle a different type only if it ever calls IW.getConfig.
> Maybe that's not such a bad idea?
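The Config/live-config split proposed above can be sketched in a few lines. Everything here is invented for illustration (the class and interface names are not Lucene's actual API): a one-time Config is consumed at construction, and the writer hands back a narrow view that only exposes setters for settings the writer re-reads on the fly:

```java
// Hedged sketch of the "two interfaces" idea from LUCENE-4132: a one-time
// Config consumed at construction, and a LiveConfig view returned by the
// writer that only exposes setters for "live" settings. All names are
// invented for illustration; this is not Lucene's actual API.
public class LiveConfigSketch {

    /** Settings bundle passed to the writer at construction time. */
    static class Config {
        final int maxThreadStates;   // one-time: cannot change after init
        double ramBufferSizeMB;      // live: the writer re-reads it on the fly
        Config(int maxThreadStates, double ramBufferSizeMB) {
            this.maxThreadStates = maxThreadStates;
            this.ramBufferSizeMB = ramBufferSizeMB;
        }
    }

    /** The view handed back by writer.getConfig(): only live setters. */
    interface LiveConfig {
        void setRAMBufferSizeMB(double mb);
        double getRAMBufferSizeMB();
    }

    static class Writer {
        private final Config config;
        Writer(Config c) { this.config = c; }
        LiveConfig getConfig() {
            // Expose only the live subset; one-time settings have no setter here.
            return new LiveConfig() {
                public void setRAMBufferSizeMB(double mb) { config.ramBufferSizeMB = mb; }
                public double getRAMBufferSizeMB() { return config.ramBufferSizeMB; }
            };
        }
    }

    public static void main(String[] args) {
        Writer w = new Writer(new Config(8, 16.0));
        w.getConfig().setRAMBufferSizeMB(64.0); // takes effect "live"
        if (w.getConfig().getRAMBufferSizeMB() != 64.0) throw new AssertionError();
        System.out.println("ok");
    }
}
```

The type system then documents the live/one-time distinction instead of javadoc alone, which is the advantage Shai's proposal is after.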
[jira] [Commented] (LUCENE-4062) More fine-grained control over the packed integer implementation that is chosen
[ https://issues.apache.org/jira/browse/LUCENE-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295195#comment-13295195 ] Dawid Weiss commented on LUCENE-4062: - Ok, thanks - makes sense. Is the code for these benchmarks somewhere?

> More fine-grained control over the packed integer implementation that is chosen
> ---
>
> Key: LUCENE-4062
> URL: https://issues.apache.org/jira/browse/LUCENE-4062
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/other
> Reporter: Adrien Grand
> Assignee: Adrien Grand
> Priority: Minor
> Labels: performance
> Fix For: 4.0
>
> Attachments: LUCENE-4062-2.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch
>
>
> In order to save space, Lucene has two main PackedInts.Mutable implementations, one that is very fast and is based on a byte/short/integer/long array (Direct*) and another one which packs bits in a memory-efficient manner (Packed*).
> The packed implementation tends to be much slower than the direct one, which discourages some Lucene components from using it. On the other hand, if you store 21-bit integers in a Direct32, this is a space loss of (32-21)/32=35%.
> If you accept trading some space for speed, you could store 3 of these 21-bit integers in a long, resulting in an overhead of 1/3 bit per value. One advantage of this approach is that you never need to read more than one block to read or write a value, so this can be significantly faster than Packed32 and Packed64 which always need to read/write two blocks in order to avoid costly branches.
> I ran some tests, and for 1000 21-bit values, this implementation takes less than 2% more space and has 44% faster writes and 30% faster reads. The 12-bit version (5 values per block) has the same performance improvement and a 6% memory overhead compared to the packed implementation.
> In order to select the best implementation for a given integer size, I wrote the {{PackedInts.getMutable(valueCount, bitsPerValue, acceptableOverheadPerValue)}} method. This method selects the fastest implementation that has less than {{acceptableOverheadPerValue}} wasted bits per value. For example, if you accept an overhead of 20% ({{acceptableOverheadPerValue = 0.2f * bitsPerValue}}), which is pretty reasonable, here is what implementations would be selected:
> * 1: Packed64SingleBlock1
> * 2: Packed64SingleBlock2
> * 3: Packed64SingleBlock3
> * 4: Packed64SingleBlock4
> * 5: Packed64SingleBlock5
> * 6: Packed64SingleBlock6
> * 7: Direct8
> * 8: Direct8
> * 9: Packed64SingleBlock9
> * 10: Packed64SingleBlock10
> * 11: Packed64SingleBlock12
> * 12: Packed64SingleBlock12
> * 13: Packed64
> * 14: Direct16
> * 15: Direct16
> * 16: Direct16
> * 17: Packed64
> * 18: Packed64SingleBlock21
> * 19: Packed64SingleBlock21
> * 20: Packed64SingleBlock21
> * 21: Packed64SingleBlock21
> * 22: Packed64
> * 23: Packed64
> * 24: Packed64
> * 25: Packed64
> * 26: Packed64
> * 27: Direct32
> * 28: Direct32
> * 29: Direct32
> * 30: Direct32
> * 31: Direct32
> * 32: Direct32
> * 33: Packed64
> * 34: Packed64
> * 35: Packed64
> * 36: Packed64
> * 37: Packed64
> * 38: Packed64
> * 39: Packed64
> * 40: Packed64
> * 41: Packed64
> * 42: Packed64
> * 43: Packed64
> * 44: Packed64
> * 45: Packed64
> * 46: Packed64
> * 47: Packed64
> * 48: Packed64
> * 49: Packed64
> * 50: Packed64
> * 51: Packed64
> * 52: Packed64
> * 53: Packed64
> * 54: Direct64
> * 55: Direct64
> * 56: Direct64
> * 57: Direct64
> * 58: Direct64
> * 59: Direct64
> * 60: Direct64
> * 61: Direct64
> * 62: Direct64
> Under 32 bits per value, only 13, 17 and 22-26 bits per value would still choose the slower Packed64 implementation. Allowing a 50% overhead would prevent the packed implementation from being selected for bits per value under 32.
> Allowing an overhead of 32 bits per value would make sure that a Direct* implementation is always selected.
> Next steps would be to:
> * make lucene components use this {{getMutable}} method and let users decide what trade-off better suits them,
> * write a Packed32SingleBlock implementation if necessary (I didn't do it because I have no 32-bits computer to test the performance improvements).
> I think this would allow more fine-grained control over the speed/space trade-off, what do you think?
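The single-block trick described in this issue (three 21-bit values per 64-bit long, so any read or write touches exactly one block) can be sketched as follows. This is an illustrative mock of the idea behind Packed64SingleBlock21, not the implementation from the patch:

```java
// Sketch of the single-block packing scheme from LUCENE-4062: three 21-bit
// values per 64-bit long (63 bits used, 1 bit wasted per block), so a value
// can always be read or written by touching exactly one block. Mirrors the
// idea behind Packed64SingleBlock21, not its actual implementation.
public class Packed21Sketch {
    private static final int BITS = 21;
    private static final long MASK = (1L << BITS) - 1;  // 0x1FFFFF
    private final long[] blocks;

    Packed21Sketch(int valueCount) {
        blocks = new long[(valueCount + 2) / 3]; // 3 values per block
    }

    long get(int index) {
        int shift = (index % 3) * BITS;          // 0, 21, or 42
        return (blocks[index / 3] >>> shift) & MASK;
    }

    void set(int index, long value) {
        int shift = (index % 3) * BITS;
        int b = index / 3;
        // Clear the 21-bit slot, then OR the new value in.
        blocks[b] = (blocks[b] & ~(MASK << shift)) | ((value & MASK) << shift);
    }

    public static void main(String[] args) {
        Packed21Sketch p = new Packed21Sketch(1000);
        p.set(0, 123456);
        p.set(1, MASK);      // max 21-bit value, same block as index 0
        p.set(2, 7);
        if (p.get(0) != 123456 || p.get(1) != MASK || p.get(2) != 7)
            throw new AssertionError();
        System.out.println("ok");
    }
}
```

Because the shift is always within one long, there is no cross-block masking or branching, which is where the speedup over Packed64 described above comes from.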
[jira] [Commented] (SOLR-3535) Add block support for XMLLoader
[ https://issues.apache.org/jira/browse/SOLR-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295189#comment-13295189 ] Hoss Man commented on SOLR-3535:

bq. Or simply allow SolrInputDocument as a normal value and existing APIs could be used to add them. This would also be slightly more powerful, allowing more than one child list for the same parent.

"allow SolrInputDocument as a normal value" ... a normal value to what? where? ... are you describing the same thing as Mikhail, modeling "children" SolrInputDocuments as field values of the parent SolrInputDocument? If so then I ask you the same questions I asked him above...

{quote}
bq. why new api/property is necessary? is solrInputDoc.addField("skus", new Object[]\{sku1, sku2, sku3\}) not enough?

Are you suggesting we model child documents as objects (SolrInputDocuments I guess?) in a special field? ... what if I put child documents in multiple fields? would that signify the different types of child? how would solr model that in the (lucene) Documents when giving them to the IndexWriter? How would solr know how to order the children from multiple fields/lists when creating the block? Wouldn't the "type of child" information be better living in the child documents themselves? (particularly since that "type" information needs to be in the child documents anyway so that the filter query for a BJQ can be specified.)
It also seems like it would require code that wants to know what children exist in a document to do a lot of work to find that out (it would need to iterate every field in the SolrInputDocument and do reflection to see if the values are child documents or not). Another concern off the top of my head is that a lot of existing code (including any custom update processors people might have) would assume those child documents are multivalued field values and would probably break – hence a new method on SolrInputDocument seems wiser (code that doesn't know about it may not do what you want, but at least it won't break)
{quote}

> Add block support for XMLLoader
> ---
>
> Key: SOLR-3535
> URL: https://issues.apache.org/jira/browse/SOLR-3535
> Project: Solr
> Issue Type: Sub-task
> Components: update
> Affects Versions: 4.1, 5.0
> Reporter: Mikhail Khludnev
> Priority: Minor
>
> Attachments: SOLR-3535.patch
>
>
> I'd like to add the following update xml message:
> > > >
> out of scope for now:
> * other update formats
> * update log support (NRT), should not be a big deal
> * overwrite feature support for block updates - it's more complicated, I'll tell you why
> Alt
> * wdyt about adding an attribute to the current tag {pre}{pre}
> * or we can establish a RunBlockUpdateProcessor which treats every > as a block.
> *Test is included!!*
> How would you suggest improving the patch?
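Hoss's argument above (keep children in a dedicated list with a new method, rather than hiding them among field values) can be illustrated with a small mock: field-iterating code never sees the children, and finding them needs no reflection. This is an invented stand-in class, not Solr's real SolrInputDocument:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative mock of the "dedicated child list" design argued for in
// SOLR-3535. Not Solr's SolrInputDocument: fields here are single-valued
// for brevity, and all method names are assumptions for the sketch.
public class InputDoc {
    private final Map<String, Object> fields = new LinkedHashMap<>();
    private final List<InputDoc> children = new ArrayList<>();

    void addField(String name, Object value) { fields.put(name, value); }
    void addChildDocument(InputDoc child)    { children.add(child); }

    Map<String, Object> getFields()    { return fields; }
    List<InputDoc> getChildDocuments() { return children; }

    public static void main(String[] args) {
        InputDoc parent = new InputDoc();
        parent.addField("id", "product-1");
        InputDoc sku = new InputDoc();
        sku.addField("type", "sku");  // the "type" lives in the child itself,
        parent.addChildDocument(sku); // as suggested for the BJQ filter query
        // Field-only consumers (e.g. update processors) see only real fields:
        if (parent.getFields().size() != 1) throw new AssertionError();
        // Child-aware code gets the children directly, in insertion order:
        if (parent.getChildDocuments().size() != 1) throw new AssertionError();
        System.out.println("ok");
    }
}
```

Keeping the children out of the field map is exactly what prevents existing multivalue-assuming code from breaking, while insertion order into the child list answers the "how are children ordered in the block" question.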
Re: Corrupt index
I'm quite certain this shouldn't happen even when Commit wasn't called. Mike, can you comment on that? On Thu, Jun 14, 2012 at 8:03 PM, Christopher Currens < currens.ch...@gmail.com> wrote: > Well, the only thing I see is that there is no place where writer.Commit() > is called in the delegate assigned to corpusReader.OnDocument. I know that > lucene is very transactional, and at least in 3.x, the writer will never > auto commit to the index. You can write millions of documents, but if > commit is never called, those documents aren't actually part of the index. > Committing isn't a cheap operation, so you definitely don't want to do it > on every document. > > You can test it yourself with this (naive) solution. Right below the > writer.SetUseCompoundFile(false) line, add "int numDocsAdded = 0;". At the > end of the corpusReader.OnDocument delegate add: > > // Example only. I wouldn't suggest committing this often > if(++numDocsAdded % 5 == 0) > { >    writer.Commit(); > } > > I had the application crash for real on this file: > > http://dumps.wikimedia.org/gawiktionary/20120613/gawiktionary-20120613-pages-meta-history.xml.bz2 > , > about 20% into the operation. Without the commit, the index is empty. Add > it in, and I get 755 files in the index after it crashes. > > > Thanks, > Christopher > > On Wed, Jun 13, 2012 at 6:13 PM, Itamar Syn-Hershko >wrote: > > > Yes, reproduced in first try. See attached program - I referenced it to > > current trunk. > > > > > > On Thu, Jun 14, 2012 at 3:54 AM, Itamar Syn-Hershko >wrote: > > > >> Christopher, > >> > >> I used the IndexBuilder app from here > >> https://github.com/synhershko/Talks/tree/master/LuceneNeatThings with a > >> 8.5GB wikipedia dump. > >> > >> After running for 2.5 days I had to forcefully close it (infinite loop > in > >> the wiki-markdown parser at 92%, go figure), and the 40-something GB > index > >> I had by then was unusable.
I then was able to reproduce this > >> > >> Please note I now added a few safe-guards you might want to remove to > >> make sure the app really crashes on process kill. > >> > >> I'll try to come up with a better way to reproduce this - hopefully Mike > >> will be able to suggest better ways than manual process kill... > >> > >> On Thu, Jun 14, 2012 at 1:41 AM, Christopher Currens < > >> currens.ch...@gmail.com> wrote: > >> > >>> Mike, The codebase for lucene.net should be almost identical to java's > >>> 3.0.3 release, and LUCENE-1044 is included in that. > >>> > >>> Itamar, are you committing the index regularly? I only ask because I > >>> can't > >>> reproduce it myself by forcibly terminating the process while it's > >>> indexing. I've tried both 3.0.3 and 2.9.4. If I don't commit at all > and > >>> terminate the process (even with a 10,000 4K documents created), there > >>> will > >>> be no documents in the index when I open it in luke, which I expect. > If > >>> I > >>> commit at 10,000 documents, and terminate it a few thousand after that, > >>> the > >>> index has the first ten thousand that were committed. I've even > >>> terminated > >>> it *while* a second commit was taking place, and it still had all of > the > >>> documents I expected. > >>> > >>> It may be that I'm not trying to reproducing it correctly. Do you > have a > >>> minimal amount of code that can reproduce it? > >>> > >>> > >>> Thanks, > >>> Christopher > >>> > >>> On Wed, Jun 13, 2012 at 9:31 AM, Michael McCandless < > >>> luc...@mikemccandless.com> wrote: > >>> > >>> > Hi Itamar, > >>> > > >>> > One quick question: does Lucene.Net include the fixes done for > >>> > LUCENE-1044 (to fsync files on commit)? Those are very important for > >>> > an index to be intact after OS/JVM crash or power loss. 
> >>> > > >>> > More responses below: > >>> > > >>> > On Tue, Jun 12, 2012 at 8:20 PM, Itamar Syn-Hershko < > >>> ita...@code972.com> > >>> > wrote: > >>> > > >>> > > I'm a Lucene.Net committer, and there is a chance we have a bug in > >>> our > >>> > > FSDirectory implementation that causes indexes to get corrupted > when > >>> > > indexing is cut while the IW is still open. As it roots from some > >>> > > retroactive fixes you made, I'd appreciate your feedback. > >>> > > > >>> > > Correct me if I'm wrong, but by design Lucene should be able to > >>> recover > >>> > > rather quickly from power failures or app crashes. Since existing > >>> segment > >>> > > files are read only, only new segments that are still being written > >>> can > >>> > get > >>> > > corrupted. Hence, recovering from worst-case scenarios is done by > >>> simply > >>> > > removing the write.lock file. The worst that could happen then is > >>> having > >>> > the > >>> > > last segment damaged, and that can be fixed by removing those > files, > >>> > > possibly by running CheckIndex on the index. > >>> > > >>> > You shouldn't even have to run CheckIndex ... because (as of > >>> > LUCENE-1044) we now fsync all segment files before writing the new > >>> > segments_N f
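The periodic-commit workaround suggested in the thread above (commit every N documents so a crash loses at most the last uncommitted batch) can be sketched in pure Java. The Writer interface here is a stand-in for Lucene's IndexWriter, and the interval of 2 is only for the demo; a real indexer would commit far less often:

```java
// Sketch of the periodic-commit pattern from the thread above: commit
// every N documents so a crash loses at most the last uncommitted batch.
// The Writer interface is a stand-in for Lucene's IndexWriter; a sensible
// interval in practice is thousands of docs, not the tiny demo value here.
public class PeriodicCommit {
    interface Writer {
        void addDocument(String doc);
        void commit();
    }

    /** Adds all docs, committing after every commitInterval additions. */
    static int indexAll(Iterable<String> docs, Writer w, int commitInterval) {
        int added = 0;
        for (String doc : docs) {
            w.addDocument(doc);
            if (++added % commitInterval == 0) {
                w.commit(); // expensive: batch it, never commit per-doc
            }
        }
        w.commit(); // final commit so the tail batch is durable too
        return added;
    }

    public static void main(String[] args) {
        final int[] commits = {0};
        Writer w = new Writer() {
            public void addDocument(String doc) { }
            public void commit() { commits[0]++; }
        };
        int n = indexAll(java.util.Arrays.asList("a", "b", "c", "d", "e"), w, 2);
        // 5 docs, interval 2: commits after docs 2 and 4, plus the final one.
        if (n != 5 || commits[0] != 3) throw new AssertionError();
        System.out.println("ok");
    }
}
```

This captures the transactional behavior Christopher describes: documents added since the last commit are simply not part of the index if the process dies, so the commit interval bounds the worst-case loss.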
[jira] [Commented] (LUCENE-4062) More fine-grained control over the packed integer implementation that is chosen
[ https://issues.apache.org/jira/browse/LUCENE-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295176#comment-13295176 ] Adrien Grand commented on LUCENE-4062: -- The x axis is the number of bits per value while the y axis is the number of values that are read or written per second. For every bitsPerValue and bit-packing scheme, I took the impl with the lowest working bitsPerValue. (For example, bitsPerValue=19 would give a Direct32, a Packed64(bitsPerValue=19), a Packed8ThreeBlocks(24 bits per value) and a Packed64SingleBlock(bitsPerValue=21)). There are 4 lines because we currently have 4 different bit-packing schemes. In the first two cases, values are read at random offsets while the two bulk tests read/write a large number of values sequentially. I didn't want to test {{System.arraycopy}} against a naive for-loop, I just noticed that {{Direct64}} bulk operations didn't use {{arraycopy}}, so I fixed that and added a few words about it so that people understand why the throughput increases when bitsPerValue > 32, which is counter-intuitive.

> More fine-grained control over the packed integer implementation that is chosen
> ---
>
> Key: LUCENE-4062
> URL: https://issues.apache.org/jira/browse/LUCENE-4062
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/other
> Reporter: Adrien Grand
> Assignee: Adrien Grand
> Priority: Minor
> Labels: performance
> Fix For: 4.0
>
> Attachments: LUCENE-4062-2.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch
[JENKINS] Lucene-Solr-tests-only-4.x - Build # 92 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-4.x/92/ 1 tests failed. REGRESSION: org.apache.solr.update.SoftAutoCommitTest.testSoftAndHardCommitMaxTimeDelete Error Message: searcher529 wasn't soon enough after soft529: 1339694370043 !< 1339694369890 + 100 (fudge) Stack Trace: java.lang.AssertionError: searcher529 wasn't soon enough after soft529: 1339694370043 !< 1339694369890 + 100 (fudge) at __randomizedtesting.SeedInfo.seed([C734A40A40E36661:781C975B4BABD1]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.apache.solr.update.SoftAutoCommitTest.testSoftAndHardCommitMaxTimeDelete(SoftAutoCommitTest.java:254) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38) at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53) at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56) at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551) Build Log: [...truncated 10175 lines...] [junit4] 2> 13259 T2223 oasc.SolrDeletionPolicy.onCommit SolrDeletionPolicy.onCommit: commits:num=1 [junit4] 2> commit{dir=MockDirWrapper(org.apache.lucene.store.RAMDirectory@5f2dcf8d lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@57895c19),segFN=segments_1,generation=1,filenames=[segments_1] [junit4] 2> 13259 T2223 oa
[jira] [Commented] (LUCENE-4062) More fine-grained control over the packed integer implementation that is chosen
[ https://issues.apache.org/jira/browse/LUCENE-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295167#comment-13295167 ] Dawid Weiss commented on LUCENE-4062: - What's on the axes in those plots? System.arraycopy is an intrinsic -- it'll be much faster than any other loop that doesn't eliminate bounds checks (and I think with more complex logic this will not be done).

> More fine-grained control over the packed integer implementation that is chosen
> ---
>
> Key: LUCENE-4062
> URL: https://issues.apache.org/jira/browse/LUCENE-4062
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/other
> Reporter: Adrien Grand
> Assignee: Adrien Grand
> Priority: Minor
> Labels: performance
> Fix For: 4.0
>
> Attachments: LUCENE-4062-2.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295153#comment-13295153 ] Simon Willnauer commented on LUCENE-2878:
-
hey alan, I won't be able to look at this this week but will do early next week! good stuff on a brief look!
> Allow Scorer to expose positions and payloads aka. nuke spans
> --
>
> Key: LUCENE-2878
> URL: https://issues.apache.org/jira/browse/LUCENE-2878
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/search
> Affects Versions: Positions Branch
> Reporter: Simon Willnauer
> Assignee: Simon Willnauer
> Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, mentor
> Fix For: Positions Branch
>
> Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, PosHighlighter.patch
>
>
> Currently we have two somewhat separate types of queries: the ones which can
> make use of positions (mainly spans) and payloads (spans). Yet Span*Query
> doesn't really do scoring comparable to what other queries do, and at the end
> of the day they duplicate a lot of code all over Lucene. Span*Queries are
> also limited to other Span*Query instances, such that you cannot use a
> TermQuery or a BooleanQuery with SpanNear or anything like that.
> Besides the Span*Query limitation, other queries lack a quite interesting
> feature: they cannot score based on term proximity, since scorers don't
> expose any positional information. All those problems bugged me for a while
> now, so I started working on that using the bulkpostings API.
I would have done that first cut on trunk,
> but TermScorer works on a BlockReader that does not expose positions, while
> the one in this branch does. I started adding a new Positions class which
> users can pull from a scorer; to prevent unnecessary positions enums I added
> ScorerContext#needsPositions and eventually Scorer#needsPayloads to create
> the corresponding enum on demand. Yet, currently only TermQuery / TermScorer
> implements this API, and others simply return null instead.
> To show that the API really works, and that our BulkPostings work fine with
> positions too, I cut over TermSpanQuery to use a TermScorer under the hood
> and nuked TermSpans entirely. A nice side effect of this was that the
> Position BulkReading implementation got some exercise, which now all works
> with positions :) while payloads for bulk reading are kind of experimental
> in the patch and only work with the Standard codec.
> So all spans now work on top of TermScorer (I truly hate spans since today),
> including the ones that need payloads (StandardCodec ONLY)!! I didn't bother
> to implement the other codecs yet since I want to get feedback on the API
> and on this first cut before I go on with it. I will upload the
> corresponding patch in a minute.
> I also had to cut over SpanQuery.getSpans(IR) to
> SpanQuery.getSpans(AtomicReaderContext), which I should probably do on trunk
> first, but after that pain today I need a break first :).
> The patch passes all core tests
> (org.apache.lucene.search.highlight.HighlighterTest still fails, but I
> didn't look into the MemoryIndex BulkPostings API yet)
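The API direction described in the message above can be sketched as follows: a scorer is told up front whether positions are needed (the `ScorerContext#needsPositions` flag mentioned in the comment) and hands out a positions iterator on demand, instead of routing positional matching through a separate Spans hierarchy. All names and shapes here are illustrative stand-ins, not the patch's actual code; in particular, positions come from a precomputed array here, while the real scorer would read them lazily from the postings.

```java
// Hedged sketch of "pull positions from the scorer on demand".
class PositionsSketch {
    interface PositionsEnum {
        int nextPosition(); // next match position in the current doc, -1 when exhausted
    }

    // Stand-in for the first-cut TermScorer from the comment above.
    static class TermScorerSketch {
        final int[] pos;               // match positions for the current doc (toy data)
        final boolean needsPositions;  // requested via the ScorerContext in the patch

        TermScorerSketch(int[] pos, boolean needsPositions) {
            this.pos = pos;
            this.needsPositions = needsPositions;
        }

        PositionsEnum positions() {
            if (!needsPositions) {
                return null; // the enum is only created when it was requested
            }
            return new PositionsEnum() {
                int i = 0;
                public int nextPosition() {
                    return i < pos.length ? pos[i++] : -1;
                }
            };
        }
    }
}
```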
[JENKINS] Lucene-Solr-trunk-Linux-Java7-64 - Build # 278 - Failure!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux-Java7-64/278/ 4 tests failed. REGRESSION: org.apache.solr.handler.dataimport.TestSqlEntityProcessorDelta2.testCompositePk_DeltaImport_replace_nodelete Error Message: Exception during query Stack Trace: java.lang.RuntimeException: Exception during query at __randomizedtesting.SeedInfo.seed([65A41DCBAF63C272:E8DE2C1A940D8298]:0) at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:459) at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:426) at org.apache.solr.handler.dataimport.TestSqlEntityProcessorDelta2.testCompositePk_DeltaImport_replace_nodelete(TestSqlEntityProcessorDelta2.java:203) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38) at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53) at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56) at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551) Caused by: java.lang.RuntimeException: REQUEST FAILED: xpath=//*[@numFound='0'] xml response was: 010desc:hello OR XtestCompositePk_DeltaImport_replace_nodeletestandard202.2prefix-1hello2012-06-14T16:34:07.474Z request was:start=0&q=desc:hello+OR+XtestCompositePk_DeltaImport_replace_nodelete&qt=standard&rows=20&version=2.2 at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4
[JENKINS] Lucene-Solr-4.x-Windows-Java7-64 - Build # 64 - Failure!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Windows-Java7-64/64/ 1 tests failed. REGRESSION: org.apache.solr.handler.component.SpellCheckComponentTest.testThresholdTokenFrequency Error Message: Path not found: /spellcheck/suggestions/[1]/suggestion Stack Trace: java.lang.RuntimeException: Path not found: /spellcheck/suggestions/[1]/suggestion at __randomizedtesting.SeedInfo.seed([6587EC4535619BCC:EF2063B4BA8AA2B7]:0) at org.apache.solr.SolrTestCaseJ4.assertJQ(SolrTestCaseJ4.java:545) at org.apache.solr.SolrTestCaseJ4.assertJQ(SolrTestCaseJ4.java:493) at org.apache.solr.handler.component.SpellCheckComponentTest.testThresholdTokenFrequency(SpellCheckComponentTest.java:211) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38) at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53) at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56) at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551) Build Log: [...truncated 10434 lines...] [junit4]>at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51) [junit4]>at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) [junit4]>at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(Tes
[jira] [Updated] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Woodward updated LUCENE-2878:
--
Attachment: LUCENE-2878.patch
Updated patch implementing startOffset and endOffset on UnionDocsAndPositionsEnum. MultiPhraseQuery can now return its positions properly.
> Allow Scorer to expose positions and payloads aka. nuke spans
> --
>
> Key: LUCENE-2878
> URL: https://issues.apache.org/jira/browse/LUCENE-2878
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/search
> Affects Versions: Positions Branch
> Reporter: Simon Willnauer
> Assignee: Simon Willnauer
> Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, mentor
> Fix For: Positions Branch
>
> Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, PosHighlighter.patch
>
>
> Currently we have two somewhat separate types of queries: the ones which can
> make use of positions (mainly spans) and payloads (spans). Yet Span*Query
> doesn't really do scoring comparable to what other queries do, and at the end
> of the day they duplicate a lot of code all over Lucene. Span*Queries are
> also limited to other Span*Query instances, such that you cannot use a
> TermQuery or a BooleanQuery with SpanNear or anything like that.
> Besides the Span*Query limitation, other queries lack a quite interesting
> feature: they cannot score based on term proximity, since scorers don't
> expose any positional information. All those problems bugged me for a while
> now, so I started working on that using the bulkpostings API.
I would have done that first cut on trunk,
> but TermScorer works on a BlockReader that does not expose positions, while
> the one in this branch does. I started adding a new Positions class which
> users can pull from a scorer; to prevent unnecessary positions enums I added
> ScorerContext#needsPositions and eventually Scorer#needsPayloads to create
> the corresponding enum on demand. Yet, currently only TermQuery / TermScorer
> implements this API, and others simply return null instead.
> To show that the API really works, and that our BulkPostings work fine with
> positions too, I cut over TermSpanQuery to use a TermScorer under the hood
> and nuked TermSpans entirely. A nice side effect of this was that the
> Position BulkReading implementation got some exercise, which now all works
> with positions :) while payloads for bulk reading are kind of experimental
> in the patch and only work with the Standard codec.
> So all spans now work on top of TermScorer (I truly hate spans since today),
> including the ones that need payloads (StandardCodec ONLY)!! I didn't bother
> to implement the other codecs yet since I want to get feedback on the API
> and on this first cut before I go on with it. I will upload the
> corresponding patch in a minute.
> I also had to cut over SpanQuery.getSpans(IR) to
> SpanQuery.getSpans(AtomicReaderContext), which I should probably do on trunk
> first, but after that pain today I need a break first :).
> The patch passes all core tests
> (org.apache.lucene.search.highlight.HighlighterTest still fails, but I
> didn't look into the MemoryIndex BulkPostings API yet)
[jira] [Updated] (LUCENE-4062) More fine-grained control over the packed integer implementation that is chosen
[ https://issues.apache.org/jira/browse/LUCENE-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand updated LUCENE-4062:
-
Attachment: LUCENE-4062-2.patch
I have run more tests on {{PackedInts}} impls over the last few days to test their relative performance. It appears that the specializations in {{Packed64SingleBlock}} don't help much and even hurt performance in some cases. Moreover, replacing the naive bulk operations with a {{System.arraycopy}} in {{Direct64}} is a big win. (See attached patch.)
You can look at the details of the tests here: http://people.apache.org/~jpountz/packed_ints.html (contiguous=Packed64, padding=Packed64SingleBlock, 3 blocks=Packed*ThreeBlocks, direct=Direct*). The tests were run on a 64-bit computer (Core 2 Duo E5500) with valueCount=10 000 000. "Memory overhead" is {unused space in bits}/{bits per value}, while the other charts measure the number of gets/sets per second.
The random get/set results are very good for the packed versions, probably because they manage to fit many more values into the CPU caches than other impls. The reason why bulk get/set is faster when bitsPerValue>32 is that Direct64 uses System.arraycopy instead of a naive copy (in a for loop).
> More fine-grained control over the packed integer implementation that is chosen
> ---
>
> Key: LUCENE-4062
> URL: https://issues.apache.org/jira/browse/LUCENE-4062
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/other
> Reporter: Adrien Grand
> Assignee: Adrien Grand
> Priority: Minor
> Labels: performance
> Fix For: 4.0
>
> Attachments: LUCENE-4062-2.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch
>
>
> In order to save space, Lucene has two main PackedInts.Mutable implementations:
> one that is very fast and is based on a byte/short/integer/long array
> (Direct*), and another one which packs bits in a memory-efficient manner
> (Packed*).
> The packed implementation tends to be much slower than the direct one, which
> discourages some Lucene components from using it. On the other hand, if you
> store 21-bit integers in a Direct32, this is a space loss of (32-21)/32=35%.
> If you are willing to trade some space for speed, you could store 3 of these
> 21-bit integers in a long, resulting in an overhead of 1/3 bit per value. One
> advantage of this approach is that you never need to read more than one block
> to read or write a value, so this can be significantly faster than Packed32
> and Packed64, which always need to read/write two blocks in order to avoid
> costly branches.
> I ran some tests, and for 1000 21-bit values, this implementation takes
> less than 2% more space and has 44% faster writes and 30% faster reads. The
> 12-bit version (5 values per block) has the same performance improvement and
> a 6% memory overhead compared to the packed implementation.
> In order to select the best implementation for a given integer size, I wrote
> the {{PackedInts.getMutable(valueCount, bitsPerValue,
> acceptableOverheadPerValue)}} method. This method selects the fastest
> implementation that has less than {{acceptableOverheadPerValue}} wasted bits
> per value.
For example, if you accept an overhead of 20%
> ({{acceptableOverheadPerValue = 0.2f * bitsPerValue}}), which is pretty
> reasonable, here is what implementations would be selected:
> * 1: Packed64SingleBlock1
> * 2: Packed64SingleBlock2
> * 3: Packed64SingleBlock3
> * 4: Packed64SingleBlock4
> * 5: Packed64SingleBlock5
> * 6: Packed64SingleBlock6
> * 7: Direct8
> * 8: Direct8
> * 9: Packed64SingleBlock9
> * 10: Packed64SingleBlock10
> * 11: Packed64SingleBlock12
> * 12: Packed64SingleBlock12
> * 13: Packed64
> * 14: Direct16
> * 15: Direct16
> * 16: Direct16
> * 17: Packed64
> * 18: Packed64SingleBlock21
> * 19: Packed64SingleBlock21
> * 20: Packed64SingleBlock21
> * 21: Packed64SingleBlock21
> * 22: Packed64
> * 23: Packed64
> * 24: Packed64
> * 25: Packed64
> * 26: Packed64
> * 27: Direct32
> * 28: Direct32
> * 29: Direct32
> * 30: Direct32
> * 31: Direct32
> * 32: Direct32
> * 33: Packed64
> * 34: Packed64
> * 35: Packed64
> * 36: Packed64
> * 37: Packed64
> * 38: Packed64
> * 39: Packed64
> * 40: Packed64
> * 41: Packed64
> * 42: Packed64
> * 43: Packed64
> * 44: Packed64
> * 45: Packed64
> * 46: Packed64
> * 47: Packed64
> * 48: Packed64
> * 49: Packed64
> * 50: Packed64
> * 51: Packed64
> * 52: Packed64
> * 53: Packed64
> * 54: Direct64
> * 55: Direct64
> * 56: Direct64
> * 57: Direct64
> * 58
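The `System.arraycopy` point from the comment above can be illustrated with a toy `long[]`-backed store: once each value occupies exactly one long (the Direct64 layout), the naive per-element bulk copy can be replaced by a single `arraycopy`, which the JVM compiles to an intrinsic memory copy. This is a sketch, not Direct64's actual code.

```java
// Toy stand-in for a Direct64-style store: one value per long.
class Direct64Sketch {
    final long[] values;

    Direct64Sketch(int valueCount) {
        values = new long[valueCount];
    }

    void set(int index, long value) { values[index] = value; }
    long get(int index) { return values[index]; }

    // Naive bulk get: one load/store (plus bounds check) per element.
    int bulkGetNaive(int index, long[] dest, int off, int len) {
        int n = Math.min(len, values.length - index);
        for (int i = 0; i < n; i++) {
            dest[off + i] = values[index + i];
        }
        return n;
    }

    // arraycopy bulk get: same result, one intrinsic memory copy.
    int bulkGetFast(int index, long[] dest, int off, int len) {
        int n = Math.min(len, values.length - index);
        System.arraycopy(values, index, dest, off, n);
        return n;
    }
}
```

Both variants return how many values were copied; only the second delegates the inner loop to the runtime, which is where the bulk get/set win for bitsPerValue&gt;32 in the linked charts comes from.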
[jira] [Resolved] (SOLR-1958) Empty fetchMailsSince exception
[ https://issues.apache.org/jira/browse/SOLR-1958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer resolved SOLR-1958.
--
Resolution: Fixed
Fix Version/s: 5.0
Committed...Trunk: r1350269, Branch_4x: r1350278
> Empty fetchMailsSince exception
> ---
>
> Key: SOLR-1958
> URL: https://issues.apache.org/jira/browse/SOLR-1958
> Project: Solr
> Issue Type: Bug
> Components: contrib - DataImportHandler
> Affects Versions: 4.0
> Environment: Ubuntu 9.10 x86_64 Linux 2.6.31-302-rs
> Reporter: Max Lynch
> Assignee: James Dyer
> Labels: dih
> Fix For: 4.0, 5.0
>
> Attachments: SOLR-1958.patch, SOLR-1958.patch
>
> Original Estimate: 0h
> Remaining Estimate: 0h
>
> When using the MailEntityProcessor, import would fail if fetchMailsSince was
> not specified.
[JENKINS] Solr-trunk - Build # 1884 - Failure
Build: https://builds.apache.org/job/Solr-trunk/1884/ 1 tests failed. REGRESSION: org.apache.solr.cloud.OverseerTest.testShardAssignmentBigger Error Message: could not find counter for shard:null Stack Trace: java.lang.AssertionError: could not find counter for shard:null at __randomizedtesting.SeedInfo.seed([B8806B0A91010277:2B6E95717B68FD14]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertNotNull(Assert.java:526) at org.apache.solr.cloud.OverseerTest.__CLR2_6_3v4oypg1pbz(OverseerTest.java:369) at org.apache.solr.cloud.OverseerTest.testShardAssignmentBigger(OverseerTest.java:251) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38) at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53) at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56) at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551) Build Log: [...truncated 41827 lines...] [junit4] 2>at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:531) [junit4] 2>at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:507) [junit4] 2> [junit4] 2> 290958 T2034 oasc.LeaderElector$1.process WARNING org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Sessio
[jira] [Updated] (SOLR-1958) Empty fetchMailsSince exception
[ https://issues.apache.org/jira/browse/SOLR-1958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer updated SOLR-1958:
-
Attachment: SOLR-1958.patch
Here's an even simpler patch to fix this. I will commit this to trunk & back-port to 4x as it is a trivial change. However, I'm "blind" with MailEntityProcessor as I do not have a mailserver to run the unit test against. (See SOLR-2175...I've done a little research so far on this but haven't found the right answer yet...)
> Empty fetchMailsSince exception
> ---
>
> Key: SOLR-1958
> URL: https://issues.apache.org/jira/browse/SOLR-1958
> Project: Solr
> Issue Type: Bug
> Components: contrib - DataImportHandler
> Affects Versions: 4.0
> Environment: Ubuntu 9.10 x86_64 Linux 2.6.31-302-rs
> Reporter: Max Lynch
> Assignee: James Dyer
> Labels: dih
> Fix For: 4.0
>
> Attachments: SOLR-1958.patch, SOLR-1958.patch
>
> Original Estimate: 0h
> Remaining Estimate: 0h
>
> When using the MailEntityProcessor, import would fail if fetchMailsSince was
> not specified.
[jira] [Commented] (SOLR-3535) Add block support for XMLLoader
[ https://issues.apache.org/jira/browse/SOLR-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295049#comment-13295049 ] Yonik Seeley commented on SOLR-3535:
bq. 1) add "List getChildDocuments()" to SolrInputDocument
Or simply allow SolrInputDocument *as* a normal value, and existing APIs could be used to add them. This would also be slightly more powerful, allowing more than one child list for the same parent.
> Add block support for XMLLoader
> ---
>
> Key: SOLR-3535
> URL: https://issues.apache.org/jira/browse/SOLR-3535
> Project: Solr
> Issue Type: Sub-task
> Components: update
> Affects Versions: 4.1, 5.0
> Reporter: Mikhail Khludnev
> Priority: Minor
> Attachments: SOLR-3535.patch
>
> I'd like to add the following update xml message:
>
>
>
> out of scope for now:
> * other update formats
> * update log support (NRT), should not be a big deal
> * overwrite feature support for block updates - it's more complicated, I'll
> tell you why
> Alt
> * wdyt about adding an attribute to the current tag {pre}{pre}
> * or we can establish a RunBlockUpdateProcessor which treats every
> as a block.
> *Test is included!!*
> How would you suggest improving the patch?
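Yonik's suggestion in the comment above can be modeled with a toy document class (deliberately not SolrJ): if a document is allowed as a normal field value, no dedicated child-document API is needed, and one parent can carry several child lists under different field names. The class and field names below are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of "child document as a plain field value".
class NestedDoc {
    final Map<String, List<Object>> fields = new LinkedHashMap<>();

    // A value can be a String, a number... or another NestedDoc.
    void addField(String name, Object value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    public static void main(String[] args) {
        NestedDoc parent = new NestedDoc();
        parent.addField("id", "book1");
        NestedDoc ch1 = new NestedDoc();
        ch1.addField("id", "book1_ch1");
        parent.addField("chapters", ch1);            // child document as a plain value
        parent.addField("reviews", new NestedDoc()); // a second child list on the same parent
    }
}
```

The "more than one child list" point falls out for free: `chapters` and `reviews` above are just two multivalued fields that happen to hold documents.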
[jira] [Assigned] (SOLR-3542) Highlighter: Integration of LUCENE-4133 (Part of LUCENE-3440)
[ https://issues.apache.org/jira/browse/SOLR-3542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi reassigned SOLR-3542:
Assignee: Koji Sekiguchi
> Highlighter: Integration of LUCENE-4133 (Part of LUCENE-3440)
> -
>
> Key: SOLR-3542
> URL: https://issues.apache.org/jira/browse/SOLR-3542
> Project: Solr
> Issue Type: Improvement
> Components: highlighter
> Affects Versions: 4.0
> Reporter: Sebastian Lutze
> Assignee: Koji Sekiguchi
> Priority: Minor
> Labels: FastVectorHighlighter, highlight, patch
> Fix For: 4.0
>
> Attachments: SOLR-3542.patch
>
> This patch integrates a weight-based approach for sorting highlighted
> fragments.
> See LUCENE-4133 (Part of LUCENE-3440).
> This patch contains:
> - Introduction of class WeightedFragListBuilder, an implementation of
> SolrFragListBuilder
> - Updated example-configuration
[jira] [Reopened] (LUCENE-4062) More fine-grained control over the packed integer implementation that is chosen
[ https://issues.apache.org/jira/browse/LUCENE-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand reopened LUCENE-4062: -- Assignee: Adrien Grand (was: Michael McCandless) > More fine-grained control over the packed integer implementation that is > chosen > --- > > Key: LUCENE-4062 > URL: https://issues.apache.org/jira/browse/LUCENE-4062 > Project: Lucene - Java > Issue Type: Improvement > Components: core/other >Reporter: Adrien Grand >Assignee: Adrien Grand >Priority: Minor > Labels: performance > Fix For: 4.0 > > Attachments: LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, > LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch > > > In order to save space, Lucene has two main PackedInts.Mutable implementations, > one that is very fast and is based on a byte/short/integer/long array > (Direct*) and another one which packs bits in a memory-efficient manner > (Packed*). > The packed implementation tends to be much slower than the direct one, which > discourages some Lucene components from using it. On the other hand, if you store > 21-bit integers in a Direct32, this is a space loss of (32-21)/32=35%. > If you are willing to trade some space for speed, you could store 3 of these 21-bit > integers in a long, resulting in an overhead of 1/3 bit per value. One > advantage of this approach is that you never need to read more than one block > to read or write a value, so this can be significantly faster than Packed32 > and Packed64 which always need to read/write two blocks in order to avoid > costly branches. > I ran some tests, and for 1000 21-bit values, this implementation takes > less than 2% more space and has 44% faster writes and 30% faster reads. The > 12-bit version (5 values per block) has the same performance improvement and > a 6% memory overhead compared to the packed implementation. 
> In order to select the best implementation for a given integer size, I wrote > the {{PackedInts.getMutable(valueCount, bitsPerValue, > acceptableOverheadPerValue)}} method. This method select the fastest > implementation that has less than {{acceptableOverheadPerValue}} wasted bits > per value. For example, if you accept an overhead of 20% > ({{acceptableOverheadPerValue = 0.2f * bitsPerValue}}), which is pretty > reasonable, here is what implementations would be selected: > * 1: Packed64SingleBlock1 > * 2: Packed64SingleBlock2 > * 3: Packed64SingleBlock3 > * 4: Packed64SingleBlock4 > * 5: Packed64SingleBlock5 > * 6: Packed64SingleBlock6 > * 7: Direct8 > * 8: Direct8 > * 9: Packed64SingleBlock9 > * 10: Packed64SingleBlock10 > * 11: Packed64SingleBlock12 > * 12: Packed64SingleBlock12 > * 13: Packed64 > * 14: Direct16 > * 15: Direct16 > * 16: Direct16 > * 17: Packed64 > * 18: Packed64SingleBlock21 > * 19: Packed64SingleBlock21 > * 20: Packed64SingleBlock21 > * 21: Packed64SingleBlock21 > * 22: Packed64 > * 23: Packed64 > * 24: Packed64 > * 25: Packed64 > * 26: Packed64 > * 27: Direct32 > * 28: Direct32 > * 29: Direct32 > * 30: Direct32 > * 31: Direct32 > * 32: Direct32 > * 33: Packed64 > * 34: Packed64 > * 35: Packed64 > * 36: Packed64 > * 37: Packed64 > * 38: Packed64 > * 39: Packed64 > * 40: Packed64 > * 41: Packed64 > * 42: Packed64 > * 43: Packed64 > * 44: Packed64 > * 45: Packed64 > * 46: Packed64 > * 47: Packed64 > * 48: Packed64 > * 49: Packed64 > * 50: Packed64 > * 51: Packed64 > * 52: Packed64 > * 53: Packed64 > * 54: Direct64 > * 55: Direct64 > * 56: Direct64 > * 57: Direct64 > * 58: Direct64 > * 59: Direct64 > * 60: Direct64 > * 61: Direct64 > * 62: Direct64 > Under 32 bits per value, only 13, 17 and 22-26 bits per value would still > choose the slower Packed64 implementation. Allowing a 50% overhead would > prevent the packed implementation to be selected for bits per value under 32. 
> Allowing an overhead of 32 bits per value would make sure that a Direct* > implementation is always selected. > Next steps would be to: > * make lucene components use this {{getMutable}} method and let users decide > what trade-off better suits them, > * write a Packed32SingleBlock implementation if necessary (I didn't do it > because I have no 32-bits computer to test the performance improvements). > I think this would allow more fine-grained control over the speed/space > trade-off, what do you think? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
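The arithmetic behind these trade-offs is easy to verify outside Lucene. The following self-contained sketch (not Lucene's implementation; method names are illustrative) computes wasted bits per value for a Packed64SingleBlock-style layout, which fits as many whole values as possible into each 64-bit long, versus the Direct8/16/32/64 variants, which round the value width up to the next native type:

```java
// Overhead arithmetic behind the proposed
// PackedInts.getMutable(valueCount, bitsPerValue, acceptableOverheadPerValue)
// selection — a sketch of the math from the issue, not Lucene's code.
class PackedOverhead {
    // Wasted bits per value for a "single block" scheme that fits
    // floor(64 / bitsPerValue) values in each long.
    static float singleBlockOverhead(int bitsPerValue) {
        int valuesPerBlock = 64 / bitsPerValue;
        return (64f - valuesPerBlock * bitsPerValue) / valuesPerBlock;
    }

    // Wasted bits per value for the Direct8/16/32/64 variants, which round
    // bitsPerValue up to the next byte/short/int/long width.
    static int directOverhead(int bitsPerValue) {
        int width = bitsPerValue <= 8 ? 8 : bitsPerValue <= 16 ? 16
                  : bitsPerValue <= 32 ? 32 : 64;
        return width - bitsPerValue;
    }

    public static void main(String[] args) {
        // 21-bit values: 3 per long => 1/3 bit of overhead per value,
        // versus 32-21 = 11 bits wasted per value in a Direct32.
        System.out.println(singleBlockOverhead(21)); // ~0.33
        System.out.println(directOverhead(21));      // 11
    }
}
```

These numbers match the issue description: 21-bit values cost 1/3 extra bit each in a single-block long (under 2% overhead) against an 11/32 = 35% loss in Direct32, which is why an acceptable-overhead knob is needed to pick between the two families.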
[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting
[ https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295026#comment-13295026 ] Erik Hatcher commented on SOLR-2894: Trey - thanks for the positive feedback. I'll apply the patch, run the tests, review the code, and so on. Might be a couple of weeks, unless I can get to this today. > Implement distributed pivot faceting > > > Key: SOLR-2894 > URL: https://issues.apache.org/jira/browse/SOLR-2894 > Project: Solr > Issue Type: Improvement >Reporter: Erik Hatcher >Assignee: Erik Hatcher > Fix For: 4.0 > > Attachments: SOLR-2894.patch, distributed_pivot.patch, > distributed_pivot.patch > > > Following up on SOLR-792, pivot faceting currently only supports > undistributed mode. Distributed pivot faceting needs to be implemented. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3534) dismax and edismax should default to "df" when "qf" is absent.
[ https://issues.apache.org/jira/browse/SOLR-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295025#comment-13295025 ] David Smiley commented on SOLR-3534: Just to keep these concerns separated, this issue, SOLR-3534 is about two things: * dismax&edismax should look at 'df' before falling back to defaultSearchField * dismax&edismax should throw an exception if neither 'qf', 'df', nor defaultSearchField are specified, because these two query parsers are fairly useless without them. SOLR-2724 is about the deprecation of defaultSearchField > dismax and edismax should default to "df" when "qf" is absent. > -- > > Key: SOLR-3534 > URL: https://issues.apache.org/jira/browse/SOLR-3534 > Project: Solr > Issue Type: Improvement > Components: query parsers >Affects Versions: 4.0 >Reporter: David Smiley >Assignee: David Smiley >Priority: Minor > Attachments: > SOLR-3534_dismax_and_edismax_should_default_to_df_if_qf_is_absent.patch > > > The dismax and edismax query parsers should default to "df" when the "qf" > parameter is absent. They only use the defaultSearchField in schema.xml as a > fallback now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3534) dismax and edismax should default to "df" when "qf" is absent.
[ https://issues.apache.org/jira/browse/SOLR-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295019#comment-13295019 ] David Smiley commented on SOLR-3534: defaultSearchField may be referenced in a bunch of places but it is always a default for something else that you should be specifying (typically 'df'). I've commented out my defaultSearchField long before it was deprecated. > dismax and edismax should default to "df" when "qf" is absent. > -- > > Key: SOLR-3534 > URL: https://issues.apache.org/jira/browse/SOLR-3534 > Project: Solr > Issue Type: Improvement > Components: query parsers >Affects Versions: 4.0 >Reporter: David Smiley >Assignee: David Smiley >Priority: Minor > Attachments: > SOLR-3534_dismax_and_edismax_should_default_to_df_if_qf_is_absent.patch > > > The dismax and edismax query parsers should default to "df" when the "qf" > parameter is absent. They only use the defaultSearchField in schema.xml as a > fallback now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
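The lookup order this issue proposes for dismax/edismax can be sketched in plain Java. The parameter names `qf` and `df` are Solr's; everything else below (class and method names, the exception type) is illustrative, not Solr's actual code:

```java
import java.util.*;

// Sketch of the fallback order SOLR-3534 asks for:
//   qf -> df -> schema defaultSearchField -> error.
// Hypothetical helper; Solr's real query parsers are structured differently.
class DefaultFieldResolver {
    static String resolve(Map<String, String> params, String defaultSearchField) {
        String qf = params.get("qf");
        if (qf != null && !qf.isEmpty()) {
            return qf;                      // explicit query fields win
        }
        String df = params.get("df");
        if (df != null && !df.isEmpty()) {
            return df;                      // consulted BEFORE the schema default
        }
        if (defaultSearchField != null) {
            return defaultSearchField;      // deprecated schema.xml fallback
        }
        // Second point of the issue: fail loudly instead of parsing nothing.
        throw new IllegalArgumentException(
            "neither qf, df, nor defaultSearchField is specified");
    }
}
```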
[jira] [Updated] (LUCENE-4132) IndexWriterConfig live settings
[ https://issues.apache.org/jira/browse/LUCENE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-4132: --- Attachment: LUCENE-4132.patch bq. Can we override all methods so the javadocs aren't confusing. Good idea! Done bq. Also can we rename it to LiveIndexWriterConfig? Done > IndexWriterConfig live settings > --- > > Key: LUCENE-4132 > URL: https://issues.apache.org/jira/browse/LUCENE-4132 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Shai Erera >Priority: Minor > Fix For: 4.0, 5.0 > > Attachments: LUCENE-4132.patch, LUCENE-4132.patch, LUCENE-4132.patch, > LUCENE-4132.patch, LUCENE-4132.patch > > > A while ago there was a discussion about making some IW settings "live" and I > remember that RAM buffer size was one of them. Judging from IW code, I see > that RAM buffer can be changed "live" as IW never caches it. > However, I don't remember which other settings were decided to be "live" and > I don't see any documentation in IW nor IWC for that. IW.getConfig mentions: > {code} > * NOTE: some settings may be changed on the > * returned {@link IndexWriterConfig}, and will take > * effect in the current IndexWriter instance. See the > * javadocs for the specific setters in {@link > * IndexWriterConfig} for details. > {code} > But there's no text on e.g. IWC.setRAMBuffer mentioning that. > I think that it'd be good if we make it easier for users to tell which of the > settings are "live" ones. There are few possible ways to do it: > * Introduce a custom @live.setting tag on the relevant IWC.set methods, and > add special text for them in build.xml > ** Or, drop the tag and just document it clearly. > * Separate IWC to two interfaces, LiveConfig and OneTimeConfig (name > proposals are welcome !), have IWC impl both, and introduce another > IW.getLiveConfig which will return that interface, thereby clearly letting > the user know which of the settings are "live". 
> It'd be good if IWC itself could only expose setXYZ methods for the "live" > settings though. So perhaps, off the top of my head, we can do something like > this: > * Introduce a Config object, which is essentially what IWC is today, and pass > it to IW. > * IW will create a different object, IWC from that Config and IW.getConfig > will return IWC. > * IWC itself will only have setXYZ methods for the "live" settings. > It adds another object, but user code doesn't change - it still creates a > Config object when initializing IW, and need to handle a different type if it > ever calls IW.getConfig. > Maybe that's not such a bad idea? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
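The "two objects" idea sketched above can be illustrated in a few lines. The names here are illustrative (Lucene ultimately went with LiveIndexWriterConfig); the point is only the shape: the full config is consumed at construction time, while the writer hands back a narrower view whose setters are exactly the live ones:

```java
// Sketch of splitting the writer config into a full construction-time
// object and a "live" view. Names (LiveConfig, Config, Writer) are
// illustrative, not Lucene's API.
interface LiveConfig {
    // Takes effect on the already-open writer; RAM buffer size is the
    // canonical live setting from the discussion.
    LiveConfig setRAMBufferSizeMB(double mb);
    double getRAMBufferSizeMB();
}

class Config implements LiveConfig {
    private double ramBufferMB = 16.0;

    public Config setRAMBufferSizeMB(double mb) {
        ramBufferMB = mb;
        return this;
    }

    public double getRAMBufferSizeMB() {
        return ramBufferMB;
    }
    // One-time settings (codec, merge policy, ...) would live here too,
    // with no setter exposed through LiveConfig.
}

class Writer {
    private final Config config;

    Writer(Config config) {
        this.config = config;
    }

    // Callers only see the live subset, so a non-live setter can never be
    // mistaken for something that takes effect after construction.
    LiveConfig getConfig() {
        return config;
    }
}
```

User code still passes a `Config` to the writer; the only change is the static type returned by `getConfig()`, which is the documentation-by-types benefit the issue is after.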
[jira] [Resolved] (LUCENE-4144) OOM when call optimize
[ https://issues.apache.org/jira/browse/LUCENE-4144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-4144. Resolution: Not A Problem Please raise this on the java-u...@lucene.apache.org list instead. Also, we've made good reductions in RAM usage since 2.1, so it may be that simply upgrading to the latest release (3.6) resolves this. > OOM when call optimize > -- > > Key: LUCENE-4144 > URL: https://issues.apache.org/jira/browse/LUCENE-4144 > Project: Lucene - Java > Issue Type: New Feature > Components: core/index >Affects Versions: 2.1 >Reporter: Zhenglin Sun > Fix For: 2.1 > > > The index file is about 6G. When I update the index it works fine, but I > hit an OOM when calling optimize: > Caused by: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: > 969048, Num elements: 242258 > at > org.apache.lucene.index.TermInfosReader.ensureIndexIsRead(TermInfosReader.java:90) > at > org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:133) > at > org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:51) > at org.apache.lucene.index.IndexReader.termDocs(IndexReader.java:482) > at > org.apache.lucene.index.IndexReader.deleteDocuments(IndexReader.java:573) > at > org.apache.lucene.index.IndexWriter.applyDeletes(IndexWriter.java:1776) > at > org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:1670) > at > org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:1521) > at > org.apache.lucene.index.IndexWriter.flushRamSegments(IndexWriter.java:1351) > at > org.apache.lucene.index.IndexWriter.maybeFlushRamSegments(IndexWriter.java:1344) > at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:763) > at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:743) -- This message is automatically generated by JIRA. 
[JENKINS] Lucene-Solr-trunk-Linux-Java7-64 - Build # 276 - Failure!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux-Java7-64/276/ 1 tests failed. REGRESSION: org.apache.solr.cloud.RecoveryZkTest.testDistribSearch Error Message: Thread threw an uncaught exception, thread: Thread[Lucene Merge Thread #1,6,] Stack Trace: java.lang.RuntimeException: Thread threw an uncaught exception, thread: Thread[Lucene Merge Thread #1,6,] at com.carrotsearch.randomizedtesting.RunnerThreadGroup.processUncaught(RunnerThreadGroup.java:96) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:857) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38) at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53) at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36) at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551) Caused by: org.apache.lucene.index.MergePolicy$MergeException: org.apache.lucene.store.AlreadyClosedException: this Directory is closed at __randomizedtesting.SeedInfo.seed([5210B0FC43222B81]:0) at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:507) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:480) Caused by: org.apache.lucene.store.AlreadyClosedException: this Directory is closed at org.apache.lucene.store.Directory.ensureOpen(Directory.java:244) at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:241) at org.apache.lucene.index.IndexFileDeleter.refresh(IndexFileDeleter.java:321) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3127) at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:382) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:451) Build Log: [...truncated 33938 lines...] 
[junit4] 2> 27912 T1782 C77 P40466 oasu.DirectUpdateHandler2.commit end_commit_flush [junit4] 2> 27912 T1789 oasc.SolrCore.registerSearcher [collection1] Registered new searcher Searcher@ec28733 main{StandardDirectoryReader(segments_4:1142 _e2(5.0):C2237/109 _ey(5.0):C247 _ex(5.0):C3)} [junit4] 2> 27915 T1914 C78 P60019 oasu.DirectUpdateHandler2.commit start commit{flags=0,version=0,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false} [junit4] 2> 27947 T1914 C78 P60019 oasc.SolrDeletionPolicy.onCommit SolrDeletionPolicy.onCommit: commits:num=2 [junit4] 2> commit{dir=/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux-Java7-64/checkout/solr/build/solr-core/test/J0/org.apache.solr.cloud.RecoveryZkTest-1339677498320/jetty2/index.20120614233844499,segFN=segments_4,generation=4,filenames=[_ed_Lucene40_0.prx, _ee_Lucene40_0.tim, _ee_Lucene40_0.tip, _ec_Lucene40_0.tim, _ee_Lucene40_0.frq, _ef_Lucene40_0.frq, _ec_Lucene40_0.tip, _e3_1.del, _e3.si, _ed_nrm.cfe, _ec.fnm, _ef.si, _ec_nrm.cfs, _ef_Lucene40_0.tip, _eb.fnm, _ef_Lucene40_0.tim, _ec_Lucene40_0.frq, _ed_nrm.cfs, _ef_Lucene40_0.prx, _ec_nrm.cfe, _e3_Lucene40_0.prx, _ef.fdx, _ec.fdt, _ef.fdt, _e3_Lucene40_0.frq, _ec.fdx, _
Re: Corrupt index
On Wed, Jun 13, 2012 at 8:45 PM, Itamar Syn-Hershko wrote: > Mike, > > On Wed, Jun 13, 2012 at 7:31 PM, Michael McCandless > wrote: >> >> Hi Itamar, >> >> One quick question: does Lucene.Net include the fixes done for >> LUCENE-1044 (to fsync files on commit)? Those are very important for >> an index to be intact after OS/JVM crash or power loss. > > > Definitely, as Christopher noted we are about to release a 3.0.3 compatible > version, which is line-by-line port of the Java version. Hmm OK. Then we still need to explain the corruption... >> You shouldn't even have to run CheckIndex ... because (as of >> LUCENE-1044) we now fsync all segment files before writing the new >> segments_N file, and then removing old segments_N files (and any >> segments that are no longer referenced). >> >> You do have to remove the write.lock if you aren't using >> NativeFSLockFactory (but this has been the default lock impl for a >> while now). > > Somewhat unrelated to this thread, but what should I expect to see? from > time to time we do see write.lock present after an app-crash or power > failure. Also, what are the steps that are expected to be performed in such > cases? If you are using NativeFSLockFactory, you will see a write.lock but it will not actually be locked (according to the OS); so, it's fine. If you are using SimpleFSLockFactory then the presence of write.lock means the index is still locked and you'll have to remove it. >> > Last week I have been playing with rather large indexes and crashed my >> > app >> > while it was indexing. I wasn't able to open the index, and Luke was >> > even >> > kind enough to wipe the index folder clean even though I opened it in >> > read-only mode. I re-ran this, and after another crash running >> > CheckIndex >> > revealed nothing - the index was detected to be an empty one. I am not >> > entirely sure what could be the cause for this, but I suspect it has >> > been corrupted by the crash. 
>> >> Had no commit completed (no segments file written)? >> >> If you don't fsync then all sorts of crazy things are possible... > > Ok, so we do have fsync since LUCENE-1044 is present, and there were > segments present from previous commits. Any idea what went wrong? I don't know! >> > I've been looking at these: >> > >> > >> > https://issues.apache.org/jira/browse/LUCENE-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel >> > >> > https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel >> >> (And LUCENE-1044 before that ... it was LUCENE-1044 that LUCENE-2328 >> broke...). > > So 2328 broke 1044, and this was fixed only in 3.4, right? so 2328 made it > to a 3.0.x release while the fix for it (3418) was only released in 3.4. Am > I right? > > If this is the case, 2328 probably made it's way to Lucene.Net since we are > using the released sources for porting, and we now need to apply 3418 in the > current version. OK that makes sense: 2328 broke things as of 3.0.3, and 3418 fixed things in 3.4. > Does it make sense to just port FSDirectory from 3.4 to 3.0.3? or were there > API or other changes that will make our life miserable if we do that? Hmmm I'm not certain offhand: maybe diff the two sources? The fix in 3418 was trivial in the end, so maybe just backport that. >> > And it seems like this is what I was experiencing. Mike and Mark will >> > probably be able to tell if this is what they saw or not, but as far as >> > I >> > can tell this is not an expected behavior of a Lucene index. >> >> Definitely not expected behavior: assuming nothing is flipping bits, >> then on OS/JVM crash or power loss your index should be fine, just >> reverted to the last successful commit. > > What I suspected. Will try to reproduce reliably - any recommendations? not > really feeling like reinventing the wheel here... 
> > MockDirectoryWrapper wasn't ported yet as it appears to only appear in 3.4, > and as you said it won't really help here anyway Use a spare computer and try pulling the plug on it ... or pull a (hot swappable/pluggable) hard drive while indexing onto it ... You can also use a virtual machine and power it off ungracefully / kill the process. If any of these events can corrupt the index then there's a bug somewhere (or: the IO system ignores fsync). Mike McCandless http://blog.mikemccandless.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
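The durability rule Mike describes — fsync every segment file, only then write the new segments_N commit point, and only then remove old files — can be sketched in miniature with plain java.nio. File names and the on-disk format here are illustrative, not Lucene's real index format:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.*;

// Miniature sketch of the LUCENE-1044 commit protocol: sync data, publish
// the commit pointer, then delete the previous one. After a crash at any
// point, some segments_N from a completed commit still exists intact.
class CommitSketch {
    static void writeDurably(Path file, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ch.write(ByteBuffer.wrap(data));
            ch.force(true); // fsync: survives OS crash / power loss
        }
    }

    static void commit(Path dir, byte[] segmentData, long gen) throws IOException {
        // 1. fsync the segment data before it is referenced anywhere
        writeDurably(dir.resolve("_seg_" + gen), segmentData);
        // 2. publish the new commit point
        writeDurably(dir.resolve("segments_" + gen), new byte[] {1});
        // 3. only now is it safe to drop the previous commit point
        Files.deleteIfExists(dir.resolve("segments_" + (gen - 1)));
    }

    // Runs two commits in a temp dir and checks the visible commit state.
    static boolean demo() {
        try {
            Path dir = Files.createTempDirectory("commit-sketch");
            commit(dir, new byte[] {1}, 1);
            commit(dir, new byte[] {2}, 2);
            return Files.exists(dir.resolve("segments_2"))
                && !Files.exists(dir.resolve("segments_1"));
        } catch (IOException e) {
            return false;
        }
    }
}
```

Skipping the `force(true)` call in step 1 is exactly the LUCENE-2328 class of bug discussed above: on power loss the OS may reorder writes so that segments_N lands on disk before the data it references.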
[jira] [Resolved] (SOLR-3511) Refactor overseer to use a distributed "work"queue
[ https://issues.apache.org/jira/browse/SOLR-3511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren resolved SOLR-3511. -- Resolution: Fixed Committed to 4.x too > Refactor overseer to use a distributed "work"queue > -- > > Key: SOLR-3511 > URL: https://issues.apache.org/jira/browse/SOLR-3511 > Project: Solr > Issue Type: Improvement > Components: SolrCloud >Reporter: Sami Siren >Assignee: Sami Siren > Fix For: 4.0 > > Attachments: SOLR-3511.patch, SOLR-3511.patch > > > By using a queue, the overseer becomes watch-free, a lot simpler, and probably > less buggy too.
[jira] [Commented] (LUCENE-4132) IndexWriterConfig live settings
[ https://issues.apache.org/jira/browse/LUCENE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294989#comment-13294989 ] Michael McCandless commented on LUCENE-4132: Also can we rename it to LiveIndexWriterConfig? LiveConfig is too generic I think... > IndexWriterConfig live settings > --- > > Key: LUCENE-4132 > URL: https://issues.apache.org/jira/browse/LUCENE-4132 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Shai Erera >Priority: Minor > Fix For: 4.0, 5.0 > > Attachments: LUCENE-4132.patch, LUCENE-4132.patch, LUCENE-4132.patch, > LUCENE-4132.patch > > > A while ago there was a discussion about making some IW settings "live" and I > remember that RAM buffer size was one of them. Judging from IW code, I see > that RAM buffer can be changed "live" as IW never caches it. > However, I don't remember which other settings were decided to be "live" and > I don't see any documentation in IW nor IWC for that. IW.getConfig mentions: > {code} > * NOTE: some settings may be changed on the > * returned {@link IndexWriterConfig}, and will take > * effect in the current IndexWriter instance. See the > * javadocs for the specific setters in {@link > * IndexWriterConfig} for details. > {code} > But there's no text on e.g. IWC.setRAMBuffer mentioning that. > I think that it'd be good if we make it easier for users to tell which of the > settings are "live" ones. There are few possible ways to do it: > * Introduce a custom @live.setting tag on the relevant IWC.set methods, and > add special text for them in build.xml > ** Or, drop the tag and just document it clearly. > * Separate IWC to two interfaces, LiveConfig and OneTimeConfig (name > proposals are welcome !), have IWC impl both, and introduce another > IW.getLiveConfig which will return that interface, thereby clearly letting > the user know which of the settings are "live". 
> It'd be good if IWC itself could only expose setXYZ methods for the "live" > settings though. So perhaps, off the top of my head, we can do something like > this: > * Introduce a Config object, which is essentially what IWC is today, and pass > it to IW. > * IW will create a different object, IWC from that Config and IW.getConfig > will return IWC. > * IWC itself will only have setXYZ methods for the "live" settings. > It adds another object, but user code doesn't change - it still creates a > Config object when initializing IW, and need to handle a different type if it > ever calls IW.getConfig. > Maybe that's not such a bad idea? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Solr-4.x - Build # 9 - Still Failing
Build: https://builds.apache.org/job/Solr-4.x/9/ 1 tests failed. FAILED: org.apache.solr.cloud.RecoveryZkTest.testDistribSearch Error Message: Thread threw an uncaught exception, thread: Thread[Lucene Merge Thread #2,6,] Stack Trace: java.lang.RuntimeException: Thread threw an uncaught exception, thread: Thread[Lucene Merge Thread #2,6,] at com.carrotsearch.randomizedtesting.RunnerThreadGroup.processUncaught(RunnerThreadGroup.java:96) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:857) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38) at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53) at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36) at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551) Caused by: org.apache.lucene.index.MergePolicy$MergeException: org.apache.lucene.store.AlreadyClosedException: this Directory is closed at __randomizedtesting.SeedInfo.seed([18DE9A9DE2F3DF31]:0) at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:507) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:480) Caused by: org.apache.lucene.store.AlreadyClosedException: this Directory is closed at org.apache.lucene.store.Directory.ensureOpen(Directory.java:244) at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:241) at org.apache.lucene.index.IndexFileDeleter.refresh(IndexFileDeleter.java:321) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3149) at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:382) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:451) Build Log: [...truncated 46889 lines...] 
[junit4] 2> 26469 T1539 oasc.Overseer.coreChanged Core change pooled: 127.0.0.1:56723_solr states:[coll:collection1 core:collection1 props:{num_shards=1, shard=shard1, state=active, core=collection1, collection=collection1, node_name=127.0.0.1:56723_solr, base_url=http://127.0.0.1:56723/solr}] [junit4] 2> 26469 T1539 oascc.ZkStateReader$3.process Updating live nodes [junit4] 2> 26470 T1619 oascc.ZkStateReader.updateCloudState Manual update of cluster state initiated [junit4] 2> 26470 T1619 oascc.ZkStateReader.updateCloudState Updating cloud state from ZooKeeper... [junit4] 2> 26470 T1539 oasc.RecoveryStrategy.close WARNING Stopping recovery for core collection1 zkNodeName=127.0.0.1:56723_solr_collection1 [junit4] 2> 26471 T1619 oasc.Overseer$CloudStateUpdater.run Announcing new cluster state [junit4] 2> 26471 T1539 oascc.SolrZkClient.makePath makePath: /collections/collection1/leaders/shard1 [junit4] 2> 26478 T1483 oascc.ZkStateReader$2.process A cluster state change has occurred [junit4] 2> 26478 T1479 oascc.ZkStateReader$2.process A cluster state change has occurred [junit4] 2> 26480 T1539 oascc.ZkStateReader$2.process A cluster state change has occurred [junit4] 2> 26481 T1539 oasc.Overseer.processLeaderNodesChanged Leader nod
[jira] [Resolved] (SOLR-3543) JavaBinLoader catches (and logs) exceptions and the (solrj)client has no idea that an update failed
[ https://issues.apache.org/jira/browse/SOLR-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren resolved SOLR-3543. -- Resolution: Fixed > JavaBinLoader catches (and logs) exceptions and the (solrj)client has no idea > that an update failed > --- > > Key: SOLR-3543 > URL: https://issues.apache.org/jira/browse/SOLR-3543 > Project: Solr > Issue Type: Bug > Components: update >Reporter: Sami Siren > Fix For: 4.0 > > Attachments: SOLR-3543.patch > > > When submitting docs to Solr with the javabin wire format, the server responds > with 200 OK even when there was an error. The exception is only logged at the > server. > When using the XML format, the error is correctly reported back. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-4.x-Linux-Java6-64 - Build # 114 - Failure!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Linux-Java6-64/114/ 1 tests failed. FAILED: junit.framework.TestSuite.org.apache.solr.handler.TestReplicationHandler Error Message: ERROR: SolrIndexSearcher opens=74 closes=73 Stack Trace: java.lang.AssertionError: ERROR: SolrIndexSearcher opens=74 closes=73 at __randomizedtesting.SeedInfo.seed([E845A956CCCB28BB]:0) at org.junit.Assert.fail(Assert.java:93) at org.apache.solr.SolrTestCaseJ4.endTrackingSearchers(SolrTestCaseJ4.java:190) at org.apache.solr.SolrTestCaseJ4.afterClass(SolrTestCaseJ4.java:82) at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:752) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38) at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53) at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36) at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551) Build Log: [...truncated 9942 lines...] [junit4] 2> 108 T775 oejs.AbstractConnector.doStart Started SocketConnector@0.0.0.0:35739 [junit4] 2> 108 T775 oasc.SolrResourceLoader.locateSolrHome JNDI not configured for solr (NoInitialContextEx) [junit4] 2> 108 T775 oasc.SolrResourceLoader.locateSolrHome using system property solr.solr.home: ./org.apache.solr.handler.TestReplicationHandler$SolrInstance-1339672437471/slave [junit4] 2> 109 T775 oasc.SolrResourceLoader. new SolrResourceLoader for deduced Solr Home: './org.apache.solr.handler.TestReplicationHandler$SolrInstance-1339672437471/slave/' [junit4] 2> 111 T775 oass.SolrDispatchFilter.init SolrDispatchFilter.init() [junit4] 2> 112 T775 oasc.SolrResourceLoader.locateSolrHome JNDI not configured for solr (NoInitialContextEx) [junit4] 2> 112 T775 oasc.SolrResourceLoader.locateSolrHome using system property solr.solr.home: ./org.apache.solr.handler.TestReplicationHandler$SolrInstance-1339672437471/slave [junit4] 2> 112 T775 oasc.CoreContainer$Initializer.initialize looking for solr.xml: /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux-Java6-64/checkout/solr/build/solr-core/test/J1/./org.apache.solr.handler.TestReplicationHandler$SolrInstance-1339672437471/slave/solr.xml [junit4] 2> 112 T775 oasc.CoreContainer. 
New CoreContainer 1100893972 [junit4] 2> 112 T775 oasc.CoreContainer$Initializer.initialize no solr.xml file found - using default [junit4] 2> 112 T775 oasc.CoreContainer.load Loading CoreContainer using Solr Home: './org.apache.solr.handler.TestReplicationHandler$SolrInstance-1339672437471/slave/' [junit4] 2> 113 T775 oasc.SolrResourceLoader. new SolrResourceLoader for directory: './org.apache.solr.handler.TestReplicationHandler$SolrInstance-1339672437471/slave/' [junit4] 2> 117 T775 oasc.CoreContainer.load Registering Log Listener [junit4] 2> 125 T775 oashc.HttpShardHandlerFactory.getParameter Setting socketTimeout to: 0 [junit4] 2> 125 T775 oashc.HttpShardHandlerFactory.getParameter Setting urlScheme to: http:// [junit4] 2> 125 T775 oashc.HttpShardHandlerFactory.getParameter Setting connTimeout to: 0 [junit4] 2> 125 T775 oashc.HttpShardHandlerFactory.getParameter Setting maxConnectionsPerHost to: 20 [junit4] 2> 126 T775 oashc.Htt
[jira] [Commented] (LUCENE-4132) IndexWriterConfig live settings
[ https://issues.apache.org/jira/browse/LUCENE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294939#comment-13294939 ] Robert Muir commented on LUCENE-4132: - Can we override *all* methods so the javadocs aren't confusing. I don't want the methods split in the javadocs between IWC and LiveConfig: LiveConfig is expert and should be a subset, not a portion. > IndexWriterConfig live settings > --- > > Key: LUCENE-4132 > URL: https://issues.apache.org/jira/browse/LUCENE-4132 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Shai Erera >Priority: Minor > Fix For: 4.0, 5.0 > > Attachments: LUCENE-4132.patch, LUCENE-4132.patch, LUCENE-4132.patch, > LUCENE-4132.patch > > > A while ago there was a discussion about making some IW settings "live" and I > remember that RAM buffer size was one of them. Judging from IW code, I see > that RAM buffer can be changed "live" as IW never caches it. > However, I don't remember which other settings were decided to be "live" and > I don't see any documentation in IW nor IWC for that. IW.getConfig mentions: > {code} > * NOTE: some settings may be changed on the > * returned {@link IndexWriterConfig}, and will take > * effect in the current IndexWriter instance. See the > * javadocs for the specific setters in {@link > * IndexWriterConfig} for details. > {code} > But there's no text on e.g. IWC.setRAMBuffer mentioning that. > I think that it'd be good if we make it easier for users to tell which of the > settings are "live" ones. There are few possible ways to do it: > * Introduce a custom @live.setting tag on the relevant IWC.set methods, and > add special text for them in build.xml > ** Or, drop the tag and just document it clearly. 
> * Separate IWC to two interfaces, LiveConfig and OneTimeConfig (name > proposals are welcome !), have IWC impl both, and introduce another > IW.getLiveConfig which will return that interface, thereby clearly letting > the user know which of the settings are "live". > It'd be good if IWC itself could only expose setXYZ methods for the "live" > settings though. So perhaps, off the top of my head, we can do something like > this: > * Introduce a Config object, which is essentially what IWC is today, and pass > it to IW. > * IW will create a different object, IWC from that Config and IW.getConfig > will return IWC. > * IWC itself will only have setXYZ methods for the "live" settings. > It adds another object, but user code doesn't change - it still creates a > Config object when initializing IW, and need to handle a different type if it > ever calls IW.getConfig. > Maybe that's not such a bad idea? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-Solr-4.x-Linux-Java6-64 - Build # 101 - Failure!
Wrt thread scheduling -- has anybody ever tried dtrace with hotspot on a linux system? Does it work? http://docs.oracle.com/javase/6/docs/technotes/guides/vm/dtrace.html I see there are probes to inspect thread lifecycle but I never played with dtrace so I've no idea how it works/ if it does work on linux. Dawid - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-Solr-4.x-Linux-Java6-64 - Build # 101 - Failure!
> It certainly wouldn't be easy to do ... but it sure would it be nice :) I meant "difficult" as in "practically impossible" :) But then so are these -- http://js1k.com/ >> There's been some talk about tools to detect data races at the hotspot Found it -- see this thread: http://cs.oswego.edu/pipermail/concurrency-interest/2011-September/008205.html The tool I briefly looked at was this one: http://babelfish.arc.nasa.gov/trac/jpf > Or, even, just a way to record and then visualize what the thread > scheduling had been for a given test failure. In this case I could > have easily seen that a merge had completed before the NRT reader was > pulled (which is... unusual). This is in fact relatively easy if we allowed a jenkins run with some minor boot classpath adjustments overriding Thread's init/exit methods and logging timings from there. Obviously it'd have to be bound to a particular jvm version/ distribution but it can be done. Bytecode instrumentation would be a nicer alternative here but I'm not sure how deep it can go in terms of precedence (it'd probably need to be a native agent and this seems like an overkill). I also think (didn't check) YourKit's profiler has a thread schedule visualizer but this adds additional overhead and requires a gui (or remoting). Dawid - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
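A lightweight version of the thread-lifecycle logging discussed above can be sketched without any boot-classpath surgery by wrapping each Runnable before handing it to a Thread. This is only an illustrative sketch (class and names are made up, not an existing tool), and it only sees tasks you wrap yourself, unlike the Thread init/exit override Dawid describes:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ThreadLifecycleSketch {
    // Events recorded by wrapped tasks; synchronized because worker threads append concurrently.
    static final List<String> EVENTS = Collections.synchronizedList(new ArrayList<String>());

    // Wraps a task so its start and exit are recorded, approximating the
    // Thread init/exit logging proposed above, without touching the boot classpath.
    static Runnable logged(final String name, final Runnable task) {
        return new Runnable() {
            public void run() {
                EVENTS.add(name + ":start");
                try {
                    task.run();
                } finally {
                    EVENTS.add(name + ":exit"); // recorded even if the task throws
                }
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(logged("merge-sim", new Runnable() {
            public void run() { /* simulated merge work */ }
        }));
        t.start();
        t.join();
        System.out.println(EVENTS); // [merge-sim:start, merge-sim:exit]
    }
}
```

Replaying the recorded event order after a failing seed would give a coarse picture of the scheduling, though without JVM cooperation it cannot capture threads the test framework itself spawns.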
[jira] [Updated] (LUCENE-4132) IndexWriterConfig live settings
[ https://issues.apache.org/jira/browse/LUCENE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-4132: --- Attachment: LUCENE-4132.patch Thanks Uwe. The test is now fixed by saving all 'synthetic' methods and all 'setter' methods and verifying in the end that all of them were received from IWC too. > IndexWriterConfig live settings > --- > > Key: LUCENE-4132 > URL: https://issues.apache.org/jira/browse/LUCENE-4132 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Shai Erera >Priority: Minor > Fix For: 4.0, 5.0 > > Attachments: LUCENE-4132.patch, LUCENE-4132.patch, LUCENE-4132.patch, > LUCENE-4132.patch > > > A while ago there was a discussion about making some IW settings "live" and I > remember that RAM buffer size was one of them. Judging from IW code, I see > that RAM buffer can be changed "live" as IW never caches it. > However, I don't remember which other settings were decided to be "live" and > I don't see any documentation in IW nor IWC for that. IW.getConfig mentions: > {code} > * NOTE: some settings may be changed on the > * returned {@link IndexWriterConfig}, and will take > * effect in the current IndexWriter instance. See the > * javadocs for the specific setters in {@link > * IndexWriterConfig} for details. > {code} > But there's no text on e.g. IWC.setRAMBuffer mentioning that. > I think that it'd be good if we make it easier for users to tell which of the > settings are "live" ones. There are few possible ways to do it: > * Introduce a custom @live.setting tag on the relevant IWC.set methods, and > add special text for them in build.xml > ** Or, drop the tag and just document it clearly. > * Separate IWC to two interfaces, LiveConfig and OneTimeConfig (name > proposals are welcome !), have IWC impl both, and introduce another > IW.getLiveConfig which will return that interface, thereby clearly letting > the user know which of the settings are "live". 
> It'd be good if IWC itself could only expose setXYZ methods for the "live" > settings though. So perhaps, off the top of my head, we can do something like > this: > * Introduce a Config object, which is essentially what IWC is today, and pass > it to IW. > * IW will create a different object, IWC from that Config and IW.getConfig > will return IWC. > * IWC itself will only have setXYZ methods for the "live" settings. > It adds another object, but user code doesn't change - it still creates a > Config object when initializing IW, and need to handle a different type if it > ever calls IW.getConfig. > Maybe that's not such a bad idea? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4132) IndexWriterConfig live settings
[ https://issues.apache.org/jira/browse/LUCENE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294867#comment-13294867 ] Uwe Schindler commented on LUCENE-4132: --- Hi Shai, ignore all methods with isSynthetic() set (that are covariant overrides compatibility methods, access$xx() methods for access to private fields/ctors/...). > IndexWriterConfig live settings > --- > > Key: LUCENE-4132 > URL: https://issues.apache.org/jira/browse/LUCENE-4132 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Shai Erera >Priority: Minor > Fix For: 4.0, 5.0 > > Attachments: LUCENE-4132.patch, LUCENE-4132.patch, LUCENE-4132.patch > > > A while ago there was a discussion about making some IW settings "live" and I > remember that RAM buffer size was one of them. Judging from IW code, I see > that RAM buffer can be changed "live" as IW never caches it. > However, I don't remember which other settings were decided to be "live" and > I don't see any documentation in IW nor IWC for that. IW.getConfig mentions: > {code} > * NOTE: some settings may be changed on the > * returned {@link IndexWriterConfig}, and will take > * effect in the current IndexWriter instance. See the > * javadocs for the specific setters in {@link > * IndexWriterConfig} for details. > {code} > But there's no text on e.g. IWC.setRAMBuffer mentioning that. > I think that it'd be good if we make it easier for users to tell which of the > settings are "live" ones. There are few possible ways to do it: > * Introduce a custom @live.setting tag on the relevant IWC.set methods, and > add special text for them in build.xml > ** Or, drop the tag and just document it clearly. > * Separate IWC to two interfaces, LiveConfig and OneTimeConfig (name > proposals are welcome !), have IWC impl both, and introduce another > IW.getLiveConfig which will return that interface, thereby clearly letting > the user know which of the settings are "live". 
> It'd be good if IWC itself could only expose setXYZ methods for the "live" > settings though. So perhaps, off the top of my head, we can do something like > this: > * Introduce a Config object, which is essentially what IWC is today, and pass > it to IW. > * IW will create a different object, IWC from that Config and IW.getConfig > will return IWC. > * IWC itself will only have setXYZ methods for the "live" settings. > It adds another object, but user code doesn't change - it still creates a > Config object when initializing IW, and need to handle a different type if it > ever calls IW.getConfig. > Maybe that's not such a bad idea? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
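Uwe's isSynthetic() suggestion can be sketched as below. Base and Sub are toy stand-ins for LiveConfig and IndexWriterConfig (not the real classes): the covariant override in Sub makes javac emit a synthetic bridge method `Base setFoo(int)` alongside the real one, and that bridge is exactly what the filter skips:

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class SyntheticFilterSketch {
    public static class Base {
        public Base setFoo(int v) { return this; }
    }
    public static class Sub extends Base {
        @Override
        public Sub setFoo(int v) { return this; } // javac adds a synthetic bridge Base setFoo(int)
    }

    // Collects declared setters while skipping synthetic methods
    // (bridge methods for covariant overrides, access$xx accessors, ...).
    public static List<Method> realSetters(Class<?> clazz) {
        List<Method> setters = new ArrayList<Method>();
        for (Method m : clazz.getDeclaredMethods()) {
            if (m.isSynthetic()) continue;
            if (m.getName().startsWith("set")) setters.add(m);
        }
        return setters;
    }

    public static void main(String[] args) {
        // Sub declares two setFoo methods (the real override plus the bridge);
        // the filter keeps only the real one.
        System.out.println(Sub.class.getDeclaredMethods().length); // 2
        System.out.println(realSetters(Sub.class).size());         // 1
    }
}
```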
[jira] [Updated] (LUCENE-4132) IndexWriterConfig live settings
[ https://issues.apache.org/jira/browse/LUCENE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-4132: --- Attachment: LUCENE-4132.patch Sorry if it came across like that, but I don't mean to rush or shove this issue in. I'm usually after consensus and I appreciate your feedback. I took another look at this, and found a solution without generics. Funny thing is, that's the first solution that came to my mind, but I guess at the time it didn't look promising enough, so I discarded it :). Now we have only LiveConfig and IndexWriterConfig, where IWC extends LC and overrides all setter methods. The "live" setters are overridden just to return IWC type, and call super.setXYZ(). So we don't have code dup, and whoever has IWC type at hand, will receive IWC back from all set() methods. LC is a public class with package-private ctors, one that takes IWC (used by IndexWriter) and one that takes Analyzer+Version, to match IWC's. It contains all "live" members as private, and the others as protected, so that IWC can set them. Since it cannot be sub-classed outside the package, this is 'safe'. The only thing that bothers me, and I'm not sure if it can be fixed, but this is not critical either, is TestIWC.testSettersChaining(). For some reason, even though I override the setters from LC in IWC, and set their return type to IWC, reflection still returns their return type as LiveConfig. This only affects the test, since if I do: {code} IndexWriterConfig conf; conf.setMaxBufferedDocs(); // or any other set from LC {code} the return type is IWC. If anyone knows how to solve it, please let me know, otherwise we'll just have to live with the modification to the test, and the chance that future "live" setters may be incorrectly overridden by IWC to not return the IWC type. That is not an error, just a convenience.
Besides that, and if I follow your comments and concerns properly, I think this is now ready to commit -- there's no extra complexity (generics, 3 classes etc.), and with better compile time protection against misuse. > IndexWriterConfig live settings > --- > > Key: LUCENE-4132 > URL: https://issues.apache.org/jira/browse/LUCENE-4132 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Shai Erera >Priority: Minor > Fix For: 4.0, 5.0 > > Attachments: LUCENE-4132.patch, LUCENE-4132.patch, LUCENE-4132.patch > > > A while ago there was a discussion about making some IW settings "live" and I > remember that RAM buffer size was one of them. Judging from IW code, I see > that RAM buffer can be changed "live" as IW never caches it. > However, I don't remember which other settings were decided to be "live" and > I don't see any documentation in IW nor IWC for that. IW.getConfig mentions: > {code} > * NOTE: some settings may be changed on the > * returned {@link IndexWriterConfig}, and will take > * effect in the current IndexWriter instance. See the > * javadocs for the specific setters in {@link > * IndexWriterConfig} for details. > {code} > But there's no text on e.g. IWC.setRAMBuffer mentioning that. > I think that it'd be good if we make it easier for users to tell which of the > settings are "live" ones. There are few possible ways to do it: > * Introduce a custom @live.setting tag on the relevant IWC.set methods, and > add special text for them in build.xml > ** Or, drop the tag and just document it clearly. > * Separate IWC to two interfaces, LiveConfig and OneTimeConfig (name > proposals are welcome !), have IWC impl both, and introduce another > IW.getLiveConfig which will return that interface, thereby clearly letting > the user know which of the settings are "live". > It'd be good if IWC itself could only expose setXYZ methods for the "live" > settings though. 
So perhaps, off the top of my head, we can do something like > this: > * Introduce a Config object, which is essentially what IWC is today, and pass > it to IW. > * IW will create a different object, IWC from that Config and IW.getConfig > will return IWC. > * IWC itself will only have setXYZ methods for the "live" settings. > It adds another object, but user code doesn't change - it still creates a > Config object when initializing IW, and need to handle a different type if it > ever calls IW.getConfig. > Maybe that's not such a bad idea? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-
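The LiveConfig/IndexWriterConfig split described above (live setters overridden only to narrow the return type, delegating to super) might look roughly like this. The class names mirror the proposal but the members and settings are purely illustrative, not the actual Lucene API:

```java
public class ConfigSketch {
    public static class LiveConfig {
        private int maxBufferedDocs = 16;         // a "live" member, private to LC
        protected String mergePolicy = "default"; // non-live member, settable by the subclass

        LiveConfig() {}                           // package-private ctor, so LC can't be subclassed outside the package

        public LiveConfig setMaxBufferedDocs(int n) {
            this.maxBufferedDocs = n;
            return this;
        }
        public int getMaxBufferedDocs() { return maxBufferedDocs; }
    }

    public static class IndexWriterConfig extends LiveConfig {
        // Live setter overridden only to narrow the return type;
        // it delegates to super, so there is no code duplication.
        @Override
        public IndexWriterConfig setMaxBufferedDocs(int n) {
            super.setMaxBufferedDocs(n);
            return this;
        }
        // Non-live setter exists only on IWC.
        public IndexWriterConfig setMergePolicy(String p) {
            this.mergePolicy = p;
            return this;
        }
    }

    public static void main(String[] args) {
        // Chaining through a live setter stays typed as IndexWriterConfig.
        IndexWriterConfig conf = new IndexWriterConfig()
                .setMaxBufferedDocs(100)
                .setMergePolicy("tiered");
        System.out.println(conf.getMaxBufferedDocs()); // 100
    }
}
```

The reflection quirk mentioned in the comment comes from the same covariant override: the compiler emits a synthetic bridge method with the LiveConfig return type next to the narrowed one, so a naive getDeclaredMethods() scan sees both.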
[jira] [Commented] (SOLR-3406) Support grouped range and query facets.
[ https://issues.apache.org/jira/browse/SOLR-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294847#comment-13294847 ] Martijn van Groningen commented on SOLR-3406: - Sure. I think what is in here can be committed. The only thing that needs work is caching. Right now, when facet.query is used in combination with group.facet=true, caching doesn't take place. I think this can be fixed in a new issue that refers to this issue. In the meantime the patch in this issue can get committed. > Support grouped range and query facets. > --- > > Key: SOLR-3406 > URL: https://issues.apache.org/jira/browse/SOLR-3406 > Project: Solr > Issue Type: New Feature >Reporter: David >Assignee: Martijn van Groningen >Priority: Critical > Fix For: 4.0 > > Attachments: SOLR-2898-backport.patch, SOLR-3406.patch, > SOLR-3406.patch > > Original Estimate: 504h > Remaining Estimate: 504h > > Need the ability to support grouped range and query facets. Grouped facet > fields have already been implemented in SOLR-2898 but we still need the > ability to compute grouped range and query facets.