[jira] [Commented] (LUCENE-5422) Postings lists deduplication
[ https://issues.apache.org/jira/browse/LUCENE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942821#comment-13942821 ] Vishmi Money commented on LUCENE-5422: -- Hi [~dmitry_key], [~otis], [~mikemccand], I was looking forward to a reply to [~mikemccand]'s comment, since it would be a great help to have a mentor for this project; I could then get direct support resolving the questions that come up as I proceed. I am hoping for a positive answer. Thank you.
> Postings lists deduplication
>
> Key: LUCENE-5422
> URL: https://issues.apache.org/jira/browse/LUCENE-5422
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/codecs, core/index
> Reporter: Dmitry Kan
> Labels: gsoc2014
>
> The context: http://markmail.org/thread/tywtrjjcfdbzww6f
> Robert Muir and I discussed what Robert eventually named "postings lists deduplication" at the Berlin Buzzwords 2013 conference.
> The idea is to allow multiple terms to point to the same postings list to save space. This could be achieved by a new index codec implementation, but this jira is open to other ideas as well.
> The application / impact of this is positive for synonyms, exact / inexact terms, leading wildcard support via storing reversed terms, etc.
> For example, at the moment, when supporting exact (unstemmed) and inexact (stemmed) searches, we store both the unstemmed and stemmed variants of a word form, and that leads to index bloat. For the same index-size reasons, we had to remove leading wildcard support via reversing a token at index and query time.
> Comment from Mike McCandless:
> Neat idea!
> Would this idea allow a single term to point to (the union of) N other posting lists? It seems like that's necessary e.g. to handle the exact/inexact case.
> And then, to produce the Docs/AndPositionsEnum you'd need to do the merge sort across those N posting lists?
> Such a thing might also be do-able as a runtime-only wrapper around the postings API (FieldsProducer), if you could at runtime do the reverse expansion (e.g. stem -> all of its surface forms).
> Comment from Robert Muir:
> I think the exact/inexact case is trickier (detecting it would be the hard part), and you are right, another solution might work better.
> But for the reverse wildcard and synonyms situations, it seems we could even detect it on write if we created some hash of the previous term's postings. If the hash matches for the current term, we know it might be a "duplicate" and would have to actually do the costly check that they are the same.
> Maybe there are better ways to do it, but it might be a fun postings format experiment to try.
-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
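Robert's write-time detection idea can be sketched in a few lines: keep a hash of each postings list already written; a matching hash flags a *candidate* duplicate, which must then be confirmed with the full (costly) equality check. All names here are illustrative, not Lucene APIs:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: detect candidate duplicate postings lists at write
// time by hashing each term's postings (here just sorted doc IDs). A hash
// match is only a hint; the costly Arrays.equals() check confirms it.
public class PostingsDedupSketch {
    // map from postings hash -> a representative postings list already written
    static final Map<Integer, int[]> seen = new HashMap<>();

    // Returns true if `postings` duplicates a previously written list.
    static boolean isDuplicate(int[] postings) {
        int h = Arrays.hashCode(postings);
        int[] prev = seen.get(h);
        if (prev != null && Arrays.equals(prev, postings)) {
            return true;  // confirmed duplicate: terms could share one list
        }
        seen.putIfAbsent(h, postings);
        return false;
    }

    public static void main(String[] args) {
        int[] run = {1, 5, 9};      // postings for "run"
        int[] running = {1, 5, 9};  // stemmed variant with identical postings
        int[] walk = {2, 3};
        System.out.println(isDuplicate(run));      // false: first time seen
        System.out.println(isDuplicate(running));  // true: same docs as "run"
        System.out.println(isDuplicate(walk));     // false
    }
}
```

In a real postings format the hash would be computed incrementally while writing (positions and frequencies included), and only the confirmed-duplicate term would point at the previously written list.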
[jira] [Issue Comment Deleted] (LUCENE-5544) exceptions during IW.rollback can leak files and locks
[ https://issues.apache.org/jira/browse/LUCENE-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-5544: --- Comment: was deleted (was: Apologies for the multiple comments -- what did you mean by {{// don't call ensureOpen here: this acts like "close()" in closeable.}}? That the app can call rollback() multiple times? Because currently it can't: writeLock is set to null by the first call, and a second call will try to sync on a null instance and hit an NPE?)
> exceptions during IW.rollback can leak files and locks
>
> Key: LUCENE-5544
> URL: https://issues.apache.org/jira/browse/LUCENE-5544
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Robert Muir
> Fix For: 4.8, 5.0, 4.7.1
> Attachments: LUCENE-5544.patch
>
> Today, rollback() doesn't always succeed: if it does, it closes the writer nicely. Otherwise, if it hits an exception, it leaves you with a half-broken writer, still potentially holding file handles and the write lock.
> This is especially bad if you use native locks, because you are kind of hosed: the static map prevents you from forcefully unlocking (e.g. IndexWriter.unlock), so you have no real course of action to recover.
> If rollback() hits an exception, it should still deliver the exception, but release things (e.g. like IOUtils.close).
[jira] [Commented] (LUCENE-5544) exceptions during IW.rollback can leak files and locks
[ https://issues.apache.org/jira/browse/LUCENE-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942805#comment-13942805 ] Robert Muir commented on LUCENE-5544: - You can definitely call it multiple times, and some tests in fact do just that. That's why IOUtils.close() is used, which does nothing on a null parameter.
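The null-tolerant close idiom Robert describes can be illustrated in isolation (a sketch, not Lucene's actual IOUtils): closing a null resource is a no-op, so a second rollback() that already nulled its writeLock simply does nothing instead of hitting an NPE.

```java
import java.io.Closeable;
import java.io.IOException;

// Sketch of the null-tolerant close idiom: null entries are silently
// skipped, so "close" can safely be called again after resources were
// released and nulled out by an earlier call.
public class NullTolerantClose {
    static int closed = 0;  // counts real close() calls, for illustration

    static void close(Closeable... resources) throws IOException {
        for (Closeable c : resources) {
            if (c != null) {   // the key line: skip already-released resources
                c.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Closeable lock = () -> closed++;  // stands in for a write lock
        close(lock, null);  // the null entry is silently skipped
        close(null, null);  // calling again with nothing left is harmless
        System.out.println(closed);  // prints 1
    }
}
```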
[jira] [Commented] (LUCENE-5544) exceptions during IW.rollback can leak files and locks
[ https://issues.apache.org/jira/browse/LUCENE-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942804#comment-13942804 ] Shai Erera commented on LUCENE-5544: Apologies for the multiple comments -- what did you mean by {{// don't call ensureOpen here: this acts like "close()" in closeable.}}? That the app can call rollback() multiple times? Because currently it can't: writeLock is set to null by the first call, and a second call will try to sync on a null instance and hit an NPE?
[jira] [Commented] (LUCENE-5544) exceptions during IW.rollback can leak files and locks
[ https://issues.apache.org/jira/browse/LUCENE-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942802#comment-13942802 ] Shai Erera commented on LUCENE-5544: bq. Just thinking about making the test more evil.
Though if the exception happens in Lock.close(), the lock will still exist and the test will fail asserting that the writer isn't locked. It's a valid exception, but nothing we can do about it while calling rollback(), so maybe exclude it from the list of allowed places to fail.
Do you think it's better not to swallow the exceptions in the finally part, but to add them as suppressed to any original exception? Because if e.g. lock.close() fails, the app won't be able to open a new writer, yet all the info it has is the original exception that happened during rollback(), with no indication that the lock couldn't be released either.
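Shai's suggestion maps directly onto Java 7's suppressed-exception mechanism. A minimal sketch (method names here are hypothetical, not the patch's code): if cleanup in the finally block also fails, attach that failure to the original exception via addSuppressed() rather than swallowing it, so the caller sees both problems.

```java
// Sketch of "don't swallow cleanup failures": the original rollback
// exception is still the one thrown, but a failure while releasing the
// (hypothetical) lock rides along as a suppressed exception.
public class SuppressedDemo {
    static void rollback() throws Exception {
        Exception original = null;
        try {
            // simulated failure during the rollback work itself
            throw new Exception("failure during rollback");
        } catch (Exception e) {
            original = e;
            throw e;
        } finally {
            try {
                releaseLock();  // hypothetical cleanup that may also fail
            } catch (Exception cleanup) {
                if (original == null) {
                    throw cleanup;           // cleanup failed on its own
                }
                original.addSuppressed(cleanup);  // keep both failures visible
            }
        }
    }

    static void releaseLock() throws Exception {
        throw new Exception("lock release failed");
    }

    public static void main(String[] args) {
        try {
            rollback();
        } catch (Exception e) {
            System.out.println(e.getMessage());                     // failure during rollback
            System.out.println(e.getSuppressed()[0].getMessage());  // lock release failed
        }
    }
}
```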
[jira] [Commented] (LUCENE-5544) exceptions during IW.rollback can leak files and locks
[ https://issues.apache.org/jira/browse/LUCENE-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942795#comment-13942795 ] Robert Muir commented on LUCENE-5544: -
{quote} About the test, maybe instead of asserting that IW.isLocked == false, try to open a new IW? I guess it will fail if you remove the stuff that you added to the finally clause? That will guarantee that we test what the app is likely to do after calling rollback(). {quote}
Well, the current test doesn't even need that assert: it's just for clarity. We don't need an assert for this stuff at all: the last line of directory.close() (MDW) will fail if there are open locks or files!
{quote} And also, do you think it's better to use MDW.failOn to randomly fail if we're somewhere in the rollback() stack? Because currently the test fails only in one of two places. Just thinking about making the test more evil. {quote}
This is a good idea.
[jira] [Commented] (SOLR-5228) Don't require <field> or <dynamicField> be inside of <fields> -- or that <fieldType> be inside of <types>
[ https://issues.apache.org/jira/browse/SOLR-5228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942792#comment-13942792 ] Shawn Heisey commented on SOLR-5228: The schema version changes how Solr interprets default settings. I'm fairly sure that it has nothing to do with the XML structure. I don't think we need a new schema version for this. +1 to Robert's idea in the first comment. I will restate it below to make sure I understand it properly:
* Allow <field> and <fieldType> at the top level under <schema>.
* Deprecate <fields> and <types> in 4x. Remove them in trunk. The unknown tags will fail parsing.
* Don't worry about supporting all options in the deprecated sections.
> Don't require <field> or <dynamicField> be inside of <fields> -- or that <fieldType> be inside of <types>
>
> Key: SOLR-5228
> URL: https://issues.apache.org/jira/browse/SOLR-5228
> Project: Solr
> Issue Type: Improvement
> Components: Schema and Analysis
> Reporter: Hoss Man
> Assignee: Hoss Man
>
> On the solr-user mailing list, Nutan recently mentioned spending days trying to track down a problem that turned out to be because he had attempted to add a {{<field/>}} that was outside of the {{<fields/>}} block in his schema.xml -- Solr was just silently ignoring it.
> We have made improvements in other areas of config validation by generating startup errors when tags/attributes are found that are not expected -- but in this case I think we should just stop expecting/requiring that the {{<fields/>}} and {{<types/>}} tags will be used to group these sorts of things. I think schema.xml parsing should just start ignoring them and only care about finding the {{<field/>}}, {{<dynamicField/>}}, and {{<fieldType/>}} tags wherever they may be.
> If people want to keep using them, fine. If people want to mix fieldTypes and fields side by side (perhaps specify a fieldType, then list all the fields using it), fine.
> I don't see any value in forcing people to use them, but we definitely shouldn't leave things the way they are, with otherwise perfectly valid field/type declarations being silently ignored.
> ---
> I'll take this on unless I see any objections.
[jira] [Commented] (LUCENE-5544) exceptions during IW.rollback can leak files and locks
[ https://issues.apache.org/jira/browse/LUCENE-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942789#comment-13942789 ] Shai Erera commented on LUCENE-5544: Patch looks good. So basically, with this patch, the state of IW after rollback() is that it's always closed and doesn't leak any important resources like write.lock and pooled readers, and there's no way to continue using the instance - the app must create a new IW instance. We can still end up with a segments_N file in the directory (if its delete failed), but I guess IW will detect it's corrupt and use the one from the previous commit.
About the test, maybe instead of asserting that IW.isLocked == false, try to open a new IW? I guess it will fail if you remove the stuff that you added to the finally clause? That will guarantee that we test what the app is likely to do after calling rollback().
And also, do you think it's better to use MDW.failOn to randomly fail if we're somewhere in the rollback() stack? Because currently the test fails only in one of two places. Just thinking about making the test more evil.
[JENKINS] Lucene-Solr-4.x-Linux (64bit/jdk1.7.0_51) - Build # 9755 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/9755/ Java: 64bit/jdk1.7.0_51 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC -XX:-UseSuperWord 1 tests failed. FAILED: junit.framework.TestSuite.org.apache.solr.rest.schema.TestDynamicFieldCollectionResource Error Message: ERROR: SolrIndexSearcher opens=1 closes=3 Stack Trace: java.lang.AssertionError: ERROR: SolrIndexSearcher opens=1 closes=3 at __randomizedtesting.SeedInfo.seed([AF66F1D89DBCEAEB]:0) at org.junit.Assert.fail(Assert.java:93) at org.apache.solr.SolrTestCaseJ4.endTrackingSearchers(SolrTestCaseJ4.java:405) at org.apache.solr.SolrTestCaseJ4.afterClass(SolrTestCaseJ4.java:176) at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1617) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:789) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:359) at java.lang.Thread.run(Thread.java:744) Build Log: [...truncated 10443 lines...] [junit4] Suite: org.apache.solr.rest.schema.TestDynamicFieldCollectionResource [junit4] 2> 287789 T1125 oas.SolrTestCaseJ4.buildSSLConfig Randomized ssl (true) and clientAuth (true) [junit4] 2> 287790 T1125 oas.SolrTestCaseJ4.initCore initCore [junit4] 2> Creating dataDir: /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/solr/build/solr-core/test/J1/./solrtest-TestDynamicFieldCollectionResource-1395375921711 [junit4] 2> 287790 T1125 oas.SolrTestCaseJ4.initCore initCore end [junit4] 2> 287791 T1125 oejs.Server.doStart jetty-8.1.10.v20130312 [junit4] 2> 287794 T1125 oejus.SslContextFactory.doStart Enabled Protocols [SSLv2Hello, SSLv3, TLSv1, TLSv1.1, TLSv1.2] of [SSLv2Hello, SSLv3, TLSv1, TLSv1.1, TLSv1.2] [junit4] 2> 287797 T1125 oejs.AbstractConnector.doStart Started SslSelectChannelConnector@127.0.0.1:47178 [junit4] 2> 287798 T1125 oass.SolrDispatchFilter.init SolrDispatchFilter.init() [junit4] 2> 287799 T1125 oasc.SolrResourceLoader.locateSolrHome JNDI not configured for solr (NoInitialContextEx) [junit4] 2> 287799 T1125 oasc.SolrResourceLoader.locateSolrHome using system property solr.solr.home: /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/solr/core/src/test-files/solr [junit4] 2> 287799 T1125 oasc.SolrResourceLoader. 
new SolrResourceLoader for directory: '/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/solr/core/src/test-files/solr/' [junit4] 2> 287813 T1125 oasc.ConfigSolr.fromFile Loading container configuration from /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/solr/core/src/test-files/solr/solr.xml [junit4] 2> 287857 T1125 oasc.CoreContainer. New CoreContainer 811381485 [junit4] 2> 287857 T1125 oasc.CoreContainer.load Loading cores into CoreContainer [instanceDir=/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/solr/core/src/test-files/solr/] [junit4] 2> 287858 T1125 oashc.HttpShardHandlerFactory.getParameter Setting socketTimeout to: 9 [junit4] 2> 287859 T1125 oashc.HttpShardHandlerFactory.getParameter Setting urlScheme to: https [junit4] 2> 287859 T1125 oashc.HttpShardHandlerFactory.getParameter
[jira] [Updated] (LUCENE-5544) exceptions during IW.rollback can leak files and locks
[ https://issues.apache.org/jira/browse/LUCENE-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-5544: Attachment: LUCENE-5544.patch
Here's the start of a patch. Really, the current rollback code is too crazy: there is no need for it to call the super-scary closeInternal(false, false) at the end, when in this case all that huge, complicated piece of code is doing is just calling close() on IndexFileDeleter and releasing the write.lock.
[jira] [Created] (LUCENE-5544) exceptions during IW.rollback can leak files and locks
Robert Muir created LUCENE-5544: --- Summary: exceptions during IW.rollback can leak files and locks Key: LUCENE-5544 URL: https://issues.apache.org/jira/browse/LUCENE-5544 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Fix For: 4.8, 5.0, 4.7.1
Today, rollback() doesn't always succeed: if it does, it closes the writer nicely. Otherwise, if it hits an exception, it leaves you with a half-broken writer, still potentially holding file handles and the write lock.
This is especially bad if you use native locks, because you are kind of hosed: the static map prevents you from forcefully unlocking (e.g. IndexWriter.unlock), so you have no real course of action to recover.
If rollback() hits an exception, it should still deliver the exception, but release things (e.g. like IOUtils.close).
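The shape of the proposed fix can be sketched without any Lucene internals: even if the rollback work throws, a finally clause still releases the resources, so the caller gets the exception but nothing leaks. The fields here are stand-ins, not IndexWriter's actual state.

```java
// Sketch: rollback delivers its exception, but the finally clause still
// releases the (simulated) file handles and write lock, so a failed
// rollback no longer leaves a half-broken, resource-holding writer.
public class SafeRollback {
    static boolean lockHeld = true;   // stands in for the native write lock
    static boolean filesOpen = true;  // stands in for open index files

    static void rollback() throws Exception {
        try {
            throw new Exception("disk full while rolling back");  // simulated failure
        } finally {
            // release unconditionally, whatever happened above
            filesOpen = false;
            lockHeld = false;
        }
    }

    public static void main(String[] args) {
        try {
            rollback();
        } catch (Exception e) {
            System.out.println(e.getMessage());  // the failure is still delivered
        }
        System.out.println(lockHeld);   // prints false: lock released anyway
        System.out.println(filesOpen);  // prints false: files released anyway
    }
}
```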
[jira] [Commented] (LUCENE-5542) Explore making DVConsumer sparse-aware
[ https://issues.apache.org/jira/browse/LUCENE-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942754#comment-13942754 ] Shai Erera commented on LUCENE-5542: I don't think it makes the API more complicated. To the users of the API we say "pass only docs with values". To the Codec developers we say "you are going to get only docs with values, so encode however you see fit, such that you can later provide docsWithFields efficiently". It's not about performance yet, but about making the API clear (in my opinion) - stating that {{null}} denotes a missing value for a document is no better than just not passing the document in the first place.
> Explore making DVConsumer sparse-aware
>
> Key: LUCENE-5542
> URL: https://issues.apache.org/jira/browse/LUCENE-5542
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/codecs
> Reporter: Shai Erera
>
> Today the DVConsumer API requires the caller to pass a value for every document, where {{null}} means "this doc has no value". The Codec can then choose how to encode the values, i.e. whether it encodes a 0 for a numeric field, or encodes the sparse docs. In practice, from what I see, we choose to encode the 0s.
> I wonder whether, e.g., passing an {{Iterable}} of <docID, value> pairs to DVConsumer.addXYZField() would make a better API. The caller only passes such pairs, and it's up to the Codec to decide how it wants to encode the missing values. Like, if a user's app truly has a sparse NDV field, IndexWriter doesn't need to "fill the gaps" artificially. It's the job of the Codec.
> To be clear, I don't propose to change any Codec implementation in this issue (w.r.t. sparse encoding - yes/no), only to change the API to reflect that sparseness. I think that if we'll ever want to encode sparse values, it will be a more convenient API.
> Thoughts? I volunteer to do this work, but want to get others' opinion before I start.
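The API shape under discussion is easy to picture: today's "dense" form hands the consumer one slot per document with {{null}} for missing values, while the sparse form hands it only the documents that have a value. A small sketch of the conversion (all names hypothetical, not the DVConsumer API):

```java
import java.util.LinkedHashMap;

// Hypothetical sketch of the sparse-aware shape Shai proposes: turn a
// dense per-document array (null = "doc has no value", today's API shape)
// into only the <docID, value> pairs the codec would actually receive.
public class SparseValuesSketch {
    static LinkedHashMap<Integer, Long> toSparse(Long[] dense) {
        LinkedHashMap<Integer, Long> sparse = new LinkedHashMap<>();
        for (int doc = 0; doc < dense.length; doc++) {
            if (dense[doc] != null) {       // skip docs without a value
                sparse.put(doc, dense[doc]);
            }
        }
        return sparse;
    }

    public static void main(String[] args) {
        // 5 docs, only docs 1 and 4 carry a value
        Long[] dense = {null, 42L, null, null, 7L};
        System.out.println(toSparse(dense));  // prints {1=42, 4=7}
    }
}
```

The codec would then be free to store the pairs as-is, or to densify with filler values, without the caller having to fabricate nulls.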
[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.8.0) - Build # 9861 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/9861/ Java: 32bit/jdk1.8.0 -server -XX:+UseSerialGC 1 tests failed. FAILED: org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.testWithin {#2 seed=[7FF2374655968ABC:36A2C61D9F0BFDF1]} Error Message: Shouldn't match I#1:Pt(x=-48.0,y=-66.0) Q:Pt(x=-90.0,y=-76.0) Stack Trace: java.lang.AssertionError: Shouldn't match I#1:Pt(x=-48.0,y=-66.0) Q:Pt(x=-90.0,y=-76.0) at __randomizedtesting.SeedInfo.seed([7FF2374655968ABC:36A2C61D9F0BFDF1]:0) at org.junit.Assert.fail(Assert.java:93) at org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.fail(SpatialOpRecursivePrefixTreeTest.java:355) at org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.doTest(SpatialOpRecursivePrefixTreeTest.java:335) at org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.testWithin(SpatialOpRecursivePrefixTreeTest.java:119) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1617) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:826) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:862) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:876) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:359) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:783) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:443) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:835) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:771) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:782) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:359) at java.lang.Thread.run(Thread.java:744) Build Log: [...truncated 9167 lines...] [junit4] Suite: org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest [junit4] 1> Strategy: RecursivePrefixTreeStrategy(prefixGridScanLevel:-2,SPG:(QuadPrefixTree(maxLevels:2,ctx:SpatialContext{geo=fa
[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942720#comment-13942720 ] rulinma commented on SOLR-1301: --- mark. > Add a Solr contrib that allows for building Solr indexes via Hadoop's > Map-Reduce. > - > > Key: SOLR-1301 > URL: https://issues.apache.org/jira/browse/SOLR-1301 > Project: Solr > Issue Type: New Feature >Reporter: Andrzej Bialecki >Assignee: Mark Miller > Fix For: 4.7, 5.0 > > Attachments: README.txt, SOLR-1301-hadoop-0-20.patch, > SOLR-1301-hadoop-0-20.patch, SOLR-1301-maven-intellij.patch, SOLR-1301.patch, > SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, > SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, > SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, > SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, > SOLR-1301.patch, SolrRecordWriter.java, commons-logging-1.0.4.jar, > commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, > hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, > log4j-1.2.15.jar > > > This patch contains a contrib module that provides distributed indexing > (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is > twofold: > * provide an API that is familiar to Hadoop developers, i.e. that of > OutputFormat > * avoid unnecessary export and (de)serialization of data maintained on HDFS. > SolrOutputFormat consumes data produced by reduce tasks directly, without > storing it in intermediate files. Furthermore, by using an > EmbeddedSolrServer, the indexing task is split into as many parts as there > are reducers, and the data to be indexed is not sent over the network. > Design > -- > Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, > which in turn uses SolrRecordWriter to write this data. 
SolrRecordWriter > instantiates an EmbeddedSolrServer, and it also instantiates an > implementation of SolrDocumentConverter, which is responsible for turning > Hadoop (key, value) into a SolrInputDocument. This data is then added to a > batch, which is periodically submitted to EmbeddedSolrServer. When reduce > task completes, and the OutputFormat is closed, SolrRecordWriter calls > commit() and optimize() on the EmbeddedSolrServer. > The API provides facilities to specify an arbitrary existing solr.home > directory, from which the conf/ and lib/ files will be taken. > This process results in the creation of as many partial Solr home directories > as there were reduce tasks. The output shards are placed in the output > directory on the default filesystem (e.g. HDFS). Such part-N directories > can be used to run N shard servers. Additionally, users can specify the > number of reduce tasks, in particular 1 reduce task, in which case the output > will consist of a single shard. > An example application is provided that processes large CSV files and uses > this API. It uses a custom CSV processing to avoid (de)serialization overhead. > This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this > issue, you should put it in contrib/hadoop/lib. > Note: the development of this patch was sponsored by an anonymous contributor > and approved for release under Apache License. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
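The write path described above — convert each reduce-side (key, value) pair to a document, buffer a batch, submit it periodically, and commit when the writer closes — can be sketched in plain Java. The `DocServer` interface and `BatchingRecordWriter` class below are illustrative stand-ins, not the patch's actual `EmbeddedSolrServer`/`SolrRecordWriter` classes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of the SolrRecordWriter flow described above: buffer converted
// documents and submit them in batches; commit once on close.
class BatchingRecordWriter {
    // Stand-in for EmbeddedSolrServer; not a real Solr API.
    interface DocServer {
        void add(List<Map<String, Object>> batch);
        void commit();
    }

    private final DocServer server;
    private final int batchSize;
    private final List<Map<String, Object>> batch = new ArrayList<>();

    BatchingRecordWriter(DocServer server, int batchSize) {
        this.server = server;
        this.batchSize = batchSize;
    }

    // Called once per (key, value) pair, after the SolrDocumentConverter
    // step has produced a field map for the document.
    void write(Map<String, Object> doc) {
        batch.add(doc);
        if (batch.size() >= batchSize) flush();
    }

    private void flush() {
        if (!batch.isEmpty()) {
            server.add(new ArrayList<>(batch));
            batch.clear();
        }
    }

    // Mirrors the close step described above: flush the tail, then commit.
    void close() {
        flush();
        server.commit();
    }
}
```

Because each reducer owns its own writer (and thus its own embedded server), this is also why the job produces one shard per reduce task.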
[jira] [Commented] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses
[ https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942684#comment-13942684 ] Da Huang commented on LUCENE-4396: -- A new iteration on the proposal has just been submitted. The new iteration has added a part "Supplementary Notes" to describe how to fit my design to the new design on the current lucene trunk, such as renaming BooleanScorer to BooleanBulkScorer, creating a new BooleanScorer extended from Scorer. > BooleanScorer should sometimes be used for MUST clauses > --- > > Key: LUCENE-4396 > URL: https://issues.apache.org/jira/browse/LUCENE-4396 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless > > Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT. > If there is one or more MUST clauses we always use BooleanScorer2. > But I suspect that unless the MUST clauses have very low hit count compared > to the other clauses, that BooleanScorer would perform better than > BooleanScorer2. BooleanScorer still has some vestiges from when it used to > handle MUST so it shouldn't be hard to bring back this capability ... I think > the challenging part might be the heuristics on when to use which (likely we > would have to use firstDocID as proxy for total hit count). > Likely we should also have BooleanScorer sometimes use .advance() on the subs > in this case, eg if suddenly the MUST clause skips 100 docs then you want > to .advance() all the SHOULD clauses. > I won't have near term time to work on this so feel free to take it if you > are inspired! -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942635#comment-13942635 ] Kranti Parisa commented on SOLR-4787: - so for any query you might return one or more EVENTS matching the title search terms + filters. say you have 30 events matching the given criteria but your pagination is 1-10, so you would be displaying the top 10 most relevant EVENTS.. this would be the docList of your first query.. and from the ResponseWriter you would need to make a call to TICKETS core, by using the original filters + the 10 event ids and execute that request (you might need to use LocalSolrQueryRequest and pre-processed filters etc to hit the caches of the first query). and collect the field info you need for each EVENT.. From the joins implementation point of view, there is no such thing to fetch the values or scores from the secondCore.. it would be very costly to do that.. you would need to write some custom ResponseWriters etc which does this stuff.. especially considering your requirement of maintaining EVENTS and TICKETS separately. There is also a new feature Collapse, Expand results.. but then I am not sure about using them for your use case..
> Join Contrib > > > Key: SOLR-4787 > URL: https://issues.apache.org/jira/browse/SOLR-4787 > Project: Solr > Issue Type: New Feature > Components: search >Affects Versions: 4.2.1 >Reporter: Joel Bernstein >Priority: Minor > Fix For: 4.8 > > Attachments: SOLR-4787-deadlock-fix.patch, > SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, > SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, > SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, > SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, > SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, > SOLR-4797-hjoin-multivaluekeys-trunk.patch > > > This contrib provides a place where different join implementations can be > contributed to Solr. This contrib currently includes 3 join implementations. > The initial patch was generated from the Solr 4.3 tag. Because of changes in > the FieldCache API this patch will only build with Solr 4.2 or above. > *HashSetJoinQParserPlugin aka hjoin* > The hjoin provides a join implementation that filters results in one core > based on the results of a search in another core. This is similar in > functionality to the JoinQParserPlugin but the implementation differs in a > couple of important ways. > The first way is that the hjoin is designed to work with int and long join > keys only. So, in order to use hjoin, int or long join keys must be included > in both the to and from core. > The second difference is that the hjoin builds memory structures that are > used to quickly connect the join keys. So, the hjoin will need more memory > then the JoinQParserPlugin to perform the join. > The main advantage of the hjoin is that it can scale to join millions of keys > between cores and provide sub-second response time. The hjoin should work > well with up to two million results from the fromIndex and tens of millions > of results from the main query. 
> The hjoin supports the following features: > 1) Both lucene query and PostFilter implementations. A *"cost"* > 99 will > turn on the PostFilter. The PostFilter will typically outperform the Lucene > query when the main query results have been narrowed down. > 2) With the lucene query implementation there is an option to build the > filter with threads. This can greatly improve the performance of the query if > the main query index is very large. The "threads" parameter turns on > threading. For example *threads=6* will use 6 threads to build the filter. > This will set up a fixed threadpool with six threads to handle all hjoin > requests. Once the threadpool is created the hjoin will always use it to > build the filter. Threading does not come into play with the PostFilter. > 3) The *size* local parameter can be used to set the initial size of the > hashset used to perform the join. If this is set above the number of results > from the fromIndex then you can avoid hashset resizing, which improves > performance. > 4) Nested filter queries. The local parameter "fq" can be used to nest a > filter query within the join. The nested fq will filter the results of the > join query. This can point to another join to support nested joins. > 5) Full caching support for the lucene query implementation. The filterCache > and queryResultCache should work properly even with deep nesting of joins. > Only the queryResultCache comes into play with the PostFilter implementation > because PostFilters are not cacheable in the filterCache. > The syntax of the hjoin is similar to the JoinQParserPlugin.
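The hash-join mechanics described above can be sketched with plain Java collections: build a set of numeric join keys from the fromIndex results, pre-sized up front (the *size* idea) to avoid rehashing, then filter the main query's keys against it. Class and method names here are illustrative, not the patch's actual code:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of the hjoin idea above: hash the fromIndex join keys once,
// then test each main-query doc's key for membership.
class HashJoinSketch {
    // Build the key set once per join; sizing it up front avoids
    // hashset resizing while it fills (the "size" local parameter).
    static Set<Long> buildKeySet(long[] fromIndexKeys, int initialSize) {
        Set<Long> keys = new HashSet<>(initialSize);
        for (long k : fromIndexKeys) keys.add(k);
        return keys;
    }

    // Filter step: analogous to what the PostFilter does per collected doc.
    static List<Long> join(long[] mainQueryKeys, Set<Long> fromKeys) {
        List<Long> hits = new ArrayList<>();
        for (long k : mainQueryKeys) {
            if (fromKeys.contains(k)) hits.add(k);
        }
        return hits;
    }
}
```

The memory/speed trade-off the description mentions falls out directly: the whole fromIndex key set lives on the heap, but each membership test is O(1).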
Re: Analyzing primitive types, why can't we do this in Solr?
Hi Erick, Maybe work with me offline on that idea. Sounds interesting and I would love to hear more details. There is more Solr popularization stuff in works as well, so plenty of opportunities for - ugh - synergies. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Fri, Mar 21, 2014 at 8:20 AM, Erick Erickson wrote: > Yeah, I finally got a little smart and clicked the hierarchy builder > link in IntelliJ when resting on UpdateRequestProcessorFactory. I > don't use the built-in IDE tools _nearly_ enough. > > That page looks great BTWl. I'm going to follow a parallel path with > the Solr docs that I think will complement yours, just a brief outline > of what's there similar to the "Analyzers and Tokenizers" page If > I find the time... Siigggh. > > Erick > > On Thu, Mar 20, 2014 at 5:53 PM, Alexandre Rafalovitch > wrote: >> That "chain" issue is exactly why I built the web page above. That, >> plus the Javadoc links all over the place. >> >> Next, I am working on a similar page for all Solr Analyzers, >> Tokenizers and Filters. Should be ready soon. >> >> Regards, >>Alex. >> Personal website: http://www.outerthoughts.com/ >> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch >> - Time is the quality of nature that keeps events from happening all >> at once. Lately, it doesn't seem to be working. (Anonymous - via GTD >> book) >> >> >> On Fri, Mar 21, 2014 at 7:50 AM, Erick Erickson >> wrote: >>> Thanks Alexandre! That's what I _thought_ I remembered! >>> >>> It looks like I found all the extends for UpdateProcessorFactory, but >>> didn't follow the chain through FieldMutatingUpdateProcessorFactory >>> which would have found that one for me. >>> >>> Siiihhh. 
>>> >>> Thanks again, >>> Erick >>> >>> On Thu, Mar 20, 2014 at 5:44 PM, Alexandre Rafalovitch >>> wrote: Do you mean like: http://lucene.apache.org/solr/4_6_1/solr-core/org/apache/solr/update/processor/ParseDateFieldUpdateProcessorFactory.html ? https://github.com/apache/lucene-solr/blob/lucene_solr_4_7_0/solr/example/example-schemaless/solr/collection1/conf/solrconfig.xml#L1570 Regards, Alex. P.s. Quick URP lookup comes to you courtesy of: http://www.solr-start.com/update-request-processor/4.6.1/ :-) On Fri, Mar 21, 2014 at 2:35 AM, Erick Erickson wrote: > I suppose for the special case of dates we could create a > DateFormatProcessorFactory that just took a list of standard Java > SimpleDateFormat strings and applied the first one that fit. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org >>> >>> - >>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >>> For additional commands, e-mail: dev-h...@lucene.apache.org >>> >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: dev-h...@lucene.apache.org >> > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Analyzing primitive types, why can't we do this in Solr?
Yeah, I finally got a little smart and clicked the hierarchy builder link in IntelliJ when resting on UpdateRequestProcessorFactory. I don't use the built-in IDE tools _nearly_ enough. That page looks great BTWl. I'm going to follow a parallel path with the Solr docs that I think will complement yours, just a brief outline of what's there similar to the "Analyzers and Tokenizers" page If I find the time... Siigggh. Erick On Thu, Mar 20, 2014 at 5:53 PM, Alexandre Rafalovitch wrote: > That "chain" issue is exactly why I built the web page above. That, > plus the Javadoc links all over the place. > > Next, I am working on a similar page for all Solr Analyzers, > Tokenizers and Filters. Should be ready soon. > > Regards, >Alex. > Personal website: http://www.outerthoughts.com/ > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch > - Time is the quality of nature that keeps events from happening all > at once. Lately, it doesn't seem to be working. (Anonymous - via GTD > book) > > > On Fri, Mar 21, 2014 at 7:50 AM, Erick Erickson > wrote: >> Thanks Alexandre! That's what I _thought_ I remembered! >> >> It looks like I found all the extends for UpdateProcessorFactory, but >> didn't follow the chain through FieldMutatingUpdateProcessorFactory >> which would have found that one for me. >> >> Siiihhh. >> >> Thanks again, >> Erick >> >> On Thu, Mar 20, 2014 at 5:44 PM, Alexandre Rafalovitch >> wrote: >>> Do you mean like: >>> http://lucene.apache.org/solr/4_6_1/solr-core/org/apache/solr/update/processor/ParseDateFieldUpdateProcessorFactory.html >>> ? >>> https://github.com/apache/lucene-solr/blob/lucene_solr_4_7_0/solr/example/example-schemaless/solr/collection1/conf/solrconfig.xml#L1570 >>> >>> Regards, >>>Alex. >>> P.s. 
Quick URP lookup comes to you courtesy of: >>> http://www.solr-start.com/update-request-processor/4.6.1/ :-) >>> >>> On Fri, Mar 21, 2014 at 2:35 AM, Erick Erickson >>> wrote: I suppose for the special case of dates we could create a DateFormatProcessorFactory that just took a list of standard Java SimpleDateFormat strings and applied the first one that fit. >>> >>> >>> >>> Personal website: http://www.outerthoughts.com/ >>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch >>> - Time is the quality of nature that keeps events from happening all >>> at once. Lately, it doesn't seem to be working. (Anonymous - via GTD >>> book) >>> >>> - >>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >>> For additional commands, e-mail: dev-h...@lucene.apache.org >>> >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: dev-h...@lucene.apache.org >> > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-4984) Fix ThaiWordFilter
[ https://issues.apache.org/jira/browse/LUCENE-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-4984. - Resolution: Fixed Fix Version/s: 5.0 4.8 > Fix ThaiWordFilter > -- > > Key: LUCENE-4984 > URL: https://issues.apache.org/jira/browse/LUCENE-4984 > Project: Lucene - Core > Issue Type: Bug >Reporter: Adrien Grand >Assignee: Adrien Grand > Fix For: 4.8, 5.0 > > Attachments: LUCENE-4984.patch, LUCENE-4984.patch, LUCENE-4984.patch > > > ThaiWordFilter is an offender in TestRandomChains because it creates > positions and updates offsets. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4984) Fix ThaiWordFilter
[ https://issues.apache.org/jira/browse/LUCENE-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942617#comment-13942617 ] ASF subversion and git services commented on LUCENE-4984: - Commit 1579855 from [~rcmuir] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1579855 ] LUCENE-4984: Fix ThaiWordFilter, smartcn WordTokenFilter > Fix ThaiWordFilter > -- > > Key: LUCENE-4984 > URL: https://issues.apache.org/jira/browse/LUCENE-4984 > Project: Lucene - Core > Issue Type: Bug >Reporter: Adrien Grand >Assignee: Adrien Grand > Fix For: 4.8, 5.0 > > Attachments: LUCENE-4984.patch, LUCENE-4984.patch, LUCENE-4984.patch > > > ThaiWordFilter is an offender in TestRandomChains because it creates > positions and updates offsets. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Analyzing primitive types, why can't we do this in Solr?
That "chain" issue is exactly why I built the web page above. That, plus the Javadoc links all over the place. Next, I am working on a similar page for all Solr Analyzers, Tokenizers and Filters. Should be ready soon. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Fri, Mar 21, 2014 at 7:50 AM, Erick Erickson wrote: > Thanks Alexandre! That's what I _thought_ I remembered! > > It looks like I found all the extends for UpdateProcessorFactory, but > didn't follow the chain through FieldMutatingUpdateProcessorFactory > which would have found that one for me. > > Siiihhh. > > Thanks again, > Erick > > On Thu, Mar 20, 2014 at 5:44 PM, Alexandre Rafalovitch > wrote: >> Do you mean like: >> http://lucene.apache.org/solr/4_6_1/solr-core/org/apache/solr/update/processor/ParseDateFieldUpdateProcessorFactory.html >> ? >> https://github.com/apache/lucene-solr/blob/lucene_solr_4_7_0/solr/example/example-schemaless/solr/collection1/conf/solrconfig.xml#L1570 >> >> Regards, >>Alex. >> P.s. Quick URP lookup comes to you courtesy of: >> http://www.solr-start.com/update-request-processor/4.6.1/ :-) >> >> On Fri, Mar 21, 2014 at 2:35 AM, Erick Erickson >> wrote: >>> I suppose for the special case of dates we could create a >>> DateFormatProcessorFactory that just took a list of standard Java >>> SimpleDateFormat strings and applied the first one that fit. >> >> >> >> Personal website: http://www.outerthoughts.com/ >> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch >> - Time is the quality of nature that keeps events from happening all >> at once. Lately, it doesn't seem to be working. 
(Anonymous - via GTD >> book) >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: dev-h...@lucene.apache.org >> > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4984) Fix ThaiWordFilter
[ https://issues.apache.org/jira/browse/LUCENE-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942599#comment-13942599 ] ASF subversion and git services commented on LUCENE-4984: - Commit 1579853 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1579853 ] LUCENE-4984: actually pass down the AttributeFactory to superclass > Fix ThaiWordFilter > -- > > Key: LUCENE-4984 > URL: https://issues.apache.org/jira/browse/LUCENE-4984 > Project: Lucene - Core > Issue Type: Bug >Reporter: Adrien Grand >Assignee: Adrien Grand > Attachments: LUCENE-4984.patch, LUCENE-4984.patch, LUCENE-4984.patch > > > ThaiWordFilter is an offender in TestRandomChains because it creates > positions and updates offsets. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Google Summer of Code
What does it take to be a mentor? I have a couple of Solr ideas I would be happy to mentor someone on. But do mentors have to sign agreements, be part of Apache formally, etc? Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Thu, Mar 20, 2014 at 10:12 PM, Michael McCandless wrote: > Unfortunately, the only two GSoC mentors we seem to have this year is > David Smiley and myself, and we each are already signed up to mentor > one student, and there's at least two other students expressing > interest in different issues. > > So it looks like we have too many students and too few mentors. > > Mike McCandless > > http://blog.mikemccandless.com > > > On Thu, Mar 20, 2014 at 9:50 AM, Furkan KAMACI wrote: >> Hi; >> >> I want to apply for Google Summer of Code if I can catch up the deadline. >> I've checked the issues. I want to ask that is there any issue which is >> labeled for GSoC and has a volunteer mentor but nobody is applied? Because I >> see that there are comments at some issues which asks about volunteer >> mentors. If there is any issue I will be appreciated to work on it. >> >> Thanks; >> Furkan KAMACI > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Analyzing primitive types, why can't we do this in Solr?
Thanks Alexandre! That's what I _thought_ I remembered! It looks like I found all the extends for UpdateProcessorFactory, but didn't follow the chain through FieldMutatingUpdateProcessorFactory which would have found that one for me. Siiihhh. Thanks again, Erick On Thu, Mar 20, 2014 at 5:44 PM, Alexandre Rafalovitch wrote: > Do you mean like: > http://lucene.apache.org/solr/4_6_1/solr-core/org/apache/solr/update/processor/ParseDateFieldUpdateProcessorFactory.html > ? > https://github.com/apache/lucene-solr/blob/lucene_solr_4_7_0/solr/example/example-schemaless/solr/collection1/conf/solrconfig.xml#L1570 > > Regards, >Alex. > P.s. Quick URP lookup comes to you courtesy of: > http://www.solr-start.com/update-request-processor/4.6.1/ :-) > > On Fri, Mar 21, 2014 at 2:35 AM, Erick Erickson > wrote: >> I suppose for the special case of dates we could create a >> DateFormatProcessorFactory that just took a list of standard Java >> SimpleDateFormat strings and applied the first one that fit. > > > > Personal website: http://www.outerthoughts.com/ > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch > - Time is the quality of nature that keeps events from happening all > at once. Lately, it doesn't seem to be working. (Anonymous - via GTD > book) > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Analyzing primitive types, why can't we do this in Solr?
Do you mean like: http://lucene.apache.org/solr/4_6_1/solr-core/org/apache/solr/update/processor/ParseDateFieldUpdateProcessorFactory.html ? https://github.com/apache/lucene-solr/blob/lucene_solr_4_7_0/solr/example/example-schemaless/solr/collection1/conf/solrconfig.xml#L1570 Regards, Alex. P.s. Quick URP lookup comes to you courtesy of: http://www.solr-start.com/update-request-processor/4.6.1/ :-) On Fri, Mar 21, 2014 at 2:35 AM, Erick Erickson wrote: > I suppose for the special case of dates we could create a > DateFormatProcessorFactory that just took a list of standard Java > SimpleDateFormat strings and applied the first one that fit. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
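The "list of SimpleDateFormat strings, apply the first one that fits" idea from the quoted message can be sketched with only JDK classes. The `FirstFitDateParser` class below is a hypothetical stand-in, not Solr's actual `ParseDateFieldUpdateProcessorFactory` implementation:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.List;
import java.util.Locale;

// Sketch: try each configured date pattern in order and return the
// result of the first one that parses the input.
class FirstFitDateParser {
    private final List<String> patterns;

    FirstFitDateParser(List<String> patterns) {
        this.patterns = patterns;
    }

    // Returns the parsed date for the first matching pattern, or null
    // if no pattern fits (caller decides whether that is an error).
    Date parse(String value) {
        for (String p : patterns) {
            SimpleDateFormat fmt = new SimpleDateFormat(p, Locale.ROOT);
            fmt.setLenient(false); // reject near-misses instead of guessing
            try {
                return fmt.parse(value);
            } catch (ParseException ignored) {
                // fall through to the next pattern
            }
        }
        return null;
    }
}
```

Pattern order matters: more specific patterns (with time components) should come before shorter ones, since `SimpleDateFormat` happily stops at a matching prefix in lenient mode.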
[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.8.0) - Build # 1426 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/1426/ Java: 64bit/jdk1.8.0 -XX:-UseCompressedOops -XX:+UseParallelGC 1 tests failed. FAILED: junit.framework.TestSuite.org.apache.solr.handler.component.DebugComponentTest Error Message: ERROR: SolrIndexSearcher opens=2 closes=3 Stack Trace: java.lang.AssertionError: ERROR: SolrIndexSearcher opens=2 closes=3 at __randomizedtesting.SeedInfo.seed([591D70B3D1E85AEF]:0) at org.junit.Assert.fail(Assert.java:93) at org.apache.solr.SolrTestCaseJ4.endTrackingSearchers(SolrTestCaseJ4.java:420) at org.apache.solr.SolrTestCaseJ4.afterClass(SolrTestCaseJ4.java:179) at sun.reflect.GeneratedMethodAccessor39.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1617) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:789) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:359) at java.lang.Thread.run(Thread.java:744) Build Log: [...truncated 10667 lines...] [junit4] Suite: org.apache.solr.handler.component.DebugComponentTest [junit4] 2> 1612742 T7076 oas.SolrTestCaseJ4.startTrackingSearchers WARN startTrackingSearchers: numOpens=5 numCloses=4 [junit4] 2> 1612742 T7076 oas.SolrTestCaseJ4.buildSSLConfig Randomized ssl (true) and clientAuth (false) [junit4] 2> 1612743 T7076 oas.SolrTestCaseJ4.initCore initCore [junit4] 2> Creating dataDir: /Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/J0/./solrtest-DebugComponentTest-1395359725304 [junit4] 2> 1612744 T7076 oasc.SolrResourceLoader. 
new SolrResourceLoader for directory: '/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/src/test-files/solr/collection1/' [junit4] 2> 1612745 T7076 oasc.SolrResourceLoader.replaceClassLoader Adding 'file:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/src/test-files/solr/collection1/lib/.svn/' to classloader [junit4] 2> 1612746 T7076 oasc.SolrResourceLoader.replaceClassLoader Adding 'file:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/src/test-files/solr/collection1/lib/classes/' to classloader [junit4] 2> 1612746 T7076 oasc.SolrResourceLoader.replaceClassLoader Adding 'file:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/src/test-files/solr/collection1/lib/README' to classloader [junit4] 2> 1612800 T7076 oasc.SolrConfig. Using Lucene MatchVersion: LUCENE_50 [junit4] 2> 1612819 T7076 oasc.SolrConfig. Loaded SolrConfig: solrconfig.xml [junit4] 2> 1612821 T7076 oass.IndexSchema.readSchema Reading Solr Schema from schema.xml [junit4] 2> 1612824 T7076 oass.IndexSchema.readSchema [null] Schema name=test [junit4] 2> 1612947 T7076 oass.OpenExchangeRatesOrgProvider.init Initialized with rates=open-exchange-rates.json, refreshInterval=1440. [junit4] 2> 1612951 T7076 oass.IndexSchema.readSchema default search field in schema is text [junit4] 2> 1612952 T7076 oass.IndexSchema.readSchema unique key field: id [junit4] 2> 1612958 T7076 oass.FileExchangeRateProvider.reload Reloading exchange rates from file currency.xml [junit4] 2> 1612964 T7076 oass.FileExchangeRateProvi
[jira] [Commented] (LUCENE-4984) Fix ThaiWordFilter
[ https://issues.apache.org/jira/browse/LUCENE-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942582#comment-13942582 ] ASF subversion and git services commented on LUCENE-4984: - Commit 1579846 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1579846 ] LUCENE-4984: Fix ThaiWordFilter, smartcn WordTokenFilter > Fix ThaiWordFilter > -- > > Key: LUCENE-4984 > URL: https://issues.apache.org/jira/browse/LUCENE-4984 > Project: Lucene - Core > Issue Type: Bug >Reporter: Adrien Grand >Assignee: Adrien Grand > Attachments: LUCENE-4984.patch, LUCENE-4984.patch, LUCENE-4984.patch > > > ThaiWordFilter is an offender in TestRandomChains because it creates > positions and updates offsets. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses
[ https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942577#comment-13942577 ] Da Huang commented on LUCENE-4396: -- I'm afraid that if BooleanBulkScorer also handle MUST, it couldn't make use of .advance(), as its subScorers are BulkScorer which could not call .advance(). > BooleanScorer should sometimes be used for MUST clauses > --- > > Key: LUCENE-4396 > URL: https://issues.apache.org/jira/browse/LUCENE-4396 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless > > Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT. > If there is one or more MUST clauses we always use BooleanScorer2. > But I suspect that unless the MUST clauses have very low hit count compared > to the other clauses, that BooleanScorer would perform better than > BooleanScorer2. BooleanScorer still has some vestiges from when it used to > handle MUST so it shouldn't be hard to bring back this capability ... I think > the challenging part might be the heuristics on when to use which (likely we > would have to use firstDocID as proxy for total hit count). > Likely we should also have BooleanScorer sometimes use .advance() on the subs > in this case, eg if suddenly the MUST clause skips 100 docs then you want > to .advance() all the SHOULD clauses. > I won't have near term time to work on this so feel free to take it if you > are inspired! -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
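The `.advance()` interplay discussed above can be illustrated with a leapfrog intersection over sorted doc-ID arrays, which stand in for postings enums here. This is a sketch of the algorithm only, not the actual Scorer/BulkScorer API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Leapfrog conjunction sketch: when one MUST clause skips ahead,
// every other clause is advanced to that doc rather than scanned.
class ConjunctionSketch {
    // First docID >= target, mimicking DocIdSetIterator.advance();
    // Integer.MAX_VALUE plays the role of NO_MORE_DOCS.
    static int advance(int[] postings, int target) {
        int i = Arrays.binarySearch(postings, target);
        if (i < 0) i = -i - 1;
        return i < postings.length ? postings[i] : Integer.MAX_VALUE;
    }

    // Intersect N MUST clauses (each a sorted doc-ID list, N >= 1).
    static int[] intersect(int[][] clauses) {
        List<Integer> out = new ArrayList<>();
        int target = 0;
        outer:
        while (target != Integer.MAX_VALUE) {
            for (int[] c : clauses) {
                int d = advance(c, target);
                if (d != target) {
                    target = d;      // leapfrog: everyone must catch up to d
                    continue outer;
                }
            }
            out.add(target);         // every MUST clause matched this doc
            target++;
        }
        int[] res = new int[out.size()];
        for (int i = 0; i < res.length; i++) res[i] = out.get(i);
        return res;
    }
}
```

The concern in the comment above is precisely that a BulkScorer sub exposes no `advance()`-style entry point, so a bulk conjunction cannot leapfrog like this without API changes.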
[jira] [Commented] (SOLR-5228) Don't require or be inside of -- or that be inside of
[ https://issues.apache.org/jira/browse/SOLR-5228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942559#comment-13942559 ] Tomás Fernández Löbbe commented on SOLR-5228: - What about increasing the schema version? It is currently 1.5. Solr could continue supporting 1.5 as it is now with and , create the version 1.6 that does not support those (and throws exception if present). 5.x would support 1.6+ versions, 4.x should support both but use 1.6 in the example. Anyone who needs to upgrade between 4.x versions can just keep their schema using 1.5. Anyone creating a new schema would start with the 1.6. > Don't require or be inside of -- or that > be inside of > - > > Key: SOLR-5228 > URL: https://issues.apache.org/jira/browse/SOLR-5228 > Project: Solr > Issue Type: Improvement > Components: Schema and Analysis >Reporter: Hoss Man >Assignee: Hoss Man > > On the solr-user mailing list, Nutan recently mentioned spending days trying > to track down a problem that turned out to be because he had attempted to add > a {{}} that was outside of the {{}} block in his > schema.xml -- Solr was just silently ignoring it. > We have made improvements in other areas of config validation by generating > statup errors when tags/attributes are found that are not expected -- but in > this case i think we should just stop expecting/requiring that the > {{}} and {{}} tags will be used to group these sorts of > things. I think schema.xml parsing should just start ignoring them and only > care about finding the {{}}, {{}}, and {{}} > tags wherever they may be. > If people want to keep using them, fine. If people want to mix fieldTypes > and fields side by side (perhaps specify a fieldType, then list all the > fields using it) fine. I don't see any value in forcing people to use them, > but we definitely shouldn't leave things the way they are with otherwise > perfectly valid field/type declarations being silently ignored. 
> --- > I'll take this on unless I see any objections. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
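Tomás's proposed version bump can be pictured concretely. A hypothetical 1.6-style schema.xml (the flattened layout and version number are assumptions drawn from the comment, not a committed format) would let field and type declarations sit side by side without the `<fields>`/`<types>` wrappers:

```xml
<!-- Hypothetical version="1.6" schema: declarations no longer wrapped in <types>/<fields> -->
<schema name="example" version="1.6">
  <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
  <!-- fields may sit right next to the fieldType they use -->
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <dynamicField name="*_s" type="string" indexed="true" stored="true"/>
</schema>
```

Under the proposal, a 1.6 parser would locate `<field>`, `<dynamicField>`, and `<fieldType>` wherever they appear, so nothing is silently ignored.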
[jira] [Commented] (LUCENE-5489) Add query rescoring API
[ https://issues.apache.org/jira/browse/LUCENE-5489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942532#comment-13942532 ] Robert Muir commented on LUCENE-5489: - This looks good, thanks for moving combine(), as the expression already indicates how to combine with the score. It would be cool for us to add that subclass in a followup issue, then we have a better feeling the abstractions are really working. > Add query rescoring API > --- > > Key: LUCENE-5489 > URL: https://issues.apache.org/jira/browse/LUCENE-5489 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 4.8, 5.0 > > Attachments: LUCENE-5489.patch, LUCENE-5489.patch, LUCENE-5489.patch, > LUCENE-5489.patch > > > When costly scoring factors are used during searching, a common > approach is to do a cheaper / basic query first, collect the top few > hundred hits, and then rescore those hits using the more costly > query. > It's not clear/simple to do this with Lucene today; I think we should > make it easier. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5542) Explore making DVConsumer sparse-aware
[ https://issues.apache.org/jira/browse/LUCENE-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942530#comment-13942530 ] Robert Muir commented on LUCENE-5542: - The codec can already decide how to encode the values. Making the API more complicated doesn't seem to buy us anything. I'm open to a benchmark showing this, but I'm not seeing it. > Explore making DVConsumer sparse-aware > -- > > Key: LUCENE-5542 > URL: https://issues.apache.org/jira/browse/LUCENE-5542 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Shai Erera > > Today the DVConsumer API requires the caller to pass a value for every document, > where {{null}} means "this doc has no value". The Codec can then choose how > to encode the values, i.e. whether it encodes a 0 for a numeric field, or > encodes the sparse docs. In practice, from what I see, we choose to encode > the 0s. > I wonder if we e.g. added an {{Iterable}} to > DVConsumer.addXYZField(), if that would make a better API. The caller only > passes <docID,value> pairs and it's up to the Codec to decide how it wants to > encode the missing values. Like, if a user's app truly has a sparse NDV, > IndexWriter doesn't need to "fill the gaps" artificially. It's the job of the > Codec. > To be clear, I don't propose to change any Codec implementation in this issue > (w.r.t. sparse encoding - yes/no), only change the API to reflect that > sparseness. I think that if we'll ever want to encode sparse values, it will > be a more convenient API. > Thoughts? I volunteer to do this work, but want to get others' opinion before > I start. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
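The dense-vs-sparse distinction Shai describes can be sketched entirely outside of Lucene. The toy Java below (none of these names exist in the Lucene codebase) contrasts a caller that supplies a slot for every document, zero-filling the gaps, with one that hands over only the pairs that actually exist and leaves the rest to the codec:

```java
import java.util.*;

/** Toy illustration (not Lucene's API): dense vs. sparse encoding of numeric doc values. */
public class SparseDocValuesDemo {
    /** Dense: a slot per doc; docs without a value get an artificial 0. */
    static long[] encodeDense(Map<Integer, Long> values, int maxDoc) {
        long[] dense = new long[maxDoc];                 // gaps are "filled" with 0
        for (Map.Entry<Integer, Long> e : values.entrySet()) {
            dense[e.getKey()] = e.getValue();
        }
        return dense;
    }

    /** Sparse: keep only <docID,value> pairs; the codec decides how to store them. */
    static List<long[]> encodeSparse(Map<Integer, Long> values) {
        List<long[]> pairs = new ArrayList<>();
        for (Map.Entry<Integer, Long> e : new TreeMap<>(values).entrySet()) {
            pairs.add(new long[] { e.getKey(), e.getValue() });
        }
        return pairs;
    }

    public static void main(String[] args) {
        Map<Integer, Long> vals = Map.of(2, 7L, 5, 42L); // only docs 2 and 5 have a value
        long[] dense = encodeDense(vals, 8);
        List<long[]> sparse = encodeSparse(vals);
        // prints "8 dense slots, 2 sparse pairs"
        System.out.println(dense.length + " dense slots, " + sparse.size() + " sparse pairs");
    }
}
```

Robert's counterpoint holds here too: nothing in the dense API stops a codec from detecting the zeros and storing the sparse form internally.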
[jira] [Commented] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses
[ https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942481#comment-13942481 ] Michael McCandless commented on LUCENE-4396: Using BooleanScorer (a Scorer) when there is one or more MUST makes sense I think, but we need to test perf. It could be letting BooleanBulkScorer also handle MUST gives a good performance gain, in which case we could let both handle it ... bq. Besides, I'm afraid that the name of BulkScorer may be confusing. That's a good point ... for a while on that issue we had the name TopScorer ... maybe we need to revisit that :) > BooleanScorer should sometimes be used for MUST clauses > --- > > Key: LUCENE-4396 > URL: https://issues.apache.org/jira/browse/LUCENE-4396 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless > > Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT. > If there is one or more MUST clauses we always use BooleanScorer2. > But I suspect that unless the MUST clauses have very low hit count compared > to the other clauses, that BooleanScorer would perform better than > BooleanScorer2. BooleanScorer still has some vestiges from when it used to > handle MUST so it shouldn't be hard to bring back this capability ... I think > the challenging part might be the heuristics on when to use which (likely we > would have to use firstDocID as proxy for total hit count). > Likely we should also have BooleanScorer sometimes use .advance() on the subs > in this case, eg if suddenly the MUST clause skips 100 docs then you want > to .advance() all the SHOULD clauses. > I won't have near term time to work on this so feel free to take it if you > are inspired! -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
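Mike's point about using .advance() when the MUST clause skips far ahead can be sketched with a toy postings iterator. This is illustrative plain Java, not Lucene's BooleanScorer; the classes and method shapes are invented for the sketch:

```java
import java.util.*;

/** Toy sketch (not Lucene's BooleanScorer): drive iteration by the MUST clause
 *  and advance() each SHOULD clause to its docID instead of scanning every doc. */
public class MustDrivenScorerDemo {
    /** Minimal postings iterator over a sorted docID list. */
    static class Postings {
        final int[] docs;
        int pos = 0;
        Postings(int... docs) { this.docs = docs; }
        /** Skip to the first doc >= target; returns it, or MAX_VALUE when exhausted. */
        int advance(int target) {
            while (pos < docs.length && docs[pos] < target) pos++;
            return pos < docs.length ? docs[pos] : Integer.MAX_VALUE;
        }
    }

    /** Collect MUST's docs; count how many SHOULD clauses also match each one. */
    static Map<Integer, Integer> score(Postings must, List<Postings> shoulds) {
        Map<Integer, Integer> matches = new LinkedHashMap<>();
        for (int doc : must.docs) {
            int n = 0;
            for (Postings s : shoulds) {
                if (s.advance(doc) == doc) n++;   // leapfrog: no per-doc scan of SHOULD
            }
            matches.put(doc, n);
        }
        return matches;
    }

    public static void main(String[] args) {
        Postings must = new Postings(5, 100, 107);          // sparse MUST clause
        List<Postings> shoulds = List.of(new Postings(1, 5, 100, 200),
                                         new Postings(3, 107));
        System.out.println(score(must, shoulds));           // {5=1, 100=1, 107=1}
    }
}
```

When the MUST clause jumps from doc 5 to doc 100, each SHOULD iterator leaps straight to 100 rather than walking docs 6–99, which is the behavior the issue asks BooleanScorer to adopt.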
[jira] [Updated] (LUCENE-5489) Add query rescoring API
[ https://issues.apache.org/jira/browse/LUCENE-5489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-5489: --- Attachment: LUCENE-5489.patch New patch, folding in feedback ... I think it's ready. > Add query rescoring API > --- > > Key: LUCENE-5489 > URL: https://issues.apache.org/jira/browse/LUCENE-5489 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 4.8, 5.0 > > Attachments: LUCENE-5489.patch, LUCENE-5489.patch, LUCENE-5489.patch, > LUCENE-5489.patch > > > When costly scoring factors are used during searching, a common > approach is to do a cheaper / basic query first, collect the top few > hundred hits, and then rescore those hits using the more costly > query. > It's not clear/simple to do this with Lucene today; I think we should > make it easier. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5543) Remove Directory.fileExists
[ https://issues.apache.org/jira/browse/LUCENE-5543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-5543: --- Attachment: LUCENE-5543.patch Patch, I think it's ready. > Remove Directory.fileExists > --- > > Key: LUCENE-5543 > URL: https://issues.apache.org/jira/browse/LUCENE-5543 > Project: Lucene - Core > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 4.8, 5.0 > > Attachments: LUCENE-5543.patch > > > Since 3.0.x/3.6.x (see LUCENE-5541), Lucene has substantially removed > its reliance on fileExists to the point where I think we can fully > remove it now. > Like the other iffy IO methods we've removed over time (touchFile, > fileModified, seeking back during write, ...), File.exists is > dangerous because a low level IO issue can cause it to return false > when it should have returned true. The fewer IO operations we rely on > the more reliable/portable Lucene is. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5543) Remove Directory.fileExists
Michael McCandless created LUCENE-5543: -- Summary: Remove Directory.fileExists Key: LUCENE-5543 URL: https://issues.apache.org/jira/browse/LUCENE-5543 Project: Lucene - Core Issue Type: Improvement Components: core/store Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.8, 5.0 Since 3.0.x/3.6.x (see LUCENE-5541), Lucene has substantially removed its reliance on fileExists to the point where I think we can fully remove it now. Like the other iffy IO methods we've removed over time (touchFile, fileModified, seeking back during write, ...), File.exists is dangerous because a low level IO issue can cause it to return false when it should have returned true. The fewer IO operations we rely on the more reliable/portable Lucene is. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4984) Fix ThaiWordFilter
[ https://issues.apache.org/jira/browse/LUCENE-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942382#comment-13942382 ] Simon Willnauer commented on LUCENE-4984: - I really like the base class! The patch LGTM +1 to commit > Fix ThaiWordFilter > -- > > Key: LUCENE-4984 > URL: https://issues.apache.org/jira/browse/LUCENE-4984 > Project: Lucene - Core > Issue Type: Bug >Reporter: Adrien Grand >Assignee: Adrien Grand > Attachments: LUCENE-4984.patch, LUCENE-4984.patch, LUCENE-4984.patch > > > ThaiWordFilter is an offender in TestRandomChains because it creates > positions and updates offsets. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5489) Add query rescoring API
[ https://issues.apache.org/jira/browse/LUCENE-5489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942377#comment-13942377 ] Simon Willnauer commented on LUCENE-5489: - yeah it can wait I guess - please go ahead and put a TODO > Add query rescoring API > --- > > Key: LUCENE-5489 > URL: https://issues.apache.org/jira/browse/LUCENE-5489 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 4.8, 5.0 > > Attachments: LUCENE-5489.patch, LUCENE-5489.patch, LUCENE-5489.patch > > > When costly scoring factors are used during searching, a common > approach is to do a cheaper / basic query first, collect the top few > hundred hits, and then rescore those hits using the more costly > query. > It's not clear/simple to do this with Lucene today; I think we should > make it easier. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5489) Add query rescoring API
[ https://issues.apache.org/jira/browse/LUCENE-5489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942369#comment-13942369 ] Michael McCandless commented on LUCENE-5489: bq. I also think this method should only be on QueryRescorer and not in the interface? Woops, right, I'll move it. {quote} I also wonder why you extract the IDs and Scores, I think you should clone the scoreDocs array and sort that first. Then you can just sort the rescored scoreDocs array and simply merge the scores. Once you are done you resort the previously cloned array and we don't need to do all the auto boxing in that hashmap and it's the same sorting we already do? {quote} I think this can wait? It's just an optimization (making the code more hairy but a bit faster). I'll put a TODO... > Add query rescoring API > --- > > Key: LUCENE-5489 > URL: https://issues.apache.org/jira/browse/LUCENE-5489 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 4.8, 5.0 > > Attachments: LUCENE-5489.patch, LUCENE-5489.patch, LUCENE-5489.patch > > > When costly scoring factors are used during searching, a common > approach is to do a cheaper / basic query first, collect the top few > hundred hits, and then rescore those hits using the more costly > query. > It's not clear/simple to do this with Lucene today; I think we should > make it easier. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5489) Add query rescoring API
[ https://issues.apache.org/jira/browse/LUCENE-5489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942364#comment-13942364 ] Michael McCandless commented on LUCENE-5489: bq. I guess we should really just pass a boolean to make thinks clear. I'll switch to a boolean; I agree the sig is weird now. > Add query rescoring API > --- > > Key: LUCENE-5489 > URL: https://issues.apache.org/jira/browse/LUCENE-5489 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 4.8, 5.0 > > Attachments: LUCENE-5489.patch, LUCENE-5489.patch, LUCENE-5489.patch > > > When costly scoring factors are used during searching, a common > approach is to do a cheaper / basic query first, collect the top few > hundred hits, and then rescore those hits using the more costly > query. > It's not clear/simple to do this with Lucene today; I think we should > make it easier. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5865) Provide a MiniSolrCloudCluster to enable easier testing
[ https://issues.apache.org/jira/browse/SOLR-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Chanan updated SOLR-5865: - Attachment: SOLR-5865addendum2.patch Here's a patch that depends on LuceneTestCase. As I mentioned above, I haven't run the whole suite with this. > Provide a MiniSolrCloudCluster to enable easier testing > --- > > Key: SOLR-5865 > URL: https://issues.apache.org/jira/browse/SOLR-5865 > Project: Solr > Issue Type: Improvement > Components: SolrCloud >Affects Versions: 4.7, 5.0 >Reporter: Gregory Chanan >Assignee: Mark Miller > Attachments: SOLR-5865.patch, SOLR-5865.patch, > SOLR-5865addendum.patch, SOLR-5865addendum2.patch > > > Today, the SolrCloud tests are based on the LuceneTestCase class hierarchy, > which has a couple of issues around support for downstream projects: > - It's difficult to test SolrCloud support in a downstream project that may > have its own test framework. For example, some projects have support for > different storage backends (e.g. Solr/ElasticSearch/HBase) and want tests > against each of the different backends. This is difficult to do cleanly, > because the Solr tests require derivation from LuceneTestCase, while the > other don't > - The LuceneTestCase class hierarchy is really designed for internal solr > tests (e.g. it randomizes a lot of parameters to get test coverage, but a > downstream project probably doesn't care about that). It's also quite > complicated and dense, much more so than a downstream project would want. > Given these reasons, it would be nice to provide a simple > "MiniSolrCloudCluster", similar to how HDFS provides a MiniHdfsCluster or > HBase provides a MiniHBaseCluster. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5865) Provide a MiniSolrCloudCluster to enable easier testing
[ https://issues.apache.org/jira/browse/SOLR-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942330#comment-13942330 ] Gregory Chanan commented on SOLR-5865: -- Hmm, at this point it may make more sense to not try to get the test to work outside of the test hierarchy completely. We could try to recreate the minimum set of what we need there (SystemPropertiesRestoreRules and ThreadLeakScopes) but that may change in the test hierarchy itself, requiring updates to just this test. The important thing, I think, is that we don't require the complete SolrCloud test hierarchy, e.g. AbstractFullDistribZkTestBase and the like. The question, then, is whether we rely on LuceneTestCase or SolrTestCaseJ4. LuceneTestCase is arguably better, because we know we don't rely on anything solr-specific for the test, although the downside is we may have to update it to keep in sync with SolrTestCaseJ4. I don't have a strong preference either way. I messed around with that a little bit and I have a patch that seems to work with just LuceneTestCase -- I had to import a couple of rules from SolrTestCaseJ4, but not much. I haven't run the full suite though, so I'm not 100% sure it's kosher. > Provide a MiniSolrCloudCluster to enable easier testing > --- > > Key: SOLR-5865 > URL: https://issues.apache.org/jira/browse/SOLR-5865 > Project: Solr > Issue Type: Improvement > Components: SolrCloud >Affects Versions: 4.7, 5.0 >Reporter: Gregory Chanan >Assignee: Mark Miller > Attachments: SOLR-5865.patch, SOLR-5865.patch, SOLR-5865addendum.patch > > > Today, the SolrCloud tests are based on the LuceneTestCase class hierarchy, > which has a couple of issues around support for downstream projects: > - It's difficult to test SolrCloud support in a downstream project that may > have its own test framework. For example, some projects have support for > different storage backends (e.g. 
Solr/ElasticSearch/HBase) and want tests > against each of the different backends. This is difficult to do cleanly, > because the Solr tests require derivation from LuceneTestCase, while the > other don't > - The LuceneTestCase class hierarchy is really designed for internal solr > tests (e.g. it randomizes a lot of parameters to get test coverage, but a > downstream project probably doesn't care about that). It's also quite > complicated and dense, much more so than a downstream project would want. > Given these reasons, it would be nice to provide a simple > "MiniSolrCloudCluster", similar to how HDFS provides a MiniHdfsCluster or > HBase provides a MiniHBaseCluster. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5542) Explore making DVConsumer sparse-aware
Shai Erera created LUCENE-5542: -- Summary: Explore making DVConsumer sparse-aware Key: LUCENE-5542 URL: https://issues.apache.org/jira/browse/LUCENE-5542 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Shai Erera Today the DVConsumer API requires the caller to pass a value for every document, where {{null}} means "this doc has no value". The Codec can then choose how to encode the values, i.e. whether it encodes a 0 for a numeric field, or encodes the sparse docs. In practice, from what I see, we choose to encode the 0s. I wonder if we e.g. added an {{Iterable}} to DVConsumer.addXYZField(), if that would make a better API. The caller only passes <docID,value> pairs and it's up to the Codec to decide how it wants to encode the missing values. Like, if a user's app truly has a sparse NDV, IndexWriter doesn't need to "fill the gaps" artificially. It's the job of the Codec. To be clear, I don't propose to change any Codec implementation in this issue (w.r.t. sparse encoding - yes/no), only change the API to reflect that sparseness. I think that if we'll ever want to encode sparse values, it will be a more convenient API. Thoughts? I volunteer to do this work, but want to get others' opinion before I start. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5489) Add query rescoring API
[ https://issues.apache.org/jira/browse/LUCENE-5489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942278#comment-13942278 ] Simon Willnauer commented on LUCENE-5489: - oh I see, the Float was to mark a match / non-match... I guess we should really just pass a boolean to make things clear. > Add query rescoring API > --- > > Key: LUCENE-5489 > URL: https://issues.apache.org/jira/browse/LUCENE-5489 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 4.8, 5.0 > > Attachments: LUCENE-5489.patch, LUCENE-5489.patch, LUCENE-5489.patch > > > When costly scoring factors are used during searching, a common > approach is to do a cheaper / basic query first, collect the top few > hundred hits, and then rescore those hits using the more costly > query. > It's not clear/simple to do this with Lucene today; I think we should > make it easier. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5489) Add query rescoring API
[ https://issues.apache.org/jira/browse/LUCENE-5489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942272#comment-13942272 ] Simon Willnauer commented on LUCENE-5489: - hey mike, thanks for the new patch I think you overlooked that one but the signature looks funky: {code} protected abstract float combine(float firstPassScore, Float secondPassScore); {code} I guess we can use the primitive for both args? I also think this method should only be on QueryRescorer and not in the interface? I also wonder why you extract the IDs and Scores, I think you should clone the scoreDocs array and sort that first. Then you can just sort the rescored scoreDocs array and simply merge the scores. Once you are done you resort the previously cloned array and we don't need to do all the auto boxing in that hashmap and it's the same sorting we already do? > Add query rescoring API > --- > > Key: LUCENE-5489 > URL: https://issues.apache.org/jira/browse/LUCENE-5489 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 4.8, 5.0 > > Attachments: LUCENE-5489.patch, LUCENE-5489.patch, LUCENE-5489.patch > > > When costly scoring factors are used during searching, a common > approach is to do a cheaper / basic query first, collect the top few > hundred hits, and then rescore those hits using the more costly > query. > It's not clear/simple to do this with Lucene today; I think we should > make it easier. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
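Simon's suggested optimization — avoid the boxed HashMap by sorting copies of the hits by docID and merging in lockstep, then re-sorting by score — might look roughly like this in isolation. This is a sketch with a stand-in ScoreDoc class, not the committed LUCENE-5489 patch:

```java
import java.util.*;

/** Toy sketch of the merge-instead-of-HashMap idea from the review comments. */
public class RescoreMergeDemo {
    static class ScoreDoc {
        final int doc;
        float score;
        ScoreDoc(int doc, float score) { this.doc = doc; this.score = score; }
    }

    /** Merge second-pass scores into a docID-sorted copy of the first-pass hits,
     *  then re-sort by score -- no Integer/Float boxing, same sorts as before. */
    static ScoreDoc[] rescore(ScoreDoc[] firstPass, ScoreDoc[] secondPass) {
        ScoreDoc[] merged = new ScoreDoc[firstPass.length];
        for (int i = 0; i < firstPass.length; i++) {
            merged[i] = new ScoreDoc(firstPass[i].doc, firstPass[i].score);
        }
        Arrays.sort(merged, Comparator.comparingInt((ScoreDoc sd) -> sd.doc));
        ScoreDoc[] second = secondPass.clone();
        Arrays.sort(second, Comparator.comparingInt((ScoreDoc sd) -> sd.doc));
        int j = 0;
        for (ScoreDoc sd : merged) {
            while (j < second.length && second[j].doc < sd.doc) j++;
            if (j < second.length && second[j].doc == sd.doc) {
                sd.score += second[j].score;  // stand-in for combine(first, true, second)
            }                                 // else: no second-pass match; keep first score
        }
        Arrays.sort(merged, (a, b) -> Float.compare(b.score, a.score));
        return merged;
    }
}
```

The addition on the matching branch is just a placeholder for whatever the abstract combine() would compute; the point of the sketch is the two docID sorts and the single lockstep walk.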
[jira] [Created] (SOLR-5892) Document asynchronous OCP and CoreAdmin calls
Anshum Gupta created SOLR-5892: -- Summary: Document asynchronous OCP and CoreAdmin calls Key: SOLR-5892 URL: https://issues.apache.org/jira/browse/SOLR-5892 Project: Solr Issue Type: Task Components: documentation Reporter: Anshum Gupta Assignee: Anshum Gupta Document the feature committed via SOLR-5477. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Analyzing primitive types, why can't we do this in Solr?
And one of my co-workers reminded me of what can be used in Solr right now to accomplish this, so the problem is moot. RegexReplaceProcessorFactory Gets inserted into Solr's update process before it gets fed to the field, so any transformations one wants to do can be done there. I guess the only thing I can't do would be to feed in multiple values to the primitive type field, but that's no big deal I suppose for the special case of dates we could create a DateFormatProcessorFactory that just took a list of standard Java SimpleDateFormat strings and applied the first one that fit. On Thu, Mar 20, 2014 at 11:57 AM, Erick Erickson wrote: > Uwe: > > Thanks! I peeked at the code briefly and I see that it would > be hard. > > Figured there was a good reason. > > What about a higher-level approach? I'm thinking a thin > wrapper for Solr that would apply the analysis chains and feed > the results into the native Lucene primitive processing. Seems > kind of kludgy, I'm mostly wondering if it's conceptually > possible/reasonable. > > Frankly, I'm not convinced there's enough all for something like > this to justify the work/complexification though. > > Erick > > On Thu, Mar 20, 2014 at 11:17 AM, Uwe Schindler wrote: >> Hi Erick, >> >> The numerics are in fact "analyzed". The data is read using a Tokenizer that >> works on top of oal.analysis.NumericTokenStream from Lucene. This one >> produces the tokens from the numerical value given as native data type to >> the TokenStream. Those are indexed (in fact, it is binary data in different >> precisions according to the precision step). >> Additional analysis on top of that is not easy possible, because the >> Tokenizer does all the work, there is no way to inject a TokenFilter. >> Theoretically, there would only be the possibility to add a CharFilter >> before the numeric tokenizer. But the field type does not allow to do that >> at the moment, because the "analysis" is hardcoded in the field type. 
>> >> >> Uwe >> - >> Uwe Schindler >> H.-H.-Meier-Allee 63, D-28213 Bremen >> http://www.thetaphi.de >> eMail: u...@thetaphi.de >> >> >>> -Original Message- >>> From: Erick Erickson [mailto:erickerick...@gmail.com] >>> Sent: Thursday, March 20, 2014 6:52 PM >>> To: dev@lucene.apache.org >>> Subject: Analyzing primitive types, why can't we do this in Solr? >>> >>> It's bugged me for a while that we can't define any analysis on primitive >>> types. This is especially acute with date types, we require a very exact >>> format >>> and have to tell people "transform it correctly on the ingestion side", or >>> "create an custom update processor that transforms it". >>> >>> I thought I remembered something about being able to do this, but can't find >>> it. I suspect I was confusing it with DIH. >>> >>> What's the reason for primitive types being unanalyzed? Just "it's always >>> been that way", or "it would lead to a very sticky wicket we never wanted to >>> get stuck in"? Both are perfectly valid, I'm just sayin'. >>> >>> I realize this would provide some "interesting" output. Say you defined a >>> regex for an int type that removed all non-numerics. If the input was >>> "30asdf" and it was transformed correctly into 30 for the underlying int >>> field, >>> it would still come back as 30asdf from the stored data, but that's true >>> about >>> all analysis steps. >>> >>> Or perhaps you'd like to have a string of integers as input to a >>> multiValued int >>> field. Or >>> >>> Musings sparked by seeing this crop up again in another context. >>> >>> Erick >>> >>> - >>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional >>> commands, e-mail: dev-h...@lucene.apache.org >> >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: dev-h...@lucene.apache.org >> - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
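The hypothetical DateFormatProcessorFactory floated above — take a list of standard Java SimpleDateFormat strings and apply the first one that fits — reduces to a few lines of plain Java. The class and method names below are invented for illustration; no such factory exists in Solr:

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.List;
import java.util.TimeZone;

/** Sketch of the "first SimpleDateFormat that fits" idea from the thread. */
public class MultiFormatDateParser {
    /** Try each pattern in order; return the parsed Date from the first pattern
     *  that consumes the whole input, or null if none fits. */
    static Date parse(String raw, List<String> patterns) {
        for (String pattern : patterns) {
            SimpleDateFormat fmt = new SimpleDateFormat(pattern);
            fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
            fmt.setLenient(false);                     // reject sloppy matches
            ParsePosition pos = new ParsePosition(0);
            Date d = fmt.parse(raw, pos);
            if (d != null && pos.getIndex() == raw.length()) {
                return d;                              // first pattern that fits wins
            }
        }
        return null;                                   // no pattern matched
    }
}
```

An update processor built on this would normalize the winning Date back into Solr's canonical format before the value reaches the field.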
[jira] [Updated] (LUCENE-5541) FileExistsCachingDirectory, to work around unreliable File.exists
[ https://issues.apache.org/jira/browse/LUCENE-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-5541: --- Attachment: LUCENE-5541.patch Patch with two classes: * FileExistsCachingDirectory, to work around File.exists unreliability * FixCFS to re-insert a missing CFS sub-file if you hit this corruption > FileExistsCachingDirectory, to work around unreliable File.exists > - > > Key: LUCENE-5541 > URL: https://issues.apache.org/jira/browse/LUCENE-5541 > Project: Lucene - Core > Issue Type: Bug > Components: core/store >Reporter: Michael McCandless > Attachments: LUCENE-5541.patch > > > File.exists is a dangerous method in Java, because if there is a > low-level IOException (permission denied, out of file handles, etc.) > the method can return false when it should return true. > Fortunately, as of Lucene 4.x, we rely much less on File.exists, > because we track which files the codec components created, and we know > those files then exist. > But, unfortunately, going from 3.0.x to 3.6.x, we increased our > reliance on File.exists, e.g. when creating CFS we check File.exists > on each sub-file before trying to add it, and I have a customer > corruption case where apparently a transient low level IOE caused > File.exists to incorrectly return false for one of the sub-files. 
It > results in corruption like this: > {noformat} > java.io.FileNotFoundException: No sub-file with id .fnm found > (fileName=_1u7.cfs files: [.tis, .tii, .frq, .prx, .fdt, .nrm, .fdx]) > > org.apache.lucene.index.CompoundFileReader.openInput(CompoundFileReader.java:157) > > org.apache.lucene.index.CompoundFileReader.openInput(CompoundFileReader.java:146) > org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:71) > org.apache.lucene.index.IndexWriter.getFieldInfos(IndexWriter.java:1212) > > org.apache.lucene.index.IndexWriter.getCurrentFieldInfos(IndexWriter.java:1228) > org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1161) > {noformat} > I think typically local file systems don't often hit such low level > errors, but if you have an index on a remote filesystem, where network > hiccups can cause problems, it's more likely. > As a simple workaround, I created a basic Directory delegator that > holds a Set of all created but not deleted files, and short-circuits > fileExists to return true if the file is in that set. > I don't plan to commit this: we aren't doing bug-fix releases on > 3.6.x anymore (it's very old by now), and this problem is already > "fixed" in 4.x (by reducing our reliance on File.exists), but I wanted > to post the code here in case others hit it. It looks like it was hit > e.g. https://netbeans.org/bugzilla/show_bug.cgi?id=189571 and > https://issues.jboss.org/browse/ISPN-2981 -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
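The Set-based workaround the issue describes can be sketched against a stub interface. The real class wraps Lucene's Directory; `Dir` and the other names below are stand-ins invented for this sketch:

```java
import java.util.*;

/** Toy sketch of the workaround (not the attached patch): track created-but-not-
 *  deleted file names and short-circuit fileExists, so a transient low-level IO
 *  failure in the delegate cannot make a known-live file look missing. */
public class FileExistsCachingDirectoryDemo {
    interface Dir {                        // stand-in for Lucene's Directory
        void createOutput(String name);
        void deleteFile(String name);
        boolean fileExists(String name);   // may spuriously return false
    }

    static class CachingDir implements Dir {
        private final Dir delegate;
        private final Set<String> created = new HashSet<>();
        CachingDir(Dir delegate) { this.delegate = delegate; }
        public void createOutput(String name) { delegate.createOutput(name); created.add(name); }
        public void deleteFile(String name)  { delegate.deleteFile(name);  created.remove(name); }
        public boolean fileExists(String name) {
            // Files we created and never deleted must exist, whatever the delegate says.
            return created.contains(name) || delegate.fileExists(name);
        }
    }
}
```

This only protects files created through the wrapper within the current session, which is exactly the window the CFS-building corruption case needs.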
[jira] [Commented] (LUCENE-4890) QueryTreeBuilder.getBuilder() only finds interfaces on the most derived class
[ https://issues.apache.org/jira/browse/LUCENE-4890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942162#comment-13942162 ] ASF subversion and git services commented on LUCENE-4890: - Commit 1579717 from [~mikemccand] in branch 'dev/branches/lucene_solr_3_6' [ https://svn.apache.org/r1579717 ] LUCENE-4890: get this test passing again > QueryTreeBuilder.getBuilder() only finds interfaces on the most derived class > - > > Key: LUCENE-4890 > URL: https://issues.apache.org/jira/browse/LUCENE-4890 > Project: Lucene - Core > Issue Type: Bug > Components: core/queryparser >Affects Versions: 2.9, 2.9.1, 2.9.2, 2.9.3, 2.9.4, 3.0, 3.0.1, 3.0.2, > 3.0.3, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.6.1, 3.6.2 > Environment: Lucene 3.3.0 on Win32 >Reporter: Philip Searle >Assignee: Adriano Crestani >Priority: Minor > Fix For: 3.6.3, 4.4, 5.0 > > Attachments: LUCENE-4890_2013_05_25.patch > > > QueryBuilder implementations registered with QueryTreeBuilder.setBuilder() > are not recognized by QueryTreeBuilder.getBuilder() if they are registered > for an interface implemented by a superclass. Registering them for a concrete > query node class or an interface implemented by the most-derived class do > work. 
> {code:title=example.java|borderStyle=solid} > /* Our custom query builder */ > class CustomQueryTreeBuilder extends QueryTreeBuilder { > public CustomQueryTreeBuilder() { > /* Turn field:"value" into an application-specific object */ > setBuilder(FieldQueryNode.class, new QueryBuilder() { > @Override > public Object build(QueryNode queryNode) { > FieldQueryNode node = (FieldQueryNode) queryNode; > return new ApplicationSpecificClass(node.getFieldAsString()); > } > }); > /* Ignore all other query node types */ > setBuilder(QueryNode.class, new QueryBuilder() { > @Override > public Object build(QueryNode queryNode) { > return null; > } > }); > } > } > /* Assume this is in the main program: */ > StandardQueryParser queryParser = new StandardQueryParser(); > queryParser.setQueryBuilder(new CustomQueryTreeBuilder()); > /* The following line will throw an exception because it can't find a builder > for BooleanQueryNode.class */ > Object queryObject = queryParser.parse("field:\"value\" field2:\"value2\"", > "field"); > {code} -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
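The fix implied by the summary — check interfaces on every class in the hierarchy, not only the most derived one — can be modeled with a toy registry. The node classes below are stand-ins for illustration, not Lucene's actual query node types:

```java
import java.util.*;

/** Toy model of the lookup described in LUCENE-4890: resolve a registered
 *  builder by checking, for EACH class in the hierarchy, both the class
 *  itself and the interfaces it directly implements. */
public class BuilderLookupDemo {
    interface QueryNode {}
    interface FieldableNode extends QueryNode {}
    static class FieldQueryNode implements FieldableNode {}
    static class QuotedFieldQueryNode extends FieldQueryNode {}  // declares no interfaces itself

    static final Map<Class<?>, String> builders = new HashMap<>();

    static String getBuilder(Class<?> clazz) {
        for (Class<?> c = clazz; c != null; c = c.getSuperclass()) {
            if (builders.containsKey(c)) return builders.get(c);
            for (Class<?> itf : c.getInterfaces()) {       // interfaces at EVERY level,
                if (builders.containsKey(itf)) {           // not just on the leaf class
                    return builders.get(itf);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        builders.put(FieldableNode.class, "fieldBuilder");
        // QuotedFieldQueryNode implements FieldableNode only via its superclass:
        System.out.println(getBuilder(QuotedFieldQueryNode.class)); // fieldBuilder
    }
}
```

The buggy behavior the reporter hit corresponds to running the inner interface loop only for `clazz` itself, so a builder registered on an interface of a superclass is never found.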
Re: Analyzing primitive types, why can't we do this in Solr?
Uwe: Thanks! I peeked at the code briefly and I see that it would be hard. Figured there was a good reason. What about a higher-level approach? I'm thinking a thin wrapper for Solr that would apply the analysis chains and feed the results into the native Lucene primitive processing. Seems kind of kludgy, I'm mostly wondering if it's conceptually possible/reasonable. Frankly, I'm not convinced there's enough call for something like this to justify the work/complexification though. Erick On Thu, Mar 20, 2014 at 11:17 AM, Uwe Schindler wrote: > Hi Erick, > > The numerics are in fact "analyzed". The data is read using a Tokenizer that > works on top of oal.analysis.NumericTokenStream from Lucene. This one > produces the tokens from the numerical value given as native data type to the > TokenStream. Those are indexed (in fact, it is binary data in different > precisions according to the precision step). > Additional analysis on top of that is not easy possible, because the > Tokenizer does all the work, there is no way to inject a TokenFilter. > Theoretically, there would only be the possibility to add a CharFilter before > the numeric tokenizer. But the field type does not allow to do that at the > moment, because the "analysis" is hardcoded in the field type. > > > Uwe > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > >> -Original Message- >> From: Erick Erickson [mailto:erickerick...@gmail.com] >> Sent: Thursday, March 20, 2014 6:52 PM >> To: dev@lucene.apache.org >> Subject: Analyzing primitive types, why can't we do this in Solr? >> >> It's bugged me for a while that we can't define any analysis on primitive >> types. This is especially acute with date types, we require a very exact >> format >> and have to tell people "transform it correctly on the ingestion side", or >> "create an custom update processor that transforms it". 
>> >> I thought I remembered something about being able to do this, but can't find >> it. I suspect I was confusing it with DIH. >> >> What's the reason for primitive types being unanalyzed? Just "it's always >> been that way", or "it would lead to a very sticky wicket we never wanted to >> get stuck in"? Both are perfectly valid, I'm just sayin'. >> >> I realize this would provide some "interesting" output. Say you defined a >> regex for an int type that removed all non-numerics. If the input was >> "30asdf" and it was transformed correctly into 30 for the underlying int >> field, >> it would still come back as 30asdf from the stored data, but that's true >> about >> all analysis steps. >> >> Or perhaps you'd like to have a string of integers as input to a multiValued >> int >> field. Or >> >> Musings sparked by seeing this crop up again in another context. >> >> Erick >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional >> commands, e-mail: dev-h...@lucene.apache.org > > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-5891) Problems installing Apache Solr with Apache Tomcat
[ https://issues.apache.org/jira/browse/SOLR-5891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomás Fernández Löbbe resolved SOLR-5891. - Resolution: Invalid Looks like a configuration issue. You are trying to set the solr home to /opt/solr but apparently Solr is reading from /root/solr-4.7.0/example/solr (maybe you set that somewhere else?). Anyway, you should raise this question in the users list: https://lucene.apache.org/solr/discussion.html > Problems installing Apache Solr with Apache Tomcat > -- > > Key: SOLR-5891 > URL: https://issues.apache.org/jira/browse/SOLR-5891 > Project: Solr > Issue Type: Bug > Components: clients - java, SolrCloud >Affects Versions: 4.7 > Environment: Centos 6.5 Installing Apache Tomcat 7.0.52 with Apache > Solr 4.7.0 >Reporter: Dean Zambrano > Labels: build, newbie > Fix For: 4.7 > > Original Estimate: 26h > Remaining Estimate: 26h > > I installed Apache Solr, version 4.7.0. As part of the install, I performed > the following: > (Based on these instructions: > http://sachkadam.wordpress.com/2013/04/29/solr-installation-on-centos-6/) -> > The summary is as follows: > - I moved the "/example/solr" directory to /opt/solr. 
> - I created a "solr.xml" file with contains the following xml code: > # more solr.xml > > > override=”true”/> > > The "solr.xml" file is located in: > /usr/share/apache-tomcat-7.0.52/conf/Catalina/localhost > **When I try to access solr through the following URL: > http://107.170.94.202:8983, I receive the following error: > {msg=SolrCore 'collection1' is not available due to init failure: Could not > load config file > /root/solr-4.7.0/example/solr/collection1/solrconfig.xml,trace=org.apache.solr.common.SolrException: > SolrCore 'collection1' is not available due to init failure: Could not load > config file /root/solr-4.7.0/example/solr/collection1/solrconfig.xml > at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:827) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:317) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) > at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) > at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) > at > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) > at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) > at > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) > at > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) > at > 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) > at org.eclipse.jetty.server.Server.handle(Server.java:368) > at > org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) > at > org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) > at > org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942) > at > org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004) > at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640) > at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) > at > org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) > at > org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) > at > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) > at > org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) > at java.lang.Thread.run(Thread.java:744) > Caused by: org.apache.solr.common.SolrException: Could not load config file > /root/solr-4.7.0/example/solr/collection1/solrconfig.xml > at > org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:530) > at org.apache.s
[jira] [Commented] (SOLR-5890) Delete silently fails if not sent to shard where document was added
[ https://issues.apache.org/jira/browse/SOLR-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942114#comment-13942114 ] Brett Hoerner commented on SOLR-5890: - I believe I have the same issue (using implicit also). Is there any way for me as the user to send the equivalent of "_route_" with a delete by ID? I have enough information to target the right shard, I'm just not sure how to "tell" it that. > Delete silently fails if not sent to shard where document was added > --- > > Key: SOLR-5890 > URL: https://issues.apache.org/jira/browse/SOLR-5890 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 4.7 > Environment: Debian 7.4. >Reporter: Peter Inglesby > Fix For: 4.8, 5.0, 4.7.1 > > > We have SolrCloud set up with two shards, each with a leader and a replica. > We use haproxy to distribute requests between the four nodes. > Regardless of which node we send an add request to, following a commit, the > newly-added document is returned in a search, as expected. > However, we can only delete a document if the delete request is sent to a > node in the shard where the document was added. If we send the delete > request to a node in the other shard (and then send a commit) the document is > not deleted. Such a delete request will get a 200 response, with the > following body: > {'responseHeader'=>{'status'=>0,'QTime'=>7}} > Apart from the very low QTime, this is indistinguishable from a > successful delete. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
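The silent failure above is a routing problem: with hash-based (compositeId-style) routing, any node can compute the target shard from the document id alone and forward the delete, but with the implicit router the shard was chosen externally at index time, so the id carries no shard information. A toy simulation of that distinction (not Solr code; shard names and the hash function are made up for illustration):

```java
import java.util.Arrays;
import java.util.List;

public class RoutingSketch {
    static final List<String> SHARDS = Arrays.asList("shard1", "shard2");

    // Hash-style routing: the target shard is a pure function of the id,
    // so any node that receives a delete-by-id can forward it correctly.
    static String hashRoute(String id) {
        return SHARDS.get(Math.floorMod(id.hashCode(), SHARDS.size()));
    }

    // Implicit routing: the shard was chosen by the client at index time;
    // from the id alone the receiving node cannot recover it. Without some
    // client-supplied route information, the delete cannot be targeted.
    static String implicitRoute(String id, String routeParam) {
        return routeParam;
    }

    public static void main(String[] args) {
        String id = "doc42";
        // Same shard every time, derivable on any node:
        System.out.println(hashRoute(id));
        // null: nothing to forward to, the delete is silently dropped
        System.out.println(implicitRoute(id, null));
    }
}
```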
[jira] [Commented] (SOLR-5852) Add CloudSolrServer helper method to connect to a ZK ensemble
[ https://issues.apache.org/jira/browse/SOLR-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942112#comment-13942112 ] Shawn Heisey commented on SOLR-5852: The inclusion of my patch should not be taken as an endorsement on this issue. I had these ideas floating around in my head that wanted to be put into actual code, so I acquiesced and wrote it. I don't believe we need this at all. If there's consensus that disagrees, then I think it requires the robustness that I put into my patch. > Add CloudSolrServer helper method to connect to a ZK ensemble > - > > Key: SOLR-5852 > URL: https://issues.apache.org/jira/browse/SOLR-5852 > Project: Solr > Issue Type: Improvement >Reporter: Varun Thacker > Attachments: SOLR-5852-SH.patch, SOLR-5852-SH.patch, SOLR-5852.patch, > SOLR-5852_FK.patch, SOLR-5852_FK.patch > > > We should have a CloudSolrServer constructor which takes a list of ZK servers > to connect to. > Something Like > {noformat} > public CloudSolrServer(String... zkHost); > {noformat} > - Document the current constructor better to mention that to connect to a ZK > ensemble you can pass a comma-delimited list of ZK servers like > zk1:2181,zk2:2181,zk3:2181 > - Thirdly should getLbServer() and getZKStatereader() be public? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators
[ https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942091#comment-13942091 ] Andrew Buchanan commented on SOLR-2649: --- Ping for Jan Høydahl to review > MM ignored in edismax queries with operators > > > Key: SOLR-2649 > URL: https://issues.apache.org/jira/browse/SOLR-2649 > Project: Solr > Issue Type: Bug > Components: query parsers >Reporter: Magnus Bergmark >Priority: Minor > Fix For: 4.8 > > Attachments: SOLR-2649.diff > > > Hypothetical scenario: > 1. User searches for "stocks oil gold" with MM set to "50%" > 2. User adds "-stockings" to the query: "stocks oil gold -stockings" > 3. User gets no hits since MM was ignored and all terms were AND-ed > together > The behavior seems to be intentional, although the reason why is never > explained: > // For correct lucene queries, turn off mm processing if there > // were explicit operators (except for AND). > boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0; > (lines 232-234 taken from > tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java) > This makes edismax unsuitable as a replacement for dismax; mm is one of the > primary features of dismax. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
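The quoted condition can be restated as a tiny predicate: mm is applied only when the query contains no explicit OR/NOT/+/- operators. The following self-contained sketch (a simplified whitespace tokenizer, not the real edismax parser) reproduces the scenario from the issue description:

```java
public class MinShouldMatchSketch {
    // Mirrors the quoted condition: mm processing is disabled as soon as
    // the query contains any explicit operator other than AND.
    static boolean doMinMatched(int numOR, int numNOT, int numPluses, int numMinuses) {
        return (numOR + numNOT + numPluses + numMinuses) == 0;
    }

    // Naive operator count over whitespace-separated tokens.
    static int[] countOperators(String q) {
        int numOR = 0, numNOT = 0, numPluses = 0, numMinuses = 0;
        for (String tok : q.split("\\s+")) {
            if (tok.equals("OR")) numOR++;
            else if (tok.equals("NOT")) numNOT++;
            else if (tok.startsWith("+")) numPluses++;
            else if (tok.startsWith("-")) numMinuses++;
        }
        return new int[] {numOR, numNOT, numPluses, numMinuses};
    }

    public static void main(String[] args) {
        int[] a = countOperators("stocks oil gold");
        int[] b = countOperators("stocks oil gold -stockings");
        System.out.println(doMinMatched(a[0], a[1], a[2], a[3])); // true: mm=50% honored
        System.out.println(doMinMatched(b[0], b[1], b[2], b[3])); // false: mm silently dropped
    }
}
```

Adding the single "-stockings" clause flips the predicate, which is exactly why step 3 in the scenario returns no hits.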
RE: Analyzing primitive types, why can't we do this in Solr?
Hi Erick, The numerics are in fact "analyzed". The data is read using a Tokenizer that works on top of oal.analysis.NumericTokenStream from Lucene. This one produces the tokens from the numerical value given as native data type to the TokenStream. Those are indexed (in fact, it is binary data in different precisions according to the precision step). Additional analysis on top of that is not easily possible, because the Tokenizer does all the work, there is no way to inject a TokenFilter. Theoretically, there would only be the possibility to add a CharFilter before the numeric tokenizer. But the field type does not allow to do that at the moment, because the "analysis" is hardcoded in the field type. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Erick Erickson [mailto:erickerick...@gmail.com] > Sent: Thursday, March 20, 2014 6:52 PM > To: dev@lucene.apache.org > Subject: Analyzing primitive types, why can't we do this in Solr? > > It's bugged me for a while that we can't define any analysis on primitive > types. This is especially acute with date types, we require a very exact > format > and have to tell people "transform it correctly on the ingestion side", or > "create an custom update processor that transforms it". > > I thought I remembered something about being able to do this, but can't find > it. I suspect I was confusing it with DIH. > > What's the reason for primitive types being unanalyzed? Just "it's always > been that way", or "it would lead to a very sticky wicket we never wanted to > get stuck in"? Both are perfectly valid, I'm just sayin'. > > I realize this would provide some "interesting" output. Say you defined a > regex for an int type that removed all non-numerics. 
If the input was > "30asdf" and it was transformed correctly into 30 for the underlying int > field, > it would still come back as 30asdf from the stored data, but that's true about > all analysis steps. > > Or perhaps you'd like to have a string of integers as input to a multiValued > int > field. Or > > Musings sparked by seeing this crop up again in another context. > > Erick > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional > commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
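A simplified model of what Uwe describes: the numeric tokenizer emits one term per precision level, each a bit-masked prefix of the value, so there is no text stream mid-pipeline for a TokenFilter to hook into. The sketch below (plain Java, not the actual NumericTokenStream, whose terms are byte-encoded rather than longs) shows the trie-style term generation for a 64-bit value:

```java
import java.util.ArrayList;
import java.util.List;

public class NumericTrieSketch {
    // Produces the trie "prefix terms" a NumericTokenStream-style tokenizer
    // would emit for a 64-bit value: one term per precision level, with the
    // low `shift` bits masked away.
    static List<Long> prefixTerms(long value, int precisionStep) {
        List<Long> terms = new ArrayList<>();
        for (int shift = 0; shift < 64; shift += precisionStep) {
            terms.add((value >>> shift) << shift); // zero out the low bits
        }
        return terms;
    }

    public static void main(String[] args) {
        // precisionStep=16 -> terms at shifts 0, 16, 32, 48
        List<Long> terms = prefixTerms(1234567890123L, 16);
        System.out.println(terms.size());  // 4
        System.out.println(terms.get(0));  // full-precision term equals the value
    }
}
```

Because this whole transformation lives inside the tokenizer, the only pre-processing hook left is a CharFilter in front of it, which is what Uwe's reply points out.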
[jira] [Commented] (SOLR-5860) Logging around core wait for state during startup / recovery is confusing
[ https://issues.apache.org/jira/browse/SOLR-5860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942076#comment-13942076 ] Shalin Shekhar Mangar commented on SOLR-5860: - I'm seeing some test failures with the patch. Ran it twice already. I have to call it a day but if nobody else gets to it first, I'll debug tomorrow and commit. {quote} [junit4] Tests with failures: [junit4] - org.apache.solr.handler.component.TermVectorComponentDistributedTest.testDistribSearch [junit4] - org.apache.solr.handler.component.DistributedExpandComponentTest.testDistribSearch [junit4] - org.apache.solr.handler.component.DistributedSuggestComponentTest.testDistribSearch [junit4] - org.apache.solr.TestDistributedGrouping.testDistribSearch [junit4] - org.apache.solr.handler.component.DistributedTermsComponentTest.testDistribSearch [junit4] - org.apache.solr.handler.component.DistributedSpellCheckComponentTest.testDistribSearch [junit4] - org.apache.solr.TestDistributedMissingSort.testDistribSearch [junit4] - org.apache.solr.TestDistributedSearch.testDistribSearch [junit4] - org.apache.solr.handler.component.DistributedQueryComponentCustomSortTest.testDistribSearch [junit4] {quote} > Logging around core wait for state during startup / recovery is confusing > - > > Key: SOLR-5860 > URL: https://issues.apache.org/jira/browse/SOLR-5860 > Project: Solr > Issue Type: Improvement > Components: SolrCloud >Reporter: Timothy Potter >Assignee: Shalin Shekhar Mangar >Priority: Minor > Attachments: SOLR-5860.patch > > > I'm seeing some log messages like this: > I was asked to wait on state recovering for HOST:8984_solr but I still do not > see the requested state. I see state: recovering live:true > This is very confusing because from the log, it seems like it's waiting to > see the state it's in ... After digging through the code, it appears that it > is really waiting for a leader to become active so that it has a leader to > recover from. 
> I'd like to improve the logging around this critical wait loop to give better > context to what is happening. > Also, I would like to change the following so that we force state updates > every 15 seconds for the entire wait period. > - if (retry == 15 || retry == 60) { > + if (retry % 15 == 0) { > As-is, it's waiting 120 seconds but only forcing the state to update twice, > once after 15 seconds and again after 60 … might be good to force updates for > the full wait period. > Lastly, I think it would be good to use the leaderConflictResolveWait setting > (from ZkController) here as well since 120 may not be enough for a leader to > become active in a busy cluster, esp. after the node the Overseer is running > on. Maybe leaderConflictResolveWait + 5 seconds? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
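The effect of the proposed change from `retry == 15 || retry == 60` to `retry % 15 == 0` can be counted directly. A self-contained sketch of the wait loop (assuming one retry per second over the 120-second wait, as the description implies; not the actual ZkController code):

```java
public class WaitLoopSketch {
    // Counts how many times a state refresh is forced over the wait period,
    // before and after the proposed patch.
    static int forcedUpdates(int totalRetries, boolean patched) {
        int forced = 0;
        for (int retry = 1; retry <= totalRetries; retry++) {
            boolean force = patched ? (retry % 15 == 0)
                                    : (retry == 15 || retry == 60);
            if (force) forced++;
        }
        return forced;
    }

    public static void main(String[] args) {
        System.out.println(forcedUpdates(120, false)); // 2: only at 15 and 60
        System.out.println(forcedUpdates(120, true));  // 8: every 15th retry
    }
}
```

So the patch goes from two forced updates to one every 15 seconds for the full wait, which is the behavior the comment argues for.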
[JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.7.0_51) - Build # 9857 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/9857/ Java: 64bit/jdk1.7.0_51 -XX:+UseCompressedOops -XX:+UseSerialGC -XX:-UseSuperWord 1 tests failed. FAILED: org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.testWithin {#8 seed=[B2C8C580BA165153:878CD25E862E637E]} Error Message: Shouldn't match I#4:Rect(minX=139.0,maxX=148.0,minY=119.0,maxY=121.0) Q:Pt(x=128.0,y=117.0) Stack Trace: java.lang.AssertionError: Shouldn't match I#4:Rect(minX=139.0,maxX=148.0,minY=119.0,maxY=121.0) Q:Pt(x=128.0,y=117.0) at __randomizedtesting.SeedInfo.seed([B2C8C580BA165153:878CD25E862E637E]:0) at org.junit.Assert.fail(Assert.java:93) at org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.fail(SpatialOpRecursivePrefixTreeTest.java:355) at org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.doTest(SpatialOpRecursivePrefixTreeTest.java:335) at org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.testWithin(SpatialOpRecursivePrefixTreeTest.java:119) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1617) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:826) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:862) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:876) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:359) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:783) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:443) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:835) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:771) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:782) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:359) at java.lang.Thread.run(Thread.java:744) Build Log: [...truncated 9084 lines...] [junit4] Suite: org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest [junit4] 1> Strategy: RecursiveP
[jira] [Commented] (SOLR-5852) Add CloudSolrServer helper method to connect to a ZK ensemble
[ https://issues.apache.org/jira/browse/SOLR-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942057#comment-13942057 ] Shawn Heisey commented on SOLR-5852: bq. As I read this, I don't quite see the utility of offering all the different ways of specifying the ensemble. bq. Aren't these all handled by the "typical" ZK ensemble connection string? I actually agree. But if a method like this is created, that's how I would want to do it. > Add CloudSolrServer helper method to connect to a ZK ensemble > - > > Key: SOLR-5852 > URL: https://issues.apache.org/jira/browse/SOLR-5852 > Project: Solr > Issue Type: Improvement >Reporter: Varun Thacker > Attachments: SOLR-5852-SH.patch, SOLR-5852-SH.patch, SOLR-5852.patch, > SOLR-5852_FK.patch, SOLR-5852_FK.patch > > > We should have a CloudSolrServer constructor which takes a list of ZK servers > to connect to. > Something Like > {noformat} > public CloudSolrServer(String... zkHost); > {noformat} > - Document the current constructor better to mention that to connect to a ZK > ensemble you can pass a comma-delimited list of ZK servers like > zk1:2181,zk2:2181,zk3:2181 > - Thirdly should getLbServer() and getZKStatereader() be public? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5852) Add CloudSolrServer helper method to connect to a ZK ensemble
[ https://issues.apache.org/jira/browse/SOLR-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942046#comment-13942046 ] Furkan KAMACI commented on SOLR-5852: - [~erickerickson] could you check my comments and my patch? > Add CloudSolrServer helper method to connect to a ZK ensemble > - > > Key: SOLR-5852 > URL: https://issues.apache.org/jira/browse/SOLR-5852 > Project: Solr > Issue Type: Improvement >Reporter: Varun Thacker > Attachments: SOLR-5852-SH.patch, SOLR-5852-SH.patch, SOLR-5852.patch, > SOLR-5852_FK.patch, SOLR-5852_FK.patch > > > We should have a CloudSolrServer constructor which takes a list of ZK servers > to connect to. > Something Like > {noformat} > public CloudSolrServer(String... zkHost); > {noformat} > - Document the current constructor better to mention that to connect to a ZK > ensemble you can pass a comma-delimited list of ZK servers like > zk1:2181,zk2:2181,zk3:2181 > - Thirdly should getLbServer() and getZKStatereader() be public? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Analyzing primitive types, why can't we do this in Solr?
It's bugged me for a while that we can't define any analysis on primitive types. This is especially acute with date types, we require a very exact format and have to tell people "transform it correctly on the ingestion side", or "create a custom update processor that transforms it". I thought I remembered something about being able to do this, but can't find it. I suspect I was confusing it with DIH. What's the reason for primitive types being unanalyzed? Just "it's always been that way", or "it would lead to a very sticky wicket we never wanted to get stuck in"? Both are perfectly valid, I'm just sayin'. I realize this would provide some "interesting" output. Say you defined a regex for an int type that removed all non-numerics. If the input was "30asdf" and it was transformed correctly into 30 for the underlying int field, it would still come back as 30asdf from the stored data, but that's true about all analysis steps. Or perhaps you'd like to have a string of integers as input to a multiValued int field. Or Musings sparked by seeing this crop up again in another context. Erick - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-5888) SyncSliceTest is slower than it should be.
[ https://issues.apache.org/jira/browse/SOLR-5888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller resolved SOLR-5888. --- Resolution: Fixed > SyncSliceTest is slower than it should be. > -- > > Key: SOLR-5888 > URL: https://issues.apache.org/jira/browse/SOLR-5888 > Project: Solr > Issue Type: Test > Components: SolrCloud >Reporter: Mark Miller >Assignee: Mark Miller >Priority: Minor > Fix For: 4.8, 5.0 > > > This test is surprisingly slow. Turns out, it's waiting around in many cases > when it does not necessarily need to. > Part of the fix should speed up some other tests a bit as well. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5865) Provide a MiniSolrCloudCluster to enable easier testing
[ https://issues.apache.org/jira/browse/SOLR-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942015#comment-13942015 ] Mark Miller commented on SOLR-5865: --- Hmm - nope, I can still see something. It's just becoming rarer. I think this one is from threads leaking past the end of the test. The test framework has a linger for this and attempted interrupts (linger has shown to be especially important for zk tests, which have threads that can seem to linger for a while after shutdown). > Provide a MiniSolrCloudCluster to enable easier testing > --- > > Key: SOLR-5865 > URL: https://issues.apache.org/jira/browse/SOLR-5865 > Project: Solr > Issue Type: Improvement > Components: SolrCloud >Affects Versions: 4.7, 5.0 >Reporter: Gregory Chanan >Assignee: Mark Miller > Attachments: SOLR-5865.patch, SOLR-5865.patch, SOLR-5865addendum.patch > > > Today, the SolrCloud tests are based on the LuceneTestCase class hierarchy, > which has a couple of issues around support for downstream projects: > - It's difficult to test SolrCloud support in a downstream project that may > have its own test framework. For example, some projects have support for > different storage backends (e.g. Solr/ElasticSearch/HBase) and want tests > against each of the different backends. This is difficult to do cleanly, > because the Solr tests require derivation from LuceneTestCase, while the > others don't > - The LuceneTestCase class hierarchy is really designed for internal solr > tests (e.g. it randomizes a lot of parameters to get test coverage, but a > downstream project probably doesn't care about that). It's also quite > complicated and dense, much more so than a downstream project would want. > Given these reasons, it would be nice to provide a simple > "MiniSolrCloudCluster", similar to how HDFS provides a MiniHdfsCluster or > HBase provides a MiniHBaseCluster. 
-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5852) Add CloudSolrServer helper method to connect to a ZK ensemble
[ https://issues.apache.org/jira/browse/SOLR-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941974#comment-13941974 ] Erick Erickson commented on SOLR-5852: -- As I read this, I don't quite see the utility of offering all the different ways of specifying the ensemble. 1> ("host1:2181", "/mychroot") 2> ("127.0.0.1:3000", "127.0.0.1:3001", "127.0.0.1:3002") 3> ("localhost:2181", "localhost:2181", "localhost:2181/solrtwo") 4> ("zoo1:2181", "zoo2:2181", "zoo3:2181", "/solr-three") 5> ("zoo1.example.com:2181", "zoo2.example.com:2181","zoo3.example/com:2181","/solr-three") 6> ("zoo1:2181/root", "zoo2:2181/root", "zoo3:2181/root") Aren't these all handled by the "typical" ZK ensemble connection string? I.e. 1> "host1:2101/mychroot" 2> "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002" 3> "localhost:2181,localhost:2181,localhost:2181/solrtwo" 4> like 3 5> like 3 6> like 3 I confess I'm just looking at it from a rather ignorant level, but it seems like this would add complexity for no added functionality. Of course I may be missing a lot, if there are places where this kind of processing is _already_ being done and this moves things to a c'tor that would be a reason. I'd rather have a single form than multiple forms, unless the multiple forms give me added functionality. Otherwise, one adds maintenance without adding value.. Let me know if I've missed the boat here. > Add CloudSolrServer helper method to connect to a ZK ensemble > - > > Key: SOLR-5852 > URL: https://issues.apache.org/jira/browse/SOLR-5852 > Project: Solr > Issue Type: Improvement >Reporter: Varun Thacker > Attachments: SOLR-5852-SH.patch, SOLR-5852-SH.patch, SOLR-5852.patch, > SOLR-5852_FK.patch, SOLR-5852_FK.patch > > > We should have a CloudSolrServer constructor which takes a list of ZK servers > to connect to. > Something Like > {noformat} > public CloudSolrServer(String... 
zkHost); > {noformat} > - Document the current constructor better to mention that to connect to a ZK > ensemble you can pass a comma-delimited list of ZK servers like > zk1:2181,zk2:2181,zk3:2181 > - Thirdly should getLbServer() and getZKStatereader() be public? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
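Erick's point above is that every proposed constructor variant collapses into the single standard ZooKeeper connect string: comma-delimited host:port pairs with an optional trailing chroot. A minimal, dependency-free sketch of that normalization (the `ZkConnect.connectString` helper is hypothetical, not a SolrJ API):

```java
import java.util.List;

public class ZkConnect {
    /** Joins ZK hosts into the standard comma-delimited connect string,
     *  appending an optional chroot, e.g.
     *  ["zoo1:2181","zoo2:2181"] + "/solr" -> "zoo1:2181,zoo2:2181/solr". */
    static String connectString(List<String> hosts, String chroot) {
        String joined = String.join(",", hosts);
        if (chroot == null || chroot.isEmpty()) {
            return joined;
        }
        // Tolerate a chroot given with or without its leading slash.
        return joined + (chroot.startsWith("/") ? chroot : "/" + chroot);
    }
}
```

Under this sketch, forms 1 through 6 in the comment all reduce to one string, which is what the existing single-string constructor already accepts.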
[jira] [Commented] (SOLR-5232) SolrCloud should distribute updates via streaming rather than buffering.
[ https://issues.apache.org/jira/browse/SOLR-5232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941957#comment-13941957 ] Jessica Cheng commented on SOLR-5232: - Just curious--has anyone gotten a chance to run with both before and after this change to see if the throughput is improved? > SolrCloud should distribute updates via streaming rather than buffering. > > > Key: SOLR-5232 > URL: https://issues.apache.org/jira/browse/SOLR-5232 > Project: Solr > Issue Type: Improvement > Components: SolrCloud >Reporter: Mark Miller >Assignee: Mark Miller >Priority: Critical > Fix For: 4.6, 5.0 > > Attachments: SOLR-5232.patch, SOLR-5232.patch, SOLR-5232.patch, > SOLR-5232.patch, SOLR-5232.patch, SOLR-5232.patch > > > The current approach was never the best for SolrCloud - it was designed for a > pre SolrCloud Solr - it also uses too many connections and threads - nailing > that down is likely wasted effort when we should really move away from > explicitly buffering docs and sending small batches per thread as we have > been doing. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5852) Add CloudSolrServer helper method to connect to a ZK ensemble
[ https://issues.apache.org/jira/browse/SOLR-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Heisey updated SOLR-5852: --- Attachment: SOLR-5852-SH.patch New patch against trunk. Previous patch was against trunk too, but a couple of hours after I went to bed, a conflicting patch was committed. This does make a change to CloudSolrServerTest bits that just got added, but only to eliminate warnings. It does not change the function. > Add CloudSolrServer helper method to connect to a ZK ensemble > - > > Key: SOLR-5852 > URL: https://issues.apache.org/jira/browse/SOLR-5852 > Project: Solr > Issue Type: Improvement >Reporter: Varun Thacker > Attachments: SOLR-5852-SH.patch, SOLR-5852-SH.patch, SOLR-5852.patch, > SOLR-5852_FK.patch, SOLR-5852_FK.patch > > > We should have a CloudSolrServer constructor which takes a list of ZK servers > to connect to. > Something Like > {noformat} > public CloudSolrServer(String... zkHost); > {noformat} > - Document the current constructor better to mention that to connect to a ZK > ensemble you can pass a comma-delimited list of ZK servers like > zk1:2181,zk2:2181,zk3:2181 > - Thirdly should getLbServer() and getZKStatereader() be public? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5852) Add CloudSolrServer helper method to connect to a ZK ensemble
[ https://issues.apache.org/jira/browse/SOLR-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941912#comment-13941912 ] Erick Erickson commented on SOLR-5852: -- Yeah, I saw that. Was it against trunk or 4x? That might account for the difference. BTW, I thought I'd mention that sleep is a good thing :). > Add CloudSolrServer helper method to connect to a ZK ensemble > - > > Key: SOLR-5852 > URL: https://issues.apache.org/jira/browse/SOLR-5852 > Project: Solr > Issue Type: Improvement >Reporter: Varun Thacker > Attachments: SOLR-5852-SH.patch, SOLR-5852.patch, SOLR-5852_FK.patch, > SOLR-5852_FK.patch > > > We should have a CloudSolrServer constructor which takes a list of ZK servers > to connect to. > Something Like > {noformat} > public CloudSolrServer(String... zkHost); > {noformat} > - Document the current constructor better to mention that to connect to a ZK > ensemble you can pass a comma-delimited list of ZK servers like > zk1:2181,zk2:2181,zk3:2181 > - Thirdly should getLbServer() and getZKStatereader() be public? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5865) Provide a MiniSolrCloudCluster to enable easier testing
[ https://issues.apache.org/jira/browse/SOLR-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941914#comment-13941914 ] ASF subversion and git services commented on SOLR-5865: --- Commit 1579682 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1579682 ] SOLR-5865: Un@Ignore test again. > Provide a MiniSolrCloudCluster to enable easier testing > --- > > Key: SOLR-5865 > URL: https://issues.apache.org/jira/browse/SOLR-5865 > Project: Solr > Issue Type: Improvement > Components: SolrCloud >Affects Versions: 4.7, 5.0 >Reporter: Gregory Chanan >Assignee: Mark Miller > Attachments: SOLR-5865.patch, SOLR-5865.patch, SOLR-5865addendum.patch > > > Today, the SolrCloud tests are based on the LuceneTestCase class hierarchy, > which has a couple of issues around support for downstream projects: > - It's difficult to test SolrCloud support in a downstream project that may > have its own test framework. For example, some projects have support for > different storage backends (e.g. Solr/ElasticSearch/HBase) and want tests > against each of the different backends. This is difficult to do cleanly, > because the Solr tests require derivation from LuceneTestCase, while the > other don't > - The LuceneTestCase class hierarchy is really designed for internal solr > tests (e.g. it randomizes a lot of parameters to get test coverage, but a > downstream project probably doesn't care about that). It's also quite > complicated and dense, much more so than a downstream project would want. > Given these reasons, it would be nice to provide a simple > "MiniSolrCloudCluster", similar to how HDFS provides a MiniHdfsCluster or > HBase provides a MiniHBaseCluster. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5865) Provide a MiniSolrCloudCluster to enable easier testing
[ https://issues.apache.org/jira/browse/SOLR-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941910#comment-13941910 ] ASF subversion and git services commented on SOLR-5865: --- Commit 1579679 from [~markrmil...@gmail.com] in branch 'dev/trunk' [ https://svn.apache.org/r1579679 ] SOLR-5865: Un@Ignore test again. > Provide a MiniSolrCloudCluster to enable easier testing > --- > > Key: SOLR-5865 > URL: https://issues.apache.org/jira/browse/SOLR-5865 > Project: Solr > Issue Type: Improvement > Components: SolrCloud >Affects Versions: 4.7, 5.0 >Reporter: Gregory Chanan >Assignee: Mark Miller > Attachments: SOLR-5865.patch, SOLR-5865.patch, SOLR-5865addendum.patch > > > Today, the SolrCloud tests are based on the LuceneTestCase class hierarchy, > which has a couple of issues around support for downstream projects: > - It's difficult to test SolrCloud support in a downstream project that may > have its own test framework. For example, some projects have support for > different storage backends (e.g. Solr/ElasticSearch/HBase) and want tests > against each of the different backends. This is difficult to do cleanly, > because the Solr tests require derivation from LuceneTestCase, while the > other don't > - The LuceneTestCase class hierarchy is really designed for internal solr > tests (e.g. it randomizes a lot of parameters to get test coverage, but a > downstream project probably doesn't care about that). It's also quite > complicated and dense, much more so than a downstream project would want. > Given these reasons, it would be nice to provide a simple > "MiniSolrCloudCluster", similar to how HDFS provides a MiniHdfsCluster or > HBase provides a MiniHBaseCluster. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5890) Delete silently fails if not sent to shard where document was added
[ https://issues.apache.org/jira/browse/SOLR-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941897#comment-13941897 ] Xavier Riley commented on SOLR-5890: Yes, router is set to "implicit" > Delete silently fails if not sent to shard where document was added > --- > > Key: SOLR-5890 > URL: https://issues.apache.org/jira/browse/SOLR-5890 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 4.7 > Environment: Debian 7.4. >Reporter: Peter Inglesby > Fix For: 4.8, 5.0, 4.7.1 > > > We have SolrCloud set up with two shards, each with a leader and a replica. > We use haproxy to distribute requests between the four nodes. > Regardless of which node we send an add request to, following a commit, the > newly-added document is returned in a search, as expected. > However, we can only delete a document if the delete request is sent to a > node in the shard where the document was added. If we send the delete > request to a node in the other shard (and then send a commit) the document is > not deleted. Such a delete request will get a 200 response, with the > following body: > {'responseHeader'=>{'status'=>0,'QTime'=>7}} > Apart from the very low QTime, this is indistinguishable from a > successful delete. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5860) Logging around core wait for state during startup / recovery is confusing
[ https://issues.apache.org/jira/browse/SOLR-5860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941893#comment-13941893 ] Shalin Shekhar Mangar commented on SOLR-5860: - Yes, the patch looks good to me. I'll commit after running all tests. > Logging around core wait for state during startup / recovery is confusing > - > > Key: SOLR-5860 > URL: https://issues.apache.org/jira/browse/SOLR-5860 > Project: Solr > Issue Type: Improvement > Components: SolrCloud >Reporter: Timothy Potter >Assignee: Shalin Shekhar Mangar >Priority: Minor > Attachments: SOLR-5860.patch > > > I'm seeing some log messages like this: > I was asked to wait on state recovering for HOST:8984_solr but I still do not > see the requested state. I see state: recovering live:true > This is very confusing because from the log, it seems like it's waiting to > see the state it's in ... After digging through the code, it appears that it > is really waiting for a leader to become active so that it has a leader to > recover from. > I'd like to improve the logging around this critical wait loop to give better > context to what is happening. > Also, I would like to change the following so that we force state updates > every 15 seconds for the entire wait period. > - if (retry == 15 || retry == 60) { > + if (retry % 15 == 0) { > As-is, it's waiting 120 seconds but only forcing the state to update twice, > once after 15 seconds and again after 60 … might be good to force updates for > the full wait period. > Lastly, I think it would be good to use the leaderConflictResolveWait setting > (from ZkController) here as well since 120 may not be enough for a leader to > become active in a busy cluster, esp. after the node the Overseer is running > on. Maybe leaderConflictResolveWait + 5 seconds? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
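The proposed change above swaps two hard-coded refresh points for a modulo check. A self-contained sketch of how the two policies differ over a 120-retry wait (class and method names are illustrative, not from ZkController):

```java
import java.util.ArrayList;
import java.util.List;

public class WaitLoopSketch {
    /** Returns the retry counts (1..maxRetries) at which a forced state
     *  update would fire under each policy. */
    static List<Integer> forcedUpdates(int maxRetries, boolean everyFifteen) {
        List<Integer> forced = new ArrayList<>();
        for (int retry = 1; retry <= maxRetries; retry++) {
            boolean force = everyFifteen
                ? (retry % 15 == 0)            // proposed: every 15 retries
                : (retry == 15 || retry == 60); // current: only twice
            if (force) {
                forced.add(retry);
            }
        }
        return forced;
    }
}
```

Over 120 retries the current policy forces updates only at 15 and 60, while the modulo version forces one every 15 retries for the whole wait period, which is the behavior the comment argues for.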
[jira] [Updated] (SOLR-5749) Implement an Overseer status API
[ https://issues.apache.org/jira/browse/SOLR-5749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-5749: Attachment: SOLR-5749.patch This patch adds tracking 10 most recent failures (with entire request/response) for each Collection API action. I think this along with the requeststatus API added in SOLR-5477 removes the need to expose entire logs. This can be committed now. In order to write/read stats from ZK, we need to be able to serialize Timer and related classes. I shall do that via a different issue. > Implement an Overseer status API > > > Key: SOLR-5749 > URL: https://issues.apache.org/jira/browse/SOLR-5749 > Project: Solr > Issue Type: Sub-task > Components: SolrCloud >Reporter: Shalin Shekhar Mangar > Fix For: 5.0 > > Attachments: SOLR-5749.patch, SOLR-5749.patch, SOLR-5749.patch, > SOLR-5749.patch, SOLR-5749.patch > > > Right now there is little to no information exposed about the overseer from > SolrCloud. > I propose that we have an API for overseer status which can return: > # Past N commands executed (grouped by command type) > # Status (queue-size, current overseer leader node) > # Overseer log -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5860) Logging around core wait for state during startup / recovery is confusing
[ https://issues.apache.org/jira/browse/SOLR-5860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941884#comment-13941884 ] Mark Miller commented on SOLR-5860: --- I've touched on this area working on SOLR-5884 as well - this is more thoughtful stuff though, so would be great to get it in before I commit SOLR-5884. > Logging around core wait for state during startup / recovery is confusing > - > > Key: SOLR-5860 > URL: https://issues.apache.org/jira/browse/SOLR-5860 > Project: Solr > Issue Type: Improvement > Components: SolrCloud >Reporter: Timothy Potter >Assignee: Shalin Shekhar Mangar >Priority: Minor > Attachments: SOLR-5860.patch > > > I'm seeing some log messages like this: > I was asked to wait on state recovering for HOST:8984_solr but I still do not > see the requested state. I see state: recovering live:true > This is very confusing because from the log, it seems like it's waiting to > see the state it's in ... After digging through the code, it appears that it > is really waiting for a leader to become active so that it has a leader to > recover from. > I'd like to improve the logging around this critical wait loop to give better > context to what is happening. > Also, I would like to change the following so that we force state updates > every 15 seconds for the entire wait period. > - if (retry == 15 || retry == 60) { > + if (retry % 15 == 0) { > As-is, it's waiting 120 seconds but only forcing the state to update twice, > once after 15 seconds and again after 60 … might be good to force updates for > the full wait period. > Lastly, I think it would be good to use the leaderConflictResolveWait setting > (from ZkController) here as well since 120 may not be enough for a leader to > become active in a busy cluster, esp. after the node the Overseer is running > on. Maybe leaderConflictResolveWait + 5 seconds? 
-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5865) Provide a MiniSolrCloudCluster to enable easier testing
[ https://issues.apache.org/jira/browse/SOLR-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941876#comment-13941876 ] Mark Miller commented on SOLR-5865: --- bq. I think the main issue is the zkHost sys prop Hmm - that is not it either - the MiniSolrCloudCluster will clear those on shutdown as well. I'm still seeing some leakage somehow though. bq. Can you just use SystemPropertiesRestoreRule? Let me give it a try. > Provide a MiniSolrCloudCluster to enable easier testing > --- > > Key: SOLR-5865 > URL: https://issues.apache.org/jira/browse/SOLR-5865 > Project: Solr > Issue Type: Improvement > Components: SolrCloud >Affects Versions: 4.7, 5.0 >Reporter: Gregory Chanan >Assignee: Mark Miller > Attachments: SOLR-5865.patch, SOLR-5865.patch, SOLR-5865addendum.patch > > > Today, the SolrCloud tests are based on the LuceneTestCase class hierarchy, > which has a couple of issues around support for downstream projects: > - It's difficult to test SolrCloud support in a downstream project that may > have its own test framework. For example, some projects have support for > different storage backends (e.g. Solr/ElasticSearch/HBase) and want tests > against each of the different backends. This is difficult to do cleanly, > because the Solr tests require derivation from LuceneTestCase, while the > other don't > - The LuceneTestCase class hierarchy is really designed for internal solr > tests (e.g. it randomizes a lot of parameters to get test coverage, but a > downstream project probably doesn't care about that). It's also quite > complicated and dense, much more so than a downstream project would want. > Given these reasons, it would be nice to provide a simple > "MiniSolrCloudCluster", similar to how HDFS provides a MiniHdfsCluster or > HBase provides a MiniHBaseCluster.
-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5541) FileExistsCachingDirectory, to work around unreliable File.exists
Michael McCandless created LUCENE-5541: -- Summary: FileExistsCachingDirectory, to work around unreliable File.exists Key: LUCENE-5541 URL: https://issues.apache.org/jira/browse/LUCENE-5541 Project: Lucene - Core Issue Type: Bug Components: core/store Reporter: Michael McCandless File.exists is a dangerous method in Java, because if there is a low-level IOException (permission denied, out of file handles, etc.) the method can return false when it should return true. Fortunately, as of Lucene 4.x, we rely much less on File.exists, because we track which files the codec components created, and we know those files then exist. But, unfortunately, going from 3.0.x to 3.6.x, we increased our reliance on File.exists, e.g. when creating CFS we check File.exists on each sub-file before trying to add it, and I have a customer corruption case where apparently a transient low level IOE caused File.exists to incorrectly return false for one of the sub-files. It results in corruption like this: {noformat} java.io.FileNotFoundException: No sub-file with id .fnm found (fileName=_1u7.cfs files: [.tis, .tii, .frq, .prx, .fdt, .nrm, .fdx]) org.apache.lucene.index.CompoundFileReader.openInput(CompoundFileReader.java:157) org.apache.lucene.index.CompoundFileReader.openInput(CompoundFileReader.java:146) org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:71) org.apache.lucene.index.IndexWriter.getFieldInfos(IndexWriter.java:1212) org.apache.lucene.index.IndexWriter.getCurrentFieldInfos(IndexWriter.java:1228) org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1161) {noformat} I think typically local file systems don't often hit such low level errors, but if you have an index on a remote filesystem, where network hiccups can cause problems, it's more likely. As a simple workaround, I created a basic Directory delegator that holds a Set of all created but not deleted files, and short-circuits fileExists to return true if the file is in that set. 
I don't plan to commit this: we aren't doing bug-fix releases on 3.6.x anymore (it's very old by now), and this problem is already "fixed" in 4.x (by reducing our reliance on File.exists), but I wanted to post the code here in case others hit it. It looks like it has been hit elsewhere, e.g. https://netbeans.org/bugzilla/show_bug.cgi?id=189571 and https://issues.jboss.org/browse/ISPN-2981 -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
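The workaround described above reduces to a small bookkeeping idea: remember every file created and not yet deleted, and let that memory override an unreliable exists() check. A dependency-free sketch of the mechanism (this is not the actual Lucene Directory delegator from the issue; the class name and the Predicate stand-in for the underlying filesystem check are illustrative):

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

/** Tracks created-but-not-deleted file names and short-circuits
 *  existence checks, so a transient low-level IOE in the underlying
 *  check cannot make a known-live file look missing. */
public class FileExistsCache {
    private final Set<String> created =
        Collections.synchronizedSet(new HashSet<>());
    private final Predicate<String> underlyingExists; // the unreliable check

    FileExistsCache(Predicate<String> underlyingExists) {
        this.underlyingExists = underlyingExists;
    }

    void onCreate(String name) { created.add(name); }

    void onDelete(String name) { created.remove(name); }

    boolean fileExists(String name) {
        // A file we created and never deleted must exist, even if the
        // underlying check transiently reports false.
        return created.contains(name) || underlyingExists.test(name);
    }
}
```

The real workaround wraps a Directory and intercepts createOutput/deleteFile/fileExists the same way; this sketch just isolates the caching logic.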
[jira] [Commented] (SOLR-5865) Provide a MiniSolrCloudCluster to enable easier testing
[ https://issues.apache.org/jira/browse/SOLR-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941873#comment-13941873 ] Alan Woodward commented on SOLR-5865: - Can you just use SystemPropertiesRestoreRule? > Provide a MiniSolrCloudCluster to enable easier testing > --- > > Key: SOLR-5865 > URL: https://issues.apache.org/jira/browse/SOLR-5865 > Project: Solr > Issue Type: Improvement > Components: SolrCloud >Affects Versions: 4.7, 5.0 >Reporter: Gregory Chanan >Assignee: Mark Miller > Attachments: SOLR-5865.patch, SOLR-5865.patch, SOLR-5865addendum.patch > > > Today, the SolrCloud tests are based on the LuceneTestCase class hierarchy, > which has a couple of issues around support for downstream projects: > - It's difficult to test SolrCloud support in a downstream project that may > have its own test framework. For example, some projects have support for > different storage backends (e.g. Solr/ElasticSearch/HBase) and want tests > against each of the different backends. This is difficult to do cleanly, > because the Solr tests require derivation from LuceneTestCase, while the > other don't > - The LuceneTestCase class hierarchy is really designed for internal solr > tests (e.g. it randomizes a lot of parameters to get test coverage, but a > downstream project probably doesn't care about that). It's also quite > complicated and dense, much more so than a downstream project would want. > Given these reasons, it would be nice to provide a simple > "MiniSolrCloudCluster", similar to how HDFS provides a MiniHdfsCluster or > HBase provides a MiniHBaseCluster. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
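The rule Alan suggests works by snapshotting system properties before the test body runs and restoring them afterward, so leaked props like "zkHost" cannot bleed into later tests. A dependency-free sketch of that mechanism (this is not the randomizedtesting implementation; the class name is made up):

```java
import java.util.Properties;

/** Snapshot-and-restore guard for system properties around a test body. */
public class SysPropsGuard {
    static void runRestoringProps(Runnable testBody) {
        // Clone the full property table before the body can mutate it.
        Properties snapshot = (Properties) System.getProperties().clone();
        try {
            testBody.run();
        } finally {
            // Swap the untouched snapshot back in, discarding any
            // properties the body set, changed, or cleared.
            System.setProperties(snapshot);
        }
    }
}
```

A JUnit rule applies the same idea per test method; the sketch shows why it catches exactly the kind of leakage discussed above.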
[jira] [Commented] (SOLR-5890) Delete silently fails if not sent to shard where document was added
[ https://issues.apache.org/jira/browse/SOLR-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941865#comment-13941865 ] Mark Miller commented on SOLR-5890: --- If you look at the SolrAdmin cloud section, under the zk tree view, what router impl do you see in clusterstate.json? Is it implicit by any chance? > Delete silently fails if not sent to shard where document was added > --- > > Key: SOLR-5890 > URL: https://issues.apache.org/jira/browse/SOLR-5890 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 4.7 > Environment: Debian 7.4. >Reporter: Peter Inglesby > Fix For: 4.8, 5.0, 4.7.1 > > > We have SolrCloud set up with two shards, each with a leader and a replica. > We use haproxy to distribute requests between the four nodes. > Regardless of which node we send an add request to, following a commit, the > newly-added document is returned in a search, as expected. > However, we can only delete a document if the delete request is sent to a > node in the shard where the document was added. If we send the delete > request to a node in the other shard (and then send a commit) the document is > not deleted. Such a delete request will get a 200 response, with the > following body: > {'responseHeader'=>{'status'=>0,'QTime'=>7}} > Apart from the very low QTime, this is indistinguishable from a > successful delete. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5890) Delete silently fails if not sent to shard where document was added
[ https://issues.apache.org/jira/browse/SOLR-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-5890: -- Fix Version/s: 4.7.1 5.0 4.8 > Delete silently fails if not sent to shard where document was added > --- > > Key: SOLR-5890 > URL: https://issues.apache.org/jira/browse/SOLR-5890 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 4.7 > Environment: Debian 7.4. >Reporter: Peter Inglesby > Fix For: 4.8, 5.0, 4.7.1 > > > We have SolrCloud set up with two shards, each with a leader and a replica. > We use haproxy to distribute requests between the four nodes. > Regardless of which node we send an add request to, following a commit, the > newly-added document is returned in a search, as expected. > However, we can only delete a document if the delete request is sent to a > node in the shard where the document was added. If we send the delete > request to a node in the other shard (and then send a commit) the document is > not deleted. Such a delete request will get a 200 response, with the > following body: > {'responseHeader'=>{'status'=>0,'QTime'=>7}} > Apart from the very low QTime, this is indistinguishable from a > successful delete. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5890) Delete silently fails if not sent to shard where document was added
[ https://issues.apache.org/jira/browse/SOLR-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941862#comment-13941862 ] Mark Miller commented on SOLR-5890: --- Very strange - the request should be forwarded. This will be interesting. > Delete silently fails if not sent to shard where document was added > --- > > Key: SOLR-5890 > URL: https://issues.apache.org/jira/browse/SOLR-5890 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 4.7 > Environment: Debian 7.4. >Reporter: Peter Inglesby > Fix For: 4.8, 5.0, 4.7.1 > > > We have SolrCloud set up with two shards, each with a leader and a replica. > We use haproxy to distribute requests between the four nodes. > Regardless of which node we send an add request to, following a commit, the > newly-added document is returned in a search, as expected. > However, we can only delete a document if the delete request is sent to a > node in the shard where the document was added. If we send the delete > request to a node in the other shard (and then send a commit) the document is > not deleted. Such a delete request will get a 200 response, with the > following body: > {'responseHeader'=>{'status'=>0,'QTime'=>7}} > Apart from the very low QTime, this is indistinguishable from a > successful delete. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
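One way to see why forwarding works for the default compositeId router but can fail under implicit routing: with hash routing, the target shard is a pure function of the document id, so any node can recompute it for a delete-by-id; with implicit routing, the shard was chosen by whoever sent the add, and nothing in the delete request recovers it. A toy illustration using String.hashCode as a stand-in for Solr's actual MurmurHash-based routing (assumption: this is not Solr's routing code):

```java
public class RoutingSketch {
    /** Hash routing: the shard for an id is derivable from the id alone,
     *  so any node receiving a delete can forward it to the right shard. */
    static int hashShard(String id, int numShards) {
        // Toy stand-in for the compositeId router's MurmurHash bucketing.
        return Math.floorMod(id.hashCode(), numShards);
    }
}
```

Under implicit routing there is no such function of the id, which is consistent with the reporter's symptom: the delete only works when it happens to land on the shard that indexed the document.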
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941860#comment-13941860 ] Gopal Patwa commented on SOLR-4787: --- Thanks Kranti, here is my use case. Event Collection: eventId=1, title=Lady Gaga, date=06/03/2014. EventTicketStats Collection: eventId=1, minPrice=200, minQuantity=5. When a user searches for "lady gaga" on the event document using hjoin with EventTicketStats, the result should include the min price and quantity data from the joined core. Final result for the Event Collection: eventId=1, title=Lady Gaga, date=06/03/2014, minPrice=200, minQuantity=5. The user also has the option to filter results by price and quantity, e.g. show events with minPrice < 100. The reason we keep EventTicketStats in a separate document is that our ticket data changes every 5 seconds, while Event data changes maybe twice a day. I thought of using updatable numeric DocValues after denormalizing the Event document with min price and quantity fields, but Solr does not support that feature yet, so I need to rely on the join. > Join Contrib > > > Key: SOLR-4787 > URL: https://issues.apache.org/jira/browse/SOLR-4787 > Project: Solr > Issue Type: New Feature > Components: search >Affects Versions: 4.2.1 >Reporter: Joel Bernstein >Priority: Minor > Fix For: 4.8 > > Attachments: SOLR-4787-deadlock-fix.patch, > SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, > SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, > SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, > SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, > SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, > SOLR-4797-hjoin-multivaluekeys-trunk.patch > > > This contrib provides a place where different join implementations can be > contributed to Solr. This contrib currently includes 3 join implementations. > The initial patch was generated from the Solr 4.3 tag. 
Because of changes in > the FieldCache API this patch will only build with Solr 4.2 or above. > *HashSetJoinQParserPlugin aka hjoin* > The hjoin provides a join implementation that filters results in one core > based on the results of a search in another core. This is similar in > functionality to the JoinQParserPlugin but the implementation differs in a > couple of important ways. > The first way is that the hjoin is designed to work with int and long join > keys only. So, in order to use hjoin, int or long join keys must be included > in both the to and from core. > The second difference is that the hjoin builds memory structures that are > used to quickly connect the join keys. So, the hjoin will need more memory > than the JoinQParserPlugin to perform the join. > The main advantage of the hjoin is that it can scale to join millions of keys > between cores and provide sub-second response time. The hjoin should work > well with up to two million results from the fromIndex and tens of millions > of results from the main query. > The hjoin supports the following features: > 1) Both lucene query and PostFilter implementations. A *"cost"* > 99 will > turn on the PostFilter. The PostFilter will typically outperform the Lucene > query when the main query results have been narrowed down. > 2) With the lucene query implementation there is an option to build the > filter with threads. This can greatly improve the performance of the query if > the main query index is very large. The "threads" parameter turns on > threading. For example *threads=6* will use 6 threads to build the filter. > This will set up a fixed threadpool with six threads to handle all hjoin > requests. Once the threadpool is created the hjoin will always use it to > build the filter. Threading does not come into play with the PostFilter. > 3) The *size* local parameter can be used to set the initial size of the > hashset used to perform the join. 
If this is set above the number of results > from the fromIndex then you can avoid hashset resizing, which improves > performance. > 4) Nested filter queries. The local parameter "fq" can be used to nest a > filter query within the join. The nested fq will filter the results of the > join query. This can point to another join to support nested joins. > 5) Full caching support for the lucene query implementation. The filterCache > and queryResultCache should work properly even with deep nesting of joins. > Only the queryResultCache comes into play with the PostFilter implementation > because PostFilters are not cacheable in the filterCache. > The syntax of the hjoin is similar to the JoinQParserPlugin except that the > plugin is referenced by the string "hjoin" rather than "join". > fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 > fq=$qq\}user:customer1&qq=group:5 > The example filter query a
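The hjoin mechanics described above (collect the fromIndex join keys into a hash set, then keep only main-query hits whose key is present, pre-sizing the set per the *size* parameter to avoid rehashing) can be sketched without any Solr dependencies; the class and method names here are illustrative, not the patch's API:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HashSetJoinSketch {
    /** Filters main-query hits ({docId, joinKey} pairs) down to those
     *  whose join key appears in the fromIndex results. */
    static List<long[]> join(List<long[]> mainHits,
                             List<Long> fromIndexKeys,
                             int expectedSize) {
        // Pre-size with headroom so adding fromIndex keys never rehashes,
        // mirroring what the *size* local parameter is for.
        Set<Long> keys = new HashSet<>(expectedSize * 2);
        keys.addAll(fromIndexKeys);
        List<long[]> joined = new ArrayList<>();
        for (long[] hit : mainHits) {
            if (keys.contains(hit[1])) {
                joined.add(hit);
            }
        }
        return joined;
    }
}
```

This also makes the memory trade-off concrete: the whole fromIndex key set must fit in the hash set, which is why hjoin needs more memory than the JoinQParserPlugin but can stay sub-second at millions of keys.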
[jira] [Commented] (SOLR-5865) Provide a MiniSolrCloudCluster to enable easier testing
[ https://issues.apache.org/jira/browse/SOLR-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941857#comment-13941857 ] Mark Miller commented on SOLR-5865: --- Thanks Greg - I think the main issue is the zkHost sys prop - I've added the following as well: System.clearProperty("solr.solrxml.location"); System.clearProperty("zkHost"); That's one complication of avoiding the test framework - normally there are checks applied for this type of thing and the test will fail if you violate it and tell which sys props were not reset or which threads were not stopped, etc. > Provide a MiniSolrCloudCluster to enable easier testing > --- > > Key: SOLR-5865 > URL: https://issues.apache.org/jira/browse/SOLR-5865 > Project: Solr > Issue Type: Improvement > Components: SolrCloud >Affects Versions: 4.7, 5.0 >Reporter: Gregory Chanan >Assignee: Mark Miller > Attachments: SOLR-5865.patch, SOLR-5865.patch, SOLR-5865addendum.patch > > > Today, the SolrCloud tests are based on the LuceneTestCase class hierarchy, > which has a couple of issues around support for downstream projects: > - It's difficult to test SolrCloud support in a downstream project that may > have its own test framework. For example, some projects have support for > different storage backends (e.g. Solr/ElasticSearch/HBase) and want tests > against each of the different backends. This is difficult to do cleanly, > because the Solr tests require derivation from LuceneTestCase, while the > other don't > - The LuceneTestCase class hierarchy is really designed for internal solr > tests (e.g. it randomizes a lot of parameters to get test coverage, but a > downstream project probably doesn't care about that). It's also quite > complicated and dense, much more so than a downstream project would want. > Given these reasons, it would be nice to provide a simple > "MiniSolrCloudCluster", similar to how HDFS provides a MiniHdfsCluster or > HBase provides a MiniHBaseCluster. 
-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
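The system-property cleanup Mark describes is exactly the kind of leak check the Lucene test framework automates. As a rough illustration of the pattern (the `SysPropGuard` class is hypothetical, not part of Solr or its test framework), a guard can snapshot named properties before a test and restore or clear them afterwards:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Snapshots a set of system properties and restores them on close(),
 *  mimicking the sys-prop invariant checks the Lucene test framework performs. */
public class SysPropGuard implements AutoCloseable {
    private final Map<String, String> saved = new HashMap<>();

    public SysPropGuard(String... keys) {
        for (String key : keys) {
            saved.put(key, System.getProperty(key)); // value is null if unset
        }
    }

    @Override
    public void close() {
        for (Map.Entry<String, String> e : saved.entrySet()) {
            if (e.getValue() == null) {
                System.clearProperty(e.getKey()); // was unset before: clear it again
            } else {
                System.setProperty(e.getKey(), e.getValue()); // restore old value
            }
        }
    }

    public static void main(String[] args) {
        String before = System.getProperty("zkHost");
        try (SysPropGuard guard = new SysPropGuard("zkHost", "solr.solrxml.location")) {
            System.setProperty("zkHost", "localhost:2181"); // the "test" pollutes the JVM
        }
        // close() restored the original state, so the pollution is gone
        System.out.println(Objects.equals(before, System.getProperty("zkHost"))); // prints true
    }
}
```

Without such a guard (or the test framework's own rule), a leaked `zkHost` silently changes the behavior of every later test in the same JVM, which is the failure mode described above.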
[jira] [Created] (SOLR-5891) Problems installing Apache Solr with Apache Tomcat
Dean Zambrano created SOLR-5891: --- Summary: Problems installing Apache Solr with Apache Tomcat Key: SOLR-5891 URL: https://issues.apache.org/jira/browse/SOLR-5891 Project: Solr Issue Type: Bug Components: clients - java, SolrCloud Affects Versions: 4.7 Environment: Centos 6.5 Installing Apache Tomcat 7.0.52 with Apache Solr 4.7.0 Reporter: Dean Zambrano Fix For: 4.7 I installed Apache Solr, version 4.7.0. As part of the install, I performed the following: (Based on these instructions: http://sachkadam.wordpress.com/2013/04/29/solr-installation-on-centos-6/) -> The summary is as follows: - I moved the "/example/solr" directory to /opt/solr. - I created a "solr.xml" file which contains the following xml code: # more solr.xml The "solr.xml" file is located in: /usr/share/apache-tomcat-7.0.52/conf/Catalina/localhost When I try to access solr through the following URL: http://107.170.94.202:8983, I receive the following error: {msg=SolrCore 'collection1' is not available due to init failure: Could not load config file /root/solr-4.7.0/example/solr/collection1/solrconfig.xml,trace=org.apache.solr.common.SolrException: SolrCore 'collection1' is not available due to init failure: Could not load config file /root/solr-4.7.0/example/solr/collection1/solrconfig.xml at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:827) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:317) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:368) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Thread.java:744) Caused by: org.apache.solr.common.SolrException: Could not load config file /root/solr-4.7.0/example/solr/collection1/solrconfig.xml at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:530) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:597) at 
org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:258) at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:250) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ... 1 more Caused by: java.io.IOException: Can't find resource 'solrconfig.xml' in classp
Re: [JENKINS] Lucene-Solr-NightlyTests-4.x - Build # 541 - Still Failing
I committed a fix. Mike McCandless http://blog.mikemccandless.com On Wed, Mar 19, 2014 at 6:33 PM, Apache Jenkins Server wrote: > Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-4.x/541/ > > 1 tests failed. > REGRESSION: > org.apache.lucene.replicator.LocalReplicatorTest.testObtainMissingFile > > Error Message: > /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-NightlyTests-4.x/lucene/build/replicator/test/J1/index3004630048tmp/madeUpFile > > Stack Trace: > java.nio.file.NoSuchFileException: > /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-NightlyTests-4.x/lucene/build/replicator/test/J1/index3004630048tmp/madeUpFile > at > __randomizedtesting.SeedInfo.seed([441C78757690D0BA:621FD230CEC36F9D]:0) > at > sun.nio.fs.UnixException.translateToIOException(UnixException.java:86) > at > sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) > at > sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) > at > sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:176) > at java.nio.channels.FileChannel.open(FileChannel.java:287) > at java.nio.channels.FileChannel.open(FileChannel.java:334) > at > org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:82) > at > org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:80) > at > org.apache.lucene.replicator.IndexRevision.open(IndexRevision.java:136) > at > org.apache.lucene.replicator.LocalReplicator.obtainFile(LocalReplicator.java:198) > at > org.apache.lucene.replicator.LocalReplicatorTest.testObtainMissingFile(LocalReplicatorTest.java:155) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1617) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:826) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:862) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:876) > at > org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) > at > org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) > at > org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) > at > com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) > at > org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) > at > org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) > at > org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:359) > at > com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:783) > at > com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:443) > at > com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:835) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:737) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:771) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:782) > at > 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) > at > org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) > at > com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) > at > com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) > at > com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAsse
[jira] [Commented] (SOLR-5880) org.apache.solr.client.solrj.impl.CloudSolrServerTest is failing pretty much every time for a long time with an exception about not being able to connect to ZooKeeper wi
[ https://issues.apache.org/jira/browse/SOLR-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941854#comment-13941854 ] Mark Miller commented on SOLR-5880: --- They were related to SOLR-5865. > org.apache.solr.client.solrj.impl.CloudSolrServerTest is failing pretty much > every time for a long time with an exception about not being able to connect > to ZooKeeper within the timeout. > -- > > Key: SOLR-5880 > URL: https://issues.apache.org/jira/browse/SOLR-5880 > Project: Solr > Issue Type: Test >Reporter: Mark Miller >Assignee: Mark Miller > Fix For: 4.8, 5.0 > > > This test is failing consistently, though currently only on Policeman Jenkins > servers. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Google Summer of Code
Hi Michael; Thanks for the explanation. Furkan KAMACI 2014-03-20 17:12 GMT+02:00 Michael McCandless : > Unfortunately, the only two GSoC mentors we seem to have this year is > David Smiley and myself, and we each are already signed up to mentor > one student, and there's at least two other students expressing > interest in different issues. > > So it looks like we have too many students and too few mentors. > > Mike McCandless > > http://blog.mikemccandless.com > > > On Thu, Mar 20, 2014 at 9:50 AM, Furkan KAMACI > wrote: > > Hi; > > > > I want to apply for Google Summer of Code if I can catch up the deadline. > > I've checked the issues. I want to ask that is there any issue which is > > labeled for GSoC and has a volunteer mentor but nobody is applied? > Because I > > see that there are comments at some issues which asks about volunteer > > mentors. If there is any issue I will be appreciated to work on it. > > > > Thanks; > > Furkan KAMACI > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > >
[jira] [Commented] (SOLR-5852) Add CloudSolrServer helper method to connect to a ZK ensemble
[ https://issues.apache.org/jira/browse/SOLR-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941852#comment-13941852 ] Shawn Heisey commented on SOLR-5852: I just made that patch a few hours ago! New patch coming up as soon as I work my way through the conflicts. > Add CloudSolrServer helper method to connect to a ZK ensemble > - > > Key: SOLR-5852 > URL: https://issues.apache.org/jira/browse/SOLR-5852 > Project: Solr > Issue Type: Improvement >Reporter: Varun Thacker > Attachments: SOLR-5852-SH.patch, SOLR-5852.patch, SOLR-5852_FK.patch, > SOLR-5852_FK.patch > > > We should have a CloudSolrServer constructor which takes a list of ZK servers > to connect to. > Something Like > {noformat} > public CloudSolrServer(String... zkHost); > {noformat} > - Document the current constructor better to mention that to connect to a ZK > ensemble you can pass a comma-delimited list of ZK servers like > zk1:2181,zk2:2181,zk3:2181 > - Thirdly should getLbServer() and getZKStatereader() be public? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
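The proposed varargs constructor mostly reduces to building the comma-delimited zkHost string that the existing single-string constructor already accepts. A minimal sketch of that step (the `ZkHostJoiner` helper is illustrative, not actual SolrJ API):

```java
public class ZkHostJoiner {
    /** Builds the comma-delimited ensemble string, e.g. "zk1:2181,zk2:2181,zk3:2181",
     *  which is the format the existing String-based constructor documents. */
    public static String joinZkHosts(String... zkHosts) {
        if (zkHosts == null || zkHosts.length == 0) {
            throw new IllegalArgumentException("at least one ZooKeeper host is required");
        }
        return String.join(",", zkHosts);
    }

    public static void main(String[] args) {
        // A hypothetical CloudSolrServer(String... zkHost) could simply delegate
        // to the existing constructor with this joined string.
        System.out.println(joinZkHosts("zk1:2181", "zk2:2181", "zk3:2181")); // prints zk1:2181,zk2:2181,zk3:2181
    }
}
```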
[jira] [Updated] (SOLR-5890) Delete silently fails if not sent to shard where document was added
[ https://issues.apache.org/jira/browse/SOLR-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Inglesby updated SOLR-5890: - Summary: Delete silently fails if not sent to shard where document was added (was: Delete silently fails if not sent to node where document was added) > Delete silently fails if not sent to shard where document was added > --- > > Key: SOLR-5890 > URL: https://issues.apache.org/jira/browse/SOLR-5890 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 4.7 > Environment: Debian 7.4. >Reporter: Peter Inglesby > > We have SolrCloud set up with two shards, each with a leader and a replica. > We use haproxy to distribute requests between the four nodes. > Regardless of which node we send an add request to, following a commit, the > newly-added document is returned in a search, as expected. > However, we can only delete a document if the delete request is sent to a > node in the shard where the document was added. If we send the delete > request to a node in the other shard (and then send a commit) the document is > not deleted. Such a delete request will get a 200 response, with the > following body: > {'responseHeader'=>{'status'=>0,'QTime'=>7}} > Apart from the very low QTime, this is indistinguishable from a > successful delete. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Deleted] (SOLR-5889) aaaa
[ https://issues.apache.org/jira/browse/SOLR-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler deleted SOLR-5889: > > > > Key: SOLR-5889 > URL: https://issues.apache.org/jira/browse/SOLR-5889 > Project: Solr > Issue Type: Bug >Reporter: linxiaohu > -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-5890) Delete silently fails if not sent to node where document was added
Peter Inglesby created SOLR-5890: Summary: Delete silently fails if not sent to node where document was added Key: SOLR-5890 URL: https://issues.apache.org/jira/browse/SOLR-5890 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.7 Environment: Debian 7.4. Reporter: Peter Inglesby We have SolrCloud set up with two shards, each with a leader and a replica. We use haproxy to distribute requests between the four nodes. Regardless of which node we send an add request to, following a commit, the newly-added document is returned in a search, as expected. However, we can only delete a document if the delete request is sent to a node in the shard where the document was added. If we send the delete request to a node in the other shard (and then send a commit) the document is not deleted. Such a delete request will get a 200 response, with the following body: {'responseHeader'=>{'status'=>0,'QTime'=>7}} Apart from the very low QTime, this is indistinguishable from a successful delete. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
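The symptom described is consistent with hash-based document routing: each document id maps to exactly one shard, and a delete-by-id processed on a different shard simply matches nothing locally, returning status 0 with a tiny QTime. A toy illustration of the routing idea (real SolrCloud uses the compositeId router with a MurmurHash over the hash ring, not `String.hashCode()` modulo shard count):

```java
public class ShardRouter {
    /** Toy router: picks the shard that "owns" a document id. */
    public static int shardFor(String docId, int numShards) {
        // Math.floorMod keeps the result non-negative even for negative hashCodes
        return Math.floorMod(docId.hashCode(), numShards);
    }

    public static void main(String[] args) {
        int numShards = 2;
        String id = "doc-42";
        int owner = shardFor(id, numShards);
        // A delete handled on any other shard finds no local copy of the document,
        // so it "succeeds" without deleting anything -- the symptom reported above.
        System.out.println("id " + id + " lives on shard " + owner);
    }
}
```

The fix in a real cluster is for the receiving node to forward deletes through the same router used for adds, so the request always reaches the owning shard's leader.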
[jira] [Updated] (SOLR-5394) facet.method=fcs seems to be using threads when it shouldn't
[ https://issues.apache.org/jira/browse/SOLR-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Kan updated SOLR-5394: - Attachment: SOLR-5394.patch This patch sets the default threads to 1 (single thread execution) as per Vitaly's suggestion. Fixed the test case with an unspecified threads parameter: the number of threads is expected to be the default (=1). The tests in TestSimpleFacet pass. > facet.method=fcs seems to be using threads when it shouldn't > > > Key: SOLR-5394 > URL: https://issues.apache.org/jira/browse/SOLR-5394 > Project: Solr > Issue Type: Bug >Affects Versions: 4.6 >Reporter: Michael McCandless > Attachments: SOLR-5394.patch, SOLR-5394.patch, > SOLR-5394_keep_threads_original_value.patch > > > I built a wikipedia index, with multiple fields for faceting. > When I do facet.method=fcs with facet.field=dateFacet and > facet.field=userNameFacet, and then kill -QUIT the java process, I see a > bunch (46, I think) of facetExecutor-7-thread-N threads had spun up. > But I thought threads for each field is turned off by default? > Even if I add facet.threads=0, it still spins up all the threads. > I think something is wrong in SimpleFacets.parseParams; somehow, that method > returns early (because localParams is null), leaving threads=-1, and then the > later code that would have set threads to 0 never runs. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
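The bug boils down to an early return that leaves the internal -1 sentinel in place, which downstream code interprets as "spin up threads". The patch's fix amounts to always resolving the parameter to a concrete value before use. A hedged sketch of that defaulting logic (the method name and the treatment of negative values are illustrative, not the actual SimpleFacets code):

```java
public class FacetThreadsParam {
    /** Resolves a facet.threads parameter value, defaulting to 1 (single-threaded)
     *  when the parameter is absent -- the behavior the SOLR-5394 patch describes.
     *  The sentinel -1 never escapes this method. */
    public static int parseThreads(String raw) {
        if (raw == null || raw.isEmpty()) {
            return 1; // unspecified: single-threaded, not "all the threads"
        }
        int threads = Integer.parseInt(raw);
        // Explicit 0 (direct execution) and positive counts pass through;
        // negative values fall back to the default rather than acting as a sentinel.
        return threads < 0 ? 1 : threads;
    }

    public static void main(String[] args) {
        System.out.println(parseThreads(null)); // prints 1
        System.out.println(parseThreads("6"));  // prints 6
    }
}
```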
Re: Google Summer of Code
Unfortunately, the only two GSoC mentors we seem to have this year are David Smiley and myself, and we each are already signed up to mentor one student, and there are at least two other students expressing interest in different issues. So it looks like we have too many students and too few mentors. Mike McCandless http://blog.mikemccandless.com On Thu, Mar 20, 2014 at 9:50 AM, Furkan KAMACI wrote: > Hi; > > I want to apply for Google Summer of Code if I can catch up the deadline. > I've checked the issues. I want to ask that is there any issue which is > labeled for GSoC and has a volunteer mentor but nobody is applied? Because I > see that there are comments at some issues which asks about volunteer > mentors. If there is any issue I will be appreciated to work on it. > > Thanks; > Furkan KAMACI - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5824) Merge up Solr MapReduce contrib code to latest external changes.
[ https://issues.apache.org/jira/browse/SOLR-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941793#comment-13941793 ] ASF subversion and git services commented on SOLR-5824: --- Commit 1579648 from [~markrmil...@gmail.com] in branch 'dev/branches/lucene_solr_4_7' [ https://svn.apache.org/r1579648 ] SOLR-5824: Merge up Solr MapReduce contrib code to latest external changes. Includes a few minor bug fixes. > Merge up Solr MapReduce contrib code to latest external changes. > > > Key: SOLR-5824 > URL: https://issues.apache.org/jira/browse/SOLR-5824 > Project: Solr > Issue Type: Task >Reporter: Mark Miller >Assignee: Mark Miller > Fix For: 4.8, 5.0, 4.7.1 > > Attachments: SOLR-5824.patch > > > There are a variety of changes in the mapreduce contrib code that have > occurred while getting the initial stuff committed - they need to be merged > in. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4984) Fix ThaiWordFilter
[ https://issues.apache.org/jira/browse/LUCENE-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941779#comment-13941779 ] Robert Muir commented on LUCENE-4984: - It's even simpler than that. But I wanted to do that in a follow-up issue. 4.8 is a good time to fix it, as it's easy with this tokenizer! > Fix ThaiWordFilter > -- > > Key: LUCENE-4984 > URL: https://issues.apache.org/jira/browse/LUCENE-4984 > Project: Lucene - Core > Issue Type: Bug >Reporter: Adrien Grand >Assignee: Adrien Grand > Attachments: LUCENE-4984.patch, LUCENE-4984.patch, LUCENE-4984.patch > > > ThaiWordFilter is an offender in TestRandomChains because it creates > positions and updates offsets. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses
[ https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941775#comment-13941775 ] Da Huang commented on LUCENE-4396: -- Sorry for my late reply. I have been thinking about the new code/design on the trunk these days. The new code breaks out BulkScorer from Scorer, and it is necessary to create a new BooleanScorer (a Scorer), just as you said. I'm afraid that we do have to take Scorer instead as subScorer in the new BooleanScorer. And yes: BooleanBulkScorer should not be embedded as its docIDs are out of order. My idea is to keep BooleanBulkScorer just supporting the no-MUST-clause case, and let the new BooleanScorer deal with the case where there is at least one MUST clause. I think this is one of the best ways to be compatible with the current design. Besides, I'm afraid that the name of BulkScorer may be confusing. The new BooleanScorer is also implemented by scoring a range of documents at once, but it actually can act as a sub-scorer. > BooleanScorer should sometimes be used for MUST clauses > --- > > Key: LUCENE-4396 > URL: https://issues.apache.org/jira/browse/LUCENE-4396 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless > > Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT. > If there are one or more MUST clauses we always use BooleanScorer2. > But I suspect that unless the MUST clauses have very low hit count compared > to the other clauses, that BooleanScorer would perform better than > BooleanScorer2. BooleanScorer still has some vestiges from when it used to > handle MUST so it shouldn't be hard to bring back this capability ... I think > the challenging part might be the heuristics on when to use which (likely we > would have to use firstDocID as proxy for total hit count). 
> Likely we should also have BooleanScorer sometimes use .advance() on the subs > in this case, eg if suddenly the MUST clause skips 100 docs then you want > to .advance() all the SHOULD clauses. > I won't have near term time to work on this so feel free to take it if you > are inspired! -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
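The advance()-on-the-subs idea above can be sketched with sorted int arrays standing in for postings lists: the MUST iterator drives iteration, and each SHOULD list is "advanced" (here via binary search) straight to the candidate doc instead of being scanned linearly through the docs the MUST clause skipped. This is a conceptual illustration only, not Lucene's Scorer/BulkScorer API:

```java
import java.util.Arrays;

public class MustDrivenAdvance {
    /** For each doc in the MUST list, counts how many SHOULD lists also
     *  contain it (a crude stand-in for accumulating SHOULD scores).
     *  advance() is simulated by binary search into the sorted SHOULD lists. */
    public static int[] matchCounts(int[] must, int[][] shoulds) {
        int[] counts = new int[must.length];
        for (int i = 0; i < must.length; i++) {
            int doc = must[i];
            for (int[] should : shoulds) {
                // "advance" the SHOULD iterator directly to doc, skipping
                // everything in between -- the point of Mike's suggestion
                if (Arrays.binarySearch(should, doc) >= 0) {
                    counts[i]++;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] must = {5, 100, 205};                      // MUST clause postings
        int[][] shoulds = {{1, 5, 205}, {100, 205, 300}}; // two SHOULD clauses
        System.out.println(Arrays.toString(matchCounts(must, shoulds))); // prints [1, 1, 2]
    }
}
```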
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941765#comment-13941765 ] Kranti Parisa commented on SOLR-4787: - Gopal, you can't get the values using joins. you will need to make a second call with the result (potentially sorted and paginated on firstCore). Using FQs in the first join call, you can hit the caches in the second call. if you need more details, describe your use case > Join Contrib > > > Key: SOLR-4787 > URL: https://issues.apache.org/jira/browse/SOLR-4787 > Project: Solr > Issue Type: New Feature > Components: search >Affects Versions: 4.2.1 >Reporter: Joel Bernstein >Priority: Minor > Fix For: 4.8 > > Attachments: SOLR-4787-deadlock-fix.patch, > SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, > SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, > SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, > SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, > SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, > SOLR-4797-hjoin-multivaluekeys-trunk.patch > > > This contrib provides a place where different join implementations can be > contributed to Solr. This contrib currently includes 3 join implementations. > The initial patch was generated from the Solr 4.3 tag. Because of changes in > the FieldCache API this patch will only build with Solr 4.2 or above. > *HashSetJoinQParserPlugin aka hjoin* > The hjoin provides a join implementation that filters results in one core > based on the results of a search in another core. This is similar in > functionality to the JoinQParserPlugin but the implementation differs in a > couple of important ways. > The first way is that the hjoin is designed to work with int and long join > keys only. So, in order to use hjoin, int or long join keys must be included > in both the to and from core. 
> The second difference is that the hjoin builds memory structures that are > used to quickly connect the join keys. So, the hjoin will need more memory > than the JoinQParserPlugin to perform the join. > The main advantage of the hjoin is that it can scale to join millions of keys > between cores and provide sub-second response time. The hjoin should work > well with up to two million results from the fromIndex and tens of millions > of results from the main query. > The hjoin supports the following features: > 1) Both lucene query and PostFilter implementations. A *"cost"* > 99 will > turn on the PostFilter. The PostFilter will typically outperform the Lucene > query when the main query results have been narrowed down. > 2) With the lucene query implementation there is an option to build the > filter with threads. This can greatly improve the performance of the query if > the main query index is very large. The "threads" parameter turns on > threading. For example *threads=6* will use 6 threads to build the filter. > This will set up a fixed threadpool with six threads to handle all hjoin > requests. Once the threadpool is created the hjoin will always use it to > build the filter. Threading does not come into play with the PostFilter. > 3) The *size* local parameter can be used to set the initial size of the > hashset used to perform the join. If this is set above the number of results > from the fromIndex then you can avoid hashset resizing, which improves > performance. > 4) Nested filter queries. The local parameter "fq" can be used to nest a > filter query within the join. The nested fq will filter the results of the > join query. This can point to another join to support nested joins. > 5) Full caching support for the lucene query implementation. The filterCache > and queryResultCache should work properly even with deep nesting of joins. 
> Only the queryResultCache comes into play with the PostFilter implementation > because PostFilters are not cacheable in the filterCache. > The syntax of the hjoin is similar to the JoinQParserPlugin except that the > plugin is referenced by the string "hjoin" rather than "join". > fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 > fq=$qq\}user:customer1&qq=group:5 > The example filter query above will search the fromIndex (collection2) for > "user:customer1" applying the local fq parameter to filter the results. The > lucene filter query will be built using 6 threads. This query will generate a > list of values from the "from" field that will be used to filter the main > query. Only records from the main query, where the "to" field is present in > the "from" list will be included in the results. > The solrconfig.xml in the main query core must contain the reference to the > hjoin. > <queryParser name="hjoin" class="org.apache.solr.joins.HashSetJoinQParserPlugin"/> > And th
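Stripped of the Solr plumbing, the hjoin amounts to building a HashSet of the numeric "from" keys and testing each main-query "to" key for membership; pre-sizing that set is exactly the rehashing cost the *size* local parameter lets you avoid. A toy sketch of the idea (not the actual plugin code):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HashSetJoinSketch {
    /** Keeps only the main-query docs whose "to" key appears among the
     *  fromIndex "from" keys -- the essence of the hjoin filter. */
    public static List<Long> join(List<Long> fromKeys, List<Long> toKeys) {
        // Pre-size the set (like the "size" local param) so adding the
        // from-keys never triggers a resize/rehash
        Set<Long> keys = new HashSet<>(Math.max(16, fromKeys.size() * 2));
        keys.addAll(fromKeys);
        List<Long> hits = new ArrayList<>();
        for (Long to : toKeys) {
            if (keys.contains(to)) { // O(1) membership test per main-query doc
                hits.add(to);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // from-index matched keys {1,2,3}; main query docs keyed {2,3,4}
        System.out.println(join(List.of(1L, 2L, 3L), List.of(2L, 3L, 4L))); // prints [2, 3]
    }
}
```

This also shows why the plugin restricts keys to int/long: primitive-width keys keep the in-memory structure small enough to scale to millions of entries.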
[jira] [Commented] (SOLR-5852) Add CloudSolrServer helper method to connect to a ZK ensemble
[ https://issues.apache.org/jira/browse/SOLR-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941760#comment-13941760 ] Erick Erickson commented on SOLR-5852: -- Hey Shawn: I tried to apply your patch to a new checkout for trunk and had merge conflicts. It looks like the test code changed. Could you regenerate the patch? Thanks! > Add CloudSolrServer helper method to connect to a ZK ensemble > - > > Key: SOLR-5852 > URL: https://issues.apache.org/jira/browse/SOLR-5852 > Project: Solr > Issue Type: Improvement >Reporter: Varun Thacker > Attachments: SOLR-5852-SH.patch, SOLR-5852.patch, SOLR-5852_FK.patch, > SOLR-5852_FK.patch > > > We should have a CloudSolrServer constructor which takes a list of ZK servers > to connect to. > Something Like > {noformat} > public CloudSolrServer(String... zkHost); > {noformat} > - Document the current constructor better to mention that to connect to a ZK > ensemble you can pass a comma-delimited list of ZK servers like > zk1:2181,zk2:2181,zk3:2181 > - Thirdly should getLbServer() and getZKStatereader() be public? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4984) Fix ThaiWordFilter
[ https://issues.apache.org/jira/browse/LUCENE-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941752#comment-13941752 ] Ryan Ernst commented on LUCENE-4984: +1, patch lgtm Is fixing Smart Chinese to not emit punctuation as simple as hardcoding the list of punctuation characters and skipping them in something like incrementWord()? > Fix ThaiWordFilter > -- > > Key: LUCENE-4984 > URL: https://issues.apache.org/jira/browse/LUCENE-4984 > Project: Lucene - Core > Issue Type: Bug >Reporter: Adrien Grand >Assignee: Adrien Grand > Attachments: LUCENE-4984.patch, LUCENE-4984.patch, LUCENE-4984.patch > > > ThaiWordFilter is an offender in TestRandomChains because it creates > positions and updates offsets. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
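The punctuation-skipping idea Ryan floats can be sketched as a plain filter over string tokens: drop any token made up entirely of a hardcoded punctuation set. This is illustrative only; a real fix would live inside the tokenizer's own word-emission logic (e.g. something like incrementWord()), and the character set shown here is an assumption, not the analyzer's actual list:

```java
import java.util.ArrayList;
import java.util.List;

public class PunctuationSkipper {
    // Hardcoded punctuation set, Western plus a few CJK marks (illustrative only)
    private static final String PUNCT = ",.!?;:\u3001\u3002\uFF01\uFF1F";

    /** True if the token is non-empty and consists only of punctuation chars. */
    public static boolean isPunctuation(String token) {
        for (int i = 0; i < token.length(); i++) {
            if (PUNCT.indexOf(token.charAt(i)) < 0) {
                return false;
            }
        }
        return !token.isEmpty();
    }

    /** Filters a token stream the way a tokenizer might skip punctuation tokens. */
    public static List<String> skipPunctuation(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (!isPunctuation(t)) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(skipPunctuation(List.of("hello", ",", "world"))); // prints [hello, world]
    }
}
```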
Google Summer of Code
Hi; I want to apply for Google Summer of Code if I can still make the deadline. I've checked the issues. Is there any issue that is labeled for GSoC and has a volunteer mentor, but that nobody has applied for yet? I ask because I see comments on some issues asking about volunteer mentors. If there is such an issue, I would appreciate the chance to work on it. Thanks; Furkan KAMACI
[jira] [Commented] (LUCENE-3122) Cascaded grouping
[ https://issues.apache.org/jira/browse/LUCENE-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941727#comment-13941727 ] Furkan KAMACI commented on LUCENE-3122: --- [~mikemccand] Could you explain this issue a bit more? > Cascaded grouping > - > > Key: LUCENE-3122 > URL: https://issues.apache.org/jira/browse/LUCENE-3122 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/grouping >Reporter: Michael McCandless > Labels: gsoc2014 > Fix For: 4.8 > > > Similar to SOLR-2526, in that you are grouping on 2 separate fields, but > instead of treating those fields as a single grouping by a compound key, this > change would let you first group on key1 for the primary groups and then > secondarily on key2 within the primary groups. > Ie, the result you get back would have groups A, B, C (grouped by key1) but > then the documents within group A would be grouped by key 2. > I think this will be important for apps whose documents are the product of > denormalizing, ie where the Lucene document is really a sub-document of a > different identifier field. Borrowing an example from LUCENE-3097, you have > doctors but each doctor may have multiple offices (addresses) where they > practice and so you index doctor X address as your lucene documents. In this > case, your "identifier" field (that which "counts" for facets, and should be > "grouped" for presentation) is doctorid. When you offer users search over > this index, you'd likely want to 1) group by distance (ie, < 0.1 miles, < 0.2 > miles, etc., as a function query), but 2) also group by doctorid, ie cascaded > grouping. > I suspect this would be easier to implement than it sounds: the per-group > collector used by the 2nd pass grouping collector for key1's grouping just > needs to be another grouping collector. 
Spookily, though, that collection > would also have to be 2-pass, so it could get tricky since grouping is sort > of recursing on itself. Once we have LUCENE-3112, though, that should > enable efficient single-pass grouping by the identifier (doctorid).
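The result shape Mike describes (primary groups by key1, each internally grouped by key2) can be sketched over in-memory records, leaving aside the two-pass collector machinery. A minimal plain-Java illustration with assumed field names, not a Lucene collector:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of cascaded grouping: group docs by key1, then group each
// primary group's docs by key2. Each input entry is {docId, key1, key2}.
public final class CascadedGrouping {
    private CascadedGrouping() {}

    public static Map<String, Map<String, List<String>>> group(List<String[]> docs) {
        return docs.stream().collect(Collectors.groupingBy(
            d -> d[1],                                   // primary: key1
            Collectors.groupingBy(
                d -> d[2],                               // secondary: key2
                Collectors.mapping(d -> d[0],            // collect doc ids
                    Collectors.toList()))));
    }
}
```

In the doctor/office example, key1 would be the distance bucket and key2 the doctorid; the per-group collector for key1 is itself a grouping collector, which is the recursion the comment points out.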
[jira] [Closed] (SOLR-5889) aaaa
[ https://issues.apache.org/jira/browse/SOLR-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] linxiaohu closed SOLR-5889. --- Resolution: Duplicate > > > > Key: SOLR-5889 > URL: https://issues.apache.org/jira/browse/SOLR-5889 > Project: Solr > Issue Type: Bug >Reporter: linxiaohu >
[jira] [Created] (SOLR-5889) aaaa
linxiaohu created SOLR-5889: --- Summary: Key: SOLR-5889 URL: https://issues.apache.org/jira/browse/SOLR-5889 Project: Solr Issue Type: Bug Reporter: linxiaohu
[GitHub] lucene-solr pull request: Removal of Scorer.weight
Github user shebiki commented on the pull request: https://github.com/apache/lucene-solr/pull/40#issuecomment-38165595 Mikhail, I have a similar use case and opted for creating the `BooleanScorer2` directly instead of trying to associate each child `Scorer` with the drilldown id. I chose not to use the [QueryWrapper](https://github.com/apache/lucene-solr/blob/lucene_solr_4_6/lucene/facet/src/java/org/apache/lucene/facet/search/DrillSideways.java#L352) pattern from `DrillSideways` in 4.6.0 because I felt it would prevent future optimizations and it was no longer in use in 4.7. I didn't consider the idea of just comparing `scorer.getWeight().getQuery()` but it's essentially the same workflow. The reason I felt it prevented further optimization is that it prevents a `Weight` instance from returning an already created child `Scorer`. For example: * A `BooleanQuery` consisting of just `SHOULD` clauses with `disableCoord` set to `true`. If a segment only has one non-null scorer then `BooleanWeight.scorer()` should be able to return just that child scorer instead of having to wrap it with another. * Introduction of extra scoring metadata (imagine decorating each score with an additional `boolean`). In this case a composing query (a variant of `BooleanQuery`, `DisjunctionMaxQuery`, etc.) would want to aggregate this extra metadata at scoring time. If the metadata has a decent default value then only some of the child `Scorer`s will be able to provide it. If none of the child `Scorer`s provide this metadata then its calculation can probably be short-circuited and it can just return a `BooleanScorer`, `ConjunctionScorer`, or `DisjunctionScorer` as needed. This would be more efficient than making it wrap unconditionally. Quick question about your particular drillsideways query. Do you call `score()`, `freq()`, or something else to ensure the `SHOULD` `Scorer`s are correctly positioned?
Do you optimize for when `BooleanQuery` returns a `DisjunctionScorer` and the child `Scorer`s are already positioned? --Terry --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
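The single-child unwrap optimization Terry describes (return the lone non-null child scorer directly rather than wrapping it) can be sketched with toy stand-ins for the real `Weight`/`Scorer` API. The interfaces and names below are simplified assumptions, not Lucene's:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Toy sketch: a disjunction over child scorer factories that unwraps
// when only one child matches the segment. Stand-in API, not Lucene's.
public final class SingleChildUnwrap {
    public interface Scorer { float score(); }

    // children may produce null (the clause matches nothing in this segment)
    public static Scorer disjunctionScorer(List<Supplier<Scorer>> children) {
        List<Scorer> live = new ArrayList<>();
        for (Supplier<Scorer> c : children) {
            Scorer s = c.get();
            if (s != null) live.add(s);
        }
        if (live.isEmpty()) return null;          // no clause matches
        if (live.size() == 1) return live.get(0); // unwrap: skip the wrapper
        return () -> {                            // otherwise sum the disjuncts
            float sum = 0;
            for (Scorer s : live) sum += s.score();
            return sum;
        };
    }
}
```

The identity-preserving return in the single-child case is the point: a caller that compares scorer instances (or wraps them with per-clause bookkeeping, as in the drill-sideways discussion above) sees the child directly, with no unconditional wrapper in between.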